Fix remaining critical, high, and medium issues from 4-perspective review

Critical fixes:
- Fix rating clamp range 1-10 → 1-5 (actual scale)
- Add `ietf ideas convergence` command (SequenceMatcher at 0.75 threshold)
- Fix "628 cross-org ideas" → 130 (verified from current DB) across 8 files

Security fixes:
- Sanitize FTS5 query input (strip special chars + boolean operators)
- Add rate limiting (10 req/min/IP) on Claude-calling endpoints
- Change <path:name> → <string:name> on draft routes

Codebase fixes:
- Add Database context manager (__enter__/__exit__)
- Wire false_positive filtering into queries (exclude by default in web UI)
- Fix Post 3 arithmetic ("~300" → "~409" distinct proposals)

Content & licensing:
- Add MIT LICENSE file
- Add IPR/FRAND notes (BCP 79, RFC 8179) to Posts 03 and 07
- Qualify "4:1 safety ratio" with monthly variation in 6 remaining files
- Add "Data as of March 2026" freeze-date headers to all 10 blog posts
- Hedge causal language in Post 04

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 12:47:47 +01:00
parent f1a0b0264c
commit e7527ad68e
40 changed files with 1005 additions and 169 deletions
--- a/data/reports/dev-journal.md
+++ b/data/reports/dev-journal.md
@@ -4,6 +4,53 @@

 ---

+### 2026-03-08 CODER — Critical fixes: rating clamp, convergence command, blog number correction
+
+**What**: Three fixes addressing data integrity and reproducibility:
+1. **Rating clamp range** — Fixed `_clamp_rating()` in `analyzer.py` from `hi=10` to `hi=5`. The rating scale is 1-5 but the clamp allowed values up to 10. Verified no existing ratings in the DB exceed 5 (0 rows affected).
+2. **`ietf ideas convergence` subcommand** — Added a `convergence` subcommand under the `ideas` CLI group in `cli.py`. Uses `difflib.SequenceMatcher` at 0.75 threshold to find cross-org convergent ideas. Mirrors the existing `ietf idea-overlap` command but lives under the `ideas` group for discoverability.
+3. **Blog post number correction** — The blog series referenced "628 cross-org convergent ideas" throughout. Running `ietf idea-overlap` on the current 419-idea database produces **130 cross-org convergent ideas** (36% of 361 unique clusters). The 628 came from an earlier pipeline run with 1,780 raw ideas. Updated all 8 affected files (posts 00, 05, 06, 07, 08, state-of-ecosystem, and data packages) to use 130 as the headline number, with historical context noting the earlier run's 628 where appropriate.
+**Why**: Rating clamp was silently allowing out-of-range values. The 628 number was irreproducible from the current database, making it a credibility risk for the blog series.
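The clamp fix described above can be illustrated with a minimal sketch. The real `_clamp_rating()` in `analyzer.py` is not shown in this diff, so the name `clamp_rating_sketch` and its signature are assumptions made only for illustration.

```python
def clamp_rating_sketch(value, lo=1, hi=5):
    """Clamp a rating into the inclusive range [lo, hi].

    Hypothetical stand-in for _clamp_rating(); the upper bound defaults
    to 5 to match the 1-5 rating scale named in the journal entry.
    """
    return max(lo, min(hi, value))


# With hi=10 (the old default) a rating of 7 would pass through unchanged;
# with hi=5 it is pulled back to the top of the scale.
old_behaviour = clamp_rating_sketch(7, hi=10)
new_behaviour = clamp_rating_sketch(7)
```

Keeping the bounds as keyword defaults makes the scale visible at the call site, which is one way to make this kind of mismatch easier to spot in review.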
+**Result**: Clamp fixed (0 DB rows needed correction). Convergence command works (`ietf ideas convergence`). All blog posts now reference the reproducible number (130) with consistent methodology notes.
+**Surprise**: The convergence *rate* (36% in the current run vs 43% in the earlier one) is roughly consistent across both extraction runs, suggesting the pattern is robust even though absolute counts depend on extraction parameters. The rate is arguably a stronger claim than either raw count.
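A rough sketch of how a 0.75 `difflib.SequenceMatcher` threshold could be used to find cross-org convergent ideas. The greedy clustering, the `(title, org)` input shape, and the helper names `similar`, `cluster_ideas`, and `convergent_clusters` are all assumptions for illustration; the actual logic in `cli.py` is not shown in this diff and may differ.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.75  # similarity threshold named in the journal entry


def similar(a, b, threshold=THRESHOLD):
    """True when two idea titles are at least `threshold` similar."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster_ideas(ideas):
    """Greedy clustering (an assumed strategy): each (title, org) pair joins
    the first cluster whose representative title is similar enough,
    otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"title": str, "orgs": set of org names}
    for title, org in ideas:
        for cluster in clusters:
            if similar(title, cluster["title"]):
                cluster["orgs"].add(org)
                break
        else:
            clusters.append({"title": title, "orgs": {org}})
    return clusters


def convergent_clusters(clusters):
    """A cluster counts as cross-org convergent when 2+ orgs contributed."""
    return [c for c in clusters if len(c["orgs"]) >= 2]


# Made-up sample data, not taken from the project database.
sample = [
    ("Agent identity attestation", "OrgA"),
    ("agent identity attestations", "OrgB"),
    ("Capability discovery protocol", "OrgA"),
]
clusters = cluster_ideas(sample)
rate = len(convergent_clusters(clusters)) / len(clusters)

# The headline figures from the entry: 130 convergent out of 361 clusters.
headline_rate_percent = round(130 / 361 * 100)
```

Reporting the rate alongside the count keeps the claim comparable across runs that extract different numbers of raw ideas.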
+
+---
+
+### 2026-03-08 CODER — Security Hardening (FTS5 injection, rate limiting, route safety)
+
+**What**: Three security fixes applied to the web UI and search layer:
+1. **FTS5 query sanitization** — Added `sanitize_fts_query()` static method to `HybridSearch` in `search.py` that strips special FTS5 characters (`"`, `*`, `(`, `)`) and boolean operators (`NEAR`, `OR`, `AND`, `NOT`) before passing queries to SQLite FTS5 MATCH. Also applied the same sanitization in `data.py`'s `global_search()`. The old fallback that wrapped unsanitized words in double quotes (itself an injection vector) was removed.
+2. **Rate limiting on Claude endpoints** — Added in-memory sliding-window rate limiter (10 req/min/IP) as a decorator on `/api/ask/synthesize` and `/api/compare` — the two endpoints that call Claude and cost tokens. Uses a simple `dict[ip, list[timestamp]]` approach with no new dependencies.
+3. **Route converter fix** — Changed all `<path:name>` to `<string:name>` on draft detail routes (3 occurrences). Draft names never contain slashes, so `path` was unnecessarily permissive.
+**Why**: Hardening against query injection, API abuse, and path traversal before public deployment.
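The sanitization step might look roughly like the following. This is a sketch under stated assumptions, not the `sanitize_fts_query()` method itself: the function name `sanitize_fts_query_sketch` is hypothetical, and matching the operators in uppercase only is an assumption based on FTS5 treating `AND`, `OR`, and `NOT` as case-sensitive keywords.

```python
import re

# The four special characters listed in the entry: " * ( )
_FTS_SPECIAL = re.compile(r'["*()]')
# The operators listed in the entry, matched as whole uppercase words only
# (assumption: lowercase "and"/"or" are ordinary search terms).
_FTS_OPERATORS = re.compile(r"\b(?:NEAR|OR|AND|NOT)\b")


def sanitize_fts_query_sketch(query):
    """Strip FTS5 special characters and operators, then collapse whitespace."""
    cleaned = _FTS_SPECIAL.sub(" ", query)
    cleaned = _FTS_OPERATORS.sub(" ", cleaned)
    return " ".join(cleaned.split())


cleaned_query = sanitize_fts_query_sketch('agent* AND ("trust" OR NEAR(x y))')
```

Characters outside the four listed in the entry are left untouched by this sketch. An empty result (for example, from a query made only of operators) would need separate handling before being passed to `MATCH`.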
+**Result**: All three fixes applied. No new dependencies introduced.
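A minimal sketch of an in-memory sliding-window limiter built on a `dict` of per-IP timestamp lists, as the entry describes. The class name `SlidingWindowLimiter`, the `rate_limited` decorator, the injectable `clock`, and the `get_ip` callable are all hypothetical; the project's decorator presumably reads the client IP from the web framework's request object, which this diff does not show.

```python
import time
from collections import defaultdict
from functools import wraps


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each IP."""

    def __init__(self, max_requests=10, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        self.hits = defaultdict(list)  # ip -> list of request timestamps

    def allow(self, ip):
        now = self.clock()
        cutoff = now - self.window_seconds
        # Keep only the timestamps that still fall inside the window.
        recent = [t for t in self.hits[ip] if t > cutoff]
        allowed = len(recent) < self.max_requests
        if allowed:
            recent.append(now)
        self.hits[ip] = recent
        return allowed


def rate_limited(limiter, get_ip):
    """Decorator: reject the call with a 429-style tuple when over the limit."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if not limiter.allow(get_ip()):
                return {"error": "rate limit exceeded"}, 429
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

Because the state lives in process memory, a limiter like this resets on restart and is not shared across worker processes. That trade-off is what avoids a new dependency.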
+
+---
+
+### 2026-03-08 CODER — Codebase Quality: Context Manager, False Positive Filtering, Blog Fix
+
+**What**: Four quality improvements:
+1. Added `__enter__`/`__exit__` to the `Database` class (`db.py`) so it can be used as a context manager, matching the pattern already in `Embedder`.
+2. Added an `include_false_positives` parameter to `drafts_with_ratings()` (default `False`) and `count_drafts()` (default `True` for backward compatibility). The web UI's `get_overview_stats()` now excludes the 73 false-positive drafts from displayed counts.
+3. Checked all `ratings.categories` JSON values: all were already normalized to short-form names, so no migration was needed.
+4. Fixed arithmetic in Post 03 key takeaways: "roughly 300 distinct proposals" corrected to "roughly 409" (434 - 25 = 409, consistent with the body text on line 85).
+
+**Why**: False positives were being counted in dashboard stats and included in all rating-based analyses (similarity, timeline, categories). The context manager prevents resource leaks. The blog arithmetic contradicted itself within the same post.
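To make the filtering behaviour concrete, here is a small sketch of a count query with an `include_false_positives` switch. The table name `drafts`, the `false_positive` column, and the function name `count_drafts_sketch` are assumptions; the project's actual schema and query are not shown in this diff.

```python
import sqlite3


def count_drafts_sketch(conn, include_false_positives=True):
    """Count drafts, optionally excluding rows flagged as false positives.

    The default of True mirrors the backward-compatible default the entry
    gives for count_drafts(); the schema here is hypothetical.
    """
    sql = "SELECT COUNT(*) FROM drafts"
    if not include_false_positives:
        sql += " WHERE false_positive = 0"
    return conn.execute(sql).fetchone()[0]


# Tiny in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drafts (name TEXT, false_positive INTEGER DEFAULT 0)")
conn.executemany(
    "INSERT INTO drafts (name, false_positive) VALUES (?, ?)",
    [("draft-a", 0), ("draft-b", 0), ("draft-c", 1)],
)
total = count_drafts_sketch(conn)
displayed = count_drafts_sketch(conn, include_false_positives=False)
```

Callers that want dashboard-style numbers opt out of false positives explicitly, while existing callers keep their previous totals.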
+**Result**: `Database` now usable with `with` statement. All web UI visualizations automatically exclude false positives via `drafts_with_ratings()` default. Blog post internally consistent.
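The context-manager pattern can be sketched as follows, assuming the `Database` class wraps a `sqlite3` connection and exposes a `close()` method. `DatabaseSketch` is a hypothetical stand-in; the real class in `db.py` is not shown in this diff.

```python
import sqlite3


class DatabaseSketch:
    """Hypothetical stand-in for Database, wrapping a sqlite3 connection."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def close(self):
        self.conn.close()

    def __enter__(self):
        # Hand the instance itself to the `with` block.
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always release the connection; returning False lets any
        # exception raised inside the block propagate to the caller.
        self.close()
        return False


with DatabaseSketch() as db:
    row = db.conn.execute("SELECT 1").fetchone()
```

Once the block exits, the connection is closed even if the body raised, which is the resource-leak protection the entry refers to.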
+
+---
+
+### 2026-03-08 EDITOR — Licensing, IPR, Data Provenance, and Language Hedging Pass
+
+**What**: Five-part editorial and project hygiene pass across the blog series and repository:
+1. Created MIT LICENSE file (copyright 2026 Christian Nennemann). The project claimed to be "open source" in Post 07 but had no license.
+2. Added IETF IPR/BCP 79 notes to Post 07 (Limitations section) and Post 03 (after fragmentation costs section), reminding implementers to check the IETF IPR disclosure database before building on discussed drafts.
+3. Qualified all remaining unqualified "4:1" safety ratio references across posts 00, 04, 06, 07, 08, and state-of-ecosystem.md with "averaging ~4:1 but varying from 1.5:1 to 21:1 month-to-month."
+4. Added "Analysis based on IETF Datatracker data collected through March 2026" freeze-date note to all 10 blog posts (00 through 08 plus state-of-ecosystem).
+5. Hedged causal language in Post 04: "correlates" -> "appears to correlate", "structural, not attitudinal" -> "appears structural."
+**Why**: Pre-publication compliance and accuracy. The 4:1 ratio without qualification overstated precision; the missing LICENSE contradicted the open-source claim; IPR notes are standard practice when discussing IETF drafts; data freeze dates prevent readers from assuming currency.
+**Result**: All 10 blog posts updated, LICENSE file created. 20+ individual edits across 10 files.
+
 ### 2026-03-08 WRITER/EDITOR — Factual Accuracy Pass Across All Blog Posts

 **What**: Comprehensive factual accuracy fix across all 10 blog series files (posts 00-08 plus state-of-ecosystem), driven by three review documents (review-statistics.md, review-legal.md, review-science.md). Key changes: