Fix remaining critical, high, and medium issues from 4-perspective review

Critical fixes:
- Fix rating clamp range 1-10 → 1-5 (actual scale)
- Add `ietf ideas convergence` command (SequenceMatcher at 0.75 threshold)
- Fix "628 cross-org ideas" → 130 (verified from current DB) across 8 files

Security fixes:
- Sanitize FTS5 query input (strip special chars + boolean operators)
- Add rate limiting (10 req/min/IP) on Claude-calling endpoints
- Change <path:name> → <string:name> on draft routes

Codebase fixes:
- Add Database context manager (__enter__/__exit__)
- Wire false_positive filtering into queries (exclude by default in web UI)
- Fix Post 3 arithmetic ("~300" → "~409" distinct proposals)

Content & licensing:
- Add MIT LICENSE file
- Add IPR/FRAND notes (BCP 79, RFC 8179) to Posts 03 and 07
- Qualify "4:1 safety ratio" with monthly variation in 6 remaining files
- Add "Data as of March 2026" freeze-date headers to all 10 blog posts
- Hedge causal language in Post 04

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 12:47:47 +01:00
parent f1a0b0264c
commit e7527ad68e
40 changed files with 1005 additions and 169 deletions

@@ -4,6 +4,53 @@
---
### 2026-03-08 CODER — Critical fixes: rating clamp, convergence command, blog number correction
**What**: Three fixes addressing data integrity and reproducibility:
1. **Rating clamp range** — Fixed `_clamp_rating()` in `analyzer.py` from `hi=10` to `hi=5`. The rating scale is 1-5 but the clamp allowed values up to 10. Verified no existing ratings in the DB exceed 5 (0 rows affected).
2. **`ietf ideas convergence` subcommand** — Added a `convergence` subcommand under the `ideas` CLI group in `cli.py`. It uses `difflib.SequenceMatcher` with a 0.75 similarity threshold to find cross-org convergent ideas. It mirrors the existing `ietf idea-overlap` command but lives under the `ideas` group for discoverability.
3. **Blog post number correction** — The blog series referenced "628 cross-org convergent ideas" throughout. Running `ietf idea-overlap` on the current 419-idea database produces **130 cross-org convergent ideas** (36% of 361 unique clusters). The 628 came from an earlier pipeline run with 1,780 raw ideas. Updated all 8 affected files (posts 00, 05, 06, 07, 08, state-of-ecosystem, and data packages) to use 130 as the headline number, with historical context noting the earlier run's 628 where appropriate.
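
As a minimal sketch of the clamp fix in item 1: the function below is a hypothetical stand-in, not the actual `_clamp_rating()` from `analyzer.py`, whose signature is not shown here. Only the `hi=5` bound is taken from the entry; the integer coercion and the `lo=1` default are assumptions.

```python
def clamp_rating(value, lo=1, hi=5):
    """Clamp a rating to the inclusive range [lo, hi].

    Hypothetical illustration: with hi=10 (the old bound), a value such
    as 7 would pass through unchanged even though the scale is 1-5.
    """
    return max(lo, min(hi, int(value)))
```

With the corrected bound, out-of-range input is pulled back onto the scale rather than stored as-is.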
**Why**: Rating clamp was silently allowing out-of-range values. The 628 number was irreproducible from the current database, making it a credibility risk for the blog series.
**Result**: Clamp fixed (0 DB rows needed correction). Convergence command works (`ietf ideas convergence`). All blog posts now reference the reproducible number (130) with consistent methodology notes.
**Surprise**: The convergence *rate* is roughly consistent across both extraction runs (36% in the current run vs 43% in the earlier one), suggesting the pattern is robust even if absolute counts depend on extraction parameters. The rate is therefore a stronger claim than the raw count.
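
The pairwise matching behind the convergence count could look roughly like the sketch below. It assumes ideas are available as `(org, title)` tuples and that titles are compared case-insensitively. Both are assumptions for illustration, and `find_convergent_pairs` is a hypothetical name rather than the function in `cli.py`. Only `difflib.SequenceMatcher` and the 0.75 threshold come from the entry above.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_convergent_pairs(ideas, threshold=0.75):
    """Return (title_a, title_b, ratio) for similar ideas from different orgs.

    `ideas` is a list of (org, title) tuples (assumed shape).
    """
    pairs = []
    for (org_a, title_a), (org_b, title_b) in combinations(ideas, 2):
        if org_a == org_b:
            continue  # only cross-org matches count as convergence
        ratio = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((title_a, title_b, ratio))
    return pairs


sample = [
    ("OrgA", "agent identity attestation"),
    ("OrgA", "agent identity attestation"),   # same org: ignored
    ("OrgB", "Agent Identity Attestation"),
    ("OrgC", "quantum key rotation"),
]
convergent = find_convergent_pairs(sample)
```

Because the comparison is pairwise, cost grows quadratically with the number of ideas, which is manageable at the scale of a few hundred ideas.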
---
### 2026-03-08 CODER — Security Hardening (FTS5 injection, rate limiting, route safety)
**What**: Three security fixes applied to the web UI and search layer:
1. **FTS5 query sanitization** — Added `sanitize_fts_query()` static method to `HybridSearch` in `search.py` that strips special FTS5 characters (`"`, `*`, `(`, `)`) and boolean operators (`NEAR`, `OR`, `AND`, `NOT`) before passing queries to SQLite FTS5 MATCH. Also applied the same sanitization in `data.py`'s `global_search()`. The old fallback that wrapped unsanitized words in double quotes (itself an injection vector) was removed.
2. **Rate limiting on Claude endpoints** — Added an in-memory sliding-window rate limiter (10 req/min/IP) as a decorator on `/api/ask/synthesize` and `/api/compare`, the two endpoints that call Claude and cost tokens. It uses a simple `dict[ip, list[timestamp]]` approach with no new dependencies.
3. **Route converter fix** — Changed all `<path:name>` to `<string:name>` on draft detail routes (3 occurrences). Draft names never contain slashes, so `path` was unnecessarily permissive.
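
The sanitization in item 1 can be sketched as follows. This is a hypothetical standalone version, not the `sanitize_fts_query()` method on `HybridSearch`. It strips only the four characters and four operators listed above. Dropping only the uppercase operator tokens is an assumption, made because FTS5 treats `AND`, `OR`, and `NOT` as operators only in upper case.

```python
import re

# Characters the entry lists as stripped: double quote, star, parentheses.
_FTS5_SPECIAL = re.compile(r'["*()]')
# Operator keywords the entry lists as stripped.
_FTS5_OPERATORS = {"NEAR", "OR", "AND", "NOT"}


def sanitize_fts_query(raw: str) -> str:
    """Return a plain whitespace-separated term list safe to pass to MATCH."""
    cleaned = _FTS5_SPECIAL.sub(" ", raw)
    # Assumption: only exact uppercase tokens are dropped, so ordinary
    # lowercase words such as "and" survive as search terms.
    terms = [w for w in cleaned.split() if w not in _FTS5_OPERATORS]
    return " ".join(terms)
```

The sanitized string should still be passed to SQLite as a bound parameter (`... MATCH ?`), never interpolated into the SQL text.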
**Why**: Hardening against query injection, API abuse, and path traversal before public deployment.
**Result**: All three fixes applied. No new dependencies introduced.
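
A sliding-window limiter of the kind described in item 2 might look like the sketch below. The class name and the `allow()` method are hypothetical, and the decorator wiring is omitted because it depends on the web framework. Only the `dict[ip, list[timestamp]]` structure and the 10 requests per minute limit come from the entry.

```python
import time
from collections import defaultdict


class SlidingWindowLimiter:
    """In-memory per-IP limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests=10, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(list)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        """Return True and record the request if the IP is under its limit."""
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        # Discard timestamps that have slid out of the window.
        recent = [t for t in self._hits[ip] if t > cutoff]
        allowed = len(recent) < self.max_requests
        if allowed:
            recent.append(now)
        self._hits[ip] = recent
        return allowed


limiter = SlidingWindowLimiter()
results = [limiter.allow("203.0.113.7", now=float(i)) for i in range(11)]
```

State kept this way is per-process and is lost on restart, so the sketch only fits a single-worker deployment. It is also not thread-safe without a lock.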
---
### 2026-03-08 CODER — Codebase Quality: Context Manager, False Positive Filtering, Blog Fix
**What**: Four quality improvements:
1. Added `__enter__`/`__exit__` to the `Database` class (`db.py`) so it can be used as a context manager, matching the pattern already in `Embedder`.
2. Added `include_false_positives` parameter to `drafts_with_ratings()` (default `False`) and `count_drafts()` (default `True` for backward compatibility). The web UI's `get_overview_stats()` now excludes the 73 false-positive drafts from displayed counts.
3. Checked all `ratings.categories` JSON values -- all already normalized to short-form names. No migration needed.
4. Fixed arithmetic in Post 03 key takeaways: "roughly 300 distinct proposals" corrected to "roughly 409" (434 - 25 = 409, consistent with the body text on line 85).
**Why**: False positives were being counted in dashboard stats and included in all rating-based analyses (similarity, timeline, categories). The context manager prevents resource leaks. The blog arithmetic contradicted itself within the same post.
**Result**: `Database` now usable with `with` statement. All web UI visualizations automatically exclude false positives via `drafts_with_ratings()` default. Blog post internally consistent.
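
For reference, the context-manager protocol added in item 1 follows the usual shape shown below. This `Database` class is a hypothetical stand-in that wraps a `sqlite3` connection. The real class in `db.py` has more state and methods, and its constructor arguments are not shown here.

```python
import sqlite3


class Database:
    """Hypothetical minimal wrapper illustrating __enter__/__exit__."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def close(self):
        self.conn.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always release the connection; returning False re-raises any
        # exception that occurred inside the with block.
        self.close()
        return False


with Database() as db:
    row = db.conn.execute("SELECT 1").fetchone()
```

The connection is closed when the block exits, including when the body raises.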
---
### 2026-03-08 EDITOR — Licensing, IPR, Data Provenance, and Language Hedging Pass
**What**: Five-part editorial and project hygiene pass across the blog series and repository:
1. Created MIT LICENSE file (copyright 2026 Christian Nennemann) -- project claimed "open source" in Post 7 but had no license.
2. Added IETF IPR/BCP 79 notes to Post 07 (Limitations section) and Post 03 (after fragmentation costs section), reminding implementers to check the IETF IPR disclosure database before building on discussed drafts.
3. Qualified all remaining unqualified "4:1" safety ratio references across posts 00, 04, 06, 07, 08, and state-of-ecosystem.md with "averaging ~4:1 but varying from 1.5:1 to 21:1 month-to-month."
4. Added "Analysis based on IETF Datatracker data collected through March 2026" freeze-date note to all 10 blog posts (00 through 08 plus state-of-ecosystem).
5. Hedged causal language in Post 04: "correlates" -> "appears to correlate", "structural, not attitudinal" -> "appears structural."
**Why**: Pre-publication compliance and accuracy. The 4:1 ratio without qualification overstated precision; the missing LICENSE contradicted the open-source claim; IPR notes are standard practice when discussing IETF drafts; data freeze dates prevent readers from assuming currency.
**Result**: All 10 blog posts updated, LICENSE file created. 20+ individual edits across 10 files.
### 2026-03-08 WRITER/EDITOR — Factual Accuracy Pass Across All Blog Posts
**What**: Comprehensive factual accuracy fix across all 10 blog series files (posts 00-08 plus state-of-ecosystem), driven by three review documents (review-statistics.md, review-legal.md, review-science.md). Key changes: