Fix remaining critical, high, and medium issues from 4-perspective review

Critical fixes:
- Fix rating clamp range 1-10 → 1-5 (actual scale)
- Add `ietf ideas convergence` command (SequenceMatcher at 0.75 threshold)
- Fix "628 cross-org ideas" → 130 (verified from current DB) across 8 files

Security fixes:
- Sanitize FTS5 query input (strip special chars + boolean operators)
- Add rate limiting (10 req/min/IP) on Claude-calling endpoints
- Change <path:name> → <string:name> on draft routes

Codebase fixes:
- Add Database context manager (__enter__/__exit__)
- Wire false_positive filtering into queries (exclude by default in web UI)
- Fix Post 3 arithmetic ("~300" → "~409" distinct proposals)

Content & licensing:
- Add MIT LICENSE file
- Add IPR/FRAND notes (BCP 79, RFC 8179) to Posts 03 and 07
- Qualify "4:1 safety ratio" with monthly variation in 6 remaining files
- Add "Data as of March 2026" freeze-date headers to all 10 blog posts
- Hedge causal language in Post 04

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -4,6 +4,53 @@

---
### 2026-03-08 CODER — Critical fixes: rating clamp, convergence command, blog number correction
**What**: Three fixes addressing data integrity and reproducibility:
1. **Rating clamp range** — Fixed `_clamp_rating()` in `analyzer.py` from `hi=10` to `hi=5`. The rating scale is 1-5 but the clamp allowed values up to 10. Verified no existing ratings in the DB exceed 5 (0 rows affected).
2. **`ietf ideas convergence` subcommand** — Added a `convergence` subcommand under the `ideas` CLI group in `cli.py`. Uses `difflib.SequenceMatcher` at 0.75 threshold to find cross-org convergent ideas. Mirrors the existing `ietf idea-overlap` command but lives under the `ideas` group for discoverability.
3. **Blog post number correction** — The blog series referenced "628 cross-org convergent ideas" throughout. Running `ietf idea-overlap` on the current 419-idea database produces **130 cross-org convergent ideas** (36% of 361 unique clusters). The 628 came from an earlier pipeline run with 1,780 raw ideas. Updated all 8 affected files (posts 00, 05, 06, 07, 08, state-of-ecosystem, and data packages) to use 130 as the headline number, with historical context noting the earlier run's 628 where appropriate.
**Why**: Rating clamp was silently allowing out-of-range values. The 628 number was irreproducible from the current database, making it a credibility risk for the blog series.
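A minimal sketch of the corrected clamp, assuming the helper takes an integer rating plus `lo`/`hi` bounds. The name `clamp_rating` is a hypothetical stand-in; the real `_clamp_rating()` signature in `analyzer.py` may differ.

```python
def clamp_rating(value, lo=1, hi=5):
    """Clamp a rating onto the 1-5 scale.

    Hypothetical stand-in for _clamp_rating(); the default hi=5 reflects
    the fix described above (the old default allowed values up to 10).
    """
    return max(lo, min(hi, value))


print(clamp_rating(7))   # above the scale, pulled down to hi
print(clamp_rating(0))   # below the scale, pulled up to lo
print(clamp_rating(3))   # already in range, returned unchanged
```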
**Result**: Clamp fixed (0 DB rows needed correction). Convergence command works (`ietf ideas convergence`). All blog posts now reference the reproducible number (130) with consistent methodology notes.
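For reference, the kind of comparison the convergence command performs can be sketched as below. This assumes each idea reduces to an `(org, title)` pair and that titles are compared directly; the function name `convergent_pairs` and the data shape are hypothetical, and the actual command may compare other fields or normalize text differently.

```python
from difflib import SequenceMatcher
from itertools import combinations

THRESHOLD = 0.75  # similarity cutoff named in the entry above


def convergent_pairs(ideas, threshold=THRESHOLD):
    """Return (title_a, title_b, ratio) for similar ideas from different orgs.

    `ideas` is a list of (org, title) tuples; this shape is an assumption.
    """
    pairs = []
    for (org_a, title_a), (org_b, title_b) in combinations(ideas, 2):
        if org_a == org_b:
            continue  # only cross-org matches count as convergence
        ratio = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((title_a, title_b, round(ratio, 2)))
    return pairs


ideas = [
    ("OrgA", "agent identity token"),
    ("OrgA", "agent identity tokens"),
    ("OrgB", "agent identity token"),
    ("OrgC", "quantum key exchange"),
]
for pair in convergent_pairs(ideas):
    print(pair)
```

The pairwise loop is quadratic in the number of ideas, which is unlikely to matter at a few hundred ideas but would at much larger scales.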
**Surprise**: The convergence *rate* is roughly consistent across the two extraction runs (36% in the current run vs 43% in the earlier one), suggesting the pattern is robust even if absolute counts depend on extraction parameters. The rate is arguably a stronger claim than the raw count.

---
### 2026-03-08 CODER — Security Hardening (FTS5 injection, rate limiting, route safety)
**What**: Three security fixes applied to the web UI and search layer:
1. **FTS5 query sanitization** — Added `sanitize_fts_query()` static method to `HybridSearch` in `search.py` that strips special FTS5 characters (`"`, `*`, `(`, `)`) and boolean operators (`NEAR`, `OR`, `AND`, `NOT`) before passing queries to SQLite FTS5 MATCH. Also applied the same sanitization in `data.py`'s `global_search()`. The old fallback that wrapped unsanitized words in double quotes (itself an injection vector) was removed.
2. **Rate limiting on Claude endpoints** — Added in-memory sliding-window rate limiter (10 req/min/IP) as a decorator on `/api/ask/synthesize` and `/api/compare` — the two endpoints that call Claude and cost tokens. Uses a simple `dict[ip, list[timestamp]]` approach with no new dependencies.
3. **Route converter fix** — Changed all `<path:name>` to `<string:name>` on draft detail routes (3 occurrences). Draft names never contain slashes, so `path` was unnecessarily permissive.
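The sanitization in item 1 might look roughly like the sketch below. It is written as a plain function rather than a static method on `HybridSearch`, the regex and constant names are hypothetical, and the operators are matched case-sensitively on the assumption that only the uppercase forms act as FTS5 operators.

```python
import re

# Characters listed in the entry above: double quote, star, parentheses.
_FTS5_SPECIALS = re.compile(r'["*()]')
# Keywords FTS5 would interpret as operators (uppercase forms only; assumption).
_FTS5_OPERATORS = {"AND", "OR", "NOT", "NEAR"}


def sanitize_fts_query(raw):
    """Reduce user input to plain search terms before an FTS5 MATCH."""
    cleaned = _FTS5_SPECIALS.sub(" ", raw)
    terms = [t for t in cleaned.split() if t not in _FTS5_OPERATORS]
    return " ".join(terms)


print(sanitize_fts_query('agent* AND ("identity" OR token)'))
```

Other characters, such as `:` and `^`, also carry meaning in FTS5 query syntax, so a stricter allow-list may be worth considering alongside a strip-list like this one.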
**Why**: Hardening against query injection, API abuse, and path traversal before public deployment.
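To illustrate the rate-limiting approach, here is a standard-library-only sketch of a sliding-window limiter keyed by IP. The names `allow_request`, `rate_limited`, `get_ip`, and `on_reject` are hypothetical; how the real decorator reads the client IP and builds the rejection response is not shown in this entry.

```python
import time
from collections import defaultdict
from functools import wraps

MAX_REQUESTS = 10       # per window, per IP
WINDOW_SECONDS = 60.0

_hits = defaultdict(list)  # ip -> timestamps of recent accepted requests


def allow_request(ip, now=None):
    """Return True if `ip` is under the limit, recording the hit if so."""
    now = time.monotonic() if now is None else now
    # Keep only timestamps that still fall inside the sliding window.
    recent = [t for t in _hits[ip] if now - t < WINDOW_SECONDS]
    allowed = len(recent) < MAX_REQUESTS
    if allowed:
        recent.append(now)
    _hits[ip] = recent
    return allowed


def rate_limited(get_ip, on_reject):
    """Decorator factory; `get_ip` and `on_reject` are hypothetical hooks."""
    def decorator(view):
        @wraps(view)
        def wrapper(*args, **kwargs):
            if not allow_request(get_ip()):
                return on_reject()
            return view(*args, **kwargs)
        return wrapper
    return decorator
```

State held this way is per-process and is lost on restart, which is usually acceptable for a single-worker deployment but not for several workers behind a load balancer.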
**Result**: All three fixes applied. No new dependencies introduced.

---
### 2026-03-08 CODER — Codebase Quality: Context Manager, False Positive Filtering, Blog Fix
**What**: Four quality improvements:
1. Added `__enter__`/`__exit__` to the `Database` class (`db.py`) so it can be used as a context manager, matching the pattern already in `Embedder`.

2. Added an `include_false_positives` parameter to `drafts_with_ratings()` (default `False`) and `count_drafts()` (default `True` for backward compatibility). The web UI's `get_overview_stats()` now excludes the 73 false-positive drafts from displayed counts.

3. Checked all `ratings.categories` JSON values: all are already normalized to short-form names. No migration needed.
4. Fixed arithmetic in Post 03 key takeaways: "roughly 300 distinct proposals" corrected to "roughly 409" (434 - 25 = 409, consistent with the body text on line 85).
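The default-exclusion behaviour from item 2 can be sketched against an in-memory SQLite database. The schema here (`drafts.false_positive`, `ratings.draft_name`, `ratings.score`) and the function signature taking a connection are assumptions for illustration only; the project's real tables and method signature may differ.

```python
import sqlite3


def drafts_with_ratings(conn, include_false_positives=False):
    """Return (name, score) rows, skipping flagged drafts unless asked."""
    sql = (
        "SELECT d.name, r.score FROM drafts d "
        "JOIN ratings r ON r.draft_name = d.name"
    )
    if not include_false_positives:
        sql += " WHERE d.false_positive = 0"
    return conn.execute(sql + " ORDER BY d.name").fetchall()


# Hypothetical two-row fixture: one real draft, one false positive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drafts (name TEXT PRIMARY KEY, false_positive INTEGER DEFAULT 0);
    CREATE TABLE ratings (draft_name TEXT, score INTEGER);
    INSERT INTO drafts VALUES ('draft-a', 0), ('draft-b', 1);
    INSERT INTO ratings VALUES ('draft-a', 4), ('draft-b', 2);
""")

print(drafts_with_ratings(conn))
print(drafts_with_ratings(conn, include_false_positives=True))
```

Making exclusion the default means callers have to opt in to see flagged drafts, which is why the visualizations pick up the change without further edits.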
**Why**: False positives were being counted in dashboard stats and included in all rating-based analyses (similarity, timeline, categories). The context manager prevents resource leaks. The blog arithmetic contradicted itself within the same post.
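A minimal sketch of the context-manager pattern, assuming `Database` wraps a `sqlite3` connection and exposes a `close()` method. The class body below is a hypothetical stand-in, not the code in `db.py`.

```python
import sqlite3


class Database:
    """Hypothetical stand-in for the class in db.py."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def close(self):
        self.conn.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions raised inside the block


with Database() as db:
    row = db.conn.execute("SELECT 1").fetchone()
# The connection is closed here, even if the block had raised.
```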
**Result**: `Database` is now usable in a `with` statement. All web UI visualizations automatically exclude false positives via the `drafts_with_ratings()` default. The blog post is now internally consistent.

---
### 2026-03-08 EDITOR — Licensing, IPR, Data Provenance, and Language Hedging Pass
**What**: Five-part editorial and project hygiene pass across the blog series and repository:
1. Created MIT LICENSE file (copyright 2026 Christian Nennemann) -- project claimed "open source" in Post 7 but had no license.
2. Added IETF IPR/BCP 79 notes to Post 07 (Limitations section) and Post 03 (after fragmentation costs section), reminding implementers to check the IETF IPR disclosure database before building on discussed drafts.
3. Qualified all remaining unqualified "4:1" safety ratio references across posts 00, 04, 06, 07, 08, and state-of-ecosystem.md with "averaging ~4:1 but varying from 1.5:1 to 21:1 month-to-month."
4. Added "Analysis based on IETF Datatracker data collected through March 2026" freeze-date note to all 10 blog posts (00 through 08 plus state-of-ecosystem).
5. Hedged causal language in Post 04: "correlates" -> "appears to correlate", "structural, not attitudinal" -> "appears structural."
**Why**: Pre-publication compliance and accuracy. The 4:1 ratio without qualification overstated precision; the missing LICENSE contradicted the open-source claim; IPR notes are standard practice when discussing IETF drafts; data freeze dates prevent readers from assuming currency.
**Result**: All 10 blog posts updated, LICENSE file created. 20+ individual edits across 10 files.
### 2026-03-08 WRITER/EDITOR — Factual Accuracy Pass Across All Blog Posts
**What**: Comprehensive factual accuracy fix across all 10 blog series files (posts 00-08 plus state-of-ecosystem), driven by three review documents (review-statistics.md, review-legal.md, review-science.md). Key changes: