ietf-draft-analyzer/data/reports/reviews/verified-counts.md
Christian Nennemann 439424bd04 Fix security, data integrity, and accuracy issues from 4-perspective review
Security fixes:
- Fix SQL injection in db.py:update_generation_run (column name whitelist)
- Flask SECRET_KEY from env var instead of hardcoded
- Add LLM rating bounds validation (_clamp_rating, 1-10)
- Fix JSON extraction trailing whitespace handling

Data integrity:
- Normalize 21 legacy category names to 11 canonical short forms
- Add false_positive column, flag 73 non-AI drafts (361 relevant remain)
- Document verified counts: 434 total/361 relevant drafts, 557 authors, 419 ideas, 11 gaps

Code quality:
- Fix version string 0.1.0 → 0.2.0
- Add close()/context manager to Embedder class
- Dynamic matrix size instead of hardcoded "260x260"

Blog accuracy:
- Fix EU AI Act timeline (enforcement Aug 2026, not "18 months")
- Distinguish OAuth consent from GDPR Einwilligung
- Add EU AI Act Annex III context to hospital scenario
- Add FIPA, eIDAS 2.0 references where relevant

Methodology:
- Add methodology.md documenting pipeline, limitations, rating rubric
- Add LLM-as-judge caveats to analyzer.py
- Document clustering threshold rationale

Reviews from: legal (German/EU law), statistics, development, science perspectives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 10:52:33 +01:00


Verified Database Counts

Source: data/drafts.db -- queried 2026-03-08

Purpose: Single source of truth for all counts, replacing inconsistent numbers across blog posts and reports.


Core Tables

| Table | Count | Notes |
| --- | --- | --- |
| drafts | 434 | Up from 361 after 2026-03-07 fetch |
| ratings | 434 | 1:1 with drafts |
| authors | 557 | Unique persons from Datatracker |
| ideas | 419 | See "Ideas Count History" below |
| gaps | 11 | Not 12 -- see gap list below |
| embeddings | 434 | 1:1 with drafts |
| draft_authors | 1,057 | Draft-author links |
| llm_cache | 1,397 | Cached API calls |
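The counts above can be re-checked directly against the database. The following is a minimal sketch, assuming the table names listed above exist in data/drafts.db as written; `count_rows` and `CORE_TABLES` are hypothetical names used for illustration and are not taken from the project's db.py.

```python
import sqlite3

# Table names copied from the "Core Tables" list above.
CORE_TABLES = [
    "drafts", "ratings", "authors", "ideas",
    "gaps", "embeddings", "draft_authors", "llm_cache",
]


def count_rows(conn, tables=CORE_TABLES):
    """Return {table_name: row_count} for every listed table that exists."""
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    counts = {}
    for table in tables:
        if table in existing:
            # The identifier comes from the fixed list above, never from user input.
            query = f'SELECT COUNT(*) FROM "{table}"'
            counts[table] = conn.execute(query).fetchone()[0]
    return counts
```

To use it, open the database with `sqlite3.connect("data/drafts.db")` and pass the connection to `count_rows`. Tables missing from the file are skipped rather than reported as zero, so an absent table is not mistaken for an empty one.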

False Positive Analysis

73 drafts are flagged with false_positive = 1 in the ratings table (column added 2026-03-08).

| Criteria | Count |
| --- | --- |
| Relevance <= 2 (auto-flagged) | 38 |
| Relevance 3+ but clearly not AI-agent (manually reviewed) | 35 |
| Total false positives | 73 |
| Drafts excluding false positives | 361 |

Relevance Score Distribution (all 434 drafts)

| Relevance | Count |
| --- | --- |
| 1 | 2 |
| 2 | 36 |
| 3 | 102 |
| 4 | 196 |
| 5 | 98 |
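The relevance distribution should reconcile with the false positive totals reported earlier. The short check below uses only numbers from this document; the variable names are illustrative.

```python
# Relevance score -> number of drafts, copied from the distribution above.
relevance_counts = {1: 2, 2: 36, 3: 102, 4: 196, 5: 98}

total_drafts = sum(relevance_counts.values())

# Drafts with relevance <= 2 were auto-flagged as false positives.
auto_flagged = sum(n for score, n in relevance_counts.items() if score <= 2)

# Manually reviewed false positives at relevance 3+, from the false positive table.
manually_flagged = 35

false_positives = auto_flagged + manually_flagged
relevant_drafts = total_drafts - false_positives

print(total_drafts, auto_flagged, false_positives, relevant_drafts)
# prints: 434 38 73 361
```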

Category Counts (excluding false positives)

All categories normalized to short-form names (21 legacy long-form entries migrated 2026-03-08).

| Category | Count |
| --- | --- |
| Data formats/interop | 146 |
| A2A protocols | 146 |
| Agent identity/auth | 127 |
| Autonomous netops | 103 |
| Policy/governance | 97 |
| Agent discovery/reg | 82 |
| ML traffic mgmt | 77 |
| AI safety/alignment | 44 |
| Model serving/inference | 42 |
| Human-agent interaction | 33 |
| Other AI/agent | 18 |

Note: Each draft can carry several categories, so these counts sum to more than 361. The table totals 915 category assignments, which is about 2.5 categories per draft.
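As a quick check of the categories-per-draft average, the counts in the table can be summed and divided by the 361 relevant drafts. This sketch uses only figures from this document.

```python
# Category -> draft count, copied from the table above (false positives excluded).
category_counts = {
    "Data formats/interop": 146,
    "A2A protocols": 146,
    "Agent identity/auth": 127,
    "Autonomous netops": 103,
    "Policy/governance": 97,
    "Agent discovery/reg": 82,
    "ML traffic mgmt": 77,
    "AI safety/alignment": 44,
    "Model serving/inference": 42,
    "Human-agent interaction": 33,
    "Other AI/agent": 18,
}

relevant_drafts = 361
assignments = sum(category_counts.values())
average = assignments / relevant_drafts

print(assignments, round(average, 2))
# prints: 915 2.53
```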

Gap List (11 gaps, not 12)

| ID | Topic | Severity | Category |
| --- | --- | --- | --- |
| 37 | Multi-Agent Consensus Protocols | high | A2A protocols |
| 38 | Agent Behavioral Verification | critical | AI safety/alignment |
| 39 | Cross-Protocol Agent Migration | medium | Agent discovery/reg |
| 40 | Real-Time Agent Rollback Mechanisms | high | Autonomous netops |
| 41 | Agent Resource Accounting and Billing | medium | new |
| 42 | Federated Agent Learning Privacy | high | Policy/governance |
| 43 | Agent Capability Negotiation | medium | A2A protocols |
| 44 | Cross-Domain Agent Audit Trails | high | Agent identity/auth |
| 45 | Agent Failure Cascade Prevention | critical | AI safety/alignment |
| 46 | Human Override Standardization | high | Human-agent interaction |
| 47 | Agent Performance Benchmarking | medium | new |

Blog posts reference 12 gaps under different names (e.g., "Agent Resource Exhaustion Protection" vs the DB's "Agent Resource Accounting and Billing"). The blog list appears to be an editorial rewrite rather than raw pipeline output. The missing 12th gap may be "Cross-Protocol Translation" or "Agent Data Provenance", both of which appear in blog posts but not in the database.

Ideas Count History

The database currently contains 419 ideas across 377 drafts. This is the third different count encountered:

| Source | Count | Date | Likely Explanation |
| --- | --- | --- | --- |
| Blog post 5 filename | 1,262 | ~2026-03-03 | Pre-expansion dataset (260 drafts), before dedup |
| Blog post 5 text / master stats | 1,780 | ~2026-03-05 | Post-expansion (361 drafts), before dedup |
| Current database | 419 | 2026-03-08 | After dedup_ideas run (0.85 threshold) or re-extraction with different params |
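To illustrate how a similarity threshold can shrink an idea count, here is a minimal sketch of greedy threshold-based deduplication. It assumes that the 0.85 threshold refers to cosine similarity between idea embeddings, which this document does not confirm. The functions `cosine` and `dedup` are hypothetical and are not the project's `dedup_ideas` implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup(items, threshold=0.85):
    """Greedy dedup over (label, vector) pairs.

    An item is kept only if its similarity to every previously kept item is
    below the threshold, so the first member of each near-duplicate group wins.
    """
    kept = []
    for label, vec in items:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((label, vec))
    return kept
```

With a greedy pass like this, the result depends on input order, and a lower threshold merges more aggressively. Both are worth keeping in mind when comparing idea counts taken at different times.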

Ideas by Type (current DB)

| Type | Count |
| --- | --- |
| protocol | 96 |
| architecture | 95 |
| extension | 79 |
| mechanism | 68 |
| requirement | 42 |
| pattern | 35 |
| framework | 3 |
| format | 1 |

Ideas per Draft Distribution

| Ideas/Draft | Drafts |
| --- | --- |
| 1 | 337 |
| 2 | 38 |
| 3 | 2 |
| 0 (no ideas) | 57 |

The near-uniform distribution of one idea per draft (337 of the 377 drafts with ideas, or 89%) suggests either aggressive dedup or a re-extraction with constrained output. The original pipeline extracted 1-4 ideas per draft, so the 1,780 figure likely reflects pre-dedup counts.
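The distribution above can be reconciled with the headline figures (419 ideas, 377 drafts with ideas, 434 drafts in total) using a few lines of arithmetic. The numbers are copied from this document; the variable names are illustrative.

```python
# Ideas per draft -> number of drafts, copied from the distribution above.
ideas_per_draft = {0: 57, 1: 337, 2: 38, 3: 2}

total_ideas = sum(k * n for k, n in ideas_per_draft.items())
drafts_with_ideas = sum(n for k, n in ideas_per_draft.items() if k > 0)
all_drafts = sum(ideas_per_draft.values())

# Share of idea-bearing drafts that have exactly one idea.
single_idea_share = ideas_per_draft[1] / drafts_with_ideas

print(total_ideas, drafts_with_ideas, all_drafts, f"{single_idea_share:.0%}")
# prints: 419 377 434 89%
```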

Excluding false positives: 365 ideas across 326 drafts.

Actions Taken (2026-03-08)

  1. Category normalization: Updated 21 ratings rows from legacy long-form category names to canonical short forms. All 11 categories now consistent.
  2. False positive flagging: Added false_positive column to ratings table. Flagged 73 drafts (38 with relevance <= 2, 35 manually reviewed at relevance 3+).
  3. Schema migration: Updated db.py schema and migration code to include false_positive column.
  4. This document: Created as single source of truth for counts.
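Steps 2 and 3 can be sketched as an idempotent migration followed by an automatic flagging pass. This is an illustration under stated assumptions, not the actual code in db.py: the column name `relevance` and the function names `add_false_positive_column` and `auto_flag_low_relevance` are hypothetical, while the `ratings` table and the `false_positive` column come from this document.

```python
import sqlite3


def add_false_positive_column(conn):
    """Add ratings.false_positive if it is missing (safe to run repeatedly)."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(ratings)")}
    if "false_positive" not in columns:
        conn.execute(
            "ALTER TABLE ratings "
            "ADD COLUMN false_positive INTEGER NOT NULL DEFAULT 0"
        )
        conn.commit()


def auto_flag_low_relevance(conn, max_relevance=2):
    """Flag rows at or below max_relevance; return how many rows were newly flagged.

    Assumes the relevance score lives in a column named `relevance`.
    """
    cursor = conn.execute(
        "UPDATE ratings SET false_positive = 1 "
        "WHERE relevance <= ? AND false_positive = 0",
        (max_relevance,),
    )
    conn.commit()
    return cursor.rowcount
```

The manual review of drafts at relevance 3+ is not covered by this sketch; those 35 rows would still need to be flagged individually after human inspection.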