v0.3.0: Publication-ready release with blog site, paper update, and polish

Release prep:
- Version bump to 0.3.0 (pyproject.toml, cli.py)
- Rewrite README.md with current stats (475 drafts, 713 authors, 501 ideas)
- Add CONTRIBUTING.md with dev setup and code conventions

Blog site:
- Add scripts/build-site.py (markdown → HTML with clean CSS, dark mode, nav)
- Generate static site in docs/blog/ (10 pages)
- Ready for GitHub Pages deployment

Academic paper (paper/main.tex):
- Update all counts: 474→475 drafts, 557→710 authors, 1907→462 ideas, 11→12 gaps
- Add false-positive filtering methodology (113 excluded, 361 relevant)
- Add cross-org convergence analysis (132 ideas, 33% rate)
- Add GDPR compliance gap to gap table
- Add LLM-as-judge caveats to rating methodology and limitations
- Add FIPA, IEEE P3394, W3C WoT to related work with bibliography entries
- Fix safety ratio to show monthly variation (1.5:1 to 21:1)

Pipeline:
- Fetch 1 new draft (475 total), 3 new authors (713 total)
- Fix 16 ruff lint errors across test files
- All 106 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 17:54:43 +01:00
parent e247bfef8f
commit 1ec1f69bee
34 changed files with 4268 additions and 272 deletions
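
The commit message mentions a cross-org convergence analysis, and the README diff describes it as using SequenceMatcher at a 0.75 threshold. The project's own implementation is not part of this diff, so the following is only a minimal sketch of how such a comparison could work with Python's standard `difflib`. The `(title, org)` tuple shape, the function names, and the sample data are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive SequenceMatcher ratio between two idea titles."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def convergent_pairs(ideas, threshold=0.75):
    """Return (title_a, title_b, score) for similar ideas from different orgs.

    `ideas` is a list of (title, org) tuples. This shape is hypothetical,
    not the project's actual schema.
    """
    pairs = []
    for (title_a, org_a), (title_b, org_b) in combinations(ideas, 2):
        if org_a == org_b:
            continue  # same organization, so not cross-org convergence
        score = similarity(title_a, title_b)
        if score >= threshold:
            pairs.append((title_a, title_b, score))
    return pairs


# Hypothetical sample data, for illustration only.
ideas = [
    ("Agent capability discovery", "Org A"),
    ("agent capability discovery", "Org B"),
    ("Post-quantum key exchange", "Org C"),
]
pairs = convergent_pairs(ideas)
```

The pairwise loop is quadratic in the number of ideas, which is manageable at the scale of a few hundred ideas reported here; a larger corpus would likely need blocking or an embedding-based prefilter.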
--- a/README.md
+++ b/README.md
@@ -1,8 +1,8 @@
 # IETF Draft Analyzer

-Track, categorize, rate, and visualize AI/agent-related IETF Internet-Drafts.
+Track, categorize, rate, and map AI/agent-related IETF Internet-Drafts.

-**260 drafts** analyzed across **19 categories** with **403 authors**, **1,262 extracted ideas**, and **12 identified gaps** — spanning June 2025 to February 2026.
+**475 drafts** analyzed (361 relevant after false-positive filtering) with **713 authors**, **501 extracted ideas**, **132 cross-org convergent ideas**, and **12 identified gaps** — spanning 2024 to March 2026.

 ## What This Does

@@ -12,9 +12,10 @@ The IETF is experiencing an unprecedented wave of standardization activity aroun
 - **Rates** each draft on 5 dimensions (novelty, maturity, overlap, momentum, relevance) using Claude
 - **Embeds** drafts with Ollama for pairwise similarity and clustering
 - **Extracts** discrete technical ideas and identifies landscape gaps
+- **Analyzes** cross-organizational convergence (SequenceMatcher at 0.75 threshold)
 - **Maps** the author collaboration network and organizational affiliations
- **Generates** interactive visualizations, markdown reports, and a filterable browser
- **Produces** publication-ready figures for an arXiv paper
+- **Generates** markdown reports and a full web dashboard
+- **Filters** false positives automatically (relevance-based + manual flagging)

 ## Quick Start

@@ -28,23 +29,29 @@ export ANTHROPIC_API_KEY=sk-ant-...
 # Fetch drafts from IETF Datatracker
 ietf fetch

-# Rate all unrated drafts with Claude
+# Rate all unrated drafts (--cheap uses Haiku for lower cost)
 ietf analyze --all
+ietf analyze --all --cheap    # ~10x cheaper with Haiku

 # Generate embeddings (requires Ollama running locally)
 ietf embed

 # Extract technical ideas
-ietf ideas --all
+ietf ideas --all --cheap --batch 5
+
+# Analyze cross-org convergence
+ietf ideas convergence

 # Identify gaps in the landscape
 ietf gaps

-# Generate all visualizations
-ietf viz all
+# Fetch author data
+ietf authors --fetch

-# Open the interactive browser
-xdg-open data/figures/browser.html
+# Generate reports
+ietf report overview
+ietf report landscape
+ietf report authors

 # Launch the web dashboard
 ./scripts/run-webui.sh
@@ -52,26 +59,51 @@ xdg-open data/figures/browser.html

 ## Web Dashboard

-A full interactive dashboard at `http://127.0.0.1:5000` with 8 pages:
+A full interactive dashboard at `http://127.0.0.1:5000`:

 ```bash
-# Start the dashboard
 ./scripts/run-webui.sh
-# or: python src/webui/app.py
+# or: FLASK_APP=src/webui/app.py flask run
 ```

 | Page | What it shows |
 |------|---------------|
-| **Overview** | Stat cards, score histogram, category donut, submission timeline, category radar |
-| **Draft Explorer** | Searchable/filterable/sortable table of all drafts with category pills and score badges |
-| **Draft Detail** | Individual draft view with score ring, dimension bars, ideas, references, and linked authors |
-| **Ratings** | Score distributions, dimension box plots, category radar, novelty vs maturity scatter, top-20 leaderboard |
-| **Landscape** | t-SNE embedding map, quality quadrants, violin plots by category |
-| **Authors** | Co-authorship force-directed graph, organization charts, cross-org collaboration |
+| **Overview** | Stat cards, score histogram, category radar, submission timeline |
+| **Draft Explorer** | Searchable/filterable/sortable table with category pills and score badges |
+| **Draft Detail** | Score ring, dimension bars, ideas, references, linked authors |
+| **Ratings** | Score distributions, box plots, category radar, novelty vs maturity scatter |
+| **Landscape** | t-SNE embedding map, quality quadrants |
+| **Authors** | Co-authorship force-directed graph, organization charts |
 | **Ideas** | Extracted ideas grouped by type with search |
-| **Gaps** | Gap cards sorted by severity with links to related drafts |
+| **Gaps** | Gap cards sorted by severity with related drafts |
+| **Citations** | RFC cross-reference graph |
+| **Similarity** | Draft similarity network |
+| **Timeline** | Submission trends over time |
+| **Monitor** | Pipeline health, API costs, processing status |

-Charts are interactive (Plotly.js) — click data points to navigate to draft details, click categories to filter.
+Charts are interactive (Plotly.js). Analytics are GDPR-compliant (no cookies, daily-salted IP hashing).
+
+## Blog Series
+
+An 8-post analysis series in `data/reports/blog-series/`:
+
+1. **The Gold Rush** — Growth from 9 drafts to 9.3% of all IETF submissions
+2. **Who Writes the Rules** — Huawei's 16%, geopolitical dynamics, team blocs
+3. **The OAuth Wars** — 14 competing OAuth proposals, fragmentation costs
+4. **What Nobody Builds** — The safety deficit (4:1 ratio), 12 identified gaps
+5. **Where Drafts Converge** — 132 cross-org convergent ideas, implicit consensus
+6. **The Big Picture** — Architectural vision, EU AI Act implications
+7. **How We Built This** — Methodology, cost ($9-15), limitations
+8. **Agents Building the Agent Analysis** — Meta post on using Claude agent teams
+
+## Key Findings
+
+- **Safety deficit**: ~4:1 ratio of capability-building to safety proposals (varies 1.5:1 to 21:1 monthly)
+- **Extreme fragmentation**: 155 competing A2A protocols, 42 overlap clusters
+- **Organizational concentration**: Huawei ~16% of all drafts, Chinese orgs ~40%
+- **Cross-org convergence**: 132 ideas (33%) independently proposed by multiple organizations
+- **12 gaps identified**: 2 critical (behavior verification, human override), 5 high, 5 medium
+- **Top-rated drafts**: Safety-focused proposals score highest (VOLT 4.75, DAAP 4.75)

 ## CLI Commands

@@ -80,69 +112,42 @@ Charts are interactive (Plotly.js) — click data points to navigate to draft de
 | Command | Description |
 |---------|-------------|
 | `ietf fetch` | Fetch AI/agent drafts from IETF Datatracker |
-| `ietf analyze --all` | Rate all unrated drafts using Claude (5 dimensions + summary) |
-| `ietf embed` | Generate semantic embeddings via Ollama |
-| `ietf ideas --all` | Extract technical ideas from drafts using Claude |
-| `ietf gaps` | Identify under-addressed areas in the landscape |
-| `ietf authors --fetch` | Fetch author/affiliation data from Datatracker |
+| `ietf analyze --all [--cheap] [--dry-run]` | Rate drafts using Claude |
+| `ietf embed [--dry-run]` | Generate semantic embeddings via Ollama |
+| `ietf ideas --all [--cheap] [--batch N] [--dry-run]` | Extract technical ideas |
+| `ietf ideas convergence [--threshold 0.75]` | Cross-org convergence analysis |
+| `ietf ideas dedup` | Deduplicate similar ideas |
+| `ietf gaps [--dry-run]` | Identify landscape gaps |
+| `ietf authors --fetch` | Fetch author/affiliation data |

 ### Exploration

 | Command | Description |
 |---------|-------------|
 | `ietf list` | List tracked drafts |
-| `ietf show <name>` | Show detailed info for a specific draft |
-| `ietf search <query>` | Full-text search across all stored drafts |
-| `ietf similar <name>` | Find the most similar drafts by embedding similarity |
+| `ietf show <name>` | Show detailed info for a draft |
+| `ietf search <query>` | Full-text search (FTS5) |
+| `ietf similar <name>` | Find similar drafts by embedding similarity |
 | `ietf clusters` | Find clusters of near-duplicate drafts |
-| `ietf compare <name1> <name2> ...` | Compare drafts for overlap and unique contributions |
-| `ietf authors` | Show top authors and their draft counts |
-| `ietf network` | Show organizational collaboration network |
-
-### Visualizations (`ietf viz`)
-
-All outputs go to `data/figures/`. Interactive charts are standalone HTML files (no server needed).
-
-| Command | Output | Format |
-|---------|--------|--------|
-| `ietf viz all` | Generate everything below | mixed |
-| `ietf viz browser` | Filterable draft browser with search, category chips, score sliders | HTML |
-| `ietf viz landscape` | t-SNE/UMAP 2D scatter of all drafts colored by category | HTML |
-| `ietf viz heatmap` | 260x260 clustered pairwise similarity matrix | PNG |
-| `ietf viz distributions` | Violin plots for all 5 rating dimensions by category | PNG |
-| `ietf viz timeline` | Stacked area chart of monthly submissions by category | HTML |
-| `ietf viz bubble` | Novelty vs Maturity explorer (size=relevance, color=category) | HTML |
-| `ietf viz radar` | Average rating profile per category | HTML |
-| `ietf viz network` | Author co-authorship force-directed graph | HTML |
-| `ietf viz treemap` | Category composition treemap (color=avg score) | HTML |
-| `ietf viz quality` | Score vs uniqueness with quadrant annotations | HTML |
-| `ietf viz orgs` | Organization contribution horizontal bar chart | HTML |
-| `ietf viz ideas` | Ideas frequency by type | HTML |
+| `ietf compare <name1> <name2>` | Compare drafts for overlap |
+| `ietf authors` | Top authors and draft counts |
+| `ietf network` | Organizational collaboration network |

 ### Reports (`ietf report`)

-Markdown reports saved to `data/reports/`.
-
 | Command | Description |
 |---------|-------------|
-| `ietf report overview` | Sortable table of all rated drafts with bar-chart scores |
-| `ietf report landscape` | Category-grouped view with per-category rankings |
-| `ietf report timeline` | Monthly submission volume and category trends |
-| `ietf report overlap-matrix` | Top similar pairs, per-category overlap, cross-category matrix |
-| `ietf report authors` | Top authors, organizations, collaboration pairs |
-| `ietf report digest` | Weekly digest of recently fetched drafts |
-| `ietf report ideas` | Most common ideas, unique ideas, ideas by type |
-
-### Other
-
-| Command | Description |
-|---------|-------------|
-| `ietf draft-gen <topic>` | Generate an Internet-Draft addressing a landscape gap |
-| `ietf config` | Show or modify configuration |
+| `ietf report overview` | Sortable table of all rated drafts |
+| `ietf report landscape` | Category-grouped view with rankings |
+| `ietf report authors` | Top authors, organizations, collaboration |
+| `ietf report ideas` | Ideas by type, most common, unique |
+| `ietf report gaps` | Gap analysis with severity ratings |
+| `ietf report timeline` | Monthly submission trends |
+| `ietf report overlap-matrix` | Similar pairs and cross-category matrix |

 ## Rating System

-Each draft is scored 1-5 on five dimensions:
+Each draft is scored 1-5 on five dimensions by Claude (LLM-as-judge, see [methodology](data/reports/methodology.md) for caveats):

 | Dimension | What it measures |
 |-----------|-----------------|
@@ -152,145 +157,71 @@ Each draft is scored 1-5 on five dimensions:
 | **Momentum** | Community engagement, revisions, adoption |
 | **Relevance** | Importance to the AI/agent ecosystem |

-**Composite score:**
-
-```
-score = 0.30 * novelty + 0.25 * relevance + 0.20 * maturity + 0.15 * momentum + 0.10 * (6 - overlap)
-```
-
-## Key Findings
-
- **36x growth** in 9 months (2 drafts/month to 72)
- **7.9% of draft pairs** exceed 0.80 cosine similarity — significant redundancy
- **Safety deficit**: AI safety proposals (36) are vastly outnumbered by protocol proposals (290+)
- **Organizational concentration**: Top 5 orgs contribute ~35% of all drafts
- **1,262 technical ideas** extracted across 6 types (mechanism, architecture, protocol, pattern, extension, requirement)
- **12 identified gaps** in the current landscape (3 critical, 6 high, 3 medium)
-
-## Gap Analysis
-
-Claude-powered gap analysis identifies 12 under-addressed areas across the 260-draft landscape. Each gap is cross-referenced with the drafts and ideas that partially touch on the topic, highlighting where effort is concentrated and where it's missing.
-
-### Critical Gaps
-
-| # | Gap | Category | Drafts in Category | Key Issue |
-|--:|-----|----------|-------------------:|-----------|
-| 1 | **Agent Resource Management** | Autonomous netops | 60 | No framework for scheduling, quotas, or fair allocation when agents compete for compute, memory, and bandwidth. Drafts focus on communication but ignore resource contention in multi-agent environments. |
-| 2 | **Agent Behavior Verification** | AI safety/alignment | 36 | No runtime mechanisms to verify that deployed agents actually behave according to declared policies. Gap between stated capabilities and observed behavior. Closest work: `draft-birkholz-verifiable-agent-conversations` (attestation), `draft-aylward-daap-v2` (accountability). |
-| 3 | **Agent Error Recovery & Rollback** | Autonomous netops | 60 | Missing standards for cascading failure recovery and rollback of autonomous decisions. Only `draft-yue-anima-agent-recovery-networks` specifically addresses recovery; `draft-srijal-agents-policy` touches mandatory failure behavior. |
-
-### High-Severity Gaps
-
-| # | Gap | Category | Drafts in Category | Key Issue |
-|--:|-----|----------|-------------------:|-----------|
-| 4 | **Cross-Protocol Translation** | A2A protocols | 92 | 92 competing A2A protocol drafts with high overlap but no universal translation layer or negotiation mechanism for interoperability between them. |
-| 5 | **Agent Lifecycle Management** | Agent discovery/reg | 57 | Registration and discovery are covered but no standards for agent versioning, updates, graceful shutdown, or retirement without disrupting dependent services. |
-| 6 | **Multi-Agent Consensus** | A2A protocols | 92 | No framework for groups of agents to reach consensus on conflicting decisions. Closest: `draft-li-dmsc-inf-architecture` (DMSC protocol), `draft-takagi-srta-trinity` (SRTA architecture). |
-| 7 | **Human Override & Intervention** | Human-agent interaction | 22 | Only 22 drafts (vs 60 autonomous netops) address human-agent interaction. No emergency override protocols. Best effort: `draft-irtf-nmrg-llm-nm` (human-in-the-loop framework). |
-| 8 | **Cross-Domain Security Boundaries** | Agent identity/auth | 98 | Missing frameworks for agents operating across security domains with different trust levels. `draft-diaconu-agents-authz-info-sharing` and `draft-cui-dmsc-agent-cdi` are early attempts but lack enforcement mechanisms. |
-| 9 | **Dynamic Trust & Reputation** | Agent identity/auth | 98 | Static certificate-based auth is insufficient for long-running autonomous systems. No dynamic trust scoring or reputation tracking. Closest: `draft-cosmos-protocol-specification` (trust scoring), `draft-jiang-seat-dynamic-attestation`. |
-
-### Medium-Severity Gaps
-
-| # | Gap | Category | Drafts in Category | Key Issue |
-|--:|-----|----------|-------------------:|-----------|
-| 10 | **Agent Performance Monitoring** | Autonomous netops | 60 | No standardized metrics, SLOs, or observability framework for production agent deployments. `draft-fu-nmop-agent-communication-framework` mentions monitoring but doesn't define standards. |
-| 11 | **Agent Explainability** | AI safety/alignment | 36 | No protocols for agents to explain decisions to other agents or humans. Critical for debugging and regulatory compliance. Only 36 safety drafts total. |
-| 12 | **Agent Data Provenance** | Data formats/interop | 102 | No standards for tracking data lineage as information flows between agents. 102 data format drafts but none address provenance tracking. |
-
-### Gap Coverage Ratio
-
-The safety deficit is the most striking finding — only **12.3%** of categorized drafts (36/292) address AI safety/alignment, while 92 focus on A2A protocols and 60 on autonomous operations. The ratio of "how to do things" to "how to do things safely" is roughly **7:1**.
+**Important**: Ratings are generated from abstracts and partial full text without human calibration. They should be treated as relative rankings, not absolute quality measures.

 ## Tech Stack

 - **Python 3.11+** with Click CLI
 - **SQLite** with FTS5 full-text search and WAL mode
- **Anthropic Claude** (Sonnet 4) for analysis, rating, idea extraction, gap analysis
+- **Anthropic Claude** (Sonnet/Haiku) for analysis, rating, idea extraction, gap analysis
 - **Ollama** (nomic-embed-text) for local embeddings and similarity
- **Flask** with Jinja2 for the interactive web dashboard
- **Plotly** for interactive HTML visualizations
- **Matplotlib/Seaborn** for publication-ready static figures
- **NetworkX** for author collaboration graph analysis
- **NumPy/SciPy/scikit-learn** for similarity computation and dimensionality reduction
+- **Flask** with Jinja2 for the web dashboard
+- **Plotly** for interactive visualizations
+- **NumPy/SciPy/scikit-learn** for similarity computation and clustering

 ## Project Structure

 ```
 src/ietf_analyzer/
-    cli.py          # Click CLI entry point (all commands)
+    cli.py          # Click CLI entry point (~30 commands)
    fetcher.py      # IETF Datatracker API client
-    analyzer.py     # Claude-based analysis, rating, idea extraction, gap analysis
-    embeddings.py   # Ollama embeddings + cosine similarity + clustering
-    db.py           # SQLite with FTS5 (7 tables: drafts, ratings, embeddings, llm_cache, authors, draft_authors, ideas, gaps)
+    analyzer.py     # Claude-based analysis (rating, ideas, gaps)
+    embeddings.py   # Ollama embeddings + similarity + clustering
+    db.py           # SQLite with FTS5 (8 tables)
    models.py       # Author, Draft, Rating dataclasses
    reports.py      # Markdown report generation
-    visualize.py    # Interactive HTML + static PNG visualizations
-    authors.py      # AuthorNetwork: Datatracker author fetching, collaboration graph
-    draftgen.py     # Internet-Draft generation from gap analysis
-    config.py       # Configuration with defaults
+    authors.py      # Author network analysis
+    search.py       # Hybrid FTS5 + embedding search
+    classifier.py   # Two-stage Ollama classifier
+    readiness.py    # Draft readiness scoring
+    config.py       # Configuration

 src/webui/
-    app.py          # Flask application with all routes
-    data.py         # Data access layer (stats, filtering, t-SNE, network graphs)
-    templates/      # Jinja2 templates (base + 8 page templates)
+    app.py          # Flask application (20 API endpoints)
+    data.py         # Data access layer with TypedDicts
+    auth.py         # Admin authentication
+    analytics.py    # GDPR-compliant pageview tracking
+    templates/      # Jinja2 templates (23 pages)

 data/
-    drafts.db       # SQLite database (all analysis data)
-    reports/        # Generated markdown reports
-    figures/        # Generated visualizations (HTML + PNG)
+    drafts.db       # SQLite database
+    reports/        # Generated reports + blog series
+    .cache/         # Similarity matrix cache

 paper/
-    main.tex        # arXiv paper: "The AI Agent Standardization Wave"
-    export_figures.py  # Export interactive charts to static images
-    Makefile        # Build: make pdf
+    main.tex        # arXiv paper
 ```

 ## Database Schema

 | Table | Purpose | Records |
 |-------|---------|--------:|
-| `drafts` | Draft metadata + full text | 260 |
-| `ratings` | 5-dimension AI ratings + summary | 260 |
-| `embeddings` | Semantic vectors (nomic-embed-text) | 260 |
-| `llm_cache` | Claude API response cache | ~500 |
-| `authors` | Person records from Datatracker | 403 |
-| `draft_authors` | Author-draft relationships | 742 |
-| `ideas` | Extracted technical ideas | 1,262 |
+| `drafts` | Draft metadata + full text | 475 |
+| `ratings` | 5-dimension ratings + summary + false_positive flag | 475 |
+| `embeddings` | Semantic vectors (nomic-embed-text, 768-dim) | 475 |
+| `llm_cache` | Claude API response cache (SHA-256 dedup) | ~1,500 |
+| `authors` | Person records from Datatracker | 713 |
+| `draft_authors` | Author-draft relationships | ~1,400 |
+| `ideas` | Extracted + deduplicated technical ideas | 501 |
 | `gaps` | Gap analysis results | 12 |
-| `drafts_fts` | FTS5 full-text search index | — |
-
-## arXiv Paper
-
-A 13-page paper is included in `paper/main.tex`:
-
-> **The AI Agent Standardization Wave: A Quantitative Analysis of 260 IETF Internet-Drafts on Autonomous Agents and Artificial Intelligence**
-
-Build with:
-
-```bash
-cd paper
-python3 export_figures.py   # copy/export figures
-pdflatex main.tex           # compile (run twice for references)
-```
-
-## Configuration
-
-```bash
-# Show current config
-ietf config
-
-# Change Claude model
-ietf config --set claude_model claude-sonnet-4-20250514
-
-# API key via .env file (auto-loaded)
-echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
-```

 ## Cost

-Full analysis of 260 drafts consumed ~475K API tokens (rating + idea extraction + gap analysis). At current Sonnet pricing, this is approximately $2-3 USD.
+Full pipeline for 475 drafts: ~$9-15 USD total
+- Sonnet for rating + gap analysis (~$3)
+- Haiku for bulk idea extraction (~$1)
+- Ollama embeddings: free (local)

 ## License

-MIT
+MIT — see [LICENSE](LICENSE)