v0.2.0: visualizations, interactive browser, arXiv paper, gap analysis

New features:
- 12 interactive visualizations (ietf viz): t-SNE landscape, similarity heatmap, score distributions, timeline, bubble explorer, radar charts, author network graph, category treemap, quality vs overlap, org bar chart, ideas chart, and interactive draft browser
- Interactive draft browser (browser.html): filterable by category, keyword, score sliders with sortable table and expandable detail rows
- arXiv paper (paper/main.tex): 13-page manuscript with all findings
- Gap analysis: 12 identified under-addressed areas
- Author network: collaboration graph, org contributions, cross-org analysis
- Draft generation from gaps (ietf draft-gen)
- Auto-load .env for API keys (python-dotenv)

New modules: visualize.py, authors.py, draftgen.py
New reports: timeline, overlap-matrix, authors, gaps
New deps: plotly, matplotlib, seaborn, scipy, scikit-learn, networkx, python-dotenv

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 13:37:55 +01:00
parent f44f9265bd
commit be9cf9c5d9
32 changed files with 4447 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,229 @@
+# IETF Draft Analyzer
+
+Track, categorize, rate, and visualize AI/agent-related IETF Internet-Drafts.
+
+**260 drafts** analyzed across **19 categories** with **403 authors**, **1,262 extracted ideas**, and **12 identified gaps** — spanning June 2025 to February 2026.
+
+## What This Does
+
+The IETF is experiencing an unprecedented wave of standardization activity around AI agents. This tool provides a quantitative lens on that activity:
+
+- **Fetches** draft metadata and full text from the IETF Datatracker API
+- **Rates** each draft on 5 dimensions (novelty, maturity, overlap, momentum, relevance) using Claude
+- **Embeds** drafts with Ollama for pairwise similarity and clustering
+- **Extracts** discrete technical ideas and identifies landscape gaps
+- **Maps** the author collaboration network and organizational affiliations
+- **Generates** interactive visualizations, markdown reports, and a filterable browser
+- **Produces** publication-ready figures for an arXiv paper
+
+## Quick Start
+
+```bash
+# Install
+pip install -e .
+
+# Set your API key (or add to .env file)
+export ANTHROPIC_API_KEY=sk-ant-...
+
+# Fetch drafts from IETF Datatracker
+ietf fetch
+
+# Rate all unrated drafts with Claude
+ietf analyze --all
+
+# Generate embeddings (requires Ollama running locally)
+ietf embed
+
+# Extract technical ideas
+ietf ideas --all
+
+# Identify gaps in the landscape
+ietf gaps
+
+# Generate all visualizations
+ietf viz all
+
+# Open the interactive browser (xdg-open is Linux-specific; use `open` on macOS)
+xdg-open data/figures/browser.html
+```
+
+## CLI Commands
+
+### Core Pipeline
+
+| Command | Description |
+|---------|-------------|
+| `ietf fetch` | Fetch AI/agent drafts from IETF Datatracker |
+| `ietf analyze --all` | Rate all unrated drafts using Claude (5 dimensions + summary) |
+| `ietf embed` | Generate semantic embeddings via Ollama |
+| `ietf ideas --all` | Extract technical ideas from drafts using Claude |
+| `ietf gaps` | Identify under-addressed areas in the landscape |
+| `ietf authors --fetch` | Fetch author/affiliation data from Datatracker |
+
+### Exploration
+
+| Command | Description |
+|---------|-------------|
+| `ietf list` | List tracked drafts |
+| `ietf show <name>` | Show detailed info for a specific draft |
+| `ietf search <query>` | Full-text search across all stored drafts |
+| `ietf similar <name>` | Find the most similar drafts by embedding similarity |
+| `ietf clusters` | Find clusters of near-duplicate drafts |
+| `ietf compare <name1> <name2> ...` | Compare drafts for overlap and unique contributions |
+| `ietf authors` | Show top authors and their draft counts |
+| `ietf network` | Show organizational collaboration network |
+
+### Visualizations (`ietf viz`)
+
+All outputs go to `data/figures/`. Interactive charts are standalone HTML files (no server needed).
+
+| Command | Output | Format |
+|---------|--------|--------|
+| `ietf viz all` | Generate everything below | mixed |
+| `ietf viz browser` | Filterable draft browser with search, category chips, score sliders | HTML |
+| `ietf viz landscape` | t-SNE/UMAP 2D scatter of all drafts colored by category | HTML |
+| `ietf viz heatmap` | 260x260 clustered pairwise similarity matrix | PNG |
+| `ietf viz distributions` | Violin plots for all 5 rating dimensions by category | PNG |
+| `ietf viz timeline` | Stacked area chart of monthly submissions by category | HTML |
+| `ietf viz bubble` | Novelty vs Maturity explorer (size=relevance, color=category) | HTML |
+| `ietf viz radar` | Average rating profile per category | HTML |
+| `ietf viz network` | Author co-authorship force-directed graph | HTML |
+| `ietf viz treemap` | Category composition treemap (color=avg score) | HTML |
+| `ietf viz quality` | Score vs uniqueness with quadrant annotations | HTML |
+| `ietf viz orgs` | Organization contribution horizontal bar chart | HTML |
+| `ietf viz ideas` | Ideas frequency by type | HTML |
+
+### Reports (`ietf report`)
+
+Markdown reports saved to `data/reports/`.
+
+| Command | Description |
+|---------|-------------|
+| `ietf report overview` | Sortable table of all rated drafts with bar-chart scores |
+| `ietf report landscape` | Category-grouped view with per-category rankings |
+| `ietf report timeline` | Monthly submission volume and category trends |
+| `ietf report overlap-matrix` | Top similar pairs, per-category overlap, cross-category matrix |
+| `ietf report authors` | Top authors, organizations, collaboration pairs |
+| `ietf report digest` | Weekly digest of recently fetched drafts |
+| `ietf report ideas` | Most common ideas, unique ideas, ideas by type |
+
+### Other
+
+| Command | Description |
+|---------|-------------|
+| `ietf draft-gen <topic>` | Generate an Internet-Draft addressing a landscape gap |
+| `ietf config` | Show or modify configuration |
+
+## Rating System
+
+Each draft is scored 1-5 on five dimensions:
+
+| Dimension | What it measures |
+|-----------|-----------------|
+| **Novelty** | Originality relative to existing standards |
+| **Maturity** | Completeness of specification |
+| **Overlap** | Redundancy with other drafts (5 = heavily overlapping) |
+| **Momentum** | Community engagement, revisions, adoption |
+| **Relevance** | Importance to the AI/agent ecosystem |
+
+**Composite score:**
+
+```
+score = 0.30 * novelty + 0.25 * relevance + 0.20 * maturity + 0.15 * momentum + 0.10 * (6 - overlap)
+```
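The weights in the formula above sum to 1.0, and the `6 - overlap` term inverts the overlap rating so that heavily overlapping drafts score lower. With every input on the 1-5 scale, the composite therefore also lands between 1 and 5. A minimal sketch of that calculation follows; the function name `composite_score` and its keyword arguments are illustrative assumptions, not names taken from the analyzer's source.

```python
# Weights copied from the README formula; they sum to 1.0.
WEIGHTS = {
    "novelty": 0.30,
    "relevance": 0.25,
    "maturity": 0.20,
    "momentum": 0.15,
    "overlap": 0.10,  # applied to the inverted value (6 - overlap)
}


def composite_score(novelty, maturity, overlap, momentum, relevance):
    """Weighted composite of five 1-5 ratings (hypothetical helper)."""
    ratings = {
        "novelty": novelty,
        "relevance": relevance,
        "maturity": maturity,
        "momentum": momentum,
        "overlap": overlap,
    }
    for name, value in ratings.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    # Invert overlap: 5 (heavily overlapping) contributes 1, 1 contributes 5.
    ratings["overlap"] = 6 - overlap
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = composite_score(novelty=4, maturity=3, overlap=2, momentum=3, relevance=5)
    print(round(example, 2))  # 1.2 + 1.25 + 0.6 + 0.45 + 0.4 = 3.9
```

Because novelty and relevance together carry 55% of the weight, a novel and relevant draft can still rank well even when its maturity and momentum are low.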
+
+## Key Findings
+
+- **36x growth** in monthly submissions over 9 months (from 2 drafts/month to 72)
+- **7.9% of draft pairs** exceed 0.80 cosine similarity, which points to substantial redundancy between drafts
+- **Safety deficit**: AI safety proposals (36) are vastly outnumbered by protocol proposals (290+)
+- **Organizational concentration**: Top 5 orgs contribute ~35% of all drafts
+- **1,262 technical ideas** extracted across 6 types (mechanism, architecture, protocol, pattern, extension, requirement)
+- **12 identified gaps** in the current landscape
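For scale on the pairwise-similarity finding above: 260 drafts form 260 × 259 / 2 = 33,670 unordered pairs, so 7.9% corresponds to roughly 2,660 pairs, assuming every draft is included in the comparison. The sketch below shows one way such a share could be computed from embedding vectors using only the standard library. The function names and the plain-list vector format are assumptions for illustration, not the tool's actual code.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def fraction_of_similar_pairs(vectors, threshold=0.80):
    """Share of unordered pairs whose cosine similarity exceeds threshold."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    similar = sum(1 for a, b in pairs if cosine_similarity(a, b) > threshold)
    return similar / len(pairs)


if __name__ == "__main__":
    # Three toy 2-D "embeddings": the first two point in nearly the same direction.
    toy = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
    print(fraction_of_similar_pairs(toy))  # one of the three pairs exceeds 0.80
```

This brute-force loop is quadratic in the number of drafts, which is fine for a few hundred vectors. A vectorized matrix product would be the usual choice at larger scale.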
+
+## Tech Stack
+
+- **Python 3.11+** with Click CLI
+- **SQLite** with FTS5 full-text search and WAL mode
+- **Anthropic Claude** (Sonnet 4) for analysis, rating, idea extraction, gap analysis
+- **Ollama** (nomic-embed-text) for local embeddings and similarity
+- **Plotly** for interactive HTML visualizations
+- **Matplotlib/Seaborn** for publication-ready static figures
+- **NetworkX** for author collaboration graph analysis
+- **NumPy/SciPy/scikit-learn** for similarity computation and dimensionality reduction
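The author collaboration graph named in the stack above reduces to a weighted edge list: two authors share an edge for each draft they co-wrote. The sketch below builds that edge list with the standard library only, so it stands in for (and is not) the NetworkX-based implementation. The function name `coauthor_edges` and the draft-to-authors mapping are assumed shapes for illustration.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(draft_authors):
    """Count shared drafts for every pair of co-authors.

    draft_authors maps a draft name to its list of author names
    (a hypothetical input shape). Returns a Counter keyed by
    alphabetically ordered (author_a, author_b) tuples.
    """
    edges = Counter()
    for authors in draft_authors.values():
        # sorted(set(...)) removes duplicates and fixes the pair order,
        # so (alice, bob) and (bob, alice) count as the same edge.
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    sample = {
        "draft-example-one": ["alice", "bob", "carol"],
        "draft-example-two": ["bob", "alice"],
    }
    for (a, b), weight in sorted(coauthor_edges(sample).items()):
        print(a, b, weight)
```

Each resulting `(author_a, author_b), weight` entry could then be added to a NetworkX graph with `Graph.add_edge(author_a, author_b, weight=weight)` for layout and centrality analysis.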
+
+## Project Structure
+
+```
+src/ietf_analyzer/
+    cli.py          # Click CLI entry point (all commands)
+    fetcher.py      # IETF Datatracker API client
+    analyzer.py     # Claude-based analysis, rating, idea extraction, gap analysis
+    embeddings.py   # Ollama embeddings + cosine similarity + clustering
+    db.py           # SQLite with FTS5 (8 tables: drafts, ratings, embeddings, llm_cache, authors, draft_authors, ideas, gaps)
+    models.py       # Author, Draft, Rating dataclasses
+    reports.py      # Markdown report generation
+    visualize.py    # Interactive HTML + static PNG visualizations
+    authors.py      # AuthorNetwork: Datatracker author fetching, collaboration graph
+    draftgen.py     # Internet-Draft generation from gap analysis
+    config.py       # Configuration with defaults
+
+data/
+    drafts.db       # SQLite database (all analysis data)
+    reports/        # Generated markdown reports
+    figures/        # Generated visualizations (HTML + PNG)
+
+paper/
+    main.tex        # arXiv paper: "The AI Agent Standardization Wave"
+    export_figures.py  # Export interactive charts to static images
+    Makefile        # Build: make pdf
+```
+
+## Database Schema
+
+| Table | Purpose | Records |
+|-------|---------|--------:|
+| `drafts` | Draft metadata + full text | 260 |
+| `ratings` | 5-dimension AI ratings + summary | 260 |
+| `embeddings` | Semantic vectors (nomic-embed-text) | 260 |
+| `llm_cache` | Claude API response cache | ~500 |
+| `authors` | Person records from Datatracker | 403 |
+| `draft_authors` | Author-draft relationships | 742 |
+| `ideas` | Extracted technical ideas | 1,262 |
+| `gaps` | Gap analysis results | 12 |
+| `drafts_fts` | FTS5 full-text search index | — |
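The `drafts_fts` row above is the index behind full-text search. The sketch below shows how an FTS5 virtual table can be created and queried from Python's built-in `sqlite3` module. Only the table name comes from the schema above; the column layout, the sample rows, and the helper names `build_index` and `search` are assumptions. It also assumes the local SQLite build has FTS5 compiled in, which is common but not guaranteed.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index from (name, title, body) tuples."""
    conn = sqlite3.connect(":memory:")
    # name is stored but UNINDEXED, so hyphenated draft names do not
    # produce matches on their own; only title and body are searched.
    conn.execute(
        "CREATE VIRTUAL TABLE drafts_fts USING fts5(name UNINDEXED, title, body)"
    )
    conn.executemany(
        "INSERT INTO drafts_fts (name, title, body) VALUES (?, ?, ?)", rows
    )
    return conn


def search(conn, query):
    """Return matching draft names, best match first."""
    cursor = conn.execute(
        "SELECT name FROM drafts_fts WHERE drafts_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    # Made-up draft names and text, for illustration only.
    conn = build_index([
        ("draft-example-agent-auth", "Agent Authentication",
         "Authentication for an AI agent acting on behalf of a user."),
        ("draft-example-datagram-cc", "Datagram Congestion Control",
         "Congestion control for datagram transports."),
    ])
    print(search(conn, "agent"))
```

The default tokenizer folds case but does not stem, so a query for `agent` will not match a document that contains only `agents`.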
+
+## arXiv Paper
+
+A 13-page paper is included in `paper/main.tex`:
+
+> **The AI Agent Standardization Wave: A Quantitative Analysis of 260 IETF Internet-Drafts on Autonomous Agents and Artificial Intelligence**
+
+Build with:
+
+```bash
+cd paper
+python3 export_figures.py   # copy/export figures
+pdflatex main.tex           # compile (run twice for references)
+```
+
+## Configuration
+
+```bash
+# Show current config
+ietf config
+
+# Change Claude model
+ietf config --set claude_model claude-sonnet-4-20250514
+
+# API key via .env file (auto-loaded); >> appends, so an existing .env is not overwritten
+echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env
+```
+
+## Cost
+
+Full analysis of 260 drafts consumed ~475K API tokens (rating + idea extraction + gap analysis). At current Sonnet pricing, this is approximately $2-3 USD.
+
+## License
+
+MIT