v0.3.0: Gap-to-Draft pipeline, Living Standards Observatory, blog series
Gap-to-Draft Pipeline (ietf pipeline):
- Context builder assembles ideas, RFC foundations, similar drafts, ecosystem vision
- Generator produces outlines + sections using rich context with Claude
- Quality gates: novelty (embedding similarity), references, format, self-rating
- Family coordinator generates 5-draft ecosystem (AEM/ATD/HITL/AEPB/APAE)
- I-D formatter with proper headers, references, 72-char wrapping

Living Standards Observatory (ietf observatory):
- Source abstraction with IETF + W3C fetchers
- 7-step update pipeline: snapshot, fetch, analyze, embed, ideas, gaps, record
- Static GitHub Pages dashboard (explorer, gap tracker, timeline)
- Weekly CI/CD automation via GitHub Actions

Also includes:
- 361 drafts (expanded from 260 with 6 new keywords), 403 authors, 1,262 ideas, 12 gaps
- Blog series (8 posts planned), reports, arXiv paper figures
- Agent team infrastructure (CLAUDE.md, scripts, dev journal)
- 5 new DB tables, schema migration, ~15 new query methods

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit adds scripts/agent-team-prompt.md (new file, 362 lines):
# Agent Team: IETF AI Agent Landscape — The Big Picture

> **Goal**: Transform a fragmented analysis of 361 IETF Internet-Drafts into a coherent, compelling narrative about what's coming with AI agents in the standards world. Produce a blog series, implement deeper analysis features, and draft the missing architectural pieces that tie everything together.

---

## Context: What We Have

This project (`ietf-draft-analyzer`) has already built a powerful pipeline:

- **361 drafts** fetched from IETF Datatracker (AI/agent-related, keywords: agent, ai-agent, llm, autonomous, machine-learning, artificial-intelligence, mcp, agentic, inference, generative, intelligent, aipref)
- **403 authors** from **184 organizations** mapped with co-authorship networks
- **1,262 technical ideas** extracted and classified (mechanisms, architectures, protocols, patterns, extensions, requirements)
- **12 standardization gaps** identified (3 critical, 6 high, 3 medium)
- **13+ interactive visualizations** (t-SNE landscape, heatmap, timeline, network graph, treemap, bubble charts, radar, quality scatter, etc.)
- **33 team blocs** detected via co-authorship analysis
- A **5-draft ecosystem architecture** already outlined (AEM, ATD, HITL, AEPB, APAE) building on SPIFFE/WIMSE/ECT
- One existing blog post draft ("The IETF's AI Agent Gold Rush")
- An arXiv paper draft (13 pages, needs finishing)

**Key finding**: The IETF is in a 36x growth sprint on AI agent protocols (2→72 drafts/month in 9 months), but with a 4:1 ratio of capability-building to safety work. The landscape is deeply fragmented — 92 A2A protocol drafts with no interoperability layer, 13 competing OAuth-for-agents proposals, and critical gaps in behavior verification, resource management, and error recovery.

**Data freshness note**: 101 new drafts (from keyword expansion on 2026-03-03) still need: analysis/rating, author fetch, idea extraction, embedding. Run the pipeline first.

---
## The Team

### 1. Architect — "The Big Picture"

**Role**: Connect the dots. You see what nobody else sees — the patterns across 361 drafts, the structural forces shaping this landscape, the story of what's actually happening and what's coming next.

**Your job**:
- Read ALL existing reports in `data/reports/` to internalize the landscape
- Read `data/reports/holistic-agent-ecosystem-draft-outlines.md` and `data/reports/draft-family-consistency.md` — these are the most intellectually developed pieces
- Design the **narrative arc** for the blog series: what story are we telling, in what order, building to what conclusion?
- Identify what analysis is MISSING that would make the story more compelling (and tell the Coder what to build)
- Draft the **architectural vision document** — the missing "here's what the IETF agent ecosystem will look like in 2 years" piece
- Review everything the Writer produces for technical accuracy and narrative coherence

**Key questions only you can answer**:
- What's the meta-story? (Not just "there are lots of drafts" but "here's what this means for the future of the internet")
- Where are the tectonic plates? (Chinese standards push vs. Western absence, safety deficit, protocol fragmentation)
- What are the 3-5 things someone building agent systems TODAY needs to know from this data?
- How does the proposed 5-draft ecosystem (AEM/ATD/HITL/AEPB/APAE) fit into or challenge the existing landscape?

**Deliverables**:
- [ ] Blog series outline with narrative arc (titles, key thesis per post, reading order)
- [ ] "State of the Agent Ecosystem" architectural vision (~2000 words)
- [ ] Review notes on each blog post before publication
- [ ] List of missing analyses for the Coder to implement

---
### 2. Analyst — "The Data Whisperer"

**Role**: You are the one who goes deep into the database and extracts the stories hiding in the numbers. Every blog post needs data-backed claims, and you find them.

**Your job**:
- Run the pipeline on the 101 new drafts first: `ietf analyze --all --cheap`, `ietf authors --fetch`, `ietf ideas --all --cheap --batch 5`, `ietf embed`, `ietf gaps`
- Query the SQLite database (`data/drafts.db`) directly for insights the CLI doesn't surface
- Find the stories in the data: trends, outliers, surprising patterns, contradictions
- Generate fresh reports: `ietf report overview`, `ietf report landscape`, `ietf report authors`, etc.
- Create custom SQL queries for specific blog post needs
- Cross-reference findings with the Architect's narrative needs

**Specific analyses to run**:
- **Trend analysis**: How have categories shifted month-over-month? Is the safety gap widening or narrowing?
- **New keyword impact**: What did the 101 new drafts from `mcp`, `agentic`, `inference`, `generative`, `intelligent`, `aipref` add? Any new categories or patterns?
- **Author velocity**: Who's publishing fastest? Any new entrants in the last 3 months?
- **Idea convergence**: Which ideas appear across the most organizations (not just drafts)? Cross-org idea overlap = actual consensus signals
- **Gap evolution**: Have any of the 12 gaps been partially filled by newer drafts?
- **Competition mapping**: For each crowded area (OAuth for agents, A2A protocols, agent discovery), who's competing and how do their approaches differ?
- **Quality vs. quantity**: Do high-volume authors/orgs produce higher or lower quality scores?

**Tools at your disposal**:

```bash
# CLI commands
ietf fetch                          # Pull latest drafts
ietf analyze --all --cheap          # Rate all unrated drafts (Haiku, ~$0.30)
ietf ideas --all --cheap --batch 5  # Extract ideas in batches
ietf embed                          # Generate embeddings
ietf gaps                           # Re-run gap analysis
ietf report <type>                  # overview|landscape|digest|timeline|overlap-matrix|authors|ideas|gaps
ietf search "<query>"               # FTS5 search
ietf similar <draft-name>           # Find similar drafts by embedding
ietf viz all                        # Regenerate all visualizations

# Direct DB queries
sqlite3 data/drafts.db "SELECT ..."
```
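As an example of a direct DB query, the month-over-month trend question above can be sketched in a few lines of Python. This is a hedged sketch only: the `drafts` table and its `submitted`/`category` columns are assumptions about the schema, so check `db.py` for the real names before relying on it.

```python
import sqlite3

def monthly_category_counts(con: sqlite3.Connection) -> list[tuple[str, str, int]]:
    # Assumed schema: a `drafts` table with `submitted` (ISO date string)
    # and `category` columns -- verify against db.py before using.
    return con.execute(
        """
        SELECT strftime('%Y-%m', submitted) AS month,
               category,
               COUNT(*) AS n
        FROM drafts
        GROUP BY month, category
        ORDER BY month ASC, n DESC
        """
    ).fetchall()

# Usage sketch: monthly_category_counts(sqlite3.connect("data/drafts.db"))
```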
**Deliverables**:
- [ ] Pipeline run on 101 new drafts (analyze, authors, ideas, embed, gaps)
- [ ] Refreshed reports (all types)
- [ ] Per-blog-post data package: key stats, tables, chart data, quotable numbers
- [ ] "Surprising findings" list — things that challenge assumptions
- [ ] Custom visualizations or data tables for blog posts

---
### 3. Coder — "The Feature Builder"

**Role**: Implement new analysis features that unlock deeper insights. The current tool is powerful, but there are capabilities that would dramatically improve the analysis.

**Your job**:
- Read the existing codebase thoroughly: `src/ietf_analyzer/` (cli.py, db.py, analyzer.py, reports.py, embeddings.py, authors.py, fetcher.py, models.py, config.py, orgs.py, draftgen.py, visualize.py)
- Implement features the Architect and Analyst need
- Write clean, tested code that fits the existing patterns (Click CLI, SQLite, Claude API, rich output)
- Save all scripts to `scripts/` for reproducibility

**Priority features to implement** (coordinate with Architect on order):

1. **RFC Cross-Reference** (`ietf refs`):
   - Parse each draft's full text for RFC references (regex: `RFC\s*\d{1,5}`, `\[RFC\d+\]` — note that requiring four or more digits would miss early RFCs like RFC 793)
   - Store in new `draft_refs` table (draft_name, ref_type, ref_id)
   - Report: which RFCs are most referenced? Which drafts build on what foundation?
   - This reveals the "dependency tree" of the agent ecosystem — what existing standards are they building on?

2. **Revision Tracking** (`ietf revisions`):
   - For drafts with rev > 00, fetch all prior revisions from Datatracker
   - Compute diff stats (added/removed sections, size changes)
   - Track: which drafts are actively evolving? Which are stale?
   - This tells us about momentum and commitment

3. **Category Trend Analysis** (`ietf trends`):
   - Monthly breakdown of new drafts per category
   - Growth rate calculation per category
   - Identify: which categories are accelerating? Which peaked and declined?
   - Output: markdown table + data for visualization

4. **Cross-Organization Idea Overlap** (`ietf idea-overlap`):
   - For each idea that appears in 2+ drafts, check if the drafts come from different orgs
   - Cross-org idea overlap = genuine convergence (not just one team's duplicates)
   - Report: top ideas with multi-org convergence

5. **Draft Status Tracking**:
   - Fetch current IESG state and WG adoption status from Datatracker
   - Track: which drafts have been adopted by WGs? Which are in IETF Last Call?
   - This reveals which proposals have real traction vs. which are just ideas

6. **Semantic Search Enhancement** (`ietf ask "<question>"`):
   - Natural language query against embeddings + FTS5
   - "Which drafts address agent authentication?" → ranked results
   - Useful for blog research and ad-hoc exploration
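The parsing step of feature 1 can be sketched as follows. This is an illustration, not the final implementation: it accepts 1-5 digit RFC numbers so early RFCs are not missed, and it leaves the `draft_refs` insert to the real code.

```python
import re
from collections import Counter

# Matches "RFC 9334", "RFC9334", and "[RFC9334]"; case-insensitive so
# lowercase mentions like "rfc9334" are also counted.
RFC_REF = re.compile(r"\[?RFC\s*(\d{1,5})\]?", re.IGNORECASE)

def extract_rfc_refs(text: str) -> Counter:
    """Count RFC references in a draft's full text."""
    return Counter(int(m.group(1)) for m in RFC_REF.finditer(text))

# Each (draft_name, "rfc", number) tuple could then be stored in the
# proposed draft_refs table.
```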
**Code patterns to follow**:
- CLI: Click commands in `cli.py` with `@click.option()` decorators
- DB: Add tables in `db.py` `ensure_tables()`, queries as methods on `DraftDB`
- Reports: Add report types in `reports.py` `generate_report()`
- Config: Add new config fields in `config.py` `AnalyzerConfig`
- Always cache Claude API calls via `llm_cache` table
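Following the DB pattern above, a query method for the cross-org idea overlap feature might look like this. The `DraftDB` class here is a minimal stand-in, and the `idea_drafts` and `draft_orgs` tables are hypothetical names invented for illustration; the real schema in `db.py` will differ.

```python
import sqlite3

class DraftDB:
    """Minimal stand-in for the real DraftDB in db.py; table and column
    names (idea_drafts, draft_orgs) are illustrative assumptions."""

    def __init__(self, path: str = ":memory:"):
        self.con = sqlite3.connect(path)

    def cross_org_idea_overlap(self, min_orgs: int = 2):
        # Ideas whose drafts come from at least `min_orgs` distinct
        # organizations: a convergence signal, not one team's duplicates.
        return self.con.execute(
            """
            SELECT i.idea, COUNT(DISTINCT o.org) AS orgs
            FROM idea_drafts i
            JOIN draft_orgs o ON o.draft_name = i.draft_name
            GROUP BY i.idea
            HAVING orgs >= ?
            ORDER BY orgs DESC
            """,
            (min_orgs,),
        ).fetchall()
```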
**Deliverables**:
- [ ] RFC cross-reference feature (parse, store, report)
- [ ] Category trend analysis (monthly breakdown, growth rates)
- [ ] Cross-org idea overlap analysis
- [ ] Draft status/traction tracking
- [ ] Any features the Architect identifies as needed for the blog series
- [ ] Scripts in `scripts/` for running new analyses

---
### 4. Writer — "The Storyteller"

**Role**: Turn data and architectural insights into compelling, publishable blog posts. You make the complex accessible and the dry data riveting.

**Your job**:
- Work from the Architect's narrative arc and the Analyst's data packages
- Write a series of blog posts that build on each other
- Each post should stand alone but link to the series
- Tone: authoritative but accessible, data-driven but narrative, opinionated where the data supports it
- Reference specific drafts, authors, and organizations by name — this is journalism, not a whitepaper
- Include data tables, pull quotes, and key stats formatted for web publication

**The Blog Series** (refine with Architect, but start from this skeleton):

### Post 1: "The IETF's AI Agent Gold Rush" (UPDATE existing draft)
*The overview piece. Hook readers with the numbers.*
- Update from 260→361 drafts with new findings
- The 36x growth curve
- The 4:1 safety deficit
- The organizational landscape (Huawei dominance, Western absence)
- Tease the rest of the series

### Post 2: "Who's Writing the Rules for AI Agents?"
*The geopolitics piece. Follow the authors.*
- Deep dive into the 33 team blocs
- The Huawei 13-person, 94%-cohesion, 22-draft campaign
- Chinese institutional ecosystem (Huawei + China Mobile + Tsinghua + ZTE + ...)
- Where are Google, Microsoft, Apple? Why?
- Cross-org collaboration map: who works together, who doesn't
- What does standards authorship concentration mean for the future?

### Post 3: "The OAuth Wars and Other Battles"
*The fragmentation piece. Where multiple teams fight over the same problem.*
- The 13-draft OAuth-for-AI-agents cluster
- The 8-draft multi-agent communication protocol convergence
- Near-duplicate analysis: 25+ draft pairs with >0.98 similarity
- The A2A protocol zoo: 92 drafts, no interop
- What fragmentation costs: wasted effort, delayed deployment, confusion for implementers

### Post 4: "What Nobody's Building (And Why It Matters)"
*The gaps piece. The most important blog post in the series.*
- The 3 critical gaps: behavior verification, resource management, error recovery
- The 6 high-priority gaps: human override, cross-protocol translation, lifecycle, consensus, cross-domain security, dynamic trust
- Only 22 human-agent interaction drafts vs 92 A2A protocols — agents talking to agents but not to humans
- Only 36 safety drafts — who's building the guardrails?
- For each gap: what could go wrong if this isn't addressed? Real-world scenario.

### Post 5: "The 1,262 Ideas That Will Shape Agent Infrastructure"
*The ideas piece. What's actually being proposed.*
- Taxonomy: 488 mechanisms, 217 architectures, 179 protocols, 169 patterns, 99 extensions, 93 requirements
- The ideas that show up everywhere (convergence signals)
- The ideas that appear exactly once (innovation at the edges)
- Map ideas to the gaps: which ideas partially fill which gaps?
- The most ambitious technical proposals in the corpus

### Post 6: "Drawing the Big Picture: What the Agent Ecosystem Actually Needs"
*The vision piece. The capstone.*
- Synthesize everything: growth, fragmentation, gaps, ideas, geopolitics
- The proposed holistic ecosystem (AEM/ATD/HITL/AEPB/APAE) — what it would solve
- DAG orchestration, HITL as first-class, protocol agnosticism, dual regime (relaxed vs regulated)
- What's coming next: predictions based on the data
- Call to action: what builders, policymakers, and standards participants should do

### Post 7 (optional): "How We Built This: Analyzing 361 IETF Drafts with Claude and Ollama"
*The methodology piece. For the technical audience.*
- The tool: Python CLI, Datatracker API, Claude for analysis, Ollama for embeddings
- The pipeline: fetch → analyze → embed → ideas → gaps → viz
- Cost: ~$3 for 260 drafts, hybrid Haiku/Sonnet approach
- Lessons in LLM-powered document analysis at scale
- Open source: how others can use it

### Post 8: "Agents Building the Agent Analysis"
*The meta piece. The one that makes people smile.*
- We used Claude Code agent teams (Architect, Analyst, Coder, Writer) to analyze the IETF's AI agent standards
- The irony is the point: the very ecosystem gaps we identified (orchestration, HITL, assurance) mirror our own team structure
- What worked: parallel execution, specialization, the prompt as project spec
- What surprised us: moments where agent output changed our direction
- Cost/speed transparency: tokens, time, human oversight required
- Honest about limitations: where human judgment was non-negotiable
- Source material: `data/reports/dev-journal.md` (every session logs milestones there)

**Style guide**:
- Lead with the most surprising finding in each post
- Use data tables (markdown) for quantitative claims
- Bold the key numbers on first mention
- Link to specific drafts on datatracker.ietf.org
- Each post: 1500-2500 words
- End each post with a teaser for the next one
- Include a "Key Takeaways" box (3-5 bullet points) at the end

**Deliverables**:
- [ ] Post 1: Updated "Gold Rush" overview (data/reports/blog-series/01-gold-rush.md)
- [ ] Post 2: Authors and geopolitics (data/reports/blog-series/02-who-writes-the-rules.md)
- [ ] Post 3: Fragmentation and competition (data/reports/blog-series/03-oauth-wars.md)
- [ ] Post 4: Gaps analysis (data/reports/blog-series/04-what-nobody-builds.md)
- [ ] Post 5: Ideas taxonomy (data/reports/blog-series/05-1262-ideas.md)
- [ ] Post 6: Vision and ecosystem (data/reports/blog-series/06-big-picture.md)
- [ ] Post 7: Methodology (data/reports/blog-series/07-how-we-built-this.md)

---
## Development Journal (CRITICAL)

**Every agent MUST log milestones to `data/reports/dev-journal.md`.**

This journal is the source material for Post 8 ("Agents Building the Agent Analysis") — the meta blog post about using Claude agent teams to build this project. Without it, we lose the story.

Append entries in this format:

```markdown
### [DATE] [AGENT] — [SHORT TITLE]

**What**: [What was done]
**Why**: [The reasoning]
**Result**: [Outcome, key numbers, artifacts]
**Surprise**: [Optional — anything unexpected]
**Cost**: [Optional — API tokens, time, model]
```

Log: pipeline runs, features built, posts written, architectural decisions, coordination moments (when one agent's output changed another's plan), data surprises, tool limitations. Skip: routine file reads, minor formatting fixes.

See `CLAUDE.md` for full details.

---
## Workflow

```
Phase 1: Foundation (Analyst + Coder in parallel)
├── Analyst: Run pipeline on 101 new drafts, refresh all reports
├── Coder: Implement RFC cross-reference + category trends
└── Architect: Read all reports, design narrative arc

Phase 2: Deep Analysis (Analyst + Coder, informed by Architect)
├── Analyst: Run new analyses (trends, cross-org ideas, competition mapping)
├── Coder: Implement remaining features (status tracking, idea overlap)
└── Architect: Finalize blog outline, draft vision document

Phase 3: Writing (Writer, supported by Analyst)
├── Writer: Draft posts 1-3 (overview, authors, fragmentation)
├── Analyst: Provide data packages for each post
└── Architect: Review each post for accuracy and narrative

Phase 4: Vision (Writer + Architect)
├── Writer: Draft posts 4-6 (gaps, ideas, big picture)
├── Architect: Co-author post 6 (the vision piece)
└── Writer: Draft post 7 (methodology, optional)

Phase 5: Polish
├── Architect: Final review of all posts
├── Analyst: Fact-check all numbers against latest DB
└── Writer: Final edits, cross-linking, consistent formatting
```

---
## File Structure

```
data/reports/blog-series/
├── 00-series-overview.md        # Series intro and reading order
├── 01-gold-rush.md              # The overview
├── 02-who-writes-the-rules.md   # Authors and geopolitics
├── 03-oauth-wars.md             # Fragmentation
├── 04-what-nobody-builds.md     # Gaps (the most important one)
├── 05-1262-ideas.md             # Ideas taxonomy
├── 06-big-picture.md            # Vision and ecosystem
└── 07-how-we-built-this.md      # Methodology
```

---
## Key Data Points to Weave Throughout

These numbers should appear across multiple posts:

| Stat | Value | Context |
|------|-------|---------|
| Total drafts | 361 | up from 260 after keyword expansion |
| Growth rate | 36x | 2→72 drafts/month, Jun 2025→Feb 2026 |
| Safety ratio | 4:1 | capability to safety drafts |
| Authors | 403 | from 184 organizations |
| Top org | Huawei | 39 authors, 45 drafts |
| Ideas extracted | 1,262 | from 260 analyzed drafts |
| Gaps identified | 12 | 3 critical, 6 high, 3 medium |
| Team blocs | 33 | covering 25% of all authors |
| Biggest bloc | Huawei | 13 people, 94% cohesion, 22 shared drafts |
| OAuth cluster | 13 drafts | all solving agent auth via OAuth, incompatible |
| Near-duplicates | 25+ pairs | >0.98 cosine similarity |
| A2A protocols | 92 drafts | no interoperability layer |
| Human-agent | 22 drafts | vs 92 A2A and 60 autonomous netops |
| Analysis cost | ~$3.16 | 260 drafts via Claude Sonnet |

---
## The Thesis

Across all posts, we're building to one argument:

**The IETF's AI agent standardization effort is the largest, fastest-growing, and most consequential standards race in a decade — but it's building the highways before the traffic lights. The data shows explosive growth (36x in 9 months), deep fragmentation (92 competing A2A protocols), concerning concentration (one company writes 12% of all drafts), and a structural safety deficit (4:1 capability to guardrails). What's missing isn't more protocols — it's the connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles that work from dev to production. The drafts we analyzed contain 1,262 technical ideas, many of them brilliant, but they need an architecture to fit into. That's the big picture.**