
Blog Series: The IETF's AI Agent Standards Race

Series Overview and Narrative Arc

Architectural design document governing the 7-post blog series. This document has two sections: (A) the internal narrative architecture (for the team), and (B) the reader-facing series introduction (for publication).


PART A: NARRATIVE ARCHITECTURE (Internal)

Overall Thesis

The IETF's AI agent standardization effort is the largest, fastest-growing, and most consequential standards race in a decade -- but it is building the highways before the traffic lights.

The data tells a story in three acts:

  1. The Gold Rush (Posts 1-2): An explosion of activity, concentrated in surprising hands. 361 drafts, 36x growth in 9 months, one company writing 18% of all drafts, Western tech giants dramatically underrepresented.

  2. The Fragmentation (Posts 3-4): That activity is not converging. 120 competing A2A protocols with no interoperability layer. 14 OAuth-for-agents proposals that cannot coexist. A 4:1 ratio of capability-building to safety work. Critical gaps where nobody is building at all.

  3. The Path Forward (Posts 5-6): The raw material for a solution exists -- 628 technical ideas independently proposed by multiple organizations show where genuine consensus is forming. But convergence on components is not convergence on architecture. The missing piece is not more protocols; it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles.

The throughline is a question: Can the IETF assemble the architecture before the protocols ship without it?


Narrative Arc Diagram

TENSION
  ^
  |                                          Post 6: THE BIG PICTURE
  |                                         /  (resolution: here's
  |                                        /    what the ecosystem
  |                Post 4: THE GAPS  -----+     actually needs)
  |               / (climax: what                         \
  |              /   nobody's building)                    \
  |    Post 3  /                           Post 5          \
  |    FRAGMENTATION                       CONVERGENCE      \
  |   /  (escalation:                     (628 cross-org     \
  |  /   competing                         ideas agree)       Post 7
  | /    protocols)                                           HOW WE
  |/                                                          BUILT THIS
  Post 1              Post 2
  GOLD RUSH           WHO WRITES
  (hook: the          THE RULES
   numbers)           (stakes:
                       geopolitics)
  +-----------------------------------------------------------> TIME/POSTS

The emotional arc: Wow, this is huge (Post 1) -> Wait, who controls it? (Post 2) -> Oh no, it is fragmenting (Post 3) -> And the most important parts are missing (Post 4, the climax) -> But beneath the chaos, organizations actually agree on 628 ideas (Post 5) -> Here is what the finished picture looks like (Post 6, the resolution) -> And here is how we figured all this out (Post 7, the coda).


Per-Post Design

Post 1: "The IETF's AI Agent Gold Rush"

File: 01-gold-rush.md
Word count: 1800-2200
Base: existing draft at data/reports/blog-post.md; needs updating from 260 to 361 drafts

Key thesis: The IETF is experiencing an unprecedented standardization sprint around AI agents, with growth rates not seen since the early web standards era.

Key data points to include:

  • 361 drafts (up from 260 after keyword expansion with mcp, agentic, inference, generative, intelligent, aipref)
  • 36x growth: 2 drafts/month (Jun 2025) to 72 drafts/month (Feb 2026)
  • 557 authors from 230 organizations
  • 10+ categories, with data formats/interop (145), A2A protocols (120), and identity/auth (108) leading
  • Average quality score: ~3.38/5.0 (range 1.35-4.8)
  • Top-rated drafts: VOLT (4.8), DAAP (4.8), STAMP (4.6), TPM-attestation (4.6)
  • 4:1 safety deficit ratio (first mention -- this becomes the recurring motif)

What makes it worth reading alone: The sheer numbers. Nobody else has quantified this. The 36x growth curve is the hook.

Ends with: Teaser for Post 2 -- "But who is writing all these drafts? The answer is more concentrated than you'd expect."


Post 2: "Who's Writing the Rules for AI Agents?"

File: 02-who-writes-the-rules.md
Word count: 2000-2500

Key thesis: The standards that will govern AI agents are being written by a remarkably concentrated set of authors, with geopolitical implications that the IETF community has not reckoned with.

Key data points to include:

  • Huawei: 53 authors, 66 drafts, 18% of all drafts (up from 12% pre-expansion)
  • The 13-person Huawei bloc: 22 shared drafts, 94% cohesion, core 7 (B. Liu, N. Geng, Z. Li, Q. Gao, X. Shang, J. Mao, G. Zeng) each on 13-23 drafts
  • Chinese institutional ecosystem: Huawei (53) + China Mobile (24) + China Telecom (24) + China Unicom (22) + Tsinghua (13) + ZTE (12) + BUPT (14) + Pengcheng Lab (8) + Zhongguancun Lab (4) = 160+ authors
  • Western underrepresentation: Google now visible (5 authors, 9 drafts) but dramatically small relative to market position. Microsoft, Apple still largely absent. Amazon has 6 authors on 6 drafts (PQ crypto, not agent-specific).
  • 18 team blocs covering ~25% of 557 authors
  • Cross-org collaboration is sparse: top cross-team pair (Rosenberg-Jennings, Five9/Cisco) shares only 3 drafts
  • Ericsson + Inria team focused narrowly on EDHOC/post-quantum (5 people, 6 drafts, 100% cohesion)
  • JPMorgan + Telefonica + Oracle on transitive attestation (Western financial sector emerging)
  • Chinese orgs form a tightly linked ecosystem: Huawei-China Unicom (6 shared drafts), Tsinghua-Zhongguancun Lab (5), China Mobile-ZTE (4)

Structural insight: Team blocs inflate apparent collaboration. When you account for intra-bloc pairs, cross-pollination between groups is thin. The landscape is a collection of islands, not a network.
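The bloc "cohesion" figures above (e.g. Huawei's 94%) can be operationalized in several ways; a minimal sketch of one plausible definition -- the fraction of member pairs who have co-authored at least one draft together (co-authorship density). The function name and data shapes are illustrative assumptions, not the analyzer's actual implementation.

```python
from itertools import combinations

def bloc_cohesion(bloc, drafts):
    """Fraction of member pairs that share at least one draft.

    `bloc` is a set of author names; `drafts` maps draft name -> set of authors.
    One plausible cohesion metric, not necessarily the analyzer's own.
    """
    pairs = list(combinations(sorted(bloc), 2))
    if not pairs:
        return 1.0
    shared = sum(
        1
        for a, b in pairs
        if any(a in authors and b in authors for authors in drafts.values())
    )
    return shared / len(pairs)
```

The same pair-counting machinery, applied across blocs instead of within one, yields the "islands" observation: intra-bloc pairs dominate, cross-bloc pairs are rare.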

What makes it worth reading alone: The geopolitics angle. The Huawei concentration is a genuine story. The Western absence is the surprise.

Ends with: "These 18 teams are not just writing separate drafts -- they are writing separate futures. The fragmentation runs deeper than authorship."


Post 3: "The OAuth Wars and Other Protocol Battles"

File: 03-oauth-wars.md
Word count: 2000-2500

Key thesis: The AI agent standards landscape is not just growing -- it is fragmenting. Multiple teams are solving the same problems independently, producing incompatible solutions that will impose real costs on implementers.

Key data points to include:

  • 14-draft OAuth-for-agents cluster: aap-oauth-profile, aylward-daap-v2, barney-caam, chen-ai-agent-auth, chen-oauth-rar, goswami-agentic-jwt, jia-oauth-scope, liu-agent-operation-auth, liu-oauth-a2a, oauth-ai-agents-on-behalf-of-user, rosenberg-oauth-aauth, song-oauth-ai-agent-auth, song-oauth-ai-agent-collaborate, yao-agent-auth
  • 10-draft Agent Gateway cluster
  • 25+ near-duplicate draft pairs (>0.98 similarity)
  • 42 topical clusters at 0.85 similarity threshold, 34 at 0.90
  • 120 A2A protocol drafts with no interoperability layer
  • Near-duplicate taxonomy: same-draft/different-WG (14), renamed (5), evolution (3), competing (2)
  • Specific examples of WG shopping: draft submitted to both NMRG and OPSAWG, or both individual and WG track
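The near-duplicate detection behind these numbers reduces to pairwise cosine similarity over draft embeddings with a high threshold. A minimal sketch, assuming each draft maps to a plain embedding vector; the function name and the dict-of-lists shape are illustrative, not the pipeline's actual API.

```python
from itertools import combinations
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def near_duplicates(embeddings, threshold=0.98):
    """Return (draft_a, draft_b, similarity) for pairs above the threshold.

    `embeddings` maps draft name -> embedding vector. A 0.98 threshold flags
    near-duplicates; lower thresholds (0.85-0.90) yield topical clusters.
    """
    return [
        (a, b, round(cosine(va, vb), 3))
        for (a, va), (b, vb) in combinations(embeddings.items(), 2)
        if cosine(va, vb) > threshold
    ]
```

Rerunning the same pass at 0.85 and 0.90 (and grouping the resulting pairs into connected components) is what produces the 42- and 34-cluster counts.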

Structural insight: Three causes of fragmentation: (1) WG shopping -- authors submit to multiple WGs hoping one sticks. (2) Parallel invention -- teams in isolation solving the same problem. (3) Strategic duplication -- organizations maximizing surface area. The data lets us distinguish these.

What makes it worth reading alone: The concrete examples. 14 ways to do OAuth for agents. People share this out of horrified fascination.

Ends with: "Fragmentation is costly but fixable -- teams can converge. The deeper problem is what nobody is building at all."


Post 4: "What Nobody's Building (And Why It Matters)"

File: 04-what-nobody-builds.md
Word count: 2000-2500

THIS IS THE CLIMAX OF THE SERIES.

Key thesis: The most dangerous gaps in AI agent standardization are not where competing solutions exist -- they are where no solutions exist at all. The three critical gaps address what happens when autonomous agents fail or misbehave, and these scenarios have received almost no attention.

Key data points to include:

  • 12 gaps total: 3 critical, 6 high, 3 medium
  • Critical Gap 1: Behavior Verification -- no mechanisms to verify agents follow declared policies. 44 safety drafts vs 361 total.
  • Critical Gap 2: Resource Management -- 93 autonomous netops drafts, no agent-specific resource management framework.
  • Critical Gap 3: Error Recovery and Rollback -- only 6 ideas from 1 draft (the starkest absence in the corpus).
  • High Gap: Cross-Protocol Translation -- 120 A2A protocols, zero ideas for cross-protocol interop.
  • High Gap: Human Override -- 30 human-agent drafts vs 120 A2A vs 93 autonomous netops. CHEQ exists but no emergency override protocol.
  • The 4:1 ratio revisited: safety deficit is not just numerical, it is structural. Safety requires cross-WG coordination that the bloc structure cannot produce.
  • Gap severity correlates with coordination difficulty

For each critical gap, include a scenario: "What goes wrong if this is never addressed?" -- make the gaps concrete and visceral.

What makes it worth reading alone: The fear factor. This is the "what keeps you up at night" post.

Ends with: "The gaps are real. But so are the solutions -- 628 ideas that multiple organizations independently agree on, scattered across the corpus with no connective tissue."


Post 5: "Where 361 Drafts Converge (And Where They Don't)"

File: 05-1262-ideas.md
Word count: 2000-2500

Key thesis: Beneath the fragmentation, genuine consensus is forming. 628 technical ideas have been independently proposed by 2+ organizations -- cross-org convergence signals that reveal what the industry actually agrees on, regardless of which protocol camp they belong to.

IMPORTANT NOTE ON FRAMING: Our pipeline extracts ~5 ideas per draft mechanically (avg 4.9). The raw count (~1,780) is inflated and not the story. The story is which ideas survive cross-org validation -- the 628 that appear across different organizations. That is the defensible, meaningful metric. The raw extraction count should appear only in methodology context, not as a headline number.
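The cross-org filter described above is simple to state precisely: an idea counts as convergent only when it appears in drafts from at least two distinct organizations, so intra-org repetition alone never qualifies. A minimal sketch, assuming idea occurrences arrive as (idea, draft, org) tuples; the function name and shapes are illustrative, not the pipeline's actual schema.

```python
from collections import defaultdict

def cross_org_ideas(occurrences):
    """Keep only ideas backed by 2+ distinct organizations.

    `occurrences` is an iterable of (idea, draft, org) tuples.
    Returns {idea: sorted list of orgs} for convergent ideas only.
    """
    orgs_by_idea = defaultdict(set)
    for idea, _draft, org in occurrences:
        orgs_by_idea[idea].add(org)
    return {
        idea: sorted(orgs)
        for idea, orgs in orgs_by_idea.items()
        if len(orgs) >= 2  # intra-org repetition alone does not count
    }
```

Applied to the full corpus, this filter is what shrinks the mechanically extracted ~1,780 ideas down to the 628 that carry a genuine consensus signal.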

Key data points to include:

  • 628 cross-org convergent ideas (ideas in 2+ drafts from different organizations) -- the headline metric
  • Top convergence: "A2A Communication Paradigm" (8 orgs, 5 countries), "AI Agent Network Architecture" (8 orgs), "Multi-Agent Communication Protocol" (7 orgs)
  • Org-pair overlap matrix: Chinese intra-bloc alignment (Huawei-China Unicom: 32 shared ideas) vs thin cross-regional signal (Ericsson-Inria: 21)
  • Cross-org ideas that span Chinese-Western divide: 180 ideas (genuine cross-cultural consensus)
  • Gap-to-convergence mapping: which gaps have cross-org attention, which have none?
  • The "big 6" ambitious proposals: VOLT, ECT, CHEQ, STAMP, DAAP, ADL -- standout ideas regardless of convergence metrics
  • The absent ideas: capability degradation signaling, multi-agent transaction semantics, agent migration, privacy-preserving discovery, agent cost/billing

Structural insight: Convergence and fragmentation coexist. Teams agree on WHAT needs building (628 ideas converge). They disagree on HOW (120 competing A2A protocols). The gap between "what" and "how" is where architecture is needed.

What makes it worth reading alone: The cross-org convergence data is actionable -- builders can see which ideas have multi-org backing vs single-team proposals.

Ends with: "628 ideas the industry agrees on, 12 gaps nobody is filling, and a question: what would it look like if someone drew the big picture?"


Post 6: "Drawing the Big Picture: What the Agent Ecosystem Actually Needs"

File: 06-big-picture.md
Word count: 2000-2500

THIS IS THE RESOLUTION AND CAPSTONE.

Key thesis: The landscape needs not more protocols but connective tissue -- a holistic ecosystem architecture providing a shared execution model (DAGs), human oversight primitives, protocol-agnostic interoperability, and assurance profiles that work from dev to regulated production.

Key data points to include:

  • Full synthesis: 361 drafts, 557 authors, 628 cross-org convergent ideas, 12 gaps, 18 team blocs, 42 overlap clusters
  • The proposed 5-draft ecosystem: AEM (architecture), ATD (task DAG), HITL (human-in-the-loop), AEPB (protocol binding), APAE (assurance profiles)
  • How this builds on existing work: SPIFFE (identity), WIMSE (security context), ECT (execution evidence)
  • The dual-regime insight: same execution model must work in K8s (fast/relaxed) AND regulated environments (proofs/attestation)
  • Predictions based on data trajectories
  • What builders should do TODAY: which drafts to watch, which gaps to fill, which patterns to adopt

Structural insight: The ecosystem needs five layers and existing work covers ~60%. Missing pieces: (1) DAG orchestration semantics, (2) HITL as first-class, (3) protocol translation, (4) assurance profiles. These map precisely to the critical and high-severity gaps.

What makes it worth reading alone: The vision. The forward-looking piece people share with their teams.

Ends with: "The IETF has navigated standardization sprints before. The drafts are being written. The question is whether architecture or fragmentation wins the race."


Post 7: "How We Built This: Analyzing 361 IETF Drafts with Claude and Ollama"

File: 07-how-we-built-this.md
Word count: 1500-2000

Key thesis: LLM-powered document analysis at scale is practical, cheap, and effective -- with careful engineering around caching, cost optimization, and hybrid model strategies.

Key data points to include:

  • Pipeline: fetch (Datatracker API) -> analyze (Claude Sonnet) -> embed (Ollama nomic-embed-text) -> ideas (Claude Haiku, batched) -> gaps (Claude Sonnet)
  • Cost: ~$3.16 for 260 drafts; Haiku batch mode cut costs ~10x for idea extraction
  • Hybrid strategy: Claude for analysis (reasoning), Ollama for embeddings (local, free, fast)
  • Caching via llm_cache table (SHA256 prompt hash) -- zero waste on re-runs
  • Tech: Python + Click + SQLite + FTS5 + httpx + rich + anthropic SDK + ollama
  • 13 CLI commands, 13+ visualizations, 11 report types
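The SHA256 prompt-hash caching pattern can be sketched in a few lines; the table and column names below are illustrative assumptions modeled on the `llm_cache` description, not the project's actual schema.

```python
import hashlib
import sqlite3

def cached_llm_call(db: sqlite3.Connection, prompt: str, call_llm) -> str:
    """Return a cached LLM response for this prompt; call the model only on a miss."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    db.execute(
        "CREATE TABLE IF NOT EXISTS llm_cache "
        "(prompt_hash TEXT PRIMARY KEY, response TEXT)"
    )
    row = db.execute(
        "SELECT response FROM llm_cache WHERE prompt_hash = ?", (key,)
    ).fetchone()
    if row:
        return row[0]  # cache hit: zero API cost on re-runs
    response = call_llm(prompt)
    db.execute(
        "INSERT INTO llm_cache (prompt_hash, response) VALUES (?, ?)",
        (key, response),
    )
    db.commit()
    return response
```

Because the key is a hash of the full prompt, any change to the prompt template invalidates exactly the affected entries, while unchanged drafts are served from SQLite for free.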

What makes it worth reading alone: Practical engineering details for anyone building similar systems.

Ends with: Cross-link to Post 8 (the meta post about the agent team).


Recurring Motifs (thread across all posts)

  1. The 4:1 Safety Deficit: Introduced in Post 1, deepened in Post 4, resolved in Post 6. The series' signature metric.

  2. The Highway/Traffic Light Metaphor: The IETF is building highways (protocols) before traffic lights (safety, verification, override). Use sparingly but consistently.

  3. Fragmentation vs. Architecture: Bottom-up protocol proliferation vs. top-down ecosystem design. Posts 3 and 6 are the poles of this tension.

  4. Concentration and Absence: Huawei's dominance and Western absence. Introduced in Post 2, revisited in Post 6.

  5. The Islands Problem: Team blocs as islands. Ideas cluster within orgs. Cross-pollination is thin. The ecosystem needs bridges, not more islands.


Data Needs Per Post (for the Analyst)

| Post | Data Needed |
|------|-------------|
| 1 | Updated counts (361), category breakdown with new drafts, growth timeline, score distribution |
| 2 | Author/org rankings (refreshed for 361), bloc details, cross-org matrix, Chinese vs Western counts |
| 3 | OAuth cluster details (14 drafts with approaches), near-duplicate pairs, overlap clusters, A2A count |
| 4 | Full gap details, per-gap idea counts, safety ratio, category vs gap matrix |
| 5 | Full idea taxonomy, cross-org idea overlap, common ideas, unique ideas, idea-to-gap mapping |
| 6 | Synthesis: top-level stats, gap fill estimates, category growth rates, WG adoption signals |
| 7 | Pipeline stats: API call counts, costs, cache hit rates, timing |

Missing Analyses the Coder Should Build

  1. Category Trend Analysis (Posts 1, 3, 6): Monthly breakdown per category. Growth rates. Which accelerating, which plateauing?

  2. RFC Cross-Reference Map (Posts 5, 6): Which RFCs do the 361 drafts build on? Reveals the foundation layer.

  3. Cross-Org Idea Overlap (Post 5): Ideas in 2+ drafts from different orgs = genuine consensus signal.

  4. Draft Status / WG Adoption (Post 6): Which drafts adopted by WGs? Which past -00? Traction vs aspiration.


Tone and Style

  • Data-driven but narrative: Every claim backed by a number, every number wrapped in a story.
  • Authoritative but accessible: Analysis, not advocacy. Let the data argue.
  • Opinionated where data supports it: The safety deficit is a problem. Fragmentation is costly. Concentration is concerning.
  • Name names: Specific drafts, authors, organizations. This is journalism.
  • Lead with surprise: Each post opens with its most unexpected finding.
  • End with forward link: Each post teases the next.
  • 1500-2500 words per post: Dense enough to be substantial, short enough to finish.

PART B: READER-FACING SERIES INTRODUCTION

What happens when the internet's standards body tries to build the rules for AI agents -- in real time, with 361 drafts, 557 authors, and a 4:1 safety deficit?


About This Series

The Internet Engineering Task Force is in the middle of the largest, fastest-growing standards race in a decade. In fifteen months, AI- and agent-related Internet-Drafts went from 0.5% to 9.3% of all IETF submissions -- nearly 1 in 10. We built an automated analyzer to fetch, categorize, rate, and map every one of them.

This series tells the story of what we found: explosive growth, deep fragmentation, a concerning safety deficit, and hidden patterns that reveal where the real power lies and where the real risks lurk.

The Posts

| # | Title | What You'll Learn |
|---|-------|-------------------|
| 1 | The IETF's AI Agent Gold Rush | The numbers: 361 drafts, 0.5% to 9.3% growth in 15 months, and a 4:1 capability-to-safety ratio |
| 2 | Who's Writing the Rules for AI Agents? | The geopolitics: Huawei's 13-person bloc, Chinese institutional dominance, Western underrepresentation |
| 3 | The OAuth Wars and Other Battles | The fragmentation: 14 competing OAuth drafts, 120 A2A protocols with no interop |
| 4 | What Nobody's Building (And Why It Matters) | The gaps: 12 missing standards, 3 critical, and what goes wrong without them |
| 5 | Where 361 Drafts Converge (And Where They Don't) | The convergence: 628 cross-org ideas reveal genuine consensus beneath the fragmentation |
| 6 | Drawing the Big Picture | The vision: what the agent ecosystem actually needs and what comes next |
| 7 | How We Built This | The methodology: analyzing 361 drafts with Claude, Ollama, and Python |

How to Read

Linear (recommended): 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7

By interest:

  • Executives / decision-makers: Post 1 (overview) -> Post 4 (gaps) -> Post 6 (vision)
  • Standards participants: Post 2 (who's writing) -> Post 3 (fragmentation) -> Post 5 (ideas) -> Post 6 (vision)
  • Builders / implementers: Post 4 (gaps) -> Post 5 (ideas) -> Post 6 (vision) -> Post 7 (methodology)

Each post stands alone, but they build on each other. If you read one, make it Post 4 -- the gaps analysis is the most consequential finding.

The Data

All findings come from our open-source IETF Draft Analyzer, which fetches drafts via the Datatracker API, rates them using Claude, extracts technical ideas, detects collaboration patterns via co-authorship analysis, and identifies standardization gaps. Data current as of March 2026.

| Stat | Value |
|------|-------|
| Drafts analyzed | 361 |
| Authors mapped | 557 |
| Organizations | 230 |
| Cross-org convergent ideas | 628 |
| Gaps identified | 12 (3 critical) |
| Team blocs detected | 18 |
| Analysis cost | ~$9 |

Designed by the Architect agent, 2026-03-03.