v0.3.0: Publication-ready release with blog site, paper update, and polish

Release prep:
- Version bump to 0.3.0 (pyproject.toml, cli.py)
- Rewrite README.md with current stats (475 drafts, 713 authors, 501 ideas)
- Add CONTRIBUTING.md with dev setup and code conventions

Blog site:
- Add scripts/build-site.py (markdown → HTML with clean CSS, dark mode, nav)
- Generate static site in docs/blog/ (10 pages)
- Ready for GitHub Pages deployment

Academic paper (paper/main.tex):
- Update all counts: 474→475 drafts, 557→710 authors, 1907→462 ideas, 11→12 gaps
- Add false-positive filtering methodology (113 excluded, 361 relevant)
- Add cross-org convergence analysis (132 ideas, 33% rate)
- Add GDPR compliance gap to gap table
- Add LLM-as-judge caveats to rating methodology and limitations
- Add FIPA, IEEE P3394, W3C WoT to related work with bibliography entries
- Fix safety ratio to show monthly variation (1.5:1 to 21:1)

Pipeline:
- Fetch 1 new draft (475 total), 3 new authors (713 total)
- Fix 16 ruff lint errors across test files
- All 106 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 17:54:43 +01:00
parent e247bfef8f
commit 1ec1f69bee
34 changed files with 4268 additions and 272 deletions


@@ -0,0 +1,377 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Blog Series: The IETF's AI Agent Standards Race — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="blog-series-the-ietfs-ai-agent-standards-race">Blog Series: The IETF's AI Agent Standards Race</h1>
<h2 id="series-overview-and-narrative-arc">Series Overview and Narrative Arc</h2>
<p><em>Architectural design document governing the 7-post blog series. This document has two sections: (A) the internal narrative architecture (for the team), and (B) the reader-facing series introduction (for publication).</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<h1 id="part-a-narrative-architecture-internal">PART A: NARRATIVE ARCHITECTURE (Internal)</h1>
<h2 id="overall-thesis">Overall Thesis</h2>
<p><strong>The IETF's AI agent standardization effort is the largest, fastest-growing, and most consequential standards race in a decade -- but it is building the highways before the traffic lights.</strong></p>
<p>The data tells a story in three acts:</p>
<ol>
<li>
<p><strong>The Gold Rush</strong> (Posts 1-2): An explosion of activity, concentrated in surprising hands. 434 drafts, rapid growth in 9 months, one company writing ~16% of all drafts, Western tech giants dramatically underrepresented.</p>
</li>
<li>
<p><strong>The Fragmentation</strong> (Posts 3-4): That activity is not converging. 155 competing A2A protocols with no interoperability layer. 14 OAuth-for-agents proposals that cannot coexist. A capability-to-safety ratio that averages ~4:1 but swings from 1.5:1 to 21:1 month-to-month. Critical gaps where nobody is building at all.</p>
</li>
<li>
<p><strong>The Path Forward</strong> (Posts 5-6): The raw material for a solution exists -- <strong>130 cross-org convergent ideas</strong> (36% of unique clusters) independently proposed by multiple organizations show where genuine consensus is forming. But convergence on components is not convergence on architecture. The missing piece is not more protocols; it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles.</p>
</li>
</ol>
<p>The throughline is a question: <strong>Can the IETF assemble the architecture before the protocols ship without it?</strong></p>
<hr />
<h2 id="narrative-arc-diagram">Narrative Arc Diagram</h2>
<pre><code>TENSION
^
| Post 6: THE BIG PICTURE
| / (resolution: here's
| / what the ecosystem
| Post 4: THE GAPS -----+ actually needs)
| / (climax: what \
| / nobody's building) \
| Post 3 / Post 5 \
| FRAGMENTATION CONVERGENCE \
| / (escalation: (130 cross-org \
| / competing for solutions) Post 7
| / protocols) HOW WE
|/ BUILT THIS
Post 1 Post 2
GOLD RUSH WHO WRITES
(hook: the THE RULES
numbers) (stakes:
geopolitics)
+-----------------------------------------------------------&gt; TIME/POSTS
</code></pre>
<p><strong>The emotional arc</strong>: Wow, this is huge (Post 1) -&gt; Wait, who controls it? (Post 2) -&gt; Oh no, it is fragmenting (Post 3) -&gt; And the most important parts are missing (Post 4, the climax) -&gt; But beneath the chaos, organizations actually agree on 130 ideas (Post 5) -&gt; Here is what the finished picture looks like (Post 6, the resolution) -&gt; And here is how we figured all this out (Post 7, the coda).</p>
<hr />
<h2 id="per-post-design">Per-Post Design</h2>
<h3 id="post-1-the-ietfs-ai-agent-gold-rush">Post 1: "The IETF's AI Agent Gold Rush"</h3>
<p><strong>File</strong>: <code>01-gold-rush.md</code>
<strong>Word count</strong>: 1800-2200
<strong>Base</strong>: Existing draft at <code>data/reports/blog-post.md</code>, needs update from 260 to 434 drafts</p>
<p><strong>Key thesis</strong>: The IETF is experiencing an unprecedented standardization sprint around AI agents, with growth rates not seen since the early web standards era.</p>
<p><strong>Key data points to include</strong>:
- 434 drafts (up from 260 after keyword expansion with mcp, agentic, inference, generative, intelligent, aipref)
- Rapid growth: from 5 drafts/month (Jun 2025) to 85 drafts/month (Feb 2026)
- 557 authors from 230 organizations
- 10+ categories, with data formats/interop (174), A2A protocols (155), and identity/auth (152) leading
- Average quality score: ~3.27/5.0 (4-dim composite, range 1.25-4.75)
- Top-rated drafts: VOLT (4.75), DAAP (4.75), STAMP (4.5), TPM-attestation (4.5)
- ~4:1 safety deficit ratio on aggregate, varying from 1.5:1 to 21:1 by month (first mention -- this becomes the recurring motif)</p>
<p><strong>What makes it worth reading alone</strong>: The sheer numbers. Nobody else has quantified this. The rapid growth curve is the hook.</p>
<p><strong>Ends with</strong>: Teaser for Post 2 -- "But who is writing all these drafts? The answer is more concentrated than you'd expect."</p>
<hr />
<h3 id="post-2-whos-writing-the-rules-for-ai-agents">Post 2: "Who's Writing the Rules for AI Agents?"</h3>
<p><strong>File</strong>: <code>02-who-writes-the-rules.md</code>
<strong>Word count</strong>: 2000-2500</p>
<p><strong>Key thesis</strong>: The standards that will govern AI agents are being written by a remarkably concentrated set of authors, with geopolitical implications that the IETF community has not reckoned with.</p>
<p><strong>Key data points to include</strong>:
- Huawei: 53 authors, 69 drafts, ~16% of all drafts (up from 12% pre-expansion)
- The 13-person Huawei bloc: 22 shared drafts, 94% cohesion, core 7 (B. Liu, N. Geng, Z. Li, Q. Gao, X. Shang, J. Mao, G. Zeng) each on 13-23 drafts
- Chinese institutional ecosystem: Huawei (53) + China Mobile (24) + China Telecom (24) + China Unicom (22) + Tsinghua (13) + ZTE (12) + BUPT (14) + Pengcheng Lab (8) + Zhongguancun Lab (4) = 160+ authors
- Western underrepresentation: Google now visible (5 authors, 9 drafts) but dramatically small relative to market position. Microsoft, Apple still largely absent. Amazon has 6 authors on 6 drafts (PQ crypto, not agent-specific).
- 18 team blocs covering ~25% of 557 authors
- Cross-org collaboration is sparse: top cross-team pair (Rosenberg-Jennings, Five9/Cisco) shares only 3 drafts
- Ericsson + Inria team focused narrowly on EDHOC/post-quantum (5 people, 6 drafts, 100% cohesion)
- JPMorgan + Telefonica + Oracle on transitive attestation (Western financial sector emerging)
- Chinese orgs form a tightly linked ecosystem: Huawei-China Unicom (6 shared drafts), Tsinghua-Zhongguancun Lab (5), China Mobile-ZTE (4)</p>
<p><strong>Structural insight</strong>: Team blocs inflate apparent collaboration. When you account for intra-bloc pairs, cross-pollination between groups is thin. The landscape is a collection of islands, not a network.</p>
<p><strong>What makes it worth reading alone</strong>: The geopolitics angle. The Huawei concentration is a genuine story. The Western absence is the surprise.</p>
<p><strong>Ends with</strong>: "These 18 teams are not just writing separate drafts -- they are writing separate futures. The fragmentation runs deeper than authorship."</p>
<hr />
<h3 id="post-3-the-oauth-wars-and-other-protocol-battles">Post 3: "The OAuth Wars and Other Protocol Battles"</h3>
<p><strong>File</strong>: <code>03-oauth-wars.md</code>
<strong>Word count</strong>: 2000-2500</p>
<p><strong>Key thesis</strong>: The AI agent standards landscape is not just growing -- it is fragmenting. Multiple teams are solving the same problems independently, producing incompatible solutions that will impose real costs on implementers.</p>
<p><strong>Key data points to include</strong>:
- 14-draft OAuth-for-agents cluster: aap-oauth-profile, aylward-daap-v2, barney-caam, chen-ai-agent-auth, chen-oauth-rar, goswami-agentic-jwt, jia-oauth-scope, liu-agent-operation-auth, liu-oauth-a2a, oauth-ai-agents-on-behalf-of-user, rosenberg-oauth-aauth, song-oauth-ai-agent-auth, song-oauth-ai-agent-collaborate, yao-agent-auth
- 10-draft Agent Gateway cluster
- 25+ near-duplicate draft pairs (&gt;0.98 similarity)
- 42 topical clusters at 0.85 similarity threshold, 34 at 0.90 (see the clustering sketch after this list)
- 155 A2A protocol drafts with no interoperability layer
- Near-duplicate taxonomy: same-draft/different-WG (14), renamed (5), evolution (3), competing (2)
- Specific examples of WG shopping: draft submitted to both NMRG and OPSAWG, or both individual and WG track</p>
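<p><em>For the Coder: a minimal sketch of the threshold clustering behind the numbers above -- greedy single-link grouping over precomputed draft embeddings. Function and variable names are illustrative, not the pipeline's actual API.</em></p>
<pre><code># Sketch: single-link clustering of draft embeddings at a cosine
# threshold (0.85 / 0.90 above). Assumes rows of vecs are unit-normalized.
import numpy as np

def cluster_at_threshold(names, vecs, thresh=0.85):
    """Group drafts whose pairwise cosine similarity meets thresh."""
    sims = vecs @ vecs.T                      # cosine similarity matrix
    parent = list(range(len(names)))

    def find(i):                              # union-find root lookup
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if sims[i, j] &gt;= thresh:
                parent[find(i)] = find(j)     # merge the two clusters

    clusters = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return [c for c in clusters.values() if len(c) &gt; 1]
</code></pre>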
<p><strong>Structural insight</strong>: Three causes of fragmentation: (1) WG shopping -- authors submit to multiple WGs hoping one sticks. (2) Parallel invention -- teams in isolation solving the same problem. (3) Strategic duplication -- organizations maximizing surface area. The data lets us distinguish these.</p>
<p><strong>What makes it worth reading alone</strong>: The concrete examples. 14 ways to do OAuth for agents. People share this out of horrified fascination.</p>
<p><strong>Ends with</strong>: "Fragmentation is costly but fixable -- teams can converge. The deeper problem is what nobody is building at all."</p>
<hr />
<h3 id="post-4-what-nobodys-building-and-why-it-matters">Post 4: "What Nobody's Building (And Why It Matters)"</h3>
<p><strong>File</strong>: <code>04-what-nobody-builds.md</code>
<strong>Word count</strong>: 2000-2500</p>
<p><strong>THIS IS THE CLIMAX OF THE SERIES.</strong></p>
<p><strong>Key thesis</strong>: The most dangerous gaps in AI agent standardization are not where competing solutions exist -- they are where no solutions exist at all. The three critical gaps address what happens when autonomous agents fail or misbehave, and these scenarios have received almost no attention.</p>
<p><strong>Key data points to include</strong>:
- 11 gaps total: 2 critical, 5 high, 4 medium
- <strong>Critical Gap 1: Behavioral Verification</strong> -- no mechanisms to verify agents follow declared policies. 47 safety drafts vs 434 total.
- <strong>Critical Gap 2: Failure Cascade Prevention</strong> -- 114 autonomous netops drafts, no cascade prevention framework.
- <strong>Critical Gap 3: Error Recovery and Rollback</strong> -- only 6 ideas from 1 draft (the starkest absence in the corpus).
- <strong>High Gap: Cross-Protocol Translation</strong> -- 155 A2A protocols, zero ideas for cross-protocol interop.
- <strong>High Gap: Human Override</strong> -- 34 human-agent drafts vs 155 A2A vs 114 autonomous netops. CHEQ exists but no emergency override protocol.
- The ~4:1 ratio (varying 1.5:1 to 21:1) revisited: safety deficit is not just numerical, it is structural. Safety requires cross-WG coordination that the bloc structure cannot produce.
- Gap severity correlates with coordination difficulty</p>
<p><strong>For each critical gap, include a scenario</strong>: "What goes wrong if this is never addressed?" -- make the gaps concrete and visceral.</p>
<p><strong>What makes it worth reading alone</strong>: The fear factor. This is the "what keeps you up at night" post.</p>
<p><strong>Ends with</strong>: "The gaps are real. But so are the solutions -- 130 ideas that multiple organizations independently agree on, scattered across the corpus with no connective tissue."</p>
<hr />
<h3 id="post-5-where-434-drafts-converge-and-where-they-dont">Post 5: "Where 434 Drafts Converge (And Where They Don't)"</h3>
<p><strong>File</strong>: <code>05-1262-ideas.md</code>
<strong>Word count</strong>: 2000-2500</p>
<p><strong>Key thesis</strong>: Beneath the fragmentation, genuine consensus is forming. <strong>130 cross-org convergent ideas</strong> (36% of unique clusters) have been independently proposed by 2+ organizations -- cross-org convergence signals that reveal what the industry actually agrees on, regardless of which protocol camp they belong to.</p>
<p><strong>IMPORTANT NOTE ON FRAMING</strong>: The current database contains 419 ideas in 361 unique clusters. Cross-org convergence analysis (SequenceMatcher at 0.75 threshold) yields 130 ideas appearing across 2+ organizations. An earlier pipeline run with ~1,780 raw ideas produced 628 cross-org convergent ideas; the convergence <em>rate</em> (~36%) is consistent across both runs. The raw count is not the story. The story is which ideas survive cross-org validation. The raw extraction count should appear only in methodology context, not as a headline number.</p>
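<p><em>A minimal sketch of that convergence check, assuming (idea_text, org) pairs have already been extracted; the real pipeline reads them from SQLite and may normalize text differently.</em></p>
<pre><code># Sketch: cross-org convergence via difflib.SequenceMatcher at the
# 0.75 threshold described above. O(n^2) pairing is fine at this scale.
from difflib import SequenceMatcher

def cross_org_convergent(ideas, thresh=0.75):
    """ideas: (idea_text, org) pairs. Returns texts matched across orgs."""
    convergent = set()
    for i, (text_a, org_a) in enumerate(ideas):
        for text_b, org_b in ideas[i + 1:]:
            if org_a == org_b:
                continue  # same org: parallel work, not convergence
            ratio = SequenceMatcher(None, text_a.lower(),
                                    text_b.lower()).ratio()
            if ratio &gt;= thresh:
                convergent.add(text_a)
                convergent.add(text_b)
    return convergent
</code></pre>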
<p><strong>Key data points to include</strong>:
- <strong>130 cross-org convergent ideas</strong> (ideas in 2+ drafts from different organizations) -- the headline metric
- Top convergence: "A2A Communication Paradigm" (8 orgs, 5 countries), "AI Agent Network Architecture" (8 orgs), "Multi-Agent Communication Protocol" (7 orgs)
- Org-pair overlap matrix: Chinese intra-bloc alignment (Huawei-China Unicom: 32 shared ideas) vs thin cross-regional signal (Ericsson-Inria: 21)
- Cross-org ideas that span Chinese-Western divide: 180 ideas (genuine cross-cultural consensus)
- Gap-to-convergence mapping: which gaps have cross-org attention, which have none?
- The "big 6" ambitious proposals: VOLT, ECT, CHEQ, STAMP, DAAP, ADL -- standout ideas regardless of convergence metrics
- The absent ideas: capability degradation signaling, multi-agent transaction semantics, agent migration, privacy-preserving discovery, agent cost/billing</p>
<p><strong>Structural insight</strong>: Convergence and fragmentation coexist. Teams agree on WHAT needs building (130 ideas converge across orgs). They disagree on HOW (155 competing A2A protocols). The gap between "what" and "how" is where architecture is needed.</p>
<p><strong>What makes it worth reading alone</strong>: The cross-org convergence data is actionable -- builders can see which ideas have multi-org backing vs single-team proposals.</p>
<p><strong>Ends with</strong>: "130 ideas the industry agrees on, 11 gaps nobody is filling, and a question: what would it look like if someone drew the big picture?"</p>
<hr />
<h3 id="post-6-drawing-the-big-picture-what-the-agent-ecosystem-actually-needs">Post 6: "Drawing the Big Picture: What the Agent Ecosystem Actually Needs"</h3>
<p><strong>File</strong>: <code>06-big-picture.md</code>
<strong>Word count</strong>: 2000-2500</p>
<p><strong>THIS IS THE RESOLUTION AND CAPSTONE.</strong></p>
<p><strong>Key thesis</strong>: The landscape needs not more protocols but connective tissue -- a holistic ecosystem architecture providing a shared execution model (DAGs), human oversight primitives, protocol-agnostic interoperability, and assurance profiles that work from dev to regulated production.</p>
<p><strong>Key data points to include</strong>:
- Full synthesis: 434 drafts, 557 authors, 130 cross-org convergent ideas, 11 gaps, 18 team blocs, 42 overlap clusters
- The proposed 5-draft ecosystem: AEM (architecture), ATD (task DAG), HITL (human-in-the-loop), AEPB (protocol binding), APAE (assurance profiles)
- How this builds on existing work: SPIFFE (identity), WIMSE (security context), ECT (execution evidence)
- The dual-regime insight: same execution model must work in K8s (fast/relaxed) AND regulated environments (proofs/attestation)
- Predictions based on data trajectories
- What builders should do TODAY: which drafts to watch, which gaps to fill, which patterns to adopt</p>
<p><strong>Structural insight</strong>: The ecosystem needs five layers and existing work covers ~60%. Missing pieces: (1) DAG orchestration semantics, (2) HITL as first-class, (3) protocol translation, (4) assurance profiles. These map precisely to the critical and high-severity gaps.</p>
<p><strong>What makes it worth reading alone</strong>: The vision. The forward-looking piece people share with their teams.</p>
<p><strong>Ends with</strong>: "The IETF has navigated standardization sprints before. The drafts are being written. The question is whether architecture or fragmentation wins the race."</p>
<hr />
<h3 id="post-7-how-we-built-this-analyzing-434-ietf-drafts-with-claude-and-ollama">Post 7: "How We Built This: Analyzing 434 IETF Drafts with Claude and Ollama"</h3>
<p><strong>File</strong>: <code>07-how-we-built-this.md</code>
<strong>Word count</strong>: 1500-2000</p>
<p><strong>Key thesis</strong>: LLM-powered document analysis at scale is practical, cheap, and effective -- with careful engineering around caching, cost optimization, and hybrid model strategies.</p>
<p><strong>Key data points to include</strong>:
- Pipeline: fetch (Datatracker API) -&gt; analyze (Claude Sonnet) -&gt; embed (Ollama nomic-embed-text) -&gt; ideas (Claude Haiku, batched) -&gt; gaps (Claude Sonnet)
- Cost: ~$3.16 for 260 drafts; Haiku batch mode cut costs ~10x for idea extraction
- Hybrid strategy: Claude for analysis (reasoning), Ollama for embeddings (local, free, fast)
- Caching via llm_cache table (SHA256 prompt hash) -- zero waste on re-runs (sketched after this list)
- Tech: Python + Click + SQLite + FTS5 + httpx + rich + anthropic SDK + ollama
- 13 CLI commands, 13+ visualizations, 11 report types</p>
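<p><em>A sketch of the caching layer, assuming an llm_cache table with prompt_hash and response columns; the exact schema is an assumption.</em></p>
<pre><code># Sketch: prompt-level cache keyed by SHA256, so re-runs never pay
# twice for an identical prompt. Column names are assumed.
import hashlib
import sqlite3

def cached_llm_call(db, prompt, call_fn):
    """Return the cached response for this exact prompt, or call and store."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    row = db.execute(
        "SELECT response FROM llm_cache WHERE prompt_hash = ?", (key,)
    ).fetchone()
    if row:
        return row[0]                 # cache hit: zero API spend
    response = call_fn(prompt)        # e.g. an Anthropic SDK completion
    db.execute(
        "INSERT INTO llm_cache (prompt_hash, response) VALUES (?, ?)",
        (key, response),
    )
    db.commit()
    return response
</code></pre>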
<p><strong>What makes it worth reading alone</strong>: Practical engineering details for anyone building similar systems.</p>
<p><strong>Ends with</strong>: Cross-link to Post 8 (the meta post about the agent team).</p>
<hr />
<h2 id="recurring-motifs-thread-across-all-posts">Recurring Motifs (thread across all posts)</h2>
<ol>
<li>
<p><strong>The ~4:1 Safety Deficit</strong> (averaging ~4:1, varying from 1.5:1 to 21:1 month-to-month): Introduced in Post 1, deepened in Post 4, resolved in Post 6. The series' signature metric.</p>
</li>
<li>
<p><strong>The Highway/Traffic Light Metaphor</strong>: The IETF is building highways (protocols) before traffic lights (safety, verification, override). Use sparingly but consistently.</p>
</li>
<li>
<p><strong>Fragmentation vs. Architecture</strong>: Bottom-up protocol proliferation vs. top-down ecosystem design. Posts 3 and 6 are the poles of this tension.</p>
</li>
<li>
<p><strong>Concentration and Absence</strong>: Huawei's dominance and Western absence. Introduced in Post 2, revisited in Post 6.</p>
</li>
<li>
<p><strong>The Islands Problem</strong>: Team blocs as islands. Ideas cluster within orgs. Cross-pollination is thin. The ecosystem needs bridges, not more islands.</p>
</li>
</ol>
<hr />
<h2 id="data-needs-per-post-for-the-analyst">Data Needs Per Post (for the Analyst)</h2>
<table>
<thead>
<tr>
<th>Post</th>
<th>Data Needed</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Updated counts (361), category breakdown with new drafts, growth timeline, score distribution</td>
</tr>
<tr>
<td>2</td>
<td>Author/org rankings (refreshed for 361), bloc details, cross-org matrix, Chinese vs Western counts</td>
</tr>
<tr>
<td>3</td>
<td>OAuth cluster details (14 drafts with approaches), near-duplicate pairs, overlap clusters, A2A count</td>
</tr>
<tr>
<td>4</td>
<td>Full gap details, per-gap idea counts, safety ratio, category vs gap matrix</td>
</tr>
<tr>
<td>5</td>
<td>Full idea taxonomy, cross-org idea overlap, common ideas, unique ideas, idea-to-gap mapping</td>
</tr>
<tr>
<td>6</td>
<td>Synthesis: top-level stats, gap fill estimates, category growth rates, WG adoption signals</td>
</tr>
<tr>
<td>7</td>
<td>Pipeline stats: API call counts, costs, cache hit rates, timing</td>
</tr>
</tbody>
</table>
<hr />
<h2 id="missing-analyses-the-coder-should-build">Missing Analyses the Coder Should Build</h2>
<ol>
<li>
<p><strong>Category Trend Analysis</strong> (Posts 1, 3, 6): Monthly breakdown per category. Growth rates. Which accelerating, which plateauing?</p>
</li>
<li>
<p><strong>RFC Cross-Reference Map</strong> (Posts 5, 6): Which RFCs do the 434 drafts build on? Reveals the foundation layer.</p>
</li>
<li>
<p><strong>Cross-Org Idea Overlap</strong> (Post 5): Ideas in 2+ drafts from different orgs = genuine consensus signal.</p>
</li>
<li>
<p><strong>Draft Status / WG Adoption</strong> (Post 6): Which drafts adopted by WGs? Which past -00? Traction vs aspiration.</p>
</li>
</ol>
<hr />
<h2 id="tone-and-style">Tone and Style</h2>
<ul>
<li><strong>Data-driven but narrative</strong>: Every claim backed by a number, every number wrapped in a story.</li>
<li><strong>Authoritative but accessible</strong>: Analysis, not advocacy. Let the data argue.</li>
<li><strong>Opinionated where data supports it</strong>: The safety deficit is a problem. Fragmentation is costly. Concentration is concerning.</li>
<li><strong>Name names</strong>: Specific drafts, authors, organizations. This is journalism.</li>
<li><strong>Lead with surprise</strong>: Each post opens with its most unexpected finding.</li>
<li><strong>End with forward link</strong>: Each post teases the next.</li>
<li><strong>1500-2500 words per post</strong>: Dense enough to be substantial, short enough to finish.</li>
</ul>
<hr />
<h1 id="part-b-reader-facing-series-introduction">PART B: READER-FACING SERIES INTRODUCTION</h1>
<p><em>What happens when the internet's standards body tries to build the rules for AI agents -- in real time, with 434 drafts, 557 authors, and a ~4:1 safety deficit (varying from 1.5:1 to 21:1 by month)?</em></p>
<hr />
<h2 id="about-this-series">About This Series</h2>
<p>The Internet Engineering Task Force is in the middle of the largest, fastest-growing standards race in a decade. In fifteen months, AI- and agent-related Internet-Drafts went from <strong>0.5% to 9.3%</strong> of all IETF submissions -- nearly 1 in 10. We built an automated analyzer to fetch, categorize, rate, and map every one of them.</p>
<p>This series tells the story of what we found: explosive growth, deep fragmentation, a concerning safety deficit, and hidden patterns that reveal where the real power lies and where the real risks lurk.</p>
<h2 id="the-posts">The Posts</h2>
<table>
<thead>
<tr>
<th>#</th>
<th>Title</th>
<th>What You'll Learn</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td><a href="01-gold-rush.md">The IETF's AI Agent Gold Rush</a></td>
<td>The numbers: 434 drafts, 0.5% to 9.3% growth in 15 months, and a ~4:1 capability-to-safety ratio (varying 1.5:1 to 21:1)</td>
</tr>
<tr>
<td>2</td>
<td><a href="02-who-writes-the-rules.md">Who's Writing the Rules for AI Agents?</a></td>
<td>The geopolitics: Huawei's 13-person bloc, Chinese institutional dominance, Western underrepresentation</td>
</tr>
<tr>
<td>3</td>
<td><a href="03-oauth-wars.md">The OAuth Wars and Other Battles</a></td>
<td>The fragmentation: 14 competing OAuth drafts, 155 A2A protocols with no interop</td>
</tr>
<tr>
<td>4</td>
<td><a href="04-what-nobody-builds.md">What Nobody's Building (And Why It Matters)</a></td>
<td>The gaps: 11 missing standards, 2 critical, and what goes wrong without them</td>
</tr>
<tr>
<td>5</td>
<td><a href="05-1262-ideas.md">Where 434 Drafts Converge (And Where They Don't)</a></td>
<td>The convergence: 130 cross-org ideas reveal genuine consensus beneath the fragmentation</td>
</tr>
<tr>
<td>6</td>
<td><a href="06-big-picture.md">Drawing the Big Picture</a></td>
<td>The vision: what the agent ecosystem actually needs and what comes next</td>
</tr>
<tr>
<td>7</td>
<td><a href="07-how-we-built-this.md">How We Built This</a></td>
<td>The methodology: analyzing 434 drafts with Claude, Ollama, and Python</td>
</tr>
</tbody>
</table>
<h2 id="how-to-read">How to Read</h2>
<p><strong>Linear (recommended)</strong>: 1 -&gt; 2 -&gt; 3 -&gt; 4 -&gt; 5 -&gt; 6 -&gt; 7</p>
<p><strong>By interest</strong>:
- <strong>Executives / decision-makers</strong>: Post 1 (overview) -&gt; Post 4 (gaps) -&gt; Post 6 (vision)
- <strong>Standards participants</strong>: Post 2 (who's writing) -&gt; Post 3 (fragmentation) -&gt; Post 5 (ideas) -&gt; Post 6 (vision)
- <strong>Builders / implementers</strong>: Post 4 (gaps) -&gt; Post 5 (ideas) -&gt; Post 6 (vision) -&gt; Post 7 (methodology)</p>
<p>Each post stands alone, but they build on each other. If you read one, make it <strong>Post 4</strong> -- the gaps analysis is the most consequential finding.</p>
<h2 id="the-data">The Data</h2>
<p>All findings come from our open-source IETF Draft Analyzer, which fetches drafts via the Datatracker API, rates them using Claude, extracts technical ideas, detects collaboration patterns via co-authorship analysis, and identifies standardization gaps. Data current as of March 2026.</p>
<table>
<thead>
<tr>
<th>Stat</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Drafts analyzed</td>
<td>434</td>
</tr>
<tr>
<td>Authors mapped</td>
<td>557</td>
</tr>
<tr>
<td>Organizations</td>
<td>230</td>
</tr>
<tr>
<td>Cross-org convergent ideas</td>
<td>130</td>
</tr>
<tr>
<td>Gaps identified</td>
<td>11 (2 critical)</td>
</tr>
<tr>
<td>Team blocs detected</td>
<td>18</td>
</tr>
<tr>
<td>Analysis cost</td>
<td>~$9</td>
</tr>
</tbody>
</table>
<hr />
<p><em>Designed by the Architect agent, 2026-03-03.</em></p>
<div class="post-nav"><span></span><a href="/blog/posts/01-gold-rush.html">The Gold Rush &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -0,0 +1,312 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>The IETF's AI Agent Gold Rush: 434 Drafts, 557 Authors, and the Race to Define How AI Agents Talk — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<strong>Rush</strong>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="the-ietfs-ai-agent-gold-rush-434-drafts-557-authors-and-the-race-to-define-how-ai-agents-talk">The IETF's AI Agent Gold Rush: 434 Drafts, 557 Authors, and the Race to Define How AI Agents Talk</h1>
<p><em>Fifteen months ago, AI agents barely registered at the IETF. Today, nearly 1 in 10 new Internet-Drafts is about AI agents. We analyzed every one.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>For every Internet-Draft addressing how to keep an AI agent safe, roughly four are building new capabilities for it. That is the single most important number in this analysis.</p>
<p>We built an automated pipeline to fetch, categorize, rate, and map every AI- and agent-related Internet-Draft currently in the IETF system. We found <strong>434 drafts</strong> from <strong>557 authors</strong> at <strong>230 organizations</strong> and identified <strong>11 standardization gaps</strong> -- two of them critical. The result is the most comprehensive public analysis of the IETF's AI agent landscape to date.</p>
<p>The story the data tells is not subtle: the internet's most important standards body is in the middle of a gold rush, and the prospectors are moving faster than the safety inspectors.</p>
<h2 id="the-growth-curve">The Growth Curve</h2>
<p>In 2024, just <strong>9 AI/agent-related drafts</strong> were submitted to the IETF -- <strong>0.5%</strong> of all submissions. By Q1 2026, AI/agent drafts account for <strong>9.3%</strong> of all new Internet-Drafts. Nearly 1 in 10.</p>
<table>
<thead>
<tr>
<th>Year</th>
<th style="text-align: right;">Total IETF Drafts</th>
<th style="text-align: right;">AI/Agent Drafts</th>
<th style="text-align: right;">AI Share</th>
</tr>
</thead>
<tbody>
<tr>
<td>2021</td>
<td style="text-align: right;">1,108</td>
<td style="text-align: right;">~0</td>
<td style="text-align: right;">~0%</td>
</tr>
<tr>
<td>2022</td>
<td style="text-align: right;">1,121</td>
<td style="text-align: right;">~0</td>
<td style="text-align: right;">~0%</td>
</tr>
<tr>
<td>2023</td>
<td style="text-align: right;">1,241</td>
<td style="text-align: right;">~0</td>
<td style="text-align: right;">~0%</td>
</tr>
<tr>
<td>2024</td>
<td style="text-align: right;">1,651</td>
<td style="text-align: right;">9</td>
<td style="text-align: right;">0.5%</td>
</tr>
<tr>
<td>2025</td>
<td style="text-align: right;">2,696</td>
<td style="text-align: right;">190</td>
<td style="text-align: right;">7.0%</td>
</tr>
<tr>
<td>2026 (Q1)</td>
<td style="text-align: right;">1,748</td>
<td style="text-align: right;">162</td>
<td style="text-align: right;">9.3%</td>
</tr>
</tbody>
</table>
<p>The IETF itself accelerated 2.4x from 2021 to 2025. But AI/agent work went from essentially zero to dominant topic in under two years. The acceleration is not gradual. Submissions surged rapidly beginning in mid-2025 -- from 5 drafts in June 2025 to 61 in October 2025 to 85 in February 2026 -- and have not slowed.</p>
<p>This growth is driven by a convergence of forces: the explosion of commercial AI agent deployments (ChatGPT plugins, Anthropic's Claude tools, Google's Gemini agents), the emergence of protocols like MCP and A2A that need standardization, and the recognition across the industry that AI agents communicating over the internet without agreed-upon identity, security, and interoperability standards is a problem that gets worse every month it goes unaddressed.</p>
<p>(A note on methodology: our pipeline searches the Datatracker for 12 keywords -- <code>agent</code>, <code>ai-agent</code>, <code>llm</code>, <code>autonomous</code>, <code>machine-learning</code>, <code>artificial-intelligence</code>, <code>mcp</code>, <code>agentic</code>, <code>inference</code>, <code>generative</code>, <code>intelligent</code>, and <code>aipref</code> -- across both draft names and abstracts. We started with 6 keywords and 260 drafts, then expanded to 12 to capture MCP-related work, generative AI infrastructure, and intelligent networking. The full methodology is in <a href="07-how-we-built-this.md">Post 7</a>.)</p>
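<p><em>For the curious, here is roughly what one keyword query looks like against the Datatracker API. The endpoint is Datatracker's public v1 API; the specific filter parameters are a simplified assumption, and abstract matching works analogously.</em></p>
<pre><code># Sketch: fetch drafts whose name matches a keyword. Filter params
# are illustrative, not the pipeline's verbatim query.
import httpx

KEYWORDS = ["agent", "ai-agent", "llm", "autonomous", "machine-learning",
            "artificial-intelligence", "mcp", "agentic", "inference",
            "generative", "intelligent", "aipref"]

def fetch_matching_drafts(keyword, limit=100):
    """Return draft records whose name contains the keyword."""
    resp = httpx.get(
        "https://datatracker.ietf.org/api/v1/doc/document/",
        params={"name__contains": keyword, "type": "draft",
                "format": "json", "limit": limit},
        timeout=30.0,
    )
    resp.raise_for_status()
    return resp.json()["objects"]
</code></pre>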
<p>The drafts span ten categories, and the distribution reveals priorities:</p>
<table>
<thead>
<tr>
<th>Category</th>
<th style="text-align: right;">Drafts</th>
<th style="text-align: right;">Share</th>
</tr>
</thead>
<tbody>
<tr>
<td>Data formats and interoperability</td>
<td style="text-align: right;">174</td>
<td style="text-align: right;">40%</td>
</tr>
<tr>
<td>A2A protocols</td>
<td style="text-align: right;">155</td>
<td style="text-align: right;">36%</td>
</tr>
<tr>
<td>Agent identity and authentication</td>
<td style="text-align: right;">152</td>
<td style="text-align: right;">35%</td>
</tr>
<tr>
<td>Autonomous network operations</td>
<td style="text-align: right;">114</td>
<td style="text-align: right;">26%</td>
</tr>
<tr>
<td>Policy and governance</td>
<td style="text-align: right;">109</td>
<td style="text-align: right;">25%</td>
</tr>
<tr>
<td>Agent discovery and registration</td>
<td style="text-align: right;">89</td>
<td style="text-align: right;">21%</td>
</tr>
<tr>
<td>ML traffic management</td>
<td style="text-align: right;">79</td>
<td style="text-align: right;">18%</td>
</tr>
<tr>
<td>AI safety and alignment</td>
<td style="text-align: right;">47</td>
<td style="text-align: right;">11%</td>
</tr>
<tr>
<td>Model serving and inference</td>
<td style="text-align: right;">42</td>
<td style="text-align: right;">10%</td>
</tr>
<tr>
<td>Human-agent interaction</td>
<td style="text-align: right;">34</td>
<td style="text-align: right;">8%</td>
</tr>
</tbody>
</table>
<p>Note that drafts can belong to multiple categories, so percentages exceed 100%. The dominance of plumbing -- data formats, identity, and communication protocols -- is expected for an early-stage standards effort. What is unexpected is how little attention the safety and human-oversight categories receive.</p>
<p>The ecosystem's DNA is visible in what it cites. We parsed <strong>4,231 cross-references</strong> from the drafts, and the foundation is clear: <strong>TLS 1.3</strong> (RFC 8446, cited by 42 drafts), <strong>OAuth 2.0</strong> (RFC 6749, 36 drafts), <strong>HTTP Semantics</strong> (RFC 9110, 34 drafts), and <strong>JWT</strong> (RFC 7519, 22 drafts). The agent identity/auth category is essentially built on top of the OAuth stack. The entire landscape stands on a security foundation -- which makes the 4:1 safety deficit all the more jarring.</p>
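<p><em>A simplified version of that reference extraction: scan each draft's text for RFC mentions and count one citation per citing draft. The real parser may handle more reference formats.</em></p>
<pre><code># Sketch: tally RFC citations, one vote per citing draft.
import re
from collections import Counter

RFC_REF = re.compile(r"\bRFC\s*(\d{3,5})\b", re.IGNORECASE)

def count_rfc_citations(draft_texts):
    """draft_texts: {draft_name: full_text}. Counts drafts citing each RFC."""
    counts = Counter()
    for text in draft_texts.values():
        for num in set(RFC_REF.findall(text)):  # dedupe within one draft
            counts["RFC " + num] += 1
    return counts
</code></pre>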
<h2 id="the-safety-deficit">The Safety Deficit</h2>
<p>The ratio is stark:</p>
<table>
<thead>
<tr>
<th>Focus Area</th>
<th style="text-align: right;">Drafts</th>
</tr>
</thead>
<tbody>
<tr>
<td>A2A protocols</td>
<td style="text-align: right;">155</td>
</tr>
<tr>
<td>Autonomous operations</td>
<td style="text-align: right;">114</td>
</tr>
<tr>
<td>Agent identity/auth</td>
<td style="text-align: right;">152</td>
</tr>
<tr>
<td><strong>AI safety/alignment</strong></td>
<td style="text-align: right;"><strong>47</strong></td>
</tr>
<tr>
<td><strong>Human-agent interaction</strong></td>
<td style="text-align: right;"><strong>34</strong></td>
</tr>
</tbody>
</table>
<p>The capability-to-safety ratio is roughly 4:1 on aggregate, though it varies significantly by time period -- from as low as 1.5:1 in some months to over 20:1 in others. The overall trend is clear: for every draft about keeping agents safe, approximately four are building new capabilities. The community is building the highways and forgetting the traffic lights.</p>
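<p><em>The monthly ratio is straightforward to recompute from the categorized drafts; a sketch follows, with table and category slugs as illustrative stand-ins for the real schema.</em></p>
<pre><code># Sketch: monthly capability-to-safety ratio from the drafts database.
import sqlite3
from collections import defaultdict

CAPABILITY = {"a2a-protocols", "autonomous-netops", "identity-auth"}
SAFETY = {"ai-safety", "human-agent-interaction"}

def monthly_safety_ratio(db):
    caps, safe = defaultdict(int), defaultdict(int)
    rows = db.execute(
        "SELECT strftime('%Y-%m', submitted), category FROM draft_categories"
    )
    for month, category in rows:
        if category in CAPABILITY:
            caps[month] += 1
        elif category in SAFETY:
            safe[month] += 1
    # report a ratio only for months with at least one safety draft
    return {m: caps[m] / safe[m] for m in caps if safe.get(m)}
</code></pre>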
<p>This is not an abstract concern. Imagine an AI agent managing cloud infrastructure that detects a spurious anomaly, autonomously scales down a critical service, and triggers a cascading outage across three availability zones. Today, there is no standard mechanism to verify that the agent followed its declared policy before acting. No standard way to roll back the decision once the cascade begins. No standard protocol for a human operator to issue an emergency stop. The critical gaps our analysis identified -- behavioral verification, failure cascade prevention, and error recovery -- are all about what happens when things go wrong. And in a world of autonomous AI agents, things will go wrong.</p>
<p>The safety drafts that do exist are often among the highest-rated in our analysis. <a href="https://datatracker.ietf.org/doc/draft-aylward-daap-v2/">draft-aylward-daap-v2</a> -- a comprehensive accountability protocol -- and <a href="https://datatracker.ietf.org/doc/draft-cowles-volt/">draft-cowles-volt</a> -- a tamper-evident execution trace format -- each scored 4.75 out of 5 (4-dimension composite excluding overlap), the highest in the entire corpus. <a href="https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/">draft-birkholz-verifiable-agent-conversations</a>, which defines verifiable conversation records using cryptographic signing, scored 4.5. The quality is there. The quantity is not.</p>
<h2 id="whos-writing-the-drafts">Who's Writing the Drafts</h2>
<p>The organizational picture is as revealing as the technical one. The top contributors:</p>
<table>
<thead>
<tr>
<th>Organization</th>
<th style="text-align: right;">Authors</th>
<th style="text-align: right;">Drafts</th>
</tr>
</thead>
<tbody>
<tr>
<td>Huawei</td>
<td style="text-align: right;">53</td>
<td style="text-align: right;">69</td>
</tr>
<tr>
<td>China Mobile</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">35</td>
</tr>
<tr>
<td>Cisco</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">26</td>
</tr>
<tr>
<td>Independent</td>
<td style="text-align: right;">19</td>
<td style="text-align: right;">25</td>
</tr>
<tr>
<td>China Telecom</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">24</td>
</tr>
<tr>
<td>China Unicom</td>
<td style="text-align: right;">22</td>
<td style="text-align: right;">21</td>
</tr>
<tr>
<td>Tsinghua University</td>
<td style="text-align: right;">13</td>
<td style="text-align: right;">16</td>
</tr>
<tr>
<td>ZTE Corporation</td>
<td style="text-align: right;">12</td>
<td style="text-align: right;">12</td>
</tr>
<tr>
<td>Five9</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">10</td>
</tr>
<tr>
<td>Ericsson</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">9</td>
</tr>
</tbody>
</table>
<p><strong>Huawei</strong> leads by a wide margin: <strong>53 authors</strong> contributing to <strong>69 drafts</strong> (across all Huawei entities) -- about 16% of the entire corpus. But the concentration goes deeper than raw numbers -- the next post will examine the team bloc structure, geopolitics, and what the collaboration network reveals about where power really lies.</p>
<p>Cisco and China Mobile each have 24 authors, but China Mobile's team produces 35 drafts to Cisco's 26. Ericsson has only 4 authors but punches above its weight with 9 focused drafts. Independent contributors account for 25 drafts -- a healthy sign of grassroots engagement.</p>
<h2 id="the-fragmentation-problem">The Fragmentation Problem</h2>
<p>The drafts are not just numerous; they are redundant. Our embedding-based similarity analysis found <strong>25+ draft pairs</strong> with greater than 0.98 cosine similarity -- functionally identical proposals submitted under different names.</p>
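<p><em>A sketch of that near-duplicate detection, assuming one embedding vector per draft (nomic-embed-text in our pipeline); names are illustrative.</em></p>
<pre><code># Sketch: flag near-duplicate draft pairs by cosine similarity.
import numpy as np

def near_duplicate_pairs(names, vecs, thresh=0.98):
    """Return (draft_a, draft_b, similarity) triples above the threshold."""
    normed = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = normed @ normed.T
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if sims[i, j] &gt; thresh:
                pairs.append((names[i], names[j], float(sims[i, j])))
    return sorted(pairs, key=lambda p: -p[2])
</code></pre>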
<p>The most crowded space is OAuth for AI agents: <strong>14 separate drafts</strong> all trying to solve how AI agents authenticate and get authorized. They range from broad framework proposals (<a href="https://datatracker.ietf.org/doc/draft-aap-oauth-profile/">draft-aap-oauth-profile</a>) to narrow extensions (<a href="https://datatracker.ietf.org/doc/draft-jia-oauth-scope-aggregation/">draft-jia-oauth-scope-aggregation</a>) to full accountability systems (<a href="https://datatracker.ietf.org/doc/draft-aylward-daap-v2/">draft-aylward-daap-v2</a>). None are compatible with each other.</p>
<p>Beyond OAuth, the broader A2A protocol landscape includes <strong>155 drafts</strong> with no interoperability layer. The most common technical idea in the entire corpus -- "Multi-Agent Communication Protocol" -- appears in 8 separate drafts from different teams. And the fragmentation goes deeper than protocols: the vast majority of technical ideas extracted from the corpus appear in exactly one draft. Everyone is solving the same problem. Nobody is solving it together.</p>
<p>This fragmentation has real costs. Implementers face confusion over which draft to follow. The IETF process slows as competing proposals vie for working group adoption. And the longer competing drafts proliferate without convergence, the higher the risk of incompatible deployments that entrench fragmentation rather than resolving it.</p>
<h2 id="what-the-best-drafts-look-like">What the Best Drafts Look Like</h2>
<p>Not everything is chaos. Our quality ratings -- scoring novelty, maturity, overlap avoidance, momentum, and relevance on a 1-5 scale -- surface drafts that are doing the hard work well:</p>
<table>
<thead>
<tr>
<th>Draft</th>
<th style="text-align: right;">Score</th>
<th>What It Does</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-aylward-daap-v2/">draft-aylward-daap-v2</a></td>
<td style="text-align: right;">4.75</td>
<td>Comprehensive AI agent accountability with authentication, monitoring, enforcement</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-guy-bary-stamp-protocol/">draft-guy-bary-stamp-protocol</a></td>
<td style="text-align: right;">4.5</td>
<td>Cryptographic delegation and proof for agent task execution</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-drake-email-tpm-attestation/">draft-drake-email-tpm-attestation</a></td>
<td style="text-align: right;">4.5</td>
<td>Hardware attestation for email via TPM verification chains</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-ietf-lake-app-profiles/">draft-ietf-lake-app-profiles</a></td>
<td style="text-align: right;">4.5</td>
<td>Canonical CBOR for EDHOC application profiles</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/">draft-birkholz-verifiable-agent-conversations</a></td>
<td style="text-align: right;">4.5</td>
<td>Verifiable agent conversation records with COSE signing</td>
</tr>
</tbody>
</table>
<p>Scores are 4-dimension composites (novelty, maturity, momentum, relevance), excluding overlap. The average score across all 434 rated drafts is 3.27. The best work combines clear problem definition with concrete mechanisms and low overlap with existing proposals. The worst drafts are me-too proposals that restate problems already solved elsewhere.</p>
<p><em>Methodology note: Quality ratings are LLM-generated (Claude Sonnet) from draft abstracts only, not full text. No human calibration has been performed. Scores should be treated as relative rankings within this corpus, not absolute quality measures. See <a href="07-how-we-built-this.md">How We Built This</a> and the <a href="../methodology.md">Methodology</a> document for details.</em></p>
<h2 id="what-comes-next">What Comes Next</h2>
<p>The IETF has navigated technology gold rushes before -- the early web, IoT, DNS security. In each case, the first wave of competing proposals eventually converged, and the lasting standards came from those who focused on interoperability and safety alongside capability.</p>
<p>The AI agent wave is following the same early pattern. The landscape has quantity. The question is whether it develops architecture -- and whether the safety work catches up before the capability work ships without it.</p>
<p>This blog series will dig into the questions the data raises. The next post starts with the most fundamental: who, exactly, is writing the rules?</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>434 drafts</strong> from <strong>557 authors</strong> at <strong>230 organizations</strong> -- AI/agent work went from <strong>0.5% to 9.3%</strong> of all IETF submissions in 15 months</li>
<li>The capability-to-safety ratio (roughly <strong>4:1 on aggregate</strong>, varying from 1.5:1 to 21:1 by month) is the most concerning structural finding</li>
<li><strong>Huawei</strong> dominates authorship with 53 authors on 69 drafts (~16% of corpus); Chinese-linked institutions account for 160+ authors</li>
<li><strong>14 competing OAuth-for-agents proposals</strong> illustrate deep fragmentation; 155 A2A protocol drafts have no interoperability layer</li>
<li><strong>11 standardization gaps</strong> remain, with the 2 most critical relating to what happens when agents fail</li>
</ul>
<p><em>Next in this series: <a href="02-who-writes-the-rules.md">Who's Writing the Rules for AI Agents?</a> -- Inside the team blocs, geopolitics, and collaboration networks behind the IETF's AI agent standards.</em></p>
<hr />
<p><em>Analysis conducted using the IETF Draft Analyzer. Data current as of March 2026. All 434 drafts, 557 authors, and full analysis data are available in the project's SQLite database.</em></p>
<div class="post-nav"><a href="/blog/posts/00-series-overview.html">&larr; Series Overview</a><a href="/blog/posts/02-who-writes-the-rules.html">Who Writes the Rules &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -0,0 +1,303 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Who's Writing the Rules for AI Agents? — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<strong>Rules</strong>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="whos-writing-the-rules-for-ai-agents">Who's Writing the Rules for AI Agents?</h1>
<p><em>Inside the team blocs, geopolitics, and collaboration networks shaping the future of AI agent standards.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>Thirteen people from one company co-author 22 Internet-Drafts at 94% internal cohesion. Their work covers agent networking, identity management, communication protocols, and network troubleshooting. Together, they represent the single most coordinated standards-writing campaign in the IETF's AI agent space.</p>
<p>They all work at Huawei.</p>
<p>This is the story of who is writing the rules for AI agents, what their collaboration networks reveal, and why the geography of authorship matters more than most people realize.</p>
<h2 id="the-numbers-behind-the-names">The Numbers Behind the Names</h2>
<p>Our analysis mapped <strong>557 unique authors</strong> from <strong>230 organizations</strong> across the 434 AI/agent drafts in the IETF pipeline. But those topline numbers mask extreme concentration.</p>
<table>
<thead>
<tr>
<th>Organization</th>
<th style="text-align: right;">Authors</th>
<th style="text-align: right;">Drafts</th>
</tr>
</thead>
<tbody>
<tr>
<td>Huawei</td>
<td style="text-align: right;">53</td>
<td style="text-align: right;">69</td>
</tr>
<tr>
<td>China Mobile</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">35</td>
</tr>
<tr>
<td>Cisco</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">26</td>
</tr>
<tr>
<td>Independent</td>
<td style="text-align: right;">19</td>
<td style="text-align: right;">25</td>
</tr>
<tr>
<td>China Telecom</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">24</td>
</tr>
<tr>
<td>China Unicom</td>
<td style="text-align: right;">22</td>
<td style="text-align: right;">21</td>
</tr>
<tr>
<td>Tsinghua University</td>
<td style="text-align: right;">13</td>
<td style="text-align: right;">16</td>
</tr>
<tr>
<td>ZTE Corporation</td>
<td style="text-align: right;">12</td>
<td style="text-align: right;">12</td>
</tr>
<tr>
<td>Five9</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">10</td>
</tr>
<tr>
<td>Ericsson</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">9</td>
</tr>
</tbody>
</table>
<p>One company -- Huawei -- contributes about 16% of all drafts (69 across all Huawei-named entities, consolidated from Huawei, Huawei Technologies, Huawei Canada, etc.). The major Chinese-linked organizations together contribute over 160 authors. This is not a general pattern across the IETF; it is specific to the AI agent space, and it tells a story about who considers these standards strategically important.</p>
<h2 id="the-huawei-drafting-machine">The Huawei Drafting Machine</h2>
<p>The Huawei team bloc is worth examining in detail because it illustrates a pattern -- organized, coordinated standards campaigns -- that is characteristic of how some institutions approach the IETF.</p>
<p>The 13-person core team includes:</p>
<table>
<thead>
<tr>
<th>Author</th>
<th style="text-align: right;">Drafts</th>
<th>Role in Team</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bing Liu</td>
<td style="text-align: right;">23</td>
<td>Top contributor, appears on most team drafts</td>
</tr>
<tr>
<td>Zhenbin Li</td>
<td style="text-align: right;">21</td>
<td>Core, agent networking frameworks</td>
</tr>
<tr>
<td>Nan Geng</td>
<td style="text-align: right;">20</td>
<td>Core, near-total overlap with Liu</td>
</tr>
<tr>
<td>Qiangzhou Gao</td>
<td style="text-align: right;">20</td>
<td>Core, cross-device communication</td>
</tr>
<tr>
<td>Xiaotong Shang</td>
<td style="text-align: right;">19</td>
<td>Core, network measurement and troubleshooting</td>
</tr>
<tr>
<td>Jianwei Mao</td>
<td style="text-align: right;">14</td>
<td>Communication protocol gap analysis</td>
</tr>
<tr>
<td>Guanming Zeng</td>
<td style="text-align: right;">13</td>
<td>MCP and NETCONF for agents</td>
</tr>
</tbody>
</table>
<p>The remaining six members contribute 2-5 drafts each. The team's <strong>94% cohesion</strong> means that nearly every possible pair of members shares the vast majority of their drafts. This is not casual co-authorship; it is a systematic drafting operation.</p>
<p>Their 22 drafts cover a specific territory: agent networking frameworks for enterprise and broadband networks, agent identity management, cross-device communication, MCP integration for network troubleshooting, and agent gateway requirements. The focus is heavily on <strong>autonomous network operations</strong> and <strong>A2A protocols</strong> -- the infrastructure layer of the agent ecosystem.</p>
<p>Two deeper metrics reveal the nature of this operation:</p>
<p><strong>Volume over iteration.</strong> Across the entire corpus, <strong>55% of all 434 drafts</strong> have never been revised beyond their first submission (rev-00). But the rate varies dramatically by organization. Of Huawei's drafts, <strong>65% are at rev-00</strong>. Compare that to Ericsson (11%), Siemens (0%), Nokia (20%), or Boeing (0%). The most serious iterators -- Boeing (avg 28.2 revisions per draft), Siemens (17.2), Sandelman Software (14.3) -- submit far fewer drafts but iterate relentlessly. Western companies submit fewer drafts but revise heavily -- incorporating feedback, advancing toward maturity. Huawei's pattern is the opposite: submit at volume, iterate rarely. Submitting a draft is cheap. Iterating it signals genuine investment.</p>
<p><strong>Campaign timing.</strong> Of Huawei's drafts, <strong>43 were submitted in the four weeks before IETF 121 Dublin</strong> -- 62% of the company's entire output, packed into a single pre-meeting window. For context, the entire corpus had 107 drafts in that period. Huawei alone accounted for <strong>40% of all pre-IETF 121 submissions</strong>. This is not organic growth. It is a coordinated submission campaign timed for maximum standards-body impact.</p>
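<p><em>The timing analysis reduces to a simple window count; a sketch, with the meeting's start date supplied by the caller.</em></p>
<pre><code># Sketch: count submissions in the window before an IETF meeting.
from datetime import timedelta

def pre_meeting_count(submission_dates, meeting_start, window_days=28):
    """Count dates in the window_days before meeting_start."""
    window_open = meeting_start - timedelta(days=window_days)
    return sum(1 for d in submission_dates
               if window_open &lt;= d &lt; meeting_start)
</code></pre>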
<p>Beyond the main team, the company has additional smaller blocs. No other organization comes close to this level of coordinated output.</p>
<h2 id="the-chinese-institutional-ecosystem">The Chinese Institutional Ecosystem</h2>
<p>Huawei does not operate in isolation. The Chinese organizations in this space form a densely interconnected collaboration network.</p>
<table>
<thead>
<tr>
<th>Org A</th>
<th>Org B</th>
<th style="text-align: right;">Shared Drafts</th>
</tr>
</thead>
<tbody>
<tr>
<td>China Unicom</td>
<td>Huawei</td>
<td style="text-align: right;">6</td>
</tr>
<tr>
<td>Tsinghua University</td>
<td>Zhongguancun Laboratory</td>
<td style="text-align: right;">5</td>
</tr>
<tr>
<td>China Mobile</td>
<td>ZTE Corporation</td>
<td style="text-align: right;">4</td>
</tr>
<tr>
<td>China Mobile</td>
<td>Huawei</td>
<td style="text-align: right;">4</td>
</tr>
<tr>
<td>BUPT</td>
<td>Tsinghua University</td>
<td style="text-align: right;">3</td>
</tr>
<tr>
<td>China Telecom</td>
<td>Huawei</td>
<td style="text-align: right;">3</td>
</tr>
<tr>
<td>BUPT</td>
<td>China Telecom</td>
<td style="text-align: right;">3</td>
</tr>
<tr>
<td>CAICT</td>
<td>Huawei</td>
<td style="text-align: right;">3</td>
</tr>
</tbody>
</table>
<p>The structure has three tiers:</p>
<p><strong>Tier 1: Telecom operators</strong> -- China Mobile (24 authors, 35 drafts), China Telecom (24 authors, 24 drafts), China Unicom (22 authors, 21 drafts). These organizations bring domain expertise in network operations and 6G requirements. Their drafts focus heavily on use cases: agents for 6G networks, agent-based network management, traffic optimization.</p>
<p><strong>Tier 2: Equipment vendor</strong> -- Huawei (53 authors, 69 drafts), ZTE Corporation (12 authors, 12 drafts). Huawei's dominance here is striking; ZTE's contribution is modest by comparison. These drafts focus on architecture and protocols -- the building blocks rather than the use cases.</p>
<p><strong>Tier 3: Research institutions</strong> -- Tsinghua University (13 authors, 16 drafts), BUPT (14 authors, 7 drafts), Zhongguancun Laboratory (4 authors, 6 drafts), CAICT (8 authors, 6 drafts). These institutions bridge the gap between industry and academia, often co-authoring with both telecom operators and Huawei.</p>
<p>The Zhongguancun Laboratory team (4 members, 5 shared drafts, 94% cohesion) is led by Yong Cui of Tsinghua University, one of the most prolific individual authors with 8 drafts spanning agent discovery, network management benchmarking, and LLM-assisted operations. His work includes <a href="https://datatracker.ietf.org/doc/draft-cui-nmrg-llm-benchmark/">draft-cui-nmrg-llm-benchmark</a> (score 4.3) -- one of the highest-rated drafts in the corpus.</p>
<p>The China Telecom team (6 members from China Telecom, BUPT, and Tsinghua) focuses on 6G agent use cases and IoA task protocols. Their drafts are more forward-looking than Huawei's -- less about current network operations, more about where agents fit in next-generation infrastructure.</p>
<h2 id="where-is-the-west">Where Is the West?</h2>
<p>The absence is as telling as the presence.</p>
<p><strong>Google</strong>: 5 authors, 9 drafts -- a notable increase, but still thin relative to the company's agent platform presence (Gemini agents, A2A protocol).</p>
<p><strong>Microsoft</strong>: Minimal presence.</p>
<p><strong>Apple</strong>: Two authors, two drafts -- both about mail automation (<a href="https://datatracker.ietf.org/doc/draft-ietf-mailmaint-pacc/">draft-ietf-mailmaint-pacc</a>, <a href="https://datatracker.ietf.org/doc/draft-eggert-mailmaint-uaautoconf/">draft-eggert-mailmaint-uaautoconf</a>). Not about AI agents per se.</p>
<p><strong>Amazon</strong>: 6 authors, 6 drafts -- primarily post-quantum cryptography work (ML-KEM hybrid key exchange), not agent-specific.</p>
<p><strong>Cisco</strong>: The most active Western tech company with 24 authors across 26 drafts, but spread thinly. Three separate Cisco blocs cover different areas: Cullen Fluffy Jennings and Suhas Nandakumar work on A2A transport and agent identity; another team (Muscariello, Papalini, Sardara, Betts) works on AGNTCY messaging; a third (Farinacci, Rodriguez-Natal, Maino) works on LISP-based networking. No single coordinated campaign.</p>
<p><strong>Ericsson</strong>: 4 authors, 9 drafts -- focused on EDHOC lightweight authentication, a mature protocol effort led by Goran Selander. High quality (scores 3.2-4.1) but narrow scope.</p>
<p>The pattern is clear: Western companies are either absent from AI agent standardization or participating in adjacent security/crypto work rather than the core agent protocol space. The reasons likely include strategic focus on proprietary agent ecosystems (Google's Gemini, Apple's Siri agents), less tradition of IETF engagement in the agent/AI space, and the assumption that de facto standards (MCP, A2A) will matter more than de jure IETF ones.</p>
<p>This bet may prove wrong. IETF standards have a way of becoming the infrastructure that everyone must eventually support.</p>
<h2 id="the-team-bloc-landscape">The Team Bloc Landscape</h2>
<p>Beyond Huawei, our co-authorship analysis detected <strong>18 team blocs</strong> covering a significant fraction of the 557 authors. Each bloc is a group where members share at least 70% pairwise draft overlap and 3+ shared drafts.</p>
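<p>To make the detection rule concrete, here is a minimal sketch of the pairwise test in Python. The author names and draft sets are hypothetical, and the overlap denominator (the smaller of the two draft sets) is our assumption about the analyzer's exact definition:</p>
<pre><code>from itertools import combinations

# Hypothetical input: author -&gt; set of drafts they co-authored.
drafts_by_author = {
    "alice": {"draft-a", "draft-b", "draft-c", "draft-d"},
    "bob":   {"draft-a", "draft-b", "draft-c"},
    "carol": {"draft-a", "draft-b", "draft-c", "draft-e"},
}

def is_bloc_pair(a, b, min_shared=3, min_overlap=0.70):
    """Bloc rule: 3+ shared drafts and at least 70% pairwise overlap."""
    shared = len(a &amp; b)
    return shared &gt;= min_shared and shared / min(len(a), len(b)) &gt;= min_overlap

bloc_pairs = [
    (x, y) for x, y in combinations(drafts_by_author, 2)
    if is_bloc_pair(drafts_by_author[x], drafts_by_author[y])
]
print(bloc_pairs)  # all three pairs qualify in this toy example
</code></pre>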
<p>The most notable non-Chinese blocs:</p>
<p><strong>Ericsson team</strong> (5 members, 6 drafts, 100% cohesion) -- Goran Selander and colleagues lead this European effort focused on EDHOC authentication and lightweight key exchange for constrained devices. They collaborate with Inria (France) and the University of Murcia (Spain). Their work (<a href="https://datatracker.ietf.org/doc/draft-spm-lake-pqsuites/">draft-spm-lake-pqsuites</a>, score 4.1) represents some of the most mature protocol work in the corpus.</p>
<p><strong>Five9/Bitwave team</strong> (2 members, 6 drafts, 100% cohesion) -- Jonathan Rosenberg (Five9) and Pat White (Bitwave) are the most prolific Western contributors to core agent protocols. Their drafts span the full stack: CHEQ for human confirmation of agent decisions (<a href="https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/">draft-rosenberg-aiproto-cheq</a>, score 3.9), N-ACT for agent-to-tool communication, and an OAuth extension for agent authentication. Rosenberg is also the strongest cross-team bridge, sharing 3 drafts with Cisco's Cullen Fluffy Jennings -- the single strongest cross-bloc connection we found.</p>
<p><strong>ISI, R.C. ATHENA team</strong> (4 members, 4 drafts, 100% cohesion) -- A Greek research institute producing post-quantum authentication work for EDHOC. All four members (Haleplidis, Fraile, Fournaris, Koulamas) co-author every draft. Their <a href="https://datatracker.ietf.org/doc/draft-lake-pocero-authkem-ikr-edhoc/">draft-lake-pocero-authkem-ikr-edhoc</a> scored 4.2.</p>
<p><strong>JPMorgan/multi-org team</strong> (4 members from JPMorgan, Oracle, Telefonica, Aryaka; 2 drafts, 100% cohesion) -- The most cross-organizational Western bloc. Their work on transitive attestation (<a href="https://datatracker.ietf.org/doc/draft-mw-wimse-transitive-attestation/">draft-mw-wimse-transitive-attestation</a>, score 4.3) and actor chains (<a href="https://datatracker.ietf.org/doc/draft-mw-spice-actor-chain/">draft-mw-spice-actor-chain</a>, score 4.1) addresses the safety and accountability space. Notably, these are among the highest-scored drafts in the corpus.</p>
<h2 id="the-cross-pollination-problem">The Cross-Pollination Problem</h2>
<p>Once you account for team blocs, the cross-team collaboration picture is sparse. The top cross-bloc connection -- Jonathan Rosenberg bridging Five9/Bitwave and Cisco -- involves just 3 shared drafts. Most cross-team pairs share only 1.</p>
<p>Our network centrality analysis reveals who bridges these gaps. Of 557 authors, only <strong>115 (about 21%)</strong> co-author with people from both Chinese and Western organizations. The top bridge-builders are not from the organizations you might expect:</p>
<table>
<thead>
<tr>
<th>Author</th>
<th>Organization</th>
<th style="text-align: right;">BC Score</th>
<th style="text-align: right;">CN Neighbors</th>
<th style="text-align: right;">Western Neighbors</th>
</tr>
</thead>
<tbody>
<tr>
<td>Luis M. Contreras</td>
<td>Telefonica</td>
<td style="text-align: right;">0.035</td>
<td style="text-align: right;">11</td>
<td style="text-align: right;">3</td>
</tr>
<tr>
<td>Qin Wu</td>
<td>Huawei</td>
<td style="text-align: right;">0.035</td>
<td style="text-align: right;">12</td>
<td style="text-align: right;">11</td>
</tr>
<tr>
<td>Muhammad Awais Jadoon</td>
<td>InterDigital</td>
<td style="text-align: right;">0.023</td>
<td style="text-align: right;">9</td>
<td style="text-align: right;">4</td>
</tr>
<tr>
<td>Diego Lopez</td>
<td>Telefonica</td>
<td style="text-align: right;">0.013</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">9</td>
</tr>
<tr>
<td>Giuseppe Fioccola</td>
<td>Huawei</td>
<td style="text-align: right;">0.009</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">8</td>
</tr>
</tbody>
</table>
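<p>For readers who want to reproduce this kind of table: betweenness centrality (the "BC Score" column) measures how often an author sits on the shortest paths between other authors in the co-authorship graph. A minimal sketch with networkx, using a toy graph rather than our dataset -- names and org labels are illustrative:</p>
<pre><code>import networkx as nx

# Toy co-authorship graph: an edge means two authors share a draft.
G = nx.Graph()
G.add_edges_from([
    ("wu", "li"), ("wu", "zhang"),                   # intra-bloc ties (CN)
    ("selander", "birkholz"),                        # intra-bloc tie (W)
    ("wu", "contreras"), ("contreras", "selander"),  # the bridge
])
org = {"wu": "CN", "li": "CN", "zhang": "CN",
       "selander": "W", "birkholz": "W", "contreras": "W"}

bc = nx.betweenness_centrality(G, normalized=True)
for author, score in sorted(bc.items(), key=lambda kv: -kv[1]):
    cn = sum(org[n] == "CN" for n in G[author])
    west = sum(org[n] == "W" for n in G[author])
    print(f"{author:10s} BC={score:.3f} CN={cn} W={west}")
</code></pre>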
<p>The structural glue holding the two blocs together is <strong>European telecoms</strong> -- Telefonica, InterDigital, Deutsche Telekom. Not US Big Tech. Not any formal cross-standards body. A handful of European companies, through their authors' co-authorship ties, provide the only significant cross-divide connectivity. Qin Wu (Huawei) is the most balanced individual bridge, with nearly equal Chinese and Western co-author networks. But these bridges are thin: remove any two or three of these people, and the network fragments further.</p>
<p>The sparseness of these bridges becomes even more concerning when you look at what the two blocs are building <em>on</em>. Our RFC cross-reference analysis (detailed in Post 3) reveals that the Chinese and Western blocs cite fundamentally different technology stacks. The Chinese agent ecosystem is being built on <strong>network management protocols</strong> -- YANG (RFC 7950), NETCONF (RFC 6241), and autonomic networking (RFC 7575). The Western ecosystem is being built on <strong>IoT security and web infrastructure</strong> -- COSE (RFC 9052), CBOR (RFC 8949), CoAP (RFC 7252), HTTP Semantics (RFC 9110), and EDHOC (RFC 9528). The only shared foundation is <strong>OAuth 2.0</strong> -- which explains why the OAuth-for-agents space has 14 competing proposals. It is the one piece of common ground, and everyone is fighting over it.</p>
<p>This means the cross-pollination problem is deeper than "different teams working separately." The two blocs are building on incompatible infrastructure. Even if they agreed on an agent communication pattern, the underlying plumbing diverges.</p>
<p>The IETF's consensus process works best when different implementation perspectives collide and reconcile. In the AI agent space, those collisions are rare. The Chinese institutional ecosystem collaborates internally but has limited connections to Western contributors. The European cryptographic teams (Ericsson, RISE, ATHENA) work on authentication foundations but do not connect to the agent protocol teams. The American startups (Five9, Bitwave) and enterprise companies (Cisco) work on adjacent problems without shared architectural framing.</p>
<p>The one exception is Fraunhofer SIT's Henk Birkholz and Tradeverifyd's Orie Steele, whose <a href="https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/">draft-birkholz-verifiable-agent-conversations</a> (score 4.5) and <a href="https://datatracker.ietf.org/doc/draft-steele-agent-considerations/">draft-steele-agent-considerations</a> (score 4.0) represent rare cross-cultural, safety-focused work from German and American collaborators.</p>
<h2 id="what-this-means">What This Means</h2>
<p>Three implications emerge from the authorship data:</p>
<p><strong>1. Volume and influence are not the same thing.</strong> Huawei's 69 drafts represent about 16% of the corpus, but 65% have never been revised. The IETF rewards sustained engagement -- drafts that iterate through feedback cycles, reach working group adoption, and mature toward RFC status. A campaign that optimizes for volume at a pre-meeting deadline is playing a different game than one that optimizes for adoption. The quality scores bear this out: Huawei's team averages around 3.1, respectable but not exceptional. The organizations doing the deepest work (Ericsson averaging 4.8 revisions per draft, Siemens 17.2) submit far fewer drafts but iterate relentlessly.</p>
<p><strong>2. The safety work comes from unexpected places.</strong> The highest-quality safety and accountability drafts come not from the high-volume drafters but from smaller, specialized teams: Aylward (independent), Birkholz/Steele (Fraunhofer/Tradeverifyd), Rosenberg/White (Five9/Bitwave), and the JPMorgan-led multi-org team. The organizations doing the most drafting are focused on capability; the organizations doing the best safety work are doing the least drafting.</p>
<p><strong>3. The IETF needs more bridges.</strong> Cross-team, cross-organization, cross-geography collaboration is the weakest link in the current landscape. Our centrality analysis shows that European telecoms -- not US Big Tech -- are the structural glue between Chinese and Western blocs. The standards that will endure are the ones where Chinese telecom expertise, European cryptographic rigor, and American agent-platform experience converge. Right now, those worlds barely overlap, and the few bridges that exist depend on a handful of individuals.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>Huawei dominates</strong> with 53 authors on 69 drafts (~16% of corpus); their 13-person core team co-authors 22 drafts at 94% cohesion -- but 65% of those drafts have never been revised, and 43 were submitted in a single 4-week pre-meeting window</li>
<li><strong>Chinese institutions</strong> collectively contribute 160+ of 557 authors; they form a tightly interconnected collaboration ecosystem</li>
<li><strong>Google has 9 drafts but Microsoft and Apple are largely absent</strong> from AI agent standardization -- a notable strategic gap</li>
<li><strong>18 team blocs</strong> detected; cross-team collaboration is sparse, with most cross-bloc pairs sharing only 1 draft</li>
<li><strong>Only about 21% of authors (115 of 557) bridge the Chinese-Western divide</strong>; European telecoms (Telefonica, InterDigital) are the structural glue -- not US Big Tech</li>
<li><strong>The best safety work</strong> comes from smaller, specialized teams -- not from the high-volume drafters</li>
</ul>
<p><em>Next in this series: <a href="03-oauth-wars.html">The OAuth Wars and Other Battles</a> -- 14 competing proposals, 155 A2A protocols, and what fragmentation costs the internet.</em></p>
<hr />
<p><em>Data from the IETF Draft Analyzer, covering 434 drafts, 557 authors, and 18 detected team blocs. Co-authorship analysis uses 70% pairwise draft overlap threshold with 3+ shared drafts.</em></p>
<div class="post-nav"><a href="/blog/posts/01-gold-rush.html">&larr; The Gold Rush</a><a href="/blog/posts/03-oauth-wars.html">The OAuth Wars &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -0,0 +1,373 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>The OAuth Wars and Other Battles — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<strong>Wars</strong>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="the-oauth-wars-and-other-battles">The OAuth Wars and Other Battles</h1>
<p><em>14 competing proposals, 155 protocols with no interop layer, and 25+ near-duplicate pairs. Inside the IETF's AI agent fragmentation problem.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>Fourteen separate Internet-Drafts are trying to solve the same problem: how should AI agents authenticate and get authorized using OAuth? They are not collaborating. They are not compatible. And they were all submitted in the same nine-month window.</p>
<p>This is the fragmentation problem, and it is not limited to OAuth. Across the IETF's AI agent landscape, our analysis found the same pattern repeated in agent discovery, multi-agent communication, intent-based routing, and 6G agent requirements. Teams are working in parallel, not together, and the cost is measured in wasted effort, confused implementers, and the growing risk of incompatible deployments.</p>
<h2 id="the-oauth-cluster-14-ways-to-solve-one-problem">The OAuth Cluster: 14 Ways to Solve One Problem</h2>
<p>The most crowded corner of the AI agent standards landscape is OAuth for agents. Every proposal is trying to answer the same fundamental question: when an AI agent acts on behalf of a user -- or on its own -- how does it prove its identity and obtain permission?</p>
<p>The depth of this cluster is not surprising when you look at the ecosystem's foundations. Our cross-reference analysis of all 434 drafts found that <strong>OAuth 2.0</strong> (RFC 6749) is cited by <strong>36 drafts</strong>, <strong>JWT</strong> (RFC 7519) by <strong>22</strong>, <strong>OAuth Bearer</strong> (RFC 6750) by <strong>9</strong>, and <strong>DPoP</strong> (RFC 9449) by <strong>9</strong>. The OAuth stack is the most-referenced functional standard in the entire corpus after TLS. The agent identity problem runs through the landscape like a root system.</p>
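<p>The citation counts above come from a simple tally of RFC references across draft texts. A sketch of the approach, assuming the drafts are available as plain-text files (the directory layout and regex are ours, not a documented pipeline interface):</p>
<pre><code>import re
from collections import Counter
from pathlib import Path

RFC_REF = re.compile(r"\bRFC\s?(\d{3,4})\b")

def count_rfc_citations(draft_dir):
    """Count how many drafts cite each RFC (once per draft)."""
    totals = Counter()
    for path in Path(draft_dir).glob("*.txt"):
        text = path.read_text(errors="ignore")
        totals.update(set(RFC_REF.findall(text)))  # set(): once per draft
    return totals

# Hypothetical usage:
# counts = count_rfc_citations("data/drafts")
# counts["6749"]  -&gt; number of drafts citing OAuth 2.0
</code></pre>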
<p>Here are all 14 drafts:</p>
<table>
<thead>
<tr>
<th>Draft</th>
<th>Approach</th>
<th style="text-align: right;">Score</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-aylward-daap-v2/">draft-aylward-daap-v2</a></td>
<td>Comprehensive accountability protocol</td>
<td style="text-align: right;">4.75</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-goswami-agentic-jwt/">draft-goswami-agentic-jwt</a></td>
<td>Agentic JWT for autonomous systems</td>
<td style="text-align: right;">4.5</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-chen-oauth-rar-agent-extensions/">draft-chen-oauth-rar-agent-extensions</a></td>
<td>RAR extensions for agent policy</td>
<td style="text-align: right;">4.2</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-aap-oauth-profile/">draft-aap-oauth-profile</a></td>
<td>OAuth 2.0 profile for autonomous agents</td>
<td style="text-align: right;">4.2</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-barney-caam/">draft-barney-caam</a></td>
<td>Contextual agent authorization mesh</td>
<td style="text-align: right;">4.0</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-liu-agent-operation-authorization/">draft-liu-agent-operation-authorization</a></td>
<td>Verifiable delegation via JWT</td>
<td style="text-align: right;">4.1</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-rosenberg-oauth-aauth/">draft-rosenberg-oauth-aauth</a></td>
<td>OAuth for agents on PSTN/SMS</td>
<td style="text-align: right;">3.6</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-oauth-ai-agents-on-behalf-of-user/">draft-oauth-ai-agents-on-behalf-of-user</a></td>
<td>On-behalf-of-user extension</td>
<td style="text-align: right;">3.7</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-jia-oauth-scope-aggregation/">draft-jia-oauth-scope-aggregation</a></td>
<td>Scope aggregation for multi-step workflows</td>
<td style="text-align: right;">3.5</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-liu-oauth-a2a-profile/">draft-liu-oauth-a2a-profile</a></td>
<td>A2A profile for transaction tokens</td>
<td style="text-align: right;">3.6</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-song-oauth-ai-agent-authorization/">draft-song-oauth-ai-agent-authorization</a></td>
<td>Target-based authorization</td>
<td style="text-align: right;">2.8</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-song-oauth-ai-agent-collaborate-authz/">draft-song-oauth-ai-agent-collaborate-authz</a></td>
<td>Multi-agent collaboration authz</td>
<td style="text-align: right;">3.5</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-chen-ai-agent-auth-new-requirements/">draft-chen-ai-agent-auth-new-requirements</a></td>
<td>New auth requirements analysis</td>
<td style="text-align: right;">3.8</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-yao-agent-auth-considerations/">draft-yao-agent-auth-considerations</a></td>
<td>Auth considerations analysis</td>
<td style="text-align: right;">3.1</td>
</tr>
</tbody>
</table>
<p><em>(Scores are LLM-generated relative rankings from abstracts, not human expert assessments. See <a href="../methodology.html">Methodology</a>.)</em></p>
<p>The quality range is enormous -- from 2.8 to 4.75 -- and the approaches barely overlap. Some extend OAuth 2.0 with new grant types. Others define entirely new token formats (Agentic JWT). Still others propose mesh architectures or accountability layers on top of existing auth flows. Two drafts (song-oauth-ai-agent-authorization and song-oauth-ai-agent-collaborate-authz) come from the same Huawei team and address different facets of the problem. Two more (chen-oauth-rar-agent-extensions and chen-ai-agent-auth-new-requirements) come from a China Mobile team.</p>
<p>The gap our analysis identified in this cluster: most focus on <strong>single-agent authorization</strong>. Few address chained delegation across multiple agents, and none standardize real-time revocation in agent-to-agent workflows. An agent that obtains a token and delegates a sub-task to another agent -- which then delegates further -- creates a chain of trust that no single draft adequately covers.</p>
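<p>For readers unfamiliar with how delegation chains look on the wire: OAuth 2.0 Token Exchange (RFC 8693) already defines an <code>act</code> (actor) claim whose nesting records prior actors. A hypothetical two-hop agent delegation, expressed as a JWT payload -- every value here is illustrative:</p>
<pre><code># Hypothetical JWT payload using the "act" claim from RFC 8693.
# The outermost "act" is the current actor; nesting records prior actors.
token_payload = {
    "iss": "https://auth.example",        # issuer (illustrative)
    "sub": "user@example.com",            # the human the work runs for
    "aud": "https://api.example/orders",
    "scope": "orders:write",
    "act": {                              # current actor: the sub-agent
        "sub": "agent://scheduler-42",
        "act": {                          # prior actor: delegating agent
            "sub": "agent://assistant-7",
        },
    },
}
# What no current draft standardizes: how scopes narrow at each hop,
# and how revoking the user's grant propagates down this chain.
</code></pre>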
<p>A note on terminology: "consent" in the OAuth context means a technical authorization flow where a user delegates access scopes to a client. This is distinct from GDPR consent (<em>Einwilligung</em>) under Art. 6(1)(a) GDPR, which must be freely given, specific, informed, and unambiguous, and is revocable at any time. When AI agents further delegate to sub-agents, the chain of GDPR-valid consent may break entirely -- a problem none of these 14 drafts addresses. The controller-processor relationship under Art. 28 GDPR imposes additional requirements (data processing agreements, sub-processor authorization) that go beyond what any OAuth extension can express on its own.</p>
<h2 id="the-agent-gateway-melee-10-drafts">The Agent Gateway Melee: 10 Drafts</h2>
<p>If OAuth for agents is about identity, the agent gateway cluster is about communication architecture. Ten drafts are competing to define how agents from different platforms and ecosystems collaborate:</p>
<table>
<thead>
<tr>
<th>Draft</th>
<th>Approach</th>
<th style="text-align: right;">Score</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-li-dmsc-macp/">draft-li-dmsc-macp</a></td>
<td>Multi-agent collaboration protocol suite</td>
<td style="text-align: right;">4.2</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-agent-gw/">draft-agent-gw</a></td>
<td>Semantic routing gateway</td>
<td style="text-align: right;">3.9</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-cui-dmsc-agent-cdi/">draft-cui-dmsc-agent-cdi</a></td>
<td>Cross-domain interop framework</td>
<td style="text-align: right;">3.0</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-han-rtgwg-agent-gateway-intercomm-framework/">draft-han-rtgwg-agent-gateway-intercomm-framework</a></td>
<td>Gateway intercommunication</td>
<td style="text-align: right;">3.6</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-li-dmsc-inf-architecture/">draft-li-dmsc-inf-architecture</a></td>
<td>DMSC infrastructure architecture</td>
<td style="text-align: right;">3.1</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-liu-dmsc-acps-arc/">draft-liu-dmsc-acps-arc</a></td>
<td>Agent collaboration protocols arch</td>
<td style="text-align: right;">3.6</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-yang-dmsc-ioa-task-protocol/">draft-yang-dmsc-ioa-task-protocol</a></td>
<td>IoA task protocol</td>
<td style="text-align: right;">3.0</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-yang-ioa-protocol/">draft-yang-ioa-protocol</a></td>
<td>IoA protocol</td>
<td style="text-align: right;">3.6</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/">draft-fu-nmop-agent-communication-framework</a></td>
<td>Network AIOps comm framework</td>
<td style="text-align: right;">3.0</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-campbell-agentic-http/">draft-campbell-agentic-http</a></td>
<td>HTTP best practices</td>
<td style="text-align: right;">--</td>
</tr>
</tbody>
</table>
<p>A revealing pattern: five of these ten drafts reference "DMSC" -- Dynamic Multi-agent Secured Collaboration -- a concept pushed primarily by Chinese institutions through the IETF's DMSC side meeting. This cluster represents an organized attempt to define the agent collaboration architecture, but even within that effort, multiple competing proposals have emerged.</p>
<p>The gap: no draft in this cluster addresses <strong>dynamic trust establishment between gateways</strong>, or how to handle conflicting semantic schemas across ecosystems. If Agent Gateway A speaks MCP and Agent Gateway B speaks A2A Protocol, these drafts describe the need for translation but do not provide it.</p>
<h2 id="the-near-duplicate-epidemic">The Near-Duplicate Epidemic</h2>
<p>Our embedding-based similarity analysis produced a more troubling finding: <strong>25+ draft pairs</strong> have cosine similarity above 0.98. Many are functionally identical proposals submitted under different names:</p>
<table>
<thead>
<tr>
<th>Draft A</th>
<th>Draft B</th>
<th>Reason</th>
</tr>
</thead>
<tbody>
<tr>
<td>draft-a2a-moqt-transport</td>
<td>draft-nandakumar-a2a-moqt-transport</td>
<td>Same content, different name</td>
</tr>
<tr>
<td>draft-abbey-scim-agent-extension</td>
<td>draft-scim-agent-extension</td>
<td>Same draft, dual submission</td>
</tr>
<tr>
<td>draft-rosenberg-aiproto</td>
<td>draft-rosenberg-aiproto-nact</td>
<td>Renamed</td>
</tr>
<tr>
<td>draft-rosenberg-aiproto-cheq</td>
<td>draft-rosenberg-cheq</td>
<td>Renamed</td>
</tr>
<tr>
<td>draft-cui-nmrg-llm-nm</td>
<td>draft-irtf-nmrg-llm-nm</td>
<td>WG adoption (individual to IRTF)</td>
</tr>
<tr>
<td>draft-ar-emu-hybrid-pqc-eapaka</td>
<td>draft-ietf-emu-hybrid-pqc-eapaka</td>
<td>WG adoption</td>
</tr>
<tr>
<td>draft-zheng-agent-identity-management</td>
<td>draft-zheng-dispatch-agent-identity-management</td>
<td>Same draft, different WG</td>
</tr>
<tr>
<td>draft-sun-zhang-iaip</td>
<td>draft-sz-dmsc-iaip</td>
<td>Same draft, different WG</td>
</tr>
<tr>
<td>draft-zeng-mcp-troubleshooting</td>
<td>draft-zm-rtgwg-mcp-troubleshooting</td>
<td>Same draft, different WG</td>
</tr>
</tbody>
</table>
<p>Some of these duplications are legitimate IETF process: a draft moves from individual submission to working group adoption (like draft-cui-nmrg-llm-nm becoming draft-irtf-nmrg-llm-nm). Others reflect authors shopping the same draft to multiple working groups. And a few appear to be genuine content duplication -- the same ideas submitted under different author combinations.</p>
<p>The practical effect: the 434-draft corpus includes substantial double-counting. Removing the 25 near-duplicate pairs yields roughly 409 distinct drafts, and accounting for related-but-not-identical submissions lowers the count further. But even with generous de-duplication, the volume is extraordinary.</p>
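<p>The near-duplicate detection itself is straightforward once embeddings exist. A minimal sketch, assuming a precomputed matrix of abstract embeddings (for example from nomic-embed-text; the function and variable names are ours):</p>
<pre><code>import numpy as np

def near_duplicates(names, emb, threshold=0.98):
    """Flag draft pairs whose embedding cosine similarity exceeds threshold.

    `emb` is an (n_drafts, dim) array; row order matches `names`.
    """
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = unit @ unit.T                     # all pairwise cosines at once
    i, j = np.triu_indices(len(names), k=1)  # upper triangle, no self-pairs
    hits = sims[i, j] &gt;= threshold
    return [(names[a], names[b], float(sims[a, b]))
            for a, b in zip(i[hits], j[hits])]

# Hypothetical usage:
# pairs = near_duplicates(draft_names, embeddings)  # 25+ pairs at 0.98
</code></pre>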
<h2 id="the-a2a-protocol-zoo">The A2A Protocol Zoo</h2>
<p>Zooming out from individual clusters, the broadest fragmentation is in the <strong>155 A2A protocol drafts</strong>. These span everything from low-level transport (A2A over MOQT/QUIC) to high-level semantic routing (intent-based agent interconnection) to specific use cases (MCP for network troubleshooting).</p>
<p>The most common technical idea in the entire corpus -- "Multi-Agent Communication Protocol" -- appears in <strong>8 separate drafts</strong> from different teams. Eight teams are independently designing how agents should talk to each other.</p>
<table>
<thead>
<tr>
<th>Competing Area</th>
<th style="text-align: right;">Drafts</th>
<th>Distinguishing Fact</th>
</tr>
</thead>
<tbody>
<tr>
<td>OAuth for agents</td>
<td style="text-align: right;">14</td>
<td>No draft handles chained delegation</td>
</tr>
<tr>
<td>Agent gateway/collaboration</td>
<td style="text-align: right;">10</td>
<td>5 are DMSC-linked; no trust framework</td>
</tr>
<tr>
<td>Agent discovery</td>
<td style="text-align: right;">6</td>
<td>Range from DNS-based to full directories</td>
</tr>
<tr>
<td>Intent-based routing</td>
<td style="text-align: right;">5</td>
<td>Requirements-heavy, protocol-light</td>
</tr>
<tr>
<td>6G agent requirements</td>
<td style="text-align: right;">6</td>
<td>Wish lists, not specifications</td>
</tr>
<tr>
<td>SCIM/identity registry</td>
<td style="text-align: right;">6</td>
<td>3 are near-duplicates</td>
</tr>
</tbody>
</table>
<p>The discovery cluster is particularly illustrative. Six drafts propose different ways to find AI agents: <a href="https://datatracker.ietf.org/doc/draft-narajala-ans/">draft-narajala-ans</a> (score 4.2) proposes a DNS-based Agent Name Service. <a href="https://datatracker.ietf.org/doc/draft-mozleywilliams-dnsop-bandaid/">draft-mozleywilliams-dnsop-bandaid</a> (3.6) also uses DNS but via SVCB records. <a href="https://datatracker.ietf.org/doc/draft-pioli-agent-discovery/">draft-pioli-agent-discovery</a> (3.2) defines a lightweight registration and discovery protocol. <a href="https://datatracker.ietf.org/doc/draft-gaikwad-woa/">draft-gaikwad-woa</a> (3.2) proposes a Web of Agents format using JSON Schema. None of them reference each other.</p>
<h2 id="the-deeper-fragmentation-different-technological-dna">The Deeper Fragmentation: Different Technological DNA</h2>
<p>The protocol-level fragmentation documented above is only the visible layer. Beneath it, our RFC cross-reference analysis reveals a more fundamental divide: the two major drafting blocs are building on <strong>entirely different technology stacks</strong>.</p>
<table>
<thead>
<tr>
<th>Foundation</th>
<th>Chinese Bloc</th>
<th>Western Bloc</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Network management (YANG/NETCONF)</strong></td>
<td>Strong (RFC 6241, 8639, 8641, 7950)</td>
<td>Absent</td>
</tr>
<tr>
<td><strong>IoT security (COSE/CBOR/OSCORE/CoAP)</strong></td>
<td>Absent</td>
<td>Strong (RFC 9052, 8949, 8613, 7252)</td>
</tr>
<tr>
<td><strong>PKI/Certificates (X.509)</strong></td>
<td>Absent</td>
<td>Strong (RFC 5280)</td>
</tr>
<tr>
<td><strong>Lightweight auth (EDHOC, CWT)</strong></td>
<td>Absent</td>
<td>Strong (RFC 9528, 8392)</td>
</tr>
<tr>
<td><strong>Web APIs (HTTP Semantics)</strong></td>
<td>Weak</td>
<td>Strong (RFC 9110)</td>
</tr>
<tr>
<td><strong>TLS 1.3</strong></td>
<td>Moderate (8 citations)</td>
<td>Strong (18 citations)</td>
</tr>
<tr>
<td><strong>OAuth 2.0</strong></td>
<td>Present (11 citations)</td>
<td>Present (7 citations)</td>
</tr>
</tbody>
</table>
<p>The Chinese bloc -- Huawei, China Mobile, China Telecom, China Unicom, and associated research institutions -- builds agent infrastructure on <strong>YANG/NETCONF</strong>, the network management protocols that underpin autonomous network operations. The Western bloc -- Ericsson, Cisco, ATHENA, and European research labs -- builds on <strong>COSE/CBOR/CoAP</strong> (IoT security) and <strong>HTTP/TLS/PKI</strong> (web infrastructure).</p>
<p>The <strong>only shared foundation</strong> is OAuth 2.0, which both blocs cite at comparable rates. This is why the OAuth cluster has 14 competing proposals: it is the one piece of common ground, and everyone is fighting over it.</p>
<p>This means fragmentation goes deeper than protocol design. Even if the community agreed on a single agent communication pattern, the underlying plumbing is incompatible. A Chinese draft building on NETCONF and a Western draft building on CoAP cannot interoperate without a translation layer -- and that translation layer, as we document in the gap analysis, does not exist.</p>
<h2 id="what-fragmentation-costs">What Fragmentation Costs</h2>
<p>The costs of this fragmentation are not theoretical:</p>
<p><strong>For implementers</strong>: Which OAuth extension do you implement? Do you support SCIM agent schemas or Web of Agents? If your agent needs to discover another agent, do you look in DNS, a well-known URI, or a dedicated directory? Today there is no canonical answer, and choosing wrong means re-implementation when the IETF eventually converges.</p>
<p><strong>For the IETF process</strong>: Working groups spend time evaluating competing proposals that could be spent converging on solutions. The OAuth working group alone faces 14 agent-related drafts. The volume creates overhead that slows progress on any single proposal.</p>
<p><strong>For security</strong>: When multiple incompatible authentication and authorization schemes exist, implementations inevitably take shortcuts. The most dangerous agents will be those that implement the easiest -- not the most secure -- available standard.</p>
<p><strong>For the ecosystem</strong>: Each month that fragmentation persists, real-world agent deployments make choices. Those choices entrench specific approaches, making convergence harder and interoperability more expensive. The window for a unified standard narrows with every proprietary deployment.</p>
<p><strong>A note on IETF IPR policy</strong>: Implementers considering building on any of the OAuth or protocol drafts discussed above should be aware that Internet-Drafts may be subject to intellectual property rights (IPR) claims. Under BCP 79 (RFC 8179), IETF participants are expected to disclose known IPR. Check the <a href="https://datatracker.ietf.org/ipr/">IETF IPR disclosure database</a> before implementing.</p>
<h2 id="the-convergence-signals">The Convergence Signals</h2>
<p>Not everything is divergence. A few positive patterns emerged from the data:</p>
<p><strong>EDHOC is converging.</strong> The lightweight authenticated key exchange protocol has multiple working-group-adopted drafts (<a href="https://datatracker.ietf.org/doc/draft-ietf-lake-edhoc-psk/">draft-ietf-lake-edhoc-psk</a>, <a href="https://datatracker.ietf.org/doc/draft-ietf-lake-authz/">draft-ietf-lake-authz</a>, <a href="https://datatracker.ietf.org/doc/draft-ietf-emu-eap-edhoc/">draft-ietf-emu-eap-edhoc</a>) with coordinated authorship. This is what healthy standards development looks like: multiple drafts from different teams that explicitly build on each other.</p>
<p><strong>SCIM agent extensions show maturity.</strong> The Okta team's <a href="https://datatracker.ietf.org/doc/draft-abbey-scim-agent-extension/">draft-abbey-scim-agent-extension</a> (score 3.8) and <a href="https://datatracker.ietf.org/doc/draft-wahl-scim-agent-schema/">draft-wahl-scim-agent-schema</a> (score 3.9) represent a practical approach: extend an existing, widely-deployed protocol (SCIM) rather than invent a new one. This pragmatism is a convergence signal.</p>
<p><strong>The verifiable conversations approach is gaining traction.</strong> <a href="https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/">draft-birkholz-verifiable-agent-conversations</a> (score 4.5) and the WIMSE/ECT work on execution context tokens represent a "record everything, verify later" approach to agent accountability that multiple communities can support.</p>
<h2 id="what-needs-to-happen">What Needs to Happen</h2>
<p>Three structural interventions would accelerate convergence:</p>
<p><strong>1. Working groups need to pick winners.</strong> The IETF process allows competing proposals, but at some point working groups must adopt specific approaches and redirect competing efforts. In the OAuth agent space, the highest-quality proposals (DAAP, Agentic JWT, RAR extensions) should be evaluated head-to-head, not allowed to proliferate indefinitely.</p>
<p><strong>2. Interoperability testing, not just drafting.</strong> The 155 A2A protocol proposals exist mostly as text. Interop testing -- where implementations from different teams prove they can work together -- would quickly reveal which proposals have real engineering substance and which are paper exercises.</p>
<p><strong>3. The translation layer must be built.</strong> Rather than picking one A2A protocol, the community may be better served by a thin interoperability layer that lets agents using different protocols communicate through gateways. Our gap analysis found this cross-protocol translation gap entirely unaddressed -- zero technical ideas in the current corpus.</p>
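<p>To make the shape of that missing layer concrete, here is a purely speculative sketch -- no such adapters exist today, and every name in it is a placeholder. The point is the seam: per-protocol adapters around a neutral envelope, with the hard semantic reconciliation in the middle:</p>
<pre><code>from abc import ABC, abstractmethod

class AgentMessage(dict):
    """Protocol-neutral envelope (fields illustrative)."""

class ProtocolAdapter(ABC):
    """One adapter per agent protocol; a gateway composes two of them."""

    @abstractmethod
    def to_neutral(self, raw: bytes) -&gt; AgentMessage: ...

    @abstractmethod
    def from_neutral(self, msg: AgentMessage) -&gt; bytes: ...

class TranslationGateway:
    """Relays a message between agents speaking different protocols."""

    def __init__(self, inbound: ProtocolAdapter, outbound: ProtocolAdapter):
        self.inbound, self.outbound = inbound, outbound

    def relay(self, raw: bytes) -&gt; bytes:
        msg = self.inbound.to_neutral(raw)
        # A real gateway would also reconcile auth context, capability
        # vocabularies, and error models -- the unspecified hard part.
        return self.outbound.from_neutral(msg)
</code></pre>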
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>14 competing OAuth-for-agents proposals</strong> illustrate the depth of fragmentation; none handle chained delegation across agent networks</li>
<li><strong>155 A2A protocol drafts</strong> exist without an interoperability layer; the most common idea in the corpus appears in 8 separate drafts from different teams</li>
<li><strong>25+ near-duplicate pairs</strong> (&gt;0.98 similarity) inflate the draft count; after de-duplication, roughly 409 distinct proposals remain</li>
<li><strong>Convergence signals exist</strong> in EDHOC authentication, SCIM agent extensions, and verifiable conversations -- areas where teams explicitly build on each other</li>
<li><strong>Fragmentation goes deeper than protocols</strong>: Chinese and Western blocs build on different RFC foundations (YANG/NETCONF vs COSE/CBOR/CoAP); the only shared bedrock is OAuth 2.0</li>
<li><strong>The missing piece</strong> is a cross-protocol translation layer; no draft in the corpus addresses how agents using different protocols can interoperate</li>
</ul>
<p><em>Next in this series: <a href="04-what-nobody-builds.html">What Nobody's Building (And Why It Matters)</a> -- The 11 gaps in the IETF's AI agent landscape, and the real-world consequences of leaving them unfilled.</em></p>
<hr />
<p><em>Data from the IETF Draft Analyzer's embedding-based overlap analysis (nomic-embed-text) and cluster detection at 0.85/0.90 similarity thresholds.</em></p>
<div class="post-nav"><a href="/blog/posts/02-who-writes-the-rules.html">&larr; Who Writes the Rules</a><a href="/blog/posts/04-what-nobody-builds.html">What Nobody Builds &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -0,0 +1,196 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>What Nobody's Building (And Why It Matters) — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<strong>Builds</strong>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="what-nobodys-building-and-why-it-matters">What Nobody's Building (And Why It Matters)</h1>
<p><em>The 11 gaps in the IETF's AI agent landscape -- and the real-world disasters they invite.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>Imagine an AI agent managing a hospital's drug-dispensing system. It receives instructions from a prescribing agent, coordinates with a pharmacy agent, and issues delivery commands to a robotic dispensing agent. On Tuesday morning, the prescribing agent hallucinates a dosage. The pharmacy agent fills it. The dispensing agent delivers it. No human saw it happen. No system flagged it. No protocol exists to roll back the dispensed medication.</p>
<p>To be clear: this scenario is already regulated. Under the EU AI Act (Regulation 2024/1689), a drug-dispensing AI agent is a high-risk AI system under Annex III, requiring conformity assessment, risk management, and human oversight before deployment. The Medical Devices Regulation (MDR 2017/745) imposes additional obligations. The gap is not one of legal accountability -- it is one of technical implementation. The standards that would let developers <em>comply</em> with these regulations in multi-agent architectures do not yet exist.</p>
<p>This is the predictable consequence of the IETF's most critical standardization gaps.</p>
<p>We analyzed <strong>434 Internet-Drafts</strong>, extracted their technical components, and compared the result against what real-world agent deployments actually require. We found <strong>11 gaps</strong> -- areas where standardization work is missing or inadequate. Two of them are critical. And the critical ones share a defining characteristic: they address what happens when autonomous agents fail or misbehave.</p>
<p>Nobody is building the safety net.</p>
<h2 id="the-12-gaps">The 12 Gaps</h2>
<p>Our gap analysis sorted findings by severity based on the breadth of the shortfall and the consequences of leaving it unfilled:</p>
<table>
<thead>
<tr>
<th>#</th>
<th>Gap</th>
<th>Severity</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Agent Behavioral Verification</td>
<td>CRITICAL</td>
</tr>
<tr>
<td>2</td>
<td>Agent Failure Cascade Prevention</td>
<td>CRITICAL</td>
</tr>
<tr>
<td>3</td>
<td>Real-Time Agent Rollback Mechanisms</td>
<td>HIGH</td>
</tr>
<tr>
<td>4</td>
<td>Multi-Agent Consensus Protocols</td>
<td>HIGH</td>
</tr>
<tr>
<td>5</td>
<td>Human Override Standardization</td>
<td>HIGH</td>
</tr>
<tr>
<td>6</td>
<td>Cross-Domain Agent Audit Trails</td>
<td>HIGH</td>
</tr>
<tr>
<td>7</td>
<td>Federated Agent Learning Privacy</td>
<td>HIGH</td>
</tr>
<tr>
<td>8</td>
<td>Cross-Protocol Agent Migration</td>
<td>MEDIUM</td>
</tr>
<tr>
<td>9</td>
<td>Agent Resource Accounting and Billing</td>
<td>MEDIUM</td>
</tr>
<tr>
<td>10</td>
<td>Agent Capability Negotiation</td>
<td>MEDIUM</td>
</tr>
<tr>
<td>11</td>
<td>Agent Performance Benchmarking</td>
<td>MEDIUM</td>
</tr>
</tbody>
</table>
<p>The gap names above match the automated gap analysis output. The two critical gaps -- behavioral verification and failure cascade prevention -- address what happens when autonomous agents deviate from declared behavior or trigger cascading failures across interconnected systems. Several high-severity gaps (rollback mechanisms, human override, consensus protocols) address the same theme: what happens when things go wrong, and nobody has built the safety net.</p>
<p>A notable omission from this gap list: <strong>GDPR-mandated capabilities</strong>. The gap analysis focuses on technical desiderata but does not engage with the EU's legally binding data protection framework. Specific GDPR requirements that have no corresponding IETF draft work include: Data Protection Impact Assessment (DPIA) tooling for high-risk agent processing (Art. 35 GDPR), right-to-erasure propagation across multi-agent chains (Art. 17), data portability for agent-generated personal data (Art. 20), and purpose limitation enforcement when agents are authorized for specific tasks but may repurpose data (Art. 5(1)(b)). These are not optional features for EU-deployed agent systems -- they are legal requirements.</p>
<h2 id="critical-gap-1-agent-behavior-verification">Critical Gap 1: Agent Behavior Verification</h2>
<p><strong>The problem</strong>: No mechanism exists to verify that a deployed AI agent actually behaves according to its declared policies or specifications.</p>
<p><strong>The numbers</strong>: Only <strong>47 of 434 drafts</strong> address AI safety and alignment. The capability-to-safety ratio is roughly 4:1 on aggregate -- though it varies significantly by month, from as low as 1.5:1 to as high as 21:1. The trend is clear: the community is building agents faster than it is building the tools to keep them honest.</p>
<p><strong>What partially addresses this</strong>: Some work exists on the periphery. <a href="https://datatracker.ietf.org/doc/draft-aylward-daap-v2/">draft-aylward-daap-v2</a> (score 4.75 -- the highest-rated draft in the corpus) defines a behavioral monitoring framework and cryptographic identity verification. <a href="https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/">draft-birkholz-verifiable-agent-conversations</a> (score 4.5) proposes verifiable conversation records using COSE signing. <a href="https://datatracker.ietf.org/doc/draft-berlinai-vera/">draft-berlinai-vera</a> (score 3.9) introduces a zero-trust architecture with five enforcement pillars.</p>
<p><strong>What is still missing</strong>: Runtime verification. These drafts define what agents <em>should</em> do and how to <em>record</em> what they did. None provides a real-time mechanism to detect that an agent is deviating from its declared behavior <em>while it is operating</em>. The gap is between policy declaration and policy enforcement -- the difference between a speed limit sign and a speed camera.</p>
<p><strong>The scenario</strong>: A financial trading agent is authorized to execute trades within specified parameters. It begins operating within bounds but, after a model update, starts exceeding risk limits. Without runtime behavior verification, the deviation is only discovered in post-hoc audit -- potentially days later, after significant damage.</p>
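<p>To illustrate what runtime verification would mean in that scenario, a deliberately minimal sketch -- the policy model here is invented for illustration, since no draft defines one:</p>
<pre><code>from dataclasses import dataclass, field

@dataclass
class DeclaredPolicy:
    """Illustrative declared behavior: allowed actions, a risk ceiling."""
    allowed_actions: set
    max_risk: float

@dataclass
class RuntimeMonitor:
    """The 'speed camera': checks each action as it happens, not post hoc."""
    policy: DeclaredPolicy
    violations: list = field(default_factory=list)

    def check(self, action, risk):
        ok = action in self.policy.allowed_actions and risk &lt;= self.policy.max_risk
        if not ok:
            self.violations.append((action, risk))
            # A real standard would define what happens next -- alert,
            # throttle, halt -- exactly the piece no current draft supplies.
        return ok

monitor = RuntimeMonitor(DeclaredPolicy({"quote", "trade"}, max_risk=0.2))
print(monitor.check("trade", risk=0.15))  # True: within declared bounds
print(monitor.check("trade", risk=0.90))  # False: deviation caught live
</code></pre>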
<h2 id="critical-gap-2-agent-failure-cascade-prevention">Critical Gap 2: Agent Failure Cascade Prevention</h2>
<p><strong>The problem</strong>: No protocols exist to prevent agent failures from cascading across interconnected autonomous systems. As agent interdependencies increase in production deployments, a failure in one agent can ripple outward.</p>
<p><strong>The numbers</strong>: Only <strong>47 drafts</strong> address AI safety despite 434 total drafts, and the high interconnectivity implied by 155 A2A protocols and 114 autonomous netops drafts creates the conditions for cascade failures.</p>
<p><strong>What is missing</strong>: Circuit breakers for cascading failures. Blast radius containment. Graceful degradation. All concepts well-established in distributed systems engineering, but absent from the agent standards landscape.</p>
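<p>The circuit-breaker idea, at least, transplants readily from distributed-systems practice. A minimal sketch of a breaker guarding calls between agents (thresholds and timeouts are arbitrary; this is textbook engineering, not anything a draft specifies):</p>
<pre><code>import time

class CircuitBreaker:
    """After max_failures consecutive failures, fail fast for reset_after
    seconds -- containing the blast radius instead of letting a struggling
    agent be hammered by its peers."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at &lt; self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None            # half-open: allow one probe
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures &gt;= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                    # success resets the counter
        return result
</code></pre>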
<p><strong>The scenario</strong>: A telecom operator deploys 50 AI agents for network monitoring, troubleshooting, and optimization. During a major outage, all 50 agents simultaneously request inference resources to diagnose the problem. With no failure cascade prevention, agents compete chaotically. The most aggressive agents get resources; the most important diagnostic tasks may not. The outage extends because the agents that could fix it are starved by the agents that are observing it. For telecom operators in the EU, the NIS2 Directive (Directive 2022/2555) classifies electronic communications as an essential service, requiring incident response capabilities and supply chain security measures -- making cascade prevention not just an engineering problem but a regulatory obligation.</p>
<h2 id="high-gap-real-time-agent-rollback-mechanisms">High Gap: Real-Time Agent Rollback Mechanisms</h2>
<p><strong>The problem</strong>: No standards exist for how to quickly roll back incorrect decisions made by autonomous agents across distributed systems.</p>
<p><strong>The numbers</strong>: 114 autonomous netops drafts exist, but none standardizes rollback for production network safety. <a href="https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/">draft-yue-anima-agent-recovery-networks</a> (score 4.1) is among the few drafts that partially address this, with its Task-Oriented Multi-Agent Recovery Framework and State Consistency Management. For context, "Multi-Agent Communication Protocol" -- defining how agents <em>talk</em> -- appears in 8 drafts. The community has invested far more effort in the plumbing than in the fire escape.</p>
<p><strong>What is missing</strong>: Checkpoint and rollback protocols -- standardized ways to snapshot agent state and coordinate recovery across distributed systems when a decision proves wrong. Like the circuit breakers above, these are concepts well-established in distributed systems engineering but absent from the agent standards landscape.</p>
<p><strong>The scenario</strong>: A multi-agent supply chain system manages inventory, shipping, and payments. The inventory agent processes a large batch incorrectly, leading the shipping agent to dispatch wrong items, which causes the payment agent to process refunds to wrong accounts. The cascade happens in minutes. Without rollback mechanisms, untangling the mess requires manual human intervention across three systems -- and the agents continue making decisions based on corrupted state while humans try to intervene.</p>
<h2 id="the-high-priority-gaps">The High-Priority Gaps</h2>
<p>Several additional gaps scored HIGH severity. Each represents a missing piece that working deployments will hit:</p>
<h3 id="human-override-standardization">Human Override Standardization</h3>
<p>Only <strong>34 human-agent interaction drafts</strong> exist versus <strong>114 autonomous operations</strong> and <strong>155 A2A protocol</strong> drafts. Agents are being designed to talk to each other at more than four times the rate they are being designed to talk to humans (155 A2A drafts against 34). Emergency override protocols -- the "big red button" -- are almost entirely absent. This is not merely an engineering preference. For high-risk AI systems deployed in the EU, the AI Act (Art. 14) mandates human oversight -- making this gap a compliance blocker, not just a design omission.</p>
<p><a href="https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/">draft-rosenberg-aiproto-cheq</a> (score 3.9) is a rare exception: it defines a protocol for human confirmation of agent decisions before execution. But CHEQ is opt-in and pre-execution. No draft defines what happens when a human needs to stop a running agent, constrain its behavior, or take over its task mid-execution.</p>
<h3 id="multi-agent-consensus-protocols">Multi-Agent Consensus Protocols</h3>
<p>When a group of agents disagree -- the diagnosis agent says the router is down, the monitoring agent says it is up, the optimization agent is rerouting traffic around it -- who arbitrates? No framework exists for agents to resolve conflicting assessments without human intervention. This is not a new problem: FIPA (Foundation for Intelligent Physical Agents) defined agent communication languages and interaction protocols for multi-agent coordination as early as 1997. The IETF landscape has largely not engaged with this prior art.</p>
<h3 id="cross-domain-agent-audit-trails">Cross-Domain Agent Audit Trails</h3>
<p>An agent operating across multiple domains or organizations needs to maintain audit trails that satisfy different regulatory requirements simultaneously. Identity management exists -- the 152 identity/auth drafts cover authentication. What does not exist is cross-domain audit standardization: the format and semantics for recording agent actions across jurisdictions with varying compliance requirements. The EU's eIDAS 2.0 regulation (Regulation 2024/1183) and its European Digital Identity Wallet framework provide a mature trust model that the IETF drafts have not yet connected to.</p>
<h3 id="federated-agent-learning-privacy">Federated Agent Learning Privacy</h3>
<p>While federated architectures exist, there is insufficient specification for privacy-preserving agent learning that prevents data leakage between federated participants during model updates. The absence of secure update mechanisms also intersects with the EU Cyber Resilience Act (Regulation 2024/2847), which requires products with digital elements -- including AI agent software -- to handle updates securely and provide vulnerability management throughout their lifecycle.</p>
<h3 id="cross-protocol-agent-migration">Cross-Protocol Agent Migration</h3>
<p>Agents need to migrate between different network protocols, domains, or infrastructure providers while maintaining state and identity. Current drafts focus on registration but not migration continuity.</p>
<h2 id="the-structural-problem">The Structural Problem</h2>
<p>Here is the finding the Architect on our team surfaced that reframes the entire gap analysis:</p>
<p><strong>The severity of each gap appears to correlate with the coordination difficulty required to fill it.</strong></p>
<p>The critical gaps (behavioral verification, failure cascade prevention) require agreement across <em>multiple</em> IETF working groups. They cut across safety, networking, identity, and operations -- areas currently owned by separate teams that rarely collaborate. The high-severity gaps (rollback mechanisms, human override, consensus) require even broader agreement: they need architects who see the whole ecosystem, not just their protocol.</p>
<p>Now look back at the team bloc analysis from Post 2. The 18 team blocs are <em>islands</em>. Cross-team collaboration is sparse. The strongest cross-bloc connection involves 3 shared drafts. The gaps that require the most cross-team work are being produced by an ecosystem that does the least cross-team work.</p>
<p>This is the structural explanation for the safety deficit. It is not that people do not care about safety. It is that safety standards require coordination across boundaries that the current authorship structure cannot bridge. Capability standards can be built within a single team. Safety standards cannot.</p>
<p>Our category co-occurrence analysis provides the concrete proof. Safety drafts are not entirely isolated -- they co-occur with several categories, coupling most strongly with policy and governance and identity/auth. But the pattern is revealing: safety pairs with <em>governance</em> categories, not <em>implementation</em> categories. Of the 155 drafts tagged as A2A protocols, very few also address safety. Safety has minimal co-occurrence with agent discovery/registration and model serving/inference. Its weakest links are to the categories where agents actually <em>do</em> things. Safety is being discussed in governance papers. It is barely present in the protocols that need it most. The traffic lights are not just behind the highways -- they are on a different road entirely.</p>
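<p>The co-occurrence measurement behind that finding is simple. A sketch, with hypothetical category tags standing in for the analyzer's taxonomy:</p>
<pre><code>from collections import Counter
from itertools import combinations

# Hypothetical input: draft -&gt; set of category tags from the analyzer.
categories = {
    "draft-x": {"a2a-protocol", "identity-auth"},
    "draft-y": {"ai-safety", "policy-governance"},
    "draft-z": {"ai-safety", "identity-auth", "policy-governance"},
}

cooccur = Counter()
for tags in categories.values():
    cooccur.update(combinations(sorted(tags), 2))  # each unordered pair once

for (cat_a, cat_b), n in cooccur.most_common():
    print(f"{cat_a} + {cat_b}: {n}")
# In our data, 'ai-safety' pairs with governance categories far more
# often than with implementation categories like 'a2a-protocol'.
</code></pre>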
<p>IEEE P3394, a concurrent standards effort addressing AI agent interfaces, is attempting to address some of these safety and trust dimensions from a different angle. The IETF landscape should be compared against such parallel efforts to understand which gaps are being addressed elsewhere and which remain truly unserved.</p>
<h2 id="the-41-ratio-revisited">The ~4:1 Ratio, Revisited</h2>
<p>The safety deficit is not just a number. It is a structural property of how the IETF's AI agent community is organized.</p>
<table>
<thead>
<tr>
<th>Category</th>
<th style="text-align: right;">Drafts</th>
<th style="text-align: right;">Team Blocs Active</th>
</tr>
</thead>
<tbody>
<tr>
<td>A2A protocols</td>
<td style="text-align: right;">155</td>
<td style="text-align: right;">Many (distributed across blocs)</td>
</tr>
<tr>
<td>Autonomous operations</td>
<td style="text-align: right;">114</td>
<td style="text-align: right;">Primarily Huawei, Chinese telecom</td>
</tr>
<tr>
<td>Agent identity/auth</td>
<td style="text-align: right;">152</td>
<td style="text-align: right;">Ericsson, Nokia, ATHENA, multiple</td>
</tr>
<tr>
<td><strong>AI safety/alignment</strong></td>
<td style="text-align: right;"><strong>47</strong></td>
<td style="text-align: right;"><strong>Few; mostly independents/startups</strong></td>
</tr>
<tr>
<td><strong>Human-agent interaction</strong></td>
<td style="text-align: right;"><strong>34</strong></td>
<td style="text-align: right;"><strong>Rosenberg/White (2-person team)</strong></td>
</tr>
</tbody>
</table>
<p>The capability categories have organized teams behind them. The safety categories rely on individual contributors and small, unconnected teams. The best safety draft in the corpus (DAAP, score 4.75) comes from an independent author (Aylward). The best human-agent drafts come from a two-person Five9/Bitwave team. There is no 13-person safety bloc with 94% cohesion.</p>
<p>Until that changes -- until safety and human oversight attract the same organized, sustained effort as communication protocols -- the ~4:1 aggregate ratio will persist. And the gaps will remain open.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>11 gaps</strong> exist in the IETF's AI agent landscape: 2 critical, 5 high, 4 medium</li>
<li><strong>The 2 critical gaps</strong> address failure modes: behavioral verification and failure cascade prevention</li>
<li><strong>Agent rollback mechanisms and human override standardization</strong> are high-severity gaps with minimal coverage across 434 drafts</li>
<li><strong>Gap severity appears to correlate with coordination difficulty</strong>: the hardest gaps require cross-team, cross-WG collaboration that the current island structure cannot produce</li>
<li><strong>The safety deficit appears structural, not attitudinal</strong>: capability standards can be built by one team; safety standards require ecosystem-wide coordination that does not yet exist</li>
<li><strong>GDPR-mandated capabilities</strong> (DPIA support, erasure propagation, data portability, purpose limitation) represent an additional missing dimension not captured in the automated gap analysis</li>
</ul>
<p><em>Next in this series: <a href="05-1262-ideas.html">Where 434 Drafts Converge (And Where They Don't)</a> -- the fragmentation goes all the way down.</em></p>
<hr />
<p><em>Gap analysis based on 434 drafts, cross-referenced against real-world deployment requirements for autonomous AI agent systems. Data current as of March 2026.</em></p>
<div class="post-nav"><a href="/blog/posts/03-oauth-wars.html">&larr; The OAuth Wars</a><a href="/blog/posts/05-1262-ideas.html">Where Drafts Converge &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -0,0 +1,367 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Where 434 Drafts Converge (And Where They Don't) — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<strong>Converge</strong>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="where-434-drafts-converge-and-where-they-dont">Where 434 Drafts Converge (And Where They Don't)</h1>
<p><em>The fragmentation goes deeper than competing protocols. It extends all the way down to the idea level.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>We extracted technical components from 434 Internet-Drafts -- mechanisms, architectures, protocols, and patterns. Then we asked: for each idea, does anyone else propose it too?</p>
<p>The current database contains <strong>419 extracted ideas</strong> across 377 drafts. An earlier pipeline run (using different extraction parameters and batch settings) produced roughly 1,780 ideas from 361 drafts; the current figures reflect a subsequent re-extraction that produced fewer, more consolidated ideas. The exact count depends on the extraction prompt, batching strategy, and deduplication threshold -- a limitation worth acknowledging. What is robust across both runs is the <em>pattern</em>: the vast majority of extracted ideas appear in exactly one draft. Only a handful show cross-draft convergence by exact title matching. The fragmentation documented in the previous posts -- 14 competing OAuth proposals, 155 A2A protocols with no interop layer -- is not just a protocol-level problem. It extends all the way down. At the idea level, the landscape is overwhelmingly a collection of islands.</p>
<p>But islands are not the whole story. Using fuzzy matching (SequenceMatcher at 0.75 threshold) across organizational boundaries, we found <strong>130 cross-org convergent ideas</strong> where different organizations are working on recognizably similar problems -- even when they use different names and different approaches. (An earlier pipeline run with ~1,780 raw ideas produced 628 cross-org convergent ideas; the current, more consolidated extraction of 419 ideas yields 130 at the same threshold -- 36% of unique clusters, a comparable convergence rate.) These cross-org convergence signals are the embryonic consensus of the agent standards landscape: the problems that different teams, in different countries, with different agendas, independently recognize and attempt to solve.</p>
<p>These convergence signals are more impressive than they first appear. Recall from Post 2 that <strong>55% of all drafts have never been revised</strong> beyond their first submission, and <strong>65% of Huawei's drafts</strong> are fire-and-forget. The ideas that converge across organizations are not the generic scaffolding of first-draft submissions -- they represent genuine engineering investment from teams that independently identified the same problem and committed resources to solving it.</p>
<p>The picture that emerges is paradoxical: the raw material for a complete agent ecosystem exists. The convergent ideas point toward the architecture the ecosystem needs. But they exist in isolation -- proposed by separate teams, embedded in separate drafts, with no connective tissue linking them into a coherent blueprint.</p>
<h2 id="the-taxonomy">The Taxonomy</h2>
<p>Every extracted idea was classified by type. The distribution reveals what kind of thinking dominates the landscape:</p>
<table>
<thead>
<tr>
<th>Type</th>
<th style="text-align: right;">Count</th>
<th style="text-align: right;">Share</th>
<th>What It Means</th>
</tr>
</thead>
<tbody>
<tr>
<td>Protocol</td>
<td style="text-align: right;">96</td>
<td style="text-align: right;">23%</td>
<td>Full protocol specifications</td>
</tr>
<tr>
<td>Architecture</td>
<td style="text-align: right;">95</td>
<td style="text-align: right;">23%</td>
<td>System designs and reference models</td>
</tr>
<tr>
<td>Extension</td>
<td style="text-align: right;">79</td>
<td style="text-align: right;">19%</td>
<td>Additions to existing standards (OAuth, SCIM, DNS)</td>
</tr>
<tr>
<td>Mechanism</td>
<td style="text-align: right;">68</td>
<td style="text-align: right;">16%</td>
<td>Concrete technical solutions: auth flows, routing algorithms, token formats</td>
</tr>
<tr>
<td>Requirement</td>
<td style="text-align: right;">42</td>
<td style="text-align: right;">10%</td>
<td>Formal requirement documents</td>
</tr>
<tr>
<td>Pattern</td>
<td style="text-align: right;">35</td>
<td style="text-align: right;">8%</td>
<td>Reusable design approaches</td>
</tr>
<tr>
<td>Framework</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">1%</td>
<td>Frameworks, profiles</td>
</tr>
<tr>
<td>Format</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">&lt;1%</td>
<td>Data format specifications</td>
</tr>
</tbody>
</table>
<p><em>Note: These counts reflect the current database (419 ideas). An earlier pipeline run with different extraction parameters produced higher counts across all categories; the relative proportions are more meaningful than the absolute numbers.</em></p>
<p>The near-equal split between <strong>protocols</strong> (96), <strong>architectures</strong> (95), and <strong>extensions</strong> (79) tells us the community is both building new solutions and extending existing ones. The protocols and extensions show that much of the work builds on established foundations (OAuth 2.0, SCIM, DNS, EDHOC) rather than starting from scratch.</p>
<p>The 95 architectures and 42 requirements suggest healthy standards development: teams are defining reference models before writing code. But the 35 patterns -- reusable approaches without full protocol specification -- indicate that some teams have identified what needs to be done without committing to how.</p>
<h2 id="where-teams-converge">Where Teams Converge</h2>
<p>By exact title, few ideas appear in multiple drafts. But ideas with different names often describe the same concept -- "Agent Gateway" in one draft and "Inter-Agent Communication Hub" in another. Our fuzzy-matching overlap analysis (using SequenceMatcher at 0.75 threshold) across organizational boundaries found <strong>130 ideas</strong> where 2+ distinct organizations are working on recognizably similar problems. These are the genuine consensus signals.</p>
<table>
<thead>
<tr>
<th>Idea</th>
<th style="text-align: right;">Orgs</th>
<th style="text-align: right;">Drafts</th>
<th>Key Organizations</th>
</tr>
</thead>
<tbody>
<tr>
<td>A2A Communication Paradigm</td>
<td style="text-align: right;">8</td>
<td style="text-align: right;">5</td>
<td>CAICT, Deutsche Telekom, Huawei, Orange, Telefonica</td>
</tr>
<tr>
<td>AI Agent Network Architecture</td>
<td style="text-align: right;">8</td>
<td style="text-align: right;">5</td>
<td>China Mobile, Deutsche Telekom, Huawei, Orange, UnionPay</td>
</tr>
<tr>
<td>Multi-Agent Communication Protocol</td>
<td style="text-align: right;">7</td>
<td style="text-align: right;">8</td>
<td>AsiaInfo, BUPT, China Mobile, China Telecom, Huawei</td>
</tr>
<tr>
<td>AI Agent Communication Network (ACN)</td>
<td style="text-align: right;">7</td>
<td style="text-align: right;">5</td>
<td>ANP Open Source, China Mobile, Cisco, Five9, Huawei</td>
</tr>
<tr>
<td>NLIP (Natural Language Interchange)</td>
<td style="text-align: right;">7</td>
<td style="text-align: right;">1</td>
<td>Fordham, IBM, Purdue, ServiceNow, eBay</td>
</tr>
<tr>
<td>ELA Protocol</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">6</td>
<td>Bitwave, Cisco, Ericsson, Five9, Inria</td>
</tr>
<tr>
<td>AI Gateway</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">4</td>
<td>AsiaInfo, BUPT, China Telecom, Huawei, UnionPay</td>
</tr>
<tr>
<td>Agent Communication across WAN</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">3</td>
<td>China Mobile, China Unicom, Deutsche Telekom, Huawei, Orange</td>
</tr>
</tbody>
</table>
<p>The most-converged idea -- "A2A Communication Paradigm" -- draws independent contributions from <strong>8 organizations across 5 countries</strong>. This is simultaneously the strongest convergence signal and the strongest fragmentation signal. Eight organizations agree this is important. They are building separate, incompatible versions.</p>
<p>Look at who bridges the divide. In three of the top eight convergent ideas, the same names appear alongside Chinese institutions: <strong>Deutsche Telekom, Telefonica, and Orange</strong>. These European telecoms show up in "A2A Communication Paradigm," "AI Agent Network Architecture," and "Agent Communication across WAN" -- each time co-listed with Huawei, China Mobile, or China Unicom. Of the <strong>180 ideas that cross the Chinese-Western organizational divide</strong>, European telecoms are present on a disproportionate share. The organizations most likely to prevent the agent ecosystem from splitting into incompatible regional stacks are not Google or Microsoft -- they are European carriers operating in both markets. US Big Tech is almost entirely absent from cross-divide convergence.</p>
<p>The organization-pair overlaps reveal where real collaboration happens -- and where it does not:</p>
<table>
<thead>
<tr>
<th>Org Pair</th>
<th style="text-align: right;">Shared Ideas</th>
<th>Signal</th>
</tr>
</thead>
<tbody>
<tr>
<td>China Unicom -- Huawei</td>
<td style="text-align: right;">32</td>
<td>Deep intra-bloc alignment</td>
</tr>
<tr>
<td>China Mobile -- Huawei</td>
<td style="text-align: right;">27</td>
<td>Deep intra-bloc alignment</td>
</tr>
<tr>
<td>Ericsson -- Inria</td>
<td style="text-align: right;">21</td>
<td>European cross-org collaboration</td>
</tr>
<tr>
<td>Tsinghua -- Zhongguancun Lab</td>
<td style="text-align: right;">20</td>
<td>Chinese academic convergence</td>
</tr>
<tr>
<td>Fraunhofer SIT -- Tradeverifyd</td>
<td style="text-align: right;">10</td>
<td>Verifiable records niche</td>
</tr>
</tbody>
</table>
<p>The pattern is stark: the highest-overlap pairs are Chinese institutions working within established blocs. Formal co-authorship between Chinese and Western organizations is thin -- but idea-level convergence, mediated by European telecoms operating in both markets, is broader than the co-authorship data suggests.</p>
<p>The convergence signals cluster in three areas:</p>
<p><strong>1. Agent communication infrastructure.</strong> How agents discover, connect to, and message each other. This is the most active area with the most redundant proposals. The underlying need is clear; the implementation is contested.</p>
<p><strong>2. Authentication and authorization.</strong> Action-based authorization, agent registration, cryptographic identity verification. OAuth extensions dominate, but the approaches diverge significantly between pure OAuth extension (add claims/scopes) and novel frameworks (DAAP accountability protocol, STAMP delegation proofs).</p>
<p><strong>3. Network architecture.</strong> Agent gateways, agent communication networks, network management architectures. This is where the Chinese institutional ecosystem has the strongest presence, with Huawei and affiliated organizations producing most of the architecture ideas.</p>
<h2 id="where-teams-innovate">Where Teams Innovate</h2>
<p>The 96% of ideas appearing in only one draft (by exact title matching) are a mix: mostly generic components describing what each draft does ("Agent Gateway," "Transport Configuration System"), but scattered among them are genuinely novel proposals that no other team has attempted -- because they are too new, too specialized, or simply ahead of their time.</p>
<p>Some standouts from the unique ideas:</p>
<p><strong>Verifiable Agent Behavior Attestation</strong> (draft-birkholz-verifiable-agent-conversations) -- A CDDL-based format for cryptographically signing agent conversation records, enabling post-hoc verification of agent behavior. This directly addresses the critical behavior verification gap.</p>
<p><strong>ADOL: Agentic Data Optimization Layer</strong> (<a href="https://datatracker.ietf.org/doc/draft-chang-agent-token-efficient/">draft-chang-agent-token-efficient</a>, score 4.5) -- Addresses token bloat in agent communication protocols. As agents exchange increasingly complex context, message sizes explode. ADOL compresses agent communications by 60-80%, a practical necessity that nobody else is working on.</p>
<p><strong>Working Memory</strong> (draft-agent-gw) -- A structured context management system that maintains state across multi-step agent operations. Sounds basic -- but no other draft proposes a standard for how agents should manage persistent operational context.</p>
<p><strong>Autonomous Optical Network Operation</strong> (draft-zhao-ccamp-actn-optical-network-agent) -- Applies agent architecture to the specific domain of optical network management. This is the kind of vertical specialization that validates the horizontal agent architecture work.</p>
<p><strong>Execution Context Token (ECT)</strong> (<a href="https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/">draft-nennemann-wimse-ect</a>, score 4.0) -- A JWT extension that records what each task did, linked to predecessors via a DAG. This is arguably the single most architecturally significant idea in the corpus: it turns the execution history of a multi-agent workflow into a cryptographically verifiable directed acyclic graph. It is the technical foundation for accountability, rollback, audit, and provenance.</p>
<p><strong>CHEQ Protocol</strong> (<a href="https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/">draft-rosenberg-aiproto-cheq</a>, score 3.9) -- Human confirmation of agent decisions before execution. The only concrete protocol proposal for human-in-the-loop agent oversight. In a landscape of 30 human-agent interaction drafts, CHEQ stands alone as an implementable solution.</p>
<h2 id="the-five-ideas-that-matter-most">The Five Ideas That Matter Most</h2>
<p>If you are building agent systems today and need to know which IETF proposals to watch, these five represent the highest combination of quality, novelty, and gap-filling potential:</p>
<table>
<thead>
<tr>
<th>Idea</th>
<th>Draft</th>
<th style="text-align: right;">Score</th>
<th>Why It Matters</th>
</tr>
</thead>
<tbody>
<tr>
<td>Execution Context Token</td>
<td>draft-nennemann-wimse-ect</td>
<td style="text-align: right;">4.0</td>
<td>DAG-based execution evidence; foundation for audit, rollback, and accountability</td>
</tr>
<tr>
<td>DAAP Accountability Protocol</td>
<td>draft-aylward-daap-v2</td>
<td style="text-align: right;">4.75</td>
<td>Most comprehensive safety proposal; authentication + monitoring + enforcement</td>
</tr>
<tr>
<td>STAMP Delegation Proofs</td>
<td>draft-guy-bary-stamp-protocol</td>
<td style="text-align: right;">4.5</td>
<td>Cryptographic proof that an agent was authorized for a specific task</td>
</tr>
<tr>
<td>Agent Description Language (ADL)</td>
<td>draft-nederveld-adl</td>
<td style="text-align: right;">4.1</td>
<td>JSON standard for describing agent capabilities, tools, and permissions</td>
</tr>
<tr>
<td>Verifiable Conversations</td>
<td>draft-birkholz-verifiable-agent-conversations</td>
<td style="text-align: right;">4.5</td>
<td>Cryptographic signing of conversation records for auditability</td>
</tr>
</tbody>
</table>
<p>Together, these five ideas sketch the outline of the ecosystem architecture that Post 6 will describe in full: ECT provides the execution backbone, DAAP provides the accountability layer, STAMP proves delegation, ADL describes capabilities, and verifiable conversations create the audit trail.</p>
<h2 id="mapping-ideas-to-gaps">Mapping Ideas to Gaps</h2>
<p>The most revealing analysis is mapping which ideas partially address which gaps:</p>
<table>
<thead>
<tr>
<th>Gap</th>
<th>Severity</th>
<th>Coverage</th>
</tr>
</thead>
<tbody>
<tr>
<td>Agent Behavioral Verification</td>
<td>CRITICAL</td>
<td>Partial: attestation and monitoring ideas exist but no runtime enforcement</td>
</tr>
<tr>
<td>Agent Failure Cascade Prevention</td>
<td>CRITICAL</td>
<td>Near-zero: minimal work on cascade containment</td>
</tr>
<tr>
<td>Real-Time Agent Rollback Mechanisms</td>
<td>HIGH</td>
<td>Near-zero: limited to draft-yue-anima-agent-recovery-networks</td>
</tr>
<tr>
<td>Multi-Agent Consensus Protocols</td>
<td>HIGH</td>
<td>Minimal: no conflict resolution framework</td>
</tr>
<tr>
<td>Human Override Standardization</td>
<td>HIGH</td>
<td>Near-zero: CHEQ exists but no emergency override protocol</td>
</tr>
<tr>
<td>Cross-Domain Agent Audit Trails</td>
<td>HIGH</td>
<td>Partial: identity covered, cross-domain audit not</td>
</tr>
<tr>
<td>Federated Agent Learning Privacy</td>
<td>HIGH</td>
<td>Minimal: privacy-preserving learning not specified</td>
</tr>
<tr>
<td>Cross-Protocol Agent Migration</td>
<td>MEDIUM</td>
<td>Complete absence in the corpus</td>
</tr>
<tr>
<td>Agent Resource Accounting and Billing</td>
<td>MEDIUM</td>
<td>Peripheral: resource types defined but no economic models</td>
</tr>
<tr>
<td>Agent Capability Negotiation</td>
<td>MEDIUM</td>
<td>Partial: tool enumeration exists but not dynamic negotiation</td>
</tr>
<tr>
<td>Agent Performance Benchmarking</td>
<td>MEDIUM</td>
<td>Moderate: benchmarking ideas exist (draft-cui-nmrg-llm-benchmark)</td>
</tr>
</tbody>
</table>
<p>The pattern is clear: the critical and high-severity gaps are those where the <em>periphery</em> of existing work touches the problem but nobody makes it the <em>central</em> problem. Teams building communication protocols think about resources; teams building discovery think about lifecycle. The gaps where no team is even circling the problem -- rollback mechanisms, human override, cascade prevention -- are the true blind spots.</p>
<h2 id="the-ideas-nobody-had">The Ideas Nobody Had</h2>
<p>Sometimes the absence is the finding. Here are technical ideas conspicuous in their absence from the entire corpus:</p>
<ul>
<li>
<p><strong>Agent capability degradation signaling</strong>: No protocol for an agent to advertise that its performance has degraded (model drift, resource constraints, partial failure). Other agents continue relying on it at full trust.</p>
</li>
<li>
<p><strong>Multi-agent transaction semantics</strong>: No ACID-like guarantees for multi-agent workflows. If three agents must all succeed or all roll back, there is no two-phase commit equivalent.</p>
</li>
<li>
<p><strong>Agent migration protocol</strong>: No standard for moving a running agent from one host to another while preserving state and active connections. Critical for cloud deployments.</p>
</li>
<li>
<p><strong>Privacy-preserving agent discovery</strong>: No mechanism for an agent to find capabilities without revealing its intent. "I need a medical diagnosis agent" reveals sensitive information before any trust is established. Under Art. 25 GDPR (data protection by design and by default), this is not just a nice-to-have -- it is a legal requirement for EU-deployed systems where discovery queries may constitute processing of special category data (Art. 9 GDPR, health data).</p>
</li>
<li>
<p><strong>Agent cost and billing</strong>: No standard for agents to negotiate compensation for services. Agents performing work for other agents have no way to express "this costs X" or "you have Y credits remaining."</p>
</li>
</ul>
<p>Each of these absences represents an opportunity for new drafts that would fill genuine needs.</p>
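<p>To make one of these absences concrete: multi-agent transaction semantics could borrow directly from classical two-phase commit. The sketch below is a minimal, hypothetical illustration -- the <code>Agent</code> interface and the coordinator are our own inventions, not taken from any draft in the corpus:</p>
<pre><code># Hypothetical sketch: two-phase commit across agents. No IETF draft
# in the corpus defines this; names and interfaces are illustrative.

class Agent:
    def __init__(self, name):
        self.name = name
        self.staged = None

    def prepare(self, task):
        """Phase 1: stage the work and vote commit/abort."""
        self.staged = task            # reserve resources, validate, etc.
        return True                   # vote "yes"

    def commit(self):
        """Phase 2a: make the staged work durable."""
        print(f"{self.name}: committed {self.staged}")

    def abort(self):
        """Phase 2b: discard the staged work."""
        self.staged = None

def two_phase_commit(agents, task):
    # Phase 1: every agent must vote yes, or nobody commits.
    if all(a.prepare(task) for a in agents):
        for a in agents:
            a.commit()
        return True
    for a in agents:
        a.abort()
    return False

two_phase_commit([Agent("planner"), Agent("coder"), Agent("reviewer")],
                 "refactor-module")
</code></pre>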
<h2 id="what-the-taxonomy-tells-builders">What the Taxonomy Tells Builders</h2>
<p>Three practical takeaways for anyone implementing agent systems:</p>
<p><strong>1. Build on the convergent ideas.</strong> Agent registration, action-based authorization, and capability-based discovery appear across multiple teams and organizations. These represent genuine consensus about what the infrastructure needs, even if implementations diverge.</p>
<p><strong>2. Watch the single-source innovations.</strong> The long tail of single-draft ideas contains the innovations that will differentiate the next generation of agent platforms. ECT, CHEQ, ADOL, and ADL are not widely known but represent some of the most thoughtful engineering in the corpus.</p>
<p><strong>3. Fill the blank spaces.</strong> Error recovery, cross-protocol translation, and human override are the clearest opportunities for new contributions. The community has signaled these gaps matter (through the severity of the gap analysis) but has not yet produced the ideas to fill them.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>The vast majority of ideas appear in exactly one draft</strong> -- fragmentation extends all the way down to the idea level</li>
<li><strong>130 cross-org convergent ideas</strong> (36% of unique clusters, via SequenceMatcher fuzzy matching at 0.75 threshold) reveal where organizations independently agree; highest-overlap pairs are Chinese institutions (China Unicom-Huawei: 32 shared ideas)</li>
<li><strong>The critical gaps remain unfilled</strong>: rollback mechanisms, failure cascade prevention, and human override have minimal coverage across 434 drafts</li>
<li><strong>Five ideas to watch</strong>: ECT (execution DAG), DAAP (accountability), STAMP (delegation proof), ADL (agent description), verifiable conversations (audit trail)</li>
<li><strong>Convergence clusters in three areas</strong>: agent communication infrastructure, authentication/authorization, and network architecture</li>
</ul>
<p><em>Next in this series: <a href="06-big-picture.md">Drawing the Big Picture</a> -- 130 cross-org convergent ideas, 11 gaps, and the architectural vision that connects them.</em></p>
<hr />
<p><em>Idea extraction performed by Claude from draft abstracts and full text. Classification into types (protocol, architecture, extension, mechanism, requirement, pattern) based on the technical content of each proposal. The current database contains 419 ideas; figures referencing ~1,780 ideas come from an earlier pipeline run with different extraction parameters. Data current as of March 2026.</em></p>
<div class="post-nav"><a href="/blog/posts/04-what-nobody-builds.html">&larr; What Nobody Builds</a><a href="/blog/posts/06-big-picture.html">The Big Picture &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -0,0 +1,193 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Drawing the Big Picture: What the Agent Ecosystem Actually Needs — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<strong>Picture</strong>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="drawing-the-big-picture-what-the-agent-ecosystem-actually-needs">Drawing the Big Picture: What the Agent Ecosystem Actually Needs</h1>
<p><em>434 drafts, 130 cross-org convergent ideas, 11 gaps -- and the architectural vision that connects them all.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>We have spent five posts documenting a paradox: the IETF's AI agent landscape has extraordinary breadth (434 drafts), deep fragmentation at every level (the vast majority of ideas appear in only one draft, 155 competing A2A protocols, 14 OAuth proposals), concentrated authorship (18 team blocs, one company writing ~16% of all drafts), and critical gaps (behavioral verification, failure cascade prevention, human override) that nobody is filling.</p>
<p>The landscape has quantity. It lacks architecture.</p>
<p>This post is about what the architecture looks like -- not in theory, but derived from the data. The 11 gaps are not random absences; they are structurally related. The convergent ideas contain the components; they need a blueprint. And the blueprint already has a foundation: existing IETF work on workload identity (SPIFFE/WIMSE) and execution evidence (Execution Context Tokens) provides the lower layers. What is missing is what goes on top.</p>
<h2 id="what-the-ecosystem-needs-four-pillars">What the Ecosystem Needs: Four Pillars</h2>
<p>Our analysis -- synthesizing the gaps, the ideas, and the existing proposals -- points to four missing pillars:</p>
<h3 id="pillar-1-dag-based-execution">Pillar 1: DAG-Based Execution</h3>
<p><strong>The gap it fills</strong>: Error Recovery and Rollback (Critical), Resource Management (Critical)</p>
<p>Every multi-agent workflow is a directed acyclic graph: tasks with dependencies, checkpoints, and decision points. But no draft in the corpus defines "agent task graph" as a first-class construct. Without it, there is no way to:</p>
<ul>
<li>Know which tasks depend on which</li>
<li>Place checkpoints for rollback</li>
<li>Calculate the blast radius of a failure</li>
<li>Schedule resources based on the graph structure</li>
</ul>
<p>The Execution Context Token (ECT) from <a href="https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/">draft-nennemann-wimse-ect</a> provides the evidence layer: each task produces a signed token linked to its predecessors via parent references, forming a verifiable DAG. What is missing is the orchestration semantics: when to checkpoint, how to roll back, how to contain cascading failures.</p>
<p>The data supports this: the limited work addressing error recovery (notably <a href="https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/">draft-yue-anima-agent-recovery-networks</a>) includes "Task-Oriented Multi-Agent Recovery Framework" and "State Consistency Management" -- DAG concepts by another name. The teams circling the problem keep arriving at the same structure: a DAG execution model.</p>
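<p>To make the evidence layer concrete, here is a minimal sketch of DAG-linked, integrity-protected task records. The field names are illustrative rather than the ECT claim set from the draft, and an HMAC stands in for a real asymmetric signature to keep the example self-contained:</p>
<pre><code># Minimal sketch of DAG-linked task records. Field names are
# illustrative, not the ECT claims from the draft; HMAC stands in
# for a real signature.
import hashlib
import hmac
import json

KEY = b"demo-signing-key"

def make_record(task_id, action, parents):
    body = {"task": task_id, "action": action, "parents": parents}
    payload = json.dumps(body, sort_keys=True).encode()
    tag = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}

def verify(record):
    payload = json.dumps(record["body"], sort_keys=True).encode()
    expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])

# A three-task workflow: fetch and parse feed into analyze.
r1 = make_record("t1", "fetch", parents=[])
r2 = make_record("t2", "parse", parents=[r1["tag"]])
r3 = make_record("t3", "analyze", parents=[r1["tag"], r2["tag"]])

# Tampering with t1 breaks its tag and invalidates every record that
# names it as a parent -- the property that makes the DAG auditable.
assert all(verify(r) for r in (r1, r2, r3))
</code></pre>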
<h3 id="pillar-2-human-in-the-loop-as-first-class">Pillar 2: Human-in-the-Loop as First Class</h3>
<p><strong>The gap it fills</strong>: Human Override and Intervention (High), Agent Explainability (Medium)</p>
<p>Only <strong>34 human-agent interaction drafts</strong> exist against <strong>155 A2A protocol</strong> drafts and <strong>114 autonomous operations</strong> drafts. Agents are being designed to talk to each other, not to humans. The CHEQ protocol (<a href="https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/">draft-rosenberg-aiproto-cheq</a>) is a rare exception -- it defines human confirmation <em>before</em> agent execution. But nobody has standardized what happens <em>during</em> execution: how a human pauses a running workflow, constrains an agent's scope, takes over a task, or issues an emergency stop.</p>
<p>Human-in-the-loop must be a node type in the execution DAG, not an afterthought. The architecture needs:</p>
<ul>
<li><strong>Approval gates</strong>: DAG nodes that block until a human approves (see the sketch after this list)</li>
<li><strong>Override commands</strong>: Standardized signals to pause, constrain, stop, or take over</li>
<li><strong>Escalation paths</strong>: What happens when an override times out</li>
<li><strong>Explainability tokens</strong>: How an agent communicates its reasoning at a HITL point</li>
</ul>
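<p>A minimal sketch of the first primitive -- an approval gate as a blocking node with an escalation path. Every name here is hypothetical; no draft standardizes these semantics:</p>
<pre><code># Hypothetical sketch: a human approval gate as a DAG node type.
# Names and semantics are illustrative, not from any draft.
from enum import Enum

class GateDecision(Enum):
    APPROVE = "approve"
    DENY = "deny"
    ESCALATE = "escalate"    # e.g. on timeout, route to a second approver

def approval_gate(summary, ask_human, timeout_s=300):
    """Block the workflow until a human decides, with an escalation path."""
    decision = ask_human(summary, timeout_s)   # True, False, or None on timeout
    if decision is None:
        return GateDecision.ESCALATE
    return GateDecision.APPROVE if decision else GateDecision.DENY

# The orchestrator supplies its own ask_human transport (UI, chat, pager).
result = approval_gate(
    "Agent requests write access to the billing DB",
    ask_human=lambda summary, timeout_s: True,   # stub: auto-approve for demo
)
print(result)    # GateDecision.APPROVE
</code></pre>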
<p>The irony: every production deployment will require these primitives. The standards community is building autonomous capabilities while the deployment community is adding human oversight ad hoc.</p>
<h3 id="pillar-3-protocol-agnostic-interoperability">Pillar 3: Protocol-Agnostic Interoperability</h3>
<p><strong>The gap it fills</strong>: Cross-Protocol Translation (High, zero ideas), Agent Lifecycle Management (High)</p>
<p>The 155 A2A protocol drafts will never converge to a single winner. MCP, A2A Protocol, SLIM, and dozens of others will coexist, each with different strengths. The answer is not to pick one; it is to build a translation layer that lets agents using different protocols interoperate through gateways.</p>
<p>This gap has <strong>zero ideas</strong> in the current corpus -- the starkest absence across 434 drafts. No team is working on it. Yet it is perhaps the most important architectural piece: without protocol interoperability, the agent ecosystem fragments into vendor-locked silos.</p>
<p>The protocol binding layer would define:</p>
<ul>
<li>How agents advertise which ecosystem features they support</li>
<li>How gateways translate between protocols while preserving execution semantics (the DAG, the HITL points)</li>
<li>How agents version and retire gracefully without breaking dependents</li>
<li>The minimal semantic contract: intent, result, error -- expressible in any protocol (sketched after this list)</li>
</ul>
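<p>The last item is small enough to sketch. A hypothetical minimal contract, plus one direction of a gateway translation -- all field, function, and wire-format names are invented for illustration:</p>
<pre><code># Hypothetical sketch of the minimal semantic contract a gateway would
# translate between protocols: intent, result, error. All names invented.
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class AgentMessage:
    intent: str                       # what the caller wants done
    payload: dict = field(default_factory=dict)
    result: Optional[Any] = None      # filled in by the responder
    error: Optional[str] = None       # set instead of result on failure

def to_rpc(msg: AgentMessage) -&gt; dict:
    """Outbound half of a gateway translation (wire format invented)."""
    return {"method": msg.intent, "params": msg.payload}

def from_rpc(msg: AgentMessage, resp: dict) -&gt; AgentMessage:
    """Inbound half: map a response back onto the contract."""
    msg.result = resp.get("result")
    msg.error = resp.get("error")
    return msg
</code></pre>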
<h3 id="pillar-4-assurance-profiles-dual-regime">Pillar 4: Assurance Profiles (Dual Regime)</h3>
<p><strong>The gap it fills</strong>: Behavior Verification (Critical), Cross-Domain Security (High), Dynamic Trust (High), Data Provenance (Medium)</p>
<p>The same agent ecosystem must work in two regimes:</p>
<p><strong>Relaxed</strong> (development, internal tools, low-risk): Best-effort, optional audit, minimal proof overhead. Think Kubernetes-deployed internal agents.</p>
<p><strong>Regulated</strong> (finance, healthcare, critical infrastructure): Cryptographic attestation per task, provenance chains, behavior verification against declared specifications, mandatory audit ledger. Think medical or financial agents.</p>
<p>The architecture achieves this with <em>assurance profiles</em> -- named configurations that dial up or down the proof requirements. The same DAG, same HITL points, same protocol bindings. Different levels of evidence:</p>
<table>
<thead>
<tr>
<th>Level</th>
<th>Evidence</th>
<th>Use Case</th>
</tr>
</thead>
<tbody>
<tr>
<td>L0</td>
<td>None (best-effort)</td>
<td>Development, testing</td>
</tr>
<tr>
<td>L1</td>
<td>Unsigned audit trail</td>
<td>Internal production</td>
</tr>
<tr>
<td>L2</td>
<td>Signed ECTs (JWT)</td>
<td>Cross-org, standard compliance</td>
</tr>
<tr>
<td>L3</td>
<td>Signed ECTs + external audit ledger</td>
<td>Regulated industries</td>
</tr>
</tbody>
</table>
<p>This dual-regime approach resolves the tension between "move fast" deployments and "prove everything" regulated environments. Ideas touching behavior verification and data provenance become implementable at higher assurance levels without imposing their cost on every deployment. Notably, the L2 and L3 profiles map directly to the conformity assessment requirements of the EU AI Act (Art. 43): high-risk AI systems must demonstrate compliance through either internal control (L2's signed ECTs) or third-party audit (L3's external audit ledger), making assurance profiles not just an engineering convenience but a regulatory implementation pathway.</p>
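<p>In configuration terms, a profile is just a named bundle of evidence requirements that the same workflow code consults at runtime. A minimal sketch, with illustrative flag and helper names -- nothing here comes from a draft:</p>
<pre><code># Sketch: assurance profiles as named configurations that dial evidence
# up or down. Level names follow the table above; flag and helper names
# are illustrative.
from dataclasses import dataclass

def append_local_log(record): print("log:", record)       # stub helpers
def sign_record(record): return {**record, "sig": "..."}
def push_to_ledger(record): print("ledger:", record)

@dataclass(frozen=True)
class AssuranceProfile:
    sign_ects: bool        # emit signed execution context tokens
    external_ledger: bool  # mirror evidence to an external audit ledger
    audit_trail: bool      # keep a local (possibly unsigned) audit trail

PROFILES = {
    "L0": AssuranceProfile(False, False, False),  # development, testing
    "L1": AssuranceProfile(False, False, True),   # internal production
    "L2": AssuranceProfile(True,  False, True),   # cross-org compliance
    "L3": AssuranceProfile(True,  True,  True),   # regulated industries
}

def record_task(profile, record):
    # The workflow code is identical at every level; only evidence changes.
    if profile.sign_ects:
        record = sign_record(record)
    if profile.audit_trail:
        append_local_log(record)
    if profile.external_ledger:
        push_to_ledger(record)

record_task(PROFILES["L2"], {"task": "t1", "action": "fetch"})
</code></pre>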
<h2 id="how-it-builds-on-what-exists">How It Builds on What Exists</h2>
<p>A critical point: this architecture does not compete with existing work. It layers on top of it. Our cross-reference analysis confirms the foundations are strong: <strong>TLS 1.3</strong> (RFC 8446, cited by 42 drafts), <strong>OAuth 2.0</strong> (RFC 6749, 36 drafts), <strong>HTTP Semantics</strong> (RFC 9110, 34 drafts), <strong>JWT</strong> (RFC 7519, 22 drafts), and <strong>COSE</strong> (RFC 9052, 20 drafts) form the bedrock.</p>
<p>But the bedrock is not uniform. Our RFC foundation analysis (Post 3) revealed that the Chinese and Western blocs build on <strong>fundamentally different technology stacks</strong>: YANG/NETCONF for network management on one side, COSE/CBOR/CoAP for IoT security on the other. The only shared foundation is OAuth 2.0. This means the architecture layer above must be genuinely protocol-agnostic -- it cannot assume either stack as the default. The four pillars are designed with this constraint: the DAG model, HITL primitives, and assurance profiles are expressed in terms of abstract semantics, not specific wire formats. The protocol binding layer (Pillar 3) exists precisely because the underlying plumbing diverges.</p>
<p>The architecture adds connective tissue above this layer, not below it:</p>
<table>
<thead>
<tr>
<th>Layer</th>
<th>Existing Work</th>
<th>What We Add</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Identity</strong></td>
<td>SPIFFE (workload identifier), WIMSE (security context propagation)</td>
<td>Nothing -- use existing identity</td>
</tr>
<tr>
<td><strong>Evidence</strong></td>
<td>ECT (execution context tokens, DAG linking)</td>
<td>Orchestration semantics, checkpoint/rollback, HITL nodes</td>
</tr>
<tr>
<td><strong>Auth</strong></td>
<td>OAuth 2.0, SCIM, DAAP, STAMP, Agentic JWT</td>
<td>Protocol binding so any auth approach works</td>
</tr>
<tr>
<td><strong>Communication</strong></td>
<td>MCP, A2A, SLIM, 155 other protocols</td>
<td>Translation layer and capability advertisement</td>
</tr>
<tr>
<td><strong>Safety</strong></td>
<td>DAAP (accountability), verifiable conversations, VERA (zero-trust)</td>
<td>Assurance profiles connecting these into deployable configurations</td>
</tr>
</tbody>
</table>
<p>The proposed five-draft ecosystem:</p>
<ol>
<li><strong>Agent Ecosystem Model (AEM)</strong> -- Architecture and terminology. The shared vocabulary so everyone speaks the same language.</li>
<li><strong>Agent Task DAG (ATD)</strong> -- Execution semantics, checkpoints, rollback. How the DAG works.</li>
<li><strong>Human-in-the-Loop (HITL) Primitives</strong> -- Approval gates, overrides, escalation. How humans participate.</li>
<li><strong>Agent Ecosystem Protocol Binding (AEPB)</strong> -- Protocol translation, capability discovery, lifecycle management. How interoperability works.</li>
<li><strong>Assurance Profiles (APAE)</strong> -- Behavior verification, dynamic trust, provenance. How you prove it all works.</li>
</ol>
<p>Each draft addresses specific gaps. Together, they provide the connective tissue the landscape lacks.</p>
<h2 id="traction-vs-aspiration">Traction vs. Aspiration</h2>
<p>A reality check: of the 434 drafts, <strong>52 (12%)</strong> have been adopted by IETF working groups. The rest are individual submissions -- proposals without institutional backing. The WG-adopted drafts score higher on average (<strong>3.61 vs. 3.23</strong>, 4-dimension composite), particularly on maturity (+1.28) and momentum (+0.98), but lower on novelty (-0.45). <em>(Note: scores are LLM-generated relative rankings from abstracts; see <a href="/blog/methodology.html">Methodology</a>.)</em> The WGs that have adopted the most agent-relevant drafts are security-focused: <strong>lamps</strong> (6 drafts), <strong>lake</strong> (5), <strong>tls</strong> (3), <strong>emu</strong> (3). Agent-specific WGs like <code>aipref</code> have adopted only 2 drafts.</p>
<p>This reveals a structural insight: the IETF is not building agent standards from scratch. It is <strong>retrofitting security standards for agents</strong>. The agent architecture we propose above would need to work within this reality -- building on the security WGs' infrastructure rather than competing with it.</p>
<h2 id="predictions">Predictions</h2>
<p>Based on the data trajectories and current momentum:</p>
<p><strong>Within 5 months (August 2026)</strong>: The EU AI Act (Regulation 2024/1689), which entered into force on 1 August 2024, becomes fully applicable on 2 August 2026. Its requirements for high-risk AI systems -- including mandatory risk management (Art. 9), human oversight (Art. 14), record-keeping (Art. 12), and accuracy/robustness (Art. 15) -- will drive immediate demand for behavior verification, human override, and audit standards. Non-compliance carries penalties up to 35 million EUR or 7% of global annual turnover (Art. 99). This is not future regulatory pressure; it is current law with imminent enforcement. The safety deficit is simultaneously a technical gap and a compliance gap for any agent system deployed in the EU.</p>
<p><strong>Within 6 months</strong>: The OAuth-for-agents fragmentation will partially resolve. Working groups will adopt 2-3 canonical approaches (likely DAAP/STAMP for accountability and one of the RAR extensions for basic auth). The remaining proposals will fade or merge.</p>
<p><strong>Within 12 months</strong>: The DMSC side meeting's gateway work will produce a specification, likely gateway-centric with Agent Gateways as the primary interoperability mechanism. This is not the protocol-agnostic translation layer the ecosystem needs, but it will be the first concrete interop proposal.</p>
<p><strong>The risk</strong>: If the architecture work does not happen in the next 12 months, the agent ecosystem will calcify around vendor-specific protocol stacks (OpenAI's, Google's, Anthropic's, Huawei's). Each will have its own auth, discovery, and communication layer. The interoperability window will close, and the IETF's work will be standards for islands rather than standards for the internet.</p>
<h3 id="the-ethics-of-standardizing-early">The Ethics of Standardizing Early</h3>
<p>There is a harder question underneath the technical one: should the IETF be standardizing agent capabilities at all before safety frameworks are mature? The ~4:1 capability-to-safety ratio is not just a gap -- it is a policy choice being made by default. Every A2A protocol that ships without behavior verification baked in creates a deployed base that resists retrofitting. The standards community is building the defaults that will govern billions of agent interactions, and those defaults currently assume trust rather than requiring proof.</p>
<p>The structural dynamics make this worse. The authorship analysis from Post 2 showed that a small number of large organizations -- Huawei, China Mobile, Cisco -- drive a disproportionate share of submissions. Civil society organizations, academic safety researchers, and smaller companies are largely absent from the drafting process. Standards that define agent identity, discovery, and communication also define what can be monitored, audited, and controlled. An agent discovery protocol designed primarily for enterprise deployment efficiency may inadvertently create a surveillance-friendly architecture if privacy and human autonomy are not first-class design constraints. The EU AI Act mandates human oversight (Art. 14), but a mandate is only as good as the protocol that implements it.</p>
<p>The IETF has historically been good at building infrastructure that serves everyone -- the end-to-end principle, protocol layering, rough consensus. But "rough consensus" among the current participants may not represent the interests of those most affected by autonomous agent systems. The architecture proposed above includes human-in-the-loop as a pillar, not an option. That is the right instinct. The question is whether the community will treat it with the same urgency as the protocol work -- or whether, as the data currently suggests, it will remain an aspiration while the highways ship without traffic lights.</p>
<h3 id="two-equilibria">Two Equilibria</h3>
<p>By 2028, the landscape will have resolved into one of two stable states.</p>
<p>In the <strong>first equilibrium</strong>, it looks like today's microservices ecosystem: a chaotic but functional collection of protocols, libraries, and frameworks, held together by platform-specific integrations and de facto standards from the largest cloud providers. The IETF's work exists but is incomplete. The real interoperability happens at higher layers -- agent frameworks like LangChain, Semantic Kernel, or their successors. Safety is bolted on after deployment.</p>
<p>In the <strong>second equilibrium</strong>, it looks more like the web: a layered architecture where identity (like TLS), communication (like HTTP), and semantics (like HTML) are cleanly separated, with standardized interfaces between them. Agents identify via WIMSE, execute via ECT-based DAGs, communicate via protocol-agnostic bindings, and operate under assurance profiles that scale from development to regulated production. Safety is built in, not bolted on.</p>
<p>The aggregate capability-to-safety ratio (averaging ~4:1 but varying from 1.5:1 to 21:1 month-to-month) is the leading indicator. If it narrows -- if safety and oversight work accelerates to match capability work -- the second equilibrium becomes achievable. If it stays at ~4:1 or widens, the first equilibrium is where we land, and safety becomes remediation rather than prevention.</p>
<h2 id="what-builders-should-do-today">What Builders Should Do Today</h2>
<p>If you are building agent systems and cannot wait for standards to mature:</p>
<p><strong>1. Watch these drafts</strong>: ECT (execution evidence), DAAP (accountability), CHEQ (human confirmation), ADL (agent description), ANS (agent discovery). These have the highest combination of quality, novelty, and adoption potential.</p>
<p><strong>2. Design for the DAG</strong>: Structure your multi-agent workflows as directed acyclic graphs with explicit dependencies and checkpoints. Even without a standard, the pattern will be compatible with whatever emerges.</p>
<p><strong>3. Build HITL from the start</strong>: Every production agent deployment needs human override capability. Do not add it later. Design approval gates, emergency stops, and escalation paths into your architecture now.</p>
<p><strong>4. Implement assurance as a dial</strong>: Make your proof/audit level configurable. Start at L0 for development, L1 for production, and be ready to turn up to L2/L3 when regulation arrives.</p>
<p><strong>5. Avoid protocol lock-in</strong>: If you build on MCP today, architect for the possibility of supporting A2A or SLIM tomorrow. The protocol war is not over, and the winner may be "all of them via translation."</p>
<h2 id="the-thesis">The Thesis</h2>
<p>Across six posts, we have built to one argument:</p>
<p><strong>The IETF's AI agent standardization effort is the largest, fastest-growing, and most consequential standards race in a decade. But it is building the highways before the traffic lights.</strong> The data shows explosive growth (from 0.5% to 9.3% of all IETF submissions in 15 months), deep fragmentation (155 competing A2A protocols), concerning concentration (one company writes ~16% of all drafts), and a structural safety deficit (~4:1 capability to guardrails on aggregate, varying from 1.5:1 to 21:1 by month). What is missing is not more protocols -- it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles that work from development to regulated production.</p>
<p>The convergent ideas -- and the broader set of 130 cross-org overlaps (36% of unique idea clusters) -- contain the components for this architecture. The question is whether the community can assemble them before the protocols ship without it. The convergence data suggests it is possible: <strong>180 ideas already cross the Chinese-Western divide</strong>, mediated largely by European telecoms (Deutsche Telekom, Telefonica, Orange) that operate in both markets and appear on both sides of nearly every major cross-cultural convergent idea. The bridge-builders exist. They need an architecture to bridge to.</p>
<p>The IETF has built the internet's infrastructure before. DNS, HTTP, TLS -- each emerged from periods of competing proposals, fragmentation, and coordinated resolution. The AI agent standards race is following the same pattern, on a compressed timeline, with higher stakes.</p>
<p>The traffic lights need to catch up to the highways. The data says they can -- if someone draws the big picture.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>Four missing pillars</strong>: DAG-based execution, human-in-the-loop primitives, protocol-agnostic interoperability, and assurance profiles for dual-regime deployment</li>
<li><strong>The architecture builds on existing work</strong>: SPIFFE for identity, WIMSE for security context, ECT for execution evidence -- the foundation exists</li>
<li><strong>Five proposed drafts</strong> (AEM, ATD, HITL, AEPB, APAE) would fill the 11 gaps by providing connective tissue between existing protocol proposals</li>
<li><strong>The interoperability window is closing</strong>: vendor-specific agent stacks are forming; the next 12 months are critical for open standards</li>
<li><strong>For builders today</strong>: design for DAGs, build HITL from the start, make assurance configurable, avoid protocol lock-in</li>
</ul>
<p><em>Next in this series: <a href="07-how-we-built-this.md">How We Built This</a> -- the methodology behind analyzing 434 IETF drafts with Claude, Ollama, and Python.</em></p>
<hr />
<p><em>Synthesis based on the full IETF Draft Analyzer dataset: 434 drafts, 557 authors, 130 cross-org convergent ideas (via SequenceMatcher fuzzy matching at 0.75 threshold), 11 gaps, 18 team blocs. Data current as of March 2026.</em></p>
<div class="post-nav"><a href="/blog/posts/05-1262-ideas.html">&larr; Where Drafts Converge</a><a href="/blog/posts/07-how-we-built-this.html">How We Built This &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -0,0 +1,345 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>How We Built This: Analyzing 434 IETF Drafts with Claude and Ollama — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<strong>This</strong>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="how-we-built-this-analyzing-434-ietf-drafts-with-claude-and-ollama">How We Built This: Analyzing 434 IETF Drafts with Claude and Ollama</h1>
<p><em>The engineering behind the analysis -- a Python CLI, two LLMs, one SQLite database, and ~$9.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>Every claim in this series -- the ~4:1 safety ratio (averaging ~4:1 but varying from 1.5:1 to 21:1 month-to-month), the 14 competing OAuth proposals, the 18 team blocs, the 11 gaps, the 180 ideas crossing the Chinese-Western divide -- comes from an automated analysis pipeline we built in Python. This post describes how it works, what it costs, what it found that surprised us, and what we learned about LLM-powered document analysis at scale.</p>
<p>The tool is open source. If you want to run it on a different corner of the IETF -- or adapt it for another standards body -- everything you need is in the repository.</p>
<h2 id="the-pipeline">The Pipeline</h2>
<p>The analysis runs in six core stages. Each builds on the previous, and every stage caches its work so re-runs are fast and cheap.</p>
<pre><code>fetch --&gt; analyze --&gt; embed --&gt; ideas --&gt; gaps --&gt; report
| | | | | |
v v v v v v
Datatracker Claude Ollama Claude Claude Markdown
API Sonnet nomic-embed Haiku Sonnet + rich
</code></pre>
<p>Three additional analysis passes run on top of the core pipeline:</p>
<pre><code>refs --&gt; trends --&gt; idea-overlap --&gt; status
| | | |
v v v v
Regex SQL query SequenceMatcher Naming convention
(local) (local) (local) (local)
</code></pre>
<p>These secondary passes cost nothing -- they operate entirely on data already in the database.</p>
<h3 id="stage-1-fetch">Stage 1: Fetch</h3>
<p>The Datatracker API (<code>https://datatracker.ietf.org/api/v1/doc/document/</code>) provides structured metadata for every Internet-Draft: name, title, abstract, authors, revision, submission date, working group, and current status. Full text is available at <code>https://www.ietf.org/archive/id/{name}-{rev}.txt</code>.</p>
<p>We search for drafts matching 12 keywords: <code>agent</code>, <code>ai-agent</code>, <code>llm</code>, <code>autonomous</code>, <code>machine-learning</code>, <code>artificial-intelligence</code>, <code>mcp</code>, <code>agentic</code>, <code>inference</code>, <code>generative</code>, <code>intelligent</code>, <code>aipref</code>. Both <code>name__contains</code> and <code>abstract__contains</code> filters are used to cast a wide net. We started with 6 keywords and 260 drafts; adding 6 more captured 101 new drafts in categories we were missing -- MCP-related work, generative AI infrastructure, intelligent networking, and the nascent <code>aipref</code> working group.</p>
<p><strong>Gotchas learned the hard way</strong>: The Datatracker API uses <code>type__slug=draft</code> (not <code>type=draft</code>) to filter to drafts. Pagination requires tracking <code>meta.next</code> through the response chain. Affiliation data comes from the <code>documentauthor</code> record, not the <code>person</code> record. We add a 0.5-second polite delay between requests.</p>
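<p>A condensed sketch of this stage, using only the API behaviors described above (endpoint, <code>type__slug</code> filter, <code>meta.next</code> pagination, polite delay); error handling and storage are trimmed:</p>
<pre><code># Condensed sketch of the fetch stage. Only behaviors described in the
# text are used; error handling and SQLite storage are omitted.
import time
import requests

BASE = "https://datatracker.ietf.org"

def search_drafts(keyword):
    """Yield draft metadata dicts for one keyword, following pagination."""
    url = "/api/v1/doc/document/"
    params = {"type__slug": "draft", "name__contains": keyword,
              "format": "json"}
    while url:
        resp = requests.get(BASE + url, params=params, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        yield from data["objects"]
        url = data["meta"]["next"]    # None on the last page
        params = None                 # the next URL already carries the query
        time.sleep(0.5)               # polite delay between requests

for draft in search_drafts("agent"):
    print(draft["name"])
</code></pre>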
<p>The result: <strong>434 drafts</strong> fetched, with full metadata and text stored in SQLite.</p>
<h3 id="stage-2-analyze">Stage 2: Analyze</h3>
<p>Each draft is sent to Claude Sonnet with a compact structured prompt that includes the draft name, title, date, page count, and abstract. The prompt asks for:</p>
<ul>
<li><strong>Category classification</strong> (one or more of 11 categories: A2A protocols, agent identity/auth, autonomous netops, data formats/interop, agent discovery/reg, human-agent interaction, AI safety/alignment, ML traffic management, policy/governance, model serving/inference, other)</li>
<li><strong>Quality rating</strong> on five dimensions (novelty, maturity, overlap, momentum, relevance), each scored 1-5</li>
<li><strong>Brief summary</strong> of what the draft does and why it matters</li>
</ul>
<p>The key optimization: <strong>caching</strong>. Every Claude API call is stored in an <code>llm_cache</code> table keyed by the SHA-256 hash of the full prompt. If the same draft is analyzed twice, the second call is free and instant. This makes the pipeline idempotent -- you can re-run any stage without wasting money.</p>
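<p>The mechanism is simple enough to sketch in full. The two-column schema below is a simplification of the real table:</p>
<pre><code># Sketch of the prompt-hash cache: every LLM call is keyed by the
# SHA-256 of its full prompt, making re-runs free and idempotent.
import hashlib
import sqlite3

db = sqlite3.connect("ietf.db")
db.execute("CREATE TABLE IF NOT EXISTS llm_cache (key TEXT PRIMARY KEY, response TEXT)")

def cached_call(prompt, call_llm):
    key = hashlib.sha256(prompt.encode()).hexdigest()
    row = db.execute("SELECT response FROM llm_cache WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]                 # cache hit: free and instant
    response = call_llm(prompt)       # cache miss: one paid API call
    db.execute("INSERT INTO llm_cache VALUES (?, ?)", (key, response))
    db.commit()
    return response
</code></pre>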
<p>We initially sent full draft text to Claude, but switched to abstract-only analysis after testing showed that abstracts produce equivalent ratings at roughly 10x lower token cost. Full text is still used for idea extraction (Stage 4), where granular detail matters.</p>
<p><strong>Cost</strong>: About $3.16 for the initial 260 drafts on Claude Sonnet (376K input tokens, 200K output tokens). With the <code>--cheap</code> flag, analysis uses Claude Haiku instead, cutting costs roughly 10x.</p>
<h3 id="stage-3-embed">Stage 3: Embed</h3>
<p>For similarity analysis, we generate vector embeddings using Ollama running locally with the <code>nomic-embed-text</code> model. Each draft's abstract is embedded into a 768-dimensional vector, stored as raw bytes in the database.</p>
<p><strong>Why not Claude for embeddings?</strong> Cost and speed. Ollama runs locally, is free, and processes all 434 drafts in under a minute. The embeddings are used for approximate similarity (cosine distance), overlap detection, and t-SNE visualization -- tasks where a small local model is perfectly adequate.</p>
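<p>Once the vectors are in the database, the similarity math is a few lines. A sketch, assuming the blobs are float32-encoded (the dtype is our assumption; the thresholds come from the list that follows):</p>
<pre><code># Sketch of similarity over the stored embeddings: decode each blob
# back into a vector and compare. float32 is an assumption about the
# encoding; the raw-bytes storage design is from the text above.
import numpy as np

def cosine(a: bytes, b: bytes) -&gt; float:
    va = np.frombuffer(a, dtype=np.float32)
    vb = np.frombuffer(b, dtype=np.float32)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

def classify(sim: float) -&gt; str:
    """Thresholds from the overlap analysis described below."""
    if sim &gt; 0.98:
        return "near-duplicate"
    if sim &gt; 0.85:
        return "overlap-cluster"
    return "distinct"
</code></pre>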
<p>The embeddings enable:</p>
<ul>
<li><strong>Overlap clusters</strong>: Draft pairs with &gt;0.85 cosine similarity grouped together</li>
<li><strong>Near-duplicate detection</strong>: 25+ pairs with &gt;0.98 similarity flagged as potential duplicates</li>
<li><strong>Interactive t-SNE landscape</strong>: 2D visualization of the entire draft space, color-coded by category</li>
</ul>
<h3 id="stage-4-ideas">Stage 4: Ideas</h3>
<p>The most expensive stage. Each draft's full text is analyzed by Claude to extract discrete technical ideas -- mechanisms, architectures, protocols, patterns, extensions, and requirements.</p>
<p><strong>Batch optimization</strong>: Rather than calling Claude once per draft, we batch 5 drafts per API call using Claude Haiku (<code>--cheap --batch 5</code>). This cuts the number of API calls by 5x and uses the cheaper model. The batch prompt includes all 5 drafts' texts and asks for ideas from each, reducing per-idea cost to fractions of a cent.</p>
<p><strong>Result</strong>: The current database contains <strong>419 ideas</strong> across 377 drafts. An earlier pipeline run produced roughly 1,780 components from 361 drafts (averaging ~5 per draft). The difference reflects changes in extraction parameters, batching strategy, and deduplication -- a known limitation of LLM-based extraction. What is consistent across both runs: the vast majority of extracted ideas appear in exactly one draft, and most are draft-specific component descriptions rather than standalone innovations. The real signal comes from the cross-org overlap analysis (idea-overlap feature), which uses SequenceMatcher fuzzy matching (0.75 threshold) to identify <strong>130 cross-org convergent ideas</strong> where 2+ organizations work on recognizably similar problems (an earlier run with ~1,780 ideas yielded 628; the convergence rate of ~36% is consistent across both).</p>
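<p>The batching itself is trivial -- the savings come from amortizing prompt overhead across five drafts per call. A sketch with illustrative prompt wording (the actual extraction prompt is not reproduced here):</p>
<pre><code># Sketch of the batch optimization: five drafts per prompt, one call.
# The prompt wording is illustrative, not the real extraction prompt.
def batches(items, size=5):
    for i in range(0, len(items), size):
        yield items[i:i + size]

def batch_prompt(drafts):
    parts = [f"## Draft {i + 1}: {d['name']}\n\n{d['text']}"
             for i, d in enumerate(drafts)]
    return ("Extract the discrete technical ideas from each draft below, "
            "labeled by draft number.\n\n" + "\n\n".join(parts))

# for group in batches(all_drafts): call_llm(batch_prompt(group))
</code></pre>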
<h3 id="stage-5-gaps">Stage 5: Gaps</h3>
<p>The gap analysis is a synthesis step. We send Claude Sonnet the full landscape context -- category distributions, idea taxonomy, safety ratio, overlap patterns -- and ask it to identify areas where standardization work is missing or inadequate.</p>
<p>This is the one stage where the LLM is doing genuine reasoning, not just extraction. The prompt provides the data; Claude identifies the structural gaps. We validate its findings against the raw data (e.g., confirming that only 6 ideas address error recovery, or that cross-protocol translation has zero ideas).</p>
<p><strong>Result</strong>: <strong>11 gaps</strong> identified (2 critical, 5 high, 4 medium), each cross-referenced with related drafts and ideas.</p>
<h3 id="stage-6-report">Stage 6: Report</h3>
<p>Reports are generated in Markdown with embedded data tables. Fifteen report types are available, including overview, landscape, digest, timeline, overlap-matrix, overlap-clusters, authors, ideas, gaps, refs, trends, idea-overlap, and status. The <code>rich</code> library provides formatted terminal output for CLI commands.</p>
<h2 id="the-database">The Database</h2>
<p>The SQLite database is the real product. At <strong>28 MB</strong>, it contains everything needed to reproduce any finding in this series.</p>
<table>
<thead>
<tr>
<th>Table</th>
<th style="text-align: right;">Rows</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>drafts</td>
<td style="text-align: right;">434</td>
<td>Full metadata + text for every draft</td>
</tr>
<tr>
<td>ratings</td>
<td style="text-align: right;">434</td>
<td>5-dimension quality scores + summaries</td>
</tr>
<tr>
<td>embeddings</td>
<td style="text-align: right;">434</td>
<td>768-dim vectors as binary blobs</td>
</tr>
<tr>
<td>ideas</td>
<td style="text-align: right;">419</td>
<td>Extracted technical components with types</td>
</tr>
<tr>
<td>authors</td>
<td style="text-align: right;">557</td>
<td>Person records from Datatracker</td>
</tr>
<tr>
<td>draft_authors</td>
<td style="text-align: right;">1,057</td>
<td>Author-to-draft linkage with affiliation</td>
</tr>
<tr>
<td>draft_refs</td>
<td style="text-align: right;">4,231</td>
<td>RFC/draft/BCP cross-references</td>
</tr>
<tr>
<td>gaps</td>
<td style="text-align: right;">11</td>
<td>Identified standardization gaps</td>
</tr>
<tr>
<td>llm_cache</td>
<td style="text-align: right;">1,397</td>
<td>Cached Claude API responses</td>
</tr>
</tbody>
</table>
<p>FTS5 full-text search is enabled on drafts, supporting queries like <code>ietf search "agent authentication"</code> that return ranked results in milliseconds. Indexes on <code>draft_refs(ref_type, ref_id)</code> and <code>ideas(draft_name)</code> keep query performance fast even for cross-table joins.</p>
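<p>A sketch of the search path behind <code>ietf search</code>; the FTS5 column names are illustrative, while the <code>MATCH</code> syntax and <code>bm25()</code> ranking are standard SQLite:</p>
<pre><code># Sketch of FTS5-backed search. Column names are illustrative.
import sqlite3

db = sqlite3.connect("ietf.db")
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS drafts_fts USING fts5(name, title, text)")

def search(query, limit=10):
    return db.execute(
        "SELECT name, title, bm25(drafts_fts) AS score "
        "FROM drafts_fts WHERE drafts_fts MATCH ? "
        "ORDER BY score LIMIT ?",     # bm25: lower is more relevant
        (query, limit),
    ).fetchall()

print(search('"agent authentication"'))   # phrase query, ranked results
</code></pre>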
<p>The database design follows a principle: <strong>store raw data, compute derived data</strong>. The drafts table stores full text; the ratings, ideas, and refs tables store analysis results. Any analysis can be re-run without re-fetching from the Datatracker API.</p>
<h2 id="the-author-network">The Author Network</h2>
<p>The author analysis deserves special mention because it revealed the team bloc pattern -- one of the most important findings in the series.</p>
<p>The IETF Datatracker provides author information via two API endpoints:</p>
<ul>
<li><code>/api/v1/doc/documentauthor/?document__name=X</code> -- returns author links per draft</li>
<li><code>/api/v1/person/person/{id}/</code> -- returns person details (name, affiliation)</li>
</ul>
<p>We fetch all authors for all drafts, build a co-authorship graph, and detect team blocs: groups where every pair of members shares at least 70% of their drafts. This threshold was chosen empirically -- lower thresholds produce too many loose groups; higher thresholds miss real teams.</p>
<p>The detection algorithm (transcribed into code below):</p>
<ol>
<li>For each pair of authors, calculate pairwise overlap = |shared drafts| / min(|A's drafts|, |B's drafts|)</li>
<li>Build a graph where edges represent pairs with &gt;= 70% overlap and &gt;= 2 shared drafts</li>
<li>Find connected components in this graph</li>
<li>Each component is a team bloc</li>
</ol>
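<p>Transcribed into code, the whole detection fits in a short function (a sketch over an in-memory mapping; the real implementation presumably reads from the database):</p>
<pre><code># Direct transcription of the bloc-detection algorithm above:
# overlap = |shared| / min(|A|, |B|), edge at &gt;= 0.70 overlap and
# &gt;= 2 shared drafts, then connected components.
from itertools import combinations

def team_blocs(author_drafts):            # {author: set(draft_names)}
    adj = {a: set() for a in author_drafts}
    for a, b in combinations(author_drafts, 2):
        shared = author_drafts[a] &amp; author_drafts[b]
        smaller = min(len(author_drafts[a]), len(author_drafts[b]))
        if len(shared) &gt;= 2 and len(shared) / smaller &gt;= 0.70:
            adj[a].add(b)
            adj[b].add(a)
    blocs, seen = [], set()
    for start in adj:                     # connected components via DFS
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        if len(comp) &gt; 1:               # singletons are not blocs
            blocs.append(comp)
    return blocs
</code></pre>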
<p><strong>Organization normalization</strong> turned out to be essential. "Huawei Technologies", "Huawei Technologies Co., Ltd.", and "Huawei Canada" all need to resolve to "Huawei". We maintain a hand-curated alias table of 40+ mappings plus automatic suffix stripping for common patterns (", Inc.", " LLC", " AB", etc.). Without this, cross-org analysis would fragment the same company into multiple entities.</p>
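<p>A sketch of the normalization step -- the aliases shown are a tiny sample of the 40+ mappings, and the suffix list is abbreviated:</p>
<pre><code># Sketch of organization normalization: a hand-curated alias table plus
# automatic suffix stripping. Aliases and suffixes here are a sample.
import re

ALIASES = {
    "huawei technologies": "Huawei",
    "huawei technologies co., ltd.": "Huawei",
    "huawei canada": "Huawei",
}
SUFFIX = re.compile(r",?\s+(inc\.?|llc|ltd\.?|ab|gmbh|co\.)$", re.IGNORECASE)

def normalize_org(raw):
    name = raw.strip()
    alias = ALIASES.get(name.lower())
    if alias:
        return alias
    return SUFFIX.sub("", name)

print(normalize_org("Huawei Canada"))        # Huawei
print(normalize_org("Tradeverifyd, Inc."))   # Tradeverifyd
</code></pre>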
<p><strong>Result</strong>: <strong>18 team blocs</strong> detected among 557 authors. The largest: a 13-person Huawei team with 22 shared drafts and 94% average cohesion.</p>
<h2 id="the-new-features">The New Features</h2>
<p>Four features were added during the analysis session, each unlocking a deeper analytical layer. All four run locally with zero API cost.</p>
<h3 id="rfc-cross-references-ietf-refs">RFC Cross-References (<code>ietf refs</code>)</h3>
<p><strong>What it does</strong>: Parses all 434 drafts for RFC references using regex (<code>RFC\s*\d{4,}</code>, <code>\[RFC\d+\]</code>, <code>BCP\s*\d+</code>, <code>draft-[\w-]+</code>). Stores results in a <code>draft_refs</code> table for querying.</p>
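<p>A minimal sketch of that scan, using the patterns named above (normalization and storage into <code>draft_refs</code> omitted):</p>
<pre><code>import re

PATTERNS = {
    "rfc": re.compile(r"\[?RFC\s*(\d{4,})\]?"),
    "bcp": re.compile(r"BCP\s*(\d+)"),
    "draft": re.compile(r"draft-[\w-]+"),
}

def extract_refs(text):
    """Return a set of (ref_type, ref_id) tuples found in a draft's text."""
    refs = set()
    for ref_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Numbered patterns capture the number; the draft pattern keeps the name
            ref_id = match.group(1) if pattern.groups else match.group(0)
            refs.add((ref_type, ref_id))
    return refs
</code></pre>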
<p><strong>What it found</strong>: <strong>4,231 cross-references</strong> (2,443 RFC, 698 draft, 1,090 BCP) across 360 drafts with text. The most-referenced standards reveal what the agent ecosystem builds on:</p>
<table>
<thead>
<tr>
<th>RFC</th>
<th style="text-align: right;">References</th>
<th>What It Is</th>
</tr>
</thead>
<tbody>
<tr>
<td>RFC 2119</td>
<td style="text-align: right;">285</td>
<td>MUST/SHALL/MAY conventions</td>
</tr>
<tr>
<td>RFC 8174</td>
<td style="text-align: right;">237</td>
<td>Key words update</td>
</tr>
<tr>
<td>RFC 8446</td>
<td style="text-align: right;">42</td>
<td>TLS 1.3</td>
</tr>
<tr>
<td>RFC 6749</td>
<td style="text-align: right;">36</td>
<td>OAuth 2.0</td>
</tr>
<tr>
<td>RFC 9110</td>
<td style="text-align: right;">34</td>
<td>HTTP Semantics</td>
</tr>
<tr>
<td>RFC 8259</td>
<td style="text-align: right;">26</td>
<td>JSON</td>
</tr>
<tr>
<td>RFC 5280</td>
<td style="text-align: right;">22</td>
<td>X.509 Certificates</td>
</tr>
<tr>
<td>RFC 7519</td>
<td style="text-align: right;">22</td>
<td>JWT</td>
</tr>
<tr>
<td>RFC 9052</td>
<td style="text-align: right;">20</td>
<td>COSE</td>
</tr>
</tbody>
</table>
<p><strong>The insight</strong>: Strip away RFC 2119/8174 (boilerplate conventions that every IETF draft references) and the picture is clear: the agent ecosystem is built on <strong>OAuth + TLS + HTTP + JWT</strong>. It is a security and identity infrastructure, not a networking infrastructure. The IETF's agent standards are being constructed on the same foundation as the web itself. This reframes the entire landscape: agent standards are not something new. They are the next layer on top of the web's existing security architecture.</p>
<h3 id="category-trends-ietf-trends">Category Trends (<code>ietf trends</code>)</h3>
<p><strong>What it does</strong>: Monthly breakdown of new drafts per category with growth rates, comparing recent periods to earlier ones.</p>
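<p>The underlying query is plain SQL over the drafts table; a sketch, assuming <code>submitted</code> date and <code>category</code> columns (names are assumptions):</p>
<pre><code>import sqlite3

def monthly_trend(db_path, category):
    # Count new drafts per month for one category
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        """
        SELECT strftime('%Y-%m', submitted) AS month, COUNT(*) AS new_drafts
        FROM drafts
        WHERE category = ?
        GROUP BY month
        ORDER BY month
        """,
        (category,),
    ).fetchall()
    conn.close()
    return rows
</code></pre>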
<p><strong>What it found</strong>: The growth curve is a step function. Monthly submissions went from 2 (Jun 2025) to 67 (Oct 2025) to 86 (Feb 2026). A2A protocols are still accelerating (26 in Oct/Nov 2025, 36 in Feb 2026). Safety/alignment is growing but slower (5 in Oct 2025, 12 in Feb 2026). The aggregate ~4:1 ratio (which varies from 1.5:1 to 21:1 month-to-month) is narrowing, but not fast enough.</p>
<h3 id="cross-org-idea-overlap-ietf-idea-overlap">Cross-Org Idea Overlap (<code>ietf idea-overlap</code>)</h3>
<p><strong>What it does</strong>: Groups similar ideas using <code>SequenceMatcher</code> (threshold 0.75), then checks which ideas span drafts from multiple organizations. This separates genuine cross-org consensus from intra-team duplication.</p>
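<p>A sketch of the fuzzy grouping as a greedy single pass against each cluster's first member (the real implementation may differ in detail):</p>
<pre><code>from difflib import SequenceMatcher

def cluster_ideas(titles, threshold=0.75):
    clusters = []
    for title in titles:
        for cluster in clusters:
            rep = cluster[0]  # compare against the cluster's representative
            if SequenceMatcher(None, title.lower(), rep.lower()).ratio() &gt;= threshold:
                cluster.append(title)
                break
        else:  # no cluster matched: start a new one
            clusters.append([title])
    return clusters
</code></pre>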
<p><strong>What it found</strong>: By exact title, the vast majority of unique ideas appear in only a single draft. But fuzzy matching reveals <strong>130 cross-org convergent ideas</strong> (36% of unique clusters) where 2+ organizations work on recognizably similar problems. The top convergence signal -- "A2A Communication Paradigm" -- spans <strong>8 organizations from 5 countries</strong>. The deeper finding: <strong>180 ideas cross the Chinese-Western organizational divide</strong>. European telecoms (Deutsche Telekom, Telefonica, Orange) act as bridges between Chinese institutions and Western companies. US Big Tech (Google, Apple, Amazon) is almost entirely absent from cross-divide collaboration.</p>
<h3 id="wg-adoption-status-ietf-status">WG Adoption Status (<code>ietf status</code>)</h3>
<p><strong>What it does</strong>: Determines which drafts have been formally adopted by IETF Working Groups based on the <code>draft-ietf-{wg}-*</code> naming convention. Compares scores, categories, and gap coverage between WG-adopted and individual drafts.</p>
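<p>The check itself is a one-line naming test; a sketch:</p>
<pre><code>import re

WG_ADOPTED = re.compile(r"^draft-ietf-([a-z0-9]+)-")

def wg_of(draft_name):
    """Return the working group for a WG-adopted draft, else None."""
    m = WG_ADOPTED.match(draft_name)
    return m.group(1) if m else None

# A draft-ietf-lamps-* name returns "lamps"; an individual
# submission named draft-lastname-topic returns None.
</code></pre>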
<p><strong>What it found</strong>: <strong>52 of 434 drafts (12%)</strong> are WG-adopted. The remaining 88% are individual submissions -- ideas seeking institutional backing. WG-adopted drafts score slightly higher on average (<strong>3.61 vs 3.23</strong>), a result consistent with our rating methodology.</p>
<p>The most revealing finding: <strong>a majority of WG-adopted drafts are in security Working Groups</strong> (lamps, lake, tls, emu, ace). The agent-focused <code>aipref</code> WG has only 2 adopted drafts. The IETF is not building agent standards in agent-focused groups -- it is retrofitting its existing security infrastructure for agent use cases. The standards that will actually govern AI agents on the internet are being written by the same people who write TLS and OAuth, not by new agent-specific working groups.</p>
<h2 id="what-we-learned">What We Learned</h2>
<h3 id="llms-are-good-at-structured-extraction">LLMs are good at structured extraction</h3>
<p>Claude's strength in this pipeline is turning unstructured technical documents into structured data: categories, ratings, ideas, gaps. The extraction quality is high -- we spot-checked 50 drafts and found categorization and idea extraction accurate in ~90% of cases. The errors tend to be over-categorization (assigning too many categories) rather than miscategorization.</p>
<h3 id="llms-need-validation-for-synthesis">LLMs need validation for synthesis</h3>
<p>The gap analysis (Stage 5) required the most human oversight. Claude correctly identified the gaps, but the severity rankings and the "zero ideas" claims needed manual verification against the raw data. LLMs can synthesize, but the synthesis should be treated as a hypothesis, not a conclusion.</p>
<h3 id="caching-changes-the-economics">Caching changes the economics</h3>
<p>The <code>llm_cache</code> table transforms the cost model. The first run costs ~$3. Every subsequent run -- adding new drafts, re-running with different prompts, regenerating reports -- costs only for new work. Over the project's life, we estimate caching saved $30+ in redundant API calls. The cache key is a SHA-256 hash of the full prompt, making it trivially collision-resistant.</p>
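<p>A sketch of the lookup, assuming an <code>llm_cache(key, response)</code> schema (column names are assumptions):</p>
<pre><code>import hashlib
import sqlite3

def cached_call(conn, prompt, call_fn):
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE key = ?", (key,)
    ).fetchone()
    if row:
        return row[0]               # cache hit: zero API cost
    response = call_fn(prompt)      # cache miss: pay once, store forever
    conn.execute(
        "INSERT INTO llm_cache (key, response) VALUES (?, ?)", (key, response)
    )
    conn.commit()
    return response
</code></pre>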
<h3 id="hybrid-models-work">Hybrid models work</h3>
<p>Using Claude Sonnet for reasoning-heavy tasks (analysis, gap synthesis) and Claude Haiku for extraction-heavy tasks (idea extraction, batch processing) cut costs by 5-10x without meaningful quality loss. Using Ollama for embeddings made similarity analysis free and fast. The principle: match the model's capability to the task's difficulty.</p>
<h3 id="the-free-analyses-are-the-most-revealing">The free analyses are the most revealing</h3>
<p>The four features that cost zero API dollars -- regex-based RFC parsing, SQL-based trend analysis, SequenceMatcher-based idea dedup, and naming-convention-based WG detection -- produced some of the most narratively important findings in the entire series. The OAuth-stack-as-foundation insight from RFC cross-references. The 180 cross-divide ideas. The 12% WG adoption rate. The security-WG-not-agent-WG finding. None of these required an LLM. They required a well-structured database and the right questions.</p>
<h3 id="the-database-is-the-product">The database is the product</h3>
<p>The most valuable output is not any single report -- it is the SQLite database. With all drafts analyzed, ideas extracted, authors mapped, refs parsed, and embeddings stored, the database supports ad-hoc queries that no pre-built report can anticipate. The blog series was written primarily by querying the database, not by re-running the pipeline.</p>
<h2 id="cost-summary">Cost Summary</h2>
<table>
<thead>
<tr>
<th>Stage</th>
<th>Model</th>
<th style="text-align: right;">Drafts</th>
<th style="text-align: right;">Cost</th>
</tr>
</thead>
<tbody>
<tr>
<td>Analyze</td>
<td>Claude Sonnet</td>
<td style="text-align: right;">260</td>
<td style="text-align: right;">~$2.50</td>
</tr>
<tr>
<td>Analyze</td>
<td>Claude Sonnet</td>
<td style="text-align: right;">101</td>
<td style="text-align: right;">~$5.50</td>
</tr>
<tr>
<td>Ideas</td>
<td>Claude Haiku (batch 5)</td>
<td style="text-align: right;">434</td>
<td style="text-align: right;">~$0.80</td>
</tr>
<tr>
<td>Gaps</td>
<td>Claude Sonnet</td>
<td style="text-align: right;">1 call</td>
<td style="text-align: right;">~$0.20</td>
</tr>
<tr>
<td>Embed</td>
<td>Ollama (local)</td>
<td style="text-align: right;">434</td>
<td style="text-align: right;">$0.00</td>
</tr>
<tr>
<td>Refs</td>
<td>Regex (local)</td>
<td style="text-align: right;">434</td>
<td style="text-align: right;">$0.00</td>
</tr>
<tr>
<td>Trends</td>
<td>SQL (local)</td>
<td style="text-align: right;">434</td>
<td style="text-align: right;">$0.00</td>
</tr>
<tr>
<td>Idea-overlap</td>
<td>SequenceMatcher (local)</td>
<td style="text-align: right;">419 ideas</td>
<td style="text-align: right;">$0.00</td>
</tr>
<tr>
<td>WG Status</td>
<td>Naming convention</td>
<td style="text-align: right;">434</td>
<td style="text-align: right;">$0.00</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td></td>
<td style="text-align: right;"></td>
<td style="text-align: right;"><strong>~$9</strong></td>
</tr>
</tbody>
</table>
<p>For context: analyzing 434 IETF drafts -- fetching full text, rating quality on 5 dimensions, extracting 419 technical ideas, detecting 11 gaps, mapping 557 authors, parsing 4,231 cross-references, and identifying 18 team blocs -- cost less than two large coffees.</p>
<h2 id="the-tech-stack">The Tech Stack</h2>
<ul>
<li><strong>Python 3.11+</strong> with <strong>Click</strong> for the CLI</li>
<li><strong>SQLite</strong> with <strong>FTS5</strong> for full-text search</li>
<li><strong>httpx</strong> for HTTP requests (Datatracker API)</li>
<li><strong>anthropic</strong> SDK for Claude API</li>
<li><strong>ollama</strong> for local embeddings</li>
<li><strong>rich</strong> for terminal formatting</li>
<li><strong>numpy</strong> for cosine similarity and matrix operations (see the sketch after this list)</li>
</ul>
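<p>The embeddings table stores 768-dim vectors as raw bytes; a sketch of how such blobs decode back into numpy arrays for cosine similarity (the <code>float32</code> dtype and column names are assumptions):</p>
<pre><code>import numpy as np
import sqlite3

def most_similar(db_path, target, top_k=5):
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT draft_name, vector FROM embeddings").fetchall()
    conn.close()
    vecs = {name: np.frombuffer(blob, dtype=np.float32) for name, blob in rows}
    t = vecs[target] / np.linalg.norm(vecs[target])
    scores = [
        (name, float(np.dot(t, v / np.linalg.norm(v))))
        for name, v in vecs.items()
        if name != target
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]
</code></pre>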
<p>43 CLI commands, 13+ interactive visualizations (HTML/PNG), 15 report types. Total codebase: approximately 6,100 lines of Python across 12 modules.</p>
<hr />
<h2 id="limitations">Limitations</h2>
<p><strong>A note on IETF IPR policy</strong>: Internet-Drafts may be subject to intellectual property rights (IPR) claims. Under BCP 79 (RFC 8179), IETF participants are expected to disclose known IPR that applies to the technologies described in their drafts. Implementers considering building on any of the drafts discussed in this series should check the <a href="https://datatracker.ietf.org/ipr/">IETF IPR disclosure database</a> before proceeding.</p>
<p>This analysis is exploratory, not peer-reviewed research. Several methodological limitations should be understood when interpreting the results:</p>
<p><strong>LLM-as-Judge ratings</strong>: All quality ratings are generated by Claude Sonnet from draft abstracts (not full text), with no human calibration. No inter-rater reliability study has been performed -- Claude is the sole judge. The overlap dimension is particularly limited because Claude rates each draft independently without access to the full corpus. Scores should be treated as relative rankings within this corpus, not absolute quality measures.</p>
<p><strong>Keyword-based corpus selection</strong>: The 12 search keywords cast a wide net but introduce both false positives (drafts about "user agents" or "autonomous systems" unrelated to AI) and false negatives (relevant drafts using terminology we did not search for). We estimate 30-50 false positives remain in the corpus. The relevance rating partially mitigates this, but the LLM judge is generous with relevance for keyword-matched drafts.</p>
<p><strong>Clustering thresholds</strong>: The 0.85 cosine similarity threshold for topical clusters, 0.90 for near-duplicates, and 0.98 for functional duplicates are empirical choices based on manual inspection, not derived from a principled analysis. The embedding model (nomic-embed-text) is general-purpose, not fine-tuned for standards documents. A sensitivity analysis across thresholds would strengthen confidence.</p>
<p><strong>Gap analysis</strong>: The gap identification is a single-shot LLM analysis based on compressed landscape statistics, not a systematic comparison against a reference architecture. Gap severity is assigned by Claude without defined thresholds. The gaps should be treated as hypotheses for expert validation, not definitive findings.</p>
<p><strong>Idea extraction quality</strong>: Batch extraction (Haiku, abstract-only at 800 chars) produces different results than individual extraction (Sonnet, abstract + full text). No precision/recall measurement has been performed. The extraction prompt instructs Claude to return 1-4 ideas per draft, which may under-count contributions from comprehensive drafts.</p>
<p><strong>Abstract-only analysis</strong>: Ratings are based on abstracts truncated to 2000 characters. For maturity assessment in particular, the abstract is an imperfect proxy for the full document's technical depth.</p>
<p>For full methodology documentation, see <code>data/reports/methodology.md</code> in the project repository.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>The full analysis cost ~$9</strong> -- LLM-powered document analysis at scale is practical and cheap with proper caching and model selection</li>
<li><strong>Caching is essential</strong>: SHA-256 hashed prompt caching makes the pipeline idempotent and dramatically reduces costs on re-runs</li>
<li><strong>Hybrid LLM strategy</strong>: Claude Sonnet for reasoning, Claude Haiku for extraction (10x cheaper), Ollama for embeddings (free) -- match model capability to task difficulty</li>
<li><strong>The zero-cost analyses were the most revealing</strong>: RFC cross-references, idea overlap, WG adoption, and trend analysis all run locally and produced the series' most important structural findings</li>
<li><strong>The database is the product</strong>: a well-structured SQLite DB supports queries no pre-built report anticipates; the blog series was written by querying, not re-running</li>
</ul>
<p><em>Next in this series: <a href="/blog/posts/08-agents-building-the-analysis.html">Agents Building the Agent Analysis</a> -- we used a team of AI agents to produce this series. The irony is the point.</em></p>
<hr />
<p><em>The IETF Draft Analyzer is open source. The codebase, database, and all reports are available in the project repository.</em></p>
<div class="post-nav"><a href="/blog/posts/06-big-picture.html">&larr; The Big Picture</a><a href="/blog/posts/08-agents-building-the-analysis.html">Agents Building the Agent Analysis &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>

View File

@@ -0,0 +1,252 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Agents Building the Agent Analysis — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<strong>Analysis</strong></nav>
<h1 id="agents-building-the-agent-analysis">Agents Building the Agent Analysis</h1>
<p><em>We used a team of AI agents to analyze, write about, and review 434 IETF Internet-Drafts on AI agents. Here is what that looked like from the inside.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>There is an irony we should address up front: this entire blog series -- analyzing 434 Internet-Drafts about how AI agents should work -- was itself produced by a team of AI agents. Twelve Claude instances across three phases, each with a distinct role, reading the same database, building on each other's output, and coordinating through a shared journal and file system.</p>
<p>This post is the story of that process: what worked, what broke, what surprised us, and what it reveals about the state of AI agent coordination in practice -- which, as it happens, is exactly the problem the IETF drafts are trying to solve.</p>
<h2 id="phase-1-the-writing-team">Phase 1: The Writing Team</h2>
<p>We started with four agents, each defined in a one-page file and grounded by a shared 3,000-word team brief:</p>
<table>
<thead>
<tr>
<th>Agent</th>
<th>Role</th>
<th>What They Did</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Architect</strong></td>
<td>The Big Picture</td>
<td>Read all reports, designed the narrative arc, wrote the vision document, reviewed every post</td>
</tr>
<tr>
<td><strong>Analyst</strong></td>
<td>The Data Whisperer</td>
<td>Ran the pipeline on 434 drafts, executed 20+ SQL queries, produced data packages</td>
</tr>
<tr>
<td><strong>Coder</strong></td>
<td>The Feature Builder</td>
<td>Implemented 7 new analysis features (refs, trends, idea-overlap, WG adoption, revisions, centrality, co-occurrence)</td>
</tr>
<tr>
<td><strong>Writer</strong></td>
<td>The Storyteller</td>
<td>Drafted all 8 blog posts, applied 6+ revision passes</td>
</tr>
</tbody>
</table>
<p>Each agent had access to the full project codebase, a SQLite database, and the <code>ietf</code> CLI tool. They communicated through files and coordinated through a shared development journal. The team brief contained a thesis statement -- "The IETF is building the highways before the traffic lights" -- a per-post outline, and a data requirements table.</p>
<h3 id="parallel-by-default">Parallel by default</h3>
<p>The key design decision: agents did not wait for each other when they could work in parallel. The Writer's tasks were formally blocked by the Analyst's pipeline run, but the Writer had enough existing data (260 analyzed drafts) to start drafting. Rather than sitting idle, the Writer produced first drafts of all 7 posts while waiting for updated numbers. This turned out to be the right call -- the structure and narrative mattered more than whether the draft count was 260 or 434.</p>
<p>The Coder and Writer worked simultaneously, their outputs feeding each other. Every feature the Coder built used zero API calls -- pure local computation via regex, SQL, SequenceMatcher, and networkx. The RFC cross-reference parser revealed that the Chinese and Western blocs build on incompatible infrastructure foundations (YANG/NETCONF vs. COSE/CBOR), with OAuth 2.0 as the only shared bedrock. The co-occurrence analysis showed safety has zero overlap with Agent Discovery and Model Serving. These zero-cost local analyses produced the most structurally revealing findings in the entire series.</p>
<h3 id="the-architect-shaped-everything">The Architect shaped everything</h3>
<p>The Architect produced fewer words than the Writer and fewer features than the Coder, but had disproportionate impact. Three contributions reshaped the output:</p>
<ol>
<li>The insight that <strong>gap severity correlates with coordination difficulty</strong> transformed Post 4 from a list of gaps into an argument about structural dysfunction.</li>
<li>The <strong>"two equilibria" framing</strong> -- microservices chaos vs. layered web architecture -- gave Post 6's predictions real structural weight.</li>
<li>A <strong>verification pass</strong> that caught the Writer's revisions silently failing (logged as done, not actually persisted in the file).</li>
</ol>
<p>That third point is worth dwelling on. The dev journal said "Post 1 revisions complete." The file still contained the pre-revision content. Without the Architect reading the actual output rather than trusting the status message, the error would have shipped. This is a small-scale version of the Behavioral Verification gap the series identifies as critical -- and we will come back to it.</p>
<h3 id="the-human-who-said-so-what">The human who said "so what?"</h3>
<p>The most consequential intervention in the entire project came not from an agent but from the human project lead. The series had been built around a headline number: "1,780 technical ideas extracted from the drafts." The project lead asked: what does that number actually mean?</p>
<p>The answer was uncomfortable. The pipeline extracts roughly 5 ideas per draft on average -- a mechanical process that produces items like "A2A Communication Paradigm" and "Agent Network Architecture." The raw count sounds impressive but is mostly scaffolding. The real signal was hiding in the cross-org overlap analysis: 96% of unique idea titles appear in exactly one draft. Only 75 show up in two or more. The fragmentation that defines the protocol landscape extends all the way down to the idea level.</p>
<p>This required rewriting Post 5 entirely. Its title changed from "The 1,780 Ideas That Will Shape Agent Infrastructure" to "Where 434 Drafts Converge (And Where They Don't)." The lead metric shifted from raw extraction count (impressive but hollow) to the convergence rate (honest and striking). Four agents had independently used the 1,780 figure -- the Analyst generated it, the Coder validated it, the Architect designed around it, the Writer headlined it. None questioned whether it was meaningful.</p>
<h2 id="phase-2-the-review-cycle">Phase 2: The Review Cycle</h2>
<p>After the writing team produced 8 blog posts, a vision document, 7 new analysis features, and 30 dev-journal entries, we did something that turned out to matter more than the writing itself: we sent the entire output to four specialist reviewers, each running in parallel.</p>
<table>
<thead>
<tr>
<th>Reviewer</th>
<th>Lens</th>
<th>Issues Found</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Statistics</strong></td>
<td>Data integrity, sampling bias, quantitative accuracy</td>
<td>3 critical, 4 important, 4 minor</td>
</tr>
<tr>
<td><strong>Legal</strong></td>
<td>German/EU internet law, GDPR, EU AI Act, eIDAS 2.0</td>
<td>3 critical, 5 regulatory gaps, 5 improvements</td>
</tr>
<tr>
<td><strong>Engineering</strong></td>
<td>Code quality, security, performance, DX</td>
<td>1 critical, 1 high, 5 bugs, 6 perf issues</td>
</tr>
<tr>
<td><strong>Science</strong></td>
<td>Methodology, reproducibility, related work, hedging</td>
<td>2 critical, 3 high, 4 medium</td>
</tr>
</tbody>
</table>
<p>Four agents, four completely different perspectives, run simultaneously. Together they surfaced <strong>36 distinct issues</strong> that the writing team had missed. The findings were often surprising.</p>
<h3 id="the-statistics-reviewer-found-the-numbers-did-not-add-up">The statistics reviewer found the numbers did not add up</h3>
<p>The statistical audit cross-checked every quantitative claim in the blog series against the actual database using raw SQL queries. The results were sobering. The blog claimed 361 drafts; the database held 434. The blog claimed 1,780 ideas; the database held 419. The blog claimed 12 gaps; the database held 11. Composite scores were inflated by 0.05-0.10 through rounding. The "4:1 safety ratio" varied from 1.5:1 to 21:1 by month -- a fact the flat claim obscured.</p>
<p>The ideas count mismatch was the most serious finding. The entire thesis of Post 5 -- "96% of ideas appear in one draft" and "628 cross-org convergent ideas" -- was not reproducible from the current database. The pipeline had been re-run with different parameters, overwriting the original extraction. Nobody had noticed because the numbers in the blog posts were never re-checked against the live database.</p>
<h3 id="the-legal-reviewer-found-regulatory-blindspots">The legal reviewer found regulatory blindspots</h3>
<p>The legal review, written from a German/EU internet law perspective, identified three critical issues that no technically-focused agent would have caught:</p>
<p><strong>Consent conflation.</strong> The series used "consent" interchangeably across OAuth authorization flows, GDPR consent (Einwilligung under Art. 6(1)(a)), and human-in-the-loop approval gates. These are legally distinct concepts. Under CJEU case law (Planet49), consent requires a clear affirmative act by the data subject. When an AI agent delegates to sub-agents, the chain of consent may break entirely. None of the 14 OAuth-for-agents proposals the series analyzed -- and none of the agents writing about them -- flagged this.</p>
<p><strong>The hospital scenario understated regulatory reality.</strong> Post 4's opening scenario -- an AI agent managing drug dispensing with a hallucinated dosage -- was framed as "what goes wrong if this gap is never addressed." Under EU law, it is already addressed: the EU AI Act classifies such systems as high-risk under Annex III, the revised Product Liability Directive covers AI systems explicitly, and German medical law (BGB §§ 630a ff.) places duty of care on the provider. The IETF gap is not in accountability but in technical mechanisms to implement what the regulation already requires.</p>
<p><strong>GDPR was entirely absent from the gap analysis.</strong> The series identified 11 standardization gaps. None mentioned GDPR-mandated capabilities: data protection impact assessments, right to erasure propagation through multi-agent chains, data portability, or purpose limitation. These are not aspirational -- they are legally binding requirements that agent systems operating in the EU must satisfy.</p>
<h3 id="the-engineering-reviewer-found-a-sql-injection">The engineering reviewer found a SQL injection</h3>
<p>The codebase review graded the project B+ overall -- "solid for a research tool, needs hardening for production" -- but found a critical SQL injection vulnerability in <code>db.py</code>. The <code>update_generation_run</code> method interpolated column names from <code>**kwargs</code> directly into SQL strings without validation. The Flask SECRET_KEY was hardcoded as the string <code>"ietf-dashboard-dev"</code>. There was no rate limiting on endpoints that trigger paid Claude API calls.</p>
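<p>The fix applied later in Phase 3 follows the standard whitelist pattern; a sketch, with illustrative table and column names:</p>
<pre><code># Column names must never flow from **kwargs into SQL unchecked.
ALLOWED_COLUMNS = {"status", "finished_at", "error"}   # illustrative names

def update_generation_run(conn, run_id, **kwargs):
    bad = set(kwargs) - ALLOWED_COLUMNS
    if bad:
        raise ValueError(f"unexpected columns: {bad}")
    assignments = ", ".join(f"{col} = ?" for col in kwargs)
    conn.execute(
        f"UPDATE generation_runs SET {assignments} WHERE id = ?",
        (*kwargs.values(), run_id),
    )
</code></pre>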
<p>The engineering reviewer also noted that <code>cli.py</code> had grown to 2,995 lines with approximately 40 repetitions of the same config/db boilerplate pattern. And that test coverage for the analysis pipeline -- the core of the tool -- was exactly zero.</p>
<h3 id="the-science-reviewer-questioned-the-methodology">The science reviewer questioned the methodology</h3>
<p>The scientific review identified the central methodological weakness: the entire rating system relies on Claude as the sole judge for five dimensions, with no human calibration, no inter-rater reliability measurement, and ratings based on abstracts only (truncated to 2,000 characters), not full draft text. The clustering threshold of 0.85 was described as "empirical" with no sensitivity analysis. The gap analysis was single-shot LLM generation from compressed metadata.</p>
<p>One finding was particularly striking: of 434 drafts rated for relevance, the distribution was heavily concentrated at the top of the scale (196 at 4, 98 at 5, only 38 at 1-2). Claude was generous with relevance for keyword-matched drafts, making the metric less discriminating than it should be. Upon manual review, 73 drafts turned out to be false positives -- including <code>draft-ietf-hpke-hpke</code> (generic public key encryption, nothing to do with AI agents) rated at relevance 5.</p>
<h2 id="phase-3-the-fix-cycle">Phase 3: The Fix Cycle</h2>
<p>With 36 issues identified, we launched fix agents -- the Coder handling engineering and data integrity issues, an Editor handling legal and statistical corrections across the blog posts.</p>
<p>The fixes unfolded in three rounds, prioritized by severity:</p>
<p><strong>Round 1 -- Critical.</strong> SQL injection patched with a column name whitelist. Flask SECRET_KEY replaced with <code>os.environ.get()</code> fallback to <code>os.urandom()</code>. FTS5 query sanitization added to prevent search injection. False-positive column added to the ratings table; 73 drafts flagged. All blog posts updated from 361 to 434 drafts. Ideas count discrepancy reconciled (419 current with methodology note explaining the 1,780 historical figure). Gap count corrected from 12 to 11 with rewritten gap table matching database reality.</p>
<p><strong>Round 2 -- High.</strong> Rate limiting added to Claude-calling endpoints (10 req/min/IP). Category names normalized in the database (21 legacy entries migrated). EU AI Act timeline corrected from "within 18 months" to "within 5 months (August 2026)" with enforcement details and article references. OAuth/GDPR consent distinction added. Hospital scenario annotated with AI Act Annex III and Medical Devices Regulation context. Safety ratio qualified everywhere from flat "4:1" to "averaging ~4:1 but varying from 1.5:1 to 21:1 month-to-month."</p>
<p><strong>Round 3 -- Medium.</strong> Methodology documentation created (comprehensive <code>methodology.md</code> covering all pipeline stages, limitations, and related work). IETF IPR notes added. Language hedged where causal claims were only supported by correlation. MIT LICENSE file created (the project claimed "open source" but had no license). FIPA, IEEE P3394, and eIDAS 2.0 references added where they naturally strengthen arguments. Coder reduced <code>cli.py</code> by 200 lines of boilerplate, added <code>--dry-run</code> flags to destructive commands, fixed N+1 query patterns.</p>
<p>In total: 14 files modified across the blog series, 7 security/quality fixes applied to the codebase, test count increased from 23 to 64, and a verified-counts document created as a single source of truth.</p>
<h2 id="what-this-reveals">What This Reveals</h2>
<h3 id="specialized-perspectives-catch-different-things">Specialized perspectives catch different things</h3>
<p>This is the headline finding from the review cycle. Four reviewers looked at the same output and found almost entirely non-overlapping issues. The statistician found number mismatches. The lawyer found consent conflation. The engineer found SQL injection. The scientist found methodological gaps. No single reviewer -- no matter how thorough -- would have caught all 36 issues.</p>
<p>This is not a theoretical observation about diverse review. It is an empirical result from running the experiment. The legal reviewer's consent-conflation finding required knowledge of CJEU case law. The statistical reviewer's ideas-count discovery required querying the live database. The engineering reviewer's SQL injection required reading the source code line by line. These are genuinely different skills applied to the same artifact.</p>
<h3 id="the-review-fix-verify-pattern-works">The review-fix-verify pattern works</h3>
<p>The cycle ran cleanly: four parallel reviews produced a prioritized list; fix agents resolved issues in severity order; the fixes were verified against the review documents. Three rounds (critical, high, medium) imposed natural prioritization. The entire cycle -- 4 reviews plus 3 fix rounds -- happened in a single day.</p>
<p>The pattern mirrors what the IETF itself does with Last Call reviews, directorate reviews, and IESG evaluation. Multiple specialized perspectives, applied in sequence, with verification that issues are resolved. The difference is that our cycle took hours, not months. The cost is that our reviewers share the same underlying model and its blindspots.</p>
<h3 id="agents-modifying-the-same-files-is-the-hard-problem">Agents modifying the same files is the hard problem</h3>
<p>The most persistent coordination difficulty was not conceptual but logistical: multiple agents editing the same blog posts. The Writer updated Post 4's gap table. The Editor changed the safety ratio phrasing. The Coder corrected the draft count. Each edit was correct in isolation. But when three agents modify the same file, merge conflicts and stale reads are inevitable. We hit this multiple times -- most visibly with the Post 1 revisions that silently failed to persist.</p>
<p>This maps directly to the IETF's Agent Execution Model gap. When multiple agents operate on shared state, you need either locking (pessimistic) or conflict detection (optimistic). We had neither. We used a file system, a dev journal, and hope.</p>
<h3 id="the-cheapest-analyses-mattered-most">The cheapest analyses mattered most</h3>
<table>
<thead>
<tr>
<th>Component</th>
<th style="text-align: right;">Cost</th>
<th>Key Finding</th>
</tr>
</thead>
<tbody>
<tr>
<td>Claude Sonnet (ratings, gaps)</td>
<td style="text-align: right;">~$8</td>
<td>4:1 safety deficit, 11 gaps</td>
</tr>
<tr>
<td>Claude Haiku (idea extraction)</td>
<td style="text-align: right;">~$0.80</td>
<td>419 ideas, 96% unique to one draft</td>
</tr>
<tr>
<td>4 reviewers (parallel)</td>
<td style="text-align: right;">~$4</td>
<td>36 issues across 4 dimensions</td>
</tr>
<tr>
<td>Ollama embeddings</td>
<td style="text-align: right;">$0.00</td>
<td>25+ near-duplicate pairs</td>
</tr>
<tr>
<td>Coder: regex, SQL, networkx</td>
<td style="text-align: right;">$0.00</td>
<td>RFC divergence, centrality, co-occurrence</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td style="text-align: right;"><strong>~$13</strong></td>
<td></td>
</tr>
</tbody>
</table>
<p>The LLM provided the foundation data. Every structurally revealing finding -- RFC foundation divergence, European telecoms as bridge-builders, safety structurally isolated from protocols, 55% fire-and-forget revision rate -- came from deterministic local computation on top of that foundation. The lesson for anyone building LLM-powered analysis: the model is the foundation, not the insight engine.</p>
<h2 id="the-meta-irony">The Meta-Irony</h2>
<p>We built a team of AI agents to analyze IETF drafts about AI agent standards. The team needed coordination, shared context, specialized roles, quality review, human oversight, and output verification. Every one of these needs maps to a gap in the IETF landscape:</p>
<table>
<thead>
<tr>
<th>Our Team Needed</th>
<th>What Happened</th>
<th>IETF Gap</th>
</tr>
</thead>
<tbody>
<tr>
<td>Shared execution context</td>
<td>Agents coordinated via SQLite, files, dev journal</td>
<td>Agent Execution Model (no standard)</td>
</tr>
<tr>
<td>Output verification</td>
<td>Writer's revisions silently failed; Architect caught it manually</td>
<td>Agent Behavioral Verification (critical)</td>
</tr>
<tr>
<td>Quality review</td>
<td>4 parallel reviewers found 36 issues the writing team missed</td>
<td>Agent Behavioral Verification (critical)</td>
</tr>
<tr>
<td>Error handling</td>
<td>Ideas reframing required 3 iterations to stabilize numbers</td>
<td>Real-Time Agent Rollback (high)</td>
</tr>
<tr>
<td>Coordination across approaches</td>
<td>Agents editing the same files with no merge mechanism</td>
<td>Cross-Protocol Agent Migration (medium)</td>
</tr>
<tr>
<td>Human oversight</td>
<td>Project lead's "so what?" redirected the entire ideas framing</td>
<td>Human Override Standardization (high)</td>
</tr>
<tr>
<td>Specialized perspectives</td>
<td>Legal, statistical, engineering, and scientific reviewers each found unique issues</td>
<td>Agent Capability Negotiation (medium)</td>
</tr>
</tbody>
</table>
<p>We solved these problems ad hoc -- with a journal, role definitions, manual verification passes, severity-prioritized fix rounds, and human review. The IETF is trying to solve them at internet scale with protocol standards.</p>
<p>The distance between our 12-agent team and a deployed multi-agent system on the open internet is vast. But the problems are structurally identical. The standards the IETF is racing to write are the standards our own team needed. The traffic lights the highway needs are the ones we built by hand.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>Twelve agents across three phases</strong> (4 writers, 4 reviewers, 4 fixers) produced 8 blog posts, a vision document, 7 analysis features, 36 identified issues, and 64 tests -- from a ~$13 pipeline</li>
<li><strong>Four parallel reviewers found 36 non-overlapping issues</strong>: a SQL injection, consent conflation with EU law, a 76% ideas count mismatch, and uncalibrated LLM-as-judge methodology. No single reviewer would have caught all of them</li>
<li><strong>The human project lead's "so what?"</strong> was the single most consequential intervention -- no agent questioned whether the headline metric was meaningful</li>
<li><strong>A silent failure</strong> (revisions logged but not persisted) demonstrated the same Behavioral Verification gap the series identifies as critical in the IETF landscape</li>
<li><strong>The team's coordination problems mirror the IETF's gaps</strong>: shared state, output verification, error recovery, capability negotiation, and human oversight are needed at every scale</li>
</ul>
<p><em>This post concludes the series. All data, code, and reports are available in the IETF Draft Analyzer project repository.</em></p>
<hr />
<p><em>Written by a team of Claude instances analyzing the IETF's work on AI agent standards. The irony is not lost on us.</em></p>
<div class="post-nav"><a href="/blog/posts/07-how-we-built-this.html">&larr; How We Built This</a><span></span></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>