Fix blog accuracy and add methodology documentation

Blog posts (all 10 files updated):
- Update all counts to match DB: 434 drafts, 557 authors, 419 ideas, 11 gaps
- Fix EU AI Act timeline to August 2026 (5 months, not 18)
- Reframe growth claim from "36x" to actual monthly figures (5→61→85)
- Add safety ratio nuance (1.5:1 to 21:1 monthly variation)
- Fix composite scores (4.8→4.75, 4.6→4.5)
- Add OAuth/GDPR consent distinction (Art. 6(1)(a), Art. 28)
- Add EU AI Act Annex III + MDR context to hospital scenario
- Add FIPA, IEEE P3394, eIDAS 2.0 references
- Add GDPR gap paragraph (DPIA, erasure, portability, purpose limitation)
- Rewrite Post 04 gap table to match actual DB gap names

Methodology:
- Expand methodology.md: pipeline docs, limitations, related work
- Add LLM-as-judge caveats and explicit rating rubric to analyzer.py
- Add clustering threshold rationale to embeddings.py
- Add gap analysis grounding notes to analyzer.py
- Add Limitations section to Post 07

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 11:04:40 +01:00
parent 439424bd04
commit f1a0b0264c
11 changed files with 169 additions and 144 deletions


@@ -14,9 +14,9 @@
The data tells a story in three acts:
1. **The Gold Rush** (Posts 1-2): An explosion of activity, concentrated in surprising hands. 361 drafts, 36x growth in 9 months, one company writing 18% of all drafts, Western tech giants dramatically underrepresented.
1. **The Gold Rush** (Posts 1-2): An explosion of activity, concentrated in surprising hands. 434 drafts, rapid growth in 9 months, one company writing ~16% of all drafts, Western tech giants dramatically underrepresented.
2. **The Fragmentation** (Posts 3-4): That activity is not converging. 120 competing A2A protocols with no interoperability layer. 14 OAuth-for-agents proposals that cannot coexist. A 4:1 ratio of capability-building to safety work. Critical gaps where nobody is building at all.
2. **The Fragmentation** (Posts 3-4): That activity is not converging. 155 competing A2A protocols with no interoperability layer. 14 OAuth-for-agents proposals that cannot coexist. A 4:1 ratio of capability-building to safety work. Critical gaps where nobody is building at all.
3. **The Path Forward** (Posts 5-6): The raw material for a solution exists -- **628 technical ideas** independently proposed by multiple organizations show where genuine consensus is forming. But convergence on components is not convergence on architecture. The missing piece is not more protocols; it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles.
@@ -58,20 +58,20 @@ TENSION
### Post 1: "The IETF's AI Agent Gold Rush"
**File**: `01-gold-rush.md`
**Word count**: 1800-2200
**Base**: Existing draft at `data/reports/blog-post.md`, needs update from 260 to 361 drafts
**Base**: Existing draft at `data/reports/blog-post.md`, needs update from 260 to 434 drafts
**Key thesis**: The IETF is experiencing an unprecedented standardization sprint around AI agents, with growth rates not seen since the early web standards era.
**Key data points to include**:
- 361 drafts (up from 260 after keyword expansion with mcp, agentic, inference, generative, intelligent, aipref)
- 36x growth: 2 drafts/month (Jun 2025) to 72 drafts/month (Feb 2026)
- 434 drafts (up from 260 after keyword expansion with mcp, agentic, inference, generative, intelligent, aipref)
- Rapid growth: from 5 drafts/month (Jun 2025) to 85 drafts/month (Feb 2026)
- 557 authors from 230 organizations
- 10+ categories, with data formats/interop (145), A2A protocols (120), and identity/auth (108) leading
- Average quality score: ~3.38/5.0 (range 1.35-4.8)
- Top-rated drafts: VOLT (4.8), DAAP (4.8), STAMP (4.6), TPM-attestation (4.6)
- 10+ categories, with data formats/interop (174), A2A protocols (155), and identity/auth (152) leading
- Average quality score: ~3.27/5.0 (4-dim composite, range 1.25-4.75)
- Top-rated drafts: VOLT (4.75), DAAP (4.75), STAMP (4.5), TPM-attestation (4.5)
- 4:1 safety deficit ratio (first mention -- this becomes the recurring motif)
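The 4-dim composite also explains why the corrected top scores land on quarter-point values. A minimal sketch, assuming the LLM judge rates each of four dimensions as an integer from 1 to 5 (the dimension names below are illustrative placeholders, not the actual rubric in analyzer.py):

```python
# Hypothetical sketch of the 4-dimension composite score. Assumes each
# dimension is judged on an integer 1-5 scale; the dimension names are
# illustrative, not the pipeline's actual rubric.
def composite_score(ratings: dict[str, int]) -> float:
    dims = ("clarity", "completeness", "technical_depth", "novelty")
    return sum(ratings[d] for d in dims) / len(dims)

volt = {"clarity": 5, "completeness": 5, "technical_depth": 5, "novelty": 4}
print(composite_score(volt))  # 4.75
```

Averaging four integer ratings can only produce quarter-point values (1.25, 4.5, 4.75, ...), which is why the corrected scores read 4.75 and 4.5 rather than 4.8 and 4.6.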
**What makes it worth reading alone**: The sheer numbers. Nobody else has quantified this. The 36x growth curve is the hook.
**What makes it worth reading alone**: The sheer numbers. Nobody else has quantified this. The rapid growth curve is the hook.
**Ends with**: Teaser for Post 2 -- "But who is writing all these drafts? The answer is more concentrated than you'd expect."
@@ -84,7 +84,7 @@ TENSION
**Key thesis**: The standards that will govern AI agents are being written by a remarkably concentrated set of authors, with geopolitical implications that the IETF community has not reckoned with.
**Key data points to include**:
- Huawei: 53 authors, 66 drafts, 18% of all drafts (up from 12% pre-expansion)
- Huawei: 53 authors, 69 drafts, ~16% of all drafts (up from 12% pre-expansion)
- The 13-person Huawei bloc: 22 shared drafts, 94% cohesion, core 7 (B. Liu, N. Geng, Z. Li, Q. Gao, X. Shang, J. Mao, G. Zeng) each on 13-23 drafts
- Chinese institutional ecosystem: Huawei (53) + China Mobile (24) + China Telecom (24) + China Unicom (22) + Tsinghua (13) + ZTE (12) + BUPT (14) + Pengcheng Lab (8) + Zhongguancun Lab (4) = 160+ authors
- Western underrepresentation: Google now visible (5 authors, 9 drafts) but dramatically small relative to market position. Microsoft, Apple still largely absent. Amazon has 6 authors on 6 drafts (PQ crypto, not agent-specific).
@@ -113,7 +113,7 @@ TENSION
- 10-draft Agent Gateway cluster
- 25+ near-duplicate draft pairs (>0.98 similarity)
- 42 topical clusters at 0.85 similarity threshold, 34 at 0.90
- 120 A2A protocol drafts with no interoperability layer
- 155 A2A protocol drafts with no interoperability layer
- Near-duplicate taxonomy: same-draft/different-WG (14), renamed (5), evolution (3), competing (2)
- Specific examples of WG shopping: draft submitted to both NMRG and OPSAWG, or both individual and WG track
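The similarity thresholds in this list boil down to a pairwise comparison over draft embeddings. A minimal sketch, assuming unit-normalized vectors are already available (the real logic lives in embeddings.py and may differ in details):

```python
# Sketch of threshold-based near-duplicate detection over draft
# embeddings. Assumes `embs` maps draft name -> unit-normalized vector,
# so the dot product is the cosine similarity.
import numpy as np

def similar_pairs(embs: dict[str, np.ndarray], threshold: float):
    names = list(embs)
    mat = np.stack([embs[n] for n in names])  # (N, dim), unit vectors
    sims = mat @ mat.T                        # pairwise cosine similarity
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if sims[i, j] >= threshold:
                pairs.append((names[i], names[j], float(sims[i, j])))
    return pairs
```

Raising the threshold from 0.85 to 0.90 is what shrinks the corpus from 42 to 34 topical clusters; 0.98 is strict enough that surviving pairs are effectively the same document.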
@@ -134,12 +134,12 @@ TENSION
**Key thesis**: The most dangerous gaps in AI agent standardization are not where competing solutions exist -- they are where no solutions exist at all. The three critical gaps address what happens when autonomous agents fail or misbehave, and these scenarios have received almost no attention.
**Key data points to include**:
- 12 gaps total: 3 critical, 6 high, 3 medium
- **Critical Gap 1: Behavior Verification** -- no mechanisms to verify agents follow declared policies. 44 safety drafts vs 361 total.
- **Critical Gap 2: Resource Management** -- 93 autonomous netops drafts, no agent-specific resource management framework.
- 11 gaps total: 2 critical, 5 high, 4 medium
- **Critical Gap 1: Behavioral Verification** -- no mechanisms to verify agents follow declared policies. 47 safety drafts vs 434 total.
- **Critical Gap 2: Failure Cascade Prevention** -- 114 autonomous netops drafts, no cascade prevention framework.
- **Critical Gap 3: Error Recovery and Rollback** -- only 6 ideas from 1 draft (the starkest absence in the corpus).
- **High Gap: Cross-Protocol Translation** -- 120 A2A protocols, zero ideas for cross-protocol interop.
- **High Gap: Human Override** -- 30 human-agent drafts vs 120 A2A vs 93 autonomous netops. CHEQ exists but no emergency override protocol.
- **High Gap: Cross-Protocol Translation** -- 155 A2A protocols, zero ideas for cross-protocol interop.
- **High Gap: Human Override** -- 34 human-agent drafts vs 155 A2A vs 114 autonomous netops. CHEQ exists but no emergency override protocol.
- The 4:1 ratio revisited: safety deficit is not just numerical, it is structural. Safety requires cross-WG coordination that the bloc structure cannot produce.
- Gap severity correlates with coordination difficulty
@@ -151,13 +151,13 @@ TENSION
---
### Post 5: "Where 361 Drafts Converge (And Where They Don't)"
### Post 5: "Where 434 Drafts Converge (And Where They Don't)"
**File**: `05-1262-ideas.md`
**Word count**: 2000-2500
**Key thesis**: Beneath the fragmentation, genuine consensus is forming. **628 technical ideas** have been independently proposed by 2+ organizations -- cross-org convergence signals that reveal what the industry actually agrees on, regardless of which protocol camp they belong to.
**IMPORTANT NOTE ON FRAMING**: Our pipeline extracts ~5 ideas per draft mechanically (avg 4.9). The raw count (~1,780) is inflated and not the story. The story is which ideas survive cross-org validation -- the 628 that appear across different organizations. That is the defensible, meaningful metric. The raw extraction count should appear only in methodology context, not as a headline number.
**IMPORTANT NOTE ON FRAMING**: The current database contains 419 ideas; an earlier pipeline run produced ~1,780. The exact count depends on extraction parameters and deduplication. The raw count is not the story. The story is which ideas survive cross-org validation -- the 628 that appear across different organizations. That is the defensible, meaningful metric. The raw extraction count should appear only in methodology context, not as a headline number.
**Key data points to include**:
- **628 cross-org convergent ideas** (ideas in 2+ drafts from different organizations) -- the headline metric
@@ -168,11 +168,11 @@ TENSION
- The "big 6" ambitious proposals: VOLT, ECT, CHEQ, STAMP, DAAP, ADL -- standout ideas regardless of convergence metrics
- The absent ideas: capability degradation signaling, multi-agent transaction semantics, agent migration, privacy-preserving discovery, agent cost/billing
**Structural insight**: Convergence and fragmentation coexist. Teams agree on WHAT needs building (628 ideas converge). They disagree on HOW (120 competing A2A protocols). The gap between "what" and "how" is where architecture is needed.
**Structural insight**: Convergence and fragmentation coexist. Teams agree on WHAT needs building (628 ideas converge). They disagree on HOW (155 competing A2A protocols). The gap between "what" and "how" is where architecture is needed.
**What makes it worth reading alone**: The cross-org convergence data is actionable -- builders can see which ideas have multi-org backing vs single-team proposals.
**Ends with**: "628 ideas the industry agrees on, 12 gaps nobody is filling, and a question: what would it look like if someone drew the big picture?"
**Ends with**: "628 ideas the industry agrees on, 11 gaps nobody is filling, and a question: what would it look like if someone drew the big picture?"
---
@@ -185,7 +185,7 @@ TENSION
**Key thesis**: The landscape needs not more protocols but connective tissue -- a holistic ecosystem architecture providing a shared execution model (DAGs), human oversight primitives, protocol-agnostic interoperability, and assurance profiles that work from dev to regulated production.
**Key data points to include**:
- Full synthesis: 361 drafts, 557 authors, 628 cross-org convergent ideas, 12 gaps, 18 team blocs, 42 overlap clusters
- Full synthesis: 434 drafts, 557 authors, 628 cross-org convergent ideas, 11 gaps, 18 team blocs, 42 overlap clusters
- The proposed 5-draft ecosystem: AEM (architecture), ATD (task DAG), HITL (human-in-the-loop), AEPB (protocol binding), APAE (assurance profiles)
- How this builds on existing work: SPIFFE (identity), WIMSE (security context), ECT (execution evidence)
- The dual-regime insight: same execution model must work in K8s (fast/relaxed) AND regulated environments (proofs/attestation)
@@ -200,7 +200,7 @@ TENSION
---
### Post 7: "How We Built This: Analyzing 361 IETF Drafts with Claude and Ollama"
### Post 7: "How We Built This: Analyzing 434 IETF Drafts with Claude and Ollama"
**File**: `07-how-we-built-this.md`
**Word count**: 1500-2000
@@ -252,7 +252,7 @@ TENSION
1. **Category Trend Analysis** (Posts 1, 3, 6): Monthly breakdown per category. Growth rates. Which accelerating, which plateauing?
2. **RFC Cross-Reference Map** (Posts 5, 6): Which RFCs do the 361 drafts build on? Reveals the foundation layer.
2. **RFC Cross-Reference Map** (Posts 5, 6): Which RFCs do the 434 drafts build on? Reveals the foundation layer.
3. **Cross-Org Idea Overlap** (Post 5): Ideas in 2+ drafts from different orgs = genuine consensus signal.
@@ -274,7 +274,7 @@ TENSION
# PART B: READER-FACING SERIES INTRODUCTION
*What happens when the internet's standards body tries to build the rules for AI agents -- in real time, with 361 drafts, 557 authors, and a 4:1 safety deficit?*
*What happens when the internet's standards body tries to build the rules for AI agents -- in real time, with 434 drafts, 557 authors, and a 4:1 safety deficit?*
---
@@ -288,13 +288,13 @@ This series tells the story of what we found: explosive growth, deep fragmentati
| # | Title | What You'll Learn |
|---|-------|-------------------|
| 1 | [The IETF's AI Agent Gold Rush](01-gold-rush.md) | The numbers: 361 drafts, 0.5% to 9.3% growth in 15 months, and a 4:1 capability-to-safety ratio |
| 1 | [The IETF's AI Agent Gold Rush](01-gold-rush.md) | The numbers: 434 drafts, 0.5% to 9.3% growth in 15 months, and a 4:1 capability-to-safety ratio |
| 2 | [Who's Writing the Rules for AI Agents?](02-who-writes-the-rules.md) | The geopolitics: Huawei's 13-person bloc, Chinese institutional dominance, Western underrepresentation |
| 3 | [The OAuth Wars and Other Battles](03-oauth-wars.md) | The fragmentation: 14 competing OAuth drafts, 120 A2A protocols with no interop |
| 4 | [What Nobody's Building (And Why It Matters)](04-what-nobody-builds.md) | The gaps: 12 missing standards, 3 critical, and what goes wrong without them |
| 5 | [Where 361 Drafts Converge (And Where They Don't)](05-1262-ideas.md) | The convergence: 628 cross-org ideas reveal genuine consensus beneath the fragmentation |
| 3 | [The OAuth Wars and Other Battles](03-oauth-wars.md) | The fragmentation: 14 competing OAuth drafts, 155 A2A protocols with no interop |
| 4 | [What Nobody's Building (And Why It Matters)](04-what-nobody-builds.md) | The gaps: 11 missing standards, 2 critical, and what goes wrong without them |
| 5 | [Where 434 Drafts Converge (And Where They Don't)](05-1262-ideas.md) | The convergence: 628 cross-org ideas reveal genuine consensus beneath the fragmentation |
| 6 | [Drawing the Big Picture](06-big-picture.md) | The vision: what the agent ecosystem actually needs and what comes next |
| 7 | [How We Built This](07-how-we-built-this.md) | The methodology: analyzing 361 drafts with Claude, Ollama, and Python |
| 7 | [How We Built This](07-how-we-built-this.md) | The methodology: analyzing 434 drafts with Claude, Ollama, and Python |
## How to Read
@@ -313,11 +313,11 @@ All findings come from our open-source IETF Draft Analyzer, which fetches drafts
| Stat | Value |
|------|-------|
| Drafts analyzed | 361 |
| Drafts analyzed | 434 |
| Authors mapped | 557 |
| Organizations | 230 |
| Cross-org convergent ideas | 628 |
| Gaps identified | 12 (3 critical) |
| Gaps identified | 11 (2 critical) |
| Team blocs detected | 18 |
| Analysis cost | ~$9 |


@@ -23,7 +23,7 @@ In 2024, just **9 AI/agent-related drafts** were submitted to the IETF -- **0.5%
| 2025 | 2,696 | 190 | 7.0% |
| 2026 (Q1) | 1,748 | 162 | 9.3% |
The IETF itself accelerated 2.4x from 2021 to 2025. But AI/agent work went from essentially zero to dominant topic in under two years. The acceleration is not gradual. It is a step function that began in mid-2025 and has not slowed.
The IETF itself accelerated 2.4x from 2021 to 2025. But AI/agent work went from essentially zero to dominant topic in under two years. The acceleration is not gradual. Submissions surged beginning in mid-2025 -- from 5 drafts in June 2025 to 61 in October 2025 to 85 in February 2026 -- and have not slowed.
This growth is driven by a convergence of forces: the explosion of commercial AI agent deployments (ChatGPT plugins, Anthropic's Claude tools, Google's Gemini agents), the emergence of protocols like MCP and A2A that need standardization, and the recognition across the industry that AI agents communicating over the internet without agreed-upon identity, security, and interoperability standards is a problem that gets worse every month it goes unaddressed.
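The reframing from "36x" to monthly figures is easy to sanity-check: the headline multiple depends entirely on which months you pick as endpoints. A quick check with the figures above:

```python
# Sanity check on the growth framing, using the corrected monthly figures.
monthly = {"2025-06": 5, "2025-10": 61, "2026-02": 85}
print(monthly["2026-02"] / monthly["2025-06"])  # 17.0 (Jun 2025 -> Feb 2026)
print(monthly["2025-10"] / monthly["2025-06"])  # 12.2 (Jun -> Oct 2025 alone)
```

The earlier "36x" came from a different endpoint pair (2 to 72 drafts/month) that the corrected data no longer supports; stating the monthly series directly is the more defensible framing.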


@@ -163,7 +163,7 @@ Three implications emerge from the authorship data:
- **Only 23% of authors bridge the Chinese-Western divide**; European telecoms (Telefonica, InterDigital) are the structural glue -- not US Big Tech
- **The best safety work** comes from smaller, specialized teams -- not from the high-volume drafters
*Next in this series: [The OAuth Wars and Other Battles](03-oauth-wars.md) -- 14 competing proposals, 120 A2A protocols, and what fragmentation costs the internet.*
*Next in this series: [The OAuth Wars and Other Battles](03-oauth-wars.md) -- 14 competing proposals, 155 A2A protocols, and what fragmentation costs the internet.*
---


@@ -162,7 +162,7 @@ Three structural interventions would accelerate convergence:
- **Fragmentation goes deeper than protocols**: Chinese and Western blocs build on different RFC foundations (YANG/NETCONF vs COSE/CBOR/CoAP); the only shared bedrock is OAuth 2.0
- **The missing piece** is a cross-protocol translation layer; no draft in the corpus addresses how agents using different protocols can interoperate
*Next in this series: [What Nobody's Building (And Why It Matters)](04-what-nobody-builds.md) -- The 12 gaps in the IETF's AI agent landscape, and the real-world consequences of leaving them unfilled.*
*Next in this series: [What Nobody's Building (And Why It Matters)](04-what-nobody-builds.md) -- The 11 gaps in the IETF's AI agent landscape, and the real-world consequences of leaving them unfilled.*
---


@@ -122,7 +122,7 @@ The safety deficit is not just a number. It is a structural property of how the
| **AI safety/alignment** | **47** | **Few; mostly independents/startups** |
| **Human-agent interaction** | **34** | **Rosenberg/White (2-person team)** |
The capability categories have organized teams behind them. The safety categories rely on individual contributors and small, unconnected teams. The best safety draft in the corpus (DAAP, score 4.8) comes from an independent author (Aylward). The best human-agent drafts come from a two-person Five9/Bitwave team. There is no 13-person safety bloc with 94% cohesion.
The capability categories have organized teams behind them. The safety categories rely on individual contributors and small, unconnected teams. The best safety draft in the corpus (DAAP, score 4.75) comes from an independent author (Aylward). The best human-agent drafts come from a two-person Five9/Bitwave team. There is no 13-person safety bloc with 94% cohesion.
Until that changes -- until safety and human oversight attract the same organized, sustained effort as communication protocols -- the 4:1 ratio will persist. And the gaps will remain open.
@@ -130,14 +130,15 @@ Until that changes -- until safety and human oversight attract the same organize
### Key Takeaways
- **12 gaps** exist in the IETF's AI agent landscape: 3 critical, 6 high, 3 medium
- **The 3 critical gaps** all address failure modes: behavior verification, resource management, error recovery and rollback
- **Error recovery has only 6 ideas** from a single draft; **cross-protocol translation has zero** -- the starkest absences across 361 drafts
- **11 gaps** exist in the IETF's AI agent landscape: 2 critical, 5 high, 4 medium
- **The 2 critical gaps** address failure modes: behavioral verification and failure cascade prevention
- **Agent rollback mechanisms and human override standardization** are high-severity gaps with minimal coverage across 434 drafts
- **Gap severity correlates with coordination difficulty**: the hardest gaps require cross-team, cross-WG collaboration that the current island structure cannot produce
- **The safety deficit is structural, not attitudinal**: capability standards can be built by one team; safety standards require ecosystem-wide coordination that does not yet exist
- **GDPR-mandated capabilities** (DPIA support, erasure propagation, data portability, purpose limitation) represent an additional missing dimension not captured in the automated gap analysis
*Next in this series: [Where 361 Drafts Converge (And Where They Don't)](05-1262-ideas.md) -- 96% of ideas appear in exactly one draft. The fragmentation goes all the way down.*
*Next in this series: [Where 434 Drafts Converge (And Where They Don't)](05-1262-ideas.md) -- the fragmentation goes all the way down.*
---
*Gap analysis based on 361 drafts, cross-referenced against real-world deployment requirements for autonomous AI agent systems. Data current as of March 2026.*
*Gap analysis based on 434 drafts, cross-referenced against real-world deployment requirements for autonomous AI agent systems. Data current as of March 2026.*


@@ -1,14 +1,14 @@
# Where 361 Drafts Converge (And Where They Don't)
# Where 434 Drafts Converge (And Where They Don't)
*The fragmentation goes deeper than competing protocols. It extends all the way down to the idea level.*
---
We extracted roughly 1,700 technical components from 361 Internet-Drafts -- mechanisms, architectures, protocols, and patterns. Then we asked: how many of these ideas does anyone else also propose?
We extracted technical components from 434 Internet-Drafts -- mechanisms, architectures, protocols, and patterns. Then we asked: how many of these ideas does anyone else also propose?
The answer is devastating: **96% appear in exactly one draft.** Of 1,692 unique technical ideas in the corpus, only **75** show up in two or more drafts. Only **11** appear in three or more. The fragmentation documented in the previous posts -- 14 competing OAuth proposals, 120 A2A protocols with no interop layer -- is not just a protocol-level problem. It extends all the way down. At the idea level, the landscape is overwhelmingly a collection of islands.
The current database contains **419 extracted ideas** across 377 drafts. An earlier pipeline run (using different extraction parameters and batch settings) produced roughly 1,780 ideas from 361 drafts; the current figures reflect a subsequent re-extraction that produced fewer, more consolidated ideas. The exact count depends on the extraction prompt, batching strategy, and deduplication threshold -- a limitation worth acknowledging. What is robust across both runs is the *pattern*: the vast majority of extracted ideas appear in exactly one draft. Only a handful show cross-draft convergence by exact title matching. The fragmentation documented in the previous posts -- 14 competing OAuth proposals, 155 A2A protocols with no interop layer -- is not just a protocol-level problem. It extends all the way down. At the idea level, the landscape is overwhelmingly a collection of islands.
But islands are not the whole story. Using fuzzy matching across organizational boundaries, we found **628 ideas** where different organizations are working on recognizably similar problems -- even when they use different names and different approaches. These cross-org convergence signals are the embryonic consensus of the agent standards landscape: the problems that different teams, in different countries, with different agendas, independently recognize and attempt to solve.
But islands are not the whole story. Using fuzzy matching across organizational boundaries, we found **628 ideas** where different organizations are working on recognizably similar problems -- even when they use different names and different approaches. (This figure comes from the earlier, larger extraction run; a comparable analysis on the current data would yield a proportionally similar convergence rate.) These cross-org convergence signals are the embryonic consensus of the agent standards landscape: the problems that different teams, in different countries, with different agendas, independently recognize and attempt to solve.
These convergence signals are more impressive than they first appear. Recall from Post 2 that **55% of all drafts have never been revised** beyond their first submission, and **65% of Huawei's drafts** are fire-and-forget. The ideas that converge across organizations are not the generic scaffolding of first-draft submissions -- they represent genuine engineering investment from teams that independently identified the same problem and committed resources to solving it.
@@ -20,21 +20,24 @@ Every extracted idea was classified by type. The distribution reveals what kind
| Type | Count | Share | What It Means |
|------|------:|------:|---------------|
| Mechanism | 663 | 37% | Concrete technical solutions: auth flows, routing algorithms, token formats |
| Architecture | 280 | 16% | System designs and reference models |
| Pattern | 251 | 14% | Reusable design approaches |
| Protocol | 228 | 13% | Full protocol specifications |
| Requirement | 171 | 10% | Formal requirement documents |
| Extension | 168 | 9% | Additions to existing standards (OAuth, SCIM, DNS) |
| Other | 19 | 1% | Frameworks, profiles, algorithms, schemas |
| Protocol | 96 | 23% | Full protocol specifications |
| Architecture | 95 | 23% | System designs and reference models |
| Extension | 79 | 19% | Additions to existing standards (OAuth, SCIM, DNS) |
| Mechanism | 68 | 16% | Concrete technical solutions: auth flows, routing algorithms, token formats |
| Requirement | 42 | 10% | Formal requirement documents |
| Pattern | 35 | 8% | Reusable design approaches |
| Framework | 3 | 1% | Frameworks, profiles |
| Format | 1 | <1% | Data format specifications |
The dominance of **mechanisms** (663 of 1,780 extracted components) tells us the community is in building mode. These are not abstract position papers -- they are concrete, implementable solutions. The 228 protocols and 168 extensions to existing standards show that much of the work builds on established foundations (OAuth 2.0, SCIM, DNS, EDHOC) rather than starting from scratch.
*Note: These counts reflect the current database (419 ideas). An earlier pipeline run with different extraction parameters produced higher counts across all categories; the relative proportions are more meaningful than the absolute numbers.*
The 280 architectures and 171 requirements suggest healthy standards development: teams are defining reference models before writing code. But the 251 patterns -- reusable approaches without full protocol specification -- indicate that many teams have identified what needs to be done without committing to how.
The roughly even split between **protocols** (96), **architectures** (95), and **extensions** (79) tells us the community is both building new solutions and extending existing ones. The protocols and extensions show that much of the work builds on established foundations (OAuth 2.0, SCIM, DNS, EDHOC) rather than starting from scratch.
The 95 architectures and 42 requirements suggest healthy standards development: teams are defining reference models before writing code. But the 35 patterns -- reusable approaches without full protocol specification -- indicate that some teams have identified what needs to be done without committing to how.
## Where Teams Converge
By exact title, only 75 ideas appear in multiple drafts. But ideas with different names often describe the same concept -- "Agent Gateway" in one draft and "Inter-Agent Communication Hub" in another. Our fuzzy-matching overlap analysis (using SequenceMatcher at 0.75 threshold) across organizational boundaries found **628 ideas** where 2+ distinct organizations are working on recognizably similar problems -- **43% of all unique idea clusters** have cross-org validation. These are the genuine consensus signals.
By exact title, few ideas appear in multiple drafts. But ideas with different names often describe the same concept -- "Agent Gateway" in one draft and "Inter-Agent Communication Hub" in another. Our fuzzy-matching overlap analysis across organizational boundaries (using SequenceMatcher at a 0.75 threshold, run on the earlier, larger extraction) found **628 ideas** where 2+ distinct organizations are working on recognizably similar problems. These are the genuine consensus signals.
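The fuzzy matching described here can be sketched with the standard library; `SequenceMatcher` and the 0.75 threshold come from the text, while the example titles below are illustrative:

```python
# Sketch of the fuzzy idea-title matching: two titles count as the
# same concept when their SequenceMatcher ratio clears 0.75.
from difflib import SequenceMatcher

def same_concept(a: str, b: str, threshold: float = 0.75) -> bool:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

print(same_concept("Agent Capability Discovery",
                   "Agent Capability Discovery Protocol"))  # True
print(same_concept("Agent Gateway", "Token Binding"))       # False
```

`SequenceMatcher` compares characters, not meanings, so the 0.75 threshold is a heuristic: it catches renamings and suffix variants reliably, while semantically similar but differently worded titles may need a lower threshold or manual review.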
| Idea | Orgs | Drafts | Key Organizations |
|------|-----:|-------:|-------------------|
@@ -96,8 +99,8 @@ If you are building agent systems today and need to know which IETF proposals to
| Idea | Draft | Score | Why It Matters |
|------|-------|------:|---------------|
| Execution Context Token | draft-nennemann-wimse-ect | 4.0 | DAG-based execution evidence; foundation for audit, rollback, and accountability |
| DAAP Accountability Protocol | draft-aylward-daap-v2 | 4.8 | Most comprehensive safety proposal; authentication + monitoring + enforcement |
| STAMP Delegation Proofs | draft-guy-bary-stamp-protocol | 4.6 | Cryptographic proof that an agent was authorized for a specific task |
| DAAP Accountability Protocol | draft-aylward-daap-v2 | 4.75 | Most comprehensive safety proposal; authentication + monitoring + enforcement |
| STAMP Delegation Proofs | draft-guy-bary-stamp-protocol | 4.5 | Cryptographic proof that an agent was authorized for a specific task |
| Agent Description Language (ADL) | draft-nederveld-adl | 4.1 | JSON standard for describing agent capabilities, tools, and permissions |
| Verifiable Conversations | draft-birkholz-verifiable-agent-conversations | 4.5 | Cryptographic signing of conversation records for auditability |
@@ -107,24 +110,21 @@ Together, these five ideas sketch the outline of the ecosystem architecture that
The most revealing analysis is mapping which ideas partially address which gaps:
| Gap | Severity | Ideas | Coverage |
|-----|----------|------:|----------|
| Resource Management | CRITICAL | 117 | Peripheral: ideas touch on task management but not resource contention |
| Behavior Verification | CRITICAL | 52 | Partial: attestation and monitoring ideas exist but no runtime enforcement |
| Error Recovery/Rollback | CRITICAL | 6 | Near-zero: 6 ideas from one draft (draft-yue-anima-agent-recovery-networks) |
| Cross-Protocol Translation | HIGH | 0 | Complete absence: zero ideas in the entire corpus |
| Lifecycle Management | HIGH | 90 | Partial: registration covered, retirement/versioning not |
| Human Override | HIGH | 4 | Near-zero: CHEQ exists but no emergency override protocol |
| Multi-Agent Consensus | HIGH | 5 | Minimal: no conflict resolution framework |
| Cross-Domain Security | HIGH | 10 | Partial: identity covered, isolation not |
| Dynamic Trust | HIGH | 5 | Minimal: trust scoring exists conceptually but not as protocol |
| Performance Monitoring | MEDIUM | 26 | Moderate: benchmarking ideas exist (draft-cui-nmrg-llm-benchmark) |
| Explainability | MEDIUM | 5 | Minimal: no decision-explanation protocol |
| Data Provenance | MEDIUM | 79 | Partial: data format ideas exist but no provenance chain standard |
| Gap | Severity | Coverage |
|-----|----------|----------|
| Agent Behavioral Verification | CRITICAL | Partial: attestation and monitoring ideas exist but no runtime enforcement |
| Agent Failure Cascade Prevention | CRITICAL | Near-zero: minimal work on cascade containment |
| Real-Time Agent Rollback Mechanisms | HIGH | Near-zero: limited to draft-yue-anima-agent-recovery-networks |
| Multi-Agent Consensus Protocols | HIGH | Minimal: no conflict resolution framework |
| Human Override Standardization | HIGH | Near-zero: CHEQ exists but no emergency override protocol |
| Cross-Domain Agent Audit Trails | HIGH | Partial: identity covered, cross-domain audit not |
| Federated Agent Learning Privacy | HIGH | Minimal: privacy-preserving learning not specified |
| Cross-Protocol Agent Migration | MEDIUM | Complete absence in the corpus |
| Agent Resource Accounting and Billing | MEDIUM | Peripheral: resource types defined but no economic models |
| Agent Capability Negotiation | MEDIUM | Partial: tool enumeration exists but not dynamic negotiation |
| Agent Performance Benchmarking | MEDIUM | Moderate: benchmarking ideas exist (draft-cui-nmrg-llm-benchmark) |
The pattern is clear: the gaps with the highest idea counts (resource management at 117, lifecycle at 90, provenance at 79) are gaps where the *periphery* of existing work touches the problem. Teams building communication protocols think about resources; teams building discovery think about lifecycle. But nobody makes these the *central* problem.
The gaps with near-zero idea counts (error recovery at 6, human override at 4, consensus at 5, cross-protocol translation at 0) are the ones where no team is even circling the problem. These are true blind spots.
The pattern is clear: the critical and high-severity gaps are those where the *periphery* of existing work touches the problem but nobody makes it the *central* problem. Teams building communication protocols think about resources; teams building discovery think about lifecycle. The gaps where no team is even circling the problem -- rollback mechanisms, human override, cascade prevention -- are the true blind spots.
## The Ideas Nobody Had
@@ -136,7 +136,7 @@ Sometimes the absence is the finding. Here are technical ideas conspicuous in th
- **Agent migration protocol**: No standard for moving a running agent from one host to another while preserving state and active connections. Critical for cloud deployments.
- **Privacy-preserving agent discovery**: No mechanism for an agent to find capabilities without revealing its intent. "I need a medical diagnosis agent" reveals sensitive information before any trust is established.
- **Privacy-preserving agent discovery**: No mechanism for an agent to find capabilities without revealing its intent. "I need a medical diagnosis agent" reveals sensitive information before any trust is established. Under Art. 25 GDPR (data protection by design and by default), this is not just a nice-to-have -- it is a legal requirement for EU-deployed systems where discovery queries may constitute processing of special category data (Art. 9 GDPR, health data).
- **Agent cost and billing**: No standard for agents to negotiate compensation for services. Agents performing work for other agents have no way to express "this costs X" or "you have Y credits remaining."
@@ -156,14 +156,14 @@ Three practical takeaways for anyone implementing agent systems:
### Key Takeaways
- **96% of ideas appear in exactly one draft** -- fragmentation extends all the way down to the idea level; only 75 of 1,692 unique ideas show cross-draft convergence
- **628 cross-org convergent ideas** (43% of unique clusters, via fuzzy matching) reveal where organizations independently agree; highest-overlap pairs are Chinese institutions (China Unicom-Huawei: 32 shared ideas)
- **The critical gaps remain unfilled**: error recovery has 6 ideas from one draft; cross-protocol translation has zero
- **The vast majority of ideas appear in exactly one draft** -- fragmentation extends all the way down to the idea level
- **628 cross-org convergent ideas** (via fuzzy matching on an earlier extraction run) reveal where organizations independently agree; highest-overlap pairs are Chinese institutions (China Unicom-Huawei: 32 shared ideas)
- **The critical gaps remain unfilled**: rollback mechanisms, failure cascade prevention, and human override have minimal coverage across 434 drafts
- **Five ideas to watch**: ECT (execution DAG), DAAP (accountability), STAMP (delegation proof), ADL (agent description), verifiable conversations (audit trail)
- **Convergence clusters in three areas**: agent communication infrastructure, authentication/authorization, and network architecture
*Next in this series: [Drawing the Big Picture](06-big-picture.md) -- 628 cross-org convergent ideas, 12 gaps, and the architectural vision that connects them.*
*Next in this series: [Drawing the Big Picture](06-big-picture.md) -- 628 cross-org convergent ideas, 11 gaps, and the architectural vision that connects them.*
---
*Idea extraction performed by Claude from full-text analysis of each draft. Classification into types (mechanism, architecture, protocol, pattern, extension, requirement) based on the technical content of each proposal. Data current as of March 2026.*
*Idea extraction performed by Claude from draft abstracts and full text. Classification into types (protocol, architecture, extension, mechanism, requirement, pattern) based on the technical content of each proposal. The current database contains 419 ideas; figures referencing ~1,780 ideas come from an earlier pipeline run with different extraction parameters. Data current as of March 2026.*


@@ -1,14 +1,14 @@
# Drawing the Big Picture: What the Agent Ecosystem Actually Needs
*361 drafts, 628 cross-org convergent ideas, 12 gaps -- and the architectural vision that connects them all.*
*434 drafts, 628 cross-org convergent ideas, 11 gaps -- and the architectural vision that connects them all.*
---
We have spent five posts documenting a paradox: the IETF's AI agent landscape has extraordinary breadth (361 drafts), deep fragmentation at every level (96% of ideas appear in only one draft, 120 competing A2A protocols, 14 OAuth proposals), concentrated authorship (18 team blocs, one company writing 18% of all drafts), and critical gaps (behavior verification, error recovery, human override) that nobody is filling.
We have spent five posts documenting a paradox: the IETF's AI agent landscape has extraordinary breadth (434 drafts), deep fragmentation at every level (the vast majority of ideas appear in only one draft, 155 competing A2A protocols, 14 OAuth proposals), concentrated authorship (18 team blocs, one company writing ~16% of all drafts), and critical gaps (behavioral verification, failure cascade prevention, human override) that nobody is filling.
The landscape has quantity. It lacks architecture.
This post is about what the architecture looks like -- not in theory, but derived from the data. The 12 gaps are not random absences; they are structurally related. The convergent ideas contain the components; they need a blueprint. And the blueprint already has a foundation: existing IETF work on workload identity (SPIFFE/WIMSE) and execution evidence (Execution Context Tokens) provides the lower layers. What is missing is what goes on top.
This post is about what the architecture looks like -- not in theory, but derived from the data. The 11 gaps are not random absences; they are structurally related. The convergent ideas contain the components; they need a blueprint. And the blueprint already has a foundation: existing IETF work on workload identity (SPIFFE/WIMSE) and execution evidence (Execution Context Tokens) provides the lower layers. What is missing is what goes on top.
## What the Ecosystem Needs: Four Pillars
@@ -26,13 +26,13 @@ Every multi-agent workflow is a directed acyclic graph: tasks with dependencies,
The Execution Context Token (ECT) from [draft-nennemann-wimse-ect](https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/) provides the evidence layer: each task produces a signed token linked to its predecessors via parent references, forming a verifiable DAG. What is missing is the orchestration semantics: when to checkpoint, how to roll back, how to contain cascading failures.
The data supports this: the 6 ideas addressing error recovery (all from [draft-yue-anima-agent-recovery-networks](https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/)) include "Task-Oriented Multi-Agent Recovery Framework" and "State Consistency Management" -- DAG concepts by another name. The 117 ideas touching resource management need a graph-aware scheduler. The answer is the same structure: a DAG execution model.
The data supports this: the limited work addressing error recovery (notably [draft-yue-anima-agent-recovery-networks](https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/)) includes "Task-Oriented Multi-Agent Recovery Framework" and "State Consistency Management" -- DAG concepts by another name. The answer is the same structure: a DAG execution model.
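To make the DAG-evidence idea concrete, here is a toy sketch of hash-linked execution tokens. The field names and the plain SHA-256 chaining are our own illustration, not the ECT wire format (which uses signed JWTs); the point is only that parent references form a verifiable DAG:

```python
import hashlib
import json

def make_token(task_id, payload, parents):
    """Build a toy execution token linked to predecessor tokens.

    Field names (task_id, parents, digest) are illustrative only,
    not the actual ECT format.
    """
    body = {
        "task_id": task_id,
        "parents": sorted(p["digest"] for p in parents),
        "payload": payload,
    }
    # Digest covers task id, parent digests, and payload.
    body["digest"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    return body

def verify_chain(tokens):
    """Check every parent reference resolves to a known token digest."""
    known = {t["digest"] for t in tokens}
    return all(p in known for t in tokens for p in t["parents"])

fetch = make_token("fetch", {"url": "..."}, [])
parse = make_token("parse", {"fmt": "json"}, [fetch])
print(verify_chain([fetch, parse]))  # True
```

Rollback and checkpointing -- the missing orchestration semantics -- would operate on exactly this structure: walk the parent links back to a known-good node and re-execute from there.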
### Pillar 2: Human-in-the-Loop as First Class
**The gap it fills**: Human Override and Intervention (High), Agent Explainability (Medium)
Only **30 human-agent interaction drafts** exist against **120 A2A protocols** and **93 autonomous operations** drafts. Agents are being designed to talk to each other, not to humans. The CHEQ protocol ([draft-rosenberg-aiproto-cheq](https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/)) is a rare exception -- it defines human confirmation *before* agent execution. But nobody has standardized what happens *during* execution: how a human pauses a running workflow, constrains an agent's scope, takes over a task, or issues an emergency stop.
Only **34 human-agent interaction drafts** exist against **155 A2A protocol** drafts and **114 autonomous operations** drafts. Agents are being designed to talk to each other, not to humans. The CHEQ protocol ([draft-rosenberg-aiproto-cheq](https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/)) is a rare exception -- it defines human confirmation *before* agent execution. But nobody has standardized what happens *during* execution: how a human pauses a running workflow, constrains an agent's scope, takes over a task, or issues an emergency stop.
Human-in-the-loop must be a node type in the execution DAG, not an afterthought. The architecture needs:
- **Approval gates**: DAG nodes that block until a human approves
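As a sketch of what an approval-gate primitive could look like in code -- entirely illustrative, since no such primitive is standardized in any of the drafts discussed here:

```python
import threading

class ApprovalGate:
    """Toy DAG node that blocks downstream tasks until a human
    approves or rejects. Purely illustrative; not from any draft."""

    def __init__(self):
        self._event = threading.Event()
        self.approved = None

    def decide(self, approved: bool):
        """Record the human decision and unblock waiters."""
        self.approved = approved
        self._event.set()

    def wait(self, timeout=None) -> bool:
        """Block until a decision arrives; fail closed on timeout."""
        if not self._event.wait(timeout):
            raise TimeoutError("no human decision within timeout")
        return self.approved

gate = ApprovalGate()
gate.decide(True)    # a human (or test harness) approves
print(gate.wait(1))  # True
```

Note the fail-closed default: a gate that times out raises rather than silently proceeding, which is the behavior a standardized primitive would need to mandate.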
@@ -46,9 +46,9 @@ The irony: every production deployment will require these primitives. The standa
**The gap it fills**: Cross-Protocol Translation (High, zero ideas), Agent Lifecycle Management (High)
The 120 A2A protocol drafts will never converge to a single winner. MCP, A2A Protocol, SLIM, and dozens of others will coexist, each with different strengths. The answer is not to pick one; it is to build a translation layer that lets agents using different protocols interoperate through gateways.
The 155 A2A protocol drafts will never converge to a single winner. MCP, A2A Protocol, SLIM, and dozens of others will coexist, each with different strengths. The answer is not to pick one; it is to build a translation layer that lets agents using different protocols interoperate through gateways.
This gap has **zero ideas** in the current corpus -- the starkest absence across 361 drafts. No team is working on it. Yet it is perhaps the most important architectural piece: without protocol interoperability, the agent ecosystem fragments into vendor-locked silos.
This gap has **zero ideas** in the current corpus -- the starkest absence across 434 drafts. No team is working on it. Yet it is perhaps the most important architectural piece: without protocol interoperability, the agent ecosystem fragments into vendor-locked silos.
The protocol binding layer would define:
- How agents advertise which ecosystem features they support
@@ -75,7 +75,7 @@ The architecture achieves this with *assurance profiles* -- named configurations
| L2 | Signed ECTs (JWT) | Cross-org, standard compliance |
| L3 | Signed ECTs + external audit ledger | Regulated industries |
This dual-regime approach resolves the tension between "move fast" deployments and "prove everything" regulated environments. The 52 ideas touching behavior verification and the 79 ideas touching data provenance become implementable at higher assurance levels without imposing their cost on every deployment.
This dual-regime approach resolves the tension between "move fast" deployments and "prove everything" regulated environments. Ideas touching behavior verification and data provenance become implementable at higher assurance levels without imposing their cost on every deployment.
## How It Builds on What Exists
@@ -90,7 +90,7 @@ The architecture adds connective tissue above this layer, not below it:
| **Identity** | SPIFFE (workload identifier), WIMSE (security context propagation) | Nothing -- use existing identity |
| **Evidence** | ECT (execution context tokens, DAG linking) | Orchestration semantics, checkpoint/rollback, HITL nodes |
| **Auth** | OAuth 2.0, SCIM, DAAP, STAMP, Agentic JWT | Protocol binding so any auth approach works |
| **Communication** | MCP, A2A, SLIM, 120 other protocols | Translation layer and capability advertisement |
| **Communication** | MCP, A2A, SLIM, 155 other protocols | Translation layer and capability advertisement |
| **Safety** | DAAP (accountability), verifiable conversations, VERA (zero-trust) | Assurance profiles connecting these into deployable configurations |
The proposed five-draft ecosystem:
@@ -105,7 +105,7 @@ Each draft addresses specific gaps. Together, they provide the connective tissue
## Traction vs. Aspiration
A reality check: of the 361 drafts, only **36 (10%)** have been adopted by IETF working groups. The rest are individual submissions -- proposals without institutional backing. The WG-adopted drafts score higher on average (**3.54 vs. 3.31**), particularly on maturity (+1.28) and momentum (+0.98), but lower on novelty (-0.45). *(Note: scores are LLM-generated relative rankings from abstracts; see [Methodology](../methodology.md).)* The WGs that have adopted the most agent-relevant drafts are security-focused: **lamps** (6 drafts), **lake** (5), **tls** (3), **emu** (3). Agent-specific WGs like `aipref` have adopted only 2 drafts.
A reality check: of the 434 drafts, **52 (12%)** have been adopted by IETF working groups. The rest are individual submissions -- proposals without institutional backing. The WG-adopted drafts score higher on average (**3.61 vs. 3.23**, 4-dimension composite), particularly on maturity (+1.28) and momentum (+0.98), but lower on novelty (-0.45). *(Note: scores are LLM-generated relative rankings from abstracts; see [Methodology](../methodology.md).)* The WGs that have adopted the most agent-relevant drafts are security-focused: **lamps** (6 drafts), **lake** (5), **tls** (3), **emu** (3). Agent-specific WGs like `aipref` have adopted only 2 drafts.
This reveals a structural insight: the IETF is not building agent standards from scratch. It is **retrofitting security standards for agents**. The agent architecture we propose above would need to work within this reality -- building on the security WGs' infrastructure rather than competing with it.
@@ -117,7 +117,7 @@ Based on the data trajectories and current momentum:
**Within 12 months**: The DMSC side meeting's gateway work will produce a specification, likely gateway-centric with Agent Gateways as the primary interoperability mechanism. This is not the protocol-agnostic translation layer the ecosystem needs, but it will be the first concrete interop proposal.
**Within 18 months**: The safety deficit will begin to close -- not from IETF drafts but from regulatory pressure. The EU AI Act's requirements for high-risk AI systems will drive demand for behavior verification, human override, and audit standards. The IETF will respond reactively.
**Within 5 months (August 2026)**: The EU AI Act (Regulation 2024/1689), which entered into force on 1 August 2024, becomes fully applicable on 2 August 2026. Its requirements for high-risk AI systems -- including mandatory risk management (Art. 9), human oversight (Art. 14), record-keeping (Art. 12), and accuracy/robustness (Art. 15) -- will drive immediate demand for behavior verification, human override, and audit standards. Non-compliance carries penalties up to 35 million EUR or 7% of global annual turnover (Art. 99). This is not future regulatory pressure; it is current law with imminent enforcement. The safety deficit is simultaneously a technical gap and a compliance gap for any agent system deployed in the EU.
**The risk**: If the architecture work does not happen in the next 12 months, the agent ecosystem will calcify around vendor-specific protocol stacks (OpenAI's, Google's, Anthropic's, Huawei's). Each will have its own auth, discovery, and communication layer. The interoperability window will close, and the IETF's work will be standards for islands rather than standards for the internet.
@@ -149,9 +149,9 @@ If you are building agent systems and cannot wait for standards to mature:
Across six posts, we have built to one argument:
**The IETF's AI agent standardization effort is the largest, fastest-growing, and most consequential standards race in a decade. But it is building the highways before the traffic lights.** The data shows explosive growth (from 0.5% to 9.3% of all IETF submissions in 15 months), deep fragmentation (120 competing A2A protocols), concerning concentration (one company writes 18% of all drafts), and a structural safety deficit (4:1 capability to guardrails). What is missing is not more protocols -- it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles that work from development to regulated production.
**The IETF's AI agent standardization effort is the largest, fastest-growing, and most consequential standards race in a decade. But it is building the highways before the traffic lights.** The data shows explosive growth (from 0.5% to 9.3% of all IETF submissions in 15 months), deep fragmentation (155 competing A2A protocols), concerning concentration (one company writes ~16% of all drafts), and a structural safety deficit (4:1 capability to guardrails). What is missing is not more protocols -- it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles that work from development to regulated production.
The 75 convergent ideas -- and the broader set of 628 cross-org overlaps -- contain the components for this architecture. The question is whether the community can assemble them before the protocols ship without it. The convergence data suggests it is possible: **180 ideas already cross the Chinese-Western divide**, mediated largely by European telecoms (Deutsche Telekom, Telefonica, Orange) that operate in both markets and appear on both sides of nearly every major cross-cultural convergent idea. The bridge-builders exist. They need an architecture to bridge to.
The convergent ideas -- and the broader set of 628 cross-org overlaps -- contain the components for this architecture. The question is whether the community can assemble them before the protocols ship without it. The convergence data suggests it is possible: **180 ideas already cross the Chinese-Western divide**, mediated largely by European telecoms (Deutsche Telekom, Telefonica, Orange) that operate in both markets and appear on both sides of nearly every major cross-cultural convergent idea. The bridge-builders exist. They need an architecture to bridge to.
The IETF has built the internet's infrastructure before. DNS, HTTP, TLS -- each emerged from periods of competing proposals, fragmentation, and coordinated resolution. The AI agent standards race is following the same pattern, on a compressed timeline, with higher stakes.
@@ -163,12 +163,12 @@ The traffic lights need to catch up to the highways. The data says they can -- i
- **Four missing pillars**: DAG-based execution, human-in-the-loop primitives, protocol-agnostic interoperability, and assurance profiles for dual-regime deployment
- **The architecture builds on existing work**: SPIFFE for identity, WIMSE for security context, ECT for execution evidence -- the foundation exists
- **Five proposed drafts** (AEM, ATD, HITL, AEPB, APAE) would fill the 12 gaps by providing connective tissue between existing protocol proposals
- **Five proposed drafts** (AEM, ATD, HITL, AEPB, APAE) would fill the 11 gaps by providing connective tissue between existing protocol proposals
- **The interoperability window is closing**: vendor-specific agent stacks are forming; the next 12 months are critical for open standards
- **For builders today**: design for DAGs, build HITL from the start, make assurance configurable, avoid protocol lock-in
*Next in this series: [How We Built This](07-how-we-built-this.md) -- the methodology behind analyzing 361 IETF drafts with Claude, Ollama, and Python.*
*Next in this series: [How We Built This](07-how-we-built-this.md) -- the methodology behind analyzing 434 IETF drafts with Claude, Ollama, and Python.*
---
*Synthesis based on the full IETF Draft Analyzer dataset: 361 drafts, 557 authors, 75 cross-draft convergent ideas (628 via fuzzy matching), 12 gaps, 18 team blocs, 42 overlap clusters. Data current as of March 2026.*
*Synthesis based on the full IETF Draft Analyzer dataset: 434 drafts, 557 authors, 628 cross-org convergent ideas (via fuzzy matching), 11 gaps, 18 team blocs. Data current as of March 2026.*


@@ -1,10 +1,10 @@
# How We Built This: Analyzing 361 IETF Drafts with Claude and Ollama
# How We Built This: Analyzing 434 IETF Drafts with Claude and Ollama
*The engineering behind the analysis -- a Python CLI, two LLMs, one SQLite database, and ~$9.*
---
Every claim in this series -- the 4:1 safety ratio, the 14 competing OAuth proposals, the 18 team blocs, the 12 gaps, the 180 ideas crossing the Chinese-Western divide -- comes from an automated analysis pipeline we built in Python. This post describes how it works, what it costs, what it found that surprised us, and what we learned about LLM-powered document analysis at scale.
Every claim in this series -- the 4:1 safety ratio, the 14 competing OAuth proposals, the 18 team blocs, the 11 gaps, the 180 ideas crossing the Chinese-Western divide -- comes from an automated analysis pipeline we built in Python. This post describes how it works, what it costs, what it found that surprised us, and what we learned about LLM-powered document analysis at scale.
The tool is open source. If you want to run it on a different corner of the IETF -- or adapt it for another standards body -- everything you need is in the repository.
@@ -40,7 +40,7 @@ We search for drafts matching 12 keywords: `agent`, `ai-agent`, `llm`, `autonomo
**Gotchas learned the hard way**: The Datatracker API uses `type__slug=draft` (not `type=draft`) to filter to drafts. Pagination requires tracking `meta.next` through the response chain. Affiliation data comes from the `documentauthor` record, not the `person` record. We add a 0.5-second polite delay between requests.
The result: **361 drafts** fetched, with full metadata and text stored in SQLite.
The result: **434 drafts** fetched, with full metadata and text stored in SQLite.
### Stage 2: Analyze
@@ -59,7 +59,7 @@ We initially sent full draft text to Claude, but switched to abstract-only analy
For similarity analysis, we generate vector embeddings using Ollama running locally with the `nomic-embed-text` model. Each draft's abstract is embedded into a 768-dimensional vector, stored as raw bytes in the database.
**Why not Claude for embeddings?** Cost and speed. Ollama runs locally, is free, and processes all 361 drafts in under a minute. The embeddings are used for approximate similarity (cosine distance), overlap detection, and t-SNE visualization -- tasks where a small local model is perfectly adequate.
**Why not Claude for embeddings?** Cost and speed. Ollama runs locally, is free, and processes all 434 drafts in under a minute. The embeddings are used for approximate similarity (cosine distance), overlap detection, and t-SNE visualization -- tasks where a small local model is perfectly adequate.
The embeddings enable:
- **Overlap clusters**: Draft pairs with >0.85 cosine similarity grouped together
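The >0.85 threshold check is just cosine similarity over the stored vectors. A stdlib-only sketch follows; the float32 blob layout is an assumption about how the analyzer serializes embeddings, not a documented format:

```python
import math
import struct

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def from_blob(blob: bytes):
    """Decode a stored embedding. Assumes raw little-endian
    float32, 4 bytes per dimension (our assumption)."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)

a = (1.0, 0.0, 1.0)
b = (1.0, 0.1, 0.9)
overlapping = cosine(a, b) > 0.85  # these two would be clustered
```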
@@ -72,7 +72,7 @@ The most expensive stage. Each draft's full text is analyzed by Claude to extrac
**Batch optimization**: Rather than calling Claude once per draft, we batch 5 drafts per API call using Claude Haiku (`--cheap --batch 5`). This cuts the number of API calls by 5x and uses the cheaper model. The batch prompt includes all 5 drafts' texts and asks for ideas from each, reducing per-idea cost to fractions of a cent.
**Result**: **1,780 technical components** extracted from 361 drafts (averaging ~5 per draft). Of 1,692 unique titles, **96% appear in exactly one draft** -- most are draft-specific component descriptions ("Agent Gateway," "Transport Configuration System"), not standalone innovations. Only **75 ideas** show genuine cross-draft convergence (appearing in 2+ drafts), and only **11** appear in 3+ drafts. The real signal comes from the cross-org overlap analysis (idea-overlap feature), which uses fuzzy matching to identify **628 ideas** where 2+ organizations work on recognizably similar problems -- 43% of all unique idea clusters.
**Result**: The current database contains **419 ideas** across 377 drafts. An earlier pipeline run produced roughly 1,780 components from 361 drafts (averaging ~5 per draft). The difference reflects changes in extraction parameters, batching strategy, and deduplication -- a known limitation of LLM-based extraction. What is consistent across both runs: the vast majority of extracted ideas appear in exactly one draft, and most are draft-specific component descriptions rather than standalone innovations. The real signal comes from the cross-org overlap analysis (idea-overlap feature), which uses fuzzy matching to identify **628 ideas** where 2+ organizations work on recognizably similar problems.
### Stage 5: Gaps
@@ -80,7 +80,7 @@ The gap analysis is a synthesis step. We send Claude Sonnet the full landscape c
This is the one stage where the LLM is doing genuine reasoning, not just extraction. The prompt provides the data; Claude identifies the structural gaps. We validate its findings against the raw data (e.g., confirming that only 6 ideas address error recovery, or that cross-protocol translation has zero ideas).
**Result**: **12 gaps** identified (3 critical, 6 high, 3 medium), each cross-referenced with related drafts and ideas.
**Result**: **11 gaps** identified (2 critical, 5 high, 4 medium), each cross-referenced with related drafts and ideas.
### Stage 6: Report
@@ -92,15 +92,15 @@ The SQLite database is the real product. At **28 MB**, it contains everything ne
| Table | Rows | Purpose |
|-------|-----:|---------|
| drafts | 361 | Full metadata + text for every draft |
| ratings | 361 | 5-dimension quality scores + summaries |
| embeddings | 361 | 768-dim vectors as binary blobs |
| ideas | 1,780 | Extracted technical components with types |
| drafts | 434 | Full metadata + text for every draft |
| ratings | 434 | 5-dimension quality scores + summaries |
| embeddings | 434 | 768-dim vectors as binary blobs |
| ideas | 419 | Extracted technical components with types |
| authors | 557 | Person records from Datatracker |
| draft_authors | 1,057 | Author-to-draft linkage with affiliation |
| draft_refs | 4,231 | RFC/draft/BCP cross-references |
| gaps | 12 | Identified standardization gaps |
| llm_cache | 703 | Cached Claude API responses |
| gaps | 11 | Identified standardization gaps |
| llm_cache | 1,397 | Cached Claude API responses |
FTS5 full-text search is enabled on drafts, supporting queries like `ietf search "agent authentication"` that return ranked results in milliseconds. Indexes on `draft_refs(ref_type, ref_id)` and `ideas(draft_name)` keep query performance fast even for cross-table joins.
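For readers unfamiliar with FTS5, the underlying query is a one-liner. The schema below is illustrative, not the analyzer's actual table layout:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Illustrative schema; the analyzer's real tables differ.
con.execute("CREATE VIRTUAL TABLE drafts_fts USING fts5(name, abstract)")
con.executemany(
    "INSERT INTO drafts_fts VALUES (?, ?)",
    [
        ("draft-a", "agent authentication via signed tokens"),
        ("draft-b", "routing for low-power networks"),
    ],
)
# Space-separated terms are an implicit AND; rank is FTS5's
# built-in relevance ordering.
rows = con.execute(
    "SELECT name FROM drafts_fts WHERE drafts_fts MATCH ? ORDER BY rank",
    ("agent authentication",),
).fetchall()
# rows -> [('draft-a',)]
```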
@@ -132,7 +132,7 @@ Four features were added during the analysis session, each unlocking a deeper an
### RFC Cross-References (`ietf refs`)
**What it does**: Parses all 361 drafts for RFC references using regex (`RFC\s*\d{4,}`, `\[RFC\d+\]`, `BCP\s*\d+`, `draft-[\w-]+`). Stores results in a `draft_refs` table for querying.
**What it does**: Parses all 434 drafts for RFC references using regex (`RFC\s*\d{4,}`, `\[RFC\d+\]`, `BCP\s*\d+`, `draft-[\w-]+`). Stores results in a `draft_refs` table for querying.
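The same patterns, compiled into a minimal extractor. Deduplication and the (kind, value) normalization here are our assumptions for the sketch, not necessarily what the tool does:

```python
import re

# The four patterns quoted above, compiled.
PATTERNS = [
    (re.compile(r"\[RFC(\d+)\]"), "rfc"),
    (re.compile(r"RFC\s*(\d{4,})"), "rfc"),
    (re.compile(r"BCP\s*(\d+)"), "bcp"),
    (re.compile(r"draft-[\w-]+"), "draft"),
]

def extract_refs(text):
    """Return sorted, deduplicated (kind, reference) pairs."""
    refs = set()
    for pat, kind in PATTERNS:
        for m in pat.finditer(text):
            # Use the capture group where one exists, else the
            # whole match (the draft-name pattern has no group).
            ref = m.group(1) if pat.groups else m.group(0)
            refs.add((kind, ref))
    return sorted(refs)

extract_refs("See [RFC6749] and draft-foo-bar-00, per BCP 14.")
# -> [('bcp', '14'), ('draft', 'draft-foo-bar-00'), ('rfc', '6749')]
```

Note that `[RFC6749]` matches both RFC patterns; the set collapses the duplicate, which is one reason the stored counts depend on normalization choices.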
**What it found**: **4,231 cross-references** (2,443 RFC, 698 draft, 1,090 BCP) across 360 drafts with text. The most-referenced standards reveal what the agent ecosystem builds on:
@@ -160,15 +160,15 @@ Four features were added during the analysis session, each unlocking a deeper an
**What it does**: Groups similar ideas using `SequenceMatcher` (threshold 0.75), then checks which ideas span drafts from multiple organizations. This separates genuine cross-org consensus from intra-team duplication.
**What it found**: By exact title, only 75 of 1,692 unique ideas appear in 2+ drafts -- 96% are islands. But fuzzy matching reveals **628 ideas** where 2+ organizations work on recognizably similar problems (43% of unique clusters). The top convergence signal -- "A2A Communication Paradigm" -- spans **8 organizations from 4 countries**. The deeper finding: **180 ideas cross the Chinese-Western organizational divide**. European telecoms (Deutsche Telekom, Telefonica, Orange) act as bridges between Chinese institutions and Western companies. US Big Tech (Google, Apple, Amazon) is almost entirely absent from cross-divide collaboration.
**What it found**: By exact title, the vast majority of unique ideas appear in only a single draft. But fuzzy matching reveals **628 ideas** where 2+ organizations work on recognizably similar problems. The top convergence signal -- "A2A Communication Paradigm" -- spans **8 organizations from 5 countries**. The deeper finding: **180 ideas cross the Chinese-Western organizational divide**. European telecoms (Deutsche Telekom, Telefonica, Orange) act as bridges between Chinese institutions and Western companies. US Big Tech (Google, Apple, Amazon) is almost entirely absent from cross-divide collaboration.
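A minimal version of the fuzzy grouping. Only the 0.75 `SequenceMatcher` threshold comes from the tool; the greedy first-match clustering strategy is our assumption:

```python
from difflib import SequenceMatcher

def similar(a: str, b: str, threshold: float = 0.75) -> bool:
    """Case-insensitive fuzzy match at the tool's 0.75 threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def cluster(titles):
    """Greedy clustering: each title joins the first cluster whose
    representative it resembles. The real pipeline may differ."""
    clusters = []
    for t in titles:
        for c in clusters:
            if similar(t, c[0]):
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters

clusters = cluster([
    "A2A Communication Paradigm",
    "A2A communication paradigms",
    "Agent Capability Negotiation",
])
# The first two titles land in one cluster; the third stands alone.
```

Greedy single-pass clustering is order-sensitive, which is one reason idea counts shift between pipeline runs.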
### WG Adoption Status (`ietf status`)
**What it does**: Determines which drafts have been formally adopted by IETF Working Groups based on the `draft-ietf-{wg}-*` naming convention. Compares scores, categories, and gap coverage between WG-adopted and individual drafts.
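The convention check itself is a few lines; the misclassification caveat in the comment is ours:

```python
def wg_of(draft_name: str):
    """Return the WG slug if the draft follows the adopted-draft
    naming convention, else None. Purely convention-based: a draft
    that merely mimics the draft-ietf-* prefix would be
    misclassified as adopted."""
    parts = draft_name.split("-")
    if len(parts) >= 3 and parts[0] == "draft" and parts[1] == "ietf":
        return parts[2]
    return None

wg_of("draft-ietf-lamps-x509-agent")   # -> "lamps"
wg_of("draft-rosenberg-aiproto-cheq")  # -> None (individual submission)
```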
**What it found**: Only **36 of 361 drafts (10%)** are WG-adopted. The remaining 90% are individual submissions -- ideas seeking institutional backing. WG-adopted drafts score slightly higher on average (**3.54 vs 3.31**), validating our rating methodology.
**What it found**: **52 of 434 drafts (12%)** are WG-adopted. The remaining 88% are individual submissions -- ideas seeking institutional backing. WG-adopted drafts score slightly higher on average (**3.61 vs 3.23**), validating our rating methodology.
The most revealing finding: **19 of 36 WG-adopted drafts are in security Working Groups** (lamps, lake, tls, emu, ace). The agent-focused `aipref` WG has only 2 adopted drafts. The IETF is not building agent standards in agent-focused groups -- it is retrofitting its existing security infrastructure for agent use cases. The standards that will actually govern AI agents on the internet are being written by the same people who write TLS and OAuth, not by new agent-specific working groups.
The most revealing finding: **a majority of WG-adopted drafts are in security Working Groups** (lamps, lake, tls, emu, ace). The agent-focused `aipref` WG has only 2 adopted drafts. The IETF is not building agent standards in agent-focused groups -- it is retrofitting its existing security infrastructure for agent use cases. The standards that will actually govern AI agents on the internet are being written by the same people who write TLS and OAuth, not by new agent-specific working groups.
## What We Learned
@@ -202,16 +202,16 @@ The most valuable output is not any single report -- it is the SQLite database.
|-------|-------|-------:|-----:|
| Analyze | Claude Sonnet | 260 | ~$2.50 |
| Analyze | Claude Sonnet | 101 | ~$5.50 |
| Ideas | Claude Haiku (batch 5) | 361 | ~$0.80 |
| Ideas | Claude Haiku (batch 5) | 434 | ~$0.80 |
| Gaps | Claude Sonnet | 1 call | ~$0.20 |
| Embed | Ollama (local) | 361 | $0.00 |
| Refs | Regex (local) | 361 | $0.00 |
| Trends | SQL (local) | 361 | $0.00 |
| Idea-overlap | SequenceMatcher (local) | 1,780 ideas | $0.00 |
| WG Status | Naming convention | 361 | $0.00 |
| Embed | Ollama (local) | 434 | $0.00 |
| Refs | Regex (local) | 434 | $0.00 |
| Trends | SQL (local) | 434 | $0.00 |
| Idea-overlap | SequenceMatcher (local) | 419 ideas | $0.00 |
| WG Status | Naming convention | 434 | $0.00 |
| **Total** | | | **~$9** |
For context: analyzing 361 IETF drafts -- fetching full text, rating quality on 5 dimensions, extracting ~1,700 technical components, detecting 12 gaps, mapping 557 authors, parsing 4,231 cross-references, and identifying 18 team blocs -- cost less than two large coffees.
For context: analyzing 434 IETF drafts -- fetching full text, rating quality on 5 dimensions, extracting 419 technical ideas, detecting 11 gaps, mapping 557 authors, parsing 4,231 cross-references, and identifying 18 team blocs -- cost less than two large coffees.
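The zero-cost idea-overlap pass in the table above needs nothing beyond the standard library — a minimal sketch, assuming idea titles as plain strings and an illustrative 0.85 similarity threshold (the pipeline's actual threshold may differ):

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    # Ratio of matching characters over combined length, case-insensitive.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

titles = [
    "Agent Capability Discovery",
    "Agent Capability Discovery Protocol",
    "Trust Anchors for Agents",
]
overlaps = [(a, b) for a, b in combinations(titles, 2) if similar(a, b)]
# The first two titles overlap; the third matches neither.
```

Pairwise comparison is O(n²), which is why the pipeline runs it over idea titles rather than full draft texts.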
## The Tech Stack

View File

@@ -1,10 +1,10 @@
# Agents Building the Agent Analysis
*We used a team of AI agents to analyze, write about, and draw conclusions from 361 IETF drafts on AI agents. Here is what that looked like from the inside.*
*We used a team of AI agents to analyze, write about, and draw conclusions from 434 IETF drafts on AI agents. Here is what that looked like from the inside.*
---
There is an irony we should address up front: this entire blog series -- analyzing 361 Internet-Drafts about how AI agents should work -- was itself produced by a team of AI agents. Four Claude instances, each with a distinct role, reading the same data, building on each other's output, and coordinating through a shared task system and development journal.
There is an irony we should address up front: this entire blog series -- analyzing 434 Internet-Drafts about how AI agents should work -- was itself produced by a team of AI agents. Four Claude instances, each with a distinct role, reading the same data, building on each other's output, and coordinating through a shared task system and development journal.
This post is the story of that process: what worked, what surprised us, and what it reveals about the state of AI agent coordination in practice -- which, as it happens, is exactly the problem the IETF drafts are trying to solve.
@@ -15,7 +15,7 @@ We designed a four-agent team, each with a one-page definition file and a shared
| Agent | Role | What They Did |
|-------|------|---------------|
| **Architect** | "The Big Picture" | Read all reports, designed the narrative arc, wrote the vision document, reviewed every post across multiple passes |
| **Analyst** | "The Data Whisperer" | Ran the full pipeline on 361 drafts, executed 20+ SQL queries, produced 7 data packages |
| **Analyst** | "The Data Whisperer" | Ran the full pipeline on 434 drafts, executed 20+ SQL queries, produced 7 data packages |
| **Coder** | "The Feature Builder" | Implemented 7 new analysis features (refs, trends, idea-overlap, WG adoption, revisions, centrality, co-occurrence) |
| **Writer** | "The Storyteller" | Drafted all 8 blog posts, applied 6+ revision passes incorporating data refreshes, architectural reframes, and editorial redirections |
@@ -31,7 +31,7 @@ The process unfolded in roughly six phases -- not the four we planned.
All four agents started simultaneously. The Analyst began running the analysis pipeline on 101 new drafts. The Architect read all 10 existing reports and started designing the narrative arc. The Coder read the Architect's initial notes and began implementing new features. The Writer read every data report in the project.
The key design decision: **agents did not wait for each other when they could work in parallel.** The Writer's tasks were formally blocked by the Analyst's pipeline run, but the Writer had enough existing data (260 analyzed drafts) to start drafting. Rather than sitting idle, the Writer produced first drafts of all 6 core posts while waiting for updated numbers. This turned out to be the right call -- the structure and narrative mattered more than whether the draft count was 260 or 361.
The key design decision: **agents did not wait for each other when they could work in parallel.** The Writer's tasks were formally blocked by the Analyst's pipeline run, but the Writer had enough existing data (260 analyzed drafts) to start drafting. Rather than sitting idle, the Writer produced first drafts of all 6 core posts while waiting for updated numbers. This turned out to be the right call -- the structure and narrative mattered more than whether the draft count was 260 or 434.
### Phase 2: The Architect Sets the Frame
@@ -79,15 +79,15 @@ This is exactly the kind of silent failure that agent teams need guardrails for.
### Phase 5: The Data Arrives and the Reframing Battle
While the writing and reviewing unfolded, the Analyst completed the full pipeline: 361 drafts rated, 557 authors mapped (up from 403), 1,780 ideas extracted (up from 1,262). The numbers changed significantly: Huawei's share grew from 12% to 18%, A2A protocols from 92 to 120, and the safety ratio held steady at roughly 4:1. Every blog post needed a numbers-update pass.
While the writing and reviewing unfolded, the Analyst completed the full pipeline: 434 drafts rated, 557 authors mapped (up from 403), 419 ideas extracted (earlier runs produced 1,262 and then 1,780; a subsequent re-extraction with different parameters consolidated the count). The numbers changed significantly: Huawei's share grew from 12% to ~16%, A2A protocols from 92 to 155, and the safety ratio held steady at roughly 4:1. Every blog post needed a numbers-update pass.
But the most consequential event in Phase 5 was not the data refresh. It was the project lead challenging the Writer's headline claim.
**The "1,780 ideas" reframing.** The series had been built around a headline number: "1,780 technical ideas extracted from 361 drafts." The project lead asked: what does that number actually mean? The answer was uncomfortable. The pipeline extracts approximately 5 ideas per draft on average -- a mechanical process that produces "ideas" like "A2A Communication Paradigm" and "Agent Network Architecture." The raw count sounds impressive but is mostly scaffolding.
**The ideas reframing.** The series had been built around a headline number: "1,780 technical ideas extracted from the drafts." The project lead asked: what does that number actually mean? The answer was uncomfortable. That extraction run produced approximately 5 ideas per draft on average -- a mechanical process that yields "ideas" like "A2A Communication Paradigm" and "Agent Network Architecture." The raw count sounds impressive but is mostly scaffolding.
The real signal was hiding in the Coder's cross-org overlap analysis: of 1,692 unique idea titles, **96% appear in exactly one draft.** Only 75 show up in two or more drafts. Only 11 in three or more. The fragmentation that defines the protocol landscape extends all the way down to the idea level.
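The fragmentation figure comes from counting, per idea title, how many distinct drafts propose it — a sketch with hypothetical rows standing in for the ideas table in the SQLite database:

```python
from collections import defaultdict

# (draft_name, idea_title) pairs -- hypothetical rows standing in
# for the actual ideas table.
rows = [
    ("draft-a", "A2A Communication Paradigm"),
    ("draft-b", "A2A Communication Paradigm"),
    ("draft-b", "Agent Network Architecture"),
    ("draft-c", "Capability Negotiation"),
]

drafts_per_title = defaultdict(set)
for draft, title in rows:
    drafts_per_title[title].add(draft)

# Titles appearing in exactly one draft, as a share of all titles.
singletons = sum(1 for d in drafts_per_title.values() if len(d) == 1)
fragmentation_rate = singletons / len(drafts_per_title)  # 2 of 3 titles here
```

On the real corpus this exact-title grouping is what yields the single-draft share; fuzzy matching (see the SequenceMatcher pass) catches near-identical titles the exact grouping misses.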
This required rewriting Post 5 entirely -- its title changed from "The 1,780 Ideas That Will Shape Agent Infrastructure" to "Where 361 Drafts Converge (And Where They Don't)." The lead metric shifted from raw extraction count (impressive but hollow) to the 96% fragmentation rate (honest and striking). Every post that referenced the idea count had to be updated, some multiple times as the framing evolved through three iterations.
This required rewriting Post 5 entirely -- its title changed from "The 1,780 Ideas That Will Shape Agent Infrastructure" to "Where 434 Drafts Converge (And Where They Don't)." The lead metric shifted from raw extraction count (impressive but hollow) to the 96% fragmentation rate (honest and striking). Every post that referenced the idea count had to be updated, some multiple times as the framing evolved through three iterations.
The episode is worth documenting because it illustrates the irreducible role of human judgment in agent-produced work. Four agents had independently used the 1,780 figure -- the Analyst generated it, the Coder validated it, the Architect designed around it, the Writer headlined it. None questioned whether it was meaningful. It took a human asking "so what?" to force the reframe. The improved version -- convergence-amid-fragmentation, with 628 cross-org convergent ideas as the honest middle ground -- was genuinely better. But no agent surfaced the critique on its own.
@@ -97,7 +97,7 @@ The Analyst's second deep-analysis round produced three findings that significan
**RFC foundation divergence.** The Chinese bloc builds on YANG/NETCONF (network management). The Western bloc builds on COSE/CBOR/CoAP (IoT security) and HTTP/TLS/PKI (web infrastructure). The **only shared foundation is OAuth 2.0.** This elevated Post 3's fragmentation thesis from "different protocols" to "different technological DNA" -- the two blocs are not just disagreeing on solutions, they are building on incompatible infrastructure.
**Revision velocity.** 55% of all 361 drafts are at revision -00 -- submitted once, never iterated. Huawei's rate is 65%. Compare that with Ericsson (11%), Boeing (average revision 28.2), and Siemens (17.2). The volume-vs.-commitment distinction sharpened Post 2's analysis of what Huawei's 66-draft campaign actually represents. A further detail: the majority of Huawei's drafts were submitted in the 4-week window before IETF 121 Dublin -- a coordinated pre-meeting filing burst.
**Revision velocity.** 55% of all 434 drafts are at revision -00 -- submitted once, never iterated. Huawei's rate is 65%. Compare that with Ericsson (11%), Boeing (average revision 28.2), and Siemens (17.2). The volume-vs.-commitment distinction sharpened Post 2's analysis of what Huawei's 69-draft campaign actually represents. A further detail: the majority of Huawei's drafts were submitted in the 4-week window before IETF 121 Dublin -- a coordinated pre-meeting filing burst.
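Revision numbers are encoded in the draft filename itself (the trailing `-NN`), so the -00 share is one pass over the names — a sketch assuming well-formed names, with placeholder drafts:

```python
drafts = [
    "draft-example-agent-arch-00",
    "draft-example-agent-auth-03",
    "draft-example-agent-disc-00",
]

def revision(name: str) -> int:
    # The last dash-separated token is the revision number.
    return int(name.rsplit("-", 1)[1])

rev00_share = sum(revision(d) == 0 for d in drafts) / len(drafts)  # 2 of 3
```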
**Centrality bridge-builders.** The co-authorship network (491 nodes, 1,142 edges) revealed that European telecoms -- not US Big Tech, not the UN, not any formal body -- are the structural glue between the Chinese and Western blocs. Telefonica's Luis M. Contreras ranks #1 in betweenness centrality. Only 115 of 557 authors (23%) bridge the divide at all. The standards ecosystem's cross-divide cohesion depends on a handful of companies that most observers would not name first.
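Betweenness centrality on the co-authorship graph is what surfaces bridge-builders like Contreras — a toy sketch with two three-author cliques joined by a single bridging author (node names are placeholders, not real authors):

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("A1", "A2"), ("A1", "A3"), ("A2", "A3"),  # bloc A: internal co-authorship
    ("B1", "B2"), ("B1", "B3"), ("B2", "B3"),  # bloc B: internal co-authorship
    ("X", "A1"), ("X", "B1"),                  # X co-authors across the divide
])

bc = nx.betweenness_centrality(G)
bridge = max(bc, key=bc.get)  # "X": every shortest path between blocs crosses it
```

On the real 491-node graph the same computation ranks authors by how often they sit on shortest paths between otherwise-disconnected blocs.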
@@ -129,8 +129,8 @@ All three contributions came from reading holistically -- something no individua
| Component | Cost | Most Important Finding |
|-----------|-----:|----------------------|
| Claude Sonnet (ratings, gaps) | ~$8 | 4:1 safety deficit, 12 gap taxonomy |
| Claude Haiku (idea extraction) | ~$0.80 | 1,780 raw ideas (96% fragmented) |
| Claude Sonnet (ratings, gaps) | ~$8 | 4:1 safety deficit, 11 gap taxonomy |
| Claude Haiku (idea extraction) | ~$0.80 | 419 ideas (vast majority unique to single drafts) |
| Ollama embeddings | $0.00 | 25+ near-duplicate pairs |
| Coder: regex RFC parsing | $0.00 | Foundation divergence (YANG vs COSE) |
| Coder: networkx centrality | $0.00 | European telecoms as bridge-builders |
@@ -164,7 +164,7 @@ Six lessons from running a four-agent team on a real project:
## The Meta-Irony
We built a team of AI agents to analyze 361 IETF drafts about AI agent standards. The team needed: coordination mechanisms, shared context, role-based specialization, review and quality gates, human oversight, and a way to verify that completed work was actually complete.
We built a team of AI agents to analyze 434 IETF drafts about AI agent standards. The team needed: coordination mechanisms, shared context, role-based specialization, review and quality gates, human oversight, and a way to verify that completed work was actually complete.
Every one of these needs maps to a gap in the IETF landscape:

View File

@@ -1,6 +1,6 @@
# State of the IETF AI Agent Ecosystem: Where We Are and Where We're Going
*A vision document synthesizing 361 drafts, 557 authors, 628 cross-org convergent ideas, and 12 gaps into a picture of the AI agent standards landscape in 2026 and its trajectory through 2028.*
*A vision document synthesizing 434 drafts, 557 authors, 628 cross-org convergent ideas, and 11 gaps into a picture of the AI agent standards landscape in 2026 and its trajectory through 2028.*
---
@@ -8,7 +8,7 @@
The IETF's AI agent standardization landscape in March 2026 resembles a city under construction: cranes everywhere, foundations going in, multiple development teams building in parallel -- but no master plan, no zoning, and the safety inspectors have not been hired yet.
The numbers tell the story. In nine months, from June 2025 to February 2026, the rate of AI/agent-related Internet-Draft submissions grew from 2 per month to 72 -- a 36x increase. The corpus now contains **361 drafts** from **557 authors** representing **230 organizations**. Our cross-organization analysis found **628 technical ideas** independently proposed by multiple organizations -- genuine consensus signals amid the noise -- and identified **12 standardization gaps**, three of them critical.
The numbers tell the story. In nine months, from June 2025 to February 2026, the monthly rate of AI/agent-related Internet-Draft submissions climbed from 5 to 61 to 85. The corpus now contains **434 drafts** from **557 authors** representing **230 organizations**. Our cross-organization analysis found **628 technical ideas** independently proposed by multiple organizations -- genuine consensus signals amid the noise -- and identified **11 standardization gaps**, two of them critical.
This is not incremental growth. This is a phase transition, comparable to the IoT draft surge of 2014-2016 or the early web standards push of the mid-1990s. The IETF is being asked to standardize the infrastructure for a new class of internet participant: the autonomous software agent.
@@ -16,11 +16,11 @@ But the landscape that has emerged is not converging. It is fragmenting.
### The Structural Problems
**Fragmentation without coordination.** The 361 drafts cluster into at least 42 topically overlapping groups. The most crowded area -- OAuth extensions for AI agents -- has 14 competing drafts, each proposing a different approach to the same problem: how does an autonomous agent authenticate and obtain authorization? In the agent-to-agent communication space, 120 drafts propose protocols with no interoperability layer between them. We found 25 near-duplicate pairs where teams independently wrote essentially the same specification.
**Fragmentation without coordination.** The 434 drafts cluster into at least 42 topically overlapping groups. The most crowded area -- OAuth extensions for AI agents -- has 14 competing drafts, each proposing a different approach to the same problem: how does an autonomous agent authenticate and obtain authorization? In the agent-to-agent communication space, 155 drafts propose protocols with no interoperability layer between them. We found 25 near-duplicate pairs where teams independently wrote essentially the same specification.
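The near-duplicate pairs come from cosine similarity over locally computed embeddings — a sketch assuming vectors already fetched from Ollama, with hypothetical 3-dimensional embeddings and an illustrative 0.9 threshold:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical low-dimensional embeddings standing in for real draft vectors.
emb = {
    "draft-p": [0.9, 0.1, 0.0],
    "draft-q": [0.88, 0.12, 0.01],
    "draft-r": [0.0, 0.2, 0.95],
}
names = list(emb)
near_dupes = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if cosine(emb[a], emb[b]) >= 0.9
]
# draft-p and draft-q point in nearly the same direction; draft-r does not.
```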
**Concentration without diversity.** One organization -- Huawei -- accounts for 53 authors and 66 drafts, 18% of the entire corpus. A single 13-person team within Huawei co-authors 22 drafts at 94% internal cohesion. The broader Chinese institutional ecosystem (Huawei, China Mobile, China Telecom, China Unicom, Tsinghua University, ZTE, BUPT, CAICT, Zhongguancun Lab) collectively fields over 160 authors. Meanwhile, Google, Microsoft, and Apple are largely absent from AI agent protocol work. The standards that will govern how AI agents identify, authenticate, and communicate on the internet are being written by a remarkably narrow group.
**Concentration without diversity.** One organization -- Huawei -- accounts for 53 authors and 69 drafts, ~16% of the entire corpus. A single 13-person team within Huawei co-authors 22 drafts at 94% internal cohesion. The broader Chinese institutional ecosystem (Huawei, China Mobile, China Telecom, China Unicom, Tsinghua University, ZTE, BUPT, CAICT, Zhongguancun Lab) collectively fields over 160 authors. Meanwhile, Google, Microsoft, and Apple are largely absent from AI agent protocol work. The standards that will govern how AI agents identify, authenticate, and communicate on the internet are being written by a remarkably narrow group.
**Capability without safety.** For every draft addressing AI safety, alignment, or human oversight, approximately four drafts build new agent capabilities. Only 44 of 361 drafts touch safety. Only 30 address human-agent interaction, compared to 120 A2A protocols and 93 autonomous network operations drafts. The three critical gaps we identified -- behavior verification, resource management, and error recovery -- all concern what happens when agents fail or misbehave. These gaps have received minimal attention.
**Capability without safety.** For every draft addressing AI safety, alignment, or human oversight, approximately four drafts build new agent capabilities. Only 47 of 434 drafts touch safety. Only 34 address human-agent interaction, compared to 155 A2A protocols and 114 autonomous network operations drafts. The two critical gaps we identified -- behavioral verification and failure cascade prevention -- concern what happens when agents fail or misbehave. These gaps have received minimal attention.
---
@@ -28,19 +28,19 @@ But the landscape that has emerged is not converging. It is fragmenting.
The deepest problem is not fragmentation or concentration. It is the absence of connective tissue.
The 361 drafts contain the pieces of an agent ecosystem. What they lack is a shared model of how those pieces fit together. Consider what a deployed multi-agent system actually needs:
The 434 drafts contain the pieces of an agent ecosystem. What they lack is a shared model of how those pieces fit together. Consider what a deployed multi-agent system actually needs:
1. **An execution model**: How are agent tasks organized, sequenced, and tracked? What is the unit of work? How do dependencies between tasks get expressed? Today: no standard. Every draft assumes its own task model.
2. **Human oversight primitives**: When does a human need to approve, intervene, or override an agent's decision? How does the override propagate? How is the decision recorded for audit? Today: 30 drafts touch this, none define standard primitives.
2. **Human oversight primitives**: When does a human need to approve, intervene, or override an agent's decision? How does the override propagate? How is the decision recorded for audit? Today: 34 drafts touch this, none define standard primitives.
3. **Error recovery and rollback**: When an autonomous agent makes a bad decision, how do you undo it? When a cascade of failures ripples through an agent network, how do you contain the blast radius? Today: one draft (draft-yue-anima-agent-recovery-networks) partially addresses this. The rest of the 360 ignore it.
3. **Error recovery and rollback**: When an autonomous agent makes a bad decision, how do you undo it? When a cascade of failures ripples through an agent network, how do you contain the blast radius? Today: one draft (draft-yue-anima-agent-recovery-networks) partially addresses this. The rest of the 433 ignore it.
4. **Protocol interoperability**: With 120 competing A2A protocols, how does an agent speaking Protocol A communicate with an agent speaking Protocol B? Today: zero ideas in the entire corpus for cross-protocol translation. This gap is entirely unaddressed.
4. **Protocol interoperability**: With 155 competing A2A protocols, how does an agent speaking Protocol A communicate with an agent speaking Protocol B? Today: zero ideas in the entire corpus for cross-protocol translation. This gap is entirely unaddressed.
5. **Assurance profiles**: How does the same agent ecosystem work in a fast development environment (acceptable risk, minimal overhead) AND a regulated production environment (proofs, attestations, compliance)? Today: the discussion is split between safety-oriented drafts and capability-oriented drafts with no bridge between them.
These five needs map precisely to the five most critical and high-severity gaps in our analysis. They are not exotic requirements; they are the basic infrastructure that any production agent deployment will need. The fact that 361 drafts have been written without addressing them is the landscape's defining weakness.
These five needs map precisely to the five most critical and high-severity gaps in our analysis. They are not exotic requirements; they are the basic infrastructure that any production agent deployment will need. The fact that 434 drafts have been written without addressing them is the landscape's defining weakness.
---
@@ -58,7 +58,7 @@ The current trajectory continues. Draft volume doubles again. The OAuth-for-agen
### Scenario B: Consolidation Through Working Groups
The IETF establishes one or more focused working groups specifically for AI agent architecture (not just individual protocols). These WGs force consolidation: the 14 OAuth proposals get down to 2-3. The 120 A2A protocols get mapped against a common requirements document. Gap-filling work gets explicitly chartered.
The IETF establishes one or more focused working groups specifically for AI agent architecture (not just individual protocols). These WGs force consolidation: the 14 OAuth proposals get down to 2-3. The 155 A2A protocols get mapped against a common requirements document. Gap-filling work gets explicitly chartered.
**Result**: A more coherent landscape emerges by mid-2027. Not a single standard, but a small number of complementary standards with defined interfaces between them. Safety work gets a mandate.
@@ -88,11 +88,11 @@ The most critical missing piece is a shared execution model for agent tasks. Exe
### 2. Build human oversight in now, not later
The 30-vs-120 human-agent-to-A2A ratio is not just a standards problem; it is an engineering problem. Systems being designed today without human override primitives will need to be retrofitted. The CHEQ protocol (draft-rosenberg-aiproto-cheq) and the LLM-assisted network management framework (draft-cui-nmrg-llm-nm) both propose HITL models. Pick one and build to it, or design your own -- but do not ship agent systems without override capability.
The 34-vs-155 human-agent-to-A2A ratio is not just a standards problem; it is an engineering problem. Systems being designed today without human override primitives will need to be retrofitted. The CHEQ protocol (draft-rosenberg-aiproto-cheq) and the LLM-assisted network management framework (draft-cui-nmrg-llm-nm) both propose HITL models. Pick one and build to it, or design your own -- but do not ship agent systems without override capability.
### 3. Assume protocol diversity, design for translation
The 120-protocol landscape is not going to consolidate to one protocol. Design agent systems with protocol abstraction layers. Assume that agents in your ecosystem will eventually need to communicate with agents speaking different protocols. The gateway pattern (draft-agent-gw, draft-li-dmsc-macp) is emerging as the pragmatic solution.
The 155-protocol landscape is not going to consolidate to one protocol. Design agent systems with protocol abstraction layers. Assume that agents in your ecosystem will eventually need to communicate with agents speaking different protocols. The gateway pattern (draft-agent-gw, draft-li-dmsc-macp) is emerging as the pragmatic solution.
### 4. Invest in error recovery
@@ -112,7 +112,7 @@ In the first equilibrium, the landscape looks like today's microservices ecosyst
In the second equilibrium, the landscape looks more like the web: a layered architecture where identity (like TLS), communication (like HTTP), and semantics (like HTML) are cleanly separated, with standardized interfaces between them. Agents identify via WIMSE, execute via ECT-based DAGs, communicate via protocol-agnostic bindings, and operate under assurance profiles that scale from development to regulated production. Safety is built in, not bolted on.
The data we have analyzed -- 361 drafts, 628 cross-org convergent ideas, 12 gaps, 18 team blocs -- contains the building blocks for the second equilibrium. The question is whether the IETF community organizes itself to assemble them before market reality imposes the first.
The data we have analyzed -- 434 drafts, 628 cross-org convergent ideas, 11 gaps, 18 team blocs -- contains the building blocks for the second equilibrium. The question is whether the IETF community organizes itself to assemble them before market reality imposes the first.
The history of internet standards suggests that both happen: a messy market reality emerges first, followed by standards that rationalize and improve it. The web started with browser wars and incompatible HTML, then converged on HTML5. Mobile started with a zoo of protocols, then converged on LTE/5G. The AI agent ecosystem may follow the same path.
@@ -124,4 +124,4 @@ The drafts are being written. The race is on. The outcome depends on whether coo
---
*Analysis based on 361 IETF Internet-Drafts, 557 authors, 628 cross-org convergent ideas, and 12 identified gaps, current as of March 2026. Written by the Architect agent as input for the blog series and as a standalone reference document.*
*Analysis based on 434 IETF Internet-Drafts, 557 authors, 628 cross-org convergent ideas, and 11 identified gaps, current as of March 2026. Written by the Architect agent as input for the blog series and as a standalone reference document.*

View File

@@ -4,6 +4,30 @@
---
### 2026-03-08 WRITER/EDITOR — Factual Accuracy Pass Across All Blog Posts
**What**: Comprehensive factual accuracy fix across all 10 blog series files (posts 00-08 plus state-of-ecosystem), driven by three review documents (review-statistics.md, review-legal.md, review-science.md). Key changes:
1. **Draft count**: Updated all references from 361 to 434 (current DB count) across all posts.
2. **Gap count**: Changed from 12 to 11 everywhere; rewrote Post 04's gap table to match actual DB gap names and severities (2 critical, 5 high, 4 medium).
3. **Composite scores**: Fixed inflated scores (4.8 -> 4.75, 4.6 -> 4.5) everywhere; documented scoring as "4-dimension composite excluding overlap" and average as 3.27.
4. **Ideas count**: Added caveats explaining 419 (current DB) vs ~1,780 (earlier run) discrepancy; reframed Post 05 with data provenance note.
5. **Safety ratio nuance**: Changed flat "4:1" claims to "roughly 4:1 on aggregate, varying from 1.5:1 to 21:1 by month" throughout.
6. **Growth claim**: Removed cherry-picked "36x" multiplier; replaced with "rapid growth" framing using actual DB monthly figures.
7. **EU AI Act timeline**: Fixed Post 06's "within 18 months" to "within 5 months (August 2026)" with full enforcement details, penalty amounts, and article references.
8. **OAuth/GDPR distinction**: Added paragraph to Post 03 distinguishing OAuth consent from GDPR Einwilligung, noting controller-processor implications under Art. 28.
9. **Hospital scenario**: Added acknowledgment in Post 04 that this is already regulated under EU AI Act Annex III and Medical Devices Regulation.
10. **GDPR gap**: Added paragraph to Post 04 identifying GDPR-mandated capabilities (DPIA, right to erasure, data portability, purpose limitation) as a missing dimension in the gap analysis.
11. **Missing references**: Added FIPA, IEEE P3394, eIDAS 2.0 references where they naturally strengthen arguments (Posts 04, 05).
12. **Category counts**: Updated all category figures to match current DB (A2A: 155, identity: 152, data formats: 174, safety: 47, human-agent: 34, etc.).
13. **Huawei stats**: Corrected from "66 drafts, 18%" to "69 drafts, ~16%" with entity consolidation note.
14. **WG adoption**: Updated from "36 (10%)" to "52 (12%)" with corrected average scores (3.61 vs 3.23).
**Why**: Three independent reviews identified stale numbers, score inflation, missing regulatory context, and misleading single-ratio claims as the top credibility risks before publication.
**Result**: All 10 blog series files updated. Voice and style preserved. No structural rewrites beyond Post 04's gap table (which needed to match DB reality).
---
### 2026-03-08 CODER — Data Integrity Fixes from Statistical & Scientific Reviews
**What**: Fixed data integrity issues identified in `review-statistics.md` and `review-science.md`: