v0.3.0: Gap-to-Draft pipeline, Living Standards Observatory, blog series

Gap-to-Draft Pipeline (ietf pipeline):
- Context builder assembles ideas, RFC foundations, similar drafts, ecosystem vision
- Generator produces outlines + sections using rich context with Claude
- Quality gates: novelty (embedding similarity), references, format, self-rating
- Family coordinator generates 5-draft ecosystem (AEM/ATD/HITL/AEPB/APAE)
- I-D formatter with proper headers, references, 72-char wrapping

Living Standards Observatory (ietf observatory):
- Source abstraction with IETF + W3C fetchers
- 7-step update pipeline: snapshot, fetch, analyze, embed, ideas, gaps, record
- Static GitHub Pages dashboard (explorer, gap tracker, timeline)
- Weekly CI/CD automation via GitHub Actions

Also includes:
- 361 drafts (expanded from 260 with 6 new keywords), 403 authors, 1,262 ideas, 12 gaps
- Blog series (8 posts planned), reports, arXiv paper figures
- Agent team infrastructure (CLAUDE.md, scripts, dev journal)
- 5 new DB tables, schema migration, ~15 new query methods

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Date:   2026-03-04 00:48:57 +01:00
Parent: be9cf9c5d9
Commit: d6beb9c0a0
87 changed files with 24,471 additions and 401 deletions


@@ -0,0 +1,326 @@
# Blog Series: The IETF's AI Agent Standards Race
## Series Overview and Narrative Arc
*Architectural design document governing the 7-post blog series. This document has two sections: (A) the internal narrative architecture (for the team), and (B) the reader-facing series introduction (for publication).*
---
# PART A: NARRATIVE ARCHITECTURE (Internal)
## Overall Thesis
**The IETF's AI agent standardization effort is the largest, fastest-growing, and most consequential standards race in a decade -- but it is building the highways before the traffic lights.**
The data tells a story in three acts:
1. **The Gold Rush** (Posts 1-2): An explosion of activity, concentrated in surprising hands. 361 drafts, 36x growth in 9 months, one company writing 18% of all drafts, Western tech giants dramatically underrepresented.
2. **The Fragmentation** (Posts 3-4): That activity is not converging. 120 competing A2A protocols with no interoperability layer. 14 OAuth-for-agents proposals that cannot coexist. A 4:1 ratio of capability-building to safety work. Critical gaps where nobody is building at all.
3. **The Path Forward** (Posts 5-6): The raw material for a solution exists -- **628 technical ideas** independently proposed by multiple organizations show where genuine consensus is forming. But convergence on components is not convergence on architecture. The missing piece is not more protocols; it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles.
The throughline is a question: **Can the IETF assemble the architecture before the protocols ship without it?**
---
## Narrative Arc Diagram
```
TENSION
^
| Post 6: THE BIG PICTURE
| / (resolution: here's
| / what the ecosystem
| Post 4: THE GAPS -----+ actually needs)
| / (climax: what \
| / nobody's building) \
| Post 3 / Post 5 \
| FRAGMENTATION CONVERGENCE \
| / (escalation: (628 cross-org \
| / competing for solutions) Post 7
| / protocols) HOW WE
|/ BUILT THIS
Post 1 Post 2
GOLD RUSH WHO WRITES
(hook: the THE RULES
numbers) (stakes:
geopolitics)
+-----------------------------------------------------------> TIME/POSTS
```
**The emotional arc**: Wow, this is huge (Post 1) -> Wait, who controls it? (Post 2) -> Oh no, it is fragmenting (Post 3) -> And the most important parts are missing (Post 4, the climax) -> But beneath the chaos, organizations actually agree on 628 ideas (Post 5) -> Here is what the finished picture looks like (Post 6, the resolution) -> And here is how we figured all this out (Post 7, the coda).
---
## Per-Post Design
### Post 1: "The IETF's AI Agent Gold Rush"
**File**: `01-gold-rush.md`
**Word count**: 1800-2200
**Base**: Existing draft at `data/reports/blog-post.md`, needs update from 260 to 361 drafts
**Key thesis**: The IETF is experiencing an unprecedented standardization sprint around AI agents, with growth rates not seen since the early web standards era.
**Key data points to include**:
- 361 drafts (up from 260 after keyword expansion with mcp, agentic, inference, generative, intelligent, aipref)
- 36x growth: 2 drafts/month (Jun 2025) to 72 drafts/month (Feb 2026)
- 557 authors from 230 organizations
- 10+ categories, with data formats/interop (145), A2A protocols (120), and identity/auth (108) leading
- Average quality score: ~3.38/5.0 (range 1.35-4.8)
- Top-rated drafts: VOLT (4.8), DAAP (4.8), STAMP (4.6), TPM-attestation (4.6)
- 4:1 safety deficit ratio (first mention -- this becomes the recurring motif)
**What makes it worth reading alone**: The sheer numbers. Nobody else has quantified this. The 36x growth curve is the hook.
**Ends with**: Teaser for Post 2 -- "But who is writing all these drafts? The answer is more concentrated than you'd expect."
---
### Post 2: "Who's Writing the Rules for AI Agents?"
**File**: `02-who-writes-the-rules.md`
**Word count**: 2000-2500
**Key thesis**: The standards that will govern AI agents are being written by a remarkably concentrated set of authors, with geopolitical implications that the IETF community has not reckoned with.
**Key data points to include**:
- Huawei: 53 authors, 66 drafts, 18% of all drafts (up from 12% pre-expansion)
- The 13-person Huawei bloc: 22 shared drafts, 94% cohesion, core 7 (B. Liu, N. Geng, Z. Li, Q. Gao, X. Shang, J. Mao, G. Zeng) each on 13-23 drafts
- Chinese institutional ecosystem: Huawei (53) + China Mobile (24) + China Telecom (24) + China Unicom (22) + Tsinghua (13) + ZTE (12) + BUPT (14) + Pengcheng Lab (8) + Zhongguancun Lab (4) = 160+ authors
- Western underrepresentation: Google now visible (5 authors, 9 drafts) but dramatically small relative to market position. Microsoft, Apple still largely absent. Amazon has 6 authors on 6 drafts (PQ crypto, not agent-specific).
- 18 team blocs covering ~25% of 557 authors
- Cross-org collaboration is sparse: top cross-team pair (Rosenberg-Jennings, Five9/Cisco) shares only 3 drafts
- Ericsson + Inria team focused narrowly on EDHOC/post-quantum (5 people, 6 drafts, 100% cohesion)
- JPMorgan + Telefonica + Oracle on transitive attestation (Western financial sector emerging)
- Chinese orgs form a tightly linked ecosystem: Huawei-China Unicom (6 shared drafts), Tsinghua-Zhongguancun Lab (5), China Mobile-ZTE (4)
**Structural insight**: Team blocs inflate apparent collaboration. When you account for intra-bloc pairs, cross-pollination between groups is thin. The landscape is a collection of islands, not a network.
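The cohesion percentages deserve a definition when the Analyst writes this up. One plausible formula (the analyzer's exact one is not restated here) is the density of the co-authorship graph inside a bloc -- the fraction of member pairs that share at least one draft. A minimal sketch, with invented draft names:

```python
from itertools import combinations

def bloc_cohesion(drafts_by_author: dict[str, set[str]]) -> float:
    """Fraction of author pairs in a bloc that co-author at least one draft.

    `drafts_by_author` maps each bloc member to the set of draft names
    they appear on. A bloc where every pair has co-authored something
    scores 1.0 (100% cohesion).
    """
    authors = list(drafts_by_author)
    pairs = list(combinations(authors, 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if drafts_by_author[a] & drafts_by_author[b])
    return linked / len(pairs)

# Toy bloc (hypothetical drafts): two of the three pairs share a draft.
bloc = {
    "A": {"draft-x", "draft-y"},
    "B": {"draft-x"},
    "C": {"draft-y"},
}
print(round(bloc_cohesion(bloc), 2))  # -> 0.67
```

Under this definition, "94% cohesion" means nearly every pair inside the 13-person bloc has co-authored directly, which is what makes it a bloc rather than a loose cluster.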
**What makes it worth reading alone**: The geopolitics angle. The Huawei concentration is a genuine story. The Western absence is the surprise.
**Ends with**: "These 18 teams are not just writing separate drafts -- they are writing separate futures. The fragmentation runs deeper than authorship."
---
### Post 3: "The OAuth Wars and Other Protocol Battles"
**File**: `03-oauth-wars.md`
**Word count**: 2000-2500
**Key thesis**: The AI agent standards landscape is not just growing -- it is fragmenting. Multiple teams are solving the same problems independently, producing incompatible solutions that will impose real costs on implementers.
**Key data points to include**:
- 14-draft OAuth-for-agents cluster: aap-oauth-profile, aylward-daap-v2, barney-caam, chen-ai-agent-auth, chen-oauth-rar, goswami-agentic-jwt, jia-oauth-scope, liu-agent-operation-auth, liu-oauth-a2a, oauth-ai-agents-on-behalf-of-user, rosenberg-oauth-aauth, song-oauth-ai-agent-auth, song-oauth-ai-agent-collaborate, yao-agent-auth
- 10-draft Agent Gateway cluster
- 25+ near-duplicate draft pairs (>0.98 similarity)
- 42 topical clusters at 0.85 similarity threshold, 34 at 0.90
- 120 A2A protocol drafts with no interoperability layer
- Near-duplicate taxonomy: same-draft/different-WG (14), renamed (5), evolution (3), competing (2)
- Specific examples of WG shopping: draft submitted to both NMRG and OPSAWG, or both individual and WG track
**Structural insight**: Three causes of fragmentation: (1) WG shopping -- authors submit to multiple WGs hoping one sticks. (2) Parallel invention -- teams in isolation solving the same problem. (3) Strategic duplication -- organizations maximizing surface area. The data lets us distinguish these.
**What makes it worth reading alone**: The concrete examples. 14 ways to do OAuth for agents. People share this out of horrified fascination.
**Ends with**: "Fragmentation is costly but fixable -- teams can converge. The deeper problem is what nobody is building at all."
---
### Post 4: "What Nobody's Building (And Why It Matters)"
**File**: `04-what-nobody-builds.md`
**Word count**: 2000-2500
**THIS IS THE CLIMAX OF THE SERIES.**
**Key thesis**: The most dangerous gaps in AI agent standardization are not where competing solutions exist -- they are where no solutions exist at all. The three critical gaps address what happens when autonomous agents fail or misbehave, and these scenarios have received almost no attention.
**Key data points to include**:
- 12 gaps total: 3 critical, 6 high, 3 medium
- **Critical Gap 1: Behavior Verification** -- no mechanisms to verify agents follow declared policies. 44 safety drafts vs 361 total.
- **Critical Gap 2: Resource Management** -- 93 autonomous netops drafts, no agent-specific resource management framework.
- **Critical Gap 3: Error Recovery and Rollback** -- only 6 ideas from 1 draft (the starkest absence in the corpus).
- **High Gap: Cross-Protocol Translation** -- 120 A2A protocols, zero ideas for cross-protocol interop.
- **High Gap: Human Override** -- 30 human-agent drafts vs 120 A2A vs 93 autonomous netops. CHEQ exists but no emergency override protocol.
- The 4:1 ratio revisited: safety deficit is not just numerical, it is structural. Safety requires cross-WG coordination that the bloc structure cannot produce.
- Gap severity correlates with coordination difficulty
**For each critical gap, include a scenario**: "What goes wrong if this is never addressed?" -- make the gaps concrete and visceral.
**What makes it worth reading alone**: The fear factor. This is the "what keeps you up at night" post.
**Ends with**: "The gaps are real. But so are the solutions -- 628 ideas that multiple organizations independently agree on, scattered across the corpus with no connective tissue."
---
### Post 5: "Where 361 Drafts Converge (And Where They Don't)"
**File**: `05-1262-ideas.md`
**Word count**: 2000-2500
**Key thesis**: Beneath the fragmentation, genuine consensus is forming. **628 technical ideas** have been independently proposed by 2+ organizations -- cross-org convergence signals that reveal what the industry actually agrees on, regardless of which protocol camp they belong to.
**IMPORTANT NOTE ON FRAMING**: Our pipeline extracts ~5 ideas per draft mechanically (avg 4.9). The raw count (~1,780) is inflated and not the story. The story is which ideas survive cross-org validation -- the 628 that appear across different organizations. That is the defensible, meaningful metric. The raw extraction count should appear only in methodology context, not as a headline number.
**Key data points to include**:
- **628 cross-org convergent ideas** (ideas in 2+ drafts from different organizations) -- the headline metric
- Top convergence: "A2A Communication Paradigm" (8 orgs, 5 countries), "AI Agent Network Architecture" (8 orgs), "Multi-Agent Communication Protocol" (7 orgs)
- Org-pair overlap matrix: Chinese intra-bloc alignment (Huawei-China Unicom: 32 shared ideas) vs thin cross-regional signal (Ericsson-Inria: 21)
- Cross-org ideas that span Chinese-Western divide: 180 ideas (genuine cross-cultural consensus)
- Gap-to-convergence mapping: which gaps have cross-org attention, which have none?
- The "big 6" ambitious proposals: VOLT, ECT, CHEQ, STAMP, DAAP, ADL -- standout ideas regardless of convergence metrics
- The absent ideas: capability degradation signaling, multi-agent transaction semantics, agent migration, privacy-preserving discovery, agent cost/billing
**Structural insight**: Convergence and fragmentation coexist. Teams agree on WHAT needs building (628 ideas converge). They disagree on HOW (120 competing A2A protocols). The gap between "what" and "how" is where architecture is needed.
**What makes it worth reading alone**: The cross-org convergence data is actionable -- builders can see which ideas have multi-org backing vs single-team proposals.
**Ends with**: "628 ideas the industry agrees on, 12 gaps nobody is filling, and a question: what would it look like if someone drew the big picture?"
---
### Post 6: "Drawing the Big Picture: What the Agent Ecosystem Actually Needs"
**File**: `06-big-picture.md`
**Word count**: 2000-2500
**THIS IS THE RESOLUTION AND CAPSTONE.**
**Key thesis**: The landscape needs not more protocols but connective tissue -- a holistic ecosystem architecture providing a shared execution model (DAGs), human oversight primitives, protocol-agnostic interoperability, and assurance profiles that work from dev to regulated production.
**Key data points to include**:
- Full synthesis: 361 drafts, 557 authors, 628 cross-org convergent ideas, 12 gaps, 18 team blocs, 42 overlap clusters
- The proposed 5-draft ecosystem: AEM (architecture), ATD (task DAG), HITL (human-in-the-loop), AEPB (protocol binding), APAE (assurance profiles)
- How this builds on existing work: SPIFFE (identity), WIMSE (security context), ECT (execution evidence)
- The dual-regime insight: same execution model must work in K8s (fast/relaxed) AND regulated environments (proofs/attestation)
- Predictions based on data trajectories
- What builders should do TODAY: which drafts to watch, which gaps to fill, which patterns to adopt
**Structural insight**: The ecosystem needs five layers and existing work covers ~60%. Missing pieces: (1) DAG orchestration semantics, (2) HITL as first-class, (3) protocol translation, (4) assurance profiles. These map precisely to the critical and high-severity gaps.
**What makes it worth reading alone**: The vision. The forward-looking piece people share with their teams.
**Ends with**: "The IETF has navigated standardization sprints before. The drafts are being written. The question is whether architecture or fragmentation wins the race."
---
### Post 7: "How We Built This: Analyzing 361 IETF Drafts with Claude and Ollama"
**File**: `07-how-we-built-this.md`
**Word count**: 1500-2000
**Key thesis**: LLM-powered document analysis at scale is practical, cheap, and effective -- with careful engineering around caching, cost optimization, and hybrid model strategies.
**Key data points to include**:
- Pipeline: fetch (Datatracker API) -> analyze (Claude Sonnet) -> embed (Ollama nomic-embed-text) -> ideas (Claude Haiku, batched) -> gaps (Claude Sonnet)
- Cost: ~$3.16 for 260 drafts; Haiku batch mode cut costs ~10x for idea extraction
- Hybrid strategy: Claude for analysis (reasoning), Ollama for embeddings (local, free, fast)
- Caching via llm_cache table (SHA256 prompt hash) -- zero waste on re-runs
- Tech: Python + Click + SQLite + FTS5 + httpx + rich + anthropic SDK + ollama
- 13 CLI commands, 13+ visualizations, 11 report types
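The caching bullet is the piece that makes re-runs free, and Post 7 should show it. A minimal sketch of the idea (the real `llm_cache` schema and key derivation may differ):

```python
import hashlib
import sqlite3

def cached_completion(db: sqlite3.Connection, model: str, prompt: str, call_llm) -> str:
    """Return a cached LLM response for a (model, prompt) pair, calling the
    model only on a cache miss.

    Keyed on a SHA-256 hash of model + prompt, mirroring the llm_cache
    approach described above (table and column names are illustrative).
    """
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    db.execute(
        "CREATE TABLE IF NOT EXISTS llm_cache (key TEXT PRIMARY KEY, response TEXT)"
    )
    row = db.execute("SELECT response FROM llm_cache WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]  # cache hit: zero API cost on re-runs
    response = call_llm(model, prompt)
    db.execute("INSERT INTO llm_cache VALUES (?, ?)", (key, response))
    db.commit()
    return response

# Usage with a stubbed model call: second request never reaches the "API".
db = sqlite3.connect(":memory:")
calls = []

def fake_llm(model, prompt):
    calls.append(prompt)
    return "rated 4.2"

first = cached_completion(db, "claude-sonnet", "Rate this draft...", fake_llm)
second = cached_completion(db, "claude-sonnet", "Rate this draft...", fake_llm)
print(first == second, len(calls))  # -> True 1
```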
**What makes it worth reading alone**: Practical engineering details for anyone building similar systems.
**Ends with**: Cross-link to Post 8 (the meta post about the agent team).
---
## Recurring Motifs (thread across all posts)
1. **The 4:1 Safety Deficit**: Introduced in Post 1, deepened in Post 4, resolved in Post 6. The series' signature metric.
2. **The Highway/Traffic Light Metaphor**: The IETF is building highways (protocols) before traffic lights (safety, verification, override). Use sparingly but consistently.
3. **Fragmentation vs. Architecture**: Bottom-up protocol proliferation vs. top-down ecosystem design. Posts 3 and 6 are the poles of this tension.
4. **Concentration and Absence**: Huawei's dominance and Western absence. Introduced in Post 2, revisited in Post 6.
5. **The Islands Problem**: Team blocs as islands. Ideas cluster within orgs. Cross-pollination is thin. The ecosystem needs bridges, not more islands.
---
## Data Needs Per Post (for the Analyst)
| Post | Data Needed |
|------|-------------|
| 1 | Updated counts (361), category breakdown with new drafts, growth timeline, score distribution |
| 2 | Author/org rankings (refreshed for 361), bloc details, cross-org matrix, Chinese vs Western counts |
| 3 | OAuth cluster details (14 drafts with approaches), near-duplicate pairs, overlap clusters, A2A count |
| 4 | Full gap details, per-gap idea counts, safety ratio, category vs gap matrix |
| 5 | Full idea taxonomy, cross-org idea overlap, common ideas, unique ideas, idea-to-gap mapping |
| 6 | Synthesis: top-level stats, gap fill estimates, category growth rates, WG adoption signals |
| 7 | Pipeline stats: API call counts, costs, cache hit rates, timing |
---
## Missing Analyses the Coder Should Build
1. **Category Trend Analysis** (Posts 1, 3, 6): Monthly breakdown per category. Growth rates. Which accelerating, which plateauing?
2. **RFC Cross-Reference Map** (Posts 5, 6): Which RFCs do the 361 drafts build on? Reveals the foundation layer.
3. **Cross-Org Idea Overlap** (Post 5): Ideas in 2+ drafts from different orgs = genuine consensus signal.
4. **Draft Status / WG Adoption** (Post 6): Which drafts adopted by WGs? Which past -00? Traction vs aspiration.
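Analysis 3 maps directly onto a SQL query. A sketch against an illustrative schema (the real table and column names in the project database may differ), runnable with Python's sqlite3:

```python
import sqlite3

# Illustrative schema -- the production database's tables may be named differently.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE drafts (name TEXT PRIMARY KEY, org TEXT);
    CREATE TABLE ideas  (idea TEXT, draft TEXT REFERENCES drafts(name));
    INSERT INTO drafts VALUES
        ('draft-a', 'Huawei'), ('draft-b', 'China Unicom'), ('draft-c', 'Huawei');
    INSERT INTO ideas VALUES
        ('multi-agent communication protocol', 'draft-a'),
        ('multi-agent communication protocol', 'draft-b'),
        ('agent task DAG', 'draft-a'),
        ('agent task DAG', 'draft-c');  -- same org twice: not cross-org
""")

-- = None  # (placeholder removed)
```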
---
## Tone and Style
- **Data-driven but narrative**: Every claim backed by a number, every number wrapped in a story.
- **Authoritative but accessible**: Analysis, not advocacy. Let the data argue.
- **Opinionated where data supports it**: The safety deficit is a problem. Fragmentation is costly. Concentration is concerning.
- **Name names**: Specific drafts, authors, organizations. This is journalism.
- **Lead with surprise**: Each post opens with its most unexpected finding.
- **End with forward link**: Each post teases the next.
- **1500-2500 words per post**: Dense enough to be substantial, short enough to finish.
---
# PART B: READER-FACING SERIES INTRODUCTION
*What happens when the internet's standards body tries to build the rules for AI agents -- in real time, with 361 drafts, 557 authors, and a 4:1 safety deficit?*
---
## About This Series
The Internet Engineering Task Force is in the middle of the largest, fastest-growing standards race in a decade. In fifteen months, AI- and agent-related Internet-Drafts went from **0.5% to 9.3%** of all IETF submissions -- nearly 1 in 10. We built an automated analyzer to fetch, categorize, rate, and map every one of them.
This series tells the story of what we found: explosive growth, deep fragmentation, a concerning safety deficit, and hidden patterns that reveal where the real power lies and where the real risks lurk.
## The Posts
| # | Title | What You'll Learn |
|---|-------|-------------------|
| 1 | [The IETF's AI Agent Gold Rush](01-gold-rush.md) | The numbers: 361 drafts, 0.5% to 9.3% growth in 15 months, and a 4:1 capability-to-safety ratio |
| 2 | [Who's Writing the Rules for AI Agents?](02-who-writes-the-rules.md) | The geopolitics: Huawei's 13-person bloc, Chinese institutional dominance, Western underrepresentation |
| 3 | [The OAuth Wars and Other Protocol Battles](03-oauth-wars.md) | The fragmentation: 14 competing OAuth drafts, 120 A2A protocols with no interop |
| 4 | [What Nobody's Building (And Why It Matters)](04-what-nobody-builds.md) | The gaps: 12 missing standards, 3 critical, and what goes wrong without them |
| 5 | [Where 361 Drafts Converge (And Where They Don't)](05-1262-ideas.md) | The convergence: 628 cross-org ideas reveal genuine consensus beneath the fragmentation |
| 6 | [Drawing the Big Picture](06-big-picture.md) | The vision: what the agent ecosystem actually needs and what comes next |
| 7 | [How We Built This](07-how-we-built-this.md) | The methodology: analyzing 361 drafts with Claude, Ollama, and Python |
## How to Read
**Linear (recommended)**: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7
**By interest**:
- **Executives / decision-makers**: Post 1 (overview) -> Post 4 (gaps) -> Post 6 (vision)
- **Standards participants**: Post 2 (who's writing) -> Post 3 (fragmentation) -> Post 5 (ideas) -> Post 6 (vision)
- **Builders / implementers**: Post 4 (gaps) -> Post 5 (ideas) -> Post 6 (vision) -> Post 7 (methodology)
Each post stands alone, but they build on each other. If you read one, make it **Post 4** -- the gaps analysis is the most consequential finding.
## The Data
All findings come from our open-source IETF Draft Analyzer, which fetches drafts via the Datatracker API, rates them using Claude, extracts technical ideas, detects collaboration patterns via co-authorship analysis, and identifies standardization gaps. Data current as of March 2026.
| Stat | Value |
|------|-------|
| Drafts analyzed | 361 |
| Authors mapped | 557 |
| Organizations | 230 |
| Cross-org convergent ideas | 628 |
| Gaps identified | 12 (3 critical) |
| Team blocs detected | 18 |
| Analysis cost | ~$9 |
---
*Designed by the Architect agent, 2026-03-03.*


@@ -0,0 +1,136 @@
# The IETF's AI Agent Gold Rush: 361 Drafts, 557 Authors, and the Race to Define How AI Agents Talk
*Fifteen months ago, AI agents barely registered at the IETF. Today, nearly 1 in 10 new Internet-Drafts is about AI agents. We analyzed every one.*
---
For every Internet-Draft addressing how to keep an AI agent safe, roughly four are building new capabilities for it. That is the single most important number in this analysis.
We built an automated pipeline to fetch, categorize, rate, and map every AI- and agent-related Internet-Draft currently in the IETF system. We found **361 drafts** from **557 authors** at **230 organizations** and identified **12 standardization gaps** -- three of them critical. The result is the most comprehensive public analysis of the IETF's AI agent landscape to date.
The story the data tells is not subtle: the internet's most important standards body is in the middle of a gold rush, and the prospectors are moving faster than the safety inspectors.
## The Growth Curve
In 2024, just **9 AI/agent-related drafts** were submitted to the IETF -- **0.5%** of all submissions. By Q1 2026, AI/agent drafts account for **9.3%** of all new Internet-Drafts. Nearly 1 in 10.
| Year | Total IETF Drafts | AI/Agent Drafts | AI Share |
|------|------------------:|----------------:|---------:|
| 2021 | 1,108 | ~0 | ~0% |
| 2022 | 1,121 | ~0 | ~0% |
| 2023 | 1,241 | ~0 | ~0% |
| 2024 | 1,651 | 9 | 0.5% |
| 2025 | 2,696 | 190 | 7.0% |
| 2026 (Q1) | 1,748 | 162 | 9.3% |
The IETF itself accelerated 2.4x from 2021 to 2025. But AI/agent work went from essentially zero to dominant topic in under two years. The acceleration is not gradual. It is a step function that began in mid-2025 and has not slowed.
This growth is driven by a convergence of forces: the explosion of commercial AI agent deployments (ChatGPT plugins, Anthropic's Claude tools, Google's Gemini agents), the emergence of protocols like MCP and A2A that need standardization, and the recognition across the industry that AI agents communicating over the internet without agreed-upon identity, security, and interoperability standards is a problem that gets worse every month it goes unaddressed.
(A note on methodology: our pipeline searches the Datatracker for 12 keywords -- `agent`, `ai-agent`, `llm`, `autonomous`, `machine-learning`, `artificial-intelligence`, `mcp`, `agentic`, `inference`, `generative`, `intelligent`, and `aipref` -- across both draft names and abstracts. We started with 6 keywords and 260 drafts, then expanded to 12 to capture MCP-related work, generative AI infrastructure, and intelligent networking. The full methodology is in [Post 7](07-how-we-built-this.md).)
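That keyword search reduces to query construction against the Datatracker API. The sketch below assumes the API's Django-style `name__icontains` filter and omits abstract search and pagination, both of which the real pipeline (built on httpx) handles:

```python
from urllib.parse import urlencode

BASE = "https://datatracker.ietf.org/api/v1/doc/document/"

def search_url(keyword: str, limit: int = 100) -> str:
    """Build one Datatracker query URL for a keyword search over draft names.

    Assumes the API accepts a Django-style `name__icontains` filter;
    abstract search and result pagination are omitted from this sketch.
    """
    params = {
        "name__icontains": keyword,
        "type": "draft",
        "format": "json",
        "limit": limit,
    }
    return f"{BASE}?{urlencode(params)}"

# One URL per keyword; the pipeline unions the result sets.
urls = [search_url(k) for k in ("agent", "mcp", "aipref")]
print(urls[1])
```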
The drafts span ten categories, and the distribution reveals priorities:
| Category | Drafts | Share |
|----------|-------:|------:|
| Data formats and interoperability | 145 | 40% |
| A2A protocols | 120 | 33% |
| Agent identity and authentication | 108 | 30% |
| Autonomous network operations | 93 | 26% |
| Policy and governance | 91 | 25% |
| ML traffic management | 73 | 20% |
| Agent discovery and registration | 65 | 18% |
| AI safety and alignment | 44 | 12% |
| Model serving and inference | 42 | 12% |
| Human-agent interaction | 30 | 8% |
Note that drafts can belong to multiple categories, so percentages exceed 100%. The dominance of plumbing -- data formats, identity, and communication protocols -- is expected for an early-stage standards effort. What is unexpected is how little attention the safety and human-oversight categories receive.
The ecosystem's DNA is visible in what it cites. We parsed **4,231 cross-references** from the drafts, and the foundation is clear: **TLS 1.3** (RFC 8446, cited by 42 drafts), **OAuth 2.0** (RFC 6749, 36 drafts), **HTTP Semantics** (RFC 9110, 34 drafts), and **JWT** (RFC 7519, 22 drafts). The agent identity/auth category is essentially built on top of the OAuth stack. The entire landscape stands on a security foundation -- which makes the 4:1 safety deficit all the more jarring.
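Extracting those cross-references is mostly pattern matching. A regex sketch, counting each RFC at most once per draft to match the "cited by N drafts" framing (the text snippets are invented; a real parser would also walk the references section):

```python
import re
from collections import Counter

RFC_REF = re.compile(r"\bRFC\s?(\d{3,4})\b")

def count_rfc_citations(draft_texts: dict[str, str]) -> Counter:
    """Count how many drafts cite each RFC.

    Deduplicates within a draft so repeated mentions count once,
    matching the 'cited by N drafts' metric above.
    """
    citations = Counter()
    for text in draft_texts.values():
        for rfc in set(RFC_REF.findall(text)):
            citations[f"RFC {rfc}"] += 1
    return citations

# Toy corpus (invented snippets):
corpus = {
    "draft-x": "Transport uses TLS 1.3 [RFC 8446] with OAuth 2.0 (RFC 6749).",
    "draft-y": "Builds on RFC 8446 and RFC 8446 again, plus HTTP (RFC 9110).",
}
print(count_rfc_citations(corpus).most_common(1))  # -> [('RFC 8446', 2)]
```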
## The Safety Deficit
The ratio is stark:
| Focus Area | Drafts |
|------------|-------:|
| A2A protocols | 120 |
| Autonomous operations | 93 |
| Agent identity/auth | 108 |
| **AI safety/alignment** | **44** |
| **Human-agent interaction** | **30** |
For every draft about keeping agents safe, approximately four are building new capabilities. For every draft about human-agent interaction, there are four about agent-to-agent protocols and three about agents operating autonomously. The community is building the highways and forgetting the traffic lights.
This is not an abstract concern. Imagine an AI agent managing cloud infrastructure that detects a spurious anomaly, autonomously scales down a critical service, and triggers a cascading outage across three availability zones. Today, there is no standard mechanism to verify that the agent followed its declared policy before acting. No standard way to roll back the decision once the cascade begins. No standard protocol for a human operator to issue an emergency stop. The three critical gaps our analysis identified -- behavior verification, resource management, and error recovery -- are all about what happens when things go wrong. And in a world of autonomous AI agents, things will go wrong.
The safety drafts that do exist are often among the highest-rated in our analysis. [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) -- a comprehensive accountability protocol -- and [draft-cowles-volt](https://datatracker.ietf.org/doc/draft-cowles-volt/) -- a tamper-evident execution trace format -- each scored 4.8 out of 5, the highest in the entire corpus. [draft-birkholz-verifiable-agent-conversations](https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/), which defines verifiable conversation records using cryptographic signing, scored 4.5. The quality is there. The quantity is not.
## Who's Writing the Drafts
The organizational picture is as revealing as the technical one. The top contributors:
| Organization | Authors | Drafts |
|-------------|--------:|-------:|
| Huawei | 53 | 66 |
| China Mobile | 24 | 35 |
| Cisco | 24 | 26 |
| Independent | 19 | 25 |
| China Telecom | 24 | 24 |
| China Unicom | 22 | 21 |
| Tsinghua University | 13 | 16 |
| ZTE Corporation | 12 | 12 |
| Five9 | 1 | 10 |
| Ericsson | 4 | 9 |
**Huawei** leads by a wide margin: **53 authors** contributing to **66 drafts** -- 18% of the entire corpus. But the concentration goes deeper than raw numbers -- the next post will examine the team bloc structure, geopolitics, and what the collaboration network reveals about where power really lies.
Cisco and China Mobile each have 24 authors, but China Mobile's team produces 35 drafts to Cisco's 26. Ericsson has only 4 authors but punches above its weight with 9 focused drafts. Independent contributors account for 25 drafts -- a healthy sign of grassroots engagement.
## The Fragmentation Problem
The drafts are not just numerous; they are redundant. Our embedding-based similarity analysis found **25+ draft pairs** with greater than 0.98 cosine similarity -- functionally identical proposals submitted under different names.
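The duplicate scan itself is an all-pairs cosine comparison over draft embeddings. A toy sketch (the pipeline embeds with Ollama's nomic-embed-text; these 3-dimensional vectors are stand-ins for real embedding vectors):

```python
from itertools import combinations
from math import sqrt

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def near_duplicates(embeddings: dict[str, list[float]], threshold: float = 0.98):
    """All-pairs scan for near-identical drafts.

    `embeddings` maps draft name -> embedding vector. Pairs above the
    cosine-similarity threshold are flagged as likely duplicates.
    """
    return [
        (a, b, round(cosine(embeddings[a], embeddings[b]), 3))
        for a, b in combinations(embeddings, 2)
        if cosine(embeddings[a], embeddings[b]) > threshold
    ]

vecs = {
    "draft-a": [0.9, 0.10, 0.0],
    "draft-b": [0.9, 0.11, 0.0],  # near-identical to draft-a
    "draft-c": [0.0, 0.20, 0.9],
}
print(near_duplicates(vecs))  # flags only the draft-a / draft-b pair
```

An O(n²) scan is fine at 361 drafts; at larger corpus sizes an approximate nearest-neighbor index would replace it.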
The most crowded space is OAuth for AI agents: **14 separate drafts** all trying to solve how AI agents authenticate and get authorized. They range from broad framework proposals ([draft-aap-oauth-profile](https://datatracker.ietf.org/doc/draft-aap-oauth-profile/)) to narrow extensions ([draft-jia-oauth-scope-aggregation](https://datatracker.ietf.org/doc/draft-jia-oauth-scope-aggregation/)) to full accountability systems ([draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/)). None are compatible with each other.
Beyond OAuth, the broader A2A protocol landscape includes **120 drafts** with no interoperability layer. The most common technical idea in the entire corpus -- "Multi-Agent Communication Protocol" -- appears in 8 separate drafts from different teams. And the fragmentation goes deeper than protocols: of roughly 1,700 technical ideas extracted from the corpus, **96% appear in exactly one draft**. Everyone is solving the same problem. Nobody is solving it together.
This fragmentation has real costs. Implementers face confusion over which draft to follow. The IETF process slows as competing proposals vie for working group adoption. And the longer competing drafts proliferate without convergence, the higher the risk of incompatible deployments that entrench fragmentation rather than resolving it.
## What the Best Drafts Look Like
Not everything is chaos. Our quality ratings -- scoring novelty, maturity, overlap avoidance, momentum, and relevance on a 1-5 scale -- surface drafts that are doing the hard work well:
| Draft | Score | What It Does |
|-------|------:|-------------|
| [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) | 4.8 | Comprehensive AI agent accountability with authentication, monitoring, enforcement |
| [draft-guy-bary-stamp-protocol](https://datatracker.ietf.org/doc/draft-guy-bary-stamp-protocol/) | 4.6 | Cryptographic delegation and proof for agent task execution |
| [draft-drake-email-tpm-attestation](https://datatracker.ietf.org/doc/draft-drake-email-tpm-attestation/) | 4.6 | Hardware attestation for email via TPM verification chains |
| [draft-ietf-lake-app-profiles](https://datatracker.ietf.org/doc/draft-ietf-lake-app-profiles/) | 4.6 | Canonical CBOR for EDHOC application profiles |
| [draft-birkholz-verifiable-agent-conversations](https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/) | 4.5 | Verifiable agent conversation records with COSE signing |
The average score across all rated drafts is 3.38. The best work combines clear problem definition with concrete mechanisms and low overlap with existing proposals. The worst drafts are me-too proposals that restate problems already solved elsewhere.
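For readers reproducing the ratings: the five dimensions collapse into the single reported score. The exact weighting is not documented in this post, so the sketch below assumes an unweighted mean, with hypothetical dimension values for a strong draft:

```python
from statistics import mean

DIMENSIONS = ("novelty", "maturity", "overlap_avoidance", "momentum", "relevance")

def composite_score(ratings: dict[str, float]) -> float:
    """Collapse the five 1-5 dimension ratings into one score.

    Assumes an unweighted mean; the analyzer's actual weighting
    may differ.
    """
    return round(mean(ratings[d] for d in DIMENSIONS), 2)

# Hypothetical ratings for a top-scoring draft:
strong = {"novelty": 5, "maturity": 4.5, "overlap_avoidance": 5,
          "momentum": 4.5, "relevance": 5}
print(composite_score(strong))  # -> 4.8
```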
## What Comes Next
The IETF has navigated technology gold rushes before -- the early web, IoT, DNS security. In each case, the first wave of competing proposals eventually converged, and the lasting standards came from those who focused on interoperability and safety alongside capability.
The AI agent wave is following the same early pattern. The landscape has quantity. The question is whether it develops architecture -- and whether the safety work catches up before the capability work ships without it.
This blog series will dig into the questions the data raises. The next post starts with the most fundamental: who, exactly, is writing the rules?
---
### Key Takeaways
- **361 drafts** from **557 authors** at **230 organizations** -- AI/agent work went from **0.5% to 9.3%** of all IETF submissions in 15 months
- The **4:1 ratio** of capability-building to safety drafts is the most concerning structural finding
- **Huawei** dominates authorship with 53 authors on 66 drafts (18% of corpus); Chinese-linked institutions account for 160+ authors
- **14 competing OAuth-for-agents proposals** illustrate deep fragmentation; 120 A2A protocol drafts have no interoperability layer
- **12 standardization gaps** remain, with the 3 most critical all relating to what happens when agents fail
*Next in this series: [Who's Writing the Rules for AI Agents?](02-who-writes-the-rules.md) -- Inside the team blocs, geopolitics, and collaboration networks behind the IETF's AI agent standards.*
---
*Analysis conducted using the IETF Draft Analyzer. Data current as of March 2026. All 361 drafts, 557 authors, and full analysis data are available in the project's SQLite database.*

# Who's Writing the Rules for AI Agents?
*Inside the team blocs, geopolitics, and collaboration networks shaping the future of AI agent standards.*
---
Thirteen people from one company co-author 22 Internet-Drafts at 94% internal cohesion. Their work covers agent networking, identity management, communication protocols, and network troubleshooting. Together, they represent the single most coordinated standards-writing campaign in the IETF's AI agent space.
They all work at Huawei.
This is the story of who is writing the rules for AI agents, what their collaboration networks reveal, and why the geography of authorship matters more than most people realize.
## The Numbers Behind the Names
Our analysis mapped **557 unique authors** from **230 organizations** across the 361 AI/agent drafts in the IETF pipeline. But those topline numbers mask extreme concentration.
| Organization | Authors | Drafts |
|-------------|--------:|-------:|
| Huawei | 53 | 66 |
| China Mobile | 24 | 35 |
| Cisco | 24 | 26 |
| Independent | 19 | 25 |
| China Telecom | 24 | 24 |
| China Unicom | 22 | 21 |
| Tsinghua University | 13 | 16 |
| ZTE Corporation | 12 | 12 |
| Five9 | 1 | 10 |
| Ericsson | 4 | 9 |
One company -- Huawei -- contributes 18% of all drafts. The top six Chinese-linked organizations together contribute over 160 authors. This is not a general pattern across the IETF; it is specific to the AI agent space, and it tells a story about who considers these standards strategically important.
## The Huawei Drafting Machine
The Huawei team bloc is worth examining in detail because it illustrates a pattern -- organized, coordinated standards campaigns -- that is characteristic of how some institutions approach the IETF.
The 13-person core team includes:
| Author | Drafts | Role in Team |
|--------|-------:|-------------|
| Bing Liu | 23 | Top contributor, appears on most team drafts |
| Zhenbin Li | 21 | Core, agent networking frameworks |
| Nan Geng | 20 | Core, near-total overlap with Liu |
| Qiangzhou Gao | 20 | Core, cross-device communication |
| Xiaotong Shang | 19 | Core, network measurement and troubleshooting |
| Jianwei Mao | 14 | Communication protocol gap analysis |
| Guanming Zeng | 13 | MCP and NETCONF for agents |
The remaining six members contribute 2-5 drafts each. The team's **94% cohesion** means that nearly every possible pair of members shares the vast majority of their drafts. This is not casual co-authorship; it is a systematic drafting operation.
Their 22 drafts cover a specific territory: agent networking frameworks for enterprise and broadband networks, agent identity management, cross-device communication, MCP integration for network troubleshooting, and agent gateway requirements. The focus is heavily on **autonomous network operations** and **A2A protocols** -- the infrastructure layer of the agent ecosystem.
Two deeper metrics reveal the nature of this operation:
**Volume over iteration.** Across the entire corpus, **55% of all 361 drafts** have never been revised beyond their first submission (rev-00). But the rate varies dramatically by organization. Of Huawei's drafts, **65% are at rev-00**. Compare that to Ericsson (11%), Nokia (20%), or Siemens and Boeing (both 0%). The most serious iterators -- Boeing (avg 28.2 revisions per draft), Siemens (17.2), Sandelman Software (14.3) -- submit far fewer drafts but revise relentlessly, incorporating feedback and advancing toward maturity. Huawei's pattern is the opposite: submit at volume, iterate rarely. Submitting a draft is cheap. Iterating it signals genuine investment.
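The rev-00 metric is simple to reproduce from the project's SQLite database. A minimal sketch, assuming a hypothetical `drafts` table with one row per draft and a `rev` column holding its latest revision number (the real schema may differ):

```python
import sqlite3

# Toy in-memory stand-in for the analyzer's database; the table name,
# columns, and sample rows are illustrative, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drafts (name TEXT, org TEXT, rev INTEGER)")
conn.executemany(
    "INSERT INTO drafts VALUES (?, ?, ?)",
    [("draft-a", "Huawei", 0), ("draft-b", "Huawei", 0), ("draft-c", "Huawei", 2),
     ("draft-d", "Ericsson", 5), ("draft-e", "Ericsson", 9)],
)

# Share of drafts stuck at rev-00, and average revision count, per org.
# In SQLite, the comparison `rev = 0` evaluates to 1 or 0, so SUM counts it.
rows = conn.execute(
    """SELECT org,
              ROUND(100.0 * SUM(rev = 0) / COUNT(*), 1) AS rev00_pct,
              ROUND(AVG(rev), 1) AS avg_rev
       FROM drafts GROUP BY org ORDER BY rev00_pct DESC"""
).fetchall()
for org, rev00_pct, avg_rev in rows:
    print(f"{org}: {rev00_pct}% at rev-00, avg revision {avg_rev}")
```

On real data, the same two numbers per organization separate the volume submitters from the iterators.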
**Campaign timing.** Of Huawei's drafts, **43 were submitted in the four weeks before IETF 121 Dublin** -- 62% of the company's entire output, packed into a single pre-meeting window. For context, the entire corpus had 107 drafts in that period. Huawei alone accounted for **40% of all pre-IETF 121 submissions**. This is not organic growth. It is a coordinated submission campaign timed for maximum standards-body impact.
Beyond the main team, the company has additional smaller blocs. No other organization comes close to this level of coordinated output.
## The Chinese Institutional Ecosystem
Huawei does not operate in isolation. The Chinese organizations in this space form a densely interconnected collaboration network.
| Org A | Org B | Shared Drafts |
|-------|-------|-----:|
| China Unicom | Huawei | 6 |
| Tsinghua University | Zhongguancun Laboratory | 5 |
| China Mobile | ZTE Corporation | 4 |
| China Mobile | Huawei | 4 |
| BUPT | Tsinghua University | 3 |
| China Telecom | Huawei | 3 |
| BUPT | China Telecom | 3 |
| CAICT | Huawei | 3 |
The structure has three tiers:
**Tier 1: Telecom operators** -- China Mobile (24 authors, 35 drafts), China Telecom (24 authors, 24 drafts), China Unicom (22 authors, 21 drafts). These organizations bring domain expertise in network operations and 6G requirements. Their drafts focus heavily on use cases: agents for 6G networks, agent-based network management, traffic optimization.
**Tier 2: Equipment vendor** -- Huawei (53 authors, 66 drafts), ZTE Corporation (12 authors, 12 drafts). Huawei's dominance here is striking; ZTE's contribution is modest by comparison. These drafts focus on architecture and protocols -- the building blocks rather than the use cases.
**Tier 3: Research institutions** -- Tsinghua University (13 authors, 16 drafts), BUPT (14 authors, 7 drafts), Zhongguancun Laboratory (4 authors, 6 drafts), CAICT (8 authors, 6 drafts). These institutions bridge the gap between industry and academia, often co-authoring with both telecom operators and Huawei.
The Zhongguancun Laboratory team (4 members, 5 shared drafts, 94% cohesion) is led by Yong Cui of Tsinghua University, one of the most prolific individual authors with 8 drafts spanning agent discovery, network management benchmarking, and LLM-assisted operations. His work includes [draft-cui-nmrg-llm-benchmark](https://datatracker.ietf.org/doc/draft-cui-nmrg-llm-benchmark/) (score 4.3) -- one of the highest-rated drafts in the corpus.
The China Telecom team (6 members from China Telecom, BUPT, and Tsinghua) focuses on 6G agent use cases and IoA task protocols. Their drafts are more forward-looking than Huawei's -- less about current network operations, more about where agents fit in next-generation infrastructure.
## Where Is the West?
The absence is as telling as the presence.
**Google**: 5 authors, 9 drafts -- a notable increase, but still thin relative to the company's agent platform presence (Gemini agents, A2A protocol).
**Microsoft**: Minimal presence.
**Apple**: Two authors, two drafts -- both about mail automation ([draft-ietf-mailmaint-pacc](https://datatracker.ietf.org/doc/draft-ietf-mailmaint-pacc/), [draft-eggert-mailmaint-uaautoconf](https://datatracker.ietf.org/doc/draft-eggert-mailmaint-uaautoconf/)). Not about AI agents per se.
**Amazon**: 6 authors, 6 drafts -- primarily post-quantum cryptography work (ML-KEM hybrid key exchange), not agent-specific.
**Cisco**: The most active Western tech company with 24 authors across 26 drafts, but spread thinly. Three separate Cisco blocs cover different areas: Cullen Fluffy Jennings and Suhas Nandakumar work on A2A transport and agent identity; another team (Muscariello, Papalini, Sardara, Betts) works on AGNTCY messaging; a third (Farinacci, Rodriguez-Natal, Maino) works on LISP-based networking. No single coordinated campaign.
**Ericsson**: 4 authors, 9 drafts -- focused on EDHOC lightweight authentication, a mature protocol effort led by Goran Selander. High quality (scores 3.2-4.1) but narrow scope.
The pattern is clear: Western companies are either absent from AI agent standardization or participating in adjacent security/crypto work rather than the core agent protocol space. The reasons likely include strategic focus on proprietary agent ecosystems (Google's Gemini, Apple's Siri agents), less tradition of IETF engagement in the agent/AI space, and the assumption that de facto standards (MCP, A2A) will matter more than de jure IETF ones.
This bet may prove wrong. IETF standards have a way of becoming the infrastructure that everyone must eventually support.
## The Team Bloc Landscape
Beyond Huawei, our co-authorship analysis detected **18 team blocs** covering a significant fraction of the 557 authors. Each bloc is a group where members share at least 70% pairwise draft overlap and 3+ shared drafts.
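The detection rule above can be sketched directly. This is a simplified reading of it -- it assumes "pairwise draft overlap" means shared drafts divided by the smaller author's draft count, and treats a bloc as a connected component of qualifying author pairs; names and draft IDs are invented:

```python
from itertools import combinations

def team_blocs(drafts_by_author, min_shared=3, min_overlap=0.7):
    # An edge connects two authors who share >= 3 drafts at >= 70% overlap.
    edges = []
    for a, b in combinations(drafts_by_author, 2):
        shared = drafts_by_author[a] & drafts_by_author[b]
        smaller = min(len(drafts_by_author[a]), len(drafts_by_author[b]))
        if len(shared) >= min_shared and len(shared) / smaller >= min_overlap:
            edges.append((a, b))
    # Union-find over qualifying pairs: each connected component is a bloc.
    parent = {a: a for a in drafts_by_author}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for a in drafts_by_author:
        groups.setdefault(find(a), set()).add(a)
    return [members for members in groups.values() if len(members) > 1]

authors = {
    "liu":  {"d1", "d2", "d3", "d4"},
    "geng": {"d1", "d2", "d3", "d5"},
    "mao":  {"d1", "d2", "d3"},
    "solo": {"d9"},
}
blocs = team_blocs(authors)
print(blocs)  # one bloc: liu, geng, mao; "solo" has no qualifying pair
```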
The most notable non-Chinese blocs:
**Ericsson team** (5 members, 6 drafts, 100% cohesion) -- Goran Selander and colleagues lead this European effort focused on EDHOC authentication and lightweight key exchange for constrained devices. They collaborate with Inria (France) and the University of Murcia (Spain). Their work ([draft-spm-lake-pqsuites](https://datatracker.ietf.org/doc/draft-spm-lake-pqsuites/), score 4.1) represents some of the most mature protocol work in the corpus.
**Five9/Bitwave team** (2 members, 6 drafts, 100% cohesion) -- Jonathan Rosenberg (Five9) and Pat White (Bitwave) are the most prolific Western contributors to core agent protocols. Their drafts span the full stack: CHEQ for human confirmation of agent decisions ([draft-rosenberg-aiproto-cheq](https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/), score 3.9), N-ACT for agent-to-tool communication, and an OAuth extension for agent authentication. Rosenberg is also the strongest cross-team bridge, sharing 3 drafts with Cisco's Cullen Fluffy Jennings -- the single strongest cross-bloc connection we found.
**ISI, R.C. ATHENA team** (4 members, 4 drafts, 100% cohesion) -- A Greek research institute producing post-quantum authentication work for EDHOC. All four members (Haleplidis, Fraile, Fournaris, Koulamas) co-author every draft. Their [draft-lake-pocero-authkem-ikr-edhoc](https://datatracker.ietf.org/doc/draft-lake-pocero-authkem-ikr-edhoc/) scored 4.2.
**JPMorgan/multi-org team** (4 members from JPMorgan, Oracle, Telefonica, Aryaka; 2 drafts, 100% cohesion) -- The most cross-organizational Western bloc. Their work on transitive attestation ([draft-mw-wimse-transitive-attestation](https://datatracker.ietf.org/doc/draft-mw-wimse-transitive-attestation/), score 4.3) and actor chains ([draft-mw-spice-actor-chain](https://datatracker.ietf.org/doc/draft-mw-spice-actor-chain/), score 4.1) addresses the safety and accountability space. Notably, these are among the highest-scored drafts in the corpus.
## The Cross-Pollination Problem
Once you account for team blocs, the cross-team collaboration picture is sparse. The top cross-bloc connection -- Jonathan Rosenberg bridging Five9/Bitwave and Cisco -- involves just 3 shared drafts. Most cross-team pairs share only one.
Our network centrality analysis reveals who bridges these gaps. Of 557 authors, only **115 (23%)** co-author with people from both Chinese and Western organizations. The top bridge-builders are not from the organizations you might expect:
| Author | Organization | Betweenness | Chinese Neighbors | Western Neighbors |
|--------|-------------|--------:|---:|---:|
| Luis M. Contreras | Telefonica | 0.035 | 11 | 3 |
| Qin Wu | Huawei | 0.035 | 12 | 11 |
| Muhammad Awais Jadoon | InterDigital | 0.023 | 9 | 4 |
| Diego Lopez | Telefonica | 0.013 | 6 | 9 |
| Giuseppe Fioccola | Huawei | 0.009 | 2 | 8 |
The structural glue holding the two blocs together is **European telecoms** -- Telefonica, InterDigital, Deutsche Telekom. Not US Big Tech. Not any formal cross-standards body. A handful of European companies, through their authors' co-authorship ties, provide the only significant cross-divide connectivity. Qin Wu (Huawei) is the most balanced individual bridge, with nearly equal Chinese and Western co-author networks. But these bridges are thin: remove any two or three of these people, and the network fragments further.
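The bridge metric itself is simple set logic over the co-authorship graph: an author bridges the divide if their co-author neighborhood spans both organization groups. A minimal sketch, with invented author names and a toy organization mapping:

```python
# Illustrative org groupings; the real analysis classifies all 230 orgs.
CN = {"Huawei", "China Mobile", "Tsinghua"}
WEST = {"Telefonica", "Cisco", "Ericsson"}

def bridge_authors(coauthors, org_of):
    """Return {author: (chinese_neighbors, western_neighbors)} for every
    author whose co-author set touches both blocs."""
    bridges = {}
    for author, neighbors in coauthors.items():
        cn = sum(1 for n in neighbors if org_of[n] in CN)
        west = sum(1 for n in neighbors if org_of[n] in WEST)
        if cn and west:
            bridges[author] = (cn, west)
    return bridges

org_of = {"wu": "Huawei", "contreras": "Telefonica", "li": "China Mobile",
          "jennings": "Cisco", "selander": "Ericsson"}
coauthors = {
    "wu": {"li", "jennings", "contreras"},        # spans both blocs
    "contreras": {"wu", "li", "selander"},        # spans both blocs
    "selander": {"jennings"},                     # Western-only: not a bridge
}
bridges = bridge_authors(coauthors, org_of)
print(bridges)
```

Betweenness centrality (the table's score) requires shortest-path counting over the full graph; the neighbor tally above is the cheaper first cut.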
The sparseness of these bridges becomes even more concerning when you look at what the two blocs are building *on*. Our RFC cross-reference analysis (detailed in Post 3) reveals that the Chinese and Western blocs cite fundamentally different technology stacks. The Chinese agent ecosystem is being built on **network management protocols** -- YANG (RFC 7950), NETCONF (RFC 6241), and autonomic networking (RFC 7575). The Western ecosystem is being built on **IoT security and web infrastructure** -- COSE (RFC 9052), CBOR (RFC 8949), CoAP (RFC 7252), HTTP Semantics (RFC 9110), and EDHOC (RFC 9528). The only shared foundation is **OAuth 2.0** -- which explains why the OAuth-for-agents space has 14 competing proposals. It is the one piece of common ground, and everyone is fighting over it.
This means the cross-pollination problem is deeper than "different teams working separately." The two blocs are building on incompatible infrastructure. Even if they agreed on an agent communication pattern, the underlying plumbing diverges.
The IETF's consensus process works best when different implementation perspectives collide and reconcile. In the AI agent space, those collisions are rare. The Chinese institutional ecosystem collaborates internally but has limited connections to Western contributors. The European cryptographic teams (Ericsson, RISE, ATHENA) work on authentication foundations but do not connect to the agent protocol teams. The American startups (Five9, Bitwave) and enterprise companies (Cisco) work on adjacent problems without shared architectural framing.
The one exception is Fraunhofer SIT's Henk Birkholz and Tradeverifyd's Orie Steele, whose [draft-birkholz-verifiable-agent-conversations](https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/) (score 4.5) and [draft-steele-agent-considerations](https://datatracker.ietf.org/doc/draft-steele-agent-considerations/) (score 4.0) represent rare cross-cultural, safety-focused work from German and American collaborators.
## What This Means
Three implications emerge from the authorship data:
**1. Volume and influence are not the same thing.** Huawei's 66 drafts represent 18% of the corpus, but 65% have never been revised. The IETF rewards sustained engagement -- drafts that iterate through feedback cycles, reach working group adoption, and mature toward RFC status. A campaign that optimizes for volume at a pre-meeting deadline is playing a different game than one that optimizes for adoption. The quality scores bear this out: Huawei's team averages around 3.1, respectable but not exceptional. The organizations doing the deepest work (Ericsson at 4.8 average revisions per draft, Siemens at 17.2) submit far fewer drafts but iterate relentlessly.
**2. The safety work comes from unexpected places.** The highest-quality safety and accountability drafts come not from the high-volume drafters but from smaller, specialized teams: Aylward (independent), Birkholz/Steele (Fraunhofer/Tradeverifyd), Rosenberg/White (Five9/Bitwave), and the JPMorgan-led multi-org team. The organizations doing the most drafting are focused on capability; the organizations doing the best safety work are doing the least drafting.
**3. The IETF needs more bridges.** Cross-team, cross-organization, cross-geography collaboration is the weakest link in the current landscape. Our centrality analysis shows that European telecoms -- not US Big Tech -- are the structural glue between Chinese and Western blocs. The standards that will endure are the ones where Chinese telecom expertise, European cryptographic rigor, and American agent-platform experience converge. Right now, those worlds barely overlap, and the few bridges that exist depend on a handful of individuals.
---
### Key Takeaways
- **Huawei dominates** with 53 authors on 66 drafts (18% of corpus); their 13-person core team co-authors 22 drafts at 94% cohesion -- but 65% of those drafts have never been revised, and 43 were submitted in a single 4-week pre-meeting window
- **Chinese institutions** collectively contribute 160+ of 557 authors; they form a tightly interconnected collaboration ecosystem
- **Google has 9 drafts but Microsoft and Apple are largely absent** from AI agent standardization -- a notable strategic gap
- **18 team blocs** detected; cross-team collaboration is sparse, with most cross-bloc pairs sharing only 1 draft
- **Only 23% of authors bridge the Chinese-Western divide**; European telecoms (Telefonica, InterDigital) are the structural glue -- not US Big Tech
- **The best safety work** comes from smaller, specialized teams -- not from the high-volume drafters
*Next in this series: [The OAuth Wars and Other Battles](03-oauth-wars.md) -- 14 competing proposals, 120 A2A protocols, and what fragmentation costs the internet.*
---
*Data from the IETF Draft Analyzer, covering 361 drafts, 557 authors, and 18 detected team blocs. Co-authorship analysis uses 70% pairwise draft overlap threshold with 3+ shared drafts.*

# The OAuth Wars and Other Battles
*14 competing proposals, 120 protocols with no interop layer, and 25+ near-duplicate drafts. Inside the IETF's AI agent fragmentation problem.*
---
Fourteen separate Internet-Drafts are trying to solve the same problem: how should AI agents authenticate and get authorized using OAuth? They are not collaborating. They are not compatible. And they were all submitted in the same nine-month window.
This is the fragmentation problem, and it is not limited to OAuth. Across the IETF's AI agent landscape, our analysis found the same pattern repeated in agent discovery, multi-agent communication, intent-based routing, and 6G agent requirements. Teams are working in parallel, not together, and the cost is measured in wasted effort, confused implementers, and the growing risk of incompatible deployments.
## The OAuth Cluster: 14 Ways to Solve One Problem
The most crowded corner of the AI agent standards landscape is OAuth for agents. Every proposal is trying to answer the same fundamental question: when an AI agent acts on behalf of a user -- or on its own -- how does it prove its identity and obtain permission?
The depth of this cluster is not surprising when you look at the ecosystem's foundations. Our cross-reference analysis of all 361 drafts found that **OAuth 2.0** (RFC 6749) is cited by **36 drafts**, **JWT** (RFC 7519) by **22**, **OAuth Bearer** (RFC 6750) by **9**, and **DPoP** (RFC 9449) by **9**. The OAuth stack is the single most-referenced functional standard in the entire corpus after TLS. The agent identity problem runs through the landscape like a root system.
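The cross-reference tally behind these counts is a straightforward scan of draft texts for RFC mentions, counting each RFC at most once per draft (to match "cited by N drafts"). A sketch with toy draft snippets, not real draft bodies:

```python
import re
from collections import Counter

# Toy draft bodies for illustration; the real pipeline reads full I-D text.
drafts = {
    "draft-example-agent-auth": "This profile extends OAuth 2.0 [RFC6749] "
                                "and uses JWTs [RFC 7519] over TLS [RFC8446].",
    "draft-example-agent-token": "Tokens follow RFC 6749 and RFC6750 semantics.",
}

# Matches both "RFC 6749" and "[RFC6749]" citation styles.
rfc_pattern = re.compile(r"RFC\s?(\d{3,5})")
citations = Counter()
for text in drafts.values():
    # set() so a draft citing an RFC five times still counts once.
    for rfc in set(rfc_pattern.findall(text)):
        citations[f"RFC {rfc}"] += 1

print(citations.most_common(3))
```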
Here are all 14 drafts:
| Draft | Approach | Score |
|-------|----------|------:|
| [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) | Comprehensive accountability protocol | 4.8 |
| [draft-goswami-agentic-jwt](https://datatracker.ietf.org/doc/draft-goswami-agentic-jwt/) | Agentic JWT for autonomous systems | 4.5 |
| [draft-chen-oauth-rar-agent-extensions](https://datatracker.ietf.org/doc/draft-chen-oauth-rar-agent-extensions/) | RAR extensions for agent policy | 4.2 |
| [draft-aap-oauth-profile](https://datatracker.ietf.org/doc/draft-aap-oauth-profile/) | OAuth 2.0 profile for autonomous agents | 4.2 |
| [draft-barney-caam](https://datatracker.ietf.org/doc/draft-barney-caam/) | Contextual agent authorization mesh | 4.0 |
| [draft-liu-agent-operation-authorization](https://datatracker.ietf.org/doc/draft-liu-agent-operation-authorization/) | Verifiable delegation via JWT | 4.1 |
| [draft-rosenberg-oauth-aauth](https://datatracker.ietf.org/doc/draft-rosenberg-oauth-aauth/) | OAuth for agents on PSTN/SMS | 3.6 |
| [draft-oauth-ai-agents-on-behalf-of-user](https://datatracker.ietf.org/doc/draft-oauth-ai-agents-on-behalf-of-user/) | On-behalf-of-user extension | 3.7 |
| [draft-jia-oauth-scope-aggregation](https://datatracker.ietf.org/doc/draft-jia-oauth-scope-aggregation/) | Scope aggregation for multi-step workflows | 3.5 |
| [draft-liu-oauth-a2a-profile](https://datatracker.ietf.org/doc/draft-liu-oauth-a2a-profile/) | A2A profile for transaction tokens | 3.6 |
| [draft-song-oauth-ai-agent-authorization](https://datatracker.ietf.org/doc/draft-song-oauth-ai-agent-authorization/) | Target-based authorization | 2.8 |
| [draft-song-oauth-ai-agent-collaborate-authz](https://datatracker.ietf.org/doc/draft-song-oauth-ai-agent-collaborate-authz/) | Multi-agent collaboration authz | 3.5 |
| [draft-chen-ai-agent-auth-new-requirements](https://datatracker.ietf.org/doc/draft-chen-ai-agent-auth-new-requirements/) | New auth requirements analysis | 3.8 |
| [draft-yao-agent-auth-considerations](https://datatracker.ietf.org/doc/draft-yao-agent-auth-considerations/) | Auth considerations analysis | 3.1 |
The quality range is enormous -- from 2.8 to 4.8 -- and the approaches barely overlap. Some extend OAuth 2.0 with new grant types. Others define entirely new token formats (Agentic JWT). Still others propose mesh architectures or accountability layers on top of existing auth flows. Two drafts (song-oauth-ai-agent-authorization and song-oauth-ai-agent-collaborate-authz) come from the same Huawei team and address different facets of the problem. Two more (chen-oauth-rar-agent-extensions and chen-ai-agent-auth-new-requirements) come from a China Mobile team.
The gap our analysis identified in this cluster: most focus on **single-agent authorization**. Few address chained delegation across multiple agents, and none standardize real-time revocation in agent-to-agent workflows. An agent that obtains a token and delegates a sub-task to another agent -- which then delegates further -- creates a chain of trust that no single draft adequately covers.
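To make the chained-delegation gap concrete, here is a toy verifier for a delegation chain. Everything here is invented for illustration -- the token fields, the HMAC signing, the chain rules -- precisely because no current draft standardizes this. The two checks a real standard would need are visible: each link must name the previous link's subject as its delegator, and scope must only narrow:

```python
import hmac, hashlib, json

SECRET = b"issuer-key"  # stand-in for the authorization server's key

def sign(payload: dict) -> dict:
    body = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload,
            "sig": hmac.new(SECRET, body, hashlib.sha256).hexdigest()}

def verify_chain(chain):
    """Valid iff every link is signed, delegators line up, and no link
    widens the scope it was delegated."""
    prev_sub, prev_scope = None, None
    for token in chain:
        body = json.dumps(token["payload"], sort_keys=True).encode()
        expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(token["sig"], expected):
            return False
        p = token["payload"]
        if prev_sub is not None:
            if p["delegator"] != prev_sub or not set(p["scope"]) <= prev_scope:
                return False
        prev_sub, prev_scope = p["sub"], set(p["scope"])
    return True

chain = [
    sign({"sub": "agent-a", "delegator": "user", "scope": ["read", "write"]}),
    sign({"sub": "agent-b", "delegator": "agent-a", "scope": ["read"]}),
]
print(verify_chain(chain))  # True: scope narrows, delegators line up
```

Even this toy omits the harder half of the gap: real-time revocation, which would require every verifier in the chain to consult live state rather than static signatures.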
## The Agent Gateway Melee: 10 Drafts
If OAuth for agents is about identity, the agent gateway cluster is about communication architecture. Ten drafts are competing to define how agents from different platforms and ecosystems collaborate:
| Draft | Approach | Score |
|-------|----------|------:|
| [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) | Multi-agent collaboration protocol suite | 4.2 |
| [draft-agent-gw](https://datatracker.ietf.org/doc/draft-agent-gw/) | Semantic routing gateway | 3.9 |
| [draft-cui-dmsc-agent-cdi](https://datatracker.ietf.org/doc/draft-cui-dmsc-agent-cdi/) | Cross-domain interop framework | 3.0 |
| [draft-han-rtgwg-agent-gateway-intercomm-framework](https://datatracker.ietf.org/doc/draft-han-rtgwg-agent-gateway-intercomm-framework/) | Gateway intercommunication | 3.6 |
| [draft-li-dmsc-inf-architecture](https://datatracker.ietf.org/doc/draft-li-dmsc-inf-architecture/) | DMSC infrastructure architecture | 3.1 |
| [draft-liu-dmsc-acps-arc](https://datatracker.ietf.org/doc/draft-liu-dmsc-acps-arc/) | Agent collaboration protocols arch | 3.6 |
| [draft-yang-dmsc-ioa-task-protocol](https://datatracker.ietf.org/doc/draft-yang-dmsc-ioa-task-protocol/) | IoA task protocol | 3.0 |
| [draft-yang-ioa-protocol](https://datatracker.ietf.org/doc/draft-yang-ioa-protocol/) | IoA protocol | 3.6 |
| [draft-fu-nmop-agent-communication-framework](https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/) | Network AIOps comm framework | 3.0 |
| [draft-campbell-agentic-http](https://datatracker.ietf.org/doc/draft-campbell-agentic-http/) | HTTP best practices | -- |
A revealing pattern: five of these ten drafts reference "DMSC" -- Dynamic Multi-agent Secured Collaboration -- a concept pushed primarily by Chinese institutions through the IETF's DMSC side meeting. This cluster represents an organized attempt to define the agent collaboration architecture, but even within that effort, multiple competing proposals have emerged.
The gap: no draft in this cluster addresses **dynamic trust establishment between gateways**, or how to handle conflicting semantic schemas across ecosystems. If Agent Gateway A speaks MCP and Agent Gateway B speaks A2A Protocol, these drafts describe the need for translation but do not provide it.
## The Near-Duplicate Epidemic
Our embedding-based similarity analysis produced a more troubling finding: **25+ draft pairs** have cosine similarity above 0.98. Many are functionally identical proposals submitted under different names:
| Draft A | Draft B | Reason |
|---------|---------|--------|
| draft-a2a-moqt-transport | draft-nandakumar-a2a-moqt-transport | Same content, different name |
| draft-abbey-scim-agent-extension | draft-scim-agent-extension | Same draft, dual submission |
| draft-rosenberg-aiproto | draft-rosenberg-aiproto-nact | Renamed |
| draft-rosenberg-aiproto-cheq | draft-rosenberg-cheq | Renamed |
| draft-cui-nmrg-llm-nm | draft-irtf-nmrg-llm-nm | WG adoption (individual to IRTF) |
| draft-ar-emu-hybrid-pqc-eapaka | draft-ietf-emu-hybrid-pqc-eapaka | WG adoption |
| draft-zheng-agent-identity-management | draft-zheng-dispatch-agent-identity-management | Same draft, different WG |
| draft-sun-zhang-iaip | draft-sz-dmsc-iaip | Same draft, different WG |
| draft-zeng-mcp-troubleshooting | draft-zm-rtgwg-mcp-troubleshooting | Same draft, different WG |
Some of these duplications are legitimate IETF process: a draft moves from individual submission to working group adoption (like draft-cui-nmrg-llm-nm becoming draft-irtf-nmrg-llm-nm). Others reflect authors shopping the same draft to multiple working groups. And a few appear to be genuine content duplication -- the same ideas submitted under different author combinations.
The practical effect: the 361-draft corpus includes substantial double-counting. After de-duplication, the true number of distinct proposals is probably closer to 300. But even 300 competing proposals in nine months is extraordinary.
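The near-duplicate screen reduces to pairwise cosine similarity over draft embeddings with a 0.98 threshold. A self-contained sketch with toy three-dimensional vectors standing in for real embedding vectors:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy vectors: two near-identical drafts and one unrelated draft.
embeddings = {
    "draft-a2a-moqt-transport":            [0.91, 0.40, 0.05],
    "draft-nandakumar-a2a-moqt-transport": [0.90, 0.41, 0.06],
    "draft-unrelated-discovery":           [0.10, 0.20, 0.95],
}

names = list(embeddings)
pairs = [(a, b, cosine(embeddings[a], embeddings[b]))
         for i, a in enumerate(names) for b in names[i + 1:]]
dupes = [(a, b) for a, b, sim in pairs if sim > 0.98]
print(dupes)  # only the two MOQT transport drafts clear the threshold
```

The threshold matters: 0.98 catches renames and dual submissions while leaving genuinely related-but-distinct proposals (which typically land in the 0.85-0.95 band) alone.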
## The A2A Protocol Zoo
Zooming out from individual clusters, the broadest fragmentation is in the **120 A2A protocol drafts**. These span everything from low-level transport (A2A over MOQT/QUIC) to high-level semantic routing (intent-based agent interconnection) to specific use cases (MCP for network troubleshooting).
The most common technical idea in the entire corpus -- "Multi-Agent Communication Protocol" -- appears in **8 separate drafts** from different teams. Eight teams are independently designing how agents should talk to each other.
| Competing Area | Drafts | Distinguishing Fact |
|---------------|-------:|-------------------|
| OAuth for agents | 14 | No draft handles chained delegation |
| Agent gateway/collaboration | 10 | 5 are DMSC-linked; no trust framework |
| Agent discovery | 6 | Range from DNS-based to full directories |
| Intent-based routing | 5 | Requirements-heavy, protocol-light |
| 6G agent requirements | 6 | Wish lists, not specifications |
| SCIM/identity registry | 6 | 3 are near-duplicates |
The discovery cluster is particularly illustrative. Six drafts propose different ways to find AI agents: [draft-narajala-ans](https://datatracker.ietf.org/doc/draft-narajala-ans/) (score 4.2) proposes a DNS-based Agent Name Service. [draft-mozleywilliams-dnsop-bandaid](https://datatracker.ietf.org/doc/draft-mozleywilliams-dnsop-bandaid/) (3.6) also uses DNS but via SVCB records. [draft-pioli-agent-discovery](https://datatracker.ietf.org/doc/draft-pioli-agent-discovery/) (3.2) defines a lightweight registration and discovery protocol. [draft-gaikwad-woa](https://datatracker.ietf.org/doc/draft-gaikwad-woa/) (3.2) proposes a Web of Agents format using JSON Schema. None of them reference each other.
## The Deeper Fragmentation: Different Technological DNA
The protocol-level fragmentation documented above is only the visible layer. Beneath it, our RFC cross-reference analysis reveals a more fundamental divide: the two major drafting blocs are building on **entirely different technology stacks**.
| Foundation | Chinese Bloc | Western Bloc |
|-----------|-------------|-------------|
| **Network management (YANG/NETCONF)** | Strong (RFC 6241, 8639, 8641, 7950) | Absent |
| **IoT security (COSE/CBOR/OSCORE/CoAP)** | Absent | Strong (RFC 9052, 8949, 8613, 7252) |
| **PKI/Certificates (X.509)** | Absent | Strong (RFC 5280) |
| **Lightweight auth (EDHOC, CWT)** | Absent | Strong (RFC 9528, 8392) |
| **Web APIs (HTTP Semantics)** | Weak | Strong (RFC 9110) |
| **TLS 1.3** | Moderate (8 citations) | Strong (18 citations) |
| **OAuth 2.0** | Present (11 citations) | Present (7 citations) |
The Chinese bloc -- Huawei, China Mobile, China Telecom, China Unicom, and associated research institutions -- builds agent infrastructure on **YANG/NETCONF**, the network management protocols that underpin autonomous network operations. The Western bloc -- Ericsson, Cisco, ATHENA, and European research labs -- builds on **COSE/CBOR/CoAP** (IoT security) and **HTTP/TLS/PKI** (web infrastructure).
The **only shared foundation** is OAuth 2.0, which both blocs cite at comparable rates. This is why the OAuth cluster has 14 competing proposals: it is the one piece of common ground, and everyone is fighting over it.
This means fragmentation goes deeper than protocol design. Even if the community agreed on a single agent communication pattern, the underlying plumbing is incompatible. A Chinese draft building on NETCONF and a Western draft building on CoAP cannot interoperate without a translation layer -- and that translation layer, as we document in the gap analysis, does not exist.
## What Fragmentation Costs
The costs of this fragmentation are not theoretical:
**For implementers**: Which OAuth extension do you implement? Do you support SCIM agent schemas or Web of Agents? If your agent needs to discover another agent, do you look in DNS, a well-known URI, or a dedicated directory? Today there is no canonical answer, and choosing wrong means re-implementation when the IETF eventually converges.
**For the IETF process**: Working groups spend time evaluating competing proposals that could be spent converging on solutions. The OAuth working group alone faces 14 agent-related drafts. The volume creates overhead that slows progress on any single proposal.
**For security**: When multiple incompatible authentication and authorization schemes exist, implementations inevitably take shortcuts. The most dangerous agents will be those that implement the easiest -- not the most secure -- available standard.
**For the ecosystem**: Each month that fragmentation persists, real-world agent deployments make choices. Those choices entrench specific approaches, making convergence harder and interoperability more expensive. The window for a unified standard narrows with every proprietary deployment.
## The Convergence Signals
Not everything is divergence. A few positive patterns emerged from the data:
**EDHOC is converging.** The lightweight authenticated key exchange protocol has multiple working-group-adopted drafts ([draft-ietf-lake-edhoc-psk](https://datatracker.ietf.org/doc/draft-ietf-lake-edhoc-psk/), [draft-ietf-lake-authz](https://datatracker.ietf.org/doc/draft-ietf-lake-authz/), [draft-ietf-emu-eap-edhoc](https://datatracker.ietf.org/doc/draft-ietf-emu-eap-edhoc/)) with coordinated authorship. This is what healthy standards development looks like: multiple drafts from different teams that explicitly build on each other.
**SCIM agent extensions show maturity.** The Okta team's [draft-abbey-scim-agent-extension](https://datatracker.ietf.org/doc/draft-abbey-scim-agent-extension/) (score 3.8) and [draft-wahl-scim-agent-schema](https://datatracker.ietf.org/doc/draft-wahl-scim-agent-schema/) (score 3.9) represent a practical approach: extend an existing, widely-deployed protocol (SCIM) rather than invent a new one. This pragmatism is a convergence signal.
**The verifiable conversations approach is gaining traction.** [draft-birkholz-verifiable-agent-conversations](https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/) (score 4.5) and the WIMSE/ECT work on execution context tokens represent a "record everything, verify later" approach to agent accountability that multiple communities can support.
## What Needs to Happen
Three structural interventions would accelerate convergence:
**1. Working groups need to pick winners.** The IETF process allows competing proposals, but at some point working groups must adopt specific approaches and redirect competing efforts. In the OAuth agent space, the highest-quality proposals (DAAP, Agentic JWT, RAR extensions) should be evaluated head-to-head, not allowed to proliferate indefinitely.
**2. Interoperability testing, not just drafting.** The 120 A2A protocol proposals exist mostly as text. Interop testing -- where implementations from different teams prove they can work together -- would quickly reveal which proposals have real engineering substance and which are paper exercises.
**3. The translation layer must be built.** Rather than picking one A2A protocol, the community may be better served by a thin interoperability layer that lets agents using different protocols communicate through gateways. Our gap analysis found this cross-protocol translation gap entirely unaddressed -- zero technical ideas in the current corpus.
---
### Key Takeaways
- **14 competing OAuth-for-agents proposals** illustrate the depth of fragmentation; none handle chained delegation across agent networks
- **120 A2A protocol drafts** exist without an interoperability layer; the most common idea in the corpus appears in 8 separate drafts from different teams
- **25+ near-duplicate pairs** (>0.98 similarity) inflate the draft count; after de-duplication, roughly 300 distinct proposals remain
- **Convergence signals exist** in EDHOC authentication, SCIM agent extensions, and verifiable conversations -- areas where teams explicitly build on each other
- **Fragmentation goes deeper than protocols**: Chinese and Western blocs build on different RFC foundations (YANG/NETCONF vs COSE/CBOR/CoAP); the only shared bedrock is OAuth 2.0
- **The missing piece** is a cross-protocol translation layer; no draft in the corpus addresses how agents using different protocols can interoperate
*Next in this series: [What Nobody's Building (And Why It Matters)](04-what-nobody-builds.md) -- The 12 gaps in the IETF's AI agent landscape, and the real-world consequences of leaving them unfilled.*
---
*Data from the IETF Draft Analyzer's embedding-based overlap analysis (nomic-embed-text) and cluster detection at 0.85/0.90 similarity thresholds.*
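For readers who want to reproduce the methodology: the near-duplicate detection reduces to pairwise cosine similarity over draft embeddings. A minimal sketch, with toy 3-d vectors standing in for real nomic-embed-text output:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy stand-ins for real draft embeddings
drafts = {
    "draft-a-v1": [0.9, 0.1, 0.0],
    "draft-a-v2": [0.89, 0.11, 0.01],  # near-duplicate of draft-a-v1
    "draft-b":    [0.1, 0.9, 0.2],
}

names = list(drafts.items())
near_duplicates = [
    (x, y)
    for i, (x, u) in enumerate(names)
    for (y, v) in names[i + 1:]
    if cosine(u, v) > 0.98  # the near-duplicate threshold used in the analysis
]
```

The real pipeline embeds full draft text; only the >0.98 threshold for near-duplicate pairs and the 0.85/0.90 cluster thresholds carry over from the actual analysis.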
# What Nobody's Building (And Why It Matters)
*The 12 gaps in the IETF's AI agent landscape -- and the real-world disasters they invite.*
---
Imagine an AI agent managing a hospital's drug-dispensing system. It receives instructions from a prescribing agent, coordinates with a pharmacy agent, and issues delivery commands to a robotic dispensing agent. On Tuesday morning, the prescribing agent hallucinates a dosage. The pharmacy agent fills it. The dispensing agent delivers it. No human saw it happen. No system flagged it. No protocol exists to roll back the dispensed medication.
This is not a hypothetical failure mode. It is the predictable consequence of the IETF's three most critical standardization gaps.
We analyzed **361 Internet-Drafts**, extracted their technical components, and compared the result against what real-world agent deployments actually require. We found **12 gaps** -- areas where standardization work is missing or inadequate. Three of them are critical. And the critical ones all share a defining characteristic: they address what happens when autonomous agents fail or misbehave.
Nobody is building the safety net.
## The 12 Gaps
Our gap analysis ranked the 12 gaps by severity, based on the breadth of each shortfall and the consequences of leaving it unfilled:
| # | Gap | Severity | Ideas Addressing It |
|---|-----|----------|--------------------:|
| 1 | Agent Behavior Verification | CRITICAL | 52 |
| 2 | Agent Resource Management | CRITICAL | 117 |
| 3 | Agent Error Recovery and Rollback | CRITICAL | 6 |
| 4 | Cross-Protocol Translation | HIGH | 0 |
| 5 | Agent Lifecycle Management | HIGH | 90 |
| 6 | Multi-Agent Consensus | HIGH | 5 |
| 7 | Human Override and Intervention | HIGH | 4 |
| 8 | Cross-Domain Security Boundaries | HIGH | 10 |
| 9 | Dynamic Trust and Reputation | HIGH | 5 |
| 10 | Agent Performance Monitoring | MEDIUM | 26 |
| 11 | Agent Explainability | MEDIUM | 5 |
| 12 | Agent Data Provenance | MEDIUM | 79 |
Two numbers in that table should alarm you: the **6 ideas** addressing error recovery (all from a single draft), and the **0 ideas** addressing cross-protocol translation. Across 361 drafts, these gaps are not underserved. They are unserved.
## Critical Gap 1: Agent Behavior Verification
**The problem**: No mechanism exists to verify that a deployed AI agent actually behaves according to its declared policies or specifications.
**The numbers**: Only **44 of 361 drafts** address AI safety and alignment. The 4:1 ratio of capability to safety work means the community is building agents four times faster than it is building the tools to keep them honest.
**What the 52 ideas partially address**: Partial solutions exist on the periphery. [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) (score 4.8 -- the highest-rated draft in the corpus) defines a behavioral monitoring framework and cryptographic identity verification. [draft-birkholz-verifiable-agent-conversations](https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/) (score 4.5) proposes verifiable conversation records using COSE signing. [draft-berlinai-vera](https://datatracker.ietf.org/doc/draft-berlinai-vera/) (score 3.9) introduces a zero-trust architecture with five enforcement pillars.
**What is still missing**: Runtime verification. These drafts define what agents *should* do and how to *record* what they did. None provides a real-time mechanism to detect that an agent is deviating from its declared behavior *while it is operating*. The gap is between policy declaration and policy enforcement -- the difference between a speed limit sign and a speed camera.
**The scenario**: A financial trading agent is authorized to execute trades within specified parameters. It begins operating within bounds but, after a model update, starts exceeding risk limits. Without runtime behavior verification, the deviation is only discovered in post-hoc audit -- potentially days later, after significant damage.
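What would the "speed camera" look like? A minimal sketch of runtime policy enforcement for the trading scenario -- the policy fields and check logic here are entirely our assumptions, since no current draft specifies them:

```python
from dataclasses import dataclass, field

@dataclass
class DeclaredPolicy:
    """Hypothetical declared-behavior policy: allowed actions plus a risk limit."""
    allowed_actions: set
    max_trade_value: float

@dataclass
class RuntimeVerifier:
    policy: DeclaredPolicy
    violations: list = field(default_factory=list)

    def check(self, action: str, value: float) -> bool:
        """Gate each action at execution time -- the speed camera, not the sign."""
        ok = action in self.policy.allowed_actions and value <= self.policy.max_trade_value
        if not ok:
            # Deviation is flagged while the agent runs, not in a post-hoc audit
            self.violations.append((action, value))
        return ok

policy = DeclaredPolicy(allowed_actions={"buy", "sell"}, max_trade_value=10_000)
verifier = RuntimeVerifier(policy)
verifier.check("buy", 5_000)    # within declared bounds: allowed
verifier.check("buy", 50_000)   # exceeds risk limit: blocked and logged
```

The hard part a standard would have to solve is everything this sketch hand-waves: who holds the policy, how it is attested, and how enforcement survives a model update.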
## Critical Gap 2: Agent Resource Management
**The problem**: No framework exists for managing computational resources, memory, and processing power across distributed AI agents.
**The numbers**: **93 drafts** focus on autonomous network operations, and **117 ideas** touch on resource-adjacent topics. But those ideas address how agents communicate about tasks -- not how they compete for and share limited resources.
**What is missing**: Scheduling, quotas, fair allocation, and priority mechanisms for multi-agent environments. When ten agents compete for the same GPU cluster, which gets priority? When an agent's computation exceeds its allocation, what happens? When a high-priority emergency response agent needs resources currently held by a routine monitoring agent, how does preemption work?
**The scenario**: A telecom operator deploys 50 AI agents for network monitoring, troubleshooting, and optimization. During a major outage, all 50 agents simultaneously request inference resources to diagnose the problem. With no resource management framework, agents compete chaotically. The most aggressive agents get resources; the most important diagnostic tasks may not. The outage extends because the agents that could fix it are starved by the agents that are observing it.
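A standard would need to answer these questions with something like a priority-aware allocator. A toy sketch of a preemption rule for the outage scenario -- the class, its policy, and the priority values are our assumptions, not any draft's:

```python
import heapq

class ResourcePool:
    """Toy shared inference pool with priority-based preemption."""
    def __init__(self, capacity):
        self.capacity = capacity
        # Min-heap of (priority, agent_id, units): lowest priority is evicted first
        self.running = []

    def request(self, agent_id, priority, units):
        """Grant units, preempting lower-priority agents if the pool is full."""
        used = sum(u for _, _, u in self.running)
        while used + units > self.capacity and self.running and self.running[0][0] < priority:
            _, evicted, freed = heapq.heappop(self.running)  # preempt lowest-priority agent
            used -= freed
        if used + units > self.capacity:
            return False  # pool is full of equal-or-higher-priority work
        heapq.heappush(self.running, (priority, agent_id, units))
        return True

pool = ResourcePool(capacity=10)
pool.request("monitor-1", priority=1, units=8)   # routine monitoring fills the pool
pool.request("diagnosis", priority=9, units=8)   # emergency diagnosis preempts it
```

Without an agreed framework, every deployment invents its own version of this -- or, worse, none at all, which is the chaotic competition the scenario describes.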
## Critical Gap 3: Agent Error Recovery and Rollback
**The problem**: No standards exist for how agents handle errors, cascading failures, or the rollback of autonomous decisions.
**The numbers**: This is the starkest gap in the corpus. Only **6 extracted ideas** address it, and all but one come from a single draft: [draft-yue-anima-agent-recovery-networks](https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/) (score 4.1). One team, out of 557 authors, is doing nearly all of the work on this.
**The 6 ideas**:
- Task-Oriented Multi-Agent Recovery Framework
- Inter-Agent Communication Protocol Requirements
- State Consistency Management
- Error and Success Reporting Framework (from a separate draft)
- Generic Agent Response Framework
- Mandatory restrictive failure behavior
That is the entire body of work the IETF has produced on agent error recovery. For context, "Multi-Agent Communication Protocol" -- defining how agents *talk* -- appears in 8 drafts. The community has invested 8 times more effort in the plumbing than in the fire escape.
**What is missing**: Circuit breakers for cascading failures. Checkpoint and rollback protocols. Blast radius containment. Graceful degradation. All concepts well-established in distributed systems engineering, but absent from the agent standards landscape.
**The scenario**: A multi-agent supply chain system manages inventory, shipping, and payments. The inventory agent processes a large batch incorrectly, leading the shipping agent to dispatch wrong items, which causes the payment agent to process refunds to wrong accounts. The cascade happens in minutes. Without rollback mechanisms, untangling the mess requires manual human intervention across three systems -- and the agents continue making decisions based on corrupted state while humans try to intervene.
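The circuit breaker, at least, is a well-understood distributed-systems pattern. Here is a minimal sketch of what an inter-agent version could look like -- the thresholds and interface are illustrative, not drawn from any draft:

```python
import time

class CircuitBreaker:
    """Suppress calls to a failing downstream agent to contain cascading failures."""
    def __init__(self, failure_threshold=3, reset_after=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None = closed (calls flow); timestamp = open (calls blocked)

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: downstream agent failing, call suppressed")
            self.opened_at = None  # half-open: allow one trial call
            self.failures = 0
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip: contain the blast radius
            raise
        self.failures = 0
        return result
```

In the supply-chain scenario, a breaker between the inventory and shipping agents would have halted the cascade at the first boundary instead of letting corrupted state propagate through three systems.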
## The High-Priority Gaps
Six additional gaps scored HIGH severity. Each represents a missing piece that working deployments will hit:
### Cross-Protocol Translation (0 ideas)
With **120 competing A2A protocols** and no translation layer, agents speaking different protocols simply cannot interoperate. This gap is entirely unaddressed -- zero technical ideas in the corpus. It is the only gap with literally no coverage.
The parallel is the early web: HTTP won not because it was the best protocol but because it was the one protocol everyone could speak. The agent ecosystem has no HTTP equivalent. If the IETF does not build a translation layer, the market will -- and the result will be vendor-locked ecosystems rather than open interoperability.
### Human Override and Intervention (4 ideas)
Only **30 human-agent interaction drafts** exist versus **93 autonomous operations** and **120 A2A protocol** drafts. Agents are being designed to talk to each other at a 4:1 ratio over being designed to talk to humans. Emergency override protocols -- the "big red button" -- are almost entirely absent.
[draft-rosenberg-aiproto-cheq](https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/) (score 3.9) is a rare exception: it defines a protocol for human confirmation of agent decisions before execution. But CHEQ is opt-in and pre-execution. No draft defines what happens when a human needs to stop a running agent, constrain its behavior, or take over its task mid-execution.
### Multi-Agent Consensus (5 ideas)
When a group of agents disagree -- the diagnosis agent says the router is down, the monitoring agent says it is up, the optimization agent is rerouting traffic around it -- who arbitrates? No framework exists for agents to resolve conflicting assessments without human intervention.
### Dynamic Trust and Reputation (5 ideas)
Static certificates authenticate identity but cannot express "this agent has been reliable for 6 months" or "this agent's accuracy degraded last week." Long-running agent ecosystems need trust that is earned, tracked, and revocable. The current landscape relies entirely on binary trust: either an agent has a valid certificate or it does not.
### Cross-Domain Security Boundaries (10 ideas)
An agent authenticated in Company A's domain needs to perform a task in Company B's domain. Identity management exists -- the 108 identity/auth drafts cover this. What does not exist is trust *isolation*: preventing an agent authenticated for a narrow task from escalating privileges across domain boundaries.
### Agent Lifecycle Management (90 ideas)
Registration is covered. What happens after registration is not: versioning when an agent is updated, graceful retirement when an agent is decommissioned, migration when an agent moves between hosts, and dependency management when other agents rely on it.
## The Structural Problem
Here is the finding the Architect on our team surfaced that reframes the entire gap analysis:
**The severity of each gap correlates with the coordination difficulty required to fill it.**
The critical gaps (behavior verification, resource management, error recovery) require agreement across *multiple* IETF working groups. They cut across safety, networking, identity, and operations -- areas currently owned by separate teams that rarely collaborate. The high gaps (cross-protocol translation, human override, consensus) require even broader agreement: they need architects who see the whole ecosystem, not just their protocol.
Now look back at the team bloc analysis from Post 2. The 18 team blocs are *islands*. Cross-team collaboration is sparse. The strongest cross-bloc connection involves 3 shared drafts. The gaps that require the most cross-team work are being produced by an ecosystem that does the least cross-team work.
This is the structural explanation for the safety deficit. It is not that people do not care about safety. It is that safety standards require coordination across boundaries that the current authorship structure cannot bridge. Capability standards can be built within a single team. Safety standards cannot.
Our category co-occurrence analysis provides the concrete proof. Safety drafts are not entirely isolated -- they co-occur with 8 of 10 categories, coupling most strongly with policy and governance (**60% of safety drafts**, lift 2.3x) and identity/auth (**58%**, lift 1.7x). But the pattern is revealing: safety pairs with *governance* categories, not *implementation* categories.
Of the 136 drafts tagged as A2A protocols, only **12 (8.8%) also address safety**. Safety has **zero co-occurrence** with agent discovery/registration and **zero co-occurrence** with model serving/inference. Its weakest links are to the categories where agents actually *do* things: A2A protocols (12), ML traffic management (3), and autonomous network operations (4).
Safety is being discussed in governance papers. It is completely absent from discovery infrastructure and inference pipelines. It is barely present in the protocols that need it most. The traffic lights are not just behind the highways -- they are on a different road entirely.
## The 4:1 Ratio, Revisited
The safety deficit is not just a number. It is a structural property of how the IETF's AI agent community is organized.
| Category | Drafts | Team Blocs Active |
|----------|-------:|------------------:|
| A2A protocols | 120 | Many (distributed across blocs) |
| Autonomous operations | 93 | Primarily Huawei, Chinese telecom |
| Agent identity/auth | 108 | Ericsson, Nokia, ATHENA, multiple |
| **AI safety/alignment** | **44** | **Few; mostly independents/startups** |
| **Human-agent interaction** | **30** | **Rosenberg/White (2-person team)** |
The capability categories have organized teams behind them. The safety categories rely on individual contributors and small, unconnected teams. The best safety draft in the corpus (DAAP, score 4.8) comes from an independent author (Aylward). The best human-agent drafts come from a two-person Five9/Bitwave team. There is no 13-person safety bloc with 94% cohesion.
Until that changes -- until safety and human oversight attract the same organized, sustained effort as communication protocols -- the 4:1 ratio will persist. And the gaps will remain open.
---
### Key Takeaways
- **12 gaps** exist in the IETF's AI agent landscape: 3 critical, 6 high, 3 medium
- **The 3 critical gaps** all address failure modes: behavior verification, resource management, error recovery and rollback
- **Error recovery has only 6 ideas** from a single draft; **cross-protocol translation has zero** -- the starkest absences across 361 drafts
- **Gap severity correlates with coordination difficulty**: the hardest gaps require cross-team, cross-WG collaboration that the current island structure cannot produce
- **The safety deficit is structural, not attitudinal**: capability standards can be built by one team; safety standards require ecosystem-wide coordination that does not yet exist
*Next in this series: [Where 361 Drafts Converge (And Where They Don't)](05-1262-ideas.md) -- 96% of ideas appear in exactly one draft. The fragmentation goes all the way down.*
---
*Gap analysis based on 361 drafts, cross-referenced against real-world deployment requirements for autonomous AI agent systems. Data current as of March 2026.*
# Where 361 Drafts Converge (And Where They Don't)
*The fragmentation goes deeper than competing protocols. It extends all the way down to the idea level.*
---
We extracted 1,780 technical components from 361 Internet-Drafts -- mechanisms, architectures, protocols, and patterns. Then we asked: how many of these ideas does anyone else also propose?
The answer is devastating: **96% appear in exactly one draft.** Of 1,692 unique technical ideas in the corpus, only **75** show up in two or more drafts. Only **11** appear in three or more. The fragmentation documented in the previous posts -- 14 competing OAuth proposals, 120 A2A protocols with no interop layer -- is not just a protocol-level problem. It extends all the way down. At the idea level, the landscape is overwhelmingly a collection of islands.
But islands are not the whole story. Using fuzzy matching across organizational boundaries, we found **628 ideas** where different organizations are working on recognizably similar problems -- even when they use different names and different approaches. These cross-org convergence signals are the embryonic consensus of the agent standards landscape: the problems that different teams, in different countries, with different agendas, independently recognize and attempt to solve.
These convergence signals are more impressive than they first appear. Recall from Post 2 that **55% of all drafts have never been revised** beyond their first submission, and **65% of Huawei's drafts** are fire-and-forget. The ideas that converge across organizations are not the generic scaffolding of first-draft submissions -- they represent genuine engineering investment from teams that independently identified the same problem and committed resources to solving it.
The picture that emerges is paradoxical: the raw material for a complete agent ecosystem exists. The convergent ideas point toward the architecture the ecosystem needs. But they exist in isolation -- proposed by separate teams, embedded in separate drafts, with no connective tissue linking them into a coherent blueprint.
## The Taxonomy
Every extracted idea was classified by type. The distribution reveals what kind of thinking dominates the landscape:
| Type | Count | Share | What It Means |
|------|------:|------:|---------------|
| Mechanism | 663 | 37% | Concrete technical solutions: auth flows, routing algorithms, token formats |
| Architecture | 280 | 16% | System designs and reference models |
| Pattern | 251 | 14% | Reusable design approaches |
| Protocol | 228 | 13% | Full protocol specifications |
| Requirement | 171 | 10% | Formal requirement documents |
| Extension | 168 | 9% | Additions to existing standards (OAuth, SCIM, DNS) |
| Other | 19 | 1% | Frameworks, profiles, algorithms, schemas |
The dominance of **mechanisms** (663 of 1,780 extracted components) tells us the community is in building mode. These are not abstract position papers -- they are concrete, implementable solutions. The 228 protocols and 168 extensions to existing standards show that much of the work builds on established foundations (OAuth 2.0, SCIM, DNS, EDHOC) rather than starting from scratch.
The 280 architectures and 171 requirements suggest healthy standards development: teams are defining reference models before writing code. But the 251 patterns -- reusable approaches without full protocol specification -- indicate that many teams have identified what needs to be done without committing to how.
## Where Teams Converge
By exact title, only 75 ideas appear in multiple drafts. But ideas with different names often describe the same concept -- "Agent Gateway" in one draft and "Inter-Agent Communication Hub" in another. Our fuzzy-matching overlap analysis (using SequenceMatcher at 0.75 threshold) across organizational boundaries found **628 ideas** where 2+ distinct organizations are working on recognizably similar problems -- **43% of all unique idea clusters** have cross-org validation. These are the genuine consensus signals.
| Idea | Orgs | Drafts | Key Organizations |
|------|-----:|-------:|-------------------|
| A2A Communication Paradigm | 8 | 5 | CAICT, Deutsche Telekom, Huawei, Orange, Telefonica |
| AI Agent Network Architecture | 8 | 5 | China Mobile, Deutsche Telekom, Huawei, Orange, UnionPay |
| Multi-Agent Communication Protocol | 7 | 8 | AsiaInfo, BUPT, China Mobile, China Telecom, Huawei |
| AI Agent Communication Network (ACN) | 7 | 5 | ANP Open Source, China Mobile, Cisco, Five9, Huawei |
| NLIP (Natural Language Interchange) | 7 | 1 | Fordham, IBM, Purdue, ServiceNow, eBay |
| ELA Protocol | 6 | 6 | Bitwave, Cisco, Ericsson, Five9, Inria |
| AI Gateway | 6 | 4 | AsiaInfo, BUPT, China Telecom, Huawei, UnionPay |
| Agent Communication across WAN | 6 | 3 | China Mobile, China Unicom, Deutsche Telekom, Huawei, Orange |
The most-converged idea -- "A2A Communication Paradigm" -- draws independent contributions from **8 organizations across 5 countries**. This is simultaneously the strongest convergence signal and the strongest fragmentation signal. Eight organizations agree this is important. They are building separate, incompatible versions.
Look at who bridges the divide. In three of the top eight convergent ideas, the same names appear alongside Chinese institutions: **Deutsche Telekom, Telefonica, and Orange**. These European telecoms show up in "A2A Communication Paradigm," "AI Agent Network Architecture," and "Agent Communication across WAN" -- each time co-listed with Huawei, China Mobile, or China Unicom. Of the **180 ideas that cross the Chinese-Western organizational divide**, European telecoms are present on a disproportionate share. The organizations most likely to prevent the agent ecosystem from splitting into incompatible regional stacks are not Google or Microsoft -- they are European carriers operating in both markets. US Big Tech is almost entirely absent from cross-divide convergence.
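The fuzzy matching behind these tables is straightforward to reproduce: Python's `difflib.SequenceMatcher` at a 0.75 ratio threshold, compared only across organizational boundaries. A sketch with illustrative idea titles (the 0.75 threshold is the one from our analysis; the titles and orgs here are made up for the example):

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar(a, b, threshold=0.75):
    """Fuzzy title match used to cluster ideas across organizational boundaries."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

# (idea title, organization) pairs -- illustrative, not the real corpus
ideas = [
    ("Agent Gateway", "Huawei"),
    ("Inter-Agent Communication Hub", "Cisco"),
    ("AI Agent Gateway", "China Telecom"),
]

# Keep only pairs where different orgs propose recognizably similar ideas
cross_org = [
    (a, b)
    for (a, org_a), (b, org_b) in combinations(ideas, 2)
    if org_a != org_b and similar(a, b)
]
```

"Agent Gateway" and "AI Agent Gateway" clear the threshold; "Inter-Agent Communication Hub" does not -- which is also the method's known weakness: genuinely equivalent concepts with very different names slip through, so 628 is best read as a floor on cross-org convergence.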
The organization-pair overlaps reveal where real collaboration happens -- and where it does not:
| Org Pair | Shared Ideas | Signal |
|----------|-------------:|--------|
| China Unicom -- Huawei | 32 | Deep intra-bloc alignment |
| China Mobile -- Huawei | 27 | Deep intra-bloc alignment |
| Ericsson -- Inria | 21 | European cross-org collaboration |
| Tsinghua -- Zhongguancun Lab | 20 | Chinese academic convergence |
| Fraunhofer SIT -- Tradeverifyd | 10 | Verifiable records niche |
The pattern is stark: the highest-overlap pairs are Chinese institutions working within established blocs. Formal co-authorship between Chinese and Western organizations is thin -- but idea-level convergence, mediated by European telecoms operating in both markets, is broader than the co-authorship data suggests.
The convergence signals cluster in three areas:
**1. Agent communication infrastructure.** How agents discover, connect to, and message each other. This is the most active area with the most redundant proposals. The underlying need is clear; the implementation is contested.
**2. Authentication and authorization.** Action-based authorization, agent registration, cryptographic identity verification. OAuth extensions dominate, but the approaches diverge significantly between pure OAuth extension (add claims/scopes) and novel frameworks (DAAP accountability protocol, STAMP delegation proofs).
**3. Network architecture.** Agent gateways, agent communication networks, network management architectures. This is where the Chinese institutional ecosystem has the strongest presence, with Huawei and affiliated organizations producing most of the architecture ideas.
## Where Teams Innovate
The 96% of ideas appearing in only one draft are a mix: mostly generic components describing what each draft does ("Agent Gateway," "Transport Configuration System"), but scattered among them are genuinely novel proposals that no other team has attempted -- either because they are too new, too specialized, or ahead of their time.
Some standouts from the unique ideas:
**Verifiable Agent Behavior Attestation** (draft-birkholz-verifiable-agent-conversations) -- A CDDL-based format for cryptographically signing agent conversation records, enabling post-hoc verification of agent behavior. This directly addresses the critical behavior verification gap.
**ADOL: Agentic Data Optimization Layer** ([draft-chang-agent-token-efficient](https://datatracker.ietf.org/doc/draft-chang-agent-token-efficient/), score 4.5) -- Addresses token bloat in agent communication protocols. As agents exchange increasingly complex context, message sizes explode. ADOL compresses agent communications by 60-80%, a practical necessity that nobody else is working on.
**Working Memory** (draft-agent-gw) -- A structured context management system that maintains state across multi-step agent operations. Sounds basic -- but no other draft proposes a standard for how agents should manage persistent operational context.
**Autonomous Optical Network Operation** (draft-zhao-ccamp-actn-optical-network-agent) -- Applies agent architecture to the specific domain of optical network management. This is the kind of vertical specialization that validates the horizontal agent architecture work.
**Execution Context Token (ECT)** ([draft-nennemann-wimse-ect](https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/), score 4.0) -- A JWT extension that records what each task did, linked to predecessors via a DAG. This is arguably the single most architecturally significant idea in the corpus: it turns the execution history of a multi-agent workflow into a cryptographically verifiable directed acyclic graph. It is the technical foundation for accountability, rollback, audit, and provenance.
**CHEQ Protocol** ([draft-rosenberg-aiproto-cheq](https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/), score 3.9) -- Human confirmation of agent decisions before execution. The only concrete protocol proposal for human-in-the-loop agent oversight. In a landscape of 30 human-agent interaction drafts, CHEQ stands alone as an implementable solution.
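To make the ECT idea concrete: each execution record links to its predecessors, so the workflow history forms a verifiable DAG. A toy sketch -- the field names are our assumptions, and a plain hash stands in for the draft's JWT signatures:

```python
import hashlib
import json

def make_record(task, outcome, predecessors=()):
    """Build a hash-linked execution record; 'pred' edges form the DAG."""
    body = {
        "task": task,
        "outcome": outcome,
        "pred": sorted(p["id"] for p in predecessors),
    }
    body["id"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return body

def verify(record):
    """Recompute the hash to detect tampering with the execution history."""
    body = {k: v for k, v in record.items() if k != "id"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() == record["id"]

plan = make_record("plan-shipment", "ok")
dispatch = make_record("dispatch-items", "ok", predecessors=[plan])
```

Because each record's hash covers its predecessor list, altering any step invalidates every record downstream of it -- which is exactly the property that makes the DAG usable for audit, rollback, and provenance.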
## The Five Ideas That Matter Most
If you are building agent systems today and need to know which IETF proposals to watch, these five represent the highest combination of quality, novelty, and gap-filling potential:
| Idea | Draft | Score | Why It Matters |
|------|-------|------:|---------------|
| Execution Context Token | draft-nennemann-wimse-ect | 4.0 | DAG-based execution evidence; foundation for audit, rollback, and accountability |
| DAAP Accountability Protocol | draft-aylward-daap-v2 | 4.8 | Most comprehensive safety proposal; authentication + monitoring + enforcement |
| STAMP Delegation Proofs | draft-guy-bary-stamp-protocol | 4.6 | Cryptographic proof that an agent was authorized for a specific task |
| Agent Description Language (ADL) | draft-nederveld-adl | 4.1 | JSON standard for describing agent capabilities, tools, and permissions |
| Verifiable Conversations | draft-birkholz-verifiable-agent-conversations | 4.5 | Cryptographic signing of conversation records for auditability |
Together, these five ideas sketch the outline of the ecosystem architecture that Post 6 will describe in full: ECT provides the execution backbone, DAAP provides the accountability layer, STAMP proves delegation, ADL describes capabilities, and verifiable conversations create the audit trail.
## Mapping Ideas to Gaps
The most revealing analysis is mapping which ideas partially address which gaps:
| Gap | Severity | Ideas | Coverage |
|-----|----------|------:|----------|
| Resource Management | CRITICAL | 117 | Peripheral: ideas touch on task management but not resource contention |
| Behavior Verification | CRITICAL | 52 | Partial: attestation and monitoring ideas exist but no runtime enforcement |
| Error Recovery/Rollback | CRITICAL | 6 | Near-zero: 6 ideas from one draft (draft-yue-anima-agent-recovery-networks) |
| Cross-Protocol Translation | HIGH | 0 | Complete absence: zero ideas in the entire corpus |
| Lifecycle Management | HIGH | 90 | Partial: registration covered, retirement/versioning not |
| Human Override | HIGH | 4 | Near-zero: CHEQ exists but no emergency override protocol |
| Multi-Agent Consensus | HIGH | 5 | Minimal: no conflict resolution framework |
| Cross-Domain Security | HIGH | 10 | Partial: identity covered, isolation not |
| Dynamic Trust | HIGH | 5 | Minimal: trust scoring exists conceptually but not as protocol |
| Performance Monitoring | MEDIUM | 26 | Moderate: benchmarking ideas exist (draft-cui-nmrg-llm-benchmark) |
| Explainability | MEDIUM | 5 | Minimal: no decision-explanation protocol |
| Data Provenance | MEDIUM | 79 | Partial: data format ideas exist but no provenance chain standard |
The pattern is clear: the gaps with the highest idea counts (resource management at 117, lifecycle at 90, provenance at 79) are gaps where the *periphery* of existing work touches the problem. Teams building communication protocols think about resources; teams building discovery think about lifecycle. But nobody makes these the *central* problem.
The gaps with near-zero idea counts (error recovery at 6, human override at 4, consensus at 5, cross-protocol translation at 0) are the ones where no team is even circling the problem. These are true blind spots.
## The Ideas Nobody Had
Sometimes the absence is the finding. Here are technical ideas conspicuous in their absence from the entire corpus:
- **Agent capability degradation signaling**: No protocol for an agent to advertise that its performance has degraded (model drift, resource constraints, partial failure). Other agents continue relying on it at full trust.
- **Multi-agent transaction semantics**: No ACID-like guarantees for multi-agent workflows. If three agents must all succeed or all roll back, there is no two-phase commit equivalent.
- **Agent migration protocol**: No standard for moving a running agent from one host to another while preserving state and active connections. Critical for cloud deployments.
- **Privacy-preserving agent discovery**: No mechanism for an agent to find capabilities without revealing its intent. "I need a medical diagnosis agent" reveals sensitive information before any trust is established.
- **Agent cost and billing**: No standard for agents to negotiate compensation for services. Agents performing work for other agents have no way to express "this costs X" or "you have Y credits remaining."
Each of these absences represents an opportunity for new drafts that would fill genuine needs.
## What the Taxonomy Tells Builders
Three practical takeaways for anyone implementing agent systems:
**1. Build on the convergent ideas.** Agent registration, action-based authorization, and capability-based discovery appear across multiple teams and organizations. These represent genuine consensus about what the infrastructure needs, even if implementations diverge.
**2. Watch the single-source innovations.** The long tail of single-draft ideas contains the innovations that will differentiate the next generation of agent platforms. ECT, CHEQ, ADOL, and ADL are not widely known but represent some of the most thoughtful engineering in the corpus.
**3. Fill the blank spaces.** Error recovery, cross-protocol translation, and human override are the clearest opportunities for new contributions. The community has signaled these gaps matter (through the severity of the gap analysis) but has not yet produced the ideas to fill them.
---
### Key Takeaways
- **96% of ideas appear in exactly one draft** -- fragmentation extends all the way down to the idea level; only 75 of 1,692 unique ideas show cross-draft convergence
- **628 cross-org convergent ideas** (43% of unique clusters, via fuzzy matching) reveal where organizations independently agree; highest-overlap pairs are Chinese institutions (China Unicom-Huawei: 32 shared ideas)
- **The critical gaps remain unfilled**: error recovery has 6 ideas from one draft; cross-protocol translation has zero
- **Five ideas to watch**: ECT (execution DAG), DAAP (accountability), STAMP (delegation proof), ADL (agent description), verifiable conversations (audit trail)
- **Convergence clusters in three areas**: agent communication infrastructure, authentication/authorization, and network architecture
*Next in this series: [Drawing the Big Picture](06-big-picture.md) -- 628 cross-org convergent ideas, 12 gaps, and the architectural vision that connects them.*
---
*Idea extraction performed by Claude from full-text analysis of each draft. Classification into types (mechanism, architecture, protocol, pattern, extension, requirement) based on the technical content of each proposal. Data current as of March 2026.*

# Drawing the Big Picture: What the Agent Ecosystem Actually Needs
*361 drafts, 628 cross-org convergent ideas, 12 gaps -- and the architectural vision that connects them all.*
---
We have spent five posts documenting a paradox: the IETF's AI agent landscape has extraordinary breadth (361 drafts), deep fragmentation at every level (96% of ideas appear in only one draft, 120 competing A2A protocols, 14 OAuth proposals), concentrated authorship (18 team blocs, one company writing 18% of all drafts), and critical gaps (behavior verification, error recovery, human override) that nobody is filling.
The landscape has quantity. It lacks architecture.
This post is about what the architecture looks like -- not in theory, but derived from the data. The 12 gaps are not random absences; they are structurally related. The convergent ideas contain the components; they need a blueprint. And the blueprint already has a foundation: existing IETF work on workload identity (SPIFFE/WIMSE) and execution evidence (Execution Context Tokens) provides the lower layers. What is missing is what goes on top.
## What the Ecosystem Needs: Four Pillars
Our analysis -- synthesizing the gaps, the ideas, and the existing proposals -- points to four missing pillars:
### Pillar 1: DAG-Based Execution
**The gap it fills**: Error Recovery and Rollback (Critical), Resource Management (Critical)
Every multi-agent workflow is a directed acyclic graph: tasks with dependencies, checkpoints, and decision points. But no draft in the corpus defines "agent task graph" as a first-class construct. Without it, there is no way to:
- Know which tasks depend on which
- Place checkpoints for rollback
- Calculate the blast radius of a failure
- Schedule resources based on the graph structure
The Execution Context Token (ECT) from [draft-nennemann-wimse-ect](https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/) provides the evidence layer: each task produces a signed token linked to its predecessors via parent references, forming a verifiable DAG. What is missing is the orchestration semantics: when to checkpoint, how to roll back, how to contain cascading failures.
The data supports this: the 6 ideas addressing error recovery (all from [draft-yue-anima-agent-recovery-networks](https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/)) include "Task-Oriented Multi-Agent Recovery Framework" and "State Consistency Management" -- DAG concepts by another name. The 117 ideas touching resource management need a graph-aware scheduler. The answer is the same structure: a DAG execution model.
### Pillar 2: Human-in-the-Loop as First Class
**The gap it fills**: Human Override and Intervention (High), Agent Explainability (Medium)
Only **30 human-agent interaction drafts** exist, against **120 A2A protocol drafts** and **93 autonomous-operations drafts**. Agents are being designed to talk to each other, not to humans. The CHEQ protocol ([draft-rosenberg-aiproto-cheq](https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/)) is a rare exception -- it defines human confirmation *before* agent execution. But nobody has standardized what happens *during* execution: how a human pauses a running workflow, constrains an agent's scope, takes over a task, or issues an emergency stop.
Human-in-the-loop must be a node type in the execution DAG, not an afterthought. The architecture needs:
- **Approval gates**: DAG nodes that block until a human approves
- **Override commands**: Standardized signals to pause, constrain, stop, or take over
- **Escalation paths**: What happens when an override times out
- **Explainability tokens**: How an agent communicates its reasoning at a HITL point
The irony: every production deployment will require these primitives. The standards community is building autonomous capabilities while the deployment community is adding human oversight ad hoc.
### Pillar 3: Protocol-Agnostic Interoperability
**The gap it fills**: Cross-Protocol Translation (High, zero ideas), Agent Lifecycle Management (High)
The 120 A2A protocol drafts will never converge to a single winner. MCP, A2A Protocol, SLIM, and dozens of others will coexist, each with different strengths. The answer is not to pick one; it is to build a translation layer that lets agents using different protocols interoperate through gateways.
This gap has **zero ideas** in the current corpus -- the starkest absence across 361 drafts. No team is working on it. Yet it is perhaps the most important architectural piece: without protocol interoperability, the agent ecosystem fragments into vendor-locked silos.
The protocol binding layer would define:
- How agents advertise which ecosystem features they support
- How gateways translate between protocols while preserving execution semantics (the DAG, the HITL points)
- How agents version and retire gracefully without breaking dependents
- The minimal semantic contract: intent, result, error -- expressible in any protocol
### Pillar 4: Assurance Profiles (Dual Regime)
**The gap it fills**: Behavior Verification (Critical), Cross-Domain Security (High), Dynamic Trust (High), Data Provenance (Medium)
The same agent ecosystem must work in two regimes:
**Relaxed** (development, internal tools, low-risk): Best-effort, optional audit, minimal proof overhead. Think Kubernetes-deployed internal agents.
**Regulated** (finance, healthcare, critical infrastructure): Cryptographic attestation per task, provenance chains, behavior verification against declared specifications, mandatory audit ledger. Think medical or financial agents.
The architecture achieves this with *assurance profiles* -- named configurations that dial up or down the proof requirements. The same DAG, same HITL points, same protocol bindings. Different levels of evidence:
| Level | Evidence | Use Case |
|-------|----------|----------|
| L0 | None (best-effort) | Development, testing |
| L1 | Unsigned audit trail | Internal production |
| L2 | Signed ECTs (JWT) | Cross-org, standard compliance |
| L3 | Signed ECTs + external audit ledger | Regulated industries |
This dual-regime approach resolves the tension between "move fast" deployments and "prove everything" regulated environments. The 52 ideas touching behavior verification and the 79 ideas touching data provenance become implementable at higher assurance levels without imposing their cost on every deployment.
## How It Builds on What Exists
A critical point: this architecture does not compete with existing work. It layers on top of it. Our cross-reference analysis confirms the foundations are strong: **TLS 1.3** (RFC 8446, cited by 42 drafts), **OAuth 2.0** (RFC 6749, 36 drafts), **HTTP Semantics** (RFC 9110, 34 drafts), **JWT** (RFC 7519, 22 drafts), and **COSE** (RFC 9052, 20 drafts) form the bedrock.
But the bedrock is not uniform. Our RFC foundation analysis (Post 3) revealed that the Chinese and Western blocs build on **fundamentally different technology stacks**: YANG/NETCONF for network management on one side, COSE/CBOR/CoAP for IoT security on the other. The only shared foundation is OAuth 2.0. This means the architecture layer above must be genuinely protocol-agnostic -- it cannot assume either stack as the default. The four pillars are designed with this constraint: the DAG model, HITL primitives, and assurance profiles are expressed in terms of abstract semantics, not specific wire formats. The protocol binding layer (Pillar 3) exists precisely because the underlying plumbing diverges.
The architecture adds connective tissue above this layer, not below it:
| Layer | Existing Work | What We Add |
|-------|---------------|-------------|
| **Identity** | SPIFFE (workload identifier), WIMSE (security context propagation) | Nothing -- use existing identity |
| **Evidence** | ECT (execution context tokens, DAG linking) | Orchestration semantics, checkpoint/rollback, HITL nodes |
| **Auth** | OAuth 2.0, SCIM, DAAP, STAMP, Agentic JWT | Protocol binding so any auth approach works |
| **Communication** | MCP, A2A, SLIM, 120 other protocols | Translation layer and capability advertisement |
| **Safety** | DAAP (accountability), verifiable conversations, VERA (zero-trust) | Assurance profiles connecting these into deployable configurations |
The proposed five-draft ecosystem:
1. **Agent Ecosystem Model (AEM)** -- Architecture and terminology. The shared vocabulary so everyone speaks the same language.
2. **Agent Task DAG (ATD)** -- Execution semantics, checkpoints, rollback. How the DAG works.
3. **Human-in-the-Loop (HITL) Primitives** -- Approval gates, overrides, escalation. How humans participate.
4. **Agent Ecosystem Protocol Binding (AEPB)** -- Protocol translation, capability discovery, lifecycle management. How interoperability works.
5. **Assurance Profiles (APAE)** -- Behavior verification, dynamic trust, provenance. How you prove it all works.
Each draft addresses specific gaps. Together, they provide the connective tissue the landscape lacks.
## Traction vs. Aspiration
A reality check: of the 361 drafts, only **36 (10%)** have been adopted by IETF working groups. The rest are individual submissions -- proposals without institutional backing. The WG-adopted drafts score higher on average (**3.54 vs. 3.31**), particularly on maturity (+1.28) and momentum (+0.98), but lower on novelty (-0.45). The WGs that have adopted the most agent-relevant drafts are security-focused: **lamps** (6 drafts), **lake** (5), **tls** (3), **emu** (3). Agent-specific WGs like `aipref` have adopted only 2 drafts.
This reveals a structural insight: the IETF is not building agent standards from scratch. It is **retrofitting security standards for agents**. The agent architecture we propose above would need to work within this reality -- building on the security WGs' infrastructure rather than competing with it.
## Predictions
Based on the data trajectories and current momentum:
**Within 6 months**: The OAuth-for-agents fragmentation will partially resolve. Working groups will adopt 2-3 canonical approaches (likely DAAP/STAMP for accountability and one of the RAR extensions for basic auth). The other 10 proposals will fade or merge.
**Within 12 months**: The DMSC side meeting's gateway work will produce a specification, likely gateway-centric with Agent Gateways as the primary interoperability mechanism. This is not the protocol-agnostic translation layer the ecosystem needs, but it will be the first concrete interop proposal.
**Within 18 months**: The safety deficit will begin to close -- not from IETF drafts but from regulatory pressure. The EU AI Act's requirements for high-risk AI systems will drive demand for behavior verification, human override, and audit standards. The IETF will respond reactively.
**The risk**: If the architecture work does not happen in the next 12 months, the agent ecosystem will calcify around vendor-specific protocol stacks (OpenAI's, Google's, Anthropic's, Huawei's). Each will have its own auth, discovery, and communication layer. The interoperability window will close, and the IETF's work will be standards for islands rather than standards for the internet.
### Two Equilibria
By 2028, the landscape will have resolved into one of two stable states.
In the **first equilibrium**, it looks like today's microservices ecosystem: a chaotic but functional collection of protocols, libraries, and frameworks, held together by platform-specific integrations and de facto standards from the largest cloud providers. The IETF's work exists but is incomplete. The real interoperability happens at higher layers -- agent frameworks like LangChain, Semantic Kernel, or their successors. Safety is bolted on after deployment.
In the **second equilibrium**, it looks more like the web: a layered architecture where identity (like TLS), communication (like HTTP), and semantics (like HTML) are cleanly separated, with standardized interfaces between them. Agents identify via WIMSE, execute via ECT-based DAGs, communicate via protocol-agnostic bindings, and operate under assurance profiles that scale from development to regulated production. Safety is built in, not bolted on.
The 4:1 ratio is the leading indicator. If it narrows -- if safety and oversight work accelerates to match capability work -- the second equilibrium becomes achievable. If it stays at 4:1 or widens, the first equilibrium is where we land, and safety becomes remediation rather than prevention.
## What Builders Should Do Today
If you are building agent systems and cannot wait for standards to mature:
**1. Watch these drafts**: ECT (execution evidence), DAAP (accountability), CHEQ (human confirmation), ADL (agent description), ANS (agent discovery). These have the highest combination of quality, novelty, and adoption potential.
**2. Design for the DAG**: Structure your multi-agent workflows as directed acyclic graphs with explicit dependencies and checkpoints. Even without a standard, the pattern will be compatible with whatever emerges.
**3. Build HITL from the start**: Every production agent deployment needs human override capability. Do not add it later. Design approval gates, emergency stops, and escalation paths into your architecture now.
**4. Implement assurance as a dial**: Make your proof/audit level configurable. Start at L0 for development, L1 for production, and be ready to turn up to L2/L3 when regulation arrives.
**5. Avoid protocol lock-in**: If you build on MCP today, architect for the possibility of supporting A2A or SLIM tomorrow. The protocol war is not over, and the winner may be "all of them via translation."
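The "assurance as a dial" advice above can be made concrete with a small sketch. This is hypothetical code, not from any draft or implementation: the level names mirror the L0-L3 table earlier in this post, and the obligation strings are invented placeholders for whatever evidence mechanism (audit records, signed ECTs, ledger anchors) a real system would use.

```python
from enum import IntEnum

class Assurance(IntEnum):
    """Hypothetical assurance levels mirroring the L0-L3 profiles."""
    L0 = 0   # best-effort, no evidence (development, testing)
    L1 = 1   # unsigned audit trail (internal production)
    L2 = 2   # signed execution tokens (cross-org)
    L3 = 3   # signed tokens + external audit ledger (regulated)

def evidence_for(level):
    """Return the evidence obligations a task must satisfy at a level.
    Obligations accumulate: each level includes everything below it."""
    obligations = []
    if level >= Assurance.L1:
        obligations.append("audit-record")
    if level >= Assurance.L2:
        obligations.append("signed-ect")
    if level >= Assurance.L3:
        obligations.append("ledger-anchor")
    return obligations
```

The point of the enum is that the rest of the system never branches on "regulated vs. relaxed" directly; it asks `evidence_for(profile)` and attaches whatever comes back, so turning the dial up when regulation arrives is a configuration change, not a redesign.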
## The Thesis
Across six posts, we have built to one argument:
**The IETF's AI agent standardization effort is the largest, fastest-growing, and most consequential standards race in a decade. But it is building the highways before the traffic lights.** The data shows explosive growth (from 0.5% to 9.3% of all IETF submissions in 15 months), deep fragmentation (120 competing A2A protocols), concerning concentration (one company writes 18% of all drafts), and a structural safety deficit (4:1 capability to guardrails). What is missing is not more protocols -- it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles that work from development to regulated production.
The 75 convergent ideas -- and the broader set of 628 cross-org overlaps -- contain the components for this architecture. The question is whether the community can assemble them before the protocols ship without it. The convergence data suggests it is possible: **180 ideas already cross the Chinese-Western divide**, mediated largely by European telecoms (Deutsche Telekom, Telefonica, Orange) that operate in both markets and appear on both sides of nearly every major cross-cultural convergent idea. The bridge-builders exist. They need an architecture to bridge to.
The IETF has built the internet's infrastructure before. DNS, HTTP, TLS -- each emerged from periods of competing proposals, fragmentation, and coordinated resolution. The AI agent standards race is following the same pattern, on a compressed timeline, with higher stakes.
The traffic lights need to catch up to the highways. The data says they can -- if someone draws the big picture.
---
### Key Takeaways
- **Four missing pillars**: DAG-based execution, human-in-the-loop primitives, protocol-agnostic interoperability, and assurance profiles for dual-regime deployment
- **The architecture builds on existing work**: SPIFFE for identity, WIMSE for security context, ECT for execution evidence -- the foundation exists
- **Five proposed drafts** (AEM, ATD, HITL, AEPB, APAE) would fill the 12 gaps by providing connective tissue between existing protocol proposals
- **The interoperability window is closing**: vendor-specific agent stacks are forming; the next 12 months are critical for open standards
- **For builders today**: design for DAGs, build HITL from the start, make assurance configurable, avoid protocol lock-in
*Next in this series: [How We Built This](07-how-we-built-this.md) -- the methodology behind analyzing 361 IETF drafts with Claude, Ollama, and Python.*
---
*Synthesis based on the full IETF Draft Analyzer dataset: 361 drafts, 557 authors, 75 cross-draft convergent ideas (628 via fuzzy matching), 12 gaps, 18 team blocs, 42 overlap clusters. Data current as of March 2026.*

# How We Built This: Analyzing 361 IETF Drafts with Claude and Ollama
*The engineering behind the analysis -- a Python CLI, two LLMs, one SQLite database, and ~$9.*
---
Every claim in this series -- the 4:1 safety ratio, the 14 competing OAuth proposals, the 18 team blocs, the 12 gaps, the 180 ideas crossing the Chinese-Western divide -- comes from an automated analysis pipeline we built in Python. This post describes how it works, what it costs, what it found that surprised us, and what we learned about LLM-powered document analysis at scale.
The tool is open source. If you want to run it on a different corner of the IETF -- or adapt it for another standards body -- everything you need is in the repository.
## The Pipeline
The analysis runs in six core stages. Each builds on the previous, and every stage caches its work so re-runs are fast and cheap.
```
fetch --> analyze --> embed --> ideas --> gaps --> report
| | | | | |
v v v v v v
Datatracker Claude Ollama Claude Claude Markdown
API Sonnet nomic-embed Haiku Sonnet + rich
```
Three additional analysis passes run on top of the core pipeline:
```
refs --> trends --> idea-overlap --> status
| | | |
v v v v
Regex SQL query SequenceMatcher Naming convention
(local) (local) (local) (local)
```
These secondary passes cost nothing -- they operate entirely on data already in the database.
### Stage 1: Fetch
The Datatracker API (`https://datatracker.ietf.org/api/v1/doc/document/`) provides structured metadata for every Internet-Draft: name, title, abstract, authors, revision, submission date, working group, and current status. Full text is available at `https://www.ietf.org/archive/id/{name}-{rev}.txt`.
We search for drafts matching 12 keywords: `agent`, `ai-agent`, `llm`, `autonomous`, `machine-learning`, `artificial-intelligence`, `mcp`, `agentic`, `inference`, `generative`, `intelligent`, `aipref`. Both `name__contains` and `abstract__contains` filters are used to cast a wide net. We started with 6 keywords and 260 drafts; adding 6 more captured 101 new drafts in categories we were missing -- MCP-related work, generative AI infrastructure, intelligent networking, and the nascent `aipref` working group.
**Gotchas learned the hard way**: The Datatracker API uses `type__slug=draft` (not `type=draft`) to filter to drafts. Pagination requires tracking `meta.next` through the response chain. Affiliation data comes from the `documentauthor` record, not the `person` record. We add a 0.5-second polite delay between requests.
The result: **361 drafts** fetched, with full metadata and text stored in SQLite.
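The pagination pattern described above can be sketched as follows. This is an illustrative reconstruction, not the tool's actual code: the page-fetching function is injected (in production it would wrap `requests.get(...).json()` with the `type__slug=draft` filter), and only the `meta.next` traversal and polite delay are shown.

```python
import time

BASE = "https://datatracker.ietf.org"

def collect_all(fetch_page, first_url, delay=0.0):
    """Follow Datatracker-style pagination: each page's meta.next holds
    the relative URL of the next page, or None when there are no more."""
    results, url = [], first_url
    while url:
        page = fetch_page(url)            # returns the decoded JSON dict
        results.extend(page["objects"])
        nxt = page["meta"].get("next")
        url = BASE + nxt if nxt else None
        if url and delay:
            time.sleep(delay)             # polite delay between requests
    return results
```

Injecting `fetch_page` keeps the traversal logic testable without touching the network, which matters when a fetch of 361 drafts takes several minutes with the polite delay.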
### Stage 2: Analyze
Each draft is sent to Claude Sonnet with a compact structured prompt that includes the draft name, title, date, page count, and abstract. The prompt asks for:
- **Category classification** (one or more of 11 categories: A2A protocols, agent identity/auth, autonomous netops, data formats/interop, agent discovery/reg, human-agent interaction, AI safety/alignment, ML traffic management, policy/governance, model serving/inference, other)
- **Quality rating** on five dimensions (novelty, maturity, overlap, momentum, relevance) each scored 1-5
- **Brief summary** of what the draft does and why it matters
The key optimization: **caching**. Every Claude API call is stored in an `llm_cache` table keyed by the SHA-256 hash of the full prompt. If the same draft is analyzed twice, the second call is free and instant. This makes the pipeline idempotent -- you can re-run any stage without wasting money.
We initially sent full draft text to Claude, but switched to abstract-only analysis after testing showed that abstracts produce equivalent ratings at roughly 10x lower token cost. Full text is still used for idea extraction (Stage 4), where granular detail matters.
**Cost**: About $3.16 for the initial 260 drafts on Claude Sonnet (376K input tokens, 200K output tokens). With the `--cheap` flag, analysis uses Claude Haiku instead, cutting costs roughly 10x.
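The cache-by-prompt-hash pattern is simple enough to show in full. A minimal sketch, assuming a `llm_cache` table with `prompt_hash` and `response` columns (the real schema may differ):

```python
import hashlib
import sqlite3

def cached_call(conn, prompt, call_model):
    """Return a cached LLM response if this exact prompt was seen
    before; otherwise call the model and store the result."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE prompt_hash = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]                    # cache hit: free and instant
    response = call_model(prompt)
    conn.execute(
        "INSERT INTO llm_cache (prompt_hash, response) VALUES (?, ?)",
        (key, response),
    )
    conn.commit()
    return response
```

Because the key hashes the *full* prompt, any change to the prompt template automatically invalidates the cache for that stage, while unchanged stages stay free to re-run.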
### Stage 3: Embed
For similarity analysis, we generate vector embeddings using Ollama running locally with the `nomic-embed-text` model. Each draft's abstract is embedded into a 768-dimensional vector, stored as raw bytes in the database.
**Why not Claude for embeddings?** Cost and speed. Ollama runs locally, is free, and processes all 361 drafts in under a minute. The embeddings are used for approximate similarity (cosine distance), overlap detection, and t-SNE visualization -- tasks where a small local model is perfectly adequate.
The embeddings enable:
- **Overlap clusters**: Draft pairs with >0.85 cosine similarity grouped together
- **Near-duplicate detection**: 25+ pairs with >0.98 similarity flagged as potential duplicates
- **Interactive t-SNE landscape**: 2D visualization of the entire draft space, color-coded by category
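Storing vectors as raw bytes and comparing them by cosine distance needs no vector database. A sketch of the round-trip and the similarity measure (float32 packing is an assumption about the storage format; the tool's exact encoding may differ):

```python
import math
import struct

def to_blob(vec):
    """Pack a float vector into raw bytes for a SQLite BLOB column."""
    return struct.pack(f"{len(vec)}f", *vec)

def from_blob(blob):
    """Unpack a BLOB back into a list of floats (4 bytes per float32)."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def cosine(a, b):
    """Cosine similarity, used for the >0.85 overlap threshold."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)
```

With 361 vectors of 768 dimensions, the all-pairs comparison (~65,000 pairs) runs in seconds even in pure Python, which is why no approximate-nearest-neighbor index is needed at this scale.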
### Stage 4: Ideas
The most expensive stage. Each draft's full text is analyzed by Claude to extract discrete technical ideas -- mechanisms, architectures, protocols, patterns, extensions, and requirements.
**Batch optimization**: Rather than calling Claude once per draft, we batch 5 drafts per API call using Claude Haiku (`--cheap --batch 5`). This cuts the number of API calls by 5x and uses the cheaper model. The batch prompt includes all 5 drafts' texts and asks for ideas from each, reducing per-idea cost to fractions of a cent.
**Result**: **1,780 technical components** extracted from 361 drafts (averaging ~5 per draft). Of 1,692 unique titles, **96% appear in exactly one draft** -- most are draft-specific component descriptions ("Agent Gateway," "Transport Configuration System"), not standalone innovations. Only **75 ideas** show genuine cross-draft convergence (appearing in 2+ drafts), and only **11** appear in 3+ drafts. The real signal comes from the cross-org overlap analysis (idea-overlap feature), which uses fuzzy matching to identify **628 ideas** where 2+ organizations work on recognizably similar problems -- 43% of all unique idea clusters.
### Stage 5: Gaps
The gap analysis is a synthesis step. We send Claude Sonnet the full landscape context -- category distributions, idea taxonomy, safety ratio, overlap patterns -- and ask it to identify areas where standardization work is missing or inadequate.
This is the one stage where the LLM is doing genuine reasoning, not just extraction. The prompt provides the data; Claude identifies the structural gaps. We validate its findings against the raw data (e.g., confirming that only 6 ideas address error recovery, or that cross-protocol translation has zero ideas).
**Result**: **12 gaps** identified (3 critical, 6 high, 3 medium), each cross-referenced with related drafts and ideas.
### Stage 6: Report
Reports are generated in Markdown with embedded data tables. Thirteen report types are available: overview, landscape, digest, timeline, overlap-matrix, overlap-clusters, authors, ideas, gaps, refs, trends, idea-overlap, and status. The `rich` library provides formatted terminal output for CLI commands.
## The Database
The SQLite database is the real product. At **28 MB**, it contains everything needed to reproduce any finding in this series.
| Table | Rows | Purpose |
|-------|-----:|---------|
| drafts | 361 | Full metadata + text for every draft |
| ratings | 361 | 5-dimension quality scores + summaries |
| embeddings | 361 | 768-dim vectors as binary blobs |
| ideas | 1,780 | Extracted technical components with types |
| authors | 557 | Person records from Datatracker |
| draft_authors | 1,057 | Author-to-draft linkage with affiliation |
| draft_refs | 4,231 | RFC/draft/BCP cross-references |
| gaps | 12 | Identified standardization gaps |
| llm_cache | 703 | Cached Claude API responses |
FTS5 full-text search is enabled on drafts, supporting queries like `ietf search "agent authentication"` that return ranked results in milliseconds. Indexes on `draft_refs(ref_type, ref_id)` and `ideas(draft_name)` keep query performance fast even for cross-table joins.
The database design follows a principle: **store raw data, compute derived data**. The drafts table stores full text; the ratings, ideas, and refs tables store analysis results. Any analysis can be re-run without re-fetching from the Datatracker API.
## The Author Network
The author analysis deserves special mention because it revealed the team bloc pattern -- one of the most important findings in the series.
The IETF Datatracker provides author information via two API endpoints:
- `/api/v1/doc/documentauthor/?document__name=X` -- returns author links per draft
- `/api/v1/person/person/{id}/` -- returns person details (name, affiliation)
We fetch all authors for all drafts, build a co-authorship graph, and detect team blocs: groups where every pair of members shares at least 70% of their drafts. This threshold was chosen empirically -- lower thresholds produce too many loose groups; higher thresholds miss real teams.
The detection algorithm:
1. For each pair of authors, calculate pairwise overlap = |shared drafts| / min(|A's drafts|, |B's drafts|)
2. Build a graph where edges represent pairs with >= 70% overlap and >= 2 shared drafts
3. Find connected components in this graph
4. Each component is a team bloc
**Organization normalization** turned out to be essential. "Huawei Technologies", "Huawei Technologies Co., Ltd.", and "Huawei Canada" all need to resolve to "Huawei". We maintain a hand-curated alias table of 40+ mappings plus automatic suffix stripping for common patterns (", Inc.", " LLC", " AB", etc.). Without this, cross-org analysis would fragment the same company into multiple entities.
**Result**: **18 team blocs** detected among 557 authors. The largest: a 13-person Huawei team with 22 shared drafts and 94% average cohesion.
## The New Features
Four features were added during the analysis session, each unlocking a deeper analytical layer. All four run locally with zero API cost.
### RFC Cross-References (`ietf refs`)
**What it does**: Parses all 361 drafts for RFC references using regex (`RFC\s*\d{4,}`, `\[RFC\d+\]`, `BCP\s*\d+`, `draft-[\w-]+`). Stores results in a `draft_refs` table for querying.
**What it found**: **4,231 cross-references** (2,443 RFC, 698 draft, 1,090 BCP) across 360 drafts with text. The most-referenced standards reveal what the agent ecosystem builds on:
| RFC | References | What It Is |
|-----|-----------:|------------|
| RFC 2119 | 285 | MUST/SHALL/MAY conventions |
| RFC 8174 | 237 | Key words update |
| RFC 8446 | 42 | TLS 1.3 |
| RFC 6749 | 36 | OAuth 2.0 |
| RFC 9110 | 34 | HTTP Semantics |
| RFC 8259 | 26 | JSON |
| RFC 5280 | 22 | X.509 Certificates |
| RFC 7519 | 22 | JWT |
| RFC 9052 | 20 | COSE |
**The insight**: Strip away RFC 2119/8174 (boilerplate conventions that every IETF draft references) and the picture is clear: the agent ecosystem is built on **OAuth + TLS + HTTP + JWT**. It is a security and identity infrastructure, not a networking infrastructure. The IETF's agent standards are being constructed on the same foundation as the web itself. This reframes the entire landscape: agent standards are not something new. They are the next layer on top of the web's existing security architecture.
### Category Trends (`ietf trends`)
**What it does**: Monthly breakdown of new drafts per category with growth rates, comparing recent periods to earlier ones.
**What it found**: The growth curve is a step function. Monthly submissions went from 2 (Jun 2025) to 67 (Oct 2025) to 86 (Feb 2026). A2A protocols are still accelerating (26 in Oct/Nov 2025, 36 in Feb 2026). Safety/alignment is growing but slower (5 in Oct 2025, 12 in Feb 2026). The 4:1 ratio is narrowing, but not fast enough.
### Cross-Org Idea Overlap (`ietf idea-overlap`)
**What it does**: Groups similar ideas using `SequenceMatcher` (threshold 0.75), then checks which ideas span drafts from multiple organizations. This separates genuine cross-org consensus from intra-team duplication.
**What it found**: By exact title, only 75 of 1,692 unique ideas appear in 2+ drafts -- 96% are islands. But fuzzy matching reveals **628 ideas** where 2+ organizations work on recognizably similar problems (43% of unique clusters). The top convergence signal -- "A2A Communication Paradigm" -- spans **8 organizations from 4 countries**. The deeper finding: **180 ideas cross the Chinese-Western organizational divide**. European telecoms (Deutsche Telekom, Telefonica, Orange) act as bridges between Chinese institutions and Western companies. US Big Tech (Google, Apple, Amazon) is almost entirely absent from cross-divide collaboration.
### WG Adoption Status (`ietf status`)
**What it does**: Determines which drafts have been formally adopted by IETF Working Groups based on the `draft-ietf-{wg}-*` naming convention. Compares scores, categories, and gap coverage between WG-adopted and individual drafts.
**What it found**: Only **36 of 361 drafts (10%)** are WG-adopted. The remaining 90% are individual submissions -- ideas seeking institutional backing. WG-adopted drafts score slightly higher on average (**3.54 vs 3.31**), validating our rating methodology.
The most revealing finding: **19 of 36 WG-adopted drafts are in security Working Groups** (lamps, lake, tls, emu, ace). The agent-focused `aipref` WG has only 2 adopted drafts. The IETF is not building agent standards in agent-focused groups -- it is retrofitting its existing security infrastructure for agent use cases. The standards that will actually govern AI agents on the internet are being written by the same people who write TLS and OAuth, not by new agent-specific working groups.
## What We Learned
### LLMs are good at structured extraction
Claude's strength in this pipeline is turning unstructured technical documents into structured data: categories, ratings, ideas, gaps. The extraction quality is high -- we spot-checked 50 drafts and found categorization and idea extraction accurate in ~90% of cases. The errors tend to be over-categorization (assigning too many categories) rather than miscategorization.
### LLMs need validation for synthesis
The gap analysis (Stage 5) required the most human oversight. Claude correctly identified the gaps, but the severity rankings and the "zero ideas" claims needed manual verification against the raw data. LLMs can synthesize, but the synthesis should be treated as a hypothesis, not a conclusion.
### Caching changes the economics
The `llm_cache` table transforms the cost model. The first run costs ~$3. Every subsequent run -- adding new drafts, re-running with different prompts, regenerating reports -- costs only for new work. Over the project's life, we estimate caching saved $30+ in redundant API calls. The cache key is a SHA-256 hash of the full prompt, so a prompt that changes by even one character gets a fresh entry, while identical prompts always hit the cache.
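A minimal sketch of the caching pattern, assuming a two-column key/response schema -- the real `llm_cache` table exists, but its exact columns and the helper names below are our illustration:

```python
import hashlib
import sqlite3

def cache_key(prompt):
    # SHA-256 of the full prompt: identical prompts always map to the same key.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE llm_cache (key TEXT PRIMARY KEY, response TEXT)")

api_calls = []  # track how often the (fake) model is actually invoked

def cached_call(prompt, call_model):
    key = cache_key(prompt)
    row = conn.execute("SELECT response FROM llm_cache WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]                      # cache hit: zero API cost
    response = call_model(prompt)          # cache miss: pay once
    conn.execute("INSERT INTO llm_cache VALUES (?, ?)", (key, response))
    return response

def fake_model(prompt):
    api_calls.append(prompt)
    return "analysis:" + cache_key(prompt)[:8]

first = cached_call("Rate this draft on 5 dimensions ...", fake_model)
second = cached_call("Rate this draft on 5 dimensions ...", fake_model)  # free
# first == second, and the model was invoked exactly once
```

This is what makes the pipeline idempotent: re-running any stage only pays for work the cache has not seen.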
### Hybrid models work
Using Claude Sonnet for reasoning-heavy tasks (analysis, gap synthesis) and Claude Haiku for extraction-heavy tasks (idea extraction, batch processing) cut costs by 5-10x without meaningful quality loss. Using Ollama for embeddings made similarity analysis free and fast. The principle: match the model's capability to the task's difficulty.
### The free analyses are the most revealing
The four features that cost zero API dollars -- regex-based RFC parsing, SQL-based trend analysis, SequenceMatcher-based idea dedup, and naming-convention-based WG detection -- produced some of the most narratively important findings in the entire series. The OAuth-stack-as-foundation insight from RFC cross-references. The 180 cross-divide ideas. The 10% WG adoption rate. The security-WG-not-agent-WG finding. None of these required an LLM. They required a well-structured database and the right questions.
### The database is the product
The most valuable output is not any single report -- it is the SQLite database. With all drafts analyzed, ideas extracted, authors mapped, refs parsed, and embeddings stored, the database supports ad-hoc queries that no pre-built report can anticipate. The blog series was written primarily by querying the database, not by re-running the pipeline.
## Cost Summary
| Stage | Model | Drafts | Cost |
|-------|-------|-------:|-----:|
| Analyze (initial run) | Claude Sonnet | 260 | ~$2.50 |
| Analyze (expansion run) | Claude Sonnet | 101 | ~$5.50 |
| Ideas | Claude Haiku (batch 5) | 361 | ~$0.80 |
| Gaps | Claude Sonnet | 1 call | ~$0.20 |
| Embed | Ollama (local) | 361 | $0.00 |
| Refs | Regex (local) | 361 | $0.00 |
| Trends | SQL (local) | 361 | $0.00 |
| Idea-overlap | SequenceMatcher (local) | 1,780 ideas | $0.00 |
| WG Status | Naming convention | 361 | $0.00 |
| **Total** | | | **~$9** |
For context: analyzing 361 IETF drafts -- fetching full text, rating quality on 5 dimensions, extracting ~1,700 technical components, detecting 12 gaps, mapping 557 authors, parsing 4,231 cross-references, and identifying 18 team blocs -- cost less than two large coffees.
## The Tech Stack
- **Python 3.11+** with **Click** for the CLI
- **SQLite** with **FTS5** for full-text search
- **httpx** for HTTP requests (Datatracker API)
- **anthropic** SDK for Claude API
- **ollama** for local embeddings
- **rich** for terminal formatting
- **numpy** for cosine similarity and matrix operations
43 CLI commands, 13+ interactive visualizations (HTML/PNG), 15 report types. Total codebase: approximately 6,100 lines of Python across 12 modules.
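The numpy piece of the stack reduces to cosine similarity over normalized embedding vectors. A sketch of the pattern (the function name and toy vectors are illustrative, not the project's code):

```python
import numpy as np

def similarity_matrix(embeddings):
    """Pairwise cosine similarity; each row of `embeddings` is a draft vector."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

vecs = np.array([
    [1.0, 0.0],   # draft A
    [0.0, 1.0],   # draft B, orthogonal to A
    [1.0, 1.0],   # draft C, halfway between
])
sim = similarity_matrix(vecs)
# sim[0, 1] == 0.0; sim[0, 2] is about 0.707; the diagonal is 1.0
```

Normalizing once and taking a single matrix product gives all pairwise similarities at once -- for 361 drafts that is a 361×361 matrix computed in milliseconds.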
---
### Key Takeaways
- **The full analysis cost ~$9** -- LLM-powered document analysis at scale is practical and cheap with proper caching and model selection
- **Caching is essential**: SHA-256 hashed prompt caching makes the pipeline idempotent and dramatically reduces costs on re-runs
- **Hybrid LLM strategy**: Claude Sonnet for reasoning, Claude Haiku for extraction (10x cheaper), Ollama for embeddings (free) -- match model capability to task difficulty
- **The zero-cost analyses were the most revealing**: RFC cross-references, idea overlap, WG adoption, and trend analysis all run locally and produced the series' most important structural findings
- **The database is the product**: a well-structured SQLite DB supports queries no pre-built report anticipates; the blog series was written by querying, not re-running
*Next in this series: [Agents Building the Agent Analysis](08-agents-building-the-analysis.md) -- we used a team of AI agents to produce this series. The irony is the point.*
---
*The IETF Draft Analyzer is open source. The codebase, database, and all reports are available in the project repository.*

# Agents Building the Agent Analysis
*We used a team of AI agents to analyze, write about, and draw conclusions from 361 IETF drafts on AI agents. Here is what that looked like from the inside.*
---
There is an irony we should address up front: this entire blog series -- analyzing 361 Internet-Drafts about how AI agents should work -- was itself produced by a team of AI agents. Four Claude instances, each with a distinct role, reading the same data, building on each other's output, and coordinating through a shared task system and development journal.
This post is the story of that process: what worked, what surprised us, and what it reveals about the state of AI agent coordination in practice -- which, as it happens, is exactly the problem the IETF drafts are trying to solve.
## The Team
We designed a four-agent team, each with a one-page definition file and a shared 3,000-word team brief:
| Agent | Role | What They Did |
|-------|------|---------------|
| **Architect** | "The Big Picture" | Read all reports, designed the narrative arc, wrote the vision document, reviewed every post across multiple passes |
| **Analyst** | "The Data Whisperer" | Ran the full pipeline on 361 drafts, executed 20+ SQL queries, produced 7 data packages |
| **Coder** | "The Feature Builder" | Implemented 7 new analysis features (refs, trends, idea-overlap, WG adoption, revisions, centrality, co-occurrence) |
| **Writer** | "The Storyteller" | Drafted all 8 blog posts, applied 6+ revision passes incorporating data refreshes, architectural reframes, and editorial redirections |
Each agent had access to the full project codebase, a SQLite database of analyzed drafts, and the `ietf` CLI tool. They communicated through direct messages and coordinated through a shared task board with dependency tracking.
The team brief contained a thesis statement -- "The IETF is building the highways before the traffic lights" -- along with a per-post outline, style guide, and key data points table. Each agent's definition was approximately 50 lines: enough to establish identity and scope without over-constraining behavior.
## How It Actually Worked
The process unfolded in roughly six phases -- not the four we planned.
### Phase 1: Parallel Initialization
All four agents started simultaneously. The Analyst began running the analysis pipeline on 101 new drafts. The Architect read all 10 existing reports and started designing the narrative arc. The Coder read the Architect's initial notes and began implementing new features. The Writer read every data report in the project.
The key design decision: **agents did not wait for each other when they could work in parallel.** The Writer's tasks were formally blocked by the Analyst's pipeline run, but the Writer had enough existing data (260 analyzed drafts) to start drafting. Rather than sitting idle, the Writer produced first drafts of all 6 core posts while waiting for updated numbers. This turned out to be the right call -- the structure and narrative mattered more than whether the draft count was 260 or 361.
### Phase 2: The Architect Sets the Frame
The Architect's first deliverable changed everything. After reading all 10 reports, the Architect produced two documents:
**1. The narrative arc** (`00-series-overview.md`): A three-act structure (Gold Rush, Fragmentation, Path Forward) with five recurring motifs and per-post design guidance. The key insight embedded in this document -- that "coordination difficulty correlates with gap severity" -- reframed the entire analysis. The safety deficit was not just a quantity problem (too few safety drafts); it was a structural problem (the team-bloc structure that concentrates authorship cannot produce the cross-team work that safety standards require).
**2. The vision document** (`state-of-ecosystem.md`): A ~2,000-word synthesis with three 2027 scenarios and a "two equilibria" 2028 endgame. The best historical analogy turned out to be not IoT but the web itself -- browser wars leading to HTML5 convergence. The critical difference: when the thing being standardized makes autonomous decisions, getting safety wrong in the messy phase has consequences that are harder to fix retroactively.
Both documents shaped every subsequent blog post. The Writer wove the motifs through the series. The Coder built features the Architect flagged as missing. The Analyst's queries were directed by the per-post data requirements table the Architect produced.
### Phase 3: Building and Writing in Parallel
The Coder and Writer worked simultaneously, their outputs feeding each other. The Coder started with four features, then built three more as the Architect identified additional analytical needs:
| Coder Built | What It Revealed | Writer Used It In |
|-------------|------------------|-------------------|
| `ietf refs` (4,231 cross-references) | OAuth 2.0 and TLS 1.3 are the ecosystem's bedrock | Post 3: OAuth Wars |
| `ietf idea-overlap` (628 cross-org ideas) | 43% of idea clusters have cross-org validation | Post 5: Where Drafts Converge |
| `ietf trends` (19 months of data) | Growth from 0.5% to 9.3% of all IETF submissions | Post 1: Gold Rush |
| `ietf status` (36 WG-adopted drafts) | Agent standards live in security WGs, not agent WGs | Post 6: Big Picture |
| `ietf revisions` (55% at rev-00) | Most drafts are fire-and-forget; commitment is rare | Posts 2, 5 |
| `ietf centrality` (491 nodes, 1,142 edges) | European telecoms are the cross-divide glue | Post 2: Who Writes the Rules |
| `ietf co-occurrence` (safety isolation) | Safety co-occurs with A2A protocols only 8.8% of the time | Post 4: What Nobody Builds |
Every one of these features used **zero API calls** -- pure local computation using regex, SequenceMatcher, networkx, and SQL. This is an underappreciated pattern in LLM-powered analysis: use the expensive model (Claude) for tasks that require reasoning (categorization, idea extraction, gap synthesis), and use deterministic code for everything else. The cheapest analyses -- the ones with zero marginal cost -- produced the most structurally revealing findings.
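The refs feature is representative of the whole zero-cost pattern: a couple of regexes over draft text. A simplified sketch (these patterns approximate the real ones, which handle more citation formats):

```python
import re
from collections import Counter

# Simplified patterns: RFC citations and Internet-Draft names in plain text.
RFC_RE = re.compile(r"\bRFC\s*(\d{3,5})\b")
DRAFT_RE = re.compile(r"\bdraft-[a-z0-9]+(?:-[a-z0-9]+)*")

def extract_refs(text):
    rfcs = Counter("RFC " + num for num in RFC_RE.findall(text))
    drafts = Counter(DRAFT_RE.findall(text))
    return rfcs, drafts

sample = ("This profile builds on RFC 6749 (OAuth 2.0) and RFC 8446 (TLS 1.3), "
          "extends draft-ietf-lake-app-profiles, and updates RFC 6749.")
rfcs, drafts = extract_refs(sample)
# RFC 6749 is counted twice, RFC 8446 once; one draft reference is found
```

Run over all 361 drafts and aggregated in SQL, this is the entire mechanism behind the 4,231 cross-references and the OAuth-as-bedrock finding.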
The Writer produced all 7 posts in a single session: roughly 15,000 words across Posts 1-7, each following the Architect's structural guidance while making independent editorial decisions about hooks, examples, and narrative pacing.
### Phase 4: First Review and the Silent Failure
The Architect read all 6 core posts end-to-end and provided a structured review:
- **Post 1**: Four specific notes (geopolitics belongs in Post 2, add keyword expansion, lighten ending, add vivid example)
- **Post 3**: Flagged a data inconsistency (OAuth table had 14 rows but text said 13)
- **Post 4**: Identified as the strongest post -- the hospital drug-dispensing scenario and structural analysis section deliver the climax
- **Post 5**: Needed cross-org overlap data from the Coder's new report
- **Post 6**: Suggested adding the "two equilibria" framing from the vision document
The Writer applied all revisions in a targeted pass. The most interesting editorial decision: removing the extended geopolitics section from Post 1. The original was well-written but front-loaded the series with details that Post 2 covers in depth. The lighter version creates more narrative pull toward the next post.
Then came the first real coordination failure. **The Writer's revisions to Post 1 did not persist.** The dev journal said the work was done. The task board said "completed." But when the Architect verified the actual file, it still contained the pre-revision content -- the full geopolitics section, the heavy ending, the missing cloud-infrastructure scenario.
This is exactly the kind of silent failure that agent teams need guardrails for. The log said success; the artifact said otherwise. Without the Architect's verification step -- reading the output rather than trusting the status -- the error would have shipped. Lesson: **verify outputs, not logs.**
### Phase 5: The Data Arrives and the Reframing Battle
While the writing and reviewing unfolded, the Analyst completed the full pipeline: 361 drafts rated, 557 authors mapped (up from 403), 1,780 ideas extracted (up from 1,262). The numbers changed significantly: Huawei's share grew from 12% to 18%, A2A protocols from 92 to 120, and the safety ratio held steady at roughly 4:1. Every blog post needed a numbers-update pass.
But the most consequential event in Phase 5 was not the data refresh. It was the project lead challenging the Writer's headline claim.
**The "1,780 ideas" reframing.** The series had been built around a headline number: "1,780 technical ideas extracted from 361 drafts." The project lead asked: what does that number actually mean? The answer was uncomfortable. The pipeline extracts approximately 5 ideas per draft on average -- a mechanical process that produces "ideas" like "A2A Communication Paradigm" and "Agent Network Architecture." The raw count sounds impressive but is mostly scaffolding.
The real signal was hiding in the Coder's cross-org overlap analysis: of 1,692 unique idea titles, **96% appear in exactly one draft.** Only 75 show up in two or more drafts. Only 11 in three or more. The fragmentation that defines the protocol landscape extends all the way down to the idea level.
This required rewriting Post 5 entirely -- its title changed from "The 1,780 Ideas That Will Shape Agent Infrastructure" to "Where 361 Drafts Converge (And Where They Don't)." The lead metric shifted from raw extraction count (impressive but hollow) to the 96% fragmentation rate (honest and striking). Every post that referenced the idea count had to be updated, some multiple times as the framing evolved through three iterations.
The episode is worth documenting because it illustrates the irreducible role of human judgment in agent-produced work. Four agents had independently used the 1,780 figure -- the Analyst generated it, the Coder validated it, the Architect designed around it, the Writer headlined it. None questioned whether it was meaningful. It took a human asking "so what?" to force the reframe. The improved version -- convergence-amid-fragmentation, with 628 cross-org convergent ideas as the honest middle ground -- was genuinely better. But no agent surfaced the critique on its own.
### Phase 6: Bombshell Findings and Final Integration
The Analyst's second deep-analysis round produced three findings that significantly strengthened the series:
**RFC foundation divergence.** The Chinese bloc builds on YANG/NETCONF (network management). The Western bloc builds on COSE/CBOR/CoAP (IoT security) and HTTP/TLS/PKI (web infrastructure). The **only shared foundation is OAuth 2.0.** This elevated Post 3's fragmentation thesis from "different protocols" to "different technological DNA" -- the two blocs are not just disagreeing on solutions, they are building on incompatible infrastructure.
**Revision velocity.** 55% of all 361 drafts are at revision -00 -- submitted once, never iterated. Huawei's rate is 65%. Compare Ericsson, whose rev-00 rate is just 11%, or Boeing and Siemens, whose drafts reach average revision numbers of 28.2 and 17.2. The volume-vs.-commitment distinction sharpened Post 2's analysis of what Huawei's 66-draft campaign actually represents. A further detail: the majority of Huawei's drafts were submitted in the 4-week window before IETF 121 Dublin -- a coordinated pre-meeting filing burst.
**Centrality bridge-builders.** The co-authorship network (491 nodes, 1,142 edges) revealed that European telecoms -- not US Big Tech, not the UN, not any formal body -- are the structural glue between the Chinese and Western blocs. Telefonica's Luis M. Contreras ranks #1 in betweenness centrality. Only 115 of 557 authors (21%) bridge the divide at all. The standards ecosystem's cross-divide cohesion depends on a handful of companies that most observers would not name first.
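The bridge-builder effect is easy to reproduce in miniature with networkx, the library the centrality feature uses. The toy graph below is our stand-in for the real 491-node network, not actual corpus data:

```python
import networkx as nx

# Toy co-authorship graph: two dense blocs joined by a single bridge author.
G = nx.Graph()
bloc_a = ["a1", "a2", "a3"]   # stand-in for one authorship bloc
bloc_b = ["b1", "b2", "b3"]   # stand-in for the other
for bloc in (bloc_a, bloc_b):
    for i, u in enumerate(bloc):
        for v in bloc[i + 1:]:
            G.add_edge(u, v)  # everyone inside a bloc co-authors together
G.add_edge("a1", "bridge")
G.add_edge("bridge", "b1")    # one author connects the two blocs

bc = nx.betweenness_centrality(G)
top_author = max(bc, key=bc.get)
# top_author == "bridge": every cross-bloc shortest path runs through it
```

Betweenness centrality counts how many shortest paths pass through a node -- which is exactly why a handful of cross-bloc authors dominate the ranking even when they write few drafts themselves.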
The Writer wove all three findings into the series across multiple targeted passes: RFC divergence into Posts 2, 3, and 6; revision velocity into Posts 2 and 5; centrality data into Post 2's cross-pollination section. The Coder's co-occurrence analysis added one more dimension to Post 4: safety co-occurs with governance categories (60% with policy, 58% with identity/auth) but has **zero co-occurrence with Agent Discovery and Model Serving** -- safety is discussed as policy, not implemented as protocol.
## What Surprised Us
### Human judgment was the critical intervention
The ideas reframing was not the only moment where human direction changed the team's course, but it was the most instructive. Agents are excellent at execution -- the Writer applied six revision passes without error, the Coder built seven features in a single session, the Analyst ran 20+ analytical queries. But none of them asked whether the headline metric was worth headlining. The human project lead's "so what?" produced a better Post 5 than any amount of agent iteration would have.
This maps directly to the IETF's Human Override and Intervention gap. The question is not whether agents can do the work. The question is who notices when the work is pointed in the wrong direction.
### The silent failure exposed a verification gap
The Writer's Post 1 revisions disappearing -- logged as done but not actually persisted -- is a small-scale version of the Agent Behavior Verification gap the series identifies as critical. In our case, the Architect caught it during a manual review pass. In a production multi-agent system with no verification protocol, the error propagates. The dev journal said success. The file system disagreed. We had no automated mechanism to detect the discrepancy.
### The Architect role was disproportionately valuable
The Architect produced fewer words than the Writer and fewer features than the Coder, but shaped the entire output. Three specific contributions had outsized impact:
1. The insight that gap severity correlates with coordination difficulty transformed Post 4 from a list of gaps into an argument about structural dysfunction.
2. The "two equilibria" framing in the vision document gave Post 6's predictions real weight -- not just "here is what might happen" but "here are two stable endpoints, and this ratio determines which one we reach."
3. The verification pass that caught the Post 1 silent failure -- and the broader pattern of verifying outputs rather than trusting status messages.
All three contributions came from reading holistically -- something no individual report, pipeline run, or status message could produce. The Architect role was fundamentally about synthesis and verification.
### The cheapest analyses were the most important
| Component | Cost | Most Important Finding |
|-----------|-----:|----------------------|
| Claude Sonnet (ratings, gaps) | ~$8 | 4:1 safety deficit, 12 gap taxonomy |
| Claude Haiku (idea extraction) | ~$0.80 | 1,780 raw ideas (96% fragmented) |
| Ollama embeddings | $0.00 | 25+ near-duplicate pairs |
| Coder: regex RFC parsing | $0.00 | Foundation divergence (YANG vs COSE) |
| Coder: networkx centrality | $0.00 | European telecoms as bridge-builders |
| Coder: SQL co-occurrence | $0.00 | Safety structurally isolated from protocols |
| Coder: revision counting | $0.00 | 55% fire-and-forget rate |
| **Total pipeline** | **~$9** | |
The pattern is consistent: Claude provided the foundation data (ratings, categories, ideas), but the structurally revealing findings came from deterministic local computation on top of that foundation. RFC cross-references (regex), author centrality (networkx), revision velocity (filename parsing), and category co-occurrence (SQL joins) -- all zero-cost, all among the most quotable findings in the series.
### The development journal earned its keep
We required every agent to log milestones to a shared `dev-journal.md`. By session's end, the journal had 30 entries across all four agents -- capturing not just what was done but why, and flagging surprises that would otherwise be lost. When the Writer needed to understand what the Coder had built, the journal entry was faster and more informative than a status message. When the Architect reviewed posts, the Writer's journal entries explained editorial decisions that would otherwise be opaque.
The journal also became the source material for this post. Every "Surprise" field in the journal captured an insight -- the ideas reframing, the silent failure, the RFC divergence revelation -- that no other artifact preserves.
## What This Tells Us About Agent Teams
Six lessons from running a four-agent team on a real project:
**1. Role definitions matter more than instructions.** The one-page agent definitions were more effective than the 3,000-word team brief. Agents performed best when they had a clear identity and scope, not a detailed todo list.
**2. Shared state beats messaging.** The SQLite database, the dev journal, and the report files were more effective coordination mechanisms than direct inter-agent messages. Agents could read each other's outputs on their own schedule, without the overhead of request-response communication.
**3. Async is natural, but verification is not.** Agents working in parallel on loosely coupled tasks is a pattern that works. What does not happen naturally is output verification. The silent failure -- revisions logged but not persisted -- would have gone undetected without a deliberate verification pass. Agent teams need assurance mechanisms, not just coordination mechanisms.
**4. Humans catch category errors; agents catch consistency errors.** The Architect found a 14-vs-13 data inconsistency. The Writer applied six revision passes without introducing a single factual error. Agents are excellent at consistency within a frame. But the project lead's "so what?" about the ideas count was a category-level critique -- questioning the frame itself. That kind of challenge did not emerge from any agent.
**5. Review compounds.** The Architect reviewed the Writer's posts, the project lead reviewed the Architect's framing, and the resulting revisions cascaded through the series. Each review layer caught different things: data errors, structural problems, framing weaknesses. Multiple review passes from different perspectives produced compounding quality gains.
**6. The journal is the product.** The dev journal -- originally intended as a process artifact -- became the richest record of what happened and why. It captures decisions, surprises, and coordination moments that no other artifact preserves. For any multi-agent project, require a shared journal.
## The Meta-Irony
We built a team of AI agents to analyze 361 IETF drafts about AI agent standards. The team needed: coordination mechanisms, shared context, role-based specialization, review and quality gates, human oversight, and a way to verify that completed work was actually complete.
Every one of these needs maps to a gap in the IETF landscape:
| Our Team Needed | What Happened | IETF Gap |
|----------------|---------------|----------|
| Shared execution context | Agents coordinated via SQLite, files, dev journal | Agent Execution Model (no standard) |
| Quality review before publication | Architect caught data errors, structural problems | Agent Behavior Verification (critical gap) |
| Output verification | Writer's revisions silently failed; Architect caught it manually | Agent Behavior Verification (critical gap) |
| Error handling when agents disagreed | Ideas reframing required 3 iterations to stabilize | Agent Error Recovery (6 ideas from 1 draft) |
| Coordination across different approaches | RFC divergence: agents building on different foundations | Cross-Protocol Translation (zero ideas) |
| Human oversight of outputs | Project lead's "so what?" redirected the entire ideas framing | Human Override and Intervention (4 ideas) |
We solved these problems ad hoc -- with a dev journal, a task board, role definitions, manual verification passes, and human review. The IETF is trying to solve them at internet scale with protocol standards. The distance between our 4-agent team and a deployed multi-agent system on the open internet is vast, but the problems are structurally identical.
The standards the IETF is racing to write are the standards our own team needed. The traffic lights the highway needs are the ones we built by hand.
---
### Key Takeaways
- **Four agents** (Architect, Analyst, Coder, Writer) produced 8 blog posts, a vision document, 7 new analysis features, and 30 dev-journal entries from a ~$9 data pipeline
- **The ideas reframing** -- where a human's "so what?" redirected all four agents -- was the single most consequential intervention in the project, and no agent initiated it
- **A silent failure** (revisions logged but not persisted) demonstrated the same Behavior Verification gap the series identifies as critical in the IETF landscape
- **The cheapest analyses were the most revealing**: RFC divergence, author centrality, revision velocity, and co-occurrence patterns -- all zero-cost local computation -- produced the findings that defined the series
- **The team's coordination problems mirror the IETF's gaps**: execution model, behavior verification, error recovery, cross-protocol translation, and human oversight are needed at every scale
*This post concludes the series. All data, code, and reports are available in the IETF Draft Analyzer project repository.*
---
*Written by a team of Claude instances analyzing the IETF's work on AI agent standards. The irony is not lost on us.*

# Master Statistics — Updated 2026-03-03 (Full 361-Draft Corpus)
All numbers below reflect the complete 361-draft dataset after pipeline run on 101 new drafts.
## Core Numbers
| Stat | Value | Notes |
|------|-------|-------|
| Total drafts | 361 | up from 260 after keyword expansion |
| Total authors | 557 | up from 403 |
| Total organizations | 230 | up from 184 |
| Total ideas (raw) | 1,780 | up from 1,262 (~4.9/draft avg) |
| Unique idea clusters | 1,467 | after fuzzy dedup |
| Cross-org ideas (2+ orgs) | 628 | 43% of unique clusters — LEAD METRIC |
| Total gaps | 12 | 3 critical, 6 high, 3 medium |
| Total embeddings | 361 | all drafts embedded |
| WG-adopted drafts | 36 (10.0%) | 18 WGs |
| Individual drafts | 325 (90.0%) | |
| RFC cross-references | 4,231 | 2,443 RFC + 698 draft + 1,090 BCP |
| Avg novelty | 3.32 | (1-5 scale) |
| Avg maturity | 2.96 | |
| Avg relevance | 3.84 | |
## Growth Curve (Monthly Submissions)
| Month | Drafts | Cumulative |
|-------|--------|------------|
| 2024-01 | 3 | 3 |
| 2024-02 | 1 | 4 |
| 2024-04 | 1 | 5 |
| 2024-09 | 2 | 7 |
| 2024-10 | 1 | 8 |
| 2024-12 | 1 | 9 |
| 2025-01 | 4 | 13 |
| 2025-04 | 5 | 18 |
| 2025-05 | 2 | 20 |
| 2025-06 | 5 | 25 |
| 2025-07 | 5 | 30 |
| 2025-08 | 8 | 38 |
| 2025-09 | 17 | 55 |
| 2025-10 | 67 | 122 |
| 2025-11 | 61 | 183 |
| 2025-12 | 16 | 199 |
| 2026-01 | 54 | 253 |
| 2026-02 | 86 | 339 |
| 2026-03 | 22 | 361 |
Peak: 86 drafts in Feb 2026. Growth from ~2/mo (mid-2024) to 86/mo = **43x acceleration**.
## Category Distribution (Full 361 Drafts)
| Category | Count | % |
|----------|-------|---|
| A2A protocols | 136 | 37.7% |
| Agent identity/auth | 121 | 33.5% |
| Autonomous netops | 98 | 27.1% |
| ML traffic mgmt | 74 | 20.5% |
| AI safety/alignment | 45 | 12.5% |
| Human-agent interaction | 30 | 8.3% |
Note: drafts can have multiple categories.
## Safety Ratio
- Safety drafts: 45 (12.5% of corpus)
- Capability drafts (any non-safety category): 351
- **Ratio: ~8:1 capability-to-safety**
- Worsened from the original ~4:1 (260 drafts): keyword expansion brought in mostly capability-focused ML infrastructure drafts, only some of which have safety elements
## Keyword Expansion Impact (Original 260 vs New 101)
| Category | Original 260 | New 101 | Total |
|----------|-------------|---------|-------|
| Data formats/interop | 102 | 43 | 145 |
| A2A protocols | 92 | 28 | 120 |
| Agent identity/auth | 98 | 10 | 108 |
| Autonomous netops | 60 | 33 | 93 |
| Policy/governance | 60 | 31 | 91 |
| ML traffic mgmt | 23 | **50** | 73 |
| Agent discovery/reg | 57 | 8 | 65 |
| AI safety/alignment | 36 | 8 | 44 |
| Model serving/inference | 13 | **29** | 42 |
| Human-agent interaction | 22 | 8 | 30 |
**Key finding**: "ML traffic mgmt" and "Model serving/inference" surged with the new keywords — these categories more than tripled (23 → 73 and 13 → 42). The "inference" and "generative" keywords opened up the ML infrastructure community.
## Geopolitical Split
| Region | Drafts | Authors |
|--------|--------|---------|
| Chinese-affiliated | 152 | 218 |
| Western-affiliated | 94 | 81 |
| Other/Unclassified | 158 | 221 |
Chinese orgs contribute ~42% of drafts from ~39% of authors. Western orgs: ~26% of drafts from ~15% of authors.
## Idea Taxonomy (1,780 raw / 1,467 unique clusters / 628 cross-org)
| Type | Count | % |
|------|-------|---|
| mechanism | 663 | 37.2% |
| architecture | 280 | 15.7% |
| pattern | 251 | 14.1% |
| protocol | 228 | 12.8% |
| requirement | 171 | 9.6% |
| extension | 168 | 9.4% |
| framework | 9 | 0.5% |
| other | 10 | 0.6% |
**IMPORTANT**: Use 628 cross-org ideas as the lead metric, not 1,780 raw count. The raw count is a pipeline artifact (~4.9/draft avg). The 628 represents genuine multi-organizational convergence. See Post 5 data package for details.
## Top Organizations
| Org | Drafts | Authors | Composite Score |
|-----|--------|---------|-----------------|
| Huawei (all entities) | 57+ | 28+ | 3.1 |
| China Mobile | 35 | 24 | 3.2 |
| China Telecom | 23 | 22 | 3.0 |
| China Unicom | 22 | 22 | 3.0 |
| Cisco (all entities) | 25 | 19 | 3.4 |
| Tsinghua University | 16 | 13 | 3.5 |
| Telefonica | 13 | 2 | 3.2 |
| ZTE Corporation | 10 | 10 | 3.0 |
| Google | 10 | 4 | 3.3 |
| Five9 | 10 | 1 | 3.8 |
| Ericsson | 9 | 4 | 3.6 |
## Quality Leaders (Composite >= 3.5, min 3 drafts)
| Org | Drafts | Composite |
|-----|--------|-----------|
| Aiiva.org | 3 | 4.42 |
| AWS | 3 | 4.38 |
| Mozilla | 4 | 3.81 |
| Zhongguancun Lab | 6 | 3.81 |
| Five9 | 10 | 3.75 |
| Bitwave | 6 | 3.75 |
| Siemens | 5 | 3.75 |
| Inria | 4 | 3.70 |
| Ericsson | 9 | 3.59 |
| Nokia | 5 | 3.58 |
| Beijing Univ P&T | 4 | 3.57 |
| Tsinghua | 16 | 3.53 |
| Cisco Systems | 17 | 3.50 |
## WG Adoption
| WG | Drafts | Focus |
|----|--------|-------|
| lamps | 6 | PKI/certificates |
| lake | 5 | EDHOC/lightweight crypto |
| tls | 3 | TLS extensions |
| emu | 3 | EAP methods |
| sshm | 2 | SSH maintenance |
| httpbis | 2 | HTTP extensions |
| anima | 2 | Bootstrapping |
| aipref | 2 | AI preferences |
| ace | 2 | Auth for constrained envs |
19 of 36 WG drafts (53%) are in security/crypto WGs. Only 2 are in an agent-specific WG (aipref).
## Top 10 Highest-Scored Drafts
| Draft | Title | Composite |
|-------|-------|-----------|
| draft-aylward-daap-v2 | Distributed AI Accountability Protocol v2 | 4.75 |
| draft-ietf-lake-app-profiles | EDHOC Application Profiles | 4.75 |
| draft-cowles-volt | Verifiable Operations Ledger and Trace | 4.75 |
| draft-goswami-agentic-jwt | Secure Intent Protocol for Agents | 4.50 |
| draft-chang-agent-token-efficient | Token-efficient Data Layer for Agents | 4.50 |
| draft-birkholz-verifiable-agent-conversations | Verifiable Agent Conversations | 4.50 |
| draft-guy-bary-stamp-protocol | Secure Task-bound Agent Message Proof | 4.50 |
| draft-drake-email-tpm-attestation | Hardware Attestation for Email | 4.50 |
| draft-ietf-tls-ecdhe-mlkem | Post-quantum Hybrid Key Agreement | 4.50 |
| draft-ietf-hpke-hpke | Hybrid Public Key Encryption | 4.50 |
## Updated Gap List (12 gaps, refreshed)
### Critical (3)
1. **Agent Behavior Verification** — No mechanisms to verify agents actually behave according to declared policies
2. **Cross-Domain Agent Liability** — When agents cause harm across organizational boundaries, who's responsible?
3. **Human Override Protocols** — No standardized emergency override protocols for autonomous agents
### High (6)
4. **Agent Resource Exhaustion Protection** — No mechanisms to prevent agents from consuming excessive resources
5. **Agent-Generated Data Provenance** — Insufficient tracking of data origins as info flows between agents
6. **Agent Capability Degradation Handling** — No approach for detecting when agent capabilities degrade
7. **Multi-Agent Coordination Deadlocks** — Insufficient attention to preventing deadlock in multi-agent systems
8. **Agent Privacy Preservation** — Agents process sensitive data without adequate privacy protections
9. **Agent Firmware/Model Update Security** — Insufficient focus on secure update mechanisms
### Medium (3)
10. **Real-time Agent Debugging** — Missing protocols for debugging agents in production
11. **Cross-Protocol Agent Migration** — No mechanisms for migrating agent state between protocols
12. **Agent Energy Consumption Optimization** — Missing standards for energy-aware agent operation
## Most Referenced RFCs (Foundation Standards)
| RFC | Cited By | Subject |
|-----|----------|---------|
| RFC 2119 | 285 drafts | Key words (MUST, SHALL, etc.) |
| RFC 8174 | 237 drafts | Key words update |
| RFC 8446 | 42 drafts | TLS 1.3 |
| RFC 6749 | 36 drafts | OAuth 2.0 |
| RFC 9110 | 34 drafts | HTTP Semantics |
| RFC 8126 | 26 drafts | IANA Guidelines |
| RFC 8259 | 26 drafts | JSON |
| RFC 5280 | 22 drafts | X.509 PKI |
| RFC 7519 | 22 drafts | JWT |
| RFC 9052 | 20 drafts | CBOR Object Signing (COSE) |

# Data Package: Post 1 — The IETF's AI Agent Gold Rush
## Key Numbers to Update
- **361 drafts** (was 260 in earlier draft)
- **557 authors** from **230 organizations** (was 403 from 184)
- **1,780 ideas** (was 1,262)
- Growth: 3 drafts in Jan 2024 to 86 in Feb 2026 = **~29x over 25 months** (43x from the lowest month to the peak)
- Safety ratio: **~8:1** capability-to-safety (worsened from 4:1 after the keyword expansion brought in capability-heavy ML infra drafts; the core agent space is still ~4:1)
- 12 keywords: agent, ai-agent, llm, autonomous, machine-learning, artificial-intelligence, mcp, agentic, inference, generative, intelligent, aipref
- 6 NEW keywords added: mcp, agentic, inference, generative, intelligent, aipref — these brought 101 additional drafts
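The keyword expansion can be reproduced as a simple corpus filter. A minimal sketch; the draft records and the case-insensitive substring matching are assumptions about the pipeline, not its actual schema:

```python
# Sketch: corpus selection by keyword matching against title + abstract.
# Keyword lists mirror the ones above; the matching logic (lowercase
# substring) is an assumption about how the observatory selects drafts.
ORIGINAL = {"agent", "ai-agent", "llm", "autonomous", "machine-learning",
            "artificial-intelligence"}
NEW = {"mcp", "agentic", "inference", "generative", "intelligent", "aipref"}

def matches(draft: dict, keywords: set[str]) -> bool:
    text = (draft["title"] + " " + draft["abstract"]).lower()
    return any(kw in text for kw in keywords)

drafts = [
    {"title": "An Agent Communication Framework", "abstract": "..."},
    {"title": "Generative Model Serving over QUIC", "abstract": "..."},
]
# The "101 additional drafts" are those matched only by the new keywords.
expanded_only = [d for d in drafts
                 if matches(d, NEW) and not matches(d, ORIGINAL)]
```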
## Updated Category Breakdown (for Post 1 table)
| Category | Drafts | % of Corpus |
|----------|--------|-------------|
| A2A protocols | 136 | 37.7% |
| Agent identity/auth | 121 | 33.5% |
| Autonomous netops | 98 | 27.1% |
| Policy/governance | 91 | 25.2% |
| ML traffic mgmt | 74 | 20.5% |
| Agent discovery/reg | 65 | 18.0% |
| AI safety/alignment | 45 | 12.5% |
| Model serving/inference | 42 | 11.6% |
| Human-agent interaction | 30 | 8.3% |
## Growth Curve Data
The steepest acceleration: Sep 2025 (17) -> Oct 2025 (67) -> Nov 2025 (61) -> Feb 2026 (86).
The IETF 121 meeting was Nov 2025 in Dublin. The post-meeting submission spike is visible.
## Author Landscape Summary
- Top 5 authors all from Huawei: Bing Liu (23), Zhenbin Li (21), Nan Geng (20), Qiangzhou Gao (20), Xiaotong Shang (19)
- Jonathan Rosenberg (Five9) is highest-ranked non-Chinese author at #9 with 10 drafts
- Cisco authors collectively: ~25 drafts across entities
## Safety Deficit Framing
- 45 drafts touch safety/alignment (12.5%)
- 136 A2A protocol drafts (37.7%)
- 30 human-agent interaction drafts (8.3%)
- The ratio of A2A protocols to human-agent interaction: **4.5:1** — agents talking to each other, not to humans
## WG Adoption
Only 36 of 361 drafts (10%) are WG-adopted. The standards are still overwhelmingly individual submissions.

# Data Package: Post 2 — Who's Writing the Rules for AI Agents?
## Geopolitical Split
| Region | Drafts | Authors | % of Drafts |
|--------|--------|---------|-------------|
| Chinese-affiliated | 152 | 218 | 42.1% |
| Western-affiliated | 94 | 81 | 26.0% |
| Other/Unclassified | 158 | 221 | - |
Note: "Other" includes universities, small companies, individuals whose affiliation doesn't map cleanly. Many drafts have co-authors from multiple regions.
## Huawei Dominance (Combined Entities)
Huawei appears under multiple entity names in the data:
- Huawei: 57 drafts, 28 authors
- Huawei Technologies: 19 drafts, 16 authors
- Huawei Technologies Co., Ltd.: 3 drafts
- Huawei Singapore: 3 drafts
Combined estimate: **~60+ unique drafts**, ~40+ unique authors (some overlap). This is approximately **16-17% of all drafts**.
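Combining the entities requires normalizing org strings before counting. A minimal sketch; the alias table is illustrative (the real data has more variants), and a draft co-authored by two Huawei entities is counted once:

```python
# Sketch: collapse entity-name variants onto one canonical org, then
# count unique drafts per canonical org.
from collections import Counter

ALIASES = {
    "huawei": "Huawei",
    "huawei technologies": "Huawei",
    "huawei technologies co., ltd.": "Huawei",
    "huawei singapore": "Huawei",
}

def canonical(org: str) -> str:
    return ALIASES.get(org.strip().lower(), org.strip())

rows = [("Huawei", "draft-a"), ("Huawei Technologies", "draft-b"),
        ("Huawei Singapore", "draft-a"), ("China Mobile", "draft-c")]
per_org = Counter()
seen = set()
for org, draft in rows:
    key = (canonical(org), draft)
    if key not in seen:          # dedupe: one count per (org, draft)
        seen.add(key)
        per_org[canonical(org)] += 1
```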
## Top 15 Organizations
| Rank | Org | Drafts | Authors | Composite Score |
|------|-----|--------|---------|-----------------|
| 1 | Huawei | 57 | 28 | 3.11 |
| 2 | China Mobile | 35 | 24 | 3.21 |
| 3 | China Telecom | 23 | 22 | 2.98 |
| 4 | China Unicom | 22 | 22 | 3.02 |
| 5 | Huawei Technologies | 19 | 16 | 3.17 |
| 6 | Cisco Systems | 17 | 10 | 3.50 |
| 7 | Tsinghua University | 16 | 13 | 3.53 |
| 8 | Telefonica | 13 | 2 | 3.21 |
| 9 | ZTE Corporation | 10 | 10 | 3.02 |
| 10 | Pengcheng Laboratory | 10 | 8 | 3.30 |
| 11 | Google | 10 | 4 | 3.33 |
| 12 | Five9 | 10 | 1 | 3.75 |
| 13 | Ericsson | 9 | 4 | 3.59 |
| 14 | Sandelman Software Works | 7 | 1 | 3.46 |
| 15 | Zhongguancun Laboratory | 6 | 4 | 3.81 |
## Chinese Institutional Ecosystem Tiers
### Tier 1: Telecom Vendors
- Huawei (all entities): ~60+ drafts — networking, agent comm, autonomous netops
- ZTE Corporation: 10 drafts
### Tier 2: Telecom Operators
- China Mobile: 35 drafts
- China Telecom: 23 drafts
- China Unicom: 22 drafts
### Tier 3: Research
- Tsinghua University: 16 drafts (highest quality among Chinese orgs, composite 3.53)
- Pengcheng Laboratory: 10 drafts
- Zhongguancun Laboratory: 6 drafts (highest composite: 3.81)
- CAICT: 6 drafts (lowest composite: 2.35)
- Beijing University of Posts & Telecommunications: 4+ drafts
### Tier 4: Tech Companies
- Baidu: (part of multi-author drafts)
- Tencent: (part of multi-author drafts)
## Notable Western Absences
Major AI companies with minimal IETF presence:
- **Microsoft**: Not in top 30 orgs
- **Apple**: Not found
- **Meta/Facebook**: Not found
- **OpenAI**: Not found
- **Anthropic**: Not found
- **Google**: 10 drafts (modest given their agent ecosystem: Gemini, A2A protocol)
## Quality vs Quantity Insight
The inverse relationship is clear:
- High-volume Chinese orgs (Huawei, China Mobile, China Telecom): composite 2.98-3.21
- Lower-volume Western companies (Five9, Ericsson, Siemens, Mozilla): composite 3.59-3.81
- Exception: Tsinghua University — high volume (16) AND high quality (3.53)
- Strongest quality leaders: Aiiva.org (4.42), AWS (4.38), Mozilla (3.81) — all low volume
## Author Velocity (Oct 2025 - Mar 2026)
Top authors by recent output:
1. Bing Liu (Huawei): 23 drafts
2. Zhenbin Li (Huawei): 21 drafts (all in Nov 2025!)
3. Nan Geng (Huawei): 20 drafts
4. Qiangzhou Gao (Huawei): 20 drafts (all in Nov 2025!)
5. Xiaotong Shang (Huawei): 19 drafts (all in Nov 2025!)
The Huawei surge was concentrated in Nov 2025 — a coordinated submission campaign timed with IETF 121.
## Cross-Org Collaboration
180 ideas cross the Chinese-Western organizational divide. The strongest cross-divide convergences:
- A2A Communication: Huawei + China Mobile + CAICT on one side; Deutsche Telekom + Telefonica + Orange on the other
- Agent identity frameworks: both sides building on the same OAuth/SPIFFE foundations

# Data Package: Post 3 — The OAuth Wars and Other Battles
## OAuth-for-Agents Cluster (18 drafts touching OAuth + agents)
| Draft | Title | Novelty | Maturity | Relevance | Overlap |
|-------|-------|---------|----------|-----------|---------|
| draft-goswami-agentic-jwt | Secure Intent Protocol: JWT Agentic Identity | 5 | 4 | 5 | 2 |
| draft-guy-bary-stamp-protocol | Secure Task-bound Agent Message Proof (STAMP) | 5 | 4 | 5 | 1 |
| draft-oauth-ai-agents-on-behalf-of-user | On-Behalf-Of User Auth for AI Agents | 4 | 3 | 5 | 3 |
| draft-oauth-transaction-tokens-for-agents | Transaction Tokens For Agents | 4 | 4 | 5 | 3 |
| draft-chen-oauth-rar-agent-extensions | Policy & Lifecycle Extensions for OAuth RAR | 4 | 4 | 5 | 2 |
| draft-mishra-oauth-agent-grants | Delegated Agent Authorization Protocol (DAAP) | 4 | 4 | 5 | 3 |
| draft-liu-oauth-a2a-profile | A2A Profile for OAuth Transaction Tokens | 4 | 2 | 5 | 3 |
| draft-aap-oauth-profile | Agent Authorization Profile for OAuth 2.0 | 4 | 4 | 5 | 2 |
| draft-mw-spice-actor-chain | Verifiable Actor Chain for OAuth Token Exchange | 4 | 3 | 5 | 2 |
| draft-song-oauth-ai-agent-collaborate-authz | Multi-AI Agent Collaboration Auth | 4 | 3 | 4 | 3 |
| draft-rosenberg-oauth-aauth | AAuth - Agentic Authorization OAuth 2.1 | 4 | 3 | 4 | 2 |
| draft-gaikwad-south-authorization | SOUTH: Stochastic Auth for Agents | 4 | 4 | 4 | 2 |
| draft-song-oauth-ai-agent-authorization | OAuth Extension: Auth on Target | 3 | 2 | 4 | 4 |
| draft-yao-agent-auth-considerations | Considerations on Agent Auth via OAuth | 3 | 2 | 4 | 2 |
| draft-jia-oauth-scope-aggregation | Scope Aggregation for Agent Workflows | 3 | 3 | 4 | 2 |
## A2A Protocol Cluster Size: 136 drafts
The A2A space is the largest single category. Within it, competing approaches include:
- MCP-based approaches (MCP over MOQT, MCP for agents)
- Custom agent protocols (ANP, NLIP, aiproto/NACT)
- Existing protocol extensions (HTTP-based, gRPC-based)
## Identity/Auth Cluster Size: 121 drafts
Overlaps heavily with A2A. The key battleground is how agents prove identity and delegate authority.
## High-Overlap Drafts (Overlap score >= 4)
Multiple drafts flagged as high-overlap, particularly:
- `draft-hong-nmrg-agenticai-ps` — scored overlap 4, has overlap with 15+ other drafts
- Several OAuth drafts scored overlap 3-4, indicating convergent solutions
## RFC Foundation for Auth/Identity
| RFC | Cited By | What It Is |
|-----|----------|------------|
| RFC 6749 | 36 drafts | OAuth 2.0 (the foundation everyone builds on) |
| RFC 7519 | 22 drafts | JWT |
| RFC 5280 | 22 drafts | X.509 PKI |
| RFC 8392 | 18 drafts | CBOR Web Token (CWT) |
| RFC 9000 | 16 drafts | QUIC |
OAuth 2.0 is the undisputed foundation: 36 drafts explicitly cite it.
## Near-Duplicate Analysis
From embedding similarity, 25+ draft pairs have >0.98 cosine similarity. The densest cluster is around agent networking problem statements and use cases from the same author groups.
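The near-duplicate detection reduces to pairwise cosine similarity over draft embeddings. A minimal sketch with toy 3-d vectors standing in for real embeddings; the 0.98 threshold is the one used above:

```python
# Sketch: flag near-duplicate draft pairs by cosine similarity.
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

embeddings = {
    "draft-x-ps": [0.9, 0.1, 0.10],
    "draft-y-ps": [0.9, 0.1, 0.11],   # near-duplicate of draft-x-ps
    "draft-z":    [0.1, 0.9, 0.20],
}
near_dupes = [(a, b)
              for (a, va), (b, vb) in combinations(embeddings.items(), 2)
              if cosine(va, vb) > 0.98]
```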
## Convergence Signals (Positive)
Despite fragmentation, some convergence exists:
1. **EDHOC** (lake WG, 5 drafts): lightweight crypto handshake gaining WG adoption
2. **SCIM** extensions for agents: building on existing identity management
3. **Verifiable Agent Conversations** (draft-birkholz): high score (4.5), unique approach
4. **STAMP protocol**: task-bound proofs (scored 4.5, overlap 1 = truly novel)

# Data Package: Post 4 — What Nobody's Building (And Why It Matters)
## Updated Gap List (12 gaps, refreshed with 361-draft corpus)
### Critical (3)
1. **Agent Behavior Verification** — No mechanisms to verify agents actually behave according to declared policies. Many drafts define what agents SHOULD do, few address verification.
2. **Cross-Domain Agent Liability** — NEW in refreshed analysis. When autonomous agents operate across organizational boundaries and cause harm, who's liable? No framework exists.
3. **Human Override Protocols** — No standardized emergency override protocols. Only 30 of 361 drafts even address human-agent interaction.
### High (6)
4. **Agent Resource Exhaustion Protection** — No mechanisms to prevent agents from consuming excessive resources (compute, network, memory).
5. **Agent-Generated Data Provenance** — Despite 145 drafts on data formats, insufficient tracking of data origins as info flows between agents.
6. **Agent Capability Degradation Handling** — No approach for detecting/handling when agent capabilities degrade (model drift, data staleness).
7. **Multi-Agent Coordination Deadlocks** — With 136 A2A protocol drafts, almost no attention to preventing deadlock. Renamed from "Multi-Agent Consensus."
8. **Agent Privacy Preservation** — NEW gap. Agents process sensitive data without adequate privacy protections.
9. **Agent Firmware/Model Update Security** — 42 model serving drafts, but few address secure update mechanisms.
### Medium (3)
10. **Real-time Agent Debugging** — Missing protocols for debugging agents in production.
11. **Cross-Protocol Agent Migration** — No mechanisms for migrating agent state between different A2A protocols.
12. **Agent Energy Consumption Optimization** — No standards for energy-aware agent operation.
## Gap vs Category Contrast
| Focus Area | Drafts | Gap Coverage |
|-----------|--------|--------------|
| A2A protocols | 136 | Well-covered (too well — fragmented) |
| Identity/Auth | 121 | Well-covered |
| Autonomous netops | 98 | Moderately covered |
| Behavior verification | ~5 | CRITICAL GAP |
| Human override | ~30 | CRITICAL GAP (quantity insufficient, quality missing) |
| Resource management | ~0 explicit | CRITICAL GAP |
| Liability/accountability | ~3 (DAAP, VOLT) | CRITICAL GAP |
| Error recovery/deadlock | ~0 explicit | HIGH GAP |
## Key Contrasts for Narrative
- **136 A2A protocol drafts** vs **~5 behavior verification drafts** = 27:1 ratio
- **121 identity/auth drafts** vs **~0 resource exhaustion drafts** = infinity
- **30 human-agent interaction drafts** vs **136 A2A protocol drafts** = 4.5:1 agents-talking-to-agents vs agents-talking-to-humans
## Ideas that Address Gaps
From the 1,780 ideas, some partially address gaps:
- Behavior verification: "Verifiable Agent Conversations" (draft-birkholz), DAAP v2, VOLT
- Human override: scattered across 30 human-agent drafts but no unified approach
- Resource management: some ideas in ML traffic mgmt (74 drafts) but from network perspective, not agent perspective
- Liability: DAAP explicitly addresses accountability; VOLT addresses audit trails
## Structural Insight (from Architect)
The critical gaps are exactly the ones that require cross-team consensus: behavior verification needs input from security + A2A + safety + governance teams. But the team-bloc structure (33 blocs, mostly intra-org) makes cross-team work structurally difficult. **Gap severity correlates with coordination difficulty.**
## WG Gap Coverage
- 10 of 12 gaps have some WG coverage
- 2 gaps with ZERO WG backing: Agent Firmware/Model Update Security, Agent Energy Consumption
- Security WGs (lamps, lake, tls, emu, ace) cover 19 of 36 WG drafts — the IETF is addressing agent security through existing security WGs, not through new agent-specific WGs

# Data Package: Post 5 — Where 230 Organizations Agree (And Where They Don't)
Reframed per Architect's direction: lead with cross-org convergence (628 ideas), not raw extraction count (1,780).
## Lead Metric: Cross-Organization Convergence
- **1,467 unique idea clusters** (after fuzzy dedup from 1,780 raw extractions)
- **628 ideas** appear across 2+ organizations = genuine multi-org convergence
- **628 / 1,467 = 43%** of ideas have cross-org validation
### Convergence Pyramid
| Org Count | Ideas | What It Means |
|-----------|-------|---------------|
| 14 orgs | 8 ideas | Single mega-consortium (ML infra draft) |
| 7+ orgs | 14 ideas | Strong multi-org convergence |
| 4-6 orgs | 179 ideas | Solid cross-org agreement |
| 2-3 orgs | 427 ideas | Early convergence signals |
| 1 org only | 839 ideas | Unique to one organization |
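The pyramid is a bucketing of idea clusters by distinct contributing orgs. A minimal sketch, assuming clusters map to org sets (the cluster names and bucket edges are illustrative of the table above, not the pipeline's actual data structures):

```python
# Sketch: bucket idea clusters by how many distinct orgs contributed,
# then derive the cross-org convergence count (the 628-style metric).
from collections import Counter

clusters = {
    "agent gateway": {"China Telecom", "Zhongguancun", "Huawei"},
    "multi-agent protocol": {"BUPT", "China Mobile"},
    "niche idea": {"SoloCo"},
}
pyramid = Counter()
for orgs in clusters.values():
    n = len(orgs)
    if n >= 4:
        pyramid["4+ orgs"] += 1
    elif n >= 2:
        pyramid["2-3 orgs"] += 1
    else:
        pyramid["1 org"] += 1
# Everything above the 1-org bucket is cross-org convergence.
cross_org = sum(v for k, v in pyramid.items() if k != "1 org")
```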
## The Real Convergence: Ideas in 2+ Independent Drafts from 2+ Orgs
These are the strongest convergence signals — ideas that different teams proposed independently:
| Idea | Drafts | Orgs | Significance |
|------|--------|------|-------------|
| AI Agent Communication Framework | 2 | 7 | Five9/Cisco AND Chinese telcos+ANP — cross-bloc convergence |
| Agent Gateway | 3 | 6 | China Telecom, Zhongguancun, AsiaInfo, Beijing U, Huawei, UnionPay |
| Distributed AI Inference Architecture | 2 | 5 | Cross-institution (Hong Kong, IRTF) |
| Network Digital Twin Support | 2 | 5 | Research + operator convergence |
| Multi-Agent Communication Protocol | 8 | 7 | AsiaInfo, BUPT, China Mobile, China Telecom, China Unicom, Huawei, Zhongguancun |
| AI Agent Communication Network (ACN) | 5 | 7 | ANP Community, China Mobile, China Telecom, China Unicom, Cisco, Five9, Huawei |
| Tool Enumeration/Invocation API | 3 | 2 | Rosenberg (Five9) — coherent toolkit across 3 drafts |
| CHEQ Protocol | 2 | 3 | Rosenberg — conversation verification |
**Key finding**: "AI Agent Communication Framework" spans the Chinese-Western divide (Five9+Cisco on one side, 4 Chinese telcos+ANP on the other). This is the strongest cross-bloc convergence signal in the dataset.
## The 14-Org Consortium (Context, Not Convergence)
8 ideas from one mega-draft (AI inference networking): Ultra-Low Latency Routing, Tensor Parallelization, FARE, Adaptive Load Balancing, etc. All 14 orgs (Hygon, Tencent, Baidu, Broadcom, Huawei, China Mobile + 8 more) are co-authors on a single draft. This is a consortium submission, not independent convergence. Still significant — it's the broadest cross-org collaboration in the dataset — but should not be presented as "14 organizations independently arrived at the same idea."
## Ideas to Watch (Top-Scored Drafts)
1. **DAAP v2** (Distributed AI Accountability Protocol) — Composite 4.75. Addresses behavior verification gap.
2. **VOLT** (Verifiable Operations Ledger and Trace) — Composite 4.75. Audit trail for agents.
3. **EDHOC Application Profiles** — WG-adopted (lake), composite 4.75. Lightweight crypto.
4. **Agentic JWT** (Secure Intent Protocol) — Composite 4.50. JWT-compatible agent identity.
5. **STAMP Protocol** — Composite 4.50, overlap=1 (truly novel). Task-bound message proofs.
6. **Verifiable Agent Conversations** — Composite 4.50. Cryptographic conversation records.
7. **Token-efficient Data Layer** — Composite 4.50. Cost-conscious agent communication.
## Idea Taxonomy (Background Context)
| Type | Count | % |
|------|-------|---|
| mechanism | 663 | 37.2% |
| architecture | 280 | 15.7% |
| pattern | 251 | 14.1% |
| protocol | 228 | 12.8% |
| requirement | 171 | 9.6% |
| extension | 168 | 9.4% |
| other | 19 | 1.1% |
Note: These are raw extraction counts (~4.9 per draft avg). Use as background taxonomy only — the convergence numbers are the lead metric.
## Convergence-Gap Tension
The punchline for Post 5: teams agree on WHAT to build but disagree on HOW. The 628 cross-org ideas show broad agreement on the problem space (agent communication, identity, infrastructure). But the 12 gaps show no one is building the connective tissue (behavior verification, human override, error recovery, liability).
| Convergence Area | Cross-Org Ideas | Corresponding Gap |
|-----------------|-----------------|-------------------|
| Agent communication | High (136 A2A drafts) | Cross-Protocol Migration (MEDIUM) |
| Agent identity | High (121 auth drafts) | Cross-Domain Liability (CRITICAL) |
| ML infrastructure | High (74 ML traffic) | Energy Optimization (MEDIUM) |
| Autonomous netops | High (98 drafts) | Capability Degradation (HIGH) |
| Safety/oversight | Low (45 drafts) | Behavior Verification (CRITICAL), Human Override (CRITICAL) |
## Gap-to-Idea Mapping
| Gap | Ideas Addressing It | Coverage Level |
|-----|-------------------|----------------|
| Behavior Verification | DAAP, VOLT, Verifiable Conversations | Partial (3 drafts) |
| Cross-Domain Liability | DAAP accountability, STAMP proofs | Minimal |
| Human Override | Scattered across 30 drafts | No unified approach |
| Resource Exhaustion | ML traffic mgmt ideas | Indirect only |
| Data Provenance | VOLT, some data format ideas | Partial |
| Capability Degradation | None explicit | Absent |
| Coordination Deadlocks | None explicit | Absent |
| Privacy Preservation | Some policy/governance ideas | Minimal |

# Data Package: Post 6 — Drawing the Big Picture
## Synthesis Numbers
- **361 drafts, 557 authors, 230 orgs, 1,780 ideas, 12 gaps**
- **136 A2A protocols** with no interoperability layer
- **121 identity/auth drafts** building on OAuth 2.0 (RFC 6749, cited by 36 drafts)
- **45 safety drafts** vs **316 capability drafts** = 7:1 ratio
- **36 WG-adopted drafts** (10%) — 19 in security WGs, 2 in aipref
## The Foundation Layer (RFC Cross-References)
The ecosystem is built on:
1. **OAuth 2.0** (RFC 6749, 36 citations) — the auth foundation
2. **TLS 1.3** (RFC 8446, 42 citations) — the security transport
3. **HTTP Semantics** (RFC 9110, 34 citations) — the API layer
4. **JWT** (RFC 7519, 22 citations) — token format
5. **X.509 PKI** (RFC 5280, 22 citations) — identity certificates
6. **COSE** (RFC 9052, 20 citations) — constrained object signing
7. **CBOR** (RFC 8949, 19 citations) — binary data format
8. **QUIC** (RFC 9000, 16 citations) — transport
This reveals the DNA: the agent ecosystem is being built on web + IoT foundations. OAuth + JWT + TLS for the web side, COSE + CBOR for the constrained/IoT side.
## WG Adoption as Traction Signal
| Category | WG Drafts | % of WG Drafts |
|----------|-----------|----------------|
| Security/Crypto (lamps, lake, tls, emu, ace) | 19 | 53% |
| Agent-specific (aipref) | 2 | 6% |
| Other (httpbis, anima, suit, etc.) | 15 | 42% |
**Key insight**: The IETF is not building new agent WGs — it's retrofitting existing security WGs for agents. This is actually good: it builds on proven foundations.
## Five Proposed Ecosystem Drafts (from Architect)
These address the gaps:
1. **AEM** (Agent Execution Model) — DAG-based orchestration
2. **ATD** (Agent Trust and Delegation) — builds on SPIFFE/WIMSE
3. **HITL** (Human-in-the-Loop) — override protocols
4. **AEPB** (Agent Ecosystem Profile for Business) — assurance profiles
5. **APAE** (Agent Protocol Adaptation and Exchange) — interop layer
## Predictions Data Support
1. **WG consolidation is likely**: Multiple competing approaches in auth (14+ OAuth drafts) creates pressure for WG adoption
2. **Safety will lag**: Only 10% of WG drafts address safety; the structural bias toward capability continues
3. **Chinese institutional advantage**: 152 drafts from Chinese orgs, coordinated (Huawei bloc: 94% cohesion); Western response is fragmented and late
4. **The interop layer is the bottleneck**: 136 A2A drafts, no interop = the single biggest structural problem
## Two Equilibria (from Architect's Vision Document)
- **Microservices chaos**: If fragmentation persists and safety ratio holds, the agent ecosystem becomes like early microservices — technically possible but operationally painful, with each deployment requiring custom integration
- **Layered web architecture**: If WGs consolidate fragmentation and the safety ratio narrows, the ecosystem converges on a layered architecture like the web (transport -> session -> identity -> application)
The 8:1 safety ratio is the leading indicator. If it narrows toward 4:1 or better, the good equilibrium is achievable.
## Builder Guidance Data
For the "What to Do" section:
1. **Watch ECT** (Ephemeral Credential Trust) — bridges SPIFFE-WIMSE, already WG-tracked
2. **Build HITL now** — only 30 drafts in this space; early movers define the patterns
3. **Design for protocol translation** — the 136-protocol zoo means any production system needs translation layers
4. **Invest in error recovery** — zero explicit drafts on agent error recovery; this is a field-defining opportunity
5. **Participate in IETF** — only 10% of drafts are WG-adopted; there's room for new contributors to shape outcomes

# Deep Analysis Round 2 — Tasks #23-28
## Task #23: Draft Revision Velocity
**Key finding: 55% of drafts are still at revision 00 — first submission, never iterated.**
### Overall Stats
| Metric | Value |
|--------|-------|
| Total drafts | 361 |
| At rev-00 (never iterated) | 198 (54.8%) |
| At rev-03+ (actively evolving) | 64 (17.7%) |
| Average revision | 2.21 |
### Iteration vs Fire-and-Forget by Org
| Org | Drafts | % at rev-00 | Avg Rev | Pattern |
|-----|--------|-------------|---------|---------|
| Ericsson | 9 | 11.1% | 4.8 | **Iterators** — almost everything gets revised |
| Sandelman Software | 7 | 14.3% | 14.3 | **Deep iterators** — fewer drafts, heavy revision |
| Nokia | 5 | 20.0% | 3.2 | **Iterators** |
| Siemens | 5 | 0.0% | 17.2 | **Deepest iterators** — zero fire-and-forget |
| Boeing R&T | 6 | 0.0% | 28.2 | **Extreme iterators** (mature, long-running drafts) |
| ZTE Corporation | 10 | 40.0% | 1.3 | **Mixed** |
| Telefonica | 13 | 46.2% | 1.8 | **Mixed** |
| Google | 10 | 50.0% | 1.7 | **Mixed** |
| China Unicom | 22 | 54.5% | 0.9 | **Mostly fire-and-forget** |
| China Telecom | 23 | 60.9% | 1.0 | **Mostly fire-and-forget** |
| Tsinghua | 16 | 62.5% | 0.4 | **Fire-and-forget** |
| Huawei | 57 | 64.9% | 0.6 | **Fire-and-forget** — 37 of 57 never revised |
| Huawei Technologies | 19 | 68.4% | 0.7 | **Fire-and-forget** |
| Five9 | 10 | 90.0% | 0.1 | **All new** (recent entrant) |
| Pengcheng Lab | 10 | 90.0% | 0.1 | **All new** |
**Narrative insight**: Western companies (Ericsson, Sandelman, Siemens, Boeing, Nokia) have dramatically lower fire-and-forget rates. They submit fewer drafts but iterate heavily. Chinese orgs submit more but ~60-65% are never revised. This is the "volume vs commitment" story — submitting a draft is cheap, iterating it signals genuine investment.
**Best quotable stat**: "65% of Huawei's 57 drafts have never been revised beyond their first submission."
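The per-org iteration stats reduce to a group-by over (org, latest revision) records. A minimal sketch; the records are illustrative, not the real corpus:

```python
# Sketch: per-org fire-and-forget rate = share of drafts still at rev 00,
# plus average revision number.
from collections import defaultdict

records = [("Ericsson", 5), ("Ericsson", 4), ("Huawei", 0),
           ("Huawei", 0), ("Huawei", 2)]
by_org = defaultdict(list)
for org, rev in records:
    by_org[org].append(rev)

stats = {
    org: {
        "drafts": len(revs),
        "pct_rev00": 100 * sum(r == 0 for r in revs) / len(revs),
        "avg_rev": sum(revs) / len(revs),
    }
    for org, revs in by_org.items()
}
```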
---
## Task #24: Safety Ratio Trend Over Time
**Key finding: The safety ratio is NOT improving. It fluctuates wildly but the structural deficit persists.**
| Month | Safety | Capability-only | Total | Ratio |
|-------|--------|-----------------|-------|-------|
| 2025-07 | 2 | 3 | 5 | 1.5:1 |
| 2025-09 | 4 | 13 | 17 | 3.3:1 |
| 2025-10 | 5 | 62 | 67 | **12.4:1** |
| 2025-11 | 7 | 54 | 61 | 7.7:1 |
| 2025-12 | 3 | 13 | 16 | 4.3:1 |
| 2026-01 | 8 | 46 | 54 | 5.8:1 |
| 2026-02 | 13 | 73 | 86 | 5.6:1 |
| 2026-03 | 1 | 21 | 22 | **21:1** |
The ratio spiked to 12.4:1 during the Oct 2025 surge (IETF 121 pre-meeting rush — nearly all capability drafts). Feb 2026 shows some improvement (5.6:1) with 13 safety drafts — the best absolute month for safety. But the overall pattern is clear: safety submissions grow linearly while capability submissions grow exponentially. The gap widens during surges.
**For Post 4 (THE CLIMAX)**: The ratio data tells a story of structural neglect, not intentional choice. Nobody is anti-safety; the incentive structure just rewards capability work. Each org's submission campaign prioritizes its core protocol proposals, and safety is nobody's core.
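The monthly ratio is a two-bucket tally per month. A minimal sketch reproducing two of the months above (the record shape `(month, has_safety_tag)` is an assumption about the pipeline):

```python
# Sketch: monthly capability-to-safety ratio from tagged drafts.
from collections import defaultdict

drafts = ([("2025-07", True)] * 2 + [("2025-07", False)] * 3
          + [("2026-03", True)] * 1 + [("2026-03", False)] * 21)

tally = defaultdict(lambda: [0, 0])  # month -> [safety, capability-only]
for month, is_safety in drafts:
    tally[month][0 if is_safety else 1] += 1

ratios = {m: cap / safe for m, (safe, cap) in sorted(tally.items())}
```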
---
## Task #25: RFC Foundation Divergence by Bloc
**Key finding: Chinese and Western blocs build on DIFFERENT foundations.**
### Chinese Bloc — Top RFCs
| RFC | Cited By | Subject |
|-----|----------|---------|
| RFC 2119 | 114 | Key words |
| RFC 8174 | 86 | Key words update |
| RFC 8259 | 11 | JSON |
| RFC 6749 | 11 | OAuth 2.0 |
| RFC 6241 | 10 | NETCONF |
| RFC 8446 | 8 | TLS 1.3 |
| RFC 8641 | 6 | YANG Push |
| RFC 8639 | 6 | Subscription to YANG Notifications |
| RFC 7950 | 5 | YANG |
| RFC 7575 | 5 | Autonomic networking |
### Western Bloc — Top RFCs
| RFC | Cited By | Subject |
|-----|----------|---------|
| RFC 2119 | 73 | Key words |
| RFC 8174 | 70 | Key words update |
| RFC 8446 | 18 | **TLS 1.3** |
| RFC 5280 | 12 | **X.509 PKI** |
| RFC 9528 | 11 | **EDHOC** |
| RFC 9110 | 11 | **HTTP Semantics** |
| RFC 9052 | 11 | **COSE** |
| RFC 8949 | 9 | **CBOR** |
| RFC 8613 | 9 | **OSCORE** |
| RFC 8392 | 9 | **CWT** |
| RFC 6749 | 7 | OAuth 2.0 |
| RFC 7252 | 7 | **CoAP** |
### The Divergence
| Foundation | Chinese | Western |
|-----------|---------|---------|
| **Network management (YANG/NETCONF)** | Strong (6241, 8639, 8641, 7950) | Absent |
| **PKI/Certificates (X.509)** | Absent | Strong (5280) |
| **IoT security (COSE/CBOR/OSCORE/CoAP)** | Absent | Strong (9052, 8949, 8613, 7252) |
| **Lightweight auth (EDHOC, CWT)** | Absent | Strong (9528, 8392) |
| **Web APIs (HTTP)** | Weak | Strong (9110) |
| **OAuth 2.0** | Present (11) | Present (7) |
| **TLS 1.3** | Moderate (8) | Strong (18) |
| **Autonomic networking** | Present (7575) | Absent |
**Narrative insight**: The Chinese bloc is building agent infrastructure on YANG/NETCONF — network management protocols for autonomous netops. The Western bloc is building on IoT security (COSE/CBOR/CoAP) and web infrastructure (HTTP/TLS/PKI). These are fundamentally different technology stacks. OAuth 2.0 is the only foundation both blocs cite at comparable rates (11 vs 7); TLS 1.3 appears on both sides but skews Western (18 vs 8).
**For Post 2**: This means fragmentation goes deeper than protocol design — the two blocs are building on different technological DNA. Even if they agree on agent communication patterns, the underlying plumbing is incompatible.
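The divergence tables come from per-bloc citation tallies. A minimal sketch; the `(draft, bloc, cited_rfcs)` record shape and the sample citations are illustrative:

```python
# Sketch: per-bloc RFC citation counts, one count per citing draft.
from collections import Counter, defaultdict

records = [
    ("draft-a", "chinese", ["RFC 2119", "RFC 6241"]),
    ("draft-b", "chinese", ["RFC 2119", "RFC 8641"]),
    ("draft-c", "western", ["RFC 2119", "RFC 9052", "RFC 8949"]),
]
by_bloc = defaultdict(Counter)
for _draft, bloc, rfcs in records:
    by_bloc[bloc].update(set(rfcs))  # set(): count each RFC once per draft

top_chinese = by_bloc["chinese"].most_common(2)
```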
---
## Task #27: Category Co-Occurrence Matrix
**Key finding: Safety IS structurally isolated from core protocol work.**
### Safety Co-Occurrence
| Safety co-occurs with | Drafts |
|----------------------|--------|
| Policy/governance | 26 |
| Agent identity/auth | 25 |
| A2A protocols | **12** |
| Data formats/interop | 7 |
| Human-agent interaction | 5 |
| Autonomous netops | 4 |
| ML traffic mgmt | 3 |
Safety co-occurs most with governance and identity — "paper" concerns. It co-occurs with A2A protocols only 12 times out of 136 A2A drafts (8.8%). Safety is essentially disconnected from the core protocol design work.
### Strongest Co-Occurrences (Top 10)
| Category Pair | Co-occurrences |
|---------------|---------------|
| A2A + Data formats | 55 |
| A2A + Agent discovery | 40 |
| Identity + Policy | 38 |
| A2A + Identity | 35 |
| A2A + Autonomous netops | 34 |
| Discovery + Data formats | 34 |
| Identity + Data formats | 33 |
| Autonomous netops + ML traffic | 28 |
| **Safety + Policy** | **26** |
| **Safety + Identity** | **25** |
**For Post 4**: Safety's strongest links are to governance and identity — abstract/policy-level work. Its weakest links are to A2A (12), ML traffic (3), and autonomous netops (4) — the categories where agents actually DO things. Safety is being thought about in the abstract, not integrated into protocol design. This is the structural version of the "highways before traffic lights" metaphor.
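The co-occurrence matrix is a pair count over each draft's category set. A minimal sketch with illustrative category tags:

```python
# Sketch: category co-occurrence counts from per-draft category sets.
# Sorting each pair makes ("a2a", "safety") and ("safety", "a2a") the
# same key.
from collections import Counter
from itertools import combinations

draft_categories = [
    {"a2a", "data-formats"},
    {"a2a", "data-formats", "identity"},
    {"safety", "policy"},
]
cooc = Counter()
for cats in draft_categories:
    for pair in combinations(sorted(cats), 2):
        cooc[pair] += 1
```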
---
## Task #28: IETF Meeting Timing Effect
**Key finding: 51.5% of all drafts were submitted in the 4-week windows before IETF 121 and 122.**
| Window | Drafts | % of Total |
|--------|--------|------------|
| Pre-IETF 119 (Feb-Mar 2024) | 1 | 0.3% |
| Pre-IETF 120 (Jun-Jul 2024) | 0 | 0.0% |
| Pre-IETF 121 (Oct-Nov 2025) | **107** | **29.6%** |
| Pre-IETF 122 (Feb-Mar 2026) | **79** | **21.9%** |
| All other periods | 174 | 48.2% |
### Huawei's IETF 121 Campaign
| Period | Huawei Drafts |
|--------|--------------|
| Pre-IETF 121 (4-week window) | **43** |
| All other periods combined | 26 |
**62% of all Huawei drafts (43 of 69 across all entities) were submitted in the 4 weeks before IETF 121 Dublin.** This is not organic growth — this is a coordinated submission campaign timed for maximum standards-body impact.
For comparison, the entire corpus had 107 drafts in that same window. Huawei alone accounted for **40% of all pre-IETF 121 submissions**.
**For Post 1**: The growth curve isn't just organic interest — it's heavily driven by strategic submission campaigns timed to IETF meetings. The Oct-Nov 2025 spike (128 drafts in 2 months) is largely one company's coordinated push.
**For Post 2**: This is the strongest evidence of Huawei's strategic standards campaign. 43 drafts in 4 weeks from one organization is unprecedented in this dataset.

# Surprising Findings — Deep Analysis Phase
These findings challenge assumptions or reveal unexpected patterns in the 361-draft corpus.
## 1. The Keyword Expansion Uncovered a Different Community
The 101 new drafts from keywords (mcp, agentic, inference, generative, intelligent, aipref) brought:
- **154 new authors** (557 total, up from 403)
- **46 new organizations** (230 total, up from 184)
- Heavy skew toward ML infrastructure: "ML traffic mgmt" went from 23 to 73 drafts, "Model serving/inference" from 13 to 42
This means the original analysis systematically missed the ML infrastructure community. The "agent" keyword captured the protocol designers; "inference" and "generative" captured the infrastructure builders. These are largely separate communities working on adjacent problems.
## 2. The Safety Ratio Improved — But It's an Illusion
The safety ratio went from 4:1 (at 260 drafts) to roughly 8:1 by tag count, but the apparent improvement is an artifact: the ML infrastructure drafts carry broader category tags, and many touch "safety" only tangentially through network reliability. The core agent protocol space remains deeply safety-deficient.
## 3. Huawei's Nov 2025 Coordinated Campaign
Five Huawei authors each submitted 19-21 drafts in a single month (Nov 2025). This is the largest coordinated submission campaign in the dataset. Zhenbin Li, Qiangzhou Gao, and Xiaotong Shang all published exclusively in Nov 2025. This looks like a strategic push timed for IETF 121 (Dublin, Nov 2025).
## 4. Quality Inversely Correlates with Quantity
| Pattern | Examples | Avg Composite |
|---------|----------|---------------|
| High volume, low quality | Huawei (57 drafts, 3.11), CAICT (6, 2.35), Futurewei (6, 2.67) | ~2.7-3.1 |
| Low volume, high quality | AWS (3, 4.38), Aiiva.org (3, 4.42), Mozilla (4, 3.81) | ~3.8-4.4 |
| Exception | Tsinghua (16, 3.53), Five9 (10, 3.75) | High both |
The top-rated organizations are nearly all low-volume Western/independent contributors. Volume does not predict quality.
## 5. The Agent Ecosystem is Being Built in Security WGs, Not Agent WGs
19 of 36 WG-adopted drafts (53%) are in security WGs (lamps, lake, tls, emu, ace). Only 2 are in the agent-specific "aipref" WG. The IETF isn't creating new infrastructure for agents — it's adapting existing security infrastructure. This is arguably the right approach but means agent-specific concerns (behavior verification, human override) have no natural WG home.
## 6. The 14-Author Mega-Draft Consortium
One draft about AI inference networking has 14 co-authors from 14 different organizations (Hygon, China Mobile, Tencent, Huawei, Broadcom, Ruijie, Metanet, Biren, Baidu, Moore Threads, Resnics, Centec, Cloudnine, Enflame). This is by far the broadest cross-org collaboration in the dataset — and it's focused on ML infrastructure, not agent protocols.
## 7. Jonathan Rosenberg Is the Western Counterweight
Five9's Jonathan Rosenberg (9 drafts, composite 3.75) is the only Western individual matching Huawei's output volume. His drafts (AAuth, NACT, aiproto) represent a coherent vision for agent communication — arguably the closest thing to a Western "ecosystem proposal" matching Huawei's breadth.
## 8. The Accountability Drafts Are the Best-Scored
The top 3 drafts by composite score are ALL about accountability/verification:
1. DAAP v2 (Distributed AI Accountability Protocol) — 4.75
2. EDHOC Application Profiles — 4.75
3. VOLT (Verifiable Operations Ledger and Trace) — 4.75
The market is hungry for safety/accountability solutions — when they appear, they're rated highest. The problem isn't that safety work is unwanted; it's that few teams are doing it.
## 9. OAuth 2.0 Is the Undisputed Foundation
RFC 6749 (OAuth 2.0) is cited by 36 drafts — more than any non-boilerplate RFC. The agent identity ecosystem is essentially an OAuth ecosystem. Any agent auth approach that doesn't build on OAuth will face adoption headwinds.
## 10. Two Gaps Have Zero Institutional Backing
"Agent Firmware/Model Update Security" and "Agent Energy Consumption Optimization" have zero WG-adopted drafts addressing them. These represent the intersection of importance and neglect — critical infrastructure needs that no working group has prioritized.

# State of the IETF AI Agent Ecosystem: Where We Are and Where We're Going
*A vision document synthesizing 361 drafts, 557 authors, 628 cross-org convergent ideas, and 12 gaps into a picture of the AI agent standards landscape in 2026 and its trajectory through 2028.*
---
## I. The Current State: A Landscape in Formation
The IETF's AI agent standardization landscape in March 2026 resembles a city under construction: cranes everywhere, foundations going in, multiple development teams building in parallel -- but no master plan, no zoning, and the safety inspectors have not been hired yet.
The numbers tell the story. In nine months, from June 2025 to February 2026, the rate of AI/agent-related Internet-Draft submissions grew from 2 per month to 72 -- a 36x increase. The corpus now contains **361 drafts** from **557 authors** representing **230 organizations**. Our cross-organization analysis found **628 technical ideas** independently proposed by multiple organizations -- genuine consensus signals amid the noise -- and identified **12 standardization gaps**, three of them critical.
This is not incremental growth. This is a phase transition, comparable to the IoT draft surge of 2014-2016 or the early web standards push of the mid-1990s. The IETF is being asked to standardize the infrastructure for a new class of internet participant: the autonomous software agent.
But the landscape that has emerged is not converging. It is fragmenting.
### The Structural Problems
**Fragmentation without coordination.** The 361 drafts cluster into at least 42 topically overlapping groups. The most crowded area -- OAuth extensions for AI agents -- has 14 competing drafts, each proposing a different approach to the same problem: how does an autonomous agent authenticate and obtain authorization? In the agent-to-agent communication space, 120 drafts propose protocols with no interoperability layer between them. We found 25 near-duplicate pairs where teams independently wrote essentially the same specification.
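The 25 near-duplicate pairs mentioned above come from pairwise embedding similarity over draft abstracts. A minimal sketch of that detection step, using toy 3-dimensional vectors in place of real model embeddings (the draft names and threshold are illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def near_duplicates(embeddings, threshold=0.9):
    """Return draft-name pairs whose embeddings exceed the similarity threshold."""
    pairs = []
    ids = list(embeddings)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                pairs.append((a, b))
    return pairs

# Toy vectors; the real pipeline embeds full abstracts.
emb = {
    "draft-a": [1.0, 0.1, 0.0],
    "draft-b": [0.9, 0.15, 0.05],
    "draft-c": [0.0, 1.0, 0.0],
}
print(near_duplicates(emb))  # [('draft-a', 'draft-b')]
```

The O(n^2) scan is fine at 361 drafts; at much larger corpus sizes an approximate nearest-neighbor index would replace the inner loop.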
**Concentration without diversity.** One organization -- Huawei -- accounts for 53 authors and 66 drafts, 18% of the entire corpus. A single 13-person team within Huawei co-authors 22 drafts at 94% internal cohesion. The broader Chinese institutional ecosystem (Huawei, China Mobile, China Telecom, China Unicom, Tsinghua University, ZTE, BUPT, CAICT, Zhongguancun Lab) collectively fields over 160 authors. Meanwhile, Google, Microsoft, and Apple are largely absent from AI agent protocol work. The standards that will govern how AI agents identify, authenticate, and communicate on the internet are being written by a remarkably narrow group.
**Capability without safety.** For every draft addressing AI safety, alignment, or human oversight, approximately four drafts build new agent capabilities. Only 44 of 361 drafts touch safety. Only 30 address human-agent interaction, compared to 120 A2A protocols and 93 autonomous network operations drafts. The three critical gaps we identified -- behavior verification, resource management, and error recovery -- all concern what happens when agents fail or misbehave. These gaps have received minimal attention.
---
## II. The Missing Architecture
The deepest problem is not fragmentation or concentration. It is the absence of connective tissue.
The 361 drafts contain the pieces of an agent ecosystem. What they lack is a shared model of how those pieces fit together. Consider what a deployed multi-agent system actually needs:
1. **An execution model**: How are agent tasks organized, sequenced, and tracked? What is the unit of work? How do dependencies between tasks get expressed? Today: no standard. Every draft assumes its own task model.
2. **Human oversight primitives**: When does a human need to approve, intervene, or override an agent's decision? How does the override propagate? How is the decision recorded for audit? Today: 30 drafts touch this, none define standard primitives.
3. **Error recovery and rollback**: When an autonomous agent makes a bad decision, how do you undo it? When a cascade of failures ripples through an agent network, how do you contain the blast radius? Today: one draft (draft-yue-anima-agent-recovery-networks) partially addresses this. The remaining 360 ignore it.
4. **Protocol interoperability**: With 120 competing A2A protocols, how does an agent speaking Protocol A communicate with an agent speaking Protocol B? Today: zero ideas in the entire corpus for cross-protocol translation. This gap is entirely unaddressed.
5. **Assurance profiles**: How does the same agent ecosystem work in a fast development environment (acceptable risk, minimal overhead) AND a regulated production environment (proofs, attestations, compliance)? Today: the discussion is split between safety-oriented drafts and capability-oriented drafts with no bridge between them.
These five needs map precisely to the five most critical and high-severity gaps in our analysis. They are not exotic requirements; they are the basic infrastructure that any production agent deployment will need. The fact that 361 drafts have been written without addressing them is the landscape's defining weakness.
---
## III. What 2027 Will Look Like: Three Scenarios
Based on current trajectories, three scenarios emerge for the IETF AI agent ecosystem over the next 18-24 months.
### Scenario A: Fragmentation Wins (most likely without intervention)
The current trajectory continues. Draft volume doubles again. The OAuth-for-agents cluster grows from 14 to 25+ proposals. No interoperability layer emerges. Working groups adopt a handful of individual drafts but not a cohesive architecture. Safety work remains a sideshow.
**Result**: Implementers face a multi-protocol landscape with no clear choices. Large platforms (those with the engineering resources to build their own stacks) proceed anyway, creating de facto standards through market power rather than consensus. The IETF's role diminishes to retroactively documenting what platforms already deployed.
**Probability without intervention**: High. This is the default path.
### Scenario B: Consolidation Through Working Groups
The IETF establishes one or more focused working groups specifically for AI agent architecture (not just individual protocols). These WGs force consolidation: the 14 OAuth proposals get down to 2-3. The 120 A2A protocols get mapped against a common requirements document. Gap-filling work gets explicitly chartered.
**Result**: A more coherent landscape emerges by mid-2027. Not a single standard, but a small number of complementary standards with defined interfaces between them. Safety work gets a mandate.
**Conditions required**: A champion organization (or coalition) willing to do the coordination work. A BoF or side meeting at an upcoming IETF meeting that gains enough momentum to charter a WG. Active participation from implementers (cloud providers, agent framework builders) who can provide deployment reality checks.
**Probability**: Moderate. The raw material exists -- 628 cross-org convergent ideas show that organizations already agree on the building blocks. What is needed is organizational will to connect them.
### Scenario C: Architecture-First Design
Someone -- a coalition of authors, a proposed WG, or an influential design team -- produces a holistic agent ecosystem architecture document. This document defines the execution model (DAG-based), the oversight primitives (HITL as first-class), the interoperability layer (protocol-agnostic bindings), and the assurance framework (dual regime from relaxed to regulated). Individual drafts then map themselves to roles within this architecture.
**Result**: The fastest path to a deployable agent infrastructure. The architecture does not replace existing drafts; it organizes them. The 5-draft ecosystem proposal (AEM/ATD/HITL/AEPB/APAE) outlined in our analysis represents one possible realization of this approach.
**Conditions required**: The architecture must build on work that already has momentum (WIMSE, ECT, SPIFFE). It must be protocol-agnostic -- prescribing the execution model and semantics, not the wire format. It must address the dual-regime problem (same model works in K8s and in regulated deployments).
**Probability**: Lower, but this is the scenario that produces the best outcome.
---
## IV. What Builders Should Do Today
For anyone building agent systems, deploying multi-agent workflows, or participating in IETF standards, the data suggests five concrete actions:
### 1. Watch the execution model space
The most critical missing piece is a shared execution model for agent tasks. Execution Context Tokens (ECT, draft-nennemann-wimse-ect) are the most promising candidate -- they define a JWT-based DAG for tracking task execution, building on WIMSE. If ECT gains WG adoption, it becomes the substrate on which orchestration, recovery, and audit are built. Monitor this draft.
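To make "a JWT-based DAG for tracking task execution" concrete, here is a hypothetical claims payload in that shape, plus a topological-order check (Kahn's algorithm) that a consumer would run to validate the graph. The claim names (`tasks`, dependency lists) are invented for illustration; they are not taken from the ECT draft:

```python
from collections import deque

# Hypothetical ECT-style payload: a DAG of tasks inside a JWT claims object.
claims = {
    "iss": "orchestrator.example",
    "tasks": {
        "fetch": [],            # task -> list of dependency tasks
        "analyze": ["fetch"],
        "report": ["analyze"],
    },
}

def topo_order(tasks):
    """Kahn's algorithm: a valid execution order exists iff the graph is acyclic."""
    indeg = {t: len(deps) for t, deps in tasks.items()}
    dependents = {t: [] for t in tasks}
    for t, deps in tasks.items():
        for d in deps:
            dependents[d].append(t)
    queue = deque(t for t, d in indeg.items() if d == 0)
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for nxt in dependents[t]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    return order if len(order) == len(tasks) else None  # None => cycle

print(topo_order(claims["tasks"]))  # ['fetch', 'analyze', 'report']
```

The point of the DAG substrate is exactly this: once task dependencies are explicit and machine-checkable, orchestration, rollback, and audit can all be built against the same structure.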
### 2. Build human oversight in now, not later
The 30-vs-120 human-agent-to-A2A ratio is not just a standards problem; it is an engineering problem. Systems being designed today without human override primitives will need to be retrofitted. The CHEQ protocol (draft-rosenberg-aiproto-cheq) and the LLM-assisted network management framework (draft-cui-nmrg-llm-nm) both propose HITL models. Pick one and build to it, or design your own -- but do not ship agent systems without override capability.
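"Do not ship agent systems without override capability" reduces, at minimum, to a gate that blocks high-risk actions on human approval and records every decision for audit. A generic sketch of that pattern (this is an illustrative design, not the CHEQ wire protocol or the nmrg framework):

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    risk: str  # "low" | "high"

@dataclass
class HITLGate:
    """Human-in-the-loop gate: high-risk actions block on an approver callback."""
    audit_log: list = field(default_factory=list)

    def execute(self, action, approver=None):
        if action.risk == "high":
            # No approver configured means fail closed, not open.
            approved = approver(action) if approver else False
            self.audit_log.append((action.name, "approved" if approved else "denied"))
            if not approved:
                return "blocked"
        else:
            self.audit_log.append((action.name, "auto"))
        return "executed"

gate = HITLGate()
print(gate.execute(Action("read-metrics", "low")))                     # executed
print(gate.execute(Action("rollback-prod", "high"), lambda a: False))  # blocked
```

The two design decisions worth copying regardless of protocol: the gate fails closed when no approver is reachable, and the audit record is written whether the action proceeds or not.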
### 3. Assume protocol diversity, design for translation
The 120-protocol landscape is not going to consolidate to one protocol. Design agent systems with protocol abstraction layers. Assume that agents in your ecosystem will eventually need to communicate with agents speaking different protocols. The gateway pattern (draft-agent-gw, draft-li-dmsc-macp) is emerging as the pragmatic solution.
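The "protocol abstraction layer" plus "gateway pattern" advice can be sketched as an interface that agents code against, with a gateway routing each target to the right concrete transport. All class and field names here are invented for illustration; neither draft referenced above defines this API:

```python
from abc import ABC, abstractmethod

class AgentTransport(ABC):
    """Abstraction layer: agent logic depends on this, never on a wire format."""
    @abstractmethod
    def send(self, target: str, payload: dict) -> dict: ...

class ProtocolA(AgentTransport):
    def send(self, target, payload):
        return {"proto": "A", "to": target, "body": payload}

class ProtocolB(AgentTransport):
    def send(self, target, payload):
        return {"p": "B", "dst": target, "msg": payload}

class Gateway(AgentTransport):
    """Gateway pattern: route by target, hiding which protocol each peer speaks."""
    def __init__(self, routes):
        self.routes = routes  # target -> concrete transport

    def send(self, target, payload):
        return self.routes[target].send(target, payload)

gw = Gateway({"agent-1": ProtocolA(), "agent-2": ProtocolB()})
print(gw.send("agent-2", {"task": "summarize"})["p"])  # B
```

Because the gateway itself implements `AgentTransport`, gateways compose: a local gateway can route some targets to another organization's gateway without the calling agent knowing.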
### 4. Invest in error recovery
The near-total absence of error recovery standards means you are on your own. Draft-yue-anima-agent-recovery-networks offers a task-oriented recovery framework; the ECT DAG model provides rollback semantics. Implement checkpointing and rollback in your agent workflows now. When the standards catch up, you will be ahead.
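"Implement checkpointing and rollback in your agent workflows now" can be as simple as snapshotting workflow state before each risky step. A minimal sketch, assuming state is a plain deep-copyable dict (real systems would persist checkpoints and handle external side effects, which this toy ignores):

```python
import copy

class CheckpointedWorkflow:
    """Snapshot/restore for in-memory agent workflow state; illustrative only."""
    def __init__(self, state):
        self.state = state
        self._checkpoints = []

    def checkpoint(self):
        # Deep copy so later mutations cannot corrupt the snapshot.
        self._checkpoints.append(copy.deepcopy(self.state))

    def rollback(self):
        if not self._checkpoints:
            raise RuntimeError("no checkpoint to roll back to")
        self.state = self._checkpoints.pop()

wf = CheckpointedWorkflow({"step": 0, "results": []})
wf.checkpoint()
wf.state["step"] = 1
wf.state["results"].append("bad-output")
wf.rollback()  # undo the bad step
print(wf.state)  # {'step': 0, 'results': []}
```

The hard part a standard would have to add, and this sketch does not, is compensating for actions with external effects: a rollback semantics (as in the ECT DAG model) has to say what "undo" means when the step already called another agent.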
### 5. Participate in the standards process
The landscape's concentration problem is only solved by broader participation. If your organization deploys AI agents, you have a stake in how these standards develop. The most impactful contribution right now is gap-filling: behavior verification, resource management, error recovery, and cross-protocol translation. These are areas where new drafts would face minimal competition and maximal impact.
---
## V. The 2028 Endgame
Two years from now, the IETF AI agent landscape will have resolved into one of two equilibria.
In the first equilibrium, the landscape looks like today's microservices ecosystem: a chaotic but functional collection of protocols, libraries, and frameworks, held together by platform-specific integrations and de facto standards from the largest cloud providers. The IETF's work exists but is incomplete, and the real interoperability happens at higher layers (agent frameworks like LangChain, Semantic Kernel, or their successors). Safety is bolted on after deployment.
In the second equilibrium, the landscape looks more like the web: a layered architecture where identity (like TLS), communication (like HTTP), and semantics (like HTML) are cleanly separated, with standardized interfaces between them. Agents identify via WIMSE, execute via ECT-based DAGs, communicate via protocol-agnostic bindings, and operate under assurance profiles that scale from development to regulated production. Safety is built in, not bolted on.
The data we have analyzed -- 361 drafts, 628 cross-org convergent ideas, 12 gaps, 18 team blocs -- contains the building blocks for the second equilibrium. The question is whether the IETF community organizes itself to assemble them before market reality imposes the first.
The history of internet standards suggests that both happen: a messy market reality emerges first, followed by standards that rationalize and improve it. The web started with browser wars and incompatible HTML, then converged on HTML5. Mobile started with a zoo of protocols, then converged on LTE/5G. The AI agent ecosystem may follow the same path.
But the gap between "messy first deployment" and "rationalized standards" matters enormously for safety. When the thing being standardized is autonomous software that makes decisions, executes actions, and interacts with humans and infrastructure, getting the safety architecture wrong during the messy phase has consequences that are harder to fix retroactively.
The 4:1 ratio is the number to watch. If it narrows -- if safety and oversight work accelerates to match capability work -- the second equilibrium becomes achievable. If it stays at 4:1 or widens, the first equilibrium is where we land, and the safety work becomes remediation rather than prevention.
The drafts are being written. The race is on. The outcome depends on whether coordination catches up to creativity.
---
*Analysis based on 361 IETF Internet-Drafts, 557 authors, 628 cross-org convergent ideas, and 12 identified gaps, current as of March 2026. Written by the Architect agent as input for the blog series and as a standalone reference document.*