Fix blog accuracy and add methodology documentation

Blog posts (all 10 files updated):
- Update all counts to match DB: 434 drafts, 557 authors, 419 ideas, 11 gaps
- Fix EU AI Act timeline to August 2026 (5 months, not 18)
- Reframe growth claim from "36x" to actual monthly figures (5→61→85)
- Add safety ratio nuance (1.5:1 to 21:1 monthly variation)
- Fix composite scores (4.8→4.75, 4.6→4.5)
- Add OAuth/GDPR consent distinction (Art. 6(1)(a), Art. 28)
- Add EU AI Act Annex III + MDR context to hospital scenario
- Add FIPA, IEEE P3394, eIDAS 2.0 references
- Add GDPR gap paragraph (DPIA, erasure, portability, purpose limitation)
- Rewrite Post 04 gap table to match actual DB gap names

Methodology:
- Expand methodology.md: pipeline docs, limitations, related work
- Add LLM-as-judge caveats and explicit rating rubric to analyzer.py
- Add clustering threshold rationale to embeddings.py
- Add gap analysis grounding notes to analyzer.py
- Add Limitations section to Post 07

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
commit f1a0b0264c (parent 439424bd04)
Date: 2026-03-08 11:04:40 +01:00
11 changed files with 169 additions and 144 deletions

@@ -1,14 +1,14 @@
# Drawing the Big Picture: What the Agent Ecosystem Actually Needs
-*361 drafts, 628 cross-org convergent ideas, 12 gaps -- and the architectural vision that connects them all.*
+*434 drafts, 628 cross-org convergent ideas, 11 gaps -- and the architectural vision that connects them all.*
---
-We have spent five posts documenting a paradox: the IETF's AI agent landscape has extraordinary breadth (361 drafts), deep fragmentation at every level (96% of ideas appear in only one draft, 120 competing A2A protocols, 14 OAuth proposals), concentrated authorship (18 team blocs, one company writing 18% of all drafts), and critical gaps (behavior verification, error recovery, human override) that nobody is filling.
+We have spent five posts documenting a paradox: the IETF's AI agent landscape has extraordinary breadth (434 drafts), deep fragmentation at every level (the vast majority of ideas appear in only one draft, 155 competing A2A protocols, 14 OAuth proposals), concentrated authorship (18 team blocs, one company writing ~16% of all drafts), and critical gaps (behavioral verification, failure cascade prevention, human override) that nobody is filling.
The landscape has quantity. It lacks architecture.
-This post is about what the architecture looks like -- not in theory, but derived from the data. The 12 gaps are not random absences; they are structurally related. The convergent ideas contain the components; they need a blueprint. And the blueprint already has a foundation: existing IETF work on workload identity (SPIFFE/WIMSE) and execution evidence (Execution Context Tokens) provides the lower layers. What is missing is what goes on top.
+This post is about what the architecture looks like -- not in theory, but derived from the data. The 11 gaps are not random absences; they are structurally related. The convergent ideas contain the components; they need a blueprint. And the blueprint already has a foundation: existing IETF work on workload identity (SPIFFE/WIMSE) and execution evidence (Execution Context Tokens) provides the lower layers. What is missing is what goes on top.
## What the Ecosystem Needs: Four Pillars
@@ -26,13 +26,13 @@ Every multi-agent workflow is a directed acyclic graph: tasks with dependencies,
The Execution Context Token (ECT) from [draft-nennemann-wimse-ect](https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/) provides the evidence layer: each task produces a signed token linked to its predecessors via parent references, forming a verifiable DAG. What is missing is the orchestration semantics: when to checkpoint, how to roll back, how to contain cascading failures.
-The data supports this: the 6 ideas addressing error recovery (all from [draft-yue-anima-agent-recovery-networks](https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/)) include "Task-Oriented Multi-Agent Recovery Framework" and "State Consistency Management" -- DAG concepts by another name. The 117 ideas touching resource management need a graph-aware scheduler. The answer is the same structure: a DAG execution model.
+The data supports this: the limited work addressing error recovery (notably [draft-yue-anima-agent-recovery-networks](https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/)) includes "Task-Oriented Multi-Agent Recovery Framework" and "State Consistency Management" -- DAG concepts by another name. The answer is the same structure: a DAG execution model.
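What the execution model implies can be sketched in a few lines. The `Task`/`run_dag` names and the checkpoint-list shape below are illustrative assumptions; no draft specifies this API. Each completed task is checkpointed (the point where a signed ECT would be emitted), and a failure stops execution at the last checkpoint rather than cascading:

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Illustrative sketch only -- no IETF draft defines this API.

@dataclass
class Task:
    name: str
    deps: List[str] = field(default_factory=list)   # predecessor task names
    action: Callable[[], None] = lambda: None

def run_dag(tasks):
    """Run tasks in dependency order; return names completed, in order."""
    pending = {t.name: t for t in tasks}
    done, checkpoints = set(), []
    while pending:
        ready = [t for t in pending.values()
                 if all(d in done for d in t.deps)]
        if not ready:
            raise ValueError("cycle or missing dependency")
        for t in ready:
            try:
                t.action()
            except Exception:
                return checkpoints   # containment: stop at last checkpoint
            done.add(t.name)
            checkpoints.append(t.name)   # a signed ECT would be emitted here
            del pending[t.name]
    return checkpoints

order = run_dag([Task("fetch"), Task("plan", ["fetch"]), Task("act", ["plan"])])
```

Rollback then means replaying from the last checkpoint rather than restarting the whole workflow -- exactly the "State Consistency Management" idea under another name.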
### Pillar 2: Human-in-the-Loop as First Class
**The gap it fills**: Human Override and Intervention (High), Agent Explainability (Medium)
-Only **30 human-agent interaction drafts** exist against **120 A2A protocols** and **93 autonomous operations** drafts. Agents are being designed to talk to each other, not to humans. The CHEQ protocol ([draft-rosenberg-aiproto-cheq](https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/)) is a rare exception -- it defines human confirmation *before* agent execution. But nobody has standardized what happens *during* execution: how a human pauses a running workflow, constrains an agent's scope, takes over a task, or issues an emergency stop.
+Only **34 human-agent interaction drafts** exist against **155 A2A protocol** drafts and **114 autonomous operations** drafts. Agents are being designed to talk to each other, not to humans. The CHEQ protocol ([draft-rosenberg-aiproto-cheq](https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/)) is a rare exception -- it defines human confirmation *before* agent execution. But nobody has standardized what happens *during* execution: how a human pauses a running workflow, constrains an agent's scope, takes over a task, or issues an emergency stop.
Human-in-the-loop must be a node type in the execution DAG, not an afterthought. The architecture needs:
- **Approval gates**: DAG nodes that block until a human approves
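An approval gate as a first-class node reduces to a small state machine. The class and method names below are invented for illustration -- CHEQ and the drafts above define no such API:

```python
# Hypothetical sketch: a human approval gate that blocks downstream DAG
# nodes until a decision is recorded through a separate human channel.

class ApprovalGate:
    PENDING, APPROVED, DENIED = "pending", "approved", "denied"

    def __init__(self, prompt: str):
        self.prompt = prompt          # shown to the human approver
        self.state = self.PENDING

    def decide(self, approved: bool):
        if self.state != self.PENDING:
            raise RuntimeError("gate already decided")
        self.state = self.APPROVED if approved else self.DENIED

    def passable(self) -> bool:
        """Downstream nodes may run only once this returns True."""
        return self.state == self.APPROVED

gate = ApprovalGate("Agent requests write access to the billing API")
paused = not gate.passable()    # workflow blocks here
gate.decide(True)               # human approves out-of-band
resumed = gate.passable()
```

The point is structural: because the gate is a node, the orchestrator needs no special case for humans -- approval, takeover, and emergency stop are just node types with different transition rules.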
@@ -46,9 +46,9 @@ The irony: every production deployment will require these primitives. The standa
**The gap it fills**: Cross-Protocol Translation (High, zero ideas), Agent Lifecycle Management (High)
-The 120 A2A protocol drafts will never converge to a single winner. MCP, A2A Protocol, SLIM, and dozens of others will coexist, each with different strengths. The answer is not to pick one; it is to build a translation layer that lets agents using different protocols interoperate through gateways.
+The 155 A2A protocol drafts will never converge to a single winner. MCP, A2A Protocol, SLIM, and dozens of others will coexist, each with different strengths. The answer is not to pick one; it is to build a translation layer that lets agents using different protocols interoperate through gateways.
-This gap has **zero ideas** in the current corpus -- the starkest absence across 361 drafts. No team is working on it. Yet it is perhaps the most important architectural piece: without protocol interoperability, the agent ecosystem fragments into vendor-locked silos.
+This gap has **zero ideas** in the current corpus -- the starkest absence across 434 drafts. No team is working on it. Yet it is perhaps the most important architectural piece: without protocol interoperability, the agent ecosystem fragments into vendor-locked silos.
The protocol binding layer would define:
- How agents advertise which ecosystem features they support
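Capability advertisement can be sketched as plain data. The field names are assumptions -- no binding-layer specification exists yet, which is precisely the gap:

```python
# Invented field names for illustration; no draft defines this document.

planner = {
    "agent": "spiffe://example.org/agent/planner",
    "protocols": {"mcp/2025-03", "a2a/1.0"},
    "features": {"hitl_gates", "ect_evidence"},
}
executor = {
    "agent": "spiffe://example.org/agent/executor",
    "protocols": {"a2a/1.0", "slim/0.9"},
    "features": {"ect_evidence"},
}

def direct_interop(a: dict, b: dict) -> bool:
    """Direct interop requires at least one shared protocol; otherwise a
    gateway must translate between the two stacks."""
    return bool(a["protocols"] & b["protocols"])
```

Here `planner` and `executor` interoperate directly via `a2a/1.0`; an agent speaking only MCP would need a gateway -- the translation layer this pillar calls for.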
@@ -75,7 +75,7 @@ The architecture achieves this with *assurance profiles* -- named configurations
| L2 | Signed ECTs (JWT) | Cross-org, standard compliance |
| L3 | Signed ECTs + external audit ledger | Regulated industries |
-This dual-regime approach resolves the tension between "move fast" deployments and "prove everything" regulated environments. The 52 ideas touching behavior verification and the 79 ideas touching data provenance become implementable at higher assurance levels without imposing their cost on every deployment.
+This dual-regime approach resolves the tension between "move fast" deployments and "prove everything" regulated environments. Ideas touching behavior verification and data provenance become implementable at higher assurance levels without imposing their cost on every deployment.
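Expressed as configuration, the profiles reduce to a small lookup (only L2 and L3 are visible in the table above). The key names here are assumptions, not from any draft:

```python
# Illustrative assurance-profile lookup; shape and keys are invented.

ASSURANCE_PROFILES = {
    "L2": {"ect": "signed-jwt", "audit_ledger": False},   # cross-org
    "L3": {"ect": "signed-jwt", "audit_ledger": True},    # regulated
}

def deployment_requirements(level: str) -> dict:
    """Return what a deployment at a given assurance level must provide."""
    try:
        return ASSURANCE_PROFILES[level]
    except KeyError:
        raise ValueError(f"unknown assurance profile: {level}")
```

A deployment names its profile once; the orchestrator derives everything else (token signing, ledger writes) from the lookup, which is what keeps the "move fast" path cheap.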
## How It Builds on What Exists
@@ -90,7 +90,7 @@ The architecture adds connective tissue above this layer, not below it:
| **Identity** | SPIFFE (workload identifier), WIMSE (security context propagation) | Nothing -- use existing identity |
| **Evidence** | ECT (execution context tokens, DAG linking) | Orchestration semantics, checkpoint/rollback, HITL nodes |
| **Auth** | OAuth 2.0, SCIM, DAAP, STAMP, Agentic JWT | Protocol binding so any auth approach works |
-| **Communication** | MCP, A2A, SLIM, 120 other protocols | Translation layer and capability advertisement |
+| **Communication** | MCP, A2A, SLIM, 155 other protocols | Translation layer and capability advertisement |
| **Safety** | DAAP (accountability), verifiable conversations, VERA (zero-trust) | Assurance profiles connecting these into deployable configurations |
The proposed five-draft ecosystem:
@@ -105,7 +105,7 @@ Each draft addresses specific gaps. Together, they provide the connective tissue
## Traction vs. Aspiration
-A reality check: of the 361 drafts, only **36 (10%)** have been adopted by IETF working groups. The rest are individual submissions -- proposals without institutional backing. The WG-adopted drafts score higher on average (**3.54 vs. 3.31**), particularly on maturity (+1.28) and momentum (+0.98), but lower on novelty (-0.45). *(Note: scores are LLM-generated relative rankings from abstracts; see [Methodology](../methodology.md).)* The WGs that have adopted the most agent-relevant drafts are security-focused: **lamps** (6 drafts), **lake** (5), **tls** (3), **emu** (3). Agent-specific WGs like `aipref` have adopted only 2 drafts.
+A reality check: of the 434 drafts, **52 (12%)** have been adopted by IETF working groups. The rest are individual submissions -- proposals without institutional backing. The WG-adopted drafts score higher on average (**3.61 vs. 3.23**, 4-dimension composite), particularly on maturity (+1.28) and momentum (+0.98), but lower on novelty (-0.45). *(Note: scores are LLM-generated relative rankings from abstracts; see [Methodology](../methodology.md).)* The WGs that have adopted the most agent-relevant drafts are security-focused: **lamps** (6 drafts), **lake** (5), **tls** (3), **emu** (3). Agent-specific WGs like `aipref` have adopted only 2 drafts.
This reveals a structural insight: the IETF is not building agent standards from scratch. It is **retrofitting security standards for agents**. The agent architecture we propose above would need to work within this reality -- building on the security WGs' infrastructure rather than competing with it.
@@ -117,7 +117,7 @@ Based on the data trajectories and current momentum:
**Within 12 months**: The DMSC side meeting's gateway work will produce a specification, likely gateway-centric with Agent Gateways as the primary interoperability mechanism. This is not the protocol-agnostic translation layer the ecosystem needs, but it will be the first concrete interop proposal.
-**Within 18 months**: The safety deficit will begin to close -- not from IETF drafts but from regulatory pressure. The EU AI Act's requirements for high-risk AI systems will drive demand for behavior verification, human override, and audit standards. The IETF will respond reactively.
+**Within 5 months (August 2026)**: The EU AI Act (Regulation 2024/1689), which entered into force on 1 August 2024, becomes fully applicable on 2 August 2026. Its requirements for high-risk AI systems -- including mandatory risk management (Art. 9), human oversight (Art. 14), record-keeping (Art. 12), and accuracy/robustness (Art. 15) -- will drive immediate demand for behavior verification, human override, and audit standards. Non-compliance carries penalties up to 35 million EUR or 7% of global annual turnover (Art. 99). This is not future regulatory pressure; it is current law with imminent enforcement. The safety deficit is simultaneously a technical gap and a compliance gap for any agent system deployed in the EU.
**The risk**: If the architecture work does not happen in the next 12 months, the agent ecosystem will calcify around vendor-specific protocol stacks (OpenAI's, Google's, Anthropic's, Huawei's). Each will have its own auth, discovery, and communication layer. The interoperability window will close, and the IETF's work will be standards for islands rather than standards for the internet.
@@ -149,9 +149,9 @@ If you are building agent systems and cannot wait for standards to mature:
Across six posts, we have built to one argument:
-**The IETF's AI agent standardization effort is the largest, fastest-growing, and most consequential standards race in a decade. But it is building the highways before the traffic lights.** The data shows explosive growth (from 0.5% to 9.3% of all IETF submissions in 15 months), deep fragmentation (120 competing A2A protocols), concerning concentration (one company writes 18% of all drafts), and a structural safety deficit (4:1 capability to guardrails). What is missing is not more protocols -- it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles that work from development to regulated production.
+**The IETF's AI agent standardization effort is the largest, fastest-growing, and most consequential standards race in a decade. But it is building the highways before the traffic lights.** The data shows explosive growth (from 0.5% to 9.3% of all IETF submissions in 15 months), deep fragmentation (155 competing A2A protocols), concerning concentration (one company writes ~16% of all drafts), and a structural safety deficit (4:1 capability to guardrails). What is missing is not more protocols -- it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles that work from development to regulated production.
-The 75 convergent ideas -- and the broader set of 628 cross-org overlaps -- contain the components for this architecture. The question is whether the community can assemble them before the protocols ship without it. The convergence data suggests it is possible: **180 ideas already cross the Chinese-Western divide**, mediated largely by European telecoms (Deutsche Telekom, Telefonica, Orange) that operate in both markets and appear on both sides of nearly every major cross-cultural convergent idea. The bridge-builders exist. They need an architecture to bridge to.
+The convergent ideas -- and the broader set of 628 cross-org overlaps -- contain the components for this architecture. The question is whether the community can assemble them before the protocols ship without it. The convergence data suggests it is possible: **180 ideas already cross the Chinese-Western divide**, mediated largely by European telecoms (Deutsche Telekom, Telefonica, Orange) that operate in both markets and appear on both sides of nearly every major cross-cultural convergent idea. The bridge-builders exist. They need an architecture to bridge to.
The IETF has built the internet's infrastructure before. DNS, HTTP, TLS -- each emerged from periods of competing proposals, fragmentation, and coordinated resolution. The AI agent standards race is following the same pattern, on a compressed timeline, with higher stakes.
@@ -163,12 +163,12 @@ The traffic lights need to catch up to the highways. The data says they can -- i
- **Four missing pillars**: DAG-based execution, human-in-the-loop primitives, protocol-agnostic interoperability, and assurance profiles for dual-regime deployment
- **The architecture builds on existing work**: SPIFFE for identity, WIMSE for security context, ECT for execution evidence -- the foundation exists
-- **Five proposed drafts** (AEM, ATD, HITL, AEPB, APAE) would fill the 12 gaps by providing connective tissue between existing protocol proposals
+- **Five proposed drafts** (AEM, ATD, HITL, AEPB, APAE) would fill the 11 gaps by providing connective tissue between existing protocol proposals
- **The interoperability window is closing**: vendor-specific agent stacks are forming; the next 12 months are critical for open standards
- **For builders today**: design for DAGs, build HITL from the start, make assurance configurable, avoid protocol lock-in
-*Next in this series: [How We Built This](07-how-we-built-this.md) -- the methodology behind analyzing 361 IETF drafts with Claude, Ollama, and Python.*
+*Next in this series: [How We Built This](07-how-we-built-this.md) -- the methodology behind analyzing 434 IETF drafts with Claude, Ollama, and Python.*
---
-*Synthesis based on the full IETF Draft Analyzer dataset: 361 drafts, 557 authors, 75 cross-draft convergent ideas (628 via fuzzy matching), 12 gaps, 18 team blocs, 42 overlap clusters. Data current as of March 2026.*
+*Synthesis based on the full IETF Draft Analyzer dataset: 434 drafts, 557 authors, 628 cross-org convergent ideas (via fuzzy matching), 11 gaps, 18 team blocs. Data current as of March 2026.*