Christian Nennemann f1a0b0264c Fix blog accuracy and add methodology documentation

Blog posts (all 10 files updated):
- Update all counts to match DB: 434 drafts, 557 authors, 419 ideas, 11 gaps
- Fix EU AI Act timeline to August 2026 (5 months, not 18)
- Reframe growth claim from "36x" to actual monthly figures (5→61→85)
- Add safety ratio nuance (1.5:1 to 21:1 monthly variation)
- Fix composite scores (4.8→4.75, 4.6→4.5)
- Add OAuth/GDPR consent distinction (Art. 6(1)(a), Art. 28)
- Add EU AI Act Annex III + MDR context to hospital scenario
- Add FIPA, IEEE P3394, eIDAS 2.0 references
- Add GDPR gap paragraph (DPIA, erasure, portability, purpose limitation)
- Rewrite Post 04 gap table to match actual DB gap names

Methodology:
- Expand methodology.md: pipeline docs, limitations, related work
- Add LLM-as-judge caveats and explicit rating rubric to analyzer.py
- Add clustering threshold rationale to embeddings.py
- Add gap analysis grounding notes to analyzer.py
- Add Limitations section to Post 07

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-08 11:04:40 +01:00


Where 434 Drafts Converge (And Where They Don't)

The fragmentation goes deeper than competing protocols. It extends all the way down to the idea level.

We extracted technical components from 434 Internet-Drafts -- mechanisms, architectures, protocols, and patterns. Then we asked: how many of these ideas are proposed by more than one draft?

The current database contains 419 extracted ideas across 377 drafts. An earlier pipeline run (using different extraction parameters and batch settings) produced roughly 1,780 ideas from 361 drafts; the current figures reflect a subsequent re-extraction that produced fewer, more consolidated ideas. The exact count depends on the extraction prompt, batching strategy, and deduplication threshold, so absolute numbers should be read with that limitation in mind. What is robust across both runs is the pattern: the vast majority of extracted ideas appear in exactly one draft, and only a handful show cross-draft convergence by exact title matching. The fragmentation documented in the previous posts -- 14 competing OAuth proposals, 155 A2A protocols with no interop layer -- is therefore not just a protocol-level problem. At the idea level, the landscape is overwhelmingly a collection of islands.

But islands are not the whole story. Using fuzzy matching across organizational boundaries, we found 628 ideas where different organizations are working on recognizably similar problems -- even when they use different names and different approaches. (This figure comes from the earlier, larger extraction run of roughly 1,780 ideas; we would expect a comparable analysis on the current data to yield a proportionally similar convergence rate.) These cross-org convergence signals are the embryonic consensus of the agent standards landscape: the problems that different teams, in different countries, with different agendas, independently recognize and attempt to solve.

These convergence signals are more impressive than they first appear. Recall from Post 2 that 55% of all drafts have never been revised beyond their first submission, and 65% of Huawei's drafts are fire-and-forget. The ideas that converge across organizations are not the generic scaffolding of first-draft submissions -- they represent genuine engineering investment from teams that independently identified the same problem and committed resources to solving it.

The picture that emerges is paradoxical: the raw material for a complete agent ecosystem exists. The convergent ideas point toward the architecture the ecosystem needs. But they exist in isolation -- proposed by separate teams, embedded in separate drafts, with no connective tissue linking them into a coherent blueprint.

The Taxonomy

Every extracted idea was classified by type. The distribution reveals what kind of thinking dominates the landscape:

Type	Count	Share	What It Means
Protocol	96	23%	Full protocol specifications
Architecture	95	23%	System designs and reference models
Extension	79	19%	Additions to existing standards (OAuth, SCIM, DNS)
Mechanism	68	16%	Concrete technical solutions: auth flows, routing algorithms, token formats
Requirement	42	10%	Formal requirement documents
Pattern	35	8%	Reusable design approaches
Framework	3	1%	Frameworks, profiles
Format	1	<1%	Data format specifications

Note: These counts reflect the current database (419 ideas). An earlier pipeline run with different extraction parameters produced higher counts across all categories; the relative proportions are more meaningful than the absolute numbers.
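The shares in the table can be rechecked from the counts alone. The snippet below is a minimal sketch of that arithmetic; the names `TYPE_COUNTS` and `type_shares` are ours for illustration and are not taken from the pipeline.

```python
# Counts copied from the taxonomy table above.
TYPE_COUNTS = {
    "Protocol": 96,
    "Architecture": 95,
    "Extension": 79,
    "Mechanism": 68,
    "Requirement": 42,
    "Pattern": 35,
    "Framework": 3,
    "Format": 1,
}


def type_shares(counts):
    """Return each type's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total_ideas = sum(TYPE_COUNTS.values())
shares = type_shares(TYPE_COUNTS)

# Print a small table: type, count, share.
for name, pct in shares.items():
    print(f"{name:<13}{TYPE_COUNTS[name]:>4}{pct:>7.1f}%")
print(f"{'Total':<13}{total_ideas:>4}")
```

The counts sum to the 419 ideas quoted in the note, and rounding each share to a whole percent reproduces the Share column.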

The near-equal counts of protocols (96) and architectures (95), with extensions (79) close behind, tell us the community is both building new solutions and extending existing ones. The protocols and extensions show that much of the work builds on established foundations (OAuth 2.0, SCIM, DNS, EDHOC) rather than starting from scratch.

The 95 architectures and 42 requirements suggest healthy standards development: teams are defining reference models before writing code. But the 35 patterns -- reusable approaches without full protocol specification -- indicate that some teams have identified what needs to be done without committing to how.

Where Teams Converge

By exact title, few ideas appear in multiple drafts. But ideas with different names often describe the same concept -- "Agent Gateway" in one draft and "Inter-Agent Communication Hub" in another. Our fuzzy-matching overlap analysis (SequenceMatcher at a 0.75 threshold, run on the earlier, larger extraction) found 628 ideas where two or more distinct organizations are working on recognizably similar problems. These are the genuine consensus signals.
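The fuzzy-matching step can be sketched with Python's standard-library `difflib.SequenceMatcher` and the 0.75 threshold mentioned above. This is an illustration under assumptions, not the pipeline's code: the function names, the choice to compare lower-cased titles, and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

THRESHOLD = 0.75  # the threshold quoted in the text


def similarity(a, b):
    """Case-insensitive SequenceMatcher ratio between two idea titles."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cross_org_matches(ideas, threshold=THRESHOLD):
    """Return (title_a, title_b, score) for similar titles from different orgs.

    `ideas` is a list of (title, organization) tuples.
    """
    matches = []
    for (title_a, org_a), (title_b, org_b) in combinations(ideas, 2):
        if org_a == org_b:
            continue  # only cross-organization overlap counts
        score = similarity(title_a, title_b)
        if score >= threshold:
            matches.append((title_a, title_b, round(score, 2)))
    return matches


# Illustrative records only; titles and org labels are invented for this sketch.
sample_ideas = [
    ("Agent Communication Gateway", "Org A"),
    ("Agent Communications Gateway", "Org B"),
    ("Agent Communication Gateway v2", "Org A"),
    ("Token Compression Layer", "Org C"),
]

matches = cross_org_matches(sample_ideas)
for title_a, title_b, score in matches:
    print(f"{score:.2f}  {title_a!r} ~ {title_b!r}")
```

Character-level ratios reward shared wording. Two titles that describe the same concept in very different words (the "Agent Gateway" / "Inter-Agent Communication Hub" pair, for instance) score well below 0.75 on titles alone, so what text is fed to the matcher matters as much as the threshold. The pairwise loop is also quadratic in the number of ideas, which is fine for a few thousand records.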

Idea	Orgs	Drafts	Key Organizations
A2A Communication Paradigm	8	5	CAICT, Deutsche Telekom, Huawei, Orange, Telefonica
AI Agent Network Architecture	8	5	China Mobile, Deutsche Telekom, Huawei, Orange, UnionPay
Multi-Agent Communication Protocol	7	8	AsiaInfo, BUPT, China Mobile, China Telecom, Huawei
AI Agent Communication Network (ACN)	7	5	ANP Open Source, China Mobile, Cisco, Five9, Huawei
NLIP (Natural Language Interchange)	7	1	Fordham, IBM, Purdue, ServiceNow, eBay
ELA Protocol	6	6	Bitwave, Cisco, Ericsson, Five9, Inria
AI Gateway	6	4	AsiaInfo, BUPT, China Telecom, Huawei, UnionPay
Agent Communication across WAN	6	3	China Mobile, China Unicom, Deutsche Telekom, Huawei, Orange

The most-converged idea -- "A2A Communication Paradigm" -- draws independent contributions from 8 organizations across 5 countries. This is simultaneously the strongest convergence signal and the strongest fragmentation signal. Eight organizations agree this is important. They are building separate, incompatible versions.

Look at who bridges the divide. In three of the top eight convergent ideas, the same names appear alongside Chinese institutions: Deutsche Telekom, Telefonica, and Orange. These European telecoms show up in "A2A Communication Paradigm," "AI Agent Network Architecture," and "Agent Communication across WAN" -- each time co-listed with Huawei, China Mobile, or China Unicom. Of the 180 ideas that cross the Chinese-Western organizational divide, European telecoms are present on a disproportionate share. The organizations most likely to prevent the agent ecosystem from splitting into incompatible regional stacks are not Google or Microsoft -- they are European carriers operating in both markets. US Big Tech is almost entirely absent from cross-divide convergence.

The organization-pair overlaps reveal where real collaboration happens -- and where it does not:

Org Pair	Shared Ideas	Signal
China Unicom -- Huawei	32	Deep intra-bloc alignment
China Mobile -- Huawei	27	Deep intra-bloc alignment
Ericsson -- Inria	21	European cross-org collaboration
Tsinghua -- Zhongguancun Lab	20	Chinese academic convergence
Fraunhofer SIT -- Tradeverifyd	10	Verifiable records niche

The pattern is stark: the highest-overlap pairs are mostly Chinese institutions working within established blocs (three of the top four). Formal co-authorship between Chinese and Western organizations is thin -- but idea-level convergence, mediated by European telecoms operating in both markets, is broader than the co-authorship data suggests.
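An organization-pair table like the one above can be derived by counting, for each pair of organizations, how many convergent ideas list both. The sketch below assumes each idea has already been resolved to a set of organizations; the function name and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def shared_idea_counts(idea_orgs):
    """Count, for every pair of organizations, the ideas both are attached to.

    `idea_orgs` maps an idea label to the set of organizations working on it.
    """
    pair_counts = Counter()
    for orgs in idea_orgs.values():
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        for pair in combinations(sorted(orgs), 2):
            pair_counts[pair] += 1
    return pair_counts


# Illustrative input: idea labels and memberships are invented for this sketch.
sample = {
    "idea-1": {"Org A", "Org B", "Org C"},
    "idea-2": {"Org A", "Org B"},
    "idea-3": {"Org D", "Org E"},
    "idea-4": {"Org A"},
}

pair_counts = shared_idea_counts(sample)
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

Ideas attached to a single organization contribute nothing, which is why a high pair count signals sustained overlap rather than sheer output volume.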

The convergence signals cluster in three areas:

1. Agent communication infrastructure. How agents discover, connect to, and message each other. This is the most active area with the most redundant proposals. The underlying need is clear; the implementation is contested.

2. Authentication and authorization. Action-based authorization, agent registration, cryptographic identity verification. OAuth extensions dominate, but the approaches diverge significantly between pure OAuth extension (add claims/scopes) and novel frameworks (DAAP accountability protocol, STAMP delegation proofs).

3. Network architecture. Agent gateways, agent communication networks, network management architectures. This is where the Chinese institutional ecosystem has the strongest presence, with Huawei and affiliated organizations producing most of the architecture ideas.

Where Teams Innovate

The 96% of ideas appearing in only one draft are a mix: mostly generic components describing what each draft does ("Agent Gateway," "Transport Configuration System"), but scattered among them are genuinely novel proposals that no other team has attempted because they are too new, too specialized, or ahead of their time.

Some standouts from the unique ideas:

Verifiable Agent Behavior Attestation (draft-birkholz-verifiable-agent-conversations) -- A CDDL-based format for cryptographically signing agent conversation records, enabling post-hoc verification of agent behavior. This directly addresses the critical behavior verification gap.

ADOL: Agentic Data Optimization Layer (draft-chang-agent-token-efficient, score 4.5) -- Addresses token bloat in agent communication protocols. As agents exchange increasingly complex context, message sizes explode. ADOL compresses agent communications by 60-80%, a practical necessity that nobody else is working on.

Working Memory (draft-agent-gw) -- A structured context management system that maintains state across multi-step agent operations. Sounds basic -- but no other draft proposes a standard for how agents should manage persistent operational context.

Autonomous Optical Network Operation (draft-zhao-ccamp-actn-optical-network-agent) -- Applies agent architecture to the specific domain of optical network management. This is the kind of vertical specialization that validates the horizontal agent architecture work.

Execution Context Token (ECT) (draft-nennemann-wimse-ect, score 4.0) -- A JWT extension that records what each task did, linked to predecessors via a DAG. This is arguably the single most architecturally significant idea in the corpus: it turns the execution history of a multi-agent workflow into a cryptographically verifiable directed acyclic graph. It is the technical foundation for accountability, rollback, audit, and provenance.

CHEQ Protocol (draft-rosenberg-aiproto-cheq, score 3.9) -- Human confirmation of agent decisions before execution. The only concrete protocol proposal for human-in-the-loop agent oversight. In a landscape of 30 human-agent interaction drafts, CHEQ stands alone as an implementable solution.
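To make the ECT idea of predecessor-linked execution records concrete, here is a minimal sketch of the graph side only. It does not follow the draft's token format and does no signing or JWT handling; the record shape (a task id mapped to a list of predecessor ids) and the function names are assumptions for illustration. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def audit_order(records):
    """Return task ids so that every predecessor precedes its successors.

    `records` maps a task id to the list of predecessor task ids it claims.
    Raises ValueError if a predecessor is unknown or the links form a cycle.
    """
    for task_id, preds in records.items():
        missing = [p for p in preds if p not in records]
        if missing:
            raise ValueError(f"{task_id} references unknown predecessors: {missing}")
    try:
        return list(TopologicalSorter(records).static_order())
    except CycleError as exc:
        raise ValueError("execution records do not form a DAG") from exc


def provenance(records, task_id):
    """Return every task id that `task_id` transitively depends on."""
    seen = set()
    stack = list(records[task_id])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(records[current])
    return seen


# Hypothetical workflow: two agents fetch data, a third merges, a fourth reports.
workflow = {
    "fetch-a": [],
    "fetch-b": [],
    "merge": ["fetch-a", "fetch-b"],
    "report": ["merge"],
}

order = audit_order(workflow)
report_inputs = provenance(workflow, "report")
print(order)
print(sorted(report_inputs))
```

An auditor holding such records could replay the workflow in `order`, or ask which earlier tasks a given result depends on. In a real deployment each record would also need to be cryptographically bound to its predecessors, which this sketch leaves out.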

The Five Ideas That Matter Most

If you are building agent systems today and need to know which IETF proposals to watch, these five represent the highest combination of quality, novelty, and gap-filling potential:

Idea	Draft	Score	Why It Matters
Execution Context Token	draft-nennemann-wimse-ect	4.0	DAG-based execution evidence; foundation for audit, rollback, and accountability
DAAP Accountability Protocol	draft-aylward-daap-v2	4.75	Most comprehensive safety proposal; authentication + monitoring + enforcement
STAMP Delegation Proofs	draft-guy-bary-stamp-protocol	4.5	Cryptographic proof that an agent was authorized for a specific task
Agent Description Language (ADL)	draft-nederveld-adl	4.1	JSON standard for describing agent capabilities, tools, and permissions
Verifiable Conversations	draft-birkholz-verifiable-agent-conversations	4.5	Cryptographic signing of conversation records for auditability

Together, these five ideas sketch the outline of the ecosystem architecture that Post 6 will describe in full: ECT provides the execution backbone, DAAP provides the accountability layer, STAMP proves delegation, ADL describes capabilities, and verifiable conversations create the audit trail.

Mapping Ideas to Gaps

The most revealing analysis is mapping which ideas partially address which gaps:

Gap	Severity	Coverage
Agent Behavioral Verification	CRITICAL	Partial: attestation and monitoring ideas exist but no runtime enforcement
Agent Failure Cascade Prevention	CRITICAL	Near-zero: minimal work on cascade containment
Real-Time Agent Rollback Mechanisms	HIGH	Near-zero: limited to draft-yue-anima-agent-recovery-networks
Multi-Agent Consensus Protocols	HIGH	Minimal: no conflict resolution framework
Human Override Standardization	HIGH	Near-zero: CHEQ exists but no emergency override protocol
Cross-Domain Agent Audit Trails	HIGH	Partial: identity covered, cross-domain audit not
Federated Agent Learning Privacy	HIGH	Minimal: privacy-preserving learning not specified
Cross-Protocol Agent Migration	MEDIUM	Complete absence in the corpus
Agent Resource Accounting and Billing	MEDIUM	Peripheral: resource types defined but no economic models
Agent Capability Negotiation	MEDIUM	Partial: tool enumeration exists but not dynamic negotiation
Agent Performance Benchmarking	MEDIUM	Moderate: benchmarking ideas exist (draft-cui-nmrg-llm-benchmark)

The pattern is clear: most gaps are ones where the periphery of existing work touches the problem but nobody makes it the central problem. Teams building communication protocols think about resources; teams building discovery think about lifecycle. The true blind spots are the critical and high-severity gaps with near-zero coverage -- rollback mechanisms, human override, cascade prevention -- where almost no team is even circling the problem.
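The blind spots can be read off the gap table mechanically. The sketch below transcribes the table into tuples and filters for urgent gaps with near-zero coverage; the one-word coverage labels are our shorthand for the leading word of each Coverage cell ("Absent" stands in for "Complete absence in the corpus"), and the names `GAPS` and `blind_spots` are hypothetical.

```python
from collections import Counter

# (gap, severity, coverage label) transcribed from the table above.
GAPS = [
    ("Agent Behavioral Verification", "CRITICAL", "Partial"),
    ("Agent Failure Cascade Prevention", "CRITICAL", "Near-zero"),
    ("Real-Time Agent Rollback Mechanisms", "HIGH", "Near-zero"),
    ("Multi-Agent Consensus Protocols", "HIGH", "Minimal"),
    ("Human Override Standardization", "HIGH", "Near-zero"),
    ("Cross-Domain Agent Audit Trails", "HIGH", "Partial"),
    ("Federated Agent Learning Privacy", "HIGH", "Minimal"),
    ("Cross-Protocol Agent Migration", "MEDIUM", "Absent"),
    ("Agent Resource Accounting and Billing", "MEDIUM", "Peripheral"),
    ("Agent Capability Negotiation", "MEDIUM", "Partial"),
    ("Agent Performance Benchmarking", "MEDIUM", "Moderate"),
]

URGENT = {"CRITICAL", "HIGH"}


def blind_spots(gaps, coverage="Near-zero"):
    """Return urgent gaps whose coverage label equals `coverage`."""
    return [name for name, severity, cov in gaps
            if severity in URGENT and cov == coverage]


severity_counts = Counter(severity for _, severity, _ in GAPS)
spots = blind_spots(GAPS)

print(dict(severity_counts))
for name in spots:
    print("blind spot:", name)
```

The filter returns cascade prevention, rollback, and human override, the same three gaps the text singles out.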

The Ideas Nobody Had

Sometimes the absence is the finding. Here are technical ideas conspicuous in their absence from the entire corpus:

- Agent capability degradation signaling: No protocol for an agent to advertise that its performance has degraded (model drift, resource constraints, partial failure). Other agents continue relying on it at full trust.
- Multi-agent transaction semantics: No ACID-like guarantees for multi-agent workflows. If three agents must all succeed or all roll back, there is no two-phase commit equivalent.
- Agent migration protocol: No standard for moving a running agent from one host to another while preserving state and active connections. Critical for cloud deployments.
- Privacy-preserving agent discovery: No mechanism for an agent to find capabilities without revealing its intent. "I need a medical diagnosis agent" reveals sensitive information before any trust is established. Under Art. 25 GDPR (data protection by design and by default), this is not just a nice-to-have -- it is a legal requirement for EU-deployed systems where discovery queries may constitute processing of special category data (Art. 9 GDPR, health data).
- Agent cost and billing: No standard for agents to negotiate compensation for services. Agents performing work for other agents have no way to express "this costs X" or "you have Y credits remaining."

Each of these absences represents an opportunity for new drafts that would fill genuine needs.
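To show what the missing transaction semantics would have to provide, here is a toy two-phase commit across three agents. It is a sketch of the classic pattern, not a proposal from any draft in the corpus; the `Agent` class, its methods, and the agent names are hypothetical.

```python
class Agent:
    """Toy participant that can vote on, commit, or roll back a task."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: reserve whatever is needed and cast a vote.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def run_transaction(agents):
    """Commit on all agents only if every agent votes yes; else roll back all."""
    votes = [agent.prepare() for agent in agents]
    if all(votes):
        # Phase 2a: unanimous yes, so every agent commits.
        for agent in agents:
            agent.commit()
        return True
    # Phase 2b: at least one refusal, so every agent rolls back.
    for agent in agents:
        agent.rollback()
    return False


happy = [Agent("planner"), Agent("booker"), Agent("payer")]
ok = run_transaction(happy)

failing = [Agent("planner"), Agent("booker"), Agent("payer", can_commit=False)]
failed = run_transaction(failing)

print(ok, [a.state for a in happy])
print(failed, [a.state for a in failing])
```

The sketch omits everything that makes the real problem hard: coordinator failure, timeouts, durable logs, and agents whose side effects cannot be undone. Those are the questions a standard in this space would need to answer.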

What the Taxonomy Tells Builders

Three practical takeaways for anyone implementing agent systems:

1. Build on the convergent ideas. Agent registration, action-based authorization, and capability-based discovery appear across multiple teams and organizations. These represent genuine consensus about what the infrastructure needs, even if implementations diverge.

2. Watch the single-source innovations. The long tail of single-draft ideas contains the innovations that will differentiate the next generation of agent platforms. ECT, CHEQ, ADOL, and ADL are not widely known but represent some of the most thoughtful engineering in the corpus.

3. Fill the blank spaces. Error recovery, cross-protocol translation, and human override are the clearest opportunities for new contributions. The community has signaled these gaps matter (through the severity of the gap analysis) but has not yet produced the ideas to fill them.

Key Takeaways

- The vast majority of ideas appear in exactly one draft -- fragmentation extends all the way down to the idea level.
- 628 cross-org convergent ideas (via fuzzy matching on an earlier extraction run) reveal where organizations independently agree; highest-overlap pairs are Chinese institutions (China Unicom-Huawei: 32 shared ideas).
- The critical gaps remain unfilled: rollback mechanisms, failure cascade prevention, and human override have minimal coverage across 434 drafts.
- Five ideas to watch: ECT (execution DAG), DAAP (accountability), STAMP (delegation proof), ADL (agent description), verifiable conversations (audit trail).
- Convergence clusters in three areas: agent communication infrastructure, authentication/authorization, and network architecture.

Next in this series: Drawing the Big Picture -- 628 cross-org convergent ideas, 11 gaps, and the architectural vision that connects them.

Idea extraction performed by Claude from draft abstracts and full text. Classification into types (protocol, architecture, extension, mechanism, requirement, pattern, framework, format) based on the technical content of each proposal. The current database contains 419 ideas; figures referencing ~1,780 ideas come from an earlier pipeline run with different extraction parameters. Data current as of March 2026.
