# What Nobody's Building (And Why It Matters)
*The 12 gaps in the IETF's AI agent landscape -- and the real-world disasters they invite.*

---
Imagine an AI agent managing a hospital's drug-dispensing system. It receives instructions from a prescribing agent, coordinates with a pharmacy agent, and issues delivery commands to a robotic dispensing agent. On Tuesday morning, the prescribing agent hallucinates a dosage. The pharmacy agent fills it. The dispensing agent delivers it. No human saw it happen. No system flagged it. No protocol exists to roll back the dispensed medication.

This is not a hypothetical failure mode. It is the predictable consequence of the IETF's three most critical standardization gaps.

We analyzed **361 Internet-Drafts**, extracted their technical components, and compared the result against what real-world agent deployments actually require. We found **12 gaps** -- areas where standardization work is missing or inadequate. Three of them are critical. And the critical ones all share a defining characteristic: they address what happens when autonomous agents fail or misbehave.

Nobody is building the safety net.
## The 12 Gaps
Our gap analysis ranked each finding by severity, based on the breadth of the shortfall and the consequences of leaving it unfilled:

| # | Gap | Severity | Ideas Addressing It |
|---|-----|----------|--------------------:|
| 1 | Agent Behavior Verification | CRITICAL | 52 |
| 2 | Agent Resource Management | CRITICAL | 117 |
| 3 | Agent Error Recovery and Rollback | CRITICAL | 6 |
| 4 | Cross-Protocol Translation | HIGH | 0 |
| 5 | Agent Lifecycle Management | HIGH | 90 |
| 6 | Multi-Agent Consensus | HIGH | 5 |
| 7 | Human Override and Intervention | HIGH | 4 |
| 8 | Cross-Domain Security Boundaries | HIGH | 10 |
| 9 | Dynamic Trust and Reputation | HIGH | 5 |
| 10 | Agent Performance Monitoring | MEDIUM | 26 |
| 11 | Agent Explainability | MEDIUM | 5 |
| 12 | Agent Data Provenance | MEDIUM | 79 |

Two numbers in that table should alarm you: the **6 ideas** addressing error recovery (nearly all from a single draft), and the **0 ideas** addressing cross-protocol translation. Across 361 drafts, these gaps are not underserved. They are unserved.
## Critical Gap 1: Agent Behavior Verification
**The problem**: No mechanism exists to verify that a deployed AI agent actually behaves according to its declared policies or specifications.

**The numbers**: Only **44 of 361 drafts** address AI safety and alignment. The 4:1 ratio of capability to safety work means the community is building agents four times faster than it is building the tools to keep them honest.

**What 52 ideas partially address**: The existing work sits on the periphery. [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) (score 4.8 -- the highest-rated draft in the corpus) defines a behavioral monitoring framework with cryptographic identity verification. [draft-birkholz-verifiable-agent-conversations](https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/) (score 4.5) proposes verifiable conversation records using COSE signing. [draft-berlinai-vera](https://datatracker.ietf.org/doc/draft-berlinai-vera/) (score 3.9) introduces a zero-trust architecture with five enforcement pillars.

**What is still missing**: Runtime verification. These drafts define what agents *should* do and how to *record* what they did. None provides a real-time mechanism to detect that an agent is deviating from its declared behavior *while it is operating*. The gap is between policy declaration and policy enforcement -- the difference between a speed limit sign and a speed camera.
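
What a speed camera could look like is not mysterious; what is missing is a standard interface for it. Below is a minimal sketch, assuming a hypothetical in-line verifier that checks every action an agent emits against its declared policy before the action executes. All names and interfaces are illustrative, not drawn from any draft.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    kind: str    # e.g. "trade", "dispense"
    params: dict

@dataclass
class PolicyRule:
    name: str
    check: Callable[[Action], bool]  # returns True if the action complies

class RuntimeVerifier:
    """Sits between an agent and its effectors: every action is checked
    against the agent's *declared* policy while the agent is operating."""

    def __init__(self, rules: list[PolicyRule],
                 on_violation: Callable[[Action, str], None]):
        self.rules = rules
        self.on_violation = on_violation

    def authorize(self, action: Action) -> bool:
        for rule in self.rules:
            if not rule.check(action):
                self.on_violation(action, rule.name)  # alert, halt, or escalate
                return False
        return True

# Declared policy: trades must stay under a notional risk limit.
rules = [PolicyRule("max_notional",
                    lambda a: a.params.get("notional", 0) <= 1_000_000)]
verifier = RuntimeVerifier(rules, lambda a, r: print(f"violation of {r}: {a}"))

verifier.authorize(Action("trade", {"notional": 50_000}))     # True: within bounds
verifier.authorize(Action("trade", {"notional": 5_000_000}))  # False: caught at runtime
```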

**The scenario**: A financial trading agent is authorized to execute trades within specified parameters. It begins operating within bounds but, after a model update, starts exceeding risk limits. Without runtime behavior verification, the deviation is only discovered in post-hoc audit -- potentially days later, after significant damage.
## Critical Gap 2: Agent Resource Management
**The problem**: No framework exists for managing computational resources, memory, and processing power across distributed AI agents.

**The numbers**: **93 drafts** focus on autonomous network operations, and **117 ideas** touch on resource-adjacent topics. But those ideas address how agents communicate about tasks -- not how they compete for and share limited resources.

**What is missing**: Scheduling, quotas, fair allocation, and priority mechanisms for multi-agent environments. When ten agents compete for the same GPU cluster, which gets priority? When an agent's computation exceeds its allocation, what happens? When a high-priority emergency response agent needs resources currently held by a routine monitoring agent, how does preemption work?
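
No draft specifies what such an allocator would look like, so the sketch below is purely illustrative: a toy priority scheduler over a fixed pool of inference slots, where a higher-priority agent can preempt a lower-priority holder. Every name and policy choice here is an assumption.

```python
import heapq

class ResourceScheduler:
    """Toy allocator for a fixed pool of inference slots. The lowest-
    priority current holder sits at the top of a min-heap, so it is
    the first candidate for preemption."""

    def __init__(self, total_slots: int):
        self.free = total_slots
        self.holders: list[tuple[int, str]] = []  # (priority, agent_id) min-heap

    def request(self, agent_id: str, priority: int) -> bool:
        if self.free > 0:
            self.free -= 1
            heapq.heappush(self.holders, (priority, agent_id))
            return True
        # Pool exhausted: preempt the lowest-priority holder if we outrank it.
        if self.holders and self.holders[0][0] < priority:
            _, evicted = heapq.heappop(self.holders)
            print(f"preempting {evicted} for {agent_id}")
            heapq.heappush(self.holders, (priority, agent_id))
            return True
        return False  # caller must queue or back off

sched = ResourceScheduler(total_slots=2)
sched.request("monitoring-1", priority=1)      # granted
sched.request("monitoring-2", priority=1)      # granted
sched.request("outage-diagnosis", priority=9)  # granted via preemption
```

Even this toy forces the questions a standard would have to answer: who assigns priorities across vendors, and what is an evicted agent obliged to do next.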

**The scenario**: A telecom operator deploys 50 AI agents for network monitoring, troubleshooting, and optimization. During a major outage, all 50 agents simultaneously request inference resources to diagnose the problem. With no resource management framework, agents compete chaotically. The most aggressive agents get resources; the most important diagnostic tasks may not. The outage extends because the agents that could fix it are starved by the agents that are observing it.
## Critical Gap 3: Agent Error Recovery and Rollback
**The problem**: No standards exist for how agents handle errors, cascading failures, or the rollback of autonomous decisions.

**The numbers**: This is the starkest gap in the corpus. Only **6 extracted ideas** address it, and all but one come from a single draft: [draft-yue-anima-agent-recovery-networks](https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/) (score 4.1). Essentially one team, out of 557 authors, is working on this.

**The 6 ideas**:

- Task-Oriented Multi-Agent Recovery Framework
- Inter-Agent Communication Protocol Requirements
- State Consistency Management
- Error and Success Reporting Framework (from a separate draft)
- Generic Agent Response Framework
- Mandatory restrictive failure behavior

That is the entire body of work the IETF has produced on agent error recovery. For context, "Multi-Agent Communication Protocol" -- defining how agents *talk* -- appears in 8 drafts. The community has invested 8 times more effort in the plumbing than in the fire escape.

**What is missing**: Circuit breakers for cascading failures. Checkpoint and rollback protocols. Blast radius containment. Graceful degradation. All of these concepts are well established in distributed systems engineering, yet all are absent from the agent standards landscape.
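
To make the absence concrete, here is a minimal checkpoint-and-rollback sketch of the kind distributed systems have relied on for decades. It is hypothetical code standardizing nothing, but it is the sort of primitive the gap describes.

```python
import copy

class CheckpointedAgentState:
    """Toy checkpoint/rollback wrapper around an agent's working state.
    Snapshot before each autonomous decision; on a detected bad cascade,
    roll back to the last known-good snapshot."""

    def __init__(self, state: dict):
        self.state = state
        self._snapshots: list[dict] = []

    def checkpoint(self) -> int:
        self._snapshots.append(copy.deepcopy(self.state))
        return len(self._snapshots) - 1  # checkpoint id

    def rollback(self, checkpoint_id: int) -> None:
        self.state = copy.deepcopy(self._snapshots[checkpoint_id])
        del self._snapshots[checkpoint_id + 1:]  # later checkpoints are now invalid

inventory = CheckpointedAgentState({"widgets": 100})
good = inventory.checkpoint()
inventory.state["widgets"] -= 5000     # the bad batch
if inventory.state["widgets"] < 0:     # detection, however it happens
    inventory.rollback(good)           # contain the blast radius
assert inventory.state["widgets"] == 100
```

The hard, unstandardized part is the multi-agent version: rolling back the inventory agent is useless if the shipping and payment agents keep acting on the corrupted state.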

**The scenario**: A multi-agent supply chain system manages inventory, shipping, and payments. The inventory agent processes a large batch incorrectly, leading the shipping agent to dispatch wrong items, which causes the payment agent to process refunds to wrong accounts. The cascade happens in minutes. Without rollback mechanisms, untangling the mess requires manual human intervention across three systems -- and the agents continue making decisions based on corrupted state while humans try to intervene.
## The High-Priority Gaps
Six additional gaps scored HIGH severity. Each represents a missing piece that working deployments will hit:
### Cross-Protocol Translation (0 ideas)
With **120 competing A2A protocols** and no translation layer, agents speaking different protocols simply cannot interoperate. This gap is entirely unaddressed -- zero technical ideas in the corpus. It is the only gap with literally no coverage.

The parallel is the early web: HTTP won not because it was the best protocol but because it was the one protocol everyone could speak. The agent ecosystem has no HTTP equivalent. If the IETF does not build a translation layer, the market will -- and the result will be vendor-locked ecosystems rather than open interoperability.
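
For illustration only, here is what the core of a translation layer amounts to: per-protocol adapters around an agreed canonical form. Both message formats below are invented; the hard standardization problem is agreeing on the middle representation, not writing the adapters.

```python
import json

# Hypothetical protocol A: {"op": ..., "tool": ..., "args": ...}
def parse_proto_a(raw: str) -> dict:
    msg = json.loads(raw)
    return {"action": msg["op"], "target": msg["tool"], "payload": msg["args"]}

# Hypothetical protocol B: {"verb": ..., "resource": ..., "body": ...}
def emit_proto_b(canonical: dict) -> str:
    return json.dumps({"verb": canonical["action"],
                       "resource": canonical["target"],
                       "body": canonical["payload"]})

def translate_a_to_b(raw: str) -> str:
    """One gateway hop: protocol A in, canonical form, protocol B out.
    With N protocols, a shared canonical form needs 2N adapters
    instead of N*(N-1) pairwise translators."""
    return emit_proto_b(parse_proto_a(raw))

print(translate_a_to_b(
    '{"op": "invoke", "tool": "dns-lookup", "args": {"name": "example.com"}}'))
```

At 120 protocols, that is 240 adapters against 14,280 pairwise translators -- which is why the canonical form is worth standardizing.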
### Human Override and Intervention (4 ideas)
Only **30 human-agent interaction drafts** exist versus **93 autonomous operations** and **120 A2A protocol** drafts. Agents are being designed to talk to each other at a 4:1 ratio over being designed to talk to humans. Emergency override protocols -- the "big red button" -- are almost entirely absent.

[draft-rosenberg-aiproto-cheq](https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/) (score 3.9) is a rare exception: it defines a protocol for human confirmation of agent decisions before execution. But CHEQ is opt-in and pre-execution. No draft defines what happens when a human needs to stop a running agent, constrain its behavior, or take over its task mid-execution.
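
A sketch of that missing control surface, assuming a hypothetical agent loop: the point is that pause and stop signals are honored mid-execution, not merely before it. Nothing here reflects CHEQ or any published draft.

```python
import threading
import time

class OverridableAgent:
    """Toy agent loop exposing the two signals no draft standardizes:
    pause (hold position, take no actions) and stop (the big red button)."""

    def __init__(self):
        self.stop = threading.Event()
        self.pause = threading.Event()

    def run(self):
        while not self.stop.is_set():
            if self.pause.is_set():
                time.sleep(0.05)   # frozen: no actions until a human decides
                continue
            self.act()
            time.sleep(0.05)

    def act(self):
        print("agent acting autonomously")

agent = OverridableAgent()
worker = threading.Thread(target=agent.run)
worker.start()
time.sleep(0.2)
agent.pause.set()   # human intervention: freeze the agent mid-task
time.sleep(0.2)
agent.stop.set()    # hard stop
worker.join()
```

The unsolved protocol questions start immediately: how does the signal reach an agent running in someone else's infrastructure, and what guarantees it is obeyed?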
### Multi-Agent Consensus (5 ideas)
When a group of agents disagree -- the diagnosis agent says the router is down, the monitoring agent says it is up, the optimization agent is rerouting traffic around it -- who arbitrates? No framework exists for agents to resolve conflicting assessments without human intervention.
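
A minimal sketch of one possible arbitration rule: a weighted vote with a quorum threshold that escalates to a human when no claim clears it. The weights and threshold are invented; choosing them is precisely what a standard would have to specify.

```python
from collections import Counter

def resolve(assessments: dict[str, str],
            weights: dict[str, float],
            quorum: float = 0.5) -> str | None:
    """Weighted vote over conflicting agent assessments. Returns the
    winning claim, or None -- meaning: escalate to a human."""
    tally: Counter = Counter()
    for agent, claim in assessments.items():
        tally[claim] += weights.get(agent, 1.0)
    claim, score = tally.most_common(1)[0]
    return claim if score / sum(tally.values()) > quorum else None

votes = {"diagnosis-agent": "router-down",
         "monitoring-agent": "router-up",
         "optimization-agent": "router-down"}
weights = {"diagnosis-agent": 2.0, "monitoring-agent": 1.0,
           "optimization-agent": 1.0}
print(resolve(votes, weights))  # 'router-down': 3.0 of 4.0, above quorum
```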
### Dynamic Trust and Reputation (5 ideas)
Static certificates authenticate identity but cannot express "this agent has been reliable for 6 months" or "this agent's accuracy degraded last week." Long-running agent ecosystems need trust that is earned, tracked, and revocable. The current landscape relies entirely on binary trust: either an agent has a valid certificate or it does not.
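
One way to make trust earned, tracked, and revocable is an exponentially decaying score, sketched below with invented parameters. The specific decay model is an assumption; the point is that trust moves with observed behavior rather than with certificate validity.

```python
import math
import time

class ReputationScore:
    """Toy earned-and-decaying trust: recent outcomes dominate, stale
    history fades, and the score can fall below a revocation floor."""

    def __init__(self, half_life_days: float = 30.0):
        self.decay = math.log(2) / (half_life_days * 86400)
        self.score = 0.0
        self.last_update = time.time()

    def record(self, outcome_ok: bool, now: float | None = None) -> float:
        now = time.time() if now is None else now
        self.score *= math.exp(-self.decay * (now - self.last_update))  # fade old evidence
        self.score += 1.0 if outcome_ok else -2.0  # failures cost more than successes earn
        self.last_update = now
        return self.score

rep = ReputationScore()
t0 = time.time()
for day in range(180):                      # "reliable for 6 months"
    rep.record(True, now=t0 + day * 86400)
print(rep.score)                                # high: trust was earned...
print(rep.record(False, now=t0 + 181 * 86400))  # ...and one failure dents it
```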
### Cross-Domain Security Boundaries (10 ideas)
An agent authenticated in Company A's domain needs to perform a task in Company B's domain. Identity management exists -- the 108 identity/auth drafts cover this. What does not exist is trust *isolation*: preventing an agent authenticated for a narrow task from escalating privileges across domain boundaries.
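
The missing piece is roughly a capability grant that crosses the domain boundary without widening, sketched below with hypothetical fields. Authentication answers who the agent is; the grant pins down the one thing it may do over there.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CrossDomainGrant:
    """Toy capability grant. Identity is necessary but not sufficient:
    the grant carries a narrow scope that never widens across domains."""
    agent_id: str
    issuer_domain: str     # where the agent authenticated
    audience_domain: str   # where the task runs
    allowed_action: str    # the single scoped capability

def authorize(grant: CrossDomainGrant, domain: str, action: str) -> bool:
    # Enforcement at the boundary: exact match on domain and action,
    # so a foothold for one task cannot escalate into broader access.
    return grant.audience_domain == domain and grant.allowed_action == action

g = CrossDomainGrant("agent-42", "company-a.example",
                     "company-b.example", "read:inventory")
print(authorize(g, "company-b.example", "read:inventory"))   # True
print(authorize(g, "company-b.example", "write:inventory"))  # False: no escalation
```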
### Agent Lifecycle Management (90 ideas)
Registration is covered. What happens after registration is not: versioning when an agent is updated, graceful retirement when an agent is decommissioned, migration when an agent moves between hosts, and dependency management when other agents rely on it.
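
At minimum, a lifecycle standard would have to pin down a state machine: which transitions are legal, and which dependents get notified at each one. The states and transitions below are illustrative, not taken from any draft.

```python
from enum import Enum, auto

class AgentState(Enum):
    REGISTERED = auto()
    ACTIVE = auto()
    UPDATING = auto()    # versioning: new model or policy rolling in
    DRAINING = auto()    # graceful retirement: finish work, accept no more
    MIGRATING = auto()   # moving between hosts
    RETIRED = auto()

# The unstandardized part: the set of legal transitions, and the
# notification obligations to agents that depend on this one.
TRANSITIONS = {
    AgentState.REGISTERED: {AgentState.ACTIVE},
    AgentState.ACTIVE: {AgentState.UPDATING, AgentState.DRAINING,
                        AgentState.MIGRATING},
    AgentState.UPDATING: {AgentState.ACTIVE},
    AgentState.MIGRATING: {AgentState.ACTIVE},
    AgentState.DRAINING: {AgentState.RETIRED},
    AgentState.RETIRED: set(),
}

def transition(current: AgentState, target: AgentState) -> AgentState:
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target  # in a real system: notify dependents here

state = transition(AgentState.REGISTERED, AgentState.ACTIVE)
state = transition(state, AgentState.DRAINING)
state = transition(state, AgentState.RETIRED)
```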
## The Structural Problem
Here is the finding the Architect on our team surfaced that reframes the entire gap analysis:

**The severity of each gap correlates with the coordination difficulty required to fill it.**

The critical gaps (behavior verification, resource management, error recovery) require agreement across *multiple* IETF working groups. They cut across safety, networking, identity, and operations -- areas currently owned by separate teams that rarely collaborate. The HIGH-severity gaps (cross-protocol translation, human override, consensus) require even broader agreement: they need architects who see the whole ecosystem, not just their own protocol.

Now look back at the team bloc analysis from Post 2. The 18 team blocs are *islands*. Cross-team collaboration is sparse; the strongest cross-bloc connection involves just 3 shared drafts. The gaps that require the most cross-team work are being produced by an ecosystem that does the least cross-team work.

This is the structural explanation for the safety deficit. It is not that people do not care about safety. It is that safety standards require coordination across boundaries that the current authorship structure cannot bridge. Capability standards can be built within a single team. Safety standards cannot.

Our category co-occurrence analysis makes this concrete. Safety drafts are not entirely isolated: they co-occur with 8 of 10 categories, coupling most strongly with policy and governance (**60% of safety drafts**, lift 2.3x) and identity/auth (**58%**, lift 1.7x). But the pattern is revealing: safety pairs with *governance* categories, not *implementation* categories.

Of the 136 drafts tagged as A2A protocols, only **12 (8.8%) also address safety**. Safety has **zero co-occurrence** with agent discovery/registration and **zero co-occurrence** with model serving/inference. Its weakest links are to the categories where agents actually *do* things: A2A protocols (12 drafts), ML traffic management (3), and autonomous network operations (4). Safety is being discussed in governance papers, is barely present in the protocols that need it most, and is completely absent from discovery infrastructure and inference pipelines. The traffic lights are not just behind the highways -- they are on a different road entirely.
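
For readers unfamiliar with the metric: lift is the observed co-occurrence rate divided by what independent tagging would predict. A quick sketch of the computation, using hypothetical counts rather than the corpus values:

```python
def lift(n_both: int, n_a: int, n_b: int, n_total: int) -> float:
    """Observed co-occurrence probability over the product of the
    marginal probabilities; 1.0 means 'no association'."""
    p_both = n_both / n_total
    return p_both / ((n_a / n_total) * (n_b / n_total))

# Hypothetical counts for illustration only (not the corpus values):
# 26 of 44 safety drafts also tagged governance; 94 governance drafts overall.
print(round(lift(n_both=26, n_a=44, n_b=94, n_total=361), 2))  # ~2.27
```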
## The 4:1 Ratio, Revisited
The safety deficit is not just a number. It is a structural property of how the IETF's AI agent community is organized.

| Category | Drafts | Team Blocs Active |
|----------|-------:|------------------:|
| A2A protocols | 120 | Many (distributed across blocs) |
| Autonomous operations | 93 | Primarily Huawei, Chinese telecom |
| Agent identity/auth | 108 | Ericsson, Nokia, ATHENA, multiple |
| **AI safety/alignment** | **44** | **Few; mostly independents/startups** |
| **Human-agent interaction** | **30** | **Rosenberg/White (2-person team)** |

The capability categories have organized teams behind them. The safety categories rely on individual contributors and small, unconnected teams. The best safety draft in the corpus (DAAP, score 4.8) comes from an independent author (Aylward). The best human-agent drafts come from a two-person Five9/Bitwave team. There is no 13-person safety bloc with 94% cohesion.

Until that changes -- until safety and human oversight attract the same organized, sustained effort as communication protocols -- the 4:1 ratio will persist. And the gaps will remain open.

---
### Key Takeaways
- **12 gaps** exist in the IETF's AI agent landscape: 3 critical, 6 high, 3 medium
- **The 3 critical gaps** all address failure modes: behavior verification, resource management, error recovery and rollback
- **Error recovery has only 6 ideas**, nearly all from a single draft; **cross-protocol translation has zero** -- the starkest absences across 361 drafts
- **Gap severity correlates with coordination difficulty**: the hardest gaps require cross-team, cross-WG collaboration that the current island structure cannot produce
- **The safety deficit is structural, not attitudinal**: capability standards can be built by one team; safety standards require ecosystem-wide coordination that does not yet exist

*Next in this series: [Where 361 Drafts Converge (And Where They Don't)](05-1262-ideas.md) -- 96% of ideas appear in exactly one draft. The fragmentation goes all the way down.*

---
*Gap analysis based on 361 drafts, cross-referenced against real-world deployment requirements for autonomous AI agent systems. Data current as of March 2026.*