# What Nobody's Building (And Why It Matters)

*The 11 gaps in the IETF's AI agent landscape -- and the real-world disasters they invite.*

---

Imagine an AI agent managing a hospital's drug-dispensing system. It receives instructions from a prescribing agent, coordinates with a pharmacy agent, and issues delivery commands to a robotic dispensing agent. On Tuesday morning, the prescribing agent hallucinates a dosage. The pharmacy agent fills it. The dispensing agent delivers it. No human saw it happen. No system flagged it. No protocol exists to roll back the dispensed medication.

To be clear: this scenario is already regulated. Under the EU AI Act (Regulation 2024/1689), a drug-dispensing AI agent is a high-risk AI system under Annex III, requiring conformity assessment, risk management, and human oversight before deployment. The Medical Devices Regulation (MDR 2017/745) imposes additional obligations. The gap is not one of legal accountability -- it is one of technical implementation. The standards that would let developers *comply* with these regulations in multi-agent architectures do not yet exist.

This is the predictable consequence of the IETF's most critical standardization gaps. We analyzed **434 Internet-Drafts**, extracted their technical components, and compared the result against what real-world agent deployments actually require. We found **11 gaps** -- areas where standardization work is missing or inadequate. Two of them are critical. And the critical ones share a defining characteristic: they address what happens when autonomous agents fail or misbehave.

Nobody is building the safety net.

## The 11 Gaps

Our gap analysis sorted findings by severity, based on the breadth of the shortfall and the consequences of leaving it unfilled:

| # | Gap | Severity |
|---|-----|----------|
| 1 | Agent Behavioral Verification | CRITICAL |
| 2 | Agent Failure Cascade Prevention | CRITICAL |
| 3 | Real-Time Agent Rollback Mechanisms | HIGH |
| 4 | Multi-Agent Consensus Protocols | HIGH |
| 5 | Human Override Standardization | HIGH |
| 6 | Cross-Domain Agent Audit Trails | HIGH |
| 7 | Federated Agent Learning Privacy | HIGH |
| 8 | Cross-Protocol Agent Migration | MEDIUM |
| 9 | Agent Resource Accounting and Billing | MEDIUM |
| 10 | Agent Capability Negotiation | MEDIUM |
| 11 | Agent Performance Benchmarking | MEDIUM |

The gap names above match the automated gap analysis output. The two critical gaps -- behavioral verification and failure cascade prevention -- address what happens when autonomous agents deviate from declared behavior or trigger cascading failures across interconnected systems. Several high-severity gaps (rollback mechanisms, human override, consensus protocols) address the same theme: what happens when things go wrong, and nobody has built the safety net.

A notable omission from this gap list: **GDPR-mandated capabilities**. The gap analysis focuses on technical desiderata but does not engage with the EU's legally binding data protection framework. Specific GDPR requirements that have no corresponding IETF draft work include: Data Protection Impact Assessment (DPIA) tooling for high-risk agent processing (Art. 35 GDPR), right-to-erasure propagation across multi-agent chains (Art. 17), data portability for agent-generated personal data (Art. 20), and purpose limitation enforcement when agents are authorized for specific tasks but may repurpose data (Art. 5(1)(b)). These are not optional features for EU-deployed agent systems -- they are legal requirements.
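To make one of these concrete: right-to-erasure propagation is ultimately a protocol problem, and a minimal sketch shows its shape. Everything below -- the `ErasureRequest` fields, the agent API, the acknowledgment scheme -- is hypothetical, invented for illustration; no IETF draft currently defines such a format.

```python
# Hypothetical sketch of GDPR Art. 17 erasure propagation across an
# agent chain. All names and fields are invented for illustration.
from dataclasses import dataclass, field


@dataclass
class ErasureRequest:
    request_id: str          # unique ID so duplicate deliveries are idempotent
    data_subject: str        # pseudonymous identifier of the requesting person
    scope: list[str]         # data categories to erase, e.g. ["dosage_records"]
    hops: list[str] = field(default_factory=list)  # agents that processed it


class Agent:
    def __init__(self, name: str, downstream: list["Agent"]):
        self.name = name
        self.downstream = downstream
        self.seen: set[str] = set()

    def erase(self, req: ErasureRequest) -> list[str]:
        """Erase local data, then propagate to every downstream agent.

        Returns the agents that acknowledged erasure, so the originator
        can demonstrate completeness across the whole chain.
        """
        if req.request_id in self.seen:   # break cycles in the agent graph
            return []
        self.seen.add(req.request_id)
        # ... delete local records for req.data_subject within req.scope ...
        req.hops.append(self.name)
        acks = [self.name]
        for agent in self.downstream:
            acks += agent.erase(req)
        return acks


# Usage: erasure entering at the prescribing agent reaches the whole chain.
dispensing = Agent("dispensing", [])
pharmacy = Agent("pharmacy", [dispensing])
prescribing = Agent("prescribing", [pharmacy])
acks = prescribing.erase(ErasureRequest("req-1", "subject-42", ["dosage_records"]))
assert acks == ["prescribing", "pharmacy", "dispensing"]
```

The hard, unstandardized parts are exactly what the toy version hides: cycle detection across organizational boundaries, proof that a hop actually erased anything, and what to do when one agent in the chain is offline.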
## Critical Gap 1: Agent Behavioral Verification

**The problem**: No mechanism exists to verify that a deployed AI agent actually behaves according to its declared policies or specifications.

**The numbers**: Only **47 of 434 drafts** address AI safety and alignment. The capability-to-safety ratio is roughly 4:1 in aggregate -- though it varies significantly by month, from as low as 1.5:1 to as high as 21:1. The trend is clear: the community is building agents faster than it is building the tools to keep them honest.

**What partially addresses this**: Some work exists on the periphery. [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) (score 4.75 -- the highest-rated draft in the corpus) defines a behavioral monitoring framework and cryptographic identity verification. [draft-birkholz-verifiable-agent-conversations](https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/) (score 4.5) proposes verifiable conversation records using COSE signing. [draft-berlinai-vera](https://datatracker.ietf.org/doc/draft-berlinai-vera/) (score 3.9) introduces a zero-trust architecture with five enforcement pillars.

**What is still missing**: Runtime verification. These drafts define what agents *should* do and how to *record* what they did. None provides a real-time mechanism to detect that an agent is deviating from its declared behavior *while it is operating*. The gap is between policy declaration and policy enforcement -- the difference between a speed limit sign and a speed camera.

**The scenario**: A financial trading agent is authorized to execute trades within specified parameters. It begins operating within bounds but, after a model update, starts exceeding risk limits. Without runtime behavior verification, the deviation is only discovered in post-hoc audit -- potentially days later, after significant damage.
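What would the missing "speed camera" look like? Here is a minimal sketch, assuming a declared policy can be reduced to machine-checkable limits: wrap the opaque model decision in a runtime check that sits between decision and effect. The `DeclaredPolicy` format and every other name below are hypothetical, not drawn from any draft.

```python
# Minimal sketch of runtime behavioral verification: every action the
# agent emits is checked against its declared policy *before* it takes
# effect. The policy format and violation handling are hypothetical.
from dataclasses import dataclass


@dataclass
class DeclaredPolicy:
    max_order_size: float      # largest single trade the agent declared
    max_daily_exposure: float  # cumulative risk limit per day


class PolicyViolation(Exception):
    pass


class VerifiedTradingAgent:
    def __init__(self, model, policy: DeclaredPolicy):
        self.model = model
        self.policy = policy
        self.daily_exposure = 0.0

    def execute_trade(self, market_state) -> dict:
        order = self.model.decide(market_state)   # the opaque model decision
        # Runtime check: the declaration is enforced, not merely recorded.
        if abs(order["size"]) > self.policy.max_order_size:
            raise PolicyViolation(f"order size {order['size']} exceeds declared limit")
        if self.daily_exposure + abs(order["size"]) > self.policy.max_daily_exposure:
            raise PolicyViolation("declared daily exposure limit would be breached")
        self.daily_exposure += abs(order["size"])
        return order   # only now is the order released for execution


# Usage with a stub model that has drifted past its declared limits:
class DriftedModel:
    def decide(self, market_state):
        return {"size": 5_000_000.0}   # exceeds the declared maximum below


agent = VerifiedTradingAgent(DriftedModel(), DeclaredPolicy(1_000_000.0, 10_000_000.0))
# agent.execute_trade({}) would raise PolicyViolation on the first trade
```

The point is architectural, not algorithmic: because the check runs before the order is released, a post-update deviation is caught on the first out-of-bounds trade rather than in next week's audit.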
## Critical Gap 2: Agent Failure Cascade Prevention

**The problem**: No protocols exist to prevent agent failures from cascading across interconnected autonomous systems. As agent interdependencies increase in production deployments, a failure in one agent can ripple outward.

**The numbers**: Only **47 of the 434 drafts** address AI safety, and the high interconnectivity implied by 155 A2A protocol drafts and 114 autonomous netops drafts creates the conditions for cascade failures.

**What is missing**: Circuit breakers for cascading failures. Checkpoint and rollback protocols. Blast radius containment. Graceful degradation. All concepts well-established in distributed systems engineering, but absent from the agent standards landscape.

**The scenario**: A telecom operator deploys 50 AI agents for network monitoring, troubleshooting, and optimization. During a major outage, all 50 agents simultaneously request inference resources to diagnose the problem. With no failure cascade prevention, agents compete chaotically. The most aggressive agents get resources; the most important diagnostic tasks may not. The outage extends because the agents that could fix it are starved by the agents that are observing it.
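The circuit-breaker item on that list is the most mechanical to sketch. Below is a minimal, hypothetical adaptation of the standard distributed-systems pattern to agent-to-agent calls: after a threshold of failures the breaker opens, further calls fail fast into a degraded fallback, and the struggling downstream service is spared the stampede. The thresholds and API are invented for illustration.

```python
# Minimal circuit breaker for agent-to-agent calls, adapted from
# standard distributed-systems practice. Thresholds are illustrative.
import time


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after   # seconds before a retry is allowed
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, agent_request, fallback):
        # Open state: fail fast so a struggling downstream agent is not
        # hammered by 50 diagnostic agents at once.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()          # graceful degradation
            self.opened_at = None          # half-open: allow one probe call
        try:
            result = agent_request()
            self.failures = 0              # success closes the breaker
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            return fallback()
```

The unstandardized part is not the breaker itself but its interoperability: no draft defines how an agent advertises an open breaker so that its 49 peers reroute instead of retrying.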
## High Gap: Real-Time Agent Rollback Mechanisms

**The problem**: No standards exist for how to quickly roll back incorrect decisions made by autonomous agents across distributed systems.

**The numbers**: 114 autonomous netops drafts exist, but no rollback mechanisms for production network safety. [draft-yue-anima-agent-recovery-networks](https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/) (score 4.1) is among the few drafts that partially address this, with its Task-Oriented Multi-Agent Recovery Framework and State Consistency Management. For context, "Multi-Agent Communication Protocol" -- defining how agents *talk* -- appears in 8 drafts. The community has invested far more effort in the plumbing than in the fire escape.

**What is still missing**: A general mechanism to unwind an agent decision once its effects have propagated to other agents: distributed checkpoints, cross-agent state consistency, and a standard way to identify and reverse the downstream actions a bad decision triggered.

**The scenario**: A multi-agent supply chain system manages inventory, shipping, and payments. The inventory agent processes a large batch incorrectly, leading the shipping agent to dispatch wrong items, which causes the payment agent to process refunds to wrong accounts. The cascade happens in minutes. Without rollback mechanisms, untangling the mess requires manual human intervention across three systems -- and the agents continue making decisions based on corrupted state while humans try to intervene.
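To see why cross-agent rollback is harder than single-system undo, consider a hypothetical checkpoint-and-compensate sketch (essentially a saga, a long-established distributed-transactions pattern). Every name below is invented; the key idea is that each agent registers a compensating action at decision time, because by the time the error surfaces, the state is spread across three systems.

```python
# Hypothetical checkpoint-and-compensate (saga-style) rollback across
# agents. Names are invented; no draft defines this today.
from typing import Callable


class RollbackLog:
    """Shared log of compensating actions, appended at decision time."""

    def __init__(self):
        self._compensations: list[tuple[str, Callable[[], None]]] = []

    def record(self, agent: str, compensate: Callable[[], None]) -> None:
        # Each agent registers how to undo its own action the moment it
        # commits that action -- not after the cascade is discovered.
        self._compensations.append((agent, compensate))

    def rollback(self) -> None:
        # Undo in reverse order: refunds first, then shipments, then inventory.
        for agent, compensate in reversed(self._compensations):
            compensate()
        self._compensations.clear()


# Usage sketch: the three agents from the supply-chain scenario.
log = RollbackLog()
log.record("inventory", lambda: print("restore inventory batch"))
log.record("shipping", lambda: print("recall shipment"))
log.record("payments", lambda: print("reverse refund"))
log.rollback()   # reverse refund, recall shipment, restore inventory
```

As with the circuit breaker, what is missing is the part between the agents: a shared log format, agreement on rollback scope, and semantics for when a compensating action itself fails.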
## The High-Priority Gaps

Several additional gaps scored HIGH severity (plus one MEDIUM gap, cross-protocol migration, covered here for continuity). Each represents a missing piece that working deployments will hit:

### Human Override Standardization

Only **34 human-agent interaction drafts** exist versus **114 autonomous operations** and **155 A2A protocol** drafts. Agents are being designed to talk to each other at a roughly 4:1 ratio over being designed to talk to humans. Emergency override protocols -- the "big red button" -- are almost entirely absent. This is not merely an engineering preference. For high-risk AI systems deployed in the EU, the AI Act (Art. 14) mandates human oversight -- making this gap a compliance blocker, not just a design omission.

[draft-rosenberg-aiproto-cheq](https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/) (score 3.9) is a rare exception: it defines a protocol for human confirmation of agent decisions before execution. But CHEQ is opt-in and pre-execution. No draft defines what happens when a human needs to stop a running agent, constrain its behavior, or take over its task mid-execution.

### Multi-Agent Consensus Protocols

When a group of agents disagree -- the diagnosis agent says the router is down, the monitoring agent says it is up, the optimization agent is rerouting traffic around it -- who arbitrates? No framework exists for agents to resolve conflicting assessments without human intervention. This is not a new problem: FIPA (Foundation for Intelligent Physical Agents) defined agent communication languages and interaction protocols for multi-agent coordination as early as 1997. The IETF landscape has largely not engaged with this prior art.

### Cross-Domain Agent Audit Trails

An agent operating across multiple domains or organizations needs to maintain audit trails that satisfy different regulatory requirements simultaneously. Identity management exists -- the 152 identity/auth drafts cover authentication. What does not exist is cross-domain audit standardization: the format and semantics for recording agent actions across jurisdictions with varying compliance requirements. The EU's eIDAS 2.0 regulation (Regulation 2024/1183) and its European Digital Identity Wallet framework provide a mature trust model that the IETF drafts have not yet connected to.

### Federated Agent Learning Privacy

While federated architectures exist, there is insufficient specification for privacy-preserving agent learning that prevents data leakage between federated participants during model updates.

### Cross-Protocol Agent Migration

Agents need to migrate between different network protocols, domains, or infrastructure providers while maintaining state and identity. Current drafts focus on registration but not migration continuity.

## The Structural Problem

Here is the finding the Architect on our team surfaced that reframes the entire gap analysis: **the severity of each gap correlates with the coordination difficulty required to fill it.**

The critical gaps (behavioral verification, failure cascade prevention) require agreement across *multiple* IETF working groups. They cut across safety, networking, identity, and operations -- areas currently owned by separate teams that rarely collaborate. The high gaps (rollback, human override, consensus) require even broader agreement: they need architects who see the whole ecosystem, not just their protocol.

Now look back at the team bloc analysis from Post 2. The 18 team blocs are *islands*. Cross-team collaboration is sparse. The strongest cross-bloc connection involves 3 shared drafts. The gaps that require the most cross-team work are being produced by an ecosystem that does the least cross-team work.

This is the structural explanation for the safety deficit. It is not that people do not care about safety. It is that safety standards require coordination across boundaries that the current authorship structure cannot bridge. Capability standards can be built within a single team. Safety standards cannot.

Our category co-occurrence analysis provides the concrete proof. Safety drafts are not entirely isolated -- they co-occur with several categories, coupling most strongly with policy and governance and identity/auth. But the pattern is revealing: safety pairs with *governance* categories, not *implementation* categories. Of the 155 drafts tagged as A2A protocols, very few also address safety. Safety has minimal co-occurrence with agent discovery/registration and model serving/inference. Its weakest links are to the categories where agents actually *do* things.

Safety is being discussed in governance papers. It is barely present in the protocols that need it most. The traffic lights are not just behind the highways -- they are on a different road entirely.

IEEE P3394 (Standard for Trustworthy AI Agents), a concurrent standardization effort, is attempting to address some of these safety and trust dimensions from a different angle. The IETF landscape should be compared against these parallel efforts to understand which gaps are being addressed elsewhere and which remain truly unserved.

## The 4:1 Ratio, Revisited

The safety deficit is not just a number. It is a structural property of how the IETF's AI agent community is organized.

| Category | Drafts | Team Blocs Active |
|----------|-------:|------------------:|
| A2A protocols | 155 | Many (distributed across blocs) |
| Autonomous operations | 114 | Primarily Huawei, Chinese telecom |
| Agent identity/auth | 152 | Ericsson, Nokia, ATHENA, multiple |
| **AI safety/alignment** | **47** | **Few; mostly independents/startups** |
| **Human-agent interaction** | **34** | **Rosenberg/White (2-person team)** |

The capability categories have organized teams behind them. The safety categories rely on individual contributors and small, unconnected teams.

The best safety draft in the corpus (DAAP, score 4.75) comes from an independent author (Aylward). The best human-agent drafts come from a two-person Five9/Bitwave team. There is no 13-person safety bloc with 94% cohesion.

Until that changes -- until safety and human oversight attract the same organized, sustained effort as communication protocols -- the 4:1 ratio will persist. And the gaps will remain open.

---

### Key Takeaways

- **11 gaps** exist in the IETF's AI agent landscape: 2 critical, 5 high, 4 medium
- **The 2 critical gaps** address failure modes: behavioral verification and failure cascade prevention
- **Agent rollback mechanisms and human override standardization** are high-severity gaps with minimal coverage across 434 drafts
- **Gap severity correlates with coordination difficulty**: the hardest gaps require cross-team, cross-WG collaboration that the current island structure cannot produce
- **The safety deficit is structural, not attitudinal**: capability standards can be built by one team; safety standards require ecosystem-wide coordination that does not yet exist
- **GDPR-mandated capabilities** (DPIA support, erasure propagation, data portability, purpose limitation) represent an additional missing dimension not captured in the automated gap analysis

*Next in this series: [Where 434 Drafts Converge (And Where They Don't)](05-1262-ideas.md) -- the fragmentation goes all the way down.*

---

*Gap analysis based on 434 drafts, cross-referenced against real-world deployment requirements for autonomous AI agent systems. Data current as of March 2026.*