
Christian Nennemann 439424bd04 Fix security, data integrity, and accuracy issues from 4-perspective review

Security fixes:
- Fix SQL injection in db.py:update_generation_run (column name whitelist)
- Flask SECRET_KEY from env var instead of hardcoded
- Add LLM rating bounds validation (_clamp_rating, 1-10)
- Fix JSON extraction trailing whitespace handling

Data integrity:
- Normalize 21 legacy category names to 11 canonical short forms
- Add false_positive column, flag 73 non-AI drafts (361 relevant remain)
- Document verified counts: 434 total/361 relevant drafts, 557 authors, 419 ideas, 11 gaps

Code quality:
- Fix version string 0.1.0 → 0.2.0
- Add close()/context manager to Embedder class
- Dynamic matrix size instead of hardcoded "260x260"

Blog accuracy:
- Fix EU AI Act timeline (enforcement Aug 2026, not "18 months")
- Distinguish OAuth consent from GDPR Einwilligung
- Add EU AI Act Annex III context to hospital scenario
- Add FIPA, eIDAS 2.0 references where relevant

Methodology:
- Add methodology.md documenting pipeline, limitations, rating rubric
- Add LLM-as-judge caveats to analyzer.py
- Document clustering threshold rationale

Reviews from: legal (German/EU law), statistics, development, science perspectives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-08 10:52:33 +01:00

12 KiB


The IETF's AI Agent Gold Rush: 434 Drafts, 557 Authors, and the Race to Define How AI Agents Talk

Fifteen months ago, AI agents barely registered at the IETF. Today, nearly 1 in 10 new Internet-Drafts is about AI agents. We analyzed every one.

For every Internet-Draft addressing how to keep an AI agent safe, roughly four are building new capabilities for it. That is the single most important number in this analysis.

We built an automated pipeline to fetch, categorize, rate, and map every AI- and agent-related Internet-Draft currently in the IETF system. We found 434 drafts from 557 authors at 230 organizations and identified 11 standardization gaps -- two of them critical. To our knowledge, the result is the most comprehensive public analysis of the IETF's AI agent landscape to date.

The story the data tells is not subtle: the internet's most important standards body is in the middle of a gold rush, and the prospectors are moving faster than the safety inspectors.

The Growth Curve

In 2024, just 9 AI/agent-related drafts were submitted to the IETF -- 0.5% of all submissions. By Q1 2026, AI/agent drafts account for 9.3% of all new Internet-Drafts. Nearly 1 in 10.

Year	Total IETF Drafts	AI/Agent Drafts	AI Share
2021	1,108	~0	~0%
2022	1,121	~0	~0%
2023	1,241	~0	~0%
2024	1,651	9	0.5%
2025	2,696	190	7.0%
2026 (Q1)	1,748	162	9.3%

The IETF as a whole grew 2.4x from 2021 to 2025, from 1,108 to 2,696 drafts a year. AI/agent work, meanwhile, went from essentially zero to nearly a tenth of all new drafts in under two years. The acceleration is not gradual. It is a step function that began in mid-2025 and has not slowed.
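The shares in the growth table are simple ratios of AI/agent drafts to all new drafts in each year. A quick check of that arithmetic, using only the table's numbers (the 2021-2023 AI counts are given as "~0" and are left out, and 2026 covers Q1 only):

```python
# Counts copied from the growth table above; 2026 is Q1 only.
total_drafts = {2021: 1108, 2022: 1121, 2023: 1241, 2024: 1651, 2025: 2696, 2026: 1748}
ai_drafts = {2024: 9, 2025: 190, 2026: 162}  # 2021-2023 were "~0" and are omitted

def ai_share_pct(year: int) -> float:
    """AI/agent drafts as a percentage of all new Internet-Drafts that year, to one decimal."""
    return round(100 * ai_drafts[year] / total_drafts[year], 1)

for year in sorted(ai_drafts):
    print(year, f"{ai_share_pct(year)}%")

# Growth of overall IETF draft volume, 2021 -> 2025.
growth = total_drafts[2025] / total_drafts[2021]
print(f"IETF growth 2021-2025: {growth:.1f}x")
```

This prints 0.5%, 7.0%, and 9.3% for 2024, 2025, and Q1 2026, and a growth factor of 2.4x, matching the figures in the text.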

This growth is driven by a convergence of forces: the explosion of commercial AI agent deployments (ChatGPT plugins, Anthropic's Claude tools, Google's Gemini agents); the emergence of protocols like MCP and A2A that need standardization; and an industry-wide recognition that AI agents communicating over the internet without agreed-upon identity, security, and interoperability standards are a problem that gets worse every month it goes unaddressed.

(A note on methodology: our pipeline searches the Datatracker for 12 keywords -- agent, ai-agent, llm, autonomous, machine-learning, artificial-intelligence, mcp, agentic, inference, generative, intelligent, and aipref -- across both draft names and abstracts. We started with 6 keywords and 260 drafts, then expanded to 12 to capture MCP-related work, generative AI infrastructure, and intelligent networking. The full methodology is in Post 7.)
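The keyword step can be pictured as a case-insensitive substring match over each draft's name and abstract. This is a minimal sketch under assumptions: the pipeline's actual matching rules are not described here, fetching records from the Datatracker is left out, and `matched_keywords` is a hypothetical helper.

```python
# The 12 search keywords from the methodology note.
KEYWORDS = [
    "agent", "ai-agent", "llm", "autonomous", "machine-learning",
    "artificial-intelligence", "mcp", "agentic", "inference",
    "generative", "intelligent", "aipref",
]

def matched_keywords(name: str, abstract: str) -> list:
    """Keywords found as case-insensitive substrings of a draft's name or abstract."""
    text = f"{name} {abstract}".lower()
    return [kw for kw in KEYWORDS if kw in text]

# Hypothetical draft names and abstracts, for illustration only.
print(matched_keywords("draft-example-agent-oauth", "An OAuth profile for AI agents."))
# ['agent']
print(matched_keywords("draft-example-bgp-policy", "Policy for BGP Autonomous Systems."))
# ['autonomous']
```

Broad terms cut both ways: `autonomous`, for example, also appears in routing drafts about BGP Autonomous Systems, so a keyword hit marks a candidate for review rather than a confirmed AI draft.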

The drafts span ten categories, and the distribution reveals priorities:

Category	Drafts	Share
Data formats and interoperability	174	40%
A2A protocols	155	36%
Agent identity and authentication	152	35%
Autonomous network operations	114	26%
Policy and governance	109	25%
Agent discovery and registration	89	21%
ML traffic management	79	18%
AI safety and alignment	47	11%
Model serving and inference	42	10%
Human-agent interaction	34	8%

Note that drafts can belong to multiple categories, so the shares sum to more than 100%. The dominance of plumbing -- data formats, identity, and communication protocols -- is expected for an early-stage standards effort. What is unexpected is how little attention the safety and human-oversight categories receive.
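Because each draft can carry several category labels, every share is computed against the 434 drafts rather than against the number of labels. Summing the table's counts shows how much overlap that implies:

```python
TOTAL_DRAFTS = 434

# Drafts per category, copied from the table above; one draft may carry several labels.
category_counts = {
    "Data formats and interoperability": 174,
    "A2A protocols": 155,
    "Agent identity and authentication": 152,
    "Autonomous network operations": 114,
    "Policy and governance": 109,
    "Agent discovery and registration": 89,
    "ML traffic management": 79,
    "AI safety and alignment": 47,
    "Model serving and inference": 42,
    "Human-agent interaction": 34,
}

# Share of all drafts that carry each label.
shares = {cat: 100 * n / TOTAL_DRAFTS for cat, n in category_counts.items()}
total_labels = sum(category_counts.values())

print(total_labels)                           # 995
print(round(total_labels / TOTAL_DRAFTS, 1))  # 2.3 labels per draft on average
print(round(sum(shares.values())))            # 229 (percent)
```

The counts add up to 995 labels on 434 drafts, an average of roughly 2.3 categories per draft, which is why the shares total about 229%.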

The ecosystem's DNA is visible in what it cites. We parsed 4,231 cross-references from the drafts, and the foundation is clear: TLS 1.3 (RFC 8446, cited by 42 drafts), OAuth 2.0 (RFC 6749, 36 drafts), HTTP Semantics (RFC 9110, 34 drafts), and JWT (RFC 7519, 22 drafts). The agent identity/auth category is essentially built on top of the OAuth stack. The entire landscape stands on a security foundation -- which makes the 4:1 safety deficit all the more jarring.
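Citation counts like these can be gathered by scanning draft text for RFC references and counting the distinct drafts that cite each RFC. A minimal sketch, assuming references appear as `RFC 8446`, `RFC8446`, or `[RFC8446]`; the sample texts are invented, and the pipeline's own parser (including how its 4,231 cross-references were counted) may work differently:

```python
import re
from collections import defaultdict

# Matches "RFC 8446", "RFC8446", and bracketed "[RFC8446]" references.
RFC_REF = re.compile(r"\bRFC\s?(\d+)\b")

def rfc_citation_counts(drafts: dict) -> dict:
    """Number of distinct drafts citing each RFC; repeat mentions within one draft count once."""
    citing = defaultdict(set)
    for draft_name, text in drafts.items():
        for number in RFC_REF.findall(text):
            citing[f"RFC {int(number)}"].add(draft_name)
    return {rfc: len(names) for rfc, names in citing.items()}

# Invented draft texts, for illustration only.
sample_texts = {
    "draft-example-a": "Uses TLS 1.3 [RFC8446] and OAuth 2.0 [RFC6749]; see RFC 8446, Section 4.",
    "draft-example-b": "Tokens are JWTs (RFC 7519) carried over TLS [RFC8446].",
}
print(rfc_citation_counts(sample_texts))
# {'RFC 8446': 2, 'RFC 6749': 1, 'RFC 7519': 1}
```

Counting distinct citing drafts, rather than raw mentions, keeps one draft that repeats "RFC 8446" many times from inflating the total.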

The Safety Deficit

The ratio is stark:

Focus Area	Drafts
A2A protocols	155
Autonomous operations	114
Agent identity/auth	152
AI safety/alignment	47
Human-agent interaction	34

The capability-to-safety ratio is roughly 4:1 on aggregate, though it varies significantly by month -- from as low as 1.5:1 in some months to over 20:1 in others. The aggregate picture is clear: for every draft about keeping agents safe, approximately four are building new capabilities. The community is building the highways and forgetting the traffic lights.
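An aggregate ratio and a wide range of monthly ratios can both hold at once, because an aggregate figure pools counts across months instead of averaging the monthly ratios. A minimal sketch, assuming the 4:1 figure is pooled this way; the monthly counts below are invented for illustration and are not the study's data:

```python
from dataclasses import dataclass

@dataclass
class MonthCounts:
    month: str
    capability: int  # drafts in capability-building categories that month
    safety: int      # drafts in safety/oversight categories that month

def monthly_ratios(months: list) -> dict:
    """Capability:safety ratio per month (None when a month has no safety drafts)."""
    return {m.month: (m.capability / m.safety if m.safety else None) for m in months}

def pooled_ratio(months: list) -> float:
    """Total capability drafts divided by total safety drafts across all months."""
    total_safety = sum(m.safety for m in months)
    if total_safety == 0:
        raise ValueError("no safety drafts in the period")
    return sum(m.capability for m in months) / total_safety

# Invented counts for illustration only; NOT the study's monthly data.
sample = [MonthCounts("m1", 3, 2), MonthCounts("m2", 21, 1), MonthCounts("m3", 36, 12)]
print(monthly_ratios(sample))  # {'m1': 1.5, 'm2': 21.0, 'm3': 3.0}
print(pooled_ratio(sample))    # 4.0
```

Here the pooled ratio is 4.0 even though a naive average of the three monthly ratios is 8.5: low-volume months with extreme ratios move the average far more than they move the pooled figure.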

This is not an abstract concern. Imagine an AI agent managing cloud infrastructure that detects a spurious anomaly, autonomously scales down a critical service, and triggers a cascading outage across three availability zones. Today, there is no standard mechanism to verify that the agent followed its declared policy before acting. No standard way to roll back the decision once the cascade begins. No standard protocol for a human operator to issue an emergency stop. The gaps our analysis identified in behavior verification, resource management, and error recovery are all about what happens when things go wrong. And in a world of autonomous AI agents, things will go wrong.

The safety drafts that do exist are often among the highest-rated in our analysis. draft-aylward-daap-v2 -- a comprehensive accountability protocol -- and draft-cowles-volt -- a tamper-evident execution trace format -- each scored 4.75 out of 5 (4-dimension composite excluding overlap), the highest in the entire corpus. draft-birkholz-verifiable-agent-conversations, which defines verifiable conversation records using cryptographic signing, scored 4.5. The quality is there. The quantity is not.
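Tamper evidence for an execution trace is commonly built on hash chaining: each entry's hash covers the previous entry's hash, so rewriting history breaks every later link. The sketch below shows only that general technique. It is not the format draft-cowles-volt defines, a real format would likely also sign entries, and `append` and `verify` are hypothetical helpers:

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(chain: list, record: dict) -> None:
    """Add a record, linking it to the hash of the current last entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "hash": entry_hash(prev, record)})

def verify(chain: list) -> bool:
    """Recompute every link. Editing, reordering, or deleting entries before the last one
    breaks the chain; dropping entries from the end is only caught if the final hash is
    anchored elsewhere (for example, signed)."""
    prev = GENESIS
    for entry in chain:
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True

trace = []
append(trace, {"step": 1, "action": "read_metrics"})
append(trace, {"step": 2, "action": "scale_down", "service": "api"})
print(verify(trace))  # True
trace[0]["record"]["action"] = "noop"  # rewrite history
print(verify(trace))  # False
```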

Who's Writing the Drafts

The organizational picture is as revealing as the technical one. The top contributors:

Organization	Authors	Drafts
Huawei	53	69
China Mobile	24	35
Cisco	24	26
Independent	19	25
China Telecom	24	24
China Unicom	22	21
Tsinghua University	13	16
ZTE Corporation	12	12
Five9	1	10
Ericsson	4	9

Huawei leads by a wide margin: 53 authors contributing to 69 drafts (across all Huawei entities) -- about 16% of the entire corpus. But the concentration goes deeper than raw numbers -- the next post will examine the team bloc structure, geopolitics, and what the collaboration network reveals about where power really lies.

Cisco and China Mobile each have 24 authors, but China Mobile's team produces 35 drafts to Cisco's 26. Ericsson has only 4 authors but punches above its weight with 9 focused drafts. Independent contributors account for 25 drafts -- a healthy sign of grassroots engagement.
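Per-organization totals such as "53 authors contributing to 69 drafts (across all Huawei entities)" require mapping affiliation strings to a parent organization and then counting distinct authors and distinct drafts. A minimal sketch; the prefix table, the rule that a blank affiliation counts as Independent, and the sample rows are all hypothetical:

```python
from collections import defaultdict

# Hypothetical mapping from lowercase affiliation prefixes to a parent organization.
PARENT_PREFIXES = {"huawei": "Huawei", "china mobile": "China Mobile", "cisco": "Cisco"}

def parent_org(affiliation: str) -> str:
    """Collapse subsidiaries and name variants; a blank affiliation counts as Independent."""
    cleaned = affiliation.strip()
    if not cleaned:
        return "Independent"
    for prefix, parent in PARENT_PREFIXES.items():
        if cleaned.lower().startswith(prefix):
            return parent
    return cleaned

def org_stats(rows):
    """rows: (draft_name, author, affiliation) tuples -> {org: (distinct authors, distinct drafts)}."""
    authors, drafts = defaultdict(set), defaultdict(set)
    for draft, author, affiliation in rows:
        org = parent_org(affiliation)
        authors[org].add(author)
        drafts[org].add(draft)
    return {org: (len(authors[org]), len(drafts[org])) for org in authors}

rows = [
    ("draft-example-1", "Alice", "Huawei Technologies"),
    ("draft-example-1", "Bob", "Huawei Canada"),
    ("draft-example-2", "Alice", "Huawei"),
    ("draft-example-3", "Carol", "Cisco Systems"),
    ("draft-example-4", "Dan", ""),
]
print(org_stats(rows))  # {'Huawei': (2, 2), 'Cisco': (1, 1), 'Independent': (1, 1)}
```

Under this counting, a draft co-authored across organizations counts once for each of them. Dividing the table's figures gives 9/4 = 2.25 drafts per author for Ericsson against 26/24 ≈ 1.08 for Cisco, which is the sense in which Ericsson punches above its weight.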

The Fragmentation Problem

The drafts are not just numerous; they are redundant. Our embedding-based similarity analysis found 25+ draft pairs with greater than 0.98 cosine similarity -- near-duplicate proposals submitted under different names.
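The near-duplicate check reduces to comparing every pair of draft embeddings and keeping the pairs above the 0.98 cosine threshold. A minimal pure-Python sketch; the three-dimensional vectors are toy stand-ins (real text embeddings have hundreds of dimensions), and the embedding model the pipeline used is not specified here:

```python
import math
from itertools import combinations

def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def near_duplicates(embeddings: dict, threshold: float = 0.98):
    """Every pair of drafts whose embedding cosine similarity exceeds the threshold."""
    pairs = []
    for (name_a, u), (name_b, v) in combinations(embeddings.items(), 2):
        sim = cosine(u, v)
        if sim > threshold:
            pairs.append((name_a, name_b, round(sim, 4)))
    return pairs

# Toy vectors for illustration only.
toy = {
    "draft-example-a": [0.90, 0.10, 0.40],
    "draft-example-b": [0.88, 0.12, 0.41],
    "draft-example-c": [0.10, 0.90, 0.20],
}
print(near_duplicates(toy))  # only the a/b pair clears 0.98
```

With 434 drafts this is about 94,000 pairwise comparisons, small enough that brute force is a reasonable choice over an approximate nearest-neighbor index.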

The most crowded space is OAuth for AI agents: 14 separate drafts all trying to solve how AI agents authenticate and get authorized. They range from broad framework proposals (draft-aap-oauth-profile) to narrow extensions (draft-jia-oauth-scope-aggregation) to full accountability systems (draft-aylward-daap-v2). None are compatible with each other.

Beyond OAuth, the broader A2A protocol landscape includes 155 drafts with no interoperability layer. The most common technical idea in the entire corpus -- "Multi-Agent Communication Protocol" -- appears in 8 separate drafts from different teams. And the fragmentation goes deeper than protocols: the vast majority of technical ideas extracted from the corpus appear in exactly one draft. Everyone is solving the same problem. Nobody is solving it together.
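Whether an idea appears in one draft or eight depends on how extracted ideas are grouped. A minimal sketch, assuming ideas are already extracted per draft and are grouped only by case and whitespace normalization; the sample ideas are invented, and grouping genuine paraphrases would need something stronger, such as embedding similarity:

```python
from collections import defaultdict

def idea_spread(ideas_by_draft: dict) -> dict:
    """Map each normalized idea (lowercased, whitespace collapsed) to the set of drafts mentioning it."""
    drafts_for = defaultdict(set)
    for draft, ideas in ideas_by_draft.items():
        for idea in ideas:
            drafts_for[" ".join(idea.lower().split())].add(draft)
    return dict(drafts_for)

def singleton_fraction(drafts_for: dict) -> float:
    """Share of distinct ideas that appear in exactly one draft."""
    singles = sum(1 for drafts in drafts_for.values() if len(drafts) == 1)
    return singles / len(drafts_for)

# Invented extraction output, for illustration only.
ideas = {
    "draft-example-a": ["Multi-Agent Communication Protocol", "capability token"],
    "draft-example-b": ["multi-agent  communication protocol", "agent registry"],
    "draft-example-c": ["Multi-Agent Communication Protocol", "intent schema"],
}
spread = idea_spread(ideas)
most_common = max(spread, key=lambda idea: len(spread[idea]))
print(most_common, len(spread[most_common]))  # multi-agent communication protocol 3
print(singleton_fraction(spread))             # 0.75
```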

This fragmentation has real costs. Implementers face confusion over which draft to follow. The IETF process slows as competing proposals vie for working group adoption. And the longer competing drafts proliferate without convergence, the higher the risk of incompatible deployments that entrench fragmentation rather than resolving it.

What the Best Drafts Look Like

Not everything is chaos. Our quality ratings -- scoring novelty, maturity, overlap avoidance, momentum, and relevance on a 1-5 scale -- surface drafts that are doing the hard work well:

Draft	Score	What It Does
draft-aylward-daap-v2	4.75	Comprehensive AI agent accountability with authentication, monitoring, enforcement
draft-guy-bary-stamp-protocol	4.5	Cryptographic delegation and proof for agent task execution
draft-drake-email-tpm-attestation	4.5	Hardware attestation for email via TPM verification chains
draft-ietf-lake-app-profiles	4.5	Canonical CBOR for EDHOC application profiles
draft-birkholz-verifiable-agent-conversations	4.5	Verifiable agent conversation records with COSE signing

Scores are 4-dimension composites (novelty, maturity, momentum, relevance), excluding overlap. The average score across all 434 rated drafts is 3.27. The best work combines clear problem definition with concrete mechanisms and low overlap with existing proposals. The worst drafts are me-too proposals that restate problems already solved elsewhere.
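The composite can be read as a combination of the four retained dimensions. A minimal sketch, assuming an unweighted mean and clamping each LLM-returned rating to the 1-5 rubric before averaging; the weighting is not stated in this post, and `clamp` and `composite` are hypothetical helpers:

```python
from statistics import mean

# Overlap is rated but excluded from the composite, per the note above.
DIMENSIONS = ("novelty", "maturity", "momentum", "relevance")

def clamp(score: float, low: float = 1, high: float = 5) -> float:
    """Force a rating into the 1-5 rubric range in case the model returns something outside it."""
    return max(low, min(high, score))

def composite(ratings: dict) -> float:
    """Unweighted mean of the four retained dimensions (weighting assumed, not documented)."""
    return mean(clamp(ratings[d]) for d in DIMENSIONS)

print(composite({"novelty": 5, "maturity": 4, "momentum": 5, "relevance": 5, "overlap": 2}))  # 4.75
```

Under this assumption, a 4.75 corresponds to three dimensions at 5 and one at 4.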

Methodology note: Quality ratings are LLM-generated (Claude Sonnet) from draft abstracts only, not full text. No human calibration has been performed. Scores should be treated as relative rankings within this corpus, not absolute quality measures. See How We Built This and the Methodology document for details.

What Comes Next

The IETF has navigated technology gold rushes before -- the early web, IoT, DNS security. In each case, the first wave of competing proposals eventually converged, and the lasting standards came from those who focused on interoperability and safety alongside capability.

The AI agent wave is following the same early pattern. The landscape has quantity. The question is whether it develops architecture -- and whether the safety work catches up before the capability work ships without it.

This blog series will dig into the questions the data raises. The next post starts with the most fundamental: who, exactly, is writing the rules?

Key Takeaways

434 drafts from 557 authors at 230 organizations -- AI/agent work went from 0.5% to 9.3% of all IETF submissions in 15 months
The capability-to-safety ratio (roughly 4:1 on aggregate, varying from 1.5:1 to 21:1 by month) is the most concerning structural finding
Huawei dominates authorship with 53 authors on 69 drafts (~16% of corpus); Chinese-linked institutions account for 160+ authors
14 competing OAuth-for-agents proposals illustrate deep fragmentation; 155 A2A protocol drafts have no interoperability layer
11 standardization gaps remain, with the 2 most critical relating to what happens when agents fail

Next in this series: Who's Writing the Rules for AI Agents? -- Inside the team blocs, geopolitics, and collaboration networks behind the IETF's AI agent standards.

Analysis conducted using the IETF Draft Analyzer. Data current as of March 2026. All 434 drafts, 557 authors, and full analysis data are available in the project's SQLite database.
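For readers who want to explore the data, an SQLite database can be queried with Python's built-in `sqlite3` module. The schema below (`drafts`, `draft_authors`, and their columns) is hypothetical, as are the sample rows; an in-memory stand-in is used so the sketch runs on its own. Against the real file you would connect to the project's database path and adjust table and column names to its actual schema:

```python
import sqlite3

def summary(conn: sqlite3.Connection, top_n: int = 3):
    """Return (draft count, distinct author count, top-N drafts by score)."""
    n_drafts = conn.execute("SELECT COUNT(*) FROM drafts").fetchone()[0]
    n_authors = conn.execute("SELECT COUNT(DISTINCT author) FROM draft_authors").fetchone()[0]
    top = conn.execute(
        "SELECT name, score FROM drafts ORDER BY score DESC, name LIMIT ?", (top_n,)
    ).fetchall()
    return n_drafts, n_authors, top

# In-memory stand-in with a hypothetical schema and invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drafts (name TEXT PRIMARY KEY, score REAL);
    CREATE TABLE draft_authors (draft_name TEXT, author TEXT);
""")
conn.executemany("INSERT INTO drafts VALUES (?, ?)",
                 [("draft-example-a", 4.5), ("draft-example-b", 3.25), ("draft-example-c", 4.75)])
conn.executemany("INSERT INTO draft_authors VALUES (?, ?)",
                 [("draft-example-a", "Alice"), ("draft-example-b", "Alice"), ("draft-example-c", "Bob")])
print(summary(conn, top_n=2))
# (3, 2, [('draft-example-c', 4.75), ('draft-example-a', 4.5)])
```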
