Fix remaining critical, high, and medium issues from 4-perspective review
Critical fixes:
- Fix rating clamp range 1-10 → 1-5 (actual scale)
- Add `ietf ideas convergence` command (SequenceMatcher at 0.75 threshold)
- Fix "628 cross-org ideas" → 130 (verified from current DB) across 8 files
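The clamp and convergence fixes above can be sketched roughly like this (a minimal sketch with hypothetical helper names; the actual command wires `SequenceMatcher` into the `ietf ideas convergence` CLI and the DB layer):

```python
from difflib import SequenceMatcher

RATING_MIN, RATING_MAX = 1, 5  # ratings use a 1-5 scale, not 1-10

def clamp_rating(value: float) -> float:
    """Clamp a rating into the actual 1-5 scale."""
    return max(RATING_MIN, min(RATING_MAX, value))

CONVERGENCE_THRESHOLD = 0.75

def are_convergent(idea_a: str, idea_b: str) -> bool:
    """Treat two idea titles as convergent when their similarity
    ratio meets the 0.75 threshold used by `ietf ideas convergence`."""
    ratio = SequenceMatcher(None, idea_a.lower(), idea_b.lower()).ratio()
    return ratio >= CONVERGENCE_THRESHOLD
```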
Security fixes:
- Sanitize FTS5 query input (strip special chars + boolean operators)
- Add rate limiting (10 req/min/IP) on Claude-calling endpoints
- Change <path:name> → <string:name> on draft routes
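The FTS5 input sanitization is roughly this shape (hypothetical function name; the real version sits in front of the search endpoints' MATCH expressions):

```python
import re

# FTS5 boolean operators that user input must not be able to inject.
FTS5_OPERATORS = {"AND", "OR", "NOT", "NEAR"}

def sanitize_fts5_query(raw: str) -> str:
    """Strip FTS5 special characters and boolean operators so raw user
    input can be safely embedded in an FTS5 MATCH expression."""
    # Keep only word characters and whitespace; drops quotes, *, ^, :, ( ).
    cleaned = re.sub(r"[^\w\s]", " ", raw)
    tokens = [t for t in cleaned.split() if t.upper() not in FTS5_OPERATORS]
    return " ".join(tokens)
```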
Codebase fixes:
- Add Database context manager (__enter__/__exit__)
- Wire false_positive filtering into queries (exclude by default in web UI)
- Fix Post 3 arithmetic ("~300" → "~409" distinct proposals)
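The `__enter__`/`__exit__` addition is along these lines (a minimal sketch; the real `Database` class also carries the query helpers and false-positive filtering):

```python
import sqlite3

class Database:
    """SQLite wrapper usable as a context manager: commits on clean
    exit, rolls back on exception, always closes the connection."""

    def __init__(self, path: str):
        self.path = path
        self.conn = None

    def __enter__(self):
        self.conn = sqlite3.connect(self.path)
        self.conn.row_factory = sqlite3.Row
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.conn.commit()
        else:
            self.conn.rollback()
        self.conn.close()
        return False  # never swallow exceptions
```

Usage: `with Database("ietf.db") as db: ...` — the transaction boundary follows the `with` block.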
Content & licensing:
- Add MIT LICENSE file
- Add IPR/FRAND notes (BCP 79, RFC 8179) to Posts 03 and 07
- Qualify "4:1 safety ratio" with monthly variation in 6 remaining files
- Add "Data as of March 2026" freeze-date headers to all 10 blog posts
- Hedge causal language in Post 04
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -2,6 +2,8 @@
 
 *The 11 gaps in the IETF's AI agent landscape -- and the real-world disasters they invite.*
 
+*Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.*
+
 ---
 
 Imagine an AI agent managing a hospital's drug-dispensing system. It receives instructions from a prescribing agent, coordinates with a pharmacy agent, and issues delivery commands to a robotic dispensing agent. On Tuesday morning, the prescribing agent hallucinates a dosage. The pharmacy agent fills it. The dispensing agent delivers it. No human saw it happen. No system flagged it. No protocol exists to roll back the dispensed medication.
@@ -74,7 +76,7 @@ Several additional gaps scored HIGH severity. Each represents a missing piece th
 
 ### Human Override Standardization
 
-Only **34 human-agent interaction drafts** exist versus **114 autonomous operations** and **155 A2A protocol** drafts. Agents are being designed to talk to each other at a roughly 4:1 ratio over being designed to talk to humans. Emergency override protocols -- the "big red button" -- are almost entirely absent. This is not merely an engineering preference. For high-risk AI systems deployed in the EU, the AI Act (Art. 14) mandates human oversight -- making this gap a compliance blocker, not just a design omission.
+Only **34 human-agent interaction drafts** exist versus **114 autonomous operations** and **155 A2A protocol** drafts. Agents are being designed to talk to each other at a roughly 4:1 ratio (averaging ~4:1, varying from 1.5:1 to 21:1 month-to-month) over being designed to talk to humans. Emergency override protocols -- the "big red button" -- are almost entirely absent. This is not merely an engineering preference. For high-risk AI systems deployed in the EU, the AI Act (Art. 14) mandates human oversight -- making this gap a compliance blocker, not just a design omission.
 
 [draft-rosenberg-aiproto-cheq](https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/) (score 3.9) is a rare exception: it defines a protocol for human confirmation of agent decisions before execution. But CHEQ is opt-in and pre-execution. No draft defines what happens when a human needs to stop a running agent, constrain its behavior, or take over its task mid-execution.
@@ -98,7 +100,7 @@ Agents need to migrate between different network protocols, domains, or infrastr
 
 Here is the finding the Architect on our team surfaced that reframes the entire gap analysis:
 
-**The severity of each gap correlates with the coordination difficulty required to fill it.**
+**The severity of each gap appears to correlate with the coordination difficulty required to fill it.**
 
 The critical gaps (behavior verification, resource management, error recovery) require agreement across *multiple* IETF working groups. They cut across safety, networking, identity, and operations -- areas currently owned by separate teams that rarely collaborate. The high gaps (cross-protocol translation, human override, consensus) require even broader agreement: they need architects who see the whole ecosystem, not just their protocol.
@@ -110,7 +112,7 @@ Our category co-occurrence analysis provides the concrete proof. Safety drafts a
 
 IEEE P3394 (Standard for Trustworthy AI Agents), a concurrent standardization effort, is attempting to address some of these safety and trust dimensions from a different angle. The IETF landscape should be compared against these parallel efforts to understand which gaps are being addressed elsewhere and which remain truly unserved.
 
-## The 4:1 Ratio, Revisited
+## The ~4:1 Ratio, Revisited
 
 The safety deficit is not just a number. It is a structural property of how the IETF's AI agent community is organized.
@@ -124,7 +126,7 @@ The safety deficit is not just a number. It is a structural property of how the
 
 The capability categories have organized teams behind them. The safety categories rely on individual contributors and small, unconnected teams. The best safety draft in the corpus (DAAP, score 4.75) comes from an independent author (Aylward). The best human-agent drafts come from a two-person Five9/Bitwave team. There is no 13-person safety bloc with 94% cohesion.
 
-Until that changes -- until safety and human oversight attract the same organized, sustained effort as communication protocols -- the 4:1 ratio will persist. And the gaps will remain open.
+Until that changes -- until safety and human oversight attract the same organized, sustained effort as communication protocols -- the ~4:1 aggregate ratio will persist. And the gaps will remain open.
 
 ---
@@ -133,8 +135,8 @@ Until that changes -- until safety and human oversight attract the same organize
 
 - **11 gaps** exist in the IETF's AI agent landscape: 2 critical, 5 high, 4 medium
 - **The 2 critical gaps** address failure modes: behavioral verification and failure cascade prevention
 - **Agent rollback mechanisms and human override standardization** are high-severity gaps with minimal coverage across 434 drafts
-- **Gap severity correlates with coordination difficulty**: the hardest gaps require cross-team, cross-WG collaboration that the current island structure cannot produce
-- **The safety deficit is structural, not attitudinal**: capability standards can be built by one team; safety standards require ecosystem-wide coordination that does not yet exist
+- **Gap severity appears to correlate with coordination difficulty**: the hardest gaps require cross-team, cross-WG collaboration that the current island structure cannot produce
+- **The safety deficit appears structural, not attitudinal**: capability standards can be built by one team; safety standards require ecosystem-wide coordination that does not yet exist
 - **GDPR-mandated capabilities** (DPIA support, erasure propagation, data portability, purpose limitation) represent an additional missing dimension not captured in the automated gap analysis
 
 *Next in this series: [Where 434 Drafts Converge (And Where They Don't)](05-1262-ideas.md) -- the fragmentation goes all the way down.*