v0.3.0: Publication-ready release with blog site, paper update, and polish

Release prep:
- Version bump to 0.3.0 (pyproject.toml, cli.py)
- Rewrite README.md with current stats (475 drafts, 713 authors, 501 ideas)
- Add CONTRIBUTING.md with dev setup and code conventions

Blog site:
- Add scripts/build-site.py (markdown → HTML with clean CSS, dark mode, nav)
- Generate static site in docs/blog/ (10 pages)
- Ready for GitHub Pages deployment

Academic paper (paper/main.tex):
- Update all counts: 474→475 drafts, 557→710 authors, 1907→462 ideas, 11→12 gaps
- Add false-positive filtering methodology (113 excluded, 361 relevant)
- Add cross-org convergence analysis (132 ideas, 33% rate)
- Add GDPR compliance gap to gap table
- Add LLM-as-judge caveats to rating methodology and limitations
- Add FIPA, IEEE P3394, W3C WoT to related work with bibliography entries
- Fix safety ratio to show monthly variation (1.5:1 to 21:1)

Pipeline:
- Fetch 1 new draft (475 total), 3 new authors (713 total)
- Fix 16 ruff lint errors across test files
- All 106 tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 17:54:43 +01:00
parent e247bfef8f
commit 1ec1f69bee
34 changed files with 4268 additions and 272 deletions

CONTRIBUTING.md

@@ -0,0 +1,61 @@
# Contributing
## Setup
```bash
git clone <repo-url>
cd ietf-draft-analyzer
pip install -e ".[test]"
```
You'll also need:
- **Anthropic API key**: Set `ANTHROPIC_API_KEY` in `.env` or environment
- **Ollama**: Running locally with `nomic-embed-text` model for embeddings
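Assuming a standard Ollama install, the two prerequisites above can be satisfied with (the key value is a placeholder):

```shell
# Pull the embedding model used by `ietf embed`
ollama pull nomic-embed-text

# Provide the Anthropic key via .env (auto-loaded by the CLI)
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
```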
## Running Tests
```bash
pytest tests/ -v
```
Tests cover: JSON extraction, rating validation, FTS5 sanitization, database operations, data models, Obsidian export.
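As an illustration of what the FTS5 sanitization tests guard against, a minimal sanitizer might quote every term so user input cannot inject FTS5 query operators (hypothetical helper; the project's actual implementation may differ):

```python
def sanitize_fts_query(raw: str) -> str:
    """Quote each whitespace-separated term so FTS5 operators
    (AND, OR, NEAR, *) in user input are matched literally.
    Embedded double quotes are escaped by doubling, per FTS5 rules.
    Sketch only -- not the project's exact implementation."""
    terms = raw.replace('"', '""').split()
    return " ".join(f'"{t}"' for t in terms)
```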
## Code Conventions
- **CLI**: Click commands in `cli.py` with `@click.option()` decorators. Use the `pass_cfg_db` decorator for config/db lifecycle.
- **Database**: Tables defined in `db.py` `ensure_tables()`, queries as methods on `Database`. Always use parameterized queries.
- **Reports**: Report types are implemented in `reports.py` and dispatched through `generate_report()`.
- **LLM calls**: Always cache via `llm_cache` table (prompt SHA-256 hash). Use `cheap=True` for bulk operations.
- **Output**: Use `rich` for console output.
- **Web UI**: Flask routes in `app.py`, data access in `data.py` with TypedDict return types, Jinja2 templates.
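Two of these conventions together — parameterized queries and SHA-256-keyed `llm_cache` lookups — might look like this sketch; the column names `prompt_hash`/`response` and the function name are assumptions, not the exact code in `db.py`:

```python
import hashlib
import sqlite3


def cached_llm_call(conn: sqlite3.Connection, prompt: str, call_fn) -> str:
    """Return a cached LLM response keyed by the prompt's SHA-256 hash,
    calling the model only on a cache miss. Sketch of the convention;
    actual table/column names may differ."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    # Parameterized query, per the database convention above
    row = conn.execute(
        "SELECT response FROM llm_cache WHERE prompt_hash = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]
    response = call_fn(prompt)
    conn.execute(
        "INSERT INTO llm_cache (prompt_hash, response) VALUES (?, ?)",
        (key, response),
    )
    conn.commit()
    return response
```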
## Adding a New CLI Command
1. Add the command function in `cli.py` under the appropriate group
2. Use `@pass_cfg_db` for automatic config/db injection
3. Add `--dry-run` flag for commands that modify data
4. Use `rich` tables/panels for output
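Steps 1–4 might look roughly like this minimal sketch (the `recount` command and group wiring are hypothetical; real commands also use `@pass_cfg_db` and `rich` output):

```python
import click


@click.group()
def cli():
    """ietf CLI entry point (sketch)."""


@cli.command()
@click.option("--dry-run", is_flag=True, help="Preview without writing.")
def recount(dry_run: bool) -> None:
    """Hypothetical command that would modify data."""
    if dry_run:
        click.echo("Would recount ratings (no changes made).")
        return
    click.echo("Recounting ratings...")
```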
## Adding a New Report Type
1. Add the report function in `reports.py`
2. Register it in the `report` CLI group in `cli.py`
3. Output goes to `data/reports/`
## Adding a New Web UI Page
1. Create template in `src/webui/templates/`
2. Add data function in `data.py` (with TypedDict return type)
3. Add route in `app.py`
4. Add navigation link in `base.html`
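Step 2's data-function shape can be sketched as follows. The `GapSummary` type and accessor are hypothetical; real functions in `data.py` query `drafts.db` directly rather than taking rows as an argument:

```python
from typing import TypedDict


class GapSummary(TypedDict):
    """Return type for a hypothetical data.py accessor (step 2 above)."""
    name: str
    severity: str
    draft_count: int


def get_gap_summaries(rows: list[tuple[str, str, int]]) -> list[GapSummary]:
    """Shape raw (name, severity, count) rows into typed dicts,
    largest category first. Rows are injected so the sketch stays
    self-contained."""
    return [
        {"name": n, "severity": s, "draft_count": c}
        for n, s, c in sorted(rows, key=lambda r: r[2], reverse=True)
    ]
```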
## Linting
```bash
pip install ruff
ruff check src/ tests/ --select E,F,W --ignore E501
```
## Project Layout
See README.md for full project structure.

README.md

@@ -1,8 +1,8 @@
# IETF Draft Analyzer
Track, categorize, rate, and visualize AI/agent-related IETF Internet-Drafts.
Track, categorize, rate, and map AI/agent-related IETF Internet-Drafts.
**260 drafts** analyzed across **19 categories** with **403 authors**, **1,262 extracted ideas**, and **12 identified gaps** — spanning June 2025 to February 2026.
**475 drafts** analyzed (361 relevant after false-positive filtering) with **713 authors**, **501 extracted ideas**, **132 cross-org convergent ideas**, and **12 identified gaps** — spanning 2024 to March 2026.
## What This Does
@@ -12,9 +12,10 @@ The IETF is experiencing an unprecedented wave of standardization activity aroun
- **Rates** each draft on 5 dimensions (novelty, maturity, overlap, momentum, relevance) using Claude
- **Embeds** drafts with Ollama for pairwise similarity and clustering
- **Extracts** discrete technical ideas and identifies landscape gaps
- **Analyzes** cross-organizational convergence (SequenceMatcher at 0.75 threshold)
- **Maps** the author collaboration network and organizational affiliations
- **Generates** interactive visualizations, markdown reports, and a filterable browser
- **Produces** publication-ready figures for an arXiv paper
- **Generates** markdown reports and a full web dashboard
- **Filters** false positives automatically (relevance-based + manual flagging)
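The convergence check described above — `difflib.SequenceMatcher` at a 0.75 similarity threshold — can be sketched like this; the function name and the lowercasing step are assumptions, not the pipeline's exact code:

```python
from difflib import SequenceMatcher


def is_convergent(idea_a: str, idea_b: str, threshold: float = 0.75) -> bool:
    """Flag two idea titles as convergent when their similarity ratio
    meets the threshold. Illustration of the approach only."""
    ratio = SequenceMatcher(None, idea_a.lower(), idea_b.lower()).ratio()
    return ratio >= threshold
```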
## Quick Start
@@ -28,23 +29,29 @@ export ANTHROPIC_API_KEY=sk-ant-...
# Fetch drafts from IETF Datatracker
ietf fetch
# Rate all unrated drafts with Claude
# Rate all unrated drafts (--cheap uses Haiku for lower cost)
ietf analyze --all
ietf analyze --all --cheap # ~10x cheaper with Haiku
# Generate embeddings (requires Ollama running locally)
ietf embed
# Extract technical ideas
ietf ideas --all
ietf ideas --all --cheap --batch 5
# Analyze cross-org convergence
ietf ideas convergence
# Identify gaps in the landscape
ietf gaps
# Generate all visualizations
ietf viz all
# Fetch author data
ietf authors --fetch
# Open the interactive browser
xdg-open data/figures/browser.html
# Generate reports
ietf report overview
ietf report landscape
ietf report authors
# Launch the web dashboard
./scripts/run-webui.sh
@@ -52,26 +59,51 @@ xdg-open data/figures/browser.html
## Web Dashboard
A full interactive dashboard at `http://127.0.0.1:5000` with 8 pages:
A full interactive dashboard at `http://127.0.0.1:5000`:
```bash
# Start the dashboard
./scripts/run-webui.sh
# or: python src/webui/app.py
# or: FLASK_APP=src/webui/app.py flask run
```
| Page | What it shows |
|------|---------------|
| **Overview** | Stat cards, score histogram, category donut, submission timeline, category radar |
| **Draft Explorer** | Searchable/filterable/sortable table of all drafts with category pills and score badges |
| **Draft Detail** | Individual draft view with score ring, dimension bars, ideas, references, and linked authors |
| **Ratings** | Score distributions, dimension box plots, category radar, novelty vs maturity scatter, top-20 leaderboard |
| **Landscape** | t-SNE embedding map, quality quadrants, violin plots by category |
| **Authors** | Co-authorship force-directed graph, organization charts, cross-org collaboration |
| **Overview** | Stat cards, score histogram, category radar, submission timeline |
| **Draft Explorer** | Searchable/filterable/sortable table with category pills and score badges |
| **Draft Detail** | Score ring, dimension bars, ideas, references, linked authors |
| **Ratings** | Score distributions, box plots, category radar, novelty vs maturity scatter |
| **Landscape** | t-SNE embedding map, quality quadrants |
| **Authors** | Co-authorship force-directed graph, organization charts |
| **Ideas** | Extracted ideas grouped by type with search |
| **Gaps** | Gap cards sorted by severity with links to related drafts |
| **Gaps** | Gap cards sorted by severity with related drafts |
| **Citations** | RFC cross-reference graph |
| **Similarity** | Draft similarity network |
| **Timeline** | Submission trends over time |
| **Monitor** | Pipeline health, API costs, processing status |
Charts are interactive (Plotly.js) — click data points to navigate to draft details, click categories to filter.
Charts are interactive (Plotly.js). GDPR-compliant analytics (no cookies, daily-salted IP hashing).
## Blog Series
An 8-post analysis series in `data/reports/blog-series/`:
1. **The Gold Rush** — Growth from 9 drafts to 9.3% of all IETF submissions
2. **Who Writes the Rules** — Huawei's 16%, geopolitical dynamics, team blocs
3. **The OAuth Wars** — 14 competing OAuth proposals, fragmentation costs
4. **What Nobody Builds** — The safety deficit (4:1 ratio), 12 identified gaps
5. **Where Drafts Converge** — 132 cross-org convergent ideas, implicit consensus
6. **The Big Picture** — Architectural vision, EU AI Act implications
7. **How We Built This** — Methodology, cost ($9-15), limitations
8. **Agents Building the Agent Analysis** — Meta post on using Claude agent teams
## Key Findings
- **Safety deficit**: ~4:1 ratio of capability-building to safety proposals (varies 1.5:1 to 21:1 monthly)
- **Extreme fragmentation**: 155 competing A2A protocols, 42 overlap clusters
- **Organizational concentration**: Huawei ~16% of all drafts, Chinese orgs ~40%
- **Cross-org convergence**: 132 ideas (33%) independently proposed by multiple organizations
- **12 gaps identified**: 2 critical (behavior verification, human override), 5 high, 5 medium
- **Top-rated drafts**: Safety-focused proposals score highest (VOLT 4.75, DAAP 4.75)
## CLI Commands
@@ -80,69 +112,42 @@ Charts are interactive (Plotly.js) — click data points to navigate to draft de
| Command | Description |
|---------|-------------|
| `ietf fetch` | Fetch AI/agent drafts from IETF Datatracker |
| `ietf analyze --all` | Rate all unrated drafts using Claude (5 dimensions + summary) |
| `ietf embed` | Generate semantic embeddings via Ollama |
| `ietf ideas --all` | Extract technical ideas from drafts using Claude |
| `ietf gaps` | Identify under-addressed areas in the landscape |
| `ietf authors --fetch` | Fetch author/affiliation data from Datatracker |
| `ietf analyze --all [--cheap] [--dry-run]` | Rate drafts using Claude |
| `ietf embed [--dry-run]` | Generate semantic embeddings via Ollama |
| `ietf ideas --all [--cheap] [--batch N] [--dry-run]` | Extract technical ideas |
| `ietf ideas convergence [--threshold 0.75]` | Cross-org convergence analysis |
| `ietf ideas dedup` | Deduplicate similar ideas |
| `ietf gaps [--dry-run]` | Identify landscape gaps |
| `ietf authors --fetch` | Fetch author/affiliation data |
### Exploration
| Command | Description |
|---------|-------------|
| `ietf list` | List tracked drafts |
| `ietf show <name>` | Show detailed info for a specific draft |
| `ietf search <query>` | Full-text search across all stored drafts |
| `ietf similar <name>` | Find the most similar drafts by embedding similarity |
| `ietf show <name>` | Show detailed info for a draft |
| `ietf search <query>` | Full-text search (FTS5) |
| `ietf similar <name>` | Find similar drafts by embedding similarity |
| `ietf clusters` | Find clusters of near-duplicate drafts |
| `ietf compare <name1> <name2> ...` | Compare drafts for overlap and unique contributions |
| `ietf authors` | Show top authors and their draft counts |
| `ietf network` | Show organizational collaboration network |
### Visualizations (`ietf viz`)
All outputs go to `data/figures/`. Interactive charts are standalone HTML files (no server needed).
| Command | Output | Format |
|---------|--------|--------|
| `ietf viz all` | Generate everything below | mixed |
| `ietf viz browser` | Filterable draft browser with search, category chips, score sliders | HTML |
| `ietf viz landscape` | t-SNE/UMAP 2D scatter of all drafts colored by category | HTML |
| `ietf viz heatmap` | 260x260 clustered pairwise similarity matrix | PNG |
| `ietf viz distributions` | Violin plots for all 5 rating dimensions by category | PNG |
| `ietf viz timeline` | Stacked area chart of monthly submissions by category | HTML |
| `ietf viz bubble` | Novelty vs Maturity explorer (size=relevance, color=category) | HTML |
| `ietf viz radar` | Average rating profile per category | HTML |
| `ietf viz network` | Author co-authorship force-directed graph | HTML |
| `ietf viz treemap` | Category composition treemap (color=avg score) | HTML |
| `ietf viz quality` | Score vs uniqueness with quadrant annotations | HTML |
| `ietf viz orgs` | Organization contribution horizontal bar chart | HTML |
| `ietf viz ideas` | Ideas frequency by type | HTML |
| `ietf compare <name1> <name2>` | Compare drafts for overlap |
| `ietf authors` | Top authors and draft counts |
| `ietf network` | Organizational collaboration network |
### Reports (`ietf report`)
Markdown reports saved to `data/reports/`.
| Command | Description |
|---------|-------------|
| `ietf report overview` | Sortable table of all rated drafts with bar-chart scores |
| `ietf report landscape` | Category-grouped view with per-category rankings |
| `ietf report timeline` | Monthly submission volume and category trends |
| `ietf report overlap-matrix` | Top similar pairs, per-category overlap, cross-category matrix |
| `ietf report authors` | Top authors, organizations, collaboration pairs |
| `ietf report digest` | Weekly digest of recently fetched drafts |
| `ietf report ideas` | Most common ideas, unique ideas, ideas by type |
### Other
| Command | Description |
|---------|-------------|
| `ietf draft-gen <topic>` | Generate an Internet-Draft addressing a landscape gap |
| `ietf config` | Show or modify configuration |
| `ietf report overview` | Sortable table of all rated drafts |
| `ietf report landscape` | Category-grouped view with rankings |
| `ietf report authors` | Top authors, organizations, collaboration |
| `ietf report ideas` | Ideas by type, most common, unique |
| `ietf report gaps` | Gap analysis with severity ratings |
| `ietf report timeline` | Monthly submission trends |
| `ietf report overlap-matrix` | Similar pairs and cross-category matrix |
## Rating System
Each draft is scored 1-5 on five dimensions:
Each draft is scored 1-5 on five dimensions by Claude (LLM-as-judge, see [methodology](data/reports/methodology.md) for caveats):
| Dimension | What it measures |
|-----------|-----------------|
@@ -152,145 +157,71 @@ Each draft is scored 1-5 on five dimensions:
| **Momentum** | Community engagement, revisions, adoption |
| **Relevance** | Importance to the AI/agent ecosystem |
**Composite score:**
```
score = 0.30 * novelty + 0.25 * relevance + 0.20 * maturity + 0.15 * momentum + 0.10 * (6 - overlap)
```
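The published weights translate directly into code (a sketch; the function name is an assumption):

```python
def composite_score(novelty: float, maturity: float, overlap: float,
                    momentum: float, relevance: float) -> float:
    """Weighted composite per the formula above; overlap is inverted
    (6 - overlap) so that less-redundant drafts score higher."""
    return (0.30 * novelty + 0.25 * relevance + 0.20 * maturity
            + 0.15 * momentum + 0.10 * (6 - overlap))
```

With every dimension at its best (all 5s, overlap of 1), the score tops out at 5.0.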
## Key Findings
- **36x growth** in 9 months (2 drafts/month to 72)
- **7.9% of draft pairs** exceed 0.80 cosine similarity — significant redundancy
- **Safety deficit**: AI safety proposals (36) are vastly outnumbered by protocol proposals (290+)
- **Organizational concentration**: Top 5 orgs contribute ~35% of all drafts
- **1,262 technical ideas** extracted across 6 types (mechanism, architecture, protocol, pattern, extension, requirement)
- **12 identified gaps** in the current landscape (3 critical, 6 high, 3 medium)
## Gap Analysis
Claude-powered gap analysis identifies 12 under-addressed areas across the 260-draft landscape. Each gap is cross-referenced with the drafts and ideas that partially touch on the topic, highlighting where effort is concentrated and where it's missing.
### Critical Gaps
| # | Gap | Category | Drafts in Category | Key Issue |
|--:|-----|----------|-------------------:|-----------|
| 1 | **Agent Resource Management** | Autonomous netops | 60 | No framework for scheduling, quotas, or fair allocation when agents compete for compute, memory, and bandwidth. Drafts focus on communication but ignore resource contention in multi-agent environments. |
| 2 | **Agent Behavior Verification** | AI safety/alignment | 36 | No runtime mechanisms to verify that deployed agents actually behave according to declared policies. Gap between stated capabilities and observed behavior. Closest work: `draft-birkholz-verifiable-agent-conversations` (attestation), `draft-aylward-daap-v2` (accountability). |
| 3 | **Agent Error Recovery & Rollback** | Autonomous netops | 60 | Missing standards for cascading failure recovery and rollback of autonomous decisions. Only `draft-yue-anima-agent-recovery-networks` specifically addresses recovery; `draft-srijal-agents-policy` touches mandatory failure behavior. |
### High-Severity Gaps
| # | Gap | Category | Drafts in Category | Key Issue |
|--:|-----|----------|-------------------:|-----------|
| 4 | **Cross-Protocol Translation** | A2A protocols | 92 | 92 competing A2A protocol drafts with high overlap but no universal translation layer or negotiation mechanism for interoperability between them. |
| 5 | **Agent Lifecycle Management** | Agent discovery/reg | 57 | Registration and discovery are covered but no standards for agent versioning, updates, graceful shutdown, or retirement without disrupting dependent services. |
| 6 | **Multi-Agent Consensus** | A2A protocols | 92 | No framework for groups of agents to reach consensus on conflicting decisions. Closest: `draft-li-dmsc-inf-architecture` (DMSC protocol), `draft-takagi-srta-trinity` (SRTA architecture). |
| 7 | **Human Override & Intervention** | Human-agent interaction | 22 | Only 22 drafts (vs 60 autonomous netops) address human-agent interaction. No emergency override protocols. Best effort: `draft-irtf-nmrg-llm-nm` (human-in-the-loop framework). |
| 8 | **Cross-Domain Security Boundaries** | Agent identity/auth | 98 | Missing frameworks for agents operating across security domains with different trust levels. `draft-diaconu-agents-authz-info-sharing` and `draft-cui-dmsc-agent-cdi` are early attempts but lack enforcement mechanisms. |
| 9 | **Dynamic Trust & Reputation** | Agent identity/auth | 98 | Static certificate-based auth is insufficient for long-running autonomous systems. No dynamic trust scoring or reputation tracking. Closest: `draft-cosmos-protocol-specification` (trust scoring), `draft-jiang-seat-dynamic-attestation`. |
### Medium-Severity Gaps
| # | Gap | Category | Drafts in Category | Key Issue |
|--:|-----|----------|-------------------:|-----------|
| 10 | **Agent Performance Monitoring** | Autonomous netops | 60 | No standardized metrics, SLOs, or observability framework for production agent deployments. `draft-fu-nmop-agent-communication-framework` mentions monitoring but doesn't define standards. |
| 11 | **Agent Explainability** | AI safety/alignment | 36 | No protocols for agents to explain decisions to other agents or humans. Critical for debugging and regulatory compliance. Only 36 safety drafts total. |
| 12 | **Agent Data Provenance** | Data formats/interop | 102 | No standards for tracking data lineage as information flows between agents. 102 data format drafts but none address provenance tracking. |
### Gap Coverage Ratio
The safety deficit is the most striking finding — only **12.3%** of categorized drafts (36/292) address AI safety/alignment, while 92 focus on A2A protocols and 60 on autonomous operations. The ratio of "how to do things" to "how to do things safely" is roughly **7:1**.
**Important**: Ratings are generated from abstracts and partial full text without human calibration. They should be treated as relative rankings, not absolute quality measures.
## Tech Stack
- **Python 3.11+** with Click CLI
- **SQLite** with FTS5 full-text search and WAL mode
- **Anthropic Claude** (Sonnet 4) for analysis, rating, idea extraction, gap analysis
- **Anthropic Claude** (Sonnet/Haiku) for analysis, rating, idea extraction, gap analysis
- **Ollama** (nomic-embed-text) for local embeddings and similarity
- **Flask** with Jinja2 for the interactive web dashboard
- **Plotly** for interactive HTML visualizations
- **Matplotlib/Seaborn** for publication-ready static figures
- **NetworkX** for author collaboration graph analysis
- **NumPy/SciPy/scikit-learn** for similarity computation and dimensionality reduction
- **Flask** with Jinja2 for the web dashboard
- **Plotly** for interactive visualizations
- **NumPy/SciPy/scikit-learn** for similarity computation and clustering
## Project Structure
```
src/ietf_analyzer/
cli.py # Click CLI entry point (all commands)
cli.py # Click CLI entry point (~30 commands)
fetcher.py # IETF Datatracker API client
analyzer.py # Claude-based analysis, rating, idea extraction, gap analysis
embeddings.py # Ollama embeddings + cosine similarity + clustering
db.py # SQLite with FTS5 (7 tables: drafts, ratings, embeddings, llm_cache, authors, draft_authors, ideas, gaps)
analyzer.py # Claude-based analysis (rating, ideas, gaps)
embeddings.py # Ollama embeddings + similarity + clustering
db.py # SQLite with FTS5 (8 tables)
models.py # Author, Draft, Rating dataclasses
reports.py # Markdown report generation
visualize.py # Interactive HTML + static PNG visualizations
authors.py # AuthorNetwork: Datatracker author fetching, collaboration graph
draftgen.py # Internet-Draft generation from gap analysis
config.py # Configuration with defaults
authors.py # Author network analysis
search.py # Hybrid FTS5 + embedding search
classifier.py # Two-stage Ollama classifier
readiness.py # Draft readiness scoring
config.py # Configuration
src/webui/
app.py # Flask application with all routes
data.py # Data access layer (stats, filtering, t-SNE, network graphs)
templates/ # Jinja2 templates (base + 8 page templates)
app.py # Flask application (20 API endpoints)
data.py # Data access layer with TypedDicts
auth.py # Admin authentication
analytics.py # GDPR-compliant pageview tracking
templates/ # Jinja2 templates (23 pages)
data/
drafts.db # SQLite database (all analysis data)
reports/ # Generated markdown reports
figures/ # Generated visualizations (HTML + PNG)
drafts.db # SQLite database
reports/ # Generated reports + blog series
.cache/ # Similarity matrix cache
paper/
main.tex # arXiv paper: "The AI Agent Standardization Wave"
export_figures.py # Export interactive charts to static images
Makefile # Build: make pdf
main.tex # arXiv paper
```
## Database Schema
| Table | Purpose | Records |
|-------|---------|--------:|
| `drafts` | Draft metadata + full text | 260 |
| `ratings` | 5-dimension AI ratings + summary | 260 |
| `embeddings` | Semantic vectors (nomic-embed-text) | 260 |
| `llm_cache` | Claude API response cache | ~500 |
| `authors` | Person records from Datatracker | 403 |
| `draft_authors` | Author-draft relationships | 742 |
| `ideas` | Extracted technical ideas | 1,262 |
| `drafts` | Draft metadata + full text | 475 |
| `ratings` | 5-dimension ratings + summary + false_positive flag | 475 |
| `embeddings` | Semantic vectors (nomic-embed-text, 768-dim) | 475 |
| `llm_cache` | Claude API response cache (SHA-256 dedup) | ~1,500 |
| `authors` | Person records from Datatracker | 713 |
| `draft_authors` | Author-draft relationships | ~1,400 |
| `ideas` | Extracted + deduplicated technical ideas | 501 |
| `gaps` | Gap analysis results | 12 |
| `drafts_fts` | FTS5 full-text search index | — |
## arXiv Paper
A 13-page paper is included in `paper/main.tex`:
> **The AI Agent Standardization Wave: A Quantitative Analysis of 260 IETF Internet-Drafts on Autonomous Agents and Artificial Intelligence**
Build with:
```bash
cd paper
python3 export_figures.py # copy/export figures
pdflatex main.tex # compile (run twice for references)
```
## Configuration
```bash
# Show current config
ietf config
# Change Claude model
ietf config --set claude_model claude-sonnet-4-20250514
# API key via .env file (auto-loaded)
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
```
## Cost
Full analysis of 260 drafts consumed ~475K API tokens (rating + idea extraction + gap analysis). At current Sonnet pricing, this is approximately $2-3 USD.
Full pipeline for 475 drafts: ~$9-15 USD total
- Sonnet for rating + gap analysis (~$3)
- Haiku for bulk idea extraction (~$1)
- Ollama embeddings: free (local)
## License
MIT
MIT — see [LICENSE](LICENSE)

Binary file not shown.


@@ -1,5 +1,5 @@
# Gap Analysis: IETF AI/Agent Draft Landscape
*Generated 2026-03-08 14:30 UTC — analyzing 474 drafts, 462 technical ideas*
*Generated 2026-03-08 15:15 UTC — analyzing 474 drafts, 498 technical ideas*
## Overview
@@ -50,7 +50,7 @@ Current AI safety drafts focus on governance but lack technical protocols for re
### Partially Addressing Ideas
17 extracted ideas touch on this gap:
18 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
@@ -63,7 +63,7 @@ Current AI safety drafts focus on governance but lack technical protocols for re
| Verifiable Agent Conversation Format | draft-birkholz-verifiable-agent-conversations | protocol |
| Intent-Based Just-in-Time Authorization | draft-chen-agent-decoupled-authorization-model | architecture |
*...and 9 more*
*...and 10 more*
---
@@ -221,7 +221,7 @@ No standardized protocols exist for tracking and billing computational resources
### Partially Addressing Ideas
8 extracted ideas touch on this gap:
10 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
@@ -229,10 +229,12 @@ No standardized protocols exist for tracking and billing computational resources
| Events Query Protocol | draft-gupta-httpapi-events-query | protocol |
| Micro Agent Communication Protocol (µACP) | draft-mallick-muacp | protocol |
| MOQT Binding for A2A and MCP Protocols | draft-nandakumar-ai-agent-moq-transport | extension |
| AI Agent Protocol Requirements | draft-rosenberg-ai-protocols | requirement |
| SCIM 2.0 Agent Extension | draft-scim-agent-extension | extension |
| Authorized Connection Policy Framework | draft-steckbeck-ua-conn-sec | mechanism |
| Agent Workflow Protocol Well-Known Resource | draft-vinaysingh-awp-wellknown | extension |
| AI Network Traffic Optimization Agent | draft-yuan-rtgwg-traffic-agent-usecase | architecture |
*...and 2 more*
---
@@ -269,7 +271,7 @@ While agent discovery protocols exist, there's no way to cryptographically verif
### Partially Addressing Ideas
25 extracted ideas touch on this gap:
27 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
@@ -282,7 +284,7 @@ While agent discovery protocols exist, there's no way to cryptographically verif
| AI-Native Network Protocol (AINP) | draft-ainp-protocol | protocol |
| Distributed AI Accountability Protocol | draft-aylward-daap-v2 | protocol |
*...and 17 more*
*...and 19 more*
---
@@ -319,20 +321,20 @@ Current identity/auth solutions don't address secure communication between agent
### Partially Addressing Ideas
46 extracted ideas touch on this gap:
54 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| Agent Gateway Intercommunication Framework | draft-han-rtgwg-agent-gateway-intercomm-framework | architecture |
| Agent Gateway Requirements | draft-liu-rtgwg-agent-gateway-requirements | requirement |
| AI Agent Security Requirements Framework | draft-ni-a2a-ai-agent-security-requirements | requirement |
| Centralized Gateway for Multi-Agent Communication | draft-song-dmsc-problem-statement | architecture |
| Multi-Tenant Policy Enforcement Infrastructure | draft-song-dmsc-problem-statement | architecture |
| Intelligent Agent Communication Gateway Architecture | draft-agent-gw | architecture |
| AI-Native Network Protocol (AINP) | draft-ainp-protocol | protocol |
| Agent-to-Agent Communication in Transportation Networks | draft-an-nmrg-i2icf-cits | pattern |
| Zero Trust Runtime Agent Architecture | draft-berlinai-vera | architecture |
| Agentic Data Optimization Layer (ADOL) | draft-chang-agent-token-efficient | protocol |
| Agentic network architecture for multi-agent coordination | draft-chuyi-nmrg-agentic-network-inference | architecture |
*...and 38 more*
*...and 46 more*
---
@@ -443,7 +445,7 @@ No standardized formats or protocols exist for how agents should persist long-te
### Partially Addressing Ideas
16 extracted ideas touch on this gap:
18 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
@@ -456,7 +458,7 @@ No standardized formats or protocols exist for how agents should persist long-te
| Agentic AI for Autonomous Network Management | draft-hong-nmrg-agenticai-ps | requirement |
| LISP-based geospatial intelligence network | draft-ietf-lisp-nexagon | protocol |
*...and 8 more*
*...and 10 more*
---

docs/blog/.nojekyll

docs/blog/css/style.css

@@ -0,0 +1,123 @@
:root {
--bg: #ffffff;
--text: #1a1a1a;
--muted: #6b7280;
--border: #e5e7eb;
--accent: #2563eb;
--code-bg: #f3f4f6;
}
@media (prefers-color-scheme: dark) {
:root {
--bg: #111827;
--text: #e5e7eb;
--muted: #9ca3af;
--border: #374151;
--accent: #60a5fa;
--code-bg: #1f2937;
}
}
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', system-ui, sans-serif;
color: var(--text);
background: var(--bg);
line-height: 1.7;
font-size: 17px;
}
.container {
max-width: 720px;
margin: 0 auto;
padding: 2rem 1.5rem;
}
nav {
border-bottom: 1px solid var(--border);
padding: 1rem 0;
margin-bottom: 2rem;
}
nav a {
color: var(--accent);
text-decoration: none;
margin-right: 1.5rem;
font-size: 0.9rem;
}
nav a:hover { text-decoration: underline; }
nav .site-title { font-weight: 700; font-size: 1.1rem; }
h1 { font-size: 2rem; margin: 1.5rem 0 1rem; line-height: 1.2; }
h2 { font-size: 1.5rem; margin: 2rem 0 0.75rem; }
h3 { font-size: 1.2rem; margin: 1.5rem 0 0.5rem; }
p { margin: 0.75rem 0; }
a { color: var(--accent); }
blockquote {
border-left: 3px solid var(--accent);
padding-left: 1rem;
color: var(--muted);
margin: 1rem 0;
}
code {
background: var(--code-bg);
padding: 0.15rem 0.4rem;
border-radius: 3px;
font-size: 0.9em;
}
pre {
background: var(--code-bg);
padding: 1rem;
border-radius: 6px;
overflow-x: auto;
margin: 1rem 0;
}
pre code { background: none; padding: 0; }
table {
width: 100%;
border-collapse: collapse;
margin: 1rem 0;
font-size: 0.95rem;
}
th, td {
padding: 0.5rem 0.75rem;
border: 1px solid var(--border);
text-align: left;
}
th { background: var(--code-bg); font-weight: 600; }
ul, ol { padding-left: 1.5rem; margin: 0.75rem 0; }
li { margin: 0.25rem 0; }
.post-nav {
display: flex;
justify-content: space-between;
margin-top: 3rem;
padding-top: 1rem;
border-top: 1px solid var(--border);
font-size: 0.9rem;
}
.post-list { list-style: none; padding: 0; }
.post-list li { margin: 1rem 0; }
.post-list a { font-size: 1.1rem; font-weight: 500; }
.post-list .desc { color: var(--muted); font-size: 0.9rem; }
footer {
margin-top: 3rem;
padding-top: 1rem;
border-top: 1px solid var(--border);
color: var(--muted);
font-size: 0.85rem;
}

docs/blog/index.html

@@ -0,0 +1,47 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Home — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1>The AI Agent Standards Gold Rush</h1>
<p><em>A data-driven analysis of 475 IETF Internet-Drafts on AI agents, autonomous systems, and machine learning protocols.</em></p>
<p>The IETF is experiencing an unprecedented surge in AI/agent standardization activity.
We built an automated analysis pipeline to make sense of it: 713 authors, 501 ideas,
132 cross-organizational convergent ideas, and 12 identified gaps.</p>
<h2>The Series</h2>
<ul class="post-list">
<li><a href="/blog/posts/01-gold-rush.html">Post 1: The Gold Rush</a></li>
<li><a href="/blog/posts/02-who-writes-the-rules.html">Post 2: Who Writes the Rules</a></li>
<li><a href="/blog/posts/03-oauth-wars.html">Post 3: The OAuth Wars</a></li>
<li><a href="/blog/posts/04-what-nobody-builds.html">Post 4: What Nobody Builds</a></li>
<li><a href="/blog/posts/05-1262-ideas.html">Post 5: Where Drafts Converge</a></li>
<li><a href="/blog/posts/06-big-picture.html">Post 6: The Big Picture</a></li>
<li><a href="/blog/posts/07-how-we-built-this.html">Post 7: How We Built This</a></li>
<li><a href="/blog/posts/08-agents-building-the-analysis.html">Post 8: Agents Building the Agent Analysis</a></li>
</ul>
<h2>About</h2>
<p>This analysis was produced using the <a href="https://github.com/cnennemann/ietf-draft-analyzer">IETF Draft Analyzer</a>,
an open-source Python tool that combines Claude for multi-dimensional rating and idea extraction
with Ollama for semantic embeddings. Total API cost: ~$9-15.</p>
<p><a href="/blog/posts/00-series-overview.html">Read the series overview &rarr;</a></p>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -0,0 +1,377 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Blog Series: The IETF's AI Agent Standards Race — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="blog-series-the-ietfs-ai-agent-standards-race">Blog Series: The IETF's AI Agent Standards Race</h1>
<h2 id="series-overview-and-narrative-arc">Series Overview and Narrative Arc</h2>
<p><em>Architectural design document governing the 7-post blog series. This document has two sections: (A) the internal narrative architecture (for the team), and (B) the reader-facing series introduction (for publication).</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<h1 id="part-a-narrative-architecture-internal">PART A: NARRATIVE ARCHITECTURE (Internal)</h1>
<h2 id="overall-thesis">Overall Thesis</h2>
<p><strong>The IETF's AI agent standardization effort is the largest, fastest-growing, and most consequential standards race in a decade -- but it is building the highways before the traffic lights.</strong></p>
<p>The data tells a story in three acts:</p>
<ol>
<li>
<p><strong>The Gold Rush</strong> (Posts 1-2): An explosion of activity, concentrated in surprising hands. 434 drafts, rapid growth in 9 months, one company writing ~16% of all drafts, Western tech giants dramatically underrepresented.</p>
</li>
<li>
<p><strong>The Fragmentation</strong> (Posts 3-4): That activity is not converging. 155 competing A2A protocols with no interoperability layer. 14 OAuth-for-agents proposals that cannot coexist. A capability-to-safety ratio averaging ~4:1 but swinging from 1.5:1 to 21:1 month-to-month. Critical gaps where nobody is building at all.</p>
</li>
<li>
<p><strong>The Path Forward</strong> (Posts 5-6): The raw material for a solution exists -- <strong>130 cross-org convergent ideas</strong> (36% of unique clusters) independently proposed by multiple organizations show where genuine consensus is forming. But convergence on components is not convergence on architecture. The missing piece is not more protocols; it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles.</p>
</li>
</ol>
<p>The throughline is a question: <strong>Can the IETF assemble the architecture before the protocols ship without it?</strong></p>
<hr />
<h2 id="narrative-arc-diagram">Narrative Arc Diagram</h2>
<pre><code>TENSION
^
| Post 6: THE BIG PICTURE
| / (resolution: here's
| / what the ecosystem
| Post 4: THE GAPS -----+ actually needs)
| / (climax: what \
| / nobody's building) \
| Post 3 / Post 5 \
| FRAGMENTATION CONVERGENCE \
| / (escalation: (130 cross-org \
| / competing for solutions) Post 7
| / protocols) HOW WE
|/ BUILT THIS
Post 1 Post 2
GOLD RUSH WHO WRITES
(hook: the THE RULES
numbers) (stakes:
geopolitics)
+-----------------------------------------------------------&gt; TIME/POSTS
</code></pre>
<p><strong>The emotional arc</strong>: Wow, this is huge (Post 1) -&gt; Wait, who controls it? (Post 2) -&gt; Oh no, it is fragmenting (Post 3) -&gt; And the most important parts are missing (Post 4, the climax) -&gt; But beneath the chaos, organizations actually agree on 130 ideas (Post 5) -&gt; Here is what the finished picture looks like (Post 6, the resolution) -&gt; And here is how we figured all this out (Post 7, the coda).</p>
<hr />
<h2 id="per-post-design">Per-Post Design</h2>
<h3 id="post-1-the-ietfs-ai-agent-gold-rush">Post 1: "The IETF's AI Agent Gold Rush"</h3>
<p><strong>File</strong>: <code>01-gold-rush.md</code>
<strong>Word count</strong>: 1800-2200
<strong>Base</strong>: Existing draft at <code>data/reports/blog-post.md</code>, needs update from 260 to 434 drafts</p>
<p><strong>Key thesis</strong>: The IETF is experiencing an unprecedented standardization sprint around AI agents, with growth rates not seen since the early web standards era.</p>
<p><strong>Key data points to include</strong>:
- 434 drafts (up from 260 after keyword expansion with mcp, agentic, inference, generative, intelligent, aipref)
- Rapid growth: from 5 drafts/month (Jun 2025) to 85 drafts/month (Feb 2026)
- 557 authors from 230 organizations
- 10+ categories, with data formats/interop (174), A2A protocols (155), and identity/auth (152) leading
- Average quality score: ~3.27/5.0 (4-dim composite, range 1.25-4.75)
- Top-rated drafts: VOLT (4.75), DAAP (4.75), STAMP (4.5), TPM-attestation (4.5)
- ~4:1 safety deficit ratio on aggregate, varying from 1.5:1 to 21:1 by month (first mention -- this becomes the recurring motif)</p>
<p><strong>What makes it worth reading alone</strong>: The sheer numbers. Nobody else has quantified this. The rapid growth curve is the hook.</p>
<p><strong>Ends with</strong>: Teaser for Post 2 -- "But who is writing all these drafts? The answer is more concentrated than you'd expect."</p>
<hr />
<h3 id="post-2-whos-writing-the-rules-for-ai-agents">Post 2: "Who's Writing the Rules for AI Agents?"</h3>
<p><strong>File</strong>: <code>02-who-writes-the-rules.md</code>
<strong>Word count</strong>: 2000-2500</p>
<p><strong>Key thesis</strong>: The standards that will govern AI agents are being written by a remarkably concentrated set of authors, with geopolitical implications that the IETF community has not reckoned with.</p>
<p><strong>Key data points to include</strong>:
- Huawei: 53 authors, 69 drafts, ~16% of all drafts (up from 12% pre-expansion)
- The 13-person Huawei bloc: 22 shared drafts, 94% cohesion, core 7 (B. Liu, N. Geng, Z. Li, Q. Gao, X. Shang, J. Mao, G. Zeng) each on 13-23 drafts
- Chinese institutional ecosystem: Huawei (53) + China Mobile (24) + China Telecom (24) + China Unicom (22) + Tsinghua (13) + ZTE (12) + BUPT (14) + Pengcheng Lab (8) + Zhongguancun Lab (4) = 160+ authors
- Western underrepresentation: Google now visible (5 authors, 9 drafts) but dramatically small relative to market position. Microsoft, Apple still largely absent. Amazon has 6 authors on 6 drafts (PQ crypto, not agent-specific).
- 18 team blocs covering ~25% of 557 authors
- Cross-org collaboration is sparse: top cross-team pair (Rosenberg-Jennings, Five9/Cisco) shares only 3 drafts
- Ericsson + Inria team focused narrowly on EDHOC/post-quantum (5 people, 6 drafts, 100% cohesion)
- JPMorgan + Telefonica + Oracle on transitive attestation (Western financial sector emerging)
- Chinese orgs form a tightly linked ecosystem: Huawei-China Unicom (6 shared drafts), Tsinghua-Zhongguancun Lab (5), China Mobile-ZTE (4)</p>
<p><strong>Structural insight</strong>: Team blocs inflate apparent collaboration. When you account for intra-bloc pairs, cross-pollination between groups is thin. The landscape is a collection of islands, not a network.</p>
<p><strong>What makes it worth reading alone</strong>: The geopolitics angle. The Huawei concentration is a genuine story. The Western absence is the surprise.</p>
<p><strong>Ends with</strong>: "These 18 teams are not just writing separate drafts -- they are writing separate futures. The fragmentation runs deeper than authorship."</p>
<hr />
<h3 id="post-3-the-oauth-wars-and-other-protocol-battles">Post 3: "The OAuth Wars and Other Protocol Battles"</h3>
<p><strong>File</strong>: <code>03-oauth-wars.md</code>
<strong>Word count</strong>: 2000-2500</p>
<p><strong>Key thesis</strong>: The AI agent standards landscape is not just growing -- it is fragmenting. Multiple teams are solving the same problems independently, producing incompatible solutions that will impose real costs on implementers.</p>
<p><strong>Key data points to include</strong>:
- 14-draft OAuth-for-agents cluster: aap-oauth-profile, aylward-daap-v2, barney-caam, chen-ai-agent-auth, chen-oauth-rar, goswami-agentic-jwt, jia-oauth-scope, liu-agent-operation-auth, liu-oauth-a2a, oauth-ai-agents-on-behalf-of-user, rosenberg-oauth-aauth, song-oauth-ai-agent-auth, song-oauth-ai-agent-collaborate, yao-agent-auth
- 10-draft Agent Gateway cluster
- 25+ near-duplicate draft pairs (&gt;0.98 similarity)
- 42 topical clusters at 0.85 similarity threshold, 34 at 0.90
- 155 A2A protocol drafts with no interoperability layer
- Near-duplicate taxonomy: same-draft/different-WG (14), renamed (5), evolution (3), competing (2)
- Specific examples of WG shopping: draft submitted to both NMRG and OPSAWG, or both individual and WG track</p>
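<p><em>A note on mechanics: the near-duplicate pairs and overlap clusters above come from comparing draft embeddings pairwise. A minimal sketch of that comparison, assuming precomputed embedding vectors (the function and variable names are illustrative, not the pipeline's actual code):</em></p>

```python
from itertools import combinations

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

def near_duplicates(embeddings, threshold=0.98):
    # embeddings: illustrative dict of draft name to embedding vector
    # (in the pipeline, vectors come from Ollama's nomic-embed-text).
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        if cosine(vec_a, vec_b) >= threshold:
            pairs.append((name_a, name_b))
    return pairs
```

<p><em>Lowering the threshold from 0.98 toward 0.85 is what turns near-duplicate detection into the broader topical clustering described above.</em></p>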
<p><strong>Structural insight</strong>: Three causes of fragmentation: (1) WG shopping -- authors submit to multiple WGs hoping one sticks. (2) Parallel invention -- teams in isolation solving the same problem. (3) Strategic duplication -- organizations maximizing surface area. The data lets us distinguish these.</p>
<p><strong>What makes it worth reading alone</strong>: The concrete examples. 14 ways to do OAuth for agents. People share this out of horrified fascination.</p>
<p><strong>Ends with</strong>: "Fragmentation is costly but fixable -- teams can converge. The deeper problem is what nobody is building at all."</p>
<hr />
<h3 id="post-4-what-nobodys-building-and-why-it-matters">Post 4: "What Nobody's Building (And Why It Matters)"</h3>
<p><strong>File</strong>: <code>04-what-nobody-builds.md</code>
<strong>Word count</strong>: 2000-2500</p>
<p><strong>THIS IS THE CLIMAX OF THE SERIES.</strong></p>
<p><strong>Key thesis</strong>: The most dangerous gaps in AI agent standardization are not where competing solutions exist -- they are where no solutions exist at all. The critical gaps address what happens when autonomous agents fail or misbehave, and these scenarios have received almost no attention.</p>
<p><strong>Key data points to include</strong>:
- 11 gaps total: 2 critical, 5 high, 4 medium
- <strong>Critical Gap 1: Behavioral Verification</strong> -- no mechanisms to verify agents follow declared policies. 47 safety drafts vs 434 total.
- <strong>Critical Gap 2: Failure Cascade Prevention</strong> -- 114 autonomous netops drafts, no cascade prevention framework.
- <strong>Critical Gap 3: Error Recovery and Rollback</strong> -- only 6 ideas from 1 draft (the starkest absence in the corpus).
- <strong>High Gap: Cross-Protocol Translation</strong> -- 155 A2A protocols, zero ideas for cross-protocol interop.
- <strong>High Gap: Human Override</strong> -- 34 human-agent drafts vs 155 A2A vs 114 autonomous netops. CHEQ exists but no emergency override protocol.
- The ~4:1 ratio (varying 1.5:1 to 21:1) revisited: the safety deficit is not just numerical, it is structural. Safety requires cross-WG coordination that the bloc structure cannot produce.
- Gap severity correlates with coordination difficulty</p>
<p><strong>For each critical gap, include a scenario</strong>: "What goes wrong if this is never addressed?" -- make the gaps concrete and visceral.</p>
<p><strong>What makes it worth reading alone</strong>: The fear factor. This is the "what keeps you up at night" post.</p>
<p><strong>Ends with</strong>: "The gaps are real. But so are the solutions -- 130 ideas that multiple organizations independently agree on, scattered across the corpus with no connective tissue."</p>
<hr />
<h3 id="post-5-where-434-drafts-converge-and-where-they-dont">Post 5: "Where 434 Drafts Converge (And Where They Don't)"</h3>
<p><strong>File</strong>: <code>05-1262-ideas.md</code>
<strong>Word count</strong>: 2000-2500</p>
<p><strong>Key thesis</strong>: Beneath the fragmentation, genuine consensus is forming. <strong>130 cross-org convergent ideas</strong> (36% of unique clusters) have been independently proposed by 2+ organizations -- cross-org convergence signals that reveal what the industry actually agrees on, regardless of which protocol camp they belong to.</p>
<p><strong>IMPORTANT NOTE ON FRAMING</strong>: The current database contains 419 ideas in 361 unique clusters. Cross-org convergence analysis (SequenceMatcher at 0.75 threshold) yields 130 ideas appearing across 2+ organizations. An earlier pipeline run with ~1,780 raw ideas produced 628 cross-org convergent ideas; the convergence <em>rate</em> (~36%) is consistent across both runs. The raw count is not the story. The story is which ideas survive cross-org validation. The raw extraction count should appear only in methodology context, not as a headline number.</p>
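<p><em>The matching step itself is small enough to sketch with Python's standard-library <code>SequenceMatcher</code> at the 0.75 threshold (the tuple layout is illustrative, not the pipeline's actual schema):</em></p>

```python
from difflib import SequenceMatcher

def cross_org_convergent(ideas, threshold=0.75):
    # ideas: illustrative list of (idea_text, organization) tuples.
    # An idea pair counts as cross-org convergent when the texts are
    # sufficiently similar and come from different organizations.
    matches = []
    for i, (text_a, org_a) in enumerate(ideas):
        for text_b, org_b in ideas[i + 1:]:
            if org_a == org_b:
                continue
            ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
            if ratio >= threshold:
                matches.append((text_a, org_a, text_b, org_b))
    return matches
```

<p><em>Same-organization pairs are skipped by design: the metric is meant to capture independent proposal of the same idea, not one team restating itself.</em></p>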
<p><strong>Key data points to include</strong>:
- <strong>130 cross-org convergent ideas</strong> (ideas in 2+ drafts from different organizations) -- the headline metric
- Top convergence: "A2A Communication Paradigm" (8 orgs, 5 countries), "AI Agent Network Architecture" (8 orgs), "Multi-Agent Communication Protocol" (7 orgs)
- Org-pair overlap matrix: Chinese intra-bloc alignment (Huawei-China Unicom: 32 shared ideas) vs thin cross-regional signal (Ericsson-Inria: 21)
- Cross-org ideas that span Chinese-Western divide: 180 ideas (genuine cross-cultural consensus)
- Gap-to-convergence mapping: which gaps have cross-org attention, which have none?
- The "big 6" ambitious proposals: VOLT, ECT, CHEQ, STAMP, DAAP, ADL -- standout ideas regardless of convergence metrics
- The absent ideas: capability degradation signaling, multi-agent transaction semantics, agent migration, privacy-preserving discovery, agent cost/billing</p>
<p><strong>Structural insight</strong>: Convergence and fragmentation coexist. Teams agree on WHAT needs building (130 ideas converge across orgs). They disagree on HOW (155 competing A2A protocols). The gap between "what" and "how" is where architecture is needed.</p>
<p><strong>What makes it worth reading alone</strong>: The cross-org convergence data is actionable -- builders can see which ideas have multi-org backing vs single-team proposals.</p>
<p><strong>Ends with</strong>: "130 ideas the industry agrees on, 11 gaps nobody is filling, and a question: what would it look like if someone drew the big picture?"</p>
<hr />
<h3 id="post-6-drawing-the-big-picture-what-the-agent-ecosystem-actually-needs">Post 6: "Drawing the Big Picture: What the Agent Ecosystem Actually Needs"</h3>
<p><strong>File</strong>: <code>06-big-picture.md</code>
<strong>Word count</strong>: 2000-2500</p>
<p><strong>THIS IS THE RESOLUTION AND CAPSTONE.</strong></p>
<p><strong>Key thesis</strong>: The landscape needs not more protocols but connective tissue -- a holistic ecosystem architecture providing a shared execution model (DAGs), human oversight primitives, protocol-agnostic interoperability, and assurance profiles that work from dev to regulated production.</p>
<p><strong>Key data points to include</strong>:
- Full synthesis: 434 drafts, 557 authors, 130 cross-org convergent ideas, 11 gaps, 18 team blocs, 42 overlap clusters
- The proposed 5-draft ecosystem: AEM (architecture), ATD (task DAG), HITL (human-in-the-loop), AEPB (protocol binding), APAE (assurance profiles)
- How this builds on existing work: SPIFFE (identity), WIMSE (security context), ECT (execution evidence)
- The dual-regime insight: same execution model must work in K8s (fast/relaxed) AND regulated environments (proofs/attestation)
- Predictions based on data trajectories
- What builders should do TODAY: which drafts to watch, which gaps to fill, which patterns to adopt</p>
<p><strong>Structural insight</strong>: The ecosystem needs five layers and existing work covers ~60%. Missing pieces: (1) DAG orchestration semantics, (2) HITL as first-class, (3) protocol translation, (4) assurance profiles. These map precisely to the critical and high-severity gaps.</p>
<p><strong>What makes it worth reading alone</strong>: The vision. The forward-looking piece people share with their teams.</p>
<p><strong>Ends with</strong>: "The IETF has navigated standardization sprints before. The drafts are being written. The question is whether architecture or fragmentation wins the race."</p>
<hr />
<h3 id="post-7-how-we-built-this-analyzing-434-ietf-drafts-with-claude-and-ollama">Post 7: "How We Built This: Analyzing 434 IETF Drafts with Claude and Ollama"</h3>
<p><strong>File</strong>: <code>07-how-we-built-this.md</code>
<strong>Word count</strong>: 1500-2000</p>
<p><strong>Key thesis</strong>: LLM-powered document analysis at scale is practical, cheap, and effective -- with careful engineering around caching, cost optimization, and hybrid model strategies.</p>
<p><strong>Key data points to include</strong>:
- Pipeline: fetch (Datatracker API) -&gt; analyze (Claude Sonnet) -&gt; embed (Ollama nomic-embed-text) -&gt; ideas (Claude Haiku, batched) -&gt; gaps (Claude Sonnet)
- Cost: ~$3.16 for 260 drafts; Haiku batch mode cut costs ~10x for idea extraction
- Hybrid strategy: Claude for analysis (reasoning), Ollama for embeddings (local, free, fast)
- Caching via llm_cache table (SHA256 prompt hash) -- zero waste on re-runs
- Tech: Python + Click + SQLite + FTS5 + httpx + rich + anthropic SDK + ollama
- 13 CLI commands, 13+ visualizations, 11 report types</p>
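<p><em>The llm_cache pattern is worth showing because it is so cheap to adopt. A minimal, illustrative sketch -- the real table layout and column names may differ, and the table is assumed to already exist:</em></p>

```python
import hashlib

def cached_completion(db, prompt, call_model):
    # Look up the SHA256 hash of the prompt in an llm_cache table;
    # call the model only on a miss, then store the response.
    # Table and column names here are assumptions, not the tool's schema.
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    row = db.execute(
        "SELECT response FROM llm_cache WHERE prompt_hash = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]
    response = call_model(prompt)
    db.execute(
        "INSERT INTO llm_cache (prompt_hash, response) VALUES (?, ?)",
        (key, response),
    )
    db.commit()
    return response
```

<p><em>Re-running the pipeline then costs nothing for already-analyzed drafts, which is what makes iterating on prompts affordable.</em></p>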
<p><strong>What makes it worth reading alone</strong>: Practical engineering details for anyone building similar systems.</p>
<p><strong>Ends with</strong>: Cross-link to Post 8 (the meta post about the agent team).</p>
<hr />
<h2 id="recurring-motifs-thread-across-all-posts">Recurring Motifs (thread across all posts)</h2>
<ol>
<li>
<p><strong>The ~4:1 Safety Deficit</strong> (varying from 1.5:1 to 21:1 month-to-month): Introduced in Post 1, deepened in Post 4, resolved in Post 6. The series' signature metric.</p>
</li>
<li>
<p><strong>The Highway/Traffic Light Metaphor</strong>: The IETF is building highways (protocols) before traffic lights (safety, verification, override). Use sparingly but consistently.</p>
</li>
<li>
<p><strong>Fragmentation vs. Architecture</strong>: Bottom-up protocol proliferation vs. top-down ecosystem design. Posts 3 and 6 are the poles of this tension.</p>
</li>
<li>
<p><strong>Concentration and Absence</strong>: Huawei's dominance and Western absence. Introduced in Post 2, revisited in Post 6.</p>
</li>
<li>
<p><strong>The Islands Problem</strong>: Team blocs as islands. Ideas cluster within orgs. Cross-pollination is thin. The ecosystem needs bridges, not more islands.</p>
</li>
</ol>
<hr />
<h2 id="data-needs-per-post-for-the-analyst">Data Needs Per Post (for the Analyst)</h2>
<table>
<thead>
<tr>
<th>Post</th>
<th>Data Needed</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Updated counts (361), category breakdown with new drafts, growth timeline, score distribution</td>
</tr>
<tr>
<td>2</td>
<td>Author/org rankings (refreshed for 361), bloc details, cross-org matrix, Chinese vs Western counts</td>
</tr>
<tr>
<td>3</td>
<td>OAuth cluster details (14 drafts with approaches), near-duplicate pairs, overlap clusters, A2A count</td>
</tr>
<tr>
<td>4</td>
<td>Full gap details, per-gap idea counts, safety ratio, category vs gap matrix</td>
</tr>
<tr>
<td>5</td>
<td>Full idea taxonomy, cross-org idea overlap, common ideas, unique ideas, idea-to-gap mapping</td>
</tr>
<tr>
<td>6</td>
<td>Synthesis: top-level stats, gap fill estimates, category growth rates, WG adoption signals</td>
</tr>
<tr>
<td>7</td>
<td>Pipeline stats: API call counts, costs, cache hit rates, timing</td>
</tr>
</tbody>
</table>
<hr />
<h2 id="missing-analyses-the-coder-should-build">Missing Analyses the Coder Should Build</h2>
<ol>
<li>
<p><strong>Category Trend Analysis</strong> (Posts 1, 3, 6): Monthly breakdown per category. Growth rates. Which accelerating, which plateauing?</p>
</li>
<li>
<p><strong>RFC Cross-Reference Map</strong> (Posts 5, 6): Which RFCs do the 434 drafts build on? Reveals the foundation layer.</p>
</li>
<li>
<p><strong>Cross-Org Idea Overlap</strong> (Post 5): Ideas in 2+ drafts from different orgs = genuine consensus signal.</p>
</li>
<li>
<p><strong>Draft Status / WG Adoption</strong> (Post 6): Which drafts adopted by WGs? Which past -00? Traction vs aspiration.</p>
</li>
</ol>
<hr />
<h2 id="tone-and-style">Tone and Style</h2>
<ul>
<li><strong>Data-driven but narrative</strong>: Every claim backed by a number, every number wrapped in a story.</li>
<li><strong>Authoritative but accessible</strong>: Analysis, not advocacy. Let the data argue.</li>
<li><strong>Opinionated where data supports it</strong>: The safety deficit is a problem. Fragmentation is costly. Concentration is concerning.</li>
<li><strong>Name names</strong>: Specific drafts, authors, organizations. This is journalism.</li>
<li><strong>Lead with surprise</strong>: Each post opens with its most unexpected finding.</li>
<li><strong>End with forward link</strong>: Each post teases the next.</li>
<li><strong>1500-2500 words per post</strong>: Dense enough to be substantial, short enough to finish.</li>
</ul>
<hr />
<h1 id="part-b-reader-facing-series-introduction">PART B: READER-FACING SERIES INTRODUCTION</h1>
<p><em>What happens when the internet's standards body tries to build the rules for AI agents -- in real time, with 434 drafts, 557 authors, and a ~4:1 safety deficit (varying from 1.5:1 to 21:1 by month)?</em></p>
<hr />
<h2 id="about-this-series">About This Series</h2>
<p>The Internet Engineering Task Force is in the middle of the largest, fastest-growing standards race in a decade. In fifteen months, AI- and agent-related Internet-Drafts went from <strong>0.5% to 9.3%</strong> of all IETF submissions -- nearly 1 in 10. We built an automated analyzer to fetch, categorize, rate, and map every one of them.</p>
<p>This series tells the story of what we found: explosive growth, deep fragmentation, a concerning safety deficit, and hidden patterns that reveal where the real power lies and where the real risks lurk.</p>
<h2 id="the-posts">The Posts</h2>
<table>
<thead>
<tr>
<th>#</th>
<th>Title</th>
<th>What You'll Learn</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td><a href="01-gold-rush.md">The IETF's AI Agent Gold Rush</a></td>
<td>The numbers: 434 drafts, 0.5% to 9.3% growth in 15 months, and a ~4:1 capability-to-safety ratio (varying 1.5:1 to 21:1)</td>
</tr>
<tr>
<td>2</td>
<td><a href="02-who-writes-the-rules.md">Who's Writing the Rules for AI Agents?</a></td>
<td>The geopolitics: Huawei's 13-person bloc, Chinese institutional dominance, Western underrepresentation</td>
</tr>
<tr>
<td>3</td>
<td><a href="03-oauth-wars.md">The OAuth Wars and Other Battles</a></td>
<td>The fragmentation: 14 competing OAuth drafts, 155 A2A protocols with no interop</td>
</tr>
<tr>
<td>4</td>
<td><a href="04-what-nobody-builds.md">What Nobody's Building (And Why It Matters)</a></td>
<td>The gaps: 11 missing standards, 2 critical, and what goes wrong without them</td>
</tr>
<tr>
<td>5</td>
<td><a href="05-1262-ideas.md">Where 434 Drafts Converge (And Where They Don't)</a></td>
<td>The convergence: 130 cross-org ideas reveal genuine consensus beneath the fragmentation</td>
</tr>
<tr>
<td>6</td>
<td><a href="06-big-picture.md">Drawing the Big Picture</a></td>
<td>The vision: what the agent ecosystem actually needs and what comes next</td>
</tr>
<tr>
<td>7</td>
<td><a href="07-how-we-built-this.md">How We Built This</a></td>
<td>The methodology: analyzing 434 drafts with Claude, Ollama, and Python</td>
</tr>
</tbody>
</table>
<h2 id="how-to-read">How to Read</h2>
<p><strong>Linear (recommended)</strong>: 1 -&gt; 2 -&gt; 3 -&gt; 4 -&gt; 5 -&gt; 6 -&gt; 7</p>
<p><strong>By interest</strong>:
- <strong>Executives / decision-makers</strong>: Post 1 (overview) -&gt; Post 4 (gaps) -&gt; Post 6 (vision)
- <strong>Standards participants</strong>: Post 2 (who's writing) -&gt; Post 3 (fragmentation) -&gt; Post 5 (ideas) -&gt; Post 6 (vision)
- <strong>Builders / implementers</strong>: Post 4 (gaps) -&gt; Post 5 (ideas) -&gt; Post 6 (vision) -&gt; Post 7 (methodology)</p>
<p>Each post stands alone, but they build on each other. If you read one, make it <strong>Post 4</strong> -- the gaps analysis is the most consequential finding.</p>
<h2 id="the-data">The Data</h2>
<p>All findings come from our open-source IETF Draft Analyzer, which fetches drafts via the Datatracker API, rates them using Claude, extracts technical ideas, detects collaboration patterns via co-authorship analysis, and identifies standardization gaps. Data current as of March 2026.</p>
<table>
<thead>
<tr>
<th>Stat</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Drafts analyzed</td>
<td>434</td>
</tr>
<tr>
<td>Authors mapped</td>
<td>557</td>
</tr>
<tr>
<td>Organizations</td>
<td>230</td>
</tr>
<tr>
<td>Cross-org convergent ideas</td>
<td>130</td>
</tr>
<tr>
<td>Gaps identified</td>
<td>11 (2 critical)</td>
</tr>
<tr>
<td>Team blocs detected</td>
<td>18</td>
</tr>
<tr>
<td>Analysis cost</td>
<td>~$9</td>
</tr>
</tbody>
</table>
<hr />
<p><em>Designed by the Architect agent, 2026-03-03.</em></p>
<div class="post-nav"><span></span><a href="/blog/posts/01-gold-rush.html">The Gold Rush &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -0,0 +1,312 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>The IETF's AI Agent Gold Rush: 434 Drafts, 557 Authors, and the Race to Define How AI Agents Talk — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<strong>Rush</strong>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="the-ietfs-ai-agent-gold-rush-434-drafts-557-authors-and-the-race-to-define-how-ai-agents-talk">The IETF's AI Agent Gold Rush: 434 Drafts, 557 Authors, and the Race to Define How AI Agents Talk</h1>
<p><em>Fifteen months ago, AI agents barely registered at the IETF. Today, nearly 1 in 10 new Internet-Drafts is about AI agents. We analyzed every one.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>For every Internet-Draft addressing how to keep an AI agent safe, roughly four are building new capabilities for it. That is the single most important number in this analysis.</p>
<p>We built an automated pipeline to fetch, categorize, rate, and map every AI- and agent-related Internet-Draft currently in the IETF system. We found <strong>434 drafts</strong> from <strong>557 authors</strong> at <strong>230 organizations</strong> and identified <strong>11 standardization gaps</strong> -- two of them critical. The result is the most comprehensive public analysis of the IETF's AI agent landscape to date.</p>
<p>The story the data tells is not subtle: the internet's most important standards body is in the middle of a gold rush, and the prospectors are moving faster than the safety inspectors.</p>
<h2 id="the-growth-curve">The Growth Curve</h2>
<p>In 2024, just <strong>9 AI/agent-related drafts</strong> were submitted to the IETF -- <strong>0.5%</strong> of all submissions. By Q1 2026, AI/agent drafts account for <strong>9.3%</strong> of all new Internet-Drafts. Nearly 1 in 10.</p>
<table>
<thead>
<tr>
<th>Year</th>
<th style="text-align: right;">Total IETF Drafts</th>
<th style="text-align: right;">AI/Agent Drafts</th>
<th style="text-align: right;">AI Share</th>
</tr>
</thead>
<tbody>
<tr>
<td>2021</td>
<td style="text-align: right;">1,108</td>
<td style="text-align: right;">~0</td>
<td style="text-align: right;">~0%</td>
</tr>
<tr>
<td>2022</td>
<td style="text-align: right;">1,121</td>
<td style="text-align: right;">~0</td>
<td style="text-align: right;">~0%</td>
</tr>
<tr>
<td>2023</td>
<td style="text-align: right;">1,241</td>
<td style="text-align: right;">~0</td>
<td style="text-align: right;">~0%</td>
</tr>
<tr>
<td>2024</td>
<td style="text-align: right;">1,651</td>
<td style="text-align: right;">9</td>
<td style="text-align: right;">0.5%</td>
</tr>
<tr>
<td>2025</td>
<td style="text-align: right;">2,696</td>
<td style="text-align: right;">190</td>
<td style="text-align: right;">7.0%</td>
</tr>
<tr>
<td>2026 (Q1)</td>
<td style="text-align: right;">1,748</td>
<td style="text-align: right;">162</td>
<td style="text-align: right;">9.3%</td>
</tr>
</tbody>
</table>
<p>The IETF itself accelerated 2.4x from 2021 to 2025. But AI/agent work went from essentially zero to dominant topic in under two years. The acceleration is not gradual. Submissions surged rapidly beginning in mid-2025 -- from 5 drafts in June 2025 to 61 in October 2025 to 85 in February 2026 -- and have not slowed.</p>
<p>This growth is driven by a convergence of forces: the explosion of commercial AI agent deployments (ChatGPT plugins, Anthropic's Claude tools, Google's Gemini agents), the emergence of protocols like MCP and A2A that need standardization, and the recognition across the industry that AI agents communicating over the internet without agreed-upon identity, security, and interoperability standards is a problem that gets worse every month it goes unaddressed.</p>
<p>(A note on methodology: our pipeline searches the Datatracker for 12 keywords -- <code>agent</code>, <code>ai-agent</code>, <code>llm</code>, <code>autonomous</code>, <code>machine-learning</code>, <code>artificial-intelligence</code>, <code>mcp</code>, <code>agentic</code>, <code>inference</code>, <code>generative</code>, <code>intelligent</code>, and <code>aipref</code> -- across both draft names and abstracts. We started with 6 keywords and 260 drafts, then expanded to 12 to capture MCP-related work, generative AI infrastructure, and intelligent networking. The full methodology is in <a href="07-how-we-built-this.md">Post 7</a>.)</p>
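<p><em>The keyword filter itself is a one-line check over each draft's name and abstract. An illustrative sketch (the field names are assumptions, not the Datatracker schema):</em></p>

```python
KEYWORDS = [
    "agent", "ai-agent", "llm", "autonomous", "machine-learning",
    "artificial-intelligence", "mcp", "agentic", "inference",
    "generative", "intelligent", "aipref",
]

def is_relevant(draft):
    # draft: illustrative dict with "name" and "abstract" fields.
    haystack = (draft.get("name", "") + " " + draft.get("abstract", "")).lower()
    return any(keyword in haystack for keyword in KEYWORDS)
```

<p><em>Substring matching like this over-captures (a "user agent" draft matches "agent"), which is why a keyword net this wide needs a relevance check behind it.</em></p>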
<p>The drafts span ten categories, and the distribution reveals priorities:</p>
<table>
<thead>
<tr>
<th>Category</th>
<th style="text-align: right;">Drafts</th>
<th style="text-align: right;">Share</th>
</tr>
</thead>
<tbody>
<tr>
<td>Data formats and interoperability</td>
<td style="text-align: right;">174</td>
<td style="text-align: right;">40%</td>
</tr>
<tr>
<td>A2A protocols</td>
<td style="text-align: right;">155</td>
<td style="text-align: right;">36%</td>
</tr>
<tr>
<td>Agent identity and authentication</td>
<td style="text-align: right;">152</td>
<td style="text-align: right;">35%</td>
</tr>
<tr>
<td>Autonomous network operations</td>
<td style="text-align: right;">114</td>
<td style="text-align: right;">26%</td>
</tr>
<tr>
<td>Policy and governance</td>
<td style="text-align: right;">109</td>
<td style="text-align: right;">25%</td>
</tr>
<tr>
<td>Agent discovery and registration</td>
<td style="text-align: right;">89</td>
<td style="text-align: right;">21%</td>
</tr>
<tr>
<td>ML traffic management</td>
<td style="text-align: right;">79</td>
<td style="text-align: right;">18%</td>
</tr>
<tr>
<td>AI safety and alignment</td>
<td style="text-align: right;">47</td>
<td style="text-align: right;">11%</td>
</tr>
<tr>
<td>Model serving and inference</td>
<td style="text-align: right;">42</td>
<td style="text-align: right;">10%</td>
</tr>
<tr>
<td>Human-agent interaction</td>
<td style="text-align: right;">34</td>
<td style="text-align: right;">8%</td>
</tr>
</tbody>
</table>
<p>Note that drafts can belong to multiple categories, so percentages exceed 100%. The dominance of plumbing -- data formats, identity, and communication protocols -- is expected for an early-stage standards effort. What is unexpected is how little attention the safety and human-oversight categories receive.</p>
<p>The ecosystem's DNA is visible in what it cites. We parsed <strong>4,231 cross-references</strong> from the drafts, and the foundation is clear: <strong>TLS 1.3</strong> (RFC 8446, cited by 42 drafts), <strong>OAuth 2.0</strong> (RFC 6749, 36 drafts), <strong>HTTP Semantics</strong> (RFC 9110, 34 drafts), and <strong>JWT</strong> (RFC 7519, 22 drafts). The agent identity/auth category is essentially built on top of the OAuth stack. The entire landscape stands on a security foundation -- which makes the 4:1 safety deficit all the more jarring.</p>
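<p>The cross-reference counts above come from scanning draft text for RFC citations. A rough sketch of that extraction -- the pipeline's actual parser may be stricter about citation formats:</p>

```python
import re
from collections import Counter

# Matches "RFC 8446", "RFC8446", "[RFC 6749]", etc.
RFC_RE = re.compile(r"\bRFC\s*(\d{3,5})\b", re.IGNORECASE)

def count_citing_drafts(draft_texts):
    """Count how many drafts cite each RFC (one count per draft, not per mention)."""
    counts = Counter()
    for text in draft_texts:
        for num in set(RFC_RE.findall(text)):
            counts[f"RFC {num}"] += 1
    return counts

# Hypothetical abstracts for illustration
texts = [
    "This profile builds on OAuth 2.0 [RFC 6749] over TLS 1.3 (RFC 8446).",
    "Tokens are JWTs (RFC 7519) carried over TLS 1.3 per RFC8446.",
]
counts = count_citing_drafts(texts)
```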
<h2 id="the-safety-deficit">The Safety Deficit</h2>
<p>The ratio is stark:</p>
<table>
<thead>
<tr>
<th>Focus Area</th>
<th style="text-align: right;">Drafts</th>
</tr>
</thead>
<tbody>
<tr>
<td>A2A protocols</td>
<td style="text-align: right;">155</td>
</tr>
<tr>
<td>Autonomous operations</td>
<td style="text-align: right;">114</td>
</tr>
<tr>
<td>Agent identity/auth</td>
<td style="text-align: right;">152</td>
</tr>
<tr>
<td><strong>AI safety/alignment</strong></td>
<td style="text-align: right;"><strong>47</strong></td>
</tr>
<tr>
<td><strong>Human-agent interaction</strong></td>
<td style="text-align: right;"><strong>34</strong></td>
</tr>
</tbody>
</table>
<p>The capability-to-safety ratio is roughly 4:1 on aggregate, though it varies significantly by time period -- from as low as 1.5:1 in some months to 21:1 in others. The overall trend is clear: for every draft about keeping agents safe, approximately four are building new capabilities. The community is building the highways and forgetting the traffic lights.</p>
<p>This is not an abstract concern. Imagine an AI agent managing cloud infrastructure that detects a spurious anomaly, autonomously scales down a critical service, and triggers a cascading outage across three availability zones. Today, there is no standard mechanism to verify that the agent followed its declared policy before acting. No standard way to roll back the decision once the cascade begins. No standard protocol for a human operator to issue an emergency stop. The three critical gaps our analysis identified -- behavior verification, resource management, and error recovery -- are all about what happens when things go wrong. And in a world of autonomous AI agents, things will go wrong.</p>
<p>The safety drafts that do exist are often among the highest-rated in our analysis. <a href="https://datatracker.ietf.org/doc/draft-aylward-daap-v2/">draft-aylward-daap-v2</a> -- a comprehensive accountability protocol -- and <a href="https://datatracker.ietf.org/doc/draft-cowles-volt/">draft-cowles-volt</a> -- a tamper-evident execution trace format -- each scored 4.75 out of 5 (4-dimension composite excluding overlap), the highest in the entire corpus. <a href="https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/">draft-birkholz-verifiable-agent-conversations</a>, which defines verifiable conversation records using cryptographic signing, scored 4.5. The quality is there. The quantity is not.</p>
<h2 id="whos-writing-the-drafts">Who's Writing the Drafts</h2>
<p>The organizational picture is as revealing as the technical one. The top contributors:</p>
<table>
<thead>
<tr>
<th>Organization</th>
<th style="text-align: right;">Authors</th>
<th style="text-align: right;">Drafts</th>
</tr>
</thead>
<tbody>
<tr>
<td>Huawei</td>
<td style="text-align: right;">53</td>
<td style="text-align: right;">69</td>
</tr>
<tr>
<td>China Mobile</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">35</td>
</tr>
<tr>
<td>Cisco</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">26</td>
</tr>
<tr>
<td>Independent</td>
<td style="text-align: right;">19</td>
<td style="text-align: right;">25</td>
</tr>
<tr>
<td>China Telecom</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">24</td>
</tr>
<tr>
<td>China Unicom</td>
<td style="text-align: right;">22</td>
<td style="text-align: right;">21</td>
</tr>
<tr>
<td>Tsinghua University</td>
<td style="text-align: right;">13</td>
<td style="text-align: right;">16</td>
</tr>
<tr>
<td>ZTE Corporation</td>
<td style="text-align: right;">12</td>
<td style="text-align: right;">12</td>
</tr>
<tr>
<td>Five9</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">10</td>
</tr>
<tr>
<td>Ericsson</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">9</td>
</tr>
</tbody>
</table>
<p><strong>Huawei</strong> leads by a wide margin: <strong>53 authors</strong> contributing to <strong>69 drafts</strong> (across all Huawei entities) -- about 16% of the entire corpus. But the concentration goes deeper than raw numbers -- the next post will examine the team bloc structure, geopolitics, and what the collaboration network reveals about where power really lies.</p>
<p>Cisco and China Mobile each have 24 authors, but China Mobile's team produces 35 drafts to Cisco's 26. Ericsson has only 4 authors but punches above its weight with 9 focused drafts. Independent contributors account for 25 drafts -- a healthy sign of grassroots engagement.</p>
<h2 id="the-fragmentation-problem">The Fragmentation Problem</h2>
<p>The drafts are not just numerous; they are redundant. Our embedding-based similarity analysis found <strong>25+ draft pairs</strong> with greater than 0.98 cosine similarity -- functionally identical proposals submitted under different names.</p>
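<p>The near-duplicate detection is a pairwise cosine-similarity scan over draft embeddings. A toy sketch with 3-dimensional vectors -- real embeddings come from an embedding model and have hundreds of dimensions:</p>

```python
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def near_duplicates(embeddings, threshold=0.98):
    """Return draft pairs whose embedding similarity exceeds the threshold."""
    return [
        (a, b)
        for (a, va), (b, vb) in combinations(embeddings.items(), 2)
        if cosine(va, vb) > threshold
    ]

# Toy vectors for illustration
embeddings = {
    "draft-a": [0.90, 0.10, 0.00],
    "draft-b": [0.89, 0.11, 0.01],  # near-identical to draft-a
    "draft-c": [0.00, 0.20, 0.95],
}
pairs = near_duplicates(embeddings)
```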
<p>The most crowded space is OAuth for AI agents: <strong>14 separate drafts</strong> all trying to solve how AI agents authenticate and get authorized. They range from broad framework proposals (<a href="https://datatracker.ietf.org/doc/draft-aap-oauth-profile/">draft-aap-oauth-profile</a>) to narrow extensions (<a href="https://datatracker.ietf.org/doc/draft-jia-oauth-scope-aggregation/">draft-jia-oauth-scope-aggregation</a>) to full accountability systems (<a href="https://datatracker.ietf.org/doc/draft-aylward-daap-v2/">draft-aylward-daap-v2</a>). None are compatible with each other.</p>
<p>Beyond OAuth, the broader A2A protocol landscape includes <strong>155 drafts</strong> with no interoperability layer. The most common technical idea in the entire corpus -- "Multi-Agent Communication Protocol" -- appears in 8 separate drafts from different teams. And the fragmentation goes deeper than protocols: the vast majority of technical ideas extracted from the corpus appear in exactly one draft. Everyone is solving the same problem. Nobody is solving it together.</p>
<p>This fragmentation has real costs. Implementers face confusion over which draft to follow. The IETF process slows as competing proposals vie for working group adoption. And the longer competing drafts proliferate without convergence, the higher the risk of incompatible deployments that entrench fragmentation rather than resolving it.</p>
<h2 id="what-the-best-drafts-look-like">What the Best Drafts Look Like</h2>
<p>Not everything is chaos. Our quality ratings -- scoring novelty, maturity, overlap avoidance, momentum, and relevance on a 1-5 scale -- surface drafts that are doing the hard work well:</p>
<table>
<thead>
<tr>
<th>Draft</th>
<th style="text-align: right;">Score</th>
<th>What It Does</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-aylward-daap-v2/">draft-aylward-daap-v2</a></td>
<td style="text-align: right;">4.75</td>
<td>Comprehensive AI agent accountability with authentication, monitoring, enforcement</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-guy-bary-stamp-protocol/">draft-guy-bary-stamp-protocol</a></td>
<td style="text-align: right;">4.5</td>
<td>Cryptographic delegation and proof for agent task execution</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-drake-email-tpm-attestation/">draft-drake-email-tpm-attestation</a></td>
<td style="text-align: right;">4.5</td>
<td>Hardware attestation for email via TPM verification chains</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-ietf-lake-app-profiles/">draft-ietf-lake-app-profiles</a></td>
<td style="text-align: right;">4.5</td>
<td>Canonical CBOR for EDHOC application profiles</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/">draft-birkholz-verifiable-agent-conversations</a></td>
<td style="text-align: right;">4.5</td>
<td>Verifiable agent conversation records with COSE signing</td>
</tr>
</tbody>
</table>
<p>Scores are 4-dimension composites (novelty, maturity, momentum, relevance), excluding overlap. The average score across all 434 rated drafts is 3.27. The best work combines clear problem definition with concrete mechanisms and low overlap with existing proposals. The worst drafts are me-too proposals that restate problems already solved elsewhere.</p>
<p><em>Methodology note: Quality ratings are LLM-generated (Claude Sonnet) from draft abstracts only, not full text. No human calibration has been performed. Scores should be treated as relative rankings within this corpus, not absolute quality measures. See <a href="07-how-we-built-this.md">How We Built This</a> and the <a href="../methodology.md">Methodology</a> document for details.</em></p>
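<p>The composite itself is just the mean of the four retained dimensions. A sketch -- dimension names follow the post; the rating pipeline's actual field names may differ:</p>

```python
def composite_score(ratings: dict) -> float:
    """Average novelty, maturity, momentum, and relevance; 'overlap' is
    rated too but excluded from the composite."""
    dims = ("novelty", "maturity", "momentum", "relevance")
    return round(sum(ratings[d] for d in dims) / len(dims), 2)

# Hypothetical per-dimension ratings on the 1-5 scale
ratings = {"novelty": 5, "maturity": 5, "overlap": 2, "momentum": 4, "relevance": 5}
score = composite_score(ratings)
```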
<h2 id="what-comes-next">What Comes Next</h2>
<p>The IETF has navigated technology gold rushes before -- the early web, IoT, DNS security. In each case, the first wave of competing proposals eventually converged, and the lasting standards came from those who focused on interoperability and safety alongside capability.</p>
<p>The AI agent wave is following the same early pattern. The landscape has quantity. The question is whether it develops architecture -- and whether the safety work catches up before the capability work ships without it.</p>
<p>This blog series will dig into the questions the data raises. The next post starts with the most fundamental: who, exactly, is writing the rules?</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>434 drafts</strong> from <strong>557 authors</strong> at <strong>230 organizations</strong> -- AI/agent work went from <strong>0.5% to 9.3%</strong> of all IETF submissions in 15 months</li>
<li>The capability-to-safety ratio (roughly <strong>4:1 on aggregate</strong>, varying from 1.5:1 to 21:1 by month) is the most concerning structural finding</li>
<li><strong>Huawei</strong> dominates authorship with 53 authors on 69 drafts (~16% of corpus); Chinese-linked institutions account for 160+ authors</li>
<li><strong>14 competing OAuth-for-agents proposals</strong> illustrate deep fragmentation; 155 A2A protocol drafts have no interoperability layer</li>
<li><strong>11 standardization gaps</strong> remain, with the 3 most critical relating to what happens when agents fail</li>
</ul>
<p><em>Next in this series: <a href="02-who-writes-the-rules.md">Who's Writing the Rules for AI Agents?</a> -- Inside the team blocs, geopolitics, and collaboration networks behind the IETF's AI agent standards.</em></p>
<hr />
<p><em>Analysis conducted using the IETF Draft Analyzer. Data current as of March 2026. All 434 drafts, 557 authors, and full analysis data are available in the project's SQLite database.</em></p>
<div class="post-nav"><a href="/blog/posts/00-series-overview.html">&larr; Series Overview</a><a href="/blog/posts/02-who-writes-the-rules.html">Who Writes the Rules &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Who's Writing the Rules for AI Agents? — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<strong>Rules</strong>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="whos-writing-the-rules-for-ai-agents">Who's Writing the Rules for AI Agents?</h1>
<p><em>Inside the team blocs, geopolitics, and collaboration networks shaping the future of AI agent standards.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>Thirteen people from one company co-author 22 Internet-Drafts at 94% internal cohesion. Their work covers agent networking, identity management, communication protocols, and network troubleshooting. Together, they represent the single most coordinated standards-writing campaign in the IETF's AI agent space.</p>
<p>They all work at Huawei.</p>
<p>This is the story of who is writing the rules for AI agents, what their collaboration networks reveal, and why the geography of authorship matters more than most people realize.</p>
<h2 id="the-numbers-behind-the-names">The Numbers Behind the Names</h2>
<p>Our analysis mapped <strong>557 unique authors</strong> from <strong>230 organizations</strong> across the 434 AI/agent drafts in the IETF pipeline. But those topline numbers mask extreme concentration.</p>
<table>
<thead>
<tr>
<th>Organization</th>
<th style="text-align: right;">Authors</th>
<th style="text-align: right;">Drafts</th>
</tr>
</thead>
<tbody>
<tr>
<td>Huawei</td>
<td style="text-align: right;">53</td>
<td style="text-align: right;">69</td>
</tr>
<tr>
<td>China Mobile</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">35</td>
</tr>
<tr>
<td>Cisco</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">26</td>
</tr>
<tr>
<td>Independent</td>
<td style="text-align: right;">19</td>
<td style="text-align: right;">25</td>
</tr>
<tr>
<td>China Telecom</td>
<td style="text-align: right;">24</td>
<td style="text-align: right;">24</td>
</tr>
<tr>
<td>China Unicom</td>
<td style="text-align: right;">22</td>
<td style="text-align: right;">21</td>
</tr>
<tr>
<td>Tsinghua University</td>
<td style="text-align: right;">13</td>
<td style="text-align: right;">16</td>
</tr>
<tr>
<td>ZTE Corporation</td>
<td style="text-align: right;">12</td>
<td style="text-align: right;">12</td>
</tr>
<tr>
<td>Five9</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">10</td>
</tr>
<tr>
<td>Ericsson</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">9</td>
</tr>
</tbody>
</table>
<p>One company -- Huawei -- contributes about 16% of all drafts (69 across all Huawei-named entities, consolidated from Huawei, Huawei Technologies, Huawei Canada, etc.). The Chinese-linked organizations together contribute over 160 authors. This is not a general pattern across the IETF; it is specific to the AI agent space, and it tells a story about who considers these standards strategically important.</p>
<h2 id="the-huawei-drafting-machine">The Huawei Drafting Machine</h2>
<p>The Huawei team bloc is worth examining in detail because it illustrates a pattern -- organized, coordinated standards campaigns -- that is characteristic of how some institutions approach the IETF.</p>
<p>The 13-person core team includes:</p>
<table>
<thead>
<tr>
<th>Author</th>
<th style="text-align: right;">Drafts</th>
<th>Role in Team</th>
</tr>
</thead>
<tbody>
<tr>
<td>Bing Liu</td>
<td style="text-align: right;">23</td>
<td>Top contributor, appears on most team drafts</td>
</tr>
<tr>
<td>Zhenbin Li</td>
<td style="text-align: right;">21</td>
<td>Core, agent networking frameworks</td>
</tr>
<tr>
<td>Nan Geng</td>
<td style="text-align: right;">20</td>
<td>Core, near-total overlap with Liu</td>
</tr>
<tr>
<td>Qiangzhou Gao</td>
<td style="text-align: right;">20</td>
<td>Core, cross-device communication</td>
</tr>
<tr>
<td>Xiaotong Shang</td>
<td style="text-align: right;">19</td>
<td>Core, network measurement and troubleshooting</td>
</tr>
<tr>
<td>Jianwei Mao</td>
<td style="text-align: right;">14</td>
<td>Communication protocol gap analysis</td>
</tr>
<tr>
<td>Guanming Zeng</td>
<td style="text-align: right;">13</td>
<td>MCP and NETCONF for agents</td>
</tr>
</tbody>
</table>
<p>The remaining six members contribute 2-5 drafts each. The team's <strong>94% cohesion</strong> means that nearly every possible pair of members shares the vast majority of their drafts. This is not casual co-authorship; it is a systematic drafting operation.</p>
<p>Their 22 drafts cover a specific territory: agent networking frameworks for enterprise and broadband networks, agent identity management, cross-device communication, MCP integration for network troubleshooting, and agent gateway requirements. The focus is heavily on <strong>autonomous network operations</strong> and <strong>A2A protocols</strong> -- the infrastructure layer of the agent ecosystem.</p>
<p>Two deeper metrics reveal the nature of this operation:</p>
<p><strong>Volume over iteration.</strong> Across the entire corpus, <strong>55% of all 434 drafts</strong> have never been revised beyond their first submission (rev-00). But the rate varies dramatically by organization. Of Huawei's drafts, <strong>65% are at rev-00</strong>. Compare that to Ericsson (11%), Siemens (0%), Nokia (20%), or Boeing (0%). The most serious iterators -- Boeing (avg 28.2 revisions per draft), Siemens (17.2), Sandelman Software (14.3) -- submit far fewer drafts but iterate relentlessly. Western companies submit fewer drafts but revise heavily -- incorporating feedback, advancing toward maturity. Huawei's pattern is the opposite: submit at volume, iterate rarely. Submitting a draft is cheap. Iterating it signals genuine investment.</p>
<p><strong>Campaign timing.</strong> Of Huawei's drafts, <strong>43 were submitted in the four weeks before IETF 121 Dublin</strong> -- 62% of the company's entire output, packed into a single pre-meeting window. For context, the entire corpus had 107 drafts in that period. Huawei alone accounted for <strong>40% of all pre-IETF 121 submissions</strong>. This is not organic growth. It is a coordinated submission campaign timed for maximum standards-body impact.</p>
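<p>The campaign-timing figure is a simple date-window count over submission dates. A sketch, assuming IETF 121 Dublin opened on 2 November 2024 (the submission dates below are hypothetical):</p>

```python
from datetime import date, timedelta

IETF_121_START = date(2024, 11, 2)

def in_premeeting_window(submitted: date, meeting: date, weeks: int = 4) -> bool:
    """True if a draft was submitted in the N weeks before a meeting opens."""
    return meeting - timedelta(weeks=weeks) <= submitted < meeting

submissions = [date(2024, 10, 15), date(2024, 10, 28), date(2024, 8, 1)]
pre_meeting = sum(in_premeeting_window(s, IETF_121_START) for s in submissions)
```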
<p>Beyond the main team, the company has additional smaller blocs. No other organization comes close to this level of coordinated output.</p>
<h2 id="the-chinese-institutional-ecosystem">The Chinese Institutional Ecosystem</h2>
<p>Huawei does not operate in isolation. The Chinese organizations in this space form a densely interconnected collaboration network.</p>
<table>
<thead>
<tr>
<th>Org A</th>
<th>Org B</th>
<th style="text-align: right;">Shared Drafts</th>
</tr>
</thead>
<tbody>
<tr>
<td>China Unicom</td>
<td>Huawei</td>
<td style="text-align: right;">6</td>
</tr>
<tr>
<td>Tsinghua University</td>
<td>Zhongguancun Laboratory</td>
<td style="text-align: right;">5</td>
</tr>
<tr>
<td>China Mobile</td>
<td>ZTE Corporation</td>
<td style="text-align: right;">4</td>
</tr>
<tr>
<td>China Mobile</td>
<td>Huawei</td>
<td style="text-align: right;">4</td>
</tr>
<tr>
<td>BUPT</td>
<td>Tsinghua University</td>
<td style="text-align: right;">3</td>
</tr>
<tr>
<td>China Telecom</td>
<td>Huawei</td>
<td style="text-align: right;">3</td>
</tr>
<tr>
<td>BUPT</td>
<td>China Telecom</td>
<td style="text-align: right;">3</td>
</tr>
<tr>
<td>CAICT</td>
<td>Huawei</td>
<td style="text-align: right;">3</td>
</tr>
</tbody>
</table>
<p>The structure has three tiers:</p>
<p><strong>Tier 1: Telecom operators</strong> -- China Mobile (24 authors, 35 drafts), China Telecom (24 authors, 24 drafts), China Unicom (22 authors, 21 drafts). These organizations bring domain expertise in network operations and 6G requirements. Their drafts focus heavily on use cases: agents for 6G networks, agent-based network management, traffic optimization.</p>
<p><strong>Tier 2: Equipment vendors</strong> -- Huawei (53 authors, 69 drafts), ZTE Corporation (12 authors, 12 drafts). Huawei's dominance here is striking; ZTE's contribution is modest by comparison. These drafts focus on architecture and protocols -- the building blocks rather than the use cases.</p>
<p><strong>Tier 3: Research institutions</strong> -- Tsinghua University (13 authors, 16 drafts), BUPT (14 authors, 7 drafts), Zhongguancun Laboratory (4 authors, 6 drafts), CAICT (8 authors, 6 drafts). These institutions bridge the gap between industry and academia, often co-authoring with both telecom operators and Huawei.</p>
<p>The Zhongguancun Laboratory team (4 members, 5 shared drafts, 94% cohesion) is led by Yong Cui of Tsinghua University, one of the most prolific individual authors with 8 drafts spanning agent discovery, network management benchmarking, and LLM-assisted operations. His work includes <a href="https://datatracker.ietf.org/doc/draft-cui-nmrg-llm-benchmark/">draft-cui-nmrg-llm-benchmark</a> (score 4.3) -- one of the highest-rated drafts in the corpus.</p>
<p>The China Telecom team (6 members from China Telecom, BUPT, and Tsinghua) focuses on 6G agent use cases and IoA task protocols. Their drafts are more forward-looking than Huawei's -- less about current network operations, more about where agents fit in next-generation infrastructure.</p>
<h2 id="where-is-the-west">Where Is the West?</h2>
<p>The absence is as telling as the presence.</p>
<p><strong>Google</strong>: 5 authors, 9 drafts -- a notable increase, but still thin relative to the company's agent platform presence (Gemini agents, A2A protocol).</p>
<p><strong>Microsoft</strong>: Minimal presence.</p>
<p><strong>Apple</strong>: Two authors, two drafts -- both about mail automation (<a href="https://datatracker.ietf.org/doc/draft-ietf-mailmaint-pacc/">draft-ietf-mailmaint-pacc</a>, <a href="https://datatracker.ietf.org/doc/draft-eggert-mailmaint-uaautoconf/">draft-eggert-mailmaint-uaautoconf</a>). Not about AI agents per se.</p>
<p><strong>Amazon</strong>: 6 authors, 6 drafts -- primarily post-quantum cryptography work (ML-KEM hybrid key exchange), not agent-specific.</p>
<p><strong>Cisco</strong>: The most active Western tech company with 24 authors across 26 drafts, but spread thinly. Three separate Cisco blocs cover different areas: Cullen Fluffy Jennings and Suhas Nandakumar work on A2A transport and agent identity; another team (Muscariello, Papalini, Sardara, Betts) works on AGNTCY messaging; a third (Farinacci, Rodriguez-Natal, Maino) works on LISP-based networking. No single coordinated campaign.</p>
<p><strong>Ericsson</strong>: 4 authors, 9 drafts -- focused on EDHOC lightweight authentication, a mature protocol effort led by Goran Selander. High quality (scores 3.2-4.1) but narrow scope.</p>
<p>The pattern is clear: Western companies are either absent from AI agent standardization or participating in adjacent security/crypto work rather than the core agent protocol space. The reasons likely include strategic focus on proprietary agent ecosystems (Google's Gemini, Apple's Siri agents), less tradition of IETF engagement in the agent/AI space, and the assumption that de facto standards (MCP, A2A) will matter more than de jure IETF ones.</p>
<p>This bet may prove wrong. IETF standards have a way of becoming the infrastructure that everyone must eventually support.</p>
<h2 id="the-team-bloc-landscape">The Team Bloc Landscape</h2>
<p>Beyond Huawei, our co-authorship analysis detected <strong>18 team blocs</strong> covering a significant fraction of the 557 authors. Each bloc is a group where members share at least 70% pairwise draft overlap and 3+ shared drafts.</p>
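<p>The bloc criterion can be checked one author pair at a time. A sketch, where "70% overlap" is taken relative to the smaller author's draft set -- an assumption; the detection code may normalize differently:</p>

```python
def pair_in_bloc(drafts_a: set, drafts_b: set,
                 min_overlap: float = 0.7, min_shared: int = 3) -> bool:
    """Bloc test for one author pair: at least min_shared common drafts,
    and the shared set covering min_overlap of the smaller author's output."""
    shared = drafts_a & drafts_b
    if len(shared) < min_shared:
        return False
    return len(shared) / min(len(drafts_a), len(drafts_b)) >= min_overlap

# Hypothetical draft sets for illustration
liu = {"d1", "d2", "d3", "d4", "d5"}
geng = {"d1", "d2", "d3", "d4"}        # near-total overlap
outsider = {"d1", "x1", "x2", "x3"}    # only one shared draft
```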
<p>The most notable non-Chinese blocs:</p>
<p><strong>Ericsson team</strong> (5 members, 6 drafts, 100% cohesion) -- Goran Selander and colleagues lead this European effort focused on EDHOC authentication and lightweight key exchange for constrained devices. They collaborate with Inria (France) and the University of Murcia (Spain). Their work (<a href="https://datatracker.ietf.org/doc/draft-spm-lake-pqsuites/">draft-spm-lake-pqsuites</a>, score 4.1) represents some of the most mature protocol work in the corpus.</p>
<p><strong>Five9/Bitwave team</strong> (2 members, 6 drafts, 100% cohesion) -- Jonathan Rosenberg (Five9) and Pat White (Bitwave) are the most prolific Western contributors to core agent protocols. Their drafts span the full stack: CHEQ for human confirmation of agent decisions (<a href="https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/">draft-rosenberg-aiproto-cheq</a>, score 3.9), N-ACT for agent-to-tool communication, and an OAuth extension for agent authentication. Rosenberg is also the strongest cross-team bridge, sharing 3 drafts with Cisco's Cullen Fluffy Jennings -- the single strongest cross-bloc connection we found.</p>
<p><strong>ISI, R.C. ATHENA team</strong> (4 members, 4 drafts, 100% cohesion) -- A Greek research institute producing post-quantum authentication work for EDHOC. All four members (Haleplidis, Fraile, Fournaris, Koulamas) co-author every draft. Their <a href="https://datatracker.ietf.org/doc/draft-lake-pocero-authkem-ikr-edhoc/">draft-lake-pocero-authkem-ikr-edhoc</a> scored 4.2.</p>
<p><strong>JPMorgan/multi-org team</strong> (4 members from JPMorgan, Oracle, Telefonica, Aryaka; 2 drafts, 100% cohesion) -- The most cross-organizational Western bloc. Their work on transitive attestation (<a href="https://datatracker.ietf.org/doc/draft-mw-wimse-transitive-attestation/">draft-mw-wimse-transitive-attestation</a>, score 4.3) and actor chains (<a href="https://datatracker.ietf.org/doc/draft-mw-spice-actor-chain/">draft-mw-spice-actor-chain</a>, score 4.1) addresses the safety and accountability space. Notably, these are among the highest-scored drafts in the corpus.</p>
<h2 id="the-cross-pollination-problem">The Cross-Pollination Problem</h2>
<p>Once you account for team blocs, the cross-team collaboration picture is sparse. The top cross-bloc connection -- Jonathan Rosenberg bridging Five9/Bitwave and Cisco -- involves just 3 shared drafts. Most cross-team pairs share only 1.</p>
<p>Our network centrality analysis reveals who bridges these gaps. Of 557 authors, only <strong>115 (21%)</strong> co-author with people from both Chinese and Western organizations. The top bridge-builders are not from the organizations you might expect:</p>
<table>
<thead>
<tr>
<th>Author</th>
<th>Organization</th>
<th style="text-align: right;">BC Score</th>
<th style="text-align: right;">CN Neighbors</th>
<th style="text-align: right;">Western Neighbors</th>
</tr>
</thead>
<tbody>
<tr>
<td>Luis M. Contreras</td>
<td>Telefonica</td>
<td style="text-align: right;">0.035</td>
<td style="text-align: right;">11</td>
<td style="text-align: right;">3</td>
</tr>
<tr>
<td>Qin Wu</td>
<td>Huawei</td>
<td style="text-align: right;">0.035</td>
<td style="text-align: right;">12</td>
<td style="text-align: right;">11</td>
</tr>
<tr>
<td>Muhammad Awais Jadoon</td>
<td>InterDigital</td>
<td style="text-align: right;">0.023</td>
<td style="text-align: right;">9</td>
<td style="text-align: right;">4</td>
</tr>
<tr>
<td>Diego Lopez</td>
<td>Telefonica</td>
<td style="text-align: right;">0.013</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">9</td>
</tr>
<tr>
<td>Giuseppe Fioccola</td>
<td>Huawei</td>
<td style="text-align: right;">0.009</td>
<td style="text-align: right;">2</td>
<td style="text-align: right;">8</td>
</tr>
</tbody>
</table>
<p>The structural glue holding the two blocs together is <strong>European telecoms</strong> -- Telefonica, InterDigital, Deutsche Telekom. Not US Big Tech. Not any formal cross-standards body. A handful of European companies, through their authors' co-authorship ties, provide the only significant cross-divide connectivity. Qin Wu (Huawei) is the most balanced individual bridge, with nearly equal Chinese and Western co-author networks. But these bridges are thin: remove any two or three of these people, and the network fragments further.</p>
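<p>The bridge classification itself is mechanical once the co-authorship graph exists: tag each author's neighbors by region and keep anyone with neighbors on both sides. A toy sketch -- the names and region labels below are illustrative, not the analysis's real data:</p>

```python
from collections import defaultdict

def coauthor_graph(drafts):
    """Adjacency sets built from per-draft author lists."""
    adj = defaultdict(set)
    for authors in drafts:
        for a in authors:
            adj[a].update(x for x in authors if x != a)
    return adj

def bridge_authors(adj, region):
    """Authors with at least one Chinese and one Western co-author."""
    return sorted(
        a for a, nbrs in adj.items()
        if {"CN", "West"} <= {region[n] for n in nbrs}
    )

drafts = [["wu", "li"], ["wu", "contreras"], ["li", "geng"]]
region = {"wu": "CN", "li": "CN", "geng": "CN", "contreras": "West"}
bridges = bridge_authors(coauthor_graph(drafts), region)
```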
<p>The sparseness of these bridges becomes even more concerning when you look at what the two blocs are building <em>on</em>. Our RFC cross-reference analysis (detailed in Post 3) reveals that the Chinese and Western blocs cite fundamentally different technology stacks. The Chinese agent ecosystem is being built on <strong>network management protocols</strong> -- YANG (RFC 7950), NETCONF (RFC 6241), and autonomic networking (RFC 7575). The Western ecosystem is being built on <strong>IoT security and web infrastructure</strong> -- COSE (RFC 9052), CBOR (RFC 8949), CoAP (RFC 7252), HTTP Semantics (RFC 9110), and EDHOC (RFC 9528). The only shared foundation is <strong>OAuth 2.0</strong> -- which explains why the OAuth-for-agents space has 14 competing proposals. It is the one piece of common ground, and everyone is fighting over it.</p>
<p>This means the cross-pollination problem is deeper than "different teams working separately." The two blocs are building on incompatible infrastructure. Even if they agreed on an agent communication pattern, the underlying plumbing diverges.</p>
<p>The IETF's consensus process works best when different implementation perspectives collide and reconcile. In the AI agent space, those collisions are rare. The Chinese institutional ecosystem collaborates internally but has limited connections to Western contributors. The European cryptographic teams (Ericsson, RISE, ATHENA) work on authentication foundations but do not connect to the agent protocol teams. The American startups (Five9, Bitwave) and enterprise companies (Cisco) work on adjacent problems without shared architectural framing.</p>
<p>The one exception is Fraunhofer SIT's Henk Birkholz and Tradeverifyd's Orie Steele, whose <a href="https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/">draft-birkholz-verifiable-agent-conversations</a> (score 4.5) and <a href="https://datatracker.ietf.org/doc/draft-steele-agent-considerations/">draft-steele-agent-considerations</a> (score 4.0) represent rare cross-cultural, safety-focused work from German and American collaborators.</p>
<h2 id="what-this-means">What This Means</h2>
<p>Three implications emerge from the authorship data:</p>
<p><strong>1. Volume and influence are not the same thing.</strong> Huawei's 69 drafts represent about 16% of the corpus, but 65% have never been revised. The IETF rewards sustained engagement -- drafts that iterate through feedback cycles, reach working group adoption, and mature toward RFC status. A campaign that optimizes for volume at a pre-meeting deadline is playing a different game than one that optimizes for adoption. The quality scores bear this out: Huawei's team averages around 3.1, respectable but not exceptional. The organizations doing the deepest work (Boeing at 28.2 average revisions per draft, Siemens at 17.2) submit far fewer drafts but iterate relentlessly.</p>
<p><strong>2. The safety work comes from unexpected places.</strong> The highest-quality safety and accountability drafts come not from the high-volume drafters but from smaller, specialized teams: Aylward (independent), Birkholz/Steele (Fraunhofer/Tradeverifyd), Rosenberg/White (Five9/Bitwave), and the JPMorgan-led multi-org team. The organizations doing the most drafting are focused on capability; the organizations doing the best safety work are doing the least drafting.</p>
<p><strong>3. The IETF needs more bridges.</strong> Cross-team, cross-organization, cross-geography collaboration is the weakest link in the current landscape. Our centrality analysis shows that European telecoms -- not US Big Tech -- are the structural glue between Chinese and Western blocs. The standards that will endure are the ones where Chinese telecom expertise, European cryptographic rigor, and American agent-platform experience converge. Right now, those worlds barely overlap, and the few bridges that exist depend on a handful of individuals.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>Huawei dominates</strong> with 53 authors on 69 drafts (~16% of corpus); their 13-person core team co-authors 22 drafts at 94% cohesion -- but 65% of those drafts have never been revised, and 43 were submitted in a single 4-week pre-meeting window</li>
<li><strong>Chinese institutions</strong> collectively contribute 160+ of 557 authors; they form a tightly interconnected collaboration ecosystem</li>
<li><strong>Google has 9 drafts but Microsoft and Apple are largely absent</strong> from AI agent standardization -- a notable strategic gap</li>
<li><strong>18 team blocs</strong> detected; cross-team collaboration is sparse, with most cross-bloc pairs sharing only 1 draft</li>
<li><strong>Only 23% of authors bridge the Chinese-Western divide</strong>; European telecoms (Telefonica, InterDigital) are the structural glue -- not US Big Tech</li>
<li><strong>The best safety work</strong> comes from smaller, specialized teams -- not from the high-volume drafters</li>
</ul>
<p><em>Next in this series: <a href="03-oauth-wars.md">The OAuth Wars and Other Battles</a> -- 14 competing proposals, 155 A2A protocols, and what fragmentation costs the internet.</em></p>
<hr />
<p><em>Data from the IETF Draft Analyzer, covering 434 drafts, 557 authors, and 18 detected team blocs. Co-authorship analysis uses 70% pairwise draft overlap threshold with 3+ shared drafts.</em></p>
<div class="post-nav"><a href="/blog/posts/01-gold-rush.html">&larr; The Gold Rush</a><a href="/blog/posts/03-oauth-wars.html">The OAuth Wars &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>The OAuth Wars and Other Battles — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<strong>Wars</strong>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="the-oauth-wars-and-other-battles">The OAuth Wars and Other Battles</h1>
<p><em>14 competing proposals, 155 protocols with no interop layer, and 25+ near-duplicate drafts. Inside the IETF's AI agent fragmentation problem.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>Fourteen separate Internet-Drafts are trying to solve the same problem: how should AI agents authenticate and get authorized using OAuth? They are not collaborating. They are not compatible. And they were all submitted in the same nine-month window.</p>
<p>This is the fragmentation problem, and it is not limited to OAuth. Across the IETF's AI agent landscape, our analysis found the same pattern repeated in agent discovery, multi-agent communication, intent-based routing, and 6G agent requirements. Teams are working in parallel, not together, and the cost is measured in wasted effort, confused implementers, and the growing risk of incompatible deployments.</p>
<h2 id="the-oauth-cluster-14-ways-to-solve-one-problem">The OAuth Cluster: 14 Ways to Solve One Problem</h2>
<p>The most crowded corner of the AI agent standards landscape is OAuth for agents. Every proposal is trying to answer the same fundamental question: when an AI agent acts on behalf of a user -- or on its own -- how does it prove its identity and obtain permission?</p>
<p>The depth of this cluster is not surprising when you look at the ecosystem's foundations. Our cross-reference analysis of all 434 drafts found that <strong>OAuth 2.0</strong> (RFC 6749) is cited by <strong>36 drafts</strong>, <strong>JWT</strong> (RFC 7519) by <strong>22</strong>, <strong>OAuth Bearer</strong> (RFC 6750) by <strong>9</strong>, and <strong>DPoP</strong> (RFC 9449) by <strong>9</strong>. The OAuth stack is the most-referenced functional standard in the entire corpus after TLS. The agent identity problem runs through the landscape like a root system.</p>
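<p>A cross-reference count like this can be approximated with a simple pattern scan over draft text. The sketch below is illustrative only -- the analyzer's real pipeline works from Datatracker metadata, and the sample texts are invented.</p>

```python
import re
from collections import Counter

def count_rfc_citations(draft_texts: dict[str, str]) -> Counter:
    """Count, per RFC number, how many drafts cite it at least once."""
    counts: Counter = Counter()
    for text in draft_texts.values():
        # set() so a draft citing the same RFC twice is counted once
        for rfc in set(re.findall(r"RFC\s*(\d{4})", text)):
            counts[f"RFC {rfc}"] += 1
    return counts

drafts = {
    "draft-a": "This profile extends OAuth 2.0 (RFC 6749) and JWT (RFC 7519).",
    "draft-b": "Builds on RFC 6749, adding DPoP (RFC 9449) proof of possession.",
}
print(count_rfc_citations(drafts))  # RFC 6749 is counted in both drafts
```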
<p>Here are all 14 drafts:</p>
<table>
<thead>
<tr>
<th>Draft</th>
<th>Approach</th>
<th style="text-align: right;">Score</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-aylward-daap-v2/">draft-aylward-daap-v2</a></td>
<td>Comprehensive accountability protocol</td>
<td style="text-align: right;">4.75</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-goswami-agentic-jwt/">draft-goswami-agentic-jwt</a></td>
<td>Agentic JWT for autonomous systems</td>
<td style="text-align: right;">4.5</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-chen-oauth-rar-agent-extensions/">draft-chen-oauth-rar-agent-extensions</a></td>
<td>RAR extensions for agent policy</td>
<td style="text-align: right;">4.2</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-aap-oauth-profile/">draft-aap-oauth-profile</a></td>
<td>OAuth 2.0 profile for autonomous agents</td>
<td style="text-align: right;">4.2</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-barney-caam/">draft-barney-caam</a></td>
<td>Contextual agent authorization mesh</td>
<td style="text-align: right;">4.0</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-liu-agent-operation-authorization/">draft-liu-agent-operation-authorization</a></td>
<td>Verifiable delegation via JWT</td>
<td style="text-align: right;">4.1</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-rosenberg-oauth-aauth/">draft-rosenberg-oauth-aauth</a></td>
<td>OAuth for agents on PSTN/SMS</td>
<td style="text-align: right;">3.6</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-oauth-ai-agents-on-behalf-of-user/">draft-oauth-ai-agents-on-behalf-of-user</a></td>
<td>On-behalf-of-user extension</td>
<td style="text-align: right;">3.7</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-jia-oauth-scope-aggregation/">draft-jia-oauth-scope-aggregation</a></td>
<td>Scope aggregation for multi-step workflows</td>
<td style="text-align: right;">3.5</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-liu-oauth-a2a-profile/">draft-liu-oauth-a2a-profile</a></td>
<td>A2A profile for transaction tokens</td>
<td style="text-align: right;">3.6</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-song-oauth-ai-agent-authorization/">draft-song-oauth-ai-agent-authorization</a></td>
<td>Target-based authorization</td>
<td style="text-align: right;">2.8</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-song-oauth-ai-agent-collaborate-authz/">draft-song-oauth-ai-agent-collaborate-authz</a></td>
<td>Multi-agent collaboration authz</td>
<td style="text-align: right;">3.5</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-chen-ai-agent-auth-new-requirements/">draft-chen-ai-agent-auth-new-requirements</a></td>
<td>New auth requirements analysis</td>
<td style="text-align: right;">3.8</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-yao-agent-auth-considerations/">draft-yao-agent-auth-considerations</a></td>
<td>Auth considerations analysis</td>
<td style="text-align: right;">3.1</td>
</tr>
</tbody>
</table>
<p><em>(Scores are LLM-generated relative rankings from abstracts, not human expert assessments. See <a href="../methodology.md">Methodology</a>.)</em></p>
<p>The quality range is enormous -- from 2.8 to 4.75 -- and the approaches barely overlap. Some extend OAuth 2.0 with new grant types. Others define entirely new token formats (Agentic JWT). Still others propose mesh architectures or accountability layers on top of existing auth flows. Two drafts (song-oauth-ai-agent-authorization and song-oauth-ai-agent-collaborate-authz) come from the same Huawei team and address different facets of the problem. Two more (chen-oauth-rar-agent-extensions and chen-ai-agent-auth-new-requirements) come from a China Mobile team.</p>
<p>The gap our analysis identified in this cluster: most focus on <strong>single-agent authorization</strong>. Few address chained delegation across multiple agents, and none standardize real-time revocation in agent-to-agent workflows. An agent that obtains a token and delegates a sub-task to another agent -- which then delegates further -- creates a chain of trust that no single draft adequately covers.</p>
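<p>The chained-delegation problem can be made concrete with a small sketch. This borrows the nested <code>act</code> (actor) claim from OAuth 2.0 Token Exchange (RFC 8693) to record a delegation chain; the subject and scope values are hypothetical, and no draft in the cluster mandates this exact structure.</p>

```python
def delegate(token: dict, actor: str) -> dict:
    """Mint a downstream token whose nested 'act' claims record the actor chain."""
    act = {"sub": actor}
    if "act" in token:
        act["act"] = token["act"]  # previous actors nest one level deeper
    return {**token, "act": act}

root = {"sub": "user@example.com", "scope": "orders:write"}  # user-issued token
hop1 = delegate(root, "agent:planner")                       # first delegation
hop2 = delegate(hop1, "agent:executor")                      # sub-delegation

def actor_chain(token: dict) -> list[str]:
    """Walk the nested 'act' claims, most recent actor first."""
    actors, act = [], token.get("act")
    while act:
        actors.append(act["sub"])
        act = act.get("act")
    return actors

print(actor_chain(hop2))  # ['agent:executor', 'agent:planner']
```

<p>Note what this structure does not give you: the chain records provenance, but revoking the root token says nothing about tokens already minted further down -- which is exactly the real-time revocation gap the cluster leaves open.</p>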
<p>A note on terminology: "consent" in the OAuth context means a technical authorization flow where a user delegates access scopes to a client. This is distinct from GDPR consent (<em>Einwilligung</em>) under Art. 6(1)(a) GDPR, which must be freely given, specific, informed, and unambiguous, and is revocable at any time. When AI agents further delegate to sub-agents, the chain of GDPR-valid consent may break entirely -- a problem none of these 14 drafts addresses. The controller-processor relationship under Art. 28 GDPR imposes additional requirements (data processing agreements, sub-processor authorization) that go beyond what any OAuth extension can express on its own.</p>
<h2 id="the-agent-gateway-melee-10-drafts">The Agent Gateway Melee: 10 Drafts</h2>
<p>If OAuth for agents is about identity, the agent gateway cluster is about communication architecture. Ten drafts are competing to define how agents from different platforms and ecosystems collaborate:</p>
<table>
<thead>
<tr>
<th>Draft</th>
<th>Approach</th>
<th style="text-align: right;">Score</th>
</tr>
</thead>
<tbody>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-li-dmsc-macp/">draft-li-dmsc-macp</a></td>
<td>Multi-agent collaboration protocol suite</td>
<td style="text-align: right;">4.2</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-agent-gw/">draft-agent-gw</a></td>
<td>Semantic routing gateway</td>
<td style="text-align: right;">3.9</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-cui-dmsc-agent-cdi/">draft-cui-dmsc-agent-cdi</a></td>
<td>Cross-domain interop framework</td>
<td style="text-align: right;">3.0</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-han-rtgwg-agent-gateway-intercomm-framework/">draft-han-rtgwg-agent-gateway-intercomm-framework</a></td>
<td>Gateway intercommunication</td>
<td style="text-align: right;">3.6</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-li-dmsc-inf-architecture/">draft-li-dmsc-inf-architecture</a></td>
<td>DMSC infrastructure architecture</td>
<td style="text-align: right;">3.1</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-liu-dmsc-acps-arc/">draft-liu-dmsc-acps-arc</a></td>
<td>Agent collaboration protocols arch</td>
<td style="text-align: right;">3.6</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-yang-dmsc-ioa-task-protocol/">draft-yang-dmsc-ioa-task-protocol</a></td>
<td>IoA task protocol</td>
<td style="text-align: right;">3.0</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-yang-ioa-protocol/">draft-yang-ioa-protocol</a></td>
<td>IoA protocol</td>
<td style="text-align: right;">3.6</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/">draft-fu-nmop-agent-communication-framework</a></td>
<td>Network AIOps comm framework</td>
<td style="text-align: right;">3.0</td>
</tr>
<tr>
<td><a href="https://datatracker.ietf.org/doc/draft-campbell-agentic-http/">draft-campbell-agentic-http</a></td>
<td>HTTP best practices</td>
<td style="text-align: right;">--</td>
</tr>
</tbody>
</table>
<p>A revealing pattern: five of these ten drafts reference "DMSC" -- Dynamic Multi-agent Secured Collaboration -- a concept pushed primarily by Chinese institutions through the IETF's DMSC side meeting. This cluster represents an organized attempt to define the agent collaboration architecture, but even within that effort, multiple competing proposals have emerged.</p>
<p>The gap: no draft in this cluster addresses <strong>dynamic trust establishment between gateways</strong>, or how to handle conflicting semantic schemas across ecosystems. If Agent Gateway A speaks MCP and Agent Gateway B speaks A2A Protocol, these drafts describe the need for translation but do not provide it.</p>
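<p>What such a translation layer would have to do can be sketched as a field mapping between two message envelopes. The key names below are simplified stand-ins, not the real MCP or A2A wire formats; the point is that a static mapping only works when the semantics already line up.</p>

```python
def gateway_a_to_b(msg: dict) -> dict:
    """Translate a simplified gateway-A task call into a gateway-B envelope.
    Field names on both sides are invented for illustration."""
    return {
        "task": msg["method"],
        "inputs": msg.get("params", {}),
        "reply_to": msg["client_id"],
    }

request = {"method": "summarize",
           "params": {"url": "https://example.com"},
           "client_id": "gw-a"}
print(gateway_a_to_b(request))
```

<p>The hard part is not this mapping but everything the drafts leave unspecified: who operates the gateway, how the two sides establish trust, and what happens when a field has no semantic counterpart on the other side.</p>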
<h2 id="the-near-duplicate-epidemic">The Near-Duplicate Epidemic</h2>
<p>Our embedding-based similarity analysis produced a more troubling finding: <strong>25+ draft pairs</strong> have cosine similarity above 0.98. Many are functionally identical proposals submitted under different names:</p>
<table>
<thead>
<tr>
<th>Draft A</th>
<th>Draft B</th>
<th>Reason</th>
</tr>
</thead>
<tbody>
<tr>
<td>draft-a2a-moqt-transport</td>
<td>draft-nandakumar-a2a-moqt-transport</td>
<td>Same content, different name</td>
</tr>
<tr>
<td>draft-abbey-scim-agent-extension</td>
<td>draft-scim-agent-extension</td>
<td>Same draft, dual submission</td>
</tr>
<tr>
<td>draft-rosenberg-aiproto</td>
<td>draft-rosenberg-aiproto-nact</td>
<td>Renamed</td>
</tr>
<tr>
<td>draft-rosenberg-aiproto-cheq</td>
<td>draft-rosenberg-cheq</td>
<td>Renamed</td>
</tr>
<tr>
<td>draft-cui-nmrg-llm-nm</td>
<td>draft-irtf-nmrg-llm-nm</td>
<td>WG adoption (individual to IRTF)</td>
</tr>
<tr>
<td>draft-ar-emu-hybrid-pqc-eapaka</td>
<td>draft-ietf-emu-hybrid-pqc-eapaka</td>
<td>WG adoption</td>
</tr>
<tr>
<td>draft-zheng-agent-identity-management</td>
<td>draft-zheng-dispatch-agent-identity-management</td>
<td>Same draft, different WG</td>
</tr>
<tr>
<td>draft-sun-zhang-iaip</td>
<td>draft-sz-dmsc-iaip</td>
<td>Same draft, different WG</td>
</tr>
<tr>
<td>draft-zeng-mcp-troubleshooting</td>
<td>draft-zm-rtgwg-mcp-troubleshooting</td>
<td>Same draft, different WG</td>
</tr>
</tbody>
</table>
<p>Some of these duplications are legitimate IETF process: a draft moves from individual submission to working group adoption (like draft-cui-nmrg-llm-nm becoming draft-irtf-nmrg-llm-nm). Others reflect authors shopping the same draft to multiple working groups. And a few appear to be genuine content duplication -- the same ideas submitted under different author combinations.</p>
<p>The practical effect: the 434-draft corpus includes substantial double-counting. After de-duplication, the true number of distinct proposals is somewhat lower -- removing the 25 near-duplicate pairs yields roughly 409 distinct drafts, and accounting for related-but-not-identical submissions lowers the count further. But even with generous de-duplication, the volume is extraordinary.</p>
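<p>The de-duplication step itself is conceptually simple. The sketch below reimplements the pairwise check with toy four-dimensional vectors; the real analysis uses nomic-embed-text embeddings over full draft text, not these invented values.</p>

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy embeddings: the first two drafts are near-identical on purpose.
drafts = {
    "draft-a2a-moqt-transport":            [0.9, 0.1, 0.3, 0.20],
    "draft-nandakumar-a2a-moqt-transport": [0.9, 0.1, 0.3, 0.21],
    "draft-narajala-ans":                  [0.1, 0.8, 0.2, 0.50],
}

THRESHOLD = 0.98  # the near-duplicate cutoff used in the analysis
names = list(drafts)
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
         if cosine(drafts[a], drafts[b]) > THRESHOLD]
print(pairs)  # only the MOQT pair crosses the threshold
```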
<h2 id="the-a2a-protocol-zoo">The A2A Protocol Zoo</h2>
<p>Zooming out from individual clusters, the broadest fragmentation is in the <strong>155 A2A protocol drafts</strong>. These span everything from low-level transport (A2A over MOQT/QUIC) to high-level semantic routing (intent-based agent interconnection) to specific use cases (MCP for network troubleshooting).</p>
<p>The most common technical idea in the entire corpus -- "Multi-Agent Communication Protocol" -- appears in <strong>8 separate drafts</strong> from different teams. Eight teams are independently designing how agents should talk to each other.</p>
<table>
<thead>
<tr>
<th>Competing Area</th>
<th style="text-align: right;">Drafts</th>
<th>Distinguishing Fact</th>
</tr>
</thead>
<tbody>
<tr>
<td>OAuth for agents</td>
<td style="text-align: right;">14</td>
<td>No draft handles chained delegation</td>
</tr>
<tr>
<td>Agent gateway/collaboration</td>
<td style="text-align: right;">10</td>
<td>5 are DMSC-linked; no trust framework</td>
</tr>
<tr>
<td>Agent discovery</td>
<td style="text-align: right;">6</td>
<td>Range from DNS-based to full directories</td>
</tr>
<tr>
<td>Intent-based routing</td>
<td style="text-align: right;">5</td>
<td>Requirements-heavy, protocol-light</td>
</tr>
<tr>
<td>6G agent requirements</td>
<td style="text-align: right;">6</td>
<td>Wish lists, not specifications</td>
</tr>
<tr>
<td>SCIM/identity registry</td>
<td style="text-align: right;">6</td>
<td>3 are near-duplicates</td>
</tr>
</tbody>
</table>
<p>The discovery cluster is particularly illustrative. Six drafts propose different ways to find AI agents: <a href="https://datatracker.ietf.org/doc/draft-narajala-ans/">draft-narajala-ans</a> (score 4.2) proposes a DNS-based Agent Name Service. <a href="https://datatracker.ietf.org/doc/draft-mozleywilliams-dnsop-bandaid/">draft-mozleywilliams-dnsop-bandaid</a> (3.6) also uses DNS but via SVCB records. <a href="https://datatracker.ietf.org/doc/draft-pioli-agent-discovery/">draft-pioli-agent-discovery</a> (3.2) defines a lightweight registration and discovery protocol. <a href="https://datatracker.ietf.org/doc/draft-gaikwad-woa/">draft-gaikwad-woa</a> (3.2) proposes a Web of Agents format using JSON Schema. None of them reference each other.</p>
<h2 id="the-deeper-fragmentation-different-technological-dna">The Deeper Fragmentation: Different Technological DNA</h2>
<p>The protocol-level fragmentation documented above is only the visible layer. Beneath it, our RFC cross-reference analysis reveals a more fundamental divide: the two major drafting blocs are building on <strong>entirely different technology stacks</strong>.</p>
<table>
<thead>
<tr>
<th>Foundation</th>
<th>Chinese Bloc</th>
<th>Western Bloc</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Network management (YANG/NETCONF)</strong></td>
<td>Strong (RFC 6241, 8639, 8641, 7950)</td>
<td>Absent</td>
</tr>
<tr>
<td><strong>IoT security (COSE/CBOR/OSCORE/CoAP)</strong></td>
<td>Absent</td>
<td>Strong (RFC 9052, 8949, 8613, 7252)</td>
</tr>
<tr>
<td><strong>PKI/Certificates (X.509)</strong></td>
<td>Absent</td>
<td>Strong (RFC 5280)</td>
</tr>
<tr>
<td><strong>Lightweight auth (EDHOC, CWT)</strong></td>
<td>Absent</td>
<td>Strong (RFC 9528, 8392)</td>
</tr>
<tr>
<td><strong>Web APIs (HTTP Semantics)</strong></td>
<td>Weak</td>
<td>Strong (RFC 9110)</td>
</tr>
<tr>
<td><strong>TLS 1.3</strong></td>
<td>Moderate (8 citations)</td>
<td>Strong (18 citations)</td>
</tr>
<tr>
<td><strong>OAuth 2.0</strong></td>
<td>Present (11 citations)</td>
<td>Present (7 citations)</td>
</tr>
</tbody>
</table>
<p>The Chinese bloc -- Huawei, China Mobile, China Telecom, China Unicom, and associated research institutions -- builds agent infrastructure on <strong>YANG/NETCONF</strong>, the network management protocols that underpin autonomous network operations. The Western bloc -- Ericsson, Cisco, ATHENA, and European research labs -- builds on <strong>COSE/CBOR/CoAP</strong> (IoT security) and <strong>HTTP/TLS/PKI</strong> (web infrastructure).</p>
<p>The <strong>only shared foundation</strong> is OAuth 2.0, which both blocs cite at comparable rates. This is why the OAuth cluster has 14 competing proposals: it is the one piece of common ground, and everyone is fighting over it.</p>
<p>This means fragmentation goes deeper than protocol design. Even if the community agreed on a single agent communication pattern, the underlying plumbing is incompatible. A Chinese draft building on NETCONF and a Western draft building on CoAP cannot interoperate without a translation layer -- and that translation layer, as we document in the gap analysis, does not exist.</p>
<h2 id="what-fragmentation-costs">What Fragmentation Costs</h2>
<p>The costs of this fragmentation are not theoretical:</p>
<p><strong>For implementers</strong>: Which OAuth extension do you implement? Do you support SCIM agent schemas or Web of Agents? If your agent needs to discover another agent, do you look in DNS, a well-known URI, or a dedicated directory? Today there is no canonical answer, and choosing wrong means re-implementation when the IETF eventually converges.</p>
<p><strong>For the IETF process</strong>: Working groups spend time evaluating competing proposals that could be spent converging on solutions. The OAuth working group alone faces 14 agent-related drafts. The volume creates overhead that slows progress on any single proposal.</p>
<p><strong>For security</strong>: When multiple incompatible authentication and authorization schemes exist, implementations inevitably take shortcuts. The most dangerous agents will be those that implement the easiest -- not the most secure -- available standard.</p>
<p><strong>For the ecosystem</strong>: Each month that fragmentation persists, real-world agent deployments make choices. Those choices entrench specific approaches, making convergence harder and interoperability more expensive. The window for a unified standard narrows with every proprietary deployment.</p>
<p><strong>A note on IETF IPR policy</strong>: Implementers considering building on any of the OAuth or protocol drafts discussed above should be aware that Internet-Drafts may be subject to intellectual property rights (IPR) claims. Under BCP 79 (RFC 8179), IETF participants are expected to disclose known IPR. Check the <a href="https://datatracker.ietf.org/ipr/">IETF IPR disclosure database</a> before implementing.</p>
<h2 id="the-convergence-signals">The Convergence Signals</h2>
<p>Not everything is divergence. A few positive patterns emerged from the data:</p>
<p><strong>EDHOC is converging.</strong> The lightweight authenticated key exchange protocol has multiple working-group-adopted drafts (<a href="https://datatracker.ietf.org/doc/draft-ietf-lake-edhoc-psk/">draft-ietf-lake-edhoc-psk</a>, <a href="https://datatracker.ietf.org/doc/draft-ietf-lake-authz/">draft-ietf-lake-authz</a>, <a href="https://datatracker.ietf.org/doc/draft-ietf-emu-eap-edhoc/">draft-ietf-emu-eap-edhoc</a>) with coordinated authorship. This is what healthy standards development looks like: multiple drafts from different teams that explicitly build on each other.</p>
<p><strong>SCIM agent extensions show maturity.</strong> The Okta team's <a href="https://datatracker.ietf.org/doc/draft-abbey-scim-agent-extension/">draft-abbey-scim-agent-extension</a> (score 3.8) and <a href="https://datatracker.ietf.org/doc/draft-wahl-scim-agent-schema/">draft-wahl-scim-agent-schema</a> (score 3.9) represent a practical approach: extend an existing, widely-deployed protocol (SCIM) rather than invent a new one. This pragmatism is a convergence signal.</p>
<p><strong>The verifiable conversations approach is gaining traction.</strong> <a href="https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/">draft-birkholz-verifiable-agent-conversations</a> (score 4.5) and the WIMSE/ECT work on execution context tokens represent a "record everything, verify later" approach to agent accountability that multiple communities can support.</p>
<h2 id="what-needs-to-happen">What Needs to Happen</h2>
<p>Three structural interventions would accelerate convergence:</p>
<p><strong>1. Working groups need to pick winners.</strong> The IETF process allows competing proposals, but at some point working groups must adopt specific approaches and redirect competing efforts. In the OAuth agent space, the highest-quality proposals (DAAP, Agentic JWT, RAR extensions) should be evaluated head-to-head, not allowed to proliferate indefinitely.</p>
<p><strong>2. Interoperability testing, not just drafting.</strong> The 155 A2A protocol proposals exist mostly as text. Interop testing -- where implementations from different teams prove they can work together -- would quickly reveal which proposals have real engineering substance and which are paper exercises.</p>
<p><strong>3. The translation layer must be built.</strong> Rather than picking one A2A protocol, the community may be better served by a thin interoperability layer that lets agents using different protocols communicate through gateways. Our gap analysis found this cross-protocol translation gap entirely unaddressed -- zero technical ideas in the current corpus.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>14 competing OAuth-for-agents proposals</strong> illustrate the depth of fragmentation; none handle chained delegation across agent networks</li>
<li><strong>155 A2A protocol drafts</strong> exist without an interoperability layer; the most common idea in the corpus appears in 8 separate drafts from different teams</li>
<li><strong>25+ near-duplicate pairs</strong> (&gt;0.98 similarity) inflate the draft count; after de-duplication, roughly 409 distinct proposals remain</li>
<li><strong>Convergence signals exist</strong> in EDHOC authentication, SCIM agent extensions, and verifiable conversations -- areas where teams explicitly build on each other</li>
<li><strong>Fragmentation goes deeper than protocols</strong>: Chinese and Western blocs build on different RFC foundations (YANG/NETCONF vs COSE/CBOR/CoAP); the only shared bedrock is OAuth 2.0</li>
<li><strong>The missing piece</strong> is a cross-protocol translation layer; no draft in the corpus addresses how agents using different protocols can interoperate</li>
</ul>
<p><em>Next in this series: <a href="04-what-nobody-builds.md">What Nobody's Building (And Why It Matters)</a> -- The 11 gaps in the IETF's AI agent landscape, and the real-world consequences of leaving them unfilled.</em></p>
<hr />
<p><em>Data from the IETF Draft Analyzer's embedding-based overlap analysis (nomic-embed-text) and cluster detection at 0.85/0.90 similarity thresholds.</em></p>
<div class="post-nav"><a href="/blog/posts/02-who-writes-the-rules.html">&larr; Who Writes the Rules</a><a href="/blog/posts/04-what-nobody-builds.html">What Nobody Builds &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>What Nobody's Building (And Why It Matters) — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<strong>Builds</strong>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="what-nobodys-building-and-why-it-matters">What Nobody's Building (And Why It Matters)</h1>
<p><em>The 11 gaps in the IETF's AI agent landscape -- and the real-world disasters they invite.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>Imagine an AI agent managing a hospital's drug-dispensing system. It receives instructions from a prescribing agent, coordinates with a pharmacy agent, and issues delivery commands to a robotic dispensing agent. On Tuesday morning, the prescribing agent hallucinates a dosage. The pharmacy agent fills it. The dispensing agent delivers it. No human saw it happen. No system flagged it. No protocol exists to roll back the dispensed medication.</p>
<p>To be clear: this scenario is already regulated. Under the EU AI Act (Regulation 2024/1689), a drug-dispensing AI agent is a high-risk AI system under Annex III, requiring conformity assessment, risk management, and human oversight before deployment. The Medical Devices Regulation (MDR 2017/745) imposes additional obligations. The gap is not one of legal accountability -- it is one of technical implementation. The standards that would let developers <em>comply</em> with these regulations in multi-agent architectures do not yet exist.</p>
<p>This is the predictable consequence of the IETF's most critical standardization gaps.</p>
<p>We analyzed <strong>434 Internet-Drafts</strong>, extracted their technical components, and compared the result against what real-world agent deployments actually require. We found <strong>11 gaps</strong> -- areas where standardization work is missing or inadequate. Two of them are critical. And the critical ones share a defining characteristic: they address what happens when autonomous agents fail or misbehave.</p>
<p>Nobody is building the safety net.</p>
<h2 id="the-12-gaps">The 11 Gaps</h2>
<p>Our gap analysis sorted findings by severity based on the breadth of the shortfall and the consequences of leaving it unfilled:</p>
<table>
<thead>
<tr>
<th>#</th>
<th>Gap</th>
<th>Severity</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Agent Behavioral Verification</td>
<td>CRITICAL</td>
</tr>
<tr>
<td>2</td>
<td>Agent Failure Cascade Prevention</td>
<td>CRITICAL</td>
</tr>
<tr>
<td>3</td>
<td>Real-Time Agent Rollback Mechanisms</td>
<td>HIGH</td>
</tr>
<tr>
<td>4</td>
<td>Multi-Agent Consensus Protocols</td>
<td>HIGH</td>
</tr>
<tr>
<td>5</td>
<td>Human Override Standardization</td>
<td>HIGH</td>
</tr>
<tr>
<td>6</td>
<td>Cross-Domain Agent Audit Trails</td>
<td>HIGH</td>
</tr>
<tr>
<td>7</td>
<td>Federated Agent Learning Privacy</td>
<td>HIGH</td>
</tr>
<tr>
<td>8</td>
<td>Cross-Protocol Agent Migration</td>
<td>MEDIUM</td>
</tr>
<tr>
<td>9</td>
<td>Agent Resource Accounting and Billing</td>
<td>MEDIUM</td>
</tr>
<tr>
<td>10</td>
<td>Agent Capability Negotiation</td>
<td>MEDIUM</td>
</tr>
<tr>
<td>11</td>
<td>Agent Performance Benchmarking</td>
<td>MEDIUM</td>
</tr>
</tbody>
</table>
<p>The gap names above match the automated gap analysis output. The two critical gaps -- behavioral verification and failure cascade prevention -- address what happens when autonomous agents deviate from declared behavior or trigger cascading failures across interconnected systems. Several high-severity gaps (rollback mechanisms, human override, consensus protocols) address the same theme: what happens when things go wrong, and nobody has built the safety net.</p>
<p>A notable omission from this gap list: <strong>GDPR-mandated capabilities</strong>. The gap analysis focuses on technical desiderata but does not engage with the EU's legally binding data protection framework. Specific GDPR requirements that have no corresponding IETF draft work include: Data Protection Impact Assessment (DPIA) tooling for high-risk agent processing (Art. 35 GDPR), right-to-erasure propagation across multi-agent chains (Art. 17), data portability for agent-generated personal data (Art. 20), and purpose limitation enforcement when agents are authorized for specific tasks but may repurpose data (Art. 5(1)(b)). These are not optional features for EU-deployed agent systems -- they are legal requirements.</p>
<h2 id="critical-gap-1-agent-behavior-verification">Critical Gap 1: Agent Behavior Verification</h2>
<p><strong>The problem</strong>: No mechanism exists to verify that a deployed AI agent actually behaves according to its declared policies or specifications.</p>
<p><strong>The numbers</strong>: Only <strong>47 of 434 drafts</strong> address AI safety and alignment. The capability-to-safety ratio is roughly 4:1 on aggregate -- though it varies significantly by month, from as low as 1.5:1 to as high as 21:1. The trend is clear: the community is building agents faster than it is building the tools to keep them honest.</p>
<p><strong>What partially addresses this</strong>: Some work exists on the periphery. <a href="https://datatracker.ietf.org/doc/draft-aylward-daap-v2/">draft-aylward-daap-v2</a> (score 4.75 -- the highest-rated draft in the corpus) defines a behavioral monitoring framework and cryptographic identity verification. <a href="https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/">draft-birkholz-verifiable-agent-conversations</a> (score 4.5) proposes verifiable conversation records using COSE signing. <a href="https://datatracker.ietf.org/doc/draft-berlinai-vera/">draft-berlinai-vera</a> (score 3.9) introduces a zero-trust architecture with five enforcement pillars.</p>
<p><strong>What is still missing</strong>: Runtime verification. These drafts define what agents <em>should</em> do and how to <em>record</em> what they did. None provides a real-time mechanism to detect that an agent is deviating from its declared behavior <em>while it is operating</em>. The gap is between policy declaration and policy enforcement -- the difference between a speed limit sign and a speed camera.</p>
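<p>None of the drafts specify such a mechanism, but the missing piece can be sketched in a few lines: a monitor that checks each action against the agent's declared policy <em>at execution time</em>. All names and limits below are hypothetical, purely to illustrate the declaration-versus-enforcement distinction:</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeclaredPolicy:
    """What the agent *declares* it will do (hypothetical schema)."""
    max_trade_usd: float
    allowed_actions: frozenset

class PolicyViolation(Exception):
    pass

def verify_action(policy: DeclaredPolicy, action: str, amount_usd: float) -> None:
    """Speed camera, not speed limit sign: checked before every execution."""
    if action not in policy.allowed_actions:
        raise PolicyViolation(f"undeclared action: {action!r}")
    if amount_usd > policy.max_trade_usd:
        raise PolicyViolation(f"{amount_usd} exceeds declared limit {policy.max_trade_usd}")

policy = DeclaredPolicy(10_000, frozenset({"buy", "sell"}))
verify_action(policy, "buy", 5_000)        # within declared bounds: passes silently
try:
    verify_action(policy, "buy", 50_000)   # post-update drift, caught at runtime
except PolicyViolation as exc:
    print("blocked:", exc)
```

<p>The hard standardization problem is everything this sketch elides: how policies are declared interoperably, who operates the monitor, and what happens after a violation is detected.</p>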
<p><strong>The scenario</strong>: A financial trading agent is authorized to execute trades within specified parameters. It begins operating within bounds but, after a model update, starts exceeding risk limits. Without runtime behavior verification, the deviation is only discovered in post-hoc audit -- potentially days later, after significant damage.</p>
<h2 id="critical-gap-2-agent-failure-cascade-prevention">Critical Gap 2: Agent Failure Cascade Prevention</h2>
<p><strong>The problem</strong>: No protocols exist to prevent agent failures from cascading across interconnected autonomous systems. As agent interdependencies increase in production deployments, a failure in one agent can ripple outward.</p>
<p><strong>The numbers</strong>: Only <strong>47 of 434 drafts</strong> address AI safety, and the high interconnectivity implied by 155 A2A protocol drafts and 114 autonomous netops drafts creates the conditions for cascade failures.</p>
<p><strong>What is missing</strong>: Circuit breakers for cascading failures. Checkpoint and rollback protocols. Blast radius containment. Graceful degradation. All concepts well-established in distributed systems engineering, but absent from the agent standards landscape.</p>
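<p>To make "circuit breaker" concrete: the pattern, transplanted from distributed-systems practice to agent calls, is a counter that trips after repeated failures and rejects further calls until a cooldown expires. This is a generic sketch of the well-known pattern, not a mechanism from any draft:</p>

```python
import time

class CircuitBreaker:
    """Trips open after `max_failures`; rejects calls until `cooldown` elapses."""

    def __init__(self, max_failures: int = 3, cooldown: float = 30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means closed: calls are allowed

    def call(self, agent_fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: agent call rejected")
            self.opened_at = None  # cooldown over: half-open, try again
            self.failures = 0
        try:
            result = agent_fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip: contain the blast radius
            raise
        self.failures = 0
        return result
```

<p>The same trip-and-cooldown logic generalizes to inter-agent calls: a caller stops hammering a failing peer instead of amplifying the outage -- exactly the containment the drafts do not specify.</p>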
<p><strong>The scenario</strong>: A telecom operator deploys 50 AI agents for network monitoring, troubleshooting, and optimization. During a major outage, all 50 agents simultaneously request inference resources to diagnose the problem. With no failure cascade prevention, agents compete chaotically. The most aggressive agents get resources; the most important diagnostic tasks may not. The outage extends because the agents that could fix it are starved by the agents that are observing it. For telecom operators in the EU, the NIS2 Directive (Directive 2022/2555) classifies electronic communications as an essential service, requiring incident response capabilities and supply chain security measures -- making cascade prevention not just an engineering problem but a regulatory obligation.</p>
<h2 id="high-gap-real-time-agent-rollback-mechanisms">High Gap: Real-Time Agent Rollback Mechanisms</h2>
<p><strong>The problem</strong>: No standards exist for how to quickly roll back incorrect decisions made by autonomous agents across distributed systems.</p>
<p><strong>The numbers</strong>: 114 autonomous netops drafts exist, but none defines rollback mechanisms for production network safety. <a href="https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/">draft-yue-anima-agent-recovery-networks</a> (score 4.1) is among the few drafts that partially address this, with its Task-Oriented Multi-Agent Recovery Framework and State Consistency Management. For context, "Multi-Agent Communication Protocol" -- defining how agents <em>talk</em> -- appears in 8 drafts. The community has invested far more effort in the plumbing than in the fire escape.</p>
<p><strong>What is missing</strong>: Standardized checkpointing of agent state. Rollback protocols that can revert decisions already propagated to downstream agents. Compensating actions for effects that cannot simply be undone. All concepts well-established in distributed systems engineering, but absent from the agent standards landscape.</p>
<p><strong>The scenario</strong>: A multi-agent supply chain system manages inventory, shipping, and payments. The inventory agent processes a large batch incorrectly, leading the shipping agent to dispatch wrong items, which causes the payment agent to process refunds to wrong accounts. The cascade happens in minutes. Without rollback mechanisms, untangling the mess requires manual human intervention across three systems -- and the agents continue making decisions based on corrupted state while humans try to intervene.</p>
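<p>One standard distributed-systems answer to this scenario is a saga-style compensation log: every agent step records how to undo itself, and rollback replays those undo actions in reverse. The sketch below illustrates the technique in general -- it is not drawn from any draft, and the supply-chain names are illustrative:</p>

```python
class CompensationLog:
    """Saga-style undo log: one compensating action per executed agent step."""

    def __init__(self):
        self._undo = []  # (description, compensate_fn) in execution order

    def record(self, description, compensate_fn):
        self._undo.append((description, compensate_fn))

    def rollback(self):
        """Undo in reverse order, so later decisions unwind first."""
        while self._undo:
            description, compensate = self._undo.pop()
            compensate()
            print("compensated:", description)

log = CompensationLog()
shipped, refunded = [], []

shipped.append("wrong-item-batch")
log.record("ship batch", lambda: shipped.pop())

refunded.append("wrong-account")
log.record("issue refund", lambda: refunded.pop())

log.rollback()  # refund reversed first, then the shipment recalled
```

<p>A cross-vendor standard would have to specify far more than this: how compensating actions are declared, how partially irreversible effects are handled, and how agents coordinate a rollback across organizational boundaries.</p>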
<h2 id="the-high-priority-gaps">The High-Priority Gaps</h2>
<p>Several additional gaps scored HIGH severity. Each represents a missing piece that working deployments will hit:</p>
<h3 id="human-override-standardization">Human Override Standardization</h3>
<p>Only <strong>34 human-agent interaction drafts</strong> exist versus <strong>114 autonomous operations</strong> and <strong>155 A2A protocol</strong> drafts. Agents are being designed to talk to each other, rather than to humans, at roughly a 4:1 ratio (varying from 1.5:1 to 21:1 month-to-month). Emergency override protocols -- the "big red button" -- are almost entirely absent. This is not merely an engineering preference. For high-risk AI systems deployed in the EU, the AI Act (Art. 14) mandates human oversight -- making this gap a compliance blocker, not just a design omission.</p>
<p><a href="https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/">draft-rosenberg-aiproto-cheq</a> (score 3.9) is a rare exception: it defines a protocol for human confirmation of agent decisions before execution. But CHEQ is opt-in and pre-execution. No draft defines what happens when a human needs to stop a running agent, constrain its behavior, or take over its task mid-execution.</p>
<h3 id="multi-agent-consensus-protocols">Multi-Agent Consensus Protocols</h3>
<p>When a group of agents disagree -- the diagnosis agent says the router is down, the monitoring agent says it is up, the optimization agent is rerouting traffic around it -- who arbitrates? No framework exists for agents to resolve conflicting assessments without human intervention. This is not a new problem: FIPA (Foundation for Intelligent Physical Agents) defined agent communication languages and interaction protocols for multi-agent coordination as early as 1997. The IETF landscape has largely not engaged with this prior art.</p>
<h3 id="cross-domain-agent-audit-trails">Cross-Domain Agent Audit Trails</h3>
<p>An agent operating across multiple domains or organizations needs to maintain audit trails that satisfy different regulatory requirements simultaneously. Identity management exists -- the 152 identity/auth drafts cover authentication. What does not exist is cross-domain audit standardization: the format and semantics for recording agent actions across jurisdictions with varying compliance requirements. The EU's eIDAS 2.0 regulation (Regulation 2024/1183) and its European Digital Identity Wallet framework provide a mature trust model that the IETF drafts have not yet connected to.</p>
<h3 id="federated-agent-learning-privacy">Federated Agent Learning Privacy</h3>
<p>While federated architectures exist, there is insufficient specification for privacy-preserving agent learning that prevents data leakage between federated participants during model updates. The absence of secure update mechanisms also intersects with the EU Cyber Resilience Act (Regulation 2024/2847), which requires products with digital elements -- including AI agent software -- to handle updates securely and provide vulnerability management throughout their lifecycle.</p>
<h3 id="cross-protocol-agent-migration">Cross-Protocol Agent Migration</h3>
<p>Agents need to migrate between different network protocols, domains, or infrastructure providers while maintaining state and identity. Current drafts focus on registration but not migration continuity.</p>
<h2 id="the-structural-problem">The Structural Problem</h2>
<p>Here is the finding the Architect on our team surfaced that reframes the entire gap analysis:</p>
<p><strong>The severity of each gap appears to correlate with the coordination difficulty required to fill it.</strong></p>
<p>The critical gaps (behavior verification, failure cascade prevention) require agreement across <em>multiple</em> IETF working groups. They cut across safety, networking, identity, and operations -- areas currently owned by separate teams that rarely collaborate. The high gaps (rollback mechanisms, human override, consensus) require even broader agreement: they need architects who see the whole ecosystem, not just their protocol.</p>
<p>Now look back at the team bloc analysis from Post 2. The 18 team blocs are <em>islands</em>. Cross-team collaboration is sparse. The strongest cross-bloc connection involves 3 shared drafts. The gaps that require the most cross-team work are being produced by an ecosystem that does the least cross-team work.</p>
<p>This is the structural explanation for the safety deficit. It is not that people do not care about safety. It is that safety standards require coordination across boundaries that the current authorship structure cannot bridge. Capability standards can be built within a single team. Safety standards cannot.</p>
<p>Our category co-occurrence analysis provides the concrete proof. Safety drafts are not entirely isolated -- they co-occur with several categories, coupling most strongly with policy and governance and identity/auth. But the pattern is revealing: safety pairs with <em>governance</em> categories, not <em>implementation</em> categories. Of the 155 drafts tagged as A2A protocols, very few also address safety. Safety has minimal co-occurrence with agent discovery/registration and model serving/inference. Its weakest links are to the categories where agents actually <em>do</em> things. Safety is being discussed in governance papers. It is barely present in the protocols that need it most. The traffic lights are not just behind the highways -- they are on a different road entirely.</p>
<p>IEEE P3394 (Standard for Trustworthy AI Agents), a concurrent standardization effort, is attempting to address some of these safety and trust dimensions from a different angle. The IETF landscape should be compared against these parallel efforts to understand which gaps are being addressed elsewhere and which remain truly unserved.</p>
<h2 id="the-41-ratio-revisited">The ~4:1 Ratio, Revisited</h2>
<p>The safety deficit is not just a number. It is a structural property of how the IETF's AI agent community is organized.</p>
<table>
<thead>
<tr>
<th>Category</th>
<th style="text-align: right;">Drafts</th>
<th style="text-align: right;">Team Blocs Active</th>
</tr>
</thead>
<tbody>
<tr>
<td>A2A protocols</td>
<td style="text-align: right;">155</td>
<td style="text-align: right;">Many (distributed across blocs)</td>
</tr>
<tr>
<td>Autonomous operations</td>
<td style="text-align: right;">114</td>
<td style="text-align: right;">Primarily Huawei, Chinese telecom</td>
</tr>
<tr>
<td>Agent identity/auth</td>
<td style="text-align: right;">152</td>
<td style="text-align: right;">Ericsson, Nokia, ATHENA, multiple</td>
</tr>
<tr>
<td><strong>AI safety/alignment</strong></td>
<td style="text-align: right;"><strong>47</strong></td>
<td style="text-align: right;"><strong>Few; mostly independents/startups</strong></td>
</tr>
<tr>
<td><strong>Human-agent interaction</strong></td>
<td style="text-align: right;"><strong>34</strong></td>
<td style="text-align: right;"><strong>Rosenberg/White (2-person team)</strong></td>
</tr>
</tbody>
</table>
<p>The capability categories have organized teams behind them. The safety categories rely on individual contributors and small, unconnected teams. The best safety draft in the corpus (DAAP, score 4.75) comes from an independent author (Aylward). The best human-agent drafts come from a two-person Five9/Bitwave team. There is no 13-person safety bloc with 94% cohesion.</p>
<p>Until that changes -- until safety and human oversight attract the same organized, sustained effort as communication protocols -- the ~4:1 aggregate ratio will persist. And the gaps will remain open.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>11 gaps</strong> exist in the IETF's AI agent landscape: 2 critical, 5 high, 4 medium</li>
<li><strong>The 2 critical gaps</strong> address failure modes: behavioral verification and failure cascade prevention</li>
<li><strong>Agent rollback mechanisms and human override standardization</strong> are high-severity gaps with minimal coverage across 434 drafts</li>
<li><strong>Gap severity appears to correlate with coordination difficulty</strong>: the hardest gaps require cross-team, cross-WG collaboration that the current island structure cannot produce</li>
<li><strong>The safety deficit appears structural, not attitudinal</strong>: capability standards can be built by one team; safety standards require ecosystem-wide coordination that does not yet exist</li>
<li><strong>GDPR-mandated capabilities</strong> (DPIA support, erasure propagation, data portability, purpose limitation) represent an additional missing dimension not captured in the automated gap analysis</li>
</ul>
<p><em>Next in this series: <a href="/blog/posts/05-1262-ideas.html">Where 434 Drafts Converge (And Where They Don't)</a> -- the fragmentation goes all the way down.</em></p>
<hr />
<p><em>Gap analysis based on 434 drafts, cross-referenced against real-world deployment requirements for autonomous AI agent systems. Data current as of March 2026.</em></p>
<div class="post-nav"><a href="/blog/posts/03-oauth-wars.html">&larr; The OAuth Wars</a><a href="/blog/posts/05-1262-ideas.html">Where Drafts Converge &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -0,0 +1,367 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Where 434 Drafts Converge (And Where They Don't) — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<strong>Converge</strong>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="where-434-drafts-converge-and-where-they-dont">Where 434 Drafts Converge (And Where They Don't)</h1>
<p><em>The fragmentation goes deeper than competing protocols. It extends all the way down to the idea level.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>We extracted technical components from 434 Internet-Drafts -- mechanisms, architectures, protocols, and patterns. Then we asked: how many of these ideas does anyone else also propose?</p>
<p>The current database contains <strong>419 extracted ideas</strong> across 377 drafts. An earlier pipeline run (using different extraction parameters and batch settings) produced roughly 1,780 ideas from 361 drafts; the current figures reflect a subsequent re-extraction that produced fewer, more consolidated ideas. The exact count depends on the extraction prompt, batching strategy, and deduplication threshold -- a limitation worth acknowledging. What is robust across both runs is the <em>pattern</em>: the vast majority of extracted ideas appear in exactly one draft. Only a handful show cross-draft convergence by exact title matching. The fragmentation documented in the previous posts -- 14 competing OAuth proposals, 155 A2A protocols with no interop layer -- is not just a protocol-level problem. It extends all the way down. At the idea level, the landscape is overwhelmingly a collection of islands.</p>
<p>But islands are not the whole story. Using fuzzy matching (SequenceMatcher at 0.75 threshold) across organizational boundaries, we found <strong>130 cross-org convergent ideas</strong> where different organizations are working on recognizably similar problems -- even when they use different names and different approaches. (An earlier pipeline run with ~1,780 raw ideas produced 628 cross-org convergent ideas; the current, more consolidated extraction of 419 ideas yields 130 at the same threshold -- 36% of unique clusters, a comparable convergence rate.) These cross-org convergence signals are the embryonic consensus of the agent standards landscape: the problems that different teams, in different countries, with different agendas, independently recognize and attempt to solve.</p>
<p>These convergence signals are more impressive than they first appear. Recall from Post 2 that <strong>55% of all drafts have never been revised</strong> beyond their first submission, and <strong>65% of Huawei's drafts</strong> are fire-and-forget. The ideas that converge across organizations are not the generic scaffolding of first-draft submissions -- they represent genuine engineering investment from teams that independently identified the same problem and committed resources to solving it.</p>
<p>The picture that emerges is paradoxical: the raw material for a complete agent ecosystem exists. The convergent ideas point toward the architecture the ecosystem needs. But they exist in isolation -- proposed by separate teams, embedded in separate drafts, with no connective tissue linking them into a coherent blueprint.</p>
<h2 id="the-taxonomy">The Taxonomy</h2>
<p>Every extracted idea was classified by type. The distribution reveals what kind of thinking dominates the landscape:</p>
<table>
<thead>
<tr>
<th>Type</th>
<th style="text-align: right;">Count</th>
<th style="text-align: right;">Share</th>
<th>What It Means</th>
</tr>
</thead>
<tbody>
<tr>
<td>Protocol</td>
<td style="text-align: right;">96</td>
<td style="text-align: right;">23%</td>
<td>Full protocol specifications</td>
</tr>
<tr>
<td>Architecture</td>
<td style="text-align: right;">95</td>
<td style="text-align: right;">23%</td>
<td>System designs and reference models</td>
</tr>
<tr>
<td>Extension</td>
<td style="text-align: right;">79</td>
<td style="text-align: right;">19%</td>
<td>Additions to existing standards (OAuth, SCIM, DNS)</td>
</tr>
<tr>
<td>Mechanism</td>
<td style="text-align: right;">68</td>
<td style="text-align: right;">16%</td>
<td>Concrete technical solutions: auth flows, routing algorithms, token formats</td>
</tr>
<tr>
<td>Requirement</td>
<td style="text-align: right;">42</td>
<td style="text-align: right;">10%</td>
<td>Formal requirement documents</td>
</tr>
<tr>
<td>Pattern</td>
<td style="text-align: right;">35</td>
<td style="text-align: right;">8%</td>
<td>Reusable design approaches</td>
</tr>
<tr>
<td>Framework</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">1%</td>
<td>Frameworks, profiles</td>
</tr>
<tr>
<td>Format</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">&lt;1%</td>
<td>Data format specifications</td>
</tr>
</tbody>
</table>
<p><em>Note: These counts reflect the current database (419 ideas). An earlier pipeline run with different extraction parameters produced higher counts across all categories; the relative proportions are more meaningful than the absolute numbers.</em></p>
<p>The near-equal split between <strong>protocols</strong> (96) and <strong>architectures</strong> (95), with <strong>extensions</strong> (79) close behind, tells us the community is both building new solutions and extending existing ones. The extensions in particular show that much of the work builds on established foundations (OAuth 2.0, SCIM, DNS, EDHOC) rather than starting from scratch.</p>
<p>The 95 architectures and 42 requirements suggest healthy standards development: teams are defining reference models before writing code. But the 35 patterns -- reusable approaches without full protocol specification -- indicate that some teams have identified what needs to be done without committing to how.</p>
<h2 id="where-teams-converge">Where Teams Converge</h2>
<p>By exact title, few ideas appear in multiple drafts. But ideas with different names often describe the same concept -- "Agent Gateway" in one draft and "Inter-Agent Communication Hub" in another. Our fuzzy-matching overlap analysis (using SequenceMatcher at 0.75 threshold) across organizational boundaries found <strong>130 ideas</strong> where 2+ distinct organizations are working on recognizably similar problems. These are the genuine consensus signals.</p>
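<p>The matching step described above can be reproduced with Python's standard-library <code>difflib</code>. The (org, idea title) pairs below are illustrative, not drawn from the corpus:</p>

```python
from difflib import SequenceMatcher
from itertools import combinations

def similar(a: str, b: str, threshold: float = 0.75) -> bool:
    """Fuzzy title match: normalized ratio of matching subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

# Hypothetical (org, idea title) pairs for illustration only.
ideas = [
    ("Huawei", "Agent Gateway Architecture"),
    ("China Mobile", "Agent Gateway Architectures"),
    ("Cisco", "Inter-Agent Communication Hub"),
]

# Keep only pairs where similar titles come from *different* orgs.
cross_org = [
    (a, b)
    for a, b in combinations(ideas, 2)
    if a[0] != b[0] and similar(a[1], b[1])
]
print(cross_org)  # the two near-identical gateway titles match; the hub does not
```

<p>The 0.75 threshold is the sensitivity knob: lower it and more near-synonyms merge into one cluster, raise it and convergence counts shrink -- one reason the absolute idea counts vary between pipeline runs.</p>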
<table>
<thead>
<tr>
<th>Idea</th>
<th style="text-align: right;">Orgs</th>
<th style="text-align: right;">Drafts</th>
<th>Key Organizations</th>
</tr>
</thead>
<tbody>
<tr>
<td>A2A Communication Paradigm</td>
<td style="text-align: right;">8</td>
<td style="text-align: right;">5</td>
<td>CAICT, Deutsche Telekom, Huawei, Orange, Telefonica</td>
</tr>
<tr>
<td>AI Agent Network Architecture</td>
<td style="text-align: right;">8</td>
<td style="text-align: right;">5</td>
<td>China Mobile, Deutsche Telekom, Huawei, Orange, UnionPay</td>
</tr>
<tr>
<td>Multi-Agent Communication Protocol</td>
<td style="text-align: right;">7</td>
<td style="text-align: right;">8</td>
<td>AsiaInfo, BUPT, China Mobile, China Telecom, Huawei</td>
</tr>
<tr>
<td>AI Agent Communication Network (ACN)</td>
<td style="text-align: right;">7</td>
<td style="text-align: right;">5</td>
<td>ANP Open Source, China Mobile, Cisco, Five9, Huawei</td>
</tr>
<tr>
<td>NLIP (Natural Language Interchange)</td>
<td style="text-align: right;">7</td>
<td style="text-align: right;">1</td>
<td>Fordham, IBM, Purdue, ServiceNow, eBay</td>
</tr>
<tr>
<td>ELA Protocol</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">6</td>
<td>Bitwave, Cisco, Ericsson, Five9, Inria</td>
</tr>
<tr>
<td>AI Gateway</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">4</td>
<td>AsiaInfo, BUPT, China Telecom, Huawei, UnionPay</td>
</tr>
<tr>
<td>Agent Communication across WAN</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">3</td>
<td>China Mobile, China Unicom, Deutsche Telekom, Huawei, Orange</td>
</tr>
</tbody>
</table>
<p>The most-converged idea -- "A2A Communication Paradigm" -- draws independent contributions from <strong>8 organizations across 5 countries</strong>. This is simultaneously the strongest convergence signal and the strongest fragmentation signal. Eight organizations agree this is important. They are building separate, incompatible versions.</p>
<p>Look at who bridges the divide. In three of the top eight convergent ideas, the same names appear alongside Chinese institutions: <strong>Deutsche Telekom, Telefonica, and Orange</strong>. These European telecoms show up in "A2A Communication Paradigm," "AI Agent Network Architecture," and "Agent Communication across WAN" -- each time co-listed with Huawei, China Mobile, or China Unicom. Of the <strong>180 ideas that cross the Chinese-Western organizational divide</strong>, European telecoms are present on a disproportionate share. The organizations most likely to prevent the agent ecosystem from splitting into incompatible regional stacks are not Google or Microsoft -- they are European carriers operating in both markets. US Big Tech is almost entirely absent from cross-divide convergence.</p>
<p>The organization-pair overlaps reveal where real collaboration happens -- and where it does not:</p>
<table>
<thead>
<tr>
<th>Org Pair</th>
<th style="text-align: right;">Shared Ideas</th>
<th>Signal</th>
</tr>
</thead>
<tbody>
<tr>
<td>China Unicom -- Huawei</td>
<td style="text-align: right;">32</td>
<td>Deep intra-bloc alignment</td>
</tr>
<tr>
<td>China Mobile -- Huawei</td>
<td style="text-align: right;">27</td>
<td>Deep intra-bloc alignment</td>
</tr>
<tr>
<td>Ericsson -- Inria</td>
<td style="text-align: right;">21</td>
<td>European cross-org collaboration</td>
</tr>
<tr>
<td>Tsinghua -- Zhongguancun Lab</td>
<td style="text-align: right;">20</td>
<td>Chinese academic convergence</td>
</tr>
<tr>
<td>Fraunhofer SIT -- Tradeverifyd</td>
<td style="text-align: right;">10</td>
<td>Verifiable records niche</td>
</tr>
</tbody>
</table>
<p>The pattern is stark: the highest-overlap pairs are Chinese institutions working within established blocs. Formal co-authorship between Chinese and Western organizations is thin -- but idea-level convergence, mediated by European telecoms operating in both markets, is broader than the co-authorship data suggests.</p>
<p>The convergence signals cluster in three areas:</p>
<p><strong>1. Agent communication infrastructure.</strong> How agents discover, connect to, and message each other. This is the most active area with the most redundant proposals. The underlying need is clear; the implementation is contested.</p>
<p><strong>2. Authentication and authorization.</strong> Action-based authorization, agent registration, cryptographic identity verification. OAuth extensions dominate, but the approaches diverge significantly between pure OAuth extension (add claims/scopes) and novel frameworks (DAAP accountability protocol, STAMP delegation proofs).</p>
<p><strong>3. Network architecture.</strong> Agent gateways, agent communication networks, network management architectures. This is where the Chinese institutional ecosystem has the strongest presence, with Huawei and affiliated organizations producing most of the architecture ideas.</p>
<h2 id="where-teams-innovate">Where Teams Innovate</h2>
<p>The 96% of ideas appearing in only one draft are a mix: mostly generic components describing what each draft does ("Agent Gateway," "Transport Configuration System"), but scattered among them are genuinely novel proposals that no other team has attempted -- either because they are too new, too specialized, or ahead of their time.</p>
<p>Some standouts from the unique ideas:</p>
<p><strong>Verifiable Agent Behavior Attestation</strong> (draft-birkholz-verifiable-agent-conversations) -- A CDDL-based format for cryptographically signing agent conversation records, enabling post-hoc verification of agent behavior. This directly addresses the critical behavior verification gap.</p>
<p><strong>ADOL: Agentic Data Optimization Layer</strong> (<a href="https://datatracker.ietf.org/doc/draft-chang-agent-token-efficient/">draft-chang-agent-token-efficient</a>, score 4.5) -- Addresses token bloat in agent communication protocols. As agents exchange increasingly complex context, message sizes explode. ADOL compresses agent communications by 60-80%, a practical necessity that nobody else is working on.</p>
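<p>The order of magnitude of that claim is easy to sanity-check on the kind of repetitive JSON agent protocols exchange. Here <code>zlib</code> merely stands in for whatever scheme ADOL actually specifies, the payload is invented, and the ratio depends entirely on the input:</p>

```python
import json
import zlib

# A hypothetical, highly repetitive agent context payload.
context = json.dumps({
    "messages": [
        {"role": "tool", "name": "search", "content": f"result {i}", "schema": "v1"}
        for i in range(50)
    ]
}).encode()

compressed = zlib.compress(context, level=9)
savings = 1 - len(compressed) / len(context)
print(f"{len(context)} -> {len(compressed)} bytes ({savings:.0%} saved)")
```

<p>Generic byte-level compression already recovers most of the redundancy on a payload like this; a schema-aware layer can do better by deduplicating structure before bytes.</p>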
<p><strong>Working Memory</strong> (draft-agent-gw) -- A structured context management system that maintains state across multi-step agent operations. Sounds basic -- but no other draft proposes a standard for how agents should manage persistent operational context.</p>
<p><strong>Autonomous Optical Network Operation</strong> (draft-zhao-ccamp-actn-optical-network-agent) -- Applies agent architecture to the specific domain of optical network management. This is the kind of vertical specialization that validates the horizontal agent architecture work.</p>
<p><strong>Execution Context Token (ECT)</strong> (<a href="https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/">draft-nennemann-wimse-ect</a>, score 4.0) -- A JWT extension that records what each task did, linked to predecessors via a DAG. This is arguably the single most architecturally significant idea in the corpus: it turns the execution history of a multi-agent workflow into a cryptographically verifiable directed acyclic graph. It is the technical foundation for accountability, rollback, audit, and provenance.</p>
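<p>The core ECT idea as described -- each task record linked to its predecessors, forming a verifiable DAG -- can be sketched with hash-linked records. The field names here are hypothetical, and a real ECT would be a signed JWT rather than a bare content hash:</p>

```python
import hashlib
import json

def record_task(task: str, output: str, parents: list) -> dict:
    """One node in the execution DAG, linked to predecessors by digest."""
    node = {
        "task": task,
        "output": output,
        "parents": [p["digest"] for p in parents],  # DAG edges
    }
    payload = json.dumps(node, sort_keys=True).encode()
    node["digest"] = hashlib.sha256(payload).hexdigest()
    return node

fetch = record_task("fetch-inventory", "1200 units", parents=[])
plan = record_task("plan-shipping", "3 trucks", parents=[fetch])
pay = record_task("process-payment", "invoice sent", parents=[fetch, plan])

# Tampering with any ancestor changes its digest, breaking every descendant link.
print(pay["parents"])
```

<p>This is why the structure supports audit and rollback: the DAG tells you exactly which downstream decisions depended on a corrupted upstream step.</p>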
<p><strong>CHEQ Protocol</strong> (<a href="https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/">draft-rosenberg-aiproto-cheq</a>, score 3.9) -- Human confirmation of agent decisions before execution. The only concrete protocol proposal for human-in-the-loop agent oversight. In a landscape of 34 human-agent interaction drafts, CHEQ stands alone as an implementable solution.</p>
<h2 id="the-five-ideas-that-matter-most">The Five Ideas That Matter Most</h2>
<p>If you are building agent systems today and need to know which IETF proposals to watch, these five represent the highest combination of quality, novelty, and gap-filling potential:</p>
<table>
<thead>
<tr>
<th>Idea</th>
<th>Draft</th>
<th style="text-align: right;">Score</th>
<th>Why It Matters</th>
</tr>
</thead>
<tbody>
<tr>
<td>Execution Context Token</td>
<td>draft-nennemann-wimse-ect</td>
<td style="text-align: right;">4.0</td>
<td>DAG-based execution evidence; foundation for audit, rollback, and accountability</td>
</tr>
<tr>
<td>DAAP Accountability Protocol</td>
<td>draft-aylward-daap-v2</td>
<td style="text-align: right;">4.75</td>
<td>Most comprehensive safety proposal; authentication + monitoring + enforcement</td>
</tr>
<tr>
<td>STAMP Delegation Proofs</td>
<td>draft-guy-bary-stamp-protocol</td>
<td style="text-align: right;">4.5</td>
<td>Cryptographic proof that an agent was authorized for a specific task</td>
</tr>
<tr>
<td>Agent Description Language (ADL)</td>
<td>draft-nederveld-adl</td>
<td style="text-align: right;">4.1</td>
<td>JSON standard for describing agent capabilities, tools, and permissions</td>
</tr>
<tr>
<td>Verifiable Conversations</td>
<td>draft-birkholz-verifiable-agent-conversations</td>
<td style="text-align: right;">4.5</td>
<td>Cryptographic signing of conversation records for auditability</td>
</tr>
</tbody>
</table>
<p>Together, these five ideas sketch the outline of the ecosystem architecture that Post 6 will describe in full: ECT provides the execution backbone, DAAP provides the accountability layer, STAMP proves delegation, ADL describes capabilities, and verifiable conversations create the audit trail.</p>
<h2 id="mapping-ideas-to-gaps">Mapping Ideas to Gaps</h2>
<p>The most revealing analysis is mapping which ideas partially address which gaps:</p>
<table>
<thead>
<tr>
<th>Gap</th>
<th>Severity</th>
<th>Coverage</th>
</tr>
</thead>
<tbody>
<tr>
<td>Agent Behavioral Verification</td>
<td>CRITICAL</td>
<td>Partial: attestation and monitoring ideas exist but no runtime enforcement</td>
</tr>
<tr>
<td>Agent Failure Cascade Prevention</td>
<td>CRITICAL</td>
<td>Near-zero: minimal work on cascade containment</td>
</tr>
<tr>
<td>Real-Time Agent Rollback Mechanisms</td>
<td>HIGH</td>
<td>Near-zero: limited to draft-yue-anima-agent-recovery-networks</td>
</tr>
<tr>
<td>Multi-Agent Consensus Protocols</td>
<td>HIGH</td>
<td>Minimal: no conflict resolution framework</td>
</tr>
<tr>
<td>Human Override Standardization</td>
<td>HIGH</td>
<td>Near-zero: CHEQ exists but no emergency override protocol</td>
</tr>
<tr>
<td>Cross-Domain Agent Audit Trails</td>
<td>HIGH</td>
<td>Partial: identity covered, cross-domain audit not</td>
</tr>
<tr>
<td>Federated Agent Learning Privacy</td>
<td>HIGH</td>
<td>Minimal: privacy-preserving learning not specified</td>
</tr>
<tr>
<td>Cross-Protocol Agent Migration</td>
<td>MEDIUM</td>
<td>Complete absence in the corpus</td>
</tr>
<tr>
<td>Agent Resource Accounting and Billing</td>
<td>MEDIUM</td>
<td>Peripheral: resource types defined but no economic models</td>
</tr>
<tr>
<td>Agent Capability Negotiation</td>
<td>MEDIUM</td>
<td>Partial: tool enumeration exists but not dynamic negotiation</td>
</tr>
<tr>
<td>Agent Performance Benchmarking</td>
<td>MEDIUM</td>
<td>Moderate: benchmarking ideas exist (draft-cui-nmrg-llm-benchmark)</td>
</tr>
</tbody>
</table>
<p>The pattern is clear: the critical and high-severity gaps are those where the <em>periphery</em> of existing work touches the problem but nobody makes it the <em>central</em> problem. Teams building communication protocols think about resources; teams building discovery think about lifecycle. The gaps where no team is even circling the problem -- rollback mechanisms, human override, cascade prevention -- are the true blind spots.</p>
<h2 id="the-ideas-nobody-had">The Ideas Nobody Had</h2>
<p>Sometimes the absence is the finding. Here are technical ideas conspicuous in their absence from the entire corpus:</p>
<ul>
<li>
<p><strong>Agent capability degradation signaling</strong>: No protocol for an agent to advertise that its performance has degraded (model drift, resource constraints, partial failure). Other agents continue relying on it at full trust.</p>
</li>
<li>
<p><strong>Multi-agent transaction semantics</strong>: No ACID-like guarantees for multi-agent workflows. If three agents must all succeed or all roll back, there is no two-phase commit equivalent.</p>
</li>
<li>
<p><strong>Agent migration protocol</strong>: No standard for moving a running agent from one host to another while preserving state and active connections. Critical for cloud deployments.</p>
</li>
<li>
<p><strong>Privacy-preserving agent discovery</strong>: No mechanism for an agent to find capabilities without revealing its intent. "I need a medical diagnosis agent" reveals sensitive information before any trust is established. Under Art. 25 GDPR (data protection by design and by default), this is not just a nice-to-have -- it is a legal requirement for EU-deployed systems where discovery queries may constitute processing of special category data (Art. 9 GDPR, health data).</p>
</li>
<li>
<p><strong>Agent cost and billing</strong>: No standard for agents to negotiate compensation for services. Agents performing work for other agents have no way to express "this costs X" or "you have Y credits remaining."</p>
</li>
</ul>
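<p>To make one of these absences concrete, here is a minimal sketch of what "multi-agent transaction semantics" could look like, transplanting the classic two-phase commit pattern the list alludes to. Every name here is hypothetical -- no draft in the corpus defines any of this:</p>

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical participant in a multi-agent transaction."""
    name: str
    will_commit: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote on whether this agent can complete its part.
        self.state = "prepared" if self.will_commit else "aborted"
        return self.will_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def run_transaction(agents: list[Agent]) -> str:
    """All agents succeed together or roll back together."""
    if all(a.prepare() for a in agents):  # short-circuits on the first "no" vote
        for a in agents:                  # Phase 2a: global commit
            a.commit()
        return "committed"
    for a in agents:                      # Phase 2b: global rollback
        a.rollback()
    return "rolled_back"
```

<p>A real coordinator would additionally need timeouts, durable logs, and recovery from a crashed coordinator -- the hard parts two-phase commit is known for, and exactly what a standard would have to specify.</p>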
<p>Each of these absences represents an opportunity for new drafts that would fill genuine needs.</p>
<h2 id="what-the-taxonomy-tells-builders">What the Taxonomy Tells Builders</h2>
<p>Three practical takeaways for anyone implementing agent systems:</p>
<p><strong>1. Build on the convergent ideas.</strong> Agent registration, action-based authorization, and capability-based discovery appear across multiple teams and organizations. These represent genuine consensus about what the infrastructure needs, even if implementations diverge.</p>
<p><strong>2. Watch the single-source innovations.</strong> The long tail of single-draft ideas contains the innovations that will differentiate the next generation of agent platforms. ECT, CHEQ, ADOL, and ADL are not widely known but represent some of the most thoughtful engineering in the corpus.</p>
<p><strong>3. Fill the blank spaces.</strong> Error recovery, cross-protocol translation, and human override are the clearest opportunities for new contributions. The community has signaled these gaps matter (through the severity of the gap analysis) but has not yet produced the ideas to fill them.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>The vast majority of ideas appear in exactly one draft</strong> -- fragmentation extends all the way down to the idea level</li>
<li><strong>130 cross-org convergent ideas</strong> (36% of unique clusters, via SequenceMatcher fuzzy matching at 0.75 threshold) reveal where organizations independently agree; highest-overlap pairs are Chinese institutions (China Unicom-Huawei: 32 shared ideas)</li>
<li><strong>The critical gaps remain unfilled</strong>: rollback mechanisms, failure cascade prevention, and human override have minimal coverage across 434 drafts</li>
<li><strong>Five ideas to watch</strong>: ECT (execution DAG), DAAP (accountability), STAMP (delegation proof), ADL (agent description), verifiable conversations (audit trail)</li>
<li><strong>Convergence clusters in three areas</strong>: agent communication infrastructure, authentication/authorization, and network architecture</li>
</ul>
<p><em>Next in this series: <a href="06-big-picture.html">Drawing the Big Picture</a> -- 130 cross-org convergent ideas, 11 gaps, and the architectural vision that connects them.</em></p>
<hr />
<p><em>Idea extraction performed by Claude from draft abstracts and full text. Classification into types (protocol, architecture, extension, mechanism, requirement, pattern) based on the technical content of each proposal. The current database contains 419 ideas; figures referencing ~1,780 ideas come from an earlier pipeline run with different extraction parameters. Data current as of March 2026.</em></p>
<div class="post-nav"><a href="/blog/posts/04-what-nobody-builds.html">&larr; What Nobody Builds</a><a href="/blog/posts/06-big-picture.html">The Big Picture &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -0,0 +1,193 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Drawing the Big Picture: What the Agent Ecosystem Actually Needs — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<strong>Picture</strong>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="drawing-the-big-picture-what-the-agent-ecosystem-actually-needs">Drawing the Big Picture: What the Agent Ecosystem Actually Needs</h1>
<p><em>434 drafts, 130 cross-org convergent ideas, 11 gaps -- and the architectural vision that connects them all.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>We have spent five posts documenting a paradox: the IETF's AI agent landscape has extraordinary breadth (434 drafts), deep fragmentation at every level (the vast majority of ideas appear in only one draft, 155 competing A2A protocols, 14 OAuth proposals), concentrated authorship (18 team blocs, one company writing ~16% of all drafts), and critical gaps (behavioral verification, failure cascade prevention, human override) that nobody is filling.</p>
<p>The landscape has quantity. It lacks architecture.</p>
<p>This post is about what the architecture looks like -- not in theory, but derived from the data. The 11 gaps are not random absences; they are structurally related. The convergent ideas contain the components; they need a blueprint. And the blueprint already has a foundation: existing IETF work on workload identity (SPIFFE/WIMSE) and execution evidence (Execution Context Tokens) provides the lower layers. What is missing is what goes on top.</p>
<h2 id="what-the-ecosystem-needs-four-pillars">What the Ecosystem Needs: Four Pillars</h2>
<p>Our analysis -- synthesizing the gaps, the ideas, and the existing proposals -- points to four missing pillars:</p>
<h3 id="pillar-1-dag-based-execution">Pillar 1: DAG-Based Execution</h3>
<p><strong>The gap it fills</strong>: Error Recovery and Rollback (Critical), Resource Management (Critical)</p>
<p>Every multi-agent workflow is a directed acyclic graph: tasks with dependencies, checkpoints, and decision points. But no draft in the corpus defines "agent task graph" as a first-class construct. Without it, there is no way to:</p>
<ul>
<li>Know which tasks depend on which</li>
<li>Place checkpoints for rollback</li>
<li>Calculate the blast radius of a failure</li>
<li>Schedule resources based on the graph structure</li>
</ul>
<p>The Execution Context Token (ECT) from <a href="https://datatracker.ietf.org/doc/draft-nennemann-wimse-ect/">draft-nennemann-wimse-ect</a> provides the evidence layer: each task produces a signed token linked to its predecessors via parent references, forming a verifiable DAG. What is missing is the orchestration semantics: when to checkpoint, how to roll back, how to contain cascading failures.</p>
<p>The data supports this: the limited work addressing error recovery (notably <a href="https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/">draft-yue-anima-agent-recovery-networks</a>) includes "Task-Oriented Multi-Agent Recovery Framework" and "State Consistency Management" -- DAG concepts by another name. The answer is the same structure: a DAG execution model.</p>
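<p>A minimal sketch of the ECT evidence layer's parent-linking, with a bare hash chain standing in for the signed JWTs the actual draft specifies (field names here are illustrative, not the draft's wire format):</p>

```python
import hashlib
import json


def make_token(task_id: str, parents: list[dict], result: str) -> dict:
    """Emit an ECT-like record linking a task to its predecessors."""
    body = {
        "task": task_id,
        "parents": [p["digest"] for p in parents],  # the DAG edges
        "result": result,
    }
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "digest": digest}


def verify_chain(tokens: list[dict]) -> bool:
    """Recompute every digest and check each parent reference resolves.

    Tokens are assumed to arrive in topological order.
    """
    seen = set()
    for t in tokens:
        body = {k: t[k] for k in ("task", "parents", "result")}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if recomputed != t["digest"]:
            return False  # token was tampered with after emission
        if any(p not in seen for p in t["parents"]):
            return False  # dangling parent reference
        seen.add(t["digest"])
    return True
```

<p>The orchestration semantics the text calls missing would live on top of this: deciding which nodes are checkpoints, and replaying the chain backwards to roll a workflow back to one.</p>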
<h3 id="pillar-2-human-in-the-loop-as-first-class">Pillar 2: Human-in-the-Loop as First Class</h3>
<p><strong>The gap it fills</strong>: Human Override and Intervention (High), Agent Explainability (Medium)</p>
<p>Only <strong>34 human-agent interaction drafts</strong> exist against <strong>155 A2A protocol</strong> drafts and <strong>114 autonomous operations</strong> drafts. Agents are being designed to talk to each other, not to humans. The CHEQ protocol (<a href="https://datatracker.ietf.org/doc/draft-rosenberg-aiproto-cheq/">draft-rosenberg-aiproto-cheq</a>) is a rare exception -- it defines human confirmation <em>before</em> agent execution. But nobody has standardized what happens <em>during</em> execution: how a human pauses a running workflow, constrains an agent's scope, takes over a task, or issues an emergency stop.</p>
<p>Human-in-the-loop must be a node type in the execution DAG, not an afterthought. The architecture needs:</p>
<ul>
<li><strong>Approval gates</strong>: DAG nodes that block until a human approves</li>
<li><strong>Override commands</strong>: Standardized signals to pause, constrain, stop, or take over</li>
<li><strong>Escalation paths</strong>: What happens when an override times out</li>
<li><strong>Explainability tokens</strong>: How an agent communicates its reasoning at a HITL point</li>
</ul>
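<p>As a sketch, a HITL-aware executor only needs the approval gate to be a node kind with defined timeout behavior. The names below are hypothetical, not taken from any draft:</p>

```python
from enum import Enum


class NodeKind(Enum):
    TASK = "task"
    APPROVAL_GATE = "approval_gate"  # blocks until a human decides


class Override(Enum):
    """Standardized human signals for a running workflow."""
    PAUSE = "pause"
    CONSTRAIN = "constrain"
    STOP = "stop"
    TAKE_OVER = "take_over"


def run_node(kind, ask_human, timeout_escalation="notify_operator"):
    """Execute one DAG node; ask_human returns True/False, or None on timeout."""
    if kind is NodeKind.APPROVAL_GATE:
        decision = ask_human()
        if decision is None:
            return timeout_escalation  # escalation path when the gate times out
        return "proceed" if decision else Override.STOP.value
    return "proceed"  # plain task nodes run unconditionally
```

<p>The point of standardizing this is the escalation path: today, what happens when an approval request goes unanswered is defined by each vendor, not by any protocol.</p>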
<p>The irony: every production deployment will require these primitives. The standards community is building autonomous capabilities while the deployment community is adding human oversight ad hoc.</p>
<h3 id="pillar-3-protocol-agnostic-interoperability">Pillar 3: Protocol-Agnostic Interoperability</h3>
<p><strong>The gap it fills</strong>: Cross-Protocol Translation (High, zero ideas), Agent Lifecycle Management (High)</p>
<p>The 155 A2A protocol drafts will never converge to a single winner. MCP, A2A Protocol, SLIM, and dozens of others will coexist, each with different strengths. The answer is not to pick one; it is to build a translation layer that lets agents using different protocols interoperate through gateways.</p>
<p>This gap has <strong>zero ideas</strong> in the current corpus -- the starkest absence across 434 drafts. No team is working on it. Yet it is perhaps the most important architectural piece: without protocol interoperability, the agent ecosystem fragments into vendor-locked silos.</p>
<p>The protocol binding layer would define:</p>
<ul>
<li>How agents advertise which ecosystem features they support</li>
<li>How gateways translate between protocols while preserving execution semantics (the DAG, the HITL points)</li>
<li>How agents version and retire gracefully without breaking dependents</li>
<li>The minimal semantic contract: intent, result, error -- expressible in any protocol</li>
</ul>
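<p>A toy gateway illustrates the pivot through that minimal contract. The "mcp-like" and "a2a-like" field names are invented for illustration -- they are not the real wire formats of either protocol:</p>

```python
def to_semantic(msg: dict, protocol: str) -> dict:
    """Normalize a protocol-specific message to {intent, result, error}."""
    if protocol == "mcp-like":
        return {"intent": msg.get("method"), "result": msg.get("result"),
                "error": msg.get("error")}
    if protocol == "a2a-like":
        return {"intent": msg.get("task"), "result": msg.get("outcome"),
                "error": msg.get("failure")}
    raise ValueError(f"no binding for {protocol}")


def translate(msg: dict, src: str, dst: str) -> dict:
    """Gateway translation: pivot through the semantic contract."""
    sem = to_semantic(msg, src)
    if dst == "mcp-like":
        return {"method": sem["intent"], "result": sem["result"],
                "error": sem["error"]}
    if dst == "a2a-like":
        return {"task": sem["intent"], "outcome": sem["result"],
                "failure": sem["error"]}
    raise ValueError(f"no binding for {dst}")
```

<p>The design choice is that gateways never translate pairwise: with N protocols, pivoting through one contract needs N bindings rather than N&sup2; translators.</p>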
<h3 id="pillar-4-assurance-profiles-dual-regime">Pillar 4: Assurance Profiles (Dual Regime)</h3>
<p><strong>The gap it fills</strong>: Behavior Verification (Critical), Cross-Domain Security (High), Dynamic Trust (High), Data Provenance (Medium)</p>
<p>The same agent ecosystem must work in two regimes:</p>
<p><strong>Relaxed</strong> (development, internal tools, low-risk): Best-effort, optional audit, minimal proof overhead. Think Kubernetes-deployed internal agents.</p>
<p><strong>Regulated</strong> (finance, healthcare, critical infrastructure): Cryptographic attestation per task, provenance chains, behavior verification against declared specifications, mandatory audit ledger. Think medical or financial agents.</p>
<p>The architecture achieves this with <em>assurance profiles</em> -- named configurations that dial up or down the proof requirements. The same DAG, same HITL points, same protocol bindings. Different levels of evidence:</p>
<table>
<thead>
<tr>
<th>Level</th>
<th>Evidence</th>
<th>Use Case</th>
</tr>
</thead>
<tbody>
<tr>
<td>L0</td>
<td>None (best-effort)</td>
<td>Development, testing</td>
</tr>
<tr>
<td>L1</td>
<td>Unsigned audit trail</td>
<td>Internal production</td>
</tr>
<tr>
<td>L2</td>
<td>Signed ECTs (JWT)</td>
<td>Cross-org, standard compliance</td>
</tr>
<tr>
<td>L3</td>
<td>Signed ECTs + external audit ledger</td>
<td>Regulated industries</td>
</tr>
</tbody>
</table>
<p>This dual-regime approach resolves the tension between "move fast" deployments and "prove everything" regulated environments. Ideas touching behavior verification and data provenance become implementable at higher assurance levels without imposing their cost on every deployment. Notably, the L2 and L3 profiles map directly to the conformity assessment requirements of the EU AI Act (Art. 43): high-risk AI systems must demonstrate compliance through either internal control (L2's signed ECTs) or third-party audit (L3's external audit ledger), making assurance profiles not just an engineering convenience but a regulatory implementation pathway.</p>
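<p>A sketch of the assurance dial: the same execution path, with the profile deciding which evidence gets attached. This is the table above as a small configuration; the placeholders stand in for real signing and ledger calls:</p>

```python
# Illustrative mapping of the L0-L3 table; names are not from any draft.
PROFILES = {
    "L0": {"audit": False, "signed_ect": False, "external_ledger": False},
    "L1": {"audit": True,  "signed_ect": False, "external_ledger": False},
    "L2": {"audit": True,  "signed_ect": True,  "external_ledger": False},
    "L3": {"audit": True,  "signed_ect": True,  "external_ledger": True},
}


def evidence_for(level: str, task_result: str) -> dict:
    """Attach only the evidence the selected assurance profile requires."""
    p = PROFILES[level]
    record = {"result": task_result}
    if p["audit"]:
        record["audit_entry"] = f"completed: {task_result}"
    if p["signed_ect"]:
        record["ect"] = "signed-jwt-placeholder"      # L2+: JWT-signed ECT
    if p["external_ledger"]:
        record["ledger_ref"] = "ledger-txn-placeholder"  # L3: external audit ledger
    return record
```

<p>The same workflow code calls <code>evidence_for</code> at every node; only the configured level changes between a dev deployment and a regulated one.</p>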
<h2 id="how-it-builds-on-what-exists">How It Builds on What Exists</h2>
<p>A critical point: this architecture does not compete with existing work. It layers on top of it. Our cross-reference analysis confirms the foundations are strong: <strong>TLS 1.3</strong> (RFC 8446, cited by 42 drafts), <strong>OAuth 2.0</strong> (RFC 6749, 36 drafts), <strong>HTTP Semantics</strong> (RFC 9110, 34 drafts), <strong>JWT</strong> (RFC 7519, 22 drafts), and <strong>COSE</strong> (RFC 9052, 20 drafts) form the bedrock.</p>
<p>But the bedrock is not uniform. Our RFC foundation analysis (Post 3) revealed that the Chinese and Western blocs build on <strong>fundamentally different technology stacks</strong>: YANG/NETCONF for network management on one side, COSE/CBOR/CoAP for IoT security on the other. The only shared foundation is OAuth 2.0. This means the architecture layer above must be genuinely protocol-agnostic -- it cannot assume either stack as the default. The four pillars are designed with this constraint: the DAG model, HITL primitives, and assurance profiles are expressed in terms of abstract semantics, not specific wire formats. The protocol binding layer (Pillar 3) exists precisely because the underlying plumbing diverges.</p>
<p>The architecture adds connective tissue above this layer, not below it:</p>
<table>
<thead>
<tr>
<th>Layer</th>
<th>Existing Work</th>
<th>What We Add</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Identity</strong></td>
<td>SPIFFE (workload identifier), WIMSE (security context propagation)</td>
<td>Nothing -- use existing identity</td>
</tr>
<tr>
<td><strong>Evidence</strong></td>
<td>ECT (execution context tokens, DAG linking)</td>
<td>Orchestration semantics, checkpoint/rollback, HITL nodes</td>
</tr>
<tr>
<td><strong>Auth</strong></td>
<td>OAuth 2.0, SCIM, DAAP, STAMP, Agentic JWT</td>
<td>Protocol binding so any auth approach works</td>
</tr>
<tr>
<td><strong>Communication</strong></td>
<td>MCP, A2A, SLIM, 155 other protocols</td>
<td>Translation layer and capability advertisement</td>
</tr>
<tr>
<td><strong>Safety</strong></td>
<td>DAAP (accountability), verifiable conversations, VERA (zero-trust)</td>
<td>Assurance profiles connecting these into deployable configurations</td>
</tr>
</tbody>
</table>
<p>The proposed five-draft ecosystem:</p>
<ol>
<li><strong>Agent Ecosystem Model (AEM)</strong> -- Architecture and terminology. The shared vocabulary so everyone speaks the same language.</li>
<li><strong>Agent Task DAG (ATD)</strong> -- Execution semantics, checkpoints, rollback. How the DAG works.</li>
<li><strong>Human-in-the-Loop (HITL) Primitives</strong> -- Approval gates, overrides, escalation. How humans participate.</li>
<li><strong>Agent Ecosystem Protocol Binding (AEPB)</strong> -- Protocol translation, capability discovery, lifecycle management. How interoperability works.</li>
<li><strong>Assurance Profiles (APAE)</strong> -- Behavior verification, dynamic trust, provenance. How you prove it all works.</li>
</ol>
<p>Each draft addresses specific gaps. Together, they provide the connective tissue the landscape lacks.</p>
<h2 id="traction-vs-aspiration">Traction vs. Aspiration</h2>
<p>A reality check: of the 434 drafts, <strong>52 (12%)</strong> have been adopted by IETF working groups. The rest are individual submissions -- proposals without institutional backing. The WG-adopted drafts score higher on average (<strong>3.61 vs. 3.23</strong>, 4-dimension composite), particularly on maturity (+1.28) and momentum (+0.98), but lower on novelty (-0.45). <em>(Note: scores are LLM-generated relative rankings from abstracts; see <a href="../methodology.md">Methodology</a>.)</em> The WGs that have adopted the most agent-relevant drafts are security-focused: <strong>lamps</strong> (6 drafts), <strong>lake</strong> (5), <strong>tls</strong> (3), <strong>emu</strong> (3). Agent-specific WGs like <code>aipref</code> have adopted only 2 drafts.</p>
<p>This reveals a structural insight: the IETF is not building agent standards from scratch. It is <strong>retrofitting security standards for agents</strong>. The agent architecture we propose above would need to work within this reality -- building on the security WGs' infrastructure rather than competing with it.</p>
<h2 id="predictions">Predictions</h2>
<p>Based on the data trajectories and current momentum:</p>
<p><strong>Within 6 months</strong>: The OAuth-for-agents fragmentation will partially resolve. Working groups will adopt 2-3 canonical approaches (likely DAAP/STAMP for accountability and one of the RAR extensions for basic auth). The other 10 proposals will fade or merge.</p>
<p><strong>Within 12 months</strong>: The DMSC side meeting's gateway work will produce a specification, likely gateway-centric with Agent Gateways as the primary interoperability mechanism. This is not the protocol-agnostic translation layer the ecosystem needs, but it will be the first concrete interop proposal.</p>
<p><strong>Within 5 months (August 2026)</strong>: The EU AI Act (Regulation 2024/1689), which entered into force on 1 August 2024, becomes fully applicable on 2 August 2026. Its requirements for high-risk AI systems -- including mandatory risk management (Art. 9), human oversight (Art. 14), record-keeping (Art. 12), and accuracy/robustness (Art. 15) -- will drive immediate demand for behavior verification, human override, and audit standards. Non-compliance carries penalties up to 35 million EUR or 7% of global annual turnover (Art. 99). This is not future regulatory pressure; it is current law with imminent enforcement. The safety deficit is simultaneously a technical gap and a compliance gap for any agent system deployed in the EU.</p>
<p><strong>The risk</strong>: If the architecture work does not happen in the next 12 months, the agent ecosystem will calcify around vendor-specific protocol stacks (OpenAI's, Google's, Anthropic's, Huawei's). Each will have its own auth, discovery, and communication layer. The interoperability window will close, and the IETF's work will be standards for islands rather than standards for the internet.</p>
<h3 id="the-ethics-of-standardizing-early">The Ethics of Standardizing Early</h3>
<p>There is a harder question underneath the technical one: should the IETF be standardizing agent capabilities at all before safety frameworks are mature? The 4:1 capability-to-safety ratio is not just a gap -- it is a policy choice being made by default. Every A2A protocol that ships without behavior verification baked in creates a deployed base that resists retrofitting. The standards community is building the defaults that will govern billions of agent interactions, and those defaults currently assume trust rather than requiring proof.</p>
<p>The structural dynamics make this worse. The authorship analysis from Post 2 showed that a small number of large organizations -- Huawei, China Mobile, Cisco -- drive a disproportionate share of submissions. Civil society organizations, academic safety researchers, and smaller companies are largely absent from the drafting process. Standards that define agent identity, discovery, and communication also define what can be monitored, audited, and controlled. An agent discovery protocol designed primarily for enterprise deployment efficiency may inadvertently create a surveillance-friendly architecture if privacy and human autonomy are not first-class design constraints. The EU AI Act mandates human oversight (Art. 14), but a mandate is only as good as the protocol that implements it.</p>
<p>The IETF has historically been good at building infrastructure that serves everyone -- the end-to-end principle, protocol layering, rough consensus. But "rough consensus" among the current participants may not represent the interests of those most affected by autonomous agent systems. The architecture proposed above includes human-in-the-loop as a pillar, not an option. That is the right instinct. The question is whether the community will treat it with the same urgency as the protocol work -- or whether, as the data currently suggests, it will remain an aspiration while the highways ship without traffic lights.</p>
<h3 id="two-equilibria">Two Equilibria</h3>
<p>By 2028, the landscape will have resolved into one of two stable states.</p>
<p>In the <strong>first equilibrium</strong>, it looks like today's microservices ecosystem: a chaotic but functional collection of protocols, libraries, and frameworks, held together by platform-specific integrations and de facto standards from the largest cloud providers. The IETF's work exists but is incomplete. The real interoperability happens at higher layers -- agent frameworks like LangChain, Semantic Kernel, or their successors. Safety is bolted on after deployment.</p>
<p>In the <strong>second equilibrium</strong>, it looks more like the web: a layered architecture where identity (like TLS), communication (like HTTP), and semantics (like HTML) are cleanly separated, with standardized interfaces between them. Agents identify via WIMSE, execute via ECT-based DAGs, communicate via protocol-agnostic bindings, and operate under assurance profiles that scale from development to regulated production. Safety is built in, not bolted on.</p>
<p>The ~4:1 aggregate capability-to-safety ratio (varying from 1.5:1 to 21:1 month-to-month) is the leading indicator. If it narrows -- if safety and oversight work accelerates to match capability work -- the second equilibrium becomes achievable. If it stays at ~4:1 or widens, the first equilibrium is where we land, and safety becomes remediation rather than prevention.</p>
<h2 id="what-builders-should-do-today">What Builders Should Do Today</h2>
<p>If you are building agent systems and cannot wait for standards to mature:</p>
<p><strong>1. Watch these drafts</strong>: ECT (execution evidence), DAAP (accountability), CHEQ (human confirmation), ADL (agent description), ANS (agent discovery). These have the highest combination of quality, novelty, and adoption potential.</p>
<p><strong>2. Design for the DAG</strong>: Structure your multi-agent workflows as directed acyclic graphs with explicit dependencies and checkpoints. Even without a standard, the pattern will be compatible with whatever emerges.</p>
<p><strong>3. Build HITL from the start</strong>: Every production agent deployment needs human override capability. Do not add it later. Design approval gates, emergency stops, and escalation paths into your architecture now.</p>
<p><strong>4. Implement assurance as a dial</strong>: Make your proof/audit level configurable. Start at L0 for development, L1 for production, and be ready to turn up to L2/L3 when regulation arrives.</p>
<p><strong>5. Avoid protocol lock-in</strong>: If you build on MCP today, architect for the possibility of supporting A2A or SLIM tomorrow. The protocol war is not over, and the winner may be "all of them via translation."</p>
<h2 id="the-thesis">The Thesis</h2>
<p>Across six posts, we have built to one argument:</p>
<p><strong>The IETF's AI agent standardization effort is the largest, fastest-growing, and most consequential standards race in a decade. But it is building the highways before the traffic lights.</strong> The data shows explosive growth (from 0.5% to 9.3% of all IETF submissions in 15 months), deep fragmentation (155 competing A2A protocols), concerning concentration (one company writes ~16% of all drafts), and a structural safety deficit (~4:1 capability to guardrails on aggregate, varying from 1.5:1 to 21:1 by month). What is missing is not more protocols -- it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles that work from development to regulated production.</p>
<p>The convergent ideas -- and the broader set of 130 cross-org overlaps (36% of unique idea clusters) -- contain the components for this architecture. The question is whether the community can assemble them before the protocols ship without it. The convergence data suggests it is possible: <strong>180 ideas already cross the Chinese-Western divide</strong>, mediated largely by European telecoms (Deutsche Telekom, Telefonica, Orange) that operate in both markets and appear on both sides of nearly every major cross-cultural convergent idea. The bridge-builders exist. They need an architecture to bridge to.</p>
<p>The IETF has built the internet's infrastructure before. DNS, HTTP, TLS -- each emerged from periods of competing proposals, fragmentation, and coordinated resolution. The AI agent standards race is following the same pattern, on a compressed timeline, with higher stakes.</p>
<p>The traffic lights need to catch up to the highways. The data says they can -- if someone draws the big picture.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>Four missing pillars</strong>: DAG-based execution, human-in-the-loop primitives, protocol-agnostic interoperability, and assurance profiles for dual-regime deployment</li>
<li><strong>The architecture builds on existing work</strong>: SPIFFE for identity, WIMSE for security context, ECT for execution evidence -- the foundation exists</li>
<li><strong>Five proposed drafts</strong> (AEM, ATD, HITL, AEPB, APAE) would fill the 11 gaps by providing connective tissue between existing protocol proposals</li>
<li><strong>The interoperability window is closing</strong>: vendor-specific agent stacks are forming; the next 12 months are critical for open standards</li>
<li><strong>For builders today</strong>: design for DAGs, build HITL from the start, make assurance configurable, avoid protocol lock-in</li>
</ul>
<p><em>Next in this series: <a href="07-how-we-built-this.html">How We Built This</a> -- the methodology behind analyzing 434 IETF drafts with Claude, Ollama, and Python.</em></p>
<hr />
<p><em>Synthesis based on the full IETF Draft Analyzer dataset: 434 drafts, 557 authors, 130 cross-org convergent ideas (via SequenceMatcher fuzzy matching at 0.75 threshold), 11 gaps, 18 team blocs. Data current as of March 2026.</em></p>
<div class="post-nav"><a href="/blog/posts/05-1262-ideas.html">&larr; Where Drafts Converge</a><a href="/blog/posts/07-how-we-built-this.html">How We Built This &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -0,0 +1,345 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>How We Built This: Analyzing 434 IETF Drafts with Claude and Ollama — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<strong>This</strong>
<a href="/blog/posts/08-agents-building-the-analysis.html">Analysis</a></nav>
<h1 id="how-we-built-this-analyzing-434-ietf-drafts-with-claude-and-ollama">How We Built This: Analyzing 434 IETF Drafts with Claude and Ollama</h1>
<p><em>The engineering behind the analysis -- a Python CLI, two LLMs, one SQLite database, and ~$9.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>Every claim in this series -- the ~4:1 safety ratio (varying from 1.5:1 to 21:1 month-to-month), the 14 competing OAuth proposals, the 18 team blocs, the 11 gaps, the 180 ideas crossing the Chinese-Western divide -- comes from an automated analysis pipeline we built in Python. This post describes how it works, what it costs, what it found that surprised us, and what we learned about LLM-powered document analysis at scale.</p>
<p>The tool is open source. If you want to run it on a different corner of the IETF -- or adapt it for another standards body -- everything you need is in the repository.</p>
<h2 id="the-pipeline">The Pipeline</h2>
<p>The analysis runs in six core stages. Each builds on the previous, and every stage caches its work so re-runs are fast and cheap.</p>
<pre><code>fetch --&gt; analyze --&gt; embed --&gt; ideas --&gt; gaps --&gt; report
| | | | | |
v v v v v v
Datatracker Claude Ollama Claude Claude Markdown
API Sonnet nomic-embed Haiku Sonnet + rich
</code></pre>
<p>Three additional analysis passes run on top of the core pipeline:</p>
<pre><code>refs --&gt; trends --&gt; idea-overlap --&gt; status
| | | |
v v v v
Regex SQL query SequenceMatcher Naming convention
(local) (local) (local) (local)
</code></pre>
<p>These secondary passes cost nothing -- they operate entirely on data already in the database.</p>
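<p>The idea-overlap pass, for example, reduces to stdlib fuzzy matching. A sketch of the core comparison at the 0.75 threshold (the production pass may normalize and deduplicate differently):</p>

```python
from difflib import SequenceMatcher


def overlapping_ideas(ideas_a, ideas_b, threshold=0.75):
    """Pair idea titles from two orgs whose similarity clears the threshold."""
    pairs = []
    for a in ideas_a:
        for b in ideas_b:
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= threshold:
                pairs.append((a, b, round(ratio, 2)))
    return pairs
```

<p>This is O(n&times;m) string comparison -- cheap enough at a few hundred ideas per organization, and entirely local.</p>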
<h3 id="stage-1-fetch">Stage 1: Fetch</h3>
<p>The Datatracker API (<code>https://datatracker.ietf.org/api/v1/doc/document/</code>) provides structured metadata for every Internet-Draft: name, title, abstract, authors, revision, submission date, working group, and current status. Full text is available at <code>https://www.ietf.org/archive/id/{name}-{rev}.txt</code>.</p>
<p>We search for drafts matching 12 keywords: <code>agent</code>, <code>ai-agent</code>, <code>llm</code>, <code>autonomous</code>, <code>machine-learning</code>, <code>artificial-intelligence</code>, <code>mcp</code>, <code>agentic</code>, <code>inference</code>, <code>generative</code>, <code>intelligent</code>, <code>aipref</code>. Both <code>name__contains</code> and <code>abstract__contains</code> filters are used to cast a wide net. We started with 6 keywords and 260 drafts; adding 6 more captured 101 new drafts in categories we were missing -- MCP-related work, generative AI infrastructure, intelligent networking, and the nascent <code>aipref</code> working group.</p>
<p><strong>Gotchas learned the hard way</strong>: The Datatracker API uses <code>type__slug=draft</code> (not <code>type=draft</code>) to filter to drafts. Pagination requires tracking <code>meta.next</code> through the response chain. Affiliation data comes from the <code>documentauthor</code> record, not the <code>person</code> record. We add a 0.5-second polite delay between requests.</p>
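<p>Sketched in miniature, the pagination loop looks like this. The <code>fetch_drafts</code> helper and its signature are illustrative, not the actual CLI internals; the page-fetching function is injected so the loop can be exercised without network access:</p>

```python
import time
from typing import Callable, Iterator

API = "https://datatracker.ietf.org/api/v1/doc/document/"

def fetch_drafts(keyword: str, get_json: Callable[[str, dict], dict],
                 delay: float = 0.0) -> Iterator[dict]:
    """Follow Datatracker pagination via meta.next until exhausted."""
    # Note: type__slug=draft, not type=draft -- the slug filter is what works.
    params = {"type__slug": "draft", "abstract__contains": keyword,
              "format": "json", "limit": 100}
    url, first = API, True
    while url:
        page = get_json(url, params if first else {})
        first = False
        yield from page.get("objects", [])
        nxt = page.get("meta", {}).get("next")  # relative path or None
        url = f"https://datatracker.ietf.org{nxt}" if nxt else None
        if url and delay:
            time.sleep(delay)  # polite delay between requests
```

In the real pipeline, <code>get_json</code> wraps an httpx client; in tests it can be a stub.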
<p>The result: <strong>434 drafts</strong> fetched, with full metadata and text stored in SQLite.</p>
<h3 id="stage-2-analyze">Stage 2: Analyze</h3>
<p>Each draft is sent to Claude Sonnet with a compact structured prompt that includes the draft name, title, date, page count, and abstract. The prompt asks for:</p>
<ul>
<li><strong>Category classification</strong> (one or more of 11 categories: A2A protocols, agent identity/auth, autonomous netops, data formats/interop, agent discovery/reg, human-agent interaction, AI safety/alignment, ML traffic management, policy/governance, model serving/inference, other)</li>
<li><strong>Quality rating</strong> on five dimensions (novelty, maturity, overlap, momentum, relevance), each scored 1-5</li>
<li><strong>Brief summary</strong> of what the draft does and why it matters</li>
</ul>
<p>The key optimization: <strong>caching</strong>. Every Claude API call is stored in an <code>llm_cache</code> table keyed by the SHA-256 hash of the full prompt. If the same draft is analyzed twice, the second call is free and instant. This makes the pipeline idempotent -- you can re-run any stage without wasting money.</p>
<p>We initially sent full draft text to Claude, but switched to abstract-only analysis after testing showed that abstracts produce equivalent ratings at roughly 10x lower token cost. Full text is still used for idea extraction (Stage 4), where granular detail matters.</p>
<p><strong>Cost</strong>: About $3.16 for the initial 260 drafts on Claude Sonnet (376K input tokens, 200K output tokens). With the <code>--cheap</code> flag, analysis uses Claude Haiku instead, cutting costs roughly 10x.</p>
<h3 id="stage-3-embed">Stage 3: Embed</h3>
<p>For similarity analysis, we generate vector embeddings using Ollama running locally with the <code>nomic-embed-text</code> model. Each draft's abstract is embedded into a 768-dimensional vector, stored as raw bytes in the database.</p>
<p><strong>Why not Claude for embeddings?</strong> Cost and speed. Ollama runs locally, is free, and processes all 434 drafts in under a minute. The embeddings are used for approximate similarity (cosine distance), overlap detection, and t-SNE visualization -- tasks where a small local model is perfectly adequate.</p>
<p>The embeddings enable:</p>
<ul>
<li><strong>Overlap clusters</strong>: Draft pairs with &gt;0.85 cosine similarity grouped together</li>
<li><strong>Near-duplicate detection</strong>: 25+ pairs with &gt;0.98 similarity flagged as potential duplicates</li>
<li><strong>Interactive t-SNE landscape</strong>: 2D visualization of the entire draft space, color-coded by category</li>
</ul>
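<p>The core similarity computation is plain cosine similarity over pairs of vectors. A stdlib-only sketch with toy 2-dimensional vectors (the real pipeline uses numpy over 768-dimensional nomic-embed-text vectors; names here are illustrative):</p>

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def overlap_pairs(vecs: dict[str, list[float]], threshold: float = 0.85):
    """Yield draft pairs whose embedding similarity exceeds the threshold."""
    names = sorted(vecs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine(vecs[a], vecs[b])
            if sim >= threshold:
                yield a, b, sim
```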
<h3 id="stage-4-ideas">Stage 4: Ideas</h3>
<p>The most expensive stage. Each draft's full text is analyzed by Claude to extract discrete technical ideas -- mechanisms, architectures, protocols, patterns, extensions, and requirements.</p>
<p><strong>Batch optimization</strong>: Rather than calling Claude once per draft, we batch 5 drafts per API call using Claude Haiku (<code>--cheap --batch 5</code>). This cuts the number of API calls by 5x and uses the cheaper model. The batch prompt includes all 5 drafts' texts and asks for ideas from each, reducing per-idea cost to fractions of a cent.</p>
<p><strong>Result</strong>: The current database contains <strong>419 ideas</strong> across 377 drafts. An earlier pipeline run produced roughly 1,780 components from 361 drafts (averaging ~5 per draft); the difference reflects changes in extraction parameters, batching strategy, and deduplication -- a known limitation of LLM-based extraction.</p>
<p>What is consistent across both runs: the vast majority of extracted ideas appear in exactly one draft, and most are draft-specific component descriptions rather than standalone innovations. The real signal comes from the cross-org overlap analysis (the idea-overlap feature), which uses SequenceMatcher fuzzy matching (0.75 threshold) to identify <strong>130 cross-org convergent ideas</strong> -- clusters where 2+ organizations work on recognizably similar problems. An earlier run with ~1,780 ideas yielded 628 such clusters; the convergence rate of ~36% is consistent across both.</p>
<h3 id="stage-5-gaps">Stage 5: Gaps</h3>
<p>The gap analysis is a synthesis step. We send Claude Sonnet the full landscape context -- category distributions, idea taxonomy, safety ratio, overlap patterns -- and ask it to identify areas where standardization work is missing or inadequate.</p>
<p>This is the one stage where the LLM is doing genuine reasoning, not just extraction. The prompt provides the data; Claude identifies the structural gaps. We validate its findings against the raw data (e.g., confirming that only 6 ideas address error recovery, or that cross-protocol translation has zero ideas).</p>
<p><strong>Result</strong>: <strong>11 gaps</strong> identified (2 critical, 5 high, 4 medium), each cross-referenced with related drafts and ideas.</p>
<h3 id="stage-6-report">Stage 6: Report</h3>
<p>Reports are generated in Markdown with embedded data tables. Fifteen report types are available, including overview, landscape, digest, timeline, overlap-matrix, overlap-clusters, authors, ideas, gaps, refs, trends, idea-overlap, and status. The <code>rich</code> library provides formatted terminal output for CLI commands.</p>
<h2 id="the-database">The Database</h2>
<p>The SQLite database is the real product. At <strong>28 MB</strong>, it contains everything needed to reproduce any finding in this series.</p>
<table>
<thead>
<tr>
<th>Table</th>
<th style="text-align: right;">Rows</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>drafts</td>
<td style="text-align: right;">434</td>
<td>Full metadata + text for every draft</td>
</tr>
<tr>
<td>ratings</td>
<td style="text-align: right;">434</td>
<td>5-dimension quality scores + summaries</td>
</tr>
<tr>
<td>embeddings</td>
<td style="text-align: right;">434</td>
<td>768-dim vectors as binary blobs</td>
</tr>
<tr>
<td>ideas</td>
<td style="text-align: right;">419</td>
<td>Extracted technical components with types</td>
</tr>
<tr>
<td>authors</td>
<td style="text-align: right;">557</td>
<td>Person records from Datatracker</td>
</tr>
<tr>
<td>draft_authors</td>
<td style="text-align: right;">1,057</td>
<td>Author-to-draft linkage with affiliation</td>
</tr>
<tr>
<td>draft_refs</td>
<td style="text-align: right;">4,231</td>
<td>RFC/draft/BCP cross-references</td>
</tr>
<tr>
<td>gaps</td>
<td style="text-align: right;">11</td>
<td>Identified standardization gaps</td>
</tr>
<tr>
<td>llm_cache</td>
<td style="text-align: right;">1,397</td>
<td>Cached Claude API responses</td>
</tr>
</tbody>
</table>
<p>FTS5 full-text search is enabled on drafts, supporting queries like <code>ietf search "agent authentication"</code> that return ranked results in milliseconds. Indexes on <code>draft_refs(ref_type, ref_id)</code> and <code>ideas(draft_name)</code> keep query performance fast even for cross-table joins.</p>
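<p>A self-contained sketch of the FTS5 pattern with a toy schema (the real table indexes full draft metadata and text; column choices here are illustrative):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table: every column is full-text indexed.
conn.execute("CREATE VIRTUAL TABLE drafts_fts USING fts5(name, title, abstract)")
conn.executemany(
    "INSERT INTO drafts_fts VALUES (?, ?, ?)",
    [("draft-a", "Agent Authentication", "OAuth-based agent authentication."),
     ("draft-b", "Routing Update", "BGP route aggregation changes.")])
# Phrase query, ranked by bm25 (lower score = more relevant in SQLite FTS5).
rows = conn.execute(
    "SELECT name FROM drafts_fts WHERE drafts_fts MATCH ?"
    " ORDER BY bm25(drafts_fts)",
    ('"agent authentication"',)).fetchall()
```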
<p>The database design follows a principle: <strong>store raw data, compute derived data</strong>. The drafts table stores full text; the ratings, ideas, and refs tables store analysis results. Any analysis can be re-run without re-fetching from the Datatracker API.</p>
<h2 id="the-author-network">The Author Network</h2>
<p>The author analysis deserves special mention because it revealed the team bloc pattern -- one of the most important findings in the series.</p>
<p>The IETF Datatracker provides author information via two API endpoints:</p>
<ul>
<li><code>/api/v1/doc/documentauthor/?document__name=X</code> -- returns author links per draft</li>
<li><code>/api/v1/person/person/{id}/</code> -- returns person details (name, affiliation)</li>
</ul>
<p>We fetch all authors for all drafts, build a co-authorship graph, and detect team blocs: groups where every pair of members shares at least 70% of their drafts. This threshold was chosen empirically -- lower thresholds produce too many loose groups; higher thresholds miss real teams.</p>
<p>The detection algorithm:</p>
<ol>
<li>For each pair of authors, calculate pairwise overlap = |shared drafts| / min(|A's drafts|, |B's drafts|)</li>
<li>Build a graph where edges represent pairs with &gt;= 70% overlap and &gt;= 2 shared drafts</li>
<li>Find connected components in this graph</li>
<li>Each component is a team bloc</li>
</ol>
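<p>The four steps above fit in a short function. This sketch uses toy data and illustrative names; authors with no qualifying edges are skipped rather than reported as single-member blocs:</p>

```python
from itertools import combinations

def team_blocs(drafts_by_author: dict[str, set[str]],
               min_overlap: float = 0.70,
               min_shared: int = 2) -> list[set[str]]:
    """Connected components over pairs with >=70% overlap, >=2 shared drafts."""
    adj: dict[str, set[str]] = {a: set() for a in drafts_by_author}
    for a, b in combinations(drafts_by_author, 2):
        shared = drafts_by_author[a] & drafts_by_author[b]
        smaller = min(len(drafts_by_author[a]), len(drafts_by_author[b]))
        if len(shared) >= min_shared and len(shared) / smaller >= min_overlap:
            adj[a].add(b)
            adj[b].add(a)
    blocs, seen = [], set()
    for start in adj:
        if start in seen or not adj[start]:
            continue
        stack, comp = [start], set()  # depth-first walk of one component
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        blocs.append(comp)
    return blocs
```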
<p><strong>Organization normalization</strong> turned out to be essential. "Huawei Technologies", "Huawei Technologies Co., Ltd.", and "Huawei Canada" all need to resolve to "Huawei". We maintain a hand-curated alias table of 40+ mappings plus automatic suffix stripping for common patterns (", Inc.", " LLC", " AB", etc.). Without this, cross-org analysis would fragment the same company into multiple entities.</p>
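<p>A minimal sketch of the normalization step. The alias entries and suffix list here are an illustrative subset, not the real 40+ mapping table:</p>

```python
import re

# Hand-curated aliases (illustrative subset of the real table).
ALIASES = {
    "huawei technologies": "Huawei",
    "huawei canada": "Huawei",
    "deutsche telekom ag": "Deutsche Telekom",
}

# Strip one or more trailing corporate suffixes (", Inc.", " LLC", " AB", ...).
SUFFIXES = re.compile(r"(,?\s+(inc\.?|llc|ltd\.?|ab|gmbh|co\.?))+\s*$",
                      re.IGNORECASE)

def normalize_org(raw: str) -> str:
    name = SUFFIXES.sub("", raw.strip())
    return ALIASES.get(name.lower(), name)
```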
<p><strong>Result</strong>: <strong>18 team blocs</strong> detected among 557 authors. The largest: a 13-person Huawei team with 22 shared drafts and 94% average cohesion.</p>
<h2 id="the-new-features">The New Features</h2>
<p>Four features were added during the analysis session, each unlocking a deeper analytical layer. All four run locally with zero API cost.</p>
<h3 id="rfc-cross-references-ietf-refs">RFC Cross-References (<code>ietf refs</code>)</h3>
<p><strong>What it does</strong>: Parses the stored text of each draft for RFC references using regex (<code>RFC\s*\d{4,}</code>, <code>\[RFC\d+\]</code>, <code>BCP\s*\d+</code>, <code>draft-[\w-]+</code>). Stores results in a <code>draft_refs</code> table for querying.</p>
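<p>The patterns above translate directly into a small extractor (the function name is illustrative):</p>

```python
import re

REF_PATTERNS = {
    "rfc": re.compile(r"(?:\[RFC(\d+)\]|RFC\s*(\d{4,}))"),
    "bcp": re.compile(r"BCP\s*(\d+)"),
    "draft": re.compile(r"\bdraft-[\w-]+"),
}

def extract_refs(text: str) -> dict[str, set[str]]:
    """Collect unique RFC, BCP, and draft references from one draft's text."""
    refs = {"rfc": set(), "bcp": set(), "draft": set()}
    for m in REF_PATTERNS["rfc"].finditer(text):
        refs["rfc"].add("RFC" + (m.group(1) or m.group(2)))
    for m in REF_PATTERNS["bcp"].finditer(text):
        refs["bcp"].add("BCP" + m.group(1))
    refs["draft"].update(REF_PATTERNS["draft"].findall(text))
    return refs
```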
<p><strong>What it found</strong>: <strong>4,231 cross-references</strong> (2,443 RFC, 698 draft, 1,090 BCP) across 360 drafts with text. The most-referenced standards reveal what the agent ecosystem builds on:</p>
<table>
<thead>
<tr>
<th>RFC</th>
<th style="text-align: right;">References</th>
<th>What It Is</th>
</tr>
</thead>
<tbody>
<tr>
<td>RFC 2119</td>
<td style="text-align: right;">285</td>
<td>MUST/SHALL/MAY conventions</td>
</tr>
<tr>
<td>RFC 8174</td>
<td style="text-align: right;">237</td>
<td>Key words update</td>
</tr>
<tr>
<td>RFC 8446</td>
<td style="text-align: right;">42</td>
<td>TLS 1.3</td>
</tr>
<tr>
<td>RFC 6749</td>
<td style="text-align: right;">36</td>
<td>OAuth 2.0</td>
</tr>
<tr>
<td>RFC 9110</td>
<td style="text-align: right;">34</td>
<td>HTTP Semantics</td>
</tr>
<tr>
<td>RFC 8259</td>
<td style="text-align: right;">26</td>
<td>JSON</td>
</tr>
<tr>
<td>RFC 5280</td>
<td style="text-align: right;">22</td>
<td>X.509 Certificates</td>
</tr>
<tr>
<td>RFC 7519</td>
<td style="text-align: right;">22</td>
<td>JWT</td>
</tr>
<tr>
<td>RFC 9052</td>
<td style="text-align: right;">20</td>
<td>COSE</td>
</tr>
</tbody>
</table>
<p><strong>The insight</strong>: Strip away RFC 2119/8174 (boilerplate conventions that every IETF draft references) and the picture is clear: the agent ecosystem is built on <strong>OAuth + TLS + HTTP + JWT</strong>. It is a security and identity infrastructure, not a networking infrastructure. The IETF's agent standards are being constructed on the same foundation as the web itself. This reframes the entire landscape: agent standards are not something new. They are the next layer on top of the web's existing security architecture.</p>
<h3 id="category-trends-ietf-trends">Category Trends (<code>ietf trends</code>)</h3>
<p><strong>What it does</strong>: Monthly breakdown of new drafts per category with growth rates, comparing recent periods to earlier ones.</p>
<p><strong>What it found</strong>: The growth curve is a step function. Monthly submissions went from 2 (Jun 2025) to 67 (Oct 2025) to 86 (Feb 2026). A2A protocols are still accelerating (26 in Oct/Nov 2025, 36 in Feb 2026). Safety/alignment is growing but slower (5 in Oct 2025, 12 in Feb 2026). The aggregate ~4:1 ratio (which varies from 1.5:1 to 21:1 month-to-month) is narrowing, but not fast enough.</p>
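<p>The monthly breakdown is a single SQL aggregation. A toy-data sketch (illustrative schema; the real <code>drafts</code> table has many more columns):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drafts (name TEXT, category TEXT, submitted TEXT)")
conn.executemany("INSERT INTO drafts VALUES (?, ?, ?)", [
    ("draft-a", "a2a", "2025-10-03"),
    ("draft-b", "a2a", "2025-10-21"),
    ("draft-c", "safety", "2026-02-11"),
])
# Monthly submissions per category, oldest month first.
rows = conn.execute(
    "SELECT strftime('%Y-%m', submitted) AS month, category, COUNT(*)"
    " FROM drafts GROUP BY month, category ORDER BY month").fetchall()
```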
<h3 id="cross-org-idea-overlap-ietf-idea-overlap">Cross-Org Idea Overlap (<code>ietf idea-overlap</code>)</h3>
<p><strong>What it does</strong>: Groups similar ideas using <code>SequenceMatcher</code> (threshold 0.75), then checks which ideas span drafts from multiple organizations. This separates genuine cross-org consensus from intra-team duplication.</p>
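<p>A sketch of the grouping step: greedy single-pass clustering, comparing each title against the first member of every existing group. The real implementation may differ in details; names here are illustrative:</p>

```python
from difflib import SequenceMatcher

def group_ideas(titles: list[str], threshold: float = 0.75) -> list[list[str]]:
    """Attach each title to the first group whose representative is
    at least `threshold` similar; otherwise start a new group."""
    groups: list[list[str]] = []
    for title in titles:
        for group in groups:
            ratio = SequenceMatcher(None, title.lower(),
                                    group[0].lower()).ratio()
            if ratio >= threshold:
                group.append(title)
                break
        else:
            groups.append([title])
    return groups
```

Cross-org convergence then means: a group whose member ideas come from drafts by two or more distinct (normalized) organizations.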
<p><strong>What it found</strong>: By exact title, the vast majority of unique ideas appear in only a single draft. But fuzzy matching reveals <strong>130 cross-org convergent ideas</strong> (36% of unique clusters) where 2+ organizations work on recognizably similar problems. The top convergence signal -- "A2A Communication Paradigm" -- spans <strong>8 organizations from 5 countries</strong>. The deeper finding: <strong>180 ideas cross the Chinese-Western organizational divide</strong>. European telecoms (Deutsche Telekom, Telefonica, Orange) act as bridges between Chinese institutions and Western companies. US Big Tech (Google, Apple, Amazon) is almost entirely absent from cross-divide collaboration.</p>
<h3 id="wg-adoption-status-ietf-status">WG Adoption Status (<code>ietf status</code>)</h3>
<p><strong>What it does</strong>: Determines which drafts have been formally adopted by IETF Working Groups based on the <code>draft-ietf-{wg}-*</code> naming convention. Compares scores, categories, and gap coverage between WG-adopted and individual drafts.</p>
<p><strong>What it found</strong>: <strong>52 of 434 drafts (12%)</strong> are WG-adopted. The remaining 88% are individual submissions -- ideas seeking institutional backing. WG-adopted drafts score slightly higher on average (<strong>3.61 vs 3.23</strong>), validating our rating methodology.</p>
<p>The most revealing finding: <strong>a majority of WG-adopted drafts are in security Working Groups</strong> (lamps, lake, tls, emu, ace). The agent-focused <code>aipref</code> WG has only 2 adopted drafts. The IETF is not building agent standards in agent-focused groups -- it is retrofitting its existing security infrastructure for agent use cases. The standards that will actually govern AI agents on the internet are being written by the same people who write TLS and OAuth, not by new agent-specific working groups.</p>
<h2 id="what-we-learned">What We Learned</h2>
<h3 id="llms-are-good-at-structured-extraction">LLMs are good at structured extraction</h3>
<p>Claude's strength in this pipeline is turning unstructured technical documents into structured data: categories, ratings, ideas, gaps. The extraction quality is high -- we spot-checked 50 drafts and found categorization and idea extraction accurate in ~90% of cases. The errors tend to be over-categorization (assigning too many categories) rather than miscategorization.</p>
<h3 id="llms-need-validation-for-synthesis">LLMs need validation for synthesis</h3>
<p>The gap analysis (Stage 5) required the most human oversight. Claude correctly identified the gaps, but the severity rankings and the "zero ideas" claims needed manual verification against the raw data. LLMs can synthesize, but the synthesis should be treated as a hypothesis, not a conclusion.</p>
<h3 id="caching-changes-the-economics">Caching changes the economics</h3>
<p>The <code>llm_cache</code> table transforms the cost model. The first run costs ~$3. Every subsequent run -- adding new drafts, re-running with different prompts, regenerating reports -- costs only for new work. Over the project's life, we estimate caching saved $30+ in redundant API calls. The cache key is a SHA-256 hash of the full prompt, so identical prompts always map to the same entry and accidental collisions are practically impossible.</p>
<h3 id="hybrid-models-work">Hybrid models work</h3>
<p>Using Claude Sonnet for reasoning-heavy tasks (analysis, gap synthesis) and Claude Haiku for extraction-heavy tasks (idea extraction, batch processing) cut costs by 5-10x without meaningful quality loss. Using Ollama for embeddings made similarity analysis free and fast. The principle: match the model's capability to the task's difficulty.</p>
<h3 id="the-free-analyses-are-the-most-revealing">The free analyses are the most revealing</h3>
<p>The four features that cost zero API dollars -- regex-based RFC parsing, SQL-based trend analysis, SequenceMatcher-based idea dedup, and naming-convention-based WG detection -- produced some of the most narratively important findings in the entire series. The OAuth-stack-as-foundation insight from RFC cross-references. The 180 cross-divide ideas. The 12% WG adoption rate. The security-WG-not-agent-WG finding. None of these required an LLM. They required a well-structured database and the right questions.</p>
<h3 id="the-database-is-the-product">The database is the product</h3>
<p>The most valuable output is not any single report -- it is the SQLite database. With all drafts analyzed, ideas extracted, authors mapped, refs parsed, and embeddings stored, the database supports ad-hoc queries that no pre-built report can anticipate. The blog series was written primarily by querying the database, not by re-running the pipeline.</p>
<h2 id="cost-summary">Cost Summary</h2>
<table>
<thead>
<tr>
<th>Stage</th>
<th>Model</th>
<th style="text-align: right;">Drafts</th>
<th style="text-align: right;">Cost</th>
</tr>
</thead>
<tbody>
<tr>
<td>Analyze</td>
<td>Claude Sonnet</td>
<td style="text-align: right;">260</td>
<td style="text-align: right;">~$2.50</td>
</tr>
<tr>
<td>Analyze</td>
<td>Claude Sonnet</td>
<td style="text-align: right;">101</td>
<td style="text-align: right;">~$5.50</td>
</tr>
<tr>
<td>Ideas</td>
<td>Claude Haiku (batch 5)</td>
<td style="text-align: right;">434</td>
<td style="text-align: right;">~$0.80</td>
</tr>
<tr>
<td>Gaps</td>
<td>Claude Sonnet</td>
<td style="text-align: right;">1 call</td>
<td style="text-align: right;">~$0.20</td>
</tr>
<tr>
<td>Embed</td>
<td>Ollama (local)</td>
<td style="text-align: right;">434</td>
<td style="text-align: right;">$0.00</td>
</tr>
<tr>
<td>Refs</td>
<td>Regex (local)</td>
<td style="text-align: right;">434</td>
<td style="text-align: right;">$0.00</td>
</tr>
<tr>
<td>Trends</td>
<td>SQL (local)</td>
<td style="text-align: right;">434</td>
<td style="text-align: right;">$0.00</td>
</tr>
<tr>
<td>Idea-overlap</td>
<td>SequenceMatcher (local)</td>
<td style="text-align: right;">419 ideas</td>
<td style="text-align: right;">$0.00</td>
</tr>
<tr>
<td>WG Status</td>
<td>Naming convention</td>
<td style="text-align: right;">434</td>
<td style="text-align: right;">$0.00</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td></td>
<td style="text-align: right;"></td>
<td style="text-align: right;"><strong>~$9</strong></td>
</tr>
</tbody>
</table>
<p>For context: analyzing 434 IETF drafts -- fetching full text, rating quality on 5 dimensions, extracting 419 technical ideas, detecting 11 gaps, mapping 557 authors, parsing 4,231 cross-references, and identifying 18 team blocs -- cost less than two large coffees.</p>
<h2 id="the-tech-stack">The Tech Stack</h2>
<ul>
<li><strong>Python 3.11+</strong> with <strong>Click</strong> for the CLI</li>
<li><strong>SQLite</strong> with <strong>FTS5</strong> for full-text search</li>
<li><strong>httpx</strong> for HTTP requests (Datatracker API)</li>
<li><strong>anthropic</strong> SDK for Claude API</li>
<li><strong>ollama</strong> for local embeddings</li>
<li><strong>rich</strong> for terminal formatting</li>
<li><strong>numpy</strong> for cosine similarity and matrix operations</li>
</ul>
<p>43 CLI commands, 13+ interactive visualizations (HTML/PNG), 15 report types. Total codebase: approximately 6,100 lines of Python across 12 modules.</p>
<hr />
<h2 id="limitations">Limitations</h2>
<p><strong>A note on IETF IPR policy</strong>: Internet-Drafts may be subject to intellectual property rights (IPR) claims. Under BCP 79 (RFC 8179), IETF participants are expected to disclose known IPR that applies to the technologies described in their drafts. Implementers considering building on any of the drafts discussed in this series should check the <a href="https://datatracker.ietf.org/ipr/">IETF IPR disclosure database</a> before proceeding.</p>
<p>This analysis is exploratory, not peer-reviewed research. Several methodological limitations should be understood when interpreting the results:</p>
<p><strong>LLM-as-Judge ratings</strong>: All quality ratings are generated by Claude Sonnet from draft abstracts (not full text), with no human calibration. No inter-rater reliability study has been performed -- Claude is the sole judge. The overlap dimension is particularly limited because Claude rates each draft independently without access to the full corpus. Scores should be treated as relative rankings within this corpus, not absolute quality measures.</p>
<p><strong>Keyword-based corpus selection</strong>: The 12 search keywords cast a wide net but introduce both false positives (drafts about "user agents" or "autonomous systems" unrelated to AI) and false negatives (relevant drafts using terminology we did not search for). We estimate 30-50 false positives remain in the corpus. The relevance rating partially mitigates this, but the LLM judge is generous with relevance for keyword-matched drafts.</p>
<p><strong>Clustering thresholds</strong>: The 0.85 cosine similarity threshold for topical clusters, 0.90 for near-duplicates, and 0.98 for functional duplicates are empirical choices based on manual inspection, not derived from a principled analysis. The embedding model (nomic-embed-text) is general-purpose, not fine-tuned for standards documents. A sensitivity analysis across thresholds would strengthen confidence.</p>
<p><strong>Gap analysis</strong>: The gap identification is a single-shot LLM analysis based on compressed landscape statistics, not a systematic comparison against a reference architecture. Gap severity is assigned by Claude without defined thresholds. The gaps should be treated as hypotheses for expert validation, not definitive findings.</p>
<p><strong>Idea extraction quality</strong>: Batch extraction (Haiku, abstract-only at 800 chars) produces different results than individual extraction (Sonnet, abstract + full text). No precision/recall measurement has been performed. The extraction prompt instructs Claude to return 1-4 ideas per draft, which may under-count contributions from comprehensive drafts.</p>
<p><strong>Abstract-only analysis</strong>: Ratings are based on abstracts truncated to 2000 characters. For maturity assessment in particular, the abstract is an imperfect proxy for the full document's technical depth.</p>
<p>For full methodology documentation, see <code>data/reports/methodology.md</code> in the project repository.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>The full analysis cost ~$9</strong> -- LLM-powered document analysis at scale is practical and cheap with proper caching and model selection</li>
<li><strong>Caching is essential</strong>: SHA-256 hashed prompt caching makes the pipeline idempotent and dramatically reduces costs on re-runs</li>
<li><strong>Hybrid LLM strategy</strong>: Claude Sonnet for reasoning, Claude Haiku for extraction (10x cheaper), Ollama for embeddings (free) -- match model capability to task difficulty</li>
<li><strong>The zero-cost analyses were the most revealing</strong>: RFC cross-references, idea overlap, WG adoption, and trend analysis all run locally and produced the series' most important structural findings</li>
<li><strong>The database is the product</strong>: a well-structured SQLite DB supports queries no pre-built report anticipates; the blog series was written by querying, not re-running</li>
</ul>
<p><em>Next in this series: <a href="/blog/posts/08-agents-building-the-analysis.html">Agents Building the Agent Analysis</a> -- we used a team of AI agents to produce this series. The irony is the point.</em></p>
<hr />
<p><em>The IETF Draft Analyzer is open source. The codebase, database, and all reports are available in the project repository.</em></p>
<div class="post-nav"><a href="/blog/posts/06-big-picture.html">&larr; The Big Picture</a><a href="/blog/posts/08-agents-building-the-analysis.html">Agents Building the Agent Analysis &rarr;</a></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Agents Building the Agent Analysis — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
<nav><a href="/blog/" class="site-title">IETF AI Agent Analysis</a>
<a href="/blog/posts/01-gold-rush.html">Rush</a>
<a href="/blog/posts/02-who-writes-the-rules.html">Rules</a>
<a href="/blog/posts/03-oauth-wars.html">Wars</a>
<a href="/blog/posts/04-what-nobody-builds.html">Builds</a>
<a href="/blog/posts/05-1262-ideas.html">Converge</a>
<a href="/blog/posts/06-big-picture.html">Picture</a>
<a href="/blog/posts/07-how-we-built-this.html">This</a>
<strong>Analysis</strong></nav>
<h1 id="agents-building-the-agent-analysis">Agents Building the Agent Analysis</h1>
<p><em>We used a team of AI agents to analyze, write about, and review 434 IETF Internet-Drafts on AI agents. Here is what that looked like from the inside.</em></p>
<p><em>Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.</em></p>
<hr />
<p>There is an irony we should address up front: this entire blog series -- analyzing 434 Internet-Drafts about how AI agents should work -- was itself produced by a team of AI agents. Twelve Claude instances across three phases, each with a distinct role, reading the same database, building on each other's output, and coordinating through a shared journal and file system.</p>
<p>This post is the story of that process: what worked, what broke, what surprised us, and what it reveals about the state of AI agent coordination in practice -- which, as it happens, is exactly the problem the IETF drafts are trying to solve.</p>
<h2 id="phase-1-the-writing-team">Phase 1: The Writing Team</h2>
<p>We started with four agents, each defined in a one-page file and grounded by a shared 3,000-word team brief:</p>
<table>
<thead>
<tr>
<th>Agent</th>
<th>Role</th>
<th>What They Did</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Architect</strong></td>
<td>The Big Picture</td>
<td>Read all reports, designed the narrative arc, wrote the vision document, reviewed every post</td>
</tr>
<tr>
<td><strong>Analyst</strong></td>
<td>The Data Whisperer</td>
<td>Ran the pipeline on 434 drafts, executed 20+ SQL queries, produced data packages</td>
</tr>
<tr>
<td><strong>Coder</strong></td>
<td>The Feature Builder</td>
<td>Implemented 7 new analysis features (refs, trends, idea-overlap, WG adoption, revisions, centrality, co-occurrence)</td>
</tr>
<tr>
<td><strong>Writer</strong></td>
<td>The Storyteller</td>
<td>Drafted all 8 blog posts, applied 6+ revision passes</td>
</tr>
</tbody>
</table>
<p>Each agent had access to the full project codebase, a SQLite database, and the <code>ietf</code> CLI tool. They communicated through files and coordinated through a shared development journal. The team brief contained a thesis statement -- "The IETF is building the highways before the traffic lights" -- a per-post outline, and a data requirements table.</p>
<h3 id="parallel-by-default">Parallel by default</h3>
<p>The key design decision: agents did not wait for each other when they could work in parallel. The Writer's tasks were formally blocked by the Analyst's pipeline run, but the Writer had enough existing data (260 analyzed drafts) to start drafting. Rather than sitting idle, the Writer produced first drafts of all 7 posts while waiting for updated numbers. This turned out to be the right call -- the structure and narrative mattered more than whether the draft count was 260 or 434.</p>
<p>The Coder and Writer worked simultaneously, their outputs feeding each other. Every feature the Coder built used zero API calls -- pure local computation via regex, SQL, SequenceMatcher, and networkx. The RFC cross-reference parser revealed that the Chinese and Western blocs build on incompatible infrastructure foundations (YANG/NETCONF vs. COSE/CBOR), with OAuth 2.0 as the only shared bedrock. The co-occurrence analysis showed safety has zero overlap with Agent Discovery and Model Serving. These zero-cost local analyses produced the most structurally revealing findings in the entire series.</p>
<h3 id="the-architect-shaped-everything">The Architect shaped everything</h3>
<p>The Architect produced fewer words than the Writer and fewer features than the Coder, but had disproportionate impact. Three contributions reshaped the output:</p>
<ol>
<li>The insight that <strong>gap severity correlates with coordination difficulty</strong> transformed Post 4 from a list of gaps into an argument about structural dysfunction.</li>
<li>The <strong>"two equilibria" framing</strong> -- microservices chaos vs. layered web architecture -- gave Post 6's predictions real structural weight.</li>
<li>A <strong>verification pass</strong> that caught the Writer's revisions silently failing (logged as done, not actually persisted in the file).</li>
</ol>
<p>That third point is worth dwelling on. The dev journal said "Post 1 revisions complete." The file still contained the pre-revision content. Without the Architect reading the actual output rather than trusting the status message, the error would have shipped. This is a small-scale version of the Behavior Verification gap the series identifies as critical -- and we will come back to it.</p>
<h3 id="the-human-who-said-so-what">The human who said "so what?"</h3>
<p>The most consequential intervention in the entire project came not from an agent but from the human project lead. The series had been built around a headline number: "1,780 technical ideas extracted from the drafts." The project lead asked: what does that number actually mean?</p>
<p>The answer was uncomfortable. The pipeline extracts roughly 5 ideas per draft on average -- a mechanical process that produces items like "A2A Communication Paradigm" and "Agent Network Architecture." The raw count sounds impressive but is mostly scaffolding. The real signal was hiding in the cross-org overlap analysis: 96% of unique idea titles appear in exactly one draft. Only 75 show up in two or more. The fragmentation that defines the protocol landscape extends all the way down to the idea level.</p>
<p>This required rewriting Post 5 entirely. Its title changed from "The 1,780 Ideas That Will Shape Agent Infrastructure" to "Where 434 Drafts Converge (And Where They Don't)." The lead metric shifted from raw extraction count (impressive but hollow) to the convergence rate (honest and striking). Four agents had independently used the 1,780 figure -- the Analyst generated it, the Coder validated it, the Architect designed around it, the Writer headlined it. None questioned whether it was meaningful.</p>
<h2 id="phase-2-the-review-cycle">Phase 2: The Review Cycle</h2>
<p>After the writing team produced 8 blog posts, a vision document, 7 new analysis features, and 30 dev-journal entries, we did something that turned out to matter more than the writing itself: we sent the entire output to four specialist reviewers, each running in parallel.</p>
<table>
<thead>
<tr>
<th>Reviewer</th>
<th>Lens</th>
<th>Issues Found</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Statistics</strong></td>
<td>Data integrity, sampling bias, quantitative accuracy</td>
<td>3 critical, 4 important, 4 minor</td>
</tr>
<tr>
<td><strong>Legal</strong></td>
<td>German/EU internet law, GDPR, EU AI Act, eIDAS 2.0</td>
<td>3 critical, 5 regulatory gaps, 5 improvements</td>
</tr>
<tr>
<td><strong>Engineering</strong></td>
<td>Code quality, security, performance, DX</td>
<td>1 critical, 1 high, 5 bugs, 6 perf issues</td>
</tr>
<tr>
<td><strong>Science</strong></td>
<td>Methodology, reproducibility, related work, hedging</td>
<td>2 critical, 3 high, 4 medium</td>
</tr>
</tbody>
</table>
<p>Four agents, four completely different perspectives, run simultaneously. Together they surfaced <strong>36 distinct issues</strong> that the writing team had missed. The findings were often surprising.</p>
<h3 id="the-statistics-reviewer-found-the-numbers-did-not-add-up">The statistics reviewer found the numbers did not add up</h3>
<p>The statistical audit cross-checked every quantitative claim in the blog series against the actual database using raw SQL queries. The results were sobering. The blog claimed 361 drafts; the database held 434. The blog claimed 1,780 ideas; the database held 419. The blog claimed 12 gaps; the database held 11. Composite scores were inflated by 0.05-0.10 through rounding. The "4:1 safety ratio" varied from 1.5:1 to 21:1 by month -- a fact the flat claim obscured.</p>
<p>The ideas count mismatch was the most serious finding. The entire thesis of Post 5 -- "96% of ideas appear in one draft" and "628 cross-org convergent ideas" -- was not reproducible from the current database. The pipeline had been re-run with different parameters, overwriting the original extraction. Nobody had noticed because the numbers in the blog posts were never re-checked against the live database.</p>
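<p>The audit step itself is easy to reproduce. A minimal sketch of the idea, assuming an SQLite database with hypothetical table names (<code>drafts</code>, <code>ideas</code>, <code>gaps</code>) standing in for the project's real schema:</p>

```python
import sqlite3

# Sketch of the statistical audit: compare published headline numbers
# against live COUNT(*) queries. Table names and row counts here are
# illustrative, not the project's actual schema or data.
claimed = {"drafts": 361, "ideas": 1780, "gaps": 12}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drafts (name TEXT);
    CREATE TABLE ideas  (title TEXT);
    CREATE TABLE gaps   (title TEXT);
""")
conn.executemany("INSERT INTO drafts VALUES (?)", [(f"draft-{i}",) for i in range(434)])
conn.executemany("INSERT INTO ideas VALUES (?)", [(f"idea-{i}",) for i in range(419)])
conn.executemany("INSERT INTO gaps VALUES (?)", [(f"gap-{i}",) for i in range(11)])

mismatches = {}
for table, claim in claimed.items():
    (actual,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if actual != claim:
        mismatches[table] = (claim, actual)

print(mismatches)  # all three headline numbers disagree with the database
```

<p>A check like this, wired into the build, would have flagged the drift as soon as the pipeline was re-run with different parameters.</p>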
<h3 id="the-legal-reviewer-found-regulatory-blindspots">The legal reviewer found regulatory blindspots</h3>
<p>The legal review, written from a German/EU internet law perspective, identified three critical issues that no technically-focused agent would have caught:</p>
<p><strong>Consent conflation.</strong> The series used "consent" interchangeably across OAuth authorization flows, GDPR consent (Einwilligung under Art. 6(1)(a)), and human-in-the-loop approval gates. These are legally distinct concepts. Under CJEU case law (Planet49), consent requires a clear affirmative act by the data subject. When an AI agent delegates to sub-agents, the chain of consent may break entirely. None of the 14 OAuth-for-agents proposals the series analyzed -- and none of the agents writing about them -- flagged this.</p>
<p><strong>The hospital scenario understated regulatory reality.</strong> Post 4's opening scenario -- an AI agent managing drug dispensing with a hallucinated dosage -- was framed as "what goes wrong if this gap is never addressed." Under EU law, it is already addressed: the EU AI Act classifies such systems as high-risk under Annex III, the revised Product Liability Directive covers AI systems explicitly, and German medical law (BGB &sect;&sect; 630a ff.) places duty of care on the provider. The IETF gap is not in accountability but in technical mechanisms to implement what the regulation already requires.</p>
<p><strong>GDPR was entirely absent from the gap analysis.</strong> The series identified 11 standardization gaps. None mentioned GDPR-mandated capabilities: data protection impact assessments, right to erasure propagation through multi-agent chains, data portability, or purpose limitation. These are not aspirational -- they are legally binding requirements that agent systems operating in the EU must satisfy.</p>
<h3 id="the-engineering-reviewer-found-a-sql-injection">The engineering reviewer found a SQL injection</h3>
<p>The codebase review graded the project B+ overall -- "solid for a research tool, needs hardening for production" -- but found a critical SQL injection vulnerability in <code>db.py</code>. The <code>update_generation_run</code> method interpolated column names from <code>**kwargs</code> directly into SQL strings without validation. The Flask SECRET_KEY was hardcoded as the string <code>"ietf-dashboard-dev"</code>. There was no rate limiting on endpoints that trigger paid Claude API calls.</p>
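<p>The standard fix for this class of bug is to validate column names against a whitelist and keep values parameterized. A sketch under those assumptions -- the column names here are illustrative, not the project's real schema:</p>

```python
import sqlite3

# Whitelist of columns that may appear in an UPDATE. Only names in this
# set are ever interpolated into SQL; values are always bound parameters.
ALLOWED_COLUMNS = {"status", "finished_at", "error"}  # illustrative names

def update_generation_run(conn, run_id, **kwargs):
    unknown = set(kwargs) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"refusing to update column(s): {sorted(unknown)}")
    # Column names are now known-safe; values go through ? placeholders.
    assignments = ", ".join(f"{col} = ?" for col in kwargs)
    conn.execute(
        f"UPDATE generation_runs SET {assignments} WHERE id = ?",
        (*kwargs.values(), run_id),
    )
```

<p>The point of the whitelist is that SQLite (like most databases) cannot parameterize identifiers, only values -- so identifiers must be constrained some other way.</p>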
<p>The engineering reviewer also noted that <code>cli.py</code> had grown to 2,995 lines with approximately 40 repetitions of the same config/db boilerplate pattern. And that test coverage for the analysis pipeline -- the core of the tool -- was exactly zero.</p>
<h3 id="the-science-reviewer-questioned-the-methodology">The science reviewer questioned the methodology</h3>
<p>The scientific review identified the central methodological weakness: the entire rating system relies on Claude as the sole judge for five dimensions, with no human calibration, no inter-rater reliability measurement, and ratings based on abstracts only (truncated to 2,000 characters), not full draft text. The clustering threshold of 0.85 was described as "empirical" with no sensitivity analysis. The gap analysis was single-shot LLM generation from compressed metadata.</p>
<p>One finding was particularly striking: of 434 drafts rated for relevance, the distribution was heavily right-skewed (196 at 4, 98 at 5, only 38 at 1-2). Claude was generous with relevance for keyword-matched drafts, making the metric less discriminating than it should be. Upon manual review, 73 drafts turned out to be false positives -- including <code>draft-ietf-hpke-hpke</code> (generic public key encryption, nothing to do with AI agents) rated at relevance 5.</p>
<h2 id="phase-3-the-fix-cycle">Phase 3: The Fix Cycle</h2>
<p>With 36 issues identified, we launched fix agents -- the Coder handling engineering and data integrity issues, an Editor handling legal and statistical corrections across the blog posts.</p>
<p>The fixes unfolded in three rounds, prioritized by severity:</p>
<p><strong>Round 1 -- Critical.</strong> SQL injection patched with a column name whitelist. Flask SECRET_KEY replaced with <code>os.environ.get()</code> fallback to <code>os.urandom()</code>. FTS5 query sanitization added to prevent search injection. False-positive column added to the ratings table; 73 drafts flagged. All blog posts updated from 361 to 434 drafts. Ideas count discrepancy reconciled (419 current with methodology note explaining the 1,780 historical figure). Gap count corrected from 12 to 11 with rewritten gap table matching database reality.</p>
<p><strong>Round 2 -- High.</strong> Rate limiting added to Claude-calling endpoints (10 req/min/IP). Category names normalized in the database (21 legacy entries migrated). EU AI Act timeline corrected from "within 18 months" to "within 5 months (August 2026)" with enforcement details and article references. OAuth/GDPR consent distinction added. Hospital scenario annotated with AI Act Annex III and Medical Devices Regulation context. Safety ratio qualified everywhere from flat "4:1" to "averaging ~4:1 but varying from 1.5:1 to 21:1 month-to-month."</p>
<p><strong>Round 3 -- Medium.</strong> Methodology documentation created (comprehensive <code>methodology.md</code> covering all pipeline stages, limitations, and related work). IETF IPR notes added. Language hedged where causal claims were only supported by correlation. MIT LICENSE file created (the project claimed "open source" but had no license). FIPA, IEEE P3394, and eIDAS 2.0 references added where they naturally strengthen arguments. Coder reduced <code>cli.py</code> by 200 lines of boilerplate, added <code>--dry-run</code> flags to destructive commands, fixed N+1 query patterns.</p>
<p>In total: 14 files modified across the blog series, 7 security/quality fixes applied to the codebase, test count increased from 23 to 64, and a verified-counts document created as a single source of truth.</p>
<h2 id="what-this-reveals">What This Reveals</h2>
<h3 id="specialized-perspectives-catch-different-things">Specialized perspectives catch different things</h3>
<p>This is the headline finding from the review cycle. Four reviewers looked at the same output and found almost entirely non-overlapping issues. The statistician found number mismatches. The lawyer found consent conflation. The engineer found SQL injection. The scientist found methodological gaps. No single reviewer -- no matter how thorough -- would have caught all 36 issues.</p>
<p>This is not a theoretical observation about diverse review. It is an empirical result from running the experiment. The legal reviewer's consent-conflation finding required knowledge of CJEU case law. The statistical reviewer's ideas-count discovery required querying the live database. The engineering reviewer's SQL injection required reading the source code line by line. These are genuinely different skills applied to the same artifact.</p>
<h3 id="the-review-fix-verify-pattern-works">The review-fix-verify pattern works</h3>
<p>The cycle ran cleanly: four parallel reviews produced a prioritized list; fix agents resolved issues in severity order; the fixes were verified against the review documents. Three rounds (critical, high, medium) imposed natural prioritization. The entire cycle -- 4 reviews plus 3 fix rounds -- happened in a single day.</p>
<p>The pattern mirrors what the IETF itself does with Last Call reviews, directorate reviews, and IESG evaluation. Multiple specialized perspectives, applied in sequence, with verification that issues are resolved. The difference is that our cycle took hours, not months. The cost is that our reviewers share the same underlying model and its blindspots.</p>
<h3 id="agents-modifying-the-same-files-is-the-hard-problem">Agents modifying the same files is the hard problem</h3>
<p>The most persistent coordination difficulty was not conceptual but logistical: multiple agents editing the same blog posts. The Writer updated Post 4's gap table. The Editor changed the safety ratio phrasing. The Coder corrected the draft count. Each edit was correct in isolation. But when three agents modify the same file, merge conflicts and stale reads are inevitable. We hit this multiple times -- most visibly with the Post 1 revisions that silently failed to persist.</p>
<p>This maps directly to the IETF's Agent Execution Model gap. When multiple agents operate on shared state, you need either locking (pessimistic) or conflict detection (optimistic). We had neither. We used a file system, a dev journal, and hope.</p>
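<p>The optimistic variant is cheap to sketch: each agent records a content hash when it reads, and a write is refused if the file changed underneath it. The names below are illustrative, not what our pipeline actually used:</p>

```python
import hashlib
from pathlib import Path

def read_with_version(path: Path) -> tuple[str, str]:
    # Return the file's content plus a version token (content hash).
    text = path.read_text()
    return text, hashlib.sha256(text.encode()).hexdigest()

def write_if_unchanged(path: Path, new_text: str, expected_version: str) -> bool:
    # Refuse the write if another agent edited the file since our read.
    _, current = read_with_version(path)
    if current != expected_version:
        return False  # stale read: caller must re-read and re-apply its edit
    path.write_text(new_text)
    return True
```

<p>This is a sketch, not a complete solution: the check-then-write is not atomic, so a real implementation would hold a file lock around it. But even this much would have caught the Post 1 revisions that silently failed to persist.</p>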
<h3 id="the-cheapest-analyses-mattered-most">The cheapest analyses mattered most</h3>
<table>
<thead>
<tr>
<th>Component</th>
<th style="text-align: right;">Cost</th>
<th>Key Finding</th>
</tr>
</thead>
<tbody>
<tr>
<td>Claude Sonnet (ratings, gaps)</td>
<td style="text-align: right;">~$8</td>
<td>4:1 safety deficit, 11 gaps</td>
</tr>
<tr>
<td>Claude Haiku (idea extraction)</td>
<td style="text-align: right;">~$0.80</td>
<td>419 ideas, 96% unique to one draft</td>
</tr>
<tr>
<td>4 reviewers (parallel)</td>
<td style="text-align: right;">~$4</td>
<td>36 issues across 4 dimensions</td>
</tr>
<tr>
<td>Ollama embeddings</td>
<td style="text-align: right;">$0.00</td>
<td>25+ near-duplicate pairs</td>
</tr>
<tr>
<td>Coder: regex, SQL, networkx</td>
<td style="text-align: right;">$0.00</td>
<td>RFC divergence, centrality, co-occurrence</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td style="text-align: right;"><strong>~$13</strong></td>
<td></td>
</tr>
</tbody>
</table>
<p>The LLM provided the foundation data. Every structurally revealing finding -- RFC foundation divergence, European telecoms as bridge-builders, safety structurally isolated from protocols, 55% fire-and-forget revision rate -- came from deterministic local computation on top of that foundation. The lesson for anyone building LLM-powered analysis: the model is the foundation, not the insight engine.</p>
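<p>To make "deterministic local computation on top of LLM foundation data" concrete, here is a minimal sketch of the cross-org convergence idea: given extracted (idea, draft, organization) rows, find ideas proposed independently by more than one organization. The rows are illustrative, not real corpus data:</p>

```python
from collections import defaultdict

# Illustrative (idea, draft, org) rows standing in for the LLM-extracted data.
rows = [
    ("agent discovery", "draft-a", "OrgX"),
    ("agent discovery", "draft-b", "OrgY"),
    ("execution trace", "draft-c", "OrgX"),
    ("execution trace", "draft-d", "OrgX"),  # same org twice: not convergent
]

# Group the organizations behind each idea title.
orgs_per_idea = defaultdict(set)
for idea, _draft, org in rows:
    orgs_per_idea[idea].add(org)

# Convergent = proposed by at least two distinct organizations.
convergent = sorted(i for i, orgs in orgs_per_idea.items() if len(orgs) >= 2)
print(convergent)  # → ['agent discovery']
```

<p>Everything interesting here is a dictionary and a set -- no model call, no cost. That is the shape of most of the findings in the table above.</p>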
<h2 id="the-meta-irony">The Meta-Irony</h2>
<p>We built a team of AI agents to analyze IETF drafts about AI agent standards. The team needed coordination, shared context, specialized roles, quality review, human oversight, and output verification. Every one of these needs maps to a gap in the IETF landscape:</p>
<table>
<thead>
<tr>
<th>Our Team Needed</th>
<th>What Happened</th>
<th>IETF Gap</th>
</tr>
</thead>
<tbody>
<tr>
<td>Shared execution context</td>
<td>Agents coordinated via SQLite, files, dev journal</td>
<td>Agent Execution Model (no standard)</td>
</tr>
<tr>
<td>Output verification</td>
<td>Writer's revisions silently failed; Architect caught it manually</td>
<td>Agent Behavioral Verification (critical)</td>
</tr>
<tr>
<td>Quality review</td>
<td>4 parallel reviewers found 36 issues the writing team missed</td>
<td>Agent Behavioral Verification (critical)</td>
</tr>
<tr>
<td>Error handling</td>
<td>Ideas reframing required 3 iterations to stabilize numbers</td>
<td>Real-Time Agent Rollback (high)</td>
</tr>
<tr>
<td>Coordination across approaches</td>
<td>Agents editing the same files with no merge mechanism</td>
<td>Cross-Protocol Agent Migration (medium)</td>
</tr>
<tr>
<td>Human oversight</td>
<td>Project lead's "so what?" redirected the entire ideas framing</td>
<td>Human Override Standardization (high)</td>
</tr>
<tr>
<td>Specialized perspectives</td>
<td>Legal, statistical, engineering, and scientific reviewers each found unique issues</td>
<td>Agent Capability Negotiation (medium)</td>
</tr>
</tbody>
</table>
<p>We solved these problems ad hoc -- with a journal, role definitions, manual verification passes, severity-prioritized fix rounds, and human review. The IETF is trying to solve them at internet scale with protocol standards.</p>
<p>The distance between our 12-agent team and a deployed multi-agent system on the open internet is vast. But the problems are structurally identical. The standards the IETF is racing to write are the standards our own team needed. The traffic lights the highway needs are the ones we built by hand.</p>
<hr />
<h3 id="key-takeaways">Key Takeaways</h3>
<ul>
<li><strong>Twelve agents across three phases</strong> (4 writers, 4 reviewers, 4 fixers) produced 8 blog posts, a vision document, 7 analysis features, 36 identified issues, and 64 tests -- from a ~$13 pipeline</li>
<li><strong>Four parallel reviewers found 36 non-overlapping issues</strong>: a SQL injection, consent conflation with EU law, a 76% ideas count mismatch, and uncalibrated LLM-as-judge methodology. No single reviewer would have caught all of them</li>
<li><strong>The human project lead's "so what?"</strong> was the single most consequential intervention -- no agent questioned whether the headline metric was meaningful</li>
<li><strong>A silent failure</strong> (revisions logged but not persisted) demonstrated the same Behavior Verification gap the series identifies as critical in the IETF landscape</li>
<li><strong>The team's coordination problems mirror the IETF's gaps</strong>: shared state, output verification, error recovery, capability negotiation, and human oversight are needed at every scale</li>
</ul>
<p><em>This post concludes the series. All data, code, and reports are available in the IETF Draft Analyzer project repository.</em></p>
<hr />
<p><em>Written by a team of Claude instances analyzing the IETF's work on AI agent standards. The irony is not lost on us.</em></p>
<div class="post-nav"><a href="/blog/posts/07-how-we-built-this.html">&larr; How We Built This</a><span></span></div>
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>


@@ -31,7 +31,7 @@
\title{%
\textbf{The AI Agent Standards Gold Rush:\\
A Systematic Analysis of 434 IETF Internet-Drafts}%
A Systematic Analysis of 474 IETF Internet-Drafts}%
}
\author{
@@ -48,7 +48,7 @@
% ── Abstract ──────────────────────────────────────────────────────────────
\begin{abstract}
The Internet Engineering Task Force (IETF) is experiencing an unprecedented surge in standardization activity related to artificial intelligence and autonomous agents. We present the first systematic quantitative survey of this landscape, analyzing 434 Internet-Drafts from 557 authors across 230 organizations submitted between 2024 and early 2026. Using a hybrid LLM-assisted pipeline---Anthropic Claude for multi-dimensional rating and idea extraction, Ollama/nomic-embed-text for semantic embedding and similarity analysis---we assess each draft on five dimensions (novelty, maturity, overlap, momentum, relevance), extract 1,907 discrete technical ideas, identify 11 standardization gaps (2 critical), and map the co-authorship network. Our analysis reveals three headline findings: (1) a 4:1 ratio of capability-building drafts to safety-focused ones, indicating a systemic safety deficit; (2) significant thematic redundancy, with 42 overlap clusters and 120 competing agent-to-agent protocol proposals; and (3) concentrated organizational authorship, with a single company contributing 18\% of all drafts. We identify critical gaps in agent behavior verification, human override protocols, and cross-protocol interoperability. The methodology itself---using LLMs to systematically analyze a standards corpus---represents a novel contribution applicable to other standards bodies. Our open-source toolkit and dataset are released for reproducibility.
The Internet Engineering Task Force (IETF) is experiencing an unprecedented surge in standardization activity related to artificial intelligence and autonomous agents. We present the first systematic quantitative survey of this landscape, analyzing 474 Internet-Drafts from 710 authors across approximately 280 organizations submitted between 2024 and early 2026. After false-positive filtering (113 drafts excluded as not AI-relevant), 361 drafts form the core analytical corpus. Using a hybrid LLM-assisted pipeline---Anthropic Claude for multi-dimensional rating and idea extraction, Ollama/nomic-embed-text for semantic embedding and similarity analysis---we assess each draft on five dimensions (novelty, maturity, overlap, momentum, relevance), extract 462 deduplicated technical ideas (from approximately 1,780 raw extractions), identify 12 standardization gaps (2 critical), and map the co-authorship network. Cross-organizational convergence analysis reveals 132 ideas (33\% convergence rate) where independent teams arrived at similar solutions. Our analysis reveals three headline findings: (1) a safety deficit averaging 4:1 (varying 1.5:1 to 21:1 month-to-month) of capability-building drafts to safety-focused ones; (2) significant thematic redundancy, with 42 overlap clusters and 155 competing agent-to-agent protocol proposals; and (3) concentrated organizational authorship, with a single company contributing approximately 16\% of all drafts. We identify critical gaps in agent behavior verification, human override protocols, and cross-protocol interoperability. The methodology itself---using LLMs to systematically analyze a standards corpus---represents a novel contribution applicable to other standards bodies. Our open-source toolkit and dataset are released under the MIT license for reproducibility.
\end{abstract}
\noindent\textbf{Keywords:} IETF, Internet-Drafts, AI agents, standardization, protocol analysis, LLM-assisted analysis, embedding similarity, safety deficit, author networks
@@ -71,19 +71,21 @@ However, the speed and volume of this activity raises important questions:
To answer these questions, we built an automated analysis pipeline that:
\begin{enumerate}[nosep]
\item Harvests draft metadata and full text from the IETF Datatracker API (434 drafts, 557 authors).
\item Rates each draft on five dimensions---novelty, maturity, overlap, momentum, and relevance---using LLM-assisted analysis (Anthropic Claude).
\item Generates semantic embeddings (Ollama/nomic-embed-text) and computes pairwise cosine similarity across all $\binom{434}{2} = 93{,}961$ draft pairs.
\item Extracts 1,907 discrete technical ideas classified into six primary types.
\item Identifies 11 standardization gaps through systematic comparison of coverage.
\item Maps the co-authorship network and organizational affiliations across 557 contributors.
\item Harvests draft metadata and full text from the IETF Datatracker API (474 drafts, 710 authors), with false-positive filtering reducing the analytical corpus to 361 relevant drafts.
\item Rates each draft on five dimensions---novelty, maturity, overlap, momentum, and relevance---using LLM-assisted analysis (Anthropic Claude), with scores clamped to 1--5 and validated for bounds.
\item Generates semantic embeddings (Ollama/nomic-embed-text) and computes pairwise cosine similarity across all $\binom{474}{2} = 112{,}101$ draft pairs.
\item Extracts approximately 1,780 raw technical ideas, deduplicated via SequenceMatcher (threshold 0.75) to 462 unique ideas classified into six primary types.
\item Identifies 12 standardization gaps through systematic comparison of coverage.
\item Maps the co-authorship network and organizational affiliations across 710 contributors.
\item Performs cross-organizational convergence analysis, finding 132 ideas (33\%) independently proposed by multiple organizations.
\end{enumerate}
\noindent Our contributions are:
\begin{itemize}[nosep]
\item \textbf{First systematic survey} of AI/agent-related IETF drafts at scale, covering 434 drafts.
\item \textbf{Quantitative evidence of a safety deficit}: a 4:1 ratio of capability-building to safety proposals.
\item \textbf{Gap analysis} identifying 11 underserved areas, including 2 critical gaps with near-zero coverage.
\item \textbf{First systematic survey} of AI/agent-related IETF drafts at scale, covering 474 drafts (361 after false-positive filtering).
\item \textbf{Quantitative evidence of a safety deficit}: averaging 4:1 capability-building to safety proposals (varying 1.5:1 to 21:1 month-to-month).
\item \textbf{Gap analysis} identifying 12 underserved areas, including 2 critical gaps with near-zero coverage.
\item \textbf{Cross-organizational convergence analysis}: 132 ideas independently proposed by multiple organizations, indicating implicit community consensus.
\item \textbf{Reproducible LLM-assisted methodology} combining Claude-based rating with embedding-based similarity, applicable to other standards corpora.
\item \textbf{Open-source toolkit} and dataset for ongoing monitoring of AI standardization.
\end{itemize}
@@ -156,7 +158,7 @@ Each draft was assessed using Anthropic Claude (Sonnet 4) on five dimensions, ea
\subsection{Embedding and Similarity Analysis}
We generated 768-dimensional embeddings for each draft using Ollama with the \texttt{nomic-embed-text} model, encoding a combination of title, abstract, and the first 4,000 characters of full text. Pairwise cosine similarity was computed across all $\binom{434}{2} = 93{,}961$ draft pairs:
We generated 768-dimensional embeddings for each draft using Ollama with the \texttt{nomic-embed-text} model, encoding a combination of title, abstract, and the first 4,000 characters of full text. Pairwise cosine similarity was computed across all $\binom{474}{2} = 112{,}101$ draft pairs:
\begin{equation}
\text{sim}(a, b) = \frac{\mathbf{v}_a \cdot \mathbf{v}_b}{\|\mathbf{v}_a\| \cdot \|\mathbf{v}_b\|}
\end{equation}
@@ -173,7 +175,7 @@ Gaps were identified by comparing the idea coverage across categories against th
\subsection{Author Network Analysis}
Author and affiliation data were retrieved from Datatracker, yielding a bipartite graph of 557 authors across 434 drafts. We identified persistent co-author teams (``team blocs'') using a pairwise draft overlap threshold of $\geq$70\% with $\geq$3 shared drafts. Cross-organizational collaboration was measured by counting shared drafts between organizations.
Author and affiliation data were retrieved from Datatracker, yielding a bipartite graph of 710 authors across 474 drafts. We identified persistent co-author teams (``team blocs'') using a pairwise draft overlap threshold of $\geq$70\% with $\geq$3 shared drafts. Cross-organizational collaboration was measured by counting shared drafts between organizations.
\subsection{Reproducibility and Cost}
@@ -191,22 +193,26 @@ The entire analysis pipeline is implemented as a Python CLI tool (\texttt{ietf})
\toprule
\textbf{Metric} & \textbf{Value} \\
\midrule
Internet-Drafts analyzed & 434 \\
Unique authors & 557 \\
Organizations represented & 230 \\
Technical ideas extracted & 1,907 \\
Standardization gaps identified & 11 \\
Drafts with ratings & 434 \\
Internet-Drafts collected & 474 \\
False positives excluded & 113 \\
Relevant drafts (analytical corpus) & 361 \\
Unique authors & 710 \\
Organizations represented & $\sim$280 \\
Raw ideas extracted & $\sim$1,780 \\
Deduplicated ideas & 462 \\
Cross-org convergent ideas & 132 (33\%) \\
Standardization gaps identified & 12 \\
Drafts with ratings & 474 \\
Overlap clusters ($\geq$0.85 threshold) & 42 \\
Near-duplicate pairs ($\geq$0.90 threshold) & 34 \\
Time span & 2024 -- Mar 2026 \\
Embedding dimension & 768 (nomic-embed-text) \\
Pairwise similarity pairs & 93,961 \\
Pairwise similarity pairs & 112,101 \\
\bottomrule
\end{tabular}
\end{table}
The corpus spans drafts submitted from early 2024 through March 2026, with the overwhelming majority (425 of 434) submitted after June 2025. Table~\ref{tab:growth} shows the acceleration in AI/agent-related submissions relative to total IETF activity.
The corpus spans drafts submitted from early 2024 through March 2026, with the overwhelming majority submitted after June 2025. Of the 474 drafts collected, 113 were flagged as false positives (relevance score $\leq 2$ or manually identified as non-AI-related, e.g., HPKE key encapsulation, PIE bufferbloat management), leaving 361 drafts in the analytical corpus. Table~\ref{tab:growth} shows the acceleration in AI/agent-related submissions relative to total IETF activity.
\begin{table}[h]
\centering
@@ -242,31 +248,31 @@ Our LLM-assisted classification assigned each draft to one or more of ten semant
\toprule
\textbf{Category} & \textbf{Drafts} & \textbf{Share} \\
\midrule
Data formats / interoperability & 145 & 33\% \\
A2A protocols & 120 & 28\% \\
Agent identity / authentication & 108 & 25\% \\
Autonomous network operations & 93 & 21\% \\
Policy / governance & 91 & 21\% \\
ML traffic management & 73 & 17\% \\
Agent discovery / registration & 65 & 15\% \\
AI safety / alignment & 44 & 10\% \\
Model serving / inference & 42 & 10\% \\
Human-agent interaction & 30 & 7\% \\
Data formats / interoperability & 174 & 36\% \\
A2A protocols & 155 & 33\% \\
Agent identity / authentication & 152 & 32\% \\
Autonomous network operations & 114 & 24\% \\
Policy / governance & 91 & 19\% \\
ML traffic management & 73 & 15\% \\
Agent discovery / registration & 65 & 14\% \\
AI safety / alignment & 47 & 10\% \\
Model serving / inference & 42 & 9\% \\
Human-agent interaction & 34 & 7\% \\
\bottomrule
\end{tabular}
\end{table}
The most striking finding is the \textbf{safety deficit}. Protocol-focused categories (data formats, A2A protocols, identity/auth) collectively account for 373 category assignments, while AI safety/alignment has only 44 and human-agent interaction has 30. This yields a \textbf{4:1 ratio of capability-building to safety proposals}. For every draft about keeping agents safe, approximately four are building new capabilities. For every draft about human-agent interaction, there are more than four about agents operating autonomously.
The most striking finding is the \textbf{safety deficit}. Protocol-focused categories (data formats, A2A protocols, identity/auth) collectively account for 481 category assignments, while AI safety/alignment has only 47 and human-agent interaction has 34. This yields an average \textbf{4:1 ratio of capability-building to safety proposals} (varying from 1.5:1 to 21:1 month-to-month). For every draft about keeping agents safe, approximately four are building new capabilities. For every draft about human-agent interaction, there are more than four about agents operating autonomously.
The safety drafts that \emph{do} exist are often among the highest-rated. \texttt{draft-aylward-daap-v2} (a comprehensive accountability protocol) and \texttt{draft-cowles-volt} (a tamper-evident execution trace format) each scored 4.8/5.0---the highest in the entire corpus. The quality is there; the quantity is not.
The safety drafts that \emph{do} exist are often among the highest-rated. \texttt{draft-aylward-daap-v2} (a comprehensive accountability protocol) and \texttt{draft-cowles-volt} (a tamper-evident execution trace format) each scored 4.75/5.0---among the highest in the entire corpus. The quality is there; the quantity is not.
\subsection{Rating Distributions}
Across all 434 rated drafts, Table~\ref{tab:ratings} summarizes the five rating dimensions.
Across all 474 rated drafts, Table~\ref{tab:ratings} summarizes the five rating dimensions. \textit{Note: Ratings are LLM-generated from abstracts and partial full text, without human calibration. They should be treated as relative rankings rather than absolute quality measures.}
\begin{table}[h]
\centering
\caption{Average scores across five rating dimensions ($n = 434$, scale 1--5).}
\caption{Average scores across five rating dimensions ($n = 474$, scale 1--5). Scores are LLM-generated and uncalibrated against human baselines.}
\label{tab:ratings}
\begin{tabular}{lcc}
\toprule
@@ -312,11 +318,11 @@ Table~\ref{tab:clusters} shows the three largest competing clusters.
We also identified 25 near-duplicate draft pairs ($>$0.98 cosine similarity)---functionally identical proposals submitted under different names, in different working groups, or as renamed versions. Notable examples include \texttt{draft-rosenberg-aiproto} and \texttt{draft-rosenberg-aiproto-nact} (same N-ACT protocol, renamed), and \texttt{draft-abbey-scim-agent-extension} and \texttt{draft-scim-agent-extension} (same SCIM extension, different submission path).
This fragmentation has practical consequences. The most common recurring technical idea---``Multi-Agent Communication Protocol''---appears independently in 8 separate drafts from different teams. Yet of the 1,907 technical ideas extracted from the corpus, \textbf{96\% appear in exactly one draft}. Everyone is solving the same problems; nobody is solving them together.
This fragmentation has practical consequences. The most common recurring technical idea---``Multi-Agent Communication Protocol''---appears independently in 8 separate drafts from different teams. Yet of the 462 deduplicated technical ideas extracted from the corpus, the majority appear in only one or two drafts, with only 132 (33\%) showing cross-organizational convergence. Everyone is solving the same problems; nobody is solving them together.
\subsection{Technical Ideas Landscape}
The 1,907 extracted ideas distribute across six primary types (Table~\ref{tab:ideas}).
The 462 deduplicated ideas (from approximately 1,780 raw extractions, consolidated via SequenceMatcher at 0.75 threshold) distribute across six primary types (Table~\ref{tab:ideas}).
\begin{table}[h]
\centering
@@ -326,15 +332,15 @@ The 1,907 extracted ideas distribute across six primary types (Table~\ref{tab:id
\toprule
\textbf{Idea Type} & \textbf{Count} & \textbf{\%} \\
\midrule
Mechanism & 694 & 36.4 \\
Architecture & 301 & 15.8 \\
Pattern & 273 & 14.3 \\
Protocol & 237 & 12.4 \\
Extension & 201 & 10.5 \\
Requirement & 182 & 9.5 \\
Other & 19 & 1.0 \\
Architecture & 107 & 23.2 \\
Protocol & 106 & 22.9 \\
Extension & 84 & 18.2 \\
Mechanism & 74 & 16.0 \\
Requirement & 47 & 10.2 \\
Pattern & 40 & 8.7 \\
Other & 4 & 0.9 \\
\midrule
\textbf{Total} & \textbf{1,907} & \textbf{100.0} \\
\textbf{Total} & \textbf{462} & \textbf{100.0} \\
\bottomrule
\end{tabular}
\end{table}
@@ -367,7 +373,7 @@ The authorship landscape shows significant organizational concentration. Table~\
\toprule
\textbf{Organization} & \textbf{Authors} & \textbf{Drafts} \\
\midrule
Huawei & 53 & 66 \\
Huawei & 55 & 69 \\
China Mobile & 24 & 35 \\
Cisco & 24 & 26 \\
Independent & 19 & 25 \\
@@ -381,7 +387,7 @@ Ericsson & 4 & 9 \\
\end{tabular}
\end{table}
Huawei dominates with 53 authors contributing to 66 drafts---\textbf{18\% of the entire corpus} from a single company. Chinese technology organizations collectively (Huawei, China Mobile, China Telecom, China Unicom, ZTE, Tsinghua) contribute approximately 40\% of all drafts. Western participation is led by Cisco (26 drafts) and independent contributors (25 drafts), with notable concentrated contributions from Five9 (10 drafts from a single prolific author, Jonathan Rosenberg) and Ericsson (9 drafts from 4 authors).
Huawei dominates with 55 authors contributing to 69 drafts---\textbf{approximately 16\% of the entire corpus} from a single company. Chinese technology organizations collectively (Huawei, China Mobile, China Telecom, China Unicom, ZTE, Tsinghua) contribute approximately 40\% of all drafts. Western participation is led by Cisco (26 drafts) and independent contributors (25 drafts), with notable concentrated contributions from Five9 (10 drafts from a single prolific author, Jonathan Rosenberg) and Ericsson (9 drafts from 4 authors).
\subsubsection{Team Blocs}
@@ -419,7 +425,7 @@ Table~\ref{tab:top} lists the five highest-scored drafts, representing the propo
\section{Gap Analysis}
Our systematic gap analysis identified 11 areas where standardization work is missing or inadequate. Table~\ref{tab:gaps} summarizes these gaps by severity.
Our systematic gap analysis identified 12 areas where standardization work is missing or inadequate. Table~\ref{tab:gaps} summarizes these gaps by severity.
\begin{table}[h]
\centering
@@ -442,6 +448,7 @@ MED & Cross-Protocol Migration & No state/context migration between different A2
MED & Real-time Debugging & No standard interfaces for production agent introspection & 23 \\
MED & Model Update Security & Missing cryptographically verified, rollback-capable agent updates & 79 \\
MED & Energy Optimization & No energy-aware agent deployment or energy budget enforcement & 17 \\
MED & GDPR Compliance & No mechanisms for DPIA support, right-to-erasure propagation, or purpose limitation in agent chains & 0 \\
\bottomrule
\end{tabularx}
\end{table}
@@ -454,11 +461,11 @@ Some drafts approach the problem from adjacent angles. \texttt{draft-aylward-daa
\subsection{Critical Gap: Human Override Protocols}
Only 30 of 434 drafts address human-agent interaction, compared to 120 A2A protocol drafts and 93 autonomous operations drafts. Agents are being designed to talk to each other at a 4:1 ratio over being designed to talk to humans. The CHEQ protocol (\texttt{draft-rosenberg-aiproto-cheq}, score 3.9) is a rare exception---it defines human confirmation \emph{before} agent execution. But CHEQ is opt-in and pre-execution. No draft standardizes what happens \emph{during} execution: how a human pauses a running workflow, constrains an agent's scope, takes over a task, or issues an emergency stop.
Only 34 of 474 drafts address human-agent interaction, compared to 155 A2A protocol drafts and 114 autonomous operations drafts. Agents are being designed to talk to each other at more than a 4:1 ratio over being designed to talk to humans. The CHEQ protocol (\texttt{draft-rosenberg-aiproto-cheq}, score 3.9) is a rare exception---it defines human confirmation \emph{before} agent execution. But CHEQ is opt-in and pre-execution. No draft standardizes what happens \emph{during} execution: how a human pauses a running workflow, constrains an agent's scope, takes over a task, or issues an emergency stop.
\subsection{The Zero-Coverage Gap: Cross-Protocol Translation}
With 120 competing A2A protocols and no translation layer, agents speaking different protocols cannot interoperate. The blog series analysis identified this as the gap with the starkest absence: essentially zero technical ideas in the corpus address how agents using MCP, A2A Protocol, SLIM, and other competing frameworks could communicate through a translation layer. If the IETF does not build this, the market will---and the result will be vendor-locked ecosystems rather than open interoperability.
With 155 competing A2A protocols and no translation layer, agents speaking different protocols cannot interoperate. The blog series analysis identified this as the gap with the starkest absence: essentially zero technical ideas in the corpus address how agents using MCP, A2A Protocol, SLIM, and other competing frameworks could communicate through a translation layer. If the IETF does not build this, the market will---and the result will be vendor-locked ecosystems rather than open interoperability.
% ── 7. Discussion ────────────────────────────────────────────────────────
@@ -472,7 +479,7 @@ The quality signal offers a counterpoint: the highest-scored drafts in the corpu
\subsection{The Redundancy Problem}
With 42 overlap clusters and 120 competing A2A protocol proposals, the IETF AI/agent space shows significant coordination failure. The OAuth-for-agents cluster alone contains 13 independent proposals, none compatible with each other. This fragmentation wastes engineering effort, confuses implementers, and risks incompatible deployments that entrench rather than resolve the problem.
With 42 overlap clusters and 155 competing A2A protocol proposals, the IETF AI/agent space shows significant coordination failure. The OAuth-for-agents cluster alone contains 13 independent proposals, none compatible with each other. This fragmentation wastes engineering effort, confuses implementers, and risks incompatible deployments that entrench rather than resolve the problem.
We observe that redundancy is partly a natural consequence of the IETF's open submission process---anyone can submit a draft---and partly reflects the ``gold rush'' dynamics where organizations race to establish their preferred approach as the standard. The embedding-based similarity tools developed here could help IETF area directors flag duplicates during triage and actively encourage consolidation.
@@ -484,7 +491,7 @@ This bifurcation extends to the technical foundations. The Chinese bloc tends to
\subsection{Methodological Contributions}
The LLM-assisted analysis pipeline itself represents a methodological contribution. Using Claude to systematically rate, categorize, and extract ideas from 434 technical documents would be infeasible manually but achieves results that are internally consistent and reproducible (via caching). Several design choices merit discussion:
The LLM-assisted analysis pipeline itself represents a methodological contribution. Using Claude to systematically rate, categorize, and extract ideas from 474 technical documents would be infeasible manually but achieves results that are internally consistent and reproducible (via caching). Several design choices merit discussion:
\begin{itemize}[nosep]
\item \textbf{LLM rating validity}: Claude rates based on abstracts and partial full text, which may not capture implementation depth. We mitigate this by using five orthogonal dimensions that capture different quality facets, and by validating that alternative weighting schemes produce highly correlated rankings (Appendix~\ref{app:sensitivity}, Spearman $\rho \geq 0.93$).
@@ -496,7 +503,7 @@ The LLM-assisted analysis pipeline itself represents a methodological contributi
\subsection{Toward an Architectural Vision}
Our analysis suggests that the 11 gaps are not random absences but structurally related. They point to four missing architectural pillars for the AI agent ecosystem:
Our analysis suggests that the 12 gaps are not random absences but structurally related. They point to four missing architectural pillars for the AI agent ecosystem:
\begin{enumerate}[nosep]
\item \textbf{DAG-based execution model}: Multi-agent workflows as directed acyclic graphs with checkpoints, rollback, and blast-radius containment---addressing error recovery, resource management, and coordination gaps.
@@ -514,11 +521,13 @@ Our analysis suggests that the 11 gaps are not random absences but structurally
\begin{itemize}[nosep]
\item \textbf{Keyword bias}: Our twelve seed keywords may miss relevant drafts using different terminology (e.g., ``cognitive computing,'' ``neural network'' in draft names).
\item \textbf{Single-LLM assessment}: Ratings from Claude may carry systematic biases. Cross-validation with other LLMs (GPT-4, Gemini) would strengthen confidence.
\item \textbf{Snapshot analysis}: The dataset reflects a point in time; drafts expire, evolve, and merge continuously.
\item \textbf{Single-LLM assessment}: All ratings come from Claude with no human calibration or inter-rater reliability testing. No intra-rater consistency check was performed. Cross-validation with other LLMs (GPT-4, Gemini) and human expert baselines would strengthen confidence.
\item \textbf{Abstract-level rating}: Ratings are based on abstracts and partial full text (first 4,000 characters), which may not capture implementation depth in longer specifications.
\item \textbf{Snapshot analysis}: The dataset reflects a point in time (March 2026); drafts expire, evolve, and merge continuously.
\item \textbf{False positive filtering}: Despite removing 113 false positives, an estimated 20--30 borderline drafts may remain in the corpus. The filtering threshold (relevance $\leq 2$) is conservative.
\item \textbf{Author disambiguation}: Datatracker affiliations are self-reported and may be inconsistent (e.g., ``Huawei'' vs.\ ``Huawei Technologies'' appear as separate entities).
\item \textbf{No citation analysis}: We do not track inter-draft references, which would reveal influence networks beyond topical similarity.
\item \textbf{Abstract-level assessment}: Rating from abstracts may miss implementation depth in full-text specifications.
\item \textbf{Clustering threshold}: The 0.85 cosine similarity threshold for overlap clustering was chosen empirically without sensitivity analysis across multiple thresholds.
\end{itemize}
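The weighting-scheme sensitivity check cited earlier (alternative schemes yielding Spearman $\rho \geq 0.93$) can be sketched without any special tooling. The weights and scores below are illustrative placeholders, not the paper's actual values.

```python
# Hedged sketch: score drafts under two weighting schemes across five
# rating dimensions, then compare the rankings with Spearman's rho.
import numpy as np

def spearman_rho(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    d = rx - ry
    n = len(x)
    return 1.0 - 6.0 * float(np.sum(d * d)) / (n * (n * n - 1))

rng = np.random.default_rng(0)
scores = rng.uniform(1, 5, size=(50, 5))            # 50 drafts x 5 dimensions
w_default = np.array([0.3, 0.2, 0.2, 0.15, 0.15])   # hypothetical default weights
w_equal = np.full(5, 0.2)                           # equal-weight alternative

rho = spearman_rho(scores @ w_default, scores @ w_equal)
print(f"Spearman rho between weighting schemes: {rho:.3f}")
```

Because both composites are positive combinations of the same dimension scores, the rankings stay highly correlated even as the weights shift, which is the intuition behind the robustness claim.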
% ── 8. Related Work ─────────────────────────────────────────────────────
@@ -531,7 +540,7 @@ Our analysis suggests that the 11 gaps are not random absences but structurally
\textbf{LLM-assisted evaluation.} Zheng et al.~\citep{zheng2023} demonstrate that LLM judges can match human evaluation quality for text assessment. Our pipeline extends this approach from evaluating model outputs to evaluating standards documents, using structured prompts for multi-dimensional rating.
\textbf{Multi-agent systems.} The AAMAS community has long studied multi-agent coordination~\citep{wooldridge2009}. Our analysis reveals that the IETF is now addressing many of the same problems (coordination, trust, resource allocation) but from a protocol standardization perspective rather than an algorithmic one.
\textbf{Multi-agent systems.} The AAMAS community has long studied multi-agent coordination~\citep{wooldridge2009}. The Foundation for Intelligent Physical Agents (FIPA) developed the first agent communication standards (FIPA-ACL, Agent Management, Interaction Protocols) in the late 1990s, which influenced many current IETF proposals~\citep{fipa2002}. IEEE P3394 (Standard for Trustworthy Autonomous and Semi-Autonomous Systems) addresses agent trust from a systems engineering perspective~\citep{ieeep3394}. The W3C Web of Things Architecture~\citep{w3cwot2020} defines discovery and description mechanisms relevant to agent registration. Our analysis reveals that the IETF is now addressing many of the same problems (coordination, trust, resource allocation) but from a protocol standardization perspective rather than an algorithmic one, with limited acknowledgment of this prior work.
% ── 9. Future Work ──────────────────────────────────────────────────────
@@ -550,11 +559,11 @@ Our analysis suggests that the 11 gaps are not random absences but structurally
\section{Conclusion}
The IETF AI/agent standardization wave represents a unique moment in Internet governance: the community is attempting to standardize the infrastructure for autonomous agents concurrently with their deployment. Our analysis of 434 Internet-Drafts from 557 authors reveals a landscape characterized by both extraordinary energy and significant structural problems.
The IETF AI/agent standardization wave represents a unique moment in Internet governance: the community is attempting to standardize the infrastructure for autonomous agents concurrently with their deployment. Our analysis of 474 Internet-Drafts (361 relevant after false-positive filtering) from 710 authors reveals a landscape characterized by both extraordinary energy and significant structural problems.
Three findings demand attention. First, the \textbf{4:1 safety deficit}: the community is building agent capabilities four times faster than safety mechanisms, despite the highest-quality proposals being safety-focused. Second, \textbf{extreme fragmentation}: 120 competing A2A protocol proposals, 13 independent OAuth-for-agents drafts, and 96\% of technical ideas appearing in only one draft indicate that coordination mechanisms are failing to keep pace with submission volume. Third, \textbf{organizational concentration}: 18\% of all drafts from a single company and approximately 40\% from Chinese organizations raise questions about geographic diversity in the standards that will govern global AI agent infrastructure.
Three findings demand attention. First, the \textbf{4:1 safety deficit}: the community is building agent capabilities four times faster than safety mechanisms, despite the highest-quality proposals being safety-focused. Second, \textbf{extreme fragmentation}: 155 competing A2A protocol proposals, 13 independent OAuth-for-agents drafts, and only 33\% cross-organizational convergence among 462 deduplicated ideas indicate that coordination mechanisms are failing to keep pace with submission volume. Third, \textbf{organizational concentration}: 16\% of all drafts from a single company and approximately 40\% from Chinese organizations raise questions about geographic diversity in the standards that will govern global AI agent infrastructure.
The 1,907 technical ideas we extract represent a rich but disorganized design space. The 11 gaps we identify---from behavior verification to human override protocols to cross-protocol translation---highlight where the community's collective blind spots lie. The architectural vision we sketch, building on existing IETF primitives (WIMSE, ECT, OAuth), suggests a path from fragmentation toward coherence.
The 462 deduplicated technical ideas we extract (with 132 showing cross-organizational convergence) represent a rich but disorganized design space. The 12 gaps we identify---from behavior verification to human override protocols to cross-protocol translation---highlight where the community's collective blind spots lie. The architectural vision we sketch, building on existing IETF primitives (WIMSE, ECT, OAuth), suggests a path from fragmentation toward coherence.
The methodology demonstrated here---combining LLM-assisted multi-dimensional rating with embedding-based similarity analysis---is itself a contribution. At \$3.16 in API costs, it provides a scalable, reproducible approach to standards landscape analysis that could be applied to any standards body facing a surge in submissions. As AI standardization accelerates globally, such tools become essential for maintaining coherence and directing limited community attention to the areas that matter most.
@@ -635,6 +644,23 @@ Anthropic.
\newblock Technical report, 2025.
\newblock \url{https://modelcontextprotocol.io}
\bibitem[FIPA(2002)]{fipa2002}
Foundation for Intelligent Physical Agents.
\newblock FIPA Agent Communication Language Specifications.
\newblock FIPA Standard SC00061G, 2002.
\newblock \url{http://www.fipa.org/specs/fipa00061/}
\bibitem[IEEE(2024)]{ieeep3394}
IEEE Standards Association.
\newblock P3394 -- Standard for Trustworthy Autonomous and Semi-Autonomous Systems.
\newblock IEEE, 2024.
\bibitem[W3C(2020)]{w3cwot2020}
W3C Web of Things Working Group.
\newblock Web of Things (WoT) Architecture.
\newblock W3C Recommendation, April 2020.
\newblock \url{https://www.w3.org/TR/wot-architecture/}
\end{thebibliography}
% ── Appendix ─────────────────────────────────────────────────────────────


@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "ietf-draft-analyzer"
version = "0.1.0"
version = "0.3.0"
description = "Track, categorize, and rate AI/agent-related IETF Internet-Drafts"
requires-python = ">=3.11"
dependencies = [

scripts/build-site.py Normal file

@@ -0,0 +1,285 @@
#!/usr/bin/env python3
"""Build static HTML blog site from markdown posts in data/reports/blog-series/."""
import re
from pathlib import Path
import markdown
ROOT = Path(__file__).resolve().parent.parent
POSTS_DIR = ROOT / "data" / "reports" / "blog-series"
OUT_DIR = ROOT / "docs" / "blog"
CSS_DIR = OUT_DIR / "css"
POSTS_OUT = OUT_DIR / "posts"
# Ordered list of posts to include
POSTS = [
("00-series-overview.md", "Series Overview"),
("01-gold-rush.md", "The Gold Rush"),
("02-who-writes-the-rules.md", "Who Writes the Rules"),
("03-oauth-wars.md", "The OAuth Wars"),
("04-what-nobody-builds.md", "What Nobody Builds"),
("05-1262-ideas.md", "Where Drafts Converge"),
("06-big-picture.md", "The Big Picture"),
("07-how-we-built-this.md", "How We Built This"),
("08-agents-building-the-analysis.md", "Agents Building the Agent Analysis"),
]
CSS = """\
:root {
--bg: #ffffff;
--text: #1a1a1a;
--muted: #6b7280;
--border: #e5e7eb;
--accent: #2563eb;
--code-bg: #f3f4f6;
}
@media (prefers-color-scheme: dark) {
:root {
--bg: #111827;
--text: #e5e7eb;
--muted: #9ca3af;
--border: #374151;
--accent: #60a5fa;
--code-bg: #1f2937;
}
}
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', system-ui, sans-serif;
color: var(--text);
background: var(--bg);
line-height: 1.7;
font-size: 17px;
}
.container {
max-width: 720px;
margin: 0 auto;
padding: 2rem 1.5rem;
}
nav {
border-bottom: 1px solid var(--border);
padding: 1rem 0;
margin-bottom: 2rem;
}
nav a {
color: var(--accent);
text-decoration: none;
margin-right: 1.5rem;
font-size: 0.9rem;
}
nav a:hover { text-decoration: underline; }
nav .site-title { font-weight: 700; font-size: 1.1rem; }
h1 { font-size: 2rem; margin: 1.5rem 0 1rem; line-height: 1.2; }
h2 { font-size: 1.5rem; margin: 2rem 0 0.75rem; }
h3 { font-size: 1.2rem; margin: 1.5rem 0 0.5rem; }
p { margin: 0.75rem 0; }
a { color: var(--accent); }
blockquote {
border-left: 3px solid var(--accent);
padding-left: 1rem;
color: var(--muted);
margin: 1rem 0;
}
code {
background: var(--code-bg);
padding: 0.15rem 0.4rem;
border-radius: 3px;
font-size: 0.9em;
}
pre {
background: var(--code-bg);
padding: 1rem;
border-radius: 6px;
overflow-x: auto;
margin: 1rem 0;
}
pre code { background: none; padding: 0; }
table {
width: 100%;
border-collapse: collapse;
margin: 1rem 0;
font-size: 0.95rem;
}
th, td {
padding: 0.5rem 0.75rem;
border: 1px solid var(--border);
text-align: left;
}
th { background: var(--code-bg); font-weight: 600; }
ul, ol { padding-left: 1.5rem; margin: 0.75rem 0; }
li { margin: 0.25rem 0; }
.post-nav {
display: flex;
justify-content: space-between;
margin-top: 3rem;
padding-top: 1rem;
border-top: 1px solid var(--border);
font-size: 0.9rem;
}
.post-list { list-style: none; padding: 0; }
.post-list li { margin: 1rem 0; }
.post-list a { font-size: 1.1rem; font-weight: 500; }
.post-list .desc { color: var(--muted); font-size: 0.9rem; }
footer {
margin-top: 3rem;
padding-top: 1rem;
border-top: 1px solid var(--border);
color: var(--muted);
font-size: 0.85rem;
}
"""
def slug(filename: str) -> str:
return filename.replace(".md", ".html")
def build_nav(current: str = "") -> str:
links = ['<a href="/blog/" class="site-title">IETF AI Agent Analysis</a>']
for fn, title in POSTS[1:]: # skip overview in nav
s = slug(fn)
if fn == current:
links.append(f"<strong>{title.split()[-1]}</strong>")
else:
links.append(f'<a href="/blog/posts/{s}">{title.split()[-1]}</a>')
return "<nav>" + "\n".join(links) + "</nav>"
def build_post_nav(idx: int) -> str:
parts = []
if idx > 0:
prev_fn, prev_title = POSTS[idx - 1]
parts.append(f'<a href="/blog/posts/{slug(prev_fn)}">&larr; {prev_title}</a>')
else:
parts.append("<span></span>")
if idx < len(POSTS) - 1:
next_fn, next_title = POSTS[idx + 1]
parts.append(f'<a href="/blog/posts/{slug(next_fn)}">{next_title} &rarr;</a>')
else:
parts.append("<span></span>")
return f'<div class="post-nav">{parts[0]}{parts[1]}</div>'
def wrap_html(title: str, body: str, nav: str, post_nav: str = "") -> str:
return f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>{title} — IETF AI Agent Analysis</title>
<link rel="stylesheet" href="/blog/css/style.css">
</head>
<body>
<div class="container">
{nav}
{body}
{post_nav}
<footer>
<p>IETF Draft Analyzer &mdash; Data collected through March 2026.
<a href="https://github.com/cnennemann/ietf-draft-analyzer">Source on GitHub</a></p>
</footer>
</div>
</body>
</html>"""
def extract_title(md_text: str) -> str:
"""Extract first H1 from markdown."""
m = re.search(r"^#\s+(.+)$", md_text, re.MULTILINE)
return m.group(1).strip() if m else ""  # empty string so callers can fall back to the configured title
def main():
md_ext = markdown.Markdown(extensions=["tables", "fenced_code", "toc"])
# Create output dirs
CSS_DIR.mkdir(parents=True, exist_ok=True)
POSTS_OUT.mkdir(parents=True, exist_ok=True)
# Write CSS
(CSS_DIR / "style.css").write_text(CSS)
# Write .nojekyll
(OUT_DIR / ".nojekyll").write_text("")
# Build each post
for idx, (fn, fallback_title) in enumerate(POSTS):
src = POSTS_DIR / fn
if not src.exists():
print(f" SKIP {fn} (not found)")
continue
md_text = src.read_text()
md_ext.reset()
html_body = md_ext.convert(md_text)
title = extract_title(md_text) or fallback_title
nav = build_nav(fn)
post_nav = build_post_nav(idx)
full_html = wrap_html(title, html_body, nav, post_nav)
out_path = POSTS_OUT / slug(fn)
out_path.write_text(full_html)
print(f" BUILT {out_path.relative_to(ROOT)}")
# Build index page
post_links = []
for i, (fn, title) in enumerate(POSTS):
if i == 0:
continue # skip overview in index list
post_links.append(
f'<li><a href="/blog/posts/{slug(fn)}">Post {i}: {title}</a></li>'
)
index_body = f"""
<h1>The AI Agent Standards Gold Rush</h1>
<p><em>A data-driven analysis of 475 IETF Internet-Drafts on AI agents, autonomous systems, and machine learning protocols.</em></p>
<p>The IETF is experiencing an unprecedented surge in AI/agent standardization activity.
We built an automated analysis pipeline to make sense of it: 713 authors, 501 ideas,
132 cross-organizational convergent ideas, and 12 identified gaps.</p>
<h2>The Series</h2>
<ul class="post-list">
{"".join(post_links)}
</ul>
<h2>About</h2>
<p>This analysis was produced using the <a href="https://github.com/cnennemann/ietf-draft-analyzer">IETF Draft Analyzer</a>,
an open-source Python tool that combines Claude for multi-dimensional rating and idea extraction
with Ollama for semantic embeddings. Total API cost: ~$9-15.</p>
<p><a href="/blog/posts/{slug(POSTS[0][0])}">Read the series overview &rarr;</a></p>
"""
index_html = wrap_html("Home", index_body, build_nav())
(OUT_DIR / "index.html").write_text(index_html)
print(f" BUILT docs/blog/index.html")
print(f"\nDone. {len(POSTS) + 1} pages built in docs/blog/")
if __name__ == "__main__":
main()


@@ -0,0 +1,135 @@
#!/usr/bin/env python3
"""Compare Claude Haiku vs Ollama as pre-classifiers, using Claude Sonnet ratings as ground truth."""
import sqlite3
import hashlib
import json
import sys
import time
sys.path.insert(0, "src")
import anthropic
from ietf_analyzer.config import Config
cfg = Config.load()
conn = sqlite3.connect(cfg.db_path)
conn.row_factory = sqlite3.Row
HAIKU_PROMPT = """\
You are classifying IETF Internet-Drafts for an AI/agent standards tracker.
A draft is RELEVANT if it relates to ANY of these topics:
- AI agents, autonomous agents, multi-agent systems
- Agent identity, authentication, authorization, discovery
- Agent-to-agent (A2A) communication protocols
- Large language models (LLMs), generative AI
- Machine learning in network operations
- AI safety, alignment, trustworthiness
- Model Context Protocol (MCP), agentic workflows
- OAuth/JWT/credentials for agents or AI systems
- Autonomous network operations using AI
- Intelligent network management or traffic handling
A draft is NOT relevant if it only covers:
- Pure cryptography without AI/agent context
- General networking protocols (BGP, DNS, TLS) without AI
- Email, HTTP, or web standards without AI/agent features
- Remote attestation (RATS) unless specifically for AI agents
- Accessibility guidelines for user agents (browsers)
Title: {title}
Abstract: {abstract}
Is this draft relevant to AI agents or related topics? Answer ONLY "yes" or "no"."""
client = anthropic.Anthropic()
def haiku_classify(title, abstract):
"""Classify with Haiku, using llm_cache to avoid repeat calls."""
prompt = HAIKU_PROMPT.format(title=title, abstract=abstract[:2000])
cache_key = hashlib.sha256(f"haiku-classify:{prompt}".encode()).hexdigest()
# Check cache
cached = conn.execute("SELECT response_json FROM llm_cache WHERE prompt_hash=?", (cache_key,)).fetchone()
if cached:
return cached["response_json"].strip().lower().startswith("yes"), True
resp = client.messages.create(
model=cfg.claude_model_cheap,
max_tokens=10,
messages=[{"role": "user", "content": prompt}],
)
answer = resp.content[0].text.strip().lower()
# Cache it
conn.execute(
"INSERT OR REPLACE INTO llm_cache (draft_name, prompt_hash, request_json, response_json, model, input_tokens, output_tokens) VALUES (?,?,?,?,?,?,?)",
("_classify_", cache_key, prompt[:500], answer, cfg.claude_model_cheap, resp.usage.input_tokens, resp.usage.output_tokens),
)
conn.commit()
return answer.startswith("yes"), False
# Get all rated drafts
rows = conn.execute("""
SELECT d.name, d.title, d.abstract, r.relevance, r.false_positive
FROM drafts d JOIN ratings r ON d.name = r.draft_name
WHERE d.abstract IS NOT NULL AND d.abstract != ''
ORDER BY d.name
""").fetchall()
print(f"Classifying {len(rows)} drafts with Haiku...\n")
haiku_agree = 0
haiku_fp = [] # Haiku=yes, Claude=no
haiku_fn = [] # Haiku=no, Claude=yes
cached_count = 0
api_count = 0
for i, r in enumerate(rows):
claude_relevant = not r["false_positive"] and r["relevance"] >= 3
haiku_relevant, was_cached = haiku_classify(r["title"], r["abstract"])
if was_cached:
cached_count += 1
else:
api_count += 1
if api_count % 20 == 0:
time.sleep(1) # rate limit
if haiku_relevant == claude_relevant:
haiku_agree += 1
elif haiku_relevant and not claude_relevant:
haiku_fp.append({"name": r["name"], "title": r["title"][:60], "rel": r["relevance"], "fp": r["false_positive"]})
else:
haiku_fn.append({"name": r["name"], "title": r["title"][:60], "rel": r["relevance"], "fp": r["false_positive"]})
if (i + 1) % 50 == 0:
print(f" Processed {i+1}/{len(rows)} ({cached_count} cached, {api_count} API calls)...")
print(f"\n{'='*70}")
print(f"HAIKU AGREEMENT with Claude Sonnet: {haiku_agree}/{len(rows)} ({100*haiku_agree/len(rows):.1f}%)")
print(f"API calls: {api_count}, Cached: {cached_count}")
print(f"{'='*70}")
print(f"\nHaiku=RELEVANT but Sonnet=NOT ({len(haiku_fp)}):")
for d in haiku_fp[:10]:
fp = " [FP]" if d["fp"] else ""
print(f" rel={d['rel']}{fp} | {d['name']}: {d['title']}")
print(f"\nHaiku=IRRELEVANT but Sonnet=RELEVANT ({len(haiku_fn)}):")
for d in haiku_fn[:10]:
print(f" rel={d['rel']} | {d['name']}: {d['title']}")
# Cost estimate
avg_tokens_per_call = 800 # ~800 input tokens per classification
cost_per_draft = (avg_tokens_per_call * 0.80 + 50 * 4.0) / 1_000_000 # Haiku pricing
print(f"\n{'='*70}")
print(f"Cost estimate: ~${cost_per_draft:.5f}/draft = ~${cost_per_draft * len(rows):.3f} for {len(rows)} drafts")
print(f"Ollama cost: $0 (but 66.9% agreement)")
print(f"Haiku cost: ~${cost_per_draft * len(rows):.3f} ({100*haiku_agree/len(rows):.1f}% agreement)")
conn.close()
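Beyond raw agreement, the disagreement buckets the script collects (`haiku_fp`, `haiku_fn`) support precision/recall for the cheap classifier against Sonnet as ground truth. This companion sketch is not part of the repo file; the counts are illustrative.

```python
# Hedged sketch: derive precision/recall for the pre-classifier from
# confusion counts, treating the Sonnet ratings as ground truth.
def precision_recall(true_pos: int, false_pos: int, false_neg: int):
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall

# Illustrative counts, not measured values.
p, r = precision_recall(true_pos=300, false_pos=25, false_neg=15)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.92 recall=0.95
```

For a pre-filter, recall matters most: a false negative drops a relevant draft before the expensive rating pass ever sees it.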


@@ -0,0 +1,73 @@
#!/usr/bin/env python3
"""Fetch from all 5 sources and import into DB."""
import sys
sys.path.insert(0, "src")
from ietf_analyzer.config import Config
from ietf_analyzer.sources import get_fetcher
from ietf_analyzer.db import Database
from ietf_analyzer.models import Draft
from rich.console import Console
console = Console()
cfg = Config.load()
db = Database(cfg)
# Only fetch from new sources (IETF and W3C already done recently)
sources_to_fetch = ["etsi", "itu", "iso"]
total_new = 0
for source_name in sources_to_fetch:
console.print(f"\n[bold blue]{'='*60}[/]")
console.print(f"[bold blue]Fetching from {source_name.upper()}...[/]")
console.print(f"[bold blue]{'='*60}[/]")
fetcher = get_fetcher(source_name, cfg)
try:
docs = fetcher.search(cfg.search_keywords)
console.print(f" Found {len(docs)} documents")
new_count = 0
for doc in docs:
existing = db.get_draft(doc.name)
if existing:
continue
new_count += 1
# Convert to Draft
draft = Draft(
name=doc.name,
rev="01",
title=doc.title,
abstract=doc.abstract,
source=doc.source,
source_id=doc.source_id,
source_url=doc.source_url,
time=doc.time,
doc_status=doc.doc_status,
full_text=doc.full_text,
)
db.upsert_draft(draft)
console.print(f" [green]Imported {new_count} new documents[/]")
total_new += new_count
except Exception as e:
console.print(f" [red]Error: {e}[/]")
import traceback
traceback.print_exc()
finally:
fetcher.close()
console.print(f"\n[bold green]Total new documents: {total_new}[/]")
# Final stats
import sqlite3
conn = sqlite3.connect(cfg.db_path)
rows = conn.execute("SELECT source, COUNT(*) FROM drafts GROUP BY source ORDER BY source").fetchall()
console.print("\n[bold]Database by source:[/]")
for source, count in rows:
console.print(f" {source}: {count}")
total = conn.execute("SELECT COUNT(*) FROM drafts").fetchone()[0]
console.print(f" [bold]Total: {total}[/]")
conn.close()


@@ -20,7 +20,7 @@ def _get_config() -> Config:
@click.group()
@click.version_option(version="0.2.0")
@click.version_option(version="0.3.0")
@click.pass_context
def main(ctx):
"""IETF Draft Analyzer — track, categorize, and rate AI/agent Internet-Drafts."""


@@ -52,7 +52,7 @@ class Config:
# Observatory — add "w3c" to enable W3C spec tracking:
# ietf observatory update --source w3c (one-off)
# or set observatory_sources to ["ietf", "w3c"] in config.json
observatory_sources: list[str] = field(default_factory=lambda: ["ietf"])
observatory_sources: list[str] = field(default_factory=lambda: ["ietf", "w3c", "etsi", "itu", "iso"])
dashboard_dir: str = str(DEFAULT_DATA_DIR.parent / "docs")
w3c_groups: list[str] = field(default_factory=lambda: [
"webmachinelearning", "wot", "credentials", "did", "vc"


@@ -1,10 +1,19 @@
"""Multi-source document fetcher registry."""
from .base import SourceDocument, SourceFetcher
from .etsi import ETSIFetcher
from .ietf import IETFFetcher
from .iso import ISOFetcher
from .itu import ITUFetcher
from .w3c import W3CFetcher
FETCHERS = {"ietf": IETFFetcher, "w3c": W3CFetcher}
FETCHERS = {
"ietf": IETFFetcher,
"w3c": W3CFetcher,
"etsi": ETSIFetcher,
"itu": ITUFetcher,
"iso": ISOFetcher,
}
def get_fetcher(source_name: str, config=None):


@@ -0,0 +1,191 @@
"""Fetch AI-related specs from ETSI (no auth needed, free PDFs).
ETSI has no REST API — we scrape the standards search page and download PDFs.
Focus on SAI (Securing AI) and ENI (Experiential Networked Intelligence) groups.
"""
from __future__ import annotations
import re
import time as time_mod
import httpx
from rich.console import Console
from rich.progress import (
BarColumn,
MofNCompleteColumn,
Progress,
SpinnerColumn,
TextColumn,
)
from ..config import Config
from .base import SourceDocument
console = Console()
# ETSI portal search endpoint (returns HTML)
ETSI_SEARCH_URL = "https://www.etsi.org/standards-search"
# Known AI-relevant ETSI technical bodies and their spec prefixes
ETSI_AI_GROUPS = {
"SAI": "Securing Artificial Intelligence",
"ENI": "Experiential Networked Intelligence",
}
# Direct catalog of known ETSI AI specs (bootstrap — extend via search)
# Format: (doc_id, title, group, url_path)
ETSI_AI_CATALOG = [
# SAI — Securing AI
("ETSI GR SAI 001", "AI Threat Ontology", "SAI",
"etsi_gr/SAI/001_099/001/01.01.01_60/gr_SAI001v010101p.pdf"),
("ETSI GR SAI 002", "Data Supply Chain Security", "SAI",
"etsi_gr/SAI/001_099/002/01.01.01_60/gr_SAI002v010101p.pdf"),
("ETSI GR SAI 003", "Security Testing of AI", "SAI",
"etsi_gr/SAI/001_099/003/01.01.01_60/gr_SAI003v010101p.pdf"),
("ETSI GR SAI 004", "Problem Statement on AI and Automated Decision Making", "SAI",
"etsi_gr/SAI/001_099/004/01.01.01_60/gr_SAI004v010101p.pdf"),
("ETSI GR SAI 005", "Mitigation Strategy Report", "SAI",
"etsi_gr/SAI/001_099/005/01.01.01_60/gr_SAI005v010101p.pdf"),
("ETSI GR SAI 006", "Role of Hardware in AI Security", "SAI",
"etsi_gr/SAI/001_099/006/01.01.01_60/gr_SAI006v010101p.pdf"),
("ETSI EN 304 223", "Baseline Cyber Security Requirements for AI Models and Systems", "SAI",
"etsi_en/304200_304299/304223/02.01.01_60/en_304223v020101p.pdf"),
# ENI — Experiential Networked Intelligence
("ETSI GS ENI 001", "ENI Use Cases", "ENI",
"etsi_gs/ENI/001_099/001/03.01.01_60/gs_ENI001v030101p.pdf"),
("ETSI GS ENI 002", "ENI Requirements", "ENI",
"etsi_gs/ENI/001_099/002/03.01.01_60/gs_ENI002v030101p.pdf"),
("ETSI GS ENI 005", "System Architecture", "ENI",
"etsi_gs/ENI/001_099/005/02.01.01_60/gs_ENI005v020101p.pdf"),
("ETSI GR ENI 007", "ENI Definition of Categories for AI Application to Networks", "ENI",
"etsi_gr/ENI/001_099/007/01.01.01_60/gr_ENI007v010101p.pdf"),
("ETSI GS ENI 019", "Representing, Inferring, and Applying Context Information", "ENI",
"etsi_gs/ENI/001_099/019/02.01.01_60/gs_ENI019v020101p.pdf"),
]
ETSI_DELIVER_BASE = "https://www.etsi.org/deliver/"
def _doc_id_to_name(doc_id: str) -> str:
"""Convert ETSI doc ID to a slug name. E.g. 'ETSI GR SAI 001' -> 'etsi-gr-sai-001'."""
return doc_id.lower().replace(" ", "-").replace("/", "-")
class ETSIFetcher:
"""Fetch AI-related specs from ETSI.
Uses a curated catalog of known SAI/ENI specs plus keyword search
on the ETSI portal for discovery.
"""
def __init__(self, config: Config | None = None):
self.config = config or Config.load()
self.client = httpx.Client(timeout=30, follow_redirects=True)
def search(
self, keywords: list[str], since: str | None = None
) -> list[SourceDocument]:
"""Return AI-relevant ETSI specs from catalog + keyword search."""
seen: dict[str, SourceDocument] = {}
# Strategy 1: Curated catalog of known AI specs
console.print(" Loading ETSI AI catalog (SAI + ENI)...")
for doc_id, title, group, url_path in ETSI_AI_CATALOG:
name = _doc_id_to_name(doc_id)
url = f"{ETSI_DELIVER_BASE}{url_path}"
seen[name] = SourceDocument(
name=name,
title=f"{doc_id}: {title}",
abstract=f"ETSI {group} specification: {title}",
source="etsi",
source_id=doc_id,
source_url=url,
time="",
doc_status="published",
extra={"group": group},
)
# Strategy 2: Search ETSI portal for additional AI specs
console.print(" Searching ETSI portal for AI-related specs...")
search_terms = ["artificial intelligence", "machine learning", "autonomous", "neural network"]
for term in search_terms:
try:
resp = self.client.get(
ETSI_SEARCH_URL,
params={"search": term, "page": 1, "size": 50, "sort": "date_desc"},
headers={"Accept": "text/html"},
)
if resp.status_code == 200:
# Parse titles and links from search results
new_docs = self._parse_search_results(resp.text, keywords)
for doc in new_docs:
if doc.name not in seen:
seen[doc.name] = doc
time_mod.sleep(0.5)
except httpx.HTTPError as e:
console.print(f"[yellow]ETSI search error for '{term}': {e}[/]")
console.print(f" Found [bold green]{len(seen)}[/] ETSI specs")
return list(seen.values())
def _parse_search_results(self, html: str, keywords: list[str]) -> list[SourceDocument]:
"""Parse ETSI search results HTML for spec links and titles."""
docs = []
# ETSI search results contain links like /deliver/etsi_gr/SAI/...
# and titles in specific patterns. This is best-effort parsing.
kw_lower = [k.lower() for k in keywords]
# Look for PDF download links in the HTML
pattern = re.compile(
r'href="(/deliver/[^"]+\.pdf)"[^>]*>.*?</a>.*?'
r'<[^>]*class="[^"]*title[^"]*"[^>]*>([^<]+)',
re.DOTALL | re.IGNORECASE,
)
for match in pattern.finditer(html):
url_path = match.group(1)
title = match.group(2).strip()
if not any(kw in title.lower() for kw in kw_lower):
continue
# Extract doc ID from path
doc_id = url_path.split("/")[-1].replace(".pdf", "").replace("_", " ").upper()
name = f"etsi-{url_path.split('/')[-1].replace('.pdf', '').lower()}"
docs.append(SourceDocument(
name=name,
title=title,
abstract=title,
source="etsi",
source_id=doc_id,
source_url=f"https://www.etsi.org{url_path}",
time="",
doc_status="published",
))
return docs
def download_text(self, doc: SourceDocument) -> str | None:
"""Download PDF and extract text (best-effort)."""
url = doc.source_url
if not url or not url.endswith(".pdf"):
return None
try:
resp = self.client.get(url)
resp.raise_for_status()
# We get a PDF — try to extract text with pdfminer if available
try:
from io import BytesIO
from pdfminer.high_level import extract_text
text = extract_text(BytesIO(resp.content))
return text[:100000] if text else None
except ImportError:
# No pdfminer — store as placeholder
console.print(f"[dim]pdfminer not installed, cannot extract text from {doc.name}[/]")
return f"[PDF document: {doc.title}. Install pdfminer.six to extract text.]"
except httpx.HTTPError as e:
console.print(f"[dim]Could not download {doc.name}: {e}[/]")
return None
def close(self) -> None:
self.client.close()
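The HTML parsing in `_parse_search_results` is best-effort; here is a self-contained sketch of the same regex against fabricated markup (the sample HTML is an assumption for illustration, and real ETSI result pages may differ):

```python
import re

# Same pattern shape as _parse_search_results: a /deliver/ PDF link
# followed later by an element whose class contains "title".
pattern = re.compile(
    r'href="(/deliver/[^"]+\.pdf)"[^>]*>.*?</a>.*?'
    r'<[^>]*class="[^"]*title[^"]*"[^>]*>([^<]+)',
    re.DOTALL | re.IGNORECASE,
)

# Fabricated sample result row, not real ETSI markup.
sample = (
    '<a href="/deliver/etsi_gr/SAI/001_099/001/01.01.01_60/gr_SAI001v010101p.pdf">PDF</a>'
    '<span class="result-title">AI Threat Ontology</span>'
)

m = pattern.search(sample)
print(m.group(1))  # → /deliver/etsi_gr/SAI/001_099/001/01.01.01_60/gr_SAI001v010101p.pdf
print(m.group(2))  # → AI Threat Ontology
```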

View File

@@ -0,0 +1,196 @@
"""Fetch AI-related standards metadata from ISO/IEC JTC 1/SC 42.
ISO provides open data (JSON/CSV/Parquet) for metadata but full text is paywalled.
We use the open data portal to catalog SC 42 standards and supplement with
publicly available scope/abstract from the ISO Online Browsing Platform (OBP).
"""
from __future__ import annotations
import csv
import io
import re
import time as time_mod
import httpx
from rich.console import Console
from ..config import Config
from .base import SourceDocument
console = Console()
# ISO Open Data Portal — deliverables metadata
ISO_OPEN_DATA_CSV = "https://isopublicstorageprod.blob.core.windows.net/opendata/_latest/iso_deliverables_metadata/csv/iso_deliverables_metadata.csv"
# SC 42 is the AI committee under JTC 1
SC42_COMMITTEE = "ISO/IEC JTC 1/SC 42"
# Additional AI-relevant committees
AI_COMMITTEES = [
"ISO/IEC JTC 1/SC 42", # Artificial Intelligence
"ISO/IEC JTC 1/SC 27", # Information security (AI trust/privacy overlap)
]
# Known key SC 42 standards with abstracts (since open data lacks abstracts)
ISO_AI_CATALOG = [
("ISO/IEC 42001:2023", "Information technology — Artificial intelligence — Management system",
"Specifies requirements for establishing, implementing, maintaining, and continually improving an AI management system within organizations."),
("ISO/IEC 22989:2022", "Information technology — Artificial intelligence — AI concepts and terminology",
"Establishes terminology and describes concepts in the field of artificial intelligence."),
("ISO/IEC 23894:2023", "Information technology — Artificial intelligence — Guidance on risk management",
"Provides guidance on managing risk related to development and use of AI systems."),
("ISO/IEC 23053:2022", "Framework for Artificial Intelligence (AI) Systems Using Machine Learning (ML)",
"Establishes a framework describing a generic AI system using ML technology."),
("ISO/IEC 38507:2022", "Information technology — Governance of IT — Governance implications of the use of AI",
"Provides guidance on the governance implications of AI for governing bodies of organizations."),
("ISO/IEC 5338:2023", "Information technology — AI system life cycle processes",
"Defines processes and their associated activities for the development and operation of AI systems."),
("ISO/IEC 5339:2024", "Information technology — AI — Guidance for AI applications",
"Provides guidance on how to apply AI within organizations, including use case analysis and risk assessment."),
("ISO/IEC 42005:2025", "Information technology — AI — AI system impact assessment",
"Provides guidance for assessing the potential positive and negative impacts of AI systems."),
("ISO/IEC TR 24028:2020", "Overview of trustworthiness in artificial intelligence",
"Provides an overview of trustworthiness in AI, including risks, challenges, and approaches."),
("ISO/IEC TR 24029-1:2021", "Assessment of the robustness of neural networks — Part 1: Overview",
"Provides background on properties of neural network robustness and methods for assessment."),
("ISO/IEC TR 24030:2021", "Artificial intelligence — Use cases",
"Provides a collection of representative use cases of AI applications across various domains."),
("ISO/IEC TS 6254:2024", "Objectives and approaches for explainability of ML models and AI systems",
"Describes objectives and approaches for explainability of machine learning models and AI systems."),
("ISO/IEC 12792:2024", "Transparency taxonomy of AI systems",
"Establishes a transparency taxonomy for AI systems to support understanding and governance."),
]
ISO_OBP_URL = "https://www.iso.org/standard/"
def _iso_id_to_name(iso_id: str) -> str:
"""Convert ISO ID to slug. E.g. 'ISO/IEC 42001:2023' -> 'iso-iec-42001-2023'."""
slug = iso_id.lower().replace("/", "-").replace(" ", "-").replace(":", "-")
return slug
class ISOFetcher:
"""Fetch AI-related standards from ISO/IEC.
Combines:
1. Open data CSV for discovering SC 42 standards
2. Curated catalog with known abstracts
3. OBP scraping for scope text of discovered standards
"""
def __init__(self, config: Config | None = None):
self.config = config or Config.load()
self.client = httpx.Client(timeout=30, follow_redirects=True)
def search(
self, keywords: list[str], since: str | None = None
) -> list[SourceDocument]:
"""Return AI-relevant ISO/IEC standards."""
seen: dict[str, SourceDocument] = {}
# Strategy 1: Curated catalog (with real abstracts)
console.print(" Loading ISO/IEC SC 42 catalog...")
for iso_id, title, abstract in ISO_AI_CATALOG:
name = _iso_id_to_name(iso_id)
seen[name] = SourceDocument(
name=name,
title=f"{iso_id}: {title}",
abstract=abstract,
source="iso",
source_id=iso_id,
# Best-effort URL: the ISO site keys standard pages by an internal numeric ID, so this constructed link may redirect or 404.
source_url=f"https://www.iso.org/standard/{iso_id.split(':')[0].replace('/', '%2F').replace(' ', '%20')}.html",
time=self._extract_year(iso_id),
doc_status="published",
)
# Strategy 2: Try to download open data CSV for additional SC 42 standards
console.print(" Fetching ISO open data for SC 42 standards...")
open_data_docs = self._fetch_open_data(keywords, since)
for doc in open_data_docs:
if doc.name not in seen:
seen[doc.name] = doc
console.print(f" Found [bold green]{len(seen)}[/] ISO/IEC AI standards")
return list(seen.values())
def _extract_year(self, iso_id: str) -> str:
"""Extract year from ISO ID like 'ISO/IEC 42001:2023'."""
if ":" in iso_id:
return iso_id.split(":")[-1]
return ""
def _fetch_open_data(self, keywords: list[str], since: str | None) -> list[SourceDocument]:
"""Fetch ISO open data CSV and filter for AI standards."""
docs = []
try:
console.print(" Downloading ISO deliverables catalog (CSV)...")
resp = self.client.get(ISO_OPEN_DATA_CSV, timeout=60)
resp.raise_for_status()
reader = csv.DictReader(io.StringIO(resp.text))
ai_keywords = ["artificial intelligence", "machine learning", "neural network",
"ai system", "trustworth", "autonomous"]
for row in reader:
title = row.get("title.en", "")
committee = row.get("ownerCommittee", "")
ref = row.get("reference", "")
status = row.get("currentStage", "")
pub_date = row.get("publicationDate", "")
scope = row.get("scope.en", "")
# Filter: SC 42 committee OR AI keywords in title
is_sc42 = "SC 42" in committee
has_ai_keyword = any(kw in title.lower() for kw in ai_keywords)
if not (is_sc42 or has_ai_keyword):
continue
if since and pub_date and pub_date < since:
continue
name = _iso_id_to_name(ref)
docs.append(SourceDocument(
name=name,
title=f"{ref}: {title}",
abstract=f"ISO/IEC standard: {title}. Committee: {committee}. Status: {status}.",
source="iso",
source_id=ref,
source_url=f"https://www.iso.org/standard/{ref.split(':')[0].replace('/', '%2F').replace(' ', '%20')}.html",
time=pub_date or self._extract_year(ref),
doc_status=status.lower() if status else "published",
))
except httpx.HTTPError as e:
console.print(f"[yellow]Could not fetch ISO open data: {e}[/]")
except Exception as e:
console.print(f"[yellow]Error parsing ISO CSV: {e}[/]")
return docs
def download_text(self, doc: SourceDocument) -> str | None:
"""ISO full text is paywalled. Return abstract/scope only."""
# Try to scrape scope from ISO website
iso_id = doc.source_id.split(":")[0] # e.g. "ISO/IEC 42001"
try:
# The OBP has scope text for some standards
url = f"https://www.iso.org/standard/{iso_id.replace('/', '%2F').replace(' ', '%20')}.html"
resp = self.client.get(url)
if resp.status_code == 200:
# Extract scope/abstract from page
scope_match = re.search(
r'<div[^>]*id="item-abstract"[^>]*>(.*?)</div>',
resp.text, re.DOTALL,
)
if scope_match:
scope = re.sub(r'<[^>]+>', '', scope_match.group(1)).strip()
if len(scope) > 30:
return scope[:5000]
time_mod.sleep(0.5)
except httpx.HTTPError:
pass
return None
def close(self) -> None:
self.client.close()
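The OBP scope scrape in `download_text` hinges on an `item-abstract` div; a minimal sketch of the extraction against fabricated HTML (an assumption for illustration; real OBP markup may differ):

```python
import re

# Fabricated OBP-style fragment: abstract text inside id="item-abstract".
html = (
    '<div class="wrap" id="item-abstract"><p>This document specifies '
    'requirements for an AI management system.</p></div>'
)

# Grab the div body, then strip any remaining tags, as download_text does.
m = re.search(r'<div[^>]*id="item-abstract"[^>]*>(.*?)</div>', html, re.DOTALL)
scope = re.sub(r"<[^>]+>", "", m.group(1)).strip()
print(scope)  # → This document specifies requirements for an AI management system.
```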

View File

@@ -0,0 +1,193 @@
"""Fetch AI-related recommendations from ITU-T (free PDFs, no API).
ITU-T has no REST API. We use:
1. A curated catalog of known AI-related recommendations (Y-series, X-series)
2. The ITU handle system for metadata
3. Direct PDF downloads from itu.int
"""
from __future__ import annotations
import re
import time as time_mod
import httpx
from rich.console import Console
from ..config import Config
from .base import SourceDocument
console = Console()
# Known AI-relevant ITU-T Recommendations
# Source: ITU-T Study Groups 13 (Future Networks), 16 (Multimedia), 17 (Security), 20 (IoT)
# Format: (rec_id, title, series_topic)
ITU_AI_CATALOG = [
# Y-series: Global information infrastructure, cloud computing, AI
("Y.3172", "Architectural framework for machine learning in future networks including IMT-2020",
"AI/ML architecture"),
("Y.3173", "Framework for evaluating intelligence levels of future networks including IMT-2020",
"AI/ML architecture"),
("Y.3174", "Framework for data handling to enable machine learning in future networks including IMT-2020",
"AI/ML architecture"),
("Y.3176", "Machine learning marketplace integration in future networks including IMT-2020",
"AI/ML architecture"),
("Y.3177", "Architectural framework of AI-as-a-Service to enable AI services in future networks",
"AI/ML architecture"),
("Y.3178", "Requirements and framework of federated machine learning",
"Federated learning"),
("Y.3179", "Architectural framework for AI-based network automation",
"Network automation"),
("Y.3180", "Framework for multi-domain ML pipeline in future networks",
"AI/ML architecture"),
("Y.3181", "Architectural framework for trustworthy networking based on machine learning technology",
"Trustworthy AI"),
("Y.3530", "Cloud computing — Overview of machine learning in future networks",
"Cloud AI"),
("Y.3531", "Cloud computing — Functional architecture for machine learning as a service",
"Cloud AI"),
("Y.4464", "Framework for IoT-area network using autonomous agents",
"IoT agents"),
("Y.4907", "Reference architecture for intelligent transportation systems communication network using AI",
"AI transport"),
# X-series: Security
("X.1381", "Framework for AI risk assessment in telecommunication networks",
"AI security"),
("X.1382", "Security requirements for AI-based solutions in telecommunication networks",
"AI security"),
("X.1383", "Assessment criteria for trustworthiness of AI-based telecommunication services",
"Trustworthy AI"),
("X.1384", "Security threats and risk treatment measures for AI-based telecommunication systems",
"AI security"),
# Focus Group deliverables (FG-AI4H etc.)
("FG-AI4H DEL01", "AI for Health: Ethics and Governance",
"AI health"),
("FG-AI4H DEL02", "AI for Health: Data handling",
"AI health"),
("FG-AI4H DEL7.2", "AI for Health: Clinical evaluation of AI",
"AI health"),
]
ITU_REC_BASE = "https://www.itu.int/rec/T-REC-"
ITU_HANDLE_BASE = "https://handle.itu.int/11.1002/1000"
def _rec_to_name(rec_id: str) -> str:
"""Convert ITU-T rec ID to slug. E.g. 'Y.3172' -> 'itu-t-y-3172'."""
slug = rec_id.lower().replace(".", "-").replace(" ", "-").replace("/", "-")
return f"itu-t-{slug}"
def _rec_to_url(rec_id: str) -> str:
"""Best-effort URL for an ITU-T recommendation."""
if rec_id.startswith("FG-"):
# Focus group deliverables have different URL patterns
return f"https://www.itu.int/en/ITU-T/focusgroups/{rec_id.split()[0].lower()}/Pages/default.aspx"
# Standard recommendations: T-REC-Y.3172
return f"{ITU_REC_BASE}{rec_id}"
class ITUFetcher:
"""Fetch AI-related specs from ITU-T.
Uses a curated catalog since ITU-T has no search API.
Can be extended by scraping ITU-T work programme pages.
"""
def __init__(self, config: Config | None = None):
self.config = config or Config.load()
self.client = httpx.Client(timeout=30, follow_redirects=True)
def search(
self, keywords: list[str], since: str | None = None
) -> list[SourceDocument]:
"""Return AI-relevant ITU-T recommendations from curated catalog."""
console.print(" Loading ITU-T AI recommendation catalog...")
docs: list[SourceDocument] = []
for rec_id, title, topic in ITU_AI_CATALOG:
name = _rec_to_name(rec_id)
url = _rec_to_url(rec_id)
docs.append(SourceDocument(
name=name,
title=f"ITU-T {rec_id}: {title}",
abstract=f"ITU-T Recommendation {rec_id} on {topic}: {title}",
source="itu",
source_id=rec_id,
source_url=url,
time="",
doc_status="published",
extra={"topic": topic},
))
# Try to enrich with actual metadata from ITU website
console.print(f" Fetching metadata for {len(docs)} ITU-T recommendations...")
enriched = 0
for doc in docs:
if self._enrich_metadata(doc):
enriched += 1
time_mod.sleep(0.3)
console.print(f" Found [bold green]{len(docs)}[/] ITU-T specs ({enriched} with metadata)")
return docs
def _enrich_metadata(self, doc: SourceDocument) -> bool:
"""Try to fetch real abstract/date from ITU website."""
rec_id = doc.source_id
if rec_id.startswith("FG-"):
return False # Focus groups have different structure
try:
# Try the recommendation page for scope/abstract
url = f"https://www.itu.int/ITU-T/recommendations/rec.aspx?rec={rec_id}"
resp = self.client.get(url)
if resp.status_code == 200:
html = resp.text
# Extract summary/scope from page
scope_match = re.search(
r'(?:Summary|Scope|Abstract)[:\s]*</[^>]+>\s*(.+?)(?:</|<br|<p)',
html, re.DOTALL | re.IGNORECASE,
)
if scope_match:
scope = re.sub(r'<[^>]+>', '', scope_match.group(1)).strip()
if len(scope) > 50:
doc.abstract = scope[:2000]
# Extract date
date_match = re.search(
r'(?:Approved|Published)[:\s]*(\d{4}-\d{2}-\d{2}|\d{4}-\d{2}|\d{4})',
html, re.IGNORECASE,
)
if date_match:
doc.time = date_match.group(1)
return True
except httpx.HTTPError:
pass
return False
def download_text(self, doc: SourceDocument) -> str | None:
"""ITU-T recommendations are PDFs — download and extract if possible."""
rec_id = doc.source_id
if rec_id.startswith("FG-"):
return None
# Try to find the PDF download link
try:
# ITU recommendation pages have a download link
url = f"https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-{rec_id}&type=items"
resp = self.client.get(url)
if resp.status_code == 200 and "application/pdf" in resp.headers.get("content-type", ""):
try:
from io import BytesIO
from pdfminer.high_level import extract_text
text = extract_text(BytesIO(resp.content))
return text[:100000] if text else None
except ImportError:
return f"[PDF document: {doc.title}. Install pdfminer.six to extract text.]"
except httpx.HTTPError as e:
console.print(f"[dim]Could not download {doc.name}: {e}[/]")
return None
def close(self) -> None:
self.client.close()
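The summary regex in `_enrich_metadata` can be exercised against fabricated markup (the sample HTML is an assumption; actual ITU pages may differ):

```python
import re

# Fabricated fragment in the "Summary: ..." shape _enrich_metadata expects.
html = "<b>Summary:</b> Architectural framework for ML in future networks.</td>"

# Capture text after a Summary/Scope/Abstract label, up to the next tag.
m = re.search(
    r"(?:Summary|Scope|Abstract)[:\s]*</[^>]+>\s*(.+?)(?:</|<br|<p)",
    html, re.DOTALL | re.IGNORECASE,
)
print(m.group(1).strip())  # → Architectural framework for ML in future networks.
```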

View File

@@ -53,11 +53,26 @@
color: #4ade80;
border: 1px solid rgba(34, 197, 94, 0.3);
}
-.source-generated {
+.source-etsi {
background: rgba(251, 146, 60, 0.15);
color: #fb923c;
border: 1px solid rgba(251, 146, 60, 0.3);
}
+.source-itu {
+background: rgba(244, 114, 182, 0.15);
+color: #f472b6;
+border: 1px solid rgba(244, 114, 182, 0.3);
+}
+.source-iso {
+background: rgba(168, 85, 247, 0.15);
+color: #c084fc;
+border: 1px solid rgba(168, 85, 247, 0.3);
+}
+.source-generated {
+background: rgba(148, 163, 184, 0.15);
+color: #94a3b8;
+border: 1px solid rgba(148, 163, 184, 0.3);
+}
.cat-pill {
display: inline-block;
padding: 1px 8px;
@@ -162,7 +177,9 @@
<option value="">All sources</option>
<option value="ietf" {% if current_source == 'ietf' %}selected{% endif %}>IETF</option>
<option value="w3c" {% if current_source == 'w3c' %}selected{% endif %}>W3C</option>
-<option value="generated" {% if current_source == 'generated' %}selected{% endif %}>Generated</option>
+<option value="etsi" {% if current_source == 'etsi' %}selected{% endif %}>ETSI</option>
+<option value="itu" {% if current_source == 'itu' %}selected{% endif %}>ITU-T</option>
+<option value="iso" {% if current_source == 'iso' %}selected{% endif %}>ISO/IEC</option>
</select>
</div>
<!-- Sort -->

View File

@@ -2,15 +2,12 @@
from __future__ import annotations
import json
import sqlite3
from datetime import datetime, timezone
import numpy as np
import pytest
from ietf_analyzer.config import Config
-from ietf_analyzer.db import Database, SCHEMA
+from ietf_analyzer.db import Database
from ietf_analyzer.models import Author, Draft, Rating

View File

@@ -4,7 +4,6 @@ from __future__ import annotations
import json
import pytest
from ietf_analyzer.analyzer import Analyzer
from ietf_analyzer.models import Rating

View File

@@ -3,13 +3,10 @@
from __future__ import annotations
import json
from datetime import datetime, timezone
import numpy as np
import pytest
from ietf_analyzer.db import Database
-from ietf_analyzer.models import Author, Draft, Rating
+from ietf_analyzer.models import Draft, Rating
# ---- Table creation ----

View File

@@ -3,12 +3,10 @@
from __future__ import annotations
import json
import os
from pathlib import Path
import pytest
-from ietf_analyzer.models import Draft, Rating, Author, normalize_category, CATEGORY_NORMALIZE
+from ietf_analyzer.models import Draft, Rating, normalize_category
from ietf_analyzer.config import Config, DEFAULT_KEYWORDS

View File

@@ -11,7 +11,6 @@ import sys
import zipfile
from pathlib import Path
import pytest
_project_root = Path(__file__).resolve().parent.parent
if str(_project_root / "src") not in sys.path:

View File

@@ -2,7 +2,6 @@
from __future__ import annotations
import pytest
from ietf_analyzer.search import HybridSearch