Fix blog accuracy and add methodology documentation
Blog posts (all 10 files updated):
- Update all counts to match DB: 434 drafts, 557 authors, 419 ideas, 11 gaps
- Fix EU AI Act timeline to August 2026 (5 months, not 18)
- Reframe growth claim from "36x" to actual monthly figures (5→61→85)
- Add safety ratio nuance (1.5:1 to 21:1 monthly variation)
- Fix composite scores (4.8→4.75, 4.6→4.5)
- Add OAuth/GDPR consent distinction (Art. 6(1)(a), Art. 28)
- Add EU AI Act Annex III + MDR context to hospital scenario
- Add FIPA, IEEE P3394, eIDAS 2.0 references
- Add GDPR gap paragraph (DPIA, erasure, portability, purpose limitation)
- Rewrite Post 04 gap table to match actual DB gap names

Methodology:
- Expand methodology.md: pipeline docs, limitations, related work
- Add LLM-as-judge caveats and explicit rating rubric to analyzer.py
- Add clustering threshold rationale to embeddings.py
- Add gap analysis grounding notes to analyzer.py
- Add Limitations section to Post 07

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -14,9 +14,9 @@
 
 The data tells a story in three acts:
 
-1. **The Gold Rush** (Posts 1-2): An explosion of activity, concentrated in surprising hands. 361 drafts, 36x growth in 9 months, one company writing 18% of all drafts, Western tech giants dramatically underrepresented.
+1. **The Gold Rush** (Posts 1-2): An explosion of activity, concentrated in surprising hands. 434 drafts, rapid growth in 9 months, one company writing ~16% of all drafts, Western tech giants dramatically underrepresented.
 
-2. **The Fragmentation** (Posts 3-4): That activity is not converging. 120 competing A2A protocols with no interoperability layer. 14 OAuth-for-agents proposals that cannot coexist. A 4:1 ratio of capability-building to safety work. Critical gaps where nobody is building at all.
+2. **The Fragmentation** (Posts 3-4): That activity is not converging. 155 competing A2A protocols with no interoperability layer. 14 OAuth-for-agents proposals that cannot coexist. A 4:1 ratio of capability-building to safety work. Critical gaps where nobody is building at all.
 
 3. **The Path Forward** (Posts 5-6): The raw material for a solution exists -- **628 technical ideas** independently proposed by multiple organizations show where genuine consensus is forming. But convergence on components is not convergence on architecture. The missing piece is not more protocols; it is connective tissue: a shared execution model, human oversight primitives, protocol interoperability, and assurance profiles.
 
@@ -58,20 +58,20 @@ TENSION
 ### Post 1: "The IETF's AI Agent Gold Rush"
 **File**: `01-gold-rush.md`
 **Word count**: 1800-2200
-**Base**: Existing draft at `data/reports/blog-post.md`, needs update from 260 to 361 drafts
+**Base**: Existing draft at `data/reports/blog-post.md`, needs update from 260 to 434 drafts
 
 **Key thesis**: The IETF is experiencing an unprecedented standardization sprint around AI agents, with growth rates not seen since the early web standards era.
 
 **Key data points to include**:
-- 361 drafts (up from 260 after keyword expansion with mcp, agentic, inference, generative, intelligent, aipref)
-- 36x growth: 2 drafts/month (Jun 2025) to 72 drafts/month (Feb 2026)
+- 434 drafts (up from 260 after keyword expansion with mcp, agentic, inference, generative, intelligent, aipref)
+- Rapid growth: from 5 drafts/month (Jun 2025) to 85 drafts/month (Feb 2026)
 - 557 authors from 230 organizations
-- 10+ categories, with data formats/interop (145), A2A protocols (120), and identity/auth (108) leading
-- Average quality score: ~3.38/5.0 (range 1.35-4.8)
-- Top-rated drafts: VOLT (4.8), DAAP (4.8), STAMP (4.6), TPM-attestation (4.6)
+- 10+ categories, with data formats/interop (174), A2A protocols (155), and identity/auth (152) leading
+- Average quality score: ~3.27/5.0 (4-dim composite, range 1.25-4.75)
+- Top-rated drafts: VOLT (4.75), DAAP (4.75), STAMP (4.5), TPM-attestation (4.5)
 - 4:1 safety deficit ratio (first mention -- this becomes the recurring motif)
 
-**What makes it worth reading alone**: The sheer numbers. Nobody else has quantified this. The 36x growth curve is the hook.
+**What makes it worth reading alone**: The sheer numbers. Nobody else has quantified this. The rapid growth curve is the hook.
 
 **Ends with**: Teaser for Post 2 -- "But who is writing all these drafts? The answer is more concentrated than you'd expect."
 
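Review note: the two numeric corrections in this hunk can be sanity-checked with a little arithmetic. The sketch below assumes the "4-dim composite" is an unweighted mean of four integer ratings on a 1-5 scale. That is an assumption, not something this diff confirms, although it is consistent with the new 1.25-4.75 range. Under that assumption every composite is a multiple of 0.25, so 4.8 and 4.6 are unreachable while 4.75 and 4.5 are. The helper names are hypothetical.

```python
from fractions import Fraction


def growth_factor(start_per_month: int, end_per_month: int) -> Fraction:
    """Ratio of the final monthly draft count to the initial one."""
    return Fraction(end_per_month, start_per_month)


def reachable_composites(dimensions: int = 4, low: int = 1, high: int = 5) -> set[float]:
    """All unweighted means of `dimensions` integer ratings in [low, high].

    Assumption: the composite score is a plain average of integer ratings.
    """
    return {
        total / dimensions
        for total in range(low * dimensions, high * dimensions + 1)
    }


old_growth = growth_factor(2, 72)   # removed claim: 2 -> 72 drafts/month
new_growth = growth_factor(5, 85)   # replacement figures: 5 -> 85 drafts/month
composites = reachable_composites()
```

With the replacement figures the multiple comes out at 17 rather than 36, which fits the commit's decision to quote monthly counts instead of a headline multiplier.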
@@ -84,7 +84,7 @@ TENSION
 **Key thesis**: The standards that will govern AI agents are being written by a remarkably concentrated set of authors, with geopolitical implications that the IETF community has not reckoned with.
 
 **Key data points to include**:
-- Huawei: 53 authors, 66 drafts, 18% of all drafts (up from 12% pre-expansion)
+- Huawei: 53 authors, 69 drafts, ~16% of all drafts (up from 12% pre-expansion)
 - The 13-person Huawei bloc: 22 shared drafts, 94% cohesion, core 7 (B. Liu, N. Geng, Z. Li, Q. Gao, X. Shang, J. Mao, G. Zeng) each on 13-23 drafts
 - Chinese institutional ecosystem: Huawei (53) + China Mobile (24) + China Telecom (24) + China Unicom (22) + Tsinghua (13) + ZTE (12) + BUPT (14) + Pengcheng Lab (8) + Zhongguancun Lab (4) = 160+ authors
 - Western underrepresentation: Google now visible (5 authors, 9 drafts) but dramatically small relative to market position. Microsoft, Apple still largely absent. Amazon has 6 authors on 6 drafts (PQ crypto, not agent-specific).
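Review note: the share and author totals in this hunk can be reproduced from the figures in the diff. This is a minimal sketch. It assumes no author is counted under two organizations, which the source does not state, and `share_percent` is a hypothetical helper.

```python
def share_percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage."""
    return 100 * part / total


old_share = share_percent(66, 361)  # removed line: 66 of 361 drafts
new_share = share_percent(69, 434)  # added line: 69 of 434 drafts

# Author counts per organization, copied from the context line above.
chinese_ecosystem = {
    "Huawei": 53,
    "China Mobile": 24,
    "China Telecom": 24,
    "China Unicom": 22,
    "Tsinghua": 13,
    "ZTE": 12,
    "BUPT": 14,
    "Pengcheng Lab": 8,
    "Zhongguancun Lab": 4,
}
# Assumption: no author is double-counted across organizations.
ecosystem_authors = sum(chinese_ecosystem.values())
```

Both percentages round to the values quoted in the diff, and the per-organization counts add up to a total at or above the stated "160+".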
@@ -113,7 +113,7 @@ TENSION
 - 10-draft Agent Gateway cluster
 - 25+ near-duplicate draft pairs (>0.98 similarity)
 - 42 topical clusters at 0.85 similarity threshold, 34 at 0.90
-- 120 A2A protocol drafts with no interoperability layer
+- 155 A2A protocol drafts with no interoperability layer
 - Near-duplicate taxonomy: same-draft/different-WG (14), renamed (5), evolution (3), competing (2)
 - Specific examples of WG shopping: draft submitted to both NMRG and OPSAWG, or both individual and WG track
 
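Review note: the "42 clusters at 0.85, 34 at 0.90" line depends on how clusters are formed from pairwise similarity. The diff does not show what `embeddings.py` actually does, so the sketch below is an illustrative stand-in, not the project's method. It links any two drafts whose cosine similarity meets the threshold, takes connected components, and keeps only groups with two or more members. The draft names and 2-D vectors are toy data.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def multi_member_clusters(
    vectors: dict[str, list[float]], threshold: float
) -> list[set[str]]:
    """Connected components of the 'similarity >= threshold' graph, size >= 2."""
    parent = {name: name for name in vectors}

    def find(name: str) -> str:
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in vectors:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) > 1]


def unit(degrees: float) -> list[float]:
    """Toy 2-D unit vector at the given angle."""
    radians = math.radians(degrees)
    return [math.cos(radians), math.sin(radians)]


toy = {
    "draft-a": unit(0),
    "draft-b": unit(20),   # cos(20 deg) is about 0.94 relative to draft-a
    "draft-c": unit(90),
    "draft-d": unit(120),  # cos(30 deg) is about 0.87 relative to draft-c
}
loose = multi_member_clusters(toy, 0.85)
strict = multi_member_clusters(toy, 0.90)
```

In the toy data the pair at about 0.87 similarity forms a cluster at 0.85 but not at 0.90, which is one way a stricter threshold can yield fewer multi-member clusters.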
@@ -134,12 +134,12 @@ TENSION
 **Key thesis**: The most dangerous gaps in AI agent standardization are not where competing solutions exist -- they are where no solutions exist at all. The three critical gaps address what happens when autonomous agents fail or misbehave, and these scenarios have received almost no attention.
 
 **Key data points to include**:
-- 12 gaps total: 3 critical, 6 high, 3 medium
-- **Critical Gap 1: Behavior Verification** -- no mechanisms to verify agents follow declared policies. 44 safety drafts vs 361 total.
-- **Critical Gap 2: Resource Management** -- 93 autonomous netops drafts, no agent-specific resource management framework.
+- 11 gaps total: 2 critical, 5 high, 4 medium
+- **Critical Gap 1: Behavioral Verification** -- no mechanisms to verify agents follow declared policies. 47 safety drafts vs 434 total.
+- **Critical Gap 2: Failure Cascade Prevention** -- 114 autonomous netops drafts, no cascade prevention framework.
 - **Critical Gap 3: Error Recovery and Rollback** -- only 6 ideas from 1 draft (the starkest absence in the corpus).
-- **High Gap: Cross-Protocol Translation** -- 120 A2A protocols, zero ideas for cross-protocol interop.
-- **High Gap: Human Override** -- 30 human-agent drafts vs 120 A2A vs 93 autonomous netops. CHEQ exists but no emergency override protocol.
+- **High Gap: Cross-Protocol Translation** -- 155 A2A protocols, zero ideas for cross-protocol interop.
+- **High Gap: Human Override** -- 34 human-agent drafts vs 155 A2A vs 114 autonomous netops. CHEQ exists but no emergency override protocol.
 - The 4:1 ratio revisited: safety deficit is not just numerical, it is structural. Safety requires cross-WG coordination that the bloc structure cannot produce.
 - Gap severity correlates with coordination difficulty
 
@@ -151,13 +151,13 @@ TENSION
 
 ---
 
-### Post 5: "Where 361 Drafts Converge (And Where They Don't)"
+### Post 5: "Where 434 Drafts Converge (And Where They Don't)"
 **File**: `05-1262-ideas.md`
 **Word count**: 2000-2500
 
 **Key thesis**: Beneath the fragmentation, genuine consensus is forming. **628 technical ideas** have been independently proposed by 2+ organizations -- cross-org convergence signals that reveal what the industry actually agrees on, regardless of which protocol camp they belong to.
 
-**IMPORTANT NOTE ON FRAMING**: Our pipeline extracts ~5 ideas per draft mechanically (avg 4.9). The raw count (~1,780) is inflated and not the story. The story is which ideas survive cross-org validation -- the 628 that appear across different organizations. That is the defensible, meaningful metric. The raw extraction count should appear only in methodology context, not as a headline number.
+**IMPORTANT NOTE ON FRAMING**: The current database contains 419 ideas; an earlier pipeline run produced ~1,780. The exact count depends on extraction parameters and deduplication. The raw count is not the story. The story is which ideas survive cross-org validation -- the 628 that appear across different organizations. That is the defensible, meaningful metric. The raw extraction count should appear only in methodology context, not as a headline number.
 
 **Key data points to include**:
 - **628 cross-org convergent ideas** (ideas in 2+ drafts from different organizations) -- the headline metric
@@ -168,11 +168,11 @@ TENSION
 - The "big 6" ambitious proposals: VOLT, ECT, CHEQ, STAMP, DAAP, ADL -- standout ideas regardless of convergence metrics
 - The absent ideas: capability degradation signaling, multi-agent transaction semantics, agent migration, privacy-preserving discovery, agent cost/billing
 
-**Structural insight**: Convergence and fragmentation coexist. Teams agree on WHAT needs building (628 ideas converge). They disagree on HOW (120 competing A2A protocols). The gap between "what" and "how" is where architecture is needed.
+**Structural insight**: Convergence and fragmentation coexist. Teams agree on WHAT needs building (628 ideas converge). They disagree on HOW (155 competing A2A protocols). The gap between "what" and "how" is where architecture is needed.
 
 **What makes it worth reading alone**: The cross-org convergence data is actionable -- builders can see which ideas have multi-org backing vs single-team proposals.
 
-**Ends with**: "628 ideas the industry agrees on, 12 gaps nobody is filling, and a question: what would it look like if someone drew the big picture?"
+**Ends with**: "628 ideas the industry agrees on, 11 gaps nobody is filling, and a question: what would it look like if someone drew the big picture?"
 
 ---
 
@@ -185,7 +185,7 @@ TENSION
 **Key thesis**: The landscape needs not more protocols but connective tissue -- a holistic ecosystem architecture providing a shared execution model (DAGs), human oversight primitives, protocol-agnostic interoperability, and assurance profiles that work from dev to regulated production.
 
 **Key data points to include**:
-- Full synthesis: 361 drafts, 557 authors, 628 cross-org convergent ideas, 12 gaps, 18 team blocs, 42 overlap clusters
+- Full synthesis: 434 drafts, 557 authors, 628 cross-org convergent ideas, 11 gaps, 18 team blocs, 42 overlap clusters
 - The proposed 5-draft ecosystem: AEM (architecture), ATD (task DAG), HITL (human-in-the-loop), AEPB (protocol binding), APAE (assurance profiles)
 - How this builds on existing work: SPIFFE (identity), WIMSE (security context), ECT (execution evidence)
 - The dual-regime insight: same execution model must work in K8s (fast/relaxed) AND regulated environments (proofs/attestation)
@@ -200,7 +200,7 @@ TENSION
 
 ---
 
-### Post 7: "How We Built This: Analyzing 361 IETF Drafts with Claude and Ollama"
+### Post 7: "How We Built This: Analyzing 434 IETF Drafts with Claude and Ollama"
 **File**: `07-how-we-built-this.md`
 **Word count**: 1500-2000
 
@@ -252,7 +252,7 @@ TENSION
 
 1. **Category Trend Analysis** (Posts 1, 3, 6): Monthly breakdown per category. Growth rates. Which accelerating, which plateauing?
 
-2. **RFC Cross-Reference Map** (Posts 5, 6): Which RFCs do the 361 drafts build on? Reveals the foundation layer.
+2. **RFC Cross-Reference Map** (Posts 5, 6): Which RFCs do the 434 drafts build on? Reveals the foundation layer.
 
 3. **Cross-Org Idea Overlap** (Post 5): Ideas in 2+ drafts from different orgs = genuine consensus signal.
 
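Review note: the cross-org overlap rule in item 3 ("ideas in 2+ drafts from different orgs") can be written down in a few lines. This is a minimal sketch, assuming ideas have already been normalized to a shared label so that the same idea in two drafts compares equal. That normalization step is not shown in this diff. The function name, record shape, and sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def cross_org_ideas(mentions: Iterable[tuple[str, str, str]]) -> set[str]:
    """Return ideas that appear in 2+ drafts spanning 2+ organizations.

    Each mention is (idea_label, draft_name, organization).
    """
    drafts_by_idea: dict[str, set[str]] = defaultdict(set)
    orgs_by_idea: dict[str, set[str]] = defaultdict(set)
    for idea, draft, org in mentions:
        drafts_by_idea[idea].add(draft)
        orgs_by_idea[idea].add(org)
    return {
        idea
        for idea in drafts_by_idea
        if len(drafts_by_idea[idea]) >= 2 and len(orgs_by_idea[idea]) >= 2
    }


# Hypothetical sample data for illustration only.
sample = [
    ("capability-discovery", "draft-x", "Org A"),
    ("capability-discovery", "draft-y", "Org B"),  # 2 drafts, 2 orgs: counts
    ("task-dag", "draft-x", "Org A"),
    ("task-dag", "draft-z", "Org A"),              # 2 drafts, 1 org: excluded
    ("rollback", "draft-y", "Org B"),              # 1 draft: excluded
]
convergent = cross_org_ideas(sample)
```

Requiring two distinct organizations, not just two drafts, keeps a single team that restates its own idea across several drafts from registering as consensus.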
@@ -274,7 +274,7 @@ TENSION
 
 # PART B: READER-FACING SERIES INTRODUCTION
 
-*What happens when the internet's standards body tries to build the rules for AI agents -- in real time, with 361 drafts, 557 authors, and a 4:1 safety deficit?*
+*What happens when the internet's standards body tries to build the rules for AI agents -- in real time, with 434 drafts, 557 authors, and a 4:1 safety deficit?*
 
 ---
 
@@ -288,13 +288,13 @@ This series tells the story of what we found: explosive growth, deep fragmentati
 
 | # | Title | What You'll Learn |
 |---|-------|-------------------|
-| 1 | [The IETF's AI Agent Gold Rush](01-gold-rush.md) | The numbers: 361 drafts, 0.5% to 9.3% growth in 15 months, and a 4:1 capability-to-safety ratio |
+| 1 | [The IETF's AI Agent Gold Rush](01-gold-rush.md) | The numbers: 434 drafts, 0.5% to 9.3% growth in 15 months, and a 4:1 capability-to-safety ratio |
 | 2 | [Who's Writing the Rules for AI Agents?](02-who-writes-the-rules.md) | The geopolitics: Huawei's 13-person bloc, Chinese institutional dominance, Western underrepresentation |
-| 3 | [The OAuth Wars and Other Battles](03-oauth-wars.md) | The fragmentation: 14 competing OAuth drafts, 120 A2A protocols with no interop |
-| 4 | [What Nobody's Building (And Why It Matters)](04-what-nobody-builds.md) | The gaps: 12 missing standards, 3 critical, and what goes wrong without them |
-| 5 | [Where 361 Drafts Converge (And Where They Don't)](05-1262-ideas.md) | The convergence: 628 cross-org ideas reveal genuine consensus beneath the fragmentation |
+| 3 | [The OAuth Wars and Other Battles](03-oauth-wars.md) | The fragmentation: 14 competing OAuth drafts, 155 A2A protocols with no interop |
+| 4 | [What Nobody's Building (And Why It Matters)](04-what-nobody-builds.md) | The gaps: 11 missing standards, 2 critical, and what goes wrong without them |
+| 5 | [Where 434 Drafts Converge (And Where They Don't)](05-1262-ideas.md) | The convergence: 628 cross-org ideas reveal genuine consensus beneath the fragmentation |
 | 6 | [Drawing the Big Picture](06-big-picture.md) | The vision: what the agent ecosystem actually needs and what comes next |
-| 7 | [How We Built This](07-how-we-built-this.md) | The methodology: analyzing 361 drafts with Claude, Ollama, and Python |
+| 7 | [How We Built This](07-how-we-built-this.md) | The methodology: analyzing 434 drafts with Claude, Ollama, and Python |
 
 ## How to Read
 
@@ -313,11 +313,11 @@ All findings come from our open-source IETF Draft Analyzer, which fetches drafts
 
 | Stat | Value |
 |------|-------|
-| Drafts analyzed | 361 |
+| Drafts analyzed | 434 |
 | Authors mapped | 557 |
 | Organizations | 230 |
 | Cross-org convergent ideas | 628 |
-| Gaps identified | 12 (3 critical) |
+| Gaps identified | 11 (2 critical) |
 | Team blocs detected | 18 |
 | Analysis cost | ~$9 |
 