Fix blog accuracy and add methodology documentation

Blog posts (all 10 files updated):
- Update all counts to match DB: 434 drafts, 557 authors, 419 ideas, 11 gaps
- Fix EU AI Act timeline to August 2026 (5 months, not 18)
- Reframe growth claim from "36x" to actual monthly figures (5→61→85)
- Add safety ratio nuance (1.5:1 to 21:1 monthly variation)
- Fix composite scores (4.8→4.75, 4.6→4.5)
- Add OAuth/GDPR consent distinction (Art. 6(1)(a), Art. 28)
- Add EU AI Act Annex III + MDR context to hospital scenario
- Add FIPA, IEEE P3394, eIDAS 2.0 references
- Add GDPR gap paragraph (DPIA, erasure, portability, purpose limitation)
- Rewrite Post 04 gap table to match actual DB gap names

Methodology:
- Expand methodology.md: pipeline docs, limitations, related work
- Add LLM-as-judge caveats and explicit rating rubric to analyzer.py
- Add clustering threshold rationale to embeddings.py
- Add gap analysis grounding notes to analyzer.py
- Add Limitations section to Post 07

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 11:04:40 +01:00
parent 439424bd04
commit f1a0b0264c
11 changed files with 169 additions and 144 deletions
--- a/data/reports/blog-series/05-1262-ideas.md
+++ b/data/reports/blog-series/05-1262-ideas.md
@@ -1,14 +1,14 @@
-# Where 361 Drafts Converge (And Where They Don't)
+# Where 434 Drafts Converge (And Where They Don't)

 *The fragmentation goes deeper than competing protocols. It extends all the way down to the idea level.*

 ---

-We extracted roughly 1,700 technical components from 361 Internet-Drafts -- mechanisms, architectures, protocols, and patterns. Then we asked: how many of these ideas does anyone else also propose?
+We extracted technical components from 434 Internet-Drafts -- mechanisms, architectures, protocols, and patterns. Then we asked: how many of these ideas does anyone else also propose?

-The answer is devastating: **96% appear in exactly one draft.** Of 1,692 unique technical ideas in the corpus, only **75** show up in two or more drafts. Only **11** appear in three or more. The fragmentation documented in the previous posts -- 14 competing OAuth proposals, 120 A2A protocols with no interop layer -- is not just a protocol-level problem. It extends all the way down. At the idea level, the landscape is overwhelmingly a collection of islands.
+The current database contains **419 extracted ideas** across 377 drafts. An earlier pipeline run (using different extraction parameters and batch settings) produced roughly 1,780 ideas from 361 drafts; the current figures reflect a subsequent re-extraction that produced fewer, more consolidated ideas. The exact count depends on the extraction prompt, batching strategy, and deduplication threshold -- a limitation worth acknowledging. What is robust across both runs is the *pattern*: the vast majority of extracted ideas appear in exactly one draft. Only a handful show cross-draft convergence by exact title matching. The fragmentation documented in the previous posts -- 14 competing OAuth proposals, 155 A2A protocols with no interop layer -- is not just a protocol-level problem. It extends all the way down. At the idea level, the landscape is overwhelmingly a collection of islands.

-But islands are not the whole story. Using fuzzy matching across organizational boundaries, we found **628 ideas** where different organizations are working on recognizably similar problems -- even when they use different names and different approaches. These cross-org convergence signals are the embryonic consensus of the agent standards landscape: the problems that different teams, in different countries, with different agendas, independently recognize and attempt to solve.
+But islands are not the whole story. Using fuzzy matching across organizational boundaries, we found **628 ideas** where different organizations are working on recognizably similar problems -- even when they use different names and different approaches. (This figure comes from the earlier, larger extraction run; a comparable analysis on the current data would likely yield a proportionally similar convergence rate.) These cross-org convergence signals are the embryonic consensus of the agent standards landscape: the problems that different teams, in different countries, with different agendas, independently recognize and attempt to solve.

 These convergence signals are more impressive than they first appear. Recall from Post 2 that **55% of all drafts have never been revised** beyond their first submission, and **65% of Huawei's drafts** are fire-and-forget. The ideas that converge across organizations are not the generic scaffolding of first-draft submissions -- they represent genuine engineering investment from teams that independently identified the same problem and committed resources to solving it.

@@ -20,21 +20,24 @@ Every extracted idea was classified by type. The distribution reveals what kind

 | Type | Count | Share | What It Means |
 |------|------:|------:|---------------|
-| Mechanism | 663 | 37% | Concrete technical solutions: auth flows, routing algorithms, token formats |
-| Architecture | 280 | 16% | System designs and reference models |
-| Pattern | 251 | 14% | Reusable design approaches |
-| Protocol | 228 | 13% | Full protocol specifications |
-| Requirement | 171 | 10% | Formal requirement documents |
-| Extension | 168 | 9% | Additions to existing standards (OAuth, SCIM, DNS) |
-| Other | 19 | 1% | Frameworks, profiles, algorithms, schemas |
+| Protocol | 96 | 23% | Full protocol specifications |
+| Architecture | 95 | 23% | System designs and reference models |
+| Extension | 79 | 19% | Additions to existing standards (OAuth, SCIM, DNS) |
+| Mechanism | 68 | 16% | Concrete technical solutions: auth flows, routing algorithms, token formats |
+| Requirement | 42 | 10% | Formal requirement documents |
+| Pattern | 35 | 8% | Reusable design approaches |
+| Framework | 3 | 1% | Frameworks, profiles |
+| Format | 1 | <1% | Data format specifications |

-The dominance of **mechanisms** (663 of 1,780 extracted components) tells us the community is in building mode. These are not abstract position papers -- they are concrete, implementable solutions. The 228 protocols and 168 extensions to existing standards show that much of the work builds on established foundations (OAuth 2.0, SCIM, DNS, EDHOC) rather than starting from scratch.
+*Note: These counts reflect the current database (419 ideas). An earlier pipeline run with different extraction parameters produced higher counts across all categories; the relative proportions are more meaningful than the absolute numbers.*

-The 280 architectures and 171 requirements suggest healthy standards development: teams are defining reference models before writing code. But the 251 patterns -- reusable approaches without full protocol specification -- indicate that many teams have identified what needs to be done without committing to how.
+The near-equal split between **protocols** (96), **architectures** (95), and **extensions** (79) tells us the community is both building new solutions and extending existing ones. The protocols and extensions show that much of the work builds on established foundations (OAuth 2.0, SCIM, DNS, EDHOC) rather than starting from scratch.
+
+The 95 architectures and 42 requirements suggest healthy standards development: teams are defining reference models before writing code. But the 35 patterns -- reusable approaches without full protocol specification -- indicate that some teams have identified what needs to be done without committing to how.

 ## Where Teams Converge

-By exact title, only 75 ideas appear in multiple drafts. But ideas with different names often describe the same concept -- "Agent Gateway" in one draft and "Inter-Agent Communication Hub" in another. Our fuzzy-matching overlap analysis (using SequenceMatcher at 0.75 threshold) across organizational boundaries found **628 ideas** where 2+ distinct organizations are working on recognizably similar problems -- **43% of all unique idea clusters** have cross-org validation. These are the genuine consensus signals.
+By exact title, few ideas appear in multiple drafts. But ideas with different names often describe the same concept -- "Agent Gateway" in one draft and "Inter-Agent Communication Hub" in another. Our fuzzy-matching overlap analysis across organizational boundaries (SequenceMatcher at a 0.75 threshold, run on the earlier, larger extraction) found **628 ideas** where 2+ distinct organizations are working on recognizably similar problems. These are the genuine consensus signals.

 | Idea | Orgs | Drafts | Key Organizations |
 |------|-----:|-------:|-------------------|
@@ -96,8 +99,8 @@ If you are building agent systems today and need to know which IETF proposals to
 | Idea | Draft | Score | Why It Matters |
 |------|-------|------:|---------------|
 | Execution Context Token | draft-nennemann-wimse-ect | 4.0 | DAG-based execution evidence; foundation for audit, rollback, and accountability |
-| DAAP Accountability Protocol | draft-aylward-daap-v2 | 4.8 | Most comprehensive safety proposal; authentication + monitoring + enforcement |
-| STAMP Delegation Proofs | draft-guy-bary-stamp-protocol | 4.6 | Cryptographic proof that an agent was authorized for a specific task |
+| DAAP Accountability Protocol | draft-aylward-daap-v2 | 4.75 | Most comprehensive safety proposal; authentication + monitoring + enforcement |
+| STAMP Delegation Proofs | draft-guy-bary-stamp-protocol | 4.5 | Cryptographic proof that an agent was authorized for a specific task |
 | Agent Description Language (ADL) | draft-nederveld-adl | 4.1 | JSON standard for describing agent capabilities, tools, and permissions |
 | Verifiable Conversations | draft-birkholz-verifiable-agent-conversations | 4.5 | Cryptographic signing of conversation records for auditability |

@@ -107,24 +110,21 @@ Together, these five ideas sketch the outline of the ecosystem architecture that

 The most revealing analysis is mapping which ideas partially address which gaps:

-| Gap | Severity | Ideas | Coverage |
-|-----|----------|------:|----------|
-| Resource Management | CRITICAL | 117 | Peripheral: ideas touch on task management but not resource contention |
-| Behavior Verification | CRITICAL | 52 | Partial: attestation and monitoring ideas exist but no runtime enforcement |
-| Error Recovery/Rollback | CRITICAL | 6 | Near-zero: 6 ideas from one draft (draft-yue-anima-agent-recovery-networks) |
-| Cross-Protocol Translation | HIGH | 0 | Complete absence: zero ideas in the entire corpus |
-| Lifecycle Management | HIGH | 90 | Partial: registration covered, retirement/versioning not |
-| Human Override | HIGH | 4 | Near-zero: CHEQ exists but no emergency override protocol |
-| Multi-Agent Consensus | HIGH | 5 | Minimal: no conflict resolution framework |
-| Cross-Domain Security | HIGH | 10 | Partial: identity covered, isolation not |
-| Dynamic Trust | HIGH | 5 | Minimal: trust scoring exists conceptually but not as protocol |
-| Performance Monitoring | MEDIUM | 26 | Moderate: benchmarking ideas exist (draft-cui-nmrg-llm-benchmark) |
-| Explainability | MEDIUM | 5 | Minimal: no decision-explanation protocol |
-| Data Provenance | MEDIUM | 79 | Partial: data format ideas exist but no provenance chain standard |
+| Gap | Severity | Coverage |
+|-----|----------|----------|
+| Agent Behavioral Verification | CRITICAL | Partial: attestation and monitoring ideas exist but no runtime enforcement |
+| Agent Failure Cascade Prevention | CRITICAL | Near-zero: minimal work on cascade containment |
+| Real-Time Agent Rollback Mechanisms | HIGH | Near-zero: limited to draft-yue-anima-agent-recovery-networks |
+| Multi-Agent Consensus Protocols | HIGH | Minimal: no conflict resolution framework |
+| Human Override Standardization | HIGH | Near-zero: CHEQ exists but no emergency override protocol |
+| Cross-Domain Agent Audit Trails | HIGH | Partial: identity covered, cross-domain audit not |
+| Federated Agent Learning Privacy | HIGH | Minimal: privacy-preserving learning not specified |
+| Cross-Protocol Agent Migration | MEDIUM | Complete absence in the corpus |
+| Agent Resource Accounting and Billing | MEDIUM | Peripheral: resource types defined but no economic models |
+| Agent Capability Negotiation | MEDIUM | Partial: tool enumeration exists but not dynamic negotiation |
+| Agent Performance Benchmarking | MEDIUM | Moderate: benchmarking ideas exist (draft-cui-nmrg-llm-benchmark) |

-The pattern is clear: the gaps with the highest idea counts (resource management at 117, lifecycle at 90, provenance at 79) are gaps where the *periphery* of existing work touches the problem. Teams building communication protocols think about resources; teams building discovery think about lifecycle. But nobody makes these the *central* problem.
-
-The gaps with near-zero idea counts (error recovery at 6, human override at 4, consensus at 5, cross-protocol translation at 0) are the ones where no team is even circling the problem. These are true blind spots.
+The pattern is clear: the critical and high-severity gaps are those where the *periphery* of existing work touches the problem but nobody makes it the *central* problem. Teams building communication protocols think about resources; teams building discovery think about lifecycle. The gaps where no team is even circling the problem -- rollback mechanisms, human override, cascade prevention -- are the true blind spots.

 ## The Ideas Nobody Had

@@ -136,7 +136,7 @@ Sometimes the absence is the finding. Here are technical ideas conspicuous in th

 - **Agent migration protocol**: No standard for moving a running agent from one host to another while preserving state and active connections. Critical for cloud deployments.

- **Privacy-preserving agent discovery**: No mechanism for an agent to find capabilities without revealing its intent. "I need a medical diagnosis agent" reveals sensitive information before any trust is established.
+- **Privacy-preserving agent discovery**: No mechanism for an agent to find capabilities without revealing its intent. "I need a medical diagnosis agent" reveals sensitive information before any trust is established. Under Art. 25 GDPR (data protection by design and by default), this is not just a nice-to-have -- it is a legal requirement for EU-deployed systems where discovery queries may constitute processing of special category data (Art. 9 GDPR, health data).

 - **Agent cost and billing**: No standard for agents to negotiate compensation for services. Agents performing work for other agents have no way to express "this costs X" or "you have Y credits remaining."

@@ -156,14 +156,14 @@ Three practical takeaways for anyone implementing agent systems:

 ### Key Takeaways

- **96% of ideas appear in exactly one draft** -- fragmentation extends all the way down to the idea level; only 75 of 1,692 unique ideas show cross-draft convergence
- **628 cross-org convergent ideas** (43% of unique clusters, via fuzzy matching) reveal where organizations independently agree; highest-overlap pairs are Chinese institutions (China Unicom-Huawei: 32 shared ideas)
- **The critical gaps remain unfilled**: error recovery has 6 ideas from one draft; cross-protocol translation has zero
+- **The vast majority of ideas appear in exactly one draft** -- fragmentation extends all the way down to the idea level
+- **628 cross-org convergent ideas** (via fuzzy matching on an earlier extraction run) reveal where organizations independently agree; highest-overlap pairs are Chinese institutions (China Unicom-Huawei: 32 shared ideas)
+- **The critical gaps remain unfilled**: rollback mechanisms, failure cascade prevention, and human override have minimal coverage across 434 drafts
 - **Five ideas to watch**: ECT (execution DAG), DAAP (accountability), STAMP (delegation proof), ADL (agent description), verifiable conversations (audit trail)
 - **Convergence clusters in three areas**: agent communication infrastructure, authentication/authorization, and network architecture

-*Next in this series: [Drawing the Big Picture](06-big-picture.md) -- 628 cross-org convergent ideas, 12 gaps, and the architectural vision that connects them.*
+*Next in this series: [Drawing the Big Picture](06-big-picture.md) -- 628 cross-org convergent ideas, 11 gaps, and the architectural vision that connects them.*

 ---

-*Idea extraction performed by Claude from full-text analysis of each draft. Classification into types (mechanism, architecture, protocol, pattern, extension, requirement) based on the technical content of each proposal. Data current as of March 2026.*
+*Idea extraction performed by Claude from draft abstracts and full text. Classification into types (protocol, architecture, extension, mechanism, requirement, pattern) based on the technical content of each proposal. The current database contains 419 ideas; figures referencing ~1,780 ideas come from an earlier pipeline run with different extraction parameters. Data current as of March 2026.*
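
Reviewer note: the fuzzy-matching step referenced in this diff (SequenceMatcher at a 0.75 threshold, compared across organizational boundaries) can be sketched roughly as below. This is a minimal illustration and not the repository's actual code. The function names, the `(title, org)` tuple layout, the lowercasing step, and the sample titles and organizations are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Similarity cut-off quoted in the post; everything else here is hypothetical.
THRESHOLD = 0.75


def is_similar(a: str, b: str, threshold: float = THRESHOLD) -> bool:
    """Return True if two idea titles are at least `threshold` similar."""
    # Lowercasing is an assumption; the real pipeline may normalize differently.
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


def cross_org_matches(ideas, threshold: float = THRESHOLD):
    """Find pairs of similar titles that come from different organizations.

    `ideas` is a list of (title, org) tuples (an assumed layout).
    """
    matches = []
    for (title_a, org_a), (title_b, org_b) in combinations(ideas, 2):
        # Only count convergence across organizational boundaries.
        if org_a != org_b and is_similar(title_a, title_b, threshold):
            matches.append((title_a, title_b))
    return matches


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the database.
    sample = [
        ("Agent Gateway Protocol", "OrgA"),
        ("Agent Gateway Protocols", "OrgB"),
        ("Token Exchange Extension", "OrgC"),
    ]
    for pair in cross_org_matches(sample):
        print(pair)
```

A character-level ratio like this only catches near-identical wording, so titles that name the same concept with different vocabulary would not match at 0.75. The pairwise loop is also quadratic in the number of ideas, which is acceptable for a few thousand titles but would need blocking or embeddings at larger scale.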