Fix security, data integrity, and accuracy issues from 4-perspective review

Security fixes:
- Fix SQL injection in db.py:update_generation_run (column name whitelist)
- Flask SECRET_KEY from env var instead of hardcoded
- Add LLM rating bounds validation (_clamp_rating, 1-10)
- Fix JSON extraction trailing whitespace handling

Data integrity:
- Normalize 21 legacy category names to 11 canonical short forms
- Add false_positive column, flag 73 non-AI drafts (361 relevant remain)
- Document verified counts: 434 total/361 relevant drafts, 557 authors, 419 ideas, 11 gaps

Code quality:
- Fix version string 0.1.0 → 0.2.0
- Add close()/context manager to Embedder class
- Dynamic matrix size instead of hardcoded "260x260"

Blog accuracy:
- Fix EU AI Act timeline (enforcement Aug 2026, not "18 months")
- Distinguish OAuth consent from GDPR Einwilligung
- Add EU AI Act Annex III context to hospital scenario
- Add FIPA, eIDAS 2.0 references where relevant

Methodology:
- Add methodology.md documenting pipeline, limitations, rating rubric
- Add LLM-as-judge caveats to analyzer.py
- Document clustering threshold rationale

Reviews from: legal (German/EU law), statistics, development, science perspectives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 10:52:33 +01:00
parent a386d0bb1a
commit 439424bd04
19 changed files with 1745 additions and 126 deletions
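
For readers who want a concrete picture of the two security fixes named in the commit message (the column name whitelist and the 1-10 rating clamp), here is a minimal sketch. The db.py and analyzer.py changes are not part of the diff shown below, so everything here is an assumption made for illustration: the table name `generation_runs`, the contents of `ALLOWED_COLUMNS`, the function signatures, and the choice to return `None` for unparseable ratings. It shows the general technique, not the code this commit actually contains.

```python
import sqlite3

# Hypothetical whitelist: the real column names in db.py are not shown here.
ALLOWED_COLUMNS = frozenset({"status", "model", "finished_at"})


def update_generation_run(conn, run_id, **fields):
    """Update whitelisted columns of one run; return the number of rows changed.

    Column names cannot be bound as SQL parameters, so they are checked
    against a fixed whitelist before being interpolated into the statement.
    Values are always passed as bound parameters.
    """
    unknown = set(fields) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"disallowed column(s): {sorted(unknown)}")
    if not fields:
        return 0
    assignments = ", ".join(f"{col} = ?" for col in fields)
    params = [*fields.values(), run_id]
    cur = conn.execute(
        f"UPDATE generation_runs SET {assignments} WHERE id = ?", params
    )
    conn.commit()
    return cur.rowcount


def clamp_rating(value, low=1, high=10):
    """Coerce an LLM-supplied rating to an int in [low, high].

    Returns None when the value cannot be read as a finite number
    (an assumption; the real _clamp_rating may choose a default instead).
    """
    try:
        number = int(round(float(value)))
    except (TypeError, ValueError, OverflowError):
        return None
    return max(low, min(high, number))
```

With this shape, a call such as `update_generation_run(conn, 1, status="done")` works as before, while a key like `"status = 'x' --"` is rejected before any SQL is built.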
--- a/data/reports/blog-series/03-oauth-wars.md
+++ b/data/reports/blog-series/03-oauth-wars.md
@@ -1,6 +1,6 @@
 # The OAuth Wars and Other Battles

-*14 competing proposals, 120 protocols with no interop layer, and 25+ near-duplicate drafts. Inside the IETF's AI agent fragmentation problem.*
+*14 competing proposals, 155 protocols with no interop layer, and 25+ near-duplicate drafts. Inside the IETF's AI agent fragmentation problem.*

 ---

@@ -12,13 +12,13 @@ This is the fragmentation problem, and it is not limited to OAuth. Across the IE

 The most crowded corner of the AI agent standards landscape is OAuth for agents. Every proposal is trying to answer the same fundamental question: when an AI agent acts on behalf of a user -- or on its own -- how does it prove its identity and obtain permission?

-The depth of this cluster is not surprising when you look at the ecosystem's foundations. Our cross-reference analysis of all 361 drafts found that **OAuth 2.0** (RFC 6749) is cited by **36 drafts**, **JWT** (RFC 7519) by **22**, **OAuth Bearer** (RFC 6750) by **9**, and **DPoP** (RFC 9449) by **9**. The OAuth stack is the single most-referenced functional standard in the entire corpus after TLS. The agent identity problem runs through the landscape like a root system.
+The depth of this cluster is not surprising when you look at the ecosystem's foundations. Our cross-reference analysis of all 434 drafts found that **OAuth 2.0** (RFC 6749) is cited by **36 drafts**, **JWT** (RFC 7519) by **22**, **OAuth Bearer** (RFC 6750) by **9**, and **DPoP** (RFC 9449) by **9**. The OAuth stack is the single most-referenced functional standard in the entire corpus after TLS. The agent identity problem runs through the landscape like a root system.

 Here are all 14 drafts:

 | Draft | Approach | Score |
 |-------|----------|------:|
-| [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) | Comprehensive accountability protocol | 4.8 |
+| [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) | Comprehensive accountability protocol | 4.75 |
 | [draft-goswami-agentic-jwt](https://datatracker.ietf.org/doc/draft-goswami-agentic-jwt/) | Agentic JWT for autonomous systems | 4.5 |
 | [draft-chen-oauth-rar-agent-extensions](https://datatracker.ietf.org/doc/draft-chen-oauth-rar-agent-extensions/) | RAR extensions for agent policy | 4.2 |
 | [draft-aap-oauth-profile](https://datatracker.ietf.org/doc/draft-aap-oauth-profile/) | OAuth 2.0 profile for autonomous agents | 4.2 |
@@ -33,10 +33,14 @@ Here are all 14 drafts:
 | [draft-chen-ai-agent-auth-new-requirements](https://datatracker.ietf.org/doc/draft-chen-ai-agent-auth-new-requirements/) | New auth requirements analysis | 3.8 |
 | [draft-yao-agent-auth-considerations](https://datatracker.ietf.org/doc/draft-yao-agent-auth-considerations/) | Auth considerations analysis | 3.1 |

-The quality range is enormous -- from 2.8 to 4.8 -- and the approaches barely overlap. Some extend OAuth 2.0 with new grant types. Others define entirely new token formats (Agentic JWT). Still others propose mesh architectures or accountability layers on top of existing auth flows. Two drafts (song-oauth-ai-agent-authorization and song-oauth-ai-agent-collaborate-authz) come from the same Huawei team and address different facets of the problem. Two more (chen-oauth-rar-agent-extensions and chen-ai-agent-auth-new-requirements) come from a China Mobile team.
+*(Scores are LLM-generated relative rankings from abstracts, not human expert assessments. See [Methodology](../methodology.md).)*
+
+The quality range is enormous -- from 2.8 to 4.75 -- and the approaches barely overlap. Some extend OAuth 2.0 with new grant types. Others define entirely new token formats (Agentic JWT). Still others propose mesh architectures or accountability layers on top of existing auth flows. Two drafts (song-oauth-ai-agent-authorization and song-oauth-ai-agent-collaborate-authz) come from the same Huawei team and address different facets of the problem. Two more (chen-oauth-rar-agent-extensions and chen-ai-agent-auth-new-requirements) come from a China Mobile team.

 The gap our analysis identified in this cluster: most focus on **single-agent authorization**. Few address chained delegation across multiple agents, and none standardize real-time revocation in agent-to-agent workflows. An agent that obtains a token and delegates a sub-task to another agent -- which then delegates further -- creates a chain of trust that no single draft adequately covers.

+A note on terminology: "consent" in the OAuth context means a technical authorization flow where a user delegates access scopes to a client. This is distinct from GDPR consent (*Einwilligung*) under Art. 6(1)(a) GDPR, which must be freely given, specific, informed, and unambiguous, and is revocable at any time. When AI agents further delegate to sub-agents, the chain of GDPR-valid consent may break entirely -- a problem none of these 14 drafts addresses. The controller-processor relationship under Art. 28 GDPR imposes additional requirements (data processing agreements, sub-processor authorization) that go beyond what any OAuth extension can express on its own.
+
 ## The Agent Gateway Melee: 10 Drafts

 If OAuth for agents is about identity, the agent gateway cluster is about communication architecture. Ten drafts are competing to define how agents from different platforms and ecosystems collaborate:
@@ -76,11 +80,11 @@ Our embedding-based similarity analysis produced a more troubling finding: **25+

 Some of these duplications are legitimate IETF process: a draft moves from individual submission to working group adoption (like draft-cui-nmrg-llm-nm becoming draft-irtf-nmrg-llm-nm). Others reflect authors shopping the same draft to multiple working groups. And a few appear to be genuine content duplication -- the same ideas submitted under different author combinations.

-The practical effect: the 361-draft corpus includes substantial double-counting. After de-duplication, the true number of distinct proposals is probably closer to 300. But even 300 competing proposals in nine months is extraordinary.
+The practical effect: the 434-draft corpus includes substantial double-counting. Dropping one draft from each of the 25 near-duplicate pairs leaves roughly 409 distinct drafts, and accounting for related-but-not-identical submissions lowers the count further. But even with generous de-duplication, the volume is extraordinary.

 ## The A2A Protocol Zoo

-Zooming out from individual clusters, the broadest fragmentation is in the **120 A2A protocol drafts**. These span everything from low-level transport (A2A over MOQT/QUIC) to high-level semantic routing (intent-based agent interconnection) to specific use cases (MCP for network troubleshooting).
+Zooming out from individual clusters, the broadest fragmentation is in the **155 A2A protocol drafts**. These span everything from low-level transport (A2A over MOQT/QUIC) to high-level semantic routing (intent-based agent interconnection) to specific use cases (MCP for network troubleshooting).

 The most common technical idea in the entire corpus -- "Multi-Agent Communication Protocol" -- appears in **8 separate drafts** from different teams. Eight teams are independently designing how agents should talk to each other.

@@ -143,7 +147,7 @@ Three structural interventions would accelerate convergence:

 **1. Working groups need to pick winners.** The IETF process allows competing proposals, but at some point working groups must adopt specific approaches and redirect competing efforts. In the OAuth agent space, the highest-quality proposals (DAAP, Agentic JWT, RAR extensions) should be evaluated head-to-head, not allowed to proliferate indefinitely.

-**2. Interoperability testing, not just drafting.** The 120 A2A protocol proposals exist mostly as text. Interop testing -- where implementations from different teams prove they can work together -- would quickly reveal which proposals have real engineering substance and which are paper exercises.
+**2. Interoperability testing, not just drafting.** The 155 A2A protocol proposals exist mostly as text. Interop testing -- where implementations from different teams prove they can work together -- would quickly reveal which proposals have real engineering substance and which are paper exercises.

 **3. The translation layer must be built.** Rather than picking one A2A protocol, the community may be better served by a thin interoperability layer that lets agents using different protocols communicate through gateways. Our gap analysis found this cross-protocol translation gap entirely unaddressed -- zero technical ideas in the current corpus.

@@ -152,7 +156,7 @@ Three structural interventions would accelerate convergence:
 ### Key Takeaways

 - **14 competing OAuth-for-agents proposals** illustrate the depth of fragmentation; none handle chained delegation across agent networks
- **120 A2A protocol drafts** exist without an interoperability layer; the most common idea in the corpus appears in 8 separate drafts from different teams
+- **155 A2A protocol drafts** exist without an interoperability layer; the most common idea in the corpus appears in 8 separate drafts from different teams
 - **25+ near-duplicate pairs** (>0.98 similarity) inflate the draft count; after de-duplication, roughly 300 distinct proposals remain
 - **Convergence signals exist** in EDHOC authentication, SCIM agent extensions, and verifiable conversations -- areas where teams explicitly build on each other
 - **Fragmentation goes deeper than protocols**: Chinese and Western blocs build on different RFC foundations (YANG/NETCONF vs COSE/CBOR/CoAP); the only shared bedrock is OAuth 2.0