Run pipeline, write Post 08, commit untracked files

Pipeline:
- Extract ideas for 38 new drafts → 462 ideas total
- Convergence analysis: 132 cross-org convergent ideas (33% rate)
- Fetch authors for 102 drafts → 709 authors (up from 403)
- Refresh gap analysis: 12 gaps across full 474-draft corpus
- Update verified counts with new totals

Post 08:
- Complete rewrite of "Agents Building the Agent Analysis" (2,953 words)
- Covers 3 phases: writing team → review cycle → fix cycle
- Meta-irony table mapping team coordination to IETF gap names
- Specific examples from dev journal (SQL injection, consent conflation, ideas mismatch)

Untracked files committed:
- scripts/: backfill-wg-names, classify-unrated, compare-classifiers, download-relevant-text, run-webui
- src/ietf_analyzer/classifier.py: two-stage Ollama classifier
- src/webui/: analytics (GDPR-compliant), auth, obsidian_export
- tests/test_obsidian_export.py (10 tests)
- data/reports/: wg-analysis, generated draft for gap #37

Housekeeping:
- .gitignore: exclude LaTeX artifacts, stale DBs, analytics.db

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 15:31:30 +01:00
parent 20c45a7eba
commit e247bfef8f
19 changed files with 2758 additions and 586 deletions

.gitignore

@@ -4,5 +4,15 @@ __pycache__/
dist/
build/
data/config.json
data/analytics.db
data/ietf_drafts.db
.claude/
.env
# LaTeX build artifacts
paper/*.aux
paper/*.log
paper/*.out
paper/*.synctex.gz
paper/*.fls
paper/*.fdb_latexmk


@@ -1,197 +1,167 @@
# Agents Building the Agent Analysis
*We used a team of AI agents to analyze, write about, and review 434 IETF Internet-Drafts on AI agents. Here is what that looked like from the inside.*
*Analysis based on IETF Datatracker data collected through March 2026. Counts and statistics reflect this snapshot.*
---
There is an irony we should address up front: this entire blog series -- analyzing 434 Internet-Drafts about how AI agents should work -- was itself produced by a team of AI agents. Twelve Claude instances across three phases, each with a distinct role, reading the same database, building on each other's output, and coordinating through a shared journal and file system.
This post is the story of that process: what worked, what broke, what surprised us, and what it reveals about the state of AI agent coordination in practice -- which, as it happens, is exactly the problem the IETF drafts are trying to solve.
## Phase 1: The Writing Team
We started with four agents, each defined in a one-page file and grounded by a shared 3,000-word team brief:
| Agent | Role | What They Did |
|-------|------|---------------|
| **Architect** | The Big Picture | Read all reports, designed the narrative arc, wrote the vision document, reviewed every post |
| **Analyst** | The Data Whisperer | Ran the pipeline on 434 drafts, executed 20+ SQL queries, produced data packages |
| **Coder** | The Feature Builder | Implemented 7 new analysis features (refs, trends, idea-overlap, WG adoption, revisions, centrality, co-occurrence) |
| **Writer** | The Storyteller | Drafted all 8 blog posts, applied 6+ revision passes |
Each agent had access to the full project codebase, a SQLite database, and the `ietf` CLI tool. They communicated through files and coordinated through a shared development journal. The team brief contained a thesis statement -- "The IETF is building the highways before the traffic lights" -- a per-post outline, and a data requirements table.
### Parallel by default
The key design decision: agents did not wait for each other when they could work in parallel. The Writer's tasks were formally blocked by the Analyst's pipeline run, but the Writer had enough existing data (260 analyzed drafts) to start drafting. Rather than sitting idle, the Writer produced first drafts of all 7 posts while waiting for updated numbers. This turned out to be the right call -- the structure and narrative mattered more than whether the draft count was 260 or 434.
The Coder and Writer worked simultaneously, their outputs feeding each other. Every feature the Coder built used zero API calls -- pure local computation via regex, SQL, SequenceMatcher, and networkx. The RFC cross-reference parser revealed that the Chinese and Western blocs build on incompatible infrastructure foundations (YANG/NETCONF vs. COSE/CBOR), with OAuth 2.0 as the only shared bedrock. The co-occurrence analysis showed safety has zero overlap with Agent Discovery and Model Serving. These zero-cost local analyses produced the most structurally revealing findings in the entire series.
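To give a flavor of that zero-API pattern, here is a minimal sketch -- with hypothetical draft text and names, not the project's actual parser -- of pulling RFC cross-references out of drafts with nothing but a regex and a counter:

```python
# Illustrative sketch (hypothetical draft text, not the project's parser):
# RFC cross-referencing as pure local computation -- a regex plus a Counter.
import re
from collections import Counter

draft_texts = {
    "draft-agent-auth": "Builds on OAuth 2.0 [RFC6749] and TLS 1.3 [RFC8446].",
    "draft-agent-yang": "Uses YANG [RFC7950] over NETCONF [RFC6241].",
    "draft-a2a-core":   "Token exchange per [RFC8693] extends [RFC6749].",
}

refs = Counter()
for text in draft_texts.values():
    # I-D citations look like [RFC6749]; normalize to "RFC 6749".
    for num in re.findall(r"\[RFC(\d{3,5})\]", text):
        refs[f"RFC {num}"] += 1

print(refs.most_common(1))  # [('RFC 6749', 2)] -- the shared OAuth bedrock
```

The real corpus yields thousands of such references, but the marginal cost per analysis is still zero.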
### The Architect shaped everything
The Architect produced fewer words than the Writer and fewer features than the Coder, but had disproportionate impact. Three contributions reshaped the output:
1. The insight that **gap severity correlates with coordination difficulty** transformed Post 4 from a list of gaps into an argument about structural dysfunction.
2. The **"two equilibria" framing** -- microservices chaos vs. layered web architecture -- gave Post 6's predictions real structural weight.
3. A **verification pass** that caught the Writer's revisions silently failing (logged as done, not actually persisted in the file).
That third point is worth dwelling on. The dev journal said "Post 1 revisions complete." The file still contained the pre-revision content. Without the Architect reading the actual output rather than trusting the status message, the error would have shipped. This is a small-scale version of the Behavior Verification gap the series identifies as critical -- and we will come back to it.
### The human who said "so what?"
The most consequential intervention in the entire project came not from an agent but from the human project lead. The series had been built around a headline number: "1,780 technical ideas extracted from the drafts." The project lead asked: what does that number actually mean?
The answer was uncomfortable. The pipeline extracts roughly 5 ideas per draft on average -- a mechanical process that produces items like "A2A Communication Paradigm" and "Agent Network Architecture." The raw count sounds impressive but is mostly scaffolding. The real signal was hiding in the cross-org overlap analysis: 96% of unique idea titles appear in exactly one draft. Only 75 show up in two or more. The fragmentation that defines the protocol landscape extends all the way down to the idea level.
This required rewriting Post 5 entirely. Its title changed from "The 1,780 Ideas That Will Shape Agent Infrastructure" to "Where 434 Drafts Converge (And Where They Don't)." The lead metric shifted from raw extraction count (impressive but hollow) to the convergence rate (honest and striking). Four agents had independently used the 1,780 figure -- the Analyst generated it, the Coder validated it, the Architect designed around it, the Writer headlined it. None questioned whether it was meaningful.
## Phase 2: The Review Cycle
After the writing team produced 8 blog posts, a vision document, 7 new analysis features, and 30 dev-journal entries, we did something that turned out to matter more than the writing itself: we sent the entire output to four specialist reviewers, each running in parallel.
| Reviewer | Lens | Issues Found |
|----------|------|-------------|
| **Statistics** | Data integrity, sampling bias, quantitative accuracy | 3 critical, 4 important, 4 minor |
| **Legal** | German/EU internet law, GDPR, EU AI Act, eIDAS 2.0 | 3 critical, 5 regulatory gaps, 5 improvements |
| **Engineering** | Code quality, security, performance, DX | 1 critical, 1 high, 5 bugs, 6 perf issues |
| **Science** | Methodology, reproducibility, related work, hedging | 2 critical, 3 high, 4 medium |
Four agents, four completely different perspectives, run simultaneously. Together they surfaced **36 distinct issues** that the writing team had missed. The findings were often surprising.
### The statistics reviewer found the numbers did not add up
The statistical audit cross-checked every quantitative claim in the blog series against the actual database using raw SQL queries. The results were sobering. The blog claimed 361 drafts; the database held 434. The blog claimed 1,780 ideas; the database held 419. The blog claimed 12 gaps; the database held 11. Composite scores were inflated by 0.05-0.10 through rounding. The "4:1 safety ratio" varied from 1.5:1 to 21:1 by month -- a fact the flat claim obscured.
The ideas count mismatch was the most serious finding. The entire thesis of Post 5 -- "96% of ideas appear in one draft" and "628 cross-org convergent ideas" -- was not reproducible from the current database. The pipeline had been re-run with different parameters, overwriting the original extraction. Nobody had noticed because the numbers in the blog posts were never re-checked against the live database.
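The audit pattern itself is simple enough to sketch. This is an illustrative reconstruction with a made-up schema and toy numbers, not the project's real database:

```python
# Minimal sketch of the audit: re-derive every headline number from the
# live database instead of trusting prose. Schema and data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drafts (name TEXT PRIMARY KEY);
    CREATE TABLE ideas (title TEXT, draft TEXT);
    INSERT INTO drafts VALUES ('draft-a'), ('draft-b'), ('draft-c');
    INSERT INTO ideas VALUES ('A2A Paradigm', 'draft-a'),
                             ('A2A Paradigm', 'draft-b'),
                             ('Agent Registry', 'draft-c');
""")

claims = {"drafts": 361, "ideas": 1780}  # numbers asserted in the prose
actual = {
    "drafts": conn.execute("SELECT COUNT(*) FROM drafts").fetchone()[0],
    "ideas": conn.execute("SELECT COUNT(DISTINCT title) FROM ideas").fetchone()[0],
}
mismatches = {k: (claims[k], actual[k]) for k in claims if claims[k] != actual[k]}
print(mismatches)  # any non-empty dict means the prose has drifted
```

Running something like this on every build would have caught the 361-vs-434 drift the day it happened.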
### The legal reviewer found regulatory blindspots
The legal review, written from a German/EU internet law perspective, identified three critical issues that no technically-focused agent would have caught:
**Consent conflation.** The series used "consent" interchangeably across OAuth authorization flows, GDPR consent (Einwilligung under Art. 6(1)(a)), and human-in-the-loop approval gates. These are legally distinct concepts. Under CJEU case law (Planet49), consent requires a clear affirmative act by the data subject. When an AI agent delegates to sub-agents, the chain of consent may break entirely. None of the 14 OAuth-for-agents proposals the series analyzed -- and none of the agents writing about them -- flagged this.
**The hospital scenario understated regulatory reality.** Post 4's opening scenario -- an AI agent managing drug dispensing with a hallucinated dosage -- was framed as "what goes wrong if this gap is never addressed." Under EU law, it is already addressed: the EU AI Act classifies such systems as high-risk under Annex III, the revised Product Liability Directive covers AI systems explicitly, and German medical law (BGB §§ 630a ff.) places duty of care on the provider. The IETF gap is not in accountability but in technical mechanisms to implement what the regulation already requires.
**GDPR was entirely absent from the gap analysis.** The series identified 11 standardization gaps. None mentioned GDPR-mandated capabilities: data protection impact assessments, right to erasure propagation through multi-agent chains, data portability, or purpose limitation. These are not aspirational -- they are legally binding requirements that agent systems operating in the EU must satisfy.
### The engineering reviewer found a SQL injection
The codebase review graded the project B+ overall -- "solid for a research tool, needs hardening for production" -- but found a critical SQL injection vulnerability in `db.py`. The `update_generation_run` method interpolated column names from `**kwargs` directly into SQL strings without validation. The Flask SECRET_KEY was hardcoded as the string `"ietf-dashboard-dev"`. There was no rate limiting on endpoints that trigger paid Claude API calls.
The engineering reviewer also noted that `cli.py` had grown to 2,995 lines with approximately 40 repetitions of the same config/db boilerplate pattern. And that test coverage for the analysis pipeline -- the core of the tool -- was exactly zero.
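The vulnerability class is worth spelling out. Here is a hedged sketch of a whitelist-style fix -- the method name comes from the review, but the table and column names are assumptions, not the project's real schema:

```python
# Hedged sketch of the fix pattern: whitelist column names before they
# reach the SQL string. Table and column names here are assumptions.
import sqlite3

ALLOWED_COLUMNS = {"status", "finished_at", "error"}

def update_generation_run(conn, run_id, **kwargs):
    # f-string interpolation of arbitrary kwargs keys was the injection
    # vector; reject anything not on the whitelist before building SQL.
    bad = set(kwargs) - ALLOWED_COLUMNS
    if bad:
        raise ValueError(f"disallowed column(s): {sorted(bad)}")
    assignments = ", ".join(f"{col} = ?" for col in kwargs)  # keys now vetted
    conn.execute(
        f"UPDATE runs SET {assignments} WHERE id = ?",
        (*kwargs.values(), run_id),  # values still go through placeholders
    )
```

Column names can never be bound as SQL parameters, so a closed whitelist (or a lookup against the live schema) is the standard remedy; the values themselves stay in `?` placeholders.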
### The science reviewer questioned the methodology
The scientific review identified the central methodological weakness: the entire rating system relies on Claude as the sole judge for five dimensions, with no human calibration, no inter-rater reliability measurement, and ratings based on abstracts only (truncated to 2,000 characters), not full draft text. The clustering threshold of 0.85 was described as "empirical" with no sensitivity analysis. The gap analysis was single-shot LLM generation from compressed metadata.
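The missing sensitivity analysis is cheap to do. A toy sketch with made-up titles (the pipeline's real clustering logic may differ) shows how the merge count moves as the threshold moves:

```python
# Toy sensitivity check with made-up titles; the pipeline's real
# clustering logic may differ. The point: report how many pairs merge
# as the 0.85 similarity threshold is varied.
from difflib import SequenceMatcher

titles = [
    "Agent Discovery Protocol",
    "Agent Discovery Mechanism",
    "A2A Communication Paradigm",
    "A2A Communication Pattern",
    "Model Serving Gateway",
]

def merged_pairs(threshold):
    """Count title pairs that would be clustered at this threshold."""
    count = 0
    for i, a in enumerate(titles):
        for b in titles[i + 1:]:
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                count += 1
    return count

for t in (0.75, 0.85, 0.95):
    print(t, merged_pairs(t))  # monotonically fewer merges as t rises
```

If the headline convergence rate swings sharply between 0.75 and 0.95, the single number at 0.85 deserves an error bar.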
One finding was particularly striking: of 434 drafts rated for relevance, the distribution was heavily right-skewed (196 at 4, 98 at 5, only 38 at 1-2). Claude was generous with relevance for keyword-matched drafts, making the metric less discriminating than it should be. Upon manual review, 73 drafts turned out to be false positives -- including `draft-ietf-hpke-hpke` (generic public key encryption, nothing to do with AI agents) rated at relevance 5.
## Phase 3: The Fix Cycle
With 36 issues identified, we launched fix agents -- the Coder handling engineering and data integrity issues, an Editor handling legal and statistical corrections across the blog posts.
The fixes unfolded in three rounds, prioritized by severity:
**Round 1 -- Critical.** SQL injection patched with a column name whitelist. Flask SECRET_KEY replaced with `os.environ.get()` fallback to `os.urandom()`. FTS5 query sanitization added to prevent search injection. False-positive column added to the ratings table; 73 drafts flagged. All blog posts updated from 361 to 434 drafts. Ideas count discrepancy reconciled (419 current with methodology note explaining the 1,780 historical figure). Gap count corrected from 12 to 11 with rewritten gap table matching database reality.
**Round 2 -- High.** Rate limiting added to Claude-calling endpoints (10 req/min/IP). Category names normalized in the database (21 legacy entries migrated). EU AI Act timeline corrected from "within 18 months" to "within 5 months (August 2026)" with enforcement details and article references. OAuth/GDPR consent distinction added. Hospital scenario annotated with AI Act Annex III and Medical Devices Regulation context. Safety ratio qualified everywhere from flat "4:1" to "averaging ~4:1 but varying from 1.5:1 to 21:1 month-to-month."
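The rate-limit fix can be sketched framework-agnostically. This in-memory sliding-window limiter is illustrative, not the code that shipped:

```python
# Illustrative sliding-window limiter (10 requests/minute/IP),
# framework-agnostic; not the shipped code, and not suitable for
# multi-process deployments without shared storage (e.g. Redis).
import time
from collections import defaultdict, deque

WINDOW_S = 60
LIMIT = 10
_hits = defaultdict(deque)  # ip -> timestamps of recent requests

def allow(ip, now=None):
    now = time.monotonic() if now is None else now
    q = _hits[ip]
    while q and now - q[0] >= WINDOW_S:
        q.popleft()          # evict timestamps outside the window
    if len(q) >= LIMIT:
        return False         # over budget: caller returns HTTP 429
    q.append(now)
    return True
```

In a Flask view this would gate the endpoint before any Claude API call is made, so abuse costs nothing.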
**Round 3 -- Medium.** Methodology documentation created (comprehensive `methodology.md` covering all pipeline stages, limitations, and related work). IETF IPR notes added. Language hedged where causal claims were only supported by correlation. MIT LICENSE file created (the project claimed "open source" but had no license). FIPA, IEEE P3394, and eIDAS 2.0 references added where they naturally strengthen arguments. Coder reduced `cli.py` by 200 lines of boilerplate, added `--dry-run` flags to destructive commands, fixed N+1 query patterns.
In total: 14 files modified across the blog series, 7 security/quality fixes applied to the codebase, test count increased from 23 to 64, and a verified-counts document created as a single source of truth.
## What This Reveals
### Specialized perspectives catch different things
This is the headline finding from the review cycle. Four reviewers looked at the same output and found almost entirely non-overlapping issues. The statistician found number mismatches. The lawyer found consent conflation. The engineer found SQL injection. The scientist found methodological gaps. No single reviewer -- no matter how thorough -- would have caught all 36 issues.
This is not a theoretical observation about diverse review. It is an empirical result from running the experiment. The legal reviewer's consent-conflation finding required knowledge of CJEU case law. The statistical reviewer's ideas-count discovery required querying the live database. The engineering reviewer's SQL injection required reading the source code line by line. These are genuinely different skills applied to the same artifact.
### The review-fix-verify pattern works
The cycle ran cleanly: four parallel reviews produced a prioritized list; fix agents resolved issues in severity order; the fixes were verified against the review documents. Three rounds (critical, high, medium) imposed natural prioritization. The entire cycle -- 4 reviews plus 3 fix rounds -- happened in a single day.
The pattern mirrors what the IETF itself does with Last Call reviews, directorate reviews, and IESG evaluation. Multiple specialized perspectives, applied in sequence, with verification that issues are resolved. The difference is that our cycle took hours, not months. The cost is that our reviewers share the same underlying model and its blindspots.
### Agents modifying the same files is the hard problem
The most persistent coordination difficulty was not conceptual but logistical: multiple agents editing the same blog posts. The Writer updated Post 4's gap table. The Editor changed the safety ratio phrasing. The Coder corrected the draft count. Each edit was correct in isolation. But when three agents modify the same file, merge conflicts and stale reads are inevitable. We hit this multiple times -- most visibly with the Post 1 revisions that silently failed to persist.
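One lightweight remedy we did not have is optimistic conflict detection: record a hash of what you read, and refuse to write over something you did not read. A sketch -- illustrative, not the project's tooling:

```python
# Sketch of optimistic conflict detection for agents sharing files
# (illustrative, not the project's tooling): an agent records a hash of
# the file it read and refuses to write if the file changed underneath it.
import hashlib
from pathlib import Path

def read_with_token(path):
    text = Path(path).read_text()
    return text, hashlib.sha256(text.encode()).hexdigest()

def write_if_unchanged(path, new_text, token):
    # Compare-and-swap on content: a stale token means another agent
    # wrote in between, so the caller must re-read, re-apply, and retry.
    current = hashlib.sha256(Path(path).read_text().encode()).hexdigest()
    if current != token:
        return False
    Path(path).write_text(new_text)
    return True
```

A real implementation would need the check-and-write step to be atomic (file locking, or write-to-temp plus rename), but the shape is the point: lost updates become detectable instead of silent.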
This maps directly to the IETF's Agent Execution Model gap. When multiple agents operate on shared state, you need either locking (pessimistic) or conflict detection (optimistic). We had neither. We used a file system, a dev journal, and hope.
### The cheapest analyses mattered most
| Component | Cost | Key Finding |
|-----------|-----:|-------------|
| Claude Sonnet (ratings, gaps) | ~$8 | 4:1 safety deficit, 11 gaps |
| Claude Haiku (idea extraction) | ~$0.80 | 419 ideas, 96% unique to one draft |
| 4 reviewers (parallel) | ~$4 | 36 issues across 4 dimensions |
| Ollama embeddings | $0.00 | 25+ near-duplicate pairs |
| Coder: regex, SQL, networkx | $0.00 | RFC divergence, centrality, co-occurrence |
| **Total** | **~$13** | |
The pattern is consistent: Claude provided the foundation data (ratings, categories, ideas), but the structurally revealing findings came from deterministic local computation on top of that foundation. RFC cross-references (regex), author centrality (networkx), revision velocity (filename parsing), and category co-occurrence (SQL joins) -- all zero-cost, all among the most quotable findings in the series. The LLM provided the foundation data. Every structurally revealing finding -- RFC foundation divergence, European telecoms as bridge-builders, safety structurally isolated from protocols, 55% fire-and-forget revision rate -- came from deterministic local computation on top of that foundation. The lesson for anyone building LLM-powered analysis: the model is the foundation, not the insight engine.
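The zero-cost pattern is simple enough to show in full. A hedged sketch of the regex step (the real script's name, inputs, and output format differ): count which RFCs each draft cites, and the dominant foundation falls out of a `Counter`.

```python
import re
from collections import Counter

RFC_REF = re.compile(r"\bRFC\s?(\d{3,5})\b")

def rfc_citations(text: str) -> Counter:
    """Count the RFC numbers a draft's plain text references."""
    return Counter(int(n) for n in RFC_REF.findall(text))

# Toy stand-ins for two drafts' extracted text.
netops_draft = "Models follow RFC 7950 (YANG) and RFC 8040; see also RFC7950."
security_draft = "Records are signed with COSE per RFC 9052; RFC 9052 and RFC 8747 apply."

print(rfc_citations(netops_draft).most_common(1))    # [(7950, 2)]
print(rfc_citations(security_draft).most_common(1))  # [(9052, 2)]
```

Aggregated over hundreds of drafts, the same few lines expose which communities build on YANG-family RFCs and which on COSE-family ones -- no model call involved.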
### The development journal earned its keep
We required every agent to log milestones to a shared `dev-journal.md`. By session's end, the journal had 30 entries across all four agents -- capturing not just what was done but why, and flagging surprises that would otherwise be lost. When the Writer needed to understand what the Coder had built, the journal entry was faster and more informative than a status message. When the Architect reviewed posts, the Writer's journal entries explained editorial decisions that would otherwise be opaque.
The journal also became the source material for this post. Every "Surprise" field in the journal captured an insight -- the ideas reframing, the silent failure, the RFC divergence revelation -- that no other artifact preserves.
## What This Tells Us About Agent Teams
Six lessons from running a four-agent team on a real project:
**1. Role definitions matter more than instructions.** The one-page agent definitions were more effective than the 3,000-word team brief. Agents performed best when they had a clear identity and scope, not a detailed todo list.
**2. Shared state beats messaging.** The SQLite database, the dev journal, and the report files were more effective coordination mechanisms than direct inter-agent messages. Agents could read each other's outputs on their own schedule, without the overhead of request-response communication.
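A minimal sketch of what shared-state coordination through SQLite can look like -- the table name and columns here are illustrative, not the project's actual schema:

```python
import sqlite3

def open_board(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS journal ("
        " id INTEGER PRIMARY KEY, agent TEXT NOT NULL, note TEXT NOT NULL)"
    )
    return conn

def log_entry(conn: sqlite3.Connection, agent: str, note: str) -> None:
    # The connection context manager wraps the INSERT in a transaction;
    # a real multi-process setup would also set a busy_timeout.
    with conn:
        conn.execute("INSERT INTO journal (agent, note) VALUES (?, ?)", (agent, note))

def catch_up(conn: sqlite3.Connection, since_id: int = 0) -> list:
    # Readers poll on their own schedule -- no request-response needed.
    return conn.execute(
        "SELECT id, agent, note FROM journal WHERE id > ? ORDER BY id", (since_id,)
    ).fetchall()

board = open_board()
log_entry(board, "coder", "RFC citation counts written to reports/")
log_entry(board, "writer", "Post 4 gap table updated")
print(catch_up(board))
```

Each agent remembers the last `id` it saw and asks only for newer rows -- the pull-based equivalent of a message queue, with the database as the single source of truth.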
**3. Async is natural, but verification is not.** Agents working in parallel on loosely coupled tasks is a pattern that works. What does not happen naturally is output verification. The silent failure -- revisions logged but not persisted -- would have gone undetected without a deliberate verification pass. Agent teams need assurance mechanisms, not just coordination mechanisms.
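The assurance mechanism we converged on is almost embarrassingly simple: after every write, re-read the artifact and confirm the change is actually there. An illustrative helper (file names hypothetical):

```python
import pathlib
import tempfile

def apply_and_verify(path: pathlib.Path, old: str, new: str) -> bool:
    """Apply a text replacement, then re-read the file to confirm it stuck.
    A write that silently fails shows up here, not in a status message."""
    path.write_text(path.read_text().replace(old, new))
    return new in path.read_text()

with tempfile.TemporaryDirectory() as workdir:
    post = pathlib.Path(workdir) / "post-01.md"
    post.write_text("The safety ratio is 4:1.")
    assert apply_and_verify(post, "4:1", "roughly 4:1")
```

The point is not the three lines of code but the discipline: the verification reads the persisted state, never the agent's own report of what it did.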
**4. Humans catch category errors; agents catch consistency errors.** The Architect found a 14-vs-13 data inconsistency. The Writer applied six revision passes without introducing a single factual error. Agents are excellent at consistency within a frame. But the project lead's "so what?" about the ideas count was a category-level critique -- questioning the frame itself. That kind of challenge did not emerge from any agent.
**5. Review compounds.** The Architect reviewed the Writer's posts, the project lead reviewed the Architect's framing, and the resulting revisions cascaded through the series. Each review layer caught different things: data errors, structural problems, framing weaknesses. Multiple review passes from different perspectives produced compounding quality gains.
**6. The journal is the product.** The dev journal -- originally intended as a process artifact -- became the richest record of what happened and why. It captures decisions, surprises, and coordination moments that no other artifact preserves. For any multi-agent project, require a shared journal.
## The Meta-Irony ## The Meta-Irony
We built a team of AI agents to analyze IETF drafts about AI agent standards. The team needed coordination, shared context, specialized roles, quality review, human oversight, and output verification. Every one of these needs maps to a gap in the IETF landscape:
| Our Team Needed | What Happened | IETF Gap |
|----------------|---------------|----------|
| Shared execution context | Agents coordinated via SQLite, files, dev journal | Agent Execution Model (no standard) |
| Output verification | Writer's revisions silently failed; Architect caught it manually | Agent Behavioral Verification (critical) |
| Quality review | 4 parallel reviewers found 36 issues the writing team missed | Agent Behavioral Verification (critical) |
| Error handling | Ideas reframing required 3 iterations to stabilize numbers | Real-Time Agent Rollback (high) |
| Coordination across approaches | Agents editing the same files with no merge mechanism | Cross-Protocol Agent Migration (medium) |
| Human oversight | Project lead's "so what?" redirected the entire ideas framing | Human Override Standardization (high) |
| Specialized perspectives | Legal, statistical, engineering, and scientific reviewers each found unique issues | Agent Capability Negotiation (medium) |
We solved these problems ad hoc -- with a journal, role definitions, manual verification passes, severity-prioritized fix rounds, and human review. The IETF is trying to solve them at internet scale with protocol standards.

The distance between our 12-agent team and a deployed multi-agent system on the open internet is vast. But the problems are structurally identical. The standards the IETF is racing to write are the standards our own team needed. The traffic lights the highway needs are the ones we built by hand.
--- ---
### Key Takeaways ### Key Takeaways
- **Twelve agents across three phases** (4 writers, 4 reviewers, 4 fixers) produced 8 blog posts, a vision document, 7 analysis features, 36 identified issues, and 64 tests -- from a ~$13 pipeline
- **Four parallel reviewers found 36 non-overlapping issues**: a SQL injection, consent conflation with EU law, a 76% ideas count mismatch, and uncalibrated LLM-as-judge methodology. No single reviewer would have caught all of them
- **The human project lead's "so what?"** was the single most consequential intervention -- no agent questioned whether the headline metric was meaningful
- **A silent failure** (revisions logged but not persisted) demonstrated the same Behavior Verification gap the series identifies as critical in the IETF landscape
- **The team's coordination problems mirror the IETF's gaps**: shared state, output verification, error recovery, capability negotiation, and human oversight are needed at every scale
*This post concludes the series. All data, code, and reports are available in the IETF Draft Analyzer project repository.* *This post concludes the series. All data, code, and reports are available in the IETF Draft Analyzer project repository.*

--- ---
### 2026-03-08 ANALYST — Pipeline run: authors + gaps refresh
**What**: Ran the processing pipeline on 474-draft corpus. Fetched authors for 102 previously-unlinked drafts (113 were missing, 11 had Datatracker issues). Re-ran gap analysis with --refresh on the full corpus. Checked idea extraction status.
**Why**: After corpus expansion to 474 drafts, 113 drafts lacked author data and gap analysis needed refreshing against the full set.
**Result**: Author coverage: 463/474 drafts now have authors (up from ~350), 709 unique authors (up from 403). Gap analysis: 12 gaps identified (same count, refreshed against full corpus). All 474 drafts already rated. Idea extraction: 59 drafts have no ideas but are in the LLM cache (previously processed, yielded nothing -- 25 rated relevance 4-5, so may warrant individual re-extraction with --reextract).
**Surprise**: The `drafts_without_ideas` query checks both the ideas table AND the llm_cache table, so drafts that were batch-processed but yielded no ideas won't be retried by `--all`. To force re-extraction for high-relevance drafts without ideas, use `ietf ideas --reextract --draft <name>` individually.
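The described behavior reduces to a pair of `NOT IN` subqueries. A plausible shape of the check -- the project's actual table and column names may differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drafts (name TEXT PRIMARY KEY);
CREATE TABLE ideas (draft TEXT, idea TEXT);
CREATE TABLE llm_cache (draft TEXT, payload TEXT);
INSERT INTO drafts VALUES ('draft-a'), ('draft-b'), ('draft-c');
INSERT INTO ideas VALUES ('draft-a', 'some idea');   -- extraction succeeded
INSERT INTO llm_cache VALUES ('draft-b', '{}');      -- processed, yielded nothing
""")

# Pending = no extracted ideas AND never processed. draft-b is skipped
# by `--all` even though it has no ideas, because it sits in the cache.
pending = [row[0] for row in conn.execute("""
    SELECT d.name FROM drafts d
    WHERE d.name NOT IN (SELECT draft FROM ideas)
      AND d.name NOT IN (SELECT draft FROM llm_cache)
    ORDER BY d.name
""")]
print(pending)  # ['draft-c']
```

This is why high-relevance drafts that yielded nothing need the explicit `--reextract` path: the cache membership, not the ideas table, gates retries.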
---
### 2026-03-08 WRITER — Post 08 Rewrite: "Agents Building the Agent Analysis"
**What**: Complete rewrite of Post 08, the meta post about using Claude Code agent teams to build the project. The previous draft (~3,500 words, written before the review cycle) covered only Phase 1 (the writing team). The new version (~2,800 words) covers all three phases: the 4-agent writing team, the 4-agent review cycle, and the 3-round fix cycle.
**Why**: The review cycle was the most consequential phase of the entire project -- 4 parallel reviewers found 36 issues including a SQL injection, consent conflation with EU law, a 76% ideas count mismatch, and uncalibrated methodology. This material was missing from the previous draft entirely. The post needed to tell the complete story.
**Result**: New structure: Phase 1 (writing team + parallel execution + Architect's impact + human "so what?" intervention), Phase 2 (4 parallel reviewers, specific findings per reviewer), Phase 3 (3-round fix cycle by severity), then analysis sections (specialized perspectives, review-fix-verify pattern, shared-state coordination problem, cost breakdown, meta-irony table). The meta-irony table now maps 7 team coordination needs to specific IETF gap names from the database.
**Surprise**: The post's strongest structural element is the review cycle section -- the specific examples (consent conflation, HPKE false positive, silent revision failure) are more vivid and demonstrable than the writing-phase anecdotes. The review cycle essentially proved the thesis: agents analyzing agents need the same coordination standards the agents are analyzing.
---
### 2026-03-08 CODER — Track untracked files, update .gitignore
**What**: Cleaned up untracked files in the repo. Updated `.gitignore` to exclude LaTeX build artifacts (`paper/*.aux`, `paper/*.log`, `paper/*.out`), `data/analytics.db`, and `data/ietf_drafts.db` (stale DB). Staged 12 new files for commit: 5 scripts (`backfill-wg-names.py`, `classify-unrated.py`, `compare-classifiers.py`, `download-relevant-text.py`, `run-webui.sh`), 4 source modules (`classifier.py`, `analytics.py`, `auth.py`, `obsidian_export.py`), 1 test (`test_obsidian_export.py`), 2 reports (`wg-analysis.md`, generated draft).
**Why**: These files had accumulated as untracked over several sessions. Production code, utility scripts, and analysis reports all belong in version control. Build artifacts and local DBs do not.
**Result**: 12 files staged, .gitignore updated with 6 new patterns. No commit made yet (deferred to parent process).
---
### 2026-03-08 ANALYST — Re-extract ideas and convergence analysis
**What**: Ran idea extraction pipeline for 38 drafts that were missing ideas (out of 97 initially missing — 59 remain without ideas, likely false positives or drafts without sufficient content). Then ran cross-organization convergence analysis on the full idea set.
**Why**: Ideas count was stale at 419 across 377 drafts after the DB expanded to 474 drafts. Convergence analysis needed to understand which technical ideas are independently emerging across multiple organizations.
**Result**: 462 ideas across 415 drafts. Convergence analysis found 132 cross-org convergent ideas out of 398 unique clusters (33% convergence rate). Top convergent idea: "Fully Adaptive Routing Ethernet for AI" with 14 contributing organizations. Notable: "AI Agent Protocol Framework" converges across 7 orgs and 3 separate drafts. Updated `data/reports/reviews/verified-counts.md` with new counts and convergence results.
**Cost**: 654,377 tokens in + 335,984 tokens out (Haiku, cheap mode), 8 batches of 5 drafts.
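Convergence detection is, at its core, a group-by over idea clusters. A hedged sketch in which a normalized title stands in for the cluster key (the real pipeline clusters with embeddings, and the org names below are invented):

```python
from collections import defaultdict

# (idea title, authoring organization) pairs, as extracted per draft.
ideas = [
    ("AI Agent Protocol Framework", "OrgA"),
    ("AI agent protocol framework", "OrgB"),
    ("AI Agent Protocol Framework", "OrgC"),
    ("Agent Capability Negotiation", "OrgA"),
]

clusters = defaultdict(set)
for title, org in ideas:
    clusters[title.lower()].add(org)  # cluster key: normalized title

# Convergent = the same idea surfaced by two or more distinct orgs.
convergent = {key: orgs for key, orgs in clusters.items() if len(orgs) >= 2}
rate = len(convergent) / len(clusters)
print(len(convergent), round(rate, 2))  # 1 0.5
```

The 33% rate reported above is this ratio computed over 398 unique clusters, with 132 of them contributed by two or more organizations.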
---
### 2026-03-08 CODER — TypedDicts for data layer, ethics + regulatory content in blog series ### 2026-03-08 CODER — TypedDicts for data layer, ethics + regulatory content in blog series
**What**: Four improvements across typing and content: **What**: Four improvements across typing and content:

# Gap Analysis: IETF AI/Agent Draft Landscape

*Generated 2026-03-08 14:30 UTC — analyzing 474 drafts, 462 technical ideas*

## Overview

This report identifies **12 gaps** — areas, problems, or technical challenges not adequately addressed by the current 474 IETF AI/agent drafts. Each gap is cross-referenced with related drafts and extracted technical ideas to show partial coverage.

| Severity | Count |
|----------|------:|

### Safety Deficit

Only **46** of 474 drafts address AI safety/alignment, while **150** focus on A2A protocols and **110** on autonomous operations. The ratio of capability-building to safety is roughly **5:1**.

---
## 1. Real-time Agent Behavior Verification

| | |
|---|---|
| **Severity** | CRITICAL |
| **Category** | AI safety/alignment |
| **Drafts in category** | 46 |

Current AI safety drafts focus on governance but lack technical protocols for real-time verification that agents are behaving according to their declared policies. There's no standard way to cryptographically prove agent actions match stated intentions.

**Evidence:** Only 46 safety drafts versus 474 total, with governance focus rather than technical verification

### Related Drafts

**Keyword matches** (drafts mentioning gap topic):
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (score 4.2) — Multi-agent Collaboration Protocol Suite
- [draft-an-nmrg-i2icf-cits](https://datatracker.ietf.org/doc/draft-an-nmrg-i2icf-cits/) (score 3.7) — Interface to In-Network Computing Functions for Cooperative Intelligent Transpor
- [draft-zhao-detnet-enhanced-use-cases](https://datatracker.ietf.org/doc/draft-zhao-detnet-enhanced-use-cases/) (score 3.2) — Enhanced Use Cases for Scaling Deterministic Networks
- [draft-zhang-rvp-problem-statement](https://datatracker.ietf.org/doc/draft-zhang-rvp-problem-statement/) (score 3.5) — Problem Statements and Requirements of Real-Virtual Agent Protocol (RVP): Commun
- [draft-yuan-rtgwg-traffic-agent-usecase](https://datatracker.ietf.org/doc/draft-yuan-rtgwg-traffic-agent-usecase/) (score 3.7) — Use cases of the AI Network Traffic Optimization Agent
- [draft-altanai-aipref-realtime-protocol-bindings](https://datatracker.ietf.org/doc/draft-altanai-aipref-realtime-protocol-bindings/) (score 3.6) — AI Preferences for Real-Time Protocol Bindings
- [draft-zheng-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-agent-identity-management/) (score 3.7) — Agent Identity Managenment
- [draft-zheng-dispatch-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-dispatch-agent-identity-management/) (score 3.3) — Agent Identity Managenment
- [draft-fu-nmop-agent-communication-framework](https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/) (score 3.0) — Agent Communication Framework for Network AIOps
- [draft-zyyhl-agent-networks-framework](https://datatracker.ietf.org/doc/draft-zyyhl-agent-networks-framework/) (score 3.6) — Framework for AI Agent Networks
- [draft-ruan-spring-priority-flow-control-sid](https://datatracker.ietf.org/doc/draft-ruan-spring-priority-flow-control-sid/) (score 3.1) — SRv6 behavior extention for Flow Control in WAN
**Top-rated in AI safety/alignment** (46 drafts):
- [draft-cowles-volt](https://datatracker.ietf.org/doc/draft-cowles-volt/) (4.8) — Defines tamper-evident execution trace format for AI agent workflows using hash chains and cryptogra
- [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) (4.8) — Defines comprehensive protocol for AI agent accountability including authentication, monitoring, and
### Partially Addressing Ideas

17 extracted ideas touch on this gap:

| Idea | Draft | Type |
|------|-------|------|
| Distributed AI Accountability Protocol | draft-aylward-daap-v2 | protocol |
| AGENTS.TXT Policy File | draft-srijal-agents-policy | protocol |
| AI Network Security Agent | draft-yuan-rtgwg-security-agent-usecase | architecture |
| A2A Protocol Transport over MOQT | draft-a2a-moqt-transport | protocol |
| Post-Discovery Authorization Handshake | draft-barney-caam | protocol |
| Evidence-based Autonomy Maturity Model | draft-berlinai-vera | mechanism |
| Verifiable Agent Conversation Format | draft-birkholz-verifiable-agent-conversations | protocol |
| Intent-Based Just-in-Time Authorization | draft-chen-agent-decoupled-authorization-model | architecture |

*...and 9 more*
--- ---
## 2. Cross-Domain Agent Liability

| | |
|---|---|
| **Severity** | CRITICAL |
| **Category** | Policy/governance |
| **Drafts in category** | 91 |
When autonomous agents operate across organizational boundaries and cause harm or make decisions with legal implications, there's no standardized framework for liability attribution. The policy/governance drafts don't address cross-jurisdictional legal accountability.
**Evidence:** 91 policy/governance drafts but legal liability for cross-domain autonomous actions remains unaddressed
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-diaconu-agents-authz-info-sharing](https://datatracker.ietf.org/doc/draft-diaconu-agents-authz-info-sharing/) (score 3.2) — Cross-Domain AuthZ Information sharing for Agents
- [draft-cui-dmsc-agent-cdi](https://datatracker.ietf.org/doc/draft-cui-dmsc-agent-cdi/) (score 3.0) — Cross-Domain Interoperability Framework for AI Agent Collaboration
- [draft-han-rtgwg-agent-gateway-intercomm-framework](https://datatracker.ietf.org/doc/draft-han-rtgwg-agent-gateway-intercomm-framework/) (score 3.6) — Agent Gateway Intercommunication Framework
- [draft-ni-a2a-ai-agent-security-requirements](https://datatracker.ietf.org/doc/draft-ni-a2a-ai-agent-security-requirements/) (score 3.7) — Security Requirements for AI Agents
- [draft-intellinode-ai-semantic-contract](https://datatracker.ietf.org/doc/draft-intellinode-ai-semantic-contract/) (score 3.2) — Semantic-Driven Traffic Shaping Contract for AI Networks
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (score 4.2) — Multi-agent Collaboration Protocol Suite
**Top-rated in Policy/governance** (91 drafts):
- [draft-cowles-volt](https://datatracker.ietf.org/doc/draft-cowles-volt/) (4.8) — Defines tamper-evident execution trace format for AI agent workflows using hash chains and cryptogra
- [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) (4.8) — Defines comprehensive protocol for AI agent accountability including authentication, monitoring, and
- [draft-goswami-agentic-jwt](https://datatracker.ietf.org/doc/draft-goswami-agentic-jwt/) (4.5) — Extends OAuth 2.0 with Agentic JWT to address authorization challenges in autonomous AI systems. Int
- [draft-wang-cats-odsi](https://datatracker.ietf.org/doc/draft-wang-cats-odsi/) (4.5) — Specifies framework for decentralized LLM inference across untrusted participants with layer-aware e
- [draft-birkholz-verifiable-agent-conversations](https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/) (4.5) — Defines CDDL-based data format for verifiable agent conversation records using COSE signing. Support
### Partially Addressing Ideas
26 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| Cross-Domain Agent Identity Management | draft-abbey-scim-agent-extension | protocol |
| Multi-level Inference Protocol | draft-chuyi-nmrg-agentic-network-inference | protocol |
| Cross-Domain Agent Coordination | draft-chuyi-nmrg-agentic-network-inference | mechanism |
| Cross-Domain Agent Discovery | draft-cui-dmsc-agent-cdi | mechanism |
| Federated Agent Identity Framework | draft-cui-dmsc-agent-cdi | architecture |
| Agent Capability Negotiation Protocol | draft-cui-dmsc-agent-cdi | protocol |
| Federated Policy Enforcement | draft-cui-dmsc-agent-cdi | architecture |
| Cross-Domain Authorization Information Sharing | draft-diaconu-agents-authz-info-sharing | mechanism |
*...and 18 more*
---
## 3. Human Override Protocols
| | |
|---|---|
| **Severity** | CRITICAL |
| **Category** | Human-agent interaction |
| **Drafts in category** | 30 |
Critical gap in standardized protocols for humans to safely interrupt, override, or take control of autonomous agents in emergency situations. Only 30 drafts address human-agent interaction, with no focus on emergency takeover procedures.
**Evidence:** Only 30 human-agent interaction drafts compared to 213+ autonomous operation drafts, with no emergency override standards
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-dhir-http-agent-profile](https://datatracker.ietf.org/doc/draft-dhir-http-agent-profile/) (score 4.2) — HTTP Agent Profile (HAP): Authenticated and Monetized Agent Traffic on the Web
- [draft-irtf-nmrg-llm-nm](https://datatracker.ietf.org/doc/draft-irtf-nmrg-llm-nm/) (score 3.5) — A Framework for LLM-Assisted Network Management with Human-in-the-Loop
- [draft-cui-nmrg-llm-nm](https://datatracker.ietf.org/doc/draft-cui-nmrg-llm-nm/) (score 4.1) — A Framework for LLM Agent-Assisted Network Management with Human-in-the-Loop
- [draft-zeng-opsawg-applicability-mcp-a2a](https://datatracker.ietf.org/doc/draft-zeng-opsawg-applicability-mcp-a2a/) (score 3.5) — When NETCONF Is Not Enough: Applicability of MCP and A2A for Advanced Network Ma
- [draft-wmz-nmrg-agent-ndt-arch](https://datatracker.ietf.org/doc/draft-wmz-nmrg-agent-ndt-arch/) (score 4.2) — Network Digital Twin and Agentic AI based Architecture for AI driven Network Ope
- [draft-ietf-suit-firmware-encryption](https://datatracker.ietf.org/doc/draft-ietf-suit-firmware-encryption/) (score 3.7) — Encrypted Payloads in SUIT Manifests
**Top-rated in Human-agent interaction** (30 drafts):
- [draft-drake-email-tpm-attestation](https://datatracker.ietf.org/doc/draft-drake-email-tpm-attestation/) (4.6) — Defines hardware attestation for email using TPM verification chains to prevent spam and provide Syb
- [draft-ietf-aipref-vocab](https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/) (4.4) — Defines a standardized vocabulary for expressing preferences about how digital assets should be used
- [draft-dhir-http-agent-profile](https://datatracker.ietf.org/doc/draft-dhir-http-agent-profile/) (4.2) — Defines HTTP Agent Profile for authenticating agent traffic, separating human from agent traffic, an
- [draft-song-tsvwg-camp](https://datatracker.ietf.org/doc/draft-song-tsvwg-camp/) (4.2) — Proposes CAMP, a multipath transport protocol for interactive multimodal LLM systems that maintains
- [draft-liu-agent-operation-authorization](https://datatracker.ietf.org/doc/draft-liu-agent-operation-authorization/) (4.1) — Specifies framework for verifiable delegation of actions from humans to AI agents using JWT tokens.
### Partially Addressing Ideas
7 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| LLM-Human Collaborative Framework | draft-irtf-nmrg-llm-nm | architecture |
| CHEQ Protocol | draft-rosenberg-aiproto-cheq | protocol |
| Signed Confirmation Objects | draft-rosenberg-aiproto-cheq | mechanism |
| Cross-Protocol Integration Pattern | draft-rosenberg-aiproto-cheq | pattern |
| CHEQ Protocol | draft-rosenberg-cheq | protocol |
| Signed Decision Objects | draft-rosenberg-cheq | mechanism |
| Protocol Integration Pattern | draft-rosenberg-cheq | pattern |
---
## 4. Agent Resource Exhaustion Protection
| | |
|---|---|
| **Severity** | HIGH |
| **Category** | Autonomous netops |
| **Drafts in category** | 93 |
Missing standardized mechanisms to prevent malicious or poorly designed agents from consuming excessive network, compute, or storage resources. Current drafts focus on traffic management but not on agent-specific resource quotas and enforcement.
**Evidence:** 93 autonomous netops drafts and 73 ML traffic management drafts lack agent-specific resource protection mechanisms
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (score 4.2) — Multi-agent Collaboration Protocol Suite
- [draft-zheng-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-agent-identity-management/) (score 3.7) — Agent Identity Managenment
- [draft-zheng-dispatch-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-dispatch-agent-identity-management/) (score 3.3) — Agent Identity Managenment
- [draft-fu-nmop-agent-communication-framework](https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/) (score 3.0) — Agent Communication Framework for Network AIOps
- [draft-zyyhl-agent-networks-framework](https://datatracker.ietf.org/doc/draft-zyyhl-agent-networks-framework/) (score 3.6) — Framework for AI Agent Networks
- [draft-jia-oauth-scope-aggregation](https://datatracker.ietf.org/doc/draft-jia-oauth-scope-aggregation/) (score 3.5) — OAuth 2.0 Scope Aggregation for Multi-Step AI Agent Workflows
**Top-rated in Autonomous netops** (93 drafts):
- [draft-cui-nmrg-llm-benchmark](https://datatracker.ietf.org/doc/draft-cui-nmrg-llm-benchmark/) (4.3) — Provides comprehensive evaluation framework for LLM-based network configuration agents. Includes emu
- [draft-wmz-nmrg-agent-ndt-arch](https://datatracker.ietf.org/doc/draft-wmz-nmrg-agent-ndt-arch/) (4.2) — Comprehensive architecture combining Network Digital Twin with Agentic AI for intent-based network o
- [draft-yue-anima-agent-recovery-networks](https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/) (4.1) — Defines task-oriented multi-agent framework for fault recovery in converged mobile networks. Targets
- [draft-cui-nmrg-llm-nm](https://datatracker.ietf.org/doc/draft-cui-nmrg-llm-nm/) (4.1) — Defines framework for collaborative network management between LLM agents and human operators. Intro
- [draft-jadoon-nmrg-agentic-ai-autonomous-networks](https://datatracker.ietf.org/doc/draft-jadoon-nmrg-agentic-ai-autonomous-networks/) (4.1) — Introduces architectural principles for integrating AI agents into IP protocol stack layers while pr
### Partially Addressing Ideas
40 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| Agent Resource Type | draft-abbey-scim-agent-extension | extension |
| Agentic Application Resource Type | draft-abbey-scim-agent-extension | extension |
| Collaborative Inference Acceleration (KDN) | draft-agent-gw | mechanism |
| Data and Agent Aware-Inference and Training Network (DA-ITN) | draft-akhavain-moussa-ai-network | architecture |
| Agent-to-Agent (A2A) Communication Paradigm | draft-an-nmrg-i2icf-cits | protocol |
| Network-Level Quarantine Protocol | draft-aylward-aiga-1 | protocol |
| Agent Task Negotiation | draft-cui-ai-agent-task | protocol |
| Multi-Agent Security Protection | draft-fu-nmop-agent-communication-framework | mechanism |
*...and 32 more*
---
## 5. Agent-Generated Data Provenance
| | |
|---|---|
| **Severity** | HIGH |
| **Category** | Data formats/interop |
| **Drafts in category** | 145 |
While 145 drafts address data formats for AI interop, there's insufficient attention to tracking the provenance and lineage of data generated by agents. This creates trust and auditability issues in agent-to-agent data exchanges.
**Evidence:** 145 data format drafts with high overlap but no clear standards for agent-generated data provenance tracking
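The hash-chain approach in draft-cowles-volt hints at one shape a provenance standard could take. A minimal illustrative sketch of hash-chained lineage records (the field names are hypothetical, not drawn from any draft):

```python
import hashlib
import json

def provenance_record(agent_id: str, payload: str, parent_hash: str) -> dict:
    """Append-only lineage entry: each record commits to its producing agent,
    its content, and the hash of the record it was derived from."""
    body = {"agent": agent_id, "payload": payload, "parent": parent_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

def verify_chain(records: list) -> bool:
    """Recompute every hash and check each record points at its predecessor."""
    prev = "genesis"
    for rec in records:
        body = {k: rec[k] for k in ("agent", "payload", "parent")}
        expect = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["parent"] != prev or rec["hash"] != expect:
            return False
        prev = rec["hash"]
    return True

r1 = provenance_record("agent-a", "raw measurement", "genesis")
r2 = provenance_record("agent-b", "aggregated stats", r1["hash"])
```

Any tampering with an intermediate record breaks verification of the whole chain, which is what makes agent-to-agent data exchanges auditable.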
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-romanchuk-normative-admissibility](https://datatracker.ietf.org/doc/draft-romanchuk-normative-admissibility/) (score 3.4) — Normative Admissibility Framework for Agent Speech Acts
- [draft-li-semantic-routing-architecture](https://datatracker.ietf.org/doc/draft-li-semantic-routing-architecture/) (score 3.6) — Semantic Routing Architecture for AI Agents Communication
- [draft-cui-nmrg-llm-nm](https://datatracker.ietf.org/doc/draft-cui-nmrg-llm-nm/) (score 4.1) — A Framework for LLM Agent-Assisted Network Management with Human-in-the-Loop
- [draft-mpsb-agntcy-messaging](https://datatracker.ietf.org/doc/draft-mpsb-agntcy-messaging/) (score 2.6) — An Overview of Messaging Systems and Their Applicability to Agentic AI
- [draft-gaikwad-south-authorization](https://datatracker.ietf.org/doc/draft-gaikwad-south-authorization/) (score 3.7) — SOUTH: Stochastic Authorization for Agent and Service Requests
- [draft-abaris-aicdh](https://datatracker.ietf.org/doc/draft-abaris-aicdh/) (score 2.8) — AI Content Disclosure Header
**Top-rated in Data formats/interop** (145 drafts):
- [draft-cowles-volt](https://datatracker.ietf.org/doc/draft-cowles-volt/) (4.8) — Defines tamper-evident execution trace format for AI agent workflows using hash chains and cryptogra
- [draft-williams-netmod-lm-hierarchy-topology](https://datatracker.ietf.org/doc/draft-williams-netmod-lm-hierarchy-topology/) (4.6) — Defines YANG data model for hierarchical language model coordination across tiny, small, and large L
- [draft-ietf-lake-app-profiles](https://datatracker.ietf.org/doc/draft-ietf-lake-app-profiles/) (4.6) — Defines canonical CBOR representation for EDHOC application profiles and coordination mechanisms for
- [draft-chang-agent-token-efficient](https://datatracker.ietf.org/doc/draft-chang-agent-token-efficient/) (4.5) — Defines ADOL (Agentic Data Optimization Layer) to address token bloat in agent communication protoco
- [draft-birkholz-verifiable-agent-conversations](https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/) (4.5) — Defines CDDL-based data format for verifiable agent conversation records using COSE signing. Support
### Partially Addressing Ideas
4 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| Context-Enhanced Training Data | draft-improving-data-quality-tags | extension |
| Training Data Provenance Claims | draft-messous-eat-ai | mechanism |
| Sentinel Evidence Package | draft-reilly-sentinel-protocol | architecture |
| AI Lifecycle Provenance Tracking | draft-reilly-sentinel-protocol | architecture |
---
## 6. Agent Capability Degradation Handling
| | |
|---|---|
| **Severity** | HIGH |
| **Category** | AI safety/alignment |
| **Drafts in category** | 44 |
No standardized approaches for detecting and handling when an agent's capabilities degrade due to model drift, data corruption, or hardware issues. Systems need graceful degradation protocols rather than silent failures.
**Evidence:** None of the 44 safety/alignment drafts addresses capability degradation, while 213+ drafts assume stable agent performance
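Graceful degradation could be as simple as a rolling success-rate monitor that downgrades the agent's advertised state instead of failing silently. An illustrative sketch (the thresholds and state names are invented for this example):

```python
from collections import deque

class DegradationMonitor:
    """Illustrative drift detector: track a rolling window of task outcomes and
    downgrade the agent's advertised state instead of failing silently."""

    def __init__(self, window: int = 20, degraded_below: float = 0.8, halt_below: float = 0.5):
        self.outcomes = deque(maxlen=window)
        self.degraded_below = degraded_below
        self.halt_below = halt_below

    def record(self, success: bool) -> str:
        self.outcomes.append(success)
        rate = sum(self.outcomes) / len(self.outcomes)
        if rate < self.halt_below:
            return "HALTED"      # stop accepting tasks, signal peers
        if rate < self.degraded_below:
            return "DEGRADED"    # accept only low-risk tasks
        return "HEALTHY"

mon = DegradationMonitor(window=10)
```

A standard would need to define how the `DEGRADED`/`HALTED` states are advertised to peers so that coordinating agents can reroute work, which no current draft covers.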
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (score 4.2) — Multi-agent Collaboration Protocol Suite
- [draft-zheng-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-agent-identity-management/) (score 3.7) — Agent Identity Managenment
- [draft-zheng-dispatch-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-dispatch-agent-identity-management/) (score 3.3) — Agent Identity Managenment
- [draft-fu-nmop-agent-communication-framework](https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/) (score 3.0) — Agent Communication Framework for Network AIOps
- [draft-zyyhl-agent-networks-framework](https://datatracker.ietf.org/doc/draft-zyyhl-agent-networks-framework/) (score 3.6) — Framework for AI Agent Networks
- [draft-li-dmsc-inf-architecture](https://datatracker.ietf.org/doc/draft-li-dmsc-inf-architecture/) (score 3.1) — Dynamic Multi-agent Secured Collaboration Infrastructure Architecture
**Top-rated in AI safety/alignment** (44 drafts):
- [draft-cowles-volt](https://datatracker.ietf.org/doc/draft-cowles-volt/) (4.8) — Defines tamper-evident execution trace format for AI agent workflows using hash chains and cryptogra
- [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) (4.8) — Defines comprehensive protocol for AI agent accountability including authentication, monitoring, and
- [draft-guy-bary-stamp-protocol](https://datatracker.ietf.org/doc/draft-guy-bary-stamp-protocol/) (4.6) — Defines STAMP protocol for cryptographic delegation and proof in AI agent systems. Provides task-bou
- [draft-drake-email-tpm-attestation](https://datatracker.ietf.org/doc/draft-drake-email-tpm-attestation/) (4.6) — Defines hardware attestation for email using TPM verification chains to prevent spam and provide Syb
- [draft-goswami-agentic-jwt](https://datatracker.ietf.org/doc/draft-goswami-agentic-jwt/) (4.5) — Extends OAuth 2.0 with Agentic JWT to address authorization challenges in autonomous AI systems. Int
### Partially Addressing Ideas
45 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| Semantic Routing | draft-agent-gw | mechanism |
| Semantic Routing | draft-ainp-protocol | mechanism |
| Capability-based Discovery | draft-ainp-protocol | pattern |
| Complex Delegation Relationship Management | draft-chen-ai-agent-auth-new-requirements | architecture |
| Capability-Based Discovery Mechanism | draft-cui-ai-agent-discovery-invocation | mechanism |
| Agent Capability Negotiation Protocol | draft-cui-dmsc-agent-cdi | protocol |
| Agent Capability-Based Routing | draft-du-catalist-routing-considerations | mechanism |
| Agent Monitoring and Tracking | draft-fu-nmop-agent-communication-framework | mechanism |
*...and 37 more*
---
## 7. Multi-Agent Coordination Deadlocks
| | |
|---|---|
| **Severity** | HIGH |
| **Category** | A2A protocols |
| **Drafts in category** | 120 |
With 120+ A2A protocol drafts, there's insufficient attention to preventing deadlock situations where multiple agents create circular dependencies or resource conflicts. Missing are standardized deadlock detection and resolution mechanisms.
**Evidence:** 120 A2A protocol drafts with high internal overlap but no systematic deadlock prevention frameworks
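Standardized deadlock detection could start from a wait-for graph over blocked agents, the classic technique from distributed databases. A minimal illustrative sketch (not taken from any draft):

```python
def find_deadlock(wait_for: dict) -> list:
    """Detect a cycle in an agent wait-for graph (agent -> agent it is blocked on).
    Returns the agents forming a cycle, or [] if none. Illustrative only."""
    for start in wait_for:
        seen, node = [], start
        while node in wait_for:
            if node in seen:
                return seen[seen.index(node):]   # the circular dependency
            seen.append(node)
            node = wait_for[node]
    return []

# agent-a waits on agent-b, which waits on agent-c, which waits on agent-a
graph = {"agent-a": "agent-b", "agent-b": "agent-c", "agent-c": "agent-a"}
```

The hard standardization problem is not the cycle check but agreeing on who collects the wait-for edges across administrative domains and who is authorized to break the cycle.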
### Related Drafts ### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-yue-anima-agent-recovery-networks](https://datatracker.ietf.org/doc/draft-yue-anima-agent-recovery-networks/) (score 4.1) — Task-Oriented Multi-Agent Recovery Framework for High-Reliability in Converged M
- [draft-chang-agent-context-interaction](https://datatracker.ietf.org/doc/draft-chang-agent-context-interaction/) (score 2.9) — Agent Context Interaction Optimizations
- [draft-fu-nmop-agent-communication-framework](https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/) (score 3.0) — Agent Communication Framework for Network AIOps
- [draft-cui-ai-agent-task](https://datatracker.ietf.org/doc/draft-cui-ai-agent-task/) (score 3.0) — Task-oriented Coordination Requirements for AI Agent Protocols
**Top-rated in A2A protocols** (120 drafts):
- [draft-guy-bary-stamp-protocol](https://datatracker.ietf.org/doc/draft-guy-bary-stamp-protocol/) (4.6) — Defines STAMP protocol for cryptographic delegation and proof in AI agent systems. Provides task-bou
- [draft-williams-netmod-lm-hierarchy-topology](https://datatracker.ietf.org/doc/draft-williams-netmod-lm-hierarchy-topology/) (4.6) — Defines YANG data model for hierarchical language model coordination across tiny, small, and large L
- [draft-ietf-lake-edhoc](https://datatracker.ietf.org/doc/draft-ietf-lake-edhoc/) (4.6) — Specifies EDHOC, a compact authenticated Diffie-Hellman key exchange protocol for constrained enviro
- [draft-chang-agent-token-efficient](https://datatracker.ietf.org/doc/draft-chang-agent-token-efficient/) (4.5) — Defines ADOL (Agentic Data Optimization Layer) to address token bloat in agent communication protoco
- [draft-chen-oauth-rar-agent-extensions](https://datatracker.ietf.org/doc/draft-chen-oauth-rar-agent-extensions/) (4.2) — Extends OAuth RAR with policy_context and lifecycle_binding members for AI agent environments. Enabl
- [draft-mallick-muacp](https://datatracker.ietf.org/doc/draft-mallick-muacp/) (4.2) — Resource-efficient messaging protocol specifically designed for constrained IoT/Edge devices with de
### Partially Addressing Ideas
11 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| Multi-Agent Task Coordination | draft-du-ai-agent-communication-6g-aspect | mechanism |
| AI Gateway | draft-fu-nmop-agent-communication-framework | architecture |
| DMSC Infrastructure Architecture | draft-li-dmsc-inf-architecture | architecture |
| Multi-agent Collaboration Protocol Suite | draft-li-dmsc-macp | protocol |
| Task-based Multi-Agent Coordination | draft-li-dmsc-mcps-agw | pattern |
| Cognitive Networking Substrate | draft-li-semantic-routing-architecture | architecture |
| Agent Communication Use Cases | draft-stephan-ai-agent-6g | pattern |
| Structured Responsibility and Traceability Architecture (SRTA) | draft-takagi-srta-trinity | architecture |
*...and 3 more*
---
## 8. Agent Privacy Preservation
| | |
|---|---|
| **Severity** | HIGH |
| **Category** | Agent identity/auth |
| **Drafts in category** | 108 |
Agents often process sensitive data but current drafts don't adequately address privacy-preserving computation, differential privacy, or secure multi-party computation for agent interactions. This is critical for deployment in regulated industries.
**Evidence:** 108 identity/auth drafts focus on authentication but lack privacy preservation mechanisms for agent data processing
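One concrete mechanism the drafts leave out is differentially private aggregation. A minimal sketch of the standard Laplace mechanism for a count query (the epsilon value is illustrative; a count query has sensitivity 1):

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Differentially private count: sensitivity of a count is 1, so Laplace
    noise with scale 1/epsilon yields epsilon-DP for the released value."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
noisy = dp_count(100, epsilon=0.5, rng=rng)
```

The released value stays close to the truth on average while masking whether any single record was present, which is the property regulated deployments need.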
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (score 4.2) — Multi-agent Collaboration Protocol Suite
- [draft-khatri-sipcore-call-transfer-fail-response](https://datatracker.ietf.org/doc/draft-khatri-sipcore-call-transfer-fail-response/) (score 3.3) — A SIP Response Code (497) for Call Transfer Failure
- [draft-cui-dmsc-agent-cdi](https://datatracker.ietf.org/doc/draft-cui-dmsc-agent-cdi/) (score 3.0) — Cross-Domain Interoperability Framework for AI Agent Collaboration
- [draft-yu-ai-agent-use-cases-in-6g](https://datatracker.ietf.org/doc/draft-yu-ai-agent-use-cases-in-6g/) (score 2.5) — AI Agent Use Cases and Requirements in 6G Network
- [draft-zhang-rvp-problem-statement](https://datatracker.ietf.org/doc/draft-zhang-rvp-problem-statement/) (score 3.5) — Problem Statements and Requirements of Real-Virtual Agent Protocol (RVP): Commun
- [draft-zheng-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-agent-identity-management/) (score 3.7) — Agent Identity Managenment
- [draft-zheng-dispatch-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-dispatch-agent-identity-management/) (score 3.3) — Agent Identity Managenment
- [draft-fu-nmop-agent-communication-framework](https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/) (score 3.0) — Agent Communication Framework for Network AIOps
- [draft-zyyhl-agent-networks-framework](https://datatracker.ietf.org/doc/draft-zyyhl-agent-networks-framework/) (score 3.6) — Framework for AI Agent Networks
- [draft-kale-agntcy-federated-privacy](https://datatracker.ietf.org/doc/draft-kale-agntcy-federated-privacy/) (score 3.2) — Privacy-Preserving Federated Learning Architecture for Multi-Tenant AI Agent Sys
**Top-rated in Agent identity/auth** (108 drafts):
- [draft-cowles-volt](https://datatracker.ietf.org/doc/draft-cowles-volt/) (4.8) — Defines tamper-evident execution trace format for AI agent workflows using hash chains and cryptogra
- [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) (4.8) — Defines comprehensive protocol for AI agent accountability including authentication, monitoring, and
- [draft-guy-bary-stamp-protocol](https://datatracker.ietf.org/doc/draft-guy-bary-stamp-protocol/) (4.6) — Defines STAMP protocol for cryptographic delegation and proof in AI agent systems. Provides task-bou
- [draft-drake-email-tpm-attestation](https://datatracker.ietf.org/doc/draft-drake-email-tpm-attestation/) (4.6) — Defines hardware attestation for email using TPM verification chains to prevent spam and provide Syb
- [draft-williams-netmod-lm-hierarchy-topology](https://datatracker.ietf.org/doc/draft-williams-netmod-lm-hierarchy-topology/) (4.6) — Defines YANG data model for hierarchical language model coordination across tiny, small, and large L
### Partially Addressing Ideas
11 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| Agent Card Structure | draft-nandakumar-agent-sd-jwt | protocol |
| Pseudonymous Key Generation | draft-bradleylundberg-cfrg-arkg | mechanism |
| Privacy-Preserving Human Tokens | draft-dhir-http-agent-profile | mechanism |
| Cryptographic Erasure Compliance | draft-gaikwad-aps-profile | mechanism |
| Privacy-Respecting Capability Attestation | draft-huang-rats-agentic-eat-cap-attest | pattern |
| Differential Privacy for Agent Models | draft-kale-agntcy-federated-privacy | mechanism |
| Agent Identity Preservation | draft-liu-oauth-a2a-profile | pattern |
| Inference-Time Data Access Policy Claims | draft-messous-eat-ai | mechanism |
*...and 3 more*
---
## 9. Agent Firmware/Model Update Security
| | |
|---|---|
| **Severity** | HIGH |
| **Category** | Model serving/inference |
| **Drafts in category** | 42 |
While model serving is addressed in 42 drafts, there's insufficient focus on secure update mechanisms for agent models and firmware. Missing are standards for cryptographically verified, rollback-capable agent updates.
**Evidence:** 42 model serving drafts but no comprehensive security standards for agent software/model updates
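What "cryptographically verified, rollback-capable" could mean in practice: verify a signature over the payload and refuse non-monotonic versions. The sketch below is illustrative only, and uses a symmetric HMAC as a stand-in for the asymmetric signature scheme a real standard would require:

```python
import hashlib
import hmac

SIGNING_KEY = b"vendor-secret"   # stand-in: a real scheme would use asymmetric keys

def sign_update(model_bytes: bytes, version: int) -> str:
    return hmac.new(SIGNING_KEY, model_bytes + str(version).encode(), hashlib.sha256).hexdigest()

def apply_update(current_version: int, model_bytes: bytes, version: int, sig: str) -> int:
    """Accept an update only if the signature verifies AND the version is newer,
    blocking both tampered payloads and rollback to vulnerable versions."""
    expected = sign_update(model_bytes, version)
    if not hmac.compare_digest(expected, sig):
        raise ValueError("signature mismatch")
    if version <= current_version:
        raise ValueError("rollback refused")
    return version

good_sig = sign_update(b"model-v3", 3)
```

Binding the version into the signed material is what prevents an attacker from replaying an old, validly signed model.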
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (score 4.2) — Multi-agent Collaboration Protocol Suite
- [draft-zheng-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-agent-identity-management/) (score 3.7) — Agent Identity Managenment
- [draft-zheng-dispatch-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-dispatch-agent-identity-management/) (score 3.3) — Agent Identity Managenment
- [draft-fu-nmop-agent-communication-framework](https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/) (score 3.0) — Agent Communication Framework for Network AIOps
- [draft-zyyhl-agent-networks-framework](https://datatracker.ietf.org/doc/draft-zyyhl-agent-networks-framework/) (score 3.6) — Framework for AI Agent Networks
- [draft-ietf-tls-extended-key-update](https://datatracker.ietf.org/doc/draft-ietf-tls-extended-key-update/) (score 4.2) — Extended Key Update for Transport Layer Security (TLS) 1.3
**Top-rated in Model serving/inference** (42 drafts):
- [draft-williams-netmod-lm-hierarchy-topology](https://datatracker.ietf.org/doc/draft-williams-netmod-lm-hierarchy-topology/) (4.6) — Defines YANG data model for hierarchical language model coordination across tiny, small, and large L
- [draft-chang-agent-token-efficient](https://datatracker.ietf.org/doc/draft-chang-agent-token-efficient/) (4.5) — Defines ADOL (Agentic Data Optimization Layer) to address token bloat in agent communication protoco
- [draft-calabria-bmwg-ai-fabric-inference-bench](https://datatracker.ietf.org/doc/draft-calabria-bmwg-ai-fabric-inference-bench/) (4.5) — Defines benchmarking methodology for AI inference network fabrics. Establishes KPIs and test procedu
- [draft-wang-cats-odsi](https://datatracker.ietf.org/doc/draft-wang-cats-odsi/) (4.5) — Specifies framework for decentralized LLM inference across untrusted participants with layer-aware e
- [draft-wmz-nmrg-agent-ndt-arch](https://datatracker.ietf.org/doc/draft-wmz-nmrg-agent-ndt-arch/) (4.2) — Comprehensive architecture combining Network Digital Twin with Agentic AI for intent-based network o
### Partially Addressing Ideas
79 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| Multi-layered Security Architecture | draft-aylward-daap-v2 | architecture |
| VERA Zero Trust Reference Architecture | draft-berlinai-vera | architecture |
| Evidence-Based Maturity Runtime | draft-berlinai-vera | mechanism |
| Five Enforcement Pillars with Typed Schemas | draft-berlinai-vera | pattern |
| AI Agent Structured Threat Model | draft-berlinai-vera | requirement |
| Cryptographic Proof-Based Autonomy | draft-berlinai-vera | mechanism |
| Pseudonymous Key Generation | draft-bradleylundberg-cfrg-arkg | mechanism |
| Multi-Agent Security Protection | draft-fu-nmop-agent-communication-framework | mechanism |
*...and 71 more*
---
## 10. Real-time Agent Debugging
| | |
|---|---|
| **Severity** | MEDIUM |
| **Category** | Other AI/agent |
| **Drafts in category** | 26 |
Missing standardized protocols for debugging autonomous agents in production environments. When agents make unexpected decisions, there are no standard interfaces for real-time introspection without disrupting operations.
**Evidence:** 26 other AI/agent drafts suggest various approaches but no standardized debugging protocols for production agents
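A standardized introspection interface might expose a read-only snapshot of decision state that observers can poll without pausing the agent. An illustrative sketch (the interface is invented for illustration, not proposed by any draft):

```python
import json
import threading

class DebuggableAgent:
    """Illustrative production-debug interface: a lock-protected, read-only
    snapshot of decision state that observers poll without stopping the agent."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {"task": None, "last_decision": None, "step": 0}

    def act(self, task: str, decision: str) -> None:
        with self._lock:
            self._state.update(task=task, last_decision=decision)
            self._state["step"] += 1

    def snapshot(self) -> str:
        """Serialize a copy of internal state; callers cannot mutate the agent."""
        with self._lock:
            return json.dumps(dict(self._state))

agent = DebuggableAgent()
agent.act("reroute-traffic", "shift 20% load to path B")
```

Returning a serialized copy rather than a reference is the non-disruptive part: debuggers observe without being able to perturb the running agent.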
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-an-nmrg-i2icf-cits](https://datatracker.ietf.org/doc/draft-an-nmrg-i2icf-cits/) (score 3.7) — Interface to In-Network Computing Functions for Cooperative Intelligent Transpor
- [draft-zhao-detnet-enhanced-use-cases](https://datatracker.ietf.org/doc/draft-zhao-detnet-enhanced-use-cases/) (score 3.2) — Enhanced Use Cases for Scaling Deterministic Networks
- [draft-zhang-rvp-problem-statement](https://datatracker.ietf.org/doc/draft-zhang-rvp-problem-statement/) (score 3.5) — Problem Statements and Requirements of Real-Virtual Agent Protocol (RVP): Commun
- [draft-yuan-rtgwg-traffic-agent-usecase](https://datatracker.ietf.org/doc/draft-yuan-rtgwg-traffic-agent-usecase/) (score 3.7) — Use cases of the AI Network Traffic Optimization Agent
- [draft-hong-nmrg-agenticai-ps](https://datatracker.ietf.org/doc/draft-hong-nmrg-agenticai-ps/) (score 3.0) — Motivations and Problem Statement of Agentic AI for network management
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (score 4.2) — Multi-agent Collaboration Protocol Suite
**Top-rated in Other AI/agent** (26 drafts):
- [draft-calabria-bmwg-ai-fabric-inference-bench](https://datatracker.ietf.org/doc/draft-calabria-bmwg-ai-fabric-inference-bench/) (4.5) — Defines benchmarking methodology for AI inference network fabrics. Establishes KPIs and test procedu
- [draft-ietf-tls-ecdhe-mlkem](https://datatracker.ietf.org/doc/draft-ietf-tls-ecdhe-mlkem/) (4.4) — Defines hybrid post-quantum key agreement mechanisms for TLS 1.3 that combine ML-KEM with traditiona
- [draft-wmz-nmrg-agent-ndt-arch](https://datatracker.ietf.org/doc/draft-wmz-nmrg-agent-ndt-arch/) (4.2) — Comprehensive architecture combining Network Digital Twin with Agentic AI for intent-based network o
- [draft-an-nmrg-i2icf-cits](https://datatracker.ietf.org/doc/draft-an-nmrg-i2icf-cits/) (3.7) — Defines framework for orchestrating In-Network Computing Functions in Cooperative Intelligent Transp
- [draft-cui-nmrg-auto-test](https://datatracker.ietf.org/doc/draft-cui-nmrg-auto-test/) (3.6) — Framework for AI-assisted network protocol testing using LLMs and automated test generation. Defines
### Partially Addressing Ideas
23 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| A2A Protocol Transport over MOQT | draft-a2a-moqt-transport | protocol |
| QUIC-based Publish/Subscribe for AI Agents | draft-a2a-moqt-transport | mechanism |
| Streaming Capabilities Integration | draft-a2a-moqt-transport | mechanism |
| Action-Based Authorization | draft-aylward-aiga-2 | mechanism |
| Multi-layered Security Architecture | draft-aylward-daap-v2 | architecture |
| Behavioral Monitoring Framework | draft-aylward-daap-v2 | mechanism |
| Context-Aware Task Scheduling | draft-cui-ai-agent-task | mechanism |
| Real-Time Task Adaptability | draft-cui-ai-agent-task | requirement |
*...and 15 more*
---
## 11. Cross-Protocol Agent Migration
| | |
|---|---|
| **Severity** | MEDIUM |
| **Category** | A2A protocols |
| **Drafts in category** | 120 |
No standardized mechanisms for migrating agent state and context when moving between different A2A protocols or infrastructure providers. This creates vendor lock-in and limits agent mobility.
**Evidence:** 120 A2A protocol drafts with high overlap suggest competing approaches but no migration standards between them
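Any migration standard would need a protocol-neutral state envelope that separates portable state from provider internals. An illustrative sketch (the format name and field list are hypothetical):

```python
import json

PORTABLE_FIELDS = ("agent_id", "goals", "memory", "credentials_ref")

def export_state(agent: dict) -> str:
    """Serialize only protocol-neutral fields into a portable envelope;
    provider-specific fields stay behind, since they cannot be assumed to transfer."""
    envelope = {k: agent[k] for k in PORTABLE_FIELDS if k in agent}
    return json.dumps({"format": "agent-state/v0", "state": envelope})

def import_state(blob: str) -> dict:
    data = json.loads(blob)
    if data.get("format") != "agent-state/v0":
        raise ValueError("unsupported migration format")
    return data["state"]

src = {"agent_id": "a-1", "goals": ["monitor"], "memory": {"x": 1},
       "provider_handle": "vendor-internal-123"}   # non-portable, deliberately dropped
```

Agreeing on the portable field set (and on how identity and credentials survive the move) is exactly the standardization work none of the 120 drafts takes on.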
### Related Drafts ### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-narajala-ans](https://datatracker.ietf.org/doc/draft-narajala-ans/) (score 4.2) — Agent Name Service (ANS): A Universal Directory for Secure AI Agent Discovery an
- [draft-ietf-emu-eap-edhoc](https://datatracker.ietf.org/doc/draft-ietf-emu-eap-edhoc/) (score 3.2) — Using the Extensible Authentication Protocol (EAP) with Ephemeral Diffie-Hellman
- [draft-howe-sipcore-mcp-extension](https://datatracker.ietf.org/doc/draft-howe-sipcore-mcp-extension/) (score 3.7) — SIP Extension for Model Context Protocol (MCP)
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (score 4.2) — Multi-agent Collaboration Protocol Suite
**Top-rated in A2A protocols** (120 drafts):
- [draft-guy-bary-stamp-protocol](https://datatracker.ietf.org/doc/draft-guy-bary-stamp-protocol/) (4.6) — Defines STAMP protocol for cryptographic delegation and proof in AI agent systems. Provides task-bou
- [draft-williams-netmod-lm-hierarchy-topology](https://datatracker.ietf.org/doc/draft-williams-netmod-lm-hierarchy-topology/) (4.6) — Defines YANG data model for hierarchical language model coordination across tiny, small, and large L
- [draft-ietf-lake-edhoc](https://datatracker.ietf.org/doc/draft-ietf-lake-edhoc/) (4.6) — Specifies EDHOC, a compact authenticated Diffie-Hellman key exchange protocol for constrained enviro
- [draft-chang-agent-token-efficient](https://datatracker.ietf.org/doc/draft-chang-agent-token-efficient/) (4.5) — Defines ADOL (Agentic Data Optimization Layer) to address token bloat in agent communication protoco
- [draft-chen-oauth-rar-agent-extensions](https://datatracker.ietf.org/doc/draft-chen-oauth-rar-agent-extensions/) (4.2) — Extends OAuth RAR with policy_context and lifecycle_binding members for AI agent environments. Enabl
- [draft-mallick-muacp](https://datatracker.ietf.org/doc/draft-mallick-muacp/) (4.2) — Resource-efficient messaging protocol specifically designed for constrained IoT/Edge devices with de
### Partially Addressing Ideas ### Partially Addressing Ideas
3 extracted ideas touch on this gap: No directly related technical ideas found in current drafts — this gap is entirely unaddressed.
| Idea | Draft | Type |
|------|-------|------|
| Transport-Independent Attestation Format | draft-drake-email-tpm-attestation | extension |
| Cross-Protocol Integration Pattern | draft-rosenberg-aiproto-cheq | pattern |
| Agent Mobility with IPv6 MIPv6 | draft-yc-ipv6-for-ioa | mechanism |
--- ---
## 12. Agent Energy Consumption Optimization
## 5. Agent Resource Accounting and Billing
| | |
|---|---|
| **Severity** | MEDIUM |
| **Category** | ML traffic mgmt |
| **Drafts in category** | 73 |
| **Severity** | HIGH |
| **Category** | new |
| **Drafts in category** | 0 |
Missing standards for energy-aware agent deployment and operation. As AI workloads are energy-intensive, there's no framework for agents to optimize their energy consumption or for infrastructure to enforce energy budgets.
**Evidence:** 73 ML traffic management drafts focus on performance but lack energy consumption considerations for sustainable AI deployment
No standardized protocols exist for tracking and billing computational resources consumed by autonomous agents across distributed systems. This is essential for commercial deployment but completely unaddressed.
**Evidence:** High focus on protocols and deployment but zero drafts addressing economic models
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (score 4.2) — Multi-agent Collaboration Protocol Suite
- [draft-zheng-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-agent-identity-management/) (score 3.7) — Agent Identity Managenment
- [draft-zheng-dispatch-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-dispatch-agent-identity-management/) (score 3.3) — Agent Identity Managenment
- [draft-fu-nmop-agent-communication-framework](https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/) (score 3.0) — Agent Communication Framework for Network AIOps
- [draft-zyyhl-agent-networks-framework](https://datatracker.ietf.org/doc/draft-zyyhl-agent-networks-framework/) (score 3.6) — Framework for AI Agent Networks
- [draft-ahc-green-smartpdu-yang](https://datatracker.ietf.org/doc/draft-ahc-green-smartpdu-yang/) (score 2.9) — A YANG Model for SmartPDU Monitoring and Control
- [draft-jia-oauth-scope-aggregation](https://datatracker.ietf.org/doc/draft-jia-oauth-scope-aggregation/) (score 3.5) — OAuth 2.0 Scope Aggregation for Multi-Step AI Agent Workflows
**Top-rated in ML traffic mgmt** (73 drafts):
- [draft-calabria-bmwg-ai-fabric-inference-bench](https://datatracker.ietf.org/doc/draft-calabria-bmwg-ai-fabric-inference-bench/) (4.5) — Defines benchmarking methodology for AI inference network fabrics. Establishes KPIs and test procedu
- [draft-dhir-http-agent-profile](https://datatracker.ietf.org/doc/draft-dhir-http-agent-profile/) (4.2) — Defines HTTP Agent Profile for authenticating agent traffic, separating human from agent traffic, an
- [draft-calabria-bmwg-ai-fabric-terminology](https://datatracker.ietf.org/doc/draft-calabria-bmwg-ai-fabric-terminology/) (4.2) — Defines comprehensive benchmarking terminology for AI network fabrics including collective communica
- [draft-li-spring-rdma-multicast-over-srv6](https://datatracker.ietf.org/doc/draft-li-spring-rdma-multicast-over-srv6/) (4.2) — Specifies SRv6 extensions for RDMA multicast delivery with new End.MT behavior and ACK/NACK aggregat
- [draft-song-tsvwg-camp](https://datatracker.ietf.org/doc/draft-song-tsvwg-camp/) (4.2) — Proposes CAMP, a multipath transport protocol for interactive multimodal LLM systems that maintains
### Partially Addressing Ideas
17 extracted ideas touch on this gap:
8 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| SmartPDU Telemetry Framework | draft-ahc-green-smartpdu-yang | mechanism |
| Agent Context Distribution | draft-chang-agent-context-interaction | mechanism |
| Context Distribution Optimization Procedures | draft-chang-agent-context-interaction | protocol |
| Schema Deduplication via JSON References | draft-chang-agent-token-efficient | mechanism |
| Agentic Data Optimization Layer (ADOL) | draft-chang-agent-token-efficient | architecture |
| Information Exchange Efficiency | draft-chuyi-nmrg-agentic-network-inference | mechanism |
| Vector Index Workload Optimization | draft-gaikwad-aps-profile | pattern |
| Collaboration Tunnel Protocol (TCT) | draft-jurkovikj-collab-tunnel | protocol |
*...and 9 more*
| SCIM 2.0 Extension for Agents and Agentic Applications | draft-abbey-scim-agent-extension | extension |
| Events Query Protocol | draft-gupta-httpapi-events-query | protocol |
| Micro Agent Communication Protocol (µACP) | draft-mallick-muacp | protocol |
| MOQT Binding for A2A and MCP Protocols | draft-nandakumar-ai-agent-moq-transport | extension |
| SCIM 2.0 Agent Extension | draft-scim-agent-extension | extension |
| Authorized Connection Policy Framework | draft-steckbeck-ua-conn-sec | mechanism |
| Agent Workflow Protocol Well-Known Resource | draft-vinaysingh-awp-wellknown | extension |
| AI Network Traffic Optimization Agent | draft-yuan-rtgwg-traffic-agent-usecase | architecture |
---
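Gap 5's billing problem is easy to make concrete. As a hedged sketch only (no draft defines such a record, and every field name here, including `unit_price_microusd`, is invented for illustration), a metered usage record an agent platform might emit could look like this:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class UsageRecord:
    """One metered unit of agent work; all field names are illustrative."""
    agent_id: str
    task_id: str
    tokens_in: int
    tokens_out: int
    cpu_seconds: float
    unit_price_microusd: int  # price per 1K tokens, in micro-USD

    def cost_microusd(self) -> int:
        # This sketch bills only token usage; CPU time is recorded but unpriced.
        total_tokens = self.tokens_in + self.tokens_out
        return (total_tokens * self.unit_price_microusd) // 1000

    def to_json(self) -> str:
        # Canonical, sorted encoding so two parties can compare records byte-for-byte.
        return json.dumps(asdict(self), sort_keys=True)

record = UsageRecord("agent-42", "task-7", tokens_in=1200, tokens_out=800,
                     cpu_seconds=3.5, unit_price_microusd=500)
```

A real standard would also have to specify signing, aggregation across hops, and dispute handling; this only shows the shape of the accounting unit.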
## 6. Agent Capability Advertisement Verification
| | |
|---|---|
| **Severity** | HIGH |
| **Category** | Agent discovery/reg |
| **Drafts in category** | 87 |
While agent discovery protocols exist, there's no way to cryptographically verify that advertised agent capabilities are accurate. Agents could falsely claim capabilities, leading to system failures.
**Evidence:** 87 discovery drafts but no mention of capability verification mechanisms
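To illustrate what capability verification could look like, the sketch below signs a canonicalized capability list so a consumer can detect tampering. It uses an HMAC with a shared registry key purely as a stand-in for the public-key signatures a real design would need; `REGISTRY_KEY` and the field names are invented for this example.

```python
import hashlib
import hmac
import json

REGISTRY_KEY = b"demo-shared-secret"  # stand-in for a registry's signing key

def sign_capabilities(agent_id: str, capabilities: list) -> dict:
    # Canonical encoding (sorted keys, sorted caps) so signer and
    # verifier hash identical bytes.
    payload = json.dumps({"agent": agent_id, "caps": sorted(capabilities)},
                         sort_keys=True).encode()
    tag = hmac.new(REGISTRY_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "tag": tag}

def verify_capabilities(adv: dict) -> bool:
    expected = hmac.new(REGISTRY_KEY, adv["payload"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, adv["tag"])

adv = sign_capabilities("agent-1", ["translate", "summarize"])
# A consumer altering the claimed capabilities invalidates the tag:
tampered = {"payload": adv["payload"].replace("translate", "transact"),
            "tag": adv["tag"]}
```

Note that signing only proves who advertised the capabilities, not that the agent can actually perform them; attested benchmarks would be needed for the latter.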
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-zheng-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-agent-identity-management/) (score 3.7) — Agent Identity Managenment
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (score 4.2) — Multi-agent Collaboration Protocol Suite
- [draft-zheng-dispatch-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-dispatch-agent-identity-management/) (score 3.3) — Agent Identity Managenment
- [draft-fu-nmop-agent-communication-framework](https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/) (score 3.0) — Agent Communication Framework for Network AIOps
- [draft-zyyhl-agent-networks-framework](https://datatracker.ietf.org/doc/draft-zyyhl-agent-networks-framework/) (score 3.6) — Framework for AI Agent Networks
- [draft-li-dmsc-inf-architecture](https://datatracker.ietf.org/doc/draft-li-dmsc-inf-architecture/) (score 3.1) — Dynamic Multi-agent Secured Collaboration Infrastructure Architecture
**Top-rated in Agent discovery/reg** (87 drafts):
- [draft-narajala-ans](https://datatracker.ietf.org/doc/draft-narajala-ans/) (4.2) — Introduces Agent Name Service (ANS) as a DNS-based universal directory for AI agent discovery and ve
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (4.2) — Specifies a comprehensive multi-agent collaboration protocol suite using Agent Gateways for registra
- [draft-cui-dns-native-agent-naming-resolution](https://datatracker.ietf.org/doc/draft-cui-dns-native-agent-naming-resolution/) (4.1) — Specifies DNS-native naming and resolution for AI agents using FQDNs and SVCB records. Emphasizes DN
- [draft-nederveld-adl](https://datatracker.ietf.org/doc/draft-nederveld-adl/) (4.1) — Defines ADL, a JSON-based standard for describing AI agents including their capabilities, tools, per
- [draft-rosenberg-ai-protocols](https://datatracker.ietf.org/doc/draft-rosenberg-ai-protocols/) (4.1) — Establishes framework for AI agent communications on the Internet, surveying existing protocols like
### Partially Addressing Ideas
25 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| DNS-based AI Agent Discovery | draft-mozleywilliams-dnsop-bandaid | mechanism |
| DNS namespace for AI agent discovery | draft-mozleywilliams-dnsop-dnsaid | mechanism |
| Agent Registration and Discovery Protocol | draft-pioli-agent-discovery | protocol |
| Intent-based Agent Interconnection Protocol | draft-sun-zhang-iaip | protocol |
| Capability Advertisement and Intent Resolution | draft-sz-dmsc-iaip | mechanism |
| Intelligent Agent Communication Gateway Architecture | draft-agent-gw | architecture |
| AI-Native Network Protocol (AINP) | draft-ainp-protocol | protocol |
| Distributed AI Accountability Protocol | draft-aylward-daap-v2 | protocol |
*...and 17 more*
---
## 7. Cross-Domain Agent Communication Security
| | |
|---|---|
| **Severity** | HIGH |
| **Category** | Agent identity/auth |
| **Drafts in category** | 145 |
Current identity/auth solutions don't address secure communication between agents operating in different security domains or trust boundaries. Critical for enterprise and government deployments.
**Evidence:** 145 identity drafts show awareness but cross-domain scenarios appear unaddressed
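One piece of the cross-domain story is a policy gate at the trust boundary. The sketch below shows the kind of check an agent gateway might apply before forwarding a cross-domain message; the domains, assurance levels, and operation names are all invented examples, not anything a draft specifies.

```python
from dataclasses import dataclass

# Per-domain policy table; entries are hypothetical examples.
TRUSTED_DOMAINS = {
    "partner.example": {"min_assurance": 2, "allowed_ops": {"read", "query"}},
    "internal.example": {"min_assurance": 1,
                         "allowed_ops": {"read", "query", "write"}},
}

@dataclass
class CrossDomainRequest:
    source_domain: str
    assurance_level: int  # e.g. 1 = authenticated, 2 = attested
    operation: str

def admit(req: CrossDomainRequest) -> bool:
    """Gateway-side check before forwarding a cross-domain agent message."""
    policy = TRUSTED_DOMAINS.get(req.source_domain)
    if policy is None:
        return False  # unknown trust domain: deny by default
    return (req.assurance_level >= policy["min_assurance"]
            and req.operation in policy["allowed_ops"])
```

The hard unsolved part the gap points at is standardizing how the assurance level itself is established and carried across domains, which this table simply assumes.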
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-diaconu-agents-authz-info-sharing](https://datatracker.ietf.org/doc/draft-diaconu-agents-authz-info-sharing/) (score 3.2) — Cross-Domain AuthZ Information sharing for Agents
- [draft-cui-dmsc-agent-cdi](https://datatracker.ietf.org/doc/draft-cui-dmsc-agent-cdi/) (score 3.0) — Cross-Domain Interoperability Framework for AI Agent Collaboration
- [draft-han-rtgwg-agent-gateway-intercomm-framework](https://datatracker.ietf.org/doc/draft-han-rtgwg-agent-gateway-intercomm-framework/) (score 3.6) — Agent Gateway Intercommunication Framework
- [draft-ni-a2a-ai-agent-security-requirements](https://datatracker.ietf.org/doc/draft-ni-a2a-ai-agent-security-requirements/) (score 3.7) — Security Requirements for AI Agents
- [draft-intellinode-ai-semantic-contract](https://datatracker.ietf.org/doc/draft-intellinode-ai-semantic-contract/) (score 3.2) — Semantic-Driven Traffic Shaping Contract for AI Networks
- [draft-zheng-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-agent-identity-management/) (score 3.7) — Agent Identity Managenment
**Top-rated in Agent identity/auth** (145 drafts):
- [draft-cowles-volt](https://datatracker.ietf.org/doc/draft-cowles-volt/) (4.8) — Defines tamper-evident execution trace format for AI agent workflows using hash chains and cryptogra
- [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) (4.8) — Defines comprehensive protocol for AI agent accountability including authentication, monitoring, and
- [draft-guy-bary-stamp-protocol](https://datatracker.ietf.org/doc/draft-guy-bary-stamp-protocol/) (4.6) — Defines STAMP protocol for cryptographic delegation and proof in AI agent systems. Provides task-bou
- [draft-drake-email-tpm-attestation](https://datatracker.ietf.org/doc/draft-drake-email-tpm-attestation/) (4.6) — Defines hardware attestation for email using TPM verification chains to prevent spam and provide Syb
- [draft-williams-netmod-lm-hierarchy-topology](https://datatracker.ietf.org/doc/draft-williams-netmod-lm-hierarchy-topology/) (4.6) — Defines YANG data model for hierarchical language model coordination across tiny, small, and large L
### Partially Addressing Ideas
46 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| Centralized Gateway for Multi-Agent Communication | draft-song-dmsc-problem-statement | architecture |
| Multi-Tenant Policy Enforcement Infrastructure | draft-song-dmsc-problem-statement | architecture |
| Intelligent Agent Communication Gateway Architecture | draft-agent-gw | architecture |
| AI-Native Network Protocol (AINP) | draft-ainp-protocol | protocol |
| Agent-to-Agent Communication in Transportation Networks | draft-an-nmrg-i2icf-cits | pattern |
| Zero Trust Runtime Agent Architecture | draft-berlinai-vera | architecture |
| Agentic Data Optimization Layer (ADOL) | draft-chang-agent-token-efficient | protocol |
| Agentic network architecture for multi-agent coordination | draft-chuyi-nmrg-agentic-network-inference | architecture |
*...and 38 more*
---
## 8. Agent Performance Degradation Detection
| | |
|---|---|
| **Severity** | HIGH |
| **Category** | new |
| **Drafts in category** | 0 |
No standardized protocols exist for detecting when AI agents are experiencing model drift, adversarial attacks, or performance degradation. Essential for maintaining autonomous system reliability.
**Evidence:** ML traffic management exists but not agent health monitoring standards
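A minimal form of the missing health monitoring is a rolling comparison of task outcomes against a known baseline. The sketch below is one illustrative heuristic, not a proposed standard; the baseline, window size, and tolerance are assumptions.

```python
from collections import deque

class DriftMonitor:
    """Flags degradation when recent task success rate falls well below baseline."""

    def __init__(self, baseline: float, window: int = 50, tolerance: float = 0.15):
        self.baseline = baseline    # expected success rate, e.g. from evaluation
        self.recent = deque(maxlen=window)
        self.tolerance = tolerance  # allowed absolute drop before alerting

    def record(self, success: bool) -> None:
        self.recent.append(1.0 if success else 0.0)

    def degraded(self) -> bool:
        if len(self.recent) < self.recent.maxlen:
            return False            # not enough evidence yet
        rate = sum(self.recent) / len(self.recent)
        return rate < self.baseline - self.tolerance

monitor = DriftMonitor(baseline=0.9, window=10, tolerance=0.15)
```

A standard would need to go further: defining what counts as a "task success" across heterogeneous agents is exactly the interoperability problem the gap describes.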
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-zheng-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-agent-identity-management/) (score 3.7) — Agent Identity Managenment
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (score 4.2) — Multi-agent Collaboration Protocol Suite
- [draft-zheng-dispatch-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-dispatch-agent-identity-management/) (score 3.3) — Agent Identity Managenment
- [draft-fu-nmop-agent-communication-framework](https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/) (score 3.0) — Agent Communication Framework for Network AIOps
- [draft-zyyhl-agent-networks-framework](https://datatracker.ietf.org/doc/draft-zyyhl-agent-networks-framework/) (score 3.6) — Framework for AI Agent Networks
- [draft-xiong-rtgwg-use-cases-hp-wan](https://datatracker.ietf.org/doc/draft-xiong-rtgwg-use-cases-hp-wan/) (score 2.6) — Use Cases for High-performance Wide Area Network
### Partially Addressing Ideas
5 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| Virtual In-Cloud Router as IPv6 Enhancement Agent | draft-he-yi-srv6ops-ipv6-enhancemnet-in-cloud-uc | architecture |
| 6G Agent Protocol Requirements and Enabling Technologies | draft-hw-ai-agent-6g | requirement |
| Comparative analysis of messaging protocols for agentic AI | draft-mpsb-agntcy-messaging | pattern |
| AI Network Security Agent | draft-yuan-rtgwg-security-agent-usecase | architecture |
| Task-Oriented Multi-Agent Recovery Framework | draft-yue-anima-agent-recovery-networks | architecture |
---
## 9. Legal Liability Attribution Protocols
| | |
|---|---|
| **Severity** | HIGH |
| **Category** | Policy/governance |
| **Drafts in category** | 115 |
Missing technical protocols for creating audit trails that can determine legal liability when autonomous agents cause harm. Governance drafts exist but not technical accountability mechanisms.
**Evidence:** 115 governance drafts but legal technology gap for liability attribution
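The hash-chain approach that drafts like draft-cowles-volt apply to execution traces is the natural building block here. As a simplified sketch (real liability attribution would also need signatures binding entries to identities, which this omits), a tamper-evident audit log can be as small as:

```python
import hashlib
import json

def append_entry(chain: list, event: dict) -> None:
    """Append a tamper-evident entry; each hash covers the previous hash."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps(event, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    chain.append({"event": event, "prev": prev, "hash": digest})

def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited event or reordered entry fails."""
    prev = "0" * 64
    for entry in chain:
        body = json.dumps(entry["event"], sort_keys=True)
        if entry["prev"] != prev:
            return False
        if hashlib.sha256((prev + body).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, {"agent": "a1", "action": "transfer", "amount": 10})
append_entry(log, {"agent": "a1", "action": "notify"})
```

The chain makes after-the-fact editing detectable, which is the technical precondition for any legal attribution scheme built on top.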
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-madhavan-aipref-displaybasedpref](https://datatracker.ietf.org/doc/draft-madhavan-aipref-displaybasedpref/) (score 2.5) — A Vocabulary for Controlling Usage of Content Collected by Search and AI Crawler
- [draft-farzdusa-aipref-enduser](https://datatracker.ietf.org/doc/draft-farzdusa-aipref-enduser/) (score 3.8) — AI Preferences Signaling: End User Impact
- [draft-kotecha-agentic-dispute-protocol](https://datatracker.ietf.org/doc/draft-kotecha-agentic-dispute-protocol/) (score 3.6) — Agentic Dispute Protocol
- [draft-cui-dmsc-agent-cdi](https://datatracker.ietf.org/doc/draft-cui-dmsc-agent-cdi/) (score 3.0) — Cross-Domain Interoperability Framework for AI Agent Collaboration
- [draft-ietf-aipref-vocab](https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/) (score 4.4) — A Vocabulary For Expressing AI Usage Preferences
- [draft-aylward-aiga-1](https://datatracker.ietf.org/doc/draft-aylward-aiga-1/) (score 4.2) — AI Governance and Accountability Protocol (AIGA)
**Top-rated in Policy/governance** (115 drafts):
- [draft-cowles-volt](https://datatracker.ietf.org/doc/draft-cowles-volt/) (4.8) — Defines tamper-evident execution trace format for AI agent workflows using hash chains and cryptogra
- [draft-aylward-daap-v2](https://datatracker.ietf.org/doc/draft-aylward-daap-v2/) (4.8) — Defines comprehensive protocol for AI agent accountability including authentication, monitoring, and
- [draft-goswami-agentic-jwt](https://datatracker.ietf.org/doc/draft-goswami-agentic-jwt/) (4.5) — Extends OAuth 2.0 with Agentic JWT to address authorization challenges in autonomous AI systems. Int
- [draft-wang-cats-odsi](https://datatracker.ietf.org/doc/draft-wang-cats-odsi/) (4.5) — Specifies framework for decentralized LLM inference across untrusted participants with layer-aware e
- [draft-birkholz-verifiable-agent-conversations](https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/) (4.5) — Defines CDDL-based data format for verifiable agent conversation records using COSE signing. Support
### Partially Addressing Ideas
No directly related technical ideas found in current drafts — this gap is entirely unaddressed.
---
## 10. Agent Memory and State Persistence Standards
| | |
|---|---|
| **Severity** | MEDIUM |
| **Category** | Data formats/interop |
| **Drafts in category** | 165 |
No standardized formats or protocols exist for how agents should persist long-term memory, experience, and learned behaviors across system restarts or migrations. Each implementation creates proprietary solutions.
**Evidence:** 165 data format drafts focus on communication but not persistent state management
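The core of any persistence standard would be a versioned, portable snapshot format so state survives restarts and migrations between implementations. A hedged sketch follows; `STATE_SCHEMA_VERSION` and the state keys are invented, and a real format would also cover embeddings, provenance, and encryption.

```python
import json

STATE_SCHEMA_VERSION = 1  # invented; a real standard would pin and evolve this

def snapshot(agent_state: dict) -> str:
    """Serialize agent memory with an explicit schema version for migration."""
    return json.dumps({"version": STATE_SCHEMA_VERSION, "state": agent_state},
                      sort_keys=True)

def restore(blob: str) -> dict:
    doc = json.loads(blob)
    if doc["version"] != STATE_SCHEMA_VERSION:
        # A real implementation would run a migration here instead of failing.
        raise ValueError("unsupported state version: %s" % doc["version"])
    return doc["state"]

saved = snapshot({"memory": ["met user"], "goals": ["finish report"]})
```

The explicit version field is the point: it is what lets a future reader distinguish "old but migratable" state from garbage, which proprietary formats today handle ad hoc.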
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-zheng-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-agent-identity-management/) (score 3.7) — Agent Identity Managenment
- [draft-li-dmsc-macp](https://datatracker.ietf.org/doc/draft-li-dmsc-macp/) (score 4.2) — Multi-agent Collaboration Protocol Suite
- [draft-zheng-dispatch-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-dispatch-agent-identity-management/) (score 3.3) — Agent Identity Managenment
- [draft-fu-nmop-agent-communication-framework](https://datatracker.ietf.org/doc/draft-fu-nmop-agent-communication-framework/) (score 3.0) — Agent Communication Framework for Network AIOps
- [draft-zyyhl-agent-networks-framework](https://datatracker.ietf.org/doc/draft-zyyhl-agent-networks-framework/) (score 3.6) — Framework for AI Agent Networks
- [draft-gaikwad-llm-benchmarking-terminology](https://datatracker.ietf.org/doc/draft-gaikwad-llm-benchmarking-terminology/) (score 2.7) — Benchmarking Terminology for Large Language Model Serving
**Top-rated in Data formats/interop** (165 drafts):
- [draft-cowles-volt](https://datatracker.ietf.org/doc/draft-cowles-volt/) (4.8) — Defines tamper-evident execution trace format for AI agent workflows using hash chains and cryptogra
- [draft-williams-netmod-lm-hierarchy-topology](https://datatracker.ietf.org/doc/draft-williams-netmod-lm-hierarchy-topology/) (4.6) — Defines YANG data model for hierarchical language model coordination across tiny, small, and large L
- [draft-ietf-lake-app-profiles](https://datatracker.ietf.org/doc/draft-ietf-lake-app-profiles/) (4.6) — Defines canonical CBOR representation for EDHOC application profiles and coordination mechanisms for
- [draft-chang-agent-token-efficient](https://datatracker.ietf.org/doc/draft-chang-agent-token-efficient/) (4.5) — Defines ADOL (Agentic Data Optimization Layer) to address token bloat in agent communication protoco
- [draft-birkholz-verifiable-agent-conversations](https://datatracker.ietf.org/doc/draft-birkholz-verifiable-agent-conversations/) (4.5) — Defines CDDL-based data format for verifiable agent conversation records using COSE signing. Support
### Partially Addressing Ideas
16 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| Compliance-oriented agent memory model | draft-gaikwad-aps-profile | pattern |
| Zero Trust Interoperability Framework | draft-liu-saag-zt-problem-statement | requirement |
| Intelligent Agent Communication Gateway Architecture | draft-agent-gw | architecture |
| Zero Trust Runtime Agent Architecture | draft-berlinai-vera | architecture |
| Agentic Hypercall Protocol | draft-campbell-agentic-http | pattern |
| Agent Persistent State Profile | draft-gaikwad-aps-profile | architecture |
| Agentic AI for Autonomous Network Management | draft-hong-nmrg-agenticai-ps | requirement |
| LISP-based geospatial intelligence network | draft-ietf-lisp-nexagon | protocol |
*...and 8 more*
---
## 11. Agent-to-Human Escalation Standards
| | |
|---|---|
| **Severity** | MEDIUM |
| **Category** | Human-agent interaction |
| **Drafts in category** | 41 |
While human-in-the-loop protocols exist, there's no standardized framework for when and how agents should escalate decisions to humans based on uncertainty, risk, or ethical considerations.
**Evidence:** Only 41 human-agent interaction drafts versus complex autonomous systems requiring escalation
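An escalation standard would ultimately reduce to a shared decision rule over uncertainty, risk, and reversibility. The sketch below is one illustrative policy; the threshold values and risk categories are assumptions, not drawn from any draft.

```python
def should_escalate(confidence: float, risk: str, reversible: bool) -> bool:
    """Escalate to a human when the agent is unsure, the stakes are high,
    or the action cannot be undone. Thresholds are illustrative."""
    # Minimum confidence required to act autonomously at each risk level.
    risk_floor = {"low": 0.5, "medium": 0.75, "high": 0.95}
    if not reversible and risk != "low":
        return True  # irreversible, non-trivial actions always go to a human
    return confidence < risk_floor[risk]
```

Standardizing this would mean agreeing on how `confidence` is calibrated across models and how `risk` is classified, which is precisely what the 41 human-agent interaction drafts do not yet cover.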
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-williams-netmod-lm-hierarchy-topology](https://datatracker.ietf.org/doc/draft-williams-netmod-lm-hierarchy-topology/) (score 4.6) — Hierarchical Topology for Language Model Coordination
- [draft-ietf-websec-mime-sniff](https://datatracker.ietf.org/doc/draft-ietf-websec-mime-sniff/) (score 3.7) — Media Type Sniffing
- [draft-scrm-aiproto-usecases](https://datatracker.ietf.org/doc/draft-scrm-aiproto-usecases/) (score 4.1) — Agentic AI Use Cases
- [draft-zeng-opsawg-llm-netconf-gap](https://datatracker.ietf.org/doc/draft-zeng-opsawg-llm-netconf-gap/) (score 3.9) — Gap Analysis of Network Configuration Protocols in LLM-Driven Intent-Based Netwo
- [draft-jadoon-nmrg-agentic-ai-autonomous-networks](https://datatracker.ietf.org/doc/draft-jadoon-nmrg-agentic-ai-autonomous-networks/) (score 4.1) — Agentic AI Architectural Principles for Autonomous Computer Networks
**Top-rated in Human-agent interaction** (41 drafts):
- [draft-drake-email-tpm-attestation](https://datatracker.ietf.org/doc/draft-drake-email-tpm-attestation/) (4.6) — Defines hardware attestation for email using TPM verification chains to prevent spam and provide Syb
- [draft-ietf-aipref-vocab](https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/) (4.4) — Defines a standardized vocabulary for expressing preferences about how digital assets should be used
- [draft-dhir-http-agent-profile](https://datatracker.ietf.org/doc/draft-dhir-http-agent-profile/) (4.2) — Defines HTTP Agent Profile for authenticating agent traffic, separating human from agent traffic, an
- [draft-song-tsvwg-camp](https://datatracker.ietf.org/doc/draft-song-tsvwg-camp/) (4.2) — Proposes CAMP, a multipath transport protocol for interactive multimodal LLM systems that maintains
- [draft-liu-agent-operation-authorization](https://datatracker.ietf.org/doc/draft-liu-agent-operation-authorization/) (4.1) — Specifies framework for verifiable delegation of actions from humans to AI agents using JWT tokens.
### Partially Addressing Ideas
No directly related technical ideas found in current drafts — this gap is entirely unaddressed.
---
## 12. Federated Agent Learning Privacy
| | |
|---|---|
| **Severity** | MEDIUM |
| **Category** | new |
| **Drafts in category** | 0 |
Federated AI operations models exist but lack privacy-preserving protocols for agents learning from shared experiences without exposing sensitive data from individual deployments.
**Evidence:** Federated models mentioned but privacy-preserving learning protocols absent
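The usual privacy-preserving ingredient here is clipping each participant's update and adding noise before sharing, as in differentially private federated averaging. The sketch below shows the mechanics only; the clip bound and noise scale are illustrative and are not calibrated to any formal privacy guarantee.

```python
import random

def privatize(update, clip=1.0, noise_scale=0.5, rng=None):
    """Clip a local model update to bounded norm, then add Gaussian noise
    before sharing. Parameter values are illustrative assumptions."""
    rng = rng or random.Random()
    norm = sum(x * x for x in update) ** 0.5
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [x * scale + rng.gauss(0, noise_scale) for x in update]

noisy = privatize([3.0, 4.0])
```

A protocol for this gap would standardize how these parameters are negotiated and attested between deployments, not just the arithmetic.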
### Related Drafts
**Keyword matches** (drafts mentioning gap topic):
- [draft-kale-agntcy-federated-privacy](https://datatracker.ietf.org/doc/draft-kale-agntcy-federated-privacy/) (score 3.2) — Privacy-Preserving Federated Learning Architecture for Multi-Tenant AI Agent Sys
- [draft-cui-dmsc-agent-cdi](https://datatracker.ietf.org/doc/draft-cui-dmsc-agent-cdi/) (score 3.0) — Cross-Domain Interoperability Framework for AI Agent Collaboration
- [draft-ai-traffic](https://datatracker.ietf.org/doc/draft-ai-traffic/) (score 2.5) — Handling inter-DC/Edge AI-related network traffic: Problem statement
- [draft-aft-ai-traffic](https://datatracker.ietf.org/doc/draft-aft-ai-traffic/) (score 3.1) — Handling inter-DC/Edge AI-related network traffic: Problem statement
- [draft-aylward-aiga-1](https://datatracker.ietf.org/doc/draft-aylward-aiga-1/) (score 4.2) — AI Governance and Accountability Protocol (AIGA)
- [draft-zheng-agent-identity-management](https://datatracker.ietf.org/doc/draft-zheng-agent-identity-management/) (score 3.7) — Agent Identity Managenment
### Partially Addressing Ideas
5 extracted ideas touch on this gap:
| Idea | Draft | Type |
|------|-------|------|
| Privacy-Preserving Federated Learning for Multi-Tenant AI Agents | draft-kale-agntcy-federated-privacy | architecture |
| Cross-Domain Agent Interoperability Framework | draft-cui-dmsc-agent-cdi | architecture |
| HTTP Agent Profile (HAP) | draft-dhir-http-agent-profile | protocol |
| AI Network Security Agent | draft-yuan-rtgwg-security-agent-usecase | architecture |
| AI Network Traffic Optimization Agent | draft-yuan-rtgwg-traffic-agent-usecase | architecture |
---
@@ -607,16 +539,14 @@ Missing standards for energy-aware agent deployment and operation. As AI workloa
| Category | Drafts | Gaps | Gap Topics |
|----------|-------:|-----:|------------|
| a2a protocols | 120 | 2 | Multi-Agent Coordination Deadlocks; Cross-Protocol Agent Migration |
| agent identity/auth | 108 | 1 | Agent Privacy Preservation |
| ai safety/alignment | 44 | 2 | Agent Behavior Verification; Agent Capability Degradation Handling |
| autonomous netops | 93 | 1 | Agent Resource Exhaustion Protection |
| data formats/interop | 145 | 1 | Agent-Generated Data Provenance |
| human-agent interaction | 30 | 1 | Human Override Protocols |
| ml traffic mgmt | 73 | 1 | Agent Energy Consumption Optimization |
| model serving/inference | 42 | 1 | Agent Firmware/Model Update Security |
| other ai/agent | 26 | 1 | Real-time Agent Debugging |
| policy/governance | 91 | 1 | Cross-Domain Agent Liability |
| a2a protocols | 150 | 2 | Multi-Agent Consensus Under Byzantine Conditions; Cross-Protocol Agent Migration |
| agent discovery/reg | 87 | 1 | Agent Capability Advertisement Verification |
| agent identity/auth | 145 | 1 | Cross-Domain Agent Communication Security |
| ai safety/alignment | 46 | 2 | Real-time Agent Behavior Verification; Emergency Agent Shutdown Coordination |
| data formats/interop | 165 | 1 | Agent Memory and State Persistence Standards |
| human-agent interaction | 41 | 1 | Agent-to-Human Escalation Standards |
| new | 0 | 3 | Agent Resource Accounting and Billing; Agent Performance Degradation Detection; Federated Agent Learning Privacy |
| policy/governance | 115 | 1 | Legal Liability Attribution Protocols |
## Recommendations

View File

@@ -0,0 +1,656 @@
Internet-Draft AI/Agent WG
Intended status: Standards Track March 2026
Expires: September 08, 2026
Multi-Agent Consensus Protocol (MACP) for Distributed AI Agent Coordination
draft-ai-consensus-protocol-00
Abstract
This document defines the Multi-Agent Consensus Protocol (MACP), a
standardized framework for enabling multiple AI agents to reach
consensus on shared decisions and resolve conflicting objectives
in distributed environments. MACP addresses critical coordination
challenges in autonomous systems where agents must collaborate on
resource allocation, policy enforcement, and decision-making
across organizational and domain boundaries. The protocol
incorporates Byzantine fault tolerance mechanisms, cryptographic
verification, and hierarchical consensus structures to ensure
reliable agreement even in the presence of malicious or
malfunctioning agents. MACP defines message formats, consensus
algorithms, conflict resolution procedures, and integration
patterns with existing agent-to-agent communication protocols. The
protocol supports various consensus models including proof-of-
authority, weighted voting, and reputation-based systems, enabling
deployment across diverse use cases from IoT device coordination
to enterprise AI system orchestration. This specification aims to
reduce fragmentation in multi-agent systems and provide a
foundation for interoperable autonomous agent coordination at
scale.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
This document is intended to have standards-track status.
Distribution of this memo is unlimited.
Table of Contents
1. Introduction ................................................ 3
2. Terminology ................................................. 4
3. Problem Statement ........................................... 5
4. MACP Architecture and Core Components ....................... 6
5. Consensus Algorithms and Message Formats .................... 7
6. Conflict Resolution and Decision Binding .................... 8
7. Integration with Existing Agent Protocols ................... 9
8. Security Considerations ..................................... 10
9. IANA Considerations ......................................... 11
1. Introduction
The proliferation of autonomous AI agents across distributed
computing environments has created an urgent need for standardized
consensus mechanisms that enable coordinated decision-making
without centralized control. As organizations deploy increasing
numbers of intelligent agents for tasks ranging from resource
allocation and policy enforcement to complex multi-party
negotiations, the lack of interoperable consensus protocols has
resulted in fragmented implementations that cannot effectively
coordinate across organizational and domain boundaries. Current
agent-to-agent communication protocols, while addressing basic
message exchange and authentication, provide insufficient
mechanisms for achieving reliable agreement among multiple agents
with potentially conflicting objectives or incomplete information.
Existing consensus approaches in multi-agent systems typically
rely on proprietary coordination mechanisms or adapt consensus
algorithms designed for blockchain and distributed database
applications without addressing the unique requirements of AI
agent coordination. These limitations become particularly acute in
scenarios involving Byzantine fault tolerance, where agents may
exhibit malicious behavior, experience partial failures, or
operate under adversarial conditions. The heterogeneous nature of
AI agent implementations, combined with varying trust
relationships and organizational policies, further complicates the
development of effective consensus mechanisms that can operate
reliably at scale.
The Multi-Agent Consensus Protocol (MACP) addresses these
challenges by providing a standardized framework specifically
designed for AI agent coordination that incorporates proven
consensus algorithms while addressing the unique requirements of
autonomous agent systems. MACP supports multiple consensus models
including proof-of-authority, weighted voting based on agent
reputation or capabilities, and hierarchical consensus structures
that reflect organizational boundaries and trust relationships.
The protocol integrates cryptographic verification mechanisms and
Byzantine fault tolerance to ensure reliable consensus achievement
even in the presence of malicious or malfunctioning agents, while
maintaining compatibility with existing agent communication and
attestation frameworks.
The scope of MACP encompasses the definition of consensus
algorithms optimized for AI agent coordination, standardized
message formats for proposal submission and voting processes,
conflict resolution mechanisms for handling competing objectives,
and integration patterns with existing agent-to-agent protocols.
This specification aims to reduce the current fragmentation in
multi-agent coordination approaches by providing a foundation for
interoperable consensus mechanisms that can scale from small IoT
device clusters to enterprise-wide AI system orchestration. By
establishing common protocols for multi-agent consensus, MACP
enables the development of more robust and coordinated autonomous
systems while maintaining the flexibility required for diverse
deployment scenarios and organizational requirements.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14 [RFC2119] [RFC8174] when, and only when, they
appear in all capitals, as shown here.
This section establishes terminology specific to multi-agent
consensus operations and distributed AI agent coordination. These
definitions build upon established concepts from distributed
systems literature while introducing terminology specific to
autonomous agent environments. Where applicable, references are
made to related terminology from existing messaging and security
event exchange protocols such as [RFC8600] and distributed
consensus literature.
A "consensus agent" is an autonomous AI entity capable of
participating in distributed decision-making processes by
submitting proposals, evaluating options, and committing to
agreed-upon outcomes. Consensus agents MUST maintain state
consistency with other participants and possess cryptographic
capabilities for message authentication and verification. An
"observer agent" is a non-participating entity that monitors
consensus processes without voting rights or decision authority.
A "coordination domain" defines the scope and boundaries within
which a group of agents operate under shared governance rules and
consensus mechanisms. Each coordination domain establishes its own
policies for membership, voting weights, and decision authority. A
"decision quorum" represents the minimum number or weight
threshold of participating agents required to reach a valid
consensus decision within a coordination domain, expressed either
as an absolute count or percentage of eligible participants.
"Byzantine Fault Tolerance" (BFT) refers to the system's ability
to achieve consensus despite the presence of agents that may
exhibit arbitrary failures, including malicious behavior, message
corruption, or incorrect state reporting. MACP implementations
SHOULD tolerate up to f faulty agents out of 3f+1 participants
(strictly fewer than one-third) in any coordination domain. "Practical
Byzantine Fault Tolerance" (pBFT) describes optimized algorithms
that achieve Byzantine fault tolerance with reduced message
complexity and improved performance characteristics.
"Conflict resolution" encompasses the mechanisms and procedures
used to address competing proposals, resolve deadlocks, and handle
situations where multiple valid decisions could be reached. This
includes tie-breaking algorithms, priority-based selection, and
escalation procedures to higher-level coordination domains.
"Decision binding" refers to the enforcement mechanisms that
ensure participating agents comply with consensus outcomes and
maintain consistency with agreed-upon decisions across the
coordination domain.
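The decision quorum defined above, expressed either as an absolute
count or as a percentage of eligible participants, can be sketched in
a few lines. This is an illustrative model, not normative protocol
text; the class and field names are our own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionQuorum:
    """Minimum participation for a valid consensus decision.

    Exactly one of `absolute` (agent count) or `percentage`
    (share of eligible participants, 0-100) should be set,
    mirroring the two quorum forms defined in the terminology.
    """
    absolute: Optional[int] = None
    percentage: Optional[float] = None

    def is_met(self, votes_cast: int, eligible: int) -> bool:
        if self.absolute is not None:
            return votes_cast >= self.absolute
        if self.percentage is not None:
            return votes_cast >= eligible * self.percentage / 100.0
        raise ValueError("quorum must define a count or a percentage")
```

For example, a 66% quorum over 9 eligible agents is met by 6 votes but
not by 5.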
3. Problem Statement
As the introduction notes, autonomous AI agents deployed across
distributed systems currently lack standardized consensus
mechanisms for coordinating decision-making without centralized
control. Current multi-agent deployments frequently encounter
scenarios where agents must collectively agree on resource
allocation, policy enforcement, task scheduling, or strategic
decisions that affect multiple stakeholders. However, existing
agent-to-agent communication protocols such as FIPA-ACL and
emerging frameworks focus primarily on message exchange and basic
coordination primitives, lacking robust consensus mechanisms
necessary for reliable distributed decision-making. This gap
becomes particularly problematic in cross-organizational
deployments where agents operate under different governance
models, trust assumptions, and operational constraints.
Network partitions and communication failures present fundamental
challenges to consensus achievement in distributed agent
environments. Unlike traditional distributed systems where nodes
typically operate within controlled network environments, AI
agents often function across heterogeneous networks with varying
reliability characteristics, from edge computing environments to
cloud infrastructure spanning multiple providers. Agents may
become temporarily or permanently unreachable, creating scenarios
where consensus decisions must proceed with incomplete information
or be safely aborted to prevent system-wide inconsistencies.
Current agent protocols provide insufficient guidance for handling
these partition scenarios, often resulting in ad-hoc
implementations that cannot guarantee safety properties or
liveness guarantees across different deployment contexts.
Conflicting objectives among participating agents introduce
additional complexity beyond traditional distributed consensus
problems. AI agents frequently operate with competing utility
functions, resource constraints, and optimization targets that may
not align with collective decision requirements. For example, in a
multi-cloud resource allocation scenario, individual agents may
prioritize cost minimization for their respective organizations
while the collective decision requires balancing performance,
security, and availability across all participants. Existing
consensus algorithms assume participants share common objectives
or can reduce decisions to simple binary choices, failing to
address the multi-dimensional optimization problems inherent in AI
agent coordination.
The presence of malicious or Byzantine actors poses significant
threats to consensus integrity in open multi-agent environments.
Unlike closed distributed systems where all participants operate
under unified security models, AI agents may need to establish
consensus across organizational boundaries where some participants
cannot be fully trusted. Malicious agents may attempt to
manipulate consensus outcomes through strategic voting, false
information injection, or coordination attacks designed to prevent
legitimate consensus achievement. Furthermore, compromised or
malfunctioning agents may exhibit Byzantine behavior without
malicious intent, requiring consensus mechanisms that can tolerate
arbitrary failures while maintaining decision quality and system
progress.
Current limitations in existing agent-to-agent protocols create
additional barriers to reliable consensus implementation. Most
protocols focus on peer-to-peer communication abstractions without
addressing the coordination complexity required for multi-party
decision-making. Authentication and authorization mechanisms are
typically designed for bilateral interactions rather than group
consensus scenarios, lacking support for quorum-based verification
or reputation systems that could improve consensus security.
Additionally, existing protocols provide limited support for
hierarchical consensus structures or delegation mechanisms that
could enable scalable decision-making in large agent populations,
forcing implementations toward flat consensus models that may not
perform well beyond small agent groups.
4. MACP Architecture and Core Components
The MACP architecture employs a distributed coordination model
consisting of three primary components: Consensus Coordinators,
Participating Agents, and Decision Domains. This architecture is
designed to scale horizontally while maintaining fault tolerance
and ensuring efficient consensus across diverse agent populations.
The system operates on the principle of domain-partitioned
consensus, where agents are organized into logical groupings based
on their functional roles, trust relationships, or organizational
boundaries. Each Decision Domain maintains its own consensus state
and can interact with other domains through well-defined inter-
domain protocols.
Consensus Coordinators serve as the orchestration layer for MACP
operations within each Decision Domain. A Consensus Coordinator
MUST be capable of maintaining the current consensus state,
managing proposal queues, and coordinating message distribution
among Participating Agents. Coordinators are responsible for
implementing the selected consensus algorithm, enforcing quorum
requirements, and ensuring message ordering consistency. In
deployments requiring high availability, multiple Consensus
Coordinators MAY operate in a redundant configuration using leader
election mechanisms similar to those defined in Raft consensus
algorithms. Coordinators MUST implement Byzantine fault tolerance
measures when operating in environments where malicious behavior
is possible, maintaining consensus integrity even when up to f of
3f+1 coordinators exhibit arbitrary failures.
Participating Agents represent the autonomous entities that
contribute to consensus decisions within the MACP framework. Each
Participating Agent MUST maintain a unique identity within its
Decision Domain and possess the capability to generate, evaluate,
and vote on consensus proposals. Agents are classified into one of
three participation modes: Active Participants that can both
propose and vote on decisions, Voting Participants that can vote
but not propose, and Observer Participants that receive consensus
results but do not participate in the decision process.
Participating Agents MUST implement proposal validation logic
appropriate to their domain context and SHOULD incorporate
reputation tracking mechanisms to assess the trustworthiness of
proposals from other agents.
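The three participation modes above imply a simple permission matrix.
A minimal sketch, with illustrative names:

```python
from enum import Enum


class ParticipationMode(Enum):
    ACTIVE = "active"      # may both propose and vote
    VOTING = "voting"      # may vote, but not propose
    OBSERVER = "observer"  # receives consensus results only


def may_propose(mode: ParticipationMode) -> bool:
    return mode is ParticipationMode.ACTIVE


def may_vote(mode: ParticipationMode) -> bool:
    return mode in (ParticipationMode.ACTIVE, ParticipationMode.VOTING)
```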
Decision Domains establish the scope and context for consensus
operations, defining the set of agents authorized to participate
in specific types of decisions. A Decision Domain MUST specify its
consensus model (proof-of-authority, weighted voting, or
reputation-based), quorum requirements, and decision binding
policies. Domains operate independently but MAY establish inter-
domain communication channels for coordinating decisions that span
multiple domains. The domain configuration MUST include conflict
resolution parameters, timeout specifications, and rollback
procedures to handle consensus failures gracefully.
The interaction patterns between these components follow a
structured request-response model augmented with publish-subscribe
mechanisms for state synchronization. When a Participating Agent
initiates a consensus proposal, it MUST first submit the proposal
to its designated Consensus Coordinator, which validates the
proposal format and participant authorization. The Coordinator
then distributes the proposal to all eligible Participating Agents
within the Decision Domain, collecting votes according to the
configured consensus algorithm. Upon reaching quorum and achieving
consensus, the Coordinator publishes the binding decision to all
participants and updates the domain's consensus state. Failed
consensus attempts trigger the domain's configured rollback
procedures, allowing the system to maintain consistency despite
partial failures or network partitions.
5. Consensus Algorithms and Message Formats
MACP implements multiple consensus algorithms to accommodate
different operational requirements and network conditions. The
base specification MUST support the Practical Byzantine Fault
Tolerance (pBFT) algorithm adapted for multi-agent environments,
while implementations MAY support additional algorithms including
Raft consensus for non-Byzantine scenarios and reputation-
weighted consensus for environments with established agent trust
relationships. The pBFT implementation assumes a maximum of f
Byzantine agents out of 3f+1 total participating agents, providing
safety and liveness guarantees under standard network assumptions.
Each consensus algorithm is identified by a unique Algorithm
Identifier (AID) registered with IANA as specified in Section 9.
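The f-of-3f+1 assumption above translates directly into the largest
number of Byzantine agents a domain of n participants can tolerate; a
quick sketch:

```python
def max_byzantine(n: int) -> int:
    """Largest f such that n >= 3f + 1 (the standard pBFT bound)."""
    if n < 1:
        raise ValueError("need at least one agent")
    return (n - 1) // 3
```

So 4 agents tolerate 1 faulty agent, 10 tolerate 3, and a domain of 3
tolerates none.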
The MACP consensus process follows a four-phase protocol:
Proposal, Pre-voting, Voting, and Commitment. During the Proposal
phase, any authorized agent MAY submit decision proposals to the
designated consensus coordinator for the relevant coordination
domain. The Pre-voting phase allows agents to signal their
preliminary position and identify potential conflicts or
dependencies with other pending proposals. The Voting phase
requires participating agents to submit cryptographically signed
votes within a specified timeout window, with vote validity
determined by the agent's authorization level and current
reputation score. The Commitment phase broadcasts the final
decision and requires acknowledgment from a quorum of agents
before considering the consensus binding.
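The four-phase sequence above can be sketched as a tiny state machine
that only permits the named transitions. Illustrative, not normative:

```python
# Phase order as defined by the MACP consensus process.
PHASES = ("proposal", "pre-voting", "voting", "commitment")


def next_phase(current: str) -> str:
    """Advance Proposal -> Pre-voting -> Voting -> Commitment."""
    i = PHASES.index(current)  # raises ValueError for unknown phases
    if i == len(PHASES) - 1:
        raise ValueError("commitment is the final phase")
    return PHASES[i + 1]
```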
MACP defines standardized message formats using JSON serialization
with mandatory digital signatures following RFC 7515 (JSON Web
Signature). All consensus messages MUST include a common header
containing the message type, consensus session identifier,
timestamp, originating agent identifier, and cryptographic
signature. Proposal messages additionally contain the proposed
decision payload, expected quorum size, voting timeout duration,
and conflict resolution parameters. Vote messages include the
proposal hash, agent's decision (ACCEPT, REJECT, ABSTAIN), voting
weight, and optional reasoning metadata. Commitment messages
broadcast the final consensus result, participating agent list,
vote tally, and binding duration for the agreed decision.
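A vote message carrying the common header fields above might look like
the following sketch. For testability the `sig` field here is an HMAC
stand-in; a conforming implementation would sign the payload as a JWS
per RFC 7515, and all field names are illustrative.

```python
import hashlib
import hmac
import json


def make_vote(key: bytes, session_id: str, agent_id: str,
              proposal_hash: str, decision: str, weight: int,
              timestamp: int) -> dict:
    """Build a vote message: common header (type, session, timestamp,
    originating agent) plus proposal hash, decision, and weight.

    `sig` is an HMAC stand-in for illustration only; real MACP
    messages carry a JWS (RFC 7515) digital signature.
    """
    if decision not in ("ACCEPT", "REJECT", "ABSTAIN"):
        raise ValueError("decision must be ACCEPT, REJECT, or ABSTAIN")
    body = {
        "type": "vote",
        "session": session_id,
        "agent": agent_id,
        "timestamp": timestamp,
        "proposal": proposal_hash,
        "decision": decision,
        "weight": weight,
    }
    payload = json.dumps(body, sort_keys=True).encode()
    body["sig"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return body


def verify_vote(key: bytes, msg: dict) -> bool:
    """Recompute the stand-in signature over everything except `sig`."""
    body = {k: v for k, v in msg.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["sig"])
```

Canonical serialization (`sort_keys=True`) matters: both signer and
verifier must derive byte-identical payloads from the same fields.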
Message validation requires verification of agent authorization,
signature authenticity, and temporal constraints before
processing. Agents MUST reject messages with invalid signatures,
expired timestamps beyond the configured clock skew tolerance, or
originating from agents not authorized for the specific
coordination domain. Vote aggregation follows the specified
consensus algorithm with additional validation for vote weight
consistency and duplicate vote detection. The consensus
coordinator MUST broadcast commitment messages to all
participating agents and maintain an audit log of the complete
consensus session for accountability purposes as defined in
existing agent attestation frameworks.
Timeout handling and failure recovery mechanisms ensure liveness
properties under adverse network conditions. If insufficient votes
are received within the voting timeout window, the consensus
coordinator MUST initiate a new consensus round with an
exponentially increasing timeout duration, up to a maximum
threshold defined in the coordination domain configuration.
Network partition scenarios are addressed through coordinator
election protocols that prevent split-brain consensus decisions.
Failed consensus attempts trigger rollback procedures that notify
all participating agents of the unsuccessful coordination attempt
and release any provisionally allocated resources pending the
consensus outcome.
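The exponentially increasing retry timeout described above, capped at
a domain-configured maximum, might be computed like this (the base and
cap values in the example are illustrative, not defaults from the
specification):

```python
def voting_timeout(base: float, round_number: int, cap: float) -> float:
    """Timeout for a consensus round: base * 2^(round-1), capped.

    `base` and `cap` come from the coordination domain
    configuration; round_number starts at 1 for the first attempt.
    """
    if round_number < 1:
        raise ValueError("rounds are numbered from 1")
    return min(base * 2 ** (round_number - 1), cap)
```

With a 30-second base and a 480-second cap, rounds 1 through 6 wait
30, 60, 120, 240, 480, and 480 seconds.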
6. Conflict Resolution and Decision Binding
Conflict resolution in MACP occurs when multiple competing
proposals are submitted simultaneously or when agents disagree on
the validity or priority of proposed decisions. When conflicts are
detected during the proposal phase, participating agents MUST
invoke the conflict resolution mechanism before proceeding with
the voting phase. The protocol defines three primary conflict
types: proposal conflicts (multiple proposals for the same
decision domain), timing conflicts (simultaneous submissions
within the conflict detection window), and validity conflicts
(disagreement on proposal prerequisites or constraints). Agents
MUST maintain a conflict detection buffer with a configurable
timeout period (default 30 seconds) to identify competing
proposals before initiating consensus procedures.
The tie-breaking procedure activates when voting results in
equivalent support for multiple proposals or when no proposal
achieves the required quorum threshold. MACP employs a
hierarchical tie-breaking mechanism starting with proposal
priority levels, followed by submitter reputation scores, and
finally deterministic hash-based selection using the combined hash
of conflicting proposal identifiers. Participating agents MUST
apply tie-breaking rules in the specified order and MUST reach
agreement on tie-breaking criteria during the initial coordination
domain establishment. When tie-breaking fails to resolve
conflicts, the consensus coordinator MUST initiate a cooling-off
period of at least 60 seconds before allowing resubmission of
conflicting proposals.
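The hierarchical tie-break above (priority level, then submitter
reputation, then deterministic hash-based selection over the combined
proposal identifiers) can be sketched as follows; the proposal fields
are illustrative:

```python
import hashlib


def break_tie(proposals):
    """Pick one proposal from a tied set.

    Each proposal is a dict with `id`, `priority` (higher wins), and
    `reputation` (submitter score, higher wins). If both still tie,
    fall back to a hash over the combined proposal identifiers so
    every agent selects the same winner.
    """
    combined = hashlib.sha256(
        "".join(sorted(p["id"] for p in proposals)).encode()
    ).hexdigest()

    def rank(p):
        # Hash each id together with the combined digest; the result
        # is deterministic across all participating agents.
        h = hashlib.sha256((combined + p["id"]).encode()).hexdigest()
        return (-p["priority"], -p["reputation"], h)

    return min(proposals, key=rank)
```

Determinism is the point: given the same tied set, every honest agent
computes the same ranking and therefore the same winner.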
Decision binding ensures that consensus outcomes are enforced
across all participating agents through cryptographic commitment
and distributed verification mechanisms. Once consensus is
achieved, all participating agents MUST generate binding
commitment messages containing the decision hash, their digital
signature, and a commitment timestamp. The binding phase requires
acknowledgment from at least the same quorum that approved the
original proposal within a configurable binding timeout (default
120 seconds). Agents MUST store binding commitments in their local
decision ledger and MUST reject future proposals that violate
committed decisions unless explicitly superseded through the
decision override mechanism.
Timeout handling addresses scenarios where consensus cannot be
achieved within specified time bounds or when participating agents
become unresponsive during critical phases. MACP defines distinct
timeout periods for each consensus phase: proposal timeout
(default 60 seconds), voting timeout (default 180 seconds), and
binding timeout (default 120 seconds). When timeouts occur, the
consensus coordinator MUST broadcast a timeout notification and
initiate graceful degradation procedures, which may include
reducing quorum requirements, extending timeout periods, or
aborting the consensus attempt. Agents that fail to respond within
timeout periods MUST be temporarily excluded from the current
consensus round but MAY rejoin subsequent rounds.
Rollback mechanisms provide recovery capabilities when consensus
failures occur after the voting phase or when binding commitments
cannot be properly established. Rollback procedures MUST be
initiated when binding acknowledgments fall below the required
threshold, when Byzantine fault detection identifies compromised
consensus results, or when critical participating agents report
implementation failures. The rollback process requires the
consensus coordinator to broadcast rollback notifications
containing the failed consensus identifier, rollback reason code,
and reversion instructions. All participating agents MUST
acknowledge rollback notifications, remove associated decision
commitments from their local ledgers, and reset their consensus
state to allow for subsequent retry attempts with modified
parameters or participant sets.
7. Integration with Existing Agent Protocols
MACP is designed to operate as an overlay protocol that integrates
seamlessly with existing agent-to-agent communication frameworks
and infrastructure. Implementations MUST support integration with
standard authentication protocols including OAuth 2.0 [RFC6749],
OpenID Connect, and X.509 certificate-based authentication systems
commonly deployed in enterprise environments. MACP consensus
messages SHOULD leverage existing secure transport mechanisms such
as TLS 1.3 [RFC8446] or DTLS for UDP-based communications,
ensuring that consensus operations benefit from established
security practices without requiring separate cryptographic
implementations.
Integration with agent accountability frameworks requires MACP
implementations to maintain comprehensive audit trails of
consensus participation and decision outcomes. Consensus
coordinators MUST log all proposal submissions, voting records,
and final decisions in formats compatible with existing audit and
compliance systems. When operating alongside agent attestation
protocols, MACP SHOULD verify agent identity and authorization
status before allowing participation in consensus processes,
utilizing existing identity providers and policy enforcement
points where available. The protocol defines standard interfaces
for querying agent reputation scores and authorization levels from
external accountability systems.
MACP consensus operations MUST be designed to coexist with
workflow management and orchestration platforms commonly used in
distributed AI deployments. Implementations SHOULD provide APIs
and event notifications that allow workflow systems to trigger
consensus processes when collective decisions are required, and to
receive binding consensus outcomes for subsequent workflow
execution. The protocol supports asynchronous integration patterns
where consensus results can be delivered to workflow systems
through message queues, webhooks, or polling interfaces, ensuring
compatibility with diverse orchestration architectures.
For environments utilizing existing agent-to-agent communication
protocols such as FIPA-ACL or custom REST-based agent APIs, MACP
provides adapter interfaces that translate consensus-specific
messages into native communication formats. Protocol
implementations MAY offer plugin architectures that allow custom
integration modules for proprietary agent communication systems,
while maintaining core consensus algorithm integrity. Standard
message mapping templates are provided for common integration
scenarios, reducing implementation complexity for organizations
with established agent communication infrastructure.
The protocol includes provisions for gradual deployment in mixed
environments where only a subset of agents support MACP consensus
mechanisms. Non-MACP agents MAY participate in consensus processes
through proxy agents that translate between native agent protocols
and MACP message formats, though such deployments SHOULD implement
additional verification mechanisms to ensure proxy agent fidelity.
Integration guidelines specify fallback procedures for scenarios
where consensus mechanisms are unavailable, allowing graceful
degradation to existing coordination approaches while maintaining
system stability.
8. Security Considerations
Security considerations for MACP deployment are paramount given
the distributed nature of multi-agent systems and the potential
for malicious actors to compromise consensus integrity. The
protocol MUST implement comprehensive threat mitigation strategies
to address attacks specific to distributed consensus mechanisms.
Primary security concerns include Sybil attacks where malicious
actors create multiple false agent identities to gain
disproportionate voting power, coordination attacks where
compromised agents collude to manipulate consensus outcomes, and
consensus manipulation through message tampering or replay
attacks. Additionally, MACP implementations MUST consider denial-
of-service attacks targeting consensus coordinators, eclipse
attacks isolating honest agents from the consensus network, and
long-range attacks where compromised agents attempt to rewrite
historical consensus decisions.
Cryptographic requirements for MACP implementations MUST include
strong identity verification mechanisms to prevent unauthorized
participation in consensus processes. Each participating agent
MUST possess a cryptographically verifiable identity backed by
public-key infrastructure (with certificate status checking, e.g.
OCSP [RFC6960]) or emerging decentralized identity standards.
Digital signatures MUST be used for all consensus
messages including proposals, votes, and commitments, with
signature schemes providing at least 128-bit security strength as
specified in [RFC3766]. Message integrity MUST be protected
through cryptographic hash functions resistant to collision
attacks, and implementations SHOULD employ hash-based message
authentication codes (HMAC) for additional verification. Time-
based replay attack prevention MUST be implemented through message
timestamps and nonce mechanisms, with strict validation of message
freshness windows.
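The timestamp-plus-nonce replay prevention described above might be
sketched as a small guard object; the skew window is illustrative, and
a production version would also expire old nonces rather than keep
them forever:

```python
class ReplayGuard:
    """Reject messages outside the freshness window or reusing a nonce."""

    def __init__(self, max_skew_seconds: float = 30.0):
        self.max_skew = max_skew_seconds
        self.seen_nonces = set()

    def accept(self, timestamp: float, nonce: str, now: float) -> bool:
        if abs(now - timestamp) > self.max_skew:
            return False  # stale, or too far in the future
        if nonce in self.seen_nonces:
            return False  # replayed message
        self.seen_nonces.add(nonce)
        return True
```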
Identity verification mechanisms MUST prevent Sybil attacks
through robust agent authentication and reputation tracking
systems. MACP implementations SHOULD integrate with existing
Public Key Infrastructure (PKI) systems or emerging decentralized
identity frameworks to establish verifiable agent identities.
Consensus coordinators MUST maintain authoritative lists of
eligible participating agents and regularly validate agent
credentials against trusted identity providers. Multi-factor
authentication SHOULD be employed for high-stakes consensus
decisions, potentially including hardware security module (HSM)
attestation or trusted execution environment verification. Agent
reputation systems MAY be implemented to track historical behavior
and adjust voting weights based on demonstrated trustworthiness,
though such systems MUST include mechanisms to prevent reputation
manipulation attacks.
Protection mechanisms for consensus integrity MUST address both
technical and game-theoretic attack vectors inherent in
distributed decision-making systems. Byzantine Fault Tolerant
consensus algorithms MUST be configured to handle the maximum
expected number of malicious agents according to theoretical
bounds, typically supporting up to f faulty agents in a network of
3f+1 total agents. Network-level protections SHOULD include
encrypted communication channels using protocols such as TLS 1.3
[RFC8446] and distributed denial-of-service (DDoS) mitigation
strategies to ensure consensus availability. Implementations MUST
provide consensus finality mechanisms that prevent retroactive
modification of agreed-upon decisions and provide cryptographic
proofs of consensus achievement. Regular security audits and
penetration testing SHOULD be conducted on MACP implementations,
with particular attention to consensus algorithm correctness and
cryptographic implementation vulnerabilities.
Economic and incentive-based security measures SHOULD be
considered to discourage malicious behavior and ensure honest
participation in consensus processes. Stake-based consensus
mechanisms MAY be implemented where agents must commit resources
or reputation to participate in decision-making, creating economic
disincentives for malicious behavior. Slashing mechanisms SHOULD
be employed to penalize agents that violate consensus rules or
demonstrate Byzantine behavior. However, such economic measures
MUST be carefully designed to prevent wealth concentration attacks
and ensure broad participation accessibility. Monitoring and
anomaly detection systems SHOULD continuously analyze consensus
patterns to identify potential coordinated attacks or unusual
voting behaviors that may indicate compromise. Emergency response
procedures MUST be established to handle detected security
incidents, including mechanisms for temporarily suspending
consensus participation of suspected malicious agents and
initiating incident response protocols.
9. IANA Considerations
This document requests IANA to create and maintain several new
registries to support the Multi-Agent Consensus Protocol (MACP)
and ensure protocol extensibility and interoperability. The
registries defined in this section will enable standardized
identification of protocol elements while allowing for future
enhancements and vendor-specific extensions without creating
conflicts or ambiguity in multi-agent consensus implementations.
IANA is requested to establish a "Multi-Agent Consensus Protocol
(MACP) Parameters" registry group containing three distinct
registries. The first registry, "MACP Message Types", SHALL
contain identifiers for all MACP message types including consensus
proposals, votes, commitments, and administrative messages.
Message type identifiers MUST be allocated as 16-bit unsigned
integers in the range 0x0000-0xFFFF, with values 0x0000-0x7FFF
reserved for IETF-defined message types and values 0x8000-0xFFFF
available for private use and experimental implementations.
Registration of new message types in the IETF range requires
Standards Action as defined in RFC 8126, and MUST include a
complete message format specification and security considerations.
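The 16-bit split above between IETF-defined and private-use message
types is easy to check mechanically; a sketch:

```python
def message_type_registry(msg_type: int) -> str:
    """Classify a MACP message type ID per the ranges above:
    0x0000-0x7FFF IETF-defined, 0x8000-0xFFFF private use."""
    if not 0x0000 <= msg_type <= 0xFFFF:
        raise ValueError("message type must fit in 16 bits")
    return "ietf" if msg_type <= 0x7FFF else "private-use"
```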
The second registry, "MACP Consensus Algorithm Identifiers", SHALL
contain unique identifiers for consensus algorithms supported by
MACP implementations. Algorithm identifiers MUST be allocated as
UTF-8 strings following the pattern "algorithm-name.version" with
a maximum length of 64 characters. The registry MUST include
algorithm names, version numbers, reference specifications,
security properties, and applicable use case constraints for each
entry. Initial registry entries SHALL include "pbft.1.0" for
Practical Byzantine Fault Tolerance, "poa.1.0" for Proof of
Authority, and "weighted-vote.1.0" for weighted voting consensus.
New algorithm registrations require Expert Review with designated
experts having demonstrated expertise in distributed consensus
mechanisms and multi-agent systems.
The third registry, "MACP Conflict Resolution Methods", SHALL
contain identifiers for standardized conflict resolution
procedures used when multiple competing proposals achieve similar
consensus scores. Resolution method identifiers MUST follow the
same UTF-8 string format as consensus algorithms and include
detailed descriptions of resolution logic, fairness guarantees,
and termination conditions. Registration requires Expert Review
and MUST demonstrate deterministic behavior across all
participating agents. Initial entries SHALL include "timestamp-
priority.1.0", "hash-ordering.1.0", and "weighted-random.1.0" with
complete algorithmic specifications.
IANA is further requested to establish a "MACP Extension
Parameters" registry for protocol extension identifiers used in
MACP header fields and capability negotiation. Extension
identifiers MUST be allocated as reverse DNS notation strings to
prevent conflicts and enable vendor-specific extensions while
maintaining global uniqueness. The registry SHALL operate under
First Come First Served allocation policy as defined in RFC 8126,
requiring only basic documentation of the extension purpose and
format. All registry entries MUST include contact information for
the registering organization and SHOULD reference publicly
accessible specification documents for interoperability purposes.
Author's Address
Generated by IETF Draft Analyzer
2026-03-07
@@ -12,7 +12,7 @@
 | drafts | 434 | Up from 361 after 2026-03-07 fetch |
 | ratings | 434 | 1:1 with drafts |
 | authors | 557 | Unique persons from Datatracker |
-| ideas | 419 | See "Ideas Count History" below |
+| ideas | 462 | Re-extracted 2026-03-08, see "Ideas Count History" below |
 | gaps | 11 | Not 12 -- see gap list below |
 | embeddings | 434 | 1:1 with drafts |
 | draft_authors | 1,057 | Draft-author links |
@@ -79,24 +79,25 @@ Blog posts reference 12 gaps with different names (e.g., "Agent Resource Exhaust
 ## Ideas Count History
 
-The database currently contains **419 ideas** across **377 drafts**. This is the third different count encountered:
+The database currently contains **462 ideas** across **415 drafts**. This is the fourth count encountered:
 
 | Source | Count | Date | Likely Explanation |
 |--------|-------|------|-------------------|
 | Blog post 5 filename | 1,262 | ~2026-03-03 | Pre-expansion dataset (260 drafts), before dedup |
 | Blog post 5 text / master stats | 1,780 | ~2026-03-05 | Post-expansion (361 drafts), before dedup |
-| Current database | 419 | 2026-03-08 | After `dedup_ideas` run (0.85 threshold) or re-extraction with different params |
+| Previous database | 419 | 2026-03-08 | After `dedup_ideas` run (0.85 threshold) or re-extraction with different params |
+| Current database | 462 | 2026-03-08 | After re-extraction for 38 drafts missing ideas (474 total drafts, 59 still without ideas) |
 
 ### Ideas by Type (current DB)
 
 | Type | Count |
 |------|-------|
-| protocol | 96 |
-| architecture | 95 |
-| extension | 79 |
-| mechanism | 68 |
-| requirement | 42 |
-| pattern | 35 |
+| architecture | 107 |
+| protocol | 106 |
+| extension | 84 |
+| mechanism | 74 |
+| requirement | 47 |
+| pattern | 40 |
 | framework | 3 |
 | format | 1 |
@@ -104,14 +105,30 @@ The database currently contains **419 ideas** across **377 drafts**. This is the
 | Ideas/Draft | Drafts |
 |-------------|--------|
-| 1 | 337 |
-| 2 | 38 |
+| 1 | 370 |
+| 2 | 43 |
 | 3 | 2 |
-| 0 (no ideas) | 57 |
+| 0 (no ideas) | 59 |
 
 The near-uniform 1-idea-per-draft (89% of drafts with ideas) suggests either aggressive dedup or a re-extraction with constrained output. The original pipeline extracted 1-4 ideas per draft, so the 1,780 figure likely reflects pre-dedup counts.
 
-Excluding false positives: 365 ideas across 326 drafts.
+### Convergence Analysis (2026-03-08)
+
+Cross-organization idea convergence analysis (threshold: 0.75 SequenceMatcher similarity):
+
+| Metric | Value |
+|--------|-------|
+| Total ideas | 462 |
+| Unique clusters | 398 |
+| Cross-org convergent ideas | 132 |
+| Convergence rate | 33% |
+
+Top convergent ideas by organization count:
+
+- **Fully Adaptive Routing Ethernet for AI** — 14 orgs (Baidu, Broadcom, China Mobile, etc.)
+- **AI Agent Protocol Framework** — 7 orgs, 3 drafts
+- **Natural Language Protocol for Agent Comm** — 7 orgs
+- **LISP-based geospatial intelligence network** — 6 orgs
+- **MCP-Based Network Management Plane** — 4 orgs (Deutsche Telekom, Huawei, Orange, Telefonica)
+
 ## Actions Taken (2026-03-08)
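The 0.75 SequenceMatcher cutoff used above groups near-duplicate idea titles into clusters before counting cross-org convergence. A minimal sketch of such a pass, assuming a greedy first-match clustering strategy (the grouping strategy and the sample titles are illustrative, not the pipeline's exact code):

```python
from difflib import SequenceMatcher

def cluster_ideas(titles: list[str], threshold: float = 0.75) -> list[list[str]]:
    """Greedy single-pass clustering: each title joins the first cluster
    whose representative (first member) is at least `threshold` similar."""
    clusters: list[list[str]] = []
    for title in titles:
        for cluster in clusters:
            ratio = SequenceMatcher(None, title.lower(), cluster[0].lower()).ratio()
            if ratio >= threshold:
                cluster.append(title)
                break
        else:
            clusters.append([title])
    return clusters

ideas = [
    "AI Agent Protocol Framework",
    "An AI Agent Protocol Framework",  # near-duplicate, merges with the first
    "Natural Language Protocol for Agent Communication",
]
groups = cluster_ideas(ideas)  # 2 clusters: the two framework titles merge
```

A convergent idea would then be any cluster whose member drafts span more than one author organization.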


@@ -0,0 +1,97 @@
# Working Group Analysis
*Generated 2026-03-06 21:16 UTC — 434 drafts (85 WG-adopted, 349 individual)*
## Working Group Overview
| WG | Drafts | Ideas | Novelty | Maturity | Overlap | Momentum | Relevance |
|:---|-------:|------:|--------:|---------:|--------:|---------:|----------:|
| **lake** | 11 | 10 | 3.1 | 3.8 | 2.3 | 3.6 | 3.9 |
| **lamps** | 9 | 9 | 2.7 | 3.9 | 1.7 | 3.4 | 3.6 |
| **aipref** | 9 | 10 | 3.0 | 3.2 | 3.2 | 3.3 | 4.1 |
| **emu** | 6 | 6 | 3.3 | 3.2 | 2.8 | 3.3 | 3.7 |
| **httpbis** | 5 | 5 | 2.0 | 4.8 | 3.2 | 4.2 | 3.0 |
| **tsv** | 4 | 4 | 2.8 | 3.8 | 2.2 | 3.0 | 3.0 |
| **tls** | 4 | 4 | 3.2 | 4.0 | 2.0 | 4.5 | 5.0 |
| **sshm** | 3 | 3 | 2.0 | 4.3 | 2.0 | 3.7 | 3.7 |
| **idr** | 3 | 3 | 2.7 | 3.0 | 2.7 | 3.7 | 3.0 |
| **dnsop** | 3 | 3 | 3.0 | 3.7 | 1.7 | 3.7 | 3.0 |
| **app** | 3 | 3 | 2.0 | 3.7 | 2.0 | 2.0 | 2.0 |
| **anima** | 3 | 4 | 3.0 | 4.3 | 2.3 | 3.7 | 3.7 |
| **sml** | 2 | 2 | 3.0 | 3.5 | 2.0 | 3.0 | 3.0 |
| **nmrg** | 2 | 2 | 3.0 | 3.0 | 3.5 | 3.0 | 3.5 |
| **hpke** | 2 | 2 | 3.5 | 4.5 | 2.0 | 4.5 | 5.0 |
| **dtn** | 2 | 2 | 3.0 | 4.0 | 1.0 | 3.5 | 2.5 |
| **ace** | 2 | 2 | 3.5 | 4.0 | 3.0 | 4.0 | 4.0 |
| **websec** | 1 | 1 | 3.0 | 4.0 | 2.0 | 4.0 | 4.0 |
| **vwrap** | 1 | 1 | 4.0 | 3.0 | 2.0 | 3.0 | 4.0 |
| **suit** | 1 | 1 | 3.0 | 4.0 | 2.0 | 4.0 | 4.0 |
| **sip** | 1 | 1 | 3.0 | 4.0 | 2.0 | 4.0 | 3.0 |
| **sec** | 1 | 1 | 2.0 | 5.0 | 4.0 | 3.0 | 4.0 |
| **roll** | 1 | 1 | 2.0 | 4.0 | 3.0 | 3.0 | 3.0 |
| **pim** | 1 | 2 | 2.0 | 3.0 | 3.0 | 3.0 | 2.0 |
| **netconf** | 1 | 1 | 3.0 | 4.0 | 2.0 | 4.0 | 4.0 |
| **mailmaint** | 1 | 1 | 2.0 | 4.0 | 4.0 | 2.0 | 2.0 |
| **lisp** | 1 | 1 | 4.0 | 4.0 | 2.0 | 4.0 | 4.0 |
| **grow** | 1 | 1 | 4.0 | 4.0 | 2.0 | 4.0 | 5.0 |
| **core** | 1 | 1 | 3.0 | 4.0 | 1.0 | 4.0 | 4.0 |
## Cross-WG Category Spread
Categories appearing in multiple WGs — potential coordination or alignment needed.
| Category | WG Count | Total Drafts | WGs |
|:---------|:--------:|-------------:|:----|
| Data formats/interop | 23 | 174 | aipref(8), lamps(6), lake(4), httpbis(3), sml(2), sshm(2), hpke(2), lisp(1), mailmaint(1), nmrg(1), ace(1), suit(1), tls(1), anima(1), netconf(1), pim(1), dtn(1), websec(1), app(1), emu(1), core(1), sec(1) |
| Agent identity/auth | 13 | 152 | lake(8), emu(6), anima(3), lamps(3), sshm(2), ace(2), hpke(2), sml(1), vwrap(1), aipref(1), core(1), sec(1) |
| A2A protocols | 9 | 155 | idr(3), lake(2), lisp(1), ace(1), aipref(1), sip(1), vwrap(1), dtn(1) |
| Autonomous netops | 9 | 114 | anima(2), dnsop(2), lisp(1), roll(1), nmrg(1), netconf(1), dtn(1), grow(1) |
| Policy/governance | 9 | 108 | aipref(9), lamps(2), dnsop(2), lake(1), tls(1), websec(1), httpbis(1), idr(1) |
| Agent discovery/reg | 8 | 89 | lake(2), roll(1), pim(1), sip(1), aipref(1), app(1), anima(1) |
| Other AI/agent | 6 | 34 | tsv(3), httpbis(2), tls(2), app(2), dnsop(1) |
| ML traffic mgmt | 5 | 79 | nmrg(1), tsv(1), aipref(1), grow(1) |
| Human-agent interaction | 4 | 33 | aipref(3), nmrg(1), vwrap(1) |
| AI safety/alignment | 3 | 47 | aipref(2), sml(1) |
| Model serving/inference | 2 | 42 | nmrg(1) |
## Cross-WG Idea Overlap
Same technical ideas appearing in different WGs — strongest signals for alignment.
### Hybrid Post-Quantum Cryptography for EAP-AKA' (1 WG: emu)
- **[emu]** [draft-ar-emu-hybrid-pqc-eapaka](https://datatracker.ietf.org/doc/draft-ar-emu-hybrid-pqc-eapaka/) — Enhancing Security in EAP-AKA' with Hybrid Post-Quantum Cryptography
## Individual vs WG-Adopted Distribution
| Category | Individual | WG-Adopted | Assessment |
|:---------|----------:|-----------:|:-----------|
| A2A protocols | 144 | 11 | WG exists — individual drafts could target it |
| AI safety/alignment | 44 | 3 | WG exists — individual drafts could target it |
| Agent discovery/reg | 81 | 8 | WG exists — individual drafts could target it |
| Agent identity/auth | 121 | 31 | WG exists — individual drafts could target it |
| Autonomous netops | 104 | 10 | WG exists — individual drafts could target it |
| Data formats/interop | 132 | 42 | WG exists — individual drafts could target it |
| Human-agent interaction | 29 | 5 | WG exists — individual drafts could target it |
| ML traffic mgmt | 75 | 4 | WG exists — individual drafts could target it |
| Model serving/inference | 41 | 1 | WG exists — individual drafts could target it |
| Other AI/agent | 24 | 10 | WG exists — individual drafts could target it |
| Policy/governance | 91 | 18 | WG exists — individual drafts could target it |
## Recommended Submission Targets
For each category, the best WG to submit new work to.
| Category | Best WG | Alternatives |
|:---------|:--------|:-------------|
| Data formats/interop | **aipref** | lamps(6), lake(4) |
| Agent identity/auth | **lake** | emu(6), anima(3) |
| A2A protocols | **idr** | lake(2), lisp(1) |
| Autonomous netops | **anima** | dnsop(2), lisp(1) |
| Policy/governance | **aipref** | lamps(2), dnsop(2) |
| Agent discovery/reg | **lake** | roll(1), pim(1) |
| Other AI/agent | **tsv** | httpbis(2), tls(2) |
| ML traffic mgmt | **nmrg** | tsv(1), aipref(1) |
| Human-agent interaction | **aipref** | nmrg(1), vwrap(1) |
| AI safety/alignment | **aipref** | sml(1) |
| Model serving/inference | **nmrg** | - |


@@ -0,0 +1,66 @@
#!/usr/bin/env python3
"""Backfill working group names by resolving group_uri from Datatracker API."""
import sqlite3
import time
import httpx
DB_PATH = "data/drafts.db"
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
# Get distinct group_uris that don't have a group name yet
rows = conn.execute("""
SELECT DISTINCT group_uri FROM drafts
WHERE group_uri IS NOT NULL AND group_uri != ''
AND ("group" IS NULL OR "group" = '')
""").fetchall()
uris = [r["group_uri"] for r in rows]
print(f"Resolving {len(uris)} unique group URIs...")
client = httpx.Client(timeout=30, follow_redirects=True)
resolved = {}
for uri in uris:
try:
resp = client.get(f"https://datatracker.ietf.org{uri}", params={"format": "json"})
resp.raise_for_status()
data = resp.json()
acronym = data.get("acronym", "")
name = data.get("name", "")
resolved[uri] = acronym or name or ""
print(f" {uri} -> {resolved[uri]} ({name})")
time.sleep(0.3)
except Exception as e:
print(f" {uri} -> ERROR: {e}")
resolved[uri] = ""
client.close()
# Update the database
for uri, group_name in resolved.items():
if group_name:
conn.execute(
'UPDATE drafts SET "group" = ? WHERE group_uri = ?',
(group_name, uri),
)
conn.commit()
# Show summary
rows = conn.execute("""
SELECT "group", COUNT(*) as cnt FROM drafts
WHERE "group" IS NOT NULL AND "group" != ''
GROUP BY "group" ORDER BY cnt DESC
""").fetchall()
print(f"\nWorking groups resolved ({len(rows)} groups):")
for r in rows:
print(f" {r[0]:30s} {r[1]} drafts")
total = conn.execute('SELECT COUNT(*) FROM drafts WHERE "group" IS NOT NULL AND "group" != ""').fetchone()[0]
none_count = conn.execute('SELECT COUNT(*) FROM drafts WHERE "group" IS NULL OR "group" = ""').fetchone()[0]
print(f"\nTotal with WG: {total}, individual/unresolved: {none_count}")
conn.close()


@@ -0,0 +1,39 @@
#!/usr/bin/env python3
"""Classify unrated drafts using Ollama two-stage filter."""
import sqlite3
import sys
sys.path.insert(0, "src")
from ietf_analyzer.classifier import Classifier
from ietf_analyzer.config import Config
cfg = Config.load()
conn = sqlite3.connect(cfg.db_path)
conn.row_factory = sqlite3.Row
# Get unrated drafts
rows = conn.execute("""
SELECT name, title, abstract, source FROM drafts
WHERE name NOT IN (SELECT draft_name FROM ratings)
ORDER BY source, name
""").fetchall()
drafts = [dict(r) for r in rows]
print(f"Classifying {len(drafts)} unrated drafts...\n")
with Classifier(cfg) as clf:
relevant, irrelevant = clf.classify_batch(drafts, verbose=True)
print(f"\n--- RELEVANT ({len(relevant)}) ---")
for d in relevant:
print(f" [{d['source']}] {d['name']}")
print(f" {d['title'][:100]}")
print(f"\n--- IRRELEVANT ({len(irrelevant)}) ---")
for d in irrelevant:
print(f" [{d['source']}] {d['name']}")
print(f" {d['title'][:100]}")
print(f"\nSummary: {len(relevant)} relevant, {len(irrelevant)} irrelevant out of {len(drafts)}")
conn.close()


@@ -0,0 +1,86 @@
#!/usr/bin/env python3
"""Compare Ollama classifier vs Claude ratings to find disagreements."""
import sqlite3
import sys
sys.path.insert(0, "src")
from ietf_analyzer.classifier import Classifier
from ietf_analyzer.config import Config
cfg = Config.load()
conn = sqlite3.connect(cfg.db_path)
conn.row_factory = sqlite3.Row
# Get all rated drafts with their Claude ratings
rows = conn.execute("""
SELECT d.name, d.title, d.abstract, r.relevance, r.false_positive,
r.novelty, r.maturity, r.overlap, r.momentum,
(r.novelty + r.maturity + (5 - r.overlap) + r.momentum + r.relevance) / 5.0 as composite
FROM drafts d JOIN ratings r ON d.name = r.draft_name
WHERE d.abstract IS NOT NULL AND d.abstract != ''
ORDER BY d.name
""").fetchall()
print(f"Comparing Ollama classifier vs Claude ratings on {len(rows)} drafts...\n")
with Classifier(cfg) as clf:
agree = 0
disagree_ollama_yes_claude_no = [] # Ollama says relevant, Claude says FP
disagree_ollama_no_claude_yes = [] # Ollama says irrelevant, Claude says relevant
for i, r in enumerate(rows):
is_rel, sim, method = clf.classify(r["title"], r["abstract"])
# Claude's view: false_positive=1 OR relevance<=2 means "not really relevant"
claude_relevant = not r["false_positive"] and r["relevance"] >= 3
if is_rel == claude_relevant:
agree += 1
elif is_rel and not claude_relevant:
disagree_ollama_yes_claude_no.append({
"name": r["name"], "title": r["title"][:60],
"sim": sim, "method": method,
"relevance": r["relevance"], "fp": r["false_positive"],
"composite": r["composite"],
})
else:
disagree_ollama_no_claude_yes.append({
"name": r["name"], "title": r["title"][:60],
"sim": sim, "method": method,
"relevance": r["relevance"], "fp": r["false_positive"],
"composite": r["composite"],
})
if (i + 1) % 50 == 0:
print(f" Processed {i+1}/{len(rows)}...")
print(f"\n{'='*70}")
print(f"AGREEMENT: {agree}/{len(rows)} ({100*agree/len(rows):.1f}%)")
print(f"{'='*70}")
print(f"\nOllama=RELEVANT but Claude=NOT relevant ({len(disagree_ollama_yes_claude_no)}):")
print(f" (These are cases where Ollama wastes Claude tokens on irrelevant drafts)")
for d in sorted(disagree_ollama_yes_claude_no, key=lambda x: x["sim"], reverse=True)[:15]:
fp_label = " [FP]" if d["fp"] else ""
print(f" sim={d['sim']:.3f} ({d['method']:18s}) rel={d['relevance']}{fp_label} | {d['name']}")
print(f" {d['title']}")
print(f"\nOllama=IRRELEVANT but Claude=RELEVANT ({len(disagree_ollama_no_claude_yes)}):")
print(f" (These are cases where Ollama would have incorrectly filtered out good drafts)")
for d in sorted(disagree_ollama_no_claude_yes, key=lambda x: x["relevance"], reverse=True)[:15]:
print(f" sim={d['sim']:.3f} ({d['method']:18s}) rel={d['relevance']} comp={d['composite']:.1f} | {d['name']}")
print(f" {d['title']}")
# Summary stats
total_fp_by_claude = sum(1 for r in rows if r["false_positive"] or r["relevance"] <= 2)
total_relevant_by_claude = len(rows) - total_fp_by_claude
print(f"\n{'='*70}")
print(f"Claude thinks: {total_relevant_by_claude} relevant, {total_fp_by_claude} not relevant")
let_through = total_relevant_by_claude - len(disagree_ollama_no_claude_yes) + len(disagree_ollama_yes_claude_no)
print(f"Ollama would let through: {let_through} (saves {len(rows) - let_through} Claude calls)")
print(f"\nToken savings if Ollama pre-filters:")
print(f"  Correctly rejected: {total_fp_by_claude - len(disagree_ollama_yes_claude_no)} drafts")
print(f" Incorrectly rejected (missed): {len(disagree_ollama_no_claude_yes)} drafts")
print(f" Incorrectly passed (wasted): {len(disagree_ollama_yes_claude_no)} drafts")
conn.close()


@@ -0,0 +1,65 @@
#!/usr/bin/env python3
"""Download full text for the 9 classifier-relevant unrated drafts."""
import sqlite3
import time
import sys
sys.path.insert(0, "src")
import httpx
from ietf_analyzer.config import Config
cfg = Config.load()
conn = sqlite3.connect(cfg.db_path)
conn.row_factory = sqlite3.Row
# The 9 relevant drafts from classifier
relevant_names = [
"draft-bondar-wca",
"draft-latour-pre-registration",
"draft-li-trustworthy-routing-discovery",
"draft-scrm-aiproto-usecases",
"draft-song-dmsc-problem-statement",
"draft-wiethuechter-drip-det-moc",
"draft-wiethuechter-drip-det-tada",
"draft-zzn-dvs",
"w3c-cuap",
]
client = httpx.Client(timeout=30, follow_redirects=True)
for name in relevant_names:
row = conn.execute("SELECT name, rev, source, source_url, full_text FROM drafts WHERE name=?", (name,)).fetchone()
if not row:
print(f" SKIP {name}: not in DB")
continue
if row["full_text"]:
print(f" SKIP {name}: already has text")
continue
if row["source"] == "w3c":
url = row["source_url"] or ""
if not url:
print(f" SKIP {name}: no source_url for W3C doc")
continue
else:
rev = row["rev"] or "00"
url = f"https://www.ietf.org/archive/id/{name}-{rev}.txt"
print(f" Fetching {name} from {url}...")
try:
resp = client.get(url)
if resp.status_code == 200:
text = resp.text[:500000] # cap at 500K
conn.execute("UPDATE drafts SET full_text=? WHERE name=?", (text, name))
conn.commit()
print(f" OK ({len(text)} chars)")
else:
print(f" FAIL: HTTP {resp.status_code}")
except Exception as e:
print(f" ERROR: {e}")
time.sleep(0.5)
client.close()
conn.close()
print("\nDone.")

scripts/run-webui.sh Executable file

@@ -0,0 +1,8 @@
#!/usr/bin/env bash
# Start the IETF Draft Analyzer Web Dashboard
#
# Usage:
# ./scripts/run-webui.sh # Production (admin disabled)
# ./scripts/run-webui.sh --dev # Development (admin enabled)
cd "$(dirname "$0")/.."
python src/webui/app.py "$@"


@@ -0,0 +1,182 @@
"""Local AI-relevance classifier using Ollama.
Two-stage filter to avoid spending Claude tokens on irrelevant drafts:
1. Embedding similarity — fast cosine check against a reference description
2. Chat classification — small local model for borderline cases
Both stages run locally via Ollama (zero cost).
"""
from __future__ import annotations
import numpy as np
import ollama as ollama_lib
from rich.console import Console
from .config import Config
console = Console()
# Reference description of what we're looking for.
# Embedding of this text is compared against each draft's abstract.
REFERENCE_DESCRIPTION = """
AI agent protocols, autonomous agent communication, agent-to-agent interaction,
agent identity and authentication, agent authorization, agent discovery,
large language model integration with network protocols, agentic systems,
machine learning for network operations, AI safety in networked systems,
model context protocol, multi-agent coordination, agent task delegation,
generative AI infrastructure, intelligent network automation,
trustworthy AI systems, AI governance in standards.
"""
# Thresholds for the two-stage filter (calibrated against 434 drafts + 73 FPs)
# TP avg similarity: 0.685, FP avg: 0.598
SIMILARITY_ACCEPT = 0.72 # Above this: definitely relevant, skip chat
SIMILARITY_REJECT = 0.50 # Below this: definitely irrelevant, skip chat
# Between REJECT and ACCEPT: borderline, use chat model to decide
CLASSIFY_PROMPT = """\
You are classifying IETF Internet-Drafts for an AI/agent standards tracker.
A draft is RELEVANT if it relates to ANY of these topics:
- AI agents, autonomous agents, multi-agent systems
- Agent identity, authentication, authorization, discovery
- Agent-to-agent (A2A) communication protocols
- Large language models (LLMs), generative AI
- Machine learning in network operations
- AI safety, alignment, trustworthiness
- Model Context Protocol (MCP), agentic workflows
- OAuth/JWT/credentials for agents or AI systems
- Autonomous network operations using AI
- Intelligent network management or traffic handling
A draft is NOT relevant if it only covers:
- Pure cryptography without AI/agent context
- General networking protocols (BGP, DNS, TLS) without AI
- Email, HTTP, or web standards without AI/agent features
Title: {title}
Abstract: {abstract}
Is this draft relevant to AI agents or related topics? Answer ONLY "yes" or "no"."""
class Classifier:
"""Classify drafts as AI-relevant using local Ollama models."""
def __init__(self, config: Config | None = None):
self.config = config or Config.load()
self.client = ollama_lib.Client(host=self.config.ollama_url)
self._ref_embedding: np.ndarray | None = None
def close(self) -> None:
if hasattr(self.client, '_client'):
self.client._client.close()
def __enter__(self):
return self
def __exit__(self, *exc):
self.close()
def _get_reference_embedding(self) -> np.ndarray:
"""Get (cached) embedding of the reference AI description."""
if self._ref_embedding is None:
resp = self.client.embed(
model=self.config.ollama_embed_model,
input=REFERENCE_DESCRIPTION.strip(),
)
self._ref_embedding = np.array(resp["embeddings"][0], dtype=np.float32)
return self._ref_embedding
def _embed(self, text: str) -> np.ndarray:
"""Embed a text string."""
resp = self.client.embed(
model=self.config.ollama_embed_model,
input=text[:8000],
)
return np.array(resp["embeddings"][0], dtype=np.float32)
def _cosine_similarity(self, a: np.ndarray, b: np.ndarray) -> float:
dot = np.dot(a, b)
norm = np.linalg.norm(a) * np.linalg.norm(b)
return float(dot / norm) if norm > 0 else 0.0
def _chat_classify(self, title: str, abstract: str) -> bool:
"""Ask local chat model whether a draft is AI-related."""
prompt = CLASSIFY_PROMPT.format(title=title, abstract=abstract[:2000])
try:
resp = self.client.chat(
model=self.config.ollama_classify_model,
messages=[{"role": "user", "content": prompt}],
options={"temperature": 0.0, "num_predict": 10},
)
answer = resp["message"]["content"].strip().lower()
return answer.startswith("yes")
except Exception as e:
console.print(f"[yellow]Chat classify failed: {e}, defaulting to relevant[/yellow]")
return True # err on the side of inclusion
def classify(self, title: str, abstract: str) -> tuple[bool, float, str]:
"""Classify a draft as AI-relevant.
Returns:
(is_relevant, similarity_score, method)
method is one of: "embedding_accept", "embedding_reject", "chat_yes", "chat_no"
"""
text = f"{title}\n{abstract}"
ref = self._get_reference_embedding()
emb = self._embed(text)
sim = self._cosine_similarity(emb, ref)
if sim >= SIMILARITY_ACCEPT:
return True, sim, "embedding_accept"
if sim <= SIMILARITY_REJECT:
return False, sim, "embedding_reject"
# Borderline — ask chat model
is_relevant = self._chat_classify(title, abstract)
method = "chat_yes" if is_relevant else "chat_no"
return is_relevant, sim, method
def classify_batch(
self, drafts: list[dict], verbose: bool = True
) -> tuple[list[dict], list[dict]]:
"""Classify a batch of drafts.
Args:
drafts: list of dicts with at least 'name', 'title', 'abstract' keys
Returns:
(relevant, irrelevant) — two lists of draft dicts
"""
relevant = []
irrelevant = []
stats = {"embedding_accept": 0, "embedding_reject": 0, "chat_yes": 0, "chat_no": 0}
for i, d in enumerate(drafts):
is_rel, sim, method = self.classify(
d.get("title", ""), d.get("abstract", "")
)
stats[method] += 1
if verbose and (i + 1) % 10 == 0:
console.print(f" Classified {i + 1}/{len(drafts)}...")
if is_rel:
relevant.append(d)
else:
irrelevant.append(d)
if verbose:
console.print(
f"\n [green]Relevant: {len(relevant)}[/green] "
f"[red]Irrelevant: {len(irrelevant)}[/red]\n"
f" Embedding accept: {stats['embedding_accept']} "
f" Embedding reject: {stats['embedding_reject']}\n"
f" Chat yes: {stats['chat_yes']} "
f" Chat no: {stats['chat_no']}"
)
return relevant, irrelevant
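The threshold routing inside `classify()` can be exercised in isolation. A sketch using the module's published cutoffs, with a stub callable standing in for the local chat model (an illustration of the decision logic, not the shipped API):

```python
SIMILARITY_ACCEPT = 0.72  # at or above: relevant without a chat call
SIMILARITY_REJECT = 0.50  # at or below: irrelevant without a chat call

def route(similarity: float, chat_says_yes) -> tuple[bool, str]:
    """Two-stage decision: the embedding score settles clear cases,
    and only the borderline band pays for a chat-model call."""
    if similarity >= SIMILARITY_ACCEPT:
        return True, "embedding_accept"
    if similarity <= SIMILARITY_REJECT:
        return False, "embedding_reject"
    verdict = chat_says_yes()  # only invoked for 0.50 < similarity < 0.72
    return verdict, "chat_yes" if verdict else "chat_no"

print(route(0.75, lambda: True))   # (True, 'embedding_accept'), no chat call
print(route(0.60, lambda: False))  # (False, 'chat_no'), chat consulted
```

Since the calibration note above puts the true-positive mean (0.685) inside the borderline band, the chat stage ends up doing most of the discriminating work on real drafts.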


@@ -297,8 +297,9 @@ class Database:
     def upsert_draft(self, draft: Draft) -> None:
         self.conn.execute(
             """INSERT INTO drafts (name, rev, title, abstract, time, dt_id, pages, words,
-               "group", group_uri, expires, ad, shepherd, states, full_text, categories, tags, fetched_at)
-               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
+               "group", group_uri, expires, ad, shepherd, states, full_text, categories, tags, fetched_at,
+               source, source_id, source_url, doc_status)
+               VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
             ON CONFLICT(name) DO UPDATE SET
                 rev=excluded.rev, title=excluded.title, abstract=excluded.abstract,
                 time=excluded.time, dt_id=excluded.dt_id, pages=excluded.pages,
@@ -307,7 +308,9 @@ class Database:
                 states=excluded.states,
                 full_text=COALESCE(excluded.full_text, full_text),
                 categories=excluded.categories, tags=excluded.tags,
-                fetched_at=excluded.fetched_at
+                fetched_at=excluded.fetched_at,
+                source=excluded.source, source_id=excluded.source_id,
+                source_url=excluded.source_url, doc_status=excluded.doc_status
             """,
             (
                 draft.name, draft.rev, draft.title, draft.abstract, draft.time,
@@ -316,6 +319,7 @@ class Database:
                 json.dumps(draft.states), draft.full_text,
                 json.dumps(draft.categories), json.dumps(draft.tags),
                 draft.fetched_at or datetime.now(timezone.utc).isoformat(),
+                draft.source, draft.source_id, draft.source_url, draft.doc_status,
             ),
         )
         self.conn.commit()
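The `COALESCE(excluded.full_text, full_text)` clause above is what lets a metadata refresh run without wiping already-downloaded text. A standalone illustration against an in-memory table (schema reduced to three columns for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drafts (name TEXT PRIMARY KEY, rev TEXT, full_text TEXT)")

# First fetch stores the body; a later metadata refresh carries no text (NULL)
conn.execute("INSERT INTO drafts VALUES (?, ?, ?)", ("draft-x", "00", "original body"))
conn.execute(
    """INSERT INTO drafts (name, rev, full_text) VALUES (?, ?, ?)
       ON CONFLICT(name) DO UPDATE SET
           rev=excluded.rev,
           full_text=COALESCE(excluded.full_text, full_text)""",
    ("draft-x", "01", None),
)
row = conn.execute("SELECT rev, full_text FROM drafts WHERE name='draft-x'").fetchone()
# rev advances to "01" while the stored body survives the NULL refresh
```

`excluded` is SQLite's pseudo-table holding the row that failed to insert, so the COALESCE only falls back to the stored value when the incoming one is NULL.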

src/webui/analytics.py Normal file

@@ -0,0 +1,244 @@
"""Lightweight, GDPR-compliant analytics using SQLite.
No cookies, no personal data, no consent needed.
Visitor uniqueness is estimated via daily-salted IP hash (not stored raw).
Data lives in a separate analytics.db to keep the main DB clean.
"""
from __future__ import annotations
import hashlib
import sqlite3
from datetime import date, timedelta
from pathlib import Path
from urllib.parse import urlparse
from flask import Flask, request, g
_SCHEMA = """
CREATE TABLE IF NOT EXISTS page_views (
id INTEGER PRIMARY KEY AUTOINCREMENT,
ts TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%S', 'now')),
date TEXT NOT NULL DEFAULT (strftime('%Y-%m-%d', 'now')),
path TEXT NOT NULL,
referrer TEXT,
visitor TEXT,
ua_type TEXT
);
CREATE INDEX IF NOT EXISTS idx_pv_date ON page_views(date);
CREATE INDEX IF NOT EXISTS idx_pv_path ON page_views(path);
CREATE TABLE IF NOT EXISTS downloads (
id INTEGER PRIMARY KEY AUTOINCREMENT,
ts TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%S', 'now')),
date TEXT NOT NULL DEFAULT (strftime('%Y-%m-%d', 'now')),
file_type TEXT NOT NULL,
visitor TEXT
);
CREATE INDEX IF NOT EXISTS idx_dl_date ON downloads(date);
"""
# Daily salt rotates so yesterday's hashes can't be correlated with today's
_daily_salt: tuple[str, str] = ("", "")
def _get_salt() -> str:
global _daily_salt
today = date.today().isoformat()
if _daily_salt[0] != today:
_daily_salt = (today, hashlib.sha256(f"ietf-analytics-{today}".encode()).hexdigest()[:16])
return _daily_salt[1]
def _hash_visitor(ip: str) -> str:
"""Create a daily-rotating hash from IP. Cannot be reversed or correlated across days."""
salt = _get_salt()
return hashlib.sha256(f"{salt}:{ip}".encode()).hexdigest()[:12]
def _classify_ua(ua: str) -> str:
"""Rough bot/browser classification."""
ua_lower = ua.lower()
if any(b in ua_lower for b in ("bot", "spider", "crawl", "slurp", "wget", "curl", "python-requests")):
return "bot"
if "mobile" in ua_lower:
return "mobile"
return "browser"
def _get_analytics_db() -> sqlite3.Connection:
"""Get or create the analytics DB connection for this request."""
if "analytics_db" not in g:
# init_analytics() injects the resolved path via g before each request;
# fall back to the default repo location if it is missing
db_path = getattr(g, "_analytics_db_path", Path("data") / "analytics.db")
db_path.parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(str(db_path))
conn.row_factory = sqlite3.Row
conn.executescript(_SCHEMA)
g.analytics_db = conn
return g.analytics_db
# Paths to skip (static assets, API calls, etc.)
_SKIP_PREFIXES = ("/static/", "/api/", "/favicon", "/robots.txt", "/admin/")
def init_analytics(app: Flask, db_path: str | None = None):
"""Register analytics hooks on the Flask app."""
_resolved_db_path = Path(db_path) if db_path else (
Path(app.root_path).parent.parent / "data" / "analytics.db"
)
@app.before_request
def track_pageview():
path = request.path
# Skip static/API/admin routes
if any(path.startswith(p) for p in _SKIP_PREFIXES):
return
g._analytics_db_path = _resolved_db_path
try:
conn = _get_analytics_db()
ip = request.remote_addr or "unknown"
visitor = _hash_visitor(ip)
ua = request.headers.get("User-Agent", "")
ua_type = _classify_ua(ua)
# Skip bots from page view counts (still track downloads)
if ua_type == "bot" and path != "/export/obsidian":
return
referrer = request.headers.get("Referer", "")
# Only keep the domain of referrer
if referrer:
try:
parsed = urlparse(referrer)
referrer = parsed.netloc or ""
except Exception:
referrer = ""
# Track downloads separately
if path == "/export/obsidian":
conn.execute(
"INSERT INTO downloads (file_type, visitor) VALUES (?, ?)",
("obsidian", visitor),
)
conn.commit()
conn.execute(
"INSERT INTO page_views (path, referrer, visitor, ua_type) VALUES (?, ?, ?, ?)",
(path, referrer, visitor, ua_type),
)
conn.commit()
except Exception:
pass # Analytics should never break the app
@app.teardown_appcontext
def close_analytics_db(exception=None):
conn = g.pop("analytics_db", None)
if conn is not None:
conn.close()
def get_analytics_data(db_path: str | Path) -> dict:
"""Query analytics data for the dashboard. Returns dicts ready for rendering."""
conn = sqlite3.connect(str(db_path))
conn.row_factory = sqlite3.Row
conn.executescript(_SCHEMA)
today = date.today()
week_ago = (today - timedelta(days=7)).isoformat()
month_ago = (today - timedelta(days=30)).isoformat()
# --- Overall stats ---
total_views = conn.execute("SELECT COUNT(*) FROM page_views").fetchone()[0]
total_visitors = conn.execute("SELECT COUNT(DISTINCT visitor || date) FROM page_views").fetchone()[0]
total_downloads = conn.execute("SELECT COUNT(*) FROM downloads").fetchone()[0]
today_views = conn.execute(
"SELECT COUNT(*) FROM page_views WHERE date = ?", (today.isoformat(),)
).fetchone()[0]
today_visitors = conn.execute(
"SELECT COUNT(DISTINCT visitor) FROM page_views WHERE date = ?", (today.isoformat(),)
).fetchone()[0]
week_views = conn.execute(
"SELECT COUNT(*) FROM page_views WHERE date >= ?", (week_ago,)
).fetchone()[0]
month_views = conn.execute(
"SELECT COUNT(*) FROM page_views WHERE date >= ?", (month_ago,)
).fetchone()[0]
# --- Daily views (last 30 days) ---
daily_rows = conn.execute(
"SELECT date, COUNT(*) as views, COUNT(DISTINCT visitor) as visitors "
"FROM page_views WHERE date >= ? GROUP BY date ORDER BY date",
(month_ago,),
).fetchall()
daily = {
"dates": [r["date"] for r in daily_rows],
"views": [r["views"] for r in daily_rows],
"visitors": [r["visitors"] for r in daily_rows],
}
# --- Top pages (last 30 days) ---
page_rows = conn.execute(
"SELECT path, COUNT(*) as views, COUNT(DISTINCT visitor) as visitors "
"FROM page_views WHERE date >= ? GROUP BY path ORDER BY views DESC LIMIT 20",
(month_ago,),
).fetchall()
top_pages = [{"path": r["path"], "views": r["views"], "visitors": r["visitors"]} for r in page_rows]
# --- Top referrers (last 30 days) ---
ref_rows = conn.execute(
"SELECT referrer, COUNT(*) as count FROM page_views "
"WHERE date >= ? AND referrer != '' GROUP BY referrer ORDER BY count DESC LIMIT 15",
(month_ago,),
).fetchall()
top_referrers = [{"referrer": r["referrer"], "count": r["count"]} for r in ref_rows]
# --- Downloads over time ---
dl_rows = conn.execute(
"SELECT date, COUNT(*) as count FROM downloads GROUP BY date ORDER BY date"
).fetchall()
downloads_daily = {
"dates": [r["date"] for r in dl_rows],
"counts": [r["count"] for r in dl_rows],
}
# --- Hourly pattern (last 7 days) ---
hourly_rows = conn.execute(
"SELECT CAST(strftime('%H', ts) AS INTEGER) as hour, COUNT(*) as views "
"FROM page_views WHERE date >= ? GROUP BY hour ORDER BY hour",
(week_ago,),
).fetchall()
hourly = {r["hour"]: r["views"] for r in hourly_rows}
hourly_full = {"hours": list(range(24)), "views": [hourly.get(h, 0) for h in range(24)]}
conn.close()
return {
"stats": {
"total_views": total_views,
"total_visitors": total_visitors,
"total_downloads": total_downloads,
"today_views": today_views,
"today_visitors": today_visitors,
"week_views": week_views,
"month_views": month_views,
},
"daily": daily,
"top_pages": top_pages,
"top_referrers": top_referrers,
"downloads_daily": downloads_daily,
"hourly": hourly_full,
}
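The hourly bucketing above can be exercised on a toy in-memory database. The `page_views` schema here is an assumption inferred from the queries, not the project's actual migration:

```python
import sqlite3

# Minimal sketch: the same strftime('%H') bucketing, on an in-memory DB.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE page_views (path TEXT, visitor TEXT, referrer TEXT, ts TEXT, date TEXT)"
)
rows = [
    ("/", "a", "", "2026-03-08 09:15:00", "2026-03-08"),
    ("/", "b", "", "2026-03-08 09:45:00", "2026-03-08"),
    ("/posts/08", "a", "", "2026-03-08 14:05:00", "2026-03-08"),
]
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?, ?, ?)", rows)

hourly_rows = conn.execute(
    "SELECT CAST(strftime('%H', ts) AS INTEGER) as hour, COUNT(*) as views "
    "FROM page_views WHERE date >= ? GROUP BY hour ORDER BY hour",
    ("2026-03-02",),
).fetchall()
hourly = {r["hour"]: r["views"] for r in hourly_rows}
hourly_full = {"hours": list(range(24)), "views": [hourly.get(h, 0) for h in range(24)]}
print(hourly_full["views"][9], hourly_full["views"][14])  # 2 1
```

Filling the 24-slot list from a sparse dict is what lets the chart render quiet hours as zero instead of omitting them.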

src/webui/auth.py Normal file

@@ -0,0 +1,55 @@
"""Admin authentication with two run modes.
Production (default):
python src/webui/app.py
All admin routes return 404. No way to access private features.
Development:
python src/webui/app.py --dev
Every request is auto-authenticated as admin. No login needed.
The mode is set once at startup and cannot be changed at runtime.
"""
from __future__ import annotations
from functools import wraps
from flask import abort, g
# Module-level flag set by init_auth()
_dev_mode: bool = False
_initialized: bool = False
def is_admin() -> bool:
"""Check if the current request has admin access."""
return _dev_mode
def admin_required(f):
"""Decorator: returns 404 for non-admin users so routes stay hidden."""
@wraps(f)
def decorated(*args, **kwargs):
if not is_admin():
abort(404)
return f(*args, **kwargs)
return decorated
def init_auth(app, dev: bool = False):
"""Set the auth mode and register Flask hooks (once only)."""
global _dev_mode, _initialized
_dev_mode = dev
if _initialized:
return
_initialized = True
@app.before_request
def set_admin_flag():
g.is_admin = is_admin()
@app.context_processor
def inject_admin():
return {"is_admin": g.get("is_admin", False)}
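The hide-behind-404 behavior of `admin_required` can be modeled without Flask. This is a simplified sketch with `abort(404)` stubbed as an exception, not the real Flask wiring:

```python
from functools import wraps

# Simplified model of the auth module: flask.abort(404) is stood in
# for by an exception so both modes can be demonstrated standalone.
_dev_mode = False

class NotFound(Exception):
    pass

def is_admin() -> bool:
    return _dev_mode

def admin_required(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        if not is_admin():
            raise NotFound(404)  # stands in for flask.abort(404)
        return f(*args, **kwargs)
    return decorated

@admin_required
def secret_dashboard():
    return "stats"

# Production mode: the route behaves as if it does not exist.
try:
    secret_dashboard()
    hidden = False
except NotFound:
    hidden = True

# Dev mode: every request is auto-authenticated as admin.
_dev_mode = True
result = secret_dashboard()
```

Returning 404 rather than 403 is the point of the design: an unauthenticated visitor cannot even learn that admin routes exist.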

src/webui/obsidian_export.py Normal file

@@ -0,0 +1,508 @@
"""Export research data as an Obsidian-compatible vault (ZIP).
Generates interlinked markdown files with YAML frontmatter,
[[wikilinks]], #tags, and Mermaid diagrams that Obsidian renders natively.
"""
from __future__ import annotations
import io
import zipfile
from collections import Counter, defaultdict
from datetime import date
from ietf_analyzer.db import Database
from webui.data import _extract_month
def _safe_filename(name: str) -> str:
"""Sanitize a string for use as a filename."""
return name.replace("/", "-").replace("\\", "-").replace(":", "-").replace('"', "")
def _score_bar(val: float, max_val: float = 5.0) -> str:
"""Render a simple text progress bar."""
filled = round(val / max_val * 10)
return "`" + "\u2588" * filled + "\u2591" * (10 - filled) + f"` {val}/{max_val}"
def _mermaid_pie(title: str, data: dict[str, int], limit: int = 12) -> str:
"""Generate a Mermaid pie chart."""
items = list(data.items())[:limit]
if not items:
return ""
lines = [f'```mermaid\npie title {title}']
for label, count in items:
safe_label = label.replace('"', "'")
lines.append(f' "{safe_label}" : {count}')
lines.append("```")
return "\n".join(lines)
def _mermaid_bar(title: str, data: dict[str, float], limit: int = 15) -> str:
"""Generate a Mermaid xychart bar chart."""
items = list(data.items())[:limit]
if not items:
return ""
labels = [f'"{k[:20]}"' for k, _ in items]
values = [str(round(v, 1)) for _, v in items]
return f"""```mermaid
xychart-beta
title "{title}"
x-axis [{", ".join(labels)}]
y-axis "Score"
bar [{", ".join(values)}]
```"""
def _mermaid_timeline_chart(monthly: dict[str, int]) -> str:
"""Generate a Mermaid xychart for submissions over time."""
if len(monthly) < 2:
return ""
months = sorted(monthly.keys())
# Show every 3rd label to avoid clutter
labels = []
for i, m in enumerate(months):
if i % 3 == 0:
labels.append(f'"{m}"')
else:
labels.append('" "')
values = [str(monthly[m]) for m in months]
return f"""```mermaid
xychart-beta
title "Draft Submissions Over Time"
x-axis [{", ".join(labels)}]
y-axis "Drafts"
bar [{", ".join(values)}]
```"""
def build_obsidian_vault(db: Database) -> bytes:
"""Build a ZIP file containing an Obsidian vault with all research data."""
buf = io.BytesIO()
prefix = "IETF-AI-Agent-Drafts"
pairs = db.drafts_with_ratings(limit=2000)
all_drafts_list = db.list_drafts(limit=2000, order_by="time DESC")
draft_map = {d.name: d for d in all_drafts_list}
all_ideas = db.all_ideas()
all_authors = db.top_authors(limit=500)
# Build lookup maps
cat_counts: Counter = Counter()
cat_drafts: dict[str, list[str]] = defaultdict(list)
score_map: dict[str, float] = {}
rating_map: dict[str, object] = {}
for d, r in pairs:
score_map[d.name] = r.composite_score
rating_map[d.name] = r
for cat in r.categories:
cat_counts[cat] += 1
cat_drafts[cat].append(d.name)
# Monthly submission counts
monthly: Counter = Counter()
for d in all_drafts_list:
monthly[_extract_month(d.time)] += 1
# Ideas by draft
ideas_by_draft: dict[str, list[dict]] = defaultdict(list)
for idea in all_ideas:
ideas_by_draft[idea.get("draft_name", "")].append(idea)
# Author info by draft
author_drafts: dict[str, list[str]] = defaultdict(list)
author_info: dict[str, dict] = {}
for name, aff, cnt, drafts in all_authors:
author_info[name] = {"affiliation": aff or "", "draft_count": cnt, "drafts": drafts}
for dn in drafts:
author_drafts[dn].append(name)
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
# --- Dashboard.md ---
top_rated = sorted(pairs, key=lambda p: p[1].composite_score, reverse=True)[:15]
top_table = "| Draft | Score | Category |\n|---|---|---|\n"
for d, r in top_rated:
score = r.composite_score
cat = r.categories[0] if r.categories else ""
top_table += f"| [[{d.name}]] | **{score:.2f}** | {cat} |\n"
cat_pie = _mermaid_pie("Drafts by Category", dict(cat_counts.most_common(12)))
timeline_chart = _mermaid_timeline_chart(dict(sorted(monthly.items())))
# Score distribution as mermaid
score_buckets: Counter = Counter()
for _, r in pairs:
bucket = f"{r.composite_score:.0f}"
score_buckets[bucket] += 1
score_dist = dict(sorted(score_buckets.items()))
dashboard = f"""---
tags: [dashboard, ietf, ai-agents]
generated: {date.today().isoformat()}
---
# IETF AI/Agent Draft Analysis
> Automated analysis of {len(all_drafts_list)} Internet-Drafts on AI and agent topics.
> Generated by [IETF Draft Analyzer](https://github.com) on {date.today().isoformat()}.
## Key Stats
| Metric | Value |
|---|---|
| Total Drafts | **{len(all_drafts_list)}** |
| Rated Drafts | **{len(pairs)}** |
| Authors | **{len(all_authors)}** |
| Ideas Extracted | **{len(all_ideas)}** |
| Categories | **{len(cat_counts)}** |
## Categories
{cat_pie}
### Category Index
{chr(10).join(f"- [[{cat}]] ({count} drafts)" for cat, count in cat_counts.most_common())}
## Submissions Over Time
{timeline_chart}
## Top Rated Drafts
{top_table}
## Navigation
- **[[Categories/index|Categories]]** — Browse by topic
- **[[Authors/index|Authors]]** — Browse by author
- **[[Analysis/Score Distribution|Score Distribution]]** — Rating analytics
- **[[Analysis/Top Rated|Top Rated]]** — Highest-scored drafts
- **[[Analysis/Ideas Overview|Ideas]]** — Extracted technical ideas
- **[[Analysis/Glossary|Glossary]]** — Terms, abbreviations, and scoring methodology
"""
zf.writestr(f"{prefix}/Dashboard.md", dashboard)
# --- Individual Draft Notes ---
for d_obj in all_drafts_list:
name = d_obj.name
draft = draft_map.get(name, d_obj)
r = rating_map.get(name)
ideas = ideas_by_draft.get(name, [])
authors = author_drafts.get(name, [])
month = _extract_month(draft.time)
# Frontmatter
fm_lines = [
"---",
f'title: "{(draft.title or name).replace(chr(34), chr(39))}"',  # chr(34)='"', chr(39)="'": swap quotes so the YAML title stays valid
f"date: {draft.time or 'unknown'}",
f"rev: {draft.rev or '00'}",
]
if r:
fm_lines.append(f"score: {r.composite_score:.2f}")
fm_lines.append(f"novelty: {r.novelty}")
fm_lines.append(f"maturity: {r.maturity}")
fm_lines.append(f"overlap: {r.overlap}")
fm_lines.append(f"momentum: {r.momentum}")
fm_lines.append(f"relevance: {r.relevance}")
if r.categories:
fm_lines.append(f"categories: [{', '.join(r.categories)}]")
if authors:
fm_lines.append(f"authors: [{', '.join(a.replace(',', '') for a in authors)}]")
fm_lines.append(f"tags: [draft, ietf, {month}]")
fm_lines.append("---")
frontmatter = "\n".join(fm_lines)
# Body
body = f"\n# {draft.title or name}\n\n"
body += f"**{name}** | rev {draft.rev or '00'} | {draft.time or 'unknown'}\n\n"
if authors:
body += "## Authors\n\n"
body += "\n".join(f"- [[{a}]]" for a in authors) + "\n\n"
if r:
body += "## Rating\n\n"
body += f"**Composite Score: {r.composite_score:.2f}**\n\n"
body += "| Dimension | Score |\n|---|---|\n"
body += f"| Novelty | {_score_bar(r.novelty)} |\n"
body += f"| Maturity | {_score_bar(r.maturity)} |\n"
body += f"| Overlap | {_score_bar(r.overlap)} |\n"
body += f"| Momentum | {_score_bar(r.momentum)} |\n"
body += f"| Relevance | {_score_bar(r.relevance)} |\n\n"
if r.summary:
body += f"> {r.summary}\n\n"
if r.categories:
body += "**Categories:** " + ", ".join(f"[[{c}]]" for c in r.categories) + "\n\n"
if draft.abstract:
body += "## Abstract\n\n"
body += draft.abstract + "\n\n"
if ideas:
body += f"## Extracted Ideas ({len(ideas)})\n\n"
for idea in ideas:
novelty = f" `N:{idea.get('novelty_score', '?')}`" if idea.get("novelty_score") else ""
itype = f" *{idea.get('type', '')}*" if idea.get("type") else ""
body += f"- **{idea.get('title', 'Untitled')}**{itype}{novelty}\n"
if idea.get("description"):
body += f" {idea['description']}\n"
body += "\n"
body += "## Links\n\n"
body += f"- [View on IETF Datatracker](https://datatracker.ietf.org/doc/{name}/)\n"
if draft.rev:
body += f"- [Read Full Text](https://www.ietf.org/archive/id/{name}-{draft.rev}.txt)\n"
content = frontmatter + body
zf.writestr(f"{prefix}/Drafts/{_safe_filename(name)}.md", content)
# --- Author Notes ---
author_index_lines = [
"---\ntags: [index, authors]\n---\n",
"# Authors\n\n",
f"**{len(all_authors)}** authors contributing to AI/agent Internet-Drafts.\n\n",
"| Author | Affiliation | Drafts |\n|---|---|---|\n",
]
for name, aff, cnt, drafts in sorted(all_authors, key=lambda x: x[2], reverse=True):
author_index_lines.append(f"| [[{name}]] | {aff or ''} | {cnt} |\n")
zf.writestr(f"{prefix}/Authors/index.md", "".join(author_index_lines))
for name, aff, cnt, drafts in all_authors:
fm = f"---\ntags: [author]\naffiliation: \"{aff or ''}\"\ndraft_count: {cnt}\n---\n"
body = f"\n# {name}\n\n"
if aff:
body += f"**Affiliation:** {aff}\n\n"
body += f"## Drafts ({cnt})\n\n"
for dn in drafts:
d = draft_map.get(dn)
title = (d.title or dn) if d else dn
score = score_map.get(dn, "")
score_str = f" (score: {score:.2f})" if score else ""
body += f"- [[{dn}|{title}]]{score_str}\n"
# Co-authors
coauthors: Counter = Counter()
for dn in drafts:
for other in author_drafts.get(dn, []):
if other != name:
coauthors[other] += 1
if coauthors:
body += "\n## Co-authors\n\n"
for co, shared in coauthors.most_common(20):
body += f"- [[{co}]] ({shared} shared)\n"
zf.writestr(f"{prefix}/Authors/{_safe_filename(name)}.md", fm + body)
# --- Category Notes ---
cat_index_lines = [
"---\ntags: [index, categories]\n---\n",
"# Categories\n\n",
_mermaid_pie("Draft Distribution", dict(cat_counts.most_common(12))),
"\n\n",
]
for cat, count in cat_counts.most_common():
cat_index_lines.append(f"- [[{cat}]] — {count} drafts\n")
zf.writestr(f"{prefix}/Categories/index.md", "".join(cat_index_lines))
for cat, count in cat_counts.most_common():
fm = f"---\ntags: [category]\ndraft_count: {count}\n---\n"
body = f"\n# {cat}\n\n"
body += f"**{count} drafts** in this category.\n\n"
# Table of drafts sorted by score
draft_names = cat_drafts[cat]
scored = [(dn, score_map.get(dn, 0)) for dn in draft_names]
scored.sort(key=lambda x: x[1], reverse=True)
body += "| Draft | Score |\n|---|---|\n"
for dn, score in scored:
d = draft_map.get(dn)
title = (d.title or dn)[:60] if d else dn
body += f"| [[{dn}|{title}]] | {score:.2f} |\n"
zf.writestr(f"{prefix}/Categories/{_safe_filename(cat)}.md", fm + body)
# --- Analysis Notes ---
# Score Distribution
score_lines = [
"---\ntags: [analysis]\n---\n",
"\n# Score Distribution\n\n",
"Composite scores across all rated drafts (1.0-5.0 scale).\n\n",
]
# Mermaid bar chart of score buckets
buckets: dict[str, int] = defaultdict(int)
for _, r in pairs:
b = f"{r.composite_score:.1f}"
buckets[b] += 1
sorted_buckets = dict(sorted(buckets.items()))
if sorted_buckets:
labels = [f'"{k}"' for k in sorted_buckets.keys()]
values = [str(v) for v in sorted_buckets.values()]
score_lines.append(f"""```mermaid
xychart-beta
title "Score Distribution"
x-axis [{", ".join(labels)}]
y-axis "Count"
bar [{", ".join(values)}]
```\n\n""")
# Dimension averages
dims = {"Novelty": [], "Maturity": [], "Overlap": [], "Momentum": [], "Relevance": []}
for _, r in pairs:
dims["Novelty"].append(r.novelty)
dims["Maturity"].append(r.maturity)
dims["Overlap"].append(r.overlap)
dims["Momentum"].append(r.momentum)
dims["Relevance"].append(r.relevance)
score_lines.append("## Dimension Averages\n\n")
score_lines.append("| Dimension | Average | Min | Max |\n|---|---|---|---|\n")
for dim, vals in dims.items():
if vals:
avg = sum(vals) / len(vals)
score_lines.append(f"| {dim} | {avg:.2f} | {min(vals)} | {max(vals)} |\n")
zf.writestr(f"{prefix}/Analysis/Score Distribution.md", "".join(score_lines))
# Top Rated
top_lines = [
"---\ntags: [analysis]\n---\n",
"\n# Top Rated Drafts\n\n",
"Drafts ranked by composite score.\n\n",
"| # | Draft | Score | Novelty | Maturity | Overlap | Momentum | Relevance | Category |\n",
"|---|---|---|---|---|---|---|---|---|\n",
]
for i, (d, r) in enumerate(top_rated[:30], 1):
cat = r.categories[0] if r.categories else ""
top_lines.append(
f"| {i} | [[{d.name}|{(d.title or d.name)[:45]}]] | **{r.composite_score:.2f}** | "
f"{r.novelty} | {r.maturity} | {r.overlap} | {r.momentum} | {r.relevance} | {cat} |\n"
)
zf.writestr(f"{prefix}/Analysis/Top Rated.md", "".join(top_lines))
# Ideas Overview
type_counts = Counter(i.get("type", "other") or "other" for i in all_ideas)
ideas_lines = [
"---\ntags: [analysis, ideas]\n---\n",
"\n# Extracted Ideas\n\n",
f"**{len(all_ideas)}** technical ideas extracted from rated drafts.\n\n",
_mermaid_pie("Ideas by Type", dict(type_counts.most_common(10))),
"\n\n## By Type\n\n",
]
for itype, count in type_counts.most_common():
ideas_lines.append(f"- **{itype}**: {count} ideas\n")
ideas_lines.append("\n## Recent Ideas\n\n")
for idea in all_ideas[:50]:
dn = idea.get("draft_name", "")
novelty = f" `N:{idea.get('novelty_score')}`" if idea.get("novelty_score") else ""
ideas_lines.append(f"- **{idea.get('title', 'Untitled')}**{novelty} — [[{dn}]]\n")
if len(all_ideas) > 50:
ideas_lines.append(f"\n*...and {len(all_ideas) - 50} more. See individual draft notes.*\n")
zf.writestr(f"{prefix}/Analysis/Ideas Overview.md", "".join(ideas_lines))
# Timeline
timeline_lines = [
"---\ntags: [analysis, timeline]\n---\n",
"\n# Timeline\n\n",
"Draft submission activity over time.\n\n",
_mermaid_timeline_chart(dict(sorted(monthly.items()))),
"\n\n## Monthly Counts\n\n",
"| Month | Drafts |\n|---|---|\n",
]
for m in sorted(monthly.keys()):
timeline_lines.append(f"| {m} | {monthly[m]} |\n")
zf.writestr(f"{prefix}/Analysis/Timeline.md", "".join(timeline_lines))
# --- Glossary ---
glossary = """---
tags: [reference, glossary]
---
# Glossary
Reference for all terms, abbreviations, and scoring dimensions used in this vault.
## Scoring Dimensions
Each draft is rated by Claude AI on five dimensions, scored from 1 (lowest) to 5 (highest).
| Dimension | Description |
|---|---|
| **Novelty** | How original is this draft? Does it introduce new ideas, or rehash existing approaches? High = genuinely new contribution. |
| **Maturity** | How complete and well-developed is the specification? High = detailed protocol, clear data formats, ready for implementation. Low = early sketch or position paper. |
| **Overlap** | How much does this draft duplicate existing work? High overlap (5) = very similar to other drafts. Low overlap (1) = unique in the landscape. *Note: In composite score, this is inverted (5 - overlap) so lower overlap contributes positively.* |
| **Momentum** | Is this draft gaining traction? High = active revisions, working group adoption, multiple authors/organizations. Low = single submission, no updates. |
| **Relevance** | How relevant is this draft to AI agent infrastructure? High = directly addresses agent-to-agent communication, identity, authorization. Low = tangentially related. |
## Composite Score
The **composite score** (1.0-5.0) is calculated as:
```
score = (novelty + maturity + (5 - overlap) + momentum + relevance) / 5
```
Overlap is inverted because a *lower* overlap is better (more unique).
## Score Bars
Score bars visualize ratings: `\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2591\u2591\u2591` = 3.5/5.0
- `\u2588` (filled) = earned score
- `\u2591` (empty) = remaining
## Other Terms
| Term | Meaning |
|---|---|
| **Draft / I-D** | Internet-Draft — a working document submitted to the IETF. Not yet an RFC (standard). |
| **RFC** | Request for Comments — a published IETF standard or informational document. |
| **Working Group (WG)** | An IETF group chartered to work on a specific topic (e.g., WIMSE, OAuth). |
| **Category** | Topic classification assigned by Claude during analysis (e.g., "A2A protocols", "AI safety/alignment"). A draft can belong to multiple categories. |
| **Idea** | A distinct technical concept extracted from a draft by Claude. Each idea has a type (protocol, mechanism, framework, etc.) and a novelty score. |
| **Novelty Score (N:1-5)** | Per-idea originality rating. Shown as `N:4` next to ideas. 5 = completely new concept, 1 = well-known approach. |
| **Gap** | An area identified where no existing draft adequately addresses a need in the AI agent ecosystem. |
| **Affiliation** | The organization an author is associated with (from IETF Datatracker records). |
| **Co-authorship** | Two authors who appear together on at least one draft. |
| **Datatracker** | The IETF's official system for tracking Internet-Drafts, RFCs, and working groups (datatracker.ietf.org). |
"""
zf.writestr(f"{prefix}/Analysis/Glossary.md", glossary)
# --- .obsidian settings for graph colors ---
graph_json = """{
"collapse-filter": false,
"search": "",
"showTags": true,
"showAttachments": false,
"hideUnresolved": false,
"showOrphans": true,
"collapse-color-groups": false,
"colorGroups": [
{"query": "path:Drafts", "color": {"a": 1, "rgb": 3444735}},
{"query": "path:Authors", "color": {"a": 1, "rgb": 10092441}},
{"query": "path:Categories", "color": {"a": 1, "rgb": 16744448}},
{"query": "path:Analysis", "color": {"a": 1, "rgb": 2293541}}
],
"collapse-display": false,
"showArrow": true,
"textFadeMultiplier": 0,
"nodeSizeMultiplier": 1.2,
"lineSizeMultiplier": 1,
"collapse-forces": true,
"centerStrength": 0.5,
"repelStrength": 10,
"linkStrength": 1,
"linkDistance": 100
}"""
zf.writestr(f"{prefix}/.obsidian/graph.json", graph_json)
buf.seek(0)
return buf.getvalue()
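The BytesIO-plus-ZipFile construction that `build_obsidian_vault` uses round-trips as plain bytes. A minimal standalone sketch with toy vault contents:

```python
import io
import zipfile

# Same construction as build_obsidian_vault, on toy content:
# write markdown files into an in-memory ZIP, then hand back the bytes.
buf = io.BytesIO()
prefix = "Toy-Vault"
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(f"{prefix}/Dashboard.md", "# Dashboard\n\n[[draft-example]]\n")
    zf.writestr(f"{prefix}/Drafts/draft-example.md", "---\nscore: 4.20\n---\n# Example\n")
buf.seek(0)
data = buf.getvalue()

# A consumer (e.g. a download route) receives plain bytes
# and can re-open them as a valid ZIP.
z = zipfile.ZipFile(io.BytesIO(data))
names = z.namelist()
assert z.testzip() is None  # no corrupt members
```

Building the archive in memory keeps the export stateless: nothing is written to disk, so concurrent downloads cannot collide.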

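A quick standalone check of the composite-score formula documented in the Glossary note (overlap inverted so uniqueness scores higher); the function name here is illustrative, not part of the codebase:

```python
def composite_score(novelty, maturity, overlap, momentum, relevance):
    # Overlap is inverted: a lower overlap (more unique draft) raises the score.
    return (novelty + maturity + (5 - overlap) + momentum + relevance) / 5

s = composite_score(novelty=4, maturity=3, overlap=2, momentum=3, relevance=5)
# (4 + 3 + 3 + 3 + 5) / 5 = 3.6
```

A maximally unique, mature, relevant draft (all 5s, overlap 1) scores 4.8, not 5.0, because the inverted overlap term tops out at 4.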
tests/test_obsidian_export.py Normal file

@@ -0,0 +1,200 @@
"""Tests for the Obsidian vault export.
If this test breaks, the export is out of sync with the data model.
Fix obsidian_export.py to match whatever changed.
"""
from __future__ import annotations
import io
import sys
import zipfile
from pathlib import Path
import pytest
_project_root = Path(__file__).resolve().parent.parent
if str(_project_root / "src") not in sys.path:
sys.path.insert(0, str(_project_root / "src"))
from webui.obsidian_export import build_obsidian_vault
def test_vault_structure(seeded_db):
"""Vault ZIP should contain expected folders and key files."""
data = build_obsidian_vault(seeded_db)
assert len(data) > 0
z = zipfile.ZipFile(io.BytesIO(data))
names = z.namelist()
# Key structural files must exist
assert "IETF-AI-Agent-Drafts/Dashboard.md" in names
assert "IETF-AI-Agent-Drafts/Authors/index.md" in names
assert "IETF-AI-Agent-Drafts/Categories/index.md" in names
assert "IETF-AI-Agent-Drafts/.obsidian/graph.json" in names
# Should have analysis notes
analysis = [n for n in names if "/Analysis/" in n]
assert len(analysis) >= 3 # Score Distribution, Top Rated, Ideas Overview
def test_vault_has_all_drafts(seeded_db):
"""Every draft in the DB should have a corresponding note in the vault."""
data = build_obsidian_vault(seeded_db)
z = zipfile.ZipFile(io.BytesIO(data))
draft_files = [n for n in z.namelist() if "/Drafts/" in n]
# seeded_db has 5 drafts
assert len(draft_files) == 5
# Check each draft name appears
draft_names = {Path(f).stem for f in draft_files}
assert "draft-alpha-agent-comm" in draft_names
assert "draft-gamma-agent-id" in draft_names
def test_draft_note_has_frontmatter(seeded_db):
"""Draft notes must have YAML frontmatter with score and categories."""
data = build_obsidian_vault(seeded_db)
z = zipfile.ZipFile(io.BytesIO(data))
content = z.read("IETF-AI-Agent-Drafts/Drafts/draft-alpha-agent-comm.md").decode()
# YAML frontmatter
assert content.startswith("---")
assert "score:" in content
assert "novelty:" in content
assert "maturity:" in content
assert "categories:" in content
assert "tags:" in content
# No floating-point noise (e.g., 3.4000000000000004)
import re
long_floats = re.findall(r"\d+\.\d{4,}", content)
assert len(long_floats) == 0, f"Unformatted floats found: {long_floats}"
def test_draft_note_has_wikilinks(seeded_db):
"""Draft notes should link to authors and categories with [[wikilinks]]."""
data = build_obsidian_vault(seeded_db)
z = zipfile.ZipFile(io.BytesIO(data))
content = z.read("IETF-AI-Agent-Drafts/Drafts/draft-alpha-agent-comm.md").decode()
# Should link to authors
assert "[[Alice Researcher]]" in content
assert "[[Bob Engineer]]" in content
# Should link to categories
assert "[[A2A protocols]]" in content
def test_draft_note_has_ideas(seeded_db):
"""Draft notes should include extracted ideas."""
data = build_obsidian_vault(seeded_db)
z = zipfile.ZipFile(io.BytesIO(data))
content = z.read("IETF-AI-Agent-Drafts/Drafts/draft-alpha-agent-comm.md").decode()
assert "Extracted Ideas" in content
assert "Agent Handshake" in content
assert "Capability Negotiation" in content
def test_draft_note_has_rating_bars(seeded_db):
"""Draft notes should include visual score bars."""
data = build_obsidian_vault(seeded_db)
z = zipfile.ZipFile(io.BytesIO(data))
content = z.read("IETF-AI-Agent-Drafts/Drafts/draft-alpha-agent-comm.md").decode()
# Score bars use block chars
assert "\u2588" in content # filled block
assert "\u2591" in content # empty block
assert "/5.0" in content
def test_author_notes(seeded_db):
"""Author notes should list their drafts with wikilinks."""
data = build_obsidian_vault(seeded_db)
z = zipfile.ZipFile(io.BytesIO(data))
content = z.read("IETF-AI-Agent-Drafts/Authors/Alice Researcher.md").decode()
assert content.startswith("---")
assert "affiliation:" in content
assert "ExampleCorp" in content
assert "[[draft-alpha-agent-comm" in content
assert "[[draft-gamma-agent-id" in content
def test_category_notes(seeded_db):
"""Category notes should list drafts with scores."""
data = build_obsidian_vault(seeded_db)
z = zipfile.ZipFile(io.BytesIO(data))
cat_files = [n for n in z.namelist() if "/Categories/" in n and "index" not in n]
# seeded_db has 5 distinct categories, so at least 4 category notes are expected
assert len(cat_files) >= 4
# Check one category note
content = z.read("IETF-AI-Agent-Drafts/Categories/A2A protocols.md").decode()
assert "[[draft-alpha-agent-comm" in content
assert "draft_count:" in content
def test_dashboard_has_mermaid(seeded_db):
"""Dashboard should contain Mermaid chart blocks."""
data = build_obsidian_vault(seeded_db)
z = zipfile.ZipFile(io.BytesIO(data))
content = z.read("IETF-AI-Agent-Drafts/Dashboard.md").decode()
assert "```mermaid" in content
assert "pie title" in content
assert "Key Stats" in content
assert "Total Drafts" in content
def test_vault_has_glossary(seeded_db):
"""Vault should contain a Glossary with scoring dimensions explained."""
data = build_obsidian_vault(seeded_db)
z = zipfile.ZipFile(io.BytesIO(data))
assert "IETF-AI-Agent-Drafts/Analysis/Glossary.md" in z.namelist()
content = z.read("IETF-AI-Agent-Drafts/Analysis/Glossary.md").decode()
# All five dimensions must be explained
for dim in ("Novelty", "Maturity", "Overlap", "Momentum", "Relevance"):
assert dim in content, f"Glossary missing dimension: {dim}"
assert "Composite Score" in content
assert "Internet-Draft" in content
def test_top_rated_uses_full_names(seeded_db):
"""Top Rated table should use full dimension names, not abbreviations."""
data = build_obsidian_vault(seeded_db)
z = zipfile.ZipFile(io.BytesIO(data))
content = z.read("IETF-AI-Agent-Drafts/Analysis/Top Rated.md").decode()
assert "Novelty" in content
assert "Maturity" in content
assert "| Nov |" not in content # no abbreviations
def test_vault_is_valid_zip(seeded_db):
"""The output should be a valid ZIP that can be extracted."""
data = build_obsidian_vault(seeded_db)
z = zipfile.ZipFile(io.BytesIO(data))
# Should not raise
bad = z.testzip()
assert bad is None, f"Corrupt file in ZIP: {bad}"
# All files should be decodable as UTF-8
for name in z.namelist():
if name.endswith(".md"):
z.read(name).decode("utf-8")