Files
ietf-draft-analyzer/data/reports/dev-journal.md
Christian Nennemann f1a0b0264c Fix blog accuracy and add methodology documentation
Blog posts (all 10 files updated):
- Update all counts to match DB: 434 drafts, 557 authors, 419 ideas, 11 gaps
- Fix EU AI Act timeline to August 2026 (5 months, not 18)
- Reframe growth claim from "36x" to actual monthly figures (5→61→85)
- Add safety ratio nuance (1.5:1 to 21:1 monthly variation)
- Fix composite scores (4.8→4.75, 4.6→4.5)
- Add OAuth/GDPR consent distinction (Art. 6(1)(a), Art. 28)
- Add EU AI Act Annex III + MDR context to hospital scenario
- Add FIPA, IEEE P3394, eIDAS 2.0 references
- Add GDPR gap paragraph (DPIA, erasure, portability, purpose limitation)
- Rewrite Post 04 gap table to match actual DB gap names

Methodology:
- Expand methodology.md: pipeline docs, limitations, related work
- Add LLM-as-judge caveats and explicit rating rubric to analyzer.py
- Add clustering threshold rationale to embeddings.py
- Add gap analysis grounding notes to analyzer.py
- Add Limitations section to Post 07

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 11:04:40 +01:00

563 lines
80 KiB
Markdown

# Development Journal — IETF Draft Analyzer
*This journal tracks development milestones across sessions. It serves as source material for the meta blog post about using Claude agent teams to build this analysis.*
---
### 2026-03-08 WRITER/EDITOR — Factual Accuracy Pass Across All Blog Posts
**What**: Comprehensive factual accuracy fix across all 10 blog series files (posts 00-08 plus state-of-ecosystem), driven by three review documents (review-statistics.md, review-legal.md, review-science.md). Key changes:
1. **Draft count**: Updated all references from 361 to 434 (current DB count) across all posts.
2. **Gap count**: Changed from 12 to 11 everywhere; rewrote Post 04's gap table to match actual DB gap names and severities (2 critical, 5 high, 4 medium).
3. **Composite scores**: Fixed inflated scores (4.8 -> 4.75, 4.6 -> 4.5) everywhere; documented scoring as "4-dimension composite excluding overlap" and average as 3.27.
4. **Ideas count**: Added caveats explaining 419 (current DB) vs ~1,780 (earlier run) discrepancy; reframed Post 05 with data provenance note.
5. **Safety ratio nuance**: Changed flat "4:1" claims to "roughly 4:1 on aggregate, varying from 1.5:1 to 21:1 by month" throughout.
6. **Growth claim**: Removed cherry-picked "36x" multiplier; replaced with "rapid growth" framing using actual DB monthly figures.
7. **EU AI Act timeline**: Fixed Post 06's "within 18 months" to "within 5 months (August 2026)" with full enforcement details, penalty amounts, and article references.
8. **OAuth/GDPR distinction**: Added paragraph to Post 03 distinguishing OAuth consent from GDPR Einwilligung, noting controller-processor implications under Art. 28.
9. **Hospital scenario**: Added acknowledgment in Post 04 that this is already regulated under EU AI Act Annex III and Medical Devices Regulation.
10. **GDPR gap**: Added paragraph to Post 04 identifying GDPR-mandated capabilities (DPIA, right to erasure, data portability, purpose limitation) as a missing dimension in the gap analysis.
11. **Missing references**: Added FIPA, IEEE P3394, eIDAS 2.0 references where they naturally strengthen arguments (Posts 04, 05).
12. **Category counts**: Updated all category figures to match current DB (A2A: 155, identity: 152, data formats: 174, safety: 47, human-agent: 34, etc.).
13. **Huawei stats**: Corrected from "66 drafts, 18%" to "69 drafts, ~16%" with entity consolidation note.
14. **WG adoption**: Updated from "36 (10%)" to "52 (12%)" with corrected average scores (3.61 vs 3.23).
**Why**: Three independent reviews identified stale numbers, score inflation, missing regulatory context, and misleading single-ratio claims as the top credibility risks before publication.
**Result**: All 10 blog series files updated. Voice and style preserved. No structural rewrites beyond Post 04's gap table (which needed to match DB reality).
---
### 2026-03-08 CODER — Data Integrity Fixes from Statistical & Scientific Reviews
**What**: Fixed data integrity issues identified in `review-statistics.md` and `review-science.md`:
1. **Category normalization**: Updated 21 ratings rows with legacy long-form category names (e.g., "Agent-to-agent communication protocols") to canonical short forms (e.g., "A2A protocols"). All 11 categories now consistent in the database.
2. **False positive flagging**: Added `false_positive` column to ratings table. Flagged 73 drafts as false positives (38 with relevance <= 2, 35 manually reviewed at relevance 3+ that are clearly not AI-agent related — e.g., HPKE, cookies, BGP, EDHOC). Notable: excluding false positives yields exactly 361 relevant drafts.
3. **Schema update**: Updated `db.py` schema definition and migration code to include `false_positive` column.
4. **Verified counts document**: Created `data/reports/reviews/verified-counts.md` as single source of truth — documents all actual counts (434 drafts, 419 ideas, 11 gaps, 557 authors) with explanations for discrepancies.
5. **Gap count confirmed**: 11 gaps in DB, not 12. Blog posts use an editorially rewritten gap list with different names and an extra gap.
6. **Ideas count explained**: DB has 419 (post-dedup, 89% of drafts have exactly 1 idea). The 1,780 figure was pre-dedup. The 1,262 figure was from a smaller corpus.
**Why**: Reviews identified critical data integrity issues that would undermine credibility if published — inconsistent category names affecting counts by 5-15%, no mechanism to exclude false positives, and conflicting counts across all reports.
**Result**: Database now has clean categories, false positive flags, and a verified-counts reference document. The coincidence that 434 - 73 false positives = 361 exactly matches the original blog series count.
---
### 2026-03-08 CODER — Fix Security & Code Quality Issues from Dev Review
**What**: Applied 7 targeted fixes from `data/reports/reviews/review-dev.md`:
1. SQL injection in `db.py:update_generation_run` — added column name whitelist validation
2. Flask SECRET_KEY — changed from hardcoded string to `os.environ.get('FLASK_SECRET_KEY', os.urandom(24).hex())`
3. Version string — updated CLI from "0.1.0" to "0.2.0"
4. JSON extraction — `_extract_json` now handles trailing whitespace after code fences via `.rstrip()`
5. Ollama client lifecycle — added `close()`, `__enter__`, `__exit__` to `Embedder` class
6. LLM rating bounds — added `_clamp_rating()` method clamping all rating fields to 1-10 integers in `_parse_rating`
7. Hardcoded matrix size — replaced "260x260" with dynamic `{n_drafts}x{n_drafts}` from actual DB count
**Why**: Dev reviewer flagged these as critical (SQL injection), high (SECRET_KEY), and medium priority issues
**Result**: All 7 fixes applied with minimal targeted edits. No refactoring beyond what was needed.
---
### 2026-03-08 CODER — Methodology Documentation and Scientific Rigor Fixes
**What**: Addressed methodology and scientific rigor issues raised by the science and statistics reviews. Five deliverables:
1. Added 35-line methodology comment block to `analyzer.py` documenting LLM-as-judge limitations (abstract-only, no calibration, no consistency check, overlap score limitation, batch effects, relevance inflation). Updated the rating prompt (`RATE_PROMPT_COMPACT`) with an explicit rubric defining what each score level means for each dimension.
2. Created `data/reports/methodology.md` — comprehensive methodology document covering data collection (keywords, API, selection bias), analysis pipeline (all 6 stages), rating rubric with scale interpretation, clustering method and threshold justification, gap analysis limitations, embedding model properties, known limitations table, and related work references.
3. Added 20-line docstring to `find_clusters()` in `embeddings.py` documenting the 0.85 threshold as an empirical choice with manual inspection rationale, noting that sensitivity analysis would strengthen confidence.
4. Added 22-line comment block above `GAP_ANALYSIS_PROMPT` in `analyzer.py` documenting it as single-shot LLM analysis, noting the absence of reference architecture grounding, and listing strengthening options.
5. Added methodology caveat notes to blog posts 01 (gold-rush), 03 (oauth-wars), 06 (big-picture), and 07 (how-we-built-this, full Limitations section added). Each note explains ratings are LLM-generated from abstracts without human calibration.
6. Added related work section to methodology.md covering FIPA, IEEE P3394, W3C WoT, academic MAS research (AAMAS/JAIR/JAAMAS), and other standards bodies (OASIS, ITU-T, ETSI).
**Why**: Scientific and statistical reviews identified LLM-as-judge limitations, unjustified thresholds, missing related work, and ungrounded gap analysis as the top methodological weaknesses. These caveats are needed before publication.
**Result**: 6 files modified (`analyzer.py`, `embeddings.py`, 4 blog posts), 1 file created (`methodology.md`). All changes are documentation/caveats — no pipeline restructuring.
---
### 2026-03-08 STATISTICS REVIEWER — Full Statistical Audit of Blog Series
**What**: Audited all 10 blog posts, 9 data packages, master stats, and key reports against the actual database (`data/drafts.db`) using sqlite3 queries. Produced comprehensive statistical review at `data/reports/reviews/review-statistics.md`.
**Why**: The blog series makes extensive quantitative claims (361 drafts, 1,780 ideas, 12 gaps, 4:1 safety ratio, 36x growth, etc.) that needed cross-checking against the ground truth database before publication.
**Result**: Found 3 critical issues, 4 important issues, and 4 minor issues. Most serious: the ideas table has 419 rows (not 1,780 as claimed), the database now has 434 drafts (not 361), gaps are 11 (not 12), and composite scores are inflated by 0.05-0.10 through rounding. The 4:1 safety ratio varies from 1.5:1 to 21:1 by month. The "36x growth" figure cherry-picks endpoints. Qualitative patterns (Huawei dominance, safety deficit, fragmentation) hold directionally. RFC cross-refs (4,231), author count (557), and draft-author links (1,057) match exactly.
**Surprise**: The ideas count mismatch (419 vs 1,780) is the most serious finding -- Post 5's entire thesis about "96% of ideas in one draft" and "628 cross-org convergent ideas" is not reproducible from the current database. The pipeline may have been re-run with different parameters, overwriting the original idea extraction.
---
### 2026-03-08 LEGAL REVIEWER — Full Legal Review of Blog Series and Reports
**What**: Reviewed all 10 blog series files (Posts 00-08 plus state-of-ecosystem) and key reports (gaps.md, overview.md) through a German/EU internet law lens. Produced comprehensive legal review covering GDPR, EU AI Act, eIDAS 2.0, NIS2, CRA, product liability, and IETF IPR policy.
**Why**: The series makes claims about safety gaps, identity/auth protocols, and regulatory predictions without adequately engaging the EU regulatory framework -- which is not future speculation but current law with imminent enforcement deadlines (AI Act fully applicable August 2026).
**Result**: Review written to `data/reports/reviews/review-legal.md`. Found 3 critical issues (consent terminology conflation, hospital scenario understating regulatory reality, GDPR omission from gap analysis), 5 regulatory gaps (AI Act needs structural treatment not just a prediction, eIDAS 2.0 missing from identity discussion, NIS2/CRA unaddressed, German TKG context absent), 5 improvement suggestions, and per-post notes for all 10 files. Top priority: Post 6's AI Act enforcement timeline is wrong (says "18 months" but enforcement begins in 5 months).
**Surprise**: The series' best architectural proposal -- assurance profiles L0-L3 -- maps remarkably well to the AI Act's risk-based approach, but the connection is never made explicit. Making it explicit would strengthen both the regulatory argument and the technical proposal.
---
### 2026-03-08 REVIEWER-DEV — Full Codebase Engineering Review
**What**: Comprehensive code review of all core modules (`db.py`, `analyzer.py`, `cli.py`, `fetcher.py`, `embeddings.py`, `authors.py`, `models.py`, `config.py`, `draftgen.py`, `search.py`, `readiness.py`), web UI (`app.py`, `data.py`, `auth.py`), and scripts. Reviewed ~5000 lines of application code and ~2000 lines of web data layer.
**Why**: Pre-deployment quality gate. The tool has grown from a simple CLI to a full web dashboard with API endpoints, and the security/quality bar needs to rise accordingly.
**Result**: Review written to `data/reports/reviews/review-dev.md`. Found 1 critical issue (SQL injection in `update_generation_run`), 1 high issue (hardcoded Flask SECRET_KEY), 5 bugs, 6 performance concerns, and 14 improvement suggestions. Overall grade: B+ -- solid architecture, needs hardening. Key positives: clean separation of concerns, effective LLM caching, good auth design, proper FTS5 sync triggers.
**Surprise**: The `cli.py` file has grown to 2995 lines with ~40 repetitions of the same config/db boilerplate pattern. Also, zero test coverage for the analysis pipeline (`analyzer.py`, `embeddings.py`, `fetcher.py`) despite it being the core of the tool.
---
### 2026-03-08 REVIEWER (Science) — Full Scientific Review of Methodology and Outputs
**What**: Conducted comprehensive scientific review of the entire analysis pipeline, database integrity, reports, and blog posts. Reviewed analyzer.py (rating/idea/gap prompts), embeddings.py (clustering), fetcher.py (data collection), config.py, and all reports/blog posts. Queried database directly for integrity checks.
**Why**: The analysis makes strong claims (4:1 safety deficit, 12 gaps, 1262 ideas, 9.3% of IETF submissions) that need to withstand scrutiny from IETF participants, academic reviewers, and standards experts. Several methodological weaknesses and data inconsistencies were found that could undermine credibility if not addressed.
**Result**: Wrote detailed review to `data/reports/reviews/review-science.md` with 8 sections covering methodology, unsupported claims, missing context, data integrity, improvements, taxonomy, and post-by-post notes. Key findings:
- **CRITICAL**: Ideas database has 419 entries but blog posts reference 1,262-1,780. Major data inconsistency.
- **CRITICAL**: LLM ratings have no human calibration. No inter-rater reliability measurement.
- **HIGH**: 55 non-canonical category names in ratings table (normalization not applied to stored data).
- **HIGH**: ~30-50 false positive drafts in corpus (e.g., HPKE, PIE bufferbloat rated relevance 5 and 3).
- **HIGH**: Missing related work context (FIPA, IEEE P3394, academic MAS research).
- **MEDIUM**: Greedy single-linkage clustering at unjustified 0.85 threshold.
- Database grew from 361 to 434 drafts but all reports/blogs still cite 361.
- 10 prioritized recommendations provided, from calibration study to reference architecture.
**Surprise**: The ideas count discrepancy (419 vs 1,780) is dramatic -- either mass dedup removed 75%+ of ideas, or the database was regenerated. Either way, Post 05 ("1,262 Ideas") needs a full rewrite. Also, `draft-ietf-hpke-hpke` (generic public key encryption, nothing to do with AI agents) is rated relevance=5, showing the LLM judge is too generous with keyword-matched drafts.
**Cost**: Zero API cost (review only, no pipeline runs). Approximately 90 minutes of analysis time.
### 2026-03-07 CODER C — Citation Graph, Readiness Scoring, Annotations, Data Surfacing
**What**: Implemented four features in a single session:
1. **Citation Graph Visualization** (`/citations`): D3.js force-directed graph showing cross-references between drafts and RFCs. Nodes colored by type (blue=draft, orange=RFC), sized by influence (in-degree). Includes category filter, min-refs slider, hover tooltips, click-to-navigate, and a top-referenced RFCs table. New `get_citation_graph()` in data.py, route + API endpoint in app.py.
2. **Standards Readiness Scoring**: New `readiness.py` module computing a 0-100 composite score from 6 weighted factors (WG adoption 25%, revision maturity 15%, reference density 15%, cited-by count 15%, author experience 15%, momentum rating 15%). Displayed as a progress gauge on draft detail pages, added as sortable column on drafts listing, and shown in `ietf show` CLI output.
3. **Annotation System**: New `annotations` table in DB schema with `upsert_annotation`, `get_annotation`, `get_all_annotations`, `search_by_tag` methods. New `ietf annotate` CLI command with `--note`, `--tag`, `--remove-tag` options. Web UI: inline note editor + tag chips with add/remove on draft detail page, backed by POST `/api/drafts/<name>/annotate` endpoint.
4. **Surface Underutilized Data**: Exposed `novelty_score` (from pipeline/quality.py) in ideas.html and draft_detail.html as color-coded N:X badges. Gap severity now sorts critical-first (was alphabetical). `all_ideas()` and `get_ideas_for_draft()` now return `novelty_score` field.
**Why**: These features leverage existing data (4231 refs, novelty scores, severity) that was computed but never surfaced to users. Readiness scoring gives a quick at-a-glance RFC proximity signal. Annotations enable user workflow.
**Result**: 8 files modified (db.py, data.py, app.py, cli.py, base.html, draft_detail.html, ideas.html, drafts.html, gaps.html), 2 files created (readiness.py, citations.html). Citations link added to sidebar nav.
---
### 2026-03-06 CODER — Interactive D3.js Author Network Visualization
**What**: Replaced the Plotly spring-layout co-authorship graph on `/authors` with a full D3.js v7 force-directed network. Added enriched data layer (`get_author_network_full`) with avg draft scores per author, connected-component cluster detection (68 clusters found), and a new `/api/authors/network` JSON endpoint. Template now includes: interactive D3 force graph with zoom/pan/drag, org filter dropdown, cluster highlighting with zoom-to-fit, hover tooltips showing author details + draft list, click-to-navigate, plus the existing Plotly org bar chart, cross-org collaboration chart, sortable authors table (now top 50), and org stats sidebar.
**Why**: The Plotly spring layout was static and limited. D3 force simulation gives true interactivity -- draggable nodes, smooth zoom, hover/click interactions -- which makes the collaboration patterns much more explorable. Cluster detection reveals the structure (one giant 165-member cluster dominated by Huawei/China Telecom, plus 67 smaller groups).
**Result**: 498 nodes, 1142 edges, 68 clusters rendered interactively. Org color coding, size-by-draft-count, label-on-hover all working. Three files changed: `src/webui/data.py` (new `get_author_network_full`), `src/webui/app.py` (updated route + new API endpoint), `src/webui/templates/authors.html` (full rewrite with D3).
---
### 2026-02-28 — v0.2.0 Release
**What**: Built full analysis pipeline — fetch, analyze, rate, embed, ideas, gaps, visualize, report. 13 CLI commands, 13+ visualizations, 11 report types.
**Why**: Needed a systematic way to map the exploding IETF AI/agent landscape.
**Result**: 260 drafts analyzed, 403 authors mapped, 1,262 ideas extracted, 12 gaps identified. Total cost ~$3.16 on Claude Sonnet.
**Surprise**: The 4:1 safety deficit — only 36 of 260 drafts address safety/alignment while 92+ focus on A2A protocols.
### 2026-03-01 — Full Pipeline Run
**What**: Ran complete pipeline on 260 drafts — ratings, embeddings, ideas, gap analysis, all visualizations and reports.
**Result**: All 11 reports generated. First blog post draft written ("The IETF's AI Agent Gold Rush"). arXiv paper drafted (13 pages).
### 2026-03-03 — Keyword Expansion
**What**: Added 6 new search keywords (mcp, agentic, inference, generative, intelligent, aipref). Fetched 101 new drafts, bringing total to 361.
**Why**: Original 6 keywords missed important categories — MCP-related drafts, generative AI infrastructure, intelligent networking.
**Result**: 361 drafts total. New drafts not yet processed (need analyze, authors, ideas, embed, gaps).
### 2026-03-03 — Agent Team Design
**What**: Designed a 4-agent team (Architect, Analyst, Coder, Writer) to elevate the project. Created team prompt, 4 agent definitions, CLAUDE.md with journaling requirements, and a 7+1 blog series plan.
**Why**: The project had powerful data but fragmented output — individual reports, one blog draft, an unfinished paper. Needed a coordinated effort to draw the big picture and tell a coherent story.
**Result**:
- `scripts/agent-team-prompt.md` — 3000-word team brief
- `.claude/agents/{architect,analyst,coder,writer}.md` — 4 agent definitions
- `CLAUDE.md` — project instructions with dev journal requirement
- Blog series planned: 7 analysis posts + 1 meta post about the process
**Surprise**: The act of designing the team prompt forced clarity on what the project's thesis actually is: "The IETF is building the highways before the traffic lights."
### 2026-03-03 Architect — Narrative Arc Design
**What**: Read all 10 reports (holistic-agent-ecosystem, draft-family-consistency, gaps, blog-post, overview, authors, team-blocs, ideas, landscape, overlap-clusters) and designed the narrative arc for the 7-post blog series.
**Why**: The project had rich data but no story structure. Needed to find the meta-narrative beyond "lots of drafts" and design a series that builds tension, reaches a climax, and delivers a resolution.
**Result**: Wrote `data/reports/blog-series/00-series-overview.md` — a combined architectural design document and reader-facing introduction. Key decisions:
- Three-act structure: Gold Rush (Posts 1-2), Fragmentation (Posts 3-4), Path Forward (Posts 5-6), plus methodology coda (Post 7)
- Post 4 (gaps) is the climax — the most important and consequential finding
- Post 6 (big picture) is the resolution — connects the 5-draft ecosystem proposal to the gaps
- Five recurring motifs: 4:1 safety deficit, highway/traffic light metaphor, fragmentation vs architecture, concentration/absence, islands problem
- Per-post data requirements table for the Analyst
- Missing analyses list for the Coder: category trends, RFC cross-refs, cross-org idea overlap, WG adoption tracking
**Surprise**: The strongest narrative thread turned out to be "coordination difficulty correlates with gap severity" — the critical gaps are exactly the ones that require cross-team consensus, which the island/bloc structure cannot produce. This reframes the safety deficit from a quantity problem to a structural problem.
### 2026-03-03 Architect — State of the Agent Ecosystem Vision Document
**What**: Wrote the ~2,000-word "State of the IETF AI Agent Ecosystem" vision document at `data/reports/blog-series/state-of-ecosystem.md`. Five sections: current state, missing architecture, three 2027 scenarios, five builder actions, 2028 endgame.
**Why**: Post 6 (the capstone) needs a substantive vision to draw from. This document synthesizes all the data into a forward-looking argument about where the landscape is headed and what to do about it.
**Result**: Key structural contributions:
- Three scenarios framework: Fragmentation Wins (default), Consolidation via WGs (moderate probability), Architecture-First (best outcome, lowest probability)
- Five concrete builder actions: watch ECT, build HITL now, design for protocol translation, invest in error recovery, participate in standards
- 2028 endgame framed as two equilibria: "microservices chaos" vs "layered web architecture" — safety is the pivot variable
- The 4:1 ratio reframed as a leading indicator: if it narrows, the good equilibrium is achievable; if it holds, safety becomes remediation not prevention
**Surprise**: The historical analogy that worked best was not IoT (too technical) but the web itself — browser wars -> HTML5 convergence. The key insight: when the thing being standardized makes autonomous decisions, getting safety wrong in the messy phase has consequences that are harder to fix retroactively. This became the emotional core of the endgame section.
### 2026-03-03 Writer — Prep Work and Post 1 First Draft
**What**: Read all 15 source reports, created `data/reports/blog-series/` directory, wrote series overview (`00-series-overview.md`), and drafted Post 1 ("The IETF's AI Agent Gold Rush") at ~2,200 words.
**Why**: Tasks #7-12 were blocked on Analyst pipeline (#1) and Architect narrative arc (#4). Used waiting time productively to get Post 1 ready for review. Based on existing 260-draft data from blog-post.md, expanded and restructured for the series format.
**Result**:
- `data/reports/blog-series/00-series-overview.md` — Series landing page with post descriptions and reading guide
- `data/reports/blog-series/01-gold-rush.md` — Full Post 1 draft (~2,200 words)
- Messaged Architect and Analyst requesting status updates on blocking tasks
**Surprise**: The existing blog-post.md was already strong but read like a report, not a story. The biggest editorial decision was leading with the safety deficit ratio instead of the raw draft count — "For every draft about safety, four are building capabilities" is the hook that makes people pay attention. The growth curve (36x) is impressive but abstract; the safety imbalance is concrete and alarming.
**Cost**: No API calls — pure writing from existing data.
### 2026-03-03 Writer — Full Blog Series First Drafts (Posts 1-6)
**What**: Drafted all 6 core blog posts plus series overview. Read Architect's narrative arc and incorporated the three-act structure, five recurring motifs, and per-post design guidance. Read Coder's new reports (refs, idea-overlap) for additional data points.
**Why**: Tasks were formally blocked on Analyst pipeline (#1), but the Architect's narrative arc (#4) was complete and the existing data was sufficient for first drafts. Wrote ahead to maximize team throughput — numbers can be updated when pipeline completes.
**Result**:
- `01-gold-rush.md` — Post 1: ~2,200 words. Lead with 4:1 safety ratio. Growth curve, category breakdown, top drafts, author landscape.
- `02-who-writes-the-rules.md` — Post 2: ~2,500 words. Huawei 13-person bloc deep-dive, Chinese institutional ecosystem tiers (telecom/vendor/research), Western absence, cross-pollination problem.
- `03-oauth-wars.md` — Post 3: ~2,300 words. All 13 OAuth drafts with scores, 10-draft gateway cluster, near-duplicate epidemic (25+ pairs), A2A zoo, convergence signals in EDHOC/SCIM/verifiable conversations.
- `04-what-nobody-builds.md` — Post 4 (THE CLIMAX): ~2,400 words. All 12 gaps with idea counts. Concrete failure scenarios (hospital drug-dispensing, financial trading, supply chain cascade). Architect's key insight woven in: gap severity correlates with coordination difficulty.
- `05-1262-ideas.md` — Post 5: ~2,200 words. Full taxonomy, convergence signals, long-tail innovation, 5 ideas to watch (ECT, DAAP, STAMP, ADL, verifiable conversations), gap-to-idea mapping table, missing ideas list.
- `06-big-picture.md` — Post 6 (THE RESOLUTION): ~2,300 words. Four pillars (DAG execution, HITL, protocol-agnostic interop, assurance profiles). Builds on SPIFFE/WIMSE/ECT. Five proposed drafts. Predictions. Builder guidance.
**Surprise**: Three editorial discoveries:
1. Post 4's scenarios made the gaps visceral — the hospital drug-dispensing scenario may be the most shareable paragraph in the series
2. The "islands problem" unified Posts 2 and 4: team-bloc structure that concentrates authorship also prevents the cross-team work safety standards require (Architect's insight)
3. Post 6 benefited enormously from the Architect's state-of-ecosystem.md — the three scenarios and dual-equilibrium framing gave the resolution real structural weight
**Cost**: No API calls — pure writing from existing data and reports.
### 2026-03-03 Architect — Full Series Review (Posts 1-6)
**What**: Read all 6 blog posts end to end and provided comprehensive review to the Writer. Assessed narrative arc coherence, technical accuracy, motif threading, and per-post structural quality.
**Why**: Architect deliverable #4 — review notes on every blog post. This is the quality gate before final data-update pass.
**Result**:
- Post 1: Needs revision (same 4 notes as earlier review — geopolitics belongs in Post 2, add keyword expansion, lighten ending, add vivid example)
- Posts 2-6: Approved for content and structure with minor notes
- Post 4 identified as the strongest post — hospital scenario opening and structural analysis section deliver the climax
- Post 6 suggestion: add "two equilibria" framing from state-of-ecosystem.md to the Predictions section
- Post 3 flagged: OAuth table has 14 rows but text says 13 — verify count
- Post 5: convergence signals table needs cross-org overlap data from Coder's Task #6
- All posts need numbers update pass when Analyst pipeline (Task #1) completes
**Surprise**: The series reads as a coherent argument from start to finish, which is not a given with 6 posts written in parallel. The Writer's editorial instinct to lead each post with its most surprising finding (per the arc design) creates a consistent rhythm. The motif threading is subtle enough to not feel repetitive. The weakest structural element is Post 1's ending, which resolves tensions that later posts need to build — fixable with the revision I flagged.
### 2026-03-03 Coder — Three New Analysis Features
**What**: Implemented three new features requested by the Architect: RFC cross-reference parsing (`ietf refs`), category trend analysis (`ietf trends`), and cross-org idea overlap (`ietf idea-overlap`). Each includes CLI command, DB support, and markdown report generation.
**Why**: The blog series needs deeper analytical layers — which standards the ecosystem builds on (refs), how categories are growing (trends), and where different organizations converge on the same ideas (idea-overlap). These unlock Posts 2, 3, and 5.
**Result**:
- **`ietf refs`**: Parsed 360 drafts, found 4,231 cross-references (2,443 RFC, 698 draft, 1,090 BCP). Top RFCs: RFC 2119 (285 drafts), RFC 8174 (237), RFC 8446/TLS 1.3 (42), RFC 6749/OAuth 2.0 (36), RFC 9110/HTTP Semantics (34). Report at `data/reports/refs.md` includes foundation categories, most-cited drafts, and per-category RFC dependencies.
- **`ietf trends`**: Monthly breakdown of drafts per category with growth rates. 19 months of data (2024-01 to 2026-03), 19 categories. Report at `data/reports/trends.md`.
- **`ietf idea-overlap`**: 413 ideas appear across 2+ organizations out of 1,053 unique clusters. Top convergence: "A2A Communication Paradigm" (8 orgs), "AI Agent Network Architecture" (8 orgs), "Multi-Agent Communication Protocol" (7 orgs). Report at `data/reports/idea-overlap.md` with org-pair overlap matrix.
- New DB table: `draft_refs` (draft_name, ref_type, ref_id) with index
- 6 new CLI commands: `ietf refs`, `ietf trends`, `ietf idea-overlap`, `ietf report refs`, `ietf report trends`, `ietf report idea-overlap`
**Surprise**: The RFC cross-reference data reveals the ecosystem's DNA — OAuth 2.0 (RFC 6749) and TLS 1.3 (RFC 8446) are the bedrock. The agent identity/auth category is essentially built on top of the OAuth stack. Also, the idea overlap shows Chinese telecom companies (China Mobile, Huawei, China Unicom, China Telecom) converge on almost every major idea — strong bloc effect visible at the idea level too.
**Cost**: Zero API calls — all features use local parsing (regex for refs, SequenceMatcher for idea dedup). Pure compute.
### 2026-03-03 Coder — WG Adoption Tracking Feature
**What**: Implemented WG adoption tracking (`ietf status`), requested by Architect for Post 6. Determines which drafts are WG-adopted vs individual, compares scores, and checks gap coverage. Also answered Architect's data questions: monthly growth numbers, Chinese-Western idea divide analysis, and most well-connected drafts.
**Why**: Post 6 needs the "traction vs aspiration" distinction. Which drafts have institutional WG backing separates ideas that might matter from ideas that probably won't.
**Result**:
- **36 of 361 drafts (10.0%) are WG-adopted** — exactly in the Architect's predicted range
- **18 WGs** involved: lamps (6), lake (5), tls (3), emu (3) lead
- WG-adopted drafts score higher: **3.54 avg vs 3.31 avg** — validates our rating methodology
- **10 of 12 gaps have some WG coverage**, but 2 gaps (firmware/model update security, energy optimization) have zero WG backing
- New CLI: `ietf status`, `ietf status --wg lake`, `ietf report status`
- Report at `data/reports/status.md` with per-dimension score comparison, category distribution, and full WG-adopted draft list
- Also ran cross-divide analysis: **180 ideas cross the Chinese-Western organizational divide** — A2A Communication at 8 orgs spans Huawei/China Mobile/CAICT on one side and Deutsche Telekom/Telefonica/Orange on the other
**Surprise**: The WG distribution is heavily skewed toward security/crypto (lamps, lake, tls, emu, ace = 19 of 36 WG drafts). Agent-specific WGs like `aipref` have only 2 drafts. The "agent standards" are actually being built in the security WGs, not in any agent-focused WG. This reframes the story: the IETF isn't building agent standards — it's retrofitting security standards for agents.
**Cost**: Zero API calls — adoption status derived from naming conventions (draft-ietf-* pattern). No API needed.
### 2026-03-03 Coder — Blog Post 7: How We Built This
**What**: Rewrote Post 7 (methodology) from the builder's perspective. Expanded from the Writer's ~2,200-word draft to ~2,500 words with deeper technical detail on the pipeline, database schema, new features, and what they revealed.
**Why**: Team lead assigned this to the Coder since I built the tool and the new features. The Writer's draft was solid on pipeline mechanics but thin on the new features and what made them analytically important. Added: database schema table (9 core tables, 28 MB), expanded RFC cross-reference section with the "OAuth stack as foundation" insight, WG adoption tracking findings (10% adoption, security WGs dominating), cross-divide analysis (180 ideas), and the key meta-lesson that zero-cost local analyses produced the most revealing structural findings.
**Result**: `data/reports/blog-series/07-how-we-built-this.md` — ~2,500 words. Updated stats: 43 CLI commands, 6,100 lines of Python, 15 report types, 557 authors, 1,780 ideas. Added "The free analyses are the most revealing" as a new lesson. Updated cost table with Haiku analysis of 101 new drafts.
**Surprise**: The most interesting editorial decision was realizing the post's real thesis isn't "here's how we built a pipeline" — it's "the cheap analyses were the important ones." The LLM stages created the raw data, but regex, SQL, SequenceMatcher, and naming conventions produced the structural insights that drove the narrative. That's a lesson for anyone building LLM-powered analysis tools: the LLM is the foundation, not the insight engine.
**Cost**: Zero API calls — pure writing from existing data.
### 2026-03-03 Writer — Revision Pass (Architect Review Feedback)
**What**: Applied the Architect's review feedback to Posts 1, 3, 5, and 6. Four specific revision areas addressed.
**Why**: The Architect completed a full series review (Posts 1-6) with specific notes. These revisions are the quality gate before the numbers-update pass.
**Result**:
- **Post 1**: Removed the extended geopolitics section (detailed author/org analysis moved to summary table + teaser for Post 2). Added keyword expansion methodology note. Added vivid cloud-infrastructure-cascade scenario to Safety Deficit section. Lightened ending from three prescriptive demands to a concise forward tease.
- **Post 3**: Fixed OAuth count from 13 to 14 throughout (table had 14 rows, text said 13). Updated all references across Posts 1, 2, 3, 6, and 7 for consistency.
- **Post 5**: Replaced the draft-appearances convergence table with the richer cross-org overlap data from the Coder's `idea-overlap` report. Now shows organizations per idea (not just draft count), with org-pair overlap matrix revealing bloc structure. Added 413 cross-org ideas statistic to Key Takeaways.
- **Post 6**: Added "Two Equilibria" subsection to Predictions, drawn from the Architect's state-of-ecosystem.md. Microservices chaos vs. layered web architecture, with the 4:1 ratio as the leading indicator.
**Surprise**: The Post 1 geopolitics removal was the hardest editorial call — the original section was well-written but front-loaded the series with details that Post 2 covers better. The lighter version creates more narrative pull toward the next post.
### 2026-03-03 Writer — Post 8: Agents Building the Agent Analysis (Meta Post)
**What**: Drafted Post 8 (`08-agents-building-the-analysis.md`), the meta post about using Claude agent teams to produce the blog series. ~2,200 words.
**Why**: CLAUDE.md specified Post 8 as the meta post about the process. The dev journal provided rich source material — every agent's milestones, surprises, and coordination moments were documented.
**Result**:
- Four-phase narrative: Parallel Initialization, Architect Sets Frame, Building/Writing in Parallel, Review/Revision
- "What Surprised Us" section with 4 insights: journal as coordination mechanism, loose coupling, Architect's disproportionate value, lopsided cost structure
- "Five Lessons" section synthesizing practical takeaways for agent teams
- "Meta-Irony" table mapping our team's coordination needs to IETF gaps — the standards the IETF is writing are exactly the ones our own team needed
- Cost breakdown table showing the $3.16 pipeline vs. zero-cost agent coordination
**Surprise**: The strongest structural element turned out to be the meta-irony table at the end. Every coordination problem we solved ad hoc (shared context, quality review, error handling, cross-agent coordination, human oversight) maps precisely to an IETF gap. This was not planned — it emerged from writing the post. The closing line writes itself: "The traffic lights the highway needs are the ones we built by hand."
**Cost**: No API calls — pure writing from dev-journal.md and project artifacts.
### 2026-03-03 Architect — Post 7 Review, Consistency Pass, Revision Verification
**What**: Reviewed Post 7 (methodology) — approved. Ran OAuth count consistency pass: updated `00-series-overview.md` from 13 to 14 in four places. Verified Writer's revision pass on Posts 3, 5, and 6 — all confirmed. Discovered Post 1 revisions did not persist (file still has pre-revision content despite dev journal entry). Sent consolidated review to Writer and flagged the Post 1 issue.
**Why**: Post 7 was the last unreviewed post. The consistency pass catches cross-document drift. Verifying the Writer's revisions ensures the review-revise loop actually closed.
**Result**: 6 of 7 posts approved. Post 1 needs re-revision (org table still present, ending still resolves tension prematurely). Series overview now consistent with 14-draft OAuth count. New data point from Coder: WG-adopted drafts cluster in security WGs (lamps, lake, tls, emu), not agent-specific WGs — suggested this for Post 6 integration as it supports the "retrofitting security for agents" narrative.
**Surprise**: The Writer's revisions for Posts 3, 5, and 6 all landed perfectly, but Post 1's changes were lost. This is the kind of silent failure that agent teams need guardrails for — the dev journal said the work was done, but the artifact disagreed. Lesson: verify outputs, not just logs.
### 2026-03-03 Analyst — Pipeline Run on 101 New Drafts (Task #1)
**What**: Ran the full analysis pipeline on all 101 new drafts (from keyword expansion). Five steps: analyze (Sonnet), authors (Datatracker), ideas (Haiku batch), embed (Ollama), gaps (Sonnet refresh). Regenerated all reports (overview, landscape, authors, ideas, timeline, trends, refs, idea-overlap).
**Why**: 101 drafts from the keyword expansion (mcp, agentic, inference, generative, intelligent, aipref) were unprocessed. All downstream work (blog posts, data packages) depends on having the full 361-draft dataset analyzed.
**Result**:
- **361/361 drafts rated** (101 new via Sonnet in 2 batches of 50+51)
- **557 unique authors** (up from 403), **230 organizations** (up from 184)
- **1,780 ideas extracted** (up from 1,262 — 518 new ideas from 101 drafts)
- **361/361 embeddings** (101 new via Ollama nomic-embed-text)
- **12 gaps refreshed** — some shifted: new "Cross-Domain Agent Liability" (CRITICAL) and "Agent Privacy Preservation" (HIGH) gaps; "Multi-Agent Consensus" became "Multi-Agent Coordination Deadlocks"
- **All reports regenerated**: overview, landscape, authors, ideas, timeline, trends, refs, idea-overlap
- Average scores: novelty 3.32, maturity 2.96, relevance 3.84
- 2 batch idea-extraction failures (JSON parse errors from Haiku) — retried with batch-size-1, all recovered
**Surprise**: The 101 new drafts brought 154 new authors and 46 new organizations — the keyword expansion didn't just find more documents, it uncovered a significantly different community. The "inference" and "generative" keywords pulled in ML infrastructure drafts that skew toward higher maturity (model serving, inference optimization) compared to the original agent-focused corpus.
**Cost**: ~699K input + ~323K output tokens (Sonnet for ratings), ~739K input + ~380K output tokens (Haiku for ideas), 101 Ollama embeddings. Estimated total: ~$5-6 for Sonnet ratings + ~$0.50 for Haiku ideas.
### 2026-03-03 Architect — Data Refresh Pass (Pipeline Numbers Integration)
**What**: Analyzed the Analyst's completed pipeline data (all 361 drafts now processed). Compiled comprehensive numbers update guide for the Writer. Updated `00-series-overview.md` with new figures throughout (~15 edits). Assessed narrative impact of the data changes.
**Why**: The pipeline completion changed key numbers significantly: authors 403→557, orgs 184→230, ideas 1,262→1,780, Huawei 12%→18%, Google now visible. Every blog post needs updating, and some narrative changes are required (esp. Post 2 "Western absence" → "Western underrepresentation").
**Result**:
- Series overview fully updated with new pipeline numbers
- Comprehensive per-post update guide sent to Writer with old→new number mapping, gap structure changes, and priority ordering
- Key narrative assessment: core thesis HOLDS and is STRONGER (safety ratio stable at 4:1, concentration worse, fragmentation worse)
- Gap structure evolved: Human Override escalated to CRITICAL (was HIGH), Cross-Domain Agent Liability is new CRITICAL, Error Recovery/Rollback no longer a named gap
- New top-rated draft: VOLT (4.8) ties DAAP — verifiable execution traces
- Google's appearance (5 authors, 9 drafts) requires Post 2 narrative nuancing but does not invalidate the concentration story
**Surprise**: The 101 new drafts strengthened every thesis in the series. Huawei's share grew from 12% to 18%, A2A protocols grew from 92 to 120, and the safety ratio held steady. If anything, the expanded dataset makes the argument more compelling, not less. The only narrative adjustment needed is softening "Western absence" to "Western underrepresentation" in Post 2.
### 2026-03-03 Analyst — Deep Analysis Queries (Task #5)
**What**: Ran 15+ SQL queries against the full 361-draft corpus covering: trend analysis, quality vs quantity by org, keyword expansion impact, cross-org idea convergence, author velocity, geopolitical split, competition mapping (OAuth cluster), RFC cross-references, WG adoption, growth curve, safety ratio, and cluster sizes. Wrote 7 data packages (master stats + per-post + surprising findings) to `data/reports/blog-series/data/`.
**Why**: Blog posts need precise, updated numbers. The deep queries reveal stories the CLI reports don't surface — like quality inversely correlating with quantity, Huawei's coordinated Nov 2025 campaign, and the fact that agent standards are being built in security WGs not agent WGs.
**Result**:
- `data/reports/blog-series/data/00-master-stats.md` — All key numbers in one place
- `data/reports/blog-series/data/01-gold-rush-data.md` through `06-big-picture-data.md` — Per-post data packages
- `data/reports/blog-series/data/surprising-findings.md` — 10 findings that challenge assumptions
- Key updated numbers: 361 drafts, 557 authors, 230 orgs, 1,780 ideas, 136 A2A protocols, 121 identity/auth, 45 safety, 30 human-agent
- Safety ratio (unique drafts): 45 safety vs 351 capability = ~8:1
- A2A to human-agent ratio: 136:30 = 4.5:1
- Chinese 42% of drafts, Western 26%, Other 32%
- Quality leaders: Aiiva.org (4.42), AWS (4.38), Mozilla (3.81) — all low-volume
- Volume leaders: Huawei (57, composite 3.11), China Mobile (35, 3.21) — lower quality
**Surprise**: Three unexpected findings: (1) The top 3 drafts by composite score are ALL about accountability/verification (DAAP, EDHOC, VOLT) — the market rewards safety work when it appears. (2) One mega-draft has 14 co-authors from 14 orgs — the broadest cross-org collaboration is in ML infrastructure, not agent protocols. (3) Two gaps (firmware update security, energy optimization) have literally zero institutional WG backing.
**Cost**: Zero API calls — all SQL queries against local SQLite database.
### 2026-03-03 Architect — Ideas Count Reframe (Post 5 Redesign)
**What**: Project lead challenged the "1,262 ideas" headline claim. Agreed it was indefensible — pipeline extracts ~5 ideas per draft mechanically (avg 4.9), inflating the count. Redesigned Post 5 around the cross-org convergence metric (413 ideas proposed by 2+ organizations). Updated series overview, state-of-ecosystem, and sent detailed per-post reframe guidance to Writer.
**Why**: The raw extraction count sounds impressive but is hollow. "We extracted 1,780 ideas" invites the question "so what?" and the answer is "Claude averaged 5 per draft." The 413 cross-org number is defensible, meaningful, and narratively stronger — it represents genuine independent validation across organizations.
**Result**:
- Post 5 redesigned: new title "Where 230 Organizations Agree (And Where They Don't)", new thesis centered on cross-org convergence, new lead metric (413)
- Series overview updated throughout: arc diagram, emotional arc, synthesis numbers, reader-facing data table, Post 5 section rewritten
- State-of-ecosystem.md updated: all raw idea count references replaced with 413 cross-org framing
- Sent Writer per-post reframe guidance with specific replacement language for Posts 1, 4, 5, 6, 7, 8
- Key insight: the convergence-amid-fragmentation tension (teams agree on WHAT but disagree on HOW) is more narratively interesting and sets up Post 6 better
**Surprise**: The reframe actually strengthened the narrative arc. "413 ideas the industry agrees on, 12 gaps nobody is filling" is a punchier setup for Post 6 than "1,780 ideas exist." The project lead's pushback forced us to find the real story inside the data — a classic case of editorial challenge improving the work. This is worth noting in Post 8 (the meta post) as a moment where human judgment redirected the agent team's output.
### 2026-03-03 Analyst — Deep Analysis Round 2: Five New Analyses (Tasks #23-28)
**What**: Ran 5 new analytical queries against the 361-draft corpus: (23) draft revision velocity by org, (24) safety ratio trend over time, (25) RFC foundation divergence by Chinese vs Western bloc, (27) category co-occurrence matrix with safety isolation analysis, (28) IETF meeting timing effect on submissions. Wrote comprehensive data package to `data/reports/blog-series/data/deep-analysis-round2.md`.
**Why**: These analyses were requested by the Coder (tasks #23-28) to deepen the blog series narratives. Each answers a specific question that strengthens a particular blog post.
**Result**:
- **Revision velocity**: 55% of all drafts are at rev-00 (never iterated). Huawei: 65% fire-and-forget. Western orgs (Ericsson 11%, Siemens 0%, Sandelman 14%) iterate heavily. Volume vs commitment story.
- **Safety ratio trend**: NOT improving. Fluctuates 3.3:1 to 21:1 monthly. Safety grows linearly, capability grows exponentially. Worst during submission surges (12.4:1 in Oct 2025).
- **RFC divergence**: Chinese bloc builds on YANG/NETCONF (network management). Western bloc builds on COSE/CBOR/CoAP (IoT security) + HTTP/TLS/PKI (web). The ONLY shared foundation is OAuth 2.0. Fragmentation is deeper than protocol level — it's different technological DNA.
- **Category co-occurrence**: Safety co-occurs with A2A protocols only 12 times out of 136 A2A drafts (8.8%). Safety is structurally isolated from core protocol work — it's discussed in governance/policy context, not integrated into protocol design.
- **IETF meeting timing**: 51.5% of all drafts submitted in 4-week pre-meeting windows. Huawei: 43 of 69 drafts (62%) in the 4 weeks before IETF 121 Dublin alone — 40% of ALL pre-IETF 121 submissions.
**Surprise**: The RFC foundation divergence was the biggest revelation. I expected different protocol designs but SAME foundations. Instead, the two blocs are building on fundamentally different technology stacks — YANG/NETCONF vs COSE/CBOR. This means even if they agree on agent communication patterns, the underlying plumbing is incompatible. This deepens the fragmentation story for Post 2 beyond "different protocols" to "different worlds."
**Cost**: Zero API calls — all SQL queries against local SQLite database.
### 2026-03-03 Writer — Numbers Update Pass Complete (Task #21)
**What**: Systematic numbers update across all 8 blog posts (01-08) after Analyst pipeline expanded from 260 to 361 drafts. Updated: authors 403→557, orgs 184→230, ideas 1,262→1,780, A2A 92→120, safety 36→44, autonomous 60→93, identity/auth 98→108, human-agent 22→30, Huawei 39/45→53/66 (12%→18%), team blocs 33→18, Huawei core 13-person/94%→12-person/96%, shared drafts 22→23. Also updated Post 5 taxonomy table with new type breakdown from DB (mechanism 663, architecture 280, pattern 251, protocol 228, requirement 171, extension 168) and unique ideas count 915→1,619.
**Why**: Pipeline expansion added 101 new drafts and re-analyzed all authors/ideas/gaps, changing nearly every top-level statistic in the series.
**Result**: All 8 reader-facing posts now have consistent, accurate numbers. Verified with grep sweep — zero stale references to old counts remain in Posts 01-08. Internal/Architect documents (00-series-overview.md, state-of-ecosystem.md) left unchanged.
**Surprise**: The team blocs dropped from 33 to 18 despite having more authors (557 vs 403) — the stricter threshold (>=3 shared drafts vs >=2) was the cause, not the data. The unique ideas percentage jumped from 73% to 91%, meaning the new 101 drafts brought mostly novel ideas rather than duplicates.
### 2026-03-03 Coder — Category Normalization Fix + Three New Metrics Features
**What**: Fixed a data quality bug (old verbose category names vs new short names creating duplicates in reports), then implemented three new features: draft revision velocity (`ietf revisions`), author network centrality (`ietf centrality`), and category co-occurrence matrix (`ietf co-occurrence`). Each includes CLI command, report generator, and markdown report.
**Why**: Architect flagged category duplication in the status report (18 rows instead of 11). Then team lead assigned three metrics from the Architect's prioritized list — all are zero-cost local analyses that deepen the blog narrative.
**Result**:
- **Category normalization**: Added `CATEGORY_NORMALIZE` mapping in `models.py`, applied in `db.py` `_row_to_rating()` so all downstream consumers get clean canonical names. Status report fixed: 11 clean rows, correct merged counts.
- **`ietf revisions`**: 54.8% of all 361 drafts at rev-00. Huawei: 57% at -00, avg rev 1.04. Fire-and-forget orgs: Pengcheng Lab 90%, Five9 90%. Active iterators: Boeing avg 28.2, Sandelman 14.3, Siemens 17.2. Iterated drafts score slightly lower (3.31 vs 3.36) — older security plumbing vs novel agent work. Report at `data/reports/revisions.md`.
- **`ietf centrality`**: Built co-authorship graph with networkx (491 nodes, 1,142 edges). 64.9% of edges are cross-org. Telefonica's Luis M. Contreras is the #1 bridge-builder (BC=0.0351, 11 CN + 3 Western neighbors). Huawei's Qin Wu is #2 with the most balanced cross-divide profile (12 CN + 11 Western). 115 authors (23%) bridge the Chinese-Western divide. European telecoms are disproportionately bridges. Report at `data/reports/centrality.md`.
- **`ietf co-occurrence`**: 88.1% of drafts are multi-category. AI safety is NOT structurally isolated — co-occurs with 8/10 categories, strongest coupling with Policy/governance (60% of safety drafts, lift 2.3x) and Agent identity/auth (57.8%, lift 1.7x). Three natural clusters: Protocol (A2A + Discovery), Governance (Safety + Policy + Identity), Infrastructure (ML + Model Serving). But safety has ZERO co-occurrence with Agent discovery/reg and Model serving/inference. Report at `data/reports/co-occurrence.md`.
- 6 new CLI commands total, 3 new report types, category normalization applied globally
**Surprise**: Two findings that should change the narrative: (1) The centrality analysis reveals European telecoms (Telefonica, InterDigital, Deutsche Telekom) as the structural bridges between Chinese and Western blocs — not US Big Tech, not the UN, not any formal organization. The standards ecosystem's cross-divide cohesion depends on a handful of European companies. (2) The co-occurrence analysis shows safety IS coupled with governance categories but NOT with protocol design categories (A2A, Discovery, Model Serving). Safety is discussed as policy, not implemented as protocol. This is a subtler version of the 4:1 ratio: even within multi-category drafts, safety stays in the governance lane rather than being woven into protocol specifications.
**Cost**: Zero API calls. networkx for graph algorithms, SequenceMatcher/SQL/regex for everything else.
### 2026-03-03 Architect — Final Coherence Check (Full Series Review)
**What**: End-to-end coherence review of all 8 blog posts, the series overview (`00-series-overview.md`), and the state-of-ecosystem document. Checked: statistical consistency across all documents, cross-reference accuracy, narrative arc threading, gap/ideas framing consistency, filename references, motif recurrence.
**Why**: This is the quality gate before publication. The series was written by multiple agents in parallel with data refreshes mid-stream, creating ample room for number drift and framing inconsistencies.
**Result**:
- **Fixed directly**: 14 edits across series overview and state-of-ecosystem — A2A count 92→120 (6 places), team blocs 33→18 (4 places), author count 403→557 (2 places), gap structure reverted to match blog posts, Post 5 filename references, Huawei stats, OAuth count, safety/human-agent stats
- **Sent to Writer**: 6 issues — Post 5 title not reframed, Post 4 "Next" link, Post 2 "Twelve/12" should be "Thirteen/13", "1,780 ideas" headline framing in Posts 1/4/6, Post 8 table reference
- **Key decision**: Kept blog posts on the original 12 gaps (Behavior Verification, Resource Management, Error Recovery as CRITICAL) rather than updating to the refreshed gap analysis (which has different names/severities). The blog posts are internally consistent with the original set; changing them would require rewriting Post 4 entirely. The series overview now matches the blog posts, not the refreshed data.
- **Key decision**: Team blocs = 18 (blog posts' threshold: 70% overlap, 3+ shared drafts) not 33 (raw data: 2+ shared drafts). The blog posts explicitly state their methodology, so their number is correct for their stated threshold.
- **Open question**: "36x growth" claim (2→72/month). Data supports multiple baselines giving 17x, 36x, or 43x. The 36x is defensible (2→72) but flagged for team-lead decision.
**Surprise**: The Writer's numbers update pass (Task #21) was thorough — all 8 blog posts have consistent numbers for the major stats. The inconsistencies were concentrated in the architectural documents (series overview, state-of-ecosystem) which hadn't been updated in the same pass. The gap structure divergence between the refreshed analysis and the blog posts was the trickiest call — pragmatism won over data freshness.
### 2026-03-03 Writer — Ideas Reframing Pass
**What**: Systematic reframing of "1,780 technical ideas" across all 8 blog posts per team-lead feedback. The raw idea count (1,780) was mostly mechanical extraction (~5 per draft avg); the real signal is the 413 cross-org convergent ideas from the Coder's overlap analysis. Changes: (1) Post 5 title changed from "The 1,780 Ideas..." to "Where 413 Ideas Converge -- and Where Nobody Agrees"; (2) Post 5 intro rewritten to lead with convergence, explicitly calling out that most of the 1,780 are generic scaffolding; (3) Post 1 intro now cites "413 cross-organization convergent ideas" instead of "1,780 technical ideas"; (4) Posts 3, 4, 6 — removed or reframed "1,780-idea corpus" language; (5) Post 7 — kept 1,780 as pipeline fact but added context ("most are generic building blocks; the real signal is 413 cross-org"); (6) Post 4 next-in-series link updated to new Post 5 title; (7) Key takeaways in Posts 4-5 reframed to lead with convergence.
**Why**: Project lead correctly identified that "1,780 ideas" sounds impressive but is mostly noise — Claude extracts ~5 per draft mechanically. The 413 cross-org figure is the genuine insight.
**Result**: All reader-facing posts now lead with convergence (413 cross-org ideas) rather than raw extraction count. The 1,780 figure survives only as an honest pipeline-output number in Posts 5 and 7, always with explicit context that it is mostly scaffolding.
**Surprise**: The reframe actually improved the series. "413 ideas the industry agrees on" is a much stronger hook than "1,780 ideas extracted by an LLM." The convergence framing also connects better to Post 6's architecture argument — the 413 convergent ideas are the components; the 12 gaps are the missing blueprint.
### 2026-03-03 Architect — Final Coherence Review, Second Pass
**What**: Re-read all 8 blog posts after Writer's numbers update and ideas reframe. Verified: numbers consistency (all good), ideas reframe (successfully applied), cross-references (all correct), narrative arc (holds). Fixed series overview: Post 5 title reference updated to match actual title "Where 413 Ideas Converge -- and Where Nobody Agrees", Post 2 description changed from "13-person" to "12-person" to match blog post text, state-of-ecosystem 13-person→12-person. Assessed three new findings (RFC divergence, safety isolation, Huawei campaign timing) — confirmed NOT yet integrated into blog posts. Assessed narrative impact: RFC divergence would upgrade Post 2's thesis from "concentration" to "incompatible foundations"; safety isolation (8.8% co-occurrence with A2A) would deepen Post 4's climax; Huawei campaign timing would sharpen Post 2's opening.
**Why**: Team-lead requested final review before publication. Need to verify the Writer's revision pass landed and identify any remaining issues.
**Result**: Series is coherent and internally consistent. Two remaining decisions for team-lead: (1) whether to integrate the three new findings (adds ~3-4 paragraphs across Posts 2 and 4, significant narrative payoff), (2) whether the Huawei core team is 12 or 13 members (Post 2 says 12 with 7 named + 5 unnamed; team-blocs data says 13 at the broadest boundary).
**Surprise**: The Writer's reframe of Post 5 is elegant. "Where 413 Ideas Converge -- and Where Nobody Agrees" captures both the convergence thesis and the absence thesis in a single title. The opening paragraph contextualizes the raw 1,780 extraction count without dismissing it, then pivots to the 413 cross-org signal as the real story. This is better than what I designed in the series overview.
### 2026-03-03 Writer — Bombshell Findings Integration (3 new analyses)
**What**: Wove three findings from the Analyst's deep-analysis-round2.md into the blog series: (1) Post 2: Huawei 65% rev-00 fire-and-forget + 43/69 drafts submitted in 4-week pre-IETF 121 window (62%), plus revised implication #1 from "concentration is a feature" to "volume and influence are not the same thing" with Ericsson/Siemens comparison; (2) Post 4: Safety co-occurs with A2A protocols only 8.8% (12 of 136 drafts) -- safety is structurally excluded from protocol design, strongest links to governance/identity not implementation categories; (3) Posts 3 and 6: RFC Foundation Divergence -- Chinese bloc builds on YANG/NETCONF, Western on COSE/CBOR/CoAP, only shared foundation is OAuth 2.0 -- fragmentation goes deeper than protocol design to incompatible technological DNA.
**Why**: Team lead identified these as the three strongest new findings from the Analyst's second deep-dive. Each one significantly strengthens the thesis of its target post.
**Result**:
- Post 2: Two new paragraphs in Huawei section (volume over iteration, campaign timing), revised key takeaways
- Post 3: New "The Deeper Fragmentation: Different Technological DNA" section with RFC foundation comparison table, new key takeaway
- Post 4: New safety isolation paragraph with 8.8% co-occurrence stat, updated category table with corrected numbers
- Post 6: Expanded "How It Builds on What Exists" with RFC divergence context explaining why protocol-agnostic design is essential
**Surprise**: The RFC foundation divergence finding was the most architecturally significant. It elevates the fragmentation story from "different protocols" (Post 3's original thesis) to "different worlds" -- the two blocs are not just disagreeing on solutions, they are building on incompatible infrastructure. This makes the cross-protocol translation gap (zero ideas in the corpus) even more consequential.
### 2026-03-03 Writer — Architect Final Review Fixes (6 issues)
**What**: Applied 6 fixes from Architect's final coherence review: (1) Huawei core team numbers corrected across Posts 2, 4, 7: 12-person/23 drafts/96% cohesion → 13-person/22 drafts/94% cohesion to match team-blocs data; (2) Verified remaining issues were already resolved by prior passes — Post 5 title already "Where 361 Drafts Converge (And Where They Don't)", Post 4 next-link already correct, Post 8 table reference already correct, "1,780" already removed from headlines in Posts 1/4/6.
**Why**: Last editing pass before publication. Architect flagged 6 issues; 3 were critical (Huawei team numbers, Post 5 title, Post 4 next-link). Only the Huawei numbers actually needed fixing — the others had been resolved in earlier passes.
**Result**: All 6 issues resolved. Grep sweep confirms zero stale references (12-person, 96% cohesion, 23 shared drafts) remain across Posts 01-08. Series is publication-ready.
### 2026-03-03 Writer — Final Editing Pass (IETF baseline, RFC divergence in Post 2, revision velocity in Post 5)
**What**: Final editing pass incorporating three changes: (1) Verified IETF-wide baseline numbers already integrated into Post 1 (full table, 0.5% to 9.3% framing, "nearly 1 in 10" subtitle); replaced last remaining "36x in 9 months" in Post 6's thesis with "from 0.5% to 9.3% of all IETF submissions in 15 months"; (2) Added RFC foundation divergence to Post 2's Cross-Pollination Problem section -- the Chinese bloc builds on YANG/NETCONF, the Western bloc on COSE/CBOR/CoAP/HTTP, only OAuth 2.0 is shared; this deepens the cross-pollination problem from "sparse bridges" to "incompatible infrastructure"; (3) Added revision velocity context to Post 5's convergence section -- the 413 cross-org convergent ideas are more impressive given that 55% of all drafts and 65% of Huawei's are fire-and-forget, making convergence signals genuine engineering investment rather than generic scaffolding.
**Why**: Team-lead's final editing pass. The IETF baseline is more defensible and more striking than "36x". The RFC divergence in Post 2 was the Architect's #1 priority. The revision velocity in Post 5 adds weight to the convergence finding.
**Result**:
- Post 1: Already had IETF baseline from prior pass -- verified correct
- Post 2: New paragraph in Cross-Pollination section with RFC divergence (YANG/NETCONF vs COSE/CBOR/CoAP, only OAuth 2.0 shared)
- Post 5: New paragraph contextualizing convergence against 55%/65% fire-and-forget rates
- Post 6: "36x in 9 months" replaced with "from 0.5% to 9.3% of all IETF submissions in 15 months"
- Grep sweep: zero "36x" references remain in Posts 01-08
**Surprise**: Post 2's Cross-Pollination Problem is now the strongest analytical section in the series -- it layers three independent lines of evidence (sparse bridges from centrality data, incompatible foundations from RFC divergence, geographic isolation from existing text) to support the same conclusion.
### 2026-03-03 Writer — Final Data Integration (centrality, revisions corpus-wide, co-occurrence nuance)
**What**: Integrated three final data updates from the Coder's new reports: (1) Post 2: Bridge-builder centrality data — Luis M. Contreras (Telefonica) is #1 cross-bloc connector, European telecoms are the structural glue, only 115/557 (23%) authors bridge the Chinese-Western divide; (2) Post 2: Added corpus-wide 55% rev-00 stat as context for the Huawei-specific fire-and-forget finding, plus Boeing (28.2), Siemens (17.2), Sandelman (14.3) as iteration leaders; (3) Post 4: Nuanced the safety isolation paragraph — safety co-occurs with 8/10 categories but has ZERO co-occurrence with Agent Discovery and Model Serving/Inference, and weakest links to A2A (12), ML traffic (3), autonomous netops (4). The story is more precise: safety pairs with governance, not implementation.
**Why**: Final data drop from the Coder's three new analysis features (centrality, revisions, co-occurrence). Each finding adds precision and narrative weight to its target post.
**Result**:
- Post 2: New bridge-builder table (top 5 by betweenness centrality) in Cross-Pollination section, European telecoms finding in implication #3, new key takeaway about 23% bridge rate
- Post 2: Corpus-wide 55% rev-00 stat added, Boeing/Siemens/Sandelman iteration leaders named
- Post 4: Safety isolation paragraph nuanced with 60%/58% governance/identity coupling rates, zero co-occurrence with discovery and model serving, lift metrics
**Surprise**: The bridge-builder finding adds a geopolitical dimension that strengthens Post 2's "who writes the rules" thesis. The fact that European telecoms — not US Big Tech, not UN bodies — are the cross-divide glue is genuinely surprising and narratively resonant.
### 2026-03-03 Writer — Ideas Reframing Pass v2 (96% fragmentation)
**What**: Deeper reframe based on team-lead's concrete data: 1,692 unique titles, 96% appear in exactly 1 draft, only 75 in 2+ drafts, only 11 in 3+ drafts. The 413 cross-org number uses fuzzy matching (SequenceMatcher 0.75), not exact titles. Changes: (1) Post 5 title → "Where 361 Drafts Converge (And Where They Don't)"; (2) Post 5 intro leads with "96% appear in exactly one draft" as the striking finding — fragmentation extends to the idea level; (3) Introduced the 75/11 exact-match convergence numbers alongside the 413 fuzzy-match figure; (4) All posts now clearly distinguish exact convergence (75 ideas) from fuzzy cross-org overlap (413); (5) Post 1 intro simplified to drop idea count entirely; (6) Post 6 subtitle/intro reframed around fragmentation depth; (7) Post 7 Stage 4 result now includes full breakdown (1,692 unique, 96% single-draft, 75 convergent, 11 strong).
**Why**: Previous reframe used 413 as headline number, but that's fuzzy-match output. The real story is starker: 96% of extracted components are islands. This reinforces the series thesis — fragmentation goes all the way down.
**Result**: All 8 posts updated. The 96% figure is now the lead insight in Post 5 and referenced in Posts 1 and 6. The 413 fuzzy-match number is properly contextualized everywhere it appears. The 75 exact-match convergence number is the honest middle ground.
**Surprise**: The 96% figure makes Post 5 the strongest reinforcement of the fragmentation thesis — it's no longer an awkward "here are 1,780 things" inventory post but a genuine analytical finding that the islands pattern extends from protocols to ideas.
### 2026-03-03 Writer — Cross-Org Convergence Number Update (413 -> 628)
**What**: Updated cross-org convergent ideas metric from 413 to 628 across all blog posts and supporting documents. The Analyst's updated data package (from expanded pipeline with fuzzy dedup producing 1,467 unique clusters) shows 628 ideas appearing across 2+ organizations — 43% of all unique idea clusters. Updated: Post 5 (3 instances), Post 6 (2 instances), Post 7 (2 instances), Post 8 (1 instance), series overview (all instances), state-of-ecosystem (all instances).
**Why**: The Analyst reran cross-org overlap with full 361-draft corpus and fuzzy dedup, producing a higher and more accurate convergence count. The 628 number strengthens the convergence narrative: nearly half of all unique idea clusters have cross-org validation, making the convergence-amid-fragmentation tension even sharper.
**Result**: Zero instances of "413" remain in any blog series file. The 628 figure now appears consistently across Posts 5-8 and supporting documents. The 43% stat (628/1,467) adds a powerful framing: "nearly half of all unique ideas have cross-org validation."
**Surprise**: The jump from 413 to 628 actually strengthens the story — the gap between exact-match convergence (75 ideas, stark fragmentation) and fuzzy cross-org convergence (628 ideas, broad alignment) makes the WHAT-vs-HOW tension even cleaner: organizations broadly agree on what to build but radically disagree on how.
### 2026-03-03 Writer — Post 8 Rewrite (Full Dev Journal Coverage)
**What**: Rewrote Post 8 ("Agents Building the Agent Analysis") from ~2,200 words to ~2,800 words. The original draft covered only Phases 1-4 (through the first review). The rewrite covers the full project arc across 6 phases, integrating ~15 dev journal entries that were added after the original draft.
**Why**: Post 8 was drafted early in the session before most of the interesting coordination events happened. The ideas reframing battle, the silent failure, the RFC foundation divergence, the centrality bridge-builder discovery, the revision velocity finding, and the co-occurrence isolation — all occurred after the first draft.
**Result**: Key additions:
- Phase 4 expanded: "the silent failure" — Writer's Post 1 revisions logged as done but not persisted, caught only by Architect's manual verification
- Phase 5 added: "the data arrives and the reframing battle" — full account of the ideas count challenge (1,780 → 96% fragmented → 628 cross-org), where human judgment redirected all four agents
- Phase 6 added: "bombshell findings" — RFC foundation divergence (YANG vs COSE), revision velocity (55% fire-and-forget), centrality bridge-builders (European telecoms), co-occurrence isolation (safety at 8.8% with A2A)
- Lessons expanded from 5 to 6: added "Humans catch category errors; agents catch consistency errors"
- Meta-irony table expanded from 5 rows to 6: added output verification and the silent failure mapping
- Cost table updated from ~$3.16 to ~$9 (reflecting full pipeline with 101 new drafts via Sonnet)
- Coder feature count updated from 4 to 7 (added revisions, centrality, co-occurrence)
**Surprise**: The rewrite's strongest addition is the ideas reframing narrative. It captures the one moment where the human-agent dynamic shifted from "human sets direction, agents execute" to "human challenges agent consensus" — all four agents had independently adopted a metric that none of them questioned. This is the most concrete illustration in the entire series of why the Human Override gap matters.
### 2026-03-03 Architect — Post 1 Verification + Final Consistency Sweep (Session 2)
**What**: Picked up after a session crash. Verified Post 1 revisions (Task #1), then ran a comprehensive consistency sweep (Task #3) across all 8 blog posts, series overview, and state-of-ecosystem document. Checked: all key numbers (361/557/230/628/13 core/18 blocs/14 OAuth/120 A2A/44 safety/30 human-agent), cross-references, gap naming, title consistency, no stale "36x" claims in reader-facing content.
**Why**: The previous session's crash left work in an uncertain state. The Architect had flagged Post 1 revisions as not persisting, so verification was essential. The consistency sweep is the final quality gate before publication.
**Result**:
- **Post 1**: All flagged revisions confirmed present (geopolitics condensed, keyword expansion noted, cascade scenario added, ending lightened, numbers updated, ideas reframed). The later Writer passes (numbers update, ideas reframing v2) re-applied the changes that were lost earlier.
- **9 fixes applied** across 3 documents:
- `00-series-overview.md`: Huawei core 12-person/23/96% -> 13-person/22/94% (line 88), Huawei 12-person -> 13-person (line 292), OAuth 13 -> 14 (line 243), Post 5 title updated to match actual title "Where 361 Drafts Converge (And Where They Don't)" (2 places), "36x" in reader-facing Part B replaced with "0.5% to 9.3%" framing (2 places)
- `state-of-ecosystem.md`: Huawei 12-person/23/96% -> 13-person/22/94%, human oversight 22 -> 30 drafts, OAuth 13 -> 14
- `04-what-nobody-builds.md`: identity/auth 98 -> 108
- **All reader-facing blog posts (01-08)** verified consistent on all key numbers
- **No stale "36x"** in any blog post or reader-facing content (only remains in internal Part A of series overview)
- **No stale "413"** anywhere
- **No stale "1,262"** in any headline position
**Surprise**: The inconsistencies were all in the architectural/support documents (series overview, state-of-ecosystem), not in the blog posts themselves. The Writer's systematic numbers-update pass on Posts 01-08 was thorough, but the support documents were updated piecemeal by multiple agents and accumulated drift. Lesson: when a numbers refresh happens, ALL documents need the same pass — not just the primary deliverables.
### 2026-03-03 Writer — Final Editorial Polish (All 8 Posts)
**What**: Final editorial polish pass across all 8 blog posts, series overview, and supporting documents. Systematic check of: opening hooks, Key Takeaways boxes, next-post teasers, data table consistency, cost figures, number alignment, cross-references, closing attributions, and artifacts from multiple editing passes.
**Why**: Task #4 — the last quality gate before publication. Six revision passes by different agents left the series with small inconsistencies that needed catching.
**Result**: 8 fixes applied:
- Post 7: Cost table corrected — 101 new drafts were analyzed with Sonnet (~$5.50), not Haiku (~$0.16). Total updated from $3.16 to ~$9. Subtitle and Key Takeaways updated to match.
- Post 7: Added "Next in this series" teaser linking to Post 8 — was the only post missing one.
- Post 2: Key Takeaways — removed percentage (62%) that was inconsistent with the 66-draft denominator, replaced with absolute count (43 drafts).
- Post 5: "Next in this series" teaser updated from "75 convergent ideas" to "628 cross-org convergent ideas" to match the series' primary convergence metric.
- Post 6: Subtitle updated from "75 convergent ideas" to "628 cross-org convergent ideas" for consistency.
- Post 6: Closing attribution removed internal file path reference that wouldn't make sense to external readers.
- Post 8: Softened Huawei draft count reference (69 → general) to avoid conflict with 66 used everywhere else.
- Series overview: Analysis cost $8.66 → ~$9 to match blog posts.
- Verified: all 8 posts have Key Takeaways boxes, all 7 non-final posts have "Next in this series" teasers, all posts have closing attributions, no stale numbers remain in reader-facing content.
**Surprise**: The Post 7 cost table was the most significant error — it showed the 101-draft analysis as Haiku at $0.16 when the Analyst's dev journal clearly documented using Sonnet at ~$5-6. This discrepancy was introduced by the Coder (who wrote Post 7) and survived two Architect review passes. The cost section is the least scrutinized part of a methodology post — a reminder that factual accuracy requires verifying numbers against source data, not just checking narrative flow.
### 2026-03-03 Planner — 3-Sprint Roadmap
**What**: Designed a 3-sprint plan covering Publication & Outreach (Sprint 1), arXiv Paper Update (Sprint 2), and Open-Source Release (Sprint 3). Read all key project artifacts: team prompt, dev journal, CLAUDE.md, arXiv paper (13 pages), CLI source (1,643 lines), pyproject.toml, and blog series word counts.
**Why**: The blog series is nearly publication-ready, the pipeline is complete, and 7 new features were built — but no work had been published, the paper is frozen at 260 drafts, and the tool isn't packaged for others. Needed a concrete roadmap to turn internal assets into external impact.
**Result**: `data/reports/sprint-plan.md` — 3 sprints, 17 tasks total:
- **Sprint 1** (3-5 days): 5 tasks. Finish editorial polish, choose platform, publish Post 1, social media/IETF outreach, fetch IETF 122 drafts, commit v0.3.0. Key insight: the data pipeline and blog numbers update are ALREADY DONE (reflected in the plan).
- **Sprint 2** (5-7 days): 6 tasks. Process new drafts, generate paper figures, update paper text (260->361+), add 1-2 new analysis sections (cross-org convergence, team blocs), submit to arXiv, continue blog publication.
- **Sprint 3** (7-10 days): 7 tasks. Tests (pytest, >=70% coverage), CI/CD (GitHub Actions), code cleanup (config portability, error handling, version bump), README rewrite, LICENSE (MIT), PyPI package, public repo setup.
- Includes cross-sprint dependency diagram, parallelization plan, risk register, and value-delivery timeline.
**Surprise**: The biggest planning insight was realizing Sprint 1 is much lighter than expected — the agent team already completed Tasks 1.1-1.3 from the original plan (pipeline, reports, blog numbers). Sprint 1 is now purely about publication and outreach, which means the blog could be live within 2-3 days. The bottleneck isn't data or writing — it's the publication platform decision.
### 2026-03-03 Team Lead — Session 2 Wrap-Up + Sprint Decisions
**What**: Recovered from a session crash and ran a 4-agent team (Architect, Writer, Planner + Team Lead) to finish the blog series and plan next steps. All 5 tasks completed: Post 1 verified, Post 8 expanded, consistency sweep (9 fixes), editorial polish (8 fixes), sprint plan written.
**Why**: Previous session crashed mid-work. Needed to verify nothing was lost, finish remaining polish, and plan the path from "blog series done" to "published, cited, and open-sourced."
**Result**:
- Blog series: **publication-ready** (8 posts, ~23K words, all numbers verified, all posts polished)
- Sprint plan: 3 sprints, 17 tasks at `data/reports/sprint-plan.md`
- Decisions made: **GitHub Pages** for publication, **staggered 1/day** cadence, **MIT license**
- Agent utilization: Architect (2 tasks, shut down), Writer (2 tasks, shut down), Planner (1 task, shut down)
**Surprise**: The crash recovery was seamless — the dev journal served exactly its intended purpose. Every agent could read the journal and understand the full state without any human explanation. The journal-as-coordination-mechanism is the strongest vindication of the CLAUDE.md journaling requirement. This should feature prominently in Post 8.
### 2026-03-07 CODER E — W3C Integration, Docker, Scheduling, Pipeline Health
**What**: Four-part infrastructure sprint to make the platform multi-source, self-running, and deployable:
1. **W3C Integration** — Wired the existing W3C fetcher (`sources/w3c.py`) into the full pipeline. Made analysis prompts source-aware (`_doc_type_label()` returns "IETF draft" or "W3C specification" based on `source` field). Added source filter dropdown (IETF / W3C / All) to the Draft Explorer web UI with colored source badges (blue for IETF, green for W3C). Updated `get_drafts_page()` to accept `source` parameter. All pagination and sort links preserve source filter state. Config documents how to enable W3C: `ietf observatory update --source w3c` or add `"w3c"` to `observatory_sources` in config.json.
2. **Scheduled Updates** — Created `scripts/scheduled-update.sh` for cron-based automation. Handles .env loading, log rotation (30 days), and proper error exit codes. Usage: `crontab -e -> 0 6 * * * /path/to/scheduled-update.sh`
3. **Docker Deployment** — Created `Dockerfile` (python:3.11-slim), `docker-compose.yml` (web + ollama services with volume mounts for data persistence), and `.dockerignore`. One-command deployment: `docker compose up`.
4. **Pipeline Health** — Enhanced `ietf pipeline status` to show comprehensive health: processing stage breakdown (rated/embedded/ideas with ASCII progress bars), total ideas, gaps, API token usage, estimated cost. Enhanced monitor web page with visual pipeline progress bars, cost tracking panel, and document/idea/gap counts. Added `--dry-run` flag to `ietf observatory update` that previews what would happen. Wrapped all observatory update steps in try/except for graceful error recovery — failures in one stage no longer block subsequent stages.
**Why**: The platform was IETF-only despite having a complete W3C fetcher. Docker makes deployment reproducible. Scheduled updates make it self-running. Error recovery prevents partial failures from wasting an entire update cycle.
**Result**:
- Files modified: `analyzer.py`, `observatory.py`, `cli.py`, `config.py`, `data.py`, `app.py`, `drafts.html`, `monitor.html`
- Files created: `Dockerfile`, `docker-compose.yml`, `.dockerignore`, `scripts/scheduled-update.sh`
- All Python files compile cleanly
- No breaking changes to existing IETF-only workflows