439424bd04
Fix security, data integrity, and accuracy issues from 4-perspective review
...
Security fixes:
- Fix SQL injection in db.py:update_generation_run (column name whitelist)
- Flask SECRET_KEY from env var instead of hardcoded
- Add LLM rating bounds validation (_clamp_rating, 1-10)
- Fix JSON extraction trailing whitespace handling
Data integrity:
- Normalize 21 legacy category names to 11 canonical short forms
- Add false_positive column, flag 73 non-AI drafts (361 relevant remain)
- Document verified counts: 434 total/361 relevant drafts, 557 authors, 419 ideas, 11 gaps
Code quality:
- Fix version string 0.1.0 → 0.2.0
- Add close()/context manager to Embedder class
- Dynamic matrix size instead of hardcoded "260x260"
Blog accuracy:
- Fix EU AI Act timeline (enforcement Aug 2026, not "18 months")
- Distinguish OAuth consent from GDPR Einwilligung
- Add EU AI Act Annex III context to hospital scenario
- Add FIPA, eIDAS 2.0 references where relevant
Methodology:
- Add methodology.md documenting pipeline, limitations, rating rubric
- Add LLM-as-judge caveats to analyzer.py
- Document clustering threshold rationale
Reviews from: legal (German/EU law), statistics, development, science perspectives.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-08 10:52:33 +01:00
757b781c67
Platform upgrade: semantic search, citations, readiness, tests, Docker
...
Major features added by 5 parallel agent teams:
- Semantic "Ask" (NL queries via FTS5 + embeddings + Claude synthesis)
- Global search across drafts, ideas, authors, gaps
- REST API expansion (14 endpoints, up from 3) with CSV/JSON export
- Citation graph visualization (D3.js, 440 nodes, 2422 edges)
- Standards readiness scoring (0-100 composite from 6 factors)
- Side-by-side draft comparison view with shared/unique analysis
- Annotation system (notes + tags per draft, DB-persisted)
- Docker deployment (Dockerfile + docker-compose with Ollama)
- Scheduled updates (cron script with log rotation)
- Pipeline health dashboard (stage progress bars, cost tracking)
- Test suite foundation (54 pytest tests covering DB, models, web data)
Fixes: compare_drafts() stubbed→working, get_authors_for_draft() bug,
source-aware analysis prompts, config env var overrides + validation,
resilient batch error handling with --retry-failed, observatory --dry-run
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-07 20:52:56 +01:00
6e3a387778
Idea quality pipeline, web UI features, academic paper
...
- Tighten idea extraction prompts (1-4 ideas, no sub-features) reducing
1,907 ideas to 468 across 434 drafts (78% reduction)
- Add embedding-based dedup (ietf dedup-ideas) for same-draft similarity
- Add novelty scoring (ietf ideas score) and filtering (ietf ideas filter)
using Claude to rate ideas 1-5, removing 49 generic building blocks
- Final count: 419 high-quality ideas (avg 1.1/draft)
- Web UI: gap explorer with live draft generation and pre-generated demos
- Web UI: D3.js author collaboration network (498 nodes, 1142 edges,
68 clusters, org filtering, interactive zoom/pan)
- Academic paper: 15-page LaTeX workshop paper analyzing the 434-draft
AI agent standards landscape
- Save improvement ideas backlog to data/reports/improvement-ideas.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-06 22:17:57 +01:00
3c3d7e649f
Add ASCII art figures to all 6 generated Internet-Drafts
...
Each draft gets 2 illustrative figures:
- ABVP: architecture components + verification workflow
- ATD: example DAG structure + execution state transitions
- HITL: primitive framework overview + approval workflow sequence
- AEM/PPALP: federated learning architecture + aggregation flow
- RARP: cross-domain architecture + two-phase rollback protocol
- APAE: layered architecture + cross-domain provenance tracking
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-04 02:20:39 +01:00
404092b938
Generate 5-draft ecosystem family, fix formatter markdown stripping
...
Pipeline output:
- ABVP: Agent Behavior Verification Protocol (quality 3.0/5)
- AEM: Privacy-Preserving Agent Learning Protocol (quality 2.1/5)
- ATD: Agent Task DAG Framework (quality 2.5/5)
- HITL: Human-in-the-Loop Primitives (quality 2.4/5)
- AEPB: Real-Time Agent Rollback Protocol (quality 2.5/5)
- APAE: Agent Provenance Assurance Ecosystem (quality 2.5/5)
Quality gates: all pass novelty + references, format gate improved
with markdown stripping (_strip_markdown) and dynamic header padding.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-04 01:42:30 +01:00
d6beb9c0a0
v0.3.0: Gap-to-Draft pipeline, Living Standards Observatory, blog series
...
Gap-to-Draft Pipeline (ietf pipeline):
- Context builder assembles ideas, RFC foundations, similar drafts, ecosystem vision
- Generator produces outlines + sections using rich context with Claude
- Quality gates: novelty (embedding similarity), references, format, self-rating
- Family coordinator generates 5-draft ecosystem (AEM/ATD/HITL/AEPB/APAE)
- I-D formatter with proper headers, references, 72-char wrapping
Living Standards Observatory (ietf observatory):
- Source abstraction with IETF + W3C fetchers
- 7-step update pipeline: snapshot, fetch, analyze, embed, ideas, gaps, record
- Static GitHub Pages dashboard (explorer, gap tracker, timeline)
- Weekly CI/CD automation via GitHub Actions
Also includes:
- 361 drafts (expanded from 260 with 6 new keywords), 403 authors, 1,262 ideas, 12 gaps
- Blog series (8 posts planned), reports, arXiv paper figures
- Agent team infrastructure (CLAUDE.md, scripts, dev journal)
- 5 new DB tables, schema migration, ~15 new query methods
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-03-04 00:48:57 +01:00
be9cf9c5d9
v0.2.0: visualizations, interactive browser, arXiv paper, gap analysis
...
New features:
- 12 interactive visualizations (ietf viz): t-SNE landscape, similarity
heatmap, score distributions, timeline, bubble explorer, radar charts,
author network graph, category treemap, quality vs overlap, org bar chart,
ideas chart, and interactive draft browser
- Interactive draft browser (browser.html): filterable by category, keyword,
score sliders with sortable table and expandable detail rows
- arXiv paper (paper/main.tex): 13-page manuscript with all findings
- Gap analysis: 12 identified under-addressed areas
- Author network: collaboration graph, org contributions, cross-org analysis
- Draft generation from gaps (ietf draft-gen)
- Auto-load .env for API keys (python-dotenv)
New modules: visualize.py, authors.py, draftgen.py
New reports: timeline, overlap-matrix, authors, gaps
New deps: plotly, matplotlib, seaborn, scipy, scikit-learn, networkx, python-dotenv
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-28 13:37:55 +01:00
6771a4c235
IETF Draft Analyzer v0.1.0 — track, categorize, and rate AI/agent drafts
...
Python CLI tool that fetches AI/agent-related Internet-Drafts from the IETF
Datatracker, rates them using Claude, generates embeddings via Ollama for
similarity/clustering, and produces markdown reports.
Features:
- Fetch drafts by keyword from Datatracker API with full text download
- Batch analysis with Claude (token-optimized, responses cached in SQLite)
- Embedding-based similarity search and overlap cluster detection
- Reports: overview, landscape by category, overlap clusters, weekly digest
- SQLite with FTS5 for full-text search across 260 tracked drafts
Initial analysis of 260 drafts reveals OAuth agent auth (13 drafts) and
agent gateway/collaboration (10 drafts) as the most crowded clusters,
while AI safety/alignment is underserved with the highest quality scores.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-28 00:36:45 +01:00