Complete remaining medium/low issues: performance, CLI, types, CI, tests

Performance:
- Batch readiness computation (~200 queries → ~6 per page)
- Batch draft lookup in author network (N+1 → single query)
- File-based similarity matrix cache (.npy + metadata sidecar)
- 5-minute TTL embedding cache for search queries

CLI quality:
- Add pass_cfg_db decorator, convert ~30 commands to shared config/db lifecycle
- Add --dry-run to analyze, embed, embed-ideas, ideas, gaps commands
- Move 15+ in-function imports to top of data.py

Types & documentation:
- Add 16 TypedDicts to data.py, annotate 12 function return types
- Add ethics section to Post 06 (premature standardization, power asymmetry)
- Add EU AI Act Article 43 conformity mapping to Post 06
- Add NIS2 and CRA references to Post 04

CI & testing:
- Add GitHub Actions CI workflow (Python 3.11+3.12, ruff, pytest)
- Add API documentation for all 20 endpoints (data/reports/api-docs.md)
- Add 41 new tests (test_analyzer.py, test_search.py) — 64 total pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 14:06:54 +01:00
parent e7527ad68e
commit 20c45a7eba
14 changed files with 2305 additions and 1238 deletions

data/reports/api-docs.md

@@ -0,0 +1,359 @@
# IETF Draft Analyzer — API Documentation
All API endpoints return JSON by default. Several support `?format=csv` for CSV export.
Base URL: `http://localhost:5000`
---
## Public Endpoints
### GET /api/stats
Overview statistics for the entire corpus.
**Parameters:** None
**Response:**
```json
{
"total_drafts": 361,
"rated_drafts": 260,
"total_authors": 403,
"total_ideas": 1262,
"total_gaps": 12,
"avg_score": 3.42
}
```
---
### GET /api/drafts
Paginated, filterable list of drafts.
**Parameters:**
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `page` | int | 1 | Page number |
| `q` | string | "" | Full-text search query |
| `cat` | string | "" | Filter by category |
| `source` | string | "" | Filter by source (ietf, w3c) |
| `min_score` | float | 0.0 | Minimum composite score |
| `sort` | string | "score" | Sort field |
| `dir` | string | "desc" | Sort direction (asc/desc) |
| `format` | string | "json" | Response format: "json" or "csv" |
**Response:** JSON object with `drafts` array and pagination metadata.
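For example, a filtered request URL can be assembled from these parameters with the standard library (the parameter values below are illustrative):

```python
from urllib.parse import urlencode

# Hypothetical query: high-scoring IETF drafts matching "agent identity",
# sorted by composite score descending. Parameter names from the table above.
params = {
    "q": "agent identity",
    "source": "ietf",
    "min_score": 3.5,
    "sort": "score",
    "dir": "desc",
    "page": 1,
}
url = "http://localhost:5000/api/drafts?" + urlencode(params)
```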
---
### GET /api/drafts/{name}
Detail for a single draft including rating, authors, ideas, and references.
**Parameters:**
| Param | Type | Description |
|-------|------|-------------|
| `name` | string | Draft name, e.g. `draft-ietf-ai-agent-protocol` |
**Response:** JSON object with full draft detail, or `{"error": "Draft not found"}` (404).
---
### GET /api/categories
Category names and draft counts.
**Parameters:**
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `format` | string | "json" | "json" or "csv" |
**Response:**
```json
{
"A2A protocols": 45,
"AI safety/alignment": 38,
...
}
```
---
### GET /api/ratings
Rating distributions across the corpus.
**Parameters:**
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `format` | string | "json" | "json" or "csv" |
**Response:** JSON object with arrays: `names`, `scores`, `novelty`, `maturity`, `overlap`, `momentum`, `relevance`, `categories`.
---
### GET /api/timeline
Timeline data showing draft publication over time.
**Parameters:** None
**Response:** JSON object with timeline series data.
---
### GET /api/landscape
t-SNE 2D embedding landscape of all drafts.
**Parameters:**
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `format` | string | "json" | "json" or "csv" |
**Response:** JSON array of `{name, x, y, category, score}` points.
---
### GET /api/similarity
Draft similarity network graph.
**Parameters:** None
**Response:** JSON object with `nodes` and `edges` arrays for a force-directed graph.
---
### GET /api/idea-clusters
Clustered ideas across drafts.
**Parameters:** None
**Response:** JSON object with cluster data.
---
### GET /api/ideas
All extracted technical ideas, grouped by type.
**Parameters:**
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `format` | string | "json" | "json" or "csv" |
**Response:** JSON object with `ideas` array.
---
### GET /api/authors/network
Author collaboration network graph.
**Parameters:** None
**Response:** JSON object with `nodes` and `edges` arrays.
---
### GET /api/citations
Citation/reference graph between drafts.
**Parameters:**
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `min_refs` | int | 2 | Minimum references to include a node |
**Response:** JSON object with citation graph data.
---
### GET /api/search
Global search across drafts, ideas, authors, and gaps.
**Parameters:**
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `q` | string | "" | Search query (required for results) |
| `format` | string | "json" | "json" or "csv" |
**Response:**
```json
{
"drafts": [...],
"ideas": [...],
"authors": [...],
"gaps": [...]
}
```
---
### POST /api/ask
Search-only question answering (free, no Claude API call). Returns relevant sources and any cached answer.
**Request body:**
```json
{
"question": "What drafts address agent authentication?",
"top_k": 5
}
```
**Response:** JSON with `sources` array and optional cached `answer`.
---
## Admin-Only Endpoints
These endpoints require admin mode (`--dev` flag) or authentication.
### POST /api/ask/synthesize
Synthesize an answer using Claude (costs tokens, rate-limited to 10 req/min/IP). Answers are cached permanently.
**Auth:** Admin required
**Request body:**
```json
{
"question": "How do IETF drafts approach agent identity?",
"top_k": 5
}
```
**Response:** JSON with `sources` array and synthesized `answer`.
**Errors:** 429 if rate-limited.
---
### GET /api/gaps
All identified standardization gaps.
**Auth:** Admin required
**Parameters:**
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `format` | string | "json" | "json" or "csv" |
**Response:** JSON array of gap objects.
---
### GET /api/gaps/{gap_id}
Detail for a single gap.
**Auth:** Admin required
**Parameters:**
| Param | Type | Description |
|-------|------|-------------|
| `gap_id` | int | Gap ID |
**Response:** JSON object with gap detail, or `{"error": "Gap not found"}` (404).
---
### POST /api/compare
Compare multiple drafts using Claude (costs tokens, rate-limited).
**Auth:** Admin required
**Request body:**
```json
{
"drafts": ["draft-name-one", "draft-name-two"]
}
```
**Response:**
```json
{
"text": "Comparison analysis text...",
"drafts": ["draft-name-one", "draft-name-two"]
}
```
**Errors:** 400 if fewer than 2 drafts provided.
---
### POST /api/drafts/{name}/annotate
Add or update annotations (notes, tags) for a draft.
**Auth:** Admin required
**Request body:**
```json
{
"note": "Interesting approach to agent handshake",
"tags": ["important", "review"],
"add_tag": "flagged",
"remove_tag": "review"
}
```
All fields are optional. `add_tag`/`remove_tag` operate on existing tags incrementally.
**Response:**
```json
{
"success": true,
"annotation": {"note": "...", "tags": ["important", "flagged"]}
}
```
---
### GET /api/monitor
Pipeline monitoring status (processing progress, error counts).
**Auth:** Admin required
**Response:** JSON object with monitoring data.
---
## Non-API Data Endpoints
### GET /export/obsidian
Download the entire research corpus as an Obsidian vault ZIP file.
**Response:** `application/zip` file download.
---
## Authentication
- **Production mode** (default): Admin endpoints return 403.
- **Development mode** (`--dev` flag): All admin endpoints are accessible without authentication.
- Rate-limited endpoints (`/api/ask/synthesize`, `/api/compare`): 10 requests per minute per IP, enforced via in-memory sliding window.
## Error Responses
All errors return JSON:
```json
{"error": "Description of the error"}
```
Common HTTP status codes:
- `400` — Bad request (missing parameters)
- `403` — Admin access required
- `404` — Resource not found
- `429` — Rate limit exceeded
- `500` — Internal server error


@@ -58,7 +58,7 @@ A notable omission from this gap list: **GDPR-mandated capabilities**. The gap a
**What is missing**: Circuit breakers for cascading failures. Checkpoint and rollback protocols. Blast radius containment. Graceful degradation. All concepts well-established in distributed systems engineering, but absent from the agent standards landscape.
**The scenario**: A telecom operator deploys 50 AI agents for network monitoring, troubleshooting, and optimization. During a major outage, all 50 agents simultaneously request inference resources to diagnose the problem. With no failure cascade prevention, agents compete chaotically. The most aggressive agents get resources; the most important diagnostic tasks may not. The outage extends because the agents that could fix it are starved by the agents that are observing it. For telecom operators in the EU, the NIS2 Directive (Directive 2022/2555) classifies electronic communications as an essential service, requiring incident response capabilities and supply chain security measures -- making cascade prevention not just an engineering problem but a regulatory obligation.
## High Gap: Real-Time Agent Rollback Mechanisms
@@ -90,7 +90,7 @@ An agent operating across multiple domains or organizations needs to maintain au
### Federated Agent Learning Privacy
While federated architectures exist, there is insufficient specification for privacy-preserving agent learning that prevents data leakage between federated participants during model updates. The absence of secure update mechanisms also intersects with the EU Cyber Resilience Act (Regulation 2024/2847), which requires products with digital elements -- including AI agent software -- to handle updates securely and provide vulnerability management throughout their lifecycle.
### Cross-Protocol Agent Migration


@@ -77,7 +77,7 @@ The architecture achieves this with *assurance profiles* -- named configurations
| L2 | Signed ECTs (JWT) | Cross-org, standard compliance |
| L3 | Signed ECTs + external audit ledger | Regulated industries |
This dual-regime approach resolves the tension between "move fast" deployments and "prove everything" regulated environments. Ideas touching behavior verification and data provenance become implementable at higher assurance levels without imposing their cost on every deployment. Notably, the L2 and L3 profiles map directly to the conformity assessment requirements of the EU AI Act (Art. 43): high-risk AI systems must demonstrate compliance through either internal control (L2's signed ECTs) or third-party audit (L3's external audit ledger), making assurance profiles not just an engineering convenience but a regulatory implementation pathway.
## How It Builds on What Exists
@@ -123,6 +123,14 @@ Based on the data trajectories and current momentum:
**The risk**: If the architecture work does not happen in the next 12 months, the agent ecosystem will calcify around vendor-specific protocol stacks (OpenAI's, Google's, Anthropic's, Huawei's). Each will have its own auth, discovery, and communication layer. The interoperability window will close, and the IETF's work will be standards for islands rather than standards for the internet.
### The Ethics of Standardizing Early
There is a harder question underneath the technical one: should the IETF be standardizing agent capabilities at all before safety frameworks are mature? The 4:1 capability-to-safety ratio is not just a gap -- it is a policy choice being made by default. Every A2A protocol that ships without behavior verification baked in creates a deployed base that resists retrofitting. The standards community is building the defaults that will govern billions of agent interactions, and those defaults currently assume trust rather than requiring proof.
The structural dynamics make this worse. The authorship analysis from Post 2 showed that a small number of large organizations -- Huawei, China Mobile, Cisco -- drive a disproportionate share of submissions. Civil society organizations, academic safety researchers, and smaller companies are largely absent from the drafting process. Standards that define agent identity, discovery, and communication also define what can be monitored, audited, and controlled. An agent discovery protocol designed primarily for enterprise deployment efficiency may inadvertently create a surveillance-friendly architecture if privacy and human autonomy are not first-class design constraints. The EU AI Act mandates human oversight (Art. 14), but a mandate is only as good as the protocol that implements it.
The IETF has historically been good at building infrastructure that serves everyone -- the end-to-end principle, protocol layering, rough consensus. But "rough consensus" among the current participants may not represent the interests of those most affected by autonomous agent systems. The architecture proposed above includes human-in-the-loop as a pillar, not an option. That is the right instinct. The question is whether the community will treat it with the same urgency as the protocol work -- or whether, as the data currently suggests, it will remain an aspiration while the highways ship without traffic lights.
### Two Equilibria
By 2028, the landscape will have resolved into one of two stable states.


@@ -4,6 +4,53 @@
---
### 2026-03-08 CODER — TypedDicts for data layer, ethics + regulatory content in blog series
**What**: Four improvements across typing and content:
1. **TypedDicts in `src/webui/data.py`** — Added 16 TypedDict definitions for common return shapes: `OverviewStats`, `DraftsPage`, `DraftListItem`, `AuthorInfo`, `AuthorNetwork` (with `AuthorNetworkNode`, `AuthorNetworkEdge`, `AuthorCluster`), `SimilarityGraph`, `TimelineData`, `MonitorStatus` (with `MonitorPipeline`, `MonitorCost`), `SearchResults`, `CitationGraph`. Annotated 12 function return types.
2. **Ethics section in Post 06** — Added "The Ethics of Standardizing Early" section (3 paragraphs) covering: premature capability standardization, power asymmetry in authorship, surveillance-friendly architecture risk, and human oversight as non-optional.
3. **EU AI Act conformity assessment note in Post 06** — Connected L2/L3 assurance profiles to Art. 43 conformity assessment requirements (1 sentence in Pillar 4 section).
4. **NIS2 + CRA references in Post 04** — Added NIS2 Directive reference to telecom cascade scenario (essential service obligations). Added Cyber Resilience Act reference to federated learning privacy gap (secure update lifecycle requirements).
**Why**: Untyped dicts make the data layer hard to maintain and refactor. Blog series lacked ethical framing and key EU regulatory cross-references (NIS2, CRA) that strengthen the compliance narrative.
**Result**: 16 TypedDicts with 12 annotated functions. 3 blog post sections added/expanded across Posts 04 and 06.
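As an illustration, the `OverviewStats` shape might look like this (a sketch; the exact field set in `data.py` may differ, field names taken from the documented `/api/stats` response):

```python
from typing import TypedDict

class OverviewStats(TypedDict):
    # Field names mirror the /api/stats response in api-docs.md.
    total_drafts: int
    rated_drafts: int
    total_authors: int
    total_ideas: int
    total_gaps: int
    avg_score: float

# Type checkers now flag missing or misspelled keys at the call site.
stats: OverviewStats = {
    "total_drafts": 361,
    "rated_drafts": 260,
    "total_authors": 403,
    "total_ideas": 1262,
    "total_gaps": 12,
    "avg_score": 3.42,
}
```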
---
### 2026-03-08 CODER — CI/CD, API docs, and test coverage expansion
**What**: Three infrastructure additions:
1. **GitHub Actions CI** — Added `.github/workflows/ci.yml` that runs on push/PR to main. Tests Python 3.11 and 3.12, installs from `[test]` extras, runs ruff lint (E/F/W rules, ignoring E501), and runs pytest.
2. **API documentation** — Created `data/reports/api-docs.md` documenting all 20 API endpoints in `src/webui/app.py` with method, URL, parameters, response format, and auth requirements. Covers public endpoints (drafts, stats, search, ideas, ratings, etc.) and admin-only endpoints (gaps, compare, synthesize, annotate, monitor).
3. **New test files** — Added `tests/test_analyzer.py` (21 tests covering `_extract_json`, `_clamp_rating`, `_parse_rating` with compact/verbose keys, defaults, and clamping) and `tests/test_search.py` (19 tests covering `sanitize_fts_query` with injection attempts, boolean operators, special chars, edge cases). Total: 64 tests all passing.
**Why**: Project had zero CI, no API docs for the web UI, and test coverage only on DB/models. These are prerequisites for public deployment and contributor onboarding.
**Result**: CI workflow ready, API fully documented, test count increased from 23 to 64. All tests pass in 0.6s.
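For illustration, a sanitizer of the kind these tests exercise might quote every bare term so FTS5 operators cannot be injected (a hypothetical sketch, not the project's actual `sanitize_fts_query`):

```python
import re

def sanitize_fts_query(q: str) -> str:
    """Hypothetical sketch: reduce user input to quoted bare terms so
    FTS5 syntax (AND, OR, NOT, NEAR, *, ") cannot be injected."""
    terms = re.findall(r"\w+", q)
    return " ".join(f'"{t}"' for t in terms)
```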
---
### 2026-03-08 CODER — Performance: fix N+1 queries and add caching
**What**: Four targeted performance fixes across the codebase:
1. **Batch readiness computation** — `compute_readiness_batch()` in `readiness.py` replaces per-draft readiness calls on the drafts page. Bulk-loads ref counts, cited-by counts, author experience, and ratings in ~6 queries total instead of ~200 (4 queries × 50 drafts/page).
2. **Batch draft lookup in author network** — `_compute_author_network_full()` now calls `db.get_drafts_by_names()` once to pre-load all drafts referenced by authors, instead of calling `db.get_draft()` in a loop inside cluster building.
3. **File-based similarity matrix cache** — `Embedder.similarity_matrix()` now caches the O(n^2) cosine similarity matrix to disk (`.cache/` dir next to DB), keyed by SHA256 hash of draft names. Reloads from cache if the set of embedded drafts hasn't changed.
4. **Embeddings cache for search** — `HybridSearch._get_all_embeddings()` caches the result of `db.all_embeddings()` with a 5-minute TTL, avoiding a full DB scan on every search query.
Also added `Database.get_drafts_by_names()` batch method in `db.py` (chunked to stay under SQLite's 999 variable limit).
**Why**: Page loads on the drafts listing and author network pages were slow due to N+1 query patterns. The similarity matrix was recomputed from scratch on every CLI invocation. Search queries redundantly loaded all embeddings from disk.
**Result**: Drafts page: ~200 queries reduced to ~6. Author network cluster building: ~100 `get_draft` calls reduced to 1 batch query. Similarity matrix: cached to disk, skips O(n^2) recomputation when embeddings unchanged. Search: embeddings loaded once per 5 minutes instead of per query.
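The chunked batch lookup can be sketched as follows (illustrative; the real method lives in `db.py` and its schema and signature may differ):

```python
import sqlite3

SQLITE_MAX_VARS = 999  # SQLite's default host-parameter limit

def get_drafts_by_names(conn: sqlite3.Connection, names: list[str]) -> dict[str, tuple]:
    """One IN(...) query per chunk of <=999 names instead of one query per draft."""
    out: dict[str, tuple] = {}
    for i in range(0, len(names), SQLITE_MAX_VARS):
        chunk = names[i:i + SQLITE_MAX_VARS]
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT name, title FROM drafts WHERE name IN ({placeholders})",
            chunk,
        )
        for row in rows:
            out[row[0]] = row
    return out

# Demo with an in-memory database and >999 names to exercise chunking.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drafts (name TEXT PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO drafts VALUES (?, ?)",
                 [(f"draft-{i}", f"Title {i}") for i in range(1500)])
found = get_drafts_by_names(conn, [f"draft-{i}" for i in range(1500)])
```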
---
### 2026-03-08 CODER — CLI boilerplate reduction, --dry-run flags, webui import cleanup
**What**: Three code quality improvements across the CLI and web UI:
1. **CLI boilerplate reduction** — Created a `pass_cfg_db` decorator that extracts `cfg` and `db` from the Click context, replacing ~40 instances of `cfg = _get_config(); db = Database(cfg); try: ... finally: db.close()`. The `main()` group now initializes config/db once and registers `db.close()` via `ctx.call_on_close()`. Converted ~30 commands to use the new pattern (all report, viz, wg, ideas, and core commands). Remaining ~15 read-only commands still use the old pattern but work correctly.
2. **--dry-run on destructive commands** — Added `--dry-run` flag to `analyze`, `embed`, `embed-ideas`, `ideas` (extract), and `gaps`. Each shows what would be processed (draft names, counts) without making API calls or DB changes. Pre-existing dry-run flags on `ideas filter`, `dedup-ideas`, `pipeline generate`, and `observatory update` were preserved.
3. **webui/data.py import cleanup** — Moved 15+ in-function imports to the top of the file: `numpy`, `re`, `sklearn.{TSNE, AgglomerativeClustering, normalize}`, `ietf_analyzer.{readiness, search}`. Fixed `json as _json` alias to use the already-imported `json`. sklearn imports inside try/except blocks (for graceful failure) were moved to top level since sklearn is a required dependency.
**Why**: The CLI had ~800 lines of pure boilerplate. The try/finally pattern was error-prone (easy to forget db.close()). Missing --dry-run on destructive commands made it risky to explore what a command would do. In-function imports in data.py were unnecessary since all dependencies are required.
**Result**: cli.py reduced by ~200 lines of boilerplate. 6 commands now have --dry-run. data.py has clean top-level imports. Both files pass syntax checks and the CLI loads correctly.
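The lifecycle pattern can be sketched without Click specifics (all names here are stand-ins; the real decorator pulls `cfg`/`db` from the Click context and `main()` registers cleanup via `ctx.call_on_close()`):

```python
from functools import wraps

class AppContext:
    """Stand-in for Click's context object (hypothetical names)."""
    def __init__(self, cfg, db):
        self.cfg, self.db = cfg, db
        self._closers = []
    def call_on_close(self, fn):
        self._closers.append(fn)
    def close(self):
        for fn in self._closers:
            fn()

def pass_cfg_db(f):
    """Inject cfg and db from the shared context, replacing per-command
    try/finally boilerplate. Sketch only; the real decorator uses Click."""
    @wraps(f)
    def wrapper(ctx, *args, **kwargs):
        return f(ctx.cfg, ctx.db, *args, **kwargs)
    return wrapper

@pass_cfg_db
def list_drafts(cfg, db):
    # A real command would query the db here.
    return (cfg, db)

class _FakeDB:
    closed = False
    def close(self):
        self.closed = True

# main() builds the context once and registers cleanup for all commands.
ctx = AppContext(cfg={"debug": True}, db=_FakeDB())
ctx.call_on_close(ctx.db.close)
result = list_drafts(ctx)
ctx.close()
```

The key design point is that `db.close()` runs exactly once, at context teardown, rather than in ~40 per-command `finally` blocks.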
---
### 2026-03-08 CODER — Critical fixes: rating clamp, convergence command, blog number correction
**What**: Three fixes addressing data integrity and reproducibility: