Complete remaining medium/low issues: performance, CLI, types, CI, tests

Performance:
- Batch readiness computation (~200 queries → ~6 per page)
- Batch draft lookup in author network (N+1 → single query)
- File-based similarity matrix cache (.npy + metadata sidecar)
- 5-minute TTL embedding cache for search queries

CLI quality:
- Add pass_cfg_db decorator, convert ~30 commands to shared config/db lifecycle
- Add --dry-run to analyze, embed, embed-ideas, ideas, gaps commands
- Move 15+ in-function imports to top of data.py

Types & documentation:
- Add 16 TypedDicts to data.py, annotate 12 function return types
- Add ethics section to Post 06 (premature standardization, power asymmetry)
- Add EU AI Act Article 43 conformity mapping to Post 06
- Add NIS2 and CRA references to Post 04

CI & testing:
- Add GitHub Actions CI workflow (Python 3.11+3.12, ruff, pytest)
- Add API documentation for all 20 endpoints (data/reports/api-docs.md)
- Add 41 new tests (test_analyzer.py, test_search.py); 64 total pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 14:06:54 +01:00
parent e7527ad68e
commit 20c45a7eba
14 changed files with 2305 additions and 1238 deletions
--- a/data/reports/api-docs.md
+++ b/data/reports/api-docs.md
@@ -0,0 +1,359 @@
+# IETF Draft Analyzer — API Documentation
+
+All API endpoints return JSON by default. Several support `?format=csv` for CSV export.
+
+Base URL: `http://localhost:5000`
+
+---
+
+## Public Endpoints
+
+### GET /api/stats
+
+Overview statistics for the entire corpus.
+
+**Parameters:** None
+
+**Response:**
+```json
+{
+  "total_drafts": 361,
+  "rated_drafts": 260,
+  "total_authors": 403,
+  "total_ideas": 1262,
+  "total_gaps": 12,
+  "avg_score": 3.42
+}
+```
+
+---
+
+### GET /api/drafts
+
+Paginated, filterable list of drafts.
+
+**Parameters:**
+| Param | Type | Default | Description |
+|-------|------|---------|-------------|
+| `page` | int | 1 | Page number |
+| `q` | string | "" | Full-text search query |
+| `cat` | string | "" | Filter by category |
+| `source` | string | "" | Filter by source (ietf, w3c) |
+| `min_score` | float | 0.0 | Minimum composite score |
+| `sort` | string | "score" | Sort field |
+| `dir` | string | "desc" | Sort direction (asc/desc) |
+| `format` | string | "json" | Response format: "json" or "csv" |
+
+**Response:** JSON object with `drafts` array and pagination metadata.
+
+---
+
+### GET /api/drafts/{name}
+
+Detail for a single draft including rating, authors, ideas, and references.
+
+**Parameters:**
+| Param | Type | Description |
+|-------|------|-------------|
+| `name` | string | Draft name, e.g. `draft-ietf-ai-agent-protocol` |
+
+**Response:** JSON object with full draft detail, or `{"error": "Draft not found"}` (404).
+
+---
+
+### GET /api/categories
+
+Category names and draft counts.
+
+**Parameters:**
+| Param | Type | Default | Description |
+|-------|------|---------|-------------|
+| `format` | string | "json" | "json" or "csv" |
+
+**Response:**
+```json
+{
+  "A2A protocols": 45,
+  "AI safety/alignment": 38,
+  ...
+}
+```
+
+---
+
+### GET /api/ratings
+
+Rating distributions across the corpus.
+
+**Parameters:**
+| Param | Type | Default | Description |
+|-------|------|---------|-------------|
+| `format` | string | "json" | "json" or "csv" |
+
+**Response:** JSON object with arrays: `names`, `scores`, `novelty`, `maturity`, `overlap`, `momentum`, `relevance`, `categories`.
+
+---
+
+### GET /api/timeline
+
+Timeline data showing draft publication over time.
+
+**Parameters:** None
+
+**Response:** JSON object with timeline series data.
+
+---
+
+### GET /api/landscape
+
+2D t-SNE projection of the draft embeddings, one point per draft.
+
+**Parameters:**
+| Param | Type | Default | Description |
+|-------|------|---------|-------------|
+| `format` | string | "json" | "json" or "csv" |
+
+**Response:** JSON array of `{name, x, y, category, score}` points.
+
+---
+
+### GET /api/similarity
+
+Draft similarity network graph.
+
+**Parameters:** None
+
+**Response:** JSON object with `nodes` and `edges` arrays for a force-directed graph.
+
+---
+
+### GET /api/idea-clusters
+
+Clustered ideas across drafts.
+
+**Parameters:** None
+
+**Response:** JSON object with cluster data.
+
+---
+
+### GET /api/ideas
+
+All extracted technical ideas, grouped by type.
+
+**Parameters:**
+| Param | Type | Default | Description |
+|-------|------|---------|-------------|
+| `format` | string | "json" | "json" or "csv" |
+
+**Response:** JSON object with `ideas` array.
+
+---
+
+### GET /api/authors/network
+
+Author collaboration network graph.
+
+**Parameters:** None
+
+**Response:** JSON object with `nodes` and `edges` arrays.
+
+---
+
+### GET /api/citations
+
+Citation/reference graph between drafts.
+
+**Parameters:**
+| Param | Type | Default | Description |
+|-------|------|---------|-------------|
+| `min_refs` | int | 2 | Minimum references to include a node |
+
+**Response:** JSON object with citation graph data.
+
+---
+
+### GET /api/search
+
+Global search across drafts, ideas, authors, and gaps.
+
+**Parameters:**
+| Param | Type | Default | Description |
+|-------|------|---------|-------------|
+| `q` | string | "" | Search query (required for results) |
+| `format` | string | "json" | "json" or "csv" |
+
+**Response:**
+```json
+{
+  "drafts": [...],
+  "ideas": [...],
+  "authors": [...],
+  "gaps": [...]
+}
+```
+
+---
+
+### POST /api/ask
+
+Search-only question answering (free, no Claude API call). Returns relevant sources and any cached answer.
+
+**Request body:**
+```json
+{
+  "question": "What drafts address agent authentication?",
+  "top_k": 5
+}
+```
+
+**Response:** JSON with `sources` array and optional cached `answer`.
+
+---
+
+## Admin-Only Endpoints
+
+These endpoints require admin mode, enabled with the `--dev` flag. In production mode they return 403.
+
+### POST /api/ask/synthesize
+
+Synthesize an answer using Claude (costs tokens, rate-limited to 10 req/min/IP). Answers are cached permanently.
+
+**Auth:** Admin required
+
+**Request body:**
+```json
+{
+  "question": "How do IETF drafts approach agent identity?",
+  "top_k": 5
+}
+```
+
+**Response:** JSON with `sources` array and synthesized `answer`.
+
+**Errors:** 429 if rate-limited.
+
+---
+
+### GET /api/gaps
+
+All identified standardization gaps.
+
+**Auth:** Admin required
+
+**Parameters:**
+| Param | Type | Default | Description |
+|-------|------|---------|-------------|
+| `format` | string | "json" | "json" or "csv" |
+
+**Response:** JSON array of gap objects.
+
+---
+
+### GET /api/gaps/{gap_id}
+
+Detail for a single gap.
+
+**Auth:** Admin required
+
+**Parameters:**
+| Param | Type | Description |
+|-------|------|-------------|
+| `gap_id` | int | Gap ID |
+
+**Response:** JSON object with gap detail, or `{"error": "Gap not found"}` (404).
+
+---
+
+### POST /api/compare
+
+Compare multiple drafts using Claude (costs tokens, rate-limited to 10 req/min/IP).
+
+**Auth:** Admin required
+
+**Request body:**
+```json
+{
+  "drafts": ["draft-name-one", "draft-name-two"]
+}
+```
+
+**Response:**
+```json
+{
+  "text": "Comparison analysis text...",
+  "drafts": ["draft-name-one", "draft-name-two"]
+}
+```
+
+**Errors:** 400 if fewer than 2 drafts provided; 429 if rate-limited.
+
+---
+
+### POST /api/drafts/{name}/annotate
+
+Add or update annotations (notes, tags) for a draft.
+
+**Auth:** Admin required
+
+**Request body:**
+```json
+{
+  "note": "Interesting approach to agent handshake",
+  "tags": ["important", "review"],
+  "add_tag": "flagged",
+  "remove_tag": "review"
+}
+```
+
+All fields are optional. `add_tag`/`remove_tag` operate on existing tags incrementally.
+
+**Response:**
+```json
+{
+  "success": true,
+  "annotation": {"note": "...", "tags": ["important", "flagged"]}
+}
+```
+
+---
+
+### GET /api/monitor
+
+Pipeline monitoring status (processing progress, error counts).
+
+**Auth:** Admin required
+
+**Response:** JSON object with monitoring data.
+
+---
+
+## Non-API Data Endpoints
+
+### GET /export/obsidian
+
+Download the entire research corpus as an Obsidian vault ZIP file.
+
+**Response:** `application/zip` file download.
+
+---
+
+## Authentication
+
+- **Production mode** (default): Admin endpoints return 403.
+- **Development mode** (`--dev` flag): All admin endpoints are accessible without authentication.
+- Rate-limited endpoints (`/api/ask/synthesize`, `/api/compare`): 10 requests per minute per IP, enforced via in-memory sliding window.
+
+## Error Responses
+
+All errors return JSON:
+```json
+{"error": "Description of the error"}
+```
+
+Common HTTP status codes:
+- `400` — Bad request (missing parameters)
+- `403` — Admin access required
+- `404` — Resource not found
+- `429` — Rate limit exceeded
+- `500` — Internal server error
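
Note on the rate limiting documented above: the Authentication section says `/api/ask/synthesize` and `/api/compare` are limited to 10 requests per minute per IP via an in-memory sliding window. The commit's actual limiter is not part of this hunk, so the following is only a minimal sketch of that technique. The class name `SlidingWindowLimiter`, its parameters, and the injectable `clock` are hypothetical and chosen for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most `limit` requests per `window` seconds per key."""

    def __init__(
        self,
        limit: int = 10,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        # The clock is injectable so the behaviour can be checked without sleeping.
        self.clock = clock
        # One deque of request timestamps per key (for example, the client IP).
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            # Over the limit: the caller would answer with HTTP 429.
            return False
        hits.append(now)
        return True
```

In this sketch a handler would call `limiter.allow(client_ip)` before doing any work and return the documented `429` error JSON when it yields `False`. Rejected requests are not recorded, so a client that keeps retrying does not extend its own lockout; whether the real implementation makes the same choice is not stated in the docs. State lives in process memory, so it resets on restart and is not shared between worker processes.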