Platform upgrade: semantic search, citations, readiness, tests, Docker
Major features added by 5 parallel agent teams:

- Semantic "Ask" (NL queries via FTS5 + embeddings + Claude synthesis)
- Global search across drafts, ideas, authors, gaps
- REST API expansion (14 endpoints, up from 3) with CSV/JSON export
- Citation graph visualization (D3.js, 440 nodes, 2422 edges)
- Standards readiness scoring (0-100 composite from 6 factors)
- Side-by-side draft comparison view with shared/unique analysis
- Annotation system (notes + tags per draft, DB-persisted)
- Docker deployment (Dockerfile + docker-compose with Ollama)
- Scheduled updates (cron script with log rotation)
- Pipeline health dashboard (stage progress bars, cost tracking)
- Test suite foundation (54 pytest tests covering DB, models, web data)

Fixes: compare_drafts() stubbed→working, get_authors_for_draft() bug, source-aware analysis prompts, config env var overrides + validation, resilient batch error handling with --retry-failed, observatory --dry-run

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
### 2026-03-07 CODER C — Citation Graph, Readiness Scoring, Annotations, Data Surfacing
**What**: Implemented four features in a single session:
1. **Citation Graph Visualization** (`/citations`): D3.js force-directed graph showing cross-references between drafts and RFCs. Nodes colored by type (blue=draft, orange=RFC), sized by influence (in-degree). Includes category filter, min-refs slider, hover tooltips, click-to-navigate, and a top-referenced RFCs table. New `get_citation_graph()` in data.py, route + API endpoint in app.py.
2. **Standards Readiness Scoring**: New `readiness.py` module computing a 0-100 composite score from 6 weighted factors (WG adoption 25%, revision maturity 15%, reference density 15%, cited-by count 15%, author experience 15%, momentum rating 15%). Displayed as a progress gauge on draft detail pages, added as sortable column on drafts listing, and shown in `ietf show` CLI output.
3. **Annotation System**: New `annotations` table in DB schema with `upsert_annotation`, `get_annotation`, `get_all_annotations`, `search_by_tag` methods. New `ietf annotate` CLI command with `--note`, `--tag`, `--remove-tag` options. Web UI: inline note editor + tag chips with add/remove on draft detail page, backed by POST `/api/drafts/<name>/annotate` endpoint.
4. **Surface Underutilized Data**: Exposed `novelty_score` (from pipeline/quality.py) in ideas.html and draft_detail.html as color-coded N:X badges. Gap severity now sorts critical-first (was alphabetical). `all_ideas()` and `get_ideas_for_draft()` now return `novelty_score` field.
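As a sketch, the readiness composite in item 2 reduces to a weighted sum of normalized factors. The weights come from the entry above; the factor keys and the 0-1 normalization are illustrative assumptions, not the module's actual names:

```python
# Weights as described above (sum to 1.0); keys are assumed names.
WEIGHTS = {
    "wg_adoption": 0.25,
    "revision_maturity": 0.15,
    "reference_density": 0.15,
    "cited_by": 0.15,
    "author_experience": 0.15,
    "momentum": 0.15,
}

def readiness_score(factors: dict) -> int:
    """Each factor is a normalized 0.0-1.0 value; missing factors count as 0."""
    total = sum(WEIGHTS[k] * min(max(factors.get(k, 0.0), 0.0), 1.0)
                for k in WEIGHTS)
    return round(total * 100)
```

A draft with full WG adoption but nothing else would score 25, so the WG-adoption weight dominates as intended.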
**Why**: These features surface existing data (4231 refs, novelty scores, gap severity) that was computed but never shown to users. Readiness scoring gives an at-a-glance signal of how close a draft is to RFC status. Annotations let analysts layer their own notes and tags onto the data.
**Result**: 8 files modified (db.py, data.py, app.py, cli.py, base.html, draft_detail.html, ideas.html, drafts.html, gaps.html), 2 files created (readiness.py, citations.html). Citations link added to sidebar nav.
---
### 2026-03-06 CODER — Interactive D3.js Author Network Visualization
**What**: Replaced the Plotly spring-layout co-authorship graph on `/authors` with a full D3.js v7 force-directed network. Added enriched data layer (`get_author_network_full`) with avg draft scores per author, connected-component cluster detection (68 clusters found), and a new `/api/authors/network` JSON endpoint. Template now includes: interactive D3 force graph with zoom/pan/drag, org filter dropdown, cluster highlighting with zoom-to-fit, hover tooltips showing author details + draft list, click-to-navigate, plus the existing Plotly org bar chart, cross-org collaboration chart, sortable authors table (now top 50), and org stats sidebar.
- Decisions made: **GitHub Pages** for publication, **staggered 1/day** cadence, **MIT license**
- Agent utilization: Architect (2 tasks, shut down), Writer (2 tasks, shut down), Planner (1 task, shut down)
**Surprise**: The crash recovery was seamless — the dev journal served exactly its intended purpose. Every agent could read the journal and understand the full state without any human explanation. The journal-as-coordination-mechanism is the strongest vindication of the CLAUDE.md journaling requirement. This should feature prominently in Post 8.
### 2026-03-07 CODER E — W3C Integration, Docker, Scheduling, Pipeline Health
**What**: Four-part infrastructure sprint to make the platform multi-source, self-running, and deployable:
1. **W3C Integration** — Wired the existing W3C fetcher (`sources/w3c.py`) into the full pipeline. Made analysis prompts source-aware (`_doc_type_label()` returns "IETF draft" or "W3C specification" based on `source` field). Added source filter dropdown (IETF / W3C / All) to the Draft Explorer web UI with colored source badges (blue for IETF, green for W3C). Updated `get_drafts_page()` to accept `source` parameter. All pagination and sort links preserve source filter state. Config documents how to enable W3C: `ietf observatory update --source w3c` or add `"w3c"` to `observatory_sources` in config.json.
2. **Scheduled Updates** — Created `scripts/scheduled-update.sh` for cron-based automation. Handles .env loading, log rotation (30 days), and proper error exit codes. Usage: run `crontab -e` and add `0 6 * * * /path/to/scheduled-update.sh` for a daily 06:00 update.
3. **Docker Deployment** — Created `Dockerfile` (python:3.11-slim), `docker-compose.yml` (web + ollama services with volume mounts for data persistence), and `.dockerignore`. One-command deployment: `docker compose up`.
4. **Pipeline Health** — Enhanced `ietf pipeline status` to show comprehensive health: processing stage breakdown (rated/embedded/ideas with ASCII progress bars), total ideas, gaps, API token usage, estimated cost. Enhanced monitor web page with visual pipeline progress bars, cost tracking panel, and document/idea/gap counts. Added `--dry-run` flag to `ietf observatory update` that previews what would happen. Wrapped all observatory update steps in try/except for graceful error recovery — failures in one stage no longer block subsequent stages.
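The ASCII progress bars could look something like this sketch (the exact format of the real `ietf pipeline status` output is an assumption):

```python
def progress_bar(done: int, total: int, width: int = 20) -> str:
    """Render a fixed-width ASCII bar, e.g. for 'rated 1/4 drafts'."""
    frac = done / total if total else 0.0
    filled = int(frac * width)
    # '#' for completed portion, '.' for remaining, then percent and counts
    return f"[{'#' * filled}{'.' * (width - filled)}] {frac:>4.0%} ({done}/{total})"

print(progress_bar(1, 4))
print(progress_bar(4, 4))
```

Each pipeline stage (rated / embedded / ideas) would get one such line.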
**Why**: The platform was IETF-only despite having a complete W3C fetcher. Docker makes deployment reproducible. Scheduled updates make it self-running. Error recovery prevents partial failures from wasting an entire update cycle.
**Result**:
- Files modified: `analyzer.py`, `observatory.py`, `cli.py`, `config.py`, `data.py`, `app.py`, `drafts.html`, `monitor.html`
- Files created: `Dockerfile`, `docker-compose.yml`, `.dockerignore`, `scripts/scheduled-update.sh`
- All Python files compile cleanly
- No breaking changes to existing IETF-only workflows
---

**New file**: `data/reports/platform-improvement-plan.md` (182 lines)
# Platform Improvement Plan: IETF Draft Analyzer → Standards Intelligence Platform
*Generated 2026-03-07 — Based on full codebase audit and architectural analysis*
---
## Current State Summary
| Dimension | What Exists | Assessment |
|-----------|-------------|------------|
| **Data** | 434 drafts, 557 authors, 419 ideas, 11 gaps, 4231 refs to 694 RFCs | Strong foundation |
| **CLI** | 20+ commands (fetch, analyze, embed, ideas, gaps, report, viz, wg, etc.) | Feature-rich |
| **Web UI** | 16 Flask pages with D3.js/Plotly visualizations | Good but disconnected |
| **Pipeline** | Observatory class with multi-source support | Built but manual |
| **Multi-SDO** | IETF complete, W3C fetcher written but unused | Partially built |
| **Tests** | Zero tests | Critical gap |
| **Deployment** | Manual Python setup | No containerization |
## The Transformation
**From**: A powerful CLI analysis tool that an expert runs manually

**To**: A living intelligence platform that monitors, alerts, and answers questions
---
## Phase 1: Quick Wins (All Parallelizable)
### 1.1 Global Search [S: 3-5h]
Add a unified search bar to the web UI that queries across drafts (FTS5), ideas, authors, and gaps simultaneously. Currently search is isolated to the drafts page.
- **Files**: `src/webui/app.py` (new `/search` route), `src/webui/data.py` (new `global_search()`), `src/webui/templates/base.html` (search in sidebar), new `search_results.html`
- **Why first**: Most common user action on any data platform
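A toy sketch of the FTS5 side of such a search (the table name, columns, and rows are invented for illustration; the real `global_search()` would union results across drafts, ideas, authors, and gaps):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table: every column is full-text indexed
conn.execute("CREATE VIRTUAL TABLE drafts_fts USING fts5(name, abstract)")
conn.execute("INSERT INTO drafts_fts VALUES "
             "('draft-agent-auth-00', 'Authentication for autonomous agents')")
conn.execute("INSERT INTO drafts_fts VALUES "
             "('draft-http-cache-03', 'HTTP caching semantics')")

# MATCH does tokenized full-text search; ORDER BY rank sorts by relevance
rows = conn.execute(
    "SELECT name FROM drafts_fts WHERE drafts_fts MATCH ? ORDER BY rank",
    ("agents",),
).fetchall()
```

The unified `/search` route would run one such query per entity type and merge the result lists.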
### 1.2 REST API [S: 2-3h]
Expose all existing `data.py` functions as JSON endpoints. Only 3 API endpoints exist today — extend to all 15+ data views.
- **Files**: `src/webui/app.py` (add `/api/ideas`, `/api/gaps`, `/api/ratings`, `/api/timeline`, `/api/landscape`, `/api/similarity`, `/api/drafts/<name>`, etc.)
- **Why**: Enables programmatic access, third-party tools, decouples data from presentation
### 1.3 Export (CSV/JSON) [S: 3-4h]
Add export buttons to web UI pages + `ietf export` CLI command.
- **Files**: `src/ietf_analyzer/cli.py` (new `export` command), API endpoints get `?format=csv` support
- **Depends on**: 1.2
### 1.4 Annotation System [M: 4-6h]
Add private notes and custom tags per draft, persisted in DB.
- **Files**: `src/ietf_analyzer/db.py` (new `annotations` table), `src/webui/app.py` (POST endpoint), `src/webui/templates/draft_detail.html` (inline edit), `src/ietf_analyzer/cli.py` (new `annotate` command)
- **Why**: Analysts need to layer their own context onto the data
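A minimal sketch of the annotation persistence, assuming an `annotations` table keyed by draft name (column names are illustrative, not the project's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS annotations (
        draft_name TEXT PRIMARY KEY,
        note TEXT,
        tags TEXT DEFAULT ''
    )""")

def upsert_annotation(draft_name: str, note: str) -> None:
    # ON CONFLICT makes repeated saves update the existing row in place
    conn.execute("""
        INSERT INTO annotations (draft_name, note) VALUES (?, ?)
        ON CONFLICT(draft_name) DO UPDATE SET note = excluded.note
    """, (draft_name, note))

upsert_annotation("draft-example-00", "first pass")
upsert_annotation("draft-example-00", "revised")  # updates, doesn't duplicate
```

The POST endpoint and the `annotate` CLI command would both call the same upsert.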
### 1.5 Test Suite Foundation [M: 6-8h]
Create pytest infrastructure with ~30 tests covering DB layer, models, and web data functions.
- **Files**: new `tests/conftest.py`, `tests/test_db.py`, `tests/test_models.py`, `tests/test_data.py`, update `pyproject.toml`
- **Why**: Zero tests today. Every future change benefits from a safety net
---
## Phase 2: Core Platform
### 2.1 Semantic Search / "Ask" [M: 1-2 days] — THE SIGNATURE FEATURE
Natural language queries: "Which drafts address agent authentication?" → synthesized answer with citations.
- Embed the query via Ollama, compute cosine similarity against all 434 draft embeddings
- Merge with FTS5 keyword results using reciprocal rank fusion
- Optionally synthesize answer via Claude with top-K context
- **Files**: new `src/ietf_analyzer/search.py`, `src/ietf_analyzer/cli.py` (`ietf ask`), `src/webui/app.py` (`/ask` route), new template
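Reciprocal rank fusion itself is simple to sketch; `k=60` is the constant commonly used in the RRF literature, and the document names below are placeholders:

```python
def rrf_merge(keyword_ranked, vector_ranked, k=60):
    """Fuse two ranked lists: each doc scores sum of 1/(k + rank)."""
    scores = {}
    for ranking in (keyword_ranked, vector_ranked):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(
    ["draft-a", "draft-b", "draft-c"],   # FTS5 keyword order
    ["draft-b", "draft-d", "draft-a"],   # cosine-similarity order
)
```

Documents ranked well by both signals float to the top without any score normalization between the two systems, which is the main reason RRF is a good fit here.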
### 2.2 Competitive Landscape Mapping [M: 1-2 days]
Auto-detect and compare competing proposals in the same problem space.
- Group drafts by similarity clusters, enrich with rating comparisons and WG adoption status
- Show head-to-head comparisons: where they agree, where they diverge
- **Files**: new `src/ietf_analyzer/competition.py`, `src/webui/app.py` (`/competition` route), new template
### 2.3 Standards Readiness Scoring [M: 1 day]
Composite 0-100 "readiness" score: WG adoption, revision count, reference density, cited-by count, author track record.
- **Files**: new `src/ietf_analyzer/readiness.py`, update `src/webui/templates/draft_detail.html` (gauge chart), update drafts listing
### 2.4 Scheduled Updates + Pipeline Health [M: 1 day]
Cron-based auto-fetch using existing Observatory, plus monitoring dashboard.
- **Files**: new `scripts/scheduled-update.sh`, enhance `src/webui/templates/monitor.html` (stage breakdown, cost tracking, failure log)
### 2.5 Comparison View [M: 1 day]
Side-by-side comparison of 2+ drafts with rating radar overlay, shared/unique ideas, shared/unique references.
- **Files**: `src/webui/app.py` (`/compare?drafts=...`), `src/webui/data.py` (`get_comparison_data()`), new template, add checkboxes to drafts listing
---
## Phase 3: Intelligence
### 3.1 Trend Forecasting [L: 2-3 days]
Predict which areas will grow based on submission velocity, revision activity, WG adoption signals.
- Per-category momentum signals, linear/exponential extrapolation, "Hot/Cooling/Emerging" tags
- **Files**: new `src/ietf_analyzer/forecasting.py`, new web page `/trends`
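The extrapolation step can be sketched as an ordinary least-squares line over per-period counts (a pure-stdlib stand-in; the real module's momentum signals would be richer than a single linear fit):

```python
def linear_forecast(counts, steps_ahead=1):
    """Fit y = a + b*x over periods 0..n-1 and project steps_ahead forward."""
    n = len(counts)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(counts) / n
    # Least-squares slope and intercept
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, counts)) / \
            sum((x - x_mean) ** 2 for x in xs)
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)

next_month = linear_forecast([4, 6, 8, 10])  # perfectly linear history → 12.0
```

A positive slope would map to "Hot", a negative one to "Cooling", and a short but steep history to "Emerging".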
### 3.2 Change Detection & Diffing [L: 2-3 days]
Track what changed between draft revisions, summarize changes via Claude.
- New `draft_revisions` table to archive old versions
- Section-level diff with Claude-generated change summaries
- **Files**: new `src/ietf_analyzer/diff.py`, update `src/ietf_analyzer/db.py`, `src/ietf_analyzer/fetcher.py`
- **Depends on**: 2.4 (scheduled updates to catch new revisions)
### 3.3 Citation Graph [M: 1-2 days]
Visual dependency tree from the 4231 existing cross-references to 694 RFCs.
- D3.js force-directed graph (pattern from authors.html), PageRank-style influence scores
- **Files**: `src/webui/data.py` (`get_citation_graph()`), new template `citations.html`
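In-degree is the simplest influence measure for such a graph; PageRank would refine it by weighting citations from influential nodes more heavily. The edge pairs below are invented examples:

```python
from collections import Counter

# Edges run (citing draft, cited RFC); in-degree = times cited.
edges = [
    ("draft-x", "RFC 9110"),
    ("draft-y", "RFC 9110"),
    ("draft-y", "RFC 8446"),
]
in_degree = Counter(cited for _, cited in edges)
top = in_degree.most_common(1)  # → [("RFC 9110", 2)]
```

The D3.js side would map `in_degree` to node radius and feed `most_common()` into the top-referenced table.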
### 3.4 Newsletter Generation [M: 1-2 days]
Automated weekly/monthly digest with new drafts, significant changes, trend shifts.
- **Files**: new `src/ietf_analyzer/newsletter.py`, `src/ietf_analyzer/cli.py` (`ietf newsletter`)
- **Depends on**: 3.2 (for change content)
---
## Phase 4: Scale
### 4.1 Docker Deployment [S: 3-4h]
Dockerfile + docker-compose.yml with Flask app + Ollama.
### 4.2 Complete W3C Integration [M: 1 day]
Wire existing W3C fetcher into Observatory pipeline, make prompts source-aware, add source filter to web UI.
### 4.3 IEEE + 3GPP Sources [L: 3-5 days]
New source fetchers following `SourceFetcher` protocol. Depends on 4.2 validating the pipeline.
### 4.4 Cross-SDO Analysis [L: 2-3 days]
Compare work across standards bodies. Embedding similarity between IETF/W3C/IEEE specs, gap analysis for topics only one body covers.
### 4.5 Plugin Architecture [L: 2-3 days]
Formalize extension points for sources, analyzers, report types via Python entry points.
---
## Recommended Agent Team Assignments
### Immediate Sprint (Phase 1 — all in parallel)
| Agent | Items | Focus |
|-------|-------|-------|
| **Coder A** | 1.1 + 1.2 | Global search + REST API |
| **Coder B** | 1.4 + 1.3 | Annotations + Export |
| **Coder C** | 1.5 | Test suite foundation |
### Next Sprint (Phase 2 — mostly parallel)
| Agent | Items | Focus |
|-------|-------|-------|
| **Coder A** | 2.1 | Semantic search / Ask (signature feature) |
| **Coder B** | 2.2 + 2.5 | Competition mapping + Comparison view |
| **Coder C** | 2.3 + 2.4 | Readiness scoring + Pipeline health |
---
## Impact vs Effort Matrix
```
HIGH IMPACT
  |
  |  2.1 Ask            3.2 Diffing
  |  1.1 Search         3.1 Forecasting
  |  2.2 Competition    4.4 Cross-SDO
  |  2.5 Compare
  |  1.2 API
  |  2.3 Readiness      3.3 Citations
  |  1.4 Annotations    3.4 Newsletter
  |  1.3 Export         4.2 W3C
  |  1.5 Tests          4.1 Docker
  |                     4.5 Plugins
  |                     4.3 IEEE/3GPP
LOW IMPACT
  +------------------------------------------
  LOW EFFORT                     HIGH EFFORT
```