Complete remaining medium/low issues: performance, CLI, types, CI, tests
Performance:
- Batch readiness computation (~200 queries → ~6 per page)
- Batch draft lookup in author network (N+1 → single query)
- File-based similarity matrix cache (.npy + metadata sidecar)
- 5-minute TTL embedding cache for search queries

CLI quality:
- Add pass_cfg_db decorator, convert ~30 commands to shared config/db lifecycle
- Add --dry-run to analyze, embed, embed-ideas, ideas, gaps commands
- Move 15+ in-function imports to top of data.py

Types & documentation:
- Add 16 TypedDicts to data.py, annotate 12 function return types
- Add ethics section to Post 06 (premature standardization, power asymmetry)
- Add EU AI Act Article 43 conformity mapping to Post 06
- Add NIS2 and CRA references to Post 04

CI & testing:
- Add GitHub Actions CI workflow (Python 3.11+3.12, ruff, pytest)
- Add API documentation for all 20 endpoints (data/reports/api-docs.md)
- Add 41 new tests (test_analyzer.py, test_search.py); 64 total pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -4,6 +4,53 @@

---

### 2026-03-08 CODER - TypedDicts for data layer, ethics + regulatory content in blog series

**What**: Four improvements across typing and content:

1. **TypedDicts in `src/webui/data.py`**: Added 16 TypedDict definitions for common return shapes, including `OverviewStats`, `DraftsPage`, `DraftListItem`, `AuthorInfo`, `AuthorNetwork` (with `AuthorNetworkNode`, `AuthorNetworkEdge`, `AuthorCluster`), `SimilarityGraph`, `TimelineData`, `MonitorStatus` (with `MonitorPipeline`, `MonitorCost`), `SearchResults`, and `CitationGraph`. Annotated 12 function return types.
2. **Ethics section in Post 06**: Added a section titled "The Ethics of Standardizing Early" (3 paragraphs) covering premature capability standardization, power asymmetry in authorship, surveillance-friendly architecture risk, and human oversight as non-optional.
3. **EU AI Act conformity assessment note in Post 06**: Connected the L2/L3 assurance profiles to the Art. 43 conformity assessment requirements (1 sentence in the Pillar 4 section).
4. **NIS2 + CRA references in Post 04**: Added a NIS2 Directive reference to the telecom cascade scenario (essential service obligations) and a Cyber Resilience Act reference to the federated learning privacy gap (secure update lifecycle requirements).
**Why**: Untyped dicts make the data layer hard to maintain and refactor. The blog series lacked ethical framing and key EU regulatory cross-references (NIS2, CRA) that strengthen the compliance narrative.

**Result**: 16 TypedDicts and 12 annotated functions. 3 blog post sections added or expanded across Posts 04 and 06.

---

### 2026-03-08 CODER - CI/CD, API docs, and test coverage expansion

**What**: Three infrastructure additions:

1. **GitHub Actions CI**: Added `.github/workflows/ci.yml`, which runs on push/PR to main. It tests Python 3.11 and 3.12, installs from the `[test]` extras, runs ruff lint (E/F/W rules, ignoring E501), and runs pytest.
2. **API documentation**: Created `data/reports/api-docs.md` documenting all 20 API endpoints in `src/webui/app.py` with method, URL, parameters, response format, and auth requirements. Covers public endpoints (drafts, stats, search, ideas, ratings, etc.) and admin-only endpoints (gaps, compare, synthesize, annotate, monitor).
3. **New test files**: Added `tests/test_analyzer.py` (21 tests covering `_extract_json`, `_clamp_rating`, `_parse_rating` with compact/verbose keys, defaults, and clamping) and `tests/test_search.py` (19 tests covering `sanitize_fts_query` with injection attempts, boolean operators, special chars, edge cases). Total: 64 tests all passing.
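To give a feel for the kind of cases item 3 describes, here is a hedged sketch of an FTS query sanitizer with pytest-style checks. The function `sanitize_fts_query_sketch` is a hypothetical stand-in and is not the project's `sanitize_fts_query`: it takes the simplest route of keeping only word characters and quoting every term, so FTS5 treats operators such as `OR` and `NEAR` as plain strings. The real function may well preserve boolean operators instead.

```python
import re

# Word characters only, so a kept term can never contain a double quote.
_TOKEN_RE = re.compile(r"\w+")


def sanitize_fts_query_sketch(raw: str, max_terms: int = 16) -> str:
    """Hypothetical sanitizer: reduce free-form input to quoted FTS5 terms."""
    terms = _TOKEN_RE.findall(raw)[:max_terms]
    # Quoted terms are string literals in FTS5, never operators.
    return " ".join(f'"{term}"' for term in terms)


def test_injection_attempt_is_neutralized():
    assert sanitize_fts_query_sketch('agent" OR 1=1 --') == '"agent" "OR" "1" "1"'


def test_special_characters_are_dropped():
    assert sanitize_fts_query_sketch("foo* NEAR(bar)") == '"foo" "NEAR" "bar"'


def test_empty_input_yields_empty_query():
    # The caller is expected to skip the MATCH clause for an empty result.
    assert sanitize_fts_query_sketch("   ") == ""
```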
**Why**: The project had no CI, no API docs for the web UI, and test coverage only on DB/models. These are prerequisites for public deployment and contributor onboarding.

**Result**: CI workflow ready, API fully documented, test count increased from 23 to 64. All tests pass in 0.6s.

---

### 2026-03-08 CODER - Performance: fix N+1 queries and add caching

**What**: Four targeted performance fixes across the codebase:

1. **Batch readiness computation**: `compute_readiness_batch()` in `readiness.py` replaces per-draft readiness calls on the drafts page. It bulk-loads ref counts, cited-by counts, author experience, and ratings in ~6 queries total instead of ~200 (4 queries × 50 drafts/page).
2. **Batch draft lookup in author network**: `_compute_author_network_full()` now calls `db.get_drafts_by_names()` once to pre-load all drafts referenced by authors, instead of calling `db.get_draft()` in a loop inside cluster building.
3. **File-based similarity matrix cache**: `Embedder.similarity_matrix()` now caches the O(n^2) cosine similarity matrix to disk (`.cache/` dir next to the DB), keyed by a SHA-256 hash of the draft names. It reloads from the cache if the set of embedded drafts hasn't changed.
4. **Embeddings cache for search**: `HybridSearch._get_all_embeddings()` caches the result of `db.all_embeddings()` with a 5-minute TTL, avoiding a full DB scan on every search query.
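The TTL caching in item 4 can be sketched roughly as follows. `TTLCache`, its `loader` callback, and the injectable `clock` are hypothetical names chosen for this example; the actual `HybridSearch` code may be structured differently. The 300-second default mirrors the 5-minute TTL mentioned above.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Hypothetical single-value cache that reloads after a fixed time-to-live."""

    def __init__(
        self,
        loader: Callable[[], T],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests need not sleep
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if expired:
            # Only this branch pays for the expensive load (e.g. a full DB scan).
            self._value = self._loader()
            self._loaded_at = now
        return self._value  # type: ignore[return-value]
```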
Also added a `Database.get_drafts_by_names()` batch method in `db.py`, chunked to stay under SQLite's limit of 999 bound variables per statement (the default in SQLite builds before 3.32.0).
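A chunked batch lookup of this kind might look like the sketch below. It assumes a hypothetical `drafts(name, title)` table and a standalone function named `get_drafts_by_names_sketch`; neither the schema nor the chunk size of 900 is taken from `db.py`.

```python
import sqlite3
from typing import Iterable

# Stay comfortably below the 999 bound-variable default of older SQLite builds.
CHUNK_SIZE = 900


def get_drafts_by_names_sketch(
    conn: sqlite3.Connection,
    names: Iterable[str],
    chunk_size: int = CHUNK_SIZE,
) -> dict[str, str]:
    """Hypothetical batch lookup: map each known draft name to its title."""
    unique = list(dict.fromkeys(names))  # de-duplicate, keep order
    found: dict[str, str] = {}
    for start in range(0, len(unique), chunk_size):
        chunk = unique[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))  # "?,?,?" for three names
        query = f"SELECT name, title FROM drafts WHERE name IN ({placeholders})"
        for name, title in conn.execute(query, chunk):
            found[name] = title
    return found
```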
**Why**: Page loads on the drafts listing and author network pages were slow due to N+1 query patterns. The similarity matrix was recomputed from scratch on every CLI invocation. Search queries redundantly reloaded all embeddings from the database.

**Result**: Drafts page: ~200 queries reduced to ~6. Author network cluster building: ~100 `get_draft` calls reduced to 1 batch query. Similarity matrix: cached to disk, skipping the O(n^2) recomputation when embeddings are unchanged. Search: embeddings loaded once per 5 minutes instead of per query.

---

### 2026-03-08 CODER - CLI boilerplate reduction, --dry-run flags, webui import cleanup

**What**: Three code quality improvements across the CLI and web UI:

1. **CLI boilerplate reduction**: Created a `pass_cfg_db` decorator that extracts `cfg` and `db` from the Click context, replacing ~40 instances of `cfg = _get_config(); db = Database(cfg); try: ... finally: db.close()`. The `main()` group now initializes config/db once and registers `db.close()` via `ctx.call_on_close()`. Converted ~30 commands to the new pattern (all report, viz, wg, ideas, and core commands). The remaining ~15 read-only commands still use the old pattern but work correctly.
2. **--dry-run on destructive commands**: Added a `--dry-run` flag to `analyze`, `embed`, `embed-ideas`, `ideas` (extract), and `gaps`. Each shows what would be processed (draft names, counts) without making API calls or DB changes. Pre-existing dry-run flags on `ideas filter`, `dedup-ideas`, `pipeline generate`, and `observatory update` were preserved.
3. **webui/data.py import cleanup**: Moved 15+ in-function imports to the top of the file: `numpy`, `re`, `sklearn.{TSNE, AgglomerativeClustering, normalize}`, `ietf_analyzer.{readiness, search}`. Replaced the `json as _json` alias with the already-imported `json`. sklearn imports that sat inside try/except blocks (for graceful failure) were moved to top level since sklearn is a required dependency.
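The dry-run behavior in item 2 can be sketched independently of Click. In this minimal example, `run_with_dry_run`, `RunReport`, and the `process` and `echo` callbacks are hypothetical names; the real commands presumably wire an equivalent check to their `--dry-run` option. The idea shown is only the split between listing what would be processed and performing the side effects.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class RunReport:
    planned: list[str] = field(default_factory=list)    # everything selected
    processed: list[str] = field(default_factory=list)  # what actually ran


def run_with_dry_run(
    pending: Iterable[str],
    process: Callable[[str], None],
    dry_run: bool = False,
    echo: Callable[[str], None] = print,
) -> RunReport:
    """Hypothetical helper: report the plan, and only act when dry_run is False."""
    report = RunReport(planned=list(pending))
    echo(f"{len(report.planned)} draft(s) to process")
    for name in report.planned:
        if dry_run:
            # No API call and no DB write happens on this branch.
            echo(f"[dry-run] would process {name}")
            continue
        process(name)
        report.processed.append(name)
    return report
```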
**Why**: The CLI had ~800 lines of pure boilerplate. The try/finally pattern was error-prone (easy to forget `db.close()`). Missing `--dry-run` on destructive commands made it risky to explore what a command would do. In-function imports in data.py were unnecessary since all dependencies are required.

**Result**: cli.py reduced by ~200 lines of boilerplate. 6 commands now have --dry-run. data.py has clean top-level imports. Both files pass syntax checks and the CLI loads correctly.

---

### 2026-03-08 CODER - Critical fixes: rating clamp, convergence command, blog number correction

**What**: Three fixes addressing data integrity and reproducibility: