# IETF Draft Analyzer — Architectural Assessment

Date: 2026-03-09
Scope: Core source code analysis (src/, tests/)
Project Size: ~7.6 MB, 19,662 lines of Python
## 1. File Sizes and Complexity

### God Files (largest, highest complexity risk)

| File | LOC | Severity | Issue |
|---|---|---|---|
| `webui/data.py` | 4,360 | HIGH | Service/data access layer doing too much |
| `cli.py` | 3,438 | MEDIUM | 96 functions, 40+ Click commands, hard to navigate |
| `reports.py` | 2,739 | MEDIUM | Single Reporter class with many report generation methods |
| `db.py` | 1,690 | MEDIUM | 100+ methods, schema + CRUD + business logic mixed |
### Healthy Modules (< 500 LOC, focused)

- `models.py` (104 LOC) — Domain models only
- `config.py` (108 LOC) — Configuration with env overrides
- `embeddings.py` (205 LOC) — Ollama embedding wrapper
- `authors.py` (137 LOC) — Author network fetching
- `fetcher.py` (204 LOC) — Datatracker API client
## 2. Module Boundaries: Core vs. Web

### Clean Separation ✓

- Core layer (`src/ietf_analyzer/`): self-contained, no Flask dependencies
- Web layer (`src/webui/`): depends on core, not vice versa
- No circular imports detected
### Problem: `webui/data.py` Violates Single Responsibility

What it does:

- Wraps `Database` (4,360 lines!)
- Implements domain logic (clustering, readiness scoring, similarity graphs)
- Prepares data for JSON/Jinja2 serialization
- Defines TypedDicts for response shapes
- Calls `sklearn` for TSNE/hierarchical clustering
- Builds visualization data (radar, histogram, network graphs)

Risk: Only `test_web_data.py` exercises this module — domain logic is hard to regression-test while it is mixed with the presentation layer.
## 3. Flask Structure (app.py)

### Route Count

- 72 functions (includes helpers)
- 40+ `@app.route()` handlers
- No blueprints — monolithic Flask app
### Route Categories

- Overview pages (5) — `/`, `/landscape`, `/timeline`, `/idea-clusters`, `/ratings`
- Detail pages (6) — `/drafts`, `/drafts/<name>`, `/gaps`, `/gaps/<id>`, etc.
- Feature pages (8) — `/search`, `/ask`, `/compare`, `/monitor`, `/admin/analytics`, etc.
- API endpoints (20+) — `/api/drafts`, `/api/stats`, `/api/search`, `/api/ask`, etc.
- Helpers (5+) — auth, rate limiting, CSV export, DB context
### Issues

| Issue | Effort | Impact |
|---|---|---|
| No blueprint organization — mix of concerns (pages, APIs, admin) in one file | SMALL | Makes navigation hard |
| Tight coupling to `data.py` — 50 imports from `data.py` | SMALL | Hard to refactor data layer |
| Mixed JSON/HTML rendering — some routes render both based on Accept header | SMALL | Should be separate APIs |
| Admin functions inline — `/admin/analytics` uses `@admin_required` decorator | SMALL | Should be separate blueprint |
Recommendation: Split into 4 blueprints:
- `blueprints/pages.py` — HTML pages
- `blueprints/api.py` — JSON endpoints
- `blueprints/admin.py` — Admin routes
- `blueprints/helpers.py` — Shared utilities
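A minimal sketch of that split, with hypothetical route bodies standing in for the real handlers in app.py; only the registration pattern is the point:

```python
from flask import Blueprint, Flask, jsonify

# One blueprint per concern; names mirror the proposed blueprints/ modules.
pages = Blueprint("pages", __name__)
api = Blueprint("api", __name__, url_prefix="/api")
admin = Blueprint("admin", __name__, url_prefix="/admin")

@pages.route("/")
def index():
    return "overview page"  # placeholder for the real template render

@api.route("/drafts")
def list_drafts():
    return jsonify([])  # placeholder for the real draft list

@admin.route("/analytics")
def analytics():
    return "admin analytics"  # would sit behind @admin_required

def create_app() -> Flask:
    # app.py shrinks to setup plus registration.
    app = Flask(__name__)
    for bp in (pages, api, admin):
        app.register_blueprint(bp)
    return app
```

Each blueprint can then move into its own file under blueprints/ without touching the others.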
## 4. Database Layer (db.py)

### Structure: Single `Database` Class

100+ methods doing:

- Schema definition — `SCHEMA` constant, `ensure_tables()`
- CRUD operations — `add_draft()`, `update_draft()`, `get_draft()`, `delete_draft()`
- Bulk operations — `add_drafts()`, `update_ratings()`
- Complex queries — `get_drafts_by_category()`, `search_fts()`, `most_cited()`, `co_authors()`
- Business logic — rating aggregations, clustering, similarity ranking
- Cache management — `llm_cache` table operations
- Stats — `count_drafts()`, `count_by_source()`, aggregations
### Issues

| Issue | Evidence | Refactor Effort |
|---|---|---|
| Mixed concerns | Methods scattered: schema, CRUD, queries, business logic | LARGE |
| No transaction support | `add_draft()` does 3+ INSERT statements without an explicit transaction | MEDIUM |
| Hard to unit test | `Database` class touches 8+ tables; needs fixtures for each | MEDIUM |
| Tight coupling to models | Direct `Author`, `Draft`, `Rating` dataclass dependencies | SMALL |
| No query builders | Raw SQL in 20+ methods (injection risk if not careful) | MEDIUM |
### Refactoring Path (4-step)

```python
# Current: db.Database (100 methods)
#
# Refactored:
# - db.Schema — @dataclass fixtures, schema definition (10 methods)
# - db.Repository — CRUD base class (15 methods)
# - db.DraftRepository, db.AuthorRepository, etc. — domain-specific CRUD
# - db.Queries — complex queries as static methods or a separate class
# - db.Cache — LLM cache operations (10 methods)
```
Effort: LARGE (4+ hours)
Benefit: Testability, reusability, transaction support, easier migrations
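A sketch of the repository shape, using an invented two-column schema rather than the real `SCHEMA`:

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass
class Draft:
    name: str
    title: str

class Repository:
    # Base class: owns the connection; subclasses add domain-specific CRUD.
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

class DraftRepository(Repository):
    def add(self, draft: Draft) -> None:
        with self.conn:  # explicit transaction per write
            self.conn.execute(
                "INSERT INTO drafts(name, title) VALUES (?, ?)",
                (draft.name, draft.title),
            )

    def get(self, name: str) -> Optional[Draft]:
        row = self.conn.execute(
            "SELECT name, title FROM drafts WHERE name = ?", (name,)
        ).fetchone()
        return Draft(*row) if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drafts(name TEXT PRIMARY KEY, title TEXT)")
repo = DraftRepository(conn)
repo.add(Draft("draft-x-00", "Example"))
```

A `Database` facade in `db/__init__.py` could keep the current public API while delegating to the repositories, so callers migrate gradually.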
## 5. Pipeline Architecture (pipeline/)

### Structure: Modular design ✓

- `context.py` — ContextBuilder (domain logic for draft generation)
- `generator.py` — DraftGenerator (Claude-based content generation)
- `family.py` — Family/relationships logic
- `formatter.py` — Output formatting
- `quality.py` — Quality checks
- `prompts.py` — System prompts; `PROMPTS` constant shared across modules
### Assessment

✓ Good separation — each module has a single responsibility
✓ Testable — pure functions + dependency injection via Config/Database
✓ Extensible — new stages can be added without touching existing code
No refactoring needed for pipeline itself.
## 6. Sources Architecture (sources/)

### Structure: Plugin pattern ✓

- `base.py` — `SourceDocument` dataclass, `SourceFetcher` protocol
- `ietf.py`, `w3c.py`, `etsi.py`, `itu.py`, `iso.py`, `nist.py` — concrete fetchers
### Assessment

✓ Excellent separation — base protocol + concrete implementations
✓ Testable — mock fetchers are easy to create
✓ Extensible — a new source is a new file, with no changes to the orchestrator
No refactoring needed.
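The pattern described, sketched with `typing.Protocol`; the field and method names are guesses at the shape of `base.py`, not its actual definitions:

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol

@dataclass
class SourceDocument:
    source: str
    identifier: str
    title: str

class SourceFetcher(Protocol):
    # Structural typing: any class with a matching fetch() conforms,
    # no inheritance required.
    name: str
    def fetch(self) -> Iterable[SourceDocument]: ...

class FakeFetcher:
    # Adding a source means adding one class like this; the orchestrator
    # below never changes.
    name = "fake"
    def fetch(self) -> Iterable[SourceDocument]:
        yield SourceDocument("fake", "doc-1", "Example document")

def fetch_all(fetchers: Iterable[SourceFetcher]) -> List[SourceDocument]:
    return [doc for f in fetchers for doc in f.fetch()]

docs = fetch_all([FakeFetcher()])
```

This is also why mock fetchers are cheap: a test double like `FakeFetcher` needs no base class and no HTTP.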
## 7. Config Management (config.py)

### Structure

- Single `Config` dataclass (40 fields)
- `load()` class method with env var override support
- `save()` method to persist to JSON
- Validation in `_validate()`
### Assessment

✓ Clean — single responsibility
✓ Testable — no I/O except file read/write
✓ Env support — `_ENV_OVERRIDES` dict maps env vars to config fields
Minor issue: env var overrides are applied silently; structured logging of which variables override the config would aid debugging.
No refactoring needed unless config grows beyond 50 fields.
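That minor issue is cheap to fix: the override loop can log each hit as it applies it. A sketch with two stand-in fields and an invented env-var mapping (the real `Config` has ~40 fields and its own `_ENV_OVERRIDES`):

```python
import os
from dataclasses import dataclass

@dataclass
class Config:
    db_path: str = "analyzer.db"
    model: str = "local-model"

# Hypothetical env-var -> field mapping, mirroring the _ENV_OVERRIDES idea.
_ENV_OVERRIDES = {"ANALYZER_DB_PATH": "db_path", "ANALYZER_MODEL": "model"}

def load(env=None) -> Config:
    env = os.environ if env is None else env
    cfg = Config()
    for var, field in _ENV_OVERRIDES.items():
        if var in env:
            setattr(cfg, field, env[var])
            # No longer silent: record which env var won.
            print(f"config: {field} overridden by ${var}")
    return cfg

cfg = load({"ANALYZER_MODEL": "my-model"})
```

In the real module this would use structured logging rather than `print`.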
## 8. CLI Structure (cli.py)

Count: 96 functions

Command groups:

- `fetch` — Datatracker + multi-source fetching
- `classify` — Ollama-based pre-filtering
- `list`, `search`, `show`, `annotate` — Draft browsing
- `analyze` — Claude analysis (rate, ideas, gaps)
- `ask`, `compare` — Interactive queries
- `embed`, `embed-ideas` — Ollama embeddings
- `similar`, `clusters` — Embedding-based search
- `report` (group) → `overview`, `landscape`, `digest`, `timeline`, etc.
- `monitor` — Background pipeline automation
- `pipeline`, `pipeline-status`, `pipeline-auto-heal` — Orchestration
- `observatory` — Multi-source dashboard
- `readiness` — Release readiness analysis
- `export` — Generate drafts from gaps
- `web` — Flask app launcher
### Issues
| Issue | Severity | Solution |
|---|---|---|
| 96 functions in one file | MEDIUM | Hard to navigate, should split into subcommands or files |
| Long help text inline | LOW | Could use .help-txt files for docstrings |
| Late imports | LOW | Some imports inside @pass_cfg_db functions to save startup time (OK pattern) |
| Global console object | LOW | OK for Click, allows colorized output |
### Recommended Split (5 files)

```python
# cli.py (main entry, 200 lines)
# ├─ cli/fetching.py (fetch, classify) — 400 lines
# ├─ cli/analysis.py (analyze, ask, compare) — 600 lines
# ├─ cli/reporting.py (report *, export, observatory) — 800 lines
# └─ cli/admin.py (monitor, pipeline, web) — 400 lines
```
Effort: SMALL (1 hour)
Benefit: Easier to navigate, faster to find commands, clearer testing boundaries.
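A sketch of the wiring this split implies, with two toy groups (the commands and echo strings are illustrative, not the project's):

```python
import click

# Would live in cli/fetching.py:
@click.group()
def fetching():
    """Fetch and classify drafts."""

@fetching.command()
def fetch():
    click.echo("fetching drafts")

# Would live in cli/reporting.py:
@click.group()
def reporting():
    """Generate reports."""

@reporting.command()
def overview():
    click.echo("overview report")

# cli/__init__.py: the main entry point only registers the groups.
@click.group()
def cli():
    """IETF Draft Analyzer CLI."""

cli.add_command(fetching)
cli.add_command(reporting)
```

Nesting changes invocation to e.g. `fetching fetch`; to keep the current flat command names, register each command on the top-level group instead of grouping them.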
## 9. Circular Dependencies

### Check Results

✓ No circular imports detected

- `cli.py` → `db`, `config`, `models` ✓
- `app.py` → `webui.data`, `webui.auth`, `ietf_analyzer.*` ✓
- `webui.data` → `ietf_analyzer.db`, `ietf_analyzer.config` ✓
- `db.py` → `models`, `config` ✓
## 10. Test Structure

Test files: 8 modules, including:

- `test_db.py` — Database operations
- `test_analyzer.py` — Claude analysis
- `test_search.py` — Similarity + FTS
- `test_web_data.py` — Data layer for web
- `test_models.py` — Domain models
- `test_obsidian_export.py` — Export
Coverage gaps:

- No tests for `cli.py` (big commands, hard to test without mocking the db)
- No tests for `app.py` routes (would need a Flask test client + fixtures)
- No tests for pipeline modules (context, generator, family)
- No tests for sources (would need HTTP mocks)
Effort to add 30% CLI coverage: MEDIUM (2-3 hours)
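For the `app.py` gap, Flask's built-in test client needs no running server. A sketch against a stand-in app factory with an invented `/api/stats` payload; a real test would import the project's actual factory and point it at a fixture database:

```python
from flask import Flask, jsonify

def create_app() -> Flask:
    # Stand-in for the real application factory.
    app = Flask(__name__)

    @app.route("/api/stats")
    def stats():
        return jsonify({"drafts": 0})  # illustrative payload

    return app

def test_stats_route():
    client = create_app().test_client()
    resp = client.get("/api/stats")
    assert resp.status_code == 200
    assert resp.get_json() == {"drafts": 0}

test_stats_route()
```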
## Summary: Refactoring Roadmap

| Priority | Module | Issue | Effort | Benefit |
|---|---|---|---|---|
| HIGH | `webui/data.py` | 4,360 LOC, mixed concerns (CRUD + domain logic + presentation) | LARGE | Separates presentation from domain, enables better testing |
| MEDIUM | `db.py` | 100+ methods, mixed schema/CRUD/logic | LARGE | Testability, transaction support, query builders |
| MEDIUM | `cli.py` | 96 functions, hard to navigate | SMALL | Better organization, easier to find commands |
| MEDIUM | `app.py` | 40+ routes, no blueprints | SMALL | Clearer structure, easier to refactor |
| LOW | Config | 40 fields, working well | NONE | Monitor for growth beyond 50 fields |
| LOW | Pipeline/Sources | Well-structured, testable | NONE | No changes needed |
## Recommendations (Priority Order)

### 1. Extract webui/data.py → presentation + domain (4 hours)

```
webui/
├── data.py (current 4,360 → 1,000) — only JSON serialization
└── services/
    ├── drafts.py (200) — draft filtering/sorting logic
    ├── analytics.py (400) — dashboard stats + visualizations
    ├── search.py (300) — search + clustering
    └── readiness.py (200) — readiness scoring
```
Cost: 4 hours
Benefit: Can reuse analytics for CLI reports; easier to test
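The shape of the split: services return plain domain objects, and data.py only serializes. A sketch with invented names (`DraftSummary` and `top_rated` are illustrations, not existing code):

```python
from dataclasses import asdict, dataclass
from typing import List

# webui/services/drafts.py (hypothetical): pure domain logic,
# no Flask and no JSON — reusable from CLI reports too.
@dataclass
class DraftSummary:
    name: str
    rating: float

def top_rated(summaries: List[DraftSummary], limit: int = 3) -> List[DraftSummary]:
    return sorted(summaries, key=lambda d: d.rating, reverse=True)[:limit]

# webui/data.py (after the split): only shapes domain objects for JSON/Jinja2.
def serialize(summaries: List[DraftSummary]) -> List[dict]:
    return [asdict(s) for s in summaries]

rows = [DraftSummary("a", 2.0), DraftSummary("b", 4.5), DraftSummary("c", 3.1)]
payload = serialize(top_rated(rows, limit=2))
```

The service function is trivially unit-testable without a Flask context, which is the regression-testing win the assessment calls out.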
### 2. Refactor db.py → Repository pattern (4 hours)

```
db/
├── __init__.py (exports Database facade)
├── schema.py (100) — schema definition
├── repository.py (200) — base CRUD class
├── drafts.py (300) — DraftRepository (get, add, update, delete drafts)
├── ratings.py (200) — RatingRepository
├── authors.py (150) — AuthorRepository
├── queries.py (400) — complex queries (search, similarity, aggregations)
└── cache.py (150) — LLM cache operations
```
Cost: 4 hours
Benefit: 80% reduction in method count per class, transaction support, testability
### 3. Split cli.py into subcommand groups (1 hour)

```
cli/
├── __init__.py (main entry, ~200 lines)
├── fetching.py (fetch, classify)
├── analysis.py (analyze, ask, compare)
├── reporting.py (report *, export, observatory)
└── admin.py (monitor, pipeline, web)
```
Cost: 1 hour
Benefit: Easier to navigate, clearer boundaries, faster to find commands
### 4. Convert Flask app to blueprints (1.5 hours)

```
webui/
├── app.py (core Flask setup, ~100 lines)
└── blueprints/
    ├── pages.py (HTML routes: /, /drafts, /landscape, etc.)
    ├── api.py (JSON endpoints: /api/*)
    ├── admin.py (/admin/*)
    └── helpers.py (rate limiting, auth, CSV export)
```
Cost: 1.5 hours
Benefit: Clearer separation of concerns, easier to add/remove features
### 5. Add CLI tests (3 hours)

- Mock the database for each command
- Test success paths + error cases
- Quick smoke tests for all 15+ major commands
Cost: 3 hours
Benefit: Catch regressions early, safe refactoring
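Click ships a `CliRunner` for exactly this. A sketch against a toy command; real tests would invoke the project's commands with a mocked database injected instead:

```python
import click
from click.testing import CliRunner

@click.command()
@click.argument("name")
def show(name):
    # Toy stand-in for a real command; a project test would patch the
    # Database dependency before invoking.
    if not name.startswith("draft-"):
        raise click.BadParameter("expected a draft name")
    click.echo(f"showing {name}")

runner = CliRunner()
ok = runner.invoke(show, ["draft-example-00"])    # success path
bad = runner.invoke(show, ["nope"])               # error case
```

`invoke()` captures output and exit codes in-process, so smoke tests across all the major commands stay fast.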
## Current State Assessment
| Dimension | Score | Notes |
|---|---|---|
| Modularity | 7/10 | Good separation of concerns; core vs. web clean; webui/data.py is the weak point |
| Testability | 6/10 | DB layer hard to unit test; pipeline/sources good; no CLI tests |
| Maintainability | 6/10 | Many large files; well-documented; consistent patterns throughout |
| Extensibility | 8/10 | Plugin pattern for sources; easy to add new reports; CLI is open-ended |
| Performance | 8/10 | Caching, FTS5, lazy imports in CLI; no N+1 queries detected |
Overall: 7/10 — Production-ready but showing signs of technical debt accumulation.
The project is well-organized at a high level, but its large monolithic files need refactoring to stay maintainable as it grows. `webui/data.py` and `db.py` should be tackled first.