ietf-draft-analyzer/data/reports/architecture-assessment-2026-03-09.md
Christian Nennemann 61cdab16b9 fix: dev mode auth regression from blueprint refactor
The _initialized singleton in auth.py prevented hooks from registering
on the correct app instance when create_app() was called twice (once
eagerly at import, once from __main__). Removed the guard and made
the module-level app lazy. Also adds feature backlog and architecture
assessment from the review team.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 03:52:02 +01:00


IETF Draft Analyzer — Architectural Assessment

Date: 2026-03-09
Scope: Core source code analysis (src/, tests/)
Project Size: ~7.6 MB, 19,662 lines of Python


1. File Sizes and Complexity

God Files (largest, highest complexity risk)

| File | LOC | Severity | Issue |
| --- | --- | --- | --- |
| webui/data.py | 4,360 | HIGH | Service/data access layer doing too much |
| cli.py | 3,438 | MEDIUM | 96 functions, 40+ Click commands, hard to navigate |
| reports.py | 2,739 | MEDIUM | Single Reporter class with many report generation methods |
| db.py | 1,690 | MEDIUM | 100+ methods, schema + CRUD + business logic mixed |

Healthy Modules (< 500 LOC, focused)

  • models.py (104 LOC) — Domain models only
  • config.py (108 LOC) — Configuration with env overrides
  • embeddings.py (205 LOC) — Ollama embedding wrapper
  • authors.py (137 LOC) — Author network fetching
  • fetcher.py (204 LOC) — Datatracker API client

2. Module Boundaries: Core vs. Web

Clean Separation ✓

  • Core layer (src/ietf_analyzer/): Self-contained, no Flask dependencies
  • Web layer (src/webui/): Depends on core, not vice-versa
  • No circular imports detected

Problem: webui/data.py Violates Single Responsibility

What it does:

  1. Wraps Database (4,360 lines!)
  2. Implements domain logic (clustering, readiness scoring, similarity graphs)
  3. Prepares data for JSON/Jinja2 serialization
  4. Defines TypedDicts for response shapes
  5. Calls sklearn for TSNE/hierarchical clustering
  6. Builds visualization data (radar, histogram, network graphs)

Risk: Only test_web_data.py exercises this module — domain logic is hard to regression-test while it is mixed with the presentation layer.
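As a sketch of the extraction direction recommended later in this report, domain logic such as readiness scoring can be pulled into a pure function that both the CLI and the web layer call. All names and weights below are hypothetical, not taken from webui/data.py:

```python
from dataclasses import dataclass

# Hypothetical extraction: readiness scoring as a pure domain object,
# independent of Flask and of JSON/Jinja2 serialization.
@dataclass
class ReadinessScore:
    draft_name: str
    score: float  # 0.0 to 1.0

def score_readiness(citations: int, revisions: int, wg_adopted: bool,
                    draft_name: str = "") -> ReadinessScore:
    """Toy weighting for illustration only; the real rule lives in webui/data.py."""
    score = min(1.0, 0.02 * citations + 0.05 * revisions)
    if wg_adopted:
        score = min(1.0, score + 0.3)
    return ReadinessScore(draft_name=draft_name, score=round(score, 2))

def readiness_to_json(r: ReadinessScore) -> dict:
    """The presentation layer shrinks to serialization only."""
    return {"draft": r.draft_name, "readiness": r.score}
```

Once the logic is a pure function, it can be unit-tested without a Flask app or database fixture.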


3. Flask Structure (app.py)

Routes Count

  • 72 functions (includes helpers)
  • ~40+ @app.route() handlers
  • No blueprints — monolithic Flask app

Route Categories

  1. Overview pages (5) — /, /landscape, /timeline, /idea-clusters, /ratings
  2. Detail pages (6) — /drafts, /drafts/<name>, /gaps, /gaps/<id>, etc.
  3. Feature pages (8) — /search, /ask, /compare, /monitor, /admin/analytics, etc.
  4. API endpoints (20+) — /api/drafts, /api/stats, /api/search, /api/ask, etc.
  5. Helpers (5+) — auth, rate limiting, CSV export, DB context

Issues

| Issue | Effort | Impact |
| --- | --- | --- |
| No blueprint organization — pages, APIs, and admin concerns mixed in one file | SMALL | Makes navigation hard |
| Tight coupling to data.py — 50 imports from data.py | SMALL | Hard to refactor data layer |
| Mixed JSON/HTML rendering — some routes render both based on the Accept header | SMALL | Should be separate APIs |
| Admin functions inline — /admin/analytics uses the @admin_required decorator | SMALL | Should be a separate blueprint |

Recommendation: Split into 4 blueprints:

  • blueprints/pages.py — HTML pages
  • blueprints/api.py — JSON endpoints
  • blueprints/admin.py — Admin routes
  • blueprints/helpers.py — Shared utilities
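A minimal sketch of the blueprint wiring, assuming a hypothetical create_app() factory and a placeholder route (the real handler bodies stay in app.py until migrated):

```python
from flask import Blueprint, Flask, jsonify

# Hypothetical layout: each suggested blueprints/*.py file would define one of these.
api = Blueprint("api", __name__, url_prefix="/api")

@api.route("/stats")
def stats():
    # Placeholder payload; the real handler would query webui/data.py.
    return jsonify({"drafts": 0})

def create_app() -> Flask:
    app = Flask(__name__)
    app.register_blueprint(api)  # pages, admin, helpers registered the same way
    return app
```

Because each blueprint is self-contained, route groups can be migrated one at a time without touching the others.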

4. Database Layer (db.py)

Structure: Single Database Class

100+ methods doing:

  1. Schema definition — SCHEMA constant, ensure_tables()
  2. CRUD operations — add_draft(), update_draft(), get_draft(), delete_draft()
  3. Bulk operations — add_drafts(), update_ratings()
  4. Complex queries — get_drafts_by_category(), search_fts(), most_cited(), co_authors()
  5. Business logic — rating aggregations, clustering, similarity ranking
  6. Cache management — llm_cache table operations
  7. Stats — count_drafts(), count_by_source(), aggregations

Issues

| Issue | Evidence | Refactor Effort |
| --- | --- | --- |
| Mixed concerns | Methods scattered: schema, CRUD, queries, business logic | LARGE |
| No transaction support | add_draft() does 3+ INSERT statements without an explicit transaction | MEDIUM |
| Hard to unit test | Database class touches 8+ tables; each needs fixtures | MEDIUM |
| Tight coupling to models | Direct Author, Draft, Rating dataclass dependencies | SMALL |
| No query builders | Raw SQL in 20+ methods (injection risk if not careful) | MEDIUM |
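The missing transaction support can be addressed with a small context manager around sqlite3, so the multiple INSERTs of a hypothetical add_draft() succeed or fail as a unit. This sketch uses toy tables, not the project's schema:

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def transaction(conn: sqlite3.Connection):
    """Commit all enclosed statements together, or roll everything back on error."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise

# Toy schema standing in for the real drafts/authors tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drafts (name TEXT)")
conn.execute("CREATE TABLE authors (draft TEXT, author TEXT)")

# The related INSERTs now run atomically.
with transaction(conn):
    conn.execute("INSERT INTO drafts VALUES ('draft-x-00')")
    conn.execute("INSERT INTO authors VALUES ('draft-x-00', 'Alice')")
```

If any statement inside the `with` block raises, the rollback leaves the database exactly as it was before the block started.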

Refactoring Path (proposed)

```
# Current: db.Database (100+ methods)
#
# Refactored:
# - db.Schema — @dataclass fixtures, schema def (10 methods)
# - db.Repository — CRUD base class (15 methods)
# - db.DraftRepository, AuthorRepository, etc. — domain-specific CRUD
# - db.Queries — Complex queries as static methods or separate class
# - db.Cache — LLM cache operations (10 methods)
```
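The Repository split above could start from a minimal base class; the API below (add, get, count) is hypothetical and only illustrates the shape:

```python
import sqlite3
from typing import Optional

class Repository:
    """Minimal base class: a shared connection plus generic helpers."""
    table = ""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def count(self) -> int:
        return self.conn.execute(f"SELECT COUNT(*) FROM {self.table}").fetchone()[0]

class DraftRepository(Repository):
    """Domain-specific CRUD; RatingRepository, AuthorRepository follow the same shape."""
    table = "drafts"

    def add(self, name: str) -> None:
        self.conn.execute("INSERT INTO drafts (name) VALUES (?)", (name,))

    def get(self, name: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT name FROM drafts WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None
```

Each repository can then be unit-tested against a single table instead of fixtures for all 8+.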

Effort: LARGE (4+ hours)
Benefit: Testability, reusability, transaction support, easier migrations


5. Pipeline Architecture (pipeline/)

Structure: Modular design ✓

  • context.py — ContextBuilder (domain logic for draft generation)
  • generator.py — DraftGenerator (Claude-based content generation)
  • family.py — Family/relationships logic
  • formatter.py — Output formatting
  • quality.py — Quality checks
  • prompts.py — System prompts
  • PROMPTS constant — Shared across modules

Assessment

  • Good separation — each module has a single responsibility ✓
  • Testable — pure functions + dependency injection via Config/Database ✓
  • Extensible — can add new stages without touching existing code ✓

No refactoring needed for pipeline itself.


6. Sources Architecture (sources/)

Structure: Plugin pattern ✓

  • base.py — SourceDocument dataclass, SourceFetcher protocol
  • ietf.py, w3c.py, etsi.py, itu.py, iso.py, nist.py — Concrete fetchers

Assessment

  • Excellent separation — base protocol + concrete implementations ✓
  • Testable — mock fetchers are easy to create ✓
  • Extensible — a new source = a new file, no changes to the orchestrator ✓
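The pattern can be sketched with typing.Protocol; the field names on SourceDocument here are assumptions, not base.py's actual definition:

```python
from dataclasses import dataclass
from typing import Protocol

# Assumed shapes; base.py's actual fields may differ.
@dataclass
class SourceDocument:
    source: str
    doc_id: str
    title: str

class SourceFetcher(Protocol):
    name: str
    def fetch(self) -> list[SourceDocument]: ...

class MockFetcher:
    """Test double: satisfies the protocol without any HTTP."""
    name = "mock"

    def fetch(self) -> list[SourceDocument]:
        return [SourceDocument(source="mock", doc_id="d1", title="Example Spec")]

def fetch_all(fetchers: list[SourceFetcher]) -> list[SourceDocument]:
    # The orchestrator stays unchanged when a new source is added.
    return [doc for f in fetchers for doc in f.fetch()]
```

Because the protocol is structural, MockFetcher needs no inheritance, which is what makes test doubles cheap here.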

No refactoring needed.


7. Config Management (config.py)

Structure

  • Single Config dataclass (40 fields)
  • load() class method with env var override support
  • save() method to persist to JSON
  • Validation in _validate()

Assessment

  • Clean — single responsibility ✓
  • Testable — no I/O except file read/write ✓
  • Env support — _ENV_OVERRIDES dict maps env vars to config fields ✓

Minor issue: Could use structured logging of which env vars override config (currently silent).
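A possible fix for the silent overrides, assuming a hypothetical apply_env_overrides() helper and a two-entry stand-in for the _ENV_OVERRIDES mapping:

```python
import logging
import os

logger = logging.getLogger("config")

# Hypothetical two-entry subset standing in for config.py's _ENV_OVERRIDES.
_ENV_OVERRIDES = {"IETF_DB_PATH": "db_path", "IETF_OLLAMA_URL": "ollama_url"}

def apply_env_overrides(values: dict) -> dict:
    """Apply env var overrides and log which ones took effect."""
    applied = []
    for env_var, field in _ENV_OVERRIDES.items():
        if env_var in os.environ:
            values[field] = os.environ[env_var]
            applied.append(env_var)
    if applied:
        logger.info("config fields overridden by env: %s", ", ".join(applied))
    return values
```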

No refactoring needed unless config grows beyond 50 fields.


8. CLI Structure (cli.py)

Count: 96 functions

Command groups:

  • fetch — Datatracker + multi-source fetching
  • classify — Ollama-based pre-filtering
  • list, search, show, annotate — Draft browsing
  • analyze — Claude analysis (rate, ideas, gaps)
  • ask, compare — Interactive queries
  • embed, embed-ideas — Ollama embeddings
  • similar, clusters — Embedding-based search
  • report (group) → overview, landscape, digest, timeline, etc.
  • monitor — Background pipeline automation
  • pipeline, pipeline-status, pipeline-auto-heal — Orchestration
  • observatory — Multi-source dashboard
  • readiness — Release readiness analysis
  • export — Generate drafts from gaps
  • web — Flask app launcher

Issues

| Issue | Severity | Solution |
| --- | --- | --- |
| 96 functions in one file | MEDIUM | Hard to navigate; split into subcommand groups or files |
| Long help text inline | LOW | Could use .help-txt files for docstrings |
| Late imports | LOW | Some imports live inside @pass_cfg_db functions to save startup time (OK pattern) |
| Global console object | LOW | OK for Click, allows colorized output |
Proposed split:

```
# cli.py (main entry, 200 lines)
# └─ cli/fetching.py (fetch, classify) — 400 lines
# └─ cli/analysis.py (analyze, ask, compare) — 600 lines
# └─ cli/reporting.py (report *, export, observatory) — 800 lines
# └─ cli/admin.py (monitor, pipeline, web) — 400 lines
```

Effort: SMALL (1 hour)
Benefit: Easier to navigate, faster to find commands, clearer testing boundaries.


9. Circular Dependencies

Check Results

✓ No circular imports detected

  • cli.py → db, config, models
  • app.py → webui.data, webui.auth, ietf_analyzer.*
  • webui.data → ietf_analyzer.db, ietf_analyzer.config
  • db.py → models, config
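The check can be kept honest in CI with a small cycle detector over a hand-maintained import graph (module names taken from the edges listed above, simplified):

```python
# Hand-maintained import graph: module -> modules it imports.
IMPORT_GRAPH = {
    "cli": ["db", "config", "models"],
    "app": ["webui.data", "webui.auth"],
    "webui.data": ["db", "config"],
    "webui.auth": [],
    "db": ["models", "config"],
    "models": [],
    "config": [],
}

def has_cycle(graph: dict) -> bool:
    """Depth-first search; a node seen while still 'visiting' is a back edge."""
    visiting: set = set()
    done: set = set()

    def visit(node: str) -> bool:
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: cycle found
        visiting.add(node)
        if any(visit(dep) for dep in graph.get(node, [])):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in graph)
```

A one-line assertion in the test suite (`assert not has_cycle(IMPORT_GRAPH)`) would flag a future regression the moment a cycle is introduced.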

10. Test Structure

Test files: 8 modules, including:

  • test_db.py — Database operations
  • test_analyzer.py — Claude analysis
  • test_search.py — Similarity + FTS
  • test_web_data.py — Data layer for web
  • test_models.py — Domain models
  • test_obsidian_export.py — Export

Coverage gaps:

  • No tests for cli.py (big commands, hard to test without mocking db)
  • No tests for app.py routes (would need Flask test client + fixtures)
  • No tests for pipeline modules (context, generator, family)
  • No tests for sources (would need HTTP mocks)
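Route tests would follow the standard Flask test-client pattern; the create_app() factory and /api/drafts payload below are stand-ins for the real app, which would also need data-layer fixtures:

```python
from flask import Flask, jsonify

# Stand-in for webui.app; the real create_app() also wires data.py and auth.
def create_app() -> Flask:
    app = Flask(__name__)

    @app.route("/api/drafts")
    def api_drafts():
        return jsonify([])  # the real route returns draft summaries

    return app

# Flask's built-in test client needs no running server.
client = create_app().test_client()
response = client.get("/api/drafts")
```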

Effort to add 30% CLI coverage: MEDIUM (2-3 hours)


Summary: Refactoring Roadmap

| Priority | Module | Issue | Effort | Benefit |
| --- | --- | --- | --- | --- |
| HIGH | webui/data.py | 4,360 LOC, mixed concerns (CRUD + domain logic + presentation) | LARGE | Separates presentation from domain, enables better testing |
| MEDIUM | db.py | 100+ methods, mixed schema/CRUD/logic | LARGE | Testability, transaction support, query builders |
| MEDIUM | cli.py | 96 functions, hard to navigate | SMALL | Better organization, easier to find commands |
| MEDIUM | app.py | 40+ routes, no blueprints | SMALL | Clearer structure, easier to refactor |
| LOW | Config | 40 fields, working well | NONE | Monitor for growth beyond 50 fields |
| LOW | Pipeline/Sources | Well-structured, testable | NONE | No changes needed |

Recommendations (Priority Order)

1. Extract webui/data.py → presentation + domain (4 hours)

```
webui/
├── data.py (current 4,360 → 1,000) — Only JSON serialization
└── services/
    ├── drafts.py (200) — Draft filtering/sorting logic
    ├── analytics.py (400) — Dashboard stats + visualizations
    ├── search.py (300) — Search + clustering
    └── readiness.py (200) — Readiness scoring
```

Cost: 4 hours
Benefit: Can reuse analytics for CLI reports, easier to test

2. Refactor db.py → Repository pattern (4 hours)

```
db/
├── __init__.py (exports Database facade)
├── schema.py (100) — Schema definition
├── repository.py (200) — Base CRUD class
├── drafts.py (300) — DraftRepository (get, add, update, delete drafts)
├── ratings.py (200) — RatingRepository
├── authors.py (150) — AuthorRepository
├── queries.py (400) — Complex queries (search, similarity, aggregations)
└── cache.py (150) — LLM cache operations
```

Cost: 4 hours
Benefit: 80% reduction in method count per class, transaction support, testability

3. Split cli.py into subcommand groups (1 hour)

```
cli/
├── __init__.py (main entry, ~200 lines)
├── fetching.py (fetch, classify)
├── analysis.py (analyze, ask, compare)
├── reporting.py (report *, export, observatory)
└── admin.py (monitor, pipeline, web)
```

Cost: 1 hour
Benefit: Easier to navigate, clearer boundaries, faster to find commands

4. Convert Flask app to blueprints (1.5 hours)

```
webui/
├── app.py (core Flask setup, ~100 lines)
└── blueprints/
    ├── pages.py (HTML routes: /, /drafts, /landscape, etc.)
    ├── api.py (JSON endpoints: /api/*)
    ├── admin.py (/admin/*)
    └── helpers.py (rate limiting, auth, CSV export)
```

Cost: 1.5 hours
Benefit: Clearer separation of concerns, easier to add/remove features

5. Add CLI tests (3 hours)

  • Mock database for each command
  • Test success paths + error cases
  • Quick smoke tests for all 15+ major commands
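With Click, smoke tests need no subprocess: click.testing.CliRunner invokes commands in-process and captures output. The show command below is a stub standing in for a real cli.py command, which would additionally need a mocked database:

```python
import click
from click.testing import CliRunner

# Stub standing in for one of cli.py's 40+ commands; the real one hits the DB.
@click.command()
@click.argument("name")
def show(name: str) -> None:
    """Print a one-line draft summary."""
    click.echo(f"draft: {name}")

runner = CliRunner()
result = runner.invoke(show, ["draft-example-00"])
```

Checking `result.exit_code` and `result.output` per command gives the quick smoke coverage described above.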

Cost: 3 hours
Benefit: Catch regressions early, safe refactoring


Current State Assessment

| Dimension | Score | Notes |
| --- | --- | --- |
| Modularity | 7/10 | Good separation of concerns; core vs. web clean; webui/data.py is the weak point |
| Testability | 6/10 | DB layer hard to unit test; pipeline/sources good; no CLI tests |
| Maintainability | 6/10 | Many large files; well-documented; consistent patterns throughout |
| Extensibility | 8/10 | Plugin pattern for sources; easy to add new reports; CLI is open-ended |
| Performance | 8/10 | Caching, FTS5, lazy imports in CLI; no N+1 queries detected |

Overall: 7/10 — Production-ready but showing signs of technical debt accumulation.

The project is well-organized at a high level but needs refactoring of large monolithic files to stay maintainable as it grows. The webui/data.py and db.py files should be prioritized first.