# quicprochat — Sprint Plan
> 7 sprints synthesized from code audit, architecture analysis, and ecosystem research.
> Each sprint is ~1 week. Sprints are ordered by priority and dependency.
---
## Sprint 1 — Bug Fixes & Code Quality (Quick Wins)
Fix all known bugs, clippy warnings, and dead code before building on top.
- [x] **1.1 Fix boolean logic bug in TUI**
- `crates/quicprochat-client/src/client/v2_tui.rs:832` — remove `|| true`
- Cursor positioning always executes regardless of input state
- [x] **1.2 Fix unwrap violations in P2P router**
- `crates/quicprochat-p2p/src/routing.rs:416,419`: `.lock().unwrap()` on a `Mutex`
- Replace with `.expect("lock poisoned")` or proper error handling
- [x] **1.3 Remove placeholder assertion in WebTransport**
- `crates/quicprochat-server/src/webtransport.rs:418`: `assert!(true);`
- [x] **1.4 Wire up unused metrics**
- `record_storage_latency()` — instrument storage layer calls
- `record_uptime_seconds()` — add periodic heartbeat task in server main loop
- [x] **1.5 Wire up or remove unused config fields**
- `EffectiveConfig::webtransport_listen` — connect to WebTransport listener
- `EffectiveConfig::rpc_timeout_secs` — apply as per-RPC deadline
- `EffectiveConfig::storage_timeout_secs` — apply as DB query timeout
- [x] **1.6 Fix remaining clippy warnings**
- Reduce function arity (2 functions with 8-9 args → use config/param structs)
- Remove useless `format!()` call
- Collapse nested conditionals
- Rename `from_str` method to avoid `FromStr` trait confusion
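Item 1.2 offers two fixes. A minimal sketch of the "proper error handling" option follows. `Router` and `RouteError` are hypothetical stand-ins, not the actual types in `routing.rs`. Lock acquisition goes through a single helper, so a poisoned lock becomes a typed error the caller can handle instead of a panic.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, MutexGuard};

/// Hypothetical router error; the real crate's error enum will differ.
#[derive(Debug, PartialEq, Eq)]
pub enum RouteError {
    /// A thread panicked while holding the routing-table lock.
    LockPoisoned,
}

/// Hypothetical stand-in for the P2P routing table: peer id -> next hop.
#[derive(Default)]
pub struct Router {
    routes: Mutex<HashMap<String, String>>,
}

impl Router {
    /// Single place that takes the lock, mapping poisoning to an error
    /// instead of panicking the way `.lock().unwrap()` does.
    fn table(&self) -> Result<MutexGuard<'_, HashMap<String, String>>, RouteError> {
        self.routes.lock().map_err(|_| RouteError::LockPoisoned)
    }

    pub fn insert(&self, peer: &str, next_hop: &str) -> Result<(), RouteError> {
        self.table()?.insert(peer.to_owned(), next_hop.to_owned());
        Ok(())
    }

    pub fn next_hop(&self, peer: &str) -> Result<Option<String>, RouteError> {
        Ok(self.table()?.get(peer).cloned())
    }
}
```

The `.expect("lock poisoned")` option is shorter but still panics. The right choice depends on whether callers can do anything useful after another thread has panicked mid-update.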
---
## Sprint 2 — OpenMLS 0.5 → 0.8 Migration
**CRITICAL**: OpenMLS 0.7.2 and later include security patches that 0.5 lacks. Staying on 0.5 is a risk.
- [x] **2.1 Migrate StorageProvider trait**
- Old `OpenMlsKeyStore` → new `StorageProvider` (most invasive change)
- Rework `DiskKeyStore` integration (must keep bincode serialization)
- Update all `group.rs` calls that interact with the key store
- [x] **2.2 Update MLS API calls**
- `self_update()` / `propose_self_update()` — add `LeafNodeParameters` arg
- `join_by_external_commit()` — add optional LeafNode params
- `Sender::NewMember` → split into `NewMemberProposal` / `NewMemberCommit`
- [x] **2.3 Handle GREASE support**
- New variants in `ProposalType`, `ExtensionType`, `CredentialType`
- Update match arms to handle unknown/GREASE values
- [x] **2.4 Update AAD handling**
- AAD is no longer persisted; set it before every API call that generates an `MlsMessageOut`
- [x] **2.5 Verify FIPS 203 alignment**
- Confirm ML-KEM-768 parameters match final FIPS 203 (not draft)
- Review hybrid KEM against RFC 9794 combination methods
- [x] **2.6 Full test suite pass**
- All 301 tests must pass with OpenMLS 0.8
- Run crypto benchmarks to check for performance regressions
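For item 2.3, RFC 9420 reserves GREASE values (0x0A0A, 0x1A1A, and so on up to 0xEAEA) in several MLS registries, including extension, proposal, and credential types, so that peers exercise their unknown-value handling. The following is a minimal sketch of match arms that tolerate these values. It uses a hypothetical `ExtKind` enum and helper functions rather than the OpenMLS `ExtensionType` API, whose variants are not reproduced here.

```rust
/// Hypothetical classification of a 16-bit MLS extension type.
/// Codepoints 0x0001 and 0x0002 are `application_id` and `ratchet_tree` in RFC 9420.
#[derive(Debug, PartialEq, Eq)]
pub enum ExtKind {
    ApplicationId,
    RatchetTree,
    /// Reserved GREASE value; handled the same way as any unknown value.
    Grease(u16),
    /// Any other value this build does not understand.
    Unknown(u16),
}

/// True for the RFC 9420 GREASE values 0x0A0A, 0x1A1A, ..., 0xEAEA.
pub fn is_grease(v: u16) -> bool {
    let [hi, lo] = v.to_be_bytes();
    hi == lo && (lo & 0x0F) == 0x0A && lo != 0xFA
}

pub fn classify(v: u16) -> ExtKind {
    match v {
        0x0001 => ExtKind::ApplicationId,
        0x0002 => ExtKind::RatchetTree,
        v if is_grease(v) => ExtKind::Grease(v),
        v => ExtKind::Unknown(v),
    }
}

/// Whether this build should act on the extension. GREASE and unknown
/// values fall through to "skip" instead of an error or a panic.
pub fn should_process(v: u16) -> bool {
    match classify(v) {
        ExtKind::ApplicationId | ExtKind::RatchetTree => true,
        ExtKind::Grease(_) | ExtKind::Unknown(_) => false,
    }
}
```

Whether a non-GREASE unknown value should be skipped or rejected depends on where it appears. For example, a capability the group requires cannot simply be skipped, so the `Unknown` arm is the place to make that decision explicitly.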
---
## Sprint 3 — Client Resilience
Currently, network glitches cause the client to hang. This blocks the v2 launch.
- [x] **3.1 Auto-reconnect with backoff**
- Integrate existing `retry.rs` into `RpcClient::call()` path
- Exponential backoff with jitter (already implemented but not yet wired in)
- Configurable max retries and backoff ceiling
- [x] **3.2 Push subscription recovery**
- Detect broken push stream and re-subscribe automatically
- Buffer missed events during reconnection window
- [x] **3.3 Heartbeat / keepalive**
- Periodic QUIC ping in TUI and REPL modes
- Detect dead connections before user notices
- [x] **3.4 SDK disconnect lifecycle**
- Add `QpcClient::disconnect()` for clean shutdown
- Proper state machine: Connected → Reconnecting → Disconnected
- [x] **3.5 Connection status UI**
- TUI: show connection state in status bar (Connected / Reconnecting / Offline)
- REPL: print status change notifications
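The backoff in item 3.1 can be sketched as "full jitter", where each delay is drawn uniformly from zero up to `min(cap, base * 2^attempt)`. This is a minimal sketch, not the existing `retry.rs`, and the function names are hypothetical. The random sample and the sleep are injected by the caller so the logic stays deterministic and testable. In the client, the sample would presumably come from an RNG and the sleep from the async runtime.

```rust
use std::time::Duration;

/// "Full jitter" exponential backoff: a delay drawn from
/// [0, min(cap, base * 2^attempt)]. `jitter` is a uniform sample in [0, 1]
/// supplied by the caller (e.g. from an RNG).
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    // Saturate rather than overflow for large attempt counts.
    let ceiling = base.saturating_mul(2u32.saturating_pow(attempt)).min(cap);
    ceiling.mul_f64(jitter.clamp(0.0, 1.0))
}

/// Runs `op`, retrying up to `max_retries` extra times with backoff between
/// attempts. `sleep` is injected so tests (or an async runtime) control waiting.
pub fn retry_with_backoff<T, E>(
    max_retries: u32,
    base: Duration,
    cap: Duration,
    mut jitter: impl FnMut() -> f64,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                sleep(backoff_delay(attempt, base, cap, jitter()));
                attempt += 1;
            }
        }
    }
}
```

Full jitter spreads reconnect attempts from many clients across the whole window. This avoids synchronized retry storms after a server restart, which is the scenario the "kill server, restart" test in 5.4 exercises.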
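Items 3.4 and 3.5 describe a Connected → Reconnecting → Disconnected state machine with a status label for the UI. The following is a minimal pure-function sketch. It assumes hypothetical `ConnState` and `ConnEvent` types and a retry budget after which the client gives up. The SDK's actual types may be structured differently.

```rust
/// Hypothetical client connection states (item 3.4).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnState {
    Connected,
    Reconnecting { attempt: u32 },
    Disconnected,
}

/// Hypothetical events fed into the state machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnEvent {
    LinkLost,
    ReconnectFailed,
    ReconnectSucceeded,
    /// An explicit `disconnect()` call from the application.
    UserDisconnect,
}

/// Pure transition function: easy to unit-test and to drive from the TUI or REPL.
pub fn next_state(state: ConnState, event: ConnEvent, max_retries: u32) -> ConnState {
    use ConnEvent::*;
    use ConnState::*;
    match (state, event) {
        // An explicit disconnect always wins and is terminal.
        (_, UserDisconnect) => Disconnected,
        (Connected, LinkLost) => Reconnecting { attempt: 0 },
        (Reconnecting { .. }, ReconnectSucceeded) => Connected,
        // Give up once the retry budget is spent.
        (Reconnecting { attempt }, ReconnectFailed) if attempt + 1 >= max_retries => Disconnected,
        (Reconnecting { attempt }, ReconnectFailed) => Reconnecting { attempt: attempt + 1 },
        // Anything else (e.g. LinkLost while already reconnecting) is a no-op.
        (s, _) => s,
    }
}

/// Status-bar text matching the labels proposed in item 3.5.
pub fn status_label(state: ConnState) -> &'static str {
    match state {
        ConnState::Connected => "Connected",
        ConnState::Reconnecting { .. } => "Reconnecting",
        ConnState::Disconnected => "Offline",
    }
}
```

Keeping transitions in one pure function means the TUI status bar and REPL notifications can both react to the same state changes without duplicating the rules.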
---
## Sprint 4 — Server Hardening
Fix graceful shutdown and wire up timeouts for production readiness.
- [x] **4.1 In-flight RPC tracking**
- Replace fixed 30s shutdown delay with actual in-flight RPC counter
- Drain when counter reaches zero (with configurable max wait)
- [x] **4.2 Apply request-level timeouts**
- Wire `rpc_timeout_secs` config into per-RPC deadline enforcement
- Wire `storage_timeout_secs` into DB query timeouts
- Cancel long-running operations cleanly
- [x] **4.3 Plugin shutdown hooks**
- Add `on_shutdown` hook to `HookVTable`
- Call plugin shutdown before server exits
- [x] **4.4 Federation drain during shutdown**
- Stop accepting federation relay requests on SIGTERM
- Wait for in-flight federation RPCs before exit
- [x] **4.5 Connection draining improvements**
- Send QUIC CONNECTION_CLOSE with application reason
- WebTransport: send close frame before dropping sessions
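Item 4.1's counter can be sketched as an RAII guard held for the lifetime of each RPC, plus a condition variable that wakes the shutdown path when the count reaches zero. This is a minimal std-only sketch with hypothetical names (`InFlight`, `RpcGuard`). An async server would use its runtime's equivalents, and refusing new RPCs once shutdown begins is not shown.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Hypothetical in-flight RPC tracker used to drain the server on shutdown.
#[derive(Default)]
pub struct InFlight {
    count: Mutex<usize>,
    idle: Condvar,
}

/// Held for the lifetime of one RPC; dropping it (normal return, early `?`,
/// or an unwinding panic) decrements the counter.
pub struct RpcGuard {
    tracker: Arc<InFlight>,
}

impl InFlight {
    /// Call when an RPC starts; keep the guard alive until it finishes.
    /// The counter is a plain number, so a poisoned lock is safely recovered.
    pub fn begin(tracker: &Arc<InFlight>) -> RpcGuard {
        *tracker.count.lock().unwrap_or_else(|e| e.into_inner()) += 1;
        RpcGuard { tracker: Arc::clone(tracker) }
    }

    pub fn current(&self) -> usize {
        *self.count.lock().unwrap_or_else(|e| e.into_inner())
    }

    /// Waits until no RPCs are in flight or `max_wait` elapses.
    /// Returns `true` if the server drained cleanly.
    pub fn drain(&self, max_wait: Duration) -> bool {
        let count = self.count.lock().unwrap_or_else(|e| e.into_inner());
        let (count, _timeout) = self
            .idle
            .wait_timeout_while(count, max_wait, |n| *n > 0)
            .unwrap_or_else(|e| e.into_inner());
        *count == 0
    }
}

impl Drop for RpcGuard {
    fn drop(&mut self) {
        let mut n = self.tracker.count.lock().unwrap_or_else(|e| e.into_inner());
        *n -= 1;
        if *n == 0 {
            self.tracker.idle.notify_all();
        }
    }
}
```

Unlike a fixed 30s delay, `drain` returns as soon as the last RPC finishes. The `max_wait` bound still caps shutdown time if a handler hangs.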
---
## Sprint 5 — Test Coverage & CI Hardening
Address the major test coverage gaps identified in the audit.
- [x] **5.1 RPC framing unit tests**
- `crates/quicprochat-rpc/src/framing.rs` — encode/decode edge cases
- Malformed frames, truncated input, max-size payloads
- Fuzzing harness for frame parser
- [x] **5.2 SDK state machine tests**
- `crates/quicprochat-sdk/src/conversation.rs` — conversation lifecycle
- `crates/quicprochat-sdk/src/groups.rs` — group join/leave/update
- `crates/quicprochat-sdk/src/messaging.rs` — send/receive/queue
- [x] **5.3 Server domain service tests**
- `crates/quicprochat-server/src/domain/` — all service modules
- Test business logic without DB (mock storage trait)
- [x] **5.4 Integration tests**
- Reconnection scenario (kill server, restart, verify client recovers)
- Graceful shutdown (send SIGTERM during active RPCs, verify drain)
- Multi-node federation relay (if federation is wired up in Sprint 6)
- [x] **5.5 CI hardening**
- Add MSRV check (Rust 1.75 or the declared minimum)
- Add cross-platform CI (macOS, Windows — at least build check)
- Add cargo-fuzz for crypto and parsing code
- Run Miri over `unsafe` code in plugin-api/FFI
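Item 5.1's edge cases are easiest to see against a concrete format. The following sketch assumes a hypothetical 4-byte big-endian length prefix and a 1 MiB limit. The real wire format in `framing.rs` may differ. The decoder separates "need more bytes" from "reject", which is the boundary that the truncated-input and max-size tests should pin down.

```rust
/// Hypothetical upper bound; the real limit in the RPC crate may differ.
pub const MAX_FRAME_LEN: usize = 1 << 20; // 1 MiB

#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    /// Declared or actual payload length exceeds `MAX_FRAME_LEN`.
    TooLarge(usize),
}

/// Encodes `payload` behind a 4-byte big-endian length prefix.
pub fn encode_frame(payload: &[u8]) -> Result<Vec<u8>, FrameError> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(FrameError::TooLarge(payload.len()));
    }
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    Ok(out)
}

/// Decodes one frame from the front of `buf`.
/// `Ok(None)` means more bytes are needed (truncated header or payload);
/// `Ok(Some((payload, consumed)))` returns the payload and the bytes used.
pub fn decode_frame(buf: &[u8]) -> Result<Option<(&[u8], usize)>, FrameError> {
    if buf.len() < 4 {
        return Ok(None);
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    // Reject oversized frames from the header alone, before buffering them.
    if len > MAX_FRAME_LEN {
        return Err(FrameError::TooLarge(len));
    }
    Ok(buf.get(4..4 + len).map(|payload| (payload, 4 + len)))
}
```

A fuzzing harness for the same function would mainly check that `decode_frame` never panics and never reports `consumed` larger than the input length.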
---
## Sprint 6 — Federation & P2P Integration
Wire up the scaffolded federation and P2P code into working features.
- [x] **6.1 Federation message routing**
- Wire `federation::routing::resolve_destination()` into `handle_enqueue`
- Route messages to remote home servers via `FederationClient::relay_enqueue()`
- Resolve protocol mismatch (Cap'n Proto federation vs Protobuf main RPC)
- [x] **6.2 Federation identity resolution**
- Cross-server user lookup (`user@remote-server`)
- KeyPackage fetching across federated nodes
- [x] **6.3 P2P client integration**
- Wire iroh P2P into client as transport option
- Fallback logic: prefer P2P direct → fall back to server relay
- mDNS discovery in client (already scaffolded, needs activation)
- [x] **6.4 Multipath QUIC evaluation**
- Research draft-ietf-quic-multipath (likely RFC in 2026)
- Prototype: use multiple paths for mesh relay resilience
- Decision: adopt or defer based on quinn support
- [x] **6.5 Federation integration tests**
- Two-server test: register on A, send to user on B, verify delivery
- mTLS mutual auth verification
- Partition tolerance (one node goes down, messages queue)
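For items 6.1 and 6.2, the first routing decision is whether a recipient is local or lives on another home server. The following is a minimal sketch. It assumes the `user@server` address form and a hypothetical `classify_recipient` helper, which is not the existing `resolve_destination()` (whose signature is not shown here). A `Remote` result is what would be handed to the federation relay path.

```rust
/// Where a message should be delivered, relative to this node.
#[derive(Debug, PartialEq, Eq)]
pub enum Destination<'a> {
    Local { user: &'a str },
    Remote { user: &'a str, server: &'a str },
}

#[derive(Debug, PartialEq, Eq)]
pub enum AddrError {
    Empty,
    Malformed,
}

/// Hypothetical: classify `user` or `user@server` against this node's domain.
/// A bare `user`, or `user@<local domain>`, is delivered locally.
pub fn classify_recipient<'a>(addr: &'a str, local_domain: &str) -> Result<Destination<'a>, AddrError> {
    if addr.is_empty() {
        return Err(AddrError::Empty);
    }
    match addr.split_once('@') {
        None => Ok(Destination::Local { user: addr }),
        Some((user, server)) => {
            if user.is_empty() || server.is_empty() || server.contains('@') {
                return Err(AddrError::Malformed);
            }
            // Domain names compare case-insensitively.
            if server.eq_ignore_ascii_case(local_domain) {
                Ok(Destination::Local { user })
            } else {
                Ok(Destination::Remote { user, server })
            }
        }
    }
}
```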
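Item 6.3's rule "prefer P2P direct, fall back to server relay" reduces to a small combinator. The following sketch injects the transports as closures, which is a hypothetical interface. The real client would presumably call into the iroh transport and the RPC client. The function also reports which path delivered the message, so the UI or metrics can show it.

```rust
/// Which path delivered a message.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    Direct,
    Relay,
}

/// Tries the direct P2P path first; on any P2P error, retries via the server
/// relay. If both fail, the relay's error is returned.
pub fn send_with_fallback<E>(
    msg: &[u8],
    p2p: impl FnOnce(&[u8]) -> Result<(), E>,
    relay: impl FnOnce(&[u8]) -> Result<(), E>,
) -> Result<Route, E> {
    match p2p(msg) {
        Ok(()) => Ok(Route::Direct),
        // The P2P error is dropped here; a real client would likely log it.
        Err(_) => relay(msg).map(|()| Route::Relay),
    }
}
```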
---
## Sprint 7 — Documentation, Polish & Future Prep
Final polish and forward-looking improvements.
- [x] **7.1 Crate-level documentation**
- Add module-level docs to `quicprochat-plugin-api`, `quicprochat-rpc`, `quicprochat-sdk`
- Doc comments for all public APIs in domain services
- [x] **7.2 Refactor high-arity functions** (none found — already clean)
- Consolidate 8-9 parameter functions into config/param structs
- Improve builder patterns where appropriate
- [ ] **7.3 Review RFC 9750 (MLS Architecture)** (deferred — requires manual review)
- Verify quicprochat's AS/DS split aligns with RFC 9750 recommendations
- Document any deviations and rationale
- [ ] **7.4 Desktop client evaluation** (deferred — requires Tauri prototype)
- Prototype Tauri v2 desktop shell wrapping the TUI or a web UI
- Evaluate effort to ship cross-platform desktop client
- [x] **7.5 Security pre-audit prep**
- Document all crypto boundaries and trust assumptions
- Create threat model document
- Prepare scope document for external auditors (Roadmap item 4.1)
- Budget: NCC Group / Trail of Bits / Cure53 ($50K–$150K, 4–6 weeks)
- [ ] **7.6 Repository rename** (requires GitHub admin action)
- Rename GitHub repository from `quicproquo` to `quicprochat`
- Update all GitHub URLs, CI badge links, and go.mod import paths
- Set up redirect from old repo name
---
## Sprint Summary
| Sprint | Focus | Risk | Key Deliverable |
|--------|-------|------|----------------|
| **1** | Bug fixes & code quality | Low | Zero clippy warnings, metrics wired |
| **2** | OpenMLS 0.5 → 0.8 | High | Security patches applied, FIPS 203 verified |
| **3** | Client resilience | Medium | Auto-reconnect, heartbeat, status UI |
| **4** | Server hardening | Medium | Real graceful shutdown, timeouts enforced |
| **5** | Test coverage & CI | Low | Unit tests for SDK/RPC/domain, fuzzing |
| **6** | Federation & P2P | High | Working cross-server messaging, P2P fallback |
| **7** | Docs, polish & audit prep | Low | Audit-ready; desktop prototype deferred (7.4) |