# quicprochat — Sprint Plan

> 7 sprints synthesized from code audit, architecture analysis, and ecosystem research.
> Each sprint is ~1 week. Sprints are ordered by priority and dependency.

---

## Sprint 1 — Bug Fixes & Code Quality (Quick Wins)

Fix all known bugs, clippy warnings, and dead code before building on top.

- [x] **1.1 Fix boolean logic bug in TUI**
  - `crates/quicprochat-client/src/client/v2_tui.rs:832` — remove `|| true`
  - Bug: the `|| true` made cursor positioning run regardless of input state

- [x] **1.2 Fix unwrap violations in P2P router**
  - `crates/quicprochat-p2p/src/routing.rs:416,419` — `.lock().unwrap()` on Mutex
  - Replace with `.expect("lock poisoned")` or proper error handling

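For 1.2, the "proper error handling" option could map a poisoned lock into a domain error instead of panicking. This is a minimal sketch: `RouteTable` and `RouteError` are hypothetical names for illustration and are not taken from `routing.rs`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical error type; the real router may already define its own.
#[derive(Debug, PartialEq)]
pub enum RouteError {
    LockPoisoned,
}

/// Hypothetical stand-in for the shared routing state.
pub struct RouteTable {
    routes: Mutex<HashMap<String, String>>,
}

impl RouteTable {
    pub fn new() -> Self {
        RouteTable { routes: Mutex::new(HashMap::new()) }
    }

    /// Records a next hop; a poisoned lock becomes an error rather than a panic.
    pub fn insert(&self, peer: &str, next_hop: &str) -> Result<(), RouteError> {
        let mut routes = self.routes.lock().map_err(|_| RouteError::LockPoisoned)?;
        routes.insert(peer.to_owned(), next_hop.to_owned());
        Ok(())
    }

    /// Looks up the next hop for a peer, if one is known.
    pub fn lookup(&self, peer: &str) -> Result<Option<String>, RouteError> {
        let routes = self.routes.lock().map_err(|_| RouteError::LockPoisoned)?;
        Ok(routes.get(peer).cloned())
    }
}
```
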
- [x] **1.3 Remove placeholder assertion in WebTransport**
  - `crates/quicprochat-server/src/webtransport.rs:418` — `assert!(true);`

- [x] **1.4 Wire up unused metrics**
  - `record_storage_latency()` — instrument storage layer calls
  - `record_uptime_seconds()` — add periodic heartbeat task in server main loop

- [x] **1.5 Wire up or remove unused config fields**
  - `EffectiveConfig::webtransport_listen` — connect to WebTransport listener
  - `EffectiveConfig::rpc_timeout_secs` — apply as per-RPC deadline
  - `EffectiveConfig::storage_timeout_secs` — apply as DB query timeout

- [x] **1.6 Fix remaining clippy warnings**
  - Reduce function arity (2 functions with 8-9 args → use config/param structs)
  - Remove useless `format!()` call
  - Collapse nested conditionals
  - Rename `from_str` method to avoid `FromStr` trait confusion

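The arity reduction in 1.6 might look roughly like the sketch below. `ConnectParams`, `describe`, and all default values are hypothetical; the plan does not name the two functions clippy flagged. Callers override only the fields they care about with struct update syntax, for example `ConnectParams { port: 7000, ..Default::default() }`.

```rust
/// Hypothetical parameter struct replacing a long positional argument list.
#[derive(Debug, Clone)]
pub struct ConnectParams {
    pub host: String,
    pub port: u16,
    pub rpc_timeout_secs: u64,
    pub max_retries: u32,
    pub use_webtransport: bool,
}

impl Default for ConnectParams {
    /// Default values here are placeholders chosen for the example.
    fn default() -> Self {
        ConnectParams {
            host: "localhost".to_owned(),
            port: 4433,
            rpc_timeout_secs: 30,
            max_retries: 5,
            use_webtransport: false,
        }
    }
}

/// Takes one struct instead of five positional arguments.
pub fn describe(params: &ConnectParams) -> String {
    format!(
        "{}:{} timeout={}s retries={} webtransport={}",
        params.host,
        params.port,
        params.rpc_timeout_secs,
        params.max_retries,
        params.use_webtransport
    )
}
```
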
---

## Sprint 2 — OpenMLS 0.5 → 0.8 Migration

**CRITICAL**: OpenMLS 0.7.2 includes security patches. Staying on 0.5 is a risk.

- [x] **2.1 Migrate StorageProvider trait**
  - Old `OpenMlsKeyStore` → new `StorageProvider` (most invasive change)
  - Rework `DiskKeyStore` integration (must keep bincode serialization)
  - Update all `group.rs` calls that interact with the key store

- [x] **2.2 Update MLS API calls**
  - `self_update()` / `propose_self_update()` — add `LeafNodeParameters` arg
  - `join_by_external_commit()` — add optional LeafNode params
  - `Sender::NewMember` → split into `NewMemberProposal` / `NewMemberCommit`

- [x] **2.3 Handle GREASE support**
  - New variants in `ProposalType`, `ExtensionType`, `CredentialType`
  - Update match arms to handle unknown/GREASE values

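The match-arm change in 2.3 can be illustrated without the OpenMLS types. In this sketch `ProposalKind`, `classify`, and `is_actionable` are hypothetical, and the GREASE bit pattern (equal bytes of the form `0x?A`, from `0x0A0A` through `0xEAEA`) is an assumption that should be checked against RFC 9420 before use.

```rust
/// Hypothetical stand-in for a wire-level proposal type; not the OpenMLS enum.
#[derive(Debug, PartialEq)]
pub enum ProposalKind {
    Add,
    Update,
    Remove,
    Grease(u16),
    Unknown(u16),
}

/// True for 0x0A0A, 0x1A1A, ..., 0xEAEA.
/// Assumption: this is the GREASE range reserved by RFC 9420.
pub fn is_grease(value: u16) -> bool {
    let (hi, lo) = ((value >> 8) as u8, value as u8);
    hi == lo && (lo & 0x0F) == 0x0A && lo != 0xFA
}

/// Maps a raw value to a kind, keeping GREASE and unknown values distinct.
pub fn classify(value: u16) -> ProposalKind {
    match value {
        1 => ProposalKind::Add,
        2 => ProposalKind::Update,
        3 => ProposalKind::Remove,
        v if is_grease(v) => ProposalKind::Grease(v),
        v => ProposalKind::Unknown(v),
    }
}

/// Exhaustive match: a new variant forces every call site to make a decision.
pub fn is_actionable(kind: &ProposalKind) -> bool {
    match kind {
        ProposalKind::Add | ProposalKind::Update | ProposalKind::Remove => true,
        ProposalKind::Grease(_) | ProposalKind::Unknown(_) => false,
    }
}
```
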
- [x] **2.4 Update AAD handling**
  - AAD no longer persisted — set before every API call generating `MlsMessageOut`

- [x] **2.5 Verify FIPS 203 alignment**
  - Confirm ML-KEM-768 parameters match final FIPS 203 (not draft)
  - Review hybrid KEM against RFC 9794 combination methods

- [x] **2.6 Full test suite pass**
  - All 301 tests must pass with OpenMLS 0.8
  - Run crypto benchmarks to check for performance regressions

---

## Sprint 3 — Client Resilience

Currently, network glitches cause the client to hang. This blocks v2 launch.

- [x] **3.1 Auto-reconnect with backoff**
  - Integrate existing `retry.rs` into `RpcClient::call()` path
  - Exponential backoff with jitter (already implemented, not wired)
  - Configurable max retries and backoff ceiling

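The backoff behaviour named in 3.1 can be sketched as follows. This is not the code in `retry.rs`; `BackoffConfig` and `backoff_delay` are hypothetical names, and the random fraction is passed in by the caller so the sketch needs no external crate. It uses the "full jitter" variant, where the delay is a random fraction of the capped exponential value.

```rust
use std::time::Duration;

/// Hypothetical settings; the knobs exposed by `retry.rs` may differ.
pub struct BackoffConfig {
    pub base: Duration,
    pub ceiling: Duration,
    pub max_retries: u32,
}

/// Delay before retry number `attempt` (0-based), or `None` once retries are exhausted.
/// `jitter` is a caller-supplied random fraction in [0, 1].
pub fn backoff_delay(cfg: &BackoffConfig, attempt: u32, jitter: f64) -> Option<Duration> {
    if attempt >= cfg.max_retries {
        return None;
    }
    // base * 2^attempt, falling back to the ceiling if the arithmetic overflows.
    let exponential = 2u32
        .checked_pow(attempt)
        .and_then(|factor| cfg.base.checked_mul(factor))
        .unwrap_or(cfg.ceiling);
    let capped = exponential.min(cfg.ceiling);
    // Treat NaN or infinite input as "no jitter" instead of panicking.
    let fraction = if jitter.is_finite() { jitter.clamp(0.0, 1.0) } else { 1.0 };
    Some(capped.mul_f64(fraction))
}
```
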
- [x] **3.2 Push subscription recovery**
  - Detect broken push stream and re-subscribe automatically
  - Buffer missed events during reconnection window

- [x] **3.3 Heartbeat / keepalive**
  - Periodic QUIC ping in TUI and REPL modes
  - Detect dead connections before user notices

- [x] **3.4 SDK disconnect lifecycle**
  - Add `QpcClient::disconnect()` for clean shutdown
  - Proper state machine: Connected → Reconnecting → Disconnected

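The Connected → Reconnecting → Disconnected lifecycle from 3.4 can be written as a pure transition function. The state names come from the plan; the `ConnEvent` variants and `next_state` are assumptions made for this sketch, not the SDK's actual API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnState {
    Connected,
    Reconnecting,
    Disconnected,
}

/// Hypothetical events that drive the state machine.
#[derive(Debug, Clone, Copy)]
pub enum ConnEvent {
    LinkLost,
    LinkRestored,
    RetriesExhausted,
    DisconnectRequested,
}

/// Returns the next state; events that do not apply leave the state unchanged.
pub fn next_state(state: ConnState, event: ConnEvent) -> ConnState {
    match (state, event) {
        // An explicit disconnect always wins, whatever the current state.
        (_, ConnEvent::DisconnectRequested) => ConnState::Disconnected,
        (ConnState::Connected, ConnEvent::LinkLost) => ConnState::Reconnecting,
        (ConnState::Reconnecting, ConnEvent::LinkRestored) => ConnState::Connected,
        (ConnState::Reconnecting, ConnEvent::RetriesExhausted) => ConnState::Disconnected,
        (current, _) => current,
    }
}
```

Keeping the transitions in one function makes the invalid cases (for example `LinkRestored` while `Disconnected`) easy to cover in the SDK state machine tests.
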
- [x] **3.5 Connection status UI**
  - TUI: show connection state in status bar (Connected / Reconnecting / Offline)
  - REPL: print status change notifications

---

## Sprint 4 — Server Hardening

Fix graceful shutdown and wire up timeouts for production readiness.

- [x] **4.1 In-flight RPC tracking**
  - Replace fixed 30s shutdown delay with actual in-flight RPC counter
  - Drain when counter reaches zero (with configurable max wait)

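The counter-based drain in 4.1 might be structured like the sketch below. `InFlight` and `InFlightGuard` are hypothetical names, and blocking `std` primitives are used only to keep the example self-contained; an async server would more likely use an async notification primitive.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Hypothetical in-flight RPC tracker shared by all handlers.
#[derive(Clone, Default)]
pub struct InFlight {
    inner: Arc<(Mutex<usize>, Condvar)>,
}

/// Decrements the counter when dropped, so early returns and panics are covered.
pub struct InFlightGuard {
    inner: Arc<(Mutex<usize>, Condvar)>,
}

impl InFlight {
    /// Call at the start of each RPC and hold the guard until it completes.
    pub fn begin(&self) -> InFlightGuard {
        let (count, _) = &*self.inner;
        *count.lock().unwrap_or_else(|e| e.into_inner()) += 1;
        InFlightGuard { inner: Arc::clone(&self.inner) }
    }

    /// Blocks until no RPCs remain or `max_wait` elapses; true if fully drained.
    pub fn drain(&self, max_wait: Duration) -> bool {
        let (count, idle) = &*self.inner;
        let guard = count.lock().unwrap_or_else(|e| e.into_inner());
        let (guard, _timeout) = idle
            .wait_timeout_while(guard, max_wait, |n| *n > 0)
            .unwrap_or_else(|e| e.into_inner());
        *guard == 0
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        let (count, idle) = &*self.inner;
        let mut n = count.lock().unwrap_or_else(|e| e.into_inner());
        *n -= 1;
        // Wake any thread waiting in `drain`.
        idle.notify_all();
    }
}
```
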
- [x] **4.2 Apply request-level timeouts**
  - Wire `rpc_timeout_secs` config into per-RPC deadline enforcement
  - Wire `storage_timeout_secs` into DB query timeouts
  - Cancel long-running operations cleanly

- [x] **4.3 Plugin shutdown hooks**
  - Add `on_shutdown` hook to `HookVTable`
  - Call plugin shutdown before server exits

- [x] **4.4 Federation drain during shutdown**
  - Stop accepting federation relay requests on SIGTERM
  - Wait for in-flight federation RPCs before exit

- [x] **4.5 Connection draining improvements**
  - Send QUIC CONNECTION_CLOSE with application reason
  - WebTransport: send close frame before dropping sessions

---

## Sprint 5 — Test Coverage & CI Hardening

Address the major test coverage gaps identified in the audit.

- [x] **5.1 RPC framing unit tests**
  - `crates/quicprochat-rpc/src/framing.rs` — encode/decode edge cases
  - Malformed frames, truncated input, max-size payloads
  - Fuzzing harness for frame parser

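The edge cases listed in 5.1 are easiest to see against a concrete codec. The wire format below (4-byte big-endian length prefix, 1 MiB payload cap) is an assumption made for illustration; the plan does not describe the actual layout used by `framing.rs`.

```rust
/// Assumed payload cap for this sketch.
pub const MAX_PAYLOAD: usize = 1 << 20;

#[derive(Debug, PartialEq)]
pub enum FrameError {
    Truncated,
    TooLarge(usize),
}

/// Encodes a payload as a 4-byte big-endian length followed by the bytes.
pub fn encode_frame(payload: &[u8]) -> Result<Vec<u8>, FrameError> {
    if payload.len() > MAX_PAYLOAD {
        return Err(FrameError::TooLarge(payload.len()));
    }
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    Ok(out)
}

/// Returns the payload and the number of bytes consumed from `input`.
pub fn decode_frame(input: &[u8]) -> Result<(&[u8], usize), FrameError> {
    if input.len() < 4 {
        return Err(FrameError::Truncated);
    }
    let len = u32::from_be_bytes([input[0], input[1], input[2], input[3]]) as usize;
    // Reject oversized lengths before waiting for (or allocating) the body.
    if len > MAX_PAYLOAD {
        return Err(FrameError::TooLarge(len));
    }
    let end = 4 + len;
    if input.len() < end {
        return Err(FrameError::Truncated);
    }
    Ok((&input[4..end], end))
}
```
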
- [x] **5.2 SDK state machine tests**
  - `crates/quicprochat-sdk/src/conversation.rs` — conversation lifecycle
  - `crates/quicprochat-sdk/src/groups.rs` — group join/leave/update
  - `crates/quicprochat-sdk/src/messaging.rs` — send/receive/queue

- [x] **5.3 Server domain service tests**
  - `crates/quicprochat-server/src/domain/` — all service modules
  - Test business logic without DB (mock storage trait)

- [x] **5.4 Integration tests**
  - Reconnection scenario (kill server, restart, verify client recovers)
  - Graceful shutdown (send SIGTERM during active RPCs, verify drain)
  - Multi-node federation relay (if federation wired in Sprint 6)

- [x] **5.5 CI hardening**
  - Add MSRV check (Rust 1.75 or declared minimum)
  - Add cross-platform CI (macOS, Windows — at least build check)
  - Add cargo-fuzz for crypto and parsing code
  - Add MIRI for unsafe code in plugin-api/FFI

---

## Sprint 6 — Federation & P2P Integration

Wire up the scaffolded federation and P2P code into working features.

- [x] **6.1 Federation message routing**
  - Wire `federation::routing::resolve_destination()` into `handle_enqueue`
  - Route messages to remote home servers via `FederationClient::relay_enqueue()`
  - Resolve protocol mismatch (Cap'n Proto federation vs Protobuf main RPC)

- [x] **6.2 Federation identity resolution**
  - Cross-server user lookup (`user@remote-server`)
  - KeyPackage fetching across federated nodes

- [x] **6.3 P2P client integration**
  - Wire iroh P2P into client as transport option
  - Fallback logic: prefer P2P direct → fall back to server relay
  - mDNS discovery in client (already scaffolded, needs activation)

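The "prefer P2P direct, fall back to server relay" rule in 6.3 amounts to trying transports in preference order. In this sketch the `Transport` trait, `send_with_fallback`, and `FixedTransport` are all hypothetical; real implementations would wrap iroh and the server RPC client, neither of which is modelled here.

```rust
/// Hypothetical transport abstraction; not the client's actual trait.
pub trait Transport {
    fn name(&self) -> &'static str;
    fn send(&self, payload: &[u8]) -> Result<(), String>;
}

/// Tries each transport in preference order and reports which one delivered.
/// On total failure, returns every error so the caller can log them.
pub fn send_with_fallback(
    transports: &[&dyn Transport],
    payload: &[u8],
) -> Result<&'static str, Vec<String>> {
    let mut errors = Vec::new();
    for transport in transports {
        match transport.send(payload) {
            Ok(()) => return Ok(transport.name()),
            Err(e) => errors.push(format!("{}: {}", transport.name(), e)),
        }
    }
    Err(errors)
}

/// Stand-in transport with a fixed outcome, used only to exercise the fallback.
pub struct FixedTransport {
    pub label: &'static str,
    pub reachable: bool,
}

impl Transport for FixedTransport {
    fn name(&self) -> &'static str {
        self.label
    }

    fn send(&self, _payload: &[u8]) -> Result<(), String> {
        if self.reachable {
            Ok(())
        } else {
            Err("unreachable".to_owned())
        }
    }
}
```
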
- [x] **6.4 Multipath QUIC evaluation**
  - Research draft-ietf-quic-multipath (likely RFC in 2026)
  - Prototype: use multiple paths for mesh relay resilience
  - Decision: adopt or defer based on quinn support

- [x] **6.5 Federation integration tests**
  - Two-server test: register on A, send to user on B, verify delivery
  - mTLS mutual auth verification
  - Partition tolerance (one node goes down, messages queue)

---

## Sprint 7 — Documentation, Polish & Future Prep

Final polish and forward-looking improvements.

- [x] **7.1 Crate-level documentation**
  - Add module-level docs to `quicprochat-plugin-api`, `quicprochat-rpc`, `quicprochat-sdk`
  - Doc comments for all public APIs in domain services

- [x] **7.2 Refactor high-arity functions** (none found — already clean)
  - Consolidate 8-9 parameter functions into config/param structs
  - Improve builder patterns where appropriate

- [ ] **7.3 Review RFC 9750 (MLS Architecture)** (deferred — requires manual review)
  - Verify quicprochat's AS/DS split aligns with RFC 9750 recommendations
  - Document any deviations and rationale

- [ ] **7.4 Desktop client evaluation** (deferred — requires Tauri prototype)
  - Prototype Tauri v2 desktop shell wrapping the TUI or a web UI
  - Evaluate effort to ship cross-platform desktop client

- [x] **7.5 Security pre-audit prep**
  - Document all crypto boundaries and trust assumptions
  - Create threat model document
  - Prepare scope document for external auditors (Roadmap item 4.1)
  - Budget: NCC Group / Trail of Bits / Cure53 ($50K–$150K, 4-6 weeks)

- [ ] **7.6 Repository rename** (requires GitHub admin action)
  - Rename GitHub repository from `quicproquo` → `quicprochat`
  - Update all GitHub URLs, CI badge links, go.mod import paths
  - Set up redirect from old repo name

---

## Sprint Summary

| Sprint | Focus | Risk | Key Deliverable |
|--------|-------|------|----------------|
| **1** | Bug fixes & code quality | Low | Zero clippy warnings, metrics wired |
| **2** | OpenMLS 0.5 → 0.8 | High | Security patches applied, FIPS 203 verified |
| **3** | Client resilience | Medium | Auto-reconnect, heartbeat, status UI |
| **4** | Server hardening | Medium | Real graceful shutdown, timeouts enforced |
| **5** | Test coverage & CI | Low | Unit tests for SDK/RPC/domain, fuzzing |
| **6** | Federation & P2P | High | Working cross-server messaging, P2P fallback |
| **7** | Docs, polish & audit prep | Low | Audit-ready, desktop prototype |