quicprochat — Sprint Plan
7 sprints synthesized from code audit, architecture analysis, and ecosystem research. Each sprint is ~1 week. Sprints are ordered by priority and dependency.
Sprint 1 — Bug Fixes & Code Quality (Quick Wins)
Fix all known bugs, clippy warnings, and dead code before building on top.
1.1 Fix boolean logic bug in TUI
- crates/quicprochat-client/src/client/v2_tui.rs:832: remove || true
- Cursor positioning always executes regardless of input state
1.2 Fix unwrap violations in P2P router
- crates/quicprochat-p2p/src/routing.rs:416,419: .lock().unwrap() on Mutex
- Replace with .expect("lock poisoned") or proper error handling
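The "proper error handling" option can be sketched in two ways. This is a minimal illustration under assumptions: RouteTable, lock_recovering and next_hop are hypothetical stand-ins, not the actual contents of routing.rs.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, MutexGuard, PoisonError};

/// Hypothetical stand-in for the router's shared state.
type RouteTable = HashMap<String, u32>;

/// Option A: keep going after a poisoned lock by taking the inner guard.
/// Only appropriate if the table stays valid when a writer panics mid-update.
fn lock_recovering(table: &Mutex<RouteTable>) -> MutexGuard<'_, RouteTable> {
    table.lock().unwrap_or_else(PoisonError::into_inner)
}

/// Option B: report poisoning to the caller instead of panicking.
fn next_hop(table: &Mutex<RouteTable>, peer: &str) -> Result<Option<u32>, String> {
    let guard = table
        .lock()
        .map_err(|_| "route table lock poisoned".to_string())?;
    Ok(guard.get(peer).copied())
}
```

Option A suits state that cannot be left half-updated; Option B is the safer default when a panic in one writer might leave the table inconsistent.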
1.3 Remove placeholder assertion in WebTransport
- crates/quicprochat-server/src/webtransport.rs:418: assert!(true);
1.4 Wire up unused metrics
- record_storage_latency(): instrument storage layer calls
- record_uptime_seconds(): add periodic heartbeat task in server main loop
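Instrumenting storage calls could look roughly like the wrapper below. The signature of record_storage_latency() is not shown in this plan, so the sketch takes the recorder as a closure; timed is a hypothetical helper name.

```rust
use std::time::{Duration, Instant};

/// Runs `op`, then reports how long it took through `record`.
/// `record` stands in for a metrics hook such as record_storage_latency();
/// its real signature is an assumption here.
fn timed<T>(op: impl FnOnce() -> T, record: impl FnOnce(Duration)) -> T {
    let start = Instant::now();
    let out = op();
    record(start.elapsed());
    out
}
```

A storage call would then be wrapped as timed(|| store.get(key), |d| record(d)), with store and record being whatever the storage layer and metrics module actually expose.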
1.5 Wire up or remove unused config fields
- EffectiveConfig::webtransport_listen: connect to WebTransport listener
- EffectiveConfig::rpc_timeout_secs: apply as per-RPC deadline
- EffectiveConfig::storage_timeout_secs: apply as DB query timeout
1.6 Fix remaining clippy warnings
- Reduce function arity (2 functions with 8-9 args → use config/param structs)
- Remove useless format!() call
- Collapse nested conditionals
- Rename from_str method to avoid FromStr trait confusion
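For the from_str item, an alternative to renaming is to implement the standard trait, which also satisfies clippy's should_implement_trait lint. A minimal sketch, assuming a hypothetical LogFormat type since the plan does not name the affected one:

```rust
use std::str::FromStr;

/// Hypothetical type; the plan does not name the one that triggers the lint.
#[derive(Debug, PartialEq)]
enum LogFormat {
    Text,
    Json,
}

impl FromStr for LogFormat {
    type Err = String;

    /// Case-insensitive parse; unknown names are returned in the error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "text" => Ok(LogFormat::Text),
            "json" => Ok(LogFormat::Json),
            other => Err(format!("unknown log format: {other}")),
        }
    }
}
```

Implementing the trait also makes "json".parse::<LogFormat>() available to callers. Renaming remains the better choice when the method's signature cannot match FromStr.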
Sprint 2 — OpenMLS 0.5 → 0.8 Migration
CRITICAL: OpenMLS 0.7.2 includes security patches. Staying on 0.5 is a risk.
2.1 Migrate StorageProvider trait
- Old OpenMlsKeyStore → new StorageProvider (most invasive change)
- Rework DiskKeyStore integration (must keep bincode serialization)
- Update all group.rs calls that interact with the key store
2.2 Update MLS API calls
- self_update() / propose_self_update(): add LeafNodeParameters arg
- join_by_external_commit(): add optional LeafNode params
- Sender::NewMember → split into NewMemberProposal / NewMemberCommit
2.3 Handle GREASE support
- New variants in ProposalType, ExtensionType, CredentialType
- Update match arms to handle unknown/GREASE values
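The shape of a GREASE-tolerant match can be sketched without OpenMLS itself. The ProposalKind enum and classify function below are hypothetical local mirrors, not OpenMLS types, and only three standard proposal values are listed for brevity. The GREASE code points follow the pattern reserved by RFC 9420 (0x0A0A, 0x1A1A, ..., 0xEAEA).

```rust
/// Returns true for the GREASE code points reserved by RFC 9420:
/// both bytes equal, low nibble 0xA, up to and including 0xEAEA.
fn is_grease(value: u16) -> bool {
    let [hi, lo] = value.to_be_bytes();
    hi == lo && (lo & 0x0F) == 0x0A && value <= 0xEAEA
}

/// Hypothetical local mirror of a proposal-type enum with catch-all variants.
#[derive(Debug, PartialEq)]
enum ProposalKind {
    Add,
    Update,
    Remove,
    Grease(u16),
    Unknown(u16),
}

/// Maps a wire value to a kind without ever failing on unfamiliar input.
/// Only three standard values are listed; the rest fall through to Unknown.
fn classify(value: u16) -> ProposalKind {
    match value {
        1 => ProposalKind::Add,
        2 => ProposalKind::Update,
        3 => ProposalKind::Remove,
        v if is_grease(v) => ProposalKind::Grease(v),
        v => ProposalKind::Unknown(v),
    }
}
```

The point of the catch-all arms is that a peer advertising a GREASE or future value is ignored rather than rejected.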
2.4 Update AAD handling
- AAD no longer persisted; set before every API call generating MlsMessageOut
2.5 Verify FIPS 203 alignment
- Confirm ML-KEM-768 parameters match final FIPS 203 (not draft)
- Review hybrid KEM against RFC 9794 combination methods
2.6 Full test suite pass
- All 301 tests must pass with OpenMLS 0.8
- Run crypto benchmarks to check for performance regressions
Sprint 3 — Client Resilience
Currently, network glitches cause the client to hang. This blocks v2 launch.
3.1 Auto-reconnect with backoff
- Integrate existing retry.rs into RpcClient::call() path
- Exponential backoff with jitter (already implemented, not wired)
- Configurable max retries and backoff ceiling
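For reference, capped exponential backoff with full jitter can be sketched as follows. This is not the code in retry.rs; backoff_delay and retry are hypothetical names, and the random jitter value is passed in by the caller so the sketch needs no external crates.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based), using "full jitter":
/// a point between zero and min(cap, base * 2^attempt).
/// `jitter` is a caller-supplied random value in [0, 1]; it must not be NaN.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    let ceiling = base.checked_mul(factor).unwrap_or(cap).min(cap);
    ceiling.mul_f64(jitter.clamp(0.0, 1.0))
}

/// Calls `op` until it succeeds or `max_retries` retries have been used.
/// `sleep` receives the attempt number so the caller decides how to wait.
fn retry<T, E>(
    max_retries: u32,
    mut op: impl FnMut(u32) -> Result<T, E>,
    mut sleep: impl FnMut(u32),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => {
                sleep(attempt);
                attempt += 1;
            }
        }
    }
}
```

Passing the sleep step as a closure keeps the loop usable from both blocking and async call sites, and makes the retry count and ceiling easy to expose as configuration.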
3.2 Push subscription recovery
- Detect broken push stream and re-subscribe automatically
- Buffer missed events during reconnection window
3.3 Heartbeat / keepalive
- Periodic QUIC ping in TUI and REPL modes
- Detect dead connections before user notices
3.4 SDK disconnect lifecycle
- Add QpcClient::disconnect() for clean shutdown
- Proper state machine: Connected → Reconnecting → Disconnected
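The three states above could be modelled as a small transition function. The state names come from the plan; the ConnEvent variants and the choice of which transitions are legal are assumptions made for this sketch.

```rust
/// Connection states named in the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConnState {
    Connected,
    Reconnecting,
    Disconnected,
}

/// Hypothetical events that drive the state machine.
#[derive(Debug, Clone, Copy)]
enum ConnEvent {
    LinkLost,
    LinkRestored,
    RetriesExhausted,
    DisconnectRequested,
}

impl ConnState {
    /// Returns the next state; pairs not listed leave the state unchanged.
    fn on(self, event: ConnEvent) -> ConnState {
        use ConnEvent::*;
        use ConnState::*;
        match (self, event) {
            (_, DisconnectRequested) => Disconnected,
            (Connected, LinkLost) => Reconnecting,
            (Reconnecting, LinkRestored) => Connected,
            (Reconnecting, RetriesExhausted) => Disconnected,
            (state, _) => state,
        }
    }
}
```

Keeping the transitions in one pure function makes them straightforward to cover in the Sprint 5 state machine tests and gives the status UI a single value to render.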
3.5 Connection status UI
- TUI: show connection state in status bar (Connected / Reconnecting / Offline)
- REPL: print status change notifications
Sprint 4 — Server Hardening
Fix graceful shutdown and wire up timeouts for production readiness.
4.1 In-flight RPC tracking
- Replace fixed 30s shutdown delay with actual in-flight RPC counter
- Drain when counter reaches zero (with configurable max wait)
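A counter-based drain might be structured like this. It is a std-only sketch with hypothetical names (InFlight, InFlightGuard); an async server would more likely await a notification than poll, but the bookkeeping is the same.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Shared count of RPCs currently being handled.
#[derive(Clone, Default)]
struct InFlight(Arc<AtomicUsize>);

/// RAII guard: the count drops when the handler finishes, even on early return.
struct InFlightGuard(Arc<AtomicUsize>);

impl InFlight {
    /// Call at the start of each RPC handler and hold the guard until done.
    fn enter(&self) -> InFlightGuard {
        self.0.fetch_add(1, Ordering::SeqCst);
        InFlightGuard(Arc::clone(&self.0))
    }

    fn count(&self) -> usize {
        self.0.load(Ordering::SeqCst)
    }

    /// Polls until the count reaches zero or `max_wait` elapses.
    /// Returns true if everything drained in time.
    fn drain(&self, max_wait: Duration) -> bool {
        let deadline = Instant::now() + max_wait;
        while self.count() > 0 {
            if Instant::now() >= deadline {
                return false;
            }
            thread::sleep(Duration::from_millis(5));
        }
        true
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::SeqCst);
    }
}
```

The max_wait argument corresponds to the configurable maximum wait: shutdown proceeds as soon as the counter hits zero, and is forced once the limit expires.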
4.2 Apply request-level timeouts
- Wire rpc_timeout_secs config into per-RPC deadline enforcement
- Wire storage_timeout_secs into DB query timeouts
- Cancel long-running operations cleanly
4.3 Plugin shutdown hooks
- Add on_shutdown hook to HookVTable
- Call plugin shutdown before server exits
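As a rough illustration of the hook, the sketch below uses a trimmed-down, hypothetical HookVTable with a single field; the real struct has other members and its layout is not shown in this plan. run_shutdown_hooks is likewise an assumed helper name.

```rust
/// Hypothetical, trimmed-down vtable; the real HookVTable has other fields.
#[repr(C)]
struct HookVTable {
    /// None means the plugin has nothing to clean up.
    on_shutdown: Option<extern "C" fn()>,
}

/// Calls each plugin's shutdown hook once, in slice order.
/// Returns how many hooks were invoked.
fn run_shutdown_hooks(plugins: &[HookVTable]) -> usize {
    let mut called = 0;
    for vtable in plugins {
        if let Some(hook) = vtable.on_shutdown {
            hook();
            called += 1;
        }
    }
    called
}
```

Option<extern "C" fn()> has the same representation as a nullable C function pointer, so a plugin written in C can leave the slot NULL.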
4.4 Federation drain during shutdown
- Stop accepting federation relay requests on SIGTERM
- Wait for in-flight federation RPCs before exit
4.5 Connection draining improvements
- Send QUIC CONNECTION_CLOSE with application reason
- WebTransport: send close frame before dropping sessions
Sprint 5 — Test Coverage & CI Hardening
Address the major test coverage gaps identified in the audit.
5.1 RPC framing unit tests
- crates/quicprochat-rpc/src/framing.rs: encode/decode edge cases
- Malformed frames, truncated input, max-size payloads
- Fuzzing harness for frame parser
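The edge cases listed above can be made concrete with a toy codec. The wire format here (4-byte big-endian length prefix, 1 MiB limit) is an assumption for illustration only; the real layout in framing.rs may differ.

```rust
/// Assumed size limit for this sketch; the real limit may differ.
const MAX_FRAME_LEN: usize = 1 << 20;

#[derive(Debug, PartialEq)]
enum FrameError {
    /// Not enough bytes yet; the caller should read more and retry.
    Incomplete,
    /// Declared or actual length exceeds MAX_FRAME_LEN.
    TooLarge(usize),
}

/// Prefixes `payload` with its length as a 4-byte big-endian integer.
fn encode_frame(payload: &[u8]) -> Result<Vec<u8>, FrameError> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(FrameError::TooLarge(payload.len()));
    }
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    Ok(out)
}

/// Decodes one frame from the front of `buf`.
/// Returns the payload and the number of bytes consumed.
fn decode_frame(buf: &[u8]) -> Result<(&[u8], usize), FrameError> {
    if buf.len() < 4 {
        return Err(FrameError::Incomplete);
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if len > MAX_FRAME_LEN {
        // Reject before allocating or waiting for more input.
        return Err(FrameError::TooLarge(len));
    }
    match buf.get(4..4 + len) {
        Some(payload) => Ok((payload, 4 + len)),
        None => Err(FrameError::Incomplete),
    }
}
```

Unit tests would then cover a truncated header, a truncated payload, an oversized declared length and the empty payload; a fuzz target only needs to call decode_frame on arbitrary bytes and check that it never panics.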
5.2 SDK state machine tests
- crates/quicprochat-sdk/src/conversation.rs: conversation lifecycle
- crates/quicprochat-sdk/src/groups.rs: group join/leave/update
- crates/quicprochat-sdk/src/messaging.rs: send/receive/queue
5.3 Server domain service tests
- crates/quicprochat-server/src/domain/: all service modules
- Test business logic without DB (mock storage trait)
5.4 Integration tests
- Reconnection scenario (kill server, restart, verify client recovers)
- Graceful shutdown (send SIGTERM during active RPCs, verify drain)
- Multi-node federation relay (if federation wired in Sprint 6)
5.5 CI hardening
- Add MSRV check (Rust 1.75 or declared minimum)
- Add cross-platform CI (macOS, Windows — at least build check)
- Add cargo-fuzz for crypto and parsing code
- Add MIRI for unsafe code in plugin-api/FFI
Sprint 6 — Federation & P2P Integration
Wire up the scaffolded federation and P2P code into working features.
6.1 Federation message routing
- Wire federation::routing::resolve_destination() into handle_enqueue
- Route messages to remote home servers via FederationClient::relay_enqueue()
- Resolve protocol mismatch (Cap'n Proto federation vs Protobuf main RPC)
6.2 Federation identity resolution
- Cross-server user lookup (user@remote-server)
- KeyPackage fetching across federated nodes
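Cross-server lookup starts with splitting the address. A minimal sketch, assuming a bare name means a local user and that neither part may contain '@' or be empty; UserAddress and parse_address are hypothetical names, and the real validation rules are not specified in this plan.

```rust
/// A user address split into local name and optional home server.
#[derive(Debug, PartialEq)]
struct UserAddress<'a> {
    user: &'a str,
    /// None means the user lives on the local server.
    server: Option<&'a str>,
}

/// Parses "user" or "user@server"; returns None for malformed input.
fn parse_address(input: &str) -> Option<UserAddress<'_>> {
    match input.rsplit_once('@') {
        None if !input.is_empty() => Some(UserAddress { user: input, server: None }),
        Some((user, server))
            if !user.is_empty() && !server.is_empty() && !user.contains('@') =>
        {
            Some(UserAddress { user, server: Some(server) })
        }
        _ => None,
    }
}
```

A Some(server) result would be handed to the federation client for lookup and KeyPackage fetching, while None stays on the local path.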
6.3 P2P client integration
- Wire iroh P2P into client as transport option
- Fallback logic: prefer P2P direct → fall back to server relay
- mDNS discovery in client (already scaffolded, needs activation)
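The fallback order can be expressed as an ordered list of transports tried in turn. Everything below is hypothetical: the Transport trait, P2pDirect and ServerRelay are stubs for illustration and do not reflect the iroh API or the project's client types.

```rust
/// Hypothetical transport abstraction; not the project's actual trait.
trait Transport {
    fn name(&self) -> &'static str;
    fn send(&self, payload: &[u8]) -> Result<(), String>;
}

/// Stub for a direct P2P path that may or may not be reachable.
struct P2pDirect {
    reachable: bool,
}

/// Stub for the server relay path, assumed always available here.
struct ServerRelay;

impl Transport for P2pDirect {
    fn name(&self) -> &'static str {
        "p2p-direct"
    }
    fn send(&self, _payload: &[u8]) -> Result<(), String> {
        if self.reachable {
            Ok(())
        } else {
            Err("peer unreachable".to_string())
        }
    }
}

impl Transport for ServerRelay {
    fn name(&self) -> &'static str {
        "server-relay"
    }
    fn send(&self, _payload: &[u8]) -> Result<(), String> {
        Ok(())
    }
}

/// Tries each transport in preference order and returns the name of the
/// first one that accepts the payload, or every error if all of them fail.
fn send_with_fallback(
    transports: &[&dyn Transport],
    payload: &[u8],
) -> Result<&'static str, Vec<String>> {
    let mut errors = Vec::new();
    for t in transports {
        match t.send(payload) {
            Ok(()) => return Ok(t.name()),
            Err(e) => errors.push(format!("{}: {e}", t.name())),
        }
    }
    Err(errors)
}
```

Collecting the per-transport errors rather than only the last one makes it easier to see why the direct path was skipped.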
6.4 Multipath QUIC evaluation
- Research draft-ietf-quic-multipath (likely RFC in 2026)
- Prototype: use multiple paths for mesh relay resilience
- Decision: adopt or defer based on quinn support
6.5 Federation integration tests
- Two-server test: register on A, send to user on B, verify delivery
- mTLS mutual auth verification
- Partition tolerance (one node goes down, messages queue)
Sprint 7 — Documentation, Polish & Future Prep
Final polish and forward-looking improvements.
7.1 Crate-level documentation
- Add module-level docs to quicprochat-plugin-api, quicprochat-rpc, quicprochat-sdk
- Doc comments for all public APIs in domain services
7.2 Refactor high-arity functions (none found — already clean)
- Consolidate 8-9 parameter functions into config/param structs
- Improve builder patterns where appropriate
7.3 Review RFC 9750 (MLS Architecture) (deferred — requires manual review)
- Verify quicprochat's AS/DS split aligns with RFC 9750 recommendations
- Document any deviations and rationale
7.4 Desktop client evaluation (deferred — requires Tauri prototype)
- Prototype Tauri v2 desktop shell wrapping the TUI or a web UI
- Evaluate effort to ship cross-platform desktop client
7.5 Security pre-audit prep
- Document all crypto boundaries and trust assumptions
- Create threat model document
- Prepare scope document for external auditors (Roadmap item 4.1)
- Budget: NCC Group / Trail of Bits / Cure53 ($50K–$150K, 4-6 weeks)
7.6 Repository rename (requires GitHub admin action)
- Rename GitHub repository from quicproquo → quicprochat
- Update all GitHub URLs, CI badge links, go.mod import paths
- Set up redirect from old repo name
Sprint Summary
| Sprint | Focus | Risk | Key Deliverable |
|---|---|---|---|
| 1 | Bug fixes & code quality | Low | Zero clippy warnings, metrics wired |
| 2 | OpenMLS 0.5 → 0.8 | High | Security patches applied, FIPS 203 verified |
| 3 | Client resilience | Medium | Auto-reconnect, heartbeat, status UI |
| 4 | Server hardening | Medium | Real graceful shutdown, timeouts enforced |
| 5 | Test coverage & CI | Low | Unit tests for SDK/RPC/domain, fuzzing |
| 6 | Federation & P2P | High | Working cross-server messaging, P2P fallback |
| 7 | Docs, polish & audit prep | Low | Audit-ready, desktop prototype |