quicprochat — Sprint Plan

7 sprints synthesized from code audit, architecture analysis, and ecosystem research. Each sprint is ~1 week. Sprints are ordered by priority and dependency.


Sprint 1 — Bug Fixes & Code Quality (Quick Wins)

Fix all known bugs, clippy warnings, and dead code before building on top.

  • 1.1 Fix boolean logic bug in TUI

    • crates/quicprochat-client/src/client/v2_tui.rs:832 — remove || true
    • Cursor positioning always executes regardless of input state
  • 1.2 Fix unwrap violations in P2P router

    • crates/quicprochat-p2p/src/routing.rs:416,419: .lock().unwrap() on Mutex
    • Replace with .expect("lock poisoned") or proper error handling
  • 1.3 Remove placeholder assertion in WebTransport

    • crates/quicprochat-server/src/webtransport.rs:418: assert!(true);
  • 1.4 Wire up unused metrics

    • record_storage_latency() — instrument storage layer calls
    • record_uptime_seconds() — add periodic heartbeat task in server main loop
  • 1.5 Wire up or remove unused config fields

    • EffectiveConfig::webtransport_listen — connect to WebTransport listener
    • EffectiveConfig::rpc_timeout_secs — apply as per-RPC deadline
    • EffectiveConfig::storage_timeout_secs — apply as DB query timeout
  • 1.6 Fix remaining clippy warnings

    • Reduce function arity (2 functions with 8-9 args → use config/param structs)
    • Remove useless format!() call
    • Collapse nested conditionals
    • Rename from_str method to avoid FromStr trait confusion
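For item 1.2, the two replacement strategies for `.lock().unwrap()` can be sketched as follows. This is a minimal illustration, not the router's actual code: `RoutingTable`, `RouteError`, and the `String`-keyed map are hypothetical stand-ins, since the contents of `routing.rs` are not shown in this plan.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, PoisonError};

/// Hypothetical error type; the real router's error enum is not shown here.
#[derive(Debug, PartialEq, Eq)]
pub enum RouteError {
    LockPoisoned,
}

/// Hypothetical stand-in for the P2P routing table (peer id -> next hop).
#[derive(Default)]
pub struct RoutingTable {
    routes: Mutex<HashMap<String, String>>,
}

impl RoutingTable {
    /// Strategy A: report a poisoned lock to the caller instead of panicking.
    pub fn insert(&self, peer: &str, next_hop: &str) -> Result<(), RouteError> {
        let mut routes = self.routes.lock().map_err(|_| RouteError::LockPoisoned)?;
        routes.insert(peer.to_owned(), next_hop.to_owned());
        Ok(())
    }

    /// Strategy B: recover the guard from a poisoned lock and keep going.
    /// Only reasonable when the map cannot be left half-updated by a panic.
    pub fn lookup(&self, peer: &str) -> Option<String> {
        let routes = self.routes.lock().unwrap_or_else(PoisonError::into_inner);
        routes.get(peer).cloned()
    }
}
```

Strategy A suits write paths where a caller can react to the failure; strategy B suits read paths where stale data is acceptable. Which one fits the real router depends on invariants this plan does not describe.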

Sprint 2 — OpenMLS 0.5 → 0.8 Migration

CRITICAL: OpenMLS 0.7.2 includes security patches. Staying on 0.5 is a risk.

  • 2.1 Migrate StorageProvider trait

    • Old OpenMlsKeyStore → new StorageProvider (most invasive change)
    • Rework DiskKeyStore integration (must keep bincode serialization)
    • Update all group.rs calls that interact with the key store
  • 2.2 Update MLS API calls

    • self_update() / propose_self_update() — add LeafNodeParameters arg
    • join_by_external_commit() — add optional LeafNode params
    • Sender::NewMember → split into NewMemberProposal / NewMemberCommit
  • 2.3 Handle GREASE support

    • New variants in ProposalType, ExtensionType, CredentialType
    • Update match arms to handle unknown/GREASE values
  • 2.4 Update AAD handling

    • AAD no longer persisted — set before every API call generating MlsMessageOut
  • 2.5 Verify FIPS 203 alignment

    • Confirm ML-KEM-768 parameters match final FIPS 203 (not draft)
    • Review hybrid KEM against RFC 9794 combination methods
  • 2.6 Full test suite pass

    • All 301 tests must pass with OpenMLS 0.8
    • Run crypto benchmarks to check for performance regressions
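For item 2.3, the shape of a GREASE-tolerant match can be sketched without touching OpenMLS at all. The enum below is a local stand-in, not the OpenMLS `ProposalType`; the variant names, the `action_for` policy, and the choice to reject unknown values are assumptions for illustration. The GREASE pattern (0x0A0A through 0xEAEA) and the code points 1 to 3 are taken from RFC 9420 as I understand it and should be checked against the spec before use.

```rust
/// Local stand-in for a wire-level proposal type. NOT the OpenMLS enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WireProposalType {
    Add,
    Update,
    Remove,
    Grease(u16),
    Unknown(u16),
}

/// True for 0x0A0A, 0x1A1A, ..., 0xEAEA: both bytes equal, low nibble 0xA,
/// excluding 0xFAFA (assumed to fall in the private-use range).
pub fn is_grease(value: u16) -> bool {
    let [hi, lo] = value.to_be_bytes();
    hi == lo && (lo & 0x0F) == 0x0A && hi != 0xFA
}

impl From<u16> for WireProposalType {
    fn from(value: u16) -> Self {
        match value {
            1 => Self::Add,
            2 => Self::Update,
            3 => Self::Remove,
            v if is_grease(v) => Self::Grease(v),
            v => Self::Unknown(v),
        }
    }
}

/// Hypothetical handling policy for a received proposal type.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Process,
    Ignore,
    Reject,
}

pub fn action_for(kind: WireProposalType) -> Action {
    match kind {
        WireProposalType::Add | WireProposalType::Update | WireProposalType::Remove => {
            Action::Process
        }
        // GREASE values exist to exercise this arm: skip them, never fail.
        WireProposalType::Grease(_) => Action::Ignore,
        WireProposalType::Unknown(_) => Action::Reject,
    }
}
```

The point of the sketch is that every `match` over these types needs an explicit arm for GREASE that is distinct from the arm for truly unknown values.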

Sprint 3 — Client Resilience

Currently, network glitches cause the client to hang. This blocks v2 launch.

  • 3.1 Auto-reconnect with backoff

    • Integrate existing retry.rs into RpcClient::call() path
    • Exponential backoff with jitter (already implemented, not wired)
    • Configurable max retries and backoff ceiling
  • 3.2 Push subscription recovery

    • Detect broken push stream and re-subscribe automatically
    • Buffer missed events during reconnection window
  • 3.3 Heartbeat / keepalive

    • Periodic QUIC ping in TUI and REPL modes
    • Detect dead connections before user notices
  • 3.4 SDK disconnect lifecycle

    • Add QpcClient::disconnect() for clean shutdown
    • Proper state machine: Connected → Reconnecting → Disconnected
  • 3.5 Connection status UI

    • TUI: show connection state in status bar (Connected / Reconnecting / Offline)
    • REPL: print status change notifications
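Items 3.1 and 3.4 together describe a capped exponential backoff driving a three-state connection lifecycle. A minimal sketch of both follows. `RetryPolicy`, `ConnState`, and `ConnEvent` are hypothetical names; the existing `retry.rs` and the SDK's real types are not shown in this plan and may look different. The jitter value is passed in by the caller so the sketch needs no RNG crate.

```rust
use std::time::Duration;

/// Hypothetical retry settings (configurable max retries and backoff ceiling).
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    pub base: Duration,
    pub ceiling: Duration,
    pub max_retries: u32,
}

impl RetryPolicy {
    /// Delay before retry `attempt` (0-based) using "full jitter":
    /// a fraction `jitter` in [0.0, 1.0] of min(ceiling, base * 2^attempt).
    pub fn delay(&self, attempt: u32, jitter: f64) -> Duration {
        let factor = 2u32.saturating_pow(attempt);
        let capped = self.base.saturating_mul(factor).min(self.ceiling);
        // Guard against NaN/infinite input, which would make mul_f64 panic.
        let fraction = if jitter.is_finite() { jitter.clamp(0.0, 1.0) } else { 1.0 };
        capped.mul_f64(fraction)
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnState {
    Connected,
    Reconnecting { attempt: u32 },
    Disconnected,
}

#[derive(Debug, Clone, Copy)]
pub enum ConnEvent {
    Lost,
    RetryFailed,
    Restored,
    Shutdown,
}

impl ConnState {
    /// Connected -> Reconnecting -> Disconnected, with recovery back to Connected.
    pub fn next(self, event: ConnEvent, policy: &RetryPolicy) -> ConnState {
        match (self, event) {
            (_, ConnEvent::Shutdown) => ConnState::Disconnected,
            (ConnState::Connected, ConnEvent::Lost) => ConnState::Reconnecting { attempt: 0 },
            (ConnState::Reconnecting { .. }, ConnEvent::Restored) => ConnState::Connected,
            (ConnState::Reconnecting { attempt }, ConnEvent::RetryFailed) => {
                let next = attempt.saturating_add(1);
                if next >= policy.max_retries {
                    ConnState::Disconnected
                } else {
                    ConnState::Reconnecting { attempt: next }
                }
            }
            // Any other combination leaves the state unchanged.
            (state, _) => state,
        }
    }
}
```

Keeping the transition function pure makes it easy to drive both the TUI status bar and the REPL notifications from the same state value.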

Sprint 4 — Server Hardening

Fix graceful shutdown and wire up timeouts for production readiness.

  • 4.1 In-flight RPC tracking

    • Replace fixed 30s shutdown delay with actual in-flight RPC counter
    • Drain when counter reaches zero (with configurable max wait)
  • 4.2 Apply request-level timeouts

    • Wire rpc_timeout_secs config into per-RPC deadline enforcement
    • Wire storage_timeout_secs into DB query timeouts
    • Cancel long-running operations cleanly
  • 4.3 Plugin shutdown hooks

    • Add on_shutdown hook to HookVTable
    • Call plugin shutdown before server exits
  • 4.4 Federation drain during shutdown

    • Stop accepting federation relay requests on SIGTERM
    • Wait for in-flight federation RPCs before exit
  • 4.5 Connection draining improvements

    • Send QUIC CONNECTION_CLOSE with application reason
    • WebTransport: send close frame before dropping sessions
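Item 4.1 replaces a fixed shutdown delay with a counter that is drained to zero, bounded by a maximum wait. One way to sketch that idea with only the standard library is an RAII guard around a `Mutex<usize>` and a `Condvar`. The names `InFlight` and `RpcGuard` are hypothetical, and the real server presumably uses an async runtime, so treat this as an illustration of the counting logic rather than drop-in code.

```rust
use std::sync::{Arc, Condvar, Mutex, PoisonError};
use std::time::Duration;

/// Hypothetical in-flight RPC tracker.
#[derive(Default)]
pub struct InFlight {
    count: Mutex<usize>,
    zero: Condvar,
}

/// Held for the duration of one RPC; decrements the counter on drop,
/// including when the handler returns early or panics.
pub struct RpcGuard {
    tracker: Arc<InFlight>,
}

impl InFlight {
    /// Call at the start of every RPC handler.
    pub fn begin(self: &Arc<Self>) -> RpcGuard {
        let mut n = self.count.lock().unwrap_or_else(PoisonError::into_inner);
        *n += 1;
        RpcGuard { tracker: Arc::clone(self) }
    }

    /// Block until no RPCs are in flight or `max_wait` elapses.
    /// Returns true if the counter reached zero in time.
    pub fn drain(&self, max_wait: Duration) -> bool {
        let guard = self.count.lock().unwrap_or_else(PoisonError::into_inner);
        let (guard, _timed_out) = self
            .zero
            .wait_timeout_while(guard, max_wait, |n| *n > 0)
            .unwrap_or_else(PoisonError::into_inner);
        *guard == 0
    }
}

impl Drop for RpcGuard {
    fn drop(&mut self) {
        let mut n = self.tracker.count.lock().unwrap_or_else(PoisonError::into_inner);
        *n -= 1;
        if *n == 0 {
            // Wake any thread blocked in drain().
            self.tracker.zero.notify_all();
        }
    }
}
```

On SIGTERM the server would stop accepting new RPCs, call `drain` with the configured maximum wait, and log a warning if it returns false before closing connections anyway.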

Sprint 5 — Test Coverage & CI Hardening

Address the major test coverage gaps identified in the audit.

  • 5.1 RPC framing unit tests

    • crates/quicprochat-rpc/src/framing.rs — encode/decode edge cases
    • Malformed frames, truncated input, max-size payloads
    • Fuzzing harness for frame parser
  • 5.2 SDK state machine tests

    • crates/quicprochat-sdk/src/conversation.rs — conversation lifecycle
    • crates/quicprochat-sdk/src/groups.rs — group join/leave/update
    • crates/quicprochat-sdk/src/messaging.rs — send/receive/queue
  • 5.3 Server domain service tests

    • crates/quicprochat-server/src/domain/ — all service modules
    • Test business logic without DB (mock storage trait)
  • 5.4 Integration tests

    • Reconnection scenario (kill server, restart, verify client recovers)
    • Graceful shutdown (send SIGTERM during active RPCs, verify drain)
    • Multi-node federation relay (if federation wired in Sprint 6)
  • 5.5 CI hardening

    • Add MSRV check (Rust 1.75 or declared minimum)
    • Add cross-platform CI (macOS, Windows — at least build check)
    • Add cargo-fuzz for crypto and parsing code
    • Add MIRI for unsafe code in plugin-api/FFI
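The edge cases listed under 5.1 (truncated input, oversized payloads, empty frames) can be illustrated against a toy codec. The format below, a 4-byte big-endian length prefix followed by the payload, is an assumption made purely for this sketch; the actual wire format in `framing.rs` is not described in this plan, and `MAX_FRAME_LEN`, `FrameError`, `encode`, and `decode` are hypothetical names.

```rust
/// Assumed upper bound on payload size (1 MiB); the real limit may differ.
pub const MAX_FRAME_LEN: usize = 1 << 20;

#[derive(Debug, PartialEq, Eq)]
pub enum FrameError {
    TooLarge(usize),
}

/// Prefix `payload` with its length as a big-endian u32.
pub fn encode(payload: &[u8]) -> Result<Vec<u8>, FrameError> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(FrameError::TooLarge(payload.len()));
    }
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    Ok(out)
}

/// Try to decode one frame from the front of `buf`.
/// Ok(None) means "need more bytes"; Ok(Some((payload, consumed))) is a frame.
pub fn decode(buf: &[u8]) -> Result<Option<(&[u8], usize)>, FrameError> {
    if buf.len() < 4 {
        return Ok(None); // header itself is truncated
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if len > MAX_FRAME_LEN {
        // Reject before buffering so a hostile length cannot exhaust memory.
        return Err(FrameError::TooLarge(len));
    }
    match buf.get(4..4 + len) {
        Some(payload) => Ok(Some((payload, 4 + len))),
        None => Ok(None), // payload is truncated
    }
}
```

Unit tests for the real parser would cover the same categories: a round trip, every truncation point, a length just over the limit, and a zero-length payload. A fuzzing harness would then feed `decode` arbitrary bytes and assert that it never panics.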

Sprint 6 — Federation & P2P Integration

Wire up the scaffolded federation and P2P code into working features.

  • 6.1 Federation message routing

    • Wire federation::routing::resolve_destination() into handle_enqueue
    • Route messages to remote home servers via FederationClient::relay_enqueue()
    • Resolve protocol mismatch (Cap'n Proto federation vs Protobuf main RPC)
  • 6.2 Federation identity resolution

    • Cross-server user lookup (user@remote-server)
    • KeyPackage fetching across federated nodes
  • 6.3 P2P client integration

    • Wire iroh P2P into client as transport option
    • Fallback logic: prefer P2P direct → fall back to server relay
    • mDNS discovery in client (already scaffolded, needs activation)
  • 6.4 Multipath QUIC evaluation

    • Research draft-ietf-quic-multipath (likely RFC in 2026)
    • Prototype: use multiple paths for mesh relay resilience
    • Decision: adopt or defer based on quinn support
  • 6.5 Federation integration tests

    • Two-server test: register on A, send to user on B, verify delivery
    • mTLS mutual auth verification
    • Partition tolerance (one node goes down, messages queue)
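Items 6.1 and 6.2 hinge on deciding whether an address such as `user@remote-server` belongs to the local server or must be relayed. A rough sketch of that decision is below. `Destination` and `classify_address` are hypothetical; the signature of the real `federation::routing::resolve_destination()` is not shown in this plan, and case-insensitive server comparison is an assumption.

```rust
/// Where a message should be delivered. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq)]
pub enum Destination<'a> {
    /// Enqueue on this server.
    Local { user: &'a str },
    /// Relay to the user's home server.
    Remote { user: &'a str, server: &'a str },
}

/// Classify `address` relative to `local_server`.
/// A bare name is treated as local; `user@server` is local only if `server`
/// matches this node. Returns None for malformed input such as "@x" or "x@".
pub fn classify_address<'a>(address: &'a str, local_server: &str) -> Option<Destination<'a>> {
    match address.rsplit_once('@') {
        None if !address.is_empty() => Some(Destination::Local { user: address }),
        Some((user, server)) if !user.is_empty() && !server.is_empty() => {
            if server.eq_ignore_ascii_case(local_server) {
                Some(Destination::Local { user })
            } else {
                Some(Destination::Remote { user, server })
            }
        }
        _ => None,
    }
}
```

In an enqueue handler, the `Local` case would continue down the existing storage path, while the `Remote` case would hand the message to the federation client for relay.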

Sprint 7 — Documentation, Polish & Future Prep

Final polish and forward-looking improvements.

  • 7.1 Crate-level documentation

    • Add module-level docs to quicprochat-plugin-api, quicprochat-rpc, quicprochat-sdk
    • Doc comments for all public APIs in domain services
  • 7.2 Refactor high-arity functions (none found — already clean)

    • Consolidate 8-9 parameter functions into config/param structs
    • Improve builder patterns where appropriate
  • 7.3 Review RFC 9750 (MLS Architecture) (deferred — requires manual review)

    • Verify quicprochat's AS/DS split aligns with RFC 9750 recommendations
    • Document any deviations and rationale
  • 7.4 Desktop client evaluation (deferred — requires Tauri prototype)

    • Prototype Tauri v2 desktop shell wrapping the TUI or a web UI
    • Evaluate effort to ship cross-platform desktop client
  • 7.5 Security pre-audit prep

    • Document all crypto boundaries and trust assumptions
    • Create threat model document
    • Prepare scope document for external auditors (Roadmap item 4.1)
    • Budget: NCC Group / Trail of Bits / Cure53 ($50K-$150K, 4-6 weeks)
  • 7.6 Repository rename (requires GitHub admin action)

    • Rename GitHub repository from quicproquo to quicprochat
    • Update all GitHub URLs, CI badge links, go.mod import paths
    • Set up redirect from old repo name

Sprint Summary

| Sprint | Focus | Risk | Key Deliverable |
|--------|-------|------|-----------------|
| 1 | Bug fixes & code quality | Low | Zero clippy warnings, metrics wired |
| 2 | OpenMLS 0.5 → 0.8 | High | Security patches applied, FIPS 203 verified |
| 3 | Client resilience | Medium | Auto-reconnect, heartbeat, status UI |
| 4 | Server hardening | Medium | Real graceful shutdown, timeouts enforced |
| 5 | Test coverage & CI | Low | Unit tests for SDK/RPC/domain, fuzzing |
| 6 | Federation & P2P | High | Working cross-server messaging, P2P fallback |
| 7 | Docs, polish & audit prep | Low | Audit-ready, desktop prototype |