quicproquo/docs/src/roadmap/production-readiness.md
Chris Nennemann 853ca4fec0 chore: rename project quicnprotochat -> quicproquo (binaries: qpq)
2026-03-01 20:11:51 +01:00


Production Readiness WBS

This page defines the work breakdown structure (WBS) for taking quicproquo from a proof-of-concept to a production-hardened system. It covers feature scope, security policy, phased delivery, and a planning checklist.

For the milestone-by-milestone tracker, see Milestones. This document focuses on the cross-cutting concerns that span multiple milestones.


Feature Scope (Must-Have)

These are the feature areas that must be addressed before quicproquo can be considered production-ready. Each area maps to one or more milestones or phases in the WBS below.

| Area | Description | Primary Milestone |
|---|---|---|
| Identity / Auth | Account creation, device registration, token-based RPC authentication, MLS identity binding | M4 + Phase 3 |
| Key / MLS Lifecycle | KeyPackage rotation, epoch advancement, member removal, credential updates | M5 + Phase 2 |
| Transport / Delivery | QUIC + TLS 1.3 hardening, ALPN enforcement, connection draining, reconnect | M1 (done) + Phase 2 |
| Private 1:1 Channels | Channel creation, per-channel authz, TTL eviction, DM-specific flows | Phase 4 |
| Storage / Persistence | SQLite (or SQLCipher) for AS, DS, client state; migrations; backup/restore | M6 + Phase 6 |
| Observability / Ops | Structured logging, metrics, distributed tracing, healthcheck endpoints | Phase 6 |
| Client Resilience | Offline queue, retry with backoff, idempotent message IDs, gap detection | Phase 4 |
| Compatibility / Protocols | Wire versioning, N-1 client interoperability, ciphersuite negotiation | Phase 2 + Phase 5 |

Security Plan (By Design)

quicproquo follows a security-by-design philosophy. The standards below are non-negotiable -- see Coding Standards for how they are enforced in code.

Governance

  • CODEOWNERS file mapping each crate to a responsible reviewer.
  • All PRs require at least one review from a crate owner.
  • Security-sensitive changes (crypto, auth, wire format) require two reviewers.
  • GPG-signed commits only.

Transport Policy

  • TLS 1.3 only (rustls configured with TLS13 cipher suites exclusively).
  • ALPN token b"capnp" required; reject connections with mismatched ALPN.
  • Self-signed certificates acceptable for development; production deployments must use a CA-signed certificate or certificate pinning.
  • Connection draining on shutdown (QUIC CONNECTION_CLOSE).
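
The ALPN rule above can be sketched as a small post-handshake check. This is a minimal illustration, not the project's code: `check_alpn` and `AlpnError` are hypothetical names, and how the negotiated protocol is read from the connection depends on the QUIC/TLS library in use.

```rust
/// ALPN token required by the transport policy.
const REQUIRED_ALPN: &[u8] = b"capnp";

/// Why a connection was refused at the ALPN check (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum AlpnError {
    /// The peer did not negotiate any application protocol.
    Missing,
    /// The peer negotiated a protocol other than the required one.
    Mismatch(Vec<u8>),
}

/// Accepts the connection only if the negotiated ALPN is exactly `capnp`.
///
/// `negotiated` is whatever the TLS stack reports after the handshake.
pub fn check_alpn(negotiated: Option<&[u8]>) -> Result<(), AlpnError> {
    match negotiated {
        Some(p) if p == REQUIRED_ALPN => Ok(()),
        Some(p) => Err(AlpnError::Mismatch(p.to_vec())),
        None => Err(AlpnError::Missing),
    }
}
```

On an `Err`, the server would close the connection rather than fall back to a default protocol.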

MLS Policy

  • Ciphersuite: MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519 (baseline).
  • Single-use KeyPackages (consumed on fetch, per RFC 9420).
  • KeyPackage TTL: 24 hours; clients must rotate before expiry.
  • Ciphersuite allowlist: server rejects KeyPackages with unknown ciphersuites.
  • No downgrade: once a group has used a ciphersuite, members cannot rejoin with a weaker one.
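
As a rough sketch of the allowlist and TTL rules, the server-side admission check might look like the following. The function and type names are hypothetical; the value `0x0001` is the RFC 9420 identifier for the baseline ciphersuite, and the rotation margin is an assumed client-side parameter that the policy does not specify.

```rust
use std::time::Duration;

/// RFC 9420 ciphersuite ID for MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519.
pub const BASELINE_SUITE: u16 = 0x0001;

/// Suites the server accepts. Only the baseline is listed in this sketch.
pub const ALLOWED_SUITES: &[u16] = &[BASELINE_SUITE];

/// KeyPackage lifetime from the MLS policy (24 hours).
pub const KEY_PACKAGE_TTL: Duration = Duration::from_secs(24 * 60 * 60);

#[derive(Debug, PartialEq, Eq)]
pub enum KeyPackageError {
    UnknownCiphersuite(u16),
    Expired,
}

/// Server-side admission check. `age` is the time since the upload.
pub fn admit_key_package(suite: u16, age: Duration) -> Result<(), KeyPackageError> {
    if !ALLOWED_SUITES.contains(&suite) {
        return Err(KeyPackageError::UnknownCiphersuite(suite));
    }
    if age >= KEY_PACKAGE_TTL {
        return Err(KeyPackageError::Expired);
    }
    Ok(())
}

/// Client-side helper: true once the KeyPackage is within `margin` of expiry.
/// The margin is an assumption; the policy only says "before expiry".
pub fn should_rotate(age: Duration, margin: Duration) -> bool {
    age + margin >= KEY_PACKAGE_TTL
}
```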

Input Validation

  • All incoming Cap'n Proto messages validated against schema before processing.
  • Maximum payload size: 5 MB per RPC call.
  • Group ID, identity key, and channel ID fields validated for correct length (32 bytes, 32 bytes, 16 bytes respectively).
  • UTF-8 validation on all string fields.
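
The length, size, and UTF-8 rules above could be checked along these lines once a message has been decoded. `RawRequest`, its `label` field, and `validate` are hypothetical names used for illustration, and "5 MB" is interpreted here as 5 * 1024 * 1024 bytes, which is an assumption.

```rust
/// Field lengths from the input-validation policy.
pub const GROUP_ID_LEN: usize = 32;
pub const IDENTITY_KEY_LEN: usize = 32;
pub const CHANNEL_ID_LEN: usize = 16;
/// 5 MB cap, read as 5 MiB in this sketch (an assumption).
pub const MAX_PAYLOAD_BYTES: usize = 5 * 1024 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    PayloadTooLarge { actual: usize },
    BadLength { field: &'static str, expected: usize, actual: usize },
    InvalidUtf8 { field: &'static str },
}

/// Hypothetical view of one decoded request; not the project's schema.
pub struct RawRequest<'a> {
    pub group_id: &'a [u8],
    pub identity_key: &'a [u8],
    pub channel_id: &'a [u8],
    pub label: &'a [u8],
    pub payload: &'a [u8],
}

fn check_len(field: &'static str, bytes: &[u8], expected: usize) -> Result<(), ValidationError> {
    if bytes.len() == expected {
        Ok(())
    } else {
        Err(ValidationError::BadLength { field, expected, actual: bytes.len() })
    }
}

/// Rejects oversized payloads first, then checks ID lengths and UTF-8.
pub fn validate(req: &RawRequest<'_>) -> Result<(), ValidationError> {
    if req.payload.len() > MAX_PAYLOAD_BYTES {
        return Err(ValidationError::PayloadTooLarge { actual: req.payload.len() });
    }
    check_len("group_id", req.group_id, GROUP_ID_LEN)?;
    check_len("identity_key", req.identity_key, IDENTITY_KEY_LEN)?;
    check_len("channel_id", req.channel_id, CHANNEL_ID_LEN)?;
    std::str::from_utf8(req.label)
        .map_err(|_| ValidationError::InvalidUtf8 { field: "label" })?;
    Ok(())
}
```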

Secrets Management

  • All private key material wrapped in Zeroizing<T> (via the zeroize crate).
  • No secret material in log output at any level.
  • No unwrap() on cryptographic operations -- all errors are typed and propagated.
  • Constant-time comparison for authentication tokens and key fingerprints.
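
To illustrate what constant-time comparison means here, the sketch below accumulates differences across every byte instead of returning at the first mismatch. `ct_eq` is a hypothetical helper written with the standard library only; an optimizing compiler gives no timing guarantee for code like this, so a vetted implementation such as the `subtle` crate's `ConstantTimeEq` trait would be the safer choice in practice.

```rust
/// Compares two byte strings without exiting early on the first mismatch.
///
/// The length check does return early; lengths are treated as public
/// in this sketch (token and fingerprint sizes are fixed and known).
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; the result is zero
    // only if all pairs were equal.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```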

Abuse / DoS Controls

  • Rate limiting: 50 requests/second per IP, per account, and per device.
  • Payload cap: 5 MB per message.
  • Connection limit: configurable max concurrent QUIC connections.
  • KeyPackage upload limit: configurable per account (prevents store exhaustion).
  • Long-poll timeout cap: server-enforced maximum for fetchWait.
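
One simple way to realize the 50 requests/second rule is a fixed-window counter per key. This is a sketch under assumptions: the `RateLimiter` type, the key naming scheme, and the fixed-window algorithm are illustrative choices, not the project's design (a token bucket would smooth bursts better). The clock is passed in so the logic can be tested deterministically.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Limit from the abuse-control policy.
pub const LIMIT_PER_SECOND: u32 = 50;

/// Fixed-window counter keyed by an arbitrary string such as
/// "ip:...", "acct:..." or "dev:..." (hypothetical key scheme).
pub struct RateLimiter {
    limit: u32,
    window: Duration,
    counters: HashMap<String, (Instant, u32)>,
}

impl RateLimiter {
    pub fn new(limit: u32) -> Self {
        Self { limit, window: Duration::from_secs(1), counters: HashMap::new() }
    }

    /// Returns true if the request identified by `key` is allowed at `now`.
    pub fn allow(&mut self, key: &str, now: Instant) -> bool {
        let entry = self.counters.entry(key.to_owned()).or_insert((now, 0));
        // Start a fresh window once the current one has elapsed.
        if now.duration_since(entry.0) >= self.window {
            *entry = (now, 0);
        }
        if entry.1 < self.limit {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}
```

A request handler would call `allow` once for each of the IP, account, and device keys and reject the request if any of them returns `false`.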

Data Protection

  • MLS ciphertext is opaque to the server (DS never holds group keys).
  • Message retention: 7 days default, configurable.
  • KeyPackage retention: 24 hours (TTL eviction).
  • At-rest encryption for persistent storage (SQLCipher at M6).
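
The retention rule can be pictured as a periodic sweep over stored ciphertexts. The sketch below uses an in-memory `Vec` purely for illustration; `StoredMessage` and `sweep_expired` are hypothetical names, and a real store would run the equivalent delete against its database.

```rust
use std::time::{Duration, SystemTime};

/// Default message retention from the data-protection policy (7 days).
pub const MESSAGE_RETENTION: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Hypothetical stored record: the server only ever sees ciphertext.
pub struct StoredMessage {
    pub stored_at: SystemTime,
    pub ciphertext: Vec<u8>,
}

/// True if the message is at or past the retention limit at `now`.
pub fn is_expired(msg: &StoredMessage, now: SystemTime, retention: Duration) -> bool {
    match now.duration_since(msg.stored_at) {
        Ok(age) => age >= retention,
        // `stored_at` lies in the future (clock skew): treat as not expired.
        Err(_) => false,
    }
}

/// Removes expired messages and returns how many were dropped.
pub fn sweep_expired(queue: &mut Vec<StoredMessage>, now: SystemTime, retention: Duration) -> usize {
    let before = queue.len();
    queue.retain(|m| !is_expired(m, now, retention));
    before - queue.len()
}
```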

Logging Safety

  • Structured logging via tracing with env-filter.
  • Sensitive fields (keys, tokens, ciphertext) are never logged, even at TRACE.
  • Audit-level events: auth success/failure, token issuance, keypackage upload, enqueue/fetch, rate limit hits.
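
One way to make the "never logged" rule hard to violate by accident is a wrapper type whose `Debug` and `Display` output is always a placeholder. The `Redacted` wrapper and the `AuthEvent` struct below are hypothetical; the sketch uses only the standard library and does not depend on any particular `tracing` setup.

```rust
use std::fmt;

/// Wrapper that hides its contents from all formatted output.
pub struct Redacted<T>(pub T);

impl<T> fmt::Debug for Redacted<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

impl<T> fmt::Display for Redacted<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Hypothetical audit event; the token can be carried but never printed.
#[derive(Debug)]
pub struct AuthEvent {
    pub account_id: u64,
    pub token: Redacted<String>,
    pub success: bool,
}
```

Because the derived `Debug` for `AuthEvent` delegates to each field, formatting the whole event with `{:?}` prints the account ID and outcome but only the placeholder for the token.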

Testing

  • Unit tests for all crypto operations (see Testing Strategy).
  • Integration tests for every RPC method.
  • Negative tests: malformed input, expired tokens, wrong identity, replay attempts.
  • N-1 compatibility tests (old client against new server).
  • Fuzzing targets for Cap'n Proto parsers and MLS message handling (Phase 5).

Work Breakdown (6 Phases)

Phase 1 -- Baselines and Governance

Goal: Establish project hygiene before adding features.

| Task | Description |
|---|---|
| CODEOWNERS | Map crates to responsible reviewers |
| CI pipeline | GitHub Actions: cargo test --workspace, cargo clippy, cargo fmt --check, cargo deny check |
| SBOM generation | cargo-cyclonedx or cargo-about in CI; publish with each release |
| Threat model | Document assets, adversaries, attack surface, trust boundaries; reference in Threat Model |
| Dependency audit | cargo audit in CI; pin all major versions per Coding Standards |

Phase 2 -- Protocols and Core Hardening

Goal: Lock down the wire format and cryptographic policy.

| Task | Description |
|---|---|
| Wire versioning | Add version field to all Cap'n Proto structs; reject unknown versions |
| Ciphersuite allowlist | Server rejects KeyPackages outside the allowed set |
| Downgrade guards | Prevent epoch rollback; reject Commits with weaker ciphersuites |
| ALPN enforcement | Reject connections without b"capnp" ALPN token |
| Connection draining | Graceful QUIC CONNECTION_CLOSE on server shutdown |
| KeyPackage rotation | Client-side timer to upload fresh KeyPackages before TTL expiry |
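
The wire-versioning and epoch-rollback tasks reduce to two small guards, sketched below. The version number `1`, the function names, and the rule "reject anything older than the current epoch" are assumptions made for illustration; the actual version field and epoch handling are whatever the schema and the MLS library define.

```rust
/// Wire versions this build understands (the value 1 is an assumption).
pub const SUPPORTED_WIRE_VERSIONS: &[u16] = &[1];

#[derive(Debug, PartialEq, Eq)]
pub enum PolicyError {
    UnknownWireVersion(u16),
    EpochRollback { current: u64, received: u64 },
}

/// Rejects any message whose version field is not explicitly supported.
pub fn check_wire_version(version: u16) -> Result<(), PolicyError> {
    if SUPPORTED_WIRE_VERSIONS.contains(&version) {
        Ok(())
    } else {
        Err(PolicyError::UnknownWireVersion(version))
    }
}

/// Rejects a group message that refers to an epoch older than the one
/// the receiver has already reached.
pub fn check_epoch(current: u64, received: u64) -> Result<(), PolicyError> {
    if received < current {
        Err(PolicyError::EpochRollback { current, received })
    } else {
        Ok(())
    }
}
```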

Phase 3 -- Auth, Device, and Server Hardening

Goal: Add account/device identity and token-based authentication.

See Auth, Devices, and Tokens for the full design.

| Task | Description |
|---|---|
| Account + device model | {account_id, device_id, device_pubkey} with status lifecycle |
| Token issuance | Access + refresh tokens; configurable expiry |
| RPC auth middleware | Validate token on every RPC; map to account/device |
| Identity binding | Bind MLS identity key to account; reject mismatched uploads |
| Rate limiting | Per-IP, per-account, per-device counters |
| Audit logging | Auth events, token lifecycle, rate limit hits |
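
A possible shape for the account/device record and its status lifecycle is sketched below. Only the three field names come from the table; the field widths, the `Pending`/`Active`/`Revoked` states, the allowed transitions, and the assumption that the MLS identity key equals the registered device key are all hypothetical. See the Auth, Devices, and Tokens page for the actual design.

```rust
/// Hypothetical device states; the real lifecycle may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeviceStatus {
    Pending,
    Active,
    Revoked,
}

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: DeviceStatus,
    pub to: DeviceStatus,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Device {
    pub account_id: [u8; 16],
    pub device_id: [u8; 16],
    pub device_pubkey: [u8; 32],
    pub status: DeviceStatus,
}

impl Device {
    /// New devices start in `Pending` until activated.
    pub fn register(account_id: [u8; 16], device_id: [u8; 16], device_pubkey: [u8; 32]) -> Self {
        Self { account_id, device_id, device_pubkey, status: DeviceStatus::Pending }
    }

    /// Applies a status change; `Revoked` is terminal in this sketch.
    pub fn transition(&mut self, to: DeviceStatus) -> Result<(), InvalidTransition> {
        use DeviceStatus::*;
        match (self.status, to) {
            (Pending, Active) | (Pending, Revoked) | (Active, Revoked) => {
                self.status = to;
                Ok(())
            }
            (from, to) => Err(InvalidTransition { from, to }),
        }
    }

    /// Identity binding: accept a KeyPackage upload only from an active
    /// device whose registered key matches the MLS identity key.
    pub fn may_upload(&self, mls_identity_key: &[u8; 32]) -> bool {
        self.status == DeviceStatus::Active && &self.device_pubkey == mls_identity_key
    }
}
```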

Phase 4 -- Delivery Semantics and Client Resilience

Goal: Reliable message delivery and 1:1 channels.

See 1:1 Channel Design for the DM-specific design.

| Task | Description |
|---|---|
| Idempotent message IDs | Client-generated UUIDs; server deduplicates |
| Ordering guarantees | Per-channel sequence numbers; client detects gaps |
| Offline queue | Server retains messages for offline recipients (up to TTL) |
| 1:1 channels | Channel creation, membership, per-channel authz |
| TTL eviction | Background sweep + fetch-time check for expired messages |
| Client retry | Exponential backoff with jitter on transient failures |
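
Three of the tasks above (deduplication, gap detection, and retry backoff) are small enough to sketch together. All type and function names are hypothetical. Message IDs are modeled as 16-byte UUID values, and the jitter draw is passed in as a per-mille value so the example needs no random-number crate; a real client would draw it from an RNG.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Server-side deduplication keyed by a 16-byte client-generated ID.
pub struct Deduper {
    seen: HashSet<[u8; 16]>,
}

impl Deduper {
    pub fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Returns true the first time an ID is seen, false for a retransmit.
    pub fn accept(&mut self, id: [u8; 16]) -> bool {
        self.seen.insert(id)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum SeqCheck {
    InOrder,
    /// Sequence number below the next expected one (duplicate or late).
    Stale,
    /// One or more sequence numbers were skipped (inclusive range).
    Gap { missing_from: u64, missing_to: u64 },
}

/// Client-side tracker for one channel's sequence numbers.
pub struct GapDetector {
    next_expected: u64,
}

impl GapDetector {
    pub fn new(first: u64) -> Self {
        Self { next_expected: first }
    }

    pub fn observe(&mut self, seq: u64) -> SeqCheck {
        if seq < self.next_expected {
            SeqCheck::Stale
        } else if seq == self.next_expected {
            self.next_expected += 1;
            SeqCheck::InOrder
        } else {
            let gap = SeqCheck::Gap { missing_from: self.next_expected, missing_to: seq - 1 };
            self.next_expected = seq + 1;
            gap
        }
    }
}

/// Exponential backoff with jitter: min(cap, base * 2^attempt) scaled by
/// `jitter_permille / 1000`, where the caller draws 0..=1000 at random.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter_permille: u32) -> Duration {
    let exp = base.saturating_mul(2u32.saturating_pow(attempt)).min(cap);
    exp.saturating_mul(jitter_permille.min(1000)) / 1000
}
```

Note that after reporting a gap this detector treats a late arrival of a missing number as `Stale`; a client that wants to backfill would track the missing range separately.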

Phase 5 -- E2E Harness and Security Tests

Goal: Automated end-to-end testing and security validation.

| Task | Description |
|---|---|
| docker-compose testnet | Multi-node test environment with configurable topology |
| Positive E2E tests | Full group lifecycle: register, create, invite, join, send, recv, leave |
| Negative E2E tests | Expired tokens, wrong identity, replay, malformed messages |
| Compat matrix | N-1 client/server version testing |
| Fuzz targets | cargo-fuzz targets for Cap'n Proto parsers, MLS message handlers |
| Golden-wire fixtures | Serialised test vectors for regression testing across versions |

Phase 6 -- Reliability, Performance, and Operations

Goal: Production-grade operations and performance validation.

| Task | Description |
|---|---|
| SQLite/SQLCipher persistence | AS key store, DS message log, client state (M6) |
| Soak testing | 72-hour continuous operation under synthetic load |
| Load testing | Throughput and latency benchmarks (Criterion + custom harness) |
| Chaos testing | Network partitions, process crashes, disk full scenarios |
| Backup / restore | SQLite backup with integrity verification |
| Canary / rollback | Rolling deployment strategy with automatic rollback on failure |
| Metrics + dashboards | Prometheus metrics, Grafana dashboards (see Future Research) |

Planning Checklist

Use this checklist when planning a new milestone or phase. Each item should have a documented decision before implementation begins.

  • Release criteria / SLOs -- Define what "done" means: latency targets, error-rate thresholds, and test-coverage minimums.
  • Threat model review -- Update the Threat Model for any new attack surface introduced by this phase.
  • Protocol policy -- Ciphersuite allowlist, wire version, downgrade rules.
  • Identity / auth model -- Who authenticates, how, and what operations are gated.
  • Data model -- Schema changes, migrations, backward compatibility.
  • Abuse controls -- Rate limits, size caps, connection limits for this phase.
  • Observability contracts -- What new metrics, logs, and traces are needed.
  • Environments / secrets -- Dev, staging, production configuration; secret rotation plan.
  • Testing matrix -- Unit, integration, E2E, negative, fuzz, compat tests for this phase.
  • Rollout / ops -- Deployment strategy, rollback plan, monitoring during rollout.

Cross-references