# Production Readiness WBS

This page defines the work breakdown structure (WBS) for taking quicproquo from a proof-of-concept to a production-hardened system. It covers feature scope, security policy, phased delivery, and a planning checklist.

For the milestone-by-milestone tracker, see [Milestones](milestones.md). This document focuses on the cross-cutting concerns that span multiple milestones.

---

## Feature Scope (Must-Have)

These are the feature areas that must be addressed before quicproquo can be considered production-ready. Each area maps to one or more milestones or phases in the WBS below.

| Area | Description | Primary Milestone |
|------|-------------|-------------------|
| **Identity / Auth** | Account creation, device registration, token-based RPC authentication, MLS identity binding | M4 + Phase 3 |
| **Key / MLS Lifecycle** | KeyPackage rotation, epoch advancement, member removal, credential updates | M5 + Phase 2 |
| **Transport / Delivery** | QUIC + TLS 1.3 hardening, ALPN enforcement, connection draining, reconnect | M1 (done) + Phase 2 |
| **Private 1:1 Channels** | Channel creation, per-channel authz, TTL eviction, DM-specific flows | Phase 4 |
| **Storage / Persistence** | SQLite (or SQLCipher) for AS, DS, client state; migrations; backup/restore | M6 + Phase 6 |
| **Observability / Ops** | Structured logging, metrics, distributed tracing, healthcheck endpoints | Phase 6 |
| **Client Resilience** | Offline queue, retry with backoff, idempotent message IDs, gap detection | Phase 4 |
| **Compatibility / Protocols** | Wire versioning, N-1 client interoperability, ciphersuite negotiation | Phase 2 + Phase 5 |

---

## Security Plan (By Design)

quicproquo follows a security-by-design philosophy. The standards below are non-negotiable -- see [Coding Standards](../contributing/coding-standards.md) for how they are enforced in code.

### Governance

- `CODEOWNERS` file mapping each crate to a responsible reviewer.
- All PRs require at least one review from a crate owner.
- Security-sensitive changes (crypto, auth, wire format) require two reviewers.
- GPG-signed commits only.

### Transport Policy

- TLS 1.3 only (`rustls` configured with `TLS13` cipher suites exclusively).
- ALPN token `b"capnp"` required; reject connections with mismatched ALPN.
- Self-signed certificates acceptable for development; production deployments must use a CA-signed certificate or certificate pinning.
- Connection draining on shutdown (QUIC `CONNECTION_CLOSE`).

### MLS Policy

- Ciphersuite: `MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519` (baseline).
- Single-use KeyPackages (consumed on fetch, per RFC 9420).
- KeyPackage TTL: 24 hours; clients must rotate before expiry.
- Ciphersuite allowlist: server rejects KeyPackages with unknown ciphersuites.
- No downgrade: once a group has used a ciphersuite, members cannot rejoin with a weaker one.

### Input Validation

- All incoming Cap'n Proto messages validated against schema before processing.
- Maximum payload size: 5 MB per RPC call.
- Group ID, identity key, and channel ID fields validated for correct length (32 bytes, 32 bytes, 16 bytes respectively).
- UTF-8 validation on all string fields.

### Secrets Management

- All private key material wrapped in `Zeroizing` (via the `zeroize` crate).
- No secret material in log output at any level.
- No `unwrap()` on cryptographic operations -- all errors are typed and propagated.
- Constant-time comparison for authentication tokens and key fingerprints.

### Abuse / DoS Controls

- Rate limiting: 50 requests/second per IP, per account, and per device.
- Payload cap: 5 MB per message.
- Connection limit: configurable max concurrent QUIC connections.
- KeyPackage upload limit: configurable per account (prevents store exhaustion).
- Long-poll timeout cap: server-enforced maximum for `fetchWait`.

### Data Protection

- MLS ciphertext is opaque to the server (DS never holds group keys).
- Message retention: 7 days default, configurable.
- KeyPackage retention: 24 hours (TTL eviction).
- At-rest encryption for persistent storage (SQLCipher at M6).

### Logging Safety

- Structured logging via `tracing` with `env-filter`.
- Sensitive fields (keys, tokens, ciphertext) are never logged, even at `TRACE`.
- Audit-level events: auth success/failure, token issuance, keypackage upload, enqueue/fetch, rate limit hits.

### Testing

- Unit tests for all crypto operations (see [Testing Strategy](../contributing/testing.md)).
- Integration tests for every RPC method.
- Negative tests: malformed input, expired tokens, wrong identity, replay attempts.
- N-1 compatibility tests (old client against new server).
- Fuzzing targets for Cap'n Proto parsers and MLS message handling (Phase 5).

---

## Work Breakdown (6 Phases)

### Phase 1 -- Baselines and Governance

**Goal:** Establish project hygiene before adding features.

| Task | Description |
|------|-------------|
| CODEOWNERS | Map crates to responsible reviewers |
| CI pipeline | GitHub Actions: `cargo test --workspace`, `cargo clippy`, `cargo fmt --check`, `cargo deny check` |
| SBOM generation | `cargo-cyclonedx` or `cargo-about` in CI; publish with each release |
| Threat model | Document assets, adversaries, attack surface, trust boundaries; reference in [Threat Model](../cryptography/threat-model.md) |
| Dependency audit | `cargo audit` in CI; pin all major versions per [Coding Standards](../contributing/coding-standards.md) |

### Phase 2 -- Protocols and Core Hardening

**Goal:** Lock down the wire format and cryptographic policy.

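
As a rough illustration of the kind of policy checks this phase calls for (wire versioning, the ciphersuite allowlist, and an epoch rollback guard), here is a minimal standard-library sketch. All names (`ACCEPTED_WIRE_VERSIONS`, `PolicyError`, the `check_*` functions) and the wire version value `1` are assumptions made for illustration and are not taken from the quicproquo codebase. The ciphersuite identifier `0x0001` is the RFC 9420 value for `MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519`.

```rust
/// Wire versions this build accepts. The value is a placeholder: this
/// document does not specify real version numbers.
pub const ACCEPTED_WIRE_VERSIONS: &[u16] = &[1];

/// RFC 9420 identifier for MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519.
pub const BASELINE_CIPHERSUITE: u16 = 0x0001;

/// Ciphersuites the server is willing to store KeyPackages for.
pub const CIPHERSUITE_ALLOWLIST: &[u16] = &[BASELINE_CIPHERSUITE];

/// Typed errors, so callers never need `unwrap()` on a policy decision.
#[derive(Debug, PartialEq, Eq)]
pub enum PolicyError {
    UnknownWireVersion(u16),
    CiphersuiteNotAllowed(u16),
    EpochRollback { current: u64, offered: u64 },
}

/// Reject any message whose `version` field is not explicitly accepted.
pub fn check_wire_version(version: u16) -> Result<(), PolicyError> {
    if ACCEPTED_WIRE_VERSIONS.contains(&version) {
        Ok(())
    } else {
        Err(PolicyError::UnknownWireVersion(version))
    }
}

/// Reject KeyPackages whose ciphersuite is outside the allowlist.
pub fn check_key_package_suite(suite: u16) -> Result<(), PolicyError> {
    if CIPHERSUITE_ALLOWLIST.contains(&suite) {
        Ok(())
    } else {
        Err(PolicyError::CiphersuiteNotAllowed(suite))
    }
}

/// Simplified rollback guard: an offered epoch must be strictly newer
/// than the epoch the group is currently in.
pub fn check_epoch(current: u64, offered: u64) -> Result<(), PolicyError> {
    if offered > current {
        Ok(())
    } else {
        Err(PolicyError::EpochRollback { current, offered })
    }
}
```

Keeping the accepted versions and ciphersuites in explicit lists means that anything not listed is rejected by default, which matches the "reject unknown" wording in the task table below.
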

| Task | Description |
|------|-------------|
| Wire versioning | Add `version` field to all Cap'n Proto structs; reject unknown versions |
| Ciphersuite allowlist | Server rejects KeyPackages outside the allowed set |
| Downgrade guards | Prevent epoch rollback; reject Commits with weaker ciphersuites |
| ALPN enforcement | Reject connections without `b"capnp"` ALPN token |
| Connection draining | Graceful QUIC `CONNECTION_CLOSE` on server shutdown |
| KeyPackage rotation | Client-side timer to upload fresh KeyPackages before TTL expiry |

### Phase 3 -- Auth, Device, and Server Hardening

**Goal:** Add account/device identity and token-based authentication.

See [Auth, Devices, and Tokens](authz-plan.md) for the full design.

| Task | Description |
|------|-------------|
| Account + device model | `{account_id, device_id, device_pubkey}` with status lifecycle |
| Token issuance | Access + refresh tokens; configurable expiry |
| RPC auth middleware | Validate token on every RPC; map to account/device |
| Identity binding | Bind MLS identity key to account; reject mismatched uploads |
| Rate limiting | Per-IP, per-account, per-device counters |
| Audit logging | Auth events, token lifecycle, rate limit hits |

### Phase 4 -- Delivery Semantics and Client Resilience

**Goal:** Reliable message delivery and 1:1 channels.

See [1:1 Channel Design](dm-channels.md) for the DM-specific design.

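
Three of the delivery mechanisms planned for this phase (server-side deduplication on message IDs, client-side gap detection on per-channel sequence numbers, and retry with exponential backoff and jitter) can be sketched with the standard library alone. The types and function names are hypothetical, message IDs are modelled as `u128` (the width of a UUID), and the caller is assumed to supply the random value used for jitter, since the standard library has no random number generator.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Server-side deduplication keyed on client-generated message IDs.
#[derive(Default)]
pub struct Deduplicator {
    seen: HashSet<u128>,
}

impl Deduplicator {
    /// Returns true if the ID is new and the message should be enqueued,
    /// false if the same ID was already accepted.
    pub fn accept(&mut self, message_id: u128) -> bool {
        self.seen.insert(message_id)
    }
}

/// Result of comparing a received sequence number with the expected one.
#[derive(Debug, PartialEq, Eq)]
pub enum SeqCheck {
    /// Exactly the next expected sequence number.
    InOrder,
    /// Below the next expected number: a duplicate or a late arrival.
    Behind,
    /// Ahead of the expected number; the inclusive range is missing.
    Gap { missing_from: u64, missing_to: u64 },
}

/// Client-side gap detector for a single channel.
pub struct GapDetector {
    next_expected: u64,
}

impl GapDetector {
    pub fn new(first_seq: u64) -> Self {
        Self { next_expected: first_seq }
    }

    pub fn observe(&mut self, seq: u64) -> SeqCheck {
        if seq < self.next_expected {
            SeqCheck::Behind
        } else if seq == self.next_expected {
            self.next_expected = seq.saturating_add(1);
            SeqCheck::InOrder
        } else {
            let gap = SeqCheck::Gap {
                missing_from: self.next_expected,
                missing_to: seq - 1,
            };
            self.next_expected = seq.saturating_add(1);
            gap
        }
    }
}

/// Exponential backoff with "full jitter": the delay is picked from
/// 0..=min(cap, base * 2^attempt), using a caller-supplied random value.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration, random: u64) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    let ceiling = base.checked_mul(factor).unwrap_or(cap).min(cap);
    let ceiling_nanos = u64::try_from(ceiling.as_nanos()).unwrap_or(u64::MAX);
    Duration::from_nanos(random % ceiling_nanos.saturating_add(1))
}
```

A real deduplicator would also need to forget IDs once the corresponding messages pass their TTL, otherwise the set grows without bound; that bookkeeping is omitted here.
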

| Task | Description |
|------|-------------|
| Idempotent message IDs | Client-generated UUIDs; server deduplicates |
| Ordering guarantees | Per-channel sequence numbers; client detects gaps |
| Offline queue | Server retains messages for offline recipients (up to TTL) |
| 1:1 channels | Channel creation, membership, per-channel authz |
| TTL eviction | Background sweep + fetch-time check for expired messages |
| Client retry | Exponential backoff with jitter on transient failures |

### Phase 5 -- E2E Harness and Security Tests

**Goal:** Automated end-to-end testing and security validation.

| Task | Description |
|------|-------------|
| docker-compose testnet | Multi-node test environment with configurable topology |
| Positive E2E tests | Full group lifecycle: register, create, invite, join, send, recv, leave |
| Negative E2E tests | Expired tokens, wrong identity, replay, malformed messages |
| Compat matrix | N-1 client/server version testing |
| Fuzz targets | `cargo-fuzz` targets for Cap'n Proto parsers, MLS message handlers |
| Golden-wire fixtures | Serialised test vectors for regression testing across versions |

### Phase 6 -- Reliability, Performance, and Operations

**Goal:** Production-grade operations and performance validation.

| Task | Description |
|------|-------------|
| SQLite/SQLCipher persistence | AS key store, DS message log, client state (M6) |
| Soak testing | 72-hour continuous operation under synthetic load |
| Load testing | Throughput and latency benchmarks (Criterion + custom harness) |
| Chaos testing | Network partitions, process crashes, disk full scenarios |
| Backup / restore | SQLite backup with integrity verification |
| Canary / rollback | Rolling deployment strategy with automatic rollback on failure |
| Metrics + dashboards | Prometheus metrics, Grafana dashboards (see [Future Research](future-research.md)) |

---

## Planning Checklist

Use this checklist when planning a new milestone or phase.
Each item should have a documented decision before implementation begins.

- [ ] **Release criteria / SLOs** -- Define what "done" means. Latency targets, error rate thresholds, test coverage minimums.
- [ ] **Threat model review** -- Update the [Threat Model](../cryptography/threat-model.md) for any new attack surface introduced by this phase.
- [ ] **Protocol policy** -- Ciphersuite allowlist, wire version, downgrade rules.
- [ ] **Identity / auth model** -- Who authenticates, how, and what operations are gated.
- [ ] **Data model** -- Schema changes, migrations, backward compatibility.
- [ ] **Abuse controls** -- Rate limits, size caps, connection limits for this phase.
- [ ] **Observability contracts** -- What new metrics, logs, and traces are needed.
- [ ] **Environments / secrets** -- Dev, staging, production configuration; secret rotation plan.
- [ ] **Testing matrix** -- Unit, integration, E2E, negative, fuzz, compat tests for this phase.
- [ ] **Rollout / ops** -- Deployment strategy, rollback plan, monitoring during rollout.

---

## Cross-references

- [Milestones](milestones.md) -- feature milestone tracker
- [Auth, Devices, and Tokens](authz-plan.md) -- Phase 3 design
- [1:1 Channel Design](dm-channels.md) -- Phase 4 design
- [Future Research](future-research.md) -- technology options for Phase 6+
- [Coding Standards](../contributing/coding-standards.md) -- engineering standards
- [Testing Strategy](../contributing/testing.md) -- test structure and conventions
- [Threat Model](../cryptography/threat-model.md) -- security analysis