Production Readiness WBS
This page defines the work breakdown structure (WBS) for taking quicprochat from a proof-of-concept to a production-hardened system. It covers feature scope, security policy, phased delivery, and a planning checklist.
For the milestone-by-milestone tracker, see Milestones. This document focuses on the cross-cutting concerns that span multiple milestones.
Feature Scope (Must-Have)
These are the feature areas that must be addressed before quicprochat can be considered production-ready. Each area maps to one or more milestones or phases in the WBS below.
| Area | Description | Primary Milestone |
|---|---|---|
| Identity / Auth | Account creation, device registration, token-based RPC authentication, MLS identity binding | M4 + Phase 3 |
| Key / MLS Lifecycle | KeyPackage rotation, epoch advancement, member removal, credential updates | M5 + Phase 2 |
| Transport / Delivery | QUIC + TLS 1.3 hardening, ALPN enforcement, connection draining, reconnect | M1 (done) + Phase 2 |
| Private 1:1 Channels | Channel creation, per-channel authz, TTL eviction, DM-specific flows | Phase 4 |
| Storage / Persistence | SQLite (or SQLCipher) for AS, DS, client state; migrations; backup/restore | M6 + Phase 6 |
| Observability / Ops | Structured logging, metrics, distributed tracing, healthcheck endpoints | Phase 6 |
| Client Resilience | Offline queue, retry with backoff, idempotent message IDs, gap detection | Phase 4 |
| Compatibility / Protocols | Wire versioning, N-1 client interoperability, ciphersuite negotiation | Phase 2 + Phase 5 |
Security Plan (By Design)
quicprochat follows a security-by-design philosophy. The standards below are non-negotiable -- see Coding Standards for how they are enforced in code.
Governance
- `CODEOWNERS` file mapping each crate to a responsible reviewer.
- All PRs require at least one review from a crate owner.
- Security-sensitive changes (crypto, auth, wire format) require two reviewers.
- GPG-signed commits only.
Transport Policy
- TLS 1.3 only (`rustls` configured with `TLS13` cipher suites exclusively).
- ALPN token `b"qpc"` required; reject connections with mismatched ALPN.
- Self-signed certificates acceptable for development; production deployments must use a CA-signed certificate or certificate pinning.
- Connection draining on shutdown (QUIC `CONNECTION_CLOSE`).
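
The ALPN rule above can be sketched as a small check that runs once the handshake has completed. This is a minimal illustration, not the project's code: `check_alpn`, `AlpnError`, and the `negotiated` parameter are hypothetical names, and how the negotiated protocol is obtained depends on the QUIC/TLS library in use.

```rust
/// ALPN protocol identifier required by the transport policy.
const REQUIRED_ALPN: &[u8] = b"qpc";

/// Reasons a connection is refused before any RPC is served (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum AlpnError {
    /// The peer did not negotiate any ALPN protocol.
    Missing,
    /// The peer negotiated a protocol other than `qpc`.
    Mismatch(Vec<u8>),
}

/// Checks the protocol negotiated during the TLS handshake.
/// `negotiated` is whatever the TLS stack reports after the handshake.
fn check_alpn(negotiated: Option<&[u8]>) -> Result<(), AlpnError> {
    match negotiated {
        Some(proto) if proto == REQUIRED_ALPN => Ok(()),
        Some(other) => Err(AlpnError::Mismatch(other.to_vec())),
        None => Err(AlpnError::Missing),
    }
}
```

A server following this policy would close the connection on any `Err` rather than fall back to a default protocol.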
MLS Policy
- Ciphersuite: `MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519` (baseline).
- Single-use KeyPackages (consumed on fetch, per RFC 9420).
- KeyPackage TTL: 24 hours; clients must rotate before expiry.
- Ciphersuite allowlist: server rejects KeyPackages with unknown ciphersuites.
- No downgrade: once a group has used a ciphersuite, members cannot rejoin with a weaker one.
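
As a rough sketch of the allowlist and TTL rules above, a server-side admission check might look like the following. The function and error names are hypothetical, and the sketch assumes the server knows how long ago a KeyPackage was uploaded. The value `0x0001` is the ciphersuite identifier RFC 9420 assigns to the baseline suite. The no-downgrade rule is not covered here because it needs a strength ordering between suites, which this page does not define.

```rust
use std::time::Duration;

/// RFC 9420 identifier of MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519.
const BASELINE_SUITE: u16 = 0x0001;
/// Allowlist of accepted ciphersuites; only the baseline in this sketch.
const ALLOWED_SUITES: &[u16] = &[BASELINE_SUITE];
/// KeyPackage lifetime from the policy above (24 hours).
const KEY_PACKAGE_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical rejection reasons.
#[derive(Debug, PartialEq, Eq)]
enum KeyPackageError {
    UnknownCiphersuite(u16),
    Expired,
}

/// Admission check for a KeyPackage.
/// `age` is the time elapsed since the KeyPackage was uploaded.
fn check_key_package(suite: u16, age: Duration) -> Result<(), KeyPackageError> {
    if !ALLOWED_SUITES.contains(&suite) {
        return Err(KeyPackageError::UnknownCiphersuite(suite));
    }
    if age >= KEY_PACKAGE_TTL {
        return Err(KeyPackageError::Expired);
    }
    Ok(())
}
```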
Input Validation
- All incoming Protobuf messages validated against schema before processing.
- Maximum payload size: 5 MB per RPC call.
- Group ID, identity key, and channel ID fields validated for correct length (32 bytes, 32 bytes, 16 bytes respectively).
- UTF-8 validation on all string fields.
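
The length, size, and UTF-8 rules above could be enforced in one place before any handler runs. The sketch below is illustrative only: the request shape, the `display_name` field, and all type names are hypothetical, and "5 MB" is read as 5 MiB, which is an assumption.

```rust
/// Payload cap from the policy above, read here as 5 MiB (an assumption).
const MAX_PAYLOAD_BYTES: usize = 5 * 1024 * 1024;
const GROUP_ID_LEN: usize = 32;
const IDENTITY_KEY_LEN: usize = 32;
const CHANNEL_ID_LEN: usize = 16;

/// Hypothetical validation errors; each names the offending field.
#[derive(Debug, PartialEq, Eq)]
enum ValidationError {
    PayloadTooLarge(usize),
    BadLength { field: &'static str, expected: usize, actual: usize },
    InvalidUtf8(&'static str),
}

/// Checks that a fixed-size field has exactly the expected length.
fn check_len(field: &'static str, bytes: &[u8], expected: usize) -> Result<(), ValidationError> {
    if bytes.len() == expected {
        Ok(())
    } else {
        Err(ValidationError::BadLength { field, expected, actual: bytes.len() })
    }
}

/// Validates the raw fields of a hypothetical request.
fn validate_request(
    group_id: &[u8],
    identity_key: &[u8],
    channel_id: &[u8],
    display_name: &[u8],
    payload: &[u8],
) -> Result<(), ValidationError> {
    // Reject oversized payloads first, before looking at anything else.
    if payload.len() > MAX_PAYLOAD_BYTES {
        return Err(ValidationError::PayloadTooLarge(payload.len()));
    }
    check_len("group_id", group_id, GROUP_ID_LEN)?;
    check_len("identity_key", identity_key, IDENTITY_KEY_LEN)?;
    check_len("channel_id", channel_id, CHANNEL_ID_LEN)?;
    // String fields must be well-formed UTF-8.
    std::str::from_utf8(display_name)
        .map_err(|_| ValidationError::InvalidUtf8("display_name"))?;
    Ok(())
}
```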
Secrets Management
- All private key material wrapped in `Zeroizing<T>` (via the `zeroize` crate).
- No secret material in log output at any level.
- No `unwrap()` on cryptographic operations -- all errors are typed and propagated.
- Constant-time comparison for authentication tokens and key fingerprints.
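
To illustrate what constant-time comparison means here, the sketch below compares two byte strings without returning at the first mismatch. It uses only the standard library and is meant to show the idea; the compiler gives no timing guarantee for hand-written code like this, so a production build would more plausibly rely on a vetted crate such as `subtle`. The function name is hypothetical.

```rust
/// Compares two byte strings without returning early on the first mismatch.
/// Length is treated as public; only the contents are compared uniformly.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Accumulate the XOR of every byte pair so the loop always runs to the end.
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

The point of the accumulator is that the running time depends on the input length, not on where the first differing byte sits, which is what a plain `==` on slices may reveal.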
Abuse / DoS Controls
- Rate limiting: 50 requests/second per IP, per account, and per device.
- Payload cap: 5 MB per message.
- Connection limit: configurable max concurrent QUIC connections.
- KeyPackage upload limit: configurable per account (prevents store exhaustion).
- Long-poll timeout cap: server-enforced maximum for `fetchWait`.
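
One way to realise the 50 requests/second limit is a token bucket per key, with a separate key for each IP, account, and device. The sketch below assumes that approach; the page does not say which algorithm is used. The type names, the key format, and the one-second burst size are all hypothetical choices.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Sustained rate from the policy above.
const REQUESTS_PER_SECOND: f64 = 50.0;

/// One token bucket; holds at most one second's worth of requests.
struct Bucket {
    tokens: f64,
    last_refill: Instant,
}

/// Token-bucket limiter keyed by a string such as "ip:203.0.113.7",
/// "account:42" or "device:7" (hypothetical key format).
struct RateLimiter {
    buckets: HashMap<String, Bucket>,
}

impl RateLimiter {
    fn new() -> Self {
        RateLimiter { buckets: HashMap::new() }
    }

    /// Returns true if the request identified by `key` may proceed at time `now`.
    fn allow(&mut self, key: &str, now: Instant) -> bool {
        let bucket = self.buckets.entry(key.to_string()).or_insert(Bucket {
            tokens: REQUESTS_PER_SECOND,
            last_refill: now,
        });
        // Refill in proportion to the elapsed time, capped at the bucket size.
        let elapsed = now.saturating_duration_since(bucket.last_refill).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * REQUESTS_PER_SECOND).min(REQUESTS_PER_SECOND);
        bucket.last_refill = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A request would be admitted only if `allow` returns true for all three of its keys. Passing `now` in explicitly keeps the limiter deterministic under test.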
Data Protection
- MLS ciphertext is opaque to the server (DS never holds group keys).
- Message retention: 7 days default, configurable.
- KeyPackage retention: 24 hours (TTL eviction).
- At-rest encryption for persistent storage (SQLCipher at M6).
Logging Safety
- Structured logging via `tracing` with `env-filter`.
- Sensitive fields (keys, tokens, ciphertext) are never logged, even at `TRACE`.
- Audit-level events: auth success/failure, token issuance, keypackage upload, enqueue/fetch, rate limit hits.
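
A common way to make the "never logged" rule hard to break by accident is a wrapper type whose formatted output is a fixed placeholder. The sketch below shows that pattern with the standard library only; `Redacted` and `AuthEvent` are hypothetical names, not types from the codebase.

```rust
use std::fmt;

/// Wrapper for values that must never appear in logs.
/// Its `Debug` and `Display` output is a fixed placeholder, so passing it
/// to a logging macro cannot leak the contents.
struct Redacted<T>(T);

impl<T> Redacted<T> {
    /// Explicit accessor for code that legitimately needs the value.
    fn expose(&self) -> &T {
        &self.0
    }
}

impl<T> fmt::Debug for Redacted<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

impl<T> fmt::Display for Redacted<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Hypothetical audit event: the token is carried but never printed.
#[derive(Debug)]
struct AuthEvent {
    account_id: u64,
    success: bool,
    token: Redacted<String>,
}
```

Because the derived `Debug` on `AuthEvent` delegates to the wrapper, formatting the whole event with `{:?}` prints the account ID and outcome but only the placeholder for the token.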
Testing
- Unit tests for all crypto operations (see Testing Strategy).
- Integration tests for every RPC method.
- Negative tests: malformed input, expired tokens, wrong identity, replay attempts.
- N-1 compatibility tests (old client against new server).
- Fuzzing targets for Protobuf parsers and MLS message handling (Phase 5).
Work Breakdown (6 Phases)
Phase 1 -- Baselines and Governance
Goal: Establish project hygiene before adding features.
| Task | Description |
|---|---|
| CODEOWNERS | Map crates to responsible reviewers |
| CI pipeline | GitHub Actions: cargo test --workspace, cargo clippy, cargo fmt --check, cargo deny check |
| SBOM generation | cargo-cyclonedx or cargo-about in CI; publish with each release |
| Threat model | Document assets, adversaries, attack surface, trust boundaries; reference in Threat Model |
| Dependency audit | cargo audit in CI; pin all major versions per Coding Standards |
Phase 2 -- Protocols and Core Hardening
Goal: Lock down the wire format and cryptographic policy.
| Task | Description |
|---|---|
| Wire versioning | Version field in all Protobuf frames; reject unknown versions |
| Ciphersuite allowlist | Server rejects KeyPackages outside the allowed set |
| Downgrade guards | Prevent epoch rollback; reject Commits with weaker ciphersuites |
| ALPN enforcement | Reject connections without b"qpc" ALPN token |
| Connection draining | Graceful QUIC CONNECTION_CLOSE on server shutdown |
| KeyPackage rotation | Client-side timer to upload fresh KeyPackages before TTL expiry |
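
The wire-versioning and epoch-rollback tasks above reduce to two small checks, sketched below. The supported version numbers are made up for illustration (current plus previous, in line with the N-1 goal), and the names are hypothetical. The epoch check only refuses messages older than the group's current epoch; a real MLS stack applies stricter rules.

```rust
/// Wire versions this build understands (hypothetical values: N and N-1).
const SUPPORTED_WIRE_VERSIONS: &[u32] = &[1, 2];

/// Hypothetical rejection reasons for an incoming frame.
#[derive(Debug, PartialEq, Eq)]
enum FrameError {
    UnsupportedVersion(u32),
    EpochRollback { current: u64, received: u64 },
}

/// Rejects frames whose version field is not in the supported set.
fn check_wire_version(version: u32) -> Result<(), FrameError> {
    if SUPPORTED_WIRE_VERSIONS.contains(&version) {
        Ok(())
    } else {
        Err(FrameError::UnsupportedVersion(version))
    }
}

/// Rejects a message that belongs to an epoch older than the group's current one.
fn check_epoch(current: u64, received: u64) -> Result<(), FrameError> {
    if received < current {
        Err(FrameError::EpochRollback { current, received })
    } else {
        Ok(())
    }
}
```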
Phase 3 -- Auth, Device, and Server Hardening
Goal: Add account/device identity and token-based authentication.
See Auth, Devices, and Tokens for the full design.
| Task | Description |
|---|---|
| Account + device model | {account_id, device_id, device_pubkey} with status lifecycle |
| Token issuance | Access + refresh tokens; configurable expiry |
| RPC auth middleware | Validate token on every RPC; map to account/device |
| Identity binding | Bind MLS identity key to account; reject mismatched uploads |
| Rate limiting | Per-IP, per-account, per-device counters |
| Audit logging | Auth events, token lifecycle, rate limit hits |
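
The account/device model and the identity-binding rule above might be modelled roughly as follows. This is a sketch under assumptions: the field types, the status names, and the idea that the registered `device_pubkey` is the key an uploaded MLS identity is compared against are all hypothetical. The full design lives in Auth, Devices, and Tokens.

```rust
/// Lifecycle states for a registered device (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DeviceStatus {
    Pending,
    Active,
    Revoked,
}

/// The {account_id, device_id, device_pubkey} record with its status.
#[derive(Debug, Clone)]
struct Device {
    account_id: u64,
    device_id: u64,
    device_pubkey: [u8; 32],
    status: DeviceStatus,
}

/// Hypothetical rejection reasons.
#[derive(Debug, PartialEq, Eq)]
enum AuthError {
    DeviceNotActive,
    IdentityMismatch,
}

/// Identity binding: an upload is accepted only from an active device, and
/// only if the identity key in the upload matches the key registered for it.
fn authorize_upload(device: &Device, uploaded_identity_key: &[u8; 32]) -> Result<(), AuthError> {
    if device.status != DeviceStatus::Active {
        return Err(AuthError::DeviceNotActive);
    }
    if &device.device_pubkey != uploaded_identity_key {
        return Err(AuthError::IdentityMismatch);
    }
    Ok(())
}
```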
Phase 4 -- Delivery Semantics and Client Resilience
Goal: Reliable message delivery and 1:1 channels.
See 1:1 Channel Design for the DM-specific design.
| Task | Description |
|---|---|
| Idempotent message IDs | Client-generated UUIDs; server deduplicates |
| Ordering guarantees | Per-channel sequence numbers; client detects gaps |
| Offline queue | Server retains messages for offline recipients (up to TTL) |
| 1:1 channels | Channel creation, membership, per-channel authz |
| TTL eviction | Background sweep + fetch-time check for expired messages |
| Client retry | Exponential backoff with jitter on transient failures |
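
Three of the tasks above are small enough to sketch together: server-side deduplication by message ID, client-side gap detection on sequence numbers, and retry backoff with jitter. All names are hypothetical. The dedup set is unbounded here, whereas a real server would expire entries along with the message TTL. The jitter fraction is passed in by the caller so that the example needs no RNG crate.

```rust
use std::collections::HashSet;
use std::ops::Range;
use std::time::Duration;

/// Server-side deduplication keyed by the client-generated 16-byte UUID.
struct Deduplicator {
    seen: HashSet<[u8; 16]>,
}

impl Deduplicator {
    fn new() -> Self {
        Deduplicator { seen: HashSet::new() }
    }

    /// Returns true the first time an ID is seen, false for a retransmission.
    fn accept(&mut self, message_id: [u8; 16]) -> bool {
        self.seen.insert(message_id)
    }
}

/// Client-side gap detection on per-channel sequence numbers.
/// Returns the missing sequence numbers, if any, between the last seen
/// message and the one just received.
fn missing_range(last_seen: u64, received: u64) -> Option<Range<u64>> {
    let next = last_seen.saturating_add(1);
    if received > next {
        Some(next..received)
    } else {
        None
    }
}

/// Exponential backoff with jitter: the delay is a random fraction of
/// min(cap, base * 2^attempt). `jitter` is a caller-supplied value in [0, 1].
fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    let ceiling = base.saturating_mul(2u32.saturating_pow(attempt)).min(cap);
    ceiling.mul_f64(jitter.clamp(0.0, 1.0))
}
```

Drawing the whole delay from the interval up to the ceiling, rather than adding a small random offset, spreads retries from many clients more evenly after a shared outage.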
Phase 5 -- E2E Harness and Security Tests
Goal: Automated end-to-end testing and security validation.
| Task | Description |
|---|---|
| docker-compose testnet | Multi-node test environment with configurable topology |
| Positive E2E tests | Full group lifecycle: register, create, invite, join, send, recv, leave |
| Negative E2E tests | Expired tokens, wrong identity, replay, malformed messages |
| Compat matrix | N-1 client/server version testing |
| Fuzz targets | cargo-fuzz targets for Protobuf parsers, MLS message handlers |
| Golden-wire fixtures | Serialised test vectors for regression testing across versions |
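
The golden-wire idea can be shown with a toy format: bytes captured once are checked in as a constant, and the test asserts that today's encoder still produces them and today's decoder still reads them. The frame layout below is a stand-in invented for this example, not the project's Protobuf schema, and every name in it is hypothetical.

```rust
/// Hypothetical frame, standing in for a real wire message.
#[derive(Debug, PartialEq, Eq)]
struct Frame {
    version: u16,
    payload: Vec<u8>,
}

/// Stand-in encoder: 2-byte big-endian version, 4-byte big-endian length, payload.
fn encode(frame: &Frame) -> Vec<u8> {
    let mut out = Vec::with_capacity(6 + frame.payload.len());
    out.extend_from_slice(&frame.version.to_be_bytes());
    out.extend_from_slice(&(frame.payload.len() as u32).to_be_bytes());
    out.extend_from_slice(&frame.payload);
    out
}

/// Stand-in decoder; returns None on truncated or inconsistent input.
fn decode(bytes: &[u8]) -> Option<Frame> {
    if bytes.len() < 6 {
        return None;
    }
    let version = u16::from_be_bytes([bytes[0], bytes[1]]);
    let len = u32::from_be_bytes([bytes[2], bytes[3], bytes[4], bytes[5]]) as usize;
    let payload = bytes.get(6..)?;
    if payload.len() != len {
        return None;
    }
    Some(Frame { version, payload: payload.to_vec() })
}

/// Golden fixture: captured once and checked in, never regenerated by the test.
const GOLDEN_V1_HI: &[u8] = &[0x00, 0x01, 0x00, 0x00, 0x00, 0x02, b'h', b'i'];

/// True if the current encoder and decoder both agree with the fixture.
fn golden_fixture_matches() -> bool {
    let frame = Frame { version: 1, payload: b"hi".to_vec() };
    encode(&frame) == GOLDEN_V1_HI && decode(GOLDEN_V1_HI) == Some(frame)
}
```

If a later change alters the encoding, the fixture comparison fails even when encode and decode still round-trip with each other, which is the regression this kind of test exists to catch.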
Phase 6 -- Reliability, Performance, and Operations
Goal: Production-grade operations and performance validation.
| Task | Description |
|---|---|
| SQLite/SQLCipher persistence | AS key store, DS message log, client state (M6) |
| Soak testing | 72-hour continuous operation under synthetic load |
| Load testing | Throughput and latency benchmarks (Criterion + custom harness) |
| Chaos testing | Network partitions, process crashes, disk full scenarios |
| Backup / restore | SQLite backup with integrity verification |
| Canary / rollback | Rolling deployment strategy with automatic rollback on failure |
| Metrics + dashboards | Prometheus metrics, Grafana dashboards (see Future Research) |
Planning Checklist
Use this checklist when planning a new milestone or phase. Each item should have a documented decision before implementation begins.
- Release criteria / SLOs -- Define what "done" means. Latency targets, error rate thresholds, test coverage minimums.
- Threat model review -- Update the Threat Model for any new attack surface introduced by this phase.
- Protocol policy -- Ciphersuite allowlist, wire version, downgrade rules.
- Identity / auth model -- Who authenticates, how, and what operations are gated.
- Data model -- Schema changes, migrations, backward compatibility.
- Abuse controls -- Rate limits, size caps, connection limits for this phase.
- Observability contracts -- What new metrics, logs, and traces are needed.
- Environments / secrets -- Dev, staging, production configuration; secret rotation plan.
- Testing matrix -- Unit, integration, E2E, negative, fuzz, compat tests for this phase.
- Rollout / ops -- Deployment strategy, rollback plan, monitoring during rollout.
Cross-references
- Milestones -- feature milestone tracker
- Auth, Devices, and Tokens -- Phase 3 design
- 1:1 Channel Design -- Phase 4 design
- Future Research -- technology options for Phase 6+
- Coding Standards -- engineering standards
- Testing Strategy -- test structure and conventions
- Threat Model -- security analysis