feat: add post-quantum hybrid KEM + SQLCipher persistence
Feature 1 - Post-Quantum Hybrid KEM (X25519 + ML-KEM-768):
- Create hybrid_kem.rs with keygen, encrypt, decrypt + 11 unit tests
- Wire format: version(1) | x25519_eph_pk(32) | mlkem_ct(1088) | nonce(12) | ct
- Add uploadHybridKey/fetchHybridKey RPCs to node.capnp schema
- Server: hybrid key storage in FileBackedStore + RPC handlers
- Client: hybrid keypair in StoredState, auto-wrap/unwrap in send/recv/invite/join
- demo-group runs full hybrid PQ envelope round-trip

Feature 2 - SQLCipher Persistence:
- Extract Store trait from FileBackedStore API
- Create SqlStore (rusqlite + bundled-sqlcipher) with encrypted-at-rest SQLite
- Schema: key_packages, deliveries, hybrid_keys tables with indexes
- Server CLI: --store-backend=sql, --db-path, --db-key flags
- 5 unit tests for SqlStore (FIFO, round-trip, upsert, channel isolation)

Also includes: client lib.rs refactor, auth config, TOML config file support,
mdBook documentation, and various cleanups by user.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
docs/src/roadmap/production-readiness.md
@@ -0,0 +1,226 @@
# Production Readiness WBS

This page defines the work breakdown structure (WBS) for taking quicnprotochat
from a proof-of-concept to a production-hardened system. It covers feature scope,
security policy, phased delivery, and a planning checklist.

For the milestone-by-milestone tracker, see [Milestones](milestones.md). This
document focuses on the cross-cutting concerns that span multiple milestones.

---

## Feature Scope (Must-Have)

These are the feature areas that must be addressed before quicnprotochat can be
considered production-ready. Each area maps to one or more milestones or phases
in the WBS below.

| Area | Description | Primary Milestone |
|------|-------------|-------------------|
| **Identity / Auth** | Account creation, device registration, token-based RPC authentication, MLS identity binding | M4 + Phase 3 |
| **Key / MLS Lifecycle** | KeyPackage rotation, epoch advancement, member removal, credential updates | M5 + Phase 2 |
| **Transport / Delivery** | QUIC + TLS 1.3 hardening, ALPN enforcement, connection draining, reconnect | M1 (done) + Phase 2 |
| **Private 1:1 Channels** | Channel creation, per-channel authz, TTL eviction, DM-specific flows | Phase 4 |
| **Storage / Persistence** | SQLite (or SQLCipher) for AS, DS, client state; migrations; backup/restore | M6 + Phase 6 |
| **Observability / Ops** | Structured logging, metrics, distributed tracing, healthcheck endpoints | Phase 6 |
| **Client Resilience** | Offline queue, retry with backoff, idempotent message IDs, gap detection | Phase 4 |
| **Compatibility / Protocols** | Wire versioning, N-1 client interoperability, ciphersuite negotiation | Phase 2 + Phase 5 |

---

## Security Plan (By Design)

quicnprotochat follows a security-by-design philosophy. The standards below are
non-negotiable -- see [Coding Standards](../contributing/coding-standards.md) for
how they are enforced in code.

### Governance

- `CODEOWNERS` file mapping each crate to a responsible reviewer.
- All PRs require at least one review from a crate owner.
- Security-sensitive changes (crypto, auth, wire format) require two reviewers.
- GPG-signed commits only.

### Transport Policy

- TLS 1.3 only (`rustls` configured with `TLS13` cipher suites exclusively).
- ALPN token `b"capnp"` required; reject connections with mismatched ALPN.
- Self-signed certificates acceptable for development; production deployments
  must use a CA-signed certificate or certificate pinning.
- Connection draining on shutdown (QUIC `CONNECTION_CLOSE`).

### MLS Policy

- Ciphersuite: `MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519` (baseline).
- Single-use KeyPackages (consumed on fetch, per RFC 9420).
- KeyPackage TTL: 24 hours; clients must rotate before expiry.
- Ciphersuite allowlist: server rejects KeyPackages with unknown ciphersuites.
- No downgrade: once a group has used a ciphersuite, members cannot rejoin with
  a weaker one.

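The allowlist and TTL rules above could be enforced at upload or fetch time roughly as follows. This is a minimal sketch, not the project's implementation: `KeyPackageMeta`, `check_key_package`, and the error type are hypothetical names, and the only value taken from a specification is the RFC 9420 ciphersuite ID `0x0001` for the baseline suite.

```rust
use std::time::{Duration, SystemTime};

/// RFC 9420 ciphersuite ID for MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519.
const BASELINE_CIPHERSUITE: u16 = 0x0001;
/// Hypothetical allowlist: only the baseline suite is accepted in this sketch.
const ALLOWED_CIPHERSUITES: &[u16] = &[BASELINE_CIPHERSUITE];
/// KeyPackage TTL from the policy above: 24 hours.
const KEY_PACKAGE_TTL: Duration = Duration::from_secs(24 * 60 * 60);

#[derive(Debug, PartialEq, Eq)]
pub enum KeyPackageError {
    UnknownCiphersuite(u16),
    Expired,
}

/// Hypothetical metadata a server might keep next to an uploaded KeyPackage.
pub struct KeyPackageMeta {
    pub ciphersuite: u16,
    pub uploaded_at: SystemTime,
}

/// Reject KeyPackages that use a non-allowlisted suite or have outlived the TTL.
pub fn check_key_package(
    meta: &KeyPackageMeta,
    now: SystemTime,
) -> Result<(), KeyPackageError> {
    if !ALLOWED_CIPHERSUITES.contains(&meta.ciphersuite) {
        return Err(KeyPackageError::UnknownCiphersuite(meta.ciphersuite));
    }
    // If the clock moved backwards, treat the package as brand new (age zero).
    let age = now
        .duration_since(meta.uploaded_at)
        .unwrap_or(Duration::ZERO);
    if age >= KEY_PACKAGE_TTL {
        return Err(KeyPackageError::Expired);
    }
    Ok(())
}
```

Passing `now` in as a parameter keeps the check deterministic in unit tests.
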
### Input Validation

- All incoming Cap'n Proto messages validated against schema before processing.
- Maximum payload size: 5 MB per RPC call.
- Group ID, identity key, and channel ID fields validated for correct length
  (32 bytes, 32 bytes, 16 bytes respectively).
- UTF-8 validation on all string fields.

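The length, size, and UTF-8 rules above can be sketched as a single validation pass over the raw fields of a request. Everything named here (`RawRequest`, `validate`, the `label` string field) is hypothetical, and reading "5 MB" as 5 MiB is an assumption.

```rust
/// Payload cap from the policy above; interpreting "5 MB" as 5 MiB is an assumption.
const MAX_PAYLOAD_BYTES: usize = 5 * 1024 * 1024;
const GROUP_ID_LEN: usize = 32;
const IDENTITY_KEY_LEN: usize = 32;
const CHANNEL_ID_LEN: usize = 16;

#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    BadLength { field: &'static str, expected: usize, actual: usize },
    PayloadTooLarge(usize),
    InvalidUtf8(&'static str),
}

/// Hypothetical view of the raw bytes pulled out of an incoming RPC.
pub struct RawRequest<'a> {
    pub group_id: &'a [u8],
    pub identity_key: &'a [u8],
    pub channel_id: &'a [u8],
    /// Stand-in for any string field carried in the message.
    pub label: &'a [u8],
    pub payload: &'a [u8],
}

fn check_len(
    field: &'static str,
    bytes: &[u8],
    expected: usize,
) -> Result<(), ValidationError> {
    if bytes.len() == expected {
        Ok(())
    } else {
        Err(ValidationError::BadLength { field, expected, actual: bytes.len() })
    }
}

/// Apply the size cap first, then fixed-length checks, then UTF-8 validation.
pub fn validate(req: &RawRequest<'_>) -> Result<(), ValidationError> {
    if req.payload.len() > MAX_PAYLOAD_BYTES {
        return Err(ValidationError::PayloadTooLarge(req.payload.len()));
    }
    check_len("group_id", req.group_id, GROUP_ID_LEN)?;
    check_len("identity_key", req.identity_key, IDENTITY_KEY_LEN)?;
    check_len("channel_id", req.channel_id, CHANNEL_ID_LEN)?;
    std::str::from_utf8(req.label)
        .map_err(|_| ValidationError::InvalidUtf8("label"))?;
    Ok(())
}
```

Checking the payload cap first means an oversized request is rejected before any per-field work is done.
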
### Secrets Management

- All private key material wrapped in `Zeroizing<T>` (via the `zeroize` crate).
- No secret material in log output at any level.
- No `unwrap()` on cryptographic operations -- all errors are typed and propagated.
- Constant-time comparison for authentication tokens and key fingerprints.

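To illustrate what the constant-time comparison rule means, here is a hand-rolled, standard-library-only sketch. The function name is hypothetical, and the compiler gives no hard guarantee about timing, so production code would more likely rely on a vetted crate such as `subtle` than on this loop.

```rust
/// Compare two byte slices without returning early at the first mismatch.
/// Illustrative only: the optimizer is not contractually bound to preserve
/// the timing behaviour of this loop.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    // The length of a token or fingerprint is not treated as secret here.
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate every differing bit; the loop always runs to the end.
        diff |= x ^ y;
    }
    // black_box discourages the optimizer from reintroducing a short-circuit.
    std::hint::black_box(diff) == 0
}
```

A plain `a == b` on slices may stop at the first differing byte, which is the timing signal this rule is meant to remove.
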
### Abuse / DoS Controls

- Rate limiting: 50 requests/second per IP, per account, and per device.
- Payload cap: 5 MB per message.
- Connection limit: configurable max concurrent QUIC connections.
- KeyPackage upload limit: configurable per account (prevents store exhaustion).
- Long-poll timeout cap: server-enforced maximum for `fetchWait`.

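One way to picture the 50 requests/second limit is a fixed-window counter keyed by IP, account, or device. The sketch below is an assumption about shape only: `RateLimiter` is a hypothetical type, and a fixed window is just one option (a token bucket or sliding window would smooth bursts at window boundaries).

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Hypothetical fixed-window rate limiter, generic over the key type
/// (IP address, account ID, device ID, ...).
pub struct RateLimiter<K> {
    limit: u32,
    window: Duration,
    /// Per key: (start of the current window, requests counted in it).
    counters: HashMap<K, (Instant, u32)>,
}

impl<K: Hash + Eq> RateLimiter<K> {
    pub fn new(limit: u32, window: Duration) -> Self {
        Self { limit, window, counters: HashMap::new() }
    }

    /// Returns true if the request is within the limit for `key`.
    pub fn allow(&mut self, key: K, now: Instant) -> bool {
        let entry = self.counters.entry(key).or_insert((now, 0));
        // Start a fresh window once the previous one has fully elapsed.
        if now.saturating_duration_since(entry.0) >= self.window {
            *entry = (now, 0);
        }
        if entry.1 < self.limit {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}
```

A server would typically hold one limiter per dimension, for example `RateLimiter::new(50, Duration::from_secs(1))` for each of IP, account, and device, and reject a request if any of them returns `false`. Evicting idle keys is left out of this sketch.
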
### Data Protection

- MLS ciphertext is opaque to the server (DS never holds group keys).
- Message retention: 7 days default, configurable.
- KeyPackage retention: 24 hours (TTL eviction).
- At-rest encryption for persistent storage (SQLCipher at M6).

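The retention rule can be sketched as an in-memory FIFO queue that drops expired entries both in a periodic sweep and at fetch time. `DeliveryQueue` and its methods are hypothetical, and the sketch assumes messages are enqueued in non-decreasing time order so that expired entries always sit at the front.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Default message retention from the policy above: 7 days.
pub const DEFAULT_RETENTION: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Hypothetical per-recipient queue of opaque MLS ciphertexts.
pub struct DeliveryQueue {
    retention: Duration,
    /// Oldest message first; each entry records when it was enqueued.
    messages: VecDeque<(Instant, Vec<u8>)>,
}

impl DeliveryQueue {
    pub fn new(retention: Duration) -> Self {
        Self { retention, messages: VecDeque::new() }
    }

    pub fn enqueue(&mut self, ciphertext: Vec<u8>, now: Instant) {
        self.messages.push_back((now, ciphertext));
    }

    /// Drop expired messages from the front; returns how many were removed.
    pub fn sweep(&mut self, now: Instant) -> usize {
        let mut removed = 0;
        while let Some((enqueued_at, _)) = self.messages.front() {
            if now.saturating_duration_since(*enqueued_at) < self.retention {
                break;
            }
            self.messages.pop_front();
            removed += 1;
        }
        removed
    }

    /// Fetch-time check: sweep first, then hand out everything still live.
    pub fn fetch(&mut self, now: Instant) -> Vec<Vec<u8>> {
        self.sweep(now);
        self.messages.drain(..).map(|(_, msg)| msg).collect()
    }
}
```

Running the sweep inside `fetch` means a recipient never receives an expired message even if the background task is late.
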
### Logging Safety

- Structured logging via `tracing`, filtered with `tracing-subscriber`'s
  `EnvFilter` (the `env-filter` feature).
- Sensitive fields (keys, tokens, ciphertext) are never logged, even at `TRACE`.
- Audit-level events: auth success/failure, token issuance, keypackage upload,
  enqueue/fetch, rate limit hits.

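One common way to make the "never logged" rule hard to break by accident is a wrapper type whose `Debug` and `Display` output is always a placeholder. The sketch below uses only the standard library; `Redacted` and `AuthEvent` are hypothetical names, not types from the codebase.

```rust
use std::fmt;

/// Hypothetical wrapper: the inner value can be used by code, but any attempt
/// to format it prints a fixed placeholder instead.
pub struct Redacted<T>(pub T);

impl<T> fmt::Debug for Redacted<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

impl<T> fmt::Display for Redacted<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Hypothetical audit event; deriving Debug is safe because the token field
/// can only ever render as the placeholder.
#[derive(Debug)]
pub struct AuthEvent {
    pub account_id: String,
    pub token: Redacted<String>,
    pub success: bool,
}
```

With this in place, logging the whole event with `{:?}` cannot leak the token, regardless of the configured log level.
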
### Testing

- Unit tests for all crypto operations (see [Testing Strategy](../contributing/testing.md)).
- Integration tests for every RPC method.
- Negative tests: malformed input, expired tokens, wrong identity, replay attempts.
- N-1 compatibility tests (old client against new server).
- Fuzzing targets for Cap'n Proto parsers and MLS message handling (Phase 5).

---

## Work Breakdown (6 Phases)

### Phase 1 -- Baselines and Governance

**Goal:** Establish project hygiene before adding features.

| Task | Description |
|------|-------------|
| CODEOWNERS | Map crates to responsible reviewers |
| CI pipeline | GitHub Actions: `cargo test --workspace`, `cargo clippy`, `cargo fmt --check`, `cargo deny check` |
| SBOM generation | `cargo-cyclonedx` or `cargo-about` in CI; publish with each release |
| Threat model | Document assets, adversaries, attack surface, trust boundaries; reference in [Threat Model](../cryptography/threat-model.md) |
| Dependency audit | `cargo audit` in CI; pin all major versions per [Coding Standards](../contributing/coding-standards.md) |

### Phase 2 -- Protocols and Core Hardening

**Goal:** Lock down the wire format and cryptographic policy.

| Task | Description |
|------|-------------|
| Wire versioning | Add `version` field to all Cap'n Proto structs; reject unknown versions |
| Ciphersuite allowlist | Server rejects KeyPackages outside the allowed set |
| Downgrade guards | Prevent epoch rollback; reject Commits with weaker ciphersuites |
| ALPN enforcement | Reject connections without `b"capnp"` ALPN token |
| Connection draining | Graceful QUIC `CONNECTION_CLOSE` on server shutdown |
| KeyPackage rotation | Client-side timer to upload fresh KeyPackages before TTL expiry |

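The "reject unknown versions" rule is easiest to see on a byte-level format. The sketch below applies it to the hybrid KEM envelope layout `version(1) | x25519_eph_pk(32) | mlkem_ct(1088) | nonce(12) | ct`. The accepted version value `1`, the type names, and the error variants are assumptions made for illustration; Cap'n Proto structs would carry the version as a schema field instead.

```rust
/// Assumed current envelope version; the real value is not specified here.
const SUPPORTED_VERSION: u8 = 1;
const X25519_PK_LEN: usize = 32;
const MLKEM_CT_LEN: usize = 1088;
const NONCE_LEN: usize = 12;
/// Fixed-size prefix: version byte plus the three fixed-length fields.
const HEADER_LEN: usize = 1 + X25519_PK_LEN + MLKEM_CT_LEN + NONCE_LEN;

#[derive(Debug, PartialEq, Eq)]
pub enum EnvelopeError {
    Truncated,
    UnsupportedVersion(u8),
}

/// Borrowed view over the fields of a parsed envelope (hypothetical type).
#[derive(Debug)]
pub struct Envelope<'a> {
    pub x25519_eph_pk: &'a [u8],
    pub mlkem_ct: &'a [u8],
    pub nonce: &'a [u8],
    pub ciphertext: &'a [u8],
}

/// Split an envelope into its fields, rejecting unknown versions up front.
pub fn parse_envelope(bytes: &[u8]) -> Result<Envelope<'_>, EnvelopeError> {
    let (&version, rest) = bytes.split_first().ok_or(EnvelopeError::Truncated)?;
    if version != SUPPORTED_VERSION {
        return Err(EnvelopeError::UnsupportedVersion(version));
    }
    // After this check every split_at below is within bounds.
    if bytes.len() < HEADER_LEN {
        return Err(EnvelopeError::Truncated);
    }
    let (x25519_eph_pk, rest) = rest.split_at(X25519_PK_LEN);
    let (mlkem_ct, rest) = rest.split_at(MLKEM_CT_LEN);
    let (nonce, ciphertext) = rest.split_at(NONCE_LEN);
    Ok(Envelope { x25519_eph_pk, mlkem_ct, nonce, ciphertext })
}
```

Checking the version before anything else means a future layout change cannot be misparsed as the current one.
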
### Phase 3 -- Auth, Device, and Server Hardening

**Goal:** Add account/device identity and token-based authentication.

See [Auth, Devices, and Tokens](authz-plan.md) for the full design.

| Task | Description |
|------|-------------|
| Account + device model | `{account_id, device_id, device_pubkey}` with status lifecycle |
| Token issuance | Access + refresh tokens; configurable expiry |
| RPC auth middleware | Validate token on every RPC; map to account/device |
| Identity binding | Bind MLS identity key to account; reject mismatched uploads |
| Rate limiting | Per-IP, per-account, per-device counters |
| Audit logging | Auth events, token lifecycle, rate limit hits |

### Phase 4 -- Delivery Semantics and Client Resilience

**Goal:** Reliable message delivery and 1:1 channels.

See [1:1 Channel Design](dm-channels.md) for the DM-specific design.

| Task | Description |
|------|-------------|
| Idempotent message IDs | Client-generated UUIDs; server deduplicates |
| Ordering guarantees | Per-channel sequence numbers; client detects gaps |
| Offline queue | Server retains messages for offline recipients (up to TTL) |
| 1:1 channels | Channel creation, membership, per-channel authz |
| TTL eviction | Background sweep + fetch-time check for expired messages |
| Client retry | Exponential backoff with jitter on transient failures |

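The first two rows of the table can be sketched as two small state machines: a server-side deduplicator keyed by message ID and a client-side gap detector keyed by sequence number. Both types are hypothetical; UUIDs are held as `u128` and sequence numbers are assumed to start at 0 per channel.

```rust
use std::collections::HashSet;
use std::ops::Range;

/// Server side: remembers client-generated message IDs (UUIDs as u128).
#[derive(Default)]
pub struct Deduplicator {
    seen: HashSet<u128>,
}

impl Deduplicator {
    /// Returns true the first time an ID is seen, false for a resend.
    pub fn first_seen(&mut self, message_id: u128) -> bool {
        self.seen.insert(message_id)
    }
}

/// Client side: tracks the next expected per-channel sequence number.
#[derive(Default)]
pub struct GapDetector {
    next_seq: u64,
}

impl GapDetector {
    /// Returns the range of sequence numbers that were skipped before `seq`,
    /// or None if `seq` is the expected one or an older (late) message.
    pub fn observe(&mut self, seq: u64) -> Option<Range<u64>> {
        if seq < self.next_seq {
            return None;
        }
        let gap = if seq > self.next_seq {
            Some(self.next_seq..seq)
        } else {
            None
        };
        self.next_seq = seq.saturating_add(1);
        gap
    }
}
```

A real deduplicator would need to bound its memory, for instance by expiring IDs together with the message TTL; that is omitted here.
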
### Phase 5 -- E2E Harness and Security Tests

**Goal:** Automated end-to-end testing and security validation.

| Task | Description |
|------|-------------|
| docker-compose testnet | Multi-node test environment with configurable topology |
| Positive E2E tests | Full group lifecycle: register, create, invite, join, send, recv, leave |
| Negative E2E tests | Expired tokens, wrong identity, replay, malformed messages |
| Compat matrix | N-1 client/server version testing |
| Fuzz targets | `cargo-fuzz` targets for Cap'n Proto parsers, MLS message handlers |
| Golden-wire fixtures | Serialised test vectors for regression testing across versions |

### Phase 6 -- Reliability, Performance, and Operations

**Goal:** Production-grade operations and performance validation.

| Task | Description |
|------|-------------|
| SQLite/SQLCipher persistence | AS key store, DS message log, client state (M6) |
| Soak testing | 72-hour continuous operation under synthetic load |
| Load testing | Throughput and latency benchmarks (Criterion + custom harness) |
| Chaos testing | Network partitions, process crashes, disk full scenarios |
| Backup / restore | SQLite backup with integrity verification |
| Canary / rollback | Rolling deployment strategy with automatic rollback on failure |
| Metrics + dashboards | Prometheus metrics, Grafana dashboards (see [Future Research](future-research.md)) |

---

## Planning Checklist

Use this checklist when planning a new milestone or phase. Each item should have
a documented decision before implementation begins.

- [ ] **Release criteria / SLOs** -- Define what "done" means. Latency targets,
      error rate thresholds, test coverage minimums.
- [ ] **Threat model review** -- Update the [Threat Model](../cryptography/threat-model.md)
      for any new attack surface introduced by this phase.
- [ ] **Protocol policy** -- Ciphersuite allowlist, wire version, downgrade rules.
- [ ] **Identity / auth model** -- Who authenticates, how, and what operations
      are gated.
- [ ] **Data model** -- Schema changes, migrations, backward compatibility.
- [ ] **Abuse controls** -- Rate limits, size caps, connection limits for this phase.
- [ ] **Observability contracts** -- What new metrics, logs, and traces are needed.
- [ ] **Environments / secrets** -- Dev, staging, production configuration;
      secret rotation plan.
- [ ] **Testing matrix** -- Unit, integration, E2E, negative, fuzz, compat tests
      for this phase.
- [ ] **Rollout / ops** -- Deployment strategy, rollback plan, monitoring during
      rollout.

---

## Cross-references

- [Milestones](milestones.md) -- feature milestone tracker
- [Auth, Devices, and Tokens](authz-plan.md) -- Phase 3 design
- [1:1 Channel Design](dm-channels.md) -- Phase 4 design
- [Future Research](future-research.md) -- technology options for Phase 6+
- [Coding Standards](../contributing/coding-standards.md) -- engineering standards
- [Testing Strategy](../contributing/testing.md) -- test structure and conventions
- [Threat Model](../cryptography/threat-model.md) -- security analysis