# Production Readiness Work Breakdown
## Feature Scope (must-have)
- Identity and Auth: account/device model, signup/login, short-lived tokens + refresh, device binding/revocation, rate limits, audit events.
- Key and MLS Lifecycle: keypackage create/rotate/expire, add/remove member, epoch advance, replay/downgrade protection, external commits, keystore encryption at rest.
- Transport and Delivery: QUIC/TLS endpoint on 4201, health/readiness, ordering and dedup policy, idempotent delivery IDs, backpressure, resumable sessions, payload size caps.
- Private 1:1 Channels: first-class DM abstraction (channel IDs), authz on enqueue/fetch, per-channel history/retention policy, same MLS encryption with pairwise groups, spam/rate controls.
- Storage and Persistence: durable queues and keypackages, migrations and schema versioning, integrity checksums, backup/restore playbook.
- Observability and Ops: structured logs with correlation IDs, metrics (auth latency, handshake success, delivery lag, queue depth), traces across auth→delivery→storage, alerting/SLO dashboards.
- Client Resilience and UX: offline queue with retry/jitter, reconnect/resume, state persistence, basic key verification surface, compatibility handling for server upgrades.
- Compatibility and Protocols: Cap'n Proto schema versioning rules, golden-wire fixtures, N-1 client/server matrix tests, ciphersuite allowlist.
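
The "offline queue with retry/jitter" item can be illustrated with a small backoff calculator. This is a minimal sketch, not the project's implementation: `RetryPolicy`, its fields, and the "full jitter" strategy are assumptions made for illustration, and the random fraction is passed in by the caller so the example needs no external crate.

```rust
use std::time::Duration;

/// Hypothetical retry policy for the client's offline queue.
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    /// Delay ceiling for the first retry.
    pub base: Duration,
    /// Hard upper bound on any single delay.
    pub cap: Duration,
}

impl RetryPolicy {
    /// Upper bound for the delay before retry `attempt` (0-based):
    /// base * 2^attempt, saturating at `cap`.
    pub fn ceiling(&self, attempt: u32) -> Duration {
        let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
        self.base
            .checked_mul(factor)
            .unwrap_or(self.cap)
            .min(self.cap)
    }

    /// Full jitter: scale the ceiling by a random fraction in [0, 1].
    /// The caller supplies the fraction (for example from an RNG);
    /// out-of-range values are clamped and non-finite values become 0.
    pub fn delay(&self, attempt: u32, random_fraction: f64) -> Duration {
        let fraction = if random_fraction.is_finite() {
            random_fraction.clamp(0.0, 1.0)
        } else {
            0.0
        };
        self.ceiling(attempt).mul_f64(fraction)
    }
}
```

With `base` at 100 ms and `cap` at 5 s, the ceiling doubles per attempt (100 ms, 200 ms, 400 ms, ...) until it reaches the cap; the jitter spreads reconnecting clients across that window so they do not retry in lockstep after a server restart.
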
## Security Plan (by design)
- Governance: CODEOWNERS on crypto/proto/auth paths; required review; cargo-audit and cargo-deny plus SBOM generation in CI; threat model maintained per release.
- Transport Policy: TLS 1.3 strict ciphers, mTLS option, pinned server identity, downgrade detection; QUIC rate limits/connection caps.
- MLS Policy: enforce lifetime/usage on keypackages, replay/downgrade checks, epoch monotonicity, credential validation.
- Input Validation: strict length/type checks on all RPC inputs; reject oversize or malformed payloads; explicit error mapping with no panics on untrusted data.
- Secrets: config via env/secret manager only; no secrets in repo/images; rotation hooks; memory zeroize where feasible.
- Abuse/DoS Controls: per-IP/account rate limits, request/body size caps, cheap pre-auth drops, bounded queues/backpressure.
- Data Protection: encryption at rest for keystore/state; backups with integrity verification; deletion/retention policies.
- Logging Safety: redaction of secrets/PII; correlation IDs; audit log for auth/device/key events; access-controlled log sinks.
- Testing: unit and property tests for codecs/crypto/state machines; integration tests for auth/storage; e2e security cases (tamper/replay/downgrade/expiry); fuzzing targets for parsers; periodic pentest.
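
The Input Validation item (strict length checks, explicit error mapping, no panics on untrusted data) might look like the sketch below. Everything named here is hypothetical: `validate_enqueue`, `EnqueueRequest`, `ValidationError`, and the 16-byte channel ID are illustrative, and the 5 MB cap from the work breakdown is assumed to mean 5 MiB.

```rust
use std::fmt;

/// Assumption: the roadmap's "5 MB" cap is read as 5 * 1024 * 1024 bytes.
pub const MAX_PAYLOAD_BYTES: usize = 5 * 1024 * 1024;
/// Assumption: channel IDs are fixed-width 16-byte values.
pub const CHANNEL_ID_LEN: usize = 16;

/// Explicit error taxonomy: every rejection maps to a variant, never a panic.
#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    EmptyPayload,
    PayloadTooLarge { len: usize, max: usize },
    BadChannelIdLength { len: usize },
}

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::EmptyPayload => write!(f, "payload is empty"),
            Self::PayloadTooLarge { len, max } => {
                write!(f, "payload of {} bytes exceeds cap of {}", len, max)
            }
            Self::BadChannelIdLength { len } => {
                write!(f, "channel id has {} bytes, expected {}", len, CHANNEL_ID_LEN)
            }
        }
    }
}

impl std::error::Error for ValidationError {}

/// A request that has passed validation; later stages can rely on its invariants.
#[derive(Debug, PartialEq, Eq)]
pub struct EnqueueRequest<'a> {
    pub channel_id: [u8; CHANNEL_ID_LEN],
    pub payload: &'a [u8],
}

/// Validate raw, untrusted RPC fields before any further processing.
pub fn validate_enqueue<'a>(
    channel_id: &[u8],
    payload: &'a [u8],
) -> Result<EnqueueRequest<'a>, ValidationError> {
    if channel_id.len() != CHANNEL_ID_LEN {
        return Err(ValidationError::BadChannelIdLength { len: channel_id.len() });
    }
    if payload.is_empty() {
        return Err(ValidationError::EmptyPayload);
    }
    if payload.len() > MAX_PAYLOAD_BYTES {
        return Err(ValidationError::PayloadTooLarge {
            len: payload.len(),
            max: MAX_PAYLOAD_BYTES,
        });
    }
    // Length was checked above, so this copy cannot panic.
    let mut id = [0u8; CHANNEL_ID_LEN];
    id.copy_from_slice(channel_id);
    Ok(EnqueueRequest { channel_id: id, payload })
}
```

Returning a validated type rather than a boolean keeps the checks in one place: handlers that accept `EnqueueRequest` cannot be reached with unchecked input.
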
## Work Breakdown (phased)
1) Baselines and Governance
- CODEOWNERS + review gates; fmt/clippy/test and cargo-audit/cargo-deny in CI; SBOM generation; threat model + release criteria (SLOs, ciphersuites, compat policy).
2) Protocols and Core Hardening
- Cap'n Proto versioning rules + compat tests + golden-wire fixtures.
- Enforce ciphersuite allowlist; downgrade/replay guards; keypackage lifetime/expiry; keystore encryption; structured error taxonomy.
- Wire guardrails: TLS 1.3 only; MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519 only; schema version tags on all Cap'n Proto messages; reject unknown versions; golden captures for auth/envelope/delivery; N-1 compatibility tests.
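
A rough sketch of the wire guardrails above: reject unknown schema versions and any ciphersuite outside the allowlist. The `EnvelopeHeader` struct, the `WireError` enum, and the supported version numbers are assumptions for illustration; in practice the fields would be read from a Cap'n Proto message rather than a plain struct. The value `0x0001` is the identifier RFC 9420 assigns to MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519.

```rust
/// RFC 9420 identifier for MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519.
pub const ALLOWED_CIPHERSUITE: u16 = 0x0001;

/// Hypothetical: the current schema version and its N-1 predecessor.
pub const SUPPORTED_SCHEMA_VERSIONS: [u16; 2] = [1, 2];

#[derive(Debug, PartialEq, Eq)]
pub enum WireError {
    UnknownSchemaVersion(u16),
    CiphersuiteNotAllowed(u16),
}

/// Hypothetical stand-in for the version tag and ciphersuite carried by a message.
#[derive(Debug, Clone, Copy)]
pub struct EnvelopeHeader {
    pub schema_version: u16,
    pub ciphersuite: u16,
}

/// Fail closed: anything not explicitly allowed is rejected.
pub fn check_header(header: &EnvelopeHeader) -> Result<(), WireError> {
    if !SUPPORTED_SCHEMA_VERSIONS.contains(&header.schema_version) {
        return Err(WireError::UnknownSchemaVersion(header.schema_version));
    }
    if header.ciphersuite != ALLOWED_CIPHERSUITE {
        return Err(WireError::CiphersuiteNotAllowed(header.ciphersuite));
    }
    Ok(())
}
```

Checking against an allowlist rather than a denylist means a newly registered ciphersuite or a future schema version is rejected until someone deliberately adds it.
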
3) Auth/Device and Server Hardening
- Account/device schema and storage; signup/login + token/refresh; device bind/revoke; rate limits and size caps; audit events; health/readiness; graceful shutdown/backpressure.
- AuthZ/RBAC hooks on enqueue/fetch keyed to identity/device; session TTLs; lockout/backoff; audit log on auth/device/key events; per-IP/account limits (50 req/s, 5 MB payload cap, 50 connections per IP).
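
One common way to enforce a per-IP or per-account request rate like the 50 requests per second above is a token bucket. The sketch below is an assumption about how that could be done, not a description of the server: `TokenBucket` and its methods are hypothetical, and time is passed in explicitly so the logic can be tested without sleeping.

```rust
use std::time::Instant;

/// Roadmap default: 50 requests per second per IP/account.
pub const DEFAULT_RATE_PER_SEC: f64 = 50.0;

/// Hypothetical token bucket; one instance would be kept per IP or account.
#[derive(Debug)]
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Start with a full bucket so a fresh client may burst up to `capacity`.
    pub fn new(refill_per_sec: f64, capacity: f64, now: Instant) -> Self {
        Self { capacity, tokens: capacity, refill_per_sec, last_refill: now }
    }

    /// Refill according to elapsed time, then try to spend one token.
    /// Returns `false` when the request should be dropped or delayed.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        // A `now` earlier than the last refill counts as zero elapsed time.
        let elapsed = now.saturating_duration_since(self.last_refill);
        self.tokens =
            (self.tokens + elapsed.as_secs_f64() * self.refill_per_sec).min(self.capacity);
        self.last_refill = self.last_refill.max(now);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Because the check is a few arithmetic operations on state keyed by IP, it can run before authentication, which fits the "cheap pre-auth drops" goal in the security plan.
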
4) Delivery Semantics and Client Resilience
- Idempotent delivery IDs, ordering/dedup policy, resumable sessions, offline queue with retry/jitter, state persistence; client/server config for port 4201; telemetry hooks.
- First-class 1:1 channels: channel IDs, authz on enqueue/fetch, per-channel retention (7d), keypackage TTL 24h, spam/rate controls, optional history toggle.
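
The idempotent delivery IDs, 7-day channel retention, and 24-hour keypackage TTL can be sketched together as follows. `ChannelQueue`, `DeliveryId`, and the use of plain seconds for timestamps are illustrative assumptions; a real server would back this with durable storage rather than an in-memory map.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::time::Duration;

/// Roadmap defaults: 7-day per-channel retention, 24-hour keypackage TTL.
pub const CHANNEL_RETENTION: Duration = Duration::from_secs(7 * 24 * 60 * 60);
pub const KEYPACKAGE_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical client-chosen ID that makes enqueue retries idempotent.
pub type DeliveryId = u128;

#[derive(Debug, PartialEq, Eq)]
pub enum Enqueue {
    Stored,
    Duplicate,
}

/// Hypothetical in-memory queue for one 1:1 channel.
#[derive(Default)]
pub struct ChannelQueue {
    /// Delivery ID -> (arrival time in seconds, ciphertext).
    messages: HashMap<DeliveryId, (u64, Vec<u8>)>,
}

impl ChannelQueue {
    /// Store a message once; a retry with the same ID is acknowledged but not stored again.
    pub fn enqueue(&mut self, id: DeliveryId, now_secs: u64, ciphertext: Vec<u8>) -> Enqueue {
        match self.messages.entry(id) {
            Entry::Occupied(_) => Enqueue::Duplicate,
            Entry::Vacant(slot) => {
                slot.insert((now_secs, ciphertext));
                Enqueue::Stored
            }
        }
    }

    /// Drop messages at or past the retention window; returns how many were removed.
    pub fn purge_expired(&mut self, now_secs: u64) -> usize {
        let before = self.messages.len();
        self.messages.retain(|_, (arrived, _)| {
            now_secs.saturating_sub(*arrived) < CHANNEL_RETENTION.as_secs()
        });
        before - self.messages.len()
    }
}

/// A keypackage is unusable once it has reached its TTL.
pub fn keypackage_expired(created_secs: u64, now_secs: u64) -> bool {
    now_secs.saturating_sub(created_secs) >= KEYPACKAGE_TTL.as_secs()
}
```

One consequence of this simple design is that a delivery ID is forgotten when its message is purged, so a retry arriving after the retention window would be stored again; whether that is acceptable is a policy decision for the ordering/dedup item.
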
5) E2E Harness and Security Tests
- docker-compose testnet; Rust e2e driver; happy-path flows (register, upload/fetch, create/join/send/recv, resume); negative cases (tamper, replay, downgrade, expired keypackage, oversize, rate limit); compatibility matrix (N-1 clients/servers).
6) Reliability, Perf, and Operations
- Soak/load tests with thresholds; chaos (loss/latency/reorder); backups/restore drills; staging parity; canary/rollback runbooks; alerting + dashboards.
## Planning Checklist (before implementation)
- Define release criteria and SLOs: availability, p99 latencies (auth, handshake, enqueue/fetch), error budgets.
- Threat model sign-off: auth/device, transport, MLS lifecycle, storage, abuse/DoS; document mitigations and gaps.
- Protocol policy: allowed ciphersuites, Cap'n Proto versioning rules, backward/forward compatibility guarantees, keypackage lifetime/rotation cadence.
- Identity and auth model: account/device lifecycle, token TTL/refresh, revocation flows, audit requirements.
- Data model decisions: schema for keypackages, delivery queues, audit logs; retention and deletion policy (per-message, per-channel).
- Abuse controls: rate limits (per IP/account/channel), size caps, connection caps, cheap pre-auth drops; defaults and override policy.
- Observability contracts: required metrics/log fields/traces, correlation IDs; dashboards to build; alert thresholds.
- Environments and secrets: how configs are injected (env/secret manager), key rotation plan, no-secrets-in-repo enforcement.
- Testing matrix: target platforms, N-1 compatibility scope, minimum e2e acceptance set, perf thresholds.
- Rollout and ops: staging parity definition, canary/rollback procedure, backup/restore drill cadence, on-call/runbook ownership.
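
When defining SLOs and error budgets, it can help to see the arithmetic. The sketch below converts an availability target into allowed downtime per window. The function name and the basis-point encoding are assumptions, and the 99.9% figure in the usage note is only an example; the checklist does not fix a target.

```rust
/// Allowed downtime in seconds for an availability target over a window.
/// `availability_bp` is the target in basis points (9_990 = 99.90%).
/// Returns `None` for targets above 100% or on arithmetic overflow.
pub fn error_budget_secs(availability_bp: u32, window_secs: u64) -> Option<u64> {
    if availability_bp > 10_000 {
        return None;
    }
    let unavailable_bp = u64::from(10_000 - availability_bp);
    // Integer arithmetic avoids floating-point rounding in the budget.
    window_secs.checked_mul(unavailable_bp).map(|v| v / 10_000)
}
```

For a hypothetical 99.9% target over 30 days (2,592,000 seconds), `error_budget_secs(9_990, 30 * 86_400)` yields 2,592 seconds, or about 43 minutes of permitted unavailability.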