Christian Nennemann 2e081ead8e chore: rename quicproquo → quicprochat in docs, Docker, CI, and packaging

Rename all project references from quicproquo/qpq to quicprochat/qpc
across documentation, Docker configuration, CI workflows, packaging
scripts, operational configs, and build tooling.

- Docker: crate paths, binary names, user/group, data dirs, env vars
- CI: workflow crate references, binary names, artifact names
- Docs: all markdown files under docs/, SDK READMEs, book.toml
- Packaging: OpenWrt Makefile, init script, UCI config (file renames)
- Scripts: justfile, dev-shell, screenshot, cross-compile, ai_team
- Operations: Prometheus config, alert rules, Grafana dashboard
- Config: .env.example (QPQ_* → QPC_*), CODEOWNERS paths
- Top-level: README, CONTRIBUTING, ROADMAP, CLAUDE.md

2026-03-21 19:14:06 +01:00

Production Readiness WBS

This page defines the work breakdown structure (WBS) for taking quicprochat from a proof-of-concept to a production-hardened system. It covers feature scope, security policy, phased delivery, and a planning checklist.

For the milestone-by-milestone tracker, see Milestones. This document focuses on the cross-cutting concerns that span multiple milestones.

Feature Scope (Must-Have)

These are the feature areas that must be addressed before quicprochat can be considered production-ready. Each area maps to one or more milestones or phases in the WBS below.

| Area | Description | Primary Milestone |
|---|---|---|
| Identity / Auth | Account creation, device registration, token-based RPC authentication, MLS identity binding | M4 + Phase 3 |
| Key / MLS Lifecycle | KeyPackage rotation, epoch advancement, member removal, credential updates | M5 + Phase 2 |
| Transport / Delivery | QUIC + TLS 1.3 hardening, ALPN enforcement, connection draining, reconnect | M1 (done) + Phase 2 |
| Private 1:1 Channels | Channel creation, per-channel authz, TTL eviction, DM-specific flows | Phase 4 |
| Storage / Persistence | SQLite (or SQLCipher) for AS, DS, client state; migrations; backup/restore | M6 + Phase 6 |
| Observability / Ops | Structured logging, metrics, distributed tracing, healthcheck endpoints | Phase 6 |
| Client Resilience | Offline queue, retry with backoff, idempotent message IDs, gap detection | Phase 4 |
| Compatibility / Protocols | Wire versioning, N-1 client interoperability, ciphersuite negotiation | Phase 2 + Phase 5 |

Security Plan (By Design)

quicprochat follows a security-by-design philosophy. The standards below are non-negotiable -- see Coding Standards for how they are enforced in code.

Governance

CODEOWNERS file mapping each crate to a responsible reviewer.
All PRs require at least one review from a crate owner.
Security-sensitive changes (crypto, auth, wire format) require two reviewers.
GPG-signed commits only.

Transport Policy

TLS 1.3 only (rustls configured with TLS13 cipher suites exclusively).
ALPN token `b"qpc"` required; reject connections with mismatched ALPN.
Self-signed certificates acceptable for development; production deployments must use a CA-signed certificate or certificate pinning.
Connection draining on shutdown (QUIC `CONNECTION_CLOSE`).

MLS Policy

Ciphersuite: MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519 (baseline).
Single-use KeyPackages (consumed on fetch, per RFC 9420).
KeyPackage TTL: 24 hours; clients must rotate before expiry.
Ciphersuite allowlist: server rejects KeyPackages with unknown ciphersuites.
No downgrade: once a group has used a ciphersuite, members cannot rejoin with a weaker one.
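
The allowlist and TTL rules above can be sketched as two small checks. This is a minimal, std-only illustration and not the project's implementation: the function names, the error enum, and the one-hour rotation margin are assumptions made for the example. The value `0x0001` is the registered identifier of the baseline ciphersuite in RFC 9420.

```rust
use std::time::Duration;

/// Identifier of MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519 in RFC 9420.
const BASELINE_CIPHERSUITE: u16 = 0x0001;

/// Hypothetical allowlist; only the baseline suite is listed in this sketch.
const ALLOWED_CIPHERSUITES: &[u16] = &[BASELINE_CIPHERSUITE];

/// KeyPackage TTL from the policy above.
const KEY_PACKAGE_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Assumed safety margin: clients rotate one hour before expiry.
const ROTATION_MARGIN: Duration = Duration::from_secs(60 * 60);

#[derive(Debug, PartialEq)]
enum KeyPackageError {
    UnknownCiphersuite(u16),
    Expired,
}

/// Server-side check: the suite must be allowlisted and the package within its TTL.
fn check_key_package(ciphersuite: u16, age: Duration) -> Result<(), KeyPackageError> {
    if !ALLOWED_CIPHERSUITES.contains(&ciphersuite) {
        return Err(KeyPackageError::UnknownCiphersuite(ciphersuite));
    }
    if age >= KEY_PACKAGE_TTL {
        return Err(KeyPackageError::Expired);
    }
    Ok(())
}

/// Client-side check: has the KeyPackage entered the rotation window?
fn should_rotate(age: Duration) -> bool {
    age + ROTATION_MARGIN >= KEY_PACKAGE_TTL
}
```

With these assumed values, a 22-hour-old KeyPackage is still left alone, while a 23-hour-old one triggers rotation.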

Input Validation

All incoming Protobuf messages validated against schema before processing.
Maximum payload size: 5 MB per RPC call.
Group ID, identity key, and channel ID fields validated for correct length (32 bytes, 32 bytes, 16 bytes respectively).
UTF-8 validation on all string fields.
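
The size, length, and UTF-8 rules above could be enforced by a validation step that runs after decoding and before any handler logic. The sketch below is illustrative only: `SendRequest` and its field names are hypothetical stand-ins for the real Protobuf message types, and "5 MB" is read as 5 MiB, which is an assumption.

```rust
/// Limits from the policy above. "5 MB" is read as 5 MiB here (assumption).
const MAX_PAYLOAD_BYTES: usize = 5 * 1024 * 1024;
const GROUP_ID_LEN: usize = 32;
const IDENTITY_KEY_LEN: usize = 32;
const CHANNEL_ID_LEN: usize = 16;

#[derive(Debug, PartialEq)]
enum ValidationError {
    PayloadTooLarge { len: usize },
    BadLength { field: &'static str, expected: usize, actual: usize },
    InvalidUtf8 { field: &'static str },
}

/// Hypothetical decoded request; the real message types are not shown here.
struct SendRequest<'a> {
    group_id: &'a [u8],
    identity_key: &'a [u8],
    channel_id: &'a [u8],
    sender_label: &'a [u8], // raw bytes of a string field, not yet checked
    payload: &'a [u8],
}

/// Fixed-length fields must match their expected size exactly.
fn check_len(field: &'static str, bytes: &[u8], expected: usize) -> Result<(), ValidationError> {
    if bytes.len() == expected {
        Ok(())
    } else {
        Err(ValidationError::BadLength { field, expected, actual: bytes.len() })
    }
}

/// Apply the size cap first, then field lengths, then UTF-8 validation.
fn validate(req: &SendRequest<'_>) -> Result<(), ValidationError> {
    if req.payload.len() > MAX_PAYLOAD_BYTES {
        return Err(ValidationError::PayloadTooLarge { len: req.payload.len() });
    }
    check_len("group_id", req.group_id, GROUP_ID_LEN)?;
    check_len("identity_key", req.identity_key, IDENTITY_KEY_LEN)?;
    check_len("channel_id", req.channel_id, CHANNEL_ID_LEN)?;
    std::str::from_utf8(req.sender_label)
        .map_err(|_| ValidationError::InvalidUtf8 { field: "sender_label" })?;
    Ok(())
}
```

Returning a typed error per rule keeps rejections distinguishable in tests and in audit logs.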

Secrets Management

All private key material wrapped in `Zeroizing<T>` (via the `zeroize` crate).
No secret material in log output at any level.
No `unwrap()` on cryptographic operations -- all errors are typed and propagated.
Constant-time comparison for authentication tokens and key fingerprints.
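
To illustrate the constant-time comparison rule, here is a minimal std-only sketch. The function name is an assumption, and the approach (OR-accumulating XOR differences) is shown only to make the idea concrete; a production build would more plausibly rely on a vetted crate such as `subtle` rather than hand-rolled code.

```rust
/// Compare two byte strings without exiting early on the first mismatch.
/// The length check is not constant-time; lengths are assumed to be public.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate every differing bit; the loop always visits all bytes.
        diff |= x ^ y;
    }
    // black_box discourages the optimizer from reintroducing a short-circuit.
    std::hint::black_box(diff) == 0
}
```

The point of the pattern is that the running time depends only on the input length, not on where the first differing byte sits.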

Abuse / DoS Controls

Rate limiting: 50 requests/second per IP, per account, and per device.
Payload cap: 5 MB per message.
Connection limit: configurable max concurrent QUIC connections.
KeyPackage upload limit: configurable per account (prevents store exhaustion).
Long-poll timeout cap: server-enforced maximum for fetchWait.
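
One way to realise the per-IP, per-account, and per-device limits is a token bucket keyed by scope. The sketch below is an assumption-laden illustration: the `Scope` enum, the choice of a token bucket, and a burst size equal to the rate are not taken from the project. Time is passed in as milliseconds so the logic can be tested deterministically.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Key for one of the three rate-limit scopes named above.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Scope {
    Ip(IpAddr),
    Account(u64),
    Device(u64),
}

struct Bucket {
    tokens: f64,
    last_ms: u64,
}

/// Token-bucket limiter; burst size equals the per-second rate (assumption).
struct RateLimiter {
    rate_per_sec: f64,
    burst: f64,
    buckets: HashMap<Scope, Bucket>,
}

impl RateLimiter {
    fn new(rate_per_sec: f64) -> Self {
        Self { rate_per_sec, burst: rate_per_sec, buckets: HashMap::new() }
    }

    /// Returns true if the request is admitted. `now_ms` is a monotonic clock reading.
    fn allow(&mut self, scope: Scope, now_ms: u64) -> bool {
        let (rate, burst) = (self.rate_per_sec, self.burst);
        let bucket = self
            .buckets
            .entry(scope)
            .or_insert(Bucket { tokens: burst, last_ms: now_ms });
        // Refill in proportion to elapsed time, capped at the burst size.
        let elapsed_secs = now_ms.saturating_sub(bucket.last_ms) as f64 / 1000.0;
        bucket.tokens = (bucket.tokens + elapsed_secs * rate).min(burst);
        bucket.last_ms = now_ms;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

A request would be checked against all three scopes and rejected if any of them is exhausted; each rejection is also a candidate audit event.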

Data Protection

MLS ciphertext is opaque to the server (DS never holds group keys).
Message retention: 7 days default, configurable.
KeyPackage retention: 24 hours (TTL eviction).
At-rest encryption for persistent storage (SQLCipher at M6).
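
The retention rule can be pictured as a per-recipient queue with two expiry paths, matching the background sweep and fetch-time check planned for Phase 4. This is a minimal in-memory sketch under assumed names (`RecipientQueue`, `sweep`, `fetch`); the real store is planned to be SQLite/SQLCipher, and the server only ever handles opaque ciphertext.

```rust
use std::collections::VecDeque;

/// Default retention from the policy above: 7 days, in seconds.
const DEFAULT_RETENTION_SECS: u64 = 7 * 24 * 60 * 60;

struct StoredMessage {
    enqueued_at: u64,    // Unix seconds
    ciphertext: Vec<u8>, // opaque MLS ciphertext; never inspected by the server
}

/// Hypothetical in-memory queue for one recipient.
struct RecipientQueue {
    retention_secs: u64,
    messages: VecDeque<StoredMessage>,
}

fn is_expired(enqueued_at: u64, now: u64, retention_secs: u64) -> bool {
    now.saturating_sub(enqueued_at) >= retention_secs
}

impl RecipientQueue {
    fn new(retention_secs: u64) -> Self {
        Self { retention_secs, messages: VecDeque::new() }
    }

    fn enqueue(&mut self, now: u64, ciphertext: Vec<u8>) {
        self.messages.push_back(StoredMessage { enqueued_at: now, ciphertext });
    }

    /// Background sweep: drop expired messages and report how many were removed.
    fn sweep(&mut self, now: u64) -> usize {
        let before = self.messages.len();
        let retention = self.retention_secs;
        self.messages.retain(|m| !is_expired(m.enqueued_at, now, retention));
        before - self.messages.len()
    }

    /// Fetch-time check: drain the queue, returning only unexpired messages.
    fn fetch(&mut self, now: u64) -> Vec<Vec<u8>> {
        let retention = self.retention_secs;
        self.messages
            .drain(..)
            .filter(|m| !is_expired(m.enqueued_at, now, retention))
            .map(|m| m.ciphertext)
            .collect()
    }
}
```

Checking at fetch time as well as in the sweep means an expired message is never delivered, even if the sweep has not run yet.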

Logging Safety

Structured logging via tracing with env-filter.
Sensitive fields (keys, tokens, ciphertext) are never logged, even at TRACE.
Audit-level events: auth success/failure, token issuance, keypackage upload, enqueue/fetch, rate limit hits.
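
One common way to keep sensitive fields out of logs by construction is a wrapper type whose `Debug` output is fixed. The sketch below is an illustration under assumed names (`Redacted`, `AuthEvent`), not the project's actual types. It is relevant because `tracing` can record fields through their `Debug` implementation.

```rust
use std::fmt;

/// Wrapper whose Debug output never reveals the inner value.
struct Redacted<T>(T);

impl<T> Redacted<T> {
    /// Explicit, greppable access to the wrapped secret.
    fn expose(&self) -> &T {
        &self.0
    }
}

impl<T> fmt::Debug for Redacted<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Hypothetical audit event; deriving Debug is safe because the token is wrapped.
#[derive(Debug)]
struct AuthEvent {
    account_id: u64,
    success: bool,
    token: Redacted<String>,
}
```

Formatting an `AuthEvent` with `{:?}` then shows the account ID and outcome, with the token replaced by the fixed `[REDACTED]` marker.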

Testing

Unit tests for all crypto operations (see Testing Strategy).
Integration tests for every RPC method.
Negative tests: malformed input, expired tokens, wrong identity, replay attempts.
N-1 compatibility tests (old client against new server).
Fuzzing targets for Protobuf parsers and MLS message handling (Phase 5).

Work Breakdown (6 Phases)

Phase 1 -- Baselines and Governance

Goal: Establish project hygiene before adding features.

| Task | Description |
|---|---|
| CODEOWNERS | Map crates to responsible reviewers |
| CI pipeline | GitHub Actions: `cargo test --workspace`, `cargo clippy`, `cargo fmt --check`, `cargo deny check` |
| SBOM generation | `cargo-cyclonedx` or `cargo-about` in CI; publish with each release |
| Threat model | Document assets, adversaries, attack surface, trust boundaries; reference in Threat Model |
| Dependency audit | `cargo audit` in CI; pin all major versions per Coding Standards |

Phase 2 -- Protocols and Core Hardening

Goal: Lock down the wire format and cryptographic policy.

| Task | Description |
|---|---|
| Wire versioning | Version field in all Protobuf frames; reject unknown versions |
| Ciphersuite allowlist | Server rejects KeyPackages outside the allowed set |
| Downgrade guards | Prevent epoch rollback; reject Commits with weaker ciphersuites |
| ALPN enforcement | Reject connections without `b"qpc"` ALPN token |
| Connection draining | Graceful QUIC `CONNECTION_CLOSE` on server shutdown |
| KeyPackage rotation | Client-side timer to upload fresh KeyPackages before TTL expiry |
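
As an illustration of the wire-versioning task, the sketch below rejects frames whose version is not in a supported set. The framing is an assumption made for the example: a 2-byte big-endian version prefix ahead of the Protobuf body. The actual design may instead carry the version as a field inside the message, and the version numbers shown are hypothetical.

```rust
/// Hypothetical set of wire versions this build accepts (current and N-1).
const SUPPORTED_VERSIONS: &[u16] = &[1, 2];

#[derive(Debug, PartialEq)]
enum FrameError {
    Truncated,
    UnsupportedVersion(u16),
}

/// Assumed framing: 2-byte big-endian version, then the Protobuf-encoded body.
/// Returns the version and the body slice, or rejects the frame.
fn split_frame(frame: &[u8]) -> Result<(u16, &[u8]), FrameError> {
    if frame.len() < 2 {
        return Err(FrameError::Truncated);
    }
    let version = u16::from_be_bytes([frame[0], frame[1]]);
    if !SUPPORTED_VERSIONS.contains(&version) {
        return Err(FrameError::UnsupportedVersion(version));
    }
    Ok((version, &frame[2..]))
}
```

Keeping the previous version in the accepted set is what makes the N-1 compatibility tests in Phase 5 meaningful.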

Phase 3 -- Auth, Device, and Server Hardening

Goal: Add account/device identity and token-based authentication.

See Auth, Devices, and Tokens for the full design.

| Task | Description |
|---|---|
| Account + device model | `{account_id, device_id, device_pubkey}` with status lifecycle |
| Token issuance | Access + refresh tokens; configurable expiry |
| RPC auth middleware | Validate token on every RPC; map to account/device |
| Identity binding | Bind MLS identity key to account; reject mismatched uploads |
| Rate limiting | Per-IP, per-account, per-device counters |
| Audit logging | Auth events, token lifecycle, rate limit hits |
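
The account/device model and the middleware check can be sketched as follows. Everything beyond the `{account_id, device_id, device_pubkey}` tuple is an assumption for illustration: the status variants, the token shape, and treating `device_pubkey` as the key that uploaded MLS identities are compared against. The authoritative design is in Auth, Devices, and Tokens.

```rust
/// Hypothetical status lifecycle for a device.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DeviceStatus {
    Pending,
    Active,
    Revoked,
}

struct Device {
    account_id: u64,
    device_id: u64,
    device_pubkey: [u8; 32],
    status: DeviceStatus,
}

/// Hypothetical decoded access token.
struct AccessToken {
    account_id: u64,
    device_id: u64,
    expires_at: u64, // Unix seconds
}

#[derive(Debug, PartialEq)]
enum AuthError {
    TokenExpired,
    UnknownDevice,
    DeviceNotActive,
    IdentityMismatch,
}

/// Middleware step: map a token to an active device, or reject the RPC.
fn authorize<'a>(token: &AccessToken, devices: &'a [Device], now: u64) -> Result<&'a Device, AuthError> {
    if now >= token.expires_at {
        return Err(AuthError::TokenExpired);
    }
    let device = devices
        .iter()
        .find(|d| d.account_id == token.account_id && d.device_id == token.device_id)
        .ok_or(AuthError::UnknownDevice)?;
    if device.status != DeviceStatus::Active {
        return Err(AuthError::DeviceNotActive);
    }
    Ok(device)
}

/// Identity binding: an uploaded identity key must match the registered key.
/// Public keys are not secret, so a plain comparison is used here.
fn check_identity_binding(device: &Device, uploaded_key: &[u8; 32]) -> Result<(), AuthError> {
    if &device.device_pubkey == uploaded_key {
        Ok(())
    } else {
        Err(AuthError::IdentityMismatch)
    }
}
```

Each `AuthError` variant corresponds to a negative test case and to an audit event.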

Phase 4 -- Delivery Semantics and Client Resilience

Goal: Reliable message delivery and 1:1 channels.

See 1:1 Channel Design for the DM-specific design.

| Task | Description |
|---|---|
| Idempotent message IDs | Client-generated UUIDs; server deduplicates |
| Ordering guarantees | Per-channel sequence numbers; client detects gaps |
| Offline queue | Server retains messages for offline recipients (up to TTL) |
| 1:1 channels | Channel creation, membership, per-channel authz |
| TTL eviction | Background sweep + fetch-time check for expired messages |
| Client retry | Exponential backoff with jitter on transient failures |
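
Three of the delivery mechanisms in this phase are small enough to sketch directly: server-side deduplication by message ID, client-side gap detection on sequence numbers, and the backoff ceiling for retries. The type and function names are hypothetical, UUIDs are held as `u128` to stay std-only, and the random value for jitter is supplied by the caller. A real server would also bound the seen-ID set, for example by message TTL.

```rust
use std::collections::HashSet;
use std::ops::Range;

/// Server side: remember client-generated message IDs (UUIDs held as u128).
struct Deduplicator {
    seen: HashSet<u128>,
}

impl Deduplicator {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Returns true the first time an ID is seen, false for a retry of the same ID.
    fn first_delivery(&mut self, message_id: u128) -> bool {
        self.seen.insert(message_id)
    }
}

/// Client side: track the next expected per-channel sequence number.
struct GapDetector {
    next_expected: u64,
}

impl GapDetector {
    fn new(first_seq: u64) -> Self {
        Self { next_expected: first_seq }
    }

    /// Returns the missing range if `seq` skips ahead; late or repeated
    /// sequence numbers yield None in this simplified sketch.
    fn observe(&mut self, seq: u64) -> Option<Range<u64>> {
        if seq < self.next_expected {
            return None;
        }
        let missing = if seq > self.next_expected {
            Some(self.next_expected..seq)
        } else {
            None
        };
        self.next_expected = seq.saturating_add(1);
        missing
    }
}

/// Exponential backoff ceiling: min(cap, base * 2^attempt), saturating on overflow.
fn backoff_ceiling_ms(attempt: u32, base_ms: u64, cap_ms: u64) -> u64 {
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    base_ms.saturating_mul(factor).min(cap_ms)
}

/// "Full jitter": pick a delay in [0, ceiling] from a caller-supplied random u64.
fn jittered_delay_ms(ceiling_ms: u64, random: u64) -> u64 {
    random % ceiling_ms.saturating_add(1)
}
```

Because retries reuse the same message ID, the deduplicator makes resending after a timeout safe, and the jitter spreads reconnecting clients out instead of letting them retry in lockstep.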

Phase 5 -- E2E Harness and Security Tests

Goal: Automated end-to-end testing and security validation.

| Task | Description |
|---|---|
| docker-compose testnet | Multi-node test environment with configurable topology |
| Positive E2E tests | Full group lifecycle: register, create, invite, join, send, recv, leave |
| Negative E2E tests | Expired tokens, wrong identity, replay, malformed messages |
| Compat matrix | N-1 client/server version testing |
| Fuzz targets | `cargo-fuzz` targets for Protobuf parsers, MLS message handlers |
| Golden-wire fixtures | Serialised test vectors for regression testing across versions |

Phase 6 -- Reliability, Performance, and Operations

Goal: Production-grade operations and performance validation.

| Task | Description |
|---|---|
| SQLite/SQLCipher persistence | AS key store, DS message log, client state (M6) |
| Soak testing | 72-hour continuous operation under synthetic load |
| Load testing | Throughput and latency benchmarks (Criterion + custom harness) |
| Chaos testing | Network partitions, process crashes, disk full scenarios |
| Backup / restore | SQLite backup with integrity verification |
| Canary / rollback | Rolling deployment strategy with automatic rollback on failure |
| Metrics + dashboards | Prometheus metrics, Grafana dashboards (see Future Research) |

Planning Checklist

Use this checklist when planning a new milestone or phase. Each item should have a documented decision before implementation begins.

Release criteria / SLOs -- Define what "done" means. Latency targets, error rate thresholds, test coverage minimums.
Threat model review -- Update the Threat Model for any new attack surface introduced by this phase.
Protocol policy -- Ciphersuite allowlist, wire version, downgrade rules.
Identity / auth model -- Who authenticates, how, and what operations are gated.
Data model -- Schema changes, migrations, backward compatibility.
Abuse controls -- Rate limits, size caps, connection limits for this phase.
Observability contracts -- What new metrics, logs, and traces are needed.
Environments / secrets -- Dev, staging, production configuration; secret rotation plan.
Testing matrix -- Unit, integration, E2E, negative, fuzz, compat tests for this phase.
Rollout / ops -- Deployment strategy, rollback plan, monitoring during rollout.

Cross-references

Milestones -- feature milestone tracker
Auth, Devices, and Tokens -- Phase 3 design
1:1 Channel Design -- Phase 4 design
Future Research -- technology options for Phase 6+
Coding Standards -- engineering standards
Testing Strategy -- test structure and conventions
Threat Model -- security analysis
