Roadmap — quicprochat
From proof-of-concept to production-grade E2E encrypted messaging.
Each phase is designed to be tackled sequentially. Items within a phase can be parallelised. Check the box when done.
Phase 1 — Production Hardening (Critical)
Eliminate all crash paths, enforce secure defaults, fix deployment blockers.
- 1.1 Remove `.unwrap()` / `.expect()` from production paths
  - Replace `AUTH_CONTEXT.read().expect()` in client RPC with proper `Result`
  - Replace `"0.0.0.0:0".parse().unwrap()` in client with fallible parse
  - Replace `Mutex::lock().unwrap()` in server storage with `.map_err()`
  - Audit: `grep -rn 'unwrap()\|expect(' crates/` outside `#[cfg(test)]`
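The replacements listed in 1.1 share one pattern: turn a panic into a typed error the caller can handle. The following is a minimal sketch of that pattern, not the project's code; `StorageError`, `parse_bind_addr`, and `Store` are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::Mutex;

/// Hypothetical error type; the real crates may define their own.
#[derive(Debug, PartialEq)]
pub enum StorageError {
    /// Another thread panicked while holding the lock.
    LockPoisoned,
    /// The address string could not be parsed.
    BadAddress(String),
}

/// Fallible replacement for `"0.0.0.0:0".parse().unwrap()`.
pub fn parse_bind_addr(s: &str) -> Result<SocketAddr, StorageError> {
    s.parse::<SocketAddr>()
        .map_err(|e| StorageError::BadAddress(format!("{}: {}", s, e)))
}

/// Hypothetical in-memory store guarded by a mutex.
pub struct Store {
    inner: Mutex<HashMap<String, Vec<u8>>>,
}

impl Store {
    pub fn new() -> Self {
        Store { inner: Mutex::new(HashMap::new()) }
    }

    /// `lock()` only fails when the mutex is poisoned; map that to an error
    /// instead of unwrapping.
    pub fn put(&self, key: &str, value: Vec<u8>) -> Result<(), StorageError> {
        let mut guard = self.inner.lock().map_err(|_| StorageError::LockPoisoned)?;
        guard.insert(key.to_string(), value);
        Ok(())
    }

    pub fn get(&self, key: &str) -> Result<Option<Vec<u8>>, StorageError> {
        let guard = self.inner.lock().map_err(|_| StorageError::LockPoisoned)?;
        Ok(guard.get(key).cloned())
    }
}
```

Callers then propagate the error with `?` rather than crashing the process on a malformed address or a poisoned lock.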
- 1.2 Enforce secure defaults in production mode
  - Reject startup if `QPC_PRODUCTION=true` and `auth_token` is empty or `"devtoken"`
  - Require non-empty `db_key` when using SQL backend in production
  - Refuse to auto-generate TLS certs in production mode (require existing cert+key)
  - Already partially implemented: verify and harden the validation in `config.rs`
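The three startup checks in 1.2 can be expressed as a single validation function. This is a sketch under assumptions: the `Config` fields and `ConfigError` variants below are hypothetical and do not mirror the actual `config.rs`.

```rust
/// Hypothetical subset of the server configuration (field names are assumptions).
pub struct Config {
    pub production: bool,
    pub auth_token: String,
    pub sql_backend: bool,
    pub db_key: String,
    pub tls_cert_exists: bool,
    pub tls_key_exists: bool,
}

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    WeakAuthToken,
    MissingDbKey,
    MissingTlsMaterial,
}

/// Returns an error describing the first production-mode rule that is violated.
pub fn validate(cfg: &Config) -> Result<(), ConfigError> {
    if !cfg.production {
        // Development mode keeps the permissive defaults.
        return Ok(());
    }
    if cfg.auth_token.is_empty() || cfg.auth_token == "devtoken" {
        return Err(ConfigError::WeakAuthToken);
    }
    if cfg.sql_backend && cfg.db_key.is_empty() {
        return Err(ConfigError::MissingDbKey);
    }
    // In production the cert and key must already exist; nothing is generated.
    if !(cfg.tls_cert_exists && cfg.tls_key_exists) {
        return Err(ConfigError::MissingTlsMaterial);
    }
    Ok(())
}
```

Running the check once at startup, before any listener is bound, keeps a misconfigured production node from ever accepting traffic.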
- 1.3 Fix `.gitignore`
  - Add `data/`, `*.der`, `*.pem`, `*.db`, `*.bin` (state files), `*.ks` (keystores)
  - Verify no secrets are already tracked: `git ls-files data/ *.der *.db`
- 1.4 Fix Dockerfile
  - Sync workspace members (handle excluded `p2p` crate)
  - Create dedicated user/group instead of `nobody`
  - Set writable `QPC_DATA_DIR` with correct permissions
  - Test: `docker build -t qpc-server . && docker run --rm -it qpc-server --help`
- 1.5 TLS certificate lifecycle
  - Document CA-signed cert setup (Let's Encrypt / custom CA)
  - Add `--tls-required` flag that refuses to start without valid cert
  - Log clear warning when using self-signed certs
  - Document certificate rotation procedure
Phase 2 — Test & CI Maturity
Build confidence before adding features.
- 2.1 Expand E2E test coverage
  - Auth failure scenarios (wrong password, expired token, invalid token)
  - Message ordering verification (send N messages, verify seq numbers)
  - Concurrent clients (3+ members in group, simultaneous send/recv)
  - OPAQUE registration + login full flow
  - Queue full behavior (>1000 messages)
  - Rate limiting behavior (>100 enqueues/minute)
  - Reconnection after server restart
  - KeyPackage exhaustion (fetch when none available)
- 2.2 Add unit tests for untested paths
  - Client retry logic (exponential backoff, jitter, retriable classification)
  - REPL input parsing edge cases (empty input, special characters, `/commands`)
  - State file encryption/decryption round-trip with bad password
  - Token cache expiry
  - Conversation store migrations
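To make the retry logic in 2.2 unit-testable, the delay calculation can be a pure function with the jitter passed in rather than drawn from an RNG. The sketch below assumes a hypothetical `RpcError` classification and a jitter value in permille; it is not the client's actual retry code.

```rust
use std::time::Duration;

/// Hypothetical error classification used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RpcError {
    Timeout,
    ConnectionReset,
    AuthFailed,
    InvalidRequest,
}

/// Only transient transport failures are worth retrying.
pub fn is_retriable(err: RpcError) -> bool {
    matches!(err, RpcError::Timeout | RpcError::ConnectionReset)
}

/// Delay before retry number `attempt` (0-based).
///
/// The delay is `base * 2^attempt`, capped at `max`, then scaled into
/// 50%..=100% of that value by `jitter_permille` (0..=1000). In production the
/// caller would pass a random jitter; tests pass fixed values.
pub fn backoff_delay(
    attempt: u32,
    base: Duration,
    max: Duration,
    jitter_permille: u32,
) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    let capped = base.checked_mul(factor).unwrap_or(max).min(max);
    let scale = 500 + jitter_permille.min(1000) / 2; // 500..=1000
    capped
        .checked_mul(scale)
        .map(|d| d / 1000)
        .unwrap_or(capped)
}
```

Because the function is deterministic for a given jitter value, tests can pin both ends of the jitter range and the cap without sleeping.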
- 2.3 CI hardening
  - Add `.github/CODEOWNERS` (crypto, auth, wire-format require 2 reviewers)
  - Ensure `cargo deny check` runs on every PR (already in CI; verify)
  - Add `cargo audit` as blocking check (already in CI; verify)
  - Add coverage reporting (tarpaulin or llvm-cov)
  - Add CI job for Docker build validation
- 2.4 Clean up build warnings
  - Fix Cap'n Proto generated `unused_parens` warnings
  - Remove dead code / unused imports
  - Address `openmls` future-incompat warnings
  - Target: `cargo clippy --workspace -- -D warnings` passes clean
Phase 3 — Client SDKs: Native QUIC + Cap'n Proto Everywhere
No REST gateway. No protocol dilution. The .capnp schemas are the
interface definition. Every SDK speaks native QUIC + Cap'n Proto. The
project name stays honest.
Why this matters
The name is quicprochat: the protocol IS the product. Instead of adding an HTTP translation layer that loses zero-copy performance and adds base64 overhead, we invest in making the native protocol accessible from every language that has QUIC + Cap'n Proto support, and provide WASM/FFI for the crypto layer.
Architecture
Server: QUIC + Cap'n Proto (single protocol, no gateway)
Client SDKs:
┌─── Rust quinn + capnp-rpc (existing, reference impl)
├─── Go quic-go + go-capnp (native, high confidence)
├─── Python aioquic + pycapnp (native QUIC, manual framing)
├─── C/C++ msquic/ngtcp2 + capnproto (reference impl, full RPC)
└─── Browser WebTransport + capnp (WASM) (QUIC transport, no HTTP needed)
Crypto layer (client-side MLS, shared across all SDKs):
┌─── Rust crate (native, existing)
├─── WASM module (browsers, Node.js, Deno)
└─── C FFI (Swift, Kotlin, Python, Go via cgo)
Language support reality check
| Language | QUIC | Cap'n Proto | RPC | Confidence |
|---|---|---|---|---|
| Rust | quinn ✅ | capnp-rpc ✅ | Full ✅ | Existing |
| Go | quic-go ✅ | go-capnp ✅ | Level 1 ✅ | High |
| Python | aioquic ✅ | pycapnp ⚠️ | Manual framing | Medium |
| C/C++ | msquic/ngtcp2 ✅ | capnproto ✅ | Full ✅ | High |
| Browser | WebTransport ✅ | WASM ✅ | Via WASM bridge | Medium |
Implementation
- 3.1 Go SDK (`quicprochat-go`)
  - Generated Go types from `node.capnp` (6487-line codegen, all 24 RPC methods)
  - QUIC transport via `quic-go` with TLS 1.3 + ALPN `"capnp"`
  - High-level `qpc` package: Connect, Health, ResolveUser, CreateChannel, Send/SendWithTTL, Receive/ReceiveWait, DeleteAccount, OPAQUE auth
  - Example CLI in `sdks/go/cmd/example/`
- 3.2 Python SDK (`quicprochat-py`)
  - QUIC transport: `aioquic` with custom Cap'n Proto stream handler
  - Cap'n Proto serialization: `pycapnp` for message types
  - Manual RPC framing: length-prefixed request/response over QUIC stream
  - Async/await API matching the Rust client patterns
  - Crypto: PyO3 bindings to `quicprochat-core` for MLS operations
  - Publish: PyPI `quicprochat`
  - Example: async bot client
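The "manual RPC framing" item in 3.2 refers to length-prefixed messages on a QUIC stream. The roadmap does not specify the prefix width or byte order, so the sketch below assumes a 4-byte big-endian length and a hypothetical `MAX_FRAME_LEN`; it works over any `Read`/`Write` pair and only illustrates the idea.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound on a single frame; not taken from the project.
pub const MAX_FRAME_LEN: u32 = 4 * 1024 * 1024;

/// Writes `payload` preceded by its length as a 4-byte big-endian integer.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)
}

/// Reads one frame, rejecting oversized lengths before allocating.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 4];
    r.read_exact(&mut header)?;
    let len = u32::from_be_bytes(header);
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Checking the declared length before allocating matters on the read side, since the peer controls the header.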
- 3.3 C FFI layer (`quicprochat-ffi`)
  - `crates/quicprochat-ffi` with 7 `extern "C"` functions: connect, login, send, receive, disconnect, last_error, free_string
  - Builds as `libquicprochat_ffi.so` / `.dylib` / `.dll`
  - Python ctypes wrapper in `examples/python/qpc_client.py`
- 3.4 WASM compilation of `quicprochat-core`
  - `wasm-pack build` target producing 175 KB WASM bundle (LTO + opt-level=s)
  - 13 `wasm_bindgen` functions: Ed25519 identity, hybrid KEM, safety numbers, sealed sender, padding
  - Browser-ready with `crypto.getRandomValues()` RNG
  - Published as `sdks/typescript/wasm-crypto/`
- 3.5 WebTransport server endpoint
  - Add HTTP/3 + WebTransport listener to server (same QUIC stack via quinn)
  - Cap'n Proto RPC framed over WebTransport bidirectional streams
  - Same auth, same storage, same RPC handlers; only the stream source differs
  - Browsers connect via `new WebTransport("https://server:7443")`
  - ALPN negotiation: `"h3"` for WebTransport, `"capnp"` for native QUIC
  - Configurable port: `--webtransport-listen 0.0.0.0:7443`
  - Feature-flagged: `--features webtransport`
- 3.6 TypeScript/JavaScript SDK (`@quicprochat/client`)
  - `QpqClient` class: connect, offline, health, resolveUser, createChannel, send/sendWithTTL, receive, deleteAccount
  - WASM crypto wrapper: generateIdentity, sign/verify, hybridEncrypt/Decrypt, computeSafetyNumber, sealedSend, pad
  - WebSocket transport with request/response correlation and reconnection
  - Browser demo: interactive crypto playground + chat UI (`sdks/typescript/demo/index.html`)
- 3.7 SDK documentation and schema publishing
  - Publish `.capnp` schemas as the canonical API contract
  - Document the QUIC + Cap'n Proto connection pattern for each language
  - Provide a "build your own SDK" guide (QUIC stream → Cap'n Proto RPC bootstrap)
  - Reference implementation checklist: connect, auth, upload key, enqueue, fetch
Phase 4 — Trust & Security Infrastructure
Address the security gaps required for real-world deployment.
- 4.1 Third-party cryptographic audit
  - Scope: MLS integration, OPAQUE flow, hybrid KEM, key lifecycle, zeroization
  - Firms: NCC Group, Trail of Bits, Cure53
  - Budget and timeline: typically 4–6 weeks, $50K–$150K
  - Publish report publicly (builds trust)
- 4.2 Key Transparency / revocation
  - Replace `BasicCredential` with X.509-based MLS credentials
  - Or: verifiable key directory (Merkle tree, auditable log)
  - Users can verify peer keys haven't been substituted (MITM detection)
  - Revocation mechanism for compromised keys
- 4.3 Client authentication on Delivery Service
  - DS sender identity binding with explicit audit logging
  - `sender_prefix` tracking in enqueue/batch_enqueue RPCs
  - Sender identity derived from authenticated session
- 4.4 M7: Post-quantum MLS integration
  - Integrate hybrid KEM (X25519 + ML-KEM-768) into the OpenMLS crypto provider
  - Group key material gets post-quantum confidentiality
  - Full test suite with PQ ciphersuite
  - Ref: existing `hybrid_kem.rs` and `hybrid_crypto.rs`
- 4.5 Username enumeration mitigation
  - 5 ms timing floor on `resolveUser` responses
  - Rate limiting to prevent bulk enumeration attacks
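A timing floor means every response takes at least a fixed minimum time, so a fast "user not found" path cannot be told apart from a slower lookup by latency alone. The following is a blocking sketch under assumptions: `with_timing_floor` and `RESOLVE_FLOOR` are hypothetical names, and an async server would use its runtime's sleep instead of `std::thread::sleep`.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// The 5 ms floor named in the roadmap item.
pub const RESOLVE_FLOOR: Duration = Duration::from_millis(5);

/// Runs `op` and then waits until at least `floor` has elapsed since the start,
/// so that fast and slow paths look the same to a remote observer.
pub fn with_timing_floor<T, F: FnOnce() -> T>(floor: Duration, op: F) -> T {
    let start = Instant::now();
    let result = op();
    // `checked_sub` returns `None` when the operation already took longer
    // than the floor, in which case no extra delay is added.
    if let Some(remaining) = floor.checked_sub(start.elapsed()) {
        thread::sleep(remaining);
    }
    result
}
```

The floor only hides differences smaller than itself, which is why the roadmap pairs it with rate limiting.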
Phase 5 — Features & UX
Make it a product people want to use.
- 5.1 Multi-device support
  - Account → multiple devices, each with own Ed25519 key + MLS KeyPackages
  - Device graph management (add device, remove device, list devices)
  - Messages delivered to all devices of a user
  - `device_id` field already in Auth struct; wire it through
- 5.2 Account recovery
  - Recovery codes or backup key (encrypted, stored by user)
  - Option: server-assisted recovery with security questions (lower security)
  - MLS state re-establishment after device loss
- 5.3 Full MLS lifecycle
  - Member removal (Remove proposal → Commit → fan-out)
  - Credential update (Update proposal for key rotation)
  - Explicit proposal handling (queue proposals, batch commit)
  - Group metadata (name, description, avatar hash)
- 5.4 Message editing and deletion
  - `Edit` (0x06) and `Delete` (0x07) message types in `AppMessage`
  - `/edit <index> <text>` and `/delete <index>` REPL commands (own messages only)
  - Database update/removal on incoming edit/delete
- 5.5 File and media transfer
  - `uploadBlob` / `downloadBlob` RPCs with 256 KB chunked streaming
  - SHA-256 content-addressable storage with hash verification
  - `FileRef` (0x08) message type with blob_id, filename, file_size, mime_type
  - `/send-file <path>` and `/download <index>` REPL commands with progress bars
  - 50 MB max file size, automatic MIME detection via `mime_guess`
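The chunking and size limit in 5.5 reduce to simple offset arithmetic. The sketch below assumes binary units (256 KB = 256 × 1024 bytes, 50 MB = 50 × 1024 × 1024 bytes) and uses hypothetical names (`Chunk`, `plan_chunks`); hashing and the RPC calls themselves are left out.

```rust
/// Assumed chunk size: 256 KB interpreted as 256 * 1024 bytes.
pub const CHUNK_SIZE: u64 = 256 * 1024;
/// Assumed limit: 50 MB interpreted as 50 * 1024 * 1024 bytes.
pub const MAX_FILE_SIZE: u64 = 50 * 1024 * 1024;

/// One upload step: which byte range of the file to send.
#[derive(Debug, PartialEq)]
pub struct Chunk {
    pub offset: u64,
    pub len: u64,
}

/// Splits a file of `file_size` bytes into consecutive chunks, or rejects it
/// if it exceeds the maximum size.
pub fn plan_chunks(file_size: u64) -> Result<Vec<Chunk>, String> {
    if file_size > MAX_FILE_SIZE {
        return Err(format!(
            "file is {} bytes, limit is {} bytes",
            file_size, MAX_FILE_SIZE
        ));
    }
    let mut chunks = Vec::new();
    let mut offset = 0u64;
    while offset < file_size {
        // The last chunk may be shorter than CHUNK_SIZE.
        let len = CHUNK_SIZE.min(file_size - offset);
        chunks.push(Chunk { offset, len });
        offset += len;
    }
    Ok(chunks)
}
```

Under these assumptions a file at the size limit produces 200 chunks, which also gives a natural step count for a progress bar.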
- 5.6 Abuse prevention and moderation
  - Block user (client-side, suppress display)
  - Report message (encrypted report to admin key)
  - Admin tools: ban user, delete account, audit log
- 5.7 Offline message queue (client-side)
  - Queue messages when disconnected, send on reconnect
  - Idempotent message IDs to prevent duplicates
  - Gap detection: compare local seq with server seq
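The three items in 5.7 can be sketched as a small client-side outbox plus a sequence comparison. Everything named here (`Outbox`, `missing_range`, `u64` message IDs) is an assumption for illustration; the real client may model IDs and sequence numbers differently.

```rust
use std::collections::{HashSet, VecDeque};
use std::ops::RangeInclusive;

/// Client-side queue for messages composed while disconnected.
pub struct Outbox {
    queue: VecDeque<(u64, Vec<u8>)>,
    /// IDs already queued; grows without bound in this sketch.
    seen: HashSet<u64>,
}

impl Outbox {
    pub fn new() -> Self {
        Outbox { queue: VecDeque::new(), seen: HashSet::new() }
    }

    /// Queues a message. Returns `false` if this ID was already queued, so a
    /// retried send does not produce a duplicate.
    pub fn enqueue(&mut self, id: u64, payload: Vec<u8>) -> bool {
        if !self.seen.insert(id) {
            return false;
        }
        self.queue.push_back((id, payload));
        true
    }

    /// Hands back all queued messages in send order, emptying the queue.
    pub fn drain_on_reconnect(&mut self) -> Vec<(u64, Vec<u8>)> {
        self.queue.drain(..).collect()
    }
}

/// Gap detection: the sequence numbers the client still has to fetch, given
/// the highest seq it has processed and the highest seq the server reports.
pub fn missing_range(local_seq: u64, server_seq: u64) -> Option<RangeInclusive<u64>> {
    if server_seq > local_seq {
        Some(local_seq + 1..=server_seq)
    } else {
        None
    }
}
```

The server would still need to treat repeated IDs as no-ops for the scheme to be idempotent end to end; this sketch covers only the client side.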
Phase 6 — Scale & Operations
Prepare for real traffic.
- 6.1 Distributed rate limiting
  - Current: in-memory per-process, lost on restart
  - Move to Redis or shared state for multi-node deployments
  - Sliding window with configurable thresholds
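The sliding-window algorithm itself is independent of where the state lives. The sketch below keeps timestamps in process memory, which is the "current" situation described in 6.1; a multi-node version would apply the same logic against shared state. `SlidingWindowLimiter` and `allow_at` are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Per-key sliding-window limiter: at most `max_events` within any `window`.
pub struct SlidingWindowLimiter {
    window: Duration,
    max_events: usize,
    events: HashMap<String, VecDeque<Instant>>,
}

impl SlidingWindowLimiter {
    pub fn new(window: Duration, max_events: usize) -> Self {
        SlidingWindowLimiter { window, max_events, events: HashMap::new() }
    }

    /// Records an event for `key` at time `now` and reports whether it is
    /// allowed. Taking `now` as a parameter keeps the limiter testable.
    pub fn allow_at(&mut self, key: &str, now: Instant) -> bool {
        let q = self.events.entry(key.to_string()).or_default();
        // Evict timestamps that have left the window.
        while let Some(&oldest) = q.front() {
            if now.saturating_duration_since(oldest) >= self.window {
                q.pop_front();
            } else {
                break;
            }
        }
        if q.len() < self.max_events {
            q.push_back(now);
            true
        } else {
            false
        }
    }
}
```

With a 60-second window and a threshold of 100, this matches the ">100 enqueues/minute" behavior that the E2E tests in 2.1 are meant to exercise.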
- 6.2 Multi-node / horizontal scaling
  - Stateless server design (already mostly there; state is in storage backend)
  - Shared PostgreSQL or CockroachDB backend (replace SQLite)
  - Message queue fan-out (Redis pub/sub or NATS for cross-node notification)
  - Load balancer health check via QUIC RPC `health()` or Prometheus `/metrics`
- 6.3 Operational runbook
  - Backup / restore procedures (SQLCipher, file backend)
  - Key rotation (auth token, TLS cert, DB encryption key)
  - Incident response playbook
  - Scaling guide (when to add nodes, resource sizing)
  - Monitoring dashboard templates (Grafana + Prometheus)
- 6.4 Connection draining and graceful shutdown
  - Stop accepting new connections on SIGTERM
  - Wait for in-flight RPCs (configurable timeout, default 30s)
  - Drain WebTransport sessions with close frame
  - Document expected behavior for load balancers (health → unhealthy first)
- 6.5 Request-level timeouts
  - Per-RPC timeout (prevent slow clients from holding resources)
  - Database query timeout
  - Overall request deadline propagation
- 6.6 Observability enhancements
  - Request correlation IDs (trace across RPC → storage)
  - Storage operation latency metrics
  - Per-endpoint latency histograms
  - Structured audit log to persistent storage (not just stdout)
  - OpenTelemetry integration
Phase 7 — Platform Expansion & Research
Long-term vision for wide adoption.
- 7.1 Mobile clients (iOS + Android)
  - Use C FFI (Phase 3.3) for crypto + transport (single library)
  - Push notifications via APNs / FCM (server sends notification on enqueue)
  - Background QUIC connection for message polling
  - Biometric auth for local key storage (Keychain / Android Keystore)
- 7.2 Web client (browser)
  - Use WASM (Phase 3.4) for crypto
  - Use WebTransport (Phase 3.5) for native QUIC transport
  - Cap'n Proto via WASM bridge (Phase 3.6)
  - IndexedDB for local state persistence
  - Service Worker for background notifications
  - Progressive Web App (PWA) support
- 7.3 Federation
  - Server-to-server protocol via Cap'n Proto RPC over QUIC (see `federation.capnp`)
  - `relayEnqueue`, `proxyFetchKeyPackage`, `federationHealth` methods
  - Identity resolution across federated servers
  - MLS group spanning multiple servers
  - Trust model for federated deployments
- 7.4 Sealed Sender
  - Sender identity inside MLS ciphertext only (server can't see who sent)
  - `sealed_sender` module in quicprochat-core with seal/unseal API
  - WASM-accessible via `wasm_bindgen` for browser use
- 7.5 Additional language SDKs
  - Java/Kotlin: JNI bindings to C FFI (Phase 3.3) + native QUIC (netty-quic)
  - Swift: Swift wrapper over C FFI + Network.framework QUIC
  - Ruby: FFI bindings via `quicprochat-ffi`
  - Demand-driven: only build SDKs people request
- 7.6 P2P / NAT traversal
  - Direct peer-to-peer via iroh (foundation exists in `quicprochat-p2p`)
  - Server as fallback relay only
  - Reduces latency and single-point-of-failure
  - Ref: FUTURE-IMPROVEMENTS.md § 6.1
- 7.7 Traffic analysis resistance
  - Padding messages to uniform size
  - Decoy traffic to mask timing patterns
  - Optional Tor/I2P routing for IP privacy
  - Ref: FUTURE-IMPROVEMENTS.md § 5.4, 6.3
Phase 8 — Freifunk / Community Mesh Networking
Make qpc a first-class citizen on decentralised, community-operated wireless networks (Freifunk, BATMAN-adv/Babel routing, OpenWrt). Multiple qpc nodes form a federated mesh; clients auto-discover nearby nodes via mDNS; the network functions without any central infrastructure or internet uplink.
Architecture
Client A ─── mDNS discovery ──► nearby qpc node (LAN / mesh)
                                        │
                             Cap'n Proto federation
                                        │
                                remote qpc node (across mesh)
- F0: Re-include `quicprochat-p2p` in workspace; fix ALPN strings
  - Moved `crates/quicprochat-p2p` from `exclude` back into `[workspace] members`
  - Fixed ALPN `b"quicnprotochat/p2p/1"` → `b"quicprochat/p2p/1"` (breaking wire change)
  - Fixed federation ALPN `b"qnpc-fed"` → `b"quicprochat/federation/1"`
  - Feature-gated behind `--features mesh` on client (keeps iroh out of default builds)
- F1: Federation routing in message delivery
  - `handle_enqueue` and `handle_batch_enqueue` call `federation::routing::resolve_destination()`
  - Recipients with a remote home server are relayed via `FederationClient::relay_enqueue()`
  - mTLS mutual authentication between nodes (both present client certs, validated against shared CA)
  - Config: `QPC_FEDERATION_LISTEN`, `QPC_LOCAL_DOMAIN`, `QPC_FEDERATION_CERT/KEY/CA`
- F2: mDNS local peer discovery
  - Server announces `_quicprochat._udp.local.` on startup via `mdns-sd`
  - Client: `MeshDiscovery::start()` browses for nearby nodes (feature-gated)
  - REPL commands: `/mesh peers` (scan + list), `/mesh server <host:port>` (note address)
  - Nodes announce: `ver=1`, `server=<host:port>`, `domain=<local_domain>` TXT records
- F3: Self-sovereign mesh identity
  - Ed25519 keypair-based identity independent of AS registration
  - JSON-persisted seed + known peers directory
  - Sign/verify operations for mesh authenticity (`crates/quicprochat-p2p/src/identity.rs`)
- F4: Store-and-forward with TTL
  - `MeshEnvelope` with TTL-based expiry, hop_count tracking, max_hops routing limit
  - SHA-256 deduplication ID prevents relay loops
  - Ed25519 signature verification on envelopes
  - `MeshStore` in-memory queue with per-recipient capacity limits and TTL-based GC
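The relay rules in F4 (expiry, hop limit, loop prevention) can be sketched as one decision function. This is not the `MeshEnvelope` or `MeshStore` from `quicprochat-p2p`: the `Envelope` fields, `Relay`, and `RelayDecision` below are hypothetical, the deduplication ID is taken as an already computed 32-byte value, and signature verification is omitted.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified envelope header.
pub struct Envelope {
    /// Assumed to be a SHA-256 digest computed elsewhere.
    pub dedup_id: [u8; 32],
    pub created_at_secs: u64,
    pub ttl_secs: u64,
    pub hop_count: u8,
    pub max_hops: u8,
}

#[derive(Debug, PartialEq)]
pub enum RelayDecision {
    Forward,
    Expired,
    HopLimit,
    Duplicate,
}

/// Tracks which envelopes this node has already relayed.
pub struct Relay {
    seen: HashSet<[u8; 32]>,
}

impl Relay {
    pub fn new() -> Self {
        Relay { seen: HashSet::new() }
    }

    /// Decides whether to forward `env` at time `now_secs`, incrementing the
    /// hop count when it is forwarded.
    pub fn decide(&mut self, env: &mut Envelope, now_secs: u64) -> RelayDecision {
        if now_secs >= env.created_at_secs.saturating_add(env.ttl_secs) {
            return RelayDecision::Expired;
        }
        if env.hop_count >= env.max_hops {
            return RelayDecision::HopLimit;
        }
        // `insert` returns false when the ID was seen before: a relay loop.
        if !self.seen.insert(env.dedup_id) {
            return RelayDecision::Duplicate;
        }
        env.hop_count += 1;
        RelayDecision::Forward
    }
}
```

A real node would also expire entries from the seen-set, for example alongside the TTL-based GC mentioned above, so that memory stays bounded.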
- F5: Lightweight broadcast channels
  - Symmetric ChaCha20-Poly1305 encrypted channels (no MLS overhead)
  - Topic-based pub/sub via `BroadcastChannel` and `BroadcastManager`
  - Subscribe/unsubscribe, create, publish API on `P2pNode`
- F6: Extended `/mesh` REPL commands
  - `/mesh send <peer_id> <msg>`: direct P2P message via iroh
  - `/mesh broadcast <topic> <msg>`: publish to broadcast channel
  - `/mesh subscribe <topic>`: join broadcast channel
  - `/mesh route`: show routing table
  - `/mesh identity`: show mesh identity info
  - `/mesh store`: show store-and-forward statistics
- F7: OpenWrt cross-compilation guide
  - Musl static builds: `x86_64-unknown-linux-musl`, `armv7-unknown-linux-musleabihf`, `mips-unknown-linux-musl`
  - Strip binary: `--release` + `strip` → target size < 5 MB for flash storage
  - `opkg` package manifest for OpenWrt feed
  - `procd` init script + `uci` config file for OpenWrt integration
  - CI job: cross-compile and size-check on every release tag
- F8: Traffic analysis resistance for mesh
  - Uniform message padding to nearest 256-byte boundary (hides message size)
  - Configurable decoy traffic rate (fake messages to mask send timing)
  - Optional onion routing: 3-hop relay through other mesh nodes (no Tor dependency)
  - Ref: Phase 7.7 for server-side traffic analysis resistance
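Padding to a 256-byte boundary needs a way to recover the original length on the receiving side. The roadmap does not describe the wire layout, so the sketch below assumes a 4-byte big-endian length prefix followed by the payload and zero fill; `pad` and `unpad` are hypothetical helpers, and padding is meant to be applied before encryption.

```rust
/// Block size from the roadmap item.
pub const PAD_BLOCK: usize = 256;

/// Pads `msg` so that the total length (4-byte length prefix + payload + zero
/// fill) is the next multiple of `PAD_BLOCK`. Assumes `msg` is shorter than
/// 4 GiB so its length fits in the prefix.
pub fn pad(msg: &[u8]) -> Vec<u8> {
    let unpadded = 4 + msg.len();
    let padded = ((unpadded + PAD_BLOCK - 1) / PAD_BLOCK) * PAD_BLOCK;
    let mut out = Vec::with_capacity(padded);
    out.extend_from_slice(&(msg.len() as u32).to_be_bytes());
    out.extend_from_slice(msg);
    out.resize(padded, 0);
    out
}

/// Recovers the original message, or `None` if the buffer is not a whole
/// number of blocks or the declared length does not fit.
pub fn unpad(padded: &[u8]) -> Option<Vec<u8>> {
    if padded.len() < 4 || padded.len() % PAD_BLOCK != 0 {
        return None;
    }
    let len = u32::from_be_bytes([padded[0], padded[1], padded[2], padded[3]]) as usize;
    let end = 4usize.checked_add(len)?;
    padded.get(4..end).map(|s| s.to_vec())
}
```

An observer then sees only the number of 256-byte blocks, not the exact message size.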
Phase 9 — Developer Experience & Community Growth
Features designed to attract contributors, create demo/showcase potential, and lower the barrier to entry for non-crypto developers.
- 9.1 Criterion Benchmark Suite (`qpc-bench`)
  - Criterion benchmarks for all crypto primitives: hybrid KEM encap/decap, MLS group-add at 10/100/1000 members, epoch rotation, Noise_XX handshake
  - CI publishes HTML benchmark reports as GitHub Actions artifacts
  - Citable numbers: no other project benchmarks MLS + PQ-KEM in Rust
- 9.2 Safety Numbers (key verification)
  - 60-digit numeric code derived from two identity keys (Signal-style)
  - `/verify <username>` REPL command for out-of-band verification
  - Available in WASM via `compute_safety_number` binding
- 9.3 Full-Screen TUI (Ratatui + Crossterm)
  - `qpc tui` launches a full-screen terminal UI: message pane, input bar, channel sidebar with unread counts, MLS epoch indicator
  - Feature-gated `--features tui` to keep ratatui/crossterm out of default builds
  - Existing REPL and CLI subcommands are unaffected
- 9.4 Delivery Proof Canary Tokens
  - Server signs `Ed25519(SHA-256(message_id || recipient || timestamp))` on enqueue
  - Sender stores proof locally as cryptographic evidence that the server queued the message
  - Cap'n Proto schema gains optional `deliveryProof: Data` on enqueue response
- 9.5 Verifiable Transcript Archive
  - `GroupMember::export_transcript(path, password)` writes encrypted, tamper-evident message archive (CBOR records, Argon2id + ChaCha20-Poly1305, Merkle chain)
  - `qpc export verify` CLI command independently verifies chain integrity
  - Useful for legal discovery, audit, or personal backup
- 9.6 Key Transparency (Merkle-Log Identity Binding)
  - Append-only Merkle log of (username, identity_key) bindings in the AS
  - Clients receive inclusion proofs alongside key fetches
  - Any client can independently audit the full identity history
  - Lightweight subset of RFC 9162 adapted for identity keys
- 9.7 Dynamic Server Plugin System
  - Server loads `.so` / `.dylib` plugins at runtime via `--plugin-dir`
  - C-compatible `HookVTable` via `extern "C"`: plugins in any language
  - 6 hook points: on_message_enqueue, on_batch_enqueue, on_auth, on_channel_created, on_fetch, on_user_registered
  - Example plugins: logging plugin, rate limit plugin (512 KiB payload enforcement)
- 9.8 PQ Noise Transport Layer
  - Hybrid `Noise_XX + ML-KEM-768` handshake for post-quantum transport security
  - Closes the harvest-now-decrypt-later gap on handshake metadata (ADR-006)
  - Feature-gated `--features pq-noise`; classical Noise_XX default preserved
  - May require extending or forking `snow` crate's `CryptoResolver`
Summary Timeline
| Phase | Focus | Estimated Effort |
|---|---|---|
| 1 | Production Hardening | 1–2 days |
| 2 | Test & CI Maturity | 2–3 days |
| 3 | Client SDKs (Go, Python, WASM, FFI, WebTransport) | 5–8 days |
| 4 | Trust & Security Infrastructure | 2–4 days (excl. audit) |
| 5 | Features & UX | 5–7 days |
| 6 | Scale & Operations | 3–5 days |
| 7 | Platform Expansion & Research | ongoing |
| 8 | Freifunk / Community Mesh | ongoing |
| 9 | Developer Experience & Community Growth | 3–5 days |
Related Documents
- Future Improvements — consolidated improvement list
- Production Readiness Audit — specific blockers
- Security Audit — findings and recommendations
- Milestone Tracker — M1–M7 status
- Auth, Devices, and Tokens — authorization design
- DM Channel Design — 1:1 channel spec