Files
quicproquo/ROADMAP.md
Chris Nennemann 853ca4fec0 chore: rename project quicnprotochat -> quicproquo (binaries: qpq)
Rename the entire workspace:
- Crate packages: quicnprotochat-{core,proto,server,client,gui,p2p,mobile} -> quicproquo-*
- Binary names: quicnprotochat -> qpq, quicnprotochat-server -> qpq-server,
  quicnprotochat-gui -> qpq-gui
- Default files: *-state.bin -> qpq-state.bin, *-server.toml -> qpq-server.toml,
  *.db -> qpq.db
- Environment variable prefix: QUICNPROTOCHAT_* -> QPQ_*
- App identifier: chat.quicnproto.gui -> chat.quicproquo.gui
- Proto package: quicnprotochat.bench -> quicproquo.bench
- All documentation, Docker, CI, and script references updated

HKDF domain-separation strings and P2P ALPN remain unchanged for
backward compatibility with existing encrypted state and wire protocol.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 20:11:51 +01:00

377 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Roadmap — quicproquo
> From proof-of-concept to production-grade E2E encrypted messaging.
>
> Each phase is designed to be tackled sequentially. Items within a phase
> can be parallelised. Check the box when done.
---
## Phase 1 — Production Hardening (Critical)
Eliminate all crash paths, enforce secure defaults, fix deployment blockers.
- [ ] **1.1 Remove `.unwrap()` / `.expect()` from production paths**
- Replace `AUTH_CONTEXT.read().expect()` in client RPC with proper `Result`
- Replace `"0.0.0.0:0".parse().unwrap()` in client with fallible parse
- Replace `Mutex::lock().unwrap()` in server storage with `.map_err()`
- Audit: `grep -rn 'unwrap()\|expect(' crates/` outside `#[cfg(test)]`
- [ ] **1.2 Enforce secure defaults in production mode**
- Reject startup if `QPQ_PRODUCTION=true` and `auth_token` is empty or `"devtoken"`
- Require non-empty `db_key` when using SQL backend in production
- Refuse to auto-generate TLS certs in production mode (require existing cert+key)
- Already partially implemented — verify and harden the validation in `config.rs`
- [ ] **1.3 Fix `.gitignore`**
- Add `data/`, `*.der`, `*.pem`, `*.db`, `*.bin` (state files), `*.ks` (keystores)
- Verify no secrets are already tracked: `git ls-files data/ *.der *.db`
- [ ] **1.4 Fix Dockerfile**
- Sync workspace members (handle excluded `p2p` crate)
- Create dedicated user/group instead of `nobody`
- Set writable `QPQ_DATA_DIR` with correct permissions
- Test: `docker build . && docker run --rm -it qpq-server --help`
- [ ] **1.5 TLS certificate lifecycle**
- Document CA-signed cert setup (Let's Encrypt / custom CA)
- Add `--tls-required` flag that refuses to start without valid cert
- Log clear warning when using self-signed certs
- Document certificate rotation procedure
---
## Phase 2 — Test & CI Maturity
Build confidence before adding features.
- [ ] **2.1 Expand E2E test coverage**
- Auth failure scenarios (wrong password, expired token, invalid token)
- Message ordering verification (send N messages, verify seq numbers)
- Concurrent clients (3+ members in group, simultaneous send/recv)
- OPAQUE registration + login full flow
- Queue full behavior (>1000 messages)
- Rate limiting behavior (>100 enqueues/minute)
- Reconnection after server restart
- KeyPackage exhaustion (fetch when none available)
- [ ] **2.2 Add unit tests for untested paths**
- Client retry logic (exponential backoff, jitter, retriable classification)
- REPL input parsing edge cases (empty input, special characters, `/` commands)
- State file encryption/decryption round-trip with bad password
- Token cache expiry
- Conversation store migrations
- [ ] **2.3 CI hardening**
- Add `.github/CODEOWNERS` (crypto, auth, wire-format require 2 reviewers)
- Ensure `cargo deny check` runs on every PR (already in CI — verify)
- Add `cargo audit` as blocking check (already in CI — verify)
- Add coverage reporting (tarpaulin or llvm-cov)
- Add CI job for Docker build validation
- [ ] **2.4 Clean up build warnings**
- Fix Cap'n Proto generated `unused_parens` warnings
- Remove dead code / unused imports
- Address `openmls` future-incompat warnings
- Target: `cargo clippy --workspace -- -D warnings` passes clean
---
## Phase 3 — Client SDKs: Native QUIC + Cap'n Proto Everywhere
**No REST gateway. No protocol dilution.** The `.capnp` schemas are the
interface definition. Every SDK speaks native QUIC + Cap'n Proto. The
project name stays honest.
### Why this matters
The name is **quic**n**proto**chat — the protocol IS the product. Instead
of adding an HTTP translation layer that loses zero-copy performance and
adds base64 overhead, we invest in making the native protocol accessible
from every language that has QUIC + Cap'n Proto support, and provide
WASM/FFI for the crypto layer.
### Architecture
```
Server: QUIC + Cap'n Proto (single protocol, no gateway)
Client SDKs:
┌─── Rust quinn + capnp-rpc (existing, reference impl)
├─── Go quic-go + go-capnp (native, high confidence)
├─── Python aioquic + pycapnp (native QUIC, manual framing)
├─── C/C++ msquic/ngtcp2 + capnproto (reference impl, full RPC)
└─── Browser WebTransport + capnp (WASM) (QUIC transport, no HTTP needed)
Crypto layer (client-side MLS, shared across all SDKs):
┌─── Rust crate (native, existing)
├─── WASM module (browsers, Node.js, Deno)
└─── C FFI (Swift, Kotlin, Python, Go via cgo)
```
### Language support reality check
| Language | QUIC | Cap'n Proto | RPC | Confidence |
|----------|------|-------------|-----|------------|
| **Rust** | quinn ✅ | capnp-rpc ✅ | Full ✅ | Existing |
| **Go** | quic-go ✅ | go-capnp ✅ | Level 1 ✅ | High |
| **Python** | aioquic ✅ | pycapnp ⚠️ | Manual framing | Medium |
| **C/C++** | msquic/ngtcp2 ✅ | capnproto ✅ | Full ✅ | High |
| **Browser** | WebTransport ✅ | WASM ✅ | Via WASM bridge | Medium |
### Implementation
- [ ] **3.1 Go SDK (`quicproquo-go`)**
- Generate Go types: `capnp compile -ogo schemas/node.capnp`
- QUIC transport: `quic-go` with TLS 1.3 + ALPN `"capnp"`
- Cap'n Proto RPC framing over QUIC bidirectional stream
- Auth context: bearer token + session management
- Retry with exponential backoff (mirror Rust client pattern)
- Publish: `go get git.xorwell.de/c/quicproquo-go`
- Example: CLI client matching Rust feature set
- [ ] **3.2 Python SDK (`quicproquo-py`)**
- QUIC transport: `aioquic` with custom Cap'n Proto stream handler
- Cap'n Proto serialization: `pycapnp` for message types
- Manual RPC framing: length-prefixed request/response over QUIC stream
- Async/await API matching the Rust client patterns
- Crypto: PyO3 bindings to `quicproquo-core` for MLS operations
- Publish: PyPI `quicproquo`
- Example: async bot client
- [ ] **3.3 C FFI layer (`quicproquo-ffi`)**
- New crate in workspace: `crates/quicproquo-ffi`
- `cbindgen` to generate `quicproquo.h` C header
- Crypto functions: `qpc_identity_new()`, `qpc_group_create()`,
`qpc_encrypt()`, `qpc_decrypt()`, `qpc_key_package_generate()`
- Transport functions: `qpc_connect()`, `qpc_enqueue()`, `qpc_fetch()`,
`qpc_fetch_wait()` (bundles QUIC + Cap'n Proto internally)
- Memory: caller-allocated buffers with length, no ownership transfer
- Builds as `libquicproquo.so` / `.dylib` / `.dll`
- Swift and Kotlin wrapper examples using the C header
- [ ] **3.4 WASM compilation of `quicproquo-core`**
- `wasm-pack build` target for browser + Node.js
- Crypto-only: `GroupMember`, `IdentityKeypair`, `AppMessage`,
`hybrid_encrypt/decrypt`, `generate_key_package`
- Transport NOT included (browsers use WebTransport, see Phase 3.5)
- Publish to npm: `@quicproquo/core`
- TypeScript type definitions auto-generated via `wasm-bindgen`
- [ ] **3.5 WebTransport server endpoint**
- Add HTTP/3 + WebTransport listener to server (same QUIC stack via quinn)
- Cap'n Proto RPC framed over WebTransport bidirectional streams
- Same auth, same storage, same RPC handlers — just a different stream source
- Browsers connect via `new WebTransport("https://server:7443")`
- ALPN negotiation: `"h3"` for WebTransport, `"capnp"` for native QUIC
- Configurable port: `--webtransport-listen 0.0.0.0:7443`
- Feature-flagged: `--features webtransport`
- [ ] **3.6 TypeScript/JavaScript SDK (`@quicproquo/client`)**
- WebTransport for QUIC connectivity (no HTTP fallback)
- WASM module (Phase 3.4) for MLS crypto
- Cap'n Proto serialization via WASM bridge
- Handles: auth flow, key upload, message send/receive, group management
- Publish to npm: `@quicproquo/client`
- Example: browser chat UI
- [ ] **3.7 SDK documentation and schema publishing**
- Publish `.capnp` schemas as the canonical API contract
- Document the QUIC + Cap'n Proto connection pattern for each language
- Provide a "build your own SDK" guide (QUIC stream → Cap'n Proto RPC bootstrap)
- Reference implementation checklist: connect, auth, upload key, enqueue, fetch
---
## Phase 4 — Trust & Security Infrastructure
Address the security gaps required for real-world deployment.
- [ ] **4.1 Third-party cryptographic audit**
- Scope: MLS integration, OPAQUE flow, hybrid KEM, key lifecycle, zeroization
- Firms: NCC Group, Trail of Bits, Cure53
- Budget and timeline: typically 4-6 weeks, $50K$150K
- Publish report publicly (builds trust)
- [ ] **4.2 Key Transparency / revocation**
- Replace `BasicCredential` with X.509-based MLS credentials
- Or: verifiable key directory (Merkle tree, auditable log)
- Users can verify peer keys haven't been substituted (MITM detection)
- Revocation mechanism for compromised keys
- [ ] **4.3 Client authentication on Delivery Service**
- Currently server trusts claimed identity key on enqueue
- Bind enqueue operations to the authenticated session's identity key
- Prevent: client A fetching/sending as client B's identity
- Backward compat: sealed_sender mode for anonymous enqueue
- [ ] **4.4 M7 — Post-quantum MLS integration**
- Integrate hybrid KEM (X25519 + ML-KEM-768) into the OpenMLS crypto provider
- Group key material gets post-quantum confidentiality
- Full test suite with PQ ciphersuite
- Ref: existing `hybrid_kem.rs` and `hybrid_crypto.rs`
- [ ] **4.5 Username enumeration mitigation**
- Constant-time or uniform response for unknown users during OPAQUE login
- Prevent timing side-channels that reveal user existence
---
## Phase 5 — Features & UX
Make it a product people want to use.
- [ ] **5.1 Multi-device support**
- Account → multiple devices, each with own Ed25519 key + MLS KeyPackages
- Device graph management (add device, remove device, list devices)
- Messages delivered to all devices of a user
- `device_id` field already in Auth struct — wire it through
- [ ] **5.2 Account recovery**
- Recovery codes or backup key (encrypted, stored by user)
- Option: server-assisted recovery with security questions (lower security)
- MLS state re-establishment after device loss
- [ ] **5.3 Full MLS lifecycle**
- Member removal (Remove proposal → Commit → fan-out)
- Credential update (Update proposal for key rotation)
- Explicit proposal handling (queue proposals, batch commit)
- Group metadata (name, description, avatar hash)
- [ ] **5.4 Message editing and deletion**
- New `AppMessage` variants: `Edit { target_seq, new_content }`, `Delete { target_seq }`
- Client-side tombstones, server doesn't know about edits
- [ ] **5.5 File and media transfer**
- Upload encrypted blob → get content hash
- Share hash + symmetric key inside MLS message
- Download by hash, decrypt client-side
- Size limits, content-type validation
- [ ] **5.6 Abuse prevention and moderation**
- Block user (client-side, suppress display)
- Report message (encrypted report to admin key)
- Admin tools: ban user, delete account, audit log
- [ ] **5.7 Offline message queue (client-side)**
- Queue messages when disconnected, send on reconnect
- Idempotent message IDs to prevent duplicates
- Gap detection: compare local seq with server seq
---
## Phase 6 — Scale & Operations
Prepare for real traffic.
- [ ] **6.1 Distributed rate limiting**
- Current: in-memory per-process, lost on restart
- Move to Redis or shared state for multi-node deployments
- Sliding window with configurable thresholds
- [ ] **6.2 Multi-node / horizontal scaling**
- Stateless server design (already mostly there — state is in storage backend)
- Shared PostgreSQL or CockroachDB backend (replace SQLite)
- Message queue fan-out (Redis pub/sub or NATS for cross-node notification)
- Load balancer health check via QUIC RPC `health()` or Prometheus `/metrics`
- [ ] **6.3 Operational runbook**
- Backup / restore procedures (SQLCipher, file backend)
- Key rotation (auth token, TLS cert, DB encryption key)
- Incident response playbook
- Scaling guide (when to add nodes, resource sizing)
- Monitoring dashboard templates (Grafana + Prometheus)
- [ ] **6.4 Connection draining and graceful shutdown**
- Stop accepting new connections on SIGTERM
- Wait for in-flight RPCs (configurable timeout, default 30s)
- Drain WebTransport sessions with close frame
- Document expected behavior for load balancers (health → unhealthy first)
- [ ] **6.5 Request-level timeouts**
- Per-RPC timeout (prevent slow clients from holding resources)
- Database query timeout
- Overall request deadline propagation
- [ ] **6.6 Observability enhancements**
- Request correlation IDs (trace across RPC → storage)
- Storage operation latency metrics
- Per-endpoint latency histograms
- Structured audit log to persistent storage (not just stdout)
- OpenTelemetry integration
---
## Phase 7 — Platform Expansion & Research
Long-term vision for wide adoption.
- [ ] **7.1 Mobile clients (iOS + Android)**
- Use C FFI (Phase 3.3) for crypto + transport (single library)
- Push notifications via APNs / FCM (server sends notification on enqueue)
- Background QUIC connection for message polling
- Biometric auth for local key storage (Keychain / Android Keystore)
- [ ] **7.2 Web client (browser)**
- Use WASM (Phase 3.4) for crypto
- Use WebTransport (Phase 3.5) for native QUIC transport
- Cap'n Proto via WASM bridge (Phase 3.6)
- IndexedDB for local state persistence
- Service Worker for background notifications
- Progressive Web App (PWA) support
- [ ] **7.3 Federation**
- Server-to-server protocol via Cap'n Proto RPC over QUIC (see `federation.capnp`)
- `relayEnqueue`, `proxyFetchKeyPackage`, `federationHealth` methods
- Identity resolution across federated servers
- MLS group spanning multiple servers
- Trust model for federated deployments
- [ ] **7.4 Sealed Sender**
- Sender identity inside MLS ciphertext only (server can't see who sent)
- Requires: sender certificate + encrypted sender proof
- Ref: Signal's Sealed Sender design
- [ ] **7.5 Additional language SDKs**
- Java/Kotlin: JNI bindings to C FFI (Phase 3.3) + native QUIC (netty-quic)
- Swift: Swift wrapper over C FFI + Network.framework QUIC
- Ruby: FFI bindings via `quicproquo-ffi`
- Evaluate demand-driven — only build SDKs people request
- [ ] **7.6 P2P / NAT traversal**
- Direct peer-to-peer via iroh (foundation exists in `quicproquo-p2p`)
- Server as fallback relay only
- Reduces latency and single-point-of-failure
- Ref: `FUTURE-IMPROVEMENTS.md § 6.1`
- [ ] **7.7 Traffic analysis resistance**
- Padding messages to uniform size
- Decoy traffic to mask timing patterns
- Optional Tor/I2P routing for IP privacy
- Ref: `FUTURE-IMPROVEMENTS.md § 5.4, 6.3`
---
## Summary Timeline
| Phase | Focus | Estimated Effort |
|-------|-------|-----------------|
| **1** | Production Hardening | 12 days |
| **2** | Test & CI Maturity | 23 days |
| **3** | Client SDKs (Go, Python, WASM, FFI, WebTransport) | 58 days |
| **4** | Trust & Security Infrastructure | 24 days (excl. audit) |
| **5** | Features & UX | 57 days |
| **6** | Scale & Operations | 35 days |
| **7** | Platform Expansion & Research | ongoing |
---
## Related Documents
- [Future Improvements](docs/FUTURE-IMPROVEMENTS.md) — consolidated improvement list
- [Production Readiness Audit](docs/PRODUCTION-READINESS-AUDIT.md) — specific blockers
- [Security Audit](docs/SECURITY-AUDIT.md) — findings and recommendations
- [Milestone Tracker](docs/src/roadmap/milestones.md) — M1M7 status
- [Auth, Devices, and Tokens](docs/src/roadmap/authz-plan.md) — authorization design
- [DM Channel Design](docs/src/roadmap/dm-channels.md) — 1:1 channel spec