chore: rename project quicnprotochat -> quicproquo (binaries: qpq)

Rename the entire workspace:
- Crate packages: quicnprotochat-{core,proto,server,client,gui,p2p,mobile} -> quicproquo-*
- Binary names: quicnprotochat -> qpq, quicnprotochat-server -> qpq-server,
  quicnprotochat-gui -> qpq-gui
- Default files: *-state.bin -> qpq-state.bin, *-server.toml -> qpq-server.toml,
  *.db -> qpq.db
- Environment variable prefix: QUICNPROTOCHAT_* -> QPQ_*
- App identifier: chat.quicnproto.gui -> chat.quicproquo.gui
- Proto package: quicnprotochat.bench -> quicproquo.bench
- All documentation, Docker, CI, and script references updated

HKDF domain-separation strings and P2P ALPN remain unchanged for
backward compatibility with existing encrypted state and wire protocol.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-03-01 20:11:51 +01:00
parent 553de3a2b7
commit 853ca4fec0
152 changed files with 4070 additions and 788 deletions

376
ROADMAP.md Normal file
View File

@@ -0,0 +1,376 @@
# Roadmap — quicproquo
> From proof-of-concept to production-grade E2E encrypted messaging.
>
> Each phase is designed to be tackled sequentially. Items within a phase
> can be parallelised. Check the box when done.
---
## Phase 1 — Production Hardening (Critical)
Eliminate all crash paths, enforce secure defaults, fix deployment blockers.
- [ ] **1.1 Remove `.unwrap()` / `.expect()` from production paths**
- Replace `AUTH_CONTEXT.read().expect()` in client RPC with proper `Result`
- Replace `"0.0.0.0:0".parse().unwrap()` in client with fallible parse
- Replace `Mutex::lock().unwrap()` in server storage with `.map_err()`
- Audit: `grep -rn 'unwrap()\|expect(' crates/` outside `#[cfg(test)]`
- [ ] **1.2 Enforce secure defaults in production mode**
- Reject startup if `QPQ_PRODUCTION=true` and `auth_token` is empty or `"devtoken"`
- Require non-empty `db_key` when using SQL backend in production
- Refuse to auto-generate TLS certs in production mode (require existing cert+key)
- Already partially implemented — verify and harden the validation in `config.rs`
- [ ] **1.3 Fix `.gitignore`**
- Add `data/`, `*.der`, `*.pem`, `*.db`, `*.bin` (state files), `*.ks` (keystores)
- Verify no secrets are already tracked: `git ls-files data/ *.der *.db`
- [ ] **1.4 Fix Dockerfile**
- Sync workspace members (handle excluded `p2p` crate)
- Create dedicated user/group instead of `nobody`
- Set writable `QPQ_DATA_DIR` with correct permissions
- Test: `docker build . && docker run --rm -it qpq-server --help`
- [ ] **1.5 TLS certificate lifecycle**
- Document CA-signed cert setup (Let's Encrypt / custom CA)
- Add `--tls-required` flag that refuses to start without valid cert
- Log clear warning when using self-signed certs
- Document certificate rotation procedure
---
## Phase 2 — Test & CI Maturity
Build confidence before adding features.
- [ ] **2.1 Expand E2E test coverage**
- Auth failure scenarios (wrong password, expired token, invalid token)
- Message ordering verification (send N messages, verify seq numbers)
- Concurrent clients (3+ members in group, simultaneous send/recv)
- OPAQUE registration + login full flow
- Queue full behavior (>1000 messages)
- Rate limiting behavior (>100 enqueues/minute)
- Reconnection after server restart
- KeyPackage exhaustion (fetch when none available)
- [ ] **2.2 Add unit tests for untested paths**
- Client retry logic (exponential backoff, jitter, retriable classification)
- REPL input parsing edge cases (empty input, special characters, `/` commands)
- State file encryption/decryption round-trip with bad password
- Token cache expiry
- Conversation store migrations
- [ ] **2.3 CI hardening**
- Add `.github/CODEOWNERS` (crypto, auth, wire-format require 2 reviewers)
- Ensure `cargo deny check` runs on every PR (already in CI — verify)
- Add `cargo audit` as blocking check (already in CI — verify)
- Add coverage reporting (tarpaulin or llvm-cov)
- Add CI job for Docker build validation
- [ ] **2.4 Clean up build warnings**
- Fix Cap'n Proto generated `unused_parens` warnings
- Remove dead code / unused imports
- Address `openmls` future-incompat warnings
- Target: `cargo clippy --workspace -- -D warnings` passes clean
---
## Phase 3 — Client SDKs: Native QUIC + Cap'n Proto Everywhere
**No REST gateway. No protocol dilution.** The `.capnp` schemas are the
interface definition. Every SDK speaks native QUIC + Cap'n Proto. The
project name stays honest.
### Why this matters
The name is **quic**n**proto**chat — the protocol IS the product. Instead
of adding an HTTP translation layer that loses zero-copy performance and
adds base64 overhead, we invest in making the native protocol accessible
from every language that has QUIC + Cap'n Proto support, and provide
WASM/FFI for the crypto layer.
### Architecture
```
Server: QUIC + Cap'n Proto (single protocol, no gateway)
Client SDKs:
┌─── Rust quinn + capnp-rpc (existing, reference impl)
├─── Go quic-go + go-capnp (native, high confidence)
├─── Python aioquic + pycapnp (native QUIC, manual framing)
├─── C/C++ msquic/ngtcp2 + capnproto (reference impl, full RPC)
└─── Browser WebTransport + capnp (WASM) (QUIC transport, no HTTP needed)
Crypto layer (client-side MLS, shared across all SDKs):
┌─── Rust crate (native, existing)
├─── WASM module (browsers, Node.js, Deno)
└─── C FFI (Swift, Kotlin, Python, Go via cgo)
```
### Language support reality check
| Language | QUIC | Cap'n Proto | RPC | Confidence |
|----------|------|-------------|-----|------------|
| **Rust** | quinn ✅ | capnp-rpc ✅ | Full ✅ | Existing |
| **Go** | quic-go ✅ | go-capnp ✅ | Level 1 ✅ | High |
| **Python** | aioquic ✅ | pycapnp ⚠️ | Manual framing | Medium |
| **C/C++** | msquic/ngtcp2 ✅ | capnproto ✅ | Full ✅ | High |
| **Browser** | WebTransport ✅ | WASM ✅ | Via WASM bridge | Medium |
### Implementation
- [ ] **3.1 Go SDK (`quicproquo-go`)**
- Generate Go types: `capnp compile -ogo schemas/node.capnp`
- QUIC transport: `quic-go` with TLS 1.3 + ALPN `"capnp"`
- Cap'n Proto RPC framing over QUIC bidirectional stream
- Auth context: bearer token + session management
- Retry with exponential backoff (mirror Rust client pattern)
- Publish: `go get git.xorwell.de/c/quicproquo-go`
- Example: CLI client matching Rust feature set
- [ ] **3.2 Python SDK (`quicproquo-py`)**
- QUIC transport: `aioquic` with custom Cap'n Proto stream handler
- Cap'n Proto serialization: `pycapnp` for message types
- Manual RPC framing: length-prefixed request/response over QUIC stream
- Async/await API matching the Rust client patterns
- Crypto: PyO3 bindings to `quicproquo-core` for MLS operations
- Publish: PyPI `quicproquo`
- Example: async bot client
- [ ] **3.3 C FFI layer (`quicproquo-ffi`)**
- New crate in workspace: `crates/quicproquo-ffi`
- `cbindgen` to generate `quicproquo.h` C header
- Crypto functions: `qpc_identity_new()`, `qpc_group_create()`,
`qpc_encrypt()`, `qpc_decrypt()`, `qpc_key_package_generate()`
- Transport functions: `qpc_connect()`, `qpc_enqueue()`, `qpc_fetch()`,
`qpc_fetch_wait()` (bundles QUIC + Cap'n Proto internally)
- Memory: caller-allocated buffers with length, no ownership transfer
- Builds as `libquicproquo.so` / `.dylib` / `.dll`
- Swift and Kotlin wrapper examples using the C header
- [ ] **3.4 WASM compilation of `quicproquo-core`**
- `wasm-pack build` target for browser + Node.js
- Crypto-only: `GroupMember`, `IdentityKeypair`, `AppMessage`,
`hybrid_encrypt/decrypt`, `generate_key_package`
- Transport NOT included (browsers use WebTransport, see Phase 3.5)
- Publish to npm: `@quicproquo/core`
- TypeScript type definitions auto-generated via `wasm-bindgen`
- [ ] **3.5 WebTransport server endpoint**
- Add HTTP/3 + WebTransport listener to server (same QUIC stack via quinn)
- Cap'n Proto RPC framed over WebTransport bidirectional streams
- Same auth, same storage, same RPC handlers — just a different stream source
- Browsers connect via `new WebTransport("https://server:7443")`
- ALPN negotiation: `"h3"` for WebTransport, `"capnp"` for native QUIC
- Configurable port: `--webtransport-listen 0.0.0.0:7443`
- Feature-flagged: `--features webtransport`
- [ ] **3.6 TypeScript/JavaScript SDK (`@quicproquo/client`)**
- WebTransport for QUIC connectivity (no HTTP fallback)
- WASM module (Phase 3.4) for MLS crypto
- Cap'n Proto serialization via WASM bridge
- Handles: auth flow, key upload, message send/receive, group management
- Publish to npm: `@quicproquo/client`
- Example: browser chat UI
- [ ] **3.7 SDK documentation and schema publishing**
- Publish `.capnp` schemas as the canonical API contract
- Document the QUIC + Cap'n Proto connection pattern for each language
- Provide a "build your own SDK" guide (QUIC stream → Cap'n Proto RPC bootstrap)
- Reference implementation checklist: connect, auth, upload key, enqueue, fetch
---
## Phase 4 — Trust & Security Infrastructure
Address the security gaps required for real-world deployment.
- [ ] **4.1 Third-party cryptographic audit**
- Scope: MLS integration, OPAQUE flow, hybrid KEM, key lifecycle, zeroization
- Firms: NCC Group, Trail of Bits, Cure53
- Budget and timeline: typically 4-6 weeks, $50K$150K
- Publish report publicly (builds trust)
- [ ] **4.2 Key Transparency / revocation**
- Replace `BasicCredential` with X.509-based MLS credentials
- Or: verifiable key directory (Merkle tree, auditable log)
- Users can verify peer keys haven't been substituted (MITM detection)
- Revocation mechanism for compromised keys
- [ ] **4.3 Client authentication on Delivery Service**
- Currently server trusts claimed identity key on enqueue
- Bind enqueue operations to the authenticated session's identity key
- Prevent: client A fetching/sending as client B's identity
- Backward compat: sealed_sender mode for anonymous enqueue
- [ ] **4.4 M7 — Post-quantum MLS integration**
- Integrate hybrid KEM (X25519 + ML-KEM-768) into the OpenMLS crypto provider
- Group key material gets post-quantum confidentiality
- Full test suite with PQ ciphersuite
- Ref: existing `hybrid_kem.rs` and `hybrid_crypto.rs`
- [ ] **4.5 Username enumeration mitigation**
- Constant-time or uniform response for unknown users during OPAQUE login
- Prevent timing side-channels that reveal user existence
---
## Phase 5 — Features & UX
Make it a product people want to use.
- [ ] **5.1 Multi-device support**
- Account → multiple devices, each with own Ed25519 key + MLS KeyPackages
- Device graph management (add device, remove device, list devices)
- Messages delivered to all devices of a user
- `device_id` field already in Auth struct — wire it through
- [ ] **5.2 Account recovery**
- Recovery codes or backup key (encrypted, stored by user)
- Option: server-assisted recovery with security questions (lower security)
- MLS state re-establishment after device loss
- [ ] **5.3 Full MLS lifecycle**
- Member removal (Remove proposal → Commit → fan-out)
- Credential update (Update proposal for key rotation)
- Explicit proposal handling (queue proposals, batch commit)
- Group metadata (name, description, avatar hash)
- [ ] **5.4 Message editing and deletion**
- New `AppMessage` variants: `Edit { target_seq, new_content }`, `Delete { target_seq }`
- Client-side tombstones, server doesn't know about edits
- [ ] **5.5 File and media transfer**
- Upload encrypted blob → get content hash
- Share hash + symmetric key inside MLS message
- Download by hash, decrypt client-side
- Size limits, content-type validation
- [ ] **5.6 Abuse prevention and moderation**
- Block user (client-side, suppress display)
- Report message (encrypted report to admin key)
- Admin tools: ban user, delete account, audit log
- [ ] **5.7 Offline message queue (client-side)**
- Queue messages when disconnected, send on reconnect
- Idempotent message IDs to prevent duplicates
- Gap detection: compare local seq with server seq
---
## Phase 6 — Scale & Operations
Prepare for real traffic.
- [ ] **6.1 Distributed rate limiting**
- Current: in-memory per-process, lost on restart
- Move to Redis or shared state for multi-node deployments
- Sliding window with configurable thresholds
- [ ] **6.2 Multi-node / horizontal scaling**
- Stateless server design (already mostly there — state is in storage backend)
- Shared PostgreSQL or CockroachDB backend (replace SQLite)
- Message queue fan-out (Redis pub/sub or NATS for cross-node notification)
- Load balancer health check via QUIC RPC `health()` or Prometheus `/metrics`
- [ ] **6.3 Operational runbook**
- Backup / restore procedures (SQLCipher, file backend)
- Key rotation (auth token, TLS cert, DB encryption key)
- Incident response playbook
- Scaling guide (when to add nodes, resource sizing)
- Monitoring dashboard templates (Grafana + Prometheus)
- [ ] **6.4 Connection draining and graceful shutdown**
- Stop accepting new connections on SIGTERM
- Wait for in-flight RPCs (configurable timeout, default 30s)
- Drain WebTransport sessions with close frame
- Document expected behavior for load balancers (health → unhealthy first)
- [ ] **6.5 Request-level timeouts**
- Per-RPC timeout (prevent slow clients from holding resources)
- Database query timeout
- Overall request deadline propagation
- [ ] **6.6 Observability enhancements**
- Request correlation IDs (trace across RPC → storage)
- Storage operation latency metrics
- Per-endpoint latency histograms
- Structured audit log to persistent storage (not just stdout)
- OpenTelemetry integration
---
## Phase 7 — Platform Expansion & Research
Long-term vision for wide adoption.
- [ ] **7.1 Mobile clients (iOS + Android)**
- Use C FFI (Phase 3.3) for crypto + transport (single library)
- Push notifications via APNs / FCM (server sends notification on enqueue)
- Background QUIC connection for message polling
- Biometric auth for local key storage (Keychain / Android Keystore)
- [ ] **7.2 Web client (browser)**
- Use WASM (Phase 3.4) for crypto
- Use WebTransport (Phase 3.5) for native QUIC transport
- Cap'n Proto via WASM bridge (Phase 3.6)
- IndexedDB for local state persistence
- Service Worker for background notifications
- Progressive Web App (PWA) support
- [ ] **7.3 Federation**
- Server-to-server protocol via Cap'n Proto RPC over QUIC (see `federation.capnp`)
- `relayEnqueue`, `proxyFetchKeyPackage`, `federationHealth` methods
- Identity resolution across federated servers
- MLS group spanning multiple servers
- Trust model for federated deployments
- [ ] **7.4 Sealed Sender**
- Sender identity inside MLS ciphertext only (server can't see who sent)
- Requires: sender certificate + encrypted sender proof
- Ref: Signal's Sealed Sender design
- [ ] **7.5 Additional language SDKs**
- Java/Kotlin: JNI bindings to C FFI (Phase 3.3) + native QUIC (netty-quic)
- Swift: Swift wrapper over C FFI + Network.framework QUIC
- Ruby: FFI bindings via `quicproquo-ffi`
- Evaluate demand-driven — only build SDKs people request
- [ ] **7.6 P2P / NAT traversal**
- Direct peer-to-peer via iroh (foundation exists in `quicproquo-p2p`)
- Server as fallback relay only
- Reduces latency and single-point-of-failure
- Ref: `FUTURE-IMPROVEMENTS.md § 6.1`
- [ ] **7.7 Traffic analysis resistance**
- Padding messages to uniform size
- Decoy traffic to mask timing patterns
- Optional Tor/I2P routing for IP privacy
- Ref: `FUTURE-IMPROVEMENTS.md § 5.4, 6.3`
---
## Summary Timeline
| Phase | Focus | Estimated Effort |
|-------|-------|-----------------|
| **1** | Production Hardening | 12 days |
| **2** | Test & CI Maturity | 23 days |
| **3** | Client SDKs (Go, Python, WASM, FFI, WebTransport) | 58 days |
| **4** | Trust & Security Infrastructure | 24 days (excl. audit) |
| **5** | Features & UX | 57 days |
| **6** | Scale & Operations | 35 days |
| **7** | Platform Expansion & Research | ongoing |
---
## Related Documents
- [Future Improvements](docs/FUTURE-IMPROVEMENTS.md) — consolidated improvement list
- [Production Readiness Audit](docs/PRODUCTION-READINESS-AUDIT.md) — specific blockers
- [Security Audit](docs/SECURITY-AUDIT.md) — findings and recommendations
- [Milestone Tracker](docs/src/roadmap/milestones.md) — M1M7 status
- [Auth, Devices, and Tokens](docs/src/roadmap/authz-plan.md) — authorization design
- [DM Channel Design](docs/src/roadmap/dm-channels.md) — 1:1 channel spec