quicproquo/ROADMAP.md
Chris Nennemann dc4e4e49a0 feat: Phase 9 — developer experience, extensibility, and community growth
New crates:
- quicproquo-bot: Bot SDK with polling API + JSON pipe mode
- quicproquo-kt: Key Transparency Merkle log (RFC 9162 subset)
- quicproquo-plugin-api: no_std C-compatible plugin vtable API
- quicproquo-gen: scaffolding tool (qpq-gen plugin/bot/rpc/hook)

Server features:
- ServerHooks trait wired into all RPC handlers (enqueue, fetch, auth,
  channel, registration) with plugin rejection support
- Dynamic plugin loader (libloading) with --plugin-dir config
- Delivery proof canary tokens (Ed25519 server signatures on enqueue)
- Key Transparency Merkle log with inclusion proofs on resolveUser

Core library:
- Safety numbers (60-digit HMAC-SHA256 key verification codes)
- Verifiable transcript archive (CBOR + ChaCha20-Poly1305 + hash chain)
- Delivery proof verification utility
- Criterion benchmarks (hybrid KEM, MLS, identity, sealed sender, padding)

Client:
- /verify REPL command for out-of-band key verification
- Full-screen TUI via Ratatui (feature-gated --features tui)
- qpq export / qpq export-verify CLI subcommands
- KT inclusion proof verification on user resolution

Also: ROADMAP Phase 9 added, bot SDK docs, server hooks docs,
crate-responsibilities updated, example plugins (rate_limit, logging).
2026-03-03 22:47:38 +01:00


Roadmap — quicproquo

From proof-of-concept to production-grade E2E encrypted messaging.

Each phase is designed to be tackled sequentially. Items within a phase can be parallelised. Check the box when done.


Phase 1 — Production Hardening (Critical)

Eliminate all crash paths, enforce secure defaults, fix deployment blockers.

  • 1.1 Remove .unwrap() / .expect() from production paths

    • Replace AUTH_CONTEXT.read().expect() in client RPC with proper Result
    • Replace "0.0.0.0:0".parse().unwrap() in client with fallible parse
    • Replace Mutex::lock().unwrap() in server storage with .map_err()
    • Audit: grep -rn 'unwrap()\|expect(' crates/ outside #[cfg(test)]
  • 1.2 Enforce secure defaults in production mode

    • Reject startup if QPQ_PRODUCTION=true and auth_token is empty or "devtoken"
    • Require non-empty db_key when using SQL backend in production
    • Refuse to auto-generate TLS certs in production mode (require existing cert+key)
    • Already partially implemented — verify and harden the validation in config.rs
  • 1.3 Fix .gitignore

    • Add data/, *.der, *.pem, *.db, *.bin (state files), *.ks (keystores)
    • Verify no secrets are already tracked: git ls-files data/ *.der *.db
  • 1.4 Fix Dockerfile

    • Sync workspace members (handle excluded p2p crate)
    • Create dedicated user/group instead of nobody
    • Set writable QPQ_DATA_DIR with correct permissions
    • Test: docker build . && docker run --rm -it qpq-server --help
  • 1.5 TLS certificate lifecycle

    • Document CA-signed cert setup (Let's Encrypt / custom CA)
    • Add --tls-required flag that refuses to start without valid cert
    • Log clear warning when using self-signed certs
    • Document certificate rotation procedure
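
As a minimal sketch of items 1.1 and 1.2, the panicking calls can be replaced with functions that return a `Result`, and the production checks can be expressed as one validation function. The `StartupError` type and every function name below are hypothetical and chosen for illustration; the real `config.rs` may be structured differently.

```rust
use std::net::SocketAddr;
use std::sync::Mutex;

/// Hypothetical error type for this sketch; not taken from the codebase.
#[derive(Debug, PartialEq)]
pub enum StartupError {
    BadAddress(String),
    LockPoisoned,
    InsecureAuthToken,
    MissingDbKey,
}

/// Fallible replacement for `"0.0.0.0:0".parse().unwrap()`.
pub fn parse_bind_addr(s: &str) -> Result<SocketAddr, StartupError> {
    s.parse::<SocketAddr>()
        .map_err(|e| StartupError::BadAddress(format!("{s}: {e}")))
}

/// Fallible replacement for `Mutex::lock().unwrap()`: a poisoned lock
/// becomes an error value instead of a panic.
pub fn queued_count(queue: &Mutex<Vec<Vec<u8>>>) -> Result<usize, StartupError> {
    let guard = queue.lock().map_err(|_| StartupError::LockPoisoned)?;
    Ok(guard.len())
}

/// Item 1.2: refuse insecure settings when production mode is enabled.
pub fn validate_production(
    production: bool,
    auth_token: &str,
    uses_sql_backend: bool,
    db_key: &str,
) -> Result<(), StartupError> {
    if !production {
        return Ok(());
    }
    if auth_token.is_empty() || auth_token == "devtoken" {
        return Err(StartupError::InsecureAuthToken);
    }
    if uses_sql_backend && db_key.is_empty() {
        return Err(StartupError::MissingDbKey);
    }
    Ok(())
}
```

A caller would run `validate_production(...)` once at startup and exit with a clear message on `Err`, so a misconfigured production server never begins listening.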

Phase 2 — Test & CI Maturity

Build confidence before adding features.

  • 2.1 Expand E2E test coverage

    • Auth failure scenarios (wrong password, expired token, invalid token)
    • Message ordering verification (send N messages, verify seq numbers)
    • Concurrent clients (3+ members in group, simultaneous send/recv)
    • OPAQUE registration + login full flow
    • Queue full behavior (>1000 messages)
    • Rate limiting behavior (>100 enqueues/minute)
    • Reconnection after server restart
    • KeyPackage exhaustion (fetch when none available)
  • 2.2 Add unit tests for untested paths

    • Client retry logic (exponential backoff, jitter, retriable classification)
    • REPL input parsing edge cases (empty input, special characters, / commands)
    • State file encryption/decryption round-trip with bad password
    • Token cache expiry
    • Conversation store migrations
  • 2.3 CI hardening

    • Add .github/CODEOWNERS (crypto, auth, wire-format require 2 reviewers)
    • Ensure cargo deny check runs on every PR (already in CI — verify)
    • Add cargo audit as blocking check (already in CI — verify)
    • Add coverage reporting (tarpaulin or llvm-cov)
    • Add CI job for Docker build validation
  • 2.4 Clean up build warnings

    • Fix Cap'n Proto generated unused_parens warnings
    • Remove dead code / unused imports
    • Address openmls future-incompat warnings
    • Target: cargo clippy --workspace -- -D warnings passes clean
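
Item 2.2 asks for unit tests of the client retry logic. The sketch below shows one way such logic could be shaped so that it is easy to test. It is an assumption, not the existing client code: the `RpcError` variants, the millisecond parameters, and passing jitter in as a per-mille value (so tests stay deterministic) are all illustrative choices.

```rust
use std::time::Duration;

/// Hypothetical error categories for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RpcError {
    Timeout,
    ConnectionLost,
    Unauthorized,
    InvalidRequest,
}

/// Retriable classification: only transient transport failures are retried.
pub fn is_retriable(err: RpcError) -> bool {
    matches!(err, RpcError::Timeout | RpcError::ConnectionLost)
}

/// Delay before retry number `attempt` (0-based).
///
/// The delay is `base_ms * 2^attempt`, capped at `max_ms`, then scaled by
/// `jitter_permille / 1000`. The caller supplies the jitter value (for
/// example from an RNG) so the function itself is pure.
pub fn backoff_delay(attempt: u32, base_ms: u64, max_ms: u64, jitter_permille: u64) -> Duration {
    // `checked_shl` returns None once the shift would exceed the bit width.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    let capped = base_ms.saturating_mul(factor).min(max_ms);
    let scaled = capped.saturating_mul(jitter_permille.min(1000)) / 1000;
    Duration::from_millis(scaled)
}
```

Because the jitter is an input, a test can pin it to 1000 to check the exponential curve and the cap, and to 500 to check the scaling.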

Phase 3 — Client SDKs: Native QUIC + Cap'n Proto Everywhere

No REST gateway. No protocol dilution. The .capnp schemas are the interface definition. Every SDK speaks native QUIC + Cap'n Proto. The project name stays honest.

Why this matters

The name is quicnprotochat — the protocol IS the product. Instead of adding an HTTP translation layer that loses zero-copy performance and adds base64 overhead, we invest in making the native protocol accessible from every language that has QUIC + Cap'n Proto support, and provide WASM/FFI for the crypto layer.

Architecture

  Server: QUIC + Cap'n Proto (single protocol, no gateway)

  Client SDKs:
    ┌─── Rust         quinn + capnp-rpc          (existing, reference impl)
    ├─── Go           quic-go + go-capnp          (native, high confidence)
    ├─── Python       aioquic + pycapnp            (native QUIC, manual framing)
    ├─── C/C++        msquic/ngtcp2 + capnproto    (Cap'n Proto reference impl, full RPC)
    └─── Browser      WebTransport + capnp (WASM)  (QUIC transport, no HTTP needed)

  Crypto layer (client-side MLS, shared across all SDKs):
    ┌─── Rust crate   (native, existing)
    ├─── WASM module  (browsers, Node.js, Deno)
    └─── C FFI        (Swift, Kotlin, Python, Go via cgo)

Language support reality check

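| Language | QUIC | Cap'n Proto | RPC | Confidence |
|----------|------|-------------|-----|------------|
| Rust | quinn | capnp-rpc | Full | Existing |
| Go | quic-go | go-capnp | Level 1 | High |
| Python | aioquic | pycapnp | ⚠️ Manual framing | Medium |
| C/C++ | msquic/ngtcp2 | capnproto | Full | High |
| Browser | WebTransport | WASM | Via WASM bridge | Medium |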

Implementation

  • 3.1 Go SDK (quicproquo-go)

    • Generate Go types: capnp compile -ogo schemas/node.capnp
    • QUIC transport: quic-go with TLS 1.3 + ALPN "capnp"
    • Cap'n Proto RPC framing over QUIC bidirectional stream
    • Auth context: bearer token + session management
    • Retry with exponential backoff (mirror Rust client pattern)
    • Publish: go get git.xorwell.de/c/quicproquo-go
    • Example: CLI client matching Rust feature set
  • 3.2 Python SDK (quicproquo-py)

    • QUIC transport: aioquic with custom Cap'n Proto stream handler
    • Cap'n Proto serialization: pycapnp for message types
    • Manual RPC framing: length-prefixed request/response over QUIC stream
    • Async/await API matching the Rust client patterns
    • Crypto: PyO3 bindings to quicproquo-core for MLS operations
    • Publish: PyPI quicproquo
    • Example: async bot client
  • 3.3 C FFI layer (quicproquo-ffi)

    • New crate in workspace: crates/quicproquo-ffi
    • cbindgen to generate quicproquo.h C header
    • Crypto functions: qpc_identity_new(), qpc_group_create(), qpc_encrypt(), qpc_decrypt(), qpc_key_package_generate()
    • Transport functions: qpc_connect(), qpc_enqueue(), qpc_fetch(), qpc_fetch_wait() (bundles QUIC + Cap'n Proto internally)
    • Memory: caller-allocated buffers with length, no ownership transfer
    • Builds as libquicproquo.so / .dylib / .dll
    • Swift and Kotlin wrapper examples using the C header
  • 3.4 WASM compilation of quicproquo-core

    • wasm-pack build target for browser + Node.js
    • Crypto-only: GroupMember, IdentityKeypair, AppMessage, hybrid_encrypt/decrypt, generate_key_package
    • Transport NOT included (browsers use WebTransport, see Phase 3.5)
    • Publish to npm: @quicproquo/core
    • TypeScript type definitions auto-generated via wasm-bindgen
  • 3.5 WebTransport server endpoint

    • Add HTTP/3 + WebTransport listener to server (same QUIC stack via quinn)
    • Cap'n Proto RPC framed over WebTransport bidirectional streams
    • Same auth, same storage, same RPC handlers — just a different stream source
    • Browsers connect via new WebTransport("https://server:7443")
    • ALPN negotiation: "h3" for WebTransport, "capnp" for native QUIC
    • Configurable port: --webtransport-listen 0.0.0.0:7443
    • Feature-flagged: --features webtransport
  • 3.6 TypeScript/JavaScript SDK (@quicproquo/client)

    • WebTransport for QUIC connectivity (no HTTP fallback)
    • WASM module (Phase 3.4) for MLS crypto
    • Cap'n Proto serialization via WASM bridge
    • Handles: auth flow, key upload, message send/receive, group management
    • Publish to npm: @quicproquo/client
    • Example: browser chat UI
  • 3.7 SDK documentation and schema publishing

    • Publish .capnp schemas as the canonical API contract
    • Document the QUIC + Cap'n Proto connection pattern for each language
    • Provide a "build your own SDK" guide (QUIC stream → Cap'n Proto RPC bootstrap)
    • Reference implementation checklist: connect, auth, upload key, enqueue, fetch
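
The "manual RPC framing" mentioned for the Python SDK (item 3.2) can be illustrated with a small length-prefixed codec. This is a sketch under stated assumptions: the roadmap does not specify the prefix width, byte order, or a size cap, so the 4-byte big-endian prefix and the 1 MiB limit below are placeholders. It is written against `std::io::Read`/`Write` so it stays independent of any particular QUIC library.

```rust
use std::io::{self, Read, Write};

/// Assumed upper bound on a single frame (not specified in the roadmap).
const MAX_FRAME_LEN: usize = 1 << 20;

/// Write one frame: 4-byte big-endian length, then the payload bytes.
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32; // safe: bounded by MAX_FRAME_LEN above
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)
}

/// Read one frame, rejecting oversized lengths before allocating.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Checking the declared length before allocating matters on the server side, since otherwise a peer could request a multi-gigabyte buffer with a four-byte header.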

Phase 4 — Trust & Security Infrastructure

Address the security gaps required for real-world deployment.

  • 4.1 Third-party cryptographic audit

    • Scope: MLS integration, OPAQUE flow, hybrid KEM, key lifecycle, zeroization
    • Firms: NCC Group, Trail of Bits, Cure53
    • Budget and timeline: typically 4-6 weeks, $50K-$150K
    • Publish report publicly (builds trust)
  • 4.2 Key Transparency / revocation

    • Replace BasicCredential with X.509-based MLS credentials
    • Or: verifiable key directory (Merkle tree, auditable log)
    • Users can verify peer keys haven't been substituted (MITM detection)
    • Revocation mechanism for compromised keys
  • 4.3 Client authentication on Delivery Service

    • Currently server trusts claimed identity key on enqueue
    • Bind enqueue operations to the authenticated session's identity key
    • Prevent: client A fetching/sending as client B's identity
    • Backward compat: sealed_sender mode for anonymous enqueue
  • 4.4 M7 — Post-quantum MLS integration

    • Integrate hybrid KEM (X25519 + ML-KEM-768) into the OpenMLS crypto provider
    • Group key material gets post-quantum confidentiality
    • Full test suite with PQ ciphersuite
    • Ref: existing hybrid_kem.rs and hybrid_crypto.rs
  • 4.5 Username enumeration mitigation

    • Constant-time or uniform response for unknown users during OPAQUE login
    • Prevent timing side-channels that reveal user existence
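
Item 4.3 amounts to a single authorization check on the enqueue path. The sketch below is illustrative only: `Session`, `Sender`, and `authorize_enqueue` are hypothetical names, and representing the identity key as a raw 32-byte array is an assumption (reasonable for an Ed25519 public key, but not confirmed by this roadmap).

```rust
/// Hypothetical view of an authenticated session.
pub struct Session {
    pub identity_key: [u8; 32],
}

/// How the sender is identified on an enqueue request.
pub enum Sender<'a> {
    /// The request names a sender identity key.
    Claimed(&'a [u8; 32]),
    /// Sealed-sender mode: the server learns no sender identity.
    Sealed,
}

#[derive(Debug, PartialEq)]
pub enum EnqueueAuthError {
    IdentityMismatch,
}

/// Accept the enqueue only if the claimed key is the one the session
/// authenticated with. Sealed-sender requests carry no claim to check.
pub fn authorize_enqueue(session: &Session, sender: Sender<'_>) -> Result<(), EnqueueAuthError> {
    match sender {
        Sender::Sealed => Ok(()),
        Sender::Claimed(key) if *key == session.identity_key => Ok(()),
        Sender::Claimed(_) => Err(EnqueueAuthError::IdentityMismatch),
    }
}
```

A plain `==` is used because public keys are not secret. The timing concern in item 4.5 applies to the OPAQUE login path, not to this comparison.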

Phase 5 — Features & UX

Make it a product people want to use.

  • 5.1 Multi-device support

    • Account → multiple devices, each with own Ed25519 key + MLS KeyPackages
    • Device graph management (add device, remove device, list devices)
    • Messages delivered to all devices of a user
    • device_id field already in Auth struct — wire it through
  • 5.2 Account recovery

    • Recovery codes or backup key (encrypted, stored by user)
    • Option: server-assisted recovery with security questions (lower security)
    • MLS state re-establishment after device loss
  • 5.3 Full MLS lifecycle

    • Member removal (Remove proposal → Commit → fan-out)
    • Credential update (Update proposal for key rotation)
    • Explicit proposal handling (queue proposals, batch commit)
    • Group metadata (name, description, avatar hash)
  • 5.4 Message editing and deletion

    • New AppMessage variants: Edit { target_seq, new_content }, Delete { target_seq }
    • Client-side tombstones, server doesn't know about edits
  • 5.5 File and media transfer

    • Upload encrypted blob → get content hash
    • Share hash + symmetric key inside MLS message
    • Download by hash, decrypt client-side
    • Size limits, content-type validation
  • 5.6 Abuse prevention and moderation

    • Block user (client-side, suppress display)
    • Report message (encrypted report to admin key)
    • Admin tools: ban user, delete account, audit log
  • 5.7 Offline message queue (client-side)

    • Queue messages when disconnected, send on reconnect
    • Idempotent message IDs to prevent duplicates
    • Gap detection: compare local seq with server seq
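
To make item 5.4 concrete, here is a rough sketch of how a client could apply the proposed `Edit` and `Delete` variants to its local store. Only the two variant shapes come from the roadmap. The `Text` variant, the `Entry` and `Conversation` types, and the rule that a tombstoned message cannot be edited are assumptions, and a real client would also need to check that the edit comes from the original author.

```rust
use std::collections::BTreeMap;

/// `Edit` and `Delete` follow the roadmap; `Text` is a stand-in for the
/// existing message variants.
pub enum AppMessage {
    Text { content: String },
    Edit { target_seq: u64, new_content: String },
    Delete { target_seq: u64 },
}

/// Local view of one message slot.
#[derive(Debug, PartialEq)]
pub enum Entry {
    Live(String),
    Tombstone,
}

/// Hypothetical client-side store keyed by sequence number.
#[derive(Default)]
pub struct Conversation {
    entries: BTreeMap<u64, Entry>,
}

impl Conversation {
    /// Apply a decrypted message received at sequence number `seq`.
    pub fn apply(&mut self, seq: u64, msg: AppMessage) {
        match msg {
            AppMessage::Text { content } => {
                self.entries.insert(seq, Entry::Live(content));
            }
            AppMessage::Edit { target_seq, new_content } => {
                // Only live messages can be edited; tombstones stay tombstones.
                if let Some(Entry::Live(text)) = self.entries.get_mut(&target_seq) {
                    *text = new_content;
                }
            }
            AppMessage::Delete { target_seq } => {
                if let Some(entry) = self.entries.get_mut(&target_seq) {
                    *entry = Entry::Tombstone;
                }
            }
        }
    }

    pub fn get(&self, seq: u64) -> Option<&Entry> {
        self.entries.get(&seq)
    }
}
```

Since edits and deletes travel as ordinary encrypted application messages, the server only ever sees three opaque envelopes, which matches the "server doesn't know about edits" goal.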

Phase 6 — Scale & Operations

Prepare for real traffic.

  • 6.1 Distributed rate limiting

    • Current: in-memory per-process, lost on restart
    • Move to Redis or shared state for multi-node deployments
    • Sliding window with configurable thresholds
  • 6.2 Multi-node / horizontal scaling

    • Stateless server design (already mostly there — state is in storage backend)
    • Shared PostgreSQL or CockroachDB backend (replace SQLite)
    • Message queue fan-out (Redis pub/sub or NATS for cross-node notification)
    • Load balancer health check via QUIC RPC health() or Prometheus /metrics
  • 6.3 Operational runbook

    • Backup / restore procedures (SQLCipher, file backend)
    • Key rotation (auth token, TLS cert, DB encryption key)
    • Incident response playbook
    • Scaling guide (when to add nodes, resource sizing)
    • Monitoring dashboard templates (Grafana + Prometheus)
  • 6.4 Connection draining and graceful shutdown

    • Stop accepting new connections on SIGTERM
    • Wait for in-flight RPCs (configurable timeout, default 30s)
    • Drain WebTransport sessions with close frame
    • Document expected behavior for load balancers (health → unhealthy first)
  • 6.5 Request-level timeouts

    • Per-RPC timeout (prevent slow clients from holding resources)
    • Database query timeout
    • Overall request deadline propagation
  • 6.6 Observability enhancements

    • Request correlation IDs (trace across RPC → storage)
    • Storage operation latency metrics
    • Per-endpoint latency histograms
    • Structured audit log to persistent storage (not just stdout)
    • OpenTelemetry integration
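
The sliding-window limiter from item 6.1 can be sketched in a few lines. This version is in-memory and single-process, so it only illustrates the windowing logic. A multi-node deployment would keep the same logic but move the state into the shared store. The type and method names are hypothetical, and the clock is passed in explicitly to keep the example testable.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Per-client sliding-window limiter (in-memory sketch).
pub struct SlidingWindow {
    limit: usize,
    window: Duration,
    hits: HashMap<String, VecDeque<Instant>>,
}

impl SlidingWindow {
    pub fn new(limit: usize, window: Duration) -> Self {
        SlidingWindow { limit, window, hits: HashMap::new() }
    }

    /// Returns true and records the request if `client` is under its limit
    /// for the window ending at `now`. Rejected requests are not recorded.
    pub fn allow(&mut self, client: &str, now: Instant) -> bool {
        let queue = self.hits.entry(client.to_owned()).or_default();
        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = queue.front() {
            if now.duration_since(oldest) >= self.window {
                queue.pop_front();
            } else {
                break;
            }
        }
        if queue.len() < self.limit {
            queue.push_back(now);
            true
        } else {
            false
        }
    }
}
```

With the 100 enqueues per minute threshold mentioned in Phase 2.1, the limiter would be constructed as `SlidingWindow::new(100, Duration::from_secs(60))`.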

Phase 7 — Platform Expansion & Research

Long-term vision for wide adoption.

  • 7.1 Mobile clients (iOS + Android)

    • Use C FFI (Phase 3.3) for crypto + transport (single library)
    • Push notifications via APNs / FCM (server sends notification on enqueue)
    • Background QUIC connection for message polling
    • Biometric auth for local key storage (Keychain / Android Keystore)
  • 7.2 Web client (browser)

    • Use WASM (Phase 3.4) for crypto
    • Use WebTransport (Phase 3.5) for native QUIC transport
    • Cap'n Proto via WASM bridge (Phase 3.6)
    • IndexedDB for local state persistence
    • Service Worker for background notifications
    • Progressive Web App (PWA) support
  • 7.3 Federation

    • Server-to-server protocol via Cap'n Proto RPC over QUIC (see federation.capnp)
    • relayEnqueue, proxyFetchKeyPackage, federationHealth methods
    • Identity resolution across federated servers
    • MLS group spanning multiple servers
    • Trust model for federated deployments
  • 7.4 Sealed Sender

    • Sender identity inside MLS ciphertext only (server can't see who sent)
    • Requires: sender certificate + encrypted sender proof
    • Ref: Signal's Sealed Sender design
  • 7.5 Additional language SDKs

    • Java/Kotlin: JNI bindings to C FFI (Phase 3.3) + native QUIC (netty-quic)
    • Swift: Swift wrapper over C FFI + Network.framework QUIC
    • Ruby: FFI bindings via quicproquo-ffi
    • Demand-driven: only build SDKs that people actually request
  • 7.6 P2P / NAT traversal

    • Direct peer-to-peer via iroh (foundation exists in quicproquo-p2p)
    • Server as fallback relay only
    • Reduces latency and single-point-of-failure
    • Ref: FUTURE-IMPROVEMENTS.md § 6.1
  • 7.7 Traffic analysis resistance

    • Padding messages to uniform size
    • Decoy traffic to mask timing patterns
    • Optional Tor/I2P routing for IP privacy
    • Ref: FUTURE-IMPROVEMENTS.md § 5.4, 6.3
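
For the identity-resolution part of federation (item 7.3), one possible shape is a function that classifies a recipient address as local or remote. The `user@domain` address syntax is an assumption made for this sketch, since the roadmap does not define the address format. `Destination` and `resolve` are hypothetical names and do not describe the project's actual routing code.

```rust
/// Where a message for a given recipient should go.
#[derive(Debug, PartialEq)]
pub enum Destination<'a> {
    /// Deliver into a local queue.
    Local { user: &'a str },
    /// Relay to the recipient's home server.
    Remote { user: &'a str, domain: &'a str },
}

/// Classify `address` relative to this server's `local_domain`.
///
/// A bare `user` is treated as local. `user@domain` is local when the
/// domain matches (ASCII case-insensitive) and remote otherwise.
/// Malformed addresses yield `None`.
pub fn resolve<'a>(address: &'a str, local_domain: &str) -> Option<Destination<'a>> {
    match address.split_once('@') {
        None if !address.is_empty() => Some(Destination::Local { user: address }),
        Some((user, domain))
            if !user.is_empty() && !domain.is_empty() && !domain.contains('@') =>
        {
            if domain.eq_ignore_ascii_case(local_domain) {
                Some(Destination::Local { user })
            } else {
                Some(Destination::Remote { user, domain })
            }
        }
        _ => None,
    }
}
```

An enqueue handler would match on the result: `Local` goes to the storage backend, `Remote` is handed to the federation client, and `None` is rejected as a bad request.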

Phase 8 — Freifunk / Community Mesh Networking

Make qpq a first-class citizen on decentralised, community-operated wireless networks (Freifunk, BATMAN-adv/Babel routing, OpenWrt). Multiple qpq nodes form a federated mesh; clients auto-discover nearby nodes via mDNS; the network functions without any central infrastructure or internet uplink.

Architecture

  Client A ─── mDNS discovery ──► nearby qpq node (LAN / mesh)
                                        │
                               Cap'n Proto federation
                                        │
                               remote qpq node (across mesh)
  • F0 — Re-include quicproquo-p2p in workspace; fix ALPN strings

    • Moved crates/quicproquo-p2p from exclude back into [workspace] members
    • Fixed ALPN b"quicnprotochat/p2p/1" → b"quicproquo/p2p/1" (breaking wire change)
    • Fixed federation ALPN b"qnpc-fed" → b"quicproquo/federation/1"
    • Feature-gated behind --features mesh on client (keeps iroh out of default builds)
  • F1 — Federation routing in message delivery

    • handle_enqueue and handle_batch_enqueue call federation::routing::resolve_destination()
    • Recipients with a remote home server are relayed via FederationClient::relay_enqueue()
    • mTLS mutual authentication between nodes (both present client certs, validated against shared CA)
    • Config: QPQ_FEDERATION_LISTEN, QPQ_LOCAL_DOMAIN, QPQ_FEDERATION_CERT/KEY/CA
  • F2 — mDNS local peer discovery

    • Server announces _quicproquo._udp.local. on startup via mdns-sd
    • Client: MeshDiscovery::start() browses for nearby nodes (feature-gated)
    • REPL commands: /mesh peers (scan + list), /mesh server <host:port> (note address)
    • Nodes announce: ver=1, server=<host:port>, domain=<local_domain> TXT records
  • F3 — Self-sovereign mesh identity

    • Keypair = identity; OPAQUE password auth becomes optional (opt-in for managed deployments)
    • --mesh startup mode: no AS required, nodes accept any verifiable keypair
    • Bootstrap trust via out-of-band key fingerprint exchange (QR code or short code)
  • F4 — Store-and-forward with TTL

    • Add ttl_secs: u32 to Envelope in node.capnp
    • Relay nodes hold messages for offline peers up to TTL, then discard
    • Gossip-style propagation: each hop decrements a hop counter
    • Enables asynchronous messaging across intermittently connected mesh segments
  • F5 — Lightweight broadcast channels

    • No MLS overhead; symmetric group key distributed out-of-band
    • Gossip delivery: node broadcasts to all peers, peers re-broadcast once
    • Loop prevention via bloom filter on seen message IDs
    • Suitable for community bulletin boards, emergency broadcasts on mesh
  • F6 — Extended /mesh REPL commands

    • /mesh dm <fingerprint> — direct message to peer by key fingerprint (P2P path)
    • /mesh broadcast <channel> — publish to a symmetric broadcast channel
    • /mesh auto — auto-select server with lowest RTT from discovered peers
    • Auto-reconnect: if current server unreachable, fall back to next discovered peer
  • F7 — OpenWrt cross-compilation guide

    • Musl static builds: x86_64-unknown-linux-musl, armv7-unknown-linux-musleabihf, mips-unknown-linux-musl
    • Strip binary: --release + strip → target size < 5 MB for flash storage
    • opkg package manifest for OpenWrt feed
    • procd init script + uci config file for OpenWrt integration
    • CI job: cross-compile and size-check on every release tag
  • F8 — Traffic analysis resistance for mesh

    • Uniform message padding to nearest 256-byte boundary (hides message size)
    • Configurable decoy traffic rate (fake messages to mask send timing)
    • Optional onion routing: 3-hop relay through other mesh nodes (no Tor dependency)
    • Ref: Phase 7.7 for server-side traffic analysis resistance
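
The uniform padding in F8 could look roughly like the following. The 256-byte block size comes from the item above. The layout (a 4-byte big-endian length prefix, the message, then zero fill) is an assumption for illustration and not a documented wire format. Padding of this kind only hides size if it is applied before encryption.

```rust
/// Block size from F8: pad to the nearest 256-byte boundary.
const BLOCK: usize = 256;
/// Assumed header: 4-byte big-endian length of the real message.
const HEADER: usize = 4;

/// Pad `msg` so the result length is a multiple of `BLOCK`.
/// Returns `None` if the message length does not fit in the header.
pub fn pad(msg: &[u8]) -> Option<Vec<u8>> {
    if msg.len() > u32::MAX as usize {
        return None;
    }
    // Round HEADER + msg.len() up to the next multiple of BLOCK.
    let total = (HEADER + msg.len() + BLOCK - 1) / BLOCK * BLOCK;
    let mut out = Vec::with_capacity(total);
    out.extend_from_slice(&(msg.len() as u32).to_be_bytes());
    out.extend_from_slice(msg);
    out.resize(total, 0);
    Some(out)
}

/// Recover the original message, or `None` if the input is malformed.
pub fn unpad(padded: &[u8]) -> Option<&[u8]> {
    if padded.len() < HEADER || padded.len() % BLOCK != 0 {
        return None;
    }
    let len = u32::from_be_bytes([padded[0], padded[1], padded[2], padded[3]]) as usize;
    // `get` returns None if the declared length exceeds the buffer.
    padded[HEADER..].get(..len)
}
```

With this layout any message of up to 252 bytes occupies exactly one 256-byte block, so short chat messages become indistinguishable by size.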

Phase 9 — Developer Experience & Community Growth

Features designed to attract contributors, create demo/showcase potential, and lower the barrier to entry for non-crypto developers.

  • 9.1 Criterion Benchmark Suite (qpq-bench)

    • Criterion benchmarks for all crypto primitives: hybrid KEM encap/decap, MLS group-add at 10/100/1000 members, epoch rotation, Noise_XX handshake
    • CI publishes HTML benchmark reports as GitHub Actions artifacts
    • Citable numbers: to our knowledge, no other project publishes MLS + PQ-KEM benchmarks in Rust
  • 9.2 Safety Numbers (key verification)

    • Derive a 60-digit numeric code from two identity keys (Signal-style)
    • REPL /verify <username> command for out-of-band key verification
    • Pure client-side — no server or wire format changes needed
  • 9.3 Full-Screen TUI (Ratatui + Crossterm)

    • qpq tui launches a full-screen terminal UI: message pane, input bar, channel sidebar with unread counts, MLS epoch indicator
    • Feature-gated --features tui to keep ratatui/crossterm out of default builds
    • Existing REPL and CLI subcommands are unaffected
  • 9.4 Delivery Proof Canary Tokens

    • Server signs Ed25519(SHA-256(message_id || recipient || timestamp)) on enqueue
    • Sender stores proof locally — cryptographic evidence the server queued the message
    • Cap'n Proto schema gains optional deliveryProof: Data on enqueue response
  • 9.5 Verifiable Transcript Archive

    • GroupMember::export_transcript(path, password) writes encrypted, tamper-evident message archive (CBOR records, Argon2id + ChaCha20-Poly1305, Merkle chain)
    • qpq export verify CLI command independently verifies chain integrity
    • Useful for legal discovery, audit, or personal backup
  • 9.6 Key Transparency (Merkle-Log Identity Binding)

    • Append-only Merkle log of (username, identity_key) bindings in the AS
    • Clients receive inclusion proofs alongside key fetches
    • Any client can independently audit the full identity history
    • Lightweight subset of RFC 9162 adapted for identity keys
  • 9.7 Dynamic Server Plugin System

    • Server loads .so/.dylib plugins at runtime from config [plugins] section
    • C-compatible HookVTable via extern "C" — plugins in any language
    • Ships with Rust reference plugin + Python ctypes example
    • Extends existing ServerHooks trait with dynamic dispatch
  • 9.8 PQ Noise Transport Layer

    • Hybrid Noise_XX + ML-KEM-768 handshake for post-quantum transport security
    • Closes the harvest-now-decrypt-later gap on handshake metadata (ADR-006)
    • Feature-gated --features pq-noise; classical Noise_XX default preserved
    • May require extending or forking snow crate's CryptoResolver
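
To illustrate the C-compatible vtable idea in item 9.7, the sketch below shows a `#[repr(C)]` struct of optional `extern "C"` function pointers and a host-side dispatcher. The field names, the return-code convention, and the single `on_enqueue` hook are assumptions. The real `HookVTable` in `quicproquo-plugin-api` may have a different layout, and dynamic loading is omitted entirely here.

```rust
use std::os::raw::c_int;

/// Assumed return codes for hooks in this sketch.
pub const HOOK_ALLOW: c_int = 0;
pub const HOOK_REJECT: c_int = 1;

/// Hypothetical vtable layout. `#[repr(C)]` fixes the field order so that
/// plugins written in other languages can fill it in.
#[repr(C)]
pub struct HookVTable {
    pub abi_version: u32,
    /// `None` (a null pointer on the C side) means "hook not implemented".
    pub on_enqueue: Option<extern "C" fn(payload: *const u8, len: usize) -> c_int>,
}

/// Example plugin hook: reject payloads larger than 1 KiB.
extern "C" fn reject_large(_payload: *const u8, len: usize) -> c_int {
    if len > 1024 {
        HOOK_REJECT
    } else {
        HOOK_ALLOW
    }
}

/// What a plugin's entry point might hand back to the server.
pub fn example_vtable() -> HookVTable {
    HookVTable { abi_version: 1, on_enqueue: Some(reject_large) }
}

/// Host-side dispatch: a missing hook allows the request.
pub fn run_enqueue_hook(vtable: &HookVTable, payload: &[u8]) -> bool {
    match vtable.on_enqueue {
        Some(hook) => hook(payload.as_ptr(), payload.len()) == HOOK_ALLOW,
        None => true,
    }
}
```

Wrapping each pointer in `Option` gives the C side a well-defined null representation, so a plugin can implement only the hooks it cares about. An `abi_version` field is a common way to let the host refuse plugins built against an incompatible layout.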

Summary Timeline

| Phase | Focus | Estimated Effort |
|-------|-------|------------------|
| 1 | Production Hardening | 1-2 days |
| 2 | Test & CI Maturity | 2-3 days |
| 3 | Client SDKs (Go, Python, WASM, FFI, WebTransport) | 5-8 days |
| 4 | Trust & Security Infrastructure | 2-4 days (excl. audit) |
| 5 | Features & UX | 5-7 days |
| 6 | Scale & Operations | 3-5 days |
| 7 | Platform Expansion & Research | ongoing |
| 8 | Freifunk / Community Mesh | ongoing |
| 9 | Developer Experience & Community Growth | 3-5 days |