quicproquo v2 — Design Analysis & Recommendations
Multi-perspective retrospective of the v1 architecture. Produced 2026-03-04 by four parallel analysis agents examining server, client/UX, crypto/security, and project structure/DX.
Executive Summary
quicproquo v1 demonstrates strong fundamentals: QUIC-native transport, RFC 9420 MLS group encryption, post-quantum hybrid KEM, OPAQUE zero-knowledge auth, and a working multi-language SDK surface. These are the right bets and put the project ahead of most open-source messengers on the crypto front.
However, three architectural choices limit the path to production:
- capnp-rpc is `!Send` — forces single-threaded RPC handling, blocking scalability.
- Monolithic client with global state — business logic is tangled into the REPL, duplicated across TUI/GUI/Web, and cannot be used as a library.
- Poll-based delivery — 1-second polling wastes bandwidth and adds latency; no server-push channel exists.
A v2 should keep the crypto stack (MLS + hybrid PQ KEM + OPAQUE), keep QUIC, but rearchitect the RPC layer, extract an SDK crate, and add push-based delivery.
Part 1 — What Works Well
Transport & Protocol
- QUIC (quinn) + TLS 1.3 — correct choice. Built-in encryption, connection migration, 0-RTT potential. No reason to change.
- Cap'n Proto schemas as API contract — zero-copy wire format, compact binary, schema evolution via ordinals. The schemas are good; the RPC runtime is the problem.
Cryptography
- MLS (RFC 9420, openmls) — only IETF-standard group E2E protocol. No realistic alternative for groups > 2 members. Test suite is thorough (1005 lines covering 2-party, 3-party, hybrid, removal, leave, stale epoch).
- Hybrid PQ KEM (X25519 + ML-KEM-768) — forward-thinking dual-algorithm protection. Well-implemented with versioned wire format, proper zeroization, and 12 targeted tests. Ahead of Signal (PQXDH, late 2023) and Matrix (no PQ).
- OPAQUE (RFC 9807, built on the RFC 9497 OPRF) — server never sees passwords. Ristretto255 + Argon2id is best-in-class.
- Sealed sender, safety numbers, message padding — all clean, simple, correct. Safety numbers use a 5200-iteration HMAC-SHA256 derivation, matching the iteration count of Signal's SHA-512-based safety numbers.
- Zeroization discipline — secrets wrapped in `Zeroizing`, `Debug` impls redact keys, no `.unwrap()` in crypto paths.
- WASM feature gating — `core/native` cleanly separates WASM-safe crypto from native-only modules (MLS, OPAQUE, filesystem).
Server Design
- Store trait abstraction — 30+ methods, clean backend swap (SqlStore vs FileBackedStore). Well-factored.
- OPAQUE auth with timing floors — `resolveUser`/`resolveIdentity` mask lookup timing to prevent username enumeration.
- Delivery proofs — Ed25519-signed receipt of server acceptance. Clients get cryptographic evidence.
- `wasNew` flag on `createChannel` — elegantly solves the dual-MLS-group race condition where both DM parties try to initialize.
- Plugin hooks (C-ABI) — `#![no_std]` vtable, zero dependencies, chained hooks with continue/reject protocol. Clean extensibility.
- Production config validation — enforces encrypted storage, strong auth tokens, pre-existing TLS certs.
Client & DX
- Zero-config local dev — `qpq --username alice --password pass` auto-starts server, generates TLS certs, registers, and logs in. Genuinely excellent.
- Encrypted-at-rest everything — state file (QPCE), conversation DB (SQLCipher), session cache. Argon2id + ChaCha20-Poly1305 throughout.
- Playbook system — YAML-scripted command execution with assertions. Great for CI/integration testing.
- Conversation store — SQLite with deduplication, outbox for offline queuing, activity tracking.
- Conventional commits, GPG-signed — consistent `feat:`/`fix:`/`docs:` discipline.
- Security lints enforced by build — `clippy::unwrap_used = "deny"`, `unsafe_code = "warn"`.
Part 2 — What Needs Rethinking
2.1 RPC Layer: capnp-rpc is the #1 Scalability Bottleneck
Problem: capnp-rpc uses Rc internally and is !Send. Everything runs on
a LocalSet with spawn_local. All 27 RPC methods serialize through a single
thread. No work-stealing, no multi-core utilization.
Impact: With 1000+ concurrent clients, the single-threaded executor cannot
keep up. A slow fetchWait (30s timeout) blocks the entire connection.
Also: The WebSocket bridge (ws_bridge.rs, 645 lines) exists solely because
browsers cannot open the raw QUIC connections that capnp-rpc runs over. It
duplicates handler logic and creates a maintenance burden.
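To make the `!Send` constraint concrete, here is a minimal std-only sketch. `ServerState`, `handle_request`, and `dispatch_parallel` are hypothetical names, and plain OS threads stand in for the worker threads of a multi-threaded Tokio runtime. Because the shared state sits behind `Arc` and atomics, the compile-time `assert_send_sync` check passes and requests can be spread across cores; an `Rc`-based handler, like those capnp-rpc produces, would fail that check and stay confined to one thread.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical shared server state. Every field is Send + Sync,
/// so an Arc<ServerState> can be handed to any worker thread.
pub struct ServerState {
    pub handled: AtomicU64,
}

/// Compiles only if T is Send + Sync. Swapping Arc for Rc
/// (as capnp-rpc does internally) turns the call below into a compile error.
fn assert_send_sync<T: Send + Sync>() {}

/// Hypothetical RPC handler: records that it ran and returns the payload size.
fn handle_request(state: &ServerState, payload: &[u8]) -> usize {
    state.handled.fetch_add(1, Ordering::Relaxed);
    payload.len()
}

/// Split requests across `workers` OS threads that all share one Arc<ServerState>.
/// Returns the total number of payload bytes handled.
pub fn dispatch_parallel(state: Arc<ServerState>, requests: Vec<Vec<u8>>, workers: usize) -> usize {
    assert_send_sync::<Arc<ServerState>>();
    let workers = workers.max(1);
    // Ceiling division, clamped to at least 1 because `chunks(0)` panics.
    let chunk_size = ((requests.len() + workers - 1) / workers).max(1);
    let handles: Vec<_> = requests
        .chunks(chunk_size)
        .map(|batch| {
            let state = Arc::clone(&state);
            let batch = batch.to_vec();
            thread::spawn(move || batch.iter().map(|p| handle_request(&state, p)).sum::<usize>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().expect("worker panicked")).sum()
}
```

Calling `dispatch_parallel(state, requests, 4)` spreads the batch over four threads; the shared `handled` counter shows that every request touched the same state.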
2.2 Client Architecture: Monolith with Global State
Problem: AUTH_CONTEXT is a process-wide RwLock<Option<ClientAuth>>.
Business logic (MLS processing, sealed sender, hybrid decryption, message
routing) lives inside repl.rs's poll_messages() — a 100-line function that
mixes transport, crypto, routing, and storage.
Impact: Every frontend (REPL, TUI, GUI, Web) must reimplement message processing. The TUI already duplicates it. The GUI stub and mobile PoC would need yet another copy. Client cannot be used as a library.
2.3 Delivery Model: Poll-Based, No Push Channel
Problem: Client polls every 1 second with fetch_wait(timeout_ms=0), so it never
actually long-polls. This causes constant network traffic even when idle and adds
up to ~1 second of latency to every message delivery.
Also: fetch is destructive (drains queue). If the client crashes between
receive and processing, messages are lost.
2.4 Connection Model: Single Stream
Problem: max_concurrent_bidi_streams(1) means the entire QUIC connection is
effectively single-stream. A blocking fetchWait prevents all other RPCs.
2.5 Storage: Single Mutex-Guarded SQLite Connection
Problem: SqlStore uses Mutex<Connection>. Every database operation
acquires a global lock. Under concurrent load, all storage access serializes.
Also: FileBackedStore flushes the entire map on every write (O(n) I/O).
Sessions are in-memory only — server restart forces all clients to re-login.
2.6 Key Management Gaps
- DiskKeyStore — HPKE private keys stored as plaintext bincode on disk. No encryption at rest.
- MLS group state — `GroupMember` holds `MlsGroup` in memory only. Process crash loses all group state.
- Token zeroization — `AuthContext.token` and `ClientAuth.access_token` are not wrapped in `Zeroizing`.
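The token gap can be illustrated with a std-only wrapper that does what `Zeroizing` provides: a redacted `Debug` impl and a volatile overwrite on drop. `SecretToken` is a hypothetical name. In the real codebase, wrapping the existing fields in the `zeroize` crate's `Zeroizing` is simpler and avoids the hand-written `unsafe` below, which the workspace's `unsafe_code` lint would flag.

```rust
use std::fmt;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical stand-in for `Zeroizing<String>`: redacts Debug and wipes on drop.
pub struct SecretToken(Vec<u8>);

impl SecretToken {
    pub fn new(token: &str) -> Self {
        // Exact-capacity copy, so no spare (unwiped) capacity is ever allocated.
        SecretToken(token.as_bytes().to_vec())
    }

    pub fn expose(&self) -> &[u8] {
        &self.0
    }

    /// Overwrite every byte with zero. Volatile writes plus a compiler fence
    /// keep the optimizer from removing them as dead stores.
    pub fn wipe(&mut self) {
        for byte in self.0.iter_mut() {
            // SAFETY: `byte` comes from a live, exclusive &mut u8, so the pointer is valid and aligned.
            unsafe { std::ptr::write_volatile(byte as *mut u8, 0) };
        }
        compiler_fence(Ordering::SeqCst);
    }
}

impl Drop for SecretToken {
    fn drop(&mut self) {
        self.wipe();
    }
}

impl fmt::Debug for SecretToken {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Never print the secret itself, matching the existing Debug redaction discipline.
        f.write_str("SecretToken([REDACTED])")
    }
}
```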
2.7 Workspace Bloat
12 crates for a project at this maturity is excessive. Several are thin stubs
(quicproquo-gen, quicproquo-bot at 354 lines) or broken (quicproquo-gui
fails cargo build --workspace).
Part 3 — v2 Architecture Recommendations
3.1 Replace capnp-rpc with a Send-Compatible RPC Framework
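Recommendation: Switch to tonic (gRPC) or a custom framing layer. Note that tonic carries gRPC over HTTP/2 on TCP, so adopting it as-is would move the RPC path off QUIC; the custom-framing alternative keeps QUIC.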
| Dimension | capnp-rpc (v1) | tonic/gRPC (v2) |
|---|---|---|
| Threading | !Send, single-threaded | Send + Sync, multi-threaded |
| Browser | Requires WS bridge | grpc-web via a tonic-web layer |
| Streaming | Not supported | Built-in |
| Middleware | None (copy-paste auth) | Interceptors/layers |
| Ecosystem | Niche | Massive (every language) |
Alternative: Keep Cap'n Proto schemas for serialization (zero-copy
advantage) but replace capnp-rpc with custom framing over QUIC streams. This
preserves the wire format while gaining Send compatibility.
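A minimal sketch of what "custom framing over QUIC streams" could look like, assuming a hypothetical 10-byte header (method ID, request ID, payload length, all big-endian) in front of a Cap'n Proto-encoded payload. It works over any `Read`/`Write` pair, so the same code would run over a QUIC stream or an in-memory buffer in tests. The layout, the `MAX_PAYLOAD` value, and all names are illustrative assumptions, not an existing quicproquo format.

```rust
use std::io::{self, Read, Write};

/// Hypothetical frame: [method_id: u16][request_id: u32][len: u32][payload], big-endian.
/// The payload would be a serialized Cap'n Proto message.
#[derive(Debug, PartialEq)]
pub struct Frame {
    pub method_id: u16,
    pub request_id: u32,
    pub payload: Vec<u8>,
}

/// Cap on payload size so a peer cannot force a huge allocation (arbitrary 1 MiB here).
pub const MAX_PAYLOAD: u32 = 1 << 20;

pub fn write_frame<W: Write>(w: &mut W, frame: &Frame) -> io::Result<()> {
    if frame.payload.len() > MAX_PAYLOAD as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    let len = frame.payload.len() as u32; // safe: checked against MAX_PAYLOAD above
    w.write_all(&frame.method_id.to_be_bytes())?;
    w.write_all(&frame.request_id.to_be_bytes())?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(&frame.payload)
}

pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Frame> {
    let mut header = [0u8; 10];
    r.read_exact(&mut header)?;
    let method_id = u16::from_be_bytes([header[0], header[1]]);
    let request_id = u32::from_be_bytes([header[2], header[3], header[4], header[5]]);
    let len = u32::from_be_bytes([header[6], header[7], header[8], header[9]]);
    // Validate the announced length before allocating.
    if len > MAX_PAYLOAD {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds MAX_PAYLOAD"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok(Frame { method_id, request_id, payload })
}
```

The length check on read matters: without a cap, a peer could announce a payload of almost 4 GiB and force a matching allocation. With one RPC per QUIC stream, `request_id` is redundant, but it allows several calls to be pipelined on a single stream.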
The WS bridge would be eliminated entirely — grpc-web or WebTransport give browsers direct access.
3.2 Extract an SDK Crate (Most Important Client Change)
Create quicproquo-sdk that owns all business logic:
```
quicproquo-sdk/
  src/
    client.rs        -- QpqClient: connect, login, send, receive
    events.rs        -- ClientEvent enum (push-based)
    conversation.rs  -- ConversationHandle, group management
    crypto.rs        -- MLS pipeline, sealed sender, hybrid decryption
    sync.rs          -- message sync, offline queue, retry
```
All frontends become thin shells:
```
CLI/REPL   -> calls sdk
TUI        -> calls sdk
Tauri GUI  -> calls sdk (via Tauri commands)
Mobile     -> calls sdk (via C FFI)
Web/WASM   -> calls sdk (compiled to wasm32)
```
Key API shape:
```rust
// API shape only: method bodies, the error type, and the event channel type are not specified here.
pub struct QpqClient { /* session, rpc, crypto pipeline */ }

impl QpqClient {
    pub async fn connect(config: ClientConfig) -> Result<Self>;
    // Consumes the connected client and returns it authenticated.
    pub async fn login(self, username: &str, password: &str) -> Result<Self>;
    pub async fn dm(&mut self, username: &str) -> Result<ConversationHandle>;
    pub async fn create_group(&mut self, name: &str) -> Result<ConversationHandle>;
    pub async fn send(&mut self, text: &str) -> Result<MessageId>;
    pub fn subscribe(&self) -> Receiver<ClientEvent>;
}
```
No global state. No AUTH_CONTEXT. Auth context is per-QpqClient instance.
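A std-only sketch of what per-instance auth looks like in practice, assuming hypothetical, heavily simplified `QpqClient` and `ClientAuth` types (no networking, no OPAQUE; the token is a plain `String` here, where the real type would use `Zeroizing`). Because nothing is process-global, two clients can be logged in as different users in the same process, which is also what lets tests run in parallel instead of under `--test-threads 1`.

```rust
/// Hypothetical per-client auth state, replacing the process-wide AUTH_CONTEXT.
#[derive(Debug, Clone)]
pub struct ClientAuth {
    pub username: String,
    pub access_token: String, // would be Zeroizing<String> in real code
}

/// Hypothetical, simplified client: all session state lives on the instance.
#[derive(Debug, Default)]
pub struct QpqClient {
    auth: Option<ClientAuth>,
}

impl QpqClient {
    pub fn new() -> Self {
        Self::default()
    }

    /// Consumes the unauthenticated client and returns an authenticated one.
    /// A real implementation would run the OPAQUE exchange here; this sketch only stores a token.
    pub fn login(mut self, username: &str, token: &str) -> Self {
        self.auth = Some(ClientAuth {
            username: username.to_owned(),
            access_token: token.to_owned(),
        });
        self
    }

    /// The identity this particular instance is logged in as, if any.
    pub fn whoami(&self) -> Option<&str> {
        self.auth.as_ref().map(|a| a.username.as_str())
    }
}
```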
3.3 Add Push-Based Delivery
Recommendation: Dedicated QUIC unidirectional stream for server-push notifications.
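```
Client opens bidi stream 0  -> RPC channel (request/response)
Server opens uni stream 3   -> push notifications (new message, typing, etc.)
```
(In QUIC, stream ID 3 is the first server-initiated unidirectional stream; ID 1 would be server-initiated bidirectional, per RFC 9000 Section 2.1.)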
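The stream numbers in the layout above are QUIC stream IDs, whose two low bits encode which endpoint opened the stream and whether it is unidirectional (RFC 9000, Section 2.1). This sketch computes them; `Initiator`, `Dir`, and `stream_id` are hypothetical helpers. In practice quinn assigns IDs itself, and the push stream would simply be the result of the server's first `open_uni()` call.

```rust
/// Which endpoint opened the stream (bit 0x1 of the stream ID).
#[derive(Clone, Copy)]
pub enum Initiator {
    Client = 0,
    Server = 1,
}

/// Stream direction (bit 0x2 of the stream ID).
#[derive(Clone, Copy)]
pub enum Dir {
    Bidi = 0,
    Uni = 2,
}

/// ID of the n-th (0-based) stream of a given type: the low two bits
/// encode the type, the remaining bits count streams of that type.
pub fn stream_id(n: u64, initiator: Initiator, dir: Dir) -> u64 {
    (n << 2) | initiator as u64 | dir as u64
}
```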
Benefits:
- Delivery latency bounded by network transit, not a polling interval
- No idle polling traffic (only QUIC keep-alives, if enabled)
- Typing indicators delivered in real-time
- Graceful degradation: fall back to long-poll if push stream fails
Also: Make peek + ack the default delivery pattern (not destructive
fetch). Add idempotency keys to prevent duplicate messages on retry.
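A minimal in-memory sketch of peek + ack with idempotency keys, assuming a hypothetical `DeliveryQueue` type. `peek` never removes anything, messages leave the queue only when the client acknowledges a sequence number, and a retried send carrying an already-seen key is dropped. A real server would persist this state in the store and expire old idempotency keys.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical per-recipient delivery queue with peek + ack semantics.
#[derive(Default)]
pub struct DeliveryQueue {
    next_seq: u64,
    pending: BTreeMap<u64, Vec<u8>>, // seq -> ciphertext, kept until acked
    seen_keys: HashSet<String>,      // idempotency keys already accepted (unbounded in this sketch)
}

impl DeliveryQueue {
    /// Enqueue a message unless its idempotency key was already accepted (a sender retry).
    /// Returns the assigned sequence number, or None for a duplicate.
    pub fn enqueue(&mut self, idempotency_key: &str, payload: Vec<u8>) -> Option<u64> {
        if !self.seen_keys.insert(idempotency_key.to_owned()) {
            return None;
        }
        let seq = self.next_seq;
        self.next_seq += 1;
        self.pending.insert(seq, payload);
        Some(seq)
    }

    /// Non-destructive read: up to `limit` pending messages in sequence order.
    pub fn peek(&self, limit: usize) -> Vec<(u64, &[u8])> {
        self.pending
            .iter()
            .take(limit)
            .map(|(seq, p)| (*seq, p.as_slice()))
            .collect()
    }

    /// The client confirms it durably processed everything up to and including `seq`.
    pub fn ack_through(&mut self, seq: u64) {
        self.pending.retain(|&s, _| s > seq);
    }
}
```

If the client crashes after `peek` but before `ack_through`, the messages are still pending and are simply returned again on reconnect.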
3.4 Multi-Stream Connections
Allow 4-8 concurrent bidirectional QUIC streams per connection. This enables:
- Pipelined RPCs (send while fetching)
- Concurrent blob upload + chat
- `fetchWait` on one stream without blocking others
3.5 Storage Improvements
| Change | Rationale |
|---|---|
| Drop FileBackedStore | O(n) flush per write, no federation support |
| Connection pool for SQLite | Replace Mutex<Connection> with r2d2/deadpool |
| Persist sessions to DB | Server restart shouldn't force re-login |
| Encrypt DiskKeyStore at rest | HPKE private keys in plaintext is a real vuln |
| Persist MLS group state | Process crash shouldn't lose group state |
| Atomic keystore writes | tempfile-then-rename pattern |
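The "atomic keystore writes" row refers to the write-to-temp-then-rename pattern. This std-only sketch assumes a fixed `.tmp` suffix and a single writer; if concurrent writers are possible, the `tempfile` crate's `NamedTempFile::persist` generates unique temp names. The data is fsynced before the rename, the rename replaces the old file in one step, and on Unix the parent directory is fsynced so the rename itself survives a crash. Encrypting the HPKE keys would happen before this call and is not shown.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `data` to `path` so readers see either the old file or the complete new one,
/// never a partially written keystore.
pub fn atomic_write(path: &Path, data: &[u8]) -> io::Result<()> {
    // `parent()` of a bare file name is Some(""), so fall back to the current directory.
    let dir = match path.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    // The temp file lives in the same directory (same filesystem) so the rename is atomic.
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = dir.join(tmp_name);

    {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(data)?;
        tmp.sync_all()?; // contents reach disk before the rename makes them visible
    }
    fs::rename(&tmp_path, path)?;

    // On Unix, fsync the directory so the new directory entry is durable too.
    #[cfg(unix)]
    File::open(dir)?.sync_all()?;
    Ok(())
}
```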
3.6 Crypto Stack Refinements
The algorithms are correct. The refinements are operational:
| Change | Rationale |
|---|---|
| Typed MLS error variants | Stop losing error info via format!("{e:?}") |
| Formalize hybrid PQ ciphersuite ID | Replace length-based key detection |
| Remove all InsecureServerCertVerifier | No TLS bypass on any platform |
| Add passkey/WebAuthn alt-auth | Better UX for GUI/mobile, no password to forget |
| Consider Double Ratchet for 1:1 DMs | MLS is over-engineered for 2-party; DR gives better per-message forward secrecy |
| Token/session secret zeroization | AuthContext.token et al. need Zeroizing wrappers |
| Fix serde deserialization of secrets | Intermediate non-zeroized Vec<u8> in IdentityKeypair::deserialize |
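For the "formalize hybrid PQ ciphersuite ID" row, here is a sketch of parsing keys by an explicit suite identifier instead of inferring the algorithm from the key length. The one-byte IDs and the `KemSuite`/`parse_public_key` names are hypothetical; the sizes are the standard ones (32-byte X25519 public key, 1184-byte ML-KEM-768 encapsulation key). Length is still checked, but only to validate the declared suite, never to pick one.

```rust
/// Hypothetical one-byte suite identifiers prefixed to each serialized public key.
/// The values are illustrative, not registered codepoints.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum KemSuite {
    X25519 = 0x01,
    X25519MlKem768 = 0x02,
}

const X25519_PK_LEN: usize = 32;
const MLKEM768_EK_LEN: usize = 1184; // ML-KEM-768 encapsulation key size (FIPS 203)

impl KemSuite {
    fn from_byte(b: u8) -> Option<Self> {
        match b {
            0x01 => Some(KemSuite::X25519),
            0x02 => Some(KemSuite::X25519MlKem768),
            _ => None,
        }
    }

    fn key_len(self) -> usize {
        match self {
            KemSuite::X25519 => X25519_PK_LEN,
            KemSuite::X25519MlKem768 => X25519_PK_LEN + MLKEM768_EK_LEN,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum KeyParseError {
    Empty,
    UnknownSuite(u8),
    BadLength { suite: KemSuite, got: usize },
}

/// Parse `[suite_id][key bytes]`. The suite comes from the explicit ID;
/// the length only validates it.
pub fn parse_public_key(blob: &[u8]) -> Result<(KemSuite, &[u8]), KeyParseError> {
    let (&id, key) = blob.split_first().ok_or(KeyParseError::Empty)?;
    let suite = KemSuite::from_byte(id).ok_or(KeyParseError::UnknownSuite(id))?;
    if key.len() != suite.key_len() {
        return Err(KeyParseError::BadLength { suite, got: key.len() });
    }
    Ok((suite, key))
}
```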
3.7 Workspace Restructuring
Reduce from 12 to 8 crates:
```
quicproquo-core        -- crypto primitives (keep)
quicproquo-proto       -- schema codegen (keep)
quicproquo-plugin-api  -- #![no_std] C-ABI (keep)
quicproquo-kt          -- key transparency (keep)
quicproquo-sdk         -- NEW: business logic library
quicproquo-server      -- server binary (keep)
quicproquo-client      -- CLI/TUI binary, depends on sdk (keep, slimmed)
quicproquo-p2p         -- mesh networking (keep, feature-flagged)
```
Merge/remove:
- `bot` -> `sdk::bot` module
- `ffi` -> `sdk` with `--features c-ffi`
- `gen` -> `scripts/` or `xtask`
- `gui` -> `apps/gui/` outside the workspace (Tauri project)
- `mobile` -> `examples/` (research spike)
Set `default-members` in the `[workspace]` table so a bare `cargo build` does not attempt the GUI (note that `cargo build --workspace` still builds every member).
Add a `justfile` with `build`, `test`, `test-e2e`, `build-wasm`, and `docker` recipes.
3.8 Plugin System Evolution
| Change | Rationale |
|---|---|
| Add version: u32 to HookVTable | ABI stability: check version on load |
| Config passthrough | qpq_plugin_init(vtable, config_json) |
| Async hooks | Plugins that call external services shouldn't block Tokio |
| Evaluate WASM plugins | Sandboxed community plugins (keep C-ABI for first-party) |
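Here is a sketch of the versioned vtable check from the first row, under stated assumptions: this `HookVTable` has a single hypothetical `on_message` hook, `HOST_ABI_VERSION` is an illustrative value, and the plugin's returned pointer has already been turned into an `Option<&HookVTable>` (in FFI code, typically `unsafe { ptr.as_ref() }`). Putting `version` first in a `#[repr(C)]` struct lets the host read it before trusting any other field.

```rust
use std::os::raw::c_int;

/// ABI version the host was built against (hypothetical value).
pub const HOST_ABI_VERSION: u32 = 2;

/// Hypothetical result codes for the continue/reject hook protocol.
pub const HOOK_CONTINUE: c_int = 0;
pub const HOOK_REJECT: c_int = 1;

/// `#[repr(C)]` fixes field order and layout across the C ABI.
/// `version` sits at offset 0 so it can be checked before anything else is used.
#[repr(C)]
pub struct HookVTable {
    pub version: u32,
    pub on_message: Option<extern "C" fn(len: usize) -> c_int>,
}

#[derive(Debug, PartialEq)]
pub enum LoadError {
    Null,
    VersionMismatch { plugin: u32, host: u32 },
}

/// Validate the vtable a plugin's init function returned before calling any hook.
pub fn check_vtable(vt: Option<&HookVTable>) -> Result<&HookVTable, LoadError> {
    let vt = vt.ok_or(LoadError::Null)?;
    if vt.version != HOST_ABI_VERSION {
        return Err(LoadError::VersionMismatch { plugin: vt.version, host: HOST_ABI_VERSION });
    }
    Ok(vt)
}
```

Strict equality is the simplest policy; a host could instead accept a range of versions it knows to be compatible.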
3.9 Federation Improvements
| Change | Rationale |
|---|---|
| DNS SRV / .well-known discovery | Static peer config doesn't scale |
| Persistent relay queue with retry | Messages to offline peers are currently lost |
| Deterministic channel ID derivation | Avoid cross-server channel conflicts |
| Keep mDNS as optional mesh feature | Not for internet-scale, but good for LAN |
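For the persistent relay queue, here is a sketch of the retry-scheduling part, assuming a hypothetical `RelayEntry` record and capped exponential backoff (1 s base and 5 min cap, chosen arbitrarily). Persisting entries in the store and giving up after `max_attempts` are assumptions, not something the current design specifies.

```rust
use std::time::Duration;

/// Hypothetical relay-queue entry for a message bound to a federated peer.
#[derive(Debug, Clone)]
pub struct RelayEntry {
    pub peer: String,
    pub payload: Vec<u8>,
    pub attempts: u32,
}

/// Capped exponential backoff: base * 2^attempts, never above `max`.
pub fn backoff(attempts: u32, base: Duration, max: Duration) -> Duration {
    // checked_shl fails for shifts >= 32; treat that as "very large".
    let factor = 1u32.checked_shl(attempts).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

impl RelayEntry {
    /// Record a failed delivery. Returns the delay before the next attempt,
    /// or None once `max_attempts` is reached.
    pub fn on_failure(&mut self, max_attempts: u32) -> Option<Duration> {
        self.attempts = self.attempts.saturating_add(1);
        if self.attempts >= max_attempts {
            return None;
        }
        Some(backoff(self.attempts - 1, Duration::from_secs(1), Duration::from_secs(300)))
    }
}
```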
3.10 Test & CI Improvements
| Change | Rationale |
|---|---|
| Per-client auth context | Removes --test-threads 1 constraint |
| Mock server for client unit tests | Fast tests without spawning real server |
| Fuzz testing (cargo-fuzz) | Hybrid KEM, sealed sender, padding, Cap'n Proto deser |
| WS bridge unit tests | 645 lines, zero tests, security-critical |
| WASM + Go SDK in CI | Currently untested in CI |
| Separate E2E from unit test CI job | Different speed, different failure modes |
| macOS CI | FFI/mobile cross-compilation validation |
| Release automation | Binary artifacts, Docker tags, WASM npm publish |
Part 4 — Ecosystem Positioning
Don't compete with Signal or Matrix directly.
Target: Privacy-first messaging infrastructure for developers and organizations.
quicproquo's differentiators — QUIC-native transport, post-quantum crypto, MLS, plugin system, multi-language SDKs, embeddable architecture — point toward an infrastructure play, not a consumer app.
Think: "the Postgres of E2E encrypted messaging" — a high-quality open-source server and protocol that other projects build on.
| Segment | Value Proposition |
|---|---|
| Developer tool | API-first messenger for encrypted bots and integrations |
| Embeddable | C FFI + WASM + Go SDK for embedding in other apps |
| Enterprise | On-prem, plugins for compliance/audit, OPAQUE zero-knowledge auth |
| Research | Post-quantum crypto, MLS reference implementation, mesh networking |
Part 5 — Priority Ordering
Phase 1: Foundation (unblocks everything else)
- Replace capnp-rpc with Send-compatible framework
- Extract SDK crate from client
- Per-client auth context (no global state)
Phase 2: Reliability
- Push-based delivery (QUIC uni-stream)
- Multi-stream connections
- Persist sessions + MLS group state
- Encrypt DiskKeyStore at rest
- peek+ack as default delivery
Phase 3: Polish
- Workspace restructuring (12 -> 8 crates)
- TUI as primary interactive mode (built on SDK)
- Plugin system v2 (versioning, config, async)
- Federation retry queue + discovery
Phase 4: Ecosystem
- Full MLS in WASM (browser E2E)
- WebTransport (eliminate WS bridge)
- Tauri GUI (built on SDK)
- Release automation + expanded CI
Appendix — Analysis Sources
This document was produced by four parallel analysis agents:
| Agent | Scope | Files Read |
|---|---|---|
| server-analyst | Transport, RPC, delivery, storage, federation | 27 server .rs files, 4 schemas, core transport |
| client-analyst | REPL, UX, state, multi-platform, SDK design | All client .rs, GUI, mobile, TS demo |
| security-analyst | MLS, OPAQUE, hybrid KEM, keystore, identity | All core .rs, review doc |
| dx-analyst | Workspace, build, tests, plugins, CI, ecosystem | All Cargo.toml, tests, CI, plugins, SDKs |