# quicproquo v2: Design Analysis & Recommendations

> Multi-perspective retrospective of the v1 architecture.
> Produced 2026-03-04 by four parallel analysis agents examining server,
> client/UX, crypto/security, and project structure/DX.

---

## Executive Summary

quicproquo v1 demonstrates strong fundamentals: QUIC-native transport, RFC 9420 MLS group encryption, post-quantum hybrid KEM, OPAQUE zero-knowledge auth, and a working multi-language SDK surface. These are the right bets and put the project ahead of most open-source messengers on the crypto front.

However, three architectural choices limit the path to production:

1. **capnp-rpc is `!Send`**: forces single-threaded RPC handling, blocking scalability.
2. **Monolithic client with global state**: business logic is tangled into the REPL, duplicated across TUI/GUI/Web, and cannot be used as a library.
3. **Poll-based delivery**: 1-second polling wastes bandwidth and adds latency; no server-push channel exists.

A v2 should keep the crypto stack (MLS + hybrid PQ KEM + OPAQUE), keep QUIC, but rearchitect the RPC layer, extract an SDK crate, and add push-based delivery.

---

## Part 1: What Works Well

### Transport & Protocol

- **QUIC (quinn) + TLS 1.3**: correct choice. Built-in encryption, connection migration, 0-RTT potential. No reason to change.
- **Cap'n Proto schemas as API contract**: zero-copy wire format, compact binary, schema evolution via ordinals. The *schemas* are good; the *RPC runtime* is the problem.

### Cryptography

- **MLS (RFC 9420, openmls)**: only IETF-standard group E2E protocol. No realistic alternative for groups > 2 members. Test suite is thorough (1005 lines covering 2-party, 3-party, hybrid, removal, leave, stale epoch).
- **Hybrid PQ KEM (X25519 + ML-KEM-768)**: forward-thinking dual-algorithm protection. Well-implemented with versioned wire format, proper zeroization, and 12 targeted tests. Ahead of Signal (PQXDH, late 2023) and Matrix (no PQ).
- **OPAQUE (RFC 9807)**: server never sees passwords. Ristretto255 + Argon2id is best-in-class.
- **Sealed sender, safety numbers, message padding**: all clean, simple, correct. Safety-number derivation uses 5200 HMAC-SHA256 iterations, matching Signal's iteration count.
- **Zeroization discipline**: secrets wrapped in `Zeroizing`, Debug impls redact keys, no `.unwrap()` in crypto paths.
- **WASM feature gating**: `core/native` cleanly separates WASM-safe crypto from native-only modules (MLS, OPAQUE, filesystem).

### Server Design

- **Store trait abstraction**: 30+ methods, clean backend swap (SqlStore vs FileBackedStore). Well-factored.
- **OPAQUE auth with timing floors**: `resolveUser`/`resolveIdentity` mask lookup timing to prevent username enumeration.
- **Delivery proofs**: Ed25519-signed receipt of server acceptance. Clients get cryptographic evidence.
- **`wasNew` flag on createChannel**: elegantly solves the dual-MLS-group race condition where both DM parties try to initialize.
- **Plugin hooks (C-ABI)**: `#![no_std]` vtable, zero dependencies, chained hooks with continue/reject protocol. Clean extensibility.
- **Production config validation**: enforces encrypted storage, strong auth tokens, pre-existing TLS certs.

### Client & DX

- **Zero-config local dev**: `qpq --username alice --password pass` auto-starts the server, generates TLS certs, registers, and logs in. Genuinely excellent.
- **Encrypted-at-rest everything**: state file (QPCE), conversation DB (SQLCipher), session cache. Argon2id + ChaCha20-Poly1305 throughout.
- **Playbook system**: YAML-scripted command execution with assertions. Great for CI/integration testing.
- **Conversation store**: SQLite with deduplication, outbox for offline queuing, activity tracking.
- **Conventional commits, GPG-signed**: consistent `feat:`/`fix:`/`docs:` discipline.
- **Security lints enforced by build**: `clippy::unwrap_used = "deny"`, `unsafe_code = "warn"`.
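
The timing floors noted under Server Design can be illustrated with a minimal sketch. It is not the server's actual code: `with_timing_floor`, the 20 ms floor, and the slice-based `resolve_user` are hypothetical, and the sketch blocks the thread for brevity where an async server would await a timer instead. The idea is that a lookup always takes at least the floor duration, so a hit and a miss are harder to tell apart by timing alone.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs `op`, then sleeps until at least `floor` has elapsed since the call
/// began, so fast and slow paths look alike to a remote observer.
pub fn with_timing_floor<T>(floor: Duration, op: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = op();
    // `checked_sub` returns None when the operation already took longer
    // than the floor, in which case no extra delay is added.
    if let Some(remaining) = floor.checked_sub(start.elapsed()) {
        thread::sleep(remaining);
    }
    result
}

/// Hypothetical stand-in for a user lookup: returns the index of `username`
/// in `known`, taking at least 20 ms whether or not the user exists.
pub fn resolve_user(known: &[&str], username: &str) -> Option<usize> {
    with_timing_floor(Duration::from_millis(20), || {
        known.iter().position(|u| *u == username)
    })
}
```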
---

## Part 2: What Needs Rethinking

### 2.1 RPC Layer: capnp-rpc is the #1 Scalability Bottleneck

**Problem:** `capnp-rpc` uses `Rc` internally and is `!Send`. Everything runs on a `LocalSet` with `spawn_local`. All 27 RPC methods serialize through a single thread. No work-stealing, no multi-core utilization.

**Impact:** With 1000+ concurrent clients, the single-threaded executor cannot keep up. A slow `fetchWait` (30s timeout) blocks the entire connection.

**Also:** The WebSocket bridge (`ws_bridge.rs`, 645 lines) exists solely because Cap'n Proto cannot run in browsers. This duplicates handler logic and creates a maintenance burden.

### 2.2 Client Architecture: Monolith with Global State

**Problem:** `AUTH_CONTEXT` is a process-wide `RwLock`-guarded global. Business logic (MLS processing, sealed sender, hybrid decryption, message routing) lives inside `poll_messages()` in `repl.rs`, a 100-line function that mixes transport, crypto, routing, and storage.

**Impact:** Every frontend (REPL, TUI, GUI, Web) must reimplement message processing. The TUI already duplicates it. The GUI stub and mobile PoC would need yet another copy. The client cannot be used as a library.

### 2.3 Delivery Model: Poll-Based, No Push Channel

**Problem:** The client polls every second with `fetch_wait(timeout_ms=0)`, so it never actually long-polls. This generates constant network traffic even when idle and adds up to about 1 second of delivery latency.

**Also:** `fetch` is destructive (it drains the queue). If the client crashes between receiving and processing, messages are lost.

### 2.4 Connection Model: Single Stream

**Problem:** `max_concurrent_bidi_streams(1)` means the entire QUIC connection is effectively single-stream. A blocking `fetchWait` prevents all other RPCs.

### 2.5 Storage: Single Mutex-Guarded SQLite Connection

**Problem:** `SqlStore` holds its SQLite connection behind a single `Mutex`. Every database operation acquires this global lock, so under concurrent load all storage access serializes.

**Also:** `FileBackedStore` flushes the entire map on every write (O(n) I/O).
Sessions are in-memory only, so a server restart forces all clients to re-login.

### 2.6 Key Management Gaps

- **DiskKeyStore**: HPKE private keys stored as plaintext bincode on disk. No encryption at rest.
- **MLS group state**: `GroupMember` holds `MlsGroup` in memory only. A process crash loses all group state.
- **Token zeroization**: `AuthContext.token` and `ClientAuth.access_token` are not wrapped in `Zeroizing`.

### 2.7 Workspace Bloat

12 crates for a project at this maturity is excessive. Several are thin stubs (`quicproquo-gen`, `quicproquo-bot` at 354 lines) or broken (`quicproquo-gui` fails `cargo build --workspace`).

---

## Part 3: v2 Architecture Recommendations

### 3.1 Replace capnp-rpc with a Send-Compatible RPC Framework

**Recommendation:** Switch to **tonic (gRPC)** or a custom framing layer.

| Dimension | capnp-rpc (v1) | tonic/gRPC (v2) |
|-----------|----------------|-----------------|
| Threading | `!Send`, single-threaded | `Send + Sync`, multi-threaded |
| Browser | Requires WS bridge | grpc-web native |
| Streaming | Not supported | Built-in |
| Middleware | None (copy-paste auth) | Interceptors/layers |
| Ecosystem | Niche | Massive (every language) |

Note that tonic speaks HTTP/2 over TCP by default, so adopting it while keeping QUIC would require an HTTP/3 transport layer. The alternative below avoids that tension.

**Alternative:** Keep Cap'n Proto *schemas* for serialization (zero-copy advantage) but replace capnp-rpc with custom framing over QUIC streams. This preserves the wire format while gaining `Send` compatibility.

The WS bridge would be eliminated entirely: grpc-web or WebTransport gives browsers direct access.
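
The custom-framing alternative in 3.1 could look roughly like the following minimal sketch. It assumes a hypothetical frame layout (a 4-byte big-endian payload length, a 2-byte method ID, then an opaque payload such as a serialized Cap'n Proto message) and uses the blocking `std::io` traits for brevity; a real implementation over quinn streams would use their async read and write APIs instead. `Frame`, `MAX_FRAME_LEN`, `write_frame`, and `read_frame` are illustrative names, not part of the v1 codebase.

```rust
use std::io::{self, Read, Write};

/// Hypothetical upper bound on payload size, checked before allocating.
const MAX_FRAME_LEN: u32 = 1 << 20;

/// One RPC frame: a method identifier plus an opaque payload
/// (for example, a serialized Cap'n Proto message).
#[derive(Debug, PartialEq, Eq)]
pub struct Frame {
    pub method_id: u16,
    pub payload: Vec<u8>,
}

/// Writes `[u32 payload length][u16 method id][payload]`, integers big-endian.
pub fn write_frame<W: Write>(w: &mut W, frame: &Frame) -> io::Result<()> {
    if frame.payload.len() > MAX_FRAME_LEN as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    let len = frame.payload.len() as u32;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(&frame.method_id.to_be_bytes())?;
    w.write_all(&frame.payload)
}

/// Reads one frame, rejecting declared lengths above `MAX_FRAME_LEN`.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Frame> {
    let mut header = [0u8; 6];
    r.read_exact(&mut header)?;
    let len = u32::from_be_bytes([header[0], header[1], header[2], header[3]]);
    let method_id = u16::from_be_bytes([header[4], header[5]]);
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame exceeds MAX_FRAME_LEN"));
    }
    let mut payload = vec![0u8; len as usize];
    r.read_exact(&mut payload)?;
    Ok(Frame { method_id, payload })
}
```

Because `Frame` owns plain bytes and holds no `Rc`, it is `Send`, so decoded requests could be handed to tasks on a multi-threaded executor.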
### 3.2 Extract an SDK Crate (Most Important Client Change)

Create `quicproquo-sdk` that owns all business logic:

```
quicproquo-sdk/
  src/
    client.rs        -- QpqClient: connect, login, send, receive
    events.rs        -- ClientEvent enum (push-based)
    conversation.rs  -- ConversationHandle, group management
    crypto.rs        -- MLS pipeline, sealed sender, hybrid decryption
    sync.rs          -- message sync, offline queue, retry
```

All frontends become thin shells:

```
CLI/REPL  -> calls sdk
TUI       -> calls sdk
Tauri GUI -> calls sdk (via Tauri commands)
Mobile    -> calls sdk (via C FFI)
Web/WASM  -> calls sdk (compiled to wasm32)
```

**Key API shape:**

```rust
// API sketch: bodies are elided with todo!(). `Result<T>` stands for a
// crate-level alias, and the return types shown are assumptions.
pub struct QpqClient { /* session, rpc, crypto pipeline */ }

impl QpqClient {
    pub async fn connect(config: ClientConfig) -> Result<Self> { todo!() }
    // `&mut self` is assumed here, since auth state lives on the instance.
    pub async fn login(&mut self, username: &str, password: &str) -> Result<()> { todo!() }
    pub async fn dm(&mut self, username: &str) -> Result<ConversationHandle> { todo!() }
    pub async fn create_group(&mut self, name: &str) -> Result<ConversationHandle> { todo!() }
    pub async fn send(&mut self, text: &str) -> Result<()> { todo!() }
    pub fn subscribe(&self) -> Receiver<ClientEvent> { todo!() }
}
```

No global state. No `AUTH_CONTEXT`. Auth context is per-`QpqClient` instance.

### 3.3 Add Push-Based Delivery

**Recommendation:** Dedicated QUIC unidirectional stream for server-push notifications.

```
Client opens bidi stream (ID 0) -> RPC channel (request/response)
Server opens uni stream (ID 3)  -> push notifications (new message, typing, etc.)
```

Stream IDs follow RFC 9000: the first client-initiated bidirectional stream is 0, and the first server-initiated unidirectional stream is 3.

Benefits:

- Near-immediate message delivery (no polling interval)
- No idle network traffic
- Typing indicators delivered in real time
- Graceful degradation: fall back to long-poll if the push stream fails

**Also:** Make `peek` + `ack` the default delivery pattern (not destructive `fetch`). Add idempotency keys to prevent duplicate messages on retry.

### 3.4 Multi-Stream Connections

Allow 4-8 concurrent bidirectional QUIC streams per connection.
This enables:

- Pipelined RPCs (send while fetching)
- Concurrent blob upload + chat
- `fetchWait` on one stream without blocking others

### 3.5 Storage Improvements

| Change | Rationale |
|--------|-----------|
| Drop `FileBackedStore` | O(n) flush per write, no federation support |
| Connection pool for SQLite | Replace `Mutex` with r2d2/deadpool |
| Persist sessions to DB | Server restart shouldn't force re-login |
| Encrypt DiskKeyStore at rest | Plaintext HPKE private keys are a real vulnerability |
| Persist MLS group state | Process crash shouldn't lose group state |
| Atomic keystore writes | tempfile-then-rename pattern |

### 3.6 Crypto Stack Refinements

The algorithms are correct. The refinements are operational:

| Change | Rationale |
|--------|-----------|
| Typed MLS error variants | Stop losing error info via `format!("{e:?}")` |
| Formalize hybrid PQ ciphersuite ID | Replace length-based key detection |
| Remove all InsecureServerCertVerifier | No TLS bypass on any platform |
| Add passkey/WebAuthn alt-auth | Better UX for GUI/mobile, no password to forget |
| Consider Double Ratchet for 1:1 DMs | MLS is over-engineered for 2-party; DR gives better per-message forward secrecy |
| Token/session secret zeroization | `AuthContext.token` et al. need `Zeroizing` wrappers |
| Fix serde deserialization of secrets | Intermediate non-zeroized `Vec` in `IdentityKeypair::deserialize` |

### 3.7 Workspace Restructuring

**Reduce from 12 to 8 crates:**

```
quicproquo-core        -- crypto primitives (keep)
quicproquo-proto       -- schema codegen (keep)
quicproquo-plugin-api  -- #![no_std] C-ABI (keep)
quicproquo-kt          -- key transparency (keep)
quicproquo-sdk         -- NEW: business logic library
quicproquo-server      -- server binary (keep)
quicproquo-client      -- CLI/TUI binary, depends on sdk (keep, slimmed)
quicproquo-p2p         -- mesh networking (keep, feature-flagged)
```

**Merge/remove:**

- `bot` -> `sdk::bot` module
- `ffi` -> `sdk` with `--features c-ffi`
- `gen` -> `scripts/` or `xtask`
- `gui` -> `apps/gui/` outside workspace (Tauri project)
- `mobile` -> `examples/` (research spike)

**Add `default-members` under `[workspace]`** so `cargo build` doesn't attempt the GUI.

**Add `justfile`** with `build`, `test`, `test-e2e`, `build-wasm`, `docker`.

### 3.8 Plugin System Evolution

| Change | Rationale |
|--------|-----------|
| Add `version: u32` to `HookVTable` | ABI stability: check version on load |
| Config passthrough | `qpq_plugin_init(vtable, config_json)` |
| Async hooks | Plugins that call external services shouldn't block Tokio |
| Evaluate WASM plugins | Sandboxed community plugins (keep C-ABI for first-party) |

### 3.9 Federation Improvements

| Change | Rationale |
|--------|-----------|
| DNS SRV / .well-known discovery | Static peer config doesn't scale |
| Persistent relay queue with retry | Messages to offline peers are currently lost |
| Deterministic channel ID derivation | Avoid cross-server channel conflicts |
| Keep mDNS as optional mesh feature | Not for internet-scale, but good for LAN |

### 3.10 Test & CI Improvements

| Change | Rationale |
|--------|-----------|
| Per-client auth context | Removes `--test-threads 1` constraint |
| Mock server for client unit tests | Fast tests without spawning a real server |
| Fuzz testing (cargo-fuzz) | Hybrid KEM, sealed sender, padding, Cap'n Proto deserialization |
| WS bridge unit tests | 645 lines, zero tests, security-critical |
| WASM + Go SDK in CI | Currently untested in CI |
| Separate E2E from unit test CI job | Different speed, different failure modes |
| macOS CI | FFI/mobile cross-compilation validation |
| Release automation | Binary artifacts, Docker tags, WASM npm publish |

---

## Part 4: Ecosystem Positioning

### Don't compete with Signal or Matrix directly.

**Target: Privacy-first messaging infrastructure for developers and organizations.**

quicproquo's differentiators (QUIC-native transport, post-quantum crypto, MLS, plugin system, multi-language SDKs, embeddable architecture) point toward an infrastructure play, not a consumer app.

Think *"the Postgres of E2E encrypted messaging"*: a high-quality open-source server and protocol that other projects build on.

| Segment | Value Proposition |
|---------|-------------------|
| **Developer tool** | API-first messenger for encrypted bots and integrations |
| **Embeddable** | C FFI + WASM + Go SDK for embedding in other apps |
| **Enterprise** | On-prem, plugins for compliance/audit, OPAQUE zero-knowledge auth |
| **Research** | Post-quantum crypto, MLS reference implementation, mesh networking |

---

## Part 5: Priority Ordering

### Phase 1: Foundation (unblocks everything else)

1. Replace capnp-rpc with Send-compatible framework
2. Extract SDK crate from client
3. Per-client auth context (no global state)

### Phase 2: Reliability

4. Push-based delivery (QUIC uni-stream)
5. Multi-stream connections
6. Persist sessions + MLS group state
7. Encrypt DiskKeyStore at rest
8. peek+ack as default delivery

### Phase 3: Polish

9. Workspace restructuring (12 -> 8 crates)
10. TUI as primary interactive mode (built on SDK)
11. Plugin system v2 (versioning, config, async)
12. Federation retry queue + discovery

### Phase 4: Ecosystem

13. Full MLS in WASM (browser E2E)
14. WebTransport (eliminate WS bridge)
15. Tauri GUI (built on SDK)
16. Release automation + expanded CI

---

## Appendix: Analysis Sources

This document was produced by four parallel analysis agents:

| Agent | Scope | Files Read |
|-------|-------|------------|
| server-analyst | Transport, RPC, delivery, storage, federation | 27 server .rs files, 4 schemas, core transport |
| client-analyst | REPL, UX, state, multi-platform, SDK design | All client .rs, GUI, mobile, TS demo |
| security-analyst | MLS, OPAQUE, hybrid KEM, keystore, identity | All core .rs, review doc |
| dx-analyst | Workspace, build, tests, plugins, CI, ecosystem | All Cargo.toml, tests, CI, plugins, SDKs |