# quicproquo v2 — Master Implementation Plan
> Created 2026-03-04. This is the authoritative plan for the v2 rewrite.
> See also: `docs/V2-DESIGN-ANALYSIS.md` for the detailed retrospective.
## Context
The v1 codebase has strong crypto foundations (MLS, hybrid PQ KEM, OPAQUE) but three
architectural bottlenecks: capnp-rpc is `!Send` (single-threaded), client business logic
is trapped in a monolithic REPL with global state, and delivery is poll-based.
This plan creates v2 on a new branch, keeping the crypto stack intact and replacing
the RPC/transport layer, extracting an SDK, and restructuring the workspace.
**Key decisions:**
- Transport: Protobuf (prost) + custom framing over QUIC (quinn)
- Mobile: Tauri 2 (same Rust SDK backend, web UI)
- Branch strategy: `v2` branch from main, not a fresh repo
- Constraints: Rust, QUIC, GPG-signed commits, zeroize secrets, no stubs
---
## Architecture Overview
```
┌─────────────────────────────────────────────────────┐
│                      Frontends                      │
│  CLI/TUI  │  Tauri GUI/Mobile  │  Web (WebTransport)│
└─────┬─────┴────────┬───────────┴──────────┬─────────┘
      │              │                      │
      ▼              ▼                      ▼
┌─────────────────────────────────────────────────────┐
│                   quicproquo-sdk                    │
│ QpqClient { connect, login, send, recv, subscribe } │
│ Event system (tokio broadcast)                      │
│ Crypto pipeline (MLS, sealed sender, hybrid)        │
│ Conversation store (SQLCipher)                      │
└──────────────────────┬──────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│                   quicproquo-rpc                    │
│ QUIC framing: [method:u16][req_id:u32][len:u32][pb] │
│ Multi-stream (1 RPC per stream)                     │
│ Server-push via uni-streams                         │
│ tower middleware (auth, rate-limit)                 │
└──────────────────────┬──────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│                  quicproquo-server                  │
│ Domain services (auth, delivery, channel, blob)     │
│ Store trait → SqlStore (connection pool)            │
│ Plugin hooks, federation, KT                        │
└─────────────────────────────────────────────────────┘
```
### Wire Format
Per QUIC bidirectional stream (request/response):
```
Request: [method_id: u16][request_id: u32][payload_len: u32][protobuf bytes]
Response: [status: u8][request_id: u32][payload_len: u32][protobuf bytes]
```
Per QUIC unidirectional stream (server → client push):
```
Push: [event_type: u16][payload_len: u32][protobuf bytes]
```
Each RPC opens its own QUIC bidirectional stream, so concurrent RPCs are multiplexed naturally and a stalled RPC does not block the others (no head-of-line blocking across streams).
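The request layout above can be sketched with the standard library alone. This is a minimal illustration, not the planned `framing.rs`: the names `RequestFrame`, `encode_request`, and `decode_request` are hypothetical, and big-endian byte order is an assumption, since the plan does not fix the byte order of the header fields.

```rust
/// Fixed header size: method_id (2) + request_id (4) + payload_len (4).
const REQUEST_HEADER_LEN: usize = 10;

/// Hypothetical in-memory form of one request frame.
#[derive(Debug, PartialEq)]
struct RequestFrame {
    method_id: u16,
    request_id: u32,
    /// Opaque protobuf bytes; this sketch never inspects them.
    payload: Vec<u8>,
}

/// Serialize as [method_id: u16][request_id: u32][payload_len: u32][payload].
/// Big-endian is assumed. Returns None if the payload does not fit in a u32 length.
fn encode_request(frame: &RequestFrame) -> Option<Vec<u8>> {
    if frame.payload.len() > u32::MAX as usize {
        return None;
    }
    let mut buf = Vec::with_capacity(REQUEST_HEADER_LEN + frame.payload.len());
    buf.extend_from_slice(&frame.method_id.to_be_bytes());
    buf.extend_from_slice(&frame.request_id.to_be_bytes());
    buf.extend_from_slice(&(frame.payload.len() as u32).to_be_bytes());
    buf.extend_from_slice(&frame.payload);
    Some(buf)
}

/// Parse one frame from the start of `buf`.
/// Returns None if the header or the declared payload is incomplete.
fn decode_request(buf: &[u8]) -> Option<RequestFrame> {
    if buf.len() < REQUEST_HEADER_LEN {
        return None;
    }
    let method_id = u16::from_be_bytes([buf[0], buf[1]]);
    let request_id = u32::from_be_bytes([buf[2], buf[3], buf[4], buf[5]]);
    let len = u32::from_be_bytes([buf[6], buf[7], buf[8], buf[9]]) as usize;
    let end = REQUEST_HEADER_LEN.checked_add(len)?;
    let payload = buf.get(REQUEST_HEADER_LEN..end)?.to_vec();
    Some(RequestFrame { method_id, request_id, payload })
}
```

Response and push frames would follow the same pattern with their own header fields. A real decoder would probably also cap `payload_len` before allocating, which this sketch leaves out.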
---
## Workspace Structure (v2: 9 crates)
```
quicproquo/
├── crates/
│   ├── quicproquo-core/        # KEEP AS-IS — crypto primitives, MLS, hybrid KEM
│   ├── quicproquo-kt/          # KEEP AS-IS — key transparency
│   ├── quicproquo-plugin-api/  # KEEP AS-IS — #![no_std] C-ABI
│   ├── quicproquo-proto/       # REWRITE — protobuf schemas + prost codegen
│   ├── quicproquo-rpc/         # NEW — QUIC RPC framework (framing, dispatch, tower)
│   ├── quicproquo-sdk/         # NEW — client business logic library
│   ├── quicproquo-server/      # REWRITE — domain services + RPC handlers
│   ├── quicproquo-client/      # REWRITE — thin CLI/TUI shell over SDK
│   └── quicproquo-p2p/         # KEEP — iroh mesh (feature-flagged, later)
├── apps/
│   └── gui/                    # Tauri 2 desktop + mobile app (outside workspace)
├── proto/                      # .proto source files
│   └── qpq/v1/
│       ├── auth.proto          # OPAQUE registration + login (4 methods)
│       ├── delivery.proto      # enqueue, fetch, peek, ack, batch (6 methods)
│       ├── keys.proto          # key package + hybrid key CRUD (5 methods)
│       ├── channel.proto       # channel create (1 method)
│       ├── user.proto          # resolve user/identity (2 methods)
│       ├── blob.proto          # upload/download (2 methods)
│       ├── device.proto        # register/list/revoke (3 methods)
│       ├── p2p.proto           # endpoint publish/resolve + health (3 methods)
│       ├── federation.proto    # relay + proxy (6 methods)
│       ├── push.proto          # server-push events (NEW)
│       └── common.proto        # shared types (Auth, Envelope, Error)
├── sdks/
│   ├── go/                     # Go SDK (regenerate from .proto)
│   └── typescript/             # TS SDK (WebTransport client)
├── justfile                    # NEW — build commands
└── Cargo.toml                  # workspace root
```
**Removed from workspace:**
- `quicproquo-bot` → `sdk::bot` module
- `quicproquo-ffi` → `sdk` with `--features c-ffi`
- `quicproquo-gen` → `scripts/`
- `quicproquo-gui` → `apps/gui/` (Tauri project, outside workspace)
- `quicproquo-mobile` → merged into `apps/gui/` (Tauri 2 mobile)
---
## Crate Reuse Assessment
| v1 Crate | capnp deps? | v2 Action | Effort |
|----------|:-----------:|-----------|--------|
| **quicproquo-core** | None | Copy as-is | Zero |
| **quicproquo-kt** | None | Copy as-is | Zero |
| **quicproquo-plugin-api** | None | Copy as-is | Zero |
| **quicproquo-p2p** | None | Copy as-is | Zero |
| **quicproquo-proto** | 100% capnp | Replace with prost codegen | Medium |
| **quicproquo-server** | 16/20 files | Extract domain logic, rewrite handlers | High |
| **quicproquo-client** | 6/10 files | Extract to SDK, thin CLI shell | High |
### Key Files to Reuse Directly
| Source (v1) | Destination (v2) | Notes |
|-------------|------------------|-------|
| `crates/quicproquo-core/` (entire) | same path | Zero changes |
| `crates/quicproquo-kt/` (entire) | same path | Zero changes |
| `crates/quicproquo-plugin-api/` (entire) | same path | Zero changes |
| `server/src/storage.rs` | `server/src/storage.rs` | Store trait — keep |
| `server/src/sql_store.rs` | `server/src/sql_store.rs` | Add connection pool |
| `server/src/hooks.rs` | `server/src/hooks.rs` | Plugin system — keep |
| `server/src/plugin_loader.rs` | `server/src/plugin_loader.rs` | Keep |
| `server/src/error_codes.rs` | `server/src/error_codes.rs` | Keep |
| `server/src/config.rs` | `server/src/config.rs` | Update for new transport |
| `client/src/conversation.rs` | `sdk/src/conversation.rs` | Move to SDK |
| `client/src/token_cache.rs` | `sdk/src/token_cache.rs` | Move to SDK |
| `client/src/display.rs` | `client/src/display.rs` | Keep in CLI |
| `schemas/*.capnp` | reference only | Translate to .proto |
---
## Phased Implementation
### Phase 1: Foundation
**Goal:** v2 branch with new workspace, proto schemas, RPC framework skeleton, SDK skeleton.
**Scope:** Compiles, no runtime functionality yet.
1. **Create v2 branch** from main
2. **Restructure workspace** — update root Cargo.toml, create new crate dirs, add justfile
3. **Write .proto files** — translate all 33 RPC methods + push events from Cap'n Proto
4. **Create quicproquo-proto crate** — prost-build codegen
5. **Create quicproquo-rpc crate** — QUIC RPC framework:
   - `framing.rs` — wire format encode/decode (request, response, push)
   - `server.rs` — accept QUIC connections, dispatch to handlers
   - `client.rs` — connect, send requests, receive responses + push events
   - `middleware.rs` — tower-based auth + rate-limit layers
   - `method.rs` — method registry (method_id → async handler fn)
6. **Create quicproquo-sdk crate** — public API skeleton:
   - `client.rs` — `QpqClient` struct
   - `events.rs` — `ClientEvent` enum
   - `conversation.rs` — `ConversationHandle`, `ConversationStore`
   - `config.rs` — `ClientConfig`
7. **Extract server domain types** — `server/src/domain/` module:
   - `types.rs` — plain Rust request/response types
   - `auth.rs` — OPAQUE logic extracted from auth_ops.rs
   - `delivery.rs` — enqueue/fetch logic extracted from delivery.rs
**Verification:**
- `cargo build --workspace` succeeds
- `cargo test -p quicproquo-core` passes (72 tests)
- Proto codegen works
- RPC framework compiles
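The method registry in step 5 (method_id → handler) can be pictured as a map from `u16` IDs to boxed handlers. The sketch below is a simplification under stated assumptions: handlers are synchronous closures over raw bytes, whereas the plan calls for async handlers; `MethodRegistry`, `Handler`, and the `STATUS_UNKNOWN_METHOD` value are hypothetical names and numbers, not taken from the codebase.

```rust
use std::collections::HashMap;

/// Hypothetical status byte for "no handler registered"; the real status
/// codes are not defined in this plan.
const STATUS_UNKNOWN_METHOD: u8 = 1;

/// A handler takes the request's protobuf bytes and returns either response
/// bytes or an error status byte. Synchronous here for brevity only.
type Handler = Box<dyn Fn(&[u8]) -> Result<Vec<u8>, u8> + Send + Sync>;

/// Maps wire-level method IDs to handlers.
struct MethodRegistry {
    handlers: HashMap<u16, Handler>,
}

impl MethodRegistry {
    fn new() -> Self {
        MethodRegistry { handlers: HashMap::new() }
    }

    /// Register (or replace) the handler for `method_id`.
    fn register<F>(&mut self, method_id: u16, handler: F)
    where
        F: Fn(&[u8]) -> Result<Vec<u8>, u8> + Send + Sync + 'static,
    {
        self.handlers.insert(method_id, Box::new(handler));
    }

    /// Look up the handler and invoke it; unknown IDs yield an error status.
    fn dispatch(&self, method_id: u16, payload: &[u8]) -> Result<Vec<u8>, u8> {
        match self.handlers.get(&method_id) {
            Some(handler) => handler(payload),
            None => Err(STATUS_UNKNOWN_METHOD),
        }
    }
}
```

Registering an echo handler with `reg.register(1, |p| Ok(p.to_vec()))` and calling `reg.dispatch(1, b"ping")` returns the same bytes, while an unregistered ID returns the error status. An async version would likely store handlers that return boxed futures instead.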
---
### Phase 2: Server Core
**Goal:** Working server with all 33 RPC handlers over QUIC.
1. **RPC dispatch** — method registry, connection lifecycle
2. **Domain handlers** — all 33 methods as `async fn(Request) -> Result<Response>`
   - Auth (4): OPAQUE register start/finish, login start/finish
   - Delivery (6): enqueue, fetch, fetchWait, peek, ack, batchEnqueue
   - Keys (5): upload/fetch key package, upload/fetch/batch-fetch hybrid key
   - Channels (1): createChannel
   - Users (2): resolveUser, resolveIdentity
   - Blobs (2): uploadBlob, downloadBlob
   - Devices (3): registerDevice, listDevices, revokeDevice
   - P2P (3): health, publishEndpoint, resolveEndpoint
   - Federation (6): relay enqueue/batch, proxy fetch/resolve, health
3. **Server-push** — notification stream via QUIC uni-stream
4. **Storage upgrades:**
   - Drop `FileBackedStore`
   - Connection pool (deadpool-sqlite)
   - Persist sessions to SQLite
   - Atomic queue depth check + enqueue
5. **Tower middleware** — auth validation, rate limiting, audit logging
6. **Multi-stream** — concurrent RPCs per connection (remove 1-stream limit)
**Verification:**
- Server starts, accepts QUIC connections
- Health check RPC works
- OPAQUE registration + login works
- Message enqueue + fetch round-trip
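The "atomic queue depth check + enqueue" item means the depth check and the insert must happen as one step, so two concurrent senders cannot both pass the check and overfill a queue. In the SQL store this would presumably be a single transaction; the in-memory sketch below only illustrates the idea with a mutex. `DeliveryQueues`, `EnqueueError`, and `max_depth` are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum EnqueueError {
    QueueFull,
}

/// In-memory stand-in for per-recipient delivery queues.
struct DeliveryQueues {
    max_depth: usize,
    queues: Mutex<HashMap<String, VecDeque<Vec<u8>>>>,
}

impl DeliveryQueues {
    fn new(max_depth: usize) -> Self {
        DeliveryQueues { max_depth, queues: Mutex::new(HashMap::new()) }
    }

    /// Check the depth and push while holding the same lock, so the check
    /// cannot go stale before the insert. Returns the new queue depth.
    fn enqueue(&self, recipient: &str, msg: Vec<u8>) -> Result<usize, EnqueueError> {
        // Recover the guard even if another thread panicked while holding it.
        let mut queues = self.queues.lock().unwrap_or_else(|e| e.into_inner());
        let queue = queues
            .entry(recipient.to_string())
            .or_insert_with(VecDeque::new);
        if queue.len() >= self.max_depth {
            return Err(EnqueueError::QueueFull);
        }
        queue.push_back(msg);
        Ok(queue.len())
    }
}
```

The non-atomic variant (read the depth, release the lock, then insert) is the race this item is meant to remove.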
---
### Phase 3: SDK
**Goal:** Complete client SDK library — the heart of v2.
1. **QpqClient** — connect, OPAQUE auth, session management (no global state)
2. **Crypto pipeline** — MLS processing, sealed sender unwrap, hybrid decrypt
(extracted from repl.rs `poll_messages()`)
3. **Conversation management** — create DM, create group, invite, remove, send, receive
4. **Event system** — `tokio::sync::broadcast` channel of `ClientEvent`, replacing the poll loop
   - `MessageReceived`, `TypingIndicator`, `ConversationCreated`
   - `MemberJoined`, `MemberLeft`, `ConnectionLost`, `Reconnected`
5. **Offline support** — outbox queue, retry with backoff, sync on reconnect
6. **ConversationStore** — SQLCipher local DB (migrate from client/conversation.rs)
7. **Key management** — encrypted DiskKeyStore, MLS group state persistence
8. **Token/secret zeroization** — `AuthContext.token` etc. wrapped in `Zeroizing`
**Verification:**
- SDK integration test: connect → login → create DM → send → receive
- No global state (`AUTH_CONTEXT` eliminated)
- Event subscription works
- Offline outbox drains on reconnect
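To illustrate the event system in step 4, here is a fan-out sketch in which every subscriber receives its own copy of each event. It uses `std::sync::mpsc` as a dependency-free stand-in for a tokio broadcast channel, so it shows the subscription model rather than the planned implementation. The `EventBus` type and the payload fields on `ClientEvent` are assumptions; only the variant names come from the plan, and only a subset is shown.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Subset of the planned event variants; payload fields are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum ClientEvent {
    MessageReceived { conversation_id: String, body: Vec<u8> },
    TypingIndicator { conversation_id: String },
    ConnectionLost,
    Reconnected,
}

/// Fan-out bus: each subscriber gets a private channel.
struct EventBus {
    subscribers: Vec<Sender<ClientEvent>>,
}

impl EventBus {
    fn new() -> Self {
        EventBus { subscribers: Vec::new() }
    }

    /// Add a subscriber and hand back its receiving end.
    fn subscribe(&mut self) -> Receiver<ClientEvent> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Send a clone of `event` to every live subscriber, dropping those
    /// whose receiver has gone away. Returns the number of live subscribers.
    fn publish(&mut self, event: ClientEvent) -> usize {
        self.subscribers.retain(|tx| tx.send(event.clone()).is_ok());
        self.subscribers.len()
    }
}
```

A frontend would hold the receiver and react to events as they arrive instead of calling a poll function. Unlike this unbounded sketch, a tokio broadcast channel is bounded, and a receiver that falls too far behind skips older messages.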
---
### Phase 4: Client
**Goal:** CLI and TUI as thin shells over SDK.
1. **CLI binary** (`qpq`) — clap subcommands calling `QpqClient`
2. **REPL** — readline with tab-completion (rustyline), categorized `/help`
3. **TUI** — ratatui, subscribes to `QpqClient::subscribe()` events
4. **Simplified commands:**
   - Hide MLS/KeyPackage internals (auto-refresh)
   - Message references by short ID (not index)
   - Batch operations (`/create-group team alice bob`)
   - Categorized help (Chat, Groups, Security, System)
5. **Auto-server-launch** — keep zero-config DX from v1
6. **Playbook system** — keep YAML-based test scripting
**Verification:**
- `qpq --username alice --password pass` starts REPL (same UX as v1)
- TUI mode works with live event updates
- Tab-completion for commands and usernames
- E2E test: two clients exchange messages
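As an illustration of the "thin shell" idea, the REPL's job reduces to turning an input line into a typed command and handing it to the SDK. The parser below is a hypothetical sketch for the batch form `/create-group team alice bob`; the `Command` enum, `parse_line`, and the error strings are assumptions, not the v1 or planned command set.

```rust
/// Hypothetical parsed REPL commands.
#[derive(Debug, PartialEq)]
enum Command {
    CreateGroup { name: String, members: Vec<String> },
    Help,
    /// Any line that does not start with '/' is treated as a chat message.
    Say(String),
}

/// Parse one REPL input line into a command, or return a usage error.
fn parse_line(line: &str) -> Result<Command, String> {
    let line = line.trim();
    if !line.starts_with('/') {
        return Ok(Command::Say(line.to_string()));
    }
    // Skip the leading '/' (one byte) and split on whitespace.
    let mut parts = line[1..].split_whitespace();
    match parts.next() {
        Some("create-group") => {
            let usage = "usage: /create-group <name> <member>...";
            let name = parts.next().ok_or(usage)?.to_string();
            let members: Vec<String> = parts.map(str::to_string).collect();
            if members.is_empty() {
                return Err(usage.to_string());
            }
            Ok(Command::CreateGroup { name, members })
        }
        Some("help") => Ok(Command::Help),
        Some(other) => Err(format!("unknown command: /{}", other)),
        None => Err("empty command".to_string()),
    }
}
```

With this split, the REPL and the TUI can share the parser and differ only in how they render results and events.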
---
### Phase 5: Desktop & Mobile
**Goal:** Tauri 2 app for all platforms.
1. **Tauri 2 project** in `apps/gui/`
2. **Rust backend** — Tauri commands wrapping `QpqClient`
3. **Web frontend** — Svelte or vanilla HTML/JS
4. **Desktop** — Linux, macOS, Windows
5. **Mobile** — iOS, Android via Tauri 2 mobile
6. **QUIC connection migration** — automatic wifi↔cellular handoff
**Verification:**
- Desktop app builds and runs on Linux
- Mobile app builds for Android (emulator)
- Send message from CLI → received in GUI
---
### Phase 6: Polish & Ecosystem
**Goal:** Production readiness.
1. **Federation improvements** — DNS SRV discovery, persistent relay queue with retry
2. **Plugin system v2** — version field, config passthrough, async hooks, WASM plugins
3. **WebTransport** — browser clients over HTTP/3 (same quinn endpoint)
4. **WASM MLS** — compile openmls to wasm32 for browser E2E encryption
5. **CI/CD** — release automation, WASM CI, multi-platform (Linux + macOS)
6. **Security hardening:**
   - Fuzz testing (hybrid KEM, sealed sender, padding, protobuf deser)
   - Remove all `InsecureServerCertVerifier` paths
   - Certificate pinning
   - Add passkey/WebAuthn as alternative auth
7. **Double Ratchet for 1:1 DMs** — better per-message forward secrecy than MLS for 2-party
---
## RPC Method Inventory (33 total)
| Category | Methods | Proto File |
|----------|---------|-----------|
| Auth (OPAQUE) | opaqueRegisterStart, opaqueRegisterFinish, opaqueLoginStart, opaqueLoginFinish | auth.proto |
| Delivery | enqueue, fetch, fetchWait, peek, ack, batchEnqueue | delivery.proto |
| Keys | uploadKeyPackage, fetchKeyPackage, uploadHybridKey, fetchHybridKey, fetchHybridKeys | keys.proto |
| Channel | createChannel | channel.proto |
| User | resolveUser, resolveIdentity | user.proto |
| Blob | uploadBlob, downloadBlob | blob.proto |
| Device | registerDevice, listDevices, revokeDevice | device.proto |
| P2P | health, publishEndpoint, resolveEndpoint | p2p.proto |
| Federation | relayEnqueue, relayBatchEnqueue, proxyFetchKeyPackage, proxyFetchHybridKey, proxyResolveUser, federationHealth | federation.proto |
**New in v2:**
| Push Events | Description | Proto File |
|-------------|-------------|-----------|
| MessageNotification | New message available | push.proto |
| TypingNotification | Peer is typing | push.proto |
| ChannelUpdate | Channel created/member changed | push.proto |
| SessionExpired | Auth session expired | push.proto |
---
## Engineering Standards (carried from v1)
- Conventional commits: `feat:`, `fix:`, `chore:`, `docs:`, `test:`, `refactor:`
- GPG-signed commits only
- No `Co-authored-by` trailers
- No `.unwrap()` on crypto or I/O in non-test paths
- Secrets: zeroize on drop, never in logs
- No stubs / `todo!()` / `unimplemented!()` in production code
- `clippy::unwrap_used = "deny"` at workspace level