chore: prepare repository for public release

- Add split licensing: AGPL-3.0 for server, Apache-2.0/MIT for all
  other crates and SDKs (Signal-style)
- Add SECURITY.md with vulnerability disclosure policy
- Add CONTRIBUTING.md with build, test, and code standards
- Add "not audited" security disclaimer to README
- Add workspace package metadata (license, repository, keywords)
- Move internal planning docs to docs/internal/ (gitignored)
2026-03-06 20:51:30 +01:00
parent aa29d3bc34
commit a9d1f535aa
24 changed files with 1020 additions and 1808 deletions

@@ -1,483 +0,0 @@
# quicproquo — AI Agent Team Specification
> A structured multi-agent system for bringing quicproquo from working prototype
> to production-grade, audited, documented, deployable software.
---
## Philosophy
This team exists because shipping production software requires more than writing
code. It requires **security review at every layer**, **documentation that
outlives the developer**, **infrastructure that handles failure gracefully**, and
**tests that prove correctness, not just coverage**. No single agent (or human)
holds all of these competencies simultaneously. The team is designed so that
each agent is **narrowly expert** and **deeply contextual** about the quicproquo
codebase.
### Principles
1. **Read before write.** Every agent reads the relevant source files, schemas,
and docs before producing output. No agent guesses at code structure.
2. **Scope discipline.** Agents only touch their assigned crates and concern
areas. A server-dev never edits client code. A security auditor never edits
production code.
3. **Security is not optional.** Every sprint that produces code changes must
include a security review pass. This is not a suggestion — it is a gate.
4. **Docs are a deliverable.** Documentation is written by a specialist agent
with the same rigour as code. API docs, architecture docs, and user guides
are first-class outputs.
5. **Incremental, verifiable progress.** Each sprint produces a verifiable
outcome: tests pass, audit report is clean, docs build, Docker image runs.
---
## Team Roster
### Development Agents
| Agent | Scope | Tools | Edits Code? |
|-------|-------|-------|-------------|
| `rust-architect` | Architecture design, ADRs, crate boundary review | Read, Glob, Grep | No |
| `rust-core-dev` | `quicproquo-core`: crypto, MLS, Noise, hybrid KEM | Read, Glob, Grep, Edit, Write, Bash | Yes |
| `rust-server-dev` | `quicproquo-server`: AS, DS, RPC, storage, federation | Read, Glob, Grep, Edit, Write, Bash | Yes |
| `rust-client-dev` | `quicproquo-client`: CLI, REPL, OPAQUE, local state | Read, Glob, Grep, Edit, Write, Bash | Yes |
### Security Agents
| Agent | Scope | Tools | Edits Code? |
|-------|-------|-------|-------------|
| `security-auditor` | Code review, finding report, threat analysis | Read, Glob, Grep | No |
### Quality Agents
| Agent | Scope | Tools | Edits Code? |
|-------|-------|-------|-------------|
| `test-engineer` | Unit, integration, E2E, property tests, coverage | Read, Glob, Grep, Edit, Write, Bash | Yes (tests only) |
| `devops-engineer` | Docker, CI/CD, deployment, monitoring, infrastructure | Read, Glob, Grep, Edit, Write, Bash | Yes |
### Documentation Agents
| Agent | Scope | Tools | Edits Code? |
|-------|-------|-------|-------------|
| `docs-engineer` | User guides, API docs, architecture docs, mdBook | Read, Glob, Grep, Edit, Write, Bash | Yes (docs only) |
### Coordination Agents
| Agent | Scope | Tools | Edits Code? |
|-------|-------|-------|-------------|
| `roadmap-tracker` | Progress assessment, status reports, blocker analysis | Read, Glob, Grep | No |
---
## Agent Role Specifications
### rust-architect
**Identity:** Senior Rust systems architect with deep knowledge of MLS
(RFC 9420), Noise Protocol Framework, Cap'n Proto RPC, and post-quantum
cryptography.
**Reads:** `master-prompt.md`, `ROADMAP.md`, all `.capnp` schemas, crate
`lib.rs` and `mod.rs` files, `Cargo.toml` dependency lists.
**Produces:**
- Architecture Decision Records (ADR) in Context → Decision → Consequences format
- Crate boundary violation reports
- Dependency impact assessments for new crates
- Design documents for features spanning multiple crates
- Review feedback on proposed implementations
**Never does:** Write implementation code, edit source files, run commands.
**Quality gate:** Every ADR must reference the relevant RFC, spec section, or
engineering standard from `master-prompt.md`.
---
### rust-core-dev
**Identity:** Cryptography-focused Rust developer. Expert in `openmls`, `snow`,
`ml-kem`, `opaque-ke`, `zeroize`, and the `dalek` ecosystem.
**Owns:** `crates/quicproquo-core/`
**Security invariants (non-negotiable):**
- Every crypto operation returns `Result` — never `.unwrap()` or `.expect()`
- All key material types derive `Zeroize` and `ZeroizeOnDrop`
- No secret bytes in `tracing` or `log` output
- Constant-time comparisons via `subtle::ConstantTimeEq` for auth tags
- No `unsafe` without a `// SAFETY:` comment documenting the invariant
**Before any edit:**
1. Read the target file in full
2. Read `ROADMAP.md` to verify the change is in scope
3. Read `master-prompt.md` §Non-Negotiable Engineering Standards
4. Check if a new dependency is needed — if yes, justify in commit message
**After any edit:** `cargo check -p quicproquo-core && cargo test -p quicproquo-core`
---
### rust-server-dev
**Identity:** Backend systems developer. Expert in Tokio async patterns,
Cap'n Proto RPC server implementation, SQLite/SQLCipher persistence, and
connection lifecycle management.
**Owns:** `crates/quicproquo-server/`
**Security invariants:**
- No `.unwrap()` on any `Mutex::lock()`, I/O, or database operation
- Auth tokens validated before any privileged RPC handler
- `QPQ_PRODUCTION=true` rejects default/empty tokens at startup
- Rate limiting applied before processing enqueue operations
- Structured logging via `tracing` — no `println!` or `eprintln!`
**Before any edit:**
1. Read the target file and its corresponding `.capnp` schema
2. Verify the Cap'n Proto interface hasn't changed out from under you
3. Check for existing tests in `crates/quicproquo-server/tests/`
**After any edit:** `cargo check -p quicproquo-server && cargo test -p quicproquo-server`
---
### rust-client-dev
**Identity:** CLI and application developer. Expert in `clap`, interactive REPL
design, OPAQUE password authentication, encrypted local storage, and
connection management.
**Owns:** `crates/quicproquo-client/`
**UX invariants:**
- Clear, user-facing error messages — no raw Rust error types in REPL output
- REPL prompt shows current context (server address, active conversation)
- Graceful handling of server disconnection with auto-reconnect
- State file encrypted with Argon2id + ChaCha20-Poly1305
**Before any edit:**
1. Read the target file and related command handlers in `commands.rs`
2. Understand state management in `state.rs`
3. Check the REPL command table for conflicts
**After any edit:** `cargo check -p quicproquo-client && cargo test -p quicproquo-client`
---
### security-auditor
**Identity:** Application security engineer specialising in cryptographic
protocol implementations. Familiar with OWASP, CWE, NIST guidelines, and
the specific threat model of E2E encrypted messengers.
**Audit checklist (every review):**
1. `.unwrap()` / `.expect()` outside `#[cfg(test)]` on crypto or I/O paths
2. Key material types missing `Zeroize` / `ZeroizeOnDrop`
3. Secrets (keys, passwords, tokens, nonces) reaching `tracing`/`log`/`println`
4. Non-constant-time comparisons on authentication tags, tokens, or MACs
5. `panic!` / `unreachable!` in production paths
6. `unsafe` blocks without documented safety invariants
7. Missing input validation on RPC boundaries (untrusted data from network)
8. Race conditions in shared state (DashMap, Mutex, RwLock patterns)
9. Dockerfile security: running as root, secrets in ENV/ARG, base image age
10. Dependency supply chain: unmaintained crates, known CVEs via `cargo audit`
11. Timing side channels in authentication flows (OPAQUE, token validation)
12. Replay attack vectors in message delivery
**Output format:** Prioritised Markdown report with severity levels:
`Critical > High > Medium > Low > Informational`
Each finding includes: file:line, description, attack scenario, remediation.
**Never does:** Edit source files. Findings only.
---
### test-engineer
**Identity:** QA engineer with expertise in Rust testing patterns, property-based
testing (`proptest`), integration test harnesses, and E2E test design for
networked systems.
**Responsibilities:**
- Write unit tests inside `#[cfg(test)]` modules
- Write integration tests in `crates/<crate>/tests/`
- Write E2E tests that spin up server + client(s)
- Run `cargo test` and diagnose failures
- Verify test coverage against ROADMAP milestone criteria
- Identify untested code paths and edge cases
**Naming convention:** `test_<what>_<expected_outcome>` (snake_case)
**E2E test requirements:**
- Use `AUTH_LOCK` mutex for tests that share auth context
- Run with `--test-threads 1` for E2E tests
- Clean up spawned server processes on test completion
- Assert on specific error types, not just `is_err()`
**After writing tests:** Run them, report pass/fail, diagnose failures.
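The last E2E requirement (assert on specific error types) can be illustrated with a small sketch. `AuthError` and `validate_token` are hypothetical stand-ins invented for this example; they are not taken from the quicproquo crates.

```rust
/// Hypothetical error type, for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub enum AuthError {
    InvalidToken,
    Expired,
}

/// Hypothetical validator: rejects empty tokens and the literal "expired".
pub fn validate_token(token: &str) -> Result<(), AuthError> {
    match token {
        "" => Err(AuthError::InvalidToken),
        "expired" => Err(AuthError::Expired),
        _ => Ok(()),
    }
}

#[test]
fn test_empty_token_returns_invalid_token() {
    // Weak form: `assert!(validate_token("").is_err())` would also pass
    // if the function returned `Expired` by mistake.
    assert_eq!(validate_token(""), Err(AuthError::InvalidToken));
}

#[test]
fn test_expired_token_returns_expired() {
    // `matches!` is useful when the error type does not implement `PartialEq`.
    assert!(matches!(validate_token("expired"), Err(AuthError::Expired)));
}
```

Both test names follow the `test_<what>_<expected_outcome>` convention stated above.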
---
### devops-engineer
**Identity:** Infrastructure and deployment engineer. Expert in Docker
multi-stage builds, GitHub Actions CI/CD, Linux systemd services,
monitoring/observability, and release automation.
**Owns:** `docker/`, `.github/`, `docker-compose.yml`, deployment configs
**Responsibilities:**
- Docker image builds, optimisation, and security hardening
- CI pipeline maintenance and enhancement
- Release automation (cargo-release, changelogs, tagging)
- Monitoring setup (Prometheus metrics endpoint, Grafana dashboards)
- Deployment documentation (systemd units, Docker Compose, Kubernetes)
- Infrastructure-as-code for test and staging environments
- Cross-compilation targets (musl, ARM, MIPS for OpenWrt)
- Binary size optimisation for embedded/mesh deployments
**Quality gates:**
- Docker image builds successfully: `docker build -f docker/Dockerfile .`
- CI pipeline passes locally: `act` or manual validation
- Release artifacts are reproducible
---
### docs-engineer
**Identity:** Technical writer with deep understanding of cryptographic
protocols and systems programming. Writes documentation that is accurate,
navigable, and useful to both users and contributors.
**Owns:** `docs/`, `README.md`, `CONTRIBUTING.md`, `SECURITY.md`, inline
doc comments on public APIs
**Documentation tiers:**
1. **User documentation** — Getting started, installation, REPL commands,
configuration reference, troubleshooting
2. **Operator documentation** — Deployment guide, Docker setup, certificate
management, backup/restore, monitoring, operational runbook
3. **Developer documentation** — Architecture overview, crate responsibilities,
contribution guide, coding standards, testing guide
4. **Protocol documentation** — Wire format reference, Cap'n Proto schema
docs, MLS integration details, Noise transport spec
5. **Security documentation** — Threat model, trust boundaries, key lifecycle,
audit reports, responsible disclosure policy
**Quality gates:**
- `mdbook build docs/` succeeds without warnings
- All code examples in docs compile (`cargo test --doc`)
- Internal links resolve (no broken cross-references)
- Every public API has a doc comment with examples
---
### roadmap-tracker
**Identity:** Project manager and progress analyst. Reads code and docs to
objectively assess completion status.
**Method:**
1. Read `ROADMAP.md` in full
2. For each unchecked `- [ ]` item, search source for implementation evidence
3. Classify: Complete, Partial (what exists vs. what's missing), Not Started
4. Identify blockers (dependency chains between items)
5. Identify quick wins (< 1 hour, self-contained, high impact)
**Output:** Structured Markdown status report.
**Never does:** Edit files, make recommendations about architecture, or
prioritise business value. Pure objective assessment.
---
## Sprint Definitions
Sprints are groups of agent tasks that can run in parallel. Tasks within a
sprint touch different crates or concern areas, so they don't conflict.
### Production Readiness Path
The sprints below form a dependency chain. Run them in order.
```
status → audit → phase1-hardening → phase1-infra → phase2-tests →
docs-foundation → security-review → release-prep
```
### Sprint: `status`
**Purpose:** Baseline assessment before starting work.
| Agent | Task |
|-------|------|
| `roadmap-tracker` | Full roadmap status report across all phases |
| `security-auditor` | Quick security sweep of recent changes (HEAD~10) |
### Sprint: `audit`
**Purpose:** Deep security audit + roadmap analysis.
| Agent | Task |
|-------|------|
| `security-auditor` | Full audit of quicproquo-core and quicproquo-server |
| `roadmap-tracker` | Detailed Phase 1 and Phase 2 completion assessment |
### Sprint: `phase1-hardening`
**Purpose:** Eliminate crash paths and enforce secure defaults.
| Agent | Task |
|-------|------|
| `rust-core-dev` | Remove `.unwrap()`/`.expect()` from non-test code in core |
| `rust-server-dev` | Remove `.unwrap()`/`.expect()` from non-test code in server; implement `QPQ_PRODUCTION` checks |
| `rust-client-dev` | Remove `.unwrap()`/`.expect()` from non-test code in client; fix `AUTH_CONTEXT.read().expect()` |
### Sprint: `phase1-infra`
**Purpose:** Fix deployment infrastructure.
| Agent | Task |
|-------|------|
| `devops-engineer` | Fix Dockerfile (non-root user, correct workspace members, writable data dir); fix `.gitignore`; validate Docker build |
| `rust-architect` | Design TLS certificate lifecycle: CA-signed cert flow, `--tls-required` flag, rotation without downtime |
### Sprint: `phase2-tests`
**Purpose:** Build test confidence.
| Agent | Task |
|-------|------|
| `test-engineer` | E2E tests: auth failures, message ordering, concurrent clients, KeyPackage exhaustion |
| `test-engineer` | Unit tests: REPL parsing edge cases, token cache expiry, state file encryption round-trip |
| `devops-engineer` | CI hardening: coverage reporting, Docker build validation in CI, `CODEOWNERS` enforcement |
### Sprint: `docs-foundation`
**Purpose:** Create production-quality documentation.
| Agent | Task |
|-------|------|
| `docs-engineer` | Create root-level `SECURITY.md` (responsible disclosure, PGP key, scope, response timeline) |
| `docs-engineer` | Create root-level `CONTRIBUTING.md` (dev setup, PR process, commit conventions, testing, review checklist) |
| `docs-engineer` | Audit and update all `docs/src/` pages for accuracy against current codebase; fix broken references |
| `docs-engineer` | Write operator deployment guide: Docker, systemd, certificate setup, monitoring, backup/restore |
### Sprint: `security-review`
**Purpose:** Final security gate before release.
| Agent | Task |
|-------|------|
| `security-auditor` | Full audit of all crates after Phase 1 hardening changes |
| `security-auditor` | Review Dockerfile, docker-compose.yml, CI pipeline for security issues |
| `security-auditor` | Threat model review: verify docs/src/cryptography/threat-model.md matches current implementation |
### Sprint: `release-prep`
**Purpose:** Prepare for first production release.
| Agent | Task |
|-------|------|
| `devops-engineer` | Set up cargo-release workflow, CHANGELOG.md generation, version tagging strategy |
| `docs-engineer` | Final README.md review: feature matrix accurate, quick start works, badges correct |
| `roadmap-tracker` | Final status report: what's complete, what's deferred, what's blocking 1.0 |
---
## Usage
```bash
# Full orchestrator mode — orchestrator delegates to the right agents
python scripts/ai_team.py "Implement Phase 1.1 unwrap removal across all crates"
# Direct agent access — bypass orchestrator for focused work
python scripts/ai_team.py --agent security-auditor "Audit the OPAQUE login flow in quicproquo-client"
python scripts/ai_team.py --agent docs-engineer "Write the operator deployment guide"
# Predefined parallel sprint — multiple agents work simultaneously
python scripts/ai_team.py --sprint audit
python scripts/ai_team.py --sprint phase1-hardening
python scripts/ai_team.py --sprint docs-foundation
# Ad-hoc parallel tasks
python scripts/ai_team.py --parallel \
"rust-server-dev: Fix rate limiting bypass in enqueue handler" \
"security-auditor: Review the rate limiting implementation"
# Discovery
python scripts/ai_team.py --list-agents
python scripts/ai_team.py --list-sprints
```
### Recommended Production Readiness Sequence
```bash
# 1. Assess current state
python scripts/ai_team.py --sprint status
# 2. Deep audit
python scripts/ai_team.py --sprint audit
# 3. Fix critical issues (code changes)
python scripts/ai_team.py --sprint phase1-hardening
# 4. Fix infrastructure
python scripts/ai_team.py --sprint phase1-infra
# 5. Build test confidence
python scripts/ai_team.py --sprint phase2-tests
# 6. Write documentation
python scripts/ai_team.py --sprint docs-foundation
# 7. Final security review (after all code changes)
python scripts/ai_team.py --sprint security-review
# 8. Prepare release
python scripts/ai_team.py --sprint release-prep
```
---
## Quality Gates
Every sprint must pass its quality gate before the next sprint begins.
| Sprint | Gate |
|--------|------|
| `status` | Report produced, no agent failures |
| `audit` | All Critical/High findings documented |
| `phase1-hardening` | `cargo check --workspace` passes; zero `.unwrap()` outside `#[cfg(test)]` |
| `phase1-infra` | `docker build -f docker/Dockerfile .` succeeds; `.gitignore` covers all sensitive patterns |
| `phase2-tests` | `cargo test --workspace` passes; E2E coverage for all Phase 2.1 items |
| `docs-foundation` | `mdbook build docs/` succeeds; `SECURITY.md` and `CONTRIBUTING.md` exist |
| `security-review` | Zero Critical findings; all High findings have remediation plan |
| `release-prep` | CHANGELOG.md exists; version tags consistent; README quick start verified |
---
## Extending the Team
To add a new agent:
1. Define it in `AGENTS` dict in `scripts/ai_team.py`
2. Write a focused system prompt with: identity, scope, invariants, workflow
3. Specify the minimal tool set (prefer read-only when possible)
4. Add it to relevant sprints
5. Document it in this file
To add a new sprint:
1. Define it in `SPRINTS` dict in `scripts/ai_team.py`
2. Ensure all tasks within the sprint touch different files/crates
3. Document the quality gate
4. Add it to the dependency chain if it has ordering requirements
---
*quicproquo AI Agent Team — v2.0 | 2026-03-03*

@@ -1,106 +0,0 @@
# Multi-Agent Work Plan: Sections 1 (Security) + 5 (Features)
This document splits work for **Future Improvements §1 (Security and hardening)** and **§5 (Features and product)** between two agents so they can work in parallel with minimal merge conflicts.
---
## Agent A: Security and hardening
**Owns:** Server auth/OPAQUE, TLS config, core crypto (identity, keypackage, hybrid_kem), docs under `docs/src/cryptography/` and TLS/cert docs.
### A1. 1.2 CA-signed TLS / certificate lifecycle
- **Files:** `docs/src/getting-started/` (new or existing), `crates/quicproquo-server/src/tls.rs` (optional env), `README.md`.
- **Tasks:**
1. Add **Certificate lifecycle** doc: using CA-issued certs (e.g. Let's Encrypt), cert rotation, OCSP/CRL optional. Recommend pinning for single-server.
2. Optional: server config or env to prefer CA-signed cert path (e.g. `QPQ_USE_CA_CERT=1` and read from a different path). Low priority if docs suffice.
- **Deliverable:** `docs/src/getting-started/certificate-lifecycle.md` (or section in running-the-server) + README link.
### A2. 1.4 Username enumeration (OPAQUE)
- **Files:** `crates/quicproquo-server/src/node_service/auth_ops.rs`, `docs/SECURITY-AUDIT.md`.
- **Tasks:**
1. Document the risk in SECURITY-AUDIT (already mentioned).
2. Optional mitigation: ensure `get_user_record` is always called before `ServerLogin::start` (already true). If desired, add a constant-time delay or dummy work when user not found so response timing does not leak existence. Keep OPAQUE security unchanged.
- **Deliverable:** Doc update; optional small code change in `handle_opaque_login_start`.
### A3. 1.1 M7 — Post-quantum MLS
- **Files:** `crates/quicproquo-core/src/` (new or modified crypto provider), `crates/quicproquo-core/src/group.rs`, `crates/quicproquo-core/src/hybrid_kem.rs`, `crates/quicproquo-core/src/hybrid_crypto.rs`.
- **Tasks:**
1. Implement a custom `OpenMlsCryptoProvider` (or adapter) that uses hybrid X25519 + ML-KEM-768 for MLS KEM (HPKE layer).
2. Wire hybrid shared secret derivation (see milestones M7) into the provider.
3. Run full test suite; ensure M3/M4/M5 tests pass.
- **Deliverable:** Hybrid KEM in MLS path; tests green. Large change; coordinate with core crate.
### A4. 1.3 Stronger credential binding
- **Files:** Docs only for now.
- **Tasks:** Add a short **Future research** subsection or ADR: X.509-based MLS credentials, or Key Transparency for public key binding. No code change in this round.
- **Deliverable:** `docs/src/roadmap/future-research.md` or ADR update.
---
## Agent B: Features and product
**Owns:** Cap'n Proto schema (node.capnp delivery/channel methods), server storage (Store trait, FileBackedStore, SqlStore), `node_service/delivery.rs`, `node_service/key_ops.rs` (if createChannel lives there), client commands for channels.
### B1. 5.1 Private 1:1 channels (DM)
- **Files:** `schemas/node.capnp`, `crates/quicproquo-server/src/storage.rs`, `crates/quicproquo-server/src/sql_store.rs`, `crates/quicproquo-server/src/node_service/delivery.rs`, new `crates/quicproquo-server/src/node_service/channel_ops.rs` (or add to delivery), migrations for channels table.
- **Tasks:**
1. **Schema:** Add `createChannel @N (auth :Auth, peerKey :Data) -> (channelId :Data);` to `node.capnp`. Rebuild proto.
2. **Store trait:** Add `create_channel(&self, member_a: &[u8], member_b: &[u8]) -> Result<Vec<u8>, StorageError>`, `get_channel_members(&self, channel_id: &[u8]) -> Result<Option<(Vec<u8>, Vec<u8>)>, StorageError>`. Implement in FileBackedStore (in-memory map channel_id -> (a, b)) and SqlStore (channels table, unique on sorted (a,b)).
3. **Server:** Implement `handle_create_channel`: auth required, identity required; create channel with (caller_identity, peer_key); return 16-byte channel_id (e.g. UUID).
4. **Delivery authz:** When `channel_id.len() == 16`: call `get_channel_members`. If Some((a, b)), verify caller identity is one of a/b and recipient_key is the other. If channel not found or authz fails, return E022 (or new code). Legacy: `channel_id` empty = current behaviour (no channel check).
5. **Config:** Optional server flag to require channel authz for non-empty channel_id (default on).
- **Deliverable:** createChannel RPC, channel storage, per-channel authz on enqueue/fetch/fetchWait; legacy mode when channel_id empty.
- **Ref:** [DM channels design](src/roadmap/dm-channels.md).
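The delivery authz rule in task 4 and the "unique on sorted (a,b)" rule in task 2 could look roughly like the sketch below. It uses an in-memory map in place of `get_channel_members`; `AuthzError`, `ChannelMap`, `sorted_pair`, and `authorize_delivery` are hypothetical names, and rejecting channel ids that are neither empty nor 16 bytes is an assumption the plan does not state.

```rust
use std::collections::HashMap;

/// Hypothetical error for this sketch; the plan mentions E022 or a new code.
#[derive(Debug, PartialEq, Eq)]
pub enum AuthzError {
    ChannelNotFound,
    NotAMember,
    BadChannelId,
}

/// In-memory stand-in for `get_channel_members`: channel_id -> (member_a, member_b).
pub type ChannelMap = HashMap<Vec<u8>, (Vec<u8>, Vec<u8>)>;

/// Order the pair so that (a, b) and (b, a) produce the same stored value.
pub fn sorted_pair(a: &[u8], b: &[u8]) -> (Vec<u8>, Vec<u8>) {
    if a <= b {
        (a.to_vec(), b.to_vec())
    } else {
        (b.to_vec(), a.to_vec())
    }
}

/// Check that `caller` and `recipient` are the two members of the channel.
pub fn authorize_delivery(
    channels: &ChannelMap,
    channel_id: &[u8],
    caller: &[u8],
    recipient: &[u8],
) -> Result<(), AuthzError> {
    match channel_id.len() {
        // Legacy mode: empty channel_id means no channel check.
        0 => Ok(()),
        16 => {
            let (a, b) = channels
                .get(channel_id)
                .ok_or(AuthzError::ChannelNotFound)?;
            let forward = caller == a.as_slice() && recipient == b.as_slice();
            let reverse = caller == b.as_slice() && recipient == a.as_slice();
            if forward || reverse {
                Ok(())
            } else {
                Err(AuthzError::NotAMember)
            }
        }
        // Assumption of this sketch: any other length is rejected.
        _ => Err(AuthzError::BadChannelId),
    }
}
```

Storing the pair in sorted order lets a uniqueness constraint treat a channel between A and B the same as one between B and A.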
### B2. 5.2 MLS lifecycle (remove, update, proposals)
- **Files:** `crates/quicproquo-core/src/group.rs`, client commands that use GroupMember.
- **Tasks:**
1. Add `remove_member` (by index or identity) and `update_credential` / rekey using openmls APIs.
2. Handle incoming MLS proposals (Remove, Update) in `receive_message` path and apply to group state.
3. CLI: `remove` and `update` subcommands or options.
- **Deliverable:** Members can be removed and credentials updated; proposals handled; CLI exposed.
- **Ref:** OpenMLS API, e.g. `MlsGroup::remove_members` and `MlsGroup::commit_to_pending_proposals`; verify the exact method names against the pinned `openmls` version.
### B3. 5.3 Sealed Sender and 5.4 Traffic analysis
- **Files:** Docs; optionally `crates/quicproquo-server`, `crates/quicproquo-client` for padding.
- **Tasks:**
1. Document current `sealed_sender` behaviour (enqueue without identity binding) and that full “sender in ciphertext” is a future protocol change.
2. Optional: add optional payload padding (e.g. pad to next 256 bytes) or random delay in client send path for 5.4.
- **Deliverable:** Doc update; optional padding/behaviour.
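A minimal sketch of the optional padding from task 2, assuming a framing that the plan does not specify: a 4-byte big-endian length prefix, the payload, then zero fill up to the next multiple of 256 bytes. `PAD_BLOCK`, `pad`, and `unpad` are hypothetical names.

```rust
/// Block size suggested in the plan ("pad to next 256 bytes").
pub const PAD_BLOCK: usize = 256;

/// Pad `payload` so the result length is a multiple of `PAD_BLOCK`.
/// Returns `None` if the payload is too large for the 4-byte length prefix.
pub fn pad(payload: &[u8]) -> Option<Vec<u8>> {
    if payload.len() > u32::MAX as usize {
        return None;
    }
    let unpadded = 4 + payload.len();
    // Round up to the next multiple of PAD_BLOCK.
    let padded = (unpadded + PAD_BLOCK - 1) / PAD_BLOCK * PAD_BLOCK;
    let mut out = Vec::with_capacity(padded);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out.resize(padded, 0);
    Some(out)
}

/// Recover the original payload, or `None` if the framing is malformed.
pub fn unpad(padded: &[u8]) -> Option<&[u8]> {
    if padded.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([padded[0], padded[1], padded[2], padded[3]]) as usize;
    let end = 4usize.checked_add(len)?;
    // `get` returns None if the declared length runs past the buffer.
    padded.get(4..end)
}
```

Padding only hides length within a 256-byte bucket, and it must be applied before encryption so the length prefix is not visible on the wire.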
---
## File ownership (avoid conflicts)
| Area | Agent A | Agent B |
|------|---------|---------|
| `schemas/node.capnp` | — | Add createChannel |
| `crates/quicproquo-server/src/node_service/auth_ops.rs` | 1.4 username enum | — |
| `crates/quicproquo-server/src/node_service/delivery.rs` | — | 5.1 channel authz |
| `crates/quicproquo-server/src/storage.rs` | — | 5.1 Store channel methods |
| `crates/quicproquo-server/src/sql_store.rs` | — | 5.1 channels table + impl |
| `crates/quicproquo-server/src/tls.rs` | 1.2 optional | — |
| `crates/quicproquo-core/` | 1.1 M7, 1.3 doc | 5.2 group.rs |
| `docs/` | 1.2, 1.3, 1.4, 5.3/5.4 | — (or shared) |
**Shared:** `docs/`, `README.md`. Prefer non-overlapping files (e.g. A adds `certificate-lifecycle.md`, B does not edit it).
---
## Order of operations (recommended)
1. **Both:** Sync on schema and Store trait changes so B adds `createChannel` and channel methods without A touching the same trait.
2. **Agent A:** Ship A1 (CA/TLS docs) and A2 (1.4 doc + optional code) first; then A3 (M7) in a follow-up PR/batch.
3. **Agent B:** Ship B1 (createChannel + channel authz) first; then B2 (MLS remove/update); then B3 (docs/padding).
---
## Completion checklist
- [ ] A1: CA-signed TLS / certificate lifecycle doc
- [ ] A2: Username enumeration doc and/or mitigation
- [ ] A3: M7 hybrid KEM in MLS provider
- [ ] A4: 1.3 credential binding (docs)
- [ ] B1: createChannel RPC + channel storage + delivery authz
- [ ] B2: MLS remove/update and proposal handling
- [ ] B3: Sealed Sender and traffic analysis (docs + optional padding)

@@ -1,317 +0,0 @@
# Consolidated Codebase Review — quicproquo
**Date:** 2026-03-04
**Reviewers:** 4 independent agents (security, architecture, code quality, correctness)
**Scope:** Full codebase — all workspace crates, schemas, Cargo.toml
---
## CRITICAL (7 findings)
### C1. Federation service has NO authentication on inbound requests
**Source:** Security | **File:** `crates/quicproquo-server/src/federation/service.rs:22-201`
`FederationServiceImpl` handles inbound federation requests (`relay_enqueue`, `relay_batch_enqueue`, `proxy_fetch_key_package`, `proxy_resolve_user`) but performs **zero authentication** on the caller. The `auth` field in the request is only logged (`origin` string), never validated. While mTLS protects the transport, any server with a valid federation certificate can inject arbitrary messages, enumerate users, and fetch KeyPackages. The `FederationAuth.origin` field is a self-declared string, not verified against the mTLS certificate's subject.
**Fix:** Validate `origin` against the mTLS client certificate's CN/SAN. Enforce per-peer rate limits. Consider signing federation messages at the application layer.
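The first part of the fix could be sketched as below, assuming the DNS names have already been extracted from the peer's verified mTLS certificate. That extraction is out of scope here; `origin_matches_cert` and its `cert_dns_names` parameter are hypothetical. The comparison is an exact, case-insensitive match with no wildcard support.

```rust
/// Return true only if the self-declared `origin` equals one of the DNS names
/// taken from the peer's already-verified client certificate.
pub fn origin_matches_cert(origin: &str, cert_dns_names: &[&str]) -> bool {
    // Normalise: DNS names are case-insensitive and may carry a trailing dot.
    let origin = origin.trim_end_matches('.').to_ascii_lowercase();
    if origin.is_empty() {
        return false;
    }
    cert_dns_names
        .iter()
        .any(|name| name.trim_end_matches('.').to_ascii_lowercase() == origin)
}
```

A handler would reject the request before any store access when this returns `false`, rather than only logging the mismatch.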
### C2. WebSocket bridge bypasses DM channel authorization
**Source:** Security | **File:** `crates/quicproquo-server/src/ws_bridge.rs:230-305`
The `handle_send` function lets any authenticated user enqueue a message to any recipient, bypassing the DM channel membership verification that the Cap'n Proto `enqueue` path enforces in `delivery.rs:93-135`. The WS bridge calls `store.enqueue()` directly, skipping channel membership auth, payload size limits (5 MB), rate limiting, hook invocations, delivery proof generation, and audit logging.
**Fix:** Apply the same authorization, size limits, rate limiting, hooks, and audit logging as the Cap'n Proto delivery path.
### C3. `hpke_seal` silently returns empty ciphertext on error
**Source:** Quality | **File:** `crates/quicproquo-core/src/hybrid_crypto.rs:198-201,216-219`
`HybridCrypto::hpke_seal()` catches errors and returns `Ok(vec![])` instead of propagating. Empty ciphertexts are sent as if valid — data loss and security issue.
**Fix:** Propagate errors via `Result`.
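The difference between the reported pattern and the fix can be shown with a stand-in. Nothing below is real HPKE: `inner_seal` is a placeholder that only fails on an empty key, and `SealError`, `seal_swallowing`, and `seal` are hypothetical names.

```rust
use std::fmt;

#[derive(Debug, PartialEq, Eq)]
pub enum SealError {
    InvalidPublicKey,
}

impl fmt::Display for SealError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "invalid recipient public key")
    }
}

impl std::error::Error for SealError {}

/// Placeholder for the real operation. NOT encryption.
fn inner_seal(public_key: &[u8], plaintext: &[u8]) -> Result<Vec<u8>, SealError> {
    if public_key.is_empty() {
        return Err(SealError::InvalidPublicKey);
    }
    Ok(plaintext.iter().rev().cloned().collect())
}

/// The pattern described in C3: the error is swallowed and an empty
/// "ciphertext" is returned as if sealing had succeeded.
pub fn seal_swallowing(public_key: &[u8], plaintext: &[u8]) -> Result<Vec<u8>, SealError> {
    Ok(inner_seal(public_key, plaintext).unwrap_or_default())
}

/// The fix: let the caller see the failure.
pub fn seal(public_key: &[u8], plaintext: &[u8]) -> Result<Vec<u8>, SealError> {
    inner_seal(public_key, plaintext)
}
```

With the swallowing variant a caller cannot tell a failed seal from a valid empty ciphertext; with the second variant the failure reaches the caller.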
### C4. NodeServiceImpl god object (15 fields, 27 RPC methods, no capability segmentation)
**Source:** Architecture | **File:** `crates/quicproquo-server/src/node_service/mod.rs:253-336`
Single struct implements 27 methods spanning auth, delivery, key management, channels, blobs, devices, federation, and account lifecycle. `handle_node_connection` takes 15 parameters. Unauthenticated clients get capability to invoke all 27 methods.
**Fix:** Extract `ServerContext` struct. Split Cap'n Proto schema into capability interfaces (AuthService, DeliveryService, etc.) vended after auth.
### C5. FileBackedStore O(n) full-map serialization on every mutation
**Source:** Architecture | **File:** `crates/quicproquo-server/src/storage.rs:327-442`
Every write locks Mutex, mutates HashMap, serializes **entire** map to disk via `fs::write`. No fsync, no atomic rename. Performance cliff and data-loss vector.
**Fix:** Make SqlStore the default. If FileBackedStore remains for dev, use write-to-temp-then-rename.
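A minimal sketch of write-to-temp-then-rename using only the standard library. `write_atomic` and the `.tmp` suffix are choices made for this example, not the existing `FileBackedStore` code.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `bytes` to `path` so that readers see either the old file or the
/// complete new file, never a partially written one.
pub fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Temp file lives next to the target so the rename stays on one filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(bytes)?;
        // Flush file contents to disk before the rename makes them visible.
        tmp.sync_all()?;
    }

    fs::rename(&tmp_path, path)
}
```

This addresses the torn-write risk but not the O(n) serialization cost. For full crash durability on Unix the parent directory usually needs an fsync as well, which this sketch omits.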
### C6. `std::sync::Mutex` in async context (server and P2P)
**Source:** Architecture | **Files:** `storage.rs:1-7,25-28`, `sql_store.rs:4,50`, `node_service/mod.rs:272`
Holding `std::sync::Mutex` across disk I/O blocks Tokio worker threads, causing head-of-line blocking.
**Fix:** Replace with `tokio::sync::Mutex` or use `spawn_blocking` for disk I/O.
### C7. `hpke_setup_sender_and_export` silently downgrades to classical crypto on parse error
**Source:** Security | **File:** `crates/quicproquo-core/src/hybrid_crypto.rs:263`
On invalid hybrid public key, silently falls back to classical RustCrypto provider. Malicious server can force PQ downgrade.
**Fix:** Return an error on hybrid key parse failure, matching the error propagation recommended for `hpke_seal` in C3.
---
## HIGH (14 findings)
### H1. Global `AUTH_CONTEXT: RwLock<Option<ClientAuth>>`
**Source:** Architecture + Quality | **File:** `crates/quicproquo-client/src/lib.rs:36,40,82`
Blocks multi-account, creates hidden coupling, root cause of `AUTH_LOCK` test serialization hack.
**Fix:** Replace with `ClientContext` struct passed to all functions.
### H2. Store trait is 30+ method monolith
**Source:** Architecture | **File:** `crates/quicproquo-server/src/storage.rs:33-180`
Any new storage backend must implement every method. Cannot be composed or partially implemented.
**Fix:** Split into sub-traits: `KeyPackageStore`, `DeliveryStore`, `UserStore`, `ChannelStore`, etc.
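One possible shape for the split, as a sketch. The trait names `KeyPackageStore` and `DeliveryStore` come from the fix above; every method signature, the `&mut self` receivers, `StorageError` as a tuple struct, `relay`, and `MemoryQueue` are assumptions made for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub struct StorageError(pub String);

pub trait KeyPackageStore {
    fn put_key_package(&mut self, identity: &[u8], kp: Vec<u8>) -> Result<(), StorageError>;
    fn take_key_package(&mut self, identity: &[u8]) -> Result<Option<Vec<u8>>, StorageError>;
}

pub trait DeliveryStore {
    fn enqueue(&mut self, recipient: &[u8], payload: Vec<u8>) -> Result<(), StorageError>;
    fn fetch(&mut self, recipient: &[u8]) -> Result<Vec<Vec<u8>>, StorageError>;
}

/// Umbrella trait for code that still needs everything; any type that
/// implements all sub-traits gets it for free through the blanket impl.
pub trait Store: KeyPackageStore + DeliveryStore {}
impl<T: KeyPackageStore + DeliveryStore> Store for T {}

/// A handler that only delivers messages asks for just that capability.
pub fn relay<S: DeliveryStore>(
    store: &mut S,
    recipient: &[u8],
    payload: Vec<u8>,
) -> Result<(), StorageError> {
    store.enqueue(recipient, payload)
}

/// A partial backend: implements delivery only, which the monolith forbids.
#[derive(Default)]
pub struct MemoryQueue {
    queues: HashMap<Vec<u8>, Vec<Vec<u8>>>,
}

impl DeliveryStore for MemoryQueue {
    fn enqueue(&mut self, recipient: &[u8], payload: Vec<u8>) -> Result<(), StorageError> {
        self.queues.entry(recipient.to_vec()).or_default().push(payload);
        Ok(())
    }

    fn fetch(&mut self, recipient: &[u8]) -> Result<Vec<Vec<u8>>, StorageError> {
        // Drain the recipient's queue; an unknown recipient yields an empty list.
        Ok(self.queues.remove(recipient).unwrap_or_default())
    }
}
```

Test doubles and specialised backends can then implement only the sub-traits a given handler requires.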
### H3. CoreError::Mls wraps errors as String, losing type info
**Source:** Architecture | **File:** `crates/quicproquo-core/src/error.rs:16-17`
Impossible to match on specific MLS error conditions.
**Fix:** Create MLS sub-error variants or wrap boxed error.
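The "wrap boxed error" option might look like the following sketch. `FakeMlsError` is a hypothetical stand-in for whatever concrete error the MLS library returns, and `is_epoch_mismatch` is an invented helper showing that callers can match on the cause again.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for a concrete MLS library error (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum FakeMlsError {
    EpochMismatch,
    UnknownMember,
}

impl fmt::Display for FakeMlsError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{:?}", self)
    }
}

impl Error for FakeMlsError {}

#[derive(Debug)]
pub enum CoreError {
    /// Keeps the original error value instead of its string rendering.
    Mls(Box<dyn Error + Send + Sync>),
}

impl fmt::Display for CoreError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            CoreError::Mls(inner) => write!(f, "MLS error: {}", inner),
        }
    }
}

impl Error for CoreError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            CoreError::Mls(inner) => {
                let src: &(dyn Error + 'static) = &**inner;
                Some(src)
            }
        }
    }
}

/// Callers can recover the concrete type by downcasting.
pub fn is_epoch_mismatch(err: &CoreError) -> bool {
    match err {
        CoreError::Mls(inner) => matches!(
            inner.downcast_ref::<FakeMlsError>(),
            Some(FakeMlsError::EpochMismatch)
        ),
    }
}
```

Dedicated sub-error variants give exhaustive matching at compile time; the boxed form is less invasive but relies on downcasting at runtime.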
### H4. Proto `from_bytes` uses default 64 MiB traversal limit
**Source:** Architecture | **File:** `crates/quicproquo-proto/src/lib.rs:67-72`
DoS amplification vector for direct callers (client, bot, FFI).
**Fix:** Accept `ReaderOptions` as parameter, make default stricter.
### H5. Mobile crate hardcodes SkipServerVerification
**Source:** Architecture + Security | **File:** `crates/quicproquo-mobile/src/lib.rs:93-100,165-172`
Unconditionally skips TLS verification. Inherently MITM-vulnerable.
**Fix:** Add certificate_verifier parameter or feature flag.
### H6. Duplicate InsecureServerCertVerifier implementations
**Source:** Architecture | **Files:** `client/rpc.rs:27-29`, `mobile/lib.rs:165-167`
**Fix:** Consolidate into shared crate behind `cfg(feature = "insecure")`.
### H7. DiskKeyStore writes HPKE private keys to disk unencrypted
**Source:** Security | **File:** `crates/quicproquo-core/src/keystore.rs`
No encryption, no file permissions. HPKE private keys are MLS epoch secrets.
**Fix:** Encrypt with Argon2id + ChaCha20-Poly1305. Set 0o600 permissions.
### H8. `identity.rs:seed_bytes()` returns unzeroized copy of secret seed
**Source:** Security | **File:** `crates/quicproquo-core/src/identity.rs:52`
Copies 32-byte Ed25519 seed out of `Zeroizing` wrapper. Returned value not zeroized.
**Fix:** Return `&[u8]` reference or `Zeroizing<[u8; 32]>`.
### H9. `hybrid_kem.rs:private_to_bytes()` returns unzeroized `Vec<u8>`
**Source:** Security | **File:** `crates/quicproquo-core/src/hybrid_kem.rs:162`
Hybrid private key material lingers in memory.
**Fix:** Return `Zeroizing<Vec<u8>>`.
### H10. MeshIdentity stores Ed25519 seed as plaintext JSON
**Source:** Security | **File:** `crates/quicproquo-p2p/src/identity.rs:72-79`
No encryption, no file permissions.
**Fix:** Encrypt identity file, set 0o600 permissions.
### H11. WebSocket bridge has rate_limits field but never checks it
**Source:** Security | **File:** `crates/quicproquo-server/src/ws_bridge.rs:28-36`
**Fix:** Apply `check_rate_limit()` in all WS bridge handlers.
### H12. ~100 lines duplicated between `receive_message` and `receive_message_with_sender`
**Source:** Quality | **File:** `crates/quicproquo-core/src/group.rs:471-583`
Bug fix in one must be manually replicated to other. Security-critical MLS code.
**Fix:** Extract shared MLS message processing helper.
### H13. Synchronous file I/O in async blob handler
**Source:** Quality | **File:** `crates/quicproquo-server/src/node_service/blob_ops.rs:124-137`
Blocking `std::fs` calls in async handler stall event loop.
**Fix:** Use `tokio::fs` or `spawn_blocking`.
### H14. `MeshEnvelope::forwarded()` invalidates signature without re-signing
**Source:** Quality + Correctness + Security | **File:** `crates/quicproquo-p2p/src/envelope.rs:172-176`
Increments `hop_count`, which is included in the signed bytes, so every forwarded envelope fails `verify()`.
**Fix:** Exclude `hop_count` from signature, or add separate forwarding signature.
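The first option (excluding `hop_count` from the signed bytes) could be sketched as follows. The `Envelope` layout and the `signable_bytes` helper are assumptions for illustration; the real `MeshEnvelope` fields may differ, and the actual Ed25519 signing step is omitted.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Envelope {
    pub sender: [u8; 32],
    pub payload: Vec<u8>,
    pub hop_count: u8,
}

impl Envelope {
    /// Bytes covered by the sender's signature.
    /// `hop_count` is deliberately left out so relays can change it.
    pub fn signable_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(32 + 8 + self.payload.len());
        out.extend_from_slice(&self.sender);
        // Length prefix keeps the encoding unambiguous.
        out.extend_from_slice(&(self.payload.len() as u64).to_be_bytes());
        out.extend_from_slice(&self.payload);
        out
    }

    /// Returns a copy with the hop counter bumped; signed bytes are unchanged.
    pub fn forwarded(&self) -> Self {
        Envelope {
            hop_count: self.hop_count.saturating_add(1),
            ..self.clone()
        }
    }
}
```

The trade-off is that an unsigned `hop_count` can be altered by any relay, so receivers would need to treat it as a hint and enforce their own upper bound.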
---
## MEDIUM (19 findings)
### M1. fetch_wait TOCTOU: missed notification window
**Source:** Correctness | **File:** `crates/quicproquo-server/src/node_service/delivery.rs:496-522`
TOCTOU between initial fetch (empty) and waiter registration. Enqueue between these points fires notify before waiter exists.
**Fix:** Register waiter before initial fetch.
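The ordering in the fix can be shown with a small blocking sketch. This uses `std::sync` primitives rather than the server's async notify mechanism, and the `Inbox` type and its methods are hypothetical stand-ins for the delivery code.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Mutex, MutexGuard, PoisonError};
use std::time::Duration;

#[derive(Default)]
struct Inner {
    queue: VecDeque<Vec<u8>>,
    waiters: Vec<Sender<()>>,
}

#[derive(Default)]
pub struct Inbox {
    inner: Mutex<Inner>,
}

impl Inbox {
    fn lock(&self) -> MutexGuard<Inner> {
        self.inner.lock().unwrap_or_else(PoisonError::into_inner)
    }

    fn drain(&self) -> Vec<Vec<u8>> {
        let mut inner = self.lock();
        let drained: Vec<Vec<u8>> = inner.queue.drain(..).collect();
        drained
    }

    pub fn enqueue(&self, msg: Vec<u8>) {
        let mut inner = self.lock();
        inner.queue.push_back(msg);
        // Wake every registered waiter; stale ones (receiver gone) are ignored.
        for waiter in inner.waiters.drain(..) {
            let _ = waiter.send(());
        }
    }

    pub fn fetch_wait(&self, timeout: Duration) -> Vec<Vec<u8>> {
        // 1. Register the waiter FIRST, so an enqueue from here on
        //    always finds someone to notify.
        let rx = {
            let (tx, rx) = channel();
            self.lock().waiters.push(tx);
            rx
        };
        // 2. Only then do the initial fetch.
        let first = self.drain();
        if !first.is_empty() {
            return first;
        }
        // 3. Queue was empty: wait for a notification or the timeout.
        let _ = rx.recv_timeout(timeout);
        self.drain()
    }
}
```

Because the waiter exists before the first fetch, a message enqueued between steps 1 and 2 is picked up by step 2, and one enqueued after step 2 triggers the notification in step 3.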
### M2. `verify_transcript_chain` never checks hashes — misleading name
**Source:** Correctness | **File:** `crates/quicproquo-core/src/transcript.rs:215-251`
Only validates structural integrity, not hash chain. Name implies verification.
**Fix:** Rename to `validate_transcript_structure` or implement actual chain verification.
### M3. Non-atomic file writes in FileBackedStore
**Source:** Correctness | **File:** `crates/quicproquo-server/src/storage.rs:332-337` (all flush_* methods)
`fs::write` directly — crash mid-write corrupts file, loses all data.
**Fix:** Use `tempfile::NamedTempFile` + `persist()`.
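For reference, the write-then-rename pattern behind that fix can be sketched with the standard library alone. The `write_atomic` helper and the fixed `.tmp` suffix are assumptions; the `tempfile` crate named in the fix would additionally give a unique temporary name, which matters if two writers target the same path.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `data` to `path` so that readers see either the old file or the
/// complete new file, never a partial write.
pub fn write_atomic(path: &Path, data: &[u8]) -> io::Result<()> {
    // The temp file lives next to the target so the rename stays on one filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp"); // hypothetical naming scheme
    let tmp = PathBuf::from(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(data)?;
    // Flush file contents to disk before making them visible under `path`.
    file.sync_all()?;
    drop(file);

    // Replace the target in a single step.
    fs::rename(&tmp, path)
}
```

A crash before the rename leaves the previous file untouched. Full durability of the rename itself may also require syncing the parent directory, which this sketch does not do.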
### M4. `delete_account` non-atomic multi-lock
**Source:** Correctness | **File:** `crates/quicproquo-server/src/storage.rs:800-864`
6 sequential Mutex locks. Concurrent fetch could see partially deleted account.
**Fix:** Use single transaction or hold all locks simultaneously.
### M5. Timing side channel in `resolveIdentity` — no timing floor
**Source:** Security | **File:** `crates/quicproquo-server/src/node_service/user_ops.rs:142-178`
Unlike `resolveUser` which has 5ms floor.
**Fix:** Apply same `RESOLVE_TIMING_FLOOR`.
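A timing floor of this kind can be sketched as a small wrapper. `RESOLVE_TIMING_FLOOR` and the 5 ms value are taken from the finding; the `with_timing_floor` helper is hypothetical, and it uses a blocking `thread::sleep` where an async handler would await a timer instead.

```rust
use std::thread;
use std::time::{Duration, Instant};

pub const RESOLVE_TIMING_FLOOR: Duration = Duration::from_millis(5);

/// Runs `op` and pads the total duration up to `floor`, so fast paths
/// (e.g. "user not found") take at least as long as the floor.
pub fn with_timing_floor<T>(floor: Duration, op: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = op();
    // `checked_sub` returns None when the operation already exceeded the floor.
    if let Some(remaining) = floor.checked_sub(start.elapsed()) {
        thread::sleep(remaining);
    }
    result
}
```

The floor only hides differences that are smaller than the floor itself; lookups that take longer than 5 ms would still be distinguishable.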
### M6. WS bridge `resolve_user` has no timing floor
**Source:** Security | **File:** `crates/quicproquo-server/src/ws_bridge.rs:158-181`
**Fix:** Add same timing floor as Cap'n Proto handler.
### M7. `AuthContext.token` not zeroized
**Source:** Security | **File:** `crates/quicproquo-server/src/auth.rs:68-72`
**Fix:** Wrap in `Zeroizing<Vec<u8>>`.
### M8. Client `ClientAuth.access_token` not zeroized
**Source:** Security | **File:** `crates/quicproquo-client/src/lib.rs:50-55`
**Fix:** Use `Zeroizing<Vec<u8>>`.
### M9. `SessionState.password` stores plaintext password in memory
**Source:** Security | **File:** `crates/quicproquo-client/src/client/session.rs:29`
**Fix:** Use `Zeroizing<String>`, derive key at startup and zeroize password.
### M10. `conversation.rs:172` hex-encodes derived key without zeroization
**Source:** Security | **File:** `crates/quicproquo-client/src/client/conversation.rs:172`
**Fix:** Use `Zeroizing<String>` for `hex_key`.
### M11. `device_ops.rs:49` uses `.unwrap_or("")` on untrusted input
**Source:** Security | **File:** `crates/quicproquo-server/src/node_service/device_ops.rs:49`
**Fix:** Return error for invalid UTF-8.
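A minimal sketch of the stricter parsing. The finding does not say which field `device_ops.rs:49` decodes, so the "device name" field, the `parse_device_name` function, and the error type are all assumptions.

```rust
use std::str;

#[derive(Debug, PartialEq)]
pub enum DeviceNameError {
    InvalidUtf8,
}

/// Rejects invalid UTF-8 instead of silently mapping it to "".
pub fn parse_device_name(raw: &[u8]) -> Result<&str, DeviceNameError> {
    str::from_utf8(raw).map_err(|_| DeviceNameError::InvalidUtf8)
}
```

The handler can then turn the error into an explicit RPC failure, so malformed input is not treated the same as an empty value.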
### M12. `MeshEnvelope::forwarded()` invalidates signature (duplicate of H14)
**Source:** Security | **File:** `crates/quicproquo-p2p/src/envelope.rs:172-176`
### M13. FileBackedStore `create_channel` O(n) linear scan
**Source:** Architecture + Quality + Correctness | **File:** `crates/quicproquo-server/src/storage.rs:749-765`
**Fix:** Secondary index or deterministic channel ID from member pair hash.
### M14. `resolve_identity_key` O(n) linear scan
**Source:** Architecture + Quality | **File:** `crates/quicproquo-server/src/storage.rs:676-684`
**Fix:** Maintain reverse map.
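One possible shape for the reverse map, assuming the lookup goes from identity key to username (the finding does not state the direction). `IdentityIndex` and its fields are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct IdentityIndex {
    by_user: HashMap<String, Vec<u8>>,
    by_key: HashMap<Vec<u8>, String>, // reverse map, kept in sync on insert
}

impl IdentityIndex {
    pub fn insert(&mut self, username: &str, identity_key: &[u8]) {
        // If the user already had a key, drop its stale reverse entry.
        if let Some(old) = self.by_user.insert(username.to_owned(), identity_key.to_vec()) {
            self.by_key.remove(&old);
        }
        self.by_key.insert(identity_key.to_vec(), username.to_owned());
    }

    /// O(1) expected lookup instead of scanning every user.
    pub fn resolve_identity_key(&self, identity_key: &[u8]) -> Option<&str> {
        self.by_key.get(identity_key).map(String::as_str)
    }
}
```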
### M15. FFI error classification by string matching
**Source:** Architecture | **File:** `crates/quicproquo-ffi/src/lib.rs:183`
**Fix:** Match on typed error variants.
### M16. Documentation drift: master-prompt.md says Noise/TCP, code uses QUIC/TLS
**Source:** Architecture | **Files:** `master-prompt.md`, server/client `Cargo.toml`
**Fix:** Update master-prompt.md to reflect actual transport.
### M17. Plugin `HookVTable` unsafe Send+Sync without safety docs
**Source:** Architecture | **File:** `crates/quicproquo-plugin-api/src/lib.rs:190-192`
**Fix:** Add `// SAFETY:` documentation blocks.
### M18. OPAQUE register_finish: spurious RegistrationRequest deserialization
**Source:** Quality + Correctness | **File:** `crates/quicproquo-server/src/node_service/auth_ops.rs:335-343`
Dead code — deserializes upload_bytes as wrong type first.
**Fix:** Remove lines 335-343.
### M19. Mixed serialization formats in DiskKeyStore (bincode + serde_json)
**Source:** Quality | **File:** `crates/quicproquo-core/src/keystore.rs`
**Fix:** Standardize on one format.
---
## LOW (23 findings)
### L1. `BroadcastChannel.key` not zeroized on drop
**File:** `crates/quicproquo-p2p/src/broadcast.rs:18-19`
### L2. Plugin loader `CStr::from_ptr` on plugin-returned string (UB risk)
**File:** `crates/quicproquo-server/src/plugin_loader.rs:102`
### L3. Token cache stores session token as plaintext hex when no password set
**File:** `crates/quicproquo-client/src/client/token_cache.rs:63-68`
### L4. `--password` CLI flag visible in process list
**File:** `crates/quicproquo-client/src/main.rs:104`
### L5. `clippy::unwrap_used` is warn not deny
**File:** `Cargo.toml:88`
### L6. `strip = "symbols"` hinders post-mortem debugging
**File:** `Cargo.toml:94`
### L7. `thread_rng()` for channel ID generation instead of `OsRng`
**Files:** `storage.rs:760`, `sql_store.rs:545`
### L8. `conversation.rs:548` — `limit` cast from `usize` to `u32` without saturation
**File:** `crates/quicproquo-client/src/client/conversation.rs:548`
### L9. `conversation.rs:363-365,420-422` — bincode deserialize errors silently dropped
**File:** `crates/quicproquo-client/src/client/conversation.rs`
### L10. `repl.rs:610` — static AtomicU64 for padding timer instead of session state
**File:** `crates/quicproquo-client/src/client/repl.rs:610`
### L11. `command_engine.rs:148` — `to_slash()` clones entire Command enum unnecessarily
**File:** `crates/quicproquo-client/src/client/command_engine.rs:148`
### L12. `conversation.rs:201-203` — SQL ATTACH with format string
**File:** `crates/quicproquo-client/src/client/conversation.rs:201-203`
### L13. Client `hex.rs` trivial wrapper with zero value-add
**File:** `crates/quicproquo-client/src/client/hex.rs`
### L14. `config.rs` `#[allow(dead_code)]` on EffectiveFederationConfig
**File:** `crates/quicproquo-server/src/config.rs`
### L15. `federation/address.rs` `#[allow(dead_code)]` on entire module
**File:** `crates/quicproquo-server/src/federation/address.rs`
### L16. ANSI escape codes hardcoded without terminal capability detection
**File:** `crates/quicproquo-client/src/client/display.rs`
### L17. GUI `lib.rs:75` `.expect()` on Tauri run
**File:** `crates/quicproquo-gui/src/lib.rs:75`
### L18. `session.rs:186` DiskKeyStore failure silently falls back to ephemeral
**File:** `crates/quicproquo-client/src/client/session.rs:186`
### L19. `retry.rs:37` `thread_rng()` for jitter instead of `OsRng`
**File:** `crates/quicproquo-client/src/client/retry.rs:37`
### L20. `MeshStore.seen` set grows unboundedly
**File:** `crates/quicproquo-p2p/src/` (MeshStore)
### L21. `envelope.rs:58` `.expect()` on system clock in non-test code
**File:** `crates/quicproquo-p2p/src/envelope.rs:58`
### L22. `broadcast.rs:47` `.expect()` on encryption in non-test code
**File:** `crates/quicproquo-p2p/src/broadcast.rs:47`
### L23. Bot crate hardcodes sender as "peer" (TODO)
**File:** `crates/quicproquo-bot/src/lib.rs`
---
## TESTING GAPS (6 findings)
### T1. No unit tests for `plugin_loader.rs`
### T2. No unit tests for `federation/tls.rs`
### T3. No tests for `blob_ops.rs`
### T4. No tests for `delivery.rs`
### T5. `conversation.rs` migration code untested
### T6. No negative test for `MeshEnvelope::verify()` with wrong key
---
## Strengths (positive findings from security audit)
- Constant-time token comparison via `subtle::ConstantTimeEq`
- Parameterized SQL everywhere (no injection)
- Proper Argon2id (19 MiB, t=2, p=1) + ChaCha20-Poly1305 for client state
- Zero `.unwrap()` in non-test server code (grep-verified)
- TLS 1.3 only, mTLS for federation
- Rate limiting: 100 enqueues/60s, 50 connections/IP/60s
- Delivery proof signing: SHA-256(seq||recipient||timestamp) + Ed25519
- KeyPackage ciphersuite validation (only 0x0001 accepted)
- Payload size limits: 5 MB message, 50 MB blob, 1 MB KeyPackage
- Queue depth limits: 1000/inbox, 100K sessions, 100K waiters
- Blob path traversal protection via hex-encoded hash
- Audit logging with secret redaction (identity keys prefix-only)
- Production config validation (rejects devtoken, empty auth, missing TLS)
---
## Recommended fix priority
1. **Federation auth** (C1) — auth-gate inbound requests, validate origin against mTLS cert
2. **WS bridge authz** (C2) + rate limits (H11) + timing floors (M6) — parity with Cap'n Proto path
3. **Crypto error propagation** (C3, C7) — hpke_seal and hpke_setup_sender_and_export
4. **Zeroization sweep** (H8, H9, H10, M7-M10, L1) — all leaked secret material
5. **ServerContext extraction** (C4) — foundation for capability-based security
6. **FileBackedStore atomic writes** (C5, M3) — prevent data loss on crash
7. **std::sync::Mutex → tokio::sync::Mutex** (C6) — unblock Tokio workers
8. **Mobile TLS verification** (H5) — remove hardcoded skip
9. **fetch_wait TOCTOU** (M1) — register waiter before fetch
10. **Testing gaps** (T1-T6) — critical untested paths


@@ -1,175 +0,0 @@
# Next Sprint Planning — quicproquo
> Pick 8 of the 24 features below for the next sprint cycle.
> Created: 2026-03-04 | Status: PENDING SELECTION
## Completed Sprints (this cycle)
| # | Sprint | Commit | Summary |
|---|--------|--------|---------|
| 4 | Rich Messaging | `81d5e2e` | Read receipts, typing, reactions, edit/delete |
| 5 | File Transfer | `3350d76` | Chunked blob upload/download, /send-file |
| 6 | Disappearing + Groups | `fd21ea6` | TTL messages, /group-info, deleteAccount |
| 7 | Go SDK | `65ff262` | QUIC + Cap'n Proto, 24 RPC methods, 14 API functions |
| 8 | TypeScript SDK | `28ceaaf` | 175KB WASM crypto, WebSocket transport, browser demo |
| 9 | Mesh Networking | `1b61b7e` | MeshIdentity, store-and-forward, broadcast channels |
| 10 | Privacy Hardening | `9244e80` | --redact-logs, traffic padding, /privacy suite, /verify-fs |
| 11 | Multi-Device | `9244e80` | Device registry (3 RPCs), /devices, max 5 per identity |
## Current Codebase Stats
- **27 Cap'n Proto RPCs** (@0–@26) on NodeService
- **10 AppMessage types** (0x01–0x09 + file ref)
- **~40 REPL commands**
- **Tests**: 72 core + 35 server + 28 P2P + 14 E2E = 149
- **SDKs**: Rust (native), Go, TypeScript/WASM, C FFI, Python ctypes
- **Crates**: core, proto, server, client, p2p, bot, gen, kt, plugin-api, gui, mobile, ffi
---
## Feature Candidates (pick 8)
### A. Federation Wiring
**Effort**: Medium | **Area**: Server
Wire the existing outbound federation relay into the actual delivery flow. When a message targets `user@remote.domain`, the server routes via `FederationClient::relay_enqueue()` instead of local store. Add `/federate <domain>` admin command to configure peers. Test with two server instances. Currently all federation code exists but is marked `#[allow(dead_code)]`.
### B. Contact Management & Blocking
**Effort**: Medium | **Area**: Client + Server
Contact list with add/remove/block/unblock. Server-side: `addContact @27`, `removeContact @28`, `blockUser @29`, `listContacts @30` RPCs + contacts table. Client: `/contacts`, `/block <user>`, `/unblock <user>`. Blocked users can't enqueue messages to you (server enforces). Import/export contacts as JSON.
### C. Voice/Video Call Signaling
**Effort**: High | **Area**: Core + Client
WebRTC signaling over MLS for E2E encrypted calls. Add `CallOffer`, `CallAnswer`, `CallIce`, `CallHangup` AppMessage types (0x0A–0x0D). Client REPL: `/call <user>`, `/answer`, `/hangup`. The actual media (audio/video) uses WebRTC peer-to-peer; qpq only handles the encrypted signaling. Include SDP offer/answer exchange and ICE candidate relay.
### D. Encrypted Backup & Restore
**Effort**: Medium | **Area**: Client + Core
Export all local state (message history, keys, group state) as an encrypted archive. Key derivation from user password via Argon2id. Format: encrypted SQLite dump + identity seed + MLS group states. `/backup <path>` and `/restore <path>` commands. Verify integrity on restore. Critical for device migration and disaster recovery.
### E. Group Permissions & Roles
**Effort**: Medium | **Area**: Server + Client
Admin/moderator/member roles within MLS groups. Server-side role storage per channel. Admins can: remove members, rename group, set TTL policy. Moderators can: mute members. Members can: send messages. `/role <user> admin|mod|member`, `/mute <user> <duration>`. Enforced at both server (RPC level) and client (MLS proposal validation).
### F. Key Transparency Audit Client
**Effort**: Medium | **Area**: Client + KT crate
Client-side verification of the KT Merkle log. The KT crate (`quicproquo-kt`) already has the Merkle tree and audit log. Add: `/kt audit <username>` to verify a user's key history is consistent, `/kt monitor` to continuously watch for key changes, `/kt proof <username>` to fetch and verify inclusion proofs. Alert on unexpected key changes (TOFU violation).
### G. Message Search
**Effort**: Low-Medium | **Area**: Client
Full-text search over local encrypted message history. Add FTS5 virtual table to the conversation SQLite DB. `/search <query>` returns matching messages with context, timestamps, and conversation names. `/search <query> in:<conversation>` for scoped search. Highlight matching terms. Index on message insert.
### H. Server Clustering & HA
**Effort**: High | **Area**: Server + Infra
Run multiple qpq-server instances behind a shared state layer. Options: shared PostgreSQL backend (replace SQLite for clustered mode), or Raft consensus for delivery queue. Add `--cluster-peers` flag, health-based leader election, delivery queue synchronization. Docker Compose with 3-node cluster. This is the path to production-scale deployment.
### I. Protocol Compliance Testing
**Effort**: Medium | **Area**: Testing
Comprehensive MLS RFC 9420 compliance test suite. Verify: TreeKEM operations, epoch advancement, proposal/commit sequences, welcome message handling, group context extensions, PSK injection, external joins. Cross-test with other MLS implementations (OpenMLS test vectors). Add to CI. Target: 50+ protocol-level tests covering edge cases.
### J. User Profiles & Status
**Effort**: Low | **Area**: Server + Client
Profile pictures (stored as blobs), display names, status messages ("Available", "Away", custom text), about/bio text. `updateProfile @27` and `fetchProfile @28` RPCs. Profile data is signed by the identity key for authenticity. `/profile set-name <name>`, `/profile set-status <text>`, `/profile set-avatar <path>`, `/profile <username>` to view. Cache profiles locally.
### K. Notification Framework
**Effort**: Medium | **Area**: Server + Client
Per-conversation notification settings: all, mentions-only, muted. Server-side WebPush integration for browser clients (using the TS SDK). Add `updateNotificationSettings @27` RPC. Client: `/mute <conversation>`, `/unmute`, `/notify mentions-only`. Push notification payload: encrypted sender + conversation hint (no message content). APNs/FCM gateway as a separate microservice.
### L. Mobile App Shell
**Effort**: High | **Area**: Mobile + FFI
React Native app using the C FFI bindings (quicproquo-ffi). Screens: login, conversation list, chat view, settings. Bridge FFI functions to React Native via NativeModules. Use the existing `qpq_connect`, `qpq_login`, `qpq_send`, `qpq_receive` C API. iOS + Android targets. Alternatively: Flutter with dart:ffi. Includes push notification registration.
### M. Message Threading & Replies
**Effort**: Low-Medium | **Area**: Client + Core
Threaded conversations within channels. Add `thread_id` field to Chat AppMessage — replies to a message inherit its thread_id (or create one). `/thread <msg-index>` enters a thread view showing only that thread's messages. `/threads` lists active threads with last activity. Thread-aware notification counts. Local storage: add `thread_id` column to messages table, filter queries by thread.
### N. Cross-Signing & Identity Verification
**Effort**: Medium | **Area**: Core + Client
Out-of-band identity verification via QR codes and emoji comparison. Generate a short verification code from both parties' identity keys (similar to Signal's safety numbers but interactive). `/verify <user>` starts a verification session, displays emoji sequence or QR payload. `/verify confirm` marks the contact as verified. Verified contacts show a checkmark. Store verification state locally. Alert if a verified contact's key changes.
### O. Offline Message Queue with Priorities
**Effort**: Low-Medium | **Area**: Client
Smart offline queue that prioritizes messages when reconnecting. Messages queued while offline get priority levels: critical (key rotation, group ops), normal (chat), low (typing, read receipts). On reconnect, send critical first, then normal, drop stale low-priority. `/outbox` shows pending messages. `/outbox flush` forces immediate send. `/outbox clear` discards unsent. Exponential backoff with jitter for reconnection.
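The ordering described above (critical first, then normal, FIFO within a level) maps naturally onto a binary heap. The sketch below is one possible shape under stated assumptions: `Outbox`, `Priority`, and `flush` are hypothetical names, and "stale" low-priority traffic is simplified to a `drop_low` flag rather than an age check.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Derived `Ord` follows declaration order: Low < Normal < Critical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Low,      // typing indicators, read receipts
    Normal,   // chat messages
    Critical, // key rotation, group operations
}

/// Field order matters: compare by priority first, then by insertion
/// order (reversed, so older entries win inside the max-heap).
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Entry {
    priority: Priority,
    order: Reverse<u64>,
    payload: Vec<u8>,
}

#[derive(Default)]
pub struct Outbox {
    heap: BinaryHeap<Entry>,
    next_seq: u64,
}

impl Outbox {
    pub fn push(&mut self, priority: Priority, payload: Vec<u8>) {
        let order = Reverse(self.next_seq);
        self.next_seq += 1;
        self.heap.push(Entry { priority, order, payload });
    }

    /// Drains the queue in send order. With `drop_low`, low-priority
    /// entries are discarded instead of sent.
    pub fn flush(&mut self, drop_low: bool) -> Vec<Vec<u8>> {
        let mut out = Vec::new();
        while let Some(entry) = self.heap.pop() {
            if drop_low && entry.priority == Priority::Low {
                continue;
            }
            out.push(entry.payload);
        }
        out
    }
}
```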
### P. Audit Log & Compliance Export
**Effort**: Medium | **Area**: Server
Persistent server-side audit log for compliance. Every RPC call logged to a dedicated `audit_events` table: timestamp, identity, operation, result, metadata. Configurable retention policy (30/60/90 days). `qpq-admin audit --from --to --user` CLI to query. Export to JSON/CSV. GDPR data export: `/export-my-data` RPC returns all data the server holds about a user. Separate from redact-logs (this is structured, queryable).
### Q. Bot Framework Enhancements
**Effort**: Medium | **Area**: Bot SDK + Server
Enhance the existing `quicproquo-bot` crate into a full bot platform. Add: slash command registration (`/weather`, `/poll`, etc.), interactive message components (buttons/selects as structured AppMessage extensions), bot permissions (scoped access tokens), webhook delivery (HTTP POST on events). `BotBuilder` pattern: `Bot::new().command("ping", handle_ping).on_message(handle_msg).run()`. Example bots: echo, reminder, RSS feed.
### R. Tor/I2P Transport
**Effort**: High | **Area**: Server + Client + P2P
Anonymous transport layer for privacy-critical deployments. Server: listen on Tor hidden service (.onion) via `arti` or `tor` crate, configurable via `--tor-hidden-service`. Client: connect through SOCKS5 proxy to .onion address, `--tor-proxy socks5://127.0.0.1:9050`. P2P mesh: route through Tor for metadata-resistant peer communication. Optional I2P support via SAM bridge. All existing QUIC+TLS works over the tunnel.
### S. Plugin Marketplace & Hot-Reload
**Effort**: Medium | **Area**: Server + Plugin API
Extend the existing plugin system into a discoverable marketplace. Plugin manifest format (TOML) with name, version, permissions, hooks. `qpq-server --plugin-dir ./plugins/` auto-loads `.so`/`.dylib` files. Hot-reload: watch plugin directory, reload on change without server restart. Plugin isolation: each plugin runs in its own thread with limited Store access. Add `qpq-gen plugin <name>` scaffolding. Example: spam filter plugin, message archiver.
### T. Stress Testing & Benchmarking Suite
**Effort**: Medium | **Area**: Testing + Infra
Production-grade load testing tool. Simulate N concurrent clients: register, login, create channels, send/receive at configurable rate. Measure: messages/sec throughput, p50/p95/p99 latency, memory usage, connection limits. `cargo bench` integration for micro-benchmarks (already have some). New `qpq-loadtest` binary: `qpq-loadtest --clients 100 --rate 50/s --duration 60s --server localhost:5001`. Generate HTML report with charts. Identify bottlenecks before production.
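For the latency figures, a nearest-rank percentile is probably the simplest definition to report. The helper below is a sketch with a hypothetical name and signature; the proposal does not specify how `qpq-loadtest` would compute p50/p95/p99.

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// `p` must be in 1..=100; returns None for empty input or invalid `p`.
pub fn percentile(samples_ms: &[u64], p: usize) -> Option<u64> {
    if samples_ms.is_empty() || p == 0 || p > 100 {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_unstable();
    // rank = ceil(p * n / 100), computed in integers; always in 1..=n.
    let rank = (p * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}
```

For example, with ten samples the p95 and p99 both resolve to the slowest sample, which is a reminder that tail percentiles need a large sample count to be meaningful.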
### U. Disappearing Media & View-Once
**Effort**: Low | **Area**: Client + Core
View-once messages that auto-delete after first viewing. Add `ViewOnce` flag to FileRef AppMessage — recipient can view the file/image once, then it's deleted locally. Server-side: auto-delete blob after first download. `/send-once <path>` command. Display "[view-once media]" placeholder until opened. Prevent screenshots (best-effort: clear clipboard, disable screen recording notification). Extends existing file transfer infrastructure.
### V. Emoji Status & Presence
**Effort**: Low | **Area**: Server + Client
Lightweight presence system. Users set an emoji + short text status ("🏖️ On vacation", "🔴 Do not disturb", "🟢 Available"). Ephemeral — not stored permanently, expires after configurable duration. `publishPresence` RPC (piggyback on existing `publishEndpoint`). Client poll or push-based presence updates. `/status 🎯 Focusing` to set, `/status` to view, `/who` shows online contacts with their status. No tracking — presence is opt-in and ephemeral.
### W. Markdown & Rich Text Messages
**Effort**: Low | **Area**: Core + Client
Rich text formatting in messages. Support a subset of Markdown in chat: **bold**, *italic*, `code`, ```code blocks```, ~~strikethrough~~, > quotes, [links](url). Parse on display (client-side only — wire format stays plain text with Markdown syntax). TUI renderer: ANSI escape codes for bold/italic/color. Browser demo: render as HTML. Add `/format on|off` toggle. No changes to MLS or wire protocol — purely presentational.
### X. Invitation Links & QR Codes
**Effort**: Low-Medium | **Area**: Server + Client
Shareable invitation links for joining the server or a group. `createInvite` RPC generates a time-limited, usage-limited token. Format: `qpq://server:port/invite/TOKEN` or QR code encoding. `/invite create [--expires 24h] [--uses 10]` generates link. `/invite list` shows active invites. `/invite revoke <id>` cancels. New users can register via invite: `qpq-client --invite qpq://...`. Group invites: generate a link that auto-adds the joiner to a specific group after registration.
### Y. Command Engine & Playbooks
**Effort**: Medium | **Area**: Client + Testing
Unified command abstraction layer making every REPL action available via code and YAML. Command registry maps string names to typed `Command` variants. YAML playbook format for declarative multi-step scenarios with variables, assertions, and loops. `qpq-client --run playbook.yaml` for batch execution. Programmatic Rust API: `engine.execute(Command::Send { ... })`. Enables: CI smoke tests, reproducible environments, bot scripting, onboarding demos, load test scenarios, migration scripts. Pairs with every other feature.
---
## Selection Guide
**Privacy-first** (maximum anonymity & security):
A (federation), D (backup), F (KT audit), N (cross-signing), R (Tor), U (view-once)
**Production-ready** (deploy to real users):
A (federation), B (contacts), H (clustering), I (compliance), K (notifications), T (stress test)
**User experience** (make it feel like a real messenger):
B (contacts), C (calls), G (search), J (profiles), V (presence), W (rich text), X (invites)
**Mobile launch** (ship an app):
D (backup), J (profiles), K (notifications), L (mobile app), X (invites)
**Developer ecosystem** (grow the community):
Q (bot framework), S (plugin marketplace), T (stress test), I (compliance)
**Mesh/Freifunk** (offline-first, decentralized):
A (federation), N (cross-signing), O (offline queue), R (Tor)
---
## Completed (this planning cycle)
| Sprint | Feature | Status |
|--------|---------|--------|
| — | Y. Command Engine & Playbooks | Done — `command_engine.rs`, `playbook.rs`, `--run` CLI, 5 example playbooks |
## Selected Features (fill in after choosing)
> Pick 8 of A–X above, then we'll plan sprint assignments.
| Sprint | Feature | Notes |
|--------|---------|-------|
| 12 | | |
| 13 | | |
| 14 | | |
| 15 | | |
| 16 | | |
| 17 | | |
| 18 | | |
| 19 | | |


@@ -1,380 +0,0 @@
# quicproquo v2 — Design Analysis & Recommendations
> Multi-perspective retrospective of the v1 architecture.
> Produced 2026-03-04 by four parallel analysis agents examining server,
> client/UX, crypto/security, and project structure/DX.
---
## Executive Summary
quicproquo v1 demonstrates strong fundamentals: QUIC-native transport, RFC 9420
MLS group encryption, post-quantum hybrid KEM, OPAQUE zero-knowledge auth, and a
working multi-language SDK surface. These are the right bets and put the project
ahead of most open-source messengers on the crypto front.
However, three architectural choices limit the path to production:
1. **capnp-rpc is `!Send`** — forces single-threaded RPC handling, blocking
scalability.
2. **Monolithic client with global state** — business logic is tangled into the
REPL, duplicated across TUI/GUI/Web, and cannot be used as a library.
3. **Poll-based delivery** — 1-second polling wastes bandwidth and adds latency;
no server-push channel exists.
A v2 should keep the crypto stack (MLS + hybrid PQ KEM + OPAQUE), keep QUIC, but
rearchitect the RPC layer, extract an SDK crate, and add push-based delivery.
---
## Part 1 — What Works Well
### Transport & Protocol
- **QUIC (quinn) + TLS 1.3** — correct choice. Built-in encryption, connection
migration, 0-RTT potential. No reason to change.
- **Cap'n Proto schemas as API contract** — zero-copy wire format, compact
binary, schema evolution via ordinals. The *schemas* are good; the *RPC
runtime* is the problem.
### Cryptography
- **MLS (RFC 9420, openmls)** — only IETF-standard group E2E protocol. No
realistic alternative for groups > 2 members. Test suite is thorough (1005
lines covering 2-party, 3-party, hybrid, removal, leave, stale epoch).
- **Hybrid PQ KEM (X25519 + ML-KEM-768)** — forward-thinking dual-algorithm
protection. Well-implemented with versioned wire format, proper zeroization,
and 12 targeted tests. Ahead of Signal (PQXDH, late 2023) and Matrix (no PQ).
- **OPAQUE (RFC 9807)** — server never sees passwords. Ristretto255 + Argon2id
is best-in-class.
- **Sealed sender, safety numbers, message padding** — all clean, simple,
correct. Safety numbers match Signal's 5200-iteration HMAC-SHA256 cost.
- **Zeroization discipline** — secrets wrapped in `Zeroizing`, Debug impls
redact keys, no `.unwrap()` in crypto paths.
- **WASM feature gating** — `core/native` cleanly separates WASM-safe crypto
from native-only modules (MLS, OPAQUE, filesystem).
### Server Design
- **Store trait abstraction** — 30+ methods, clean backend swap (SqlStore vs
FileBackedStore). Well-factored.
- **OPAQUE auth with timing floors** — `resolveUser`/`resolveIdentity` mask
lookup timing to prevent username enumeration.
- **Delivery proofs** — Ed25519-signed receipt of server acceptance. Clients get
cryptographic evidence.
- **`wasNew` flag on createChannel** — elegantly solves the dual-MLS-group race
condition where both DM parties try to initialize.
- **Plugin hooks (C-ABI)** — `#![no_std]` vtable, zero dependencies, chained
hooks with continue/reject protocol. Clean extensibility.
- **Production config validation** — enforces encrypted storage, strong auth
tokens, pre-existing TLS certs.
### Client & DX
- **Zero-config local dev** — `qpq --username alice --password pass` auto-starts
server, generates TLS certs, registers, and logs in. Genuinely excellent.
- **Encrypted-at-rest everything** — state file (QPCE), conversation DB
(SQLCipher), session cache. Argon2id + ChaCha20-Poly1305 throughout.
- **Playbook system** — YAML-scripted command execution with assertions. Great
for CI/integration testing.
- **Conversation store** — SQLite with deduplication, outbox for offline
queuing, activity tracking.
- **Conventional commits, GPG-signed** — consistent `feat:`/`fix:`/`docs:`
discipline.
- **Security lints enforced by build** — `clippy::unwrap_used = "deny"`,
`unsafe_code = "warn"`.
---
## Part 2 — What Needs Rethinking
### 2.1 RPC Layer: capnp-rpc is the #1 Scalability Bottleneck
**Problem:** `capnp-rpc` uses `Rc` internally and is `!Send`. Everything runs on
a `LocalSet` with `spawn_local`. All 27 RPC methods serialize through a single
thread. No work-stealing, no multi-core utilization.
**Impact:** With 1000+ concurrent clients, the single-threaded executor cannot
keep up. A slow `fetchWait` (30s timeout) blocks the entire connection.
**Also:** The WebSocket bridge (`ws_bridge.rs`, 645 lines) exists solely because
Cap'n Proto cannot run in browsers. This duplicates handler logic and creates
maintenance burden.
### 2.2 Client Architecture: Monolith with Global State
**Problem:** `AUTH_CONTEXT` is a process-wide `RwLock<Option<ClientAuth>>`.
Business logic (MLS processing, sealed sender, hybrid decryption, message
routing) lives inside `repl.rs`'s `poll_messages()` — a 100-line function that
mixes transport, crypto, routing, and storage.
**Impact:** Every frontend (REPL, TUI, GUI, Web) must reimplement message
processing. The TUI already duplicates it. The GUI stub and mobile PoC would need
yet another copy. Client cannot be used as a library.
### 2.3 Delivery Model: Poll-Based, No Push Channel
**Problem:** Client polls every 1 second with `fetch_wait(timeout_ms=0)` — never
actually long-polls. Constant network traffic even when idle. ~1 second latency
for message delivery.
**Also:** `fetch` is destructive (drains queue). If the client crashes between
receive and processing, messages are lost.
### 2.4 Connection Model: Single Stream
**Problem:** `max_concurrent_bidi_streams(1)` means the entire QUIC connection is
effectively single-stream. A blocking `fetchWait` prevents all other RPCs.
### 2.5 Storage: Single Mutex-Guarded SQLite Connection
**Problem:** `SqlStore` uses `Mutex<Connection>`. Every database operation
acquires a global lock. Under concurrent load, all storage access serializes.
**Also:** `FileBackedStore` flushes the entire map on every write (O(n) I/O).
Sessions are in-memory only — server restart forces all clients to re-login.
### 2.6 Key Management Gaps
- **DiskKeyStore** — HPKE private keys stored as plaintext bincode on disk. No
encryption at rest.
- **MLS group state** — `GroupMember` holds `MlsGroup` in memory only. Process
crash loses all group state.
- **Token zeroization** — `AuthContext.token`, `ClientAuth.access_token` are not
wrapped in `Zeroizing`.
### 2.7 Workspace Bloat
12 crates for a project at this maturity is excessive. Several are thin stubs
(`quicproquo-gen`, `quicproquo-bot` at 354 lines) or broken (`quicproquo-gui`
fails `cargo build --workspace`).
---
## Part 3 — v2 Architecture Recommendations
### 3.1 Replace capnp-rpc with a Send-Compatible RPC Framework
**Recommendation:** Switch to **tonic (gRPC)** or a custom framing layer.
| Dimension | capnp-rpc (v1) | tonic/gRPC (v2) |
|-----------|---------------|-----------------|
| Threading | `!Send`, single-threaded | `Send + Sync`, multi-threaded |
| Browser | Requires WS bridge | grpc-web native |
| Streaming | Not supported | Built-in |
| Middleware | None (copy-paste auth) | Interceptors/layers |
| Ecosystem | Niche | Massive (every language) |
**Alternative:** Keep Cap'n Proto *schemas* for serialization (zero-copy
advantage) but replace capnp-rpc with custom framing over QUIC streams. This
preserves the wire format while gaining `Send` compatibility.
The WS bridge would be eliminated entirely — grpc-web or WebTransport gives
browsers direct access.
### 3.2 Extract an SDK Crate (Most Important Client Change)
Create `quicproquo-sdk` that owns all business logic:
```
quicproquo-sdk/
  src/
    client.rs         -- QpqClient: connect, login, send, receive
    events.rs         -- ClientEvent enum (push-based)
    conversation.rs   -- ConversationHandle, group management
    crypto.rs         -- MLS pipeline, sealed sender, hybrid decryption
    sync.rs           -- message sync, offline queue, retry
```
All frontends become thin shells:
```
CLI/REPL -> calls sdk
TUI -> calls sdk
Tauri GUI -> calls sdk (via Tauri commands)
Mobile -> calls sdk (via C FFI)
Web/WASM -> calls sdk (compiled to wasm32)
```
**Key API shape:**
```rust
// Signatures only; method bodies are elided in this sketch.
pub struct QpqClient { /* session, rpc, crypto pipeline */ }

impl QpqClient {
    pub async fn connect(config: ClientConfig) -> Result<Self>;
    // Authenticates the already-connected client in place.
    pub async fn login(&mut self, username: &str, password: &str) -> Result<()>;
    pub async fn dm(&mut self, username: &str) -> Result<ConversationHandle>;
    pub async fn create_group(&mut self, name: &str) -> Result<ConversationHandle>;
    pub async fn send(&mut self, text: &str) -> Result<MessageId>;
    pub fn subscribe(&self) -> Receiver<ClientEvent>;
}
```
No global state. No `AUTH_CONTEXT`. Auth context is per-`QpqClient` instance.
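
To make the "no global state" point concrete, here is a minimal synchronous sketch in which each client instance owns its own token and fans events out to subscribers. `MockClient`, the `ClientEvent` variants, and the use of `std::sync::mpsc` in place of an async broadcast channel are all assumptions for illustration, not the SDK's actual design.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical event type; the real `ClientEvent` variants may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum ClientEvent {
    LoggedIn { username: String },
    MessageReceived { from: String, text: String },
}

/// Stand-in for `QpqClient`: all session state lives in the instance,
/// so two clients in one process never share an auth context.
pub struct MockClient {
    token: Option<Vec<u8>>,
    subscribers: Vec<Sender<ClientEvent>>,
}

impl MockClient {
    pub fn new() -> Self {
        Self { token: None, subscribers: Vec::new() }
    }

    /// Each subscriber gets its own receiving end.
    pub fn subscribe(&mut self) -> Receiver<ClientEvent> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Pretend login: stores a placeholder token on this instance only.
    pub fn login(&mut self, username: &str) {
        self.token = Some(format!("token-for-{username}").into_bytes());
        self.emit(ClientEvent::LoggedIn { username: username.to_string() });
    }

    pub fn is_authenticated(&self) -> bool {
        self.token.is_some()
    }

    fn emit(&mut self, event: ClientEvent) {
        // Drop subscribers whose receiver has gone away.
        self.subscribers.retain(|tx| tx.send(event.clone()).is_ok());
    }
}
```

Because the token is a field rather than a static, tests can create several clients side by side without serializing on shared state.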
### 3.3 Add Push-Based Delivery
**Recommendation:** Dedicated QUIC unidirectional stream for server-push
notifications.
```
Client opens bidi stream 0 -> RPC channel (request/response)
Server opens uni stream 1 -> push notifications (new message, typing, etc.)
```
Benefits:
- Near-immediate message delivery (no polling interval to wait out)
- No idle network traffic
- Typing indicators delivered in real-time
- Graceful degradation: fall back to long-poll if push stream fails
**Also:** Make `peek` + `ack` the default delivery pattern (not destructive
`fetch`). Add idempotency keys to prevent duplicate messages on retry.
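
The `peek` + `ack` pattern with idempotency keys can be sketched with an in-memory queue. This is an illustration under assumptions: the `Queue` type, the `u64` sequence numbers, and the `u64` idempotency keys are hypothetical and do not reflect the server's actual storage schema.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical in-memory delivery queue for one recipient.
pub struct Queue {
    next_seq: u64,
    pending: VecDeque<(u64, Vec<u8>)>,
    seen_keys: HashSet<u64>,
}

impl Queue {
    pub fn new() -> Self {
        Self { next_seq: 0, pending: VecDeque::new(), seen_keys: HashSet::new() }
    }

    /// Stores the message unless this idempotency key was already accepted.
    /// Returns `true` if the message was stored, `false` for a duplicate retry.
    pub fn enqueue(&mut self, idempotency_key: u64, payload: Vec<u8>) -> bool {
        if !self.seen_keys.insert(idempotency_key) {
            return false;
        }
        self.pending.push_back((self.next_seq, payload));
        self.next_seq += 1;
        true
    }

    /// Non-destructive read: returns up to `limit` messages without removing them.
    pub fn peek(&self, limit: usize) -> Vec<(u64, Vec<u8>)> {
        self.pending.iter().take(limit).cloned().collect()
    }

    /// Removes every message with a sequence number <= `up_to`.
    pub fn ack(&mut self, up_to: u64) {
        while matches!(self.pending.front(), Some((seq, _)) if *seq <= up_to) {
            self.pending.pop_front();
        }
    }
}
```

A client that crashes after `peek` but before `ack` simply sees the same messages again on its next `peek`, which is the property destructive `fetch` lacks.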
### 3.4 Multi-Stream Connections
Allow 4-8 concurrent bidirectional QUIC streams per connection. This enables:
- Pipelined RPCs (send while fetching)
- Concurrent blob upload + chat
- `fetchWait` on one stream without blocking others
### 3.5 Storage Improvements
| Change | Rationale |
|--------|-----------|
| Drop `FileBackedStore` | O(n) flush per write, no federation support |
| Connection pool for SQLite | Replace `Mutex<Connection>` with r2d2/deadpool |
| Persist sessions to DB | Server restart shouldn't force re-login |
| Encrypt DiskKeyStore at rest | HPKE private keys in plaintext is a real vuln |
| Persist MLS group state | Process crash shouldn't lose group state |
| Atomic keystore writes | tempfile-then-rename pattern |
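
The tempfile-then-rename pattern from the last row can be sketched with the standard library alone. The helper name `write_atomic` and the `.tmp` suffix are assumptions; the sketch also assumes the temp file and the target live on the same filesystem, which is what makes the rename atomic on POSIX systems.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `bytes` to `path` so that readers see either the old file or the
/// complete new file, never a partially written one.
pub fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Build "<path>.tmp" next to the target so both share a filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(bytes)?;
        // Flush file contents to disk before the rename makes them visible.
        tmp.sync_all()?;
    }

    // Replace the target in one step.
    fs::rename(&tmp_path, path)
}
```

A production keystore would likely also fsync the parent directory and restrict file permissions; both are omitted here for brevity.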
### 3.6 Crypto Stack Refinements
The algorithms are correct. The refinements are operational:
| Change | Rationale |
|--------|-----------|
| Typed MLS error variants | Stop losing error info via `format!("{e:?}")` |
| Formalize hybrid PQ ciphersuite ID | Replace length-based key detection |
| Remove all InsecureServerCertVerifier | No TLS bypass on any platform |
| Add passkey/WebAuthn alt-auth | Better UX for GUI/mobile, no password to forget |
| Consider Double Ratchet for 1:1 DMs | MLS is over-engineered for 2-party; DR gives better per-message forward secrecy |
| Token/session secret zeroization | `AuthContext.token` et al. need `Zeroizing` wrappers |
| Fix serde deserialization of secrets | Intermediate non-zeroized `Vec<u8>` in `IdentityKeypair::deserialize` |
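
As an illustration of the "typed MLS error variants" row, the sketch below shows how callers can branch on structured errors instead of parsing a `format!("{e:?}")` string. The `MlsError` enum and its variants are hypothetical and are not taken from openmls or the existing codebase.

```rust
use std::fmt;

/// Hypothetical typed error; variant names are illustrative only.
#[derive(Debug, PartialEq)]
pub enum MlsError {
    UnknownGroup { group_id: Vec<u8> },
    EpochMismatch { expected: u64, got: u64 },
    Decrypt,
}

impl fmt::Display for MlsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            MlsError::UnknownGroup { group_id } => {
                write!(f, "unknown group ({} byte id)", group_id.len())
            }
            MlsError::EpochMismatch { expected, got } => {
                write!(f, "epoch mismatch: expected {expected}, got {got}")
            }
            MlsError::Decrypt => write!(f, "decryption failed"),
        }
    }
}

impl std::error::Error for MlsError {}

/// Example producer of a typed error.
pub fn check_epoch(expected: u64, got: u64) -> Result<(), MlsError> {
    if expected == got {
        Ok(())
    } else {
        Err(MlsError::EpochMismatch { expected, got })
    }
}

/// Callers can make decisions on the variant, which a stringified
/// error does not allow without fragile text matching.
pub fn is_retryable(err: &MlsError) -> bool {
    matches!(err, MlsError::EpochMismatch { .. })
}
```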
### 3.7 Workspace Restructuring
**Reduce from 12 to 8 crates:**
```
quicproquo-core -- crypto primitives (keep)
quicproquo-proto -- schema codegen (keep)
quicproquo-plugin-api -- #![no_std] C-ABI (keep)
quicproquo-kt -- key transparency (keep)
quicproquo-sdk -- NEW: business logic library
quicproquo-server -- server binary (keep)
quicproquo-client -- CLI/TUI binary, depends on sdk (keep, slimmed)
quicproquo-p2p -- mesh networking (keep, feature-flagged)
```
**Merge/remove:**
- `bot` -> `sdk::bot` module
- `ffi` -> `sdk` with `--features c-ffi`
- `gen` -> `scripts/` or `xtask`
- `gui` -> `apps/gui/` outside workspace (Tauri project)
- `mobile` -> `examples/` (research spike)
**Add `default-members` to `[workspace]`** so a plain `cargo build` doesn't attempt the GUI.
**Add `justfile`** with `build`, `test`, `test-e2e`, `build-wasm`, `docker`.
### 3.8 Plugin System Evolution
| Change | Rationale |
|--------|-----------|
| Add `version: u32` to `HookVTable` | ABI stability — check version on load |
| Config passthrough | `qpq_plugin_init(vtable, config_json)` |
| Async hooks | Plugins that call external services shouldn't block Tokio |
| Evaluate WASM plugins | Sandboxed community plugins (keep C-ABI for first-party) |
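
The version check from the first row might look roughly like the following. The field layout of `HookVTable`, the `on_message` hook, and the `PLUGIN_ABI_VERSION` constant are assumptions made for this sketch; the existing vtable in `quicproquo-plugin-api` may be shaped differently.

```rust
/// ABI version the host was compiled against (hypothetical value).
pub const PLUGIN_ABI_VERSION: u32 = 2;

/// Hypothetical C-ABI vtable with the version as its first field, so the
/// host can read it before trusting the layout of anything that follows.
#[repr(C)]
pub struct HookVTable {
    pub version: u32,
    pub on_message: Option<extern "C" fn(payload: *const u8, len: usize) -> i32>,
}

#[derive(Debug, PartialEq)]
pub enum LoadError {
    VersionMismatch { host: u32, plugin: u32 },
}

/// Rejects plugins built against a different ABI version.
pub fn check_vtable(vtable: &HookVTable) -> Result<(), LoadError> {
    if vtable.version != PLUGIN_ABI_VERSION {
        return Err(LoadError::VersionMismatch {
            host: PLUGIN_ABI_VERSION,
            plugin: vtable.version,
        });
    }
    Ok(())
}
```

An exact-match policy is the simplest choice; a host could instead accept a range of older versions if hooks are only ever appended.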
### 3.9 Federation Improvements
| Change | Rationale |
|--------|-----------|
| DNS SRV / .well-known discovery | Static peer config doesn't scale |
| Persistent relay queue with retry | Messages to offline peers are currently lost |
| Deterministic channel ID derivation | Avoid cross-server channel conflicts |
| Keep mDNS as optional mesh feature | Not for internet-scale, but good for LAN |
### 3.10 Test & CI Improvements
| Change | Rationale |
|--------|-----------|
| Per-client auth context | Removes `--test-threads 1` constraint |
| Mock server for client unit tests | Fast tests without spawning real server |
| Fuzz testing (cargo-fuzz) | Hybrid KEM, sealed sender, padding, Cap'n Proto deser |
| WS bridge unit tests | 645 lines, zero tests, security-critical |
| WASM + Go SDK in CI | Currently untested in CI |
| Separate E2E from unit test CI job | Different speed, different failure modes |
| macOS CI | FFI/mobile cross-compilation validation |
| Release automation | Binary artifacts, Docker tags, WASM npm publish |
---
## Part 4 — Ecosystem Positioning
### Don't compete with Signal or Matrix directly.
**Target: Privacy-first messaging infrastructure for developers and
organizations.**
quicproquo's differentiators — QUIC-native transport, post-quantum crypto, MLS,
plugin system, multi-language SDKs, embeddable architecture — point toward an
infrastructure play, not a consumer app.
Think: *"the Postgres of E2E encrypted messaging"* — a high-quality open-source
server and protocol that other projects build on.
| Segment | Value Proposition |
|---------|-------------------|
| **Developer tool** | API-first messenger for encrypted bots and integrations |
| **Embeddable** | C FFI + WASM + Go SDK for embedding in other apps |
| **Enterprise** | On-prem, plugins for compliance/audit, OPAQUE zero-knowledge auth |
| **Research** | Post-quantum crypto, MLS reference implementation, mesh networking |
---
## Part 5 — Priority Ordering
### Phase 1: Foundation (unblocks everything else)
1. Replace capnp-rpc with Send-compatible framework
2. Extract SDK crate from client
3. Per-client auth context (no global state)
### Phase 2: Reliability
4. Push-based delivery (QUIC uni-stream)
5. Multi-stream connections
6. Persist sessions + MLS group state
7. Encrypt DiskKeyStore at rest
8. peek+ack as default delivery
### Phase 3: Polish
9. Workspace restructuring (12 -> 8 crates)
10. TUI as primary interactive mode (built on SDK)
11. Plugin system v2 (versioning, config, async)
12. Federation retry queue + discovery
### Phase 4: Ecosystem
13. Full MLS in WASM (browser E2E)
14. WebTransport (eliminate WS bridge)
15. Tauri GUI (built on SDK)
16. Release automation + expanded CI
---
## Appendix — Analysis Sources
This document was produced by four parallel analysis agents:
| Agent | Scope | Files Read |
|-------|-------|-----------|
| server-analyst | Transport, RPC, delivery, storage, federation | 27 server .rs files, 4 schemas, core transport |
| client-analyst | REPL, UX, state, multi-platform, SDK design | All client .rs, GUI, mobile, TS demo |
| security-analyst | MLS, OPAQUE, hybrid KEM, keystore, identity | All core .rs, review doc |
| dx-analyst | Workspace, build, tests, plugins, CI, ecosystem | All Cargo.toml, tests, CI, plugins, SDKs |

@@ -1,328 +0,0 @@
# quicproquo v2 — Master Implementation Plan
> Created 2026-03-04. This is the authoritative plan for the v2 rewrite.
> See also: `docs/V2-DESIGN-ANALYSIS.md` for the detailed retrospective.
## Context
The v1 codebase has strong crypto foundations (MLS, hybrid PQ KEM, OPAQUE) but three
architectural bottlenecks: capnp-rpc is `!Send` (single-threaded), client business logic
is trapped in a monolithic REPL with global state, and delivery is poll-based.
This plan creates v2 on a new branch, keeping the crypto stack intact and replacing
the RPC/transport layer, extracting an SDK, and restructuring the workspace.
**Key decisions:**
- Transport: Protobuf (prost) + custom framing over QUIC (quinn)
- Mobile: Tauri 2 (same Rust SDK backend, web UI)
- Branch strategy: `v2` branch from main, not a fresh repo
- Constraints: Rust, QUIC, GPG-signed commits, zeroize secrets, no stubs
---
## Architecture Overview
```
┌─────────────────────────────────────────────────────┐
│                      Frontends                      │
│  CLI/TUI  │  Tauri GUI/Mobile  │  Web (WebTransport)│
└─────┬─────┴────────┬───────────┴──────────┬─────────┘
      │              │                      │
      ▼              ▼                      ▼
┌─────────────────────────────────────────────────────┐
│                   quicproquo-sdk                    │
│ QpqClient { connect, login, send, recv, subscribe } │
│  Event system (tokio broadcast)                     │
│  Crypto pipeline (MLS, sealed sender, hybrid)       │
│  Conversation store (SQLCipher)                     │
└──────────────────────┬──────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│                   quicproquo-rpc                    │
│ QUIC framing: [method:u16][req_id:u32][len:u32][pb] │
│  Multi-stream (1 RPC per stream)                    │
│  Server-push via uni-streams                        │
│  tower middleware (auth, rate-limit)                │
└──────────────────────┬──────────────────────────────┘
┌─────────────────────────────────────────────────────┐
│                  quicproquo-server                  │
│  Domain services (auth, delivery, channel, blob)    │
│  Store trait → SqlStore (connection pool)           │
│  Plugin hooks, federation, KT                       │
└─────────────────────────────────────────────────────┘
```
### Wire Format
Per QUIC bidirectional stream (request/response):
```
Request: [method_id: u16][request_id: u32][payload_len: u32][protobuf bytes]
Response: [status: u8][request_id: u32][payload_len: u32][protobuf bytes]
```
Per QUIC unidirectional stream (server → client push):
```
Push: [event_type: u16][payload_len: u32][protobuf bytes]
```
Each RPC opens its own QUIC bidi stream → natural multi-stream, no head-of-line blocking.
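
The request header above can be encoded and decoded in a few lines. This is a sketch under assumptions: the plan does not state a byte order, so big-endian (network order) is assumed here, and `RequestFrame`, `encode_request`, and `decode_request` are hypothetical names rather than the contents of `framing.rs`.

```rust
/// One request as laid out on a QUIC bidi stream:
/// [method_id: u16][request_id: u32][payload_len: u32][protobuf bytes]
#[derive(Debug, PartialEq)]
pub struct RequestFrame {
    pub method_id: u16,
    pub request_id: u32,
    pub payload: Vec<u8>,
}

const HEADER_LEN: usize = 2 + 4 + 4;

/// Serializes a frame; returns `None` if the payload does not fit in a u32 length.
/// Byte order is assumed to be big-endian.
pub fn encode_request(frame: &RequestFrame) -> Option<Vec<u8>> {
    let len = u32::try_from(frame.payload.len()).ok()?;
    let mut out = Vec::with_capacity(HEADER_LEN + frame.payload.len());
    out.extend_from_slice(&frame.method_id.to_be_bytes());
    out.extend_from_slice(&frame.request_id.to_be_bytes());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(&frame.payload);
    Some(out)
}

/// Parses a frame; returns `None` if the buffer is shorter than the header
/// or shorter than the declared payload length.
pub fn decode_request(buf: &[u8]) -> Option<RequestFrame> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    let method_id = u16::from_be_bytes([buf[0], buf[1]]);
    let request_id = u32::from_be_bytes([buf[2], buf[3], buf[4], buf[5]]);
    let len = u32::from_be_bytes([buf[6], buf[7], buf[8], buf[9]]) as usize;
    let end = HEADER_LEN.checked_add(len)?;
    let payload = buf.get(HEADER_LEN..end)?;
    Some(RequestFrame { method_id, request_id, payload: payload.to_vec() })
}
```

A real decoder would also enforce a maximum `payload_len` before allocating, so that a hostile peer cannot request an arbitrarily large buffer.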
---
## Workspace Structure (v2: 9 crates)
```
quicproquo/
├── crates/
│ ├── quicproquo-core/ # KEEP AS-IS — crypto primitives, MLS, hybrid KEM
│ ├── quicproquo-kt/ # KEEP AS-IS — key transparency
│ ├── quicproquo-plugin-api/ # KEEP AS-IS — #![no_std] C-ABI
│ ├── quicproquo-proto/ # REWRITE — protobuf schemas + prost codegen
│ ├── quicproquo-rpc/ # NEW — QUIC RPC framework (framing, dispatch, tower)
│ ├── quicproquo-sdk/ # NEW — client business logic library
│ ├── quicproquo-server/ # REWRITE — domain services + RPC handlers
│ ├── quicproquo-client/ # REWRITE — thin CLI/TUI shell over SDK
│ └── quicproquo-p2p/ # KEEP — iroh mesh (feature-flagged, later)
├── apps/
│ └── gui/ # Tauri 2 desktop + mobile app (outside workspace)
├── proto/ # .proto source files
│ └── qpq/v1/
│ ├── auth.proto # OPAQUE registration + login (4 methods)
│ ├── delivery.proto # enqueue, fetch, peek, ack, batch (6 methods)
│ ├── keys.proto # key package + hybrid key CRUD (5 methods)
│ ├── channel.proto # channel create (1 method)
│ ├── user.proto # resolve user/identity (2 methods)
│ ├── blob.proto # upload/download (2 methods)
│ ├── device.proto # register/list/revoke (3 methods)
│ ├── p2p.proto # endpoint publish/resolve + health (3 methods)
│ ├── federation.proto # relay + proxy (6 methods)
│ ├── push.proto # server-push events (NEW)
│ └── common.proto # shared types (Auth, Envelope, Error)
├── sdks/
│ ├── go/ # Go SDK (regenerate from .proto)
│ └── typescript/ # TS SDK (WebTransport client)
├── justfile # NEW — build commands
└── Cargo.toml # workspace root
```
**Removed from workspace:**
- `quicproquo-bot` → `sdk::bot` module
- `quicproquo-ffi` → `sdk` with `--features c-ffi`
- `quicproquo-gen` → `scripts/`
- `quicproquo-gui` → `apps/gui/` (Tauri project, outside workspace)
- `quicproquo-mobile` → merged into `apps/gui/` (Tauri 2 mobile)
---
## Crate Reuse Assessment
| v1 Crate | capnp deps? | v2 Action | Effort |
|----------|:-----------:|-----------|--------|
| **quicproquo-core** | None | Copy as-is | Zero |
| **quicproquo-kt** | None | Copy as-is | Zero |
| **quicproquo-plugin-api** | None | Copy as-is | Zero |
| **quicproquo-p2p** | None | Copy as-is | Zero |
| **quicproquo-proto** | 100% capnp | Replace with prost codegen | Medium |
| **quicproquo-server** | 16/20 files | Extract domain logic, rewrite handlers | High |
| **quicproquo-client** | 6/10 files | Extract to SDK, thin CLI shell | High |
### Key Files to Reuse Directly
| Source (v1) | Destination (v2) | Notes |
|-------------|------------------|-------|
| `crates/quicproquo-core/` (entire) | same path | Zero changes |
| `crates/quicproquo-kt/` (entire) | same path | Zero changes |
| `crates/quicproquo-plugin-api/` (entire) | same path | Zero changes |
| `server/src/storage.rs` | `server/src/storage.rs` | Store trait — keep |
| `server/src/sql_store.rs` | `server/src/sql_store.rs` | Add connection pool |
| `server/src/hooks.rs` | `server/src/hooks.rs` | Plugin system — keep |
| `server/src/plugin_loader.rs` | `server/src/plugin_loader.rs` | Keep |
| `server/src/error_codes.rs` | `server/src/error_codes.rs` | Keep |
| `server/src/config.rs` | `server/src/config.rs` | Update for new transport |
| `client/src/conversation.rs` | `sdk/src/conversation.rs` | Move to SDK |
| `client/src/token_cache.rs` | `sdk/src/token_cache.rs` | Move to SDK |
| `client/src/display.rs` | `client/src/display.rs` | Keep in CLI |
| `schemas/*.capnp` | reference only | Translate to .proto |
---
## Phased Implementation
### Phase 1: Foundation
**Goal:** v2 branch with new workspace, proto schemas, RPC framework skeleton, SDK skeleton.
**Scope:** Compiles, no runtime functionality yet.
1. **Create v2 branch** from main
2. **Restructure workspace** — update root Cargo.toml, create new crate dirs, add justfile
3. **Write .proto files** — translate all 33 RPC methods + push events from Cap'n Proto
4. **Create quicproquo-proto crate** — prost-build codegen
5. **Create quicproquo-rpc crate** — QUIC RPC framework:
- `framing.rs` — wire format encode/decode (request, response, push)
- `server.rs` — accept QUIC connections, dispatch to handlers
- `client.rs` — connect, send requests, receive responses + push events
- `middleware.rs` — tower-based auth + rate-limit layers
- `method.rs` — method registry (method_id → async handler fn)
6. **Create quicproquo-sdk crate** — public API skeleton:
- `client.rs` — `QpqClient` struct
- `events.rs` — `ClientEvent` enum
- `conversation.rs` — `ConversationHandle`, `ConversationStore`
- `config.rs` — `ClientConfig`
7. **Extract server domain types** — `server/src/domain/` module:
- `types.rs` — plain Rust request/response types
- `auth.rs` — OPAQUE logic extracted from auth_ops.rs
- `delivery.rs` — enqueue/fetch logic extracted from delivery.rs
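
The method registry described for `method.rs` (method_id → handler) can be sketched as a map of boxed closures. To stay dependency-free, this sketch uses synchronous handlers and `String` errors; the plan calls for async handlers, and `MethodRegistry`, `Handler`, and the error type are all hypothetical.

```rust
use std::collections::HashMap;

/// Simplified handler: raw request bytes in, raw response bytes out.
/// The planned design uses async handlers; this sync version is a stand-in.
pub type Handler = Box<dyn Fn(&[u8]) -> Result<Vec<u8>, String> + Send + Sync>;

#[derive(Default)]
pub struct MethodRegistry {
    handlers: HashMap<u16, Handler>,
}

impl MethodRegistry {
    /// Associates a wire-level method id with a handler.
    pub fn register<F>(&mut self, method_id: u16, handler: F)
    where
        F: Fn(&[u8]) -> Result<Vec<u8>, String> + Send + Sync + 'static,
    {
        self.handlers.insert(method_id, Box::new(handler));
    }

    /// Looks up the handler for `method_id` and runs it on the payload.
    pub fn dispatch(&self, method_id: u16, payload: &[u8]) -> Result<Vec<u8>, String> {
        match self.handlers.get(&method_id) {
            Some(handler) => handler(payload),
            None => Err(format!("unknown method id {method_id}")),
        }
    }
}
```

Keeping dispatch independent of the transport means the same registry can be exercised in unit tests without opening a QUIC connection.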
**Verification:**
- `cargo build --workspace` succeeds
- `cargo test -p quicproquo-core` passes (72 tests)
- Proto codegen works
- RPC framework compiles
---
### Phase 2: Server Core
**Goal:** Working server with all 33 RPC handlers over QUIC.
1. **RPC dispatch** — method registry, connection lifecycle
2. **Domain handlers** — all 33 methods as `async fn(Request) -> Result<Response>`
- Auth (4): OPAQUE register start/finish, login start/finish
- Delivery (6): enqueue, fetch, fetchWait, peek, ack, batchEnqueue
- Keys (5): upload/fetch key package, upload/fetch/batch-fetch hybrid key
- Channels (1): createChannel
- Users (2): resolveUser, resolveIdentity
- Blobs (2): uploadBlob, downloadBlob
- Devices (3): registerDevice, listDevices, revokeDevice
- P2P (3): health, publishEndpoint, resolveEndpoint
- Federation (6): relay enqueue/batch, proxy fetch/resolve, health
3. **Server-push** — notification stream via QUIC uni-stream
4. **Storage upgrades:**
- Drop `FileBackedStore`
- Connection pool (deadpool-sqlite)
- Persist sessions to SQLite
- Atomic queue depth check + enqueue
5. **Tower middleware** — auth validation, rate limiting, audit logging
6. **Multi-stream** — concurrent RPCs per connection (remove 1-stream limit)
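
The "atomic queue depth check + enqueue" item is about closing the gap between reading the depth and inserting. In SQLite this would be a single transaction; the in-memory analogue below holds one lock across both steps. `BoundedQueue` and `try_enqueue` are hypothetical names used only for this sketch.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Queue with a hard depth limit. The check and the push happen under the
/// same lock, so concurrent writers cannot both pass the check and overshoot.
pub struct BoundedQueue {
    max_depth: usize,
    inner: Mutex<VecDeque<Vec<u8>>>,
}

impl BoundedQueue {
    pub fn new(max_depth: usize) -> Self {
        Self { max_depth, inner: Mutex::new(VecDeque::new()) }
    }

    /// Returns the new depth on success, or hands the message back if full.
    pub fn try_enqueue(&self, msg: Vec<u8>) -> Result<usize, Vec<u8>> {
        // Recover the guard even if another thread panicked while holding it.
        let mut q = self.inner.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
        if q.len() >= self.max_depth {
            return Err(msg);
        }
        q.push_back(msg);
        Ok(q.len())
    }
}
```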
**Verification:**
- Server starts, accepts QUIC connections
- Health check RPC works
- OPAQUE registration + login works
- Message enqueue + fetch round-trip
---
### Phase 3: SDK
**Goal:** Complete client SDK library — the heart of v2.
1. **QpqClient** — connect, OPAQUE auth, session management (no global state)
2. **Crypto pipeline** — MLS processing, sealed sender unwrap, hybrid decrypt
(extracted from repl.rs `poll_messages()`)
3. **Conversation management** — create DM, create group, invite, remove, send, receive
4. **Event system** — `tokio::sync::broadcast` channel of `ClientEvent`, replacing the poll loop
- `MessageReceived`, `TypingIndicator`, `ConversationCreated`
- `MemberJoined`, `MemberLeft`, `ConnectionLost`, `Reconnected`
5. **Offline support** — outbox queue, retry with backoff, sync on reconnect
6. **ConversationStore** — SQLCipher local DB (migrate from client/conversation.rs)
7. **Key management** — encrypted DiskKeyStore, MLS group state persistence
8. **Token/secret zeroization** — `AuthContext.token` etc. wrapped in `Zeroizing`
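
For the "retry with backoff" part of offline support, a capped exponential schedule is one common choice. The function below is a sketch of that schedule only; the name `backoff_delay`, the doubling factor, and the absence of jitter are assumptions, and the plan does not specify which policy the SDK will use.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`,
/// never exceeding `cap`. Saturates instead of overflowing for large attempts.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    // `checked_shl` returns None once the shift would exceed the u32 width.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(cap, |delay| delay.min(cap))
}
```

With a 500 ms base and a 30 s cap, the first retries wait 0.5 s, 1 s, 2 s, and 4 s, and every attempt from the seventh onward waits the full 30 s. Adding random jitter on top would help avoid synchronized reconnect storms after a server restart.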
**Verification:**
- SDK integration test: connect → login → create DM → send → receive
- No global state (`AUTH_CONTEXT` eliminated)
- Event subscription works
- Offline outbox drains on reconnect
---
### Phase 4: Client
**Goal:** CLI and TUI as thin shells over SDK.
1. **CLI binary** (`qpq`) — clap subcommands calling `QpqClient`
2. **REPL** — readline with tab-completion (rustyline), categorized `/help`
3. **TUI** — ratatui, subscribes to `QpqClient::subscribe()` events
4. **Simplified commands:**
- Hide MLS/KeyPackage internals (auto-refresh)
- Message references by short ID (not index)
- Batch operations (`/create-group team alice bob`)
- Categorized help (Chat, Groups, Security, System)
5. **Auto-server-launch** — keep zero-config DX from v1
6. **Playbook system** — keep YAML-based test scripting
**Verification:**
- `qpq --username alice --password pass` starts REPL (same UX as v1)
- TUI mode works with live event updates
- Tab-completion for commands and usernames
- E2E test: two clients exchange messages
---
### Phase 5: Desktop & Mobile
**Goal:** Tauri 2 app for all platforms.
1. **Tauri 2 project** in `apps/gui/`
2. **Rust backend** — Tauri commands wrapping `QpqClient`
3. **Web frontend** — Svelte or vanilla HTML/JS
4. **Desktop** — Linux, macOS, Windows
5. **Mobile** — iOS, Android via Tauri 2 mobile
6. **QUIC connection migration** — automatic wifi↔cellular handoff
**Verification:**
- Desktop app builds and runs on Linux
- Mobile app builds for Android (emulator)
- Send message from CLI → received in GUI
---
### Phase 6: Polish & Ecosystem
**Goal:** Production readiness.
1. **Federation improvements** — DNS SRV discovery, persistent relay queue with retry
2. **Plugin system v2** — version field, config passthrough, async hooks, WASM plugins
3. **WebTransport** — browser clients over HTTP/3 (same quinn endpoint)
4. **WASM MLS** — compile openmls to wasm32 for browser E2E encryption
5. **CI/CD** — release automation, WASM CI, multi-platform (Linux + macOS)
6. **Security hardening:**
- Fuzz testing (hybrid KEM, sealed sender, padding, protobuf deser)
- Remove all `InsecureServerCertVerifier` paths
- Certificate pinning
- Add passkey/WebAuthn as alternative auth
7. **Double Ratchet for 1:1 DMs** — better per-message forward secrecy than MLS for 2-party
---
## RPC Method Inventory (33 total)
| Category | Methods | Proto File |
|----------|---------|-----------|
| Auth (OPAQUE) | opaqueRegisterStart, opaqueRegisterFinish, opaqueLoginStart, opaqueLoginFinish | auth.proto |
| Delivery | enqueue, fetch, fetchWait, peek, ack, batchEnqueue | delivery.proto |
| Keys | uploadKeyPackage, fetchKeyPackage, uploadHybridKey, fetchHybridKey, fetchHybridKeys | keys.proto |
| Channel | createChannel | channel.proto |
| User | resolveUser, resolveIdentity | user.proto |
| Blob | uploadBlob, downloadBlob | blob.proto |
| Device | registerDevice, listDevices, revokeDevice | device.proto |
| P2P | health, publishEndpoint, resolveEndpoint | p2p.proto |
| Federation | relayEnqueue, relayBatchEnqueue, proxyFetchKeyPackage, proxyFetchHybridKey, proxyResolveUser, federationHealth | federation.proto |
**New in v2:**
| Push Events | Description | Proto File |
|-------------|-------------|-----------|
| MessageNotification | New message available | push.proto |
| TypingNotification | Peer is typing | push.proto |
| ChannelUpdate | Channel created/member changed | push.proto |
| SessionExpired | Auth session expired | push.proto |
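
On the client side, the `event_type` field of a push frame has to be mapped onto these four events. The sketch below shows one way to do that; the numeric ids (1 through 4) and the big-endian byte order are assumptions, since the plan defines neither, and `PushEventType` and `parse_push_header` are hypothetical names.

```rust
use std::convert::TryFrom;

/// Push events listed in the plan. The numeric ids are placeholders.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u16)]
pub enum PushEventType {
    MessageNotification = 1,
    TypingNotification = 2,
    ChannelUpdate = 3,
    SessionExpired = 4,
}

impl TryFrom<u16> for PushEventType {
    /// The unrecognized raw value is returned so callers can log or skip it.
    type Error = u16;

    fn try_from(raw: u16) -> Result<Self, u16> {
        match raw {
            1 => Ok(PushEventType::MessageNotification),
            2 => Ok(PushEventType::TypingNotification),
            3 => Ok(PushEventType::ChannelUpdate),
            4 => Ok(PushEventType::SessionExpired),
            other => Err(other),
        }
    }
}

/// Parses the push header `[event_type: u16][payload_len: u32]`
/// (big-endian assumed) and returns the event type and payload length.
pub fn parse_push_header(buf: &[u8]) -> Option<(PushEventType, usize)> {
    if buf.len() < 6 {
        return None;
    }
    let raw = u16::from_be_bytes([buf[0], buf[1]]);
    let len = u32::from_be_bytes([buf[2], buf[3], buf[4], buf[5]]) as usize;
    Some((PushEventType::try_from(raw).ok()?, len))
}
```

Returning the unknown id as the error lets an older client skip event types added by a newer server instead of tearing down the stream.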
---
## Engineering Standards (carried from v1)
- Conventional commits: `feat:`, `fix:`, `chore:`, `docs:`, `test:`, `refactor:`
- GPG-signed commits only
- No `Co-authored-by` trailers
- No `.unwrap()` on crypto or I/O in non-test paths
- Secrets: zeroize on drop, never in logs
- No stubs / `todo!()` / `unimplemented!()` in production code
- `clippy::unwrap_used = "deny"` at workspace level