chore: prepare repository for public release
- Add split licensing: AGPL-3.0 for server, Apache-2.0/MIT for all other crates and SDKs (Signal-style)
- Add SECURITY.md with vulnerability disclosure policy
- Add CONTRIBUTING.md with build, test, and code standards
- Add "not audited" security disclaimer to README
- Add workspace package metadata (license, repository, keywords)
- Move internal planning docs to docs/internal/ (gitignored)
@@ -1,483 +0,0 @@
# quicproquo: AI Agent Team Specification

> A structured multi-agent system for bringing quicproquo from working prototype
> to production-grade, audited, documented, deployable software.

---

## Philosophy

This team exists because shipping production software requires more than writing
code. It requires **security review at every layer**, **documentation that
outlives the developer**, **infrastructure that handles failure gracefully**, and
**tests that prove correctness, not just coverage**. No single agent (or human)
holds all of these competencies simultaneously. The team is designed so that
each agent is **narrowly expert** and has **deep context** on the quicproquo
codebase.

### Principles

1. **Read before write.** Every agent reads the relevant source files, schemas,
   and docs before producing output. No agent guesses at code structure.
2. **Scope discipline.** Agents only touch their assigned crates and concern
   areas. `rust-server-dev` never edits client code. `security-auditor` never
   edits production code.
3. **Security is not optional.** Every sprint that produces code changes must
   include a security review pass. This is not a suggestion; it is a gate.
4. **Docs are a deliverable.** Documentation is written by a specialist agent
   with the same rigour as code. API docs, architecture docs, and user guides
   are first-class outputs.
5. **Incremental, verifiable progress.** Each sprint produces a verifiable
   outcome: tests pass, the audit report is clean, docs build, the Docker image runs.

---

## Team Roster

### Development Agents

| Agent | Scope | Tools | Edits Code? |
|-------|-------|-------|-------------|
| `rust-architect` | Architecture design, ADRs, crate boundary review | Read, Glob, Grep | No |
| `rust-core-dev` | `quicproquo-core`: crypto, MLS, Noise, hybrid KEM | Read, Glob, Grep, Edit, Write, Bash | Yes |
| `rust-server-dev` | `quicproquo-server`: AS, DS, RPC, storage, federation | Read, Glob, Grep, Edit, Write, Bash | Yes |
| `rust-client-dev` | `quicproquo-client`: CLI, REPL, OPAQUE, local state | Read, Glob, Grep, Edit, Write, Bash | Yes |

### Security Agents

| Agent | Scope | Tools | Edits Code? |
|-------|-------|-------|-------------|
| `security-auditor` | Code review, finding report, threat analysis | Read, Glob, Grep | No |

### Quality Agents

| Agent | Scope | Tools | Edits Code? |
|-------|-------|-------|-------------|
| `test-engineer` | Unit, integration, E2E, property tests, coverage | Read, Glob, Grep, Edit, Write, Bash | Yes (tests only) |
| `devops-engineer` | Docker, CI/CD, deployment, monitoring, infrastructure | Read, Glob, Grep, Edit, Write, Bash | Yes |

### Documentation Agents

| Agent | Scope | Tools | Edits Code? |
|-------|-------|-------|-------------|
| `docs-engineer` | User guides, API docs, architecture docs, mdBook | Read, Glob, Grep, Edit, Write, Bash | Yes (docs only) |

### Coordination Agents

| Agent | Scope | Tools | Edits Code? |
|-------|-------|-------|-------------|
| `roadmap-tracker` | Progress assessment, status reports, blocker analysis | Read, Glob, Grep | No |

---

## Agent Role Specifications

### rust-architect

**Identity:** Senior Rust systems architect with deep knowledge of MLS
(RFC 9420), the Noise Protocol Framework, Cap'n Proto RPC, and post-quantum
cryptography.

**Reads:** `master-prompt.md`, `ROADMAP.md`, all `.capnp` schemas, crate
`lib.rs` and `mod.rs` files, `Cargo.toml` dependency lists.

**Produces:**
- Architecture Decision Records (ADRs) in Context → Decision → Consequences format
- Crate boundary violation reports
- Dependency impact assessments for new crates
- Design documents for features spanning multiple crates
- Review feedback on proposed implementations

**Never does:** Write implementation code, edit source files, run commands.

**Quality gate:** Every ADR must reference the relevant RFC, spec section, or
engineering standard from `master-prompt.md`.

---

### rust-core-dev

**Identity:** Cryptography-focused Rust developer. Expert in `openmls`, `snow`,
`ml-kem`, `opaque-ke`, `zeroize`, and the `dalek` ecosystem.

**Owns:** `crates/quicproquo-core/`

**Security invariants (non-negotiable):**
- Every crypto operation returns `Result`; never `.unwrap()` or `.expect()`
- All key material types derive `Zeroize` and `ZeroizeOnDrop`
- No secret bytes in `tracing` or `log` output
- Constant-time comparisons via `subtle::ConstantTimeEq` for auth tags
- No `unsafe` without a `// SAFETY:` comment documenting the invariant
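The invariants above can be illustrated with a minimal, std-only Rust sketch. The 32-byte `SecretKey`, `KeyError`, and the hand-rolled `ct_eq` are hypothetical names. The crate itself is specified to use `zeroize` and `subtle`, which give stronger guarantees (for example, optimisation barriers) than this illustration does. The sketch shows four things: a fallible constructor in place of `.unwrap()`, a redacting `Debug` impl so key bytes cannot reach `tracing`, wipe-on-drop through a documented `unsafe` block, and a comparison with no early exit on mismatch.

```rust
use std::fmt;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical 32-byte secret key; illustration only.
pub struct SecretKey {
    bytes: [u8; 32],
}

#[derive(Debug, PartialEq, Eq)]
pub enum KeyError {
    InvalidLength(usize),
}

impl SecretKey {
    /// Fallible constructor: malformed input is an `Err`, never a panic.
    pub fn from_slice(input: &[u8]) -> Result<Self, KeyError> {
        if input.len() != 32 {
            return Err(KeyError::InvalidLength(input.len()));
        }
        let mut bytes = [0u8; 32];
        bytes.copy_from_slice(input);
        Ok(Self { bytes })
    }

    pub fn as_bytes(&self) -> &[u8; 32] {
        &self.bytes
    }
}

/// Redacted `Debug`: key bytes can never end up in `tracing`/`log` output.
impl fmt::Debug for SecretKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("SecretKey([REDACTED])")
    }
}

/// Wipe on drop (the behaviour `ZeroizeOnDrop` provides in the real crate).
impl Drop for SecretKey {
    fn drop(&mut self) {
        for byte in self.bytes.iter_mut() {
            // SAFETY: `byte` comes from `iter_mut`, so it is a valid, aligned,
            // exclusive pointer to an initialised `u8` inside `self.bytes`.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
        // Stop the compiler from reordering the wipe past later operations.
        compiler_fence(Ordering::SeqCst);
    }
}

/// Compares equal-length inputs without exiting early on the first mismatch.
/// Lengths are not secret for fixed-size tags, so a length mismatch returns early.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```

In production code, `#[derive(Zeroize, ZeroizeOnDrop)]` and `subtle::ConstantTimeEq` remain the tools the invariants call for. The hand-rolled versions here only show the shape of the contract.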

**Before any edit:**
1. Read the target file in full
2. Read `ROADMAP.md` to verify the change is in scope
3. Read `master-prompt.md` §Non-Negotiable Engineering Standards
4. Check whether a new dependency is needed; if so, justify it in the commit message

**After any edit:** `cargo check -p quicproquo-core && cargo test -p quicproquo-core`

---

### rust-server-dev

**Identity:** Backend systems developer. Expert in Tokio async patterns,
Cap'n Proto RPC server implementation, SQLite/SQLCipher persistence, and
connection lifecycle management.

**Owns:** `crates/quicproquo-server/`

**Security invariants:**
- No `.unwrap()` on any `Mutex::lock()`, I/O, or database operation
- Auth tokens validated before any privileged RPC handler runs
- `QPQ_PRODUCTION=true` rejects default/empty tokens at startup
- Rate limiting applied before processing enqueue operations
- Structured logging via `tracing`; no `println!` or `eprintln!`
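Two of these invariants can be sketched with std only. Every name here is hypothetical: `check_auth_token`, `StartupError`, `ServerError`, `with_lock`, and the placeholder default token `"changeme"` used in the tests do not come from the server. The sketch also treats only the exact value `true` as enabling production mode, which is an assumption.

```rust
use std::sync::Mutex;

#[derive(Debug, PartialEq, Eq)]
pub enum StartupError {
    EmptyToken,
    DefaultToken,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ServerError {
    LockPoisoned,
}

/// Interprets the raw `QPQ_PRODUCTION` value (assumption: only "true" enables it).
pub fn production_mode(raw: Option<&str>) -> bool {
    matches!(raw, Some("true"))
}

/// Startup check: in production, refuse an empty or known-default token.
pub fn check_auth_token(
    production: bool,
    token: &str,
    known_defaults: &[&str],
) -> Result<(), StartupError> {
    if !production {
        return Ok(());
    }
    if token.trim().is_empty() {
        return Err(StartupError::EmptyToken);
    }
    if known_defaults.contains(&token) {
        return Err(StartupError::DefaultToken);
    }
    Ok(())
}

/// Runs `f` under the lock, turning a poisoned mutex into an error
/// instead of panicking via `.unwrap()`.
pub fn with_lock<T, R>(m: &Mutex<T>, f: impl FnOnce(&mut T) -> R) -> Result<R, ServerError> {
    let mut guard = m.lock().map_err(|_| ServerError::LockPoisoned)?;
    Ok(f(&mut *guard))
}
```

At startup, a server could compute `production_mode(std::env::var("QPQ_PRODUCTION").ok().as_deref())`, pass the result to `check_auth_token`, and exit with a clear `tracing` error on `Err`.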

**Before any edit:**
1. Read the target file and its corresponding `.capnp` schema
2. Verify the Cap'n Proto interface hasn't changed out from under you
3. Check for existing tests in `crates/quicproquo-server/tests/`

**After any edit:** `cargo check -p quicproquo-server && cargo test -p quicproquo-server`

---

### rust-client-dev

**Identity:** CLI and application developer. Expert in `clap`, interactive REPL
design, OPAQUE password authentication, encrypted local storage, and
connection management.

**Owns:** `crates/quicproquo-client/`

**UX invariants:**
- Clear, user-facing error messages: no raw Rust error types in REPL output
- REPL prompt shows current context (server address, active conversation)
- Graceful handling of server disconnection with auto-reconnect
- State file encrypted with Argon2id + ChaCha20-Poly1305
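One way to keep raw error types out of REPL output is to give the client's error enum a user-facing `Display` implementation and print only that. This std-only sketch is illustrative: `ClientError` and its variants are hypothetical, not the client's actual types. The derived `Debug` form remains available for log output.

```rust
use std::fmt;
use std::io;

/// Hypothetical client error type; variants are illustrative only.
#[derive(Debug)]
pub enum ClientError {
    Io(io::Error),
    AuthRejected,
    Disconnected { server: String },
}

/// User-facing text for the REPL. Internal details stay in `Debug` (logs only).
impl fmt::Display for ClientError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ClientError::Io(e) if e.kind() == io::ErrorKind::ConnectionRefused => {
                f.write_str("Could not reach the server. Is it running?")
            }
            ClientError::Io(_) => f.write_str("A local I/O error occurred."),
            ClientError::AuthRejected => {
                f.write_str("Login failed: check your username and password.")
            }
            ClientError::Disconnected { server } => {
                write!(f, "Lost connection to {server}. Reconnecting...")
            }
        }
    }
}

impl std::error::Error for ClientError {}
```

The REPL would then print errors with `{}` and never with `{:?}`.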

**Before any edit:**
1. Read the target file and related command handlers in `commands.rs`
2. Understand state management in `state.rs`
3. Check the REPL command table for conflicts

**After any edit:** `cargo check -p quicproquo-client && cargo test -p quicproquo-client`

---

### security-auditor

**Identity:** Application security engineer specialising in cryptographic
protocol implementations. Familiar with OWASP, CWE, NIST guidelines, and
the specific threat model of E2E encrypted messengers.

**Audit checklist (every review):**
1. `.unwrap()` / `.expect()` outside `#[cfg(test)]` on crypto or I/O paths
2. Key material types missing `Zeroize` / `ZeroizeOnDrop`
3. Secrets (keys, passwords, tokens, nonces) reaching `tracing`/`log`/`println`
4. Non-constant-time comparisons on authentication tags, tokens, or MACs
5. `panic!` / `unreachable!` in production paths
6. `unsafe` blocks without documented safety invariants
7. Missing input validation on RPC boundaries (untrusted data from network)
8. Race conditions in shared state (DashMap, Mutex, RwLock patterns)
9. Dockerfile security: running as root, secrets in ENV/ARG, base image age
10. Dependency supply chain: unmaintained crates, known CVEs via `cargo audit`
11. Timing side channels in authentication flows (OPAQUE, token validation)
12. Replay attack vectors in message delivery

**Output format:** Prioritised Markdown report with severity levels:
`Critical > High > Medium > Low > Informational`

Each finding includes: file:line, description, attack scenario, remediation.

**Never does:** Edit source files. Findings only.

---

### test-engineer

**Identity:** QA engineer with expertise in Rust testing patterns, property-based
testing (`proptest`), integration test harnesses, and E2E test design for
networked systems.

**Responsibilities:**
- Write unit tests inside `#[cfg(test)]` modules
- Write integration tests in `crates/<crate>/tests/`
- Write E2E tests that spin up server + client(s)
- Run `cargo test` and diagnose failures
- Verify test coverage against ROADMAP milestone criteria
- Identify untested code paths and edge cases

**Naming convention:** `test_<what>_<expected_outcome>` (snake_case)

**E2E test requirements:**
- Use the `AUTH_LOCK` mutex for tests that share auth context
- Run E2E tests single-threaded: `cargo test -- --test-threads 1`
- Clean up spawned server processes on test completion
- Assert on specific error types, not just `is_err()`
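The naming convention and E2E requirements can be sketched in std-only Rust. The sketch assumes `AUTH_LOCK` is a plain `static Mutex<()>`. `auth_guard`, `login`, and `LoginError` are hypothetical stand-ins for the real client calls. The guard recovers from poisoning so that one panicking test does not cascade into every later test, and the test asserts on a specific error variant rather than on `is_err()`.

```rust
use std::sync::{Mutex, MutexGuard};

/// Assumed shape of the shared lock: a plain `Mutex<()>` that serialises
/// tests touching process-wide auth state.
static AUTH_LOCK: Mutex<()> = Mutex::new(());

/// Takes the lock; if an earlier test panicked while holding it, recover
/// the guard instead of failing every later test.
pub fn auth_guard() -> MutexGuard<'static, ()> {
    AUTH_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Hypothetical error type with distinct, assertable variants.
#[derive(Debug, PartialEq, Eq)]
pub enum LoginError {
    InvalidCredentials,
    ServerUnavailable,
}

/// Hypothetical stand-in for a real client login call.
pub fn login(server_up: bool, user: &str, password: &str) -> Result<(), LoginError> {
    if !server_up {
        return Err(LoginError::ServerUnavailable);
    }
    if user == "alice" && password == "correct horse" {
        Ok(())
    } else {
        Err(LoginError::InvalidCredentials)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Naming: test_<what>_<expected_outcome>
    #[test]
    fn test_login_wrong_password_returns_invalid_credentials() {
        let _guard = auth_guard();
        // Assert the specific variant, not just `is_err()`.
        assert_eq!(login(true, "alice", "wrong"), Err(LoginError::InvalidCredentials));
    }
}
```

Holding `_guard` for the whole test body keeps the shared auth state exclusive until the test returns, even if an assertion fails.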

**After writing tests:** Run them, report pass/fail, and diagnose failures.

---

### devops-engineer

**Identity:** Infrastructure and deployment engineer. Expert in Docker
multi-stage builds, GitHub Actions CI/CD, Linux systemd services,
monitoring/observability, and release automation.

**Owns:** `docker/`, `.github/`, `docker-compose.yml`, deployment configs

**Responsibilities:**
- Docker image builds, optimisation, and security hardening
- CI pipeline maintenance and enhancement
- Release automation (cargo-release, changelogs, tagging)
- Monitoring setup (Prometheus metrics endpoint, Grafana dashboards)
- Deployment documentation (systemd units, Docker Compose, Kubernetes)
- Infrastructure-as-code for test and staging environments
- Cross-compilation targets (musl, ARM, MIPS for OpenWrt)
- Binary size optimisation for embedded/mesh deployments

**Quality gates:**
- Docker image builds successfully: `docker build -f docker/Dockerfile .`
- CI pipeline passes locally: `act` or manual validation
- Release artifacts are reproducible

---

### docs-engineer

**Identity:** Technical writer with deep understanding of cryptographic
protocols and systems programming. Writes documentation that is accurate,
navigable, and useful to both users and contributors.

**Owns:** `docs/`, `README.md`, `CONTRIBUTING.md`, `SECURITY.md`, inline
doc comments on public APIs

**Documentation tiers:**

1. **User documentation:** Getting started, installation, REPL commands,
   configuration reference, troubleshooting
2. **Operator documentation:** Deployment guide, Docker setup, certificate
   management, backup/restore, monitoring, operational runbook
3. **Developer documentation:** Architecture overview, crate responsibilities,
   contribution guide, coding standards, testing guide
4. **Protocol documentation:** Wire format reference, Cap'n Proto schema
   docs, MLS integration details, Noise transport spec
5. **Security documentation:** Threat model, trust boundaries, key lifecycle,
   audit reports, responsible disclosure policy

**Quality gates:**
- `mdbook build docs/` succeeds without warnings
- All code examples in docs compile (`cargo test --doc`)
- Internal links resolve (no broken cross-references)
- Every public API has a doc comment with examples

---

### roadmap-tracker

**Identity:** Project manager and progress analyst. Reads code and docs to
objectively assess completion status.

**Method:**
1. Read `ROADMAP.md` in full
2. For each unchecked `- [ ]` item, search the source for implementation evidence
3. Classify: Complete, Partial (what exists vs. what's missing), Not Started
4. Identify blockers (dependency chains between items)
5. Identify quick wins (< 1 hour, self-contained, high impact)
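Step 2 starts from the list of unchecked items. The std-only sketch below extracts them from `ROADMAP.md` text and keeps 1-based line numbers so a report can cite them. `OpenItem` and `unchecked_items` are hypothetical names, and the sketch assumes items use the exact `- [ ] ` prefix.

```rust
/// An unchecked roadmap item and where it was found.
#[derive(Debug, PartialEq, Eq)]
pub struct OpenItem {
    /// 1-based line number in ROADMAP.md.
    pub line: usize,
    pub text: String,
}

/// Collects every `- [ ] ...` item, including indented sub-items.
pub fn unchecked_items(markdown: &str) -> Vec<OpenItem> {
    markdown
        .lines()
        .enumerate()
        .filter_map(|(idx, line)| {
            line.trim_start()
                .strip_prefix("- [ ] ")
                .map(|rest| OpenItem { line: idx + 1, text: rest.trim().to_string() })
        })
        .collect()
}
```

Each resulting item then receives its step 3 classification based on the evidence found in the source.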

**Output:** Structured Markdown status report.

**Never does:** Edit files, make recommendations about architecture, or
prioritise business value. Pure objective assessment.

---

## Sprint Definitions

Sprints are groups of agent tasks that can run in parallel. Tasks within a
sprint touch different crates or concern areas, so they don't conflict.

### Production Readiness Path

The sprints below form a dependency chain. Run them in order.

```
status → audit → phase1-hardening → phase1-infra → phase2-tests →
docs-foundation → security-review → release-prep
```

### Sprint: `status`

**Purpose:** Baseline assessment before starting work.

| Agent | Task |
|-------|------|
| `roadmap-tracker` | Full roadmap status report across all phases |
| `security-auditor` | Quick security sweep of the last 10 commits (`HEAD~10..HEAD`) |

### Sprint: `audit`

**Purpose:** Deep security audit + roadmap analysis.

| Agent | Task |
|-------|------|
| `security-auditor` | Full audit of quicproquo-core and quicproquo-server |
| `roadmap-tracker` | Detailed Phase 1 and Phase 2 completion assessment |

### Sprint: `phase1-hardening`

**Purpose:** Eliminate crash paths and enforce secure defaults.

| Agent | Task |
|-------|------|
| `rust-core-dev` | Remove `.unwrap()`/`.expect()` from non-test code in core |
| `rust-server-dev` | Remove `.unwrap()`/`.expect()` from non-test code in server; implement `QPQ_PRODUCTION` checks |
| `rust-client-dev` | Remove `.unwrap()`/`.expect()` from non-test code in client; fix `AUTH_CONTEXT.read().expect()` |

### Sprint: `phase1-infra`

**Purpose:** Fix deployment infrastructure.

| Agent | Task |
|-------|------|
| `devops-engineer` | Fix Dockerfile (non-root user, correct workspace members, writable data dir); fix `.gitignore`; validate Docker build |
| `rust-architect` | Design TLS certificate lifecycle: CA-signed cert flow, `--tls-required` flag, rotation without downtime |

### Sprint: `phase2-tests`

**Purpose:** Build test confidence.

| Agent | Task |
|-------|------|
| `test-engineer` | E2E tests: auth failures, message ordering, concurrent clients, KeyPackage exhaustion |
| `test-engineer` | Unit tests: REPL parsing edge cases, token cache expiry, state file encryption round-trip |
| `devops-engineer` | CI hardening: coverage reporting, Docker build validation in CI, `CODEOWNERS` enforcement |

### Sprint: `docs-foundation`

**Purpose:** Create production-quality documentation.

| Agent | Task |
|-------|------|
| `docs-engineer` | Create root-level `SECURITY.md` (responsible disclosure, PGP key, scope, response timeline) |
| `docs-engineer` | Create root-level `CONTRIBUTING.md` (dev setup, PR process, commit conventions, testing, review checklist) |
| `docs-engineer` | Audit and update all `docs/src/` pages for accuracy against current codebase; fix broken references |
| `docs-engineer` | Write operator deployment guide: Docker, systemd, certificate setup, monitoring, backup/restore |

### Sprint: `security-review`

**Purpose:** Final security gate before release.

| Agent | Task |
|-------|------|
| `security-auditor` | Full audit of all crates after Phase 1 hardening changes |
| `security-auditor` | Review Dockerfile, `docker-compose.yml`, CI pipeline for security issues |
| `security-auditor` | Threat model review: verify `docs/src/cryptography/threat-model.md` matches current implementation |

### Sprint: `release-prep`

**Purpose:** Prepare for first production release.

| Agent | Task |
|-------|------|
| `devops-engineer` | Set up cargo-release workflow, CHANGELOG.md generation, version tagging strategy |
| `docs-engineer` | Final README.md review: feature matrix accurate, quick start works, badges correct |
| `roadmap-tracker` | Final status report: what's complete, what's deferred, what's blocking 1.0 |

---

## Usage

```bash
# Full orchestrator mode: the orchestrator delegates to the right agents
python scripts/ai_team.py "Implement Phase 1.1 unwrap removal across all crates"

# Direct agent access: bypass the orchestrator for focused work
python scripts/ai_team.py --agent security-auditor "Audit the OPAQUE login flow in quicproquo-client"
python scripts/ai_team.py --agent docs-engineer "Write the operator deployment guide"

# Predefined parallel sprint: multiple agents work simultaneously
python scripts/ai_team.py --sprint audit
python scripts/ai_team.py --sprint phase1-hardening
python scripts/ai_team.py --sprint docs-foundation

# Ad-hoc parallel tasks
python scripts/ai_team.py --parallel \
  "rust-server-dev: Fix rate limiting bypass in enqueue handler" \
  "security-auditor: Review the rate limiting implementation"

# Discovery
python scripts/ai_team.py --list-agents
python scripts/ai_team.py --list-sprints
```

### Recommended Production Readiness Sequence

```bash
# 1. Assess current state
python scripts/ai_team.py --sprint status

# 2. Deep audit
python scripts/ai_team.py --sprint audit

# 3. Fix critical issues (code changes)
python scripts/ai_team.py --sprint phase1-hardening

# 4. Fix infrastructure
python scripts/ai_team.py --sprint phase1-infra

# 5. Build test confidence
python scripts/ai_team.py --sprint phase2-tests

# 6. Write documentation
python scripts/ai_team.py --sprint docs-foundation

# 7. Final security review (after all code changes)
python scripts/ai_team.py --sprint security-review

# 8. Prepare release
python scripts/ai_team.py --sprint release-prep
```

---

## Quality Gates

Every sprint must pass its quality gate before the next sprint begins.

| Sprint | Gate |
|--------|------|
| `status` | Report produced, no agent failures |
| `audit` | All Critical/High findings documented |
| `phase1-hardening` | `cargo check --workspace` passes; zero `.unwrap()` outside `#[cfg(test)]` |
| `phase1-infra` | `docker build -f docker/Dockerfile .` succeeds; `.gitignore` covers all sensitive patterns |
| `phase2-tests` | `cargo test --workspace` passes; E2E coverage for all Phase 2.1 items |
| `docs-foundation` | `mdbook build docs/` succeeds; `SECURITY.md` and `CONTRIBUTING.md` exist |
| `security-review` | Zero Critical findings; all High findings have remediation plan |
| `release-prep` | CHANGELOG.md exists; version tags consistent; README quick start verified |
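The gating rule can be sketched as a loop over the chain from the Production Readiness Path that stops at the first failed gate. This is a hypothetical Rust illustration, not the logic of `scripts/ai_team.py`. `CHAIN`, `ChainOutcome`, and `run_chain` are invented names, and the closure stands in for running a sprint and evaluating its gate.

```rust
/// The production-readiness chain, in dependency order.
pub const CHAIN: [&str; 8] = [
    "status",
    "audit",
    "phase1-hardening",
    "phase1-infra",
    "phase2-tests",
    "docs-foundation",
    "security-review",
    "release-prep",
];

#[derive(Debug, PartialEq, Eq)]
pub enum ChainOutcome {
    Completed,
    /// The sprint whose quality gate failed; nothing after it was run.
    StoppedAt(&'static str),
}

/// Runs sprints in order. `run_and_check_gate` is a hypothetical hook that
/// runs one sprint and returns whether its quality gate passed.
pub fn run_chain(mut run_and_check_gate: impl FnMut(&'static str) -> bool) -> ChainOutcome {
    for sprint in CHAIN {
        if !run_and_check_gate(sprint) {
            return ChainOutcome::StoppedAt(sprint);
        }
    }
    ChainOutcome::Completed
}
```

Stopping at the first failure means a later sprint never builds on unverified output from an earlier one.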

---

## Extending the Team

To add a new agent:

1. Define it in the `AGENTS` dict in `scripts/ai_team.py`
2. Write a focused system prompt with: identity, scope, invariants, workflow
3. Specify the minimal tool set (prefer read-only when possible)
4. Add it to the relevant sprints
5. Document it in this file

To add a new sprint:

1. Define it in the `SPRINTS` dict in `scripts/ai_team.py`
2. Ensure all tasks within the sprint touch different files/crates
3. Document the quality gate
4. Add it to the dependency chain if it has ordering requirements

---

*quicproquo AI Agent Team v2.0 | 2026-03-03*
@@ -1,106 +0,0 @@
# Multi-Agent Work Plan: Sections 1 (Security) + 5 (Features)

This document splits work for **Future Improvements §1 (Security and hardening)** and **§5 (Features and product)** between two agents so they can work in parallel with minimal merge conflicts.

---

## Agent A: Security and hardening

**Owns:** Server auth/OPAQUE, TLS config, core crypto (identity, keypackage, hybrid_kem), docs under `docs/src/cryptography/`, and TLS/cert docs.

### A1. 1.2 CA-signed TLS / certificate lifecycle
- **Files:** `docs/src/getting-started/` (new or existing), `crates/quicproquo-server/src/tls.rs` (optional env), `README.md`.
- **Tasks:**
   1. Add a **Certificate lifecycle** doc covering CA-issued certificates (e.g. Let's Encrypt), certificate rotation, and optional OCSP/CRL checking. Recommend certificate pinning for single-server deployments.
   2. Optional: a server config option or env var to prefer a CA-signed certificate path (e.g. `QPQ_USE_CA_CERT=1`, reading from a different path). Low priority if the docs suffice.
- **Deliverable:** `docs/src/getting-started/certificate-lifecycle.md` (or a section in running-the-server) + README link.

### A2. 1.4 Username enumeration (OPAQUE)
- **Files:** `crates/quicproquo-server/src/node_service/auth_ops.rs`, `docs/SECURITY-AUDIT.md`.
- **Tasks:**
   1. Document the risk in SECURITY-AUDIT (already mentioned there).
   2. Optional mitigation: ensure `get_user_record` is always called before `ServerLogin::start` (already true). If desired, add a constant-time delay or dummy work when the user is not found, so response timing does not leak whether the account exists. Keep OPAQUE security unchanged.
- **Deliverable:** Doc update; optional small code change in `handle_opaque_login_start`.
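The dummy-work option in A2 amounts to running the same code path whether or not the username exists. The std-only sketch below illustrates that idea. `Registry`, `UserRecord`, `login_start`, and `respond` are hypothetical names, and `respond` is a placeholder for the real `ServerLogin::start` step that performs no cryptography.

```rust
use std::collections::HashMap;

/// Hypothetical stored registration record (opaque bytes).
#[derive(Clone, Debug)]
pub struct UserRecord(pub Vec<u8>);

pub struct Registry {
    users: HashMap<String, UserRecord>,
    /// Same-sized stand-in used when the username is unknown.
    dummy: UserRecord,
}

impl Registry {
    pub fn new(users: HashMap<String, UserRecord>, dummy: UserRecord) -> Self {
        Self { users, dummy }
    }

    /// Always yields a record, so the next step does the same work
    /// for known and unknown users.
    fn record_or_dummy(&self, username: &str) -> &UserRecord {
        self.users.get(username).unwrap_or(&self.dummy)
    }
}

/// Placeholder for the real `ServerLogin::start` step; NOT cryptography.
fn respond(record: &UserRecord, request: &[u8]) -> Vec<u8> {
    record.0.iter().zip(request.iter().cycle()).map(|(r, q)| r ^ q).collect()
}

/// Login start: a single code path regardless of whether the user exists.
pub fn login_start(registry: &Registry, username: &str, request: &[u8]) -> Vec<u8> {
    let record = registry.record_or_dummy(username);
    respond(record, request)
}
```

The dummy record should be the same size as a real one. Ideally it is derived deterministically from a server secret and the username, so repeated probes for the same name get a consistent response. The `HashMap` lookup itself still differs slightly between hits and misses, so this narrows the timing signal rather than eliminating it.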

### A3. 1.1 M7: Post-quantum MLS
- **Files:** `crates/quicproquo-core/src/` (new or modified crypto provider), `crates/quicproquo-core/src/group.rs`, `crates/quicproquo-core/src/hybrid_kem.rs`, `crates/quicproquo-core/src/hybrid_crypto.rs`.
- **Tasks:**
   1. Implement a custom `OpenMlsCryptoProvider` (or adapter) that uses hybrid X25519 + ML-KEM-768 for the MLS KEM (HPKE layer).
   2. Wire hybrid shared-secret derivation (see milestone M7) into the provider.
   3. Run the full test suite; ensure the M3/M4/M5 tests pass.
- **Deliverable:** Hybrid KEM in the MLS path; tests green. This is a large change; coordinate with other work in the core crate.

### A4. 1.3 Stronger credential binding
- **Files:** Docs only for now.
- **Tasks:** Add a short **Future research** subsection or ADR covering X.509-based MLS credentials or Key Transparency for public key binding. No code change in this round.
- **Deliverable:** `docs/src/roadmap/future-research.md` or ADR update.

---

## Agent B: Features and product

**Owns:** Cap'n Proto schema (`node.capnp` delivery/channel methods), server storage (`Store` trait, `FileBackedStore`, `SqlStore`), `node_service/delivery.rs`, `node_service/key_ops.rs` (if `createChannel` lives there), client commands for channels.

### B1. 5.1 Private 1:1 channels (DM)
- **Files:** `schemas/node.capnp`, `crates/quicproquo-server/src/storage.rs`, `crates/quicproquo-server/src/sql_store.rs`, `crates/quicproquo-server/src/node_service/delivery.rs`, new `crates/quicproquo-server/src/node_service/channel_ops.rs` (or add to delivery), migrations for the channels table.
- **Tasks:**
   1. **Schema:** Add `createChannel @N (auth :Auth, peerKey :Data) -> (channelId :Data);` to `node.capnp`. Rebuild proto.
   2. **Store trait:** Add `create_channel(&self, member_a: &[u8], member_b: &[u8]) -> Result<Vec<u8>, StorageError>` and `get_channel_members(&self, channel_id: &[u8]) -> Result<Option<(Vec<u8>, Vec<u8>)>, StorageError>`. Implement them in FileBackedStore (in-memory map channel_id -> (a, b)) and SqlStore (channels table, unique on sorted (a, b)).
   3. **Server:** Implement `handle_create_channel`: auth required, identity required; create the channel with (caller_identity, peer_key); return a 16-byte channel_id (e.g. a UUID).
   4. **Delivery authz:** When `channel_id.len() == 16`, call `get_channel_members`. If it returns `Some((a, b))`, verify that the caller identity is one of a/b and that `recipient_key` is the other. If the channel is not found or authz fails, return E022 (or a new error code). Legacy: an empty `channel_id` keeps the current behaviour (no channel check).
   5. **Config:** Optional server flag to require channel authz for non-empty channel_id (default on).
- **Deliverable:** createChannel RPC, channel storage, per-channel authz on enqueue/fetch/fetchWait; legacy mode when channel_id is empty.
- **Ref:** [DM channels design](src/roadmap/dm-channels.md).
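Below is a minimal in-memory sketch of the B1 store methods and the delivery check, using only std. The method signatures follow task 2. Several parts are hypothetical or assumed:

- `InMemoryChannels`, `authorize_delivery`, and `DeliveryError` are invented names.
- The counter-based channel ID is a placeholder for a random UUID.
- Rejecting non-empty IDs whose length is not 16 is an assumption; the plan does not specify it.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, PartialEq, Eq)]
pub enum StorageError {
    LockPoisoned,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DeliveryError {
    Storage(StorageError),
    /// Unknown channel, or caller/recipient are not its members (E022 in the plan).
    ChannelDenied,
}

type Pair = (Vec<u8>, Vec<u8>);

#[derive(Default)]
struct Inner {
    members: HashMap<Vec<u8>, Pair>, // channel_id -> (a, b)
    by_pair: HashMap<Pair, Vec<u8>>, // sorted (a, b) -> channel_id
    counter: u64,
}

/// Hypothetical in-memory store mirroring the planned `Store` methods.
#[derive(Default)]
pub struct InMemoryChannels {
    inner: Mutex<Inner>,
}

impl InMemoryChannels {
    pub fn create_channel(&self, member_a: &[u8], member_b: &[u8]) -> Result<Vec<u8>, StorageError> {
        let mut inner = self.inner.lock().map_err(|_| StorageError::LockPoisoned)?;
        // Sort so (a, b) and (b, a) name the same channel.
        let key: Pair = if member_a <= member_b {
            (member_a.to_vec(), member_b.to_vec())
        } else {
            (member_b.to_vec(), member_a.to_vec())
        };
        if let Some(existing) = inner.by_pair.get(&key) {
            return Ok(existing.clone());
        }
        // Placeholder 16-byte ID; the plan suggests a random UUID instead.
        inner.counter += 1;
        let mut id = vec![0u8; 16];
        id[8..].copy_from_slice(&inner.counter.to_be_bytes());
        inner.members.insert(id.clone(), key.clone());
        inner.by_pair.insert(key, id.clone());
        Ok(id)
    }

    pub fn get_channel_members(&self, channel_id: &[u8]) -> Result<Option<Pair>, StorageError> {
        let inner = self.inner.lock().map_err(|_| StorageError::LockPoisoned)?;
        Ok(inner.members.get(channel_id).cloned())
    }
}

/// Delivery authz: empty ID = legacy path; 16-byte ID = membership check.
pub fn authorize_delivery(
    store: &InMemoryChannels,
    channel_id: &[u8],
    caller: &[u8],
    recipient: &[u8],
) -> Result<(), DeliveryError> {
    if channel_id.is_empty() {
        return Ok(()); // legacy behaviour: no channel check
    }
    if channel_id.len() != 16 {
        return Err(DeliveryError::ChannelDenied); // assumption: other lengths rejected
    }
    match store.get_channel_members(channel_id).map_err(DeliveryError::Storage)? {
        Some((a, b))
            if (caller == a.as_slice() && recipient == b.as_slice())
                || (caller == b.as_slice() && recipient == a.as_slice()) =>
        {
            Ok(())
        }
        _ => Err(DeliveryError::ChannelDenied),
    }
}
```

Sorting the pair before lookup is what makes `(a, b)` and `(b, a)` resolve to one channel. It mirrors the "unique on sorted (a, b)" constraint planned for SqlStore.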

### B2. 5.2 MLS lifecycle (remove, update, proposals)
- **Files:** `crates/quicproquo-core/src/group.rs`, client commands that use GroupMember.
- **Tasks:**
   1. Add `remove_member` (by index or identity) and `update_credential` / rekey using openmls APIs.
   2. Handle incoming MLS proposals (Remove, Update) in the `receive_message` path and apply them to group state.
   3. CLI: `remove` and `update` subcommands or options.
- **Deliverable:** Members can be removed and credentials updated; proposals handled; CLI exposed.
- **Ref:** OpenMLS API for `MlsGroup::remove_member`, `MlsGroup::process_pending_proposals`, etc.

### B3. 5.3 Sealed Sender and 5.4 Traffic analysis
- **Files:** Docs; optionally `crates/quicproquo-server`, `crates/quicproquo-client` for padding.
- **Tasks:**
   1. Document the current `sealed_sender` behaviour (enqueue without identity binding) and note that full “sender in ciphertext” is a future protocol change.
   2. Optional: add payload padding (e.g. pad to the next multiple of 256 bytes) or a random delay in the client send path for 5.4.
- **Deliverable:** Doc update; optional padding/behaviour.
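The std-only sketch below shows the optional padding from B3. It assumes a 4-byte big-endian length prefix followed by zero fill up to the next multiple of 256 bytes. That framing is an assumption, not an existing quicproquo format, and `pad`, `unpad`, and `PadError` are hypothetical names. Padding only hides message length if it is applied to the plaintext before encryption.

```rust
/// Padding granularity from the plan ("pad to next 256 bytes").
pub const BLOCK: usize = 256;
const LEN_PREFIX: usize = 4;

#[derive(Debug, PartialEq, Eq)]
pub enum PadError {
    TooLong,
    Malformed,
}

/// Frames `payload` as [u32 BE length][payload][zeros], total a multiple of BLOCK.
pub fn pad(payload: &[u8]) -> Result<Vec<u8>, PadError> {
    if payload.len() > u32::MAX as usize {
        return Err(PadError::TooLong);
    }
    let unpadded = LEN_PREFIX + payload.len();
    let padded_len = (unpadded + BLOCK - 1) / BLOCK * BLOCK;
    let mut out = Vec::with_capacity(padded_len);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out.resize(padded_len, 0);
    Ok(out)
}

/// Recovers the payload, rejecting frames that are not well formed.
pub fn unpad(padded: &[u8]) -> Result<&[u8], PadError> {
    if padded.len() < LEN_PREFIX || padded.len() % BLOCK != 0 {
        return Err(PadError::Malformed);
    }
    let len = u32::from_be_bytes([padded[0], padded[1], padded[2], padded[3]]) as usize;
    let end = LEN_PREFIX.checked_add(len).ok_or(PadError::Malformed)?;
    padded.get(LEN_PREFIX..end).ok_or(PadError::Malformed)
}
```

With this framing, any payload of up to 252 bytes becomes exactly 256 bytes on the wire, and a 253-byte payload becomes 512.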

---

## File ownership (avoid conflicts)

| Area | Agent A | Agent B |
|------|---------|---------|
| `schemas/node.capnp` | — | Add createChannel |
| `crates/quicproquo-server/src/node_service/auth_ops.rs` | 1.4 username enum | — |
| `crates/quicproquo-server/src/node_service/delivery.rs` | — | 5.1 channel authz |
| `crates/quicproquo-server/src/storage.rs` | — | 5.1 Store channel methods |
| `crates/quicproquo-server/src/sql_store.rs` | — | 5.1 channels table + impl |
| `crates/quicproquo-server/src/tls.rs` | 1.2 optional | — |
| `crates/quicproquo-core/` | 1.1 M7, 1.3 doc | 5.2 group.rs |
| `docs/` | 1.2, 1.3, 1.4, 5.3/5.4 | — (or shared) |

**Shared:** `docs/`, `README.md`. Prefer non-overlapping files (e.g. A adds `certificate-lifecycle.md`, and B does not edit it).

---

## Order of operations (recommended)

1. **Both:** Sync on schema and Store trait changes, so that B adds `createChannel` and the channel methods without A touching the same trait.
2. **Agent A:** Ship A1 (CA/TLS docs) and A2 (1.4 doc + optional code) first; then A3 (M7) in a follow-up PR/batch.
3. **Agent B:** Ship B1 (createChannel + channel authz) first; then B2 (MLS remove/update); then B3/B4 (docs/padding).

---

## Completion checklist

- [ ] A1: CA-signed TLS / certificate lifecycle doc
- [ ] A2: Username enumeration doc and/or mitigation
- [ ] A3: M7 hybrid KEM in MLS provider
- [ ] A4: 1.3 credential binding (docs)
- [ ] B1: createChannel RPC + channel storage + delivery authz
- [ ] B2: MLS remove/update and proposal handling
- [ ] B3/B4: Sealed Sender and traffic analysis (docs + optional padding)
@@ -1,317 +0,0 @@
# Consolidated Codebase Review: quicproquo

- **Date:** 2026-03-04
- **Reviewers:** 4 independent agents (security, architecture, code quality, correctness)
- **Scope:** Full codebase: all workspace crates, schemas, Cargo.toml

---

## CRITICAL (7 findings)

### C1. Federation service has NO authentication on inbound requests
**Source:** Security | **File:** `crates/quicproquo-server/src/federation/service.rs:22-201`

`FederationServiceImpl` handles inbound federation requests (`relay_enqueue`, `relay_batch_enqueue`, `proxy_fetch_key_package`, `proxy_resolve_user`) but performs **zero authentication** on the caller. The `auth` field in the request is only logged (its `origin` string) and never validated. While mTLS protects the transport, any server with a valid federation certificate can inject arbitrary messages, enumerate users, and fetch KeyPackages. The `FederationAuth.origin` field is a self-declared string that is not verified against the mTLS certificate's subject.

**Fix:** Validate `origin` against the mTLS client certificate's CN/SAN. Enforce per-peer rate limits. Consider signing federation messages at the application layer.
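The std-only sketch below covers the first part of the C1 fix: comparing the self-declared `origin` against names taken from the verified mTLS client certificate. `verify_origin` and `FederationAuthError` are hypothetical. The sketch assumes the TLS layer extracts `peer_names` from the certificate; that extraction is not shown. Current practice is to match against SAN `dNSName` entries rather than the CN.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum FederationAuthError {
    /// The TLS layer supplied no verified names for the peer.
    NoPeerIdentity,
    /// `origin` does not match any name in the peer certificate.
    OriginMismatch { claimed: String },
}

/// Accepts the request only if the self-declared `origin` is one of the DNS
/// names from the peer's verified mTLS certificate (case-insensitive,
/// ignoring a trailing dot).
pub fn verify_origin(claimed_origin: &str, peer_names: &[String]) -> Result<(), FederationAuthError> {
    if peer_names.is_empty() {
        return Err(FederationAuthError::NoPeerIdentity);
    }
    let claimed = claimed_origin.trim_end_matches('.');
    let matches = peer_names
        .iter()
        .any(|name| name.trim_end_matches('.').eq_ignore_ascii_case(claimed));
    if matches && !claimed.is_empty() {
        Ok(())
    } else {
        Err(FederationAuthError::OriginMismatch { claimed: claimed_origin.to_string() })
    }
}
```

Per-peer rate limits can then key on the verified certificate name instead of the claimed string.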
|
||||
|
||||
### C2. WebSocket bridge bypasses DM channel authorization
|
||||
**Source:** Security | **File:** `crates/quicproquo-server/src/ws_bridge.rs:230-305`
|
||||
The `handle_send` function lets any authenticated user enqueue a message to any recipient, bypassing the DM channel membership verification that the Cap'n Proto `enqueue` path enforces in `delivery.rs:93-135`. The WS bridge calls `store.enqueue()` directly, skipping channel membership auth, payload size limits (5 MB), rate limiting, hook invocations, delivery proof generation, and audit logging.
|
||||
**Fix:** Apply the same authorization, size limits, rate limiting, hooks, and audit logging as the Cap'n Proto delivery path.
|
||||
|
||||
### C3. `hpke_seal` silently returns empty ciphertext on error
|
||||
**Source:** Quality | **File:** `crates/quicproquo-core/src/hybrid_crypto.rs:198-201,216-219`
|
||||
`HybridCrypto::hpke_seal()` catches errors and returns `Ok(vec![])` instead of propagating. Empty ciphertexts are sent as if valid — data loss and security issue.
|
||||
**Fix:** Propagate errors via `Result`.
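The difference is easiest to see side by side. In the sketch below, a hypothetical `seal_inner` stands in for the real HPKE call. It only validates the key length and returns placeholder bytes, so it encrypts nothing. `hpke_seal_lossy` mirrors the behaviour described above, and `hpke_seal` shows the fix.

```rust
#[derive(Debug, PartialEq)]
pub enum HpkeError {
    InvalidPublicKey { len: usize },
}

/// Placeholder for the real HPKE seal: rejects keys that are not 32 bytes
/// and returns dummy bytes. It does NOT encrypt anything.
fn seal_inner(public_key: &[u8], plaintext: &[u8]) -> Result<Vec<u8>, HpkeError> {
    if public_key.len() != 32 {
        return Err(HpkeError::InvalidPublicKey { len: public_key.len() });
    }
    Ok(vec![0u8; plaintext.len() + 16])
}

/// Behaviour described in C3: the error is swallowed and an empty
/// "ciphertext" is reported as success.
pub fn hpke_seal_lossy(public_key: &[u8], plaintext: &[u8]) -> Result<Vec<u8>, HpkeError> {
    match seal_inner(public_key, plaintext) {
        Ok(ct) => Ok(ct),
        Err(_) => Ok(Vec::new()),
    }
}

/// Fixed behaviour: `?` hands the error to the caller.
pub fn hpke_seal(public_key: &[u8], plaintext: &[u8]) -> Result<Vec<u8>, HpkeError> {
    let ciphertext = seal_inner(public_key, plaintext)?;
    Ok(ciphertext)
}
```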
|
||||
|
||||
### C4. NodeServiceImpl god object (15 fields, 27 RPC methods, no capability segmentation)
|
||||
**Source:** Architecture | **File:** `crates/quicproquo-server/src/node_service/mod.rs:253-336`
|
||||
A single struct implements 27 methods spanning auth, delivery, key management, channels, blobs, devices, federation, and account lifecycle, and `handle_node_connection` takes 15 parameters. Because capabilities are not segmented, an unauthenticated client holds a capability that can invoke all 27 methods.
|
||||
**Fix:** Extract `ServerContext` struct. Split Cap'n Proto schema into capability interfaces (AuthService, DeliveryService, etc.) vended after auth.
|
||||
|
||||
### C5. FileBackedStore O(n) full-map serialization on every mutation
|
||||
**Source:** Architecture | **File:** `crates/quicproquo-server/src/storage.rs:327-442`
|
||||
Every write locks the Mutex, mutates the HashMap, and serializes the **entire** map to disk via `fs::write`, with no fsync and no atomic rename. This is both a performance cliff and a data-loss vector.
|
||||
**Fix:** Make SqlStore the default. If FileBackedStore remains for dev, use write-to-temp-then-rename.
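If FileBackedStore stays for dev use, write-to-temp-then-rename fits in a few lines of std. The sketch below uses a hypothetical helper, `write_atomic`. Its fixed `.tmp` suffix assumes that only one writer runs at a time, which the store's Mutex already ensures.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `path` so that a crash leaves either the old file or the
/// new one, never a truncated mix. The temp file lives in the same directory
/// because a rename is only atomic within one filesystem.
pub fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let dir = path
        .parent()
        .filter(|p| !p.as_os_str().is_empty())
        .unwrap_or(Path::new("."));
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = dir.join(tmp_name);

    {
        let mut tmp = File::create(&tmp_path)?;
        tmp.write_all(bytes)?;
        // Flush the contents to stable storage before the rename exposes them.
        tmp.sync_all()?;
    }
    fs::rename(&tmp_path, path)?;

    // On Unix, fsync the directory so the rename itself is durable.
    #[cfg(unix)]
    File::open(dir)?.sync_all()?;
    Ok(())
}
```

The `tempfile` crate's `NamedTempFile::persist()` gives the same rename step with unique temp names, which matters once writers can overlap.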
|
||||
|
||||
### C6. `std::sync::Mutex` in async context (server and P2P)
|
||||
**Source:** Architecture | **Files:** `storage.rs:1-7,25-28`, `sql_store.rs:4,50`, `node_service/mod.rs:272`
|
||||
Holding `std::sync::Mutex` across disk I/O blocks Tokio worker threads, causing head-of-line blocking.
|
||||
**Fix:** Replace with `tokio::sync::Mutex` or use `spawn_blocking` for disk I/O.
|
||||
|
||||
### C7. `hpke_setup_sender_and_export` silently downgrades to classical crypto on parse error
|
||||
**Source:** Security | **File:** `crates/quicproquo-core/src/hybrid_crypto.rs:263`
|
||||
When given an invalid hybrid public key, the function silently falls back to the classical RustCrypto provider. A malicious server can therefore force a post-quantum downgrade by serving a malformed hybrid key.
|
||||
**Fix:** Return error on hybrid key parse failure, consistent with `hpke_seal`.
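The sketch below illustrates the fail-closed rule. For illustration only, `HYBRID_PK_LEN` assumes an X25519 key (32 bytes) concatenated with an ML-KEM-768 encapsulation key (1184 bytes). The real encoding lives in `hybrid_kem.rs`, and here a length check stands in for full parsing.

```rust
/// Assumed hybrid key layout for this sketch: X25519 (32) || ML-KEM-768 (1184).
pub const HYBRID_PK_LEN: usize = 32 + 1184;

#[derive(Debug, PartialEq)]
pub enum Provider {
    Hybrid,
    Classical,
}

#[derive(Debug, PartialEq)]
pub enum KeyError {
    InvalidHybridKey { len: usize },
}

/// Downgrade-prone behaviour from C7: any parse failure selects classical crypto.
pub fn select_provider_lenient(pk: &[u8]) -> Provider {
    if pk.len() == HYBRID_PK_LEN { Provider::Hybrid } else { Provider::Classical }
}

/// Fixed behaviour: a key that should be hybrid but does not parse is an error.
pub fn select_provider_strict(pk: &[u8]) -> Result<Provider, KeyError> {
    if pk.len() == HYBRID_PK_LEN {
        Ok(Provider::Hybrid)
    } else {
        Err(KeyError::InvalidHybridKey { len: pk.len() })
    }
}
```

A classical path, if one is kept at all, would be chosen explicitly by the caller. It should never be reached because a hybrid key failed to parse.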
|
||||
|
||||
---
|
||||
|
||||
## HIGH (14 findings)
|
||||
|
||||
### H1. Global `AUTH_CONTEXT: RwLock<Option<ClientAuth>>`
|
||||
**Source:** Architecture + Quality | **File:** `crates/quicproquo-client/src/lib.rs:36,40,82`
|
||||
Blocks multi-account use, creates hidden coupling, and is the root cause of the `AUTH_LOCK` test-serialization hack.
|
||||
**Fix:** Replace with `ClientContext` struct passed to all functions.
|
||||
|
||||
### H2. Store trait is 30+ method monolith
|
||||
**Source:** Architecture | **File:** `crates/quicproquo-server/src/storage.rs:33-180`
|
||||
Any new storage backend must implement every method. Cannot be composed or partially implemented.
|
||||
**Fix:** Split into sub-traits: `KeyPackageStore`, `DeliveryStore`, `UserStore`, `ChannelStore`, etc.
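The sketch below shows the split with two of the suggested sub-traits. A blanket impl keeps code that needs the full `Store` compiling. Method names, `drain_inbox`, and `MemStore` are hypothetical, and the real trait has 30+ methods.

```rust
use std::collections::HashMap;

pub trait KeyPackageStore {
    fn upload_key_package(&mut self, identity: &[u8], kp: Vec<u8>);
    fn fetch_key_package(&mut self, identity: &[u8]) -> Option<Vec<u8>>;
}

pub trait DeliveryStore {
    fn enqueue(&mut self, recipient: &[u8], payload: Vec<u8>);
    fn fetch(&mut self, recipient: &[u8]) -> Vec<Vec<u8>>;
}

/// Umbrella trait: anything implementing every sub-trait is a `Store`.
pub trait Store: KeyPackageStore + DeliveryStore {}
impl<T: KeyPackageStore + DeliveryStore> Store for T {}

/// Handlers depend only on the capability they use.
pub fn drain_inbox(store: &mut impl DeliveryStore, recipient: &[u8]) -> usize {
    store.fetch(recipient).len()
}

#[derive(Default)]
pub struct MemStore {
    key_packages: HashMap<Vec<u8>, Vec<Vec<u8>>>,
    queues: HashMap<Vec<u8>, Vec<Vec<u8>>>,
}

impl KeyPackageStore for MemStore {
    fn upload_key_package(&mut self, identity: &[u8], kp: Vec<u8>) {
        self.key_packages.entry(identity.to_vec()).or_default().push(kp);
    }
    fn fetch_key_package(&mut self, identity: &[u8]) -> Option<Vec<u8>> {
        // MLS KeyPackages are single-use, so fetching consumes one.
        self.key_packages.get_mut(identity)?.pop()
    }
}

impl DeliveryStore for MemStore {
    fn enqueue(&mut self, recipient: &[u8], payload: Vec<u8>) {
        self.queues.entry(recipient.to_vec()).or_default().push(payload);
    }
    fn fetch(&mut self, recipient: &[u8]) -> Vec<Vec<u8>> {
        self.queues.remove(recipient).unwrap_or_default()
    }
}
```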
|
||||
|
||||
### H3. CoreError::Mls wraps errors as String, losing type info
|
||||
**Source:** Architecture | **File:** `crates/quicproquo-core/src/error.rs:16-17`
|
||||
Impossible to match on specific MLS error conditions.
|
||||
**Fix:** Create MLS sub-error variants or wrap boxed error.
|
||||
|
||||
### H4. Proto `from_bytes` uses default 64 MiB traversal limit
|
||||
**Source:** Architecture | **File:** `crates/quicproquo-proto/src/lib.rs:67-72`
|
||||
DoS amplification vector for direct callers (client, bot, FFI).
|
||||
**Fix:** Accept `ReaderOptions` as parameter, make default stricter.
|
||||
|
||||
### H5. Mobile crate hardcodes SkipServerVerification
|
||||
**Source:** Architecture + Security | **File:** `crates/quicproquo-mobile/src/lib.rs:93-100,165-172`
|
||||
Unconditionally skips TLS verification. Inherently MITM-vulnerable.
|
||||
**Fix:** Add certificate_verifier parameter or feature flag.
|
||||
|
||||
### H6. Duplicate InsecureServerCertVerifier implementations
|
||||
**Source:** Architecture | **Files:** `client/rpc.rs:27-29`, `mobile/lib.rs:165-167`
|
||||
**Fix:** Consolidate into shared crate behind `cfg(feature = "insecure")`.
|
||||
|
||||
### H7. DiskKeyStore writes HPKE private keys to disk unencrypted
|
||||
**Source:** Security | **File:** `crates/quicproquo-core/src/keystore.rs`
|
||||
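No encryption and no restrictive file permissions. These HPKE private keys unlock MLS Welcome messages and UpdatePath secrets, so anyone who reads them can recover group epoch secrets.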
|
||||
**Fix:** Encrypt with Argon2id + ChaCha20-Poly1305. Set 0o600 permissions.
|
||||
|
||||
### H8. `identity.rs:seed_bytes()` returns unzeroized copy of secret seed
|
||||
**Source:** Security | **File:** `crates/quicproquo-core/src/identity.rs:52`
|
||||
Copies 32-byte Ed25519 seed out of `Zeroizing` wrapper. Returned value not zeroized.
|
||||
**Fix:** Return `&[u8]` reference or `Zeroizing<[u8; 32]>`.
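The `zeroize` crate's `Zeroizing` is the intended tool here. To show the idea without external crates, the sketch below is a minimal std-only stand-in. `SecretBytes` and this `Identity` are hypothetical. The accessor lends a reference instead of copying the seed out, and the wrapper wipes its bytes on drop. Like any wipe-on-drop wrapper, it cannot erase copies the compiler leaves behind when values are moved.

```rust
use std::ops::Deref;
use std::sync::atomic::{compiler_fence, Ordering};

/// Minimal stand-in for `zeroize::Zeroizing<[u8; N]>`: wipes the bytes on drop.
pub struct SecretBytes<const N: usize>([u8; N]);

impl<const N: usize> SecretBytes<N> {
    pub fn new(bytes: [u8; N]) -> Self {
        Self(bytes)
    }
}

impl<const N: usize> Deref for SecretBytes<N> {
    type Target = [u8; N];
    fn deref(&self) -> &[u8; N] {
        &self.0
    }
}

impl<const N: usize> Drop for SecretBytes<N> {
    fn drop(&mut self) {
        for b in self.0.iter_mut() {
            // SAFETY: `b` is a valid, aligned, exclusive reference to a u8.
            // Volatile writes stop the optimizer from eliding the wipe.
            unsafe { std::ptr::write_volatile(b as *mut u8, 0) };
        }
        compiler_fence(Ordering::SeqCst);
    }
}

pub struct Identity {
    seed: SecretBytes<32>,
}

impl Identity {
    pub fn from_seed(seed: [u8; 32]) -> Self {
        Self { seed: SecretBytes::new(seed) }
    }

    /// Lend the seed instead of returning an unzeroized copy.
    pub fn seed_bytes(&self) -> &[u8; 32] {
        &self.seed
    }
}
```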
|
||||
|
||||
### H9. `hybrid_kem.rs:private_to_bytes()` returns unzeroized `Vec<u8>`
|
||||
**Source:** Security | **File:** `crates/quicproquo-core/src/hybrid_kem.rs:162`
|
||||
Hybrid private key material lingers in memory.
|
||||
**Fix:** Return `Zeroizing<Vec<u8>>`.
|
||||
|
||||
### H10. MeshIdentity stores Ed25519 seed as plaintext JSON
|
||||
**Source:** Security | **File:** `crates/quicproquo-p2p/src/identity.rs:72-79`
|
||||
No encryption and no restrictive file permissions.
|
||||
**Fix:** Encrypt identity file, set 0o600 permissions.
|
||||
|
||||
### H11. WebSocket bridge has rate_limits field but never checks it
|
||||
**Source:** Security | **File:** `crates/quicproquo-server/src/ws_bridge.rs:28-36`
|
||||
**Fix:** Apply `check_rate_limit()` in all WS bridge handlers.
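If the bridge cannot reuse the Cap'n Proto path's limiter directly, a fixed-window counter is the simplest form that matches the documented 100 enqueues per 60 s. The sketch below is illustrative. `RateLimiter` is hypothetical and is not the server's `check_rate_limit()`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Fixed-window limiter: at most `max` events per `window` for each key
/// (for example an identity key or a client IP).
pub struct RateLimiter {
    max: u32,
    window: Duration,
    // key -> (start of the current window, events counted in it)
    windows: HashMap<Vec<u8>, (Instant, u32)>,
}

impl RateLimiter {
    pub fn new(max: u32, window: Duration) -> Self {
        Self { max, window, windows: HashMap::new() }
    }

    /// Records one event for `key` at `now` and returns `false` if it exceeds
    /// the limit. `now` is a parameter so tests can control time.
    pub fn check(&mut self, key: &[u8], now: Instant) -> bool {
        let (start, count) = self.windows.entry(key.to_vec()).or_insert((now, 0));
        if now.saturating_duration_since(*start) >= self.window {
            *start = now;
            *count = 0;
        }
        if *count < self.max {
            *count += 1;
            true
        } else {
            false
        }
    }
}
```

Stale keys accumulate in `windows`, so a real implementation would prune them periodically. A fixed window also allows up to 2 × `max` events across a window boundary.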
|
||||
|
||||
### H12. ~100 lines duplicated between `receive_message` and `receive_message_with_sender`
|
||||
**Source:** Quality | **File:** `crates/quicproquo-core/src/group.rs:471-583`
|
||||
A bug fix in one must be manually replicated in the other, and this is security-critical MLS code.
|
||||
**Fix:** Extract shared MLS message processing helper.
|
||||
|
||||
### H13. Synchronous file I/O in async blob handler
|
||||
**Source:** Quality | **File:** `crates/quicproquo-server/src/node_service/blob_ops.rs:124-137`
|
||||
Blocking `std::fs` calls inside an async handler stall the Tokio worker thread that runs it.
|
||||
**Fix:** Use `tokio::fs` or `spawn_blocking`.
|
||||
|
||||
### H14. `MeshEnvelope::forwarded()` invalidates signature without re-signing
|
||||
**Source:** Quality + Correctness + Security | **File:** `crates/quicproquo-p2p/src/envelope.rs:172-176`
|
||||
`forwarded()` increments `hop_count`, which is part of the signed bytes, so every forwarded envelope fails `verify()`.
|
||||
**Fix:** Exclude `hop_count` from signature, or add separate forwarding signature.
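The sketch below illustrates the first option: define the signed bytes explicitly and leave `hop_count` out. The struct is a simplified, hypothetical envelope; the real one in `envelope.rs` may carry other fields. Signing itself is omitted.

```rust
/// Simplified mesh envelope; the field set is illustrative.
#[derive(Clone, Debug)]
pub struct MeshEnvelope {
    pub sender: [u8; 32],
    pub timestamp: u64,
    pub payload: Vec<u8>,
    pub hop_count: u8,
    pub signature: Vec<u8>,
}

impl MeshEnvelope {
    /// Bytes covered by the sender's signature. Relays mutate `hop_count`,
    /// so it is deliberately excluded.
    pub fn signable_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(32 + 8 + 8 + self.payload.len());
        out.extend_from_slice(&self.sender);
        out.extend_from_slice(&self.timestamp.to_be_bytes());
        // Length-prefix the payload so field boundaries are unambiguous.
        out.extend_from_slice(&(self.payload.len() as u64).to_be_bytes());
        out.extend_from_slice(&self.payload);
        out
    }

    /// Relay copy: bumps the hop count while the original signature stays valid.
    pub fn forwarded(&self) -> Self {
        let mut next = self.clone();
        next.hop_count = next.hop_count.saturating_add(1);
        next
    }
}
```

The trade-off is that relays can then rewrite `hop_count` undetected, so it can serve only as an advisory TTL. If hop limits must be enforced, the separate forwarding signature is the safer route.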
|
||||
|
||||
---
|
||||
|
||||
## MEDIUM (19 findings)
|
||||
|
||||
### M1. fetch_wait TOCTOU: missed notification window
|
||||
**Source:** Correctness | **File:** `crates/quicproquo-server/src/node_service/delivery.rs:496-522`
|
||||
There is a TOCTOU gap between the initial fetch (which returns empty) and waiter registration. An enqueue that lands between these two points fires the notification before the waiter exists, so the wakeup is lost.
|
||||
**Fix:** Register waiter before initial fetch.
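The same rule can be shown in synchronous form: if the emptiness check and the wait happen under one lock, no enqueue can slip between them. The sketch below uses `std::sync::Condvar`. `Inbox` is hypothetical, and a blocking wait like this must not run on a Tokio worker (see C6). In Tokio, the analogous ordering is to create and `enable()` the `Notify::notified()` future before the initial fetch.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// One recipient's queue. Checking for messages and waiting for new ones both
/// happen under `queue`'s lock, so a notification cannot be missed.
pub struct Inbox {
    queue: Mutex<VecDeque<Vec<u8>>>,
    ready: Condvar,
}

impl Inbox {
    pub fn new() -> Self {
        Self { queue: Mutex::new(VecDeque::new()), ready: Condvar::new() }
    }

    pub fn enqueue(&self, msg: Vec<u8>) {
        self.queue.lock().expect("inbox lock poisoned").push_back(msg);
        self.ready.notify_all();
    }

    /// Returns all queued messages, waiting up to `timeout` if the queue is empty.
    pub fn fetch_wait(&self, timeout: Duration) -> Vec<Vec<u8>> {
        let guard = self.queue.lock().expect("inbox lock poisoned");
        // The predicate is evaluated under the lock before the first wait and
        // after every wakeup, so an earlier enqueue is seen immediately.
        let (mut guard, _timed_out) = self
            .ready
            .wait_timeout_while(guard, timeout, |q| q.is_empty())
            .expect("inbox lock poisoned");
        let msgs: Vec<Vec<u8>> = guard.drain(..).collect();
        msgs
    }
}
```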
|
||||
|
||||
### M2. `verify_transcript_chain` never checks hashes — misleading name
|
||||
**Source:** Correctness | **File:** `crates/quicproquo-core/src/transcript.rs:215-251`
|
||||
It only validates structural integrity, not the hash chain, yet the name implies cryptographic verification.
|
||||
**Fix:** Rename to `validate_transcript_structure` or implement actual chain verification.
|
||||
|
||||
### M3. Non-atomic file writes in FileBackedStore
|
||||
**Source:** Correctness | **File:** `crates/quicproquo-server/src/storage.rs:332-337` (all flush_* methods)
|
||||
Each method calls `fs::write` directly, so a crash mid-write corrupts the file and loses its contents.
|
||||
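**Fix:** Write to a `tempfile::NamedTempFile` created in the same directory (`NamedTempFile::new_in`), fsync it, then `persist()` it over the target path.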
|
||||
|
||||
### M4. `delete_account` non-atomic multi-lock
|
||||
**Source:** Correctness | **File:** `crates/quicproquo-server/src/storage.rs:800-864`
|
||||
Takes 6 Mutex locks one after another, so a concurrent fetch can observe a partially deleted account.
|
||||
**Fix:** Use a single transaction, or acquire all six locks (always in the same order, to avoid deadlock) before mutating anything.
|
||||
|
||||
### M5. Timing side channel in `resolveIdentity` — no timing floor
|
||||
**Source:** Security | **File:** `crates/quicproquo-server/src/node_service/user_ops.rs:142-178`
|
||||
`resolveUser` enforces a 5 ms floor, but `resolveIdentity` has none, so its response time can reveal whether an identity exists.
|
||||
**Fix:** Apply same `RESOLVE_TIMING_FLOOR`.
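Wrapping the floor in a reusable helper lets every resolve path apply the same rule. The sketch below is illustrative. `with_timing_floor` is hypothetical, and the 5 ms value comes from this report. An async handler would await `tokio::time::sleep` instead of calling `thread::sleep`. A floor only masks lookups that finish faster than it.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Floor used by `resolveUser`, per this report (5 ms).
pub const RESOLVE_TIMING_FLOOR: Duration = Duration::from_millis(5);

/// Run `lookup`, then sleep out the rest of `floor`, so fast "not found"
/// answers and slower "found" answers take the same minimum time.
pub fn with_timing_floor<T>(floor: Duration, lookup: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = lookup();
    if let Some(remaining) = floor.checked_sub(start.elapsed()) {
        thread::sleep(remaining);
    }
    result
}
```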
|
||||
|
||||
### M6. WS bridge `resolve_user` has no timing floor
|
||||
**Source:** Security | **File:** `crates/quicproquo-server/src/ws_bridge.rs:158-181`
|
||||
**Fix:** Add same timing floor as Cap'n Proto handler.
|
||||
|
||||
### M7. `AuthContext.token` not zeroized
|
||||
**Source:** Security | **File:** `crates/quicproquo-server/src/auth.rs:68-72`
|
||||
**Fix:** Wrap in `Zeroizing<Vec<u8>>`.
|
||||
|
||||
### M8. Client `ClientAuth.access_token` not zeroized
|
||||
**Source:** Security | **File:** `crates/quicproquo-client/src/lib.rs:50-55`
|
||||
**Fix:** Use `Zeroizing<Vec<u8>>`.
|
||||
|
||||
### M9. `SessionState.password` stores plaintext password in memory
|
||||
**Source:** Security | **File:** `crates/quicproquo-client/src/client/session.rs:29`
|
||||
**Fix:** Use `Zeroizing<String>`, derive the key at startup, then zeroize the password.
|
||||
|
||||
### M10. `conversation.rs:172` hex-encodes derived key without zeroization
|
||||
**Source:** Security | **File:** `crates/quicproquo-client/src/client/conversation.rs:172`
|
||||
**Fix:** Use `Zeroizing<String>` for `hex_key`.
|
||||
|
||||
### M11. `device_ops.rs:49` uses `.unwrap_or("")` on untrusted input
|
||||
**Source:** Security | **File:** `crates/quicproquo-server/src/node_service/device_ops.rs:49`
|
||||
**Fix:** Return error for invalid UTF-8.
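The sketch below shows the pattern. `parse_device_name` and `DeviceOpError` are hypothetical names. It converts the bytes with `std::str::from_utf8` and surfaces the `Utf8Error`, instead of substituting an empty string that later code might treat as a real device name.

```rust
#[derive(Debug, PartialEq)]
pub enum DeviceOpError {
    InvalidDeviceName(std::str::Utf8Error),
}

/// Decode an untrusted device name and reject invalid UTF-8 instead of
/// silently mapping it to "" as `.unwrap_or("")` does.
pub fn parse_device_name(raw: &[u8]) -> Result<&str, DeviceOpError> {
    std::str::from_utf8(raw).map_err(DeviceOpError::InvalidDeviceName)
}
```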
|
||||
|
||||
### M12. `MeshEnvelope::forwarded()` invalidates signature (duplicate of H14)
|
||||
**Source:** Security | **File:** `crates/quicproquo-p2p/src/envelope.rs:172-176`
|
||||
|
||||
### M13. FileBackedStore `create_channel` O(n) linear scan
|
||||
**Source:** Architecture + Quality + Correctness | **File:** `crates/quicproquo-server/src/storage.rs:749-765`
|
||||
**Fix:** Add a secondary index, or derive a deterministic channel ID from a hash of the sorted member pair.
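The sketch below shows the secondary-index option: normalize the member pair so argument order does not matter, then look the pair up in a map. `ChannelIndex` is hypothetical. For brevity it hands out sequential IDs, whereas the store generates random channel IDs.

```rust
use std::collections::HashMap;

/// Canonical key for a DM channel: the two identity keys in sorted order,
/// so (alice, bob) and (bob, alice) map to the same entry.
fn pair_key(a: &[u8], b: &[u8]) -> (Vec<u8>, Vec<u8>) {
    if a <= b {
        (a.to_vec(), b.to_vec())
    } else {
        (b.to_vec(), a.to_vec())
    }
}

#[derive(Default)]
pub struct ChannelIndex {
    by_members: HashMap<(Vec<u8>, Vec<u8>), u64>,
    next_id: u64,
}

impl ChannelIndex {
    /// Returns `(channel_id, created)`: an O(1) average lookup instead of a scan.
    pub fn get_or_create(&mut self, a: &[u8], b: &[u8]) -> (u64, bool) {
        let key = pair_key(a, b);
        if let Some(&id) = self.by_members.get(&key) {
            return (id, false);
        }
        let id = self.next_id;
        self.next_id += 1;
        self.by_members.insert(key, id);
        (id, true)
    }
}
```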
|
||||
|
||||
### M14. `resolve_identity_key` O(n) linear scan
|
||||
**Source:** Architecture + Quality | **File:** `crates/quicproquo-server/src/storage.rs:676-684`
|
||||
**Fix:** Maintain reverse map.
|
||||
|
||||
### M15. FFI error classification by string matching
|
||||
**Source:** Architecture | **File:** `crates/quicproquo-ffi/src/lib.rs:183`
|
||||
**Fix:** Match on typed error variants.
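The sketch below shows a typed mapping with hypothetical error variants and status codes. Because the `match` is exhaustive, adding an error variant fails to compile until it is given an FFI code, a guarantee string matching cannot provide.

```rust
use std::os::raw::c_int;

/// Hypothetical client error type exposed through the FFI layer.
#[derive(Debug)]
pub enum ClientError {
    NotAuthenticated,
    Timeout,
    Network(String),
    Crypto(String),
}

// Illustrative status codes; the real FFI constants may differ.
pub const QPQ_OK: c_int = 0;
pub const QPQ_ERR_AUTH: c_int = 1;
pub const QPQ_ERR_TIMEOUT: c_int = 2;
pub const QPQ_ERR_NETWORK: c_int = 3;
pub const QPQ_ERR_CRYPTO: c_int = 4;

/// Map each variant to a stable code. There is no `_` arm, so new variants
/// must be handled explicitly.
pub fn ffi_code(err: &ClientError) -> c_int {
    match err {
        ClientError::NotAuthenticated => QPQ_ERR_AUTH,
        ClientError::Timeout => QPQ_ERR_TIMEOUT,
        ClientError::Network(_) => QPQ_ERR_NETWORK,
        ClientError::Crypto(_) => QPQ_ERR_CRYPTO,
    }
}
```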
|
||||
|
||||
### M16. Documentation drift: master-prompt.md says Noise/TCP, code uses QUIC/TLS
|
||||
**Source:** Architecture | **Files:** `master-prompt.md`, server/client `Cargo.toml`
|
||||
**Fix:** Update master-prompt.md to reflect actual transport.
|
||||
|
||||
### M17. Plugin `HookVTable` unsafe Send+Sync without safety docs
|
||||
**Source:** Architecture | **File:** `crates/quicproquo-plugin-api/src/lib.rs:190-192`
|
||||
**Fix:** Add `// SAFETY:` documentation blocks.
|
||||
|
||||
### M18. OPAQUE register_finish: spurious RegistrationRequest deserialization
|
||||
**Source:** Quality + Correctness | **File:** `crates/quicproquo-server/src/node_service/auth_ops.rs:335-343`
|
||||
Dead code: it first deserializes `upload_bytes` as the wrong type.
|
||||
**Fix:** Remove lines 335-343.
|
||||
|
||||
### M19. Mixed serialization formats in DiskKeyStore (bincode + serde_json)
|
||||
**Source:** Quality | **File:** `crates/quicproquo-core/src/keystore.rs`
|
||||
**Fix:** Standardize on one format.
|
||||
|
||||
---
|
||||
|
||||
## LOW (23 findings)
|
||||
|
||||
### L1. `BroadcastChannel.key` not zeroized on drop
|
||||
**File:** `crates/quicproquo-p2p/src/broadcast.rs:18-19`
|
||||
|
||||
### L2. Plugin loader `CStr::from_ptr` on plugin-returned string (UB risk)
|
||||
**File:** `crates/quicproquo-server/src/plugin_loader.rs:102`
|
||||
|
||||
### L3. Token cache stores session token as plaintext hex when no password set
|
||||
**File:** `crates/quicproquo-client/src/client/token_cache.rs:63-68`
|
||||
|
||||
### L4. `--password` CLI flag visible in process list
|
||||
**File:** `crates/quicproquo-client/src/main.rs:104`
|
||||
|
||||
### L5. `clippy::unwrap_used` is warn not deny
|
||||
**File:** `Cargo.toml:88`
|
||||
|
||||
### L6. `strip = "symbols"` hinders post-mortem debugging
|
||||
**File:** `Cargo.toml:94`
|
||||
|
||||
### L7. `thread_rng()` for channel ID generation instead of `OsRng`
|
||||
**Files:** `storage.rs:760`, `sql_store.rs:545`
|
||||
|
||||
### L8. `conversation.rs:548` — `limit` cast from `usize` to `u32` without saturation
|
||||
**File:** `crates/quicproquo-client/src/client/conversation.rs:548`
|
||||
|
||||
### L9. `conversation.rs:363-365,420-422` — bincode deserialize errors silently dropped
|
||||
**File:** `crates/quicproquo-client/src/client/conversation.rs`
|
||||
|
||||
### L10. `repl.rs:610` — static AtomicU64 for padding timer instead of session state
|
||||
**File:** `crates/quicproquo-client/src/client/repl.rs:610`
|
||||
|
||||
### L11. `command_engine.rs:148` — `to_slash()` clones entire Command enum unnecessarily
|
||||
**File:** `crates/quicproquo-client/src/client/command_engine.rs:148`
|
||||
|
||||
### L12. `conversation.rs:201-203` — SQL ATTACH with format string
|
||||
**File:** `crates/quicproquo-client/src/client/conversation.rs:201-203`
|
||||
|
||||
### L13. Client `hex.rs` trivial wrapper with zero value-add
|
||||
**File:** `crates/quicproquo-client/src/client/hex.rs`
|
||||
|
||||
### L14. `config.rs` `#[allow(dead_code)]` on EffectiveFederationConfig
|
||||
**File:** `crates/quicproquo-server/src/config.rs`
|
||||
|
||||
### L15. `federation/address.rs` `#[allow(dead_code)]` on entire module
|
||||
**File:** `crates/quicproquo-server/src/federation/address.rs`
|
||||
|
||||
### L16. ANSI escape codes hardcoded without terminal capability detection
|
||||
**File:** `crates/quicproquo-client/src/client/display.rs`
|
||||
|
||||
### L17. GUI `lib.rs:75` `.expect()` on Tauri run
|
||||
**File:** `crates/quicproquo-gui/src/lib.rs:75`
|
||||
|
||||
### L18. `session.rs:186` DiskKeyStore failure silently falls back to ephemeral
|
||||
**File:** `crates/quicproquo-client/src/client/session.rs:186`
|
||||
|
||||
### L19. `retry.rs:37` `thread_rng()` for jitter instead of `OsRng`
|
||||
**File:** `crates/quicproquo-client/src/client/retry.rs:37`
|
||||
|
||||
### L20. `MeshStore.seen` set grows unboundedly
|
||||
**File:** `crates/quicproquo-p2p/src/` (MeshStore)
|
||||
|
||||
### L21. `envelope.rs:58` `.expect()` on system clock in non-test code
|
||||
**File:** `crates/quicproquo-p2p/src/envelope.rs:58`
|
||||
|
||||
### L22. `broadcast.rs:47` `.expect()` on encryption in non-test code
|
||||
**File:** `crates/quicproquo-p2p/src/broadcast.rs:47`
|
||||
|
||||
### L23. Bot crate hardcodes sender as "peer" (TODO)
|
||||
**File:** `crates/quicproquo-bot/src/lib.rs`
|
||||
|
||||
---
|
||||
|
||||
## TESTING GAPS (6 findings)
|
||||
|
||||
### T1. No unit tests for `plugin_loader.rs`
|
||||
### T2. No unit tests for `federation/tls.rs`
|
||||
### T3. No tests for `blob_ops.rs`
|
||||
### T4. No tests for `delivery.rs`
|
||||
### T5. `conversation.rs` migration code untested
|
||||
### T6. No negative test for `MeshEnvelope::verify()` with wrong key
|
||||
|
||||
---
|
||||
|
||||
## Strengths (positive findings from security audit)
|
||||
|
||||
- Constant-time token comparison via `subtle::ConstantTimeEq`
|
||||
- Parameterized SQL everywhere (no injection)
|
||||
- Proper Argon2id (19 MiB, t=2, p=1) + ChaCha20-Poly1305 for client state
|
||||
- Zero `.unwrap()` in non-test server code (grep-verified)
|
||||
- TLS 1.3 only, mTLS for federation
|
||||
- Rate limiting: 100 enqueues/60s, 50 connections/IP/60s
|
||||
- Delivery proof signing: SHA-256(seq||recipient||timestamp) + Ed25519
|
||||
- KeyPackage ciphersuite validation (only 0x0001 accepted)
|
||||
- Payload size limits: 5 MB message, 50 MB blob, 1 MB KeyPackage
|
||||
- Queue depth limits: 1000/inbox, 100K sessions, 100K waiters
|
||||
- Blob path traversal protection via hex-encoded hash
|
||||
- Audit logging with secret redaction (identity keys prefix-only)
|
||||
- Production config validation (rejects devtoken, empty auth, missing TLS)
|
||||
|
||||
---
|
||||
|
||||
## Recommended fix priority
|
||||
|
||||
1. **Federation auth** (C1) — auth-gate inbound requests, validate origin against mTLS cert
|
||||
2. **WS bridge authz** (C2) + rate limits (H11) + timing floors (M6) — parity with Cap'n Proto path
|
||||
3. **Crypto error propagation** (C3, C7) — hpke_seal and hpke_setup_sender_and_export
|
||||
4. **Zeroization sweep** (H8, H9, H10, M7-M10, L1) — all leaked secret material
|
||||
5. **ServerContext extraction** (C4) — foundation for capability-based security
|
||||
6. **FileBackedStore atomic writes** (C5, M3) — prevent data loss on crash
|
||||
7. **std::sync::Mutex → tokio::sync::Mutex** (C6) — unblock Tokio workers
|
||||
8. **Mobile TLS verification** (H5) — remove hardcoded skip
|
||||
9. **fetch_wait TOCTOU** (M1) — register waiter before fetch
|
||||
10. **Testing gaps** (T1-T6) — critical untested paths
|
||||
@@ -1,175 +0,0 @@
|
||||
# Next Sprint Planning — quicproquo
|
||||
|
||||
> Pick 8 of the 24 features below (A–X; Y is already done) for the next sprint cycle.
|
||||
> Created: 2026-03-04 | Status: PENDING SELECTION
|
||||
|
||||
## Completed Sprints (this cycle)
|
||||
|
||||
| # | Sprint | Commit | Summary |
|
||||
|---|--------|--------|---------|
|
||||
| 4 | Rich Messaging | `81d5e2e` | Read receipts, typing, reactions, edit/delete |
|
||||
| 5 | File Transfer | `3350d76` | Chunked blob upload/download, /send-file |
|
||||
| 6 | Disappearing + Groups | `fd21ea6` | TTL messages, /group-info, deleteAccount |
|
||||
| 7 | Go SDK | `65ff262` | QUIC + Cap'n Proto, 24 RPC methods, 14 API functions |
|
||||
| 8 | TypeScript SDK | `28ceaaf` | 175KB WASM crypto, WebSocket transport, browser demo |
|
||||
| 9 | Mesh Networking | `1b61b7e` | MeshIdentity, store-and-forward, broadcast channels |
|
||||
| 10 | Privacy Hardening | `9244e80` | --redact-logs, traffic padding, /privacy suite, /verify-fs |
|
||||
| 11 | Multi-Device | `9244e80` | Device registry (3 RPCs), /devices, max 5 per identity |
|
||||
|
||||
## Current Codebase Stats
|
||||
|
||||
- **27 Cap'n Proto RPCs** (@0–@26) on NodeService
|
||||
- **10 AppMessage types** (0x01–0x09 + file ref)
|
||||
- **~40 REPL commands**
|
||||
- **Tests**: 72 core + 35 server + 28 P2P + 14 E2E = 149
|
||||
- **SDKs**: Rust (native), Go, TypeScript/WASM, C FFI, Python ctypes
|
||||
- **Crates**: core, proto, server, client, p2p, bot, gen, kt, plugin-api, gui, mobile, ffi
|
||||
|
||||
---
|
||||
|
||||
## Feature Candidates (pick 8)
|
||||
|
||||
### A. Federation Wiring
|
||||
**Effort**: Medium | **Area**: Server
|
||||
Wire the existing outbound federation relay into the actual delivery flow. When a message targets `user@remote.domain`, the server routes via `FederationClient::relay_enqueue()` instead of local store. Add `/federate <domain>` admin command to configure peers. Test with two server instances. Currently all federation code exists but is marked `#[allow(dead_code)]`.
|
||||
|
||||
### B. Contact Management & Blocking
|
||||
**Effort**: Medium | **Area**: Client + Server
|
||||
Contact list with add/remove/block/unblock. Server-side: `addContact @27`, `removeContact @28`, `blockUser @29`, `listContacts @30` RPCs + contacts table. Client: `/contacts`, `/block <user>`, `/unblock <user>`. Blocked users can't enqueue messages to you (server enforces). Import/export contacts as JSON.
|
||||
|
||||
### C. Voice/Video Call Signaling
|
||||
**Effort**: High | **Area**: Core + Client
|
||||
WebRTC signaling over MLS for E2E encrypted calls. Add `CallOffer`, `CallAnswer`, `CallIce`, `CallHangup` AppMessage types (0x0A–0x0D). Client REPL: `/call <user>`, `/answer`, `/hangup`. The actual media (audio/video) uses WebRTC peer-to-peer; qpq only handles the encrypted signaling. Include SDP offer/answer exchange and ICE candidate relay.
|
||||
|
||||
### D. Encrypted Backup & Restore
|
||||
**Effort**: Medium | **Area**: Client + Core
|
||||
Export all local state (message history, keys, group state) as an encrypted archive. Key derivation from user password via Argon2id. Format: encrypted SQLite dump + identity seed + MLS group states. `/backup <path>` and `/restore <path>` commands. Verify integrity on restore. Critical for device migration and disaster recovery.
|
||||
|
||||
### E. Group Permissions & Roles
|
||||
**Effort**: Medium | **Area**: Server + Client
|
||||
Admin/moderator/member roles within MLS groups. Server-side role storage per channel. Admins can: remove members, rename group, set TTL policy. Moderators can: mute members. Members can: send messages. `/role <user> admin|mod|member`, `/mute <user> <duration>`. Enforced at both server (RPC level) and client (MLS proposal validation).
|
||||
|
||||
### F. Key Transparency Audit Client
|
||||
**Effort**: Medium | **Area**: Client + KT crate
|
||||
Client-side verification of the KT Merkle log. The KT crate (`quicproquo-kt`) already has the Merkle tree and audit log. Add: `/kt audit <username>` to verify a user's key history is consistent, `/kt monitor` to continuously watch for key changes, `/kt proof <username>` to fetch and verify inclusion proofs. Alert on unexpected key changes (TOFU violation).
|
||||
|
||||
### G. Message Search
|
||||
**Effort**: Low-Medium | **Area**: Client
|
||||
Full-text search over local encrypted message history. Add FTS5 virtual table to the conversation SQLite DB. `/search <query>` returns matching messages with context, timestamps, and conversation names. `/search <query> in:<conversation>` for scoped search. Highlight matching terms. Index on message insert.
|
||||
|
||||
### H. Server Clustering & HA
|
||||
**Effort**: High | **Area**: Server + Infra
|
||||
Run multiple qpq-server instances behind a shared state layer. Options: shared PostgreSQL backend (replace SQLite for clustered mode), or Raft consensus for delivery queue. Add `--cluster-peers` flag, health-based leader election, delivery queue synchronization. Docker Compose with 3-node cluster. This is the path to production-scale deployment.
|
||||
|
||||
### I. Protocol Compliance Testing
|
||||
**Effort**: Medium | **Area**: Testing
|
||||
Comprehensive MLS RFC 9420 compliance test suite. Verify: TreeKEM operations, epoch advancement, proposal/commit sequences, welcome message handling, group context extensions, PSK injection, external joins. Cross-test with other MLS implementations (OpenMLS test vectors). Add to CI. Target: 50+ protocol-level tests covering edge cases.
|
||||
|
||||
### J. User Profiles & Status
|
||||
**Effort**: Low | **Area**: Server + Client
|
||||
Profile pictures (stored as blobs), display names, status messages ("Available", "Away", custom text), about/bio text. `updateProfile @27` and `fetchProfile @28` RPCs. Profile data is signed by the identity key for authenticity. `/profile set-name <name>`, `/profile set-status <text>`, `/profile set-avatar <path>`, `/profile <username>` to view. Cache profiles locally.
|
||||
|
||||
### K. Notification Framework
|
||||
**Effort**: Medium | **Area**: Server + Client
|
||||
Per-conversation notification settings: all, mentions-only, muted. Server-side WebPush integration for browser clients (using the TS SDK). Add `updateNotificationSettings @27` RPC. Client: `/mute <conversation>`, `/unmute`, `/notify mentions-only`. Push notification payload: encrypted sender + conversation hint (no message content). APNs/FCM gateway as a separate microservice.
|
||||
|
||||
### L. Mobile App Shell
|
||||
**Effort**: High | **Area**: Mobile + FFI
|
||||
React Native app using the C FFI bindings (quicproquo-ffi). Screens: login, conversation list, chat view, settings. Bridge FFI functions to React Native via NativeModules. Use the existing `qpq_connect`, `qpq_login`, `qpq_send`, `qpq_receive` C API. iOS + Android targets. Alternatively: Flutter with dart:ffi. Includes push notification registration.
|
||||
|
||||
### M. Message Threading & Replies
|
||||
**Effort**: Low-Medium | **Area**: Client + Core
|
||||
Threaded conversations within channels. Add `thread_id` field to Chat AppMessage — replies to a message inherit its thread_id (or create one). `/thread <msg-index>` enters a thread view showing only that thread's messages. `/threads` lists active threads with last activity. Thread-aware notification counts. Local storage: add `thread_id` column to messages table, filter queries by thread.
|
||||
|
||||
### N. Cross-Signing & Identity Verification
|
||||
**Effort**: Medium | **Area**: Core + Client
|
||||
Out-of-band identity verification via QR codes and emoji comparison. Generate a short verification code from both parties' identity keys (similar to Signal's safety numbers but interactive). `/verify <user>` starts a verification session, displays emoji sequence or QR payload. `/verify confirm` marks the contact as verified. Verified contacts show a checkmark. Store verification state locally. Alert if a verified contact's key changes.
|
||||
|
||||
### O. Offline Message Queue with Priorities
|
||||
**Effort**: Low-Medium | **Area**: Client
|
||||
Smart offline queue that prioritizes messages when reconnecting. Messages queued while offline get priority levels: critical (key rotation, group ops), normal (chat), low (typing, read receipts). On reconnect, send critical first, then normal, drop stale low-priority. `/outbox` shows pending messages. `/outbox flush` forces immediate send. `/outbox clear` discards unsent. Exponential backoff with jitter for reconnection.
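The sketch below models the reconnect ordering described above. `Outbox`, the three-level `Priority`, and the staleness threshold for low-priority items are assumptions for illustration. The backoff and the `/outbox` commands are not shown.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Send order on reconnect; the discriminant doubles as the queue index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Priority {
    Critical = 0, // key rotation, group operations
    Normal = 1,   // chat messages
    Low = 2,      // typing indicators, read receipts
}

pub struct Outbox {
    queues: [VecDeque<(Instant, Vec<u8>)>; 3],
    low_ttl: Duration,
}

impl Outbox {
    pub fn new(low_ttl: Duration) -> Self {
        Self { queues: [VecDeque::new(), VecDeque::new(), VecDeque::new()], low_ttl }
    }

    pub fn push(&mut self, priority: Priority, msg: Vec<u8>, now: Instant) {
        self.queues[priority as usize].push_back((now, msg));
    }

    /// Empties the outbox in send order: critical, then normal, then
    /// low-priority items younger than `low_ttl` (older ones are dropped).
    pub fn drain_for_send(&mut self, now: Instant) -> Vec<Vec<u8>> {
        let low_ttl = self.low_ttl;
        let mut out = Vec::new();
        for (index, queue) in self.queues.iter_mut().enumerate() {
            for (queued_at, msg) in queue.drain(..) {
                let stale = index == Priority::Low as usize
                    && now.saturating_duration_since(queued_at) > low_ttl;
                if !stale {
                    out.push(msg);
                }
            }
        }
        out
    }
}
```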
|
||||
|
||||
### P. Audit Log & Compliance Export
|
||||
**Effort**: Medium | **Area**: Server
|
||||
Persistent server-side audit log for compliance. Every RPC call logged to a dedicated `audit_events` table: timestamp, identity, operation, result, metadata. Configurable retention policy (30/60/90 days). `qpq-admin audit --from --to --user` CLI to query. Export to JSON/CSV. GDPR data export: `/export-my-data` RPC returns all data the server holds about a user. Separate from redact-logs (this is structured, queryable).
|
||||
|
||||
### Q. Bot Framework Enhancements
|
||||
**Effort**: Medium | **Area**: Bot SDK + Server
|
||||
Enhance the existing `quicproquo-bot` crate into a full bot platform. Add: slash command registration (`/weather`, `/poll`, etc.), interactive message components (buttons/selects as structured AppMessage extensions), bot permissions (scoped access tokens), webhook delivery (HTTP POST on events). `BotBuilder` pattern: `Bot::new().command("ping", handle_ping).on_message(handle_msg).run()`. Example bots: echo, reminder, RSS feed.
|
||||
|
||||
### R. Tor/I2P Transport
|
||||
**Effort**: High | **Area**: Server + Client + P2P
|
||||
Anonymous transport layer for privacy-critical deployments. Server: listen on a Tor hidden service (.onion) via the `arti` or `tor` crate, configurable via `--tor-hidden-service`. Client: connect through a SOCKS5 proxy to the .onion address, `--tor-proxy socks5://127.0.0.1:9050`. P2P mesh: route through Tor for metadata-resistant peer communication. Optional I2P support via the SAM bridge. Caveat: Tor relays only TCP streams, so the existing QUIC (UDP) transport cannot run over the tunnel unchanged. This feature needs a TCP-based transport or a UDP-over-TCP encapsulation for Tor paths.
|
||||
|
||||
### S. Plugin Marketplace & Hot-Reload
|
||||
**Effort**: Medium | **Area**: Server + Plugin API
|
||||
Extend the existing plugin system into a discoverable marketplace. Plugin manifest format (TOML) with name, version, permissions, hooks. `qpq-server --plugin-dir ./plugins/` auto-loads `.so`/`.dylib` files. Hot-reload: watch plugin directory, reload on change without server restart. Plugin isolation: each plugin runs in its own thread with limited Store access. Add `qpq-gen plugin <name>` scaffolding. Example: spam filter plugin, message archiver.
|
||||
|
||||
### T. Stress Testing & Benchmarking Suite
|
||||
**Effort**: Medium | **Area**: Testing + Infra
|
||||
Production-grade load testing tool. Simulate N concurrent clients: register, login, create channels, send/receive at configurable rate. Measure: messages/sec throughput, p50/p95/p99 latency, memory usage, connection limits. `cargo bench` integration for micro-benchmarks (already have some). New `qpq-loadtest` binary: `qpq-loadtest --clients 100 --rate 50/s --duration 60s --server localhost:5001`. Generate HTML report with charts. Identify bottlenecks before production.
|
||||
|
||||
### U. Disappearing Media & View-Once
|
||||
**Effort**: Low | **Area**: Client + Core
|
||||
View-once messages that auto-delete after first viewing. Add `ViewOnce` flag to FileRef AppMessage — recipient can view the file/image once, then it's deleted locally. Server-side: auto-delete blob after first download. `/send-once <path>` command. Display "[view-once media]" placeholder until opened. Prevent screenshots (best-effort: clear clipboard, disable screen recording notification). Extends existing file transfer infrastructure.
|
||||
|
||||
### V. Emoji Status & Presence
|
||||
**Effort**: Low | **Area**: Server + Client
|
||||
Lightweight presence system. Users set an emoji + short text status ("🏖️ On vacation", "🔴 Do not disturb", "🟢 Available"). Ephemeral — not stored permanently, expires after configurable duration. `publishPresence` RPC (piggyback on existing `publishEndpoint`). Client poll or push-based presence updates. `/status 🎯 Focusing` to set, `/status` to view, `/who` shows online contacts with their status. No tracking — presence is opt-in and ephemeral.
|
||||
|
||||
### W. Markdown & Rich Text Messages
|
||||
**Effort**: Low | **Area**: Core + Client
|
||||
Rich text formatting in messages. Support a subset of Markdown in chat: **bold**, *italic*, `code`, ```code blocks```, ~~strikethrough~~, > quotes, [links](url). Parse on display (client-side only — wire format stays plain text with Markdown syntax). TUI renderer: ANSI escape codes for bold/italic/color. Browser demo: render as HTML. Add `/format on|off` toggle. No changes to MLS or wire protocol — purely presentational.
|
||||
|
||||
### X. Invitation Links & QR Codes
|
||||
**Effort**: Low-Medium | **Area**: Server + Client
|
||||
Shareable invitation links for joining the server or a group. `createInvite` RPC generates a time-limited, usage-limited token. Format: `qpq://server:port/invite/TOKEN` or QR code encoding. `/invite create [--expires 24h] [--uses 10]` generates link. `/invite list` shows active invites. `/invite revoke <id>` cancels. New users can register via invite: `qpq-client --invite qpq://...`. Group invites: generate a link that auto-adds the joiner to a specific group after registration.
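The sketch below captures the validity rules for a time- and usage-limited invite. `Invite` and `InviteError` are hypothetical. Generating the token itself needs an OS-backed CSPRNG, which std does not provide, so that step is left out. On the server, `redeem` would also need to run inside one transaction so that two joiners cannot both take the last use.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, PartialEq)]
pub enum InviteError {
    Revoked,
    Expired,
    Exhausted,
}

pub struct Invite {
    pub expires_at: SystemTime,
    pub max_uses: u32,
    pub uses: u32,
    pub revoked: bool,
}

impl Invite {
    /// Mirrors the `--expires` and `--uses` options of `/invite create`.
    pub fn new(now: SystemTime, ttl: Duration, max_uses: u32) -> Self {
        Self { expires_at: now + ttl, max_uses, uses: 0, revoked: false }
    }

    /// Consumes one use if the invite is still valid at `now`.
    pub fn redeem(&mut self, now: SystemTime) -> Result<(), InviteError> {
        if self.revoked {
            return Err(InviteError::Revoked);
        }
        if now >= self.expires_at {
            return Err(InviteError::Expired);
        }
        if self.uses >= self.max_uses {
            return Err(InviteError::Exhausted);
        }
        self.uses += 1;
        Ok(())
    }
}
```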
|
||||
|
||||
### Y. Command Engine & Playbooks
|
||||
**Effort**: Medium | **Area**: Client + Testing
|
||||
Unified command abstraction layer making every REPL action available via code and YAML. Command registry maps string names to typed `Command` variants. YAML playbook format for declarative multi-step scenarios with variables, assertions, and loops. `qpq-client --run playbook.yaml` for batch execution. Programmatic Rust API: `engine.execute(Command::Send { ... })`. Enables: CI smoke tests, reproducible environments, bot scripting, onboarding demos, load test scenarios, migration scripts. Pairs with every other feature.
|
||||
|
||||
---

## Selection Guide

**Privacy-first** (maximum anonymity & security):
A (federation), D (backup), F (KT audit), N (cross-signing), R (Tor), U (view-once)

**Production-ready** (deploy to real users):
A (federation), B (contacts), H (clustering), I (compliance), K (notifications), T (stress test)

**User experience** (make it feel like a real messenger):
B (contacts), C (calls), G (search), J (profiles), V (presence), W (rich text), X (invites)

**Mobile launch** (ship an app):
D (backup), J (profiles), K (notifications), L (mobile app), X (invites)

**Developer ecosystem** (grow the community):
Q (bot framework), S (plugin marketplace), T (stress test), I (compliance)

**Mesh/Freifunk** (offline-first, decentralized):
A (federation), N (cross-signing), O (offline queue), R (Tor)

---

## Completed (this planning cycle)

| Sprint | Feature | Status |
|--------|---------|--------|
| — | Y. Command Engine & Playbooks | Done — `command_engine.rs`, `playbook.rs`, `--run` CLI, 5 example playbooks |

## Selected Features (fill in after choosing)

> Pick 8 of A–X above, then we'll plan sprint assignments.

| Sprint | Feature | Notes |
|--------|---------|-------|
| 12 | | |
| 13 | | |
| 14 | | |
| 15 | | |
| 16 | | |
| 17 | | |
| 18 | | |
| 19 | | |
@@ -1,380 +0,0 @@
# quicproquo v2 — Design Analysis & Recommendations

> Multi-perspective retrospective of the v1 architecture.
> Produced 2026-03-04 by four parallel analysis agents examining server,
> client/UX, crypto/security, and project structure/DX.

---

## Executive Summary

quicproquo v1 demonstrates strong fundamentals: QUIC-native transport, RFC 9420
MLS group encryption, post-quantum hybrid KEM, OPAQUE zero-knowledge auth, and a
working multi-language SDK surface. These are the right bets and put the project
ahead of most open-source messengers on the crypto front.

However, three architectural choices limit the path to production:

1. **capnp-rpc is `!Send`** — forces single-threaded RPC handling, blocking
   scalability.
2. **Monolithic client with global state** — business logic is tangled into the
   REPL, duplicated across TUI/GUI/Web, and cannot be used as a library.
3. **Poll-based delivery** — 1-second polling wastes bandwidth and adds latency;
   no server-push channel exists.

A v2 should keep the crypto stack (MLS + hybrid PQ KEM + OPAQUE), keep QUIC, but
rearchitect the RPC layer, extract an SDK crate, and add push-based delivery.

---

## Part 1 — What Works Well

### Transport & Protocol
- **QUIC (quinn) + TLS 1.3** — correct choice. Built-in encryption, connection
  migration, 0-RTT potential. No reason to change.
- **Cap'n Proto schemas as API contract** — zero-copy wire format, compact
  binary, schema evolution via ordinals. The *schemas* are good; the *RPC
  runtime* is the problem.

### Cryptography
- **MLS (RFC 9420, openmls)** — the only IETF-standard group E2E protocol. No
  realistic alternative for groups > 2 members. Test suite is thorough (1005
  lines covering 2-party, 3-party, hybrid, removal, leave, stale epoch).
- **Hybrid PQ KEM (X25519 + ML-KEM-768)** — forward-thinking dual-algorithm
  protection. Well-implemented with versioned wire format, proper zeroization,
  and 12 targeted tests. Ahead of Signal (PQXDH, late 2023) and Matrix (no PQ).
- **OPAQUE (RFC 9807, built on the RFC 9497 OPRF)** — server never sees
  passwords. Ristretto255 + Argon2id is best-in-class.
- **Sealed sender, safety numbers, message padding** — all clean, simple,
  correct. Safety numbers use 5200 HMAC-SHA256 iterations, matching the
  iteration count of Signal's safety-number scheme.
- **Zeroization discipline** — secrets wrapped in `Zeroizing`, Debug impls
  redact keys, no `.unwrap()` in crypto paths.
- **WASM feature gating** — `core/native` cleanly separates WASM-safe crypto
  from native-only modules (MLS, OPAQUE, filesystem).

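The "Debug impls redact keys" discipline can be illustrated with a short std-only sketch. `SecretKey` and `ExampleKeypair` are hypothetical types; in the actual codebase the bytes would additionally sit inside `zeroize::Zeroizing` so they are wiped on drop, which this example does not attempt.

```rust
use std::fmt;

/// Hypothetical 32-byte secret. The real codebase keeps secret bytes inside
/// `zeroize::Zeroizing`; this sketch shows only the Debug redaction.
pub struct SecretKey {
    bytes: [u8; 32],
}

impl SecretKey {
    pub fn from_bytes(bytes: [u8; 32]) -> Self {
        SecretKey { bytes }
    }

    /// Raw access goes through one explicit, easy-to-audit method.
    pub fn expose_secret(&self) -> &[u8; 32] {
        &self.bytes
    }
}

impl fmt::Debug for SecretKey {
    // Never print key material, even via `{:?}` in logs or panic messages.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("SecretKey").field("bytes", &"[REDACTED]").finish()
    }
}

/// A struct that simply derives Debug stays safe: the secret field redacts itself.
#[derive(Debug)]
pub struct ExampleKeypair {
    pub public: [u8; 32],
    pub secret: SecretKey,
}
```
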
### Server Design
- **Store trait abstraction** — 30+ methods, clean backend swap (SqlStore vs
  FileBackedStore). Well-factored.
- **OPAQUE auth with timing floors** — `resolveUser`/`resolveIdentity` mask
  lookup timing to prevent username enumeration.
- **Delivery proofs** — Ed25519-signed receipt of server acceptance. Clients get
  cryptographic evidence.
- **`wasNew` flag on createChannel** — elegantly solves the dual-MLS-group race
  condition where both DM parties try to initialize.
- **Plugin hooks (C-ABI)** — `#![no_std]` vtable, zero dependencies, chained
  hooks with continue/reject protocol. Clean extensibility.
- **Production config validation** — enforces encrypted storage, strong auth
  tokens, pre-existing TLS certs.

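The timing floor on `resolveUser`/`resolveIdentity` can be sketched as below. This is a std-only illustration under stated assumptions: the server is async, so a real implementation would await `tokio::time::sleep` rather than block a thread with `std::thread::sleep`, and `resolve_user` with its `HashMap` directory is a hypothetical stand-in for the `Store` lookup. The floor only equalizes timing if the lookup itself stays under it.

```rust
use std::collections::HashMap;
use std::thread;
use std::time::{Duration, Instant};

/// Run `lookup`, then sleep until at least `floor` has elapsed, so "found" and
/// "not found" answers take the same wall-clock time (provided the lookup
/// itself finishes within the floor).
pub fn with_timing_floor<T>(floor: Duration, lookup: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = lookup();
    let elapsed = start.elapsed();
    if elapsed < floor {
        thread::sleep(floor - elapsed);
    }
    result
}

/// Hypothetical resolver: username -> 32-byte identity key, if registered.
pub fn resolve_user(
    directory: &HashMap<String, [u8; 32]>,
    username: &str,
    floor: Duration,
) -> Option<[u8; 32]> {
    with_timing_floor(floor, || directory.get(username).copied())
}
```
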
### Client & DX
- **Zero-config local dev** — `qpq --username alice --password pass` auto-starts
  server, generates TLS certs, registers, and logs in. Genuinely excellent.
- **Encrypted-at-rest everything** — state file (QPCE), conversation DB
  (SQLCipher), session cache. Argon2id + ChaCha20-Poly1305 throughout.
- **Playbook system** — YAML-scripted command execution with assertions. Great
  for CI/integration testing.
- **Conversation store** — SQLite with deduplication, outbox for offline
  queuing, activity tracking.
- **Conventional commits, GPG-signed** — consistent `feat:`/`fix:`/`docs:`
  discipline.
- **Security lints enforced by build** — `clippy::unwrap_used = "deny"`,
  `unsafe_code = "warn"`.

---

## Part 2 — What Needs Rethinking

### 2.1 RPC Layer: capnp-rpc is the #1 Scalability Bottleneck

**Problem:** `capnp-rpc` uses `Rc` internally and is `!Send`. Everything runs on
a `LocalSet` with `spawn_local`. All 27 RPC methods serialize through a single
thread. No work-stealing, no multi-core utilization.

**Impact:** With 1000+ concurrent clients, the single-threaded executor cannot
keep up. A slow `fetchWait` (30s timeout) blocks the entire connection.

**Also:** The WebSocket bridge (`ws_bridge.rs`, 645 lines) exists solely because
Cap'n Proto cannot run in browsers. This duplicates handler logic and creates
maintenance burden.

### 2.2 Client Architecture: Monolith with Global State

**Problem:** `AUTH_CONTEXT` is a process-wide `RwLock<Option<ClientAuth>>`.
Business logic (MLS processing, sealed sender, hybrid decryption, message
routing) lives inside `repl.rs`'s `poll_messages()` — a 100-line function that
mixes transport, crypto, routing, and storage.

**Impact:** Every frontend (REPL, TUI, GUI, Web) must reimplement message
processing. The TUI already duplicates it. The GUI stub and mobile PoC would need
yet another copy. Client cannot be used as a library.

### 2.3 Delivery Model: Poll-Based, No Push Channel

**Problem:** Client polls every 1 second with `fetch_wait(timeout_ms=0)` — never
actually long-polls. Constant network traffic even when idle. ~1 second latency
for message delivery.

**Also:** `fetch` is destructive (drains queue). If the client crashes between
receive and processing, messages are lost.

### 2.4 Connection Model: Single Stream

**Problem:** `max_concurrent_bidi_streams(1)` means the entire QUIC connection is
effectively single-stream. A blocking `fetchWait` prevents all other RPCs.

### 2.5 Storage: Single Mutex-Guarded SQLite Connection

**Problem:** `SqlStore` uses `Mutex<Connection>`. Every database operation
acquires a global lock. Under concurrent load, all storage access serializes.

**Also:** `FileBackedStore` flushes the entire map on every write (O(n) I/O).
Sessions are in-memory only — server restart forces all clients to re-login.

### 2.6 Key Management Gaps

- **DiskKeyStore** — HPKE private keys stored as plaintext bincode on disk. No
  encryption at rest.
- **MLS group state** — `GroupMember` holds `MlsGroup` in memory only. Process
  crash loses all group state.
- **Token zeroization** — `AuthContext.token`, `ClientAuth.access_token` are not
  wrapped in `Zeroizing`.

### 2.7 Workspace Bloat

12 crates for a project at this maturity is excessive. Several are thin stubs
(`quicproquo-gen`, `quicproquo-bot` at 354 lines) or broken (`quicproquo-gui`
fails `cargo build --workspace`).

---

## Part 3 — v2 Architecture Recommendations

### 3.1 Replace capnp-rpc with a Send-Compatible RPC Framework

**Recommendation:** Switch to **tonic (gRPC)** or a custom framing layer.

| Dimension | capnp-rpc (v1) | tonic/gRPC (v2) |
|-----------|---------------|-----------------|
| Threading | `!Send`, single-threaded | `Send + Sync`, multi-threaded |
| Browser | Requires WS bridge | grpc-web (via tonic-web) |
| Streaming | Not supported | Built-in |
| Middleware | None (copy-paste auth) | Interceptors/layers |
| Ecosystem | Niche | Massive (every language) |

**Alternative:** Keep Cap'n Proto *schemas* for serialization (zero-copy
advantage) but replace capnp-rpc with custom framing over QUIC streams. This
preserves the wire format while gaining `Send` compatibility.

The WS bridge would be eliminated entirely — grpc-web or WebTransport gives
browsers direct access.

### 3.2 Extract an SDK Crate (Most Important Client Change)

Create `quicproquo-sdk` that owns all business logic:

```
quicproquo-sdk/
  src/
    client.rs        -- QpqClient: connect, login, send, receive
    events.rs        -- ClientEvent enum (push-based)
    conversation.rs  -- ConversationHandle, group management
    crypto.rs        -- MLS pipeline, sealed sender, hybrid decryption
    sync.rs          -- message sync, offline queue, retry
```

All frontends become thin shells:

```
CLI/REPL   -> calls sdk
TUI        -> calls sdk
Tauri GUI  -> calls sdk (via Tauri commands)
Mobile     -> calls sdk (via C FFI)
Web/WASM   -> calls sdk (compiled to wasm32)
```

**Key API shape:**
```rust
pub struct QpqClient { /* session, rpc, crypto pipeline */ }

impl QpqClient {
    // Signatures only; bodies omitted.
    pub async fn connect(config: ClientConfig) -> Result<Self>;
    pub async fn login(username: &str, password: &str) -> Result<Self>;
    pub async fn dm(&mut self, username: &str) -> Result<ConversationHandle>;
    pub async fn create_group(&mut self, name: &str) -> Result<ConversationHandle>;
    pub async fn send(&mut self, text: &str) -> Result<MessageId>;
    pub fn subscribe(&self) -> Receiver<ClientEvent>;
}
```

No global state. No `AUTH_CONTEXT`. Auth context is per-`QpqClient` instance.

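Per-instance auth state, as opposed to a process-wide `AUTH_CONTEXT`, can be sketched like this. `Client` and `AuthContext` here are hypothetical, synchronous stand-ins for `QpqClient` and `ClientAuth`, with a bare `String` token instead of a real session. The point is that two clients in one process keep separate sessions, which is also what removes the need for single-threaded tests.

```rust
/// Hypothetical per-session auth state (stands in for v1's `ClientAuth`).
#[derive(Debug, Clone, PartialEq)]
pub struct AuthContext {
    pub username: String,
    pub token: String, // the SDK would wrap this in `Zeroizing`
}

/// Each client owns its own session; there is no process-wide `RwLock`.
#[derive(Debug, Default)]
pub struct Client {
    auth: Option<AuthContext>,
}

impl Client {
    pub fn new() -> Self {
        Client::default()
    }

    /// Store the session on *this* instance only.
    pub fn login(&mut self, username: &str, token: &str) {
        self.auth = Some(AuthContext { username: username.to_string(), token: token.to_string() });
    }

    /// Outgoing requests read the token from `self`, so two clients in one
    /// process (e.g. two users in one test) cannot overwrite each other.
    pub fn token(&self) -> Option<&str> {
        self.auth.as_ref().map(|a| a.token.as_str())
    }

    pub fn whoami(&self) -> Option<&str> {
        self.auth.as_ref().map(|a| a.username.as_str())
    }

    pub fn logout(&mut self) {
        self.auth = None;
    }
}
```
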
### 3.3 Add Push-Based Delivery

**Recommendation:** Dedicated QUIC unidirectional stream for server-push
notifications.

```
Client opens bidi stream (ID 0) -> RPC channel (request/response)
Server opens uni stream (ID 3)  -> push notifications (new message, typing, etc.)
```

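QUIC encodes the initiator and direction in the two least significant bits of each stream ID (RFC 9000, Section 2.1): the first client-initiated bidirectional stream is 0, and the first server-initiated unidirectional stream is 3. The helper below makes that layout explicit; the function names are illustrative, and in practice quinn assigns and tracks IDs itself.

```rust
/// Who opened a QUIC stream and whether it carries data in both directions.
#[derive(Debug, PartialEq, Eq)]
pub enum StreamKind {
    ClientBidi, // low bits 0b00
    ServerBidi, // low bits 0b01
    ClientUni,  // low bits 0b10
    ServerUni,  // low bits 0b11
}

/// Classify a stream ID using its two low bits (RFC 9000, Section 2.1):
/// bit 0 = initiator (0 client, 1 server), bit 1 = direction (0 bidi, 1 uni).
pub fn stream_kind(id: u64) -> StreamKind {
    match id & 0b11 {
        0b00 => StreamKind::ClientBidi,
        0b01 => StreamKind::ServerBidi,
        0b10 => StreamKind::ClientUni,
        _ => StreamKind::ServerUni,
    }
}

/// The n-th (0-based) stream of a given kind has ID `4 * n + low_bits`.
pub fn nth_stream_id(kind: &StreamKind, n: u64) -> u64 {
    let low = match kind {
        StreamKind::ClientBidi => 0,
        StreamKind::ServerBidi => 1,
        StreamKind::ClientUni => 2,
        StreamKind::ServerUni => 3,
    };
    4 * n + low
}
```
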
Benefits:
- Near-instant message delivery (no polling interval)
- No idle network traffic
- Typing indicators delivered in real time
- Graceful degradation: fall back to long-poll if push stream fails

**Also:** Make `peek` + `ack` the default delivery pattern (not destructive
`fetch`). Add idempotency keys to prevent duplicate messages on retry.

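A peek-then-ack queue with idempotency keys could look like the following in-memory sketch. `DeliveryQueue`, cumulative acks by sequence number, and caller-chosen string keys are all assumptions; the real queue sits behind the `Store` trait in SQLite. Messages stay queued until acknowledged, so a client crash between receive and processing causes redelivery instead of loss, and a retried enqueue with an already-seen key is dropped.

```rust
use std::collections::{HashSet, VecDeque};

#[derive(Debug, Clone, PartialEq)]
pub struct Envelope {
    pub seq: u64,
    pub payload: Vec<u8>,
}

#[derive(Default)]
pub struct DeliveryQueue {
    next_seq: u64,
    pending: VecDeque<Envelope>,
    // Idempotency keys already accepted. A real store would expire old keys
    // instead of keeping them forever.
    seen_keys: HashSet<String>,
}

impl DeliveryQueue {
    /// Returns false (and stores nothing) if this idempotency key was seen before.
    pub fn enqueue(&mut self, idempotency_key: &str, payload: Vec<u8>) -> bool {
        if !self.seen_keys.insert(idempotency_key.to_string()) {
            return false; // duplicate retry
        }
        self.pending.push_back(Envelope { seq: self.next_seq, payload });
        self.next_seq += 1;
        true
    }

    /// Non-destructive read: up to `limit` messages, which stay queued.
    pub fn peek(&self, limit: usize) -> Vec<Envelope> {
        self.pending.iter().take(limit).cloned().collect()
    }

    /// Cumulative ack: drop every message with `seq <= up_to`, i.e. the ones
    /// the client has durably processed.
    pub fn ack(&mut self, up_to: u64) {
        while self.pending.front().map_or(false, |e| e.seq <= up_to) {
            self.pending.pop_front();
        }
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```
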
### 3.4 Multi-Stream Connections

Allow 4-8 concurrent bidirectional QUIC streams per connection. This enables:
- Pipelined RPCs (send while fetching)
- Concurrent blob upload + chat
- `fetchWait` on one stream without blocking others

### 3.5 Storage Improvements

| Change | Rationale |
|--------|-----------|
| Drop `FileBackedStore` | O(n) flush per write, no federation support |
| Connection pool for SQLite | Replace `Mutex<Connection>` with r2d2/deadpool |
| Persist sessions to DB | Server restart shouldn't force re-login |
| Encrypt DiskKeyStore at rest | HPKE private keys in plaintext is a real vuln |
| Persist MLS group state | Process crash shouldn't lose group state |
| Atomic keystore writes | tempfile-then-rename pattern |

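The tempfile-then-rename pattern for keystore writes needs only the standard library. `atomic_write` and the `.tmp` suffix are illustrative. On POSIX filesystems `rename` within one directory replaces the target atomically, so a crash leaves either the old file or the new one; `sync_all` forces the data to disk before the rename makes it visible. Syncing the parent directory (needed for full durability on some filesystems) and encryption at rest are left out.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Atomically replace `path` with `data`: write a sibling temp file, fsync it,
/// then rename it over the destination.
pub fn atomic_write(path: &Path, data: &[u8]) -> io::Result<()> {
    let dir = path.parent().unwrap_or_else(|| Path::new("."));
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    // Same directory as the target, so the rename never crosses filesystems.
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(".tmp");
    let tmp_path = dir.join(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(data)?;
    tmp.sync_all()?; // data is durable before it becomes visible under `path`
    drop(tmp);

    if let Err(e) = fs::rename(&tmp_path, path) {
        let _ = fs::remove_file(&tmp_path); // best-effort cleanup
        return Err(e);
    }
    Ok(())
}
```
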
### 3.6 Crypto Stack Refinements

The algorithms are correct. The refinements are operational:

| Change | Rationale |
|--------|-----------|
| Typed MLS error variants | Stop losing error info via `format!("{e:?}")` |
| Formalize hybrid PQ ciphersuite ID | Replace length-based key detection |
| Remove all `InsecureServerCertVerifier` | No TLS bypass on any platform |
| Add passkey/WebAuthn alt-auth | Better UX for GUI/mobile, no password to forget |
| Consider Double Ratchet for 1:1 DMs | MLS is over-engineered for 2-party; DR gives better per-message forward secrecy |
| Token/session secret zeroization | `AuthContext.token` et al. need `Zeroizing` wrappers |
| Fix serde deserialization of secrets | Intermediate non-zeroized `Vec<u8>` in `IdentityKeypair::deserialize` |

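Replacing length-based key detection with an explicit identifier might look like this sketch. The tag values (`0x01`, `0x02`) and the X25519-then-ML-KEM byte order are hypothetical; the lengths are the standard sizes of an X25519 public key (32 bytes) and an ML-KEM-768 encapsulation key (1184 bytes, FIPS 203). With a tag, an unknown or future algorithm fails loudly instead of being guessed from its length.

```rust
const X25519_PK_LEN: usize = 32;
const MLKEM768_EK_LEN: usize = 1184; // FIPS 203, ML-KEM-768 encapsulation key

/// Hypothetical wire tags; the real protocol would assign its own values.
pub const TAG_X25519: u8 = 0x01;
pub const TAG_HYBRID_X25519_MLKEM768: u8 = 0x02;

#[derive(Debug, PartialEq)]
pub enum PublicKey {
    X25519([u8; X25519_PK_LEN]),
    Hybrid { x25519: [u8; X25519_PK_LEN], mlkem: Vec<u8> },
}

#[derive(Debug, PartialEq)]
pub enum KeyDecodeError {
    Empty,
    UnknownTag(u8),
    BadLength { expected: usize, got: usize },
}

/// Decode `[tag][key bytes]`, checking the exact length implied by the tag.
pub fn decode_public_key(bytes: &[u8]) -> Result<PublicKey, KeyDecodeError> {
    let (&tag, body) = bytes.split_first().ok_or(KeyDecodeError::Empty)?;
    let check_len = |n: usize| -> Result<(), KeyDecodeError> {
        if body.len() == n {
            Ok(())
        } else {
            Err(KeyDecodeError::BadLength { expected: n, got: body.len() })
        }
    };
    match tag {
        TAG_X25519 => {
            check_len(X25519_PK_LEN)?;
            let mut pk = [0u8; X25519_PK_LEN];
            pk.copy_from_slice(body);
            Ok(PublicKey::X25519(pk))
        }
        TAG_HYBRID_X25519_MLKEM768 => {
            check_len(X25519_PK_LEN + MLKEM768_EK_LEN)?;
            let (classical, pq) = body.split_at(X25519_PK_LEN);
            let mut x25519 = [0u8; X25519_PK_LEN];
            x25519.copy_from_slice(classical);
            Ok(PublicKey::Hybrid { x25519, mlkem: pq.to_vec() })
        }
        other => Err(KeyDecodeError::UnknownTag(other)),
    }
}
```
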
### 3.7 Workspace Restructuring

**Reduce from 12 to 8 crates:**

```
quicproquo-core        -- crypto primitives (keep)
quicproquo-proto       -- schema codegen (keep)
quicproquo-plugin-api  -- #![no_std] C-ABI (keep)
quicproquo-kt          -- key transparency (keep)
quicproquo-sdk         -- NEW: business logic library
quicproquo-server      -- server binary (keep)
quicproquo-client      -- CLI/TUI binary, depends on sdk (keep, slimmed)
quicproquo-p2p         -- mesh networking (keep, feature-flagged)
```

**Merge/remove:**
- `bot` -> `sdk::bot` module
- `ffi` -> `sdk` with `--features c-ffi`
- `gen` -> `scripts/` or `xtask`
- `gui` -> `apps/gui/` outside workspace (Tauri project)
- `mobile` -> `examples/` (research spike)

**Set `default-members` under `[workspace]`** so `cargo build` doesn't attempt to build the GUI.
**Add `justfile`** with `build`, `test`, `test-e2e`, `build-wasm`, `docker`.

### 3.8 Plugin System Evolution

| Change | Rationale |
|--------|-----------|
| Add `version: u32` to `HookVTable` | ABI stability — check version on load |
| Config passthrough | `qpq_plugin_init(vtable, config_json)` |
| Async hooks | Plugins that call external services shouldn't block Tokio |
| Evaluate WASM plugins | Sandboxed community plugins (keep C-ABI for first-party) |

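A versioned vtable with chained continue/reject hooks could be sketched as follows. The field layout, `ABI_VERSION`, the integer verdicts, and the single `on_message` hook are hypothetical and do not reproduce the real `HookVTable`; `qpq_plugin_init` and config passthrough are not shown. The host checks `version` and refuses mismatched plugins, then runs hooks in load order until one rejects.

```rust
use std::os::raw::c_int;

/// ABI version the host was compiled against (hypothetical value).
pub const ABI_VERSION: u32 = 2;

/// Hook verdicts cross the C ABI as plain integers.
pub const HOOK_CONTINUE: c_int = 0;
pub const HOOK_REJECT: c_int = 1;

/// C-compatible vtable. `version` comes first so the host can read it
/// before trusting any other field.
#[repr(C)]
pub struct HookVTable {
    pub version: u32,
    pub on_message: Option<extern "C" fn(payload: *const u8, len: usize) -> c_int>,
}

#[derive(Debug, PartialEq)]
pub enum LoadError {
    VersionMismatch { host: u32, plugin: u32 },
}

/// Accept a plugin's vtable only if its ABI version matches the host's.
pub fn check_vtable(vt: &HookVTable) -> Result<(), LoadError> {
    if vt.version != ABI_VERSION {
        return Err(LoadError::VersionMismatch { host: ABI_VERSION, plugin: vt.version });
    }
    Ok(())
}

/// Run every plugin's `on_message` in load order; the first reject wins.
pub fn run_chain(plugins: &[HookVTable], payload: &[u8]) -> bool {
    for vt in plugins {
        if let Some(hook) = vt.on_message {
            if hook(payload.as_ptr(), payload.len()) == HOOK_REJECT {
                return false; // rejected: stop the chain here
            }
        }
    }
    true // every hook continued (or the plugin had no hook)
}
```
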
### 3.9 Federation Improvements

| Change | Rationale |
|--------|-----------|
| DNS SRV / .well-known discovery | Static peer config doesn't scale |
| Persistent relay queue with retry | Messages to offline peers are currently lost |
| Deterministic channel ID derivation | Avoid cross-server channel conflicts |
| Keep mDNS as optional mesh feature | Not for internet-scale, but good for LAN |

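For deterministic channel IDs, what matters is that both servers build identical bytes no matter who initiates. A minimal sketch, assuming canonical `user@server` addresses, a two-party DM channel, and a hypothetical domain-separation label: sort the two participants, length-prefix each (so `("ab", "c")` and `("a", "bc")` differ), and then hash the result with a cryptographic hash such as SHA-256, which is omitted here to stay dependency-free.

```rust
/// Domain-separation label (hypothetical) so these bytes cannot be confused
/// with any other hashed structure in the protocol.
const LABEL: &[u8] = b"qpq-dm-channel-v1";

/// Canonical hash input for a DM channel between two (already normalized)
/// addresses. The order of the arguments does not matter.
pub fn dm_channel_id_input(a: &str, b: &str) -> Vec<u8> {
    let (first, second) = if a <= b { (a, b) } else { (b, a) };
    let mut out = Vec::with_capacity(LABEL.len() + 8 + first.len() + second.len());
    out.extend_from_slice(LABEL);
    for part in [first, second] {
        // A 4-byte big-endian length prefix keeps the encoding unambiguous.
        out.extend_from_slice(&(part.len() as u32).to_be_bytes());
        out.extend_from_slice(part.as_bytes());
    }
    out
}
```
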
### 3.10 Test & CI Improvements

| Change | Rationale |
|--------|-----------|
| Per-client auth context | Removes `--test-threads 1` constraint |
| Mock server for client unit tests | Fast tests without spawning real server |
| Fuzz testing (cargo-fuzz) | Hybrid KEM, sealed sender, padding, Cap'n Proto deser |
| WS bridge unit tests | 645 lines, zero tests, security-critical |
| WASM + Go SDK in CI | Currently untested in CI |
| Separate E2E from unit test CI job | Different speed, different failure modes |
| macOS CI | FFI/mobile cross-compilation validation |
| Release automation | Binary artifacts, Docker tags, WASM npm publish |

---

## Part 4 — Ecosystem Positioning

### Don't compete with Signal or Matrix directly.

**Target: Privacy-first messaging infrastructure for developers and
organizations.**

quicproquo's differentiators — QUIC-native transport, post-quantum crypto, MLS,
plugin system, multi-language SDKs, embeddable architecture — point toward an
infrastructure play, not a consumer app.

Think: *"the Postgres of E2E encrypted messaging"* — a high-quality open-source
server and protocol that other projects build on.

| Segment | Value Proposition |
|---------|-------------------|
| **Developer tool** | API-first messenger for encrypted bots and integrations |
| **Embeddable** | C FFI + WASM + Go SDK for embedding in other apps |
| **Enterprise** | On-prem, plugins for compliance/audit, OPAQUE zero-knowledge auth |
| **Research** | Post-quantum crypto, MLS reference implementation, mesh networking |

---

## Part 5 — Priority Ordering

### Phase 1: Foundation (unblocks everything else)
1. Replace capnp-rpc with Send-compatible framework
2. Extract SDK crate from client
3. Per-client auth context (no global state)

### Phase 2: Reliability
4. Push-based delivery (QUIC uni-stream)
5. Multi-stream connections
6. Persist sessions + MLS group state
7. Encrypt DiskKeyStore at rest
8. peek+ack as default delivery

### Phase 3: Polish
9. Workspace restructuring (12 -> 8 crates)
10. TUI as primary interactive mode (built on SDK)
11. Plugin system v2 (versioning, config, async)
12. Federation retry queue + discovery

### Phase 4: Ecosystem
13. Full MLS in WASM (browser E2E)
14. WebTransport (eliminate WS bridge)
15. Tauri GUI (built on SDK)
16. Release automation + expanded CI

---

## Appendix — Analysis Sources

This document was produced by four parallel analysis agents:

| Agent | Scope | Files Read |
|-------|-------|-----------|
| server-analyst | Transport, RPC, delivery, storage, federation | 27 server .rs files, 4 schemas, core transport |
| client-analyst | REPL, UX, state, multi-platform, SDK design | All client .rs, GUI, mobile, TS demo |
| security-analyst | MLS, OPAQUE, hybrid KEM, keystore, identity | All core .rs, review doc |
| dx-analyst | Workspace, build, tests, plugins, CI, ecosystem | All Cargo.toml, tests, CI, plugins, SDKs |
@@ -1,328 +0,0 @@
# quicproquo v2 — Master Implementation Plan

> Created 2026-03-04. This is the authoritative plan for the v2 rewrite.
> See also: `docs/V2-DESIGN-ANALYSIS.md` for the detailed retrospective.

## Context

The v1 codebase has strong crypto foundations (MLS, hybrid PQ KEM, OPAQUE) but three
architectural bottlenecks: capnp-rpc is `!Send` (single-threaded), client business logic
is trapped in a monolithic REPL with global state, and delivery is poll-based.

This plan creates v2 on a new branch, keeping the crypto stack intact and replacing
the RPC/transport layer, extracting an SDK, and restructuring the workspace.

**Key decisions:**
- Transport: Protobuf (prost) + custom framing over QUIC (quinn)
- Mobile: Tauri 2 (same Rust SDK backend, web UI)
- Branch strategy: `v2` branch from main, not a fresh repo
- Constraints: Rust, QUIC, GPG-signed commits, zeroize secrets, no stubs

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────┐
│                      Frontends                      │
│  CLI/TUI  │  Tauri GUI/Mobile  │  Web (WebTransport)│
└─────┬─────┴────────┬───────────┴──────────┬─────────┘
      │              │                      │
      ▼              ▼                      ▼
┌─────────────────────────────────────────────────────┐
│                   quicproquo-sdk                    │
│ QpqClient { connect, login, send, recv, subscribe } │
│ Event system (tokio broadcast)                      │
│ Crypto pipeline (MLS, sealed sender, hybrid)        │
│ Conversation store (SQLCipher)                      │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│                   quicproquo-rpc                    │
│ QUIC framing: [method:u16][req_id:u32][len:u32][pb] │
│ Multi-stream (1 RPC per stream)                     │
│ Server-push via uni-streams                         │
│ tower middleware (auth, rate-limit)                 │
└──────────────────────┬──────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────┐
│                  quicproquo-server                  │
│ Domain services (auth, delivery, channel, blob)     │
│ Store trait → SqlStore (connection pool)            │
│ Plugin hooks, federation, KT                        │
└─────────────────────────────────────────────────────┘
```

### Wire Format

Per QUIC bidirectional stream (request/response):
```
Request:  [method_id: u16][request_id: u32][payload_len: u32][protobuf bytes]
Response: [status: u8][request_id: u32][payload_len: u32][protobuf bytes]
```

Per QUIC unidirectional stream (server → client push):
```
Push: [event_type: u16][payload_len: u32][protobuf bytes]
```

Each RPC opens its own QUIC bidi stream → natural multi-stream, no head-of-line blocking between RPCs.

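The request layout above can be encoded and decoded with a few lines of std-only Rust. The plan does not fix a byte order, so this sketch assumes big-endian (network order); `MAX_PAYLOAD` is a hypothetical cap that stops a peer from announcing a multi-gigabyte frame. Decoding returns `Ok(None)` until a complete frame is buffered. Response and push frames would follow the same pattern with their own headers.

```rust
/// Hypothetical upper bound on a single payload (guards allocation size).
pub const MAX_PAYLOAD: u32 = 16 * 1024 * 1024;

const REQUEST_HEADER_LEN: usize = 2 + 4 + 4; // method_id + request_id + payload_len

#[derive(Debug, PartialEq)]
pub struct RequestFrame {
    pub method_id: u16,
    pub request_id: u32,
    pub payload: Vec<u8>, // protobuf-encoded message bytes
}

#[derive(Debug, PartialEq)]
pub enum FrameError {
    TooLarge(u32),
}

/// Encode `[method_id: u16][request_id: u32][payload_len: u32][payload]`, big-endian.
pub fn encode_request(f: &RequestFrame) -> Result<Vec<u8>, FrameError> {
    let len = u32::try_from(f.payload.len()).map_err(|_| FrameError::TooLarge(u32::MAX))?;
    if len > MAX_PAYLOAD {
        return Err(FrameError::TooLarge(len));
    }
    let mut out = Vec::with_capacity(REQUEST_HEADER_LEN + f.payload.len());
    out.extend_from_slice(&f.method_id.to_be_bytes());
    out.extend_from_slice(&f.request_id.to_be_bytes());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(&f.payload);
    Ok(out)
}

/// Decode one request frame from the front of `buf`.
/// `Ok(None)` means more bytes are needed; otherwise returns the frame and
/// the number of bytes it consumed.
pub fn decode_request(buf: &[u8]) -> Result<Option<(RequestFrame, usize)>, FrameError> {
    if buf.len() < REQUEST_HEADER_LEN {
        return Ok(None);
    }
    let method_id = u16::from_be_bytes([buf[0], buf[1]]);
    let request_id = u32::from_be_bytes([buf[2], buf[3], buf[4], buf[5]]);
    let len = u32::from_be_bytes([buf[6], buf[7], buf[8], buf[9]]);
    if len > MAX_PAYLOAD {
        return Err(FrameError::TooLarge(len)); // reject before allocating
    }
    let total = REQUEST_HEADER_LEN + len as usize;
    if buf.len() < total {
        return Ok(None);
    }
    let payload = buf[REQUEST_HEADER_LEN..total].to_vec();
    Ok(Some((RequestFrame { method_id, request_id, payload }, total)))
}
```
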
---

## Workspace Structure (v2: 9 crates)

```
quicproquo/
├── crates/
│   ├── quicproquo-core/        # KEEP AS-IS — crypto primitives, MLS, hybrid KEM
│   ├── quicproquo-kt/          # KEEP AS-IS — key transparency
│   ├── quicproquo-plugin-api/  # KEEP AS-IS — #![no_std] C-ABI
│   ├── quicproquo-proto/       # REWRITE — protobuf schemas + prost codegen
│   ├── quicproquo-rpc/         # NEW — QUIC RPC framework (framing, dispatch, tower)
│   ├── quicproquo-sdk/         # NEW — client business logic library
│   ├── quicproquo-server/      # REWRITE — domain services + RPC handlers
│   ├── quicproquo-client/      # REWRITE — thin CLI/TUI shell over SDK
│   └── quicproquo-p2p/         # KEEP — iroh mesh (feature-flagged, later)
├── apps/
│   └── gui/                    # Tauri 2 desktop + mobile app (outside workspace)
├── proto/                      # .proto source files
│   └── qpq/v1/
│       ├── auth.proto          # OPAQUE registration + login (4 methods)
│       ├── delivery.proto      # enqueue, fetch, peek, ack, batch (6 methods)
│       ├── keys.proto          # key package + hybrid key CRUD (5 methods)
│       ├── channel.proto       # channel create (1 method)
│       ├── user.proto          # resolve user/identity (2 methods)
│       ├── blob.proto          # upload/download (2 methods)
│       ├── device.proto        # register/list/revoke (3 methods)
│       ├── p2p.proto           # endpoint publish/resolve + health (3 methods)
│       ├── federation.proto    # relay + proxy (6 methods)
│       ├── push.proto          # server-push events (NEW)
│       └── common.proto        # shared types (Auth, Envelope, Error)
├── sdks/
│   ├── go/                     # Go SDK (regenerate from .proto)
│   └── typescript/             # TS SDK (WebTransport client)
├── justfile                    # NEW — build commands
└── Cargo.toml                  # workspace root
```

**Removed from workspace:**
- `quicproquo-bot` → `sdk::bot` module
- `quicproquo-ffi` → `sdk` with `--features c-ffi`
- `quicproquo-gen` → `scripts/`
- `quicproquo-gui` → `apps/gui/` (Tauri project, outside workspace)
- `quicproquo-mobile` → merged into `apps/gui/` (Tauri 2 mobile)

---

## Crate Reuse Assessment

| v1 Crate | capnp deps? | v2 Action | Effort |
|----------|:-----------:|-----------|--------|
| **quicproquo-core** | None | Copy as-is | Zero |
| **quicproquo-kt** | None | Copy as-is | Zero |
| **quicproquo-plugin-api** | None | Copy as-is | Zero |
| **quicproquo-p2p** | None | Copy as-is | Zero |
| **quicproquo-proto** | 100% capnp | Replace with prost codegen | Medium |
| **quicproquo-server** | 16/20 files | Extract domain logic, rewrite handlers | High |
| **quicproquo-client** | 6/10 files | Extract to SDK, thin CLI shell | High |

### Key Files to Reuse Directly

| Source (v1) | Destination (v2) | Notes |
|-------------|------------------|-------|
| `crates/quicproquo-core/` (entire) | same path | Zero changes |
| `crates/quicproquo-kt/` (entire) | same path | Zero changes |
| `crates/quicproquo-plugin-api/` (entire) | same path | Zero changes |
| `server/src/storage.rs` | `server/src/storage.rs` | Store trait — keep |
| `server/src/sql_store.rs` | `server/src/sql_store.rs` | Add connection pool |
| `server/src/hooks.rs` | `server/src/hooks.rs` | Plugin system — keep |
| `server/src/plugin_loader.rs` | `server/src/plugin_loader.rs` | Keep |
| `server/src/error_codes.rs` | `server/src/error_codes.rs` | Keep |
| `server/src/config.rs` | `server/src/config.rs` | Update for new transport |
| `client/src/conversation.rs` | `sdk/src/conversation.rs` | Move to SDK |
| `client/src/token_cache.rs` | `sdk/src/token_cache.rs` | Move to SDK |
| `client/src/display.rs` | `client/src/display.rs` | Keep in CLI |
| `schemas/*.capnp` | reference only | Translate to .proto |

---

## Phased Implementation

### Phase 1: Foundation
**Goal:** v2 branch with new workspace, proto schemas, RPC framework skeleton, SDK skeleton.
**Scope:** Compiles, no runtime functionality yet.

1. **Create v2 branch** from main
2. **Restructure workspace** — update root Cargo.toml, create new crate dirs, add justfile
3. **Write .proto files** — translate all 33 RPC methods + push events from Cap'n Proto
4. **Create quicproquo-proto crate** — prost-build codegen
5. **Create quicproquo-rpc crate** — QUIC RPC framework:
   - `framing.rs` — wire format encode/decode (request, response, push)
   - `server.rs` — accept QUIC connections, dispatch to handlers
   - `client.rs` — connect, send requests, receive responses + push events
   - `middleware.rs` — tower-based auth + rate-limit layers
   - `method.rs` — method registry (method_id → async handler fn)
6. **Create quicproquo-sdk crate** — public API skeleton:
   - `client.rs` — `QpqClient` struct
   - `events.rs` — `ClientEvent` enum
   - `conversation.rs` — `ConversationHandle`, `ConversationStore`
   - `config.rs` — `ClientConfig`
7. **Extract server domain types** — `server/src/domain/` module:
   - `types.rs` — plain Rust request/response types
   - `auth.rs` — OPAQUE logic extracted from auth_ops.rs
   - `delivery.rs` — enqueue/fetch logic extracted from delivery.rs

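The `method.rs` registry (method_id → handler) might look like the sketch below. To stay std-only it uses synchronous boxed closures, whereas the plan calls for async handlers behind tower middleware; the `Status` codes and method IDs are hypothetical. Unknown IDs map to an error status rather than a panic, and duplicate registrations are refused.

```rust
use std::collections::HashMap;

/// Hypothetical status codes for the response frame's `status: u8` field.
#[derive(Debug, PartialEq, Clone, Copy)]
#[repr(u8)]
pub enum Status {
    Ok = 0,
    UnknownMethod = 1,
    BadRequest = 2,
}

/// A handler takes the raw protobuf payload and returns response bytes or a status.
pub type Handler = Box<dyn Fn(&[u8]) -> Result<Vec<u8>, Status> + Send + Sync>;

#[derive(Default)]
pub struct MethodRegistry {
    handlers: HashMap<u16, Handler>,
}

impl MethodRegistry {
    /// Returns false if `method_id` is already taken, so two methods can
    /// never silently shadow each other.
    pub fn register(&mut self, method_id: u16, handler: Handler) -> bool {
        if self.handlers.contains_key(&method_id) {
            return false;
        }
        self.handlers.insert(method_id, handler);
        true
    }

    /// Look up and invoke the handler for a decoded request frame.
    pub fn dispatch(&self, method_id: u16, payload: &[u8]) -> (Status, Vec<u8>) {
        match self.handlers.get(&method_id) {
            Some(handler) => match handler(payload) {
                Ok(body) => (Status::Ok, body),
                Err(status) => (status, Vec::new()),
            },
            None => (Status::UnknownMethod, Vec::new()),
        }
    }
}
```
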
**Verification:**
- `cargo build --workspace` succeeds
- `cargo test -p quicproquo-core` passes (72 tests)
- Proto codegen works
- RPC framework compiles

---

### Phase 2: Server Core
**Goal:** Working server with all 33 RPC handlers over QUIC.

1. **RPC dispatch** — method registry, connection lifecycle
2. **Domain handlers** — all 33 methods as `async fn(Request) -> Result<Response>`
   - Auth (4): OPAQUE register start/finish, login start/finish
   - Delivery (6): enqueue, fetch, fetchWait, peek, ack, batchEnqueue
   - Keys (5): upload/fetch key package, upload/fetch/batch-fetch hybrid key
   - Channels (1): createChannel
   - Users (2): resolveUser, resolveIdentity
   - Blobs (2): uploadBlob, downloadBlob
   - Devices (3): registerDevice, listDevices, revokeDevice
   - P2P (3): health, publishEndpoint, resolveEndpoint
   - Federation (6): relay enqueue/batch, proxy fetch/resolve, health
3. **Server-push** — notification stream via QUIC uni-stream
4. **Storage upgrades:**
   - Drop `FileBackedStore`
   - Connection pool (deadpool-sqlite)
   - Persist sessions to SQLite
   - Atomic queue depth check + enqueue
5. **Tower middleware** — auth validation, rate limiting, audit logging
6. **Multi-stream** — concurrent RPCs per connection (remove 1-stream limit)

**Verification:**
- Server starts, accepts QUIC connections
- Health check RPC works
- OPAQUE registration + login works
- Message enqueue + fetch round-trip

---

### Phase 3: SDK
**Goal:** Complete client SDK library — the heart of v2.

1. **QpqClient** — connect, OPAQUE auth, session management (no global state)
2. **Crypto pipeline** — MLS processing, sealed sender unwrap, hybrid decrypt
   (extracted from repl.rs `poll_messages()`)
3. **Conversation management** — create DM, create group, invite, remove, send, receive
4. **Event system** — `tokio::sync::broadcast` channel of `ClientEvent` replacing the poll loop
   - `MessageReceived`, `TypingIndicator`, `ConversationCreated`
   - `MemberJoined`, `MemberLeft`, `ConnectionLost`, `Reconnected`
5. **Offline support** — outbox queue, retry with backoff, sync on reconnect
6. **ConversationStore** — SQLCipher local DB (migrate from client/conversation.rs)
7. **Key management** — encrypted DiskKeyStore, MLS group state persistence
8. **Token/secret zeroization** — `AuthContext.token` etc. wrapped in `Zeroizing`

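For the outbox's "retry with backoff", one common choice is capped exponential backoff with full jitter: wait a random duration in `[0, min(cap, base * 2^attempt)]`. The 500 ms base and 60 s cap are hypothetical, and the random value is passed in as `unit` so the sketch stays std-only and deterministic under test; a real client would draw it from an RNG.

```rust
use std::time::Duration;

/// Hypothetical tuning constants for outbox retries.
pub const BASE: Duration = Duration::from_millis(500);
pub const CAP: Duration = Duration::from_secs(60);

/// Upper bound for retry `attempt` (0-based): min(CAP, BASE * 2^attempt),
/// computed without overflowing.
pub fn backoff_ceiling(attempt: u32) -> Duration {
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    BASE.checked_mul(factor).map_or(CAP, |d| d.min(CAP))
}

/// "Full jitter": scale the ceiling by `unit`, clamped to [0, 1]. `unit`
/// would normally come from an RNG; it is a parameter here to keep this std-only.
pub fn backoff_delay(attempt: u32, unit: f64) -> Duration {
    let unit = unit.clamp(0.0, 1.0);
    backoff_ceiling(attempt).mul_f64(unit)
}
```
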
**Verification:**
|
||||
- SDK integration test: connect → login → create DM → send → receive
|
||||
- No global state (`AUTH_CONTEXT` eliminated)
|
||||
- Event subscription works
|
||||
- Offline outbox drains on reconnect
|
||||
|
||||
---
|
||||
|
||||
### Phase 4: Client
|
||||
**Goal:** CLI and TUI as thin shells over SDK.
|
||||
|
||||
1. **CLI binary** (`qpq`) — clap subcommands calling `QpqClient`
|
||||
2. **REPL** — readline with tab-completion (rustyline), categorized `/help`
|
||||
3. **TUI** — ratatui, subscribes to `QpqClient::subscribe()` events
|
||||
4. **Simplified commands:**
|
||||
- Hide MLS/KeyPackage internals (auto-refresh)
|
||||
- Message references by short ID (not index)
|
||||
- Batch operations (`/create-group team alice bob`)
|
||||
- Categorized help (Chat, Groups, Security, System)
|
||||
5. **Auto-server-launch** — keep zero-config DX from v1
|
||||
6. **Playbook system** — keep YAML-based test scripting
|
||||
|
||||
**Verification:**
|
||||
- `qpq --username alice --password pass` starts REPL (same UX as v1)
|
||||
- TUI mode works with live event updates
|
||||
- Tab-completion for commands and usernames
|
||||
- E2E test: two clients exchange messages
|
||||
|
||||
---
|
||||
|
||||
### Phase 5: Desktop & Mobile
|
||||
**Goal:** Tauri 2 app for all platforms.
|
||||
|
||||
1. **Tauri 2 project** in `apps/gui/`
|
||||
2. **Rust backend** — Tauri commands wrapping `QpqClient`
|
||||
3. **Web frontend** — Svelte or vanilla HTML/JS
|
||||
4. **Desktop** — Linux, macOS, Windows
|
||||
5. **Mobile** — iOS, Android via Tauri 2 mobile
|
||||
6. **QUIC connection migration** — automatic wifi↔cellular handoff
|
||||
|
||||
**Verification:**
|
||||
- Desktop app builds and runs on Linux
|
||||
- Mobile app builds for Android (emulator)
|
||||
- Send message from CLI → received in GUI
|
||||
|
||||
---

### Phase 6: Polish & Ecosystem
**Goal:** Production readiness.

1. **Federation improvements**: DNS SRV discovery, persistent relay queue with retry
2. **Plugin system v2**: version field, config passthrough, async hooks, WASM plugins
3. **WebTransport**: browser clients over HTTP/3 (same quinn endpoint)
4. **WASM MLS**: compile openmls to wasm32 for browser E2E encryption
5. **CI/CD**: release automation, WASM CI, multi-platform (Linux + macOS)
6. **Security hardening:**
   - Fuzz testing (hybrid KEM, sealed sender, padding, protobuf deserialization)
   - Remove all `InsecureServerCertVerifier` paths
   - Certificate pinning
   - Add passkeys/WebAuthn as an alternative auth method
7. **Double Ratchet for 1:1 DMs**: better per-message forward secrecy than MLS for two-party conversations

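For the persistent relay queue with retry, one common design is capped exponential backoff plus a dead-letter list once an attempt budget is used up. The sketch below keeps the queue in memory and takes caller-supplied millisecond timestamps so it stays deterministic. `RelayItem`, `RetryPolicy` and `RelayQueue` are hypothetical names. A real implementation would persist items to the server's storage and would likely add jitter so that many queued relays to the same peer do not retry in lockstep.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Hypothetical outbound federation message awaiting relay.
#[derive(Debug, Clone)]
struct RelayItem {
    destination: String,
    payload: Vec<u8>,
    attempts: u32,
    ready_at_ms: u64,
}

#[derive(Debug, Clone, Copy)]
struct RetryPolicy {
    base: Duration,
    cap: Duration,
    max_attempts: u32,
}

impl RetryPolicy {
    /// Capped exponential backoff: base * 2^(attempts - 1), never above `cap`.
    fn delay_for(&self, attempts: u32) -> Duration {
        let exponent = attempts.saturating_sub(1).min(31);
        self.base.saturating_mul(1u32 << exponent).min(self.cap)
    }
}

struct RelayQueue {
    policy: RetryPolicy,
    pending: VecDeque<RelayItem>,
    dead_letters: Vec<RelayItem>,
}

impl RelayQueue {
    fn new(policy: RetryPolicy) -> Self {
        Self { policy, pending: VecDeque::new(), dead_letters: Vec::new() }
    }

    fn push(&mut self, destination: &str, payload: Vec<u8>, now_ms: u64) {
        self.pending.push_back(RelayItem {
            destination: destination.to_string(),
            payload,
            attempts: 0,
            ready_at_ms: now_ms,
        });
    }

    /// Tries every item due at `now_ms`; `send` returns true on success.
    /// Failed items are rescheduled, or dead-lettered after `max_attempts`.
    /// Returns how many items were delivered.
    fn drain_due<F: FnMut(&RelayItem) -> bool>(&mut self, now_ms: u64, mut send: F) -> usize {
        let mut delivered = 0;
        let mut still_pending = VecDeque::with_capacity(self.pending.len());
        while let Some(mut item) = self.pending.pop_front() {
            if item.ready_at_ms > now_ms {
                still_pending.push_back(item);
            } else if send(&item) {
                delivered += 1;
            } else {
                item.attempts += 1;
                if item.attempts >= self.policy.max_attempts {
                    self.dead_letters.push(item);
                } else {
                    let delay_ms = self.policy.delay_for(item.attempts).as_millis() as u64;
                    item.ready_at_ms = now_ms.saturating_add(delay_ms);
                    still_pending.push_back(item);
                }
            }
        }
        self.pending = still_pending;
        delivered
    }
}
```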
---

## RPC Method Inventory (33 total)

| Category | Methods | Proto File |
|----------|---------|------------|
| Auth (OPAQUE) | opaqueRegisterStart, opaqueRegisterFinish, opaqueLoginStart, opaqueLoginFinish | auth.proto |
| Delivery | enqueue, fetch, fetchWait, peek, ack, batchEnqueue | delivery.proto |
| Keys | uploadKeyPackage, fetchKeyPackage, uploadHybridKey, fetchHybridKey, fetchHybridKeys | keys.proto |
| Channel | createChannel | channel.proto |
| User | resolveUser, resolveIdentity | user.proto |
| Blob | uploadBlob, downloadBlob | blob.proto |
| Device | registerDevice, listDevices, revokeDevice | device.proto |
| P2P | health, publishEndpoint, resolveEndpoint | p2p.proto |
| Federation | relayEnqueue, relayBatchEnqueue, proxyFetchKeyPackage, proxyFetchHybridKey, proxyResolveUser, federationHealth | federation.proto |

**New in v2:**

| Push Event | Description | Proto File |
|------------|-------------|------------|
| MessageNotification | New message available | push.proto |
| TypingNotification | Peer is typing | push.proto |
| ChannelUpdate | Channel created or membership changed | push.proto |
| SessionExpired | Auth session expired | push.proto |

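The push events map naturally onto a Rust enum, and `QpqClient::subscribe()` from Phase 4 implies fanning each event out to several consumers (REPL, TUI, GUI). The sketch below assumes one std `mpsc` channel per subscriber; an async client would more likely use a broadcast channel from its runtime, but std keeps this runnable without dependencies. `PushEvent`, its payload fields and `EventHub` are hypothetical and do not reflect the actual contents of `push.proto`. Subscribers whose receiver has been dropped are pruned on the next publish.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical Rust mirror of the push events (payload fields are assumptions).
#[derive(Debug, Clone, PartialEq, Eq)]
enum PushEvent {
    MessageNotification { channel_id: String },
    TypingNotification { channel_id: String, user: String },
    ChannelUpdate { channel_id: String },
    SessionExpired,
}

/// Fans each event out to every live subscriber.
#[derive(Default)]
struct EventHub {
    subscribers: Mutex<Vec<Sender<PushEvent>>>,
}

impl EventHub {
    /// Each caller gets its own unbounded receiver, so `publish` never blocks.
    fn subscribe(&self) -> Receiver<PushEvent> {
        let (tx, rx) = channel();
        self.lock_subscribers().push(tx);
        rx
    }

    /// Delivers `event` to all subscribers and drops those whose receiver is gone.
    fn publish(&self, event: PushEvent) {
        self.lock_subscribers()
            .retain(|tx| tx.send(event.clone()).is_ok());
    }

    fn subscriber_count(&self) -> usize {
        self.lock_subscribers().len()
    }

    /// Locks the list, recovering from a poisoned mutex instead of panicking.
    fn lock_subscribers(&self) -> MutexGuard<'_, Vec<Sender<PushEvent>>> {
        self.subscribers
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner())
    }
}
```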
---

## Engineering Standards (carried from v1)

- Conventional commits: `feat:`, `fix:`, `chore:`, `docs:`, `test:`, `refactor:`
- GPG-signed commits only
- No `Co-authored-by` trailers
- No `.unwrap()` on crypto or I/O in non-test paths
- Secrets: zeroize on drop, never in logs
- No stubs / `todo!()` / `unimplemented!()` in production code
- `clippy::unwrap_used = "deny"` at workspace level

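The secrets and no-`.unwrap()` rules can be illustrated together. The std-only sketch below assumes a 32-byte key read from any `std::io::Read` source; `SecretBytes`, `KeyError` and `read_key` are hypothetical names. In practice the `zeroize` crate is the common choice for wipe-on-drop; the volatile-write loop here only approximates it and does not scrub copies left behind by earlier `Vec` reallocations. Errors propagate with `?` instead of `.unwrap()`, the function-level `#[deny(clippy::unwrap_used)]` mirrors the workspace lint, and the `Debug` impl prints only the length, so a stray `{:?}` in a log line cannot leak key material.

```rust
use std::fmt;
use std::io::{self, Read};
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical secret-key wrapper: redacted `Debug`, wiped on drop.
struct SecretBytes(Vec<u8>);

impl fmt::Debug for SecretBytes {
    // Never print key material, only its length.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "SecretBytes([REDACTED; {}])", self.0.len())
    }
}

impl Drop for SecretBytes {
    fn drop(&mut self) {
        for byte in self.0.iter_mut() {
            // SAFETY: `byte` is a valid, exclusive reference into the Vec.
            // The volatile write keeps the compiler from eliding the wipe.
            unsafe { std::ptr::write_volatile(byte, 0) };
        }
        compiler_fence(Ordering::SeqCst);
    }
}

#[derive(Debug)]
enum KeyError {
    Io(io::Error),
    WrongLength(usize),
}

impl From<io::Error> for KeyError {
    fn from(err: io::Error) -> Self {
        KeyError::Io(err)
    }
}

/// Reads a 32-byte key, propagating errors with `?` instead of `.unwrap()`.
#[deny(clippy::unwrap_used)]
fn read_key<R: Read>(mut source: R) -> Result<SecretBytes, KeyError> {
    let mut buf = Vec::with_capacity(32);
    source.read_to_end(&mut buf)?;
    // Wrap immediately so the buffer is wiped even on the error path below.
    let key = SecretBytes(buf);
    if key.0.len() != 32 {
        return Err(KeyError::WrongLength(key.0.len()));
    }
    Ok(key)
}
```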