feat: add delivery sequence numbers + major server/client refactor

Delivery sequence numbers (MLS epoch ordering fix):
- schemas/node.capnp: add Envelope{seq,data} struct; enqueue returns seq:UInt64;
  fetch/fetchWait return List(Envelope) instead of List(Data)
- storage.rs: Store trait enqueue returns u64; fetch/fetch_limited return
  Vec<(u64, Vec<u8>)>; FileBackedStore gains QueueMapV3 with per-inbox seq
  counters and V2→V3 on-disk migration
- migrations/002_add_seq.sql: seq column, delivery_seq_counters table, index
- sql_store.rs: atomic UPSERT counter via RETURNING, ORDER BY seq, SCHEMA_VERSION→3
- node_service/delivery.rs: builds Envelope list; returns seq from enqueue
- client/rpc.rs: enqueue→u64, fetch_all/fetch_wait→Vec<(u64,Vec<u8>)>
- client/commands.rs: sort-by-seq before MLS processing; retry loop in cmd_recv
  and receive_pending_plaintexts for correct epoch ordering

Server refactor:
- Split monolithic main.rs into node_service/{mod,delivery,auth_ops,key_ops,p2p_ops}
- Add auth.rs (token validation, rate limiting), config.rs, metrics.rs, tls.rs
- Add SQL migrations runner (001_initial.sql, 002_add_seq.sql)
- OPAQUE PAKE login/registration, sealed-sender mode, queue depth limit (1000)

Client refactor:
- Split lib.rs into client/{commands,rpc,state,retry,hex,mod}
- Add cmd_whoami, cmd_health, cmd_check_key, cmd_ping subcommands
- Add cmd_register_user, cmd_login (OPAQUE), cmd_refresh_keypackage
- Hybrid PQ envelope (X25519 + ML-KEM-768) on all send/recv paths
- E2E test suite expanded

Other:
- quicnprotochat-gui: Tauri 2 desktop GUI skeleton (backend + HTML UI)
- quicnprotochat-p2p: iroh-based P2P transport stub
- quicnprotochat-core: app_message, hybrid_crypto modules; GroupMember API updates
- .github/workflows/size-lint.yml: binary size regression check
- docs: protocol comparison, roadmap updates, fully-operational checklist
docs/src/roadmap/fully-operational-checklist.md
@@ -0,0 +1,135 @@
# Features Needed to Be Fully Operational

This checklist reflects the current state after M1–M3, M4-style CLI, M6 migrations, rich messaging, Sealed Sender, and GUI scaffold. It lists what is **done**, what is **partially done**, and what still **must be implemented** for a fully operational chat system.

---

## Summary Table

| Area | Status | Notes |
|------|--------|--------|
| Transport (QUIC/TLS) | Done | M1 |
| Auth service (KeyPackage, OPAQUE) | Done | M2 + register-user, login |
| Delivery + MLS groups (2-party) | Done | M3 |
| Group CLI (create, invite, join, send, recv, chat) | Done | M4-style |
| Server persistence (SQL + migrations) | Done | M6 migrations + runner |
| Client state persistence | Done | State file, DiskKeyStore, encrypted (QPCE) |
| Rich messaging (app payload schema) | Done | Chat, Reply, Reaction, ReadReceipt, Typing + sender |
| Sealed Sender | Done | Server config; enqueue without identity |
| Native GUI scaffold | Done | Tauri, whoami, health |
| **Multi-party groups (N > 2)** | Done | M5: Commit fan-out, send --all, epoch sync, three-party E2E |
| **KeyPackage rotation** | **To do** | Client upload before TTL (24h) |
| **Observability** | **To do** | Metrics (Prometheus), tracing (OpenTelemetry), health |
| **Client resilience** | **To do** | Retry/backoff, idempotent message IDs, gap detection |
| **1:1 channel semantics** | Partial | channelId in DS; per-channel authz/TTL not formalized |
| **Production hardening** | **To do** | CI, CODEOWNERS, SBOM, backup/restore, rate-limit tuning |
| **Post-quantum (M7)** | Next | Custom OpenMlsCryptoProvider with hybrid KEM |

---

## 1. Must-Have for “Fully Operational”

These are the features that, if missing, prevent the system from being considered fully operational for real use (multi-user groups, reliability, and operations).

### 1.1 Multi-party groups (M5)

**Current:** Core supports `add_member` and `merge_staged_commit`; client/server only exercise 2-party (creator + one joiner).

**To implement:**

- **Commit fan-out:** When creator invites a new member, the Commit must be delivered to **all existing members** (not just the creator). Client flow: after `add_member`, enqueue the Commit to each existing member’s queue (by identity / recipient_key) in addition to sending the Welcome to the new member.
- **Proposal handling:** Ensure all members process Commits and Proposals (Add/Remove/Update) so epoch advancement is consistent; already partially in core (`merge_staged_commit`, `store_pending_proposal`).
- **CLI/API:** Extend `invite` so that after adding a member, the client fetches the list of existing members (e.g. from local group state) and enqueues the Commit to each. Optional: `recv` processes incoming Commits and updates local group state before returning application messages.
- **Tests:** E2E with 3+ members: create group, invite B, invite C, send from A, B, C; all receive and decrypt.
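
The fan-out flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's client code: the `Delivery` trait, `MemoryDelivery`, and `fan_out_invite` are hypothetical names, and skipping the committer's own queue assumes the committer merges its own Commit locally.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the delivery-service client.
pub trait Delivery {
    /// Enqueue `payload` for `recipient_key` and return a delivery sequence number.
    fn enqueue(&mut self, recipient_key: &[u8], payload: &[u8]) -> Result<u64, String>;
}

/// After `add_member`: send the Commit to every existing member and the
/// Welcome to the new member only. Returns how many Commits were enqueued.
pub fn fan_out_invite<D: Delivery>(
    ds: &mut D,
    own_key: &[u8],
    existing_members: &[Vec<u8>],
    new_member: &[u8],
    commit: &[u8],
    welcome: &[u8],
) -> Result<usize, String> {
    let mut commits_sent = 0;
    for member in existing_members {
        // Assumption: the committer merges its own Commit locally, and the
        // new member learns the group state from the Welcome instead.
        if member.as_slice() == own_key || member.as_slice() == new_member {
            continue;
        }
        ds.enqueue(member, commit)?;
        commits_sent += 1;
    }
    ds.enqueue(new_member, welcome)?;
    Ok(commits_sent)
}

/// In-memory queues, used only to exercise the sketch.
#[derive(Default)]
pub struct MemoryDelivery {
    pub queues: HashMap<Vec<u8>, Vec<Vec<u8>>>,
    next_seq: u64,
}

impl Delivery for MemoryDelivery {
    fn enqueue(&mut self, recipient_key: &[u8], payload: &[u8]) -> Result<u64, String> {
        self.queues
            .entry(recipient_key.to_vec())
            .or_default()
            .push(payload.to_vec());
        // A single global counter for brevity; a real store may count per inbox.
        self.next_seq += 1;
        Ok(self.next_seq)
    }
}
```

With members A and B in the group and A inviting C, this enqueues one Commit (for B) and one Welcome (for C). A three-party E2E test would then check that B can process the Commit before decrypting later messages.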

### 1.2 KeyPackage rotation

**Current:** KeyPackages are single-use (consume-on-fetch). Server TTL (e.g. 24h) and client upload are in place, but there is no **scheduled client-side rotation**.

**To implement:**

- **Timer or on-demand:** Before KeyPackage TTL expires (e.g. 24h), client uploads a fresh KeyPackage (and optionally removes or replaces the old one). Can be a background task in the client (CLI daemon or GUI backend) or triggered when a “fetch key” fails with “no key”.
- **Documentation:** Document TTL and rotation in user/ops docs.
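
A timer-driven client needs a rule for when to upload a fresh KeyPackage. The sketch below shows one possible rule, assuming the client records when it last uploaded; `needs_rotation` and the safety `margin` parameter are hypothetical and not part of the existing client.

```rust
use std::time::{Duration, SystemTime};

/// Returns true once the uploaded KeyPackage is within `margin` of its TTL.
/// `uploaded_at` is assumed to be tracked by the client (hypothetical state).
pub fn needs_rotation(
    uploaded_at: SystemTime,
    now: SystemTime,
    ttl: Duration,
    margin: Duration,
) -> bool {
    // If the clock went backwards, treat the KeyPackage as freshly uploaded.
    let age = now.duration_since(uploaded_at).unwrap_or(Duration::ZERO);
    age >= ttl.saturating_sub(margin)
}
```

With a 24h TTL and a 1h margin, a background task polling this check would re-upload after roughly 23h. The on-demand path (a fetch failing with “no key”) would bypass the check and upload immediately.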

### 1.3 Observability

**Current:** Health RPC and basic tracing exist; no structured metrics or distributed tracing.

**To implement:**

- **Metrics:** Prometheus (or equivalent) export for: enqueue/fetch rate, RPC latency histograms, queue depth per recipient, KeyPackage store size, active connections. See [Future Research](future-research.md).
- **Health:** Existing `health` RPC is sufficient; optionally add a simple HTTP health endpoint for load balancers (e.g. on a separate port).
- **Structured logging:** Ensure sensitive data is never logged; audit events (auth, enqueue, rate limit) as in [Production Readiness](production-readiness.md).

### 1.4 Client resilience

**Current:** Single attempt for send/recv; no retry, no idempotent message IDs, no gap detection.

**To implement:**

- **Retry with backoff:** On transient failures (network, server busy), retry with exponential backoff + jitter for enqueue, fetch, fetchWait.
- **Idempotent message IDs:** Client-generated message IDs (already in rich messaging); server-side deduplication by (recipient_key, channel_id, message_id) if desired, to avoid duplicate delivery on retry.
- **Gap detection (optional):** Per-channel sequence numbers or epoch checks so the client can detect missing Commits or messages and re-sync (e.g. re-fetch or rejoin).
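
The retry item above can be illustrated with a std-only sketch of exponential backoff with full jitter. It is not the contents of the client's `retry` module: all names are hypothetical, and the random source is injected as a closure so that no particular RNG crate is assumed.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Upper bound for the delay before retry number `attempt` (zero-based):
/// base * 2^attempt, capped at `max`.
pub fn backoff_ceiling(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Full jitter: scale the ceiling by a fraction in [0, 1].
pub fn jittered(ceiling: Duration, unit_random: f64) -> Duration {
    let fraction = if unit_random.is_finite() {
        unit_random.clamp(0.0, 1.0)
    } else {
        0.0
    };
    ceiling.mul_f64(fraction)
}

/// Runs `op` up to `max_attempts` times (at least once), sleeping between
/// attempts. Only errors accepted by `is_transient` are retried.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut unit_random: impl FnMut() -> f64,
    is_transient: impl Fn(&E) -> bool,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt: u32 = 0;
    loop {
        let err = match op() {
            Ok(value) => return Ok(value),
            Err(err) => err,
        };
        if attempt.saturating_add(1) >= max_attempts || !is_transient(&err) {
            return Err(err);
        }
        sleep(jittered(backoff_ceiling(attempt, base, max), unit_random()));
        attempt += 1;
    }
}
```

Retrying `enqueue` this way can deliver the same payload twice if the first attempt succeeded but its response was lost, which is why the idempotent message IDs in the list above matter for the retry path.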

---

## 2. Important for Production Readiness

Not strictly required for “operational” but expected for production deployments.

### 2.1 1:1 channel semantics (Phase 4)

**Current:** Delivery is per `(recipient_key, channel_id)`; channelId is used in enqueue/fetch. No formal per-channel authz or TTL.

**To implement:**

- **Per-channel authz:** Ensure fetch/fetchWait only return messages for channels the authenticated identity is allowed to read (e.g. identity bound to recipient_key or to a channel membership list).
- **TTL eviction:** Server already has message TTL (e.g. 7 days) and GC; document and optionally make TTL configurable per channel type.

### 2.2 Wire versioning and protocol hardening (Phase 2)

**Current:** Wire version is checked on enqueue/fetch (e.g. `CURRENT_WIRE_VERSION`). Ciphersuite allowlist and ALPN are partially in place.

**To implement:**

- **Ciphersuite allowlist:** Server rejects KeyPackages with unknown ciphersuites.
- **Downgrade guards:** Reject Commits with weaker ciphersuites once a group has advanced.
- **Connection draining:** Graceful QUIC `CONNECTION_CLOSE` on server shutdown.
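
As an illustration of the allowlist item, the check can be as small as the sketch below. It assumes the server has already parsed the 16-bit ciphersuite identifier out of the uploaded KeyPackage; `check_ciphersuite` is a hypothetical helper, and which suites belong on the list is a deployment decision this document does not specify.

```rust
/// MLS ciphersuite identifier 0x0001 as registered in RFC 9420.
pub const MLS_128_DHKEMX25519_AES128GCM_SHA256_ED25519: u16 = 0x0001;

/// Rejects any ciphersuite that is not explicitly allowed.
/// `offered` is assumed to come from an already-parsed KeyPackage.
pub fn check_ciphersuite(allowlist: &[u16], offered: u16) -> Result<(), String> {
    if allowlist.contains(&offered) {
        Ok(())
    } else {
        Err(format!("ciphersuite {offered:#06x} is not on the allowlist"))
    }
}
```

Rejecting at upload time keeps unsupported KeyPackages out of the store, so a later `invite` cannot fetch one that the inviting client would refuse.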

### 2.3 Production hardening (Phase 1 + 6)

- **CODEOWNERS:** Map crates to reviewers.
- **CI:** `cargo test --workspace`, `cargo clippy`, `cargo fmt --check`, `cargo audit`, optional `cargo deny`.
- **SBOM:** e.g. `cargo-cyclonedx` or `cargo-about` in CI.
- **Backup/restore:** SQLite/SQLCipher backup and integrity verification for server DB.
- **Rate limiting:** Already per-token; optionally add per-IP and per-account limits and document.

---

## 3. Roadmap and Documentation Updates

- **Milestones doc:** Mark M4 as **Complete** (CLI subcommands exist). Mark M6 as **Complete** (migrations + runner; server and client persistence in place). Leave M5 as **Next** and M7 as **Planned**.
- **README:** Update milestone table to reflect M4 and M6 complete; add one line on migrations (e.g. “Server supports SQL migrations under `quicnprotochat-server/migrations/`”).
- **Migration convention:** Document in README or a dev doc: add new migrations as `NNN_name.sql`, add to `MIGRATIONS` in `sql_store.rs`, bump `SCHEMA_VERSION`.
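
A dev doc could illustrate the `NNN_name.sql` convention with something like the following. The `Migration` struct, `validate`, and `pending` are hypothetical and do not reproduce the actual layout of `sql_store.rs`; the SQL bodies are placeholders, and how `SCHEMA_VERSION` maps onto migration numbers is not shown here.

```rust
/// One migration file registered in the list (hypothetical layout).
pub struct Migration {
    pub number: u32,
    pub file: &'static str,
    pub sql: &'static str,
}

/// Real code would likely embed the file contents, e.g. with `include_str!`.
pub const MIGRATIONS: &[Migration] = &[
    Migration { number: 1, file: "001_initial.sql", sql: "-- placeholder" },
    Migration { number: 2, file: "002_add_seq.sql", sql: "-- placeholder" },
];

/// Checks that entries are numbered 1, 2, 3, ... and named `NNN_name.sql`.
pub fn validate(migrations: &[Migration]) -> Result<(), String> {
    for (idx, m) in migrations.iter().enumerate() {
        let expected = idx as u32 + 1;
        if m.number != expected {
            return Err(format!(
                "{}: expected migration number {}, found {}",
                m.file, expected, m.number
            ));
        }
        let prefix = format!("{:03}_", m.number);
        if !m.file.starts_with(&prefix) || !m.file.ends_with(".sql") {
            return Err(format!("{}: expected a name like {}name.sql", m.file, prefix));
        }
    }
    Ok(())
}

/// Migrations that still have to run when `applied` have already been applied.
pub fn pending(migrations: &[Migration], applied: u32) -> Vec<&Migration> {
    migrations.iter().filter(|m| m.number > applied).collect()
}
```

Running `validate(MIGRATIONS)` in a unit test would catch a forgotten or misnumbered entry before the migration runner ever touches a database.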

---

## 4. Optional / Later

- **Post-quantum (M7):** Custom `OpenMlsCryptoProvider` with hybrid X25519 + ML-KEM-768 for MLS HPKE; all M3–M5 tests pass with PQ backend.
- **GUI completion:** Full flows (login, conversation list, chat view with send/recv, settings); long-lived connection and streaming recv.
- **WebTransport + WASM:** Browser client.
- **iroh / P2P:** NAT traversal and optional direct peer-to-peer delivery.

---

## Priority Order for “Fully Operational”

1. **M5 Multi-party groups** — Commit fan-out and client flow for N > 2.
2. **KeyPackage rotation** — Client upload before TTL.
3. **Observability** — Metrics + health + safe logging.
4. **Client resilience** — Retry, backoff, idempotent message IDs.
5. **Docs** — Update milestones and README (M4, M6, migrations).
6. **Production hardening** — CI, CODEOWNERS, SBOM, backup, rate-limit docs.

Once 1–5 are in place, the system can be considered **fully operational** for multi-user group chat with durable state and observable, resilient clients. Item 6 and the optional items bring it to **production-ready** and beyond.
@@ -14,10 +14,10 @@ for what that means in practice.
| M1 | QUIC/TLS Transport | **Complete** | QUIC + TLS 1.3 endpoint, length-prefixed framing, Ping/Pong |
| M2 | Authentication Service | **Complete** | Ed25519 identity, KeyPackage generation, AS upload/fetch |
| M3 | Delivery Service + MLS Groups | **Complete** | DS relay, GroupMember create/join/add/send/recv |
| M4 | Group CLI Subcommands | **Next** | Persistent CLI (create-group, invite, join, send, recv); `demo-group` already available |
| M5 | Multi-party Groups | Planned | N > 2 members, Commit fan-out, Proposal handling |
| M6 | Persistence | Planned | SQLite key store, durable group state |
| M7 | Post-quantum | Planned | PQ hybrid for MLS/HPKE (X25519 + ML-KEM-768) |
| M4 | Group CLI Subcommands | **Complete** | Persistent CLI (create-group, invite, join, send, recv), OPAQUE login |
| M5 | Multi-party Groups | **Complete** | N > 2 members, Commit fan-out, send --all, epoch sync |
| M6 | Persistence | **Complete** | SQLite/SQLCipher, migrations, durable server + client state |
| M7 | Post-quantum | **Next** | PQ hybrid for MLS/HPKE (X25519 + ML-KEM-768) |

---

@@ -103,63 +103,45 @@ group\_id lifecycle, MLS integration.

---

## M4 -- Group CLI Subcommands (Next)
## M4 -- Group CLI Subcommands (Complete)

**Goal:** Persistent, composable CLI subcommands for group operations, replacing
the monolithic `demo-group` proof-of-concept.

**Planned deliverables:**

- `create-group` -- creates a new MLS group, stores state locally
- `invite <identity>` -- adds a member by fetching their KeyPackage from the AS
- `join` -- processes a Welcome message and joins an existing group
- `send <message>` -- encrypts and enqueues an application message
- `recv` -- fetches and decrypts pending messages (or long-polls with `fetchWait`)

The `demo-group` subcommand remains available as a single-command demonstration
of the full flow.
**Deliverables:** `create-group`, `invite`, `join`, `send`, `recv`, `chat`;
OPAQUE `register-user` and `login`; `demo-group` remains for single-command demo.

---

## M5 -- Multi-party Groups (Planned)
## M5 -- Multi-party Groups (Complete)

**Goal:** Support groups with N > 2 members, including Commit fan-out and
Proposal handling.
epoch synchronisation.

**Planned deliverables:**

- Commit fan-out through the DS to all group members
- Proposal handling (Add, Remove, Update)
- Epoch synchronisation across N members
- Criterion benchmarks: key generation, encap/decap, group-add latency
  (10/100/1000 members)
**Deliverables:** Commit fan-out to existing members on invite; `send --all`;
`cmd_join` processes all queued payloads (Welcome + Commits); three-party E2E
passing. Proposal handling (Remove, Update) and Criterion benchmarks are
optional follow-ups.

---

## M6 -- Persistence (Planned)
## M6 -- Persistence (Complete)

**Goal:** Server survives restart. Client state persists across sessions.

**Planned deliverables:**

- `quicnprotochat-server`: SQLite via `sqlx` for AS key store and DS message log,
  `migrations/` directory
- `docker/Dockerfile`: multi-stage build (`rust:bookworm` builder, `debian:bookworm-slim` runtime)
- `docker-compose.yml`: server + SQLite volume, healthcheck
- Client reconnect with session resume (re-handshake + rejoin group epoch from
  DS log)

**Deliverables:** SQLite/SQLCipher via rusqlite, `migrations/` directory and
migration runner; client state file and DiskKeyStore (encrypted QPCE optional).
See [Future Research: SQLCipher](future-research.md#storage--persistence) for
encrypted-at-rest options.

---

## M7 -- Post-quantum (Planned)
## M7 -- Post-quantum (Next)

**Goal:** Replace the MLS crypto backend with a hybrid X25519 + ML-KEM-768 KEM,
providing post-quantum confidentiality for all group key material.

**Planned deliverables:**
**Deliverables:**

- Custom `OpenMlsCryptoProvider` with hybrid KEM in `quicnprotochat-core`
- Hybrid shared secret derivation: