feat: add post-quantum hybrid KEM + SQLCipher persistence

Feature 1 — Post-Quantum Hybrid KEM (X25519 + ML-KEM-768):
- Create hybrid_kem.rs with keygen, encrypt, decrypt + 11 unit tests
- Wire format: version(1) | x25519_eph_pk(32) | mlkem_ct(1088) | nonce(12) | ct
- Add uploadHybridKey/fetchHybridKey RPCs to node.capnp schema
- Server: hybrid key storage in FileBackedStore + RPC handlers
- Client: hybrid keypair in StoredState, auto-wrap/unwrap in send/recv/invite/join
- demo-group runs full hybrid PQ envelope round-trip

Feature 2 — SQLCipher Persistence:
- Extract Store trait from FileBackedStore API
- Create SqlStore (rusqlite + bundled-sqlcipher) with encrypted-at-rest SQLite
- Schema: key_packages, deliveries, hybrid_keys tables with indexes
- Server CLI: --store-backend=sql, --db-path, --db-key flags
- 5 unit tests for SqlStore (FIFO, round-trip, upsert, channel isolation)

Also includes: client lib.rs refactor, auth config, TOML config file support,
mdBook documentation, and various cleanups by user.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 08:07:48 +01:00
parent d1ddef4cea
commit f334ed3d43
81 changed files with 14502 additions and 2289 deletions

# Service Architecture
The quicnprotochat server exposes a single **NodeService** RPC endpoint that
combines Authentication and Delivery operations. This page documents the RPC
interface, per-connection lifecycle, storage model, long-polling mechanism, and
authentication context.
---
## NodeService Endpoint
A single QUIC + TLS 1.3 listener on **port 7000** serves all operations.
The schema is defined in `schemas/node.capnp` and documented in
[NodeService Schema](../wire-format/node-service-schema.md).
```text
NodeService (port 7000)
├── Authentication methods
│ ├── uploadKeyPackage(identityKey, package, auth) -> fingerprint
│ ├── fetchKeyPackage(identityKey, auth) -> package
│ ├── uploadHybridKey(identityKey, hybridPublicKey) -> ()
│ └── fetchHybridKey(identityKey) -> hybridPublicKey
├── Delivery methods
│ ├── enqueue(recipientKey, payload, channelId, version, auth) -> ()
│ ├── fetch(recipientKey, channelId, version, auth) -> payloads
│ └── fetchWait(recipientKey, channelId, version, timeoutMs, auth) -> payloads
└── Operational
└── health() -> status
```
---
## RPC Method Reference
### Authentication Service Methods
| Method | Params | Returns | Semantics |
|----------------------|-------------------------------------|------------------|-----------|
| `uploadKeyPackage` | `identityKey` (32 B Ed25519 pk), `package` (TLS-encoded KeyPackage), `auth` | `fingerprint` (SHA-256 of package) | Appends the KeyPackage to a per-identity FIFO queue. The fingerprint lets the client detect server-side tampering. Max package size: 1 MB. |
| `fetchKeyPackage` | `identityKey` (32 B), `auth` | `package` (or empty `Data`) | Atomically pops and returns the oldest KeyPackage for the identity. Returns empty bytes if none are stored. Single-use semantics per RFC 9420. |
| `uploadHybridKey` | `identityKey` (32 B), `hybridPublicKey` (X25519 pk + ML-KEM-768 ek) | `()` | Stores (or replaces) the hybrid PQ public key for envelope-level post-quantum encryption. |
| `fetchHybridKey` | `identityKey` (32 B) | `hybridPublicKey` (or empty `Data`) | Returns the stored hybrid public key for a peer, or empty if none. |
### Delivery Service Methods
| Method | Params | Returns | Semantics |
|--------------|------------------------------------------------------------------------|----------------------|-----------|
| `enqueue` | `recipientKey` (32 B), `payload` (opaque), `channelId`, `version`, `auth` | `()` | Appends `payload` to the recipient's FIFO queue. Max payload: 5 MB. Wakes any `fetchWait` waiter for this recipient. Supported versions: 0 (legacy), 1 (current). |
| `fetch` | `recipientKey` (32 B), `channelId`, `version`, `auth` | `payloads: List(Data)` | Atomically drains and returns the full queue in FIFO order. Returns empty list if nothing is pending. |
| `fetchWait` | `recipientKey` (32 B), `channelId`, `version`, `timeoutMs`, `auth` | `payloads: List(Data)` | Same as `fetch`, but if the queue is empty and `timeoutMs > 0`, blocks up to `timeoutMs` milliseconds waiting for a `Notify` signal from `enqueue`. Returns whatever is in the queue when the wait completes or times out. |
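The queue semantics above can be modelled with plain standard-library types. This is an illustrative sketch, not the server's actual code; the `ChannelKey` fields and method names merely mirror this documentation:

```rust
use std::collections::{HashMap, VecDeque};

// Illustrative model of the per-(channel, recipient) delivery queue;
// names follow the doc, not the real server source.
#[derive(Clone, PartialEq, Eq, Hash)]
struct ChannelKey {
    channel_id: Vec<u8>,
    recipient_key: Vec<u8>,
}

#[derive(Default)]
struct Deliveries {
    queues: HashMap<ChannelKey, VecDeque<Vec<u8>>>,
}

impl Deliveries {
    // enqueue: append the payload to the recipient's FIFO queue.
    fn enqueue(&mut self, key: ChannelKey, payload: Vec<u8>) {
        self.queues.entry(key).or_default().push_back(payload);
    }

    // fetch: atomically drain and return the whole queue in FIFO order.
    fn fetch(&mut self, key: &ChannelKey) -> Vec<Vec<u8>> {
        self.queues
            .remove(key)
            .map(|q| q.into_iter().collect())
            .unwrap_or_default()
    }
}

fn main() {
    let mut store = Deliveries::default();
    let key = ChannelKey {
        channel_id: b"room".to_vec(),
        recipient_key: vec![0u8; 32],
    };
    store.enqueue(key.clone(), b"first".to_vec());
    store.enqueue(key.clone(), b"second".to_vec());
    // Drained in FIFO order; a second fetch finds nothing pending.
    assert_eq!(store.fetch(&key), vec![b"first".to_vec(), b"second".to_vec()]);
    assert!(store.fetch(&key).is_empty());
    println!("ok");
}
```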
### Operational Methods
| Method | Params | Returns | Semantics |
|----------|--------|-----------------|-----------|
| `health` | none | `status: Text` | Returns `"ok"`. Used for liveness/readiness probes. |
---
## Per-Connection Lifecycle
Each incoming QUIC connection follows this sequence:
```text
┌──────────────────────────────────────────────────────────────────────┐
│ Client Server │
│ │
│ 1. UDP packet -> │
│ QUIC INITIAL │
│ │
│ 2. <- QUIC HANDSHAKE │
│ TLS 1.3 ServerHello + │
│ Certificate (self-signed) │
│ ALPN: "capnp" │
│ │
│ 3. Client verifies server │
│ cert against pinned CA │
│ cert (--ca-cert flag) │
│ │
│ 4. QUIC connection established │
│ │
│ 5. Client opens bidirectional ──────────> Server accepts bi stream │
│ QUIC stream (open_bi) (accept_bi) │
│ │
│ 6. tokio_util::compat adapters wrap the send/recv halves │
│ into AsyncRead + AsyncWrite │
│ │
│ 7. capnp-rpc twoparty::VatNetwork │
│ Client Side::Client Server Side::Server │
│ │
│ 8. RpcSystem::new() starts │
│ promise-pipelined RPC loop │
│ │
│ 9. Client bootstraps │
│ node_service::Client NodeServiceImpl created │
│ (shares Arc<FileBackedStore>, │
│ Arc<DashMap<..., Notify>>) │
│ │
│ 10. RPC calls flow over the bidirectional stream │
│ until either side closes the connection. │
└──────────────────────────────────────────────────────────────────────┘
```
### LocalSet requirement
`capnp-rpc` uses `Rc<RefCell<...>>` internally, so its RPC objects and futures are `!Send`. Therefore:
- The server runs the entire accept loop inside a `tokio::task::LocalSet`.
- Each connection handler is `spawn_local`, ensuring all RPC futures stay on a
single thread.
- The client wraps each subcommand invocation in its own `LocalSet::run_until`.
This is a fundamental constraint of the Cap'n Proto RPC runtime in Rust.
Attempts to spawn RPC futures on the multi-threaded Tokio executor will fail
with a compile error.
---
## Storage Model
`NodeServiceImpl` holds two pieces of shared state:
### FileBackedStore
```text
FileBackedStore
├── key_packages: Mutex<HashMap<Vec<u8>, VecDeque<Vec<u8>>>>
│ Key: Ed25519 public key (32 bytes)
│ Value: FIFO queue of TLS-encoded KeyPackage blobs
│ File: data/keypackages.bin (bincode)
├── deliveries: Mutex<HashMap<ChannelKey, VecDeque<Vec<u8>>>>
│ ChannelKey: { channel_id: Vec<u8>, recipient_key: Vec<u8> }
│ Value: FIFO queue of opaque payload blobs
│ File: data/deliveries.bin (bincode, v2 format)
└── hybrid_keys: Mutex<HashMap<Vec<u8>, Vec<u8>>>
Key: Ed25519 public key (32 bytes)
Value: serialised HybridPublicKey blob
File: data/hybridkeys.bin (bincode)
```
Every mutation (upload, fetch, enqueue) acquires the relevant `Mutex`, modifies
the in-memory `HashMap`, and then flushes the entire map to disk as a bincode
blob. This is intentionally simple for MVP-scale workloads. A production
deployment would replace this with an embedded database or external store.
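The lock-mutate-flush pattern can be sketched with the standard library alone. A naive length-prefixed encoding stands in for bincode here, and all names are illustrative:

```rust
use std::collections::HashMap;
use std::io::Write;
use std::sync::Mutex;

// Sketch of the lock -> mutate -> flush-whole-map pattern.
// Length-prefixed bytes stand in for bincode; not the real store.
struct FileBackedMap {
    inner: Mutex<HashMap<Vec<u8>, Vec<u8>>>,
    path: std::path::PathBuf,
}

impl FileBackedMap {
    fn insert(&self, key: Vec<u8>, value: Vec<u8>) -> std::io::Result<()> {
        let mut map = self.inner.lock().unwrap(); // 1. acquire the Mutex
        map.insert(key, value);                   // 2. mutate in memory
        self.flush(&map)                          // 3. rewrite the whole file
    }

    fn flush(&self, map: &HashMap<Vec<u8>, Vec<u8>>) -> std::io::Result<()> {
        let mut buf = Vec::new();
        for (k, v) in map {
            buf.extend_from_slice(&(k.len() as u64).to_le_bytes());
            buf.extend_from_slice(k);
            buf.extend_from_slice(&(v.len() as u64).to_le_bytes());
            buf.extend_from_slice(v);
        }
        std::fs::File::create(&self.path)?.write_all(&buf)
    }
}

fn main() -> std::io::Result<()> {
    let store = FileBackedMap {
        inner: Mutex::new(HashMap::new()),
        path: std::env::temp_dir().join("demo-map.bin"),
    };
    store.insert(b"alice".to_vec(), b"pk".to_vec())?;
    // Every mutation rewrote the file: O(n) per write, fine at MVP scale.
    assert!(store.path.exists());
    Ok(())
}
```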
The delivery map supports a **v1 -> v2 upgrade path**: if `deliveries.bin`
contains the legacy `QueueMapV1` format (keyed by `recipientKey` only), the
store transparently upgrades entries by wrapping them in `ChannelKey` with an
empty `channel_id`.
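The upgrade step can be sketched as a pure map transformation. The type names below mirror the documentation but are illustrative, not the store's actual code:

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct ChannelKey {
    channel_id: Vec<u8>,
    recipient_key: Vec<u8>,
}

type QueueMapV1 = HashMap<Vec<u8>, VecDeque<Vec<u8>>>;    // keyed by recipient only
type QueueMapV2 = HashMap<ChannelKey, VecDeque<Vec<u8>>>; // keyed by (channel, recipient)

// Wrap each legacy entry in a ChannelKey with an empty channel_id.
fn upgrade_v1_to_v2(v1: QueueMapV1) -> QueueMapV2 {
    v1.into_iter()
        .map(|(recipient_key, queue)| {
            (ChannelKey { channel_id: Vec::new(), recipient_key }, queue)
        })
        .collect()
}

fn main() {
    let mut v1 = QueueMapV1::new();
    v1.insert(vec![7u8; 32], VecDeque::from(vec![b"msg".to_vec()]));
    let v2 = upgrade_v1_to_v2(v1);
    let key = ChannelKey { channel_id: Vec::new(), recipient_key: vec![7u8; 32] };
    // Queued payloads survive the upgrade unchanged.
    assert_eq!(v2[&key], VecDeque::from(vec![b"msg".to_vec()]));
}
```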
### DashMap Waiters
```text
Arc<DashMap<Vec<u8>, Arc<Notify>>>
Key: recipient Ed25519 public key (32 bytes)
Value: tokio::sync::Notify instance
```
The waiters map is orthogonal to `FileBackedStore`. It lives entirely in
memory and serves the `fetchWait` long-polling mechanism:
1. `enqueue` calls `waiter(&recipient_key).notify_waiters()` after storing the
payload.
2. `fetchWait` first tries a regular `fetch`. If the queue is empty and
`timeoutMs > 0`:
- Look up or insert a `Notify` for the recipient.
- `tokio::time::timeout(Duration::from_millis(timeoutMs), notify.notified())`
- When notified (or on timeout), perform a second `fetch` and return
whatever is available.
This design avoids busy-polling while keeping lock contention low: `DashMap`
shards its entries across multiple `RwLock`s internally, so waiters for
different recipients rarely touch the same lock.
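The same wait-then-drain shape can be shown with only the standard library, substituting `Condvar::wait_timeout_while` for `tokio::sync::Notify`. This is an analogy, not the server's implementation:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// std-only analogue of the fetchWait long poll. The server pairs
// DashMap with tokio::sync::Notify; a Condvar shows the same shape.
fn long_poll(timeout_ms: u64) -> Vec<Vec<u8>> {
    let state = Arc::new((Mutex::new(VecDeque::<Vec<u8>>::new()), Condvar::new()));

    // "enqueue": another thread stores a payload, then wakes waiters.
    let producer = Arc::clone(&state);
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        producer.0.lock().unwrap().push_back(b"hello".to_vec());
        producer.1.notify_all();
    });

    // "fetchWait": block up to timeout_ms for a signal (the predicate
    // guards against spurious wakeups), then drain whatever is queued.
    let (lock, cvar) = &*state;
    let guard = lock.lock().unwrap();
    let mut guard = cvar
        .wait_timeout_while(guard, Duration::from_millis(timeout_ms), |q| q.is_empty())
        .unwrap()
        .0;
    guard.drain(..).collect()
}

fn main() {
    // The waiter is woken by the producer well before the 2 s timeout.
    assert_eq!(long_poll(2000), vec![b"hello".to_vec()]);
    println!("ok");
}
```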
---
## Auth Struct
Every RPC method that modifies or reads user-specific state accepts an `Auth`
parameter:
```capnp
struct Auth {
version @0 :UInt16; # 0 = legacy/none, 1 = token-based auth
accessToken @1 :Data; # opaque bearer token
deviceId @2 :Data; # optional UUID for auditing/rate limiting
}
```
### Version semantics
| Version | Meaning |
|---------|------------------------------------------------------------|
| 0 | Legacy / no authentication. The server accepts the request without checking credentials. Suitable for development and testing. |
| 1 | Token-based authentication. The `accessToken` field should contain an opaque bearer token issued at login. The server validates the token against a token store (not yet implemented -- see [Auth, Devices, and Tokens](../roadmap/authz-plan.md)). |
The server validates the `version` field on every request via `validate_auth()`.
Requests with unsupported versions are rejected with a Cap'n Proto error.
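A minimal gate in the spirit of `validate_auth()` might look like this; the signature and error text are illustrative, not the server's:

```rust
// Sketch of an auth-version gate; version meanings come from the
// table above, everything else here is illustrative.
const SUPPORTED_AUTH_VERSIONS: &[u16] = &[0, 1];

fn validate_auth(version: u16) -> Result<(), String> {
    if SUPPORTED_AUTH_VERSIONS.contains(&version) {
        Ok(())
    } else {
        Err(format!("unsupported auth version: {}", version))
    }
}

fn main() {
    assert!(validate_auth(0).is_ok());  // legacy / no auth
    assert!(validate_auth(1).is_ok());  // token-based auth
    assert!(validate_auth(2).is_err()); // rejected with an error
    println!("ok");
}
```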
### Client-side usage
The client CLI accepts `--access-token` and `--device-id` flags (or the
corresponding environment variables). These are bundled into a `ClientAuth`
struct and injected into every outgoing RPC call via the `set_auth()` helper.
Currently, the client sends `version = 0` with empty token and device ID by
default. When the token-based auth flow is implemented, the client will populate
these fields.
---
## Validation and Limits
The server enforces the following constraints on every RPC call:
| Constraint | Value | Error on violation |
|-----------------------------|--------------------|--------------------|
| `identityKey` / `recipientKey` length | Exactly 32 bytes | Cap'n Proto error: "must be exactly 32 bytes" |
| KeyPackage size | <= 1 MB | Cap'n Proto error: "package exceeds max size" |
| Payload size | <= 5 MB | Cap'n Proto error: "payload exceeds max size" |
| Wire version | 0 or 1 | Cap'n Proto error: "unsupported wire version" |
| Auth version | 0 or 1 | Cap'n Proto error: "unsupported auth version" |
| KeyPackage non-empty | `package.len() > 0`| Cap'n Proto error: "package must not be empty" |
| Payload non-empty | `payload.len() > 0`| Cap'n Proto error: "payload must not be empty" |
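A sketch of these checks for `enqueue`: the limits come from the table, but the function itself is illustrative, not the server's code:

```rust
// Illustrative request validation mirroring the constraints table;
// error strings echo the doc, the function is a sketch.
const MAX_PAYLOAD: usize = 5 * 1024 * 1024; // 5 MB

fn validate_enqueue(recipient_key: &[u8], payload: &[u8]) -> Result<(), &'static str> {
    if recipient_key.len() != 32 {
        return Err("must be exactly 32 bytes");
    }
    if payload.is_empty() {
        return Err("payload must not be empty");
    }
    if payload.len() > MAX_PAYLOAD {
        return Err("payload exceeds max size");
    }
    Ok(())
}

fn main() {
    assert!(validate_enqueue(&[0u8; 32], b"hi").is_ok());
    assert_eq!(validate_enqueue(&[0u8; 31], b"hi"), Err("must be exactly 32 bytes"));
    assert_eq!(validate_enqueue(&[0u8; 32], b""), Err("payload must not be empty"));
    println!("ok");
}
```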
---
## Configuration
The server binary is configured via CLI flags or environment variables:
| Flag | Env var | Default | Description |
|----------------|----------------------------|----------------------|-------------|
| `--listen` | `QUICNPROTOCHAT_LISTEN` | `0.0.0.0:7000` | QUIC listen address (host:port). |
| `--data-dir` | `QUICNPROTOCHAT_DATA_DIR` | `data` | Directory for persisted KeyPackages, delivery queues, and hybrid keys. |
| `--tls-cert` | `QUICNPROTOCHAT_TLS_CERT` | `data/server-cert.der` | Path to TLS certificate (DER). Auto-generated if missing. |
| `--tls-key` | `QUICNPROTOCHAT_TLS_KEY` | `data/server-key.der` | Path to TLS private key (DER). Auto-generated if missing. |
If the TLS certificate or key files do not exist at startup, the server
auto-generates a self-signed certificate for `localhost`, `127.0.0.1`, and
`::1` using `rcgen`.
Logging level is controlled by the `RUST_LOG` environment variable (default:
`info`).
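The flag / env var / default precedence can be sketched as follows; this helper is illustrative and not the server's CLI parser:

```rust
use std::env;

// Illustrative resolution order: explicit CLI flag, then environment
// variable, then the built-in default.
fn resolve(cli_flag: Option<&str>, env_var: &str, default: &str) -> String {
    cli_flag
        .map(str::to_owned)
        .or_else(|| env::var(env_var).ok())
        .unwrap_or_else(|| default.to_owned())
}

fn main() {
    // No flag given: fall back to the env var, then to the default.
    let listen = resolve(None, "QUICNPROTOCHAT_LISTEN", "0.0.0.0:7000");
    println!("listen = {}", listen);

    // An explicit flag always wins over env var and default.
    assert_eq!(
        resolve(Some("[::1]:7000"), "QUICNPROTOCHAT_LISTEN", "0.0.0.0:7000"),
        "[::1]:7000"
    );
}
```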
---
## Further Reading
- [Architecture Overview](overview.md) -- two-service model and dual-key overview
- [NodeService Schema](../wire-format/node-service-schema.md) -- full Cap'n Proto schema
- [End-to-End Data Flow](data-flow.md) -- sequence diagrams showing registration, group creation, and messaging
- [Delivery Service Internals](../internals/delivery-service.md) -- queue routing and channel-aware delivery
- [Authentication Service Internals](../internals/authentication-service.md) -- KeyPackage lifecycle
- [Storage Backend](../internals/storage-backend.md) -- FileBackedStore details and upgrade path
- [Auth, Devices, and Tokens](../roadmap/authz-plan.md) -- planned token-based authentication