feat: add post-quantum hybrid KEM + SQLCipher persistence

Feature 1: Post-Quantum Hybrid KEM (X25519 + ML-KEM-768)
- Create hybrid_kem.rs with keygen, encrypt, decrypt + 11 unit tests
- Wire format: version(1) | x25519_eph_pk(32) | mlkem_ct(1088) | nonce(12) | ct
- Add uploadHybridKey/fetchHybridKey RPCs to node.capnp schema
- Server: hybrid key storage in FileBackedStore + RPC handlers
- Client: hybrid keypair in StoredState, auto-wrap/unwrap in send/recv/invite/join
- demo-group runs full hybrid PQ envelope round-trip

Feature 2: SQLCipher Persistence
- Extract Store trait from FileBackedStore API
- Create SqlStore (rusqlite + bundled-sqlcipher) with encrypted-at-rest SQLite
- Schema: key_packages, deliveries, hybrid_keys tables with indexes
- Server CLI: --store-backend=sql, --db-path, --db-key flags
- 5 unit tests for SqlStore (FIFO, round-trip, upsert, channel isolation)

Also includes: client lib.rs refactor, auth config, TOML config file support, mdBook documentation, and various cleanups by user.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 08:07:48 +01:00
parent d1ddef4cea
commit f334ed3d43
81 changed files with 14502 additions and 2289 deletions
--- a/docs/src/protocol-layers/capn-proto.md
+++ b/docs/src/protocol-layers/capn-proto.md
@@ -0,0 +1,278 @@
+# Cap'n Proto Serialisation and RPC
+
+quicnprotochat uses [Cap'n Proto](https://capnproto.org/) for both message serialisation and remote procedure calls. The serialisation layer encodes structured messages (Envelopes, Auth tokens, delivery payloads) into a compact binary format. The RPC layer provides the client-server interface for the Authentication Service, Delivery Service, and health checks -- all exposed through a single `NodeService` interface.
+
+This page covers why Cap'n Proto was chosen, how schemas are compiled, the owned `ParsedEnvelope` type, serialisation helpers, and ALPN integration with QUIC.
+
+## Why Cap'n Proto
+
+Several serialisation formats were considered. The table below summarises the trade-offs:
+
+| Format | Zero-copy reads | Schema enforcement | Built-in RPC | Canonical bytes for signing |
+|---|---|---|---|---|
+| **Cap'n Proto** | Yes | Yes (`.capnp` schemas) | Yes (`capnp-rpc`) | Yes (canonical serialisation mode) |
+| Protocol Buffers | No (requires deserialisation) | Yes (`.proto` schemas) | Yes (`tonic`/gRPC) | No (serialisation is not guaranteed to be canonical) |
+| MessagePack | No | No (untyped) | No | No |
+| FlatBuffers | Yes | Yes (`.fbs` schemas) | No built-in RPC | Partial |
+
+Cap'n Proto was selected for the following reasons:
+
+1. **Zero-copy reads**: Cap'n Proto messages can be read directly from the wire buffer without deserialisation. The `Reader` type is a thin pointer into the original bytes. This eliminates allocation and copying on the hot path (message routing in the Delivery Service).
+
+2. **Schema-enforced types**: All messages are defined in `.capnp` schema files. The compiler (`capnpc`) generates type-safe Rust code that prevents mismatched field types at compile time. This is especially valuable for a security-sensitive protocol where a type confusion bug could be exploitable.
+
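+3. **Canonical serialisation**: Cap'n Proto defines a canonical form that gives a deterministic byte representation of a message. This matters whenever a signature must cover exactly the bytes the verifier will see. MLS Commits and KeyPackages are a separate case: openmls signs them over their own TLS-encoded bytes, and they travel inside Cap'n Proto as opaque `Data`, so their signatures do not depend on Cap'n Proto's encoding.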
+
+4. **Built-in async RPC**: The `capnp-rpc` crate provides a capability-based RPC system with promise pipelining. quicnprotochat uses it for the `NodeService` interface (KeyPackage upload/fetch, message enqueue/fetch, health checks, hybrid key operations). This avoids the need to hand-roll a request/response protocol.
+
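+5. **Compact wire format**: Cap'n Proto's wire format is much more compact than JSON or XML. Unpacked messages are usually somewhat larger than Protocol Buffers because fields are fixed-width and word-aligned. The packed encoding closes most of that gap. In either form, no decode step is needed.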
+
+## Schema compilation flow
+
+Cap'n Proto schemas live in the workspace-root `schemas/` directory:
+
+```text
+schemas/
+  envelope.capnp    -- Top-level wire message (MsgType enum + payload)
+  auth.capnp        -- AuthenticationService RPC interface (legacy, pre-M3)
+  delivery.capnp    -- DeliveryService RPC interface (legacy, pre-M3)
+  node.capnp        -- Unified NodeService RPC interface (M3+)
+```
+
+### build.rs
+
+The `quicnprotochat-proto` crate compiles these schemas at build time via `build.rs`:
+
+```rust
+capnpc::CompilerCommand::new()
+    .src_prefix(&schemas_dir)
+    .file(schemas_dir.join("envelope.capnp"))
+    .file(schemas_dir.join("auth.capnp"))
+    .file(schemas_dir.join("delivery.capnp"))
+    .file(schemas_dir.join("node.capnp"))
+    .run()
+    .expect("Cap'n Proto schema compilation failed.");
+```
+
+Key details:
+
+- **`src_prefix`**: Set to `schemas/` so that inter-schema imports resolve correctly.
+- **Output location**: Generated Rust source is written to `$OUT_DIR` (Cargo's build directory). The filenames follow the convention `{schema_name}_capnp.rs`.
+- **Rerun triggers**: `cargo:rerun-if-changed` directives ensure the build script re-runs whenever any `.capnp` file changes.
+- **Prerequisite**: The `capnp` CLI binary must be installed on the build machine (`apt-get install capnproto` or `brew install capnp`).
+
+### Generated module inclusion
+
+The generated code is spliced into the `quicnprotochat-proto` crate via `include!` macros:
+
+```rust
+pub mod envelope_capnp {
+    include!(concat!(env!("OUT_DIR"), "/envelope_capnp.rs"));
+}
+pub mod auth_capnp {
+    include!(concat!(env!("OUT_DIR"), "/auth_capnp.rs"));
+}
+pub mod delivery_capnp {
+    include!(concat!(env!("OUT_DIR"), "/delivery_capnp.rs"));
+}
+pub mod node_capnp {
+    include!(concat!(env!("OUT_DIR"), "/node_capnp.rs"));
+}
+```
+
+Consumers import types from these modules. For example, `node_capnp::node_service::Server` is the trait that the server implements.
+
+## The Envelope schema
+
+The `Envelope` is the top-level wire message for all quicnprotochat traffic. Every frame exchanged between peers (whether over Noise or QUIC) is serialised as an Envelope:
+
+```capnp
+struct Envelope {
+  msgType     @0 :MsgType;
+  groupId     @1 :Data;     # 32-byte SHA-256 digest of group name
+  senderId    @2 :Data;     # 32-byte SHA-256 digest of Ed25519 pubkey
+  payload     @3 :Data;     # Opaque payload (MLS blob or control data)
+  timestampMs @4 :UInt64;   # Unix epoch milliseconds
+
+  enum MsgType {
+    ping               @0;
+    pong               @1;
+    keyPackageUpload   @2;
+    keyPackageFetch    @3;
+    keyPackageResponse @4;
+    mlsWelcome         @5;
+    mlsCommit          @6;
+    mlsApplication     @7;
+    error              @8;
+  }
+}
+```
+
+The Delivery Service routes by `(groupId, msgType)` without inspecting `payload`. This design keeps the DS MLS-unaware -- see [ADR-004: MLS-Unaware Delivery Service](../design-rationale/adr-004-mls-unaware-ds.md).
+
+## The `ParsedEnvelope` owned type
+
+Cap'n Proto readers (`envelope_capnp::envelope::Reader`) borrow from the original byte buffer and cannot be sent across async task boundaries (`!Send`). This is a fundamental limitation of zero-copy reads.
+
+To bridge this gap, `quicnprotochat-proto` defines `ParsedEnvelope`:
+
+```rust
+pub struct ParsedEnvelope {
+    pub msg_type: MsgType,
+    pub group_id: Vec<u8>,
+    pub sender_id: Vec<u8>,
+    pub payload: Vec<u8>,
+    pub timestamp_ms: u64,
+}
+```
+
+`ParsedEnvelope` eagerly copies all byte fields out of the Cap'n Proto reader, making the type `Send + 'static`. This allows it to cross Tokio task boundaries, be stored in queues, and be passed through channels.
+
+The trade-off is clear: `ParsedEnvelope` allocates and copies, defeating the zero-copy benefit. This is acceptable because:
+
+1. The copying happens once per message at the protocol boundary.
+2. Application-layer code (MLS encryption/decryption, routing) needs owned data anyway.
+3. The performance-critical path (Delivery Service routing) works with opaque `Vec<u8>` payloads, not parsed Cap'n Proto readers.
+
+### Invariants
+
+- `group_id` and `sender_id` are either empty (for control messages like Ping/Pong) or exactly 32 bytes (SHA-256 digest).
+- `payload` is empty for Ping and Pong; non-empty for all MLS variants.
+
+## Serialisation helpers
+
+Two functions handle the conversion between `ParsedEnvelope` and wire bytes:
+
+### `build_envelope`
+
+```rust
+pub fn build_envelope(env: &ParsedEnvelope) -> Result<Vec<u8>, capnp::Error>
+```
+
+Serialises a `ParsedEnvelope` to unpacked Cap'n Proto wire bytes. The output includes the Cap'n Proto segment table header followed by the message data. These bytes are suitable as the body of a length-prefixed frame (the `LengthPrefixedCodec` in `quicnprotochat-core` prepends the 4-byte length) or as a payload within a QUIC stream.
+
+Internally, it builds a `capnp::message::Builder`, populates an `Envelope` root, and serialises via `capnp::serialize::write_message`.
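+
+The "segment table header" follows Cap'n Proto's standard stream framing, which `capnp::serialize::write_message` emits. The layout is:
+
+- a little-endian `u32` holding the segment count minus one;
+- one little-endian `u32` per segment, giving its size in 8-byte words;
+- padding to an 8-byte boundary;
+- the segments themselves.
+
+The sketch below parses that header using only `std`. `parse_segment_table` is a hypothetical illustration, not a function in `quicnprotochat-proto` or the `capnp` crate.
+
+```rust
+/// Parse the segment table at the start of an unpacked Cap'n Proto stream message.
+/// Returns (segment sizes in 8-byte words, byte offset where segment data begins),
+/// or None if the buffer is too short for the header or for the segments it declares.
+fn parse_segment_table(bytes: &[u8]) -> Option<(Vec<u32>, usize)> {
+    let read_u32 = |offset: usize| -> Option<u32> {
+        let b = bytes.get(offset..offset + 4)?;
+        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
+    };
+
+    // First word: number of segments minus one.
+    let segment_count = (read_u32(0)? as usize).checked_add(1)?;
+
+    // Table = 4-byte count + 4 bytes per segment; reject before allocating.
+    let table_end = 4usize.checked_add(segment_count.checked_mul(4)?)?;
+    if table_end > bytes.len() {
+        return None;
+    }
+    // Segment data starts at the next 8-byte boundary.
+    let header_len = (table_end + 7) & !7;
+
+    let sizes: Vec<u32> = (0..segment_count)
+        .map(|i| read_u32(4 + 4 * i))
+        .collect::<Option<_>>()?;
+
+    // All declared segments must fit in the buffer.
+    let body_bytes: usize = sizes.iter().map(|&w| w as usize * 8).sum();
+    if bytes.len() < header_len.checked_add(body_bytes)? {
+        return None;
+    }
+    Some((sizes, header_len))
+}
+```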
+
+### `parse_envelope`
+
+```rust
+pub fn parse_envelope(bytes: &[u8]) -> Result<ParsedEnvelope, capnp::Error>
+```
+
+Deserialises unpacked Cap'n Proto wire bytes into a `ParsedEnvelope`. All data is copied out of the reader before returning, so the input slice is not retained.
+
+It returns `capnp::Error` if:
+- The bytes are not valid Cap'n Proto wire format.
+- The `msgType` discriminant is not present in the current schema (forward-compatibility guard).
+
+### Low-level helpers
+
+Two additional functions provide raw byte-to-message conversions:
+
+```rust
+pub fn to_bytes<A: Allocator>(msg: &Builder<A>) -> Result<Vec<u8>, capnp::Error>
+pub fn from_bytes(bytes: &[u8]) -> Result<Reader<OwnedSegments>, capnp::Error>
+```
+
+`from_bytes` uses `ReaderOptions::new()` with the `capnp` crate's default limits:
+- **Traversal limit**: 8 * 1024 * 1024 words (64 MiB)
+- **Nesting limit**: 64 levels
+
+These defaults are reasonable for trusted data. For untrusted data from the network, callers should consider tightening both limits to prevent denial-of-service. `traversal_limit_in_words` bounds the total amount of data read, which guards against oversized messages. `nesting_limit` bounds pointer depth, which guards against deeply nested messages. The server enforces its own size limits: 5 MB per payload (`MAX_PAYLOAD_BYTES`) and 1 MB per KeyPackage (`MAX_KEYPACKAGE_BYTES`).
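+
+A tighter traversal limit can be sized from the largest message the server is willing to accept. `traversal_limit_words` is a hypothetical helper. The resulting word count would be passed to `ReaderOptions::traversal_limit_in_words`, whose exact signature differs between `capnp` crate versions, so check the version in use.
+
+```rust
+/// Hypothetical helper: convert a maximum accepted message size (bytes) into a
+/// Cap'n Proto traversal limit (8-byte words), plus slack for headers and pointers.
+fn traversal_limit_words(max_message_bytes: usize, slack_words: usize) -> usize {
+    // Round up to whole words so a message of exactly `max_message_bytes` still fits.
+    max_message_bytes.div_ceil(8) + slack_words
+}
+```
+
+Cap'n Proto counts words each time they are traversed, so data read more than once counts more than once. The slack should allow for that.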
+
+## The NodeService RPC interface
+
+The M3 unified RPC interface is defined in `schemas/node.capnp`:
+
+```capnp
+interface NodeService {
+  uploadKeyPackage @0 (identityKey :Data, package :Data, auth :Auth)
+                      -> (fingerprint :Data);
+  fetchKeyPackage  @1 (identityKey :Data, auth :Auth) -> (package :Data);
+  enqueue          @2 (recipientKey :Data, payload :Data,
+                       channelId :Data, version :UInt16, auth :Auth) -> ();
+  fetch            @3 (recipientKey :Data, channelId :Data,
+                       version :UInt16, auth :Auth) -> (payloads :List(Data));
+  fetchWait        @4 (recipientKey :Data, channelId :Data,
+                       version :UInt16, timeoutMs :UInt64, auth :Auth)
+                      -> (payloads :List(Data));
+  health           @5 () -> (status :Text);
+  uploadHybridKey  @6 (identityKey :Data, hybridPublicKey :Data) -> ();
+  fetchHybridKey   @7 (identityKey :Data) -> (hybridPublicKey :Data);
+}
+```
+
+This combines Authentication Service operations (`uploadKeyPackage`, `fetchKeyPackage`), Delivery Service operations (`enqueue`, `fetch`, `fetchWait`), health monitoring (`health`), and hybrid key management (`uploadHybridKey`, `fetchHybridKey`) into a single RPC interface.
+
+### Auth context
+
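+The KeyPackage and delivery methods (`uploadKeyPackage`, `fetchKeyPackage`, `enqueue`, `fetch`, `fetchWait`) accept an `Auth` struct. `health` and the hybrid key methods do not: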
+
+```capnp
+struct Auth {
+  version     @0 :UInt16;   # 0 = legacy/none, 1 = token-based auth
+  accessToken @1 :Data;     # opaque bearer token
+  deviceId    @2 :Data;     # optional UUID bytes for auditing
+}
+```
+
+The server validates the `version` field and rejects unknown versions. Token validation is planned for a future milestone. See [Auth, Devices, and Tokens](../roadmap/authz-plan.md).
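+
+The following is a minimal sketch of the version check, assuming the server maps the documented values onto an internal enum. `AuthVersion` and `check_auth_version` are hypothetical names. The access token is deliberately left unchecked, consistent with token validation still being planned.
+
+```rust
+/// Parsed form of the `Auth.version` field (values from the schema comment).
+#[derive(Debug, PartialEq, Eq)]
+enum AuthVersion {
+    /// 0 = legacy / no authentication.
+    Legacy,
+    /// 1 = token-based auth; the token itself is not validated in this sketch.
+    Token,
+}
+
+/// Hypothetical server-side check: known versions are accepted, anything else is rejected.
+fn check_auth_version(version: u16) -> Result<AuthVersion, String> {
+    match version {
+        0 => Ok(AuthVersion::Legacy),
+        1 => Ok(AuthVersion::Token),
+        other => Err(format!("unsupported auth version {other}")),
+    }
+}
+```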
+
+## ALPN integration
+
+Cap'n Proto RPC rides directly on the QUIC bidirectional stream. The ALPN (Application-Layer Protocol Negotiation) extension in the TLS handshake identifies the protocol:
+
+```rust
+tls.alpn_protocols = vec![b"capnp".to_vec()];
+```
+
+Both client and server set the ALPN to `b"capnp"`. If the client and server disagree on the ALPN, the TLS handshake fails before any application data is exchanged.
+
+On the QUIC path, the flow is:
+
+```text
+Client                              Server
+  |                                   |
+  |── QUIC handshake (TLS 1.3) ──────►|   ALPN: "capnp"
+  |                                   |
+  |── open_bi() ─────────────────────►|   Bidirectional QUIC stream
+  |                                   |
+  |◄─────── capnp-rpc messages ──────►|   VatNetwork reads/writes on the stream
+```
+
+The `tokio-util` compat layer converts Quinn stream types into `futures::AsyncRead + AsyncWrite`, which `capnp-rpc`'s `VatNetwork` expects. See [QUIC + TLS 1.3](quic-tls.md) for the full connection setup.
+
+On the legacy Noise path, the `into_capnp_io()` bridge serves the same purpose -- converting a Noise-encrypted TCP connection into a byte stream for `VatNetwork`. See [Noise\_XX Handshake](noise-xx.md) for details.
+
+## Comparison with alternatives
+
+### vs Protocol Buffers + gRPC
+
+Protocol Buffers require a full deserialisation step to access any field. Cap'n Proto avoids this with zero-copy readers. gRPC also brings HTTP/2 framing, or HTTP/3 when run over QUIC. That is an extra protocol layer between the application and the transport. Cap'n Proto RPC is leaner and maps naturally onto a single QUIC stream.
+
+### vs MessagePack
+
+MessagePack is untyped -- there is no schema file, and type errors are caught at runtime. This is unacceptable for a security protocol where a misinterpreted field could be exploitable. MessagePack also has no RPC framework, requiring a hand-rolled request/response protocol.
+
+### vs FlatBuffers
+
+FlatBuffers supports zero-copy reads (like Cap'n Proto) but lacks a built-in RPC framework. The ecosystem and tooling are also less mature for Rust.
+
+## Design constraints of `quicnprotochat-proto`
+
+The `quicnprotochat-proto` crate enforces three design constraints:
+
+1. **No crypto**: Key material never enters this crate. All encryption and signing happens in `quicnprotochat-core`.
+2. **No I/O**: Callers own the transport. This crate only converts between bytes and types.
+3. **No async**: Pure synchronous data-layer code. Async is the caller's responsibility.
+
+These constraints keep the serialisation layer thin and auditable.
+
+## Further reading
+
+- [Envelope Schema](../wire-format/envelope-schema.md) -- Detailed field-by-field breakdown of the Envelope wire format.
+- [NodeService Schema](../wire-format/node-service-schema.md) -- Full RPC interface documentation.
+- [Auth Schema](../wire-format/auth-schema.md) -- Auth token structure and versioning.
+- [MLS (RFC 9420)](mls.md) -- How MLS messages are carried as opaque payloads inside Cap'n Proto Envelopes.
+- [ADR-002: Cap'n Proto over MessagePack](../design-rationale/adr-002-capnproto.md) -- Design rationale for choosing Cap'n Proto.
+- [ADR-003: RPC Inside the Noise Tunnel](../design-rationale/adr-003-rpc-inside-noise.md) -- Why RPC runs inside the encrypted transport.
--- a/docs/src/protocol-layers/hybrid-kem.md
+++ b/docs/src/protocol-layers/hybrid-kem.md
@@ -0,0 +1,281 @@
+# Hybrid KEM: X25519 + ML-KEM-768
+
+quicnprotochat implements a hybrid Key Encapsulation Mechanism that combines classical X25519 Diffie-Hellman with post-quantum ML-KEM-768 (FIPS 203). The hybrid construction ensures that the system remains secure even if one of the two components is broken: X25519 protects against failures in ML-KEM, and ML-KEM protects against quantum computers breaking X25519.
+
+The implementation lives in `quicnprotochat-core/src/hybrid_kem.rs`. It is fully implemented and tested but **not yet integrated into the MLS ciphersuite** -- integration is planned for the M5 milestone. Currently, the module can be used as a standalone envelope encryption layer to wrap MLS payloads in an outer post-quantum-resistant encryption before they transit the network.
+
+## Design approach
+
+The hybrid KEM follows the **combiner approach** from [draft-ietf-tls-hybrid-design](https://datatracker.ietf.org/doc/draft-ietf-tls-hybrid-design/). The core idea:
+
+1. Perform both a classical key exchange (X25519) and a post-quantum key encapsulation (ML-KEM-768) against the recipient's public keys.
+2. Combine the two shared secrets into a single AEAD key using HKDF.
+3. Encrypt the payload with ChaCha20-Poly1305 using the derived key.
+
+This ensures:
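+- **IND-CCA2 security** if *either* X25519 or ML-KEM-768 is secure. This is the design goal of the combiner. Note that the `ikm` below contains only the two shared secrets. Combiners such as X-Wing additionally hash the X25519 ephemeral and recipient public keys into the KDF input.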
+- No reliance on a single hardness assumption.
+- Graceful degradation: if ML-KEM is found to have a flaw, classical X25519 still protects the data.
+
+## Component algorithms
+
+| Component | Algorithm | Size | Security Level |
+|---|---|---|---|
+| Classical KEM | X25519 ECDH | 32-byte keys, 32-byte shared secret | 128-bit classical |
+| Post-quantum KEM | ML-KEM-768 (FIPS 203) | 1184-byte EK, 2400-byte DK, 1088-byte CT, 32-byte SS | NIST Level 3 (comparable to AES-192 key search) |
+| Key derivation | HKDF-SHA256 | 32-byte output key, 12-byte output nonce | 256-bit PRF security |
+| Symmetric encryption | ChaCha20-Poly1305 | 32-byte key, 12-byte nonce, 16-byte tag | 256-bit security |
+
+### ML-KEM-768 constants
+
+These constants are defined in `hybrid_kem.rs` and match FIPS 203:
+
+| Constant | Value | Description |
+|---|---|---|
+| `MLKEM_EK_LEN` | 1,184 bytes | Encapsulation (public) key size |
+| `MLKEM_DK_LEN` | 2,400 bytes | Decapsulation (private) key size |
+| `MLKEM_CT_LEN` | 1,088 bytes | Ciphertext size |
+| Shared secret | 32 bytes | Output of encapsulate/decapsulate |
+
+ML-KEM-768 was chosen over ML-KEM-512 (NIST Level 1) for a stronger security margin and over ML-KEM-1024 (NIST Level 5) because the additional key/ciphertext sizes are not justified for 128-bit target security.
+
+## Wire format
+
+Every hybrid-encrypted payload is packaged as a self-describing envelope:
+
+```text
+┌─────────┬──────────────────┬──────────────────┬──────────────┬──────────────────┐
+│ version │ x25519_eph_pk    │ mlkem_ct         │ aead_nonce   │ aead_ct          │
+│ (1 B)   │ (32 B)           │ (1088 B)         │ (12 B)       │ (variable)       │
+└─────────┴──────────────────┴──────────────────┴──────────────┴──────────────────┘
+```
+
+| Field | Offset | Size | Description |
+|---|---|---|---|
+| `version` | 0 | 1 byte | Envelope version. Currently `0x01`. |
+| `x25519_eph_pk` | 1 | 32 bytes | Ephemeral X25519 public key (generated fresh per encryption). |
+| `mlkem_ct` | 33 | 1,088 bytes | ML-KEM-768 ciphertext (encapsulation of the PQ shared secret). |
+| `aead_nonce` | 1,121 | 12 bytes | ChaCha20-Poly1305 nonce (derived from HKDF). |
+| `aead_ct` | 1,133 | variable | ChaCha20-Poly1305 ciphertext + 16-byte authentication tag. |
+
+The total header (`HEADER_LEN`) is 1 + 32 + 1088 + 12 = **1,133 bytes**. The minimum valid envelope is `HEADER_LEN + 16` = 1,149 bytes (16 bytes for the AEAD tag on an empty plaintext).
+
+The `version` byte enables future format evolution. Decryption rejects any version other than `0x01` with `HybridKemError::UnsupportedVersion`.
+
+## Key derivation
+
+The two shared secrets are combined via HKDF-SHA256 with domain separation:
+
+```text
+ikm  = X25519_shared_secret(32 bytes) || ML-KEM_shared_secret(32 bytes)
+salt = [] (empty)
+
+key   = HKDF-SHA256(salt, ikm, info="quicnprotochat-hybrid-v1",       L=32)
+nonce = HKDF-SHA256(salt, ikm, info="quicnprotochat-hybrid-nonce-v1", L=12)
+```
+
+The implementation in `derive_aead_material()`:
+
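+```rust
+use chacha20poly1305::{Key, Nonce};
+use hkdf::Hkdf;
+use sha2::Sha256;
+use zeroize::Zeroizing;
+
+fn derive_aead_material(x25519_ss: &[u8], mlkem_ss: &[u8]) -> (Key, Nonce) {
+    // ikm = X25519 shared secret || ML-KEM shared secret, wiped on drop.
+    let mut ikm = Zeroizing::new(vec![0u8; x25519_ss.len() + mlkem_ss.len()]);
+    ikm[..x25519_ss.len()].copy_from_slice(x25519_ss);
+    ikm[x25519_ss.len()..].copy_from_slice(mlkem_ss);
+
+    // Extract with no salt (RFC 5869 then uses a zero-filled salt).
+    let hk = Hkdf::<Sha256>::new(None, &ikm);
+
+    // expand() only fails if the requested length exceeds 255 * 32 bytes.
+    let mut key_bytes = Zeroizing::new([0u8; 32]);
+    hk.expand(b"quicnprotochat-hybrid-v1", &mut *key_bytes).unwrap();
+
+    let mut nonce_bytes = [0u8; 12];
+    hk.expand(b"quicnprotochat-hybrid-nonce-v1", &mut nonce_bytes).unwrap();
+
+    (*Key::from_slice(&*key_bytes), *Nonce::from_slice(&nonce_bytes))
+}
+```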
+
+Key design decisions:
+
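+- **Concatenation order**: X25519 shared secret first, ML-KEM shared secret second. draft-ietf-tls-hybrid-design concatenates the secrets in the order the components are defined for each hybrid group, so the order is a per-scheme choice. What matters is that sender and recipient use the same order.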
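+- **Separate info strings**: The key and nonce are derived with different HKDF info strings for domain separation. With a shared info string, the 12-byte nonce would equal the first 12 bytes of the key. This is because HKDF-Expand output for a shorter length is a prefix of the output for a longer one.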
+- **Zeroization**: The concatenated IKM and the derived key bytes are wrapped in `Zeroizing` to ensure they are cleared from memory when dropped.
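+- **Empty salt**: HKDF is used in extract-then-expand mode with no salt, in which case RFC 5869 substitutes a string of zero bytes. This is acceptable because the IKM already has high entropy from the X25519 and ML-KEM shared secrets.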
+
+## `HybridKeypair`
+
+Each peer holds a `HybridKeypair` combining classical and post-quantum key material:
+
+```rust
+pub struct HybridKeypair {
+    x25519_sk: StaticSecret,        // 32 bytes
+    x25519_pk: X25519Public,        // 32 bytes
+    mlkem_dk: DecapsulationKey<MlKem768Params>,  // 2400 bytes
+    mlkem_ek: EncapsulationKey<MlKem768Params>,  // 1184 bytes
+}
+```
+
+### Generation
+
+```rust
+pub fn generate() -> Self {
+    let x25519_sk = StaticSecret::random_from_rng(OsRng);
+    let x25519_pk = X25519Public::from(&x25519_sk);
+    let (mlkem_dk, mlkem_ek) = MlKem768::generate(&mut OsRng);
+    // ...
+}
+```
+
+Both key pairs are generated from the OS CSPRNG (`OsRng`). The X25519 key uses `x25519-dalek`'s `StaticSecret` (not `EphemeralSecret`) because the keypair is long-lived and must be stored.
+
+### Serialisation
+
+For persistence, `HybridKeypairBytes` provides a serialisable form:
+
+```rust
+pub struct HybridKeypairBytes {
+    pub x25519_sk: [u8; 32],
+    pub mlkem_dk: Vec<u8>,   // 2400 bytes
+    pub mlkem_ek: Vec<u8>,   // 1184 bytes
+}
+```
+
+Round-trip: `keypair.to_bytes()` serialises, `HybridKeypair::from_bytes(&bytes)` reconstructs. The ML-KEM keys are reconstructed using `DecapsulationKey::from_bytes()` and `EncapsulationKey::from_bytes()`, which accept `Array` types converted from slices.
+
+### Public key extraction
+
+The public portion is extracted for distribution to peers:
+
+```rust
+pub struct HybridPublicKey {
+    pub x25519_pk: [u8; 32],
+    pub mlkem_ek: Vec<u8>,   // 1184 bytes
+}
+```
+
+`HybridPublicKey` can be serialised to a single byte blob: `x25519_pk(32) || mlkem_ek(1184)` = 1,216 bytes total. This is uploaded to the server via the `uploadHybridKey` RPC and fetched by peers via `fetchHybridKey`.
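+
+The following is a minimal sketch of the 1,216-byte blob layout, assuming the concatenation order given above. The `to_blob`/`from_blob` names are hypothetical. The real serialisation helpers in `hybrid_kem.rs` may be named and structured differently.
+
+```rust
+const X25519_PK_LEN: usize = 32;
+const MLKEM_EK_LEN: usize = 1184;
+/// 32 + 1184 = 1216 bytes.
+const HYBRID_PK_LEN: usize = X25519_PK_LEN + MLKEM_EK_LEN;
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct HybridPublicKey {
+    pub x25519_pk: [u8; 32],
+    pub mlkem_ek: Vec<u8>, // 1184 bytes
+}
+
+impl HybridPublicKey {
+    /// Hypothetical: encode as `x25519_pk || mlkem_ek`.
+    pub fn to_blob(&self) -> Vec<u8> {
+        let mut out = Vec::with_capacity(HYBRID_PK_LEN);
+        out.extend_from_slice(&self.x25519_pk);
+        out.extend_from_slice(&self.mlkem_ek);
+        out
+    }
+
+    /// Hypothetical: decode a blob, rejecting anything that is not exactly 1216 bytes.
+    /// Only lengths are checked here; ML-KEM key validity is left to the ml-kem crate.
+    pub fn from_blob(blob: &[u8]) -> Option<Self> {
+        if blob.len() != HYBRID_PK_LEN {
+            return None;
+        }
+        let (x_part, ek_part) = blob.split_at(X25519_PK_LEN);
+        let mut x25519_pk = [0u8; 32];
+        x25519_pk.copy_from_slice(x_part);
+        Some(Self { x25519_pk, mlkem_ek: ek_part.to_vec() })
+    }
+}
+```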
+
+## Encryption flow: `hybrid_encrypt`
+
+```rust
+pub fn hybrid_encrypt(
+    recipient_pk: &HybridPublicKey,
+    plaintext: &[u8],
+) -> Result<Vec<u8>, HybridKemError>
+```
+
+Step-by-step:
+
+1. **Ephemeral X25519 DH**: Generate a fresh `EphemeralSecret`, compute the X25519 shared secret with the recipient's static public key. The ephemeral secret is consumed (moved) by `diffie_hellman()` and cannot be reused.
+
+2. **ML-KEM-768 encapsulation**: Reconstruct the recipient's `EncapsulationKey` from the public key bytes, then call `encapsulate(&mut OsRng)`. This produces a ciphertext (1,088 bytes) and a shared secret (32 bytes).
+
+3. **Key derivation**: Call `derive_aead_material()` with both shared secrets to produce a 32-byte ChaCha20-Poly1305 key and a 12-byte nonce.
+
+4. **AEAD encryption**: Encrypt the plaintext with `ChaCha20Poly1305::encrypt()`. The output includes the 16-byte authentication tag.
+
+5. **Envelope assembly**: Concatenate `version || x25519_eph_pk || mlkem_ct || nonce || aead_ct`.
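+
+Step 5 is plain concatenation. The sketch below assumes steps 1 to 4 have already produced the ephemeral public key, the ML-KEM ciphertext, the nonce, and the AEAD ciphertext. `assemble_envelope` is a hypothetical name. The constants mirror the sizes in the wire-format table.
+
+```rust
+const VERSION: u8 = 0x01;
+const X25519_PK_LEN: usize = 32;
+const MLKEM_CT_LEN: usize = 1088;
+const NONCE_LEN: usize = 12;
+/// 1 + 32 + 1088 + 12 = 1133 bytes.
+const HEADER_LEN: usize = 1 + X25519_PK_LEN + MLKEM_CT_LEN + NONCE_LEN;
+
+/// Hypothetical helper for step 5: build
+/// `version || x25519_eph_pk || mlkem_ct || nonce || aead_ct`.
+fn assemble_envelope(
+    eph_pk: &[u8; X25519_PK_LEN],
+    mlkem_ct: &[u8; MLKEM_CT_LEN],
+    nonce: &[u8; NONCE_LEN],
+    aead_ct: &[u8], // ciphertext including the 16-byte Poly1305 tag
+) -> Vec<u8> {
+    let mut out = Vec::with_capacity(HEADER_LEN + aead_ct.len());
+    out.push(VERSION);
+    out.extend_from_slice(eph_pk);
+    out.extend_from_slice(mlkem_ct);
+    out.extend_from_slice(nonce);
+    out.extend_from_slice(aead_ct);
+    out
+}
+```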
+
+## Decryption flow: `hybrid_decrypt`
+
+```rust
+pub fn hybrid_decrypt(
+    keypair: &HybridKeypair,
+    envelope: &[u8],
+) -> Result<Vec<u8>, HybridKemError>
+```
+
+Step-by-step:
+
+1. **Envelope parsing**: Verify minimum length (`HEADER_LEN + 16`), check version byte (`0x01`), then extract the five fields by offset.
+
+2. **X25519 DH**: Compute the shared secret using the recipient's static private key (`keypair.x25519_sk`) and the sender's ephemeral public key from the envelope.
+
+3. **ML-KEM-768 decapsulation**: Convert the ciphertext bytes to the `Array` type expected by `DecapsulationKey::decapsulate()`, then decapsulate to recover the shared secret.
+
+4. **Key derivation**: Same `derive_aead_material()` call as encryption. Because the derivation is deterministic, it yields the same key and nonce the sender derived. For any untampered envelope, the derived nonce therefore equals the value in the envelope's `aead_nonce` field.
+
+5. **AEAD decryption**: Decrypt and authenticate the ciphertext with `ChaCha20Poly1305::decrypt()`.
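+
+Step 1 amounts to a length check, a version check, and fixed-offset slicing. The following is a minimal sketch using the offsets from the wire-format table. `split_envelope`, `EnvelopeParts`, and `SplitError` are hypothetical names. `SplitError` mirrors the `TooShort` and `UnsupportedVersion` variants of `HybridKemError`.
+
+```rust
+const VERSION: u8 = 0x01;
+const TAG_LEN: usize = 16;
+/// version(1) + x25519_eph_pk(32) + mlkem_ct(1088) + aead_nonce(12)
+const HEADER_LEN: usize = 1 + 32 + 1088 + 12;
+
+/// Hypothetical error type mirroring two `HybridKemError` variants.
+#[derive(Debug, PartialEq, Eq)]
+enum SplitError {
+    TooShort(usize),
+    UnsupportedVersion(u8),
+}
+
+/// Borrowed views of the envelope fields (no copying).
+struct EnvelopeParts<'a> {
+    eph_pk: &'a [u8],
+    mlkem_ct: &'a [u8],
+    nonce: &'a [u8],
+    aead_ct: &'a [u8],
+}
+
+/// Hypothetical step 1: check length, then version, then slice by fixed offsets.
+fn split_envelope(envelope: &[u8]) -> Result<EnvelopeParts<'_>, SplitError> {
+    if envelope.len() < HEADER_LEN + TAG_LEN {
+        return Err(SplitError::TooShort(envelope.len()));
+    }
+    if envelope[0] != VERSION {
+        return Err(SplitError::UnsupportedVersion(envelope[0]));
+    }
+    Ok(EnvelopeParts {
+        eph_pk: &envelope[1..33],
+        mlkem_ct: &envelope[33..1121],
+        nonce: &envelope[1121..HEADER_LEN],
+        aead_ct: &envelope[HEADER_LEN..],
+    })
+}
+```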
+
+## Error handling
+
+The `HybridKemError` enum covers all failure modes:
+
+| Variant | Meaning |
+|---|---|
+| `EncryptionFailed` | AEAD encryption failed (should not happen with valid inputs) |
+| `DecryptionFailed` | AEAD decryption failed -- wrong recipient key or tampered ciphertext |
+| `UnsupportedVersion(u8)` | Envelope version byte is not `0x01` |
+| `TooShort(usize)` | Envelope is shorter than `HEADER_LEN + 16` bytes |
+| `InvalidMlKemKey` | ML-KEM encapsulation key bytes are malformed |
+| `MlKemDecapsFailed` | ML-KEM decapsulation failed -- tampered ciphertext or wrong key |
+
+The tests in `hybrid_kem.rs` verify:
+- Round-trip encrypt/decrypt with correct keys.
+- Decryption with wrong key fails (`DecryptionFailed`).
+- Tampered AEAD ciphertext fails (`DecryptionFailed`).
+- Tampered ML-KEM ciphertext fails (either `MlKemDecapsFailed` or `DecryptionFailed`).
+- Tampered X25519 ephemeral public key fails (`DecryptionFailed`).
+- Unsupported version is rejected.
+- Too-short envelope is rejected.
+- Keypair and public key serialisation round-trip.
+- Large payloads (50 KB) round-trip successfully.
+
+## Current status and roadmap
+
+The hybrid KEM module is:
+
+- **Implemented**: All types, encryption, decryption, serialisation, and key management are complete.
+- **Tested**: Comprehensive unit tests cover all success and failure paths.
+- **Server-supported**: The `NodeService` RPC interface includes `uploadHybridKey` and `fetchHybridKey` methods. The server stores hybrid public keys in its `FileBackedStore`.
+- **Not yet integrated into MLS**: The MLS ciphersuite (`MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519`) uses classical DHKEM(X25519). Replacing it with a hybrid KEM requires either:
+  - A custom openmls ciphersuite that uses the hybrid KEM for HPKE (complex, requires forking openmls).
+  - An outer encryption layer that wraps MLS messages in a hybrid envelope before delivery (simpler, less tightly integrated).
+
+The M5 milestone will integrate the hybrid KEM, likely as an outer encryption layer. Until then, MLS application data is protected by classical X25519 ECDH (128-bit security against classical computers, vulnerable to quantum computers).
+
+The post-quantum gap in the transport layer ([QUIC + TLS 1.3](quic-tls.md) and [Noise\_XX](noise-xx.md)) is a separate concern tracked in [ADR-006: PQ Gap in Noise Transport](../design-rationale/adr-006-pq-gap.md).
+
+## Security analysis
+
+### Hybrid security guarantee
+
+The combiner construction ensures that an attacker must break *both* X25519 and ML-KEM-768 to recover the plaintext. Specifically:
+
+- A **classical attacker** cannot break X25519 (ECDLP is hard on Curve25519) and therefore cannot derive the AEAD key, regardless of whether they can break ML-KEM.
+- A **quantum attacker** with a cryptographically relevant quantum computer could break X25519 via Shor's algorithm. However, no efficient quantum attack is known against ML-KEM-768, whose security rests on the Module-LWE problem.
+- An attacker who discovers a **flaw in ML-KEM** still faces X25519, which provides 128-bit classical security.
+
+### Key reuse
+
+The X25519 component of the hybrid keypair is a `StaticSecret` (long-lived), not an `EphemeralSecret`. This is safe because:
+- Each encryption uses a fresh `EphemeralSecret` for the sender's X25519 contribution.
+- The static secret is only used in the DH computation with the ephemeral public key; it never appears in the wire format.
+- The ML-KEM encapsulation also generates fresh randomness per encryption.
+
+### Nonce handling
+
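+The AEAD nonce is derived deterministically via HKDF from the same IKM as the AEAD key. Each encryption uses a fresh ephemeral X25519 key and fresh ML-KEM randomness. The shared secrets, and therefore the derived key and nonce, are therefore unique per encryption with overwhelming probability. A (key, nonce) pair can repeat only if the IKM repeats, which would require both:
+- The same X25519 shared secret. In practice this means a repeated ephemeral scalar, which has 251 random bits after clamping, so the probability is about 2^-251 for any two encryptions.
+- The same ML-KEM shared secret. In practice this means repeated 32-byte encapsulation randomness, with probability 2^-256.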
+
+## Crate dependencies
+
+| Crate | Version | Role |
+|---|---|---|
+| `ml-kem` | 0.2 | ML-KEM-768 (FIPS 203) implementation |
+| `x25519-dalek` | 2 | X25519 ECDH (with `static_secrets` feature) |
+| `chacha20poly1305` | 0.10 | AEAD symmetric encryption |
+| `hkdf` | 0.12 | HKDF-SHA256 key derivation |
+| `sha2` | 0.10 | SHA-256 (used by HKDF) |
+| `zeroize` | 1 | Secure memory clearing for key material |
+| `rand` | 0.8 | `OsRng` for CSPRNG |
+| `serde` | 1 | Serialisation of keypair and public key types |
+
+## Further reading
+
+- [Post-Quantum Readiness](../cryptography/post-quantum-readiness.md) -- Broader discussion of quicnprotochat's PQ strategy.
+- [MLS (RFC 9420)](mls.md) -- The MLS layer that the hybrid KEM will wrap.
+- [Key Lifecycle and Zeroization](../cryptography/key-lifecycle.md) -- How hybrid key material is managed and cleared.
+- [ADR-006: PQ Gap in Noise Transport](../design-rationale/adr-006-pq-gap.md) -- The accepted PQ gap in the transport layers.
+- [Threat Model](../cryptography/threat-model.md) -- Where hybrid KEM fits in the overall threat model.
+- [Milestone Tracker](../roadmap/milestones.md) -- M5 milestone for hybrid KEM integration into MLS.
--- a/docs/src/protocol-layers/mls.md
+++ b/docs/src/protocol-layers/mls.md
@@ -0,0 +1,420 @@
+# MLS (RFC 9420)
+
+The Messaging Layer Security protocol (RFC 9420) is the core cryptographic layer in quicnprotochat. It provides authenticated group key agreement with forward secrecy and post-compromise security -- properties that distinguish quicnprotochat from a simple transport-encrypted relay. This is the most detailed page in the Protocol Deep Dives section because MLS is the most complex layer in the stack.
+
+The implementation lives in `quicnprotochat-core/src/group.rs` and `quicnprotochat-core/src/keystore.rs`, using the `openmls 0.5` crate.
+
+## Background: what problem MLS solves
+
+Before MLS, group messaging systems had two main approaches:
+
+1. **Pairwise encryption (Signal/Double Ratchet)**: Each pair of group members maintains an independent encrypted session. A message to a group of *n* members requires *n - 1* separate encryptions. Changing the group's key material, for example after removing a member, requires each member to send fresh keys to every other member. That is *O(n)* work per member and *O(n^2)* in total.
+
+2. **Server-side fan-out with shared key**: All members share a single group key. The server decrypts and re-encrypts for each member. This is not end-to-end encrypted -- the server sees plaintext.
+
+MLS takes a fundamentally different approach: it uses a **ratchet tree** (a binary tree of Diffie-Hellman key pairs) to derive group keys. This gives:
+
+- **O(log n) scaling**: A group operation (add, remove, update) requires about *O(log n)* public-key operations, roughly one per level of the tree, rather than one per member. Blank nodes left by removals can raise this cost until the tree is repopulated.
+- **Forward secrecy**: Each epoch uses a fresh key derived from the ratchet tree. Compromising the current key does not reveal past messages.
+- **Post-compromise security (PCS)**: After a member's key is compromised, a single Update Commit operation re-randomises the compromised node's path in the tree, restoring confidentiality for all subsequent messages.
+- **End-to-end encryption**: The server (Delivery Service) never sees plaintext. It routes opaque MLS blobs by recipient key without parsing them.
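+The scaling difference can be made concrete with a little arithmetic. The sketch below is a simplified cost model and does not come from quicnprotochat. It assumes a fully populated tree whose leaf count is rounded up to a power of two, and it ignores blank nodes, which add work. It compares the committer's direct-path length with the *n - 1* encryptions of the pairwise model. Both function names are hypothetical.
+
+```rust
+/// Length of a leaf's direct path (leaf excluded, root included) in a complete
+/// binary tree with `n` members, i.e. ceil(log2(n)). With no blank nodes this
+/// approximates the per-level work (path secrets, HPKE encryptions) of a Commit.
+fn direct_path_len(n: u32) -> u32 {
+    if n <= 1 {
+        0
+    } else {
+        // ceil(log2(n)) equals the number of bits needed to represent n - 1.
+        u32::BITS - (n - 1).leading_zeros()
+    }
+}
+
+/// Pairwise model: one separate encryption for every other member.
+fn pairwise_encryptions(n: u32) -> u32 {
+    n.saturating_sub(1)
+}
+```
+
+For a 100-member group, this gives a path of 7 levels versus 99 pairwise encryptions.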
+
+## Ciphersuite
+
+quicnprotochat uses:
+
+```text
+MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519
+```
+
+| Component | Algorithm | Purpose |
+|---|---|---|
+| **HPKE KEM** | DHKEM(X25519, HKDF-SHA256) | Key encapsulation for Welcome messages and tree operations |
+| **AEAD** | AES-128-GCM | Symmetric encryption of application messages |
+| **Hash** | SHA-256 | Key derivation, transcript hashing, tree hashing |
+| **Signature** | Ed25519 | Credential binding, Commit signing, KeyPackage signing |
+
+This ciphersuite provides 128-bit classical security and has no post-quantum component. Post-quantum protection comes from the [Hybrid KEM](hybrid-kem.md) layer, which wraps MLS payloads in an outer encryption envelope (M5 milestone).
+
+## The `GroupMember` state machine
+
+The central type is `GroupMember`, defined in `quicnprotochat-core/src/group.rs`. It wraps an openmls `MlsGroup`, a persistent crypto backend (`StoreCrypto`), and the user's long-term Ed25519 identity keypair.
+
+### Lifecycle diagram
+
+```text
+GroupMember::new(identity)
+  |
+  ├── generate_key_package()      → TLS-encoded KeyPackage bytes
+  |                                  (upload to Authentication Service)
+  |
+  ├── create_group(group_id)      → Epoch 0; caller is sole member
+  |     |
+  |     └── add_member(kp_bytes)  → (commit_bytes, welcome_bytes)
+  |           |                      merge_pending_commit() called internally
+  |           |
+  |           ├── [commit_bytes → existing members via DS]
+  |           └── [welcome_bytes → new member via DS]
+  |
+  └── join_group(welcome_bytes)   → Join via Welcome; epoch matches inviter
+        |
+        ├── send_message(plaintext) → MLS PrivateMessage bytes
+        |
+        └── receive_message(bytes)  → Some(plaintext) for Application messages
+                                      None for Commits (state updated internally)
+                                      None for Proposals (stored for later Commit)
+```
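+The epoch bookkeeping in the diagram can be modelled without any cryptography. The following is a hypothetical toy, and none of its types exist in `quicnprotochat-core`. It only tracks whether a group is active and which epoch it is at. It mirrors the rules above: `create_group()` starts at epoch 0, `add_member()` advances the committer's epoch, and `join_group()` places the joiner at the inviter's epoch.
+
+```rust
+/// Errors in the toy model (hypothetical; not `CoreError`).
+#[derive(Debug, PartialEq)]
+enum LifecycleError {
+    NoActiveGroup,
+}
+
+/// Toy stand-in for `GroupMember`: `None` plays the role of `self.group = None`.
+struct ToyMember {
+    epoch: Option<u64>,
+}
+
+impl ToyMember {
+    fn new() -> Self {
+        ToyMember { epoch: None }
+    }
+
+    /// `create_group()`: the caller becomes the sole member at epoch 0.
+    fn create_group(&mut self) {
+        self.epoch = Some(0);
+    }
+
+    /// `add_member()`: the committer merges its own Commit, so its epoch
+    /// advances; the returned value is the epoch carried by the Welcome.
+    fn add_member(&mut self) -> Result<u64, LifecycleError> {
+        let epoch = self.epoch.as_mut().ok_or(LifecycleError::NoActiveGroup)?;
+        *epoch += 1;
+        Ok(*epoch)
+    }
+
+    /// `join_group()`: the joiner starts at the epoch carried by the Welcome.
+    fn join_group(&mut self, welcome_epoch: u64) {
+        self.epoch = Some(welcome_epoch);
+    }
+
+    /// `send_message()` / `receive_message()` need an active group.
+    fn current_epoch(&self) -> Result<u64, LifecycleError> {
+        self.epoch.ok_or(LifecycleError::NoActiveGroup)
+    }
+}
+```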
+
+### Construction
+
+```rust
+pub fn new(identity: Arc<IdentityKeypair>) -> Self
+```
+
+Creates a new `GroupMember` with:
+- A fresh `StoreCrypto` backend using an ephemeral (in-memory) key store.
+- The provided Ed25519 identity keypair (used as the MLS `Signer`).
+- No active group (`self.group = None`).
+
+For state persistence across restarts, use:
+
+```rust
+pub fn new_with_state(
+    identity: Arc<IdentityKeypair>,
+    key_store: DiskKeyStore,
+    group: Option<MlsGroup>,
+) -> Self
+```
+
+This constructor accepts a pre-existing `DiskKeyStore` (loaded from disk) and, optionally, an `MlsGroup` restored from a previous session. The `MlsGroupConfig` is rebuilt with `use_ratchet_tree_extension(true)`.
+
+### MLS group configuration
+
+The group configuration is built once at construction time:
+
+```rust
+let config = MlsGroupConfig::builder()
+    .use_ratchet_tree_extension(true)
+    .build();
+```
+
+The critical setting is `use_ratchet_tree_extension(true)`: this embeds the full ratchet tree inside Welcome messages so that new members can reconstruct the group state without a separate tree-fetching step. The trade-off is larger Welcome messages, but this simplifies the protocol by eliminating a round-trip to a tree distribution service.
+
+## Key operations
+
+### `generate_key_package()`
+
+```rust
+pub fn generate_key_package(&mut self) -> Result<Vec<u8>, CoreError>
+```
+
+Generates a fresh, single-use MLS KeyPackage and returns it as TLS-encoded bytes.
+
+**What happens internally:**
+
+1. A `CredentialWithKey` is created from the identity keypair. The credential type is `Basic` -- the credential body is the raw Ed25519 public key bytes, and the `signature_key` field is the same public key.
+
+2. `KeyPackage::builder().build()` is called with:
+   - `CryptoConfig::with_default_version(CIPHERSUITE)` -- specifies the MLS ciphersuite.
+   - `&self.backend` -- the `StoreCrypto` provider. During build, openmls generates an HPKE init keypair and stores the private key in the backend's key store.
+   - `self.identity.as_ref()` -- the `Signer` (Ed25519 private key) used to sign the KeyPackage.
+   - The `CredentialWithKey` binding the credential to the signature key.
+
+3. The KeyPackage is serialised via `tls_serialize_detached()` (TLS presentation language encoding, as specified by RFC 9420).
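+RFC 9420 encodes variable-length vectors, such as the `opaque` fields inside a KeyPackage, with a variable-length integer length header. The top two bits of the first byte select a 1-, 2- or 4-byte header. The std-only sketch below illustrates that rule. It is not the `tls_codec` API, and the function names are hypothetical.
+
+```rust
+/// Encode a vector length with the RFC 9420 variable-length integer header.
+/// The two most significant bits of the first byte give the header size:
+/// 0b00 -> 1 byte (6-bit value), 0b01 -> 2 bytes (14-bit), 0b10 -> 4 bytes (30-bit).
+/// This encoder always picks the shortest header that fits.
+fn encode_varint(len: usize) -> Option<Vec<u8>> {
+    match len {
+        0..=0x3f => Some(vec![len as u8]),
+        0x40..=0x3fff => Some(((len as u16) | 0x4000).to_be_bytes().to_vec()),
+        0x4000..=0x3fff_ffff => Some(((len as u32) | 0x8000_0000).to_be_bytes().to_vec()),
+        _ => None, // too long for an MLS vector header
+    }
+}
+
+/// Serialise an opaque byte string as a length header followed by the bytes.
+fn encode_opaque(data: &[u8]) -> Option<Vec<u8>> {
+    let mut out = encode_varint(data.len())?;
+    out.extend_from_slice(data);
+    Some(out)
+}
+```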
+
+**Critical invariant:** The HPKE init private key is stored in `self.backend`'s key store. The **same `GroupMember` instance** (or one reconstructed with the same `DiskKeyStore`) must later call `join_group()`, because `new_from_welcome()` looks up the init private key by reference to decrypt the Welcome. If a different `GroupMember` instance (with a fresh key store) tries to join, the lookup fails and the Welcome cannot be decrypted.
+
+**Why KeyPackages are single-use:** Each KeyPackage contains a unique HPKE init public key. Using the same KeyPackage for two different group joins would allow the joiner's init key to be reused, which could compromise forward secrecy. See [ADR-005: Single-Use KeyPackages](../design-rationale/adr-005-single-use-keypackages.md).
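+On the server side, single use means a KeyPackage is removed as it is handed out. The following is a minimal sketch that assumes an in-memory FIFO queue per Ed25519 identity key. The `KeyPackageStore` type is hypothetical, and the real server persists KeyPackages in its storage backend.
+
+```rust
+use std::collections::{HashMap, VecDeque};
+
+/// Hypothetical in-memory model of the Authentication Service's KeyPackage
+/// store: packages are queued per Ed25519 identity key, and `fetch` removes
+/// the package it returns, so no KeyPackage is ever handed out twice.
+#[derive(Default)]
+struct KeyPackageStore {
+    queues: HashMap<[u8; 32], VecDeque<Vec<u8>>>,
+}
+
+impl KeyPackageStore {
+    /// Client -> AS: append a freshly generated KeyPackage.
+    fn upload(&mut self, identity: [u8; 32], key_package: Vec<u8>) {
+        self.queues.entry(identity).or_default().push_back(key_package);
+    }
+
+    /// Inviter -> AS: take the oldest KeyPackage (FIFO) and delete it.
+    fn fetch(&mut self, identity: &[u8; 32]) -> Option<Vec<u8>> {
+        self.queues.get_mut(identity)?.pop_front()
+    }
+}
+```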
+
+### `create_group(group_id)`
+
+```rust
+pub fn create_group(&mut self, group_id: &[u8]) -> Result<(), CoreError>
+```
+
+Creates a new MLS group at epoch 0 with the caller as the sole member.
+
+**Parameters:**
+- `group_id`: Any non-empty byte string. By convention, quicnprotochat uses the SHA-256 digest of a human-readable group name.
+
+**What happens internally:**
+
+1. A `CredentialWithKey` is created (same as `generate_key_package`).
+2. `MlsGroup::new_with_group_id()` is called with the backend, signer, config, group ID, and credential.
+3. The resulting `MlsGroup` is stored in `self.group`.
+
+After this call, the group exists at epoch 0 with one member. Use `add_member()` to invite additional members.
+
+### `add_member(key_package_bytes)`
+
+```rust
+pub fn add_member(
+    &mut self,
+    key_package_bytes: &[u8],
+) -> Result<(Vec<u8>, Vec<u8>), CoreError>
+```
+
+Adds a new member to the group by their TLS-encoded KeyPackage. Returns `(commit_bytes, welcome_bytes)`.
+
+**What happens internally:**
+
+1. **KeyPackage deserialisation and validation**: The raw bytes are deserialised via `KeyPackageIn::tls_deserialize()`. Note the `In` suffix -- openmls 0.5 distinguishes between `KeyPackage` (trusted, locally-generated) and `KeyPackageIn` (untrusted, received from the network). The `validate()` method verifies the Ed25519 signature on the KeyPackage and returns a trusted `KeyPackage`.
+
+   ```rust
+   let key_package: KeyPackage =
+       KeyPackageIn::tls_deserialize(&mut key_package_bytes.as_ref())?
+           .validate(self.backend.crypto(), ProtocolVersion::Mls10)?;
+   ```
+
+2. **Commit + Welcome creation**: `group.add_members()` produces three outputs:
+   - `commit_out` (`MlsMessageOut`): A Commit message that existing members process to update their state.
+   - `welcome_out` (`MlsMessageOut`): A Welcome message that bootstraps the new member into the group.
+   - `_group_info`: A GroupInfo for external commits (not used here).
+
+3. **Merge pending commit**: `group.merge_pending_commit()` applies the Commit to the local state, advancing the epoch. The committer is itself a group member, so it applies its own Commit immediately instead of waiting to receive it back through the Delivery Service.
+
+4. **Serialisation**: Both `commit_out` and `welcome_out` are serialised to bytes via `.to_bytes()`.
+
+**Caller responsibilities:**
+- Send `commit_bytes` to all existing group members via the Delivery Service. (In the two-party case where the creator is the only member, this can be discarded -- the creator has already merged it locally.)
+- Send `welcome_bytes` to the new member via the Delivery Service.
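+The routing rule can be expressed as a small planning function. This is a hypothetical sketch: members are plain string labels, and `plan_deliveries` is not part of the codebase. It also shows why the two-party case produces no Commit deliveries.
+
+```rust
+/// Hypothetical delivery plan after `add_member()`: the Commit goes to every
+/// existing member except the committer (who already merged it), and the
+/// Welcome goes only to the new member.
+fn plan_deliveries<'a>(
+    existing_members: &[&'a str],
+    committer: &str,
+    new_member: &'a str,
+    commit: &[u8],
+    welcome: &[u8],
+) -> Vec<(&'a str, Vec<u8>)> {
+    let mut out: Vec<(&'a str, Vec<u8>)> = existing_members
+        .iter()
+        .copied()
+        .filter(|member| *member != committer)
+        .map(|member| (member, commit.to_vec()))
+        .collect();
+    out.push((new_member, welcome.to_vec()));
+    out
+}
+```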
+
+### `join_group(welcome_bytes)`
+
+```rust
+pub fn join_group(&mut self, welcome_bytes: &[u8]) -> Result<(), CoreError>
+```
+
+Joins an existing group from a TLS-encoded Welcome message.
+
+**Prerequisites:**
+- `generate_key_package()` must have been called on **this same instance** (or one with the same `DiskKeyStore`) so that the HPKE init private key is available in the backend.
+
+**What happens internally:**
+
+1. **Deserialisation**: The bytes are deserialised as `MlsMessageIn`, then the inner body is extracted. The `into_welcome()` method is feature-gated in openmls 0.5, so the implementation uses `msg_in.extract()` with a match on `MlsMessageInBody::Welcome`.
+
+   ```rust
+   let welcome = match msg_in.extract() {
+       MlsMessageInBody::Welcome(w) => w,
+       _ => return Err(CoreError::Mls("expected a Welcome message".into())),
+   };
+   ```
+
+2. **Group construction**: `MlsGroup::new_from_welcome()` is called with:
+   - `&self.backend` -- to look up the HPKE init private key.
+   - `&self.config` -- group configuration (ratchet tree extension enabled).
+   - The `Welcome` message.
+   - `ratchet_tree = None` -- because `use_ratchet_tree_extension = true` means the tree is embedded in the Welcome's `GroupInfo` extension. openmls extracts it automatically.
+
+3. The resulting `MlsGroup` is stored in `self.group`.
+
+### `send_message(plaintext)`
+
+```rust
+pub fn send_message(&mut self, plaintext: &[u8]) -> Result<Vec<u8>, CoreError>
+```
+
+Encrypts plaintext as an MLS Application message (PrivateMessage variant).
+
+**What happens internally:**
+
+1. `group.create_message()` is called with the backend, signer, and plaintext.
+2. The resulting `MlsMessageOut` is serialised to bytes via `.to_bytes()`.
+
+The output is a TLS-encoded MLS message ready for delivery. The Delivery Service treats it as an opaque blob.
+
+### `receive_message(bytes)`
+
+```rust
+pub fn receive_message(&mut self, bytes: &[u8]) -> Result<Option<Vec<u8>>, CoreError>
+```
+
+Processes an incoming TLS-encoded MLS message.
+
+**Return values:**
+- `Ok(Some(plaintext))` -- for Application messages (PrivateMessage). The caller receives the decrypted plaintext.
+- `Ok(None)` -- for Commit messages. The group state is updated internally (epoch advances) via `merge_staged_commit()`.
+- `Ok(None)` -- for Proposal messages. The proposal is stored via `store_pending_proposal()` for inclusion in a future Commit.
+- `Ok(None)` -- for External Join Proposal messages. Also stored as a pending proposal.
+
+**What happens internally:**
+
+1. **Deserialisation**: Bytes are deserialised as `MlsMessageIn`, then extracted as either `PrivateMessage` or `PublicMessage`. The extraction uses manual pattern matching because `into_protocol_message()` is feature-gated in openmls 0.5:
+
+   ```rust
+   let protocol_message = match msg_in.extract() {
+       MlsMessageInBody::PrivateMessage(m) => ProtocolMessage::PrivateMessage(m),
+       MlsMessageInBody::PublicMessage(m) => ProtocolMessage::PublicMessage(m),
+       _ => return Err(CoreError::Mls("not a protocol message".into())),
+   };
+   ```
+
+2. **Processing**: `group.process_message()` decrypts (for PrivateMessage) or verifies (for PublicMessage) the message and returns a `ProcessedMessage`.
+
+3. **Content dispatch**: The `ProcessedMessageContent` is matched:
+   - `ApplicationMessage`: Plaintext bytes are extracted and returned.
+   - `StagedCommitMessage`: The staged commit is merged, advancing the epoch.
+   - `ProposalMessage` / `ExternalJoinProposalMessage`: The proposal is stored for later.
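+The dispatch rules can be summarised as a toy `match`. The enum below is a hypothetical, simplified stand-in for openmls' `ProcessedMessageContent`; the real variants carry openmls types, not byte vectors. The epoch counter stands in for the state change made by `merge_staged_commit()`.
+
+```rust
+/// Hypothetical, simplified stand-in for openmls' `ProcessedMessageContent`.
+enum Content {
+    Application(Vec<u8>),
+    StagedCommit,
+    Proposal(Vec<u8>),
+    ExternalJoinProposal(Vec<u8>),
+}
+
+/// Toy receiver state: the current epoch and proposals awaiting a Commit.
+#[derive(Default)]
+struct ToyReceiver {
+    epoch: u64,
+    pending_proposals: Vec<Vec<u8>>,
+}
+
+impl ToyReceiver {
+    /// Only application messages yield plaintext; Commits and Proposals
+    /// update internal state and return `None`.
+    fn dispatch(&mut self, content: Content) -> Option<Vec<u8>> {
+        match content {
+            Content::Application(plaintext) => Some(plaintext),
+            Content::StagedCommit => {
+                // Stands in for `merge_staged_commit()`: the epoch advances.
+                // (Proposal-store bookkeeping on merge is omitted here.)
+                self.epoch += 1;
+                None
+            }
+            Content::Proposal(p) | Content::ExternalJoinProposal(p) => {
+                // Stands in for `store_pending_proposal()`.
+                self.pending_proposals.push(p);
+                None
+            }
+        }
+    }
+}
+```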
+
+## The `StoreCrypto` backend
+
+The `StoreCrypto` struct (in `quicnprotochat-core/src/keystore.rs`) implements `OpenMlsCryptoProvider`, which openmls requires for all cryptographic operations:
+
+```rust
+pub struct StoreCrypto {
+    crypto: RustCrypto,
+    key_store: DiskKeyStore,
+}
+```
+
+It couples two things:
+
+1. **`RustCrypto`**: The `openmls_rust_crypto` crate's implementation of MLS cryptographic primitives (HPKE, AEAD, hashing, signing). It fills both the `CryptoProvider` and the `RandProvider` roles of `OpenMlsCryptoProvider`.
+
+2. **`DiskKeyStore`**: A key-value store that maps opaque byte keys to serialised MLS entities (HPKE private keys, epoch secrets, etc.). This is the critical piece -- openmls stores HPKE init private keys here during `KeyPackage::builder().build()` and retrieves them during `MlsGroup::new_from_welcome()`.
+
+### Why the backend must persist
+
+This is the most important implementation detail in the entire MLS layer:
+
+When `generate_key_package()` is called, openmls generates an HPKE init keypair and stores the private key in the `DiskKeyStore` under a reference derived from the init public key. When `join_group()` is later called with a Welcome message, `new_from_welcome()` decrypts the Welcome using that stored private key.
+
+**If the `DiskKeyStore` is lost between these two calls, the Welcome cannot be decrypted.**
+
+This means:
+- For ephemeral usage (tests, demos), `DiskKeyStore::ephemeral()` (in-memory `HashMap`) works as long as the same `GroupMember` instance is used throughout.
+- For persistent usage (real clients), `DiskKeyStore::persistent(path)` must be used. It serialises the `HashMap` to disk via `bincode` on every `store` and `delete` operation.
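+The failure mode can be reproduced with a toy key store. In this hypothetical sketch, `store_init_key` stands in for what openmls does during `KeyPackage::builder().build()`, and `lookup_for_welcome` stands in for the lookup in `new_from_welcome()`. The byte strings are placeholders, not real HPKE keys.
+
+```rust
+use std::collections::HashMap;
+
+/// Hypothetical model of the invariant: the init private key is filed under a
+/// lookup reference (derived, per the text above, from the init public key),
+/// and joining from a Welcome must find it there.
+#[derive(Default)]
+struct ToyKeyStore {
+    values: HashMap<Vec<u8>, Vec<u8>>,
+}
+
+impl ToyKeyStore {
+    /// Stand-in for what happens inside `generate_key_package()`.
+    fn store_init_key(&mut self, key_ref: &[u8], init_private: &[u8]) {
+        self.values.insert(key_ref.to_vec(), init_private.to_vec());
+    }
+
+    /// Stand-in for the lookup inside `new_from_welcome()`.
+    fn lookup_for_welcome(&self, key_ref: &[u8]) -> Result<&[u8], &'static str> {
+        self.values
+            .get(key_ref)
+            .map(|v| v.as_slice())
+            .ok_or("init private key not found: Welcome cannot be decrypted")
+    }
+}
+```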
+
+### DiskKeyStore implementation
+
+```rust
+pub struct DiskKeyStore {
+    path: Option<PathBuf>,
+    values: RwLock<HashMap<Vec<u8>, Vec<u8>>>,
+}
+```
+
+- **Ephemeral mode** (`path = None`): Pure in-memory. Fast but not restart-safe.
+- **Persistent mode** (`path = Some(path)`): Flushes the entire `HashMap` to disk on every mutation. This is simple but not optimised -- a production system would use an append-only log or embedded database.
+
+The `OpenMlsKeyStore` trait implementation:
+- `store()`: Serialises the value via `serde_json`, inserts into the `HashMap`, then flushes to disk.
+- `read()`: Deserialises from the `HashMap` via `serde_json`.
+- `delete()`: Removes from the `HashMap`, then flushes to disk.
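+The flush-on-every-mutation strategy can be sketched with the standard library alone. This sketch does not reproduce the `DiskKeyStore` source and rests on several assumptions. The `FlushingStore` type is hypothetical, values are stored as raw bytes instead of JSON, and a simple length-prefixed file format stands in for `bincode`.
+
+```rust
+use std::collections::HashMap;
+use std::fs;
+use std::io;
+use std::path::{Path, PathBuf};
+use std::sync::RwLock;
+
+/// Hypothetical persistent key store that rewrites the whole map on every
+/// mutation, in the spirit of `DiskKeyStore`'s persistent mode.
+struct FlushingStore {
+    path: PathBuf,
+    values: RwLock<HashMap<Vec<u8>, Vec<u8>>>,
+}
+
+fn invalid(msg: &'static str) -> io::Error {
+    io::Error::new(io::ErrorKind::InvalidData, msg)
+}
+
+/// Serialise the map as repeated (u32 LE length, bytes) fields: key, then value.
+fn encode(map: &HashMap<Vec<u8>, Vec<u8>>) -> Vec<u8> {
+    let mut out = Vec::new();
+    for (key, value) in map {
+        for field in [key, value] {
+            out.extend_from_slice(&(field.len() as u32).to_le_bytes());
+            out.extend_from_slice(field);
+        }
+    }
+    out
+}
+
+/// Read one length-prefixed field and advance `bytes` past it.
+fn take_field<'a>(bytes: &mut &'a [u8]) -> io::Result<&'a [u8]> {
+    let current: &'a [u8] = *bytes;
+    if current.len() < 4 {
+        return Err(invalid("truncated length"));
+    }
+    let (len_bytes, rest) = current.split_at(4);
+    let len = u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
+    if rest.len() < len {
+        return Err(invalid("truncated field"));
+    }
+    let (field, rest) = rest.split_at(len);
+    *bytes = rest;
+    Ok(field)
+}
+
+fn decode(mut bytes: &[u8]) -> io::Result<HashMap<Vec<u8>, Vec<u8>>> {
+    let mut map = HashMap::new();
+    while !bytes.is_empty() {
+        let key = take_field(&mut bytes)?.to_vec();
+        let value = take_field(&mut bytes)?.to_vec();
+        map.insert(key, value);
+    }
+    Ok(map)
+}
+
+impl FlushingStore {
+    /// Load existing contents from `path`, or start empty if the file is absent.
+    fn open(path: &Path) -> io::Result<Self> {
+        let values = match fs::read(path) {
+            Ok(bytes) => decode(&bytes)?,
+            Err(e) if e.kind() == io::ErrorKind::NotFound => HashMap::new(),
+            Err(e) => return Err(e),
+        };
+        Ok(FlushingStore { path: path.to_path_buf(), values: RwLock::new(values) })
+    }
+
+    /// Insert, then flush the entire map to disk while holding the write lock.
+    fn store(&self, key: &[u8], value: &[u8]) -> io::Result<()> {
+        let mut map = self.values.write().expect("lock poisoned");
+        map.insert(key.to_vec(), value.to_vec());
+        fs::write(&self.path, encode(&map))
+    }
+
+    fn read(&self, key: &[u8]) -> Option<Vec<u8>> {
+        self.values.read().expect("lock poisoned").get(key).cloned()
+    }
+
+    /// Remove, then flush the entire map to disk.
+    fn delete(&self, key: &[u8]) -> io::Result<()> {
+        let mut map = self.values.write().expect("lock poisoned");
+        map.remove(key);
+        fs::write(&self.path, encode(&map))
+    }
+}
+```
+
+Every mutation rewrites the file in full, which is why the cost grows with the number of stored entries.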
+
+## openmls 0.5 API gotchas
+
+Several openmls 0.5 API patterns are non-obvious and worth documenting:
+
+### `KeyPackageIn` vs `KeyPackage`
+
+openmls 0.5 separates untrusted wire types (`*In` suffix) from validated types. `KeyPackage` only derives `TlsSerialize`; `KeyPackageIn` derives `TlsDeserialize`. To go from bytes to a trusted `KeyPackage`:
+
+```rust
+KeyPackageIn::tls_deserialize(&mut bytes.as_ref())?
+    .validate(backend.crypto(), ProtocolVersion::Mls10)?
+```
+
+### Feature-gated methods
+
+Several convenient methods (`into_welcome()`, `into_protocol_message()`) are feature-gated behind openmls feature flags that quicnprotochat does not enable. The workaround is to use `msg_in.extract()` and pattern-match on the `MlsMessageInBody` enum variants.
+
+### `GroupMember` and `Send`
+
+Whether `GroupMember` is `Send` depends on the types it holds, including the crypto backend. In quicnprotochat, `StoreCrypto` guards its key store with an `RwLock` (which is `Send + Sync`), so `GroupMember` is `Send` and can be moved into a Tokio task. However, all MLS operations must use the same backend instance, so a single task should own each `GroupMember`; it should not be cloned across tasks.
+
+## Ratchet tree embedding
+
+The ratchet tree is embedded in Welcome messages via the `use_ratchet_tree_extension(true)` configuration. This means:
+
+1. When `add_member()` creates a Welcome, the full ratchet tree is included as a `GroupInfo` extension.
+2. When `join_group()` calls `new_from_welcome()` with `ratchet_tree = None`, openmls extracts the tree from the extension automatically.
+
+The trade-off:
+- **Pro**: No need for a separate tree distribution service or additional round-trips.
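+- **Con**: Welcome messages grow with the group size. The embedded tree holds a node for every leaf and every intermediate position (roughly 2n - 1 nodes for n members), so its size is at least linear in n.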
+
+For quicnprotochat's target group sizes (2-100 members), this trade-off is acceptable.
+
+## Wire format
+
+All MLS messages are serialised using TLS presentation language encoding (`tls_codec`). The TLS-encoded byte vectors are what the transport layer (Noise or QUIC) and the Delivery Service see. The DS routes these blobs without parsing them.
+
+The key wire message types:
+
+| MLS Type | Envelope MsgType | Direction |
+|---|---|---|
+| KeyPackage | `keyPackageUpload` | Client -> AS |
+| Welcome | `mlsWelcome` | Inviter -> DS -> Joinee |
+| Commit (PublicMessage) | `mlsCommit` | Committer -> DS -> Members |
+| Application (PrivateMessage) | `mlsApplication` | Sender -> DS -> Recipient |
+
+## Example: two-party round-trip
+
+The following sequence shows a complete Alice-and-Bob scenario, matching the `two_party_mls_round_trip` test in `group.rs`:
+
+```text
+1. Alice = GroupMember::new(alice_identity)
+2. Bob   = GroupMember::new(bob_identity)
+
+3. bob_kp = Bob.generate_key_package()
+   → Bob's backend now holds the HPKE init private key
+
+4. Alice.create_group(b"test-group")
+   → Alice is sole member at epoch 0
+
+5. (commit, welcome) = Alice.add_member(&bob_kp)
+   → Alice's epoch advances to 1
+   → commit is for existing members (Alice already merged it)
+   → welcome is for Bob
+
+6. Bob.join_group(&welcome)
+   → Bob's backend retrieves the HPKE init key to decrypt the Welcome
+   → Bob is now at the same epoch as Alice
+
+7. ct = Alice.send_message(b"hello bob")
+   → MLS PrivateMessage encrypted under the group key
+
+8. pt = Bob.receive_message(&ct)
+   → pt == Some(b"hello bob")
+
+9. ct = Bob.send_message(b"hello alice")
+10. pt = Alice.receive_message(&ct)
+    → pt == Some(b"hello alice")
+```
+
+## Credential model
+
+quicnprotochat uses MLS `Basic` credentials. The credential body is the raw Ed25519 public key bytes (32 bytes), and the `signature_key` is the same public key:
+
+```rust
+let credential = Credential::new(
+    self.identity.public_key_bytes().to_vec(),
+    CredentialType::Basic,
+)?;
+
+CredentialWithKey {
+    credential,
+    signature_key: self.identity.public_key_bytes().to_vec().into(),
+}
+```
+
+This means the MLS identity *is* the Ed25519 key. There is no X.509 certificate chain or other PKI. The trust model is:
+- Peers trust identity keys obtained out-of-band (e.g., verified via QR code, secure channel, or TOFU).
+- The Authentication Service stores KeyPackages indexed by Ed25519 public key.
+- The Delivery Service routes by Ed25519 public key.
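+Trust on first use (TOFU) amounts to pinning the first identity key seen for a contact and flagging any later change. The following is a minimal sketch that assumes contacts are identified by a local label. The `TofuStore` type and its decision enum are hypothetical and not part of the client.
+
+```rust
+use std::collections::HashMap;
+
+/// Outcome of checking a contact's identity key against the pinned one.
+#[derive(Debug, PartialEq)]
+enum TrustDecision {
+    FirstUse, // nothing pinned yet: the key is pinned now
+    Match,    // same key as the one pinned earlier
+    Mismatch, // key changed: surface to the user, never accept silently
+}
+
+/// Hypothetical TOFU table: contact label -> Ed25519 identity key
+/// (the 32-byte body of the MLS Basic credential).
+#[derive(Default)]
+struct TofuStore {
+    pinned: HashMap<String, [u8; 32]>,
+}
+
+impl TofuStore {
+    fn check(&mut self, contact: &str, identity_key: [u8; 32]) -> TrustDecision {
+        match self.pinned.get(contact).copied() {
+            None => {
+                self.pinned.insert(contact.to_string(), identity_key);
+                TrustDecision::FirstUse
+            }
+            Some(pinned) if pinned == identity_key => TrustDecision::Match,
+            Some(_) => TrustDecision::Mismatch,
+        }
+    }
+}
+```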
+
+A future milestone may introduce X.509 credentials for integration with external PKI.
+
+## Further reading
+
+- [Forward Secrecy](../cryptography/forward-secrecy.md) -- How MLS epoch ratcheting provides forward secrecy.
+- [Post-Compromise Security](../cryptography/post-compromise-security.md) -- How MLS Update Commits restore security after key compromise.
+- [Ed25519 Identity Keys](../cryptography/identity-keys.md) -- Key generation and management for the identity keypair used as the MLS Signer.
+- [GroupMember Lifecycle](../internals/group-member-lifecycle.md) -- Detailed state transitions and error handling.
+- [KeyPackage Exchange Flow](../internals/keypackage-exchange.md) -- How KeyPackages flow through the Authentication Service.
+- [ADR-004: MLS-Unaware Delivery Service](../design-rationale/adr-004-mls-unaware-ds.md) -- Why the DS does not parse MLS messages.
+- [ADR-005: Single-Use KeyPackages](../design-rationale/adr-005-single-use-keypackages.md) -- Why KeyPackages are single-use.
+- [Hybrid KEM: X25519 + ML-KEM-768](hybrid-kem.md) -- Post-quantum outer encryption layer for MLS payloads.
+- [Storage Backend](../internals/storage-backend.md) -- DiskKeyStore persistence and the FileBackedStore used by the server.
--- a/docs/src/protocol-layers/noise-xx.md
+++ b/docs/src/protocol-layers/noise-xx.md
@@ -0,0 +1,227 @@
+# Noise\_XX Handshake
+
+quicnprotochat's M1 milestone used the Noise Protocol Framework for transport-layer encryption between peers over raw TCP. The implementation lives in `quicnprotochat-core/src/noise.rs` and uses the `snow 0.9` crate. Although the M3 architecture migrated client-server communication to [QUIC + TLS 1.3](quic-tls.md), the Noise\_XX transport remains in the codebase for direct peer-to-peer connections and integration testing.
+
+## The Noise\_XX pattern
+
+quicnprotochat uses the `Noise_XX_25519_ChaChaPoly_BLAKE2s` parameter set:
+
+| Component | Choice | Rationale |
+|---|---|---|
+| **Pattern** | XX | Mutual authentication with no pre-shared keys required |
+| **DH** | X25519 | 128-bit security level; fast; widely reviewed |
+| **AEAD** | ChaCha20-Poly1305 | Constant-time on all platforms (no AES-NI dependency) |
+| **Hash** | BLAKE2s | Typically faster than SHA-256 in software; 256-bit digest (128-bit collision resistance) |
+
+The XX pattern involves a three-message handshake:
+
+```text
+XX handshake (3 messages):
+  -> e                      Initiator sends ephemeral public key
+  <- e, ee, s, es           Responder replies: ephemeral, DH(ee), static key, DH(es)
+  -> s, se                  Initiator sends static key, DH(se)
+```
+
+### Message-by-message breakdown
+
+**Message 1: `-> e` (Initiator to Responder)**
+
+The initiator generates an ephemeral X25519 keypair and sends the public half. At this point, no encryption is active. The ephemeral key is sent in the clear, but it reveals nothing about the initiator's identity.
+
+**Message 2: `<- e, ee, s, es` (Responder to Initiator)**
+
+The responder:
+1. Generates its own ephemeral X25519 keypair and sends the public half (`e`).
+2. Performs `DH(e_init, e_resp)` to establish a shared secret (`ee`).
+3. Sends its static (long-term) X25519 public key encrypted under the `ee` shared secret (`s`).
+4. Performs `DH(e_init, s_resp)` for an additional shared secret (`es`).
+
+After this message, the initiator knows the responder's static key and can authenticate it.
+
+**Message 3: `-> s, se` (Initiator to Responder)**
+
+The initiator:
+1. Sends its static X25519 public key encrypted under the accumulated handshake secrets (`s`).
+2. Performs `DH(s_init, e_resp)` for the final shared secret (`se`).
+
+After this message, both parties have authenticated each other's static keys and derived a pair of symmetric transport keys (one per direction) for ChaCha20-Poly1305.
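+Each DH token pairs one party's private key with the other party's public key, and both sides arrive at the same value. The sketch below demonstrates this for `ee`, `es` and `se`. It uses a toy finite-field Diffie-Hellman modulo the prime 2^31 - 1 as a stand-in for X25519. It is deliberately insecure and for illustration only, and all names in it are hypothetical.
+
+```rust
+/// Toy finite-field Diffie-Hellman modulo the prime 2^31 - 1.
+/// INSECURE and for illustration only: it stands in for X25519.
+const P: u64 = 2_147_483_647;
+const G: u64 = 7;
+
+/// Square-and-multiply modular exponentiation (operands < 2^31, so no overflow).
+fn mod_pow(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
+    let mut result = 1;
+    base %= modulus;
+    while exp > 0 {
+        if exp & 1 == 1 {
+            result = result * base % modulus;
+        }
+        base = base * base % modulus;
+        exp >>= 1;
+    }
+    result
+}
+
+struct KeyPair {
+    private: u64,
+    public: u64,
+}
+
+fn keypair(private: u64) -> KeyPair {
+    KeyPair { private, public: mod_pow(G, private, P) }
+}
+
+/// DH(local private key, remote public key).
+fn dh(local: &KeyPair, remote_public: u64) -> u64 {
+    mod_pow(remote_public, local.private, P)
+}
+
+/// Checks that each XX token yields the same value for initiator and responder.
+fn xx_shared_values_agree() -> bool {
+    // Fixed private keys keep the example deterministic.
+    let (e_i, s_i) = (keypair(123_456), keypair(654_321)); // initiator
+    let (e_r, s_r) = (keypair(111_111), keypair(222_222)); // responder
+    let ee = dh(&e_i, e_r.public) == dh(&e_r, e_i.public);
+    let es = dh(&e_i, s_r.public) == dh(&s_r, e_i.public);
+    let se = dh(&s_i, e_r.public) == dh(&e_r, s_i.public);
+    ee && es && se
+}
+```
+
+Computing `es` requires the responder's static private key on the responder side, so mixing it into the handshake state is what authenticates the responder in message 2. `se` plays the same role for the initiator in message 3.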
+
+### Why XX
+
+The XX pattern was chosen over other Noise patterns for several reasons:
+
+- **No pre-shared keys**: Unlike IK or KK, XX does not require either party to know the other's static key before the handshake. This simplifies bootstrapping -- peers can connect to each other using only a network address.
+- **Identity hiding for the initiator**: The initiator's static key is not sent until message 3, after the session is already encrypted. An eavesdropper cannot determine who is initiating the connection.
+- **Mutual authentication**: Both parties prove possession of their static private keys through DH operations. Unlike the NK or NX patterns, neither party is anonymous.
+- **Responder identity protection (partial)**: The responder's static key is encrypted under the `ee` DH secret in message 2. This protects it from passive eavesdroppers, but not from an active attacker, who can start a handshake of its own and learn the responder's static key from message 2.
+
+## Implementation
+
+The core type is `NoiseTransport`, defined in `quicnprotochat-core/src/noise.rs`:
+
+```rust
+pub struct NoiseTransport {
+    framed: Framed<TcpStream, LengthPrefixedCodec>,
+    session: snow::TransportState,
+    remote_static: Option<Vec<u8>>,
+}
+```
+
+The struct wraps three components:
+
+1. **`framed`**: A `tokio_util::codec::Framed<TcpStream, LengthPrefixedCodec>` that handles length-prefixed byte framing over TCP. Each frame is prefixed with a 4-byte little-endian length field. See [Length-Prefixed Framing Codec](../wire-format/framing-codec.md) for details on the wire format.
+
+2. **`session`**: A `snow::TransportState` that encrypts and decrypts Noise messages. This is obtained by calling `HandshakeState::into_transport_mode()` after the three-message handshake completes.
+
+3. **`remote_static`**: The remote peer's static X25519 public key (32 bytes), captured from the `HandshakeState` before `into_transport_mode()` consumes it. This is stored explicitly because `snow` does not guarantee that `TransportState::get_remote_static()` survives the mode transition.
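+The framing described in item 1 can be sketched in a few lines. This is a hypothetical std-only illustration of a 4-byte little-endian length prefix, not the `LengthPrefixedCodec` source. The real codec additionally enforces frame size limits; see the `CoreError::Codec` row in the error table below.
+
+```rust
+/// Hypothetical sketch: prefix a payload with its length as a 4-byte
+/// little-endian integer.
+fn encode_frame(payload: &[u8]) -> Vec<u8> {
+    assert!(payload.len() <= u32::MAX as usize, "payload too long for a u32 prefix");
+    let mut frame = Vec::with_capacity(4 + payload.len());
+    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
+    frame.extend_from_slice(payload);
+    frame
+}
+
+/// Split one complete frame off the front of `buf`. Returns `None` while the
+/// buffer holds only a partial frame, leaving the bytes in place for later.
+fn decode_frame(buf: &mut Vec<u8>) -> Option<Vec<u8>> {
+    if buf.len() < 4 {
+        return None;
+    }
+    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
+    if buf.len() < 4 + len {
+        return None;
+    }
+    let payload = buf[4..4 + len].to_vec();
+    buf.drain(..4 + len);
+    Some(payload)
+}
+```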
+
+### Handshake functions
+
+Two public async functions perform the handshake:
+
+#### `handshake_initiator`
+
+```rust
+pub async fn handshake_initiator(
+    stream: TcpStream,
+    keypair: &NoiseKeypair,
+) -> Result<NoiseTransport, CoreError>
+```
+
+The initiator:
+
+1. Parses the Noise parameter string `Noise_XX_25519_ChaChaPoly_BLAKE2s` and builds a `snow::Builder` with the local private key.
+2. Wraps the TCP stream in `Framed<TcpStream, LengthPrefixedCodec>`.
+3. Allocates a scratch buffer of `NOISE_MAX_MSG` (65,535) bytes.
+4. **Message 1** (`-> e`): Calls `session.write_message(&[], &mut buf)` to produce the ephemeral key, then sends it as a length-prefixed frame.
+5. **Message 2** (`<- e, ee, s, es`): Receives a frame and calls `session.read_message()` to process it.
+6. **Message 3** (`-> s, se`): Calls `session.write_message()` again and sends the result.
+7. Zeroizes the scratch buffer as a precaution, since it held handshake data.
+8. Captures the remote static key via `session.get_remote_static()`.
+9. Transitions to transport mode via `session.into_transport_mode()`.
+
+The private key bytes are held in a `Zeroizing` wrapper and dropped immediately after `snow::Builder` clones them internally.
+
+#### `handshake_responder`
+
+```rust
+pub async fn handshake_responder(
+    stream: TcpStream,
+    keypair: &NoiseKeypair,
+) -> Result<NoiseTransport, CoreError>
+```
+
+The responder mirrors the initiator but with reversed message directions:
+
+1. Builds a `snow::Builder` with `build_responder()`.
+2. **Message 1** (`<- e`): Receives and processes the initiator's ephemeral key.
+3. **Message 2** (`-> e, ee, s, es`): Produces and sends the responder's reply.
+4. **Message 3** (`<- s, se`): Receives and processes the initiator's static key.
+5. Same zeroization, key capture, and mode transition as the initiator.
+
+Both functions return `CoreError::HandshakeIncomplete` if the peer closes the connection mid-handshake, `CoreError::Noise` for any snow error, or `CoreError::Codec` for TCP I/O failures.
+
+### Transport-layer I/O
+
+After the handshake, `NoiseTransport` provides two levels of I/O:
+
+**Frame-level** (raw bytes):
+
+- `send_frame(&mut self, plaintext: &[u8])` -- Encrypts plaintext with ChaCha20-Poly1305 (adding a 16-byte AEAD tag) and sends it as a length-prefixed frame. Rejects payloads exceeding `MAX_PLAINTEXT_LEN` (65,519 bytes -- the Noise maximum of 65,535 minus the 16-byte AEAD tag).
+- `recv_frame(&mut self)` -- Receives a length-prefixed frame and decrypts it.
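+The size limit follows directly from the Noise message cap. The following is a minimal sketch of the arithmetic and the resulting pre-check. The constant and error names mirror the text, but the `check_plaintext_len` function and the local error struct are hypothetical.
+
+```rust
+/// Noise caps every transport message at 65,535 bytes.
+const NOISE_MAX_MSG: usize = 65_535;
+/// ChaCha20-Poly1305 appends a 16-byte authentication tag to each message.
+const AEAD_TAG_LEN: usize = 16;
+/// Largest plaintext that still fits in one Noise message: 65,519 bytes.
+const MAX_PLAINTEXT_LEN: usize = NOISE_MAX_MSG - AEAD_TAG_LEN;
+
+/// Local stand-in for `CoreError::MessageTooLarge { size }`.
+#[derive(Debug, PartialEq)]
+struct MessageTooLarge {
+    size: usize,
+}
+
+/// Hypothetical pre-check mirroring the rule described for `send_frame()`.
+/// On success, returns the ciphertext length that would go on the wire.
+fn check_plaintext_len(plaintext: &[u8]) -> Result<usize, MessageTooLarge> {
+    if plaintext.len() > MAX_PLAINTEXT_LEN {
+        Err(MessageTooLarge { size: plaintext.len() })
+    } else {
+        Ok(plaintext.len() + AEAD_TAG_LEN)
+    }
+}
+```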
+
+**Envelope-level** (Cap'n Proto messages):
+
+- `send_envelope(&mut self, env: &ParsedEnvelope)` -- Serialises a `ParsedEnvelope` to Cap'n Proto wire bytes via `build_envelope()`, then calls `send_frame()`.
+- `recv_envelope(&mut self)` -- Calls `recv_frame()`, then deserialises the bytes via `parse_envelope()`.
+
+## The capnp-rpc bridge: `into_capnp_io()`
+
+The most architecturally interesting method on `NoiseTransport` is `into_capnp_io()`, which bridges the message-oriented Noise transport with the stream-oriented `capnp-rpc` library:
+
+```rust
+pub fn into_capnp_io(mut self) -> (ReadHalf<DuplexStream>, WriteHalf<DuplexStream>)
+```
+
+### Why this bridge exists
+
+`capnp-rpc`'s `twoparty::VatNetwork` expects `AsyncRead + AsyncWrite` byte streams, but `NoiseTransport` is message-based -- each `send_frame`/`recv_frame` call encrypts/decrypts one discrete Noise message. These two models are incompatible: a byte stream has no inherent message boundaries, while Noise requires them for its AEAD authentication.
+
+### How it works
+
+The bridge uses `tokio::io::duplex` to create an in-process bidirectional byte channel:
+
+```text
+  capnp-rpc         duplex pipe            NoiseTransport
+  ┌────────────┐    ┌─────────────────┐    ┌────────────────────┐
+  │ VatNetwork │◄──►│ app_stream      │◄──►│ bridge task        │◄──► TCP
+  │ (reads/    │    │ (ReadHalf +     │    │ (tokio::select!)   │
+  │  writes)   │    │  WriteHalf)     │    │                    │
+  └────────────┘    └─────────────────┘    └────────────────────┘
+```
+
+1. `into_capnp_io()` creates a `tokio::io::duplex(MAX_PLAINTEXT_LEN)` pipe.
+2. It spawns a background Tokio task that uses `tokio::select!` to shuttle data bidirectionally:
+   - **Noise -> app**: Calls `self.recv_frame()`, writes the decrypted plaintext into the pipe.
+   - **App -> Noise**: Reads bytes from the pipe, calls `self.send_frame()` to encrypt and send them.
+3. The returned `(ReadHalf, WriteHalf)` are the application ends of the pipe, suitable for passing to `VatNetwork::new()`.
+
+The bridge task runs until either side of the pipe closes. When `capnp-rpc` drops the pipe halves, the bridge exits cleanly.
+
+The pipe capacity is set to `MAX_PLAINTEXT_LEN` (65,519 bytes) so that one Noise frame's worth of plaintext can be buffered without blocking.
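+The "app -> Noise" direction of the bridge is where stream bytes acquire message boundaries. The sketch below is a simplified, blocking model of that step. It uses `std::io::Read` in place of Tokio's async pipe. `pump_app_to_noise` is a hypothetical name, and its `send_frame` closure stands in for `NoiseTransport::send_frame()`.
+
+```rust
+use std::io::{self, Read};
+
+const MAX_PLAINTEXT_LEN: usize = 65_519;
+
+/// Blocking model of the bridge's "app -> Noise" loop: each read from the
+/// pipe (at most MAX_PLAINTEXT_LEN bytes) becomes exactly one Noise frame.
+fn pump_app_to_noise<R: Read>(
+    mut pipe: R,
+    mut send_frame: impl FnMut(&[u8]) -> io::Result<()>,
+) -> io::Result<()> {
+    let mut buf = vec![0u8; MAX_PLAINTEXT_LEN];
+    loop {
+        let n = pipe.read(&mut buf)?;
+        if n == 0 {
+            return Ok(()); // the application closed its end of the pipe
+        }
+        send_frame(&buf[..n])?;
+    }
+}
+```
+
+The frame boundaries produced this way mean nothing to `capnp-rpc`. On the receiving side, the decrypted frames are written back into a pipe and read as one continuous byte stream.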
+
+## Remote static key extraction
+
+After a successful handshake, `NoiseTransport::remote_static_public_key()` returns the authenticated remote peer's X25519 public key:
+
+```rust
+pub fn remote_static_public_key(&self) -> Option<&[u8]> {
+    self.remote_static.as_deref()
+}
+```
+
+This returns `Some(&[u8])` (32 bytes) in all normal cases. `None` would indicate a snow implementation bug where the XX handshake completed without exchanging static keys.
+
+Applications use the remote static key to:
+- Verify the peer's identity against a known-good key fingerprint.
+- Index the peer in a roster or routing table.
+- Derive additional key material for application-layer protocols.
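+A fingerprint check should compare keys without stopping at the first differing byte. The following is a minimal sketch that assumes the pinned value is the raw 32-byte X25519 key rather than a hash of it. `ct_eq` and `verify_peer` are hypothetical helpers. Production code would more likely rely on a vetted constant-time crate such as `subtle`, because the compiler gives no timing guarantees for plain Rust.
+
+```rust
+/// Compare two byte strings without an early exit on the first difference.
+fn ct_eq(a: &[u8], b: &[u8]) -> bool {
+    if a.len() != b.len() {
+        return false; // lengths are public, so this branch leaks nothing secret
+    }
+    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
+}
+
+/// Hypothetical check of the handshake's remote static key against a key
+/// pinned out-of-band.
+fn verify_peer(remote_static: Option<&[u8]>, pinned: &[u8; 32]) -> bool {
+    match remote_static {
+        Some(key) => ct_eq(key, pinned),
+        None => false, // no static key: treat the peer as unauthenticated
+    }
+}
+```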
+
+## Post-quantum gap (ADR-006)
+
+The Noise transport uses classical X25519 for all Diffie-Hellman operations. There is currently no standardised PQ-Noise extension in the `snow` crate. This means:
+
+- **Handshake metadata** (ephemeral public keys sent in the clear, encrypted static keys) could be recorded by a passive attacker today. A future quantum computer could then recover the X25519 secrets and decrypt the static keys ("harvest now, decrypt later" attack).
+- **Application data** encrypted by MLS is PQ-protected from the M5 milestone onward via the [Hybrid KEM](hybrid-kem.md) layer.
+
+The residual risk (metadata exposure via handshake harvest) is accepted for M1 through M5. On the QUIC + TLS 1.3 path, the same gap exists: TLS 1.3 key exchange uses classical ECDHE. Both gaps are tracked in [ADR-006: PQ Gap in Noise Transport](../design-rationale/adr-006-pq-gap.md).
+
+## Thread safety
+
+`NoiseTransport` is `Send` but not `Clone` or `Sync`. It should be used from a single Tokio task. To share data across tasks, use channels or other message-passing mechanisms. The `Debug` implementation formats the first four bytes of the remote static key as hex for logging:
+
+```text
+NoiseTransport { remote_static: Some("a1b2c3d4…"), .. }
+```
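+A `Debug` implementation producing that output can be sketched with the standard library's `debug_struct` builder. The `NoiseTransportView` type below is hypothetical and carries only the field being printed. `finish_non_exhaustive()` emits the trailing `..` for the omitted fields.
+
+```rust
+use std::fmt;
+
+/// Hypothetical stand-in holding only the field that `Debug` prints.
+struct NoiseTransportView {
+    remote_static: Option<Vec<u8>>,
+}
+
+impl fmt::Debug for NoiseTransportView {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        // Show only the first four key bytes as hex, followed by an ellipsis.
+        let short = self.remote_static.as_ref().map(|key| {
+            let hex: String = key.iter().take(4).map(|b| format!("{b:02x}")).collect();
+            format!("{hex}…")
+        });
+        f.debug_struct("NoiseTransport")
+            .field("remote_static", &short)
+            .finish_non_exhaustive()
+    }
+}
+```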
+
+## Error handling
+
+All `NoiseTransport` methods return `Result<_, CoreError>` with these variants:
+
+| Error | Meaning |
+|---|---|
+| `CoreError::HandshakeIncomplete` | Peer closed the connection during the handshake |
+| `CoreError::Noise(snow::Error)` | Any Noise operation failed (pattern mismatch, bad DH, decryption failure) |
+| `CoreError::Codec(CodecError)` | TCP I/O failure or frame size violation |
+| `CoreError::ConnectionClosed` | Peer closed the connection during transport phase |
+| `CoreError::MessageTooLarge { size }` | Plaintext exceeds `MAX_PLAINTEXT_LEN` (65,519 bytes) |
+| `CoreError::Capnp(capnp::Error)` | Cap'n Proto serialisation error (envelope methods only) |
+
+## Further reading
+
+- [QUIC + TLS 1.3](quic-tls.md) -- The M3+ replacement for Noise\_XX on the client-server path.
+- [Cap'n Proto Serialisation and RPC](capn-proto.md) -- The serialisation layer that rides on top of the Noise transport.
+- [Length-Prefixed Framing Codec](../wire-format/framing-codec.md) -- The `LengthPrefixedCodec` used by `NoiseTransport`.
+- [X25519 Transport Keys](../cryptography/transport-keys.md) -- Key generation and management for Noise static keys.
+- [ADR-001: Noise\_XX for Transport Auth](../design-rationale/adr-001-noise-xx.md) -- Design rationale for choosing the XX pattern.
+- [ADR-006: PQ Gap in Noise Transport](../design-rationale/adr-006-pq-gap.md) -- Accepted risk of classical-only key exchange.
--- a/docs/src/protocol-layers/overview.md
+++ b/docs/src/protocol-layers/overview.md
@@ -0,0 +1,87 @@
+# Protocol Layers Overview
+
+quicnprotochat composes five distinct protocol layers into a single security stack. Each layer addresses a specific class of threat and delegates everything else to the layers above or below it. No single layer is sufficient on its own; the composition is what delivers end-to-end confidentiality, mutual authentication, forward secrecy, post-compromise security, and post-quantum resistance.
+
+This page provides a high-level comparison and a suggested reading order. The deep-dive pages that follow contain implementation details drawn directly from the source code.
+
+## Layer comparison
+
+| Layer | Standard / Spec | Crate(s) | Security Properties |
+|---|---|---|---|
+| **QUIC + TLS 1.3** | RFC 9000, RFC 9001 | `quinn 0.11`, `rustls 0.23` | Transport confidentiality, server authentication, forward secrecy; optional 0-RTT resumption (a latency optimisation, not a security property) |
+| **Noise\_XX** | [Noise Protocol Framework](https://noiseprotocol.org/noise.html) | `snow 0.9` | Mutual authentication, identity hiding, ChaCha20-Poly1305 session encryption |
+| **Cap'n Proto** | [capnproto.org specification](https://capnproto.org/encoding.html) | `capnp 0.19`, `capnp-rpc 0.19` | Zero-copy deserialisation, schema-enforced types, canonical serialisation for signing, async RPC |
+| **MLS** | [RFC 9420](https://www.rfc-editor.org/rfc/rfc9420.html) | `openmls 0.5` | Group key agreement, forward secrecy, post-compromise security (PCS) |
+| **Hybrid KEM** | [draft-ietf-tls-hybrid-design](https://datatracker.ietf.org/doc/draft-ietf-tls-hybrid-design/) | `ml-kem 0.2`, `x25519-dalek 2` | Post-quantum resistance via ML-KEM-768 combined with X25519 |
+
+## How the layers compose
+
+Data flows through the stack from top to bottom on send and from bottom to top on receive:
+
+```text
+Application plaintext
+         |
+         v
+  +-------------+
+  |     MLS     |   RFC 9420 group encryption (PrivateMessage)
+  +-------------+
+         |
+         v
+  +-------------+
+  | Cap'n Proto |   Schema-typed serialisation into Envelope frames
+  +-------------+
+         |
+         v
+  +-------------+
+  |  Noise_XX   |   Per-session ChaCha20-Poly1305 encryption (M1 TCP path)
+  +-------------+        -- OR --
+  +-------------+
+  |  QUIC+TLS   |   QUIC transport encryption (M3+ QUIC path)
+  +-------------+
+         |
+         v
+      Network
+```
+
+In the current M3 architecture, the QUIC + TLS 1.3 layer has replaced the Noise\_XX layer for client-to-server transport. The Noise\_XX implementation remains in the codebase and is used for direct peer-to-peer connections in M1-era integration tests. Both paths carry Cap'n Proto messages as their inner payload.
+
+The Hybrid KEM layer operates orthogonally: it wraps MLS payloads in an outer post-quantum encryption envelope before they enter the transport layer. It is implemented and tested but not yet integrated into the MLS ciphersuite (planned for the M5 milestone).
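+
+The hybrid envelope consists of a fixed header followed by the AEAD ciphertext: `version(1) | x25519_eph_pk(32) | mlkem_ct(1088) | nonce(12) | ct`. A minimal zero-copy parser for that layout might look like the sketch below. `HybridEnvelope` and `parse_envelope` are hypothetical names, and version checks and minimum-ciphertext-length checks are left to the caller.
+
+```rust
+/// Field sizes of the hybrid envelope header.
+pub const X25519_PK_LEN: usize = 32;
+pub const MLKEM768_CT_LEN: usize = 1088;
+pub const NONCE_LEN: usize = 12;
+pub const HEADER_LEN: usize = 1 + X25519_PK_LEN + MLKEM768_CT_LEN + NONCE_LEN;
+
+/// Hypothetical borrowed view of a hybrid envelope (no copying).
+#[derive(Debug)]
+pub struct HybridEnvelope<'a> {
+    pub version: u8,
+    pub x25519_eph_pk: &'a [u8; X25519_PK_LEN],
+    pub mlkem_ct: &'a [u8; MLKEM768_CT_LEN],
+    pub nonce: &'a [u8; NONCE_LEN],
+    pub ciphertext: &'a [u8],
+}
+
+/// Hypothetical parser: split raw bytes into the fixed-size fields.
+/// Returns `None` if the input is shorter than the header.
+pub fn parse_envelope(bytes: &[u8]) -> Option<HybridEnvelope<'_>> {
+    if bytes.len() < HEADER_LEN {
+        return None;
+    }
+    let (version, rest) = bytes.split_first()?;
+    let (pk, rest) = rest.split_at(X25519_PK_LEN);
+    let (ct, rest) = rest.split_at(MLKEM768_CT_LEN);
+    let (nonce, ciphertext) = rest.split_at(NONCE_LEN);
+    Some(HybridEnvelope {
+        version: *version,
+        x25519_eph_pk: pk.try_into().ok()?,
+        mlkem_ct: ct.try_into().ok()?,
+        nonce: nonce.try_into().ok()?,
+        ciphertext,
+    })
+}
+```
+
+The fixed header adds 1,133 bytes to every MLS payload, most of it the ML-KEM-768 ciphertext.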
+
+## Suggested reading order
+
+The pages in this section are ordered to build understanding incrementally:
+
+1. **[QUIC + TLS 1.3](quic-tls.md)** -- Start here. This is the outermost transport layer that every client-server connection uses today. Understanding QUIC stream multiplexing and the TLS 1.3 handshake is prerequisite to understanding how Cap'n Proto RPC rides on top.
+
+2. **[MLS (RFC 9420)](mls.md)** -- The core cryptographic layer. MLS provides the group key agreement that makes quicnprotochat an E2E encrypted group messenger rather than just a transport-encrypted relay. This is the longest and most detailed page.
+
+3. **[Cap'n Proto Serialisation and RPC](capn-proto.md)** -- The serialisation and RPC layer that bridges MLS application data with the transport. Understanding the Envelope schema, the ParsedEnvelope owned type, and the NodeService RPC interface is essential for reading the server and client source code.
+
+4. **[Noise\_XX Handshake](noise-xx.md)** -- The M1-era transport encryption layer. Even though QUIC has replaced it for client-server communication, the Noise\_XX code remains in the codebase and the design decisions it embodies (mutual authentication, identity hiding) inform the overall architecture.
+
+5. **[Hybrid KEM: X25519 + ML-KEM-768](hybrid-kem.md)** -- The post-quantum encryption layer. Read this last because it builds on concepts from all other layers: key encapsulation (from MLS), wire format conventions (from Cap'n Proto), and AEAD encryption (from Noise).
+
+## Cross-cutting concerns
+
+Several topics span multiple layers and have their own dedicated pages elsewhere in this book:
+
+- **Forward secrecy**: Provided by MLS epoch ratcheting. See [Forward Secrecy](../cryptography/forward-secrecy.md).
+- **Post-compromise security**: Provided by MLS Update proposals. See [Post-Compromise Security](../cryptography/post-compromise-security.md).
+- **Post-quantum readiness**: Currently provided by the standalone Hybrid KEM module; integration into MLS is planned for M5. See [Post-Quantum Readiness](../cryptography/post-quantum-readiness.md).
+- **Key lifecycle and zeroization**: Private key material is zeroized after use across all layers. See [Key Lifecycle and Zeroization](../cryptography/key-lifecycle.md).
+- **Wire format details**: The length-prefixed framing codec and Cap'n Proto schema definitions are documented in the [Wire Format Reference](../wire-format/overview.md) section.
+- **Design rationale**: The ADR pages explain *why* each layer was chosen. See [Design Decisions Overview](../design-rationale/overview.md).
+
+## Crate mapping
+
+Each protocol layer maps to one or more workspace crates:
+
+| Layer | Primary Crate | Source File(s) |
+|---|---|---|
+| QUIC + TLS 1.3 | `quicnprotochat-server`, `quicnprotochat-client` | `main.rs` (server), `lib.rs` (client) |
+| Noise\_XX | `quicnprotochat-core` | `src/noise.rs`, `src/codec.rs` |
+| Cap'n Proto | `quicnprotochat-proto` | `src/lib.rs`, `build.rs`, `schemas/*.capnp` |
+| MLS | `quicnprotochat-core` | `src/group.rs`, `src/keystore.rs` |
+| Hybrid KEM | `quicnprotochat-core` | `src/hybrid_kem.rs` |
+
+For a full crate responsibility breakdown, see [Crate Responsibilities](../architecture/crate-responsibilities.md).
--- a/docs/src/protocol-layers/quic-tls.md
+++ b/docs/src/protocol-layers/quic-tls.md
@@ -0,0 +1,177 @@
+# QUIC + TLS 1.3
+
+quicnprotochat uses QUIC (RFC 9000) with mandatory TLS 1.3 (RFC 9001) as its client-to-server transport layer. This page explains why QUIC was chosen over raw TCP, how the `quinn` and `rustls` crates are integrated, and what security properties the transport provides.
+
+## Why QUIC over raw TCP
+
+The M1 milestone used raw TCP sockets with a Noise\_XX handshake for transport encryption (see [Noise\_XX Handshake](noise-xx.md)). Starting from M3, the project migrated to QUIC for several reasons:
+
+| Property | Raw TCP + Noise | QUIC + TLS 1.3 |
+|---|---|---|
+| **Multiplexed streams** | Single stream; application must multiplex manually | Native bidirectional streams (the current RPC setup runs over one stream per connection) |
+| **0-RTT resumption** | Not available; full handshake every time | Supported by the protocol; when enabled, returning clients can send data in the first flight, but that early data is not forward-secret and can be replayed |
+| **Head-of-line blocking** | A lost TCP segment blocks all subsequent data | Only the affected stream is blocked; other streams proceed (this helps only when traffic is spread across several streams) |
+| **NAT rebinding** | An address change (e.g. NAT rebinding) breaks the connection | UDP-based; connection IDs let a connection survive NAT rebinding and client address changes |
+| **TLS integration** | Separate Noise handshake layered on top of TCP | TLS 1.3 is integral to the QUIC handshake; no extra round-trips |
+| **RPC integration** | Custom framing codec required | `capnp-rpc` runs directly on a QUIC bidirectional stream via the `tokio-util` compat layer |
+
+The migration also simplified the codebase: the custom `LengthPrefixedCodec` framing layer and the `into_capnp_io()` bridge (documented in [Noise\_XX Handshake](noise-xx.md)) are no longer needed on the QUIC path because `capnp-rpc` reads and writes directly on the QUIC stream.
+
+## Crate integration
+
+quicnprotochat uses the following crates for QUIC and TLS:
+
+- **`quinn 0.11`** -- The async QUIC implementation for Tokio. Provides `Endpoint`, `Connection`, and bidirectional stream types.
+- **`quinn-proto 0.11`** -- The protocol-level types, including `QuicServerConfig` and `QuicClientConfig` wrappers that bridge `rustls` into `quinn`.
+- **`rustls 0.23`** -- The TLS implementation. quicnprotochat uses it in strict TLS 1.3 mode with no fallback to TLS 1.2.
+- **`rcgen 0.13`** -- Self-signed certificate generation for development and testing.
+
+### Server configuration
+
+The server builds its QUIC endpoint configuration in `build_server_config()` (in `quicnprotochat-server/src/main.rs`):
+
+```rust
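+use std::sync::Arc;
+use quinn::{crypto::rustls::QuicServerConfig, ServerConfig};
+use rustls::version::TLS13;
+
+// TLS 1.3 only; no client certificates (clients authenticate via MLS credentials).
+let mut tls = rustls::ServerConfig::builder_with_protocol_versions(&[&TLS13])
+    .with_no_client_auth()
+    .with_single_cert(cert_chain, key)?;
+// ALPN identifier; must match the client's list or the handshake fails.
+tls.alpn_protocols = vec![b"capnp".to_vec()];
+
+let crypto = QuicServerConfig::try_from(tls)?;
+Ok(ServerConfig::with_crypto(Arc::new(crypto)))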
+```
+
+Key points:
+
+1. **TLS 1.3 strict mode**: `builder_with_protocol_versions(&[&TLS13])` ensures no TLS 1.2 fallback. This is a hard requirement: QUIC itself mandates TLS 1.3 (RFC 9001), and TLS 1.3 removes the non-forward-secret key exchanges, such as static RSA, that TLS 1.2 still permits.
+
+2. **No client certificate authentication**: `with_no_client_auth()` means the server does not verify client certificates at the TLS layer. Client authentication is handled at the application layer via Ed25519 identity keys and MLS credentials. This is a deliberate design choice. MLS credentials authenticate group members to each other end to end, whereas a TLS client certificate would only authenticate the client to the server.
+
+3. **ALPN negotiation**: The Application-Layer Protocol Negotiation extension is set to `b"capnp"`, advertising that this endpoint speaks Cap'n Proto RPC. Both client and server must agree on this protocol identifier or the TLS handshake fails.
+
+4. **`QuicServerConfig` bridge**: The `quinn-proto` crate provides `QuicServerConfig::try_from(tls)` to adapt the `rustls::ServerConfig` for use with QUIC. This handles the QUIC-specific TLS parameters (transport parameters, QUIC header protection keys) automatically.
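+
+The ALPN rule in point 3 can be modelled in a few lines. This is a simplified model of RFC 7301 selection, assuming the server picks the first entry in its own preference list that the client also offered. `select_alpn` is a hypothetical helper, not part of `rustls`, and a `None` result corresponds to a failed handshake.
+
+```rust
+/// Hypothetical model of ALPN selection (server-preference order).
+/// Returns the agreed protocol, or `None` if there is no overlap.
+pub fn select_alpn<'a>(server: &'a [Vec<u8>], client_offer: &[Vec<u8>]) -> Option<&'a [u8]> {
+    server
+        .iter()
+        .find(|proto| client_offer.contains(*proto))
+        .map(|proto| proto.as_slice())
+}
+```
+
+With the server advertising only `b"capnp"`, a client that offers `h3` and `capnp` negotiates `capnp`, while a client that offers only `h3` gets no protocol and the connection is refused.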
+
+### Client configuration
+
+The client performs the mirror operation. It loads the server's DER-encoded certificate from a local file and constructs a `rustls::ClientConfig`:
+
+```rust
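+use quinn::crypto::rustls::QuicClientConfig;
+use rustls::pki_types::CertificateDer;
+use rustls::version::TLS13;
+
+// Trust only the server's self-signed certificate (raw DER bytes read from disk).
+let mut roots = rustls::RootCertStore::empty();
+roots.add(CertificateDer::from(cert_bytes))?;
+
+let mut tls = rustls::ClientConfig::builder_with_protocol_versions(&[&TLS13])
+    .with_root_certificates(roots)
+    .with_no_client_auth();
+// Must match the server's ALPN identifier.
+tls.alpn_protocols = vec![b"capnp".to_vec()];
+
+let crypto = QuicClientConfig::try_from(tls)?;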
+```
+
+The client trusts exactly one certificate: the server's self-signed cert loaded from disk. There is no system trust store involved, which simplifies the trust model but requires out-of-band distribution of the server certificate.
+
+### Per-connection handling
+
+Each accepted QUIC connection spawns a handler task:
+
+```rust
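+use capnp_rpc::{rpc_twoparty_capnp::Side, twoparty, RpcSystem};
+use tokio_util::compat::{TokioAsyncReadCompatExt, TokioAsyncWriteCompatExt};
+
+// Accept the client's bidirectional stream; all RPC traffic runs over it.
+let (send, recv) = connection.accept_bi().await?;
+// Adapt Quinn's tokio-style streams to the futures-io traits capnp-rpc expects.
+let (reader, writer) = (recv.compat(), send.compat_write());
+
+let network = twoparty::VatNetwork::new(reader, writer, Side::Server, Default::default());
+// Bootstrap capability handed to the client.
+let service: node_service::Client = capnp_rpc::new_client(NodeServiceImpl { store, waiters });
+RpcSystem::new(Box::new(network), Some(service.client)).await?;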
+```
+
+The `tokio-util` compat layer (`compat()` and `compat_write()`) converts Quinn's `RecvStream` and `SendStream` into types that implement `futures::AsyncRead` and `futures::AsyncWrite`, which `capnp-rpc`'s `VatNetwork` requires. The entire Cap'n Proto RPC system then runs over this single QUIC bidirectional stream.
+
+Because `capnp-rpc` uses `Rc<RefCell<>>` internally (making it `!Send`), all RPC tasks run on a `tokio::task::LocalSet`. The server spawns each connection handler via `tokio::task::spawn_local`.
+
+## Certificate trust model
+
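+quicnprotochat currently uses **self-signed certificates that clients pin explicitly**. This is closer to manual pinning than to true trust-on-first-use (TOFU), because the client never learns the certificate from the connection itself: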
+
+1. On first start, the server generates a self-signed certificate using `rcgen::generate_simple_self_signed` with SANs for `localhost`, `127.0.0.1`, and `::1`.
+2. The certificate and private key are persisted to disk as DER files (default: `data/server-cert.der` and `data/server-key.der`).
+3. Clients must obtain the server's certificate file out-of-band and reference it via the `--ca-cert` flag or `QUICNPROTOCHAT_CA_CERT` environment variable.
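+
+Steps 1 and 2 amount to a load-or-create routine. The sketch below assumes the server regenerates only when either DER file is missing. `load_or_create` is a hypothetical name, and the `generate` closure stands in for the `rcgen`-based `generate_self_signed()`.
+
+```rust
+use std::fs;
+use std::io;
+use std::path::Path;
+
+/// Hypothetical helper: load the DER certificate and key if both exist;
+/// otherwise call `generate` and persist the result for the next start.
+pub fn load_or_create<F>(cert_path: &Path, key_path: &Path, generate: F) -> io::Result<(Vec<u8>, Vec<u8>)>
+where
+    F: FnOnce() -> (Vec<u8>, Vec<u8>),
+{
+    if cert_path.exists() && key_path.exists() {
+        return Ok((fs::read(cert_path)?, fs::read(key_path)?));
+    }
+    let (cert_der, key_der) = generate();
+    // Make sure the data directory exists before writing either file.
+    for path in [cert_path, key_path] {
+        if let Some(dir) = path.parent() {
+            fs::create_dir_all(dir)?;
+        }
+    }
+    fs::write(cert_path, &cert_der)?;
+    fs::write(key_path, &key_der)?;
+    Ok((cert_der, key_der))
+}
+```
+
+On Unix, a production version should also create the key file with owner-only permissions (e.g. `0600`), because plain `fs::write` applies the process umask.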
+
+This model is adequate for development and single-server deployments. The roadmap includes:
+
+- **ACME integration** (Let's Encrypt) for production deployments with publicly-routable servers.
+- **Certificate pinning** to detect MITM attacks even when a CA is compromised.
+- **Certificate transparency** log monitoring for detecting misissued certificates.
+
+## Self-signed certificate generation
+
+The server's `generate_self_signed()` function:
+
+```rust
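+use rcgen::generate_simple_self_signed;
+use std::fs;
+
+// DNS and IP SANs, so clients can connect by hostname or by address.
+let subject_alt_names = vec![
+    "localhost".to_string(),
+    "127.0.0.1".to_string(),
+    "::1".to_string(),
+];
+let issued = generate_simple_self_signed(subject_alt_names)?;
+
+// Persist both halves as raw DER for reuse on the next start.
+fs::write(cert_path, issued.cert.der())?;
+fs::write(key_path, issued.key_pair.serialize_der())?;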
+```
+
+The generated certificate includes both DNS and IP SANs so that clients can connect using either `localhost` or an IP address. The client specifies the expected server name via `--server-name` (default: `localhost`), which must match one of the certificate's SANs.
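+
+The SAN check can be modelled as shown below. This simplified sketch assumes that names parsing as IP addresses become IP SANs and that everything else becomes a DNS SAN. It performs exact, case-insensitive matching only, with no wildcards. `San`, `classify`, and `server_name_matches` are hypothetical; in practice `rustls` performs this check inside its certificate verifier.
+
+```rust
+use std::net::IpAddr;
+
+/// Hypothetical SAN representation.
+#[derive(Debug, PartialEq, Eq)]
+pub enum San {
+    Dns(String),
+    Ip(IpAddr),
+}
+
+/// Assumed classification: anything that parses as an IP is an IP SAN.
+pub fn classify(name: &str) -> San {
+    match name.parse::<IpAddr>() {
+        Ok(ip) => San::Ip(ip),
+        Err(_) => San::Dns(name.to_ascii_lowercase()),
+    }
+}
+
+/// Simplified check: the expected server name must equal one of the SANs
+/// (DNS compared case-insensitively, IPs compared by value).
+pub fn server_name_matches(server_name: &str, sans: &[San]) -> bool {
+    let wanted = classify(server_name);
+    sans.iter().any(|san| *san == wanted)
+}
+```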
+
+## Security properties
+
+The QUIC + TLS 1.3 layer provides:
+
+| Property | Mechanism |
+|---|---|
+| **Transport confidentiality** | All application data is encrypted with an AEAD cipher suite negotiated during the TLS handshake (rustls's TLS 1.3 defaults: AES-128-GCM, AES-256-GCM, or ChaCha20-Poly1305) |
+| **Server authentication** | The client verifies the server's certificate against the locally-trusted DER file |
+| **Forward secrecy** | TLS 1.3 full handshakes use ephemeral (EC)DHE key exchange; session keys are not derivable from the server's long-term key (0-RTT early data, if enabled, is not forward-secret) |
+| **Replay protection** | AEAD-protected QUIC packet numbers let each endpoint discard duplicated or replayed packets within a connection; 0-RTT early data, if enabled, can be replayed across connections |
+| **Connection migration** | QUIC connection IDs allow the client to change IP addresses without re-handshaking |
+
+### What TLS does *not* provide
+
+- **Client authentication**: Handled by MLS identity credentials at the application layer. See [MLS (RFC 9420)](mls.md).
+- **End-to-end encryption**: TLS terminates at the server. The server can read the Cap'n Proto RPC framing and message routing metadata. Payload confidentiality is provided by MLS. See [MLS (RFC 9420)](mls.md).
+- **Post-quantum resistance**: TLS 1.3 key exchange uses classical ECDHE. Post-quantum protection of application data is provided by the [Hybrid KEM](hybrid-kem.md) layer (M5 milestone).
+- **Mutual peer authentication**: For peer-to-peer scenarios, the M1-era [Noise\_XX](noise-xx.md) transport provides mutual authentication with identity hiding.
+
+## Comparison with Noise\_XX (M1 approach)
+
+| Aspect | Noise\_XX (M1) | QUIC + TLS 1.3 (M3+) |
+|---|---|---|
+| **Transport** | Raw TCP | UDP (QUIC) |
+| **Handshake** | 3-message Noise XX pattern | TLS 1.3 (1-RTT; 0-RTT on resumption if enabled) |
+| **Mutual auth** | Both peers authenticate static X25519 keys | Server-only at TLS layer; mutual auth via MLS |
+| **Identity hiding** | Initiator's identity hidden until message 3 | Server certificate is encrypted, but SNI reveals the server name; clients present no identity at the TLS layer |
+| **Stream multiplexing** | None (single stream) | Native QUIC streams |
+| **RPC bridge** | `into_capnp_io()` with `tokio::io::duplex` | Direct `compat()` wrapper on QUIC stream |
+| **Codebase location** | `quicnprotochat-core/src/noise.rs` | `quicnprotochat-server/src/main.rs`, client `lib.rs` |
+
+The Noise\_XX path remains useful for direct peer-to-peer connections (without a central server) and as a fallback transport. Both paths carry identical Cap'n Proto message payloads, so the application layer is transport-agnostic.
+
+## Configuration reference
+
+### Server
+
+| Environment Variable | CLI Flag | Default | Description |
+|---|---|---|---|
+| `QUICNPROTOCHAT_LISTEN` | `--listen` | `0.0.0.0:7000` | QUIC listen address |
+| `QUICNPROTOCHAT_TLS_CERT` | `--tls-cert` | `data/server-cert.der` | TLS certificate path |
+| `QUICNPROTOCHAT_TLS_KEY` | `--tls-key` | `data/server-key.der` | TLS private key path |
+| `QUICNPROTOCHAT_DATA_DIR` | `--data-dir` | `data` | Persistent storage directory |
+
+### Client
+
+| Environment Variable | CLI Flag | Default | Description |
+|---|---|---|---|
+| `QUICNPROTOCHAT_CA_CERT` | `--ca-cert` | `data/server-cert.der` | Server certificate to trust |
+| `QUICNPROTOCHAT_SERVER_NAME` | `--server-name` | `localhost` | Expected TLS server name (must match certificate SAN) |
+| `QUICNPROTOCHAT_SERVER` | `--server` | `127.0.0.1:7000` | Server address (per-subcommand) |
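+
+When a setting can come from both a flag and an environment variable, the usual convention (and clap's behaviour for arguments declared with `env = ...`) is flag first, then environment, then default. The sketch below assumes that order. `resolve` and `resolve_from_env` are hypothetical helpers, and a config-file layer is not modelled.
+
+```rust
+/// Hypothetical resolver: CLI flag, then environment variable, then default.
+pub fn resolve(cli: Option<&str>, env: Option<&str>, default: &str) -> String {
+    cli.or(env).unwrap_or(default).to_string()
+}
+
+/// Hypothetical convenience wrapper that reads the variable by name.
+pub fn resolve_from_env(cli: Option<&str>, env_var: &str, default: &str) -> String {
+    let env_value = std::env::var(env_var).ok();
+    resolve(cli, env_value.as_deref(), default)
+}
+```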
+
+## Further reading
+
+- [Noise\_XX Handshake](noise-xx.md) -- The M1-era transport layer that QUIC replaced.
+- [Cap'n Proto Serialisation and RPC](capn-proto.md) -- The RPC layer that runs on top of QUIC streams.
+- [Service Architecture](../architecture/service-architecture.md) -- How the server's `NodeServiceImpl` binds to the QUIC endpoint.
+- [ADR-006: PQ Gap in Noise Transport](../design-rationale/adr-006-pq-gap.md) -- Discusses the post-quantum gap in both the Noise and TLS transport layers.