feat: add post-quantum hybrid KEM + SQLCipher persistence

Feature 1 — Post-Quantum Hybrid KEM (X25519 + ML-KEM-768):
- Create hybrid_kem.rs with keygen, encrypt, decrypt + 11 unit tests
- Wire format: version(1) | x25519_eph_pk(32) | mlkem_ct(1088) | nonce(12) | ct
- Add uploadHybridKey/fetchHybridKey RPCs to node.capnp schema
- Server: hybrid key storage in FileBackedStore + RPC handlers
- Client: hybrid keypair in StoredState, auto-wrap/unwrap in send/recv/invite/join
- demo-group runs full hybrid PQ envelope round-trip

Feature 2 — SQLCipher Persistence:
- Extract Store trait from FileBackedStore API
- Create SqlStore (rusqlite + bundled-sqlcipher) with encrypted-at-rest SQLite
- Schema: key_packages, deliveries, hybrid_keys tables with indexes
- Server CLI: --store-backend=sql, --db-path, --db-key flags
- 5 unit tests for SqlStore (FIFO, round-trip, upsert, channel isolation)

Also includes: client lib.rs refactor, auth config, TOML config file support,
mdBook documentation, and various cleanups by user.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 08:07:48 +01:00
parent d1ddef4cea
commit f334ed3d43
81 changed files with 14502 additions and 2289 deletions

# Auth, Devices, and Tokens
This page describes the authentication, device management, and authorisation
design for quicnprotochat. It introduces account and device identities, gates
server operations by authenticated identity, enforces rate and size limits, and
binds MLS identity keys to accounts.
This design cuts across milestones M4 through M6. For the broader production
readiness plan, see [Production Readiness WBS](production-readiness.md).
---
## Goals
1. **Introduce accounts and devices** with authenticated access to `NodeService`.
2. **Gate operations by identity:** enqueue/fetch/fetchWait require a valid token
bound to the caller's account and device.
3. **Enforce rate and size limits** per account, per device, and per IP.
4. **Bind MLS identity keys to accounts:** a KeyPackage upload must be associated
with the uploading account, preventing impersonation.
5. **Keep wire changes minimal and versioned:** the `Auth` struct is additive
and uses a version field for backward compatibility.
---
## Data Model (Server)
### Accounts
| Field | Type | Description |
|-------|------|-------------|
| `account_id` | UUID | Unique account identifier |
| `created_at` | Timestamp | Account creation time |
| `status` | Enum | `active`, `suspended`, `deleted` |
### Devices
| Field | Type | Description |
|-------|------|-------------|
| `device_id` | UUID | Unique device identifier |
| `account_id` | UUID | Owning account (foreign key) |
| `device_pubkey` | Ed25519 public key (32 bytes) | Device signing key |
| `created_at` | Timestamp | Device registration time |
| `status` | Enum | `active`, `revoked` |
### Sessions / Tokens
| Field | Type | Description |
|-------|------|-------------|
| `session_id` | UUID | Unique session identifier |
| `account_id` | UUID | Owning account |
| `device_id` | UUID | Originating device |
| `access_token` | Opaque bytes | Short-lived bearer token |
| `refresh_token` | Opaque bytes | Long-lived token for renewal |
| `expires_at` | Timestamp | Access token expiry |
| `created_at` | Timestamp | Session creation time |
### Identity Binding
| Field | Type | Description |
|-------|------|-------------|
| `account_id` | UUID | Owning account |
| `mls_identity_key` | Ed25519 public key (32 bytes) | MLS credential public key |
| `verified_fp` | SHA-256 fingerprint (32 bytes) | Fingerprint of the bound key |
The identity binding table ensures that only the account that registered an
Ed25519 public key can upload KeyPackages for that key. This prevents a
compromised or malicious client from uploading KeyPackages under another
account's identity.
---
## Wire / API Changes
### Auth Struct
A new `Auth` struct is added to all `NodeService` RPC methods:
```capnp
struct Auth {
version @0 :UInt16; # 0 = legacy (no auth), 1 = token-based
accessToken @1 :Data; # opaque bearer token
deviceId @2 :Data; # optional UUID (16 bytes) for audit/rate limit
}
```
The `Auth` struct is included as a parameter in `enqueue`, `fetch`, `fetchWait`,
`uploadKeyPackage`, and `fetchKeyPackage`.
### Versioning
| Version | Meaning |
|---------|---------|
| 0 | Legacy mode: no authentication. The server may allow-list legacy calls in development but defaults to rejecting them in production. |
| 1 | Token-based authentication. `accessToken` is required and validated. |
The server rejects any `version` value higher than its current maximum. This
ensures that a newer client connecting to an older server fails cleanly rather
than silently skipping auth.
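As an illustration, the version gate described above can be sketched as follows. This is a minimal sketch, not the project's actual code; `MAX_AUTH_VERSION`, `AuthError`, and the function name are illustrative assumptions:

```rust
// Hypothetical server-side version gate. The constant and error variants
// are assumptions for illustration, not quicnprotochat identifiers.
const MAX_AUTH_VERSION: u16 = 1;

#[derive(Debug, PartialEq)]
enum AuthError {
    AuthenticationRequired,     // version 0 in production mode
    UnsupportedVersion(u16),    // newer client than server
}

fn check_auth_version(version: u16, production: bool) -> Result<(), AuthError> {
    match version {
        0 if production => Err(AuthError::AuthenticationRequired),
        0 => Ok(()), // legacy mode is permitted in development
        v if v > MAX_AUTH_VERSION => Err(AuthError::UnsupportedVersion(v)),
        _ => Ok(()),
    }
}
```

Rejecting unknown versions explicitly (rather than ignoring the field) is what makes the new-client/old-server failure mode clean.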
### Optional Device ID
The `deviceId` field is optional. When present, the server uses it for:
- Per-device rate limiting (in addition to per-account limits).
- Audit logging (which device performed which operation).
- Future: device revocation without revoking the entire account.
---
## Server Enforcement
### Token Validation
1. Extract `Auth` struct from the incoming RPC.
2. If `version == 0` and server is in production mode, reject with
`AUTHENTICATION_REQUIRED`.
3. If `version == 1`, validate `accessToken`:
- Token must exist in the session store.
- Token must not be expired (`expires_at > now`).
- Associated account must have `status == active`.
- Associated device (if `deviceId` present) must have `status == active`.
4. Map validated token to `(account_id, device_id)` for downstream authorisation.
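The four validation steps above can be condensed into a single lookup-and-check function. This is a sketch against a hypothetical in-memory session store; field names and the `Status` enum mirror the data model tables but are not the project's actual types:

```rust
use std::collections::HashMap;
use std::time::SystemTime;

// Illustrative session record mirroring the Sessions/Tokens table.
#[derive(Clone, Copy, PartialEq)]
enum Status { Active, Suspended }

struct Session {
    account_id: [u8; 16],
    device_id: [u8; 16],
    expires_at: SystemTime,
    account_status: Status,
    device_status: Status,
}

/// Maps a valid access token to (account_id, device_id); None means reject.
fn validate_token(
    sessions: &HashMap<Vec<u8>, Session>,
    token: &[u8],
    now: SystemTime,
) -> Option<([u8; 16], [u8; 16])> {
    let s = sessions.get(token)?;                          // token must exist
    if s.expires_at <= now { return None; }                // expires_at > now
    if s.account_status != Status::Active { return None; } // account active
    if s.device_status != Status::Active { return None; }  // device active
    Some((s.account_id, s.device_id))                      // step 4 mapping
}
```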
### Identity Matching
- **uploadKeyPackage:** The `identityKey` in the RPC must match an identity
binding for the authenticated account. Reject with `IDENTITY_MISMATCH` if the
key is not bound to the caller's account.
- **fetchKeyPackage:** No identity restriction (any authenticated client can
fetch any identity's KeyPackage -- this is required for the MLS add-member flow).
- **enqueue:** If `channelId` is present, the caller's identity must be in the
channel membership. If `channelId` is absent (legacy mode), the operation is
allowed for any authenticated client.
- **fetch / fetchWait:** The `recipientKey` must correspond to an identity bound
to the caller's account.
### Rate Limits
| Limit | Scope | Default |
|-------|-------|---------|
| Request rate | Per IP | 50 requests/second |
| Request rate | Per account | 50 requests/second |
| Request rate | Per device | 50 requests/second |
| Payload size | Per RPC call | 5 MB |
| KeyPackage TTL | Per package | 24 hours |
| KeyPackage uploads | Per account | Configurable (prevents store exhaustion) |
Rate limit counters use a sliding window. When a limit is exceeded, the server
responds with `RATE_LIMITED` and includes a `Retry-After` hint.
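A sliding-window counter of the kind described can be sketched with a deque of request timestamps. This is an illustrative single-scope limiter (one counter per IP, account, or device would be kept in a map); it is not the project's implementation:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Minimal sliding-window rate limiter sketch: at most `limit` requests
/// within any trailing `window`.
struct SlidingWindow {
    window: Duration,
    limit: usize,
    hits: VecDeque<Instant>,
}

impl SlidingWindow {
    fn new(limit: usize, window: Duration) -> Self {
        Self { window, limit, hits: VecDeque::new() }
    }

    /// Returns true if the request is allowed, false if it should be
    /// rejected with RATE_LIMITED.
    fn allow(&mut self, now: Instant) -> bool {
        // Drop hits that have slid out of the window.
        while let Some(&t) = self.hits.front() {
            if now.duration_since(t) >= self.window {
                self.hits.pop_front();
            } else {
                break;
            }
        }
        if self.hits.len() < self.limit {
            self.hits.push_back(now);
            true
        } else {
            false
        }
    }
}
```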
### Audit Logging
The following events are logged at audit level:
- Authentication success (account, device, IP).
- Authentication failure (reason, IP).
- Token issuance and refresh (account, device).
- KeyPackage upload (account, identity key fingerprint).
- Enqueue (account, channel, recipient).
- Fetch / fetchWait (account, recipient).
- Rate limit exceeded (scope, account/IP, current rate).
All audit log entries include a timestamp and correlation ID. Sensitive fields
(token values, ciphertext, private keys) are never logged.
---
## Client Changes
### Login / Register Flow
1. **Register:** Client generates an Ed25519 identity keypair, sends the public
key to the server. Server creates an account, binds the identity key, and
returns an `(access_token, refresh_token)` pair.
2. **Login:** Client presents credentials (initially: signed challenge from
device key). Server validates and issues tokens.
3. **Token storage:** Access and refresh tokens stored in the client state file
(same location as identity keypair). The state file should be
permission-restricted (`0600`).
4. **Token refresh:** Client detects `TOKEN_EXPIRED` errors and uses the refresh
token to obtain a new access token without re-authenticating.
### RPC Integration
Every RPC call includes the `Auth` struct:
```rust
// Pseudocode for client RPC calls
let auth = Auth {
version: 1,
access_token: state.access_token.clone(),
device_id: Some(state.device_id),
};
node_service.enqueue(auth, recipient_key, channel_id, payload).await?;
```
### Identity Binding
At registration, the client's Ed25519 public key is bound to the new account.
The client must refuse to upload KeyPackages if the local identity key does not
match the bound key -- this prevents accidental identity confusion after key
rotation.
---
## Compatibility
### Wire Version Field
The `Auth` struct includes its own `version` field, independent of the delivery
message version. This allows auth changes to evolve separately from the delivery
protocol.
### Legacy Support
- `version == 0`: No auth. Server behaviour is configurable:
- **Development:** Allow legacy calls (default for `cargo run`).
- **Production:** Reject legacy calls (default for Docker deployment).
- `version == 1`: Full auth. This is the target for M4+.
### N-1 Integration Tests
Compatibility testing covers:
- New client (v1 auth) against new server -- expected: full auth flow works.
- Old client (v0 legacy) against new server in dev mode -- expected: legacy
calls succeed.
- Old client (v0 legacy) against new server in prod mode -- expected: clean
rejection with `AUTHENTICATION_REQUIRED`.
- New client (v1 auth) against old server -- expected: server ignores unknown
`Auth` struct fields; operations succeed if server does not enforce auth.
---
## Implementation Sequence
1. Extend Cap'n Proto schemas with the `Auth` struct and add it to all
`NodeService` methods.
2. Implement token validation middleware in server RPC handlers; add an in-memory
token store (upgradeable to SQLite at M6).
3. Bind `identityKey` to account on upload; enforce on fetch/enqueue.
4. Add tests: unit tests for token validation; integration tests for auth
success and failure paths.
5. Add rate limiting middleware with configurable thresholds.
6. Add audit logging for all auth-related events.
---
## Cross-references
- [Milestones](milestones.md) -- M4 and M6 deliverables
- [Production Readiness WBS](production-readiness.md) -- Phase 3 (Auth/Device/Server Hardening)
- [1:1 Channel Design](dm-channels.md) -- channel-level authz
- [Wire Format: NodeService Schema](../wire-format/node-service-schema.md) -- RPC schema
- [Coding Standards](../contributing/coding-standards.md) -- security-by-design requirements

# 1:1 Channel Design
This page describes the design for first-class 1:1 (direct message) channels in
quicnprotochat. Channels provide per-conversation authorisation, MLS-encrypted
payloads, message retention with TTL eviction, and backward compatibility with
the legacy delivery model.
For the broader roadmap context, see [Milestones](milestones.md) and
[Production Readiness WBS](production-readiness.md) (Phase 4).
---
## Goals
1. **First-class 1:1 channels.** Each conversation between two participants has
a unique `channelId`, enabling per-channel authorisation, storage, and
eviction.
2. **Per-channel authorisation.** The server enforces that only the two channel
members can enqueue and fetch messages for a given channel.
3. **MLS-encrypted payloads.** All message content is MLS ciphertext. The server
never sees plaintext. Channel metadata (ID + participant keys) is the only
information the server holds.
4. **7-day message retention.** Messages older than 7 days are evicted. This is
configurable but defaults to 7 days.
5. **24-hour KeyPackage TTL.** KeyPackages expire after 24 hours. Clients must
rotate KeyPackages before expiry to remain reachable.
---
## Schema Changes (Cap'n Proto)
### New Fields
The following fields are added to the existing `NodeService` RPC methods:
| RPC Method | New Field | Type | Description |
|------------|-----------|------|-------------|
| `enqueue` | `channelId` | `Data` (UUID, 16 bytes) | Target channel |
| `fetch` | `channelId` | `Data` (UUID, 16 bytes) | Channel to fetch from |
| `fetchWait` | `channelId` | `Data` (UUID, 16 bytes) | Channel to long-poll |
| All messages | `version` | `UInt16` | Wire version for forward compat |
### Version Field
The `version` field on delivery messages allows the server to reject messages
with unknown versions. The current version is `1`. Clients that do not set
`channelId` are treated as version `0` (legacy mode).
### New RPC Method
A new `createChannel` method is added to `NodeService`:
```capnp
createChannel @N (
auth :Auth,
peerKey :Data # Ed25519 public key of the other participant
) -> (
channelId :Data # UUID, 16 bytes
);
```
The server generates the `channelId`, stores the membership, and returns the ID
to the caller. The peer discovers the channel when they receive a message
addressed to it (or via a separate discovery mechanism in a future milestone).
---
## AuthZ Model
### Channel Membership
Each channel has exactly two members, identified by their Ed25519 public keys:
```
Channel {
channelId: UUID (16 bytes)
members: {a_key: Ed25519PubKey, b_key: Ed25519PubKey}
created_at: Timestamp
}
```
The server stores this mapping and enforces it on every operation.
### Enqueue Authorisation
When a client calls `enqueue(auth, channelId, recipientKey, payload)`:
1. Validate the `Auth` token (see [Auth, Devices, and Tokens](authz-plan.md)).
2. Look up the channel by `channelId`.
3. Verify that the caller's identity (from the token) is one of the channel's
two members.
4. Verify that `recipientKey` is the *other* member of the channel (prevents
sending to yourself or to a non-member).
5. Apply rate limits (50 r/s per identity, 5 MB payload cap).
6. Enqueue the payload.
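Steps 2-4 above (membership lookup aside) amount to a pure authorisation check. A sketch, with hypothetical names rather than the project's actual types:

```rust
/// Illustrative 1:1 channel record: exactly two member keys.
struct Channel {
    members: ([u8; 32], [u8; 32]),
}

#[derive(Debug, PartialEq)]
enum AuthzError {
    NotAMember,   // caller is not one of the two members
    BadRecipient, // recipient is not the *other* member
}

fn authorize_enqueue(
    ch: &Channel,
    caller: [u8; 32],
    recipient: [u8; 32],
) -> Result<(), AuthzError> {
    let (a, b) = ch.members;
    if caller != a && caller != b {
        return Err(AuthzError::NotAMember);
    }
    // The recipient must be the other member -- this rejects both
    // sending to yourself and sending to a non-member.
    let other = if caller == a { b } else { a };
    if recipient != other {
        return Err(AuthzError::BadRecipient);
    }
    Ok(())
}
```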
### Fetch Authorisation
When a client calls `fetch(auth, channelId, recipientKey)` or
`fetchWait(auth, channelId, recipientKey, timeout)`:
1. Validate the `Auth` token.
2. Verify that the caller's identity matches `recipientKey`.
3. Verify that `recipientKey` is a member of the specified channel.
4. Return messages for `(channelId, recipientKey)`, filtering out expired
messages (TTL check).
---
## Storage Model
### Channels Table
| Column | Type | Description |
|--------|------|-------------|
| `channel_id` | UUID (16 bytes) | Primary key |
| `member_a_key` | Ed25519 public key (32 bytes) | First member |
| `member_b_key` | Ed25519 public key (32 bytes) | Second member |
| `created_at` | Timestamp | Channel creation time |
A unique constraint on `(member_a_key, member_b_key)` (sorted) prevents
duplicate channels between the same pair of identities.
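The sorted-pair constraint works by canonicalising the member order before insert, so that `createChannel(alice, bob)` and `createChannel(bob, alice)` map to the same row. A sketch of the (hypothetical) helper:

```rust
/// Orders a member-key pair canonically so that the same two identities
/// always produce the same (member_a_key, member_b_key) tuple, letting a
/// plain UNIQUE constraint reject duplicate channels.
fn canonical_pair(a: [u8; 32], b: [u8; 32]) -> ([u8; 32], [u8; 32]) {
    if a <= b { (a, b) } else { (b, a) }
}
```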
### Delivery Queue
Messages are keyed by `(channelId, recipient_key)`:
| Column | Type | Description |
|--------|------|-------------|
| `channel_id` | UUID (16 bytes) | Channel |
| `recipient_key` | Ed25519 public key (32 bytes) | Intended recipient |
| `payload` | Bytes | MLS ciphertext (opaque to server) |
| `received_at` | Timestamp | Server receive time |
| `sequence_no` | UInt64 | Per-channel, per-recipient monotonic counter |
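The per-channel, per-recipient monotonic counter can be kept as a map keyed by `(channel_id, recipient_key)`. A minimal in-memory sketch (type and method names are assumptions; the real store would persist this):

```rust
use std::collections::HashMap;

/// Illustrative monotonic sequence counters, one per
/// (channel_id, recipient_key) pair.
struct SeqCounters {
    next: HashMap<([u8; 16], [u8; 32]), u64>,
}

impl SeqCounters {
    fn new() -> Self {
        Self { next: HashMap::new() }
    }

    /// Returns the next sequence number for this pair, starting at 0
    /// and incrementing by one per stored message.
    fn next_seq(&mut self, channel: [u8; 16], recipient: [u8; 32]) -> u64 {
        let n = self.next.entry((channel, recipient)).or_insert(0);
        let seq = *n;
        *n += 1;
        seq
    }
}
```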
### TTL Eviction
Messages are evicted in two ways:
1. **Fetch-time check:** When a client fetches messages, the server filters out
any message where `received_at + TTL < now`. This is the primary eviction
path.
2. **Background sweep:** A periodic task (configurable interval, default 1 hour)
scans for and deletes expired messages. This prevents unbounded storage
growth from inactive channels.
Default TTL values:
| Entity | TTL | Configurable |
|--------|-----|-------------|
| Messages | 7 days | Yes |
| KeyPackages | 24 hours | Yes |
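The fetch-time check reduces to filtering on `received_at + TTL > now`. A sketch, assuming a hypothetical `Stored` record rather than the project's actual delivery type:

```rust
use std::time::{Duration, SystemTime};

/// Illustrative stored delivery: receive time plus opaque MLS ciphertext.
struct Stored {
    received_at: SystemTime,
    payload: Vec<u8>,
}

/// Fetch-time eviction: keep only messages whose TTL has not elapsed.
fn filter_expired(msgs: Vec<Stored>, ttl: Duration, now: SystemTime) -> Vec<Stored> {
    msgs.into_iter()
        .filter(|m| m.received_at + ttl > now)
        .collect()
}
```

The background sweep would apply the same predicate, deleting rather than filtering.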
---
## Flows
### Create Channel
```
Alice Server Bob
| | |
|-- createChannel(auth, bob_key) | |
| |-- generate channelId |
| |-- store {channelId, |
| | alice_key, bob_key} |
|<- channelId ------------------| |
| | |
```
Alice receives the `channelId` and can now send messages to Bob on this channel.
Bob discovers the channel when he receives the first message (the `channelId` is
included in the delivery metadata).
### Send (with AuthZ)
```
Alice Server
| |
|-- enqueue(auth, channelId, |
| bob_key, mls_ciphertext) |
| |-- validate auth token
| |-- lookup channel membership
| |-- verify alice_key in members
| |-- verify bob_key is recipient
| |-- check rate limits
| |-- store (channelId, bob_key,
| | payload, received_at, seq)
|<- ok (sequence_no) ------------|
| |
```
### Receive (with TTL)
```
Bob Server
| |
|-- fetchWait(auth, channelId, |
| bob_key, timeout) |
| |-- validate auth token
| |-- verify bob_key in channel
| |-- query (channelId, bob_key)
| |-- filter: received_at + 7d > now
| |-- return non-expired messages
|<- messages[] ------------------|
| |
```
---
## Backward Compatibility
### Legacy Mode (channelId = nil)
When `channelId` is empty or absent:
- The server treats the request as a legacy delivery (pre-channel behaviour).
- Messages are routed solely by `recipientKey`, without channel-level authz.
- This mode can be disabled in production via server configuration.
### Version Negotiation
The `version` field on delivery messages allows clean rejection of future schema
changes:
| Version | Behaviour |
|---------|----------|
| 0 | Legacy mode: no `channelId`, no per-channel authz |
| 1 | Channel-aware: `channelId` required, authz enforced |
The server rejects messages with `version > max_supported`.
---
## Open Items
These items are deferred to future milestones:
- **Persistence backend:** The current `DashMap`-based store must be extended to
SQLite (or SQLCipher) for durable channel and delivery state. See
[Milestones: M6](milestones.md#m6----persistence-planned).
- **Channel discovery API:** A dedicated RPC for Bob to discover channels he is
a member of, rather than relying on first-message discovery.
- **Client UX:** Map peer identity to `channelId` discovery; cache `channelId`
in the client state file.
- **Audit logging:** Log channel creation, authz failures, send/recv events with
redaction of ciphertext. See [Auth, Devices, and Tokens](authz-plan.md) for
the audit logging design.
- **Multi-device:** A single account on multiple devices sharing the same
channel. Requires per-device delivery queues and MLS multi-device support.
---
## Cross-references
- [Milestones](milestones.md) -- M4 (CLI subcommands) and M6 (persistence)
- [Production Readiness WBS](production-readiness.md) -- Phase 4 (Delivery Semantics)
- [Auth, Devices, and Tokens](authz-plan.md) -- token validation and identity binding
- [Wire Format: Delivery Schema](../wire-format/delivery-schema.md) -- current delivery schema
- [Wire Format: NodeService Schema](../wire-format/node-service-schema.md) -- RPC interface
- [Architecture Overview](../architecture/overview.md) -- system diagram and service model

# Future Research Directions
This page catalogues technologies and research directions that could strengthen
quicnprotochat beyond the current [milestone plan](milestones.md). Each entry
includes a brief description, the problem it solves, relevant crates or
specifications, and how it maps to the project architecture.
For the production readiness work breakdown, see
[Production Readiness WBS](production-readiness.md).
---
## Transport and Networking
### LibP2P / iroh (n0)
**Problem:** The current architecture is strictly client-server. Clients behind
NAT cannot communicate directly, and the server is a single point of failure for
delivery.
**Solution:** [LibP2P](https://libp2p.io/) and [iroh](https://iroh.computer/)
(from n0) provide peer discovery, NAT traversal (hole-punching), and relay
fallback. iroh is particularly interesting because it is Rust-native and built on
QUIC, aligning with quicnprotochat's existing transport layer.
**Architecture impact:** Move from pure client-server to a hybrid topology where
peers communicate directly when possible and fall back to server relay when NAT
traversal fails. The server role shifts from mandatory relay to optional
rendezvous/relay node.
**Crates:** `libp2p`, `iroh`, `iroh-net`
### WebTransport (HTTP/3)
**Problem:** Browser clients cannot use raw QUIC. The current stack requires a
native Rust binary.
**Solution:** [WebTransport](https://w3c.github.io/webtransport/) exposes
QUIC-like semantics (multiplexed bidirectional streams, datagrams) to browsers
over HTTP/3. A WebTransport endpoint alongside the existing QUIC listener would
enable a web client without WebSocket degradation.
**Architecture impact:** Add a second listener (HTTP/3 + WebTransport) that
terminates WebTransport and bridges into the existing `NodeService` RPC layer.
Cap'n Proto serialisation works in WASM via the `capnp` crate.
**Crates:** `h3`, `h3-webtransport`, `wtransport`
### Tor / I2P Integration
**Problem:** MLS protects message content, but connection metadata (who connects
to the server, when, how often) leaks to the server and network observers.
**Solution:** Route client-server connections through
[Tor](https://www.torproject.org/) onion services or
[I2P](https://geti2p.net/) tunnels. This provides metadata resistance at the
network layer.
**Architecture impact:** The server exposes a `.onion` address (Tor) or an I2P
destination. Clients connect through the anonymity network. Latency increases
significantly, so this should be optional.
**Crates:** `arti` (Tor client in Rust), `arti-client`
---
## Storage and Persistence
### SQLCipher / libsql (Turso)
**Problem:** At M6, quicnprotochat needs persistent storage for group state, key
material, and message queues. Storing private keys in a plaintext SQLite database
is insufficient.
**Solution:** [SQLCipher](https://www.zetetic.net/sqlcipher/) provides
transparent, page-level AES-256 encryption for SQLite. Alternatively,
[libsql](https://turso.tech/libsql) (Turso) offers a SQLite fork with
encryption, replication, and embedded server capabilities.
**Architecture impact:** Replace the `sqlx` SQLite backend with SQLCipher.
Encryption key derived from a user-provided passphrase (via Argon2id) or a
hardware-backed key.
**Crates:** `rusqlite` (with `bundled-sqlcipher` feature), `libsql`
### CRDTs (Automerge / Yrs)
**Problem:** Multi-device support requires synchronising state (group membership,
read receipts, settings) across devices without a central authority resolving
conflicts.
**Solution:** Conflict-free replicated data types (CRDTs) allow concurrent edits
to converge without coordination. [Automerge](https://automerge.org/) and
[Yrs](https://docs.rs/yrs/) (Yjs in Rust) provide production-quality CRDT
implementations.
**Architecture impact:** Client-side state (contact list, group membership
cache, read markers) stored as CRDT documents. Synchronisation happens over the
existing MLS-encrypted channel, ensuring the server never sees the state.
**Crates:** `automerge`, `yrs`
### Object Storage (S3-compatible)
**Problem:** Encrypted file and media attachments need a storage backend that
the server can host without seeing the content.
**Solution:** An S3-compatible object store (MinIO, Garage, or a cloud provider)
for encrypted blobs. Clients encrypt attachments client-side (using a key derived
from the MLS group secret) and upload the ciphertext. The server stores and
serves opaque blobs.
**Architecture impact:** Add a media upload/download RPC to `NodeService`. The
server proxies to the object store or returns pre-signed URLs.
**Crates:** `aws-sdk-s3`, `opendal`
---
## Cryptography and Privacy
### ML-KEM + ML-DSA Hybrid (Post-Quantum MLS)
**Problem:** Quantum computers threaten X25519 and Ed25519. While MLS content is
protected by ephemeral key exchange, the init keys and credential signatures are
vulnerable to harvest-now-decrypt-later attacks.
**Solution:** Hybrid X25519 + ML-KEM-768 KEM for MLS init keys, and optionally
hybrid Ed25519 + ML-DSA-65 for credential signatures. The `ml-kem` crate is
already vendored in the workspace.
**Architecture impact:** Custom `OpenMlsCryptoProvider` in `quicnprotochat-core`
implementing the hybrid combiner. This is the M7 milestone -- see
[Milestones](milestones.md#m7----post-quantum-planned) and
[Hybrid KEM](../protocol-layers/hybrid-kem.md).
**Crates:** `ml-kem`, `ml-dsa`
**References:** NIST FIPS 203 (ML-KEM), `draft-ietf-tls-hybrid-design`
### Private Information Retrieval (PIR)
**Problem:** When a client fetches messages or KeyPackages, the server learns
*which* recipient is requesting -- even though it cannot read the content.
**Solution:** Private Information Retrieval (PIR) allows a client to fetch a
record from the server without revealing which record was requested.
[SealPIR](https://github.com/microsoft/SealPIR) and SimplePIR provide practical
constructions.
**Architecture impact:** Replace the `fetch` / `fetchKeyPackage` RPCs with PIR
queries. This is a significant performance trade-off: PIR has high computational
cost. Suitable for KeyPackage fetch (small database) before message fetch (large
database).
### Sealed Sender (Signal-style)
**Problem:** The server sees `(sender, recipient, timestamp)` metadata on every
enqueued message. Even without reading content, this metadata reveals social
graphs.
**Solution:** [Sealed Sender](https://signal.org/blog/sealed-sender/) encrypts
the sender's identity inside the MLS ciphertext. The server routes by
`recipientKey` only and cannot determine who sent the message.
**Architecture impact:** Modify the `enqueue` RPC to omit sender identity from
the server-visible metadata. The sender identity is included only inside the
MLS application message (encrypted).
### Key Transparency (RFC draft)
**Problem:** A compromised server could substitute public keys, performing a
man-in-the-middle attack on MLS group formation.
**Solution:** A verifiable, append-only log of public key bindings (similar to
Certificate Transparency for TLS). Clients verify that the server's response
matches the log before trusting a fetched KeyPackage.
**Architecture impact:** Add a key transparency log (Merkle tree) alongside the
Authentication Service. Clients verify inclusion proofs on every `fetchKeyPackage`
response.
**References:** `draft-ietf-keytrans-protocol`
---
## Identity and Authentication
### DIDs (Decentralized Identifiers)
**Problem:** User identities are currently bound to the server. If the server
goes away, identities are lost.
**Solution:** [Decentralized Identifiers](https://www.w3.org/TR/did-core/)
(`did:key`, `did:web`) provide self-sovereign identity. A user's DID is derived
from their Ed25519 public key and is portable across servers.
**Architecture impact:** Replace raw Ed25519 public keys in MLS credentials with
DID URIs. The server resolves DIDs to public keys for routing.
**Crates:** `did-key`, `ssi`
### OPAQUE (aPAKE)
**Problem:** If quicnprotochat adds password-based account registration, the
server must never see the password -- not even a hash.
**Solution:** [OPAQUE](https://datatracker.ietf.org/doc/draft-irtf-cfrg-opaque/)
is an asymmetric password-authenticated key exchange where the server stores only
a one-way transformation of the password. The server cannot perform offline
dictionary attacks.
**Architecture impact:** Replace the registration/login flow with OPAQUE. The
server stores an OPAQUE registration record; the client runs the OPAQUE protocol
to authenticate and derive a session key.
**Crates:** `opaque-ke`
**References:** `draft-irtf-cfrg-opaque`; RFC 9497 (the OPRF construction it builds on)
### WebAuthn / Passkeys
**Problem:** Password-based auth (even with OPAQUE) is vulnerable to phishing.
Hardware-backed authentication provides stronger device binding.
**Solution:** [WebAuthn](https://www.w3.org/TR/webauthn-3/) / Passkeys allow
authentication via hardware tokens (YubiKey), platform authenticators (Touch ID,
Windows Hello), or synced passkeys.
**Architecture impact:** Add a WebAuthn registration/authentication flow to the
account system. Requires a server-side WebAuthn relying party implementation.
**Crates:** `webauthn-rs`
### Verifiable Credentials (W3C VC)
**Problem:** Proving attributes (organization membership, role, age) without
revealing full identity.
**Solution:** [Verifiable Credentials](https://www.w3.org/TR/vc-data-model/)
allow a user to present cryptographic proofs of attributes issued by a trusted
authority.
**Architecture impact:** Extend MLS credentials with VC presentation. A group
admin could require proof of organization membership before allowing join.
---
## Application Layer
### Matrix-style Federation
**Problem:** A single server is a single point of failure and a single point of
trust. Users on different servers cannot communicate.
**Solution:** Federation allows multiple quicnprotochat servers to exchange
messages, similar to [Matrix](https://matrix.org/) homeserver federation. Each
server manages its own users and relays messages to peer servers.
**Architecture impact:** Major. Requires server-to-server protocol, distributed
identity resolution, and cross-server MLS group management.
### WASM Plugin System
**Problem:** Extensibility (bots, bridges, custom message types) currently
requires forking the codebase.
**Solution:** A sandboxed WASM plugin system allows third-party extensions to run
inside the client or server without access to private key material.
**Architecture impact:** Define a plugin API (message hooks, command handlers).
Plugins compiled to WASM and loaded at runtime via `wasmtime` or `wasmer`.
**Crates:** `wasmtime`, `wasmer`, `extism`
### Double-Ratchet DM Layer
**Problem:** MLS is optimised for groups. For efficient 1:1 conversations, the
Signal protocol (X3DH plus the Double Ratchet) provides better performance
characteristics (no ratchet-tree overhead for two parties).
**Solution:** Implement a double-ratchet layer for 1:1 DMs, using MLS only for
groups with N > 2. The [1:1 Channel Design](dm-channels.md) currently uses MLS
for DMs; this would be an optimisation.
**References:** [The Double Ratchet Algorithm](https://signal.org/docs/specifications/doubleratchet/),
[X3DH Key Agreement Protocol](https://signal.org/docs/specifications/x3dh/)
---
## Observability and Operations
### OpenTelemetry (Tracing + Metrics)
**Problem:** The current logging is `tracing`-based but lacks distributed
tracing context and structured metrics export.
**Solution:** [OpenTelemetry](https://opentelemetry.io/) provides a unified
framework for distributed tracing, metrics, and log correlation. OTLP export
enables integration with any observability backend.
**Architecture impact:** Add `tracing-opentelemetry` and `opentelemetry-otlp`
to the server. Instrument RPC handlers with spans. Export to Jaeger, Grafana
Tempo, or any OTLP-compatible backend.
**Crates:** `opentelemetry`, `opentelemetry-otlp`, `tracing-opentelemetry`
### Prometheus + Grafana
**Problem:** No quantitative visibility into server performance (throughput,
latency, queue depth, epoch advancement rate).
**Solution:** Export Prometheus metrics from the server. Visualise with Grafana
dashboards.
**Metrics to export:** message throughput (enqueue/fetch per second), RPC
latency histograms, MLS epoch advancement rate, delivery queue depth, KeyPackage
store size, active connections.
**Crates:** `prometheus`, `metrics`, `metrics-exporter-prometheus`
### Testcontainers-rs
**Problem:** Integration tests currently run server and client in the same
process (`tokio::spawn`). This does not test real network conditions, container
startup, or multi-process interactions.
**Solution:** [Testcontainers-rs](https://docs.rs/testcontainers/) runs Docker
containers from Rust tests, enabling true end-to-end CI with real network
boundaries.
**Architecture impact:** Add testcontainers-based integration tests alongside
the existing in-process tests. The Docker image is already maintained.
**Crates:** `testcontainers`, `testcontainers-modules`
---
## Developer Experience
### Tauri / Dioxus (Native GUI)
**Problem:** The current interface is CLI-only. A graphical client would broaden
the user base for testing and demonstration.
**Solution:** [Tauri](https://tauri.app/) or [Dioxus](https://dioxuslabs.com/)
provide native cross-platform GUI frameworks in Rust. The
`quicnprotochat-core` crate can be shared directly with the GUI client.
**Architecture impact:** Add a `quicnprotochat-gui` crate that depends on
`quicnprotochat-core` and `quicnprotochat-proto`. The GUI drives the same
`GroupMember` and RPC logic as the CLI client.
**Crates:** `tauri`, `dioxus`
### uniffi / diplomat (Mobile FFI)
**Problem:** Mobile clients (iOS, Android) cannot use the Rust binary directly.
**Solution:** [uniffi](https://github.com/mozilla/uniffi-rs) (Mozilla) and
[diplomat](https://github.com/rust-diplomat/diplomat) generate idiomatic Swift and
Kotlin bindings from Rust definitions.
**Architecture impact:** Expose `quicnprotochat-core` through a C-compatible FFI
layer. Mobile apps call into the Rust crypto and protocol logic.
**Crates:** `uniffi`, `diplomat`
### Nix Flakes
**Problem:** The development environment requires `capnp` (Cap'n Proto compiler),
a specific Rust toolchain version, and test infrastructure. Setup varies across
developer machines.
**Solution:** [Nix flakes](https://nixos.wiki/wiki/Flakes) provide a
reproducible, declarative development environment. A single `nix develop`
command sets up the toolchain, `capnp`, and all dependencies.
**Architecture impact:** Add `flake.nix` and `flake.lock` to the repository root.
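A minimal sketch of what that flake could look like. Everything here is illustrative (input pin, target system, package set); the real flake would pin the exact Rust toolchain the project requires.

```nix
# flake.nix -- illustrative sketch, not the project's actual flake.
{
  description = "quicnprotochat dev environment";

  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in {
      # `nix develop` drops into a shell with the toolchain and capnp.
      devShells.${system}.default = pkgs.mkShell {
        packages = [ pkgs.rustc pkgs.cargo pkgs.capnproto ];
      };
    };
}
```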
---
## Top 5 Priority Implementations
The following table ranks the most impactful technologies for near-term adoption,
considering the current state of the codebase and the [milestone plan](milestones.md).
| Priority | Technology | Why | Unlocks |
|----------|-----------|-----|---------|
| 1 | **Post-quantum hybrid KEM** | `ml-kem` is already vendored in the workspace. Completing the hybrid `OpenMlsCryptoProvider` makes quicnprotochat one of the first PQ MLS implementations. | M7 |
| 2 | **SQLCipher persistence** | Encrypted-at-rest storage is the prerequisite for multi-device support, offline usage, and server restart survival. | M6 |
| 3 | **OPAQUE auth** | Zero-knowledge password authentication is a massive security uplift for the account system. The server never sees or stores passwords. | Phase 3 (authz) |
| 4 | **iroh / LibP2P** | NAT traversal and optional P2P mesh makes quicnprotochat deployable without centralised infrastructure. Aligns with the existing QUIC transport. | Beyond M7 |
| 5 | **Sealed Sender + PIR** | Content encryption is table stakes. Metadata resistance (hiding who talks to whom) is the frontier of private messaging research. | Beyond M7 |
---
## Cross-references
- [Milestones](milestones.md) -- current milestone tracker
- [Production Readiness WBS](production-readiness.md) -- phased work breakdown
- [Auth, Devices, and Tokens](authz-plan.md) -- OPAQUE integration point
- [1:1 Channel Design](dm-channels.md) -- double-ratchet optimisation context
- [Hybrid KEM](../protocol-layers/hybrid-kem.md) -- existing PQ design
- [ADR-006: PQ Gap in Noise Transport](../design-rationale/adr-006-pq-gap.md) -- accepted PQ risk
- [References](../appendix/references.md) -- standards and crate documentation

# Milestone Tracker
This page tracks the project milestones for quicnprotochat, from initial transport
layer through post-quantum cryptography. Each milestone produces production-ready,
tested, deployable code -- see [Coding Standards](../contributing/coding-standards.md)
for what that means in practice.
---
## Milestone Summary
| # | Name | Status | What it adds |
|---|------|--------|-------------|
| M1 | QUIC/TLS Transport | **Complete** | QUIC + TLS 1.3 endpoint, length-prefixed framing, Ping/Pong |
| M2 | Authentication Service | **Complete** | Ed25519 identity, KeyPackage generation, AS upload/fetch |
| M3 | Delivery Service + MLS Groups | **Complete** | DS relay, GroupMember create/join/add/send/recv |
| M4 | Group CLI Subcommands | **Next** | Persistent CLI (create-group, invite, join, send, recv); `demo-group` already available |
| M5 | Multi-party Groups | Planned | N > 2 members, Commit fan-out, Proposal handling |
| M6 | Persistence | Planned | SQLite key store, durable group state |
| M7 | Post-quantum | Planned | PQ hybrid for MLS/HPKE (X25519 + ML-KEM-768) |
---
## M1 -- QUIC/TLS Transport (Complete)
**Goal:** Two processes establish a QUIC connection over TLS 1.3 and exchange
typed Cap'n Proto frames.
**Deliverables:**
- `schemas/envelope.capnp`: `Envelope` struct with `MsgType` enum (Ping/Pong at this stage)
- `quicnprotochat-proto`: `build.rs` invoking `capnpc`, generated type re-exports,
canonical serialisation helpers
- `quicnprotochat-core`: static X25519 keypair generation, Noise\_XX initiator and
responder, length-prefixed Cap'n Proto frame codec (Tokio `Encoder`/`Decoder`)
- `quicnprotochat-server`: QUIC listener with TLS 1.3 (quinn/rustls), Ping to Pong
handler, one tokio task per connection
- `quicnprotochat-client`: connects over QUIC, sends Ping, receives Pong, exits 0
- Integration test: server and client in same test binary using `tokio::spawn`
- `docker-compose.yml` running the server
**Tests:** codec (7 unit tests), keypair (3 unit tests), Noise transport integration.
**Branch:** `feat/m1-noise-transport`
---
## M2 -- Authentication Service (Complete)
**Goal:** Clients register an Ed25519 identity and publish/fetch MLS KeyPackages
via Cap'n Proto RPC.
**Deliverables:**
- `schemas/auth.capnp`: `AuthenticationService` interface (`uploadKeyPackage`,
`fetchKeyPackage`)
- `quicnprotochat-core`: Ed25519 identity keypair generation, MLS KeyPackage
generation via `openmls`
- `quicnprotochat-server`: AS RPC server with `DashMap` store, atomic consume-on-fetch
- `quicnprotochat-client`: `register-state` and `fetch-key` CLI subcommands
- Integration test: Alice uploads KeyPackage, Bob fetches it, fingerprints match
**Tests:** auth\_service.rs integration tests (upload, fetch, consume semantics).
---
## M3 -- Delivery Service + MLS Groups (Complete)
**Goal:** Alice creates a group and adds Bob via MLS Welcome. Both exchange
encrypted application messages through the Delivery Service.
**Deliverables:**
- Unified `NodeService` on port 7000 combining Authentication Service and Delivery
Service into a single Cap'n Proto RPC interface
- `GroupMember` struct with full MLS lifecycle: `create_group`, `add_member`,
`join_from_welcome`, `send_message`, `receive_message`
- DS relay with `enqueue`, `fetch`, and `fetchWait` (long-polling) operations
- `demo-group` subcommand exercising the complete Alice/Bob flow in one process
- Channel-aware delivery: messages routed by `(channelId, recipientKey)`
**Tests:** All passing -- codec (5+ tests), keypair (3 tests), group round-trip,
group\_id lifecycle, MLS integration.
**Key design decisions from M3:**
1. **OpenMlsRustCrypto backend holds the HPKE init key in memory.** The same
`GroupMember` instance that generated the KeyPackage must process the
corresponding Welcome. If the process exits in between, the init private key
is lost. This is by design for M3; persistence comes at M6.
2. **KeyPackage wire format: raw TLS-encoded bytes.** KeyPackages are serialised
using `tls_serialize_detached()` rather than wrapped in `MlsMessageOut`. This
avoids an extra layer of indirection and matches what `openmls` expects on the
receive side via `KeyPackageIn::tls_deserialize_exact()`.
3. **openmls 0.5 API gotchas.** Several `openmls` methods changed signatures
between 0.4 and 0.5 (e.g., `MlsGroup::new` vs `MlsGroup::new_with_group_id`,
`BasicCredential::new` taking `Vec<u8>` directly). These differences are
documented inline in `quicnprotochat-core/src/group.rs`.
**Branch:** `feat/m1-noise-transport`
---
## M4 -- Group CLI Subcommands (Next)
**Goal:** Persistent, composable CLI subcommands for group operations, replacing
the monolithic `demo-group` proof-of-concept.
**Planned deliverables:**
- `create-group` -- creates a new MLS group, stores state locally
- `invite <identity>` -- adds a member by fetching their KeyPackage from the AS
- `join` -- processes a Welcome message and joins an existing group
- `send <message>` -- encrypts and enqueues an application message
- `recv` -- fetches and decrypts pending messages (or long-polls with `fetchWait`)
The `demo-group` subcommand remains available as a single-command demonstration
of the full flow.
---
## M5 -- Multi-party Groups (Planned)
**Goal:** Support groups with N > 2 members, including Commit fan-out and
Proposal handling.
**Planned deliverables:**
- Commit fan-out through the DS to all group members
- Proposal handling (Add, Remove, Update)
- Epoch synchronisation across N members
- Criterion benchmarks: key generation, encap/decap, group-add latency
(10/100/1000 members)
---
## M6 -- Persistence (Planned)
**Goal:** Server survives restart. Client state persists across sessions.
**Planned deliverables:**
- `quicnprotochat-server`: SQLite via `sqlx` for AS key store and DS message log,
`migrations/` directory
- `docker/Dockerfile`: multi-stage build (`rust:bookworm` builder, `debian:bookworm-slim` runtime)
- `docker-compose.yml`: server + SQLite volume, healthcheck
- Client reconnect with session resume (re-handshake + rejoin group epoch from
DS log)
See [Future Research: SQLCipher](future-research.md#storage--persistence) for
encrypted-at-rest options.
---
## M7 -- Post-quantum (Planned)
**Goal:** Replace the MLS crypto backend with a hybrid X25519 + ML-KEM-768 KEM,
providing post-quantum confidentiality for all group key material.
**Planned deliverables:**
- Custom `OpenMlsCryptoProvider` with hybrid KEM in `quicnprotochat-core`
- Hybrid shared secret derivation:
```
SharedSecret = HKDF-SHA256(
ikm = X25519_ss || ML-KEM-768_ss,
info = "quicnprotochat-hybrid-v1",
len = 32
)
```
- All M3/M4/M5 tests pass unchanged with the new ciphersuite
- Follows the combiner approach from `draft-ietf-tls-hybrid-design`
The `ml-kem` crate is already vendored in the workspace. See
[Hybrid KEM](../protocol-layers/hybrid-kem.md) for the detailed design and
[ADR-006: PQ Gap in Noise Transport](../design-rationale/adr-006-pq-gap.md) for
the accepted residual risk in the transport layer.
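As a sketch of the combiner in code (assuming the `hkdf` and `sha2` crates; the function name and variable names are illustrative, and the real version would live inside the custom `OpenMlsCryptoProvider`):

```rust
use hkdf::Hkdf;
use sha2::Sha256;

/// Combine the X25519 ECDH shared secret with the ML-KEM-768
/// decapsulation output, per the formula above. HKDF is run with no
/// salt; both inputs are 32 bytes.
fn combine_shared_secrets(x25519_ss: &[u8; 32], mlkem_ss: &[u8; 32]) -> [u8; 32] {
    // ikm = X25519_ss || ML-KEM-768_ss
    let mut ikm = Vec::with_capacity(64);
    ikm.extend_from_slice(x25519_ss);
    ikm.extend_from_slice(mlkem_ss);

    let hk = Hkdf::<Sha256>::new(None, &ikm);
    let mut okm = [0u8; 32];
    hk.expand(b"quicnprotochat-hybrid-v1", &mut okm)
        .expect("32 bytes is a valid HKDF-SHA256 output length");
    okm
}
```

Concatenating both secrets into the IKM means the output stays secure as long as *either* KEM remains unbroken, which is the point of the hybrid construction.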
---
## Cross-references
- [Production Readiness WBS](production-readiness.md) -- phased work breakdown
for hardening beyond the milestone track
- [Auth, Devices, and Tokens](authz-plan.md) -- authentication and authorisation
design that cuts across M4--M6
- [1:1 Channel Design](dm-channels.md) -- DM channel schema and authz model
- [Future Research](future-research.md) -- technology options for M6+ and beyond
- [Testing Strategy](../contributing/testing.md) -- how tests are structured
across milestones

# Production Readiness WBS
This page defines the work breakdown structure (WBS) for taking quicnprotochat
from a proof-of-concept to a production-hardened system. It covers feature scope,
security policy, phased delivery, and a planning checklist.
For the milestone-by-milestone tracker, see [Milestones](milestones.md). This
document focuses on the cross-cutting concerns that span multiple milestones.
---
## Feature Scope (Must-Have)
These are the feature areas that must be addressed before quicnprotochat can be
considered production-ready. Each area maps to one or more milestones or phases
in the WBS below.
| Area | Description | Primary Milestone |
|------|-------------|-------------------|
| **Identity / Auth** | Account creation, device registration, token-based RPC authentication, MLS identity binding | M4 + Phase 3 |
| **Key / MLS Lifecycle** | KeyPackage rotation, epoch advancement, member removal, credential updates | M5 + Phase 2 |
| **Transport / Delivery** | QUIC + TLS 1.3 hardening, ALPN enforcement, connection draining, reconnect | M1 (done) + Phase 2 |
| **Private 1:1 Channels** | Channel creation, per-channel authz, TTL eviction, DM-specific flows | Phase 4 |
| **Storage / Persistence** | SQLite (or SQLCipher) for AS, DS, client state; migrations; backup/restore | M6 + Phase 6 |
| **Observability / Ops** | Structured logging, metrics, distributed tracing, healthcheck endpoints | Phase 6 |
| **Client Resilience** | Offline queue, retry with backoff, idempotent message IDs, gap detection | Phase 4 |
| **Compatibility / Protocols** | Wire versioning, N-1 client interoperability, ciphersuite negotiation | Phase 2 + Phase 5 |
---
## Security Plan (By Design)
quicnprotochat follows a security-by-design philosophy. The standards below are
non-negotiable -- see [Coding Standards](../contributing/coding-standards.md) for
how they are enforced in code.
### Governance
- `CODEOWNERS` file mapping each crate to a responsible reviewer.
- All PRs require at least one review from a crate owner.
- Security-sensitive changes (crypto, auth, wire format) require two reviewers.
- GPG-signed commits only.
### Transport Policy
- TLS 1.3 only (`rustls` configured with `TLS13` cipher suites exclusively).
- ALPN token `b"capnp"` required; reject connections with mismatched ALPN.
- Self-signed certificates acceptable for development; production deployments
must use a CA-signed certificate or certificate pinning.
- Connection draining on shutdown (QUIC `CONNECTION_CLOSE`).
### MLS Policy
- Ciphersuite: `MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519` (baseline).
- Single-use KeyPackages (consumed on fetch, per RFC 9420).
- KeyPackage TTL: 24 hours; clients must rotate before expiry.
- Ciphersuite allowlist: server rejects KeyPackages with unknown ciphersuites.
- No downgrade: once a group has used a ciphersuite, members cannot rejoin with
a weaker one.
### Input Validation
- All incoming Cap'n Proto messages validated against schema before processing.
- Maximum payload size: 5 MB per RPC call.
- Group ID, identity key, and channel ID fields validated for correct length
(32 bytes, 32 bytes, 16 bytes respectively).
- UTF-8 validation on all string fields.
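The length checks above can be sketched as a single guard applied before any message is processed further. The error type and function signature here are illustrative, not existing code:

```rust
/// Illustrative validation errors; the real server would use its own
/// typed error hierarchy.
#[derive(Debug, PartialEq)]
enum ValidationError {
    BadGroupId,
    BadIdentityKey,
    BadChannelId,
    PayloadTooLarge,
}

const MAX_PAYLOAD: usize = 5 * 1024 * 1024; // 5 MB per RPC call

/// Reject any frame whose fixed-size fields have the wrong length or
/// whose payload exceeds the cap.
fn validate_lengths(
    group_id: &[u8],
    identity_key: &[u8],
    channel_id: &[u8],
    payload: &[u8],
) -> Result<(), ValidationError> {
    if group_id.len() != 32 {
        return Err(ValidationError::BadGroupId);
    }
    if identity_key.len() != 32 {
        return Err(ValidationError::BadIdentityKey);
    }
    if channel_id.len() != 16 {
        return Err(ValidationError::BadChannelId);
    }
    if payload.len() > MAX_PAYLOAD {
        return Err(ValidationError::PayloadTooLarge);
    }
    Ok(())
}
```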
### Secrets Management
- All private key material wrapped in `Zeroizing<T>` (via the `zeroize` crate).
- No secret material in log output at any level.
- No `unwrap()` on cryptographic operations -- all errors are typed and propagated.
- Constant-time comparison for authentication tokens and key fingerprints.
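Constant-time comparison means the loop never short-circuits on the first differing byte, so timing reveals nothing about where two secrets diverge. In production, prefer `subtle::ConstantTimeEq`; this std-only XOR-fold version just illustrates the idea:

```rust
/// Constant-time equality for fixed-length secrets: XOR-accumulate every
/// byte pair instead of returning early on the first mismatch.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        // Length is not secret here -- token and fingerprint sizes are fixed.
        return false;
    }
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```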
### Abuse / DoS Controls
- Rate limiting: 50 requests/second per IP, per account, and per device.
- Payload cap: 5 MB per message.
- Connection limit: configurable max concurrent QUIC connections.
- KeyPackage upload limit: configurable per account (prevents store exhaustion).
- Long-poll timeout cap: server-enforced maximum for `fetchWait`.
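One common way to implement the 50 req/s limit is a token bucket per IP, account, and device. The sketch below is std-only and illustrative; a real server would keep buckets in a concurrent map (e.g. `DashMap`) keyed by each identity and sweep idle entries:

```rust
use std::time::{Duration, Instant};

/// Token bucket: refill at `rate_per_sec`, burst up to `capacity`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(rate_per_sec: f64, capacity: f64) -> Self {
        Self { capacity, tokens: capacity, rate_per_sec, last_refill: Instant::now() }
    }

    /// Refill based on elapsed time, then try to spend one token.
    fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Passing `now` explicitly keeps the limiter deterministic and testable; the RPC middleware would call `try_acquire(Instant::now())` and return a rate-limit error on `false`.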
### Data Protection
- MLS ciphertext is opaque to the server (DS never holds group keys).
- Message retention: 7 days default, configurable.
- KeyPackage retention: 24 hours (TTL eviction).
- At-rest encryption for persistent storage (SQLCipher at M6).
### Logging Safety
- Structured logging via `tracing` with `env-filter`.
- Sensitive fields (keys, tokens, ciphertext) are never logged, even at `TRACE`.
- Audit-level events: auth success/failure, token issuance, keypackage upload,
enqueue/fetch, rate limit hits.
### Testing
- Unit tests for all crypto operations (see [Testing Strategy](../contributing/testing.md)).
- Integration tests for every RPC method.
- Negative tests: malformed input, expired tokens, wrong identity, replay attempts.
- N-1 compatibility tests (old client against new server).
- Fuzzing targets for Cap'n Proto parsers and MLS message handling (Phase 5).
---
## Work Breakdown (6 Phases)
### Phase 1 -- Baselines and Governance
**Goal:** Establish project hygiene before adding features.
| Task | Description |
|------|-------------|
| CODEOWNERS | Map crates to responsible reviewers |
| CI pipeline | GitHub Actions: `cargo test --workspace`, `cargo clippy`, `cargo fmt --check`, `cargo deny check` |
| SBOM generation | `cargo-cyclonedx` or `cargo-about` in CI; publish with each release |
| Threat model | Document assets, adversaries, attack surface, trust boundaries; reference in [Threat Model](../cryptography/threat-model.md) |
| Dependency audit | `cargo audit` in CI; pin all major versions per [Coding Standards](../contributing/coding-standards.md) |
### Phase 2 -- Protocols and Core Hardening
**Goal:** Lock down the wire format and cryptographic policy.
| Task | Description |
|------|-------------|
| Wire versioning | Add `version` field to all Cap'n Proto structs; reject unknown versions |
| Ciphersuite allowlist | Server rejects KeyPackages outside the allowed set |
| Downgrade guards | Prevent epoch rollback; reject Commits with weaker ciphersuites |
| ALPN enforcement | Reject connections without `b"capnp"` ALPN token |
| Connection draining | Graceful QUIC `CONNECTION_CLOSE` on server shutdown |
| KeyPackage rotation | Client-side timer to upload fresh KeyPackages before TTL expiry |
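The downgrade-guard row can be sketched as a single vetting function applied to every incoming Commit. Types and the function name are illustrative; the suite ID `0x0001` is the IANA value for the baseline `MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519` ciphersuite:

```rust
/// Allowed MLS ciphersuites (IANA IDs); the real list would come from
/// server configuration.
const ALLOWED_SUITES: &[u16] = &[0x0001];

#[derive(Debug, PartialEq)]
enum CommitError {
    EpochRollback { current: u64, proposed: u64 },
    SuiteNotAllowed(u16),
}

/// Reject Commits that roll the epoch back or use a suite outside the
/// allowlist. Epochs must strictly increase.
fn vet_commit(current_epoch: u64, proposed_epoch: u64, suite: u16) -> Result<(), CommitError> {
    if proposed_epoch <= current_epoch {
        return Err(CommitError::EpochRollback { current: current_epoch, proposed: proposed_epoch });
    }
    if !ALLOWED_SUITES.contains(&suite) {
        return Err(CommitError::SuiteNotAllowed(suite));
    }
    Ok(())
}
```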
### Phase 3 -- Auth, Device, and Server Hardening
**Goal:** Add account/device identity and token-based authentication.
See [Auth, Devices, and Tokens](authz-plan.md) for the full design.
| Task | Description |
|------|-------------|
| Account + device model | `{account_id, device_id, device_pubkey}` with status lifecycle |
| Token issuance | Access + refresh tokens; configurable expiry |
| RPC auth middleware | Validate token on every RPC; map to account/device |
| Identity binding | Bind MLS identity key to account; reject mismatched uploads |
| Rate limiting | Per-IP, per-account, per-device counters |
| Audit logging | Auth events, token lifecycle, rate limit hits |
### Phase 4 -- Delivery Semantics and Client Resilience
**Goal:** Reliable message delivery and 1:1 channels.
See [1:1 Channel Design](dm-channels.md) for the DM-specific design.
| Task | Description |
|------|-------------|
| Idempotent message IDs | Client-generated UUIDs; server deduplicates |
| Ordering guarantees | Per-channel sequence numbers; client detects gaps |
| Offline queue | Server retains messages for offline recipients (up to TTL) |
| 1:1 channels | Channel creation, membership, per-channel authz |
| TTL eviction | Background sweep + fetch-time check for expired messages |
| Client retry | Exponential backoff with jitter on transient failures |
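The client-retry row calls for exponential backoff with jitter. A common variant is "full jitter": draw the delay uniformly from `[0, min(cap, base * 2^attempt)]`. The sketch below is std-only and illustrative; a real client would use the `rand` crate instead of the toy LCG used here to stay dependency-free:

```rust
use std::time::Duration;

/// Exponential backoff with full jitter.
struct Backoff {
    base: Duration,
    cap: Duration,
    rng_state: u64,
}

impl Backoff {
    fn new(base: Duration, cap: Duration, seed: u64) -> Self {
        Self { base, cap, rng_state: seed.max(1) }
    }

    /// Minimal LCG (Numerical Recipes constants) -- illustration only,
    /// not a real RNG.
    fn next_u64(&mut self) -> u64 {
        self.rng_state = self
            .rng_state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.rng_state
    }

    /// Delay before retry `attempt`, drawn from [0, min(cap, base * 2^attempt)].
    fn delay(&mut self, attempt: u32) -> Duration {
        let exp = self.base.saturating_mul(1u32 << attempt.min(16));
        let ceiling = exp.min(self.cap);
        let nanos = ceiling.as_nanos() as u64;
        if nanos == 0 {
            return Duration::ZERO;
        }
        Duration::from_nanos(self.next_u64() % (nanos + 1))
    }
}
```

Full jitter spreads reconnect storms out across the whole window, which matters when many clients lose the same server at once.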
### Phase 5 -- E2E Harness and Security Tests
**Goal:** Automated end-to-end testing and security validation.
| Task | Description |
|------|-------------|
| docker-compose testnet | Multi-node test environment with configurable topology |
| Positive E2E tests | Full group lifecycle: register, create, invite, join, send, recv, leave |
| Negative E2E tests | Expired tokens, wrong identity, replay, malformed messages |
| Compat matrix | N-1 client/server version testing |
| Fuzz targets | `cargo-fuzz` targets for Cap'n Proto parsers, MLS message handlers |
| Golden-wire fixtures | Serialised test vectors for regression testing across versions |
### Phase 6 -- Reliability, Performance, and Operations
**Goal:** Production-grade operations and performance validation.
| Task | Description |
|------|-------------|
| SQLite/SQLCipher persistence | AS key store, DS message log, client state (M6) |
| Soak testing | 72-hour continuous operation under synthetic load |
| Load testing | Throughput and latency benchmarks (Criterion + custom harness) |
| Chaos testing | Network partitions, process crashes, disk full scenarios |
| Backup / restore | SQLite backup with integrity verification |
| Canary / rollback | Rolling deployment strategy with automatic rollback on failure |
| Metrics + dashboards | Prometheus metrics, Grafana dashboards (see [Future Research](future-research.md)) |
---
## Planning Checklist
Use this checklist when planning a new milestone or phase. Each item should have
a documented decision before implementation begins.
- [ ] **Release criteria / SLOs** -- Define what "done" means. Latency targets,
error rate thresholds, test coverage minimums.
- [ ] **Threat model review** -- Update the [Threat Model](../cryptography/threat-model.md)
for any new attack surface introduced by this phase.
- [ ] **Protocol policy** -- Ciphersuite allowlist, wire version, downgrade rules.
- [ ] **Identity / auth model** -- Who authenticates, how, and what operations
are gated.
- [ ] **Data model** -- Schema changes, migrations, backward compatibility.
- [ ] **Abuse controls** -- Rate limits, size caps, connection limits for this phase.
- [ ] **Observability contracts** -- What new metrics, logs, and traces are needed.
- [ ] **Environments / secrets** -- Dev, staging, production configuration;
secret rotation plan.
- [ ] **Testing matrix** -- Unit, integration, E2E, negative, fuzz, compat tests
for this phase.
- [ ] **Rollout / ops** -- Deployment strategy, rollback plan, monitoring during
rollout.
---
## Cross-references
- [Milestones](milestones.md) -- feature milestone tracker
- [Auth, Devices, and Tokens](authz-plan.md) -- Phase 3 design
- [1:1 Channel Design](dm-channels.md) -- Phase 4 design
- [Future Research](future-research.md) -- technology options for Phase 6+
- [Coding Standards](../contributing/coding-standards.md) -- engineering standards
- [Testing Strategy](../contributing/testing.md) -- test structure and conventions
- [Threat Model](../cryptography/threat-model.md) -- security analysis