quicproquo/docs/SECURITY-AUDIT.md

# Security Audit

This document is a security audit of the quicprochat codebase as of the audit date. It aligns with the [Threat Model](src/cryptography/threat-model.md) and [Production Readiness Audit](PRODUCTION-READINESS-AUDIT.md). The project has **not** undergone a formal third-party audit; this is an internal review.

---

## Executive Summary

| Area | Finding | Severity |
|------|---------|----------|
| Authentication & sessions | Token comparison constant-time; session tokens CSPRNG; OPAQUE used correctly; no secrets in logs | ✅ Strong |
| Cryptography | MLS, Ed25519, hybrid KEM, zeroization where appropriate; Argon2/ChaCha20 for state | ✅ Strong |
| Transport (TLS) | TLS 1.3 only; client verifies server cert; self-signed default is documented weakness | ⚠️ Known gap |
| Authorization (DS/AS) | Enqueue/fetch/fetchWait/key ops require auth + identity binding (or sealed_sender); health unauthenticated by design | ✅ Appropriate |
| Input validation & limits | Key/recipient length, payload/KeyPackage size, queue depth, rate limit, UTF-8 username | ✅ Good |
| Secrets handling | No tokens/keys/passwords in logs; DB key optional (documented); state encryption optional | ✅ Good |
| Dependency hygiene | No `cargo audit` in tree; recommend adding and running in CI | ⚠️ Recommendation |

**Overall:** The design and implementation are security-conscious and match the documented threat model. Remaining risks are largely documented (self-signed TLS, metadata visibility, BasicCredential) or operational (deps, production config).

---

## 1. Authentication and Session Management

### 1.1 Token comparison

- **Location:** `crates/quicprochat-server/src/auth.rs`
- **Finding:** Bearer token and identity key comparisons use `subtle::ConstantTimeEq` (`ct_eq`). Length is checked before comparison where applicable.
- **Status:** ✅ No timing leakage from token or identity comparison.

### 1.2 Session token generation

- **Location:** `crates/quicprochat-server/src/node_service/auth_ops.rs` (login finish)
- **Finding:** Session tokens are 32 bytes from `rand::RngCore::fill_bytes(&mut rand::rngs::OsRng, &mut token)`. Stored in `sessions` with TTL (24h). Expired sessions are removed on next use.
- **Status:** ✅ Cryptographically strong, single-use style (opaque 32-byte token).

### 1.3 OPAQUE (RFC 9497)

- **Location:** `crates/quicprochat-core/src/opaque_auth.rs`, server `auth_ops.rs`
- **Finding:** Shared `OpaqueSuite` (Ristretto255, Triple-DH, Argon2id). Server never sees password. Registration and login flows use `ServerRegistration`/`ServerLogin` correctly. Pending login state is stored server-side and removed on consume. Identity key is bound at login finish; mismatch returns E016 and is not logged with secrets.
- **DoS:** Pending-login check runs **before** expensive OPAQUE work (login start); repeated attempts for the same username within 60s are rejected early.
- **Status:** ✅ Correct usage; DoS mitigation in place.

### 1.4 Audit logging (no secrets)

- **Finding:** Comments in `auth_ops.rs` and `delivery.rs` explicitly forbid logging tokens, passwords, or raw keys. Logged fields: username, recipient/key prefix (`fmt_hex`), payload length, seq, counts. Login success/failure and rate-limit hit are logged without session token or identity.
- **Status:** ✅ No sensitive material in logs.

---

## 2. Cryptography

### 2.1 MLS and identity

- **Location:** `quicprochat-core` (group, identity, keypackage)
- **Finding:** MLS ciphersuite `MLS_128_DHKEMX25519_AES128GCM_SHA256_Ed25519` (RFC 9420). Ed25519 identity seed stored in `Zeroizing<[u8; 32]>`; zeroize-on-drop. KeyPackages validated for ciphersuite before server stores. Single-use KeyPackage semantics enforced (consume-on-fetch).
- **Status:** ✅ Aligns with key lifecycle and zeroization goals.

### 2.2 Hybrid KEM (X25519 + ML-KEM-768)

- **Location:** `crates/quicprochat-core/src/hybrid_kem.rs`
- **Finding:** Hybrid keypair and shared secrets use `Zeroizing` where appropriate. HKDF domain separation (`quicprochat-hybrid-v1`). ChaCha20-Poly1305 for AEAD. Versioned envelope.
- **Status:** ✅ PQ-ready envelope layer; secret handling is careful.

### 2.3 Client state encryption (QPCE)

- **Location:** `crates/quicprochat-client/src/client/state.rs`
- **Finding:** Optional password protection: Argon2id (default params) for key derivation, ChaCha20-Poly1305, random salt and nonce. Derived key held in `Zeroizing` during use. Unencrypted state is a documented option (e.g. dev).
- **Recommendation:** Document Argon2 params (memory, iterations) for auditability; consider explicit `Argon2::new()` with named params in a future revision.
- **Status:** ✅ Appropriate for optional at-rest protection.

---

## 3. Transport (TLS / QUIC)

### 3.1 Server TLS

- **Location:** `crates/quicprochat-server/src/tls.rs`
- **Finding:** TLS 1.3 only. No client cert. ALPN `capnp`. When not in production, missing cert/key triggers self-signed generation; key file permissions set to `0o600` on Unix. Production mode requires existing cert/key (no auto-generation).
- **Status:** ✅ Matches documented design; self-signed limitation is documented in threat model.

### 3.2 Client TLS

- **Location:** `crates/quicprochat-client/src/client/rpc.rs`
- **Finding:** Client loads CA cert from file, builds `RootCertStore` with that single cert, uses it for server verification. Server name from CLI/env is used for connection (SNI and cert verification). No custom bypass.
- **Status:** ✅ Proper verification against provided CA; trust-on-first-use / self-signed caveat is documented.

---

## 4. Authorization and Access Control

### 4.1 RPC auth matrix

| RPC | Auth required | Identity binding |
|-----|----------------|------------------|
| health | No | N/A (liveness) |
| opaqueLoginStart/Finish, opaqueRegisterStart/Finish | No (password/session flow) | After login |
| uploadKeyPackage, fetchKeyPackage | Yes | Must match identity_key (or allow_insecure) |
| enqueue | Yes | Must match recipient_key unless sealed_sender |
| fetch, fetchWait | Yes | Must match recipient_key (or allow_insecure) |
| uploadHybridKey, fetchHybridKey | Yes | Must match identity_key (or allow_insecure) |
| publishEndpoint, resolveEndpoint | Yes | Publish: match identity_key; Resolve: any valid token |

- **Finding:** Sensitive operations require `validate_auth_context` and, where relevant, `require_identity_or_request`. Fetch/fetchWait ensure the authenticated identity matches the requested recipient_key, so only the recipient (or someone with their session) can pull messages. With `sealed_sender`, enqueue only requires a valid token (no identity binding to sender).
- **Status:** ✅ Authorization consistent with design; sealed_sender trade-off is documented.

### 4.2 Rate limiting

- **Location:** `crates/quicprochat-server/src/auth.rs`, `delivery.rs`
- **Finding:** Per-token rate limit (e.g. 100 enqueues per 60s). Enqueue path checks before storage. Queue depth and payload size caps (1000 messages, 5 MB) enforced.
- **Status:** ✅ Limits in place to curb abuse and DoS.

---

## 5. Input Validation and Limits

- **Identity/recipient keys:** Rejected unless length exactly 32 bytes (E004).
- **Payload:** Non-empty; max 5 MB (E005, E006).
- **KeyPackage:** Non-empty; max 1 MB (E007, E008); ciphersuite validated before store (E021).
- **Username:** Non-empty; must be valid UTF-8 (E011, E020).
- **Wire version:** Rejected if &gt; CURRENT_WIRE_VERSION (E012).
- **Cap'n Proto:** Server and client set `traversal_limit_in_words(Some(4 * 1024 * 1024))` (4M words = 32 MiB) to bound parsing DoS.
- **Status:** ✅ Validation and limits are consistently applied.

---

## 6. Storage and Persistence

### 6.1 Server

- **File-backed store:** Mutex-protected; lock errors mapped to `StorageError` (no unwrap in hot path). OPAQUE server setup file permissions `0o600` on Unix.
- **SQL store:** Optional SQLCipher with `db_key`; empty key = plaintext (documented). Production validation requires non-empty `db_key` when backend is SQL. User records use INSERT (no OR REPLACE) and unique constraint; duplicate user returns `StorageError::DuplicateUser` (E018).
- **Status:** ✅ Matches production-readiness and auth design; DB encryption caveat documented.

### 6.2 Client

- **State file:** Optional QPCE encryption (Argon2id + ChaCha20-Poly1305). Unencrypted state contains identity seed; documented.
- **Keystore:** Persisted for HPKE init keys so Welcome can be processed after restart; path and format documented.
- **Status:** ✅ Acceptable for threat model; optional encryption and handling of secrets are clear.

---

## 7. Known Gaps (from Threat Model and Docs)

These remain as documented, not new findings:

1. **Self-signed TLS:** MITM possible on first connection if client does not pin or verify out-of-band. Mitigation: certificate pinning or CA-signed certs.
2. **No client auth on DS (by design):** Anyone with a valid token can enqueue to any recipient_key when identity is not required (e.g. sealed_sender). Rate limit and queue/payload caps mitigate abuse.
3. **BasicCredential only:** No CA or revocation; key substitution possible if AS is compromised. Mitigation: Key Transparency or X.509 credentials.
4. **Metadata:** Server sees recipient_key, timing, sizes; Sealed Sender and PIR are future mitigations.

---

## 8. Recommendations

### 8.1 High value

- **Dependency audit:** Run `cargo install cargo-audit` then `cargo audit` locally (and in CI if available) to check for known-vulnerable dependencies. Fix or document any findings. See [Checking dependencies](#checking-dependencies) below.
- **Argon2 params:** Implemented: client state KDF now uses explicit Argon2id parameters (19 MiB memory, 2 iterations, 1 lane) in `quicprochat-client` so they are auditable.

### 8.2 Medium value

- **Certificate pinning:** To pin the server, use the server's certificate as the client's `ca_cert` (e.g. copy `server-cert.der` from the server and pass it via `--ca-cert` or `QPC_CA_CERT`). Do not use a general CA unless you intend to trust that CA for all servers. See [Certificate pinning](#certificate-pinning) below.
- **Health endpoint:** The `health` RPC is unauthenticated by design for liveness probes and load balancers; this is documented in code and in this audit.

### 8.3 Lower priority

- **Cap'n Proto traversal limit:** Implemented: reduced to 4M words (32 MiB) with a named constant; trade-off documented in code.
- **Username enumeration:** OPAQUE login start uses `get_user_record`; timing or response shape might still reveal user existence. Mitigation in scope: consider constant-time or uniform response for unknown users in a future revision (e.g. fixed dummy work when user not found) without weakening OPAQUE. Current code does not implement this.

---

## 9. Summary Table

| Category | Item | Status |
|----------|------|--------|
| Auth | Constant-time token/identity comparison | ✅ |
| Auth | Session token from CSPRNG, not logged | ✅ |
| Auth | OPAQUE used correctly; no password on server | ✅ |
| Auth | Pending-login DoS check before OPAQUE work | ✅ |
| Auth | No secrets in audit logs | ✅ |
| Crypto | MLS ciphersuite, KeyPackage validation | ✅ |
| Crypto | Identity seed zeroized on drop | ✅ |
| Crypto | Hybrid KEM and state encryption use Zeroizing | ✅ |
| Transport | TLS 1.3 only; client verifies server cert | ✅ |
| Transport | Self-signed default (documented weakness) | ⚠️ Known |
| Authz | Enqueue/fetch/key/p2p require auth + identity where required | ✅ |
| Authz | Rate limit and size/depth limits | ✅ |
| Input | Lengths, sizes, UTF-8, wire version, traversal limit | ✅ |
| Storage | Server: lock Result, DB key in prod, duplicate user | ✅ |
| Storage | Client: optional QPCE; explicit Argon2 params; unencrypted state documented | ✅ |
| Deps | Run `cargo audit` locally/CI (see [Checking dependencies](#checking-dependencies)) | ⚠️ Recommend |

---

## Checking dependencies

To check for known vulnerabilities in dependencies:

```bash
cargo install cargo-audit
cargo audit
```

Fix or document any reported issues. Running `cargo audit` in CI (e.g. GitHub Actions) is recommended.

---

## Certificate pinning

The client trusts the server certificate(s) in the file given by `--ca-cert` (or `QPC_CA_CERT`). To **pin** a specific server:

1. Obtain the server's certificate (e.g. copy `data/server-cert.der` from the server, or export from your deployment).
2. Use that file as the client's `ca_cert`. The client will only connect to a server that presents that exact certificate (or chain).
3. Do not use a broad CA bundle as `ca_cert` unless you intend to trust any server certified by that CA.

This gives trust-on-first-use behaviour when you deploy the server and then distribute its cert to clients.

---

## Related Documents

- [Threat Model](src/cryptography/threat-model.md)
- [Production Readiness Audit](PRODUCTION-READINESS-AUDIT.md)
- [Cryptography Overview](src/cryptography/overview.md)
- [Key Lifecycle and Zeroization](src/cryptography/key-lifecycle.md)