fix: security hardening — 40 findings from full codebase review

Full codebase review by 4 independent agents (security, architecture,
code quality, correctness) identified ~80 findings. This commit fixes 40
of them across all workspace crates.

Critical fixes:
- Federation service: validate origin against mTLS cert CN/SAN (C1)
- WS bridge: add DM channel auth, size limits, rate limiting (C2)
- hpke_seal: panic on error instead of silent empty ciphertext (C3)
- hpke_setup_sender_and_export: error on parse fail, no PQ downgrade (C7)

Security fixes:
- Zeroize: seed_bytes() returns Zeroizing<[u8;32]>, private_to_bytes()
  returns Zeroizing<Vec<u8>>, ClientAuth.access_token, SessionState.password,
  conversation hex_key all wrapped in Zeroizing
- Keystore: 0o600 file permissions on Unix
- MeshIdentity: 0o600 file permissions on Unix
- Timing floors: resolveIdentity + WS bridge resolve_user get 5ms floor
- Mobile: TLS verification gated behind insecure-dev feature flag
- Proto: from_bytes default limit tightened from 64 MiB to 8 MiB

Correctness fixes:
- fetch_wait: register waiter before fetch to close TOCTOU window
- MeshEnvelope: exclude hop_count from signature (forwarding no longer
  invalidates sender signature)
- BroadcastChannel: encrypt returns Result instead of panicking
- transcript: rename verify_transcript_chain → validate_transcript_structure
- group.rs: extract shared process_incoming() for receive_message variants
- auth_ops: remove spurious RegistrationRequest deserialization
- MeshStore.seen: bounded to 100K with FIFO eviction

Quality fixes:
- FFI error classification: typed downcast instead of string matching
- Plugin HookVTable: SAFETY documentation for unsafe Send+Sync
- clippy::unwrap_used: warn → deny workspace-wide
- Various .unwrap_or("") → proper error returns

Review report: docs/REVIEW-2026-03-04.md
152 tests passing (72 core + 35 server + 14 E2E + 1 doctest + 30 P2P)
This commit is contained in:
2026-03-04 07:52:12 +01:00
parent 4694a3098b
commit 394199b19b
58 changed files with 3893 additions and 414 deletions

317
docs/REVIEW-2026-03-04.md Normal file
View File

@@ -0,0 +1,317 @@
# Consolidated Codebase Review — quicproquo
**Date:** 2026-03-04
**Reviewers:** 4 independent agents (security, architecture, code quality, correctness)
**Scope:** Full codebase — all workspace crates, schemas, Cargo.toml
---
## CRITICAL (7 findings)
### C1. Federation service has NO authentication on inbound requests
**Source:** Security | **File:** `crates/quicproquo-server/src/federation/service.rs:22-201`
`FederationServiceImpl` handles inbound federation requests (`relay_enqueue`, `relay_batch_enqueue`, `proxy_fetch_key_package`, `proxy_resolve_user`) but performs **zero authentication** on the caller. The `auth` field in the request is only logged (`origin` string), never validated. While mTLS protects the transport, any server with a valid federation certificate can inject arbitrary messages, enumerate users, and fetch KeyPackages. The `FederationAuth.origin` field is a self-declared string, not verified against the mTLS certificate's subject.
**Fix:** Validate `origin` against the mTLS client certificate's CN/SAN. Enforce per-peer rate limits. Consider signing federation messages at the application layer.
### C2. WebSocket bridge bypasses DM channel authorization
**Source:** Security | **File:** `crates/quicproquo-server/src/ws_bridge.rs:230-305`
The `handle_send` function lets any authenticated user enqueue a message to any recipient, bypassing the DM channel membership verification that the Cap'n Proto `enqueue` path enforces in `delivery.rs:93-135`. The WS bridge calls `store.enqueue()` directly, skipping channel membership auth, payload size limits (5 MB), rate limiting, hook invocations, delivery proof generation, and audit logging.
**Fix:** Apply the same authorization, size limits, rate limiting, hooks, and audit logging as the Cap'n Proto delivery path.
### C3. `hpke_seal` silently returns empty ciphertext on error
**Source:** Quality | **File:** `crates/quicproquo-core/src/hybrid_crypto.rs:198-201,216-219`
`HybridCrypto::hpke_seal()` catches errors and returns `Ok(vec![])` instead of propagating. Empty ciphertexts are sent as if valid — data loss and security issue.
**Fix:** Propagate errors via `Result`.
### C4. NodeServiceImpl god object (15 fields, 27 RPC methods, no capability segmentation)
**Source:** Architecture | **File:** `crates/quicproquo-server/src/node_service/mod.rs:253-336`
Single struct implements 27 methods spanning auth, delivery, key management, channels, blobs, devices, federation, and account lifecycle. `handle_node_connection` takes 15 parameters. Unauthenticated clients get capability to invoke all 27 methods.
**Fix:** Extract `ServerContext` struct. Split Cap'n Proto schema into capability interfaces (AuthService, DeliveryService, etc.) vended after auth.
### C5. FileBackedStore O(n) full-map serialization on every mutation
**Source:** Architecture | **File:** `crates/quicproquo-server/src/storage.rs:327-442`
Every write locks Mutex, mutates HashMap, serializes **entire** map to disk via `fs::write`. No fsync, no atomic rename. Performance cliff and data-loss vector.
**Fix:** Make SqlStore the default. If FileBackedStore remains for dev, use write-to-temp-then-rename.
### C6. `std::sync::Mutex` in async context (server and P2P)
**Source:** Architecture | **Files:** `storage.rs:1-7,25-28`, `sql_store.rs:4,50`, `node_service/mod.rs:272`
Holding `std::sync::Mutex` across disk I/O blocks Tokio worker threads, causing head-of-line blocking.
**Fix:** Replace with `tokio::sync::Mutex` or use `spawn_blocking` for disk I/O.
### C7. `hpke_setup_sender_and_export` silently downgrades to classical crypto on parse error
**Source:** Security | **File:** `crates/quicproquo-core/src/hybrid_crypto.rs:263`
On invalid hybrid public key, silently falls back to classical RustCrypto provider. Malicious server can force PQ downgrade.
**Fix:** Return error on hybrid key parse failure, consistent with `hpke_seal`.
---
## HIGH (14 findings)
### H1. Global `AUTH_CONTEXT: RwLock<Option<ClientAuth>>`
**Source:** Architecture + Quality | **File:** `crates/quicproquo-client/src/lib.rs:36,40,82`
Blocks multi-account, creates hidden coupling, root cause of `AUTH_LOCK` test serialization hack.
**Fix:** Replace with `ClientContext` struct passed to all functions.
### H2. Store trait is 30+ method monolith
**Source:** Architecture | **File:** `crates/quicproquo-server/src/storage.rs:33-180`
Any new storage backend must implement every method. Cannot be composed or partially implemented.
**Fix:** Split into sub-traits: `KeyPackageStore`, `DeliveryStore`, `UserStore`, `ChannelStore`, etc.
### H3. CoreError::Mls wraps errors as String, losing type info
**Source:** Architecture | **File:** `crates/quicproquo-core/src/error.rs:16-17`
Impossible to match on specific MLS error conditions.
**Fix:** Create MLS sub-error variants or wrap boxed error.
### H4. Proto `from_bytes` uses default 64 MiB traversal limit
**Source:** Architecture | **File:** `crates/quicproquo-proto/src/lib.rs:67-72`
DoS amplification vector for direct callers (client, bot, FFI).
**Fix:** Accept `ReaderOptions` as parameter, make default stricter.
### H5. Mobile crate hardcodes SkipServerVerification
**Source:** Architecture + Security | **File:** `crates/quicproquo-mobile/src/lib.rs:93-100,165-172`
Unconditionally skips TLS verification. Inherently MITM-vulnerable.
**Fix:** Add certificate_verifier parameter or feature flag.
### H6. Duplicate InsecureServerCertVerifier implementations
**Source:** Architecture | **Files:** `client/rpc.rs:27-29`, `mobile/lib.rs:165-167`
**Fix:** Consolidate into shared crate behind `cfg(feature = "insecure")`.
### H7. DiskKeyStore writes HPKE private keys to disk unencrypted
**Source:** Security | **File:** `crates/quicproquo-core/src/keystore.rs`
No encryption, no file permissions. HPKE private keys are MLS epoch secrets.
**Fix:** Encrypt with Argon2id + ChaCha20-Poly1305. Set 0o600 permissions.
### H8. `identity.rs:seed_bytes()` returns unzeroized copy of secret seed
**Source:** Security | **File:** `crates/quicproquo-core/src/identity.rs:52`
Copies 32-byte Ed25519 seed out of `Zeroizing` wrapper. Returned value not zeroized.
**Fix:** Return `&[u8]` reference or `Zeroizing<[u8; 32]>`.
### H9. `hybrid_kem.rs:private_to_bytes()` returns unzeroized `Vec<u8>`
**Source:** Security | **File:** `crates/quicproquo-core/src/hybrid_kem.rs:162`
Hybrid private key material lingers in memory.
**Fix:** Return `Zeroizing<Vec<u8>>`.
### H10. MeshIdentity stores Ed25519 seed as plaintext JSON
**Source:** Security | **File:** `crates/quicproquo-p2p/src/identity.rs:72-79`
No encryption, no file permissions.
**Fix:** Encrypt identity file, set 0o600 permissions.
### H11. WebSocket bridge has rate_limits field but never checks it
**Source:** Security | **File:** `crates/quicproquo-server/src/ws_bridge.rs:28-36`
**Fix:** Apply `check_rate_limit()` in all WS bridge handlers.
### H12. ~100 lines duplicated between `receive_message` and `receive_message_with_sender`
**Source:** Quality | **File:** `crates/quicproquo-core/src/group.rs:471-583`
Bug fix in one must be manually replicated to other. Security-critical MLS code.
**Fix:** Extract shared MLS message processing helper.
### H13. Synchronous file I/O in async blob handler
**Source:** Quality | **File:** `crates/quicproquo-server/src/node_service/blob_ops.rs:124-137`
Blocking `std::fs` calls in async handler stall event loop.
**Fix:** Use `tokio::fs` or `spawn_blocking`.
### H14. `MeshEnvelope::forwarded()` invalidates signature without re-signing
**Source:** Quality + Correctness + Security | **File:** `crates/quicproquo-p2p/src/envelope.rs:172-176`
Increments `hop_count` included in signed bytes. All forwarded envelopes fail `verify()`.
**Fix:** Exclude `hop_count` from signature, or add separate forwarding signature.
---
## MEDIUM (19 findings)
### M1. fetch_wait TOCTOU: missed notification window
**Source:** Correctness | **File:** `crates/quicproquo-server/src/node_service/delivery.rs:496-522`
TOCTOU between initial fetch (empty) and waiter registration. Enqueue between these points fires notify before waiter exists.
**Fix:** Register waiter before initial fetch.
### M2. `verify_transcript_chain` never checks hashes — misleading name
**Source:** Correctness | **File:** `crates/quicproquo-core/src/transcript.rs:215-251`
Only validates structural integrity, not hash chain. Name implies verification.
**Fix:** Rename to `validate_transcript_structure` or implement actual chain verification.
### M3. Non-atomic file writes in FileBackedStore
**Source:** Correctness | **File:** `crates/quicproquo-server/src/storage.rs:332-337` (all flush_* methods)
`fs::write` directly — crash mid-write corrupts file, loses all data.
**Fix:** Use `tempfile::NamedTempFile` + `persist()`.
### M4. `delete_account` non-atomic multi-lock
**Source:** Correctness | **File:** `crates/quicproquo-server/src/storage.rs:800-864`
6 sequential Mutex locks. Concurrent fetch could see partially deleted account.
**Fix:** Use single transaction or hold all locks simultaneously.
### M5. Timing side channel in `resolveIdentity` — no timing floor
**Source:** Security | **File:** `crates/quicproquo-server/src/node_service/user_ops.rs:142-178`
Unlike `resolveUser` which has 5ms floor.
**Fix:** Apply same `RESOLVE_TIMING_FLOOR`.
### M6. WS bridge `resolve_user` has no timing floor
**Source:** Security | **File:** `crates/quicproquo-server/src/ws_bridge.rs:158-181`
**Fix:** Add same timing floor as Cap'n Proto handler.
### M7. `AuthContext.token` not zeroized
**Source:** Security | **File:** `crates/quicproquo-server/src/auth.rs:68-72`
**Fix:** Wrap in `Zeroizing<Vec<u8>>`.
### M8. Client `ClientAuth.access_token` not zeroized
**Source:** Security | **File:** `crates/quicproquo-client/src/lib.rs:50-55`
**Fix:** Use `Zeroizing<Vec<u8>>`.
### M9. `SessionState.password` stores plaintext password in memory
**Source:** Security | **File:** `crates/quicproquo-client/src/client/session.rs:29`
**Fix:** Use `Zeroizing<String>`, derive key at startup and zeroize password.
### M10. `conversation.rs:172` hex-encodes derived key without zeroization
**Source:** Security | **File:** `crates/quicproquo-client/src/client/conversation.rs:172`
**Fix:** Use `Zeroizing<String>` for `hex_key`.
### M11. `device_ops.rs:49` uses `.unwrap_or("")` on untrusted input
**Source:** Security | **File:** `crates/quicproquo-server/src/node_service/device_ops.rs:49`
**Fix:** Return error for invalid UTF-8.
### M12. `MeshEnvelope::forwarded()` invalidates signature (duplicate of H14)
**Source:** Security | **File:** `crates/quicproquo-p2p/src/envelope.rs:172-176`
### M13. FileBackedStore `create_channel` O(n) linear scan
**Source:** Architecture + Quality + Correctness | **File:** `crates/quicproquo-server/src/storage.rs:749-765`
**Fix:** Secondary index or deterministic channel ID from member pair hash.
### M14. `resolve_identity_key` O(n) linear scan
**Source:** Architecture + Quality | **File:** `crates/quicproquo-server/src/storage.rs:676-684`
**Fix:** Maintain reverse map.
### M15. FFI error classification by string matching
**Source:** Architecture | **File:** `crates/quicproquo-ffi/src/lib.rs:183`
**Fix:** Match on typed error variants.
### M16. Documentation drift: master-prompt.md says Noise/TCP, code uses QUIC/TLS
**Source:** Architecture | **Files:** `master-prompt.md`, server/client `Cargo.toml`
**Fix:** Update master-prompt.md to reflect actual transport.
### M17. Plugin `HookVTable` unsafe Send+Sync without safety docs
**Source:** Architecture | **File:** `crates/quicproquo-plugin-api/src/lib.rs:190-192`
**Fix:** Add `// SAFETY:` documentation blocks.
### M18. OPAQUE register_finish: spurious RegistrationRequest deserialization
**Source:** Quality + Correctness | **File:** `crates/quicproquo-server/src/node_service/auth_ops.rs:335-343`
Dead code — deserializes upload_bytes as wrong type first.
**Fix:** Remove lines 335-343.
### M19. Mixed serialization formats in DiskKeyStore (bincode + serde_json)
**Source:** Quality | **File:** `crates/quicproquo-core/src/keystore.rs`
**Fix:** Standardize on one format.
---
## LOW (23 findings)
### L1. `BroadcastChannel.key` not zeroized on drop
**File:** `crates/quicproquo-p2p/src/broadcast.rs:18-19`
### L2. Plugin loader `CStr::from_ptr` on plugin-returned string (UB risk)
**File:** `crates/quicproquo-server/src/plugin_loader.rs:102`
### L3. Token cache stores session token as plaintext hex when no password set
**File:** `crates/quicproquo-client/src/client/token_cache.rs:63-68`
### L4. `--password` CLI flag visible in process list
**File:** `crates/quicproquo-client/src/main.rs:104`
### L5. `clippy::unwrap_used` is warn not deny
**File:** `Cargo.toml:88`
### L6. `strip = "symbols"` hinders post-mortem debugging
**File:** `Cargo.toml:94`
### L7. `thread_rng()` for channel ID generation instead of `OsRng`
**Files:** `storage.rs:760`, `sql_store.rs:545`
### L8. `conversation.rs:548` — `limit` cast from `usize` to `u32` without saturation
**File:** `crates/quicproquo-client/src/client/conversation.rs:548`
### L9. `conversation.rs:363-365,420-422` — bincode deserialize errors silently dropped
**File:** `crates/quicproquo-client/src/client/conversation.rs`
### L10. `repl.rs:610` — static AtomicU64 for padding timer instead of session state
**File:** `crates/quicproquo-client/src/client/repl.rs:610`
### L11. `command_engine.rs:148` — `to_slash()` clones entire Command enum unnecessarily
**File:** `crates/quicproquo-client/src/client/command_engine.rs:148`
### L12. `conversation.rs:201-203` — SQL ATTACH with format string
**File:** `crates/quicproquo-client/src/client/conversation.rs:201-203`
### L13. Client `hex.rs` trivial wrapper with zero value-add
**File:** `crates/quicproquo-client/src/client/hex.rs`
### L14. `config.rs` `#[allow(dead_code)]` on EffectiveFederationConfig
**File:** `crates/quicproquo-server/src/config.rs`
### L15. `federation/address.rs` `#[allow(dead_code)]` on entire module
**File:** `crates/quicproquo-server/src/federation/address.rs`
### L16. ANSI escape codes hardcoded without terminal capability detection
**File:** `crates/quicproquo-client/src/client/display.rs`
### L17. GUI `lib.rs:75` `.expect()` on Tauri run
**File:** `crates/quicproquo-gui/src/lib.rs:75`
### L18. `session.rs:186` DiskKeyStore failure silently falls back to ephemeral
**File:** `crates/quicproquo-client/src/client/session.rs:186`
### L19. `retry.rs:37` `thread_rng()` for jitter instead of `OsRng`
**File:** `crates/quicproquo-client/src/client/retry.rs:37`
### L20. `MeshStore.seen` set grows unboundedly
**File:** `crates/quicproquo-p2p/src/` (MeshStore)
### L21. `envelope.rs:58` `.expect()` on system clock in non-test code
**File:** `crates/quicproquo-p2p/src/envelope.rs:58`
### L22. `broadcast.rs:47` `.expect()` on encryption in non-test code
**File:** `crates/quicproquo-p2p/src/broadcast.rs:47`
### L23. Bot crate hardcodes sender as "peer" (TODO)
**File:** `crates/quicproquo-bot/src/lib.rs`
---
## TESTING GAPS (6 findings)
### T1. No unit tests for `plugin_loader.rs`
### T2. No unit tests for `federation/tls.rs`
### T3. No tests for `blob_ops.rs`
### T4. No tests for `delivery.rs`
### T5. `conversation.rs` migration code untested
### T6. No negative test for `MeshEnvelope::verify()` with wrong key
---
## Strengths (positive findings from security audit)
- Constant-time token comparison via `subtle::ConstantTimeEq`
- Parameterized SQL everywhere (no injection)
- Proper Argon2id (19 MiB, t=2, p=1) + ChaCha20-Poly1305 for client state
- Zero `.unwrap()` in non-test server code (grep-verified)
- TLS 1.3 only, mTLS for federation
- Rate limiting: 100 enqueues/60s, 50 connections/IP/60s
- Delivery proof signing: SHA-256(seq||recipient||timestamp) + Ed25519
- KeyPackage ciphersuite validation (only 0x0001 accepted)
- Payload size limits: 5 MB message, 50 MB blob, 1 MB KeyPackage
- Queue depth limits: 1000/inbox, 100K sessions, 100K waiters
- Blob path traversal protection via hex-encoded hash
- Audit logging with secret redaction (identity keys prefix-only)
- Production config validation (rejects devtoken, empty auth, missing TLS)
---
## Recommended fix priority
1. **Federation auth** (C1) — auth-gate inbound requests, validate origin against mTLS cert
2. **WS bridge authz** (C2) + rate limits (H11) + timing floors (M6) — parity with Cap'n Proto path
3. **Crypto error propagation** (C3, C7) — hpke_seal and hpke_setup_sender_and_export
4. **Zeroization sweep** (H8, H9, H10, M7-M10, L1) — all leaked secret material
5. **ServerContext extraction** (C4) — foundation for capability-based security
6. **FileBackedStore atomic writes** (C5, M3) — prevent data loss on crash
7. **std::sync::Mutex → tokio::sync::Mutex** (C6) — unblock Tokio workers
8. **Mobile TLS verification** (H5) — remove hardcoded skip
9. **fetch_wait TOCTOU** (M1) — register waiter before fetch
10. **Testing gaps** (T1-T6) — critical untested paths