Files

Christian Nennemann 00b0aa92a1 feat: add protocol comparison docs, P2P crate, production audit, and design fixes

Add comprehensive documentation comparing quicnprotochat against classical
chat protocols (IRC+SSL, XMPP, Telegram) with diagrams and attack scenarios.
Promote comparison pages to top-level sidebar section. Include P2P transport
crate (iroh), production readiness audit, CI workflows, dependency policy,
and continued architecture improvements across all crates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-22 12:15:44 +01:00

7.7 KiB

Raw Permalink Blame History

Production Readiness Audit

This document summarizes issues and fixes needed to get quicnprotochat production-ready, based on a codebase review. It aligns with the existing Production Readiness WBS and Coding Standards.

Critical (fix before production)

1. Auth token and dev defaults

README and example config use auth_token = "devtoken" and db_key = "".
Risk: Deploying with default/example config allows weak or no auth and unencrypted DB.
Fix: Require explicit QUICNPROTOCHAT_AUTH_TOKEN (or config) in production; reject empty or "devtoken" when a production mode/env is set. Document that db_key empty disables SQLCipher and is not acceptable for production.

2. Database encryption optional

sql_store.rs: If db_key is empty, SQLCipher is not applied; DB is plaintext on disk.
Fix: In production, require non-empty db_key (or fail startup with a clear error). Document in README and deployment docs.

3. Secrets and generated files not ignored

.gitignore does not include data/, so data/server-cert.der, data/server-key.der, and data/quicnprotochat.db could be committed.
Fix: Add data/ (and any other dirs that hold certs, keys, or DBs) to .gitignore. Consider adding *.der and *.db if used only for local/dev.

4. Dockerfile out of sync with workspace

Workspace has 5 members including crates/quicnprotochat-p2p.
Dockerfile only copies 4 crate manifests and creates stub dirs for those 4; it never copies quicnprotochat-p2p.
Result: cargo build --release --bin quicnprotochat-server can fail (missing workspace member) or behave inconsistently.
Fix: Add COPY crates/quicnprotochat-p2p/Cargo.toml and a stub crates/quicnprotochat-p2p/src (or equivalent) in the dependency-cache layer so the workspace resolves. Ensure the final COPY crates/ crates/ still brings in real p2p source.

5. E2E test failing (rustls CryptoProvider)

Symptom: e2e_happy_path_register_invite_join_send_recv panics: "Could not automatically determine the process-level CryptoProvider".
Cause: rustls 0.23 requires a default CryptoProvider (e.g. ring or aws-lc-rs). In the test process, nothing calls CryptoProvider::install_default() before the client uses QUIC/rustls.
Fix: In the E2E test (or in a shared test harness), call rustls::crypto::ring::default_provider().install_default().ok() (or the chosen provider) once at process start before any QUIC/rustls usage. Ensure the crate has exactly one of the ring / aws-lc-rs features so the default is unambiguous.

High (security and reliability)

6. Panic risk in client RPC path

quicnprotochat-client/src/lib.rs: set_auth() uses .expect("init_auth must be called with a non-empty token before RPCs"). If RPC is called without init_auth, the process panics.
Fix: Replace with a Result or an error return (e.g. a dedicated error type) so callers get a recoverable error instead of a panic. Document that init_auth must be called before RPCs.

7. Mutex `.unwrap()` in production paths

sql_store.rs: All self.conn.lock().unwrap() calls can panic if the mutex is poisoned.
storage.rs (file backend): Same pattern with .lock().unwrap() on shared maps.
Coding standards: Prefer handling Result from lock() (e.g. lock().map_err(...)?) or use a type that encapsulates poisoning so production paths don’t panic on contention/poison.

8. `unwrap()` in client library

lib.rs: "0.0.0.0:0".parse().unwrap() for the client endpoint. If parsing ever changed or failed, this would panic.
Fix: Use .context("parse client bind address")? (or equivalent) so this is a proper error path.

9. TLS certificate generation is silent on first run

Server auto-generates a self-signed cert if files are missing. Production readiness WBS says: "Self-signed certificates acceptable for development; production deployments must use a CA-signed certificate or certificate pinning."
Fix: Add a startup check (e.g. env or config flag) that in production rejects auto-generation and requires existing cert/key paths. Log clearly when running with self-signed certs so operators know they’re in dev mode.

Medium (hygiene and ops)

10. No CI pipeline

Production Readiness WBS expects: GitHub Actions with cargo test --workspace, cargo clippy, cargo fmt --check, cargo deny check.
Current state: No .github/workflows (or equivalent) found.
Fix: Add a CI workflow that runs tests, clippy, fmt, and deny so every PR is validated.

11. No CODEOWNERS

WBS requires CODEOWNERS for review ownership and security-sensitive changes.
Fix: Add .github/CODEOWNERS mapping crates to owners.

12. No dependency audit in CI

WBS mentions cargo audit in CI.
Fix: Add a CI job that runs cargo audit (and optionally cargo deny check) and fails on known vulnerabilities or policy violations.

13. No `deny.toml` / `deny.toml` config

Coding standards reference cargo deny check; no config file was found.
Fix: Add deny.toml (or equivalent) and run cargo deny check in CI.

14. Warnings in build

Cap'n Proto generated code: unused_parens in generated .rs files. Standards allow #[allow(...)] on generated code; consider suppressing in the codegen output or in the crate that includes it.
Server: SessionInfo has username and identity_key never read (dead code). Either use them (e.g. audit logging) or remove/allow with a short comment.
E2E test: Deprecated cargo_bin, unused_mut; trivial to fix.
openmls: Future-incompat warning; track upstream and plan upgrade.

15. Docker image runs as `nobody`

Dockerfile uses USER nobody. Good for not running as root, but nobody may not have a writable home or data dir.
Fix: Ensure QUICNPROTOCHAT_DATA_DIR (and cert paths) point to a directory writable by nobody, or create a dedicated user/group with a known UID and use that in the Dockerfile and docs.

Already in good shape

Auth token comparison: Uses subtle::ConstantTimeEq (ct_eq) for the static token — good.
Input validation: Recipient key length (32), payload size (5 MB), wire version, rate limiting, queue depth — present and consistent.
Structured logging: tracing with env filter; no secret material in log messages in the reviewed paths.
Error handling: RPC handlers return coded errors; no unwrap() on crypto in server RPC paths.
Health endpoint: Server exposes a health RPC used by E2E and can be used for readiness probes.

Summary checklist

Area	Status	Action
Auth / tokens	Fix	Require strong auth in prod; document devtoken / empty db_key
DB encryption	Fix	Require non-empty db_key in production
.gitignore	Fix	Add `data/` (and cert/DB patterns as needed)
Dockerfile	Fix	Include p2p crate in workspace build
E2E test	Fix	Set rustls CryptoProvider in test harness
Client panic	Improve	Replace expect with Result in set_auth
Mutex unwrap	Improve	Handle poison or use non-panicking API
TLS in production	Improve	Reject auto-generated cert in prod mode
CI / CODEOWNERS	Add	GitHub Actions, deny, audit, CODEOWNERS
Warnings	Clean up	Dead code, deprecated APIs, generated allows

This audit should be revisited after implementing Phase 1–2 of the Production Readiness WBS and before any production deployment.

7.7 KiB Raw Permalink Blame History Unescape Escape