Production Readiness Audit
This document summarizes issues and fixes needed to get quicprochat production-ready, based on a codebase review. It aligns with the existing Production Readiness WBS and Coding Standards.
Critical (fix before production)
1. Auth token and dev defaults
- README and example config use `auth_token = "devtoken"` and `db_key = ""`.
- Risk: Deploying with the default/example config allows weak or no auth and an unencrypted DB.
- Fix: Require an explicit `QPC_AUTH_TOKEN` (or config value) in production; reject empty or `"devtoken"` when a production mode/env is set. Document that an empty `db_key` disables SQLCipher and is not acceptable for production.
2. Database encryption optional
- `sql_store.rs`: If `db_key` is empty, SQLCipher is not applied; the DB is plaintext on disk.
- Fix: In production, require a non-empty `db_key` (or fail startup with a clear error). Document in README and deployment docs.
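Items 1 and 2 both come down to refusing to start with unsafe settings. A minimal sketch of such a startup check follows. The `ServerConfig` struct, its `production` flag, and the `ConfigError` type are hypothetical names chosen for illustration; the real server config may be shaped differently.

```rust
use std::fmt;

/// Hypothetical view of the settings relevant to items 1 and 2.
#[derive(Debug, Clone)]
pub struct ServerConfig {
    /// Assumed flag, e.g. derived from an env var or config entry.
    pub production: bool,
    /// Value of QPC_AUTH_TOKEN or the `auth_token` config entry.
    pub auth_token: String,
    /// Empty means SQLCipher is not applied.
    pub db_key: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    WeakAuthToken,
    MissingDbKey,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::WeakAuthToken => {
                write!(f, "auth token is empty or the dev default; set QPC_AUTH_TOKEN")
            }
            ConfigError::MissingDbKey => {
                write!(f, "db_key is empty; the database would be stored as plaintext")
            }
        }
    }
}

impl std::error::Error for ConfigError {}

/// Rejects dev defaults when running in production mode.
/// In non-production mode every configuration is accepted.
pub fn validate_for_startup(cfg: &ServerConfig) -> Result<(), ConfigError> {
    if !cfg.production {
        return Ok(());
    }
    let token = cfg.auth_token.trim();
    if token.is_empty() || token == "devtoken" {
        return Err(ConfigError::WeakAuthToken);
    }
    if cfg.db_key.is_empty() {
        return Err(ConfigError::MissingDbKey);
    }
    Ok(())
}
```

Calling a check like this once, before the listener or the database is opened, turns a silent misconfiguration into a startup failure with a clear message.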
3. Secrets and generated files not ignored
- `.gitignore` does not include `data/`, so `data/server-cert.der`, `data/server-key.der`, and `data/qpc.db` could be committed.
- Fix: Add `data/` (and any other dirs that hold certs, keys, or DBs) to `.gitignore`. Consider adding `*.der` and `*.db` if used only for local/dev.
4. Dockerfile out of sync with workspace
- Workspace has 5 members including `crates/quicprochat-p2p`.
- Dockerfile only copies 4 crate manifests and creates stub dirs for those 4; it never copies `quicprochat-p2p`.
- Result: `cargo build --release --bin quicprochat-server` can fail (missing workspace member) or behave inconsistently.
- Fix: Add `COPY crates/quicprochat-p2p/Cargo.toml` and a stub `crates/quicprochat-p2p/src` (or equivalent) in the dependency-cache layer so the workspace resolves. Ensure the final `COPY crates/ crates/` still brings in the real p2p source.
5. E2E test failing (rustls CryptoProvider)
- Symptom: `e2e_happy_path_register_invite_join_send_recv` panics: "Could not automatically determine the process-level CryptoProvider".
- Cause: rustls 0.23 requires a default `CryptoProvider` (e.g. `ring` or `aws-lc-rs`). In the test process, nothing calls `CryptoProvider::install_default()` before the client uses QUIC/rustls.
- Fix: In the E2E test (or in a shared test harness), call `rustls::crypto::ring::default_provider().install_default().ok()` (or the chosen provider) once at process start before any QUIC/rustls usage. Ensure the crate has exactly one of the `ring`/`aws-lc-rs` features so the default is unambiguous.
High (security and reliability)
6. Panic risk in client RPC path
- `quicprochat-client/src/lib.rs`: `set_auth()` uses `.expect("init_auth must be called with a non-empty token before RPCs")`. If an RPC is called without `init_auth`, the process panics.
- Fix: Replace with a `Result` or an error return (e.g. a dedicated error type) so callers get a recoverable error instead of a panic. Document that `init_auth` must be called before RPCs.
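One possible shape for the non-panicking variant is sketched below. The `AuthState` struct, the `AuthError` type, and the method signatures are assumptions made for illustration; they are not the client's actual API.

```rust
use std::fmt;

#[derive(Debug, PartialEq, Eq)]
pub enum AuthError {
    /// An RPC was attempted before `init_auth` succeeded.
    NotInitialized,
    /// `init_auth` was called with an empty token.
    EmptyToken,
}

impl fmt::Display for AuthError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AuthError::NotInitialized => write!(f, "init_auth must be called before RPCs"),
            AuthError::EmptyToken => write!(f, "auth token must not be empty"),
        }
    }
}

impl std::error::Error for AuthError {}

/// Hypothetical stand-in for the client's auth state.
#[derive(Default)]
pub struct AuthState {
    token: Option<String>,
}

impl AuthState {
    /// Stores the token; rejects an empty one instead of accepting it silently.
    pub fn init_auth(&mut self, token: &str) -> Result<(), AuthError> {
        if token.is_empty() {
            return Err(AuthError::EmptyToken);
        }
        self.token = Some(token.to_owned());
        Ok(())
    }

    /// Returns the token to attach to an RPC, or an error the caller can handle.
    pub fn auth_token(&self) -> Result<&str, AuthError> {
        self.token.as_deref().ok_or(AuthError::NotInitialized)
    }
}
```

With this shape, an RPC wrapper can propagate `AuthError::NotInitialized` with `?` and leave the decision about retrying or aborting to the caller.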
7. Mutex .unwrap() in production paths
- `sql_store.rs`: All `self.conn.lock().unwrap()` calls can panic if the mutex is poisoned.
- `storage.rs` (file backend): Same pattern with `.lock().unwrap()` on shared maps.
- Coding standards: Prefer handling the `Result` from `lock()` (e.g. `lock().map_err(...)?`) or use a type that encapsulates poisoning so production paths don’t panic on a poisoned lock. (`lock()` blocks under contention; it returns an error only when another thread panicked while holding the guard.)
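The `lock().map_err(...)?` pattern can be sketched as follows. The `Store` type, its in-memory map, and `StoreError` are hypothetical stand-ins; the real stores wrap a database connection and file-backed maps.

```rust
use std::collections::HashMap;
use std::fmt;
use std::sync::{Mutex, MutexGuard};

#[derive(Debug, PartialEq, Eq)]
pub enum StoreError {
    LockPoisoned,
}

impl fmt::Display for StoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StoreError::LockPoisoned => write!(f, "store lock poisoned by a panicked thread"),
        }
    }
}

impl std::error::Error for StoreError {}

pub type Map = HashMap<String, Vec<u8>>;

/// Hypothetical stand-in for a store that guards shared state with a mutex.
pub struct Store {
    map: Mutex<Map>,
}

impl Store {
    pub fn new() -> Self {
        Store { map: Mutex::new(HashMap::new()) }
    }

    /// Single place where poisoning is converted into a recoverable error.
    pub fn lock(&self) -> Result<MutexGuard<'_, Map>, StoreError> {
        self.map.lock().map_err(|_| StoreError::LockPoisoned)
    }

    pub fn put(&self, key: &str, value: Vec<u8>) -> Result<(), StoreError> {
        self.lock()?.insert(key.to_owned(), value);
        Ok(())
    }

    pub fn get(&self, key: &str) -> Result<Option<Vec<u8>>, StoreError> {
        Ok(self.lock()?.get(key).cloned())
    }
}
```

Routing every access through one helper keeps the policy in a single place. If the guarded data is known to stay consistent after a panic, `PoisonError::into_inner` is the standard-library way to recover the guard instead of returning an error.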
8. unwrap() in client library
- `lib.rs`: `"0.0.0.0:0".parse().unwrap()` for the client endpoint. If parsing ever changed or failed, this would panic.
- Fix: Use `.context("parse client bind address")?` (or equivalent) so this is a proper error path.
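A standard-library-only sketch of the same idea is shown below; the function names are hypothetical. It also shows an infallible alternative for the fixed wildcard address, which avoids parsing altogether.

```rust
use std::net::{Ipv4Addr, SocketAddr};

/// Parses a bind address and returns a descriptive error instead of panicking.
pub fn parse_bind_addr(addr: &str) -> Result<SocketAddr, String> {
    addr.parse::<SocketAddr>()
        .map_err(|e| format!("parse client bind address {addr:?}: {e}"))
}

/// Builds the wildcard address 0.0.0.0:0 without any parsing step,
/// so there is no failure case to handle.
pub fn wildcard_bind_addr() -> SocketAddr {
    SocketAddr::from((Ipv4Addr::UNSPECIFIED, 0))
}
```

If the address is always the literal `0.0.0.0:0`, the second form removes the error path entirely; the first form is the better fit once the address becomes configurable.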
9. TLS certificate generation is silent on first run
- Server auto-generates a self-signed cert if files are missing. Production readiness WBS says: "Self-signed certificates acceptable for development; production deployments must use a CA-signed certificate or certificate pinning."
- Fix: Add a startup check (e.g. env or config flag) that in production rejects auto-generation and requires existing cert/key paths. Log clearly when running with self-signed certs so operators know they’re in dev mode.
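The startup check could look roughly like the sketch below. The `production` flag, the `CertDecision` enum, and the function name are assumptions for illustration; how production mode is actually signalled (env var, config flag) is left open above.

```rust
use std::path::Path;

#[derive(Debug, PartialEq, Eq)]
pub enum CertDecision {
    /// Both files exist; load them.
    UseExisting,
    /// Dev mode only: generate a self-signed cert and log that loudly.
    GenerateSelfSigned,
}

/// Decides how the server obtains its TLS material.
/// In production mode, missing files are an error rather than a trigger
/// for silent self-signed generation.
pub fn cert_decision(production: bool, cert: &Path, key: &Path) -> Result<CertDecision, String> {
    let present = cert.is_file() && key.is_file();
    match (present, production) {
        (true, _) => Ok(CertDecision::UseExisting),
        (false, true) => Err(format!(
            "production mode requires existing TLS files: {} and {}",
            cert.display(),
            key.display()
        )),
        (false, false) => Ok(CertDecision::GenerateSelfSigned),
    }
}
```

The caller would emit a warning-level log line whenever it receives `GenerateSelfSigned`, so operators can see from the logs that the server is running in dev mode.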
Medium (hygiene and ops)
10. No CI pipeline
- Production Readiness WBS expects: GitHub Actions with `cargo test --workspace`, `cargo clippy`, `cargo fmt --check`, `cargo deny check`.
- Current state: No `.github/workflows` (or equivalent) found.
- Fix: Add a CI workflow that runs tests, clippy, fmt, and deny so every PR is validated.
11. No CODEOWNERS
- WBS requires CODEOWNERS for review ownership and security-sensitive changes.
- Fix: Add `.github/CODEOWNERS` mapping crates to owners.
12. No dependency audit in CI
- WBS mentions `cargo audit` in CI.
- Fix: Add a CI job that runs `cargo audit` (and optionally `cargo deny check`) and fails on known vulnerabilities or policy violations.
13. No deny.toml config
- Coding standards reference `cargo deny check`; no config file was found.
- Fix: Add `deny.toml` (or equivalent) and run `cargo deny check` in CI.
14. Warnings in build
- Cap'n Proto generated code: `unused_parens` in generated `.rs` files. Standards allow `#[allow(...)]` on generated code; consider suppressing in the codegen output or in the crate that includes it.
- Server: `SessionInfo` has `username` and `identity_key` never read (dead code). Either use them (e.g. audit logging) or remove/allow with a short comment.
- E2E test: Deprecated `cargo_bin`, `unused_mut`; trivial to fix.
- openmls: Future-incompat warning; track upstream and plan upgrade.
15. Docker image runs as nobody
- Dockerfile uses `USER nobody`. Good for not running as root, but `nobody` may not have a writable home or data dir.
- Fix: Ensure `QPC_DATA_DIR` (and cert paths) point to a directory writable by `nobody`, or create a dedicated user/group with a known UID and use that in the Dockerfile and docs.
Already in good shape
- Auth token comparison: Uses `subtle::ConstantTimeEq` (`ct_eq`) for the static token; good.
- Input validation: Recipient key length (32), payload size (5 MB), wire version, rate limiting, queue depth; all present and consistent.
- Structured logging: `tracing` with env filter; no secret material in log messages in the reviewed paths.
- Error handling: RPC handlers return coded errors; no `unwrap()` on crypto in server RPC paths.
- Health endpoint: Server exposes a health RPC used by E2E and can be used for readiness probes.
Summary checklist
| Area | Status | Action |
|---|---|---|
| Auth / tokens | Fix | Require strong auth in prod; document devtoken / empty db_key |
| DB encryption | Fix | Require non-empty db_key in production |
| .gitignore | Fix | Add data/ (and cert/DB patterns as needed) |
| Dockerfile | Fix | Include p2p crate in workspace build |
| E2E test | Fix | Set rustls CryptoProvider in test harness |
| Client panic | Improve | Replace expect with Result in set_auth |
| Mutex unwrap | Improve | Handle poison or use non-panicking API |
| TLS in production | Improve | Reject auto-generated cert in prod mode |
| CI / CODEOWNERS | Add | GitHub Actions, deny, audit, CODEOWNERS |
| Warnings | Clean up | Dead code, deprecated APIs, generated allows |
This audit should be revisited after implementing Phase 1–2 of the Production Readiness WBS and before any production deployment.