DM channels (createChannel), channel authz, security/docs, future improvements

- Add createChannel RPC (node.capnp @18): create 1:1 channel, returns 16-byte channelId
- Store: create_channel(member_a, member_b), get_channel_members(channel_id)
- FileBackedStore: channels.bin; SqlStore: migration 003_channels, schema v4
- channel_ops: handle_create_channel (auth + identity, peerKey 32 bytes)
- Delivery authz: when channel_id.len() == 16, require that both caller and recipient are channel members (E022/E023)
- Error codes E022 CHANNEL_ACCESS_DENIED, E023 CHANNEL_NOT_FOUND
- SUMMARY: link Certificate lifecycle; security audit, future improvements, multi-agent plan docs
- Certificate lifecycle doc, SECURITY-AUDIT, FUTURE-IMPROVEMENTS, MULTI-AGENT-WORK-PLAN
- Client/core/tls/auth/server main: assorted fixes and updates from review and audit

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-23 22:54:28 +01:00
parent 6b8b61c6ae
commit 750b794342
40 changed files with 4715 additions and 152 deletions

docs/FUTURE-IMPROVEMENTS.md Normal file

@@ -0,0 +1,182 @@
# Future Improvements
This document consolidates suggested improvements for quicnprotochat, drawn from the [roadmap](src/roadmap/milestones.md), [production readiness WBS](src/roadmap/production-readiness.md), [security audit](SECURITY-AUDIT.md), [production readiness audit](PRODUCTION-READINESS-AUDIT.md), and [future research](src/roadmap/future-research.md). Items are grouped by theme and ordered by impact and dependency.
---
## 1. Security and hardening
### 1.1 M7 — Post-quantum MLS (next milestone)
- **Goal:** Hybrid X25519 + ML-KEM-768 in the MLS crypto provider so group key material has post-quantum confidentiality.
- **Ref:** [Milestones § M7](src/roadmap/milestones.md), [Hybrid KEM](src/protocol-layers/hybrid-kem.md).
- **Status:** Hybrid KEM exists at the envelope level; integrate into OpenMLS provider and run full test suite.
### 1.2 CA-signed TLS / certificate lifecycle
- **Current:** Self-signed certs; client pins by using server cert as `ca_cert`.
- **Improve:** Document or add support for CA-issued certs (e.g. Let's Encrypt), cert rotation, and optional OCSP/CRL. Keep pinning as the recommended option for single-server deployments.
- **Ref:** [Threat model § Known gaps](src/cryptography/threat-model.md).
### 1.3 Stronger credential binding
- **Current:** MLS `BasicCredential` (raw Ed25519); no revocation or CA chain.
- **Improve:** X.509-based MLS credentials, or Key Transparency / verifiable log for public keys to detect substitution.
- **Ref:** [Threat model](src/cryptography/threat-model.md), [Future research](src/roadmap/future-research.md).
### 1.4 Username enumeration
- **Current:** OPAQUE login start uses `get_user_record`; timing or response shape might reveal user existence.
- **Improve:** If user enumeration is in scope, consider constant-time or uniform response for unknown users (without weakening OPAQUE).
- **Ref:** [Security audit § 8.3](SECURITY-AUDIT.md).
---
## 2. Authorization and abuse prevention
### 2.1 Full AUTHZ plan (accounts, devices, tokens)
- **Current:** Bearer/session tokens and identity binding; no formal account/device model.
- **Improve:** Implement the [authz plan](src/roadmap/authz-plan.md): accounts, devices, device_id in Auth, per-account/per-device rate limits, and binding KeyPackage uploads to the authenticated account.
- **Ref:** [Production readiness WBS](src/roadmap/production-readiness.md), [Threat model § No client auth on DS](src/cryptography/threat-model.md).
### 2.2 Per-IP and connection limits
- **Current:** Per-token rate limit; no per-IP or global connection cap.
- **Improve:** Configurable per-IP rate limit and max concurrent QUIC connections to reduce DoS and resource exhaustion.
- **Ref:** [Production readiness WBS § Abuse / DoS](src/roadmap/production-readiness.md).
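One way to picture the per-IP limit and the global connection cap is a token bucket keyed by address plus a simple counter. The sketch below is only an illustration under stated assumptions: `Limits`, `IpLimiter`, and all field names are hypothetical and do not come from the quicnprotochat codebase, and a real server would also need to evict idle buckets.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::Instant;

/// Hypothetical limits; names and values are illustrative, not project config.
#[derive(Clone, Copy)]
pub struct Limits {
    pub burst: f64,             // maximum tokens an IP can accumulate
    pub refill_per_sec: f64,    // tokens restored per second
    pub max_connections: usize, // global cap on concurrent connections
}

struct Bucket {
    tokens: f64,
    last: Instant,
}

pub struct IpLimiter {
    limits: Limits,
    buckets: HashMap<IpAddr, Bucket>,
    open_connections: usize,
}

impl IpLimiter {
    pub fn new(limits: Limits) -> Self {
        Self { limits, buckets: HashMap::new(), open_connections: 0 }
    }

    /// Returns true if a new connection fits under the global cap.
    pub fn try_open_connection(&mut self) -> bool {
        if self.open_connections >= self.limits.max_connections {
            return false;
        }
        self.open_connections += 1;
        true
    }

    /// Call when a connection closes so its slot can be reused.
    pub fn close_connection(&mut self) {
        self.open_connections = self.open_connections.saturating_sub(1);
    }

    /// Token-bucket check for one request from `ip` observed at `now`.
    pub fn allow(&mut self, ip: IpAddr, now: Instant) -> bool {
        let limits = self.limits;
        let bucket = self
            .buckets
            .entry(ip)
            .or_insert(Bucket { tokens: limits.burst, last: now });
        // Refill in proportion to elapsed time, never beyond the burst size.
        let elapsed = now.saturating_duration_since(bucket.last).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * limits.refill_per_sec).min(limits.burst);
        bucket.last = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Passing `now` explicitly keeps the limiter deterministic in tests; the accept loop would call `try_open_connection` once per connection and `allow` once per RPC.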
---
## 3. Reliability and resilience
### 3.1 Client offline queue and retry
- **Current:** Retry with backoff for RPCs; no offline queue or gap detection.
- **Improve:** Offline message queue, idempotent message IDs, and gap detection so clients can recover after long disconnects without duplicate or lost messages.
- **Ref:** [Production readiness WBS § Client resilience](src/roadmap/production-readiness.md).
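The receive side of this idea can be sketched as follows, assuming that each queue carries a monotonically increasing sequence number and that every message has a stable ID. Both are assumptions made for illustration; `MessageId`, `Outcome`, and `Receiver` are hypothetical names, and the current wire format may not expose either field.

```rust
use std::collections::HashSet;

/// Hypothetical message identifier; the real wire format is not specified here.
pub type MessageId = [u8; 16];

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Message is new and in order; hand it to the application.
    Deliver,
    /// Message was already processed; safe to drop.
    Duplicate,
    /// Earlier messages are missing; re-fetch this inclusive range first.
    Gap { missing_from: u64, missing_to: u64 },
}

pub struct Receiver {
    next_seq: u64,
    seen: HashSet<MessageId>,
}

impl Receiver {
    pub fn new() -> Self {
        Self { next_seq: 0, seen: HashSet::new() }
    }

    /// Classifies an incoming message by sequence number and ID.
    pub fn on_message(&mut self, seq: u64, id: MessageId) -> Outcome {
        if seq < self.next_seq || self.seen.contains(&id) {
            return Outcome::Duplicate;
        }
        if seq > self.next_seq {
            // Do not advance: the caller fetches the gap, then retries this message.
            return Outcome::Gap { missing_from: self.next_seq, missing_to: seq - 1 };
        }
        self.seen.insert(id);
        self.next_seq += 1;
        Outcome::Deliver
    }
}
```

The `seen` set grows without bound in this sketch; a real client would persist `next_seq` and prune IDs older than some retention window.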
### 3.2 Connection draining and graceful shutdown
- **Current:** QUIC endpoint closed on ctrl_c; in-flight RPCs may be cut off mid-request.
- **Improve:** Draining period: stop accepting new connections, wait for in-flight RPCs (with timeout), then close. Document expected behaviour for load balancers.
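The draining sequence described above (stop admitting work, wait for in-flight RPCs with a timeout, then close) could look roughly like this. It is a minimal sketch using only the standard library; `Drain` and `RpcGuard` are hypothetical names, and an async server would more likely use its runtime's notification primitives than a polling loop.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Shared shutdown state (hypothetical; not taken from the server code).
#[derive(Default)]
pub struct Drain {
    draining: AtomicBool,
    in_flight: AtomicUsize,
}

/// Marks one RPC as in flight until it is dropped.
pub struct RpcGuard(Arc<Drain>);

impl Drop for RpcGuard {
    fn drop(&mut self) {
        self.0.in_flight.fetch_sub(1, Ordering::SeqCst);
    }
}

impl Drain {
    /// Call at the start of each RPC; returns None once draining has begun.
    pub fn begin_rpc(self: &Arc<Self>) -> Option<RpcGuard> {
        // Count first, then check the flag, so `drain` cannot miss this RPC.
        self.in_flight.fetch_add(1, Ordering::SeqCst);
        if self.draining.load(Ordering::SeqCst) {
            self.in_flight.fetch_sub(1, Ordering::SeqCst);
            return None;
        }
        Some(RpcGuard(Arc::clone(self)))
    }

    /// Stops admitting work, then waits up to `timeout` for in-flight RPCs.
    /// Returns true if all RPCs finished before the deadline.
    pub fn drain(&self, timeout: Duration) -> bool {
        self.draining.store(true, Ordering::SeqCst);
        let deadline = Instant::now() + timeout;
        while self.in_flight.load(Ordering::SeqCst) > 0 {
            if Instant::now() >= deadline {
                return false;
            }
            thread::sleep(Duration::from_millis(10));
        }
        true
    }
}
```

In this shape the ctrl_c handler would call `drain(timeout)` and close the endpoint afterwards, whether or not the wait succeeded.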
### 3.3 N-1 compatibility and wire versioning
- **Current:** `CURRENT_WIRE_VERSION` and server-side check; no formal N-1 support policy.
- **Improve:** Document supported client/server version matrix and how to deprecate old wire versions safely.
- **Ref:** [Production readiness WBS § Compatibility](src/roadmap/production-readiness.md).
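An N-1 policy can be expressed as a small classification of the client's advertised version. In the sketch below, the value of `CURRENT_WIRE_VERSION` is a placeholder, and `SUPPORTED_BACK`, `VersionCheck`, and `check_wire_version` are hypothetical names introduced only for illustration.

```rust
/// Placeholder value; the project's real constant may differ.
pub const CURRENT_WIRE_VERSION: u16 = 2;
/// Hypothetical: how many older versions are still accepted (N-1 means 1).
pub const SUPPORTED_BACK: u16 = 1;

#[derive(Debug, PartialEq, Eq)]
pub enum VersionCheck {
    /// Client speaks the current version.
    Ok,
    /// Client is older but still inside the support window; accept and warn.
    Deprecated,
    /// Client is older than the support window; reject.
    TooOld,
    /// Client is newer than this server; reject.
    TooNew,
}

/// Classifies a client's wire version against the server's support window.
pub fn check_wire_version(client: u16) -> VersionCheck {
    let oldest = CURRENT_WIRE_VERSION.saturating_sub(SUPPORTED_BACK);
    if client > CURRENT_WIRE_VERSION {
        VersionCheck::TooNew
    } else if client == CURRENT_WIRE_VERSION {
        VersionCheck::Ok
    } else if client >= oldest {
        VersionCheck::Deprecated
    } else {
        VersionCheck::TooOld
    }
}
```

Separating `Deprecated` from `TooOld` lets the server log or signal upcoming removals before it starts rejecting a version outright.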
---
## 4. Operations and observability
### 4.1 CI pipeline
- **Add:** GitHub Actions (or equivalent) for:
  - `cargo test --workspace`
  - `cargo clippy`
  - `cargo fmt --check`
  - `cargo audit` (and optionally `cargo deny check`)
- **Ref:** [Production readiness audit § 10](PRODUCTION-READINESS-AUDIT.md).
### 4.2 CODEOWNERS and review policy
- **Add:** `.github/CODEOWNERS` mapping crates to owners; document that security-sensitive changes (crypto, auth, wire format) require two reviewers.
- **Ref:** [Production readiness WBS § Governance](src/roadmap/production-readiness.md).
### 4.3 Dependency policy (deny.toml)
- **Add:** `deny.toml` (or equivalent) for `cargo deny` (licenses, duplicate crates, banned crates, etc.) and run in CI.
- **Ref:** [Production readiness audit § 13](PRODUCTION-READINESS-AUDIT.md).
### 4.4 HTTP health endpoint (optional)
- **Current:** Health is an RPC over QUIC; no separate HTTP endpoint.
- **Improve:** Either add an optional HTTP endpoint (e.g. on port 8080) serving `/health` or `/ready` for load balancers and orchestrators that expect HTTP, or document that health is QUIC-only and how to probe it.
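If the HTTP option is chosen, the endpoint can stay very small. The following is a minimal sketch using only the standard library; `respond`, `handle`, and `serve` are hypothetical names, the response bodies are arbitrary, and a production server would more likely reuse an existing HTTP crate and run this on its own task or thread.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::time::Duration;

/// Maps an HTTP request line to a complete HTTP/1.1 response.
pub fn respond(request_line: &str, ready: bool) -> String {
    let mut parts = request_line.split_whitespace();
    let method = parts.next().unwrap_or("");
    let path = parts.next().unwrap_or("");
    let (status, body) = match (method, path) {
        ("GET", "/health") => ("200 OK", "ok"),
        ("GET", "/ready") if ready => ("200 OK", "ready"),
        ("GET", "/ready") => ("503 Service Unavailable", "not ready"),
        _ => ("404 Not Found", "not found"),
    };
    format!(
        "HTTP/1.1 {}\r\nContent-Type: text/plain\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        status,
        body.len(),
        body
    )
}

/// Reads one request from the stream and writes the probe response.
fn handle(stream: TcpStream, ready: bool) -> std::io::Result<()> {
    // Bound how long a slow or stalled client can hold the listener.
    stream.set_read_timeout(Some(Duration::from_secs(2)))?;
    let mut reader = BufReader::new(stream);
    let mut request_line = String::new();
    reader.read_line(&mut request_line)?;
    // Consume the remaining headers up to the blank line.
    let mut header = String::new();
    loop {
        header.clear();
        if reader.read_line(&mut header)? == 0 || header.trim_end().is_empty() {
            break;
        }
    }
    reader.get_mut().write_all(respond(&request_line, ready).as_bytes())
}

/// Serves probes on `addr` (for example "127.0.0.1:8080") until accept fails.
pub fn serve(addr: &str, ready: impl Fn() -> bool) -> std::io::Result<()> {
    let listener = TcpListener::bind(addr)?;
    for stream in listener.incoming() {
        // A failed probe connection should not stop the listener.
        let _ = handle(stream?, ready());
    }
    Ok(())
}
```

The `ready` closure is where the server would report whether the QUIC endpoint and storage backend are up; how that is determined is left open here.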
### 4.5 Docker user and writable paths
- **Current:** Image runs as `nobody`; data dir may not be writable.
- **Improve:** Create a dedicated user/group in the image and set `QUICNPROTOCHAT_DATA_DIR` (and cert paths) to a directory writable by that user; document in deployment docs.
- **Ref:** [Production readiness audit § 15](PRODUCTION-READINESS-AUDIT.md).
---
## 5. Features and product
### 5.1 Private 1:1 channels (DM)
- **Goal:** Channel creation, per-channel authz, TTL, and DM-specific flows so 1:1 chats are first-class and access-controlled.
- **Ref:** [DM channels](src/roadmap/dm-channels.md), [Production readiness WBS](src/roadmap/production-readiness.md).
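The per-channel authorization rule from the accompanying commit (for a 16-byte `channel_id`, both caller and recipient must be channel members, with E022 and E023 as error codes) can be sketched as below. `MemStore` is an in-memory stand-in, `authorize_delivery` and `AuthzError` are hypothetical names, and the real `Store` trait signatures may differ.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

pub type ChannelId = [u8; 16];
pub type IdentityKey = [u8; 32];

#[derive(Debug, PartialEq, Eq)]
pub enum AuthzError {
    ChannelAccessDenied, // E022 CHANNEL_ACCESS_DENIED
    ChannelNotFound,     // E023 CHANNEL_NOT_FOUND
}

/// In-memory stand-in for the store; not the project's `Store` trait.
#[derive(Default)]
pub struct MemStore {
    channels: HashMap<ChannelId, (IdentityKey, IdentityKey)>,
}

impl MemStore {
    /// Registers a 1:1 channel between two identity keys.
    pub fn create_channel(&mut self, id: ChannelId, a: IdentityKey, b: IdentityKey) {
        self.channels.insert(id, (a, b));
    }

    /// Returns both members of the channel, if it exists.
    pub fn get_channel_members(&self, id: &ChannelId) -> Option<(IdentityKey, IdentityKey)> {
        self.channels.get(id).copied()
    }
}

/// Checks that caller and recipient both belong to the channel.
pub fn authorize_delivery(
    store: &MemStore,
    channel_id: &[u8],
    caller: &IdentityKey,
    recipient: &IdentityKey,
) -> Result<(), AuthzError> {
    // Only 16-byte IDs are treated as channels; other lengths skip this check.
    let id = match ChannelId::try_from(channel_id) {
        Ok(id) => id,
        Err(_) => return Ok(()),
    };
    let (a, b) = store
        .get_channel_members(&id)
        .ok_or(AuthzError::ChannelNotFound)?;
    let is_member = |key: &IdentityKey| *key == a || *key == b;
    if is_member(caller) && is_member(recipient) {
        Ok(())
    } else {
        Err(AuthzError::ChannelAccessDenied)
    }
}
```

Returning distinct errors for a missing channel and a non-member mirrors the two error codes; whether exposing that distinction to callers is desirable is a separate design question.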
### 5.2 MLS lifecycle (remove, update, proposals)
- **Current:** Add member, send, receive; no remove/update or explicit proposal handling.
- **Improve:** Member remove, credential update, and handling of MLS proposals (Remove, Update) for full group lifecycle.
- **Ref:** [Milestones § M5](src/roadmap/milestones.md) (optional follow-ups).
### 5.3 Sealed Sender and metadata resistance
- **Goal:** Hide sender identity from the server (sender inside MLS ciphertext); optionally PIR for fetch so server does not learn which queue was accessed.
- **Ref:** [Threat model § Future mitigations](src/cryptography/threat-model.md), [Future research](src/roadmap/future-research.md).
### 5.4 Traffic analysis resistance
- **Goal:** Padding and/or traffic shaping to reduce inference from message sizes and timing.
- **Ref:** [Threat model § Future mitigations](src/cryptography/threat-model.md).
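Size padding is the simpler half of this goal. A minimal sketch, assuming a fixed bucket size and a 4-byte length prefix (both hypothetical choices, as are the names `BUCKET`, `pad`, and `unpad`), is shown below. To be useful, padding has to be applied to the plaintext before encryption so that the ciphertext length only reveals the bucket.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical bucket size in bytes; padded lengths are multiples of this.
pub const BUCKET: usize = 256;

/// Layout: 4-byte big-endian payload length, payload, zero fill to a bucket boundary.
pub fn pad(payload: &[u8]) -> Vec<u8> {
    let len = u32::try_from(payload.len()).expect("payload too large for u32 length prefix");
    // Round (prefix + payload) up to the next multiple of BUCKET.
    let total = (4 + payload.len() + BUCKET - 1) / BUCKET * BUCKET;
    let mut out = Vec::with_capacity(total);
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(payload);
    out.resize(total, 0);
    out
}

/// Recovers the payload, or None if the buffer is shorter than its prefix claims.
pub fn unpad(padded: &[u8]) -> Option<&[u8]> {
    let header: [u8; 4] = padded.get(..4)?.try_into().ok()?;
    let len = u32::from_be_bytes(header) as usize;
    padded.get(4..4usize.checked_add(len)?)
}
```

Fixed buckets hide length only within a bucket, and they do nothing about timing; traffic shaping would need to be handled separately.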
---
## 6. Transport and topology
### 6.1 P2P / NAT traversal (iroh, LibP2P)
- **Goal:** Direct peer-to-peer when possible; server as optional relay/rendezvous. Reduces single-point-of-failure and can improve latency.
- **Ref:** [Future research § LibP2P / iroh](src/roadmap/future-research.md). The `quicnprotochat-p2p` crate is a starting point.
### 6.2 WebTransport (browser client)
- **Goal:** HTTP/3 + WebTransport endpoint so a web client can use the same RPC layer without raw QUIC in the browser.
- **Ref:** [Future research § WebTransport](src/roadmap/future-research.md).
### 6.3 Tor / I2P
- **Goal:** Optional routing over Tor or I2P to hide client IP and reduce metadata leakage.
- **Ref:** [Threat model § Future mitigations](src/cryptography/threat-model.md), [Future research](src/roadmap/future-research.md).
---
## 7. Code and maintenance
### 7.1 Warnings and dead code
- **Clean up:** `unused_parens` warnings in Cap'n Proto generated code; dead fields in `SessionInfo` (use or document them); deprecated `cargo_bin` usage and `unused_mut` warnings in the E2E tests; track the openmls future-incompatibility warning.
- **Ref:** [Production readiness audit § 14](PRODUCTION-READINESS-AUDIT.md).
### 7.2 Integration and E2E coverage
- **Add:** More integration tests (e.g. auth + delivery together, failure paths, concurrent register, rate limit, queue full). Broader E2E scenarios (multi-party, rejoin, key refresh).
- **Ref:** [Multi-perspective review](SECURITY-AUDIT.md), maintainability section.
---
## Priority overview
| Priority | Theme | Examples |
|----------|--------|----------|
| **High** | Security | M7 PQ, CA/pinning docs, AUTHZ plan, CI + audit |
| **High** | Ops | CI, CODEOWNERS, deny.toml, Docker user/paths |
| **Medium** | Reliability | Offline queue, draining, N-1 policy |
| **Medium** | Features | DM channels, MLS remove/update |
| **Lower** | Research | Sealed Sender, PIR, P2P, WebTransport, Tor |
---
## Related documents
- [Milestones](src/roadmap/milestones.md) — M7 and beyond
- [Production readiness WBS](src/roadmap/production-readiness.md) — phased hardening
- [Future research](src/roadmap/future-research.md) — technologies and options
- [Security audit](SECURITY-AUDIT.md) — recommendations and status
- [Production readiness audit](PRODUCTION-READINESS-AUDIT.md) — checklist and fixes