feat: add post-quantum hybrid KEM + SQLCipher persistence
Feature 1 - Post-Quantum Hybrid KEM (X25519 + ML-KEM-768):
- Create hybrid_kem.rs with keygen, encrypt, decrypt + 11 unit tests
- Wire format: version(1) | x25519_eph_pk(32) | mlkem_ct(1088) | nonce(12) | ct
- Add uploadHybridKey/fetchHybridKey RPCs to node.capnp schema
- Server: hybrid key storage in FileBackedStore + RPC handlers
- Client: hybrid keypair in StoredState, auto-wrap/unwrap in send/recv/invite/join
- demo-group runs full hybrid PQ envelope round-trip

Feature 2 - SQLCipher Persistence:
- Extract Store trait from FileBackedStore API
- Create SqlStore (rusqlite + bundled-sqlcipher) with encrypted-at-rest SQLite
- Schema: key_packages, deliveries, hybrid_keys tables with indexes
- Server CLI: --store-backend=sql, --db-path, --db-key flags
- 5 unit tests for SqlStore (FIFO, round-trip, upsert, channel isolation)

Also includes: client lib.rs refactor, auth config, TOML config file support,
mdBook documentation, and various cleanups by user.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
406
docs/src/roadmap/future-research.md
Normal file
@@ -0,0 +1,406 @@
# Future Research Directions

This page catalogues technologies and research directions that could strengthen
quicnprotochat beyond the current [milestone plan](milestones.md). Each entry
includes a brief description, the problem it solves, relevant crates or
specifications, and how it maps to the project architecture.

For the production readiness work breakdown, see
[Production Readiness WBS](production-readiness.md).

---

## Transport and Networking

### LibP2P / iroh (n0)

**Problem:** The current architecture is strictly client-server. Clients behind
NAT cannot communicate directly, and the server is a single point of failure for
delivery.

**Solution:** [LibP2P](https://libp2p.io/) and [iroh](https://iroh.computer/)
(from n0) provide peer discovery, NAT traversal (hole-punching), and relay
fallback. iroh is particularly interesting because it is Rust-native and built on
QUIC, aligning with quicnprotochat's existing transport layer.

**Architecture impact:** Move from pure client-server to a hybrid topology where
peers communicate directly when possible and fall back to server relay when NAT
traversal fails. The server role shifts from mandatory relay to optional
rendezvous/relay node.

**Crates:** `libp2p`, `iroh`, `iroh-net`

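The direct-first, relay-fallback decision described under *Architecture impact* can be sketched without committing to either library. This is a minimal illustration and not the `libp2p` or `iroh` API: `Route`, `choose_route`, and the `can_reach` probe are hypothetical names, and the probe stands in for whatever hole-punching attempt the chosen stack performs.

```rust
use std::net::SocketAddr;

/// How a message reaches a peer in the hybrid topology (illustrative names).
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// A direct connection to one of the peer's candidate addresses.
    Direct(SocketAddr),
    /// Delivery through the server acting as a relay.
    Relay,
}

/// Try each candidate address in order and return the first one the probe
/// reports as reachable; fall back to the relay when none succeeds.
///
/// `can_reach` is a hypothetical stand-in for a real NAT traversal attempt.
pub fn choose_route<F>(candidates: &[SocketAddr], mut can_reach: F) -> Route
where
    F: FnMut(&SocketAddr) -> bool,
{
    candidates
        .iter()
        .copied()
        .find(|addr| can_reach(addr))
        .map(Route::Direct)
        .unwrap_or(Route::Relay)
}
```

With this shape the server relay stays available as the last resort, which matches the "optional rendezvous/relay node" role above.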
### WebTransport (HTTP/3)

**Problem:** Browser clients cannot use raw QUIC. The current stack requires a
native Rust binary.

**Solution:** [WebTransport](https://w3c.github.io/webtransport/) exposes
QUIC-like semantics (multiplexed bidirectional streams, datagrams) to browsers
over HTTP/3. A WebTransport endpoint alongside the existing QUIC listener would
enable a web client without WebSocket degradation.

**Architecture impact:** Add a second listener (HTTP/3 + WebTransport) that
terminates WebTransport and bridges into the existing `NodeService` RPC layer.
Cap'n Proto serialisation works in WASM via the `capnp` crate.

**Crates:** `h3`, `h3-webtransport`, `wtransport`

### Tor / I2P Integration

**Problem:** MLS protects message content, but connection metadata (who connects
to the server, when, how often) leaks to the server and network observers.

**Solution:** Route client-server connections through
[Tor](https://www.torproject.org/) onion services or
[I2P](https://geti2p.net/) tunnels. This provides metadata resistance at the
network layer.

**Architecture impact:** The server exposes a `.onion` address (Tor) or an I2P
destination. Clients connect through the anonymity network. Latency increases
significantly, so this should be optional.

**Crates:** `arti` (Tor client in Rust), `arti-client`

---

## Storage and Persistence

### SQLCipher / libsql (Turso)

**Problem:** At M6, quicnprotochat needs persistent storage for group state, key
material, and message queues. Storing private keys in a plaintext SQLite database
is insufficient.

**Solution:** [SQLCipher](https://www.zetetic.net/sqlcipher/) provides
transparent, page-level AES-256 encryption for SQLite. Alternatively,
[libsql](https://turso.tech/libsql) (Turso) offers a SQLite fork with
encryption, replication, and embedded server capabilities.

**Architecture impact:** Replace the `sqlx` SQLite backend with SQLCipher.
Encryption key derived from a user-provided passphrase (via Argon2id) or a
hardware-backed key.

**Crates:** `rusqlite` (with `bundled-sqlcipher` feature), `libsql`

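When the database key is already a 32-byte value (for example the output of Argon2id), SQLCipher accepts it as a raw hex blob in `PRAGMA key`, which skips its internal passphrase derivation. The helper below is a sketch of building that statement; `pragma_key_statement` is a hypothetical name, the key derivation itself is not shown, and a real implementation would want to zeroize the intermediate string.

```rust
/// Build the SQLCipher `PRAGMA key` statement for a raw 32-byte key.
///
/// The raw-key form is `PRAGMA key = "x'<64 hex digits>'";`. It must be the
/// first statement executed on a freshly opened connection.
pub fn pragma_key_statement(key: &[u8; 32]) -> String {
    // Hex-encode the key: two lowercase hex digits per byte.
    let hex: String = key.iter().map(|b| format!("{:02x}", b)).collect();
    format!("PRAGMA key = \"x'{}'\";", hex)
}
```

With `rusqlite`, the resulting statement could be passed to `Connection::execute_batch` right after opening the database file, before any other query touches it.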
### CRDTs (Automerge / Yrs)

**Problem:** Multi-device support requires synchronising state (group membership,
read receipts, settings) across devices without a central authority resolving
conflicts.

**Solution:** Conflict-free replicated data types (CRDTs) allow concurrent edits
to converge without coordination. [Automerge](https://automerge.org/) and
[Yrs](https://docs.rs/yrs/) (Yjs in Rust) provide production-quality CRDT
implementations.

**Architecture impact:** Client-side state (contact list, group membership
cache, read markers) stored as CRDT documents. Synchronisation happens over the
existing MLS-encrypted channel, ensuring the server never sees the state.

**Crates:** `automerge`, `yrs`

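To make the convergence property concrete, here is a hand-rolled state-based CRDT for read markers: a map of max-registers, where merging takes the larger sequence number per channel. It is only a sketch of the idea and does not use the Automerge or Yrs APIs; `ReadMarkers` and the use of a `u64` sequence number per channel are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Per-channel read markers as a state-based CRDT (a map of max-registers).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ReadMarkers {
    last_read: HashMap<String, u64>,
}

impl ReadMarkers {
    /// Record that everything up to `seq` has been read on `channel`.
    /// Markers only move forward, so an older value never overwrites a newer one.
    pub fn mark_read(&mut self, channel: &str, seq: u64) {
        let entry = self.last_read.entry(channel.to_string()).or_insert(0);
        *entry = (*entry).max(seq);
    }

    /// Merge another replica's state into this one. Taking the per-key maximum
    /// is commutative, associative, and idempotent, so replicas converge
    /// regardless of the order in which they exchange state.
    pub fn merge(&mut self, other: &ReadMarkers) {
        for (channel, &seq) in &other.last_read {
            self.mark_read(channel, seq);
        }
    }

    /// Highest sequence number read on `channel`, or 0 if none is recorded.
    pub fn get(&self, channel: &str) -> u64 {
        self.last_read.get(channel).copied().unwrap_or(0)
    }
}
```

Two devices that mark different positions and then exchange state in either order end up with identical markers, which is the behaviour a library-backed CRDT document would provide for richer state such as contact lists.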
### Object Storage (S3-compatible)

**Problem:** Encrypted file and media attachments need a storage backend that
the server can host without seeing the content.

**Solution:** An S3-compatible object store (MinIO, Garage, or a cloud provider)
for encrypted blobs. Clients encrypt attachments client-side (using a key derived
from the MLS group secret) and upload the ciphertext. The server stores and
serves opaque blobs.

**Architecture impact:** Add a media upload/download RPC to `NodeService`. The
server proxies to the object store or returns pre-signed URLs.

**Crates:** `aws-sdk-s3`, `opendal`

---

## Cryptography and Privacy

### ML-KEM + ML-DSA Hybrid (Post-Quantum MLS)

**Problem:** Quantum computers threaten X25519 and Ed25519. MLS content is
protected by ephemeral key exchange, but traffic encrypted to classical init keys
can be recorded today and decrypted later (harvest-now-decrypt-later), and
credential signatures become forgeable once a cryptographically relevant quantum
computer exists.

**Solution:** Hybrid X25519 + ML-KEM-768 KEM for MLS init keys, and optionally
hybrid Ed25519 + ML-DSA-65 for credential signatures. The `ml-kem` crate is
already vendored in the workspace.

**Architecture impact:** Custom `OpenMlsCryptoProvider` in `quicnprotochat-core`
implementing the hybrid combiner. This is the M7 milestone -- see
[Milestones](milestones.md#m7----post-quantum-planned) and
[Hybrid KEM](../protocol-layers/hybrid-kem.md).

**Crates:** `ml-kem`, `ml-dsa`

**References:** NIST FIPS 203 (ML-KEM), NIST FIPS 204 (ML-DSA),
`draft-ietf-tls-hybrid-design`

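The commit message for this change gives the hybrid envelope layout as `version(1) | x25519_eph_pk(32) | mlkem_ct(1088) | nonce(12) | ct`. A parser for that layout might look like the sketch below. Only the field order and sizes come from the commit message; the type and function names are hypothetical, and the actual `hybrid_kem.rs` may structure this differently.

```rust
pub const VERSION_LEN: usize = 1;
pub const X25519_PK_LEN: usize = 32;
pub const MLKEM768_CT_LEN: usize = 1088;
pub const NONCE_LEN: usize = 12;
/// Fixed-size prefix before the variable-length ciphertext (1133 bytes).
pub const HEADER_LEN: usize = VERSION_LEN + X25519_PK_LEN + MLKEM768_CT_LEN + NONCE_LEN;

/// Borrowed view over one hybrid envelope (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub struct HybridEnvelope<'a> {
    pub version: u8,
    pub x25519_eph_pk: &'a [u8],
    pub mlkem_ct: &'a [u8],
    pub nonce: &'a [u8],
    pub ciphertext: &'a [u8],
}

/// Split a byte slice into envelope fields.
/// Returns `None` when the input is shorter than the fixed header.
pub fn parse_envelope(bytes: &[u8]) -> Option<HybridEnvelope<'_>> {
    if bytes.len() < HEADER_LEN {
        return None;
    }
    // Each split_at is in bounds because of the length check above.
    let (version, rest) = bytes.split_at(VERSION_LEN);
    let (x25519_eph_pk, rest) = rest.split_at(X25519_PK_LEN);
    let (mlkem_ct, rest) = rest.split_at(MLKEM768_CT_LEN);
    let (nonce, ciphertext) = rest.split_at(NONCE_LEN);
    Some(HybridEnvelope {
        version: version[0],
        x25519_eph_pk,
        mlkem_ct,
        nonce,
        ciphertext,
    })
}
```

The parser only splits bytes; rejecting unknown versions and verifying the AEAD tag are left to the decryption step.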
### Private Information Retrieval (PIR)

**Problem:** When a client fetches messages or KeyPackages, the server learns
*which* recipient is requesting -- even though it cannot read the content.

**Solution:** Private Information Retrieval (PIR) allows a client to fetch a
record from the server without revealing which record was requested.
[SealPIR](https://github.com/microsoft/SealPIR) and SimplePIR provide practical
constructions.

**Architecture impact:** Replace the `fetch` / `fetchKeyPackage` RPCs with PIR
queries. This is a significant performance trade-off: PIR has high computational
cost, and that cost grows with database size. KeyPackage fetch (small database)
is therefore the practical first target, with message fetch (large database)
to follow only if the overhead proves acceptable.

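The core idea of PIR is easiest to see in the classic two-server XOR scheme, sketched below. Note that this is a different construction from SealPIR and SimplePIR, which are single-server and rely on lattice cryptography; the XOR scheme is information-theoretic and only private if the two servers do not collude. All names are hypothetical, and the random `mask` is passed in by the caller (it must come from a cryptographically secure RNG in practice).

```rust
pub const RECORD_LEN: usize = 4;
pub type Record = [u8; RECORD_LEN];

/// Server side: XOR together every record whose query bit is set.
/// The server sees only a random-looking bit vector, not the wanted index.
pub fn server_answer(db: &[Record], query: &[bool]) -> Record {
    let mut acc = [0u8; RECORD_LEN];
    for (record, &selected) in db.iter().zip(query) {
        if selected {
            for (a, b) in acc.iter_mut().zip(record) {
                *a ^= *b;
            }
        }
    }
    acc
}

/// Client side: derive the two queries from a uniformly random mask.
/// They differ only at `index`. Panics if `index` is out of range.
pub fn make_queries(mask: &[bool], index: usize) -> (Vec<bool>, Vec<bool>) {
    let first = mask.to_vec();
    let mut second = mask.to_vec();
    second[index] = !second[index];
    (first, second)
}

/// Client side: XOR the two answers. Every record selected by both queries
/// cancels out, leaving exactly the record at the flipped position.
pub fn reconstruct(a: Record, b: Record) -> Record {
    let mut out = [0u8; RECORD_LEN];
    for i in 0..RECORD_LEN {
        out[i] = a[i] ^ b[i];
    }
    out
}
```

Each server still scans the whole database per query, which illustrates why the cost concern above applies even to the simplest schemes.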
### Sealed Sender (Signal-style)

**Problem:** The server sees `(sender, recipient, timestamp)` metadata on every
enqueued message. Even without reading content, this metadata reveals social
graphs.

**Solution:** [Sealed Sender](https://signal.org/blog/sealed-sender/) encrypts
the sender's identity inside the MLS ciphertext. The server routes by
`recipientKey` only and cannot determine who sent the message.

**Architecture impact:** Modify the `enqueue` RPC to omit sender identity from
the server-visible metadata. The sender identity is included only inside the
MLS application message (encrypted).

### Key Transparency (RFC draft)

**Problem:** A compromised server could substitute public keys, performing a
man-in-the-middle attack on MLS group formation.

**Solution:** A verifiable, append-only log of public key bindings (similar to
Certificate Transparency for TLS). Clients verify that the server's response
matches the log before trusting a fetched KeyPackage.

**Architecture impact:** Add a key transparency log (Merkle tree) alongside the
Authentication Service. Clients verify inclusion proofs on every `fetchKeyPackage`
response.

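Verifying a Merkle inclusion proof amounts to rehashing from the leaf up to the root using the sibling hashes supplied by the server. The sketch below shows that fold. It uses the standard library's `DefaultHasher` purely as a stand-in so the example has no dependencies; it is NOT cryptographic, and a real log would use SHA-256 or similar. The `Sibling` type and function names are hypothetical and do not follow the wire format of `draft-ietf-keytrans-protocol`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in hash for illustration only. NOT collision resistant.
fn toy_hash(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Hash a leaf with a 0x00 domain-separation prefix.
pub fn hash_leaf(leaf: &[u8]) -> u64 {
    let mut buf = vec![0x00];
    buf.extend_from_slice(leaf);
    toy_hash(&buf)
}

/// Hash an interior node with a 0x01 domain-separation prefix.
pub fn hash_node(left: u64, right: u64) -> u64 {
    let mut buf = vec![0x01];
    buf.extend_from_slice(&left.to_be_bytes());
    buf.extend_from_slice(&right.to_be_bytes());
    toy_hash(&buf)
}

/// One step of the audit path: the sibling hash and which side it sits on.
#[derive(Debug, Clone, Copy)]
pub enum Sibling {
    Left(u64),
    Right(u64),
}

/// Recompute the root from `leaf` and its audit path (ordered leaf to root)
/// and compare it with the root the client already trusts.
pub fn verify_inclusion(leaf: &[u8], path: &[Sibling], trusted_root: u64) -> bool {
    let computed = path.iter().fold(hash_leaf(leaf), |acc, step| match *step {
        Sibling::Left(left) => hash_node(left, acc),
        Sibling::Right(right) => hash_node(acc, right),
    });
    computed == trusted_root
}
```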
**References:** `draft-ietf-keytrans-protocol`

---

## Identity and Authentication

### DIDs (Decentralized Identifiers)

**Problem:** User identities are currently bound to the server. If the server
goes away, identities are lost.

**Solution:** [Decentralized Identifiers](https://www.w3.org/TR/did-core/)
(`did:key`, `did:web`) provide self-sovereign identity. A user's DID is derived
from their Ed25519 public key and is portable across servers.

**Architecture impact:** Replace raw Ed25519 public keys in MLS credentials with
DID URIs. The server resolves DIDs to public keys for routing.

**Crates:** `did-key`, `ssi`

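For the `did:key` method, deriving the identifier from an Ed25519 public key is a pure encoding step: prepend the multicodec prefix for `ed25519-pub` (`0xed 0x01`), base58btc-encode the result, and add the multibase `z` marker. The sketch below implements that by hand to stay dependency-free; it does not use the `did-key` or `ssi` crate APIs, and the function names are hypothetical.

```rust
const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Base58 (Bitcoin alphabet) encoding via repeated multiply-and-add.
pub fn base58btc(input: &[u8]) -> String {
    // Base-58 digits, least significant first.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in input {
        // digits = digits * 256 + byte
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }
    // Each leading zero byte is encoded as a literal '1'.
    let zeros = input.iter().take_while(|&&b| b == 0).count();
    let mut out = String::with_capacity(zeros + digits.len());
    out.extend(std::iter::repeat('1').take(zeros));
    out.extend(digits.iter().rev().map(|&d| ALPHABET[d as usize] as char));
    out
}

/// Encode a raw Ed25519 public key as a `did:key` identifier.
pub fn did_key_from_ed25519(public_key: &[u8; 32]) -> String {
    let mut bytes = Vec::with_capacity(34);
    bytes.extend_from_slice(&[0xed, 0x01]); // multicodec varint: ed25519-pub
    bytes.extend_from_slice(public_key);
    format!("did:key:z{}", base58btc(&bytes)) // 'z' = multibase base58btc
}
```

Because the two-byte prefix is fixed, every Ed25519 `did:key` begins with `did:key:z6Mk`, which gives a quick sanity check on the encoding.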
### OPAQUE (aPAKE)

**Problem:** If quicnprotochat adds password-based account registration, the
server must never see the password -- not even a hash.

**Solution:** [OPAQUE](https://datatracker.ietf.org/doc/rfc9807/) is an
asymmetric password-authenticated key exchange where the server stores only a
registration record derived from the password through an oblivious PRF, and
never sees the password itself. Because the record depends on a server-held OPRF
key, precomputed dictionary attacks are ruled out; offline guessing becomes
possible only after an attacker has stolen the server's stored records.

**Architecture impact:** Replace the registration/login flow with OPAQUE. The
server stores an OPAQUE registration record; the client runs the OPAQUE protocol
to authenticate and derive a session key.

**Crates:** `opaque-ke`

**References:** RFC 9807 (OPAQUE), RFC 9497 (OPRFs, the building block OPAQUE
relies on)

### WebAuthn / Passkeys

**Problem:** Password-based auth (even with OPAQUE) is vulnerable to phishing.
Hardware-backed authentication provides stronger device binding.

**Solution:** [WebAuthn](https://www.w3.org/TR/webauthn-3/) / Passkeys allow
authentication via hardware tokens (YubiKey), platform authenticators (Touch ID,
Windows Hello), or synced passkeys.

**Architecture impact:** Add a WebAuthn registration/authentication flow to the
account system. Requires a server-side WebAuthn relying party implementation.

**Crates:** `webauthn-rs`

### Verifiable Credentials (W3C VC)

**Problem:** Users may need to prove attributes (organization membership, role,
age) without revealing their full identity.

**Solution:** [Verifiable Credentials](https://www.w3.org/TR/vc-data-model/)
allow a user to present cryptographic proofs of attributes issued by a trusted
authority.

**Architecture impact:** Extend MLS credentials with VC presentation. A group
admin could require proof of organization membership before allowing join.

---

## Application Layer

### Matrix-style Federation

**Problem:** A single server is a single point of failure and a single point of
trust. Users on different servers cannot communicate.

**Solution:** Federation allows multiple quicnprotochat servers to exchange
messages, similar to [Matrix](https://matrix.org/) homeserver federation. Each
server manages its own users and relays messages to peer servers.

**Architecture impact:** Major. Requires server-to-server protocol, distributed
identity resolution, and cross-server MLS group management.

### WASM Plugin System

**Problem:** Extensibility (bots, bridges, custom message types) currently
requires forking the codebase.

**Solution:** A sandboxed WASM plugin system allows third-party extensions to run
inside the client or server without access to private key material.

**Architecture impact:** Define a plugin API (message hooks, command handlers).
Plugins compiled to WASM and loaded at runtime via `wasmtime` or `wasmer`.

**Crates:** `wasmtime`, `wasmer`, `extism`

### Double-Ratchet DM Layer

**Problem:** MLS is optimised for groups. For efficient 1:1 conversations, the
Signal protocol (X3DH key agreement plus the Double Ratchet, formerly known as
Axolotl) provides better performance characteristics (no tree overhead for two
parties).

**Solution:** Implement a double-ratchet layer for 1:1 DMs, using MLS only for
groups with N > 2. The [1:1 Channel Design](dm-channels.md) currently uses MLS
for DMs; this would be an optimisation.

**References:** [The Double Ratchet Algorithm](https://signal.org/docs/specifications/doubleratchet/),
[X3DH Key Agreement Protocol](https://signal.org/docs/specifications/x3dh/)

---

## Observability and Operations

### OpenTelemetry (Tracing + Metrics)

**Problem:** The current logging is `tracing`-based but lacks distributed
tracing context and structured metrics export.

**Solution:** [OpenTelemetry](https://opentelemetry.io/) provides a unified
framework for distributed tracing, metrics, and log correlation. OTLP export
enables integration with any observability backend.

**Architecture impact:** Add `tracing-opentelemetry` and `opentelemetry-otlp`
to the server. Instrument RPC handlers with spans. Export to Jaeger, Grafana
Tempo, or any OTLP-compatible backend.

**Crates:** `opentelemetry`, `opentelemetry-otlp`, `tracing-opentelemetry`

### Prometheus + Grafana

**Problem:** No quantitative visibility into server performance (throughput,
latency, queue depth, epoch advancement rate).

**Solution:** Export Prometheus metrics from the server. Visualise with Grafana
dashboards.

**Metrics to export:** message throughput (enqueue/fetch per second), RPC
latency histograms, MLS epoch advancement rate, delivery queue depth, KeyPackage
store size, active connections.

**Crates:** `prometheus`, `metrics`, `metrics-exporter-prometheus`

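To show what a scrape endpoint would return, the sketch below keeps a few of the listed metrics in atomics and renders them in the Prometheus text exposition format (`# TYPE` line followed by `name value`). It deliberately avoids the `prometheus` and `metrics` crates so it stays self-contained; the struct, the helper, and the `quicnprotochat_*` metric names are assumptions for illustration, and histograms are omitted.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// A small subset of the server metrics listed above (hypothetical names).
#[derive(Default)]
pub struct ServerMetrics {
    pub enqueued_total: AtomicU64,
    pub fetched_total: AtomicU64,
    pub queue_depth: AtomicU64,
}

/// Append one metric in text exposition format: a TYPE line, then the sample.
fn push_metric(out: &mut String, name: &str, kind: &str, value: u64) {
    out.push_str(&format!("# TYPE {0} {1}\n{0} {2}\n", name, kind, value));
}

impl ServerMetrics {
    /// Render the current values as the body of a `/metrics` response.
    pub fn render(&self) -> String {
        let mut out = String::new();
        // Counters only ever increase; Prometheus derives rates from them.
        push_metric(&mut out, "quicnprotochat_enqueued_total", "counter",
            self.enqueued_total.load(Ordering::Relaxed));
        push_metric(&mut out, "quicnprotochat_fetched_total", "counter",
            self.fetched_total.load(Ordering::Relaxed));
        // Gauges can go up and down, which suits the delivery queue depth.
        push_metric(&mut out, "quicnprotochat_queue_depth", "gauge",
            self.queue_depth.load(Ordering::Relaxed));
        out
    }
}
```

Exporting totals as counters and letting Prometheus compute per-second throughput with `rate()` avoids doing rate arithmetic in the server.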
### Testcontainers-rs

**Problem:** Integration tests currently run server and client in the same
process (`tokio::spawn`). This does not test real network conditions, container
startup, or multi-process interactions.

**Solution:** [Testcontainers-rs](https://docs.rs/testcontainers/) runs Docker
containers from Rust tests, enabling true end-to-end CI with real network
boundaries.

**Architecture impact:** Add testcontainers-based integration tests alongside
the existing in-process tests. The Docker image is already maintained.

**Crates:** `testcontainers`, `testcontainers-modules`

---

## Developer Experience

### Tauri / Dioxus (Native GUI)

**Problem:** The current interface is CLI-only. A graphical client would broaden
the user base for testing and demonstration.

**Solution:** [Tauri](https://tauri.app/) or [Dioxus](https://dioxuslabs.com/)
provide native cross-platform GUI frameworks in Rust. The
`quicnprotochat-core` crate can be shared directly with the GUI client.

**Architecture impact:** Add a `quicnprotochat-gui` crate that depends on
`quicnprotochat-core` and `quicnprotochat-proto`. The GUI drives the same
`GroupMember` and RPC logic as the CLI client.

**Crates:** `tauri`, `dioxus`

### uniffi / diplomat (Mobile FFI)

**Problem:** Mobile clients (iOS, Android) cannot use the Rust binary directly.

**Solution:** [uniffi](https://github.com/mozilla/uniffi-rs) (Mozilla) and
[diplomat](https://github.com/rust-diplomat/diplomat) generate foreign-language
bindings from Rust definitions; uniffi targets Swift and Kotlin, among other
languages.

**Architecture impact:** Expose `quicnprotochat-core` through a C-compatible FFI
layer. Mobile apps call into the Rust crypto and protocol logic.

**Crates:** `uniffi`, `diplomat`

### Nix Flakes

**Problem:** The development environment requires `capnp` (Cap'n Proto compiler),
a specific Rust toolchain version, and test infrastructure. Setup varies across
developer machines.

**Solution:** [Nix flakes](https://nixos.wiki/wiki/Flakes) provide a
reproducible, declarative development environment. A single `nix develop`
command sets up the toolchain, `capnp`, and all dependencies.

**Architecture impact:** Add `flake.nix` and `flake.lock` to the repository root.

---

## Top 5 Priority Implementations

The following table ranks the most impactful technologies for near-term adoption,
considering the current state of the codebase and the [milestone plan](milestones.md).

| Priority | Technology | Why | Unlocks |
|----------|-----------|-----|---------|
| 1 | **Post-quantum hybrid KEM** | `ml-kem` is already vendored in the workspace. Completing the hybrid `OpenMlsCryptoProvider` makes quicnprotochat one of the first PQ MLS implementations. | M7 |
| 2 | **SQLCipher persistence** | Encrypted-at-rest storage is the prerequisite for multi-device support, offline usage, and server restart survival. | M6 |
| 3 | **OPAQUE auth** | Zero-knowledge password authentication is a massive security uplift for the account system. The server never sees or stores passwords. | Phase 3 (authz) |
| 4 | **iroh / LibP2P** | NAT traversal and optional P2P mesh makes quicnprotochat deployable without centralised infrastructure. Aligns with the existing QUIC transport. | Beyond M7 |
| 5 | **Sealed Sender + PIR** | Content encryption is table stakes. Metadata resistance (hiding who talks to whom) is the frontier of private messaging research. | Beyond M7 |

---

## Cross-references

- [Milestones](milestones.md) -- current milestone tracker
- [Production Readiness WBS](production-readiness.md) -- phased work breakdown
- [Auth, Devices, and Tokens](authz-plan.md) -- OPAQUE integration point
- [1:1 Channel Design](dm-channels.md) -- double-ratchet optimisation context
- [Hybrid KEM](../protocol-layers/hybrid-kem.md) -- existing PQ design
- [ADR-006: PQ Gap in Noise Transport](../design-rationale/adr-006-pq-gap.md) -- accepted PQ risk
- [References](../appendix/references.md) -- standards and crate documentation