quicproquo/docs/src/roadmap/future-research.md
Christian Nennemann d073f614b3 docs: rewrite mdBook documentation for v2 architecture
Update 25+ files and add 6 new pages to reflect the v2 migration from
Cap'n Proto to Protobuf framing over QUIC. Integrates SDK and Operations
docs into the mdBook, restructures SUMMARY.md, and rewrites the wire
format, architecture, and protocol sections with accurate v2 content.
2026-03-04 22:02:31 +01:00


# Future Research Directions
This page catalogues technologies and research directions that could strengthen
quicproquo beyond the current [milestone plan](milestones.md). Each entry
includes a brief description, the problem it solves, relevant crates or
specifications, and how it maps to the project architecture.
For the production readiness work breakdown, see
[Production Readiness WBS](production-readiness.md).

---

## Transport and Networking
### WebTransport (HTTP/3)
**Problem:** Browser clients cannot use raw QUIC. The current stack requires a
native Rust binary.
**Solution:** [WebTransport](https://w3c.github.io/webtransport/) exposes
QUIC-like semantics (multiplexed bidirectional streams, datagrams) to browsers
over HTTP/3. A WebTransport endpoint alongside the existing QUIC listener would
enable a web client without WebSocket degradation.
**Architecture impact:** Add a second listener (HTTP/3 + WebTransport) that
terminates WebTransport and bridges into the existing `NodeService` RPC layer.
The v2 Protobuf wire format can also be encoded and decoded in WASM, for example
via the `prost` crate.
**Crates:** `h3`, `h3-webtransport`, `wtransport`
### Tor / I2P Integration
**Problem:** MLS protects message content, but connection metadata (who connects
to the server, when, how often) leaks to the server and network observers.
**Solution:** Route client-server connections through
[Tor](https://www.torproject.org/) onion services or
[I2P](https://geti2p.net/) tunnels. This provides metadata resistance at the
network layer.
**Architecture impact:** The server exposes a `.onion` address (Tor) or an I2P
destination. Clients connect through the anonymity network. Latency increases
significantly, so this should be optional.
**Crates:** `arti` (Tor client in Rust), `arti-client`

---

## Storage and Persistence
### CRDTs (Automerge / Yrs)
**Problem:** Multi-device support requires synchronising state (group membership,
read receipts, settings) across devices without a central authority resolving
conflicts.
**Solution:** Conflict-free replicated data types (CRDTs) allow concurrent edits
to converge without coordination. [Automerge](https://automerge.org/) and
[Yrs](https://docs.rs/yrs/) (Yjs in Rust) provide production-quality CRDT
implementations.
**Architecture impact:** Client-side state (contact list, group membership
cache, read markers) stored as CRDT documents. Synchronisation happens over the
existing MLS-encrypted channel, ensuring the server never sees the state.
**Crates:** `automerge`, `yrs`
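
To make the convergence property concrete, here is a minimal state-based CRDT for read markers, written against the standard library only. It is a sketch, not the Automerge or Yrs API: the `ReadMarkers` type and its methods are hypothetical names, and it assumes every message in a conversation carries a monotonically increasing sequence number.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-device read-marker state: conversation id -> highest
/// message sequence number the user has read. Each entry behaves as a
/// max-register, so the whole map is a state-based CRDT.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct ReadMarkers {
    last_read: BTreeMap<String, u64>,
}

impl ReadMarkers {
    /// Record a local read. Markers only ever move forward.
    pub fn mark_read(&mut self, conversation: &str, seq: u64) {
        let entry = self.last_read.entry(conversation.to_string()).or_insert(0);
        *entry = (*entry).max(seq);
    }

    /// Merge state received from another device by taking the per-key maximum.
    /// Taking a maximum is commutative, associative, and idempotent, which is
    /// what lets replicas converge without coordination.
    pub fn merge(&mut self, other: &ReadMarkers) {
        for (conversation, &seq) in &other.last_read {
            self.mark_read(conversation, seq);
        }
    }

    /// Highest sequence number read in `conversation`, or 0 if none.
    pub fn get(&self, conversation: &str) -> u64 {
        self.last_read.get(conversation).copied().unwrap_or(0)
    }
}
```

Two devices that exchange their `ReadMarkers` in either order end up with identical state. In the design described above, the serialised state would travel inside the MLS-encrypted channel, so the server never sees it.
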
### Object Storage (S3-compatible)
**Problem:** Encrypted file and media attachments need a storage backend that
the server can host without seeing the content.
**Solution:** An S3-compatible object store (MinIO, Garage, or a cloud provider)
for encrypted blobs. Clients encrypt attachments client-side (using a key derived
from the MLS group secret) and upload the ciphertext. The server stores and
serves opaque blobs.
**Architecture impact:** Add a media upload/download RPC to `NodeService`. The
server proxies to the object store or returns pre-signed URLs.
**Crates:** `aws-sdk-s3`, `opendal`

---

## Cryptography and Privacy
### ML-KEM + ML-DSA Hybrid (Post-Quantum MLS)
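**Problem:** Quantum computers threaten X25519 and Ed25519. MLS re-keys with
ephemeral key exchange, but traffic protected by X25519 init keys can be recorded
today and decrypted later (harvest-now-decrypt-later), and Ed25519 credential
signatures would become forgeable once a sufficiently large quantum computer
exists.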
**Solution:** Hybrid X25519 + ML-KEM-768 KEM for MLS init keys, and optionally
hybrid Ed25519 + ML-DSA-65 for credential signatures. The `ml-kem` crate is
already vendored in the workspace.
**Architecture impact:** Custom `OpenMlsCryptoProvider` in `quicproquo-core`
implementing the hybrid combiner. This is the M7 milestone -- see
[Milestones](milestones.md#m7----post-quantum-planned) and
[Hybrid KEM](../protocol-layers/hybrid-kem.md).
**Crates:** `ml-kem`, `ml-dsa`
**References:** NIST FIPS 203 (ML-KEM), NIST FIPS 204 (ML-DSA), `draft-ietf-tls-hybrid-design`
### Private Information Retrieval (PIR)
**Problem:** When a client fetches messages or KeyPackages, the server learns
*which* recipient is requesting -- even though it cannot read the content.
**Solution:** Private Information Retrieval (PIR) allows a client to fetch a
record from the server without revealing which record was requested.
[SealPIR](https://github.com/microsoft/SealPIR) and SimplePIR provide practical
constructions.
**Architecture impact:** Replace the `fetch` / `fetchKeyPackage` RPCs with PIR
queries. This is a significant performance trade-off because PIR has a high
computational cost on the server, which typically grows with database size.
KeyPackage fetch (small database) is therefore the natural first target, ahead
of message fetch (large database).
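
The core idea can be illustrated with the classic two-server XOR scheme. This is a deliberately simpler technique than SealPIR or SimplePIR: it assumes two servers that hold the same database and do not collude, whereas those constructions work with a single server. All names below are hypothetical, records are fixed at 4 bytes to keep the sketch short, and the random mask is passed in as a parameter only so the example needs no external crates. A real client would draw it from a CSPRNG.

```rust
/// Fixed record size for the sketch; real records would be padded to a
/// common length.
pub const RECORD_LEN: usize = 4;
pub type Record = [u8; RECORD_LEN];

/// Build the two queries. `mask` must hold one uniformly random bit per
/// record. The queries differ only at `index`, and each one on its own is
/// a uniformly random bit vector.
pub fn client_queries(mask: &[bool], index: usize) -> (Vec<bool>, Vec<bool>) {
    let query_a = mask.to_vec();
    let mut query_b = mask.to_vec();
    query_b[index] = !query_b[index];
    (query_a, query_b)
}

/// One server's answer: the XOR of every record selected by `query`.
/// The server sees only the bit vector, never the target index.
pub fn server_answer(db: &[Record], query: &[bool]) -> Record {
    let mut acc = [0u8; RECORD_LEN];
    for (record, &selected) in db.iter().zip(query) {
        if selected {
            for (a, r) in acc.iter_mut().zip(record) {
                *a ^= *r;
            }
        }
    }
    acc
}

/// XOR the two answers. Every record selected by both queries cancels out,
/// leaving only the record at the index where the queries differ.
pub fn client_reconstruct(answer_a: Record, answer_b: Record) -> Record {
    let mut record = [0u8; RECORD_LEN];
    for i in 0..RECORD_LEN {
        record[i] = answer_a[i] ^ answer_b[i];
    }
    record
}
```

Each server scans the whole database per query, which makes the cost noted above visible even in this toy version.
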
### Key Transparency (RFC draft)
**Problem:** A compromised server could substitute public keys, performing a
man-in-the-middle attack on MLS group formation.
**Solution:** A verifiable, append-only log of public key bindings (similar to
Certificate Transparency for TLS). Clients verify that the server's response
matches the log before trusting a fetched KeyPackage.
**Architecture impact:** Add a key transparency log (Merkle tree) alongside the
Authentication Service. Clients verify inclusion proofs on every `fetchKeyPackage`
response.
**References:** `draft-ietf-keytrans-protocol`
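
As an illustration of what "verify inclusion proofs" involves, the sketch below recomputes a Merkle root from a leaf and its audit path. Several things here are assumptions rather than the project's or the draft's design: the `ProofStep` layout is hypothetical, and the digest is a toy 64-bit hash built on the standard library's `DefaultHasher`, chosen only to keep the example free of external crates. It is not cryptographically secure, and a real log would use SHA-256 or a comparable hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Toy leaf digest (NOT cryptographic). The 0x00 prefix separates leaf
/// hashes from interior-node hashes.
pub fn leaf_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u8(0x00);
    h.write(data);
    h.finish()
}

/// Toy interior-node digest (NOT cryptographic), domain-separated with 0x01.
pub fn node_hash(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u8(0x01);
    h.write_u64(left);
    h.write_u64(right);
    h.finish()
}

/// One level of the audit path: the sibling digest and which side it is on.
pub struct ProofStep {
    pub sibling: u64,
    pub sibling_on_left: bool,
}

/// Recompute the root from `leaf` and `proof`, then compare it with the
/// root the client already trusts (for example a signed log head).
pub fn verify_inclusion(leaf: &[u8], proof: &[ProofStep], root: u64) -> bool {
    let mut acc = leaf_hash(leaf);
    for step in proof {
        acc = if step.sibling_on_left {
            node_hash(step.sibling, acc)
        } else {
            node_hash(acc, step.sibling)
        };
    }
    acc == root
}
```

A client would run a check of this shape on each `fetchKeyPackage` response. A substituted key changes the leaf digest, so the recomputed root no longer matches.
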

---

## Identity and Authentication
### DIDs (Decentralized Identifiers)
**Problem:** User identities are currently bound to the server. If the server
goes away, identities are lost.
**Solution:** [Decentralized Identifiers](https://www.w3.org/TR/did-core/)
(`did:key`, `did:web`) provide self-sovereign identity. A user's DID is derived
from their Ed25519 public key and is portable across servers.
**Architecture impact:** Replace raw Ed25519 public keys in MLS credentials with
DID URIs. The server resolves DIDs to public keys for routing.
**Crates:** `did-key`, `ssi`
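
For reference, deriving a `did:key` identifier from an Ed25519 public key is a short, deterministic encoding: prepend the multicodec prefix for Ed25519 public keys (`0xed 0x01`), base58btc-encode the result, and add the multibase prefix `z`. The sketch below does this with the standard library only. The function names are hypothetical, and a production client would more likely rely on one of the crates listed above.

```rust
/// Bitcoin base58 alphabet (no 0, O, I, or l).
const BASE58_ALPHABET: &[u8; 58] =
    b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Minimal base58btc encoder using schoolbook base conversion.
pub fn base58btc(input: &[u8]) -> String {
    // Little-endian base-58 digits of the big-endian input number.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in input {
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }
    // Each leading zero byte is encoded as a literal '1'.
    let zeros = input.iter().take_while(|&&b| b == 0).count();
    let mut out = String::with_capacity(zeros + digits.len());
    out.extend(std::iter::repeat('1').take(zeros));
    out.extend(digits.iter().rev().map(|&d| BASE58_ALPHABET[d as usize] as char));
    out
}

/// Encode a raw 32-byte Ed25519 public key as a `did:key` identifier.
pub fn did_key_from_ed25519(public_key: &[u8; 32]) -> String {
    let mut bytes = Vec::with_capacity(34);
    bytes.extend_from_slice(&[0xed, 0x01]); // multicodec varint for ed25519-pub
    bytes.extend_from_slice(public_key);
    format!("did:key:z{}", base58btc(&bytes))
}
```

Because of the fixed multicodec prefix, identifiers produced this way begin with `did:key:z6Mk`, which gives a cheap sanity check when parsing credentials.
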
### WebAuthn / Passkeys
**Problem:** Password-based auth (even with OPAQUE) is vulnerable to phishing.
Hardware-backed authentication provides stronger device binding.
**Solution:** [WebAuthn](https://www.w3.org/TR/webauthn-3/) / Passkeys allow
authentication via hardware tokens (YubiKey), platform authenticators (Touch ID,
Windows Hello), or synced passkeys.
**Architecture impact:** Add a WebAuthn registration/authentication flow to the
account system. Requires a server-side WebAuthn relying party implementation.
**Crates:** `webauthn-rs`
### Verifiable Credentials (W3C VC)
**Problem:** Proving attributes (organization membership, role, age) without
revealing full identity.
**Solution:** [Verifiable Credentials](https://www.w3.org/TR/vc-data-model/)
allow a user to present cryptographic proofs of attributes issued by a trusted
authority.
**Architecture impact:** Extend MLS credentials with VC presentation. A group
admin could require proof of organization membership before allowing join.

---

## Application Layer
### Matrix-style Federation
**Problem:** A single server is a single point of failure and a single point of
trust. Users on different servers cannot communicate.
**Solution:** Federation allows multiple quicproquo servers to exchange
messages, similar to [Matrix](https://matrix.org/) homeserver federation. Each
server manages its own users and relays messages to peer servers.
**Architecture impact:** Major. Requires server-to-server protocol, distributed
identity resolution, and cross-server MLS group management.
### WASM Plugin System
**Problem:** Extensibility (bots, bridges, custom message types) currently
requires forking the codebase.
**Solution:** A sandboxed WASM plugin system allows third-party extensions to run
inside the client or server without access to private key material.
**Architecture impact:** Define a plugin API (message hooks, command handlers).
Plugins compiled to WASM and loaded at runtime via `wasmtime` or `wasmer`.
**Crates:** `wasmtime`, `wasmer`, `extism`
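
One possible shape for the plugin API (message hooks and command handlers) is sketched below as a plain Rust trait. Everything here is hypothetical: the `Plugin`, `Action`, and `PluginHost` names are invented for illustration, and native trait objects stand in for what would really be sandboxed WASM modules invoked through `wasmtime` or `wasmer`. The point is the boundary: plugins receive decrypted message text and return requested actions, and never touch key material.

```rust
/// What a plugin may ask the host to do in response to an event.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    None,
    Reply(String),
}

/// Hypothetical plugin interface. Both hooks default to doing nothing, so a
/// plugin implements only the events it cares about.
pub trait Plugin {
    fn name(&self) -> &str;

    /// Message hook: called for every decrypted incoming message.
    fn on_message(&mut self, _sender: &str, _body: &str) -> Action {
        Action::None
    }

    /// Command handler: called for commands typed by the local user.
    fn on_command(&mut self, _command: &str, _args: &str) -> Action {
        Action::None
    }
}

/// Host-side registry that fans events out to all loaded plugins.
pub struct PluginHost {
    plugins: Vec<Box<dyn Plugin>>,
}

impl PluginHost {
    pub fn new() -> Self {
        PluginHost { plugins: Vec::new() }
    }

    pub fn register(&mut self, plugin: Box<dyn Plugin>) {
        self.plugins.push(plugin);
    }

    /// Run every plugin's message hook and collect the non-empty actions.
    pub fn dispatch_message(&mut self, sender: &str, body: &str) -> Vec<Action> {
        self.plugins
            .iter_mut()
            .map(|p| p.on_message(sender, body))
            .filter(|a| *a != Action::None)
            .collect()
    }
}

/// Example plugin: answers messages that start with "!echo ".
pub struct EchoBot;

impl Plugin for EchoBot {
    fn name(&self) -> &str {
        "echo"
    }

    fn on_message(&mut self, sender: &str, body: &str) -> Action {
        match body.strip_prefix("!echo ") {
            Some(rest) => Action::Reply(format!("{}: {}", sender, rest)),
            None => Action::None,
        }
    }
}
```

Returning `Action` values instead of giving plugins a handle to the client keeps the host in control of what is actually sent, which matches the sandboxing goal stated above.
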
### Double-Ratchet DM Layer
**Problem:** MLS is optimised for groups. For 1:1 conversations, the Signal
protocol (X3DH key agreement followed by the Double Ratchet, formerly named
Axolotl) performs better because two parties need no ratchet-tree overhead.
**Solution:** Implement a double-ratchet layer for 1:1 DMs, using MLS only for
groups with N > 2. The [1:1 Channel Design](dm-channels.md) currently uses MLS
for DMs; this would be an optimisation.
**References:** [The Double Ratchet Algorithm](https://signal.org/docs/specifications/doubleratchet/),
[X3DH Key Agreement Protocol](https://signal.org/docs/specifications/x3dh/)

---

## Observability and Operations
### OpenTelemetry (Tracing + Metrics)
**Problem:** The current logging is `tracing`-based but lacks distributed
tracing context and structured metrics export.
**Solution:** [OpenTelemetry](https://opentelemetry.io/) provides a unified
framework for distributed tracing, metrics, and log correlation. OTLP export
enables integration with any observability backend.
**Architecture impact:** Add `tracing-opentelemetry` and `opentelemetry-otlp`
to the server. Instrument RPC handlers with spans. Export to Jaeger, Grafana
Tempo, or any OTLP-compatible backend.
**Crates:** `opentelemetry`, `opentelemetry-otlp`, `tracing-opentelemetry`
### Prometheus + Grafana
**Problem:** No quantitative visibility into server performance (throughput,
latency, queue depth, epoch advancement rate).
**Solution:** Export Prometheus metrics from the server. Visualise with Grafana
dashboards.
**Metrics to export:** message throughput (enqueue/fetch per second), RPC
latency histograms, MLS epoch advancement rate, delivery queue depth, KeyPackage
store size, active connections.
**Crates:** `prometheus`, `metrics`, `metrics-exporter-prometheus`
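
To show what the exported data looks like, the following std-only sketch keeps two of the metrics above in atomics and renders them in the Prometheus text exposition format. It does not use the `prometheus` or `metrics` crates, and the `ServerMetrics` type and the `qpq_*` metric names are hypothetical. A real server would register metrics with one of those crates and serve the output on a `/metrics` HTTP endpoint.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical in-process metrics, safe to update from any RPC handler.
#[derive(Default)]
pub struct ServerMetrics {
    /// Monotonic count of enqueued messages (a Prometheus counter).
    pub messages_enqueued_total: AtomicU64,
    /// Current number of open connections (a Prometheus gauge).
    pub active_connections: AtomicU64,
}

impl ServerMetrics {
    /// Render the current values in the Prometheus text exposition format:
    /// a `# TYPE` line followed by a `name value` sample line per metric.
    pub fn render(&self) -> String {
        format!(
            "# TYPE qpq_messages_enqueued_total counter\n\
             qpq_messages_enqueued_total {}\n\
             # TYPE qpq_active_connections gauge\n\
             qpq_active_connections {}\n",
            self.messages_enqueued_total.load(Ordering::Relaxed),
            self.active_connections.load(Ordering::Relaxed),
        )
    }
}
```

Rates such as "enqueue per second" are not stored in the server. Prometheus derives them from the counter at query time, so the server only needs to increment.
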
### Testcontainers-rs
**Problem:** Integration tests currently run server and client in the same
process (`tokio::spawn`). This does not test real network conditions, container
startup, or multi-process interactions.
**Solution:** [Testcontainers-rs](https://docs.rs/testcontainers/) runs Docker
containers from Rust tests, enabling true end-to-end CI with real network
boundaries.
**Architecture impact:** Add testcontainers-based integration tests alongside
the existing in-process tests. The Docker image is already maintained.
**Crates:** `testcontainers`, `testcontainers-modules`

---

## Developer Experience
### Tauri / Dioxus (Native GUI)
**Problem:** The current interface is CLI-only. A graphical client would broaden
the user base for testing and demonstration.
**Solution:** [Tauri](https://tauri.app/) or [Dioxus](https://dioxuslabs.com/)
provide native cross-platform GUI frameworks in Rust. The
`quicproquo-core` crate can be shared directly with the GUI client.
**Architecture impact:** Add a `quicproquo-gui` crate that depends on
`quicproquo-core` and `quicproquo-proto`. The GUI drives the same
`GroupMember` and RPC logic as the CLI client.
**Crates:** `tauri`, `dioxus`
### uniffi / diplomat (Mobile FFI)
**Problem:** Mobile clients (iOS, Android) cannot use the Rust binary directly.
**Solution:** [uniffi](https://github.com/mozilla/uniffi-rs) (Mozilla) and
[diplomat](https://github.com/rust-diplomat/diplomat) generate idiomatic Swift and
Kotlin bindings from Rust definitions.
**Architecture impact:** Expose `quicproquo-core` through a C-compatible FFI
layer. Mobile apps call into the Rust crypto and protocol logic.
**Crates:** `uniffi`, `diplomat`
### Nix Flakes
**Problem:** The development environment requires a schema compiler (`protoc`
for the v2 Protobuf schemas, which replaced Cap'n Proto and its `capnp`
compiler), a specific Rust toolchain version, and test infrastructure. Setup
varies across developer machines.
**Solution:** [Nix flakes](https://nixos.wiki/wiki/Flakes) provide a
reproducible, declarative development environment. A single `nix develop`
command sets up the toolchain, the schema compiler, and all dependencies.
**Architecture impact:** Add `flake.nix` and `flake.lock` to the repository root.

---

## Top Priority Implementations
The following table ranks the most impactful technologies for near-term adoption,
considering the current state of the codebase and the [milestone plan](milestones.md).
Items marked **Implemented** are already part of the v2 codebase.
| Priority | Technology | Why | Status |
|----------|-----------|-----|--------|
| -- | **Post-quantum hybrid KEM** | `ml-kem` vendored; custom `OpenMlsCryptoProvider` with X25519 + ML-KEM-768. | **Implemented** |
| -- | **SQLCipher persistence** | Encrypted-at-rest storage via rusqlite + bundled-sqlcipher + Argon2id key derivation. | **Implemented** |
| -- | **OPAQUE auth** | Zero-knowledge password authentication via `opaque-ke`. Server never stores passwords. | **Implemented** |
| -- | **iroh P2P** | NAT traversal and optional P2P mesh via the `quicproquo-p2p` crate (feature-flagged). | **Implemented** |
| -- | **Sealed Sender** | `--sealed-sender` flag encrypts sender identity inside MLS ciphertext. | **Implemented** |
| 1 | **PIR (Private Information Retrieval)** | Fetch messages without revealing the recipient's identity to the server. | Future |
| 2 | **Key Transparency** | Verifiable, append-only log of public key bindings. Detects key substitution attacks. | Future |
| 3 | **WebTransport (HTTP/3)** | Enables browser clients without a WebSocket bridge. | Future |
| 4 | **OpenTelemetry** | Distributed tracing and structured metrics for production observability. | Future |
| 5 | **WebAuthn / Passkeys** | Hardware-backed authentication to replace password-based login. | Future |

---

## Cross-references
- [Milestones](milestones.md) -- current milestone tracker
- [Production Readiness WBS](production-readiness.md) -- phased work breakdown
- [Auth, Devices, and Tokens](authz-plan.md) -- OPAQUE integration point
- [1:1 Channel Design](dm-channels.md) -- double-ratchet optimisation context
- [Hybrid KEM](../protocol-layers/hybrid-kem.md) -- existing PQ design
- [References](../appendix/references.md) -- standards and crate documentation