quicproquo/docs/src/roadmap/future-research.md
Christian Nennemann 2e081ead8e chore: rename quicproquo → quicprochat in docs, Docker, CI, and packaging
Rename all project references from quicproquo/qpq to quicprochat/qpc
across documentation, Docker configuration, CI workflows, packaging
scripts, operational configs, and build tooling.

- Docker: crate paths, binary names, user/group, data dirs, env vars
- CI: workflow crate references, binary names, artifact names
- Docs: all markdown files under docs/, SDK READMEs, book.toml
- Packaging: OpenWrt Makefile, init script, UCI config (file renames)
- Scripts: justfile, dev-shell, screenshot, cross-compile, ai_team
- Operations: Prometheus config, alert rules, Grafana dashboard
- Config: .env.example (QPQ_* → QPC_*), CODEOWNERS paths
- Top-level: README, CONTRIBUTING, ROADMAP, CLAUDE.md
2026-03-21 19:14:06 +01:00


# Future Research Directions
This page catalogues technologies and research directions that could strengthen
quicprochat beyond the current [milestone plan](milestones.md). Each entry
includes a brief description, the problem it solves, relevant crates or
specifications, and how it maps to the project architecture.
For the production readiness work breakdown, see
[Production Readiness WBS](production-readiness.md).
---
## Transport and Networking
### WebTransport (HTTP/3)
**Problem:** Browser clients cannot use raw QUIC. The current stack requires a
native Rust binary.
**Solution:** [WebTransport](https://w3c.github.io/webtransport/) exposes
QUIC-like semantics (multiplexed bidirectional streams, datagrams) to browsers
over HTTP/3. A WebTransport endpoint alongside the existing QUIC listener would
enable a web client without WebSocket degradation.
**Architecture impact:** Add a second listener (HTTP/3 + WebTransport) that
terminates WebTransport and bridges into the existing `NodeService` RPC layer.
Cap'n Proto serialisation works in WASM via the `capnp` crate.
**Crates:** `h3`, `h3-webtransport`, `wtransport`
### Tor / I2P Integration
**Problem:** MLS protects message content, but connection metadata (who connects
to the server, when, how often) leaks to the server and network observers.
**Solution:** Route client-server connections through
[Tor](https://www.torproject.org/) onion services or
[I2P](https://geti2p.net/) tunnels. This provides metadata resistance at the
network layer.
**Architecture impact:** The server exposes a `.onion` address (Tor) or an I2P
destination. Clients connect through the anonymity network. Latency increases
significantly, so this should be optional.
**Crates:** `arti` (Tor client in Rust), `arti-client`
---
## Storage and Persistence
### CRDTs (Automerge / Yrs)
**Problem:** Multi-device support requires synchronising state (group membership,
read receipts, settings) across devices without a central authority resolving
conflicts.
**Solution:** Conflict-free replicated data types (CRDTs) allow concurrent edits
to converge without coordination. [Automerge](https://automerge.org/) and
[Yrs](https://docs.rs/yrs/) (Yjs in Rust) provide production-quality CRDT
implementations.
**Architecture impact:** Client-side state (contact list, group membership
cache, read markers) stored as CRDT documents. Synchronisation happens over the
existing MLS-encrypted channel, ensuring the server never sees the state.
**Crates:** `automerge`, `yrs`
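**Sketch:** As a minimal illustration of the convergence property, assuming read markers can be modelled as a per-conversation high-water mark, the state-based CRDT below merges by taking the per-key maximum. It is hand-rolled for illustration and does not use the Automerge or Yrs APIs; `ReadMarkers`, `mark_read`, and the sequence-number scheme are hypothetical names, not part of quicprochat.

```rust
use std::collections::HashMap;

/// Hypothetical per-device state: highest message sequence number read
/// in each conversation. Merging takes the maximum per key.
#[derive(Clone, Debug, Default, PartialEq)]
struct ReadMarkers {
    last_read: HashMap<String, u64>,
}

impl ReadMarkers {
    /// Record a local read; a marker never moves backwards.
    fn mark_read(&mut self, conversation: &str, seq: u64) {
        let entry = self.last_read.entry(conversation.to_string()).or_insert(0);
        *entry = (*entry).max(seq);
    }

    /// Fold another device's state into this one.
    fn merge(&mut self, other: &ReadMarkers) {
        for (conversation, &seq) in &other.last_read {
            self.mark_read(conversation, seq);
        }
    }
}
```

Because `merge` is commutative, associative, and idempotent, two devices that exchange their states in any order, any number of times, end up with the same markers.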
### Object Storage (S3-compatible)
**Problem:** Encrypted file and media attachments need a storage backend that
the server can host without seeing the content.
**Solution:** An S3-compatible object store (MinIO, Garage, or a cloud provider)
for encrypted blobs. Clients encrypt attachments client-side (using a key derived
from the MLS group secret) and upload the ciphertext. The server stores and
serves opaque blobs.
**Architecture impact:** Add a media upload/download RPC to `NodeService`. The
server proxies to the object store or returns pre-signed URLs.
**Crates:** `aws-sdk-s3`, `opendal`
---
## Cryptography and Privacy
### ML-KEM + ML-DSA Hybrid (Post-Quantum MLS)
**Problem:** Quantum computers threaten X25519 and Ed25519. While MLS content is
protected by ephemeral key exchange, the init keys are vulnerable to
harvest-now-decrypt-later attacks, and credential signatures could be forged
once a cryptographically relevant quantum computer exists.
**Solution:** Hybrid X25519 + ML-KEM-768 KEM for MLS init keys, and optionally
hybrid Ed25519 + ML-DSA-65 for credential signatures. The `ml-kem` crate is
already vendored in the workspace.
**Architecture impact:** Custom `OpenMlsCryptoProvider` in `quicprochat-core`
implementing the hybrid combiner. This is the M7 milestone -- see
[Milestones](milestones.md#m7----post-quantum-planned) and
[Hybrid KEM](../protocol-layers/hybrid-kem.md).
**Crates:** `ml-kem`, `ml-dsa`
**References:** NIST FIPS 203 (ML-KEM), NIST FIPS 204 (ML-DSA), `draft-ietf-tls-hybrid-design`
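**Sketch:** The combiner idea can be illustrated with a concatenation combiner in the style of `draft-ietf-tls-hybrid-design`: both shared secrets are fed into one KDF, so the output is intended to stay secret as long as either KEM holds. This is an assumption-laden sketch, not the project's combiner (see the Hybrid KEM page for the actual design). The function name, the `context` parameter, and the choice to pass the KDF in as a closure are all hypothetical; a real caller would supply something like HKDF-SHA-256.

```rust
/// Concatenation combiner: derive one 32-byte secret from the X25519 and
/// ML-KEM-768 shared secrets (both are 32 bytes) plus a context label.
/// `kdf` is supplied by the caller; this sketch does not pick one.
fn combine_shared_secrets<K>(
    kdf: K,
    x25519_ss: &[u8; 32],
    mlkem_ss: &[u8; 32],
    context: &[u8],
) -> [u8; 32]
where
    K: Fn(&[u8]) -> [u8; 32],
{
    // Fixed order and fixed lengths keep the encoding unambiguous.
    let mut ikm = Vec::with_capacity(64 + context.len());
    ikm.extend_from_slice(x25519_ss);
    ikm.extend_from_slice(mlkem_ss);
    ikm.extend_from_slice(context);
    kdf(&ikm)
}
```

Keeping the KDF as a parameter makes the combiner easy to test in isolation and leaves the hash choice to the crypto provider.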
### Private Information Retrieval (PIR)
**Problem:** When a client fetches messages or KeyPackages, the server learns
*which* recipient is requesting -- even though it cannot read the content.
**Solution:** Private Information Retrieval (PIR) allows a client to fetch a
record from the server without revealing which record was requested.
[SealPIR](https://github.com/microsoft/SealPIR) and SimplePIR provide practical
constructions.
**Architecture impact:** Replace the `fetch` / `fetchKeyPackage` RPCs with PIR
queries. This is a significant performance trade-off, because PIR has a high
computational cost on the server. It is better suited to KeyPackage fetch (small
database) than to message fetch (large database), so KeyPackage fetch is the
natural first target.
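**Sketch:** To make the PIR idea concrete, here is the classic two-server XOR scheme. Note that this is a different construction from SealPIR and SimplePIR (which are single-server and lattice-based): it assumes two replicas of the database that do not collude, which quicprochat does not currently have. `Record`, `RECORD_LEN`, and the function names are hypothetical, and the caller is assumed to supply a uniformly random mask from a cryptographic RNG.

```rust
const RECORD_LEN: usize = 8;
type Record = [u8; RECORD_LEN];

/// Build one query per server. `random_mask` must be uniformly random and
/// as long as the database; the two queries differ only at `wanted`.
fn make_queries(random_mask: &[bool], wanted: usize) -> (Vec<bool>, Vec<bool>) {
    let q1 = random_mask.to_vec();
    let mut q2 = q1.clone();
    q2[wanted] = !q2[wanted];
    (q1, q2)
}

/// Server side: XOR together every record the query selects.
/// Each server sees only a random-looking bit vector.
fn answer(db: &[Record], query: &[bool]) -> Record {
    let mut acc = [0u8; RECORD_LEN];
    for (record, &selected) in db.iter().zip(query) {
        if selected {
            for (a, b) in acc.iter_mut().zip(record) {
                *a ^= *b;
            }
        }
    }
    acc
}

/// Client side: every record except the wanted one cancels out.
fn reconstruct(a1: Record, a2: Record) -> Record {
    let mut out = [0u8; RECORD_LEN];
    for i in 0..RECORD_LEN {
        out[i] = a1[i] ^ a2[i];
    }
    out
}
```

The server-side loop touches every record for every query, which is the cost that makes a small KeyPackage database a more realistic first target than the message store.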
### Key Transparency (RFC draft)
**Problem:** A compromised server could substitute public keys, performing a
man-in-the-middle attack on MLS group formation.
**Solution:** A verifiable, append-only log of public key bindings (similar to
Certificate Transparency for TLS). Clients verify that the server's response
matches the log before trusting a fetched KeyPackage.
**Architecture impact:** Add a key transparency log (Merkle tree) alongside the
Authentication Service. Clients verify inclusion proofs on every `fetchKeyPackage`
response.
**References:** `draft-ietf-keytrans-protocol`
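**Sketch:** The inclusion-proof check a client would run can be sketched as a plain Merkle tree walk. This is a simplified illustration, not the `draft-ietf-keytrans-protocol` wire format. To stay dependency-free it uses 64-bit FNV-1a as a stand-in hash, which is NOT cryptographically secure; a real log would use SHA-256 or similar. All type and function names here are hypothetical.

```rust
type Digest = u64;

/// Stand-in hash (FNV-1a, 64-bit). NOT secure; placeholder for SHA-256.
fn fnv1a(bytes: &[u8]) -> Digest {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Leaves and interior nodes use different prefixes (domain separation).
fn leaf_hash(entry: &[u8]) -> Digest {
    let mut buf = vec![0x00];
    buf.extend_from_slice(entry);
    fnv1a(&buf)
}

fn node_hash(left: Digest, right: Digest) -> Digest {
    let mut buf = vec![0x01];
    buf.extend_from_slice(&left.to_be_bytes());
    buf.extend_from_slice(&right.to_be_bytes());
    fnv1a(&buf)
}

/// One proof step: a sibling digest and which side it sits on.
struct Step {
    sibling: Digest,
    sibling_is_left: bool,
}

/// Log side: compute the root and the inclusion proof for `index`.
/// An unpaired node at the end of a level is promoted unchanged.
fn root_and_proof(entries: &[&[u8]], mut index: usize) -> (Digest, Vec<Step>) {
    assert!(!entries.is_empty() && index < entries.len());
    let mut level: Vec<Digest> = entries.iter().map(|e| leaf_hash(e)).collect();
    let mut proof = Vec::new();
    while level.len() > 1 {
        let sibling = index ^ 1;
        if sibling < level.len() {
            proof.push(Step { sibling: level[sibling], sibling_is_left: sibling < index });
        }
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [l, r] => node_hash(*l, *r),
                [only] => *only,
                _ => unreachable!(),
            })
            .collect();
        index /= 2;
    }
    (level[0], proof)
}

/// Client side: recompute the root from the fetched entry and the proof.
fn verify_inclusion(entry: &[u8], proof: &[Step], root: Digest) -> bool {
    let mut acc = leaf_hash(entry);
    for step in proof {
        acc = if step.sibling_is_left {
            node_hash(step.sibling, acc)
        } else {
            node_hash(acc, step.sibling)
        };
    }
    acc == root
}
```

A client would compare the recomputed root against a root it obtained independently of the `fetchKeyPackage` response; checking the proof against a root supplied by the same untrusted response would prove nothing.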
---
## Identity and Authentication
### DIDs (Decentralized Identifiers)
**Problem:** User identities are currently bound to the server. If the server
goes away, identities are lost.
**Solution:** [Decentralized Identifiers](https://www.w3.org/TR/did-core/)
(`did:key`, `did:web`) provide self-sovereign identity. A user's DID is derived
from their Ed25519 public key and is portable across servers.
**Architecture impact:** Replace raw Ed25519 public keys in MLS credentials with
DID URIs. The server resolves DIDs to public keys for routing.
**Crates:** `did-key`, `ssi`
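**Sketch:** For `did:key`, deriving the identifier from an Ed25519 public key is mechanical: prepend the multicodec prefix for `ed25519-pub` (bytes `0xed 0x01`), base58btc-encode, and add the multibase `z` marker. The dependency-free sketch below shows that derivation; the function names are hypothetical, and a real integration would more likely use one of the crates listed above.

```rust
/// Bitcoin-style base58 alphabet (no 0, O, I, or l).
const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Base58btc encoding via repeated multiply-and-carry on base-58 digits.
fn base58_encode(input: &[u8]) -> String {
    // Each leading zero byte is encoded as a literal '1'.
    let zeros = input.iter().take_while(|&&b| b == 0).count();
    let mut digits: Vec<u8> = Vec::new(); // little-endian base-58 digits
    for &byte in input {
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }
    let mut out = String::with_capacity(zeros + digits.len());
    out.extend(std::iter::repeat('1').take(zeros));
    out.extend(digits.iter().rev().map(|&d| ALPHABET[d as usize] as char));
    out
}

/// Build a did:key identifier from a raw 32-byte Ed25519 public key.
fn did_key_from_ed25519(public_key: &[u8; 32]) -> String {
    let mut bytes = Vec::with_capacity(34);
    bytes.extend_from_slice(&[0xed, 0x01]); // multicodec varint: ed25519-pub
    bytes.extend_from_slice(public_key);
    format!("did:key:z{}", base58_encode(&bytes))
}
```

Because the identifier is a pure function of the public key, any server or client can recompute it without a lookup, which is what makes this DID method portable across servers.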
### WebAuthn / Passkeys
**Problem:** Password-based auth (even with OPAQUE) is vulnerable to phishing.
Hardware-backed authentication provides stronger device binding.
**Solution:** [WebAuthn](https://www.w3.org/TR/webauthn-3/) / Passkeys allow
authentication via hardware tokens (YubiKey), platform authenticators (Touch ID,
Windows Hello), or synced passkeys.
**Architecture impact:** Add a WebAuthn registration/authentication flow to the
account system. Requires a server-side WebAuthn relying party implementation.
**Crates:** `webauthn-rs`
### Verifiable Credentials (W3C VC)
**Problem:** Proving attributes (organization membership, role, age) without
revealing full identity.
**Solution:** [Verifiable Credentials](https://www.w3.org/TR/vc-data-model/)
allow a user to present cryptographic proofs of attributes issued by a trusted
authority.
**Architecture impact:** Extend MLS credentials with VC presentation. A group
admin could require proof of organization membership before allowing join.
---
## Application Layer
### Matrix-style Federation
**Problem:** A single server is a single point of failure and a single point of
trust. Users on different servers cannot communicate.
**Solution:** Federation allows multiple quicprochat servers to exchange
messages, similar to [Matrix](https://matrix.org/) homeserver federation. Each
server manages its own users and relays messages to peer servers.
**Architecture impact:** Major. Requires server-to-server protocol, distributed
identity resolution, and cross-server MLS group management.
### WASM Plugin System
**Problem:** Extensibility (bots, bridges, custom message types) currently
requires forking the codebase.
**Solution:** A sandboxed WASM plugin system allows third-party extensions to run
inside the client or server without access to private key material.
**Architecture impact:** Define a plugin API (message hooks, command handlers).
Plugins compiled to WASM and loaded at runtime via `wasmtime` or `wasmer`.
**Crates:** `wasmtime`, `wasmer`, `extism`
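**Sketch:** One possible shape for the host side of the plugin API is shown below, as an in-process trait with no WASM runtime involved. Every name here (`Plugin`, `HookAction`, `PluginHost`, `PingBot`) is hypothetical. With `wasmtime` or `wasmer`, each `Plugin` implementation would presumably wrap a sandboxed module instance instead of native code.

```rust
/// What a plugin may ask the host to do after seeing a message.
#[derive(Debug, PartialEq)]
enum HookAction {
    /// Let the message through without further action.
    Pass,
    /// Post a reply into the same conversation.
    Reply(String),
}

/// Host-facing plugin interface. Hooks receive only decrypted message
/// text and sender name, never key material.
trait Plugin {
    fn name(&self) -> &str;
    fn on_message(&mut self, sender: &str, body: &str) -> HookAction;
}

/// Holds the loaded plugins and fans each message out to them.
struct PluginHost {
    plugins: Vec<Box<dyn Plugin>>,
}

impl PluginHost {
    fn new() -> Self {
        PluginHost { plugins: Vec::new() }
    }

    fn register(&mut self, plugin: Box<dyn Plugin>) {
        self.plugins.push(plugin);
    }

    /// Runs every plugin's hook and collects (plugin name, reply) pairs.
    fn dispatch(&mut self, sender: &str, body: &str) -> Vec<(String, String)> {
        let mut replies = Vec::new();
        for plugin in self.plugins.iter_mut() {
            if let HookAction::Reply(text) = plugin.on_message(sender, body) {
                replies.push((plugin.name().to_string(), text));
            }
        }
        replies
    }
}

/// Example plugin: answers "!ping" with "pong".
struct PingBot;

impl Plugin for PingBot {
    fn name(&self) -> &str {
        "ping-bot"
    }

    fn on_message(&mut self, _sender: &str, body: &str) -> HookAction {
        if body.trim() == "!ping" {
            HookAction::Reply("pong".to_string())
        } else {
            HookAction::Pass
        }
    }
}
```

Keeping the hook signature limited to plain strings is what lets the same interface cross a WASM boundary later without exposing host memory or keys.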
### Double-Ratchet DM Layer
**Problem:** MLS is optimised for groups. For 1:1 conversations, the Signal
Double Ratchet (formerly Axolotl) with X3DH key agreement has better
performance characteristics (no tree overhead for two parties).
**Solution:** Implement a double-ratchet layer for 1:1 DMs, using MLS only for
groups with N > 2. The [1:1 Channel Design](dm-channels.md) currently uses MLS
for DMs; this would be an optimisation.
**References:** [The Double Ratchet Algorithm](https://signal.org/docs/specifications/doubleratchet/),
[X3DH Key Agreement Protocol](https://signal.org/docs/specifications/x3dh/)
---
## Observability and Operations
### OpenTelemetry (Tracing + Metrics)
**Problem:** The current logging is `tracing`-based but lacks distributed
tracing context and structured metrics export.
**Solution:** [OpenTelemetry](https://opentelemetry.io/) provides a unified
framework for distributed tracing, metrics, and log correlation. OTLP export
enables integration with any observability backend.
**Architecture impact:** Add `tracing-opentelemetry` and `opentelemetry-otlp`
to the server. Instrument RPC handlers with spans. Export to Jaeger, Grafana
Tempo, or any OTLP-compatible backend.
**Crates:** `opentelemetry`, `opentelemetry-otlp`, `tracing-opentelemetry`
### Prometheus + Grafana
**Problem:** No quantitative visibility into server performance (throughput,
latency, queue depth, epoch advancement rate).
**Solution:** Export Prometheus metrics from the server. Visualise with Grafana
dashboards.
**Metrics to export:** message throughput (enqueue/fetch per second), RPC
latency histograms, MLS epoch advancement rate, delivery queue depth, KeyPackage
store size, active connections.
**Crates:** `prometheus`, `metrics`, `metrics-exporter-prometheus`
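**Sketch:** For a sense of what the scrape endpoint would return, the dependency-free sketch below renders a few of the metrics above in the Prometheus text exposition format. The metric names (`qpc_*`) and the `ServerMetrics` struct are hypothetical; in practice the `prometheus` or `metrics` crates would handle registration and rendering, including the latency histograms omitted here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical server-wide counters and gauges, updated by RPC handlers.
#[derive(Default)]
struct ServerMetrics {
    messages_enqueued: AtomicU64,
    delivery_queue_depth: AtomicU64,
    active_connections: AtomicU64,
}

impl ServerMetrics {
    /// Render all metrics as the body of a `/metrics` response.
    fn render(&self) -> String {
        let mut out = String::new();
        push_metric(
            &mut out,
            "qpc_messages_enqueued_total",
            "counter",
            "Messages accepted for delivery.",
            self.messages_enqueued.load(Ordering::Relaxed),
        );
        push_metric(
            &mut out,
            "qpc_delivery_queue_depth",
            "gauge",
            "Messages waiting to be fetched.",
            self.delivery_queue_depth.load(Ordering::Relaxed),
        );
        push_metric(
            &mut out,
            "qpc_active_connections",
            "gauge",
            "Currently open client connections.",
            self.active_connections.load(Ordering::Relaxed),
        );
        out
    }
}

/// Append one metric: a HELP line, a TYPE line, then the sample itself.
fn push_metric(out: &mut String, name: &str, kind: &str, help: &str, value: u64) {
    out.push_str(&format!(
        "# HELP {0} {1}\n# TYPE {0} {2}\n{0} {3}\n",
        name, help, kind, value
    ));
}
```

Rates such as enqueue/fetch per second are usually not exported directly; the server exposes monotonically increasing counters and the rate is computed at query time.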
### Testcontainers-rs
**Problem:** Integration tests currently run server and client in the same
process (`tokio::spawn`). This does not test real network conditions, container
startup, or multi-process interactions.
**Solution:** [Testcontainers-rs](https://docs.rs/testcontainers/) runs Docker
containers from Rust tests, enabling true end-to-end CI with real network
boundaries.
**Architecture impact:** Add testcontainers-based integration tests alongside
the existing in-process tests. The Docker image is already maintained.
**Crates:** `testcontainers`, `testcontainers-modules`
---
## Developer Experience
### Tauri / Dioxus (Native GUI)
**Problem:** The current interface is CLI-only. A graphical client would broaden
the user base for testing and demonstration.
**Solution:** [Tauri](https://tauri.app/) or [Dioxus](https://dioxuslabs.com/)
provide native cross-platform GUI frameworks in Rust. The
`quicprochat-core` crate can be shared directly with the GUI client.
**Architecture impact:** Add a `quicprochat-gui` crate that depends on
`quicprochat-core` and `quicprochat-proto`. The GUI drives the same
`GroupMember` and RPC logic as the CLI client.
**Crates:** `tauri`, `dioxus`
### uniffi / diplomat (Mobile FFI)
**Problem:** Mobile clients (iOS, Android) cannot use the Rust binary directly.
**Solution:** [uniffi](https://github.com/mozilla/uniffi-rs) (Mozilla) and
[diplomat](https://github.com/rust-diplomat/diplomat) generate idiomatic Swift and
Kotlin bindings from Rust definitions.
**Architecture impact:** Expose `quicprochat-core` through a C-compatible FFI
layer. Mobile apps call into the Rust crypto and protocol logic.
**Crates:** `uniffi`, `diplomat`
### Nix Flakes
**Problem:** The development environment requires `capnp` (Cap'n Proto compiler),
a specific Rust toolchain version, and test infrastructure. Setup varies across
developer machines.
**Solution:** [Nix flakes](https://nixos.wiki/wiki/Flakes) provide a
reproducible, declarative development environment. A single `nix develop`
command sets up the toolchain, `capnp`, and all dependencies.
**Architecture impact:** Add `flake.nix` and `flake.lock` to the repository root.
---
## Top Priority Implementations
The following table ranks the most impactful technologies for near-term adoption,
considering the current state of the codebase and the [milestone plan](milestones.md).
Items marked **Implemented** are already part of the v2 codebase.
| Priority | Technology | Why | Status |
|----------|-----------|-----|--------|
| -- | **Post-quantum hybrid KEM** | `ml-kem` vendored; custom `OpenMlsCryptoProvider` with X25519 + ML-KEM-768. | **Implemented** |
| -- | **SQLCipher persistence** | Encrypted-at-rest storage via rusqlite + bundled-sqlcipher + Argon2id key derivation. | **Implemented** |
| -- | **OPAQUE auth** | Asymmetric password-authenticated key exchange via `opaque-ke`. The server never sees or stores plaintext passwords. | **Implemented** |
| -- | **iroh P2P** | NAT traversal and optional P2P mesh via the `quicprochat-p2p` crate (feature-flagged). | **Implemented** |
| -- | **Sealed Sender** | `--sealed-sender` flag encrypts sender identity inside MLS ciphertext. | **Implemented** |
| 1 | **PIR (Private Information Retrieval)** | Fetch messages without revealing the recipient's identity to the server. | Future |
| 2 | **Key Transparency** | Verifiable, append-only log of public key bindings. Detects key substitution attacks. | Future |
| 3 | **WebTransport (HTTP/3)** | Enables browser clients without a WebSocket bridge. | Future |
| 4 | **OpenTelemetry** | Distributed tracing and structured metrics for production observability. | Future |
| 5 | **WebAuthn / Passkeys** | Hardware-backed authentication to replace password-based login. | Future |
---
## Cross-references
- [Milestones](milestones.md) -- current milestone tracker
- [Production Readiness WBS](production-readiness.md) -- phased work breakdown
- [Auth, Devices, and Tokens](authz-plan.md) -- OPAQUE integration point
- [1:1 Channel Design](dm-channels.md) -- double-ratchet optimisation context
- [Hybrid KEM](../protocol-layers/hybrid-kem.md) -- existing PQ design
- [References](../appendix/references.md) -- standards and crate documentation