# Future Research Directions

This page catalogues technologies and research directions that could strengthen
quicproquo beyond the current [milestone plan](milestones.md). Each entry
includes a brief description, the problem it solves, relevant crates or
specifications, and how it maps to the project architecture.

For the production readiness work breakdown, see
[Production Readiness WBS](production-readiness.md).

---

## Transport and Networking

### WebTransport (HTTP/3)

**Problem:** Browser clients cannot use raw QUIC. The current stack requires a
native Rust binary.

**Solution:** [WebTransport](https://w3c.github.io/webtransport/) exposes
QUIC-like semantics (multiplexed bidirectional streams, datagrams) to browsers
over HTTP/3. A WebTransport endpoint alongside the existing QUIC listener would
enable a web client without falling back to WebSockets.

**Architecture impact:** Add a second listener (HTTP/3 + WebTransport) that
terminates WebTransport and bridges into the existing `NodeService` RPC layer.
Cap'n Proto serialisation works in WASM via the `capnp` crate.

**Crates:** `h3`, `h3-webtransport`, `wtransport`

### Tor / I2P Integration

**Problem:** MLS protects message content, but connection metadata (who connects
to the server, when, how often) leaks to the server and network observers.

**Solution:** Route client-server connections through
[Tor](https://www.torproject.org/) onion services or
[I2P](https://geti2p.net/) tunnels. This provides metadata resistance at the
network layer.

**Architecture impact:** The server exposes a `.onion` address (Tor) or an I2P
destination. Clients connect through the anonymity network. Latency increases
significantly, so this should be optional.

**Crates:** `arti` (Tor client in Rust), `arti-client`

---

## Storage and Persistence

### CRDTs (Automerge / Yrs)

**Problem:** Multi-device support requires synchronising state (group membership,
read receipts, settings) across devices without a central authority resolving
conflicts.

**Solution:** Conflict-free replicated data types (CRDTs) allow concurrent edits
to converge without coordination. [Automerge](https://automerge.org/) and
[Yrs](https://docs.rs/yrs/) (Yjs in Rust) provide production-quality CRDT
implementations.

**Architecture impact:** Client-side state (contact list, group membership
cache, read markers) stored as CRDT documents. Synchronisation happens over the
existing MLS-encrypted channel, ensuring the server never sees the state.

**Crates:** `automerge`, `yrs`
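
To make the convergence property concrete, the following is a minimal sketch of
a state-based CRDT for read markers, written against the standard library only.
It does not use the Automerge or Yrs APIs; the `ReadMarkers` type and its
methods are hypothetical names chosen for illustration, and the sketch assumes
that a read marker is simply the highest message sequence number seen in a
conversation.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-device state: conversation id -> highest sequence number read.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct ReadMarkers {
    last_read: BTreeMap<String, u64>,
}

impl ReadMarkers {
    /// Record a local read. Markers only ever move forward.
    pub fn mark_read(&mut self, conversation: &str, seq: u64) {
        let entry = self.last_read.entry(conversation.to_string()).or_insert(0);
        *entry = (*entry).max(seq);
    }

    /// Highest sequence number read in `conversation`, or 0 if none.
    pub fn get(&self, conversation: &str) -> u64 {
        self.last_read.get(conversation).copied().unwrap_or(0)
    }

    /// Merge state received from another device by taking the per-key maximum.
    /// The operation is commutative, associative, and idempotent, so devices
    /// converge regardless of the order in which updates arrive.
    pub fn merge(&mut self, other: &ReadMarkers) {
        for (conversation, &seq) in &other.last_read {
            let entry = self.last_read.entry(conversation.clone()).or_insert(0);
            *entry = (*entry).max(seq);
        }
    }
}
```

Two devices that exchange their `ReadMarkers` in either order end up with the
same state. A library such as Automerge or Yrs would be needed for richer
structures (lists, text, removals), which a max-register like this cannot
express.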

### Object Storage (S3-compatible)

**Problem:** Encrypted file and media attachments need a storage backend that
the server can host without seeing the content.

**Solution:** An S3-compatible object store (MinIO, Garage, or a cloud provider)
for encrypted blobs. Clients encrypt attachments client-side (using a key derived
from the MLS group secret) and upload the ciphertext. The server stores and
serves opaque blobs.

**Architecture impact:** Add a media upload/download RPC to `NodeService`. The
server proxies to the object store or returns pre-signed URLs.

**Crates:** `aws-sdk-s3`, `opendal`

---

## Cryptography and Privacy

### ML-KEM + ML-DSA Hybrid (Post-Quantum MLS)

**Problem:** Quantum computers threaten X25519 and Ed25519. While MLS content is
protected by ephemeral key exchange, the X25519 init keys are vulnerable to
harvest-now-decrypt-later attacks, and Ed25519 credential signatures could be
forged once a capable quantum computer exists.

**Solution:** Hybrid X25519 + ML-KEM-768 KEM for MLS init keys, and optionally
hybrid Ed25519 + ML-DSA-65 for credential signatures. The `ml-kem` crate is
already vendored in the workspace.

**Architecture impact:** Custom `OpenMlsCryptoProvider` in `quicproquo-core`
implementing the hybrid combiner. This is the M7 milestone -- see
[Milestones](milestones.md#m7----post-quantum-planned) and
[Hybrid KEM](../protocol-layers/hybrid-kem.md).

**Crates:** `ml-kem`, `ml-dsa`

**References:** NIST FIPS 203 (ML-KEM), NIST FIPS 204 (ML-DSA),
`draft-ietf-tls-hybrid-design`

### Private Information Retrieval (PIR)

**Problem:** When a client fetches messages or KeyPackages, the server learns
*which* recipient is requesting -- even though it cannot read the content.

**Solution:** Private Information Retrieval (PIR) allows a client to fetch a
record from the server without revealing which record was requested.
[SealPIR](https://github.com/microsoft/SealPIR) and SimplePIR provide practical
constructions.

**Architecture impact:** Replace the `fetch` / `fetchKeyPackage` RPCs with PIR
queries. This is a significant performance trade-off: PIR has high computational
cost, so KeyPackage fetch (small database) is a more practical first target than
message fetch (large database).
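
The core idea of PIR can be illustrated with the classic two-server XOR scheme.
This is deliberately *not* SealPIR or SimplePIR: those are single-server,
lattice-based constructions, whereas the sketch below assumes two servers that
hold the same database and do not collude, an assumption the current
quicproquo deployment model does not satisfy. All names (`make_queries`,
`answer`, `reconstruct`, `RECORD_LEN`) are hypothetical, and the random mask is
passed in by the caller because the standard library has no random number
generator.

```rust
/// Fixed record size for this toy database (an assumption of the sketch).
pub const RECORD_LEN: usize = 4;
pub type Record = [u8; RECORD_LEN];

/// Client: turn a uniformly random bit mask into two queries that differ only
/// at `index`. Each query on its own is a uniformly random bit vector, so a
/// single server learns nothing about `index`. Panics if `index` is out of range.
pub fn make_queries(random_mask: &[bool], index: usize) -> (Vec<bool>, Vec<bool>) {
    let first = random_mask.to_vec();
    let mut second = first.clone();
    second[index] = !second[index];
    (first, second)
}

/// Server: XOR together every record selected by the query bits.
pub fn answer(db: &[Record], query: &[bool]) -> Record {
    let mut acc = [0u8; RECORD_LEN];
    for (record, &selected) in db.iter().zip(query) {
        if selected {
            for (a, b) in acc.iter_mut().zip(record) {
                *a ^= *b;
            }
        }
    }
    acc
}

/// Client: XOR the two answers. Every record selected by both queries cancels
/// out, leaving only the record at the index where the queries differ.
pub fn reconstruct(first: Record, second: Record) -> Record {
    let mut out = [0u8; RECORD_LEN];
    for i in 0..RECORD_LEN {
        out[i] = first[i] ^ second[i];
    }
    out
}
```

Each server touches every record for every query, which is the source of the
high computational cost noted above; single-server schemes pay a further cost
for homomorphic operations.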

### Key Transparency (RFC draft)

**Problem:** A compromised server could substitute public keys, performing a
man-in-the-middle attack on MLS group formation.

**Solution:** A verifiable, append-only log of public key bindings (similar to
Certificate Transparency for TLS). Clients verify that the server's response
matches the log before trusting a fetched KeyPackage.

**Architecture impact:** Add a key transparency log (Merkle tree) alongside the
Authentication Service. Clients verify inclusion proofs on every `fetchKeyPackage`
response.

**References:** `draft-ietf-keytrans-protocol`
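
The client-side check can be sketched as a generic Merkle inclusion proof
verification. This is not the proof format defined by
`draft-ietf-keytrans-protocol`; it only shows the general shape of the check.
To stay within the standard library it uses `DefaultHasher` as a stand-in hash,
which is **not** cryptographically secure; a real log would use a hash such as
SHA-256. The `ProofStep` type and function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in digest type. A real log would use a 32-byte cryptographic hash.
pub type Digest = u64;

/// Hash a leaf with a 0x00 prefix so leaves and interior nodes cannot collide.
pub fn hash_leaf(data: &[u8]) -> Digest {
    let mut h = DefaultHasher::new();
    h.write_u8(0x00);
    h.write(data);
    h.finish()
}

/// Hash an interior node with a 0x01 prefix over its two children.
pub fn hash_node(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    h.write_u8(0x01);
    h.write_u64(left);
    h.write_u64(right);
    h.finish()
}

/// One step of the audit path: the sibling digest and which side it sits on.
pub struct ProofStep {
    pub sibling: Digest,
    pub sibling_is_left: bool,
}

/// Recompute the root from the leaf and the audit path, then compare it with
/// the root the client already trusts.
pub fn verify_inclusion(leaf: &[u8], proof: &[ProofStep], trusted_root: Digest) -> bool {
    let mut acc = hash_leaf(leaf);
    for step in proof {
        acc = if step.sibling_is_left {
            hash_node(step.sibling, acc)
        } else {
            hash_node(acc, step.sibling)
        };
    }
    acc == trusted_root
}
```

A client would run `verify_inclusion` on the KeyPackage binding returned by
`fetchKeyPackage` and reject the response if the recomputed root does not match.
How the client obtains and cross-checks the trusted root is outside this sketch.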

---

## Identity and Authentication

### DIDs (Decentralized Identifiers)

**Problem:** User identities are currently bound to the server. If the server
goes away, identities are lost.

**Solution:** [Decentralized Identifiers](https://www.w3.org/TR/did-core/)
(`did:key`, `did:web`) provide self-sovereign identity. With `did:key`, a user's
DID is derived directly from their Ed25519 public key and is portable across
servers.

**Architecture impact:** Replace raw Ed25519 public keys in MLS credentials with
DID URIs. The server resolves DIDs to public keys for routing.

**Crates:** `did-key`, `ssi`
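
As an illustration of how a `did:key` identifier is formed from an Ed25519
public key, the sketch below prepends the multicodec prefix for Ed25519 public
keys (`0xed 0x01`), encodes the result as base58btc, and adds the multibase
prefix `z`. It uses only the standard library rather than the `did-key` or
`ssi` crates, and the function names are hypothetical.

```rust
const BASE58_ALPHABET: &[u8] =
    b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Encode bytes as base58btc (Bitcoin alphabet), preserving leading zero bytes.
pub fn base58_encode(input: &[u8]) -> String {
    let zeros = input.iter().take_while(|&&b| b == 0).count();
    // Base-58 digits of the input interpreted as a big-endian integer,
    // stored least significant digit first.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in input {
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }
    let mut out = String::new();
    for _ in 0..zeros {
        out.push('1'); // each leading zero byte maps to the first alphabet symbol
    }
    for &d in digits.iter().rev() {
        out.push(BASE58_ALPHABET[d as usize] as char);
    }
    out
}

/// Build a `did:key` identifier from a raw 32-byte Ed25519 public key.
pub fn did_key_from_ed25519(public_key: &[u8; 32]) -> String {
    let mut bytes = Vec::with_capacity(34);
    bytes.extend_from_slice(&[0xed, 0x01]); // multicodec: ed25519-pub (varint)
    bytes.extend_from_slice(public_key);
    format!("did:key:z{}", base58_encode(&bytes))
}
```

Identifiers produced this way start with `did:key:z6Mk`, which is how Ed25519
`did:key` values are commonly recognised. Because the identifier is a pure
function of the public key, no server lookup is needed to resolve it.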

### WebAuthn / Passkeys

**Problem:** Password-based auth (even with OPAQUE) is vulnerable to phishing.
Hardware-backed authentication provides stronger device binding.

**Solution:** [WebAuthn](https://www.w3.org/TR/webauthn-3/) / Passkeys allow
authentication via hardware tokens (YubiKey), platform authenticators (Touch ID,
Windows Hello), or synced passkeys.

**Architecture impact:** Add a WebAuthn registration/authentication flow to the
account system. Requires a server-side WebAuthn relying party implementation.

**Crates:** `webauthn-rs`

### Verifiable Credentials (W3C VC)

**Problem:** Users may need to prove attributes (organization membership, role,
age) without revealing their full identity.

**Solution:** [Verifiable Credentials](https://www.w3.org/TR/vc-data-model/)
allow a user to present cryptographic proofs of attributes issued by a trusted
authority.

**Architecture impact:** Extend MLS credentials with VC presentation. A group
admin could require proof of organization membership before allowing join.

---

## Application Layer

### Matrix-style Federation

**Problem:** A single server is a single point of failure and a single point of
trust. Users on different servers cannot communicate.

**Solution:** Federation allows multiple quicproquo servers to exchange
messages, similar to [Matrix](https://matrix.org/) homeserver federation. Each
server manages its own users and relays messages to peer servers.

**Architecture impact:** Major. Requires a server-to-server protocol, distributed
identity resolution, and cross-server MLS group management.

### WASM Plugin System

**Problem:** Extensibility (bots, bridges, custom message types) currently
requires forking the codebase.

**Solution:** A sandboxed WASM plugin system allows third-party extensions to run
inside the client or server without access to private key material.

**Architecture impact:** Define a plugin API (message hooks, command handlers).
Plugins are compiled to WASM and loaded at runtime via `wasmtime` or `wasmer`.

**Crates:** `wasmtime`, `wasmer`, `extism`
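
One possible shape for the plugin API is sketched below as a plain Rust trait
with a message hook and a command handler. It does not load WASM modules or use
the `wasmtime`, `wasmer`, or `extism` APIs; in a real system the host would
implement this interface on top of a sandboxed runtime. Every name here
(`Plugin`, `PluginHost`, `HookAction`, `InboundMessage`, `EchoBot`) is
hypothetical.

```rust
/// What a plugin is allowed to see of an inbound message. No key material.
pub struct InboundMessage<'a> {
    pub sender: &'a str,
    pub body: &'a str,
}

/// Decision returned by a message hook.
#[derive(Debug, PartialEq, Eq)]
pub enum HookAction {
    Pass,          // let the next plugin (or the host) handle it
    Reply(String), // send this text back
    Discard,       // stop processing the message
}

pub trait Plugin {
    fn name(&self) -> &str;

    /// Called for every decrypted inbound message. Defaults to `Pass`.
    fn on_message(&mut self, _msg: &InboundMessage<'_>) -> HookAction {
        HookAction::Pass
    }

    /// Called for user commands. Returns `None` if the command is not handled.
    fn on_command(&mut self, _command: &str, _args: &[&str]) -> Option<String> {
        None
    }
}

#[derive(Default)]
pub struct PluginHost {
    plugins: Vec<Box<dyn Plugin>>,
}

impl PluginHost {
    pub fn register(&mut self, plugin: Box<dyn Plugin>) {
        self.plugins.push(plugin);
    }

    /// Run message hooks in registration order; the first non-`Pass` result wins.
    pub fn dispatch_message(&mut self, msg: &InboundMessage<'_>) -> HookAction {
        for plugin in self.plugins.iter_mut() {
            match plugin.on_message(msg) {
                HookAction::Pass => continue,
                decided => return decided,
            }
        }
        HookAction::Pass
    }

    /// Offer the command to each plugin until one handles it.
    pub fn dispatch_command(&mut self, command: &str, args: &[&str]) -> Option<String> {
        self.plugins.iter_mut().find_map(|p| p.on_command(command, args))
    }
}

/// Example plugin: handles an `echo` command and ignores messages.
pub struct EchoBot;

impl Plugin for EchoBot {
    fn name(&self) -> &str {
        "echo"
    }

    fn on_command(&mut self, command: &str, args: &[&str]) -> Option<String> {
        if command == "echo" { Some(args.join(" ")) } else { None }
    }
}
```

Keeping the hook inputs to plain strings and byte slices makes the interface
straightforward to marshal across a WASM boundary, and it keeps private key
material on the host side by construction.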

### Double-Ratchet DM Layer

**Problem:** MLS is optimised for groups. For efficient 1:1 conversations, the
Signal protocol (X3DH key agreement plus the Double Ratchet, formerly called
Axolotl) provides better performance characteristics (no tree overhead for two
parties).

**Solution:** Implement a double-ratchet layer for 1:1 DMs, using MLS only for
groups with N > 2. The [1:1 Channel Design](dm-channels.md) currently uses MLS
for DMs; this would be an optimisation.

**References:** [The Double Ratchet Algorithm](https://signal.org/docs/specifications/doubleratchet/),
[X3DH Key Agreement Protocol](https://signal.org/docs/specifications/x3dh/)

---

## Observability and Operations

### OpenTelemetry (Tracing + Metrics)

**Problem:** The current logging is `tracing`-based but lacks distributed
tracing context and structured metrics export.

**Solution:** [OpenTelemetry](https://opentelemetry.io/) provides a unified
framework for distributed tracing, metrics, and log correlation. OTLP export
enables integration with any observability backend.

**Architecture impact:** Add `tracing-opentelemetry` and `opentelemetry-otlp`
to the server. Instrument RPC handlers with spans. Export to Jaeger, Grafana
Tempo, or any OTLP-compatible backend.

**Crates:** `opentelemetry`, `opentelemetry-otlp`, `tracing-opentelemetry`

### Prometheus + Grafana

**Problem:** No quantitative visibility into server performance (throughput,
latency, queue depth, epoch advancement rate).

**Solution:** Export Prometheus metrics from the server. Visualise with Grafana
dashboards.

**Metrics to export:** message throughput (enqueue/fetch per second), RPC
latency histograms, MLS epoch advancement rate, delivery queue depth, KeyPackage
store size, active connections.

**Crates:** `prometheus`, `metrics`, `metrics-exporter-prometheus`
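
For a sense of what a scrape endpoint would return, the sketch below renders
two of the metrics listed above in the Prometheus text exposition format using
only the standard library. In practice one of the crates above would handle
registration and rendering; the `ServerMetrics` type and the `qpq_`-prefixed
metric names are hypothetical.

```rust
use std::fmt::Write;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical in-process metrics, updated by RPC handlers.
#[derive(Default)]
pub struct ServerMetrics {
    pub messages_enqueued: AtomicU64, // monotonically increasing counter
    pub queue_depth: AtomicU64,       // gauge: can go up and down
}

impl ServerMetrics {
    /// Render all metrics in the Prometheus text exposition format.
    pub fn render(&self) -> String {
        let mut out = String::new();
        write_metric(
            &mut out,
            "qpq_messages_enqueued_total",
            "counter",
            "Messages accepted for delivery.",
            self.messages_enqueued.load(Ordering::Relaxed),
        );
        write_metric(
            &mut out,
            "qpq_delivery_queue_depth",
            "gauge",
            "Messages waiting to be fetched.",
            self.queue_depth.load(Ordering::Relaxed),
        );
        out
    }
}

/// Emit the HELP line, the TYPE line, and a single unlabelled sample.
fn write_metric(out: &mut String, name: &str, kind: &str, help: &str, value: u64) {
    // Writing to a String cannot fail, so the results are ignored.
    let _ = writeln!(out, "# HELP {name} {help}");
    let _ = writeln!(out, "# TYPE {name} {kind}");
    let _ = writeln!(out, "{name} {value}");
}
```

A handler would call `messages_enqueued.fetch_add(1, Ordering::Relaxed)` on each
enqueue, and the server would serve the output of `render()` on a `/metrics`
HTTP route for Prometheus to scrape. Latency histograms need bucketed samples
and are better left to a metrics crate.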

### Testcontainers-rs

**Problem:** Integration tests currently run server and client in the same
process (`tokio::spawn`). This does not test real network conditions, container
startup, or multi-process interactions.

**Solution:** [Testcontainers-rs](https://docs.rs/testcontainers/) runs Docker
containers from Rust tests, enabling true end-to-end CI with real network
boundaries.

**Architecture impact:** Add testcontainers-based integration tests alongside
the existing in-process tests. The Docker image is already maintained.

**Crates:** `testcontainers`, `testcontainers-modules`

---

## Developer Experience

### Tauri / Dioxus (Native GUI)

**Problem:** The current interface is CLI-only. A graphical client would broaden
the user base for testing and demonstration.

**Solution:** [Tauri](https://tauri.app/) or [Dioxus](https://dioxuslabs.com/)
provide native cross-platform GUI frameworks in Rust. The
`quicproquo-core` crate can be shared directly with the GUI client.

**Architecture impact:** Add a `quicproquo-gui` crate that depends on
`quicproquo-core` and `quicproquo-proto`. The GUI drives the same
`GroupMember` and RPC logic as the CLI client.

**Crates:** `tauri`, `dioxus`

### uniffi / diplomat (Mobile FFI)

**Problem:** Mobile clients (iOS, Android) cannot use the Rust binary directly.

**Solution:** [uniffi](https://github.com/mozilla/uniffi-rs) (Mozilla) and
[diplomat](https://github.com/rust-diplomat/diplomat) generate idiomatic Swift and
Kotlin bindings from Rust definitions.

**Architecture impact:** Expose `quicproquo-core` through a C-compatible FFI
layer. Mobile apps call into the Rust crypto and protocol logic.

**Crates:** `uniffi`, `diplomat`

### Nix Flakes

**Problem:** The development environment requires `capnp` (Cap'n Proto compiler),
a specific Rust toolchain version, and test infrastructure. Setup varies across
developer machines.

**Solution:** [Nix flakes](https://nixos.wiki/wiki/Flakes) provide a
reproducible, declarative development environment. A single `nix develop`
command sets up the toolchain, `capnp`, and all dependencies.

**Architecture impact:** Add `flake.nix` and `flake.lock` to the repository root.

---

## Top Priority Implementations

The following table ranks the most impactful technologies for near-term adoption,
considering the current state of the codebase and the [milestone plan](milestones.md).

Items marked **Implemented** are already part of the v2 codebase.

| Priority | Technology | Why | Status |
|----------|-----------|-----|--------|
| -- | **Post-quantum hybrid KEM** | `ml-kem` vendored; custom `OpenMlsCryptoProvider` with X25519 + ML-KEM-768. | **Implemented** |
| -- | **SQLCipher persistence** | Encrypted-at-rest storage via rusqlite + bundled-sqlcipher + Argon2id key derivation. | **Implemented** |
| -- | **OPAQUE auth** | Zero-knowledge password authentication via `opaque-ke`. Server never stores passwords. | **Implemented** |
| -- | **iroh P2P** | NAT traversal and optional P2P mesh via the `quicproquo-p2p` crate (feature-flagged). | **Implemented** |
| -- | **Sealed Sender** | `--sealed-sender` flag encrypts sender identity inside MLS ciphertext. | **Implemented** |
| 1 | **PIR (Private Information Retrieval)** | Fetch messages without revealing the recipient's identity to the server. | Future |
| 2 | **Key Transparency** | Verifiable, append-only log of public key bindings. Detects key substitution attacks. | Future |
| 3 | **WebTransport (HTTP/3)** | Enables browser clients without a WebSocket bridge. | Future |
| 4 | **OpenTelemetry** | Distributed tracing and structured metrics for production observability. | Future |
| 5 | **WebAuthn / Passkeys** | Hardware-backed authentication to replace password-based login. | Future |

---

## Cross-references

- [Milestones](milestones.md) -- current milestone tracker
- [Production Readiness WBS](production-readiness.md) -- phased work breakdown
- [Auth, Devices, and Tokens](authz-plan.md) -- OPAQUE integration point
- [1:1 Channel Design](dm-channels.md) -- double-ratchet optimisation context
- [Hybrid KEM](../protocol-layers/hybrid-kem.md) -- existing PQ design
- [References](../appendix/references.md) -- standards and crate documentation