# Future Research Directions

This page catalogues technologies and research directions that could strengthen quicnprotochat beyond the current [milestone plan](milestones.md). Each entry includes a brief description, the problem it solves, relevant crates or specifications, and how it maps to the project architecture.

For the production readiness work breakdown, see [Production Readiness WBS](production-readiness.md).

---

## Transport and Networking

### LibP2P / iroh (n0)

**Problem:** The current architecture is strictly client-server. Clients behind NAT cannot communicate directly, and the server is a single point of failure for delivery.

**Solution:** [LibP2P](https://libp2p.io/) and [iroh](https://iroh.computer/) (from n0) provide peer discovery, NAT traversal (hole-punching), and relay fallback. iroh is particularly interesting because it is Rust-native and built on QUIC, aligning with quicnprotochat's existing transport layer.

**Architecture impact:** Move from pure client-server to a hybrid topology where peers communicate directly when possible and fall back to server relay when NAT traversal fails. The server role shifts from mandatory relay to optional rendezvous/relay node.

**Crates:** `libp2p`, `iroh`, `iroh-net`

### WebTransport (HTTP/3)

**Problem:** Browser clients cannot use raw QUIC. The current stack requires a native Rust binary.

**Solution:** [WebTransport](https://w3c.github.io/webtransport/) exposes QUIC-like semantics (multiplexed bidirectional streams, datagrams) to browsers over HTTP/3. A WebTransport endpoint alongside the existing QUIC listener would enable a web client without WebSocket degradation.

**Architecture impact:** Add a second listener (HTTP/3 + WebTransport) that terminates WebTransport and bridges into the existing `NodeService` RPC layer. Cap'n Proto serialisation works in WASM via the `capnp` crate.

**Crates:** `h3`, `h3-webtransport`, `wtransport`

### Tor / I2P Integration

**Problem:** MLS protects message content, but connection metadata (who connects to the server, when, how often) leaks to the server and network observers.

**Solution:** Route client-server connections through [Tor](https://www.torproject.org/) onion services or [I2P](https://geti2p.net/) tunnels. This provides metadata resistance at the network layer.

**Architecture impact:** The server exposes a `.onion` address (Tor) or an I2P destination. Clients connect through the anonymity network. Latency increases significantly, so this should be optional.

**Crates:** `arti` (Tor client in Rust), `arti-client`

---

## Storage and Persistence

### SQLCipher / libsql (Turso)

**Problem:** At M6, quicnprotochat needs persistent storage for group state, key material, and message queues. Storing private keys in a plaintext SQLite database is insufficient.

**Solution:** [SQLCipher](https://www.zetetic.net/sqlcipher/) provides transparent, page-level AES-256 encryption for SQLite. Alternatively, [libsql](https://turso.tech/libsql) (Turso) offers a SQLite fork with encryption, replication, and embedded server capabilities.

**Architecture impact:** Replace the `sqlx` SQLite backend with SQLCipher. Encryption key derived from a user-provided passphrase (via Argon2id) or a hardware-backed key.

**Crates:** `rusqlite` (with `bundled-sqlcipher` feature), `libsql`

### CRDTs (Automerge / Yrs)

**Problem:** Multi-device support requires synchronising state (group membership, read receipts, settings) across devices without a central authority resolving conflicts.

**Solution:** Conflict-free replicated data types (CRDTs) allow concurrent edits to converge without coordination. [Automerge](https://automerge.org/) and [Yrs](https://docs.rs/yrs/) (Yjs in Rust) provide production-quality CRDT implementations.

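To illustrate what "converge without coordination" means, the following is a minimal hand-rolled state-based CRDT for read markers. It is a sketch only: it does not use the Automerge or Yrs APIs, and the names `ReadMarkers` and `converge_demo` are hypothetical rather than existing quicnprotochat types. The merge takes the per-conversation maximum, which is commutative, associative, and idempotent, so two devices end up with the same state no matter the order in which they exchange updates.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-device state: conversation ID -> highest message
/// sequence number the user has read. Not an existing quicnprotochat type.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct ReadMarkers {
    markers: BTreeMap<String, u64>,
}

impl ReadMarkers {
    /// Record a local read. A marker only ever moves forward.
    pub fn mark_read(&mut self, conversation: &str, seq: u64) {
        let entry = self.markers.entry(conversation.to_string()).or_insert(0);
        *entry = (*entry).max(seq);
    }

    /// Merge state received from another device by taking the per-key maximum.
    /// Because max is commutative, associative, and idempotent, replicas
    /// converge regardless of delivery order or duplicated updates.
    pub fn merge(&mut self, other: &ReadMarkers) {
        for (conversation, &seq) in &other.markers {
            self.mark_read(conversation, seq);
        }
    }

    /// Look up the current marker for a conversation, if any.
    pub fn get(&self, conversation: &str) -> Option<u64> {
        self.markers.get(conversation).copied()
    }
}

/// Two devices diverge while offline, then exchange state in opposite orders.
pub fn converge_demo() -> (ReadMarkers, ReadMarkers) {
    let mut phone = ReadMarkers::default();
    let mut laptop = ReadMarkers::default();

    phone.mark_read("group-a", 12);
    laptop.mark_read("group-a", 9);
    laptop.mark_read("group-b", 4);

    // Each side merges the other's pre-merge state.
    let phone_snapshot = phone.clone();
    phone.merge(&laptop);
    laptop.merge(&phone_snapshot);

    (phone, laptop)
}
```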
**Architecture impact:** Client-side state (contact list, group membership cache, read markers) stored as CRDT documents. Synchronisation happens over the existing MLS-encrypted channel, ensuring the server never sees the state.

**Crates:** `automerge`, `yrs`

### Object Storage (S3-compatible)

**Problem:** Encrypted file and media attachments need a storage backend that the server can host without seeing the content.

**Solution:** An S3-compatible object store (MinIO, Garage, or a cloud provider) for encrypted blobs. Clients encrypt attachments client-side (using a key derived from the MLS group secret) and upload the ciphertext. The server stores and serves opaque blobs.

**Architecture impact:** Add a media upload/download RPC to `NodeService`. The server proxies to the object store or returns pre-signed URLs.

**Crates:** `aws-sdk-s3`, `opendal`

---

## Cryptography and Privacy

### ML-KEM + ML-DSA Hybrid (Post-Quantum MLS)

**Problem:** Quantum computers threaten X25519 and Ed25519. While MLS content is protected by ephemeral key exchange, anything encrypted to the X25519 init keys is vulnerable to harvest-now-decrypt-later attacks, and Ed25519 credential signatures could be forged by a future quantum adversary.

**Solution:** Hybrid X25519 + ML-KEM-768 KEM for MLS init keys, and optionally hybrid Ed25519 + ML-DSA-65 for credential signatures. The `ml-kem` crate is already vendored in the workspace.

**Architecture impact:** Custom `OpenMlsCryptoProvider` in `quicnprotochat-core` implementing the hybrid combiner. This is the M7 milestone -- see [Milestones](milestones.md#m7----post-quantum-planned) and [Hybrid KEM](../protocol-layers/hybrid-kem.md).

**Crates:** `ml-kem`, `ml-dsa`

**References:** NIST FIPS 203 (ML-KEM), `draft-ietf-tls-hybrid-design`

### Private Information Retrieval (PIR)

**Problem:** When a client fetches messages or KeyPackages, the server learns *which* recipient is requesting -- even though it cannot read the content.

**Solution:** Private Information Retrieval (PIR) allows a client to fetch a record from the server without revealing which record was requested. [SealPIR](https://github.com/microsoft/SealPIR) and SimplePIR provide practical constructions.

**Architecture impact:** Replace the `fetch` / `fetchKeyPackage` RPCs with PIR queries. This is a significant performance trade-off: PIR has high computational cost. Suitable for KeyPackage fetch (small database) before message fetch (large database).

### Sealed Sender (Signal-style)

**Problem:** The server sees `(sender, recipient, timestamp)` metadata on every enqueued message. Even without reading content, this metadata reveals social graphs.

**Solution:** Following Signal's [Sealed Sender](https://signal.org/blog/sealed-sender/) design, the sender's identity is encrypted inside the MLS ciphertext. The server routes by `recipientKey` only and cannot determine who sent the message.

**Architecture impact:** Modify the `enqueue` RPC to omit sender identity from the server-visible metadata. The sender identity is included only inside the MLS application message (encrypted).

### Key Transparency (RFC draft)

**Problem:** A compromised server could substitute public keys, performing a man-in-the-middle attack on MLS group formation.

**Solution:** A verifiable, append-only log of public key bindings (similar to Certificate Transparency for TLS). Clients verify that the server's response matches the log before trusting a fetched KeyPackage.

**Architecture impact:** Add a key transparency log (Merkle tree) alongside the Authentication Service. Clients verify inclusion proofs on every `fetchKeyPackage` response.

**References:** `draft-ietf-keytrans-protocol`

---

## Identity and Authentication

### DIDs (Decentralized Identifiers)

**Problem:** User identities are currently bound to the server. If the server goes away, identities are lost.

**Solution:** [Decentralized Identifiers](https://www.w3.org/TR/did-core/) (`did:key`, `did:web`) provide self-sovereign identity.
A `did:key` identifier is derived directly from the user's Ed25519 public key and is portable across servers.

**Architecture impact:** Replace raw Ed25519 public keys in MLS credentials with DID URIs. The server resolves DIDs to public keys for routing.

**Crates:** `did-key`, `ssi`

### OPAQUE (aPAKE)

**Problem:** If quicnprotochat adds password-based account registration, the server must never see the password -- not even a hash.

**Solution:** [OPAQUE](https://datatracker.ietf.org/doc/rfc9807/) is an asymmetric password-authenticated key exchange where the server stores only a one-way transformation of the password. The password is never transmitted to the server, and a stolen server record cannot be attacked with precomputed tables: an attacker must run a separate offline dictionary attack for each user.

**Architecture impact:** Replace the registration/login flow with OPAQUE. The server stores an OPAQUE registration record; the client runs the OPAQUE protocol to authenticate and derive a session key.

**Crates:** `opaque-ke`

**References:** RFC 9807 (OPAQUE), RFC 9497 (the OPRF construction that OPAQUE builds on)

### WebAuthn / Passkeys

**Problem:** Password-based auth (even with OPAQUE) is vulnerable to phishing. Hardware-backed authentication provides stronger device binding.

**Solution:** [WebAuthn](https://www.w3.org/TR/webauthn-3/) / Passkeys allow authentication via hardware tokens (YubiKey), platform authenticators (Touch ID, Windows Hello), or synced passkeys.

**Architecture impact:** Add a WebAuthn registration/authentication flow to the account system. Requires a server-side WebAuthn relying party implementation.

**Crates:** `webauthn-rs`

### Verifiable Credentials (W3C VC)

**Problem:** Proving attributes (organization membership, role, age) without revealing full identity.

**Solution:** [Verifiable Credentials](https://www.w3.org/TR/vc-data-model/) allow a user to present cryptographic proofs of attributes issued by a trusted authority.

**Architecture impact:** Extend MLS credentials with VC presentation. A group admin could require proof of organization membership before allowing join.

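Returning to the DIDs entry in this section, the sketch below shows how a `did:key` identifier can be derived from a raw Ed25519 public key. It assumes the encoding described in the `did:key` method specification: the two-byte multicodec prefix `0xed 0x01`, followed by the 32-byte key, encoded as multibase base58btc (leading `z`). The function names `base58btc` and `did_key_from_ed25519` are illustrative and not part of quicnprotochat; a real integration would more likely rely on the `did-key` or `ssi` crates listed above.

```rust
/// Bitcoin base58 alphabet, as used by multibase `base58btc`.
const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Encode bytes as base58btc, without the multibase prefix character.
pub fn base58btc(input: &[u8]) -> String {
    // Base-58 digits of the big-endian input integer, least significant first.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in input {
        // Multiply the accumulated number by 256 and add the next byte.
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }

    // Each leading zero byte is represented by the first alphabet character.
    let zeros = input.iter().take_while(|&&b| b == 0).count();
    let mut out = String::with_capacity(zeros + digits.len());
    out.extend(std::iter::repeat('1').take(zeros));
    out.extend(digits.iter().rev().map(|&d| ALPHABET[d as usize] as char));
    out
}

/// Build a `did:key` identifier from a raw 32-byte Ed25519 public key.
/// Illustrative helper only; not an existing quicnprotochat function.
pub fn did_key_from_ed25519(public_key: &[u8; 32]) -> String {
    let mut bytes = Vec::with_capacity(34);
    // Multicodec code for an Ed25519 public key (0xed) as an unsigned varint.
    bytes.extend_from_slice(&[0xed, 0x01]);
    bytes.extend_from_slice(public_key);
    // 'z' is the multibase prefix that marks base58btc.
    format!("did:key:z{}", base58btc(&bytes))
}
```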

---

## Application Layer

### Matrix-style Federation

**Problem:** A single server is a single point of failure and a single point of trust. Users on different servers cannot communicate.

**Solution:** Federation allows multiple quicnprotochat servers to exchange messages, similar to [Matrix](https://matrix.org/) homeserver federation. Each server manages its own users and relays messages to peer servers.

**Architecture impact:** Major. Requires server-to-server protocol, distributed identity resolution, and cross-server MLS group management.

### WASM Plugin System

**Problem:** Extensibility (bots, bridges, custom message types) currently requires forking the codebase.

**Solution:** A sandboxed WASM plugin system allows third-party extensions to run inside the client or server without access to private key material.

**Architecture impact:** Define a plugin API (message hooks, command handlers). Plugins compiled to WASM and loaded at runtime via `wasmtime` or `wasmer`.

**Crates:** `wasmtime`, `wasmer`, `extism`

### Double-Ratchet DM Layer

**Problem:** MLS is optimised for groups. For efficient 1:1 conversations, the Signal double ratchet (X3DH + Axolotl) provides better performance characteristics (no tree overhead for two parties).

**Solution:** Implement a double-ratchet layer for 1:1 DMs, using MLS only for groups with N > 2. The [1:1 Channel Design](dm-channels.md) currently uses MLS for DMs; this would be an optimisation.

**References:** [The Double Ratchet Algorithm](https://signal.org/docs/specifications/doubleratchet/), [X3DH Key Agreement Protocol](https://signal.org/docs/specifications/x3dh/)

---

## Observability and Operations

### OpenTelemetry (Tracing + Metrics)

**Problem:** The current logging is `tracing`-based but lacks distributed tracing context and structured metrics export.

**Solution:** [OpenTelemetry](https://opentelemetry.io/) provides a unified framework for distributed tracing, metrics, and log correlation.
OTLP export enables integration with any observability backend.

**Architecture impact:** Add `tracing-opentelemetry` and `opentelemetry-otlp` to the server. Instrument RPC handlers with spans. Export to Jaeger, Grafana Tempo, or any OTLP-compatible backend.

**Crates:** `opentelemetry`, `opentelemetry-otlp`, `tracing-opentelemetry`

### Prometheus + Grafana

**Problem:** No quantitative visibility into server performance (throughput, latency, queue depth, epoch advancement rate).

**Solution:** Export Prometheus metrics from the server. Visualise with Grafana dashboards.

**Metrics to export:** message throughput (enqueue/fetch per second), RPC latency histograms, MLS epoch advancement rate, delivery queue depth, KeyPackage store size, active connections.

**Crates:** `prometheus`, `metrics`, `metrics-exporter-prometheus`

### Testcontainers-rs

**Problem:** Integration tests currently run server and client in the same process (`tokio::spawn`). This does not test real network conditions, container startup, or multi-process interactions.

**Solution:** [Testcontainers-rs](https://docs.rs/testcontainers/) runs Docker containers from Rust tests, enabling true end-to-end CI with real network boundaries.

**Architecture impact:** Add testcontainers-based integration tests alongside the existing in-process tests. The Docker image is already maintained.

**Crates:** `testcontainers`, `testcontainers-modules`

---

## Developer Experience

### Tauri / Dioxus (Native GUI)

**Problem:** The current interface is CLI-only. A graphical client would broaden the user base for testing and demonstration.

**Solution:** [Tauri](https://tauri.app/) or [Dioxus](https://dioxuslabs.com/) provide native cross-platform GUI frameworks in Rust. The `quicnprotochat-core` crate can be shared directly with the GUI client.

**Architecture impact:** Add a `quicnprotochat-gui` crate that depends on `quicnprotochat-core` and `quicnprotochat-proto`.
The GUI drives the same `GroupMember` and RPC logic as the CLI client.

**Crates:** `tauri`, `dioxus`

### uniffi / diplomat (Mobile FFI)

**Problem:** Mobile clients (iOS, Android) cannot use the Rust binary directly.

**Solution:** [uniffi](https://github.com/mozilla/uniffi-rs) (Mozilla) and [diplomat](https://github.com/rust-diplomat/diplomat) generate idiomatic Swift and Kotlin bindings from Rust definitions.

**Architecture impact:** Expose `quicnprotochat-core` through a C-compatible FFI layer. Mobile apps call into the Rust crypto and protocol logic.

**Crates:** `uniffi`, `diplomat`

### Nix Flakes

**Problem:** The development environment requires `capnp` (Cap'n Proto compiler), a specific Rust toolchain version, and test infrastructure. Setup varies across developer machines.

**Solution:** [Nix flakes](https://nixos.wiki/wiki/Flakes) provide a reproducible, declarative development environment. A single `nix develop` command sets up the toolchain, `capnp`, and all dependencies.

**Architecture impact:** Add `flake.nix` and `flake.lock` to the repository root.

---

## Top 5 Priority Implementations

The following table ranks the most impactful technologies for near-term adoption, considering the current state of the codebase and the [milestone plan](milestones.md).

| Priority | Technology | Why | Unlocks |
|----------|-----------|-----|---------|
| 1 | **Post-quantum hybrid KEM** | `ml-kem` is already vendored in the workspace. Completing the hybrid `OpenMlsCryptoProvider` makes quicnprotochat one of the first PQ MLS implementations. | M7 |
| 2 | **SQLCipher persistence** | Encrypted-at-rest storage is the prerequisite for multi-device support, offline usage, and server restart survival. | M6 |
| 3 | **OPAQUE auth** | Zero-knowledge password authentication is a massive security uplift for the account system. The server never sees or stores passwords. | Phase 3 (authz) |
| 4 | **iroh / LibP2P** | NAT traversal and an optional P2P mesh make quicnprotochat deployable without centralised infrastructure. Aligns with the existing QUIC transport. | Beyond M7 |
| 5 | **Sealed Sender + PIR** | Content encryption is table stakes. Metadata resistance (hiding who talks to whom) is the frontier of private messaging research. | Beyond M7 |

---

## Cross-references

- [Milestones](milestones.md) -- current milestone tracker
- [Production Readiness WBS](production-readiness.md) -- phased work breakdown
- [Auth, Devices, and Tokens](authz-plan.md) -- OPAQUE integration point
- [1:1 Channel Design](dm-channels.md) -- double-ratchet optimisation context
- [Hybrid KEM](../protocol-layers/hybrid-kem.md) -- existing PQ design
- [References](../appendix/references.md) -- standards and crate documentation