Future Research Directions

This page catalogues technologies and research directions that could strengthen quicproquo beyond the current milestone plan. Each entry includes a brief description, the problem it solves, relevant crates or specifications, and how it maps to the project architecture.

For the production readiness work breakdown, see Production Readiness WBS.


Transport and Networking

LibP2P / iroh (n0)

Problem: The current architecture is strictly client-server. Clients behind NAT cannot communicate directly, and the server is a single point of failure for delivery.

Solution: LibP2P and iroh (from n0) provide peer discovery, NAT traversal (hole-punching), and relay fallback. iroh is particularly interesting because it is Rust-native and built on QUIC, aligning with quicproquo's existing transport layer.

Architecture impact: Move from pure client-server to a hybrid topology where peers communicate directly when possible and fall back to server relay when NAT traversal fails. The server role shifts from mandatory relay to optional rendezvous/relay node.

Crates: libp2p, iroh, iroh-net
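
The direct-first, relay-fallback decision described above can be sketched in a few lines. This is a minimal illustration using only the standard library, not iroh's or libp2p's API; `Route` and `choose_route` are hypothetical names, and the hole-punching result is assumed to arrive as an `Option<SocketAddr>`.

```rust
use std::net::SocketAddr;

/// How a message reaches a peer in the hybrid topology.
#[derive(Debug, PartialEq)]
enum Route {
    /// Hole-punching succeeded; talk to the peer directly.
    Direct(SocketAddr),
    /// NAT traversal failed; go through the server acting as relay.
    Relay(SocketAddr),
}

/// Prefer the direct path when hole-punching produced one,
/// otherwise fall back to the relay address.
fn choose_route(punched: Option<SocketAddr>, relay: SocketAddr) -> Route {
    match punched {
        Some(addr) => Route::Direct(addr),
        None => Route::Relay(relay),
    }
}
```

Keeping the decision in one pure function would make the fallback behaviour easy to unit-test independently of whichever networking crate ends up providing the hole-punching.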

WebTransport (HTTP/3)

Problem: Browser clients cannot use raw QUIC. The current stack requires a native Rust binary.

Solution: WebTransport exposes QUIC-like semantics (multiplexed bidirectional streams, datagrams) to browsers over HTTP/3. A WebTransport endpoint alongside the existing QUIC listener would enable a web client without WebSocket degradation.

Architecture impact: Add a second listener (HTTP/3 + WebTransport) that terminates WebTransport and bridges into the existing NodeService RPC layer. Cap'n Proto serialisation works in WASM via the capnp crate, so a browser client can reuse the existing message schema.

Crates: h3, h3-webtransport, wtransport

Tor / I2P Integration

Problem: MLS protects message content, but connection metadata (who connects to the server, when, how often) leaks to the server and network observers.

Solution: Route client-server connections through Tor onion services or I2P tunnels. This provides metadata resistance at the network layer.

Architecture impact: The server exposes a .onion address (Tor) or an I2P destination. Clients connect through the anonymity network. Latency increases significantly, so this should be optional.

Crates: arti (Tor client in Rust), arti-client


Storage and Persistence

SQLCipher / libsql (Turso)

Problem: At M6, quicproquo needs persistent storage for group state, key material, and message queues. Storing private keys in a plaintext SQLite database is insufficient.

Solution: SQLCipher provides transparent, page-level AES-256 encryption for SQLite. Alternatively, libsql (Turso) offers a SQLite fork with encryption, replication, and embedded server capabilities.

Architecture impact: Replace the sqlx SQLite backend with SQLCipher. Encryption key derived from a user-provided passphrase (via Argon2id) or a hardware-backed key.

Crates: rusqlite (with bundled-sqlcipher feature), libsql
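
As a sketch of the keying step: SQLCipher accepts an already-derived 32-byte key in raw form, written as 64 hex digits inside `x'...'`, and the statement has to run before any other statement on the connection. The helper below only formats that statement; `sqlcipher_key_pragma` is a hypothetical name, and the Argon2id derivation that would produce the key is assumed to happen elsewhere.

```rust
/// Format a 32-byte key as SQLCipher's raw-key PRAGMA.
/// The key is assumed to come from Argon2id or a hardware-backed store.
fn sqlcipher_key_pragma(key: &[u8; 32]) -> String {
    // Lower-case hex, two digits per byte, 64 digits in total.
    let hex: String = key.iter().map(|b| format!("{:02x}", b)).collect();
    format!("PRAGMA key = \"x'{}'\";", hex)
}
```

The returned string contains key material, so it should be treated like the key itself: never logged, and dropped as soon as the connection is keyed.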

CRDTs (Automerge / Yrs)

Problem: Multi-device support requires synchronising state (group membership, read receipts, settings) across devices without a central authority resolving conflicts.

Solution: Conflict-free replicated data types (CRDTs) allow concurrent edits to converge without coordination. Automerge and Yrs (Yjs in Rust) provide production-quality CRDT implementations.

Architecture impact: Client-side state (contact list, group membership cache, read markers) stored as CRDT documents. Synchronisation happens over the existing MLS-encrypted channel, ensuring the server never sees the state.

Crates: automerge, yrs
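
To make the convergence property concrete, here is a hand-rolled state-based CRDT for read markers. It is not Automerge or Yrs, just a map of max-registers written with the standard library; `ReadMarkers` and its methods are hypothetical names. Because the merge is a pointwise maximum, it is commutative, associative, and idempotent, so devices can exchange state in any order.

```rust
use std::collections::HashMap;

/// Per-conversation read markers: conversation id -> highest sequence number read.
#[derive(Clone, Debug, Default, PartialEq)]
struct ReadMarkers(HashMap<String, u64>);

impl ReadMarkers {
    /// Record a local read; markers only ever move forward.
    fn mark_read(&mut self, conversation: &str, seq: u64) {
        let entry = self.0.entry(conversation.to_string()).or_insert(0);
        *entry = (*entry).max(seq);
    }

    /// State-based merge: take the pointwise maximum of both replicas.
    fn merge(&mut self, other: &ReadMarkers) {
        for (conversation, &seq) in &other.0 {
            self.mark_read(conversation, seq);
        }
    }
}
```

Two devices that merge each other's state end up with identical maps regardless of which side merges first, which is the property the synchronisation layer would rely on.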

Object Storage (S3-compatible)

Problem: Encrypted file and media attachments need a storage backend that the server can host without seeing the content.

Solution: An S3-compatible object store (MinIO, Garage, or a cloud provider) for encrypted blobs. Clients encrypt attachments client-side (using a key derived from the MLS group secret) and upload the ciphertext. The server stores and serves opaque blobs.

Architecture impact: Add a media upload/download RPC to NodeService. The server proxies to the object store or returns pre-signed URLs.

Crates: aws-sdk-s3, opendal


Cryptography and Privacy

ML-KEM + ML-DSA Hybrid (Post-Quantum MLS)

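Problem: Quantum computers threaten X25519 and Ed25519. MLS content is protected by ephemeral key exchange, but that exchange rests on X25519 init keys, so it is exposed to harvest-now-decrypt-later attacks: traffic recorded today can be decrypted once a sufficiently large quantum computer exists. Ed25519 credential signatures cannot be attacked retroactively, but they become forgeable by any adversary who has such a machine.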

Solution: Hybrid X25519 + ML-KEM-768 KEM for MLS init keys, and optionally hybrid Ed25519 + ML-DSA-65 for credential signatures. The ml-kem crate is already vendored in the workspace.

Architecture impact: Custom OpenMlsCryptoProvider in quicproquo-core implementing the hybrid combiner. This is the M7 milestone -- see Milestones and Hybrid KEM.

Crates: ml-kem, ml-dsa

References: NIST FIPS 203 (ML-KEM), NIST FIPS 204 (ML-DSA), draft-ietf-tls-hybrid-design
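
The core of a hybrid combiner in the style of draft-ietf-tls-hybrid-design is concatenation: both shared secrets are joined and the result is fed into a KDF, so an attacker has to break both X25519 and ML-KEM-768 to recover the output. The sketch below shows only the concatenation step. `hybrid_ikm` is a hypothetical name, the ordering of the two secrets is an assumption that a real provider must fix and document, and the result is input keying material that still has to pass through a KDF such as HKDF before use.

```rust
/// Concatenate the classical and post-quantum shared secrets.
/// X25519 and ML-KEM-768 both produce 32-byte shared secrets.
/// The output is KDF input, not a usable key on its own.
fn hybrid_ikm(x25519_ss: &[u8; 32], mlkem768_ss: &[u8; 32]) -> [u8; 64] {
    let mut ikm = [0u8; 64];
    // Ordering (classical first) is an assumption of this sketch.
    ikm[..32].copy_from_slice(x25519_ss);
    ikm[32..].copy_from_slice(mlkem768_ss);
    ikm
}
```

Fixed-size array parameters make it a compile-time error to pass a secret of the wrong length, which guards against ambiguous concatenation boundaries.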

Private Information Retrieval (PIR)

Problem: When a client fetches messages or KeyPackages, the server learns which recipient is requesting -- even though it cannot read the content.

Solution: Private Information Retrieval (PIR) allows a client to fetch a record from the server without revealing which record was requested. SealPIR and SimplePIR provide practical constructions.

Architecture impact: Replace the fetch / fetchKeyPackage RPCs with PIR queries. This is a significant performance trade-off: PIR has high computational cost on the server. It is therefore better suited to KeyPackage fetch (small database) as a first step, with message fetch (large database) to follow only if the cost proves acceptable.
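
To illustrate the idea of fetching a record without revealing its index, here is the classic two-server XOR construction. It is deliberately not SealPIR or SimplePIR, which are single-server lattice-based schemes; this toy variant is information-theoretic and only private if the two servers do not collude. All names are hypothetical, records are fixed at 4 bytes for brevity, and the caller is assumed to supply a uniformly random mask from a cryptographic RNG.

```rust
const RECORD_LEN: usize = 4;
type Record = [u8; RECORD_LEN];

/// XOR `rec` into `acc` byte by byte.
fn xor_into(acc: &mut Record, rec: &Record) {
    for (a, b) in acc.iter_mut().zip(rec.iter()) {
        *a ^= *b;
    }
}

/// Client: turn a uniformly random mask into two queries that differ
/// only at `index`. Each query on its own is a uniformly random mask.
fn make_queries(random_mask: &[bool], index: usize) -> (Vec<bool>, Vec<bool>) {
    let q1 = random_mask.to_vec();
    let mut q2 = q1.clone();
    q2[index] = !q2[index];
    (q1, q2)
}

/// Server: XOR together every record selected by the query mask.
fn answer(db: &[Record], query: &[bool]) -> Record {
    let mut acc = [0u8; RECORD_LEN];
    for (rec, &selected) in db.iter().zip(query) {
        if selected {
            xor_into(&mut acc, rec);
        }
    }
    acc
}

/// Client: the two answers differ by exactly the wanted record.
fn reconstruct(a1: Record, a2: Record) -> Record {
    let mut out = a1;
    xor_into(&mut out, &a2);
    out
}
```

Each server touches, on average, half the database per query, which shows in miniature why PIR cost scales with database size and why the small KeyPackage store is the more realistic first target.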

Sealed Sender (Signal-style)

Problem: The server sees (sender, recipient, timestamp) metadata on every enqueued message. Even without reading content, this metadata reveals social graphs.

Solution: Sealed Sender encrypts the sender's identity inside the MLS ciphertext. The server routes by recipientKey only and cannot determine who sent the message.

Architecture impact: Modify the enqueue RPC to omit sender identity from the server-visible metadata. The sender identity is included only inside the MLS application message (encrypted).
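
A rough sketch of the resulting message shape: the server-visible envelope carries only the recipient key and opaque bytes, while the sender identity travels inside the encrypted payload. Every name here is hypothetical, the length-prefixed encoding is an assumption made for illustration, and the `encrypt` closure stands in for MLS application-message encryption.

```rust
/// What the server sees and stores: a routing key plus opaque bytes.
#[derive(Debug)]
struct SealedEnvelope {
    recipient_key: Vec<u8>,
    ciphertext: Vec<u8>,
}

/// Encode sender identity and body into one plaintext:
/// 2-byte big-endian sender length, sender bytes, then the body.
fn encode_inner(sender: &[u8], body: &[u8]) -> Vec<u8> {
    assert!(sender.len() <= u16::MAX as usize, "sender identity too long");
    let mut out = Vec::with_capacity(2 + sender.len() + body.len());
    out.extend_from_slice(&(sender.len() as u16).to_be_bytes());
    out.extend_from_slice(sender);
    out.extend_from_slice(body);
    out
}

/// Inverse of `encode_inner`, run by the recipient after decryption.
/// Returns `(sender, body)`, or `None` if the plaintext is malformed.
fn decode_inner(plain: &[u8]) -> Option<(&[u8], &[u8])> {
    let len = u16::from_be_bytes([*plain.first()?, *plain.get(1)?]) as usize;
    let rest = &plain[2..];
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}

/// Build the envelope; the sender identity never appears outside `ciphertext`.
fn seal(
    recipient_key: &[u8],
    sender: &[u8],
    body: &[u8],
    encrypt: impl Fn(&[u8]) -> Vec<u8>,
) -> SealedEnvelope {
    SealedEnvelope {
        recipient_key: recipient_key.to_vec(),
        ciphertext: encrypt(&encode_inner(sender, body)),
    }
}
```

Note that hiding the sender field does not hide the sender's network address or connection timing; those remain visible to the server unless combined with a transport-level measure.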

Key Transparency (IETF draft)

Problem: A compromised server could substitute public keys, performing a man-in-the-middle attack on MLS group formation.

Solution: A verifiable, append-only log of public key bindings (similar to Certificate Transparency for TLS). Clients verify that the server's response matches the log before trusting a fetched KeyPackage.

Architecture impact: Add a key transparency log (Merkle tree) alongside the Authentication Service. Clients verify inclusion proofs on every fetchKeyPackage response.

References: draft-ietf-keytrans-protocol
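
The client-side check reduces to recomputing the Merkle root from a leaf and its sibling hashes. The sketch below shows that walk for a perfect binary tree, with leaf and node hashes domain-separated by a prefix byte. It is not the draft-ietf-keytrans-protocol proof format: the function names are hypothetical, and `DefaultHasher` is a non-cryptographic stand-in used only to keep the example dependency-free; a real log would use SHA-256 or similar.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Leaf hash with a 0x00 prefix. Stand-in hash, NOT cryptographic.
fn leaf_hash(binding: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u8(0x00);
    h.write(binding);
    h.finish()
}

/// Interior node hash with a 0x01 prefix. Stand-in hash, NOT cryptographic.
fn node_hash(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u8(0x01);
    h.write_u64(left);
    h.write_u64(right);
    h.finish()
}

/// Recompute the root from a leaf and its sibling path, bottom-up.
/// Assumes a perfect binary tree; `index` is the leaf position.
fn verify_inclusion(binding: &[u8], mut index: usize, proof: &[u64], root: u64) -> bool {
    let mut acc = leaf_hash(binding);
    for &sibling in proof {
        acc = if index % 2 == 0 {
            node_hash(acc, sibling) // we are the left child
        } else {
            node_hash(sibling, acc) // we are the right child
        };
        index /= 2;
    }
    acc == root
}
```

An inclusion proof alone only shows that a binding is in the tree the server presented. Clients also need assurance that everyone sees the same root, which is the consistency part of a transparency design.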


Identity and Authentication

DIDs (Decentralized Identifiers)

Problem: User identities are currently bound to the server. If the server goes away, identities are lost.

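Solution: Decentralized Identifiers (did:key, did:web) provide self-sovereign identity. With did:key, the DID is derived directly from the user's Ed25519 public key and is therefore portable across servers; did:web instead anchors the identity to a domain the user controls.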

Architecture impact: Replace raw Ed25519 public keys in MLS credentials with DID URIs. The server resolves DIDs to public keys for routing.

Crates: did-key, ssi
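
For did:key, the derivation is mechanical: prepend the multicodec prefix for an Ed25519 public key (bytes `0xed 0x01`), encode with base58btc, and prefix the result with the multibase marker `z`. The sketch below does this with the standard library only and does not use the did-key or ssi crates; `base58btc` and `did_key_ed25519` are hypothetical helper names.

```rust
const BASE58_ALPHABET: &[u8; 58] =
    b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Base58 (Bitcoin alphabet) encoding of arbitrary bytes.
fn base58btc(input: &[u8]) -> String {
    // Base-58 digits of the big-endian number, least significant first.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in input {
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }
    // Each leading zero byte is represented by a leading '1'.
    let zeros = input.iter().take_while(|&&b| b == 0).count();
    let mut out = String::with_capacity(zeros + digits.len());
    for _ in 0..zeros {
        out.push('1');
    }
    for &d in digits.iter().rev() {
        out.push(BASE58_ALPHABET[d as usize] as char);
    }
    out
}

/// Build a did:key identifier from a raw Ed25519 public key.
fn did_key_ed25519(public_key: &[u8; 32]) -> String {
    let mut bytes = Vec::with_capacity(34);
    bytes.extend_from_slice(&[0xed, 0x01]); // multicodec prefix: ed25519-pub
    bytes.extend_from_slice(public_key);
    format!("did:key:z{}", base58btc(&bytes)) // 'z' marks base58btc
}
```

Because the identifier is a pure function of the public key, any server can recompute and check it without a registry lookup, which is what makes this DID method portable.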

OPAQUE (aPAKE)

Problem: If quicproquo adds password-based account registration, the server must never see the password -- not even a hash.

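Solution: OPAQUE is an asymmetric (augmented) password-authenticated key exchange in which the server stores only a one-way transformation of the password and never sees the password itself, not even during registration. An attacker who steals the server's records can still mount an offline dictionary attack, but only after the compromise and separately for each user; precomputed tables do not help.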

Architecture impact: Replace the registration/login flow with OPAQUE. The server stores an OPAQUE registration record; the client runs the OPAQUE protocol to authenticate and derive a session key.

Crates: opaque-ke

References: RFC 9807 (OPAQUE), RFC 9497 (OPRF, the building block OPAQUE relies on)

WebAuthn / Passkeys

Problem: Password-based auth (even with OPAQUE) is vulnerable to phishing. Hardware-backed authentication provides stronger device binding.

Solution: WebAuthn / Passkeys allow authentication via hardware tokens (YubiKey), platform authenticators (Touch ID, Windows Hello), or synced passkeys.

Architecture impact: Add a WebAuthn registration/authentication flow to the account system. Requires a server-side WebAuthn relying party implementation.

Crates: webauthn-rs

Verifiable Credentials (W3C VC)

Problem: Users may need to prove attributes (organization membership, role, age) without revealing their full identity.

Solution: Verifiable Credentials allow a user to present cryptographic proofs of attributes issued by a trusted authority.

Architecture impact: Extend MLS credentials with VC presentation. A group admin could require proof of organization membership before allowing join.


Application Layer

Matrix-style Federation

Problem: A single server is a single point of failure and a single point of trust. Users on different servers cannot communicate.

Solution: Federation allows multiple quicproquo servers to exchange messages, similar to Matrix homeserver federation. Each server manages its own users and relays messages to peer servers.

Architecture impact: Major. Requires server-to-server protocol, distributed identity resolution, and cross-server MLS group management.

WASM Plugin System

Problem: Extensibility (bots, bridges, custom message types) currently requires forking the codebase.

Solution: A sandboxed WASM plugin system allows third-party extensions to run inside the client or server without access to private key material.

Architecture impact: Define a plugin API (message hooks, command handlers). Plugins compiled to WASM and loaded at runtime via wasmtime or wasmer.

Crates: wasmtime, wasmer, extism
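
One possible shape for the host side of the message-hook API is sketched below. The trait and type names are hypothetical, and the two sample plugins are native Rust purely for illustration; in the actual design each `Plugin` would be a thin wrapper that calls into a sandboxed WASM instance. Hooks receive only message text, never key material.

```rust
/// What a hook asks the host to do with a message.
#[derive(Debug, PartialEq)]
enum HookResult {
    /// Leave the message unchanged.
    Pass,
    /// Substitute new text, which later hooks will see.
    Replace(String),
    /// Stop processing and discard the message.
    Discard,
}

/// Host-side view of a plugin.
trait Plugin {
    fn on_message(&mut self, plaintext: &str) -> HookResult;
}

/// Run hooks in order. Returns `None` if any hook discards the message.
fn run_hooks(plugins: &mut [Box<dyn Plugin>], message: &str) -> Option<String> {
    let mut current = message.to_string();
    for plugin in plugins.iter_mut() {
        match plugin.on_message(&current) {
            HookResult::Pass => {}
            HookResult::Replace(new) => current = new,
            HookResult::Discard => return None,
        }
    }
    Some(current)
}

/// Sample plugin: discards messages containing the word "spam".
struct SpamFilter;
impl Plugin for SpamFilter {
    fn on_message(&mut self, plaintext: &str) -> HookResult {
        if plaintext.contains("spam") { HookResult::Discard } else { HookResult::Pass }
    }
}

/// Sample plugin: upper-cases every message.
struct Shout;
impl Plugin for Shout {
    fn on_message(&mut self, plaintext: &str) -> HookResult {
        HookResult::Replace(plaintext.to_uppercase())
    }
}
```

Hook order matters: a filter placed after a transforming plugin sees the transformed text, so the host would need a documented ordering rule.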

Double-Ratchet DM Layer

Problem: MLS is optimised for groups. For 1:1 conversations, the Signal protocol (X3DH key agreement followed by the Double Ratchet, formerly known as Axolotl) has better performance characteristics because it carries no ratchet-tree overhead for two parties.

Solution: Implement a double-ratchet layer for 1:1 DMs, using MLS only for groups with N > 2. The 1:1 Channel Design currently uses MLS for DMs; this would be an optimisation.

References: The Double Ratchet Algorithm, X3DH Key Agreement Protocol


Observability and Operations

OpenTelemetry (Tracing + Metrics)

Problem: The current logging is tracing-based but lacks distributed tracing context and structured metrics export.

Solution: OpenTelemetry provides a unified framework for distributed tracing, metrics, and log correlation. OTLP export enables integration with any observability backend.

Architecture impact: Add tracing-opentelemetry and opentelemetry-otlp to the server. Instrument RPC handlers with spans. Export to Jaeger, Grafana Tempo, or any OTLP-compatible backend.

Crates: opentelemetry, opentelemetry-otlp, tracing-opentelemetry

Prometheus + Grafana

Problem: No quantitative visibility into server performance (throughput, latency, queue depth, epoch advancement rate).

Solution: Export Prometheus metrics from the server. Visualise with Grafana dashboards.

Metrics to export: message throughput (enqueue/fetch per second), RPC latency histograms, MLS epoch advancement rate, delivery queue depth, KeyPackage store size, active connections.

Crates: prometheus, metrics, metrics-exporter-prometheus
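
For orientation, the snippet below hand-rolls two of the listed metrics and renders them in the Prometheus text exposition format. It is a dependency-free sketch, not the API of the prometheus or metrics crates, which a real server would use instead. The struct, the method names, and the `qpq_*` metric names are all hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// A counter and a gauge, updated lock-free from RPC handlers.
#[derive(Default)]
struct ServerMetrics {
    enqueued_total: AtomicU64, // counter: only ever increases
    queue_depth: AtomicU64,    // gauge: goes up and down
}

impl ServerMetrics {
    /// Called when a message is enqueued for delivery.
    fn record_enqueue(&self) {
        self.enqueued_total.fetch_add(1, Ordering::Relaxed);
        self.queue_depth.fetch_add(1, Ordering::Relaxed);
    }

    /// Called when a message is fetched; the gauge never drops below zero.
    fn record_fetch(&self) {
        let _ = self.queue_depth.fetch_update(
            Ordering::Relaxed,
            Ordering::Relaxed,
            |depth| Some(depth.saturating_sub(1)),
        );
    }

    /// Render the current values in Prometheus text exposition format.
    fn render(&self) -> String {
        format!(
            "# TYPE qpq_messages_enqueued_total counter\n\
             qpq_messages_enqueued_total {}\n\
             # TYPE qpq_delivery_queue_depth gauge\n\
             qpq_delivery_queue_depth {}\n",
            self.enqueued_total.load(Ordering::Relaxed),
            self.queue_depth.load(Ordering::Relaxed),
        )
    }
}
```

Throughput per second is not stored directly; Prometheus derives rates from the monotonically increasing counter at query time, which is why the counter is never reset.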

Testcontainers-rs

Problem: Integration tests currently run server and client in the same process (tokio::spawn). This does not test real network conditions, container startup, or multi-process interactions.

Solution: Testcontainers-rs runs Docker containers from Rust tests, enabling true end-to-end CI with real network boundaries.

Architecture impact: Add testcontainers-based integration tests alongside the existing in-process tests. The Docker image is already maintained.

Crates: testcontainers, testcontainers-modules


Developer Experience

Tauri / Dioxus (Native GUI)

Problem: The current interface is CLI-only. A graphical client would broaden the user base for testing and demonstration.

Solution: Tauri or Dioxus provide native cross-platform GUI frameworks in Rust. The quicproquo-core crate can be shared directly with the GUI client.

Architecture impact: Add a quicproquo-gui crate that depends on quicproquo-core and quicproquo-proto. The GUI drives the same GroupMember and RPC logic as the CLI client.

Crates: tauri, dioxus

uniffi / diplomat (Mobile FFI)

Problem: Mobile clients (iOS, Android) cannot use the Rust binary directly.

Solution: uniffi (Mozilla) and diplomat generate idiomatic Swift and Kotlin bindings from Rust definitions.

Architecture impact: Expose quicproquo-core through a C-compatible FFI layer. Mobile apps call into the Rust crypto and protocol logic.

Crates: uniffi, diplomat

Nix Flakes

Problem: The development environment requires capnp (Cap'n Proto compiler), a specific Rust toolchain version, and test infrastructure. Setup varies across developer machines.

Solution: Nix flakes provide a reproducible, declarative development environment. A single nix develop command sets up the toolchain, capnp, and all dependencies.

Architecture impact: Add flake.nix and flake.lock to the repository root.


Top 5 Priority Implementations

The following table ranks the most impactful technologies for near-term adoption, considering the current state of the codebase and the milestone plan.

| Priority | Technology | Why | Unlocks |
|---|---|---|---|
| 1 | Post-quantum hybrid KEM | ml-kem is already vendored in the workspace. Completing the hybrid OpenMlsCryptoProvider makes quicproquo one of the first PQ MLS implementations. | M7 |
| 2 | SQLCipher persistence | Encrypted-at-rest storage is the prerequisite for multi-device support, offline usage, and server restart survival. | M6 |
| 3 | OPAQUE auth | Zero-knowledge password authentication is a massive security uplift for the account system. The server never sees or stores passwords. | Phase 3 (authz) |
| 4 | iroh / LibP2P | NAT traversal and optional P2P mesh makes quicproquo deployable without centralised infrastructure. Aligns with the existing QUIC transport. | Beyond M7 |
| 5 | Sealed Sender + PIR | Content encryption is table stakes. Metadata resistance (hiding who talks to whom) is the frontier of private messaging research. | Beyond M7 |

Cross-references