# ADR-004: MLS-Unaware Delivery Service

**Status:** Accepted

---

## Context

The Delivery Service (DS) is the server-side component that stores and forwards messages between clients. A fundamental design question is: **should the DS understand MLS messages?**

An MLS-aware DS could inspect message types and perform optimizations:

- **Fan-out:** When a client sends a Commit or Application message intended for all group members, an MLS-aware DS could parse the group membership and deliver the message to all members automatically, instead of requiring the client to enqueue separately for each recipient.
- **Membership validation:** An MLS-aware DS could verify that a sender is actually a member of the group before accepting a message, preventing spam from non-members.
- **Epoch filtering:** An MLS-aware DS could reject messages from stale epochs, reducing the processing burden on recipients.
- **Tree optimization:** An MLS-aware DS could cache the ratchet tree and assist with tree synchronization.

However, an MLS-aware DS would also:

- Have access to MLS message metadata (group IDs, epoch numbers, sender positions in the tree).
- Require an MLS library dependency on the server.
- Be more complex to implement, test, and audit.
- Potentially violate the MLS architecture's trust model.

### What RFC 9420 says

RFC 9420 Section 4 defines the DS as a component that:

> "is responsible for ordering handshake messages and delivering them to each client."

Critically, the RFC specifies that the DS **does not have access to group keys** and treats message content as opaque. The DS's role is limited to:

1. Ordering: ensuring that handshake messages (Commits) are applied in a consistent order across all group members.
2. Delivery: routing messages to the correct recipients.
3. Optional: enforcing access control (e.g., only group members can send to the group).

The RFC explicitly envisions that the DS operates on opaque blobs, not on decrypted MLS content.
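
To make "operates on opaque blobs" concrete, the following is a minimal sketch of a store-and-forward service that never looks inside a payload. It is an illustration under stated assumptions, not the quicproquo server code: the type `OpaqueDs`, the key shape `(recipient_key, channel_id)`, and the drain-on-fetch behaviour are all hypothetical choices made for this example.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical queue key: the only values this sketch routes on.
type QueueKey = (Vec<u8>, Vec<u8>); // (recipient_key, channel_id)

/// A store-and-forward service that treats every payload as opaque bytes.
#[derive(Default)]
struct OpaqueDs {
    queues: HashMap<QueueKey, VecDeque<Vec<u8>>>,
}

impl OpaqueDs {
    /// Append a payload to the recipient's queue without parsing it.
    fn enqueue(&mut self, recipient_key: &[u8], channel_id: &[u8], payload: Vec<u8>) {
        self.queues
            .entry((recipient_key.to_vec(), channel_id.to_vec()))
            .or_default()
            .push_back(payload);
    }

    /// Remove and return every queued payload for this key, oldest first.
    /// Draining on fetch is an assumption of this sketch.
    fn fetch(&mut self, recipient_key: &[u8], channel_id: &[u8]) -> Vec<Vec<u8>> {
        self.queues
            .remove(&(recipient_key.to_vec(), channel_id.to_vec()))
            .map(|queue| queue.into_iter().collect())
            .unwrap_or_default()
    }
}
```

Nothing in this sketch depends on what the bytes contain, so the same code would carry a Welcome, a Commit, or any non-MLS payload.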

---

## Decision

The quicproquo Delivery Service is **MLS-unaware**. It routes opaque byte strings by `(recipientKey, channelId)` without parsing, inspecting, or validating any MLS content.

### What the DS sees

```text
DS perspective:
  enqueue(recipientKey=0x1234..., payload=<opaque bytes>, channelId=<channel ID>, version=1)
  fetch(recipientKey=0x1234..., channelId=<channel ID>, version=1) -> [<opaque bytes>, ...]

DS does NOT see:
  - Whether the payload is a Welcome, Commit, or Application message
  - The MLS group ID or epoch number
  - The sender's position in the ratchet tree
  - Any plaintext content
```

### Routing responsibility

Because the DS does not parse MLS messages, the **client** is responsible for routing:

| MLS Operation | Client's Routing Responsibility |
|---|---|
| `add_members()` | Enqueue the Welcome message to the new member's `recipientKey`. Enqueue the Commit to each existing member's `recipientKey`. |
| `remove_members()` | Enqueue the Commit to each remaining member's `recipientKey`. |
| `create_message()` | Enqueue the Application message to each group member's `recipientKey`. |
| `self_update()` | Enqueue the Commit to each other member's `recipientKey`. |

This means that sending a message to a group of n members requires n-1 enqueue calls (one per recipient, excluding the sender). The client must maintain its own copy of the group membership list.

---

## Consequences

### Benefits

- **Correct MLS architecture.** The DS does not hold group keys or inspect group state, which is the architecture recommended by RFC 9420 Section 4. A compromised DS learns nothing about message content or group structure beyond the routing metadata (recipient keys and channel IDs).
- **Audit-friendly.** The DS's audit log is a simple append-only sequence of `(timestamp, recipientKey, channelId, payload_hash)` entries. There is no complex state machine to audit. The server's behavior is trivially verifiable: it accepts blobs and returns them in FIFO order.
- **No MLS dependency on the server.** The server does not depend on `openmls` or any MLS library. This reduces the server's attack surface, compile time, and binary size. It also means the server is completely decoupled from MLS version upgrades.
- **Simplicity.** The DS is a hash map of FIFO queues. The entire implementation fits in a few hundred lines of Rust. There are no edge cases around epoch transitions, tree synchronization, or membership conflicts.
- **Protocol agnosticism.** The DS can carry any payload, not just MLS messages. Future protocol extensions (e.g., signaling for voice/video, file transfer metadata) can reuse the same delivery infrastructure without modification.

### Costs and trade-offs

- **No server-side fan-out.** The client must enqueue separately for each recipient. For a group of n members, this means n-1 enqueue calls per message, compared to 1 call if the DS could fan out. Because the same payload is re-sent with every call, client upload bandwidth per message grows by a factor of approximately n-1.
- **No server-side membership validation.** The DS cannot verify that a sender is a member of the group. A malicious client could enqueue messages to any recipient key, potentially causing the recipient to process (and reject) invalid MLS messages. This is mitigated by MLS's own authentication: invalid messages are rejected during MLS processing.
- **No server-side ordering guarantees.** RFC 9420 envisions the DS providing a consistent ordering of handshake messages. The current DS provides FIFO ordering per `(recipientKey, channelId)` queue, but it does not provide global ordering across all group members. In practice, MLS handles out-of-order delivery gracefully (Commits include the epoch number, and clients can buffer messages for future epochs).
- **Client complexity.** The client must maintain the group membership list and perform per-recipient routing. This is additional state that the client must manage correctly.
  An incorrect membership list results in some members not receiving messages.

### Residual risks

- **Metadata exposure.** While the DS does not see message content, it does see routing metadata: which recipient keys receive messages, when, and on which channels. This metadata can reveal communication patterns. Mitigation: use channel IDs that are not correlated with real-world identifiers, and consider padding to hide message sizes.
- **Denial of service.** Because the DS does not validate senders, a malicious client could flood a recipient's queue with garbage payloads. Mitigation: rate limiting (planned for a future milestone) and the `Auth` struct for sender identification.

---

## Code references

| File | Relevance |
|---|---|
| `schemas/delivery.capnp` | DeliveryService RPC interface (opaque `Data` payloads) |
| `schemas/node.capnp` | NodeService: `enqueue`, `fetch`, `fetchWait` methods |
| `crates/quicproquo-server/src/storage.rs` | Server-side queue storage (DashMap-based FIFO queues) |
| `crates/quicproquo-server/src/main.rs` | NodeService RPC handler implementation |

---

## Further reading

- [Design Decisions Overview](overview.md) -- index of all ADRs
- [Delivery Schema](../wire-format/delivery-schema.md) -- the DS RPC interface definition
- [NodeService Schema](../wire-format/node-service-schema.md) -- the unified interface that includes DS methods
- [ADR-005: Single-Use KeyPackages](adr-005-single-use-keypackages.md) -- related AS design decision
- [Architecture Overview](../architecture/overview.md) -- system-level view showing DS in context
- [Why This Design, Not Signal/Matrix/...](why-not-signal.md) -- broader protocol comparison
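
---

## Illustrative sketch: client-side fan-out

The per-recipient routing described under "Routing responsibility" can be sketched as a loop over the client's membership list. This is a rough illustration rather than the actual client code: the `Delivery` trait, the `fan_out` function, and the `Recorder` type are hypothetical names introduced here, and the payload is assumed to be an already-encrypted MLS message.

```rust
/// Hypothetical stand-in for the client's view of the DS enqueue call.
trait Delivery {
    fn enqueue(&mut self, recipient_key: &[u8], channel_id: &[u8], payload: &[u8]);
}

/// Enqueue one opaque payload for every member except the sender.
/// Returns the number of enqueue calls made, which is n-1 when the
/// sender is one of the n entries in `members`.
fn fan_out<D: Delivery>(
    ds: &mut D,
    members: &[Vec<u8>],
    own_key: &[u8],
    channel_id: &[u8],
    payload: &[u8],
) -> usize {
    let mut calls = 0;
    for member in members.iter().filter(|m| m.as_slice() != own_key) {
        ds.enqueue(member, channel_id, payload);
        calls += 1;
    }
    calls
}

/// In-memory recorder used only to show the call pattern.
#[derive(Default)]
struct Recorder {
    sent: Vec<(Vec<u8>, Vec<u8>)>, // (recipient_key, payload)
}

impl Delivery for Recorder {
    fn enqueue(&mut self, recipient_key: &[u8], _channel_id: &[u8], payload: &[u8]) {
        self.sent.push((recipient_key.to_vec(), payload.to_vec()));
    }
}
```

With a three-member list that includes the sender, `fan_out` makes two calls, each carrying the same payload bytes. A member missing from `members` is silently skipped, which is the failure mode noted under "Client complexity".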