# Mesh Protocol Gaps — Honest Assessment & Action Plan > **Goal:** Identify real weaknesses in QuicProChat's mesh protocol compared to > Reticulum, Meshtastic, and LXMF. Plan concrete improvements. > > Created: 2026-03-30 --- ## Executive Summary QuicProChat has strong cryptography (MLS, PQ-KEM) but **real gaps** in the mesh layer: | Gap | Severity | Status | |-----|----------|--------| | MLS overhead too large for LoRa | **Critical** | **MEASURED** — classical MLS viable! | | No lightweight messaging mode | **High** | **DONE** — MLS-Lite implemented | | KeyPackage distribution over mesh | **High** | **DONE** — announce-based with cache | | Transport capability negotiation | **High** | **DONE** — auto-selects crypto mode | | Announce/routing not battle-tested | **Medium** | S3-S4 done, needs real-world test | | No DTN bundle protocol integration | **Medium** | Priority field added | | Battery/duty-cycle optimization | **Medium** | Basic tracker exists | --- ## Gap 1: MLS Overhead is Prohibitive for Constrained Links ### The Problem **MLS was designed for Internet messaging, not LoRa.** ### Actual Measured Sizes (2026-03-30) | Component | Size (bytes) | LoRa SF12 fragments | At 1% duty | |-----------|--------------|---------------------|------------| | **MLS KeyPackage** | 306 | 6 | ~4 sec | | **MLS Welcome** | 840 | 17 | ~10 sec | | **MLS Commit (add)** | 736 | 15 | ~9 sec | | **MLS AppMessage (5B)** | 143 | 3 | ~2 sec | | **MLS Commit (update)** | 544 | 11 | ~7 sec | | **MLS KeyPackage (PQ)** | 2,676 | 53 | ~32 sec | | **MLS Welcome (PQ)** | 5,504 | 108 | ~65 sec | | **MeshEnvelope V1 (CBOR)** | 410 | 9 | ~5 sec | | **MeshEnvelope V2 (truncated)** | 336 | 7 | ~4 sec | | **MLS-Lite (no sig)** | 129 | 3 | ~2 sec | | **MLS-Lite (with sig)** | 262 | 6 | ~4 sec | | Reticulum LXMF | ~100-150 | 2-3 | ~1-2 sec | | Meshtastic max | 237 | 5 | ~3 sec | **Key insights:** - Classical MLS is **viable** for LoRa — 6 fragments for KeyPackage - Post-quantum hybrid MLS is **prohibitive** — 53+ fragments for KeyPackage - MLS-Lite matches Meshtastic efficiency while adding proper auth - **Total group setup** (KeyPackage + Welcome): ~23 fragments, ~14 sec **The math NOW works for classical MLS on LoRa:** - LoRa SF12/BW125: ~51 byte MTU, ~300 bps effective - EU868 duty cycle: 1% = 36 seconds TX per hour - **One MLS KeyPackage = 6 fragments = 4 sec = acceptable** - **Group setup = 14 sec = half duty budget, but feasible** **Post-quantum is still problematic for constrained links.** ### Current State (Updated 2026-03-30) - ✅ MeshEnvelope V1 uses CBOR, ~410 bytes for empty payload - ✅ MeshEnvelope V2 uses truncated 16-byte addresses, ~336 bytes (~18% savings) - ✅ MLS-Lite implemented: ~129 bytes without signature, ~262 with - ✅ Classical MLS KeyPackage measured at 306 bytes (much better than expected) - ⚠️ PQ-hybrid MLS still large (2.6KB KeyPackage) ### Proposed Solutions #### Option A: Hybrid Crypto Modes (Recommended) ``` ┌─────────────────────────────────────────────────────────────────┐ │ Mode Selection Based on Transport Capability │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ QUIC/TCP/WiFi (>10 kbps): │ │ → Full MLS groups with PQ-KEM │ │ → KeyPackage distribution via server │ │ → Standard protocol │ │ │ │ LoRa/Serial (<1 kbps): │ │ → "MLS-Lite" mode: │ │ • Pre-shared group epoch key (exchanged out-of-band) │ │ • ChaCha20-Poly1305 symmetric encryption │ │ • Ed25519 signatures (64 bytes) │ │ • No per-message KeyPackage exchange │ │ • Manual key rotation via QR code or faster link │ │ │ │ Upgrade path: │ │ When faster transport available → full MLS epoch sync │ │ │ └─────────────────────────────────────────────────────────────────┘ ``` **Trade-off:** Lose automatic PCS on constrained links. Gain usability. #### Option B: Compressed MLS (Research) - Strip unused extensions from KeyPackages - Use shorter credential identifiers (16 bytes instead of 32) - Batch multiple KeyPackages into single transfer over fast link - Cache and reuse KeyPackages more aggressively **Trade-off:** Still large. May not be enough for SF12 LoRa. #### Option C: LXMF-Compatible Mode Implement Reticulum's LXMF format as an alternative wire format: ```rust pub struct LxmfMessage { destination: [u8; 16], // Truncated hash source: [u8; 16], signature: [u8; 64], // Ed25519 payload: Vec, // msgpack: {timestamp, content, title, fields} } // Total: ~100-150 bytes for short message ``` **Trade-off:** Lose MLS group properties. Gain Reticulum interop and efficiency. ### Action Items - [x] **Measure actual MLS sizes** — done, see table above - [x] **Design MLS-Lite spec** — `docs/plans/mls-lite-design.md` - [x] **Implement MLS-Lite** — `crates/quicprochat-p2p/src/mls_lite.rs` - [x] **Implement MeshEnvelope V2** — truncated addresses, priority field - [ ] **Implement transport capability negotiation** in TransportManager - [ ] **Test MLS-Lite vs full MLS on real LoRa** --- ## Gap 2: KeyPackage Distribution Over Mesh ### The Problem MLS requires pre-positioned KeyPackages for adding members to groups. On Internet: server stores KeyPackages, clients fetch on demand. On mesh: **no server**. Current flow (broken for pure mesh): ``` Alice wants to add Bob to group: 1. Alice fetches Bob's KeyPackage from server ← requires Internet 2. Alice creates Welcome + Commit 3. Alice sends to Bob via mesh ``` ### Proposed Solution: Announce-Based KeyPackage Distribution ``` Bob announces on mesh: 1. MeshAnnounce includes: identity_key, capabilities, AND current_keypackage_hash 2. Nearby nodes cache Bob's latest KeyPackage (if they have it) 3. Alice receives Bob's announce, requests KeyPackage via mesh RPC KeyPackage propagation: 1. Bob periodically broadcasts KeyPackage update (larger message, less frequent) 2. Nodes with capacity (CAP_STORE) cache KeyPackages for relaying 3. TTL-based expiry (KeyPackages are single-use, but we can cache N of them) ``` ### Action Items - [x] **Extend MeshAnnounce** with optional `keypackage_hash` field — 8-byte truncated hash - [x] **Add KeyPackage request/response** to mesh protocol — `mesh_protocol.rs` - [x] **Implement KeyPackage cache** — `keypackage_cache.rs` (separate from MeshStore) - [ ] **Design KeyPackage refresh protocol** for mesh-only scenarios - [x] **Add transport capability negotiation** — `transport.rs` TransportCapability enum - [x] **Add MLS-Lite upgrade path** — `crypto_negotiation.rs` --- ## Gap 3: No DTN/Bundle Protocol Integration ### The Problem NASA/IETF Bundle Protocol (RFC 9171) is the standard for delay-tolerant networking. Reticulum effectively reinvented it. QuicProChat should learn from both. Key DTN concepts we're missing: | Concept | DTN/BPv7 | Reticulum | QuicProChat | |---------|----------|-----------|-------------| | **Custody transfer** | Yes | No | No | | **Fragmentation at bundle layer** | Yes | No | Yes (LoRa transport) | | **Convergence layer adapters** | Formal spec | Interfaces | MeshTransport trait | | **Routing protocols** | CGR, EPIDEMIC | Announce-based | Announce-based | | **Priority scheduling** | Yes | No | No | ### Proposed Improvements 1. **Priority levels in MeshEnvelope** (emergency > data > announce) 2. **Custody transfer option** — intermediate node takes responsibility 3. **Better congestion control** — backpressure signals in announce ### Action Items - [ ] **Add priority field** to MeshEnvelope - [ ] **Research custody transfer** — is it worth the complexity? - [ ] **Implement priority queue** in MeshStore and DutyCycleTracker --- ## Gap 4: Battery/Duty-Cycle Optimization ### The Problem Briar drains 4x battery due to constant BT scanning. We claim to be better but haven't proven it. Current state: - DutyCycleTracker enforces EU868 1% limit - Announce interval is configurable (default 10 min) - No adaptive power management ### Proposed Improvements 1. **Adaptive announce interval** — more frequent when activity, less when idle 2. **Listen-before-talk** — don't TX if channel is busy (LoRa CAD) 3. **Scheduled wake windows** — coordinate with peers for efficient sync 4. **Power profiles** — "always-on", "hourly-sync", "manual-only" ### Action Items - [ ] **Implement CAD (Channel Activity Detection)** in LoRaTransport - [ ] **Add power profile config** to P2pNode - [ ] **Measure actual power consumption** with real hardware --- ## Gap 5: Real-World Testing ### The Problem All our mesh code runs against mocks. We claim LoRa support but haven't tested with real radios. ### Testing Plan | Test | Hardware | Status | |------|----------|--------| | LoRa point-to-point | 2x SX1262 dev boards | Not started | | LoRa multi-hop | 3x SX1262, different rooms | Not started | | Mixed transport | LoRa + WiFi relay | Not started | | Outdoor range test | LoRa, line-of-sight 1km | Not started | | Duty cycle compliance | SDR spectrum analyzer | Not started | ### Action Items - [ ] **Procure hardware** — 3x Heltec LoRa32 or similar - [ ] **Implement UART LoRaTransport** for real modems - [ ] **Create test harness** for automated multi-node testing - [ ] **Document actual performance** numbers --- ## Gap 6: Comparison Claims Need Verification ### The Problem Our positioning doc claims superiority over Meshtastic/Reticulum/Briar, but: - We haven't measured our actual overhead vs. theirs - We haven't tested interop scenarios - We haven't run security analysis against their threat models ### Verification Plan | Claim | How to Verify | |-------|---------------| | "MLS is better than shared-key AES" | Threat model comparison doc | | "Multi-hop works" | Integration test with 5+ nodes | | "LoRa-ready" | Actual LoRa hardware test | | "Post-quantum protects groups" | Verify hybrid KEM in MLS path | | "Relay nodes can't read content" | Formal verification of E2E path | ### Action Items - [ ] **Create benchmark suite** comparing message sizes - [ ] **Write threat model comparison** doc (Meshtastic CVEs, Reticulum link-level) - [ ] **Fuzz test** mesh envelope parsing - [ ] **Get external review** of mesh crypto design --- ## Implementation Priority ### Phase 1: Make It Work (Next 2 Sprints) 1. **S4: Multi-hop routing** — complete the core mesh functionality 2. **S5: Truncated addresses** — reduce envelope overhead 3. **Measure actual sizes** — know the real numbers ### Phase 2: Make It Efficient (Following 2 Sprints) 4. **Design MLS-Lite** — spec for constrained links 5. **Priority queue** — emergency messages first 6. **Hardware testing** — real LoRa validation ### Phase 3: Make It Production-Ready 7. **KeyPackage distribution** — mesh-native key exchange 8. **Power profiles** — battery optimization 9. **External review** — security audit of mesh layer --- ## Success Metrics | Metric | Previous | Current | Target | |--------|----------|---------|--------| | MeshEnvelope overhead (empty) | ~410 bytes | ~336 (V2) | ✅ Done | | MLS-Lite message (no sig) | N/A | ~129 bytes | ✅ Done | | Time to send "hello" over SF12 LoRa | ~27 sec | ~4 sec (MLS-Lite) | ✅ Done | | KeyPackage exchange over mesh | Not possible | Pending | Works | | Multi-hop message delivery | Mock only | Code complete | Real hardware | | Battery life (mesh mode) | Unknown | Unknown | Measured | --- ## Honest Assessment **What we do well:** - MLS group crypto is genuinely better than Meshtastic/Reticulum - Transport abstraction is clean - Announce protocol is solid - **NEW: Classical MLS KeyPackage (306B) is actually LoRa-viable** - **NEW: MLS-Lite provides Meshtastic-level efficiency with real auth** **What we still need to fix:** - No solution for KeyPackage distribution without server - No real-world testing with actual LoRa hardware - Post-quantum hybrid mode too large for constrained links **What we can now claim:** - "MLS on LoRa" — YES, classical MLS works with ~14 sec group setup - "MLS-Lite for constrained" — YES, ~2-4 sec messages with auth - "Post-quantum on LoRa" — NO, hybrid mode is impractical (2.6KB KeyPackage) - "Production-ready" — NO, still research-stage, pending hardware tests --- *Last updated: 2026-03-30*