Completed in this session:
- KeyPackage distribution over mesh (announce-based)
- Transport capability negotiation
- MLS-Lite to full MLS upgrade path

Updated mesh-protocol-gaps.md to reflect completed items.
Mesh Protocol Gaps — Honest Assessment & Action Plan
Goal: Identify real weaknesses in QuicProChat's mesh protocol compared to Reticulum, Meshtastic, and LXMF. Plan concrete improvements.
Created: 2026-03-30
Executive Summary
QuicProChat has strong cryptography (MLS, PQ-KEM) but real gaps in the mesh layer:
| Gap | Severity | Status |
|---|---|---|
| MLS overhead too large for LoRa | Critical | MEASURED — classical MLS viable! |
| No lightweight messaging mode | High | DONE — MLS-Lite implemented |
| KeyPackage distribution over mesh | High | DONE — announce-based with cache |
| Transport capability negotiation | High | DONE — auto-selects crypto mode |
| Announce/routing not battle-tested | Medium | S3-S4 done, needs real-world test |
| No DTN bundle protocol integration | Medium | Priority field added |
| Battery/duty-cycle optimization | Medium | Basic tracker exists |
Gap 1: MLS Overhead is Prohibitive for Constrained Links
The Problem
MLS was designed for Internet messaging, not LoRa.
Actual Measured Sizes (2026-03-30)
| Component | Size (bytes) | LoRa SF12 fragments | At 1% duty |
|---|---|---|---|
| MLS KeyPackage | 306 | 6 | ~4 sec |
| MLS Welcome | 840 | 17 | ~10 sec |
| MLS Commit (add) | 736 | 15 | ~9 sec |
| MLS AppMessage (5B) | 143 | 3 | ~2 sec |
| MLS Commit (update) | 544 | 11 | ~7 sec |
| MLS KeyPackage (PQ) | 2,676 | 53 | ~32 sec |
| MLS Welcome (PQ) | 5,504 | 108 | ~65 sec |
| MeshEnvelope V1 (CBOR) | 410 | 9 | ~5 sec |
| MeshEnvelope V2 (truncated) | 336 | 7 | ~4 sec |
| MLS-Lite (no sig) | 129 | 3 | ~2 sec |
| MLS-Lite (with sig) | 262 | 6 | ~4 sec |
| Reticulum LXMF | ~100-150 | 2-3 | ~1-2 sec |
| Meshtastic max | 237 | 5 | ~3 sec |
Key insights:
- Classical MLS is viable for LoRa — 6 fragments for KeyPackage
- Post-quantum hybrid MLS is prohibitive — 53+ fragments for KeyPackage
- MLS-Lite matches Meshtastic efficiency while adding proper auth
- Total group setup (KeyPackage + Welcome): ~23 fragments, ~14 sec
The math NOW works for classical MLS on LoRa:
- LoRa SF12/BW125: ~51 byte MTU, ~300 bps effective
- EU868 duty cycle: 1% = 36 seconds TX per hour
- One MLS KeyPackage = 6 fragments = 4 sec = acceptable
- Group setup (KeyPackage + Welcome) = ~14 sec, about 39% of the 36-second hourly budget: heavy, but feasible
Post-quantum is still problematic for constrained links.
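The fragment counts in the table follow directly from the ~51-byte SF12 MTU, and the budget claims follow from the 1% EU868 duty cycle. Below is a minimal sketch of that arithmetic. The function names are hypothetical, and it assumes no per-fragment header, which is what the table's counts imply. It deliberately does not model per-fragment airtime.

```rust
/// ~51-byte LoRa payload per fragment at SF12/BW125, as stated above.
const LORA_SF12_MTU: usize = 51;
const SECS_PER_HOUR: f64 = 3600.0;

/// Fragments needed for `size` bytes (ceiling division).
/// Assumes no per-fragment header, matching the fragment column in the table.
fn fragments(size: usize, mtu: usize) -> usize {
    (size + mtu - 1) / mtu
}

/// Transmit seconds allowed per hour at a duty cycle (0.01 = 1%).
fn hourly_tx_budget(duty: f64) -> f64 {
    SECS_PER_HOUR * duty
}

/// Fraction of the hourly budget consumed by `tx_secs` of transmission.
fn budget_fraction(tx_secs: f64, duty: f64) -> f64 {
    tx_secs / hourly_tx_budget(duty)
}
```

For example, `fragments(306, 51) + fragments(840, 51)` gives the 23 fragments quoted for group setup. `budget_fraction(14.0, 0.01)` is about 0.39, which is where the "about 39% of the hourly budget" figure comes from.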
Current State (Updated 2026-03-30)
- ✅ MeshEnvelope V1 uses CBOR, ~410 bytes for empty payload
- ✅ MeshEnvelope V2 uses truncated 16-byte addresses, ~336 bytes (~18% savings)
- ✅ MLS-Lite implemented: ~129 bytes without signature, ~262 with
- ✅ Classical MLS KeyPackage measured at 306 bytes (much better than expected)
- ⚠️ PQ-hybrid MLS still large (2.6KB KeyPackage)
Proposed Solutions
Option A: Hybrid Crypto Modes (Recommended)
```
┌───────────────────────────────────────────────────────────────┐
│ Mode Selection Based on Transport Capability                  │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│ QUIC/TCP/WiFi (>10 kbps):                                     │
│   → Full MLS groups with PQ-KEM                               │
│   → KeyPackage distribution via server                        │
│   → Standard protocol                                         │
│                                                               │
│ LoRa/Serial (<1 kbps):                                        │
│   → "MLS-Lite" mode:                                          │
│       • Pre-shared group epoch key (exchanged out-of-band)    │
│       • ChaCha20-Poly1305 symmetric encryption                │
│       • Ed25519 signatures (64 bytes)                         │
│       • No per-message KeyPackage exchange                    │
│       • Manual key rotation via QR code or faster link        │
│                                                               │
│ Upgrade path:                                                 │
│   When faster transport available → full MLS epoch sync       │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```
Trade-off: lose automatic post-compromise security (PCS) on constrained links; gain usability.
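The selection and upgrade rule in the diagram can be sketched as "the strongest mode both peers support that the link can carry". This is a minimal illustration using hypothetical `CryptoMode` and `Capabilities` types. It is not the project's `TransportCapability` enum or `crypto_negotiation.rs`. The diagram leaves the 1 to 10 kbps range unspecified, and mapping that range to classical MLS is an assumption.

```rust
/// Modes from the diagram; declaration order gives Ord (weakest first). Hypothetical names.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum CryptoMode {
    MlsLite,      // pre-shared epoch key, ChaCha20-Poly1305, Ed25519
    MlsClassical, // full MLS, classical ciphersuite
    MlsPqHybrid,  // full MLS with PQ-KEM
}

/// What a peer supports, e.g. as advertised in an announce (hypothetical).
#[derive(Debug, Clone, Copy)]
struct Capabilities {
    mls_lite: bool,
    mls_classical: bool,
    mls_pq: bool,
}

impl Capabilities {
    fn supports(&self, mode: CryptoMode) -> bool {
        match mode {
            CryptoMode::MlsLite => self.mls_lite,
            CryptoMode::MlsClassical => self.mls_classical,
            CryptoMode::MlsPqHybrid => self.mls_pq,
        }
    }
}

/// Strongest mode a link can carry. The >10 kbps and <1 kbps thresholds come from the
/// diagram; treating 1 to 10 kbps as classical MLS is an assumption.
fn max_mode_for_link(link_bps: u32) -> CryptoMode {
    if link_bps > 10_000 {
        CryptoMode::MlsPqHybrid
    } else if link_bps >= 1_000 {
        CryptoMode::MlsClassical
    } else {
        CryptoMode::MlsLite
    }
}

/// Pick the strongest mutually supported mode the link allows.
/// Re-running this when a faster transport appears yields the upgrade path.
fn negotiate(local: Capabilities, remote: Capabilities, link_bps: u32) -> Option<CryptoMode> {
    let ceiling = max_mode_for_link(link_bps);
    [CryptoMode::MlsPqHybrid, CryptoMode::MlsClassical, CryptoMode::MlsLite]
        .iter()
        .copied()
        .filter(|&m| m <= ceiling)
        .find(|&m| local.supports(m) && remote.supports(m))
}
```

Returning `None` when no common mode fits makes the "cannot talk securely on this link" case explicit, so it is not silently downgraded.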
Option B: Compressed MLS (Research)
- Strip unused extensions from KeyPackages
- Use shorter credential identifiers (16 bytes instead of 32)
- Batch multiple KeyPackages into single transfer over fast link
- Cache and reuse KeyPackages more aggressively
Trade-off: Still large. May not be enough for SF12 LoRa.
Option C: LXMF-Compatible Mode
Implement Reticulum's LXMF format as an alternative wire format:
```rust
pub struct LxmfMessage {
    destination: [u8; 16], // Truncated destination hash
    source: [u8; 16],      // Truncated source hash
    signature: [u8; 64],   // Ed25519
    payload: Vec<u8>,      // msgpack: {timestamp, content, title, fields}
}
// Total: ~100-150 bytes for a short message (96 bytes fixed + payload)
```
Trade-off: Lose MLS group properties. Gain Reticulum interop and efficiency.
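The LXMF overhead claim comes down to a 96-byte fixed header (two 16-byte hashes plus a 64-byte Ed25519 signature) followed by the payload. Below is a minimal sketch of round-tripping a struct mirroring `LxmfMessage` to bytes. The flat concatenation is an illustrative assumption. It is not Reticulum's exact packing, and it leaves out msgpack encoding of the payload.

```rust
/// Mirrors LxmfMessage; the flat byte layout below is illustrative only.
struct LxmfMessage {
    destination: [u8; 16],
    source: [u8; 16],
    signature: [u8; 64],
    payload: Vec<u8>,
}

/// Fixed header: destination hash + source hash + Ed25519 signature.
const LXMF_FIXED_OVERHEAD: usize = 16 + 16 + 64;

impl LxmfMessage {
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(LXMF_FIXED_OVERHEAD + self.payload.len());
        out.extend_from_slice(&self.destination);
        out.extend_from_slice(&self.source);
        out.extend_from_slice(&self.signature);
        out.extend_from_slice(&self.payload);
        out
    }

    /// Returns None if the buffer cannot hold the fixed header.
    fn from_bytes(buf: &[u8]) -> Option<Self> {
        if buf.len() < LXMF_FIXED_OVERHEAD {
            return None;
        }
        let mut destination = [0u8; 16];
        destination.copy_from_slice(&buf[0..16]);
        let mut source = [0u8; 16];
        source.copy_from_slice(&buf[16..32]);
        let mut signature = [0u8; 64];
        signature.copy_from_slice(&buf[32..96]);
        Some(Self { destination, source, signature, payload: buf[96..].to_vec() })
    }
}
```

With a short msgpack payload of roughly 5 to 50 bytes, this layout lands in the ~100-150 byte range quoted above.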
Action Items
- Measure actual MLS sizes — done, see table above
- Design MLS-Lite spec: `docs/plans/mls-lite-design.md`
- Implement MLS-Lite: `crates/quicprochat-p2p/src/mls_lite.rs`
- Implement MeshEnvelope V2: truncated addresses, priority field
- Implement transport capability negotiation in TransportManager
- Test MLS-Lite vs full MLS on real LoRa
Gap 2: KeyPackage Distribution Over Mesh
The Problem
MLS requires pre-positioned KeyPackages to add members to a group. On the Internet, a server stores KeyPackages and clients fetch them on demand. On a pure mesh, there is no server.
Current flow (broken for pure mesh):
```
Alice wants to add Bob to group:
1. Alice fetches Bob's KeyPackage from server   ← requires Internet
2. Alice creates Welcome + Commit
3. Alice sends to Bob via mesh
```
Proposed Solution: Announce-Based KeyPackage Distribution
```
Bob announces on mesh:
1. MeshAnnounce includes: identity_key, capabilities, AND current_keypackage_hash
2. Nearby nodes cache Bob's latest KeyPackage (if they have it)
3. Alice receives Bob's announce, requests KeyPackage via mesh RPC

KeyPackage propagation:
1. Bob periodically broadcasts KeyPackage update (larger message, less frequent)
2. Nodes with capacity (CAP_STORE) cache KeyPackages for relaying
3. TTL-based expiry (KeyPackages are single-use, but we can cache N of them)
```
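The relay-side caching rules (TTL expiry, up to N KeyPackages per identity, single-use serving) fit in a small map keyed by the announced truncated hash. Below is a minimal sketch with a hypothetical `KeyPackageCache`. It is not the project's `keypackage_cache.rs`. It assumes the full digest is computed elsewhere (for example SHA-256) and that the caller supplies timestamps in seconds.

```rust
use std::collections::HashMap;

/// 8-byte truncated KeyPackage hash, as carried in the announce.
type KpHash = [u8; 8];

/// Truncate a full 32-byte digest (computed elsewhere) to the announce form.
fn truncate_hash(full: &[u8; 32]) -> KpHash {
    let mut h = [0u8; 8];
    h.copy_from_slice(&full[..8]);
    h
}

struct CachedKp {
    identity: [u8; 16],
    bytes: Vec<u8>,
    expires_at: u64, // seconds, caller-supplied clock
}

/// Hypothetical relay cache: TTL expiry, per-identity cap, remove-on-serve.
struct KeyPackageCache {
    ttl_secs: u64,
    max_per_identity: usize,
    entries: HashMap<KpHash, CachedKp>,
}

impl KeyPackageCache {
    fn new(ttl_secs: u64, max_per_identity: usize) -> Self {
        Self { ttl_secs, max_per_identity, entries: HashMap::new() }
    }

    fn insert(&mut self, identity: [u8; 16], hash: KpHash, bytes: Vec<u8>, now: u64) {
        self.evict_expired(now);
        // At the cap: drop this identity's entry that expires soonest (the oldest).
        let mine: Vec<(KpHash, u64)> = self
            .entries
            .iter()
            .filter(|(_, e)| e.identity == identity)
            .map(|(h, e)| (*h, e.expires_at))
            .collect();
        if mine.len() >= self.max_per_identity {
            if let Some(&(oldest, _)) = mine.iter().min_by_key(|(_, exp)| *exp) {
                self.entries.remove(&oldest);
            }
        }
        let expires_at = now + self.ttl_secs;
        self.entries.insert(hash, CachedKp { identity, bytes, expires_at });
    }

    /// Serve by announced hash and remove it, since KeyPackages are single-use.
    fn take(&mut self, hash: &KpHash, now: u64) -> Option<Vec<u8>> {
        self.evict_expired(now);
        self.entries.remove(hash).map(|e| e.bytes)
    }

    fn evict_expired(&mut self, now: u64) {
        self.entries.retain(|_, e| e.expires_at > now);
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Removing on `take` means two requesters never receive the same KeyPackage from one relay. A KeyPackage cached on several relays can still be handed out twice, though. That is one reason the refresh protocol remains an open action item.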
Action Items
- Extend MeshAnnounce with optional `keypackage_hash` field: 8-byte truncated hash
- Add KeyPackage request/response to mesh protocol: `mesh_protocol.rs`
- Implement KeyPackage cache: `keypackage_cache.rs` (separate from MeshStore)
- Design KeyPackage refresh protocol for mesh-only scenarios
- Add transport capability negotiation: `transport.rs`, `TransportCapability` enum
- Add MLS-Lite upgrade path: `crypto_negotiation.rs`
Gap 3: No DTN/Bundle Protocol Integration
The Problem
NASA/IETF Bundle Protocol (RFC 9171) is the standard for delay-tolerant networking. Reticulum effectively reinvented it. QuicProChat should learn from both.
Key DTN concepts we're missing:
| Concept | DTN/BPv7 | Reticulum | QuicProChat |
|---|---|---|---|
| Custody transfer | Yes | No | No |
| Fragmentation at bundle layer | Yes | No | Yes (LoRa transport) |
| Convergence layer adapters | Formal spec | Interfaces | MeshTransport trait |
| Routing protocols | Contact Graph Routing (CGR), Epidemic | Announce-based | Announce-based |
| Priority scheduling | Yes | No | No |
Proposed Improvements
- Priority levels in MeshEnvelope (emergency > data > announce)
- Custody transfer option — intermediate node takes responsibility
- Better congestion control — backpressure signals in announce
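Priority scheduling (emergency > data > announce) maps naturally onto a max-heap ordered by priority and then by arrival order. Below is a minimal sketch using `std::collections::BinaryHeap`. `Priority` and `TxQueue` are hypothetical names, not the MeshStore or DutyCycleTracker API. FIFO order within a priority level is a design assumption.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Priority levels from the list above; later variants sort higher.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    Announce,
    Data,
    Emergency,
}

#[derive(Debug, PartialEq, Eq)]
struct Queued {
    priority: Priority,
    seq: u64, // unique arrival counter: FIFO within a priority level
    payload: Vec<u8>,
}

impl Ord for Queued {
    fn cmp(&self, other: &Self) -> Ordering {
        // Max-heap: higher priority first; among equals, the lower (older) seq wins.
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.seq.cmp(&self.seq))
    }
}

impl PartialOrd for Queued {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Hypothetical outbound queue.
#[derive(Default)]
struct TxQueue {
    heap: BinaryHeap<Queued>,
    next_seq: u64,
}

impl TxQueue {
    fn push(&mut self, priority: Priority, payload: Vec<u8>) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.heap.push(Queued { priority, seq, payload });
    }

    fn pop(&mut self) -> Option<(Priority, Vec<u8>)> {
        self.heap.pop().map(|q| (q.priority, q.payload))
    }
}
```

Under a tight duty cycle, a scheduler would pop from this queue only when the budget allows. Announces then naturally starve first when traffic is heavy.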
Action Items
- Add priority field to MeshEnvelope
- Research custody transfer — is it worth the complexity?
- Implement priority queue in MeshStore and DutyCycleTracker
Gap 4: Battery/Duty-Cycle Optimization
The Problem
Briar drains 4x battery due to constant BT scanning. We claim to be better but haven't proven it.
Current state:
- DutyCycleTracker enforces EU868 1% limit
- Announce interval is configurable (default 10 min)
- No adaptive power management
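Enforcing the EU868 1% limit amounts to keeping a sliding one-hour window of past transmissions and refusing any TX that would push total airtime past 36 seconds. Below is a minimal sketch with a hypothetical `DutyWindow`, not the project's `DutyCycleTracker`. It assumes a caller-supplied monotonic clock in milliseconds, and for simplicity it charges a whole transmission to the window it started in.

```rust
use std::collections::VecDeque;

/// Hypothetical sliding-window duty-cycle limiter.
struct DutyWindow {
    window_ms: u64,                // length of the sliding window
    max_tx_ms: u64,                // airtime allowed per window
    history: VecDeque<(u64, u64)>, // (start_ms, airtime_ms) of past transmissions
}

impl DutyWindow {
    /// 1% of one hour: 36 s of airtime per 3600 s.
    fn eu868_1pct() -> Self {
        Self { window_ms: 3_600_000, max_tx_ms: 36_000, history: VecDeque::new() }
    }

    /// Airtime used by transmissions that started within the window ending at `now_ms`.
    fn used_ms(&mut self, now_ms: u64) -> u64 {
        let cutoff = now_ms.saturating_sub(self.window_ms);
        while let Some(&(start, _)) = self.history.front() {
            if start < cutoff {
                self.history.pop_front();
            } else {
                break;
            }
        }
        self.history.iter().map(|&(_, a)| a).sum()
    }

    /// Records the transmission and returns true only if it fits the budget.
    fn try_transmit(&mut self, now_ms: u64, airtime_ms: u64) -> bool {
        if self.used_ms(now_ms) + airtime_ms > self.max_tx_ms {
            return false;
        }
        self.history.push_back((now_ms, airtime_ms));
        true
    }
}
```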
Proposed Improvements
- Adaptive announce interval — more frequent when activity, less when idle
- Listen-before-talk — don't TX if channel is busy (LoRa CAD)
- Scheduled wake windows — coordinate with peers for efficient sync
- Power profiles — "always-on", "hourly-sync", "manual-only"
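An adaptive announce interval can be as simple as halving on activity and doubling when idle, clamped to bounds chosen by the power profile. Below is a minimal sketch. `PowerProfile` and `AnnounceScheduler` are hypothetical. The 10-minute starting value comes from the current default, while the 1-minute floor, the 1-hour ceiling, and the fixed hourly interval for "hourly-sync" are assumptions.

```rust
use std::time::Duration;

/// Profiles from the list above (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PowerProfile {
    AlwaysOn,
    HourlySync,
    ManualOnly,
}

/// Hypothetical adaptive announce scheduler.
struct AnnounceScheduler {
    interval: Duration,
    min: Duration,
    max: Duration,
}

impl AnnounceScheduler {
    /// Returns None for ManualOnly: no automatic announces at all.
    fn for_profile(profile: PowerProfile) -> Option<Self> {
        let hour = Duration::from_secs(3600);
        match profile {
            PowerProfile::AlwaysOn => Some(Self {
                interval: Duration::from_secs(10 * 60), // current default
                min: Duration::from_secs(60),           // assumed floor
                max: hour,                              // assumed ceiling
            }),
            PowerProfile::HourlySync => Some(Self { interval: hour, min: hour, max: hour }),
            PowerProfile::ManualOnly => None,
        }
    }

    /// Traffic seen: announce more often, down to the floor.
    fn on_activity(&mut self) {
        self.interval = (self.interval / 2).max(self.min);
    }

    /// Idle period: back off, up to the ceiling.
    fn on_idle(&mut self) {
        self.interval = (self.interval * 2).min(self.max);
    }
}
```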
Action Items
- Implement CAD (Channel Activity Detection) in LoRaTransport
- Add power profile config to P2pNode
- Measure actual power consumption with real hardware
Gap 5: Real-World Testing
The Problem
All our mesh code runs against mocks. We claim LoRa support but haven't tested with real radios.
Testing Plan
| Test | Hardware | Status |
|---|---|---|
| LoRa point-to-point | 2x SX1262 dev boards | Not started |
| LoRa multi-hop | 3x SX1262, different rooms | Not started |
| Mixed transport | LoRa + WiFi relay | Not started |
| Outdoor range test | LoRa, line-of-sight 1km | Not started |
| Duty cycle compliance | SDR spectrum analyzer | Not started |
Action Items
- Procure hardware — 3x Heltec LoRa32 or similar
- Implement UART LoRaTransport for real modems
- Create test harness for automated multi-node testing
- Document actual performance numbers
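Before real radios arrive, a multi-node harness can at least check hop-limited delivery across 5+ nodes in memory. Below is a minimal sketch that floods a message hop by hop over an adjacency list and suppresses duplicates per node. `simulate_flood` and `line_topology` are hypothetical. This is plain flooding, not QuicProChat's announce-based routing, so it is only a stand-in for the real protocol under test.

```rust
use std::collections::VecDeque;

/// Flood from `origin` with a hop limit; each node relays a message only once.
/// Returns the hop count at which each node first received it (None if never).
fn simulate_flood(adjacency: &[Vec<usize>], origin: usize, max_hops: u8) -> Vec<Option<u8>> {
    let mut received: Vec<Option<u8>> = vec![None; adjacency.len()];
    let mut queue = VecDeque::new();
    received[origin] = Some(0);
    queue.push_back((origin, 0u8));
    while let Some((node, hops)) = queue.pop_front() {
        if hops >= max_hops {
            continue; // hop limit reached: do not relay further
        }
        for &next in &adjacency[node] {
            if received[next].is_none() {
                received[next] = Some(hops + 1);
                queue.push_back((next, hops + 1));
            }
        }
    }
    received
}

/// Nodes 0..n in a chain, like radios placed in consecutive rooms.
fn line_topology(n: usize) -> Vec<Vec<usize>> {
    (0..n)
        .map(|i| {
            let mut v = Vec::new();
            if i > 0 {
                v.push(i - 1);
            }
            if i + 1 < n {
                v.push(i + 1);
            }
            v
        })
        .collect()
}
```

On a 5-node chain with a hop limit of 2, nodes 3 and 4 never receive the message. That is the kind of expectation a hardware run should later confirm or refute.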
Gap 6: Comparison Claims Need Verification
The Problem
Our positioning doc claims superiority over Meshtastic/Reticulum/Briar, but:
- We haven't measured our actual overhead vs. theirs
- We haven't tested interop scenarios
- We haven't run security analysis against their threat models
Verification Plan
| Claim | How to Verify |
|---|---|
| "MLS is better than shared-key AES" | Threat model comparison doc |
| "Multi-hop works" | Integration test with 5+ nodes |
| "LoRa-ready" | Actual LoRa hardware test |
| "Post-quantum protects groups" | Verify hybrid KEM in MLS path |
| "Relay nodes can't read content" | Formal verification of E2E path |
Action Items
- Create benchmark suite comparing message sizes
- Write threat model comparison doc (Meshtastic CVEs, Reticulum link-level)
- Fuzz test mesh envelope parsing
- Get external review of mesh crypto design
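The "fuzz test mesh envelope parsing" item depends on a parser that returns errors instead of panicking on untrusted bytes. Below is a minimal sketch of that pattern, together with a random-input loop that acts as a lightweight, crate-free stand-in for a coverage-guided fuzzer such as cargo-fuzz. The envelope layout, `parse`, `encode`, and `fuzz_parse` are all hypothetical. They illustrate the approach and are not the real MeshEnvelope V2 format.

```rust
/// Hypothetical layout (illustration only, not the real MeshEnvelope V2):
/// version(1) | priority(1) | ttl(1) | src(16) | dst(16) | payload_len(2, BE) | payload
#[derive(Debug, Clone, PartialEq, Eq)]
struct Envelope {
    version: u8,
    priority: u8,
    ttl: u8,
    src: [u8; 16],
    dst: [u8; 16],
    payload: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    TooShort,
    BadVersion,
    LengthMismatch,
}

const HEADER_LEN: usize = 3 + 16 + 16 + 2;
const VERSION: u8 = 2;

/// Parse untrusted bytes: every failure is an Err, never a panic or out-of-bounds index.
fn parse(buf: &[u8]) -> Result<Envelope, ParseError> {
    if buf.len() < HEADER_LEN {
        return Err(ParseError::TooShort);
    }
    if buf[0] != VERSION {
        return Err(ParseError::BadVersion);
    }
    let declared = u16::from_be_bytes([buf[35], buf[36]]) as usize;
    if buf.len() - HEADER_LEN != declared {
        return Err(ParseError::LengthMismatch);
    }
    let mut src = [0u8; 16];
    src.copy_from_slice(&buf[3..19]);
    let mut dst = [0u8; 16];
    dst.copy_from_slice(&buf[19..35]);
    let payload = buf[HEADER_LEN..].to_vec();
    Ok(Envelope { version: buf[0], priority: buf[1], ttl: buf[2], src, dst, payload })
}

fn encode(e: &Envelope) -> Vec<u8> {
    assert!(e.payload.len() <= u16::MAX as usize, "payload exceeds u16 length field");
    let mut out = vec![e.version, e.priority, e.ttl];
    out.extend_from_slice(&e.src);
    out.extend_from_slice(&e.dst);
    out.extend_from_slice(&(e.payload.len() as u16).to_be_bytes());
    out.extend_from_slice(&e.payload);
    out
}

/// xorshift64: tiny deterministic PRNG (seed must be non-zero).
fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Random-input loop: any panic in `parse` fails the run. Returns how many inputs parsed.
fn fuzz_parse(iterations: usize, seed: u64) -> usize {
    let mut state = seed;
    let mut accepted = 0;
    for _ in 0..iterations {
        let len = (xorshift(&mut state) % 80) as usize;
        let mut buf: Vec<u8> = (0..len).map(|_| xorshift(&mut state) as u8).collect();
        // Force a valid version byte half the time so deeper checks get exercised.
        if !buf.is_empty() && xorshift(&mut state) % 2 == 0 {
            buf[0] = VERSION;
        }
        if let Ok(env) = parse(&buf) {
            // Anything accepted must re-encode to exactly the same bytes.
            assert_eq!(encode(&env), buf);
            accepted += 1;
        }
    }
    accepted
}
```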
Implementation Priority
Phase 1: Make It Work (Next 2 Sprints)
- S4: Multi-hop routing — complete the core mesh functionality
- S5: Truncated addresses — reduce envelope overhead
- Measure actual sizes — know the real numbers
Phase 2: Make It Efficient (Following 2 Sprints)
- Design MLS-Lite — spec for constrained links
- Priority queue — emergency messages first
- Hardware testing — real LoRa validation
Phase 3: Make It Production-Ready
- KeyPackage distribution — mesh-native key exchange
- Power profiles — battery optimization
- External review — security audit of mesh layer
Success Metrics
| Metric | Previous | Current | Target |
|---|---|---|---|
| MeshEnvelope overhead (empty) | ~410 bytes | ~336 (V2) | ✅ Done |
| MLS-Lite message (no sig) | N/A | ~129 bytes | ✅ Done |
| Time to send "hello" over SF12 LoRa | ~27 sec | ~4 sec (MLS-Lite) | ✅ Done |
| KeyPackage exchange over mesh | Not possible | Pending | Works |
| Multi-hop message delivery | Mock only | Code complete | Real hardware |
| Battery life (mesh mode) | Unknown | Unknown | Measured |
Honest Assessment
What we do well:
- MLS group crypto is genuinely better than Meshtastic/Reticulum
- Transport abstraction is clean
- Announce protocol is solid
- NEW: Classical MLS KeyPackage (306B) is actually LoRa-viable
- NEW: MLS-Lite provides Meshtastic-level efficiency with real auth
What we still need to fix:
- KeyPackage distribution without a server is implemented (announce-based with cache) but has only run against mocks
- No real-world testing with actual LoRa hardware
- Post-quantum hybrid mode too large for constrained links
What we can now claim:
- "MLS on LoRa": YES, classical MLS fits the link budget (~14 sec estimated group setup, not yet tested on hardware)
- "MLS-Lite for constrained": YES, ~2 sec per message without signature, ~4 sec with Ed25519 signature
- "Post-quantum on LoRa": NO, hybrid mode is impractical (2.6KB KeyPackage)
- "Production-ready": NO, still research-stage, pending hardware tests
Last updated: 2026-03-30