quicproquo/docs/plans/mesh-protocol-gaps.md
Christian Nennemann bcde8b733c docs: update mesh-protocol-gaps with actual measurements
Key findings from actual benchmarks:
- MLS KeyPackage: 306 bytes (6 LoRa fragments, ~4 sec)
- MLS Welcome: 840 bytes (17 fragments, ~10 sec)
- MLS-Lite: 129 bytes without sig, 262 with sig
- MeshEnvelope V2: 336 bytes (~18% savings over V1)

Classical MLS is LoRa-viable! Group setup takes ~14 sec at 1% duty.
Post-quantum hybrid (2.6KB KeyPackage) is still impractical.

Updated action items to reflect completed work:
- MLS-Lite implemented
- MeshEnvelope V2 implemented
- Size measurements complete
2026-03-30 23:53:27 +02:00


Mesh Protocol Gaps — Honest Assessment & Action Plan

Goal: Identify real weaknesses in QuicProChat's mesh protocol compared to Reticulum, Meshtastic, and LXMF. Plan concrete improvements.

Created: 2026-03-30


Executive Summary

QuicProChat has strong cryptography (MLS, PQ-KEM) but real gaps in the mesh layer:

| Gap | Severity | Status |
| --- | --- | --- |
| MLS overhead too large for LoRa | Critical | MEASURED: see actual sizes below |
| No lightweight messaging mode | High | DONE: MLS-Lite implemented |
| KeyPackage distribution over mesh | High | Not solved |
| Announce/routing not battle-tested | Medium | S3-S4 done, needs real-world test |
| No DTN bundle protocol integration | Medium | Priority field added |
| Battery/duty-cycle optimization | Medium | Basic tracker exists |

Gap 1: MLS Overhead Too Large for LoRa

The Problem

MLS was designed for Internet messaging, not LoRa.

Actual Measured Sizes (2026-03-30)

| Component | Size (bytes) | LoRa SF12 fragments | Approx. TX airtime |
| --- | --- | --- | --- |
| MLS KeyPackage | 306 | 6 | ~4 sec |
| MLS Welcome | 840 | 17 | ~10 sec |
| MLS Commit (add) | 736 | 15 | ~9 sec |
| MLS AppMessage (5B) | 143 | 3 | ~2 sec |
| MLS Commit (update) | 544 | 11 | ~7 sec |
| MLS KeyPackage (PQ) | 2,676 | 53 | ~32 sec |
| MLS Welcome (PQ) | 5,504 | 108 | ~65 sec |
| MeshEnvelope V1 (CBOR) | 410 | 9 | ~5 sec |
| MeshEnvelope V2 (truncated) | 336 | 7 | ~4 sec |
| MLS-Lite (no sig) | 129 | 3 | ~2 sec |
| MLS-Lite (with sig) | 262 | 6 | ~4 sec |
| Reticulum LXMF | ~100-150 | 2-3 | ~1-2 sec |
| Meshtastic max | 237 | 5 | ~3 sec |

Key insights:

  • Classical MLS is viable for LoRa — 6 fragments for KeyPackage
  • Post-quantum hybrid MLS is prohibitive — 53+ fragments for KeyPackage
  • MLS-Lite matches Meshtastic efficiency while adding proper auth
  • Total group setup (KeyPackage + Welcome): ~23 fragments, ~14 sec

The math NOW works for classical MLS on LoRa:

  • LoRa SF12/BW125: ~51 byte MTU, ~300 bps effective
  • EU868 duty cycle: 1% = 36 seconds TX per hour
  • One MLS KeyPackage = 6 fragments = 4 sec = acceptable
  • Group setup = 14 sec = about 39% of the hourly duty budget (14 of 36 sec), but feasible
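
The fragment counts and the duty-budget share can be reproduced with a few lines of arithmetic. This is a minimal sketch, not project code: `fragments` and `duty_budget_share` are hypothetical helper names, and the calculation assumes the 51-byte MTU quoted above with no per-fragment header overhead.

```rust
/// Usable payload per LoRa frame at SF12/BW125, taken from the list above.
pub const LORA_MTU_BYTES: usize = 51;

/// EU868 1% duty cycle: 1% of 3600 s is 36 s of transmit time per hour.
pub const DUTY_BUDGET_SECS: f64 = 36.0;

/// Fragments needed for `size` bytes, using ceiling division.
/// Assumption: fragment headers are ignored, as in the size table.
pub fn fragments(size: usize, mtu: usize) -> usize {
    (size + mtu - 1) / mtu
}

/// Share of the hourly duty budget used by `airtime_secs` of transmission.
pub fn duty_budget_share(airtime_secs: f64) -> f64 {
    airtime_secs / DUTY_BUDGET_SECS
}
```

With these helpers, `fragments(306, LORA_MTU_BYTES)` gives 6 and `fragments(840, LORA_MTU_BYTES)` gives 17, which add up to the 23 fragments quoted for group setup. `duty_budget_share(14.0)` comes out at roughly 0.39 of the 36-second hourly budget.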

Post-quantum is still problematic for constrained links.

Current State (Updated 2026-03-30)

  • MeshEnvelope V1 uses CBOR, ~410 bytes for empty payload
  • MeshEnvelope V2 uses truncated 16-byte addresses, ~336 bytes (~18% savings)
  • MLS-Lite implemented: ~129 bytes without signature, ~262 with
  • Classical MLS KeyPackage measured at 306 bytes (much better than expected)
  • ⚠️ PQ-hybrid MLS still large (2.6KB KeyPackage)

Proposed Solutions

Option A: MLS-Lite (Transport-Based Mode Selection)

┌─────────────────────────────────────────────────────────────────┐
│  Mode Selection Based on Transport Capability                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  QUIC/TCP/WiFi (>10 kbps):                                     │
│    → Full MLS groups with PQ-KEM                               │
│    → KeyPackage distribution via server                        │
│    → Standard protocol                                          │
│                                                                 │
│  LoRa/Serial (<1 kbps):                                        │
│    → "MLS-Lite" mode:                                          │
│      • Pre-shared group epoch key (exchanged out-of-band)      │
│      • ChaCha20-Poly1305 symmetric encryption                  │
│      • Ed25519 signatures (64 bytes)                           │
│      • No per-message KeyPackage exchange                      │
│      • Manual key rotation via QR code or faster link          │
│                                                                 │
│  Upgrade path:                                                  │
│    When faster transport available → full MLS epoch sync       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Trade-off: Lose automatic post-compromise security (PCS) on constrained links. Gain usability.
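
The selection rule in the diagram could be expressed as a single function. This is only a sketch: `CryptoMode` and `select_mode` are hypothetical names rather than existing QuicProChat APIs. The diagram leaves the band between 1 and 10 kbps undefined, so the sketch conservatively treats every link at or below 10 kbps as constrained.

```rust
/// Hypothetical mode enum; the variant names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CryptoMode {
    /// Full MLS groups with PQ-KEM (QUIC/TCP/WiFi class links).
    FullMlsPq,
    /// Pre-shared epoch key + ChaCha20-Poly1305 + Ed25519 (LoRa/serial).
    MlsLite,
}

/// Threshold from the diagram: links faster than 10 kbps get full MLS.
const FAST_LINK_BPS: u32 = 10_000;

/// Pick a mode from the estimated link bitrate in bits per second.
/// Assumption: anything not clearly fast is handled as constrained.
pub fn select_mode(link_bps: u32) -> CryptoMode {
    if link_bps > FAST_LINK_BPS {
        CryptoMode::FullMlsPq
    } else {
        CryptoMode::MlsLite
    }
}
```

A real implementation would likely take its input from the transport capability negotiation listed in the action items rather than from a raw bitrate.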

Option B: Compressed MLS (Research)

  • Strip unused extensions from KeyPackages
  • Use shorter credential identifiers (16 bytes instead of 32)
  • Batch multiple KeyPackages into single transfer over fast link
  • Cache and reuse KeyPackages more aggressively

Trade-off: Still large. May not be enough for SF12 LoRa.

Option C: LXMF-Compatible Mode

Implement Reticulum's LXMF format as an alternative wire format:

pub struct LxmfMessage {
    destination: [u8; 16],   // Truncated hash
    source: [u8; 16],
    signature: [u8; 64],     // Ed25519
    payload: Vec<u8>,        // msgpack: {timestamp, content, title, fields}
}
// Total: ~100-150 bytes for short message

Trade-off: Lose MLS group properties. Gain Reticulum interop and efficiency.

Action Items

  • Measure actual MLS sizes — done, see table above
  • Design MLS-Lite spec: docs/plans/mls-lite-design.md
  • Implement MLS-Lite: crates/quicprochat-p2p/src/mls_lite.rs
  • Implement MeshEnvelope V2 — truncated addresses, priority field
  • Implement transport capability negotiation in TransportManager
  • Test MLS-Lite vs full MLS on real LoRa

Gap 2: KeyPackage Distribution Over Mesh

The Problem

MLS requires pre-positioned KeyPackages for adding members to groups. On Internet: server stores KeyPackages, clients fetch on demand. On mesh: no server.

Current flow (broken for pure mesh):

Alice wants to add Bob to group:
1. Alice fetches Bob's KeyPackage from server    ← requires Internet
2. Alice creates Welcome + Commit
3. Alice sends to Bob via mesh

Proposed Solution: Announce-Based KeyPackage Distribution

Bob announces on mesh:
1. MeshAnnounce includes: identity_key, capabilities, AND current_keypackage_hash
2. Nearby nodes cache Bob's latest KeyPackage (if they have it)
3. Alice receives Bob's announce, requests KeyPackage via mesh RPC

KeyPackage propagation:
1. Bob periodically broadcasts KeyPackage update (larger message, less frequent)
2. Nodes with capacity (CAP_STORE) cache KeyPackages for relaying
3. TTL-based expiry (KeyPackages are single-use, but we can cache N of them)
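
The caching behaviour in the propagation steps (keep up to N KeyPackages per peer, expire them by TTL, hand each one out once) might look roughly like this. It is a sketch under assumptions: `KeyPackageCache`, `NodeId`, and the method names are hypothetical and do not describe the existing MeshStore, and KeyPackages are treated as opaque bytes.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Truncated 16-byte node address (assumed to match MeshEnvelope V2).
pub type NodeId = [u8; 16];

struct CachedKeyPackage {
    bytes: Vec<u8>,
    expires_at: Instant,
}

/// Hypothetical per-peer KeyPackage cache with a size cap and TTL.
pub struct KeyPackageCache {
    per_peer_limit: usize,
    ttl: Duration,
    entries: HashMap<NodeId, VecDeque<CachedKeyPackage>>,
}

impl KeyPackageCache {
    pub fn new(per_peer_limit: usize, ttl: Duration) -> Self {
        Self { per_peer_limit, ttl, entries: HashMap::new() }
    }

    /// Cache a KeyPackage, evicting the oldest one when the peer is full.
    pub fn insert(&mut self, peer: NodeId, bytes: Vec<u8>, now: Instant) {
        if self.per_peer_limit == 0 {
            return;
        }
        let queue = self.entries.entry(peer).or_default();
        while queue.len() >= self.per_peer_limit {
            queue.pop_front();
        }
        queue.push_back(CachedKeyPackage { bytes, expires_at: now + self.ttl });
    }

    /// Remove and return one unexpired KeyPackage (single-use semantics).
    /// Expired entries found on the way are dropped.
    pub fn take(&mut self, peer: &NodeId, now: Instant) -> Option<Vec<u8>> {
        let queue = self.entries.get_mut(peer)?;
        while let Some(kp) = queue.pop_front() {
            if kp.expires_at > now {
                return Some(kp.bytes);
            }
        }
        None
    }
}
```

Passing `now` explicitly keeps the cache testable without sleeping. A relay node would call `take` when answering a KeyPackage request, so no KeyPackage is served twice from the same cache.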

Action Items

  • Extend MeshAnnounce with optional keypackage_hash field
  • Add KeyPackage request/response to mesh protocol
  • Implement KeyPackage cache in MeshStore (separate from message queue)
  • Design KeyPackage refresh protocol for mesh-only scenarios

Gap 3: No DTN/Bundle Protocol Integration

The Problem

NASA/IETF Bundle Protocol (RFC 9171) is the standard for delay-tolerant networking. Reticulum effectively reinvented it. QuicProChat should learn from both.

Key DTN concepts we're missing:

| Concept | DTN/BPv7 | Reticulum | QuicProChat |
| --- | --- | --- | --- |
| Custody transfer | Yes | No | No |
| Fragmentation at bundle layer | Yes | No | Yes (LoRa transport) |
| Convergence layer adapters | Formal spec | Interfaces | MeshTransport trait |
| Routing protocols | CGR, EPIDEMIC | Announce-based | Announce-based |
| Priority scheduling | Yes | No | No |

Proposed Improvements

  1. Priority levels in MeshEnvelope (emergency > data > announce)
  2. Custody transfer option — intermediate node takes responsibility
  3. Better congestion control — backpressure signals in announce
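
The first improvement (emergency > data > announce) maps naturally onto a binary heap. The following is a sketch only: `Priority`, `PriorityOutbox`, and the FIFO tie-breaking by sequence number are assumptions for illustration, not the design of MeshStore or DutyCycleTracker.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical priority levels; later variants compare as higher.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Announce,
    Data,
    Emergency,
}

#[derive(Debug, PartialEq, Eq)]
struct Queued {
    priority: Priority,
    seq: u64,
    payload: Vec<u8>,
}

impl Ord for Queued {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher priority first; within one priority, lower seq (older) first.
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.seq.cmp(&self.seq))
    }
}

impl PartialOrd for Queued {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Outbound queue that always yields the most urgent message next.
#[derive(Default)]
pub struct PriorityOutbox {
    heap: BinaryHeap<Queued>,
    next_seq: u64,
}

impl PriorityOutbox {
    pub fn push(&mut self, priority: Priority, payload: Vec<u8>) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.heap.push(Queued { priority, seq, payload });
    }

    pub fn pop(&mut self) -> Option<(Priority, Vec<u8>)> {
        self.heap.pop().map(|q| (q.priority, q.payload))
    }
}
```

The sequence number keeps messages of equal priority in arrival order, which a plain `BinaryHeap` does not guarantee on its own.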

Action Items

  • Add priority field to MeshEnvelope
  • Research custody transfer — is it worth the complexity?
  • Implement priority queue in MeshStore and DutyCycleTracker

Gap 4: Battery/Duty-Cycle Optimization

The Problem

Briar reportedly uses about 4x as much battery because of constant Bluetooth scanning. We claim to do better but have not proven it.

Current state:

  • DutyCycleTracker enforces EU868 1% limit
  • Announce interval is configurable (default 10 min)
  • No adaptive power management
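
For reference, a 1% duty-cycle limit can be enforced with a sliding one-hour window over past transmissions. The sketch below illustrates the idea and is not the project's DutyCycleTracker: the `DutyBudget` type and its methods are hypothetical names.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical sliding-window airtime budget.
pub struct DutyBudget {
    window: Duration,
    budget: Duration,
    /// (start time, airtime) of each transmission still inside the window.
    log: VecDeque<(Instant, Duration)>,
}

impl DutyBudget {
    /// EU868 at 1%: 36 s of airtime in any 3600 s window.
    pub fn eu868() -> Self {
        Self {
            window: Duration::from_secs(3600),
            budget: Duration::from_secs(36),
            log: VecDeque::new(),
        }
    }

    /// Drop entries older than the window and sum the remaining airtime.
    fn used(&mut self, now: Instant) -> Duration {
        while let Some(&(start, _)) = self.log.front() {
            if now.duration_since(start) >= self.window {
                self.log.pop_front();
            } else {
                break;
            }
        }
        self.log.iter().map(|&(_, airtime)| airtime).sum()
    }

    /// Record the transmission and return true if it fits the budget;
    /// otherwise leave the log untouched and return false.
    pub fn try_transmit(&mut self, airtime: Duration, now: Instant) -> bool {
        if self.used(now) + airtime > self.budget {
            return false;
        }
        self.log.push_back((now, airtime));
        true
    }
}
```

Under this model two 14-second group setups fit into one hour (28 s) and a third is refused until the earlier entries leave the window.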

Proposed Improvements

  1. Adaptive announce interval: more frequent when there is activity, less frequent when idle
  2. Listen-before-talk: don't TX if the channel is busy (LoRa CAD)
  3. Scheduled wake windows: coordinate with peers for efficient sync
  4. Power profiles: "always-on", "hourly-sync", "manual-only"
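
Power profiles and the adaptive announce interval could be combined in one small policy function. Everything here is an illustrative assumption: the `PowerProfile` enum, the activity thresholds, and the halving/doubling rule are not taken from the codebase. Only the 10-minute base interval comes from the current default.

```rust
use std::time::Duration;

/// Hypothetical power profiles mirroring the three names in the list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PowerProfile {
    AlwaysOn,
    HourlySync,
    ManualOnly,
}

/// Current default announce interval (10 min).
const BASE_INTERVAL_SECS: u64 = 600;

/// Next announce interval, or `None` when announces are user-triggered.
/// Assumed policy for `AlwaysOn`: double the interval when idle,
/// halve it when busy (10 or more messages in the last hour).
pub fn announce_interval(profile: PowerProfile, msgs_last_hour: u32) -> Option<Duration> {
    match profile {
        PowerProfile::ManualOnly => None,
        PowerProfile::HourlySync => Some(Duration::from_secs(3600)),
        PowerProfile::AlwaysOn => {
            let secs = match msgs_last_hour {
                0 => BASE_INTERVAL_SECS * 2,
                1..=9 => BASE_INTERVAL_SECS,
                _ => BASE_INTERVAL_SECS / 2,
            };
            Some(Duration::from_secs(secs))
        }
    }
}
```

Any shortened interval still has to fit inside the duty-cycle budget, so the result of this function would need to be checked against the tracker before transmitting.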

Action Items

  • Implement CAD (Channel Activity Detection) in LoRaTransport
  • Add power profile config to P2pNode
  • Measure actual power consumption with real hardware

Gap 5: Real-World Testing

The Problem

All our mesh code runs against mocks. We claim LoRa support but haven't tested with real radios.

Testing Plan

| Test | Hardware | Status |
| --- | --- | --- |
| LoRa point-to-point | 2x SX1262 dev boards | Not started |
| LoRa multi-hop | 3x SX1262, different rooms | Not started |
| Mixed transport | LoRa + WiFi relay | Not started |
| Outdoor range test | LoRa, line-of-sight 1km | Not started |
| Duty cycle compliance | SDR spectrum analyzer | Not started |

Action Items

  • Procure hardware — 3x Heltec LoRa32 or similar
  • Implement UART LoRaTransport for real modems
  • Create test harness for automated multi-node testing
  • Document actual performance numbers

Gap 6: Comparison Claims Need Verification

The Problem

Our positioning doc claims superiority over Meshtastic/Reticulum/Briar, but:

  • We haven't measured our actual overhead vs. theirs
  • We haven't tested interop scenarios
  • We haven't run security analysis against their threat models

Verification Plan

| Claim | How to Verify |
| --- | --- |
| "MLS is better than shared-key AES" | Threat model comparison doc |
| "Multi-hop works" | Integration test with 5+ nodes |
| "LoRa-ready" | Actual LoRa hardware test |
| "Post-quantum protects groups" | Verify hybrid KEM in MLS path |
| "Relay nodes can't read content" | Formal verification of E2E path |

Action Items

  • Create benchmark suite comparing message sizes
  • Write threat model comparison doc (Meshtastic CVEs, Reticulum link-level)
  • Fuzz test mesh envelope parsing
  • Get external review of mesh crypto design

Implementation Priority

Phase 1: Make It Work (Next 2 Sprints)

  1. S4: Multi-hop routing — complete the core mesh functionality
  2. S5: Truncated addresses — reduce envelope overhead
  3. Measure actual sizes — know the real numbers

Phase 2: Make It Efficient (Following 2 Sprints)

  1. Design MLS-Lite — spec for constrained links
  2. Priority queue — emergency messages first
  3. Hardware testing — real LoRa validation

Phase 3: Make It Production-Ready

  1. KeyPackage distribution — mesh-native key exchange
  2. Power profiles — battery optimization
  3. External review — security audit of mesh layer

Success Metrics

| Metric | Previous | Current | Target |
| --- | --- | --- | --- |
| MeshEnvelope overhead (empty) | ~410 bytes | ~336 (V2) | Done |
| MLS-Lite message (no sig) | N/A | ~129 bytes | Done |
| Time to send "hello" over SF12 LoRa | ~27 sec | ~4 sec (MLS-Lite) | Done |
| KeyPackage exchange over mesh | Not possible | Pending | Works |
| Multi-hop message delivery | Mock only | Code complete | Real hardware |
| Battery life (mesh mode) | Unknown | Unknown | Measured |

Honest Assessment

What we do well:

  • MLS group crypto is genuinely better than Meshtastic/Reticulum
  • Transport abstraction is clean
  • Announce protocol is solid
  • NEW: Classical MLS KeyPackage (306B) is actually LoRa-viable
  • NEW: MLS-Lite provides Meshtastic-level efficiency with real auth

What we still need to fix:

  • No solution for KeyPackage distribution without server
  • No real-world testing with actual LoRa hardware
  • Post-quantum hybrid mode too large for constrained links

What we can now claim:

  • "MLS on LoRa" — YES, classical MLS works with ~14 sec group setup
  • "MLS-Lite for constrained" — YES, ~2-4 sec messages with auth
  • "Post-quantum on LoRa" — NO, hybrid mode is impractical (2.6KB KeyPackage)
  • "Production-ready" — NO, still research-stage, pending hardware tests

Last updated: 2026-03-30