Christian Nennemann bcde8b733c docs: update mesh-protocol-gaps with actual measurements

Key findings from actual benchmarks:
- MLS KeyPackage: 306 bytes (6 LoRa fragments, ~4 sec)
- MLS Welcome: 840 bytes (17 fragments, ~10 sec)
- MLS-Lite: 129 bytes without sig, 262 with sig
- MeshEnvelope V2: 336 bytes (~18% savings over V1)

Classical MLS is LoRa-viable! Group setup takes ~14 sec at 1% duty.
Post-quantum hybrid (2.6KB KeyPackage) is still impractical.

Updated action items to reflect completed work:
- MLS-Lite implemented
- MeshEnvelope V2 implemented
- Size measurements complete

2026-03-30 23:53:27 +02:00

Mesh Protocol Gaps — Honest Assessment & Action Plan

Goal: Identify real weaknesses in QuicProChat's mesh protocol compared to Reticulum, Meshtastic, and LXMF. Plan concrete improvements.

Created: 2026-03-30

Executive Summary

QuicProChat has strong cryptography (MLS, PQ-KEM) but real gaps in the mesh layer:

Gap	Severity	Status
MLS overhead too large for LoRa	Critical	MEASURED — see actual sizes below
No lightweight messaging mode	High	DONE — MLS-Lite implemented
KeyPackage distribution over mesh	High	Not solved
Announce/routing not battle-tested	Medium	S3-S4 done, needs real-world test
No DTN bundle protocol integration	Medium	Priority field added
Battery/duty-cycle optimization	Medium	Basic tracker exists

Gap 1: MLS Overhead is Prohibitive for Constrained Links

The Problem

MLS was designed for Internet messaging, not LoRa.

Actual Measured Sizes (2026-03-30)

Component	Size (bytes)	LoRa SF12 fragments	Est. TX time (36 sec/hour budget at 1% duty)
MLS KeyPackage	306	6	~4 sec
MLS Welcome	840	17	~10 sec
MLS Commit (add)	736	15	~9 sec
MLS AppMessage (5B)	143	3	~2 sec
MLS Commit (update)	544	11	~7 sec
MLS KeyPackage (PQ)	2,676	53	~32 sec
MLS Welcome (PQ)	5,504	108	~65 sec
MeshEnvelope V1 (CBOR)	410	9	~5 sec
MeshEnvelope V2 (truncated)	336	7	~4 sec
MLS-Lite (no sig)	129	3	~2 sec
MLS-Lite (with sig)	262	6	~4 sec
Reticulum LXMF	~100-150	2-3	~1-2 sec
Meshtastic max	237	5	~3 sec

Key insights:

Classical MLS is viable for LoRa — 6 fragments for KeyPackage
Post-quantum hybrid MLS is prohibitive — 53+ fragments for KeyPackage
MLS-Lite matches Meshtastic efficiency while adding proper auth
Total group setup (KeyPackage + Welcome): ~23 fragments, ~14 sec

The math NOW works for classical MLS on LoRa:

LoRa SF12/BW125: ~51 byte MTU, ~300 bps effective
EU868 duty cycle: 1% = 36 seconds TX per hour
One MLS KeyPackage = 6 fragments = 4 sec = acceptable
Group setup = 14 sec = roughly 40% of the hourly duty budget (14 of 36 sec), but feasible
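The fragment counts in the table can be reproduced with plain ceiling division. The sketch below is illustrative only: it assumes every fragment carries the full 51 bytes with no per-fragment header, and the names `fragments` and `duty_budget_fraction` are hypothetical, not taken from the codebase.

```rust
/// Payload bytes per LoRa fragment at SF12/BW125 (figure quoted in this document).
/// Assumption: no per-fragment header is subtracted from this value.
const LORA_MTU_BYTES: usize = 51;

/// EU868 1% duty cycle: 1% of 3600 seconds of transmit time per hour.
const DUTY_BUDGET_SECS: f64 = 36.0;

/// Number of fragments needed to carry `size_bytes` (ceiling division).
pub fn fragments(size_bytes: usize) -> usize {
    (size_bytes + LORA_MTU_BYTES - 1) / LORA_MTU_BYTES
}

/// Fraction of the hourly transmit budget consumed by `airtime_secs`.
pub fn duty_budget_fraction(airtime_secs: f64) -> f64 {
    airtime_secs / DUTY_BUDGET_SECS
}
```

With these helpers, `fragments(306)` gives 6 and `fragments(840)` gives 17, matching the 23-fragment group setup, and `duty_budget_fraction(14.0)` comes out just under 0.39.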

Post-quantum is still problematic for constrained links.

Current State (Updated 2026-03-30)

✅ MeshEnvelope V1 uses CBOR, ~410 bytes for empty payload
✅ MeshEnvelope V2 uses truncated 16-byte addresses, ~336 bytes (~18% savings)
✅ MLS-Lite implemented: ~129 bytes without signature, ~262 with
✅ Classical MLS KeyPackage measured at 306 bytes (much better than expected)
⚠️ PQ-hybrid MLS still large (2.6KB KeyPackage)

Proposed Solutions

Option A: Hybrid Crypto Modes (Recommended)

┌─────────────────────────────────────────────────────────────────┐
│  Mode Selection Based on Transport Capability                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  QUIC/TCP/WiFi (>10 kbps):                                     │
│    → Full MLS groups with PQ-KEM                               │
│    → KeyPackage distribution via server                        │
│    → Standard protocol                                          │
│                                                                 │
│  LoRa/Serial (<1 kbps):                                        │
│    → "MLS-Lite" mode:                                          │
│      • Pre-shared group epoch key (exchanged out-of-band)      │
│      • ChaCha20-Poly1305 symmetric encryption                  │
│      • Ed25519 signatures (64 bytes)                           │
│      • No per-message KeyPackage exchange                      │
│      • Manual key rotation via QR code or faster link          │
│                                                                 │
│  Upgrade path:                                                  │
│    When faster transport available → full MLS epoch sync       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Trade-off: Lose automatic post-compromise security (PCS) on constrained links. Gain usability.
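A minimal sketch of the mode selection in the diagram, assuming the transport can report an approximate link rate in bits per second. `CryptoMode` and `select_mode` are hypothetical names. The diagram does not say what happens between 1 and 10 kbps, so this sketch conservatively maps that band to MLS-Lite; that choice is an assumption, not a decision recorded here.

```rust
/// Crypto modes from the Option A diagram (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CryptoMode {
    /// Full MLS groups with PQ-KEM, KeyPackages fetched via server.
    FullMlsPq,
    /// Pre-shared epoch key, ChaCha20-Poly1305, Ed25519 signatures.
    MlsLite,
}

/// Pick a mode from the link rate in bits per second.
/// Assumption: anything at or below 10 kbps falls back to MLS-Lite,
/// including the 1-10 kbps band the diagram leaves unspecified.
pub fn select_mode(link_bps: u32) -> CryptoMode {
    if link_bps > 10_000 {
        CryptoMode::FullMlsPq
    } else {
        CryptoMode::MlsLite
    }
}
```

A real implementation would likely negotiate this per peer in TransportManager rather than from a single number, and would handle the upgrade path when a faster transport appears.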

Option B: Compressed MLS (Research)

Strip unused extensions from KeyPackages
Use shorter credential identifiers (16 bytes instead of 32)
Batch multiple KeyPackages into single transfer over fast link
Cache and reuse KeyPackages more aggressively

Trade-off: Still large. May not be enough for SF12 LoRa.

Option C: LXMF-Compatible Mode

Implement Reticulum's LXMF format as an alternative wire format:

pub struct LxmfMessage {
    destination: [u8; 16],   // Truncated hash
    source: [u8; 16],
    signature: [u8; 64],     // Ed25519
    payload: Vec<u8>,        // msgpack: {timestamp, content, title, fields}
}
// Total: ~100-150 bytes for short message

Trade-off: Lose MLS group properties. Gain Reticulum interop and efficiency.
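To make the size estimate concrete, here is a sketch of a flat encoding for the struct above. It assumes the wire layout is simply the fields concatenated in declaration order, which should be checked against the LXMF specification before relying on it. The payload is treated as opaque bytes and the signature is not verified.

```rust
/// Same fields as the struct above, made public for the example.
pub struct LxmfMessage {
    pub destination: [u8; 16],
    pub source: [u8; 16],
    pub signature: [u8; 64],
    pub payload: Vec<u8>,
}

impl LxmfMessage {
    /// Fixed overhead: two truncated hashes plus an Ed25519 signature.
    pub const HEADER_LEN: usize = 16 + 16 + 64;

    /// Concatenate the fields in declaration order (assumed layout).
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(Self::HEADER_LEN + self.payload.len());
        out.extend_from_slice(&self.destination);
        out.extend_from_slice(&self.source);
        out.extend_from_slice(&self.signature);
        out.extend_from_slice(&self.payload);
        out
    }

    /// Split a buffer back into fields; returns None if it is too short.
    pub fn from_bytes(bytes: &[u8]) -> Option<LxmfMessage> {
        if bytes.len() < Self::HEADER_LEN {
            return None;
        }
        let mut destination = [0u8; 16];
        let mut source = [0u8; 16];
        let mut signature = [0u8; 64];
        destination.copy_from_slice(&bytes[0..16]);
        source.copy_from_slice(&bytes[16..32]);
        signature.copy_from_slice(&bytes[32..96]);
        Some(LxmfMessage {
            destination,
            source,
            signature,
            payload: bytes[96..].to_vec(),
        })
    }
}
```

The fixed part is 96 bytes, so a message lands in the ~100-150 byte range only while the msgpack payload stays under roughly 54 bytes.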

Action Items

Measure actual MLS sizes — done, see table above
Design MLS-Lite spec — docs/plans/mls-lite-design.md
Implement MLS-Lite — crates/quicprochat-p2p/src/mls_lite.rs
Implement MeshEnvelope V2 — truncated addresses, priority field
Implement transport capability negotiation in TransportManager
Test MLS-Lite vs full MLS on real LoRa

Gap 2: KeyPackage Distribution Over Mesh

The Problem

MLS requires pre-positioned KeyPackages for adding members to groups. On the Internet, a server stores KeyPackages and clients fetch them on demand. On a pure mesh there is no server.

Current flow (broken for pure mesh):

Alice wants to add Bob to group:
1. Alice fetches Bob's KeyPackage from server    ← requires Internet
2. Alice creates Welcome + Commit
3. Alice sends to Bob via mesh

Proposed Solution: Announce-Based KeyPackage Distribution

Bob announces on mesh:
1. MeshAnnounce includes: identity_key, capabilities, AND current_keypackage_hash
2. Nearby nodes cache Bob's latest KeyPackage (if they have it)
3. Alice receives Bob's announce, requests KeyPackage via mesh RPC

KeyPackage propagation:
1. Bob periodically broadcasts KeyPackage update (larger message, less frequent)
2. Nodes with capacity (CAP_STORE) cache KeyPackages for relaying
3. TTL-based expiry (KeyPackages are single-use, but we can cache N of them)
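The caching side of this proposal could look roughly like the sketch below. Everything in it is hypothetical: the `KeyPackageCache` type, the 32-byte hash key, and passing time as plain seconds are assumptions made for illustration, not the MeshStore API. Lookup removes the entry, reflecting that KeyPackages are single-use.

```rust
use std::collections::HashMap;

/// Assumed hash width; the document does not specify one.
pub type KeyPackageHash = [u8; 32];

struct Entry {
    bytes: Vec<u8>,
    expires_at: u64, // seconds, same clock as `now`
}

/// TTL-bounded cache of serialized KeyPackages, keyed by hash.
pub struct KeyPackageCache {
    ttl_secs: u64,
    entries: HashMap<KeyPackageHash, Entry>,
}

impl KeyPackageCache {
    pub fn new(ttl_secs: u64) -> Self {
        KeyPackageCache { ttl_secs, entries: HashMap::new() }
    }

    /// Store a KeyPackage received from an announce or a relay.
    pub fn insert(&mut self, hash: KeyPackageHash, bytes: Vec<u8>, now: u64) {
        let expires_at = now.saturating_add(self.ttl_secs);
        self.entries.insert(hash, Entry { bytes, expires_at });
    }

    /// Remove and return a KeyPackage; expired entries yield None.
    pub fn take(&mut self, hash: &KeyPackageHash, now: u64) -> Option<Vec<u8>> {
        let entry = self.entries.remove(hash)?;
        if entry.expires_at > now {
            Some(entry.bytes)
        } else {
            None
        }
    }

    /// Drop every entry whose TTL has elapsed.
    pub fn purge_expired(&mut self, now: u64) {
        self.entries.retain(|_, e| e.expires_at > now);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Caching N KeyPackages per peer, as suggested above, would need a list per identity rather than one entry per hash; that is left out to keep the sketch short.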

Action Items

Extend MeshAnnounce with optional keypackage_hash field
Add KeyPackage request/response to mesh protocol
Implement KeyPackage cache in MeshStore (separate from message queue)
Design KeyPackage refresh protocol for mesh-only scenarios

Gap 3: No DTN/Bundle Protocol Integration

The Problem

NASA/IETF Bundle Protocol (RFC 9171) is the standard for delay-tolerant networking. Reticulum effectively reinvented it. QuicProChat should learn from both.

Key DTN concepts we're missing:

Concept	DTN/BPv7	Reticulum	QuicProChat
Custody transfer	Yes	No	No
Fragmentation at bundle layer	Yes	No	Yes (LoRa transport)
Convergence layer adapters	Formal spec	Interfaces	MeshTransport trait
Routing protocols	CGR, EPIDEMIC	Announce-based	Announce-based
Priority scheduling	Yes	No	Priority field only

Proposed Improvements

Priority levels in MeshEnvelope (emergency > data > announce)
Custody transfer option — intermediate node takes responsibility
Better congestion control — backpressure signals in announce
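The priority ordering (emergency > data > announce) maps naturally onto a binary heap. The sketch below is one possible shape, not the MeshStore design: `Priority` and `OutboundQueue` are hypothetical names, and a sequence counter is added as an assumption so that frames of equal priority leave in FIFO order.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Variants are declared lowest first so the derived ordering
/// gives Emergency > Data > Announce.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Announce,
    Data,
    Emergency,
}

/// Outbound frame queue: highest priority first, FIFO within a priority.
pub struct OutboundQueue {
    heap: BinaryHeap<(Priority, Reverse<u64>, Vec<u8>)>,
    next_seq: u64,
}

impl OutboundQueue {
    pub fn new() -> Self {
        OutboundQueue { heap: BinaryHeap::new(), next_seq: 0 }
    }

    /// Enqueue a frame; the sequence number breaks ties in arrival order.
    pub fn push(&mut self, priority: Priority, frame: Vec<u8>) {
        self.heap.push((priority, Reverse(self.next_seq), frame));
        self.next_seq += 1;
    }

    /// Dequeue the most urgent frame, if any.
    pub fn pop(&mut self) -> Option<(Priority, Vec<u8>)> {
        self.heap.pop().map(|(p, _, frame)| (p, frame))
    }
}
```

Strict priority like this can starve announces under sustained data load, so a production queue would probably need aging or a reserved share for low-priority traffic.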

Action Items

Add priority field to MeshEnvelope
Research custody transfer — is it worth the complexity?
Implement priority queue in MeshStore and DutyCycleTracker

Gap 4: Battery/Duty-Cycle Optimization

The Problem

Briar reportedly drains the battery about 4x faster because of constant Bluetooth scanning. We claim to be better but haven't measured it.

Current state:

DutyCycleTracker enforces EU868 1% limit
Announce interval is configurable (default 10 min)
No adaptive power management
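For reference, a 1% limit can be enforced with a rolling one-hour window of transmit records. This is a simplified sketch and not the existing DutyCycleTracker: the `DutyBudget` name, the millisecond timestamps, and the rolling-window model are all assumptions, and actual regulatory accounting may differ.

```rust
use std::collections::VecDeque;

/// Rolling-window transmit budget (illustrative, not the project's tracker).
pub struct DutyBudget {
    window_ms: u64,
    budget_ms: u64,
    log: VecDeque<(u64, u64)>, // (start_ms, airtime_ms), oldest first
}

impl DutyBudget {
    /// EU868 at 1%: 36 seconds of airtime per rolling hour.
    pub fn eu868_one_percent() -> Self {
        DutyBudget { window_ms: 3_600_000, budget_ms: 36_000, log: VecDeque::new() }
    }

    /// Airtime used inside the window ending at `now_ms`.
    fn used_ms(&mut self, now_ms: u64) -> u64 {
        // Forget transmissions that started a full window or more ago.
        while let Some(&(start, _)) = self.log.front() {
            if start + self.window_ms <= now_ms {
                self.log.pop_front();
            } else {
                break;
            }
        }
        self.log.iter().map(|&(_, airtime)| airtime).sum()
    }

    /// Record the transmission and return true if it fits the budget,
    /// otherwise leave the log unchanged and return false.
    pub fn try_transmit(&mut self, now_ms: u64, airtime_ms: u64) -> bool {
        if self.used_ms(now_ms) + airtime_ms > self.budget_ms {
            return false;
        }
        self.log.push_back((now_ms, airtime_ms));
        true
    }
}
```

Under this model two 14-second group setups fit in one hour, and a third is refused until the first ages out of the window.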

Proposed Improvements

Adaptive announce interval — more frequent when activity, less when idle
Listen-before-talk — don't TX if channel is busy (LoRa CAD)
Scheduled wake windows — coordinate with peers for efficient sync
Power profiles — "always-on", "hourly-sync", "manual-only"
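The adaptive announce interval could be as simple as halving on activity and doubling when idle. In the sketch below only the 10-minute default comes from this document; the function name and the 150-second and one-hour bounds are hypothetical values picked for illustration.

```rust
/// Default announce interval from the current configuration (10 min).
pub const DEFAULT_ANNOUNCE_SECS: u64 = 600;

/// Hypothetical bounds; real values would come from measurement.
const MIN_ANNOUNCE_SECS: u64 = 150;
const MAX_ANNOUNCE_SECS: u64 = 3_600;

/// Halve the interval when there was recent activity, double it when idle,
/// and keep the result within the configured bounds.
pub fn next_announce_interval(current_secs: u64, recent_activity: bool) -> u64 {
    let next = if recent_activity {
        current_secs / 2
    } else {
        current_secs.saturating_mul(2)
    };
    next.clamp(MIN_ANNOUNCE_SECS, MAX_ANNOUNCE_SECS)
}
```

Starting from `DEFAULT_ANNOUNCE_SECS`, an idle node backs off to 1200 and then 2400 seconds before capping at one hour, while an active node bottoms out at 150 seconds.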

Action Items

Implement CAD (Channel Activity Detection) in LoRaTransport
Add power profile config to P2pNode
Measure actual power consumption with real hardware

Gap 5: Real-World Testing

The Problem

All our mesh code runs against mocks. We claim LoRa support but haven't tested with real radios.

Testing Plan

Test	Hardware	Status
LoRa point-to-point	2x SX1262 dev boards	Not started
LoRa multi-hop	3x SX1262, different rooms	Not started
Mixed transport	LoRa + WiFi relay	Not started
Outdoor range test	LoRa, line-of-sight 1km	Not started
Duty cycle compliance	SDR spectrum analyzer	Not started

Action Items

Procure hardware — 3x Heltec LoRa32 or similar
Implement UART LoRaTransport for real modems
Create test harness for automated multi-node testing
Document actual performance numbers

Gap 6: Comparison Claims Need Verification

The Problem

Our positioning doc claims superiority over Meshtastic/Reticulum/Briar, but:

We haven't measured our actual overhead vs. theirs
We haven't tested interop scenarios
We haven't run security analysis against their threat models

Verification Plan

Claim	How to Verify
"MLS is better than shared-key AES"	Threat model comparison doc
"Multi-hop works"	Integration test with 5+ nodes
"LoRa-ready"	Actual LoRa hardware test
"Post-quantum protects groups"	Verify hybrid KEM in MLS path
"Relay nodes can't read content"	Formal verification of E2E path

Action Items

Create benchmark suite comparing message sizes
Write threat model comparison doc (Meshtastic CVEs, Reticulum link-level)
Fuzz test mesh envelope parsing
Get external review of mesh crypto design

Implementation Priority

Phase 1: Make It Work (Next 2 Sprints)

S4: Multi-hop routing — complete the core mesh functionality
S5: Truncated addresses — reduce envelope overhead
Measure actual sizes — know the real numbers

Phase 2: Make It Efficient (Following 2 Sprints)

Design MLS-Lite — spec for constrained links
Priority queue — emergency messages first
Hardware testing — real LoRa validation

Phase 3: Make It Production-Ready

KeyPackage distribution — mesh-native key exchange
Power profiles — battery optimization
External review — security audit of mesh layer

Success Metrics

Metric	Previous	Current	Target
MeshEnvelope overhead (empty)	~410 bytes	~336 bytes (V2)	✅ Done
MLS-Lite message (no sig)	N/A	~129 bytes	✅ Done
Time to send "hello" over SF12 LoRa	~27 sec	~4 sec (MLS-Lite)	✅ Done
KeyPackage exchange over mesh	Not possible	Pending	Works
Multi-hop message delivery	Mock only	Code complete	Real hardware
Battery life (mesh mode)	Unknown	Unknown	Measured

Honest Assessment

What we do well:

MLS group crypto is genuinely better than Meshtastic/Reticulum
Transport abstraction is clean
Announce protocol is solid
NEW: Classical MLS KeyPackage (306B) is actually LoRa-viable
NEW: MLS-Lite provides Meshtastic-level efficiency with real auth

What we still need to fix:

No solution for KeyPackage distribution without server
No real-world testing with actual LoRa hardware
Post-quantum hybrid mode too large for constrained links

What we can now claim:

"MLS on LoRa" — YES, classical MLS works with ~14 sec group setup
"MLS-Lite for constrained" — YES, ~2-4 sec messages with auth
"Post-quantum on LoRa" — NO, hybrid mode is impractical (2.6KB KeyPackage)
"Production-ready" — NO, still research-stage, pending hardware tests

Last updated: 2026-03-30
