# Mesh Protocol Gaps — Honest Assessment & Action Plan

> **Goal:** Identify real weaknesses in QuicProChat's mesh protocol compared to
> Reticulum, Meshtastic, and LXMF. Plan concrete improvements.
>
> Created: 2026-03-30

---

## Executive Summary

QuicProChat has strong cryptography (MLS, PQ-KEM) but **real gaps** in the mesh layer:

| Gap | Severity | Status |
|-----|----------|--------|
| MLS overhead too large for LoRa | **Critical** | Needs design work |
| No lightweight messaging mode | **High** | Not started |
| KeyPackage distribution over mesh | **High** | Not solved |
| Announce/routing not battle-tested | **Medium** | S3 done, needs real-world test |
| No DTN bundle protocol integration | **Medium** | Not started |
| Battery/duty-cycle optimization | **Medium** | Basic tracker exists |

---

## Gap 1: MLS Overhead is Prohibitive for Constrained Links

### The Problem

**MLS was designed for Internet messaging, not LoRa.**

Measured sizes (approximate):

| Component | Size (bytes) | LoRa SF12/BW125 airtime |
|-----------|--------------|------------------------|
| MLS KeyPackage | ~500-800 | 80-130 seconds |
| MLS Welcome | ~1000-2000 | 160-320 seconds |
| MLS Commit | ~200-500 | 32-80 seconds |
| MLS ApplicationMessage | ~100-200 | 16-32 seconds |
| **MeshEnvelope overhead** | ~170 (CBOR) | 27 seconds |
| **Reticulum LXMF message** | ~100-150 | 16-24 seconds |
| **Meshtastic payload** | ~237 max | 38 seconds |

**The math doesn't work:**

- LoRa SF12/BW125: ~51 byte MTU, ~300 bps raw bit rate (effective throughput is lower once preamble, header, and fragmentation overhead are counted)
- EU868 duty cycle: 1% = 36 seconds TX per hour
- **One MLS KeyPackage = 10-20 fragments = entire hour's duty budget**

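The budget arithmetic behind these bullets can be sketched as follows. This is an estimate under stated assumptions, not a measurement: the airtime function follows the time-on-air formula from the Semtech SX127x datasheet with coding rate 4/5, an 8-symbol preamble, explicit header, CRC on, and low-data-rate optimization whenever a symbol lasts longer than 16 ms. All function names are hypothetical, and fragments are assumed to carry no extra per-fragment header. Exact airtime depends on those radio settings, so results will not necessarily match the rough figures in the table.

```rust
/// Time-on-air in seconds for a single LoRa packet (Semtech SX127x formula).
/// Assumed settings: CR 4/5, 8-symbol preamble, explicit header, CRC on.
pub fn lora_airtime_secs(payload_len: u32, sf: u32, bw_hz: f64) -> f64 {
    let t_sym = f64::from(1u32 << sf) / bw_hz; // symbol duration in seconds
    // Low-data-rate optimization is used when a symbol lasts longer than 16 ms.
    let de: i64 = if t_sym > 0.016 { 1 } else { 0 };
    let cr: i64 = 1; // coding rate 4/5
    let sf = i64::from(sf);
    // 8*PL - 4*SF + 28 + 16 (CRC); the explicit-header term is zero.
    let numerator = 8 * i64::from(payload_len) - 4 * sf + 28 + 16;
    let denominator = 4 * (sf - 2 * de);
    let blocks = if numerator > 0 {
        (numerator + denominator - 1) / denominator // ceiling division
    } else {
        0
    };
    let payload_symbols = 8 + blocks * (cr + 4);
    let preamble_symbols = 8.0 + 4.25;
    (preamble_symbols + payload_symbols as f64) * t_sym
}

/// Number of packets needed to carry `len` bytes with `mtu`-byte payloads.
/// `mtu` must be non-zero.
pub fn fragments_needed(len: u32, mtu: u32) -> u32 {
    (len + mtu - 1) / mtu
}

/// Total airtime at SF12/BW125 for a message split into MTU-sized fragments.
pub fn message_airtime_secs(len: u32, mtu: u32) -> f64 {
    let full = len / mtu;
    let rest = len % mtu;
    let mut total = f64::from(full) * lora_airtime_secs(mtu, 12, 125_000.0);
    if rest > 0 {
        total += lora_airtime_secs(rest, 12, 125_000.0);
    }
    total
}

/// Transmit budget in seconds for a duty-cycle limit over a time window.
pub fn duty_budget_secs(duty_cycle: f64, window_secs: f64) -> f64 {
    duty_cycle * window_secs
}
```

Under these assumptions a 51-byte packet takes roughly 2.5 seconds on air. A 500-byte KeyPackage (10 fragments) then uses about two thirds of the 36-second hourly budget, and an 800-byte one (16 fragments) exceeds it.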
### Current State

- MeshEnvelope uses CBOR, ~170 bytes overhead for a short message
- MLS operations happen at application layer, not optimized for mesh
- No fallback to lighter crypto for constrained links

### Proposed Solutions

#### Option A: Hybrid Crypto Modes (Recommended)

```
┌─────────────────────────────────────────────────────────────────┐
│  Mode Selection Based on Transport Capability                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  QUIC/TCP/WiFi (>10 kbps):                                      │
│    → Full MLS groups with PQ-KEM                                │
│    → KeyPackage distribution via server                         │
│    → Standard protocol                                          │
│                                                                 │
│  LoRa/Serial (<1 kbps):                                         │
│    → "MLS-Lite" mode:                                           │
│      • Pre-shared group epoch key (exchanged out-of-band)       │
│      • ChaCha20-Poly1305 symmetric encryption                   │
│      • Ed25519 signatures (64 bytes)                            │
│      • No per-message KeyPackage exchange                       │
│      • Manual key rotation via QR code or faster link           │
│                                                                 │
│  Upgrade path:                                                  │
│    When faster transport available → full MLS epoch sync        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

**Trade-off:** Lose automatic post-compromise security (PCS) on constrained links. Gain usability.

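A minimal sketch of the mode selection in Option A is shown below. The type and function names are illustrative and not taken from the QuicProChat codebase. The diagram only specifies ">10 kbps" and "<1 kbps", so the handling of the 1-10 kbps range here (treated as constrained) is an assumption.

```rust
/// Crypto mode for a link. Names are illustrative, not QuicProChat's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CryptoMode {
    /// Full MLS groups with PQ-KEM.
    FullMls,
    /// Pre-shared epoch key, ChaCha20-Poly1305, Ed25519 signatures.
    MlsLite,
}

/// Links faster than this get full MLS (">10 kbps" in the diagram).
pub const FULL_MLS_MIN_BPS: u32 = 10_000;

/// Pick a mode from a link's estimated throughput in bits per second.
/// Assumption: anything at or below the threshold counts as constrained,
/// including the 1-10 kbps range the diagram leaves unspecified.
pub fn select_mode(link_bps: u32) -> CryptoMode {
    if link_bps > FULL_MLS_MIN_BPS {
        CryptoMode::FullMls
    } else {
        CryptoMode::MlsLite
    }
}

/// Upgrade path: with several transports available, decide by the fastest one.
/// Returns `None` when no transport is available.
pub fn select_mode_for(transports_bps: &[u32]) -> Option<CryptoMode> {
    transports_bps.iter().copied().max().map(select_mode)
}
```

With a LoRa link alone (`&[300]`) this yields `MlsLite`; once a WiFi link appears in the list, the same call yields `FullMls`, which is the point where a full MLS epoch sync could be triggered.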
#### Option B: Compressed MLS (Research)

- Strip unused extensions from KeyPackages
- Use shorter credential identifiers (16 bytes instead of 32)
- Batch multiple KeyPackages into single transfer over fast link
- Cache and reuse KeyPackages more aggressively

**Trade-off:** Still large. May not be enough for SF12 LoRa.

#### Option C: LXMF-Compatible Mode

Implement Reticulum's LXMF format as an alternative wire format:

```rust
/// LXMF-style message layout: two truncated hashes, a signature, and a payload.
pub struct LxmfMessage {
    destination: [u8; 16], // Truncated destination hash
    source: [u8; 16],      // Truncated source hash
    signature: [u8; 64],   // Ed25519 signature
    payload: Vec<u8>,      // msgpack-encoded: timestamp, content, title, fields
}
// Total: ~100-150 bytes for a short message
```

**Trade-off:** Lose MLS group properties. Gain Reticulum interop and efficiency.

### Action Items

- [ ] **Measure actual MLS sizes** in current implementation (benchmark)
- [ ] **Design MLS-Lite spec** for constrained links
- [ ] **Implement transport capability negotiation** in TransportManager
- [ ] **Add `--constrained` mode** to MeshEnvelope for minimal overhead

---

## Gap 2: KeyPackage Distribution Over Mesh

### The Problem

MLS requires pre-positioned KeyPackages for adding members to groups. On the
Internet, a server stores KeyPackages and clients fetch them on demand. On a
mesh there is **no server**.

Current flow (broken for pure mesh):
```
Alice wants to add Bob to group:
  1. Alice fetches Bob's KeyPackage from server  ← requires Internet
  2. Alice creates Welcome + Commit
  3. Alice sends to Bob via mesh
```

### Proposed Solution: Announce-Based KeyPackage Distribution

```
Bob announces on mesh:
  1. MeshAnnounce includes: identity_key, capabilities, AND current_keypackage_hash
  2. Nearby nodes cache Bob's latest KeyPackage (if they have it)
  3. Alice receives Bob's announce, requests KeyPackage via mesh RPC

KeyPackage propagation:
  1. Bob periodically broadcasts KeyPackage update (larger message, less frequent)
  2. Nodes with capacity (CAP_STORE) cache KeyPackages for relaying
  3. TTL-based expiry (KeyPackages are single-use, but we can cache N of them)
```

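The caching step on relay nodes could look roughly like the sketch below. It is a hypothetical design, not the existing MeshStore: `KeyPackageCache`, `PeerId`, and the method names are invented for illustration, KeyPackages are treated as opaque bytes, and time is passed in as seconds so the expiry logic stays deterministic.

```rust
use std::collections::{HashMap, VecDeque};

/// Truncated peer address (hypothetical; 16 bytes chosen for illustration).
pub type PeerId = [u8; 16];

struct CachedKeyPackage {
    bytes: Vec<u8>,
    expires_at: u64, // seconds, same clock as the `now` arguments
}

/// Per-peer KeyPackage cache with TTL expiry and at most N entries per peer.
pub struct KeyPackageCache {
    max_per_peer: usize,
    ttl_secs: u64,
    entries: HashMap<PeerId, VecDeque<CachedKeyPackage>>,
}

impl KeyPackageCache {
    pub fn new(max_per_peer: usize, ttl_secs: u64) -> Self {
        Self { max_per_peer, ttl_secs, entries: HashMap::new() }
    }

    /// Store a KeyPackage received at time `now`, evicting the oldest when full.
    pub fn insert(&mut self, peer: PeerId, bytes: Vec<u8>, now: u64) {
        if self.max_per_peer == 0 {
            return;
        }
        let queue = self.entries.entry(peer).or_default();
        if queue.len() >= self.max_per_peer {
            queue.pop_front();
        }
        queue.push_back(CachedKeyPackage { bytes, expires_at: now + self.ttl_secs });
    }

    /// Hand out one unexpired KeyPackage and remove it, since each
    /// KeyPackage is single-use. Expired entries are dropped along the way.
    pub fn take(&mut self, peer: &PeerId, now: u64) -> Option<Vec<u8>> {
        let queue = self.entries.get_mut(peer)?;
        while let Some(kp) = queue.pop_front() {
            if kp.expires_at > now {
                return Some(kp.bytes);
            }
        }
        None
    }
}
```

Removing an entry on `take` reflects the single-use property noted above; a relay that answered two requests with the same KeyPackage would undermine it.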
### Action Items

- [ ] **Extend MeshAnnounce** with optional `keypackage_hash` field
- [ ] **Add KeyPackage request/response** to mesh protocol
- [ ] **Implement KeyPackage cache** in MeshStore (separate from message queue)
- [ ] **Design KeyPackage refresh protocol** for mesh-only scenarios

---

## Gap 3: No DTN/Bundle Protocol Integration

### The Problem

The NASA/IETF Bundle Protocol version 7 (BPv7, RFC 9171) is the standard for
delay-tolerant networking (DTN). Reticulum effectively reinvented it.
QuicProChat should learn from both.

Key DTN concepts we're missing:

| Concept | DTN/BPv7 | Reticulum | QuicProChat |
|---------|----------|-----------|-------------|
| **Custody transfer** | Yes | No | No |
| **Fragmentation at bundle layer** | Yes | No | Yes (LoRa transport) |
| **Convergence layer adapters** | Formal spec | Interfaces | MeshTransport trait |
| **Routing protocols** | CGR, Epidemic | Announce-based | Announce-based |
| **Priority scheduling** | Yes | No | No |

### Proposed Improvements

1. **Priority levels in MeshEnvelope** (emergency > data > announce)
2. **Custody transfer option** — intermediate node takes responsibility
3. **Better congestion control** — backpressure signals in announce

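The first improvement can be illustrated with a small outbound queue that drains emergency traffic first and keeps FIFO order within each level. This is a sketch only: `Priority` and `PriorityOutbox` are hypothetical names, not existing MeshStore or DutyCycleTracker types.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Priority levels, lowest to highest (emergency > data > announce).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority {
    Announce,
    Data,
    Emergency,
}

#[derive(Debug, PartialEq, Eq)]
struct Queued {
    priority: Priority,
    seq: u64, // insertion counter, used to keep FIFO order within a level
    payload: Vec<u8>,
}

impl Ord for Queued {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher priority first; within a level, the lower sequence number wins.
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.seq.cmp(&self.seq))
    }
}

impl PartialOrd for Queued {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Outbound queue that always yields the most urgent pending message.
#[derive(Default)]
pub struct PriorityOutbox {
    heap: BinaryHeap<Queued>,
    next_seq: u64,
}

impl PriorityOutbox {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn push(&mut self, priority: Priority, payload: Vec<u8>) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.heap.push(Queued { priority, seq, payload });
    }

    pub fn pop(&mut self) -> Option<(Priority, Vec<u8>)> {
        self.heap.pop().map(|q| (q.priority, q.payload))
    }
}
```

On a duty-cycle-limited link, the sender would call `pop` only when transmit budget is available, so queued announces wait behind data and emergency messages.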
### Action Items

- [ ] **Add priority field** to MeshEnvelope
- [ ] **Research custody transfer** — is it worth the complexity?
- [ ] **Implement priority queue** in MeshStore and DutyCycleTracker

---

## Gap 4: Battery/Duty-Cycle Optimization

### The Problem

Briar drains battery about 4x faster due to constant Bluetooth scanning. We
claim to be better but haven't proven it.

Current state:
- DutyCycleTracker enforces EU868 1% limit
- Announce interval is configurable (default 10 min)
- No adaptive power management

### Proposed Improvements

1. **Adaptive announce interval** — more frequent during activity, less frequent when idle
2. **Listen-before-talk** — don't TX if channel is busy (LoRa CAD)
3. **Scheduled wake windows** — coordinate with peers for efficient sync
4. **Power profiles** — "always-on", "hourly-sync", "manual-only"

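One way the adaptive interval and the power profiles could fit together is sketched below. Everything in it is an assumption made for illustration: the `PowerProfile` enum, the halve-on-activity and double-when-idle policy, and the 1-minute and 1-hour bounds are not taken from the current P2pNode configuration.

```rust
use std::time::Duration;

/// Power profiles from the list above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PowerProfile {
    AlwaysOn,
    HourlySync,
    ManualOnly,
}

/// Assumed bounds for the adaptive interval.
pub const MIN_INTERVAL: Duration = Duration::from_secs(60);
pub const MAX_INTERVAL: Duration = Duration::from_secs(3600);

/// Compute the next announce interval from the current one.
/// Returns `None` when announces are only sent on explicit user request.
pub fn next_announce_interval(
    profile: PowerProfile,
    current: Duration,
    recent_messages: u32,
) -> Option<Duration> {
    match profile {
        PowerProfile::ManualOnly => None,
        PowerProfile::HourlySync => Some(MAX_INTERVAL),
        PowerProfile::AlwaysOn => {
            // Assumed policy: halve the interval after activity, double it when idle.
            let next = if recent_messages > 0 { current / 2 } else { current * 2 };
            Some(next.clamp(MIN_INTERVAL, MAX_INTERVAL))
        }
    }
}
```

Starting from the 10-minute default, an active node would move to 5 minutes and an idle one to 20 minutes, never leaving the assumed 1-minute to 1-hour range.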
### Action Items

- [ ] **Implement CAD (Channel Activity Detection)** in LoRaTransport
- [ ] **Add power profile config** to P2pNode
- [ ] **Measure actual power consumption** with real hardware

---

## Gap 5: Real-World Testing

### The Problem

All our mesh code runs against mocks. We claim LoRa support but haven't tested
with real radios.

### Testing Plan

| Test | Hardware | Status |
|------|----------|--------|
| LoRa point-to-point | 2x SX1262 dev boards | Not started |
| LoRa multi-hop | 3x SX1262, different rooms | Not started |
| Mixed transport | LoRa + WiFi relay | Not started |
| Outdoor range test | LoRa, line-of-sight 1km | Not started |
| Duty cycle compliance | SDR spectrum analyzer | Not started |

### Action Items

- [ ] **Procure hardware** — 3x Heltec LoRa32 or similar
- [ ] **Implement UART LoRaTransport** for real modems
- [ ] **Create test harness** for automated multi-node testing
- [ ] **Document actual performance** numbers

---

## Gap 6: Comparison Claims Need Verification

### The Problem

Our positioning doc claims superiority over Meshtastic/Reticulum/Briar, but:

- We haven't measured our actual overhead vs. theirs
- We haven't tested interop scenarios
- We haven't run security analysis against their threat models

### Verification Plan

| Claim | How to Verify |
|-------|---------------|
| "MLS is better than shared-key AES" | Threat model comparison doc |
| "Multi-hop works" | Integration test with 5+ nodes |
| "LoRa-ready" | Actual LoRa hardware test |
| "Post-quantum protects groups" | Verify hybrid KEM in MLS path |
| "Relay nodes can't read content" | Formal verification of E2E path |

### Action Items

- [ ] **Create benchmark suite** comparing message sizes
- [ ] **Write threat model comparison** doc (Meshtastic CVEs, Reticulum link-level)
- [ ] **Fuzz test** mesh envelope parsing
- [ ] **Get external review** of mesh crypto design

---

## Implementation Priority

### Phase 1: Make It Work (Next 2 Sprints)

1. **S4: Multi-hop routing** — complete the core mesh functionality
2. **S5: Truncated addresses** — reduce envelope overhead
3. **Measure actual sizes** — know the real numbers

### Phase 2: Make It Efficient (Following 2 Sprints)

4. **Design MLS-Lite** — spec for constrained links
5. **Priority queue** — emergency messages first
6. **Hardware testing** — real LoRa validation

### Phase 3: Make It Production-Ready

7. **KeyPackage distribution** — mesh-native key exchange
8. **Power profiles** — battery optimization
9. **External review** — security audit of mesh layer

---

## Success Metrics

| Metric | Current | Target |
|--------|---------|--------|
| MeshEnvelope overhead (short msg) | ~170 bytes | <100 bytes |
| Time to send "hello" over SF12 LoRa | ~27 sec | <15 sec |
| KeyPackage exchange over mesh | Not possible | Works |
| Multi-hop message delivery | Mock only | Real hardware |
| Battery life (mesh mode) | Unknown | Measured & documented |

---

## Honest Assessment

**What we do well:**
- MLS group crypto is genuinely better than Meshtastic/Reticulum
- Transport abstraction is clean
- Announce protocol is solid

**What we need to fix:**
- MLS overhead makes LoRa impractical for group setup
- No solution for KeyPackage distribution without server
- No real-world testing yet

**What we should acknowledge in marketing:**
- "Best crypto for mesh" is true, but with caveats
- "LoRa-ready" means "designed for LoRa, pending optimization"
- We're research-stage, not production-ready

---

*Last updated: 2026-03-30*