docs: add mesh protocol gap analysis and MLS-Lite design

Honest assessment of QuicProChat vs Reticulum/Meshtastic/Briar:
- MLS overhead (500-800 byte KeyPackages) impractical for SF12 LoRa
- KeyPackage distribution over mesh unsolved
- No lightweight mode for constrained links

MLS-Lite design proposes 41-byte overhead symmetric mode:
- ChaCha20-Poly1305 with HKDF key derivation
- Optional Ed25519 signatures
- Upgrade path to full MLS when faster transport available
- QR code / out-of-band key exchange
2026-03-30 23:29:44 +02:00
parent f9ac921a0c
commit 01bc2a4273
2 changed files with 648 additions and 0 deletions

@@ -0,0 +1,323 @@
# Mesh Protocol Gaps — Honest Assessment & Action Plan
> **Goal:** Identify real weaknesses in QuicProChat's mesh protocol compared to
> Reticulum, Meshtastic, and LXMF. Plan concrete improvements.
>
> Created: 2026-03-30
---
## Executive Summary
QuicProChat has strong cryptography (MLS, PQ-KEM) but **real gaps** in the mesh layer:
| Gap | Severity | Status |
|-----|----------|--------|
| MLS overhead too large for LoRa | **Critical** | Needs design work |
| No lightweight messaging mode | **High** | Not started |
| KeyPackage distribution over mesh | **High** | Not solved |
| Announce/routing not battle-tested | **Medium** | S3 done, needs real-world test |
| No DTN bundle protocol integration | **Medium** | Not started |
| Battery/duty-cycle optimization | **Medium** | Basic tracker exists |
---
## Gap 1: MLS Overhead is Prohibitive for Constrained Links
### The Problem
**MLS was designed for Internet messaging, not LoRa.**
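Estimated sizes (not yet benchmarked; see Action Items):
| Component | Size (bytes) | SF12 frames (51 B) | LoRa SF12/BW125 airtime |
|-----------|--------------|--------------------|-------------------------|
| MLS KeyPackage | ~500-800 | 10-16 | ~24-39 seconds |
| MLS Welcome | ~1000-2000 | 20-40 | ~49-97 seconds |
| MLS Commit | ~200-500 | 4-10 | ~10-24 seconds |
| MLS ApplicationMessage | ~100-200 | 2-4 | ~5-10 seconds |
| **MeshEnvelope overhead** | ~170 (CBOR) | 4 | ~9 seconds |
| **Reticulum LXMF message** | ~100-150 | 2-3 | ~5-7 seconds |
| **Meshtastic payload** | ~237 max | 1 (single LoRa packet) | ~9 seconds |

Airtime uses the Semtech time-on-air formula with CR 4/5, an 8-symbol preamble, explicit header, CRC on and low-data-rate optimization on. It ignores per-fragment headers, so real figures will be somewhat higher.

**The math doesn't work:**
- LoRa SF12/BW125: ~51-byte MTU (the EU868 LoRaWAN DR0 payload limit; the LoRa PHY itself allows up to 255 bytes), ~293 bps raw bit rate, about 2.5 s of airtime per full 51-byte frame
- EU868 duty cycle: 1% = 36 seconds TX per hour
- **One MLS KeyPackage = 10-16 fragments ≈ 24-39 seconds of airtime = roughly the entire hour's duty budget**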
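
The airtime figures above follow from the Semtech LoRa time-on-air formula. The sketch below reproduces that arithmetic. It assumes CR 4/5, an 8-symbol preamble, explicit header, CRC on and low-data-rate optimization on, and it ignores per-fragment headers. `LoraParams`, `fragmented_airtime` and `min_wall_clock` are illustrative names, not existing QuicProChat code.

```rust
/// LoRa modem settings for the Semtech time-on-air formula.
pub struct LoraParams {
    pub sf: u32,               // spreading factor, 7..=12
    pub bw_hz: f64,            // bandwidth in Hz
    pub cr: u32,               // coding rate index: 1 = 4/5 ... 4 = 4/8
    pub preamble: u32,         // programmed preamble symbols
    pub explicit_header: bool,
    pub crc: bool,
    pub low_dr_opt: bool,      // required for SF11/SF12 at 125 kHz
}

impl LoraParams {
    /// SF12/BW125 settings assumed in the table above.
    pub fn sf12_bw125() -> Self {
        LoraParams { sf: 12, bw_hz: 125_000.0, cr: 1, preamble: 8,
                     explicit_header: true, crc: true, low_dr_opt: true }
    }

    /// Airtime in seconds of one LoRa packet carrying `payload_len` bytes.
    pub fn time_on_air(&self, payload_len: usize) -> f64 {
        let t_sym = f64::from(1u32 << self.sf) / self.bw_hz;
        let t_preamble = (f64::from(self.preamble) + 4.25) * t_sym;
        let ih: i64 = if self.explicit_header { 0 } else { 1 };
        let crc: i64 = if self.crc { 1 } else { 0 };
        let de: i64 = if self.low_dr_opt { 1 } else { 0 };
        let sf = i64::from(self.sf);
        let num = 8 * payload_len as i64 - 4 * sf + 28 + 16 * crc - 20 * ih;
        let den = 4 * (sf - 2 * de);
        let blocks = if num > 0 { (num + den - 1) / den } else { 0 }; // ceil
        let n_payload = 8 + blocks * (i64::from(self.cr) + 4);
        t_preamble + n_payload as f64 * t_sym
    }

    /// (fragment count, total airtime) for `total_len` bytes split into
    /// frames of at most `mtu` bytes; fragment headers are ignored.
    pub fn fragmented_airtime(&self, total_len: usize, mtu: usize) -> (usize, f64) {
        let (full, rest) = (total_len / mtu, total_len % mtu);
        let mut airtime = full as f64 * self.time_on_air(mtu);
        if rest > 0 {
            airtime += self.time_on_air(rest);
        }
        (full + usize::from(rest > 0), airtime)
    }
}

/// Minimum wall-clock seconds to send `airtime_s` under a duty-cycle limit.
pub fn min_wall_clock(airtime_s: f64, duty_cycle: f64) -> f64 {
    airtime_s / duty_cycle
}
```

With these settings a full 51-byte frame takes about 2.47 s. A 500-byte KeyPackage (10 frames) therefore needs roughly 24 s of airtime. At a 1% duty cycle that is about 40 minutes of wall-clock time.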
### Current State
- MeshEnvelope uses CBOR, ~170 bytes overhead for a short message
- MLS operations happen at application layer, not optimized for mesh
- No fallback to lighter crypto for constrained links
### Proposed Solutions
#### Option A: Hybrid Crypto Modes (Recommended)
```
┌────────────────────────────────────────────────────────────┐
│ Mode Selection Based on Transport Capability               │
├────────────────────────────────────────────────────────────┤
│                                                            │
│ QUIC/TCP/WiFi (>10 kbps):                                  │
│   → Full MLS groups with PQ-KEM                            │
│   → KeyPackage distribution via server                     │
│   → Standard protocol                                      │
│                                                            │
│ LoRa/Serial (<1 kbps):                                     │
│   → "MLS-Lite" mode:                                       │
│     • Pre-shared group epoch key (exchanged out-of-band)   │
│     • ChaCha20-Poly1305 symmetric encryption               │
│     • Optional Ed25519 signatures (64 bytes)               │
│     • No per-message KeyPackage exchange                   │
│     • Manual key rotation via QR code or faster link       │
│                                                            │
│ Upgrade path:                                              │
│   When faster transport available → full MLS epoch sync    │
│                                                            │
└────────────────────────────────────────────────────────────┘
```
**Trade-off:** Lose automatic post-compromise security (PCS) on constrained links. Gain messages small enough for LoRa.
#### Option B: Compressed MLS (Research)
- Strip unused extensions from KeyPackages
- Use shorter credential identifiers (16 bytes instead of 32)
- Batch multiple KeyPackages into single transfer over fast link
- Cache and reuse KeyPackages more aggressively
**Trade-off:** Still large. May not be enough for SF12 LoRa.
#### Option C: LXMF-Compatible Mode
Implement Reticulum's LXMF format as an alternative wire format:
```rust
pub struct LxmfMessage {
    pub destination: [u8; 16], // truncated destination hash
    pub source: [u8; 16],      // truncated source hash
    pub signature: [u8; 64],   // Ed25519
    pub payload: Vec<u8>,      // msgpack list: [timestamp, title, content, fields]
}
// Fixed part: 16 + 16 + 64 = 96 bytes; total ~100-150 bytes for a short message
```
**Trade-off:** Lose MLS group properties. Gain Reticulum interop and efficiency.
### Action Items
- [ ] **Measure actual MLS sizes** in current implementation (benchmark)
- [ ] **Design MLS-Lite spec** for constrained links
- [ ] **Implement transport capability negotiation** in TransportManager
- [ ] **Add `--constrained` mode** to MeshEnvelope for minimal overhead
---
## Gap 2: KeyPackage Distribution Over Mesh
### The Problem
MLS requires pre-positioned KeyPackages for adding members to groups. On Internet:
server stores KeyPackages, clients fetch on demand. On mesh: **no server**.
Current flow (broken for pure mesh):
```
Alice wants to add Bob to group:
  1. Alice fetches Bob's KeyPackage from server   ← requires Internet
  2. Alice creates Welcome + Commit
  3. Alice sends the Welcome to Bob (and the Commit to existing members) via mesh
```
### Proposed Solution: Announce-Based KeyPackage Distribution
```
Bob announces on mesh:
1. MeshAnnounce includes: identity_key, capabilities, AND current_keypackage_hash
2. Nearby nodes cache Bob's latest KeyPackage (if they have it)
3. Alice receives Bob's announce, requests KeyPackage via mesh RPC
KeyPackage propagation:
1. Bob periodically broadcasts KeyPackage update (larger message, less frequent)
2. Nodes with capacity (CAP_STORE) cache KeyPackages for relaying
3. TTL-based expiry (KeyPackages are single-use, but we can cache N of them)
```
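
A relay that follows these rules needs a bounded, expiring store. The sketch below assumes 16-byte truncated identity hashes as keys and stores KeyPackages as opaque serialized bytes. `KeyPackageCache`, `insert` and `take` are hypothetical names. `take` deletes what it hands out because RFC 9420 KeyPackages are meant to be used only once, so a relay should not serve the same one to two requesters.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Relay-side cache of serialized KeyPackages, keyed by truncated identity hash.
pub struct KeyPackageCache {
    max_per_identity: usize,
    ttl: Duration,
    entries: HashMap<[u8; 16], VecDeque<(Instant, Vec<u8>)>>,
}

impl KeyPackageCache {
    pub fn new(max_per_identity: usize, ttl: Duration) -> Self {
        Self { max_per_identity, ttl, entries: HashMap::new() }
    }

    /// Cache a KeyPackage, evicting the oldest one when the per-identity cap is hit.
    pub fn insert(&mut self, identity: [u8; 16], key_package: Vec<u8>, now: Instant) {
        let queue = self.entries.entry(identity).or_default();
        if queue.len() >= self.max_per_identity {
            queue.pop_front();
        }
        queue.push_back((now, key_package));
    }

    /// Drop expired entries, then hand out (and remove) the oldest remaining one,
    /// so this node never serves the same single-use KeyPackage twice.
    pub fn take(&mut self, identity: &[u8; 16], now: Instant) -> Option<Vec<u8>> {
        let ttl = self.ttl;
        let queue = self.entries.get_mut(identity)?;
        queue.retain(|(stored_at, _)| now.duration_since(*stored_at) < ttl);
        queue.pop_front().map(|(_, key_package)| key_package)
    }
}
```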
### Action Items
- [ ] **Extend MeshAnnounce** with optional `keypackage_hash` field
- [ ] **Add KeyPackage request/response** to mesh protocol
- [ ] **Implement KeyPackage cache** in MeshStore (separate from message queue)
- [ ] **Design KeyPackage refresh protocol** for mesh-only scenarios
---
## Gap 3: No DTN/Bundle Protocol Integration
### The Problem
NASA/IETF Bundle Protocol (RFC 9171) is the standard for delay-tolerant networking.
Reticulum effectively reinvented it. QuicProChat should learn from both.
Key DTN concepts we're missing:
| Concept | DTN/BPv7 | Reticulum | QuicProChat |
|---------|----------|-----------|-------------|
| **Custody transfer** | Yes | No | No |
| **Fragmentation at bundle layer** | Yes | No | No (fragments only in the LoRa transport) |
| **Convergence layer adapters** | Formal spec | Interfaces | MeshTransport trait |
| **Routing protocols** | CGR, Epidemic | Announce-based | Announce-based |
| **Priority scheduling** | Yes | No | No |
### Proposed Improvements
1. **Priority levels in MeshEnvelope** (emergency > data > announce)
2. **Custody transfer option** — intermediate node takes responsibility
3. **Better congestion control** — backpressure signals in announce
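
Priority levels only help if the outbound queue honors them. The sketch below uses `std::collections::BinaryHeap`. It assumes three classes ordered emergency > data > announce, with FIFO order inside each class. `Priority` and `OutboundQueue` are hypothetical names, not the current MeshStore API.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Declaration order defines the derived Ord: Announce < Data < Emergency.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Priority { Announce, Data, Emergency }

#[derive(Debug, PartialEq, Eq)]
struct Queued { priority: Priority, seq: u64, payload: Vec<u8> }

impl Ord for Queued {
    fn cmp(&self, other: &Self) -> Ordering {
        // Higher priority first; within a class, the older (lower seq) entry wins.
        self.priority.cmp(&other.priority).then_with(|| other.seq.cmp(&self.seq))
    }
}

impl PartialOrd for Queued {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Max-heap of pending envelopes; `next_seq` preserves arrival order.
#[derive(Default)]
pub struct OutboundQueue { heap: BinaryHeap<Queued>, next_seq: u64 }

impl OutboundQueue {
    pub fn push(&mut self, priority: Priority, payload: Vec<u8>) {
        self.heap.push(Queued { priority, seq: self.next_seq, payload });
        self.next_seq += 1;
    }

    pub fn pop(&mut self) -> Option<(Priority, Vec<u8>)> {
        self.heap.pop().map(|q| (q.priority, q.payload))
    }
}
```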
### Action Items
- [ ] **Add priority field** to MeshEnvelope
- [ ] **Research custody transfer** — is it worth the complexity?
- [ ] **Implement priority queue** in MeshStore and DutyCycleTracker
---
## Gap 4: Battery/Duty-Cycle Optimization
### The Problem
Briar reportedly drains battery about 4x faster because of constant Bluetooth scanning. We claim to be better but
haven't proven it.
Current state:
- DutyCycleTracker enforces EU868 1% limit
- Announce interval is configurable (default 10 min)
- No adaptive power management
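
A 1% limit can be enforced as a sliding one-hour airtime budget. The sketch below is an assumption about how such a tracker might work, not a description of the existing DutyCycleTracker. `DutyCycleWindow` and `try_transmit` are illustrative names, and regulatory details such as per-sub-band limits are ignored.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window airtime budget, e.g. 1% of any one-hour window.
pub struct DutyCycleWindow {
    window: Duration,
    budget: Duration,
    history: VecDeque<(Instant, Duration)>, // (transmit start, airtime)
}

impl DutyCycleWindow {
    pub fn new(window: Duration, duty_cycle: f64) -> Self {
        Self { window, budget: window.mul_f64(duty_cycle), history: VecDeque::new() }
    }

    /// Forget transmissions older than the window, then sum what is left.
    fn used(&mut self, now: Instant) -> Duration {
        while let Some(&(started, _)) = self.history.front() {
            if now.duration_since(started) >= self.window {
                self.history.pop_front();
            } else {
                break;
            }
        }
        self.history.iter().map(|&(_, airtime)| airtime).sum()
    }

    /// Record and allow the transmission only if it fits the remaining budget.
    pub fn try_transmit(&mut self, now: Instant, airtime: Duration) -> bool {
        if self.used(now) + airtime <= self.budget {
            self.history.push_back((now, airtime));
            true
        } else {
            false
        }
    }
}
```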
### Proposed Improvements
1. **Adaptive announce interval** — more frequent when activity, less when idle
2. **Listen-before-talk** — don't TX if channel is busy (LoRa CAD)
3. **Scheduled wake windows** — coordinate with peers for efficient sync
4. **Power profiles** — "always-on", "hourly-sync", "manual-only"
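
The sketch below combines items 1 and 4. Each power profile sets an idle announce interval, and recent activity shortens that interval down to a floor. Manual-only mode never announces on its own. The 10-minute idle value matches the current default. The hourly value, the floors, the halving rule and all names are assumptions.

```rust
use std::time::Duration;

/// Power profiles named in the list above; their exact semantics are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PowerProfile { AlwaysOn, HourlySync, ManualOnly }

/// Halve the idle interval for each recent message, never going below the
/// profile's floor; `None` means "do not announce automatically".
pub fn announce_interval(profile: PowerProfile, recent_msgs: u32) -> Option<Duration> {
    let (idle, floor) = match profile {
        PowerProfile::AlwaysOn => (Duration::from_secs(600), Duration::from_secs(60)),
        PowerProfile::HourlySync => (Duration::from_secs(3600), Duration::from_secs(600)),
        PowerProfile::ManualOnly => return None,
    };
    let shift = recent_msgs.min(16); // cap the shift so it cannot overflow
    let scaled = idle / (1u32 << shift);
    Some(scaled.max(floor))
}
```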
### Action Items
- [ ] **Implement CAD (Channel Activity Detection)** in LoRaTransport
- [ ] **Add power profile config** to P2pNode
- [ ] **Measure actual power consumption** with real hardware
---
## Gap 5: Real-World Testing
### The Problem
All our mesh code runs against mocks. We claim LoRa support but haven't tested
with real radios.
### Testing Plan
| Test | Hardware | Status |
|------|----------|--------|
| LoRa point-to-point | 2x SX1262 dev boards | Not started |
| LoRa multi-hop | 3x SX1262, different rooms | Not started |
| Mixed transport | LoRa + WiFi relay | Not started |
| Outdoor range test | LoRa, line-of-sight 1km | Not started |
| Duty cycle compliance | SDR spectrum analyzer | Not started |
### Action Items
- [ ] **Procure hardware** — 3x Heltec LoRa32 or similar
- [ ] **Implement UART LoRaTransport** for real modems
- [ ] **Create test harness** for automated multi-node testing
- [ ] **Document actual performance** numbers
---
## Gap 6: Comparison Claims Need Verification
### The Problem
Our positioning doc claims superiority over Meshtastic/Reticulum/Briar, but:
- We haven't measured our actual overhead vs. theirs
- We haven't tested interop scenarios
- We haven't run security analysis against their threat models
### Verification Plan
| Claim | How to Verify |
|-------|---------------|
| "MLS is better than shared-key AES" | Threat model comparison doc |
| "Multi-hop works" | Integration test with 5+ nodes |
| "LoRa-ready" | Actual LoRa hardware test |
| "Post-quantum protects groups" | Verify hybrid KEM in MLS path |
| "Relay nodes can't read content" | Formal verification of E2E path |
### Action Items
- [ ] **Create benchmark suite** comparing message sizes
- [ ] **Write threat model comparison** doc (Meshtastic CVEs, Reticulum link-level)
- [ ] **Fuzz test** mesh envelope parsing
- [ ] **Get external review** of mesh crypto design
---
## Implementation Priority
### Phase 1: Make It Work (Next 2 Sprints)
1. **S4: Multi-hop routing** — complete the core mesh functionality
2. **S5: Truncated addresses** — reduce envelope overhead
3. **Measure actual sizes** — know the real numbers
### Phase 2: Make It Efficient (Following 2 Sprints)
4. **Design MLS-Lite** — spec for constrained links
5. **Priority queue** — emergency messages first
6. **Hardware testing** — real LoRa validation
### Phase 3: Make It Production-Ready
7. **KeyPackage distribution** — mesh-native key exchange
8. **Power profiles** — battery optimization
9. **External review** — security audit of mesh layer
---
## Success Metrics
| Metric | Current | Target |
|--------|---------|--------|
| MeshEnvelope overhead (short msg) | ~170 bytes | <100 bytes |
| Airtime to send "hello" over SF12 LoRa | ~9 sec (4 frames) | <5 sec (≤2 frames) |
| KeyPackage exchange over mesh | Not possible | Works |
| Multi-hop message delivery | Mock only | Real hardware |
| Battery life (mesh mode) | Unknown | Measured & documented |
---
## Honest Assessment
**What we do well:**
- MLS group crypto offers post-compromise security and authenticated membership, which Meshtastic's and Reticulum's shared-key groups lack
- Transport abstraction is clean
- Announce protocol is solid
**What we need to fix:**
- MLS overhead makes LoRa impractical for group setup
- No solution for KeyPackage distribution without server
- No real-world testing yet
**What we should acknowledge in marketing:**
- "Best crypto for mesh" is true, but with caveats
- "LoRa-ready" means "designed for LoRa, pending optimization"
- We're research-stage, not production-ready
---
*Last updated: 2026-03-30*

@@ -0,0 +1,325 @@
# MLS-Lite: Lightweight Crypto for Constrained Mesh Links
> **Goal:** Define a symmetric encryption mode that works on LoRa SF12 (51-byte MTU)
> while preserving as much MLS security as possible and enabling upgrade to full MLS
> when faster transports are available.
>
> Created: 2026-03-30 | Status: Design Draft
---
## Problem Statement
Full MLS is impractical on constrained links:
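| MLS Operation | Size (bytes) | SF12 Fragments | Airtime | Wall-clock at 1% duty |
|---------------|--------------|----------------|---------|-----------------------|
| KeyPackage | 500-800 | 10-16 | ~24-39 s | ~40-65 minutes |
| Welcome | 1000-2000 | 20-40 | ~49-97 s | ~1.4-2.7 hours |
| Commit | 200-500 | 4-10 | ~10-24 s | ~16-40 minutes |
| AppMessage | 100-200 | 2-4 | ~5-10 s | ~8-16 minutes |

Airtime assumes SF12/BW125 with CR 4/5 and 51-byte fragments. Wall-clock time is airtime divided by the 1% duty cycle, ignoring retransmissions and relaying.

**Result:** Adding one member (KeyPackage + Welcome + Commit) costs roughly 2-4.5 hours of duty-cycle budget, and every short message costs several minutes. Unusable for interactive messaging.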
---
## Design Goals
1. **Short message overhead:** <50 bytes for a "hello" message (fits SF12 MTU unfragmented)
2. **Group encryption:** Shared symmetric key, not just link encryption
3. **Sender authentication:** Ed25519 signature (64 bytes, fragmentable)
4. **Upgrade path:** Seamless transition to full MLS when faster link available
5. **No KeyPackage exchange:** Use pre-shared secrets or out-of-band key exchange
---
## MLS-Lite Protocol
### Mode Selection
```
┌────────────────────────────────────────────────────────────┐
│ TransportManager                                           │
├────────────────────────────────────────────────────────────┤
│ On send(destination, payload):                             │
│                                                            │
│   1. Check best route to destination                       │
│   2. Get transport bitrate:                                │
│      - QUIC/TCP (>10 kbps)      → full MLS                 │
│      - LoRa SF7-9 (1-10 kbps)   → MLS-Lite + signatures    │
│      - LoRa SF10-12 (<1 kbps)   → MLS-Lite, no signatures  │
│                                                            │
│   3. Wrap payload in appropriate envelope                  │
│   4. Fragment if needed for transport MTU                  │
│                                                            │
└────────────────────────────────────────────────────────────┘
```
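
The bitrate tiers in the diagram map directly onto a small decision function. The sketch below assumes TransportManager can report the bitrate of the chosen route; how it learns that value is not specified here. `CryptoMode` and `select_mode` are hypothetical names.

```rust
/// Crypto mode chosen per route, following the tiers in the diagram above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CryptoMode {
    FullMls,
    LiteSigned,   // MLS-Lite + Ed25519 signature
    LiteUnsigned, // MLS-Lite, group authentication only
}

/// `bitrate_bps` is the link bitrate of the best route to the destination.
pub fn select_mode(bitrate_bps: u32) -> CryptoMode {
    match bitrate_bps {
        b if b > 10_000 => CryptoMode::FullMls,
        1_000..=10_000 => CryptoMode::LiteSigned,
        _ => CryptoMode::LiteUnsigned,
    }
}
```

Keying on bitrate rather than spreading factor keeps the policy transport-agnostic. LoRa SF7/BW125 (~5.5 kbps) lands in the signed tier, and SF12/BW125 (~293 bps) lands in the unsigned one.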
### MLS-Lite Envelope (Minimal Mode)
For SF12 LoRa where every byte counts:
```rust
pub struct MlsLiteEnvelope {
    // Header: 25 bytes
    pub version: u8,          // 1 byte: 0x02 = MLS-Lite
    pub flags: u8,            // 1 byte: [has_sig, priority(2), reserved(5)]
    pub group_id: [u8; 8],    // 8 bytes: truncated group identifier
    pub sender_addr: [u8; 4], // 4 bytes: truncated sender address
    pub seq: u32,             // 4 bytes: sequence number (replay protection)
    pub epoch: u16,           // 2 bytes: key epoch (for rotation)
    pub nonce: [u8; 5],       // 5 bytes: nonce suffix (7-byte prefix is derived per epoch)
    // Payload: variable
    pub ciphertext: Vec<u8>,  // ChaCha20-Poly1305 output, includes 16-byte auth tag
    // Optional signature: 64 bytes (present if has_sig flag set)
    pub signature: Option<[u8; 64]>,
}
// Minimal overhead: 25-byte header + 16-byte tag = 41 bytes
// With signature: 41 + 64 = 105 bytes total overhead
```
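
The 25-byte header can be laid out as a fixed big-endian byte string, which is also what the AEAD authenticates. The sketch below makes two assumptions. Fields are serialized in declaration order, and `flags` uses bit 7 for `has_sig` and bits 6-5 for priority. `LiteHeader`, `pack_flags` and `aad` are illustrative names.

```rust
use std::convert::TryInto;

pub const HEADER_LEN: usize = 25;

/// Header fields of MlsLiteEnvelope, serialized in declaration order.
pub struct LiteHeader {
    pub version: u8,
    pub flags: u8,
    pub group_id: [u8; 8],
    pub sender_addr: [u8; 4],
    pub seq: u32,
    pub epoch: u16,
    pub nonce: [u8; 5],
}

/// Assumed layout: bit 7 = has_sig, bits 6..5 = priority, bits 4..0 reserved.
pub fn pack_flags(has_sig: bool, priority: u8) -> u8 {
    (u8::from(has_sig) << 7) | ((priority & 0b11) << 5)
}

impl LiteHeader {
    pub fn to_bytes(&self) -> [u8; HEADER_LEN] {
        let mut out = [0u8; HEADER_LEN];
        out[0] = self.version;
        out[1] = self.flags;
        out[2..10].copy_from_slice(&self.group_id);
        out[10..14].copy_from_slice(&self.sender_addr);
        out[14..18].copy_from_slice(&self.seq.to_be_bytes());
        out[18..20].copy_from_slice(&self.epoch.to_be_bytes());
        out[20..25].copy_from_slice(&self.nonce);
        out
    }

    pub fn from_bytes(b: &[u8]) -> Option<Self> {
        if b.len() < HEADER_LEN {
            return None;
        }
        Some(Self {
            version: b[0],
            flags: b[1],
            group_id: b[2..10].try_into().ok()?,
            sender_addr: b[10..14].try_into().ok()?,
            seq: u32::from_be_bytes(b[14..18].try_into().ok()?),
            epoch: u16::from_be_bytes(b[18..20].try_into().ok()?),
            nonce: b[20..25].try_into().ok()?,
        })
    }

    /// AEAD associated data: every header field except the nonce,
    /// which the AEAD already binds as its nonce input.
    pub fn aad(&self) -> [u8; 20] {
        let mut aad = [0u8; 20];
        aad.copy_from_slice(&self.to_bytes()[..20]);
        aad
    }
}
```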
### Encryption Details
```
Key derivation:
  okm = HKDF-SHA256(
    ikm  = group_secret || group_id,
    salt = "quicprochat-mls-lite-v1",
    info = epoch.to_be_bytes(),
    L    = 39
  )
  encryption_key = okm[0..32]    // ChaCha20 key (32 bytes)
  nonce_prefix   = okm[32..39]   // 7 bytes
Full nonce (12 bytes):
  nonce = nonce_prefix || envelope.nonce   // 7 + 5 bytes
  // envelope.nonce must never repeat under one encryption_key; all senders share it
Encrypt:
  ciphertext = ChaCha20-Poly1305(
    key       = encryption_key,
    nonce     = nonce,
    plaintext = payload,
    aad       = header_bytes   // version, flags, group_id, sender_addr, seq, epoch
  )
```
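
The key split and nonce assembly are plain byte slicing. The sketch below assumes a 39-byte HKDF output, which is the minimum the slicing above needs. HKDF and ChaCha20-Poly1305 should come from vetted crates and are not shown. For example, RustCrypto's `hkdf` provides `Hkdf::<Sha256>::new(Some(salt), ikm).expand(info, &mut okm)`, and `chacha20poly1305` provides the AEAD. Every member derives the same key and prefix, so the 5-byte suffix must never repeat within an epoch across all senders. This draft does not yet say how suffixes are allocated. `EpochKeys`, `split_okm` and `full_nonce` are illustrative names.

```rust
/// Per-epoch material derived from the HKDF output (okm).
pub struct EpochKeys {
    pub encryption_key: [u8; 32],
    pub nonce_prefix: [u8; 7],
}

/// okm[0..32] -> ChaCha20 key, okm[32..39] -> nonce prefix.
pub fn split_okm(okm: &[u8; 39]) -> EpochKeys {
    let mut encryption_key = [0u8; 32];
    let mut nonce_prefix = [0u8; 7];
    encryption_key.copy_from_slice(&okm[0..32]);
    nonce_prefix.copy_from_slice(&okm[32..39]);
    EpochKeys { encryption_key, nonce_prefix }
}

/// 12-byte ChaCha20-Poly1305 nonce = 7-byte prefix || 5-byte envelope suffix.
/// The suffix must be unique per message within an epoch across all senders.
pub fn full_nonce(prefix: &[u8; 7], suffix: &[u8; 5]) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    nonce[..7].copy_from_slice(prefix);
    nonce[7..].copy_from_slice(suffix);
    nonce
}
```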
### Key Exchange (Out-of-Band)
MLS-Lite groups are established via:
1. **QR Code:** Scan to join group (contains group_secret + group_id)
2. **NFC Tap:** Bump phones to exchange group key
3. **Voice Readout:** 24-word mnemonic for group secret
4. **Faster Link:** Full MLS setup over QUIC, then derive the MLS-Lite group_secret with the MLS exporter (RFC 9420)
```
┌────────────────────────────────────────────────────────────┐
│ Key Exchange Flow                                          │
├────────────────────────────────────────────────────────────┤
│                                                            │
│ Option A: QR Code (in-person)                              │
│   Alice generates: QR(group_id || group_secret)            │
│   Bob scans → joins MLS-Lite group                         │
│                                                            │
│ Option B: MLS Bootstrap (hybrid)                           │
│   1. Alice & Bob establish full MLS group over Internet    │
│   2. Derive group_secret with the MLS exporter (RFC 9420)  │
│   3. Both can now communicate over LoRa using MLS-Lite     │
│   4. When Internet available, re-sync to full MLS          │
│                                                            │
│ Option C: Pre-Shared Key (deployment)                      │
│   Org distributes group_secret to all devices              │
│   Like Meshtastic channel key, but with replay protection  │
│                                                            │
└────────────────────────────────────────────────────────────┘
```
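
Option A only needs `group_id || group_secret` in a scannable string. The sketch below assumes a 32-byte group_secret and a hypothetical `qpc-lite:` text prefix followed by lowercase hex. Neither the prefix nor the encoding is specified in this design. Anyone who photographs the code can join the group, so the code should be treated like the secret itself.

```rust
/// Hypothetical QR text format: "qpc-lite:" + hex(group_id || group_secret).
pub const QR_PREFIX: &str = "qpc-lite:";

pub fn encode_qr(group_id: &[u8; 8], group_secret: &[u8; 32]) -> String {
    let mut out = String::from(QR_PREFIX);
    for byte in group_id.iter().chain(group_secret.iter()) {
        out.push_str(&format!("{:02x}", byte));
    }
    out
}

/// Parse the format above; returns None on a wrong prefix, length or non-hex digit.
pub fn decode_qr(text: &str) -> Option<([u8; 8], [u8; 32])> {
    let hex = text.strip_prefix(QR_PREFIX)?;
    if hex.len() != 80 || !hex.bytes().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let mut raw = [0u8; 40];
    for (i, byte) in raw.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    let mut group_id = [0u8; 8];
    let mut group_secret = [0u8; 32];
    group_id.copy_from_slice(&raw[..8]);
    group_secret.copy_from_slice(&raw[8..]);
    Some((group_id, group_secret))
}
```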
### Key Rotation
MLS-Lite does NOT have automatic post-compromise security. Manual rotation:
```
Rotation trigger:
- Periodic (e.g., weekly)
- Member leaves group
- Suspected compromise
Rotation process:
1. New group_secret generated (QR code, or via full MLS if available)
2. epoch incremented
3. Old key deleted after grace period
4. Devices that miss rotation must re-join
```
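
Step 3 implies that each device keeps more than one key for a while. The sketch below shows such a keyring. It assumes keys are looked up by the envelope's `epoch` field and that a retired key is dropped once its grace period has passed. `EpochKeyring`, `rotate` and `key_for` are hypothetical names. Returning `None` for an unknown epoch corresponds to step 4 (re-join).

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Current epoch key plus retired keys that are still inside their grace period.
pub struct EpochKeyring {
    grace: Duration,
    current_epoch: u16,
    keys: BTreeMap<u16, ([u8; 32], Option<Instant>)>, // epoch -> (key, retired_at)
}

impl EpochKeyring {
    pub fn new(epoch: u16, key: [u8; 32], grace: Duration) -> Self {
        let mut keys = BTreeMap::new();
        keys.insert(epoch, (key, None));
        Self { grace, current_epoch: epoch, keys }
    }

    /// Steps 1-2: install the new key under epoch + 1 and start the old key's grace timer.
    pub fn rotate(&mut self, key: [u8; 32], now: Instant) -> u16 {
        if let Some(old) = self.keys.get_mut(&self.current_epoch) {
            old.1 = Some(now);
        }
        self.current_epoch = self.current_epoch.wrapping_add(1);
        self.keys.insert(self.current_epoch, (key, None));
        self.current_epoch
    }

    /// Step 3: delete keys whose grace period has ended, then look up `epoch`.
    pub fn key_for(&mut self, epoch: u16, now: Instant) -> Option<[u8; 32]> {
        let grace = self.grace;
        self.keys.retain(|_, (_, retired)| match retired {
            Some(t) => now.duration_since(*t) < grace,
            None => true,
        });
        self.keys.get(&epoch).map(|(key, _)| *key)
    }
}
```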
### Upgrade to Full MLS
When faster transport becomes available:
```
┌────────────────────────────────────────────────────────────┐
│ MLS-Lite → MLS Upgrade                                     │
├────────────────────────────────────────────────────────────┤
│                                                            │
│ 1. Device detects QUIC/TCP connectivity                    │
│ 2. Contacts server, fetches peer KeyPackages               │
│ 3. Creates full MLS group with same group_id               │
│ 4. Sends MLS Welcome to all known members                  │
│ 5. Members upgrade to full MLS                             │
│ 6. MLS-Lite continues in parallel for LoRa-only members    │
│                                                            │
│ Bridging:                                                  │
│   - Gateway nodes (CAP_GATEWAY) translate between modes    │
│   - Full MLS message → re-encrypt as MLS-Lite for LoRa     │
│   - MLS-Lite message → re-encrypt as MLS AppMessage        │
│                                                            │
└────────────────────────────────────────────────────────────┘
```
---
## Security Analysis
### What MLS-Lite Provides
| Property | Full MLS | MLS-Lite | Notes |
|----------|----------|----------|-------|
| **Confidentiality** | ✓ | ✓ | ChaCha20-Poly1305 |
| **Integrity** | ✓ | ✓ | Poly1305 MAC |
| **Replay protection** | ✓ | ✓ | Sequence numbers |
| **Sender auth (group)** | ✓ | ✓ | Only group members can encrypt |
| **Sender auth (individual)** | ✓ | Optional | Ed25519 signature (64 bytes) |
| **Forward secrecy** | ✓ | Partial | Only on manual epoch rotation |
| **Post-compromise security** | ✓ | ✗ | No automatic healing |
| **Transcript consistency** | ✓ | ✗ | No ratchet tree |
| **Deniability** | ✗ | ✗ | Neither provides this |
### Threat Model
**Protected against:**
- Passive eavesdropping, including by a future quantum adversary, provided group_secret was exchanged over a PQ-safe channel
- Message replay (sequence numbers)
- Message tampering (AEAD)
- Outsider injection (need group_secret)
**NOT protected against:**
- Compromised group member reading all traffic (no PCS)
- Long-term key compromise without manual rotation
- Relay node with group_secret (but they're in the group anyway)
### Comparison to Meshtastic
| Property | Meshtastic | MLS-Lite |
|----------|------------|----------|
| **Encryption** | AES-256-CTR | ChaCha20-Poly1305 |
| **Authentication** | None (shared key) | Optional Ed25519 |
| **Replay protection** | None | Sequence numbers |
| **Key rotation** | Manual | Manual (epoch field) |
| **Overhead** | 16 bytes (header) | 41 bytes (no sig), 105 bytes (with sig) |
| **Upgrade path** | None | → Full MLS |
MLS-Lite adds what Meshtastic's channel crypto lacks (AEAD integrity, replay protection, optional per-sender signatures) at the cost of roughly 25 more bytes of overhead per packet.
---
## Wire Format
### MLS-Lite Envelope (CBOR)
```
MlsLiteEnvelope = {
  0: uint,             ; version (0x02)
  1: uint,             ; flags
  2: bytes .size 8,    ; group_id
  3: bytes .size 4,    ; sender_addr
  4: uint,             ; seq
  5: uint,             ; epoch
  6: bytes .size 5,    ; nonce
  7: bytes,            ; ciphertext (includes 16-byte tag)
  ? 8: bytes .size 64  ; signature (optional)
}
```
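Estimated sizes (definite-length CBOR, small integer keys):
- Minimal (1-byte payload): ~51-58 bytes, depending on the size of seq, epoch and flags (fits one 51-byte SF12 frame only in the best case)
- Short message (20 bytes): ~71-78 bytes (2 fragments on SF12)
- With signature: add 67 bytes (64-byte signature plus CBOR key and length prefix)
- Meeting the <50-byte "hello" design goal therefore requires the fixed binary layout (41 bytes + payload) rather than CBOR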
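
These estimates can be checked against the CBOR encoding rules in RFC 8949. Each integer or byte-string length takes a 1-, 2-, 3-, 5- or 9-byte head, depending on its value. The sketch below counts bytes for the map above. It assumes integer keys 0-8 and definite-length encoding. `lite_envelope_cbor_len` is an illustrative helper.

```rust
/// Size of a CBOR head (major type + argument) for a given argument value.
fn cbor_head_len(arg: u64) -> usize {
    match arg {
        0..=23 => 1,
        24..=0xff => 2,
        0x100..=0xffff => 3,
        0x1_0000..=0xffff_ffff => 5,
        _ => 9,
    }
}

/// Encoded size of MlsLiteEnvelope; `payload_len` is the plaintext length and
/// the 16-byte Poly1305 tag is added here.
pub fn lite_envelope_cbor_len(flags: u8, seq: u32, epoch: u16, payload_len: usize, has_sig: bool) -> usize {
    let bstr = |n: usize| cbor_head_len(n as u64) + n;
    let fields = if has_sig { 9 } else { 8 };
    cbor_head_len(fields)            // map header
        + fields as usize            // keys 0..=8 each encode in one byte
        + 1                          // version 0x02
        + cbor_head_len(flags as u64)
        + bstr(8)                    // group_id
        + bstr(4)                    // sender_addr
        + cbor_head_len(seq as u64)
        + cbor_head_len(epoch as u64)
        + bstr(5)                    // nonce
        + bstr(payload_len + 16)     // ciphertext including tag
        + if has_sig { bstr(64) } else { 0 }
}
```

Under these assumptions, a 1-byte payload with small `seq`, `epoch` and `flags` encodes to exactly 51 bytes. A 5-byte "hello" already needs 55 bytes or more, which is one reason the binary header matters for the unfragmented case.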
### MeshEnvelope Mode Flag
Extend MeshEnvelope to indicate crypto mode:
```rust
pub struct MeshEnvelope {
    // ... existing fields ...
    /// Crypto mode: 0x00 = full MLS, 0x02 = MLS-Lite
    pub crypto_mode: u8,
}
```
---
## Implementation Plan
### Phase 1: Core MLS-Lite
1. [ ] Define `MlsLiteEnvelope` struct
2. [ ] Implement key derivation (HKDF)
3. [ ] Implement encrypt/decrypt (ChaCha20-Poly1305)
4. [ ] Add sequence number tracking (replay window)
5. [ ] Add CBOR serialization
6. [ ] Unit tests
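
Item 4 can follow the sliding-window approach used by IPsec and DTLS anti-replay. The receiver tracks the highest sequence number per sender plus a bitmap of recently seen numbers. The sketch below assumes a 64-entry window keyed by the 4-byte `sender_addr` and reset on each epoch change. `ReplayWindow` is a hypothetical name. The window should only be updated after the Poly1305 tag verifies; otherwise forged packets could advance it.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// sender_addr -> (highest seq seen, bitmap of the 64 seqs ending at `highest`).
#[derive(Default)]
pub struct ReplayWindow {
    state: HashMap<[u8; 4], (u32, u64)>,
}

impl ReplayWindow {
    /// Returns true and records `seq` if it is fresh; false if replayed or too old.
    pub fn check_and_update(&mut self, sender: [u8; 4], seq: u32) -> bool {
        let (highest, bitmap) = match self.state.entry(sender) {
            Entry::Vacant(v) => {
                v.insert((seq, 1));
                return true;
            }
            Entry::Occupied(o) => o.into_mut(),
        };
        if seq > *highest {
            // Slide the window forward; bit 0 always represents `highest`.
            let shift = seq - *highest;
            *bitmap = if shift >= 64 { 1 } else { (*bitmap << shift) | 1 };
            *highest = seq;
            true
        } else {
            let offset = *highest - seq;
            if offset >= 64 || (*bitmap >> offset) & 1 == 1 {
                false
            } else {
                *bitmap |= 1 << offset;
                true
            }
        }
    }
}
```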
### Phase 2: Integration
1. [ ] Add `crypto_mode` to TransportManager routing decisions
2. [ ] Implement QR code key exchange (generate/scan)
3. [ ] Add `/mesh lite-create <name>` REPL command
4. [ ] Add `/mesh lite-join <qr-data>` REPL command
5. [ ] Integration tests with LoRaMockMedium
### Phase 3: Gateway/Bridge
1. [ ] Implement MLS → MLS-Lite translation in gateway nodes
2. [ ] Add CAP_GATEWAY capability flag
3. [ ] Handle epoch sync between modes
4. [ ] End-to-end test: QUIC client → gateway → LoRa client
---
## Open Questions
1. **Signature vs. no signature?**
- Signatures add 64 bytes (1-2 extra fragments on SF12)
- Without signatures, any group member can spoof any sender
- Proposal: configurable, default to signatures on SF7-9, skip on SF10-12
2. **Epoch sync without server?**
- How do LoRa-only nodes learn about epoch changes?
- Proposal: Include epoch in announce, peers relay epoch updates
3. **Post-quantum group_secret?**
- MLS-Lite uses symmetric crypto (quantum-safe for confidentiality)
- An X25519-based key exchange would be open to harvest-now-decrypt-later attacks by a future quantum adversary
- Proposal: QR code includes ML-KEM-768 encapsulation for PQ key exchange
4. **Compatibility with Reticulum/LXMF?**
- Should we use msgpack instead of CBOR for LXMF compat?
- Should we implement LXMF as an additional mode?
---
## References
- [MLS RFC 9420](https://datatracker.ietf.org/doc/rfc9420/) — Full MLS spec
- [ChaCha20-Poly1305 RFC 8439](https://datatracker.ietf.org/doc/rfc8439/)
- [HKDF RFC 5869](https://datatracker.ietf.org/doc/rfc5869/)
- [Meshtastic Encryption](https://meshtastic.org/docs/overview/encryption/)
- [Reticulum LXMF](https://github.com/markqvist/LXMF)
---
*Last updated: 2026-03-30*