# Reticulum-Inspired Mesh Upgrade Plan

> **Goal:** Transform quicprochat's P2P layer from a simple direct/relay hybrid into a
> self-organizing, multi-hop mesh capable of running over LoRa, Packet Radio, Serial,
> and other low-bandwidth transports — incorporating 8 years of Reticulum design
> learnings, but with Rust, MLS, and post-quantum crypto.
>
> Created: 2026-03-30 | Sprints: 6 | Area: `quicprochat-p2p` + `quicprochat-core`

---

## Architecture Vision

```
Before (current):
  Client A ──── iroh QUIC ──────► Client B     (direct P2P)
      │                                  │
      └── QUIC/TLS ── Server ── QUIC/TLS ┘     (relay fallback)

After (target):
  Client A ── LoRa ── Node X ── WiFi ── Node Y ── Serial ── Client B
      │                                                         │
      └── iroh QUIC ── Server (optional) ── iroh QUIC ──────────┘
                               ▲
                     any transport works:
       LoRa, Serial, TCP, UDP, WiFi, Packet Radio, QUIC
```

Key difference from Reticulum: we keep MLS group encryption, post-quantum hybrid KEM,
and formal Protobuf framing. Reticulum's transport-agnostic routing and announce
semantics are the inspiration, not the crypto.

---

## Sprint Overview

| Sprint | Name | Focus | Key Deliverable |
|--------|------|-------|-----------------|
| S1 | Binary Wire Format | Efficiency | CBOR `MeshEnvelope`, ~70% size reduction |
| S2 | Transport Abstraction | Architecture | `MeshTransport` trait, pluggable backends |
| S3 | Announce & Discovery | Self-Organization | Network-wide announce propagation + routing table |
| S4 | Multi-Hop Routing | Core Mesh | Autonomous packet forwarding across intermediate nodes |
| S5 | Truncated Addresses + Lightweight Handshake | LoRa-Ready | 16-byte addresses, minimal handshake for constrained links |
| S6 | LoRa Transport + Integration | Hardware | Working LoRa backend, end-to-end mesh demo |

---

## S1 — Binary Wire Format

**Problem:** `MeshEnvelope::to_bytes()` uses JSON serialization. A typical envelope
is ~500-800 bytes in JSON. On LoRa at 300 bps, that's 13-21 seconds per message.

**Solution:** CBOR binary serialization via `ciborium` (already in workspace deps).

**Deliverables:**

1. **`envelope_binary.rs`** — new serialization functions:
   - `MeshEnvelope::to_cbor() -> Vec<u8>` — compact binary encoding
   - `MeshEnvelope::from_cbor(bytes: &[u8]) -> Result<Self>` — decoding
   - Keep `to_bytes()`/`from_bytes()` as JSON for debug/human-readable use
   - Add `to_wire() -> Vec<u8>` as the default wire format (CBOR)
   - Add `from_wire(bytes: &[u8]) -> Result<Self>` for receiving

2. **Compact field encoding:**
   - `sender_key`: 32 bytes raw (not hex-encoded)
   - `recipient_key`: 32 bytes raw (or 16 bytes truncated, prep for S5)
   - `signature`: 64 bytes raw
   - `id`: 32 bytes raw
   - `payload`: raw bytes (no base64)
   - `timestamp`: u64 (8 bytes)
   - `ttl_secs`: u32 (4 bytes)
   - `hop_count`: u8 (1 byte)
   - `max_hops`: u8 (1 byte)

3. **Size comparison test:**
   - Create identical envelopes, serialize both ways, assert CBOR < 50% of JSON
   - Expected: ~140-160 bytes CBOR vs ~500-800 bytes JSON for a typical message

4. **Migration:** `P2pNode::send_mesh()` and `broadcast()` switch to `to_wire()`.
   `from_wire()` tries CBOR first, falls back to JSON for backward compat.
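
As a rough sanity check on the size targets, the raw field widths listed in deliverable 2 can simply be added up. The sketch below is illustrative only: the constant and function names are hypothetical, and it ignores the payload and any CBOR framing overhead. It also reproduces the 300 bps airtime arithmetic from the problem statement.

```rust
// Raw widths (in bytes) of the fixed envelope fields from deliverable 2.
const SENDER_KEY: usize = 32;
const RECIPIENT_KEY: usize = 32;
const RECIPIENT_ADDR_TRUNCATED: usize = 16; // truncated form planned for S5
const SIGNATURE: usize = 64;
const ID: usize = 32;
const TIMESTAMP: usize = 8;
const TTL_SECS: usize = 4;
const HOP_COUNT: usize = 1;
const MAX_HOPS: usize = 1;

/// Sum of the fixed fields, excluding payload and CBOR framing overhead.
pub const fn fixed_field_bytes(truncated_recipient: bool) -> usize {
    let recipient = if truncated_recipient {
        RECIPIENT_ADDR_TRUNCATED
    } else {
        RECIPIENT_KEY
    };
    SENDER_KEY + recipient + SIGNATURE + ID + TIMESTAMP + TTL_SECS + HOP_COUNT + MAX_HOPS
}

/// Seconds needed to push `bytes` through a link running at `bits_per_sec`,
/// ignoring preambles, link-layer headers, and retransmissions.
pub fn airtime_secs(bytes: usize, bits_per_sec: u32) -> f64 {
    (bytes as f64 * 8.0) / f64::from(bits_per_sec)
}
```

With full 32-byte keys the fixed fields alone come to 174 bytes (158 with a 16-byte truncated recipient), so the ~140-160 byte estimate may only hold for the truncated form with a very small payload, and the size assertion in deliverable 3 may need to allow for this fixed cryptographic overhead. At 300 bps, `airtime_secs(174, 300)` is roughly 4.6 seconds, compared with 13-21 seconds for the JSON form.
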

**Tests:** Roundtrip CBOR, size comparison, backward compat with JSON, fuzz test
for malformed CBOR input.

**Estimated changes:** ~150 lines new code, ~20 lines modified.

---

## S2 — Transport Abstraction

**Problem:** P2P layer is hardcoded to iroh QUIC. Cannot support LoRa, Serial,
Packet Radio, or other media.

**Solution:** Abstract transport behind a trait. Reticulum calls this "Interface" —
we call it `MeshTransport`.

**Deliverables:**

1. **`transport.rs`** — trait definition:
   ```rust
   use async_trait::async_trait;

   #[async_trait]
   pub trait MeshTransport: Send + Sync {
       /// Human-readable transport name (e.g., "iroh-quic", "lora", "serial").
       fn name(&self) -> &str;

       /// Maximum transmission unit in bytes.
       fn mtu(&self) -> usize;

       /// Estimated bitrate in bits/second (for routing cost calculation).
       fn bitrate(&self) -> u64;

       /// Whether this transport supports bidirectional communication.
       fn is_bidirectional(&self) -> bool;

       /// Send raw bytes to a destination address.
       async fn send(&self, dest: &TransportAddr, data: &[u8]) -> Result<()>;

       /// Receive the next incoming packet. Blocks until data arrives.
       async fn recv(&self) -> Result<(TransportAddr, Vec<u8>)>;

       /// List reachable peers on this transport (e.g., mDNS scan, LoRa beacon).
       async fn discover(&self) -> Result<Vec<TransportAddr>>;
   }

   /// Transport-agnostic address.
   pub enum TransportAddr {
       /// iroh node ID + optional relay.
       Iroh(iroh::EndpointAddr),
       /// IP:port for TCP/UDP transports.
       Socket(std::net::SocketAddr),
       /// LoRa device address (4 bytes).
       LoRa([u8; 4]),
       /// Serial port path.
       Serial(String),
       /// Raw bytes for unknown transports.
       Raw(Vec<u8>),
   }
   ```

2. **`transport_iroh.rs`** — refactor existing `P2pNode` send/recv into
   `IrohTransport` implementing `MeshTransport`.

3. **`transport_tcp.rs`** — simple TCP transport for testing and wired mesh nodes.
   Length-prefixed packets over a TCP stream.

4. **`P2pNode` refactor:** Accept `Vec<Box<dyn MeshTransport>>` instead of
   hardcoded `Endpoint`. The node listens on all transports simultaneously.

5. **`TransportManager`** — manages multiple transports, routes outbound packets
   to the best available transport for a given destination.
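
The length-prefixed framing planned for `transport_tcp.rs` could look roughly like the following. This is a synchronous, standard-library-only sketch rather than the project's implementation: the 4-byte big-endian prefix, the `MAX_FRAME_LEN` cap, and the function names are assumptions, and a real `TcpTransport` would presumably use async I/O behind the `MeshTransport` trait.

```rust
use std::io::{self, Read, Write};

/// Upper bound on a single frame. Hypothetical value chosen for this sketch.
pub const MAX_FRAME_LEN: usize = 64 * 1024;

/// Write one packet as `[len: u32 big-endian][payload]`.
pub fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32;
    writer.write_all(&len.to_be_bytes())?;
    writer.write_all(payload)
}

/// Read one packet written by `write_frame`.
/// Rejects oversized length prefixes before allocating the buffer.
pub fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}
```

Because both helpers are generic over `Read` and `Write`, they work on a `std::net::TcpStream` as well as on an in-memory `Vec<u8>` or `std::io::Cursor`, which keeps the framing testable without opening sockets.
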

**Tests:** IrohTransport passes existing P2P tests, TcpTransport roundtrip,
multi-transport node startup.

**Estimated changes:** ~400 lines new code, ~100 lines refactored.

---

## S3 — Announce & Discovery Protocol

**Problem:** No mesh-wide discovery. mDNS only works on LAN. Nodes beyond one hop
are invisible.

**Solution:** Reticulum-style announce propagation. Nodes broadcast signed announcements
that propagate through the mesh, building a distributed routing table.

**Deliverables:**

1. **`announce.rs`** — Announce packet:
   ```rust
   pub struct MeshAnnounce {
       /// Ed25519 public key of the announcing node.
       pub identity_key: [u8; 32],
       /// Truncated address (hash of identity_key, 16 bytes). Prep for S5.
       pub address: [u8; 16],
       /// Capabilities bitfield (supports_relay, supports_store, etc.).
       pub capabilities: u16,
       /// Sequence number (monotonically increasing per node).
       pub sequence: u64,
       /// Unix timestamp.
       pub timestamp: u64,
       /// Transports this node is reachable on (list of transport name + addr).
       pub reachable_via: Vec<(String, Vec<u8>)>,
       /// Ed25519 signature over all above fields.
       pub signature: [u8; 64],
   }
   ```

2. **Announce propagation rules (Reticulum-inspired):**
   - On startup: broadcast own announce on all transports
   - On receiving an announce: verify signature, check sequence > last_seen,
     update routing table, re-broadcast on all *other* transports (not the one
     it arrived on) with hop_count incremented
   - Dedup by `(identity_key, sequence)` — don't re-broadcast already-seen announces
   - TTL: announces expire after configurable duration (default 30 minutes)
   - Periodic re-announce: every 10 minutes (configurable)

3. **`routing_table.rs`** — Distributed routing table:
   ```rust
   use std::collections::HashMap;
   use std::time::Instant;

   pub struct RoutingTable {
       /// Known destinations: address -> routing entry.
       entries: HashMap<[u8; 16], RoutingEntry>,
   }

   pub struct RoutingEntry {
       /// Full public key of the destination.
       pub identity_key: [u8; 32],
       /// Next-hop transport + address to reach this destination.
       pub next_hop: (String, TransportAddr),
       /// Number of hops to destination (from announce hop_count).
       pub hops: u8,
       /// Estimated cost (hops * inverse_bitrate_weight).
       pub cost: f64,
       /// When this entry was last refreshed.
       pub last_seen: Instant,
       /// Capabilities of the destination.
       pub capabilities: u16,
   }
   ```

4. **REPL commands:**
   - `/mesh announce` — force re-announce
   - `/mesh routes` — show full routing table (replaces current `/mesh route`)
   - `/mesh nodes` — list all known nodes with hop count and transport
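
The sequence check and re-broadcast rule from deliverable 2 can be sketched as below. This is a minimal illustration under stated assumptions: `AnnounceFilter`, `AnnounceDecision`, and `rebroadcast_targets` are hypothetical names, signature verification is assumed to have happened before `check` is called, and expiry of old entries is left out.

```rust
use std::collections::HashMap;

/// Outcome of checking an incoming announce against what was already seen.
#[derive(Debug, PartialEq, Eq)]
pub enum AnnounceDecision {
    /// New information: update the routing table and re-broadcast.
    AcceptAndRebroadcast,
    /// Same or older sequence number: drop without re-broadcasting.
    Duplicate,
}

/// Tracks the highest sequence number seen per identity key.
#[derive(Default)]
pub struct AnnounceFilter {
    last_seen: HashMap<[u8; 32], u64>,
}

impl AnnounceFilter {
    /// Accepts an announce only if its sequence is strictly greater than the
    /// last one recorded for that key, which also deduplicates exact repeats.
    pub fn check(&mut self, identity_key: [u8; 32], sequence: u64) -> AnnounceDecision {
        if let Some(seen) = self.last_seen.get(&identity_key).copied() {
            if sequence <= seen {
                return AnnounceDecision::Duplicate;
            }
        }
        self.last_seen.insert(identity_key, sequence);
        AnnounceDecision::AcceptAndRebroadcast
    }
}

/// Transports to re-broadcast on: every transport except the arrival one.
pub fn rebroadcast_targets<'a>(all: &[&'a str], arrived_on: &str) -> Vec<&'a str> {
    all.iter().copied().filter(|name| *name != arrived_on).collect()
}
```

Keeping only the highest sequence per key makes the `(identity_key, sequence)` dedup rule a special case of the `sequence > last_seen` check, so a single map may be enough for both.
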

**Tests:** Announce create/verify, propagation dedup, routing table CRUD,
announce expiry, 3-node propagation simulation.
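
For the `cost` field of `RoutingEntry`, described only as `hops * inverse_bitrate_weight`, one possible reading is shown here. The plan does not define the weight, so the `REFERENCE_BITRATE` normalization and both function names are assumptions made for this sketch.

```rust
/// Hypothetical normalization: a 1 Mbit/s link gets weight 1.0.
const REFERENCE_BITRATE: f64 = 1_000_000.0;

/// Cost as hops multiplied by an inverse-bitrate weight.
/// Slower links produce larger weights and therefore higher costs.
pub fn route_cost(hops: u8, bitrate_bps: u64) -> f64 {
    // Guard against a zero bitrate so the division stays finite.
    let inverse_bitrate_weight = REFERENCE_BITRATE / (bitrate_bps.max(1) as f64);
    f64::from(hops) * inverse_bitrate_weight
}

/// A candidate route replaces the current one only if it is strictly cheaper,
/// or if no route is known yet.
pub fn should_replace(current_cost: Option<f64>, candidate_cost: f64) -> bool {
    match current_cost {
        None => true,
        Some(current) => candidate_cost < current,
    }
}
```

Under this reading, a single 300 bps LoRa hop costs far more than three hops over a 1 Mbit/s link. A hop count of zero would give a cost of zero regardless of bitrate, so this sketch assumes directly reachable neighbors are recorded with `hops >= 1`.
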

**Estimated changes:** ~500 lines new code.

---

## S4 — Multi-Hop Routing

**Problem:** Messages can only be sent directly or via server relay. No intermediate
node forwarding.

**Solution:** Autonomous packet forwarding using the routing table from S3.
Every node can relay packets for other nodes.

**Deliverables:**

1. **`router.rs`** — replace `HybridRouter` with `MeshRouter`:
   ```rust
   pub struct MeshRouter {
       /// This node's identity.
       identity: MeshIdentity,
       /// Routing table (populated by announce protocol).
       routes: Arc<RwLock<RoutingTable>>,
       /// Available transports.
       transports: Arc<TransportManager>,
       /// Optional server relay (kept as last-resort fallback).
       server_relay: Option<Arc<dyn ServerRelay>>,
       /// Store-and-forward for unreachable destinations.
       store: Arc<Mutex<MeshStore>>,
       /// Per-peer delivery stats.
       stats: Arc<Mutex<HashMap<[u8; 16], ConnectionStats>>>,
   }
   ```

2. **Routing algorithm:**
   ```
   send(destination_addr, payload):
     1. Look up destination in routing table
     2. If direct transport available → send directly
     3. If next-hop known → wrap in MeshEnvelope, send to next-hop
        (next-hop node will repeat this process)
     4. If no route → store-and-forward (queue for later)
     5. If server relay available → use as last resort
   ```

3. **Forwarding logic (every node runs this):**
   ```
   on_receive(envelope):
     1. Verify signature
     2. If addressed to us → deliver to application layer
     3. If addressed to someone else:
        a. Check hop_count < max_hops and not expired
        b. Look up destination in routing table
        c. Forward via next-hop transport
        d. If no route → store for later forwarding
   ```

4. **Path MTU Discovery:**
   - When routing across transports with different MTUs, fragment if needed
   - Fragment header: `[fragment_id: u32][seq: u8][total: u8][payload]`
   - Reassembly buffer with timeout

5. **Routing metrics:**
   - Track per-path latency, success rate, hop count
   - Prefer routes with lower cost (fewer hops, higher bitrate)
   - Exponential backoff on failed routes

6. **REPL commands:**
   - `/mesh send <address> <message>` — now works multi-hop
   - `/mesh trace <address>` — show the route a message would take
   - `/mesh stats` — delivery statistics per destination
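
The forwarding logic in deliverable 3 can be expressed as a pure decision function, which makes it easy to unit-test separately from the transports. The sketch below is illustrative: `EnvelopeHeader`, `ForwardAction`, and `decide` are hypothetical names, the routing table is reduced to a map from destination to next-hop address, signature verification (step 1) is assumed to have already succeeded, and dropping packets that fail the hop or expiry check is an assumption, since the plan does not say what happens to them.

```rust
use std::collections::HashMap;

pub type Addr = [u8; 16];

/// What a node does with a received, already-verified envelope.
#[derive(Debug, PartialEq, Eq)]
pub enum ForwardAction {
    Deliver,
    Forward { next_hop: Addr },
    Store,
    Drop,
}

/// Minimal header view. Field names follow the plan; the struct is hypothetical.
pub struct EnvelopeHeader {
    pub recipient: Addr,
    pub hop_count: u8,
    pub max_hops: u8,
    pub timestamp: u64,
    pub ttl_secs: u32,
}

pub fn decide(
    header: &EnvelopeHeader,
    our_addr: &Addr,
    routes: &HashMap<Addr, Addr>,
    now: u64,
) -> ForwardAction {
    // Step 2: addressed to us.
    if &header.recipient == our_addr {
        return ForwardAction::Deliver;
    }
    // Step 3a: hop limit and expiry (dropping on failure is an assumption).
    let expired = now > header.timestamp.saturating_add(u64::from(header.ttl_secs));
    if header.hop_count >= header.max_hops || expired {
        return ForwardAction::Drop;
    }
    // Steps 3b-3d: forward if a route is known, otherwise store for later.
    match routes.get(&header.recipient) {
        Some(next_hop) => ForwardAction::Forward { next_hop: *next_hop },
        None => ForwardAction::Store,
    }
}
```

A caller would increment `hop_count` before handing a `Forward` result to the transport layer; that step is left out here to keep the function free of side effects.
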

**Tests:** 3-node relay chain (A→B→C), route failover, fragmentation roundtrip,
store-and-forward when intermediate node offline, routing metric updates.
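
The fragment header from deliverable 4, `[fragment_id: u32][seq: u8][total: u8][payload]`, and the fragmentation roundtrip test listed above can be sketched as follows. The function names and the big-endian encoding of `fragment_id` are assumptions, and the reassembly timeout is omitted: `reassemble` expects to be handed the complete set of fragments for one `fragment_id`.

```rust
pub const FRAGMENT_HEADER_LEN: usize = 6; // 4 (fragment_id) + 1 (seq) + 1 (total)

/// Split `data` into frames of at most `mtu` bytes, header included.
/// Returns `None` if the MTU leaves no room for payload or if more than
/// 255 fragments would be needed (`total` is a single byte).
pub fn fragment(fragment_id: u32, data: &[u8], mtu: usize) -> Option<Vec<Vec<u8>>> {
    let chunk_len = mtu.checked_sub(FRAGMENT_HEADER_LEN).filter(|&n| n > 0)?;
    let chunks: Vec<&[u8]> = if data.is_empty() {
        vec![data] // an empty message still travels as one fragment
    } else {
        data.chunks(chunk_len).collect()
    };
    if chunks.len() > u8::MAX as usize {
        return None;
    }
    let total = chunks.len() as u8;
    let mut frames = Vec::with_capacity(chunks.len());
    for (seq, chunk) in chunks.into_iter().enumerate() {
        let mut frame = Vec::with_capacity(FRAGMENT_HEADER_LEN + chunk.len());
        frame.extend_from_slice(&fragment_id.to_be_bytes());
        frame.push(seq as u8);
        frame.push(total);
        frame.extend_from_slice(chunk);
        frames.push(frame);
    }
    Some(frames)
}

/// Reassemble frames that may arrive in any order. Returns `None` if a
/// fragment is missing, duplicated, malformed, or belongs to another id.
pub fn reassemble(frames: &[Vec<u8>]) -> Option<Vec<u8>> {
    let first = frames.first()?;
    if first.len() < FRAGMENT_HEADER_LEN {
        return None;
    }
    let id = &first[0..4];
    let total = first[5] as usize;
    if total == 0 || frames.len() != total {
        return None;
    }
    let mut slots: Vec<Option<&[u8]>> = vec![None; total];
    for frame in frames {
        if frame.len() < FRAGMENT_HEADER_LEN || &frame[0..4] != id || frame[5] as usize != total {
            return None;
        }
        let seq = frame[4] as usize;
        if seq >= total || slots[seq].is_some() {
            return None;
        }
        slots[seq] = Some(&frame[FRAGMENT_HEADER_LEN..]);
    }
    let mut out = Vec::new();
    for slot in slots {
        out.extend_from_slice(slot?);
    }
    Some(out)
}
```

With a 51-byte MTU, each fragment carries 45 payload bytes, so the one-byte `total` field caps a message at 255 × 45 = 11,475 bytes on the slowest link in this sketch.
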

**Estimated changes:** ~600 lines new code, ~200 lines refactored from existing router.

---

## S5 — Truncated Addresses & Lightweight Handshake

**Problem:** Full 32-byte public keys in every envelope waste bandwidth on constrained
links. QUIC TLS handshake is too heavy for LoRa (2-4 KB).

**Solution:** Truncated hash-based addresses (Reticulum-style) and a minimal
ECDH handshake for low-bandwidth transports.

**Deliverables:**

1. **`address.rs`** — Mesh address type:
   ```rust
   use sha2::{Digest, Sha256}; // assumes the `sha2` crate is available in the workspace

   /// 16-byte truncated address derived from Ed25519 public key.
   /// Matches Reticulum's approach but with different hash construction.
   pub struct MeshAddress([u8; 16]);

   impl MeshAddress {
       /// Derive from an Ed25519 public key.
       /// SHA-256(public_key)[0..16]
       pub fn from_public_key(key: &[u8; 32]) -> Self {
           let digest = Sha256::digest(key);
           let mut addr = [0u8; 16];
           addr.copy_from_slice(&digest[..16]);
           Self(addr)
       }

       /// Check if this address matches a given public key.
       pub fn matches(&self, key: &[u8; 32]) -> bool {
           self.0 == Self::from_public_key(key).0
       }
   }
   ```

2. **Envelope v2 with truncated addresses:**
   - Replace `sender_key: Vec<u8>` (32 bytes) with `sender_addr: MeshAddress` (16 bytes)
   - Replace `recipient_key: Vec<u8>` (32 bytes) with `recipient_addr: MeshAddress` (16 bytes)
   - Full public keys are exchanged during announce (S3) and cached in routing table
   - Saves 32 bytes per envelope (significant on LoRa)

3. **Lightweight handshake for constrained transports:**
   ```
   Link Setup (inspired by Reticulum, but with PQ option):

   Packet 1 (Initiator → Responder): 80 bytes
     [initiator_addr: 16][ephemeral_x25519_pub: 32][nonce: 24][flags: 8]

   Packet 2 (Responder → Initiator): 112 bytes
     [responder_addr: 16][ephemeral_x25519_pub: 32][encrypted_identity_proof: 48][nonce: 16]

   Packet 3 (Initiator → Responder): 48 bytes
     [encrypted_identity_proof: 48]

   Total: 240 bytes (vs 2000-4000 for QUIC TLS)
   Shared secret: HKDF-SHA256(X25519(eph_a, eph_b) || X25519(id_a, eph_b))
   ```

4. **`link.rs`** — `MeshLink` session type:
   - Negotiated via lightweight handshake on constrained transports
   - ChaCha20-Poly1305 for subsequent messages (using derived shared secret)
   - Heartbeat to keep link alive (configurable, default every 5 min)
   - Link teardown notification
   - Automatic upgrade to QUIC if both sides support it

5. **Feature flag:** `--features constrained-transport` gates the lightweight
   handshake. QUIC remains the default for Internet/LAN.
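
The byte layout of the link-setup packets in deliverable 3 can be checked with a fixed-size encoder for packet 1. This sketch covers only the wire layout, not the cryptography: `LinkRequest` and the constant names are hypothetical, and `flags` is treated as 8 bytes wide because that is what makes the stated 80-byte total add up.

```rust
/// Packet sizes from the link-setup diagram (all values in bytes).
pub const LINK_REQUEST_LEN: usize = 16 + 32 + 24 + 8; // packet 1
pub const LINK_RESPONSE_LEN: usize = 16 + 32 + 48 + 16; // packet 2
pub const LINK_PROOF_LEN: usize = 48; // packet 3
pub const HANDSHAKE_TOTAL_LEN: usize = LINK_REQUEST_LEN + LINK_RESPONSE_LEN + LINK_PROOF_LEN;

/// Packet 1 (initiator to responder).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LinkRequest {
    pub initiator_addr: [u8; 16],
    pub ephemeral_pub: [u8; 32],
    pub nonce: [u8; 24],
    pub flags: [u8; 8],
}

impl LinkRequest {
    /// Serialize the fields back to back, in the order shown in the diagram.
    pub fn to_bytes(&self) -> [u8; LINK_REQUEST_LEN] {
        let mut out = [0u8; LINK_REQUEST_LEN];
        out[0..16].copy_from_slice(&self.initiator_addr);
        out[16..48].copy_from_slice(&self.ephemeral_pub);
        out[48..72].copy_from_slice(&self.nonce);
        out[72..80].copy_from_slice(&self.flags);
        out
    }

    /// Parse a packet of exactly `LINK_REQUEST_LEN` bytes.
    pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != LINK_REQUEST_LEN {
            return None;
        }
        let mut request = LinkRequest {
            initiator_addr: [0u8; 16],
            ephemeral_pub: [0u8; 32],
            nonce: [0u8; 24],
            flags: [0u8; 8],
        };
        request.initiator_addr.copy_from_slice(&bytes[0..16]);
        request.ephemeral_pub.copy_from_slice(&bytes[16..48]);
        request.nonce.copy_from_slice(&bytes[48..72]);
        request.flags.copy_from_slice(&bytes[72..80]);
        Some(request)
    }
}
```

The three constants sum to 240 bytes, matching the total in the diagram. A fixed layout like this avoids any serialization overhead on the constrained link, at the cost of needing a version or flag bit if the packet ever grows.
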

**Tests:** Address derivation, collision resistance (generate 10K addresses, check
no collisions), handshake 3-packet roundtrip, link encryption roundtrip,
envelope v2 with truncated addresses.
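
To put the 10K-address collision test in perspective, the standard birthday approximation gives the chance of any collision among `n` uniformly random `bits`-bit values as `1 - exp(-n(n-1) / 2^(bits+1))`. The helper below is a hypothetical illustration of that formula, assuming truncated SHA-256 output behaves like a uniform random value.

```rust
/// Approximate probability of at least one collision among `n` uniformly
/// random values of `bits` bits, using 1 - exp(-n(n-1) / 2^(bits+1)).
pub fn collision_probability(n: u64, bits: u32) -> f64 {
    let n = n as f64;
    let exponent = -(n * (n - 1.0)) / 2f64.powi(bits as i32 + 1);
    // exp_m1 computes e^x - 1 accurately for tiny x, where 1 - e^x would round to 0.
    -exponent.exp_m1()
}
```

For 10,000 addresses at 128 bits the result is on the order of 1e-31, so the planned test is effectively a smoke test for the derivation code rather than evidence of collision resistance. The probability only becomes substantial near 2^64 addresses, which is the birthday bound quoted in the risk register.
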

**Estimated changes:** ~500 lines new code.

---

## S6 — LoRa Transport & Integration Demo

**Problem:** All the mesh infrastructure from S1-S5 needs a real constrained transport
to prove it works.

**Solution:** LoRa transport backend + end-to-end demo with Meshtastic-compatible
or standalone LoRa hardware.

**Deliverables:**

1. **`transport_lora.rs`** — LoRa transport implementation:
   ```rust
   pub struct LoRaTransport {
       /// Serial connection to LoRa modem (e.g., SX1276/SX1262 via UART).
       serial: AsyncSerial,
       /// LoRa parameters.
       config: LoRaConfig,
   }

   pub struct LoRaConfig {
       /// Serial port path (e.g., /dev/ttyUSB0).
       pub port: String,
       /// Baud rate for serial connection to modem.
       pub baud_rate: u32,
       /// LoRa frequency in Hz (e.g., 868_100_000 for EU868).
       pub frequency: u64,
       /// Spreading factor (7-12).
       pub spreading_factor: u8,
       /// Bandwidth in Hz (125000, 250000, 500000).
       pub bandwidth: u32,
       /// Coding rate (5-8, meaning 4/5 to 4/8).
       pub coding_rate: u8,
       /// TX power in dBm.
       pub tx_power: i8,
   }
   ```

2. **MTU-aware fragmentation:**
   - LoRa MTU typically ranges from 222 bytes (SF7/BW125) down to 51 bytes (SF12/BW125)
   - Automatic fragmentation/reassembly in `TransportManager`
   - Fragment numbering for out-of-order reassembly

3. **Duty cycle management:**
   - EU868: 1% duty cycle enforcement
   - TX budget tracking: don't exceed legal limits
   - Queue with priority (announces < data < emergency)

4. **End-to-end integration demo:**
   ```
   Setup:
     Node A (Laptop + LoRa) ── LoRa ── Node B (RPi + LoRa) ── WiFi ── Node C (Laptop)

   Demo script:
     1. All three nodes start, announce on their transports
     2. A discovers C through B's routing announcements
     3. A sends encrypted message to C: LoRa → B (relay) → WiFi → C
     4. C replies: WiFi → B (relay) → LoRa → A
     5. Show routing table, hop counts, delivery stats at each node
   ```

5. **`scripts/mesh-demo.sh`** — automated demo setup script.

6. **Termux integration:**
   - Update existing Termux build scripts for the mesh features
   - Android phone as a LoRa mesh node (via USB OTG to LoRa modem)
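
The duty-cycle accounting and priority queue from deliverable 3 could be sketched as below. Everything here is an assumption made for illustration: the type names, the one-hour sliding window (1% of an hour is 36 s of airtime), the simplification that a transmission's whole airtime is booked at its start time, and the use of caller-supplied millisecond timestamps instead of a real clock. Actual regulatory limits depend on the sub-band and should be checked against the applicable rules.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, VecDeque};

/// Sliding-window airtime budget.
pub struct DutyCycleBudget {
    window_ms: u64,
    budget_ms: u64,
    /// (start time, airtime) of transmissions still inside the window.
    history: VecDeque<(u64, u64)>,
}

impl DutyCycleBudget {
    pub fn new(window_ms: u64, duty_cycle_percent: u64) -> Self {
        DutyCycleBudget {
            window_ms: window_ms,
            budget_ms: window_ms * duty_cycle_percent / 100,
            history: VecDeque::new(),
        }
    }

    /// Records the transmission and returns true if it fits the remaining
    /// budget; returns false (recording nothing) if it would exceed it.
    pub fn try_transmit(&mut self, now_ms: u64, airtime_ms: u64) -> bool {
        // Forget transmissions that have left the window.
        while let Some(start) = self.history.front().map(|entry| entry.0) {
            if start + self.window_ms > now_ms {
                break;
            }
            self.history.pop_front();
        }
        let used: u64 = self.history.iter().map(|entry| entry.1).sum();
        if used + airtime_ms > self.budget_ms {
            return false;
        }
        self.history.push_back((now_ms, airtime_ms));
        true
    }
}

/// Derived ordering follows declaration order: Announce < Data < Emergency.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TxPriority {
    Announce,
    Data,
    Emergency,
}

/// Highest priority first; FIFO within the same priority.
#[derive(Default)]
pub struct TxQueue {
    heap: BinaryHeap<(TxPriority, Reverse<u64>, Vec<u8>)>,
    next_seq: u64,
}

impl TxQueue {
    pub fn push(&mut self, priority: TxPriority, packet: Vec<u8>) {
        self.heap.push((priority, Reverse(self.next_seq), packet));
        self.next_seq += 1;
    }

    pub fn pop(&mut self) -> Option<(TxPriority, Vec<u8>)> {
        self.heap.pop().map(|(priority, _, packet)| (priority, packet))
    }
}
```

A sender loop would pop the next packet from `TxQueue`, estimate its airtime from the current LoRa settings, and transmit only when `try_transmit` returns true. Passing timestamps in explicitly keeps the budget logic deterministic in tests, which fits the planned duty cycle enforcement test on simulated transports.
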

**Tests:** LoRa transport with mock serial (loopback), fragmentation across LoRa MTU,
duty cycle enforcement, 3-node integration test (simulated transports).

**Hardware needed:** 2-3x LoRa modules (SX1262 recommended), RPi or similar.

**Estimated changes:** ~600 lines new code, ~50 lines build/script changes.

---

## Dependency Graph

```
S1 (Binary Wire)     S2 (Transport Trait)
       │                      │
       └──────┬───────────────┘
              │
   S3 (Announce/Discovery)
              │
   S4 (Multi-Hop Routing)
              │
   S5 (Addresses + Handshake)
              │
   S6 (LoRa + Demo)
```

S1 and S2 can run in **parallel** (no dependency between them). S3 through S6 are sequential.

---

## Comparison: quicprochat (after) vs Reticulum

| Dimension | Reticulum | quicprochat (post-upgrade) |
|-----------|-----------|---------------------------|
| Language | Python | Rust (no_std possible) |
| Crypto | X25519, AES-256-CBC, HMAC-SHA256 | Ed25519, X25519+ML-KEM-768, ChaCha20-Poly1305, MLS |
| Post-Quantum | No | Yes (ML-KEM-768 hybrid) |
| Group Encryption | None (link-level only) | MLS RFC 9420 (forward secrecy + PCS) |
| Wire Format | msgpack | CBOR (compact, IETF standard) |
| Spec | Reference implementation only | Protobuf schemas + potential IETF Draft |
| Transport Agnostic | Yes (mature, 8 years) | Yes (new, but Rust-native) |
| Multi-Hop Routing | Yes (announce + path discovery) | Yes (inspired by Reticulum) |
| Handshake Size | 297 bytes | ~240 bytes |
| Security Audit | None | Designed for auditability (fuzzing, formal model) |
| Embedded Targets | No (CPython required) | Yes (Rust cross-compile, no_std core) |
| LoRa Support | Yes (via RNode) | Yes (direct SX1262 + Meshtastic compat) |

---

## Risk Register

| Risk | Impact | Mitigation |
|------|--------|------------|
| LoRa hardware availability | Blocks S6 | S1-S5 work with simulated transports; LoRa is optional |
| iroh API breaking changes | Medium | Pin iroh version, abstract behind transport trait (S2) |
| Address collision (16-byte truncation) | Low (birthday bound: ~2^64 addresses) | Monitor, with the option to fall back to full 32-byte keys if needed |
| Lightweight handshake security gaps | High | Get crypto review before deploying on real networks |
| Fragmentation complexity | Medium | Start with simple stop-and-wait, optimize later |

---

## Success Criteria

After S4 (minimum viable mesh):
- [ ] 3+ nodes form a self-organizing mesh over TCP transports
- [ ] Messages route automatically through intermediate nodes
- [ ] Node join/leave is handled gracefully (re-announce, route expiry)
- [ ] Wire format is <200 bytes for a typical chat message envelope

After S6 (full demo):
- [ ] Working LoRa ↔ WiFi ↔ QUIC heterogeneous mesh
- [ ] Message delivery across 3 hops with different transports
- [ ] Duty cycle compliance on EU868
- [ ] Android (Termux) node participates in the mesh