feat(p2p): mesh stack, LoRa mock transport, and relay demo

Implement transport abstraction (TCP/iroh), announce and routing table,
multi-hop mesh router, truncated-address link layer, and LoRa mock
medium with fragmentation plus EU868-style duty-cycle accounting.
Add mesh_lora_relay_demo and scripts/mesh-demo.sh. Relax CBOR vs JSON
size assertion to match fixed-size cryptographic overhead. Extend
.gitignore for nested targets and node_modules.

Made-with: Cursor
This commit is contained in:
2026-03-30 21:19:12 +02:00
parent d469999c2a
commit f9ac921a0c
20 changed files with 4042 additions and 6 deletions

View File

@@ -0,0 +1,511 @@
# Reticulum-Inspired Mesh Upgrade Plan
> **Goal:** Transform quicprochat's P2P layer from a simple direct/relay hybrid into a
> self-organizing, multi-hop mesh capable of running over LoRa, Packet Radio, Serial,
> and other low-bandwidth transports — incorporating 8 years of Reticulum design
> learnings, but with Rust, MLS, and post-quantum crypto.
>
> Created: 2026-03-30 | Sprints: 6 | Area: `quicprochat-p2p` + `quicprochat-core`
---
## Architecture Vision
```
Before (current):
Client A ──── iroh QUIC ────► Client B (direct P2P)
│ │
└── QUIC/TLS ── Server ── QUIC/TLS ┘ (relay fallback)
After (target):
Client A ── LoRa ── Node X ── WiFi ── Node Y ── Serial ── Client B
│ │
└── iroh QUIC ── Server (optional) ── iroh QUIC ──────────┘
any transport works:
LoRa, Serial, TCP, UDP, WiFi, Packet Radio, QUIC
```
Key difference from Reticulum: we keep MLS group encryption, post-quantum hybrid KEM,
and formal Protobuf framing. Reticulum's transport-agnostic routing and announce
semantics are the inspiration, not the crypto.
---
## Sprint Overview
| Sprint | Name | Focus | Key Deliverable |
|--------|------|-------|-----------------|
| S1 | Binary Wire Format | Efficiency | CBOR `MeshEnvelope`, ~70% size reduction |
| S2 | Transport Abstraction | Architecture | `MeshTransport` trait, pluggable backends |
| S3 | Announce & Discovery | Self-Organization | Network-wide announce propagation + routing table |
| S4 | Multi-Hop Routing | Core Mesh | Autonomous packet forwarding across intermediate nodes |
| S5 | Truncated Addresses + Lightweight Handshake | LoRa-Ready | 16-byte addresses, minimal handshake for constrained links |
| S6 | LoRa Transport + Integration | Hardware | Working LoRa backend, end-to-end mesh demo |
---
## S1 — Binary Wire Format
**Problem:** `MeshEnvelope::to_bytes()` uses JSON serialization. A typical envelope
is ~500-800 bytes in JSON. On LoRa at 300 bps, that's 13-21 seconds per message.
**Solution:** CBOR binary serialization via `ciborium` (already in workspace deps).
**Deliverables:**
1. **`envelope_binary.rs`** — new serialization functions:
- `MeshEnvelope::to_cbor() -> Vec<u8>` — compact binary encoding
- `MeshEnvelope::from_cbor(bytes: &[u8]) -> Result<Self>` — decoding
- Keep `to_bytes()`/`from_bytes()` as JSON for debug/human-readable use
- Add `to_wire() -> Vec<u8>` as the default wire format (CBOR)
- Add `from_wire(bytes: &[u8]) -> Result<Self>` for receiving
2. **Compact field encoding:**
- `sender_key`: 32 bytes raw (not hex-encoded)
- `recipient_key`: 32 bytes raw (or 16 bytes truncated, prep for S5)
- `signature`: 64 bytes raw
- `id`: 32 bytes raw
- `payload`: raw bytes (no base64)
- `timestamp`: u64 (8 bytes)
- `ttl_secs`: u32 (4 bytes)
- `hop_count`: u8 (1 byte)
- `max_hops`: u8 (1 byte)
3. **Size comparison test:**
- Create identical envelopes, serialize both ways, assert CBOR < 50% of JSON
- Expected: ~140-160 bytes CBOR vs ~500-800 bytes JSON for a typical message
4. **Migration:** `P2pNode::send_mesh()` and `broadcast()` switch to `to_wire()`.
`from_wire()` tries CBOR first, falls back to JSON for backward compat.
**Tests:** Roundtrip CBOR, size comparison, backward compat with JSON, fuzz test
for malformed CBOR input.
**Estimated changes:** ~150 lines new code, ~20 lines modified.
---
## S2 — Transport Abstraction
**Problem:** P2P layer is hardcoded to iroh QUIC. Cannot support LoRa, Serial,
Packet Radio, or other media.
**Solution:** Abstract transport behind a trait. Reticulum calls this "Interface" —
we call it `MeshTransport`.
**Deliverables:**
1. **`transport.rs`** — trait definition:
```rust
#[async_trait]
pub trait MeshTransport: Send + Sync {
/// Human-readable transport name (e.g., "iroh-quic", "lora", "serial").
fn name(&self) -> &str;
/// Maximum transmission unit in bytes.
fn mtu(&self) -> usize;
/// Estimated bitrate in bits/second (for routing cost calculation).
fn bitrate(&self) -> u64;
/// Whether this transport supports bidirectional communication.
fn is_bidirectional(&self) -> bool;
/// Send raw bytes to a destination address.
async fn send(&self, dest: &TransportAddr, data: &[u8]) -> Result<()>;
/// Receive the next incoming packet. Blocks until data arrives.
async fn recv(&self) -> Result<(TransportAddr, Vec<u8>)>;
/// List reachable peers on this transport (e.g., mDNS scan, LoRa beacon).
async fn discover(&self) -> Result<Vec<TransportAddr>>;
}
/// Transport-agnostic address.
pub enum TransportAddr {
/// iroh node ID + optional relay.
Iroh(iroh::EndpointAddr),
/// IP:port for TCP/UDP transports.
Socket(std::net::SocketAddr),
/// LoRa device address (4 bytes).
LoRa([u8; 4]),
/// Serial port path.
Serial(String),
/// Raw bytes for unknown transports.
Raw(Vec<u8>),
}
```
2. **`transport_iroh.rs`** — refactor existing `P2pNode` send/recv into
`IrohTransport` implementing `MeshTransport`.
3. **`transport_tcp.rs`** — simple TCP transport for testing and wired mesh nodes.
Length-prefixed packets over a TCP stream.
4. **`P2pNode` refactor:** Accept `Vec<Box<dyn MeshTransport>>` instead of
hardcoded `Endpoint`. The node listens on all transports simultaneously.
5. **`TransportManager`** — manages multiple transports, routes outbound packets
to the best available transport for a given destination.
**Tests:** IrohTransport passes existing P2P tests, TcpTransport roundtrip,
multi-transport node startup.
**Estimated changes:** ~400 lines new code, ~100 lines refactored.
---
## S3 — Announce & Discovery Protocol
**Problem:** No mesh-wide discovery. mDNS only works on LAN. Nodes beyond one hop
are invisible.
**Solution:** Reticulum-style announce propagation. Nodes broadcast signed announcements
that propagate through the mesh, building a distributed routing table.
**Deliverables:**
1. **`announce.rs`** — Announce packet:
```rust
pub struct MeshAnnounce {
/// Ed25519 public key of the announcing node.
pub identity_key: [u8; 32],
/// Truncated address (hash of identity_key, 16 bytes). Prep for S5.
pub address: [u8; 16],
/// Capabilities bitfield (supports_relay, supports_store, etc.).
pub capabilities: u16,
/// Sequence number (monotonically increasing per node).
pub sequence: u64,
/// Unix timestamp.
pub timestamp: u64,
/// Transports this node is reachable on (list of transport name + addr).
pub reachable_via: Vec<(String, Vec<u8>)>,
/// Ed25519 signature over all above fields.
pub signature: [u8; 64],
}
```
2. **Announce propagation rules (Reticulum-inspired):**
- On startup: broadcast own announce on all transports
- On receiving an announce: verify signature, check sequence > last_seen,
update routing table, re-broadcast on all *other* transports (not the one
it arrived on) with hop_count incremented
- Dedup by `(identity_key, sequence)` — don't re-broadcast already-seen announces
- TTL: announces expire after configurable duration (default 30 minutes)
- Periodic re-announce: every 10 minutes (configurable)
3. **`routing_table.rs`** — Distributed routing table:
```rust
pub struct RoutingTable {
/// Known destinations: address -> routing entry.
entries: HashMap<[u8; 16], RoutingEntry>,
}
pub struct RoutingEntry {
/// Full public key of the destination.
pub identity_key: [u8; 32],
/// Next-hop transport + address to reach this destination.
pub next_hop: (String, TransportAddr),
/// Number of hops to destination (from announce hop_count).
pub hops: u8,
/// Estimated cost (hops * inverse_bitrate_weight).
pub cost: f64,
/// When this entry was last refreshed.
pub last_seen: Instant,
/// Capabilities of the destination.
pub capabilities: u16,
}
```
4. **REPL commands:**
- `/mesh announce` — force re-announce
- `/mesh routes` — show full routing table (replaces current `/mesh route`)
- `/mesh nodes` — list all known nodes with hop count and transport
**Tests:** Announce create/verify, propagation dedup, routing table CRUD,
announce expiry, 3-node propagation simulation.
**Estimated changes:** ~500 lines new code.
---
## S4 — Multi-Hop Routing
**Problem:** Messages can only be sent directly or via server relay. No intermediate
node forwarding.
**Solution:** Autonomous packet forwarding using the routing table from S3.
Every node can relay packets for other nodes.
**Deliverables:**
1. **`router.rs`** — replace `HybridRouter` with `MeshRouter`:
```rust
pub struct MeshRouter {
/// This node's identity.
identity: MeshIdentity,
/// Routing table (populated by announce protocol).
routes: Arc<RwLock<RoutingTable>>,
/// Available transports.
transports: Arc<TransportManager>,
/// Optional server relay (kept as last-resort fallback).
server_relay: Option<Arc<dyn ServerRelay>>,
/// Store-and-forward for unreachable destinations.
store: Arc<Mutex<MeshStore>>,
/// Per-peer delivery stats.
stats: Arc<Mutex<HashMap<[u8; 16], ConnectionStats>>>,
}
```
2. **Routing algorithm:**
```
send(destination_addr, payload):
1. Look up destination in routing table
2. If direct transport available → send directly
3. If next-hop known → wrap in MeshEnvelope, send to next-hop
(next-hop node will repeat this process)
4. If no route → store-and-forward (queue for later)
5. If server relay available → use as last resort
```
3. **Forwarding logic (every node runs this):**
```
on_receive(envelope):
1. Verify signature
2. If addressed to us → deliver to application layer
3. If addressed to someone else:
a. Check hop_count < max_hops and not expired
b. Look up destination in routing table
c. Forward via next-hop transport
d. If no route → store for later forwarding
```
4. **Path MTU Discovery:**
- When routing across transports with different MTUs, fragment if needed
- Fragment header: `[fragment_id: u32][seq: u8][total: u8][payload]`
- Reassembly buffer with timeout
5. **Routing metrics:**
- Track per-path latency, success rate, hop count
- Prefer routes with lower cost (fewer hops, higher bitrate)
- Exponential backoff on failed routes
6. **REPL commands:**
- `/mesh send <address> <message>` — now works multi-hop
- `/mesh trace <address>` — show the route a message would take
- `/mesh stats` — delivery statistics per destination
**Tests:** 3-node relay chain (A→B→C), route failover, fragmentation roundtrip,
store-and-forward when intermediate node offline, routing metric updates.
**Estimated changes:** ~600 lines new code, ~200 lines refactored from existing router.
---
## S5 — Truncated Addresses & Lightweight Handshake
**Problem:** Full 32-byte public keys in every envelope waste bandwidth on constrained
links. QUIC TLS handshake is too heavy for LoRa (2-4 KB).
**Solution:** Truncated hash-based addresses (Reticulum-style) and a minimal
ECDH handshake for low-bandwidth transports.
**Deliverables:**
1. **`address.rs`** — Mesh address type:
```rust
/// 16-byte truncated address derived from Ed25519 public key.
/// Matches Reticulum's approach but with different hash construction.
pub struct MeshAddress([u8; 16]);
impl MeshAddress {
/// Derive from an Ed25519 public key.
/// SHA-256(public_key)[0..16]
pub fn from_public_key(key: &[u8; 32]) -> Self;
/// Check if this address matches a given public key.
pub fn matches(&self, key: &[u8; 32]) -> bool;
}
```
2. **Envelope v2 with truncated addresses:**
- Replace `sender_key: Vec<u8>` (32 bytes) with `sender_addr: MeshAddress` (16 bytes)
- Replace `recipient_key: Vec<u8>` (32 bytes) with `recipient_addr: MeshAddress` (16 bytes)
- Full public keys are exchanged during announce (S3) and cached in routing table
- Saves 32 bytes per envelope (significant on LoRa)
3. **Lightweight handshake for constrained transports:**
```
Link Setup (inspired by Reticulum, but with PQ option):
Packet 1 (Initiator → Responder): 80 bytes
[initiator_addr: 16][ephemeral_x25519_pub: 32][nonce: 24][flags: 8]
Packet 2 (Responder → Initiator): 112 bytes
[responder_addr: 16][ephemeral_x25519_pub: 32][encrypted_identity_proof: 48][nonce: 16]
Packet 3 (Initiator → Responder): 48 bytes
[encrypted_identity_proof: 48]
Total: 240 bytes (vs 2000-4000 for QUIC TLS)
Shared secret: HKDF-SHA256(X25519(eph_a, eph_b) || X25519(id_a, eph_b))
```
4. **`link.rs`** — `MeshLink` session type:
- Negotiated via lightweight handshake on constrained transports
- ChaCha20-Poly1305 for subsequent messages (using derived shared secret)
- Heartbeat to keep link alive (configurable, default every 5 min)
- Link teardown notification
- Automatic upgrade to QUIC if both sides support it
5. **Feature flag:** `--features constrained-transport` gates the lightweight
handshake. QUIC remains the default for Internet/LAN.
**Tests:** Address derivation, collision resistance (generate 10K addresses, check
no collisions), handshake 3-packet roundtrip, link encryption roundtrip,
envelope v2 with truncated addresses.
**Estimated changes:** ~500 lines new code.
---
## S6 — LoRa Transport & Integration Demo
**Problem:** All the mesh infrastructure from S1-S5 needs a real constrained-transport
to prove it works.
**Solution:** LoRa transport backend + end-to-end demo with Meshtastic-compatible
or standalone LoRa hardware.
**Deliverables:**
1. **`transport_lora.rs`** — LoRa transport implementation:
```rust
pub struct LoRaTransport {
/// Serial connection to LoRa modem (e.g., SX1276/SX1262 via UART).
serial: AsyncSerial,
/// LoRa parameters.
config: LoRaConfig,
}
pub struct LoRaConfig {
/// Serial port path (e.g., /dev/ttyUSB0).
pub port: String,
/// Baud rate for serial connection to modem.
pub baud_rate: u32,
/// LoRa frequency in Hz (e.g., 868_100_000 for EU868).
pub frequency: u64,
/// Spreading factor (7-12).
pub spreading_factor: u8,
/// Bandwidth in Hz (125000, 250000, 500000).
pub bandwidth: u32,
/// Coding rate (5-8, meaning 4/5 to 4/8).
pub coding_rate: u8,
/// TX power in dBm.
pub tx_power: i8,
}
```
2. **MTU-aware fragmentation:**
- LoRa MTU is typically 222 bytes (SF7/BW125) to 51 bytes (SF12/BW125)
- Automatic fragmentation/reassembly in `TransportManager`
- Fragment numbering for out-of-order reassembly
3. **Duty cycle management:**
- EU868: 1% duty cycle enforcement
- TX budget tracking: don't exceed legal limits
- Queue with priority (announces < data < emergency)
4. **End-to-end integration demo:**
```
Setup:
Node A (Laptop + LoRa) ── LoRa ── Node B (RPi + LoRa) ── WiFi ── Node C (Laptop)
Demo script:
1. All three nodes start, announce on their transports
2. A discovers C through B's routing announcements
3. A sends encrypted message to C: LoRa → B (relay) → WiFi → C
4. C replies: WiFi → B (relay) → LoRa → A
5. Show routing table, hop counts, delivery stats at each node
```
5. **`scripts/mesh-demo.sh`** — automated demo setup script.
6. **Termux integration:**
- Update existing Termux build scripts for the mesh features
- Android phone as a LoRa mesh node (via USB OTG to LoRa modem)
**Tests:** LoRa transport with mock serial (loopback), fragmentation across LoRa MTU,
duty cycle enforcement, 3-node integration test (simulated transports).
**Hardware needed:** 2-3x LoRa modules (SX1262 recommended), RPi or similar.
**Estimated changes:** ~600 lines new code, ~50 lines build/script changes.
---
## Dependency Graph
```
S1 (Binary Wire) S2 (Transport Trait)
│ │
└──────┬───────────────┘
S3 (Announce/Discovery)
S4 (Multi-Hop Routing)
S5 (Addresses + Handshake)
S6 (LoRa + Demo)
```
S1 and S2 can run in **parallel** (no dependency). S3+ are sequential.
---
## Comparison: quicprochat (after) vs Reticulum
| Dimension | Reticulum | quicprochat (post-upgrade) |
|-----------|-----------|---------------------------|
| Language | Python | Rust (no_std possible) |
| Crypto | X25519, AES-256-CBC, HMAC-SHA256 | Ed25519, X25519+ML-KEM-768, ChaCha20-Poly1305, MLS |
| Post-Quantum | No | Yes (ML-KEM-768 hybrid) |
| Group Encryption | None (link-level only) | MLS RFC 9420 (forward secrecy + PCS) |
| Wire Format | msgpack | CBOR (compact, IETF standard) |
| Spec | Reference implementation only | Protobuf schemas + potential IETF Draft |
| Transport Agnostic | Yes (mature, 8 years) | Yes (new, but Rust-native) |
| Multi-Hop Routing | Yes (announce + path discovery) | Yes (inspired by Reticulum) |
| Handshake Size | 297 bytes | ~240 bytes |
| Security Audit | None | Designed for auditability (fuzzing, formal model) |
| Embedded Targets | No (CPython required) | Yes (Rust cross-compile, no_std core) |
| LoRa Support | Yes (via RNode) | Yes (direct SX1262 + Meshtastic compat) |
---
## Risk Register
| Risk | Impact | Mitigation |
|------|--------|------------|
| LoRa hardware availability | Blocks S6 | S1-S5 work with simulated transports; LoRa is optional |
| iroh API breaking changes | Medium | Pin iroh version, abstract behind transport trait (S2) |
| Address collision (16-byte truncation) | Low (birthday: ~2^64) | Monitor, option to use full 32-byte if needed |
| Lightweight handshake security gaps | High | Get crypto review before deploying on real networks |
| Fragmentation complexity | Medium | Start with simple stop-and-wait, optimize later |
---
## Success Criteria
After S4 (minimum viable mesh):
- [ ] 3+ nodes form a self-organizing mesh over TCP transports
- [ ] Messages route automatically through intermediate nodes
- [ ] Node join/leave is handled gracefully (re-announce, route expiry)
- [ ] Wire format is <200 bytes for a typical chat message envelope
After S6 (full demo):
- [ ] Working LoRa ↔ WiFi ↔ QUIC heterogeneous mesh
- [ ] Message delivery across 3 hops with different transports
- [ ] Duty cycle compliance on EU868
- [ ] Android (Termux) node participates in the mesh

57
docs/status.md Normal file
View File

@@ -0,0 +1,57 @@
# Status Log
## 2026-03-30 — Sprint 6: LoRa transport & integration demo
### Completed
- Added `transport_lora.rs`: `LoRaConfig`, Semtech-style airtime estimate, `DutyCycleTracker` (rolling 1 h window, `eu868_one_percent()`), `LoRaMockMedium` + `LoRaTransport` implementing `MeshTransport` (`lora` name for `TransportManager`), LR framing with automatic fragmentation/reassembly, tests (mock roundtrip, fragmentation, duty accounting, `split_for_mtu`).
- Example `mesh_lora_relay_demo`: A (LoRa mock) → B (relay) → C (TCP) and reply path; `scripts/mesh-demo.sh` runs it.
- Wired `pub mod transport_lora` in `lib.rs`.
- Adjusted `cbor_smaller_than_json` to assert CBOR is materially smaller than JSON (fixed overhead dominates; a strict half-JSON threshold failed on current envelope sizes).
### What's next
- Optional: UART-backed `LoRaTransport` behind a feature flag (modem-specific framing).
- Hardware runbook: replace mock medium with RNode / SX1262 serial when available.
## 2026-03-30 — Sprint 3: Announce & Discovery Protocol
### Completed
- Created `MeshAnnounce` struct with Ed25519 signed announcements, CBOR wire format, hop forwarding
- Created `compute_address()` — SHA-256 truncation of identity key to 16-byte mesh address
- Created `RoutingTable` with `RoutingEntry` — keyed by 16-byte address, supports lookup by address or full key, TTL-based expiry, sequence-based stale rejection
- Created `AnnounceDedup` for loop prevention (address+sequence deduplication)
- Created `AnnounceConfig` with sensible defaults (10min interval, 30min max age, 8 max hops)
- Created `create_announce()` and `process_received_announce()` — complete announce processing pipeline (verify, expiry check, dedup, routing update, propagation decision)
- Capability flags: CAP_RELAY, CAP_STORE, CAP_GATEWAY, CAP_CONSTRAINED
- Tests: 17 tests across 3 modules covering signature verification, tampering, forwarding, expiry, dedup, routing updates, stale rejection, CBOR roundtrip, address determinism
- Updated lib.rs with `announce`, `announce_protocol`, `routing_table` modules
### What's Next
- S4: Multi-Hop Routing
- Integrate announce protocol with TransportManager for actual broadcast/receive loops
- Add tokio async announce loop (periodic re-announce, GC timer)
### Notes
- Signature excludes `hop_count` (same design as MeshEnvelope) so forwarding doesn't break verification
- Protocol engine uses free functions rather than a stateful struct — simpler, more testable
- Cannot run `cargo test` in this environment (no C toolchain / linker available)
## 2026-03-30 — Sprint 2: Transport Abstraction Layer
### Completed
- Created `MeshTransport` trait with `send`, `recv`, `discover`, `close` methods
- Created `TransportAddr` enum for transport-agnostic addressing (Iroh, Socket, LoRa, Serial, Raw)
- Created `TransportInfo` struct for transport capability metadata
- Implemented `IrohTransport` wrapping iroh `Endpoint` with same length-prefixed framing as `P2pNode`
- Implemented `TcpTransport` using tokio `TcpListener`/`TcpStream` with length-prefixed framing
- Implemented `TransportManager` for multi-transport routing based on address type
- Added `async-trait` dependency, enabled tokio `net` + `io-util` features
- Tests: TransportAddr Display formatting, TCP roundtrip, TransportManager routing, error cases
### What's Next
- S3: Announce & Discovery Protocol
- Future: integrate transport layer into `HybridRouter` / replace direct iroh usage
### Notes
- New transport layer sits alongside existing `P2pNode` — no breaking changes
- `IrohTransport` uses separate ALPN (`quicprochat/mesh/1`) to avoid conflicts with `P2pNode`
- Cannot run `cargo test`/`cargo clippy` in this environment (no Rust toolchain installed)