quicproquo/docs/plans/reticulum-mesh-upgrade.md
Christian Nennemann f9ac921a0c feat(p2p): mesh stack, LoRa mock transport, and relay demo
Implement transport abstraction (TCP/iroh), announce and routing table,
multi-hop mesh router, truncated-address link layer, and LoRa mock
medium with fragmentation plus EU868-style duty-cycle accounting.
Add mesh_lora_relay_demo and scripts/mesh-demo.sh. Relax CBOR vs JSON
size assertion to match fixed-size cryptographic overhead. Extend
.gitignore for nested targets and node_modules.

Made-with: Cursor
2026-03-30 21:19:12 +02:00


Reticulum-Inspired Mesh Upgrade Plan

Goal: Transform quicprochat's P2P layer from a simple direct/relay hybrid into a self-organizing, multi-hop mesh that can run over LoRa, packet radio, serial, and other low-bandwidth transports. The design draws on lessons from Reticulum's eight years of development while keeping Rust, MLS, and post-quantum crypto.

Created: 2026-03-30 | Sprints: 6 | Area: quicprochat-p2p + quicprochat-core


Architecture Vision

Before (current):
  Client A ──── iroh QUIC ────► Client B     (direct P2P)
      │                              │
      └── QUIC/TLS ── Server ── QUIC/TLS ┘   (relay fallback)

After (target):
  Client A ── LoRa ── Node X ── WiFi ── Node Y ── Serial ── Client B
      │                                                         │
      └── iroh QUIC ── Server (optional) ── iroh QUIC ──────────┘
                        ▲
                   any transport works:
                   LoRa, Serial, TCP, UDP, WiFi, Packet Radio, QUIC

Key difference from Reticulum: we keep MLS group encryption, post-quantum hybrid KEM, and formal Protobuf framing. Reticulum's transport-agnostic routing and announce semantics are the inspiration, not the crypto.


Sprint Overview

| Sprint | Name | Focus | Key Deliverable |
|---|---|---|---|
| S1 | Binary Wire Format | Efficiency | CBOR MeshEnvelope, ~70% size reduction |
| S2 | Transport Abstraction | Architecture | MeshTransport trait, pluggable backends |
| S3 | Announce & Discovery | Self-Organization | Network-wide announce propagation + routing table |
| S4 | Multi-Hop Routing | Core Mesh | Autonomous packet forwarding across intermediate nodes |
| S5 | Truncated Addresses + Lightweight Handshake | LoRa-Ready | 16-byte addresses, minimal handshake for constrained links |
| S6 | LoRa Transport + Integration | Hardware | Working LoRa backend, end-to-end mesh demo |

S1 — Binary Wire Format

Problem: MeshEnvelope::to_bytes() uses JSON serialization. A typical envelope is ~500-800 bytes in JSON. On LoRa at 300 bps, that's 13-21 seconds per message.
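The airtime figures above follow from size times eight divided by bitrate. A minimal sketch of that arithmetic, where `airtime_secs` is a hypothetical helper (not part of the codebase) that ignores LoRa preamble, header, and coding-rate overhead:

```rust
/// Raw serialization time in seconds for `bytes` at `bits_per_sec`.
/// Hypothetical helper: real LoRa airtime is higher because preamble,
/// header, and forward error correction are not counted here.
fn airtime_secs(bytes: usize, bits_per_sec: u64) -> f64 {
    (bytes as f64 * 8.0) / bits_per_sec as f64
}
```

At 300 bps, 500 bytes works out to about 13.3 s and 800 bytes to about 21.3 s, matching the range quoted above; a 160-byte envelope would take roughly 4.3 s.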

Solution: CBOR binary serialization via ciborium (already in workspace deps).

Deliverables:

  1. envelope_binary.rs — new serialization functions:

    • MeshEnvelope::to_cbor() -> Vec<u8> — compact binary encoding
    • MeshEnvelope::from_cbor(bytes: &[u8]) -> Result<Self> — decoding
    • Keep to_bytes()/from_bytes() as JSON for debug/human-readable use
    • Add to_wire() -> Vec<u8> as the default wire format (CBOR)
    • Add from_wire(bytes: &[u8]) -> Result<Self> for receiving
  2. Compact field encoding:

    • sender_key: 32 bytes raw (not hex-encoded)
    • recipient_key: 32 bytes raw (or 16 bytes truncated, prep for S5)
    • signature: 64 bytes raw
    • id: 32 bytes raw
    • payload: raw bytes (no base64)
    • timestamp: u64 (8 bytes)
    • ttl_secs: u32 (4 bytes)
    • hop_count: u8 (1 byte)
    • max_hops: u8 (1 byte)
  3. Size comparison test:

    • Create identical envelopes, serialize both ways, assert CBOR < 50% of JSON
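    • Expected: ~140-160 bytes CBOR vs ~500-800 bytes JSON for a typical message. Treat the CBOR figure as a target: the raw fields listed above already sum to 174 bytes before payload and CBOR framing, and to 142 bytes once both addresses are truncated to 16 bytes (S5)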
  4. Migration: P2pNode::send_mesh() and broadcast() switch to to_wire(). from_wire() tries CBOR first, falls back to JSON for backward compat.
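As a sanity check on the size targets, the raw widths from the compact field list can be tallied directly. This is only a sketch: the constant and function names are illustrative, and the totals are lower bounds that exclude CBOR map keys and length headers (CBOR also encodes integers in variable width, so measured sizes will differ somewhat).

```rust
// Raw byte widths taken from the compact field encoding list.
const SENDER_KEY: usize = 32;
const RECIPIENT_KEY: usize = 32;
const TRUNCATED_ADDR: usize = 16; // S5 address width
const SIGNATURE: usize = 64;
const ID: usize = 32;
const TIMESTAMP: usize = 8;
const TTL_SECS: usize = 4;
const HOP_COUNT: usize = 1;
const MAX_HOPS: usize = 1;

/// Bytes present in every variant: signature, id, timestamp, TTL, hop counters.
const COMMON: usize = SIGNATURE + ID + TIMESTAMP + TTL_SECS + HOP_COUNT + MAX_HOPS;

/// Fixed overhead with full 32-byte sender and recipient keys.
fn fixed_overhead_full() -> usize {
    SENDER_KEY + RECIPIENT_KEY + COMMON
}

/// Fixed overhead once both addresses are truncated to 16 bytes.
fn fixed_overhead_truncated() -> usize {
    2 * TRUNCATED_ADDR + COMMON
}

/// Lower bound on envelope size: fixed fields plus payload, no CBOR framing.
fn min_envelope_size(payload_len: usize, truncated: bool) -> usize {
    let fixed = if truncated {
        fixed_overhead_truncated()
    } else {
        fixed_overhead_full()
    };
    fixed + payload_len
}
```

With full keys the fixed fields come to 174 bytes, and 142 bytes with truncated addresses, so the signature and message id dominate the envelope regardless of encoding.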

Tests: Roundtrip CBOR, size comparison, backward compat with JSON, fuzz test for malformed CBOR input.

Estimated changes: ~150 lines new code, ~20 lines modified.


S2 — Transport Abstraction

Problem: P2P layer is hardcoded to iroh QUIC. Cannot support LoRa, Serial, Packet Radio, or other media.

Solution: Abstract transport behind a trait. Reticulum calls this "Interface" — we call it MeshTransport.

Deliverables:

  1. transport.rs — trait definition:

    #[async_trait]
    pub trait MeshTransport: Send + Sync {
        /// Human-readable transport name (e.g., "iroh-quic", "lora", "serial").
        fn name(&self) -> &str;
    
        /// Maximum transmission unit in bytes.
        fn mtu(&self) -> usize;
    
        /// Estimated bitrate in bits/second (for routing cost calculation).
        fn bitrate(&self) -> u64;
    
        /// Whether this transport supports bidirectional communication.
        fn is_bidirectional(&self) -> bool;
    
        /// Send raw bytes to a destination address.
        async fn send(&self, dest: &TransportAddr, data: &[u8]) -> Result<()>;
    
        /// Receive the next incoming packet. Blocks until data arrives.
        async fn recv(&self) -> Result<(TransportAddr, Vec<u8>)>;
    
        /// List reachable peers on this transport (e.g., mDNS scan, LoRa beacon).
        async fn discover(&self) -> Result<Vec<TransportAddr>>;
    }
    
    /// Transport-agnostic address.
    pub enum TransportAddr {
        /// iroh node ID + optional relay.
        Iroh(iroh::EndpointAddr),
        /// IP:port for TCP/UDP transports.
        Socket(std::net::SocketAddr),
        /// LoRa device address (4 bytes).
        LoRa([u8; 4]),
        /// Serial port path.
        Serial(String),
        /// Raw bytes for unknown transports.
        Raw(Vec<u8>),
    }
    
  2. transport_iroh.rs — refactor existing P2pNode send/recv into IrohTransport implementing MeshTransport.

  3. transport_tcp.rs — simple TCP transport for testing and wired mesh nodes. Length-prefixed packets over a TCP stream.

  4. P2pNode refactor: Accept Vec<Box<dyn MeshTransport>> instead of hardcoded Endpoint. The node listens on all transports simultaneously.

  5. TransportManager — manages multiple transports, routes outbound packets to the best available transport for a given destination.
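The length-prefixed framing mentioned for transport_tcp.rs could look roughly like the following. This is a synchronous, std-only sketch rather than the async implementation; the 4-byte big-endian prefix, the `MAX_FRAME_LEN` cap, and the function names are all assumptions made for illustration.

```rust
use std::io::{self, Read, Write};

/// Assumed sanity limit on a single frame; not specified in the plan.
const MAX_FRAME_LEN: usize = 64 * 1024;

/// Write one packet as a 4-byte big-endian length followed by the payload.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    let len = payload.len() as u32;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)
}

/// Read one length-prefixed packet, rejecting oversized length fields
/// before allocating the buffer.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok(payload)
}
```

Checking the length before allocating keeps a malformed or hostile peer from forcing a multi-gigabyte allocation with a single 4-byte header.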

Tests: IrohTransport passes existing P2P tests, TcpTransport roundtrip, multi-transport node startup.

Estimated changes: ~400 lines new code, ~100 lines refactored.


S3 — Announce & Discovery Protocol

Problem: No mesh-wide discovery. mDNS only works on LAN. Nodes beyond one hop are invisible.

Solution: Reticulum-style announce propagation. Nodes broadcast signed announcements that propagate through the mesh, building a distributed routing table.

Deliverables:

  1. announce.rs — Announce packet:

    pub struct MeshAnnounce {
        /// Ed25519 public key of the announcing node.
        pub identity_key: [u8; 32],
        /// Truncated address (hash of identity_key, 16 bytes). Prep for S5.
        pub address: [u8; 16],
        /// Capabilities bitfield (supports_relay, supports_store, etc.).
        pub capabilities: u16,
        /// Sequence number (monotonically increasing per node).
        pub sequence: u64,
        /// Unix timestamp.
        pub timestamp: u64,
        /// Transports this node is reachable on (list of transport name + addr).
        pub reachable_via: Vec<(String, Vec<u8>)>,
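        /// Ed25519 signature over all above fields.
        pub signature: [u8; 64],
        /// Hops travelled so far. Relays increment this, so it is not covered by the signature.
        pub hop_count: u8,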
    }
    
  2. Announce propagation rules (Reticulum-inspired):

    • On startup: broadcast own announce on all transports
    • On receiving an announce: verify signature, check sequence > last_seen, update routing table, re-broadcast on all other transports (not the one it arrived on) with hop_count incremented. Because relays mutate hop_count, it must be carried outside the signed fields
    • Dedup by (identity_key, sequence) — don't re-broadcast already-seen announces
    • TTL: announces expire after configurable duration (default 30 minutes)
    • Periodic re-announce: every 10 minutes (configurable)
  3. routing_table.rs — Distributed routing table:

    pub struct RoutingTable {
        /// Known destinations: address -> routing entry.
        entries: HashMap<[u8; 16], RoutingEntry>,
    }
    
    pub struct RoutingEntry {
        /// Full public key of the destination.
        pub identity_key: [u8; 32],
        /// Next-hop transport + address to reach this destination.
        pub next_hop: (String, TransportAddr),
        /// Number of hops to destination (from announce hop_count).
        pub hops: u8,
        /// Estimated cost (hops * inverse_bitrate_weight).
        pub cost: f64,
        /// When this entry was last refreshed.
        pub last_seen: Instant,
        /// Capabilities of the destination.
        pub capabilities: u16,
    }
    
  4. REPL commands:

    • /mesh announce — force re-announce
    • /mesh routes — show full routing table (replaces current /mesh route)
    • /mesh nodes — list all known nodes with hop count and transport
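The dedup, routing-table update, and expiry rules above can be sketched without any networking. Everything here is a simplified assumption: signature verification is taken to have happened before `on_announce` is called, the table is keyed by the 16-byte address, an originator is assumed to send `hop_count` 0, and the cost model (hops times 1 Mbit/s divided by the arrival link's bitrate) is just one possible reading of "hops * inverse_bitrate_weight".

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Only the announce fields that the propagation rules consult.
struct Announce {
    address: [u8; 16],
    sequence: u64,
    /// Hops travelled so far; assumed to sit outside the signed fields.
    hop_count: u8,
}

struct Route {
    sequence: u64,
    hops: u8,
    cost: f64,
    last_seen: Instant,
}

#[derive(Debug, PartialEq)]
enum AnnounceAction {
    /// Fresh announce: table updated, re-broadcast with this hop count.
    Rebroadcast { hop_count: u8 },
    /// Stale or duplicate sequence number: ignore.
    Drop,
}

struct Table {
    routes: HashMap<[u8; 16], Route>,
    ttl: Duration,
}

impl Table {
    fn new(ttl: Duration) -> Self {
        Table { routes: HashMap::new(), ttl }
    }

    /// Accept an already-verified announce only if its sequence is newer.
    fn on_announce(&mut self, a: &Announce, link_bitrate: u64, now: Instant) -> AnnounceAction {
        if let Some(known) = self.routes.get(&a.address) {
            if a.sequence <= known.sequence {
                return AnnounceAction::Drop;
            }
        }
        let hops = a.hop_count.saturating_add(1);
        // Assumed cost model: slower arrival links make each hop more expensive.
        let cost = f64::from(hops) * (1_000_000.0 / link_bitrate.max(1) as f64);
        self.routes.insert(a.address, Route { sequence: a.sequence, hops, cost, last_seen: now });
        AnnounceAction::Rebroadcast { hop_count: hops }
    }

    /// Remove entries that were not refreshed within the TTL.
    fn expire(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.routes.retain(|_, r| now.duration_since(r.last_seen) < ttl);
    }
}
```

One consequence of deduplicating on sequence alone is that the same announce arriving later over a shorter path is dropped; whether to prefer the better path in that case is a design choice this sketch does not make.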

Tests: Announce create/verify, propagation dedup, routing table CRUD, announce expiry, 3-node propagation simulation.

Estimated changes: ~500 lines new code.


S4 — Multi-Hop Routing

Problem: Messages can only be sent directly or via server relay. No intermediate node forwarding.

Solution: Autonomous packet forwarding using the routing table from S3. Every node can relay packets for other nodes.

Deliverables:

  1. router.rs — replace HybridRouter with MeshRouter:

    pub struct MeshRouter {
        /// This node's identity.
        identity: MeshIdentity,
        /// Routing table (populated by announce protocol).
        routes: Arc<RwLock<RoutingTable>>,
        /// Available transports.
        transports: Arc<TransportManager>,
        /// Optional server relay (kept as last-resort fallback).
        server_relay: Option<Arc<dyn ServerRelay>>,
        /// Store-and-forward for unreachable destinations.
        store: Arc<Mutex<MeshStore>>,
        /// Per-peer delivery stats.
        stats: Arc<Mutex<HashMap<[u8; 16], ConnectionStats>>>,
    }
    
  2. Routing algorithm:

    send(destination_addr, payload):
      1. Look up destination in routing table
      2. If direct transport available → send directly
      3. If next-hop known → wrap in MeshEnvelope, send to next-hop
         (next-hop node will repeat this process)
      4. If no route but server relay available → use relay as last resort
      5. Otherwise → store-and-forward (queue for later)
    
  3. Forwarding logic (every node runs this):

    on_receive(envelope):
      1. Verify signature
      2. If addressed to us → deliver to application layer
      3. If addressed to someone else:
         a. Check hop_count < max_hops and not expired
         b. Look up destination in routing table
         c. Forward via next-hop transport
         d. If no route → store for later forwarding
    
  4. Path MTU handling (fragmentation):

    • When routing across transports with different MTUs, fragment if needed
    • Fragment header: [fragment_id: u32][seq: u8][total: u8][payload]
    • Reassembly buffer with timeout
  5. Routing metrics:

    • Track per-path latency, success rate, hop count
    • Prefer routes with lower cost (fewer hops, higher bitrate)
    • Exponential backoff on failed routes
  6. REPL commands:

    • /mesh send <address> <message> — now works multi-hop
    • /mesh trace <address> — show the route a message would take
    • /mesh stats — delivery statistics per destination
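The per-node forwarding logic above reduces to a small pure decision function. This is a sketch under stated assumptions: the signature is already verified, the routing table is collapsed to a map from destination to next-hop address, expiry is an absolute Unix timestamp, and envelopes that fail the hop or expiry check are dropped (the plan does not say what happens to them). All type and function names are illustrative.

```rust
use std::collections::HashMap;

type Addr = [u8; 16];

struct Envelope {
    recipient: Addr,
    hop_count: u8,
    max_hops: u8,
    /// Absolute expiry as Unix seconds (assumed: timestamp + ttl_secs).
    expires_at: u64,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Deliver,
    Forward { next_hop: Addr },
    Store,
    Drop,
}

/// Decide what to do with a received, signature-checked envelope.
/// The caller is expected to increment hop_count before forwarding.
fn on_receive(me: &Addr, env: &Envelope, routes: &HashMap<Addr, Addr>, now: u64) -> Decision {
    if &env.recipient == me {
        return Decision::Deliver;
    }
    if env.hop_count >= env.max_hops || now >= env.expires_at {
        return Decision::Drop;
    }
    match routes.get(&env.recipient) {
        Some(next) => Decision::Forward { next_hop: *next },
        None => Decision::Store,
    }
}
```

Keeping the decision separate from the I/O makes the 3-node relay chain and store-and-forward cases straightforward to unit test without real transports.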

Tests: 3-node relay chain (A→B→C), route failover, fragmentation roundtrip, store-and-forward when intermediate node offline, routing metric updates.
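For the fragmentation roundtrip test, the `[fragment_id: u32][seq: u8][total: u8][payload]` header from the plan can be exercised with a sketch like this one. Big-endian encoding of `fragment_id`, zero-based `seq`, and the function names are assumptions; the reassembly timeout is omitted.

```rust
/// Header size: fragment_id (4) + seq (1) + total (1).
const HEADER_LEN: usize = 6;

/// Split `data` into fragments that each fit in `mtu` bytes, header included.
/// Returns None if the MTU leaves no room for payload or if more than
/// 255 fragments would be needed (the `total` field is one byte).
fn fragment(fragment_id: u32, data: &[u8], mtu: usize) -> Option<Vec<Vec<u8>>> {
    if mtu <= HEADER_LEN {
        return None;
    }
    let chunk = mtu - HEADER_LEN;
    let total = if data.is_empty() { 1 } else { (data.len() + chunk - 1) / chunk };
    if total > 255 {
        return None;
    }
    let mut out = Vec::with_capacity(total);
    for seq in 0..total {
        let start = seq * chunk;
        let end = (start + chunk).min(data.len());
        let mut frag = Vec::with_capacity(HEADER_LEN + end - start);
        frag.extend_from_slice(&fragment_id.to_be_bytes());
        frag.push(seq as u8);
        frag.push(total as u8);
        frag.extend_from_slice(&data[start..end]);
        out.push(frag);
    }
    Some(out)
}

/// Reassemble the fragments of one message, in any order. Returns None on
/// missing, duplicate, or inconsistent fragments.
fn reassemble(frags: &[Vec<u8>]) -> Option<Vec<u8>> {
    let first = frags.first()?;
    if first.len() < HEADER_LEN {
        return None;
    }
    let id = &first[0..4];
    let total = first[5] as usize;
    if total == 0 || frags.len() != total {
        return None;
    }
    let mut slots: Vec<Option<&[u8]>> = vec![None; total];
    for f in frags {
        if f.len() < HEADER_LEN || &f[0..4] != id || f[5] as usize != total {
            return None;
        }
        let seq = f[4] as usize;
        if seq >= total || slots[seq].is_some() {
            return None;
        }
        slots[seq] = Some(&f[HEADER_LEN..]);
    }
    let mut out = Vec::new();
    for s in slots {
        out.extend_from_slice(s?);
    }
    Some(out)
}
```

With a 51-byte MTU each fragment carries 45 payload bytes, so a 200-byte envelope needs five fragments.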

Estimated changes: ~600 lines new code, ~200 lines refactored from existing router.


S5 — Truncated Addresses & Lightweight Handshake

Problem: Full 32-byte public keys in every envelope waste bandwidth on constrained links. QUIC TLS handshake is too heavy for LoRa (2-4 KB).

Solution: Truncated hash-based addresses (Reticulum-style) and a minimal ECDH handshake for low-bandwidth transports.

Deliverables:

  1. address.rs — Mesh address type:

    /// 16-byte truncated address derived from Ed25519 public key.
    /// Matches Reticulum's approach but with different hash construction.
    pub struct MeshAddress([u8; 16]);
    
    impl MeshAddress {
        /// Derive from an Ed25519 public key.
        /// SHA-256(public_key)[0..16]
        pub fn from_public_key(key: &[u8; 32]) -> Self;
    
        /// Check if this address matches a given public key.
        pub fn matches(&self, key: &[u8; 32]) -> bool;
    }
    
  2. Envelope v2 with truncated addresses:

    • Replace sender_key: Vec<u8> (32 bytes) with sender_addr: MeshAddress (16 bytes)
    • Replace recipient_key: Vec<u8> (32 bytes) with recipient_addr: MeshAddress (16 bytes)
    • Full public keys are exchanged during announce (S3) and cached in routing table
    • Saves 32 bytes per envelope (significant on LoRa)
  3. Lightweight handshake for constrained transports:

    Link Setup (inspired by Reticulum, but with PQ option):
    
    Packet 1 (Initiator → Responder): 80 bytes
      [initiator_addr: 16][ephemeral_x25519_pub: 32][nonce: 24][flags: 8]
    
    Packet 2 (Responder → Initiator): 112 bytes
      [responder_addr: 16][ephemeral_x25519_pub: 32][encrypted_identity_proof: 48][nonce: 16]
    
    Packet 3 (Initiator → Responder): 48 bytes
      [encrypted_identity_proof: 48]
    
    Total: 240 bytes (vs 2000-4000 for QUIC TLS)
    Shared secret: HKDF-SHA256(X25519(eph_a, eph_b) || X25519(id_a, eph_b))
    
  4. link.rs - MeshLink session type:

    • Negotiated via lightweight handshake on constrained transports
    • ChaCha20-Poly1305 for subsequent messages (using derived shared secret)
    • Heartbeat to keep link alive (configurable, default every 5 min)
    • Link teardown notification
    • Automatic upgrade to QUIC if both sides support it
  5. Feature flag: --features constrained-transport gates the lightweight handshake. QUIC remains the default for Internet/LAN.
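The packet sizes in the link setup can be checked by writing the layouts down as constants. The sketch below encodes only Packet 1 and performs no cryptography; `LinkInit` and its methods are hypothetical names, and the layout reads `flags` as 8 bytes (not bits), since that is what makes Packet 1 total 80 bytes.

```rust
const ADDR_LEN: usize = 16;
const X25519_PUB_LEN: usize = 32;
const PROOF_LEN: usize = 48;

// Packet sizes derived from the field widths in the link setup description.
const PACKET1_LEN: usize = ADDR_LEN + X25519_PUB_LEN + 24 + 8; // addr, eph key, nonce, flags
const PACKET2_LEN: usize = ADDR_LEN + X25519_PUB_LEN + PROOF_LEN + 16; // addr, eph key, proof, nonce
const PACKET3_LEN: usize = PROOF_LEN;

/// Packet 1 (Initiator to Responder), fields in wire order.
struct LinkInit {
    initiator_addr: [u8; 16],
    ephemeral_pub: [u8; 32],
    nonce: [u8; 24],
    flags: [u8; 8],
}

impl LinkInit {
    /// Serialize as a fixed 80-byte packet.
    fn encode(&self) -> [u8; PACKET1_LEN] {
        let mut out = [0u8; PACKET1_LEN];
        out[0..16].copy_from_slice(&self.initiator_addr);
        out[16..48].copy_from_slice(&self.ephemeral_pub);
        out[48..72].copy_from_slice(&self.nonce);
        out[72..80].copy_from_slice(&self.flags);
        out
    }

    /// Parse a packet, rejecting any buffer that is not exactly 80 bytes.
    fn decode(buf: &[u8]) -> Option<LinkInit> {
        if buf.len() != PACKET1_LEN {
            return None;
        }
        let mut p = LinkInit {
            initiator_addr: [0; 16],
            ephemeral_pub: [0; 32],
            nonce: [0; 24],
            flags: [0; 8],
        };
        p.initiator_addr.copy_from_slice(&buf[0..16]);
        p.ephemeral_pub.copy_from_slice(&buf[16..48]);
        p.nonce.copy_from_slice(&buf[48..72]);
        p.flags.copy_from_slice(&buf[72..80]);
        Some(p)
    }
}
```

The three constants sum to 240 bytes, in line with the stated total. Note that Packet 1 and Packet 2 both exceed a 51-byte payload limit, so on the slowest LoRa settings the handshake itself would have to be fragmented.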

Tests: Address derivation, collision resistance (generate 10K addresses, check no collisions), handshake 3-packet roundtrip, link encryption roundtrip, envelope v2 with truncated addresses.
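For context on the 10K-address collision test, the birthday approximation gives the chance of any collision among uniformly random 128-bit addresses. A small sketch, with `collision_probability` as a hypothetical helper that is only meaningful while the result is well below 1:

```rust
/// Approximate probability of at least one collision among `n` uniformly
/// random `bits`-bit values, using the birthday estimate n(n-1)/2 / 2^bits.
fn collision_probability(n: u64, bits: u32) -> f64 {
    let pairs = (n as f64) * (n as f64 - 1.0) / 2.0;
    pairs / 2f64.powi(bits as i32)
}
```

For 10,000 addresses the estimate is on the order of 1e-31, so the test will never fail by chance; in practice it guards against implementation mistakes such as truncating the wrong bytes or hashing a constant.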

Estimated changes: ~500 lines new code.


S6 — LoRa Transport & Integration Demo

Problem: The mesh infrastructure from S1-S5 needs a real constrained transport to prove that it works.

Solution: LoRa transport backend + end-to-end demo with Meshtastic-compatible or standalone LoRa hardware.

Deliverables:

  1. transport_lora.rs — LoRa transport implementation:

    pub struct LoRaTransport {
        /// Serial connection to LoRa modem (e.g., SX1276/SX1262 via UART).
        serial: AsyncSerial,
        /// LoRa parameters.
        config: LoRaConfig,
    }
    
    pub struct LoRaConfig {
        /// Serial port path (e.g., /dev/ttyUSB0).
        pub port: String,
        /// Baud rate for serial connection to modem.
        pub baud_rate: u32,
        /// LoRa frequency in Hz (e.g., 868_100_000 for EU868).
        pub frequency: u64,
        /// Spreading factor (7-12).
        pub spreading_factor: u8,
        /// Bandwidth in Hz (125000, 250000, 500000).
        pub bandwidth: u32,
        /// Coding rate (5-8, meaning 4/5 to 4/8).
        pub coding_rate: u8,
        /// TX power in dBm.
        pub tx_power: i8,
    }
    
  2. MTU-aware fragmentation:

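    • Usable LoRa payload ranges from 222 bytes (SF7/BW125) down to 51 bytes (SF12/BW125); these are the LoRaWAN EU868 application payload limits, while the raw LoRa PHY allows up to 255 bytes per packet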
    • Automatic fragmentation/reassembly in TransportManager
    • Fragment numbering for out-of-order reassembly
  3. Duty cycle management:

    • EU868: 1% duty cycle enforcement
    • TX budget tracking: don't exceed legal limits
    • Queue with priority (announces < data < emergency)
  4. End-to-end integration demo:

    Setup:
      Node A (Laptop + LoRa) ── LoRa ── Node B (RPi + LoRa) ── WiFi ── Node C (Laptop)
    
    Demo script:
      1. All three nodes start, announce on their transports
      2. A discovers C through B's routing announcements
      3. A sends encrypted message to C: LoRa → B (relay) → WiFi → C
      4. C replies: WiFi → B (relay) → LoRa → A
      5. Show routing table, hop counts, delivery stats at each node
    
  5. scripts/mesh-demo.sh — automated demo setup script.

  6. Termux integration:

    • Update existing Termux build scripts for the mesh features
    • Android phone as a LoRa mesh node (via USB OTG to LoRa modem)
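The TX budget tracking under duty cycle management could be approached with a sliding-window accountant like the one below. This is a sketch under assumptions: the one-hour window and the rule that a transmission counts fully at its start time are illustrative choices, `DutyCycle` is a hypothetical type, and real EU868 limits apply per sub-band rather than to the whole band.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window airtime budget: at most `percent` of `window` on air.
struct DutyCycle {
    window: Duration,
    budget: Duration,
    /// (start time, airtime) of transmissions still inside the window.
    sent: VecDeque<(Instant, Duration)>,
}

impl DutyCycle {
    /// A one-hour window at 1% yields a 36-second budget.
    fn new(window: Duration, percent: u32) -> Self {
        DutyCycle { window, budget: window * percent / 100, sent: VecDeque::new() }
    }

    /// Airtime consumed inside the current window, after pruning old entries.
    fn used(&mut self, now: Instant) -> Duration {
        while let Some(&(t, _)) = self.sent.front() {
            if now.duration_since(t) >= self.window {
                self.sent.pop_front();
            } else {
                break;
            }
        }
        self.sent.iter().map(|&(_, d)| d).sum()
    }

    /// Record the transmission and return true if it fits the remaining
    /// budget; return false (and record nothing) if it would exceed it.
    fn try_transmit(&mut self, airtime: Duration, now: Instant) -> bool {
        if self.used(now) + airtime > self.budget {
            return false;
        }
        self.sent.push_back((now, airtime));
        true
    }
}
```

A rejected packet would stay in the priority queue until enough earlier transmissions age out of the window, which is where the announce, data, and emergency ordering decides what goes out first.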

Tests: LoRa transport with mock serial (loopback), fragmentation across LoRa MTU, duty cycle enforcement, 3-node integration test (simulated transports).

Hardware needed: 2-3x LoRa modules (SX1262 recommended), RPi or similar.

Estimated changes: ~600 lines new code, ~50 lines build/script changes.


Dependency Graph

S1 (Binary Wire)     S2 (Transport Trait)
       │                      │
       └──────┬───────────────┘
              │
       S3 (Announce/Discovery)
              │
       S4 (Multi-Hop Routing)
              │
       S5 (Addresses + Handshake)
              │
       S6 (LoRa + Demo)

S1 and S2 can run in parallel (no dependency). S3+ are sequential.


Comparison: quicprochat (after) vs Reticulum

| Dimension | Reticulum | quicprochat (post-upgrade) |
|---|---|---|
| Language | Python | Rust (no_std possible) |
| Crypto | X25519, AES-256-CBC, HMAC-SHA256 | Ed25519, X25519+ML-KEM-768, ChaCha20-Poly1305, MLS |
| Post-Quantum | No | Yes (ML-KEM-768 hybrid) |
| Group Encryption | None (link-level only) | MLS RFC 9420 (forward secrecy + PCS) |
| Wire Format | msgpack | CBOR (compact, IETF standard) |
| Spec | Reference implementation only | Protobuf schemas + potential IETF Draft |
| Transport Agnostic | Yes (mature, 8 years) | Yes (new, but Rust-native) |
| Multi-Hop Routing | Yes (announce + path discovery) | Yes (inspired by Reticulum) |
| Handshake Size | 297 bytes | ~240 bytes |
| Security Audit | None | Designed for auditability (fuzzing, formal model) |
| Embedded Targets | No (CPython required) | Yes (Rust cross-compile, no_std core) |
| LoRa Support | Yes (via RNode) | Yes (direct SX1262 + Meshtastic compat) |

Risk Register

| Risk | Impact | Mitigation |
|---|---|---|
| LoRa hardware availability | Blocks S6 | S1-S5 work with simulated transports; LoRa is optional |
| iroh API breaking changes | Medium | Pin iroh version, abstract behind transport trait (S2) |
| Address collision (16-byte truncation) | Low (birthday: ~2^64) | Monitor, option to use full 32-byte if needed |
| Lightweight handshake security gaps | High | Get crypto review before deploying on real networks |
| Fragmentation complexity | Medium | Start with simple stop-and-wait, optimize later |

Success Criteria

After S4 (minimum viable mesh):

  • 3+ nodes form a self-organizing mesh over TCP transports
  • Messages route automatically through intermediate nodes
  • Node join/leave is handled gracefully (re-announce, route expiry)
  • Wire format is <200 bytes for a typical chat message envelope

After S6 (full demo):

  • Working LoRa ↔ WiFi ↔ QUIC heterogeneous mesh
  • Message delivery across 3 hops with different transports
  • Duty cycle compliance on EU868
  • Android (Termux) node participates in the mesh