feat: add observability module and wire MeshNode run() with background tasks
Add health checks (/healthz), Prometheus metrics export (/metricsz), and tracing spans to the P2P mesh node. MeshNode.run() starts the GC and health server as background tasks and returns a RunHandle for lifecycle management. The health endpoint returns 503 during the graceful-shutdown drain.
@@ -1,5 +1,41 @@
# Status Log
## 2026-04-11 — Observability & MeshNode run() wiring
### Completed
- **observability.rs**: new module with health checks, Prometheus text export, HTTP server
  - `NodeHealth` struct with per-subsystem health checks (transport, routing, store)
  - `HealthStatus` enum (Healthy/Degraded/Draining/Unhealthy) with HTTP status codes
  - `prometheus_text()`: renders `MetricsSnapshot` in Prometheus exposition format
  - `HealthServer`: lightweight TCP-based HTTP server for `/healthz` and `/metricsz`
- **MeshNode.run()**: starts background tasks and returns a `RunHandle`
  - Periodic GC task (store, routing table, rate limiters) with configurable interval
  - Health/metrics HTTP server (optional, via `MeshNodeBuilder.health_listen()`)
  - Shutdown coordination via `watch` channel
- **RunHandle**: public API for interacting with a running node
  - `.node()`: access to the MeshNode
  - `.health()`: current health snapshot
  - `.metrics_snapshot()`: current metrics
  - `.health_addr()`: bound health server address
  - `.shutdown()`: graceful shutdown (signals tasks + drains transports)
- **Tracing spans**: `#[tracing::instrument]` on `process_incoming()` and `send()`
  - Includes sender/dest address and payload length as span fields
  - GC cycle wrapped in `mesh_gc` info span
- **Draining flag**: `AtomicBool` for shutdown awareness; health endpoint returns 503
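
The status mapping and draining flag listed above could look roughly like the sketch below. Only the four variant names and the 503-while-draining behavior come from this log. The 200/503 codes for the other variants, the `overall_status` helper, and its aggregation rule (any failed subsystem is Degraded, all failed is Unhealthy) are assumptions made for illustration, not the module's actual logic.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// The four states named in the log. The numeric mapping below is assumed,
/// except for Draining -> 503, which the log states explicitly.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HealthStatus {
    Healthy,
    Degraded,
    Draining,
    Unhealthy,
}

impl HealthStatus {
    /// HTTP status code a health endpoint would answer with.
    pub fn http_status(self) -> u16 {
        match self {
            // Assumption: a degraded node still serves traffic, so it reports 200.
            HealthStatus::Healthy | HealthStatus::Degraded => 200,
            HealthStatus::Draining | HealthStatus::Unhealthy => 503,
        }
    }
}

/// Hypothetical aggregation over per-subsystem results
/// (e.g. transport, routing, store). The draining flag wins over everything.
pub fn overall_status(draining: &AtomicBool, subsystems_ok: &[bool]) -> HealthStatus {
    if draining.load(Ordering::Acquire) {
        return HealthStatus::Draining;
    }
    let failed = subsystems_ok.iter().filter(|ok| !**ok).count();
    match failed {
        0 => HealthStatus::Healthy,
        n if n == subsystems_ok.len() => HealthStatus::Unhealthy,
        _ => HealthStatus::Degraded,
    }
}
```

Checking the flag first means a load balancer sees 503 as soon as shutdown begins, even if every subsystem is still healthy.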
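
For the `prometheus_text()` item, a minimal sketch of rendering a snapshot in the Prometheus text exposition format follows. The `MetricsSnapshot` fields and the metric names here are hypothetical stand-ins, since the log does not list the real ones. Only the `# HELP` / `# TYPE` / sample line layout is taken from the exposition format itself.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for the crate's `MetricsSnapshot`;
/// the real struct's fields are not shown in this log.
pub struct MetricsSnapshot {
    pub messages_sent: u64,
    pub messages_received: u64,
    pub routing_table_size: u64,
}

/// Render the snapshot as Prometheus text: one HELP line, one TYPE line,
/// and one sample line per metric.
pub fn prometheus_text(snap: &MetricsSnapshot) -> String {
    let mut out = String::new();
    write_metric(&mut out, "mesh_messages_sent_total", "counter",
                 "Messages sent by this node.", snap.messages_sent);
    write_metric(&mut out, "mesh_messages_received_total", "counter",
                 "Messages received by this node.", snap.messages_received);
    write_metric(&mut out, "mesh_routing_table_size", "gauge",
                 "Entries currently in the routing table.", snap.routing_table_size);
    out
}

fn write_metric(out: &mut String, name: &str, kind: &str, help: &str, value: u64) {
    // Writing into a String cannot fail, so the fmt::Result is ignored.
    let _ = writeln!(out, "# HELP {name} {help}");
    let _ = writeln!(out, "# TYPE {name} {kind}");
    let _ = writeln!(out, "{name} {value}");
}
```

Monotonic totals are typed as `counter` and point-in-time sizes as `gauge`, which is the distinction a scraper relies on when computing rates.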
### Test Coverage
- 232 total tests passing (212 lib + 3 fapp_flow + 1 meshservice + 16 multi_node)
- 7 new observability unit tests (health healthy/degraded/draining, prometheus format)
- Full workspace `cargo check` clean
### What's Next
1. Wire `MeshNode.run()` into an example binary or the server
2. Announce loop task (periodic re-announce to neighbors)
3. Grafana dashboard for mesh metrics
4. Integration test for health HTTP endpoint
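
As a starting point for item 4, the shape of such a test can be sketched with the standard library alone. This does not use the real `HealthServer` or `RunHandle`: `spawn_health_stub` and `probe_healthz` are hypothetical helpers, and the stub answers every request the same way regardless of path. It only illustrates the assertion the test would make, namely 200 before the draining flag is set and 503 after.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stub: binds an ephemeral port, serves `requests` connections
/// on a background thread, and answers 503 once `draining` is set.
pub fn spawn_health_stub(draining: Arc<AtomicBool>, requests: usize) -> std::io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        for stream in listener.incoming().take(requests) {
            let mut stream = match stream {
                Ok(s) => s,
                Err(_) => continue,
            };
            // Read the request once; the stub does not parse the path.
            let mut buf = [0u8; 1024];
            let _ = stream.read(&mut buf);
            let (code, reason, body) = if draining.load(Ordering::Acquire) {
                (503, "Service Unavailable", "draining")
            } else {
                (200, "OK", "healthy")
            };
            let _ = write!(
                stream,
                "HTTP/1.1 {code} {reason}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
                body.len()
            );
        }
    });
    Ok(addr)
}

/// Send `GET /healthz` and return the numeric status code from the status line.
pub fn probe_healthz(addr: SocketAddr) -> std::io::Result<u16> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(b"GET /healthz HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    response
        .split_whitespace()
        .nth(1)
        .and_then(|code| code.parse::<u16>().ok())
        .ok_or_else(|| std::io::Error::new(std::io::ErrorKind::InvalidData, "missing status code"))
}
```

In a real integration test, the stub would presumably be replaced by a node started via `MeshNode.run()`, with the probe pointed at the address from `.health_addr()` and the second probe issued after `.shutdown()` begins draining.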
---
## 2026-04-01 — meshservice workspace integration
### Completed