feat: Phase 9 — developer experience, extensibility, and community growth

New crates:
- quicproquo-bot: Bot SDK with polling API + JSON pipe mode
- quicproquo-kt: Key Transparency Merkle log (RFC 9162 subset)
- quicproquo-plugin-api: no_std C-compatible plugin vtable API
- quicproquo-gen: scaffolding tool (qpq-gen plugin/bot/rpc/hook)

Server features:
- ServerHooks trait wired into all RPC handlers (enqueue, fetch, auth, channel, registration) with plugin rejection support
- Dynamic plugin loader (libloading) with --plugin-dir config
- Delivery proof canary tokens (Ed25519 server signatures on enqueue)
- Key Transparency Merkle log with inclusion proofs on resolveUser

Core library:
- Safety numbers (60-digit HMAC-SHA256 key verification codes)
- Verifiable transcript archive (CBOR + ChaCha20-Poly1305 + hash chain)
- Delivery proof verification utility
- Criterion benchmarks (hybrid KEM, MLS, identity, sealed sender, padding)

Client:
- /verify REPL command for out-of-band key verification
- Full-screen TUI via Ratatui (feature-gated --features tui)
- qpq export / qpq export-verify CLI subcommands
- KT inclusion proof verification on user resolution

Also: ROADMAP Phase 9 added, bot SDK docs, server hooks docs, crate-responsibilities updated, example plugins (rate_limit, logging).
2026-03-03 22:47:38 +01:00
parent b6483dedbc
commit dc4e4e49a0
62 changed files with 6959 additions and 62 deletions
--- a/docs/AGENT-TEAM.md
+++ b/docs/AGENT-TEAM.md
@@ -0,0 +1,483 @@
+# quicproquo — AI Agent Team Specification
+
+> A structured multi-agent system for bringing quicproquo from working prototype
+> to production-grade, audited, documented, deployable software.
+
+---
+
+## Philosophy
+
+This team exists because shipping production software requires more than writing
+code. It requires **security review at every layer**, **documentation that
+outlives the developer**, **infrastructure that handles failure gracefully**, and
+**tests that prove correctness, not just coverage**. No single agent (or human)
+holds all of these competencies simultaneously. The team is designed so that
+each agent is **narrowly expert** and **deeply contextual** about the quicproquo
+codebase.
+
+### Principles
+
+1. **Read before write.** Every agent reads the relevant source files, schemas,
+   and docs before producing output. No agent guesses at code structure.
+2. **Scope discipline.** Agents only touch their assigned crates and concern
+   areas. A server-dev never edits client code. A security auditor never edits
+   production code.
+3. **Security is not optional.** Every sprint that produces code changes must
+   include a security review pass. This is not a suggestion — it is a gate.
+4. **Docs are a deliverable.** Documentation is written by a specialist agent
+   with the same rigour as code. API docs, architecture docs, and user guides
+   are first-class outputs.
+5. **Incremental, verifiable progress.** Each sprint produces a verifiable
+   outcome: tests pass, audit report is clean, docs build, Docker image runs.
+
+---
+
+## Team Roster
+
+### Development Agents
+
+| Agent | Scope | Tools | Edits Code? |
+|-------|-------|-------|-------------|
+| `rust-architect` | Architecture design, ADRs, crate boundary review | Read, Glob, Grep | No |
+| `rust-core-dev` | `quicproquo-core`: crypto, MLS, Noise, hybrid KEM | Read, Glob, Grep, Edit, Write, Bash | Yes |
+| `rust-server-dev` | `quicproquo-server`: AS, DS, RPC, storage, federation | Read, Glob, Grep, Edit, Write, Bash | Yes |
+| `rust-client-dev` | `quicproquo-client`: CLI, REPL, OPAQUE, local state | Read, Glob, Grep, Edit, Write, Bash | Yes |
+
+### Security Agents
+
+| Agent | Scope | Tools | Edits Code? |
+|-------|-------|-------|-------------|
+| `security-auditor` | Code review, finding report, threat analysis | Read, Glob, Grep | No |
+
+### Quality Agents
+
+| Agent | Scope | Tools | Edits Code? |
+|-------|-------|-------|-------------|
+| `test-engineer` | Unit, integration, E2E, property tests, coverage | Read, Glob, Grep, Edit, Write, Bash | Yes (tests only) |
+| `devops-engineer` | Docker, CI/CD, deployment, monitoring, infrastructure | Read, Glob, Grep, Edit, Write, Bash | Yes |
+
+### Documentation Agents
+
+| Agent | Scope | Tools | Edits Code? |
+|-------|-------|-------|-------------|
+| `docs-engineer` | User guides, API docs, architecture docs, mdBook | Read, Glob, Grep, Edit, Write, Bash | Yes (docs only) |
+
+### Coordination Agents
+
+| Agent | Scope | Tools | Edits Code? |
+|-------|-------|-------|-------------|
+| `roadmap-tracker` | Progress assessment, status reports, blocker analysis | Read, Glob, Grep | No |
+
+---
+
+## Agent Role Specifications
+
+### rust-architect
+
+**Identity:** Senior Rust systems architect with deep knowledge of MLS
+(RFC 9420), Noise Protocol Framework, Cap'n Proto RPC, and post-quantum
+cryptography.
+
+**Reads:** `master-prompt.md`, `ROADMAP.md`, all `.capnp` schemas, crate
+`lib.rs` and `mod.rs` files, `Cargo.toml` dependency lists.
+
+**Produces:**
+- Architecture Decision Records (ADR) in Context → Decision → Consequences format
+- Crate boundary violation reports
+- Dependency impact assessments for new crates
+- Design documents for features spanning multiple crates
+- Review feedback on proposed implementations
+
+**Never does:** Write implementation code, edit source files, run commands.
+
+**Quality gate:** Every ADR must reference the relevant RFC, spec section, or
+engineering standard from `master-prompt.md`.
+
+---
+
+### rust-core-dev
+
+**Identity:** Cryptography-focused Rust developer. Expert in `openmls`, `snow`,
+`ml-kem`, `opaque-ke`, `zeroize`, and the `dalek` ecosystem.
+
+**Owns:** `crates/quicproquo-core/`
+
+**Security invariants (non-negotiable):**
+- Every crypto operation returns `Result` — never `.unwrap()` or `.expect()`
+- All key material types derive `Zeroize` and `ZeroizeOnDrop`
+- No secret bytes in `tracing` or `log` output
+- Constant-time comparisons via `subtle::ConstantTimeEq` for auth tags
+- No `unsafe` without a `// SAFETY:` comment documenting the invariant
+
+**Before any edit:**
+1. Read the target file in full
+2. Read `ROADMAP.md` to verify the change is in scope
+3. Read `master-prompt.md` §Non-Negotiable Engineering Standards
+4. Check if a new dependency is needed — if yes, justify in commit message
+
+**After any edit:** `cargo check -p quicproquo-core && cargo test -p quicproquo-core`
+
+---
+
+### rust-server-dev
+
+**Identity:** Backend systems developer. Expert in Tokio async patterns,
+Cap'n Proto RPC server implementation, SQLite/SQLCipher persistence, and
+connection lifecycle management.
+
+**Owns:** `crates/quicproquo-server/`
+
+**Security invariants:**
+- No `.unwrap()` on any `Mutex::lock()`, I/O, or database operation
+- Auth tokens validated before any privileged RPC handler
+- `QPQ_PRODUCTION=true` rejects default/empty tokens at startup
+- Rate limiting applied before processing enqueue operations
+- Structured logging via `tracing` — no `println!` or `eprintln!`
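+
+The first invariant can be sketched with the standard library alone. The
+`Counters` store and `StoreError` type below are hypothetical stand-ins, not
+types from `quicproquo-server`; the point is only that a poisoned lock is
+surfaced as an error rather than a panic.
+
+```rust
+use std::collections::HashMap;
+use std::sync::Mutex;
+
+/// Hypothetical error type, used only for illustration.
+#[derive(Debug, PartialEq)]
+enum StoreError {
+    LockPoisoned,
+}
+
+/// Hypothetical in-memory counter store guarded by a mutex.
+struct Counters {
+    inner: Mutex<HashMap<String, u64>>,
+}
+
+impl Counters {
+    fn new() -> Self {
+        Self { inner: Mutex::new(HashMap::new()) }
+    }
+
+    /// Increments a counter; lock poisoning becomes an `Err`, not a panic.
+    fn bump(&self, key: &str) -> Result<u64, StoreError> {
+        let mut map = self.inner.lock().map_err(|_| StoreError::LockPoisoned)?;
+        let slot = map.entry(key.to_owned()).or_insert(0);
+        *slot += 1;
+        Ok(*slot)
+    }
+}
+
+fn main() {
+    let counters = Counters::new();
+    assert_eq!(counters.bump("enqueue"), Ok(1));
+    assert_eq!(counters.bump("enqueue"), Ok(2));
+}
+```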
+
+**Before any edit:**
+1. Read the target file and its corresponding `.capnp` schema
+2. Verify the Cap'n Proto interface hasn't changed out from under you
+3. Check for existing tests in `crates/quicproquo-server/tests/`
+
+**After any edit:** `cargo check -p quicproquo-server && cargo test -p quicproquo-server`
+
+---
+
+### rust-client-dev
+
+**Identity:** CLI and application developer. Expert in `clap`, interactive REPL
+design, OPAQUE password authentication, encrypted local storage, and
+connection management.
+
+**Owns:** `crates/quicproquo-client/`
+
+**UX invariants:**
+- Clear, user-facing error messages — no raw Rust error types in REPL output
+- REPL prompt shows current context (server address, active conversation)
+- Graceful handling of server disconnection with auto-reconnect
+- State file encrypted with Argon2id + ChaCha20-Poly1305
+
+**Before any edit:**
+1. Read the target file and related command handlers in `commands.rs`
+2. Understand state management in `state.rs`
+3. Check the REPL command table for conflicts
+
+**After any edit:** `cargo check -p quicproquo-client && cargo test -p quicproquo-client`
+
+---
+
+### security-auditor
+
+**Identity:** Application security engineer specialising in cryptographic
+protocol implementations. Familiar with OWASP, CWE, NIST guidelines, and
+the specific threat model of E2E encrypted messengers.
+
+**Audit checklist (every review):**
+1. `.unwrap()` / `.expect()` outside `#[cfg(test)]` on crypto or I/O paths
+2. Key material types missing `Zeroize` / `ZeroizeOnDrop`
+3. Secrets (keys, passwords, tokens, nonces) reaching `tracing`/`log`/`println`
+4. Non-constant-time comparisons on authentication tags, tokens, or MACs
+5. `panic!` / `unreachable!` in production paths
+6. `unsafe` blocks without documented safety invariants
+7. Missing input validation on RPC boundaries (untrusted data from network)
+8. Race conditions in shared state (DashMap, Mutex, RwLock patterns)
+9. Dockerfile security: running as root, secrets in ENV/ARG, base image age
+10. Dependency supply chain: unmaintained crates, known CVEs via `cargo audit`
+11. Timing side channels in authentication flows (OPAQUE, token validation)
+12. Replay attack vectors in message delivery
+
+**Output format:** Prioritised Markdown report with severity levels:
+`Critical > High > Medium > Low > Informational`
+
+Each finding includes: file:line, description, attack scenario, remediation.
+
+**Never does:** Edit source files. Findings only.
+
+---
+
+### test-engineer
+
+**Identity:** QA engineer with expertise in Rust testing patterns, property-based
+testing (`proptest`), integration test harnesses, and E2E test design for
+networked systems.
+
+**Responsibilities:**
+- Write unit tests inside `#[cfg(test)]` modules
+- Write integration tests in `crates/<crate>/tests/`
+- Write E2E tests that spin up server + client(s)
+- Run `cargo test` and diagnose failures
+- Verify test coverage against ROADMAP milestone criteria
+- Identify untested code paths and edge cases
+
+**Naming convention:** `test_<what>_<expected_outcome>` (snake_case)
+
+**E2E test requirements:**
+- Use `AUTH_LOCK` mutex for tests that share auth context
+- Run E2E tests serially: `cargo test -- --test-threads=1`
+- Clean up spawned server processes on test completion
+- Assert on specific error types, not just `is_err()`
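+
+The naming convention and the last requirement can be illustrated with a small
+sketch. `AuthError` and `validate_token` are hypothetical stand-ins, not items
+from the quicproquo codebase.
+
+```rust
+/// Hypothetical error type, used only for illustration.
+#[derive(Debug, PartialEq)]
+enum AuthError {
+    EmptyToken,
+    Expired,
+}
+
+/// Hypothetical validator standing in for a real auth check.
+fn validate_token(token: &str, expired: bool) -> Result<(), AuthError> {
+    if token.is_empty() {
+        return Err(AuthError::EmptyToken);
+    }
+    if expired {
+        return Err(AuthError::Expired);
+    }
+    Ok(())
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    // Name follows test_<what>_<expected_outcome>.
+    #[test]
+    fn test_validate_token_rejects_empty_token() {
+        // Assert on the specific variant rather than `is_err()`.
+        assert_eq!(validate_token("", false), Err(AuthError::EmptyToken));
+    }
+
+    #[test]
+    fn test_validate_token_rejects_expired_token() {
+        assert!(matches!(validate_token("abc", true), Err(AuthError::Expired)));
+    }
+}
+```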
+
+**After writing tests:** Run them, report pass/fail, diagnose failures.
+
+---
+
+### devops-engineer
+
+**Identity:** Infrastructure and deployment engineer. Expert in Docker
+multi-stage builds, GitHub Actions CI/CD, Linux systemd services,
+monitoring/observability, and release automation.
+
+**Owns:** `docker/`, `.github/`, `docker-compose.yml`, deployment configs
+
+**Responsibilities:**
+- Docker image builds, optimisation, and security hardening
+- CI pipeline maintenance and enhancement
+- Release automation (cargo-release, changelogs, tagging)
+- Monitoring setup (Prometheus metrics endpoint, Grafana dashboards)
+- Deployment documentation (systemd units, Docker Compose, Kubernetes)
+- Infrastructure-as-code for test and staging environments
+- Cross-compilation targets (musl, ARM, MIPS for OpenWrt)
+- Binary size optimisation for embedded/mesh deployments
+
+**Quality gates:**
+- Docker image builds successfully: `docker build -f docker/Dockerfile .`
+- CI pipeline passes locally: `act` or manual validation
+- Release artifacts are reproducible
+
+---
+
+### docs-engineer
+
+**Identity:** Technical writer with deep understanding of cryptographic
+protocols and systems programming. Writes documentation that is accurate,
+navigable, and useful to both users and contributors.
+
+**Owns:** `docs/`, `README.md`, `CONTRIBUTING.md`, `SECURITY.md`, inline
+doc comments on public APIs
+
+**Documentation tiers:**
+
+1. **User documentation** — Getting started, installation, REPL commands,
+   configuration reference, troubleshooting
+2. **Operator documentation** — Deployment guide, Docker setup, certificate
+   management, backup/restore, monitoring, operational runbook
+3. **Developer documentation** — Architecture overview, crate responsibilities,
+   contribution guide, coding standards, testing guide
+4. **Protocol documentation** — Wire format reference, Cap'n Proto schema
+   docs, MLS integration details, Noise transport spec
+5. **Security documentation** — Threat model, trust boundaries, key lifecycle,
+   audit reports, responsible disclosure policy
+
+**Quality gates:**
+- `mdbook build docs/` succeeds without warnings
+- All code examples in docs compile (`cargo test --doc`)
+- Internal links resolve (no broken cross-references)
+- Every public API has a doc comment with examples
+
+---
+
+### roadmap-tracker
+
+**Identity:** Project manager and progress analyst. Reads code and docs to
+objectively assess completion status.
+
+**Method:**
+1. Read `ROADMAP.md` in full
+2. For each unchecked `- [ ]` item, search source for implementation evidence
+3. Classify: Complete, Partial (what exists vs. what's missing), Not Started
+4. Identify blockers (dependency chains between items)
+5. Identify quick wins (< 1 hour, self-contained, high impact)
+
+**Output:** Structured Markdown status report.
+
+**Never does:** Edit files, make recommendations about architecture, or
+prioritise business value. Pure objective assessment.
+
+---
+
+## Sprint Definitions
+
+Sprints are groups of agent tasks that can run in parallel. Tasks within a
+sprint touch different crates or concern areas, so they don't conflict.
+
+### Production Readiness Path
+
+The sprints below form a dependency chain. Run them in order.
+
+```
+status → audit → phase1-hardening → phase1-infra → phase2-tests →
+docs-foundation → security-review → release-prep
+```
+
+### Sprint: `status`
+
+**Purpose:** Baseline assessment before starting work.
+
+| Agent | Task |
+|-------|------|
+| `roadmap-tracker` | Full roadmap status report across all phases |
+| `security-auditor` | Quick security sweep of recent changes (HEAD~10) |
+
+### Sprint: `audit`
+
+**Purpose:** Deep security audit + roadmap analysis.
+
+| Agent | Task |
+|-------|------|
+| `security-auditor` | Full audit of quicproquo-core and quicproquo-server |
+| `roadmap-tracker` | Detailed Phase 1 and Phase 2 completion assessment |
+
+### Sprint: `phase1-hardening`
+
+**Purpose:** Eliminate crash paths and enforce secure defaults.
+
+| Agent | Task |
+|-------|------|
+| `rust-core-dev` | Remove `.unwrap()`/`.expect()` from non-test code in core |
+| `rust-server-dev` | Remove `.unwrap()`/`.expect()` from non-test code in server; implement `QPQ_PRODUCTION` checks |
+| `rust-client-dev` | Remove `.unwrap()`/`.expect()` from non-test code in client; fix `AUTH_CONTEXT.read().expect()` |
+
+### Sprint: `phase1-infra`
+
+**Purpose:** Fix deployment infrastructure.
+
+| Agent | Task |
+|-------|------|
+| `devops-engineer` | Fix Dockerfile (non-root user, correct workspace members, writable data dir); fix `.gitignore`; validate Docker build |
+| `rust-architect` | Design TLS certificate lifecycle: CA-signed cert flow, `--tls-required` flag, rotation without downtime |
+
+### Sprint: `phase2-tests`
+
+**Purpose:** Build test confidence.
+
+| Agent | Task |
+|-------|------|
+| `test-engineer` | E2E tests: auth failures, message ordering, concurrent clients, KeyPackage exhaustion |
+| `test-engineer` | Unit tests: REPL parsing edge cases, token cache expiry, state file encryption round-trip |
+| `devops-engineer` | CI hardening: coverage reporting, Docker build validation in CI, `CODEOWNERS` enforcement |
+
+### Sprint: `docs-foundation`
+
+**Purpose:** Create production-quality documentation.
+
+| Agent | Task |
+|-------|------|
+| `docs-engineer` | Create root-level `SECURITY.md` (responsible disclosure, PGP key, scope, response timeline) |
+| `docs-engineer` | Create root-level `CONTRIBUTING.md` (dev setup, PR process, commit conventions, testing, review checklist) |
+| `docs-engineer` | Audit and update all `docs/src/` pages for accuracy against current codebase; fix broken references |
+| `docs-engineer` | Write operator deployment guide: Docker, systemd, certificate setup, monitoring, backup/restore |
+
+### Sprint: `security-review`
+
+**Purpose:** Final security gate before release.
+
+| Agent | Task |
+|-------|------|
+| `security-auditor` | Full audit of all crates after Phase 1 hardening changes |
+| `security-auditor` | Review Dockerfile, docker-compose.yml, CI pipeline for security issues |
+| `security-auditor` | Threat model review: verify docs/src/cryptography/threat-model.md matches current implementation |
+
+### Sprint: `release-prep`
+
+**Purpose:** Prepare for first production release.
+
+| Agent | Task |
+|-------|------|
+| `devops-engineer` | Set up cargo-release workflow, CHANGELOG.md generation, version tagging strategy |
+| `docs-engineer` | Final README.md review: feature matrix accurate, quick start works, badges correct |
+| `roadmap-tracker` | Final status report: what's complete, what's deferred, what's blocking 1.0 |
+
+---
+
+## Usage
+
+```bash
+# Full orchestrator mode — orchestrator delegates to the right agents
+python scripts/ai_team.py "Implement Phase 1.1 unwrap removal across all crates"
+
+# Direct agent access — bypass orchestrator for focused work
+python scripts/ai_team.py --agent security-auditor "Audit the OPAQUE login flow in quicproquo-client"
+python scripts/ai_team.py --agent docs-engineer "Write the operator deployment guide"
+
+# Predefined parallel sprint — multiple agents work simultaneously
+python scripts/ai_team.py --sprint audit
+python scripts/ai_team.py --sprint phase1-hardening
+python scripts/ai_team.py --sprint docs-foundation
+
+# Ad-hoc parallel tasks
+python scripts/ai_team.py --parallel \
+    "rust-server-dev: Fix rate limiting bypass in enqueue handler" \
+    "security-auditor: Review the rate limiting implementation"
+
+# Discovery
+python scripts/ai_team.py --list-agents
+python scripts/ai_team.py --list-sprints
+```
+
+### Recommended Production Readiness Sequence
+
+```bash
+# 1. Assess current state
+python scripts/ai_team.py --sprint status
+
+# 2. Deep audit
+python scripts/ai_team.py --sprint audit
+
+# 3. Fix critical issues (code changes)
+python scripts/ai_team.py --sprint phase1-hardening
+
+# 4. Fix infrastructure
+python scripts/ai_team.py --sprint phase1-infra
+
+# 5. Build test confidence
+python scripts/ai_team.py --sprint phase2-tests
+
+# 6. Write documentation
+python scripts/ai_team.py --sprint docs-foundation
+
+# 7. Final security review (after all code changes)
+python scripts/ai_team.py --sprint security-review
+
+# 8. Prepare release
+python scripts/ai_team.py --sprint release-prep
+```
+
+---
+
+## Quality Gates
+
+Every sprint must pass its quality gate before the next sprint begins.
+
+| Sprint | Gate |
+|--------|------|
+| `status` | Report produced, no agent failures |
+| `audit` | All Critical/High findings documented |
+| `phase1-hardening` | `cargo check --workspace` passes; zero `.unwrap()`/`.expect()` outside `#[cfg(test)]` |
+| `phase1-infra` | `docker build -f docker/Dockerfile .` succeeds; `.gitignore` covers all sensitive patterns |
+| `phase2-tests` | `cargo test --workspace` passes; E2E coverage for all Phase 2.1 items |
+| `docs-foundation` | `mdbook build docs/` succeeds; `SECURITY.md` and `CONTRIBUTING.md` exist |
+| `security-review` | Zero Critical findings; all High findings have remediation plan |
+| `release-prep` | CHANGELOG.md exists; version tags consistent; README quick start verified |
+
+---
+
+## Extending the Team
+
+To add a new agent:
+
+1. Define it in `AGENTS` dict in `scripts/ai_team.py`
+2. Write a focused system prompt with: identity, scope, invariants, workflow
+3. Specify the minimal tool set (prefer read-only when possible)
+4. Add it to relevant sprints
+5. Document it in this file
+
+To add a new sprint:
+
+1. Define it in `SPRINTS` dict in `scripts/ai_team.py`
+2. Ensure all tasks within the sprint touch different files/crates
+3. Document the quality gate
+4. Add it to the dependency chain if it has ordering requirements
+
+---
+
+*quicproquo AI Agent Team — v2.0 | 2026-03-03*
--- a/docs/src/SUMMARY.md
+++ b/docs/src/SUMMARY.md
@@ -19,6 +19,7 @@
 - [Running the Client](getting-started/running-the-client.md)
 - [Certificate Lifecycle and CA-Signed TLS](getting-started/certificate-lifecycle.md)
 - [Docker Deployment](getting-started/docker.md)
+- [Bot SDK](getting-started/bot-sdk.md)
 - [Demo Walkthrough: Alice and Bob](getting-started/demo-walkthrough.md)

 ---
@@ -82,6 +83,7 @@
 - [Delivery Service Internals](internals/delivery-service.md)
 - [Authentication Service Internals](internals/authentication-service.md)
 - [Storage Backend](internals/storage-backend.md)
+- [Server Hooks (Plugin System)](internals/server-hooks.md)

 ---

--- a/docs/src/architecture/crate-responsibilities.md
+++ b/docs/src/architecture/crate-responsibilities.md
@@ -200,6 +200,39 @@ group state to disk.

 ---

+## quicproquo-bot
+
+**Role:** High-level SDK for building automated agents (bots) on the
+quicproquo network. Wraps the client library into a simple polling-based API.
+
+### Components
+
+| Component        | Description |
+|------------------|-------------|
+| `BotConfig`      | Builder-pattern configuration: server address, credentials, TLS, state file path. |
+| `Bot`            | Connected bot instance. Methods: `connect()`, `send_dm()`, `receive()`, `receive_raw()`, `resolve_user()`. |
+| `Message`        | Received message struct with `sender`, `text`, and `seq` fields. |
+| `run_pipe_mode`  | JSON-lines stdin/stdout interface for shell integration (`send`, `recv`, `resolve` actions). |
+
+### Architecture
+
+Each `send_dm` and `receive` call opens a fresh QUIC connection (stateless
+reconnect pattern). The bot wraps the client's `cmd_send` and
+`receive_pending_plaintexts` functions, handling MLS group state internally.
+
+### What this crate does NOT do
+
+- No server-side logic.
+- No raw MLS operations — delegates to `quicproquo-client` high-level functions.
+- No persistent QUIC connections — each operation reconnects.
+
+### Key dependencies
+
+`quicproquo-core`, `quicproquo-client`, `tokio`, `anyhow`, `tracing`,
+`serde`, `serde_json`, `hex`.
+
+---
+
 ## Other workspace crates

 | Crate                   | Role |
--- a/docs/src/getting-started/bot-sdk.md
+++ b/docs/src/getting-started/bot-sdk.md
@@ -0,0 +1,233 @@
+# Bot SDK
+
+The `quicproquo-bot` crate provides a high-level SDK for building automated
+agents on the quicproquo network. Bots authenticate with OPAQUE, send and
+receive E2E encrypted messages through MLS, and can be driven programmatically
+or via a JSON pipe interface for shell integration.
+
+---
+
+## Adding the dependency
+
+```toml
+[dependencies]
+quicproquo-bot = { path = "../crates/quicproquo-bot" }
+tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
+anyhow = "1"
+```
+
+---
+
+## Quick start
+
+```rust,no_run
+use quicproquo_bot::{Bot, BotConfig};
+
+#[tokio::main]
+async fn main() -> anyhow::Result<()> {
+    let config = BotConfig::new("127.0.0.1:7000", "bot-user", "bot-password")
+        .ca_cert("server-cert.der")
+        .state_path("bot-state.bin");
+
+    let bot = Bot::connect(config).await?;
+
+    // Send a DM
+    bot.send_dm("alice", "Hello from bot!").await?;
+
+    // Poll for messages
+    loop {
+        for msg in bot.receive(5000).await? {
+            println!("{}: {}", msg.sender, msg.text);
+            if msg.text.starts_with("!echo ") {
+                bot.send_dm(&msg.sender, &msg.text[6..]).await?;
+            }
+        }
+    }
+}
+```
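+
+The echo branch above slices `msg.text` at a fixed byte offset. A possible
+alternative, shown here as a standalone sketch, is `str::strip_prefix`, which
+yields the remainder only when the prefix matches. The `echo_reply` helper is
+hypothetical and not part of the SDK.
+
+```rust
+/// Hypothetical helper: returns the text to echo back if `text` is an
+/// `!echo` command with a non-empty argument.
+fn echo_reply(text: &str) -> Option<&str> {
+    text.strip_prefix("!echo ")
+        .map(str::trim)
+        .filter(|rest| !rest.is_empty())
+}
+
+fn main() {
+    assert_eq!(echo_reply("!echo hello"), Some("hello"));
+    assert_eq!(echo_reply("!echo   "), None);
+    assert_eq!(echo_reply("hello"), None);
+}
+```
+
+Inside the polling loop this would read
+`if let Some(reply) = echo_reply(&msg.text) { ... }`.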
+
+---
+
+## Configuration
+
+`BotConfig` uses a builder pattern. The only required arguments are the server
+address, username, and password:
+
+```rust,no_run
+# use quicproquo_bot::BotConfig;
+let config = BotConfig::new("127.0.0.1:7000", "my-bot", "secret123")
+    .ca_cert("certs/server-cert.der")   // TLS CA certificate (DER format)
+    .server_name("my-server.example")   // TLS SNI (default: "localhost")
+    .state_path("my-bot-state.bin")     // Persistent state file
+    .state_password("encrypt-me")       // State file encryption password
+    .device_id("bot-device-1");         // Device identifier
+```
+
+| Method            | Default               | Description |
+|-------------------|-----------------------|-------------|
+| `ca_cert()`       | `"server-cert.der"`   | Path to the server's CA certificate in DER format. |
+| `server_name()`   | `"localhost"`         | TLS server name for certificate validation. |
+| `state_path()`    | `"bot-state.bin"`     | Path to the bot's encrypted state file. |
+| `state_password()` | None (unencrypted)   | Password for encrypting the state file at rest. |
+| `device_id()`     | None                  | Device ID reported to the server in auth tokens. |
+
+---
+
+## Sending messages
+
+```rust,no_run
+# use quicproquo_bot::Bot;
+# async fn example(bot: &Bot) -> anyhow::Result<()> {
+// Send a plaintext DM — encryption is handled internally via MLS
+bot.send_dm("alice", "Hello!").await?;
+# Ok(())
+# }
+```
+
+`send_dm` resolves the username, establishes or joins the MLS group for the DM
+channel, encrypts the plaintext, and delivers it through the server. Each call
+opens a fresh QUIC connection (stateless reconnect pattern).
+
+---
+
+## Receiving messages
+
+```rust,no_run
+# use quicproquo_bot::Bot;
+# async fn example(bot: &Bot) -> anyhow::Result<()> {
+// Wait up to 5 seconds for pending messages
+let messages = bot.receive(5000).await?;
+for msg in &messages {
+    println!("[seq={}] {}: {}", msg.seq, msg.sender, msg.text);
+}
+
+// For binary/non-UTF-8 content, use receive_raw
+let raw_messages = bot.receive_raw(5000).await?;
+for payload in &raw_messages {
+    println!("received {} bytes", payload.len());
+}
+# Ok(())
+# }
+```
+
+The `Message` struct contains:
+
+| Field    | Type     | Description |
+|----------|----------|-------------|
+| `sender` | `String` | The sender's username. |
+| `text`   | `String` | Decrypted plaintext content (UTF-8). |
+| `seq`    | `u64`    | Sequence number. |
+
+---
+
+## Resolving users
+
+```rust,no_run
+# use quicproquo_bot::Bot;
+# async fn example(bot: &Bot) -> anyhow::Result<()> {
+let identity_key = bot.resolve_user("alice").await?;
+println!("alice's identity key: {} bytes", identity_key.len());
+# Ok(())
+# }
+```
+
+---
+
+## Identity inspection
+
+```rust,no_run
+# use quicproquo_bot::Bot;
+# fn example(bot: &Bot) {
+println!("username: {}", bot.username());
+println!("identity key (hex): {}", bot.identity_key_hex());
+let raw_key: [u8; 32] = bot.identity_key();
+# }
+```
+
+---
+
+## Pipe mode (stdin/stdout JSON lines)
+
+For shell integration, the bot SDK supports a JSON-lines pipe interface. Each
+line on stdin is a JSON command; results are written to stdout as JSON lines.
+
+### Supported actions
+
+**Send a message:**
+
+```json
+{"action": "send", "to": "alice", "text": "hello from pipe"}
+```
+
+Response:
+
+```json
+{"status": "ok", "action": "send"}
+```
+
+**Receive pending messages:**
+
+```json
+{"action": "recv", "timeout_ms": 5000}
+```
+
+Response:
+
+```json
+{"status": "ok", "messages": [{"sender": "peer", "text": "hi", "seq": 0}]}
+```
+
+**Resolve a username:**
+
+```json
+{"action": "resolve", "username": "alice"}
+```
+
+Response:
+
+```json
+{"status": "ok", "identity_key": "ab12cd34..."}
+```
+
+### Error responses
+
+All actions return an error object on failure:
+
+```json
+{"error": "OPAQUE login: connection refused"}
+```
+
+### Shell examples
+
+```bash
+# Send via pipe
+echo '{"action":"send","to":"alice","text":"hello"}' | my-bot-binary
+
+# Receive via pipe
+echo '{"action":"recv","timeout_ms":5000}' | my-bot-binary
+
+# Use with jq for pretty output
+echo '{"action":"recv","timeout_ms":3000}' | my-bot-binary | jq .
+```
+
+---
+
+## Architecture notes
+
+- **Stateless reconnect**: Each `send_dm` and `receive` call opens a fresh QUIC
+  connection. There is no persistent connection to manage.
+- **MLS encryption**: All messages are end-to-end encrypted via MLS (RFC 9420).
+  The bot SDK wraps the client library's `cmd_send` and
+  `receive_pending_plaintexts` functions.
+- **State persistence**: The bot's identity seed and MLS group state are stored
+  in the state file. Losing this file means losing the bot's identity.
+- **Cap'n Proto !Send**: RPC calls run on a `tokio::task::LocalSet` because
+  the `capnp-rpc` types are `!Send`.
+
+---
+
+## Next steps
+
+- [Running the Client](running-the-client.md) -- CLI subcommands and REPL
+- [Server Hooks](../internals/server-hooks.md) -- extend the server with plugins
+- [Demo Walkthrough](demo-walkthrough.md) -- step-by-step messaging scenario
--- a/docs/src/internals/server-hooks.md
+++ b/docs/src/internals/server-hooks.md
@@ -0,0 +1,259 @@
+# Server Hooks
+
+The `ServerHooks` trait provides a plugin system for extending the quicproquo
+server. Hooks fire at key points in the request lifecycle (message delivery,
+authentication, user registration, channel creation, and message fetch),
+allowing you to inspect, log, rate-limit, or reject operations without
+modifying server internals.
+
+---
+
+## Overview
+
+```text
+Client RPC request
+  └─ Validation (auth, rate limits, wire format)
+       └─ Hook fires (on_message_enqueue, on_auth, etc.)
+            ├─ HookAction::Continue → proceed to storage/delivery
+            └─ HookAction::Reject("reason") → error returned to client
+```
+
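+Hooks are called **synchronously** in the RPC handler path. `on_message_enqueue`
+runs after validation but before storage, which is why it can reject a message;
+the other hooks run after their operation completes. Keep hook implementations
+fast and offload heavy work (HTTP calls, disk I/O, analytics) to background
+tasks.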
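+
+One way to keep a hook fast is to hand the event to a background worker over a
+channel and return immediately. The sketch below uses only the standard
+library. `AuditHook` and its `u64` sequence-number payload are hypothetical; a
+real hook would implement `ServerHooks` and forward whatever event data it
+needs.
+
+```rust
+use std::sync::mpsc::{self, Sender};
+use std::sync::Mutex;
+use std::thread::{self, JoinHandle};
+
+/// Hypothetical hook state: forwards sequence numbers to a worker thread.
+struct AuditHook {
+    // The Mutex keeps the hook `Sync` on older toolchains as well.
+    tx: Mutex<Sender<u64>>,
+}
+
+impl AuditHook {
+    /// Spawns the worker and returns the hook plus the worker's join handle.
+    fn spawn() -> (Self, JoinHandle<Vec<u64>>) {
+        let (tx, rx) = mpsc::channel::<u64>();
+        let worker = thread::spawn(move || {
+            // Slow work (disk I/O, HTTP calls) would happen here, off the RPC path.
+            rx.iter().collect::<Vec<u64>>()
+        });
+        (Self { tx: Mutex::new(tx) }, worker)
+    }
+
+    /// Cheap hand-off intended to be called from inside a hook method.
+    fn record(&self, seq: u64) {
+        if let Ok(tx) = self.tx.lock() {
+            // A send error only means the worker has exited; ignore it here.
+            let _ = tx.send(seq);
+        }
+    }
+}
+
+fn main() {
+    let (hook, worker) = AuditHook::spawn();
+    hook.record(1);
+    hook.record(2);
+    drop(hook); // closing the channel lets the worker finish
+    let seen = worker.join().unwrap_or_default();
+    assert_eq!(seen, vec![1, 2]);
+}
+```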
+
+---
+
+## The `ServerHooks` trait
+
+```rust,ignore
+pub trait ServerHooks: Send + Sync {
+    /// Called before a message is stored in the delivery queue.
+    /// Return HookAction::Reject to prevent delivery.
+    fn on_message_enqueue(&self, event: &MessageEvent) -> HookAction {
+        HookAction::Continue
+    }
+
+    /// Called after a batch of messages is enqueued.
+    fn on_batch_enqueue(&self, events: &[MessageEvent]) {}
+
+    /// Called after a successful or failed login attempt.
+    fn on_auth(&self, event: &AuthEvent) {}
+
+    /// Called after a channel is created or looked up.
+    fn on_channel_created(&self, event: &ChannelEvent) {}
+
+    /// Called after messages are fetched from the delivery queue.
+    fn on_fetch(&self, event: &FetchEvent) {}
+
+    /// Called when a user completes OPAQUE registration.
+    fn on_user_registered(&self, username: &str, identity_key: &[u8]) {}
+}
+```
+
+All methods have default no-op implementations. Override only the events you
+care about.
+
+---
+
+## Hook action
+
+```rust,ignore
+pub enum HookAction {
+    /// Allow the operation to proceed.
+    Continue,
+    /// Reject the operation with a reason (returned to the client as an error).
+    Reject(String),
+}
+```
+
+Currently only `on_message_enqueue` can reject operations. Other hooks are
+observational (fire-and-forget).
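+
+As an illustration of a rejecting hook, the sketch below caps payload size. To
+stay self-contained it redeclares trimmed-down stand-ins for `HookAction`,
+`MessageEvent`, and `ServerHooks` that mirror the definitions on this page;
+`MaxPayloadHook` and its limit are hypothetical.
+
+```rust
+/// Stand-in mirroring the `HookAction` enum documented above.
+#[derive(Debug, PartialEq)]
+enum HookAction {
+    Continue,
+    Reject(String),
+}
+
+/// Trimmed stand-in for `MessageEvent` (only the field used here).
+struct MessageEvent {
+    payload_len: usize,
+}
+
+/// Trimmed stand-in for the `ServerHooks` trait.
+trait ServerHooks: Send + Sync {
+    fn on_message_enqueue(&self, _event: &MessageEvent) -> HookAction {
+        HookAction::Continue
+    }
+}
+
+/// Hypothetical hook that rejects payloads above a fixed size.
+struct MaxPayloadHook {
+    max_len: usize,
+}
+
+impl ServerHooks for MaxPayloadHook {
+    fn on_message_enqueue(&self, event: &MessageEvent) -> HookAction {
+        if event.payload_len > self.max_len {
+            HookAction::Reject(format!(
+                "payload of {} bytes exceeds limit of {}",
+                event.payload_len, self.max_len
+            ))
+        } else {
+            HookAction::Continue
+        }
+    }
+}
+
+fn main() {
+    let hook = MaxPayloadHook { max_len: 1024 };
+    assert_eq!(
+        hook.on_message_enqueue(&MessageEvent { payload_len: 512 }),
+        HookAction::Continue
+    );
+    assert!(matches!(
+        hook.on_message_enqueue(&MessageEvent { payload_len: 4096 }),
+        HookAction::Reject(_)
+    ));
+}
+```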
+
+---
+
+## Event types
+
+### `MessageEvent`
+
+Fired on `enqueue` and `batch_enqueue` RPC calls.
+
+| Field              | Type              | Description |
+|--------------------|-------------------|-------------|
+| `sender_identity`  | `Option<Vec<u8>>` | Sender's 32-byte identity key (None in sealed sender mode). |
+| `recipient_key`    | `Vec<u8>`         | Recipient's 32-byte identity key. |
+| `channel_id`       | `Vec<u8>`         | 16-byte channel ID. |
+| `payload_len`      | `usize`           | Length of the encrypted payload in bytes. |
+| `seq`              | `u64`             | Server-assigned sequence number. |
+
+### `AuthEvent`
+
+Fired after OPAQUE login completes (success or failure).
+
+| Field            | Type     | Description |
+|------------------|----------|-------------|
+| `username`       | `String` | The username that attempted to authenticate. |
+| `success`        | `bool`   | Whether authentication succeeded. |
+| `failure_reason` | `String` | Failure reason (empty on success). |
+
+### `ChannelEvent`
+
+Fired after a `createChannel` RPC call.
+
+| Field           | Type       | Description |
+|-----------------|------------|-------------|
+| `channel_id`    | `Vec<u8>`  | 16-byte channel ID. |
+| `initiator_key` | `Vec<u8>`  | Identity key of the channel initiator. |
+| `peer_key`      | `Vec<u8>`  | Identity key of the peer. |
+| `was_new`       | `bool`     | True if this is a newly created channel. |
+
+### `FetchEvent`
+
+Fired after a `fetch` or `fetchWait` RPC call.
+
+| Field           | Type       | Description |
+|-----------------|------------|-------------|
+| `recipient_key` | `Vec<u8>`  | Identity key of the fetcher. |
+| `channel_id`    | `Vec<u8>`  | Channel ID being fetched from. |
+| `message_count` | `usize`    | Number of messages returned. |
+
+---
+
+## Built-in implementations
+
+### `NoopHooks`
+
+Does nothing. This is the default when no hooks are configured.
+
+```rust,ignore
+pub struct NoopHooks;
+impl ServerHooks for NoopHooks {}
+```
+
+### `TracingHooks`
+
+Logs all events via the `tracing` crate at info/debug level, with failed logins
+logged at warn level.
+
+```rust,ignore
+pub struct TracingHooks;
+
+impl ServerHooks for TracingHooks {
+    fn on_message_enqueue(&self, event: &MessageEvent) -> HookAction {
+        tracing::info!(
+            recipient_prefix = %hex_prefix(&event.recipient_key),
+            payload_len = event.payload_len,
+            seq = event.seq,
+            "hook: message enqueued"
+        );
+        HookAction::Continue
+    }
+
+    fn on_auth(&self, event: &AuthEvent) {
+        if event.success {
+            tracing::info!(username = %event.username, "hook: login success");
+        } else {
+            tracing::warn!(
+                username = %event.username,
+                reason = %event.failure_reason,
+                "hook: login failure"
+            );
+        }
+    }
+    // ... other methods log similarly
+}
+```
+
+---
+
+## Writing a custom hook
+
+### Example: payload size limiter
+
+```rust,ignore
+use quicproquo_server::hooks::{ServerHooks, HookAction, MessageEvent};
+
+struct PayloadLimiter {
+    max_bytes: usize,
+}
+
+impl ServerHooks for PayloadLimiter {
+    fn on_message_enqueue(&self, event: &MessageEvent) -> HookAction {
+        if event.payload_len > self.max_bytes {
+            return HookAction::Reject(format!(
+                "payload too large: {} > {} bytes",
+                event.payload_len, self.max_bytes
+            ));
+        }
+        HookAction::Continue
+    }
+}
+```
+
+### Example: login auditor
+
+```rust,ignore
+use quicproquo_server::hooks::{ServerHooks, AuthEvent};
+
+struct LoginAuditor;
+
+impl ServerHooks for LoginAuditor {
+    fn on_auth(&self, event: &AuthEvent) {
+        if !event.success {
+            eprintln!(
+                "AUDIT: failed login for '{}': {}",
+                event.username, event.failure_reason
+            );
+        }
+    }
+}
+```
+
+### Example: composing multiple hooks
+
+```rust,ignore
+use quicproquo_server::hooks::*;
+
+struct CompositeHooks {
+    hooks: Vec<Box<dyn ServerHooks>>,
+}
+
+impl ServerHooks for CompositeHooks {
+    fn on_message_enqueue(&self, event: &MessageEvent) -> HookAction {
+        // Short-circuit: the first rejection wins and later hooks are skipped.
+        for hook in &self.hooks {
+            if let HookAction::Reject(reason) = hook.on_message_enqueue(event) {
+                return HookAction::Reject(reason);
+            }
+        }
+        HookAction::Continue
+    }
+
+    fn on_auth(&self, event: &AuthEvent) {
+        for hook in &self.hooks {
+            hook.on_auth(event);
+        }
+    }
+    // ... delegate other methods similarly
+}
+```
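+
+Because the composite returns on the first rejection, registration order
+matters: hooks placed after a rejecting hook never see the event. The sketch
+below illustrates this with trimmed local stand-ins for the crate's types (only
+`payload_len` is modelled) and a hypothetical `Counter` hook and `build_chain`
+helper that are not part of the crate.
+
+```rust
+use std::sync::atomic::{AtomicUsize, Ordering};
+use std::sync::Arc;
+
+// Trimmed local stand-ins for the crate's types; only what this sketch needs.
+pub struct MessageEvent {
+    pub payload_len: usize,
+}
+
+#[derive(Debug, PartialEq)]
+pub enum HookAction {
+    Continue,
+    Reject(String),
+}
+
+pub trait ServerHooks: Send + Sync {
+    fn on_message_enqueue(&self, _event: &MessageEvent) -> HookAction {
+        HookAction::Continue
+    }
+}
+
+pub struct PayloadLimiter {
+    pub max_bytes: usize,
+}
+
+impl ServerHooks for PayloadLimiter {
+    fn on_message_enqueue(&self, event: &MessageEvent) -> HookAction {
+        if event.payload_len > self.max_bytes {
+            return HookAction::Reject(format!(
+                "payload too large: {} > {} bytes",
+                event.payload_len, self.max_bytes
+            ));
+        }
+        HookAction::Continue
+    }
+}
+
+/// Hypothetical hook that counts how many events reach it.
+pub struct Counter {
+    pub seen: Arc<AtomicUsize>,
+}
+
+impl ServerHooks for Counter {
+    fn on_message_enqueue(&self, _event: &MessageEvent) -> HookAction {
+        self.seen.fetch_add(1, Ordering::Relaxed);
+        HookAction::Continue
+    }
+}
+
+pub struct CompositeHooks {
+    pub hooks: Vec<Box<dyn ServerHooks>>,
+}
+
+impl ServerHooks for CompositeHooks {
+    fn on_message_enqueue(&self, event: &MessageEvent) -> HookAction {
+        // The first rejection wins; hooks after it are not consulted.
+        for hook in &self.hooks {
+            if let HookAction::Reject(reason) = hook.on_message_enqueue(event) {
+                return HookAction::Reject(reason);
+            }
+        }
+        HookAction::Continue
+    }
+}
+
+/// Hypothetical helper: the limiter runs first, the counter second.
+pub fn build_chain(max_bytes: usize, seen: Arc<AtomicUsize>) -> CompositeHooks {
+    CompositeHooks {
+        hooks: vec![
+            Box::new(PayloadLimiter { max_bytes }),
+            Box::new(Counter { seen }),
+        ],
+    }
+}
+```
+
+With this ordering, an oversized payload is rejected by the limiter and the
+counter is never incremented for it. Swapping the two entries would make the
+counter see every event, including rejected ones.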
+
+---
+
+## Important considerations
+
+- **E2E encryption**: Message payloads are encrypted end-to-end. Hooks cannot
+  inspect plaintext content; they see only metadata: recipient, channel ID,
+  payload size, sequence number, and the sender unless sealed sender mode is
+  in use.
+- **Performance**: Hooks run synchronously in the RPC handler, so a slow hook
+  delays the RPC response. For slow or async work, copy the fields you need
+  out of the borrowed event and hand them to `tokio::spawn`.
+- **Thread safety**: `ServerHooks` requires `Send + Sync`, and hook methods
+  take `&self`. Use `Arc<Mutex<_>>`, atomics, or lock-free structures for
+  shared mutable state.
+- **Reject semantics**: Only `on_message_enqueue` supports rejection. Other
+  hooks are informational — the operation proceeds regardless of what the hook
+  does.
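+
+The performance and thread-safety points above can be sketched together. To
+stay free of external dependencies, this example swaps `tokio::spawn` for a
+standard-library `mpsc` channel: the hook clones the borrowed event and queues
+it, and a consumer drains the queue elsewhere. `FetchForwarder` is a
+hypothetical name, and the trait and event are trimmed local stand-ins rather
+than the crate's definitions.
+
+```rust
+use std::sync::mpsc::{self, Receiver, Sender};
+use std::sync::Mutex;
+
+// Local stand-in mirroring the `FetchEvent` table above.
+#[derive(Debug, Clone, PartialEq)]
+pub struct FetchEvent {
+    pub recipient_key: Vec<u8>,
+    pub channel_id: Vec<u8>,
+    pub message_count: usize,
+}
+
+// Trimmed stand-in for the trait; only `on_fetch` is modelled.
+pub trait ServerHooks: Send + Sync {
+    fn on_fetch(&self, _event: &FetchEvent) {}
+}
+
+/// Hypothetical hook that forwards fetch events to a background consumer.
+pub struct FetchForwarder {
+    // Wrapping the sender in a `Mutex` gives interior mutability behind
+    // `&self` and keeps the hook `Sync`.
+    tx: Mutex<Sender<FetchEvent>>,
+}
+
+impl FetchForwarder {
+    /// Returns the hook plus the receiving end for the consumer.
+    pub fn new() -> (Self, Receiver<FetchEvent>) {
+        let (tx, rx) = mpsc::channel();
+        (Self { tx: Mutex::new(tx) }, rx)
+    }
+}
+
+impl ServerHooks for FetchForwarder {
+    fn on_fetch(&self, event: &FetchEvent) {
+        // The event is only borrowed, so clone it before handing it off.
+        if let Ok(tx) = self.tx.lock() {
+            // The channel is unbounded, so `send` does not block; an error
+            // only means the consumer has gone away, which is ignored here.
+            let _ = tx.send(event.clone());
+        }
+    }
+}
+```
+
+The hook itself does only a clone and a queue push, so the RPC handler is not
+held up by whatever the consumer does with the events. An unbounded queue can
+grow without limit if the consumer stalls; a bounded channel with a drop policy
+may be a better fit under load.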
+
+---
+
+## Further reading
+
+- [Delivery Service Internals](delivery-service.md) -- how messages flow through the server
+- [Authentication Service Internals](authentication-service.md) -- OPAQUE auth flow
+- [Bot SDK](../getting-started/bot-sdk.md) -- build bots that interact with the server