# Scaling Guide

This document covers resource sizing, scaling triggers, and capacity planning for quicproquo deployments.

## Architecture Overview

quicproquo runs as a single-process server handling QUIC connections. Key resource consumers:

- **CPU**: TLS 1.3 handshakes (QUIC), OPAQUE PAKE authentication, message routing
- **Memory**: in-memory session state (DashMap), QUIC connection state, delivery waiters, rate-limit entries
- **Disk I/O**: SQLCipher reads/writes (WAL mode), blob storage, KT Merkle log
- **Network**: QUIC (UDP), metrics HTTP, optional WebSocket bridge
## Single-Node Sizing

### Minimum (Development / Small Team)

| Resource | Value |
|----------|-------|
| CPU | 1 vCPU |
| Memory | 512 MB |
| Disk | 10 GB SSD |
| Network | 100 Mbps |

Supports ~100 concurrent users with light message traffic.

### Recommended (Production / Small-Medium)

| Resource | Value |
|----------|-------|
| CPU | 2-4 vCPU |
| Memory | 2-4 GB |
| Disk | 50-100 GB NVMe SSD |
| Network | 1 Gbps |

Supports ~1,000-5,000 concurrent users.

### Large (High Traffic)

| Resource | Value |
|----------|-------|
| CPU | 8+ vCPU |
| Memory | 8-16 GB |
| Disk | 500 GB+ NVMe SSD (RAID 10) |
| Network | 10 Gbps |

Supports ~10,000+ concurrent users.
## Scaling Triggers

Monitor these metrics and scale when thresholds are exceeded:

| Metric | Warning | Critical | Action |
|--------|---------|----------|--------|
| CPU usage | > 70% sustained (5 min) | > 90% sustained | Add CPU or scale horizontally |
| Memory usage | > 75% | > 90% | Increase memory, check for leaks |
| Disk usage | > 70% | > 90% | Expand volume, clean old data |
| Disk I/O latency | > 5 ms p95 | > 20 ms p95 | Move to faster storage |
| `delivery_queue_depth` | > 10,000 | > 100,000 | Investigate stale queues |
| `rate_limit_hit_total` rate | > 100/min | > 1000/min | Investigate abuse, adjust limits |
| `auth_login_failure_total` rate | > 50/min | > 500/min | Potential brute-force attack |
| Connection count | > 80% of `max_concurrent_bidi_streams` | > 95% | Scale horizontally |
| TLS handshake latency | > 100 ms p95 | > 500 ms p95 | Add CPU, check network |
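The queue-depth trigger can be checked from the metrics endpoint with a small script. The helpers below are an illustrative sketch, not part of the qpq tooling; they assume only the `delivery_queue_depth` metric name from the table above and the metrics port used elsewhere in this guide.

```shell
#!/bin/sh
# queue_depth METRICS_TEXT: extract the delivery_queue_depth gauge from
# Prometheus text exposition format (metric name, space, value).
queue_depth() {
  printf '%s\n' "$1" | awk '$1 == "delivery_queue_depth" { print $2 }'
}

# check_depth VALUE: map the gauge onto the warning/critical thresholds above.
check_depth() {
  if [ "$1" -gt 100000 ]; then
    echo "CRITICAL: delivery_queue_depth=$1"
  elif [ "$1" -gt 10000 ]; then
    echo "WARNING: delivery_queue_depth=$1"
  else
    echo "OK: delivery_queue_depth=$1"
  fi
}

# Live usage (assumes the metrics listener on :9090):
# check_depth "$(queue_depth "$(curl -s http://localhost:9090/metrics)")"
```

A cron job wrapping this in an alert is a stopgap; the Prometheus alert rules shipped alongside this guide are the durable mechanism.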
## Vertical Scaling

### CPU Scaling

The server is async (Tokio) and benefits from multiple cores. QUIC TLS handshakes and OPAQUE computations are CPU-intensive.

```bash
# Check current CPU usage
top -bn1 -p $(pgrep qpq-server)

# For Docker: increase CPU limits in docker-compose.prod.yml:
# deploy:
#   resources:
#     limits:
#       cpus: '4'
```
### Memory Scaling

In-memory state scales roughly linearly with concurrent connections:

- ~2-5 KB per active QUIC connection (quinn state)
- ~200 bytes per session entry (DashMap)
- ~100 bytes per rate limit entry
- ~100 bytes per delivery waiter

```bash
# Estimate memory for 10,000 connections:
# 10,000 * 5 KB = ~50 MB for connections
# 10,000 * 500 bytes = ~5 MB for sessions/rate limits
# SQLCipher connection pool: ~50 MB (4 connections, caches)
# Base process: ~30 MB
# Total: ~135 MB + headroom = 256-512 MB minimum
```
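Using the per-connection figures above, the worked example can be turned into a one-line estimator. This is a sketch: the 80 MB constant folds in the connection-pool and base-process figures, and real usage varies with message rate and cache pressure.

```shell
#!/bin/sh
# estimate_memory_mb CONNECTIONS: rough RSS estimate from the figures above
# (~5 KB quinn state + ~500 B session/rate-limit entries per connection,
#  plus ~50 MB SQLCipher pool and ~30 MB base process, folded into 80 MB).
estimate_memory_mb() {
  echo $(( ($1 * 5500) / 1048576 + 80 ))
}

estimate_memory_mb 10000   # prints 132, close to the ~135 MB worked example
```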
### Disk I/O Scaling

SQLCipher uses WAL mode for concurrent reads. For write-heavy workloads:

```bash
# Check current I/O
iostat -x 1 5

# Move to NVMe if on spinning disk.
# Increase the WAL autocheckpoint threshold for burst writes
# (use the sqlcipher shell; a plain sqlite3 build cannot decrypt the database):
sqlcipher data/qpq.db "PRAGMA key='${QPQ_DB_KEY}'; PRAGMA wal_autocheckpoint=2000;"
```
## Horizontal Scaling

quicproquo does not yet have built-in multi-node clustering. For horizontal scaling, use these patterns:

### Load Balancer (UDP/QUIC)

Place a UDP load balancer in front of multiple qpq-server instances. Each instance runs independently with its own database.

```
                +-----------+
clients ------> |   L4 LB   | ----> qpq-server-1 (db-1)
                | (UDP/QUIC)| ----> qpq-server-2 (db-2)
                +-----------+       qpq-server-3 (db-3)
```

**Requirements:**

- Sticky sessions (by client IP or QUIC connection ID) so a client always reaches the same node
- Shared storage backend or federation between nodes
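One option for the L4 tier is nginx's `stream` module with consistent hashing on the client address for stickiness. A minimal sketch, assuming nginx is built with the stream module; the upstream addresses and timeout are illustrative, not from this deployment:

```nginx
# Hypothetical L4 config for fronting qpq-server nodes.
stream {
    upstream qpq_nodes {
        # Consistent hash on client IP keeps each client pinned to one node.
        hash $remote_addr consistent;
        server 10.0.1.1:7000;
        server 10.0.1.2:7000;
        server 10.0.1.3:7000;
    }

    server {
        listen 7000 udp;
        proxy_pass qpq_nodes;
        proxy_timeout 300s;   # match the server's QUIC idle timeout
    }
}
```

Note that nginx hashes on the client address, not the QUIC connection ID, so clients whose NAT binding changes mid-session may be re-routed to a different node.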
### Federation for Multi-Node

Enable federation to relay messages between nodes:

```toml
# qpq-server.toml on node-1
[federation]
enabled = true
domain = "node1.chat.example.com"
listen = "0.0.0.0:7001"
federation_cert = "data/federation-cert.der"
federation_key = "data/federation-key.der"
federation_ca = "data/federation-ca.der"

[[federation.peers]]
domain = "node2.chat.example.com"
address = "10.0.1.2:7001"
```
### Shared Database (PostgreSQL)

For true horizontal scaling, migrate from SQLCipher to a shared PostgreSQL instance. This is not yet implemented but is the planned approach for multi-node deployments.

```
qpq-server-1 --\
qpq-server-2 ---+--> PostgreSQL (shared)
qpq-server-3 --/
```
## Connection Tuning

The server uses these QUIC transport defaults:

| Parameter | Default | Tunable |
|-----------|---------|---------|
| Max idle timeout | 300 s (5 min) | Code change required |
| Max concurrent bidi streams | 1 per connection | Code change required |
| Max concurrent uni streams | 0 | Code change required |
| SQLCipher connection pool | 4 connections | Code change required |

For high connection counts, consider:

- Increasing the OS file descriptor limit: `ulimit -n 65536`
- Increasing UDP buffer sizes:

```bash
# /etc/sysctl.d/99-qpq.conf
net.core.rmem_max = 26214400
net.core.wmem_max = 26214400
net.core.rmem_default = 1048576
net.core.wmem_default = 1048576
```

```bash
sysctl -p /etc/sysctl.d/99-qpq.conf
```
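To confirm the tuning actually took effect, the values can be read back from `/proc` and compared against the targets above. The helper function is illustrative, not part of the qpq tooling:

```shell
#!/bin/sh
# check_buffer NAME CURRENT REQUIRED: warn when a kernel buffer is below target.
check_buffer() {
  if [ "$2" -lt "$3" ]; then
    echo "WARN: $1=$2 (want >= $3)"
  else
    echo "OK: $1=$2"
  fi
}

# Compare live values against the sysctl targets above (Linux only).
if [ -r /proc/sys/net/core/rmem_max ]; then
  check_buffer net.core.rmem_max "$(cat /proc/sys/net/core/rmem_max)" 26214400
  check_buffer net.core.wmem_max "$(cat /proc/sys/net/core/wmem_max)" 26214400
fi

# File descriptor limit for the current shell (check as the service user):
ulimit -n
```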
## Docker Resource Limits

```yaml
# docker-compose.prod.yml
services:
  server:
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4G
        reservations:
          cpus: '2'
          memory: 1G
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
```
## Load Testing

Use the included test infrastructure to benchmark:

```bash
# Build the test client
cargo build --release --bin qpq-client

# Run a concurrent connection test (example: 100 clients)
for i in $(seq 1 100); do
  qpq-client --server 127.0.0.1:7000 --auth-token "$QPQ_AUTH_TOKEN" &
done
wait

# Monitor during the load test
watch -n1 'curl -s http://localhost:9090/metrics | grep -E "enqueue_total|fetch_total|delivery_queue_depth|rate_limit"'
```
## Capacity Planning Worksheet

| Parameter | Your Value |
|-----------|------------|
| Expected concurrent users | |
| Messages per user per hour | |
| Average message size (bytes) | |
| Blob uploads per day | |
| Average blob size (MB) | |
| Data retention (days) | |

**Formulas:**

```
Storage per day     = (users * msgs/hr * 24 * avg_msg_size) + (blob_uploads * avg_blob_size)
DB growth per month = storage_per_day * 30
Memory estimate     = (concurrent_users * 5 KB) + 256 MB base
CPU estimate        = 1 vCPU per ~2,500 concurrent connections (depends on message rate)
```
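The formulas can be scripted for quick what-if runs. This sketch uses integer arithmetic and example values for the worksheet parameters; plug in your own numbers from the table above:

```shell
#!/bin/sh
# Rough capacity estimate from the worksheet formulas (integer math).
users=1000          # expected concurrent users
msgs_per_hr=20      # messages per user per hour
avg_msg_bytes=2048  # average message size (bytes)
blob_uploads=500    # blob uploads per day
avg_blob_mb=2       # average blob size (MB)

msg_bytes_per_day=$(( users * msgs_per_hr * 24 * avg_msg_bytes ))
blob_mb_per_day=$(( blob_uploads * avg_blob_mb ))
storage_mb_per_day=$(( msg_bytes_per_day / 1048576 + blob_mb_per_day ))
db_growth_mb_per_month=$(( storage_mb_per_day * 30 ))
memory_mb=$(( users * 5 / 1024 + 256 ))   # 5 KB per connection + 256 MB base

echo "storage/day:     ${storage_mb_per_day} MB"
echo "DB growth/month: ${db_growth_mb_per_month} MB"
echo "memory estimate: ${memory_mb} MB"
```

With the example values this yields roughly 1.9 GB of new storage per day, dominated by blob uploads; adjust the inputs before sizing disks.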