docs: add operational runbook, Grafana dashboard, and production docker-compose
Add comprehensive operational documentation:

- docs/operations/backup-restore.md: SQLCipher, file backend, blob backup/restore
- docs/operations/key-rotation.md: auth token, TLS, federation, DB key, OPAQUE rotation
- docs/operations/incident-response.md: playbook for common incidents
- docs/operations/scaling-guide.md: resource sizing, scaling triggers, capacity planning
- docs/operations/monitoring.md: Prometheus metrics, alert rules, log monitoring
- docs/operations/dashboards/qpq-overview.json: Grafana dashboard template
- docs/operations/prometheus.yml + alerts: Prometheus scrape and alert config
- docs/operations/grafana-provisioning/: auto-provisioning for datasources and dashboards
- docker-compose.prod.yml: production stack (server + Prometheus + Grafana)
- .env.example: documented environment variable template
250
docs/operations/key-rotation.md
Normal file
@@ -0,0 +1,250 @@
# Key Rotation Procedures

This document provides step-by-step procedures for rotating all cryptographic material in a quicproquo deployment.

## Auth Token Rotation

The auth token (`QPQ_AUTH_TOKEN`) is used for bearer-token authentication (auth version 1). OPAQUE-authenticated sessions are not affected by token rotation.

### Procedure

```bash
# 1. Generate a new token (minimum 16 characters for production)
NEW_TOKEN=$(openssl rand -base64 32)
echo "New token: $NEW_TOKEN"

# 2. Update the config file or environment
# Option A: TOML config file
# Base64 output can contain "/", so use "|" as the sed delimiter
sed -i "s|^auth_token = .*|auth_token = \"$NEW_TOKEN\"|" qpq-server.toml

# Option B: Environment variable (systemd)
systemctl edit qpq-server --force
# Add under [Service]: Environment=QPQ_AUTH_TOKEN=<new-token>

# Option C: Docker Compose
# Update QPQ_AUTH_TOKEN in docker-compose.prod.yml or .env file

# 3. Restart the server
systemctl restart qpq-server
# or: docker compose restart server

# 4. Update all clients with the new token
# Clients using OPAQUE auth are unaffected.
# Clients using bearer-token auth must update their QPQ_ACCESS_TOKEN.
```

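The substitution in step 2 is easy to get wrong because base64 tokens can contain `/`. As a minimal sketch, the hypothetical helper `set_auth_token` below (not part of quicproquo) rewrites the `auth_token` line and then confirms that the file holds exactly the new value. It assumes GNU `sed` and a config line of the form `auth_token = "..."`.

```bash
# Hypothetical helper, not part of quicproquo: replace the auth_token line
# in a TOML config file and verify the result.
set_auth_token() {
    file="$1"
    token="$2"
    # "|" is not in the base64 alphabet, so it is a safe sed delimiter
    sed -i "s|^auth_token = .*|auth_token = \"$token\"|" "$file" || return 1
    # Succeeds only if a line with exactly the new token is present
    grep -qxF "auth_token = \"$token\"" "$file"
}
```

A call such as `set_auth_token qpq-server.toml "$NEW_TOKEN"` returns non-zero when the file had no `auth_token` line to replace, which is a useful signal that the token is configured somewhere else.
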
### Impact

- Active bearer-token sessions continue until they expire (sessions are in-memory).
- New bearer-token connections must use the new token.
- OPAQUE-authenticated clients are not affected.

## TLS Certificate Rotation

The server uses DER-encoded X.509 certificates for QUIC TLS 1.3. The server validates certificates at startup and warns if expiry is within 30 days.

### Procedure

```bash
# 1. Obtain a new certificate (example with Let's Encrypt / certbot)
certbot certonly --standalone -d chat.example.com

# 2. Convert PEM to DER format (qpq-server expects DER)
# Note: openssl x509 reads only the first certificate in fullchain.pem (the leaf)
openssl x509 -in /etc/letsencrypt/live/chat.example.com/fullchain.pem \
  -outform DER -out /tmp/server-cert.der

openssl pkey -in /etc/letsencrypt/live/chat.example.com/privkey.pem \
  -outform DER -out /tmp/server-key.der

# 3. Set restrictive permissions on the private key
chmod 600 /tmp/server-key.der

# 4. Back up the current certificates
cp data/server-cert.der data/server-cert.der.bak
cp data/server-key.der data/server-key.der.bak

# 5. Replace certificates
cp /tmp/server-cert.der data/server-cert.der
cp /tmp/server-key.der data/server-key.der

# 6. Verify the new certificate
openssl x509 -inform DER -in data/server-cert.der -noout -text | head -20

# 7. Restart the server (QUIC requires restart for new TLS config)
systemctl restart qpq-server

# 8. Verify the server started with the new certificate
journalctl -u qpq-server --since "1 min ago" | grep -i tls
```

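The server warns at startup when the certificate expires within 30 days; the same check can be run by hand before and after a rotation. The following is a minimal sketch using `openssl x509 -checkend`. The function name `cert_expires_within` is illustrative, and treating an unreadable file as "needs attention" is a design choice of this sketch, not documented server behavior.

```bash
# Illustrative helper: succeed (exit 0) if the DER certificate expires
# within the given number of days, or if it cannot be read at all.
cert_expires_within() {
    cert="$1"
    days="$2"
    # -checkend N exits 0 only if the cert is still valid N seconds from now
    if openssl x509 -inform DER -in "$cert" -noout \
        -checkend "$((days * 86400))" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

For example, `cert_expires_within data/server-cert.der 30 && echo "rotate soon"` prints the reminder only when the installed certificate is inside the 30-day window.
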
### Self-Signed Certificate (Development)

In non-production mode, the server auto-generates a self-signed certificate if none exists. To force regeneration:

```bash
rm data/server-cert.der data/server-key.der
systemctl restart qpq-server
# Server will generate a new self-signed cert for localhost/127.0.0.1/::1
```

### Automated Renewal with Certbot

```bash
#!/bin/bash
# /opt/qpq/scripts/renew-cert.sh
set -euo pipefail

DOMAIN="chat.example.com"
CERT_DIR="/etc/letsencrypt/live/$DOMAIN"
QPQ_DATA="/var/lib/quicproquo"

certbot renew --quiet

# Convert the current leaf certificate to a temporary DER file
NEW_CERT=$(mktemp)
trap 'rm -f "$NEW_CERT"' EXIT
openssl x509 -in "$CERT_DIR/fullchain.pem" -outform DER -out "$NEW_CERT"

# Certificate unchanged: exit without touching or restarting the server
if cmp -s "$NEW_CERT" "$QPQ_DATA/server-cert.der"; then
    exit 0
fi

cp "$NEW_CERT" "$QPQ_DATA/server-cert.der"
openssl pkey -in "$CERT_DIR/privkey.pem" -outform DER -out "$QPQ_DATA/server-key.der"
chmod 600 "$QPQ_DATA/server-key.der"
chown qpq:qpq "$QPQ_DATA/server-cert.der" "$QPQ_DATA/server-key.der"

systemctl restart qpq-server
```

```cron
# Run cert renewal check twice daily
0 3,15 * * * /opt/qpq/scripts/renew-cert.sh >> /var/log/qpq-cert-renew.log 2>&1
```

## Federation Certificate Rotation

Federation uses mutual TLS (mTLS) with a shared CA for server-to-server authentication.

### Procedure

```bash
# 1. Generate a new federation certificate signed by the federation CA
openssl req -new -nodes -keyout /tmp/federation-key.pem \
  -out /tmp/federation.csr -subj "/CN=chat.example.com"

openssl x509 -req -in /tmp/federation.csr \
  -CA federation-ca.pem -CAkey federation-ca-key.pem \
  -CAcreateserial -days 365 -out /tmp/federation-cert.pem

# 2. Convert to DER
openssl x509 -in /tmp/federation-cert.pem -outform DER -out data/federation-cert.der
openssl pkey -in /tmp/federation-key.pem -outform DER -out data/federation-key.der
chmod 600 data/federation-key.der

# 3. Restart the server
systemctl restart qpq-server

# 4. Coordinate with federation peers: they must trust the same CA
```

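Before restarting, it is worth confirming that the converted key belongs to the converted certificate, since a mismatched pair would otherwise likely be discovered only after the restart. The sketch below compares the public key extracted from each DER file; `key_matches_cert` is a hypothetical name, not a quicproquo tool.

```bash
# Hypothetical helper: succeed only if the DER private key and the DER
# certificate carry the same public key.
key_matches_cert() {
    cert="$1"
    key="$2"
    # Public key embedded in the certificate (PEM text on stdout)
    cert_pub=$(openssl x509 -inform DER -in "$cert" -noout -pubkey 2>/dev/null) || return 1
    # Public key derived from the private key, in the same PEM format
    key_pub=$(openssl pkey -inform DER -in "$key" -pubout 2>/dev/null) || return 1
    [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}
```

Usage after step 2: `key_matches_cert data/federation-cert.der data/federation-key.der || echo "key/cert mismatch" >&2`. The same check applies to `data/server-cert.der` and `data/server-key.der`.
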
## Database Encryption Key Rotation

The SQLCipher database key (`QPQ_DB_KEY`) encrypts all data at rest. The commands below use the `sqlcipher` command-line shell; a stock `sqlite3` binary does not implement `PRAGMA key` or `PRAGMA rekey` and cannot open the encrypted file.

### Procedure (SQLCipher PRAGMA rekey)

```bash
# 1. Stop the server
systemctl stop qpq-server

# 2. Back up the database
cp data/qpq.db /backups/qpq-pre-rekey-$(date +%Y%m%d).db

# 3. Rekey the database
sqlcipher data/qpq.db <<EOF
PRAGMA key = 'old-encryption-key';
PRAGMA rekey = 'new-encryption-key';
EOF

# 4. Verify the database opens with the new key
sqlcipher data/qpq.db "PRAGMA key = 'new-encryption-key'; PRAGMA integrity_check;"

# 5. Update the environment/config with the new key
# Option A: systemd
systemctl edit qpq-server --force
# Add under [Service]: Environment=QPQ_DB_KEY=new-encryption-key

# Option B: Docker Compose .env (replace the existing entry; appending
# would leave the old key in the file as a stale duplicate)
sed -i 's|^QPQ_DB_KEY=.*|QPQ_DB_KEY=new-encryption-key|' .env

# 6. Start the server
systemctl start qpq-server
```

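When the key lives in a Docker Compose `.env` file, the entry should end up defined exactly once, whether or not it existed before. The following is a minimal sketch of an idempotent update. `set_env_var` is a hypothetical helper, and it assumes plain `NAME=value` lines without quoting or `export` prefixes.

```bash
# Hypothetical helper: set NAME=value in an env file so that the name is
# defined exactly once, keeping all other lines untouched.
set_env_var() {
    file="$1"
    name="$2"
    value="$3"
    touch "$file"
    tmp=$(mktemp) || return 1
    # Copy every line except existing definitions of $name
    # (grep exits 1 when nothing is left to print, which is fine here)
    grep -v "^${name}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    # Write back through the original path to keep its owner and mode
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For step 5, Option B, this would be called as `set_env_var .env QPQ_DB_KEY "new-encryption-key"`.
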
### Full Re-encryption (Alternative)

If `PRAGMA rekey` is unavailable or you want a fresh database file:

```bash
# 1. Stop the server and back up
systemctl stop qpq-server
cp data/qpq.db /backups/qpq-pre-rekey.db

# 2. Export with old key into a new database encrypted with the new key
# (a single shell argument cannot mix SQL with dot-commands such as .dump,
#  so this uses SQLCipher's sqlcipher_export() via the sqlcipher CLI)
sqlcipher data/qpq.db <<EOF
PRAGMA key = 'old-key';
ATTACH DATABASE 'data/qpq-new.db' AS rekeyed KEY 'new-key';
SELECT sqlcipher_export('rekeyed');
DETACH DATABASE rekeyed;
EOF

# 3. Replace the database
mv data/qpq-new.db data/qpq.db

# 4. Update config and restart
systemctl start qpq-server
```

## OPAQUE ServerSetup Rotation

The OPAQUE ServerSetup is generated once and persisted. Rotating it invalidates all registered OPAQUE credentials.

**WARNING: Rotating the OPAQUE ServerSetup requires all users to re-register. Only do this if the setup is compromised.**

```bash
# 1. Stop the server
systemctl stop qpq-server

# 2. Back up the database
cp data/qpq.db /backups/qpq-pre-opaque-rotate.db

# 3. Delete the persisted OPAQUE setup
# For SQL backend (requires the sqlcipher CLI; stock sqlite3 cannot open
# the encrypted database):
sqlcipher data/qpq.db "PRAGMA key='${QPQ_DB_KEY}'; DELETE FROM server_state WHERE key = 'opaque_setup';"

# For file backend:
rm data/opaque_setup.bin 2>/dev/null || true

# 4. Start the server (it will generate a new OPAQUE ServerSetup)
systemctl start qpq-server

# 5. All users must re-register (existing OPAQUE credentials are invalid)
```

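Both this rotation and the signing-key rotation delete persisted state, so the backup taken in step 2 is the only way back. The sketch below timestamps the copy and verifies it byte-for-byte before anything is deleted. `backup_file` and its arguments are illustrative names, not part of quicproquo.

```bash
# Illustrative helper: copy a file into a backup directory under a
# timestamped name, verify the copy, and print the backup path.
backup_file() {
    src="$1"
    dest_dir="$2"
    label="$3"
    dest="$dest_dir/$(basename "$src").pre-$label-$(date +%Y%m%d-%H%M%S)"
    # -p keeps mode and timestamps of the original
    cp -p "$src" "$dest" || return 1
    # Refuse to report success unless the copy is identical
    cmp -s "$src" "$dest" || return 1
    printf '%s\n' "$dest"
}
```

For example, `backup_file data/qpq.db /backups opaque-rotate || exit 1` at the top of a rotation script stops the procedure if the backup could not be written.
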
## Server Signing Key Rotation

The Ed25519 signing key is used for delivery proofs. Rotating it means old delivery proofs cannot be verified against the new key.

```bash
# 1. Stop the server
systemctl stop qpq-server

# 2. Back up
cp data/qpq.db /backups/qpq-pre-sigkey-rotate.db

# 3. Delete the persisted signing key seed
# For SQL backend (requires the sqlcipher CLI; stock sqlite3 cannot open
# the encrypted database):
sqlcipher data/qpq.db "PRAGMA key='${QPQ_DB_KEY}'; DELETE FROM server_state WHERE key = 'signing_key_seed';"

# 4. Start the server (generates a new Ed25519 signing key)
systemctl start qpq-server
```

## Rotation Schedule

| Key Material | Rotation Frequency | Impact |
|---|---|---|
| Auth token | Quarterly or on compromise | Clients using bearer auth must update |
| TLS certificate | Before expiry (automate with certbot) | Server restart required |
| Federation cert | Annually or before expiry | Coordinate with peers |
| DB encryption key | Annually or on compromise | Server downtime required |
| OPAQUE ServerSetup | Only on compromise | All users must re-register |
| Server signing key | Only on compromise | Old delivery proofs unverifiable |

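
The frequencies in the table can be turned into a simple reminder. The sketch below flags a file whose modification time is older than a given number of days. It assumes that a file's mtime reflects the last rotation, which holds when the file is overwritten during rotation as in the procedures in this document, and it relies on `find -maxdepth 0 -mtime`. `older_than_days` is a hypothetical name.

```bash
# Hypothetical helper: succeed if the file exists and was last modified
# more than the given number of days ago.
older_than_days() {
    file="$1"
    days="$2"
    # find prints the path once its age, rounded down to whole days,
    # exceeds $days; a missing file prints nothing
    [ -n "$(find "$file" -maxdepth 0 -mtime +"$days" -print 2>/dev/null)" ]
}
```

For the annual federation rotation this could be run from cron as `older_than_days data/federation-cert.der 365 && echo "federation cert is due for rotation"`.
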