# Backup and Restore Procedures

This document covers backup and restore for all quicproquo server data stores.

## Data Inventory

| Data | Location | Backend | Contains |
|------|----------|---------|----------|
| SQLCipher DB | `QPQ_DB_PATH` (default `data/qpq.db`) | `store_backend=sql` | Users, key packages, delivery queues, sessions, KT log, OPAQUE setup, blobs metadata, moderation |
| File store | `QPQ_DATA_DIR` (default `data/`) | `store_backend=file` | Bincode-serialized key packages, delivery queues, server state |
| Blob storage | `QPQ_DATA_DIR/blobs/` | Filesystem | Uploaded file transfer blobs |
| TLS certificates | `QPQ_TLS_CERT`, `QPQ_TLS_KEY` | DER files | Server identity |
| OPAQUE ServerSetup | Inside DB or file store | Persisted | OPAQUE credential state (critical for auth) |
| Server signing key | Inside DB or file store | Persisted | Ed25519 key for delivery proofs |
| KT Merkle log | Inside DB or file store | Persisted | Key transparency audit log |

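## SQLCipher Backup

### Hot Backup (Online)

The `.backup` command can be run while the server is running, because WAL mode allows concurrent readers. The commands in this document assume that the `sqlite3` binary is built against SQLCipher (often installed as `sqlcipher`); a stock `sqlite3` build cannot open an encrypted database.
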
```bash
# 1. Open the encrypted database
sqlite3 data/qpq.db

# 2. At the sqlite3 prompt, set the encryption key
PRAGMA key = 'your-db-key-here';

# 3. Perform an online backup. Use a literal file name here:
#    shell substitutions such as $(date ...) are not expanded at the sqlite3 prompt.
.backup /backups/qpq-20260304-120000.db

.quit
```

### Scripted Hot Backup

```bash
#!/bin/bash
set -euo pipefail

BACKUP_DIR="/backups/qpq"
DB_PATH="${QPQ_DB_PATH:-data/qpq.db}"
# Fail early with a clear message if the key is not set
DB_KEY="${QPQ_DB_KEY:?QPQ_DB_KEY must be set}"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_FILE="${BACKUP_DIR}/qpq-${TIMESTAMP}.db"

mkdir -p "$BACKUP_DIR"

sqlite3 "$DB_PATH" <<EOF
PRAGMA key = '${DB_KEY}';
.backup '${BACKUP_FILE}'
EOF

# Make sure the backup was actually written before checking it
[ -s "$BACKUP_FILE" ] || { echo "ERROR: backup file missing or empty"; exit 1; }

# Verify the backup is readable. Only the last output line is checked,
# because SQLCipher may also print "ok" for the PRAGMA key statement itself.
sqlite3 "$BACKUP_FILE" "PRAGMA key = '${DB_KEY}'; PRAGMA integrity_check;" \
  | tail -n 1 | grep -qx "ok" && echo "Backup verified: $BACKUP_FILE" \
  || { echo "ERROR: backup verification failed"; exit 1; }

# Delete backups older than 7 days
find "$BACKUP_DIR" -type f -name 'qpq-*.db' -mtime +7 -delete
```

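The retention step relies on `find -mtime +N`, which matches files whose age, truncated to whole days, is strictly greater than N, so `+7` only removes files that are at least 8 days old. A minimal sketch of that step as a reusable function follows. `prune_old_backups` is a hypothetical helper name, not part of quicproquo, and the `qpq-*.db` pattern is taken from the script above.

```bash
#!/bin/bash
# prune_old_backups DIR DAYS
# Hypothetical helper: delete qpq-*.db files directly inside DIR that are
# older than DAYS days, printing each path that is removed.
prune_old_backups() {
    local dir="$1" days="$2"
    # -maxdepth 1 : do not descend into subdirectories
    # -type f     : only regular files, never directories
    # -mtime +N   : age in whole days strictly greater than N
    find "$dir" -maxdepth 1 -type f -name 'qpq-*.db' -mtime +"$days" -print -delete
}
```

For example, `prune_old_backups /backups/qpq 7` would reproduce the retention rule of the script while logging what was deleted.
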
### Cold Backup (Offline)

```bash
# 1. Stop the server
systemctl stop qpq-server   # or: docker compose stop server

# 2. Copy the database file
DEST="/backups/qpq-$(date +%Y%m%d).db"
cp data/qpq.db "$DEST"

# 3. Copy the WAL and SHM files if they exist.
#    Their names must stay paired with the database copy (<db>-wal, <db>-shm).
cp data/qpq.db-wal "${DEST}-wal" 2>/dev/null || true
cp data/qpq.db-shm "${DEST}-shm" 2>/dev/null || true

# 4. Restart the server
systemctl start qpq-server
```

## File Backend Backup

When using `store_backend=file`, data is stored as bincode files under `QPQ_DATA_DIR`.

```bash
# Full directory backup
tar czf /backups/qpq-data-$(date +%Y%m%d-%H%M%S).tar.gz \
  -C "$(dirname "${QPQ_DATA_DIR:-data}")" \
  "$(basename "${QPQ_DATA_DIR:-data}")"
```

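The archive command above can be wrapped so that every archive is listed once and gets a checksum file, which makes later offsite copies and restores verifiable. This is a minimal sketch under stated assumptions: `backup_data_dir` is a hypothetical helper name, and `sha256sum` from GNU coreutils is assumed to be available.

```bash
#!/bin/bash
# backup_data_dir DATA_DIR BACKUP_DIR
# Hypothetical helper: archive DATA_DIR into BACKUP_DIR and write a SHA-256
# checksum file next to the archive. Prints the archive path on success.
backup_data_dir() {
    local data_dir="${1%/}" backup_dir="$2"
    local stamp archive name
    stamp=$(date +%Y%m%d-%H%M%S)
    name="qpq-data-${stamp}.tar.gz"
    archive="${backup_dir}/${name}"

    mkdir -p "$backup_dir" || return 1
    tar czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" || return 1

    # Confirm the archive can be listed end to end before trusting it
    tar tzf "$archive" > /dev/null || return 1

    # Record a checksum relative to BACKUP_DIR so it can be re-checked there
    (cd "$backup_dir" && sha256sum "$name" > "${name}.sha256") || return 1
    echo "$archive"
}
```

A call such as `backup_data_dir "${QPQ_DATA_DIR:-data}" /backups` prints the new archive path; running `sha256sum -c <archive>.sha256` inside the backup directory later confirms the file has not changed.
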
## Blob Storage Backup

Blobs are stored in `QPQ_DATA_DIR/blobs/` and are immutable once written.

```bash
# Incremental rsync (blobs are write-once, ideal for rsync)
rsync -av --progress "${QPQ_DATA_DIR:-data}/blobs/" /backups/blobs/
```

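Because blobs never change after they are written, a backup mirror can be checked by comparing each source file against its copy. The following is a minimal sketch of such a check, not an existing quicproquo tool: `verify_blob_mirror` is a hypothetical name, and `sha256sum` from GNU coreutils is assumed.

```bash
#!/bin/bash
# verify_blob_mirror SRC DST
# Hypothetical helper: print every file under SRC that is missing from DST or
# whose SHA-256 differs. Returns 0 only if every source file has an identical copy.
verify_blob_mirror() {
    local src="${1%/}" dst="${2%/}" rel status=0
    # Read NUL-delimited relative paths so unusual file names are handled safely
    while IFS= read -r -d '' rel; do
        rel="${rel#./}"
        if [ ! -f "$dst/$rel" ]; then
            echo "MISSING  $rel"
            status=1
        elif [ "$(sha256sum < "$src/$rel")" != "$(sha256sum < "$dst/$rel")" ]; then
            echo "MISMATCH $rel"
            status=1
        fi
    done < <(cd "$src" && find . -type f -print0)
    return "$status"
}
```

Running `verify_blob_mirror data/blobs /backups/blobs` after an rsync pass lists any blob that did not make it into the backup. Hashing every blob is slow on large stores, so this suits periodic audits rather than every sync.
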
## TLS Certificate Backup

```bash
# Back up TLS certificates (store separately from DB backups)
mkdir -p /backups/tls
cp data/server-cert.der /backups/tls/server-cert.der
cp data/server-key.der /backups/tls/server-key.der
chmod 600 /backups/tls/server-key.der   # the private key must not be world-readable

# Federation certs (if federation is enabled)
cp data/federation-cert.der /backups/tls/federation-cert.der 2>/dev/null || true
cp data/federation-key.der /backups/tls/federation-key.der 2>/dev/null || true
cp data/federation-ca.der /backups/tls/federation-ca.der 2>/dev/null || true
```

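Since these files include private keys, the copies should be readable by their owner only. One way to do this, sketched here with a hypothetical `backup_tls_files` helper, is to copy with `install`, which sets the file mode in the same step. The file names in the usage line are the ones used in the commands above.

```bash
#!/bin/bash
# backup_tls_files DEST_DIR FILE...
# Hypothetical helper: copy each existing FILE into DEST_DIR with owner-only
# permissions. Missing files (e.g. unused federation certs) are skipped.
backup_tls_files() {
    local dest="$1" f
    shift
    # Create the destination so that only the owner can enter it
    mkdir -p "$dest" && chmod 700 "$dest" || return 1
    for f in "$@"; do
        if [ -f "$f" ]; then
            # install copies the file and sets its mode in one step
            install -m 600 "$f" "$dest/$(basename "$f")" || return 1
        else
            echo "skipped (not found): $f" >&2
        fi
    done
}
```

Usage: `backup_tls_files /backups/tls data/server-cert.der data/server-key.der data/federation-cert.der data/federation-key.der data/federation-ca.der`.
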
## Restore Procedures

### Restore SQLCipher Database

```bash
# 1. Stop the server
systemctl stop qpq-server

# 2. Move the current (corrupt/lost) database aside
mv data/qpq.db data/qpq.db.broken 2>/dev/null || true
rm -f data/qpq.db-wal data/qpq.db-shm

# 3. Copy the backup in place
cp /backups/qpq-20260304.db data/qpq.db

# 4. Verify integrity
sqlite3 data/qpq.db "PRAGMA key = '${QPQ_DB_KEY}'; PRAGMA integrity_check;"

# 5. Start the server (migrations will apply automatically if needed)
systemctl start qpq-server
```

### Restore File Backend

```bash
# 1. Stop the server
systemctl stop qpq-server

# 2. Replace the data directory
mv data data.broken 2>/dev/null || true
tar xzf /backups/qpq-data-20260304.tar.gz -C .

# 3. Restore TLS certs if not included in the data backup
cp /backups/tls/server-cert.der data/server-cert.der
cp /backups/tls/server-key.der data/server-key.der

# 4. Start the server
systemctl start qpq-server
```

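Extracting with `-C .` only recreates `data/` if the archive was made with that directory as its single top-level entry. A small check before extraction can catch an archive built from a different path. The sketch below uses a hypothetical `archive_top_level` helper and assumes a gzip-compressed tar archive.

```bash
#!/bin/bash
# archive_top_level ARCHIVE
# Hypothetical helper: print the distinct top-level entries of a .tar.gz
# archive, one per line.
archive_top_level() {
    # Strip a leading "./", keep only the first path component, de-duplicate,
    # and drop the empty line that a bare "./" entry would leave behind
    tar tzf "$1" | sed -e 's|^\./||' -e 's|/.*$||' | sort -u | sed '/^$/d'
}
```

For the archive used above, `archive_top_level /backups/qpq-data-20260304.tar.gz` is expected to print only `data`; any other output means the extraction target in step 2 needs adjusting.
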
### Restore Blobs Only

```bash
rsync -av /backups/blobs/ data/blobs/
```

## Backup Schedule Recommendations

| Frequency | What | Method |
|-----------|------|--------|
| Every 6 hours | SQLCipher database | Hot backup script via cron |
| Daily | File backend / full data dir | tar + offsite copy |
| Continuous | Blobs | rsync (incremental) |
| On change | TLS certificates | Manual + secret manager |

## Cron Example

```cron
# SQLCipher hot backup every 6 hours
0 */6 * * * /opt/qpq/scripts/backup-db.sh >> /var/log/qpq-backup.log 2>&1

# Full data directory daily at 02:00
0 2 * * * tar czf /backups/qpq-data-$(date +\%Y\%m\%d).tar.gz -C /var/lib quicproquo

# Blob sync every hour
0 * * * * rsync -a /var/lib/quicproquo/blobs/ /backups/blobs/

# Prune backup files older than 30 days (weekly, Sunday 03:00)
0 3 * * 0 find /backups -type f -name 'qpq-*' -mtime +30 -delete
```

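Cron jobs fail silently unless something checks their output, so it helps to alert when no recent backup exists. The following is a minimal sketch of such a freshness check. `backup_is_fresh` is a hypothetical helper, and the 420-minute threshold in the usage line is an assumption (the 6-hour schedule plus one hour of slack).

```bash
#!/bin/bash
# backup_is_fresh DIR PATTERN MAX_AGE_MINUTES
# Hypothetical helper: succeed if DIR contains at least one regular file
# matching PATTERN that was modified within the last MAX_AGE_MINUTES.
backup_is_fresh() {
    local dir="$1" pattern="$2" max_age="$3" hit
    # -mmin -N : modified less than N minutes ago
    # -quit    : stop at the first match
    hit=$(find "$dir" -maxdepth 1 -type f -name "$pattern" -mmin -"$max_age" -print -quit)
    [ -n "$hit" ]
}
```

Usage: `backup_is_fresh /backups/qpq 'qpq-*.db' 420 || echo "ALERT: no recent database backup"`.
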
## Verification

Always verify backups after creation:

```bash
# SQLCipher integrity check
sqlite3 /backups/qpq-latest.db \
  "PRAGMA key = '${QPQ_DB_KEY}'; PRAGMA integrity_check; SELECT count(*) FROM users;"

# File backend: check the archive is valid
tar tzf /backups/qpq-data-latest.tar.gz > /dev/null

# TLS cert: check it parses and is not expired
# (-checkend 0 makes openssl exit non-zero if the certificate has already expired)
openssl x509 -inform DER -in /backups/tls/server-cert.der -noout -dates -checkend 0
```