Files
claude-archeflow-plugin/skills/af-replay/SKILL.md
Christian Nennemann 4f8e2a9962 feat: add run replay for archetype effectiveness analysis
- archeflow-decision.sh records decision points during runs
- archeflow-replay.sh: timeline, whatif, compare commands
- What-if replay with adjustable archetype weights
- /af-replay skill for interactive use
- Tests in archeflow-replay.bats
2026-04-06 21:43:29 +02:00

43 lines
2.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
name: af-replay
description: "Replay and analyze a recorded ArcheFlow run: decision timeline and weighted what-if. Usage: /af-replay <run-id> [--timeline|--whatif|--compare] [--weights arch=w,...]"
user-invocable: true
---
# ArcheFlow Run Replay
Inspect a completed or in-progress run logged in `.archeflow/events/<run_id>.jsonl`. Use this to study which archetypes drove outcomes and to simulate **weighted** consensus (what-if).
## Recording (during PDCA)
After each meaningful orchestration choice, log a **decision point** (in addition to `review.verdict` where applicable):
```bash
./lib/archeflow-decision.sh <run_id> <phase> <archetype> '<input_summary>' '<decision>' <confidence> [parent_seq]
```
Fields stored: `phase`, `archetype`, `input`, `decision`, `confidence`, `ts` (event timestamp). The event type is `decision.point`.
Lower-level alternative:
```bash
./lib/archeflow-event.sh "$RUN_ID" decision.point check guardian \
'{"archetype":"guardian","input":"diff","decision":"needs_changes","confidence":0.85}' 7
```
## Commands (from project root)
| Action | Shell |
|--------|--------|
| Timeline | `./lib/archeflow-replay.sh timeline <run_id>` |
| What-if | `./lib/archeflow-replay.sh whatif <run_id> [--weights guardian=2,sage=0.5] [--threshold 0.5] [--json]` |
| Both | `./lib/archeflow-replay.sh compare <run_id> [--weights ...]` |
- **Timeline** lists `decision.point` rows and `review.verdict` (check phase).
- **What-if** reads the **last** `review.verdict` per archetype in check. **Original** outcome uses strict any-veto (any non-approve → BLOCK). **Replay** uses weighted mean strictness: each reviewer contributes weight × (1 if not approved, else 0); BLOCK if mean ≥ threshold (default 0.5).
- **`--json`** emits machine-readable output for dashboards or scripts.
## Learning effectiveness
Correlate `decision.point` confidence and verdicts with cycle outcomes (`cycle.boundary`, `run.complete`) and `./lib/archeflow-score.sh extract` to see which archetypes add signal for which task shapes.