claude-archeflow-plugin/skills/af-replay/SKILL.md
Christian Nennemann 4f8e2a9962 feat: add run replay for archetype effectiveness analysis
- archeflow-decision.sh records decision points during runs
- archeflow-replay.sh: timeline, whatif, compare commands
- What-if replay with adjustable archetype weights
- /af-replay skill for interactive use
- Tests in archeflow-replay.bats
2026-04-06 21:43:29 +02:00


name: af-replay
description: Replay and analyze a recorded ArcheFlow run: decision timeline and weighted what-if. Usage: /af-replay <run-id> [--timeline|--whatif|--compare] [--weights arch=w,...]
user-invocable: true

ArcheFlow Run Replay

Inspect a completed or in-progress run logged in .archeflow/events/<run_id>.jsonl. Use this to study which archetypes drove outcomes and to simulate weighted consensus (what-if).

Recording (during PDCA)

After each meaningful orchestration choice, log a decision point (in addition to review.verdict where applicable):

./lib/archeflow-decision.sh <run_id> <phase> <archetype> '<input_summary>' '<decision>' <confidence> [parent_seq]

Fields stored: phase, archetype, input, decision, confidence, ts (event timestamp). The event type is decision.point.

Lower-level alternative:

./lib/archeflow-event.sh "$RUN_ID" decision.point check guardian \
  '{"archetype":"guardian","input":"diff","decision":"needs_changes","confidence":0.85}' 7
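
A minimal sketch of assembling that JSON payload from shell variables instead of writing it by hand. The helper name decision_payload is hypothetical and not part of ArcheFlow. It does no JSON escaping, so it only suits simple values without quotes or backslashes.

```shell
# Hypothetical helper (not part of ArcheFlow): prints a decision.point payload
# with the same four keys as the example above.
# Arguments: <archetype> <input> <decision> <confidence>
# Assumption: values contain no characters that need JSON escaping.
decision_payload() {
  printf '{"archetype":"%s","input":"%s","decision":"%s","confidence":%s}\n' \
    "$1" "$2" "$3" "$4"
}

# Usage, mirroring the call above:
# ./lib/archeflow-event.sh "$RUN_ID" decision.point check guardian \
#   "$(decision_payload guardian diff needs_changes 0.85)" 7
```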

Commands (from project root)

Action    Shell
Timeline  ./lib/archeflow-replay.sh timeline <run_id>
What-if   ./lib/archeflow-replay.sh whatif <run_id> [--weights guardian=2,sage=0.5] [--threshold 0.5] [--json]
Both      ./lib/archeflow-replay.sh compare <run_id> [--weights ...]
  • Timeline lists decision.point events and the review.verdict events from the check phase.
  • What-if reads the last review.verdict per archetype in the check phase. The original outcome uses strict any-veto: any non-approve verdict means BLOCK. The replay uses weighted mean strictness instead: each reviewer contributes weight × (1 if not approved, else 0), and the result is BLOCK if the mean ≥ threshold (default 0.5).
  • --json emits machine-readable output for dashboards or scripts.
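
The weighted rule above can be sketched in a few lines of shell and awk. This is an illustration under stated assumptions, not the code in archeflow-replay.sh: the function name whatif_outcome, the "<archetype> <verdict>" input format, the verdict string approve, the PASS and NO_VERDICTS labels, a default weight of 1 for unlisted archetypes, and normalizing by the sum of weights are all assumptions.

```shell
# Sketch of the weighted what-if rule (hypothetical; not archeflow-replay.sh).
#
# whatif_outcome WEIGHTS [THRESHOLD]
#   WEIGHTS    comma-separated arch=w pairs, e.g. "guardian=2,sage=0.5"
#   THRESHOLD  BLOCK when the weighted mean is >= this value (default 0.5)
#   stdin      one "<archetype> <verdict>" pair per line (assumed format)
whatif_outcome() {
  awk -v weights="$1" -v threshold="${2:-0.5}" '
    BEGIN {
      # Parse "guardian=2,sage=0.5" into w["guardian"]=2, w["sage"]=0.5.
      n = split(weights, pairs, ",")
      for (i = 1; i <= n; i++)
        if (split(pairs[i], kv, "=") == 2) w[kv[1]] = kv[2]
    }
    NF >= 2 {
      weight = ($1 in w) ? w[$1] : 1       # assumption: unlisted archetypes weigh 1
      strict = ($2 == "approve") ? 0 : 1   # 1 if not approved, else 0
      num += weight * strict
      den += weight                        # assumption: normalize by total weight
    }
    END {
      if (den == 0) { print "NO_VERDICTS"; exit }
      print ((num / den >= threshold) ? "BLOCK" : "PASS")
    }
  '
}

# Example: guardian objects, sage approves.
printf 'guardian needs_changes\nsage approve\n' | whatif_outcome 'guardian=0.5,sage=2'
```

With these weights the mean is 0.5 / 2.5 = 0.2, below the default threshold, so the sketch prints PASS, whereas strict any-veto would have blocked on guardian's verdict. Swapping the weights to guardian=2,sage=0.5 gives 2 / 2.5 = 0.8 and BLOCK.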

Learning effectiveness

Correlate decision.point confidence and verdicts with cycle outcomes (cycle.boundary, run.complete) and with the output of ./lib/archeflow-score.sh extract to see which archetypes add signal for which task shapes.