Files

Christian Nennemann 4f8e2a9962 feat: add run replay for archetype effectiveness analysis

- archeflow-decision.sh records decision points during runs
- archeflow-replay.sh: timeline, whatif, compare commands
- What-if replay with adjustable archetype weights
- /af-replay skill for interactive use
- Tests in archeflow-replay.bats

2026-04-06 21:43:29 +02:00

2.0 KiB

Raw Blame History

name, description, user-invocable

name	description	user-invocable
af-replay	Replay and analyze a recorded ArcheFlow run: decision timeline and weighted what-if. Usage: /af-replay <run-id> [--timeline\|--whatif\|--compare] [--weights arch=w,...]	true

ArcheFlow Run Replay

Inspect a completed or in-progress run logged in .archeflow/events/<run_id>.jsonl. Use this to study which archetypes drove outcomes and to simulate weighted consensus (what-if).

Recording (during PDCA)

After each meaningful orchestration choice, log a decision point (in addition to review.verdict where applicable):

./lib/archeflow-decision.sh <run_id> <phase> <archetype> '<input_summary>' '<decision>' <confidence> [parent_seq]

Fields stored: phase, archetype, input, decision, confidence, ts (event timestamp). The event type is decision.point.

Lower-level alternative:

./lib/archeflow-event.sh "$RUN_ID" decision.point check guardian \
  '{"archetype":"guardian","input":"diff","decision":"needs_changes","confidence":0.85}' 7

Commands (from project root)

Action	Shell
Timeline	`./lib/archeflow-replay.sh timeline <run_id>`
What-if	`./lib/archeflow-replay.sh whatif <run_id> [--weights guardian=2,sage=0.5] [--threshold 0.5] [--json]`
Both	`./lib/archeflow-replay.sh compare <run_id> [--weights ...]`

Timeline lists decision.point rows and review.verdict (check phase).
What-if reads the last review.verdict per archetype in check. Original outcome uses strict any-veto (any non-approve → BLOCK). Replay uses weighted mean strictness: each reviewer contributes weight × (1 if not approved, else 0); BLOCK if mean ≥ threshold (default 0.5).
--json emits machine-readable output for dashboards or scripts.

Learning effectiveness

Correlate decision.point confidence and verdicts with cycle outcomes (cycle.boundary, run.complete) and ./lib/archeflow-score.sh extract to see which archetypes add signal for which task shapes.

2.0 KiB Raw Blame History Unescape Escape

ArcheFlow Run Replay

Recording (during PDCA)

Commands (from project root)

Learning effectiveness

2.0 KiB

Raw Blame History