feat: add run replay for archetype effectiveness analysis
- archeflow-decision.sh records decision points during runs - archeflow-replay.sh: timeline, whatif, compare commands - What-if replay with adjustable archetype weights - /af-replay skill for interactive use - Tests in archeflow-replay.bats
This commit is contained in:
42
skills/af-replay/SKILL.md
Normal file
42
skills/af-replay/SKILL.md
Normal file
@@ -0,0 +1,42 @@
|
||||
---
|
||||
name: af-replay
|
||||
description: "Replay and analyze a recorded ArcheFlow run: decision timeline and weighted what-if. Usage: /af-replay <run-id> [--timeline|--whatif|--compare] [--weights arch=w,...]"
|
||||
user-invocable: true
|
||||
---
|
||||
|
||||
# ArcheFlow Run Replay
|
||||
|
||||
Inspect a completed or in-progress run logged in `.archeflow/events/<run_id>.jsonl`. Use this to study which archetypes drove outcomes and to simulate **weighted** consensus (what-if).
|
||||
|
||||
## Recording (during PDCA)
|
||||
|
||||
After each meaningful orchestration choice, log a **decision point** (in addition to `review.verdict` where applicable):
|
||||
|
||||
```bash
|
||||
./lib/archeflow-decision.sh <run_id> <phase> <archetype> '<input_summary>' '<decision>' <confidence> [parent_seq]
|
||||
```
|
||||
|
||||
Fields stored: `phase`, `archetype`, `input`, `decision`, `confidence`, `ts` (event timestamp). The event type is `decision.point`.
|
||||
|
||||
Lower-level alternative:
|
||||
|
||||
```bash
|
||||
./lib/archeflow-event.sh "$RUN_ID" decision.point check guardian \
|
||||
'{"archetype":"guardian","input":"diff","decision":"needs_changes","confidence":0.85}' 7
|
||||
```
|
||||
|
||||
## Commands (from project root)
|
||||
|
||||
| Action | Shell |
|
||||
|--------|--------|
|
||||
| Timeline | `./lib/archeflow-replay.sh timeline <run_id>` |
|
||||
| What-if | `./lib/archeflow-replay.sh whatif <run_id> [--weights guardian=2,sage=0.5] [--threshold 0.5] [--json]` |
|
||||
| Both | `./lib/archeflow-replay.sh compare <run_id> [--weights ...]` |
|
||||
|
||||
- **Timeline** lists `decision.point` rows and `review.verdict` (check phase).
|
||||
- **What-if** reads the **last** `review.verdict` per archetype in check. **Original** outcome uses strict any-veto (any non-approve → BLOCK). **Replay** uses weighted mean strictness: each reviewer contributes weight × (1 if not approved, else 0); BLOCK if mean ≥ threshold (default 0.5).
|
||||
- **`--json`** emits machine-readable output for dashboards or scripts.
|
||||
|
||||
## Learning effectiveness
|
||||
|
||||
Correlate `decision.point` confidence and verdicts with cycle outcomes (`cycle.boundary`, `run.complete`) and `./lib/archeflow-score.sh extract` to see which archetypes add signal for which task shapes.
|
||||
@@ -352,6 +352,7 @@ Emit events via `./lib/archeflow-event.sh <run_id> <type> <phase> <agent> '<json
|
||||
| After agent returns | `agent.complete` | archetype, duration_ms, artifacts, summary |
|
||||
| Phase boundary | `phase.transition` | from, to, artifacts_so_far |
|
||||
| Alternative chosen | `decision` | what, chosen, alternatives, rationale |
|
||||
| Orchestrator decision (replay) | `decision.point` | archetype, input, decision, confidence — use `./lib/archeflow-decision.sh` |
|
||||
| Reviewer verdict | `review.verdict` | archetype, verdict, findings[] |
|
||||
| Fix addressing review | `fix.applied` | source, finding, file, line |
|
||||
| End of PDCA cycle | `cycle.boundary` | cycle, max_cycles, exit_condition, convergence |
|
||||
@@ -403,6 +404,12 @@ Scores stored in `.archeflow/memory/effectiveness.jsonl`. After 10+ runs, recomm
|
||||
|
||||
---
|
||||
|
||||
## Run replay (decision log + what-if)
|
||||
|
||||
After key choices (routing, fast-path skip, escalation), emit `decision.point` via `./lib/archeflow-decision.sh` so runs can be inspected with `./lib/archeflow-replay.sh timeline|whatif|compare <run_id>`. Weighted what-if helps estimate how much each review archetype swayed the effective ship/block outcome. See skill `af-replay`.
|
||||
|
||||
---
|
||||
|
||||
## Dry-Run Mode
|
||||
|
||||
When `--dry-run`: Run Plan phase only. Display workflow, agent counts, confidence scores, cost estimate. Ask user to proceed. If yes, continue with `--start-from do`.
|
||||
|
||||
@@ -7,7 +7,7 @@ description: Use at session start when implementing features, reviewing code, de
|
||||
|
||||
On activation, print ONE line then proceed silently:
|
||||
```
|
||||
archeflow v0.8.0 · 23 skills · <domain> domain
|
||||
archeflow v0.9.0 · 24 skills · <domain> domain
|
||||
```
|
||||
Domain auto-detected: `writing` if `colette.yaml` exists, `research` if paper/thesis files, `code` otherwise.
|
||||
|
||||
@@ -46,6 +46,7 @@ Do NOT use for: single-line fixes, questions, reading/exploring, config tweaks,
|
||||
| `/af-memory` | Cross-run lesson memory |
|
||||
| `/af-fanout` | Colette book fanout via agents |
|
||||
| `/af-dag` | DAG of current/last run |
|
||||
| `/af-replay <run_id>` | Decision timeline + weighted what-if on recorded events |
|
||||
|
||||
## Mini-Reflect Fallback
|
||||
|
||||
|
||||
Reference in New Issue
Block a user