claude-archeflow-plugin/skills/run/SKILL.md
Christian Nennemann 1e96d87f49 feat: introduce Wiggum Break as named circuit breaker
Replaces generic "circuit breaker" with "Wiggum Break" — policy enforcement
halt condition named after Chief Wiggum (policy + Ralph Loop's dad).
Hard breaks (immediate halt) and soft breaks (finish then halt) with
wiggum.break event type. Updated both papers and shadow-detection skill.
2026-04-08 05:19:35 +02:00


---
name: run
description: |
Start an ArcheFlow PDCA run. Usage: /af-run <task description> [--workflow fast|standard|thorough] [--dry-run] [--start-from plan|do|check|act]
---
# ArcheFlow Run — PDCA Orchestration
One command runs the full cycle: Plan (Explorer+Creator) -> Do (Maker in worktree) -> Check (Guardian first, then others) -> Act (collect findings, route fixes, exit or cycle).
## 0. Initialize
1. Generate run ID: `<YYYY-MM-DD>-<task-slug>`
2. Create artifact directory: `mkdir -p .archeflow/artifacts/<run_id>`
3. Verify `./lib/archeflow-*.sh` scripts exist before proceeding
4. Inject cross-run memory: `./lib/archeflow-memory.sh inject "$DOMAIN" "" --audit "$RUN_ID"`
5. Read `.archeflow/config.yaml` models section. Resolution order: per-workflow per-archetype > per-workflow default > per-archetype > global default.
6. Emit `run.start` event
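The resolution order in step 5 could be sketched as follows. The key names (`default`, `archetypes`, `workflows`) are hypothetical; the real layout of the `models` section in `.archeflow/config.yaml` is not shown in this skill, so treat this as an illustration of the precedence only.

```python
from typing import Optional


def resolve_model(models: dict, workflow: str, archetype: str) -> Optional[str]:
    """Pick a model for an archetype, most specific setting first.

    All key names below are assumptions made for illustration.
    """
    wf = models.get("workflows", {}).get(workflow, {})
    candidates = [
        wf.get("archetypes", {}).get(archetype),      # per-workflow per-archetype
        wf.get("default"),                            # per-workflow default
        models.get("archetypes", {}).get(archetype),  # per-archetype
        models.get("default"),                        # global default
    ]
    # First non-empty candidate wins; None if nothing is configured.
    return next((c for c in candidates if c), None)
```

A per-workflow default therefore beats a global per-archetype setting, which is the less obvious consequence of the stated order.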
### Strategy Selection
Determine strategy from CLI flag `--strategy`, config `strategy:` field, or auto-detect:
| Signal | Strategy |
|--------|----------|
| Task contains fix/bug/patch/hotfix | `pipeline` |
| Task contains refactor/redesign/review | `pdca` |
| Workflow is `fast` with single file | `pipeline` |
| Workflow is `thorough` | `pdca` |
| Default | `pdca` |
If `pipeline`, skip to the Pipeline section at the end. Otherwise continue with PDCA below.
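A minimal sketch of the auto-detect table, under three assumptions the text does not state: an explicit `--strategy` flag wins over the config field, the table rows are evaluated top to bottom, and keywords match as whole words. The function and parameter names are hypothetical.

```python
import re

# Keyword groups taken from the table; whole-word matching is an assumption.
PIPELINE_WORDS = re.compile(r"\b(fix|bug|patch|hotfix)\b", re.IGNORECASE)
PDCA_WORDS = re.compile(r"\b(refactor|redesign|review)\b", re.IGNORECASE)


def select_strategy(task, workflow, file_count, cli_flag=None, config_value=None):
    """Return 'pipeline' or 'pdca' for a task."""
    if cli_flag:                 # assumed: CLI flag overrides everything
        return cli_flag
    if config_value:             # assumed: config overrides auto-detect
        return config_value
    if PIPELINE_WORDS.search(task):
        return "pipeline"
    if PDCA_WORDS.search(task):
        return "pdca"
    if workflow == "fast" and file_count == 1:
        return "pipeline"
    # Covers both the `thorough` row and the default row.
    return "pdca"
```

With this ordering, a task such as "fix and review the parser" resolves to `pipeline` because the first matching row wins.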
### Workflow Selection
| Signal | Workflow | Max Cycles |
|--------|----------|------------|
| Small fix, low risk, single concern | `fast` | 1 |
| Feature, multiple files, moderate risk | `standard` | 2 |
| Security-sensitive, breaking changes, public API | `thorough` | 3 |
---
## Attention Filters
Each agent receives only what it needs. This is the canonical reference:
| Archetype | Receives | Excludes |
|-----------|----------|----------|
| Explorer | Task description, codebase access | Prior proposals, reviews, diffs |
| Creator | Task + Explorer output (+ feedback cycle 2+) | Raw files, diffs, reviewer outputs |
| Maker | Creator's proposal (+ Maker-routed feedback cycle 2+) | Explorer research, reviewer outputs |
| Guardian | Maker's diff + proposal risk section | Full proposal, Explorer research |
| Skeptic | Creator's proposal (assumptions focus) | Diff details, Explorer research |
| Sage | Proposal + diff + implementation summary | Explorer research, other reviews |
| Trickster | Maker's diff only | Everything else |
---
## Status Token Protocol
Every agent ends output with `STATUS: <token>`. Parse it to decide the next action.
| Status | Action |
|--------|--------|
| `DONE` | Proceed to next phase |
| `DONE_WITH_CONCERNS` | Log concerns, proceed |
| `NEEDS_CONTEXT` | Pause, request info from user |
| `BLOCKED` | Abort phase, report blocker |
If no status token is found, default to `DONE`.
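Parsing the token could look like the sketch below. It assumes the token sits on its own line, that the last `STATUS:` line wins if several appear, and that an unrecognized token is treated the same as a missing one; none of these details are specified above.

```python
import re

KNOWN_TOKENS = {"DONE", "DONE_WITH_CONCERNS", "NEEDS_CONTEXT", "BLOCKED"}
STATUS_RE = re.compile(r"^[ \t]*STATUS:[ \t]*([A-Z_]+)[ \t]*$", re.MULTILINE)


def parse_status(output: str) -> str:
    """Extract the agent's status token, defaulting to DONE."""
    matches = STATUS_RE.findall(output)
    if matches and matches[-1] in KNOWN_TOKENS:
        return matches[-1]
    # No token (or an unknown one, by assumption): default per the protocol.
    return "DONE"
```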
---
## 1. Plan Phase
### 1a. Explorer (standard/thorough only)
```
Agent(
description: "Explorer: research context for <task>",
prompt: "You are the EXPLORER archetype.
<task description>
Research: 1) affected files/functions, 2) dependencies, 3) test coverage,
4) codebase patterns. Write a structured research report.
Be thorough but focused — no rabbit holes.",
subagent_type: "Explore"
)
```
Save output to `.archeflow/artifacts/<run_id>/plan-explorer.md`.
### 1b. Creator
Fast workflow (no Explorer): Creator must perform Mini-Reflect first:
1. Restate the task in one sentence
2. List 3 assumptions
3. Name the highest-damage risk
```
Agent(
description: "Creator: design proposal for <task>",
prompt: "You are the CREATOR archetype.
<task description>
<if fast: Perform Mini-Reflect first (restate task, 3 assumptions, top risk)>
<if standard/thorough: Research findings: <plan-explorer.md contents>>
<if cycle 2+: Prior feedback: <Creator-routed section of act-feedback.md>>
Design a proposal:
1. Architecture decisions (with rationale)
2. Files to create/modify (exact paths, specific changes, 2-5 min per item)
3. Alternatives considered (2+, with rejection rationale)
4. Test strategy (specific test cases)
5. Confidence table (task understanding, solution completeness, risk coverage — each 0.0-1.0)
6. Risks and mitigations
<if cycle 2+: 7. How each prior issue was addressed (Fixed/Deferred/Accepted/Disputed)>
Be decisive — ship a clear plan, not a menu.",
subagent_type: "Plan"
)
```
Save output to `.archeflow/artifacts/<run_id>/plan-creator.md`.
### 1c. Confidence Gate (Rule A3)
Parse the `### Confidence` table from `plan-creator.md`. If unparseable, default to 0.0 (triggers gate).
| Axis | Response if score < 0.5 | Action |
|------|-------------|--------|
| Task understanding | Pause | Ask user for clarification. Do not spawn Maker. |
| Solution completeness | Upgrade | If fast -> standard. Spawn Explorer, re-run Creator. |
| Risk coverage | Mini-Explorer | Spawn focused Explorer for risky areas (5 min max, parallel with Do prep). |
Mini-Explorer prompt: "You are the EXPLORER. Risk coverage scored <score>. Identified risks: <risks>. Research ONLY the risky areas. Is the risk real? What mitigations exist?"
Save to `plan-mini-explorer.md`.
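The gate can be sketched as a small parser plus a threshold check. The row format (`| Task understanding | 0.8 |`) is an assumption about how Creator writes the confidence table, and the action names are hypothetical labels for the three responses in the table above.

```python
import re

# Axis name -> gate response when the score is below the threshold.
AXIS_ACTIONS = {
    "task understanding": "pause",
    "solution completeness": "upgrade",
    "risk coverage": "mini_explorer",
}
# Assumed row shape: | <axis name> | <score between 0.0 and 1.0> |
ROW_RE = re.compile(
    r"^\|[ \t]*([^|\n]+?)[ \t]*\|[ \t]*([01](?:\.\d+)?)[ \t]*\|", re.MULTILINE
)


def confidence_gate(proposal_md, threshold=0.5):
    """Return the gate actions triggered by the Creator's confidence table."""
    # Any axis that cannot be parsed keeps 0.0, which triggers its gate.
    scores = {axis: 0.0 for axis in AXIS_ACTIONS}
    for name, value in ROW_RE.findall(proposal_md):
        key = name.strip().lower()
        if key in scores:
            scores[key] = float(value)
    return [action for axis, action in AXIS_ACTIONS.items() if scores[axis] < threshold]
```

An empty list means all three axes cleared the gate and Maker can be spawned.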
---
## 2. Do Phase
### 2a. Maker
```
Agent(
description: "Maker: implement <task>",
prompt: "You are the MAKER archetype.
Implement this proposal: <plan-creator.md contents>
<if cycle 2+: Implementation feedback: <Maker-routed findings from act-feedback.md>>
Rules:
1. Follow the proposal exactly — don't redesign
2. Write tests for every behavioral change
3. Commit with descriptive messages (CRITICAL: uncommitted worktree changes are LOST)
4. Run existing tests — nothing may break
5. If unclear, implement best interpretation and note it
Self-review before finishing:
- All proposal files changed? Tests added? No out-of-scope files? Existing tests pass?",
isolation: "worktree",
mode: "bypassPermissions"
)
```
Save summary to `do-maker.md`. Save changed file list to `do-maker-files.txt`.
### 2b. Test-First Gate
Check `do-maker-files.txt` for test files. If none found and domain is not `writing`:
- If Creator specified a test strategy: re-run Maker with targeted test instruction (1 retry within Do, not a full cycle)
- If no test strategy: emit WARNING, proceed
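A sketch of the gate decision follows. What counts as a "test file" is not defined here, so the path heuristics below (a `test/` or `tests/` directory, a `test_` prefix, or a `.test.`/`_test.`/`.spec.` suffix) are assumptions, as are the returned action names.

```python
import re

# Assumed heuristics for recognizing test files by path.
TEST_HINTS = re.compile(
    r"(^|/)tests?/"            # inside a test/ or tests/ directory
    r"|(^|/)test_[^/]*$"       # file name starts with test_
    r"|[._]test\.[^/.]+$"      # foo.test.ts, foo_test.go
    r"|[._]spec\.[^/.]+$"      # foo.spec.js
)


def check_test_gate(changed_files, domain, has_test_strategy):
    """Decide what to do after Maker, based on do-maker-files.txt contents."""
    if domain == "writing":
        return "proceed"
    if any(TEST_HINTS.search(path) for path in changed_files):
        return "proceed"
    # No tests in the changeset: one targeted Maker retry, or just a warning.
    return "retry_maker" if has_test_strategy else "warn"
```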
---
## 3. Check Phase
Spawn Guardian FIRST. Evaluate Rule A2 before spawning other reviewers.
### 3a. Guardian (always first)
```
Agent(
description: "Guardian: security review for <task>",
prompt: "You are the GUARDIAN archetype.
Review changes in branch: <maker's branch>
<git diff from Maker's branch>
<risks section from plan-creator.md>
Assess: security, reliability, breaking changes, dependencies.
Output: APPROVED or REJECTED with findings table.
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
Be rigorous but practical — flag real risks, not theoretical ones."
)
```
Save to `check-guardian.md`.
### 3b. Guardian Fast-Path (Rule A2)
If Guardian found **0 CRITICAL and 0 WARNING** AND workflow is not escalated AND not first cycle of thorough:
- Skip remaining reviewers, proceed directly to Act phase
- Log "Guardian fast-path taken"
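Rule A2 reduces to a single boolean condition. The sketch below assumes the orchestrator tracks the Guardian tallies, the current cycle number (starting at 1), and an `escalated` flag; the function and parameter names are hypothetical.

```python
def guardian_fast_path(critical, warning, workflow, cycle, escalated):
    """Return True when the remaining reviewers can be skipped (Rule A2)."""
    clean = critical == 0 and warning == 0
    first_thorough_cycle = workflow == "thorough" and cycle == 1
    return clean and not escalated and not first_thorough_cycle
```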
### 3c. Remaining Reviewers (spawn in parallel)
**Skeptic** (standard/thorough) — receives Creator's proposal, focus on assumptions.
Save to `check-skeptic.md`.
**Sage** (standard/thorough) — receives proposal + diff + implementation summary.
Save to `check-sage.md`.
**Trickster** (thorough only) — receives Maker's diff only. "Think like a QA engineer paid per bug."
Save to `check-trickster.md`.
### 3d. Evidence Validation
After all reviewers complete, scan CRITICAL/WARNING findings. Downgrade to INFO if:
- **Banned phrases** without evidence: "might be", "could potentially", "appears to", "seems like", "may not"
- **No evidence**: no command output, code citation, line reference, or reproduction steps
Track downgrades in events (do NOT modify artifact files). Act phase excludes downgraded findings from CRITICAL tallies.
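One way to apply these rules without touching the artifact files is to compute an effective severity next to each finding. The finding shape (`severity`, `description`, and an optional `evidence` field) is hypothetical, and detecting evidence is reduced here to "the field is non-empty".

```python
BANNED_PHRASES = ("might be", "could potentially", "appears to", "seems like", "may not")


def effective_severity(finding):
    """Return (severity, downgrade_reason) for one reviewer finding.

    The input dict is left unmodified; a downgrade is only reported,
    so it can be recorded as an event instead of an artifact edit.
    """
    severity = finding["severity"]
    if severity not in ("CRITICAL", "WARNING") or finding.get("evidence"):
        return severity, None
    text = finding["description"].lower()
    hedged = any(phrase in text for phrase in BANNED_PHRASES)
    return "INFO", "banned_phrase" if hedged else "no_evidence"
```

Read literally, any CRITICAL or WARNING finding without evidence is downgraded, so the banned-phrase check mainly determines which reason is logged.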
---
## 4. Act Phase
### 4a. Collect Verdicts
Read all `check-*.md` artifacts. Tally CRITICAL/WARNING/INFO per reviewer.
### 4b. Escalation Check (Rule A1)
If `fast` workflow and Guardian found 2+ CRITICAL: upgrade next cycle to `standard` (add Skeptic + Sage). Once escalated, stays escalated. A2 does not apply to escalated workflows.
### 4c. Convergence Check (cycle 2+ only)
Compare current findings against previous cycle. Classify each as NEW / RESOLVED / PERSISTENT / REGRESSED.
```
convergence_score = resolved / (resolved + new + regressed)
```
| Score | Status | Action |
|-------|--------|--------|
| > 0.8 | Converging | Continue if cycles remain |
| 0.5-0.8 | Stalling | Continue with caution |
| < 0.5 | Diverging | STOP if diverging for 2 consecutive cycles |
| 0.0 (all persistent) | Stuck | STOP |
**Oscillation**: Finding present in cycle N-2, absent in N-1, back in N. Two or more oscillating findings = STOP and escalate to user.
Convergence STOP overrides normal cycle-back even if cycles remain.
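The score and the oscillation rule can be sketched as below. Two assumptions are made: when every finding is persistent the formula is 0/0, which the sketch maps to the table's `0.0 (all persistent)` row, and a score of exactly 0.8 is classed as stalling because the converging row is a strict `>`. Finding identifiers are hypothetical strings.

```python
def convergence(resolved, new, regressed):
    """Return (score, status) for a cycle compared with the previous one."""
    changed = resolved + new + regressed
    if changed == 0:
        # Nothing resolved, new, or regressed: all findings are persistent.
        return 0.0, "stuck"
    score = resolved / changed
    if score > 0.8:
        return score, "converging"
    if score >= 0.5:
        return score, "stalling"
    return score, "diverging"


def oscillating(cycle_n_minus_2, cycle_n_minus_1, cycle_n):
    """Findings present in N-2, absent in N-1, and back in N."""
    return (set(cycle_n_minus_2) & set(cycle_n)) - set(cycle_n_minus_1)
```

Two or more identifiers returned by `oscillating` would correspond to the STOP-and-escalate condition.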
### 4d. All Approved
1. Run pre-merge hooks from `.archeflow/hooks.yaml` if defined
2. Merge: `./lib/archeflow-git.sh merge "$RUN_ID" --no-ff`
3. Post-merge test validation: `./lib/archeflow-rollback.sh "$RUN_ID"` (auto-reverts if tests fail)
4. If rollback triggered and cycles remain: cycle back with "integration test failure" feedback
5. Clean up worktree: `./lib/archeflow-git.sh cleanup "$RUN_ID"`
6. Proceed to Completion
### 4e. Issues Found (cycles remaining)
Build `act-feedback.md` using the feedback routing table below. Archive current cycle artifacts to `cycle-<N>/`. Increment cycle, go back to Plan.
### 4f. Max Cycles Reached
Report unresolved findings. Present best implementation on its branch (not merged). Let user decide.
---
## Feedback Routing Table
Route each finding to the right agent for the next cycle:
| Source | Category | Routes To | Rationale |
|--------|----------|-----------|-----------|
| Guardian | security, breaking-change | **Creator** | Design must change |
| Guardian | reliability, dependency | **Creator** | Architectural decision needed |
| Skeptic | design, scalability | **Creator** | Assumptions need revision |
| Sage | quality, consistency | **Maker** | Implementation refinement |
| Sage | testing | **Maker** | Test gap, not design flaw |
| Trickster | reliability (design flaw) | **Creator** | Needs redesign |
| Trickster | reliability (test gap) | **Maker** | Needs more tests |
| Trickster | testing | **Maker** | Edge case not covered |
**Disambiguation**: If fix requires changing the approach -> Creator. If fix requires changing the code within the existing approach -> Maker.
### Feedback File Format
`act-feedback.md` splits into `## Creator-Routed Issues` and `## Maker-Routed Issues`. Inject only the relevant section into each agent's prompt.
**Same finding in 2 consecutive cycles**: escalate to user. Do not cycle again blindly.
**Cross-archetype dedup**: If two reviewers raise the same issue (same file + category), merge into one finding. Route to higher-priority destination (Creator over Maker).
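The routing table and the dedup rule could be combined as in the sketch below. The finding shape (`source`, `category`, `file`) and the `changes_approach` flag are hypothetical; the flag stands in for the disambiguation rule and is used for Trickster reliability findings and for any pair the table does not list.

```python
# (source, category) -> destination, copied from the routing table.
ROUTES = {
    ("guardian", "security"): "creator",
    ("guardian", "breaking-change"): "creator",
    ("guardian", "reliability"): "creator",
    ("guardian", "dependency"): "creator",
    ("skeptic", "design"): "creator",
    ("skeptic", "scalability"): "creator",
    ("sage", "quality"): "maker",
    ("sage", "consistency"): "maker",
    ("sage", "testing"): "maker",
    ("trickster", "testing"): "maker",
}


def route(finding):
    """Return 'creator' or 'maker' for one finding."""
    dest = ROUTES.get((finding["source"], finding["category"]))
    if dest is None:
        # Disambiguation: a change of approach goes to Creator, else Maker.
        dest = "creator" if finding.get("changes_approach") else "maker"
    return dest


def dedup_and_route(findings):
    """Merge findings on (file, category); Creator outranks Maker."""
    merged = {}
    for finding in findings:
        key = (finding["file"], finding["category"])
        dest = route(finding)
        if key not in merged or dest == "creator":
            merged[key] = dest
    return merged
```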
---
## 5. Completion
1. Emit `run.complete` event
2. Check for regressions: `./lib/archeflow-memory.sh regression-check`
3. Generate report: `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl`
4. Score effectiveness: `./lib/archeflow-score.sh extract .archeflow/events/<run_id>.jsonl`
5. Append to run index: `.archeflow/events/index.jsonl`
---
## Progress Display
```
━━━ ArcheFlow Run: <task> ━━━━━━━━━━━━━━━━━━━
Run ID: <run_id> | Workflow: <standard> | Cycle: 1/<max>
[Plan] Explorer researching... -> done (35s)
[Plan] Creator designing proposal... -> done (25s, confidence: 0.8)
[Do] Maker implementing... -> done (90s, 4 files, 8 tests)
[Check] Guardian reviewing... -> APPROVED
[Check] Skeptic challenging... -> APPROVED (1 INFO)
[Check] Sage reviewing... -> APPROVED
[Act] All approved — merging... -> merged to main
━━━ Complete: 3m 10s, 1 cycle ━━━━━━━━━━━━━━━
Artifacts: .archeflow/artifacts/<run_id>/
Report: .archeflow/events/<run_id>.jsonl
```
---
## Shadow Monitoring
Quick-check after each agent completes:
| Archetype | Shadow | Trigger |
|-----------|--------|---------|
| Explorer | Rabbit Hole | Output >2000 words without Recommendation section |
| Creator | Over-Architect | >2 new abstractions for one feature |
| Maker | Rogue | No tests in changeset, or files outside proposal |
| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1, or zero approvals |
| Skeptic | Paralytic | >7 challenges, <50% have alternatives |
| Trickster | False Alarm | Findings in untouched code, or >10 findings |
| Sage | Bureaucrat | Review >2x code change length |
On detection: apply correction prompt. On 2nd detection of same shadow: replace agent. On 3+ shadows in one cycle: escalate to user.
---
## Event Reference
Emit events via `./lib/archeflow-event.sh <run_id> <type> <phase> <agent> '<json>'`. Events are optional — never let logging block orchestration.
| When | Event Type | Key Data |
|------|-----------|----------|
| Run starts | `run.start` | task, workflow, max_cycles |
| Before agent spawn | `agent.start` | archetype, model, prompt_summary |
| After agent returns | `agent.complete` | archetype, duration_ms, artifacts, summary |
| Phase boundary | `phase.transition` | from, to, artifacts_so_far |
| Alternative chosen | `decision` | what, chosen, alternatives, rationale |
| Orchestrator decision (replay) | `decision.point` | archetype, input, decision, confidence — use `./lib/archeflow-decision.sh` |
| Reviewer verdict | `review.verdict` | archetype, verdict, findings[] |
| Fix addressing review | `fix.applied` | source, finding, file, line |
| End of PDCA cycle | `cycle.boundary` | cycle, max_cycles, exit_condition, convergence |
| Shadow triggered | `shadow.detected` | archetype, shadow, trigger, action |
| Policy halt | `wiggum.break` | trigger, run_state, unresolved_findings, hard/soft |
| Run ends | `run.complete` | status, cycles, agents_total, fixes_total |
Parent rules: `run.start` has `parent: []`. Agents parent to the event that triggered them. Phase transitions fan-in from all completing events. Parallel agents share the same parent.
---
## Artifact Naming
All artifacts live in `.archeflow/artifacts/<run_id>/`:
| File | Content |
|------|---------|
| `plan-explorer.md` | Explorer research (absent in the `fast` workflow) |
| `plan-creator.md` | Creator proposal with confidence |
| `plan-mini-explorer.md` | Risk research (only if A3 triggers) |
| `do-maker.md` | Maker implementation summary |
| `do-maker-files.txt` | Changed file paths (one per line) |
| `check-guardian.md` | Guardian verdict + findings |
| `check-skeptic.md` | Skeptic verdict (if spawned) |
| `check-sage.md` | Sage verdict (if spawned) |
| `check-trickster.md` | Trickster verdict (if spawned) |
| `act-feedback.md` | Structured feedback for next cycle |
| `act-fixes.jsonl` | Applied fixes log |
| `cycle-<N>/` | Archived artifacts from cycle N |
Always check artifact existence before injecting. Missing optional artifacts are expected — skip, don't fail.
Git diff is generated on-the-fly (`git diff main...<maker-branch>`), not saved to disk.
---
## Effectiveness Scoring
After each run, `./lib/archeflow-score.sh extract` scores review archetypes on:
| Dimension | Weight |
|-----------|--------|
| Signal-to-noise (useful / total findings) | 0.30 |
| Fix rate (findings that led to fixes) | 0.25 |
| Cost efficiency (useful findings per dollar) | 0.20 |
| Accuracy (not contradicted by others) | 0.15 |
| Cycle impact (led to cycle exit) | 0.10 |
Scores stored in `.archeflow/memory/effectiveness.jsonl`. After 10+ runs, recommend model tier changes and archetype removal.
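The weights sum to 1.0, which suggests a weighted sum over normalized dimensions. The sketch below assumes exactly that, with each dimension already scaled to the range 0.0 to 1.0; how `archeflow-score.sh` actually combines or normalizes them is not documented here, and the dimension keys are hypothetical.

```python
# Weights from the table above; the keys are illustrative names.
WEIGHTS = {
    "signal_to_noise": 0.30,
    "fix_rate": 0.25,
    "cost_efficiency": 0.20,
    "accuracy": 0.15,
    "cycle_impact": 0.10,
}


def effectiveness(dimensions):
    """Weighted sum of per-dimension scores, each assumed to be in [0, 1]."""
    return sum(weight * dimensions[name] for name, weight in WEIGHTS.items())
```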
---
## Run Replay (Decision Log + What-If)
After key choices (routing, fast-path skip, escalation), emit `decision.point` via `./lib/archeflow-decision.sh` so runs can be inspected with `./lib/archeflow-replay.sh timeline|whatif|compare <run_id>`. Weighted what-if helps estimate how much each review archetype swayed the effective ship/block outcome. See skill `af-replay`.
---
## Dry-Run Mode
With `--dry-run`, run the Plan phase only, then display the workflow, agent counts, confidence scores, and cost estimate, and ask the user whether to proceed. If yes, continue with `--start-from do`.
## Start-From Mode
| Start from | Required artifacts |
|------------|--------------------|
| `plan` | None |
| `do` | `plan-creator.md` |
| `check` | `plan-creator.md`, `do-maker.md`, `do-maker-files.txt` |
| `act` | All `check-*.md` files |
Validate required artifacts exist. Error if missing.
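The validation could be sketched as a lookup plus existence checks. Because the set of `check-*.md` files depends on which reviewers ran, the sketch assumes that at least one such file is enough for `act`; the function name and return shape are hypothetical.

```python
from pathlib import Path

# Required artifacts per start phase, from the table above.
REQUIRED = {
    "plan": [],
    "do": ["plan-creator.md"],
    "check": ["plan-creator.md", "do-maker.md", "do-maker-files.txt"],
}


def missing_artifacts(run_dir, start_from):
    """Return the artifacts that block starting from the given phase."""
    run_dir = Path(run_dir)
    if start_from == "act":
        # Assumption: any check-*.md file satisfies the requirement.
        return [] if any(run_dir.glob("check-*.md")) else ["check-*.md"]
    return [name for name in REQUIRED[start_from] if not (run_dir / name).is_file()]
```

A non-empty result corresponds to the "error if missing" case, and the list can be shown to the user as is.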
---
## Error Handling
- **Agent timeout (5 min)**: Emit error event, abort run. Do not retry blindly.
- **Event emitter fails**: Log warning, continue. Events are observation, not control flow.
- **Artifact write fails**: Blocking — abort and report. Artifacts are required for phase handoff.
- **Merge conflict**: Do not force-resolve. Report conflict, leave branch intact, let user decide.
---
## Pipeline Strategy
Linear flow with no cycle-back. Use for bug fixes and well-understood single-concern tasks.
```
Plan (Creator only) -> Implement (Maker) -> Spec-Review (Guardian, then Skeptic if findings) -> Quality-Review (Sage) -> Verify (tests + merge)
```
1. **Plan**: Spawn Creator with Mini-Reflect (no Explorer). Save `plan-creator.md`.
2. **Implement**: Spawn Maker in worktree. Save `do-maker.md`.
3. **Spec-Review**: Guardian first. Skeptic only if Guardian has findings.
4. **Quality-Review**: Sage reviews proposal + diff + summary.
5. **Verify**: Run tests. If pass and 0 CRITICAL: merge. If CRITICAL: one targeted Maker fix, re-review, re-test. If still failing: abort, report branch name for manual resolution.
WARNINGs are logged but do not block merge.
```
━━━ ArcheFlow Pipeline: <task> ━━━━━━━━━━━━━━━━
Run ID: <run_id> | Strategy: pipeline
[Plan] Creator designing... -> done (20s)
[Implement] Maker building... -> done (60s, 3 files)
[Spec] Guardian reviewing... -> APPROVED
[Quality] Sage reviewing... -> APPROVED (1 WARNING)
[Verify] Tests passing... -> merged to main
━━━ Complete: 2m 15s ━━━━━━━━━━━━━━━━━━━━━━━━━━
```