Replaces generic "circuit breaker" with "Wiggum Break" — policy enforcement halt condition named after Chief Wiggum (policy + Ralph Loop's dad). Hard breaks (immediate halt) and soft breaks (finish then halt) with wiggum.break event type. Updated both papers and shadow-detection skill.
468 lines
18 KiB
Markdown
468 lines
18 KiB
Markdown
---
|
|
name: run
|
|
description: |
|
|
Start an ArcheFlow PDCA run. Usage: /af-run <task description> [--workflow fast|standard|thorough] [--dry-run] [--start-from plan|do|check|act]
|
|
---
|
|
|
|
# ArcheFlow Run — PDCA Orchestration
|
|
|
|
One command runs the full cycle: Plan (Explorer+Creator) -> Do (Maker in worktree) -> Check (Guardian first, then others) -> Act (collect findings, route fixes, exit or cycle).
|
|
|
|
## 0. Initialize
|
|
|
|
1. Generate run ID: `<YYYY-MM-DD>-<task-slug>`
|
|
2. Create artifact directory: `mkdir -p .archeflow/artifacts/<run_id>`
|
|
3. Verify `./lib/archeflow-*.sh` scripts exist before proceeding
|
|
4. Inject cross-run memory: `./lib/archeflow-memory.sh inject "$DOMAIN" "" --audit "$RUN_ID"`
|
|
5. Read `.archeflow/config.yaml` models section. Resolution order: per-workflow per-archetype > per-workflow default > per-archetype > global default.
|
|
6. Emit `run.start` event
|
|
|
|
### Strategy Selection
|
|
|
|
Determine strategy from CLI flag `--strategy`, config `strategy:` field, or auto-detect:
|
|
|
|
| Signal | Strategy |
|
|
|--------|----------|
|
|
| Task contains fix/bug/patch/hotfix | `pipeline` |
|
|
| Task contains refactor/redesign/review | `pdca` |
|
|
| Workflow is `fast` with single file | `pipeline` |
|
|
| Workflow is `thorough` | `pdca` |
|
|
| Default | `pdca` |
|
|
|
|
If `pipeline`, skip to the Pipeline section at the end. Otherwise continue with PDCA below.
|
|
|
|
### Workflow Selection
|
|
|
|
| Signal | Workflow | Max Cycles |
|
|
|--------|----------|------------|
|
|
| Small fix, low risk, single concern | `fast` | 1 |
|
|
| Feature, multiple files, moderate risk | `standard` | 2 |
|
|
| Security-sensitive, breaking changes, public API | `thorough` | 3 |
|
|
|
|
---
|
|
|
|
## Attention Filters
|
|
|
|
Each agent receives only what it needs. This is the canonical reference:
|
|
|
|
| Archetype | Receives | Excludes |
|
|
|-----------|----------|----------|
|
|
| Explorer | Task description, codebase access | Prior proposals, reviews, diffs |
|
|
| Creator | Task + Explorer output (+ feedback cycle 2+) | Raw files, diffs, reviewer outputs |
|
|
| Maker | Creator's proposal (+ Maker-routed feedback cycle 2+) | Explorer research, reviewer outputs |
|
|
| Guardian | Maker's diff + proposal risk section | Full proposal, Explorer research |
|
|
| Skeptic | Creator's proposal (assumptions focus) | Diff details, Explorer research |
|
|
| Sage | Proposal + diff + implementation summary | Explorer research, other reviews |
|
|
| Trickster | Maker's diff only | Everything else |
|
|
|
|
---
|
|
|
|
## Status Token Protocol
|
|
|
|
Every agent ends output with `STATUS: <token>`. Parse it to decide the next action.
|
|
|
|
| Status | Action |
|
|
|--------|--------|
|
|
| `DONE` | Proceed to next phase |
|
|
| `DONE_WITH_CONCERNS` | Log concerns, proceed |
|
|
| `NEEDS_CONTEXT` | Pause, request info from user |
|
|
| `BLOCKED` | Abort phase, report blocker |
|
|
|
|
If no status token found, default to `DONE`.
|
|
|
|
---
|
|
|
|
## 1. Plan Phase
|
|
|
|
### 1a. Explorer (standard/thorough only)
|
|
|
|
```
|
|
Agent(
|
|
description: "Explorer: research context for <task>",
|
|
prompt: "You are the EXPLORER archetype.
|
|
<task description>
|
|
Research: 1) affected files/functions, 2) dependencies, 3) test coverage,
|
|
4) codebase patterns. Write a structured research report.
|
|
Be thorough but focused — no rabbit holes.",
|
|
subagent_type: "Explore"
|
|
)
|
|
```
|
|
|
|
Save output to `.archeflow/artifacts/<run_id>/plan-explorer.md`.
|
|
|
|
### 1b. Creator
|
|
|
|
Fast workflow (no Explorer): Creator must perform Mini-Reflect first:
|
|
1. Restate the task in one sentence
|
|
2. List 3 assumptions
|
|
3. Name the highest-damage risk
|
|
|
|
```
|
|
Agent(
|
|
description: "Creator: design proposal for <task>",
|
|
prompt: "You are the CREATOR archetype.
|
|
<task description>
|
|
<if fast: Perform Mini-Reflect first (restate task, 3 assumptions, top risk)>
|
|
<if standard/thorough: Research findings: <plan-explorer.md contents>>
|
|
<if cycle 2+: Prior feedback: <Creator-routed section of act-feedback.md>>
|
|
Design a proposal:
|
|
1. Architecture decisions (with rationale)
|
|
2. Files to create/modify (exact paths, specific changes, 2-5 min per item)
|
|
3. Alternatives considered (2+, with rejection rationale)
|
|
4. Test strategy (specific test cases)
|
|
5. Confidence table (task understanding, solution completeness, risk coverage — each 0.0-1.0)
|
|
6. Risks and mitigations
|
|
<if cycle 2+: 7. How each prior issue was addressed (Fixed/Deferred/Accepted/Disputed)>
|
|
Be decisive — ship a clear plan, not a menu.",
|
|
subagent_type: "Plan"
|
|
)
|
|
```
|
|
|
|
Save output to `.archeflow/artifacts/<run_id>/plan-creator.md`.
|
|
|
|
### 1c. Confidence Gate (Rule A3)
|
|
|
|
Parse the `### Confidence` table from `plan-creator.md`. If unparseable, default to 0.0 (triggers gate).
|
|
|
|
| Axis | Score < 0.5 | Action |
|
|
|------|-------------|--------|
|
|
| Task understanding | Pause | Ask user for clarification. Do not spawn Maker. |
|
|
| Solution completeness | Upgrade | If fast -> standard. Spawn Explorer, re-run Creator. |
|
|
| Risk coverage | Mini-Explorer | Spawn focused Explorer for risky areas (5 min max, parallel with Do prep). |
|
|
|
|
Mini-Explorer prompt: "You are the EXPLORER. Risk coverage scored <score>. Identified risks: <risks>. Research ONLY the risky areas. Is the risk real? What mitigations exist?"
|
|
Save to `plan-mini-explorer.md`.
|
|
|
|
---
|
|
|
|
## 2. Do Phase
|
|
|
|
### 2a. Maker
|
|
|
|
```
|
|
Agent(
|
|
description: "Maker: implement <task>",
|
|
prompt: "You are the MAKER archetype.
|
|
Implement this proposal: <plan-creator.md contents>
|
|
<if cycle 2+: Implementation feedback: <Maker-routed findings from act-feedback.md>>
|
|
Rules:
|
|
1. Follow the proposal exactly — don't redesign
|
|
2. Write tests for every behavioral change
|
|
3. Commit with descriptive messages (CRITICAL: uncommitted worktree changes are LOST)
|
|
4. Run existing tests — nothing may break
|
|
5. If unclear, implement best interpretation and note it
|
|
Self-review before finishing:
|
|
- All proposal files changed? Tests added? No out-of-scope files? Existing tests pass?",
|
|
isolation: "worktree",
|
|
mode: "bypassPermissions"
|
|
)
|
|
```
|
|
|
|
Save summary to `do-maker.md`. Save changed file list to `do-maker-files.txt`.
|
|
|
|
### 2b. Test-First Gate
|
|
|
|
Check `do-maker-files.txt` for test files. If none found and domain is not `writing`:
|
|
- If Creator specified a test strategy: re-run Maker with targeted test instruction (1 retry within Do, not a full cycle)
|
|
- If no test strategy: emit WARNING, proceed
|
|
|
|
---
|
|
|
|
## 3. Check Phase
|
|
|
|
Spawn Guardian FIRST. Evaluate Rule A2 before spawning other reviewers.
|
|
|
|
### 3a. Guardian (always first)
|
|
|
|
```
|
|
Agent(
|
|
description: "Guardian: security review for <task>",
|
|
prompt: "You are the GUARDIAN archetype.
|
|
Review changes in branch: <maker's branch>
|
|
<git diff from Maker's branch>
|
|
<risks section from plan-creator.md>
|
|
Assess: security, reliability, breaking changes, dependencies.
|
|
Output: APPROVED or REJECTED with findings table.
|
|
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
|
|
Be rigorous but practical — flag real risks, not theoretical ones."
|
|
)
|
|
```
|
|
|
|
Save to `check-guardian.md`.
|
|
|
|
### 3b. Guardian Fast-Path (Rule A2)
|
|
|
|
If Guardian found **0 CRITICAL and 0 WARNING** AND workflow is not escalated AND not first cycle of thorough:
|
|
- Skip remaining reviewers, proceed directly to Act phase
|
|
- Log "Guardian fast-path taken"
|
|
|
|
### 3c. Remaining Reviewers (spawn in parallel)
|
|
|
|
**Skeptic** (standard/thorough) — receives Creator's proposal, focus on assumptions.
|
|
Save to `check-skeptic.md`.
|
|
|
|
**Sage** (standard/thorough) — receives proposal + diff + implementation summary.
|
|
Save to `check-sage.md`.
|
|
|
|
**Trickster** (thorough only) — receives Maker's diff only. "Think like a QA engineer paid per bug."
|
|
Save to `check-trickster.md`.
|
|
|
|
### 3d. Evidence Validation
|
|
|
|
After all reviewers complete, scan CRITICAL/WARNING findings. Downgrade to INFO if:
|
|
- **Banned phrases** without evidence: "might be", "could potentially", "appears to", "seems like", "may not"
|
|
- **No evidence**: no command output, code citation, line reference, or reproduction steps
|
|
|
|
Track downgrades in events (do NOT modify artifact files). Act phase excludes downgraded findings from CRITICAL tallies.
|
|
|
|
---
|
|
|
|
## 4. Act Phase
|
|
|
|
### 4a. Collect Verdicts
|
|
|
|
Read all `check-*.md` artifacts. Tally CRITICAL/WARNING/INFO per reviewer.
|
|
|
|
### 4b. Escalation Check (Rule A1)
|
|
|
|
If `fast` workflow and Guardian found 2+ CRITICAL: upgrade next cycle to `standard` (add Skeptic + Sage). Once escalated, stays escalated. A2 does not apply to escalated workflows.
|
|
|
|
### 4c. Convergence Check (cycle 2+ only)
|
|
|
|
Compare current findings against previous cycle. Classify each as NEW / RESOLVED / PERSISTENT / REGRESSED.
|
|
|
|
```
|
|
convergence_score = resolved / (resolved + new + regressed)
|
|
```
|
|
|
|
| Score | Status | Action |
|
|
|-------|--------|--------|
|
|
| > 0.8 | Converging | Continue if cycles remain |
|
|
| 0.5-0.8 | Stalling | Continue with caution |
|
|
| < 0.5 | Diverging | STOP if 2 consecutive cycles |
|
|
| 0.0 (all persistent) | Stuck | STOP |
|
|
|
|
**Oscillation**: Finding present in cycle N-2, absent in N-1, back in N. Two or more oscillating findings = STOP and escalate to user.
|
|
|
|
Convergence STOP overrides normal cycle-back even if cycles remain.
|
|
|
|
### 4d. All Approved
|
|
|
|
1. Run pre-merge hooks from `.archeflow/hooks.yaml` if defined
|
|
2. Merge: `./lib/archeflow-git.sh merge "$RUN_ID" --no-ff`
|
|
3. Post-merge test validation: `./lib/archeflow-rollback.sh "$RUN_ID"` (auto-reverts if tests fail)
|
|
4. If rollback triggered and cycles remain: cycle back with "integration test failure" feedback
|
|
5. Clean up worktree: `./lib/archeflow-git.sh cleanup "$RUN_ID"`
|
|
6. Proceed to Completion
|
|
|
|
### 4e. Issues Found (cycles remaining)
|
|
|
|
Build `act-feedback.md` using the feedback routing table below. Archive current cycle artifacts to `cycle-<N>/`. Increment cycle, go back to Plan.
|
|
|
|
### 4f. Max Cycles Reached
|
|
|
|
Report unresolved findings. Present best implementation on its branch (not merged). Let user decide.
|
|
|
|
---
|
|
|
|
## Feedback Routing Table
|
|
|
|
Route each finding to the right agent for the next cycle:
|
|
|
|
| Source | Category | Routes To | Rationale |
|
|
|--------|----------|-----------|-----------|
|
|
| Guardian | security, breaking-change | **Creator** | Design must change |
|
|
| Guardian | reliability, dependency | **Creator** | Architectural decision needed |
|
|
| Skeptic | design, scalability | **Creator** | Assumptions need revision |
|
|
| Sage | quality, consistency | **Maker** | Implementation refinement |
|
|
| Sage | testing | **Maker** | Test gap, not design flaw |
|
|
| Trickster | reliability (design flaw) | **Creator** | Needs redesign |
|
|
| Trickster | reliability (test gap) | **Maker** | Needs more tests |
|
|
| Trickster | testing | **Maker** | Edge case not covered |
|
|
|
|
**Disambiguation**: If fix requires changing the approach -> Creator. If fix requires changing the code within the existing approach -> Maker.
|
|
|
|
### Feedback File Format
|
|
|
|
`act-feedback.md` splits into `## Creator-Routed Issues` and `## Maker-Routed Issues`. Inject only the relevant section into each agent's prompt.
|
|
|
|
**Same finding in 2 consecutive cycles**: escalate to user. Do not cycle again blindly.
|
|
|
|
**Cross-archetype dedup**: If two reviewers raise the same issue (same file + category), merge into one finding. Route to higher-priority destination (Creator over Maker).
|
|
|
|
---
|
|
|
|
## 5. Completion
|
|
|
|
1. Emit `run.complete` event
|
|
2. Check for regressions: `./lib/archeflow-memory.sh regression-check`
|
|
3. Generate report: `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl`
|
|
4. Score effectiveness: `./lib/archeflow-score.sh extract .archeflow/events/<run_id>.jsonl`
|
|
5. Append to run index: `.archeflow/events/index.jsonl`
|
|
|
|
---
|
|
|
|
## Progress Display
|
|
|
|
```
|
|
━━━ ArcheFlow Run: <task> ━━━━━━━━━━━━━━━━━━━
|
|
Run ID: <run_id> | Workflow: <standard> | Cycle: 1/<max>
|
|
|
|
[Plan] Explorer researching... -> done (35s)
|
|
[Plan] Creator designing proposal... -> done (25s, confidence: 0.8)
|
|
[Do] Maker implementing... -> done (90s, 4 files, 8 tests)
|
|
[Check] Guardian reviewing... -> APPROVED
|
|
[Check] Skeptic challenging... -> APPROVED (1 INFO)
|
|
[Check] Sage reviewing... -> APPROVED
|
|
[Act] All approved — merging... -> merged to main
|
|
|
|
━━━ Complete: 3m 10s, 1 cycle ━━━━━━━━━━━━━━━
|
|
Artifacts: .archeflow/artifacts/<run_id>/
|
|
Report: .archeflow/events/<run_id>.jsonl
|
|
```
|
|
|
|
---
|
|
|
|
## Shadow Monitoring
|
|
|
|
Quick-check after each agent completes:
|
|
|
|
| Archetype | Shadow | Trigger |
|
|
|-----------|--------|---------|
|
|
| Explorer | Rabbit Hole | Output >2000 words without Recommendation section |
|
|
| Creator | Over-Architect | >2 new abstractions for one feature |
|
|
| Maker | Rogue | No tests in changeset, or files outside proposal |
|
|
| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1, or zero approvals |
|
|
| Skeptic | Paralytic | >7 challenges, <50% have alternatives |
|
|
| Trickster | False Alarm | Findings in untouched code, or >10 findings |
|
|
| Sage | Bureaucrat | Review >2x code change length |
|
|
|
|
On detection: apply correction prompt. On 2nd detection of same shadow: replace agent. On 3+ shadows in one cycle: escalate to user.
|
|
|
|
---
|
|
|
|
## Event Reference
|
|
|
|
Emit events via `./lib/archeflow-event.sh <run_id> <type> <phase> <agent> '<json>'`. Events are optional — never let logging block orchestration.
|
|
|
|
| When | Event Type | Key Data |
|
|
|------|-----------|----------|
|
|
| Run starts | `run.start` | task, workflow, max_cycles |
|
|
| Before agent spawn | `agent.start` | archetype, model, prompt_summary |
|
|
| After agent returns | `agent.complete` | archetype, duration_ms, artifacts, summary |
|
|
| Phase boundary | `phase.transition` | from, to, artifacts_so_far |
|
|
| Alternative chosen | `decision` | what, chosen, alternatives, rationale |
|
|
| Orchestrator decision (replay) | `decision.point` | archetype, input, decision, confidence — use `./lib/archeflow-decision.sh` |
|
|
| Reviewer verdict | `review.verdict` | archetype, verdict, findings[] |
|
|
| Fix addressing review | `fix.applied` | source, finding, file, line |
|
|
| End of PDCA cycle | `cycle.boundary` | cycle, max_cycles, exit_condition, convergence |
|
|
| Shadow triggered | `shadow.detected` | archetype, shadow, trigger, action |
|
|
| Policy halt | `wiggum.break` | trigger, run_state, unresolved_findings, hard/soft |
|
|
| Run ends | `run.complete` | status, cycles, agents_total, fixes_total |
|
|
|
|
Parent rules: `run.start` has `parent: []`. Agents parent to the event that triggered them. Phase transitions fan-in from all completing events. Parallel agents share the same parent.
|
|
|
|
---
|
|
|
|
## Artifact Naming
|
|
|
|
All artifacts live in `.archeflow/artifacts/<run_id>/`:
|
|
|
|
| File | Content |
|
|
|------|---------|
|
|
| `plan-explorer.md` | Explorer research (missing in fast) |
|
|
| `plan-creator.md` | Creator proposal with confidence |
|
|
| `plan-mini-explorer.md` | Risk research (only if A3 triggers) |
|
|
| `do-maker.md` | Maker implementation summary |
|
|
| `do-maker-files.txt` | Changed file paths (one per line) |
|
|
| `check-guardian.md` | Guardian verdict + findings |
|
|
| `check-skeptic.md` | Skeptic verdict (if spawned) |
|
|
| `check-sage.md` | Sage verdict (if spawned) |
|
|
| `check-trickster.md` | Trickster verdict (if spawned) |
|
|
| `act-feedback.md` | Structured feedback for next cycle |
|
|
| `act-fixes.jsonl` | Applied fixes log |
|
|
| `cycle-<N>/` | Archived artifacts from cycle N |
|
|
|
|
Always check artifact existence before injecting. Missing optional artifacts are expected — skip, don't fail.
|
|
|
|
Git diff is generated on-the-fly (`git diff main...<maker-branch>`), not saved to disk.
|
|
|
|
---
|
|
|
|
## Effectiveness Scoring
|
|
|
|
After each run, `./lib/archeflow-score.sh extract` scores review archetypes on:
|
|
|
|
| Dimension | Weight |
|
|
|-----------|--------|
|
|
| Signal-to-noise (useful / total findings) | 0.30 |
|
|
| Fix rate (findings that led to fixes) | 0.25 |
|
|
| Cost efficiency (useful findings per dollar) | 0.20 |
|
|
| Accuracy (not contradicted by others) | 0.15 |
|
|
| Cycle impact (led to cycle exit) | 0.10 |
|
|
|
|
Scores stored in `.archeflow/memory/effectiveness.jsonl`. After 10+ runs, recommend model tier changes and archetype removal.
|
|
|
|
---
|
|
|
|
## Run replay (decision log + what-if)
|
|
|
|
After key choices (routing, fast-path skip, escalation), emit `decision.point` via `./lib/archeflow-decision.sh` so runs can be inspected with `./lib/archeflow-replay.sh timeline|whatif|compare <run_id>`. Weighted what-if helps estimate how much each review archetype swayed the effective ship/block outcome. See skill `af-replay`.
|
|
|
|
---
|
|
|
|
## Dry-Run Mode
|
|
|
|
When `--dry-run`: Run Plan phase only. Display workflow, agent counts, confidence scores, cost estimate. Ask user to proceed. If yes, continue with `--start-from do`.
|
|
|
|
## Start-From Mode
|
|
|
|
| Start from | Required artifacts |
|
|
|------------|--------------------|
|
|
| `plan` | None |
|
|
| `do` | `plan-creator.md` |
|
|
| `check` | `plan-creator.md`, `do-maker.md`, `do-maker-files.txt` |
|
|
| `act` | All `check-*.md` files |
|
|
|
|
Validate required artifacts exist. Error if missing.
|
|
|
|
---
|
|
|
|
## Error Handling
|
|
|
|
- **Agent timeout (5 min)**: Emit error event, abort run. Do not retry blindly.
|
|
- **Event emitter fails**: Log warning, continue. Events are observation, not control flow.
|
|
- **Artifact write fails**: Blocking — abort and report. Artifacts are required for phase handoff.
|
|
- **Merge conflict**: Do not force-resolve. Report conflict, leave branch intact, let user decide.
|
|
|
|
---
|
|
|
|
## Pipeline Strategy
|
|
|
|
Linear flow with no cycle-back. Use for bug fixes and well-understood single-concern tasks.
|
|
|
|
```
|
|
Plan (Creator only) -> Implement (Maker) -> Spec-Review (Guardian, then Skeptic if findings) -> Quality-Review (Sage) -> Verify (tests + merge)
|
|
```
|
|
|
|
1. **Plan**: Spawn Creator with Mini-Reflect (no Explorer). Save `plan-creator.md`.
|
|
2. **Implement**: Spawn Maker in worktree. Save `do-maker.md`.
|
|
3. **Spec-Review**: Guardian first. Skeptic only if Guardian has findings.
|
|
4. **Quality-Review**: Sage reviews proposal + diff + summary.
|
|
5. **Verify**: Run tests. If pass and 0 CRITICAL: merge. If CRITICAL: one targeted Maker fix, re-review, re-test. If still failing: abort, report branch name for manual resolution.
|
|
|
|
WARNINGs are logged but do not block merge.
|
|
|
|
```
|
|
━━━ ArcheFlow Pipeline: <task> ━━━━━━━━━━━━━━━━
|
|
Run ID: <run_id> | Strategy: pipeline
|
|
|
|
[Plan] Creator designing... -> done (20s)
|
|
[Implement] Maker building... -> done (60s, 3 files)
|
|
[Spec] Guardian reviewing... -> APPROVED
|
|
[Quality] Sage reviewing... -> APPROVED (1 WARNING)
|
|
[Verify] Tests passing... -> merged to main
|
|
|
|
━━━ Complete: 2m 15s ━━━━━━━━━━━━━━━━━━━━━━━━━━
|
|
```
|