Christian Nennemann 1e96d87f49 feat: introduce Wiggum Break as named circuit breaker
Replaces generic "circuit breaker" with "Wiggum Break" — policy enforcement
halt condition named after Chief Wiggum (policy + Ralph Loop's dad).
Hard breaks (immediate halt) and soft breaks (finish then halt) with
wiggum.break event type. Updated both papers and shadow-detection skill.
2026-04-08 05:19:35 +02:00

name: run
description: Start an ArcheFlow PDCA run. Usage: /af-run <task description> [--workflow fast|standard|thorough] [--dry-run] [--start-from plan|do|check|act]

ArcheFlow Run — PDCA Orchestration

One command runs the full cycle: Plan (Explorer+Creator) -> Do (Maker in worktree) -> Check (Guardian first, then others) -> Act (collect findings, route fixes, exit or cycle).

0. Initialize

  1. Generate run ID: <YYYY-MM-DD>-<task-slug>
  2. Create artifact directory: mkdir -p .archeflow/artifacts/<run_id>
  3. Verify ./lib/archeflow-*.sh scripts exist before proceeding
  4. Inject cross-run memory: ./lib/archeflow-memory.sh inject "$DOMAIN" "" --audit "$RUN_ID"
  5. Read .archeflow/config.yaml models section. Resolution order: per-workflow per-archetype > per-workflow default > per-archetype > global default.
  6. Emit run.start event
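
The resolution order in step 5 can be sketched as a most-specific-first lookup. This is a minimal sketch: the layout of the `models` mapping (keys `workflows`, `archetypes`, `default`) and the model names are assumptions for illustration, not the actual schema of .archeflow/config.yaml.

```python
from typing import Optional


def resolve_model(models: dict, workflow: str, archetype: str) -> Optional[str]:
    """Return the first configured model, checking the most specific setting first."""
    wf = models.get("workflows", {}).get(workflow, {})
    candidates = [
        wf.get("archetypes", {}).get(archetype),      # per-workflow per-archetype
        wf.get("default"),                            # per-workflow default
        models.get("archetypes", {}).get(archetype),  # per-archetype
        models.get("default"),                        # global default
    ]
    return next((m for m in candidates if m), None)


# Hypothetical config contents, for illustration only.
models = {
    "default": "model-small",
    "archetypes": {"guardian": "model-large"},
    "workflows": {"thorough": {"default": "model-medium"}},
}
```

Note that under this order a per-workflow default wins over a per-archetype setting, so `resolve_model(models, "thorough", "guardian")` yields the workflow default.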

Strategy Selection

Determine strategy from CLI flag --strategy, config strategy: field, or auto-detect:

| Signal | Strategy |
| --- | --- |
| Task contains fix/bug/patch/hotfix | pipeline |
| Task contains refactor/redesign/review | pdca |
| Workflow is fast with single file | pipeline |
| Workflow is thorough | pdca |
| Default | pdca |

If pipeline, skip to the Pipeline section at the end. Otherwise continue with PDCA below.
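
A minimal sketch of the selection logic, assuming the CLI flag beats the config field, which beats auto-detection, and that the signal rows are evaluated top to bottom (the source does not state either precedence). The function and parameter names are hypothetical.

```python
import re

# Keyword signals from the strategy table, matched as whole words.
PIPELINE_WORDS = re.compile(r"\b(fix|bug|patch|hotfix)\b", re.IGNORECASE)
PDCA_WORDS = re.compile(r"\b(refactor|redesign|review)\b", re.IGNORECASE)


def select_strategy(task, workflow, cli_flag=None, config_value=None, single_file=False):
    """Pick 'pipeline' or 'pdca' for a run."""
    if cli_flag:                      # explicit --strategy flag
        return cli_flag
    if config_value:                  # strategy: field from config
        return config_value
    if PIPELINE_WORDS.search(task):
        return "pipeline"
    if PDCA_WORDS.search(task):
        return "pdca"
    if workflow == "fast" and single_file:
        return "pipeline"
    return "pdca"                     # covers thorough and the default row
```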

Workflow Selection

| Signal | Workflow | Max Cycles |
| --- | --- | --- |
| Small fix, low risk, single concern | fast | 1 |
| Feature, multiple files, moderate risk | standard | 2 |
| Security-sensitive, breaking changes, public API | thorough | 3 |

Attention Filters

Each agent receives only what it needs. This is the canonical reference:

| Archetype | Receives | Excludes |
| --- | --- | --- |
| Explorer | Task description, codebase access | Prior proposals, reviews, diffs |
| Creator | Task + Explorer output (+ feedback cycle 2+) | Raw files, diffs, reviewer outputs |
| Maker | Creator's proposal (+ Maker-routed feedback cycle 2+) | Explorer research, reviewer outputs |
| Guardian | Maker's diff + proposal risk section | Full proposal, Explorer research |
| Skeptic | Creator's proposal (assumptions focus) | Diff details, Explorer research |
| Sage | Proposal + diff + implementation summary | Explorer research, other reviews |
| Trickster | Maker's diff only | Everything else |

Status Token Protocol

Every agent ends output with STATUS: <token>. Parse it to decide the next action.

| Status | Action |
| --- | --- |
| DONE | Proceed to next phase |
| DONE_WITH_CONCERNS | Log concerns, proceed |
| NEEDS_CONTEXT | Pause, request info from user |
| BLOCKED | Abort phase, report blocker |

If no status token found, default to DONE.
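
Parsing the token might look like the sketch below. Taking the last `STATUS:` line when several appear, and treating an unrecognized token like a missing one, are assumptions; the action labels are hypothetical names for the four table rows.

```python
import re

# Token -> orchestrator action, mirroring the status table.
ACTIONS = {
    "DONE": "proceed",
    "DONE_WITH_CONCERNS": "log_and_proceed",
    "NEEDS_CONTEXT": "pause_for_user",
    "BLOCKED": "abort_phase",
}
STATUS_RE = re.compile(r"^STATUS:[ \t]*([A-Z_]+)[ \t]*$", re.MULTILINE)


def parse_status(output: str) -> str:
    """Return the agent's status token, defaulting to DONE when none is found."""
    matches = STATUS_RE.findall(output)
    token = matches[-1] if matches else "DONE"
    return token if token in ACTIONS else "DONE"
```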


1. Plan Phase

1a. Explorer (standard/thorough only)

Agent(
  description: "Explorer: research context for <task>",
  prompt: "You are the EXPLORER archetype.
    <task description>
    Research: 1) affected files/functions, 2) dependencies, 3) test coverage,
    4) codebase patterns. Write a structured research report.
    Be thorough but focused — no rabbit holes.",
  subagent_type: "Explore"
)

Save output to .archeflow/artifacts/<run_id>/plan-explorer.md.

1b. Creator

Fast workflow (no Explorer): Creator must perform Mini-Reflect first:

  1. Restate the task in one sentence
  2. List 3 assumptions
  3. Name the highest-damage risk

Agent(
  description: "Creator: design proposal for <task>",
  prompt: "You are the CREATOR archetype.
    <task description>
    <if fast: Perform Mini-Reflect first (restate task, 3 assumptions, top risk)>
    <if standard/thorough: Research findings: <plan-explorer.md contents>>
    <if cycle 2+: Prior feedback: <Creator-routed section of act-feedback.md>>
    Design a proposal:
    1. Architecture decisions (with rationale)
    2. Files to create/modify (exact paths, specific changes, 2-5 min per item)
    3. Alternatives considered (2+, with rejection rationale)
    4. Test strategy (specific test cases)
    5. Confidence table (task understanding, solution completeness, risk coverage — each 0.0-1.0)
    6. Risks and mitigations
    <if cycle 2+: 7. How each prior issue was addressed (Fixed/Deferred/Accepted/Disputed)>
    Be decisive — ship a clear plan, not a menu.",
  subagent_type: "Plan"
)

Save output to .archeflow/artifacts/<run_id>/plan-creator.md.

1c. Confidence Gate (Rule A3)

Parse the ### Confidence table from plan-creator.md. If unparseable, default to 0.0 (triggers gate).

| Axis | Score < 0.5 | Action |
| --- | --- | --- |
| Task understanding | Pause | Ask user for clarification. Do not spawn Maker. |
| Solution completeness | Upgrade | If fast -> standard. Spawn Explorer, re-run Creator. |
| Risk coverage | Mini-Explorer | Spawn focused Explorer for risky areas (5 min max, parallel with Do prep). |

Mini-Explorer prompt: "You are the EXPLORER. Risk coverage scored <score>. Identified risks: <risks>. Research ONLY the risky areas. Is the risk real? What mitigations exist?" Save to plan-mini-explorer.md.
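
The gate itself reduces to a threshold check per axis. A minimal sketch, assuming the confidence table has already been parsed into a dict with the hypothetical keys below; a missing axis falls back to 0.0, matching the unparseable case.

```python
THRESHOLD = 0.5

# (axis, action) pairs in the order of the gate table.
GATE_ACTIONS = [
    ("task_understanding", "pause"),
    ("solution_completeness", "upgrade"),
    ("risk_coverage", "mini_explorer"),
]


def confidence_gate(scores: dict) -> list:
    """Return the gate actions triggered by axes scoring below the threshold."""
    return [
        action
        for axis, action in GATE_ACTIONS
        if float(scores.get(axis, 0.0)) < THRESHOLD
    ]
```

How the orchestrator combines several triggered actions (for example, whether a pause suppresses the others) is left to the caller here, since the source does not specify it.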


2. Do Phase

2a. Maker

Agent(
  description: "Maker: implement <task>",
  prompt: "You are the MAKER archetype.
    Implement this proposal: <plan-creator.md contents>
    <if cycle 2+: Implementation feedback: <Maker-routed findings from act-feedback.md>>
    Rules:
    1. Follow the proposal exactly — don't redesign
    2. Write tests for every behavioral change
    3. Commit with descriptive messages (CRITICAL: uncommitted worktree changes are LOST)
    4. Run existing tests — nothing may break
    5. If unclear, implement best interpretation and note it
    Self-review before finishing:
    - All proposal files changed? Tests added? No out-of-scope files? Existing tests pass?",
  isolation: "worktree",
  mode: "bypassPermissions"
)

Save summary to do-maker.md. Save changed file list to do-maker-files.txt.

2b. Test-First Gate

Check do-maker-files.txt for test files. If none found and domain is not writing:

  • If Creator specified a test strategy: re-run Maker with targeted test instruction (1 retry within Do, not a full cycle)
  • If no test strategy: emit WARNING, proceed
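
A sketch of the gate decision. The test-file heuristic (directory named test/tests, or common filename patterns) is an assumption; the source does not say how test files are recognized. The return labels are hypothetical.

```python
from pathlib import PurePosixPath


def has_test_files(paths) -> bool:
    """Heuristic (assumed): a path counts as a test if its directory or name says so."""
    for p in paths:
        parts = PurePosixPath(p).parts
        name = parts[-1].lower() if parts else ""
        in_test_dir = "tests" in parts or "test" in parts
        test_name = (
            name.startswith("test_")
            or "_test." in name
            or ".test." in name
            or ".spec." in name
        )
        if in_test_dir or test_name:
            return True
    return False


def gate_on_tests(changed_files, domain: str, has_test_strategy: bool) -> str:
    """Decide what to do after Maker finishes, per the Test-First Gate."""
    if domain == "writing" or has_test_files(changed_files):
        return "proceed"
    # One targeted Maker retry inside Do, not a full cycle.
    return "retry_maker" if has_test_strategy else "warn_and_proceed"
```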

3. Check Phase

Spawn Guardian FIRST. Evaluate Rule A2 before spawning other reviewers.

3a. Guardian (always first)

Agent(
  description: "Guardian: security review for <task>",
  prompt: "You are the GUARDIAN archetype.
    Review changes in branch: <maker's branch>
    <git diff from Maker's branch>
    <risks section from plan-creator.md>
    Assess: security, reliability, breaking changes, dependencies.
    Output: APPROVED or REJECTED with findings table.
    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Be rigorous but practical — flag real risks, not theoretical ones."
)

Save to check-guardian.md.

3b. Guardian Fast-Path (Rule A2)

If Guardian reports 0 CRITICAL and 0 WARNING findings, the workflow has not been escalated (Rule A1), and this is not the first cycle of a thorough workflow:

  • Skip remaining reviewers, proceed directly to Act phase
  • Log "Guardian fast-path taken"
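
Rule A2 can be written as a single predicate. A minimal sketch with hypothetical parameter names; cycles are assumed to be numbered from 1.

```python
def guardian_fast_path(critical: int, warning: int, workflow: str,
                       cycle: int, escalated: bool) -> bool:
    """True when the remaining reviewers may be skipped (Rule A2)."""
    if critical > 0 or warning > 0:
        return False          # Guardian raised something worth a second opinion
    if escalated:
        return False          # A2 does not apply to escalated workflows
    if workflow == "thorough" and cycle == 1:
        return False          # first thorough cycle always gets the full review
    return True
```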

3c. Remaining Reviewers (spawn in parallel)

Skeptic (standard/thorough) — receives Creator's proposal, focus on assumptions. Save to check-skeptic.md.

Sage (standard/thorough) — receives proposal + diff + implementation summary. Save to check-sage.md.

Trickster (thorough only) — receives Maker's diff only. "Think like a QA engineer paid per bug." Save to check-trickster.md.

3d. Evidence Validation

After all reviewers complete, scan CRITICAL/WARNING findings. Downgrade to INFO if:

  • Banned phrases without evidence: "might be", "could potentially", "appears to", "seems like", "may not"
  • No evidence: no command output, code citation, line reference, or reproduction steps

Track downgrades in events (do NOT modify artifact files). Act phase excludes downgraded findings from CRITICAL tallies.
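
A sketch of the downgrade check. It assumes each finding is a dict with `severity`, `description`, and an `evidence` list (command output, code citation, line reference, or reproduction steps); these field names and the reason labels are hypothetical. The finding itself is never mutated, in line with the rule that artifact files stay untouched.

```python
BANNED_PHRASES = ("might be", "could potentially", "appears to", "seems like", "may not")


def effective_severity(finding: dict):
    """Return (severity, downgrade_reason); reason is None when nothing changes."""
    severity = finding["severity"]
    if severity not in ("CRITICAL", "WARNING"):
        return severity, None
    if finding.get("evidence"):
        return severity, None  # evidence present: keep as reported
    text = finding.get("description", "").lower()
    hedged = any(phrase in text for phrase in BANNED_PHRASES)
    reason = "banned_phrase_without_evidence" if hedged else "no_evidence"
    return "INFO", reason
```

The returned reason is the kind of detail that could go into a downgrade event, while the Act phase tallies the effective severity.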


4. Act Phase

4a. Collect Verdicts

Read all check-*.md artifacts. Tally CRITICAL/WARNING/INFO per reviewer.

4b. Escalation Check (Rule A1)

If fast workflow and Guardian found 2+ CRITICAL: upgrade next cycle to standard (add Skeptic + Sage). Once escalated, stays escalated. A2 does not apply to escalated workflows.

4c. Convergence Check (cycle 2+ only)

Compare current findings against previous cycle. Classify each as NEW / RESOLVED / PERSISTENT / REGRESSED.

convergence_score = resolved / (resolved + new + regressed)

| Score | Status | Action |
| --- | --- | --- |
| > 0.8 | Converging | Continue if cycles remain |
| 0.5-0.8 | Stalling | Continue with caution |
| < 0.5 | Diverging | STOP if 2 consecutive cycles |
| 0.0 (all persistent) | Stuck | STOP |

Oscillation: Finding present in cycle N-2, absent in N-1, back in N. Two or more oscillating findings = STOP and escalate to user.

Convergence STOP overrides normal cycle-back even if cycles remain.
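
A worked sketch of the score, the status bands, and the oscillation check. Two assumptions: a zero denominator (every finding persistent) is treated as score 0.0 and status "stuck", and a score of exactly 0.8 falls in the stalling band. Findings are identified by hypothetical string IDs.

```python
def convergence_score(resolved: int, new: int, regressed: int) -> float:
    """resolved / (resolved + new + regressed), or 0.0 when nothing moved."""
    denom = resolved + new + regressed
    return resolved / denom if denom else 0.0


def convergence_status(resolved: int, new: int, regressed: int, persistent: int) -> str:
    if resolved + new + regressed == 0 and persistent > 0:
        return "stuck"                    # all findings persistent
    score = convergence_score(resolved, new, regressed)
    if score > 0.8:
        return "converging"
    if score >= 0.5:
        return "stalling"
    return "diverging"


def oscillating(cycles: list) -> set:
    """Finding IDs present in cycle N-2, absent in N-1, and back in N.

    `cycles` is a list of sets of finding IDs, oldest first.
    """
    if len(cycles) < 3:
        return set()
    n2, n1, n = cycles[-3:]
    return (n2 & n) - n1
```

For example, 4 resolved, 1 new, and 0 regressed gives 4 / 5 = 0.8, which counts as stalling rather than converging under the strict `> 0.8` bound.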

4d. All Approved

  1. Run pre-merge hooks from .archeflow/hooks.yaml if defined
  2. Merge: ./lib/archeflow-git.sh merge "$RUN_ID" --no-ff
  3. Post-merge test validation: ./lib/archeflow-rollback.sh "$RUN_ID" (auto-reverts if tests fail)
  4. If rollback triggered and cycles remain: cycle back with "integration test failure" feedback
  5. Clean up worktree: ./lib/archeflow-git.sh cleanup "$RUN_ID"
  6. Proceed to Completion

4e. Issues Found (cycles remaining)

Build act-feedback.md using the feedback routing table below. Archive current cycle artifacts to cycle-<N>/. Increment cycle, go back to Plan.

4f. Max Cycles Reached

Report unresolved findings. Present best implementation on its branch (not merged). Let user decide.


Feedback Routing Table

Route each finding to the right agent for the next cycle:

| Source | Category | Routes To | Rationale |
| --- | --- | --- | --- |
| Guardian | security, breaking-change | Creator | Design must change |
| Guardian | reliability, dependency | Creator | Architectural decision needed |
| Skeptic | design, scalability | Creator | Assumptions need revision |
| Sage | quality, consistency | Maker | Implementation refinement |
| Sage | testing | Maker | Test gap, not design flaw |
| Trickster | reliability (design flaw) | Creator | Needs redesign |
| Trickster | reliability (test gap) | Maker | Needs more tests |
| Trickster | testing | Maker | Edge case not covered |

Disambiguation: If fix requires changing the approach -> Creator. If fix requires changing the code within the existing approach -> Maker.

Feedback File Format

act-feedback.md splits into ## Creator-Routed Issues and ## Maker-Routed Issues. Inject only the relevant section into each agent's prompt.

Same finding in 2 consecutive cycles: escalate to user. Do not cycle again blindly.

Cross-archetype dedup: If two reviewers raise the same issue (same file + category), merge into one finding. Route to higher-priority destination (Creator over Maker).
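
The routing table and the dedup rule can be sketched together. Assumptions: findings are dicts with `source`, `category`, and `file`, and a boolean `changes_approach` carries the disambiguation decision for cases the table does not settle by category alone (such as Trickster reliability findings). All of these field names are hypothetical.

```python
# (source, category) -> destination, from the routing table.
ROUTES = {
    ("guardian", "security"): "creator",
    ("guardian", "breaking-change"): "creator",
    ("guardian", "reliability"): "creator",
    ("guardian", "dependency"): "creator",
    ("skeptic", "design"): "creator",
    ("skeptic", "scalability"): "creator",
    ("sage", "quality"): "maker",
    ("sage", "consistency"): "maker",
    ("sage", "testing"): "maker",
    ("trickster", "testing"): "maker",
}


def route(finding: dict) -> str:
    key = (finding["source"], finding["category"])
    if key in ROUTES:
        return ROUTES[key]
    # Disambiguation: approach must change -> Creator, otherwise Maker.
    return "creator" if finding.get("changes_approach") else "maker"


def dedup_and_route(findings: list) -> dict:
    """Merge findings on (file, category); Creator wins over Maker."""
    merged = {}
    for f in findings:
        key = (f["file"], f["category"])
        dest = route(f)
        if key not in merged or dest == "creator":
            merged[key] = dest
    return merged
```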


5. Completion

  1. Emit run.complete event
  2. Check for regressions: ./lib/archeflow-memory.sh regression-check
  3. Generate report: ./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl
  4. Score effectiveness: ./lib/archeflow-score.sh extract .archeflow/events/<run_id>.jsonl
  5. Append to run index: .archeflow/events/index.jsonl

Progress Display

━━━ ArcheFlow Run: <task> ━━━━━━━━━━━━━━━━━━━
Run ID: <run_id> | Workflow: <standard> | Cycle: 1/<max>

[Plan]  Explorer researching...        -> done (35s)
[Plan]  Creator designing proposal...  -> done (25s, confidence: 0.8)
[Do]    Maker implementing...          -> done (90s, 4 files, 8 tests)
[Check] Guardian reviewing...          -> APPROVED
[Check] Skeptic challenging...         -> APPROVED (1 INFO)
[Check] Sage reviewing...              -> APPROVED
[Act]   All approved — merging...      -> merged to main

━━━ Complete: 3m 10s, 1 cycle ━━━━━━━━━━━━━━━
Artifacts: .archeflow/artifacts/<run_id>/
Report: .archeflow/events/<run_id>.jsonl

Shadow Monitoring

Quick-check after each agent completes:

| Archetype | Shadow | Trigger |
| --- | --- | --- |
| Explorer | Rabbit Hole | Output >2000 words without Recommendation section |
| Creator | Over-Architect | >2 new abstractions for one feature |
| Maker | Rogue | No tests in changeset, or files outside proposal |
| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1, or zero approvals |
| Skeptic | Paralytic | >7 challenges, <50% have alternatives |
| Trickster | False Alarm | Findings in untouched code, or >10 findings |
| Sage | Bureaucrat | Review >2x code change length |

On detection: apply correction prompt. On 2nd detection of same shadow: replace agent. On 3+ shadows in one cycle: escalate to user.
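
The response policy could be sketched as below. It assumes the cycle-wide escalation check takes precedence over the per-shadow checks, which the source does not state; the parameter names and return labels are hypothetical.

```python
def shadow_response(prior_detections: int, shadows_this_cycle: int) -> str:
    """Choose a response to a newly detected shadow.

    prior_detections: earlier detections of this same shadow for this agent.
    shadows_this_cycle: all shadows detected in the current cycle, including this one.
    """
    if shadows_this_cycle >= 3:
        return "escalate_to_user"
    if prior_detections >= 1:
        return "replace_agent"        # 2nd detection of the same shadow
    return "apply_correction_prompt"  # 1st detection
```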


Event Reference

Emit events via ./lib/archeflow-event.sh <run_id> <type> <phase> <agent> '<json>'. Events are optional — never let logging block orchestration.

| When | Event Type | Key Data |
| --- | --- | --- |
| Run starts | run.start | task, workflow, max_cycles |
| Before agent spawn | agent.start | archetype, model, prompt_summary |
| After agent returns | agent.complete | archetype, duration_ms, artifacts, summary |
| Phase boundary | phase.transition | from, to, artifacts_so_far |
| Alternative chosen | decision | what, chosen, alternatives, rationale |
| Orchestrator decision (replay) | decision.point | archetype, input, decision, confidence (use ./lib/archeflow-decision.sh) |
| Reviewer verdict | review.verdict | archetype, verdict, findings[] |
| Fix addressing review | fix.applied | source, finding, file, line |
| End of PDCA cycle | cycle.boundary | cycle, max_cycles, exit_condition, convergence |
| Shadow triggered | shadow.detected | archetype, shadow, trigger, action |
| Policy halt | wiggum.break | trigger, run_state, unresolved_findings, hard/soft |
| Run ends | run.complete | status, cycles, agents_total, fixes_total |

Parent rules: run.start has parent: []. Agents parent to the event that triggered them. Phase transitions fan-in from all completing events. Parallel agents share the same parent.
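
Since events must never block orchestration, an emitter wrapper should swallow every failure. A minimal sketch around the documented command line; the 5-second timeout and the boolean return value are assumptions, not part of the script's contract.

```python
import json
import subprocess


def emit_event(run_id: str, event_type: str, phase: str, agent: str, data: dict,
               script: str = "./lib/archeflow-event.sh", timeout: float = 5.0) -> bool:
    """Emit one event; return False on any failure instead of raising."""
    try:
        result = subprocess.run(
            [script, run_id, event_type, phase, agent, json.dumps(data)],
            capture_output=True,
            timeout=timeout,   # assumed limit so a hung emitter cannot stall the run
            check=False,
        )
        return result.returncode == 0
    except (OSError, subprocess.SubprocessError):
        # Missing script, permission error, or timeout: log a warning and move on.
        return False
```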


Artifact Naming

All artifacts live in .archeflow/artifacts/<run_id>/:

| File | Content |
| --- | --- |
| plan-explorer.md | Explorer research (missing in fast) |
| plan-creator.md | Creator proposal with confidence |
| plan-mini-explorer.md | Risk research (only if A3 triggers) |
| do-maker.md | Maker implementation summary |
| do-maker-files.txt | Changed file paths (one per line) |
| check-guardian.md | Guardian verdict + findings |
| check-skeptic.md | Skeptic verdict (if spawned) |
| check-sage.md | Sage verdict (if spawned) |
| check-trickster.md | Trickster verdict (if spawned) |
| act-feedback.md | Structured feedback for next cycle |
| act-fixes.jsonl | Applied fixes log |
| cycle-<N>/ | Archived artifacts from cycle N |

Always check artifact existence before injecting. Missing optional artifacts are expected — skip, don't fail.

Git diff is generated on-the-fly (git diff main...<maker-branch>), not saved to disk.


Effectiveness Scoring

After each run, ./lib/archeflow-score.sh extract scores review archetypes on:

| Dimension | Weight |
| --- | --- |
| Signal-to-noise (useful / total findings) | 0.30 |
| Fix rate (findings that led to fixes) | 0.25 |
| Cost efficiency (useful findings per dollar) | 0.20 |
| Accuracy (not contradicted by others) | 0.15 |
| Cycle impact (led to cycle exit) | 0.10 |

Scores stored in .archeflow/memory/effectiveness.jsonl. After 10+ runs, recommend model tier changes and archetype removal.
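
The weights sum to 1.0, so the overall score is a plain weighted sum. A sketch assuming each dimension has already been normalized to the range 0.0-1.0 (the source does not say how cost efficiency, measured per dollar, is normalized); the key names are hypothetical and this is not the implementation in archeflow-score.sh.

```python
WEIGHTS = {
    "signal_to_noise": 0.30,
    "fix_rate": 0.25,
    "cost_efficiency": 0.20,
    "accuracy": 0.15,
    "cycle_impact": 0.10,
}


def effectiveness(dimensions: dict) -> float:
    """Weighted sum over the five dimensions; missing dimensions count as 0.0."""
    return sum(weight * dimensions.get(name, 0.0) for name, weight in WEIGHTS.items())
```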


Run replay (decision log + what-if)

After key choices (routing, fast-path skip, escalation), emit decision.point via ./lib/archeflow-decision.sh so runs can be inspected with ./lib/archeflow-replay.sh timeline|whatif|compare <run_id>. Weighted what-if helps estimate how much each review archetype swayed the effective ship/block outcome. See skill af-replay.


Dry-Run Mode

When --dry-run: Run Plan phase only. Display workflow, agent counts, confidence scores, cost estimate. Ask user to proceed. If yes, continue with --start-from do.

Start-From Mode

| Start from | Required artifacts |
| --- | --- |
| plan | None |
| do | plan-creator.md |
| check | plan-creator.md, do-maker.md, do-maker-files.txt |
| act | All check-*.md files |

Validate required artifacts exist. Error if missing.
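
The validation step can be sketched as a lookup plus a glob. For `act`, this sketch only requires at least one check-*.md file, which is an assumption: which reviewer files count as "all" depends on the workflow.

```python
from pathlib import Path

# Required artifacts per --start-from value, from the table above.
REQUIRED = {
    "plan": [],
    "do": ["plan-creator.md"],
    "check": ["plan-creator.md", "do-maker.md", "do-maker-files.txt"],
    "act": ["check-*.md"],
}


def missing_artifacts(artifact_dir: str, start_from: str) -> list:
    """Return the required names or patterns with no matching file."""
    base = Path(artifact_dir)
    return [
        pattern
        for pattern in REQUIRED[start_from]
        if not any(base.glob(pattern))
    ]
```

A non-empty result is the error case: report the missing names and stop before spawning any agent.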


Error Handling

  • Agent timeout (5 min): Emit error event, abort run. Do not retry blindly.
  • Event emitter fails: Log warning, continue. Events are observation, not control flow.
  • Artifact write fails: Blocking — abort and report. Artifacts are required for phase handoff.
  • Merge conflict: Do not force-resolve. Report conflict, leave branch intact, let user decide.

Pipeline Strategy

Linear flow with no cycle-back. Use for bug fixes and well-understood single-concern tasks.

Plan (Creator only) -> Implement (Maker) -> Spec-Review (Guardian, then Skeptic if findings) -> Quality-Review (Sage) -> Verify (tests + merge)

  1. Plan: Spawn Creator with Mini-Reflect (no Explorer). Save plan-creator.md.
  2. Implement: Spawn Maker in worktree. Save do-maker.md.
  3. Spec-Review: Guardian first. Skeptic only if Guardian has findings.
  4. Quality-Review: Sage reviews proposal + diff + summary.
  5. Verify: Run tests. If pass and 0 CRITICAL: merge. If CRITICAL: one targeted Maker fix, re-review, re-test. If still failing: abort, report branch name for manual resolution.

WARNINGs are logged but do not block merge.

━━━ ArcheFlow Pipeline: <task> ━━━━━━━━━━━━━━━━
Run ID: <run_id> | Strategy: pipeline

[Plan]      Creator designing...          -> done (20s)
[Implement] Maker building...             -> done (60s, 3 files)
[Spec]      Guardian reviewing...         -> APPROVED
[Quality]   Sage reviewing...             -> APPROVED (1 WARNING)
[Verify]    Tests passing...              -> merged to main

━━━ Complete: 2m 15s ━━━━━━━━━━━━━━━━━━━━━━━━━━