---
name: orchestration
description: Use when executing a multi-agent orchestration — spawning archetype agents, managing PDCA cycles, coordinating worktrees, and merging results. This is the step-by-step execution guide.
---
# Orchestration Execution
This skill guides you through running a full ArcheFlow orchestration using Claude Code's native Agent tool and git worktrees.
## Strategy Selection
A strategy defines the shape of an orchestration run — which phases execute, in what order, and when to iterate. A workflow (fast/standard/thorough) controls the depth within a strategy.
### Available Strategies

| Strategy | Flow | When to Use |
|---|---|---|
| `pdca` | Plan -> Do -> Check -> Act (cyclic) | Refactors, thorough reviews, multi-concern tasks |
| `pipeline` | Plan -> Implement -> Spec-Review -> Quality-Review -> Verify (linear) | Bug fixes, fast patches, single-concern tasks |
| `auto` | Selected by task analysis | Default — let ArcheFlow decide |
### Strategy Interface
Every strategy defines:
- Phases — ordered list of execution stages
- Agent mapping — which archetypes run in each phase
- Transition rules — conditions for moving between phases
- Iteration model — cyclic (PDCA) or linear (pipeline)
- Exit conditions — when the run terminates
### PDCA Strategy
The existing orchestration flow (Steps 0-4 below). Cyclic — the Act phase can feed back to Plan for another iteration. Best for tasks requiring multiple review perspectives and iterative refinement.
### Pipeline Strategy
Linear flow with no cycle-back. Faster for well-understood tasks where one pass is sufficient.
| Phase | Agent | Purpose |
|---|---|---|
| Plan | Creator | Design proposal |
| Implement | Maker | Build in worktree |
| Spec-Review | Guardian, then Skeptic | Security + assumption check (sequential) |
| Quality-Review | Sage | Code quality review |
| Verify | (automated) | Run tests, apply targeted fix if CRITICAL |
No cycle-back — WARNINGs are logged but do not block. CRITICALs in Verify trigger a single targeted fix attempt by the Maker, not a full cycle.
### Auto-Selection Rules

When `strategy: auto` (default):

- Task contains "fix", "bug", "patch", "hotfix" → `pipeline`
- Task contains "refactor", "redesign", "review" → `pdca`
- Workflow is `thorough` → `pdca` (always)
- Workflow is `fast` with single file → `pipeline`
- Otherwise → `pdca`
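These rules can be sketched as a small selector. This is an illustrative sketch, not ArcheFlow's actual implementation; the function name and signature are hypothetical, and it assumes the `thorough → pdca (always)` rule takes precedence over the keyword rules.

```python
# Hypothetical sketch of the auto-selection rules. Assumption: the
# "(always)" on the thorough rule means it wins over keyword matches.
def select_strategy(task: str, workflow: str, single_file: bool = False) -> str:
    text = task.lower()
    if workflow == "thorough":
        return "pdca"  # thorough always runs the cyclic strategy
    if any(w in text for w in ("fix", "bug", "patch", "hotfix")):
        return "pipeline"
    if any(w in text for w in ("refactor", "redesign", "review")):
        return "pdca"
    if workflow == "fast" and single_file:
        return "pipeline"
    return "pdca"  # default
```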
## Step 0: Choose a Workflow
If `.archeflow/teams/<name>.yaml` exists, the user can reference a team preset: "Use the backend team". Load the preset's phase config instead of the built-in defaults. See the archeflow:custom-archetypes skill for the preset format.
Otherwise, assess the task and pick:
| Signal | Workflow |
|---|---|
| Small fix, low risk, single concern | fast (1 cycle) |
| Feature, multiple files, moderate risk | standard (2 cycles) |
| Security-sensitive, breaking changes, public API | thorough (3 cycles) |
### Workflow Adaptation Rules
The initial workflow choice is a starting point, not a commitment. These rules adapt the workflow at runtime. Each rule specifies when it evaluates (which phase boundary).
#### A3: Confidence Gate (evaluates: after Plan, before Do)

**When:** Creator's confidence table has any axis below 0.5. Action by axis:
| Axis | Score < 0.5 Action |
|---|---|
| Task understanding | Pause. Ask user to clarify before proceeding. Do not spawn Maker. |
| Solution completeness | Upgrade to standard. Add Explorer before Maker starts. |
| Risk coverage | Spawn mini-Explorer for the specific risky area (parallel, 5 min max). Maker can proceed. |
A3 runs before any Do/Check agents spawn, so there are no cancellation issues.
#### A1: Conditional Escalation (evaluates: after Check, before next cycle)

- **When:** Guardian rejects with 2+ CRITICAL findings in a fast workflow.
- **Action:** Escalate to standard for the next cycle — add Skeptic + Sage to the reviewer roster.
- **Why:** If Guardian found serious issues, more perspectives help find root causes.
- **Sticky:** Once escalated, the workflow stays escalated for all remaining cycles. A2 does not apply to escalated workflows.
#### A2: Guardian Fast-Path (evaluates: after Guardian, before spawning other reviewers)

- **When:** Guardian finds 0 CRITICAL and 0 WARNING in a non-escalated standard or thorough workflow.
- **Action:** Do not spawn Skeptic, Sage, or Trickster. Proceed directly to the Act phase.
- **Why:** Guardian's security review is the strictest gate. A clean pass means it is safe to skip additional reviewers.
- **Critical:** Evaluate A2 after Guardian completes but before other reviewers are spawned. Do not spawn reviewers in parallel with Guardian — spawn Guardian first, check A2, then spawn remaining reviewers only if A2 doesn't trigger.
- **Does not apply to:** Escalated workflows (A1 triggered), or the first cycle of thorough workflows (Trickster is mandatory on first pass).
- **Log:** Note "Guardian fast-path taken" in the orchestration report.
#### Evaluation Order

```
Plan phase completes → A3 (confidence gate)
        ↓
Guardian completes → A2 (fast-path check) → if clean, skip other reviewers
        ↓ if not, spawn other reviewers
Check phase done → A1 (escalation check) → if 2+ CRITICALs in fast, next cycle is standard
```
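The A2 fast-path decision can be sketched as a single predicate. This is a hypothetical helper for illustration, assuming reviewer findings are dicts with a `severity` field of `CRITICAL`/`WARNING`/`INFO`:

```python
# Sketch of the A2 fast-path check (hypothetical helper, not ArcheFlow API).
def guardian_fast_path(guardian_findings: list, workflow: str,
                       cycle: int, escalated: bool) -> bool:
    if escalated:                              # A1 triggered, A2 never applies
        return False
    if workflow not in ("standard", "thorough"):
        return False
    if workflow == "thorough" and cycle == 1:  # Trickster mandatory on first pass
        return False
    severities = {f["severity"] for f in guardian_findings}
    return not ({"CRITICAL", "WARNING"} & severities)
```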
## Process Logging

If `.archeflow/events/` exists (or should be created), emit structured events throughout the orchestration. See the archeflow:process-log skill for the full schema.
Quick reference — emit at these points:

- `run.start` → After workflow selection, before first agent
- `agent.start` → Before each Agent tool call
- `agent.complete` → After each Agent returns (include duration, tokens, summary, artifacts)
- `decision` → When choosing between alternatives (plot direction, approach, fix strategy)
- `phase.transition` → At Plan→Do, Do→Check, Check→Act boundaries
- `review.verdict` → After each reviewer delivers its verdict
- `fix.applied` → After each edit addressing a review finding
- `cycle.boundary` → End of a PDCA cycle
- `shadow.detected` → When a shadow threshold triggers
- `run.complete` → After the final Act phase (include totals)

Helper: `./lib/archeflow-event.sh <run_id> <type> <phase> <agent> '<json>'`
Report: `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl`
Events are optional — if the events dir doesn't exist, skip logging. Never let logging block orchestration.
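The append-only, never-blocking contract above can be sketched as follows. This is an illustrative sketch, not the real helper (`./lib/archeflow-event.sh` is the actual tool); field names beyond `run_id`/`type`/`phase`/`agent` are assumptions, see archeflow:process-log for the real schema.

```python
import json
from pathlib import Path

# Hypothetical sketch: append one JSONL event, and skip silently when the
# events directory is absent so logging can never block orchestration.
def emit_event(run_id: str, etype: str, phase: str, agent: str,
               data: dict, events_dir: str = ".archeflow/events") -> bool:
    d = Path(events_dir)
    if not d.is_dir():
        return False  # events are optional, never block orchestration
    record = {"run_id": run_id, "type": etype, "phase": phase,
              "agent": agent, **data}
    with open(d / f"{run_id}.jsonl", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return True
```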
## Model Configuration

Model assignment per archetype and workflow is configured in `.archeflow/config.yaml` under the `models:` section. The archeflow:run skill (section 0c) handles resolution with the fallback chain: per-workflow per-archetype > per-workflow default > per-archetype > global default. When spawning agents manually, read the config to select the appropriate model.
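The fallback chain can be sketched as a lookup over the parsed config. This is a hedged sketch under the assumption that `models:` maps workflow names to nested dicts and archetype names to strings; the actual key layout is defined by the archeflow:run skill.

```python
# Hypothetical resolver for: per-workflow per-archetype > per-workflow
# default > per-archetype > global default.
def resolve_model(config: dict, workflow: str, archetype: str,
                  global_default: str = "sonnet") -> str:
    models = config.get("models", {})
    per_workflow = models.get(workflow, {})
    return (per_workflow.get(archetype)               # per-workflow per-archetype
            or per_workflow.get("default")            # per-workflow default
            or models.get(archetype)                  # per-archetype
            or models.get("default", global_default)) # global default
```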
## Step 1: Plan Phase
Spawn agents sequentially — Creator needs Explorer's findings.
### Explorer (if standard or thorough)

**Context to include:** Task description, relevant file paths, codebase access. **Context to exclude:** Prior proposals, review outputs, implementation details, feedback from previous cycles.
```
Agent(
  description: "🔍 Explorer: research context",
  prompt: "<task description>
    You are the EXPLORER archetype.
    Research the codebase to understand:
    1. What files and functions are involved
    2. What dependencies exist
    3. What tests currently cover this area
    4. What patterns the codebase uses
    Write your findings as a structured research report.
    Be thorough but focused — no rabbit holes.",
  subagent_type: "Explore"
)
```
### Creator

**Context to include:** Task description, Explorer's research output. On cycle 2+: prior cycle's structured feedback (see Cycle Feedback Protocol). **Context to exclude:** Raw file contents (Explorer already summarized), git diffs, reviewer full outputs.
Fast workflow only (no Explorer): The Creator must perform a Mini-Reflect before proposing:
- Restate the task in your own words (catch misunderstandings early)
- List 3 assumptions you're making
- Name the one risk that would cause most damage if wrong
```
Agent(
  description: "🏗️ Creator: design proposal",
  prompt: "<task description>
    You are the CREATOR archetype.
    <if fast workflow (no Explorer): Before proposing, perform a Mini-Reflect:
    1. Restate the task in one sentence
    2. List 3 assumptions you're making
    3. Name the highest-damage risk
    Then propose.>
    <if standard/thorough: Based on the research findings: <Explorer's output>>
    <if cycle 2+: Prior cycle feedback: <structured feedback — see Cycle Feedback Protocol>>
    Design a solution proposal including:
    1. Architecture decisions (with rationale)
    2. Files to create/modify (with specific changes)
    3. Alternatives considered (at least 2, with rejection rationale)
    4. Test strategy
    5. Confidence (scored by axis: task understanding, solution completeness, risk coverage)
    6. Risks you foresee
    <if cycle 2+: 7. How you addressed each unresolved issue from prior feedback>
    Be decisive. Ship a clear plan, not a menu of options.",
  subagent_type: "Plan"
)
```
## Step 2: Do Phase

Spawn Maker in an isolated worktree so changes don't affect main.

**Context to include:** Creator's proposal only. On cycle 2+: implementation-routed feedback from Sage/Trickster. **Context to exclude:** Explorer's research, Guardian/Skeptic findings (those go to Creator).
```
Agent(
  description: "⚒️ Maker: implement proposal",
  prompt: "<task description>
    You are the MAKER archetype.
    Implement this proposal: <Creator's output>
    <if cycle 2+: Implementation feedback from prior cycle: <Sage/Trickster findings only>>
    Rules:
    1. Follow the proposal exactly — don't redesign
    2. Write tests for every behavioral change
    3. Commit with descriptive messages
    4. Run existing tests — nothing may break
    5. If the proposal is unclear, implement your best interpretation and note it
    Do NOT skip tests. Do NOT refactor unrelated code.
    BEFORE finishing — Self-Review Checklist:
    1. Did I change ALL files listed in the proposal's Changes section?
    2. Did I add tests for each behavioral change?
    3. Are there files in my diff NOT listed in the proposal? If yes, revert them.
    4. Do all existing tests still pass?
    Report any gaps in your Implementation summary.",
  isolation: "worktree",
  mode: "bypassPermissions"
)
```
**Critical:** The Maker MUST commit its changes before finishing. Uncommitted changes in a worktree are lost.
## Step 3: Check Phase
Spawn Guardian first. After Guardian completes, check adaptation rule A2 (fast-path). If A2 triggers (0 CRITICAL, 0 WARNING, non-escalated workflow), skip remaining reviewers and proceed to Act. Otherwise, spawn remaining reviewers in parallel.
Reviewer spawning protocol: The canonical sequence (Guardian first, A2 evaluation, parallel spawning, timeout handling) is defined in archeflow:check-phase under "Reviewer Spawning Protocol". Follow that protocol for the exact spawning order, context per reviewer, and timeout rules.
### Guardian (always runs first)

**Context to include:** Maker's git diff, proposal risk section only. **Context to exclude:** Explorer's research, full proposal, other reviewer outputs.
```
Agent(
  description: "🛡️ Guardian: security and risk review",
  prompt: "You are the GUARDIAN archetype.
    Review the changes in branch: <maker's branch>
    Assess:
    1. Security vulnerabilities (injection, auth bypass, data exposure)
    2. Reliability risks (error handling, edge cases, race conditions)
    3. Breaking changes (API compatibility, schema migrations)
    4. Dependency risks (new deps, version conflicts)
    Output: APPROVED or REJECTED with specific findings.
    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Categories: security, reliability, design, breaking-change, dependency
    Be rigorous but practical — flag real risks, not theoretical ones."
)
```
### Skeptic (if standard or thorough)

**Context to include:** Creator's proposal (focus on assumptions section). **Context to exclude:** Git diff details, Explorer's research, other reviewer outputs.
```
Agent(
  description: "🤔 Skeptic: challenge assumptions",
  prompt: "You are the SKEPTIC archetype.
    Review the proposal: <Creator's proposal>
    Challenge:
    1. Assumptions in the design — what if they're wrong?
    2. Alternative approaches not considered
    3. Edge cases not tested
    4. Scalability concerns
    Output: APPROVED or REJECTED with counterarguments.
    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Categories: design, quality, testing, scalability
    Be constructive — every challenge must include a suggested alternative."
)
```
### Sage (if standard or thorough)

**Context to include:** Creator's proposal, Maker's git diff, implementation summary. **Context to exclude:** Explorer's raw research, other reviewer outputs.
```
Agent(
  description: "📚 Sage: holistic quality review",
  prompt: "You are the SAGE archetype.
    Review the changes in branch: <maker's branch>
    Evaluate holistically:
    1. Code quality (readability, maintainability, simplicity)
    2. Test coverage (are the tests meaningful, not just present?)
    3. Documentation (does the change need docs?)
    4. Consistency with codebase patterns
    Output: APPROVED or REJECTED with quality findings.
    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Categories: quality, testing, design, consistency
    Judge like a senior engineer doing a PR review."
)
```
### Trickster (if thorough only)

**Context to include:** Maker's git diff only. **Context to exclude:** Everything else — proposal, research, other reviews.
```
Agent(
  description: "🃏 Trickster: adversarial testing",
  prompt: "You are the TRICKSTER archetype.
    Try to break the changes in branch: <maker's branch>
    Attack vectors:
    1. Malformed input, boundary values, empty/null/huge data
    2. Concurrency and race conditions
    3. Error path exploitation
    4. Dependency failure scenarios
    Output: APPROVED or REJECTED with edge cases found.
    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Categories: security, reliability, testing
    Think like a QA engineer who gets paid per bug found."
)
```
## Step 4: Act Phase
Collect all reviewer outputs and decide.
### Completion Promise (optional)
If the user defined explicit done criteria with the task, check them now:
```
Completion criteria: <test command passes> AND <Guardian approves>
```

Example: "done when pytest passes and Guardian approves with 0 CRITICAL"
If completion criteria are defined, all criteria must pass — reviewer approval alone is not sufficient. If tests fail but reviewers approved, cycle back with "tests failing" as feedback to Creator.
### All Approved (and completion criteria met)

- **Pre-merge hooks:** Check `.archeflow/hooks.yaml` for `pre-merge` hooks. Run them. If `fail_action: abort`, stop and report.
- Merge the Maker's worktree branch into the target branch.
- **Post-merge hooks:** Run `post-merge` hooks from `.archeflow/hooks.yaml` if defined. Then run the project's test suite on the merged branch:
  - Tests pass → proceed to the report step
  - Tests fail → auto-revert the merge commit, report the failure, and cycle back with "integration test failure on main" as feedback
- Report: what was implemented, what was reviewed, any warnings noted.
- Clean up the worktree.
- Record metrics (see Orchestration Metrics).
### Issues Found (and cycles remaining)
- Build structured feedback using the Cycle Feedback Protocol below
- Go back to Step 1 (Plan) with the feedback
- Creator revises the proposal, addressing each unresolved issue
- Maker re-implements in a fresh worktree
- Reviewers check again
### Max Cycles Reached with Unresolved Issues
- Report all unresolved findings to the user
- Present the best implementation so far (on its branch)
- Let the user decide: merge as-is, fix manually, or abandon
## Cycle Feedback Protocol
After the Check phase, build structured feedback for the next cycle. This replaces dumping raw reviewer output.
### 1. Extract Findings
Parse each reviewer's output into the standardized format:
```markdown
## Cycle N Feedback

### Unresolved Issues
| Source | Severity | Category | Issue | Route to |
|--------|----------|----------|-------|----------|
| Guardian | CRITICAL | security | SQL injection in user input | Creator |
| Skeptic | WARNING | design | Assumes single-tenant only | Creator |
| Sage | WARNING | quality | Test names don't describe behavior | Maker |
| Trickster | CRITICAL | reliability | Empty string bypasses validation | Creator |

### Resolved (from cycle N-1)
| Source | Issue | Resolution |
|--------|-------|------------|
| Guardian | Missing rate limit | Added rate limiter middleware |
```
### 2. Route Feedback
Not all findings go to the same agent:
| Source | Category | Routes to | Reason |
|---|---|---|---|
| Guardian | security, breaking-change | Creator | Design must change |
| Guardian | reliability, dependency | Creator | Architectural decision needed |
| Skeptic | design, scalability | Creator | Assumptions need revision |
| Sage | quality, consistency | Maker | Implementation refinement |
| Sage | testing | Maker | Test gap, not design flaw |
| Trickster | reliability (design flaw) | Creator | Needs redesign |
| Trickster | reliability (test gap) | Maker | Needs more tests |
| Trickster | testing | Maker | Edge case not covered |
**Disambiguation rule:** When in doubt, if the fix requires changing the approach, route to Creator; if it requires changing the code within the existing approach, route to Maker.
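The routing table plus the disambiguation rule can be sketched as a lookup with a fallback. This is an illustrative sketch; `route_finding` and its data shapes are hypothetical, but the (source, category) pairs mirror the table above.

```python
# Hypothetical routing sketch mirroring the table above.
ROUTES = {
    ("guardian", "security"): "creator",
    ("guardian", "breaking-change"): "creator",
    ("guardian", "reliability"): "creator",
    ("guardian", "dependency"): "creator",
    ("skeptic", "design"): "creator",
    ("skeptic", "scalability"): "creator",
    ("sage", "quality"): "maker",
    ("sage", "consistency"): "maker",
    ("sage", "testing"): "maker",
    ("trickster", "testing"): "maker",
}

def route_finding(source: str, category: str, approach_change: bool = False) -> str:
    # Trickster reliability findings split: design flaw -> Creator, test gap -> Maker
    if (source, category) == ("trickster", "reliability"):
        return "creator" if approach_change else "maker"
    # Disambiguation rule: approach changes go to Creator, in-approach code
    # changes go to Maker.
    return ROUTES.get((source, category),
                      "creator" if approach_change else "maker")
```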
### 3. Track Resolution
Compare cycle N findings against cycle N-1:
- If a prior finding no longer appears in the same category → mark resolved
- If a prior finding persists → it stays unresolved with an incremented cycle count
- If new findings appear → add as new unresolved issues
This prevents regression and gives the Creator/Maker a clear list of what to address.
### 4. Convergence Detection
If the same finding (same category + same file location) appears unresolved in 2 consecutive cycles, escalate to user:
"Finding persists across 2 cycles: [Guardian] CRITICAL security — SQL injection in src/auth.ts:48. This may need human judgment or a different approach."
Do not cycle again blindly. The issue is likely structural (wrong design, not wrong implementation) and needs human input.
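Resolution tracking and convergence detection both reduce to a set comparison keyed on (category, location). A minimal sketch, assuming findings are dicts with `category` and `location` fields (hypothetical data shape):

```python
# Hypothetical sketch: compare cycle N findings against cycle N-1.
def compare_cycles(prev: list, curr: list) -> dict:
    key = lambda f: (f["category"], f["location"])
    prev_keys = {key(f) for f in prev}
    curr_keys = {key(f) for f in curr}
    return {
        "resolved": prev_keys - curr_keys,  # gone this cycle
        "escalate": prev_keys & curr_keys,  # persisted 2 cycles: ask the user
        "new": curr_keys - prev_keys,
    }
```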
### 5. Cross-Archetype Dedup
If two reviewers raise the same issue (same file + same category + similar description), merge into one finding in the consolidated output:
```
| Guardian + Skeptic | CRITICAL | security | Input not sanitized (src/api.ts:30) | Add validation |
```
Don't double-count in severity tallies. Route to the higher-priority destination (Creator over Maker).
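The dedup step can be sketched as a merge on (location, category). An illustrative sketch with assumed field names; the similar-description check is left as a judgment call for the orchestrator:

```python
# Hypothetical sketch: merge findings sharing location and category, keeping
# every source so severity tallies count each issue exactly once.
def dedup_findings(findings: list) -> list:
    merged = {}
    for f in findings:
        k = (f["location"], f["category"])
        if k in merged:
            merged[k]["sources"].append(f["source"])
        else:
            merged[k] = {**f, "sources": [f["source"]]}
    return list(merged.values())
```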
## Orchestration Metrics
Track lightweight metrics throughout the orchestration. No token counting (unreliable from skill layer) — just timing and outcomes.
### Per-Phase Logging
After each phase completes, note:
| Phase | Duration | Agents | Outcome |
|-------|----------|--------|---------|
| Plan | 45s | 2 | Proposal ready (confidence: 0.8) |
| Do | 90s | 1 | 4 files changed, 8 tests added |
| Check | 60s | 3 | 1 REJECTED (Guardian), 2 APPROVED |
| Act | — | — | Cycle back → feedback built |
### Orchestration Summary
At orchestration end, include in the report:
```markdown
## Orchestration Metrics
| Metric | Value |
|--------|-------|
| Workflow | standard |
| Cycles | 2 of 2 |
| Total duration | 4m 30s |
| Agents spawned | 9 |
| Findings (total) | 5 |
| Findings (critical) | 1 |
| Findings (resolved) | 4 |
| Shadow detections | 0 |
```
Use this data to calibrate future workflow selection — if fast workflows consistently need 0 cycles of revision, the task was well-scoped.
## Autonomous Mode
When running unattended (overnight sessions, batch queues), add these behaviors to the orchestration loop:
### Between-Task Checkpoint
After each task completes (success or failure):
- Commit and push all changes immediately
- Update session log at `.archeflow/session-log.md` with task outcome
- Check stop conditions before starting the next task:
  - 3 consecutive failures → STOP
  - Shadow escalation (same shadow 3+ times) → STOP
  - Test suite broken after merge → REVERT and STOP
  - Destructive action detected → STOP
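The stop conditions can be folded into a single predicate, evaluated between tasks. A minimal sketch with hypothetical parameter names:

```python
# Hypothetical sketch of the between-task stop check.
def should_stop(consecutive_failures: int, same_shadow_count: int,
                tests_broken: bool, destructive_action: bool) -> bool:
    return (consecutive_failures >= 3
            or same_shadow_count >= 3
            or tests_broken
            or destructive_action)
```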
### Session Log Protocol
**Primary:** Emit a `run.complete` event to `.archeflow/events/<run_id>.jsonl` (see Process Logging above). The event stream is the source of truth.

**Secondary:** Also write a human-readable summary to `.archeflow/session-log.md`:
```markdown
## Task N: <description>
**Workflow:** standard | **Status:** COMPLETED/FAILED
**Cycles:** 1 of 2
**Findings:** Guardian APPROVED, Skeptic APPROVED, Sage WARNING (test names)
**Files changed:** 5 | **Tests added:** 12
**Branch:** merged to main (commit abc1234) | OR: archeflow/maker-xyz (NOT merged)
**Duration:** 8 min
**Events:** `.archeflow/events/<run_id>.jsonl` (full process log)
```
Generate the full Markdown report: `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl`
### Safety Rules

- Never force-push. Never modify main history.
- All work stays on worktree branches until explicitly merged.
- Merges use `--no-ff` — individually revertible.
- Failed tasks leave branches intact for manual inspection.
For full autonomous mode details (task queues, overnight checklists, user controls): load the archeflow:autonomous-mode skill.
## Shadow Monitoring
During orchestration, watch for shadow activation after each agent completes. Quick checklist:
| Archetype | Shadow | Quick Check |
|---|---|---|
| Explorer | Rabbit Hole | Output >2000 words without Recommendation section? |
| Creator | Over-Architect | >2 new abstractions for one feature? |
| Maker | Rogue | No test files in changeset? Files outside proposal? |
| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1? Zero approvals? |
| Skeptic | Paralytic | >7 challenges? <50% have alternatives? |
| Trickster | False Alarm | Findings in untouched code? >10 findings? |
| Sage | Bureaucrat | Review >2x code change length? |
On detection: apply correction prompt from archeflow:shadow-detection skill. On second detection of same shadow: replace agent. On 3+ shadows in same cycle: escalate to user.
## Parallel Team Orchestration
When running multiple independent tasks, spawn parallel ArcheFlow teams. Each team runs its own PDCA cycle on a separate worktree.
### Rules

- **Non-overlapping file scope:** Each team must work on different files. If two tasks touch the same file, run them sequentially.
- **Independent worktrees:** Each team's Maker gets its own worktree branch (`archeflow/team-1-maker`, `archeflow/team-2-maker`).
- **First-finished-first-merged:** Teams merge in completion order. Later teams rebase onto the updated main before their own merge.
- **Merge conflict handling:** If the rebase fails, the later team re-runs its Check phase against the merged main. If conflicts are structural, escalate to the user.
- **Max 3 parallel teams:** More causes diminishing returns and merge headaches.
### Spawning Parallel Teams

```
# Launch 2-3 teams in a single message with multiple Agent calls:
Agent(description: "🏗️ Team 1: pagination fix (fast)", ...)
Agent(description: "🏗️ Team 2: JWT auth (standard)", ...)
Agent(description: "🏗️ Team 3: logging refactor (fast)", ...)
```
Each team follows the full PDCA steps independently. The orchestrator monitors all teams and handles merges.
## Reviewer Profiles

Projects can configure which reviewers matter in `.archeflow/config.yaml`:

```yaml
reviewers:
  always: [guardian]         # Always runs
  default: [sage]            # Runs in standard + thorough
  thorough_only: [trickster] # Only in thorough
  skip: [skeptic]            # Never runs for this project
```
If no config exists, use the built-in workflow defaults. Profiles save tokens by not spawning reviewers that add little value for the specific project.
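Resolving the roster from a profile can be sketched as follows. This is a hedged sketch assuming `skip` wins over the other lists (the function name is hypothetical; profile keys match the YAML above):

```python
# Hypothetical roster resolution from a reviewer profile.
def reviewer_roster(profile: dict, workflow: str) -> list:
    roster = list(profile.get("always", []))
    if workflow in ("standard", "thorough"):
        roster += profile.get("default", [])
    if workflow == "thorough":
        roster += profile.get("thorough_only", [])
    skip = set(profile.get("skip", []))  # assumption: skip always wins
    return [r for r in roster if r not in skip]
```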
## Explorer Cache
If the same code area was explored recently, skip Explorer and reuse prior research:
Cache hit criteria: Same files affected (>70% overlap by path) AND prior research is <24 hours old AND no commits to those files since the research.
On cache hit: Show the prior research to Creator with a note: "Using cached Explorer research from [timestamp]. If the codebase changed significantly, re-run Explorer."
On cache miss: Run Explorer normally.
Cache is stored in `.archeflow/explorer-cache/` as timestamped markdown files. The orchestrator checks for matches before spawning Explorer.
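The three hit criteria combine into one predicate. An illustrative sketch, assuming Unix-second timestamps and that overlap is measured against the current task's file set:

```python
# Hypothetical cache-hit check: >70% path overlap, research <24h old, and no
# commits to those files since the research ran.
def explorer_cache_hit(task_files: set, cached_files: set,
                       cached_at: float, last_commit_at: float,
                       now: float) -> bool:
    if not task_files:
        return False
    overlap = len(task_files & cached_files) / len(task_files)
    fresh = (now - cached_at) < 24 * 3600
    unchanged = last_commit_at <= cached_at
    return overlap > 0.7 and fresh and unchanged
```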
## Learning from History

Track which archetypes catch real issues per project over time. After each orchestration, append to `.archeflow/metrics.jsonl`:
```jsonl
{"task": "...", "archetype": "guardian", "findings": 2, "critical": 1, "resolved": 2, "useful": true}
{"task": "...", "archetype": "skeptic", "findings": 3, "critical": 0, "resolved": 0, "useful": false}
```
A finding is useful if it was resolved (led to a code change) rather than dismissed.
After 10+ orchestrations, the orchestrator can recommend reviewer profile changes:
- "Skeptic has found 0 useful issues in 8 runs — consider moving to `skip` or `thorough_only`"
- "Guardian catches critical issues in 80% of runs — confirmed as essential"
This is advisory, not automatic. The user decides based on the data.
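The aggregation behind those recommendations can be sketched as a pass over the metrics records. The thresholds (zero useful findings, 80% useful rate, 10-run minimum) mirror the examples above but are assumptions, not fixed ArcheFlow policy:

```python
from collections import defaultdict

# Hypothetical sketch: turn .archeflow/metrics.jsonl records into advisory
# hints. Thresholds are illustrative assumptions.
def profile_hints(records: list, min_runs: int = 10) -> list:
    stats = defaultdict(lambda: [0, 0])  # archetype -> [runs, useful]
    for r in records:
        s = stats[r["archetype"]]
        s[0] += 1
        s[1] += 1 if r.get("useful") else 0
    hints = []
    for name, (runs, useful) in sorted(stats.items()):
        if runs < min_runs:
            continue  # not enough data yet
        if useful == 0:
            hints.append(f"{name}: 0 useful findings in {runs} runs - "
                         f"consider `skip` or `thorough_only`")
        elif useful / runs >= 0.8:
            hints.append(f"{name}: useful in {useful}/{runs} runs - essential")
    return hints
```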
## Orchestration Report
After completion, summarize:
```markdown
## ArcheFlow Orchestration Report
- **Task:** <description>
- **Workflow:** standard (2 cycles)
- **Cycle 1:** Guardian rejected (SQL injection in user input handler)
- **Cycle 2:** All approved after input sanitization added
- **Files changed:** 4 files, +120 -30 lines
- **Tests added:** 8 new tests
- **Branch:** archeflow/maker-<id> → merged to main
- **Metrics:** 9 agents, 4m 30s, 5 findings (4 resolved, 1 info remaining)
```