Sprint 1 — Workflow Intelligence (A1-A3):
- Conditional escalation: fast→standard on 2+ CRITICALs
- Guardian fast-path: skip remaining reviewers on clean pass
- Confidence-triggered escalation: pause/upgrade/probe on low scores
Sprint 2 — Quality Loop (B1-B2, B5-B6):
- Maker self-review checklist before submitting to Check phase
- Proposal diff ("What Changed") on cycle 2+ revisions
- Convergence detection: escalate to user if same finding persists 2 cycles
- Cross-archetype dedup: merge duplicate findings from different reviewers
Sprint 3 — Completion & Verification (B3-B4):
- Completion promise: user-defined done criteria checked in Act phase
- Post-merge verification: run tests on main, auto-revert on failure
Sprint 4 — Parallel & Scale (C1-C4):
- Parallel team orchestration: 2-3 independent teams with merge gate
- Task dependency graph in autonomous queue format
- Auto-resume on interruption via .archeflow/state.json
- Budget-aware scheduling with automatic workflow downgrade
478 lines
18 KiB
Markdown
---
name: orchestration
description: Use when executing a multi-agent orchestration (spawning archetype agents, managing PDCA cycles, coordinating worktrees, and merging results). This is the step-by-step execution guide.
---

# Orchestration Execution

This skill guides you through running a full ArcheFlow orchestration using Claude Code's native Agent tool and git worktrees.

## Step 0: Choose a Workflow

Assess the task and pick:

| Signal | Workflow |
|--------|----------|
| Small fix, low risk, single concern | `fast` (1 cycle) |
| Feature, multiple files, moderate risk | `standard` (2 cycles) |
| Security-sensitive, breaking changes, public API | `thorough` (3 cycles) |

## Workflow Adaptation Rules

The initial workflow choice is a starting point, not a commitment. These rules adapt the workflow at runtime based on signals from Creator and Guardian. Evaluate them in order after each relevant phase.

### A1: Conditional Escalation (fast → standard)

**When:** Guardian rejects with 2+ CRITICAL findings in a `fast` workflow.
**Action:** Escalate to `standard` for the next cycle: add Skeptic + Sage to the reviewer roster.
**Why:** If Guardian found serious issues, more perspectives help find root causes.
**Override:** If the workflow has already been escalated past `fast` (for example by A3), do nothing. It is already at the higher level.

### A2: Guardian Fast-Path (skip remaining reviewers)

**When:** Guardian finds 0 CRITICAL and 0 WARNING in a `standard` or `thorough` workflow.
**Action:** Skip Skeptic, Sage, and Trickster. Proceed directly to Act phase (merge).
**Why:** Guardian's security review is the strictest gate. Clean pass = safe to merge without additional review cost.
**Override:** Suppressed if A1 just triggered in the same cycle. Do not fast-path an escalated workflow.
**Log:** Note "Guardian fast-path taken, remaining reviewers skipped" in the orchestration report.

### A3: Confidence-Triggered Escalation

**When:** Creator's confidence table has any axis below 0.5.
**Action by axis:**

| Axis | Score < 0.5 Action |
|------|-------------------|
| Task understanding | **Pause.** Ask user to clarify. Do not proceed until the score reaches 0.5 or the user overrides. |
| Solution completeness | **Upgrade to standard.** Add Explorer before proceeding. |
| Risk coverage | **Spawn mini-Explorer** for the specific risky area (parallel, 5 min max). Don't block other phases. |

Multiple axes can trigger simultaneously. Handle them in parallel.

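The three rules above can be read as two small decision functions, one evaluated after the Plan phase (A3) and one after Guardian reports (A1, then A2). The following is a minimal sketch under stated assumptions: the function names, the `Adaptation` record, and the confidence dictionary keys are hypothetical and not part of ArcheFlow.

```python
from dataclasses import dataclass


@dataclass
class Adaptation:
    """Hypothetical record of what the orchestrator should do next."""
    workflow: str
    pause_for_user: bool = False
    spawn_mini_explorer: bool = False
    skip_remaining_reviewers: bool = False


def adapt_after_plan(workflow, confidence, threshold=0.5):
    """A3: react to any Creator confidence axis below the threshold."""
    result = Adaptation(workflow=workflow)
    # Missing axes are treated as fully confident (an assumption of this sketch).
    if confidence.get("task_understanding", 1.0) < threshold:
        result.pause_for_user = True
    if confidence.get("solution_completeness", 1.0) < threshold and workflow == "fast":
        result.workflow = "standard"  # never downgrade a higher workflow
    if confidence.get("risk_coverage", 1.0) < threshold:
        result.spawn_mini_explorer = True
    return result


def adapt_after_guardian(workflow, rejected, criticals, warnings):
    """A1 first, then A2. A2 is suppressed when A1 fired in the same cycle."""
    result = Adaptation(workflow=workflow)
    if workflow == "fast" and rejected and criticals >= 2:
        result.workflow = "standard"  # A1: escalate for the next cycle
        return result
    if workflow in ("standard", "thorough") and criticals == 0 and warnings == 0:
        result.skip_remaining_reviewers = True  # A2: Guardian fast-path
    return result
```

For example, `adapt_after_guardian("fast", rejected=True, criticals=2, warnings=0)` escalates to `standard` and leaves the fast-path flag unset.
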
## Step 1: Plan Phase

Spawn agents sequentially, because Creator needs Explorer's findings.

### Explorer (if standard or thorough)

**Context to include:** Task description, relevant file paths, codebase access.
**Context to exclude:** Prior proposals, review outputs, implementation details, feedback from previous cycles.

```
Agent(
  description: "🔍 Explorer: research context",
  prompt: "<task description>
    You are the EXPLORER archetype.
    Research the codebase to understand:
    1. What files and functions are involved
    2. What dependencies exist
    3. What tests currently cover this area
    4. What patterns the codebase uses
    Write your findings as a structured research report.
    Be thorough but focused: no rabbit holes.",
  subagent_type: "Explore"
)
```

### Creator

**Context to include:** Task description, Explorer's research output. On cycle 2+: prior cycle's structured feedback (see Cycle Feedback Protocol).
**Context to exclude:** Raw file contents (Explorer already summarized), git diffs, reviewer full outputs.

**Fast workflow only (no Explorer):** The Creator must perform a Mini-Reflect before proposing:
1. Restate the task in your own words (catch misunderstandings early)
2. List 3 assumptions you're making
3. Name the one risk that would cause most damage if wrong

```
Agent(
  description: "🏗️ Creator: design proposal",
  prompt: "<task description>
    You are the CREATOR archetype.
    <if fast workflow (no Explorer): Before proposing, perform a Mini-Reflect:
    1. Restate the task in one sentence
    2. List 3 assumptions you're making
    3. Name the highest-damage risk
    Then propose.>
    <if standard/thorough: Based on the research findings: <Explorer's output>>
    <if cycle 2+: Prior cycle feedback: <structured feedback, see Cycle Feedback Protocol>>
    Design a solution proposal including:
    1. Architecture decisions (with rationale)
    2. Files to create/modify (with specific changes)
    3. Alternatives considered (at least 2, with rejection rationale)
    4. Test strategy
    5. Confidence (scored by axis: task understanding, solution completeness, risk coverage)
    6. Risks you foresee
    <if cycle 2+: 7. How you addressed each unresolved issue from prior feedback>
    Be decisive. Ship a clear plan, not a menu of options.",
  subagent_type: "Plan"
)
```

## Step 2: Do Phase

Spawn Maker in an **isolated worktree** so changes don't affect main.

**Context to include:** Creator's proposal only. On cycle 2+: implementation-routed feedback from Sage/Trickster.
**Context to exclude:** Explorer's research, Guardian/Skeptic findings (those go to Creator).

```
Agent(
  description: "⚒️ Maker: implement proposal",
  prompt: "<task description>
    You are the MAKER archetype.
    Implement this proposal: <Creator's output>
    <if cycle 2+: Implementation feedback from prior cycle: <Sage/Trickster findings only>>
    Rules:
    1. Follow the proposal exactly, don't redesign
    2. Write tests for every behavioral change
    3. Commit with descriptive messages
    4. Run existing tests, nothing may break
    5. If the proposal is unclear, implement your best interpretation and note it
    Do NOT skip tests. Do NOT refactor unrelated code.

    BEFORE finishing, run the Self-Review Checklist:
    1. Did I change ALL files listed in the proposal's Changes section?
    2. Did I add tests for each behavioral change?
    3. Are there files in my diff NOT listed in the proposal? If yes, revert them.
    4. Do all existing tests still pass?
    Report any gaps in your Implementation summary.",
  isolation: "worktree",
  mode: "bypassPermissions"
)
```

**Critical:** The Maker MUST commit its changes before finishing. Uncommitted changes in a worktree are lost.

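## Step 3: Check Phase

Spawn reviewers **in parallel**; each reads the Maker's changes independently. The A2 fast-path can only skip reviewers that have not started yet, so in `standard` and `thorough` workflows wait for Guardian's verdict before spawning the others if you want that saving.
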
### Guardian

**Context to include:** Maker's git diff, proposal risk section only.
**Context to exclude:** Explorer's research, full proposal, other reviewer outputs.

```
Agent(
  description: "🛡️ Guardian: security and risk review",
  prompt: "You are the GUARDIAN archetype.
    Review the changes in branch: <maker's branch>
    Assess:
    1. Security vulnerabilities (injection, auth bypass, data exposure)
    2. Reliability risks (error handling, edge cases, race conditions)
    3. Breaking changes (API compatibility, schema migrations)
    4. Dependency risks (new deps, version conflicts)
    Output: APPROVED or REJECTED with specific findings.
    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Categories: security, reliability, design, breaking-change, dependency
    Be rigorous but practical: flag real risks, not theoretical ones."
)
```

### Skeptic (if standard or thorough)

**Context to include:** Creator's proposal (focus on assumptions section).
**Context to exclude:** Git diff details, Explorer's research, other reviewer outputs.

```
Agent(
  description: "🤔 Skeptic: challenge assumptions",
  prompt: "You are the SKEPTIC archetype.
    Review the proposal: <Creator's proposal>
    Challenge:
    1. Assumptions in the design: what if they're wrong?
    2. Alternative approaches not considered
    3. Edge cases not tested
    4. Scalability concerns
    Output: APPROVED or REJECTED with counterarguments.
    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Categories: design, quality, testing, scalability
    Be constructive: every challenge must include a suggested alternative."
)
```

### Sage (if standard or thorough)

**Context to include:** Creator's proposal, Maker's git diff, implementation summary.
**Context to exclude:** Explorer's raw research, other reviewer outputs.

```
Agent(
  description: "📚 Sage: holistic quality review",
  prompt: "You are the SAGE archetype.
    Review the changes in branch: <maker's branch>
    Evaluate holistically:
    1. Code quality (readability, maintainability, simplicity)
    2. Test coverage (are the tests meaningful, not just present?)
    3. Documentation (does the change need docs?)
    4. Consistency with codebase patterns
    Output: APPROVED or REJECTED with quality findings.
    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Categories: quality, testing, design, consistency
    Judge like a senior engineer doing a PR review."
)
```

### Trickster (if thorough only)

**Context to include:** Maker's git diff only.
**Context to exclude:** Everything else: proposal, research, other reviews.

```
Agent(
  description: "🃏 Trickster: adversarial testing",
  prompt: "You are the TRICKSTER archetype.
    Try to break the changes in branch: <maker's branch>
    Attack vectors:
    1. Malformed input, boundary values, empty/null/huge data
    2. Concurrency and race conditions
    3. Error path exploitation
    4. Dependency failure scenarios
    Output: APPROVED or REJECTED with edge cases found.
    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Categories: security, reliability, testing
    Think like a QA engineer who gets paid per bug found."
)
```

## Step 4: Act Phase

Collect all reviewer outputs and decide.

### Completion Promise (optional)

If the user defined explicit done criteria with the task, check them now:

```
Completion criteria: <test command passes> AND <Guardian approves>
Example: "done when pytest passes and Guardian approves with 0 CRITICAL"
```

If completion criteria are defined, **all criteria must pass**. Reviewer approval alone is not sufficient. If tests fail but reviewers approved, cycle back with "tests failing" as feedback to Creator.

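As an illustration of the "all criteria must pass" rule, here is a minimal sketch that evaluates the example promise ("pytest passes and Guardian approves with 0 CRITICAL"). The function names and the returned tuple are assumptions for illustration; the default test command simply mirrors the pytest example above.

```python
import subprocess
import sys


def tests_pass(command=(sys.executable, "-m", "pytest")):
    """Run the user's test command; exit code 0 counts as passing."""
    return subprocess.run(list(command), capture_output=True).returncode == 0


def completion_met(test_command, guardian_approved, guardian_criticals):
    """Return (done, failed_criteria). Every criterion must hold."""
    checks = {
        "tests pass": tests_pass(test_command),
        "Guardian approves": guardian_approved,
        "0 CRITICAL": guardian_criticals == 0,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (not failed, failed)
```

The list of failed criteria can be passed back to Creator as feedback, for example `["tests pass"]` when reviewers approved but the suite is red.
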
### All Approved (and completion criteria met)
1. Merge the Maker's worktree branch into the target branch
2. **Post-merge verification:** Run the project's test suite on the merged branch
   - Tests pass → proceed to step 3
   - Tests fail → **auto-revert** the merge commit, report the failure, and cycle back with "integration test failure on main" as feedback
3. Report: what was implemented, what was reviewed, any warnings noted
4. Clean up the worktree
5. Record metrics (see Orchestration Metrics)

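Steps 1 and 2 above can be sketched with plain git commands. This is a sketch under assumptions, not the ArcheFlow implementation: `merge_and_verify` and its `run` parameter are hypothetical names, and the injected runner exists only so the logic can be exercised without a real repository. The git flags used (`merge --no-ff --no-edit`, `revert -m 1 --no-edit`) are standard.

```python
import subprocess


def _run(cmd):
    """Run a command and return its exit code."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def merge_and_verify(branch, test_cmd, run=_run):
    """Merge with --no-ff, run the tests, and revert the merge if they fail."""
    if run(["git", "merge", "--no-ff", "--no-edit", branch]) != 0:
        return "merge-failed"  # conflict handling is out of scope for this sketch
    if run(test_cmd) == 0:
        return "merged"
    # -m 1 keeps the target branch's side, undoing the merge as a new commit,
    # so history is never rewritten.
    run(["git", "revert", "-m", "1", "--no-edit", "HEAD"])
    return "reverted"
```

A `"reverted"` result is the signal to cycle back with "integration test failure on main" as feedback.
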
### Issues Found (and cycles remaining)
1. Build structured feedback using the Cycle Feedback Protocol below
2. Go back to Step 1 (Plan) with the feedback
3. Creator revises the proposal, addressing each unresolved issue
4. Maker re-implements in a fresh worktree
5. Reviewers check again

### Max Cycles Reached with Unresolved Issues
1. Report all unresolved findings to the user
2. Present the best implementation so far (on its branch)
3. Let the user decide: merge as-is, fix manually, or abandon

---

## Cycle Feedback Protocol

After the Check phase, build structured feedback for the next cycle. This replaces dumping raw reviewer output.

### 1. Extract Findings

Parse each reviewer's output into the standardized format:

```markdown
## Cycle N Feedback

### Unresolved Issues
| Source | Severity | Category | Issue | Route to |
|--------|----------|----------|-------|----------|
| Guardian | CRITICAL | security | SQL injection in user input | Creator |
| Skeptic | WARNING | design | Assumes single-tenant only | Creator |
| Sage | WARNING | quality | Test names don't describe behavior | Maker |
| Trickster | CRITICAL | reliability | Empty string bypasses validation | Creator |

### Resolved (from cycle N-1)
| Source | Issue | Resolution |
|--------|-------|------------|
| Guardian | Missing rate limit | Added rate limiter middleware |
```

### 2. Route Feedback

Not all findings go to the same agent:

| Finding source | Routes to | Rationale |
|----------------|-----------|-----------|
| Guardian (security, breaking-change) | **Creator** | Design must change |
| Skeptic (design, scalability) | **Creator** | Assumptions need revision |
| Sage (quality, consistency) | **Maker** | Implementation refinement |
| Trickster (reliability, testing) | **Creator** if design flaw, **Maker** if test gap | Depends on root cause |

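The routing table can be expressed as a small lookup. The sketch below assumes findings are dictionaries with `source`, `issue`, and an optional `design_flaw` flag that the orchestrator sets for Trickster findings; these field names are hypothetical.

```python
def route(source, design_flaw=False):
    """Return the agent that should receive a finding from this reviewer."""
    if source in ("Guardian", "Skeptic"):
        return "Creator"
    if source == "Sage":
        return "Maker"
    if source == "Trickster":
        # Root cause decides: design flaw goes to Creator, test gap to Maker.
        return "Creator" if design_flaw else "Maker"
    raise ValueError(f"unknown reviewer: {source}")


def split_feedback(findings):
    """Group issue descriptions into one bundle per destination agent."""
    bundles = {"Creator": [], "Maker": []}
    for finding in findings:
        target = route(finding["source"], finding.get("design_flaw", False))
        bundles[target].append(finding["issue"])
    return bundles
```

Each bundle then becomes the cycle 2+ feedback section of the corresponding agent's prompt.
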
### 3. Track Resolution

Compare cycle N findings against cycle N-1:
- If a prior finding no longer appears in the same category → mark **resolved**
- If a prior finding persists → it stays **unresolved** with an incremented cycle count
- If new findings appear → add as new unresolved issues

This prevents regression and gives the Creator/Maker a clear list of what to address.

### 4. Convergence Detection

If the **same finding** (same category + same file location) appears **unresolved in 2 consecutive cycles**, escalate to user:

> "Finding persists across 2 cycles: [Guardian] CRITICAL security: SQL injection in src/auth.ts:48. This may need human judgment or a different approach."

Do not cycle again blindly. The issue is likely structural (wrong design, not wrong implementation) and needs human input.

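Resolution tracking and convergence detection both reduce to comparing two sets of findings by an identity key. A minimal sketch follows, assuming the key is `(category, file:line)` as described above; the `Finding` type and function names are hypothetical. Matching on an exact line number is brittle when the Maker's edits shift lines, so a real implementation may want a looser key.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    source: str
    severity: str
    category: str
    location: str  # "file:line"
    description: str


def _key(finding):
    return (finding.category, finding.location)


def track(previous, current):
    """Split findings into (resolved, persisting, new) between two cycles."""
    prev_keys = {_key(f) for f in previous}
    cur_keys = {_key(f) for f in current}
    resolved = [f for f in previous if _key(f) not in cur_keys]
    persisting = [f for f in current if _key(f) in prev_keys]
    new = [f for f in current if _key(f) not in prev_keys]
    return resolved, persisting, new


def escalation_messages(persisting):
    """One user-facing message per finding that survived 2 consecutive cycles."""
    return [
        f"Finding persists across 2 cycles: [{f.source}] {f.severity} "
        f"{f.category}: {f.description} in {f.location}."
        for f in persisting
    ]
```

If `escalation_messages` returns anything, stop cycling and hand the list to the user.
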
### 5. Cross-Archetype Dedup

If two reviewers raise the same issue (same file + same category + similar description), merge into one finding in the consolidated output:

```
| Guardian + Skeptic | CRITICAL | security | Input not sanitized (src/api.ts:30) | Add validation |
```

Don't double-count in severity tallies. Route to the higher-priority destination (Creator over Maker).

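One way to implement the merge is sketched below. "Similar description" is not defined in this guide, so the sketch assumes a `difflib.SequenceMatcher` ratio of at least 0.6; that threshold, the dictionary field names, and keeping the highest severity on merge are all assumptions.

```python
from difflib import SequenceMatcher

SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "CRITICAL": 2}


def _similar(a, b, threshold=0.6):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def _file_of(location):
    """'src/api.ts:30' -> 'src/api.ts'."""
    return location.rsplit(":", 1)[0]


def dedup(findings):
    """Merge findings that share file and category and have similar text."""
    merged = []
    for f in findings:
        for m in merged:
            if (_file_of(m["location"]) == _file_of(f["location"])
                    and m["category"] == f["category"]
                    and _similar(m["description"], f["description"])):
                if f["source"] not in m["sources"]:
                    m["sources"].append(f["source"])
                # Keep the highest severity so the tally counts it once.
                if SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[m["severity"]]:
                    m["severity"] = f["severity"]
                # Creator outranks Maker as a destination.
                if f["route"] == "Creator":
                    m["route"] = "Creator"
                break
        else:
            merged.append({
                "sources": [f["source"]],
                "severity": f["severity"],
                "category": f["category"],
                "location": f["location"],
                "description": f["description"],
                "route": f["route"],
            })
    return merged
```

Joining `sources` with `" + "` reproduces the `Guardian + Skeptic` label used in the consolidated row.
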
---

## Orchestration Metrics

Track lightweight metrics throughout the orchestration. No token counting (unreliable from the skill layer), just timing and outcomes.

### Per-Phase Logging

After each phase completes, note:

```
| Phase | Duration | Agents | Outcome |
|-------|----------|--------|---------|
| Plan | 45s | 2 | Proposal ready (confidence: 0.8) |
| Do | 90s | 1 | 4 files changed, 8 tests added |
| Check | 60s | 3 | 1 REJECTED (Guardian), 2 APPROVED |
| Act | n/a | n/a | Cycle back → feedback built |
```

### Orchestration Summary

At orchestration end, include in the report:

```markdown
## Orchestration Metrics
| Metric | Value |
|--------|-------|
| Workflow | standard |
| Cycles | 2 of 2 |
| Total duration | 4m 30s |
| Agents spawned | 9 |
| Findings (total) | 5 |
| Findings (critical) | 1 |
| Findings (resolved) | 4 |
| Shadow detections | 0 |
```

Use this data to calibrate future workflow selection. If `fast` workflows consistently finish without a revision cycle, the tasks were well-scoped.

---

## Autonomous Mode

When running unattended (overnight sessions, batch queues), add these behaviors to the orchestration loop:

### Between-Task Checkpoint

After each task completes (success or failure):
1. **Commit and push** all changes immediately
2. **Update session log** at `.archeflow/session-log.md` with task outcome
3. **Check stop conditions** before starting next task:
   - 3 consecutive failures → STOP
   - Shadow escalation (same shadow 3+ times) → STOP
   - Test suite broken after merge → REVERT and STOP
   - Destructive action detected → STOP

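The stop conditions in step 3 can be checked with one function between tasks. A minimal sketch, assuming the orchestrator keeps a list of task outcomes (most recent last) and a per-shadow detection counter; the function name and argument shapes are hypothetical.

```python
def stop_reason(outcomes, shadow_counts,
                tests_broken_after_merge=False, destructive_action=False):
    """Return a reason string if the queue must stop, else None."""
    # The last three tasks all failed.
    if len(outcomes) >= 3 and all(o == "FAILED" for o in outcomes[-3:]):
        return "STOP: 3 consecutive failures"
    # The same shadow was detected three or more times.
    for shadow, count in shadow_counts.items():
        if count >= 3:
            return f"STOP: shadow escalation ({shadow} x{count})"
    if tests_broken_after_merge:
        return "REVERT and STOP: test suite broken after merge"
    if destructive_action:
        return "STOP: destructive action detected"
    return None
```

A `None` result means the next task in the queue may start.
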
### Session Log Protocol

Write to `.archeflow/session-log.md` after each task:

```markdown
## Task N: <description>
**Workflow:** standard | **Status:** COMPLETED/FAILED
**Cycles:** 1 of 2
**Findings:** Guardian APPROVED, Skeptic APPROVED, Sage WARNING (test names)
**Files changed:** 5 | **Tests added:** 12
**Branch:** merged to main (commit abc1234) | OR: archeflow/maker-xyz (NOT merged)
**Duration:** 8 min
```

### Safety Rules
- Never force-push. Never modify main history.
- All work stays on worktree branches until explicitly merged
- Merges use `--no-ff` so each one is individually revertable
- Failed tasks leave branches intact for manual inspection

For full autonomous mode details (task queues, overnight checklists, user controls): load the `archeflow:autonomous-mode` skill.

---

## Shadow Monitoring

During orchestration, watch for shadow activation after each agent completes. Quick checklist:

| Archetype | Shadow | Quick Check |
|-----------|--------|-------------|
| Explorer | Rabbit Hole | Output >2000 words without Recommendation section? |
| Creator | Over-Architect | >2 new abstractions for one feature? |
| Maker | Rogue | No test files in changeset? Files outside proposal? |
| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1? Zero approvals? |
| Skeptic | Paralytic | >7 challenges? <50% have alternatives? |
| Trickster | False Alarm | Findings in untouched code? >10 findings? |
| Sage | Bureaucrat | Review >2x code change length? |

On detection: apply correction prompt from `archeflow:shadow-detection` skill. On second detection of same shadow: replace agent. On 3+ shadows in same cycle: escalate to user.

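Several of the quick checks are simple enough to automate. The sketch below covers four of them and makes assumptions the table does not specify: word counts come from whitespace splitting, a "Recommendation section" is detected by a crude substring test, either half of a two-part check is enough to flag, and lengths for the Sage check are measured in lines. All function names are hypothetical.

```python
def explorer_rabbit_hole(report):
    """More than 2000 words and no mention of a recommendation."""
    return len(report.split()) > 2000 and "recommendation" not in report.lower()


def skeptic_paralytic(challenges, with_alternatives):
    """More than 7 challenges, or fewer than half offer an alternative."""
    if challenges == 0:
        return False
    return challenges > 7 or with_alternatives / challenges < 0.5


def trickster_false_alarm(findings, in_untouched_code):
    """More than 10 findings, or any finding in code the Maker did not touch."""
    return findings > 10 or in_untouched_code > 0


def sage_bureaucrat(review_lines, diff_lines):
    """Review is more than twice as long as the change it reviews."""
    return review_lines > 2 * diff_lines
```

A `True` result is only a prompt to look closer; the correction step still comes from the `archeflow:shadow-detection` skill.
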
---

## Parallel Team Orchestration

When running multiple independent tasks, spawn parallel ArcheFlow teams. Each team runs its own PDCA cycle on a separate worktree.

### Rules

1. **Non-overlapping file scope:** Each team must work on different files. If two tasks touch the same file, run them sequentially.
2. **Independent worktrees:** Each team's Maker gets its own worktree branch (`archeflow/team-1-maker`, `archeflow/team-2-maker`).
3. **First-finished-first-merged:** Teams merge in completion order. Later teams rebase onto the updated main before their own merge.
4. **Merge conflict handling:** If rebase fails, the later team re-runs its Check phase against the merged main. If conflicts are structural, escalate to user.
5. **Max 3 parallel teams:** More causes diminishing returns and merge headaches.

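Rules 1 and 5 together amount to packing tasks into batches of at most three teams with disjoint file sets. A minimal greedy sketch, assuming each task declares the files it expects to touch up front; `plan_batches` and the task dictionary shape are hypothetical.

```python
def plan_batches(tasks, max_parallel=3):
    """Group tasks into batches that can run in parallel.

    tasks maps a task name to the files it will touch. Tasks in the same
    batch never share a file; batches themselves run one after another.
    """
    batches = []
    for name, files in tasks.items():
        files = set(files)
        for batch in batches:
            fits = len(batch) < max_parallel
            if fits and all(files.isdisjoint(other) for _, other in batch):
                batch.append((name, files))
                break
        else:
            # No existing batch can take this task: start a new one.
            batches.append([(name, files)])
    return [[name for name, _ in batch] for batch in batches]
```

Greedy placement is not guaranteed to produce the fewest batches, but it keeps overlapping tasks apart and respects the team limit.
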
### Spawning Parallel Teams

```
# Launch 2-3 teams in a single message with multiple Agent calls:
Agent(description: "🏗️ Team 1: pagination fix (fast)", ...)
Agent(description: "🏗️ Team 2: JWT auth (standard)", ...)
Agent(description: "🏗️ Team 3: logging refactor (fast)", ...)
```

Each team follows the full PDCA steps independently. The orchestrator monitors all teams and handles merges.

---

## Orchestration Report

After completion, summarize:

```markdown
## ArcheFlow Orchestration Report
- **Task:** <description>
- **Workflow:** standard (2 cycles)
- **Cycle 1:** Guardian rejected (SQL injection in user input handler)
- **Cycle 2:** All approved after input sanitization added
- **Files changed:** 4 files, +120 -30 lines
- **Tests added:** 8 new tests
- **Branch:** archeflow/maker-<id> → merged to main
- **Metrics:** 9 agents, 4m 30s, 5 findings (4 resolved, 1 info remaining)
```