From c8bd55d97cd4b2b4f52fa667ba83d40f54ec7ba4 Mon Sep 17 00:00:00 2001 From: Christian Nennemann Date: Mon, 6 Apr 2026 20:44:46 +0200 Subject: [PATCH] =?UTF-8?q?refactor:=20consolidate=20run=20skill=20?= =?UTF-8?q?=E2=80=94=20merge=208=20skills=20into=20one=20self-contained=20?= =?UTF-8?q?PDCA=20orchestrator?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Merge run + orchestration + plan-phase + do-phase + artifact-routing + process-log + attention-filters + convergence + effectiveness into a single 459-line run/SKILL.md. Before: run skill (890 lines) + 3 prerequisites (~1,300 lines) = ~2,200 lines of context. After: one self-contained skill (459 lines) with zero prerequisites. Preserved: PDCA flow, workflow selection, adaptation rules A1-A3, agent prompts, attention filters, feedback routing, convergence detection, effectiveness scoring, shadow monitoring, pipeline strategy, event reference, artifact naming. Removed: verbose bash code blocks, shell variable tracking, resolve_model() function, lib validation loops, evidence validation bash, redundant event emission blocks. --- skills/artifact-routing/SKILL.md | 289 --------- skills/convergence/SKILL.md | 249 ------- skills/do-phase/SKILL.md | 193 ------ skills/effectiveness/SKILL.md | 200 ------ skills/orchestration/SKILL.md | 634 ------------------ skills/plan-phase/SKILL.md | 175 ----- skills/process-log/SKILL.md | 278 -------- skills/run/SKILL.md | 1039 +++++++++--------------------- 8 files changed, 304 insertions(+), 2753 deletions(-) delete mode 100644 skills/artifact-routing/SKILL.md delete mode 100644 skills/convergence/SKILL.md delete mode 100644 skills/do-phase/SKILL.md delete mode 100644 skills/effectiveness/SKILL.md delete mode 100644 skills/orchestration/SKILL.md delete mode 100644 skills/plan-phase/SKILL.md delete mode 100644 skills/process-log/SKILL.md diff --git a/skills/artifact-routing/SKILL.md b/skills/artifact-routing/SKILL.md deleted file mode 100644 index 5530752..0000000 --- a/skills/artifact-routing/SKILL.md +++ /dev/null @@ -1,289 +0,0 @@ ---- -name: artifact-routing -description: | - Inter-phase artifact protocol for ArcheFlow runs. Defines how artifacts are named, stored, - routed between agents, and archived across PDCA cycles. Ensures each agent receives exactly - the context it needs — no more, no less. - Automatically loaded by archeflow:run - User: "What does the Maker receive as context?" ---- - -# Artifact Routing — Inter-Phase Context Protocol - -Every ArcheFlow run produces artifacts — research notes, proposals, diffs, reviews, feedback. This skill defines how those artifacts are named, where they live, what each agent receives, and how they are preserved across cycles. - -## Artifact Directory Structure - -``` -.archeflow/artifacts// -├── plan-explorer.md # Explorer research output -├── plan-creator.md # Creator proposal/outline -├── do-maker.md # Maker implementation summary -├── do-maker-files.txt # List of files created/modified (one path per line) -├── check-guardian.md # Guardian review verdict + findings -├── check-sage.md # Sage review (if present) -├── check-skeptic.md # Skeptic review (if present) -├── check-trickster.md # Trickster review (if present) -├── act-feedback.md # Structured feedback for next cycle (Cycle Feedback Protocol) -├── act-fixes.jsonl # Applied fixes log (one JSON line per fix) -├── cycle-1/ # Archived artifacts from cycle 1 -│ ├── plan-explorer.md -│ ├── plan-creator.md -│ ├── do-maker.md -│ ├── do-maker-files.txt -│ ├── check-guardian.md -│ ├── check-sage.md -│ └── act-feedback.md -└── cycle-2/ # Archived artifacts from cycle 2 (if cycle 3 starts) - └── ... -``` - -## Naming Convention - -Artifacts follow the pattern: `-.` - -| Phase | Agent | Filename | Format | -|-------|-------|----------|--------| -| plan | explorer | `plan-explorer.md` | Markdown research report | -| plan | creator | `plan-creator.md` | Markdown proposal with confidence scores | -| plan | mini-explorer | `plan-mini-explorer.md` | Focused risk research (only if confidence gate triggers) | -| do | maker | `do-maker.md` | Markdown implementation summary | -| do | maker | `do-maker-files.txt` | Plain text, one file path per line | -| check | guardian | `check-guardian.md` | Markdown verdict + findings table | -| check | sage | `check-sage.md` | Markdown verdict + findings table | -| check | skeptic | `check-skeptic.md` | Markdown verdict + findings table | -| check | trickster | `check-trickster.md` | Markdown verdict + findings table | -| act | (orchestrator) | `act-feedback.md` | Structured feedback (see Cycle Feedback Protocol) | -| act | (orchestrator) | `act-fixes.jsonl` | JSONL fix log | - -**Rule:** Never invent new artifact names during a run. If a reviewer is skipped (A2 fast-path, reviewer profile), its artifact simply does not exist. Downstream phases check for file existence before reading. - ---- - -## Context Injection Rules - -Each agent receives a filtered subset of artifacts. This is the **attention filter** — it controls what context is injected into the agent's prompt. - -### Plan Phase - -| Agent | Receives | Does NOT receive | -|-------|----------|-----------------| -| **Explorer** | Task description, relevant file paths, codebase access | Prior proposals, review outputs, implementation details | -| **Creator** (cycle 1) | Task description, `plan-explorer.md` (if exists) | Raw file contents (Explorer summarized them), git diffs | -| **Creator** (cycle 2+) | Task description, `plan-explorer.md`, `act-feedback.md` (Creator-routed findings only) | Raw reviewer outputs, Maker-routed findings | - -**Creator context injection template (cycle 2+):** -```markdown -## Task - - -## Research (from Explorer) - - -## Feedback from Prior Cycle - - -Note: Address each unresolved issue listed above. Explain how your revised proposal resolves it. -``` - -### Do Phase - -| Agent | Receives | Does NOT receive | -|-------|----------|-----------------| -| **Maker** (cycle 1) | `plan-creator.md` (the proposal), `plan-mini-explorer.md` (if exists) | `plan-explorer.md`, reviewer outputs, raw task description | -| **Maker** (cycle 2+) | `plan-creator.md`, `plan-mini-explorer.md` (if exists), Maker-routed findings from `act-feedback.md` | Explorer research, Guardian/Skeptic findings (those went to Creator) | - -**Maker context injection template (cycle 2+):** -```markdown -## Proposal - - -## Implementation Feedback from Prior Cycle - - -Note: The proposal has been revised to address design-level issues. Focus on the implementation -feedback items above (code quality, test gaps, consistency). -``` - -**Why Maker doesn't get Explorer output:** The Creator already distilled Explorer's research into a concrete proposal. Giving Maker raw research causes scope creep and "Rogue" shadow activation. - -### Check Phase - -| Agent | Receives | Does NOT receive | -|-------|----------|-----------------| -| **Guardian** | Maker's git diff, risk section from `plan-creator.md` | Full proposal, Explorer research, other reviewer outputs | -| **Skeptic** | `plan-creator.md` (assumptions focus) | Git diff details, Explorer research, other reviewer outputs | -| **Sage** | `plan-creator.md`, Maker's git diff, `do-maker.md` | Explorer research, other reviewer outputs | -| **Trickster** | Maker's git diff only | Everything else | - -**Guardian context injection template:** -```markdown -## Changes to Review - - -## Risk Assessment (from proposal) - - -Review these changes for security, reliability, breaking changes, and dependency risks. -``` - -**Skeptic context injection template:** -```markdown -## Proposal to Challenge - - -Focus on assumptions, alternatives not considered, edge cases, and scalability. -``` - -**Sage context injection template:** -```markdown -## Proposal - - -## Implementation Summary - - -## Changes - - -Evaluate code quality, test coverage, documentation, and codebase consistency. -``` - -**Trickster context injection template:** -```markdown -## Changes to Attack - - -Try to break this. Malformed input, boundaries, concurrency, error paths, dependency failures. -``` - -### Act Phase - -No agents are spawned in Act. The orchestrator reads all `check-*.md` artifacts directly. - ---- - -## Feedback Routing - -> **This is the canonical routing table.** Other skills (orchestration, act-phase) must match this table exactly. When updating routing rules, update this table first, then sync the others. - -When building `act-feedback.md` after the Check phase, route each finding to the right agent for the next cycle: - -| Finding Source | Finding Category | Routes To | Rationale | -|---------------|-----------------|-----------|-----------| -| Guardian | security, breaking-change | **Creator** | Design must change | -| Guardian | reliability, dependency | **Creator** | Architectural decision needed | -| Skeptic | design, scalability | **Creator** | Assumptions need revision | -| Sage | quality, consistency | **Maker** | Implementation refinement | -| Sage | testing | **Maker** | Test gap, not design flaw | -| Trickster | reliability (design flaw) | **Creator** | Needs redesign | -| Trickster | reliability (test gap) | **Maker** | Needs more tests | -| Trickster | testing | **Maker** | Edge case not covered | - -**Disambiguation rule:** When in doubt: if the fix requires changing the approach, route to Creator. If it requires changing the code within the existing approach, route to Maker. - -### Feedback File Format - -`act-feedback.md` is split into two sections so each agent can be given only its portion: - -```markdown -# Cycle Feedback - -## Creator-Routed Issues -| # | Source | Severity | Category | Issue | Suggested Fix | -|---|--------|----------|----------|-------|---------------| -| 1 | Guardian | CRITICAL | security | SQL injection in user input | Add parameterized queries | -| 2 | Skeptic | WARNING | design | Assumes single-tenant only | Add tenant isolation | - -## Maker-Routed Issues -| # | Source | Severity | Category | Issue | Suggested Fix | -|---|--------|----------|----------|-------|---------------| -| 3 | Sage | WARNING | quality | Test names don't describe behavior | Rename to describe expected outcome | -| 4 | Sage | INFO | consistency | Import order doesn't match codebase style | Re-order imports | - -## Resolved (from prior cycles) -| # | Source | Issue | Resolution | Resolved In | -|---|--------|-------|------------|-------------| -| 1 | Guardian | Missing rate limit | Added rate limiter middleware | Cycle 1 | - -## Convergence Warnings - -``` - -When injecting feedback into Creator's prompt, include **only** the "Creator-Routed Issues" section. -When injecting feedback into Maker's prompt, include **only** the "Maker-Routed Issues" section. - ---- - -## Cycle Archiving - -When a PDCA cycle completes and a new cycle begins, archive the current artifacts so they are preserved and the working directory is clean for the next iteration. - -### Archive Procedure - -At the end of each cycle (before starting the next): - -```bash -RUN_DIR=".archeflow/artifacts/${RUN_ID}" -ARCHIVE_DIR="${RUN_DIR}/cycle-${CYCLE}" - -mkdir -p "$ARCHIVE_DIR" - -# Copy all phase artifacts to archive -cp "${RUN_DIR}"/plan-*.md "$ARCHIVE_DIR/" 2>/dev/null || true -cp "${RUN_DIR}"/do-*.md "$ARCHIVE_DIR/" 2>/dev/null || true -cp "${RUN_DIR}"/do-*.txt "$ARCHIVE_DIR/" 2>/dev/null || true -cp "${RUN_DIR}"/check-*.md "$ARCHIVE_DIR/" 2>/dev/null || true -cp "${RUN_DIR}"/act-feedback.md "$ARCHIVE_DIR/" 2>/dev/null || true -``` - -**Do NOT delete** the working-level artifacts after archiving. The next cycle's agents need `act-feedback.md` and `plan-explorer.md` (Explorer cache may reuse prior research). Old artifacts in the working directory get overwritten when the new cycle's agents produce their outputs. - -### Archive Access - -Archived artifacts are read-only references. Use them for: -- **Resolution tracking:** Compare `cycle-1/check-guardian.md` findings against `cycle-2/check-guardian.md` to detect resolved/persisting issues -- **Convergence detection:** Same finding in `cycle-N/act-feedback.md` and `cycle-N+1/act-feedback.md` → escalate to user -- **Post-hoc analysis:** Understanding how a solution evolved across cycles - ---- - -## Artifact Existence Checks - -Before injecting an artifact into an agent's context, always check if the file exists. Missing artifacts are expected in certain workflows: - -| Artifact | Missing when | -|----------|-------------| -| `plan-explorer.md` | Fast workflow (no Explorer) | -| `plan-mini-explorer.md` | Confidence gate did not trigger for risk coverage | -| `check-skeptic.md` | Fast workflow, or A2 fast-path taken | -| `check-sage.md` | Fast workflow, or A2 fast-path taken | -| `check-trickster.md` | Non-thorough workflow, or A2 fast-path taken | -| `act-feedback.md` | Cycle 1 (no prior feedback exists) | -| `act-fixes.jsonl` | Cycle 1, or no fixes applied | - -**Rule:** Never fail because an optional artifact is missing. Check existence, skip injection if absent, and note what was skipped in the event data. - ---- - -## Git Diff as Artifact - -The Maker's git diff is not saved as a file — it is generated on-the-fly from the Maker's worktree branch: - -```bash -git diff main... -``` - -This ensures reviewers always see the actual current diff, not a stale snapshot. The diff is injected directly into reviewer prompts, not saved to disk. - -Exception: `do-maker-files.txt` IS saved to disk (just the file list, not the full diff) for quick reference by the orchestrator and for archiving purposes. - ---- - -## Design Principles - -1. **Minimal context per agent.** Each agent gets only what it needs. Over-injection causes distraction, shadow activation, and wasted tokens. -2. **Artifacts are the handoff mechanism.** Agents never communicate directly. All inter-agent data flows through saved artifacts. -3. **Files over memory.** Everything is on disk. If a session crashes, artifacts survive. A `--start-from` resume reads artifacts, not session state. -4. **Overwrite, don't accumulate.** Working-level artifacts get overwritten each cycle. Archives preserve history. This keeps the working directory simple. -5. **Check before inject.** Always verify artifact existence. Gracefully handle missing optional artifacts. diff --git a/skills/convergence/SKILL.md b/skills/convergence/SKILL.md deleted file mode 100644 index ebab8e2..0000000 --- a/skills/convergence/SKILL.md +++ /dev/null @@ -1,249 +0,0 @@ ---- -name: convergence -description: | - Detects convergence, stalling, and oscillation in multi-cycle PDCA runs. Prevents wasted cycles - by stopping early when findings are not being resolved or are bouncing between cycles. - Automatically loaded during Act phase before exit decision - User: "Is the run converging?" ---- - -# Convergence Detection - -In multi-cycle PDCA runs, the Act phase must decide whether another cycle will help or just waste tokens. This skill provides the analysis: are findings being resolved (converging), staying the same (stalling), or bouncing back (oscillating)? - -## When It Runs - -Convergence analysis runs **after the Check phase completes and before the Act phase exit decision**. It requires at least 2 cycles of data — on cycle 1, it is skipped (no comparison baseline). - -``` -Check phase → Convergence Analysis → Act phase exit decision -``` - ---- - -## Step 1: Finding Comparison - -Extract findings from the current cycle and compare against the previous cycle. - -### Data Sources - -- **Current cycle findings:** Parsed from `check-*.md` artifacts in `.archeflow/artifacts//` -- **Previous cycle findings:** Parsed from `check-*.md` artifacts in `.archeflow/artifacts//cycle-/` - -Each finding is identified by a composite key: `source + category + file_location + description_keywords`. - -### Finding Categories - -Every finding from the current cycle is classified into exactly one category: - -| Category | Definition | -|----------|------------| -| **NEW** | Finding not present in any previous cycle | -| **RESOLVED** | Was present in the previous cycle, absent in the current cycle | -| **PERSISTENT** | Present in both the current and previous cycle (same key) | -| **REGRESSED** | Was RESOLVED in the previous cycle (was present in N-2, absent in N-1), but returned in the current cycle | - -### Matching Algorithm - -Two findings match if: -1. Same `source` archetype (guardian, sage, etc.) -2. Same `category` (security, reliability, quality, etc.) -3. Same or overlapping file location (same file, line within 10 lines) -4. 50%+ keyword overlap in description (lowercase, strip punctuation) - -All four conditions must hold. This prevents false matches across unrelated findings. - ---- - -## Step 2: Convergence Score - -Calculate a convergence score from the categorized findings: - -``` -convergence = resolved_count / (resolved_count + new_count + regressed_count) -``` - -If the denominator is 0 (no resolved, no new, no regressed — only persistent), the score is `0.0` (stalled, not converging). - -### Score Interpretation - -| Score Range | Status | Meaning | -|-------------|--------|---------| -| > 0.8 | **Converging** | Most issues being resolved, few new ones introduced | -| 0.5 - 0.8 | **Stalling** | Fixing roughly as many as introducing | -| < 0.5 | **Diverging** | Making things worse — more new/regressed than resolved | -| 0.0 (all persistent) | **Stuck** | No progress in either direction | - ---- - -## Step 3: Oscillation Detection - -An oscillating finding is one that bounces between resolved and re-introduced across cycles: - -1. Finding was present in cycle N-2 -2. Finding was absent in cycle N-1 (resolved) -3. Finding is present again in cycle N (regressed) - -This indicates the fix in cycle N-1 was undone or invalidated by other changes in cycle N. - -### Oscillation Rules - -- A single oscillating finding: **flag it** in the convergence report but continue. -- Two or more oscillating findings: **STOP** and escalate to the user. -- Message: `"Findings X and Y are oscillating between cycles. Manual intervention needed — the automated fixes are interfering with each other."` - -Oscillation tracking requires 3+ cycles of data. On cycles 1-2, oscillation detection is skipped. - ---- - -## Step 4: Early Termination Rules - -The convergence analysis can override the normal Act phase exit decision. If any of these conditions hold, the recommendation is **STOP**: - -| Condition | Threshold | Recommendation | -|-----------|-----------|----------------| -| Diverging | Score < 0.5 for 2 consecutive cycles | STOP — changes are making things worse | -| Stalled | 0 findings resolved between cycles | STOP — no progress, further cycles will not help | -| Stuck | All findings are PERSISTENT for 2 consecutive cycles | STOP — automated fixes cannot resolve these | -| Oscillating | 2+ findings oscillating | STOP — fixes are interfering with each other | - -When STOP is recommended, the Act phase should: -1. **Not** start another PDCA cycle -2. Report all unresolved findings to the user -3. Present the best implementation so far (on its branch, not merged) -4. Include the convergence report explaining why the run was stopped - -### Override Behavior - -The convergence STOP recommendation overrides the normal cycle-back logic in the Act phase. Even if `CYCLE < MAX_CYCLES` and there are fixable-looking findings, if convergence says STOP, the run stops. - -The user can always override by explicitly requesting another cycle: `"Run one more cycle anyway"`. - ---- - -## Step 5: Integration with Act Phase - -### Event Data - -Convergence data is included in the `cycle.boundary` event emitted by the Act phase: - -```json -{ - "type": "cycle.boundary", - "phase": "act", - "data": { - "cycle": 2, - "max_cycles": 3, - "exit_condition": "convergence_stop", - "met": false, - "fixes_applied": 2, - "next_action": "stop", - "convergence": { - "score": 0.35, - "status": "diverging", - "resolved": 1, - "new": 2, - "regressed": 1, - "persistent": 3, - "oscillating": ["Timeline reference mismatch"], - "recommendation": "stop", - "reason": "Diverging for 2 consecutive cycles" - } - } -} -``` - -### Decision Tree Update - -The Act phase decision tree (from `act-phase` skill Step 4) gains a new first branch: - -``` -┌─ Convergence analysis (cycle 2+) -│ -├─ Convergence says STOP -│ └─ STOP: Report to user with convergence report -│ -├─ Convergence says CONTINUE -│ └─ Fall through to normal exit decision logic -│ -└─ Cycle 1 (no convergence data) - └─ Fall through to normal exit decision logic -``` - -### Act Feedback Enhancement - -When the Act phase builds `act-feedback.md` for the next cycle, it includes the convergence summary at the top: - -```markdown -## Convergence Analysis (Cycle 1 → 2) - -Score: 0.75 (converging) -Resolved: 3 | New: 1 | Regressed: 0 | Persistent: 2 - -Recommendation: Continue — trend is positive - -### Finding Status -| Finding | Status | Cycles | -|---------|--------|--------| -| SQL injection in user input | RESOLVED | 1 | -| Missing rate limit | RESOLVED | 1 | -| Test names unclear | RESOLVED | 1 | -| Null check missing in parser | PERSISTENT | 2 | -| Error path not tested | PERSISTENT | 2 | -| New: Unused import introduced | NEW | 1 | -``` - ---- - -## Convergence Report Format - -The full convergence report is generated as part of the orchestration output: - -```markdown -## Convergence Analysis (Cycle N-1 → N) - -**Score:** 0.75 (converging) -**Resolved:** 3 | **New:** 1 | **Regressed:** 0 | **Persistent:** 2 | **Oscillating:** 0 - -### Resolved This Cycle -| Source | Category | Description | -|--------|----------|-------------| -| guardian | security | SQL injection in user input handler | -| guardian | reliability | Missing rate limit on auth endpoint | -| sage | quality | Test names don't describe behavior | - -### New This Cycle -| Source | Category | Description | -|--------|----------|-------------| -| sage | quality | Unused import introduced by fix | - -### Persistent (unresolved across cycles) -| Source | Category | Description | Cycles Open | -|--------|----------|-------------|-------------| -| trickster | reliability | Null check missing in parser | 2 | -| sage | testing | Error path not tested | 2 | - -### Oscillating -(none) - -**Recommendation:** Continue — trend is positive -``` - ---- - -## Integration with Memory Skill - -When convergence detects PERSISTENT findings (present for 2+ cycles), these are strong candidates for the `memory` skill's lesson extraction: - -- After a run that had persistent findings, `archeflow-memory.sh extract` will pick these up with higher confidence (they have been confirmed across multiple cycles within a single run). -- Persistent findings that also appear in `lessons.jsonl` from prior runs get a double frequency boost (cross-cycle within run + cross-run pattern). - ---- - -## Design Principles - -1. **Conservative stopping.** Requires 2 consecutive data points before recommending STOP. A single bad cycle might be noise. -2. **User has final say.** STOP is a recommendation, not an enforced shutdown. The user can override. -3. **Cheap computation.** Keyword matching on finding descriptions, simple arithmetic on counts. No ML, no embeddings. -4. **Bounded scope.** Only compares adjacent cycles (N vs N-1, with N-2 for oscillation). Does not attempt to model long-term trends across many cycles. -5. **Observable.** All convergence data is included in the `cycle.boundary` event, making it available for post-hoc analysis via the process log. diff --git a/skills/do-phase/SKILL.md b/skills/do-phase/SKILL.md deleted file mode 100644 index 39d560a..0000000 --- a/skills/do-phase/SKILL.md +++ /dev/null @@ -1,193 +0,0 @@ ---- -name: do-phase -description: Use when acting as Maker in the Do phase. Defines execution rules, worktree protocol, commit discipline, and output format. ---- - -# Do Phase - -Maker implements the Creator's proposal. This skill defines the execution protocol — the agent definition (`agents/maker.md`) has the behavioral rules. - -## Execution Protocol - -### 1. Read Before Writing -Read the Creator's proposal completely. Identify: -- Files to create or modify (the `### Changes` section) -- Test strategy (the `### Test Strategy` section) -- Scope boundaries (the `### Not Doing` section) - -If the proposal is unclear on any point: implement your best interpretation and note the assumption in your output. - -### 2. Implementation Order -For each change in the proposal: -1. Write the test first (expect it to fail) -2. Implement the change (make the test pass) -3. Verify existing tests still pass -4. Commit with a descriptive message - -For writing domain (stories, prose): -1. Read the outline / scene plan -2. Read the voice profile and character sheets -3. Draft scene by scene, following the outline's emotional beats -4. Self-check: does the voice hold? Does dialogue sound natural? -5. Commit after each scene or logical section - -### 3. Commit Discipline - -**CRITICAL: Always commit before finishing.** Uncommitted worktree changes are LOST when the agent exits. - -Commit conventions: -``` -feat: # New functionality -fix: # Bug fix within the task -test: # Test additions -docs: # Documentation only -``` - -Commit frequency: -- **Code:** After each logical step (one feature, one fix, one test suite) -- **Writing:** After each scene or section (~500-1000 words) -- **Never:** One big commit at the end with everything - -### 4. Scope Control - -Do exactly what the proposal says. No more, no less. - -**In scope:** -- Files listed in the proposal's `### Changes` section -- Tests specified in the `### Test Strategy` section -- Dependencies explicitly mentioned - -**Out of scope (even if tempting):** -- Refactoring code you noticed while implementing -- Adding features not in the proposal -- Fixing pre-existing bugs in adjacent code -- Updating documentation beyond what the task requires - -If you encounter something that needs fixing but is out of scope: note it in `### Notes` for future work. Don't fix it now. - -### 5. Blocker Protocol - -If you hit a blocker (dependency missing, test infrastructure broken, proposal contradicts codebase): -1. Document what's blocked and why -2. Document what you completed before the block -3. Commit what you have -4. Stop and report — don't silently work around it - -## Worktree Protocol - -When running in an isolated git worktree (`isolation: "worktree"`): - -``` -main branch (untouched) -└── archeflow/maker- (worktree branch) - ├── commit: implementation step 1 - ├── commit: implementation step 2 - └── commit: implementation step 3 (final) -``` - -- All work stays on the worktree branch -- Main branch is never modified directly -- The branch name follows the pattern: `archeflow/maker-` -- After Check phase approves: the orchestrator merges (not the Maker) - -## Output Format - -```markdown -## Implementation: - -### Files Changed -- `path/file.ext` — What changed (+N -M lines) - -### Tests -- N new tests, all passing -- M existing tests still passing - -### Commits -1. `feat: description` (hash) -2. `test: description` (hash) - -### Notes -- Assumptions made where proposal was unclear -- Out-of-scope issues noticed (for future work) - -### Branch -`archeflow/maker-` — ready for review -``` - -For writing domain: -```markdown -## Draft: - -### Scenes Written -- Scene 1: (~N words) -- Scene 2: <title> (~N words) - -### Word Count -- Target: N | Actual: M | Delta: +/- - -### Voice Notes -- Dialect usage: N instances (target: moderate) -- Essen/Trinken: present in X/Y scenes - -### Commits -1. `feat: scene 1 - <title>` (hash) -2. `feat: scene 2 - <title>` (hash) - -### Notes -- Deviations from outline (with reasoning) -``` - -## With Prior Feedback (Cycle 2+) - -When the Maker receives feedback from a prior cycle's Check phase: - -1. Read the `act-feedback.md` — focus on the `### For Maker` section -2. Address each finding marked as "routed to Maker" -3. In your output, include a response table: - -```markdown -### Feedback Response -| Finding | Source | Action | -|---------|--------|--------| -| Test names unclear | Sage | Fixed — renamed to behavior descriptions | -| Missing edge case | Trickster | Added test for empty input | -``` - -Do not address findings routed to Creator — those were handled in the revised proposal. - -## Quality Checklist (self-check before finishing) - -Before your final commit, verify: -- [ ] All proposal changes implemented -- [ ] All new tests pass -- [ ] All existing tests still pass -- [ ] No files modified outside proposal scope -- [ ] Every logical step has its own commit -- [ ] Output summary is complete and accurate -- [ ] Branch name follows convention - -## Test-First Gate - -Before the Maker's output is accepted, the orchestrator validates that tests were included. - -### Validation Logic - -Read `do-maker-files.txt`. Check if any file path matches common test patterns: -- `*test*`, `*spec*`, `*.test.*`, `*.spec.*`, `*_test.*`, `*_spec.*` -- Files in directories named `test/`, `tests/`, `__tests__/`, `spec/` - -For writing domain projects, this gate is skipped. - -### Outcomes - -| Result | Action | -|--------|--------| -| Test files found | Pass — proceed to Check phase | -| No test files, code domain | **Warn** — emit WARNING event, note in do-maker.md | -| No test files + Creator specified tests | **Block** — re-run Maker with test instruction (1 retry) | -| Writing domain | Skip gate entirely | - -The block case triggers a targeted re-run with prompt: -"The proposal specified these test cases: <test strategy section>. No test files -were found in your changes. Add the specified tests before finishing." -This is one retry within the Do phase, not a full PDCA cycle. diff --git a/skills/effectiveness/SKILL.md b/skills/effectiveness/SKILL.md deleted file mode 100644 index ded06e7..0000000 --- a/skills/effectiveness/SKILL.md +++ /dev/null @@ -1,200 +0,0 @@ ---- -name: effectiveness -description: | - Track archetype effectiveness across runs. Scores each archetype on signal-to-noise, - fix rate, cost efficiency, accuracy, and cycle impact. Recommends model tier changes - and archetype removal based on rolling averages. - <example>User: "Which reviewers are actually useful?"</example> - <example>User: "Show archetype effectiveness report"</example> ---- - -# Agent Effectiveness Scoring - -Track which archetypes are most useful vs. which waste tokens. Over multiple runs, build a profile of each archetype's effectiveness and use it to optimize team composition and model selection. - -## Storage - -``` -.archeflow/memory/effectiveness.jsonl # Per-run archetype scores (append-only) -``` - -## Scoring Dimensions - -For each archetype that participates in a run, calculate these scores: - -| Dimension | How Measured | Weight | -|-----------|-------------|--------| -| **Signal-to-noise** | useful findings / total findings | 0.30 | -| **Fix rate** | findings that led to actual fixes / total findings | 0.25 | -| **Cost efficiency** | useful findings per dollar spent | 0.20 | -| **Accuracy** | findings not contradicted by other reviewers | 0.15 | -| **Cycle impact** | did this archetype's findings lead to cycle exit? | 0.10 | - -### Definitions - -- **Useful finding**: A finding in a `review.verdict` event with `severity >= WARNING` (i.e., severity is `warning`, `bug`, or `critical`) AND `fix_required == true`. -- **Actual fix**: A `fix.applied` event whose `source` field matches this archetype (or whose DAG `parent` chain traces back to this archetype's `review.verdict` event). -- **Contradicted finding**: Another reviewer's `review.verdict` has `verdict == "approved"` for the same scope where this archetype flagged an issue. Approximation: if archetype A flags N findings but archetype B approves the same code with 0 findings in overlapping severity categories, A's unmatched findings are considered potentially contradicted. -- **Cycle impact**: The archetype's findings (with `fix_required == true`) resulted in fixes that were part of the final approved cycle. Determined by checking if `fix.applied` events referencing this archetype exist before the final `cycle.boundary` with `met == true`. - -### Composite Score - -``` -composite = (signal_to_noise * 0.30) - + (fix_rate * 0.25) - + (cost_efficiency_normalized * 0.20) - + (accuracy * 0.15) - + (cycle_impact * 0.10) -``` - -**Cost efficiency normalization**: Raw cost efficiency is `useful_findings / cost_usd`. To normalize to 0-1 range, use: `min(1.0, raw_efficiency / 100)`. The threshold of 100 means "100 useful findings per dollar" is considered perfect efficiency (achievable with haiku on structured reviews). - -## Per-Run Scoring - -After `run.complete`, calculate scores for each archetype that participated. The `extract` command does this. - -### Per-Run Score Record - -```jsonl -{"ts":"2026-04-03T16:00:00Z","run_id":"2026-04-03-der-huster","archetype":"guardian","signal_to_noise":0.85,"fix_rate":1.0,"cost_efficiency":42.5,"accuracy":1.0,"cycle_impact":true,"composite_score":0.91,"tokens":5000,"cost_usd":0.004,"model":"haiku","findings_total":4,"findings_useful":3,"fixes_applied":3} -``` - -Appended to `.archeflow/memory/effectiveness.jsonl`. - -### Scoring Non-Review Archetypes - -Only archetypes that produce `review.verdict` events are scored (Guardian, Skeptic, Sage, Trickster, and any custom review archetypes). Non-review archetypes (Explorer, Creator, Maker) are tracked by cost-tracking but not effectiveness-scored, because their output quality is measured differently (by whether the run succeeds, not by individual findings). - -## Aggregate Scoring - -Across all runs, maintain rolling averages (computed on-demand, not stored): - -```jsonl -{"archetype":"guardian","runs":12,"avg_composite":0.88,"avg_signal_noise":0.82,"avg_cost_efficiency":38.2,"trend":"stable","recommendation":"keep"} -{"archetype":"trickster","runs":8,"avg_composite":0.35,"avg_signal_noise":0.20,"avg_cost_efficiency":5.1,"trend":"declining","recommendation":"consider_removing"} -``` - -### Trend Calculation - -Compare the average composite score of the last 5 runs to the 5 runs before that: - -- **improving**: last-5 avg > prior-5 avg + 0.05 -- **declining**: last-5 avg < prior-5 avg - 0.05 -- **stable**: within +/- 0.05 - -If fewer than 10 runs exist, trend is `"insufficient_data"`. - -### Recommendations - -Based on aggregate composite scores: - -| Composite Score | Recommendation | Meaning | -|----------------|---------------|---------| -| >= 0.70 | `keep` | Archetype is valuable, contributes meaningful findings | -| 0.40 - 0.69 | `optimize` | Consider cheaper model or tighter review lens | -| < 0.40 | `consider_removing` | Might be wasting tokens, review whether it adds value | - -## Integration Points - -### At Run Start - -When the `run` skill initializes, show a brief effectiveness summary for the team's archetypes: - -``` -Archetype effectiveness (last 10 runs): - guardian: 0.88 (keep) — haiku, $0.004/run avg - sage: 0.72 (keep) — sonnet, $0.08/run avg - skeptic: 0.45 (optimize) — haiku, $0.003/run avg - trickster: 0.32 (consider_removing) — haiku, $0.003/run avg -``` - -### Model Tier Suggestions - -Cross-reference effectiveness with model assignment: - -- **High effectiveness on cheap model** (composite >= 0.7, model = haiku): "Keep cheap. Working well." -- **Low effectiveness on cheap model** (composite < 0.5, model = haiku): "Consider upgrading to sonnet — cheap model may not be capturing issues." -- **High effectiveness on expensive model** (composite >= 0.7, model = sonnet): "Try downgrading to haiku — may maintain quality at lower cost." -- **Low effectiveness on expensive model** (composite < 0.5, model = sonnet): "Consider removing — expensive and not contributing." - -### Cost-Tracking Integration - -Multiply estimated cost by effectiveness to get "value per dollar": - -``` -value_per_dollar = composite_score / cost_usd -``` - -This metric helps compare archetypes directly: a cheap archetype with moderate effectiveness may have higher value_per_dollar than an expensive one with high effectiveness. - -## Effectiveness Script - -**Location:** `lib/archeflow-score.sh` - -``` -Usage: - archeflow-score.sh extract <events.jsonl> # Score archetypes from a completed run - archeflow-score.sh report # Show aggregate effectiveness report - archeflow-score.sh recommend <team.yaml> # Recommend model tiers for a team -``` - -### `extract` Command - -1. Read all events from the JSONL file -2. Verify a `run.complete` event exists (scoring incomplete runs is unreliable) -3. For each `review.verdict` event: - - Count total findings and useful findings (severity >= WARNING, fix_required) - - Cross-reference with `fix.applied` events via the `source` field or DAG parent chain - - Check for contradictions from other reviewers - - Determine cycle impact -4. Calculate all scoring dimensions and composite score -5. Append per-archetype score records to `.archeflow/memory/effectiveness.jsonl` - -### `report` Command - -1. Read `.archeflow/memory/effectiveness.jsonl` -2. Group by archetype -3. Calculate rolling averages (last 10 runs per archetype) -4. Calculate trends (last 5 vs. prior 5) -5. Output a markdown table: - -```markdown -# Archetype Effectiveness Report - -| Archetype | Runs | Avg Score | S/N | Fix Rate | Cost Eff | Accuracy | Trend | Rec | -|-----------|------|-----------|-----|----------|----------|----------|-------|-----| -| guardian | 12 | 0.88 | 0.82 | 0.95 | 38.2 | 0.97 | stable | keep | -| sage | 10 | 0.72 | 0.70 | 0.80 | 12.1 | 0.88 | improving | keep | -| skeptic | 8 | 0.45 | 0.40 | 0.50 | 22.5 | 0.60 | stable | optimize | -| trickster | 8 | 0.35 | 0.20 | 0.30 | 5.1 | 0.55 | declining | consider_removing | - -**Model suggestions:** -- skeptic (haiku, score 0.45): Consider upgrading to sonnet or tightening review lens -- trickster (haiku, score 0.35): Consider removing — low signal, low fix rate -``` - -### `recommend` Command - -1. Read the team preset YAML file -2. For each archetype in the team, look up its effectiveness from `.archeflow/memory/effectiveness.jsonl` -3. Cross-reference current model assignment with effectiveness -4. Output recommendations: - -```markdown -# Model Recommendations for team: story-development - -| Archetype | Current Model | Score | Suggestion | -|-----------|--------------|-------|------------| -| guardian | haiku | 0.88 | Keep haiku — high effectiveness at low cost | -| sage | sonnet | 0.72 | Keep sonnet — quality-sensitive role | -| skeptic | haiku | 0.45 | Try sonnet — may improve signal quality | -| trickster | haiku | 0.35 | Consider removing from team | -``` - -## Design Principles - -1. **Append-only.** Score records are immutable facts. Aggregates are computed on-demand. -2. **Review archetypes only.** Non-review agents (Explorer, Creator, Maker) are not scored — their value is in the final product, not in individual findings. -3. **Relative, not absolute.** Scores are meaningful in comparison (guardian vs. trickster), not as standalone numbers. The thresholds (0.7, 0.4) are starting points — calibrate after 20+ runs. -4. **Actionable.** Every report ends with concrete recommendations (keep, optimize, remove, change model). -5. **Cheap to compute.** One JSONL scan per report. No databases, no external services. diff --git a/skills/orchestration/SKILL.md b/skills/orchestration/SKILL.md deleted file mode 100644 index 68846c2..0000000 --- a/skills/orchestration/SKILL.md +++ /dev/null @@ -1,634 +0,0 @@ ---- -name: orchestration -description: Use when executing a multi-agent orchestration — spawning archetype agents, managing PDCA cycles, coordinating worktrees, and merging results. This is the step-by-step execution guide. ---- - -# Orchestration Execution - -This skill guides you through running a full ArcheFlow orchestration using Claude Code's native Agent tool and git worktrees. - -## Strategy Selection - -A **strategy** defines the shape of an orchestration run — which phases execute, in what order, and when to iterate. A **workflow** (fast/standard/thorough) controls the depth within a strategy. - -### Available Strategies - -| Strategy | Flow | When to Use | -|----------|------|-------------| -| `pdca` | Plan -> Do -> Check -> Act (cyclic) | Refactors, thorough reviews, multi-concern tasks | -| `pipeline` | Plan -> Implement -> Spec-Review -> Quality-Review -> Verify (linear) | Bug fixes, fast patches, single-concern tasks | -| `auto` | Selected by task analysis | Default — let ArcheFlow decide | - -### Strategy Interface - -Every strategy defines: - -- **Phases** — ordered list of execution stages -- **Agent mapping** — which archetypes run in each phase -- **Transition rules** — conditions for moving between phases -- **Iteration model** — cyclic (PDCA) or linear (pipeline) -- **Exit conditions** — when the run terminates - -### PDCA Strategy - -The existing orchestration flow (Steps 0-4 below). Cyclic — the Act phase can feed back to Plan for another iteration. Best for tasks requiring multiple review perspectives and iterative refinement. - -### Pipeline Strategy - -Linear flow with no cycle-back. Faster for well-understood tasks where one pass is sufficient. - -| Phase | Agent | Purpose | -|-------|-------|---------| -| Plan | Creator | Design proposal | -| Implement | Maker | Build in worktree | -| Spec-Review | Guardian, then Skeptic | Security + assumption check (sequential) | -| Quality-Review | Sage | Code quality review | -| Verify | (automated) | Run tests, apply targeted fix if CRITICAL | - -No cycle-back — WARNINGs are logged but do not block. CRITICALs in Verify trigger a single targeted fix attempt by the Maker, not a full cycle. - -### Auto-Selection Rules - -When `strategy: auto` (default): - -- Task contains "fix", "bug", "patch", "hotfix" → `pipeline` -- Task contains "refactor", "redesign", "review" → `pdca` -- Workflow is `thorough` → `pdca` (always) -- Workflow is `fast` with single file → `pipeline` -- Otherwise → `pdca` - ---- - -## Step 0: Choose a Workflow - -If `.archeflow/teams/<name>.yaml` exists, the user can reference a team preset: `"Use the backend team"`. Load the preset's phase config instead of built-in defaults. See `archeflow:custom-archetypes` skill for preset format. - -Otherwise, assess the task and pick: - -| Signal | Workflow | -|--------|----------| -| Small fix, low risk, single concern | `fast` (1 cycle) | -| Feature, multiple files, moderate risk | `standard` (2 cycles) | -| Security-sensitive, breaking changes, public API | `thorough` (3 cycles) | - -## Workflow Adaptation Rules - -The initial workflow choice is a starting point, not a commitment. These rules adapt the workflow at runtime. Each rule specifies when it evaluates (which phase boundary). - -### A3: Confidence Gate (evaluates: after Plan, before Do) - -**When:** Creator's confidence table has any axis below 0.5. -**Action by axis:** - -| Axis | Score < 0.5 Action | -|------|-------------------| -| Task understanding | **Pause.** Ask user to clarify before proceeding. Do not spawn Maker. | -| Solution completeness | **Upgrade to standard.** Add Explorer before Maker starts. | -| Risk coverage | **Spawn mini-Explorer** for the specific risky area (parallel, 5 min max). Maker can proceed. | - -A3 runs before any Do/Check agents spawn, so there are no cancellation issues. - -### A1: Conditional Escalation (evaluates: after Check, before next cycle) - -**When:** Guardian rejects with 2+ CRITICAL findings in a `fast` workflow. -**Action:** Escalate to `standard` for the **next cycle** — add Skeptic + Sage to the reviewer roster. -**Why:** If Guardian found serious issues, more perspectives help find root causes. -**Sticky:** Once escalated, the workflow stays escalated for all remaining cycles. A2 does not apply to escalated workflows. - -### A2: Guardian Fast-Path (evaluates: after Guardian, before spawning other reviewers) - -**When:** Guardian finds 0 CRITICAL and 0 WARNING in a non-escalated `standard` or `thorough` workflow. -**Action:** Do not spawn Skeptic, Sage, or Trickster. Proceed directly to Act phase. -**Why:** Guardian's security review is the strictest gate. Clean pass = safe to skip additional reviewers. -**Critical:** Evaluate A2 **after Guardian completes but before other reviewers are spawned.** Do not spawn reviewers in parallel with Guardian — spawn Guardian first, check A2, then spawn remaining reviewers only if A2 doesn't trigger. -**Does not apply to:** Escalated workflows (A1 triggered), or first cycle of `thorough` workflows (Trickster is mandatory on first pass). -**Log:** Note "Guardian fast-path taken" in orchestration report. - -### Evaluation Order - -``` -Plan phase completes → A3 (confidence gate) - ↓ -Guardian completes → A2 (fast-path check) → if clean, skip other reviewers - ↓ if not, spawn other reviewers -Check phase done → A1 (escalation check) → if 2+ CRITICALs in fast, next cycle is standard -``` - -## Process Logging - -If `.archeflow/events/` exists (or should be created), emit structured events throughout orchestration. See `archeflow:process-log` skill for full schema. - -**Quick reference — emit at these points:** - -``` -run.start → After workflow selection, before first agent -agent.start → Before each Agent tool call -agent.complete → After each Agent returns (include duration, tokens, summary, artifacts) -decision → When choosing between alternatives (plot direction, approach, fix strategy) -phase.transition → At Plan→Do, Do→Check, Check→Act boundaries -review.verdict → After each reviewer delivers verdict -fix.applied → After each edit addressing a review finding -cycle.boundary → End of PDCA cycle -shadow.detected → When shadow threshold triggers -run.complete → After final Act phase (include totals) -``` - -**Helper:** `./lib/archeflow-event.sh <run_id> <type> <phase> <agent> '<json>'` - -**Report:** `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl` - -Events are optional — if the events dir doesn't exist, skip logging. Never let logging block orchestration. - ---- - -## Model Configuration - -Model assignment per archetype and workflow is configured in `.archeflow/config.yaml` under the `models:` section. The `archeflow:run` skill (section 0c) handles resolution with fallback chain: per-workflow per-archetype > per-workflow default > per-archetype > global default. When spawning agents manually, read the config to select the appropriate model. - ---- - -## Step 1: Plan Phase - -Spawn agents sequentially — Creator needs Explorer's findings. - -### Explorer (if standard or thorough) - -**Context to include:** Task description, relevant file paths, codebase access. -**Context to exclude:** Prior proposals, review outputs, implementation details, feedback from previous cycles. - -``` -Agent( - description: "🔍 Explorer: research context", - prompt: "<task description> - You are the EXPLORER archetype. - Research the codebase to understand: - 1. What files and functions are involved - 2. What dependencies exist - 3. What tests currently cover this area - 4. What patterns the codebase uses - Write your findings as a structured research report. - Be thorough but focused — no rabbit holes.", - subagent_type: "Explore" -) -``` - -### Creator - -**Context to include:** Task description, Explorer's research output. On cycle 2+: prior cycle's structured feedback (see Cycle Feedback Protocol). -**Context to exclude:** Raw file contents (Explorer already summarized), git diffs, reviewer full outputs. - -**Fast workflow only (no Explorer):** The Creator must perform a Mini-Reflect before proposing: -1. Restate the task in your own words (catch misunderstandings early) -2. List 3 assumptions you're making -3. Name the one risk that would cause most damage if wrong - -``` -Agent( - description: "🏗️ Creator: design proposal", - prompt: "<task description> - You are the CREATOR archetype. - <if fast workflow (no Explorer): Before proposing, perform a Mini-Reflect: - 1. Restate the task in one sentence - 2. List 3 assumptions you're making - 3. Name the highest-damage risk - Then propose.> - <if standard/thorough: Based on the research findings: <Explorer's output>> - <if cycle 2+: Prior cycle feedback: <structured feedback — see Cycle Feedback Protocol>> - Design a solution proposal including: - 1. Architecture decisions (with rationale) - 2. Files to create/modify (with specific changes) - 3. Alternatives considered (at least 2, with rejection rationale) - 4. Test strategy - 5. Confidence (scored by axis: task understanding, solution completeness, risk coverage) - 6. Risks you foresee - <if cycle 2+: 6. How you addressed each unresolved issue from prior feedback> - Be decisive. Ship a clear plan, not a menu of options.", - subagent_type: "Plan" -) -``` - -## Step 2: Do Phase - -Spawn Maker in an **isolated worktree** so changes don't affect main. - -**Context to include:** Creator's proposal only. On cycle 2+: implementation-routed feedback from Sage/Trickster. -**Context to exclude:** Explorer's research, Guardian/Skeptic findings (those go to Creator). - -``` -Agent( - description: "⚒️ Maker: implement proposal", - prompt: "<task description> - You are the MAKER archetype. - Implement this proposal: <Creator's output> - <if cycle 2+: Implementation feedback from prior cycle: <Sage/Trickster findings only>> - Rules: - 1. Follow the proposal exactly — don't redesign - 2. Write tests for every behavioral change - 3. Commit with descriptive messages - 4. Run existing tests — nothing may break - 5. If the proposal is unclear, implement your best interpretation and note it - Do NOT skip tests. Do NOT refactor unrelated code. - - BEFORE finishing — Self-Review Checklist: - 1. Did I change ALL files listed in the proposal's Changes section? - 2. Did I add tests for each behavioral change? - 3. Are there files in my diff NOT listed in the proposal? If yes, revert them. - 4. Do all existing tests still pass? - Report any gaps in your Implementation summary.", - isolation: "worktree", - mode: "bypassPermissions" -) -``` - -**Critical:** The Maker MUST commit its changes before finishing. Uncommitted changes in a worktree are lost. - -## Step 3: Check Phase - -Spawn Guardian **first**. After Guardian completes, check adaptation rule A2 (fast-path). If A2 triggers (0 CRITICAL, 0 WARNING, non-escalated workflow), skip remaining reviewers and proceed to Act. Otherwise, spawn remaining reviewers **in parallel**. - -**Reviewer spawning protocol:** The canonical sequence (Guardian first, A2 evaluation, parallel spawning, timeout handling) is defined in `archeflow:check-phase` under "Reviewer Spawning Protocol". Follow that protocol for the exact spawning order, context per reviewer, and timeout rules. - -### Guardian (always runs first) - -**Context to include:** Maker's git diff, proposal risk section only. -**Context to exclude:** Explorer's research, full proposal, other reviewer outputs. - -``` -Agent( - description: "🛡️ Guardian: security and risk review", - prompt: "You are the GUARDIAN archetype. - Review the changes in branch: <maker's branch> - Assess: - 1. Security vulnerabilities (injection, auth bypass, data exposure) - 2. Reliability risks (error handling, edge cases, race conditions) - 3. Breaking changes (API compatibility, schema migrations) - 4. Dependency risks (new deps, version conflicts) - Output: APPROVED or REJECTED with specific findings. - Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix | - Categories: security, reliability, design, breaking-change, dependency - Be rigorous but practical — flag real risks, not theoretical ones." -) -``` - -### Skeptic (if standard or thorough) - -**Context to include:** Creator's proposal (focus on assumptions section). -**Context to exclude:** Git diff details, Explorer's research, other reviewer outputs. - -``` -Agent( - description: "🤔 Skeptic: challenge assumptions", - prompt: "You are the SKEPTIC archetype. - Review the proposal: <Creator's proposal> - Challenge: - 1. Assumptions in the design — what if they're wrong? - 2. Alternative approaches not considered - 3. Edge cases not tested - 4. Scalability concerns - Output: APPROVED or REJECTED with counterarguments. - Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix | - Categories: design, quality, testing, scalability - Be constructive — every challenge must include a suggested alternative." -) -``` - -### Sage (if standard or thorough) - -**Context to include:** Creator's proposal, Maker's git diff, implementation summary. -**Context to exclude:** Explorer's raw research, other reviewer outputs. - -``` -Agent( - description: "📚 Sage: holistic quality review", - prompt: "You are the SAGE archetype. - Review the changes in branch: <maker's branch> - Evaluate holistically: - 1. Code quality (readability, maintainability, simplicity) - 2. Test coverage (are the tests meaningful, not just present?) - 3. Documentation (does the change need docs?) - 4. Consistency with codebase patterns - Output: APPROVED or REJECTED with quality findings. - Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix | - Categories: quality, testing, design, consistency - Judge like a senior engineer doing a PR review." -) -``` - -### Trickster (if thorough only) - -**Context to include:** Maker's git diff only. -**Context to exclude:** Everything else — proposal, research, other reviews. - -``` -Agent( - description: "🃏 Trickster: adversarial testing", - prompt: "You are the TRICKSTER archetype. - Try to break the changes in branch: <maker's branch> - Attack vectors: - 1. Malformed input, boundary values, empty/null/huge data - 2. Concurrency and race conditions - 3. Error path exploitation - 4. Dependency failure scenarios - Output: APPROVED or REJECTED with edge cases found. - Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix | - Categories: security, reliability, testing - Think like a QA engineer who gets paid per bug found." -) -``` - -## Step 4: Act Phase - -Collect all reviewer outputs and decide. - -### Completion Promise (optional) - -If the user defined explicit done criteria with the task, check them now: - -``` -Completion criteria: <test command passes> AND <Guardian approves> -Example: "done when pytest passes and Guardian approves with 0 CRITICAL" -``` - -If completion criteria are defined, **all criteria must pass** — reviewer approval alone is not sufficient. If tests fail but reviewers approved, cycle back with "tests failing" as feedback to Creator. - -### All Approved (and completion criteria met) -1. **Pre-merge hooks:** Check `.archeflow/hooks.yaml` for `pre-merge` hooks. Run them. If `fail_action: abort`, stop and report. -2. Merge the Maker's worktree branch into the target branch -3. **Post-merge hooks:** Run `post-merge` hooks from `.archeflow/hooks.yaml` if defined. Then run the project's test suite on the merged branch - - Tests pass → proceed to step 3 - - Tests fail → **auto-revert** the merge commit, report the failure, and cycle back with "integration test failure on main" as feedback -3. Report: what was implemented, what was reviewed, any warnings noted -4. Clean up the worktree -5. Record metrics (see Orchestration Metrics) - -### Issues Found (and cycles remaining) -1. Build structured feedback using the Cycle Feedback Protocol below -2. Go back to Step 1 (Plan) with the feedback -3. Creator revises the proposal, addressing each unresolved issue -4. Maker re-implements in a fresh worktree -5. Reviewers check again - -### Max Cycles Reached with Unresolved Issues -1. Report all unresolved findings to the user -2. Present the best implementation so far (on its branch) -3. Let the user decide: merge as-is, fix manually, or abandon - ---- - -## Cycle Feedback Protocol - -After the Check phase, build structured feedback for the next cycle. This replaces dumping raw reviewer output. - -### 1. Extract Findings - -Parse each reviewer's output into the standardized format: - -```markdown -## Cycle N Feedback - -### Unresolved Issues -| Source | Severity | Category | Issue | Route to | -|--------|----------|----------|-------|----------| -| Guardian | CRITICAL | security | SQL injection in user input | Creator | -| Skeptic | WARNING | design | Assumes single-tenant only | Creator | -| Sage | WARNING | quality | Test names don't describe behavior | Maker | -| Trickster | CRITICAL | reliability | Empty string bypasses validation | Creator | - -### Resolved (from cycle N-1) -| Source | Issue | Resolution | -|--------|-------|------------| -| Guardian | Missing rate limit | Added rate limiter middleware | -``` - -### 2. Route Feedback - -Not all findings go to the same agent: - -| Source | Category | Routes to | Reason | -|--------|----------|-----------|--------| -| Guardian | security, breaking-change | **Creator** | Design must change | -| Guardian | reliability, dependency | **Creator** | Architectural decision needed | -| Skeptic | design, scalability | **Creator** | Assumptions need revision | -| Sage | quality, consistency | **Maker** | Implementation refinement | -| Sage | testing | **Maker** | Test gap, not design flaw | -| Trickster | reliability (design flaw) | **Creator** | Needs redesign | -| Trickster | reliability (test gap) | **Maker** | Needs more tests | -| Trickster | testing | **Maker** | Edge case not covered | - -**Disambiguation rule:** When in doubt: if the fix requires changing the approach, route to Creator. If it requires changing the code within the existing approach, route to Maker. - -### 3. Track Resolution - -Compare cycle N findings against cycle N-1: -- If a prior finding no longer appears in the same category → mark **resolved** -- If a prior finding persists → it stays **unresolved** with an incremented cycle count -- If new findings appear → add as new unresolved issues - -This prevents regression and gives the Creator/Maker a clear list of what to address. - -### 4. Convergence Detection - -If the **same finding** (same category + same file location) appears **unresolved in 2 consecutive cycles**, escalate to user: - -> "Finding persists across 2 cycles: [Guardian] CRITICAL security — SQL injection in src/auth.ts:48. This may need human judgment or a different approach." - -Do not cycle again blindly. The issue is likely structural (wrong design, not wrong implementation) and needs human input. - -### 5. Cross-Archetype Dedup - -If two reviewers raise the same issue (same file + same category + similar description), merge into one finding in the consolidated output: - -``` -| Guardian + Skeptic | CRITICAL | security | Input not sanitized (src/api.ts:30) | Add validation | -``` - -Don't double-count in severity tallies. Route to the higher-priority destination (Creator over Maker). - ---- - -## Orchestration Metrics - -Track lightweight metrics throughout the orchestration. No token counting (unreliable from skill layer) — just timing and outcomes. - -### Per-Phase Logging - -After each phase completes, note: - -``` -| Phase | Duration | Agents | Outcome | -|-------|----------|--------|---------| -| Plan | 45s | 2 | Proposal ready (confidence: 0.8) | -| Do | 90s | 1 | 4 files changed, 8 tests added | -| Check | 60s | 3 | 1 REJECTED (Guardian), 2 APPROVED | -| Act | — | — | Cycle back → feedback built | -``` - -### Orchestration Summary - -At orchestration end, include in the report: - -```markdown -## Orchestration Metrics -| Metric | Value | -|--------|-------| -| Workflow | standard | -| Cycles | 2 of 2 | -| Total duration | 4m 30s | -| Agents spawned | 9 | -| Findings (total) | 5 | -| Findings (critical) | 1 | -| Findings (resolved) | 4 | -| Shadow detections | 0 | -``` - -Use this data to calibrate future workflow selection — if fast workflows consistently need 0 cycles of revision, the task was well-scoped. - ---- - -## Autonomous Mode - -When running unattended (overnight sessions, batch queues), add these behaviors to the orchestration loop: - -### Between-Task Checkpoint - -After each task completes (success or failure): -1. **Commit and push** all changes immediately -2. **Update session log** at `.archeflow/session-log.md` with task outcome -3. **Check stop conditions** before starting next task: - - 3 consecutive failures → STOP - - Shadow escalation (same shadow 3+ times) → STOP - - Test suite broken after merge → REVERT and STOP - - Destructive action detected → STOP - -### Session Log Protocol - -**Primary:** Emit `run.complete` event to `.archeflow/events/<run_id>.jsonl` (see Process Logging section above). The event stream is the source of truth. - -**Secondary:** Also write a human-readable summary to `.archeflow/session-log.md`: - -```markdown -## Task N: <description> -**Workflow:** standard | **Status:** COMPLETED/FAILED -**Cycles:** 1 of 2 -**Findings:** Guardian APPROVED, Skeptic APPROVED, Sage WARNING (test names) -**Files changed:** 5 | **Tests added:** 12 -**Branch:** merged to main (commit abc1234) | OR: archeflow/maker-xyz (NOT merged) -**Duration:** 8 min -**Events:** `.archeflow/events/<run_id>.jsonl` (full process log) -``` - -Generate the full Markdown report: `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl` - -### Safety Rules -- Never force-push. Never modify main history. -- All work stays on worktree branches until explicitly merged -- Merges use `--no-ff` — individually revertable -- Failed tasks leave branches intact for manual inspection - -For full autonomous mode details (task queues, overnight checklists, user controls): load the `archeflow:autonomous-mode` skill. - ---- - -## Shadow Monitoring - -During orchestration, watch for shadow activation after each agent completes. Quick checklist: - -| Archetype | Shadow | Quick Check | -|-----------|--------|-------------| -| Explorer | Rabbit Hole | Output >2000 words without Recommendation section? | -| Creator | Over-Architect | >2 new abstractions for one feature? | -| Maker | Rogue | No test files in changeset? Files outside proposal? | -| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1? Zero approvals? | -| Skeptic | Paralytic | >7 challenges? <50% have alternatives? | -| Trickster | False Alarm | Findings in untouched code? >10 findings? | -| Sage | Bureaucrat | Review >2x code change length? | - -On detection: apply correction prompt from `archeflow:shadow-detection` skill. On second detection of same shadow: replace agent. On 3+ shadows in same cycle: escalate to user. - ---- - -## Parallel Team Orchestration - -When running multiple independent tasks, spawn parallel ArcheFlow teams. Each team runs its own PDCA cycle on a separate worktree. - -### Rules - -1. **Non-overlapping file scope:** Each team must work on different files. If two tasks touch the same file, run them sequentially. -2. **Independent worktrees:** Each team's Maker gets its own worktree branch (`archeflow/team-1-maker`, `archeflow/team-2-maker`). -3. **First-finished-first-merged:** Teams merge in completion order. Later teams rebase onto the updated main before their own merge. -4. **Merge conflict handling:** If rebase fails, the later team re-runs its Check phase against the merged main. If conflicts are structural, escalate to user. -5. **Max 3 parallel teams:** More causes diminishing returns and merge headaches. - -### Spawning Parallel Teams - -``` -# Launch 2-3 teams in a single message with multiple Agent calls: -Agent(description: "🏗️ Team 1: pagination fix (fast)", ...) -Agent(description: "🏗️ Team 2: JWT auth (standard)", ...) -Agent(description: "🏗️ Team 3: logging refactor (fast)", ...) -``` - -Each team follows the full PDCA steps independently. The orchestrator monitors all teams and handles merges. - ---- - -## Reviewer Profiles - -Projects can configure which reviewers matter in `.archeflow/config.yaml`: - -```yaml -reviewers: - always: [guardian] # Always runs - default: [sage] # Runs in standard+thorough - thorough_only: [trickster] # Only in thorough - skip: [skeptic] # Never runs for this project -``` - -If no config exists, use the built-in workflow defaults. Profiles save tokens by not spawning reviewers that add little value for the specific project. - -## Explorer Cache - -If the same code area was explored recently, skip Explorer and reuse prior research: - -**Cache hit criteria:** Same files affected (>70% overlap by path) AND prior research is <24 hours old AND no commits to those files since the research. - -**On cache hit:** Show the prior research to Creator with a note: "Using cached Explorer research from [timestamp]. If the codebase changed significantly, re-run Explorer." - -**On cache miss:** Run Explorer normally. - -Cache is stored in `.archeflow/explorer-cache/` as timestamped markdown files. The orchestrator checks for matches before spawning Explorer. - -## Learning from History - -Track which archetypes catch real issues per project over time. After each orchestration, append to `.archeflow/metrics.jsonl`: - -```json -{"task": "...", "archetype": "guardian", "findings": 2, "critical": 1, "resolved": 2, "useful": true} -{"task": "...", "archetype": "skeptic", "findings": 3, "critical": 0, "resolved": 0, "useful": false} -``` - -A finding is **useful** if it was resolved (led to a code change) rather than dismissed. - -After 10+ orchestrations, the orchestrator can recommend reviewer profile changes: -- "Skeptic has found 0 useful issues in 8 runs — consider moving to `skip` or `thorough_only`" -- "Guardian catches critical issues in 80% of runs — confirmed as essential" - -This is advisory, not automatic. The user decides based on the data. - ---- - -## Orchestration Report - -After completion, summarize: - -```markdown -## ArcheFlow Orchestration Report -- **Task:** <description> -- **Workflow:** standard (2 cycles) -- **Cycle 1:** Guardian rejected (SQL injection in user input handler) -- **Cycle 2:** All approved after input sanitization added -- **Files changed:** 4 files, +120 -30 lines -- **Tests added:** 8 new tests -- **Branch:** archeflow/maker-<id> → merged to main -- **Metrics:** 9 agents, 4m 30s, 5 findings (4 resolved, 1 info remaining) -``` diff --git a/skills/plan-phase/SKILL.md b/skills/plan-phase/SKILL.md deleted file mode 100644 index d8cabd2..0000000 --- a/skills/plan-phase/SKILL.md +++ /dev/null @@ -1,175 +0,0 @@ ---- -name: plan-phase -description: Use when acting as Explorer or Creator in the Plan phase. Defines output formats for research and proposals. ---- - -# Plan Phase - -Explorer researches, then Creator designs. Sequential — Creator needs Explorer's findings. - -## Explorer Output Format - -```markdown -## Research: <task> - -### Affected Code -- `path/file.ext` — description (L<start>-<end>) - -### Dependencies -- What depends on what, what breaks if changed - -### Patterns -- How the codebase solves similar problems - -### Risks -- What could go wrong - -### Recommendation -<one paragraph: approach + rationale> -``` - -## Creator Output Format - -```markdown -## Proposal: <task> - -### Mini-Reflect (fast workflow only — skip if Explorer ran) -- **Task restated:** <one sentence> -- **Assumptions:** 1) ... 2) ... 3) ... -- **Highest-damage risk:** <the one thing that would hurt most if wrong> - -### Architecture Decision -<What and WHY> - -### Alternatives Considered -| Approach | Why Rejected | -|----------|-------------| -| <option A> | <reason> | -| <option B> | <reason> | - -### Changes -1. **`path/file.ext`** — What changes and why -2. **`path/test.ext`** — What tests to add - -### Test Strategy -- <specific test cases> - -### Confidence -| Axis | Score | Note | -|------|-------|------| -| Task understanding | <0.0-1.0> | <why> | -| Solution completeness | <0.0-1.0> | <gaps?> | -| Risk coverage | <0.0-1.0> | <unknowns?> | - -### Risks -- <what could go wrong + mitigations> - -### Not Doing -- <adjacent concerns deliberately excluded> -``` - -**Confidence triggers:** If any axis scores below 0.5, flag it to the orchestrator. Low task understanding → clarify with user. Low solution completeness → consider standard workflow. Low risk coverage → spawn targeted Explorer research. - -## Creator with Prior Feedback (Cycle 2+) - -When the Creator receives structured feedback from a prior cycle, the proposal must include an additional section addressing each unresolved issue: - -```markdown -## Proposal: <task> (Revision — Cycle N) - -### What Changed (vs. prior proposal) -- <brief delta: what was added, removed, or redesigned> - -### Prior Feedback Response -| Issue | Source | Action | Rationale | -|-------|--------|--------|-----------| -| SQL injection in user input | Guardian | **Fixed** — added parameterized queries | Direct security fix | -| Assumes single-tenant | Skeptic | **Deferred** — multi-tenant out of scope | Not in task requirements | -| Test names unclear | Sage | **Accepted** — routed to Maker | Implementation concern | - -### Architecture Decision -<revised design addressing feedback> - -### Changes -<updated file list> - -### Test Strategy -<updated test cases> - -### Confidence -| Axis | Score | Note | -|------|-------|------| -| Task understanding | <0.0-1.0> | <why> | -| Solution completeness | <0.0-1.0> | <gaps?> | -| Risk coverage | <0.0-1.0> | <unknowns?> | - -### Risks -<updated risks — include any new risks from the revision> - -### Not Doing -<updated scope boundaries> -``` - -**Rules for addressing feedback:** -- **Fixed:** Changed the design to resolve the issue. Explain how. -- **Deferred:** Not addressing now, with explicit reason. Must not be a CRITICAL finding. -- **Accepted:** Acknowledged and routed to Maker for implementation-level fix. -- **Disputed:** Disagrees with the finding. Must provide evidence or reasoning. - -CRITICAL findings cannot be deferred or disputed — they must be fixed or the proposal will be rejected again. - -## Task Granularity - -Each change item in the Creator's proposal must be a **2-5 minute task** — specific enough that the Maker can implement it without interpretation. - -### Requirements per Change Item - -Every item in the `### Changes` section must include: - -1. **Exact file path** — `src/auth/handler.ts`, not "the auth module" -2. **What to change** — a code block showing the target state or transformation -3. **How to verify** — a command or check that confirms correctness - -### Good Example - -```markdown -1. **`src/auth/handler.ts:48`** — Add input length validation before token processing - ```typescript - if (!token || token.trim().length === 0) { - throw new ValidationError('Token must not be empty'); - } - ``` - **Verify:** `npm test -- --grep "empty token"` passes -``` - -### Bad Example - -```markdown -1. **Auth module** — Fix the validation logic -``` - -This is too vague. Which file? Which function? What does "fix" mean? The Maker will guess. - -### Granularity Check - -- If a single change item would take **>5 minutes**, split it into smaller items -- If a non-trivial task has **<2 change items**, it is under-specified — the Creator missed something -- Each item should touch **1-2 files** at most. Cross-cutting changes need separate items per file. - ---- - -## Explorer Skip Conditions - -Not every task needs Explorer research. Use this decision table: - -| Condition | Skip Explorer? | Reason | -|-----------|---------------|--------| -| Task names specific files (1-2) and change is clear | **Yes** | Context is already known | -| Bug fix with stack trace or error message | **Yes** | Root cause is locatable without research | -| High confidence + small scope (single function/class) | **Yes** | Creator can mini-reflect instead | -| Task contains "investigate", "research", "explore" | **No** | Explicit research request | -| Task affects >3 files or unknown scope | **No** | Need dependency mapping | -| Unfamiliar area of codebase (no recent commits by team) | **No** | Need pattern discovery | -| Security-sensitive change (auth, crypto, input handling) | **No** | Need risk surface mapping | - -When Explorer is skipped, Creator MUST include the **Mini-Reflect** section in its proposal to compensate for missing research context. diff --git a/skills/process-log/SKILL.md b/skills/process-log/SKILL.md deleted file mode 100644 index 4dce56e..0000000 --- a/skills/process-log/SKILL.md +++ /dev/null @@ -1,278 +0,0 @@ ---- -name: process-log -description: | - Event-based process logging for ArcheFlow orchestrations. Captures every phase transition, - agent output, decision, and fix as structured JSONL events. Enables post-hoc reports, - dashboards, and process archaeology. - <example>Automatically loaded during orchestration</example> - <example>User: "Show me how this story was made"</example> ---- - -# Process Log — Event-Sourced Orchestration History - -Every ArcheFlow orchestration writes structured events to a JSONL file. Events are the **single source of truth** — all reports (Markdown, dashboards, timelines) are generated views. - -## Event Storage - -``` -.archeflow/events/<run-id>.jsonl # One file per orchestration run -.archeflow/events/index.jsonl # Run index (one line per run, for listing) -``` - -**Run ID format:** `<date>-<slug>` (e.g., `2026-04-03-der-huster`) - -## When to Emit Events - -Emit an event at each of these points during orchestration: - -| Moment | Event Type | Trigger | -|--------|-----------|---------| -| Orchestration starts | `run.start` | After workflow selection, before first agent | -| Agent spawned | `agent.start` | Before each Agent tool call | -| Agent completes | `agent.complete` | After each Agent returns | -| Phase transition | `phase.transition` | Plan→Do, Do→Check, Check→Act | -| Decision made | `decision` | Plot direction chosen, fix applied, workflow adapted | -| Review verdict | `review.verdict` | Guardian/Sage/Skeptic delivers verdict | -| Fix applied | `fix.applied` | After each edit that addresses a review finding | -| Cycle boundary | `cycle.boundary` | End of PDCA cycle, before next (or exit) | -| Shadow detected | `shadow.detected` | Shadow threshold triggered | -| Orchestration ends | `run.complete` | After final Act phase | - -## Event Schema - -Every event is one JSON line with these required fields: - -```jsonl -{ - "ts": "2026-04-03T14:32:07Z", - "run_id": "2026-04-03-der-huster", - "seq": 4, - "parent": [2], - "type": "agent.complete", - "phase": "plan", - "agent": "creator", - "data": { ... } -} -``` - -| Field | Type | Description | -|-------|------|-------------| -| `ts` | ISO 8601 | Timestamp | -| `run_id` | string | Unique run identifier | -| `seq` | integer | Monotonically increasing sequence number within run | -| `parent` | int[] | Seq numbers of causal parent events. Forms a DAG. `[]` for root events. | -| `type` | string | Event type (see table above) | -| `phase` | string | Current PDCA phase: `plan`, `do`, `check`, `act` | -| `agent` | string or null | Agent archetype that triggered the event | -| `data` | object | Event-type-specific payload (see below) | - -### Parent Relationships (DAG) - -The `parent` field turns the flat event stream into a directed acyclic graph (agent call graph). This enables: - -- **Causal reconstruction:** which agent output caused which downstream action -- **Parallel visualization:** agents sharing a parent ran concurrently -- **Blame tracking:** trace a fix back through review → draft → outline → research - -Rules: -- `run.start` has `parent: []` (root node) -- An agent has `parent: [seq of event that triggered it]` -- A phase transition has `parent: [seq of all completing events in prior phase]` -- A fix has `parent: [seq of the review that found the issue]` -- A decision has `parent: [seq of the agent that produced the alternatives]` -- Parallel agents share the same parent (fan-out), phase transitions collect them (fan-in) - -Example DAG from a writing workflow: -``` -#1 run.start [] -├── #2 agent.complete (explorer) [1] -│ └── #3 decision (plot direction) [2] -├── #4 agent.complete (creator) [2] ← explorer informs creator -├── #5 phase.transition (plan→do) [3,4] ← fan-in -│ └── #6 agent.complete (maker) [5] -├── #7 phase.transition (do→check) [6] -│ ├── #8 review (guardian) [7] ← parallel (fan-out) -│ └── #9 review (sage) [7] ← parallel (fan-out) -├── #10 phase.transition (check→act) [8,9] ← fan-in -├── #11 fix (timeline) [8] ← caused by guardian -├── #12 fix (voice drift) [9] ← caused by sage -└── #18 run.complete [17] -``` - -## Event Payloads by Type - -### `run.start` -```json -{ - "task": "Write short story 'Der Huster'", - "workflow": "kurzgeschichte", - "team": "story-development", - "max_cycles": 2, - "config": { - "voice_profile": "vp-giesing-gschichten-v1", - "persona": "giesinger", - "target_words": 6000 - } -} -``` - -### `agent.start` -```json -{ - "archetype": "story-explorer", - "model": "haiku", - "prompt_summary": "Research premise, find emotional core, suggest 3 plot directions" -} -``` - -### `agent.complete` -```json -{ - "archetype": "story-explorer", - "duration_ms": 87605, - "tokens": 21645, - "artifacts": ["docs/01-der-huster-research.md"], - "summary": "3 plot directions developed, recommended C (Mo krank + Koffer)" -} -``` - -### `decision` -```json -{ - "what": "plot_direction", - "chosen": "C — Mo krank + Koffer aus B", - "alternatives": [ - {"id": "A", "label": "Mo ist weg", "reason_rejected": "Zu passiv für 6k-Story"}, - {"id": "B", "label": "Huster gehört nicht Mo", "reason_rejected": "Zu Krimi-nah"} - ], - "rationale": "Stärkster emotionaler Kern, passt zum Voice Profile" -} -``` - -### `review.verdict` -```json -{ - "archetype": "guardian", - "verdict": "approved_with_fixes", - "findings": [ - {"severity": "bug", "description": "Timeline: 'Montag' referenced but story starts Dienstag", "fix_required": true}, - {"severity": "recommendation", "description": "Gentrification monologue too long for Alex register", "fix_required": false} - ] -} -``` - -### `fix.applied` -```json -{ - "source": "guardian", - "finding": "Timeline: Montag → Dienstag", - "file": "stories/01-der-huster.md", - "line": 302, - "before": "das Gegenteil von Montag", - "after": "das Gegenteil von Dienstag" -} -``` - -### `phase.transition` -```json -{ - "from": "plan", - "to": "do", - "artifacts_so_far": ["research.md", "outline.md"], - "notes": "Explorer recommended direction C, Creator produced 6-scene outline" -} -``` - -### `cycle.boundary` -```json -{ - "cycle": 1, - "max_cycles": 2, - "exit_condition": "all_approved", - "met": true, - "fixes_applied": 6, - "next_action": "complete" -} -``` - -### `shadow.detected` -```json -{ - "archetype": "story-explorer", - "shadow": "endless_research", - "trigger": "output >2000 words without recommendation", - "action": "correction_prompt_applied", - "occurrence": 1 -} -``` - -### `run.complete` -```json -{ - "status": "completed", - "cycles": 1, - "agents_total": 5, - "fixes_total": 6, - "shadows": 0, - "duration_ms": 1295519, - "artifacts": [ - "docs/01-der-huster-research.md", - "docs/01-der-huster-outline.md", - "stories/01-der-huster.md", - "docs/01-der-huster-guardian-review.md", - "docs/01-der-huster-sage-review.md", - "docs/01-der-huster-process.md" - ] -} -``` - -## How to Emit Events - -During orchestration, write events using this pattern: - -```bash -# Append one event to the run's JSONL file -echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","run_id":"RUN_ID","seq":SEQ,"type":"TYPE","phase":"PHASE","agent":"AGENT","data":{...}}' >> .archeflow/events/RUN_ID.jsonl -``` - -Or use the helper script: - -```bash -./lib/archeflow-event.sh RUN_ID TYPE PHASE AGENT '{"key":"value"}' -``` - -The orchestration skill should call the event emitter at each trigger point listed in the table above. - -## Generating Reports - -After orchestration completes (or during, for live progress): - -```bash -# Generate markdown process report -./lib/archeflow-report.sh .archeflow/events/2026-04-03-der-huster.jsonl > docs/process-report.md - -# List all runs -cat .archeflow/events/index.jsonl | jq -r '[.run_id, .status, .task] | @tsv' -``` - -## Run Index - -After each `run.complete`, append a summary line to `.archeflow/events/index.jsonl`: - -```jsonl -{"run_id":"2026-04-03-der-huster","ts":"2026-04-03T16:00:00Z","task":"Write Der Huster","workflow":"kurzgeschichte","status":"completed","cycles":1,"agents":5,"fixes":6,"duration_ms":1295519} -``` - -## Integration with Existing Skills - -- **`orchestration`**: Emit events at phase transitions and after each agent -- **`shadow-detection`**: Emit `shadow.detected` when thresholds trigger -- **`autonomous-mode`**: Use `index.jsonl` for session summaries instead of separate session-log -- **`workflow-design`**: Custom workflows inherit logging automatically - -## Design Principles - -1. **Append-only.** Never modify or delete events. They are immutable facts. -2. **Self-contained.** Each event has enough context to be understood alone (no forward references). -3. **Cheap.** One `echo >>` per event. No database, no service, no dependencies. -4. **Optional.** If events dir doesn't exist, orchestration works fine without logging. Events are observation, not control flow. diff --git a/skills/run/SKILL.md b/skills/run/SKILL.md index 8d5ec45..16d2b21 100644 --- a/skills/run/SKILL.md +++ b/skills/run/SKILL.md @@ -1,791 +1,309 @@ --- name: run description: | - Automated PDCA execution loop. Single-command orchestration that initializes a run, flows through - Plan/Do/Check/Act phases, emits events at every step, saves artifacts to disk, and handles - cycle-back with structured feedback. Use instead of manually following orchestration steps. - <example>User: "archeflow:run"</example> - <example>User: "Run this through ArcheFlow"</example> - <example>User: "archeflow:run --start-from check"</example> - <example>User: "archeflow:run --dry-run"</example> + Start an ArcheFlow PDCA run. Usage: /af-run <task description> [--workflow fast|standard|thorough] [--dry-run] [--start-from plan|do|check|act] --- -# ArcheFlow Run — Automated PDCA Execution Loop +# ArcheFlow Run — PDCA Orchestration -This skill automates the full orchestration cycle. When invoked, Claude executes all PDCA phases end-to-end, emitting events and saving artifacts at every step. No manual phase-by-phase intervention needed. +One command runs the full cycle: Plan (Explorer+Creator) -> Do (Maker in worktree) -> Check (Guardian first, then others) -> Act (collect findings, route fixes, exit or cycle). -## Prerequisites +## 0. Initialize -Load these skills (they are referenced throughout): -- `archeflow:orchestration` — agent prompts, workflow selection, adaptation rules -- `archeflow:process-log` — event schema and DAG parent rules -- `archeflow:artifact-routing` — artifact naming, context injection, cycle archiving +1. Generate run ID: `<YYYY-MM-DD>-<task-slug>` +2. Create artifact directory: `mkdir -p .archeflow/artifacts/<run_id>` +3. Verify `./lib/archeflow-*.sh` scripts exist before proceeding +4. Inject cross-run memory: `./lib/archeflow-memory.sh inject "$DOMAIN" "" --audit "$RUN_ID"` +5. Read `.archeflow/config.yaml` models section. Resolution order: per-workflow per-archetype > per-workflow default > per-archetype > global default. +6. Emit `run.start` event -## Invocation +### Strategy Selection -``` -archeflow:run # Full run, auto-select workflow -archeflow:run --workflow standard # Force a specific workflow -archeflow:run --start-from do # Resume from Do phase (requires prior artifacts) -archeflow:run --start-from check # Resume from Check phase -archeflow:run --dry-run # Plan phase only, show cost estimate -archeflow:run --max-cycles 1 # Override max cycles -``` +Determine strategy from CLI flag `--strategy`, config `strategy:` field, or auto-detect: + +| Signal | Strategy | +|--------|----------| +| Task contains fix/bug/patch/hotfix | `pipeline` | +| Task contains refactor/redesign/review | `pdca` | +| Workflow is `fast` with single file | `pipeline` | +| Workflow is `thorough` | `pdca` | +| Default | `pdca` | + +If `pipeline`, skip to the Pipeline section at the end. Otherwise continue with PDCA below. + +### Workflow Selection + +| Signal | Workflow | Max Cycles | +|--------|----------|------------| +| Small fix, low risk, single concern | `fast` | 1 | +| Feature, multiple files, moderate risk | `standard` | 2 | +| Security-sensitive, breaking changes, public API | `thorough` | 3 | --- -## Execution Steps +## Attention Filters -### 0. Initialize +Each agent receives only what it needs. This is the canonical reference: -Generate a run ID and set up the artifact directory. - -```bash -# Generate run_id -RUN_ID="$(date -u +%Y-%m-%d)-<task-slug>" - -# Create artifact directory -mkdir -p .archeflow/artifacts/${RUN_ID} - -# Emit run.start event (seq=1, parent=[]) -./lib/archeflow-event.sh "$RUN_ID" run.start plan "" \ - '{"task":"<task description>","workflow":"<fast|standard|thorough>","max_cycles":<N>}' -``` - -**Track state:** Maintain these variables throughout the run: -- `RUN_ID` — unique run identifier -- `SEQ` — current sequence number (read from event file line count after each emit) -- `CYCLE` — current PDCA cycle number (starts at 1) -- `WORKFLOW` — fast/standard/thorough (may change via adaptation rules) -- `ESCALATED` — boolean, set true if A1 triggers - -After emitting `run.start`, record `SEQ_RUN_START=1`. - -If `--start-from` is specified, verify that the required prior artifacts exist in `.archeflow/artifacts/${RUN_ID}/` before skipping phases. If missing, abort with an error. - -#### 0a. Strategy Resolution - -Determine the execution strategy before proceeding. Strategy controls the overall flow shape (cyclic vs linear). - -```bash -# Read strategy from config or CLI flag -STRATEGY=$(grep '^strategy:' "$CONFIG" 2>/dev/null | sed 's/strategy:\s*//' | tr -d '"' | head -1) -STRATEGY="${STRATEGY:-auto}" - -# CLI override: --strategy pdca|pipeline -# (parsed from invocation args, overrides config) - -# Auto-select logic -if [[ "$STRATEGY" == "auto" ]]; then - TASK_LOWER=$(echo "$TASK" | tr '[:upper:]' '[:lower:]') - if echo "$TASK_LOWER" | grep -qE '(fix|bug|patch|hotfix)'; then - STRATEGY="pipeline" - elif echo "$TASK_LOWER" | grep -qE '(refactor|redesign|review)'; then - STRATEGY="pdca" - elif [[ "$WORKFLOW" == "fast" ]]; then - STRATEGY="pipeline" - elif [[ "$WORKFLOW" == "thorough" ]]; then - STRATEGY="pdca" - else - STRATEGY="pdca" - fi -fi - -echo "Strategy: $STRATEGY" -``` - -**Strategy dispatch:** If `STRATEGY=pdca`, execute Steps 1-5 below (existing PDCA flow). If `STRATEGY=pipeline`, skip to the "Pipeline Strategy Execution" section at the end of this skill. - -#### 0b. Lib Script Validation - -Verify that all required library scripts exist and are executable before proceeding. Fail fast if any dependency is missing. - -```bash -# Required lib scripts -REQUIRED_LIBS=( - "archeflow-event.sh" - "archeflow-memory.sh" - "archeflow-git.sh" - "archeflow-rollback.sh" - "archeflow-report.sh" - "archeflow-progress.sh" -) - -MISSING=() -for lib in "${REQUIRED_LIBS[@]}"; do - if [[ ! -x "./lib/$lib" ]]; then - MISSING+=("$lib") - fi -done - -if [[ ${#MISSING[@]} -gt 0 ]]; then - echo "ERROR: Missing or non-executable lib scripts:" >&2 - for m in "${MISSING[@]}"; do - echo " - lib/$m" >&2 - done - echo "Ensure ArcheFlow is installed correctly. See README for setup." >&2 - exit 1 -fi - -# Check jq availability (required for event processing and memory) -if ! command -v jq &>/dev/null; then - echo "ERROR: jq is required but not found in PATH." >&2 - echo "Install with: apt install jq / brew install jq" >&2 - exit 1 -fi -``` - -#### 0c. Memory Injection - -Load cross-run memory lessons and inject into agent prompts. Use `--audit` to track which lessons were injected for this run: - -```bash -# Load cross-run memory for this domain (with audit trail) -MEMORY_LESSONS=$(./lib/archeflow-memory.sh inject "$DOMAIN" "" --audit "$RUN_ID") - -# Inject into Explorer/Creator prompts if non-empty -if [[ -n "$MEMORY_LESSONS" ]]; then - EXPLORER_PROMPT="${EXPLORER_PROMPT} - -${MEMORY_LESSONS}" - CREATOR_PROMPT="${CREATOR_PROMPT} - -${MEMORY_LESSONS}" -fi -``` - -#### 0d. Model Configuration - -Read model assignment from `.archeflow/config.yaml` and resolve the model for each archetype based on the current workflow. Per-workflow overrides take precedence over per-archetype overrides, which take precedence over the default. - -```bash -CONFIG=".archeflow/config.yaml" - -# Read default model -DEFAULT_MODEL=$(grep -A1 '^models:' "$CONFIG" 2>/dev/null | grep 'default:' | sed 's/.*default:\s*//' | tr -d '"' | head -1) -DEFAULT_MODEL="${DEFAULT_MODEL:-sonnet}" - -# Resolve model for a given archetype and workflow -# Usage: resolve_model <archetype> <workflow> -resolve_model() { - local arch="$1" wf="$2" model="" - - # Check per-workflow per-archetype override - model=$(sed -n "/workflows:/,\$p" "$CONFIG" 2>/dev/null \ - | sed -n "/${wf}:/,/^ [a-z]/p" \ - | grep -A1 "archetypes:" | grep "${arch}:" \ - | sed "s/.*${arch}:\s*//" | tr -d '"' | head -1) - [[ -n "$model" ]] && echo "$model" && return - - # Check per-workflow default - model=$(sed -n "/workflows:/,\$p" "$CONFIG" 2>/dev/null \ - | sed -n "/${wf}:/,/^ [a-z]/p" \ - | grep 'default:' | sed 's/.*default:\s*//' | tr -d '"' | head -1) - [[ -n "$model" ]] && echo "$model" && return - - # Check per-archetype override - model=$(sed -n "/^ archetypes:/,/^ [a-z]/p" "$CONFIG" 2>/dev/null \ - | grep "${arch}:" | sed "s/.*${arch}:\s*//" | tr -d '"' | head -1) - [[ -n "$model" ]] && echo "$model" && return - - # Fall back to default - echo "$DEFAULT_MODEL" -} - -# Example: EXPLORER_MODEL=$(resolve_model explorer "$WORKFLOW") -``` - -Use `resolve_model` when spawning each agent to pass the correct model. The resolved model can be included in the `agent.start` event data for cost tracking. +| Archetype | Receives | Excludes | +|-----------|----------|----------| +| Explorer | Task description, codebase access | Prior proposals, reviews, diffs | +| Creator | Task + Explorer output (+ feedback cycle 2+) | Raw files, diffs, reviewer outputs | +| Maker | Creator's proposal (+ Maker-routed feedback cycle 2+) | Explorer research, reviewer outputs | +| Guardian | Maker's diff + proposal risk section | Full proposal, Explorer research | +| Skeptic | Creator's proposal (assumptions focus) | Diff details, Explorer research | +| Sage | Proposal + diff + implementation summary | Explorer research, other reviews | +| Trickster | Maker's diff only | Everything else | --- -### Status Token Protocol +## Status Token Protocol -Every agent ends its output with a `STATUS:` line. The orchestrator parses this to decide the next action. - -**Parsing:** - -```bash -STATUS=$(tail -20 "$AGENT_OUTPUT" | grep -oE 'STATUS: (DONE|DONE_WITH_CONCERNS|NEEDS_CONTEXT|BLOCKED)' | head -1) -STATUS="${STATUS#STATUS: }" -if [[ -z "$STATUS" ]]; then STATUS="DONE"; fi -``` - -**Status to action mapping:** +Every agent ends output with `STATUS: <token>`. Parse it to decide the next action. | Status | Action | |--------|--------| -| `DONE` | Proceed to next phase or agent | -| `DONE_WITH_CONCERNS` | Log concerns in event data, proceed | -| `NEEDS_CONTEXT` | Pause run, request missing information from user | -| `BLOCKED` | Abort phase, report blocker to user | +| `DONE` | Proceed to next phase | +| `DONE_WITH_CONCERNS` | Log concerns, proceed | +| `NEEDS_CONTEXT` | Pause, request info from user | +| `BLOCKED` | Abort phase, report blocker | -Include the parsed status in the `agent.complete` event data: `"status":"<STATUS>"`. +If no status token found, default to `DONE`. --- -### 1. Plan Phase +## 1. Plan Phase -#### 1a. Explorer (if standard or thorough) - -```bash -# Emit agent.start -./lib/archeflow-event.sh "$RUN_ID" agent.start plan explorer \ - '{"archetype":"explorer","prompt_summary":"Research codebase context for task"}' "$SEQ_RUN_START" -``` - -Spawn the Explorer agent using the prompt from `archeflow:orchestration` Step 1. +### 1a. Explorer (standard/thorough only) ``` Agent( description: "Explorer: research context for <task>", - prompt: "<Explorer prompt from orchestration skill>", + prompt: "You are the EXPLORER archetype. + <task description> + Research: 1) affected files/functions, 2) dependencies, 3) test coverage, + 4) codebase patterns. Write a structured research report. + Be thorough but focused — no rabbit holes.", subagent_type: "Explore" ) ``` -After Explorer returns: -1. Save output to `.archeflow/artifacts/${RUN_ID}/plan-explorer.md` -2. Emit `agent.complete`: - ```bash - ./lib/archeflow-event.sh "$RUN_ID" agent.complete plan explorer \ - '{"archetype":"explorer","duration_ms":<ms>,"artifacts":["plan-explorer.md"],"summary":"<1-line summary>"}' "$SEQ_EXPLORER_START" - ``` -3. Record `SEQ_EXPLORER_COMPLETE` for DAG references. +Save output to `.archeflow/artifacts/<run_id>/plan-explorer.md`. -#### 1b. Creator +### 1b. Creator -The Creator receives Explorer output (if it exists) or performs Mini-Reflect (fast workflow). - -```bash -# Emit agent.start — parent is explorer.complete (or run.start for fast) -./lib/archeflow-event.sh "$RUN_ID" agent.start plan creator \ - '{"archetype":"creator","prompt_summary":"Design solution proposal"}' "$SEQ_EXPLORER_COMPLETE" -``` - -Spawn the Creator agent using the prompt from `archeflow:orchestration` Step 1. - -**Context injection (from artifact-routing skill):** -- Fast workflow: task description only -- Standard/thorough: task description + contents of `plan-explorer.md` -- Cycle 2+: task description + `plan-explorer.md` + `act-feedback.md` from prior cycle +Fast workflow (no Explorer): Creator must perform Mini-Reflect first: +1. Restate the task in one sentence +2. List 3 assumptions +3. Name the highest-damage risk ``` Agent( description: "Creator: design proposal for <task>", - prompt: "<Creator prompt from orchestration skill, with context injected per above>", + prompt: "You are the CREATOR archetype. + <task description> + <if fast: Perform Mini-Reflect first (restate task, 3 assumptions, top risk)> + <if standard/thorough: Research findings: <plan-explorer.md contents>> + <if cycle 2+: Prior feedback: <Creator-routed section of act-feedback.md>> + Design a proposal: + 1. Architecture decisions (with rationale) + 2. Files to create/modify (exact paths, specific changes, 2-5 min per item) + 3. Alternatives considered (2+, with rejection rationale) + 4. Test strategy (specific test cases) + 5. Confidence table (task understanding, solution completeness, risk coverage — each 0.0-1.0) + 6. Risks and mitigations + <if cycle 2+: 7. How each prior issue was addressed (Fixed/Deferred/Accepted/Disputed)> + Be decisive — ship a clear plan, not a menu.", subagent_type: "Plan" ) ``` -After Creator returns: -1. Save output to `.archeflow/artifacts/${RUN_ID}/plan-creator.md` -2. Emit `agent.complete` -3. Record `SEQ_CREATOR_COMPLETE` +Save output to `.archeflow/artifacts/<run_id>/plan-creator.md`. -#### 1c. Confidence Gate (Adaptation Rule A3) +### 1c. Confidence Gate (Rule A3) -**Parsing instructions:** +Parse the `### Confidence` table from `plan-creator.md`. If unparseable, default to 0.0 (triggers gate). -Read `plan-creator.md`, locate the `### Confidence` table. Extract scores for each axis as floats: +| Axis | Score < 0.5 | Action | +|------|-------------|--------| +| Task understanding | Pause | Ask user for clarification. Do not spawn Maker. | +| Solution completeness | Upgrade | If fast -> standard. Spawn Explorer, re-run Creator. | +| Risk coverage | Mini-Explorer | Spawn focused Explorer for risky areas (5 min max, parallel with Do prep). | -```bash -CONF_FILE=".archeflow/artifacts/${RUN_ID}/plan-creator.md" - -# Extract confidence scores (expects format: "| Task understanding | 0.8 |") -TASK_UNDERSTANDING=$(grep -i "task understanding" "$CONF_FILE" | grep -oE '[0-9]+\.[0-9]+' | head -1) -SOLUTION_COMPLETENESS=$(grep -i "solution completeness" "$CONF_FILE" | grep -oE '[0-9]+\.[0-9]+' | head -1) -RISK_COVERAGE=$(grep -i "risk coverage" "$CONF_FILE" | grep -oE '[0-9]+\.[0-9]+' | head -1) - -# Fallback: if unparseable, emit warning and default to 0.0 (triggers gate, not bypasses it) -if [[ -z "$TASK_UNDERSTANDING" || -z "$SOLUTION_COMPLETENESS" || -z "$RISK_COVERAGE" ]]; then - echo "WARNING: Could not parse confidence scores from plan-creator.md" >&2 - ./lib/archeflow-event.sh "$RUN_ID" decision plan "" \ - '{"what":"confidence_parse_failure","chosen":"warn","rationale":"one or more scores unparseable"}' "$SEQ_CREATOR_COMPLETE" -fi -TASK_UNDERSTANDING="${TASK_UNDERSTANDING:-0.0}" -SOLUTION_COMPLETENESS="${SOLUTION_COMPLETENESS:-0.0}" -RISK_COVERAGE="${RISK_COVERAGE:-0.0}" -``` - -**Pause branch** (Task understanding < 0.5): - -The Creator does not sufficiently understand the task. Do not spawn Maker. - -1. Emit decision event with `"chosen":"pause"` -2. Display message to user: "Creator rated task understanding at <score>. Clarification needed before proceeding." -3. Block until the user provides clarification -4. Re-run Creator with the clarification appended to the task description - -```bash -./lib/archeflow-event.sh "$RUN_ID" decision plan "" \ - '{"what":"confidence_gate","chosen":"pause","rationale":"task_understanding scored '"$TASK_UNDERSTANDING"'"}' "$SEQ_CREATOR_COMPLETE" -``` - -**Upgrade branch** (Solution completeness < 0.5): - -The Creator's proposal is incomplete — more research is needed. - -1. If fast workflow: upgrade to standard, spawn Explorer, then re-run Creator with Explorer output -2. If already standard/thorough: re-run Explorer with a focused prompt targeting the incomplete areas - -```bash -./lib/archeflow-event.sh "$RUN_ID" decision plan "" \ - '{"what":"confidence_gate","chosen":"upgrade","rationale":"solution_completeness scored '"$SOLUTION_COMPLETENESS"'"}' "$SEQ_CREATOR_COMPLETE" - -# If fast → standard upgrade: -WORKFLOW="standard" -# Spawn Explorer, then re-run Creator with Explorer findings -``` - -**Mini-Explorer branch** (Risk coverage < 0.5): - -The Creator identified risks but lacks confidence in their assessment. Spawn a focused Explorer to investigate. - -``` -Agent( - description: "Mini-Explorer: investigate risk area for <task>", - prompt: "You are the EXPLORER archetype. The Creator rated risk coverage at <score>. - Identified risks: <risks from plan-creator.md> - Research ONLY the risky areas. Answer: Is the risk real? What mitigations exist? What tests/guards would help? - Limit: focused output only.", - subagent_type: "Explore" -) -``` - -Save output to `.archeflow/artifacts/${RUN_ID}/plan-mini-explorer.md`. The Maker receives both `plan-creator.md` and `plan-mini-explorer.md` as context. - -```bash -./lib/archeflow-event.sh "$RUN_ID" decision plan "" \ - '{"what":"confidence_gate","chosen":"mini_explorer","rationale":"risk_coverage scored '"$RISK_COVERAGE"'"}' "$SEQ_CREATOR_COMPLETE" -``` - -**Note:** The mini-Explorer runs in parallel with Do phase preparation (5 min max). The Maker can proceed once both `plan-creator.md` and `plan-mini-explorer.md` are available. - -#### 1d. Phase Transition: Plan to Do - -```bash -# Parent = all completing events in Plan phase -./lib/archeflow-event.sh "$RUN_ID" phase.transition do "" \ - '{"from":"plan","to":"do","artifacts_so_far":["plan-explorer.md","plan-creator.md"]}' "$SEQ_CREATOR_COMPLETE" -``` - -Record `SEQ_PLAN_TO_DO`. +Mini-Explorer prompt: "You are the EXPLORER. Risk coverage scored <score>. Identified risks: <risks>. Research ONLY the risky areas. Is the risk real? What mitigations exist?" +Save to `plan-mini-explorer.md`. --- -### 2. Do Phase +## 2. Do Phase -#### 2a. Maker - -**Context injection (from artifact-routing skill):** -- Contents of `plan-creator.md` (the proposal) -- Cycle 2+: also contents of `act-feedback.md` filtered to Maker-routed findings only - -```bash -./lib/archeflow-event.sh "$RUN_ID" agent.start do maker \ - '{"archetype":"maker","prompt_summary":"Implement proposal in isolated worktree"}' "$SEQ_PLAN_TO_DO" -``` +### 2a. Maker ``` Agent( description: "Maker: implement <task>", - prompt: "<Maker prompt from orchestration skill, with Creator proposal injected> - <if cycle 2+: Implementation feedback: <Maker-routed findings from act-feedback.md>>", + prompt: "You are the MAKER archetype. + Implement this proposal: <plan-creator.md contents> + <if cycle 2+: Implementation feedback: <Maker-routed findings from act-feedback.md>> + Rules: + 1. Follow the proposal exactly — don't redesign + 2. Write tests for every behavioral change + 3. Commit with descriptive messages (CRITICAL: uncommitted worktree changes are LOST) + 4. Run existing tests — nothing may break + 5. If unclear, implement best interpretation and note it + Self-review before finishing: + - All proposal files changed? Tests added? No out-of-scope files? Existing tests pass?", isolation: "worktree", mode: "bypassPermissions" ) ``` -After Maker returns: -1. Save implementation summary to `.archeflow/artifacts/${RUN_ID}/do-maker.md` -2. Capture list of changed files: `git diff --name-only` on the Maker's branch, save to `.archeflow/artifacts/${RUN_ID}/do-maker-files.txt` -3. Emit `agent.complete`: - ```bash - ./lib/archeflow-event.sh "$RUN_ID" agent.complete do maker \ - '{"archetype":"maker","duration_ms":<ms>,"artifacts":["do-maker.md","do-maker-files.txt"],"summary":"<files changed, tests added>"}' "$SEQ_MAKER_START" - ``` -4. Record `SEQ_MAKER_COMPLETE` +Save summary to `do-maker.md`. Save changed file list to `do-maker-files.txt`. -**Critical:** Verify the Maker committed its changes before proceeding. If uncommitted changes exist, instruct the Maker to commit. +### 2b. Test-First Gate -#### 2a-ii. Test-First Validation - -After Maker completes, check `do-maker-files.txt` for test files: -```bash -TEST_FILES=$(grep -iE '([/_.-](test|spec)[/_.-]|\.(test|spec)\.|_(test|spec)\.|/tests?/|/__tests__/|/specs?/)' ".archeflow/artifacts/${RUN_ID}/do-maker-files.txt" || true) -``` - -If `TEST_FILES` is empty and domain is not `writing`: -1. Check if `plan-creator.md` contains a `### Test Strategy` section -2. If yes: re-run Maker with targeted test instruction (one retry within Do phase) -3. If no test strategy specified: emit WARNING event and proceed - -```bash -./lib/archeflow-event.sh "$RUN_ID" decision do "" \ - '{"what":"test_first_gate","chosen":"<pass|warn|retry>","rationale":"<reason>"}' "$SEQ_MAKER_COMPLETE" -``` - -The re-run prompt for the retry case: -> "The proposal specified these test cases: <test strategy section>. No test files were found in your changes. Add the specified tests before finishing." - -This is one retry within the Do phase, not a full PDCA cycle. If the retry also produces no tests, emit WARNING and proceed to Check. - -#### 2b. Phase Transition: Do to Check - -```bash -./lib/archeflow-event.sh "$RUN_ID" phase.transition check "" \ - '{"from":"do","to":"check","artifacts_so_far":["plan-explorer.md","plan-creator.md","do-maker.md","do-maker-files.txt"]}' "$SEQ_MAKER_COMPLETE" -``` - -Record `SEQ_DO_TO_CHECK`. +Check `do-maker-files.txt` for test files. If none found and domain is not `writing`: +- If Creator specified a test strategy: re-run Maker with targeted test instruction (1 retry within Do, not a full cycle) +- If no test strategy: emit WARNING, proceed --- -### 3. Check Phase +## 3. Check Phase -**Important:** Spawn Guardian FIRST, then evaluate A2 before spawning other reviewers. +Spawn Guardian FIRST. Evaluate Rule A2 before spawning other reviewers. -#### 3a. Guardian (always first) - -**Context injection:** Maker's git diff + proposal risk section only (not full proposal, not Explorer research). - -```bash -./lib/archeflow-event.sh "$RUN_ID" agent.start check guardian \ - '{"archetype":"guardian","prompt_summary":"Security and risk review of changes"}' "$SEQ_DO_TO_CHECK" -``` +### 3a. Guardian (always first) ``` Agent( description: "Guardian: security review for <task>", - prompt: "<Guardian prompt from orchestration skill, with Maker's diff injected>" + prompt: "You are the GUARDIAN archetype. + Review changes in branch: <maker's branch> + <git diff from Maker's branch> + <risks section from plan-creator.md> + Assess: security, reliability, breaking changes, dependencies. + Output: APPROVED or REJECTED with findings table. + Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix | + Be rigorous but practical — flag real risks, not theoretical ones." ) ``` -After Guardian returns: -1. Save to `.archeflow/artifacts/${RUN_ID}/check-guardian.md` -2. Emit `review.verdict`: - ```bash - ./lib/archeflow-event.sh "$RUN_ID" review.verdict check guardian \ - '{"archetype":"guardian","verdict":"<approved|rejected|approved_with_fixes>","findings":[...]}' "$SEQ_GUARDIAN_START" - ``` -3. Record `SEQ_GUARDIAN_VERDICT` +Save to `check-guardian.md`. -#### 3b. Guardian Fast-Path Check (Adaptation Rule A2) +### 3b. Guardian Fast-Path (Rule A2) -Parse Guardian's output. If **0 CRITICAL and 0 WARNING** AND workflow is not escalated AND not first cycle of thorough: +If Guardian found **0 CRITICAL and 0 WARNING** AND workflow is not escalated AND not first cycle of thorough: +- Skip remaining reviewers, proceed directly to Act phase +- Log "Guardian fast-path taken" -```bash -./lib/archeflow-event.sh "$RUN_ID" decision check "" \ - '{"what":"guardian_fast_path","chosen":"skip_remaining_reviewers","rationale":"0 CRITICAL, 0 WARNING"}' "$SEQ_GUARDIAN_VERDICT" -``` +### 3c. Remaining Reviewers (spawn in parallel) -Skip to Phase Transition (3d). Log "Guardian fast-path taken" in report. +**Skeptic** (standard/thorough) — receives Creator's proposal, focus on assumptions. +Save to `check-skeptic.md`. -Otherwise, proceed to spawn remaining reviewers. +**Sage** (standard/thorough) — receives proposal + diff + implementation summary. +Save to `check-sage.md`. -#### 3c. Remaining Reviewers (in parallel) +**Trickster** (thorough only) — receives Maker's diff only. "Think like a QA engineer paid per bug." +Save to `check-trickster.md`. -Spawn these based on workflow (see `archeflow:orchestration` for which reviewers apply): +### 3d. Evidence Validation -**Skeptic** (standard/thorough): -- Context: Creator's proposal (assumptions section focus) -- Save to: `check-skeptic.md` +After all reviewers complete, scan CRITICAL/WARNING findings. Downgrade to INFO if: +- **Banned phrases** without evidence: "might be", "could potentially", "appears to", "seems like", "may not" +- **No evidence**: no command output, code citation, line reference, or reproduction steps -**Sage** (standard/thorough): -- Context: Creator's proposal + Maker's diff + implementation summary -- Save to: `check-sage.md` - -**Trickster** (thorough only): -- Context: Maker's diff only -- Save to: `check-trickster.md` - -Spawn all applicable reviewers in parallel (multiple Agent calls in one message). For each: - -```bash -# Emit agent.start with parent = SEQ_DO_TO_CHECK -./lib/archeflow-event.sh "$RUN_ID" agent.start check <archetype> \ - '{"archetype":"<archetype>","prompt_summary":"<review focus>"}' "$SEQ_DO_TO_CHECK" -``` - -After each returns, emit `review.verdict` and save artifact. - -#### 3c-ii. Evidence Validation - -After all reviewers complete, scan CRITICAL/WARNING findings for two conditions: -1. **Banned phrases** — hedged language without evidence -2. **Missing evidence** — no command output, code citation, or reproduction steps - -Downgrade unsupported findings to INFO before proceeding to Act. - -```bash -BANNED_PHRASES=("might be" "could potentially" "appears to" "seems like" "may not") -EVIDENCE_MARKERS=("exit" "output" "line [0-9]" ":[0-9]" "returned" "FAIL" "PASS" "assert") - -for artifact in .archeflow/artifacts/${RUN_ID}/check-*.md; do - REVIEWER=$(basename "$artifact" .md | sed 's/check-//') - - # Read findings table rows (CRITICAL and WARNING only) - grep -E '\| (CRITICAL|WARNING) \|' "$artifact" 2>/dev/null | while IFS= read -r line; do - SEVERITY=$(echo "$line" | grep -oE '(CRITICAL|WARNING)' | head -1) - DOWNGRADE_REASON="" - - # Check 1: banned phrases - for phrase in "${BANNED_PHRASES[@]}"; do - if echo "$line" | grep -qi "$phrase"; then - DOWNGRADE_REASON="banned phrase: $phrase" - break - fi - done - - # Check 2: no evidence markers (only if not already flagged) - if [[ -z "$DOWNGRADE_REASON" ]]; then - HAS_EVIDENCE=false - for marker in "${EVIDENCE_MARKERS[@]}"; do - if echo "$line" | grep -qiE "$marker"; then - HAS_EVIDENCE=true - break - fi - done - if [[ "$HAS_EVIDENCE" == "false" ]]; then - DOWNGRADE_REASON="no evidence cited" - fi - fi - - if [[ -n "$DOWNGRADE_REASON" ]]; then - echo "EVIDENCE DOWNGRADE: $REVIEWER $SEVERITY finding — $DOWNGRADE_REASON" - ./lib/archeflow-event.sh "$RUN_ID" decision check "" \ - '{"what":"evidence_downgrade","from":"'"$SEVERITY"'","to":"INFO","reviewer":"'"$REVIEWER"'","reason":"'"$DOWNGRADE_REASON"'"}' - # Note: the orchestrator tracks downgraded findings separately — - # do not modify the artifact file (avoids sed corruption on table rows) - fi - done -done -``` - -**Important:** Downgraded findings are tracked in events, NOT by modifying artifact files. The Act phase reads the decision events to know which findings were downgraded and excludes them from CRITICAL tallies. - -#### 3d. Phase Transition: Check to Act - -Collect all verdict seq numbers for the parent array. - -```bash -./lib/archeflow-event.sh "$RUN_ID" phase.transition act "" \ - '{"from":"check","to":"act"}' "<all_verdict_seqs>" -``` - -Record `SEQ_CHECK_TO_ACT`. +Track downgrades in events (do NOT modify artifact files). Act phase excludes downgraded findings from CRITICAL tallies. --- -### 4. Act Phase +## 4. Act Phase -#### 4a. Collect Verdicts +### 4a. Collect Verdicts -Read all `check-*.md` artifacts. Tally findings: -- Count CRITICAL, WARNING, INFO per reviewer -- Check for unanimous approval +Read all `check-*.md` artifacts. Tally CRITICAL/WARNING/INFO per reviewer. -#### 4b. Escalation Check (Adaptation Rule A1) +### 4b. Escalation Check (Rule A1) -If workflow is `fast` and Guardian found 2+ CRITICAL: -- Set `ESCALATED=true` -- Upgrade next cycle to `standard` (add Skeptic + Sage) -- Emit decision event +If `fast` workflow and Guardian found 2+ CRITICAL: upgrade next cycle to `standard` (add Skeptic + Sage). Once escalated, stays escalated. A2 does not apply to escalated workflows. -#### 4c. Branch: All Approved +### 4c. Convergence Check (cycle 2+ only) -If all reviewers approved (and completion criteria met, if defined): +Compare current findings against previous cycle. Classify each as NEW / RESOLVED / PERSISTENT / REGRESSED. -1. Emit `cycle.boundary`: - ```bash - ./lib/archeflow-event.sh "$RUN_ID" cycle.boundary act "" \ - '{"cycle":<N>,"max_cycles":<M>,"exit_condition":"all_approved","met":true,"next_action":"complete"}' "$SEQ_CHECK_TO_ACT" - ``` +``` +convergence_score = resolved / (resolved + new + regressed) +``` -2. **Pre-merge hook check:** - ```bash - # Read hooks config if it exists - if [[ -f ".archeflow/hooks.yaml" ]]; then - PRE_MERGE_HOOKS=$(grep -A5 "pre-merge:" .archeflow/hooks.yaml || true) - if [[ -n "$PRE_MERGE_HOOKS" ]]; then - echo "Running pre-merge hooks..." - # Execute hooks; abort merge if fail_action: abort - # Hook execution is project-specific — see .archeflow/hooks.yaml - fi - fi - ``` +| Score | Status | Action | +|-------|--------|--------| +| > 0.8 | Converging | Continue if cycles remain | +| 0.5-0.8 | Stalling | Continue with caution | +| < 0.5 | Diverging | STOP if 2 consecutive cycles | +| 0.0 (all persistent) | Stuck | STOP | -3. **Merge the Maker's worktree branch:** - ```bash - ./lib/archeflow-git.sh merge "$RUN_ID" --no-ff - ``` +**Oscillation**: Finding present in cycle N-2, absent in N-1, back in N. Two or more oscillating findings = STOP and escalate to user. -4. **Post-merge test validation** (using the auto-rollback script): - ```bash - # Run tests and auto-revert if they fail - if ! ./lib/archeflow-rollback.sh "$RUN_ID"; then - # Rollback script already reverted HEAD and emitted decision event - # If cycles remain, cycle back with integration test failure feedback - if [[ "$CYCLE" -lt "$MAX_CYCLES" ]]; then - echo "Cycling back with integration test failure feedback..." - # Build act-feedback.md with "integration test failure on main" as top finding - # Continue to step 4d (Issues Found) - else - echo "Max cycles reached. Reporting failure to user." - # Continue to step 4e (Max Cycles Reached) - fi - fi - ``` +Convergence STOP overrides normal cycle-back even if cycles remain. -5. **Clean up worktree:** - ```bash - ./lib/archeflow-git.sh cleanup "$RUN_ID" - ``` +### 4d. All Approved -6. Proceed to Completion (step 5) +1. Run pre-merge hooks from `.archeflow/hooks.yaml` if defined +2. Merge: `./lib/archeflow-git.sh merge "$RUN_ID" --no-ff` +3. Post-merge test validation: `./lib/archeflow-rollback.sh "$RUN_ID"` (auto-reverts if tests fail) +4. If rollback triggered and cycles remain: cycle back with "integration test failure" feedback +5. Clean up worktree: `./lib/archeflow-git.sh cleanup "$RUN_ID"` +6. Proceed to Completion -#### 4d. Branch: Issues Found (cycles remaining) +### 4e. Issues Found (cycles remaining) -If any reviewer rejected and `CYCLE < MAX_CYCLES`: +Build `act-feedback.md` using the feedback routing table below. Archive current cycle artifacts to `cycle-<N>/`. Increment cycle, go back to Plan. -1. Build structured feedback using the Cycle Feedback Protocol from `archeflow:orchestration`: - - Extract findings from all `check-*.md` artifacts - - Route findings: Guardian/Skeptic issues → Creator, Sage issues → Maker - - Check convergence: same finding in 2 consecutive cycles → escalate to user - - Dedup cross-archetype findings +### 4f. Max Cycles Reached -2. Save to `.archeflow/artifacts/${RUN_ID}/act-feedback.md` - -3. Save applied fixes log (initially empty, populated during next Do phase): - ```bash - touch .archeflow/artifacts/${RUN_ID}/act-fixes.jsonl - ``` - -4. Emit `cycle.boundary`: - ```bash - ./lib/archeflow-event.sh "$RUN_ID" cycle.boundary act "" \ - '{"cycle":<N>,"max_cycles":<M>,"exit_condition":"all_approved","met":false,"next_action":"cycle_back"}' "$SEQ_CHECK_TO_ACT" - ``` - -5. Archive current cycle artifacts: - ```bash - mkdir -p .archeflow/artifacts/${RUN_ID}/cycle-${CYCLE} - cp .archeflow/artifacts/${RUN_ID}/plan-*.md .archeflow/artifacts/${RUN_ID}/cycle-${CYCLE}/ - cp .archeflow/artifacts/${RUN_ID}/do-*.md .archeflow/artifacts/${RUN_ID}/do-*.txt .archeflow/artifacts/${RUN_ID}/cycle-${CYCLE}/ 2>/dev/null || true - cp .archeflow/artifacts/${RUN_ID}/check-*.md .archeflow/artifacts/${RUN_ID}/cycle-${CYCLE}/ - ``` - -6. Increment `CYCLE`, go back to Step 1 (Plan Phase) - -#### 4e. Branch: Max Cycles Reached - -If `CYCLE >= MAX_CYCLES` and issues remain: - -1. Report all unresolved findings to the user -2. Present the best implementation (on its branch, not merged) -3. Let the user decide: merge as-is, fix manually, or abandon -4. Emit `cycle.boundary` with `"met": false, "next_action": "user_decision"` +Report unresolved findings. Present best implementation on its branch (not merged). Let user decide. --- -### 5. Completion +## Feedback Routing Table -```bash -# Emit run.complete -./lib/archeflow-event.sh "$RUN_ID" run.complete act "" \ - '{"status":"completed","cycles":<N>,"agents_total":<count>,"fixes_total":<count>,"shadows":0,"artifacts":[<list>]}' +Route each finding to the right agent for the next cycle: -# Check for regressions from previously fixed findings -if ./lib/archeflow-memory.sh regression-check ".archeflow/events/${RUN_ID}.jsonl"; then - echo "No regressions detected." -else - echo "WARNING: Regressions detected — previously fixed findings have reappeared." - echo "Review the regression output above and consider addressing them." -fi +| Source | Category | Routes To | Rationale | +|--------|----------|-----------|-----------| +| Guardian | security, breaking-change | **Creator** | Design must change | +| Guardian | reliability, dependency | **Creator** | Architectural decision needed | +| Skeptic | design, scalability | **Creator** | Assumptions need revision | +| Sage | quality, consistency | **Maker** | Implementation refinement | +| Sage | testing | **Maker** | Test gap, not design flaw | +| Trickster | reliability (design flaw) | **Creator** | Needs redesign | +| Trickster | reliability (test gap) | **Maker** | Needs more tests | +| Trickster | testing | **Maker** | Edge case not covered | -# Generate report -./lib/archeflow-report.sh .archeflow/events/${RUN_ID}.jsonl +**Disambiguation**: If fix requires changing the approach -> Creator. If fix requires changing the code within the existing approach -> Maker. -# Update run index -echo '{"run_id":"'$RUN_ID'","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","task":"<task>","workflow":"<wf>","status":"completed","cycles":<N>}' \ - >> .archeflow/events/index.jsonl -``` +### Feedback File Format -Display the orchestration report to the user (see `archeflow:orchestration` report format). +`act-feedback.md` splits into `## Creator-Routed Issues` and `## Maker-Routed Issues`. Inject only the relevant section into each agent's prompt. + +**Same finding in 2 consecutive cycles**: escalate to user. Do not cycle again blindly. + +**Cross-archetype dedup**: If two reviewers raise the same issue (same file + category), merge into one finding. Route to higher-priority destination (Creator over Maker). --- -## Fix Tracking +## 5. Completion -When the Maker addresses review findings in cycle 2+, emit `fix.applied` for each: - -```bash -./lib/archeflow-event.sh "$RUN_ID" fix.applied act "" \ - '{"source":"<reviewer>","finding":"<description>","file":"<path>","line":<n>}' "$SEQ_OF_REVIEW" -``` - -Also append to `.archeflow/artifacts/${RUN_ID}/act-fixes.jsonl`: -```jsonl -{"source":"guardian","finding":"SQL injection","file":"src/auth.ts","line":48,"fixed_in_cycle":2} -``` - ---- - -## Dry-Run Mode - -When `--dry-run` is specified: - -1. Run **only the Plan phase** (Explorer + Creator) -2. Display: - ``` - Dry run for: "<task>" - Workflow: <standard> (<N> cycles max) - Agents per cycle: <count> - Max agents total: <count * cycles> - Plan phase result: see .archeflow/artifacts/<run_id>/plan-creator.md - Creator confidence: <scores> - Estimated phases: Plan (done) -> Do -> Check -> Act - Proceed with full run? [y/n] - ``` -3. Do NOT emit `run.complete` — the run is paused, not finished -4. If user says yes, continue from `--start-from do` using the saved artifacts - ---- - -## Start-From Mode - -When `--start-from <phase>` is specified: - -| Start from | Required artifacts in `.archeflow/artifacts/<run_id>/` | -|------------|-------------------------------------------------------| -| `plan` | None (equivalent to full run) | -| `do` | `plan-creator.md` | -| `check` | `plan-creator.md`, `do-maker.md`, `do-maker-files.txt` | -| `act` | All `check-*.md` files | - -Validate required artifacts exist. If missing, error: -``` -Cannot start from <phase>: missing artifact <name>. Run the prior phase first. -``` - -When resuming, emit a `run.start` event with `{"resumed_from":"<phase>"}` in data. - ---- - -## Error Handling - -- **Agent fails to return:** Wait up to 5 minutes. If no response, emit `agent.complete` with `"error": true`, log the failure, and abort the run. Do not retry blindly. -- **Event emitter fails:** Log a warning but do not block orchestration. Events are observation, not control flow. -- **Artifact write fails:** This IS blocking. Artifacts are required for phase handoff. Abort and report. -- **Merge conflict:** Do not force-resolve. Report the conflict, leave the branch intact, let the user decide. +1. Emit `run.complete` event +2. Check for regressions: `./lib/archeflow-memory.sh regression-check` +3. Generate report: `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl` +4. Score effectiveness: `./lib/archeflow-score.sh extract .archeflow/events/<run_id>.jsonl` +5. Append to run index: `.archeflow/events/index.jsonl` --- ## Progress Display -Throughout the run, display live progress using the format from `archeflow:using-archeflow`: - ``` ━━━ ArcheFlow Run: <task> ━━━━━━━━━━━━━━━━━━━ Run ID: <run_id> | Workflow: <standard> | Cycle: 1/<max> @@ -805,86 +323,137 @@ Report: .archeflow/events/<run_id>.jsonl --- -## Pipeline Strategy Execution +## Shadow Monitoring -When `STRATEGY=pipeline`, execute this linear flow instead of the PDCA cycle above. +Quick-check after each agent completes: -### Pipeline Phases +| Archetype | Shadow | Trigger | +|-----------|--------|---------| +| Explorer | Rabbit Hole | Output >2000 words without Recommendation section | +| Creator | Over-Architect | >2 new abstractions for one feature | +| Maker | Rogue | No tests in changeset, or files outside proposal | +| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1, or zero approvals | +| Skeptic | Paralytic | >7 challenges, <50% have alternatives | +| Trickster | False Alarm | Findings in untouched code, or >10 findings | +| Sage | Bureaucrat | Review >2x code change length | + +On detection: apply correction prompt. On 2nd detection of same shadow: replace agent. On 3+ shadows in one cycle: escalate to user. + +--- + +## Event Reference + +Emit events via `./lib/archeflow-event.sh <run_id> <type> <phase> <agent> '<json>'`. Events are optional — never let logging block orchestration. + +| When | Event Type | Key Data | +|------|-----------|----------| +| Run starts | `run.start` | task, workflow, max_cycles | +| Before agent spawn | `agent.start` | archetype, model, prompt_summary | +| After agent returns | `agent.complete` | archetype, duration_ms, artifacts, summary | +| Phase boundary | `phase.transition` | from, to, artifacts_so_far | +| Alternative chosen | `decision` | what, chosen, alternatives, rationale | +| Reviewer verdict | `review.verdict` | archetype, verdict, findings[] | +| Fix addressing review | `fix.applied` | source, finding, file, line | +| End of PDCA cycle | `cycle.boundary` | cycle, max_cycles, exit_condition, convergence | +| Shadow triggered | `shadow.detected` | archetype, shadow, trigger, action | +| Run ends | `run.complete` | status, cycles, agents_total, fixes_total | + +Parent rules: `run.start` has `parent: []`. Agents parent to the event that triggered them. Phase transitions fan-in from all completing events. Parallel agents share the same parent. + +--- + +## Artifact Naming + +All artifacts live in `.archeflow/artifacts/<run_id>/`: + +| File | Content | +|------|---------| +| `plan-explorer.md` | Explorer research (missing in fast) | +| `plan-creator.md` | Creator proposal with confidence | +| `plan-mini-explorer.md` | Risk research (only if A3 triggers) | +| `do-maker.md` | Maker implementation summary | +| `do-maker-files.txt` | Changed file paths (one per line) | +| `check-guardian.md` | Guardian verdict + findings | +| `check-skeptic.md` | Skeptic verdict (if spawned) | +| `check-sage.md` | Sage verdict (if spawned) | +| `check-trickster.md` | Trickster verdict (if spawned) | +| `act-feedback.md` | Structured feedback for next cycle | +| `act-fixes.jsonl` | Applied fixes log | +| `cycle-<N>/` | Archived artifacts from cycle N | + +Always check artifact existence before injecting. Missing optional artifacts are expected — skip, don't fail. + +Git diff is generated on-the-fly (`git diff main...<maker-branch>`), not saved to disk. + +--- + +## Effectiveness Scoring + +After each run, `./lib/archeflow-score.sh extract` scores review archetypes on: + +| Dimension | Weight | +|-----------|--------| +| Signal-to-noise (useful / total findings) | 0.30 | +| Fix rate (findings that led to fixes) | 0.25 | +| Cost efficiency (useful findings per dollar) | 0.20 | +| Accuracy (not contradicted by others) | 0.15 | +| Cycle impact (led to cycle exit) | 0.10 | + +Scores stored in `.archeflow/memory/effectiveness.jsonl`. After 10+ runs, recommend model tier changes and archetype removal. + +--- + +## Dry-Run Mode + +When `--dry-run`: Run Plan phase only. Display workflow, agent counts, confidence scores, cost estimate. Ask user to proceed. If yes, continue with `--start-from do`. + +## Start-From Mode + +| Start from | Required artifacts | +|------------|--------------------| +| `plan` | None | +| `do` | `plan-creator.md` | +| `check` | `plan-creator.md`, `do-maker.md`, `do-maker-files.txt` | +| `act` | All `check-*.md` files | + +Validate required artifacts exist. Error if missing. + +--- + +## Error Handling + +- **Agent timeout (5 min)**: Emit error event, abort run. Do not retry blindly. +- **Event emitter fails**: Log warning, continue. Events are observation, not control flow. +- **Artifact write fails**: Blocking — abort and report. Artifacts are required for phase handoff. +- **Merge conflict**: Do not force-resolve. Report conflict, leave branch intact, let user decide. + +--- + +## Pipeline Strategy + +Linear flow with no cycle-back. Use for bug fixes and well-understood single-concern tasks. ``` -Plan -> Implement -> Spec-Review -> Quality-Review -> Verify +Plan (Creator only) -> Implement (Maker) -> Spec-Review (Guardian, then Skeptic if findings) -> Quality-Review (Sage) -> Verify (tests + merge) ``` -No cycle-back. Each phase runs once. +1. **Plan**: Spawn Creator with Mini-Reflect (no Explorer). Save `plan-creator.md`. +2. **Implement**: Spawn Maker in worktree. Save `do-maker.md`. +3. **Spec-Review**: Guardian first. Skeptic only if Guardian has findings. +4. **Quality-Review**: Sage reviews proposal + diff + summary. +5. **Verify**: Run tests. If pass and 0 CRITICAL: merge. If CRITICAL: one targeted Maker fix, re-review, re-test. If still failing: abort, report branch name for manual resolution. -### 1. Plan - -Spawn Creator only (no Explorer). Use fast-workflow Creator prompt with Mini-Reflect. - -Save output to `.archeflow/artifacts/${RUN_ID}/plan-creator.md`. - -### 2. Implement - -Spawn Maker in isolated worktree with Creator's proposal. - -Save output to `.archeflow/artifacts/${RUN_ID}/do-maker.md`. - -### 3. Spec-Review - -Run Guardian and Skeptic **sequentially** (Guardian first, then Skeptic only if Guardian has findings). - -- Guardian receives: Maker's git diff + proposal risk section -- Skeptic receives: Creator's proposal (assumptions focus) - -Save to `check-guardian.md` and `check-skeptic.md`. - -### 4. Quality-Review - -Spawn Sage with proposal + diff + implementation summary. - -Save to `check-sage.md`. - -### 5. Verify - -Run the project's test suite. If tests pass and no CRITICAL findings exist: - -1. Merge the Maker's branch -2. Emit `run.complete` - -If CRITICAL findings exist: - -1. **Do NOT merge yet** — the branch remains separate -2. Spawn Maker for a **single targeted fix** — provide only the CRITICAL findings as context -3. Re-run the reviewer(s) that raised the CRITICAL finding(s) on just the fixed files -4. Re-run test suite -5. If tests pass and re-review approves: merge -6. If still failing after this one fix attempt: **abort** — do NOT merge, report to user with the branch name for manual resolution - -```bash -# Pipeline verify: explicit merge guard -if [[ "$VERIFY_PASS" == "true" ]]; then - ./lib/archeflow-git.sh merge "$RUN_ID" --no-ff - ./lib/archeflow-rollback.sh "$RUN_ID" # post-merge test validation -else - echo "Pipeline aborted: CRITICAL findings not resolved after 1 fix attempt." - echo "Branch: archeflow/$RUN_ID (not merged)" - # Emit run.complete with status: aborted -fi -``` - -WARNINGs are logged in the run event but do not block the merge. - -### Pipeline Progress Display +WARNINGs are logged but do not block merge. ``` ━━━ ArcheFlow Pipeline: <task> ━━━━━━━━━━━━━━━━ Run ID: <run_id> | Strategy: pipeline -[Plan] Creator designing... -> done (20s) -[Implement] Maker building... -> done (60s, 3 files) -[Spec] Guardian reviewing... -> APPROVED -[Quality] Sage reviewing... -> APPROVED (1 WARNING) -[Verify] Tests passing... -> merged to main +[Plan] Creator designing... -> done (20s) +[Implement] Maker building... -> done (60s, 3 files) +[Spec] Guardian reviewing... -> APPROVED +[Quality] Sage reviewing... -> APPROVED (1 WARNING) +[Verify] Tests passing... -> merged to main ━━━ Complete: 2m 15s ━━━━━━━━━━━━━━━━━━━━━━━━━━ ```