52 Commits

3ef956485f docs: add HITL discussion — Wiggum Breaks as formal autonomy boundary
New subsection in Discussion framing Wiggum Breaks as the formal boundary
between autonomous and human-supervised operation. Derives HITL from
convergence theory rather than pre-defined approval gates. Covers
oscillation, divergence, and repeated shadow detection as provably
unproductive conditions that trigger human escalation.
2026-04-08 05:21:20 +02:00
1e96d87f49 feat: introduce Wiggum Break as named circuit breaker
Replaces generic "circuit breaker" with "Wiggum Break" — policy enforcement
halt condition named after Chief Wiggum (policy + Ralph Loop's dad).
Hard breaks (immediate halt) and soft breaks (finish then halt) with
wiggum.break event type. Updated both papers and shadow-detection skill.
2026-04-08 05:19:35 +02:00
d99f449083 docs: add Six Sigma Agent, AgileCoder, Reflexion citations to taxonomy paper
Incorporate findings from literature survey: Six Sigma Agent (arXiv:2601.22290)
as the only prior explicit PM/OM-named framework, AgileCoder for Scrum sprints,
Reflexion as implicit PDCA, CAMEL for role theory.
2026-04-08 05:15:55 +02:00
58315ac982 docs: add taxonomy paper — PM/OM methods for agent orchestration
Survey of 12 operations management methods (PDCA, Scrum, DMAIC, Kanban,
TOC, Lean, OODA, Cynefin, Stage-Gate, Design Thinking, TRIZ, FMEA, SPC)
evaluated against 5 agent constraints. Includes compatibility matrix
and decision framework.
2026-04-08 05:13:59 +02:00
24ea632207 docs: add arXiv paper on ArcheFlow architecture
LaTeX paper describing the archetypal role system, PDCA quality cycles,
shadow detection framework, attention filters, convergence detection,
and effectiveness scoring. References Lu et al. 2026 (Assistant Axis)
for persona stability grounding.
2026-04-08 04:54:14 +02:00
55dde5f07a docs: add ArcheFlow roadmap v0.9-v0.12 2026-04-06 23:08:11 +02:00
4f8e2a9962 feat: add run replay for archetype effectiveness analysis
- archeflow-decision.sh records decision points during runs
- archeflow-replay.sh: timeline, whatif, compare commands
- What-if replay with adjustable archetype weights
- /af-replay skill for interactive use
- Tests in archeflow-replay.bats
2026-04-06 21:43:29 +02:00
506143d613 feat: add decision.point event, decision logger, and run replay 2026-04-06 21:33:42 +02:00
607a53f1bf feat: add decision.point event type, decision logger, and run replay script
- archeflow-decision.sh: convenience wrapper for logging PDCA decision points
- archeflow-replay.sh: timeline view and weighted what-if replay for recorded runs
- archeflow-event.sh: add decision.point usage example
- archeflow-dag.sh: render decision.point events in DAG output
2026-04-06 21:33:36 +02:00
6a49c21bbe test: add bats test suite for lib/ helper scripts
110 tests across 10 test files covering all lib/ scripts:
- archeflow-event.sh: JSONL format, seq numbering, parent fields, validation
- archeflow-memory.sh: add/list/decay/forget/inject/extract commands
- archeflow-git.sh: branch creation, commit format, merge strategies, safety
- archeflow-report.sh: markdown output, summary mode, in-progress handling
- archeflow-progress.sh: progress.md generation, JSON mode, error handling
- archeflow-score.sh: archetype scoring, effectiveness report, validation
- archeflow-dag.sh: DAG rendering, color flags, tree structure
- archeflow-rollback.sh: arg parsing, phase validation, mutual exclusivity
- archeflow-init.sh: template listing, clone from project, arg validation
- archeflow-review.sh: diff modes, stats, branch/commit range review

Includes test_helper.bash (shared setup/teardown with temp git repos)
and scripts/run-tests.sh runner.
2026-04-06 21:20:05 +02:00
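The JSONL event format and sequence numbering exercised by these tests can be sketched roughly as follows. This is an illustrative assumption, not the actual `archeflow-event.sh` schema -- the field names (`seq`, `type`, `parent`, `ts`) and the events file path are placeholders:

```shell
# Rough sketch of JSONL event emission with a monotonically increasing seq.
# Field names and file location are assumptions for illustration only --
# see lib/archeflow-event.sh for the real format.
EVENTS_FILE="${EVENTS_FILE:-events.jsonl}"

emit_event() {
  type="$1"; parent="${2:-}"
  # Next seq = count of existing event lines + 1 (0 if the file is missing)
  seq=$(( $(wc -l 2>/dev/null < "$EVENTS_FILE" || echo 0) + 1 ))
  # --arg passes strings safely; --argjson keeps seq a JSON number
  jq -cn --arg type "$type" --arg parent "$parent" --argjson seq "$seq" \
    '{seq: $seq, type: $type, parent: $parent, ts: (now | floor)}' \
    >> "$EVENTS_FILE"
}
```

Appending one compact JSON object per line keeps the log greppable and makes replay tooling a simple line-by-line read.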
6bae80b874 feat: add af-status, af-score, af-dag, af-report slash command skills 2026-04-06 21:10:22 +02:00
43a147676e refactor: slim session-start hook from 55 to ~20 lines of injected context
Create ACTIVATION.md as minimal stub for session-start injection.
Full SKILL.md stays in place for on-demand loading when commands are invoked.
2026-04-06 21:10:14 +02:00
14d70689ce refactor: ArcheFlow v0.8.0 — consolidate 27 to 19 skills, corrective action framework 2026-04-06 21:07:01 +02:00
130c04fa58 feat: corrective action framework + CLAUDE.md rewrite + v0.8.0 cleanup
- Extend shadow-detection with 3-layer corrective action framework:
  archetype shadows, system shadows (tunnel vision, echo chamber, etc.),
  and policy boundaries (checkpoints, budget gates, circuit breakers)
- Rewrite CLAUDE.md with proper guardrails (DO/DO NOT, skill writing rules,
  200-line max per skill, no bash pseudo-code in skills)
- Update plugin.json to v0.8.0 with consolidated 19-skill list
- Update README architecture tree and skills reference
- Update using-archeflow version string to v0.8.0 / 19 skills
- Remove 8 empty skill directories (absorbed into run skill)
2026-04-06 20:52:27 +02:00
752177528f refactor: trim act-phase skill from 371 to 140 lines
Remove duplicated routing tables, verbose JSON event examples,
writing/prose domain template (belongs in domains/colette-bridge),
--start-from section (belongs in run skill), and redundant checklist.
Consolidate three Agent() templates into one compact template.
Preserve all routing rules, decision logic, and feedback format.
2026-04-06 20:50:59 +02:00
a1667633ad Merge branch 'refactor/consolidate-check-phase-v2' into refactor/trim-secondary-skills
# Conflicts:
#	skills/colette-bridge/SKILL.md
#	skills/using-archeflow/SKILL.md
2026-04-06 20:50:31 +02:00
d94688ca1b refactor: trim 11 secondary ArcheFlow skills from 3340 to 952 lines
Remove verbose YAML examples, bash pseudo-code, tutorial prose, and
motivational content from configuration/integration skills while
preserving all operational protocols, reference tables, and rules.

Skills trimmed: domains, colette-bridge, multi-project, cost-tracking,
git-integration, custom-archetypes, workflow-design, templates,
autonomous-mode, progress, presence.
2026-04-06 20:48:50 +02:00
c8bd55d97c refactor: consolidate run skill — merge 8 skills into one self-contained PDCA orchestrator
Merge run + orchestration + plan-phase + do-phase + artifact-routing + process-log +
attention-filters + convergence + effectiveness into a single 459-line run/SKILL.md.

Before: run skill (890 lines) + 3 prerequisites (~1,300 lines) = ~2,200 lines of context.
After: one self-contained skill (459 lines) with zero prerequisites.

Preserved: PDCA flow, workflow selection, adaptation rules A1-A3, agent prompts,
attention filters, feedback routing, convergence detection, effectiveness scoring,
shadow monitoring, pipeline strategy, event reference, artifact naming.

Removed: verbose bash code blocks, shell variable tracking, resolve_model() function,
lib validation loops, evidence validation bash, redundant event emission blocks.
2026-04-06 20:44:46 +02:00
55de51aabe chore: trim sprint and using-archeflow skills for context efficiency
sprint: 302 -> 164 lines (removed pseudocode, redundant tables, Prerequisites)
using-archeflow: 185 -> 55 lines (removed archetypes table, PDCA diagram, progress
indicators, dry-run example, full skills reference)
2026-04-06 20:43:23 +02:00
1baaa79946 refactor: simplify memory and shadow-detection skills
Trim verbose implementation details that duplicate what the bash helper
scripts already handle. Memory skill: 278 -> 120 lines. Shadow detection
skill: 180 -> 66 lines. All essential protocols, tables, and commands
preserved; removed redundant algorithm descriptions, multiple examples,
and narrative prose.
2026-04-06 20:43:08 +02:00
8837a359ac refactor: simplify memory and shadow-detection skills
Trim verbose implementation details that duplicate what the bash helper
scripts already handle. Memory skill: 278 -> 120 lines. Shadow detection
skill: 180 -> 66 lines. All essential protocols, tables, and commands
preserved; removed redundant algorithm descriptions, multiple examples,
and narrative prose.
2026-04-06 20:42:47 +02:00
af1f4e7da7 refactor: merge attention-filters into check-phase skill
Consolidate the attention-filters skill (122 lines) into check-phase,
reducing check-phase from 234 to 110 lines. Removed verbose bash code
blocks, 30-line consolidated output example, re-check protocol (belongs
in act-phase), and motivational section. Updated all references in
README, plugin.json, using-archeflow, and colette-bridge.
2026-04-06 20:41:36 +02:00
55a6ba14c9 feat: add Claude plugin marketplace metadata 2026-04-06 18:47:20 +02:00
da13dfba85 docs: add project-specific CLAUDE.md for agent context 2026-04-06 16:57:41 +02:00
e19ff0acc3 refactor: refocus ArcheFlow as workspace orchestrator, not feature-dev competitor
- README: lead with af-sprint (parallel multi-project), af-review (post-impl quality)
- Sprint skill: L/XL code tasks use feature-dev style (explore→plan→impl→self-review)
  instead of PDCA. Reserve PDCA for writing/research domains.
- Session start: route to af-sprint/af-review/af-run based on task type
- Explicitly state: for single-feature dev, use feature-dev plugin instead
2026-04-04 18:44:18 +02:00
1bf1376a80 feat: implement archeflow-review.sh for Guardian-only diff review
Standalone bash script that extracts git diffs for af-review without
PDCA orchestration. Supports --branch, --commit, and uncommitted modes.
Reports stats (files/lines changed) to stderr, diff to stdout.
2026-04-04 18:39:06 +02:00
6309614bfa feat: add sprint runner and review-only skills 2026-04-04 18:21:19 +02:00
aebf55a9a7 docs: add dogfood report #2 (batch API) with 7 improvement hypotheses 2026-04-04 18:05:48 +02:00
b72eed3157 docs: add dogfood comparison report (plain Claude vs ArcheFlow PDCA) 2026-04-04 17:48:44 +02:00
35c9f8269b docs: update status log with v0.7.0 sprint 2026-04-04 09:36:42 +02:00
6854e858a4 fix: address v0.7.0 review findings
- Auto-select: fast workflow now maps to pipeline strategy (was falling through to pdca)
- Evidence validation: check for missing evidence markers, not just banned phrases
- Remove sed-based artifact mutation (avoids table row corruption), track downgrades in events only
- Pipeline verify: explicit merge guard prevents merging before tests/re-review pass
2026-04-04 09:36:05 +02:00
44f0896e3c docs: update CHANGELOG and version for v0.7.0 2026-04-04 09:36:05 +02:00
cfd3267272 docs: add experimental status and interdisciplinary framing to README 2026-04-04 09:36:05 +02:00
29762a8464 feat: add strategy abstraction with pdca and pipeline strategies 2026-04-04 09:36:05 +02:00
a6dcd2c956 feat: add plan granularity constraint to plan-phase and creator 2026-04-04 09:36:05 +02:00
516fe11710 feat: add evidence-gated verification to check phase and reviewers 2026-04-04 09:36:05 +02:00
f10e853d8e feat: add structured status tokens to all agents and run skill 2026-04-04 09:36:05 +02:00
eabf13b9b0 feat: add context isolation protocol to attention-filters and all agents 2026-04-04 09:36:05 +02:00
9b2b4b3527 docs: update status log with v0.4-v0.6 sprint summary 2026-04-04 08:52:12 +02:00
6cb7dad600 docs: add runnable quickstart example 2026-04-04 08:51:19 +02:00
57e95ba151 docs: add v0.4.0 changelog, update to v0.6.0 2026-04-04 08:51:19 +02:00
4e20dc277c fix: normalize agent persona frontmatter and examples 2026-04-04 08:51:19 +02:00
3c7d336c93 feat: add Explorer skip heuristic to plan-phase skill 2026-04-04 08:51:19 +02:00
12575b5a47 feat: expand attention-filters from stub to full skill 2026-04-04 08:51:19 +02:00
362fb9ada9 fix: address v0.5.0 review findings
- Add --to/--test-cmd mutual exclusivity guard in rollback script
- Convert all jq string interpolation to --arg (cmd_extract, cmd_inject, cmd_forget)
- Fix CRITICAL/WARNING grep to match table rows only (not prose)
- Add thorough+cycle-1 guard to fast-path bash snippet in check-phase
- Clarify prev_run_id selection comment (tail -1 = most recent non-current)
2026-04-04 08:44:16 +02:00
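The `--arg` conversion above follows the standard jq pattern for passing untrusted strings into a filter. A minimal before/after illustration (not the actual script code; the `lesson` value is made up):

```shell
lesson='prefer "small" diffs'

# Unsafe: shell interpolation splices the value into the jq program itself,
# so embedded quotes break the filter (and can change its meaning):
#   jq -n "{lesson: \"$lesson\"}"

# Safe: --arg binds the value as a proper JSON string variable,
# with all escaping handled by jq
jq -cn --arg lesson "$lesson" '{lesson: $lesson}'
```

The safe form prints `{"lesson":"prefer \"small\" diffs"}` regardless of what quotes or metacharacters the value contains.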
c3f5df8161 docs: update CHANGELOG and version for v0.5.0 2026-04-04 08:44:16 +02:00
c5174e88eb feat: flesh out check-phase with parallel reviewer spawning protocol 2026-04-04 08:44:16 +02:00
5e2117c9be feat: add cross-run finding regression detection 2026-04-04 08:44:16 +02:00
30ddc6a2c4 feat: add per-workflow model assignment configuration 2026-04-04 08:44:16 +02:00
e09538e5e0 feat: add phase rollback support to archeflow-rollback.sh 2026-04-04 08:44:16 +02:00
92b56e714b docs: add hook points documentation and config template 2026-04-04 08:44:16 +02:00
008315b0c4 feat: add lib script validation at run initialization 2026-04-04 08:44:16 +02:00
80 changed files with 6786 additions and 6277 deletions


@@ -1,7 +1,10 @@
# ArcheFlow Configuration
# Copy to your project's .archeflow/config.yaml and customize
-version: "0.3.0"
+version: "0.7.0"
+# Strategy — execution shape: pdca (cyclic), pipeline (linear), auto (task-based selection)
+strategy: auto
# Budget
costs:
@@ -26,7 +29,54 @@ memory:
  max_lessons: 10
  decay_after_runs: 10
# Models — default and per-archetype/per-workflow model selection.
# ArcheFlow reads this to assign models to agents. The default applies unless overridden.
models:
  default: sonnet
  # Per-archetype overrides (uncomment to customize):
  # archetypes:
  #   explorer: haiku     # Cheap model for research/exploration
  #   creator: sonnet     # Creative tasks need stronger model
  #   maker: sonnet       # Implementation needs full capability
  #   guardian: sonnet    # Security review — don't skimp
  #   skeptic: haiku      # Assumption checking is analytical
  #   sage: haiku         # Quality review can use cheaper model
  #   trickster: sonnet   # Adversarial testing benefits from stronger model
  # Per-workflow overrides (uncomment to customize):
  # workflows:
  #   fast:
  #     default: haiku    # Fast workflow uses cheaper models by default
  #     archetypes:
  #       guardian: sonnet  # Except Guardian — always needs strong model
  #   standard:
  #     default: sonnet
  #   thorough:
  #     default: sonnet
# Progress
progress:
  enabled: true
  file: .archeflow/progress.md
# Hooks — commands to run at orchestration lifecycle events.
# Uncomment and customize as needed.
#
# hooks:
#   run-start:
#     command: "echo 'ArcheFlow run starting'"
#     fail_action: warn   # warn | abort
#   phase-complete:
#     command: "./scripts/on-phase-complete.sh"
#     fail_action: warn
#   agent-complete:
#     command: "./scripts/on-agent-complete.sh"
#     fail_action: warn
#   pre-merge:
#     command: "./scripts/pre-merge-checks.sh"
#     fail_action: abort  # abort recommended — blocks bad merges
#   post-merge:
#     command: "./scripts/post-merge-notify.sh"
#     fail_action: warn
#   run-complete:
#     command: "./scripts/on-run-complete.sh"
#     fail_action: warn
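A hook entry like the ones above could be executed with `fail_action` semantics roughly as follows. This is a sketch only: `run_hook` and its argument order are hypothetical, and real config parsing is omitted -- the command and fail_action are simply passed in:

```shell
# Hypothetical hook runner: run a hook command and honor its fail_action.
# warn = log and continue; abort = log and signal failure to the caller.
run_hook() {
  event="$1"; cmd="$2"; fail_action="${3:-warn}"
  if sh -c "$cmd"; then
    return 0
  fi
  if [ "$fail_action" = "abort" ]; then
    echo "hook '$event' failed -- aborting run" >&2
    return 1
  fi
  echo "hook '$event' failed -- continuing (warn)" >&2
  return 0
}
```

With `pre-merge` set to `abort`, a non-zero exit from the hook command blocks the merge; `warn` hooks never stop the run.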


@@ -0,0 +1,16 @@
{
  "name": "claude-archeflow-plugin",
  "description": "ArcheFlow plugin marketplace",
  "plugins": [
    {
      "name": "archeflow",
      "description": "Multi-agent orchestration with Jungian archetypes. PDCA quality cycles, shadow detection, git worktree isolation.",
      "version": "0.3.0",
      "path": ".",
      "keywords": [
        "orchestration", "multi-agent", "archetypes", "pdca",
        "code-review", "quality", "worktrees", "shadow-detection"
      ]
    }
  ]
}


@@ -1,7 +1,7 @@
{
"name": "archeflow",
"description": "Multi-agent orchestration with Jungian archetypes. PDCA quality cycles, shadow detection, git worktree isolation. Zero dependencies — works with any Claude Code session.",
-"version": "0.3.0",
+"version": "0.9.0",
"author": {
"name": "Chris Nennemann"
},
@@ -14,12 +14,12 @@
"shadow-detection", "workflows"
],
"skills": [
-"run", "orchestration", "plan-phase", "do-phase", "check-phase", "act-phase",
-"shadow-detection", "attention-filters", "convergence", "artifact-routing",
-"process-log", "memory", "effectiveness", "progress",
-"colette-bridge", "git-integration", "multi-project",
-"custom-archetypes", "workflow-design", "domains", "cost-tracking",
-"templates", "autonomous-mode", "using-archeflow", "presence"
+"run", "sprint", "review", "check-phase", "act-phase",
+"shadow-detection", "memory", "progress", "presence",
+"colette-bridge", "git-integration", "multi-project", "cost-tracking",
+"custom-archetypes", "workflow-design", "domains",
+"templates", "autonomous-mode", "using-archeflow",
+"af-status", "af-score", "af-dag", "af-report", "af-replay"
],
"hooks": "hooks/hooks.json"
}

.gitignore

@@ -8,3 +8,11 @@ Thumbs.db
# Editor
*.swp
*~
# Paper build artifacts
paper/*.aux
paper/*.bbl
paper/*.blg
paper/*.log
paper/*.out
paper/*.pdf
paper/*.toc


@@ -2,6 +2,54 @@
All notable changes to ArcheFlow are documented in this file.
## [0.9.0] -- 2026-04-06
### Added
- Run replay: `decision.point` events via `archeflow-decision.sh`; `archeflow-replay.sh` with `timeline`, `whatif` (adjustable archetype weights + threshold), and `compare`; skill `af-replay`; DAG labels for `decision.point`.
## [0.7.0] -- 2026-04-04
### Added
- Context isolation protocol in attention-filters skill and all 7 agent personas — agents receive only orchestrator-constructed context, no session bleed or cross-agent contamination
- Structured status tokens (`STATUS: DONE`, `DONE_WITH_CONCERNS`, `NEEDS_CONTEXT`, `BLOCKED`) for all agents with orchestrator parsing protocol in run skill
- Evidence-gated verification in check-phase — CRITICAL/WARNING findings require concrete evidence (command output, code citations, reproduction steps); banned speculative phrases auto-downgrade to INFO
- Plan granularity constraint in plan-phase and Creator — each change item must be a 2-5 minute task with exact file path, code block, and verify command
- Strategy abstraction with `pdca` (cyclic) and `pipeline` (linear) execution strategies, auto-selection by task type, and pipeline execution flow in run skill
- Experimental status and interdisciplinary framing in README
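The banned-phrase check behind evidence-gated verification could look roughly like this. The phrase list and table-row format are illustrative assumptions, and (per the later v0.7.0 review fixes) findings are only flagged here, not mutated in place:

```shell
# Hypothetical evidence gate: list CRITICAL/WARNING table rows whose text
# contains speculative wording, as candidates for downgrade to INFO.
# Matching anchored table rows only means prose that merely mentions
# "CRITICAL" is ignored.
speculative_findings() {
  grep -E '^\| *(CRITICAL|WARNING) *\|' "$1" \
    | grep -Ei 'might|could|possibly|appears to' || true
}
```

Rows backed by concrete evidence (command output, code citations) pass through untouched; only speculative wording is surfaced for downgrade.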
## [0.6.0] -- 2026-04-04
### Added
- Expanded attention-filters skill with prompt templates, token budgets, cycle-back filtering, and verification checklist
- Explorer skip heuristic in plan-phase with decision table for when to skip/require research
- Runnable quickstart example (`examples/runnable-quickstart.md`)
### Fixed
- Normalized agent persona frontmatter: added examples, moved isolation note to Rules, documented model choices
## [0.5.0] -- 2026-04-04
### Added
- Lib script validation at run initialization — fail fast if required scripts or `jq` are missing
- Hook points documentation with 6 lifecycle events (run-start, phase-complete, agent-complete, pre-merge, post-merge, run-complete) and config template
- Phase rollback support in `archeflow-rollback.sh` via `--to <phase>` flag
- Per-workflow model assignment configuration with fallback chain (per-workflow per-archetype > per-workflow default > per-archetype > global default)
- Cross-run finding regression detection in `archeflow-memory.sh` — compares current findings against previously resolved fixes
- Check-phase parallel reviewer spawning protocol with Guardian-first sequence, A2 fast-path evaluation, timeout handling, and re-check protocol
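The model fallback chain reads naturally as a first-non-empty lookup. A sketch with a stubbed config lookup -- the real implementation reads `.archeflow/config.yaml` and is not shown here; `lookup()` is a stand-in filled with the example values from the commented config (fast workflow on haiku, Guardian pinned to sonnet):

```shell
# Sketch of the fallback chain:
#   per-workflow per-archetype > per-workflow default > per-archetype > global default
# lookup() stands in for real config parsing; values are illustrative.
lookup() {
  case "$1" in
    workflows.fast.archetypes.guardian) echo sonnet ;;
    workflows.fast.default)             echo haiku ;;
    default)                            echo sonnet ;;
    *)                                  echo "" ;;
  esac
}

resolve_model() {
  workflow="$1"; archetype="$2"
  for key in \
    "workflows.$workflow.archetypes.$archetype" \
    "workflows.$workflow.default" \
    "archetypes.$archetype" \
    default
  do
    model=$(lookup "$key")
    if [ -n "$model" ]; then echo "$model"; return; fi
  done
}
```

So `resolve_model fast guardian` yields the per-workflow per-archetype override, while an archetype with no override falls through to the workflow default, then the global default.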
## [0.4.0] -- 2026-04-04
### Added
- Confidence gate parsing with bash snippets for extracting scores from `plan-creator.md`
- Mini-Explorer spawning when risk coverage < 0.5
- Worktree merge flow with explicit pre-merge hooks and post-merge test validation
- `archeflow-rollback.sh` for post-merge test failure auto-revert
- Test-first validation gate in Do phase
- Memory injection audit trail with `--audit` flag and `audit-check` command
### Fixed
- Unified feedback routing tables across orchestration, act-phase, artifact-routing
## [0.3.0] -- 2026-04-03
### Added

CLAUDE.md

@@ -0,0 +1,119 @@
# archeflow — Multi-Agent Orchestration Plugin for Claude Code
PDCA quality cycles with Jungian archetype roles, corrective action framework, sprint runner, and post-implementation review. Zero dependencies — pure Bash + Markdown.
## Architecture
```
skills/               Slash commands and internal protocols (one SKILL.md per dir)
  run/                /af-run — self-contained PDCA orchestration (core skill)
  sprint/             /af-sprint — queue-driven parallel agent dispatch
  review/             /af-review — Guardian-led code review
  check-phase/        Shared reviewer protocol (used by run + review)
  act-phase/          Finding collection, fix routing, exit decisions
  shadow-detection/   Corrective action framework (archetype + system + policy)
  memory/             Cross-run lessons learned
  cost-tracking/      Token/cost awareness and budget enforcement
  domains/            Domain detection (code, writing, research)
  colette-bridge/     Writing context loader from colette.yaml
  multi-project/      Cross-repo orchestration with dependency DAG
  git-integration/    Per-phase commits, branch strategy, rollback
  templates/          Workflow/team bundle gallery
  autonomous-mode/    Unattended session protocol
  using-archeflow/    Session-start activation (auto-loaded via hook)
agents/               Archetype personality definitions (one .md per archetype)
lib/                  Bash helper scripts (events, git, memory, progress, etc.)
hooks/                Session-start hook (injects using-archeflow)
templates/bundles/    Pre-configured workflow bundles
```
## Commands
| Command | Purpose |
|---------|---------|
| `/af-run <task>` | PDCA orchestration with full agent cycle |
| `/af-sprint` | Work the queue across projects |
| `/af-review` | Review existing code changes |
| `/af-status` | Current/last run status |
| `/af-init` | Initialize ArcheFlow in a project |
| `/af-score` | Archetype effectiveness scores |
| `/af-memory` | Cross-run lesson memory |
| `/af-report` | Full process report |
| `/af-fanout` | Colette book fanout via agents |
## Core Concepts
### PDCA Cycle
```
Plan (Explorer + Creator) -> Do (Maker in worktree) -> Check (Guardian first, then others) -> Act (fix, merge, or cycle)
```
### Archetypes
Explorer (research), Creator (design), Maker (implement), Guardian (security), Skeptic (assumptions), Trickster (edge cases), Sage (quality). Each has a virtue and a shadow — see `shadow-detection` skill.
### Corrective Action Framework
Three layers, one escalation protocol:
- **Archetype shadows** — individual agent dysfunction
- **System shadows** — orchestration-level issues (echo chamber, tunnel vision, scope creep)
- **Policy boundaries** — operational limits (checkpoints, budgets, Wiggum Breaks)
### Workflows
| Risk Level | Workflow | Agents |
|------------|----------|--------|
| Low | `fast` | Creator -> Maker -> Guardian |
| Medium | `standard` | Explorer + Creator -> Maker -> Guardian + Skeptic + Sage |
| High | `thorough` | Explorer + Creator -> Maker -> All 4 reviewers |
## Guardrails
### DO
- Keep skills self-contained. The `run` skill needs zero prerequisites — it was consolidated for a reason.
- Write skills as operational instructions Claude can follow, not software specifications.
- Use tables for reference data, numbered steps for protocols.
- Emit events via `./lib/archeflow-event.sh` — but never let logging block orchestration.
- Maintain the corrective action framework when adding new agent types.
- Test skill changes by running `/af-run --dry-run` and verifying the flow.
- Keep archetype personalities distinct — each agent definition in `agents/` has a specific voice.
### DO NOT
- **Add runtime dependencies.** This must stay zero-dependency (Bash + Markdown only).
- **Bloat skills back up.** The consolidation from 27 to 19 skills was intentional. Do not create new skills for internal implementation details — inline them.
- **Write bash pseudo-code in skills.** Skills are Claude instructions, not shell scripts. Use one-liner commands or lib script references, not multi-line bash blocks.
- **Duplicate protocol definitions.** Finding format lives in `check-phase`. Routing table lives in `act-phase`. Shadow detection lives in `shadow-detection`. One source of truth per concept.
- **Skip the Check phase** in PDCA cycles. It's the quality gate.
- **Change archetype personalities** without updating all referencing skills and agent definitions.
- **Use ArcheFlow for trivial tasks.** Single-file fixes, config changes, questions — just do them directly.
- **Let skills exceed ~200 lines.** If a skill is growing past this, it probably needs splitting or the content belongs in a lib script.
### Skill Writing Rules
1. **Frontmatter**: `name` (kebab-case), `description` (one-liner + `<example>` tags for user-invocable skills)
2. **Structure**: Imperative voice. Lead with what to do, not why. Tables > prose. Steps > paragraphs.
3. **Agent templates**: Keep Agent() spawn templates concise. Include only the prompt, subagent_type, and isolation mode.
4. **Cross-references**: Use `archeflow:<skill-name>` backtick syntax to reference other skills. Avoid circular dependencies.
5. **Bash commands**: One-liners only in skills. Multi-step logic belongs in `lib/` scripts.
### Cost Awareness
- Prefer cheap models (haiku) for analytical tasks (validation, diff scoring)
- Use capable models (sonnet/opus) for creative tasks (writing, complex design)
- Budget enforcement via `cost-tracking` skill and `.archeflow/config.yaml`
- Track token spend per agent in events for post-run analysis
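Tracking spend in events makes post-run rollups a one-liner over the log. A sketch, assuming a hypothetical `agent.complete` event carrying `agent` and `tokens` fields (not the real event schema):

```shell
# Illustrative per-agent token accounting over a JSONL events log.
# The event shape (type/agent/tokens fields) is assumed for this sketch.
tokens_by_agent() {
  jq -rs 'map(select(.type == "agent.complete"))
          | group_by(.agent)
          | map({agent: .[0].agent, tokens: (map(.tokens) | add)})
          | .[] | "\(.agent)\t\(.tokens)"' "$1"
}
```

`group_by` sorts by agent name, so the report is stable across runs and easy to diff.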
### Git Rules
- Signing: `git config gpg.format ssh`, key at `~/.ssh/id_ed25519_dev.pub`
- Push: `GIT_SSH_COMMAND="ssh -i /home/c/.ssh/id_ed25519_dev -o IdentitiesOnly=yes" git push origin main`
- Conventional commits: `feat:`, `fix:`, `chore:`, `docs:`, `refactor:`
- No Co-Authored-By trailers
- All work on worktree branches until explicitly merged
- Merges use `--no-ff` (individually revertable)
## Dogfooding
When using ArcheFlow to develop ArcheFlow itself:
- Log observations to `.archeflow/memory/lessons.jsonl`
- Note friction points, shadow false positives, skill gaps
- Test skill changes with `/af-run --dry-run` before committing

README.md

@@ -1,16 +1,37 @@
-# ArcheFlow -- Multi-Agent Orchestration for Claude Code
+# ArcheFlow -- Workspace Orchestration for Claude Code
-**Structured quality through archetypal collaboration.** ArcheFlow coordinates multiple Claude Code agents through PDCA cycles, where each agent embodies a Jungian archetype with defined strengths and known failure modes.
+**Run parallel agent teams across your entire project portfolio.** ArcheFlow reads a task queue, spawns agents across multiple projects simultaneously, collects results, commits, and keeps going. Built for developers managing 10-30 repos who want throughput, not ceremony.
Zero dependencies. No build step. Install and go.
> **Status: Experimental.** ArcheFlow is a research prototype exploring the intersection of
> analytical psychology (Jungian archetypes), process engineering (PDCA cycles), and
> multi-agent software engineering. It is functional and actively developed, but not production-ready.
> APIs, skill formats, and orchestration behavior may change between versions.
## What It Does
Large coding tasks benefit from multiple perspectives, but "just spawn more agents" creates chaos. Agents duplicate work, miss each other's output, argue in circles, or go rogue. The problem is not intelligence -- it is coordination.
+ArcheFlow solves three problems:
-ArcheFlow solves this by giving each agent an *archetype*: a behavioral protocol that defines what the agent cares about, what context it receives, and how its output feeds into the next phase. Seven archetypes collaborate through **Plan-Do-Check-Act cycles**, where each iteration builds on structured feedback from the last. No unreviewed code reaches your main branch.
+**1. Workspace Sprint Runner** (`/af-sprint`) -- The primary mode. Reads your task queue, picks the highest-priority items across different projects, spawns 3-5 agents in parallel, collects results, commits+pushes, and immediately starts the next batch. Turns a 25-item backlog into done work while you watch (or don't).
-The key insight: archetypes are not just system prompts. Each one has a **virtue** (its unique contribution) and a **shadow** (the dysfunction it falls into when pushed too far). ArcheFlow monitors for shadow activation and course-corrects automatically -- replacing an agent that blocks everything, reining in one that researches forever, or escalating when a maker goes off-script.
+**2. Post-Implementation Review** (`/af-review`) -- Run security and quality review on any diff, branch, or commit range. No planning, no implementation orchestration -- just Guardian analysis of what could go wrong. The highest-ROI mode for catching design-level bugs that linters miss.
+**3. Deep Orchestration** (`/af-run`) -- For complex tasks that need structured exploration, design, implementation, and multi-perspective review. Uses archetypal roles (Explorer, Creator, Maker, Guardian) through PDCA cycles. Best for security-sensitive changes, multi-module refactors, and creative writing.
### When to use what
| Situation | Command | Why |
|-----------|---------|-----|
| Work the backlog | `/af-sprint` | Parallel agents, maximum throughput |
| Review before merging | `/af-review` | Catch design bugs, not style nits |
| Complex feature (L/XL) | `/af-run` or `feature-dev` | Structured exploration + review |
| Simple fix (S/M) | Just do it | No orchestration overhead needed |
| Creative writing | `/af-run --domain writing` | Archetypes shine here -- no linters exist for prose |
### What ArcheFlow is NOT
ArcheFlow is not a feature development tool. For single-feature implementation with user interaction at every step (clarify requirements, choose architecture, review), use Claude Code's `feature-dev` plugin or work directly. ArcheFlow adds value through **parallel execution across projects** and **domain-specific quality review** (writing, research), not by competing with single-task development tools.
## Quick Start
@@ -54,50 +75,61 @@ After installing, run `/reload-plugins` or restart Claude Code. ArcheFlow activa
- `--scope project` — only in the current project
- `--scope local` — only in the current directory
-### 2. Run your first orchestration
-Just describe a task. ArcheFlow activates automatically for multi-file changes:
+### 2. Run your first sprint
```
-> Add input validation to all API endpoints
+> /af-sprint
```
-Or invoke it explicitly:
+ArcheFlow reads your task queue (`docs/orchestra/queue.json`), picks the highest-priority items, and spawns parallel agents:
```
-> archeflow:run "Add JWT authentication" --workflow standard
+── af-sprint: Batch 1 ──────────────────────────
+🔸 writing.colette config parser expansion [P2, M] running
+🔸 product.jobradar search API endpoint [P3, M] running
+🔸 tool.git-alm SVG export + minimap [P3, M] running
+🔸 product.game-factory completion tracking [P3, S] running
+────────────────────────────────────────────────
+[5 min later]
+── Batch 1 complete ────────────────────────────
+✓ writing.colette config parser done (3m24s)
+✓ product.jobradar search API done (5m01s)
+✓ tool.git-alm SVG export done (4m30s)
+✓ product.game-factory tracking done (2m15s)
+4 tasks · 4 projects · all committed + pushed
+Next batch: 2 items ready → dispatching...
+────────────────────────────────────────────────
```
-### 3. What happens
-ArcheFlow selects a workflow (fast, standard, or thorough) and runs a PDCA cycle:
+### 3. Review before merging
```
-Plan --> Explorer researches codebase context, Creator designs a proposal
-Do --> Maker implements in an isolated git worktree
-Check --> Reviewers assess in parallel (Guardian, Skeptic, Sage, Trickster)
-Act --> All approved? Merge. Issues found? Cycle back with structured feedback.
-Each cycle catches what the last one missed.
+> /af-review --branch feat/batch-api
```
-Progress is visible in real time:
+Guardian analyzes the diff for error handling gaps, security issues, and data loss scenarios:
```
---- ArcheFlow: Add JWT authentication ---------
-Workflow: standard (2 cycles max)
-🔍 [Plan] Explorer researching... done (35s)
-🏗️ [Plan] Creator designing proposal... done (25s, confidence: 0.8)
-⚒️ [Do] Maker implementing... done (90s, 4 files, 8 tests)
-🛡️ [Check] Guardian reviewing... APPROVED
-🤔 [Check] Skeptic challenging... APPROVED (1 INFO)
-📚 [Check] Sage reviewing... APPROVED
-[Act] All approved -- merging... merged to main
---- Complete: 3m 10s, 1 cycle -----------------
+── af-review: writing.colette ─────────────────
+🛡️ Guardian: 2 findings (1 HIGH, 1 MEDIUM)
+[HIGH] Timeout marks variant as done — loses batch state (fanout.py:552)
+[MEDIUM] No JSON error handling on corrupted state (batch.py:310)
+────────────────────────────────────────────────
```
### 4. Deep orchestration (when needed)
For complex, security-sensitive, or creative tasks:
```
> /af-run "Add JWT authentication" --workflow standard
```
This runs the full PDCA cycle with archetypal roles. See "Deep Orchestration" below for details.
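The Plan/Do/Check/Act loop with cycle-back can be sketched as a toy Python loop. The function names and dicts here are illustrative stand-ins, not ArcheFlow's actual API; the real orchestration is prompt-driven.

```python
# Toy sketch of the PDCA loop with cycle-back; phases are stand-ins.
def pdca(task, check, max_cycles=2):
    feedback = None
    for _ in range(max_cycles):
        proposal = {"task": task, "feedback": feedback}  # Plan: Explorer + Creator
        change = {"impl": proposal}                      # Do: Maker in a worktree
        findings = check(change)                         # Check: parallel reviewers
        if not findings:                                 # Act: all approved
            return ("merged", change)
        feedback = findings                              # cycle back with feedback
    return ("escalated", feedback)                       # did not converge

reviews = []
def reviewer(change):
    reviews.append(change)
    return [] if len(reviews) > 1 else ["missing tests"]  # approve on 2nd cycle

status, _ = pdca("Add JWT authentication", reviewer)
assert status == "merged"
```

The key property this models: structured findings from Check become input to the next Plan, so each cycle narrows the gap instead of restarting from scratch.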
## The Seven Archetypes
| Archetype | Phase | Virtue | Shadow | Role |
Shadow detection is quantitative, not vibes: Explorer output exceeding 2000 words, for example, is flagged as shadow behavior.
## Skills Reference
ArcheFlow ships with 19 skills organized by function. The `run` skill is self-contained -- no prerequisites needed.
### Core Orchestration
| Skill | Description |
|-------|-------------|
| `archeflow:run` | Self-contained PDCA orchestration -- Plan/Do/Check/Act with adaptation rules, pipeline strategy, and cycle-back |
| `archeflow:sprint` | Queue-driven parallel agent dispatch across projects (primary mode) |
| `archeflow:review` | Guardian-led code review on a diff, branch, or commit range |
| `archeflow:check-phase` | Shared reviewer protocol -- finding format, evidence requirements, attention filters |
| `archeflow:act-phase` | Finding collection, fix routing, exit decisions |
### Process Intelligence
| Skill | Description |
|-------|-------------|
| `archeflow:shadow-detection` | Corrective action framework -- archetype shadows, system shadows, policy boundaries |
| `archeflow:memory` | Cross-run memory that learns recurring findings and injects lessons |
| `archeflow:effectiveness` | Archetype scoring on signal-to-noise, fix rate, cost efficiency |
| `archeflow:progress` | Live progress file watchable from a second terminal |
### Integration
| Skill | Description |
|-------|-------------|
| `archeflow:colette-bridge` | Bridges ArcheFlow with the Colette writing platform |
| `archeflow:git-integration` | Per-phase commits, branch-per-run, rollback |
| `archeflow:multi-project` | Cross-repo orchestration with dependency DAG and shared budget |
### Configuration
| Skill | Description |
|-------|-------------|
| `archeflow:domains` | Domain adapters for writing, research, and other non-code workflows |
| `archeflow:custom-archetypes` | Create domain-specific roles (database reviewer, compliance auditor, etc.) |
| `archeflow:cost-tracking` | Budget enforcement, per-agent cost aggregation, model tier recommendations |
| `archeflow:workflow-design` | Design custom workflows with per-phase archetype assignment |
| `archeflow:templates` | Template gallery for sharing workflows, teams, and setup bundles |
| `archeflow:autonomous-mode` | Unattended sessions with corrective action checkpoints |
| `archeflow:presence` | User-facing output format -- show outcomes, not mechanics |
### Meta
| Skill | Description |
|-------|-------------|
| `archeflow:using-archeflow` | Session-start activation -- decision tree, workflow selection, commands |
## Library Scripts
Ten shell scripts in `lib/` power the process infrastructure.
| Script | Purpose | Usage |
|--------|---------|-------|
| `archeflow-event.sh` | Append structured JSONL events to a run log | `archeflow-event.sh <run_id> <type> <phase> <agent> '<json>'` |
| `archeflow-decision.sh` | Log a `decision.point` (phase, archetype, input, decision, confidence) | `archeflow-decision.sh <run_id> check guardian 'diff' 'needs_changes' 0.85` |
| `archeflow-replay.sh` | Timeline + weighted what-if over recorded verdicts | `archeflow-replay.sh compare <run_id> --weights sage=2,guardian=1` |
| `archeflow-dag.sh` | Render ASCII DAG from JSONL events | `archeflow-dag.sh events.jsonl --color` |
| `archeflow-report.sh` | Generate Markdown process report | `archeflow-report.sh events.jsonl --output report.md --dag` |
| `archeflow-progress.sh` | Regenerate live progress file from events | `archeflow-progress.sh <run_id>` |
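As a rough illustration, one line of the JSONL stream these scripts operate on might be built like this. The field names below are assumptions for illustration, not the scripts' actual schema:

```python
import json
import time
import uuid

# Hypothetical shape of a single run-log event; one JSON object per line
# (JSONL) so archeflow-dag.sh / archeflow-report.sh can stream it.
event = {
    "run_id": "run-" + uuid.uuid4().hex[:8],
    "type": "decision.point",
    "phase": "check",
    "agent": "guardian",
    "ts": time.time(),
    "data": {"decision": "needs_changes", "confidence": 0.85},
}
line = json.dumps(event)  # append this single line to the run's .jsonl file
assert json.loads(line)["phase"] == "check"
```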
```
archeflow/
├── .claude-plugin/plugin.json   # Plugin manifest
├── agents/                      # 7 archetype personas (behavioral protocols)
│   ├── explorer.md, creator.md  # Plan phase agents
│   ├── maker.md                 # Do phase agent
│   └── guardian.md, skeptic.md, # Check phase agents
│       trickster.md, sage.md
├── skills/                      # 19 skills (consolidated from 27)
│   ├── run/                     # Self-contained PDCA orchestration (core)
│   ├── sprint/                  # Queue-driven parallel agent dispatch
│   ├── review/                  # Guardian-led code review
│   ├── check-phase/             # Shared reviewer protocol + attention filters
│   ├── act-phase/               # Finding collection + fix routing
│   ├── shadow-detection/        # Corrective action framework (3 layers)
│   ├── memory/                  # Cross-run learning
│   ├── effectiveness/           # Archetype scoring
│   ├── progress/                # Live progress file
│   └── ...                      # + 10 config/integration skills
├── lib/                         # 10 shell scripts (events, git, memory, etc.)
├── hooks/                       # Auto-activation (SessionStart)
├── examples/                    # Walkthroughs, templates, custom archetypes
└── docs/                        # Roadmap, changelog
```
Skills define behavioral rules (what agents should do), agents define personas (how they think), lib scripts handle tooling (event logging, git, reporting), and hooks wire it all together at session start. Events are emitted at every phase transition, forming a DAG that can be rendered, reported, or scored after the run. The `run` skill is self-contained -- it absorbed 8 previously separate skills (orchestration, plan-phase, do-phase, artifact-routing, process-log, convergence, effectiveness, attention-filters) into one 459-line operational guide.
## Philosophy

**File: agents/creator.md**

For the full output format (including Mini-Reflect and Alternatives Considered):
| <option B> | <reason> |
### Changes
1. **`path/file.ext:line`** — What changes and why
```language
<target code state>
```
**Verify:** `<command to confirm correctness>`
2. **`path/test.ext`** — What tests to add
```language
<test code>
```
**Verify:** `<test command>`
### Test Strategy
- <specific test cases>
## Rules
- **Context isolation:** You receive only what the orchestrator provides. Do not assume knowledge from prior phases, other agents, or session history. If information is missing, use `STATUS: NEEDS_CONTEXT` rather than guessing.
- Be decisive. One proposal, not three alternatives (but list alternatives you rejected).
- Name every file. The Maker needs exact paths.
- Scope ruthlessly. Adjacent problems go under "Not Doing."
- Include test strategy. No proposal is complete without it.
- **Granularity:** Each change item must be a 2-5 minute task with exact file path, code block showing the target state, and a verify command. If an item would take >5 minutes, split it. If a non-trivial task has <2 items, you under-specified.
- Any Confidence axis < 0.5? Flag it — the orchestrator may pause or escalate.
## Status Token
End your output with exactly one status line:
- `STATUS: DONE` — proposal ready with confidence scores
- `STATUS: DONE_WITH_CONCERNS` — proposal ready but low confidence on one or more axes
- `STATUS: NEEDS_CONTEXT` — cannot proceed without additional information (describe what is missing)
- `STATUS: BLOCKED` — unresolvable obstacle (describe it)
This line MUST be the last non-empty line of your output.
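An orchestrator consuming this contract only needs to inspect the last non-empty line. A minimal sketch, using a hypothetical helper that is not part of ArcheFlow:

```python
def parse_status(output: str) -> str:
    # Contract: the last non-empty line of agent output is a STATUS token.
    for line in reversed(output.splitlines()):
        if line.strip():
            if line.startswith("STATUS:"):
                return line.split(":", 1)[1].strip()
            break
    raise ValueError("agent output is missing its STATUS line")

assert parse_status("Findings...\n\nSTATUS: DONE\n") == "DONE"
```

Putting the token last keeps parsing trivial and robust to whatever free-form prose the agent emits above it.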
## Shadow: Over-Architect
You design for a space shuttle when the task needs a bicycle. Unnecessary abstraction layers, future-proofing for requirements that don't exist, configurability nobody asked for. If the proposal has more infrastructure than business logic — simplify. Design for the current order of magnitude, not 100x.

**File: agents/explorer.md**

description: |
Spawn as the Explorer archetype for the Plan phase — researches codebase context, maps dependencies, identifies patterns, and synthesizes findings.
<example>User: "Research the auth module before we redesign it"</example>
<example>Part of ArcheFlow Plan phase</example>
model: haiku # Cost optimization: research/exploration is analytical, cheaper model suffices
---
You are the **Explorer** archetype 🔍. You gather context so the team can make informed decisions.
You see the landscape before anyone acts. You map dependencies, spot existing patterns.
## Rules
- **Context isolation:** You receive only what the orchestrator provides. Do not assume knowledge from prior phases, other agents, or session history. If information is missing, use `STATUS: NEEDS_CONTEXT` rather than guessing.
- Synthesize, don't dump. Raw file lists are useless.
- Stay focused on the task. Interesting tangents go in a "See Also" footnote, not the main report.
- Cap your research at 15 files. If you need more, the task is too broad.
## Status Token
End your output with exactly one status line:
- `STATUS: DONE` — research complete, findings ready
- `STATUS: DONE_WITH_CONCERNS` — research complete but gaps remain (noted in output)
- `STATUS: NEEDS_CONTEXT` — cannot proceed without additional information (describe what is missing)
- `STATUS: BLOCKED` — unresolvable obstacle (describe it)
This line MUST be the last non-empty line of your output.
## Shadow: Rabbit Hole
Your curiosity becomes compulsive investigation. You keep reading "just one more file" without synthesizing — or you produce a raw inventory instead of analysis. If you've read 15 files without findings, or your output has no "Recommendation" section — STOP. Synthesize what you have. A dump is not research. Good-enough now beats perfect never.

**File: agents/guardian.md**

You see attack surfaces others walk past. You calibrate your response to actual risk.
- **INFO** — Minor hardening opportunity.
## Rules
- **Context isolation:** You receive only what the orchestrator provides. Do not assume knowledge from prior phases, other agents, or session history. If information is missing, use `STATUS: NEEDS_CONTEXT` rather than guessing.
- APPROVED = zero CRITICAL findings
- Every finding needs a suggested fix, not just a complaint
- **Evidence required:** Every CRITICAL or WARNING must cite a specific command output, exit code, or exact code with file path and line numbers. Findings without evidence are downgraded to INFO by the orchestrator.
- Be rigorous but practical — flag real risks, not science fiction
## Status Token
End your output with exactly one status line:
- `STATUS: DONE` — review complete, verdict and findings ready
- `STATUS: DONE_WITH_CONCERNS` — review complete but some areas could not be fully assessed
- `STATUS: NEEDS_CONTEXT` — cannot proceed without additional information (describe what is missing)
- `STATUS: BLOCKED` — unresolvable obstacle (describe it)
This line MUST be the last non-empty line of your output.
## Shadow: Paranoid
Your risk awareness becomes blocking everything. Every finding is CRITICAL, every risk is existential, and you reject without suggesting how to fix it. Ask: "Would a senior engineer block this PR for this?" If no, downgrade. Every rejection MUST include a specific fix — if you can't suggest one, you don't understand the problem well enough to reject.

**File: agents/maker.md**

---
name: maker
description: |
Spawn as the Maker archetype for the Do phase — implements code from the Creator's proposal.
<example>Part of ArcheFlow Do phase</example>
model: inherit
---
You turn plans into working, tested, committed code. Small steps, steady progress.
## Rules
- **Context isolation:** You receive only what the orchestrator provides. Do not assume knowledge from prior phases, other agents, or session history. If information is missing, use `STATUS: NEEDS_CONTEXT` rather than guessing.
- **Isolation:** Always spawn with `isolation: "worktree"` to work in a dedicated git worktree.
- Follow the proposal. Don't redesign.
- Tests before implementation. Always.
- Commit after each logical step. Not one big commit at the end.
- If the proposal is unclear: implement your best interpretation. Note what you assumed.
- If you find a blocker: document it and stop. Don't silently work around it.
## Status Token
End your output with exactly one status line:
- `STATUS: DONE` — implementation complete, all commits made
- `STATUS: DONE_WITH_CONCERNS` — implementation complete but assumptions were made (noted in output)
- `STATUS: NEEDS_CONTEXT` — cannot proceed without additional information (describe what is missing)
- `STATUS: BLOCKED` — unresolvable obstacle (describe it)
This line MUST be the last non-empty line of your output.
## Shadow: Rogue
Your bias for action becomes reckless shipping. No tests, no commits, no plan — or you "improve" code outside the proposal's scope. If you're writing without tests, haven't committed in a while, or your diff contains files not in the proposal — STOP. Read the proposal. Write a test. Commit. Revert extras.

**File: agents/sage.md**

You see the forest, not just the trees. "Will a new team member understand this?"
- Are existing docs/comments still accurate after the change?
## Rules
- **Context isolation:** You receive only what the orchestrator provides. Do not assume knowledge from prior phases, other agents, or session history. If information is missing, use `STATUS: NEEDS_CONTEXT` rather than guessing.
- APPROVED = code is readable, tested, consistent, and complete
- REJECTED = significant quality issues that affect maintainability
- **Evidence required:** Quality findings must cite specific code (file:line, exact construct) or measurable criteria. Do not raise vague suggestions — if you cannot point to the code, do not raise the finding.
- Focus on the next 6 months. Not the next 6 years.
- Your review should be shorter than the code change. If it's not, you're over-reviewing.
## Status Token
End your output with exactly one status line:
- `STATUS: DONE` — review complete, verdict and findings ready
- `STATUS: DONE_WITH_CONCERNS` — review complete but some quality dimensions could not be assessed
- `STATUS: NEEDS_CONTEXT` — cannot proceed without additional information (describe what is missing)
- `STATUS: BLOCKED` — unresolvable obstacle (describe it)
This line MUST be the last non-empty line of your output.
## Shadow: Bureaucrat
Your thoroughness becomes bloat. Your review is longer than the code change, you're suggesting improvements to untouched code, or producing deep-sounding analysis without actionable findings. If you can't state the consequence of NOT fixing it, don't raise it. If a finding doesn't end with a specific action, delete it. Insight without action is noise.

**File: agents/skeptic.md**

name: skeptic
description: |
Spawn as the Skeptic archetype for the Check phase — challenges assumptions, identifies untested scenarios, and proposes alternatives the team hasn't considered.
<example>User: "Challenge the assumptions in this proposal"</example>
<example>Part of ArcheFlow Check phase</example>
model: inherit
---
You make the implicit explicit. "The plan assumes X — but does X actually hold?"
## Rules
- **Context isolation:** You receive only what the orchestrator provides. Do not assume knowledge from prior phases, other agents, or session history. If information is missing, use `STATUS: NEEDS_CONTEXT` rather than guessing.
- Every challenge MUST include an alternative. "This might not work" alone is not helpful.
- Limit to 3-5 challenges. More than 7 is shadow behavior.
- **Evidence required:** Every challenge must reference specific code (file:line) or describe a concrete scenario with reproduction steps. Vague concerns without evidence are downgraded to INFO by the orchestrator.
- Stay in scope. Challenge the task's assumptions, not the universe's.
- APPROVED = no fundamental design flaws
- REJECTED = the approach is wrong, and you have a better one
## Status Token
End your output with exactly one status line:
- `STATUS: DONE` — review complete, verdict and findings ready
- `STATUS: DONE_WITH_CONCERNS` — review complete but some assumptions could not be verified
- `STATUS: NEEDS_CONTEXT` — cannot proceed without additional information (describe what is missing)
- `STATUS: BLOCKED` — unresolvable obstacle (describe it)
This line MUST be the last non-empty line of your output.
## Shadow: Paralytic
Your critical thinking becomes inability to approve anything. You list 7+ challenges, chain "what about X?" tangents, or question things outside the task — each plausible alone, none actionable together. STOP. Rank by impact. Keep top 3. Each must include an alternative. Delete the rest.

**File: agents/trickster.md**

description: |
Spawn as the Trickster archetype for the Check phase (thorough workflow only) — adversarial testing, boundary attacks, edge case exploitation, and chaos engineering.
<example>User: "Try to break the new input handler"</example>
<example>Part of ArcheFlow thorough Check phase</example>
model: haiku # Cost optimization: adversarial testing is pattern-matching, cheaper model suffices
---
You are the **Trickster** archetype 🃏. You break things so users don't have to.
You think like an attacker, a clumsy user, a failing network. You find the edges.
## Rules
- **Context isolation:** You receive only what the orchestrator provides. Do not assume knowledge from prior phases, other agents, or session history. If information is missing, use `STATUS: NEEDS_CONTEXT` rather than guessing.
- Test ONLY the changed code, not the entire system
- Every finding needs exact reproduction steps
- If you can't break it after 5 serious attempts — APPROVED. The code is resilient.
- Constructive chaos only. Your goal is quality, not destruction.
## Status Token
End your output with exactly one status line:
- `STATUS: DONE` — review complete, verdict and findings ready
- `STATUS: DONE_WITH_CONCERNS` — testing complete but some attack vectors could not be exercised
- `STATUS: NEEDS_CONTEXT` — cannot proceed without additional information (describe what is missing)
- `STATUS: BLOCKED` — unresolvable obstacle (describe it)
This line MUST be the last non-empty line of your output.
## Shadow: False Alarm
You flood with low-signal findings. Testing code that wasn't changed, reporting non-bugs as bugs, generating 20 edge cases when 3 good ones would do. If your findings reference files not in the Maker's diff — delete them. Quality over quantity. Three real findings beat twenty noise.


# ArcheFlow Dogfood Report #2: Batch API Integration
Date: 2026-04-04
Task: Wire Anthropic Batch API into Colette's fanout pipeline with CLI commands and state persistence
Project: writing.colette (Python, 27 modules, 457 tests)
Complexity: High — 4 files, async API, state persistence, error recovery, CLI commands
## Experimental Setup
Same task, same starting commit, two conditions:
1. **Baseline**: Plain Claude, no orchestration, single pass
2. **ArcheFlow**: PDCA standard workflow (Maker + Guardian review)
No Explorer or Creator used this time — task scope was clear enough to skip planning and go directly to Maker + Guardian (effectively a fast workflow).
## Quantitative Comparison
| Metric | Baseline | ArcheFlow | Delta |
|--------|----------|-----------|-------|
| Lines added | 189 | 279 | +48% |
| Files touched | 4 | 4 | same |
| Time | ~5 min | ~12 min | +140% |
| Commits | 1 | 4 | cleaner history |
| Tests written | 1 | 2 | +1 |
| Tests passing | 13/13 | 14/14 | +1 |
| Bugs introduced | 0 | 1 | worse |
| Bugs caught by review | 0 | 5 | better |
| **Real bugs in final code** | **1** | **0** (after fix) | **ArcheFlow wins** |
## Bug Analysis
### Bugs found only by Guardian (not present in baseline)
| # | Bug | Severity | Impact |
|---|-----|----------|--------|
| 3 | `hash()` non-deterministic across processes for chapter index mapping | HIGH | Data loss on resume — chapters mapped to wrong files |
This bug was **introduced by ArcheFlow's Maker** and caught by the Guardian. Baseline used `enumerate(i)` and avoided it entirely. Net: zero value.
### Bugs present in BOTH versions, caught only by Guardian
| # | Bug | Severity | Impact |
|---|-----|----------|--------|
| 4 | Timeout marks variant as "done" — permanently loses batch state | HIGH | Silent data loss — timed-out batches can never be resumed |
This is the **key finding**. Both implementations had this design-level bug. Only ArcheFlow's Guardian caught it. Plain Claude missed it because there was no review step.
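The fix shape for this bug can be sketched as follows. Names here are hypothetical stand-ins (the report does not show Colette's actual status handling); the point is recording a distinct, resumable status for timeouts instead of collapsing them into "done":

```python
from enum import Enum

class VariantStatus(Enum):
    PENDING = "pending"
    DONE = "done"
    TIMED_OUT = "timed_out"   # unlike DONE, a timed-out batch stays resumable

def finalize(timed_out: bool) -> VariantStatus:
    # The buggy path assigned DONE unconditionally after the retrieve call
    # returned, so a timeout silently destroyed the batch state.
    return VariantStatus.TIMED_OUT if timed_out else VariantStatus.DONE

assert finalize(True) is VariantStatus.TIMED_OUT
```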
### Bugs in both, not caught by either initially
| # | Bug | Severity | Impact |
|---|-----|----------|--------|
| 1 | API key resolution inconsistency (env vs config) | CRITICAL | Wrong key used under mixed-key environments |
| 5 | No JSON error handling on corrupted state files | HIGH | Crash on truncated state file |
Guardian flagged these. Baseline would have shipped them silently.
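The fix shape for bug #5 can be sketched like this (hypothetical function name; Colette's actual state loader is not shown in the report):

```python
import json
import tempfile
from pathlib import Path

def load_batch_state(path: Path) -> dict:
    # A missing, truncated, or corrupted state file yields an empty
    # state instead of crashing the resume path.
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}

with tempfile.TemporaryDirectory() as d:
    state = Path(d) / "state.json"
    state.write_text('{"batch": 1, "st')   # simulate a truncated write
    assert load_batch_state(state) == {}
```

Whether silently resetting state is the right recovery policy is a design choice; logging the corruption before falling back would preserve the evidence.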
## Qualitative Observations
### Where Guardian added real value
1. **Error path analysis**: Guardian systematically checked "what happens when X fails?" for timeout, cancellation, corruption, and cross-process resume. Plain Claude focused on the happy path.
2. **Cross-process state**: The `hash()` non-determinism finding required reasoning about Python's hash randomization across interpreter invocations — a subtle runtime property that isn't visible from reading the code in isolation.
3. **Data loss scenarios**: Finding #4 (timeout → "done" → lost forever) requires understanding the interaction between `wait_and_retrieve`'s timeout branch and the caller's unconditional status assignment. This is a 2-module interaction that single-pass implementation doesn't systematically check.
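The `hash()` finding in point 2 is easy to demonstrate: Python salts `str` hashing per process, so hash-based chapter indices differ across interpreter invocations, while a digest-based index (a hypothetical stand-in, not Colette's code) stays stable:

```python
import hashlib
import subprocess
import sys

# str hashing is salted per process (PYTHONHASHSEED), so a hash()-based
# chapter-to-file mapping breaks across interpreter invocations.
code = 'print(hash("chapter-1"))'
results = [
    subprocess.run([sys.executable, "-c", code],
                   capture_output=True, text=True).stdout.strip()
    for _ in range(2)
]
# With default hash randomization, the two results almost always differ.

def stable_index(chapter_id: str, buckets: int) -> int:
    # A cryptographic digest is identical in every process.
    digest = hashlib.sha256(chapter_id.encode()).hexdigest()
    return int(digest, 16) % buckets

assert stable_index("chapter-1", 16) == stable_index("chapter-1", 16)
```

(`enumerate`-based indexing, as in the baseline, avoids the problem entirely when ordering is already stable.)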
### Where Guardian added noise
1. **Finding #2 (batch_id validation)**: Technically valid but the Anthropic SDK already rejects malformed IDs. Low practical risk.
2. **Finding #1 (API key source)**: Valid but matches existing patterns throughout the codebase — flagging it here without flagging it elsewhere is inconsistent.
### The Maker problem
The ArcheFlow Maker introduced a bug (hash-based indexing) that the baseline avoided. This happened because:
- The Maker was working from a task description, not reading the existing sequential rewrite code as closely
- The Creator's plan (when used in dogfood #1) over-specified some things and under-specified others
- Working through an intermediary (plan → implementation) introduces information loss
This is a structural weakness of the PDCA model: the Plan-to-Do handoff can corrupt information.
## Conclusions
### Complexity threshold confirmed
| Task type | Orchestration value |
|-----------|-------------------|
| Simple (pattern-following, single file) | **Negative** — adds cost, Maker introduces bugs |
| Medium (multi-file feature, clear scope) | **Neutral** — extra code but similar outcome |
| Complex (error handling, state, async, resume) | **Positive** — Guardian catches design-level bugs |
The differentiator is **error path coverage**. Guardian's systematic "what if this fails?" analysis catches bugs that single-pass implementation misses because implementers focus on making things work, not on making failures safe.
### The honest ROI question
For this task: Guardian caught 1 bug the baseline missed (timeout data loss). That bug would have caused real data loss in production when a batch times out. The cost was ~7 extra minutes and a Maker-introduced bug that had to be fixed.
Is preventing a production data loss bug worth 7 extra minutes? Yes. But only because this was a task where data loss was possible. For a pure UI change or a refactor with no persistence, the answer would be no.
---
## Improvement Hypotheses
Based on both dogfood runs, here are concrete hypotheses about how to improve ArcheFlow's value-to-cost ratio:
### H1: Guardian-Only Mode (skip Plan/Do orchestration)
**Observation**: In both dogfoods, the Maker produced equivalent-or-worse code than plain Claude. The value came entirely from the Guardian review.
**Hypothesis**: A "review-only" mode where the user implements normally and then runs ArcheFlow as a post-implementation review would capture the Guardian's value without the Maker's overhead.
**Test**: Implement the same task plain, then run `af-review` (Guardian + Skeptic on the diff). Compare bug catch rate to full PDCA.
**Expected outcome**: Same bug catch rate, ~60% less cost.
### H2: Pre-Implementation Threat Modeling (Guardian before Maker)
**Observation**: Guardian found error-handling bugs (timeout, corruption) that the Maker didn't anticipate. If Guardian's "what could go wrong?" analysis ran BEFORE implementation, the Maker could build in error handling from the start.
**Hypothesis**: Running a lightweight Guardian analysis on the Creator's plan (not the code) would produce a "threat list" that the Maker addresses during implementation, eliminating the need for a fix cycle.
**Sequence**: Creator → Guardian(plan) → Maker(plan + threats) → Guardian(code)
**Expected outcome**: Fewer Maker-introduced bugs, shorter fix cycle, Guardian's code review focuses on implementation correctness rather than missing error paths.
### H3: Differential Review (only review what the Maker DIDN'T get from the plan)
**Observation**: The Maker copies most of the plan correctly. The bugs are in the gaps — things the plan didn't specify (error handling, cross-process state, timeout recovery).
**Hypothesis**: Instead of reviewing the entire diff, focus the Guardian on the delta between the plan and the implementation — what the Maker added, changed, or skipped that wasn't in the plan.
**Test**: Extract the plan's explicit instructions, diff against the implementation, and give Guardian only the unplanned additions.
**Expected outcome**: Higher signal-to-noise ratio (fewer false positives on code that correctly follows the plan), focused attention on the dangerous gaps.
### H4: Project Convention Calibration (reduce false positives)
**Observation**: Guardian flagged API key handling (finding #1) and batch_id validation (finding #2) — both valid in absolute terms but inconsistent with the project's existing patterns. The project doesn't validate IDs or centralize key management anywhere else.
**Hypothesis**: Injecting a "project conventions" summary before Guardian review (e.g., "this project uses env vars for API keys, does not validate external IDs, handles errors via outer try/except") would let Guardian calibrate its expectations and only flag deviations from convention, not the convention itself.
**Test**: Run Guardian with and without convention context on the same diff. Count false positives.
**Expected outcome**: 30-50% reduction in noise findings without missing real bugs.
### H5: Abandon PDCA for Implementation, Keep It for Review
**Observation**: Across both dogfoods, the cycle-back mechanism (Plan→Do→Check→Act→cycle back) never triggered. All reviews were APPROVED_WITH_FIXES, and fixes were applied in a single pass. The cyclic model added structural overhead (event tracking, artifact routing, convergence detection) that was never used.
**Hypothesis**: For most tasks, a linear pipeline (implement → multi-reviewer check → targeted fix) is sufficient. Reserve cyclic PDCA for tasks where reviewers fundamentally reject the approach (not just the implementation).
**Test**: Compare PDCA standard (cycle-back enabled) vs pipeline (no cycle-back) on 10 tasks. Measure: how often does cycle-back actually improve the outcome?
**Expected outcome**: Cycle-back triggers in <10% of tasks. Pipeline matches PDCA quality for 90%+ of cases at lower cost.
### H6: Evidence-Gated Findings Actually Work
**Observation**: Of Guardian's 5 findings in this dogfood, 3 were substantive (timeout data loss, hash non-determinism, no JSON error handling) and 2 were low-value (API key pattern, batch_id format). The substantive ones cited specific code paths and failure scenarios. The low-value ones cited general principles without evidence of actual exploitation.
**Hypothesis**: The evidence-gating mechanism added in v0.7.0 (ban hedged phrases, require command output or code citation) would have automatically downgraded finding #2 ("could corrupt log output") while preserving findings #3 and #4 (which cite specific code paths and failure mechanisms).
**Test**: Re-run the Guardian review with evidence-gating active. Count how many findings survive vs. get downgraded.
**Expected outcome**: 1-2 findings correctly downgraded, 0 real bugs missed.
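The gating mechanism can be sketched as a small filter; the banned-phrase list and evidence markers below are illustrative, not the shipped v0.7.0 lists.

```shell
#!/usr/bin/env bash
# Minimal sketch of the evidence gate: a finding that uses a hedged phrase and
# cites no concrete evidence (file:line citation or pasted command output) is
# downgraded to INFO. Phrase list and marker regex are assumptions.
gate_finding() {
  local text="$1" severity="$2"
  local banned='could|might|may potentially|in theory'
  local evidence='\.(py|sh|js):[0-9]+|\$ '  # file:line citation or "$ cmd" output
  if echo "$text" | grep -qEi "$banned" && ! echo "$text" | grep -qE "$evidence"; then
    echo "INFO"       # hedged claim without evidence: auto-downgrade
  else
    echo "$severity"  # evidence cited (or no hedging): keep severity
  fi
}

gate_finding "could corrupt log output" WARNING                # prints INFO
gate_finding "timeout drops rows, see fanout.py:142" CRITICAL  # prints CRITICAL
```

Applied to this dogfood, finding #2 trips the "could" pattern with no citation, while the timeout finding survives on its `fanout.py` reference.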
### H7: Shadow Detection for the Maker
**Observation**: The Maker introduced a bug (hash-based indexing) because it deviated from the existing codebase pattern (enumerate-based indexing). This is the "Rogue" shadow — the Maker going off-script from what the codebase already does.
**Hypothesis**: A pre-commit check that compares the Maker's implementation against the existing codebase patterns (e.g., "how are chapter indices computed elsewhere in fanout.py?") would catch Rogue deviations before the Guardian review.
**Test**: Add a "pattern conformance" check to the Do phase that greps for how the modified variables/functions are used elsewhere in the file.
**Expected outcome**: Catches Rogue shadow bugs at implementation time rather than review time, saving a review cycle.
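A minimal version of this conformance check could look like the following; the compared idioms (enumerate-based vs. hash-based indexing) mirror this dogfood's Rogue bug, and a real check would derive the patterns from the touched file rather than hardcode them.

```shell
#!/usr/bin/env bash
# Sketch of a Do-phase pattern-conformance check: flag a diff that introduces
# an indexing idiom the file does not already use. Idioms are hardcoded here
# for illustration only.
check_conformance() {
  local file="$1" diff="$2"
  local existing introduced
  existing=$(grep -c "enumerate(" "$file" || true)      # codebase idiom
  introduced=$(echo "$diff" | grep -c "hash(" || true)  # Maker's new idiom
  if [[ "$existing" -gt 0 && "$introduced" -gt 0 ]]; then
    echo "ROGUE: diff uses hash() but $file computes indices via enumerate()"
    return 1
  fi
  echo "OK"
}
```

A nonzero exit would block the commit and route the deviation back to the Maker before Guardian ever sees it.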
---
## Recommended Next Steps (Priority Order)
1. **H1**: Build `af-review` mode (Guardian-only on existing diff) — lowest effort, highest expected ROI
2. **H4**: Project convention injection — reduce noise without missing signal
3. **H2**: Pre-implementation threat modeling — address the root cause of missing error handling
4. **H5**: Default to pipeline strategy, reserve PDCA for rejections
5. **H7**: Maker pattern conformance check — reduce Maker-introduced bugs


@@ -0,0 +1,78 @@
# ArcheFlow Dogfood Report: Colette Expose/Pitch Generation
Date: 2026-04-04
Task: Implement expose and pitch generation steps in Colette's fanout pipeline
Project: writing.colette (Python, 27 modules, 457 tests)
## Task Description
The fanout pipeline in `src/colette/fanout.py` had two placeholder steps (`generate_expose`, `generate_pitch`) that logged "not yet implemented". The task was to replace them with real LLM-powered implementations that generate publishing proposals and pitch letters.
## Conditions
| Condition | Strategy | Agents | Time | Lines |
|-----------|----------|--------|------|-------|
| **Plain Claude** (no orchestration) | None | 0 | ~3 min | 107 (+75 impl, +32 test) |
| **ArcheFlow PDCA** (standard workflow) | pdca | 4 (Explorer, Creator, Maker, Guardian) | ~15 min | 230 (+145 impl, +85 test) |
## Findings
### Bugs introduced
| Condition | Bug | Caught by | Severity |
|-----------|-----|-----------|----------|
| Plain Claude | None | N/A | N/A |
| ArcheFlow | `task_type`/`file_path` kwargs passed to `LLMClient.create()` but only exist on `GuardedLLMClient` | Guardian review | CRITICAL (runtime crash on non-guarded clients) |
**Key observation:** ArcheFlow's Maker introduced a bug that plain Claude avoided. The Guardian caught it, but the net result was: introduce bug + catch bug = extra work for the same outcome.
### Code comparison
| Metric | Plain Claude | ArcheFlow |
|--------|-------------|-----------|
| Implementation lines | 75 | 145 |
| Test lines | 32 | 85 |
| LLMClient compatibility | Clean (protocol args only) | Needed fix (extra kwargs) |
| Prompt detail | Adequate (10 sections listed) | More detailed (explicit section descriptions) |
| Defensive coding | Minimal (follows existing patterns) | More (mkdir guards, fallback paths) |
| Test thoroughness | Basic (file existence, call count) | More thorough (token accumulation, error states) |
### Process overhead
| Phase | Time | Value added |
|-------|------|-------------|
| Explorer research | ~60s | Low — task was well-scoped, pattern was obvious from reading 2 lines |
| Creator proposal | ~45s | Low — 300-line plan for 75-line task, mostly restated what the code already showed |
| Maker implementation | ~90s | Same as plain Claude, but produced more verbose code + a bug |
| Guardian review | ~30s | Mixed — caught 1 real bug (out of 5 findings, 80% noise) |
### Why plain Claude won
1. **Pattern-following task.** Two placeholder functions, one existing pattern to copy. No ambiguity, no design decisions, no security concerns.
2. **Direct protocol reading.** Plain Claude checked the `LLMClient.create()` signature and used only standard args. The Maker, working from the Creator's plan (which didn't mention the protocol), used extra kwargs it saw in the `GuardedLLMClient`.
3. **Less indirection = fewer errors.** The Creator-to-Maker handoff introduced information loss. The Creator specified "call llm_client.create()" but didn't specify the exact signature constraints. Plain Claude read the source of truth directly.
### When ArcheFlow would have been worth it
This task had none of these signals:
- Ambiguous requirements (need Explorer)
- Multiple valid approaches (need Creator to evaluate)
- Security-sensitive code (need Guardian for real threats)
- Cross-cutting changes (5+ files, interaction risks)
- Unfamiliar codebase (need research phase)
### Improvement opportunities
1. **Auto-select should skip orchestration** for pattern-following tasks (placeholder + existing pattern in same file)
2. **Creator compact mode** — for simple tasks, emit a 10-line diff-style plan, not a 300-line essay
3. **Explorer budget cap** — 60s max for single-file tasks
4. **Guardian calibration** — inject project conventions to reduce false positives from 80% to ~40%
5. **Baseline capture** — run the same task without ArcheFlow to enable A/B comparison
## Conclusion
For this specific task (simple, pattern-following, single-file, well-scoped), ArcheFlow added cost without adding quality. Plain Claude was faster, produced less code, and avoided a bug that the Maker introduced.
This is not a failure of ArcheFlow's design — it's a calibration problem. The auto-select heuristic should have detected this as a skip-orchestration task. The complexity threshold for ArcheFlow activation needs to be higher than "touches 2+ files."
**Honest assessment:** ArcheFlow's value-add starts at tasks requiring genuine design decisions, security review, or cross-module coordination. Below that threshold, it's ceremony.

docs/hooks.md Normal file

@@ -0,0 +1,88 @@
# ArcheFlow Hook Points
Hooks let you run custom commands at key points during an ArcheFlow orchestration run. Use them for notifications, custom validation, CI integration, or project-specific checks.
## Available Hooks
| Hook | When | Env Vars | Default `fail_action` |
|------|------|----------|----------------------|
| `run-start` | After initialization, before Plan phase begins | `ARCHEFLOW_RUN_ID`, `ARCHEFLOW_WORKFLOW`, `ARCHEFLOW_TASK` | `warn` |
| `phase-complete` | After each PDCA phase finishes | `ARCHEFLOW_RUN_ID`, `ARCHEFLOW_PHASE`, `ARCHEFLOW_CYCLE` | `warn` |
| `agent-complete` | After each agent returns | `ARCHEFLOW_RUN_ID`, `ARCHEFLOW_AGENT`, `ARCHEFLOW_PHASE`, `ARCHEFLOW_DURATION_MS` | `warn` |
| `pre-merge` | After all reviewers approve, before merging to target branch | `ARCHEFLOW_RUN_ID`, `ARCHEFLOW_BRANCH`, `ARCHEFLOW_TARGET` | `abort` |
| `post-merge` | After successful merge to target branch | `ARCHEFLOW_RUN_ID`, `ARCHEFLOW_BRANCH`, `ARCHEFLOW_MERGE_COMMIT` | `warn` |
| `run-complete` | After the run finishes (success or failure) | `ARCHEFLOW_RUN_ID`, `ARCHEFLOW_STATUS`, `ARCHEFLOW_CYCLES`, `ARCHEFLOW_DURATION_S` | `warn` |
## Configuration
Add a `hooks:` section to your project's `.archeflow/config.yaml`:
```yaml
hooks:
  run-start:
    command: "echo 'Run starting: $ARCHEFLOW_RUN_ID'"
    fail_action: warn
  pre-merge:
    command: "./scripts/lint-check.sh"
    fail_action: abort
  run-complete:
    command: "curl -X POST https://slack.example.com/webhook -d '{\"text\": \"ArcheFlow run $ARCHEFLOW_STATUS\"}'"
    fail_action: warn
```
Each hook entry has two fields:
- **`command`** -- shell command to execute. Env vars are available. Runs with `bash -c`.
- **`fail_action`** -- what happens if the command exits non-zero:
- `warn` -- log a warning, continue the run
- `abort` -- stop the run immediately, report the failure
## `fail_action` Semantics
| `fail_action` | On command exit 0 | On command exit non-zero |
|---------------|-------------------|------------------------|
| `warn` | Continue silently | Log warning, continue |
| `abort` | Continue silently | Emit `decision` event with `"chosen":"hook_abort"`, halt run, report to user |
**Recommended settings:**
- Use `abort` for `pre-merge` -- a failing pre-merge check should block the merge
- Use `warn` for informational hooks (`run-start`, `run-complete`, `post-merge`)
- Use `warn` for `agent-complete` and `phase-complete` unless you have strict SLA requirements
## Examples
### Slack notification on run complete
```yaml
hooks:
  run-complete:
    command: >
      curl -s -X POST "$SLACK_WEBHOOK_URL"
      -H 'Content-Type: application/json'
      -d '{"text":"ArcheFlow run '"$ARCHEFLOW_RUN_ID"' '"$ARCHEFLOW_STATUS"' ('"$ARCHEFLOW_CYCLES"' cycles, '"$ARCHEFLOW_DURATION_S"'s)"}'
    fail_action: warn
```
### Pre-merge lint gate
```yaml
hooks:
  pre-merge:
    command: "npm run lint && npm run typecheck"
    fail_action: abort
```
### Log phase timing
```yaml
hooks:
  phase-complete:
    command: "echo \"$(date -u +%H:%M:%S) phase=$ARCHEFLOW_PHASE cycle=$ARCHEFLOW_CYCLE run=$ARCHEFLOW_RUN_ID\" >> .archeflow/phase-timing.log"
    fail_action: warn
```
## Hook Execution
Hooks are executed by the `archeflow:run` skill at the corresponding lifecycle point. The command runs in the project root directory with `bash -c`. A 30-second timeout applies to each hook -- if a hook exceeds this, it is killed and treated as a failure (subject to `fail_action`).
Hooks are optional. If no `hooks:` section exists in config, no hooks run. If a specific hook event is not configured, it is silently skipped.
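The execution semantics above can be sketched as a small runner; `run_hook` is a hypothetical helper for illustration, not the shipped implementation.

```shell
#!/usr/bin/env bash
# Sketch of the hook runner: execute the configured command via `bash -c` with
# a 30-second timeout, then apply fail_action (warn = continue, abort = halt).
run_hook() {
  local name="$1" command="$2" fail_action="${3:-warn}"
  if timeout 30 bash -c "$command"; then
    return 0
  fi
  if [[ "$fail_action" == "abort" ]]; then
    echo "hook $name failed -- aborting run" >&2
    return 1
  fi
  echo "hook $name failed -- continuing (warn)" >&2
  return 0
}

run_hook run-start "echo 'Run starting'"                # stdout: Run starting
run_hook pre-merge "exit 1" abort || echo "run halted"  # stdout: run halted
```

A nonzero return from an `abort` hook is where the run skill would emit the `"chosen":"hook_abort"` decision event described above.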


@@ -0,0 +1,235 @@
# ArcheFlow Roadmap — From Framework to Tool
Status: Planning (2026-04-06)
Context: v0.8.0 shipped — consolidated skills, corrective action framework, 110 tests. The scaffolding is solid. Now make it genuinely useful.
## Guiding Principle
Every feature must close a feedback loop or remove friction. No features that add complexity without measurable improvement in either speed, cost, or quality.
---
## Tier 1: Make the Sprint Runner Smart (highest impact)
### 1.1 Queue from Git Issues
**Problem:** Manual `queue.json` is the biggest friction point. Nobody wants to maintain a JSON file by hand.
**Solution:** `./scripts/ws sync-issues` that:
- Reads Gitea/GitHub issues via API (`gh issue list` or Gitea REST)
- Maps labels to priority: `P0`=critical/blocker, `P1`=high, `P2`=medium, `P3`=low/enhancement
- Maps labels to estimate: `size/S`, `size/M`, `size/L`, `size/XL` (default: M)
- Extracts `depends_on` from "blocks #N" / "depends on #N" in issue body
- Upserts into `queue.json` (doesn't overwrite manual edits, merges by issue ID)
- Skips issues with `wontfix`, `duplicate`, `question` labels
**Scope:** One script in `scripts/`, ~100 lines. Gitea API + GitHub API (detect from remote URL). Needs API token in env var `GITEA_TOKEN` or `GITHUB_TOKEN`.
**Test:** bats tests with mock API responses (curl fixture files).
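The label mapping can be sketched as a jq filter over `gh issue list --json number,title,labels,body` output. The concrete label names (`critical`, `high`, `low`) and the "depends on #N" phrase are assumptions, not finalized conventions.

```shell
#!/usr/bin/env bash
# Sketch of the issue -> queue.json mapping. Field names follow the gh CLI's
# --json output shape; label names and the dependency phrase are assumptions.
map_issues() {
  jq '[ .[]
    | select((.labels | map(.name) | any(IN("wontfix","duplicate","question"))) | not)
    | { id: .number,
        title: .title,
        priority: (.labels | map(.name)
          | if index("critical") then "P0"
            elif index("high") then "P1"
            elif index("low") then "P3"
            else "P2" end),
        estimate: (.labels | map(.name)
          | (map(select(startswith("size/")))[0] // "size/M")
          | ltrimstr("size/")),
        depends_on: [ .body // "" | scan("depends on #(\\d+)")[] | tonumber ] } ]'
}
# Typical use: gh issue list --json number,title,labels,body --limit 100 | map_issues
```

The upsert step (merge by issue ID into the existing `queue.json`) would sit on top of this output.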
### 1.2 Cost Estimation
**Problem:** Users don't know what a sprint will cost before running it.
**Solution:** `/af-sprint --dry-run` shows estimated cost:
```
Sprint estimate: 7 tasks, ~18 agents, est. $1.20-$2.40, ~12 minutes
  P1: writing.colette fanout (L) — est. $0.50, 4 agents
  P1: tool.archeflow review (M) — est. $0.15, 2 agents
  ...
Proceed? [y/n]
```
**How:** Track actual token counts per task size (S/M/L/XL) in `.archeflow/memory/cost-history.jsonl`. After 5+ tasks per size bucket, use median. Before that, use defaults: S=$0.05, M=$0.15, L=$0.50, XL=$1.50.
**Scope:** Update `sprint` skill with estimation section. Add cost logging to `archeflow-event.sh` (include `tokens_used` in `agent.complete` data). New script `lib/archeflow-cost.sh` for estimation.
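The lookup can be sketched as follows; the per-line history schema (`{"size":"M","cost":0.12}`) is an assumption about what `archeflow-event.sh` would log.

```shell
#!/usr/bin/env bash
# Sketch of the per-bucket estimate: median observed cost once a size bucket
# has 5+ samples, otherwise the documented default (S=$0.05 ... XL=$1.50).
estimate_cost() {
  local size="$1" history="${2:-.archeflow/memory/cost-history.jsonl}"
  local default
  case "$size" in
    S) default=0.05 ;;
    M) default=0.15 ;;
    L) default=0.50 ;;
    XL) default=1.50 ;;
    *) default=0.15 ;;
  esac
  jq -s --arg size "$size" --argjson def "$default" '
    [ .[] | select(.size == $size) | .cost ] | sort
    | if length >= 5 then .[length/2 | floor] else $def end
  ' "$history" 2>/dev/null || echo "$default"
}
```

Median rather than mean keeps one runaway thorough run from inflating all future estimates in that bucket.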
### 1.3 Smart Workflow Selection
**Problem:** Current auto-selection uses keyword matching ("fix" -> pipeline). This is crude.
**Solution:** Analyze the actual task + codebase signals:
| Signal | Source | Workflow |
|--------|--------|----------|
| Files matching `auth|crypto|secret|token|session` | task description + file paths | -> thorough |
| Public API changes (OpenAPI spec modified, exported functions changed) | git diff | -> thorough |
| <3 files changed, all in same dir | git diff | -> fast/pipeline |
| Test files only | git diff | -> pipeline |
| Historical: this project's last 3 runs needed 0 cycles | memory | -> fast |
| Historical: this project's last run had 2+ CRITICALs | memory | -> thorough |
**Scope:** Add to the `run` skill's Strategy Selection section. Read git diff stats + memory lessons before choosing. ~20 lines of logic replacing the current keyword table.
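The non-historical rows of the signal table could be sketched as a selection function over the task text and `git diff --name-only` output; the keyword regex and thresholds are illustrative, and the memory-based signals are omitted.

```shell
#!/usr/bin/env bash
# Sketch of workflow selection from diff signals. Security keywords force
# thorough; a small, single-directory change drops to pipeline; else standard.
select_workflow() {
  local task="$1" files="$2"
  if echo "$task $files" | grep -qEi 'auth|crypto|secret|token|session'; then
    echo thorough; return
  fi
  local n dirs
  n=$(echo "$files" | grep -c . || true)
  dirs=$(echo "$files" | xargs -n1 dirname 2>/dev/null | sort -u | wc -l)
  if [[ "$n" -lt 3 && "$dirs" -le 1 ]]; then
    echo pipeline; return   # small, localized change
  fi
  echo standard
}

select_workflow "rotate the session token" "src/auth.py"   # prints thorough
select_workflow "fix typo" "docs/readme.md"                # prints pipeline
```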
---
## Tier 2: Close the Learning Loop
### 2.1 Confidence Calibration
**Problem:** Creator's confidence scores (0.0-1.0) are self-reported and uncalibrated. A Creator that always says 0.8 but gets rejected 40% of the time is not useful.
**Solution:** After each `run.complete`, log calibration data:
```jsonl
{"run_id":"...","creator_confidence":{"task":0.8,"solution":0.7,"risk":0.6},"actual_outcome":"rejected","cycles":2,"criticals":1}
```
At run start, inject calibration context into Creator prompt:
```
Your historical calibration: You rate task understanding at 0.8 avg,
but 35% of runs with that score needed cycle-back. Consider scoring
more conservatively.
```
**Scope:** New field in `archeflow-memory.sh` calibration store. ~30 lines in `run` skill to log + inject. Needs 5+ runs before meaningful.
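The summary that would feed the Creator prompt can be sketched with jq over the calibration log lines shown above; the output field names are assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the calibration summary: average self-reported task confidence
# vs. observed cycle-back rate (runs needing more than one cycle).
calibration_summary() {
  jq -s '{
    avg_task_conf: ([ .[].creator_confidence.task ] | add / length),
    cycle_back_rate: (([ .[] | select(.cycles > 1) ] | length) / length)
  }' "${1:-.archeflow/memory/calibration.jsonl}"
}
```

A large gap between the two numbers is exactly the "says 0.8, rejected 40% of the time" pattern the injected prompt text would call out.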
### 2.2 Archetype Auto-Tuning
**Problem:** The effectiveness scoring system exists (`archeflow-score.sh`) but nothing acts on it.
**Solution:** After 10+ runs, auto-generate recommendations:
```
Archetype Recommendations (based on 15 runs):
  Guardian:  essential (caught real issues in 80% of runs)
  Sage:      keep (useful findings in 60% of runs)
  Skeptic:   demote to thorough-only (useful in 20%, mostly INFO)
  Trickster: keep for thorough (caught 2 bugs Guardian missed)
```
Add to `/af-score` output. Store recommendation in config as `reviewers.recommended`:
```yaml
reviewers:
  recommended:
    always: [guardian]
    default: [sage]
    thorough_only: [skeptic, trickster]
  # Auto-generated 2026-04-06 from 15 runs. Override with explicit config.
```
**Scope:** Update `archeflow-score.sh` with recommendation logic. Update `run` skill to read recommended config. Add to `af-score` skill display.
### 2.3 Campaign Memory
**Problem:** Related runs (e.g., "harden all API endpoints") don't share context.
**Solution:** Optional `--campaign <id>` flag on `/af-run`:
- Links runs under a campaign ID
- Cross-run context: "In Run 1, we found the auth pattern uses middleware X. In Run 2, the same pattern applies."
- Campaign-level progress: "3/8 endpoints hardened, 2 CRITICALs remaining"
- Campaign memory injected into Explorer/Creator prompts
**Scope:** New field in event schema. Campaign index in `.archeflow/campaigns/`. Update memory injection to filter by campaign. ~50 lines in `run` skill.
---
## Tier 3: Integrate with Real Workflow
### 3.1 Findings as PR Comments
**Problem:** Review findings live in `.archeflow/artifacts/`. Nobody reads artifact files — they read PR comments.
**Solution:** After Check phase, if a PR exists for the branch:
```bash
# Post each CRITICAL/WARNING as a PR review comment
gh api repos/{owner}/{repo}/pulls/{pr}/comments \
  --field body="🛡️ **Guardian** [CRITICAL/security]\n\n${description}\n\nSuggested fix: ${fix}" \
  --field path="${file}" --field line="${line}"
```
**Scope:** New `--pr <number>` flag on `/af-run` and `/af-review`. Script `lib/archeflow-pr.sh` for posting comments. Falls back gracefully if no PR or no API token.
### 3.2 CI Hook Mode
**Problem:** ArcheFlow runs manually. It should run automatically on PRs.
**Solution:** Lightweight CI integration:
```yaml
# .github/workflows/archeflow-review.yml (or Gitea equivalent)
on: pull_request
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: claude --plugin-dir ./archeflow -p "/af-review --branch ${{ github.head_ref }} --pr ${{ github.event.number }}"
```
Only runs Guardian (fast, cheap). Posts findings as PR comments. No PDCA overhead.
**Scope:** Template workflow file in `examples/ci/`. Update `review` skill to support `--pr` flag. Documentation.
### 3.3 Watch Mode
**Problem:** You have to remember to run `/af-review` after pushing.
**Solution:** `/af-watch` — background process that monitors a branch:
- Uses `git log --since` polling (every 60s)
- On new commits: auto-run `/af-review` on the diff
- Posts findings as PR comments if PR exists
- Respects budget gate from corrective action framework
**Scope:** New skill `af-watch/SKILL.md` (~30 lines). Uses the `loop` skill infrastructure. Low priority — CI hook mode covers most use cases.
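A single poll of that loop can be sketched as follows; the state-file path and the review command it would trigger are placeholders.

```shell
#!/usr/bin/env bash
# Single-poll sketch of watch mode: compare the branch head against the last
# seen commit (stored in a state file) and report new commits.
check_once() {
  local branch="$1" state="$2"
  local last head
  last=$(cat "$state" 2>/dev/null || true)
  head=$(git rev-parse "$branch" 2>/dev/null) || return 0
  if [[ -n "$last" && "$head" != "$last" ]]; then
    echo "new commits on $branch: $last..$head"
    # placeholder: claude -p "/af-review --branch $branch"
  fi
  echo "$head" > "$state"
}
# The loop itself: while true; do check_once main .archeflow/watch-head; sleep 60; done
```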
---
## Tier 4: Replay and Analysis
### 4.1 Decision Journal
**Problem:** No visibility into why ArcheFlow made specific choices during a run.
**Solution:** Already started with `archeflow-decision.sh` and `archeflow-replay.sh`. Extend:
- Log every decision point: workflow selection, A1/A2/A3 triggers, fix routing, shadow detections
- `/af-replay <run_id> --timeline` shows the decision chain
- `/af-replay <run_id> --whatif --workflow thorough` simulates: "What would thorough have found?"
**Scope:** Mostly built. Needs integration into the `run` skill (emit `decision.point` events at each choice). The replay script needs the what-if simulation logic.
### 4.2 Run Comparison
**Problem:** No way to evaluate whether workflow X is better than workflow Y for a project.
**Solution:** `/af-replay compare <run_a> <run_b>`:
```
Run A (standard, 4m30s, $0.80): 5 findings, 4 resolved, 1 INFO remaining
Run B (thorough, 12m, $2.10): 7 findings, 6 resolved, 1 INFO remaining
Delta: +2 findings (both INFO), +165% cost, +167% time
Verdict: Standard was sufficient for this task.
```
**Scope:** Update `archeflow-replay.sh` with comparison mode. Needs at least 2 runs on similar tasks.
---
## Implementation Order
```
v0.9.0 — Sprint Intelligence
  1.1 Queue from issues
  1.2 Cost estimation
  1.3 Smart workflow selection
v0.10.0 — Learning Loop
  2.1 Confidence calibration
  2.2 Archetype auto-tuning
  2.3 Campaign memory
v0.11.0 — Integration
  3.1 Findings as PR comments
  3.2 CI hook mode
  3.3 Watch mode (stretch)
v0.12.0 — Analysis
  4.1 Decision journal (mostly done)
  4.2 Run comparison
```
Each version is independently shippable. No version depends on a later one.
## What NOT to Build
- **Web dashboard** — Terminal is the interface. Don't add a server.
- **Embedding-based memory** — Keyword matching works. Don't add vector DBs.
- **Agent marketplace** — Focus on the 7 built-in archetypes being excellent.
- **Multi-user collaboration** — ArcheFlow is a single-user tool. Git is the collaboration layer.
- **Plugin system for plugins** — ArcheFlow IS a plugin. Don't go meta.


@@ -2,6 +2,36 @@
## Completed
### v0.7.0 (2026-04-04)
- [x] Context isolation protocol for attention filters and all agent personas
- [x] Structured status tokens with orchestrator parsing protocol
- [x] Evidence-gated verification with banned phrases and auto-downgrade
- [x] Plan granularity constraint (2-5 min tasks with file path, code block, verify command)
- [x] Strategy abstraction (PDCA cyclic, pipeline linear, auto-selection)
- [x] Experimental status and interdisciplinary framing in README
### v0.6.0 (2026-04-04)
- [x] Expanded attention-filters skill (prompt templates, token budgets, cycle-back filtering, verification checklist)
- [x] Explorer skip heuristic in plan-phase skill
- [x] Agent persona normalization (frontmatter examples, model comments, isolation notes)
- [x] Runnable quickstart example
### v0.5.0 (2026-04-04)
- [x] Lib script validation at run initialization
- [x] Hook points documentation with 6 lifecycle events
- [x] Phase rollback support via `--to <phase>` flag
- [x] Per-workflow model assignment with fallback chain
- [x] Cross-run finding regression detection
- [x] Check-phase parallel reviewer spawning protocol
### v0.4.0 (2026-04-04)
- [x] Confidence gate parsing with bash snippets
- [x] Mini-Explorer spawning when risk coverage < 0.5
- [x] Worktree merge flow with pre-merge hooks and post-merge test validation
- [x] `archeflow-rollback.sh` for post-merge test failure auto-revert
- [x] Test-first validation gate in Do phase
- [x] Memory injection audit trail
### v0.3.0 (2026-04-03)
- [x] Automated PDCA loop (`archeflow:run`) with `--start-from` and `--dry-run`
- [x] Event-sourced process logging with DAG parent relationships
@@ -52,6 +82,10 @@
| Date | Version | Changes |
|------|---------|---------|
| 2026-04-04 | v0.7.0 | Process rigor: context isolation, status tokens, evidence-gated verification, plan granularity, strategy abstraction |
| 2026-04-04 | v0.6.0 | Quality/polish: expanded attention filters, Explorer skip heuristic, agent persona normalization, quickstart example |
| 2026-04-04 | v0.5.0 | Robustness: lib validation, hook points, phase rollback, per-workflow models, regression detection, parallel reviewers |
| 2026-04-04 | v0.4.0 | Confidence gates, mini-Explorer, worktree merge flow, rollback script, test-first gate, memory audit |
| 2026-04-03 | v0.3.0 | Process infrastructure: run automation, event sourcing, domain adapters, memory, multi-project, 8 lib scripts |
| 2026-04-03 | v0.2.0 | Plugin consolidation, workflow intelligence, quality loop, parallel teams, extensibility |
| 2026-04-02 | v0.1.0 | Initial release: 7 archetypes, 9 core skills, PDCA workflows, shadow detection, autonomous mode |


@@ -1,5 +1,67 @@
# ArcheFlow — Status Log
## 2026-04-06: Run replay (v0.9.0)
- `lib/archeflow-decision.sh` — append `decision.point` (phase, archetype, input, decision, confidence).
- `lib/archeflow-replay.sh` — `timeline` / `whatif` (weighted archetypes, threshold) / `compare`; optional `--json`.
- Skill `af-replay`, plugin bump, DAG renders `decision.point`, `tests/archeflow-replay.bats`.
## 2026-04-04: Triple Release Sprint (v0.4 → v0.6)
### What happened
Three ArcheFlow PDCA cycles in one session, each using ArcheFlow's own orchestration to develop itself (dogfooding). Each cycle: Explorer→Creator→Maker→Guardian+Skeptic+Sage→fixes→merge→push.
### v0.4.0 — Gap Fixes (8 commits, 541 lines, 15 files)
- Unified feedback routing tables across 3 skills (canonical 8-row version)
- Confidence gate with concrete bash parsing, 3 branches (pause/upgrade/mini-Explorer)
- `archeflow-rollback.sh` — post-merge auto-revert with `--mainline 1`
- Test-first validation gate in Do phase (word-boundary patterns)
- Memory injection audit trail (`--audit` flag, `audit-check` command)
- Review fixes: safe jq `--arg`, confidence fallback→0.0, pattern hardening
### v0.5.0 — Infrastructure (8 commits, 483 lines, 12 files)
- Lib script validation at run initialization (0a)
- Hook points documentation (`docs/hooks.md` + config template with 6 events)
- Phase rollback via `--to <phase>` in rollback script
- Per-workflow model assignment configuration
- Cross-run finding regression detection
- Check-phase fleshed out with parallel reviewer spawning protocol
- Review fixes: mutual exclusivity guard, jq --arg everywhere, table-row grep
### v0.6.0 — Quality Polish (5 commits, 253 lines, 13 files)
- Attention-filters expanded from 39-line stub to full skill (prompt templates, token budgets, cycle-back rules, verification checklist)
- Explorer skip heuristic in plan-phase skill
- Agent persona normalization (4 agents: examples, model comments, isolation note)
- Runnable quickstart example (`examples/runnable-quickstart.md`)
- CHANGELOG completed with missing v0.4.0 entry + roadmap version history
### v0.7.0 — Superpowers-Inspired + Strategy Abstraction (8 commits, 485 lines, 20 files)
- Context isolation protocol (attention-filters + all 7 agents)
- Structured status tokens: DONE/DONE_WITH_CONCERNS/NEEDS_CONTEXT/BLOCKED
- Evidence-gated verification: banned phrases, evidence markers, downgrade-to-INFO
- Plan granularity constraint: 2-5 min tasks with file:line + code block + verify
- Strategy abstraction: `pdca` (cyclic) vs `pipeline` (linear) vs `auto` (selected by task)
- README: experimental status + interdisciplinary framing (psychology + process eng + software eng)
- Review fixes: fast→pipeline auto-select, merge guard, evidence check completeness
### Key numbers
| Metric | v0.3 → v0.7 delta |
|--------|-------------------|
| Commits this session | 29 |
| Lines added | ~1,762 |
| Files touched | 30+ |
| Lib scripts | 8 → 9 (archeflow-rollback.sh) |
| Skills | 24 (all fleshed out, no stubs remain) |
| Review cycles | 4 (v0.4: full, v0.5: full, v0.6: fast, v0.7: Guardian-only) |
| Review findings fixed | 15 |
### What to do next
1. **End-to-end dogfood** — run `af-run` on a real task (not ArcheFlow itself) to test both strategies
2. **Hook execution runtime** — config documents 6 hook events but no runner yet
3. **Pipeline strategy testing** — exercise the `--strategy pipeline` path on a bug fix
4. **Publish** — tag v0.7.0, consider claude.com/plugins marketplace listing
5. **GitHub Action** — automated PR review (roadmap item, low effort)
## 2026-04-03: Major Feature Sprint (v0.1 → v0.3)
### What happened


@@ -0,0 +1,109 @@
# Runnable Quickstart
A step-by-step walkthrough of an ArcheFlow run from scratch.
## 1. Create a temp project
```bash
mkdir /tmp/af-demo && cd /tmp/af-demo
git init && echo "# Demo" > README.md && git add . && git commit -m "init"
```
## 2. Initialize ArcheFlow
```
/af-init quick-fix
```
This creates `.archeflow/config.yaml` with sensible defaults (fast workflow, budget $5).
Expected output:
```
archeflow v0.6.0 initialized (quick-fix bundle)
config: .archeflow/config.yaml
workflow: fast (Creator -> Maker -> Guardian)
```
## 3. Run a task
```
/af-run "Create a fibonacci function with edge case tests" --workflow fast
```
## 4. Expected output at each phase
### Plan phase (Creator only -- Explorer skipped)
The fast workflow skips Explorer because the task is small and specific.
Creator produces a proposal:
```
-- archeflow -- Create fibonacci function -- fast --
Creator: fibonacci(n) with memoization, handles n<0 and n>46 overflow
```
Behind the scenes, Creator wrote a proposal with:
- Architecture decision: iterative approach with memoization
- File list: `fibonacci.py`, `test_fibonacci.py`
- Confidence: task understanding 0.9, solution completeness 0.9, risk coverage 0.8
### Do phase (Maker)
Maker implements in an isolated worktree:
```
Maker: 2 files, 4 tests, all passing
```
Maker followed the proposal: wrote tests first (negative input, zero, small values, large values), then implemented.
### Check phase (Guardian)
Guardian reviews the diff:
```
Guardian: APPROVED (1 INFO -- consider adding type hints)
```
### Act phase
All reviewers approved. Merge to main:
```
-- done -- 1 cycle . 3 agents . ~4 min --
fibonacci.py + test_fibonacci.py merged
```
## 5. Expected file tree
```
/tmp/af-demo/
  README.md
  fibonacci.py         # iterative fibonacci with memoization
  test_fibonacci.py    # 4 test cases (negative, zero, small, overflow)
  .archeflow/
    config.yaml        # ArcheFlow configuration
    runs/
      run-001.jsonl    # event log for this run
      progress.md      # final progress snapshot
```
## 6. What just happened
Each phase maps to an archetype with a specific role:
| Phase | Archetype | What it did |
|-------|-----------|-------------|
| Plan | Creator | Designed the solution: iterative fibonacci, memoization, test cases. Skipped Explorer (task is specific, files are known). |
| Do | Maker | Implemented in isolated worktree. Tests first, then code. Committed after each step. |
| Check | Guardian | Reviewed the diff for security, correctness, and quality. Found no blockers. |
| Act | Orchestrator | All approved -- merged Maker's worktree branch into main. |
The fast workflow used 3 agents in 1 cycle. A `standard` workflow would add Explorer (research) + Skeptic (assumptions) + Sage (quality). A `thorough` workflow adds Trickster (adversarial testing) on top.
## Next steps
- Try `--workflow standard` for a more thorough run
- Try `/af-status` to see run details after completion
- Try `/af-dag` to see the process DAG
- Try `/af-report` for a full markdown report


@@ -7,7 +7,7 @@ const path = require("path");
try {
const pluginRoot = path.resolve(__dirname, "..");
-const skillFile = path.join(pluginRoot, "skills", "using-archeflow", "SKILL.md");
+const skillFile = path.join(pluginRoot, "skills", "using-archeflow", "ACTIVATION.md");
if (!fs.existsSync(skillFile)) {
console.log("{}");


@@ -87,6 +87,9 @@ EVENTS_PARSED=$(jq -r '
elif .type == "agent.complete" then
(.data.archetype // .agent // "unknown") + " (" + .phase + ")" +
(if (.data.tokens // 0) > 0 then " [" + (.data.tokens | tostring) + " tok]" else "" end)
elif .type == "decision.point" then
(.data.archetype // .agent // "?") + " → " + (.data.decision // "?") +
" (conf " + ((.data.confidence // 0) | tostring) + ")"
elif .type == "decision" then
"decision: " + (.data.what // "unknown") + " → " + (.data.chosen // "unknown")
elif .type == "phase.transition" then
@@ -209,7 +212,7 @@ render_node() {
local colored_label
case "$type" in
phase.transition) colored_label="${C_TRANS}${label}${C_RESET}" ;;
-decision) colored_label="${C_DECISION}${label}${C_RESET}" ;;
+decision|decision.point) colored_label="${C_DECISION}${label}${C_RESET}" ;;
review.verdict) colored_label="${C_VERDICT}${label}${C_RESET}" ;;
*) colored_label="${pc}${label}${C_RESET}" ;;
esac

lib/archeflow-decision.sh Executable file

@@ -0,0 +1,48 @@
#!/usr/bin/env bash
# archeflow-decision.sh — Log a PDCA decision point for run replay / effectiveness analysis.
#
# Appends a decision.point event to .archeflow/events/<run_id>.jsonl with:
# phase, archetype (agent + data.archetype), input, decision, confidence, ts (via event layer)
#
# Usage:
# ./lib/archeflow-decision.sh <run_id> <phase> <archetype> '<input>' '<decision>' <confidence> [parent_seq]
#
# Examples:
# ./lib/archeflow-decision.sh 2026-04-06-auth check guardian \
# 'diff + proposal risks' 'needs_changes' 0.82 7
# ./lib/archeflow-decision.sh 2026-04-06-auth act "" 'route findings' 'send_to_maker' 0.9
#
# confidence: 0.0-1.0 (orchestrator-estimated certainty in the recorded choice)
#
# Requires: jq (via archeflow-event.sh)
set -euo pipefail
LIB_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
if [[ $# -lt 6 ]]; then
  echo "Usage: $0 <run_id> <phase> <archetype> '<input>' '<decision>' <confidence> [parent_seq]" >&2
  exit 1
fi
RUN_ID="$1"
PHASE="$2"
ARCH="$3"
INPUT="$4"
DECISION="$5"
CONF_RAW="$6"
PARENT="${7:-}"
if ! [[ "$CONF_RAW" =~ ^[0-9]*\.?[0-9]+$ ]]; then
  echo "Error: confidence must be a number (e.g. 0.85)" >&2
  exit 1
fi
DATA=$(jq -cn \
  --arg a "$ARCH" \
  --arg i "$INPUT" \
  --arg d "$DECISION" \
  --argjson c "$CONF_RAW" \
  '{archetype:$a, input:$i, decision:$d, confidence:$c}')
exec "$LIB_DIR/archeflow-event.sh" "$RUN_ID" decision.point "$PHASE" "$ARCH" "$DATA" "$PARENT"


@@ -8,6 +8,9 @@
# ./lib/archeflow-event.sh 2026-04-03-der-huster agent.complete plan creator '{"duration_ms":167522}' 2
# ./lib/archeflow-event.sh 2026-04-03-der-huster phase.transition do "" '{"from":"plan","to":"do"}' 3,4
# ./lib/archeflow-event.sh 2026-04-03-der-huster fix.applied act "" '{"source":"guardian"}' 8
# ./lib/archeflow-event.sh 2026-04-03-der-huster decision.point check guardian \
# '{"archetype":"guardian","input":"diff","decision":"needs_changes","confidence":0.85}' 7
# # Or use: ./lib/archeflow-decision.sh <run_id> <phase> <arch> '<input>' '<decision>' <confidence> [parent]
#
# Parent seqs: comma-separated seq numbers of causal parent events (DAG).
# "2" → single parent [2]


@@ -11,6 +11,7 @@
# ./lib/archeflow-memory.sh list # List all active lessons
# ./lib/archeflow-memory.sh decay # Apply decay to all lessons
# ./lib/archeflow-memory.sh forget <id> # Archive a lesson by ID
# ./lib/archeflow-memory.sh regression-check <events> # Detect regressions from previously fixed findings
#
# Dependencies: jq, bash 4+
@@ -140,14 +141,14 @@ cmd_extract() {
if [[ "$overlap" -ge 50 ]]; then
# Match found — update existing lesson
local tmp_file="${LESSONS_FILE}.tmp"
-jq -c "
-if .id == \"$lesson_id\" then
+jq -c --arg lid "$lesson_id" --arg ts "$(now_ts)" --arg rid "$run_id" '
+if .id == $lid then
.frequency += 1 |
-.ts = \"$(now_ts)\" |
-.last_seen_run = \"$run_id\" |
+.ts = $ts |
+.last_seen_run = $rid |
.runs_since_last_seen = 0
else . end
-" "$LESSONS_FILE" > "$tmp_file"
+' "$LESSONS_FILE" > "$tmp_file"
mv "$tmp_file" "$LESSONS_FILE"
matched=true
updated=$((updated + 1))
@@ -223,25 +224,25 @@ cmd_inject() {
# - Filter by domain (match or "general") and archetype (if provided)
# - Sort by frequency desc, cap at 10
local lessons
-lessons=$(jq -c "
+lessons=$(jq -c --arg domain "$domain" --arg archetype "$archetype" '
select(
-(.type == \"preference\") or
+(.type == "preference") or
(.frequency >= 5) or
(
(.frequency >= 2) and
(
-(\"$domain\" == \"\") or
-(.domain == \"$domain\") or
-(.domain == \"general\")
+($domain == "") or
+(.domain == $domain) or
+(.domain == "general")
) and
(
-(\"$archetype\" == \"\") or
+($archetype == "") or
(.archetype == null) or
-(.archetype == \"$archetype\")
+(.archetype == $archetype)
)
)
)
-" "$LESSONS_FILE" 2>/dev/null | jq -sc 'sort_by(-.frequency) | .[:10][]' 2>/dev/null || true)
+' "$LESSONS_FILE" 2>/dev/null | jq -sc 'sort_by(-.frequency) | .[:10][]' 2>/dev/null || true)
if [[ -z "$lessons" ]]; then
return 0
@@ -361,6 +362,88 @@ cmd_audit_check() {
done <<< "$lesson_ids"
}
cmd_regression_check() {
local events_file="${1:?Usage: $0 regression-check <events.jsonl>}"
if [[ ! -f "$events_file" ]]; then
echo "Error: events file not found: $events_file" >&2
exit 1
fi
# Extract current run_id
local run_id
run_id=$(jq -r '.run_id' "$events_file" | head -1)
# Find the previous run from index.jsonl
local INDEX_FILE=".archeflow/events/index.jsonl"
if [[ ! -f "$INDEX_FILE" ]]; then
echo "[archeflow-memory] No index.jsonl found — skipping regression check." >&2
return 0
fi
local prev_run_id
# Get the most recent run that is not the current one (index is append-newest-last)
prev_run_id=$(jq -r --arg rid "$run_id" 'select(.run_id != $rid) | .run_id' "$INDEX_FILE" 2>/dev/null | tail -1)
# Note: tail -1 gives the last non-current entry, which is the most recent previous run
if [[ -z "$prev_run_id" ]]; then
echo "[archeflow-memory] No previous run found — skipping regression check." >&2
return 0
fi
local prev_events=".archeflow/events/${prev_run_id}.jsonl"
if [[ ! -f "$prev_events" ]]; then
echo "[archeflow-memory] Previous run events not found: $prev_events" >&2
return 0
fi
# Extract resolved findings from previous run (fix.applied events)
local resolved_findings
resolved_findings=$(jq -r 'select(.type == "fix.applied") | .data.finding // empty' "$prev_events" 2>/dev/null || true)
if [[ -z "$resolved_findings" ]]; then
echo "[archeflow-memory] No resolved findings in previous run — nothing to regress." >&2
return 0
fi
# Extract current run findings from review.verdict events
local current_findings
current_findings=$(jq -r '
select(.type == "review.verdict") |
.data.findings[]? | .description // empty
' "$events_file" 2>/dev/null || true)
if [[ -z "$current_findings" ]]; then
echo "[archeflow-memory] No findings in current run — no regressions." >&2
return 0
fi
# Compare: for each resolved finding, check if it reappeared
local regressions=0
while IFS= read -r resolved; do
[[ -z "$resolved" ]] && continue
while IFS= read -r current; do
[[ -z "$current" ]] && continue
local overlap
overlap=$(keyword_overlap "$resolved" "$current")
if [[ "$overlap" -ge 50 ]]; then
echo "REGRESSION: \"$resolved\" (fixed in $prev_run_id) reappeared as \"$current\""
regressions=$((regressions + 1))
break
fi
done <<< "$current_findings"
done <<< "$resolved_findings"
if [[ "$regressions" -gt 0 ]]; then
echo "[archeflow-memory] $regressions regression(s) detected from run $prev_run_id." >&2
return 1
else
echo "[archeflow-memory] No regressions detected." >&2
return 0
fi
}
cmd_add() {
local type="${1:-preference}"
local desc="${2:-}"
@@ -474,17 +557,17 @@ cmd_forget() {
ensure_dir
# Check if the lesson exists
-if ! jq -e "select(.id == \"$target_id\")" "$LESSONS_FILE" > /dev/null 2>&1; then
+if ! jq -e --arg tid "$target_id" 'select(.id == $tid)' "$LESSONS_FILE" > /dev/null 2>&1; then
echo "Error: lesson $target_id not found." >&2
exit 1
fi
# Archive the lesson
-jq -c "select(.id == \"$target_id\")" "$LESSONS_FILE" >> "$ARCHIVE_FILE"
+jq -c --arg tid "$target_id" 'select(.id == $tid)' "$LESSONS_FILE" >> "$ARCHIVE_FILE"
# Remove from lessons
local tmp_file="${LESSONS_FILE}.tmp"
-jq -c "select(.id != \"$target_id\")" "$LESSONS_FILE" > "$tmp_file"
+jq -c --arg tid "$target_id" 'select(.id != $tid)' "$LESSONS_FILE" > "$tmp_file"
mv "$tmp_file" "$LESSONS_FILE"
echo "[archeflow-memory] Forgot lesson $target_id (moved to archive)" >&2
@@ -503,6 +586,7 @@ if [[ $# -lt 1 ]]; then
echo " decay Apply decay to all lessons" >&2
echo " forget <id> Archive a lesson by ID" >&2
echo " audit-check <run_id> Check lesson effectiveness for a run" >&2
echo " regression-check <events.jsonl> Detect regressions from previously fixed findings" >&2
exit 1
fi
@@ -535,6 +619,10 @@ case "$COMMAND" in
[[ $# -lt 1 ]] && { echo "Usage: $0 audit-check <run_id>" >&2; exit 1; }
cmd_audit_check "$1"
;;
regression-check)
[[ $# -lt 1 ]] && { echo "Usage: $0 regression-check <events.jsonl>" >&2; exit 1; }
cmd_regression_check "$1"
;;
*)
echo "Unknown command: $COMMAND" >&2
exit 1
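The regression check compares findings with a `keyword_overlap` helper defined earlier in the script (not shown in this hunk). A minimal standalone sketch of the idea, assuming overlap is the percentage of the first string's unique words that also occur in the second; this is an illustrative re-implementation, and the real helper may differ:

```shell
# Hypothetical keyword_overlap for illustration only.
keyword_overlap() {
  local a b total hits
  # Lowercase, split on non-alphanumerics, drop empties, de-duplicate and sort
  a=$(echo "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '\n' | grep -v '^$' | sort -u)
  b=$(echo "$2" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '\n' | grep -v '^$' | sort -u)
  total=$(echo "$a" | wc -l | tr -d ' ')
  hits=$(comm -12 <(echo "$a") <(echo "$b") | wc -l | tr -d ' ')
  echo $(( hits * 100 / total ))
}

# 4 of the 6 keywords recur → 66, which clears the >= 50 regression threshold
keyword_overlap "missing input validation in login handler" \
  "login handler lacks input validation"
```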

228
lib/archeflow-replay.sh Executable file

@@ -0,0 +1,228 @@
#!/usr/bin/env bash
# archeflow-replay.sh — Inspect recorded runs: decision timeline and weighted what-if replay.
#
# Usage:
# archeflow-replay.sh timeline <run_id>
# archeflow-replay.sh whatif <run_id> [--weights arch=w,arch2=w2] [--threshold 0.5] [--json]
# archeflow-replay.sh compare <run_id> [--weights ...] [--threshold ...] [--json]
#
# Events file: .archeflow/events/<run_id>.jsonl (relative to current working directory)
#
# whatif / compare:
# - Loads check-phase review.verdict events (last verdict per archetype).
# - Original gate (strict): BLOCK if any reviewer is not approved.
# - Replay gate (weighted): BLOCK if sum(weight * strict) / sum(weight) >= threshold,
# where strict=1 for non-approved verdicts, else 0. Default weight per archetype is 1.0.
#
# Requires: jq
set -euo pipefail
if [[ $# -lt 2 ]]; then
echo "Usage: $0 {timeline|whatif|compare} <run_id> [options]" >&2
echo "" >&2
echo " timeline <run_id> Decision timeline (decision.point + review.verdict)" >&2
echo " whatif <run_id> [--weights k=v,...] [--threshold 0.5] [--json]" >&2
echo " compare <run_id> (timeline + whatif summary)" >&2
exit 1
fi
COMMAND="$1"
RUN_ID="$2"
shift 2
if ! command -v jq &>/dev/null; then
echo "Error: jq is required." >&2
exit 1
fi
EVENT_FILE=".archeflow/events/${RUN_ID}.jsonl"
resolve_event_file() {
if [[ ! -f "$EVENT_FILE" ]]; then
echo "Error: event file not found: $EVENT_FILE" >&2
exit 1
fi
}
cmd_timeline() {
resolve_event_file
echo "## Decision timeline — run_id=${RUN_ID}"
echo ""
local cnt
cnt=$(jq -s '[.[] | select(.type == "decision.point")] | length' "$EVENT_FILE")
if [[ "$cnt" -gt 0 ]]; then
echo "### decision.point (${cnt})"
jq -r 'select(.type == "decision.point")
| "- \(.ts) [\(.phase)] \(.data.archetype // .agent // "?") \(.data.decision) conf=\(.data.confidence // "n/a") input=\(.data.input // "")"' \
"$EVENT_FILE"
echo ""
else
echo "### decision.point"
echo "(none — emit with ./lib/archeflow-decision.sh during the run)"
echo ""
fi
echo "### review.verdict (check phase)"
if jq -e -s '[.[] | select(.type == "review.verdict" and .phase == "check")] | length > 0' "$EVENT_FILE" >/dev/null 2>&1; then
jq -r 'select(.type == "review.verdict" and .phase == "check")
| "- \(.ts) \(.data.archetype // .agent // "?") verdict=\(.data.verdict) findings=\((.data.findings // []) | length)"' \
"$EVENT_FILE"
else
echo "(none)"
fi
echo ""
}
parse_weights_to_json() {
local raw="${1:-}"
local obj='{}'
if [[ -z "$raw" ]]; then
echo '{}'
return
fi
IFS=',' read -ra pairs <<< "$raw"
for pair in "${pairs[@]}"; do
[[ -z "$pair" ]] && continue
local k="${pair%%=*}"
local v="${pair#*=}"
k=$(echo "$k" | tr '[:upper:]' '[:lower:]' | xargs)
v=$(echo "$v" | xargs)
if [[ -z "$k" || "$k" == "$pair" ]]; then
echo "Error: invalid weight entry (use arch=1.5): $pair" >&2
exit 1
fi
# Validate before --argjson so jq does not fail with an opaque parse error
if ! [[ "$v" =~ ^[0-9]*\.?[0-9]+$ ]]; then
echo "Error: weight must be a non-negative number (use arch=1.5): $pair" >&2
exit 1
fi
obj=$(echo "$obj" | jq --arg k "$k" --argjson v "$v" '. + {($k): $v}')
done
echo "$obj"
}
cmd_whatif() {
local weights_str=""
local threshold="0.5"
local json_out="false"
while [[ $# -gt 0 ]]; do
case "$1" in
--weights)
weights_str="$2"
shift 2
;;
--threshold)
threshold="$2"
shift 2
;;
--json)
json_out="true"
shift
;;
*)
echo "Unknown option: $1" >&2
exit 1
;;
esac
done
resolve_event_file
local weights_json
weights_json="$(parse_weights_to_json "$weights_str")"
local result
result=$(jq -s --argjson weights "$weights_json" --argjson thr "$threshold" --arg run_id "$RUN_ID" '
def strict($v):
if $v == null then 1
else ($v | ascii_downcase) as $lv
| if ($lv == "approved" or $lv == "approve") then 0 else 1 end
end;
def norm_key: ascii_downcase;
([.[] | select(.type == "review.verdict" and .phase == "check")]
| sort_by(.seq)
| reduce .[] as $e ({}; . + { (($e.data.archetype // $e.agent // "unknown") | norm_key): $e })
) as $last |
($last | keys) as $keys |
if ($keys | length) == 0 then
{
run_id: $run_id,
error: "no check-phase review.verdict events; nothing to simulate"
}
else
[ $keys[] as $k | $last[$k] as $ev |
($weights[($k | norm_key)] // 1.0) as $w
| strict($ev.data.verdict) as $s
| {
archetype: ($ev.data.archetype // $ev.agent // $k),
verdict: ($ev.data.verdict // "unknown"),
weight: $w,
strict: $s,
weighted_contrib: ($w * $s)
}
] as $rows |
($rows | map(.weighted_contrib) | add) as $num |
($rows | map(.weight) | add) as $den |
(if $den > 0 then ($num / $den) else 0 end) as $ratio |
(if ($rows | map(.strict) | max) == 1 then "BLOCK" else "SHIP" end) as $strict_out |
(if $ratio >= $thr then "BLOCK" else "SHIP" end) as $replay_out |
{
run_id: $run_id,
threshold: $thr,
weights_used: $weights,
strict_any_veto: {
outcome: $strict_out,
description: "BLOCK if any reviewer verdict is not approved"
},
weighted_replay: {
weighted_strictness: ($ratio * 1000 | round / 1000),
outcome: $replay_out,
description: ("BLOCK if weighted strictness >= " + ($thr | tostring))
},
reviewers: $rows
}
end
' "$EVENT_FILE")
if [[ "$json_out" == "true" ]]; then
echo "$result"
else
echo "$result" | jq -r '
if .error then "Error: \(.error)" else
"# What-if replay — run_id=\(.run_id)\n",
"",
"## Outcomes",
"| Model | Result |",
"|-------|--------|",
"| Original (any non-approve → BLOCK) | \(.strict_any_veto.outcome) |",
"| Weighted replay (threshold=\(.threshold)) | \(.weighted_replay.outcome) |",
"",
"## Weighted strictness",
"\(.weighted_replay.weighted_strictness) (0 = all approved, 1 = all blocking)",
"",
"## Per reviewer",
"| Archetype | Verdict | Weight | Strict | w×strict |",
"|-----------|---------|--------|--------|----------|",
(.reviewers[] | "| \(.archetype) | \(.verdict) | \(.weight) | \(.strict) | \(.weighted_contrib) |"),
"",
(if (.weights_used | length) > 0 then
"## Custom weights applied\n" + (.weights_used | to_entries | map("- \(.key): \(.value)") | join("\n")) + "\n"
else empty end)
end
'
fi
}
cmd_compare() {
cmd_timeline
echo ""
cmd_whatif "$@"
}
case "$COMMAND" in
timeline) cmd_timeline ;;
whatif) cmd_whatif "$@" ;;
compare) cmd_compare "$@" ;;
*)
echo "Unknown command: $COMMAND" >&2
exit 1
;;
esac
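The weighted gate described in the header can be exercised without any recorded run; the verdicts, weights, and 0.5 threshold below are illustrative, and the sketch only mirrors the gate formula, not the script's jq code path:

```shell
# Standalone sketch of the weighted replay gate (illustrative inputs).
# strict=1 for any non-approved verdict, 0 otherwise; default weight is 1.0.
verdicts="guardian:needs_changes:2 skeptic:approved:1 sage:approved:0.5"
num=0; den=0
for entry in $verdicts; do
  IFS=: read -r arch verdict w <<< "$entry"
  if [ "$verdict" = "approved" ]; then s=0; else s=1; fi
  num=$(awk -v n="$num" -v w="$w" -v s="$s" 'BEGIN{print n + w*s}')
  den=$(awk -v d="$den" -v w="$w" 'BEGIN{print d + w}')
done
ratio=$(awk -v n="$num" -v d="$den" 'BEGIN{print n/d}')
# 2*1 / (2 + 1 + 0.5) = 0.571429: above the 0.5 threshold, so the gate blocks
echo "weighted strictness: $ratio"
if awk -v r="$ratio" 'BEGIN{exit !(r >= 0.5)}'; then echo "BLOCK"; else echo "SHIP"; fi
```

The same scenario against a real run would be something like `./lib/archeflow-replay.sh whatif <run_id> --weights guardian=2,sage=0.5`.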

197
lib/archeflow-review.sh Executable file

@@ -0,0 +1,197 @@
#!/usr/bin/env bash
# archeflow-review.sh — Get a git diff for Guardian review, with stats.
#
# Standalone diff helper for af-review. No PDCA orchestration — just extracts
# the right diff and reports stats so the Claude Code agent can feed it to
# Guardian (or other reviewers).
#
# Usage:
# archeflow-review.sh # Uncommitted changes (staged + unstaged)
# archeflow-review.sh --branch feat/batch-api # Branch diff vs main
# archeflow-review.sh --commit HEAD~3..HEAD # Commit range
# archeflow-review.sh --base develop # Override base branch (default: main)
# archeflow-review.sh --stat-only # Only print stats, no diff output
#
# Output:
# Prints the diff to stdout. Stats go to stderr so they don't pollute the diff.
# Exit code 0 if diff is non-empty, 1 if empty (nothing to review).
set -euo pipefail
# ---------------------------------------------------------------------------
# Globals
# ---------------------------------------------------------------------------
BASE_BRANCH="main"
MODE="uncommitted" # uncommitted | branch | commit
TARGET=""
STAT_ONLY="false"
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
die() {
echo "[af-review] ERROR: $*" >&2
exit 1
}
info() {
echo "[af-review] $*" >&2
}
# Print diff stats (files changed, insertions, deletions) to stderr.
print_stats() {
local diff_text="$1"
local files_changed lines_added lines_removed total_lines
files_changed=$(echo "$diff_text" | grep -c '^diff --git' || true)
lines_added=$(echo "$diff_text" | grep -c '^+[^+]' || true)
lines_removed=$(echo "$diff_text" | grep -c '^-[^-]' || true)
total_lines=$(echo "$diff_text" | wc -l | tr -d ' ')
info "--- Review Stats ---"
info "Files changed: ${files_changed}"
info "Lines added: +${lines_added}"
info "Lines removed: -${lines_removed}"
info "Diff size: ${total_lines} lines"
if [[ "$total_lines" -gt 500 ]]; then
info "Warning: large diff (>500 lines). Consider reviewing per-file."
fi
}
# Detect the default base branch (main or master).
detect_base_branch() {
if git show-ref --verify --quiet "refs/heads/main" 2>/dev/null; then
echo "main"
elif git show-ref --verify --quiet "refs/heads/master" 2>/dev/null; then
echo "master"
else
echo "main"
fi
}
# ---------------------------------------------------------------------------
# Argument parsing
# ---------------------------------------------------------------------------
parse_args() {
while [[ $# -gt 0 ]]; do
case "$1" in
--branch)
MODE="branch"
TARGET="${2:?Missing branch name after --branch}"
shift 2
;;
--commit)
MODE="commit"
TARGET="${2:?Missing commit range after --commit}"
shift 2
;;
--base)
BASE_BRANCH="${2:?Missing base branch after --base}"
shift 2
;;
--stat-only)
STAT_ONLY="true"
shift
;;
-h|--help)
echo "Usage: $0 [--branch <name>] [--commit <range>] [--base <branch>] [--stat-only]"
echo ""
echo " (no args) Review uncommitted changes (staged + unstaged)"
echo " --branch <name> Review branch diff against base (default: main)"
echo " --commit <range> Review a commit range (e.g. HEAD~3..HEAD)"
echo " --base <branch> Override base branch (default: auto-detect main/master)"
echo " --stat-only Print stats only, no diff output"
exit 0
;;
*)
die "Unknown argument: $1. Use --help for usage."
;;
esac
done
}
# ---------------------------------------------------------------------------
# Diff extraction
# ---------------------------------------------------------------------------
get_diff() {
local diff_text=""
case "$MODE" in
uncommitted)
# Combine staged and unstaged changes against HEAD
diff_text=$(git diff HEAD 2>/dev/null || true)
if [[ -z "$diff_text" ]]; then
# Fallback for repositories with no commits yet (HEAD unresolvable): staged only
diff_text=$(git diff --cached 2>/dev/null || true)
fi
;;
branch)
# Verify target branch exists
if ! git show-ref --verify --quiet "refs/heads/${TARGET}" 2>/dev/null; then
# Maybe it's a remote branch
if ! git rev-parse --verify "${TARGET}" &>/dev/null; then
die "Branch '${TARGET}' not found."
fi
fi
diff_text=$(git diff "${BASE_BRANCH}...${TARGET}" 2>/dev/null || true)
;;
commit)
# Validate commit range resolves
if ! git rev-parse "${TARGET}" &>/dev/null; then
die "Invalid commit range: '${TARGET}'"
fi
diff_text=$(git diff "${TARGET}" 2>/dev/null || true)
;;
esac
echo "$diff_text"
}
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
main() {
# Verify we're in a git repo
if ! git rev-parse --is-inside-work-tree &>/dev/null; then
die "Not inside a git repository."
fi
parse_args "$@"
# Auto-detect base branch if not overridden
if [[ "$BASE_BRANCH" == "main" ]]; then
BASE_BRANCH=$(detect_base_branch)
fi
# Describe what we're reviewing
case "$MODE" in
uncommitted) info "Reviewing: uncommitted changes vs HEAD" ;;
branch) info "Reviewing: branch '${TARGET}' vs '${BASE_BRANCH}'" ;;
commit) info "Reviewing: commit range '${TARGET}'" ;;
esac
local diff_text
diff_text=$(get_diff)
# Validate non-empty
if [[ -z "$diff_text" ]]; then
info "No changes found. Nothing to review."
exit 1
fi
# Print stats to stderr
print_stats "$diff_text"
# Output the diff to stdout (unless stat-only)
if [[ "$STAT_ONLY" != "true" ]]; then
echo "$diff_text"
fi
}
main "$@"
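The stat patterns in `print_stats` can be sanity-checked on a synthetic diff (made-up content): `^+[^+]` and `^-[^-]` deliberately skip the `+++`/`---` file headers while counting changed lines.

```shell
# Synthetic diff exercising the same grep patterns used by print_stats
diff_text=$(printf '%s\n' \
  'diff --git a/f b/f' \
  '--- a/f' \
  '+++ b/f' \
  '@@ -1 +1 @@' \
  '-removed line' \
  '+added line')
files_changed=$(echo "$diff_text" | grep -c '^diff --git' || true)
lines_added=$(echo "$diff_text" | grep -c '^+[^+]' || true)
lines_removed=$(echo "$diff_text" | grep -c '^-[^-]' || true)
echo "files=$files_changed added=$lines_added removed=$lines_removed"
# → files=1 added=1 removed=1
```

Note this approximation misses added or removed blank lines (a bare `+` or `-`), which is acceptable for a size heuristic.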

lib/archeflow-rollback.sh

@@ -1,26 +1,73 @@
#!/usr/bin/env bash
-# archeflow-rollback.sh — Auto-revert a merge that fails post-merge tests.
+# archeflow-rollback.sh — Auto-revert a merge that fails post-merge tests,
+# or roll back to a specific PDCA phase boundary.
#
-# Usage: archeflow-rollback.sh <run_id> [--test-cmd <cmd>]
+# Usage:
+# archeflow-rollback.sh <run_id> [--test-cmd <cmd>] # Post-merge test + revert
+# archeflow-rollback.sh <run_id> --to <phase> # Roll back to phase boundary
#
-# If --test-cmd not provided, reads test_command from .archeflow/config.yaml.
-# Returns 0 if tests pass, 1 if tests fail (merge reverted).
+# --to <phase>: Roll back to the given phase boundary (plan, do, or check).
+# Delegates to archeflow-git.sh rollback and emits a decision event.
+#
+# If --test-cmd not provided (and --to not used), reads test_command from .archeflow/config.yaml.
+# Returns 0 if tests pass (or rollback succeeds), 1 if tests fail (merge reverted).
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-RUN_ID="${1:?Usage: archeflow-rollback.sh <run_id> [--test-cmd <cmd>]}"
+RUN_ID="${1:?Usage: archeflow-rollback.sh <run_id> [--test-cmd <cmd>] [--to <phase>]}"
shift
-# Parse optional --test-cmd
+# Parse options
TEST_CMD=""
+TARGET_PHASE=""
while [[ $# -gt 0 ]]; do
case "$1" in
--test-cmd) TEST_CMD="$2"; shift 2 ;;
+--to) TARGET_PHASE="$2"; shift 2 ;;
*) echo "Unknown option: $1" >&2; exit 2 ;;
esac
done
# Mutual exclusivity check
if [[ -n "$TARGET_PHASE" && -n "$TEST_CMD" ]]; then
echo "ERROR: --to and --test-cmd are mutually exclusive." >&2
exit 2
fi
# --- Phase rollback mode ---
if [[ -n "$TARGET_PHASE" ]]; then
# Validate phase name
case "$TARGET_PHASE" in
plan|do|check) ;;
*)
echo "ERROR: Invalid phase '$TARGET_PHASE'. Must be one of: plan, do, check" >&2
exit 2
;;
esac
echo "Rolling back run $RUN_ID to phase boundary: $TARGET_PHASE"
# Delegate to archeflow-git.sh
if [[ ! -x "$SCRIPT_DIR/archeflow-git.sh" ]]; then
echo "ERROR: archeflow-git.sh not found or not executable" >&2
exit 1
fi
"$SCRIPT_DIR/archeflow-git.sh" rollback "$RUN_ID" --to "$TARGET_PHASE"
# Emit decision event
if [[ -x "$SCRIPT_DIR/archeflow-event.sh" ]]; then
"$SCRIPT_DIR/archeflow-event.sh" "$RUN_ID" decision act "" \
"{\"what\":\"phase_rollback\",\"chosen\":\"rollback_to_${TARGET_PHASE}\",\"rationale\":\"user requested rollback to ${TARGET_PHASE} phase boundary\"}" ""
fi
echo "Rollback to $TARGET_PHASE complete for run $RUN_ID."
exit 0
fi
# --- Post-merge test mode ---
# Read test_command from config if not provided
if [[ -z "$TEST_CMD" ]]; then
if [[ -f ".archeflow/config.yaml" ]]; then

18
paper/Makefile Normal file

@@ -0,0 +1,18 @@
# Build the ArcheFlow paper
# Usage: make (build PDF)
# make clean (remove build artifacts)
MAIN = archeflow
.PHONY: all clean
all: $(MAIN).pdf
$(MAIN).pdf: $(MAIN).tex references.bib
pdflatex $(MAIN)
bibtex $(MAIN)
pdflatex $(MAIN)
pdflatex $(MAIN)
# Note: make runs recipes with /bin/sh, which lacks brace expansion; list suffixes explicitly
clean:
rm -f $(MAIN).aux $(MAIN).bbl $(MAIN).blg $(MAIN).log $(MAIN).out \
$(MAIN).pdf $(MAIN).toc $(MAIN).lof $(MAIN).lot $(MAIN).nav $(MAIN).snm $(MAIN).vrb

880
paper/archeflow.tex Normal file

@@ -0,0 +1,880 @@
\documentclass[11pt,a4paper]{article}
% ---- Packages ----
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{listings}
\usepackage{subcaption}
\usepackage{tikz}
\usetikzlibrary{shapes,arrows.meta,positioning,fit,calc}
\usepackage[numbers]{natbib}
\usepackage{geometry}
\geometry{margin=1in}
% ---- Listings style ----
\lstset{
basicstyle=\ttfamily\small,
breaklines=true,
frame=single,
framesep=3pt,
columns=flexible,
keepspaces=true,
showstringspaces=false,
commentstyle=\color{gray},
keywordstyle=\color{blue!70!black},
}
% ---- Title ----
\title{%
ArcheFlow: Multi-Agent Orchestration with\\
Archetypal Roles and PDCA Quality Cycles%
}
\author{
Christian Nennemann\\
Independent Researcher\\
\texttt{chris@nennemann.de}\\
\texttt{https://github.com/XORwell/archeflow}
}
\date{April 2026}
\begin{document}
\maketitle
% ============================================================
\begin{abstract}
We present \textsc{ArcheFlow}, an open-source orchestration framework for
multi-agent software engineering that assigns \emph{archetypal roles}---derived
from Jungian analytical psychology---to LLM agents and coordinates them through
\emph{Plan--Do--Check--Act} (PDCA) quality cycles. Each of seven archetypes
(Explorer, Creator, Maker, Guardian, Skeptic, Trickster, Sage) carries a defined
cognitive virtue and a quantitatively detected \emph{shadow}---a failure mode
triggered when the virtue becomes excessive. The framework implements a
three-layer corrective action system (archetype shadows, system shadows, policy
boundaries) that detects and mitigates agent dysfunction during autonomous
operation. We describe ArcheFlow's architecture as a zero-dependency plugin for
Claude Code, detail its attention filtering, feedback routing, convergence
detection, and effectiveness scoring mechanisms, and discuss connections to
recent work on persona stability in language models
\citep{lu2026assistant}. ArcheFlow demonstrates that structured persona
assignment with shadow detection can maintain productive agent behavior across
extended autonomous sessions spanning multiple projects and quality domains
(code, prose, research). The system is publicly available under the MIT license.
\end{abstract}
% ============================================================
\section{Introduction}
\label{sec:introduction}
The rise of agentic coding assistants---tools that autonomously write, test,
review, and commit code---has created a new class of software engineering
challenges. While individual LLM agents can produce competent code, the quality
of autonomous output degrades under conditions that are well-known from human
software teams: reviewers who rubber-stamp, architects who over-engineer,
implementers who ignore specifications, and testers who optimize for coverage
metrics rather than real defects.
These failure modes are not merely analogies. \citet{lu2026assistant}
demonstrate that language models occupy a measurable \emph{persona space} and
can drift from their trained Assistant identity during extended conversations,
particularly under emotional or philosophical pressure. Their ``Assistant
Axis''---a dominant directional component in activation space---predicts when
models will exhibit uncharacteristic behavior. If a single model drifts, a
multi-agent system where each agent maintains a distinct persona faces
compounded persona management challenges.
ArcheFlow addresses this problem by drawing on two established frameworks:
\begin{enumerate}
\item \textbf{Jungian archetypal psychology} \citep{jung1968archetypes}, which
provides a taxonomy of cognitive orientations---each with a productive
\emph{virtue} and a destructive \emph{shadow}---that map naturally onto
software engineering roles.
\item \textbf{PDCA quality cycles} \citep{deming1986out}, which provide a
convergence mechanism for iterative refinement with measurable exit criteria.
\end{enumerate}
The contribution of this paper is threefold:
\begin{itemize}
\item We present a \emph{shadow detection framework} that quantitatively
identifies agent dysfunction---not through sentiment analysis or output
classification, but through structural metrics (output length, finding ratios,
scope violations) specific to each archetype's failure mode (Section~\ref{sec:shadows}).
\item We describe \emph{attention filters} and \emph{feedback routing} mechanisms
that constrain what each agent sees and where its output flows, preventing the
information overload and echo chamber effects that plague na\"ive multi-agent
systems (Section~\ref{sec:attention}).
\item We demonstrate that PDCA convergence detection---including oscillation
analysis and divergence scoring---provides principled stopping criteria for
iterative review cycles (Section~\ref{sec:convergence}).
\end{itemize}
ArcheFlow is implemented as a zero-dependency plugin (Bash + Markdown) for
Claude Code\footnote{\url{https://claude.ai/claude-code}}, Anthropic's CLI
coding assistant. It has been used in production across a portfolio of 10--30
repositories spanning code, creative writing, and academic research.
% ============================================================
\section{Related Work}
\label{sec:related}
\subsection{Multi-Agent Software Engineering}
Multi-agent systems for software engineering have proliferated since 2024.
\citet{hong2024metagpt} propose MetaGPT, which assigns human-like roles
(product manager, architect, engineer) to LLM agents and enforces structured
communication through Standardized Operating Procedures (SOPs). ChatDev
\citep{qian2024chatdev} simulates a virtual software company with role-playing
agents communicating through natural language chat. SWE-Agent
\citep{yang2024sweagent} focuses on single-agent benchmark performance on
GitHub issues, demonstrating that tool-augmented agents can resolve real-world
bugs.
These systems share a common limitation: roles are defined by \emph{job
descriptions} rather than \emph{cognitive orientations}. A ``product manager''
agent may behave identically to a ``tech lead'' agent when both receive the same
context, because the role boundary is semantic rather than structural. ArcheFlow
addresses this through attention filters (Section~\ref{sec:attention}) that
physically restrict what each agent perceives, ensuring that role differences
manifest in behavior rather than merely in prompts.
\subsection{Persona Stability in Language Models}
\citet{lu2026assistant} identify the ``Assistant Axis'' in LLM activation
space---a linear direction capturing the degree to which a model operates in its
default helpful mode versus an alternative persona. Their key findings are
directly relevant to multi-agent orchestration:
\begin{enumerate}
\item \textbf{Persona space is low-dimensional}: only 4--19 principal
components explain 70\% of persona variance across 275 character archetypes.
\item \textbf{Drift is predictable}: user message embeddings predict response
position along the Assistant Axis ($R^2 = 0.53$--$0.77$).
\item \textbf{Drift correlates with harm}: models are more liable to produce
harmful outputs when drifted from the Assistant identity ($r = 0.39$--$0.52$).
\end{enumerate}
ArcheFlow's shadow detection (Section~\ref{sec:shadows}) can be understood as an
\emph{application-level} analog to activation capping: where \citet{lu2026assistant}
constrain neural activations to maintain persona stability, ArcheFlow constrains
\emph{behavioral outputs} through quantitative triggers and corrective prompts.
Both approaches recognize that productive personas require active stabilization,
not merely initial assignment.
\subsection{Quality Cycles in Software Engineering}
The Plan--Do--Check--Act (PDCA) cycle, formalized by \citet{deming1986out} and
rooted in Shewhart's statistical process control \citep{shewhart1939statistical},
is the dominant quality improvement framework in manufacturing and has been
applied to software engineering through agile retrospectives and continuous
improvement. To our knowledge, ArcheFlow is the first system to apply PDCA
cycles to multi-agent LLM orchestration with formal convergence detection and
oscillation analysis.
\subsection{Jungian Archetypes in Computing}
While Jungian archetypes have been applied in user experience design
\citep{hartson2012ux}, brand strategy, and game design, their application to
AI agent systems is novel. The closest related work is in computational
creativity, where archetypal narratives have been used to structure story
generation \citep{winston2011strong}. ArcheFlow extends this to software
engineering by mapping archetypal virtues and shadows to measurable engineering
outcomes.
% ============================================================
\section{Architecture}
\label{sec:architecture}
ArcheFlow is a plugin for Claude Code that operates entirely through prompt
engineering, shell scripts, and file-based communication. It has zero runtime
dependencies beyond Bash and a compatible LLM backend.
\begin{figure}[t]
\centering
\begin{tikzpicture}[
node distance=1.2cm and 2cm,
phase/.style={draw, rounded corners, minimum width=2.5cm, minimum height=0.8cm, font=\small\bfseries},
agent/.style={draw, rounded corners, minimum width=2cm, minimum height=0.6cm, font=\small, fill=blue!5},
arrow/.style={-{Stealth[length=3mm]}, thick},
label/.style={font=\scriptsize, text=gray},
]
% PDCA Cycle
\node[phase, fill=yellow!20] (plan) {Plan};
\node[phase, fill=green!20, right=of plan] (do) {Do};
\node[phase, fill=orange!20, right=of do] (check) {Check};
\node[phase, fill=red!15, right=of check] (act) {Act};
% Plan agents
\node[agent, below left=0.8cm and 0.3cm of plan] (explorer) {Explorer};
\node[agent, below right=0.8cm and 0.3cm of plan] (creator) {Creator};
% Do agent
\node[agent, below=0.8cm of do] (maker) {Maker};
% Check agents
\node[agent, below left=0.8cm and -0.2cm of check] (guardian) {Guardian};
\node[agent, below=0.8cm of check] (skeptic) {Skeptic};
\node[agent, below right=0.8cm and -0.2cm of check] (sage) {Sage};
% Arrows
\draw[arrow] (plan) -- (do);
\draw[arrow] (do) -- (check);
\draw[arrow] (check) -- (act);
\draw[arrow, dashed] (act.south) -- ++(0,-0.5) -| node[label, below, pos=0.25] {cycle back} (plan.south);
% Agent connections
\draw[-] (plan.south) -- (explorer.north);
\draw[-] (plan.south) -- (creator.north);
\draw[-] (do.south) -- (maker.north);
\draw[-] (check.south) -- (guardian.north);
\draw[-] (check.south) -- (skeptic.north);
\draw[-] (check.south) -- (sage.north);
\end{tikzpicture}
\caption{ArcheFlow PDCA cycle with archetypal agent assignments. The dashed arrow represents cycle-back when reviewers find issues. A Trickster agent (not shown) joins the Check phase in \texttt{thorough} workflows.}
\label{fig:pdca}
\end{figure}
\subsection{Components}
The system comprises four component types:
\begin{description}
\item[Agent personas] (\texttt{agents/*.md}): Behavioral protocols for each
archetype, defining the agent's cognitive lens, output format, and quality
criteria. Each persona is a Markdown file loaded as a system prompt.
\item[Skills] (\texttt{skills/*/SKILL.md}): Operational instructions that
Claude Code follows to orchestrate the PDCA cycle. The core \texttt{run} skill
(466 lines) is self-contained---it encodes the complete orchestration protocol
including workflow selection, agent spawning, attention filtering, convergence
checking, and exit decisions.
\item[Library scripts] (\texttt{lib/*.sh}): Ten Bash scripts handling
infrastructure concerns: JSONL event logging, git operations (per-phase
commits, branch management, rollback), cross-run memory, progress tracking,
effectiveness scoring, and run replay.
\item[Hooks] (\texttt{hooks/}): Session-start hook that auto-activates
ArcheFlow and injects the domain detection logic.
\end{description}
\subsection{Execution Modes}
ArcheFlow provides three execution modes optimized for different use cases:
\begin{description}
\item[Sprint] (\texttt{/af-sprint}): Queue-driven parallel dispatch. Reads a
priority-ordered task queue, spawns 3--5 agents across different projects
simultaneously, collects results, commits, and starts the next batch. Designed
for throughput over ceremony.
\item[Review] (\texttt{/af-review}): Guardian-led post-implementation review
on existing diffs, branches, or commit ranges. No planning or implementation
orchestration---pure quality analysis.
\item[Run] (\texttt{/af-run}): Full PDCA orchestration for complex tasks
requiring structured exploration, design, implementation, and multi-perspective
review.
\end{description}
\subsection{Domain Adaptation}
ArcheFlow adapts its terminology and quality criteria based on domain detection:
\texttt{code} (diffs, tests, security), \texttt{writing} (voice consistency,
dialect authenticity, narrative structure), and \texttt{research} (source quality,
argument coherence, citation accuracy). Domain is auto-detected from project
contents or specified in configuration.
% ============================================================
\section{The Seven Archetypes}
\label{sec:archetypes}
Each archetype embodies a cognitive orientation with a defined virtue (productive
mode) and shadow (destructive mode). \Cref{tab:archetypes} summarizes the
complete taxonomy.
\begin{table}[t]
\centering
\caption{The seven ArcheFlow archetypes with their PDCA phase assignments,
cognitive virtues, and shadow failure modes.}
\label{tab:archetypes}
\begin{tabular}{@{}lllll@{}}
\toprule
\textbf{Archetype} & \textbf{Phase} & \textbf{Virtue} & \textbf{Shadow} & \textbf{Model Tier} \\
\midrule
Explorer & Plan & Contextual Clarity & Rabbit Hole & Haiku \\
Creator & Plan & Decisive Framing & Over-Architect & Sonnet \\
Maker & Do & Execution Discipline & Rogue & Sonnet \\
Guardian & Check & Threat Intuition & Paranoid & Sonnet \\
Skeptic & Check & Assumption Surfacing & Paralytic & Haiku \\
Trickster & Check & Adversarial Creativity & False Alarm & Haiku \\
Sage & Check & Maintainability Judgment & Bureaucrat & Haiku \\
\bottomrule
\end{tabular}
\end{table}
The archetype--shadow pairing is not metaphorical; it is the core mechanism
for maintaining agent quality. The virtue describes \emph{what} the archetype
contributes; the shadow describes what happens when that contribution becomes
excessive. An Explorer who never stops researching (Rabbit Hole) delays the
entire pipeline. A Guardian who rejects everything (Paranoid) prevents any
code from shipping.
\subsection{Cost-Aware Model Assignment}
Not all archetypes require the same model capability. Analytical tasks
(exploration, assumption checking, code quality review) can be performed by
cheaper models (Haiku), while creative tasks (architecture design,
implementation, security analysis) benefit from more capable models (Sonnet).
This tiered assignment reduces per-run costs by 40--60\% compared to using the
most capable model for all agents, with no observed quality degradation in
analytical roles.
% ============================================================
\section{Shadow Detection and Corrective Action}
\label{sec:shadows}
\subsection{Archetype Shadows}
Shadow detection is \emph{quantitative, not sentiment-based}. Each archetype has
a specific trigger condition derived from structural properties of its output:
\begin{table}[h]
\centering
\caption{Shadow detection triggers. Each trigger is evaluated automatically
after the agent completes.}
\label{tab:shadows}
\begin{tabular}{@{}lll@{}}
\toprule
\textbf{Archetype} & \textbf{Shadow} & \textbf{Trigger} \\
\midrule
Explorer & Rabbit Hole & Output $> 2000$ words without Recommendation section \\
Creator & Over-Architect & $> 2$ new abstractions for a single feature \\
Maker & Rogue & No tests in changeset, or files outside proposal scope \\
Guardian & Paranoid & CRITICAL:WARNING ratio $> 2{:}1$, or zero approvals \\
Skeptic & Paralytic & $> 7$ challenges with $< 50\%$ having alternatives \\
Trickster & False Alarm & Findings in untouched code, or $> 10$ total findings \\
Sage & Bureaucrat & Review length $> 2\times$ code change length \\
\bottomrule
\end{tabular}
\end{table}
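The triggers above are plain structural predicates over agent output. A minimal Python sketch of two of them (the shipped implementation is Bash; the field names \texttt{word\_count}, \texttt{has\_recommendation}, and \texttt{severity} are illustrative assumptions, not the actual schema):

```python
# Hypothetical sketches of two shadow triggers from the table above.
# Field names are assumptions, not the actual ArcheFlow schema.

def is_rabbit_hole(output: dict) -> bool:
    """Explorer shadow: > 2000 words without a Recommendation section."""
    return output["word_count"] > 2000 and not output["has_recommendation"]

def is_paranoid(findings: list) -> bool:
    """Guardian shadow: CRITICAL:WARNING ratio > 2:1, or zero approvals."""
    critical = sum(1 for f in findings if f["severity"] == "CRITICAL")
    warning = sum(1 for f in findings if f["severity"] == "WARNING")
    approvals = sum(1 for f in findings if f["severity"] == "APPROVE")
    return critical > 2 * warning or approvals == 0
```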
The escalation protocol follows a three-strike pattern:
\begin{enumerate}
\item \textbf{First detection}: Inject a correction prompt that names the
shadow and redirects the agent toward its virtue.
\item \textbf{Second detection} (same shadow, same run): Replace the agent
with a fresh instance.
\item \textbf{Third detection}: Escalate to the user for manual intervention.
\end{enumerate}
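The three-strike protocol reduces to a small dispatch on the per-run detection count for a given shadow. A sketch (the action names are hypothetical labels, not ArcheFlow identifiers):

```python
# Sketch of the three-strike escalation for repeated shadow detections.
# Action names are illustrative labels, not actual event names.

def escalation_action(detection_count: int) -> str:
    """Map the nth detection of the same shadow in one run to an action."""
    if detection_count == 1:
        return "inject-correction"   # name the shadow, redirect to virtue
    if detection_count == 2:
        return "replace-agent"       # fresh instance of the same archetype
    return "escalate-to-user"        # third strike: manual intervention
```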
\subsection{System Shadows}
Beyond individual archetype dysfunction, ArcheFlow monitors for
\emph{system-level} failure modes:
\begin{description}
\item[Echo Chamber]: Multiple reviewers produce identical findings, suggesting
they are confirming each other rather than applying independent judgment.
Detected when $> 60\%$ of findings across reviewers share the same
file-and-category tuple.
\item[Tunnel Vision]: All findings cluster in a single file or module while
the changeset spans multiple. Detected when $> 80\%$ of findings target
$< 20\%$ of changed files.
\item[Scope Creep]: Maker modifies files not mentioned in the Creator's
proposal. Detected by comparing \texttt{do-maker-files.txt} against the
proposal's file list.
\end{description}
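The two statistical system-shadow checks can be sketched directly from the stated thresholds. Assuming each finding carries a \texttt{file} and \texttt{category} field (an illustrative schema, not the actual artifact format):

```python
# Sketches of the Echo Chamber and Tunnel Vision checks.
# The finding schema (file, category keys) is an assumption.
from collections import Counter

def echo_chamber(findings: list) -> bool:
    """> 60% of findings share the same (file, category) tuple."""
    if not findings:
        return False
    counts = Counter((f["file"], f["category"]) for f in findings)
    return counts.most_common(1)[0][1] / len(findings) > 0.60

def tunnel_vision(findings: list, changed_files: list) -> bool:
    """> 80% of findings target < 20% of changed files."""
    if not findings or not changed_files:
        return False
    by_file = Counter(f["file"] for f in findings)
    covered, used = 0, 0
    # Greedily cover findings with the most-targeted files.
    for _, n in by_file.most_common():
        covered += n
        used += 1
        if covered / len(findings) > 0.80:
            return used / len(changed_files) < 0.20
    return False
```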
\subsection{Policy Boundaries and the Wiggum Break}
The third layer enforces operational limits through budget gates, cycle
limits, and checkpoint policies. When limits are exceeded, the system
triggers a \emph{Wiggum Break}\footnote{Named after Chief Wiggum from
\emph{The Simpsons}---a nod to both ``policy enforcement'' and the
Ralph Loop plugin for Claude Code.}---a circuit breaker that halts
execution, saves state, and reports to the user.
Wiggum Breaks are classified as \emph{hard} (halt immediately) or
\emph{soft} (finish current task, then halt):
\begin{description}
\item[Hard breaks]: 3 consecutive agent failures, 3 consecutive shadow
detections in one run, test suite broken after merge, 2+ oscillating
findings.
\item[Soft breaks]: convergence score $< 0.5$ for 2 consecutive cycles,
findings unchanged between cycles, budget $> 95\%$ spent.
\end{description}
Each Wiggum Break emits a \texttt{wiggum.break} event capturing the
trigger, run state, and unresolved findings for post-run analysis.
\subsection{Connection to the Assistant Axis}
The shadow detection framework addresses the same fundamental problem identified
by \citet{lu2026assistant}: models drift from productive personas during
extended operation. Where their work identifies drift in activation space and
proposes activation capping as a mitigation, ArcheFlow operates at the
\emph{behavioral} level---detecting drift through output structure rather than
internal representations, and correcting through prompt injection rather than
activation manipulation.
This application-level approach has a practical advantage: it requires no access
to model internals and works with any LLM backend, including API-only models
where activation-level interventions are impossible. The tradeoff is that
behavioral detection is necessarily coarser than activation-level measurement
and can only detect drift after it manifests in output, not before.
% ============================================================
\section{Attention Filters and Information Flow}
\label{sec:attention}
A key design principle is that each agent receives \emph{only the information
relevant to its role}. This is implemented through \emph{attention filters}---rules
governing which artifacts from prior phases are injected into each agent's
context.
\begin{table}[h]
\centering
\caption{Attention filter matrix. Each agent receives only the artifacts marked
with \checkmark.}
\label{tab:attention}
\begin{tabular}{@{}lccccc@{}}
\toprule
\textbf{Agent} & \textbf{Task} & \textbf{Explorer} & \textbf{Creator} & \textbf{Diff} & \textbf{Reviews} \\
\midrule
Explorer & \checkmark & & & & \\
Creator & \checkmark & \checkmark & & & \\
Maker & \checkmark & & \checkmark & & \\
Guardian & & & (risks) & \checkmark & \\
Skeptic & & & \checkmark & & \\
Sage & & & \checkmark & \checkmark & \\
Trickster & & & & \checkmark & \\
\bottomrule
\end{tabular}
\end{table}
The rationale for attention filtering is twofold:
\begin{enumerate}
\item \textbf{Independence}: Reviewers who see each other's findings tend to
converge on a shared narrative rather than applying independent judgment. By
isolating reviewer inputs, ArcheFlow ensures that each reviewer contributes a
genuinely distinct perspective.
\item \textbf{Focus}: An agent given everything tends to address everything,
producing diluted analysis. The Trickster, for example, receives \emph{only}
the diff---no design rationale, no risk analysis---forcing it to evaluate the
code purely on its own terms.
\end{enumerate}
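The filter matrix in \Cref{tab:attention} amounts to a per-agent allow-list applied when building each agent's context. A sketch (the artifact keys are illustrative; \texttt{creator:risks} stands in for the Guardian's risks-only view of the proposal):

```python
# The attention filter matrix from the table above as an allow-list.
# Artifact key names are assumptions, not actual ArcheFlow paths.

ATTENTION = {
    "explorer":  {"task"},
    "creator":   {"task", "explorer"},
    "maker":     {"task", "creator"},
    "guardian":  {"creator:risks", "diff"},   # risks section only
    "skeptic":   {"creator"},
    "sage":      {"creator", "diff"},
    "trickster": {"diff"},
}

def build_context(agent: str, artifacts: dict) -> dict:
    """Inject only the artifacts this agent is allowed to see."""
    allowed = ATTENTION[agent]
    return {k: v for k, v in artifacts.items() if k in allowed}
```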
In PDCA cycle 2+, the feedback from the Act phase is routed selectively:
Creator-routed issues go to the Creator, Maker-routed issues go to the Maker.
Neither sees the other's feedback, preventing defensive responses to criticism
that was directed elsewhere.
% ============================================================
\section{Feedback Routing}
\label{sec:routing}
When the Check phase identifies issues, the Act phase must decide where to route
each finding for the next cycle. ArcheFlow uses a deterministic routing table
based on the source archetype and finding category:
\begin{table}[h]
\centering
\caption{Feedback routing table. Findings are routed to the agent best equipped
to address them, preventing cross-contamination.}
\label{tab:routing}
\begin{tabular}{@{}llll@{}}
\toprule
\textbf{Source} & \textbf{Category} & \textbf{Routes To} & \textbf{Rationale} \\
\midrule
Guardian & security, breaking-change & Creator & Design must change \\
Guardian & reliability, dependency & Creator & Architectural decision \\
Skeptic & design, scalability & Creator & Assumptions need revision \\
Sage & quality, consistency & Maker & Implementation refinement \\
Sage & testing & Maker & Test gap, not design flaw \\
Trickster & reliability (design flaw) & Creator & Needs redesign \\
Trickster & reliability (test gap) & Maker & Needs more tests \\
\bottomrule
\end{tabular}
\end{table}
The disambiguation principle: if fixing the issue requires changing the
\emph{approach}, route to Creator. If it requires changing the \emph{code within
the existing approach}, route to Maker. Findings that persist across two
consecutive cycles are escalated to the user rather than cycled indefinitely.
% ============================================================
\section{Convergence Detection}
\label{sec:convergence}
\subsection{Convergence Score}
In PDCA cycle 2+, ArcheFlow compares current findings against the previous cycle
and classifies each as \textsc{New}, \textsc{Resolved}, \textsc{Persistent}, or
\textsc{Regressed}. The convergence score is:
\begin{equation}
C = \frac{|\textsc{Resolved}|}{|\textsc{Resolved}| + |\textsc{New}| + |\textsc{Regressed}|}
\label{eq:convergence}
\end{equation}
\begin{table}[h]
\centering
\caption{Convergence score interpretation and corresponding actions.}
\label{tab:convergence}
\begin{tabular}{@{}lll@{}}
\toprule
\textbf{Score Range} & \textbf{Status} & \textbf{Action} \\
\midrule
$C > 0.8$ & Converging & Continue if cycles remain \\
$0.5 \leq C \leq 0.8$ & Stalling & Continue with caution \\
$C < 0.5$ & Diverging & Stop if 2 consecutive diverging cycles \\
$C = 0$ & Stuck & Stop immediately \\
\bottomrule
\end{tabular}
\end{table}
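\Cref{eq:convergence} and the interpretation table combine into a short scoring routine. A sketch (the empty-denominator convention of treating a cycle with no open findings as fully converged is an assumption):

```python
# Convergence score per Eq. (convergence) and its interpretation.
# Treating an empty denominator as converged is an assumption.

def convergence_score(resolved: int, new: int, regressed: int) -> float:
    denom = resolved + new + regressed
    return resolved / denom if denom else 1.0

def status(c: float) -> str:
    if c == 0:
        return "stuck"        # stop immediately
    if c < 0.5:
        return "diverging"    # stop after 2 consecutive diverging cycles
    if c <= 0.8:
        return "stalling"     # continue with caution
    return "converging"
```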
\subsection{Oscillation Detection}
A finding is \emph{oscillating} if it was present in cycle $n-2$, absent in
cycle $n-1$, and present again in cycle $n$. Two or more oscillating findings
trigger an immediate stop with escalation to the user, as oscillation indicates
a fundamental tension in the review criteria that automated cycles cannot
resolve.
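The oscillation check is a set operation over the last three cycles, assuming each finding reduces to a hashable identity such as a (file, category) pair:

```python
# Sketch of oscillation detection over per-cycle finding-identity sets.
# Finding identity as a hashable key is an assumption.

def oscillating(history: list) -> set:
    """Findings present in cycle n-2, absent in n-1, present in n."""
    if len(history) < 3:
        return set()
    prev2, prev1, curr = history[-3], history[-2], history[-1]
    return (prev2 - prev1) & curr
```

Two or more elements in the returned set trigger the immediate stop described above.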
\subsection{Adaptive Workflow Escalation}
Convergence detection interacts with workflow selection through Rule A1: if the
workflow is \texttt{fast} and the Guardian reports $\geq 2$ CRITICAL findings,
the next cycle escalates to \texttt{standard} (adding the Skeptic and Sage
reviewers). Once escalated, the workflow remains at the higher tier for the
duration of the run.
Conversely, Rule A2 provides a \emph{fast-path}: if Guardian finds zero CRITICAL
and zero WARNING findings, remaining reviewers are skipped entirely, and the
system proceeds directly to Act. This optimization reduces the cost of runs
where the Maker's implementation is clean.
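Rules A1 and A2 can be sketched as a single decision applied after the Guardian completes (the return shape is illustrative):

```python
# Sketch of adaptive workflow rules A1 (escalate) and A2 (fast-path).
# The returned dict shape is an assumption; rule logic follows the text.

def after_guardian(workflow: str, critical: int, warning: int) -> dict:
    # Rule A2: a clean Guardian report skips the remaining reviewers.
    if critical == 0 and warning == 0:
        return {"workflow": workflow, "skip_remaining_reviewers": True}
    # Rule A1: a fast workflow with >= 2 CRITICAL findings escalates to
    # standard and stays escalated for the rest of the run.
    if workflow == "fast" and critical >= 2:
        return {"workflow": "standard", "skip_remaining_reviewers": False}
    return {"workflow": workflow, "skip_remaining_reviewers": False}
```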
% ============================================================
\section{Evidence Validation}
\label{sec:evidence}
Reviewer findings are subject to evidence validation before they influence
routing decisions. A CRITICAL or WARNING finding is downgraded to INFO if:
\begin{itemize}
\item It uses \emph{banned hedging phrases} without supporting evidence:
``might be'', ``could potentially'', ``appears to'', ``seems like'', ``may not''.
\item It contains \emph{no evidence}: no command output, code citation, line
reference, or reproduction steps.
\end{itemize}
This mechanism addresses a well-known failure mode of LLM reviewers: generating
plausible-sounding but unsupported concerns. By requiring evidence for
high-severity findings, ArcheFlow forces reviewers to ground their analysis in
the actual changeset rather than speculation.
Downgrades are tracked in the event log but do \emph{not} modify the original
artifact files, preserving the complete reviewer output for post-run analysis.
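The downgrade rule can be sketched as follows. The hedge list comes from the text; the \texttt{evidence} field and the reason labels are assumptions made for illustration:

```python
# Sketch of evidence validation. The evidence field and downgrade_reason
# labels are assumptions; the hedge list follows the text.

HEDGES = ("might be", "could potentially", "appears to",
          "seems like", "may not")

def validate(finding: dict) -> dict:
    """Downgrade unsupported CRITICAL/WARNING findings to INFO."""
    if finding["severity"] not in ("CRITICAL", "WARNING"):
        return finding
    if finding.get("evidence"):   # command output, code cite, line ref, repro
        return finding
    hedged = any(h in finding["text"].lower() for h in HEDGES)
    reason = "hedged-no-evidence" if hedged else "no-evidence"
    return {**finding, "severity": "INFO", "downgraded": True,
            "downgrade_reason": reason}
```

Returning a new dict rather than mutating the input mirrors the stated design: downgrades affect routing without modifying the original artifact.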
% ============================================================
\section{Effectiveness Scoring}
\label{sec:effectiveness}
After each completed run, ArcheFlow scores review archetypes across five
dimensions:
\begin{table}[h]
\centering
\caption{Effectiveness scoring dimensions and their weights.}
\label{tab:effectiveness}
\begin{tabular}{@{}lp{7cm}r@{}}
\toprule
\textbf{Dimension} & \textbf{Description} & \textbf{Weight} \\
\midrule
Signal-to-noise & Ratio of useful findings to total findings & 0.30 \\
Fix rate & Fraction of findings that led to applied fixes & 0.25 \\
Cost efficiency & Useful findings per dollar of model inference cost & 0.20 \\
Accuracy & Fraction not contradicted by other reviewers & 0.15 \\
Cycle impact & Whether findings contributed to cycle exit decision & 0.10 \\
\bottomrule
\end{tabular}
\end{table}
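The composite score is a weighted sum over the five dimensions, each assumed normalized to $[0,1]$ (the dimension key names below are illustrative):

```python
# Weighted effectiveness score over the five dimensions in the table.
# Dimension key names are assumptions; weights follow the table.

WEIGHTS = {"signal_to_noise": 0.30, "fix_rate": 0.25,
           "cost_efficiency": 0.20, "accuracy": 0.15,
           "cycle_impact": 0.10}

def effectiveness(scores: dict) -> float:
    """Combine per-dimension scores in [0, 1] into one weighted score."""
    return round(sum(WEIGHTS[d] * scores[d] for d in WEIGHTS), 4)
```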
Scores accumulate in a cross-run memory file
(\texttt{.archeflow/memory/effectiveness.jsonl}). After 10+ completed runs,
the system recommends model tier changes (e.g., promoting a Haiku-tier reviewer
to Sonnet if its signal-to-noise is consistently high) and, in extreme cases,
archetype removal for persistently low-scoring reviewers.
% ============================================================
\section{Cross-Run Memory}
\label{sec:memory}
ArcheFlow maintains a lesson-learning system that persists across runs. When
recurring findings are detected---the same category of issue appearing in
multiple runs---the system stores a lesson and injects it into future agents
as additional context.
Lessons decay over time: each lesson has a relevance counter that increments on
reuse and decrements on irrelevance. Lessons that fall below a threshold are
archived rather than injected, preventing the accumulation of stale guidance.
The memory system also performs regression detection: if a previously resolved
issue reappears, it is flagged as a regression with higher priority than a
fresh finding.
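The relevance-counter bookkeeping can be sketched as follows; the threshold value and field names are illustrative assumptions:

```python
# Sketch of lesson decay: reuse increments relevance, irrelevance
# decrements it; lessons below a threshold are archived, not injected.
# The threshold value and field names are assumptions.

ARCHIVE_THRESHOLD = 0

def update_lesson(lesson: dict, was_relevant: bool) -> dict:
    updated = dict(lesson)
    updated["relevance"] += 1 if was_relevant else -1
    updated["archived"] = updated["relevance"] < ARCHIVE_THRESHOLD
    return updated

def injectable(lessons: list) -> list:
    """Only non-archived lessons are injected into future agents."""
    return [l for l in lessons if not l.get("archived")]
```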
% ============================================================
\section{Implementation}
\label{sec:implementation}
ArcheFlow is implemented in approximately 6,700 lines across three layers:
\begin{itemize}
\item \textbf{Skills} (19 Markdown files, $\sim$2,500 lines): Operational
instructions for Claude Code, written as imperative protocols. The core
\texttt{run} skill encodes the complete PDCA orchestration in 466 lines.
\item \textbf{Agent personas} (7 Markdown files, $\sim$700 lines): Behavioral
protocols defining each archetype's cognitive lens, output format, and
self-review checklist.
\item \textbf{Library scripts} (10 Bash scripts, $\sim$3,500 lines): Event
logging, git operations, memory management, progress tracking, effectiveness
scoring, and run replay.
\end{itemize}
The system uses no database, no API server, and no runtime dependencies beyond
Bash 4+ and a Claude Code installation. All state is stored in JSONL event logs
and Markdown artifact files. This zero-dependency architecture was a deliberate
design choice: orchestration infrastructure that itself requires complex setup
and maintenance undermines the autonomy it is supposed to enable.
\subsection{Git Integration}
ArcheFlow creates per-phase commits, enabling fine-grained rollback. The Maker
operates in a git worktree---an isolated working copy---so its changes do not
affect the main branch until explicitly merged. If post-merge tests fail, the
system auto-reverts the merge and cycles back with ``integration test failure''
feedback.
\subsection{Run Replay}
All orchestration decisions are logged as \texttt{decision.point} events,
enabling post-hoc analysis. The replay system provides:
\begin{itemize}
\item \textbf{Timeline view}: chronological sequence of all decisions with
confidence scores.
\item \textbf{Weighted what-if}: re-evaluation of the ship/block outcome
using different reviewer weights, answering questions like ``would the outcome
have changed if we weighted Guardian $2\times$ and Sage $0.5\times$?''
\item \textbf{Cross-run comparison}: side-by-side analysis of decision
patterns across runs.
\end{itemize}
% ============================================================
\section{Multi-Domain Application}
\label{sec:domains}
ArcheFlow's archetype system extends beyond code. The framework has been
deployed across three domains:
\subsection{Software Engineering}
The primary domain. Archetypes map to standard engineering roles: Explorer
performs codebase research, Creator designs architecture, Maker writes code,
and the Check-phase archetypes review for security (Guardian), design flaws
(Skeptic), edge cases (Trickster), and overall quality (Sage).
\subsection{Creative Writing}
In writing mode, the same archetype structure applies with adapted quality
criteria. Custom archetypes (story-explorer, story-sage) replace or augment
the defaults. The framework integrates with Colette, a voice profiling system
that maintains consistent authorial voice across chapters. Quality gates check
for voice consistency, dialect authenticity, and narrative structure rather
than test coverage and security.
\subsection{Academic Research}
In research mode, quality criteria shift to source quality, argument coherence,
citation accuracy, and methodological rigor. The Guardian reviews for logical
fallacies and unsupported claims rather than security vulnerabilities.
% ============================================================
\section{Discussion}
\label{sec:discussion}
\subsection{Archetypes vs. Role Descriptions}
The key distinction between ArcheFlow's approach and prior multi-agent systems
is the \emph{shadow} mechanism. A role description tells an agent what to do;
an archetype tells an agent what to do \emph{and what doing too much of it
looks like}. This bidirectional specification creates a bounded operating
range for each agent, preventing the unbounded optimization that leads to
dysfunction.
The connection to \citet{lu2026assistant}'s persona axis is instructive.
They show that model personas exist on a continuum, with the Assistant identity
at one extreme and theatrical/mystical identities at the other. ArcheFlow's
archetypes deliberately position agents \emph{away} from the default Assistant
toward specific cognitive orientations---but the shadow mechanism prevents them
from drifting too far, maintaining a productive operating range analogous to
what \citeauthor{lu2026assistant} achieve through activation capping.
\subsection{Wiggum Breaks as Human-in-the-Loop Boundaries}
A central question in autonomous agent systems is: \emph{when should the
system stop acting and ask a human?} Most frameworks treat this as an
implementation detail---a timeout, a retry limit, an exception handler.
ArcheFlow treats it as a first-class architectural concept through the
\emph{Wiggum Break}.
The Wiggum Break defines the \textbf{formal boundary between autonomous and
human-supervised operation}. It is not a failure mode; it is the system's
\emph{designed} response to situations where autonomous resolution is
provably unproductive:
\begin{itemize}
\item \textbf{Oscillation} (finding present $\to$ absent $\to$ present)
indicates a genuine tension in the review criteria that no amount of
cycling will resolve---only human judgment about which criterion takes
priority.
\item \textbf{Divergence} (convergence score $< 0.5$ for two consecutive
cycles) indicates that the implementation is getting worse with each
iteration---the agents lack the context or capability to solve the
problem, and continuing wastes resources.
\item \textbf{Repeated shadow detection} (same dysfunction three times)
indicates that the corrective action framework has exhausted its
options---the task structure is incompatible with the assigned archetype,
and a human must re-scope.
\end{itemize}
This framing inverts the typical HITL paradigm. Rather than asking
``how much autonomy should the system have?'' and pre-defining approval
gates, ArcheFlow asks ``under what conditions is autonomy
\emph{provably unproductive}?'' and derives the HITL boundary from
convergence theory. The system runs autonomously by default and escalates
only when it can demonstrate---through quantitative metrics, not
heuristics---that continued autonomous operation will not improve the
outcome.
This approach has three advantages over pre-defined approval gates:
\begin{enumerate}
\item \textbf{Adaptive autonomy}: Simple tasks never trigger a Wiggum
Break; complex tasks trigger one quickly. The HITL boundary adapts to
task difficulty without manual configuration.
\item \textbf{Auditable escalation}: Every Wiggum Break emits a
\texttt{wiggum.break} event with the trigger condition, run state, and
unresolved findings. The human receives not just a request for help,
but a structured summary of \emph{why} autonomous resolution failed
and what specifically needs their judgment.
\item \textbf{Minimal interruption}: Pre-defined gates (``approve every
PR'', ``review every design'') interrupt the human on tasks the system
could have handled autonomously. Convergence-derived breaks interrupt
only when the system has evidence that it cannot proceed productively.
\end{enumerate}
The Wiggum Break thus operationalizes a principle from resilience
engineering: the system should be \emph{autonomy-seeking} (preferring to
resolve issues itself) but \emph{escalation-ready} (able to produce a
useful handoff when self-resolution fails). The quality of the handoff---not
just the fact of escalation---is what makes HITL effective.
\subsection{Limitations}
\begin{enumerate}
\item \textbf{No activation-level control}: ArcheFlow operates purely at the
prompt level. It cannot detect persona drift before it manifests in output,
unlike activation-level approaches \citep{lu2026assistant}.
\item \textbf{Single LLM backend}: The current implementation targets Claude
Code. While the architectural principles are model-agnostic, the skill and
hook system is specific to Claude Code's plugin API.
\item \textbf{Evaluation methodology}: We have not conducted controlled
experiments comparing ArcheFlow's output quality against baselines (single-agent,
role-based multi-agent without shadows, PDCA without archetypes). The system
has been evaluated through production use across real projects, which
demonstrates practical utility but not causal attribution.
\item \textbf{Shadow trigger thresholds}: The quantitative thresholds
(e.g., 2000 words for Rabbit Hole, ratio $> 2{:}1$ for Paranoid) were
determined empirically through iterative use and may not generalize across
all codebases and domains.
\end{enumerate}
\subsection{Future Work}
\begin{enumerate}
\item \textbf{Activation-level integration}: Combining behavioral shadow
detection with the Assistant Axis measurement from \citet{lu2026assistant}
could provide earlier and more reliable drift detection, particularly for
open-weight models where activations are accessible.
\item \textbf{Controlled evaluation}: A systematic comparison across standard
benchmarks (SWE-bench, HumanEval) would establish whether the archetype +
PDCA approach provides measurable quality improvements over simpler
orchestration strategies.
\item \textbf{Archetype discovery}: Rather than hand-designing archetypes,
the persona space analysis from \citet{lu2026assistant} could be used to
identify \emph{natural} cognitive orientations that models adopt, potentially
revealing useful archetypes that human intuition would not suggest.
\item \textbf{Cross-model persona stability}: Investigating whether shadow
triggers calibrated for one model family transfer to others, or whether
per-model calibration is necessary.
\end{enumerate}
% ============================================================
\section{Conclusion}
\label{sec:conclusion}
ArcheFlow demonstrates that multi-agent LLM orchestration benefits from
structured persona management---not just telling agents \emph{what to do},
but actively monitoring and correcting \emph{how they do it}. The combination
of Jungian archetypes (providing a principled taxonomy of cognitive virtues and
their failure modes) with PDCA quality cycles (providing convergence guarantees
and principled stopping criteria) produces an orchestration framework that
maintains productive agent behavior across extended autonomous sessions.
The shadow detection mechanism---quantitative triggers for archetype-specific
dysfunction---addresses the same persona stability challenge identified by
\citet{lu2026assistant} at the application level, requiring no access to model
internals and working with any LLM backend. While coarser than activation-level
approaches, behavioral shadow detection is practical, interpretable, and
immediately deployable.
ArcheFlow is open-source under the MIT license and available at
\url{https://github.com/XORwell/archeflow}.
% ============================================================
\section*{Acknowledgments}
The author thanks the Claude Code team at Anthropic for building the plugin
infrastructure that made ArcheFlow possible, and the authors of
\citet{lu2026assistant} for the Assistant Axis framework that informed the
theoretical grounding of shadow detection.
% ============================================================
\bibliographystyle{plainnat}
\bibliography{references}
\end{document}

% ---- paper/references.bib ----
@article{lu2026assistant,
title={The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models},
author={Lu, Christina and Gallagher, Jack and Michala, Jonathan and Fish, Kyle and Lindsey, Jack},
journal={arXiv preprint arXiv:2601.10387},
year={2026},
url={https://arxiv.org/abs/2601.10387}
}
@book{jung1968archetypes,
title={The Archetypes and the Collective Unconscious},
author={Jung, Carl Gustav},
year={1968},
publisher={Princeton University Press},
edition={2nd},
series={Collected Works of C.G. Jung},
volume={9}
}
@book{deming1986out,
title={Out of the Crisis},
author={Deming, W. Edwards},
year={1986},
publisher={MIT Press},
address={Cambridge, MA}
}
@book{shewhart1939statistical,
title={Statistical Method from the Viewpoint of Quality Control},
author={Shewhart, Walter Andrew},
year={1939},
publisher={Graduate School of the Department of Agriculture},
address={Washington, DC}
}
@article{hong2024metagpt,
title={MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework},
author={Hong, Sirui and Zhuge, Mingchen and Chen, Jonathan and Zheng, Xiawu and Cheng, Yuheng and Zhang, Ceyao and Wang, Jinlin and Wang, Zili and Yau, Steven Ka Shing and Lin, Zijuan and Zhou, Liyang and Ran, Chenyu and Xiao, Lingfeng and Wu, Chenglin and Schmidhuber, J{\"u}rgen},
journal={arXiv preprint arXiv:2308.00352},
year={2024},
url={https://arxiv.org/abs/2308.00352}
}
@article{qian2024chatdev,
title={ChatDev: Communicative Agents for Software Development},
author={Qian, Chen and Liu, Wei and Liu, Hongzhang and Chen, Nuo and Dang, Yufan and Li, Jiahao and Yang, Cheng and Chen, Weize and Su, Yusheng and Cong, Xin and Xu, Juyuan and Li, Dahai and Liu, Zhiyuan and Sun, Maosong},
journal={arXiv preprint arXiv:2307.07924},
year={2024},
url={https://arxiv.org/abs/2307.07924}
}
@article{yang2024sweagent,
title={SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering},
author={Yang, John and Jimenez, Carlos E and Wettig, Alexander and Liber, Kilian and Narasimhan, Karthik and Press, Ofir},
journal={arXiv preprint arXiv:2405.15793},
year={2024},
url={https://arxiv.org/abs/2405.15793}
}
@article{chen2025persona,
title={Persona Vectors: Monitoring and Controlling Character Traits via Activation Directions},
author={Chen, Yiwei and others},
journal={arXiv preprint arXiv:2507.21509},
year={2025},
url={https://arxiv.org/abs/2507.21509}
}
@article{bai2022constitutional,
title={Constitutional AI: Harmlessness from AI Feedback},
author={Bai, Yuntao and Kadavath, Saurav and Kundu, Sandipan and Askell, Amanda and Kernion, Jackson and Jones, Andy and Chen, Anna and Goldie, Anna and Mirhoseini, Azalia and McKinnon, Cameron and others},
journal={arXiv preprint arXiv:2212.08073},
year={2022},
url={https://arxiv.org/abs/2212.08073}
}
@book{hartson2012ux,
title={The UX Book: Process and Guidelines for Ensuring a Quality User Experience},
author={Hartson, Rex and Pyla, Pardha S.},
year={2012},
publisher={Morgan Kaufmann},
address={Burlington, MA}
}
@inproceedings{winston2011strong,
title={The Strong Story Hypothesis and the Directed Perception Hypothesis},
author={Winston, Patrick Henry},
booktitle={AAAI Fall Symposium: Advances in Cognitive Systems},
year={2011},
pages={345--352}
}

journal={arXiv preprint arXiv:2601.10387},
year={2026},
url={https://arxiv.org/abs/2601.10387}
}
% ---- PM/OM Foundations ----
@book{deming1986out,
title={Out of the Crisis},
author={Deming, W. Edwards},
year={1986},
publisher={MIT Press},
address={Cambridge, MA}
}
@book{shewhart1939statistical,
title={Statistical Method from the Viewpoint of Quality Control},
author={Shewhart, Walter Andrew},
year={1939},
publisher={Graduate School of the Department of Agriculture},
address={Washington, DC}
}
@book{goldratt1984goal,
title={The Goal: A Process of Ongoing Improvement},
author={Goldratt, Eliyahu M. and Cox, Jeff},
year={1984},
publisher={North River Press},
address={Great Barrington, MA}
}
@book{ohno1988toyota,
title={Toyota Production System: Beyond Large-Scale Production},
author={Ohno, Taiichi},
year={1988},
publisher={Productivity Press},
address={Portland, OR}
}
@book{womack1996lean,
title={Lean Thinking: Banish Waste and Create Wealth in Your Corporation},
author={Womack, James P. and Jones, Daniel T.},
year={1996},
publisher={Simon \& Schuster},
address={New York}
}
@article{cooper1990stagegate,
title={Stage-Gate Systems: A New Tool for Managing New Products},
author={Cooper, Robert G.},
journal={Business Horizons},
volume={33},
number={3},
pages={44--54},
year={1990},
publisher={Elsevier}
}
@article{snowden2007cynefin,
title={A Leader's Framework for Decision Making},
author={Snowden, David J. and Boone, Mary E.},
journal={Harvard Business Review},
volume={85},
number={11},
pages={68--76},
year={2007}
}
@book{altshuller1999innovation,
title={The Innovation Algorithm: TRIZ, Systematic Innovation and Technical Creativity},
author={Altshuller, Genrich},
year={1999},
publisher={Technical Innovation Center},
address={Worcester, MA}
}
@unpublished{boyd1976destruction,
title={Destruction and Creation},
author={Boyd, John R.},
year={1976},
note={Unpublished manuscript, widely circulated}
}
@book{schwaber2020scrum,
title={The Scrum Guide},
author={Schwaber, Ken and Sutherland, Jeff},
year={2020},
publisher={Scrum.org},
note={Available at \url{https://scrumguides.org}}
}
@techreport{mil1949fmea,
title={MIL-P-1629: Procedures for Performing a Failure Mode, Effects and Criticality Analysis},
institution={United States Department of Defense},
year={1949},
note={Revised as MIL-STD-1629A, 1980}
}

% ---- paper/taxonomy.tex ----
\documentclass[11pt,a4paper]{article}
% ---- Packages ----
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{booktabs}
\usepackage{hyperref}
\usepackage{xcolor}
\usepackage{listings}
\usepackage{subcaption}
\usepackage{tikz}
\usetikzlibrary{shapes,arrows.meta,positioning,fit,calc,matrix}
\usepackage[numbers]{natbib}
\usepackage{geometry}
\usepackage{enumitem}
\geometry{margin=1in}
% ---- Colors ----
\definecolor{highfit}{HTML}{2E7D32}
\definecolor{medfit}{HTML}{F57F17}
\definecolor{lowfit}{HTML}{C62828}
\definecolor{neutral}{HTML}{546E7A}
% ---- Title ----
\title{%
From Factory Floor to Token Stream:\\
A Taxonomy of Operations Management Methods\\
for LLM Agent Orchestration%
}
\author{
Christian Nennemann\\
Independent Researcher\\
\texttt{chris@nennemann.de}
}
\date{April 2026}
\begin{document}
\maketitle
% ============================================================
\begin{abstract}
Multi-agent systems built on large language models (LLMs) increasingly adopt
metaphors from human project management---sprints, standups, code review---yet
draw from a remarkably narrow slice of the operations management literature.
This paper presents a systematic taxonomy of twelve established PM/OM methods,
evaluates their structural compatibility with LLM agent constraints (stateless
invocation, cheap cloning, deterministic dysfunction, absence of human
psychology), and identifies which methods are underexploited, which are
inapplicable, and which require fundamental adaptation. We find that methods
designed for \emph{flow optimization} (Kanban, Theory of Constraints) and
\emph{rapid decision-making} (OODA Loop) are structurally well-suited to
agent orchestration but remain largely unexplored, while methods centered on
\emph{human psychology} (Scrum ceremonies, Design Thinking empathy phases)
transfer poorly without significant reformulation. We propose a decision
framework for selecting orchestration methods based on task complexity, agent
count, and quality requirements, and identify five open research directions
at the intersection of operations management and agentic AI.
\end{abstract}
% ============================================================
\section{Introduction}
\label{sec:intro}
The dominant paradigm for multi-agent LLM systems borrows from agile software
development: agents are organized into ``teams'' with role-based
specialization, tasks are decomposed into work items, and results are reviewed
before merging \citep{hong2024metagpt, qian2024chatdev}. This borrowing is
natural---the humans building these systems are software engineers familiar
with agile methods---but it is also narrow. The operations management
literature contains dozens of methods developed over a century of industrial
practice, each encoding different assumptions about workflow structure, quality
assurance, failure modes, and coordination costs.
Not all of these methods are equally applicable to LLM agents. Agents differ
from human workers in five structurally important ways:
\begin{enumerate}[label=\textbf{C\arabic*}]
\item \label{c:stateless} \textbf{Stateless invocation}: Agents do not
retain memory between invocations unless explicitly persisted. Human team
members accumulate institutional knowledge automatically.
\item \label{c:cloning} \textbf{Cheap to clone, expensive to coordinate}:
Spawning a new agent costs milliseconds and cents; coordinating two agents
costs tokens and latency. For human teams, the inverse holds---hiring is
expensive, coordination is (comparatively) cheap.
\item \label{c:dysfunction} \textbf{Deterministic dysfunction}: LLM agents
fail in predictable, repeatable patterns---verbosity, scope creep, false
positives---rather than the varied, context-dependent failures of human
cognition \citep{nennemann2026archeflow}.
\item \label{c:psychology} \textbf{No psychology}: Agents have no morale,
fatigue, ego, or office politics. Methods designed to manage human
psychology (retrospectives, team-building, conflict resolution) have no
direct function.
\item \label{c:speed} \textbf{Cycle speed}: Agents complete tasks in
seconds to minutes, enabling iteration frequencies that would be
impractical for human teams. Methods that assume week-long or month-long
cycles can be compressed.
\end{enumerate}
These constraints define a \emph{fitness landscape}: some PM/OM methods gain
effectiveness when applied to agents (because agents remove friction those
methods were designed to manage), while others lose their raison d'\^etre
(because they solve human problems agents don't have).
This paper contributes:
\begin{itemize}
\item A systematic taxonomy of twelve PM/OM methods evaluated against the
five agent constraints (\ref{c:stateless}--\ref{c:speed}).
\item A compatibility matrix scoring each method's structural fit for
agent orchestration (\S\ref{sec:matrix}).
\item A decision framework for practitioners selecting orchestration
strategies (\S\ref{sec:decision}).
\item Five open research directions at the intersection of operations
management theory and agentic AI (\S\ref{sec:future}).
\end{itemize}
% ============================================================
\section{Background: Current Agent Orchestration Landscape}
\label{sec:background}
\subsection{Frameworks and Their Implicit PM Models}
The current generation of multi-agent LLM frameworks implicitly adopts
project management concepts, though rarely with explicit attribution to
PM/OM theory.
\textbf{MetaGPT} \citep{hong2024metagpt} assigns human job titles (product
manager, architect, engineer) and enforces communication through Standardized
Operating Procedures (SOPs)---an implicit adoption of \emph{waterfall}
phase gates with role-based access control.
\textbf{ChatDev} \citep{qian2024chatdev} simulates a software company with
sequential phases (design, coding, testing, documentation). Despite the
``company'' framing, the execution model is a \emph{linear pipeline} with
pair-programming-style chat between adjacent roles.
\textbf{AgileCoder} \citep{nguyen2024agilecoder} is the first framework to
explicitly adopt sprint-based iteration, assigning Scrum Master and Product
Manager roles to LLM agents with a Dynamic Code Graph Generator tracking
inter-file dependencies between sprints.
\textbf{CrewAI} organizes agents into ``crews'' with a ``manager'' agent
orchestrating task delegation---an implicit \emph{hierarchical management}
model with single-point-of-failure coordination.
\textbf{AutoGen} \citep{wu2023autogen} provides a conversation-based
framework where agents negotiate through multi-turn dialogue. The implicit
model is \emph{committee decision-making}---all agents see all messages,
consensus emerges through discussion.
\textbf{The Six Sigma Agent} \citep{patel2026sixsigma} decomposes tasks
into atomic dependency trees, executes each node $n$ times with independent
LLM samples, and uses consensus voting to achieve defect rates scaling as
$O(p^{\lceil n/2 \rceil})$---reaching 3.4 DPMO (the Six Sigma threshold)
at $n=13$.
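The consensus arithmetic can be checked numerically. A minimal sketch, assuming independent per-sample errors with rate $p$ (the model stated above), computes the probability that a majority of $n$ samples is wrong:

```python
from math import comb

def majority_error(p: float, n: int) -> float:
    """Probability that a majority of n independent samples are wrong,
    given per-sample error rate p (n odd)."""
    k_min = (n + 1) // 2  # fewest wrong votes that still flip the majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# At p = 0.10, consensus error falls steeply with n:
# majority_error(0.10, 3) is roughly 0.028, vs 0.10 for a single sample.
```

The leading term is $\binom{n}{\lceil n/2 \rceil} p^{\lceil n/2 \rceil}$, consistent with the stated $O(p^{\lceil n/2 \rceil})$ scaling.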
\textbf{Reflexion} \citep{shinn2023reflexion} implements a de facto PDCA
loop through verbal reinforcement: Plan $\to$ Act $\to$ Evaluate (Check)
$\to$ Reflect (Act), though it does not name this structure explicitly.
\textbf{ArcheFlow} \citep{nennemann2026archeflow} explicitly applies PDCA
quality cycles with Jungian archetypal roles, representing the first
framework to deliberately adopt a named PM/OM methodology with formal
convergence criteria.
\subsection{The Gap}
Despite the variety of frameworks, the PM/OM methods actually employed
cluster tightly around four approaches: (1) waterfall-style sequential
phases (MetaGPT, ChatDev), (2) role-based team simulation (CAMEL
\citep{li2023camel}, CrewAI), (3) informal ``manager'' delegation
(AutoGen), and (4) agile sprints (AgileCoder). The Six Sigma Agent
\citep{patel2026sixsigma} is a notable exception---the only framework to
explicitly name a PM/OM method as its primary architectural contribution.
Methods from lean manufacturing, constraint theory, military
decision-making, innovation management, and failure analysis remain
unexplored in the peer-reviewed agent orchestration literature, despite
strong structural compatibility with agent constraints.
% ============================================================
\section{Taxonomy of PM/OM Methods}
\label{sec:taxonomy}
We evaluate twelve methods spanning five categories: iterative improvement,
flow optimization, decision-making, innovation management, and quality
engineering. For each method, we describe the core mechanism, evaluate
structural compatibility with agent constraints \ref{c:stateless}--\ref{c:speed},
identify the primary adaptation required, and assess overall fitness.
% ---- 3.1 Iterative Improvement ----
\subsection{Iterative Improvement Methods}
\subsubsection{PDCA (Plan--Do--Check--Act)}
\label{sec:pdca}
\textbf{Origin}: Shewhart \citep{shewhart1939statistical}, popularized by
Deming \citep{deming1986out}.
\textbf{Mechanism}: Four-phase cycle repeated until quality targets are met.
Each cycle narrows the gap between current and desired state through
structured feedback.
\textbf{Agent fitness}: \textsc{High}. PDCA's phase structure maps directly
to agent orchestration: Plan (research + design agents), Do (implementation
agent), Check (review agents), Act (routing + merge decisions). The cycle
abstraction handles the core challenge of ``when to stop iterating'' through
convergence metrics. Demonstrated in ArcheFlow \citep{nennemann2026archeflow}.
\textbf{Key adaptation}: Convergence detection must be automated (human PDCA
relies on subjective judgment). ArcheFlow addresses this with a convergence
score based on finding classification (new, resolved, persistent, regressed)
and oscillation detection.
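To make the automated stopping rule concrete, the following sketch classifies findings between consecutive Check phases and flags oscillation. The bucket names follow the classification above; the decision labels and thresholds are hypothetical, not ArcheFlow's actual implementation:

```python
def classify(prev: set, curr: set, ever_resolved: set) -> dict:
    """Bucket finding IDs from two consecutive Check phases."""
    return {
        "new": curr - prev - ever_resolved,
        "resolved": prev - curr,
        "persistent": prev & curr,
        "regressed": curr & ever_resolved,  # resolved earlier, now back
    }

def should_stop(prev: set, curr: set, ever_resolved: set) -> str:
    b = classify(prev, curr, ever_resolved)
    if b["regressed"]:
        return "oscillating"   # provably unproductive: halt and escalate
    if not curr:
        return "converged"
    if not b["new"] and len(curr) < len(prev):
        return "converging"    # gap is narrowing: run another cycle
    return "diverging"
```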
\textbf{Constraint fit}: Stateless (\ref{c:stateless})---artifacts persist
state between cycles. Cloning (\ref{c:cloning})---fresh agents per cycle
avoid accumulated bias. Speed (\ref{c:speed})---cycles complete in minutes,
enabling 2--3 cycles where humans would manage one.
\subsubsection{Scrum}
\label{sec:scrum}
\textbf{Origin}: Schwaber \& Sutherland, 1995.
\textbf{Mechanism}: Time-boxed sprints with defined roles (Product Owner,
Scrum Master, Development Team), ceremonies (planning, daily standup,
review, retrospective), and artifacts (backlog, sprint board, burndown).
\textbf{Agent fitness}: \textsc{Low--Medium}. Scrum's ceremony-heavy
structure exists primarily to manage human coordination challenges: standups
maintain shared awareness (agents can share a filesystem), retrospectives
address interpersonal friction (agents have none), sprint planning negotiates
capacity (agents have deterministic throughput). The useful kernel---time-boxed
work with a prioritized backlog---is trivially implementable without Scrum's
overhead.
\textbf{Key adaptation}: Strip ceremonies, keep the backlog + sprint
structure. ``Daily standups'' become status file reads. ``Retrospectives''
become cross-run memory extraction. The Scrum Master role is pure overhead
for agents.
\textbf{Constraint fit}: Psychology (\ref{c:psychology})---most Scrum
ceremonies solve human problems. Speed (\ref{c:speed})---sprint length
compresses from weeks to minutes. Cloning (\ref{c:cloning})---team
stability (a Scrum value) is irrelevant when agents are stateless.
\subsubsection{DMAIC (Six Sigma)}
\label{sec:dmaic}
\textbf{Origin}: Motorola, 1986; systematized by General Electric.
\textbf{Mechanism}: Define--Measure--Analyze--Improve--Control. Unlike PDCA,
DMAIC emphasizes \emph{statistical measurement} of process capability and
explicitly separates analysis (understanding the problem) from improvement
(fixing it).
\textbf{Agent fitness}: \textsc{Medium--High}. The Define--Measure--Analyze
front-loading is valuable for agents: it forces explicit quality metrics
\emph{before} implementation, preventing the common failure mode of agents
optimizing for the wrong objective. The Control phase---establishing
monitoring to prevent regression---maps to cross-run memory systems.
\textbf{Key adaptation}: Agents can compute statistical process control
metrics (defect rates, cycle times, sigma levels) automatically from event
logs. The ``Measure'' phase, which is expensive and tedious for humans,
becomes a strength: agents can instrument everything.
\textbf{Constraint fit}: Speed (\ref{c:speed})---full DMAIC in minutes.
Dysfunction (\ref{c:dysfunction})---agent failure modes have measurable
baselines, making sigma calculations meaningful. Stateless
(\ref{c:stateless})---Control phase requires persistent monitoring, which
must be explicitly built.
% ---- 3.2 Flow Optimization ----
\subsection{Flow Optimization Methods}
\subsubsection{Kanban}
\label{sec:kanban}
\textbf{Origin}: Toyota Production System, Taiichi Ohno, 1950s.
\textbf{Mechanism}: Pull-based workflow with explicit work-in-progress (WIP)
limits. Work items flow through columns (stages); new work is pulled only
when capacity is available. No iterations---continuous flow.
\textbf{Agent fitness}: \textsc{High}. Kanban's WIP limits directly address
a critical agent challenge: \emph{coordination cost scaling}. Without WIP
limits, spawning more agents increases throughput initially but eventually
degrades quality due to coordination overhead (conflicting changes, merge
conflicts, context fragmentation). Kanban provides a principled mechanism for
determining optimal concurrency.
\textbf{Key adaptation}: WIP limits should be \emph{dynamic}, adjusting
based on observed coordination costs (merge conflicts, finding duplications)
rather than fixed. The pull mechanism maps naturally: agents poll a task
queue and pull the highest-priority item they can handle.
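A dynamic WIP limit of the kind proposed here can be as simple as a feedback rule over an observed coordination-cost signal; the merge-conflict-rate proxy and thresholds below are illustrative:

```python
def adjust_wip(limit: int, conflict_rate: float,
               lo: float = 0.05, hi: float = 0.20) -> int:
    """Nudge the WIP limit using the merge-conflict rate as a proxy
    for coordination cost (hypothetical thresholds)."""
    if conflict_rate > hi and limit > 1:
        return limit - 1  # coordination overhead dominates: shed concurrency
    if conflict_rate < lo:
        return limit + 1  # headroom available: pull more work
    return limit
```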
\textbf{Constraint fit}: Cloning (\ref{c:cloning})---WIP limits are
\emph{exactly} the missing constraint for cheap-to-clone agents. Speed
(\ref{c:speed})---flow metrics (lead time, cycle time, throughput) update
in real-time. Psychology (\ref{c:psychology})---no ``swarming'' or
``blocked item'' social dynamics to manage.
\subsubsection{Theory of Constraints (TOC)}
\label{sec:toc}
\textbf{Origin}: Goldratt, \emph{The Goal}, 1984.
\textbf{Mechanism}: Identify the system's constraint (bottleneck), exploit
it (maximize its throughput), subordinate everything else to it, elevate it
(invest to remove it), repeat. The Five Focusing Steps.
\textbf{Agent fitness}: \textsc{High}. In multi-agent pipelines, the
bottleneck is typically the most capable (and expensive) agent: the
implementation agent that must run on a powerful model, or the security
reviewer that requires deep context. TOC provides a framework for
organizing the entire pipeline around this constraint.
\textbf{Key adaptation}: ``Exploit the constraint'' means ensuring the
bottleneck agent never waits for input. Pre-compute its context, batch
its inputs, and schedule cheaper agents (research, formatting, validation)
to run during its processing time. ``Subordinate'' means cheaper agents
should produce output in the format the bottleneck needs, not in whatever
format is easiest for them.
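``Exploit the constraint'' reduces to a scheduling invariant: the bottleneck must never block on input. A minimal sketch, with hypothetical \texttt{prepare} (cheap agent) and \texttt{bottleneck} (expensive agent) callables:

```python
import queue
import threading

def run_pipeline(tasks, prepare, bottleneck):
    """Cheap workers pre-compute context concurrently so the bottleneck
    agent never waits for input (illustrative only)."""
    ready = queue.Queue(maxsize=4)  # small buffer ahead of the constraint

    def feeder():
        for t in tasks:
            ready.put(prepare(t))   # runs while the bottleneck is busy
        ready.put(None)             # sentinel: no more work

    threading.Thread(target=feeder, daemon=True).start()
    results = []
    while (item := ready.get()) is not None:
        results.append(bottleneck(item))  # the only serialized step
    return results
```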
\textbf{Constraint fit}: Cloning (\ref{c:cloning})---non-bottleneck agents
are cheap to overprovision. Speed (\ref{c:speed})---constraint shifts can
be detected and responded to within a single run. Dysfunction
(\ref{c:dysfunction})---bottleneck agent's failure mode has outsized impact,
justifying targeted shadow detection.
\subsubsection{Lean / Toyota Production System}
\label{sec:lean}
\textbf{Origin}: Ohno, 1988; Womack \& Jones, 1996.
\textbf{Mechanism}: Eliminate waste (\emph{muda}), reduce variability
(\emph{mura}), avoid overburden (\emph{muri}). Seven wastes: overproduction,
waiting, transport, overprocessing, inventory, motion, defects.
\textbf{Agent fitness}: \textsc{Medium--High}. The seven wastes map
surprisingly well to agent systems:
\begin{itemize}[nosep]
\item \textbf{Overproduction}: Agents generating output nobody reads
(verbose research reports, unused alternative proposals).
\item \textbf{Waiting}: Agents idle while waiting for predecessor output
(sequential pipeline where parallel would work).
\item \textbf{Transport}: Redundant context passing (sending full codebase
to agents that need only a diff).
\item \textbf{Overprocessing}: Running thorough review on trivial changes.
\item \textbf{Inventory}: Accumulated artifacts from prior cycles that
are never referenced.
\item \textbf{Motion}: Agents reading files they don't need, exploring
irrelevant code paths.
\item \textbf{Defects}: Findings that are false positives, requiring
rework to dismiss.
\end{itemize}
\textbf{Key adaptation}: Lean's ``respect for people'' pillar has no direct
analog. The technical pillar (continuous improvement, waste elimination)
transfers fully.
% ---- 3.3 Decision-Making ----
\subsection{Decision-Making Methods}
\subsubsection{OODA Loop (Observe--Orient--Decide--Act)}
\label{sec:ooda}
\textbf{Origin}: John Boyd, 1976. Military strategy for air combat; later
generalized to competitive decision-making.
\textbf{Mechanism}: Continuous loop of Observe (gather data), Orient (analyze
context, update mental models), Decide (select course of action), Act
(execute). The key insight is that the \emph{speed} of the loop---not any
individual decision's quality---determines competitive advantage. ``Getting
inside the opponent's OODA loop'' means acting faster than the adversary can
react.
\textbf{Agent fitness}: \textsc{High}. OODA is structurally similar to PDCA
but optimized for speed over thoroughness. For agent systems, this maps to
scenarios requiring rapid adaptation: adversarial testing, incident response,
market-reactive coding, or any context where the problem space changes
during execution.
\textbf{Key adaptation}: Boyd's ``Orient'' phase---updating mental models
based on new information---is the hardest to implement for stateless agents.
It requires either persistent state (a world model that updates across
iterations) or a ``fast reorientation'' agent that rapidly synthesizes new
information into an updated context.
\textbf{Constraint fit}: Speed (\ref{c:speed})---agents can OODA at
superhuman frequency. Stateless (\ref{c:stateless})---the Orient phase
needs explicit state management. Psychology (\ref{c:psychology})---Boyd's
concept of ``mental agility'' translates to model selection: smaller, faster
models for rapid OODA; larger models for deep Orient phases.
\subsubsection{Cynefin Framework}
\label{sec:cynefin}
\textbf{Origin}: Snowden \& Boone, 2007.
\textbf{Mechanism}: Classify problems into five domains---\textsc{Clear}
(obvious cause-effect), \textsc{Complicated} (expert analysis needed),
\textsc{Complex} (emergent, probe-sense-respond), \textsc{Chaotic}
(act first, then sense), \textsc{Confused} (unknown domain)---and apply
domain-appropriate strategies.
\textbf{Agent fitness}: \textsc{Medium--High}. Cynefin provides a
\emph{meta-framework}: instead of choosing one orchestration method for all
tasks, classify the task first, then select the appropriate method:
\begin{itemize}[nosep]
\item \textsc{Clear}: Single agent, no review (``fix this typo'').
\item \textsc{Complicated}: Expert agent with review (PDCA fast workflow).
\item \textsc{Complex}: Multiple competing proposals, let results emerge
(PDCA standard/thorough with parallel alternatives).
\item \textsc{Chaotic}: Act immediately, stabilize, then analyze (OODA
with hotfix agent, then PDCA for proper fix).
\end{itemize}
\textbf{Key adaptation}: Task classification must be automated. Proxies:
number of files affected, cross-module dependencies, security sensitivity,
test coverage of affected area.
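These proxies compose into a rough classifier; the thresholds and routing below are illustrative, not prescriptive:

```python
def classify_task(files_touched: int, cross_module: bool,
                  security_sensitive: bool, test_coverage: float) -> str:
    """Map cheap static proxies onto a Cynefin domain (hypothetical rules)."""
    if security_sensitive and test_coverage < 0.3:
        return "chaotic"      # act first: hotfix, then a proper PDCA cycle
    if files_touched <= 1 and not cross_module:
        return "clear"        # single agent, no review
    if not cross_module:
        return "complicated"  # expert agent plus review
    return "complex"          # parallel proposals, let results emerge
```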
% ---- 3.4 Innovation Management ----
\subsection{Innovation Management Methods}
\subsubsection{Stage-Gate}
\label{sec:stagegate}
\textbf{Origin}: Cooper, 1990.
\textbf{Mechanism}: Innovation projects pass through stages (scoping,
business case, development, testing, launch), separated by gates where a
cross-functional team decides: Go, Kill, Hold, or Recycle. The gate
decision admits no middle ground: there is no ``continue with reservations.''
\textbf{Agent fitness}: \textsc{Medium}. The gate mechanism maps well to
agent confidence checks: a Creator agent's proposal either meets the
confidence threshold (Go) or doesn't (Kill/Recycle). However, Stage-Gate
assumes expensive stages (weeks/months of human work), making Kill decisions
high-stakes. For agents, stages are cheap (minutes), reducing the value of
formal gate decisions.
\textbf{Key adaptation}: Gates become lightweight confidence checks rather
than committee reviews. The ``Kill'' decision---rare and painful in human
innovation---should be common and cheap for agents. Explore multiple
proposals in parallel, gate aggressively, continue only the best.
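The ``gate aggressively'' pattern is a score-filter-select step; \texttt{score} stands in for whatever confidence estimate the orchestrator provides, and the threshold is hypothetical:

```python
def stage_gate(proposals, score, threshold: float = 0.7):
    """Kill every proposal below the confidence threshold; advance only
    the best survivor. Returns None when the whole stage is recycled."""
    scored = [(score(p), p) for p in proposals]
    go = [sp for sp in scored if sp[0] >= threshold]
    if not go:
        return None  # Kill all: recycle the stage with fresh proposals
    return max(go, key=lambda sp: sp[0])[1]
```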
\subsubsection{Design Thinking}
\label{sec:designthinking}
\textbf{Origin}: IDEO / Stanford d.school, 2000s.
\textbf{Mechanism}: Five phases: Empathize (understand the user),
Define (frame the problem), Ideate (generate solutions), Prototype (build
quickly), Test (get feedback). Emphasis on user empathy and divergent
thinking.
\textbf{Agent fitness}: \textsc{Low}. Design Thinking's core value
proposition---\emph{empathy with users}---is precisely what LLM agents
cannot genuinely do. Agents can simulate empathy (generate persona-based
scenarios), but the insight that comes from observing real users in context
has no agent equivalent. The Ideate phase (divergent brainstorming) is
feasible but produces quantity over quality without the ``empathy filter''
that makes Design Thinking effective.
\textbf{Key adaptation}: If used, the Empathize phase must be replaced
with explicit user research artifacts (personas, journey maps, interview
transcripts) provided as input. This transforms Design Thinking from a
discovery method into a synthesis method---fundamentally changing its nature.
\subsubsection{TRIZ}
\label{sec:triz}
\textbf{Origin}: Altshuller, 1946--1985. Theory of Inventive Problem
Solving.
\textbf{Mechanism}: Problems contain contradictions (improving one parameter
worsens another). TRIZ provides a contradiction matrix mapping 39 engineering
parameters to 40 inventive principles. Instead of compromise, TRIZ seeks
solutions that resolve the contradiction.
\textbf{Agent fitness}: \textsc{Medium}. TRIZ's structured problem-solving
is well-suited to agents: the contradiction matrix is a lookup table, and
agents can systematically apply inventive principles. However, TRIZ requires
\emph{reformulating the problem as a contradiction}---a creative step that
is itself challenging for agents.
\textbf{Key adaptation}: Provide the contradiction matrix as context. Train
agents to identify the ``improving parameter'' and ``worsening parameter''
in engineering tasks (e.g., ``improving security worsens performance'').
Use TRIZ principles as a structured brainstorming prompt for the Creator
archetype.
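As a lookup table, the contradiction matrix is trivially machine-usable. The excerpt below borrows only the \emph{shape} of Altshuller's matrix (parameter pair to principle list); the entries are invented for illustration and do not reproduce his 39-parameter numbering:

```python
# Illustrative (improving, worsening) -> candidate inventive principles.
# NOT Altshuller's actual matrix entries.
MATRIX = {
    ("security", "performance"): ["segmentation", "prior_action", "asymmetry"],
    ("speed", "accuracy"): ["feedback", "intermediary", "partial_action"],
}

def inventive_principles(improving: str, worsening: str) -> list:
    """Seed the Creator archetype's brainstorming prompt with principles."""
    return MATRIX.get((improving, worsening), [])
```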
% ---- 3.5 Quality Engineering ----
\subsection{Quality Engineering Methods}
\subsubsection{FMEA (Failure Mode and Effects Analysis)}
\label{sec:fmea}
\textbf{Origin}: US Military, 1949; adopted by automotive (AIAG) and
aerospace.
\textbf{Mechanism}: For each component/process step, systematically
enumerate: (1) potential failure modes, (2) effects of each failure,
(3) causes, (4) current controls, (5) risk priority number
(severity $\times$ occurrence $\times$ detection). Address highest-RPN
items first.
\textbf{Agent fitness}: \textsc{High}. FMEA's systematic enumeration is
exactly what LLM agents excel at: given a design, enumerate everything that
could go wrong, assess severity, and propose mitigations. The Risk Priority
Number provides a quantitative framework for prioritizing review effort---more
principled than the common ``CRITICAL/WARNING/INFO'' severity classification.
\textbf{Key adaptation}: Use FMEA \emph{before} implementation (as part of
the Plan phase) rather than only during review. An FMEA agent analyzes the
Creator's proposal and generates a failure mode table; the Maker then
implements with awareness of high-RPN failure modes; the Guardian validates
that mitigations are in place.
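The RPN computation itself is mechanical, which is precisely why it suits an analytical agent; a minimal sketch of the table an FMEA agent would emit:

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    description: str
    severity: int    # 1-10: how bad is the effect?
    occurrence: int  # 1-10: how likely is the cause?
    detection: int   # 1-10: 10 = hardest for current controls to catch

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

def prioritize(modes):
    """Review effort flows to the highest Risk Priority Number first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)
```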
\textbf{Constraint fit}: Dysfunction (\ref{c:dysfunction})---agents' own
failure modes can be pre-enumerated via FMEA, creating a meta-level
quality system. Cloning (\ref{c:cloning})---FMEA agents are cheap
(analytical, not creative), enabling systematic coverage.
\subsubsection{Statistical Process Control (SPC)}
\label{sec:spc}
\textbf{Origin}: Shewhart, 1920s.
\textbf{Mechanism}: Monitor process outputs over time using control charts.
Distinguish \emph{common cause} variation (inherent to the process) from
\emph{special cause} variation (attributable to specific events). React only
to special causes; reduce common cause variation through process improvement.
\textbf{Agent fitness}: \textsc{Medium--High}. SPC requires historical data,
which agent orchestration systems naturally generate (event logs, finding
counts, cycle times, token usage). Control charts over agent effectiveness
scores can distinguish between normal variation (``Guardian found 2 issues
this run vs. 1 last run'') and genuine degradation (``Guardian's false
positive rate spiked after a model update'').
\textbf{Key adaptation}: Sufficient run history is needed to establish
control limits. Early runs operate without SPC; after 10--20 runs,
control limits become meaningful. Model updates reset control limits
(new process = new baseline).
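A Shewhart-style chart over run metrics needs only a baseline mean and standard deviation. The sketch below (hypothetical interface) encodes both adaptations: no limits until enough history exists, and re-baselining by clearing the history after a model update:

```python
from statistics import mean, stdev

def control_limits(history, min_runs: int = 10):
    """3-sigma limits over a per-run metric (e.g. false-positive count).
    Returns None until enough runs exist; clear `history` after a model
    update to establish a new baseline."""
    if len(history) < min_runs:
        return None
    mu, sigma = mean(history), stdev(history)
    return mu - 3 * sigma, mu + 3 * sigma

def special_cause(value: float, limits) -> bool:
    """React only to points outside the limits (special-cause variation)."""
    return limits is not None and not (limits[0] <= value <= limits[1])
```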
% ============================================================
\section{Compatibility Matrix}
\label{sec:matrix}
Table~\ref{tab:matrix} scores each method against the five agent constraints,
producing an overall fitness assessment.
\begin{table}[t]
\centering
\small
\caption{Compatibility matrix: PM/OM methods scored against agent constraints.
\textcolor{highfit}{\textbf{+}} = method benefits from this constraint;
\textcolor{lowfit}{\textbf{--}} = method is undermined;
\textcolor{neutral}{\textbf{0}} = neutral.
Overall fitness: H = High, M = Medium, L = Low.}
\label{tab:matrix}
\begin{tabular}{@{}l*{5}{c}c@{}}
\toprule
\textbf{Method} &
\textbf{C1} &
\textbf{C2} &
\textbf{C3} &
\textbf{C4} &
\textbf{C5} &
\textbf{Fit} \\
\midrule
PDCA & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textcolor{highfit}{+} & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textbf{H} \\
Scrum & \textcolor{lowfit}{--} & \textcolor{neutral}{0} & \textcolor{neutral}{0} & \textcolor{lowfit}{--} & \textcolor{highfit}{+} & \textbf{L--M} \\
DMAIC & \textcolor{lowfit}{--} & \textcolor{highfit}{+} & \textcolor{highfit}{+} & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textbf{M--H} \\
Kanban & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textcolor{highfit}{+} & \textbf{H} \\
TOC & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textcolor{highfit}{+} & \textcolor{highfit}{+} & \textcolor{highfit}{+} & \textbf{H} \\
Lean & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textcolor{neutral}{0} & \textcolor{lowfit}{--} & \textcolor{highfit}{+} & \textbf{M--H} \\
OODA & \textcolor{lowfit}{--} & \textcolor{highfit}{+} & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textcolor{highfit}{+} & \textbf{H} \\
Cynefin & \textcolor{neutral}{0} & \textcolor{neutral}{0} & \textcolor{neutral}{0} & \textcolor{neutral}{0} & \textcolor{neutral}{0} & \textbf{M--H} \\
Stage-Gate & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textcolor{lowfit}{--} & \textbf{M} \\
Design Think. & \textcolor{neutral}{0} & \textcolor{neutral}{0} & \textcolor{neutral}{0} & \textcolor{lowfit}{--} & \textcolor{neutral}{0} & \textbf{L} \\
TRIZ & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textcolor{neutral}{0} & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textbf{M} \\
FMEA & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textcolor{highfit}{+} & \textcolor{highfit}{+} & \textcolor{highfit}{+} & \textbf{H} \\
SPC & \textcolor{lowfit}{--} & \textcolor{neutral}{0} & \textcolor{highfit}{+} & \textcolor{highfit}{+} & \textcolor{highfit}{+} & \textbf{M--H} \\
\bottomrule
\end{tabular}
\end{table}
\subsection{Analysis}
Several patterns emerge from the compatibility matrix:
\textbf{High-fitness methods share three properties}: they are
\emph{mechanistic} (decisions follow rules, not judgment), \emph{flow-oriented}
(optimize throughput, not team dynamics), and \emph{metric-driven} (quality
is quantified, not discussed). PDCA, Kanban, TOC, OODA, and FMEA all share
this profile.
\textbf{Low-fitness methods are psychology-dependent}: Scrum and Design
Thinking derive their primary value from managing human cognitive and social
limitations. Without those limitations, the methods become overhead.
\textbf{The ``Cheap Clone'' constraint is universally beneficial}: every
method either benefits from or is neutral to the ability to spawn agents
cheaply. This suggests that agent orchestration should generally favor
\emph{parallelism}---run multiple approaches simultaneously, then
select the best result.
\textbf{``Stateless'' is the most disruptive constraint}: methods that
assume accumulated knowledge (Scrum's team velocity, SPC's control charts,
DMAIC's baseline measurements) require explicit persistence mechanisms that
agents don't provide natively.
% ============================================================
\section{Hybrid Approaches and Method Composition}
\label{sec:hybrid}
The methods in our taxonomy are not mutually exclusive. Effective agent
orchestration likely requires combining methods at different levels:
\subsection{Proposed Three-Layer Architecture}
\begin{description}
\item[Strategic layer (Cynefin)]: Classify the task and select the
appropriate orchestration method. Simple tasks get a single agent;
complicated tasks get PDCA; complex tasks get parallel competing
approaches; chaotic tasks get OODA.
\item[Operational layer (PDCA/OODA + Kanban)]: Execute the selected
method with flow control. Kanban WIP limits prevent coordination
overload. PDCA provides quality convergence for standard tasks; OODA
provides rapid adaptation for time-sensitive tasks.
\item[Quality layer (FMEA + SPC + TOC)]: Monitor execution quality.
FMEA front-loads failure analysis in the Plan phase. SPC monitors
long-term agent effectiveness trends. TOC identifies and optimizes
around bottleneck agents.
\end{description}
\subsection{ArcheFlow as a Case Study}
ArcheFlow \citep{nennemann2026archeflow} already implements elements of
this three-layer architecture, though without explicitly naming all methods:
\begin{itemize}[nosep]
\item \textbf{Strategic}: Workflow selection (fast/standard/thorough)
functions as a simplified Cynefin classification.
\item \textbf{Operational}: PDCA cycles with convergence detection;
sprint mode with WIP-limited parallel dispatch (implicit Kanban).
\item \textbf{Quality}: Shadow detection (behavioral FMEA for agent
failure modes); effectiveness scoring (rudimentary SPC); Guardian
fast-path (TOC---don't waste the bottleneck on clean code); ``Wiggum
Break'' circuit breakers (hard/soft halt conditions with event logging).
\end{itemize}
The gap is in explicit TOC application (identifying and optimizing around
the most expensive agent) and in OODA integration for time-sensitive tasks.
% ============================================================
\section{Decision Framework}
\label{sec:decision}
We propose a practitioner-oriented decision framework for selecting
orchestration methods, driven by Cynefin task classification and augmented
by cross-cutting concerns that apply to every class:
\begin{figure}[h]
\centering
\begin{tikzpicture}[
box/.style={draw, rounded corners, minimum width=3.5cm, minimum height=0.7cm, font=\small, fill=#1},
arrow/.style={-{Stealth[length=3mm]}, thick},
]
% Decision tree
\node[box=yellow!20] (start) {Task arrives};
\node[box=orange!15, below=0.8cm of start] (cynefin) {Classify (Cynefin)};
\node[box=green!15, below left=1cm and 2cm of cynefin] (clear) {Clear};
\node[box=green!15, below left=1cm and 0cm of cynefin] (complicated) {Complicated};
\node[box=blue!10, below right=1cm and 0cm of cynefin] (complex) {Complex};
\node[box=red!10, below right=1cm and 2cm of cynefin] (chaotic) {Chaotic};
\node[box=white, below=0.7cm of clear, text width=2.5cm, align=center, font=\scriptsize] (m1) {Single agent\\No review};
\node[box=white, below=0.7cm of complicated, text width=2.5cm, align=center, font=\scriptsize] (m2) {PDCA fast\\+ FMEA};
\node[box=white, below=0.7cm of complex, text width=2.5cm, align=center, font=\scriptsize] (m3) {PDCA thorough\\+ parallel proposals};
\node[box=white, below=0.7cm of chaotic, text width=2.5cm, align=center, font=\scriptsize] (m4) {OODA\\then PDCA};
\draw[arrow] (start) -- (cynefin);
\draw[arrow] (cynefin) -- (clear);
\draw[arrow] (cynefin) -- (complicated);
\draw[arrow] (cynefin) -- (complex);
\draw[arrow] (cynefin) -- (chaotic);
\draw[arrow] (clear) -- (m1);
\draw[arrow] (complicated) -- (m2);
\draw[arrow] (complex) -- (m3);
\draw[arrow] (chaotic) -- (m4);
\end{tikzpicture}
\caption{Decision framework for selecting agent orchestration method
based on Cynefin task classification.}
\label{fig:decision}
\end{figure}
\textbf{Cross-cutting concerns} apply regardless of classification:
\begin{itemize}[nosep]
\item \textbf{Kanban WIP limits}: Always. Prevents coordination overload.
\item \textbf{TOC awareness}: Identify the costliest agent; schedule
others around it.
\item \textbf{SPC monitoring}: After 10+ runs, establish control limits
for agent effectiveness.
\item \textbf{Lean waste audit}: Periodically review token usage patterns
for waste (unused artifacts, redundant context, overprocessing).
\end{itemize}
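As an illustrative formulation of the SPC item (a sketch, not part of any
particular implementation), individuals-chart control limits over per-run
effectiveness scores $e_1, \ldots, e_n$ are
\[
\bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i,
\qquad
\overline{MR} = \frac{1}{n-1}\sum_{i=2}^{n} \lvert e_i - e_{i-1} \rvert,
\qquad
\mathrm{UCL},\,\mathrm{LCL} = \bar{e} \pm 3\,\frac{\overline{MR}}{1.128},
\]
where $1.128$ is the standard $d_2$ constant for moving ranges of size two;
a run whose score falls outside these limits signals a shift in agent
effectiveness worth investigating.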
% ============================================================
\section{Open Research Directions}
\label{sec:future}
\subsection{Adaptive Method Selection}
Current frameworks use a fixed orchestration method. An adaptive system
would classify each incoming task (Cynefin), select the appropriate method,
and switch methods mid-execution if the task's nature changes (e.g.,
a ``complicated'' task reveals unexpected complexity during exploration).
This requires a \emph{method-aware orchestrator} that understands the
assumptions and exit criteria of each method.
\subsection{Kanban for Agent Swarms}
As agent counts increase beyond 5--10, coordination costs dominate.
Kanban's WIP limits and flow metrics provide a theoretical basis for
determining optimal agent concurrency, but empirical studies are needed
to establish how coordination cost scales with agent count across
different task types and model capabilities.
\subsection{OODA for Adversarial Agent Scenarios}
Boyd's OODA loop was designed for competitive environments where speed of
decision-making determines the winner. Applications include adversarial
testing (red team agents vs. blue team agents), competitive code generation
(multiple agents racing to solve a problem), and incident response
(rapid diagnosis and mitigation under time pressure).
\subsection{Cross-Method Quality Metrics}
Each PM/OM method defines quality differently: PDCA uses convergence scores,
Six Sigma uses sigma levels, Lean uses waste ratios, SPC uses control
limits. A unified quality metric for agent orchestration---one that allows
meaningful comparison across methods---does not yet exist.
\subsection{FMEA for Agent Failure Modes}
Agent failure modes (hallucination, scope creep, false positive reviews,
persona drift \citep{lu2026assistant}) can be systematically enumerated
using FMEA methodology. A comprehensive FMEA catalog for LLM agents---with
severity, occurrence, and detection ratings calibrated from empirical
data---would provide a foundation for designing more robust orchestration
systems.
% ============================================================
\section{Conclusion}
\label{sec:conclusion}
The operations management literature offers a rich toolkit for agent
orchestration that extends far beyond the agile methods currently dominant
in the field. Our taxonomy reveals that the highest-fitness methods---PDCA,
Kanban, TOC, OODA, and FMEA---share a common profile: mechanistic,
flow-oriented, and metric-driven. Methods centered on human psychology
(Scrum, Design Thinking) transfer poorly without fundamental reformulation.
The key insight is that LLM agents are not ``fast humans.'' They have
fundamentally different constraint profiles---cheap to clone, expensive to
coordinate, stateless, psychologically inert---and these differences make
some PM/OM methods \emph{more} effective (OODA loops at superhuman speed,
FMEA with exhaustive enumeration) while rendering others irrelevant
(standups without psychology, retrospectives without learning).
We encourage the agent orchestration community to look beyond agile sprints
and role-playing frameworks toward the broader operations management
tradition. A century of industrial practice has much to teach us about
orchestrating intelligent agents---if we take the time to translate.
% ============================================================
\section*{Acknowledgments}
The author thanks the operations management and quality engineering
communities whose work, developed over decades for human organizations,
provides the theoretical foundation for this analysis.
% ============================================================
\bibliographystyle{plainnat}
\bibliography{taxonomy-refs}
\end{document}

scripts/run-tests.sh Executable file

@@ -0,0 +1,34 @@
#!/usr/bin/env bash
# run-tests.sh — Run all ArcheFlow bats tests.
#
# Usage: ./scripts/run-tests.sh [bats-args...]
# Examples:
# ./scripts/run-tests.sh # Run all tests
# ./scripts/run-tests.sh --filter "event" # Run only event tests
# ./scripts/run-tests.sh -t # TAP output
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
TESTS_DIR="$PROJECT_DIR/tests"
# Find bats binary
BATS="${BATS:-}"
if [[ -z "$BATS" ]]; then
if command -v bats &>/dev/null; then
BATS="bats"
elif [[ -x "$HOME/.local/bin/bats" ]]; then
BATS="$HOME/.local/bin/bats"
else
echo "ERROR: bats not found. Install bats-core or set BATS env var." >&2
exit 1
fi
fi
echo "Running ArcheFlow tests..."
echo " bats: $($BATS --version)"
echo " tests: $TESTS_DIR"
echo ""
exec "$BATS" "$@" "$TESTS_DIR"/*.bats


@@ -1,292 +1,46 @@
---
name: act-phase
description: |
Use after the Check phase completes. Collects reviewer findings, prioritizes them, routes fixes to the right agent or tool, applies fixes systematically, and decides whether to exit or cycle.
Use after the Check phase completes. Collects reviewer findings, routes fixes, applies them, decides whether to exit or cycle.
<example>Automatically loaded during orchestration after Check phase</example>
<example>User: "Run just the act phase on existing findings"</example>
---
# Act Phase
After all reviewers complete, the Act phase turns findings into fixes and decides whether the cycle is done. This is the bridge between "what's wrong" and "what we do about it."
## Overview
Turn Check phase findings into fixes, then decide: exit or cycle.
```
Check phase output → Collect → Prioritize → Route → Fix → Verify → Exit or Cycle
Check output → Collect → Deduplicate → Route → Fix → Exit or Cycle
```
---
## Step 1: Finding Collection
## Step 1: Collect and Consolidate Findings
Parse all reviewer outputs into one consolidated findings table. Use the standardized format from the `check-phase` skill.
Parse all reviewer outputs into one table grouped by severity (CRITICAL / WARNING / INFO):
```markdown
## Findings Summary — Cycle N
### CRITICAL (must fix before next cycle)
| # | Source | Location | Category | Description | Suggested Fix |
|---|--------|----------|----------|-------------|---------------|
| 1 | guardian | src/auth/handler.ts:48 | security | Empty string bypasses validation | Add length check |
| 2 | trickster | src/api/parse.ts:92 | reliability | Null input causes crash | Guard with null check |
### WARNING (should fix)
| # | Source | Location | Category | Description | Suggested Fix |
|---|--------|----------|----------|-------------|---------------|
| 3 | sage | tests/auth.test.ts:15 | testing | Test names don't describe behavior | Rename to "should reject expired tokens" |
| 4 | guardian | src/auth/handler.ts:52 | security | Missing rate limit | Add rate limiter middleware |
### INFO (nice to have)
| # | Source | Location | Category | Description | Suggested Fix |
|---|--------|----------|----------|-------------|---------------|
| 5 | skeptic | src/auth/handler.ts:30 | design | Consider caching validated tokens | Add TTL cache |
```
### Deduplication
Before listing findings, deduplicate across reviewers (same rule as `check-phase`):
- Same file + same category + similar description = one finding
- Use the higher severity
- Credit all sources: `guardian + skeptic`
- Don't double-count in severity tallies
Same file + same category + similar description = one finding. Use the higher severity, credit all sources (e.g. `guardian + skeptic`).
### Cross-Cycle Tracking
### Cross-Cycle Tracking (cycle > 1)
Compare against prior cycle findings (if cycle > 1):
- **Resolved:** Finding from cycle N-1 no longer present → mark resolved, do not re-raise
- **Persisting:** Same location + category still present → increment `cycle_count`
- **New:** Finding not seen before → add with `cycle_count: 1`
Compare against prior cycle findings:
- **Resolved** — no longer present; mark resolved, do not re-raise
- **Persisting** — same location + category, increment `cycle_count`
- **New** — first appearance, `cycle_count: 1`
If a finding persists for 2+ consecutive cycles, flag for user escalation (see Step 5).
Finding persisting 2+ cycles = flag for escalation (see Step 4).
---
## Step 2: Fix Routing
Not all findings are fixed the same way. Route each finding based on its nature:
| Category | Fix Route | Rationale |
|----------|-----------|-----------|
| `security` | Spawn Maker with targeted instructions | Security fixes need tested code changes |
| `reliability` | Spawn Maker with targeted instructions | Same — code-level fix with test |
| `breaking-change` | Route to Creator in next cycle | Design decision needed |
| `design` | Route to Creator in next cycle | Architecture change, not a patch |
| `dependency` | Spawn Maker with targeted instructions | Package update or removal |
| `quality` | Spawn Maker or apply directly | Depends on scope (see below) |
| `testing` | Spawn Maker with targeted instructions | Tests need to be written and run |
| `consistency` | Apply directly or spawn Maker | Naming/style → direct. Pattern change → Maker |
### Direct Fix (no agent)
Apply directly with Edit tool when **all** of these are true:
- The fix is mechanical (typo, naming, formatting, import order)
- No behavioral change
- No test update needed
- Exactly one file affected
Examples: rename a variable, fix a typo in a string, reorder imports, fix indentation.
### Maker Fix (spawn agent)
Spawn a targeted Maker when the fix involves:
- Code logic changes
- New or modified tests
- Multiple files
- Any behavioral change
Provide the Maker with:
1. The specific finding(s) to address (not all findings — just the routed ones)
2. The file and line location
3. The suggested fix from the reviewer
4. The Maker's original branch (to apply fixes on top)
```
Agent(
description: "Fix: <finding description>",
prompt: "You are the MAKER archetype.
Apply this fix on branch: <maker's branch>
Finding: <source> | <severity> | <category>
Location: <file:line>
Issue: <description>
Suggested fix: <fix>
Rules:
1. Fix ONLY this issue — no other changes
2. Add/update tests if the fix changes behavior
3. Run existing tests — nothing may break
4. Commit with message: 'fix: <description>'
Do NOT refactor surrounding code.",
isolation: "worktree",
mode: "bypassPermissions"
)
```
### Writing/Prose Fix (domain-specific)
For writing projects (books, stories), voice or prose findings need special context:
```
Agent(
description: "Fix: voice drift in <file>",
prompt: "You are the MAKER archetype.
Apply this prose fix on branch: <maker's branch>
Finding: <source> | <severity> | <category>
Location: <file:line>
Issue: <description>
Voice profile to match: <load from .archeflow/config.yaml or project voice profile>
Rules:
1. Fix the flagged passage to match the voice profile
2. Do not rewrite surrounding paragraphs
3. Preserve the narrative intent — only change voice/style
4. Commit with message: 'fix: <description>'",
isolation: "worktree",
mode: "bypassPermissions"
)
```
### Design Fix (route to next cycle)
Findings that require design changes are NOT fixed in the Act phase. They become structured feedback for the Creator in the next PDCA cycle. Collect them into `act-feedback.md` (see Step 5).
---
## Step 3: Fix Application Protocol
Apply fixes in severity order: CRITICAL first, then WARNING, then INFO. Within the same severity, fix in file order (reduces context switching).
### For each fix:
1. **Apply the change** (direct edit or via Maker agent)
2. **Emit `fix.applied` event:**
```json
{
"type": "fix.applied",
"phase": "act",
"agent": "maker",
"data": {
"source": "guardian",
"finding": "Empty string bypasses validation",
"file": "src/auth/handler.ts",
"line": 48,
"severity": "CRITICAL",
"before": "<old code>",
"after": "<new code>"
},
"parent": [<seq of the review.verdict that found it>]
}
```
3. **Targeted re-check** (if the fix is non-trivial):
- Re-run only the reviewer that raised the finding
- Scope the re-check to just the changed file(s)
- If the re-check raises new findings → add them to the findings list with source `re-check:<reviewer>`
### Batching Maker Fixes
If multiple findings route to the same Maker and affect the same file or tightly coupled files, batch them into a single Maker spawn:
```
Agent(
description: "Fix: 3 findings in src/auth/",
prompt: "You are the MAKER archetype.
Apply these fixes on branch: <maker's branch>
1. [CRITICAL] src/auth/handler.ts:48 — Empty string bypass → Add length check
2. [WARNING] src/auth/handler.ts:52 — Missing rate limit → Add middleware
3. [WARNING] tests/auth.test.ts:15 — Bad test names → Rename to behavior descriptions
Fix all three. Commit each as a separate commit.
Run tests after all fixes."
)
```
Batch only within the same functional area. Don't batch unrelated fixes — the Maker loses focus.
---
## Step 4: Exit Decision
After all fixes are applied, evaluate exit conditions:
### Decision Tree
```
┌─ Count remaining CRITICAL findings (including from re-checks)
├─ CRITICAL = 0 AND completion criteria met (if defined)
│ └─ EXIT: Proceed to merge
├─ CRITICAL = 0 AND completion criteria NOT met
│ └─ CYCLE: Feed back "completion criteria failing" to Creator
├─ CRITICAL > 0 AND cycles_remaining > 0
│ └─ CYCLE: Build feedback, go to Plan phase
├─ CRITICAL > 0 AND cycles_remaining = 0
│ └─ STOP: Report to user with unresolved findings
└─ Same CRITICAL finding persisted 2+ cycles
└─ ESCALATE: Stop and ask user for guidance
```
### Emit `cycle.boundary` event:
```json
{
"type": "cycle.boundary",
"phase": "act",
"data": {
"cycle": 1,
"max_cycles": 2,
"exit_condition": "all_approved",
"met": false,
"critical_remaining": 1,
"warning_remaining": 2,
"info_remaining": 1,
"fixes_applied": 3,
"design_issues_forwarded": 1,
"next_action": "cycle"
}
}
```
---
## Step 5: Cycle Feedback Protocol
When cycling back, produce `act-feedback.md` as a structured handoff. This replaces dumping raw findings.
```markdown
## Cycle N Feedback → Cycle N+1
### For Creator (design changes needed)
| # | Source | Severity | Category | Issue | Cycles Open |
|---|--------|----------|----------|-------|-------------|
| 1 | guardian | CRITICAL | security | SQL injection in user input | 1 |
| 2 | skeptic | WARNING | design | Assumes single-tenant only | 1 |
### For Maker (implementation fixes needed)
| # | Source | Severity | Category | Issue | Cycles Open |
|---|--------|----------|----------|-------|-------------|
| 3 | sage | WARNING | testing | Test assertions too weak | 1 |
| 4 | trickster | WARNING | reliability | Error path not tested | 1 |
### Resolved in This Cycle
| # | Source | Issue | How Resolved |
|---|--------|-------|--------------|
| 5 | guardian | Missing rate limit | Added rate limiter middleware (commit abc123) |
| 6 | sage | Test names unclear | Renamed to behavior descriptions (commit def456) |
### Persisting Issues (escalation candidates)
| # | Source | Issue | Cycles Open | Action |
|---|--------|-------|-------------|--------|
| — | — | — | — | — |
```
**Routing rules** (canonical table — matches orchestration and artifact-routing skills):
This is the **canonical routing table** (single source of truth for the whole system):
| Source | Category | Routes to | Reason |
|--------|----------|-----------|--------|
@@ -296,76 +50,91 @@ When cycling back, produce `act-feedback.md` as a structured handoff. This repla
| Sage | quality, consistency | Maker | Implementation refinement |
| Sage | testing | Maker | Test gap, not design flaw |
| Trickster | reliability (design flaw) | Creator | Needs redesign |
| Trickster | reliability (test gap) | Maker | Needs more tests |
| Trickster | testing | Maker | Edge case not covered |
| Trickster | reliability (test gap), testing | Maker | Needs more tests |
**Disambiguation rule:** When in doubt: if the fix requires changing the approach, route to Creator. If it requires changing the code within the existing approach, route to Maker.
**Disambiguation:** If the fix requires changing the approach → Creator. If it requires changing code within the existing approach → Maker.
### Direct Fix (no agent)
Apply with Edit tool when **all** are true:
- Mechanical (typo, naming, formatting, import order)
- No behavioral change
- No test update needed
- Single file
### Maker Fix (spawn agent)
Spawn a targeted Maker when the fix involves code logic, tests, multiple files, or behavioral changes. Batch findings in the same file area into one Maker spawn.
```
Agent(
description: "Fix: <description>",
prompt: "You are the MAKER archetype.
Branch: <maker's branch>
Findings:
1. [CRITICAL] file:line — issue → suggested fix
2. [WARNING] file:line — issue → suggested fix
Rules: fix ONLY these issues, add/update tests if behavior changes,
run tests, commit each fix separately as 'fix: <description>'.
Do NOT refactor surrounding code.",
isolation: "worktree",
mode: "bypassPermissions"
)
```
### Design Fix (route to Creator)
Design findings are NOT fixed in Act. Collect them into `act-feedback.md` for the Creator in the next cycle (see Step 5).
---
## Step 6: Incremental Runs
## Step 3: Fix Application
Support starting the orchestration from any phase by reusing existing artifacts.
Apply in severity order: CRITICAL → WARNING → INFO. Within same severity, group by file.
### `--start-from check`
Re-run Check + Act on existing Do artifacts:
1. Read `.archeflow/artifacts/<run_id>/` for Maker branch and implementation summary
2. Verify the Maker branch still exists (`git branch --list`)
3. Spawn reviewers against the existing branch
4. Proceed through Act phase normally
### `--start-from act`
Re-run Act with existing Check findings:
1. Read `.archeflow/artifacts/<run_id>/` for Check phase consolidated output
2. Parse findings from the stored reviewer outputs
3. Skip finding collection (already done) — proceed from Step 2 (Fix Routing)
### `--start-from do`
Re-run Do + Check + Act with existing Plan:
1. Read `.archeflow/artifacts/<run_id>/` for Creator's proposal
2. Verify proposal exists and is parseable
3. Spawn Maker with the existing proposal
4. Proceed through Check and Act normally
### Artifact Verification
Before starting from a mid-point, verify required artifacts exist:
```
--start-from do → needs: proposal (Creator output)
--start-from check → needs: proposal + implementation (Maker branch + summary)
--start-from act → needs: proposal + implementation + review outputs
```
If artifacts are missing, report which ones and abort. Don't guess or generate placeholders.
### Event Continuity
For incremental runs, emit events with `parent` pointing to the existing artifacts' events:
1. Read the existing `<run_id>.jsonl` to find the last `seq` number
2. Continue sequence numbering from there
3. Set `parent` on the first new event to point to the last event of the prior phase
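A minimal sketch of steps 1 and 2, assuming every JSONL event line carries a numeric `"seq"` field (the sample log below is illustrative):

```shell
#!/usr/bin/env bash
# Resume sequence numbering from an existing event log.
# Assumes each JSONL line contains a numeric "seq" field.
set -euo pipefail

events=$(mktemp)
cat > "$events" <<'EOF'
{"seq":1,"type":"run.start"}
{"seq":2,"type":"plan.start"}
{"seq":7,"type":"review.verdict"}
EOF

# Last "seq" value in file order; events are append-only, so this is the max.
last_seq=$(awk -F'"seq":' 'NF > 1 { split($2, a, /[,}]/); s = a[1] } END { print s + 0 }' "$events")
next_seq=$((last_seq + 1))
echo "$next_seq"

rm -f "$events"
```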
For each fix:
1. Apply the change (direct edit or via Maker agent)
2. Emit `fix.applied` event with source, finding, file, severity, before/after
3. For non-trivial fixes: re-run only the originating reviewer scoped to changed files. New findings from re-check get added with source `re-check:<reviewer>`
---
## Act Phase Checklist (Quick Reference)
## Step 4: Exit Decision
```
□ Parse all reviewer outputs into consolidated findings table
□ Deduplicate across reviewers
□ Compare against prior cycle findings (if cycle > 1)
□ Route each finding: direct fix / Maker / Creator feedback
□ Apply direct fixes first (fastest)
□ Spawn Maker(s) for code fixes (batch by file area)
□ Emit fix.applied event for each fix
□ Re-check non-trivial fixes with the originating reviewer
□ Count remaining CRITICALs after all fixes
□ Check completion criteria (if defined)
□ Decide: exit / cycle / escalate
□ If cycling: produce act-feedback.md with routed findings
□ If exiting: proceed to merge (see orchestration skill Step 4)
□ Emit cycle.boundary event
CRITICAL = 0 AND criteria met → EXIT: proceed to merge
CRITICAL = 0 AND criteria NOT met → CYCLE: feedback to Creator
CRITICAL > 0 AND cycles remaining → CYCLE: build feedback, go to Plan
CRITICAL > 0 AND no cycles left → STOP: report unresolved to user
Same CRITICAL persists 2+ cycles → ESCALATE: ask user for guidance
```
Emit `cycle.boundary` event with: cycle number, max_cycles, critical/warning/info remaining, fixes applied, next action.
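The decision table above can be sketched as a small shell function; the run-state inputs are hypothetical, and persistence is checked first as an escalation override:

```shell
#!/usr/bin/env bash
# Sketch of the Act-phase exit decision. Inputs are hypothetical run state:
#   critical      remaining CRITICAL findings
#   criteria_met  yes/no completion criteria
#   cycles_left   PDCA cycles remaining
#   persisted     max cycles any CRITICAL finding has persisted
set -euo pipefail

decide() {
  local critical=$1 criteria_met=$2 cycles_left=$3 persisted=$4
  if [ "$persisted" -ge 2 ]; then
    echo "ESCALATE"   # same CRITICAL persists 2+ cycles: ask user
  elif [ "$critical" -eq 0 ] && [ "$criteria_met" = "yes" ]; then
    echo "EXIT"       # clean: proceed to merge
  elif [ "$critical" -eq 0 ]; then
    echo "CYCLE"      # criteria failing: feedback to Creator
  elif [ "$cycles_left" -gt 0 ]; then
    echo "CYCLE"      # build feedback, go to Plan
  else
    echo "STOP"       # out of cycles: report unresolved to user
  fi
}

decide 1 no 1 0
```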
---
## Step 5: Cycle Feedback
When cycling back, produce `act-feedback.md`:
```markdown
## Cycle N → Cycle N+1
### For Creator (design changes needed)
| # | Source | Severity | Category | Issue | Cycles Open |
|---|--------|----------|----------|-------|-------------|
### For Maker (implementation fixes needed)
| # | Source | Severity | Category | Issue | Cycles Open |
|---|--------|----------|----------|-------|-------------|
### Resolved This Cycle
| # | Source | Issue | How Resolved |
|---|--------|-------|--------------|
### Persisting Issues (escalation candidates)
| # | Source | Issue | Cycles Open | Action |
|---|--------|-------|-------------|--------|
```
Route findings into Creator vs Maker sections using the routing table in Step 2.

skills/af-dag/SKILL.md Normal file

@@ -0,0 +1,34 @@
---
name: af-dag
description: |
Show the DAG of the current or last ArcheFlow run.
<example>User: "/af-dag"</example>
<example>User: "/af-dag 2026-04-06-jwt-auth"</example>
---
# ArcheFlow Run DAG
1. Parse `run_id` from args. If none provided, read the latest run_id from `.archeflow/events/index.jsonl`.
2. Run `./lib/archeflow-dag.sh .archeflow/events/<run_id>.jsonl` if the script exists. Display its output.
3. If the script does not exist, read `.archeflow/events/<run_id>.jsonl` and render a text DAG:
- Each node is an event (phase transitions, agent starts/completes, findings).
- Show parent relationships via indentation.
- Mark completed events with `[done]`, active with `[running]`, failed with `[FAIL]`.
Example output:
```
run.start 2026-04-06-jwt-auth
plan.start
agent.complete explorer (42s)
agent.complete creator (68s)
do.start
agent.complete maker (180s)
check.start
agent.complete guardian (55s) -- 3 findings
agent.complete skeptic (40s) -- 1 finding
act.start
fixes.applied 3/4
run.complete (6m12s)
```
4. If no events found for the run_id, say: "No events found for run `<run_id>`."

skills/af-replay/SKILL.md Normal file

@@ -0,0 +1,42 @@
---
name: af-replay
description: "Replay and analyze a recorded ArcheFlow run: decision timeline and weighted what-if. Usage: /af-replay <run-id> [--timeline|--whatif|--compare] [--weights arch=w,...]"
user-invocable: true
---
# ArcheFlow Run Replay
Inspect a completed or in-progress run logged in `.archeflow/events/<run_id>.jsonl`. Use this to study which archetypes drove outcomes and to simulate **weighted** consensus (what-if).
## Recording (during PDCA)
After each meaningful orchestration choice, log a **decision point** (in addition to `review.verdict` where applicable):
```bash
./lib/archeflow-decision.sh <run_id> <phase> <archetype> '<input_summary>' '<decision>' <confidence> [parent_seq]
```
Fields stored: `phase`, `archetype`, `input`, `decision`, `confidence`, `ts` (event timestamp). The event type is `decision.point`.
Lower-level alternative:
```bash
./lib/archeflow-event.sh "$RUN_ID" decision.point check guardian \
'{"archetype":"guardian","input":"diff","decision":"needs_changes","confidence":0.85}' 7
```
## Commands (from project root)
| Action | Shell |
|--------|--------|
| Timeline | `./lib/archeflow-replay.sh timeline <run_id>` |
| What-if | `./lib/archeflow-replay.sh whatif <run_id> [--weights guardian=2,sage=0.5] [--threshold 0.5] [--json]` |
| Both | `./lib/archeflow-replay.sh compare <run_id> [--weights ...]` |
- **Timeline** lists `decision.point` rows and `review.verdict` (check phase).
- **What-if** reads the **last** `review.verdict` per archetype in check. **Original** outcome uses strict any-veto (any non-approve → BLOCK). **Replay** uses weighted mean strictness: each reviewer contributes weight × (1 if not approved, else 0); BLOCK if mean ≥ threshold (default 0.5).
- **`--json`** emits machine-readable output for dashboards or scripts.
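The weighted replay rule can be sketched in a few lines of shell; the verdicts, weights, and threshold below are hypothetical inputs, not read from a real run:

```shell
#!/usr/bin/env bash
# Sketch of the weighted-mean strictness rule described above.
# Each reviewer contributes weight * (1 if not approved, else 0);
# verdict is BLOCK if the weighted mean >= threshold.
set -euo pipefail

verdicts="guardian:needs_changes sage:approve trickster:approve"
weights="guardian=2 sage=0.5"   # unlisted archetypes default to weight 1
threshold=0.5

result=$(awk -v verdicts="$verdicts" -v weights="$weights" -v t="$threshold" 'BEGIN {
  n = split(weights, wp, " ")
  for (i = 1; i <= n; i++) { split(wp[i], kv, "="); w[kv[1]] = kv[2] }
  m = split(verdicts, vp, " ")
  for (i = 1; i <= m; i++) {
    split(vp[i], av, ":")
    weight = (av[1] in w) ? w[av[1]] : 1
    strict = (av[2] == "approve") ? 0 : 1
    num += weight * strict
    den += weight
  }
  mean = (den > 0) ? num / den : 0
  out = (mean >= t) ? "BLOCK" : "PASS"
  print out
}')
echo "$result"
```

With these inputs the weighted mean is 2 / 3.5 ≈ 0.57, so the replay blocks even though two of three reviewers approved.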
## Learning effectiveness
Correlate `decision.point` confidence and verdicts with cycle outcomes (`cycle.boundary`, `run.complete`) and `./lib/archeflow-score.sh extract` to see which archetypes add signal for which task shapes.

skills/af-report/SKILL.md Normal file

@@ -0,0 +1,40 @@
---
name: af-report
description: |
Generate a full process report for an ArcheFlow run.
<example>User: "/af-report"</example>
<example>User: "/af-report 2026-04-06-jwt-auth"</example>
---
# ArcheFlow Run Report
1. Parse `run_id` from args. If none provided, read the latest run_id from `.archeflow/events/index.jsonl`.
2. Run `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl` if the script exists. Display its output.
3. If the script does not exist, read `.archeflow/events/<run_id>.jsonl` and produce a markdown report:
```markdown
# ArcheFlow Report: <run_id>
## Overview
| Field | Value |
|-------|-------|
| Task | ... |
| Workflow | fast/standard/thorough |
| Cycles | N |
| Duration | Xm Ys |
| Total Cost | $X.XX |
## Phase Summary
For each phase (Plan, Do, Check, Act): agents involved, duration, token cost, key outputs.
## Findings
Table of all findings: severity, category, description, archetype source, resolution (fixed/dismissed/deferred).
## Fixes Applied
List of fixes with before/after summary and which finding they addressed.
## Lessons Learned
Any new lessons extracted to memory during this run.
```
4. If no events found for the run_id, say: "No events found for run `<run_id>`."

skills/af-score/SKILL.md Normal file

@@ -0,0 +1,23 @@
---
name: af-score
description: |
Show archetype effectiveness scores across runs.
<example>User: "/af-score"</example>
---
# ArcheFlow Effectiveness Scores
1. Run `./lib/archeflow-score.sh list` if the script exists. Display its output.
2. If the script does not exist, read `.archeflow/memory/effectiveness.jsonl` directly.
3. Summarize per archetype as a table:
| Archetype | Runs | Signal/Noise | Fix Rate | Avg Cost |
|-----------|------|--------------|----------|----------|
| Guardian | ... | ... | ... | ... |
| Skeptic | ... | ... | ... | ... |
- **Signal/Noise**: findings that led to actual fixes vs total findings raised.
- **Fix Rate**: percentage of findings that were applied (not dismissed).
- **Avg Cost**: mean token cost per review across runs.
4. If no effectiveness data exists, say: "No effectiveness data yet. Run `/af-run` at least once."
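A minimal sketch of the fallback aggregation in steps 2-3. The field names (`archetype`, `findings`, `fixed`) are assumed, not a documented schema, and the fixture replaces the real `.archeflow/memory/effectiveness.jsonl`.

```shell
tmp=$(mktemp -d)
cat > "$tmp/effectiveness.jsonl" <<'EOF'
{"archetype":"guardian","findings":4,"fixed":3}
{"archetype":"guardian","findings":2,"fixed":1}
{"archetype":"skeptic","findings":3,"fixed":1}
EOF

# Aggregate runs and fix rate per archetype without jq, using awk's match().
scores=$(awk '
{
  match($0, /"archetype":"[^"]*"/); a = substr($0, RSTART+13, RLENGTH-14)
  match($0, /"findings":[0-9]+/);   f = substr($0, RSTART+11, RLENGTH-11)
  match($0, /"fixed":[0-9]+/);      x = substr($0, RSTART+8,  RLENGTH-8)
  runs[a]++; tot[a] += f; fix[a] += x
}
END {
  for (a in runs)
    printf "%s runs=%d fix_rate=%d%%\n", a, runs[a], int(100 * fix[a] / tot[a])
}' "$tmp/effectiveness.jsonl")
echo "$scores"
```

The output feeds the table in step 3; signal/noise and average cost follow the same per-archetype accumulation pattern.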

skills/af-status/SKILL.md

@@ -0,0 +1,25 @@
---
name: af-status
description: |
Show ArcheFlow status — current/last run, active agents, findings.
<example>User: "/af-status"</example>
---
# ArcheFlow Status
1. Read `.archeflow/state.json` if it exists. Extract: task, phase, cycle, workflow, active agents, findings count, start time.
2. If `state.json` does not exist, read the latest entry from `.archeflow/events/index.jsonl`. Extract run_id, task, last event type, timestamp.
3. Calculate duration from start time to now (or to completion time if run finished).
4. Report as a compact table:
| Field | Value |
|-------|-------|
| Run | `<run_id>` |
| Task | `<task description>` |
| Phase | `<current phase>` |
| Cycle | `<cycle number>` |
| Workflow | `<fast/standard/thorough>` |
| Findings | `<count>` |
| Duration | `<elapsed>` |
5. If no `state.json` and no `index.jsonl`, say: "No active or recent ArcheFlow runs."
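The duration arithmetic in step 3 can be sketched with epoch seconds, which sidesteps the non-portable `date -d` parsing of ISO timestamps (GNU-only). The epoch values here are illustrative stand-ins for the state file's start time and `$(date +%s)`.

```shell
start_epoch=1743976800            # start time from state.json, pre-converted to epoch
now_epoch=$((start_epoch + 754))  # stand-in for $(date +%s)

elapsed=$((now_epoch - start_epoch))
duration=$(printf '%dm %ds' $((elapsed / 60)) $((elapsed % 60)))
echo "$duration"   # 12m 34s
```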


@@ -1,289 +0,0 @@
---
name: artifact-routing
description: |
Inter-phase artifact protocol for ArcheFlow runs. Defines how artifacts are named, stored,
routed between agents, and archived across PDCA cycles. Ensures each agent receives exactly
the context it needs — no more, no less.
<example>Automatically loaded by archeflow:run</example>
<example>User: "What does the Maker receive as context?"</example>
---
# Artifact Routing — Inter-Phase Context Protocol
Every ArcheFlow run produces artifacts — research notes, proposals, diffs, reviews, feedback. This skill defines how those artifacts are named, where they live, what each agent receives, and how they are preserved across cycles.
## Artifact Directory Structure
```
.archeflow/artifacts/<run_id>/
├── plan-explorer.md # Explorer research output
├── plan-creator.md # Creator proposal/outline
├── do-maker.md # Maker implementation summary
├── do-maker-files.txt # List of files created/modified (one path per line)
├── check-guardian.md # Guardian review verdict + findings
├── check-sage.md # Sage review (if present)
├── check-skeptic.md # Skeptic review (if present)
├── check-trickster.md # Trickster review (if present)
├── act-feedback.md # Structured feedback for next cycle (Cycle Feedback Protocol)
├── act-fixes.jsonl # Applied fixes log (one JSON line per fix)
├── cycle-1/ # Archived artifacts from cycle 1
│ ├── plan-explorer.md
│ ├── plan-creator.md
│ ├── do-maker.md
│ ├── do-maker-files.txt
│ ├── check-guardian.md
│ ├── check-sage.md
│ └── act-feedback.md
└── cycle-2/ # Archived artifacts from cycle 2 (if cycle 3 starts)
└── ...
```
## Naming Convention
Artifacts follow the pattern: `<phase>-<agent>.<ext>`
| Phase | Agent | Filename | Format |
|-------|-------|----------|--------|
| plan | explorer | `plan-explorer.md` | Markdown research report |
| plan | creator | `plan-creator.md` | Markdown proposal with confidence scores |
| plan | mini-explorer | `plan-mini-explorer.md` | Focused risk research (only if confidence gate triggers) |
| do | maker | `do-maker.md` | Markdown implementation summary |
| do | maker | `do-maker-files.txt` | Plain text, one file path per line |
| check | guardian | `check-guardian.md` | Markdown verdict + findings table |
| check | sage | `check-sage.md` | Markdown verdict + findings table |
| check | skeptic | `check-skeptic.md` | Markdown verdict + findings table |
| check | trickster | `check-trickster.md` | Markdown verdict + findings table |
| act | (orchestrator) | `act-feedback.md` | Structured feedback (see Cycle Feedback Protocol) |
| act | (orchestrator) | `act-fixes.jsonl` | JSONL fix log |
**Rule:** Never invent new artifact names during a run. If a reviewer is skipped (A2 fast-path, reviewer profile), its artifact simply does not exist. Downstream phases check for file existence before reading.
---
## Context Injection Rules
Each agent receives a filtered subset of artifacts. This is the **attention filter** — it controls what context is injected into the agent's prompt.
### Plan Phase
| Agent | Receives | Does NOT receive |
|-------|----------|-----------------|
| **Explorer** | Task description, relevant file paths, codebase access | Prior proposals, review outputs, implementation details |
| **Creator** (cycle 1) | Task description, `plan-explorer.md` (if exists) | Raw file contents (Explorer summarized them), git diffs |
| **Creator** (cycle 2+) | Task description, `plan-explorer.md`, `act-feedback.md` (Creator-routed findings only) | Raw reviewer outputs, Maker-routed findings |
**Creator context injection template (cycle 2+):**
```markdown
## Task
<task description>
## Research (from Explorer)
<contents of plan-explorer.md>
## Feedback from Prior Cycle
<Creator-routed section of act-feedback.md only>
Note: Address each unresolved issue listed above. Explain how your revised proposal resolves it.
```
### Do Phase
| Agent | Receives | Does NOT receive |
|-------|----------|-----------------|
| **Maker** (cycle 1) | `plan-creator.md` (the proposal), `plan-mini-explorer.md` (if exists) | `plan-explorer.md`, reviewer outputs, raw task description |
| **Maker** (cycle 2+) | `plan-creator.md`, `plan-mini-explorer.md` (if exists), Maker-routed findings from `act-feedback.md` | Explorer research, Guardian/Skeptic findings (those went to Creator) |
**Maker context injection template (cycle 2+):**
```markdown
## Proposal
<contents of plan-creator.md>
## Implementation Feedback from Prior Cycle
<Maker-routed section of act-feedback.md only>
Note: The proposal has been revised to address design-level issues. Focus on the implementation
feedback items above (code quality, test gaps, consistency).
```
**Why Maker doesn't get Explorer output:** The Creator already distilled Explorer's research into a concrete proposal. Giving Maker raw research causes scope creep and "Rogue" shadow activation.
### Check Phase
| Agent | Receives | Does NOT receive |
|-------|----------|-----------------|
| **Guardian** | Maker's git diff, risk section from `plan-creator.md` | Full proposal, Explorer research, other reviewer outputs |
| **Skeptic** | `plan-creator.md` (assumptions focus) | Git diff details, Explorer research, other reviewer outputs |
| **Sage** | `plan-creator.md`, Maker's git diff, `do-maker.md` | Explorer research, other reviewer outputs |
| **Trickster** | Maker's git diff only | Everything else |
**Guardian context injection template:**
```markdown
## Changes to Review
<git diff from Maker's branch>
## Risk Assessment (from proposal)
<risks section extracted from plan-creator.md>
Review these changes for security, reliability, breaking changes, and dependency risks.
```
**Skeptic context injection template:**
```markdown
## Proposal to Challenge
<contents of plan-creator.md>
Focus on assumptions, alternatives not considered, edge cases, and scalability.
```
**Sage context injection template:**
```markdown
## Proposal
<contents of plan-creator.md>
## Implementation Summary
<contents of do-maker.md>
## Changes
<git diff from Maker's branch>
Evaluate code quality, test coverage, documentation, and codebase consistency.
```
**Trickster context injection template:**
```markdown
## Changes to Attack
<git diff from Maker's branch>
Try to break this. Malformed input, boundaries, concurrency, error paths, dependency failures.
```
### Act Phase
No agents are spawned in Act. The orchestrator reads all `check-*.md` artifacts directly.
---
## Feedback Routing
> **This is the canonical routing table.** Other skills (orchestration, act-phase) must match this table exactly. When updating routing rules, update this table first, then sync the others.
When building `act-feedback.md` after the Check phase, route each finding to the right agent for the next cycle:
| Finding Source | Finding Category | Routes To | Rationale |
|---------------|-----------------|-----------|-----------|
| Guardian | security, breaking-change | **Creator** | Design must change |
| Guardian | reliability, dependency | **Creator** | Architectural decision needed |
| Skeptic | design, scalability | **Creator** | Assumptions need revision |
| Sage | quality, consistency | **Maker** | Implementation refinement |
| Sage | testing | **Maker** | Test gap, not design flaw |
| Trickster | reliability (design flaw) | **Creator** | Needs redesign |
| Trickster | reliability (test gap) | **Maker** | Needs more tests |
| Trickster | testing | **Maker** | Edge case not covered |
**Disambiguation rule:** When in doubt: if the fix requires changing the approach, route to Creator. If it requires changing the code within the existing approach, route to Maker.
### Feedback File Format
`act-feedback.md` is split into two sections so each agent can be given only its portion:
```markdown
# Cycle <N> Feedback
## Creator-Routed Issues
| # | Source | Severity | Category | Issue | Suggested Fix |
|---|--------|----------|----------|-------|---------------|
| 1 | Guardian | CRITICAL | security | SQL injection in user input | Add parameterized queries |
| 2 | Skeptic | WARNING | design | Assumes single-tenant only | Add tenant isolation |
## Maker-Routed Issues
| # | Source | Severity | Category | Issue | Suggested Fix |
|---|--------|----------|----------|-------|---------------|
| 3 | Sage | WARNING | quality | Test names don't describe behavior | Rename to describe expected outcome |
| 4 | Sage | INFO | consistency | Import order doesn't match codebase style | Re-order imports |
## Resolved (from prior cycles)
| # | Source | Issue | Resolution | Resolved In |
|---|--------|-------|------------|-------------|
| 1 | Guardian | Missing rate limit | Added rate limiter middleware | Cycle 1 |
## Convergence Warnings
<any finding that appeared unresolved in 2+ consecutive cycles — requires user input>
```
When injecting feedback into Creator's prompt, include **only** the "Creator-Routed Issues" section.
When injecting feedback into Maker's prompt, include **only** the "Maker-Routed Issues" section.
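The section-scoped injection above can be sketched with a one-line awk filter that relies on the `## ` heading convention of `act-feedback.md`. The fixture file is illustrative.

```shell
tmp=$(mktemp -d)
cat > "$tmp/act-feedback.md" <<'EOF'
# Cycle 2 Feedback
## Creator-Routed Issues
| 1 | Guardian | CRITICAL | security | SQL injection | Parameterize |
## Maker-Routed Issues
| 3 | Sage | WARNING | quality | Weak test names | Rename |
EOF

# Toggle printing on at the wanted heading, off again at the next "## " heading.
creator_section=$(awk '/^## /{p = ($0 == "## Creator-Routed Issues")} p' "$tmp/act-feedback.md")
echo "$creator_section"
```

The same filter with `"## Maker-Routed Issues"` produces the Maker portion; neither agent ever sees the other's rows.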
---
## Cycle Archiving
When a PDCA cycle completes and a new cycle begins, archive the current artifacts so they are preserved and the working directory is clean for the next iteration.
### Archive Procedure
At the end of each cycle (before starting the next):
```bash
RUN_DIR=".archeflow/artifacts/${RUN_ID}"
ARCHIVE_DIR="${RUN_DIR}/cycle-${CYCLE}"
mkdir -p "$ARCHIVE_DIR"
# Copy all phase artifacts to archive
cp "${RUN_DIR}"/plan-*.md "$ARCHIVE_DIR/" 2>/dev/null || true
cp "${RUN_DIR}"/do-*.md "$ARCHIVE_DIR/" 2>/dev/null || true
cp "${RUN_DIR}"/do-*.txt "$ARCHIVE_DIR/" 2>/dev/null || true
cp "${RUN_DIR}"/check-*.md "$ARCHIVE_DIR/" 2>/dev/null || true
cp "${RUN_DIR}"/act-feedback.md "$ARCHIVE_DIR/" 2>/dev/null || true
```
**Do NOT delete** the working-level artifacts after archiving. The next cycle's agents need `act-feedback.md` and `plan-explorer.md` (Explorer cache may reuse prior research). Old artifacts in the working directory get overwritten when the new cycle's agents produce their outputs.
### Archive Access
Archived artifacts are read-only references. Use them for:
- **Resolution tracking:** Compare `cycle-1/check-guardian.md` findings against `cycle-2/check-guardian.md` to detect resolved/persisting issues
- **Convergence detection:** Same finding in `cycle-N/act-feedback.md` and `cycle-N+1/act-feedback.md` → escalate to user
- **Post-hoc analysis:** Understanding how a solution evolved across cycles
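Convergence detection across archives can be sketched as a set intersection on finding keys (category + location). The key files below are an illustrative fixture; the step that extracts keys from `act-feedback.md` tables is not shown.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'security src/auth.ts:48' 'testing tests/auth.test.ts:15' > "$tmp/cycle-1.keys"
printf '%s\n' 'security src/auth.ts:48' 'quality src/db.ts:10'          > "$tmp/cycle-2.keys"

# Keys are unique within each cycle, so any line duplicated across the two
# files is a finding that persisted unresolved and should be escalated.
persisting=$(sort "$tmp/cycle-1.keys" "$tmp/cycle-2.keys" | uniq -d)
echo "$persisting"   # security src/auth.ts:48
```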
---
## Artifact Existence Checks
Before injecting an artifact into an agent's context, always check if the file exists. Missing artifacts are expected in certain workflows:
| Artifact | Missing when |
|----------|-------------|
| `plan-explorer.md` | Fast workflow (no Explorer) |
| `plan-mini-explorer.md` | Confidence gate did not trigger for risk coverage |
| `check-skeptic.md` | Fast workflow, or A2 fast-path taken |
| `check-sage.md` | Fast workflow, or A2 fast-path taken |
| `check-trickster.md` | Non-thorough workflow, or A2 fast-path taken |
| `act-feedback.md` | Cycle 1 (no prior feedback exists) |
| `act-fixes.jsonl` | Cycle 1, or no fixes applied |
**Rule:** Never fail because an optional artifact is missing. Check existence, skip injection if absent, and note what was skipped in the event data.
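The check-skip-note rule can be sketched as a small helper. The function name and skip-log path are illustrative, not part of the protocol.

```shell
tmp=$(mktemp -d)
echo "research notes" > "$tmp/plan-explorer.md"
# plan-mini-explorer.md deliberately absent (confidence gate did not trigger)

inject_if_present() {
  path="$1"; label="$2"; skiplog="$3"
  if [ -f "$path" ]; then
    # Artifact exists: emit it as a labeled context section.
    printf '## %s\n' "$label"
    cat "$path"
  else
    # Optional artifact missing: skip injection, note it for the event data.
    echo "skipped: $path" >> "$skiplog"
  fi
}

ctx=$(inject_if_present "$tmp/plan-explorer.md" "Research (from Explorer)" "$tmp/skipped.log")
inject_if_present "$tmp/plan-mini-explorer.md" "Risk Research" "$tmp/skipped.log"
echo "$ctx"
cat "$tmp/skipped.log"
```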
---
## Git Diff as Artifact
The Maker's git diff is not saved as a file — it is generated on-the-fly from the Maker's worktree branch:
```bash
git diff main...<maker-branch>
```
This ensures reviewers always see the actual current diff, not a stale snapshot. The diff is injected directly into reviewer prompts, not saved to disk.
Exception: `do-maker-files.txt` IS saved to disk (just the file list, not the full diff) for quick reference by the orchestrator and for archiving purposes.
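One way `do-maker-files.txt` can be produced from the Maker branch is sketched below. The repository setup is a throwaway fixture; in a real run only the final `git diff --name-only` line applies, against the project repo.

```shell
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
git symbolic-ref HEAD refs/heads/main
echo base > app.txt
git add app.txt
git -c user.email=af@example.com -c user.name=archeflow commit -qm init
git checkout -q -b archeflow/maker-abc
echo change > app.txt
echo helper > lib.txt
git add .
git -c user.email=af@example.com -c user.name=archeflow commit -qm work

# File list only; the full diff is regenerated on demand for reviewers.
git diff --name-only main...archeflow/maker-abc > do-maker-files.txt
cat do-maker-files.txt
```

The three-dot range diffs from the merge base, so files touched on `main` after the branch point do not leak into the Maker's file list.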
---
## Design Principles
1. **Minimal context per agent.** Each agent gets only what it needs. Over-injection causes distraction, shadow activation, and wasted tokens.
2. **Artifacts are the handoff mechanism.** Agents never communicate directly. All inter-agent data flows through saved artifacts.
3. **Files over memory.** Everything is on disk. If a session crashes, artifacts survive. A `--start-from` resume reads artifacts, not session state.
4. **Overwrite, don't accumulate.** Working-level artifacts get overwritten each cycle. Archives preserve history. This keeps the working directory simple.
5. **Check before inject.** Always verify artifact existence. Gracefully handle missing optional artifacts.


@@ -1,39 +0,0 @@
---
name: attention-filters
description: Use when spawning archetype agents to decide what context each agent receives. Reduces token waste and sharpens focus by passing only relevant artifacts.
---
# Attention Filters
Each archetype needs different context. Pass only what's relevant — not everything.
| Archetype | Receives | Does NOT Receive |
|-----------|----------|-----------------|
| Explorer | Task description, codebase access | Prior proposals or reviews |
| Creator | Explorer's research + task description | Implementation details |
| Maker | Creator's proposal | Explorer's research, reviews |
| Guardian | Maker's git diff + proposal risk section | Explorer's research |
| Skeptic | Creator's proposal (focus: assumptions) | Git diff details |
| Trickster | Maker's git diff only | Everything else |
| Sage | Proposal + implementation + diff | Explorer's raw research |
## Why This Matters
- **Token cost:** A Guardian reading the Explorer's 2000-word research wastes ~2600 tokens on irrelevant context
- **Focus:** An agent with too much context drifts from its archetype's concern
- **Shadow prevention:** Over-loading context encourages rabbit-holing (Explorer) and scope creep (Maker)
## In Practice
When spawning a Check-phase agent, include only the filtered context in the prompt:
```
# Guardian receives:
"Review these changes: <git diff output>
The proposal identified these risks: <risks section only>
Verdict: APPROVED or REJECTED with findings."
# NOT:
"Here is the full research, the full proposal, the full implementation,
the full git log, and everything else we have..."
```


@@ -1,221 +1,70 @@
---
name: autonomous-mode
description: Use when the user wants to run ArcheFlow orchestrations unattended -- overnight sessions, batch processing multiple tasks, or fully autonomous coding. Handles self-organization, progress logging, and safe stopping.
---
# Autonomous Mode
ArcheFlow orchestrations can run fully autonomously because the archetypes self-organize through the PDCA cycle. The user sets the task queue, walks away, and reviews results later.
## How Autonomous Mode Works
The PDCA cycle provides natural quality gates at every turn of the spiral:
- **Plan** phase produces a proposal — reviewable artifact
- **Do** phase produces committed code in a worktree — isolated, reversible
- **Check** phase produces approval/rejection — automatic quality control
- **Act** phase either merges (safe) or cycles back (self-correcting)
No unreviewed code reaches the main branch. Ever. That's what makes overnight runs safe.
## Starting an Autonomous Session
```
You are entering AUTONOMOUS MODE.
Task queue:
1. "Add input validation to all API endpoints" (thorough)
2. "Refactor auth middleware to use JWT" (standard)
3. "Fix pagination bug in search results" (fast)
4. "Add rate limiting to public endpoints" (standard)
Rules:
- Process tasks sequentially (one orchestration at a time)
- Log progress to .archeflow/session-log.md after each task
- If a task fails after max cycles: log findings, skip to next task
- If 3 consecutive tasks fail: STOP and wait for user
- Commit and push after each successful merge
- Never force-push. Never modify main history.
```
## Session Log — Full Visibility
Every autonomous session writes to `.archeflow/session-log.md`:
```markdown
# ArcheFlow Autonomous Session
**Started:** 2026-04-02 22:00 UTC
**Mode:** autonomous
**Tasks:** 4 queued
---
## Task 1: Add input validation to all API endpoints
**Workflow:** thorough | **Status:** COMPLETED
**Cycles:** 2 of 3
**Cycle 1:** Guardian REJECTED (missing sanitization on 2 endpoints)
**Cycle 2:** All APPROVED
**Files changed:** 8 | **Tests added:** 24
**Branch:** merged to main (commit abc1234)
**Duration:** 12 min | **Completed:** 22:12 UTC
---
## Task 2: Refactor auth middleware to use JWT
**Workflow:** standard | **Status:** COMPLETED
**Cycles:** 1 of 2
**Cycle 1:** All APPROVED (clean implementation)
**Files changed:** 5 | **Tests added:** 15
**Branch:** merged to main (commit def5678)
**Duration:** 8 min | **Completed:** 22:20 UTC
---
## Task 3: Fix pagination bug in search results
**Workflow:** fast | **Status:** COMPLETED
**Cycles:** 1 of 1
**Cycle 1:** Guardian APPROVED
**Files changed:** 2 | **Tests added:** 3
**Branch:** merged to main (commit ghi9012)
**Duration:** 4 min | **Completed:** 22:24 UTC
---
## Task 4: Add rate limiting to public endpoints
**Workflow:** standard | **Status:** FAILED (max cycles)
**Cycles:** 2 of 2
**Cycle 1:** Skeptic REJECTED (Redis dependency not in Docker setup)
**Cycle 2:** Guardian REJECTED (race condition in token bucket)
**Unresolved:** Race condition in concurrent token bucket decrement
**Branch:** archeflow/maker-xyz (NOT merged — available for manual review)
**Duration:** 15 min | **Completed:** 22:39 UTC
---
## Session Summary
**Completed:** 3 of 4 tasks
**Failed:** 1 (rate limiting — needs human input on concurrency design)
**Total duration:** 39 min
**Files changed:** 15 | **Tests added:** 42
**Ended:** 22:39 UTC
```
## Safety Mechanisms
### Automatic Stop Conditions
The session halts and waits for the user when:
- **3 consecutive failures:** Something systemic is wrong
- **Destructive action detected:** Force push, branch deletion, schema drop
- **Shadow escalation:** Same shadow detected 3+ times across tasks
- **Budget exceeded:** If cost tracking is enabled, stop at budget limit
- **Test suite broken:** If existing tests fail after merge, halt immediately and revert
### Everything is Reversible
- Code changes live on worktree branches until explicitly merged
- Merges use `--no-ff` — every merge commit is individually revertable
- The session log captures every decision for post-hoc review
- Failed tasks leave their branches intact for manual inspection
### User Controls
The user can at any time:
- **Cancel:** Kill the session. All incomplete work stays on branches.
- **Pause:** Stop after current task completes. Resume later.
- **Skip:** Skip the current task, move to the next one.
- **Review:** Read `.archeflow/session-log.md` for real-time progress.
- **Intervene:** Jump into a worktree branch and fix something manually.
## Task Queue Formats
### Simple (inline)
```
Tasks:
1. "Fix the login bug" (fast)
2. "Add user profile page" (standard)
```
### From File
Create `.archeflow/queue.md`:
```markdown
- [ ] Fix the login bug | fast
- [ ] Add user profile page | standard | depends: fix login
- [ ] Security audit of payment flow | thorough | done: Guardian approves AND load_test.sh passes
- [x] Refactor database queries | standard (completed)
```
### With Dependencies
```markdown
- [ ] Add user model (standard)
- [ ] Add user API endpoints (standard) | depends: user model
- [ ] Add user UI (standard) | depends: user API endpoints
```
Dependencies are processed in order: a task with `depends: X` waits until X completes successfully. Tasks without dependencies or with resolved dependencies can run in parallel (see Parallel Team Orchestration in the orchestration skill).
### With Completion Criteria
```markdown
- [ ] Fix login bug | fast | done: login_test.py passes
- [ ] Add rate limiting | standard | done: Guardian approves AND load_test.sh passes
```
Completion criteria are checked in the Act phase. If the test command fails even when reviewers approve, the task cycles back.
## Budget-Aware Scheduling
Set a token or cost budget for the session. The orchestrator tracks estimated cost per task and adapts:
```
Budget: $5.00 (or ~2M tokens)
```
| Budget Remaining | Action |
|-----------------|--------|
| > 50% | Run tasks at their selected workflow level |
| 25-50% | Downgrade `thorough` → `standard`, `standard` → `fast` |
| < 25% | Run remaining tasks as `fast` only |
| Exhausted | Stop. Log remaining tasks as "skipped — budget exhausted" |
Budget is tracked per-task in the session log. Estimated cost per agent by model tier:
| Tier | Model | Est. Cost/Agent |
|------|-------|----------------|
| cheap | Haiku | ~$0.01 |
| standard | Sonnet | ~$0.05 |
| premium | Opus | ~$0.25 |
A standard workflow (6 agents, mostly Sonnet) costs ~$0.30. A thorough workflow (8 agents) costs ~$0.50. These are rough estimates — actual cost depends on context size and output length.
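The downgrade table can be sketched as a small function; the thresholds mirror the table above, and the function name is illustrative.

```shell
downgrade() {
  pct="$1"; wf="$2"   # pct = budget remaining (%), wf = selected workflow
  if [ "$pct" -gt 50 ]; then
    echo "$wf"                     # plenty of budget: run as selected
  elif [ "$pct" -ge 25 ]; then
    case "$wf" in                  # mid budget: step down one level
      thorough) echo standard ;;
      standard) echo fast ;;
      *)        echo "$wf" ;;
    esac
  elif [ "$pct" -gt 0 ]; then
    echo fast                      # low budget: fast only
  else
    echo skip                      # exhausted: log task as skipped
  fi
}

downgrade 60 thorough   # thorough
downgrade 40 thorough   # standard
downgrade 10 thorough   # fast
downgrade 0  standard   # skip
```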
## Auto-Resume on Interruption
If a session is interrupted (crash, timeout, user cancel), save state for resumption:
### On Interruption
Write `.archeflow/state.json`:
```json
{
"session_id": "...",
"current_task": 2,
"current_phase": "check",
"current_cycle": 1,
"completed_tasks": [1],
"queue": ["task3", "task4"],
"worktree_branch": "archeflow/maker-abc",
"timestamp": "2026-04-03T22:15:00Z"
}
```
### On Next Session Start
If `.archeflow/state.json` exists:
1. Report: "Found interrupted ArcheFlow session from [timestamp]. Task [N] was in [phase] phase."
2. Offer: "Resume from where we left off? Or start fresh?"
3. If resume: pick up from the saved phase. The worktree branch is still intact.
4. If fresh: clean up state file and worktrees, start over.
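The resume check at session start can be sketched as follows. The `sed`-based field extraction assumes the flat `state.json` shape shown above, and the fixture replaces the real `.archeflow/state.json`.

```shell
tmp=$(mktemp -d)
cat > "$tmp/state.json" <<'EOF'
{"session_id":"s1","current_task":2,"current_phase":"check","timestamp":"2026-04-03T22:15:00Z"}
EOF

if [ -f "$tmp/state.json" ]; then
  # Pull the fields needed for the resume prompt (jq-free extraction).
  phase=$(sed 's/.*"current_phase":"\([^"]*\)".*/\1/' "$tmp/state.json")
  task=$(sed 's/.*"current_task":\([0-9]*\).*/\1/' "$tmp/state.json")
  ts=$(sed 's/.*"timestamp":"\([^"]*\)".*/\1/' "$tmp/state.json")
  msg="Found interrupted ArcheFlow session from $ts. Task $task was in $phase phase."
else
  msg="No interrupted session."
fi
echo "$msg"
```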
## Overnight Session Checklist
Before starting an autonomous overnight session:
1. **Clean working tree:** `git status` — no uncommitted changes
2. **Tests passing:** Run the full test suite. Don't start on a broken baseline.
3. **Task queue defined:** Either inline or in `.archeflow/queue.md`
4. **Workflow selected per task:** Match risk level to workflow type
5. **Budget set (optional):** If cost matters, set a token/dollar limit
6. **Push access:** Verify git push works (SSH key, auth token)
Then: set it, forget it, read the session log in the morning.


@@ -1,85 +1,110 @@
---
name: check-phase
description: Use when acting as Guardian, Skeptic, Sage, or Trickster in the Check phase. Defines review rules, finding format, attention filters, and spawning protocol.
---
# Check Phase
Multiple reviewers examine the Maker's implementation in parallel. Each agent definition has its specific protocol; this skill defines the shared rules, finding format, and spawning protocol.
## Shared Rules
1. **Read the proposal first.** Review against the intended design, not invented requirements.
2. **Read the actual code.** Use `git diff` on the Maker's branch. Don't review descriptions alone.
3. **Structured findings.** Use the standardized finding format below for every issue.
4. **Clear verdict:** `APPROVED` or `REJECTED` with rationale.
5. **Completion vs verdict.** `STATUS: DONE` signals agent completion; `APPROVED`/`REJECTED` is domain output. Both are parsed independently.
## Finding Format
Every finding must use this format for cross-cycle tracking:
```
| Location | Severity | Category | Description | Fix |
|----------|----------|----------|-------------|-----|
| src/auth/handler.ts:48 | CRITICAL | security | Empty string bypasses validation | Add length check before processing |
```
**Severity:**
- **CRITICAL** — Must fix. Blocks approval.
- **WARNING** — Should fix. Doesn't block alone.
- **INFO** — Nice to have. Never blocks.
**Categories** (use consistently for cross-cycle tracking):
- `security` — Injection, auth bypass, data exposure, secrets
- `reliability` — Error handling, edge cases, race conditions, crashes
- `design` — Architecture, assumptions, scalability, coupling
- `breaking-change` — API compatibility, schema migrations, removals
- `dependency` — New deps, version conflicts, license issues
- `quality` — Readability, maintainability, naming, duplication
- `testing` — Missing tests, weak assertions, untested paths
- `consistency` — Deviates from codebase patterns
## Evidence Requirements
Every CRITICAL or WARNING must include concrete evidence. Without evidence, downgrade to INFO.
**Valid evidence:** command output, exit codes, code citations with line numbers, git diff excerpts, reproduction steps.
**Banned in CRITICAL/WARNING:** "might be", "could potentially", "appears to", "seems like", "may not". Rewrite with evidence or downgrade.
For each CRITICAL/WARNING, state: (1) what was tested, (2) what was observed, (3) what correct behavior should be.
## Attention Filters
Each archetype receives only relevant context. Do not pass everything.
| Archetype | Receives | Excludes |
|-----------|----------|----------|
| Guardian | Maker's git diff + proposal risk section + test results | Explorer research, Creator rationale, other reviewers |
| Skeptic | Creator's proposal (assumptions + architecture) + confidence scores | Git diff, Explorer research, other reviewers |
| Sage | Creator's proposal + Maker's diff + implementation summary + test results | Explorer raw research, other reviewer verdicts |
| Trickster | Maker's git diff + attack surface summary (file types + entry points) | Proposal, research, other reviewers |
**Token budget targets:**
| Archetype | Fast | Standard | Thorough |
|-----------|------|----------|----------|
| Guardian | 1500 | 2000 | 2500 |
| Skeptic | skip | 1500 | 2000 |
| Trickster | skip | skip | 1500 |
| Sage | skip | 2500 | 3000 |
**Context isolation:** Agents receive fresh, controller-constructed context only. No session bleed, no cross-agent contamination, no ambient knowledge. Verify zero references to excluded artifacts before spawning.
**Cycle-back filtering (cycle 2+):** Pass structured feedback table only (not full reviewer artifacts). Strip resolved items. Cap at 500 tokens — summarize by severity if exceeded.
## Reviewer Spawning Protocol
### Step 1: Guardian First (mandatory)
Guardian always runs first. It receives the Maker's git diff and the proposal's risk section only.
Save output to `.archeflow/artifacts/${RUN_ID}/check-guardian.md`.
### Step 2: A2 Fast-Path Evaluation
After Guardian completes, count the CRITICAL and WARNING findings in its output. If both are zero, the run has not been escalated, and this is not the first cycle of a thorough workflow, skip the remaining reviewers and proceed directly to the Act phase.
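The A2 check can be sketched as below. Counting severity tokens in the findings table is an assumption about how verdicts are tallied, and the fixture replaces the real `check-guardian.md`.

```shell
tmp=$(mktemp -d)
cat > "$tmp/check-guardian.md" <<'EOF'
### Guardian: APPROVED
| src/auth.ts:52 | INFO | quality | Minor naming nit | Rename variable |
EOF

# grep -c prints 0 on no match but exits non-zero; the || keeps set -e happy.
crit=$(grep -c '| CRITICAL |' "$tmp/check-guardian.md") || crit=0
warn=$(grep -c '| WARNING |'  "$tmp/check-guardian.md") || warn=0
escalated=no; workflow=standard; cycle=1

if [ "$crit" -eq 0 ] && [ "$warn" -eq 0 ] && [ "$escalated" = no ] &&
   ! { [ "$workflow" = thorough ] && [ "$cycle" -eq 1 ]; }; then
  decision="A2 fast-path: skip remaining reviewers"
else
  decision="full review"
fi
echo "$decision"
```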
### Step 3: Parallel Remaining Reviewers
If A2 does not trigger, spawn remaining reviewers in parallel:
| Workflow | Reviewers (after Guardian) |
|----------|--------------------------|
| `fast` | None (Guardian only) |
| `fast` (escalated) | Skeptic + Sage |
| `standard` | Skeptic + Sage |
| `thorough` | Skeptic + Sage + Trickster |
Each reviewer gets context per the attention filters above.
### Step 4: Collect and Consolidate
For each reviewer: save to `.archeflow/artifacts/${RUN_ID}/check-<archetype>.md`, emit `review.verdict` event, record sequence number.
**Deduplication:** If two reviewers raise the same issue (same file + same category), merge into one finding using the higher severity. Don't double-count.
**Verdict:** Count CRITICAL findings across all reviewers (after dedup). Any CRITICAL = `REJECTED`. Otherwise `APPROVED`.
Example consolidated output:
```markdown
## Check Phase Results — Cycle N
## Check Phase Results — Cycle 1
### Guardian: APPROVED
| Location | Severity | Category | Description | Fix |
|----------|----------|----------|-------------|-----|
| src/auth/handler.ts:52 | WARNING | security | Missing rate limit | Add rate limiter middleware |
### Skeptic: APPROVED
| Location | Severity | Category | Description | Fix |
|----------|----------|----------|-------------|-----|
| src/auth/handler.ts:30 | INFO | design | Consider caching validated tokens | Add TTL cache for token validation |
### Sage: APPROVED
| Location | Severity | Category | Description | Fix |
|----------|----------|----------|-------------|-----|
| tests/auth.test.ts:15 | WARNING | testing | Test names don't describe behavior | Rename to "should reject expired tokens" |
### Trickster: REJECTED
| Location | Severity | Category | Description | Fix |
|----------|----------|----------|-------------|-----|
| src/auth/handler.ts:48 | CRITICAL | reliability | Empty string bypasses validation | Add `if (!token || token.trim() === '')` guard |
### Verdict: REJECTED — 1 critical finding
→ Build cycle feedback (see orchestration skill) and feed to Plan phase
```
## Why Structured Findings Matter
The standardized format enables:
- **Cross-cycle tracking:** Same category + location = same issue. Can detect resolution or regression.
- **Feedback routing:** Security/design findings → Creator. Quality/testing findings → Maker.
- **Shadow detection:** CRITICAL:WARNING ratios, finding counts, and category distributions are measurable.
- **Metrics:** Severity counts feed into the orchestration summary.
## Timeout Handling
Each reviewer has a **5-minute timeout**. On timeout: emit `agent.complete` with `"error": true`, log a WARNING, treat as no findings, and proceed.
**Exception:** a Guardian timeout is blocking — abort the Check phase and report to the user.

---
description: |
<example>User: "archeflow:run" in a project with colette.yaml</example>
---
# Colette Bridge -- Writing Context Auto-Loader
When ArcheFlow detects `colette.yaml` in the project root, this skill automatically loads voice profiles, personas, character sheets, and project rules into a context bundle that every agent receives (filtered by archetype role).
## Prerequisites
- `archeflow:domains` — Colette Bridge sets domain to `writing` automatically
- `archeflow:artifact-routing` — bundle is injected via the artifact routing system
- `archeflow:run` — bridge hooks into run initialization
## Activation
At `run.start`, after domain detection but before the Plan phase:
1. Check if `colette.yaml` exists in the project root
2. If found, activate Colette Bridge
3. If not found, skip silently (no error, no warning)
When the bridge activates, it emits a decision event:
```bash
./lib/archeflow-event.sh "$RUN_ID" decision init "" \
  '{"what":"colette_bridge","chosen":"activated","signal":"colette.yaml found","files_resolved":<count>}'
```
---
## File Resolution
Colette projects reference files by ID (e.g., `vp-giesing-gschichten-v1`), but the actual YAML files may live in different locations. The bridge resolves them using this search order:
### Search Priority (highest first)
| Priority | Location | Example |
|----------|----------|---------|
| 1 | Explicit path in `colette.yaml` (has `/` or `.yaml`) | `voice.profile: ../writing.colette/profiles/custom.yaml` |
| 2 | Project root subdirectories | `./profiles/vp-giesing-gschichten-v1.yaml` |
| 3 | Parent directory + `writing.colette/` | `../writing.colette/profiles/vp-giesing-gschichten-v1.yaml` |
### What Gets Resolved
| Source | colette.yaml field | Search subdirs |
|--------|-------------------|----------------|
| Voice profile | `voice.profile` | `profiles/` |
| Persona | `writing.persona` or inferred from profile | `personas/` |
| Characters | Auto-discovered | `characters/*.yaml` |
| Series config | `series` section (if present) | `colette.yaml` itself, `../writing.colette/series/` |
| Project rules | Always | `CLAUDE.md` in project root |
### Resolution Procedure
```
for each reference in colette.yaml:
1. If the field contains a path (has / or .yaml) → use as-is, verify exists
2. If the field contains an ID (e.g., "vp-giesing-gschichten-v1"):
a. Check ./profiles/<id>.yaml (or ./personas/<id>.yaml)
b. Check ../writing.colette/profiles/<id>.yaml (or ../writing.colette/personas/<id>.yaml)
c. If not found → warn in event log, skip this file
3. For characters/ → glob characters/*.yaml in project root
4. For CLAUDE.md → check project root
```
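The search-order procedure can be sketched in Python. This is a minimal illustration of the ID-resolution rules, assuming a plain filesystem layout; the function name and signature are ours:

```python
from pathlib import Path
from typing import Optional

def resolve_reference(ref: str, subdir: str, root: Path) -> Optional[Path]:
    """Resolve a colette.yaml reference: explicit paths win, then ID search."""
    if "/" in ref or ref.endswith(".yaml"):
        p = root / ref                 # priority 1: explicit path, use as-is
        return p if p.exists() else None
    for candidate in (
        root / subdir / f"{ref}.yaml",                             # priority 2
        root.parent / "writing.colette" / subdir / f"{ref}.yaml",  # priority 3
    ):
        if candidate.exists():
            return candidate
    return None  # caller emits a warning event and skips the file
```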
If a referenced file cannot be found at any location, emit a warning event but do not abort:
```bash
./lib/archeflow-event.sh "$RUN_ID" decision init "" \
'{"what":"colette_bridge_warning","chosen":"skip","file":"vp-giesing-gschichten-v1","reason":"not found in any search path"}'
```
---
## Context Bundle
The bridge generates `.archeflow/context/colette-bundle.md` — a summarized, token-efficient Markdown file that agents receive as part of their prompt context (target: under 1500 tokens).
### Bundle Structure
```markdown
# Writing Context (auto-loaded from Colette)
## Voice Profile: <id>
**Tone:** <tone_summary from meta>
**Perspective:** <perspektive>
**Density:** <dichte>
**Attitude:** <haltung>
**Sharpness:** <schaerfe>
**Humor:** <humor>
**Tempo:** <tempo>
**Reader relationship:** <leser_beziehung>
### Forbidden
- <each item from verboten>
### Allowed
- <each item from erlaubt>
### Style models
- <each item from vorbilder, name only + one-word tag>
## Persona: <id>
**Name:** <name>
**Bio:** <bio, max 2 sentences>
**Genres:** <genres, comma-separated>
### Rules
- <each item from rules>
## Characters
### <name> (<role>)
- **Age:** <age>
- **Key traits:** <first 3 personality items>
- **Speech:** <speech_pattern, first sentence only>
- **Relationships:** <key relationships, one line each>
[Repeated for each character in characters/*.yaml]
## Series Context
[Only if series config found in colette.yaml]
- **Shared concepts:** <list>
- **Glossary:** <key terms>
- **Forbidden cross-story:** <items>
## Project Rules (from CLAUDE.md)
[Key writing rules extracted from CLAUDE.md, summarized as bullet points]
- <rule 1>
- <rule 2>
- ...
```
### Summarization Rules
The bundle is **summarized**, not a raw YAML dump. This reduces token cost:
- Voice profile dimensions: key name + value (no YAML formatting, no `dimensionen:` wrapper)
- Verboten/erlaubt: bullet list, strip the explanation after the dash if an item runs over 15 words
- Characters: name, role, age, top 3 traits, first sentence of speech pattern, relationships
- Persona bio: max 2 sentences
- CLAUDE.md: extract only the writing rules/style sections, skip meta/git/cost config
- Target: bundle under 1500 tokens for a typical project
---
## Caching
The bundle is regenerated only when source files have changed. Cache validation uses file modification times.
### Cache Check Procedure
```
bundle_path = .archeflow/context/colette-bundle.md
if bundle_path does not exist → generate
if bundle_path exists:
bundle_mtime = mtime of bundle_path
for each resolved source file:
if source_mtime > bundle_mtime → regenerate, break
if no source file is newer → use cached bundle
```
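The mtime comparison above is a few lines of Python. A direct sketch of the procedure — paths and the function name are illustrative:

```python
import os

def bundle_is_fresh(bundle_path: str, source_paths: list) -> bool:
    """Reuse the cached bundle only if no source file is newer than it."""
    if not os.path.exists(bundle_path):
        return False  # no bundle yet -> generate
    bundle_mtime = os.path.getmtime(bundle_path)
    return all(os.path.getmtime(src) <= bundle_mtime for src in source_paths)
```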
When the cache is valid, emit:
```bash
./lib/archeflow-event.sh "$RUN_ID" decision init "" \
'{"what":"colette_bundle_cache","chosen":"reuse","reason":"all sources older than bundle"}'
```
When regenerating:
```bash
./lib/archeflow-event.sh "$RUN_ID" decision init "" \
'{"what":"colette_bundle_cache","chosen":"regenerate","reason":"<file> modified since last bundle"}'
```
---
## Per-Agent Attention Filters
Not every agent needs the full bundle. The bridge defines attention filters that control which sections each archetype receives. This extends the base attention filters from `archeflow:attention-filters`.
| Archetype | Bundle sections injected | Rationale |
|-----------|------------------------|-----------|
| **Explorer** | Full bundle | Needs all context for research — setting, characters, voice, rules |
| **Creator** | Voice dimensions + persona rules + characters | Designs outline — needs to know who speaks how, who exists, what's allowed |
| **Maker** | Full bundle | Writes prose — needs voice for style, characters for dialogue, rules for guardrails |
| **Guardian** | Characters + series shared_concepts | Checks consistency — needs character facts and cross-story constraints |
| **Sage** | Voice profile (full, including verboten/erlaubt) + persona rules | Checks voice drift — needs the complete voice spec and persona constraints |
| **Trickster** | Characters + series glossary | Tests continuity — needs character facts and terminology for contradiction checks |
### Filter Implementation
When injecting the bundle into an agent prompt, extract only the relevant sections:
```
# For Guardian:
Extract: "## Characters" section (all characters)
Extract: "## Series Context" section (if present)
Skip: everything else
# For Sage:
Extract: "## Voice Profile" section (full, with forbidden/allowed)
Extract: "## Persona" section (rules subsection)
Skip: characters, series, project rules
# For Explorer and Maker:
Inject: full bundle as-is
```
The filtering happens at prompt assembly time, not at bundle generation time. One bundle, multiple filtered views.
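The section extraction can be sketched as a small line filter over the bundle Markdown. An illustration under the assumption that each section starts with a top-level `## <name>` heading, as in the bundle structure above:

```python
def extract_sections(bundle_md: str, wanted: list) -> str:
    """Keep only the '## <name>' sections whose name starts with an entry in `wanted`."""
    out, keep = [], False
    for line in bundle_md.splitlines():
        if line.startswith("## "):
            keep = any(line[3:].startswith(name) for name in wanted)
        elif line.startswith("# "):
            keep = False  # bundle title line, handled by the caller
        if keep:
            out.append(line)
    return "\n".join(out)
```

One bundle is generated; each archetype gets a filtered view at prompt assembly time, which keeps caching simple.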
### Custom Archetypes
Custom archetypes (e.g., `story-explorer`, `story-sage`) inherit the filter of their closest base archetype:
| Custom archetype | Inherits filter from | Override |
|-----------------|---------------------|----------|
| `story-explorer` | Explorer | Full bundle |
| `story-sage` | Sage | Full voice profile + persona rules |
| `story-guardian` | Guardian | Characters + series |
If a custom archetype needs a different filter, define it with `colette_filter` in the archetype's markdown frontmatter:
```yaml
---
name: story-sage
colette_filter: [voice_profile, persona, characters]
---
```
The `colette_filter` field accepts section keys: `voice_profile`, `persona`, `characters`, `series`, `project_rules`, `full`.
---
## Integration with Run Skill
The Colette Bridge hooks into `archeflow:run` initialization. The sequence is:
```
run.start
├── Domain detection (from archeflow:domains)
│   └── colette.yaml found → domain = writing
├── Colette Bridge activation
│   ├── Resolve files (voice profile, persona, characters, CLAUDE.md)
│   ├── Check bundle cache → generate/refresh → .archeflow/context/colette-bundle.md
│   └── Register bundle path in artifact routing
└── Continue to Plan phase
```
### Artifact Routing Registration
The bundle path is registered so that every phase's context injection includes the (filtered) bundle:
```
artifact_routing.register_context(
path = ".archeflow/context/colette-bundle.md",
inject_at = "all_phases",
filter_by = "archetype" # Apply per-agent attention filters
)
```
In practice, this means the run skill prepends the filtered bundle content to each agent's prompt, after the standard task description but before phase-specific artifacts.
### Prompt Injection Order
```
1. Archetype definition (from SKILL.md or custom archetype .md)
2. Domain-specific review focus (from archeflow:domains)
3. Colette bundle (filtered for this archetype)
4. Task description
5. Phase-specific artifacts (Explorer output, Creator proposal, etc.)
6. Cycle feedback (if cycle 2+)
```
---
## Example: Giesing Gschichten
Given this `colette.yaml`:
```yaml
project:
name: "Giesing Gschichten"
author: "C. Nennemann"
language: de
type: fiction
voice:
profile: vp-giesing-gschichten-v1
writing:
target_words: 6000
style: "Ich-Erzaehler, lakonisch, Eberhofer-meets-Grossstadt"
```
The bridge:
1. Reads `voice.profile: vp-giesing-gschichten-v1`
2. Searches for `./profiles/vp-giesing-gschichten-v1.yaml` — not found
3. Searches for `../writing.colette/profiles/vp-giesing-gschichten-v1.yaml` — found
4. Infers persona from voice profile ID pattern or searches `personas/` — finds `giesinger.yaml` at `../writing.colette/personas/giesinger.yaml`
5. Globs `characters/*.yaml` — finds `alex.yaml` (and others if present)
6. Reads `CLAUDE.md` for writing rules
7. Generates bundle:
```markdown
# Writing Context (auto-loaded from Colette)
## Voice Profile: vp-giesing-gschichten-v1
**Tone:** Lakonisch, warmherzig-genervt, trockener Humor
**Perspective:** Ich-Erzaehler (Alex), nah dran, subjektiv
**Density:** Alltagsdetails die Atmosphaere schaffen
**Attitude:** Lakonisch, leicht genervt, aber mit Herz
**Sharpness:** Beobachtungsscharf, sprachlich reduziert
**Humor:** Trocken, Understatement, absurde Situationen
**Tempo:** Gemaechlich mit Spannungsspitzen, Slow Burn
**Reader relationship:** Kumpel am Stammtisch
### Forbidden
- Hochdeutsch-Sterilitaet
- Krimi-Klischees (CSI, Profiler, Tatort)
- Lederhosen-Kitsch und Oktoberfest-Folklore
- Dialekt-Overkill
- Moralisieren oder Erklaeren
- Kuenstliche Spannungsaufbauten
- Adverb-Orgien und Adjektiv-Ketten
- Infodumps
### Allowed
- Bairische Einsprengsel in Hochdeutsch-Prosa
- Essen und Trinken als Leitmotiv
- Kiffer-Humor und Slow-Motion-Beobachtungen
- Gentrification-Satire
- Echte Giesinger Orte und Strassen
- Skurrile Nachbarn
- Kriminalplot aus dem Alltag
- Kurze, lakonische Dialoge
### Style models
- Rita Falk (Erzaehlton), Wolf Haas (lakonisch), Helmut Dietl (Muenchner Milieu), Friedrich Ani (duester), Bukowski (Anti-Held)
## Persona: giesinger
**Name:** Der Giesinger
**Bio:** Erzaehlt Geschichten aus Muenchen-Giesing. Eberhofer meets Grossstadt.
**Genres:** Krimi, Kurzgeschichte, Milieustudie
### Rules
- Ich-Erzaehler, immer — Alex erzaehlt
- Hauptsaechlich Hochdeutsch mit bairischen Einsprengsel
- Jede Geschichte hat einen Kriminalplot
- Essen/Trinken in jeder Geschichte
- Echte Giesinger Orte und Strassen
- Humor durch Understatement
- Alex ist kein Ermittler
- Figuren reden wie echte Menschen
## Characters
### Alex (protagonist)
- **Age:** Mitte 30
- **Key traits:** Lakonisch, funktionaler Kiffer, unmotiviert aber nicht dumm
- **Speech:** Kurze Saetze, Hochdeutsch mit bairischen Einsprengsel.
- **Relationships:** Mo — Nachbar, Kumpel und Unruhestifter
## Project Rules (from CLAUDE.md)
- Jede Geschichte beginnt mit einer Alltagsszene
- Kriminalplot ergibt sich organisch aus dem Alltag
- Essen/Trinken in jeder Geschichte
- Echte Giesinger Orte verwenden
- Kein Moralisieren, kein Erklaerbaer
- Ende muss nicht alles aufloesen
```
---
## Design Principles
1. **Summarize, don't dump.** Raw YAML wastes tokens and confuses agents. The bundle is a curated briefing.
2. **Cache aggressively.** Voice profiles and characters rarely change mid-run. Only regenerate when mtimes change.
3. **Filter per agent.** A Guardian checking plot consistency does not need the full voice profile. A Sage checking voice drift does not need character sheets.
4. **Graceful degradation.** Missing files are warned about, not fatal. A project with `colette.yaml` but no characters/ still works — the Characters section is simply empty.
5. **One bundle, filtered views.** Generate the full bundle once. Filter at injection time per archetype. This keeps caching simple.
6. **Additive to existing skills.** The bridge does not replace domain detection or artifact routing — it hooks into them. Remove the bridge, everything still works (just without auto-loaded writing context).

---
name: convergence
description: |
Detects convergence, stalling, and oscillation in multi-cycle PDCA runs. Prevents wasted cycles
by stopping early when findings are not being resolved or are bouncing between cycles.
<example>Automatically loaded during Act phase before exit decision</example>
<example>User: "Is the run converging?"</example>
---
# Convergence Detection
In multi-cycle PDCA runs, the Act phase must decide whether another cycle will help or just waste tokens. This skill provides the analysis: are findings being resolved (converging), staying the same (stalling), or bouncing back (oscillating)?
## When It Runs
Convergence analysis runs **after the Check phase completes and before the Act phase exit decision**. It requires at least 2 cycles of data — on cycle 1, it is skipped (no comparison baseline).
```
Check phase → Convergence Analysis → Act phase exit decision
```
---
## Step 1: Finding Comparison
Extract findings from the current cycle and compare against the previous cycle.
### Data Sources
- **Current cycle findings:** Parsed from `check-*.md` artifacts in `.archeflow/artifacts/<run_id>/`
- **Previous cycle findings:** Parsed from `check-*.md` artifacts in `.archeflow/artifacts/<run_id>/cycle-<N-1>/`
Each finding is identified by a composite key: `source + category + file_location + description_keywords`.
### Finding Categories
Every finding from the current cycle is classified into exactly one category:
| Category | Definition |
|----------|------------|
| **NEW** | Finding not present in any previous cycle |
| **RESOLVED** | Was present in the previous cycle, absent in the current cycle |
| **PERSISTENT** | Present in both the current and previous cycle (same key) |
| **REGRESSED** | Was RESOLVED in the previous cycle (was present in N-2, absent in N-1), but returned in the current cycle |
### Matching Algorithm
Two findings match if:
1. Same `source` archetype (guardian, sage, etc.)
2. Same `category` (security, reliability, quality, etc.)
3. Same or overlapping file location (same file, line within 10 lines)
4. 50%+ keyword overlap in description (lowercase, strip punctuation)
All four conditions must hold. This prevents false matches across unrelated findings.
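As a sketch, the four-condition match looks like this in Python. The finding fields are assumed names, and the 50% overlap is read here as Jaccard similarity of the keyword sets — one plausible reading, not a fixed specification:

```python
import string

def keywords(desc: str) -> set:
    """Lowercase, strip punctuation, split into a keyword set."""
    table = str.maketrans("", "", string.punctuation)
    return set(desc.lower().translate(table).split())

def findings_match(a: dict, b: dict) -> bool:
    if a["source"] != b["source"] or a["category"] != b["category"]:
        return False
    if a["file"] != b["file"] or abs(a["line"] - b["line"]) > 10:
        return False
    ka, kb = keywords(a["description"]), keywords(b["description"])
    overlap = len(ka & kb) / max(len(ka | kb), 1)  # Jaccard, assumed reading
    return overlap >= 0.5
```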
---
## Step 2: Convergence Score
Calculate a convergence score from the categorized findings:
```
convergence = resolved_count / (resolved_count + new_count + regressed_count)
```
If the denominator is 0 (no resolved, no new, no regressed — only persistent), the score is `0.0` (stalled, not converging).
### Score Interpretation
| Score Range | Status | Meaning |
|-------------|--------|---------|
| > 0.8 | **Converging** | Most issues being resolved, few new ones introduced |
| 0.5 - 0.8 | **Stalling** | Fixing roughly as many as introducing |
| < 0.5 | **Diverging** | Making things worse — more new/regressed than resolved |
| 0.0 (all persistent) | **Stuck** | No progress in either direction |
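The score and its interpretation transcribe directly into a small function — a sketch of the formula and thresholds above, with the all-persistent denominator-zero case handled first:

```python
def convergence_status(resolved: int, new: int, regressed: int, persistent: int):
    """Return (score, status) per the convergence formula and thresholds."""
    denom = resolved + new + regressed
    if denom == 0:
        return 0.0, "stuck"  # only persistent findings: no progress either way
    score = resolved / denom
    if score > 0.8:
        status = "converging"
    elif score >= 0.5:
        status = "stalling"
    else:
        status = "diverging"
    return score, status
```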
---
## Step 3: Oscillation Detection
An oscillating finding is one that bounces between resolved and re-introduced across cycles:
1. Finding was present in cycle N-2
2. Finding was absent in cycle N-1 (resolved)
3. Finding is present again in cycle N (regressed)
This indicates the fix in cycle N-1 was undone or invalidated by other changes in cycle N.
### Oscillation Rules
- A single oscillating finding: **flag it** in the convergence report but continue.
- Two or more oscillating findings: **STOP** and escalate to the user.
- Message: `"Findings X and Y are oscillating between cycles. Manual intervention needed — the automated fixes are interfering with each other."`
Oscillation tracking requires 3+ cycles of data. On cycles 1-2, oscillation detection is skipped.
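Over per-cycle sets of finding keys, the oscillation check is one set expression. A sketch assuming findings are tracked by their composite key, oldest cycle first:

```python
def oscillating(history: list) -> set:
    """history: list of sets of finding keys, one per cycle, oldest first."""
    if len(history) < 3:
        return set()  # needs 3+ cycles of data
    n2, n1, n0 = history[-3], history[-2], history[-1]
    return (n2 - n1) & n0  # present, then resolved, then back again
```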
---
## Step 4: Early Termination Rules
The convergence analysis can override the normal Act phase exit decision. If any of these conditions hold, the recommendation is **STOP**:
| Condition | Threshold | Recommendation |
|-----------|-----------|----------------|
| Diverging | Score < 0.5 for 2 consecutive cycles | STOP — changes are making things worse |
| Stalled | 0 findings resolved between cycles | STOP — no progress, further cycles will not help |
| Stuck | All findings are PERSISTENT for 2 consecutive cycles | STOP — automated fixes cannot resolve these |
| Oscillating | 2+ findings oscillating | STOP — fixes are interfering with each other |
When STOP is recommended, the Act phase should:
1. **Not** start another PDCA cycle
2. Report all unresolved findings to the user
3. Present the best implementation so far (on its branch, not merged)
4. Include the convergence report explaining why the run was stopped
### Override Behavior
The convergence STOP recommendation overrides the normal cycle-back logic in the Act phase. Even if `CYCLE < MAX_CYCLES` and there are fixable-looking findings, if convergence says STOP, the run stops.
The user can always override by explicitly requesting another cycle: `"Run one more cycle anyway"`.
---
## Step 5: Integration with Act Phase
### Event Data
Convergence data is included in the `cycle.boundary` event emitted by the Act phase:
```json
{
"type": "cycle.boundary",
"phase": "act",
"data": {
"cycle": 2,
"max_cycles": 3,
"exit_condition": "convergence_stop",
"met": false,
"fixes_applied": 2,
"next_action": "stop",
"convergence": {
"score": 0.35,
"status": "diverging",
"resolved": 1,
"new": 2,
"regressed": 1,
"persistent": 3,
"oscillating": ["Timeline reference mismatch"],
"recommendation": "stop",
"reason": "Diverging for 2 consecutive cycles"
}
}
}
```
### Decision Tree Update
The Act phase decision tree (from `act-phase` skill Step 4) gains a new first branch:
```
┌─ Convergence analysis (cycle 2+)
├─ Convergence says STOP
│ └─ STOP: Report to user with convergence report
├─ Convergence says CONTINUE
│ └─ Fall through to normal exit decision logic
└─ Cycle 1 (no convergence data)
└─ Fall through to normal exit decision logic
```
### Act Feedback Enhancement
When the Act phase builds `act-feedback.md` for the next cycle, it includes the convergence summary at the top:
```markdown
## Convergence Analysis (Cycle 1 → 2)
Score: 0.75 (converging)
Resolved: 3 | New: 1 | Regressed: 0 | Persistent: 2
Recommendation: Continue — trend is positive
### Finding Status
| Finding | Status | Cycles |
|---------|--------|--------|
| SQL injection in user input | RESOLVED | 1 |
| Missing rate limit | RESOLVED | 1 |
| Test names unclear | RESOLVED | 1 |
| Null check missing in parser | PERSISTENT | 2 |
| Error path not tested | PERSISTENT | 2 |
| New: Unused import introduced | NEW | 1 |
```
---
## Convergence Report Format
The full convergence report is generated as part of the orchestration output:
```markdown
## Convergence Analysis (Cycle N-1 → N)
**Score:** 0.75 (converging)
**Resolved:** 3 | **New:** 1 | **Regressed:** 0 | **Persistent:** 2 | **Oscillating:** 0
### Resolved This Cycle
| Source | Category | Description |
|--------|----------|-------------|
| guardian | security | SQL injection in user input handler |
| guardian | reliability | Missing rate limit on auth endpoint |
| sage | quality | Test names don't describe behavior |
### New This Cycle
| Source | Category | Description |
|--------|----------|-------------|
| sage | quality | Unused import introduced by fix |
### Persistent (unresolved across cycles)
| Source | Category | Description | Cycles Open |
|--------|----------|-------------|-------------|
| trickster | reliability | Null check missing in parser | 2 |
| sage | testing | Error path not tested | 2 |
### Oscillating
(none)
**Recommendation:** Continue — trend is positive
```
---
## Integration with Memory Skill
When convergence detects PERSISTENT findings (present for 2+ cycles), these are strong candidates for the `memory` skill's lesson extraction:
- After a run that had persistent findings, `archeflow-memory.sh extract` will pick these up with higher confidence (they have been confirmed across multiple cycles within a single run).
- Persistent findings that also appear in `lessons.jsonl` from prior runs get a double frequency boost (cross-cycle within run + cross-run pattern).
---
## Design Principles
1. **Conservative stopping.** Requires 2 consecutive data points before recommending STOP. A single bad cycle might be noise.
2. **User has final say.** STOP is a recommendation, not an enforced shutdown. The user can override.
3. **Cheap computation.** Keyword matching on finding descriptions, simple arithmetic on counts. No ML, no embeddings.
4. **Bounded scope.** Only compares adjacent cycles (N vs N-1, with N-2 for oscillation). Does not attempt to model long-term trends across many cycles.
5. **Observable.** All convergence data is included in the `cycle.boundary` event, making it available for post-hoc analysis via the process log.

---
description: |
<example>Automatically active when budget is configured</example>
---
# Cost Tracking -- Budget-Aware Orchestration
Every ArcheFlow orchestration consumes LLM tokens. This skill tracks costs per agent and per run, enforces budgets, and recommends cost-optimal model assignments.
## Model Pricing
Current pricing (update when models change):
| Model | Input ($/M tokens) | Output ($/M tokens) | Notes |
|-------|--------------------:|---------------------:|-------|
| `claude-opus-4-6` | 15.00 | 75.00 | Highest quality, use sparingly |
| `claude-sonnet-4-6` | 3.00 | 15.00 | Good balance of quality and cost |
| `claude-haiku-4-5` | 0.80 | 4.00 | Cheap, fast, good for structured tasks |
**Prompt caching** (when applicable): 90% discount on cached input tokens. The orchestrator should structure system prompts to maximize cache hits (archetype instructions, voice profiles, and domain context are cache-friendly since they repeat across agents in a run).
**Batches API**: 50% discount on all tokens. Use for non-time-sensitive bulk operations (validation passes, consistency checks).
## Per-Agent Cost Tracking
Every `agent.complete` event includes cost data:
```json
{
"type": "agent.complete",
"data": {
"archetype": "story-explorer",
"duration_ms": 87605,
"tokens_input": 15000,
"tokens_output": 6000,
"tokens_cache_read": 8000,
"model": "haiku",
"estimated_cost_usd": 0.02,
"summary": "3 plot directions developed, recommended C"
}
}
```
### Cost Calculation
```
cost = (tokens_input - tokens_cache_read) * input_price / 1_000_000
     + tokens_cache_read * input_price * 0.10 / 1_000_000
     + tokens_output * output_price / 1_000_000
```
If exact token counts are unavailable (Claude Code doesn't always expose them), estimate based on character count:
```
estimated_tokens = character_count / 4 # rough heuristic
```
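The formula and fallback above can be sketched together. Prices are per million tokens as in the pricing table; the single-entry price map is illustrative:

```python
PRICES = {"claude-sonnet-4-6": (3.00, 15.00)}  # (input, output) in $/M tokens

def agent_cost(model, tokens_input, tokens_output, tokens_cache_read=0):
    """Cost with the 90% cached-input discount applied to cache reads."""
    inp, out = PRICES[model]
    return ((tokens_input - tokens_cache_read) * inp
            + tokens_cache_read * inp * 0.10
            + tokens_output * out) / 1_000_000

def estimate_tokens(char_count):
    return char_count // 4  # rough heuristic; mark cost_estimated: true
```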
Mark estimated costs with `"cost_estimated": true` in the event data so reports can distinguish measured from estimated values.
## Default Model Assignments
| Archetype | Code | Writing |
|-----------|------|---------|
| Explorer | haiku | haiku |
| Creator | sonnet | sonnet |
| Maker | sonnet | **sonnet** |
| Guardian | haiku | haiku |
| Skeptic | haiku | haiku |
| Sage | sonnet | **sonnet** |
| Trickster | haiku | haiku |
Opus is user-opt-in only (team preset `model_overrides`).
**Resolution order:** team preset override > domain override > archetype default.
## Run-Level Aggregation
The `run.complete` event includes cost totals:
```json
{
"type": "run.complete",
"data": {
"status": "completed",
"total_tokens_input": 95000,
"total_tokens_output": 33000,
"total_tokens_cache_read": 42000,
"total_cost_usd": 1.45,
"budget_usd": 10.00,
"budget_remaining_usd": 8.55,
"agents_total": 5,
"cost_by_phase": {
"plan": 0.35,
"do": 0.72,
"check": 0.38
},
"cost_by_model": {
"haiku": 0.12,
"sonnet": 1.33
}
}
}
```
### Cost Summary in Orchestration Report
After each orchestration, the report includes a cost section:
```markdown
## Cost Summary
| Phase | Model(s) | Tokens (in/out) | Cost |
|-------|----------|-----------------|------|
| Plan | haiku, sonnet | 32k / 12k | $0.35 |
| Do | sonnet | 40k / 15k | $0.72 |
| Check | haiku, sonnet | 23k / 6k | $0.38 |
| **Total** | | **95k / 33k** | **$1.45** |
Budget: $10.00 | Spent: $1.45 | Remaining: $8.55
```
## Budget Configuration
Budgets are defined in team presets or `.archeflow/config.yaml`:
```yaml
# .archeflow/config.yaml
budget:
  per_run_usd: 10.00     # Max cost per orchestration run
  per_agent_usd: 3.00    # Max cost per individual agent
  daily_usd: 50.00       # Max daily spend across all runs
  warn_at_percent: 75    # Warn when this % of budget is consumed
```
```yaml
# Team preset override
name: story-development
domain: writing
budget:
per_run_usd: 5.00 # Writing runs are usually cheaper
```
Team preset budget overrides the global config for that run.
### Budget Precedence
1. Team preset `budget` (if set)
2. `.archeflow/config.yaml` `budget`
3. No budget (unlimited) — costs are still tracked but not enforced
## Budget Enforcement
Budget checks happen at two points:
### 1. Pre-Agent Check (before spawning)
Before each agent is spawned, estimate its cost and check against remaining budget:
```
estimated_agent_cost = estimate_tokens(archetype, task_complexity) * model_price
remaining_budget = budget - sum(costs_so_far)
if estimated_agent_cost > remaining_budget:
WARN: "Estimated cost for {archetype} (${estimated}) would exceed remaining budget (${remaining}). Continue? [y/N]"
```
**In autonomous mode**: if budget would be exceeded, STOP the run and report. Do not prompt — there is no one to answer.
**In attended mode**: warn and ask the user. They can approve the overage or stop.
### 2. Post-Agent Check (after completion)
After each agent completes, update the running total and check:
```
if total_cost > budget * warn_at_percent / 100:
WARN: "Budget ${warn_at_percent}% consumed (${total_cost} of ${budget})"
if total_cost > budget:
STOP: "Budget exceeded (${total_cost} of ${budget}). Run halted."
```
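The two checks can be sketched as pure functions (illustrative; the function names are hypothetical, and the `warn_at_percent` default comes from the config example above):

```python
def pre_agent_check(estimated_cost, spent_so_far, budget):
    """True if the agent's estimated cost fits in the remaining budget."""
    return estimated_cost <= budget - spent_so_far

def post_agent_check(total_cost, budget, warn_at_percent=75):
    """'stop' past the budget, 'warn' past the threshold, else 'ok'."""
    if total_cost > budget:
        return "stop"
    if total_cost > budget * warn_at_percent / 100:
        return "warn"
    return "ok"
```

A `False` from `pre_agent_check` maps to stop-and-report in autonomous mode and to a prompt in attended mode.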
### Pre-Agent Cost Estimation
Rough token estimates by archetype (calibrate over time with actual data from `metrics.jsonl`):
| Archetype | Typical Input | Typical Output | Notes |
|-----------|-------------:|---------------:|-------|
| Explorer | 8k | 4k | Research, reads many files |
| Creator | 12k | 6k | Receives Explorer output, produces plan |
| Maker | 15k | 12k | Largest output (implementation/prose) |
| Guardian | 10k | 3k | Reads diff, structured output |
| Skeptic | 8k | 3k | Reads proposal, structured challenges |
| Sage | 12k | 4k | Reads diff + proposal |
| Trickster | 8k | 4k | Reads diff, generates test cases |
These are starting estimates. After 10+ runs, use actual averages from `metrics.jsonl` instead.
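As a sketch, the estimate can be computed directly from the table above. The per-million-token prices below are placeholders, not real pricing; substitute current rates:

```python
# Placeholder prices in USD per million tokens — NOT real pricing
PRICE_PER_MTOK = {"haiku": {"in": 1.00, "out": 5.00}, "sonnet": {"in": 3.00, "out": 15.00}}

# Starting estimates from the table above, in thousands of tokens (input, output)
EST_KTOK = {"explorer": (8, 4), "creator": (12, 6), "maker": (15, 12),
            "guardian": (10, 3), "skeptic": (8, 3), "sage": (12, 4), "trickster": (8, 4)}

def estimate_cost_usd(archetype, model):
    k_in, k_out = EST_KTOK[archetype]
    price = PRICE_PER_MTOK[model]
    # thousands of tokens * (USD per million tokens) => divide by 1000
    return (k_in * price["in"] + k_out * price["out"]) / 1000
```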
## Cost-Aware Model Selection
Each archetype has a recommended model tier based on the quality requirements of its role:
### Default Model Assignments (Code Domain)
| Archetype | Model | Rationale |
|-----------|-------|-----------|
| Explorer | haiku | Research is structured extraction — cheap model handles it well |
| Creator | sonnet | Design decisions need reasoning quality |
| Maker | sonnet | Implementation needs quality to avoid rework cycles |
| Guardian | haiku | Security/risk review is checklist-driven — structured and cheap |
| Skeptic | haiku | Challenge generation follows patterns — cheap |
| Sage | sonnet | Holistic quality judgment needs nuance |
| Trickster | haiku | Adversarial testing is systematic — cheap |
### Writing Domain Overrides
Writing tasks need higher quality for prose-generating agents:
| Archetype | Model | Rationale |
|-----------|-------|-----------|
| Explorer / story-explorer | haiku | Research is still cheap |
| Creator | sonnet | Outline design needs narrative judgment |
| Maker | **sonnet** | Prose quality is the product — cannot be cheap |
| Guardian | haiku | Plot/continuity checks are structured |
| Skeptic | haiku | Premise challenges are structured |
| Sage / story-sage | **sonnet** | Voice and craft judgment need taste |
| Trickster | haiku | Reader-confusion analysis is systematic |
**When to escalate to opus**: Only for final-pass prose polishing on high-stakes content (book manuscripts, not short stories). Never for review or research agents. The user must explicitly opt in via:
```yaml
# Team preset
model_overrides:
maker: opus # Only for final polish pass
```
### Domain-Driven Model Selection
The effective model for each agent is resolved in this order:
1. **Team preset `model_overrides`** (highest priority — explicit choice)
2. **Domain `model_overrides`** (from `.archeflow/domains/<name>.yaml`)
3. **Archetype default** (from the table above)
4. **Custom archetype `model` field** (from archetype YAML frontmatter)
Example resolution for `story-sage` in a writing run:
- Team preset says nothing about story-sage → skip
- Writing domain says `story-sage: sonnet` → **use sonnet**
- Archetype YAML says `model: sonnet` → would have been used if domain didn't specify
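A sketch of this resolution order (hypothetical function; assumes each source is a plain `agent_id → model` mapping):

```python
def resolve_model(agent_id, team_overrides, domain_overrides, defaults, archetype_model=None):
    """Resolution order: team preset > domain > archetype default table > archetype YAML."""
    for source in (team_overrides, domain_overrides, defaults):
        if agent_id in source:
            return source[agent_id]
    return archetype_model

# story-sage example: the preset is silent, so the writing domain's override wins
model = resolve_model("story-sage", {}, {"story-sage": "sonnet"}, {}, archetype_model="sonnet")
```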
## Cost Optimization Strategies
### 1. Prompt Caching
Structure prompts so that stable content comes first (maximizes cache prefix hits):
```
[System prompt — archetype instructions] ← cached across agents in same run
[Domain context — voice profile, persona] ← cached across agents in same run
[Phase context — Explorer output, proposal] ← changes per agent
[Task-specific instructions] ← changes per agent
```
Estimated savings: 30-50% on input tokens for runs with 5+ agents.
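The ordering can be sketched as simple concatenation (illustrative only; actual cache hits depend on the provider's prefix rules, and the segment contents here are hypothetical):

```python
def build_prompt(archetype_instructions, domain_context, phase_context, task):
    # Stable segments first, so agents in the same run share the longest cache prefix
    return "\n\n".join([archetype_instructions, domain_context, phase_context, task])

# Two agents in one run: identical prefix up to the phase context
p1 = build_prompt("You are the Maker...", "Voice profile: ...", "Proposal: ...", "Implement step 1")
p2 = build_prompt("You are the Maker...", "Voice profile: ...", "Review notes: ...", "Fix findings")
```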
### 2. Guardian Fast-Path (A2)
When Guardian approves with 0 issues, skip Skeptic/Sage/Trickster. This saves 2-3 agent calls per cycle. See `archeflow:orchestration` skill, rule A2.
Typical savings: $0.30-0.80 per skipped cycle (depending on models).
### 3. Explorer Cache
Reuse recent Explorer research instead of re-running. See `archeflow:orchestration` skill, Explorer Cache section.
Typical savings: $0.02-0.05 per cache hit (haiku Explorer).
### 4. Batches API for Bulk Operations
When running consistency checks, validation passes, or other non-time-sensitive work across multiple files, use the Batches API (50% discount):
```yaml
# Mark agents as batch-eligible in team presets
batch_eligible:
- guardian # Structured review, can wait
- skeptic # Challenge generation, can wait
```
Only use batches when the user is not waiting for real-time results (overnight runs, autonomous mode).
### 5. Early Termination
If the first cycle produces a clean Guardian pass (A2 fast-path) AND the Maker's self-review checklist is clean, skip the remaining cycles even if `max_cycles > 1`. This avoids spending tokens on unnecessary verification.
## Daily Cost Tracking
Across runs, maintain a daily cost ledger:
```
.archeflow/costs/<YYYY-MM-DD>.jsonl
```
Each line is one run's cost summary:
```jsonl
{"run_id":"2026-04-03-der-huster","cost_usd":1.45,"tokens_input":95000,"tokens_output":33000,"models":{"haiku":2,"sonnet":3},"domain":"writing"}
{"run_id":"2026-04-03-auth-refactor","cost_usd":2.10,"tokens_input":120000,"tokens_output":45000,"models":{"haiku":3,"sonnet":2},"domain":"code"}
```
Daily budget enforcement reads this file to check `daily_usd` limits before starting new runs.
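A sketch of that daily check (illustrative; only `cost_usd` is read here, and the other ledger fields are omitted from the sample lines):

```python
import json

def daily_spend_usd(ledger_lines):
    """Sum cost_usd over one day's ledger (one JSON object per line)."""
    return sum(json.loads(line)["cost_usd"] for line in ledger_lines if line.strip())

ledger = [
    '{"run_id":"2026-04-03-der-huster","cost_usd":1.45}',
    '{"run_id":"2026-04-03-auth-refactor","cost_usd":2.10}',
]
ok_to_start = daily_spend_usd(ledger) < 50.00  # daily_usd from config
```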
### Cost Report Command
```bash
# Show today's costs
./lib/archeflow-costs.sh today
# Show costs for a date range
./lib/archeflow-costs.sh 2026-04-01 2026-04-03
# Show costs for a specific run
./lib/archeflow-costs.sh run 2026-04-03-der-huster
```
## Integration with Other Skills
- **`orchestration`**: Calls pre-agent and post-agent budget checks. Includes cost summary in orchestration report.
- **`process-log`**: Cost data is embedded in `agent.complete` and `run.complete` events. No separate cost events needed.
- **`domains`**: Reads `model_overrides` from the active domain to determine effective model per agent.
- **`autonomous-mode`**: Enforces budget strictly (no prompts — just stop on budget exceeded). Uses daily budget to limit overnight spend.
- **`workflow-design`**: Custom workflows can specify per-phase model assignments that override domain defaults.
## Design Principles
1. **Track always, enforce optionally.** Cost data is in every event regardless of whether a budget is set. Budget enforcement is opt-in.
2. **Estimate before spend.** Always estimate before spawning an agent. Surprises are worse than slightly inaccurate estimates.
3. **Cheapest model that works.** Default to haiku. Upgrade to sonnet only when the task demonstrably needs it. Opus is user-opt-in only.
4. **Transparent.** Every cost shows up in the orchestration report. No hidden token spend.
5. **Learn from history.** After enough runs, replace estimates with actual averages from `metrics.jsonl`.

---
name: custom-archetypes
description: Use when the user wants to create domain-specific archetypes (specialized agent roles beyond the 7 built-in ones), for example a database reviewer, compliance auditor, or accessibility tester.
---
# Custom Archetypes
ArcheFlow's 7 built-in archetypes cover general software engineering. Custom archetypes add **domain expertise** — a database specialist, a compliance auditor, an accessibility reviewer.
## When to Create One
- A recurring review concern isn't covered by built-in archetypes
- You need domain knowledge (GDPR, PCI-DSS, WCAG, SQL optimization)
- The same custom instructions are used in multiple orchestrations
## Archetype Definition
Create a markdown file in your project at `.archeflow/archetypes/<id>.md`:
```markdown
# <Name>
## Identity
**ID:** <lowercase-with-hyphens>
**Role:** <one sentence — what this archetype does>
**Lens:** <the question this archetype always asks>
**Model tier:** cheap | standard | premium
## Behavior
<System prompt injected into the agent. Define:
- What to look for
- How to evaluate
- What output format to use
- Decision criteria for approve/reject>
## Outputs
<What message types this archetype produces>
- Research (if it gathers info)
- Proposal (if it designs)
- Challenge (if it critiques)
- RiskAssessment (if it assesses risk)
- QualityReport (if it reviews quality)
- Implementation (if it writes code)
## Shadow
**Name:** <the dysfunction>
**Strength inverted:** <how the core strength becomes destructive>
**Symptoms:**
- <observable behavior 1>
- <observable behavior 2>
- <observable behavior 3>
**Correction:** <specific prompt to course-correct>
```
## Examples
### Database Specialist
```markdown
# Database Specialist
## Identity
**ID:** db-specialist
**Role:** Reviews database schemas, queries, and migration safety
**Lens:** "Will this scale? Will this corrupt data?"
**Model tier:** standard
## Behavior
You review database changes for:
1. Schema design — normalization, index coverage, constraint integrity
2. Query performance — would an EXPLAIN ANALYZE show problems?
3. Migration safety — backward compatible? Zero-downtime possible?
4. Data integrity — foreign keys, unique constraints, NOT NULL where needed
Output APPROVED or REJECTED with findings including:
- Table/column/query location
- Severity (CRITICAL/WARNING/INFO)
- Specific fix
## Outputs
- Challenge
- QualityReport
## Shadow
**Name:** Schema Perfectionist
**Strength inverted:** Database expertise becomes over-normalization and premature optimization
**Symptoms:**
- Demanding 3NF for a 10-row config table
- Requiring indexes for queries that run once a day
- Blocking on theoretical scale issues for an app with 50 users
**Correction:** "Optimize for the current order of magnitude. If the app has 1000 users, design for 10,000. Not for 10 million."
```
### Compliance Auditor
```markdown
# Compliance Auditor
## Identity
**ID:** compliance-auditor
**Role:** Verifies code changes against regulatory requirements
**Lens:** "Could this get us fined?"
**Model tier:** premium
## Behavior
You audit changes against:
1. GDPR — personal data handling, consent, right to deletion
2. PCI-DSS — payment data storage, transmission, access controls
3. Logging — are sensitive fields being logged? PII in error messages?
4. Data retention — are we keeping data longer than allowed?
Reference specific regulation articles in findings.
## Outputs
- RiskAssessment
## Shadow
**Name:** Regulation Zealot
**Strength inverted:** Compliance awareness becomes impossible-to-satisfy requirements
**Symptoms:**
- Citing regulations irrelevant to the change
- Requiring legal review for non-PII code
- Blocking internal tools with customer-facing compliance standards
**Correction:** "Match the compliance level to the data classification. Internal admin tools don't need PCI-DSS Level 1 controls."
```
## Using Custom Archetypes
Reference them by ID when orchestrating:
```
# In the orchestration skill, add to Check phase:
Agent(
description: "db-specialist: review schema changes",
prompt: "<contents of .archeflow/archetypes/db-specialist.md>
Review the changes in branch: <maker's branch>
..."
)
```
Or in a custom workflow, include them in the check phase archetypes list.
## Archetype Composition
Combine two archetypes into a focused super-reviewer when you need a specific perspective but don't want to spawn two agents:
```markdown
# .archeflow/archetypes/security-breaker.md
## Identity
**ID:** security-breaker
**Composed of:** Guardian + Trickster
**Role:** Security review with active exploitation attempts
**Lens:** "Can I break the security model? How?"
**Model tier:** standard
## Behavior
Combine Guardian's checklist-driven security review with Trickster's
adversarial testing. For each Guardian finding, attempt to exploit it.
Only report findings you can actually reproduce.
## Shadow
**Name:** Security Theater
**Strength inverted:** Both shadows compound — paranoid blocking + noise
**Correction:** "Only report findings with reproduction steps. Max 5."
```
**Rules for composition:**
- Max 2 archetypes combined (more defeats the purpose)
- Combined shadow must address both source shadows
- Use when spawning both separately would waste tokens on overlapping context
## Team Presets
Save common team configurations for your project in `.archeflow/teams/`:
```yaml
# .archeflow/teams/backend.yaml
name: backend
description: Standard backend development team
plan: [explorer, creator]
do: [maker]
check: [guardian, sage]
exit: all_approved
max_cycles: 2
```
```yaml
# .archeflow/teams/security-audit.yaml
name: security-audit
description: Security-focused review team
plan: [explorer, creator]
do: [maker]
check: [guardian, trickster, compliance-auditor]
exit: all_approved
max_cycles: 3
```
Reference custom archetypes by ID in the `check` (or any phase) list.
Use in orchestration: `"Use the backend team preset"` or `"Run security-audit workflow on this change"`
## Design Principles
1. **One concern per archetype.** Don't make a "full-stack reviewer."
2. **Concrete shadow.** Vague shadows don't get detected. Use observable symptoms.
3. **Right model tier.** Analytical → cheap. Creative → standard. Judgment-heavy → premium.
4. **Specific lens.** The one question the archetype asks. This focuses behavior.
5. **Composition over sprawl.** Combine before creating from scratch. 2 composed > 3 separate.

---
name: do-phase
description: Use when acting as Maker in the Do phase. Defines execution rules, worktree protocol, commit discipline, and output format.
---
# Do Phase
Maker implements the Creator's proposal. This skill defines the execution protocol — the agent definition (`agents/maker.md`) has the behavioral rules.
## Execution Protocol
### 1. Read Before Writing
Read the Creator's proposal completely. Identify:
- Files to create or modify (the `### Changes` section)
- Test strategy (the `### Test Strategy` section)
- Scope boundaries (the `### Not Doing` section)
If the proposal is unclear on any point: implement your best interpretation and note the assumption in your output.
### 2. Implementation Order
For each change in the proposal:
1. Write the test first (expect it to fail)
2. Implement the change (make the test pass)
3. Verify existing tests still pass
4. Commit with a descriptive message
For writing domain (stories, prose):
1. Read the outline / scene plan
2. Read the voice profile and character sheets
3. Draft scene by scene, following the outline's emotional beats
4. Self-check: does the voice hold? Does dialogue sound natural?
5. Commit after each scene or logical section
### 3. Commit Discipline
**CRITICAL: Always commit before finishing.** Uncommitted worktree changes are LOST when the agent exits.
Commit conventions:
```
feat: <what was added> # New functionality
fix: <what was fixed> # Bug fix within the task
test: <what was tested> # Test additions
docs: <what was documented> # Documentation only
```
Commit frequency:
- **Code:** After each logical step (one feature, one fix, one test suite)
- **Writing:** After each scene or section (~500-1000 words)
- **Never:** One big commit at the end with everything
### 4. Scope Control
Do exactly what the proposal says. No more, no less.
**In scope:**
- Files listed in the proposal's `### Changes` section
- Tests specified in the `### Test Strategy` section
- Dependencies explicitly mentioned
**Out of scope (even if tempting):**
- Refactoring code you noticed while implementing
- Adding features not in the proposal
- Fixing pre-existing bugs in adjacent code
- Updating documentation beyond what the task requires
If you encounter something that needs fixing but is out of scope: note it in `### Notes` for future work. Don't fix it now.
### 5. Blocker Protocol
If you hit a blocker (dependency missing, test infrastructure broken, proposal contradicts codebase):
1. Document what's blocked and why
2. Document what you completed before the block
3. Commit what you have
4. Stop and report — don't silently work around it
## Worktree Protocol
When running in an isolated git worktree (`isolation: "worktree"`):
```
main branch (untouched)
└── archeflow/maker-<run_id> (worktree branch)
├── commit: implementation step 1
├── commit: implementation step 2
└── commit: implementation step 3 (final)
```
- All work stays on the worktree branch
- Main branch is never modified directly
- The branch name follows the pattern: `archeflow/maker-<run_id>`
- After Check phase approves: the orchestrator merges (not the Maker)
## Output Format
```markdown
## Implementation: <task>
### Files Changed
- `path/file.ext` — What changed (+N -M lines)
### Tests
- N new tests, all passing
- M existing tests still passing
### Commits
1. `feat: description` (hash)
2. `test: description` (hash)
### Notes
- Assumptions made where proposal was unclear
- Out-of-scope issues noticed (for future work)
### Branch
`archeflow/maker-<run_id>` — ready for review
```
For writing domain:
```markdown
## Draft: <story/chapter title>
### Scenes Written
- Scene 1: <title> (~N words)
- Scene 2: <title> (~N words)
### Word Count
- Target: N | Actual: M | Delta: +/-
### Voice Notes
- Dialect usage: N instances (target: moderate)
- Essen/Trinken: present in X/Y scenes
### Commits
1. `feat: scene 1 - <title>` (hash)
2. `feat: scene 2 - <title>` (hash)
### Notes
- Deviations from outline (with reasoning)
```
## With Prior Feedback (Cycle 2+)
When the Maker receives feedback from a prior cycle's Check phase:
1. Read the `act-feedback.md` — focus on the `### For Maker` section
2. Address each finding marked as "routed to Maker"
3. In your output, include a response table:
```markdown
### Feedback Response
| Finding | Source | Action |
|---------|--------|--------|
| Test names unclear | Sage | Fixed — renamed to behavior descriptions |
| Missing edge case | Trickster | Added test for empty input |
```
Do not address findings routed to Creator — those were handled in the revised proposal.
## Quality Checklist (self-check before finishing)
Before your final commit, verify:
- [ ] All proposal changes implemented
- [ ] All new tests pass
- [ ] All existing tests still pass
- [ ] No files modified outside proposal scope
- [ ] Every logical step has its own commit
- [ ] Output summary is complete and accurate
- [ ] Branch name follows convention
## Test-First Gate
Before the Maker's output is accepted, the orchestrator validates that tests were included.
### Validation Logic
Read `do-maker-files.txt`. Check if any file path matches common test patterns:
- `*test*`, `*spec*`, `*.test.*`, `*.spec.*`, `*_test.*`, `*_spec.*`
- Files in directories named `test/`, `tests/`, `__tests__/`, `spec/`
For writing domain projects, this gate is skipped.
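A minimal sketch of the pattern check (the single `test|spec` substring match is a simplification that covers all the glob patterns listed above):

```python
import re

FILE_PATTERN = re.compile(r"(test|spec)", re.IGNORECASE)  # covers *test*, *.spec.*, *_test.* etc.
TEST_DIRS = {"test", "tests", "__tests__", "spec"}

def has_test_files(paths):
    """True if any path is a test file by name or sits in a test directory."""
    for path in paths:
        *dirs, name = path.split("/")
        if any(d in TEST_DIRS for d in dirs) or FILE_PATTERN.search(name):
            return True
    return False
```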
### Outcomes
| Result | Action |
|--------|--------|
| Test files found | Pass — proceed to Check phase |
| No test files, code domain | **Warn** — emit WARNING event, note in do-maker.md |
| No test files + Creator specified tests | **Block** — re-run Maker with test instruction (1 retry) |
| Writing domain | Skip gate entirely |
The block case triggers a targeted re-run with prompt:
"The proposal specified these test cases: <test strategy section>. No test files
were found in your changes. Add the specified tests before finishing."
This is one retry within the Do phase, not a full PDCA cycle.

# Domain Adapter System
ArcheFlow's PDCA pipeline and archetype system are domain-agnostic. This skill defines how to adapt them to specific domains (writing, code, research, etc.) so that events, metrics, reviews, and context use terminology that makes sense for the work being done.
## Domain Registry
Domain definitions live in `.archeflow/domains/<name>.yaml`. Each domain maps ArcheFlow's generic concepts to domain-specific equivalents and configures what metrics to track, what reviewers should focus on, and what context agents need.
### Writing Domain
```yaml
# .archeflow/domains/writing.yaml
name: writing
description: "Creative writing — stories, novels, non-fiction"
# Concept mapping — how generic ArcheFlow terms translate
concepts:
implementation: "draft/prose"
tests: "consistency checks"
files_changed: "word count delta"
test_coverage: "voice drift score"
code_review: "prose review"
build: "compile/export"
deploy: "publish"
refactor: "revision"
bug: "continuity error"
feature: "scene/chapter"
PR: "manuscript submission"
# Metrics — what to track instead of lines/files/tests
metrics:
- word_count
- voice_drift_score
- dialect_density
- essen_count # Giesing Gschichten rule: food in every scene
- scene_count
- dialogue_ratio
# Review focus areas — override default Guardian/Sage lenses
review_focus:
guardian:
- plot_coherence
- character_consistency
- timeline_accuracy
- continuity
sage:
- voice_consistency
- prose_quality
- dialect_authenticity
- forbidden_pattern_violations
skeptic:
- premise_strength
- character_motivation
- ending_satisfaction
trickster:
- reader_confusion_points
- pacing_dead_spots
- suspension_of_disbelief_breaks
# Context injection — what extra files agents should read per phase
context:
always:
- "voice profile YAML (profiles/*.yaml)"
- "persona YAML (personas/*.yaml)"
- "character sheets (characters/*.yaml)"
plan_phase:
- "series config (colette.yaml)"
- "previous stories (if series, for continuity)"
- "story brief / premise"
do_phase:
- "scene outline from Creator"
- "voice profile (for style reference)"
check_phase:
- "voice profile (for Sage drift scoring)"
- "outline (for Guardian coherence check)"
- "character sheets (for consistency)"
# Model preferences — domain-specific overrides
model_overrides:
maker: sonnet # Prose quality matters more than for code
story-sage: sonnet # Needs taste for voice evaluation
```
### Code Domain (Default)
```yaml
# .archeflow/domains/code.yaml
name: code
description: "Software development — applications, libraries, infrastructure"
concepts:
implementation: "code changes"
tests: "automated tests"
files_changed: "files changed"
test_coverage: "test coverage %"
code_review: "code review"
build: "build/compile"
deploy: "deploy"
refactor: "refactor"
bug: "bug"
feature: "feature"
PR: "pull request"
metrics:
- files_changed
- lines_added
- lines_removed
- tests_added
- tests_passing
- coverage_delta
review_focus:
guardian:
- security_vulnerabilities
- breaking_changes
- dependency_risks
- error_handling
sage:
- code_quality
- test_coverage
- documentation
- pattern_consistency
skeptic:
- design_assumptions
- scalability
- alternative_approaches
- edge_cases
trickster:
- malformed_input
- concurrency_races
- error_path_exploitation
- dependency_failures
context:
always:
- "README.md"
- ".archeflow/config.yaml"
plan_phase:
- "relevant source files (Explorer identifies)"
- "existing tests for affected area"
do_phase:
- "Creator's proposal"
- "test fixtures and helpers"
check_phase:
- "git diff from Maker"
- "proposal risk section"
model_overrides: {}
# Code domain uses default archetype model assignments
```
### Research Domain (Example Extension)
```yaml
# .archeflow/domains/research.yaml
name: research
description: "Academic or technical research — papers, analysis, literature review"
concepts:
implementation: "draft/analysis"
tests: "citation verification"
files_changed: "section count"
test_coverage: "source coverage"
code_review: "peer review"
build: "compile (LaTeX/PDF)"
deploy: "submit/publish"
metrics:
- word_count
- citation_count
- source_diversity
- claim_count
- unsupported_claims
review_focus:
guardian:
- factual_accuracy
- citation_validity
- logical_coherence
- methodology_soundness
sage:
- argument_structure
- prose_clarity
- academic_tone
- completeness
context:
always:
- "bibliography/references"
- "research brief"
plan_phase:
- "prior literature notes"
- "methodology constraints"
check_phase:
- "citation database"
- "claims vs. evidence mapping"
model_overrides:
maker: sonnet # Research writing needs quality
```
## Domain Detection
ArcheFlow auto-detects the domain based on project markers. Detection runs once at `run.start` and the result is stored in the run's event stream.
### Detection Priority (highest first)
| Priority | Signal | Domain | Rationale |
|----------|--------|--------|-----------|
| 1 | CLI flag `--domain <name>` | as specified | Explicit override always wins |
| 2 | Team preset has `domain: <name>` | as specified | Preset knows its domain |
| 3 | `colette.yaml` exists in project root | `writing` | Colette is the writing platform |
| 4 | `*.bib` or `references/` exists | `research` | Bibliography signals research |
| 5 | `package.json` exists | `code` | Node.js project |
| 6 | `Cargo.toml` exists | `code` | Rust project |
| 7 | `pyproject.toml` exists | `code` | Python project |
| 8 | `go.mod` exists | `code` | Go project |
| 9 | `Makefile` or `CMakeLists.txt` exists | `code` | C/C++ project |
| 10 | No markers found | `code` | Default fallback |
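The priority order can be sketched as a pure function over the set of marker names found in the project root (illustrative; representing a directory with a trailing slash is an assumption of this sketch):

```python
def detect_domain(markers, cli_domain=None, preset_domain=None):
    """markers: set of file/directory names present in the project root."""
    if cli_domain:                      # priority 1: explicit override always wins
        return cli_domain
    if preset_domain:                   # priority 2: preset knows its domain
        return preset_domain
    if "colette.yaml" in markers:       # priority 3: writing platform marker
        return "writing"
    if "references/" in markers or any(m.endswith(".bib") for m in markers):
        return "research"               # priority 4: bibliography signals research
    code_markers = {"package.json", "Cargo.toml", "pyproject.toml", "go.mod",
                    "Makefile", "CMakeLists.txt"}
    if markers & code_markers:          # priorities 5-9: language project markers
        return "code"
    return "code"                       # priority 10: default fallback
```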
### Detection in Team Presets
Team presets can declare their domain explicitly:
```yaml
# .archeflow/teams/story-development.yaml
name: story-development
domain: writing # <-- explicit domain
description: "Kurzgeschichten-Entwicklung"
plan: [story-explorer, creator]
do: [maker]
check: [guardian, story-sage]
```
When `domain` is set in the preset, detection is skipped entirely.
### Detection Event
Domain detection emits a decision event:
```jsonl
{"ts":"...","run_id":"...","seq":1,"parent":[],"type":"decision","phase":"init","agent":null,"data":{"what":"domain_detection","chosen":"writing","signal":"colette.yaml exists","alternatives":[{"id":"code","reason_rejected":"No code project markers found"}]}}
```
## How Domains Affect Orchestration
### 1. Concept Translation in Reports
The orchestration report and session log use domain-translated terms:
```markdown
# Code domain report
- **Files changed:** 4 files, +120 -30 lines
- **Tests added:** 8 new tests
# Writing domain report (same data, different framing)
- **Word count delta:** +6004 words across 7 scenes
- **Consistency checks:** voice drift 0.12, 2 continuity fixes applied
```
### 2. Domain-Specific Event Data
Events include domain-relevant metrics in their `data` payload:
```jsonl
// Writing domain — agent.complete
{"type":"agent.complete","data":{"archetype":"maker","duration_ms":180000,"word_count":6004,"voice_drift":0.12,"scenes":7,"dialogue_ratio":0.35,"essen_count":4}}
// Code domain — agent.complete
{"type":"agent.complete","data":{"archetype":"maker","duration_ms":90000,"files_changed":5,"tests_added":12,"coverage_delta":"+3%","lines_added":245,"lines_removed":80}}
// Writing domain — run.complete
{"type":"run.complete","data":{"status":"completed","word_count":6004,"voice_drift_final":0.08,"scenes":7,"dialect_density":0.15,"cycles":1}}
// Code domain — run.complete
{"type":"run.complete","data":{"status":"completed","files_changed":4,"tests_total":20,"coverage":"87%","cycles":2}}
```
### 3. Review Focus Override
When a domain defines `review_focus`, reviewers receive domain-specific instructions instead of the defaults:
```
# Without domain adapter (code defaults):
Guardian → "Check for security vulnerabilities, breaking changes..."
# With writing domain adapter:
Guardian → "Check for plot coherence, character consistency, timeline accuracy, continuity..."
```
The orchestration skill reads the domain's `review_focus` and injects it into the reviewer prompt. The archetype's base personality (virtue, shadow, lens) stays the same — only the checklist changes.
### 4. Context Injection
The domain's `context` config tells the orchestrator which additional files to pass to each agent:
```
# Plan phase in writing domain:
# Orchestrator automatically includes voice profile, persona, character sheets, series config
# alongside the standard task description and Explorer output
# Check phase in writing domain:
# Guardian gets the outline (for coherence)
# Sage gets the voice profile (for drift scoring)
```
Context injection is additive — domain context is added on top of ArcheFlow's standard context rules (task description, prior phase output, etc.).
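A minimal sketch of the additive rule, with placeholder paths standing in for the real artifact-routing layout:

```shell
# Sketch: standard context first, then domain-declared files appended.
# The concrete paths here are illustrative assumptions.
build_context() {
  phase="$1"; domain_files="$2"   # space-separated paths from the domain config
  set -- ".archeflow/task.md"     # standard: task description
  if [ "$phase" != "plan" ]; then
    set -- "$@" ".archeflow/artifacts/prior-phase.md"   # prior phase output
  fi
  for f in $domain_files; do      # additive: domain context on top
    set -- "$@" "$f"
  done
  printf '%s\n' "$@"
}
```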
### 5. Model Overrides
If the domain specifies `model_overrides`, those override the default model assignment for the listed archetypes:
```
# Default: Maker uses whatever the workflow assigns (often haiku for cheap tasks)
# Writing domain: Maker uses sonnet (prose quality matters)
# Research domain: Maker uses sonnet (analysis quality matters)
```
Model overrides interact with cost tracking — the cost-tracking skill reads the effective model assignment (after domain overrides) for its estimates.
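The precedence can be sketched as follows; the `archetype=model` pair encoding is an illustrative assumption, not the real config format:

```shell
# Sketch: a domain override wins over the workflow's default model.
effective_model() {
  archetype="$1"; workflow_default="$2"; shift 2   # rest: "archetype=model" pairs
  for pair in "$@"; do
    case "$pair" in
      "$archetype="*) echo "${pair#*=}"; return ;;  # override found
    esac
  done
  echo "$workflow_default"                          # no override: keep default
}
```

Cost tracking then reads the returned value rather than the workflow default.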
## Detection Priority
| Priority | Signal | Domain |
|----------|--------|--------|
| 1 | CLI `--domain <name>` | as specified |
| 2 | Team preset `domain:` field | as specified |
| 3 | `colette.yaml` exists | writing |
| 4 | `*.bib` or `references/` exists | research |
| 5 | `package.json`, `Cargo.toml`, `pyproject.toml`, `go.mod`, `Makefile`, `CMakeLists.txt` | code |
| 6 | No markers | code (default) |
## Adding a New Domain
1. Create `.archeflow/domains/<name>.yaml` following the schema above (`name`, `concepts`, and `metrics` are the minimum)
2. Optionally add `review_focus`, `context`, and `model_overrides`; missing sections fall back to the `code` domain defaults
3. Add detection signals to the priority table (or rely on `--domain` / team preset)
4. Define custom archetypes if needed (e.g., `story-explorer` for writing)
5. Test with `--domain <name> --dry-run` to verify detection and context injection
### Minimum Viable Domain
Only `name`, `concepts`, and `metrics` are required. Everything else has sensible defaults:
```yaml
name: legal
description: "Legal document drafting and review"
concepts:
implementation: "draft"
tests: "compliance checks"
code_review: "legal review"
metrics:
- clause_count
- citation_count
- compliance_score
```
Missing sections fall back to the `code` domain defaults.
## Integration with Other Skills
- **`orchestration`**: Reads domain config at `run.start`, applies concept translation, context injection, model overrides, and review focus throughout the run
- **`process-log`**: Domain-specific event data fields are included in `agent.complete` and `run.complete` payloads
- **`cost-tracking`**: Reads `model_overrides` from the active domain to calculate accurate cost estimates
- **`custom-archetypes`**: Domain-specific archetypes (e.g., `story-explorer`, `story-sage`) are defined per-project and referenced in team presets
- **`workflow-design`**: Custom workflows can reference a domain explicitly
## Design Principles
1. **Additive, not replacing.** Domains add context and translate terms. They do not change the PDCA cycle, archetype system, or event schema.
2. **Graceful degradation.** If no domain config exists, everything works as before (code domain defaults).
3. **One domain per run.** A run operates in exactly one domain. Multi-domain projects use separate runs.
4. **Domain config is data, not code.** YAML files, no scripts. Portable across projects.

---
name: effectiveness
description: |
Track archetype effectiveness across runs. Scores each archetype on signal-to-noise,
fix rate, cost efficiency, accuracy, and cycle impact. Recommends model tier changes
and archetype removal based on rolling averages.
<example>User: "Which reviewers are actually useful?"</example>
<example>User: "Show archetype effectiveness report"</example>
---
# Agent Effectiveness Scoring
Track which archetypes are most useful vs. which waste tokens. Over multiple runs, build a profile of each archetype's effectiveness and use it to optimize team composition and model selection.
## Storage
```
.archeflow/memory/effectiveness.jsonl # Per-run archetype scores (append-only)
```
## Scoring Dimensions
For each archetype that participates in a run, calculate these scores:
| Dimension | How Measured | Weight |
|-----------|-------------|--------|
| **Signal-to-noise** | useful findings / total findings | 0.30 |
| **Fix rate** | findings that led to actual fixes / total findings | 0.25 |
| **Cost efficiency** | useful findings per dollar spent | 0.20 |
| **Accuracy** | findings not contradicted by other reviewers | 0.15 |
| **Cycle impact** | did this archetype's findings lead to cycle exit? | 0.10 |
### Definitions
- **Useful finding**: A finding in a `review.verdict` event with `severity >= WARNING` (i.e., severity is `warning`, `bug`, or `critical`) AND `fix_required == true`.
- **Actual fix**: A `fix.applied` event whose `source` field matches this archetype (or whose DAG `parent` chain traces back to this archetype's `review.verdict` event).
- **Contradicted finding**: Another reviewer's `review.verdict` has `verdict == "approved"` for the same scope where this archetype flagged an issue. Approximation: if archetype A flags N findings but archetype B approves the same code with 0 findings in overlapping severity categories, A's unmatched findings are considered potentially contradicted.
- **Cycle impact**: The archetype's findings (with `fix_required == true`) resulted in fixes that were part of the final approved cycle. Determined by checking if `fix.applied` events referencing this archetype exist before the final `cycle.boundary` with `met == true`.
### Composite Score
```
composite = (signal_to_noise * 0.30)
+ (fix_rate * 0.25)
+ (cost_efficiency_normalized * 0.20)
+ (accuracy * 0.15)
+ (cycle_impact * 0.10)
```
**Cost efficiency normalization**: Raw cost efficiency is `useful_findings / cost_usd`. To normalize to 0-1 range, use: `min(1.0, raw_efficiency / 100)`. The threshold of 100 means "100 useful findings per dollar" is considered perfect efficiency (achievable with haiku on structured reviews).
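The composite with normalization can be written out as a small helper (a sketch; the argument order is an assumption):

```shell
# Sketch: composite score per the weights above.
# Args: signal_to_noise fix_rate useful_findings cost_usd accuracy cycle_impact(0|1)
composite() {
  awk -v sn="$1" -v fr="$2" -v useful="$3" -v cost="$4" -v acc="$5" -v ci="$6" 'BEGIN {
    eff = (useful / cost) / 100            # normalize: 100 useful findings/$ = 1.0
    if (eff > 1.0) eff = 1.0               # cap at perfect efficiency
    printf "%.2f\n", sn*0.30 + fr*0.25 + eff*0.20 + acc*0.15 + ci*0.10
  }'
}
```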
## Per-Run Scoring
After `run.complete`, calculate scores for each archetype that participated. The `extract` command does this.
### Per-Run Score Record
```jsonl
{"ts":"2026-04-03T16:00:00Z","run_id":"2026-04-03-der-huster","archetype":"guardian","signal_to_noise":0.85,"fix_rate":1.0,"cost_efficiency":42.5,"accuracy":1.0,"cycle_impact":true,"composite_score":0.91,"tokens":5000,"cost_usd":0.004,"model":"haiku","findings_total":4,"findings_useful":3,"fixes_applied":3}
```
Appended to `.archeflow/memory/effectiveness.jsonl`.
### Scoring Non-Review Archetypes
Only archetypes that produce `review.verdict` events are scored (Guardian, Skeptic, Sage, Trickster, and any custom review archetypes). Non-review archetypes (Explorer, Creator, Maker) are tracked by cost-tracking but not effectiveness-scored, because their output quality is measured differently (by whether the run succeeds, not by individual findings).
## Aggregate Scoring
Across all runs, maintain rolling averages (computed on-demand, not stored):
```jsonl
{"archetype":"guardian","runs":12,"avg_composite":0.88,"avg_signal_noise":0.82,"avg_cost_efficiency":38.2,"trend":"stable","recommendation":"keep"}
{"archetype":"trickster","runs":8,"avg_composite":0.35,"avg_signal_noise":0.20,"avg_cost_efficiency":5.1,"trend":"declining","recommendation":"consider_removing"}
```
### Trend Calculation
Compare the average composite score of the last 5 runs to the 5 runs before that:
- **improving**: last-5 avg > prior-5 avg + 0.05
- **declining**: last-5 avg < prior-5 avg - 0.05
- **stable**: within +/- 0.05
If fewer than 10 runs exist, trend is `"insufficient_data"`.
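A sketch of the trend calculation over an archetype's composite-score history (oldest first):

```shell
# Sketch: compare the last-5 average against the prior-5 average.
trend() {
  printf '%s\n' "$@" | awk '
  { s[NR] = $1 }
  END {
    if (NR < 10) { print "insufficient_data"; exit }
    for (i = NR-4; i <= NR; i++)   last5  += s[i] / 5
    for (i = NR-9; i <= NR-5; i++) prior5 += s[i] / 5
    if      (last5 > prior5 + 0.05) print "improving"
    else if (last5 < prior5 - 0.05) print "declining"
    else                            print "stable"
  }'
}
```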
### Recommendations
Based on aggregate composite scores:
| Composite Score | Recommendation | Meaning |
|----------------|---------------|---------|
| >= 0.70 | `keep` | Archetype is valuable, contributes meaningful findings |
| 0.40 - 0.69 | `optimize` | Consider cheaper model or tighter review lens |
| < 0.40 | `consider_removing` | Might be wasting tokens, review whether it adds value |
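The mapping above, as a one-line helper (sketch):

```shell
# Sketch: map an aggregate composite score to a recommendation.
recommend() {
  awk -v s="$1" 'BEGIN {
    if      (s >= 0.70) print "keep"
    else if (s >= 0.40) print "optimize"
    else                print "consider_removing"
  }'
}
```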
## Integration Points
### At Run Start
When the `run` skill initializes, show a brief effectiveness summary for the team's archetypes:
```
Archetype effectiveness (last 10 runs):
guardian: 0.88 (keep) — haiku, $0.004/run avg
sage: 0.72 (keep) — sonnet, $0.08/run avg
skeptic: 0.45 (optimize) — haiku, $0.003/run avg
trickster: 0.32 (consider_removing) — haiku, $0.003/run avg
```
### Model Tier Suggestions
Cross-reference effectiveness with model assignment:
- **High effectiveness on cheap model** (composite >= 0.7, model = haiku): "Keep cheap. Working well."
- **Low effectiveness on cheap model** (composite < 0.5, model = haiku): "Consider upgrading to sonnet — cheap model may not be capturing issues."
- **High effectiveness on expensive model** (composite >= 0.7, model = sonnet): "Try downgrading to haiku — may maintain quality at lower cost."
- **Low effectiveness on expensive model** (composite < 0.5, model = sonnet): "Consider removing — expensive and not contributing."
### Cost-Tracking Integration
Multiply estimated cost by effectiveness to get "value per dollar":
```
value_per_dollar = composite_score / cost_usd
```
This metric helps compare archetypes directly: a cheap archetype with moderate effectiveness may have higher value_per_dollar than an expensive one with high effectiveness.
## Effectiveness Script
**Location:** `lib/archeflow-score.sh`
```
Usage:
archeflow-score.sh extract <events.jsonl> # Score archetypes from a completed run
archeflow-score.sh report # Show aggregate effectiveness report
archeflow-score.sh recommend <team.yaml> # Recommend model tiers for a team
```
### `extract` Command
1. Read all events from the JSONL file
2. Verify a `run.complete` event exists (scoring incomplete runs is unreliable)
3. For each `review.verdict` event:
- Count total findings and useful findings (severity >= WARNING, fix_required)
- Cross-reference with `fix.applied` events via the `source` field or DAG parent chain
- Check for contradictions from other reviewers
- Determine cycle impact
4. Calculate all scoring dimensions and composite score
5. Append per-archetype score records to `.archeflow/memory/effectiveness.jsonl`
### `report` Command
1. Read `.archeflow/memory/effectiveness.jsonl`
2. Group by archetype
3. Calculate rolling averages (last 10 runs per archetype)
4. Calculate trends (last 5 vs. prior 5)
5. Output a markdown table:
```markdown
# Archetype Effectiveness Report
| Archetype | Runs | Avg Score | S/N | Fix Rate | Cost Eff | Accuracy | Trend | Rec |
|-----------|------|-----------|-----|----------|----------|----------|-------|-----|
| guardian | 12 | 0.88 | 0.82 | 0.95 | 38.2 | 0.97 | stable | keep |
| sage | 10 | 0.72 | 0.70 | 0.80 | 12.1 | 0.88 | improving | keep |
| skeptic | 8 | 0.45 | 0.40 | 0.50 | 22.5 | 0.60 | stable | optimize |
| trickster | 8 | 0.35 | 0.20 | 0.30 | 5.1 | 0.55 | declining | consider_removing |
**Model suggestions:**
- skeptic (haiku, score 0.45): Consider upgrading to sonnet or tightening review lens
- trickster (haiku, score 0.35): Consider removing — low signal, low fix rate
```
### `recommend` Command
1. Read the team preset YAML file
2. For each archetype in the team, look up its effectiveness from `.archeflow/memory/effectiveness.jsonl`
3. Cross-reference current model assignment with effectiveness
4. Output recommendations:
```markdown
# Model Recommendations for team: story-development
| Archetype | Current Model | Score | Suggestion |
|-----------|--------------|-------|------------|
| guardian | haiku | 0.88 | Keep haiku — high effectiveness at low cost |
| sage | sonnet | 0.72 | Keep sonnet — quality-sensitive role |
| skeptic | haiku | 0.45 | Try sonnet — may improve signal quality |
| trickster | haiku | 0.35 | Consider removing from team |
```
## Design Principles
1. **Append-only.** Score records are immutable facts. Aggregates are computed on-demand.
2. **Review archetypes only.** Non-review agents (Explorer, Creator, Maker) are not scored — their value is in the final product, not in individual findings.
3. **Relative, not absolute.** Scores are meaningful in comparison (guardian vs. trickster), not as standalone numbers. The thresholds (0.7, 0.4) are starting points — calibrate after 20+ runs.
4. **Actionable.** Every report ends with concrete recommendations (keep, optimize, remove, change model).
5. **Cheap to compute.** One JSONL scan per report. No databases, no external services.

---
description: |
Enables rollback to any phase boundary and full audit trail via git history.
<example>Automatically loaded by archeflow:run when git.enabled is true</example>
<example>User: "archeflow rollback --to plan"</example>
<example>User: "Show me the git history for this run"</example>
---
# Git Integration — Per-Phase Commit Strategy
Every ArcheFlow run creates a dedicated branch. Each phase transition and agent completion produces a commit. At run completion, the branch is merged back to the base branch. On failure, the branch stays intact for inspection or rollback.
## Prerequisites
- `archeflow:orchestration` — workflow rules and safety constraints
- `archeflow:process-log` — event schema (git events are emitted alongside process events)
- `archeflow:artifact-routing` — artifact paths that get committed
## Helper Script
All git operations go through the helper script:
```bash
./lib/archeflow-git.sh <command> <run_id> [args...]
```
See `lib/archeflow-git.sh` for full usage. The skill describes *when* to call the script; the script handles *how*.
## Branch Strategy
```
main (or current base branch)
└── archeflow/<run_id> # Created at run.start
├── commit: "archeflow(plan): explorer research"
├── commit: "archeflow(plan): creator outline"
├── commit: "archeflow(plan→do): phase transition"
├── commit: "archeflow(do): maker draft"
├── commit: "archeflow(do→check): phase transition"
├── commit: "archeflow(check): guardian review"
├── commit: "archeflow(check): sage review"
├── commit: "archeflow(check→act): phase transition"
├── commit: "archeflow(act): apply 6 fixes"
├── commit: "archeflow(act): cycle 1 complete"
└── commit: "archeflow(run): complete — <summary>"
```
Branch naming: `archeflow/<run_id>` (e.g., `archeflow/2026-04-03-jwt-auth`).
---
## Commit Points
| Trigger | What to commit | Message format |
|---------|---------------|----------------|
| After `agent.complete` | Agent artifacts + any created/modified files | `archeflow(<phase>): <archetype> <summary>` |
| After `phase.transition` | All artifacts from completed phase | `archeflow(<from>→<to>): phase transition` |
| After each `fix.applied` | The fixed file | `archeflow(fix): <source> — <finding summary>` |
| After `cycle.boundary` | Everything staged | `archeflow(act): cycle <N> <status>` |
| After `run.complete` | Final state + process report | `archeflow(run): complete — <summary>` |
## Commit Protocol
1. **Stage only relevant files.** Never `git add -A`. Stage:
- `.archeflow/artifacts/<run_id>/` — artifacts produced by the current agent/phase
- `.archeflow/events/<run_id>.jsonl` — updated event log
- Any project files created or modified by the current agent (from `do-maker-files.txt` or explicit file list)
2. **Exclude ephemeral files.** Never commit:
- `.archeflow/progress.md` (live progress display, ephemeral)
- `.archeflow/explorer-cache/` (local cache, not run-specific)
- `.archeflow/session-log.md` (separate concern)
3. **Use conventional commit format:** `archeflow(<scope>): <message>`
4. **Signing:** If `git.signing_key` is configured, pass `-c user.signingkey=<key>` to `git commit`.
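Rules 1 and 2 amount to an explicit allowlist. A sketch (hypothetical helper; the real staging logic lives in `lib/archeflow-git.sh`):

```shell
# Sketch: stage only run-relevant paths, never `git add -A`.
stage_run_files() {
  run_id="$1"; shift   # remaining args: explicit project files from the agent
  git add ".archeflow/artifacts/$run_id" ".archeflow/events/$run_id.jsonl" "$@"
  # Ephemeral files (progress.md, explorer-cache/, session-log.md) are simply
  # never listed, so they are never staged.
}
```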
### Integration with the Run Skill
The `archeflow:run` skill calls git operations at these points:
| Run event | Command |
|-----------|---------|
| `run.start` | `init <run_id>` (create and switch branch) |
| `agent.complete` | `commit <run_id> <phase> "<archetype> <summary>" [files...]` |
| `phase.transition` | `phase-commit <run_id> <phase>` |
| `fix.applied` | `commit <run_id> fix "<source> — <finding>"` |
| `cycle.boundary` | `commit <run_id> act "cycle <N> <status>"` |
| `run.complete` (ok) | `merge <run_id> [--squash\|--no-ff]` |
| `run.complete` (fail) | none; branch preserved, not merged |
---
## Run Lifecycle
### 1. Initialization (`run.start`)
```bash
./lib/archeflow-git.sh init <run_id>
```
This will:
1. Verify a clean working tree (or stash uncommitted changes)
2. Create branch `archeflow/<run_id>` from current HEAD
3. Switch to the new branch
### 2. During Execution (phase commits)
After each agent completes or phase transitions, the run skill calls:
```bash
# After an agent completes:
./lib/archeflow-git.sh commit <run_id> plan "explorer research" \
.archeflow/artifacts/<run_id>/plan-explorer.md
# After a phase transition:
./lib/archeflow-git.sh phase-commit <run_id> plan
```
The `commit` command stages artifact directories and event logs automatically. Additional files can be passed as trailing arguments.
The `phase-commit` command stages all artifacts matching the phase prefix and commits with a transition message.
### 3. Completion (merge)
```bash
# Success — squash merge (default):
./lib/archeflow-git.sh merge <run_id> --squash
# Success — preserve history:
./lib/archeflow-git.sh merge <run_id> --no-ff
# Failure or user abort:
# Do nothing. Branch stays for inspection.
echo "Branch archeflow/<run_id> preserved for inspection."
```
The merge command:
1. Verifies all changes on the branch are committed
2. Switches to the base branch (main or wherever the run started)
3. Merges with the chosen strategy
4. If squash: creates a single commit with `feat: <task summary>`
5. Does NOT delete the branch (user may want to inspect)
### 4. Cleanup (optional, after inspection)
```bash
./lib/archeflow-git.sh cleanup <run_id>
```
Deletes the branch after the user has confirmed the merge is correct.
---
## Rollback
Roll back to the end of any completed phase:
```bash
./lib/archeflow-git.sh rollback <run_id> --to plan
```
This will:
1. Find the last commit for the target phase by searching commit messages
2. Show the user what commits will be lost (everything after the target)
3. Perform `git reset --hard <commit>` on the branch
4. Trim the events JSONL to remove events that occurred after the rollback point
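Step 1 relies on the commit-message convention. A sketch of the lookup (the real implementation lives in `lib/archeflow-git.sh`):

```shell
# Sketch: find the most recent commit for a phase on the run branch.
last_phase_commit() {
  run_id="$1"; phase="$2"
  # --grep matches the conventional prefix; the closing paren avoids matching
  # transition commits like "archeflow(plan→do)".
  git log "archeflow/$run_id" --grep="archeflow($phase)" -n 1 --format=%H
}
```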
**Supported rollback targets:** `plan`, `do`, `check`, `act`, or any cycle number (`cycle-1`, `cycle-2`).
**Safety:** Rollback only works on the run's branch, never on main. The script verifies you are on `archeflow/<run_id>` before proceeding.
---
## Status
View the git state of a run:
```bash
./lib/archeflow-git.sh status <run_id>
```
Output:
```
Branch: archeflow/2026-04-03-jwt-auth
Base: main (3 commits ahead)
Commits:
abc1234 archeflow(plan): explorer research
def5678 archeflow(plan): creator outline
ghi9012 archeflow(plan→do): phase transition
jkl3456 archeflow(do): maker implementation
Current phase: do
Files changed (total): 8
Uncommitted changes: none
```
---
## Configuration
In `.archeflow/config.yaml` or a team preset:
```yaml
git:
  enabled: true                 # Default: true. Set false to disable all git operations.
  branch_prefix: "archeflow/"   # Default. The run_id is appended.
  commit_style: conventional    # conventional (archeflow(<scope>): msg) | simple (<phase>: msg)
  merge_strategy: squash        # squash | no-ff | rebase
  auto_push: false              # Push branch to remote after each commit
  signing_key: null             # SSH key path for signed commits (e.g., ~/.ssh/id_ed25519.pub)
```
The helper script reads this config if it exists. All values have sensible defaults.
---
## Post-Merge Rollback
After merging, the run skill validates the merge by running the project's test suite. If tests fail, the merge is automatically reverted.
### Script
```bash
./lib/archeflow-rollback.sh <run_id> [--test-cmd <cmd>]
```
**Behavior:**
1. Reads `test_command` from `.archeflow/config.yaml` (or uses `--test-cmd` override)
2. Runs the test suite with a 5-minute timeout
3. If tests pass: exits 0 (merge is good)
4. If tests fail: runs `git revert --no-edit HEAD`, emits a `decision` event, exits 1
5. Verifies HEAD is an ArcheFlow merge commit before reverting (warning if not, proceeds anyway)
**Integration with run skill:** Called in section 4c (All Approved) after `archeflow-git.sh merge`. If it returns non-zero, the orchestrator cycles back with "integration test failure" feedback or reports to the user if max cycles are reached.
**Configuration:** Set `test_command` in `.archeflow/config.yaml`:
```yaml
test_command: "npm test" # or "pytest", "cargo test", etc.
```
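The pass/fail flow can be sketched as follows (a simplification of `archeflow-rollback.sh`; the merge-commit verification in step 5 is omitted):

```shell
# Sketch: run the configured test command with a 5-minute timeout;
# revert the merge commit at HEAD when tests fail.
validate_merge() {
  test_cmd="$1"
  if timeout 300 sh -c "$test_cmd"; then
    return 0                      # tests pass: merge stands
  fi
  git revert --no-edit HEAD       # tests fail: undo the merge commit
  return 1
}
```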
---
## Safety Rules
These rules are inherited from `archeflow:orchestration` and reinforced here:
1. **Never force-push.** No `--force`, no `--force-with-lease`. If a push fails, diagnose and fix.
2. **Never modify main history.** Merges are forward-only. No rebasing main.
3. **Branch stays intact on failure.** If a run fails or is aborted, the branch is preserved for inspection. Never auto-delete failed branches.
4. **All commits are individually revertable.** Each commit represents a discrete unit of work.
5. **Worktree mode compatibility.** If the Maker runs in a worktree, git-integration commits go to the worktree's branch. The merge happens at the run level, not the worktree level. The Maker's worktree branch is a sub-branch of `archeflow/<run_id>`.
6. **Clean merge or abort.** If a merge produces conflicts, do not force-resolve. Report the conflict, leave the branch intact, and let the user decide.
7. **No signing by default.** Signing is opt-in via config. If configured, all commits on the branch are signed.
---
## Design Principles
1. **Git is the audit trail.** Every phase transition is a commit. `git log` tells the full story of a run.
2. **Rollback is cheap.** Reset to any phase boundary, re-run from there. No need to start over.
3. **Merge strategy is a project decision.** Squash for clean history, no-ff for detailed history. Both are valid.
4. **Events + git = full observability.** Process events capture *what happened* (decisions, verdicts, timing). Git captures *what changed* (files, diffs). Together they provide complete run archaeology.
5. **Fail-safe by default.** Every safety rule defaults to the conservative option. The user must explicitly opt in to destructive operations.

# Cross-Run Memory
ArcheFlow forgets everything after each run. If Guardian repeatedly flags the same type of issue (e.g., timeline errors in fiction, missing null checks in code), the next run starts from zero. This skill fixes that by extracting lessons from completed runs and injecting them into future agent prompts.
## Storage
```
.archeflow/memory/lessons.jsonl    # Append-only, one lesson per line
.archeflow/memory/archive.jsonl    # Decayed lessons (frequency reached 0)
.archeflow/memory/audit.jsonl      # Injection audit trail
```
Each lesson is a single JSON line:
```jsonl
{"id":"m-001","ts":"2026-04-03T14:00:00Z","run_id":"2026-04-03-der-huster","type":"pattern","source":"guardian","description":"Timeline references must match story start day","frequency":2,"severity":"bug","domain":"writing","tags":["continuity","timeline"],"last_seen_run":"2026-04-03-der-huster","runs_since_last_seen":0}
{"id":"m-002","ts":"2026-04-03T15:00:00Z","run_id":"2026-04-03-der-huster","type":"preference","source":"user_feedback","description":"User prefers single bundled PR over many small ones","frequency":1,"severity":"info","domain":"general","tags":["workflow"],"last_seen_run":"","runs_since_last_seen":0}
{"id":"m-003","ts":"2026-04-04T10:00:00Z","run_id":"2026-04-04-auth-fix","type":"archetype_hint","source":"sage","description":"Voice drift most common in long monologue passages","frequency":3,"severity":"warning","domain":"writing","tags":["voice","prose"],"archetype":"story-sage","last_seen_run":"2026-04-04-auth-fix","runs_since_last_seen":0}
{"id":"m-004","ts":"2026-04-04T11:00:00Z","run_id":"2026-04-04-auth-fix","type":"anti_pattern","source":"maker","description":"Splitting auth middleware into per-route handlers causes duplication","frequency":1,"severity":"warning","domain":"code","tags":["auth","middleware"],"last_seen_run":"2026-04-04-auth-fix","runs_since_last_seen":0}
```
## Lesson Types
| Type | Source | Description |
|------|--------|-------------|
| `pattern` | Auto-detected | Recurring finding across runs (same category + similar description) |
| `preference` | Manual | User correction or workflow preference (added via CLI; injected immediately, skips the frequency threshold) |
| `archetype_hint` | Auto-detected | Per-archetype insight (e.g., Sage catches voice drift in monologues) |
| `anti_pattern` | Manual or auto | Something that was tried and failed — avoid repeating |
## Lesson Fields
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique ID, format `m-NNN` (monotonically increasing) |
| `ts` | ISO 8601 | When the lesson was created or last updated |
| `run_id` | string | Run that created or last triggered this lesson |
| `type` | string | One of: `pattern`, `preference`, `archetype_hint`, `anti_pattern` |
| `source` | string | Archetype name or `user_feedback` |
| `description` | string | Human-readable lesson text |
| `frequency` | integer | How many times this lesson was triggered |
| `severity` | string | `bug`, `warning`, `info`, or `recommendation` |
| `domain` | string | `writing`, `code`, `general`, or project-specific |
| `tags` | string[] | Keywords for matching and filtering |
| `archetype` | string or null | For `archetype_hint` lessons: which archetype this applies to |
| `last_seen_run` | string | Run ID where this lesson was last matched |
| `runs_since_last_seen` | integer | Decay counter, incremented each run that does NOT trigger this lesson |
---
## Auto-Detection
After each `run.complete`, the orchestrator runs lesson extraction:
```bash
./lib/archeflow-memory.sh extract .archeflow/events/<run_id>.jsonl
```
### Extraction Algorithm
1. **Read all `review.verdict` events** from the completed run's JSONL.
2. **For each finding** in each verdict:
a. Tokenize the finding description into keywords (lowercase, strip punctuation).
b. Compare keywords against each existing lesson's description + tags.
c. **Match threshold:** 50%+ keyword overlap between finding and lesson.
3. **If match found:** Update the existing lesson:
- Increment `frequency` by 1
- Update `ts` to now
- Update `last_seen_run` to current run ID
- Reset `runs_since_last_seen` to 0
4. **If no match AND severity >= WARNING:** Add as candidate lesson with `frequency: 1`.
5. **Candidates become active** when `frequency >= 2` (triggered in a second run).
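The keyword-overlap matching in step 2 can be sketched in plain bash. This is a simplified illustration; `overlap_percent` is a hypothetical helper, not a documented part of `archeflow-memory.sh`:

```shell
# Hypothetical helper (not part of archeflow-memory.sh): percent of a
# finding's keywords that also appear in a lesson's text.
overlap_percent() {
  local finding lesson word total=0 hits=0
  # Tokenize: lowercase, strip punctuation, split on whitespace.
  finding=$(echo "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' ' ')
  lesson=$(echo "$2" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' ' ')
  for word in $finding; do
    total=$((total + 1))
    if grep -qw "$word" <<<"$lesson"; then hits=$((hits + 1)); fi
  done
  if [ "$total" -gt 0 ]; then echo $((hits * 100 / total)); else echo 0; fi
}

overlap_percent "Timeline references must match story start day" \
  "timeline continuity: references must match the story start day"  # prints: 100
```

A result of 50 or above would count as a match under the threshold in step 2c.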
### Promotion Rule
A finding that appears in only one run stays at `frequency: 1` — it might be a one-off. Once the same pattern appears in a second run (matched by keyword overlap), it gets promoted to `frequency: 2` and becomes eligible for injection.
This prevents noise from single-run anomalies while still capturing genuine recurring issues quickly. Manually added preferences skip this threshold and are injected immediately.
---
## Injection
At run start, before spawning agents, the orchestrator injects relevant lessons:
```bash
LESSONS=$(./lib/archeflow-memory.sh inject <domain> <archetype>)
```
### Injection Rules
1. Read `lessons.jsonl`.
2. Filter by `domain` (exact match or `general`) and optionally by `archetype`.
3. Only include lessons with `frequency >= 2` (confirmed patterns).
4. Sort by frequency descending (most common first).
5. Cap at **10 lessons** per injection.
6. Lessons with `frequency >= 5` are **always injected** regardless of domain/archetype filter (they are universal enough to matter).
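The domain and frequency filters above can be approximated with standard tools. This is a rough sketch (the real script uses `jq`, per the design principles below); field extraction here assumes the flat JSONL schema shown earlier:

```shell
# Sketch only: filter lessons.jsonl to a domain (or "general") with
# frequency >= 2, using nothing but grep and sed.
inject_filter() {  # $1 = domain; reads lessons.jsonl on stdin
  grep -E "\"domain\":\"($1|general)\"" |
  while IFS= read -r line; do
    freq=$(sed -n 's/.*"frequency":\([0-9]*\).*/\1/p' <<<"$line")
    if [ "${freq:-0}" -ge 2 ]; then
      echo "$line"   # confirmed pattern: eligible for injection
    fi
  done
}
```

Sorting by frequency, the 10-lesson cap, and the `frequency >= 5` override are omitted here for brevity.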
### Injection Format
Lessons are appended to the agent's system prompt as a structured markdown section:
```markdown
## Known Issues (from past runs)
- Timeline references must match story start day [seen 3x, guardian]
- Voice drift common in monologue passages >200 words [seen 2x, sage]
- Missing null checks in API response handlers [seen 5x, guardian]
```
### Integration with Run Skill
In the `run` skill, after Step 0 (Initialize) and before Step 1 (Plan Phase):
```bash
# Load cross-run memory for this domain
MEMORY_LESSONS=$(./lib/archeflow-memory.sh inject "$DOMAIN" "")

# Inject into Explorer/Creator prompts if non-empty
if [[ -n "$MEMORY_LESSONS" ]]; then
  EXPLORER_PROMPT="${EXPLORER_PROMPT}
${MEMORY_LESSONS}"
  CREATOR_PROMPT="${CREATOR_PROMPT}
${MEMORY_LESSONS}"
fi
```
For reviewers in the Check phase, inject archetype-specific lessons:
```bash
GUARDIAN_LESSONS=$(./lib/archeflow-memory.sh inject "$DOMAIN" "guardian")
SAGE_LESSONS=$(./lib/archeflow-memory.sh inject "$DOMAIN" "sage")
```
---
## Decay
Lessons that stop being relevant should fade out. After each `run.complete`, apply decay:
```bash
./lib/archeflow-memory.sh decay
```
### Decay Algorithm
1. For every lesson in `lessons.jsonl`:
- If `last_seen_run` is NOT the current run → increment `runs_since_last_seen` by 1
2. If `runs_since_last_seen >= 10`:
- Decrement `frequency` by 1
- Reset `runs_since_last_seen` to 0
3. If `frequency` drops to 0:
- Move the lesson to `.archeflow/memory/archive.jsonl` (append)
- Remove from `lessons.jsonl`
This means a lesson that was seen 5 times but then stops appearing will survive 50 runs of non-triggering before being fully archived (5 decrements x 10 runs each).
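The arithmetic can be checked with a tiny simulation (illustrative only, using the threshold from the algorithm above):

```shell
# Illustrative only: a lesson at frequency 5 that never triggers again.
freq=5; runs_since=0; runs=0
while [ "$freq" -gt 0 ]; do
  runs=$((runs + 1))
  runs_since=$((runs_since + 1))
  if [ "$runs_since" -ge 10 ]; then   # decay threshold from the algorithm above
    freq=$((freq - 1))
    runs_since=0
  fi
done
echo "archived after $runs runs"      # prints: archived after 50 runs
```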
---
## Manual Management
### Add a lesson
```bash
archeflow memory add "User prefers single bundled PR over many small ones"
# Internally: ./lib/archeflow-memory.sh add preference "User prefers single bundled PR over many small ones"
```
Manually added lessons start at `frequency: 1` but with type `preference`, which means they are injected immediately (preferences skip the frequency >= 2 threshold).
### List lessons
```bash
archeflow memory list
# Internally: ./lib/archeflow-memory.sh list
```
Output:
```
ID Freq Type Domain Description
m-001 3 pattern writing Timeline references must match story start day
m-002 1 preference general User prefers single bundled PR over many small ones
m-003 5 archetype_hint writing Voice drift most common in long monologue passages
m-004 1 anti_pattern code Splitting auth middleware causes duplication
```
### Forget a lesson
```bash
archeflow memory forget m-002
# Internally: ./lib/archeflow-memory.sh forget m-002
```
Moves the lesson to `archive.jsonl` regardless of frequency.
---
## Integration Points
| Moment | Action | Script Command |
|--------|--------|----------------|
| After `run.complete` | Extract lessons from findings | `archeflow-memory.sh extract <events.jsonl>` |
| After extraction | Apply decay to all lessons | `archeflow-memory.sh decay` |
| Before agent spawn (run start) | Inject relevant lessons | `archeflow-memory.sh inject <domain> <archetype>` |
| User command | Add/list/forget lessons | `archeflow-memory.sh add/list/forget` |
## Audit Trail
Track which lessons are injected into each run and whether they were effective.
### Storage
```
.archeflow/memory/audit.jsonl # Append-only audit log
```
### Injection Audit Record
When `--audit <run_id>` is passed to the `inject` command, an audit record is written:
```jsonl
{"ts":"2026-04-04T10:00:00Z","run_id":"2026-04-04-auth-fix","domain":"code","archetype":"","lessons_injected":["m-001","m-003"],"lesson_count":2}
```
Usage:
```bash
./lib/archeflow-memory.sh inject "$DOMAIN" "" --audit "$RUN_ID"
```
### Effectiveness Check
After a run completes, check whether injected lessons prevented issues:
```bash
./lib/archeflow-memory.sh audit-check <run_id>
```
This command:
1. Reads `audit.jsonl` for lessons injected in the given run
2. Reads the run's event file for `review.verdict` events
3. For each injected lesson, checks keyword overlap between the lesson's description and review findings
4. **No matching finding** = `helpful` (the lesson likely prevented the issue)
5. **Matching finding** = `ineffective` (the issue repeated despite the lesson being injected)
6. Appends effectiveness results to `audit.jsonl`
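The classification in steps 4 and 5 can be sketched as follows. This is a simplification (the real check uses the same keyword-overlap idea as extraction; here it is reduced to a single shared-keyword test, and `classify` is a hypothetical helper):

```shell
# Simplified sketch: a lesson is 'ineffective' if any of its words appears
# in the run's review findings, 'helpful' otherwise.
# `classify` is a hypothetical helper, not a real script command.
classify() {  # $1 = lesson description, $2 = concatenated review findings
  local word
  for word in $(echo "$1" | tr '[:upper:]' '[:lower:]'); do
    if grep -qiw "$word" <<<"$2"; then
      echo "ineffective"; return
    fi
  done
  echo "helpful"
}

classify "timeline mismatch" "Voice drift in chapter 2"   # prints: helpful
```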
### Effectiveness Over Time
By querying `audit.jsonl` for effectiveness records, you can measure:
- Which lessons consistently prevent issues (high `helpful` count)
- Which lessons are not working (high `ineffective` count — consider rewording or removing)
- Overall memory system ROI (ratio of helpful to ineffective across all runs)
```bash
# Count effectiveness per lesson
jq -r 'select(.type == "effectiveness_check") | [.lesson_id, .effectiveness] | @tsv' .archeflow/memory/audit.jsonl | sort | uniq -c
```
---
## Design Principles
1. **Append-only storage.** `lessons.jsonl` is append-only during writes; decay rewrites the file in place but preserves all data (archived lessons move to `archive.jsonl`).
2. **Conservative promotion.** A finding must appear in 2+ runs before injection. One-offs are noise.
3. **Graceful degradation.** If `lessons.jsonl` doesn't exist, injection returns empty — no error, no block.
4. **Cheap.** Keyword matching, not embeddings. `jq` for JSON, `grep` for matching. No external services.
5. **Bounded.** Max 10 lessons injected per prompt. Prevents context pollution.

description: |
and enforces a shared budget. Each sub-run uses the standard `run` skill internally.
<example>User: "archeflow:multi-project" with a multi-run.yaml</example>
<example>User: "Run this across archeflow, colette, and giesing"</example>
<example>User: "archeflow:multi-project --dry-run"</example>
---
# Multi-Project Orchestration
Coordinates ArcheFlow runs across multiple projects in a workspace. Each project gets its own
PDCA run (via the standard `run` skill), but dependencies between projects are respected, artifacts
are shared, and budget is tracked globally.
## Prerequisites
Load these skills (they are referenced throughout):
- `archeflow:run` — single-project PDCA execution loop
- `archeflow:process-log` — event schema and DAG parent rules
- `archeflow:artifact-routing` — artifact naming, context injection, cycle archiving
- `archeflow:cost-tracking` — cost aggregation and budget enforcement
- `archeflow:domains` — domain detection per project
## Invocation
```
archeflow:multi-project # Read from .archeflow/multi-run.yaml
archeflow:multi-project --config path/to.yaml # Explicit config file
archeflow:multi-project --dry-run # Plan phase only for all projects, show cost estimate
archeflow:multi-project --resume <multi-run-id> # Resume a failed/paused multi-run
```
---
## Multi-Run Definition
A multi-run is defined in YAML, either in `.archeflow/multi-run.yaml` or passed via `--config`.
```yaml
name: "giesing-gschichten-v2"
description: "Write second story with improved ArcheFlow + Colette integration"
projects:
  - id: archeflow
    path: "../archeflow"   # Relative to workspace root, or absolute
    task: "Add memory injection to run skill"
    workflow: fast         # fast | standard | thorough (optional, auto-select if omitted)
    domain: code           # Optional, auto-detected if omitted
    depends_on: []         # No dependencies — can start immediately
  - id: colette
    path: "../writing.colette"
    task: "Add story-specific voice validation command"
    workflow: standard
    domain: code
    depends_on: []         # Independent of archeflow — runs in parallel
  - id: giesing
    path: "."
    task: "Write story #2 using improved tools"
    workflow: kurzgeschichte
    domain: writing
    depends_on: [archeflow, colette]  # Waits for both to complete

budget:
  total_usd: 15.00         # Hard cap — stops all projects when exceeded
  per_project_usd: 10.00   # Soft cap — warns but does not stop

parallel: true             # Run independent projects concurrently (default: true)
```
### Definition Rules
- `id` must be unique within the multi-run.
- `path` is resolved relative to the directory containing the YAML file unless absolute.
- `depends_on` references other project `id` values. Cycles are rejected at validation time.
- `workflow` and `domain` are optional. If omitted, the `run` skill auto-selects per project.
- At least one project must have an empty `depends_on` (otherwise the DAG has no entry point).
---
## Workspace Registry Integration
If `docs/project-registry.md` exists at the workspace root, the multi-project skill can:
1. **Auto-discover paths:** When `path` is omitted from a project entry, look up the project `id` in the registry to find its directory.
2. **Validate existence:** Before starting, verify that every project path exists on disk. Abort with a clear error if a path is missing.
3. **Show registry status:** In the progress table, include the project's current sprint goal from the registry alongside the multi-run status.
4. **Update registry:** After the multi-run completes, update each project's status in the registry if meaningful changes were made (new features, completed sprint goals).
---
## Execution Steps
### 0. Validate and Initialize
**0a. Parse and validate the multi-run definition:**
```
1. Read the YAML file.
2. Validate all required fields (name, projects with id/path/task).
3. Resolve all paths to absolute paths.
4. Verify each path exists on disk.
5. Build the dependency DAG.
6. Check for cycles — abort if any detected.
7. Identify the entry-point projects (depends_on is empty).
8. Verify at least one entry-point exists.
```
**0b. Generate multi-run ID and directory structure:**
```bash
MULTI_RUN_ID="$(date -u +%Y-%m-%d)-${name}"
# Master event file
mkdir -p .archeflow/events
touch .archeflow/events/${MULTI_RUN_ID}.jsonl
# Cross-project artifact directory
mkdir -p .archeflow/artifacts/${MULTI_RUN_ID}
for project in ${PROJECT_IDS}; do
mkdir -p .archeflow/artifacts/${MULTI_RUN_ID}/${project}
done
# Progress file
touch .archeflow/multi-progress.md
```
**0c. Emit `multi.start`:**
```jsonl
{"ts":"...","run_id":"<MULTI_RUN_ID>","seq":1,"parent":[],"type":"multi.start","phase":"init","agent":null,"data":{"name":"giesing-v2","description":"...","projects":["archeflow","colette","giesing"],"parallel":true,"budget_total_usd":15.00,"dag":{"archeflow":[],"colette":[],"giesing":["archeflow","colette"]}}}
```
**Track state throughout the multi-run:**
- `MULTI_RUN_ID` — unique multi-run identifier
- `MULTI_SEQ` — master event sequence counter
- `PROJECT_STATUS` — map of project_id to status (`pending | running | completed | failed | blocked | skipped`)
- `PROJECT_RUN_IDS` — map of project_id to its sub-run_id
- `TOTAL_COST` — running cost total across all projects
- `REMAINING_BUDGET` — budget minus total cost
---
### 1. Dependency Resolution
Build a topological sort of the project DAG. This determines execution order.
```
Given:
archeflow: depends_on=[]
colette: depends_on=[]
giesing: depends_on=[archeflow, colette]
Topological layers:
Layer 0 (immediate): [archeflow, colette] # No deps, start now
Layer 1: [giesing] # Depends on Layer 0
```
**Algorithm:**
1. Find all projects with zero unmet dependencies. These form the current layer.
2. When a project completes, remove it from the dependency lists of all downstream projects.
3. Any project whose dependency list becomes empty moves to the ready queue.
4. Repeat until all projects are complete, failed, or blocked.
**Cycle detection:** Before starting, verify the DAG is acyclic. Use Kahn's algorithm — if after processing all nodes the sorted list is shorter than the project list, there is a cycle. Report which projects form the cycle and abort.
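The layered sort over the example DAG can be sketched in bash 4+ (illustrative only; the orchestrator's actual implementation is not shown in this skill):

```shell
# Sketch (bash 4+ associative arrays): repeated-pass topological sort.
# deps[project] holds its dependency list; a pass with no progress means
# the remaining nodes form a cycle.
declare -A deps=( [archeflow]="" [colette]="" [giesing]="archeflow colette" )
sorted=()
remaining=("${!deps[@]}")
while [ "${#remaining[@]}" -gt 0 ]; do
  progress=0
  next=()
  for p in "${remaining[@]}"; do
    unmet=0
    for d in ${deps[$p]}; do
      # A dependency is unmet until it appears in the sorted list.
      case " ${sorted[*]} " in *" $d "*) ;; *) unmet=1 ;; esac
    done
    if [ "$unmet" -eq 0 ]; then
      sorted+=("$p"); progress=1
    else
      next+=("$p")
    fi
  done
  remaining=("${next[@]}")
  if [ "$progress" -eq 0 ]; then
    echo "cycle among: ${remaining[*]}"; break
  fi
done
echo "order: ${sorted[*]}"
```

`archeflow` and `colette` land before `giesing` in the final order, matching the layer structure above.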
---
### 2. Parallel Execution
For each project in the ready queue, start a sub-run. Independent projects run concurrently.
**Starting a sub-run:**
```
For each ready project:
1. Set PROJECT_STATUS[project_id] = "running"
2. Generate sub-run ID: MULTI_RUN_ID/project_id
(e.g., "2026-04-03-giesing-v2/archeflow")
3. Emit project.start to master event file
4. cd into the project's path
5. Invoke archeflow:run with:
- run_id = MULTI_RUN_ID/project_id
- workflow = project.workflow (or auto-select)
- domain = project.domain (or auto-detect)
- budget = min(per_project_budget, remaining_total_budget)
- artifact_dir = .archeflow/artifacts/MULTI_RUN_ID/project_id/
6. The sub-run emits its own events to its own JSONL file
inside the project's directory (standard run behavior)
```
**Concurrency model:**
When `parallel: true` (default), spawn independent projects as parallel subagents:
```
Agent(
description: "Multi-project sub-run: <project_id> — <task>",
prompt: "Run archeflow:run in <path> with task: <task>.
Run ID: <MULTI_RUN_ID>/<project_id>
Workflow: <workflow>
Domain: <domain>
Budget: $<per_project_budget>
Save artifacts to: .archeflow/artifacts/<MULTI_RUN_ID>/<project_id>/
When complete, report: status, cost, artifact list, and any issues.",
isolation: "worktree",
mode: "bypassPermissions"
)
```
Launch all Layer 0 projects simultaneously. As each completes, check if any Layer 1+ projects become unblocked.
When `parallel: false`, run projects sequentially in topological order. Still respect dependencies — a project does not start until all its dependencies have completed.
---
### 3. Master Events
All multi-run-level events are written to `.archeflow/events/<MULTI_RUN_ID>.jsonl`. These track the overall orchestration, not individual PDCA phases (those go to each project's own event file).
#### Master Event Types
| Event | When | Key Data |
|-------|------|----------|
| `multi.start` | Multi-run begins | Project list, DAG, budget |
| `project.start` | A sub-run launches | project_id, run_id, path |
| `project.complete` | A sub-run finishes successfully | project_id, status, cost, artifacts |
| `project.failed` | A sub-run fails | project_id, error, cost_so_far |
| `project.blocked` | A dependency failed, blocking this project | project_id, blocked_by |
| `project.unblocked` | All dependencies met, project can start | project_id, unblocked_by |
| `project.skipped` | User chose to skip a blocked project | project_id, reason |
| `budget.warning` | Budget threshold crossed | spent, budget, percent |
| `budget.exceeded` | Hard budget cap hit | spent, budget, halted_projects |
| `multi.complete` | All projects done (or halted) | status, projects_completed, total_cost |
#### Example Master Event Stream
```jsonl
{"seq":1,"type":"multi.start","phase":"init","data":{"name":"giesing-v2","projects":["archeflow","colette","giesing"],"parallel":true,"budget_total_usd":15.00}}
{"seq":2,"type":"project.start","phase":"run","data":{"project":"archeflow","run_id":"2026-04-03-giesing-v2/archeflow","path":"/home/c/projects/archeflow"}}
{"seq":3,"type":"project.start","phase":"run","data":{"project":"colette","run_id":"2026-04-03-giesing-v2/colette","path":"/home/c/projects/writing.colette"}}
{"seq":4,"type":"project.complete","phase":"run","data":{"project":"archeflow","status":"completed","run_id":"2026-04-03-giesing-v2/archeflow","cost_usd":1.20,"artifacts":["plan-explorer.md","plan-creator.md","do-maker.md","check-guardian.md"]}}
{"seq":5,"type":"project.complete","phase":"run","data":{"project":"colette","status":"completed","run_id":"2026-04-03-giesing-v2/colette","cost_usd":1.80,"artifacts":["plan-creator.md","do-maker.md","check-guardian.md","check-sage.md"]}}
{"seq":6,"type":"project.unblocked","phase":"run","data":{"project":"giesing","unblocked_by":["archeflow","colette"]}}
{"seq":7,"type":"project.start","phase":"run","data":{"project":"giesing","run_id":"2026-04-03-giesing-v2/giesing","path":"/home/c/projects/book.giesing-gschichten"}}
{"seq":8,"type":"project.complete","phase":"run","data":{"project":"giesing","status":"completed","run_id":"2026-04-03-giesing-v2/giesing","cost_usd":3.50,"artifacts":["plan-explorer.md","plan-creator.md","do-maker.md","check-guardian.md","check-sage.md"]}}
{"seq":9,"type":"multi.complete","phase":"done","data":{"status":"completed","projects_completed":3,"projects_failed":0,"total_cost_usd":6.50,"budget_remaining_usd":8.50}}
```
---
### 4. Cross-Project Artifacts
When project B depends on project A, B's agents can access A's artifacts. This is the primary mechanism for cross-project information flow.
#### Artifact Directory Layout
```
.archeflow/artifacts/<MULTI_RUN_ID>/
├── archeflow/ # Sub-run artifacts from archeflow
│ ├── plan-explorer.md
│ ├── plan-creator.md
│ ├── do-maker.md
│ ├── do-maker-files.txt
│ └── check-guardian.md
├── colette/ # Sub-run artifacts from colette
│ ├── plan-creator.md
│ ├── do-maker.md
│ └── check-sage.md
└── giesing/ # Sub-run artifacts from giesing (depends on both)
├── plan-explorer.md # Explorer can reference upstream artifacts
├── plan-creator.md
├── do-maker.md
└── check-guardian.md
```
#### Cross-Project Context Injection
When a dependent project's sub-run starts, inject upstream artifact summaries into the Explorer's prompt:
```markdown
## Upstream Project Results
### archeflow (completed)
Summary: Added memory injection to run skill.
Key artifacts:
- plan-creator.md: <first 20 lines or summary section>
- do-maker.md: <implementation summary>
### colette (completed)
Summary: Added story-specific voice validation command.
Key artifacts:
- plan-creator.md: <first 20 lines or summary section>
- do-maker.md: <implementation summary>
Use these results as context. The changes from these projects are available in their
respective directories and have been committed to their branches.
```
**Rules for cross-project injection:**
- Only inject summaries, not full artifacts (keep context small).
- If an upstream artifact is large (>200 lines), extract the summary/overview section only.
- The dependent project's Explorer has filesystem access to read full upstream artifacts if needed.
- Cross-project injection happens ONLY in the Plan phase (Explorer and Creator). The Maker works from the Creator's proposal, which already incorporates upstream context.
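The "summary only for large artifacts" rule could look like this sketch. It assumes artifacts contain a `## Summary` heading; that heading convention is an assumption of this example, not a documented part of artifact routing:

```shell
# Sketch: return a whole artifact if small, or only its "## Summary"
# section if it exceeds the 200-line threshold.
extract_summary() {  # $1 = artifact file
  if [ "$(wc -l < "$1")" -gt 200 ]; then
    # Print from the Summary heading up to (not including) the next heading.
    awk '/^## Summary/{found=1; next} /^## /{if (found) exit} found' "$1"
  else
    cat "$1"
  fi
}
```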
---
### 5. Budget Coordination
The multi-run has a shared budget across all projects.
#### Budget Hierarchy
```
total_usd: 15.00 # Hard cap — stops ALL projects when exceeded
per_project_usd: 10.00 # Soft cap — warns but does not stop individual project
```
#### Budget Tracking
Maintain a running total across all sub-runs:
```
TOTAL_COST = sum of all project costs reported in project.complete events
REMAINING = total_usd - TOTAL_COST
```
#### Budget Enforcement Points
1. **Before starting a sub-run:**
- Estimate the sub-run cost (based on workflow and domain).
- If estimated cost > REMAINING: warn and ask user (attended) or halt (autonomous).
2. **After each sub-run completes:**
- Update TOTAL_COST with actual cost from the sub-run.
- If TOTAL_COST > total_usd * warn_at_percent: emit `budget.warning`.
- If TOTAL_COST > total_usd: emit `budget.exceeded`, halt remaining projects.
3. **Per-project soft cap:**
- Each sub-run receives `min(per_project_usd, REMAINING)` as its budget.
- The `run` skill's own budget enforcement handles the per-project cap.
- If a project exceeds per_project_usd, it warns but continues (soft cap).
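The budget slice handed to each sub-run is simple min() arithmetic. Working in integer cents is an assumption of this sketch (bash has no floating point); the values mirror the example budget above:

```shell
# Sketch: compute the next sub-run's budget slice in integer cents.
total_cents=1500        # total_usd: 15.00
per_project_cents=1000  # per_project_usd: 10.00
spent_cents=1150        # running total from project.complete events
remaining=$((total_cents - spent_cents))
slice=$(( per_project_cents < remaining ? per_project_cents : remaining ))
echo "sub-run budget: \$$((slice / 100)).$(printf '%02d' $((slice % 100)))"  # prints: sub-run budget: $3.50
```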
#### Budget Events
```jsonl
{"seq":5,"type":"budget.warning","data":{"spent_usd":11.50,"budget_usd":15.00,"percent":77,"message":"Budget 77% consumed"}}
{"seq":8,"type":"budget.exceeded","data":{"spent_usd":15.30,"budget_usd":15.00,"halted_projects":["giesing"],"message":"Hard budget cap exceeded. Halting remaining projects."}}
```
---
### 6. Failure Handling
Failures in one project affect downstream projects but not independent ones.
#### Failure Scenarios
| Scenario | Action |
|----------|--------|
| Project fails (run error, test failure, max cycles) | Mark as `failed` in master events. Independent projects continue. |
| Dependency of project X failed | Mark X as `blocked`. Do not start X. |
| Budget exceeded mid-run | Halt the current project. Mark remaining as `blocked`. |
| All entry-point projects fail | Entire multi-run fails. No downstream projects can start. |
#### Blocked Project Resolution
When a project is blocked because a dependency failed, offer three options:
1. **Skip:** Mark the blocked project as `skipped`. Continue with other independent projects.
2. **Retry:** Re-run the failed dependency. If it succeeds, unblock downstream projects.
3. **Abort:** Stop the entire multi-run. Report what completed and what did not.
In **autonomous mode**, the default action is `skip` — blocked projects are skipped, independent projects continue, and the multi-run completes with partial results.
In **attended mode**, prompt the user with the options above.
#### Failure Events
```jsonl
{"seq":4,"type":"project.failed","data":{"project":"archeflow","error":"Max cycles reached with unresolved CRITICAL findings","cost_usd":2.10}}
{"seq":5,"type":"project.blocked","data":{"project":"giesing","blocked_by":["archeflow"],"reason":"Dependency 'archeflow' failed"}}
```
---
### 7. Progress Tracking
Maintain a live progress file at `.archeflow/multi-progress.md`. Update it after every project state change.
```markdown
# Multi-Run: giesing-v2
Started: 2026-04-03T14:00:00Z
| Project | Status | Domain | Phase | Detail |
|---------|--------|--------|-------|--------|
| archeflow | completed | code | -- | 1 cycle, $1.20 |
| colette | running | code | DO | maker drafting |
| giesing | blocked | writing | -- | waiting for colette |
## Budget
| | Amount |
|---|--------|
| Spent | $3.00 |
| Budget | $15.00 |
| Remaining | $12.00 |
| Utilization | 20% |
## Dependency Graph
```
archeflow ----\
+---> giesing
colette ------/
```
## Timeline
- 14:00:00 — Started archeflow, colette (parallel)
- 14:05:23 — archeflow completed ($1.20, 1 cycle)
- 14:06:10 — colette DO phase, maker drafting
```
Update this file after:
- A project starts
- A project changes phase (via status polling or sub-agent reporting)
- A project completes or fails
- A project becomes unblocked
- Budget threshold is crossed
---
### 8. Completion
When all projects are complete (or blocked/skipped with no more actionable items):
**8a. Emit `multi.complete`:**
```jsonl
{"seq":9,"type":"multi.complete","phase":"done","data":{"status":"completed","projects_completed":3,"projects_failed":0,"projects_skipped":0,"total_cost_usd":6.50,"budget_remaining_usd":8.50,"duration_ms":600000}}
```
Status values:
- `completed` — all projects finished successfully
- `partial` — some projects completed, some failed/skipped
- `failed` — no projects completed successfully
- `halted` — stopped due to budget or user abort
**8b. Generate multi-run report:**
```markdown
# Multi-Run Report: giesing-v2
## Summary
| Metric | Value |
|--------|-------|
| Projects | 3 |
| Completed | 3 |
| Failed | 0 |
| Total cost | $6.50 / $15.00 |
| Duration | 10m 00s |
## Per-Project Results
### archeflow
- **Status:** completed
- **Task:** Add memory injection to run skill
- **Workflow:** fast (1 cycle)
- **Cost:** $1.20
- **Key artifacts:** plan-creator.md, do-maker.md
### colette
- **Status:** completed
- **Task:** Add story-specific voice validation command
- **Workflow:** standard (1 cycle)
- **Cost:** $1.80
- **Key artifacts:** plan-creator.md, do-maker.md, check-sage.md
### giesing
- **Status:** completed
- **Task:** Write story #2 using improved tools
- **Workflow:** kurzgeschichte (2 cycles)
- **Cost:** $3.50
- **Key artifacts:** plan-explorer.md, do-maker.md, check-guardian.md
## Dependency Graph Execution
archeflow (Layer 0) ----> completed
colette (Layer 0) ----> completed
giesing (Layer 1) ----> unblocked ----> completed
## Cost Breakdown
| Project | Plan | Do | Check | Total |
|---------|------|----|-------|-------|
| archeflow | $0.20 | $0.60 | $0.40 | $1.20 |
| colette | $0.30 | $0.80 | $0.70 | $1.80 |
| giesing | $0.50 | $2.00 | $1.00 | $3.50 |
| **Total** | **$1.00** | **$3.40** | **$2.10** | **$6.50** |
```
**8c. Update master event index:**
Append to `.archeflow/events/index.jsonl`:
```jsonl
{"run_id":"2026-04-03-giesing-v2","ts":"2026-04-03T14:10:00Z","type":"multi","task":"Write second story with improved ArcheFlow + Colette integration","status":"completed","projects":3,"total_cost_usd":6.50}
```
**8d. Update workspace registry (if applicable):**
If `docs/project-registry.md` exists and project statuses changed meaningfully, update the registry entries for affected projects.
---
## Dry-Run Mode
When `--dry-run` is specified:
1. Validate the multi-run definition (DAG, paths, budget).
2. For each project (in topological order), run `archeflow:run --dry-run` to get a cost estimate and plan preview.
3. Display a summary:
```
Multi-Run Dry Run: giesing-v2
Projects: 3
Dependency layers: 2
Parallel execution: yes
Layer 0 (parallel):
archeflow — fast workflow, code domain
Estimated cost: $0.50-1.50
colette — standard workflow, code domain
Estimated cost: $1.00-3.00
Layer 1 (after Layer 0):
giesing — kurzgeschichte workflow, writing domain
Estimated cost: $2.00-5.00
Total estimated cost: $3.50-9.50
Budget: $15.00 (sufficient)
Proceed? [y/n]
```
4. Do NOT emit `multi.complete`. The multi-run is paused.
5. If user says yes, start the full multi-run using the validated config.
---
## Resume Mode
When `--resume <multi-run-id>` is specified:
1. Read the master event file `.archeflow/events/<multi-run-id>.jsonl`.
2. Reconstruct `PROJECT_STATUS` from events (which projects completed, failed, are pending).
3. Identify resumable projects:
- `failed` projects can be retried.
- `blocked` projects whose blockers are now `completed` (e.g., after manual fix) can start.
- `pending` projects that were never started can start if their deps are met.
4. Display current state and ask for confirmation.
5. Continue the multi-run from where it left off, appending to the existing master event file.
Resume emits a `multi.resume` event:
```jsonl
{"seq":10,"type":"multi.resume","phase":"init","data":{"resumed_from":"2026-04-03-giesing-v2","projects_completed":["archeflow"],"projects_to_run":["colette","giesing"]}}
```
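Status reconstruction in step 2 of Resume Mode is a fold over the master event stream. A hedged sketch; the `project` field and the `project.*` event type names below are assumptions for illustration, and the authoritative schema lives in the `process-log` skill:

```python
import json

def project_status(jsonl_text):
    """Fold a master event stream into per-project status.
    Later events overwrite earlier ones, so the final dict
    reflects the last known state of each project."""
    transitions = {
        "project.start": "running",
        "project.complete": "completed",
        "project.failed": "failed",
        "project.blocked": "blocked",
    }
    status = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        ev = json.loads(line)
        project = ev.get("project") or ev.get("data", {}).get("project")
        if project and ev["type"] in transitions:
            status[project] = transitions[ev["type"]]
    return status

events = """\
{"seq":1,"type":"project.start","project":"archeflow"}
{"seq":2,"type":"project.complete","project":"archeflow"}
{"seq":3,"type":"project.start","project":"colette"}
{"seq":4,"type":"project.failed","project":"colette"}
{"seq":5,"type":"project.blocked","project":"giesing"}
"""
print(project_status(events))
# → {'archeflow': 'completed', 'colette': 'failed', 'giesing': 'blocked'}
```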
---
## Integration with Existing Skills
| Skill | Integration Point |
|-------|-------------------|
| `run` | Each sub-run is a standard `archeflow:run` invocation. The multi-project skill wraps and coordinates multiple runs. |
| `process-log` | Master events follow the same schema (ts, run_id, seq, parent, type, phase, agent, data). Sub-run events use the standard event types. |
| `artifact-routing` | Each sub-run follows standard artifact routing internally. Cross-project artifacts follow the injection rules in Section 4. |
| `cost-tracking` | Per-project costs come from sub-run `run.complete` events. The multi-project skill aggregates them and enforces the shared budget. |
| `domains` | Each project auto-detects its domain independently. Different projects in the same multi-run can have different domains. |
| `git-integration` | Each sub-run manages its own branch. The multi-project skill does not merge across repos — each project's Act phase handles its own merge. |
| `autonomous-mode` | Multi-project runs are autonomous-mode-friendly. Budget enforcement is strict (halt, don't prompt). Blocked projects are skipped. |
---
## Progress Display
Throughout the multi-run, display live progress:
```
━━━ ArcheFlow Multi-Run: giesing-v2 ━━━━━━━━━━━━━━━━━━━
Projects: 3 | Budget: $15.00 | Parallel: yes
[archeflow] fast/code -> running (Plan: Creator designing...)
[colette] standard/code -> running (Do: Maker implementing...)
[giesing] kurzgeschichte/writing -> blocked (waiting: archeflow, colette)
Cost: $1.80 / $15.00 (12%)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
Update the display when:
- A project changes state (start, phase change, complete, fail, unblock)
- Budget thresholds are crossed
---
## Error Handling
| Error | Response |
|-------|----------|
| YAML parse error | Abort before starting. Report the parse error with line number. |
| Dependency cycle detected | Abort. Report which projects form the cycle. |
| Project path does not exist | Abort. Report the missing path. |
| Sub-run agent fails to return | Mark project as failed (5-min timeout per the `run` skill). Continue independent projects. |
| Master event write fails | Log warning. Continue orchestration. Events are observation, not control flow. |
| Artifact directory creation fails | Abort the affected project. This is blocking for cross-project artifact sharing. |
| Budget exceeded mid-project | Halt that project immediately. Emit `budget.exceeded`. Skip downstream dependents. |
---
## Design Principles
1. **Each project is autonomous.** Sub-runs use the standard `run` skill without modification. The multi-project skill is a coordinator, not a replacement.
2. **DAG over sequence.** Dependencies are declared, not implied by order. Independent projects always run in parallel when possible.
3. **Shared budget, independent domains.** Budget is global, but each project detects its own domain, selects its own workflow, and manages its own artifacts.
4. **Fail forward.** A failure in one project does not halt independent projects. Only downstream dependents are blocked.
5. **Artifacts are the interface.** Projects communicate through saved artifacts, not shared memory or direct agent-to-agent messaging.
6. **Resume over restart.** Multi-runs can be resumed from any point. Master events provide enough state to reconstruct progress.
7. **Registry-aware.** When a workspace registry exists, use it for discovery and keep it updated. When it does not exist, everything still works.
Final report includes per-project results, cost breakdown by phase, and dependency graph execution timeline.

---
name: orchestration
description: Use when executing a multi-agent orchestration — spawning archetype agents, managing PDCA cycles, coordinating worktrees, and merging results. This is the step-by-step execution guide.
---
# Orchestration Execution
This skill guides you through running a full ArcheFlow orchestration using Claude Code's native Agent tool and git worktrees.
## Step 0: Choose a Workflow
If `.archeflow/teams/<name>.yaml` exists, the user can reference a team preset: `"Use the backend team"`. Load the preset's phase config instead of built-in defaults. See `archeflow:custom-archetypes` skill for preset format.
Otherwise, assess the task and pick:
| Signal | Workflow |
|--------|----------|
| Small fix, low risk, single concern | `fast` (1 cycle) |
| Feature, multiple files, moderate risk | `standard` (2 cycles) |
| Security-sensitive, breaking changes, public API | `thorough` (3 cycles) |
## Workflow Adaptation Rules
The initial workflow choice is a starting point, not a commitment. These rules adapt the workflow at runtime. Each rule specifies when it evaluates (which phase boundary).
### A3: Confidence Gate (evaluates: after Plan, before Do)
**When:** Creator's confidence table has any axis below 0.5.
**Action by axis:**
| Axis | Score < 0.5 Action |
|------|-------------------|
| Task understanding | **Pause.** Ask user to clarify before proceeding. Do not spawn Maker. |
| Solution completeness | **Upgrade to standard.** Add Explorer before Maker starts. |
| Risk coverage | **Spawn mini-Explorer** for the specific risky area (parallel, 5 min max). Maker can proceed. |
A3 runs before any Do/Check agents spawn, so there are no cancellation issues.
### A1: Conditional Escalation (evaluates: after Check, before next cycle)
**When:** Guardian rejects with 2+ CRITICAL findings in a `fast` workflow.
**Action:** Escalate to `standard` for the **next cycle** — add Skeptic + Sage to the reviewer roster.
**Why:** If Guardian found serious issues, more perspectives help find root causes.
**Sticky:** Once escalated, the workflow stays escalated for all remaining cycles. A2 does not apply to escalated workflows.
### A2: Guardian Fast-Path (evaluates: after Guardian, before spawning other reviewers)
**When:** Guardian finds 0 CRITICAL and 0 WARNING in a non-escalated `standard` or `thorough` workflow.
**Action:** Do not spawn Skeptic, Sage, or Trickster. Proceed directly to Act phase.
**Why:** Guardian's security review is the strictest gate. Clean pass = safe to skip additional reviewers.
**Critical:** Evaluate A2 **after Guardian completes but before other reviewers are spawned.** Do not spawn reviewers in parallel with Guardian — spawn Guardian first, check A2, then spawn remaining reviewers only if A2 doesn't trigger.
**Does not apply to:** Escalated workflows (A1 triggered), or first cycle of `thorough` workflows (Trickster is mandatory on first pass).
**Log:** Note "Guardian fast-path taken" in orchestration report.
### Evaluation Order
```
Plan phase completes → A3 (confidence gate)
Guardian completes → A2 (fast-path check) → if clean, skip other reviewers
↓ if not, spawn other reviewers
Check phase done → A1 (escalation check) → if 2+ CRITICALs in fast, next cycle is standard
```
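The A3 gate reduces to a table lookup over sub-threshold axes. A minimal sketch, with axis keys and action labels chosen for illustration (the skill itself works in prose, not code):

```python
def a3_actions(confidence):
    """Map confidence axes scoring below 0.5 to A3 actions,
    per the action table above."""
    rules = {
        "task_understanding": "pause-and-clarify",
        "solution_completeness": "upgrade-to-standard",
        "risk_coverage": "spawn-mini-explorer",
    }
    return [rules[axis] for axis, score in confidence.items()
            if axis in rules and score < 0.5]

print(a3_actions({"task_understanding": 0.9,
                  "solution_completeness": 0.4,
                  "risk_coverage": 0.3}))
# → ['upgrade-to-standard', 'spawn-mini-explorer']
```

Because A3 evaluates before any Do/Check agents spawn, every returned action can be applied without cancelling in-flight work.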
## Process Logging
If `.archeflow/events/` exists (or should be created), emit structured events throughout orchestration. See `archeflow:process-log` skill for full schema.
**Quick reference — emit at these points:**
```
run.start → After workflow selection, before first agent
agent.start → Before each Agent tool call
agent.complete → After each Agent returns (include duration, tokens, summary, artifacts)
decision → When choosing between alternatives (plot direction, approach, fix strategy)
phase.transition → At Plan→Do, Do→Check, Check→Act boundaries
review.verdict → After each reviewer delivers verdict
fix.applied → After each edit addressing a review finding
cycle.boundary → End of PDCA cycle
shadow.detected → When shadow threshold triggers
run.complete → After final Act phase (include totals)
```
**Helper:** `./lib/archeflow-event.sh <run_id> <type> <phase> <agent> '<json>'`
**Report:** `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl`
Events are optional — if the events dir doesn't exist, skip logging. Never let logging block orchestration.
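For orchestrators not using the shell helper, the append-and-never-block behavior might look like this Python sketch. The field set follows the quick reference above; the function name and return convention are illustrative, and failures are deliberately swallowed:

```python
import json
import time
from pathlib import Path

def emit_event(events_dir, run_id, seq, type_, phase, agent, data):
    """Append one event line to the run's JSONL file. Returns False
    (instead of raising) on any failure, so logging can never
    block orchestration."""
    try:
        path = Path(events_dir) / f"{run_id}.jsonl"
        if not path.parent.is_dir():
            return False  # events dir absent: skip logging entirely
        event = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "run_id": run_id, "seq": seq, "type": type_,
            "phase": phase, "agent": agent, "data": data,
        }
        with path.open("a") as f:
            f.write(json.dumps(event) + "\n")
        return True
    except OSError:
        return False
```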
---
## Step 1: Plan Phase
Spawn agents sequentially — Creator needs Explorer's findings.
### Explorer (if standard or thorough)
**Context to include:** Task description, relevant file paths, codebase access.
**Context to exclude:** Prior proposals, review outputs, implementation details, feedback from previous cycles.
```
Agent(
description: "🔍 Explorer: research context",
prompt: "<task description>
You are the EXPLORER archetype.
Research the codebase to understand:
1. What files and functions are involved
2. What dependencies exist
3. What tests currently cover this area
4. What patterns the codebase uses
Write your findings as a structured research report.
Be thorough but focused — no rabbit holes.",
subagent_type: "Explore"
)
```
### Creator
**Context to include:** Task description, Explorer's research output. On cycle 2+: prior cycle's structured feedback (see Cycle Feedback Protocol).
**Context to exclude:** Raw file contents (Explorer already summarized), git diffs, reviewer full outputs.
**Fast workflow only (no Explorer):** The Creator must perform a Mini-Reflect before proposing:
1. Restate the task in your own words (catch misunderstandings early)
2. List 3 assumptions you're making
3. Name the one risk that would cause most damage if wrong
```
Agent(
description: "🏗️ Creator: design proposal",
prompt: "<task description>
You are the CREATOR archetype.
<if fast workflow (no Explorer): Before proposing, perform a Mini-Reflect:
1. Restate the task in one sentence
2. List 3 assumptions you're making
3. Name the highest-damage risk
Then propose.>
<if standard/thorough: Based on the research findings: <Explorer's output>>
<if cycle 2+: Prior cycle feedback: <structured feedback — see Cycle Feedback Protocol>>
Design a solution proposal including:
1. Architecture decisions (with rationale)
2. Files to create/modify (with specific changes)
3. Alternatives considered (at least 2, with rejection rationale)
4. Test strategy
5. Confidence (scored by axis: task understanding, solution completeness, risk coverage)
6. Risks you foresee
<if cycle 2+: 7. How you addressed each unresolved issue from prior feedback>
Be decisive. Ship a clear plan, not a menu of options.",
subagent_type: "Plan"
)
```
## Step 2: Do Phase
Spawn Maker in an **isolated worktree** so changes don't affect main.
**Context to include:** Creator's proposal only. On cycle 2+: implementation-routed feedback from Sage/Trickster.
**Context to exclude:** Explorer's research, Guardian/Skeptic findings (those go to Creator).
```
Agent(
description: "⚒️ Maker: implement proposal",
prompt: "<task description>
You are the MAKER archetype.
Implement this proposal: <Creator's output>
<if cycle 2+: Implementation feedback from prior cycle: <Sage/Trickster findings only>>
Rules:
1. Follow the proposal exactly — don't redesign
2. Write tests for every behavioral change
3. Commit with descriptive messages
4. Run existing tests — nothing may break
5. If the proposal is unclear, implement your best interpretation and note it
Do NOT skip tests. Do NOT refactor unrelated code.
BEFORE finishing — Self-Review Checklist:
1. Did I change ALL files listed in the proposal's Changes section?
2. Did I add tests for each behavioral change?
3. Are there files in my diff NOT listed in the proposal? If yes, revert them.
4. Do all existing tests still pass?
Report any gaps in your Implementation summary.",
isolation: "worktree",
mode: "bypassPermissions"
)
```
**Critical:** The Maker MUST commit its changes before finishing. Uncommitted changes in a worktree are lost.
## Step 3: Check Phase
Spawn Guardian **first**. After Guardian completes, check adaptation rule A2 (fast-path). If A2 triggers (0 CRITICAL, 0 WARNING, non-escalated workflow), skip remaining reviewers and proceed to Act. Otherwise, spawn remaining reviewers **in parallel**.
### Guardian (always runs first)
**Context to include:** Maker's git diff, proposal risk section only.
**Context to exclude:** Explorer's research, full proposal, other reviewer outputs.
```
Agent(
description: "🛡️ Guardian: security and risk review",
prompt: "You are the GUARDIAN archetype.
Review the changes in branch: <maker's branch>
Assess:
1. Security vulnerabilities (injection, auth bypass, data exposure)
2. Reliability risks (error handling, edge cases, race conditions)
3. Breaking changes (API compatibility, schema migrations)
4. Dependency risks (new deps, version conflicts)
Output: APPROVED or REJECTED with specific findings.
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
Categories: security, reliability, design, breaking-change, dependency
Be rigorous but practical — flag real risks, not theoretical ones."
)
```
### Skeptic (if standard or thorough)
**Context to include:** Creator's proposal (focus on assumptions section).
**Context to exclude:** Git diff details, Explorer's research, other reviewer outputs.
```
Agent(
description: "🤔 Skeptic: challenge assumptions",
prompt: "You are the SKEPTIC archetype.
Review the proposal: <Creator's proposal>
Challenge:
1. Assumptions in the design — what if they're wrong?
2. Alternative approaches not considered
3. Edge cases not tested
4. Scalability concerns
Output: APPROVED or REJECTED with counterarguments.
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
Categories: design, quality, testing, scalability
Be constructive — every challenge must include a suggested alternative."
)
```
### Sage (if standard or thorough)
**Context to include:** Creator's proposal, Maker's git diff, implementation summary.
**Context to exclude:** Explorer's raw research, other reviewer outputs.
```
Agent(
description: "📚 Sage: holistic quality review",
prompt: "You are the SAGE archetype.
Review the changes in branch: <maker's branch>
Evaluate holistically:
1. Code quality (readability, maintainability, simplicity)
2. Test coverage (are the tests meaningful, not just present?)
3. Documentation (does the change need docs?)
4. Consistency with codebase patterns
Output: APPROVED or REJECTED with quality findings.
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
Categories: quality, testing, design, consistency
Judge like a senior engineer doing a PR review."
)
```
### Trickster (if thorough only)
**Context to include:** Maker's git diff only.
**Context to exclude:** Everything else — proposal, research, other reviews.
```
Agent(
description: "🃏 Trickster: adversarial testing",
prompt: "You are the TRICKSTER archetype.
Try to break the changes in branch: <maker's branch>
Attack vectors:
1. Malformed input, boundary values, empty/null/huge data
2. Concurrency and race conditions
3. Error path exploitation
4. Dependency failure scenarios
Output: APPROVED or REJECTED with edge cases found.
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
Categories: security, reliability, testing
Think like a QA engineer who gets paid per bug found."
)
```
## Step 4: Act Phase
Collect all reviewer outputs and decide.
### Completion Promise (optional)
If the user defined explicit done criteria with the task, check them now:
```
Completion criteria: <test command passes> AND <Guardian approves>
Example: "done when pytest passes and Guardian approves with 0 CRITICAL"
```
If completion criteria are defined, **all criteria must pass** — reviewer approval alone is not sufficient. If tests fail but reviewers approved, cycle back with "tests failing" as feedback to Creator.
### All Approved (and completion criteria met)
1. **Pre-merge hooks:** Check `.archeflow/hooks.yaml` for `pre-merge` hooks. Run them. If `fail_action: abort`, stop and report.
2. Merge the Maker's worktree branch into the target branch
3. **Post-merge hooks:** Run `post-merge` hooks from `.archeflow/hooks.yaml` if defined. Then run the project's test suite on the merged branch
   - Tests pass → proceed to step 4
   - Tests fail → **auto-revert** the merge commit, report the failure, and cycle back with "integration test failure on main" as feedback
4. Report: what was implemented, what was reviewed, any warnings noted
5. Clean up the worktree
6. Record metrics (see Orchestration Metrics)
### Issues Found (and cycles remaining)
1. Build structured feedback using the Cycle Feedback Protocol below
2. Go back to Step 1 (Plan) with the feedback
3. Creator revises the proposal, addressing each unresolved issue
4. Maker re-implements in a fresh worktree
5. Reviewers check again
### Max Cycles Reached with Unresolved Issues
1. Report all unresolved findings to the user
2. Present the best implementation so far (on its branch)
3. Let the user decide: merge as-is, fix manually, or abandon
---
## Cycle Feedback Protocol
After the Check phase, build structured feedback for the next cycle. This replaces dumping raw reviewer output.
### 1. Extract Findings
Parse each reviewer's output into the standardized format:
```markdown
## Cycle N Feedback
### Unresolved Issues
| Source | Severity | Category | Issue | Route to |
|--------|----------|----------|-------|----------|
| Guardian | CRITICAL | security | SQL injection in user input | Creator |
| Skeptic | WARNING | design | Assumes single-tenant only | Creator |
| Sage | WARNING | quality | Test names don't describe behavior | Maker |
| Trickster | CRITICAL | reliability | Empty string bypasses validation | Creator |
### Resolved (from cycle N-1)
| Source | Issue | Resolution |
|--------|-------|------------|
| Guardian | Missing rate limit | Added rate limiter middleware |
```
### 2. Route Feedback
Not all findings go to the same agent:
| Source | Category | Routes to | Reason |
|--------|----------|-----------|--------|
| Guardian | security, breaking-change | **Creator** | Design must change |
| Guardian | reliability, dependency | **Creator** | Architectural decision needed |
| Skeptic | design, scalability | **Creator** | Assumptions need revision |
| Sage | quality, consistency | **Maker** | Implementation refinement |
| Sage | testing | **Maker** | Test gap, not design flaw |
| Trickster | reliability (design flaw) | **Creator** | Needs redesign |
| Trickster | reliability (test gap) | **Maker** | Needs more tests |
| Trickster | testing | **Maker** | Edge case not covered |
**Disambiguation rule:** When in doubt, route to Creator if the fix requires changing the approach; route to Maker if it requires changing the code within the existing approach.
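The routing table plus the disambiguation rule can be collapsed into one function. A sketch; the Trickster reliability split is approximated here by a keyword check on the finding's description, which is an assumption, not the skill's actual heuristic:

```python
def route_finding(source, category, note=""):
    """Route a review finding to Creator or Maker per the table above."""
    creator = {
        ("Guardian", "security"), ("Guardian", "breaking-change"),
        ("Guardian", "reliability"), ("Guardian", "dependency"),
        ("Skeptic", "design"), ("Skeptic", "scalability"),
    }
    maker = {
        ("Sage", "quality"), ("Sage", "consistency"),
        ("Sage", "testing"), ("Trickster", "testing"),
    }
    if (source, category) in creator:
        return "Creator"
    if (source, category) in maker:
        return "Maker"
    if source == "Trickster" and category == "reliability":
        # design flaw → redesign; otherwise treat as a test gap
        return "Creator" if "design" in note else "Maker"
    # disambiguation default: approach-level changes go to Creator
    return "Creator"

print(route_finding("Guardian", "security"))  # → Creator
print(route_finding("Sage", "testing"))       # → Maker
```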
### 3. Track Resolution
Compare cycle N findings against cycle N-1:
- If a prior finding no longer appears in the same category → mark **resolved**
- If a prior finding persists → it stays **unresolved** with an incremented cycle count
- If new findings appear → add as new unresolved issues
This prevents regression and gives the Creator/Maker a clear list of what to address.
### 4. Convergence Detection
If the **same finding** (same category + same file location) appears **unresolved in 2 consecutive cycles**, escalate to user:
> "Finding persists across 2 cycles: [Guardian] CRITICAL security — SQL injection in src/auth.ts:48. This may need human judgment or a different approach."
Do not cycle again blindly. The issue is likely structural (wrong design, not wrong implementation) and needs human input.
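Detecting a persisting finding is a set intersection over (category, location) keys across consecutive cycles. A minimal sketch, assuming findings carry `category` and `location` fields (illustrative names, not a defined schema):

```python
def persistent_findings(prev_cycle, this_cycle):
    """Findings unresolved in two consecutive cycles, keyed by
    (category, location) -- candidates for user escalation
    rather than another blind cycle."""
    key = lambda f: (f["category"], f["location"])
    prev = {key(f) for f in prev_cycle}
    return [f for f in this_cycle if key(f) in prev]

cycle1 = [{"category": "security", "location": "src/auth.ts:48"}]
cycle2 = [{"category": "security", "location": "src/auth.ts:48"},
          {"category": "quality", "location": "src/api.ts:12"}]
print(persistent_findings(cycle1, cycle2))
# → [{'category': 'security', 'location': 'src/auth.ts:48'}]
```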
### 5. Cross-Archetype Dedup
If two reviewers raise the same issue (same file + same category + similar description), merge into one finding in the consolidated output:
```
| Guardian + Skeptic | CRITICAL | security | Input not sanitized (src/api.ts:30) | Add validation |
```
Don't double-count in severity tallies. Route to the higher-priority destination (Creator over Maker).
---
## Orchestration Metrics
Track lightweight metrics throughout the orchestration. No token counting (unreliable from skill layer) — just timing and outcomes.
### Per-Phase Logging
After each phase completes, note:
```
| Phase | Duration | Agents | Outcome |
|-------|----------|--------|---------|
| Plan | 45s | 2 | Proposal ready (confidence: 0.8) |
| Do | 90s | 1 | 4 files changed, 8 tests added |
| Check | 60s | 3 | 1 REJECTED (Guardian), 2 APPROVED |
| Act | — | — | Cycle back → feedback built |
```
### Orchestration Summary
At orchestration end, include in the report:
```markdown
## Orchestration Metrics
| Metric | Value |
|--------|-------|
| Workflow | standard |
| Cycles | 2 of 2 |
| Total duration | 4m 30s |
| Agents spawned | 9 |
| Findings (total) | 5 |
| Findings (critical) | 1 |
| Findings (resolved) | 4 |
| Shadow detections | 0 |
```
Use this data to calibrate future workflow selection — if fast workflows consistently need 0 cycles of revision, the task was well-scoped.
---
## Autonomous Mode
When running unattended (overnight sessions, batch queues), add these behaviors to the orchestration loop:
### Between-Task Checkpoint
After each task completes (success or failure):
1. **Commit and push** all changes immediately
2. **Update session log** at `.archeflow/session-log.md` with task outcome
3. **Check stop conditions** before starting next task:
- 3 consecutive failures → STOP
- Shadow escalation (same shadow 3+ times) → STOP
- Test suite broken after merge → REVERT and STOP
- Destructive action detected → STOP
### Session Log Protocol
**Primary:** Emit `run.complete` event to `.archeflow/events/<run_id>.jsonl` (see Process Logging section above). The event stream is the source of truth.
**Secondary:** Also write a human-readable summary to `.archeflow/session-log.md`:
```markdown
## Task N: <description>
**Workflow:** standard | **Status:** COMPLETED/FAILED
**Cycles:** 1 of 2
**Findings:** Guardian APPROVED, Skeptic APPROVED, Sage WARNING (test names)
**Files changed:** 5 | **Tests added:** 12
**Branch:** merged to main (commit abc1234) | OR: archeflow/maker-xyz (NOT merged)
**Duration:** 8 min
**Events:** `.archeflow/events/<run_id>.jsonl` (full process log)
```
Generate the full Markdown report: `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl`
### Safety Rules
- Never force-push. Never modify main history.
- All work stays on worktree branches until explicitly merged
- Merges use `--no-ff` — individually revertable
- Failed tasks leave branches intact for manual inspection
For full autonomous mode details (task queues, overnight checklists, user controls): load the `archeflow:autonomous-mode` skill.
---
## Shadow Monitoring
During orchestration, watch for shadow activation after each agent completes. Quick checklist:
| Archetype | Shadow | Quick Check |
|-----------|--------|-------------|
| Explorer | Rabbit Hole | Output >2000 words without Recommendation section? |
| Creator | Over-Architect | >2 new abstractions for one feature? |
| Maker | Rogue | No test files in changeset? Files outside proposal? |
| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1? Zero approvals? |
| Skeptic | Paralytic | >7 challenges? <50% have alternatives? |
| Trickster | False Alarm | Findings in untouched code? >10 findings? |
| Sage | Bureaucrat | Review >2x code change length? |
On detection: apply correction prompt from `archeflow:shadow-detection` skill. On second detection of same shadow: replace agent. On 3+ shadows in same cycle: escalate to user.
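Several of the quick checks reduce to numeric predicates. The sketch below covers four rows of the table; the `metrics` field names are illustrative, not a defined interface:

```python
def detect_shadows(metrics):
    """Evaluate a few shadow heuristics from the table above.
    Thresholds come straight from the table."""
    shadows = []
    # Explorer Rabbit Hole: long output, no Recommendation section
    if (metrics.get("explorer_words", 0) > 2000
            and not metrics.get("explorer_has_recommendation", True)):
        shadows.append(("Explorer", "Rabbit Hole"))
    # Guardian Paranoid: CRITICAL:WARNING ratio > 2:1
    c = metrics.get("guardian_critical", 0)
    w = metrics.get("guardian_warning", 0)
    if w and c / w > 2:
        shadows.append(("Guardian", "Paranoid"))
    # Skeptic Paralytic: more than 7 challenges
    if metrics.get("skeptic_challenges", 0) > 7:
        shadows.append(("Skeptic", "Paralytic"))
    # Trickster False Alarm: more than 10 findings
    if metrics.get("trickster_findings", 0) > 10:
        shadows.append(("Trickster", "False Alarm"))
    return shadows

print(detect_shadows({"guardian_critical": 5, "guardian_warning": 1,
                      "trickster_findings": 12}))
# → [('Guardian', 'Paranoid'), ('Trickster', 'False Alarm')]
```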
---
## Parallel Team Orchestration
When running multiple independent tasks, spawn parallel ArcheFlow teams. Each team runs its own PDCA cycle on a separate worktree.
### Rules
1. **Non-overlapping file scope:** Each team must work on different files. If two tasks touch the same file, run them sequentially.
2. **Independent worktrees:** Each team's Maker gets its own worktree branch (`archeflow/team-1-maker`, `archeflow/team-2-maker`).
3. **First-finished-first-merged:** Teams merge in completion order. Later teams rebase onto the updated main before their own merge.
4. **Merge conflict handling:** If rebase fails, the later team re-runs its Check phase against the merged main. If conflicts are structural, escalate to user.
5. **Max 3 parallel teams:** More causes diminishing returns and merge headaches.
### Spawning Parallel Teams
```
# Launch 2-3 teams in a single message with multiple Agent calls:
Agent(description: "🏗️ Team 1: pagination fix (fast)", ...)
Agent(description: "🏗️ Team 2: JWT auth (standard)", ...)
Agent(description: "🏗️ Team 3: logging refactor (fast)", ...)
```
Each team follows the full PDCA steps independently. The orchestrator monitors all teams and handles merges.
---
## Reviewer Profiles
Projects can configure which reviewers matter in `.archeflow/config.yaml`:
```yaml
reviewers:
always: [guardian] # Always runs
default: [sage] # Runs in standard+thorough
thorough_only: [trickster] # Only in thorough
skip: [skeptic] # Never runs for this project
```
If no config exists, use the built-in workflow defaults. Profiles save tokens by not spawning reviewers that add little value for the specific project.
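Combining a profile with the workflow tiers might look like the following sketch. The fallback profile mirrors the built-in defaults as this document describes them; treat the exact default roster as an assumption:

```python
def reviewer_roster(workflow, profile=None):
    """Resolve which reviewers run for a given workflow, applying
    a configured reviewer profile when one exists."""
    profile = profile or {"always": ["guardian"],
                          "default": ["skeptic", "sage"],
                          "thorough_only": ["trickster"],
                          "skip": []}
    roster = list(profile.get("always", []))
    if workflow in ("standard", "thorough"):
        roster += profile.get("default", [])
    if workflow == "thorough":
        roster += profile.get("thorough_only", [])
    return [r for r in roster if r not in profile.get("skip", [])]

# the example config above: skeptic skipped, sage as default reviewer
print(reviewer_roster("standard",
                      {"always": ["guardian"], "default": ["sage"],
                       "thorough_only": ["trickster"], "skip": ["skeptic"]}))
# → ['guardian', 'sage']
```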
## Explorer Cache
If the same code area was explored recently, skip Explorer and reuse prior research:
**Cache hit criteria:** Same files affected (>70% overlap by path) AND prior research is <24 hours old AND no commits to those files since the research.
**On cache hit:** Show the prior research to Creator with a note: "Using cached Explorer research from [timestamp]. If the codebase changed significantly, re-run Explorer."
**On cache miss:** Run Explorer normally.
Cache is stored in `.archeflow/explorer-cache/` as timestamped markdown files. The orchestrator checks for matches before spawning Explorer.
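The three hit criteria compose into one predicate. A sketch using epoch timestamps; the cache's actual on-disk timestamp format is not specified here:

```python
import time

def cache_hit(task_files, cached_files, cached_at, last_commit_at,
              now=None, ttl_hours=24):
    """Explorer cache criteria: >70% path overlap, research fresher
    than 24h, and no commits to those files since the research."""
    now = now or time.time()
    if not task_files:
        return False
    overlap = len(set(task_files) & set(cached_files)) / len(set(task_files))
    fresh = (now - cached_at) < ttl_hours * 3600
    unchanged = last_commit_at < cached_at
    return overlap > 0.7 and fresh and unchanged

t0 = 1_700_000_000
print(cache_hit(["a.py", "b.py", "c.py"],
                ["a.py", "b.py", "c.py", "d.py"],
                cached_at=t0, last_commit_at=t0 - 100, now=t0 + 3600))
# → True (full overlap, 1h old, no commits since)
```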
## Learning from History
Track which archetypes catch real issues per project over time. After each orchestration, append to `.archeflow/metrics.jsonl`:
```jsonl
{"task": "...", "archetype": "guardian", "findings": 2, "critical": 1, "resolved": 2, "useful": true}
{"task": "...", "archetype": "skeptic", "findings": 3, "critical": 0, "resolved": 0, "useful": false}
```
A finding is **useful** if it was resolved (led to a code change) rather than dismissed.
After 10+ orchestrations, the orchestrator can recommend reviewer profile changes:
- "Skeptic has found 0 useful issues in 8 runs — consider moving to `skip` or `thorough_only`"
- "Guardian catches critical issues in 80% of runs — confirmed as essential"
This is advisory, not automatic. The user decides based on the data.
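Turning `metrics.jsonl` into advisory recommendations is a small aggregation. A sketch, with the zero-useful and 80% thresholds taken from the examples above (the exact thresholds and wording are assumptions):

```python
import json

def profile_advice(metrics_jsonl, min_runs=8):
    """Aggregate per-archetype usefulness from metrics.jsonl lines
    and flag candidates for reviewer-profile changes. Advisory only."""
    runs, useful = {}, {}
    for line in metrics_jsonl.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        a = rec["archetype"]
        runs[a] = runs.get(a, 0) + 1
        useful[a] = useful.get(a, 0) + (1 if rec.get("useful") else 0)
    advice = {}
    for a, n in runs.items():
        if n >= min_runs:
            rate = useful[a] / n
            if rate == 0:
                advice[a] = "consider skip or thorough_only"
            elif rate >= 0.8:
                advice[a] = "confirmed essential"
    return advice
```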
---
## Orchestration Report
After completion, summarize:
```markdown
## ArcheFlow Orchestration Report
- **Task:** <description>
- **Workflow:** standard (2 cycles)
- **Cycle 1:** Guardian rejected (SQL injection in user input handler)
- **Cycle 2:** All approved after input sanitization added
- **Files changed:** 4 files, +120 -30 lines
- **Tests added:** 8 new tests
- **Branch:** archeflow/maker-<id> → merged to main
- **Metrics:** 9 agents, 4m 30s, 5 findings (4 resolved, 1 info remaining)
```

---
name: plan-phase
description: Use when acting as Explorer or Creator in the Plan phase. Defines output formats for research and proposals.
---
# Plan Phase
Explorer researches, then Creator designs. Sequential — Creator needs Explorer's findings.
## Explorer Output Format
```markdown
## Research: <task>
### Affected Code
- `path/file.ext` — description (L<start>-<end>)
### Dependencies
- What depends on what, what breaks if changed
### Patterns
- How the codebase solves similar problems
### Risks
- What could go wrong
### Recommendation
<one paragraph: approach + rationale>
```
## Creator Output Format
```markdown
## Proposal: <task>
### Mini-Reflect (fast workflow only — skip if Explorer ran)
- **Task restated:** <one sentence>
- **Assumptions:** 1) ... 2) ... 3) ...
- **Highest-damage risk:** <the one thing that would hurt most if wrong>
### Architecture Decision
<What and WHY>
### Alternatives Considered
| Approach | Why Rejected |
|----------|-------------|
| <option A> | <reason> |
| <option B> | <reason> |
### Changes
1. **`path/file.ext`** — What changes and why
2. **`path/test.ext`** — What tests to add
### Test Strategy
- <specific test cases>
### Confidence
| Axis | Score | Note |
|------|-------|------|
| Task understanding | <0.0-1.0> | <why> |
| Solution completeness | <0.0-1.0> | <gaps?> |
| Risk coverage | <0.0-1.0> | <unknowns?> |
### Risks
- <what could go wrong + mitigations>
### Not Doing
- <adjacent concerns deliberately excluded>
```
**Confidence triggers:** If any axis scores below 0.5, flag it to the orchestrator. Low task understanding → clarify with user. Low solution completeness → consider standard workflow. Low risk coverage → spawn targeted Explorer research.
## Creator with Prior Feedback (Cycle 2+)
When the Creator receives structured feedback from a prior cycle, the proposal must include an additional section addressing each unresolved issue:
```markdown
## Proposal: <task> (Revision — Cycle N)
### What Changed (vs. prior proposal)
- <brief delta: what was added, removed, or redesigned>
### Prior Feedback Response
| Issue | Source | Action | Rationale |
|-------|--------|--------|-----------|
| SQL injection in user input | Guardian | **Fixed** — added parameterized queries | Direct security fix |
| Assumes single-tenant | Skeptic | **Deferred** — multi-tenant out of scope | Not in task requirements |
| Test names unclear | Sage | **Accepted** — routed to Maker | Implementation concern |
### Architecture Decision
<revised design addressing feedback>
### Changes
<updated file list>
### Test Strategy
<updated test cases>
### Confidence
| Axis | Score | Note |
|------|-------|------|
| Task understanding | <0.0-1.0> | <why> |
| Solution completeness | <0.0-1.0> | <gaps?> |
| Risk coverage | <0.0-1.0> | <unknowns?> |
### Risks
<updated risks — include any new risks from the revision>
### Not Doing
<updated scope boundaries>
```
**Rules for addressing feedback:**
- **Fixed:** Changed the design to resolve the issue. Explain how.
- **Deferred:** Not addressing now, with explicit reason. Must not be a CRITICAL finding.
- **Accepted:** Acknowledged and routed to Maker for implementation-level fix.
- **Disputed:** Disagrees with the finding. Must provide evidence or reasoning.
CRITICAL findings cannot be deferred or disputed — they must be fixed or the proposal will be rejected again.

---
name: presence
description: |
Defines how ArcheFlow communicates its activity to the user — visible but not noisy.
Show value, not process. Auto-loaded by the run skill.
---
# ArcheFlow Presence — Visible Value, Not Noise
ArcheFlow should feel like a skilled colleague working alongside you: you know they're there, you see results, but they don't narrate every keystroke.
## Principles
1. **Show outcomes, not mechanics.** "Guardian caught a timeline bug" — good. "Spawning Guardian agent with attention filters..." — noise.
2. **One line per phase, not per agent.** The user sees phases complete, not individual agent lifecycle.
3. **Numbers over words.** "2 fixes applied" beats "We have successfully applied two fixes to the codebase."
4. **Silence is fine.** If a phase completes cleanly with no findings, don't announce it. Clean passes are the expected case.
5. **Value at the end.** The completion summary is the most important output — what was built, what was caught, what was fixed.
## Status Line Format
At key moments during a run, output a compact status line:
### Run Start
```
── archeflow ── <task> ── <workflow> (<max_cycles> cycles) ──
```
Example:
```
── archeflow ── Write story "Der Huster" ── kurzgeschichte (2 cycles) ──
```
### Phase Complete (only if something happened worth mentioning)
```
✓ plan   explorer: 3 directions → chose C (Koffer) | creator: 6 scenes
✓ do     6004 words drafted
△ check  guardian: 1 fix needed | sage: 5 voice adjustments
✓ act    6 fixes applied
```
Symbols:
- `✓` — phase clean, no issues
- `△` — phase found issues (fixes needed)
- `✗` — phase failed (blocked, needs user input)
### Run Complete
**Run complete:**
```
── done ── 1 cycle · 5 agents · 6 fixes · ~22 min ──
```
If value was delivered, add a one-liner:
```
── done ── 1 cycle · 5 agents · 6 fixes · ~22 min ──
-- done -- 1 cycle . 5 agents . 6 fixes . ~22 min --
story drafted, reviewed, and polished. see stories/01-der-huster.md
```
### Run Complete (with DAG, if terminal supports it)
Only show if the user explicitly asks or if `progress.dag_on_complete: true` in config:
**Activation indicator (session start, one line):**
```
── archeflow ── complete ──────────────────────
#1 run.start
├── #2 explorer → #3 decision (C) → #4 creator
├── #6 maker (6004 words)
├── #8 guardian △1 · #9 sage △5
└── #12 complete [6 fixes]
───────────────────────────────────────────────
archeflow v0.7.0 . 24 skills . writing domain detected
```
## When to Be Silent
- **Agent spawning/completion** — don't announce
- **Event emission** — internal bookkeeping, never visible
- **Artifact routing** — internal
- **Clean review passes** — if Guardian says APPROVED with 0 findings, skip it
- **Phase transitions** — only show if the phase produced visible output
- Agent spawning/completion lifecycle
- Event emission
- Artifact routing
- Clean review passes (0 findings)
- Phase transitions with no visible output
## When to Speak
- **Run start** — always (user should know ArcheFlow activated)
- **Findings found** — always (this is the value)
- **Fixes applied** — always (this is the outcome)
- **Run complete** — always (closure)
- **Budget warnings** — always (user needs to know)
- **Shadow detected** — always (something went wrong)
- **User decision needed** — always (blocking)
## Activation Indicator
When ArcheFlow activates at session start (via the `using-archeflow` skill), show ONE line:
```
archeflow v0.3.0 · 24 skills · writing domain detected
```
Or for code projects:
```
archeflow v0.3.0 · 24 skills · code domain
```
If ArcheFlow decides NOT to activate (simple task, single file):
```
(nothing — silence is correct for simple tasks)
```
## Integration with Progress File
The `.archeflow/progress.md` file is the detailed view for users who want more. The status lines above are the default — brief, inline, part of the conversation flow.
Users who want the full picture: `archeflow-progress.sh <run_id> --watch` in a second terminal.
## Anti-Patterns (Don't Do This)
```
❌ "I'm now activating the ArcheFlow orchestration framework..."
❌ "Spawning Explorer agent with model haiku and attention filter..."
❌ "The Guardian archetype has completed its security review and found..."
❌ "Let me run the convergence detection algorithm to check..."
❌ "According to the ArcheFlow process-log event schema..."
```
These expose internal mechanics. The user doesn't care about archetypes, attention filters, or event schemas. They care about: what was done, what was found, what was fixed.
## Examples: Good Presence
### Example 1: Feature Implementation
```
── archeflow ── Add JWT auth ── standard (2 cycles) ──
✓ plan 3 files affected, JWT + middleware approach
✓ do implemented (auth.ts, middleware.ts, tests)
△ check guardian: missing token expiry check
✓ act 1 fix applied
── done ── 1 cycle · 4 agents · 1 fix · ~8 min ──
```
### Example 2: Story Writing
```
── archeflow ── Write "Der Huster" ── kurzgeschichte (2 cycles) ──
✓ plan 3 plot directions → chose C (Mo krank + Koffer)
✓ do 6004 words, 7 scenes
△ check 1 timeline bug, 5 voice adjustments
✓ act 6 fixes applied
── done ── 1 cycle · 5 agents · 6 fixes · ~22 min ──
stories/01-der-huster.md ready
```
### Example 3: Quick Fix (minimal output)
```
── archeflow ── Fix pagination bug ── fast ──
✓ fix applied, tests pass
── done ── 1 cycle · 3 agents · ~4 min ──
```
### Example 4: Multi-Project
```
── archeflow ── giesing-story-v2 ── 3 projects ──
✓ archeflow artifact routing improved
✓ colette voice validation added
✓ giesing story #2 drafted (5800 words)
── done ── 3 projects · 12 agents · ~35 min ──
```
- Run start and complete (always)
- Findings found and fixes applied
- Budget warnings
- Shadow detected
- User decision needed


@@ -1,278 +0,0 @@
---
name: process-log
description: |
Event-based process logging for ArcheFlow orchestrations. Captures every phase transition,
agent output, decision, and fix as structured JSONL events. Enables post-hoc reports,
dashboards, and process archaeology.
<example>Automatically loaded during orchestration</example>
<example>User: "Show me how this story was made"</example>
---
# Process Log — Event-Sourced Orchestration History
Every ArcheFlow orchestration writes structured events to a JSONL file. Events are the **single source of truth** — all reports (Markdown, dashboards, timelines) are generated views.
## Event Storage
```
.archeflow/events/<run-id>.jsonl # One file per orchestration run
.archeflow/events/index.jsonl # Run index (one line per run, for listing)
```
**Run ID format:** `<date>-<slug>` (e.g., `2026-04-03-der-huster`)
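Deriving the slug from the task description is left to the orchestrator; one possible sketch (the `tr`/`sed`/`cut` slug rules here are an assumption, not part of the format spec):

```shell
# Hypothetical slug derivation for a run ID: <date>-<slug>
task="Write short story 'Der Huster'"
slug=$(printf '%s' "$task" | tr '[:upper:]' '[:lower:]' \
  | tr -cs 'a-z0-9' '-' | sed 's/^-//; s/-$//' | cut -c1-40)
run_id="$(date -u +%Y-%m-%d)-$slug"
echo "$run_id"
```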
## When to Emit Events
Emit an event at each of these points during orchestration:
| Moment | Event Type | Trigger |
|--------|-----------|---------|
| Orchestration starts | `run.start` | After workflow selection, before first agent |
| Agent spawned | `agent.start` | Before each Agent tool call |
| Agent completes | `agent.complete` | After each Agent returns |
| Phase transition | `phase.transition` | Plan→Do, Do→Check, Check→Act |
| Decision made | `decision` | Plot direction chosen, fix applied, workflow adapted |
| Review verdict | `review.verdict` | Guardian/Sage/Skeptic delivers verdict |
| Fix applied | `fix.applied` | After each edit that addresses a review finding |
| Cycle boundary | `cycle.boundary` | End of PDCA cycle, before next (or exit) |
| Shadow detected | `shadow.detected` | Shadow threshold triggered |
| Orchestration ends | `run.complete` | After final Act phase |
## Event Schema
Every event is one JSON line with these required fields:
```jsonl
{
"ts": "2026-04-03T14:32:07Z",
"run_id": "2026-04-03-der-huster",
"seq": 4,
"parent": [2],
"type": "agent.complete",
"phase": "plan",
"agent": "creator",
"data": { ... }
}
```
| Field | Type | Description |
|-------|------|-------------|
| `ts` | ISO 8601 | Timestamp |
| `run_id` | string | Unique run identifier |
| `seq` | integer | Monotonically increasing sequence number within run |
| `parent` | int[] | Seq numbers of causal parent events. Forms a DAG. `[]` for root events. |
| `type` | string | Event type (see table above) |
| `phase` | string | Current PDCA phase: `plan`, `do`, `check`, `act` |
| `agent` | string or null | Agent archetype that triggered the event |
| `data` | object | Event-type-specific payload (see below) |
### Parent Relationships (DAG)
The `parent` field turns the flat event stream into a directed acyclic graph (agent call graph). This enables:
- **Causal reconstruction:** which agent output caused which downstream action
- **Parallel visualization:** agents sharing a parent ran concurrently
- **Blame tracking:** trace a fix back through review → draft → outline → research
Rules:
- `run.start` has `parent: []` (root node)
- An agent has `parent: [seq of event that triggered it]`
- A phase transition has `parent: [seq of all completing events in prior phase]`
- A fix has `parent: [seq of the review that found the issue]`
- A decision has `parent: [seq of the agent that produced the alternatives]`
- Parallel agents share the same parent (fan-out), phase transitions collect them (fan-in)
Example DAG from a writing workflow:
```
#1 run.start []
├── #2 agent.complete (explorer) [1]
│ └── #3 decision (plot direction) [2]
├── #4 agent.complete (creator) [2] ← explorer informs creator
├── #5 phase.transition (plan→do) [3,4] ← fan-in
│ └── #6 agent.complete (maker) [5]
├── #7 phase.transition (do→check) [6]
│ ├── #8 review (guardian) [7] ← parallel (fan-out)
│ └── #9 review (sage) [7] ← parallel (fan-out)
├── #10 phase.transition (check→act) [8,9] ← fan-in
├── #11 fix (timeline) [8] ← caused by guardian
├── #12 fix (voice drift) [9] ← caused by sage
└── #18 run.complete [17]
```
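Given the schema above, the edge list of this DAG can be recovered straight from the raw JSONL with a one-line `jq` pass (a sketch; assumes `jq`, which the report examples in this skill already use):

```shell
# List edges of the event DAG as "parent -> child (type)"
jq -r '.parent[] as $p | "#\($p) -> #\(.seq) (\(.type))"' \
  .archeflow/events/2026-04-03-der-huster.jsonl
```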
## Event Payloads by Type
### `run.start`
```json
{
"task": "Write short story 'Der Huster'",
"workflow": "kurzgeschichte",
"team": "story-development",
"max_cycles": 2,
"config": {
"voice_profile": "vp-giesing-gschichten-v1",
"persona": "giesinger",
"target_words": 6000
}
}
```
### `agent.start`
```json
{
"archetype": "story-explorer",
"model": "haiku",
"prompt_summary": "Research premise, find emotional core, suggest 3 plot directions"
}
```
### `agent.complete`
```json
{
"archetype": "story-explorer",
"duration_ms": 87605,
"tokens": 21645,
"artifacts": ["docs/01-der-huster-research.md"],
"summary": "3 plot directions developed, recommended C (Mo krank + Koffer)"
}
```
### `decision`
```json
{
"what": "plot_direction",
"chosen": "C — Mo krank + Koffer aus B",
"alternatives": [
{"id": "A", "label": "Mo ist weg", "reason_rejected": "Zu passiv für 6k-Story"},
{"id": "B", "label": "Huster gehört nicht Mo", "reason_rejected": "Zu Krimi-nah"}
],
"rationale": "Stärkster emotionaler Kern, passt zum Voice Profile"
}
```
### `review.verdict`
```json
{
"archetype": "guardian",
"verdict": "approved_with_fixes",
"findings": [
{"severity": "bug", "description": "Timeline: 'Montag' referenced but story starts Dienstag", "fix_required": true},
{"severity": "recommendation", "description": "Gentrification monologue too long for Alex register", "fix_required": false}
]
}
```
### `fix.applied`
```json
{
"source": "guardian",
"finding": "Timeline: Montag → Dienstag",
"file": "stories/01-der-huster.md",
"line": 302,
"before": "das Gegenteil von Montag",
"after": "das Gegenteil von Dienstag"
}
```
### `phase.transition`
```json
{
"from": "plan",
"to": "do",
"artifacts_so_far": ["research.md", "outline.md"],
"notes": "Explorer recommended direction C, Creator produced 6-scene outline"
}
```
### `cycle.boundary`
```json
{
"cycle": 1,
"max_cycles": 2,
"exit_condition": "all_approved",
"met": true,
"fixes_applied": 6,
"next_action": "complete"
}
```
### `shadow.detected`
```json
{
"archetype": "story-explorer",
"shadow": "endless_research",
"trigger": "output >2000 words without recommendation",
"action": "correction_prompt_applied",
"occurrence": 1
}
```
### `run.complete`
```json
{
"status": "completed",
"cycles": 1,
"agents_total": 5,
"fixes_total": 6,
"shadows": 0,
"duration_ms": 1295519,
"artifacts": [
"docs/01-der-huster-research.md",
"docs/01-der-huster-outline.md",
"stories/01-der-huster.md",
"docs/01-der-huster-guardian-review.md",
"docs/01-der-huster-sage-review.md",
"docs/01-der-huster-process.md"
]
}
```
## How to Emit Events
During orchestration, write events using this pattern:
```bash
# Append one event to the run's JSONL file
echo '{"ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","run_id":"RUN_ID","seq":SEQ,"type":"TYPE","phase":"PHASE","agent":"AGENT","data":{...}}' >> .archeflow/events/RUN_ID.jsonl
```
Or use the helper script:
```bash
./lib/archeflow-event.sh RUN_ID TYPE PHASE AGENT '{"key":"value"}'
```
The orchestration skill should call the event emitter at each trigger point listed in the table above.
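The helper script itself is not reproduced in this skill. A minimal sketch of what such an emitter could look like (deriving `seq` from the line count and the exact field order are assumptions):

```shell
#!/bin/sh
# archeflow-event.sh sketch: RUN_ID TYPE PHASE AGENT DATA_JSON
run_id=$1; type=$2; phase=$3; agent=$4; data=${5:-'{}'}
file=".archeflow/events/$run_id.jsonl"
mkdir -p "$(dirname "$file")"
# seq = existing event count + 1 (events are one JSON object per line)
seq=$(( $( [ -f "$file" ] && wc -l < "$file" || echo 0 ) + 1 ))
ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
printf '{"ts":"%s","run_id":"%s","seq":%s,"type":"%s","phase":"%s","agent":"%s","data":%s}\n' \
  "$ts" "$run_id" "$seq" "$type" "$phase" "$agent" "$data" >> "$file"
```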
## Generating Reports
After orchestration completes (or during, for live progress):
```bash
# Generate markdown process report
./lib/archeflow-report.sh .archeflow/events/2026-04-03-der-huster.jsonl > docs/process-report.md
# List all runs
cat .archeflow/events/index.jsonl | jq -r '[.run_id, .status, .task] | @tsv'
```
## Run Index
After each `run.complete`, append a summary line to `.archeflow/events/index.jsonl`:
```jsonl
{"run_id":"2026-04-03-der-huster","ts":"2026-04-03T16:00:00Z","task":"Write Der Huster","workflow":"kurzgeschichte","status":"completed","cycles":1,"agents":5,"fixes":6,"duration_ms":1295519}
```
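Rather than writing the index line by hand, it can be derived from the run's own events (a sketch; assumes `run.start` is the first line and `run.complete` the last, and the field selection is an assumption):

```shell
# Build the index line from run.start (task, workflow) and run.complete (stats)
jq -s -c '{run_id: .[0].run_id, ts: .[-1].ts,
           task: .[0].data.task, workflow: .[0].data.workflow,
           status: .[-1].data.status, cycles: .[-1].data.cycles,
           agents: .[-1].data.agents_total, fixes: .[-1].data.fixes_total,
           duration_ms: .[-1].data.duration_ms}' \
  .archeflow/events/2026-04-03-der-huster.jsonl >> .archeflow/events/index.jsonl
```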
## Integration with Existing Skills
- **`orchestration`**: Emit events at phase transitions and after each agent
- **`shadow-detection`**: Emit `shadow.detected` when thresholds trigger
- **`autonomous-mode`**: Use `index.jsonl` for session summaries instead of separate session-log
- **`workflow-design`**: Custom workflows inherit logging automatically
## Design Principles
1. **Append-only.** Never modify or delete events. They are immutable facts.
2. **Self-contained.** Each event has enough context to be understood alone (no forward references).
3. **Cheap.** One `echo >>` per event. No database, no service, no dependencies.
4. **Optional.** If events dir doesn't exist, orchestration works fine without logging. Events are observation, not control flow.


@@ -3,37 +3,20 @@ name: progress
description: |
Live progress file for ArcheFlow orchestrations. Regenerates `.archeflow/progress.md`
after every event emission, giving users real-time visibility into run status, budget
usage, and DAG shape — watchable from a second terminal.
usage, and DAG shape -- watchable from a second terminal.
<example>User: "What's happening with my run?"</example>
<example>watch -n 2 cat .archeflow/progress.md</example>
---
# Live Progress — Real-Time Run Visibility
# Live Progress -- Real-Time Run Visibility
During long-running orchestrations (Maker drafting, parallel reviews), users have no visibility into what is happening. This skill solves that by maintaining a live progress file that is regenerated after every event.
## Progress File
**Location:** `.archeflow/progress.md`
Updated after every event emission during a run. Users can watch it from a second terminal:
```bash
# Simple polling
watch -n 2 cat .archeflow/progress.md
# Continuous mode (built-in)
./lib/archeflow-progress.sh <run_id> --watch
# Programmatic consumption
./lib/archeflow-progress.sh <run_id> --json
```
Maintains `.archeflow/progress.md`, updated after every event during a run.
## Progress File Format
```markdown
# ArcheFlow Run: 2026-04-03-der-huster
**Status:** DO phase — maker running (3/6 scenes drafted)
**Status:** DO phase -- maker running (3/6 scenes drafted)
**Started:** 14:32 | **Elapsed:** 8 min
**Budget:** $1.45 / $10.00 (14%)
@@ -47,145 +30,40 @@ watch -n 2 cat .archeflow/progress.md
- [ ] ACT: Apply fixes
## Latest Event
#6 agent.start — maker (do) — 14:40
## DAG (so far)
#1 run.start
├── #2 story-explorer ✓
│ ├── #3 decision ✓
│ └── #4 creator ✓
├── #5 plan→do ✓
└── #6 maker ← running
#6 agent.start -- maker (do) -- 14:40
```
## How to Use
## Usage
### During Orchestration (run skill integration)
The `run` skill should call `archeflow-progress.sh` after each event emission. This keeps progress decoupled from the event emitter itself — no modification to `archeflow-event.sh` is needed.
Add this call after every `archeflow-event.sh` invocation in the run loop:
```bash
# After emitting an event:
./lib/archeflow-event.sh "$RUN_ID" agent.complete plan explorer '{"archetype":"explorer",...}'
# Update progress:
./lib/archeflow-progress.sh "$RUN_ID"
The `run` skill calls `archeflow-progress.sh` after each event emission:
```
This is a fast operation (reads JSONL, writes one markdown file) and adds negligible overhead.
### From a Second Terminal
```bash
# One-shot: see current state
./lib/archeflow-progress.sh <run_id>
cat .archeflow/progress.md
# Continuous: auto-refresh every 2 seconds
./lib/archeflow-progress.sh <run_id> --watch
# JSON output for dashboards or scripts
./lib/archeflow-progress.sh <run_id> --json
```
### Reactive Mode (via JSONL tail)
**From a second terminal:**
- One-shot: `cat .archeflow/progress.md`
- Continuous: `./lib/archeflow-progress.sh <run_id> --watch`
- JSON output: `./lib/archeflow-progress.sh <run_id> --json`
```bash
tail -f .archeflow/events/<run_id>.jsonl | while read line; do
./lib/archeflow-progress.sh <run_id>
done
```
## How the Script Works
## Progress Script
**Location:** `lib/archeflow-progress.sh`
```
Usage:
archeflow-progress.sh <run_id> # Generate/update progress.md
archeflow-progress.sh <run_id> --watch # Continuous update mode (2s interval)
archeflow-progress.sh <run_id> --json # Output as JSON (for dashboards)
```
### What the Script Does
1. **Read** `.archeflow/events/<run_id>.jsonl` — the event stream for this run
2. **Determine** current phase and active agent from the latest events
3. **Build checklist** — mark completed agents with timing/cost data, show pending agents as unchecked
4. **Show partial DAG** — completed nodes with checkmarks, running node with arrow indicator
5. **Calculate budget** — sum `estimated_cost_usd` from `agent.complete` events, compare to budget from `run.start` config or `.archeflow/config.yaml`
6. **Compute elapsed time** — difference between `run.start` timestamp and now
7. **Write** to `.archeflow/progress.md`
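Step 6, for example, reduces to a timestamp difference between the `run.start` event and now (a sketch assuming GNU `date` for `-d` parsing):

```shell
# Elapsed minutes since the run.start timestamp (GNU date assumed)
start_ts="2026-04-03T14:32:07Z"
start_s=$(date -u -d "$start_ts" +%s)
now_s=$(date -u +%s)
echo "Elapsed: $(( (now_s - start_s) / 60 )) min"
```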
### Output Modes
**Default (markdown):** Writes `.archeflow/progress.md` and prints the same content to stdout.
**`--watch`:** Clears the terminal every 2 seconds, re-reads the JSONL, and regenerates the display. Exits when a `run.complete` event is found.
**`--json`:** Outputs a structured JSON object to stdout (does not write progress.md):
```json
{
"run_id": "2026-04-03-der-huster",
"status": "running",
"phase": "do",
"active_agent": "maker",
"elapsed_seconds": 480,
"budget_used_usd": 1.45,
"budget_total_usd": 10.00,
"budget_percent": 14,
"completed": [
{"agent": "explorer", "phase": "plan", "duration_s": 87, "tokens": 21000, "cost_usd": 0.02},
{"agent": "creator", "phase": "plan", "duration_s": 167, "tokens": 26000, "cost_usd": 0.08}
],
"pending": ["guardian", "sage"],
"latest_event": {"seq": 6, "type": "agent.start", "agent": "maker", "phase": "do"},
"total_events": 6
}
```
1. Read `.archeflow/events/<run_id>.jsonl`
2. Determine current phase and active agent
3. Build checklist from events (only started/completed agents shown)
4. Calculate budget from `agent.complete` cost data
5. Write `.archeflow/progress.md`
## Checklist Construction
The progress checklist is built from events, not from a predefined workflow definition. Each event type maps to a checklist entry:
| Event Type | Checklist Entry |
|-----------|----------------|
| Event Type | Entry |
|-----------|-------|
| `agent.complete` | `- [x] PHASE: archetype (duration, tokens, cost)` |
| `agent.start` (no matching complete) | `- [ ] **PHASE: archetype** <- running (elapsed)` |
| `agent.start` (no complete) | `- [ ] **PHASE: archetype** <- running` |
| `phase.transition` | `- [x] PHASE -> PHASE transition` |
| `review.verdict` | `- [x] CHECK: archetype -> VERDICT` |
| `fix.applied` | `- [x] ACT: Fix (source)` |
| `cycle.boundary` | `- [x] Cycle N complete` |
Pending agents (not yet started) are NOT shown in the checklist — only started or completed agents appear. This avoids guessing which agents will be spawned.
Pending (not-yet-started) agents are NOT shown to avoid guessing.
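A single `jq` pass over the event stream can produce the `agent.complete` rows of that mapping (a sketch; the exact label formatting is an assumption):

```shell
# Emit one checklist line per completed agent
jq -r 'select(.type == "agent.complete")
       | "- [x] \(.phase | ascii_upcase): \(.agent) (\(.data.duration_ms / 1000 | floor)s)"' \
  .archeflow/events/<run_id>.jsonl
```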
## Budget Display
Budget information comes from two sources:
1. **`run.start` event** — may contain `config.budget_usd`
2. **`.archeflow/config.yaml`** — global `budget.per_run_usd`
If no budget is configured, the budget line shows cost only (no percentage):
```
**Cost:** $1.45 (no budget set)
```
## Integration with Other Skills
- **`run`**: Should call `archeflow-progress.sh` after each event emission
- **`process-log`**: Progress reads the same JSONL that process-log defines
- **`cost-tracking`**: Budget data and cost calculations follow cost-tracking conventions
- **`autonomous-mode`**: Progress file is useful for monitoring autonomous overnight runs
## Design Principles
1. **Read-only on events.** Progress never modifies the JSONL. It is a derived view.
2. **Fast.** One JSONL read + one markdown write. No jq streaming, no databases.
3. **Decoupled.** No hooks in `archeflow-event.sh`. The `run` skill calls progress explicitly.
4. **Optional.** If progress is never called, orchestration works fine. No side effects.
5. **Terminal-friendly.** Output is plain markdown — renders well in `cat`, `bat`, `glow`, or any terminal.
Source: `run.start` event or `.archeflow/config.yaml`. If no budget configured: show cost only.

skills/review/SKILL.md (new file, 146 lines)

@@ -0,0 +1,146 @@
---
name: review
description: |
Review-only mode. Run Guardian + optional reviewers on an existing diff or branch,
without any Plan/Do orchestration. The highest-ROI mode for catching design-level bugs.
<example>User: "af-review"</example>
<example>User: "Review the last commit"</example>
<example>User: "af-review --reviewers guardian,skeptic"</example>
---
# ArcheFlow Review Mode
Run reviewers on existing code changes without orchestrating implementation.
This is the most cost-effective mode — it delivers Guardian's error-path analysis
without the Maker overhead.
## When to Use
- After you've implemented something and want a quality check
- On a PR or branch before merging
- When the sprint runner flags a task as DONE_WITH_CONCERNS
- As a pre-commit quality gate for complex changes
## Invocation
```
af-review # Review uncommitted changes
af-review --branch feat/batch-api # Review branch diff against main
af-review --commit HEAD~3..HEAD # Review last 3 commits
af-review --reviewers guardian,skeptic,sage # Choose reviewers (default: guardian)
af-review --evidence # Enable evidence-gating (stricter)
```
---
## Execution
### Step 1: Get the Diff
Use `lib/archeflow-review.sh` to extract the diff and stats:
```bash
# Uncommitted changes (default)
DIFF=$(bash lib/archeflow-review.sh)
# Branch diff against main
DIFF=$(bash lib/archeflow-review.sh --branch feat/batch-api)
# Commit range
DIFF=$(bash lib/archeflow-review.sh --commit HEAD~3..HEAD)
# Override base branch
DIFF=$(bash lib/archeflow-review.sh --branch feat/x --base develop)
# Stats only (no diff output)
bash lib/archeflow-review.sh --stat-only
```
The script prints the diff to stdout and stats to stderr. It exits 1 if the diff
is empty (nothing to review). For large diffs (>500 lines), it warns on stderr.
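Under the hood the three modes are plain `git diff` invocations; a minimal sketch (the dispatch shape is an assumption, and the `--base` override is omitted for brevity):

```shell
# Dispatch the review modes to git (sketch of archeflow-review.sh core)
case "${1:-}" in
  --branch) git diff "$(git merge-base main "$2")..$2" ;;  # branch vs main
  --commit) git diff "$2" ;;                               # explicit range
  *)        git diff HEAD ;;                               # uncommitted changes
esac
```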
### Step 2: Spawn Reviewers
Default: Guardian only (fastest, highest ROI).
With `--reviewers`: spawn requested reviewers in parallel.
**Guardian** (always first):
```
Agent(
description: "Guardian: review changes for <project>",
prompt: "You are the GUARDIAN archetype — security and risk reviewer.
Review this diff for: security vulnerabilities, error handling gaps,
data loss scenarios, race conditions, and breaking changes.
For each finding: cite specific code (file:line), state what you tested
or observed, state what the correct behavior should be.
Diff:
<DIFF>
STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED",
subagent_type: "code-reviewer"
)
```
**Skeptic** (if requested):
- Focus: hidden assumptions, edge cases, scalability
- Context: diff + any design docs
**Sage** (if requested):
- Focus: code quality, test coverage, maintainability
- Context: diff + surrounding code
**Trickster** (if requested):
- Focus: adversarial inputs, failure injection, chaos testing
- Context: diff only
### Step 3: Collect and Report
Parse each reviewer's output. Show findings:
```
── af-review: <project> ───────────────────────
Reviewers: guardian, skeptic
🛡️ Guardian: 2 findings (1 HIGH, 1 MEDIUM)
[HIGH] Timeout marks variant as done — loses batch state (fanout.py:552)
[MEDIUM] No JSON error handling on corrupted state (batch.py:310)
🤔 Skeptic: 1 finding (1 INFO)
[INFO] hash() non-deterministic across processes (fanout.py:524)
Total: 3 findings (1 HIGH, 1 MEDIUM, 1 INFO)
────────────────────────────────────────────────
```
### Step 4: Evidence Gate (if --evidence)
When `--evidence` is active, apply the evidence requirements from `archeflow:check-phase`:
- Scan findings for banned phrases ("might be", "could potentially", etc.)
- Check for evidence markers (exit codes, line numbers, reproduction steps)
- Downgrade unsupported findings to INFO
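The banned-phrase scan can be a plain `grep` over each finding's text (a sketch; the phrase list here is abbreviated, the full list lives in `archeflow:check-phase`):

```shell
finding="This might be a race condition in the batch writer"
if printf '%s' "$finding" | grep -qiE 'might be|could potentially|seems like'; then
  echo "downgrade: INFO (no evidence)"
fi
```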
---
## Integration with Sprint Runner
The sprint runner can invoke `af-review` automatically:
| Sprint trigger | Review action |
|----------------|--------------|
| Task marked DONE_WITH_CONCERNS | Run Guardian on the agent's changes |
| Task is L/XL estimate | Run Guardian + Skeptic after completion |
| Task involves security keywords | Run Guardian automatically |
| User requests | Run specified reviewers |
---
## Cost
Review-only is 60-80% cheaper than full PDCA:
- No Explorer research (~30% of PDCA cost)
- No Creator planning (~20% of PDCA cost)
- No Maker implementation (already done)
- Only reviewer token costs remain


@@ -1,588 +1,309 @@
---
name: run
description: |
Automated PDCA execution loop. Single-command orchestration that initializes a run, flows through
Plan/Do/Check/Act phases, emits events at every step, saves artifacts to disk, and handles
cycle-back with structured feedback. Use instead of manually following orchestration steps.
<example>User: "archeflow:run"</example>
<example>User: "Run this through ArcheFlow"</example>
<example>User: "archeflow:run --start-from check"</example>
<example>User: "archeflow:run --dry-run"</example>
Start an ArcheFlow PDCA run. Usage: /af-run <task description> [--workflow fast|standard|thorough] [--dry-run] [--start-from plan|do|check|act]
---
# ArcheFlow Run — Automated PDCA Execution Loop
# ArcheFlow Run — PDCA Orchestration
This skill automates the full orchestration cycle. When invoked, Claude executes all PDCA phases end-to-end, emitting events and saving artifacts at every step. No manual phase-by-phase intervention needed.
One command runs the full cycle: Plan (Explorer+Creator) -> Do (Maker in worktree) -> Check (Guardian first, then others) -> Act (collect findings, route fixes, exit or cycle).
## Prerequisites
## 0. Initialize
Load these skills (they are referenced throughout):
- `archeflow:orchestration` — agent prompts, workflow selection, adaptation rules
- `archeflow:process-log` — event schema and DAG parent rules
- `archeflow:artifact-routing` — artifact naming, context injection, cycle archiving
1. Generate run ID: `<YYYY-MM-DD>-<task-slug>`
2. Create artifact directory: `mkdir -p .archeflow/artifacts/<run_id>`
3. Verify `./lib/archeflow-*.sh` scripts exist before proceeding
4. Inject cross-run memory: `./lib/archeflow-memory.sh inject "$DOMAIN" "" --audit "$RUN_ID"`
5. Read `.archeflow/config.yaml` models section. Resolution order: per-workflow per-archetype > per-workflow default > per-archetype > global default.
6. Emit `run.start` event
## Invocation
### Strategy Selection
```
archeflow:run # Full run, auto-select workflow
archeflow:run --workflow standard # Force a specific workflow
archeflow:run --start-from do # Resume from Do phase (requires prior artifacts)
archeflow:run --start-from check # Resume from Check phase
archeflow:run --dry-run # Plan phase only, show cost estimate
archeflow:run --max-cycles 1 # Override max cycles
```
Determine strategy from CLI flag `--strategy`, config `strategy:` field, or auto-detect:
| Signal | Strategy |
|--------|----------|
| Task contains fix/bug/patch/hotfix | `pipeline` |
| Task contains refactor/redesign/review | `pdca` |
| Workflow is `fast` with single file | `pipeline` |
| Workflow is `thorough` | `pdca` |
| Default | `pdca` |
If `pipeline`, skip to the Pipeline section at the end. Otherwise continue with PDCA below.
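The auto-detect rules above amount to a keyword match on the task text (an illustration only; a real selector may weigh more signals than these):

```shell
task="hotfix: patch pagination bug"
case "$task" in
  *fix*|*bug*|*patch*|*hotfix*)   strategy=pipeline ;;
  *refactor*|*redesign*|*review*) strategy=pdca ;;
  *)                              strategy=pdca ;;
esac
echo "$strategy"
```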
### Workflow Selection
| Signal | Workflow | Max Cycles |
|--------|----------|------------|
| Small fix, low risk, single concern | `fast` | 1 |
| Feature, multiple files, moderate risk | `standard` | 2 |
| Security-sensitive, breaking changes, public API | `thorough` | 3 |
---
## Execution Steps
## Attention Filters
### 0. Initialize
Each agent receives only what it needs. This is the canonical reference:
Generate a run ID and set up the artifact directory.
```bash
# Generate run_id
RUN_ID="$(date -u +%Y-%m-%d)-<task-slug>"
# Create artifact directory
mkdir -p .archeflow/artifacts/${RUN_ID}
# Emit run.start event (seq=1, parent=[])
./lib/archeflow-event.sh "$RUN_ID" run.start plan "" \
'{"task":"<task description>","workflow":"<fast|standard|thorough>","max_cycles":<N>}'
```
**Track state:** Maintain these variables throughout the run:
- `RUN_ID` — unique run identifier
- `SEQ` — current sequence number (read from event file line count after each emit)
- `CYCLE` — current PDCA cycle number (starts at 1)
- `WORKFLOW` — fast/standard/thorough (may change via adaptation rules)
- `ESCALATED` — boolean, set true if A1 triggers
After emitting `run.start`, record `SEQ_RUN_START=1`.
If `--start-from` is specified, verify that the required prior artifacts exist in `.archeflow/artifacts/${RUN_ID}/` before skipping phases. If missing, abort with an error.
#### 0b. Memory Injection
Load cross-run memory lessons and inject into agent prompts. Use `--audit` to track which lessons were injected for this run:
```bash
# Load cross-run memory for this domain (with audit trail)
MEMORY_LESSONS=$(./lib/archeflow-memory.sh inject "$DOMAIN" "" --audit "$RUN_ID")
# Inject into Explorer/Creator prompts if non-empty
if [[ -n "$MEMORY_LESSONS" ]]; then
EXPLORER_PROMPT="${EXPLORER_PROMPT}
${MEMORY_LESSONS}"
CREATOR_PROMPT="${CREATOR_PROMPT}
${MEMORY_LESSONS}"
fi
```
| Archetype | Receives | Excludes |
|-----------|----------|----------|
| Explorer | Task description, codebase access | Prior proposals, reviews, diffs |
| Creator | Task + Explorer output (+ feedback cycle 2+) | Raw files, diffs, reviewer outputs |
| Maker | Creator's proposal (+ Maker-routed feedback cycle 2+) | Explorer research, reviewer outputs |
| Guardian | Maker's diff + proposal risk section | Full proposal, Explorer research |
| Skeptic | Creator's proposal (assumptions focus) | Diff details, Explorer research |
| Sage | Proposal + diff + implementation summary | Explorer research, other reviews |
| Trickster | Maker's diff only | Everything else |
---
### 1. Plan Phase
## Status Token Protocol
#### 1a. Explorer (if standard or thorough)
Every agent ends output with `STATUS: <token>`. Parse it to decide the next action.
```bash
# Emit agent.start
./lib/archeflow-event.sh "$RUN_ID" agent.start plan explorer \
'{"archetype":"explorer","prompt_summary":"Research codebase context for task"}' "$SEQ_RUN_START"
```
| Status | Action |
|--------|--------|
| `DONE` | Proceed to next phase |
| `DONE_WITH_CONCERNS` | Log concerns, proceed |
| `NEEDS_CONTEXT` | Pause, request info from user |
| `BLOCKED` | Abort phase, report blocker |
Spawn the Explorer agent using the prompt from `archeflow:orchestration` Step 1.
If no status token is found, default to `DONE`.
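Extracting the trailing token can be as simple as taking the last `STATUS:` line of the agent's output (a sketch; real transcripts may need extra trimming):

```shell
output="...agent transcript...
STATUS: DONE_WITH_CONCERNS"
status=$(printf '%s\n' "$output" | sed -n 's/^STATUS: *//p' | tail -n 1)
status=${status:-DONE}   # default when no token found
echo "$status"
```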
---
## 1. Plan Phase
### 1a. Explorer (standard/thorough only)
```
Agent(
description: "Explorer: research context for <task>",
prompt: "<Explorer prompt from orchestration skill>",
prompt: "You are the EXPLORER archetype.
<task description>
Research: 1) affected files/functions, 2) dependencies, 3) test coverage,
4) codebase patterns. Write a structured research report.
Be thorough but focused — no rabbit holes.",
subagent_type: "Explore"
)
```
After Explorer returns:
1. Save output to `.archeflow/artifacts/${RUN_ID}/plan-explorer.md`
2. Emit `agent.complete`:
```bash
./lib/archeflow-event.sh "$RUN_ID" agent.complete plan explorer \
'{"archetype":"explorer","duration_ms":<ms>,"artifacts":["plan-explorer.md"],"summary":"<1-line summary>"}' "$SEQ_EXPLORER_START"
```
3. Record `SEQ_EXPLORER_COMPLETE` for DAG references.
Save output to `.archeflow/artifacts/<run_id>/plan-explorer.md`.
#### 1b. Creator
### 1b. Creator
The Creator receives Explorer output (if it exists) or performs Mini-Reflect (fast workflow).
```bash
# Emit agent.start — parent is explorer.complete (or run.start for fast)
./lib/archeflow-event.sh "$RUN_ID" agent.start plan creator \
'{"archetype":"creator","prompt_summary":"Design solution proposal"}' "$SEQ_EXPLORER_COMPLETE"
```
Spawn the Creator agent using the prompt from `archeflow:orchestration` Step 1.
**Context injection (from artifact-routing skill):**
- Fast workflow: task description only
- Standard/thorough: task description + contents of `plan-explorer.md`
- Cycle 2+: task description + `plan-explorer.md` + `act-feedback.md` from prior cycle
Fast workflow (no Explorer): Creator must perform Mini-Reflect first:
1. Restate the task in one sentence
2. List 3 assumptions
3. Name the highest-damage risk
```
Agent(
description: "Creator: design proposal for <task>",
prompt: "You are the CREATOR archetype.
<task description>
<if fast: Perform Mini-Reflect first (restate task, 3 assumptions, top risk)>
<if standard/thorough: Research findings: <plan-explorer.md contents>>
<if cycle 2+: Prior feedback: <Creator-routed section of act-feedback.md>>
Design a proposal:
1. Architecture decisions (with rationale)
2. Files to create/modify (exact paths, specific changes, 2-5 min per item)
3. Alternatives considered (2+, with rejection rationale)
4. Test strategy (specific test cases)
5. Confidence table (task understanding, solution completeness, risk coverage — each 0.0-1.0)
6. Risks and mitigations
<if cycle 2+: 7. How each prior issue was addressed (Fixed/Deferred/Accepted/Disputed)>
Be decisive — ship a clear plan, not a menu.",
subagent_type: "Plan"
)
```
After Creator returns:
1. Save output to `.archeflow/artifacts/${RUN_ID}/plan-creator.md`
2. Emit `agent.complete`
3. Record `SEQ_CREATOR_COMPLETE`
### 1c. Confidence Gate (Rule A3)
**Parsing instructions:** Read `plan-creator.md` and locate the `### Confidence` table. Extract each axis score as a float. If any score is unparseable, default it to 0.0, which triggers the gate rather than bypassing it:
| Axis | Score < 0.5 | Action |
|------|-------------|--------|
| Task understanding | Pause | Ask user for clarification. Do not spawn Maker. |
| Solution completeness | Upgrade | If fast -> standard. Spawn Explorer, re-run Creator. |
| Risk coverage | Mini-Explorer | Spawn focused Explorer for risky areas (5 min max, parallel with Do prep). |
```bash
CONF_FILE=".archeflow/artifacts/${RUN_ID}/plan-creator.md"
# Extract confidence scores (expects format: "| Task understanding | 0.8 |")
TASK_UNDERSTANDING=$(grep -i "task understanding" "$CONF_FILE" | grep -oE '[0-9]+\.[0-9]+' | head -1)
SOLUTION_COMPLETENESS=$(grep -i "solution completeness" "$CONF_FILE" | grep -oE '[0-9]+\.[0-9]+' | head -1)
RISK_COVERAGE=$(grep -i "risk coverage" "$CONF_FILE" | grep -oE '[0-9]+\.[0-9]+' | head -1)
# Fallback: if unparseable, emit warning and default to 0.0 (triggers gate, not bypasses it)
if [[ -z "$TASK_UNDERSTANDING" || -z "$SOLUTION_COMPLETENESS" || -z "$RISK_COVERAGE" ]]; then
echo "WARNING: Could not parse confidence scores from plan-creator.md" >&2
./lib/archeflow-event.sh "$RUN_ID" decision plan "" \
'{"what":"confidence_parse_failure","chosen":"warn","rationale":"one or more scores unparseable"}' "$SEQ_CREATOR_COMPLETE"
fi
TASK_UNDERSTANDING="${TASK_UNDERSTANDING:-0.0}"
SOLUTION_COMPLETENESS="${SOLUTION_COMPLETENESS:-0.0}"
RISK_COVERAGE="${RISK_COVERAGE:-0.0}"
```
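The three-branch decision can then be sketched with an awk float comparison. The example scores are made up; the 0.5 threshold comes from the table above, and checking task understanding first is one reasonable ordering (a misunderstood task invalidates the other two scores).

```bash
# Illustrative confidence-gate branch selection. bash cannot compare
# floats natively, so delegate the comparison to awk.
TASK_UNDERSTANDING=0.8; SOLUTION_COMPLETENESS=0.4; RISK_COVERAGE=0.9   # example scores

below() { awk -v a="$1" -v b="$2" 'BEGIN { exit !(a < b) }'; }

if below "$TASK_UNDERSTANDING" 0.5; then
  GATE_ACTION="pause"           # ask the user; do not spawn Maker
elif below "$SOLUTION_COMPLETENESS" 0.5; then
  GATE_ACTION="upgrade"         # fast -> standard; re-run Explorer then Creator
elif below "$RISK_COVERAGE" 0.5; then
  GATE_ACTION="mini_explorer"   # focused risk research
else
  GATE_ACTION="proceed"
fi
echo "$GATE_ACTION"   # -> upgrade
```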
**Pause branch** (Task understanding < 0.5):
The Creator does not sufficiently understand the task. Do not spawn Maker.
1. Emit decision event with `"chosen":"pause"`
2. Display message to user: "Creator rated task understanding at <score>. Clarification needed before proceeding."
3. Block until the user provides clarification
4. Re-run Creator with the clarification appended to the task description
```bash
./lib/archeflow-event.sh "$RUN_ID" decision plan "" \
'{"what":"confidence_gate","chosen":"pause","rationale":"task_understanding scored '"$TASK_UNDERSTANDING"'"}' "$SEQ_CREATOR_COMPLETE"
```
**Upgrade branch** (Solution completeness < 0.5):
The Creator's proposal is incomplete — more research is needed.
1. If fast workflow: upgrade to standard, spawn Explorer, then re-run Creator with Explorer output
2. If already standard/thorough: re-run Explorer with a focused prompt targeting the incomplete areas
```bash
./lib/archeflow-event.sh "$RUN_ID" decision plan "" \
'{"what":"confidence_gate","chosen":"upgrade","rationale":"solution_completeness scored '"$SOLUTION_COMPLETENESS"'"}' "$SEQ_CREATOR_COMPLETE"
# If fast → standard upgrade:
WORKFLOW="standard"
# Spawn Explorer, then re-run Creator with Explorer findings
```
**Mini-Explorer branch** (Risk coverage < 0.5):
The Creator identified risks but lacks confidence in their assessment. Spawn a focused Explorer to investigate.
```
Agent(
description: "Mini-Explorer: investigate risk area for <task>",
prompt: "You are the EXPLORER archetype. The Creator rated risk coverage at <score>.
Identified risks: <risks from plan-creator.md>
Research ONLY the risky areas. Answer: Is the risk real? What mitigations exist? What tests/guards would help?
Limit: focused output only.",
subagent_type: "Explore"
)
```
Save output to `.archeflow/artifacts/${RUN_ID}/plan-mini-explorer.md`. The Maker receives both `plan-creator.md` and `plan-mini-explorer.md` as context.
```bash
./lib/archeflow-event.sh "$RUN_ID" decision plan "" \
'{"what":"confidence_gate","chosen":"mini_explorer","rationale":"risk_coverage scored '"$RISK_COVERAGE"'"}' "$SEQ_CREATOR_COMPLETE"
```
**Note:** The mini-Explorer runs in parallel with Do phase preparation (5 min max). The Maker can proceed once both `plan-creator.md` and `plan-mini-explorer.md` are available.
### 1d. Phase Transition: Plan to Do
```bash
# Parent = all completing events in Plan phase
./lib/archeflow-event.sh "$RUN_ID" phase.transition do "" \
'{"from":"plan","to":"do","artifacts_so_far":["plan-explorer.md","plan-creator.md"]}' "$SEQ_CREATOR_COMPLETE"
```
Record `SEQ_PLAN_TO_DO`.
---
## 2. Do Phase
### 2a. Maker
**Context injection (from artifact-routing skill):**
- Contents of `plan-creator.md` (the proposal)
- Cycle 2+: also contents of `act-feedback.md` filtered to Maker-routed findings only
```bash
./lib/archeflow-event.sh "$RUN_ID" agent.start do maker \
'{"archetype":"maker","prompt_summary":"Implement proposal in isolated worktree"}' "$SEQ_PLAN_TO_DO"
```
```
Agent(
description: "Maker: implement <task>",
prompt: "You are the MAKER archetype.
Implement this proposal: <plan-creator.md contents>
<if cycle 2+: Implementation feedback: <Maker-routed findings from act-feedback.md>>
Rules:
1. Follow the proposal exactly — don't redesign
2. Write tests for every behavioral change
3. Commit with descriptive messages (CRITICAL: uncommitted worktree changes are LOST)
4. Run existing tests — nothing may break
5. If unclear, implement best interpretation and note it
Self-review before finishing:
- All proposal files changed? Tests added? No out-of-scope files? Existing tests pass?",
isolation: "worktree",
mode: "bypassPermissions"
)
```
After Maker returns:
1. Save implementation summary to `.archeflow/artifacts/${RUN_ID}/do-maker.md`
2. Capture list of changed files: `git diff --name-only` on the Maker's branch, save to `.archeflow/artifacts/${RUN_ID}/do-maker-files.txt`
3. Emit `agent.complete`:
```bash
./lib/archeflow-event.sh "$RUN_ID" agent.complete do maker \
'{"archetype":"maker","duration_ms":<ms>,"artifacts":["do-maker.md","do-maker-files.txt"],"summary":"<files changed, tests added>"}' "$SEQ_MAKER_START"
```
4. Record `SEQ_MAKER_COMPLETE`
**Critical:** Verify the Maker committed its changes before proceeding. If uncommitted changes exist, instruct the Maker to commit.
### 2b. Test-First Gate
After Maker completes, check `do-maker-files.txt` for test files:
```bash
TEST_FILES=$(grep -iE '([/_.-](test|spec)[/_.-]|\.(test|spec)\.|_(test|spec)\.|/tests?/|/__tests__/|/specs?/)' ".archeflow/artifacts/${RUN_ID}/do-maker-files.txt" || true)
```
If `TEST_FILES` is empty and domain is not `writing`:
1. Check if `plan-creator.md` contains a `### Test Strategy` section
2. If yes: re-run Maker with targeted test instruction (one retry within Do phase)
3. If no test strategy specified: emit WARNING event and proceed
```bash
./lib/archeflow-event.sh "$RUN_ID" decision do "" \
'{"what":"test_first_gate","chosen":"<pass|warn|retry>","rationale":"<reason>"}' "$SEQ_MAKER_COMPLETE"
```
The re-run prompt for the retry case:
> "The proposal specified these test cases: <test strategy section>. No test files were found in your changes. Add the specified tests before finishing."
This is one retry within the Do phase, not a full PDCA cycle. If the retry also produces no tests, emit WARNING and proceed to Check.
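A quick way to sanity-check the test-file pattern used by the gate above (the sample paths are invented):

```bash
# The same pattern as the gate above, exercised on sample paths.
PATTERN='([/_.-](test|spec)[/_.-]|\.(test|spec)\.|_(test|spec)\.|/tests?/|/__tests__/|/specs?/)'
MATCHED=$(printf '%s\n' src/auth.ts src/auth.test.ts src/__tests__/app.js docs/notes.md \
  | grep -iE "$PATTERN")
echo "$MATCHED"   # -> src/auth.test.ts and src/__tests__/app.js
```

Note that a path like `tests/api.py` only matches when it carries a leading `./` or `/`, since the `/tests?/` alternative anchors on a slash.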
### 2c. Phase Transition: Do to Check
```bash
./lib/archeflow-event.sh "$RUN_ID" phase.transition check "" \
'{"from":"do","to":"check","artifacts_so_far":["plan-explorer.md","plan-creator.md","do-maker.md","do-maker-files.txt"]}' "$SEQ_MAKER_COMPLETE"
```
Record `SEQ_DO_TO_CHECK`.
---
## 3. Check Phase
**Important:** Spawn Guardian FIRST. Evaluate Rule A2 before spawning other reviewers.
### 3a. Guardian (always first)
**Context injection:** Maker's git diff + proposal risk section only (not full proposal, not Explorer research).
```bash
./lib/archeflow-event.sh "$RUN_ID" agent.start check guardian \
'{"archetype":"guardian","prompt_summary":"Security and risk review of changes"}' "$SEQ_DO_TO_CHECK"
```
```
Agent(
description: "Guardian: security review for <task>",
prompt: "You are the GUARDIAN archetype.
Review changes in branch: <maker's branch>
<git diff from Maker's branch>
<risks section from plan-creator.md>
Assess: security, reliability, breaking changes, dependencies.
Output: APPROVED or REJECTED with findings table.
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
Be rigorous but practical — flag real risks, not theoretical ones."
)
```
After Guardian returns:
1. Save to `.archeflow/artifacts/${RUN_ID}/check-guardian.md`
2. Emit `review.verdict`:
```bash
./lib/archeflow-event.sh "$RUN_ID" review.verdict check guardian \
'{"archetype":"guardian","verdict":"<approved|rejected|approved_with_fixes>","findings":[...]}' "$SEQ_GUARDIAN_START"
```
3. Record `SEQ_GUARDIAN_VERDICT`
### 3b. Guardian Fast-Path (Rule A2)
Parse Guardian's output. If it reports **0 CRITICAL and 0 WARNING**, AND the workflow is not escalated, AND this is not the first cycle of a thorough run:
- Skip remaining reviewers, proceed directly to Act phase
- Log "Guardian fast-path taken"
```bash
./lib/archeflow-event.sh "$RUN_ID" decision check "" \
'{"what":"guardian_fast_path","chosen":"skip_remaining_reviewers","rationale":"0 CRITICAL, 0 WARNING"}' "$SEQ_GUARDIAN_VERDICT"
```
Skip to Phase Transition (3e). Log "Guardian fast-path taken" in the report. Otherwise, proceed to spawn the remaining reviewers.

### 3c. Remaining Reviewers (spawn in parallel)

Spawn these based on workflow (see `archeflow:orchestration` for which reviewers apply):

**Skeptic** (standard/thorough):
- Context: Creator's proposal (assumptions section focus)
- Save to: `check-skeptic.md`

**Sage** (standard/thorough):
- Context: Creator's proposal + Maker's diff + implementation summary
- Save to: `check-sage.md`

**Trickster** (thorough only):
- Context: Maker's diff only. "Think like a QA engineer paid per bug."
- Save to: `check-trickster.md`

Spawn all applicable reviewers in parallel (multiple Agent calls in one message). For each:

```bash
# Emit agent.start with parent = SEQ_DO_TO_CHECK
./lib/archeflow-event.sh "$RUN_ID" agent.start check <archetype> \
  '{"archetype":"<archetype>","prompt_summary":"<review focus>"}' "$SEQ_DO_TO_CHECK"
```

After each returns, emit `review.verdict` and save the artifact.

### 3d. Evidence Validation

After all reviewers complete, scan CRITICAL/WARNING findings. Downgrade to INFO if:
- **Banned phrases** without evidence: "might be", "could potentially", "appears to", "seems like", "may not"
- **No evidence**: no command output, code citation, line reference, or reproduction steps

Track downgrades in events (do NOT modify artifact files). The Act phase excludes downgraded findings from CRITICAL tallies.

### 3e. Phase Transition: Check to Act

Collect all verdict seq numbers for the parent array.

```bash
./lib/archeflow-event.sh "$RUN_ID" phase.transition act "" \
  '{"from":"check","to":"act"}' "<all_verdict_seqs>"
```

Record `SEQ_CHECK_TO_ACT`.
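The banned-phrase scan from Evidence Validation can be sketched as follows. The finding text is a made-up example; the phrase list comes from the bullet above.

```bash
# Illustrative hedge-language scan: findings matching these phrases
# without supporting evidence get downgraded to INFO.
BANNED='might be|could potentially|appears to|seems like|may not'
FINDING='WARNING | src/auth.ts:48 | this could potentially leak tokens'

if printf '%s' "$FINDING" | grep -qiE "$BANNED"; then
  echo "downgrade to INFO: hedged claim without evidence"
fi
```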
---
## 4. Act Phase
### 4a. Collect Verdicts
Read all `check-*.md` artifacts. Tally findings:
- Count CRITICAL, WARNING, INFO per reviewer
- Check for unanimous approval
### 4b. Escalation Check (Rule A1)
If workflow is `fast` and Guardian found 2+ CRITICAL:
- Set `ESCALATED=true`
- Upgrade next cycle to `standard` (add Skeptic + Sage)
- Emit decision event
Once escalated, the workflow stays escalated. Rule A2 (the Guardian fast-path) does not apply to escalated workflows.
### 4c. Convergence Check (cycle 2+ only)

Compare current findings against the previous cycle. Classify each as NEW / RESOLVED / PERSISTENT / REGRESSED.

```
convergence_score = resolved / (resolved + new + regressed)
```

| Score | Status | Action |
|-------|--------|--------|
| > 0.8 | Converging | Continue if cycles remain |
| 0.5-0.8 | Stalling | Continue with caution |
| < 0.5 | Diverging | STOP if 2 consecutive cycles |
| 0.0 (all persistent) | Stuck | STOP |

**Oscillation**: a finding present in cycle N-2, absent in N-1, and back in N. Two or more oscillating findings = STOP and escalate to user.

A convergence STOP overrides the normal cycle-back even if cycles remain.

### 4d. All Approved

If all reviewers approved (and completion criteria are met, if defined):

1. Emit `cycle.boundary`:
```bash
./lib/archeflow-event.sh "$RUN_ID" cycle.boundary act "" \
  '{"cycle":<N>,"max_cycles":<M>,"exit_condition":"all_approved","met":true,"next_action":"complete"}' "$SEQ_CHECK_TO_ACT"
```
2. **Pre-merge hook check:**
```bash
# Read hooks config if it exists
if [[ -f ".archeflow/hooks.yaml" ]]; then
  PRE_MERGE_HOOKS=$(grep -A5 "pre-merge:" .archeflow/hooks.yaml || true)
  if [[ -n "$PRE_MERGE_HOOKS" ]]; then
    echo "Running pre-merge hooks..."
    # Execute hooks; abort merge if fail_action: abort
    # Hook execution is project-specific; see .archeflow/hooks.yaml
  fi
fi
```
3. **Merge the Maker's worktree branch:**
```bash
./lib/archeflow-git.sh merge "$RUN_ID" --no-ff
```
4. **Post-merge test validation** (using the auto-rollback script):
```bash
# Run tests and auto-revert if they fail
if ! ./lib/archeflow-rollback.sh "$RUN_ID"; then
  # Rollback script already reverted HEAD and emitted a decision event
  if [[ "$CYCLE" -lt "$MAX_CYCLES" ]]; then
    echo "Cycling back with integration test failure feedback..."
    # Build act-feedback.md with "integration test failure on main" as top finding
    # Continue to 4e (Issues Found)
  else
    echo "Max cycles reached. Reporting failure to user."
    # Continue to 4f (Max Cycles Reached)
  fi
fi
```
5. **Clean up worktree:**
```bash
./lib/archeflow-git.sh cleanup "$RUN_ID"
```
6. Proceed to Completion
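The convergence score reduces to a one-liner. The tallies below are example values; in a real run they come from classifying findings against the previous cycle.

```bash
# Illustrative convergence scoring; persistent findings do not enter the ratio.
RESOLVED=9; NEW=1; REGRESSED=0
SCORE=$(awk -v r="$RESOLVED" -v n="$NEW" -v g="$REGRESSED" \
  'BEGIN { d = r + n + g; printf "%.2f", (d ? r / d : 0) }')
echo "$SCORE"   # -> 0.90 (Converging per the score table)
```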
### 4e. Issues Found (cycles remaining)

If any reviewer rejected and `CYCLE < MAX_CYCLES`:

1. Build structured feedback using the Cycle Feedback Protocol from `archeflow:orchestration`:
   - Extract findings from all `check-*.md` artifacts
   - Route findings using the feedback routing table below: Guardian/Skeptic issues → Creator, Sage issues → Maker
   - Check convergence: same finding in 2 consecutive cycles → escalate to user
   - Dedup cross-archetype findings
2. Save to `.archeflow/artifacts/${RUN_ID}/act-feedback.md`
3. Create the applied-fixes log (initially empty, populated during the next Do phase):
```bash
touch .archeflow/artifacts/${RUN_ID}/act-fixes.jsonl
```
4. Emit `cycle.boundary`:
```bash
./lib/archeflow-event.sh "$RUN_ID" cycle.boundary act "" \
  '{"cycle":<N>,"max_cycles":<M>,"exit_condition":"all_approved","met":false,"next_action":"cycle_back"}' "$SEQ_CHECK_TO_ACT"
```
5. Archive current cycle artifacts:
```bash
mkdir -p .archeflow/artifacts/${RUN_ID}/cycle-${CYCLE}
cp .archeflow/artifacts/${RUN_ID}/plan-*.md .archeflow/artifacts/${RUN_ID}/cycle-${CYCLE}/
cp .archeflow/artifacts/${RUN_ID}/do-*.md .archeflow/artifacts/${RUN_ID}/do-*.txt .archeflow/artifacts/${RUN_ID}/cycle-${CYCLE}/ 2>/dev/null || true
cp .archeflow/artifacts/${RUN_ID}/check-*.md .archeflow/artifacts/${RUN_ID}/cycle-${CYCLE}/
```
6. Increment `CYCLE`, go back to Step 1 (Plan Phase)

### 4f. Max Cycles Reached

If `CYCLE >= MAX_CYCLES` and issues remain:

1. Report all unresolved findings to the user
2. Present the best implementation (on its branch, not merged)
3. Let the user decide: merge as-is, fix manually, or abandon
4. Emit `cycle.boundary` with `"met": false, "next_action": "user_decision"`
---

## Feedback Routing Table

Route each finding to the right agent for the next cycle:

| Source | Category | Routes To | Rationale |
|--------|----------|-----------|-----------|
| Guardian | security, breaking-change | **Creator** | Design must change |
| Guardian | reliability, dependency | **Creator** | Architectural decision needed |
| Skeptic | design, scalability | **Creator** | Assumptions need revision |
| Sage | quality, consistency | **Maker** | Implementation refinement |
| Sage | testing | **Maker** | Test gap, not design flaw |
| Trickster | reliability (design flaw) | **Creator** | Needs redesign |
| Trickster | reliability (test gap) | **Maker** | Needs more tests |
| Trickster | testing | **Maker** | Edge case not covered |

**Disambiguation**: If the fix requires changing the approach -> Creator. If the fix requires changing the code within the existing approach -> Maker.

### Feedback File Format

`act-feedback.md` splits into `## Creator-Routed Issues` and `## Maker-Routed Issues`. Inject only the relevant section into each agent's prompt.

**Same finding in 2 consecutive cycles**: escalate to user. Do not cycle again blindly.

**Cross-archetype dedup**: If two reviewers raise the same issue (same file + category), merge into one finding. Route to the higher-priority destination (Creator over Maker).

---

## 5. Completion

1. Emit `run.complete`, generate the report, and update the run index:

```bash
# Emit run.complete
./lib/archeflow-event.sh "$RUN_ID" run.complete act "" \
  '{"status":"completed","cycles":<N>,"agents_total":<count>,"fixes_total":<count>,"shadows":0,"artifacts":[<list>]}'

# Generate report
./lib/archeflow-report.sh .archeflow/events/${RUN_ID}.jsonl

# Update run index
echo '{"run_id":"'$RUN_ID'","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","task":"<task>","workflow":"<wf>","status":"completed","cycles":<N>}' \
  >> .archeflow/events/index.jsonl
```

2. Check for regressions: `./lib/archeflow-memory.sh regression-check`
3. Score effectiveness: `./lib/archeflow-score.sh extract .archeflow/events/<run_id>.jsonl`
4. Display the orchestration report to the user (see `archeflow:orchestration` report format).
---
## Fix Tracking
When the Maker addresses review findings in cycle 2+, emit `fix.applied` for each:
```bash
./lib/archeflow-event.sh "$RUN_ID" fix.applied act "" \
'{"source":"<reviewer>","finding":"<description>","file":"<path>","line":<n>}' "$SEQ_OF_REVIEW"
```
Also append to `.archeflow/artifacts/${RUN_ID}/act-fixes.jsonl`:
```jsonl
{"source":"guardian","finding":"SQL injection","file":"src/auth.ts","line":48,"fixed_in_cycle":2}
```
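For inspection, the fixes log can be tallied per reviewer. This assumes `jq` is available; the sample log below is invented.

```bash
# Tally applied fixes per source reviewer from an act-fixes.jsonl-style log.
FIXES=$(mktemp)
cat > "$FIXES" <<'EOF'
{"source":"guardian","finding":"SQL injection","file":"src/auth.ts","line":48,"fixed_in_cycle":2}
{"source":"sage","finding":"missing edge-case test","file":"src/auth.ts","line":12,"fixed_in_cycle":2}
{"source":"guardian","finding":"open redirect","file":"src/nav.ts","line":7,"fixed_in_cycle":3}
EOF

jq -r '.source' "$FIXES" | sort | uniq -c | sort -rn   # guardian: 2, sage: 1
```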
---
## Dry-Run Mode
When `--dry-run` is specified:
1. Run **only the Plan phase** (Explorer + Creator)
2. Display:
```
Dry run for: "<task>"
Workflow: <standard> (<N> cycles max)
Agents per cycle: <count>
Max agents total: <count * cycles>
Plan phase result: see .archeflow/artifacts/<run_id>/plan-creator.md
Creator confidence: <scores>
Estimated phases: Plan (done) -> Do -> Check -> Act
Proceed with full run? [y/n]
```
3. Do NOT emit `run.complete` — the run is paused, not finished
4. If user says yes, continue from `--start-from do` using the saved artifacts
---
## Start-From Mode
When `--start-from <phase>` is specified:
| Start from | Required artifacts in `.archeflow/artifacts/<run_id>/` |
|------------|-------------------------------------------------------|
| `plan` | None (equivalent to full run) |
| `do` | `plan-creator.md` |
| `check` | `plan-creator.md`, `do-maker.md`, `do-maker-files.txt` |
| `act` | All `check-*.md` files |
Validate required artifacts exist. If missing, error:
```
Cannot start from <phase>: missing artifact <name>. Run the prior phase first.
```
When resuming, emit a `run.start` event with `{"resumed_from":"<phase>"}` in data.
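The validation can be sketched as follows. The run directory is a stand-in, and the `act` requirement lists one representative file rather than the full `check-*.md` glob.

```bash
# Illustrative start-from artifact validation, per the table above.
START_FROM=do
DIR=$(mktemp -d)   # stand-in for .archeflow/artifacts/<run_id>

case "$START_FROM" in
  plan)  REQUIRED=() ;;
  do)    REQUIRED=(plan-creator.md) ;;
  check) REQUIRED=(plan-creator.md do-maker.md do-maker-files.txt) ;;
  act)   REQUIRED=(check-guardian.md) ;;   # representative; really all check-*.md
esac

MISSING=0
for f in "${REQUIRED[@]}"; do
  if [[ ! -f "$DIR/$f" ]]; then
    echo "Cannot start from $START_FROM: missing artifact $f. Run the prior phase first."
    MISSING=$((MISSING + 1))
  fi
done
```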
---
## Error Handling
- **Agent fails to return:** Wait up to 5 minutes. If no response, emit `agent.complete` with `"error": true`, log the failure, and abort the run. Do not retry blindly.
- **Event emitter fails:** Log a warning but do not block orchestration. Events are observation, not control flow.
- **Artifact write fails:** This IS blocking. Artifacts are required for phase handoff. Abort and report.
- **Merge conflict:** Do not force-resolve. Report the conflict, leave the branch intact, let the user decide.
---
## Progress Display
Throughout the run, display live progress using the format from `archeflow:using-archeflow`:
```
━━━ ArcheFlow Run: <task> ━━━━━━━━━━━━━━━━━━━
Run ID: <run_id> | Workflow: <standard> | Cycle: 1/<max>
Artifacts: .archeflow/artifacts/<run_id>/
Report: .archeflow/events/<run_id>.jsonl
```
---
## Shadow Monitoring
Quick-check after each agent completes:
| Archetype | Shadow | Trigger |
|-----------|--------|---------|
| Explorer | Rabbit Hole | Output >2000 words without Recommendation section |
| Creator | Over-Architect | >2 new abstractions for one feature |
| Maker | Rogue | No tests in changeset, or files outside proposal |
| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1, or zero approvals |
| Skeptic | Paralytic | >7 challenges, <50% have alternatives |
| Trickster | False Alarm | Findings in untouched code, or >10 findings |
| Sage | Bureaucrat | Review >2x code change length |
On detection: apply correction prompt. On 2nd detection of same shadow: replace agent. On 3+ shadows in one cycle: escalate to user.
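For example, the Guardian Paranoid trigger reduces to an integer check. The tallies are example values that would be parsed from `check-guardian.md`.

```bash
# Illustrative Paranoid detection: CRITICAL:WARNING ratio > 2:1,
# with at least 3 findings total so a single CRITICAL cannot trigger it.
CRITICALS=7; WARNINGS=3

SHADOW=""
if (( CRITICALS + WARNINGS >= 3 && CRITICALS > 2 * WARNINGS )); then
  SHADOW="paranoid"
  echo "shadow.detected: guardian/$SHADOW"   # triggered here, since 7 > 2*3
fi
```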
---
## Event Reference
Emit events via `./lib/archeflow-event.sh <run_id> <type> <phase> <agent> '<json>'`. Events are optional — never let logging block orchestration.
| When | Event Type | Key Data |
|------|-----------|----------|
| Run starts | `run.start` | task, workflow, max_cycles |
| Before agent spawn | `agent.start` | archetype, model, prompt_summary |
| After agent returns | `agent.complete` | archetype, duration_ms, artifacts, summary |
| Phase boundary | `phase.transition` | from, to, artifacts_so_far |
| Alternative chosen | `decision` | what, chosen, alternatives, rationale |
| Orchestrator decision (replay) | `decision.point` | archetype, input, decision, confidence — use `./lib/archeflow-decision.sh` |
| Reviewer verdict | `review.verdict` | archetype, verdict, findings[] |
| Fix addressing review | `fix.applied` | source, finding, file, line |
| End of PDCA cycle | `cycle.boundary` | cycle, max_cycles, exit_condition, convergence |
| Shadow triggered | `shadow.detected` | archetype, shadow, trigger, action |
| Policy halt | `wiggum.break` | trigger, run_state, unresolved_findings, hard/soft |
| Run ends | `run.complete` | status, cycles, agents_total, fixes_total |
Parent rules: `run.start` has `parent: []`. Agents parent to the event that triggered them. Phase transitions fan-in from all completing events. Parallel agents share the same parent.
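A quick way to summarize a run's event log is to group by event type. This assumes `jq` is available; the sample log and its `type` field name are illustrative, not the exact event schema.

```bash
# Count events per type in a JSONL event log.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
{"type":"run.start"}
{"type":"agent.start"}
{"type":"agent.complete"}
{"type":"run.complete"}
EOF

jq -r '.type' "$LOG" | sort | uniq -c
```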
---
## Artifact Naming
All artifacts live in `.archeflow/artifacts/<run_id>/`:
| File | Content |
|------|---------|
| `plan-explorer.md` | Explorer research (missing in fast) |
| `plan-creator.md` | Creator proposal with confidence |
| `plan-mini-explorer.md` | Risk research (only if A3 triggers) |
| `do-maker.md` | Maker implementation summary |
| `do-maker-files.txt` | Changed file paths (one per line) |
| `check-guardian.md` | Guardian verdict + findings |
| `check-skeptic.md` | Skeptic verdict (if spawned) |
| `check-sage.md` | Sage verdict (if spawned) |
| `check-trickster.md` | Trickster verdict (if spawned) |
| `act-feedback.md` | Structured feedback for next cycle |
| `act-fixes.jsonl` | Applied fixes log |
| `cycle-<N>/` | Archived artifacts from cycle N |
Always check artifact existence before injecting. Missing optional artifacts are expected — skip, don't fail.
Git diff is generated on-the-fly (`git diff main...<maker-branch>`), not saved to disk.
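The "skip, don't fail" rule for optional artifacts can be sketched as a guard. The directory here is a stand-in for `.archeflow/artifacts/<run_id>`.

```bash
# Inject an optional artifact only if it exists; absence is expected.
DIR=$(mktemp -d)   # stand-in for .archeflow/artifacts/<run_id>
ART="$DIR/plan-explorer.md"

if [[ -f "$ART" ]]; then
  CONTEXT=$(cat "$ART")
else
  CONTEXT=""   # optional artifact absent (e.g. fast workflow): skip it
fi
echo "explorer context: ${CONTEXT:-<none>}"   # -> explorer context: <none>
```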
---
## Effectiveness Scoring
After each run, `./lib/archeflow-score.sh extract` scores review archetypes on:
| Dimension | Weight |
|-----------|--------|
| Signal-to-noise (useful / total findings) | 0.30 |
| Fix rate (findings that led to fixes) | 0.25 |
| Cost efficiency (useful findings per dollar) | 0.20 |
| Accuracy (not contradicted by others) | 0.15 |
| Cycle impact (led to cycle exit) | 0.10 |
Scores stored in `.archeflow/memory/effectiveness.jsonl`. After 10+ runs, recommend model tier changes and archetype removal.
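The overall score combines the dimensions linearly. The dimension values below are invented; the weights come from the table above.

```bash
# Illustrative effectiveness score: weighted sum of the five dimensions.
SNR=0.6; FIX_RATE=0.5; COST_EFF=0.4; ACCURACY=0.9; CYCLE_IMPACT=1.0

SCORE=$(awk -v s="$SNR" -v f="$FIX_RATE" -v c="$COST_EFF" -v a="$ACCURACY" -v y="$CYCLE_IMPACT" \
  'BEGIN { printf "%.3f", 0.30*s + 0.25*f + 0.20*c + 0.15*a + 0.10*y }')
echo "$SCORE"   # -> 0.620
```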
---
## Run replay (decision log + what-if)
After key choices (routing, fast-path skip, escalation), emit `decision.point` via `./lib/archeflow-decision.sh` so runs can be inspected with `./lib/archeflow-replay.sh timeline|whatif|compare <run_id>`. Weighted what-if helps estimate how much each review archetype swayed the effective ship/block outcome. See skill `af-replay`.
---
## Pipeline Strategy
Linear flow with no cycle-back. Use for bug fixes and well-understood single-concern tasks.
```
Plan (Creator only) -> Implement (Maker) -> Spec-Review (Guardian, then Skeptic if findings) -> Quality-Review (Sage) -> Verify (tests + merge)
```
1. **Plan**: Spawn Creator with Mini-Reflect (no Explorer). Save `plan-creator.md`.
2. **Implement**: Spawn Maker in worktree. Save `do-maker.md`.
3. **Spec-Review**: Guardian first. Skeptic only if Guardian has findings.
4. **Quality-Review**: Sage reviews proposal + diff + summary.
5. **Verify**: Run tests. If pass and 0 CRITICAL: merge. If CRITICAL: one targeted Maker fix, re-review, re-test. If still failing: abort, report branch name for manual resolution.
WARNINGs are logged but do not block merge.
```
━━━ ArcheFlow Pipeline: <task> ━━━━━━━━━━━━━━━━
Run ID: <run_id> | Strategy: pipeline
[Plan] Creator designing... -> done (20s)
[Implement] Maker building... -> done (60s, 3 files)
[Spec] Guardian reviewing... -> APPROVED
[Quality] Sage reviewing... -> APPROVED (1 WARNING)
[Verify] Tests passing... -> merged to main
━━━ Complete: 2m 15s ━━━━━━━━━━━━━━━━━━━━━━━━━━
```

---
name: shadow-detection
description: |
  Corrective action framework for agent dysfunction, system health, and operational policy.
  Three layers — archetype shadows, system shadows, policy boundaries — one escalation protocol.
# Corrective Action Framework
Detect dysfunction. Apply corrective action. Escalate if repeated.

Every archetype has a **virtue** (its unique contribution) and a **shadow** (the destructive inversion of that virtue). A shadow activates when the virtue is pushed too far.
```
Virtue (healthy) → pushed too far → Shadow (dysfunction)
Contextual Clarity → can't stop → Rabbit Hole
Decisive Framing → over-builds → Over-Architect
Execution Discipline → no guardrails → Rogue
Threat Intuition → sees threats only → Paranoid
Assumption Surfacing → questions only → Paralytic
Adversarial Creativity → noise over signal → False Alarm
Maintainability Judgment → reviews only → Bureaucrat
```
Three layers, one protocol:
- **Archetype Shadows** — individual agent dysfunction (virtue pushed too far)
- **System Shadows** — orchestration-level dysfunction (process going wrong)
- **Policy Boundaries** — operational limits (time, cost, quality thresholds)
---
## Archetype Shadows
| Archetype | Shadow | Detect (any) | Corrective Action |
|-----------|--------|-------------|-------------------|
| Explorer | Rabbit Hole | Output >2000w without Recommendation; >3 tangents; >15 files no patterns; no synthesis in final 25% | "Summarize top 3 findings and one recommendation in 300 words." |
| Creator | Over-Architect | >2 new abstractions for one feature; "future-proof" in rationale; scope exceeds task >50%; >1 new package | "Design for the current order of magnitude. Remove abstractions for hypothetical requirements." |
| Maker | Rogue | Zero test files with >=3 files changed; single monolithic commit; files outside proposal; no test run evidence | "Read the proposal. Write a test. Commit. Revert out-of-scope files." |
| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1 (min 3); zero APPROVED in 3+ reviews; <50% findings include fix; findings require compromised systems | "For each CRITICAL: would a senior engineer block a PR? If not, downgrade. Every rejection needs a specific fix." |
| Skeptic | Paralytic | >7 challenges; <50% include alternatives; same concern 2+ times reworded; >3 findings outside scope | "Rank by impact. Keep top 3 with alternatives. Delete the rest." |
| Trickster | False Alarm | Findings in untouched code; >10 findings for <5 files; impossible scenarios; >3 without repro steps | "Delete findings outside the diff. Rank by likelihood x impact. Keep top 3-5." |
| Sage | Bureaucrat | Review words >2x diff lines; findings outside changeset; >2 "consider" without action; suggesting docs for trivial functions | "Limit to issues affecting maintainability in 6 months. Every finding needs a specific action." |
**Detection Checklist** (trigger on ANY):
- [ ] Output >2000 words without a `### Recommendation` section
- [ ] >3 tangent topics not directly related to the original task
- [ ] >15 files read with no `### Patterns` identified
- [ ] No synthesis language (recommend, suggest, conclusion, finding, summary) in final 25% of output
### Shadow Immunity
**Correction:**
"Summarize your top 3 findings and one recommendation in under 300 words. If your output has no Recommendation section, add one. A dump is not research."
Intensity alone is not a shadow. **Shadow = behavior disconnected from the goal.**
- Explorer reading 20 files in a monorepo with scattered deps -- not rabbit hole if each is relevant
- Guardian blocking with 2 CRITICALs -- not paranoid if both are genuine vulnerabilities
- Trickster finding 5 edge cases -- not false alarm if all are in changed code with repro steps
---
## Creator → Over-Architect
**Virtue inverted:** Decisive Framing becomes designing at the wrong scale.
## System Shadows
**Symptoms:**
- Abstraction layers for one-time operations
- Future-proofing for requirements that don't exist
- Configuration systems for things that could be constants
- Proposal has more infrastructure than business logic
Orchestration-level dysfunction that isn't tied to one archetype.
**Detection Checklist** (trigger on ANY):
- [ ] >2 new abstractions (interfaces, base classes, factories, registries) for a single feature
- [ ] "In the future we might need..." or "future-proof" appears in rationale
- [ ] Proposal scope (files changed) exceeds original task scope by >50%
- [ ] More than 1 new package/module introduced for a single feature
**Correction:**
"Design for the current order of magnitude. If the app has 1000 users, design for 10,000 — not 10 million. Remove abstractions that serve hypothetical requirements."
| Shadow | Detect | Corrective Action |
|--------|--------|-------------------|
| **Tunnel Vision** | All reviewers flag same category (e.g., 4 security findings, 0 quality/testing) | "Redistribute attention. Are we missing quality, testing, or design concerns?" |
| **Echo Chamber** | Unanimous approval in <30s on standard/thorough workflow | "Suspicious fast consensus. Re-run Guardian with adversarial prompt." |
| **Gold Plating** | Maker working on INFO fixes while CRITICALs remain open | "Fix CRITICALs first. Park INFO items." |
| **Analysis Paralysis** | Plan phase >2x longer than Do phase; Explorer spawned 3+ times | "Stop researching. Ship a proposal with known gaps." |
| **Cargo Cult** | Memory lesson injected but the same finding repeats anyway | "Lesson ineffective. Reword, strengthen, or remove it." |
| **Broken Window** | 3+ WARNINGs deferred across consecutive runs in the same project | "Accumulated tech debt. Schedule a cleanup sprint." |
| **Scope Creep** | Maker changes >2x files listed in proposal | "Revert to proposal scope. If more files needed, update the proposal first." |
---
## Maker → Rogue
**Virtue inverted:** Execution Discipline becomes reckless shipping — or expanding beyond the plan.
## Policy Boundaries
**Symptoms:**
- Writing code before reading the proposal fully
- No tests, or tests written after implementation
- Large uncommitted working tree
- Files changed that aren't mentioned in the proposal
Operational limits that protect session quality, cost, and resumability.
**Detection Checklist** (trigger on ANY):
- [ ] Zero test files (`.test.`, `.spec.`, `_test.`) in the changeset with >=3 files changed
- [ ] Single monolithic commit instead of incremental commits
- [ ] Diff contains files not listed in the Creator's proposal `### Changes` section
- [ ] No evidence of running existing test suite before finishing
### Checkpoint Policy
**Correction:**
"Read the proposal. Write a test. Commit what you have. Revert changes to files not in the proposal. Then continue."
Every **45 minutes** or **3 completed tasks** (whichever first):
1. Commit + push all work in progress
2. Write handoff summary to `control-center.md`
3. Log token spend so far
4. Compare output quality: last task vs first task
5. If quality degrading -> STOP with clean state
6. If budget >80% spent -> STOP with clean state
7. Otherwise -> continue
### Budget Gate
| Threshold | Action |
|-----------|--------|
| 50% budget spent | Log warning, continue |
| 80% budget spent | Downgrade models (sonnet->haiku for reviewers) |
| 95% budget spent | Complete current task, then STOP |
| 100% budget | STOP immediately, commit WIP |
### Wiggum Break (Circuit Breaker)
Named after Chief Wiggum — policy enforcement AND the Ralph Loop's dad.
When a Wiggum Break triggers, the system halts execution, saves state, and
reports to the user. "Bake 'em away, toys."
**Hard breaks** (halt immediately, commit WIP):
| Trigger | Reason |
|---------|--------|
| 3 consecutive agent failures/timeouts | Infrastructure issue, not a code problem |
| 3 consecutive task failures in sprint | Something systemic is wrong |
| Same shadow detected 3+ times in one cycle | Task needs to be broken down or re-scoped |
| Test suite broken after merge | Auto-revert, then halt |
| 2+ oscillating findings (present→absent→present) | Fundamental tension in review criteria |
**Soft breaks** (finish current task, then halt):
| Signal | Reason |
|--------|--------|
| Cycle N findings identical to cycle N-1 | No progress — present best result |
| Convergence score <0.5 for 2 consecutive cycles | "This needs a different approach" |
| Reviewer finding count increases cycle over cycle | Implementation is diverging, not converging |
When a Wiggum Break fires, emit a `wiggum.break` event with trigger, run state, and unresolved findings.
The event log makes it easy to audit why a run was halted and whether the break was warranted.
### Context Pollution
| Signal | Action |
|--------|--------|
| >15 memory lessons injected into one prompt | Prune to top 5 by frequency |
| >20 findings tracked across cycles | Summarize into top 5 themes |
| Agent prompt exceeds estimated 50% of context window | Strip examples, keep rules only |
---
## Guardian → Paranoid
**Virtue inverted:** Threat Intuition becomes blocking everything — without offering a path forward.
## Unified Escalation Protocol
**Symptoms:**
- Every finding marked CRITICAL
- Blocking on theoretical risks with < 1% probability
- Rejecting without suggesting how to fix
- Security concerns for internal-only code at external-API severity
All three layers use the same escalation:
**Detection Checklist** (trigger on ANY):
- [ ] CRITICAL:WARNING ratio >2:1 (with minimum 3 total findings)
- [ ] Zero APPROVED verdicts in 3+ consecutive reviews
- [ ] <50% of findings include a suggested fix in the `Fix` column
- [ ] Findings reference attack scenarios that require already-compromised internal systems
**Correction:**
"For each CRITICAL finding, answer: Would a senior engineer block a PR for this? If not, downgrade. Every rejection must include a specific, implementable fix."
| Step | Archetype Shadows | System Shadows | Policy Boundaries |
|------|-------------------|----------------|-------------------|
| **1st** | Apply corrective action, let agent continue | Apply corrective action, continue run | Apply boundary action (downgrade, checkpoint) |
| **2nd** (same issue) | Replace the agent -- shadow is entrenched | Pause run, report to user | Force stop with clean state |
| **3rd** (pattern) | Escalate to user: "task needs re-scoping" | Escalate to user: "systemic issue" | Escalate to user: "resource limits reached" |
---
## Skeptic → Paralytic
**Virtue inverted:** Assumption Surfacing becomes inability to approve anything — drowning signal in tangential concerns.
## Integration
**Symptoms:**
- More than 7 challenges raised
- Challenges without suggested alternatives
- "What about X?" chains that drift from the task
- Restating the same concern in different words
Shadow checks run **after each agent completes** during orchestration. System shadow checks run **at phase boundaries**. Policy checks run **on a timer and at task boundaries**.
**Detection Checklist** (trigger on ANY):
- [ ] >7 findings/challenges raised in a single review
- [ ] <50% of findings include an alternative in the `Fix` column
- [ ] Same conceptual concern appears 2+ times with different wording
- [ ] >3 findings reference code or scenarios outside the task scope
**Correction:**
"Rank your challenges by impact. Keep the top 3. Each must include a specific alternative. Delete the rest."
---
## Trickster → False Alarm
**Virtue inverted:** Adversarial Creativity becomes noise — too many low-signal findings drowning the real issues.
**Symptoms:**
- Testing code that wasn't changed
- Reporting non-bugs as bugs (unrealistic test scenarios)
- 20 findings when 3 good ones would cover the real risks
- Edge cases for edge cases (diminishing returns)
**Detection Checklist** (trigger on ANY):
- [ ] Any finding references code untouched by the Maker's diff
- [ ] >10 findings for a change touching <5 files
- [ ] Findings describe scenarios requiring conditions that can't occur in the deployment context
- [ ] >3 findings without reproduction steps
**Correction:**
"Quality over quantity. Delete findings outside the Maker's diff. Rank remaining by likelihood x impact. Keep top 3-5. Three real findings beat twenty noise."
---
## Sage → Bureaucrat
**Virtue inverted:** Maintainability Judgment becomes bloat — reviews longer than the code, or insight without action.
**Symptoms:**
- Review longer than the code change itself
- Requesting documentation for self-evident code
- Suggesting refactors unrelated to the current task
- Deep-sounding analysis that doesn't end with a specific action
**Detection Checklist** (trigger on ANY):
- [ ] Review word count >2x the code change's line count (rough: review words > diff lines x 2)
- [ ] Any finding references files not in the Maker's changeset
- [ ] >2 findings use "consider" or "think about" without a concrete action in the `Fix` column
- [ ] Suggesting documentation for functions with <5 lines or self-descriptive names
**Correction:**
"Limit your review to issues that affect maintainability in the next 6 months. Every finding must end with a specific action. If you can't state the consequence of NOT fixing it, don't raise it."
---
## Shadow Escalation Protocol
1. **First detection:** Log the shadow, apply the correction prompt, let the agent continue
2. **Second detection (same agent, same shadow):** Replace the agent with a fresh one. The shadow is entrenched.
3. **Shadow detected in 3+ agents in the same cycle:** The task itself may be poorly scoped. Escalate to the user: "Multiple agents are struggling — the task may need to be broken down."
## Shadow Immunity
Some behaviors LOOK like shadows but aren't:
- Explorer reading 20 files in a monorepo with scattered dependencies → **not a rabbit hole** if each file is genuinely relevant
- Creator adding an abstraction → **not over-architect** if the abstraction is genuinely needed by the current task
- Guardian blocking with 2 CRITICAL findings → **not paranoid** if both are genuine security vulnerabilities
- Trickster finding 5 edge cases → **not false alarm** if all are in the changed code with reproduction steps
- Sage writing a long review → **not bureaucrat** if the change is large and every finding is actionable
**Rule of thumb:** Shadow = behavior disconnected from the goal. Intensity alone is not a shadow.
The `run` skill references this framework at:
- Step 3 (Check phase): archetype shadow monitoring
- Step 4 (Act phase): convergence/diminishing returns
- Step 5 (Completion): effectiveness scoring
- Sprint skill: checkpoint policy between batches

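The unified escalation ladder and the Wiggum Break counters described above can be sketched as a small state tracker. This is a minimal illustrative sketch, not ArcheFlow's implementation; the names `ShadowTracker`, `record`, and `wiggum_break` are hypothetical:

```python
# Hypothetical sketch of the unified escalation protocol:
# 1st detection -> corrective action, 2nd -> replace/pause, 3rd+ -> escalate to user.
from collections import Counter

ACTIONS = ["correct", "replace_or_pause", "escalate_to_user"]

class ShadowTracker:
    def __init__(self):
        self.detections = Counter()  # (agent, shadow) -> detection count

    def record(self, agent: str, shadow: str) -> str:
        """Record one detection and return the prescribed escalation step."""
        self.detections[(agent, shadow)] += 1
        count = self.detections[(agent, shadow)]
        return ACTIONS[min(count, 3) - 1]  # 3rd and later detections all escalate

    def wiggum_break(self, consecutive_failures: int, same_shadow_in_cycle: int) -> bool:
        """Hard break: 3 consecutive failures, or the same shadow 3+ times in one cycle."""
        return consecutive_failures >= 3 or same_shadow_in_cycle >= 3

tracker = ShadowTracker()
print(tracker.record("explorer", "rabbit_hole"))  # correct
print(tracker.record("explorer", "rabbit_hole"))  # replace_or_pause
print(tracker.record("explorer", "rabbit_hole"))  # escalate_to_user
```

The counter is keyed on `(agent, shadow)` so a different shadow on the same agent starts a fresh ladder, matching "2nd detection (same agent, same shadow)".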
skills/sprint/SKILL.md (new file, 164 lines)
---
name: sprint
description: |
Workspace sprint runner. Reads queue.json, spawns parallel agent teams across projects,
manages lifecycle (commit, push, next task), tracks progress. The main operational mode
for ArcheFlow in multi-project workspaces.
<example>User: "af-sprint"</example>
<example>User: "Run the sprint"</example>
<example>User: "af-sprint --slots 5 --dry-run"</example>
---
# Workspace Sprint Runner
Read the task queue, spawn parallel agents across projects, collect results, commit+push,
spawn next batch. Repeat until the queue is drained or budget is exhausted.
## When to Use
This is the **primary operational mode** for ArcheFlow in multi-project workspaces.
Use it when the user says "run the sprint", "work the queue", "go autonomous", or
invokes `af-sprint`.
Do NOT use `archeflow:run` for individual tasks within a sprint -- the sprint runner
handles task dispatch internally, using `archeflow:run` only when a task warrants
full PDCA orchestration.
## Invocation
```
af-sprint # Run sprint with defaults (4 slots, AUTONOM mode)
af-sprint --slots 5 # Max 5 parallel agents
af-sprint --dry-run # Show what would run, don't execute
af-sprint --priority P0,P1 # Only process P0 and P1 items
af-sprint --project writing.colette # Only process items for this project
```
---
## Execution Protocol
### Step 0: Orient
Load queue from `docs/orchestra/queue.json`. Check mode (`AUTONOM` / `ATTENDED` / `PAUSED`).
Show one-line status: `sprint: AUTONOM | 7 pending (1xP0, 1xP2, 5xP3) | 4 slots`
- `AUTONOM` -- proceed without asking
- `ATTENDED` -- show plan, wait for user approval before each batch
- `PAUSED` -- report status only, do not start tasks
### Step 1: Select Batch
Pick tasks for the next batch. Rules:
1. **Priority cascade**: P0 first, then P1, then P2. Never start P3 unless user explicitly includes it.
2. **Dependency check**: Skip tasks whose `depends_on` items aren't all `completed`.
3. **One agent per project**: Never run two tasks on the same project simultaneously.
4. **Cost-aware concurrency**: L/XL tasks (expensive) max 2 concurrent. Fill remaining slots with S/M tasks. Target mix: 1-2 expensive + 2-3 cheap.
5. **Slot limit**: Never exceed `--slots` (default 4).
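The five selection rules above can be sketched as a single pass over the queue. This is an illustrative sketch only; the `Task` fields mirror `queue.json` but the function name and shape are hypothetical:

```python
# Hypothetical sketch of Step 1: priority cascade, dependency check,
# one agent per project, cost-aware concurrency, slot limit.
from dataclasses import dataclass, field

@dataclass
class Task:
    id: str
    project: str
    priority: str                 # "P0".."P3"
    estimate: str                 # "S", "M", "L", "XL"
    status: str = "pending"
    depends_on: list = field(default_factory=list)

def select_batch(tasks, slots=4, include_p3=False):
    done = {t.id for t in tasks if t.status == "completed"}
    batch, projects, expensive = [], set(), 0
    order = ["P0", "P1", "P2"] + (["P3"] if include_p3 else [])
    for prio in order:                                        # 1. priority cascade
        for t in tasks:
            if t.status != "pending" or t.priority != prio:
                continue
            if not all(d in done for d in t.depends_on):      # 2. dependency check
                continue
            if t.project in projects:                         # 3. one agent per project
                continue
            if t.estimate in ("L", "XL") and expensive >= 2:  # 4. max 2 expensive tasks
                continue
            if len(batch) >= slots:                           # 5. slot limit
                return batch
            batch.append(t)
            projects.add(t.project)
            expensive += t.estimate in ("L", "XL")
    return batch
```

Note that the slot limit is checked last, so cheap S/M tasks can still fill remaining slots after the expensive-task cap is hit.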
### Step 2: Assess and Dispatch
For each task in the batch, decide the execution strategy:
| Signal | Strategy |
|--------|----------|
| Estimate S, clear scope | **Direct** -- Agent with task description, no orchestration |
| Estimate M, multi-file | **Direct+** -- Agent with "read code first, run tests after" |
| Estimate L/XL, code | **Feature-dev** -- Agent explores, plans, implements, tests, self-reviews, commits |
| Estimate L/XL, writing | **PDCA** -- Use af-run with writing domain archetypes |
| validate/test/lint/check tasks | **Direct** -- cheap analytical, no orchestration |
| review/audit/security tasks | **Review** -- spawn Guardian + relevant reviewers only |
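The strategy table above is a pure lookup on estimate and task kind. A minimal sketch, with a hypothetical `choose_strategy` name and task-kind strings assumed for illustration:

```python
# Hypothetical dispatch-strategy chooser mirroring the Step 2 table.
def choose_strategy(estimate: str, kind: str) -> str:
    if kind in ("validate", "test", "lint", "check"):
        return "direct"        # cheap analytical, no orchestration
    if kind in ("review", "audit", "security"):
        return "review"        # Guardian + relevant reviewers only
    if estimate in ("L", "XL"):
        return "pdca" if kind == "writing" else "feature-dev"
    return "direct+" if estimate == "M" else "direct"
```

Task-kind checks come first so that a large lint or audit task never escalates into full orchestration.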
### L/XL Code Task Template
Give the agent a structured process:
```
Agent(prompt: "You are working on <project> at <path>. Task: <description>
1. EXPLORE: Read CLAUDE.md, docs/status.md, relevant source files.
2. PLAN: Identify files to change, write brief plan (what, where, why).
3. IMPLEMENT: Follow existing code patterns strictly.
4. TEST: Run project test suite, fix failures.
5. SELF-REVIEW: Re-read diff -- error handling, protocol compliance, test coverage.
6. COMMIT + PUSH: Conventional commits, signed, pushed.
STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED")
```
### Agent Spawn Template
Spawn ALL batch agents in a **single message** (parallel execution). Each agent gets:
```
Agent(
description: "<project>: <task-short>",
prompt: "You are working on <project> at <path>. Task: <description>
Rules:
- Read the project's CLAUDE.md first
- Commit: git -c user.signingkey=/home/c/.ssh/id_ed25519_dev.pub commit
- NO Co-Authored-By trailers, conventional commits
- Push: GIT_SSH_COMMAND='ssh -i /home/c/.ssh/id_ed25519_dev -o IdentitiesOnly=yes' git push origin main
- Run tests if the project has them
- Report: what you did, what changed, any blockers
STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED",
isolation: "worktree" # Only for L/XL tasks; S/M run directly
)
```
### Step 3: Mark Running
Update the queue after spawning:
```bash
./scripts/ws start <task-id> # or update queue.json status to "running" directly
```
### Step 4: Collect Results
Parse status token from agent output. Based on status:
- `DONE` -- mark completed, note result
- `DONE_WITH_CONCERNS` -- mark completed, log concerns for user review
- `NEEDS_CONTEXT` -- mark pending, add concern to notes, skip for now
- `BLOCKED` -- mark failed, add blocker to notes
Update: `./scripts/ws done <task-id> -r "<summary>"` or `./scripts/ws fail <task-id> -r "<reason>"`
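The status-token mapping above can be sketched as a regex parse plus a transition table. A hedged sketch, assuming the token appears verbatim in agent output; `parse_status` is a hypothetical name:

```python
# Hypothetical parser for the STATUS token in agent output (Step 4).
import re

TRANSITIONS = {
    "DONE": "completed",
    "DONE_WITH_CONCERNS": "completed",  # plus: log concerns for user review
    "NEEDS_CONTEXT": "pending",         # plus: add concern to notes, skip for now
    "BLOCKED": "failed",                # plus: add blocker to notes
}

def parse_status(agent_output: str) -> str:
    # DONE_WITH_CONCERNS must precede DONE so the longer token wins.
    m = re.search(r"STATUS:\s*(DONE_WITH_CONCERNS|DONE|NEEDS_CONTEXT|BLOCKED)", agent_output)
    return TRANSITIONS[m.group(1)] if m else "failed"  # unparseable output counts as failed
```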
### Step 5: Report and Loop
Show batch status, then **immediately select next batch** (no user prompt in AUTONOM mode):
```
-- Sprint Batch 1 --------------------------------------------------
+ writing.colette fanout run done (45s)
+ book.3sets validation done (30s)
! book.sos meta-book concept needs_context
+ tool.archeflow af-review mode done (60s)
Queue: 3 completed, 1 blocked, 3 remaining
--------------------------------------------------------------------
```
### Step 6: Sprint Complete
When no more tasks are schedulable:
1. Update `docs/control-center.md` Handoff section
2. Run `./scripts/ws log --summary "<sprint summary>"`
3. Show final report with duration, tasks completed/blocked/remaining, projects touched, commits
---
## Mode Behavior
| Mode | Dispatch | Between batches | Stops for |
|------|----------|----------------|-----------|
| **AUTONOM** | Immediate | One-line status, no pause | BLOCKED or budget exhaustion |
| **ATTENDED** | Show batch, wait for approval | Show results, ask "Continue? [y/n/edit]" | User decision |
| **PAUSED** | No dispatch | -- | Always (status display only) |
## Error Recovery
- **Agent crash**: Mark `failed`, continue with next batch
- **Git push fails**: Log error, do NOT retry -- user handles conflicts
- **Queue corrupted**: Run `./scripts/ws validate`, stop if invalid
- **Budget exceeded**: Stop sprint, report remaining tasks and estimated cost
- **All blocked**: Report dependency graph, suggest which blockers to resolve first
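The recovery rules above amount to an error-to-action table. A minimal sketch with hypothetical error keys and action names, assumed for illustration:

```python
# Hypothetical error-recovery routing for the sprint loop.
RECOVERY = {
    "agent_crash": ("mark_failed", "continue"),
    "push_failed": ("log_error", "continue"),            # never auto-retry pushes
    "queue_corrupted": ("validate_queue", "stop_if_invalid"),
    "budget_exceeded": ("report_remaining", "stop"),
    "all_blocked": ("report_dependency_graph", "stop"),
}

def recover(error: str):
    # Unknown errors default to the safest path: escalate and stop.
    return RECOVERY.get(error, ("escalate_to_user", "stop"))
```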

<example>User: "archeflow init writing-short-story"</example>
<example>User: "archeflow template save my-backend-setup"</example>
<example>User: "archeflow template list"</example>
<example>User: "archeflow init --from ../book.giesing-gschichten"</example>
---
# Template Gallery -- Shareable ArcheFlow Configurations

Workflows, team presets, custom archetypes, and domain configs should be reusable across projects. This skill defines the template system that makes ArcheFlow setups portable and shareable.

## Template Storage

Templates live in two locations, with project-local overriding global:

| Location | Scope | Precedence |
|----------|-------|------------|
| `.archeflow/templates/` | Project-local | Higher (checked first) |
| `~/.archeflow/templates/` | Global (user-wide) | Lower (fallback) |

### Directory Structure

Subdirectories: `workflows/`, `teams/`, `archetypes/`, `domains/`, `bundles/`.

```
~/.archeflow/templates/
├── workflows/
│   ├── kurzgeschichte.yaml
│   ├── feature-implementation.yaml
│   └── security-review.yaml
├── teams/
│   ├── story-development.yaml
│   ├── backend.yaml
│   └── fullstack.yaml
├── archetypes/
│   ├── story-explorer.md
│   ├── story-sage.md
│   └── db-specialist.md
├── domains/
│   ├── writing.yaml
│   ├── code.yaml
│   └── research.yaml
└── bundles/
    ├── writing-short-story/
    │   ├── manifest.yaml
    │   ├── team.yaml
    │   ├── workflow.yaml
    │   ├── archetypes/
    │   │   ├── story-explorer.md
    │   │   └── story-sage.md
    │   └── domain.yaml
    └── backend-feature/
        ├── manifest.yaml
        ├── team.yaml
        ├── workflow.yaml
        └── domain.yaml
```

## Bundles

Individual templates (workflows/, teams/, archetypes/, domains/) are single files that can be used standalone. A bundle is a complete setup (team + workflow + archetypes + domain) in one directory: everything a project needs.

---

## Bundle Manifest

Every bundle has a `manifest.yaml` that declares what it contains, what it requires, and what variables it exposes:

```yaml
name: writing-short-story
description: "Complete setup for short fiction writing with ArcheFlow"
version: 1
domain: writing
includes:
  team: story-development.yaml
  workflow: kurzgeschichte.yaml
  archetypes:
    - story-explorer.md
    - story-sage.md
  domain: writing.yaml
requires:
  - colette.yaml        # Project must have this file
variables:
  target_words: 6000    # Default, can be overridden at init time
  max_cycles: 2
```

### Manifest Fields

| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Bundle identifier (used in `archeflow init <name>`) |
| `description` | Yes | Human-readable description |
| `version` | No | Bundle version (integer, default 1) |
| `domain` | No | Domain this bundle is designed for |
| `includes` | Yes | Map of file types to filenames within the bundle |
| `requires` | No | List of files that must exist in the target project |
| `variables` | No | Key-value pairs with defaults, overridable at init |

### Includes Types

| Key | Target location in `.archeflow/` | Accepts |
|-----|----------------------------------|---------|
| `team` | `teams/<filename>` | Single YAML file |
| `workflow` | `workflows/<filename>` | Single YAML file |
| `archetypes` | `archetypes/<filename>` | List of Markdown files |
| `domain` | `domains/<filename>` | Single YAML file |
| `hooks` | `hooks.yaml` | Single YAML file |

---

## Operations

### `archeflow init <bundle-name>`

Initialize a project's `.archeflow/` directory from a named bundle.

**Procedure:**

1. Search for the bundle:
   - `.archeflow/templates/bundles/<name>/manifest.yaml` (project-local)
   - `~/.archeflow/templates/bundles/<name>/manifest.yaml` (global)
   - If not found: error with a list of available bundles
2. Read `manifest.yaml`
3. Check `requires`:
   - For each required file, verify it exists in the project root
   - If missing: error with `"Required file not found: <file>. This bundle requires it."`
4. Check for an existing `.archeflow/` setup:
   - If `.archeflow/teams/`, `.archeflow/workflows/`, etc. already contain files: warn and ask before overwriting
   - Never silently overwrite existing configuration
5. Copy files from the bundle to `.archeflow/`:
   - `team` → `.archeflow/teams/<filename>`
   - `workflow` → `.archeflow/workflows/<filename>`
   - `archetypes` → `.archeflow/archetypes/<filename>` (each file)
   - `domain` → `.archeflow/domains/<filename>`
   - `hooks` → `.archeflow/hooks.yaml`
6. Create `.archeflow/config.yaml` with variables from the manifest:
   ```yaml
   # Generated by archeflow init from bundle: <name>
   bundle: <name>
   bundle_version: <version>
   initialized: <timestamp>
   requires: [colette.yaml]
   variables:
     target_words: 6000
     max_cycles: 2
   ```
7. Print a setup summary:
   ```
   ArcheFlow initialized from bundle: <name>
     Team:       <team filename> → .archeflow/teams/
     Workflow:   <workflow filename> → .archeflow/workflows/
     Archetypes: <count> files → .archeflow/archetypes/
     Domain:     <domain filename> → .archeflow/domains/
     Config:     .archeflow/config.yaml (variables: target_words=6000, max_cycles=2)
   Ready to run: archeflow:run
   ```

### `archeflow init --from <project-path>`

Clone another project's ArcheFlow setup into the current project.

**Procedure:**

1. Verify `<project-path>/.archeflow/` exists
2. Copy these subdirectories and files (if they exist):
   - `teams/`
   - `workflows/`
   - `archetypes/`
   - `domains/`
   - `config.yaml`
   - `hooks.yaml`
3. Do NOT copy (run-specific data):
   - `events/`
   - `artifacts/`
   - `context/` (generated by colette-bridge, project-specific)
   - `templates/` (project-local templates stay local)
4. Warn if the target `.archeflow/` already has files
5. Print a summary of what was copied

### `archeflow template save <name>`

Save the current project's `.archeflow/` setup as a reusable template bundle.

**Procedure:**

1. Verify `.archeflow/` exists and has content
2. Create the bundle directory: `~/.archeflow/templates/bundles/<name>/`
   - If it already exists: warn and ask before overwriting
3. Copy from `.archeflow/` to the bundle:
   - `teams/*.yaml` → bundle `team` (first file, or prompt if multiple)
   - `workflows/*.yaml` → bundle `workflow` (first file, or prompt if multiple)
   - `archetypes/*.md` → bundle `archetypes/`
   - `domains/*.yaml` → bundle `domain` (first file, or prompt if multiple)
   - `hooks.yaml` → bundle (if it exists)
4. Generate `manifest.yaml`:
   ```yaml
   name: <name>
   description: "Saved from <project directory name>"
   version: 1
   domain: <from domain yaml if present>
   includes:
     team: <filename>
     workflow: <filename>
     archetypes: [<filenames>]
     domain: <filename>
   requires: []
   variables: <from config.yaml variables section if present>
   ```
5. Print a summary:
   ```
   Template saved: <name>
   Location: ~/.archeflow/templates/bundles/<name>/
   Files: <count> files
   Use with: archeflow init <name>
   ```

### `archeflow template list`

List all available templates — both individual files and bundles, from both global and project-local locations.

**Output format:**

```
ArcheFlow Templates
====================
Bundles:
  writing-short-story   Complete setup for short fiction writing   [global]
  backend-feature       Backend feature implementation             [global]
  my-project-setup      Saved from book.giesing-gschichten         [global]
Individual Templates:
  Workflows:
    kurzgeschichte.yaml            [global]
    feature-implementation.yaml    [global]
  Teams:
    story-development.yaml         [global]
    backend.yaml                   [global]
  Archetypes:
    story-explorer.md              [global]
    story-sage.md                  [global]
  Domains:
    writing.yaml                   [global]
    code.yaml                      [global]
```

### `archeflow template share <name> <path>`

Export a template bundle to a directory for sharing (e.g., via git, email, file share).

**Procedure:**

1. Find the bundle (global or local)
2. Copy the entire bundle directory to `<path>/<name>/`
3. Print the path and a one-liner for importing:
   ```
   Exported: <path>/<name>/
   To import: cp -r <path>/<name> ~/.archeflow/templates/bundles/
   ```

---

## Variable Substitution

Bundle manifests can define variables with defaults. These are stored in `.archeflow/config.yaml` after init and can be overridden:

- At init time: `archeflow init writing-short-story --set target_words=8000`
- After init: edit `.archeflow/config.yaml` directly

Variables are available to workflows and the run skill via config:

```yaml
# In a workflow, reference variables:
phases:
  do:
    description: |
      Draft the story. Target: ${target_words} words.
```

Variable substitution happens at run time, not at init time. The workflow file contains the `${variable}` placeholder; the run skill reads `.archeflow/config.yaml` and substitutes before passing to agents.

---

## Individual Template Usage

Not everything needs a bundle. Individual templates can be copied directly:

```bash
# Copy a single workflow
cp ~/.archeflow/templates/workflows/kurzgeschichte.yaml .archeflow/workflows/
# Copy a single archetype
cp ~/.archeflow/templates/archetypes/story-explorer.md .archeflow/archetypes/
# Copy a team preset
cp ~/.archeflow/templates/teams/story-development.yaml .archeflow/teams/
```

The `archeflow init` command handles bundles. For individual files, use a manual copy or the helper script (`lib/archeflow-init.sh`).

---

## Integration with Other Skills

- **`archeflow:run`** — Reads `.archeflow/config.yaml` for variables, applies them during run initialization
- **`archeflow:domains`** — Domain YAML from templates is loaded like any other domain config
- **`archeflow:custom-archetypes`** — Archetype .md files from templates work identically to hand-written ones
- **`archeflow:workflow-design`** — Workflow YAML from templates follows the same schema
- **`archeflow:colette-bridge`** — Bundle `requires: [colette.yaml]` ensures the bridge has what it needs

---

## Design Principles

1. **Bundles are self-contained.** Everything needed to set up a project is in the bundle directory. No external dependencies beyond `requires`.
2. **Never silently overwrite.** Init warns before replacing existing files. Templates are helpers, not bulldozers.
3. **Global + local layering.** Project-local templates override global ones. This allows per-project customization without polluting the global registry.
4. **Skip run data.** Events, artifacts, and context are run-specific. Templates carry only configuration.
5. **Variables are late-bound.** Substitution happens at run time, not template time. This keeps templates generic.
6. **Plain files, no magic.** Templates are just directories of YAML and Markdown files. No databases, no registries, no lock files.

# ArcheFlow -- Active
Multi-agent orchestration using archetypal roles and PDCA quality cycles.
## Session Start
On activation, print ONE line then proceed silently:
```
archeflow v0.8.0 · 19 skills · <domain> domain
```
Domain: `writing` if `colette.yaml` exists, `research` if paper/thesis files, `code` otherwise.
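The detection rule above can be sketched as a small filesystem check. This is a hedged sketch: the function name `detect_domain` is hypothetical, and "paper/thesis files" is interpreted loosely here as filenames containing those words:

```python
# Hypothetical sketch of the domain auto-detection rule:
# writing if colette.yaml exists, research if paper/thesis files, code otherwise.
from pathlib import Path

def detect_domain(project_root: str) -> str:
    root = Path(project_root)
    if (root / "colette.yaml").exists():
        return "writing"
    if any(root.glob("*paper*")) or any(root.glob("*thesis*")):
        return "research"
    return "code"
```

Note the order matters: a research project that also has a `colette.yaml` is treated as `writing`, matching the rule's precedence.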
## When to Use
| Need | Command |
|------|---------|
| Work the queue | `/af-sprint` |
| Deep orchestration | `/af-run <task>` |
| Code review | `/af-review` |
| Simple fix / question | Skip ArcheFlow — just do it directly |
Do NOT use ArcheFlow for: single-line fixes, questions, reading code, config tweaks, git ops.


@@ -5,169 +5,52 @@ description: Use at session start when implementing features, reviewing code, de
# ArcheFlow -- Active
Multi-agent orchestration using archetypal roles and PDCA quality cycles.
## Session Start
On activation, print ONE line then proceed silently:
```
archeflow v0.9.0 · 24 skills · <domain> domain
```
Domain auto-detected: `writing` if `colette.yaml` exists, `research` if paper/thesis files, `code` otherwise.
During runs, follow the `archeflow:presence` skill for output format: show outcomes, not mechanics; one line per phase; value at the end.
## When to Use What
## IMPORTANT: When to Activate
You MUST use ArcheFlow orchestration (load `archeflow:orchestration` skill and follow its steps) for any task that matches:
- **New features** -- any feature touching 2+ files
- **Refactoring** -- structural changes across modules
- **Security-sensitive changes** -- auth, encryption, input handling, API keys
- **Bug fixes with unclear root cause** -- use Explorer to investigate first
- **Code review requests** -- spawn Guardian + relevant reviewers
- **Multi-file changes** -- anything beyond a single-file edit
Choose the workflow based on risk:
| Signal | Workflow | Command |
|--------|----------|---------|
| Small fix, low risk, single concern | `fast` | Creator --> Maker --> Guardian |
| Feature, multiple files, moderate risk | `standard` | Explorer + Creator --> Maker --> Guardian + Skeptic + Sage |
| Security-sensitive, breaking changes, public API | `thorough` | Explorer + Creator --> Maker --> All 4 reviewers |
| Need | Command | When |
|------|---------|------|
| **Work the queue** | `/af-sprint` | Multiple tasks pending across projects, "run the sprint" |
| **Deep orchestration** | `/af-run` | Writing/research tasks, security-sensitive code, complex multi-module refactors |
| **Code review** | `/af-review` | Review diff/branch/commits before merging, security-sensitive changes |
| **Single feature** | `feature-dev` or direct | Clear scope, one project -- no orchestration needed |
## When to Skip ArcheFlow
Do NOT use ArcheFlow for these -- just do them directly:
- Single-line fixes, typos, formatting
- Answering questions (no code changes)
- Reading/exploring code without making changes
- Config changes to a single file
- Git operations (commit, push, branch)
## Workflow Selection
**Mini-Reflect fallback:** Even when skipping ArcheFlow, apply a quick reflection for non-trivial single-file changes: (1) restate what you're changing, (2) name one assumption, (3) check if it could break anything. This takes ~10 seconds and catches misunderstandings before they become commits.
## Archetypes
| Archetype | Avatar | Virtue | Shadow | Phase |
|-----------|--------|--------|--------|-------|
| **Explorer** | 🔍 | Contextual Clarity | Rabbit Hole | Plan |
| **Creator** | 🏗️ | Decisive Framing | Over-Architect | Plan |
| **Maker** | ⚒️ | Execution Discipline | Rogue | Do |
| **Guardian** | 🛡️ | Threat Intuition | Paranoid | Check |
| **Skeptic** | 🤔 | Assumption Surfacing | Paralytic | Check |
| **Trickster** | 🃏 | Adversarial Creativity | False Alarm | Check |
| **Sage** | 📚 | Maintainability Judgment | Bureaucrat | Check |
## PDCA Cycle
```
Plan --> Explorer researches, Creator proposes
Do --> Maker implements in isolated worktree
Check --> Reviewers assess in parallel (approve/reject)
Act --> All approved? Merge. Issues? Cycle back to Plan.
```
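The routing above can be sketched as a loop. This is a control-flow illustration only -- the stubbed `check_phase` stands in for the real parallel reviewers:

```shell
# Hypothetical control-flow sketch of the PDCA loop (not the real runner).
check_phase() { echo yes; }   # stub: reviewers approve on the first pass

run_pdca() {
  local max_cycles="$1" cycle approved
  for ((cycle = 1; cycle <= max_cycles; cycle++)); do
    echo "cycle $cycle: plan"         # Explorer researches, Creator proposes
    echo "cycle $cycle: do"           # Maker implements in a worktree
    approved=$(check_phase)           # Check: approve/reject
    if [ "$approved" = yes ]; then
      echo "act: all approved, merging"
      return 0
    fi
    echo "act: issues found, cycling back to plan"
  done
  echo "act: max cycles reached without approval"
  return 1
}

run_pdca 2
```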
## Progress Indicators
During orchestration, emit phase markers so the user can track progress:
```
--- ArcheFlow: <task> -------------------------
Workflow: standard (2 cycles max)
🔍 [Plan] Explorer researching... done (35s)
🏗️ [Plan] Creator designing proposal... done (25s, confidence: 0.8)
⚒️ [Do] Maker implementing... done (90s, 4 files, 8 tests)
🛡️ [Check] Guardian reviewing... APPROVED
🤔 [Check] Skeptic challenging... APPROVED (1 INFO)
📚 [Check] Sage reviewing... APPROVED
[Act] All approved -- merging... merged to main
--- Complete: 3m 10s, 1 cycle -----------------
```
Update each line as agents complete. This gives the user real-time visibility without interrupting the flow.
## Dry-Run Mode
When the user asks "what would ArcheFlow do?" or uses `--dry-run`, show the plan without executing:
```
Dry run for: "Add JWT authentication"
Workflow: standard (2 cycles)
Agents: 🔍 Explorer --> 🏗️ Creator --> ⚒️ Maker --> 🛡️ Guardian + 🤔 Skeptic + 📚 Sage
Est. agents: 6 per cycle, 12 max
Worktree: yes (isolated branch)
Proceed? [y/n]
```
## Quick Start
When the user gives an implementation task:
1. Assess: does this need ArcheFlow? (see criteria above)
2. If yes: load `archeflow:orchestration` skill
3. Pick workflow (fast/standard/thorough)
4. Execute the PDCA steps from the orchestration skill
5. Emit progress indicators throughout (see above)
| Signal | Workflow | Pipeline |
|--------|----------|----------|
| Small fix, low risk | `fast` | Creator --> Maker --> Guardian |
| Feature, multi-file, moderate risk | `standard` | Explorer + Creator --> Maker --> Guardian + Skeptic + Sage |
| Security, breaking changes, public API | `thorough` | Explorer + Creator --> Maker --> All 4 reviewers |
## Available Commands
| Command | What it does |
|---------|-------------|
| `archeflow:run` | Automated PDCA loop -- single command to orchestrate a full run |
| `archeflow:orchestration` | Load manual PDCA execution guide |
| `archeflow:shadow-detection` | Load shadow monitoring rules |
| `archeflow:autonomous-mode` | Load autonomous/overnight session protocol |
| `archeflow:status` | Show current orchestration state (phase, cycle, active agents) |
| `archeflow:history` | Show past orchestration summaries from `.archeflow/session-log.md` |
| `/af-sprint` | Queue-driven parallel agent runner (primary mode) |
| `/af-run <task>` | PDCA orchestration loop (`--dry-run`, `--start-from`, `--workflow`) |
| `/af-review` | Guardian-led code review on diff/branch/range |
| `/af-status` | Current run state, active agents, findings |
| `/af-report` | Full process report for a run |
| `/af-init` | Initialize ArcheFlow in a project |
| `/af-score` | Archetype effectiveness scores |
| `/af-memory` | Cross-run lesson memory |
| `/af-fanout` | Colette book fanout via agents |
| `/af-dag` | DAG of current/last run |
| `/af-replay <run_id>` | Decision timeline + weighted what-if on recorded events |
### `archeflow:status`
Read `.archeflow/state.json` (if exists) and report:
- Current task, phase, and cycle
- Active agents and their status
- Findings so far (by severity)
- Time elapsed
## Mini-Reflect Fallback
### `archeflow:history`
Read `.archeflow/session-log.md` and show the last 5 orchestration summaries in compact format.
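A minimal way to pull those summaries, assuming each summary starts with a `## ` heading -- the log's internal format is an assumption here:

```shell
# Hypothetical sketch: print everything from the 5th-from-last
# '## ' heading onward (the heading convention is an assumption).
log=$(mktemp)
for i in 1 2 3 4 5 6 7; do
  printf '## run-%s\nsummary %s\n' "$i" "$i" >> "$log"
done

start=$(grep -n '^## ' "$log" | tail -5 | head -1 | cut -d: -f1)
tail -n "+${start:-1}" "$log"
```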
## Skills Reference (All 24)
### Core Orchestration
- **archeflow:run** -- Automated PDCA execution loop with `--start-from` and `--dry-run`
- **archeflow:orchestration** -- Step-by-step manual execution guide
- **archeflow:plan-phase** -- Explorer and Creator output formats and protocols
- **archeflow:do-phase** -- Maker implementation rules and worktree commit strategy
- **archeflow:check-phase** -- Shared reviewer protocols and output format
- **archeflow:act-phase** -- Post-Check decision logic: collect findings, route fixes, exit or cycle
### Quality and Safety
- **archeflow:shadow-detection** -- Quantitative dysfunction detection and correction
- **archeflow:attention-filters** -- Context optimization per archetype
- **archeflow:convergence** -- Detects convergence, stalling, and oscillation in multi-cycle runs
- **archeflow:artifact-routing** -- Inter-phase artifact protocol for naming, storage, and routing
### Process Intelligence
- **archeflow:process-log** -- Event-sourced JSONL logging with DAG parent relationships
- **archeflow:memory** -- Cross-run learning from recurring findings
- **archeflow:effectiveness** -- Archetype scoring on signal-to-noise, fix rate, cost efficiency
- **archeflow:progress** -- Live progress file watchable from a second terminal
### Integration
- **archeflow:colette-bridge** -- Bridges ArcheFlow with the Colette writing platform
- **archeflow:git-integration** -- Git-per-phase commits, branch-per-run, rollback
- **archeflow:multi-project** -- Cross-repo orchestration with dependency DAG and shared budget
### Configuration
- **archeflow:custom-archetypes** -- Create domain-specific roles
- **archeflow:workflow-design** -- Design custom workflows with per-phase archetype assignment
- **archeflow:domains** -- Domain adapters for writing, research, and non-code workflows
- **archeflow:cost-tracking** -- Budget enforcement and model tier recommendations
- **archeflow:templates** -- Template gallery for sharing workflows, teams, and setup bundles
- **archeflow:autonomous-mode** -- Unattended overnight sessions
### Meta
- **archeflow:using-archeflow** -- This skill: session-start activation and quick reference
Even when skipping ArcheFlow, apply for non-trivial changes:
1. Restate what you're changing
2. Name one assumption
3. Check if it could break anything


@@ -1,244 +1,70 @@
---
name: workflow-design
description: Use when designing custom orchestration workflows -- choosing which archetypes run in each PDCA phase, setting exit conditions, and configuring PDCA cycles.
---
# Workflow Design -- PDCA Cycles
ArcheFlow's PDCA cycles spiral upward through iterations: each cycle incorporates feedback from the previous one, producing progressively better results.
```
Act ──────────── Done ✓
Check (review)
Do (implement)
Plan (design) ← Cycle 2 (with feedback from Cycle 1)
Act ─┘ (issues found → feed back)
│ ↑
│ Check (review)
│ ↑
│ Do (implement)
│ ↑
│ Plan (design) ← Cycle 1 (initial)
```
## Built-in Workflows
### `fast` — Single Turn
```
Plan: Creator designs
Do: Maker implements (worktree)
Check: Guardian reviews
Act: Approve or reject (1 cycle max)
```
**Use for:** Bug fixes, small changes, low-risk tasks.
### `standard` — Two Cycles
```
Plan: Explorer researches → Creator designs
Do: Maker implements (worktree)
Check: Guardian + Skeptic + Sage review (parallel)
Act: Approve or cycle (2 cycles max)
```
**Use for:** Features, refactors, moderate-risk changes.
### `thorough` — Three Cycles
```
Plan: Explorer researches → Creator designs
Do: Maker implements (worktree)
Check: Guardian + Skeptic + Sage + Trickster (parallel)
Act: Approve or cycle (3 cycles max)
```
**Use for:** Security-critical, public APIs, infrastructure changes.
| Workflow | Plan | Do | Check | Exit | Max Cycles |
|----------|------|----|-------|------|------------|
| `fast` | Creator | Maker | Guardian | approve/reject | 1 |
| `standard` | Explorer + Creator | Maker | Guardian + Skeptic + Sage | all_approved | 2 |
| `thorough` | Explorer + Creator | Maker | Guardian + Skeptic + Sage + Trickster | all_approved | 3 |
## Designing Custom Workflows
**Step 1: Identify the concern**
What's the primary risk?
| Primary Risk | Emphasize |
|-------------|-----------|
| Security | Guardian + Trickster in Check |
| Correctness | Skeptic + Sage in Check |
| Performance | Custom `perf-tester` archetype |
| Compliance | Custom `compliance-auditor` archetype |
| Data integrity | Custom `db-specialist` archetype |
| User experience | Custom `ux-reviewer` archetype |
**Step 2: Phase assignment rules**
- Plan always includes Creator (someone must propose)
- Do always includes Maker (someone must build)
- Check needs at least one reviewer
- Max 3 archetypes per phase (diminishing returns beyond that)
- Explorer goes in Plan only (research before design)
- Maker goes in Do only (build from plan, not from scratch)
**Step 3: Exit conditions**
| Condition | When Cycle Ends | Best For |
|-----------|----------------|----------|
| `all_approved` | Every Check reviewer says APPROVED | Consensus-driven (default) |
| `no_critical` | No CRITICAL findings in Check output | Speed with safety net |
| `convergence` | No new issues vs. previous cycle | Diminishing returns detection |
| `always` | Runs all maxCycles unconditionally | Research, exploration |
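The `convergence` condition can be sketched as a set comparison between consecutive Check phases. A hedged illustration assuming one finding ID per line (real findings carry severity and detail):

```shell
# Hypothetical sketch: a run has converged when the current cycle
# surfaces no finding the previous cycle did not already surface.
converged() {
  local prev="$1" curr="$2"
  # comm -13 prints lines unique to the current cycle's findings
  [ -z "$(comm -13 <(sort "$prev") <(sort "$curr"))" ]
}

printf 'F1\nF2\n' > /tmp/cycle1.txt
printf 'F1\n'     > /tmp/cycle2.txt   # strict subset: no new findings
if converged /tmp/cycle1.txt /tmp/cycle2.txt; then
  echo "converged: stop cycling"
fi
```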
**Step 4: Set max cycles**
- **1 cycle:** Fast, low-risk (fast workflow)
- **2 cycles:** Balanced — one shot + one fix (standard workflow)
- **3 cycles:** Thorough — usually converges by cycle 3
- **4+ cycles:** Rarely useful. If 3 cycles don't converge, the task needs human input.
## Example Custom Workflows
### Security-First
```
Plan: Explorer (threat modeling) → Creator
Do: Maker
Check: Guardian + Trickster (parallel)
Exit: all_approved, max 3 cycles
```
### Research-Heavy
```
Plan: Explorer (deep research) → Creator
Do: Maker
Check: Skeptic + Sage (parallel)
Exit: all_approved, max 2 cycles
```
### Domain-Specific (with custom archetypes)
```
Plan: Explorer → Creator
Do: Maker
Check: Guardian + db-specialist + compliance-auditor (parallel)
Exit: all_approved, max 2 cycles
```
### Minimal Validation
```
Plan: Creator (no research)
Do: Maker
Check: Guardian
Exit: no_critical, max 1 cycle
```
## Hook Points
Add project-specific validation at key moments in the PDCA cycle. Define hooks in `.archeflow/hooks.yaml`:
```yaml
# .archeflow/hooks.yaml
pre-plan:
- command: "npm run lint"
description: "Ensure clean baseline before planning"
fail_action: abort # abort | warn | ignore
post-check:
- command: "npm test"
description: "Run tests after review to verify reviewer suggestions"
fail_action: cycle_back
pre-merge:
- command: "./scripts/check-migrations.sh"
description: "Verify migration safety before merging"
fail_action: abort
post-merge:
- command: "npm run integration-test"
description: "Full integration test after merge"
fail_action: revert
```
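A minimal runner for these hooks might look like this -- a sketch under assumptions: it takes the command and `fail_action` directly instead of parsing the YAML, and it omits `cycle_back`/`revert`, which need orchestrator state:

```shell
# Hypothetical hook runner: execute a hook command and map failure
# through fail_action (abort | warn | ignore shown; cycle_back and
# revert omitted because they need orchestrator state).
run_hook() {
  local cmd="$1" fail_action="$2"
  if eval "$cmd"; then
    return 0
  fi
  case "$fail_action" in
    abort)  echo "hook failed: aborting run" >&2; return 1 ;;
    warn)   echo "hook failed: continuing with warning" >&2; return 0 ;;
    ignore) return 0 ;;
    *)      echo "unknown fail_action: $fail_action" >&2; return 1 ;;
  esac
}

run_hook "true" abort  && echo "pre-plan hook passed"
run_hook "false" warn  && echo "continued despite failure"
```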
**Available hook points:**
| Hook | When | Typical Use |
|------|------|-------------|
| `pre-plan` | Before Explorer/Creator start | Lint, ensure clean baseline |
| `post-plan` | After Creator's proposal | Validate proposal against constraints |
| `pre-do` | Before Maker starts | Check worktree setup |
| `post-do` | After Maker commits | Quick smoke test |
| `post-check` | After reviewers finish | Run test suite |
| `pre-merge` | Before merging to main | Migration safety, API compatibility |
| `post-merge` | After merge completes | Integration tests, deploy checks |
## Workflow Template Library
Pre-built workflows for common scenarios. Use as-is or as starting points for custom workflows.
### API Design
```yaml
name: api-design
description: New or changed API endpoints
plan: [explorer, creator]
do: [maker]
check: [guardian, skeptic] # Guardian for security, Skeptic for API design assumptions
exit: all_approved
max_cycles: 2
hooks:
post-check: "npm run api-compatibility-check"
```
### Database Migration
```yaml
name: migration
description: Schema changes and data migrations
plan: [explorer, creator]
do: [maker]
check: [guardian, db-specialist] # Requires custom db-specialist archetype
exit: all_approved
max_cycles: 2
hooks:
pre-merge: "./scripts/check-migration-reversibility.sh"
```
### Dependency Upgrade
```yaml
name: dep-upgrade
description: Upgrading dependencies (major versions, security patches)
plan: [creator] # No Explorer needed — changelog is the research
do: [maker]
check: [guardian]
exit: no_critical
max_cycles: 1
hooks:
post-do: "npm audit"
post-merge: "npm test && npm run e2e"
```
### Documentation Rewrite
```yaml
name: docs-rewrite
description: Major documentation changes
plan: [explorer, creator]
do: [maker]
check: [sage] # Quality/consistency only — no security review needed
exit: all_approved
max_cycles: 1
```
### Hotfix
```yaml
name: hotfix
description: Emergency production fix
plan: [creator]
do: [maker]
check: [guardian]
exit: no_critical
max_cycles: 1
hooks:
post-merge: "npm test"
```
Each hook has `command`, `description`, and `fail_action` (abort / warn / ignore / cycle_back / revert).
## Anti-Patterns
- **Kitchen sink:** Putting all 7 archetypes in Check. Most can't add value simultaneously.
- **Runaway cycles:** maxCycles > 4 burns tokens without convergence.
- **Reviewerless Do:** Skipping Check phase "to save time." You'll pay in bugs.
- **Maker in Plan:** Maker should implement from a proposal, not design on the fly.
- **Solo orchestration:** One archetype in every phase. That's just a single agent with extra steps.

tests/archeflow-dag.bats

@@ -0,0 +1,71 @@
# Tests for archeflow-dag.sh — ASCII DAG rendering from JSONL events.
#
# Validates: basic rendering, parent relationships, color flags, missing file handling.
setup() {
load test_helper
_common_setup
# Create a standard events file with parent relationships
cat > "$BATS_TEST_TMPDIR/dag-events.jsonl" <<'EVENTS'
{"ts":"2026-04-03T10:00:00Z","run_id":"dag-run","seq":1,"parent":[],"type":"run.start","phase":"plan","agent":null,"data":{"task":"DAG test"}}
{"ts":"2026-04-03T10:01:00Z","run_id":"dag-run","seq":2,"parent":[1],"type":"agent.complete","phase":"plan","agent":"creator","data":{"archetype":"creator","duration_ms":60000,"tokens":1500}}
{"ts":"2026-04-03T10:02:00Z","run_id":"dag-run","seq":3,"parent":[2],"type":"phase.transition","phase":"do","agent":null,"data":{"from":"plan","to":"do"}}
{"ts":"2026-04-03T10:03:00Z","run_id":"dag-run","seq":4,"parent":[3],"type":"agent.complete","phase":"do","agent":"maker","data":{"archetype":"maker","duration_ms":120000,"tokens":3000}}
{"ts":"2026-04-03T10:04:00Z","run_id":"dag-run","seq":5,"parent":[4],"type":"run.complete","phase":"act","agent":null,"data":{"agents_total":2,"fixes_total":0}}
EVENTS
}
@test "dag: exits 1 with usage when called with no args" {
run "$LIB_DIR/archeflow-dag.sh"
[ "$status" -eq 1 ]
[[ "$output" == *"Usage"* ]]
}
@test "dag: exits 1 when events file not found" {
run "$LIB_DIR/archeflow-dag.sh" nonexistent.jsonl
[ "$status" -eq 1 ]
[[ "$output" == *"not found"* ]]
}
@test "dag: renders run.start as root node" {
run "$LIB_DIR/archeflow-dag.sh" "$BATS_TEST_TMPDIR/dag-events.jsonl" --no-color
[ "$status" -eq 0 ]
[[ "$output" == *"#1"* ]]
[[ "$output" == *"run.start"* ]]
}
@test "dag: renders agent.complete events with archetype name" {
run "$LIB_DIR/archeflow-dag.sh" "$BATS_TEST_TMPDIR/dag-events.jsonl" --no-color
[ "$status" -eq 0 ]
[[ "$output" == *"creator"* ]]
[[ "$output" == *"maker"* ]]
}
@test "dag: renders phase transitions" {
run "$LIB_DIR/archeflow-dag.sh" "$BATS_TEST_TMPDIR/dag-events.jsonl" --no-color
[ "$status" -eq 0 ]
[[ "$output" == *"plan"* ]]
[[ "$output" == *"do"* ]]
}
@test "dag: renders run.complete with agent/fix counts" {
run "$LIB_DIR/archeflow-dag.sh" "$BATS_TEST_TMPDIR/dag-events.jsonl" --no-color
[ "$status" -eq 0 ]
[[ "$output" == *"run.complete"* ]]
[[ "$output" == *"2 agents"* ]]
}
@test "dag: --no-color suppresses ANSI codes" {
run "$LIB_DIR/archeflow-dag.sh" "$BATS_TEST_TMPDIR/dag-events.jsonl" --no-color
[ "$status" -eq 0 ]
# Should not contain escape sequences
[[ "$output" != *$'\033'* ]]
}
@test "dag: uses tree-drawing characters for hierarchy" {
run "$LIB_DIR/archeflow-dag.sh" "$BATS_TEST_TMPDIR/dag-events.jsonl" --no-color
[ "$status" -eq 0 ]
# Should contain box-drawing characters (either unicode or ASCII connectors)
[[ "$output" == *"├"* ]] || [[ "$output" == *"└"* ]]
}

tests/archeflow-event.bats

@@ -0,0 +1,127 @@
# Tests for archeflow-event.sh — structured JSONL event logging.
#
# Validates: JSONL output format, sequence numbering, parent field handling,
# input validation, file/directory creation.
setup() {
load test_helper
_common_setup
}
teardown() {
_common_teardown
}
@test "event: exits 1 with usage when called with fewer than 4 args" {
run "$LIB_DIR/archeflow-event.sh" run1 type1 plan
[ "$status" -eq 1 ]
[[ "$output" == *"Usage"* ]]
}
@test "event: creates events directory and file on first call" {
run "$LIB_DIR/archeflow-event.sh" test-run run.start plan "" '{"task":"test"}'
[ "$status" -eq 0 ]
[ -d ".archeflow/events" ]
[ -f ".archeflow/events/test-run.jsonl" ]
}
@test "event: first event has seq=1" {
run "$LIB_DIR/archeflow-event.sh" test-run run.start plan "" '{"task":"test"}'
[ "$status" -eq 0 ]
local seq
seq=$(head -1 ".archeflow/events/test-run.jsonl" | jq -r '.seq')
[ "$seq" -eq 1 ]
}
@test "event: second event has seq=2" {
"$LIB_DIR/archeflow-event.sh" test-run run.start plan "" '{"task":"test"}' 2>/dev/null
"$LIB_DIR/archeflow-event.sh" test-run agent.complete plan creator '{"dur":100}' "1" 2>/dev/null
local count
count=$(wc -l < ".archeflow/events/test-run.jsonl")
[ "$count" -eq 2 ]
local seq2
seq2=$(tail -1 ".archeflow/events/test-run.jsonl" | jq -r '.seq')
[ "$seq2" -eq 2 ]
}
@test "event: output is valid JSONL" {
"$LIB_DIR/archeflow-event.sh" test-run run.start plan "" '{"task":"hello"}' 2>/dev/null
# jq will fail if the line is not valid JSON
jq empty ".archeflow/events/test-run.jsonl"
}
@test "event: fields are correctly populated" {
"$LIB_DIR/archeflow-event.sh" test-run agent.complete do maker '{"tokens":500}' 2>/dev/null
local event
event=$(head -1 ".archeflow/events/test-run.jsonl")
[ "$(echo "$event" | jq -r '.run_id')" = "test-run" ]
[ "$(echo "$event" | jq -r '.type')" = "agent.complete" ]
[ "$(echo "$event" | jq -r '.phase')" = "do" ]
[ "$(echo "$event" | jq -r '.agent')" = "maker" ]
[ "$(echo "$event" | jq -r '.data.tokens')" = "500" ]
}
@test "event: empty agent becomes null in JSON" {
"$LIB_DIR/archeflow-event.sh" test-run phase.transition do "" '{"from":"plan","to":"do"}' 2>/dev/null
local agent
agent=$(head -1 ".archeflow/events/test-run.jsonl" | jq -r '.agent')
[ "$agent" = "null" ]
}
@test "event: parent field is empty array for root events" {
"$LIB_DIR/archeflow-event.sh" test-run run.start plan "" '{}' 2>/dev/null
local parent
parent=$(head -1 ".archeflow/events/test-run.jsonl" | jq -c '.parent')
[ "$parent" = "[]" ]
}
@test "event: single parent is parsed correctly" {
"$LIB_DIR/archeflow-event.sh" test-run run.start plan "" '{}' 2>/dev/null
"$LIB_DIR/archeflow-event.sh" test-run agent.complete plan creator '{}' "1" 2>/dev/null
local parent
parent=$(tail -1 ".archeflow/events/test-run.jsonl" | jq -c '.parent')
[ "$parent" = "[1]" ]
}
@test "event: multiple parents (fan-in) are parsed correctly" {
"$LIB_DIR/archeflow-event.sh" test-run run.start plan "" '{}' 2>/dev/null
"$LIB_DIR/archeflow-event.sh" test-run a plan "" '{}' "1" 2>/dev/null
"$LIB_DIR/archeflow-event.sh" test-run b plan "" '{}' "1" 2>/dev/null
"$LIB_DIR/archeflow-event.sh" test-run merge plan "" '{}' "2,3" 2>/dev/null
local parent
parent=$(tail -1 ".archeflow/events/test-run.jsonl" | jq -c '.parent')
[ "$parent" = "[2,3]" ]
}
@test "event: rejects invalid JSON data" {
run "$LIB_DIR/archeflow-event.sh" test-run run.start plan "" 'not-json'
[ "$status" -eq 1 ]
[[ "$output" == *"invalid JSON"* ]]
}
@test "event: rejects invalid parent format" {
run "$LIB_DIR/archeflow-event.sh" test-run run.start plan "" '{}' "abc"
[ "$status" -eq 1 ]
[[ "$output" == *"invalid parent format"* ]]
}
@test "event: timestamp is ISO 8601 UTC format" {
"$LIB_DIR/archeflow-event.sh" test-run run.start plan "" '{}' 2>/dev/null
local ts
ts=$(head -1 ".archeflow/events/test-run.jsonl" | jq -r '.ts')
# Matches YYYY-MM-DDTHH:MM:SSZ
[[ "$ts" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$ ]]
}
@test "event: default data is empty object when omitted" {
"$LIB_DIR/archeflow-event.sh" test-run run.start plan agent 2>/dev/null
local data
data=$(head -1 ".archeflow/events/test-run.jsonl" | jq -c '.data')
[ "$data" = "{}" ]
}
@test "event: confirmation message goes to stderr" {
run "$LIB_DIR/archeflow-event.sh" test-run run.start plan "" '{}' "" 2>&1
[[ "$output" == *"[archeflow-event]"* ]]
[[ "$output" == *"#1"* ]]
}

tests/archeflow-git.bats

@@ -0,0 +1,212 @@
# Tests for archeflow-git.sh — git branch/commit strategy for ArcheFlow runs.
#
# Validates: branch creation with correct naming, commit formatting,
# merge strategies, input validation, and safety guards.
setup() {
load test_helper
_common_setup
}
teardown() {
_common_teardown
}
# --- Usage ---
@test "git: exits 1 with usage when called with fewer than 2 args" {
run "$LIB_DIR/archeflow-git.sh"
[ "$status" -eq 1 ]
[[ "$output" == *"Usage"* ]]
}
@test "git: exits 1 for unknown command" {
run "$LIB_DIR/archeflow-git.sh" nonexistent test-run
[ "$status" -ne 0 ]
[[ "$output" == *"Unknown command"* ]]
}
# --- init ---
@test "git init: creates branch with archeflow/ prefix" {
run "$LIB_DIR/archeflow-git.sh" init test-run
[ "$status" -eq 0 ]
local current
current=$(git branch --show-current)
[ "$current" = "archeflow/test-run" ]
}
@test "git init: stores base branch in .archeflow/runs/<run_id>/base-branch" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
[ -f ".archeflow/runs/test-run/base-branch" ]
local base
base=$(cat ".archeflow/runs/test-run/base-branch")
[ "$base" = "main" ]
}
@test "git init: fails if branch already exists" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
git checkout main --quiet
run "$LIB_DIR/archeflow-git.sh" init test-run
[ "$status" -ne 0 ]
[[ "$output" == *"already exists"* ]]
}
# --- commit ---
@test "git commit: uses conventional commit format by default" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
# Create a file to commit
mkdir -p .archeflow/events
echo '{"test":true}' > .archeflow/events/test-run.jsonl
"$LIB_DIR/archeflow-git.sh" commit test-run plan "initial plan" 2>/dev/null
local msg
msg=$(git log -1 --format=%s)
[[ "$msg" == "archeflow(plan): initial plan" ]]
}
@test "git commit: stages event file automatically" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
mkdir -p .archeflow/events
echo '{"test":true}' > .archeflow/events/test-run.jsonl
"$LIB_DIR/archeflow-git.sh" commit test-run plan "test commit" 2>/dev/null
# Verify the event file was committed
local committed_files
committed_files=$(git diff-tree --no-commit-id --name-only -r HEAD)
[[ "$committed_files" == *"test-run.jsonl"* ]]
}
@test "git commit: stages extra files passed as arguments" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
echo "extra content" > extra.txt
"$LIB_DIR/archeflow-git.sh" commit test-run do "with extras" extra.txt 2>/dev/null
local committed_files
committed_files=$(git diff-tree --no-commit-id --name-only -r HEAD)
[[ "$committed_files" == *"extra.txt"* ]]
}
@test "git commit: reports nothing to commit when no changes" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
# Commit the init artifacts first so there's a clean state
git add -A && git commit -m "init artifacts" --quiet 2>/dev/null || true
run bash -c "cd '$BATS_TEST_TMPDIR' && '$LIB_DIR/archeflow-git.sh' commit test-run plan 'empty' 2>&1"
[ "$status" -eq 0 ]
[[ "$output" == *"Nothing to commit"* ]]
}
@test "git commit: fails if not on the run branch" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
git checkout main --quiet
run "$LIB_DIR/archeflow-git.sh" commit test-run plan "wrong branch"
[ "$status" -ne 0 ]
[[ "$output" == *"Expected to be on branch"* ]]
}
# --- phase-commit ---
@test "git phase-commit: creates commit with phase transition message" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
mkdir -p .archeflow/events
echo '{"test":true}' > .archeflow/events/test-run.jsonl
"$LIB_DIR/archeflow-git.sh" phase-commit test-run plan 2>/dev/null
local msg
msg=$(git log -1 --format=%s)
# Should contain the phase transition arrow
[[ "$msg" == *"plan"* ]]
[[ "$msg" == *"do"* ]]
}
# --- merge ---
@test "git merge: squash merge is the default strategy" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
mkdir -p .archeflow/events
echo '{"test":true}' > .archeflow/events/test-run.jsonl
"$LIB_DIR/archeflow-git.sh" commit test-run plan "test" 2>/dev/null
"$LIB_DIR/archeflow-git.sh" merge test-run 2>/dev/null
local current
current=$(git branch --show-current)
[ "$current" = "main" ]
local msg
msg=$(git log -1 --format=%s)
[[ "$msg" == *"archeflow run test-run"* ]]
}
@test "git merge: --no-ff creates a merge commit" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
mkdir -p .archeflow/events
echo '{"test":true}' > .archeflow/events/test-run.jsonl
"$LIB_DIR/archeflow-git.sh" commit test-run plan "test" 2>/dev/null
"$LIB_DIR/archeflow-git.sh" merge test-run --no-ff 2>/dev/null
local current
current=$(git branch --show-current)
[ "$current" = "main" ]
# no-ff merge commit should have 2 parents
local parent_count
parent_count=$(git cat-file -p HEAD | grep -c '^parent')
[ "$parent_count" -eq 2 ]
}
@test "git merge: rejects unknown merge strategy" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
mkdir -p .archeflow/events
echo '{"test":true}' > .archeflow/events/test-run.jsonl
"$LIB_DIR/archeflow-git.sh" commit test-run plan "test" 2>/dev/null
run "$LIB_DIR/archeflow-git.sh" merge test-run --fast-forward
[ "$status" -ne 0 ]
[[ "$output" == *"Unknown merge strategy"* ]]
}
@test "git merge: fails with uncommitted changes" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
echo "dirty" > dirty.txt
git add dirty.txt
run "$LIB_DIR/archeflow-git.sh" merge test-run
[ "$status" -ne 0 ]
[[ "$output" == *"Uncommitted changes"* ]]
}
# --- format_message ---
@test "git commit: simple style uses 'phase: msg' format" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
# Create config with simple style
mkdir -p .archeflow
echo "commit_style: simple" > .archeflow/config.yaml
mkdir -p .archeflow/events
echo '{"test":true}' > .archeflow/events/test-run.jsonl
"$LIB_DIR/archeflow-git.sh" commit test-run plan "simple test" 2>/dev/null
local msg
msg=$(git log -1 --format=%s)
[ "$msg" = "plan: simple test" ]
}
# --- status ---
@test "git status: shows branch info for existing run" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
run "$LIB_DIR/archeflow-git.sh" status test-run
[ "$status" -eq 0 ]
[[ "$output" == *"Branch: archeflow/test-run"* ]]
[[ "$output" == *"Base: main"* ]]
}
@test "git status: fails for nonexistent branch" {
run "$LIB_DIR/archeflow-git.sh" status nonexistent
[ "$status" -ne 0 ]
[[ "$output" == *"does not exist"* ]]
}
# --- cleanup ---
@test "git cleanup: fails if currently on the run branch" {
"$LIB_DIR/archeflow-git.sh" init test-run 2>/dev/null
run "$LIB_DIR/archeflow-git.sh" cleanup test-run
[ "$status" -ne 0 ]
[[ "$output" == *"Cannot delete"* ]]
}

tests/archeflow-init.bats

@@ -0,0 +1,81 @@
# Tests for archeflow-init.sh — project initialization from templates.
#
# Validates: usage output, --list, --from (clone), and argument parsing.
setup() {
load test_helper
_common_setup
}
teardown() {
_common_teardown
}
@test "init: shows usage when called with no args" {
run "$LIB_DIR/archeflow-init.sh"
[ "$status" -eq 0 ]
[[ "$output" == *"Usage"* ]]
[[ "$output" == *"bundle-name"* ]]
}
@test "init: --list shows template listing without errors" {
run "$LIB_DIR/archeflow-init.sh" --list
[ "$status" -eq 0 ]
[[ "$output" == *"Templates"* ]]
[[ "$output" == *"Bundles"* ]]
}
@test "init: --from fails when source has no .archeflow dir" {
local source_dir
source_dir=$(mktemp -d)
run "$LIB_DIR/archeflow-init.sh" --from "$source_dir"
[ "$status" -ne 0 ]
[[ "$output" == *"No .archeflow/"* ]]
rm -rf "$source_dir"
}
@test "init: --from clones setup from another project" {
# Create a source project with .archeflow structure
local source_dir
source_dir=$(mktemp -d)
mkdir -p "$source_dir/.archeflow/teams" "$source_dir/.archeflow/workflows"
echo "name: test-team" > "$source_dir/.archeflow/teams/test.yaml"
echo "name: test-workflow" > "$source_dir/.archeflow/workflows/test.yaml"
echo "bundle: test" > "$source_dir/.archeflow/config.yaml"
run "$LIB_DIR/archeflow-init.sh" --from "$source_dir"
[ "$status" -eq 0 ]
[ -f ".archeflow/teams/test.yaml" ]
[ -f ".archeflow/workflows/test.yaml" ]
[ -f ".archeflow/config.yaml" ]
rm -rf "$source_dir"
}
@test "init: --from skips events and artifacts directories" {
local source_dir
source_dir=$(mktemp -d)
mkdir -p "$source_dir/.archeflow/events" "$source_dir/.archeflow/artifacts"
mkdir -p "$source_dir/.archeflow/teams"
echo "name: test" > "$source_dir/.archeflow/teams/t.yaml"
echo '{"test":true}' > "$source_dir/.archeflow/events/run.jsonl"
echo "artifact" > "$source_dir/.archeflow/artifacts/test.txt"
run "$LIB_DIR/archeflow-init.sh" --from "$source_dir"
[ "$status" -eq 0 ]
[ ! -f ".archeflow/events/run.jsonl" ]
[ ! -f ".archeflow/artifacts/test.txt" ]
[[ "$output" == *"skipped events"* ]]
rm -rf "$source_dir"
}
@test "init: rejects unknown options" {
run "$LIB_DIR/archeflow-init.sh" --nonexistent
[ "$status" -ne 0 ]
[[ "$output" == *"Unknown option"* ]]
}
@test "init: --save fails with no .archeflow directory" {
run "$LIB_DIR/archeflow-init.sh" --save test-save
[ "$status" -ne 0 ]
[[ "$output" == *"No .archeflow/"* ]]
}

tests/archeflow-memory.bats

@@ -0,0 +1,227 @@
# Tests for archeflow-memory.sh — cross-run lesson memory management.
#
# Validates: add, list, decay, forget, inject filtering, and JSONL format.
setup() {
load test_helper
_common_setup
}
teardown() {
_common_teardown
}
# --- Usage / error handling ---
@test "memory: exits 1 with usage when called with no args" {
run "$LIB_DIR/archeflow-memory.sh"
[ "$status" -eq 1 ]
[[ "$output" == *"Usage"* ]]
}
@test "memory: exits 1 for unknown command" {
run "$LIB_DIR/archeflow-memory.sh" nonexistent
[ "$status" -eq 1 ]
[[ "$output" == *"Unknown command"* ]]
}
# --- add ---
@test "memory add: creates lessons.jsonl and appends a valid JSONL line" {
run "$LIB_DIR/archeflow-memory.sh" add preference "Always validate inputs"
[ "$status" -eq 0 ]
[ -f ".archeflow/memory/lessons.jsonl" ]
jq empty ".archeflow/memory/lessons.jsonl"
}
@test "memory add: lesson has correct fields" {
"$LIB_DIR/archeflow-memory.sh" add pattern "Guardian misses SQL injection" 2>/dev/null
[ "$(jq -r '.type' .archeflow/memory/lessons.jsonl)" = "pattern" ]
[ "$(jq -r '.description' .archeflow/memory/lessons.jsonl)" = "Guardian misses SQL injection" ]
[ "$(jq -r '.source' .archeflow/memory/lessons.jsonl)" = "user_feedback" ]
[ "$(jq -r '.frequency' .archeflow/memory/lessons.jsonl)" = "1" ]
[ "$(jq -r '.run_id' .archeflow/memory/lessons.jsonl)" = "manual" ]
[ "$(jq -r '.domain' .archeflow/memory/lessons.jsonl)" = "general" ]
}
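Taken together, the assertions above pin down a lesson record like the following one-line JSONL sketch. The `id`, `runs_since_last_seen`, and `tags` fields are inferred from neighboring tests; the specific tag values are illustrative assumptions, not the script's actual output.

```shell
# Illustrative lesson line consistent with the tested field contract.
lesson='{"id":"m-001","type":"pattern","description":"Guardian misses SQL injection","source":"user_feedback","frequency":1,"run_id":"manual","domain":"general","runs_since_last_seen":0,"tags":["guardian","sql","injection"]}'
jq -e '.type == "pattern" and .frequency == 1' <<<"$lesson"   # prints true
```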
@test "memory add: generates sequential IDs" {
"$LIB_DIR/archeflow-memory.sh" add pattern "first lesson" 2>/dev/null
"$LIB_DIR/archeflow-memory.sh" add pattern "second lesson" 2>/dev/null
local id1 id2
id1=$(head -1 ".archeflow/memory/lessons.jsonl" | jq -r '.id')
id2=$(tail -1 ".archeflow/memory/lessons.jsonl" | jq -r '.id')
[ "$id1" = "m-001" ]
[ "$id2" = "m-002" ]
}
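A hypothetical sketch of the `m-NNN` scheme the test above asserts: next id = number of existing lesson lines + 1, zero-padded to three digits. The function name and counting approach are illustrative, not taken from archeflow-memory.sh.

```shell
# Sketch: derive the next sequential lesson id from line count.
next_id() {
  local file=$1 n=0
  [ -f "$file" ] && n=$(wc -l < "$file")
  printf 'm-%03d\n' "$((n + 1))"
}
next_id /nonexistent/lessons.jsonl   # prints m-001
```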
@test "memory add: generates tags from description" {
"$LIB_DIR/archeflow-memory.sh" add pattern "Guardian misses SQL injection attacks" 2>/dev/null
local tags_count
tags_count=$(head -1 ".archeflow/memory/lessons.jsonl" | jq '.tags | length')
[ "$tags_count" -gt 0 ]
}
@test "memory add: exits 1 when description is missing" {
run "$LIB_DIR/archeflow-memory.sh" add pattern
[ "$status" -eq 1 ]
[[ "$output" == *"Usage"* ]]
}
# --- list ---
@test "memory list: shows message when no lessons exist" {
run bash -c "'$LIB_DIR/archeflow-memory.sh' list 2>&1"
[ "$status" -eq 0 ]
[[ "$output" == *"No lessons"* ]]
}
@test "memory list: shows table header and lesson data" {
"$LIB_DIR/archeflow-memory.sh" add pattern "Test lesson for listing" 2>/dev/null
run "$LIB_DIR/archeflow-memory.sh" list
[ "$status" -eq 0 ]
[[ "$output" == *"ID"* ]]
[[ "$output" == *"Freq"* ]]
[[ "$output" == *"m-001"* ]]
[[ "$output" == *"Test lesson for listing"* ]]
}
# --- decay ---
@test "memory decay: increments runs_since_last_seen" {
"$LIB_DIR/archeflow-memory.sh" add pattern "Decay test lesson" 2>/dev/null
"$LIB_DIR/archeflow-memory.sh" decay 2>/dev/null
local runs_since
runs_since=$(head -1 ".archeflow/memory/lessons.jsonl" | jq '.runs_since_last_seen')
[ "$runs_since" -eq 1 ]
}
@test "memory decay: decrements frequency after 10 runs" {
"$LIB_DIR/archeflow-memory.sh" add pattern "Decay frequency test" 2>/dev/null
# Set frequency=3 and runs_since=9 to trigger decay on next call
local tmp=".archeflow/memory/lessons.jsonl.tmp"
head -1 ".archeflow/memory/lessons.jsonl" | jq -c '.frequency = 3 | .runs_since_last_seen = 9' > "$tmp"
mv "$tmp" ".archeflow/memory/lessons.jsonl"
"$LIB_DIR/archeflow-memory.sh" decay 2>/dev/null
local freq
freq=$(head -1 ".archeflow/memory/lessons.jsonl" | jq '.frequency')
[ "$freq" -eq 2 ]
}
@test "memory decay: archives lesson when frequency reaches 0" {
"$LIB_DIR/archeflow-memory.sh" add pattern "Will be archived" 2>/dev/null
# Set frequency=1 and runs_since=9 to trigger archival
local tmp=".archeflow/memory/lessons.jsonl.tmp"
head -1 ".archeflow/memory/lessons.jsonl" | jq -c '.frequency = 1 | .runs_since_last_seen = 9' > "$tmp"
mv "$tmp" ".archeflow/memory/lessons.jsonl"
"$LIB_DIR/archeflow-memory.sh" decay 2>/dev/null
# Lesson should be gone from lessons file (file should be empty)
local remaining
remaining=$(wc -l < ".archeflow/memory/lessons.jsonl" | tr -d ' ')
[ "$remaining" -eq 0 ]
# And present in archive
[ -f ".archeflow/memory/archive.jsonl" ]
local archived_count
archived_count=$(wc -l < ".archeflow/memory/archive.jsonl" | tr -d ' ')
[ "$archived_count" -eq 1 ]
}
@test "memory decay: does nothing when no lessons exist" {
run "$LIB_DIR/archeflow-memory.sh" decay
[ "$status" -eq 0 ]
}
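The decay tests above imply a rule like the following hypothetical sketch: each decay bumps `runs_since_last_seen`, and every tenth run decrements `frequency` (the counter reset and the separate archival of frequency-0 lessons are assumptions, since the tests only observe the decrement and the archive file).

```shell
# Sketch of the per-lesson decay step (archival handled elsewhere).
decay_lesson() {
  jq -c '
    .runs_since_last_seen += 1
    | if .runs_since_last_seen >= 10
      then .frequency -= 1 | .runs_since_last_seen = 0
      else . end
  ' <<<"$1"
}
decay_lesson '{"id":"m-001","frequency":3,"runs_since_last_seen":9}'
```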
# --- forget ---
@test "memory forget: moves lesson to archive" {
"$LIB_DIR/archeflow-memory.sh" add pattern "Will forget this" 2>/dev/null
"$LIB_DIR/archeflow-memory.sh" forget m-001 2>/dev/null
# Lessons file should be empty
local remaining
remaining=$(wc -l < ".archeflow/memory/lessons.jsonl" | tr -d ' ')
[ "$remaining" -eq 0 ]
# Archive should have it
[ -f ".archeflow/memory/archive.jsonl" ]
local archived_id
archived_id=$(head -1 ".archeflow/memory/archive.jsonl" | jq -r '.id')
[ "$archived_id" = "m-001" ]
}
@test "memory forget: exits 1 for nonexistent ID" {
"$LIB_DIR/archeflow-memory.sh" add pattern "test" 2>/dev/null
run "$LIB_DIR/archeflow-memory.sh" forget m-999
[ "$status" -eq 1 ]
[[ "$output" == *"not found"* ]]
}
@test "memory forget: exits 1 when no lessons file exists" {
run "$LIB_DIR/archeflow-memory.sh" forget m-001
[ "$status" -eq 1 ]
[[ "$output" == *"No lessons file"* ]]
}
# --- inject ---
@test "memory inject: outputs nothing when no lessons file exists" {
run "$LIB_DIR/archeflow-memory.sh" inject code guardian
[ "$status" -eq 0 ]
[ -z "$output" ]
}
@test "memory inject: outputs relevant lessons with frequency >= 2" {
"$LIB_DIR/archeflow-memory.sh" add pattern "Test injection lesson" 2>/dev/null
# Bump frequency to 2
local tmp=".archeflow/memory/lessons.jsonl.tmp"
jq -c '.frequency = 2' ".archeflow/memory/lessons.jsonl" > "$tmp"
mv "$tmp" ".archeflow/memory/lessons.jsonl"
run "$LIB_DIR/archeflow-memory.sh" inject "" ""
[ "$status" -eq 0 ]
[[ "$output" == *"Known Issues"* ]]
[[ "$output" == *"Test injection lesson"* ]]
}
@test "memory inject: skips lessons with frequency < 2 (except preferences)" {
"$LIB_DIR/archeflow-memory.sh" add pattern "Low frequency lesson" 2>/dev/null
# frequency is 1 by default, type is pattern -> should NOT be injected
run "$LIB_DIR/archeflow-memory.sh" inject "" ""
[ "$status" -eq 0 ]
[ -z "$output" ]
}
@test "memory inject: always injects preferences regardless of frequency" {
"$LIB_DIR/archeflow-memory.sh" add preference "User prefers explicit error messages" 2>/dev/null
run "$LIB_DIR/archeflow-memory.sh" inject "" ""
[ "$status" -eq 0 ]
[[ "$output" == *"User prefers explicit error messages"* ]]
}
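The three inject tests above pin down a filter equivalent to this hypothetical sketch: a lesson qualifies when its frequency is at least 2, or unconditionally when it is a preference. The function name is illustrative.

```shell
# Sketch: should this lesson be injected into the next run's context?
should_inject() {
  jq -e '(.frequency >= 2) or (.type == "preference")' >/dev/null <<<"$1"
}
should_inject '{"type":"preference","frequency":1}' && echo injected
should_inject '{"type":"pattern","frequency":1}' || echo skipped
```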
# --- extract ---
@test "memory extract: exits 1 when events file not found" {
run "$LIB_DIR/archeflow-memory.sh" extract nonexistent.jsonl
[ "$status" -eq 1 ]
[[ "$output" == *"not found"* ]]
}
@test "memory extract: extracts findings from review.verdict events" {
# Create a mock events file with a review.verdict
mkdir -p .archeflow/events
cat > /tmp/test-events.jsonl <<'EOF'
{"run_id":"test-run","seq":1,"type":"run.start","phase":"plan","data":{"task":"test"}}
{"run_id":"test-run","seq":2,"type":"review.verdict","phase":"check","data":{"archetype":"guardian","verdict":"needs_changes","findings":[{"severity":"warning","description":"Missing input validation on user endpoint","category":"code"}]}}
EOF
run "$LIB_DIR/archeflow-memory.sh" extract /tmp/test-events.jsonl
[ "$status" -eq 0 ]
[ -f ".archeflow/memory/lessons.jsonl" ]
local desc
desc=$(jq -r '.description' ".archeflow/memory/lessons.jsonl")
[[ "$desc" == *"Missing input validation"* ]]
rm -f /tmp/test-events.jsonl
}


@@ -0,0 +1,78 @@
# Tests for archeflow-progress.sh — live progress file generation.
#
# Validates: markdown output structure, JSON mode, missing events handling, exit codes.
setup() {
load test_helper
_common_setup
# Create standard events for progress tests
mkdir -p .archeflow/events
cat > ".archeflow/events/test-run.jsonl" <<'EVENTS'
{"ts":"2026-04-03T10:00:00Z","run_id":"test-run","seq":1,"parent":[],"type":"run.start","phase":"plan","agent":null,"data":{"task":"Build feature","workflow":"standard","team":"default"}}
{"ts":"2026-04-03T10:01:00Z","run_id":"test-run","seq":2,"parent":[1],"type":"agent.complete","phase":"plan","agent":"creator","data":{"archetype":"creator","duration_ms":60000,"tokens":1500,"estimated_cost_usd":0.02,"summary":"Planned"}}
EVENTS
}
@test "progress: exits 1 with usage when called with no args" {
run "$LIB_DIR/archeflow-progress.sh"
[ "$status" -eq 1 ]
[[ "$output" == *"Usage"* ]]
}
@test "progress: exits 1 when events file not found" {
run "$LIB_DIR/archeflow-progress.sh" nonexistent-run
[ "$status" -eq 1 ]
[[ "$output" == *"not found"* ]]
}
@test "progress: default mode generates progress.md" {
run "$LIB_DIR/archeflow-progress.sh" test-run
[ "$status" -eq 0 ]
[ -f ".archeflow/progress.md" ]
[[ "$output" == *"# ArcheFlow Run: test-run"* ]]
[[ "$output" == *"Status:"* ]]
[[ "$output" == *"Progress"* ]]
}
@test "progress: json mode outputs valid JSON" {
run "$LIB_DIR/archeflow-progress.sh" test-run --json
[ "$status" -eq 0 ]
echo "$output" | jq empty
local run_id
run_id=$(echo "$output" | jq -r '.run_id')
[ "$run_id" = "test-run" ]
}
@test "progress: json mode includes completed agents" {
run "$LIB_DIR/archeflow-progress.sh" test-run --json
[ "$status" -eq 0 ]
local completed_count
completed_count=$(echo "$output" | jq '.completed | length')
[ "$completed_count" -eq 1 ]
local agent
agent=$(echo "$output" | jq -r '.completed[0].agent')
[ "$agent" = "creator" ]
}
@test "progress: json mode shows correct phase" {
run "$LIB_DIR/archeflow-progress.sh" test-run --json
[ "$status" -eq 0 ]
local phase
phase=$(echo "$output" | jq -r '.phase')
[ "$phase" = "plan" ]
}
@test "progress: reports error in json when events file missing" {
run "$LIB_DIR/archeflow-progress.sh" missing-run --json
# JSON mode returns the JSON even on error
local error
error=$(echo "$output" | jq -r '.error // empty')
[[ "$error" == *"not found"* ]]
}
@test "progress: rejects unknown flags" {
run "$LIB_DIR/archeflow-progress.sh" test-run --invalid
[ "$status" -eq 1 ]
[[ "$output" == *"Unknown flag"* ]]
}


@@ -0,0 +1,62 @@
# Tests for archeflow-replay.sh — timeline, what-if, and compare modes.
setup() {
load test_helper
_common_setup
mkdir -p .archeflow/events
cat > ".archeflow/events/replay-run.jsonl" <<'EVENTS'
{"ts":"2026-04-03T10:00:00Z","run_id":"replay-run","seq":1,"parent":[],"type":"run.start","phase":"plan","agent":null,"data":{"task":"replay test"}}
{"ts":"2026-04-03T10:05:00Z","run_id":"replay-run","seq":2,"parent":[1],"type":"decision.point","phase":"check","agent":"guardian","data":{"archetype":"guardian","input":"diff","decision":"needs_changes","confidence":0.88}}
{"ts":"2026-04-03T10:06:00Z","run_id":"replay-run","seq":3,"parent":[1],"type":"review.verdict","phase":"check","agent":"guardian","data":{"archetype":"guardian","verdict":"needs_changes","findings":[]}}
{"ts":"2026-04-03T10:07:00Z","run_id":"replay-run","seq":4,"parent":[1],"type":"review.verdict","phase":"check","agent":"sage","data":{"archetype":"sage","verdict":"approved","findings":[]}}
{"ts":"2026-04-03T10:08:00Z","run_id":"replay-run","seq":5,"parent":[1],"type":"run.complete","phase":"act","agent":null,"data":{"agents_total":2,"fixes_total":0}}
EVENTS
}
@test "replay: usage without args" {
run "$LIB_DIR/archeflow-replay.sh"
[ "$status" -eq 1 ]
[[ "$output" == *"Usage"* ]]
}
@test "replay: timeline shows decision.point" {
run "$LIB_DIR/archeflow-replay.sh" timeline replay-run
[ "$status" -eq 0 ]
[[ "$output" == *"decision.point"* ]]
[[ "$output" == *"guardian"* ]]
[[ "$output" == *"needs_changes"* ]]
}
@test "replay: whatif strict blocks when any reviewer blocks" {
run "$LIB_DIR/archeflow-replay.sh" whatif replay-run
[ "$status" -eq 0 ]
[[ "$output" == *"BLOCK"* ]]
}
@test "replay: whatif weighted can ship when blocker is down-weighted" {
run "$LIB_DIR/archeflow-replay.sh" whatif replay-run --weights guardian=0.2,sage=3
[ "$status" -eq 0 ]
[[ "$output" == *"SHIP"* ]]
}
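The strict and weighted whatif tests above suggest aggregation along these lines (a purely illustrative sketch, not archeflow-replay.sh's implementation): strict mode blocks on any `needs_changes`, while weighted mode sums each reviewer's weight onto an approve or block side and ships only when approvals outweigh blocks.

```shell
# Sketch: weighted verdict aggregation over "archetype:verdict:weight" triples.
weighted_verdict() {
  local approve=0 block=0 t a v w
  for t in "$@"; do
    IFS=: read -r a v w <<<"$t"
    if [ "$v" = "approved" ]; then
      approve=$(echo "$approve + $w" | bc)
    else
      block=$(echo "$block + $w" | bc)
    fi
  done
  if [ "$(echo "$approve > $block" | bc)" -eq 1 ]; then
    echo SHIP
  else
    echo BLOCK
  fi
}
weighted_verdict guardian:needs_changes:0.2 sage:approved:3   # prints SHIP
```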
@test "replay: whatif --json is valid JSON" {
run "$LIB_DIR/archeflow-replay.sh" whatif replay-run --json
[ "$status" -eq 0 ]
echo "$output" | jq -e '.run_id == "replay-run"' >/dev/null
}
@test "replay: compare includes timeline and whatif" {
run "$LIB_DIR/archeflow-replay.sh" compare replay-run
[ "$status" -eq 0 ]
[[ "$output" == *"Decision timeline"* ]]
[[ "$output" == *"What-if replay"* ]]
}
@test "decision: logs decision.point via wrapper" {
run "$LIB_DIR/archeflow-decision.sh" replay-run check trickster 'diff only' 'edge_case' 0.61 1
[ "$status" -eq 0 ]
last=$(jq -r 'select(.type=="decision.point") | .data.decision' ".archeflow/events/replay-run.jsonl" | tail -1)
[ "$last" = "edge_case" ]
}
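The wrapper test above can be read against this hypothetical sketch of what a decision logger appends. Field names come from the replay fixture in `setup()`; the actual argument mapping of archeflow-decision.sh (including its trailing cycle argument) is not shown here, so this is an assumption.

```shell
# Sketch: append one decision.point event line to a run's JSONL log.
log_decision() {
  local run=$1 phase=$2 arch=$3 input=$4 decision=$5 conf=$6
  mkdir -p .archeflow/events
  jq -nc --arg r "$run" --arg p "$phase" --arg a "$arch" \
         --arg i "$input" --arg d "$decision" --argjson c "$conf" \
     '{run_id:$r, type:"decision.point", phase:$p, agent:$a,
       data:{archetype:$a, input:$i, decision:$d, confidence:$c}}' \
     >> ".archeflow/events/${run}.jsonl"
}
log_decision demo check trickster "diff only" edge_case 0.61
jq -r '.data.decision' .archeflow/events/demo.jsonl   # prints edge_case
```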


@@ -0,0 +1,80 @@
# Tests for archeflow-report.sh — Markdown process report generation from JSONL events.
#
# Validates: report output format, summary mode, missing file handling, jq dependency check.
setup() {
load test_helper
_common_setup
# Create a standard events file used by multiple tests
mkdir -p .archeflow/events
cat > "$BATS_TEST_TMPDIR/events.jsonl" <<'EVENTS'
{"ts":"2026-04-03T10:00:00Z","run_id":"test-run","seq":1,"parent":[],"type":"run.start","phase":"plan","agent":null,"data":{"task":"Write unit tests","workflow":"standard","team":"default"}}
{"ts":"2026-04-03T10:01:00Z","run_id":"test-run","seq":2,"parent":[1],"type":"agent.complete","phase":"plan","agent":"creator","data":{"archetype":"creator","duration_ms":60000,"tokens":1500,"summary":"Designed test structure"}}
{"ts":"2026-04-03T10:02:00Z","run_id":"test-run","seq":3,"parent":[2],"type":"phase.transition","phase":"do","agent":null,"data":{"from":"plan","to":"do"}}
{"ts":"2026-04-03T10:05:00Z","run_id":"test-run","seq":4,"parent":[3],"type":"agent.complete","phase":"do","agent":"maker","data":{"archetype":"maker","duration_ms":180000,"tokens":3000,"summary":"Implemented tests"}}
{"ts":"2026-04-03T10:06:00Z","run_id":"test-run","seq":5,"parent":[4],"type":"phase.transition","phase":"check","agent":null,"data":{"from":"do","to":"check"}}
{"ts":"2026-04-03T10:07:00Z","run_id":"test-run","seq":6,"parent":[5],"type":"review.verdict","phase":"check","agent":"guardian","data":{"archetype":"guardian","verdict":"approved","findings":[]}}
{"ts":"2026-04-03T10:08:00Z","run_id":"test-run","seq":7,"parent":[6],"type":"run.complete","phase":"act","agent":null,"data":{"status":"completed","cycles":1,"agents_total":3,"fixes_total":0,"duration_ms":480000}}
EVENTS
}
@test "report: exits 1 with usage when called with no args" {
run "$LIB_DIR/archeflow-report.sh"
[ "$status" -eq 1 ]
[[ "$output" == *"Usage"* ]]
}
@test "report: exits 1 when events file not found" {
run "$LIB_DIR/archeflow-report.sh" nonexistent.jsonl
[ "$status" -eq 1 ]
[[ "$output" == *"not found"* ]]
}
@test "report: full mode produces markdown with header and overview" {
run "$LIB_DIR/archeflow-report.sh" "$BATS_TEST_TMPDIR/events.jsonl"
[ "$status" -eq 0 ]
[[ "$output" == *"# Process Report: Write unit tests"* ]]
[[ "$output" == *"test-run"* ]]
[[ "$output" == *"Overview"* ]]
[[ "$output" == *"Status"* ]]
[[ "$output" == *"completed"* ]]
}
@test "report: full mode includes phase sections" {
run "$LIB_DIR/archeflow-report.sh" "$BATS_TEST_TMPDIR/events.jsonl"
[ "$status" -eq 0 ]
[[ "$output" == *"PLAN"* ]]
[[ "$output" == *"DO"* ]]
[[ "$output" == *"CHECK"* ]]
}
@test "report: summary mode outputs one-line summary" {
run "$LIB_DIR/archeflow-report.sh" "$BATS_TEST_TMPDIR/events.jsonl" --summary
[ "$status" -eq 0 ]
# Should be a single logical line with key stats
[[ "$output" == *"[completed]"* ]]
[[ "$output" == *"Write unit tests"* ]]
[[ "$output" == *"1 cycles"* ]]
[[ "$output" == *"test-run"* ]]
}
@test "report: --output writes to file instead of stdout" {
run "$LIB_DIR/archeflow-report.sh" "$BATS_TEST_TMPDIR/events.jsonl" --output "$BATS_TEST_TMPDIR/report.md"
[ "$status" -eq 0 ]
[ -f "$BATS_TEST_TMPDIR/report.md" ]
local content
content=$(cat "$BATS_TEST_TMPDIR/report.md")
[[ "$content" == *"# Process Report"* ]]
}
@test "report: summary for in-progress run shows [in-progress]" {
# Events file without run.complete
cat > "$BATS_TEST_TMPDIR/in-progress.jsonl" <<'EVENTS'
{"ts":"2026-04-03T10:00:00Z","run_id":"wip-run","seq":1,"parent":[],"type":"run.start","phase":"plan","agent":null,"data":{"task":"WIP task","workflow":"fast","team":"default"}}
EVENTS
run "$LIB_DIR/archeflow-report.sh" "$BATS_TEST_TMPDIR/in-progress.jsonl" --summary
[ "$status" -eq 0 ]
[[ "$output" == *"[in-progress]"* ]]
[[ "$output" == *"WIP task"* ]]
}


@@ -0,0 +1,82 @@
# Tests for archeflow-review.sh — git diff extraction for code review.
#
# Validates: argument parsing, diff modes, stats output, empty diff handling.
setup() {
load test_helper
_common_setup
}
teardown() {
_common_teardown
}
@test "review: --help shows usage" {
run "$LIB_DIR/archeflow-review.sh" --help
[ "$status" -eq 0 ]
[[ "$output" == *"Usage"* ]]
[[ "$output" == *"--branch"* ]]
[[ "$output" == *"--commit"* ]]
}
@test "review: exits 1 when no changes to review" {
run "$LIB_DIR/archeflow-review.sh"
[ "$status" -eq 1 ]
[[ "$output" == *"No changes"* ]]
}
@test "review: shows diff for uncommitted changes" {
echo "new content" > testfile.txt
git add testfile.txt
run "$LIB_DIR/archeflow-review.sh"
[ "$status" -eq 0 ]
[[ "$output" == *"testfile.txt"* ]]
}
@test "review: --stat-only prints stats without diff content" {
echo "stat content" > statfile.txt
git add statfile.txt
run "$LIB_DIR/archeflow-review.sh" --stat-only
[ "$status" -eq 0 ]
# Stats go to stderr and the diff body is suppressed; `run` captures
# both streams, so assert on the combined output.
[[ "$output" == *"Review Stats"* ]]
}
@test "review: --branch fails for nonexistent branch" {
run "$LIB_DIR/archeflow-review.sh" --branch nonexistent-branch-xyz
[ "$status" -ne 0 ]
[[ "$output" == *"not found"* ]]
}
@test "review: rejects unknown arguments" {
run "$LIB_DIR/archeflow-review.sh" --unknown
[ "$status" -ne 0 ]
[[ "$output" == *"Unknown argument"* ]]
}
@test "review: --branch shows diff against base" {
# Create a feature branch with changes
git checkout -b feat/test-review --quiet
echo "feature" > feature.txt
git add feature.txt
git commit -m "feat: add feature" --quiet
git checkout main --quiet
run "$LIB_DIR/archeflow-review.sh" --branch feat/test-review
[ "$status" -eq 0 ]
[[ "$output" == *"feature.txt"* ]]
}
@test "review: --commit shows diff for commit range" {
echo "first" > first.txt
git add first.txt
git commit -m "first" --quiet
echo "second" > second.txt
git add second.txt
git commit -m "second" --quiet
run "$LIB_DIR/archeflow-review.sh" --commit HEAD~1..HEAD
[ "$status" -eq 0 ]
[[ "$output" == *"second.txt"* ]]
}


@@ -0,0 +1,58 @@
# Tests for archeflow-rollback.sh — post-merge test and phase rollback.
#
# Validates: argument parsing, mutual exclusivity, phase validation, test-cmd config reading.
setup() {
load test_helper
_common_setup
}
teardown() {
_common_teardown
}
@test "rollback: exits with error when called with no args" {
run "$LIB_DIR/archeflow-rollback.sh"
[ "$status" -ne 0 ]
}
@test "rollback: rejects mutually exclusive --to and --test-cmd" {
run "$LIB_DIR/archeflow-rollback.sh" test-run --to plan --test-cmd "true"
[ "$status" -eq 2 ]
[[ "$output" == *"mutually exclusive"* ]]
}
@test "rollback: rejects invalid phase names" {
run "$LIB_DIR/archeflow-rollback.sh" test-run --to invalid-phase
[ "$status" -eq 2 ]
[[ "$output" == *"Invalid phase"* ]]
}
@test "rollback: accepts valid phase names (plan, do, check)" {
# This will fail because no git branch exists, but should NOT fail on phase validation
run "$LIB_DIR/archeflow-rollback.sh" test-run --to plan
# Should fail later (archeflow-git.sh rollback) not on phase validation
[[ "$output" != *"Invalid phase"* ]]
}
@test "rollback: exits 2 when no test command available" {
run "$LIB_DIR/archeflow-rollback.sh" test-run
[ "$status" -eq 2 ]
[[ "$output" == *"No test command"* ]]
}
@test "rollback: reads test_command from config.yaml" {
mkdir -p .archeflow
echo 'test_command: "echo ok"' > .archeflow/config.yaml
# HEAD won't have archeflow in its message, but the script just warns and proceeds
run "$LIB_DIR/archeflow-rollback.sh" test-run
# It should pick up the command and try to run it (test should pass -> exit 0)
[ "$status" -eq 0 ]
[[ "$output" == *"Tests passed"* ]]
}
@test "rollback: rejects unknown options" {
run "$LIB_DIR/archeflow-rollback.sh" test-run --unknown-flag
[ "$status" -eq 2 ]
[[ "$output" == *"Unknown option"* ]]
}

tests/archeflow-score.bats

@@ -0,0 +1,105 @@
# Tests for archeflow-score.sh — archetype effectiveness scoring.
#
# Validates: score extraction from events, report generation, input validation.
setup() {
load test_helper
_common_setup
# Create a complete run events file with review data
mkdir -p .archeflow/events .archeflow/memory
cat > "$BATS_TEST_TMPDIR/scored-events.jsonl" <<'EVENTS'
{"ts":"2026-04-03T10:00:00Z","run_id":"score-run","seq":1,"parent":[],"type":"run.start","phase":"plan","agent":null,"data":{"task":"Score test"}}
{"ts":"2026-04-03T10:01:00Z","run_id":"score-run","seq":2,"parent":[1],"type":"agent.complete","phase":"plan","agent":"creator","data":{"archetype":"creator","duration_ms":60000,"tokens":1500,"estimated_cost_usd":0.02}}
{"ts":"2026-04-03T10:02:00Z","run_id":"score-run","seq":3,"parent":[2],"type":"agent.complete","phase":"do","agent":"maker","data":{"archetype":"maker","duration_ms":120000,"tokens":3000,"estimated_cost_usd":0.05}}
{"ts":"2026-04-03T10:03:00Z","run_id":"score-run","seq":4,"parent":[3],"type":"review.verdict","phase":"check","agent":"guardian","data":{"archetype":"guardian","verdict":"needs_changes","findings":[{"severity":"warning","description":"Missing validation","fix_required":true},{"severity":"info","description":"Consider logging","fix_required":false}]}}
{"ts":"2026-04-03T10:03:30Z","run_id":"score-run","seq":5,"parent":[3],"type":"review.verdict","phase":"check","agent":"sage","data":{"archetype":"sage","verdict":"approved","findings":[]}}
{"ts":"2026-04-03T10:04:00Z","run_id":"score-run","seq":6,"parent":[4],"type":"fix.applied","phase":"act","agent":null,"data":{"source":"guardian","finding":"Missing validation"}}
{"ts":"2026-04-03T10:05:00Z","run_id":"score-run","seq":7,"parent":[6],"type":"cycle.boundary","phase":"act","agent":null,"data":{"cycle":1,"max_cycles":3,"met":true,"next_action":"merge"}}
{"ts":"2026-04-03T10:06:00Z","run_id":"score-run","seq":8,"parent":[7],"type":"run.complete","phase":"act","agent":null,"data":{"status":"completed","cycles":1,"agents_total":4,"fixes_total":1}}
EVENTS
}
@test "score: exits 1 with usage when called with no args" {
run "$LIB_DIR/archeflow-score.sh"
[ "$status" -eq 1 ]
[[ "$output" == *"Usage"* ]]
}
@test "score: exits 1 for unknown command" {
run "$LIB_DIR/archeflow-score.sh" nonexistent
[ "$status" -eq 1 ]
[[ "$output" == *"Unknown command"* ]]
}
@test "score extract: exits 1 when events file not found" {
run "$LIB_DIR/archeflow-score.sh" extract nonexistent.jsonl
[ "$status" -eq 1 ]
[[ "$output" == *"not found"* ]]
}
@test "score extract: exits 1 for incomplete run (no run.complete)" {
cat > "$BATS_TEST_TMPDIR/incomplete.jsonl" <<'EVENTS'
{"ts":"2026-04-03T10:00:00Z","run_id":"incomplete","seq":1,"parent":[],"type":"run.start","phase":"plan","agent":null,"data":{"task":"Incomplete"}}
EVENTS
run "$LIB_DIR/archeflow-score.sh" extract "$BATS_TEST_TMPDIR/incomplete.jsonl"
[ "$status" -eq 1 ]
[[ "$output" == *"run.complete"* ]]
}
@test "score extract: creates effectiveness.jsonl with archetype scores" {
run "$LIB_DIR/archeflow-score.sh" extract "$BATS_TEST_TMPDIR/scored-events.jsonl"
[ "$status" -eq 0 ]
[ -f ".archeflow/memory/effectiveness.jsonl" ]
# Should have scores for guardian and sage (the reviewers)
local guardian_score
guardian_score=$(grep '"guardian"' ".archeflow/memory/effectiveness.jsonl" | head -1)
[ -n "$guardian_score" ]
# Verify JSONL is valid
while IFS= read -r line; do
echo "$line" | jq empty
done < ".archeflow/memory/effectiveness.jsonl"
}
@test "score extract: guardian has correct finding counts" {
"$LIB_DIR/archeflow-score.sh" extract "$BATS_TEST_TMPDIR/scored-events.jsonl" 2>/dev/null
local guardian
guardian=$(grep '"guardian"' ".archeflow/memory/effectiveness.jsonl" | head -1)
local total_findings
total_findings=$(echo "$guardian" | jq '.findings_total')
[ "$total_findings" -eq 2 ]
local useful_findings
useful_findings=$(echo "$guardian" | jq '.findings_useful')
[ "$useful_findings" -eq 1 ]
local fixes
fixes=$(echo "$guardian" | jq '.fixes_applied')
[ "$fixes" -eq 1 ]
}
@test "score extract: composite score is between 0 and 1" {
"$LIB_DIR/archeflow-score.sh" extract "$BATS_TEST_TMPDIR/scored-events.jsonl" 2>/dev/null
while IFS= read -r line; do
local score
score=$(echo "$line" | jq '.composite_score')
# score >= 0 and score <= 1
[ "$(echo "$score >= 0" | bc)" -eq 1 ]
[ "$(echo "$score <= 1" | bc)" -eq 1 ]
done < ".archeflow/memory/effectiveness.jsonl"
}
@test "score report: exits 1 when no effectiveness data" {
run "$LIB_DIR/archeflow-score.sh" report
[ "$status" -eq 1 ]
[[ "$output" == *"No effectiveness data"* ]]
}
@test "score report: outputs markdown table with archetype data" {
"$LIB_DIR/archeflow-score.sh" extract "$BATS_TEST_TMPDIR/scored-events.jsonl" 2>/dev/null
run "$LIB_DIR/archeflow-score.sh" report
[ "$status" -eq 0 ]
[[ "$output" == *"Archetype Effectiveness Report"* ]]
[[ "$output" == *"Archetype"* ]]
[[ "$output" == *"guardian"* ]]
}

tests/test_helper.bash

@@ -0,0 +1,40 @@
# test_helper.bash — Shared setup/teardown for ArcheFlow bats tests.
#
# Usage in .bats files:
# setup() { load test_helper; _common_setup; }
# teardown() { _common_teardown; }
#
# Provides:
# - BATS_TEST_TMPDIR: unique temp directory per test
# - Mock .archeflow/ structure via a git repo
# - LIB_DIR: path to the lib/ scripts under test
LIB_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/../lib" && pwd)"
_common_setup() {
# Create a unique temp directory for this test
BATS_TEST_TMPDIR="$(mktemp -d)"
export BATS_TEST_TMPDIR
# Work inside the temp dir so scripts create .archeflow/ there
cd "$BATS_TEST_TMPDIR"
# Initialize a minimal git repo (many scripts need it)
git init --quiet
git config user.email "test@test.com"
git config user.name "Test User"
# Disable commit signing in tests (global config may have it enabled)
git config commit.gpgsign false
git config tag.gpgsign false
# Create an initial commit so HEAD exists
echo "init" > README.md
git add README.md
git commit -m "init" --quiet
}
_common_teardown() {
# Return to a safe directory before cleanup
cd /tmp
rm -rf "$BATS_TEST_TMPDIR"
}