Compare commits
9 Commits
6854e858a4
...
chore/trim
| Author | SHA1 | Date | |
|---|---|---|---|
| f9afca0c58 | |||
| 55a6ba14c9 | |||
| da13dfba85 | |||
| e19ff0acc3 | |||
| 1bf1376a80 | |||
| 6309614bfa | |||
| aebf55a9a7 | |||
| b72eed3157 | |||
| 35c9f8269b |
16
.claude-plugin/marketplace.json
Normal file
16
.claude-plugin/marketplace.json
Normal file
@@ -0,0 +1,16 @@
|
|||||||
|
{
|
||||||
|
"name": "claude-archeflow-plugin",
|
||||||
|
"description": "ArcheFlow plugin marketplace",
|
||||||
|
"plugins": [
|
||||||
|
{
|
||||||
|
"name": "archeflow",
|
||||||
|
"description": "Multi-agent orchestration with Jungian archetypes. PDCA quality cycles, shadow detection, git worktree isolation.",
|
||||||
|
"version": "0.3.0",
|
||||||
|
"path": ".",
|
||||||
|
"keywords": [
|
||||||
|
"orchestration", "multi-agent", "archetypes", "pdca",
|
||||||
|
"code-review", "quality", "worktrees", "shadow-detection"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
71
CLAUDE.md
Normal file
71
CLAUDE.md
Normal file
@@ -0,0 +1,71 @@
|
|||||||
|
# archeflow — Multi-Agent Orchestration Plugin for Claude Code
|
||||||
|
|
||||||
|
Workspace-level orchestration: parallel agent teams across project portfolios, PDCA cycles with Jungian archetype roles, sprint runner, and post-implementation review. Installed as a Claude Code plugin.
|
||||||
|
|
||||||
|
## Tech Stack
|
||||||
|
|
||||||
|
- **Runtime:** Bash (lib scripts) + Claude Code skill system (Markdown skills)
|
||||||
|
- **No build step, no dependencies** — pure bash + markdown
|
||||||
|
- **Plugin format:** Claude Code plugin (skills/, hooks/, agents/, templates/)
|
||||||
|
|
||||||
|
## Key Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Use via Claude Code slash commands:
|
||||||
|
/af-sprint # Main mode: work the queue across projects
|
||||||
|
/af-run <task> # Deep orchestration with PDCA cycles
|
||||||
|
/af-review # Post-implementation security/quality review
|
||||||
|
/af-status # Current run status
|
||||||
|
/af-init # Initialize ArcheFlow in a project
|
||||||
|
/af-score # Archetype effectiveness scores
|
||||||
|
/af-memory # Cross-run lesson memory
|
||||||
|
/af-report # Full process report
|
||||||
|
/af-fanout # Colette book fanout via agents
|
||||||
|
```
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
```
|
||||||
|
skills/ Slash command implementations (one dir per skill)
|
||||||
|
sprint/ /af-sprint — queue-driven parallel agent runner
|
||||||
|
run/ /af-run — PDCA orchestration
|
||||||
|
review/ /af-review — Guardian-led code review
|
||||||
|
plan-phase/ PDCA Plan phase
|
||||||
|
do-phase/ PDCA Do phase
|
||||||
|
check-phase/ PDCA Check phase
|
||||||
|
act-phase/ PDCA Act phase
|
||||||
|
memory/ Cross-run lessons learned
|
||||||
|
cost-tracking/ Token/cost awareness
|
||||||
|
domains/ Domain detection (code, writing, research)
|
||||||
|
... ~25 skill directories
|
||||||
|
hooks/
|
||||||
|
hooks.json Hook definitions
|
||||||
|
session-start/ Auto-activation on session start
|
||||||
|
agents/ Archetype agent definitions
|
||||||
|
explorer.md Divergent thinking, research
|
||||||
|
creator.md Design, architecture
|
||||||
|
maker.md Implementation
|
||||||
|
guardian.md Security, risk, quality gates
|
||||||
|
sage.md Wisdom, patterns, trade-offs
|
||||||
|
skeptic.md Devil's advocate
|
||||||
|
trickster.md Edge cases, unconventional approaches
|
||||||
|
lib/ Bash helper scripts (git, DAG, events, progress, etc.)
|
||||||
|
templates/bundles/ Pre-configured workflow bundles
|
||||||
|
docs/ Roadmap, dogfood notes, test reports
|
||||||
|
```
|
||||||
|
|
||||||
|
## Domain Rules
|
||||||
|
|
||||||
|
- Skills are Markdown files with frontmatter — follow existing skill format exactly
|
||||||
|
- Agents are archetype personas — maintain their distinct voice and perspective
|
||||||
|
- Dogfood observations go to `archeflow/.archeflow/memory/lessons.jsonl`
|
||||||
|
- Cost tracking: prefer cheap models for bulk ops, expensive for creative/review
|
||||||
|
- PDCA cycle order is mandatory: Plan -> Do -> Check -> Act
|
||||||
|
|
||||||
|
## Do NOT
|
||||||
|
|
||||||
|
- Add runtime dependencies — this must stay zero-dependency
|
||||||
|
- Change archetype personalities without updating all referencing skills
|
||||||
|
- Skip the Check phase in PDCA cycles (quality gate)
|
||||||
|
- Modify hooks.json format without testing plugin reload
|
||||||
|
- Use ArcheFlow to orchestrate simple single-file tasks (overhead not justified)
|
||||||
95
README.md
95
README.md
@@ -1,21 +1,37 @@
|
|||||||
# ArcheFlow -- Multi-Agent Orchestration for Claude Code
|
# ArcheFlow -- Workspace Orchestration for Claude Code
|
||||||
|
|
||||||
**Structured quality through archetypal collaboration.** ArcheFlow coordinates multiple Claude Code agents through PDCA cycles, where each agent embodies a Jungian archetype with defined strengths and known failure modes.
|
**Run parallel agent teams across your entire project portfolio.** ArcheFlow reads a task queue, spawns agents across multiple projects simultaneously, collects results, commits, and keeps going. Built for developers managing 10-30 repos who want throughput, not ceremony.
|
||||||
|
|
||||||
Zero dependencies. No build step. Install and go.
|
Zero dependencies. No build step. Install and go.
|
||||||
|
|
||||||
> **Status: Experimental.** ArcheFlow is a research prototype exploring the intersection of
|
> **Status: Experimental.** ArcheFlow is a research prototype exploring the intersection of
|
||||||
> analytical psychology (Jungian archetypes), process engineering (Deming's PDCA cycles), and
|
> analytical psychology (Jungian archetypes), process engineering (PDCA cycles), and
|
||||||
> multi-agent software engineering. It is functional and actively developed, but not production-ready.
|
> multi-agent software engineering. It is functional and actively developed, but not production-ready.
|
||||||
> APIs, skill formats, and orchestration behavior may change between versions.
|
> APIs, skill formats, and orchestration behavior may change between versions.
|
||||||
|
|
||||||
## What It Does
|
## What It Does
|
||||||
|
|
||||||
Large coding tasks benefit from multiple perspectives, but "just spawn more agents" creates chaos. Agents duplicate work, miss each other's output, argue in circles, or go rogue. The problem is not intelligence -- it is coordination.
|
ArcheFlow solves three problems:
|
||||||
|
|
||||||
ArcheFlow solves this by giving each agent an *archetype*: a behavioral protocol that defines what the agent cares about, what context it receives, and how its output feeds into the next phase. Seven archetypes collaborate through **Plan-Do-Check-Act cycles**, where each iteration builds on structured feedback from the last. No unreviewed code reaches your main branch.
|
**1. Workspace Sprint Runner** (`/af-sprint`) -- The primary mode. Reads your task queue, picks the highest-priority items across different projects, spawns 3-5 agents in parallel, collects results, commits+pushes, and immediately starts the next batch. Turns a 25-item backlog into done work while you watch (or don't).
|
||||||
|
|
||||||
The key insight: archetypes are not just system prompts. Each one has a **virtue** (its unique contribution) and a **shadow** (the dysfunction it falls into when pushed too far). ArcheFlow monitors for shadow activation and course-corrects automatically -- replacing an agent that blocks everything, reining in one that researches forever, or escalating when a maker goes off-script.
|
**2. Post-Implementation Review** (`/af-review`) -- Run security and quality review on any diff, branch, or commit range. No planning, no implementation orchestration -- just Guardian analysis of what could go wrong. The highest-ROI mode for catching design-level bugs that linters miss.
|
||||||
|
|
||||||
|
**3. Deep Orchestration** (`/af-run`) -- For complex tasks that need structured exploration, design, implementation, and multi-perspective review. Uses archetypal roles (Explorer, Creator, Maker, Guardian) through PDCA cycles. Best for security-sensitive changes, multi-module refactors, and creative writing.
|
||||||
|
|
||||||
|
### When to use what
|
||||||
|
|
||||||
|
| Situation | Command | Why |
|
||||||
|
|-----------|---------|-----|
|
||||||
|
| Work the backlog | `/af-sprint` | Parallel agents, maximum throughput |
|
||||||
|
| Review before merging | `/af-review` | Catch design bugs, not style nits |
|
||||||
|
| Complex feature (L/XL) | `/af-run` or `feature-dev` | Structured exploration + review |
|
||||||
|
| Simple fix (S/M) | Just do it | No orchestration overhead needed |
|
||||||
|
| Creative writing | `/af-run --domain writing` | Archetypes shine here -- no linters exist for prose |
|
||||||
|
|
||||||
|
### What ArcheFlow is NOT
|
||||||
|
|
||||||
|
ArcheFlow is not a feature development tool. For single-feature implementation with user interaction at every step (clarify requirements, choose architecture, review), use Claude Code's `feature-dev` plugin or work directly. ArcheFlow adds value through **parallel execution across projects** and **domain-specific quality review** (writing, research), not by competing with single-task development tools.
|
||||||
|
|
||||||
## Quick Start
|
## Quick Start
|
||||||
|
|
||||||
@@ -59,50 +75,61 @@ After installing, run `/reload-plugins` or restart Claude Code. ArcheFlow activa
|
|||||||
- `--scope project` — only in the current project
|
- `--scope project` — only in the current project
|
||||||
- `--scope local` — only in the current directory
|
- `--scope local` — only in the current directory
|
||||||
|
|
||||||
### 2. Run your first orchestration
|
### 2. Run your first sprint
|
||||||
|
|
||||||
Just describe a task. ArcheFlow activates automatically for multi-file changes:
|
|
||||||
|
|
||||||
```
|
```
|
||||||
> Add input validation to all API endpoints
|
> /af-sprint
|
||||||
```
|
```
|
||||||
|
|
||||||
Or invoke it explicitly:
|
ArcheFlow reads your task queue (`docs/orchestra/queue.json`), picks the highest-priority items, and spawns parallel agents:
|
||||||
|
|
||||||
```
|
```
|
||||||
> archeflow:run "Add JWT authentication" --workflow standard
|
── af-sprint: Batch 1 ──────────────────────────
|
||||||
|
🔸 writing.colette config parser expansion [P2, M] running
|
||||||
|
🔸 product.jobradar search API endpoint [P3, M] running
|
||||||
|
🔸 tool.git-alm SVG export + minimap [P3, M] running
|
||||||
|
🔸 product.game-factory completion tracking [P3, S] running
|
||||||
|
────────────────────────────────────────────────
|
||||||
|
|
||||||
|
[5 min later]
|
||||||
|
|
||||||
|
── Batch 1 complete ────────────────────────────
|
||||||
|
✓ writing.colette config parser done (3m24s)
|
||||||
|
✓ product.jobradar search API done (5m01s)
|
||||||
|
✓ tool.git-alm SVG export done (4m30s)
|
||||||
|
✓ product.game-factory tracking done (2m15s)
|
||||||
|
|
||||||
|
4 tasks · 4 projects · all committed + pushed
|
||||||
|
Next batch: 2 items ready → dispatching...
|
||||||
|
────────────────────────────────────────────────
|
||||||
```
|
```
|
||||||
|
|
||||||
### 3. What happens
|
### 3. Review before merging
|
||||||
|
|
||||||
ArcheFlow selects a workflow (fast, standard, or thorough) and runs a PDCA cycle:
|
|
||||||
|
|
||||||
```
|
```
|
||||||
Plan --> Explorer researches codebase context, Creator designs a proposal
|
> /af-review --branch feat/batch-api
|
||||||
Do --> Maker implements in an isolated git worktree
|
|
||||||
Check --> Reviewers assess in parallel (Guardian, Skeptic, Sage, Trickster)
|
|
||||||
Act --> All approved? Merge. Issues found? Cycle back with structured feedback.
|
|
||||||
|
|
||||||
Each cycle catches what the last one missed.
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Progress is visible in real time:
|
Guardian analyzes the diff for error handling gaps, security issues, and data loss scenarios:
|
||||||
|
|
||||||
```
|
```
|
||||||
--- ArcheFlow: Add JWT authentication ---------
|
── af-review: writing.colette ─────────────────
|
||||||
Workflow: standard (2 cycles max)
|
🛡️ Guardian: 2 findings (1 HIGH, 1 MEDIUM)
|
||||||
|
[HIGH] Timeout marks variant as done — loses batch state (fanout.py:552)
|
||||||
🔍 [Plan] Explorer researching... done (35s)
|
[MEDIUM] No JSON error handling on corrupted state (batch.py:310)
|
||||||
🏗️ [Plan] Creator designing proposal... done (25s, confidence: 0.8)
|
────────────────────────────────────────────────
|
||||||
⚒️ [Do] Maker implementing... done (90s, 4 files, 8 tests)
|
|
||||||
🛡️ [Check] Guardian reviewing... APPROVED
|
|
||||||
🤔 [Check] Skeptic challenging... APPROVED (1 INFO)
|
|
||||||
📚 [Check] Sage reviewing... APPROVED
|
|
||||||
[Act] All approved -- merging... merged to main
|
|
||||||
|
|
||||||
--- Complete: 3m 10s, 1 cycle -----------------
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### 4. Deep orchestration (when needed)
|
||||||
|
|
||||||
|
For complex, security-sensitive, or creative tasks:
|
||||||
|
|
||||||
|
```
|
||||||
|
> /af-run "Add JWT authentication" --workflow standard
|
||||||
|
```
|
||||||
|
|
||||||
|
This runs the full PDCA cycle with archetypal roles. See "Deep Orchestration" below for details.
|
||||||
|
|
||||||
## The Seven Archetypes
|
## The Seven Archetypes
|
||||||
|
|
||||||
| Archetype | Phase | Virtue | Shadow | Role |
|
| Archetype | Phase | Virtue | Shadow | Role |
|
||||||
|
|||||||
181
docs/dogfood-2026-04-04-batch.md
Normal file
181
docs/dogfood-2026-04-04-batch.md
Normal file
@@ -0,0 +1,181 @@
|
|||||||
|
# ArcheFlow Dogfood Report #2: Batch API Integration
|
||||||
|
|
||||||
|
Date: 2026-04-04
|
||||||
|
Task: Wire Anthropic Batch API into Colette's fanout pipeline with CLI commands and state persistence
|
||||||
|
Project: writing.colette (Python, 27 modules, 457 tests)
|
||||||
|
Complexity: High — 4 files, async API, state persistence, error recovery, CLI commands
|
||||||
|
|
||||||
|
## Experimental Setup
|
||||||
|
|
||||||
|
Same task, same starting commit, two conditions:
|
||||||
|
1. **Baseline**: Plain Claude, no orchestration, single pass
|
||||||
|
2. **ArcheFlow**: PDCA standard workflow (Maker + Guardian review)
|
||||||
|
|
||||||
|
No Explorer or Creator used this time — task scope was clear enough to skip planning and go directly to Maker + Guardian (effectively a fast workflow).
|
||||||
|
|
||||||
|
## Quantitative Comparison
|
||||||
|
|
||||||
|
| Metric | Baseline | ArcheFlow | Delta |
|
||||||
|
|--------|----------|-----------|-------|
|
||||||
|
| Lines added | 189 | 279 | +48% |
|
||||||
|
| Files touched | 4 | 4 | same |
|
||||||
|
| Time | ~5 min | ~12 min | +140% |
|
||||||
|
| Commits | 1 | 4 | cleaner history |
|
||||||
|
| Tests written | 1 | 2 | +1 |
|
||||||
|
| Tests passing | 13/13 | 14/14 | +1 |
|
||||||
|
| Bugs introduced | 0 | 1 | worse |
|
||||||
|
| Bugs caught by review | 0 | 5 | better |
|
||||||
|
| **Real bugs in final code** | **1** | **0** (after fix) | **ArcheFlow wins** |
|
||||||
|
|
||||||
|
## Bug Analysis
|
||||||
|
|
||||||
|
### Bugs found only by Guardian (not present in baseline)
|
||||||
|
|
||||||
|
| # | Bug | Severity | Impact |
|
||||||
|
|---|-----|----------|--------|
|
||||||
|
| 3 | `hash()` non-deterministic across processes for chapter index mapping | HIGH | Data loss on resume — chapters mapped to wrong files |
|
||||||
|
|
||||||
|
This bug was **introduced by ArcheFlow's Maker** and caught by the Guardian. Baseline used `enumerate(i)` and avoided it entirely. Net: zero value.
|
||||||
|
|
||||||
|
### Bugs present in BOTH versions, caught only by Guardian
|
||||||
|
|
||||||
|
| # | Bug | Severity | Impact |
|
||||||
|
|---|-----|----------|--------|
|
||||||
|
| 4 | Timeout marks variant as "done" — permanently loses batch state | HIGH | Silent data loss — timed-out batches can never be resumed |
|
||||||
|
|
||||||
|
This is the **key finding**. Both implementations had this design-level bug. Only ArcheFlow's Guardian caught it. Plain Claude missed it because there was no review step.
|
||||||
|
|
||||||
|
### Bugs in both, not caught by either initially
|
||||||
|
|
||||||
|
| # | Bug | Severity | Impact |
|
||||||
|
|---|-----|----------|--------|
|
||||||
|
| 1 | API key resolution inconsistency (env vs config) | CRITICAL | Wrong key used under mixed-key environments |
|
||||||
|
| 5 | No JSON error handling on corrupted state files | HIGH | Crash on truncated state file |
|
||||||
|
|
||||||
|
Guardian flagged these. Baseline would have shipped them silently.
|
||||||
|
|
||||||
|
## Qualitative Observations
|
||||||
|
|
||||||
|
### Where Guardian added real value
|
||||||
|
|
||||||
|
1. **Error path analysis**: Guardian systematically checked "what happens when X fails?" for timeout, cancellation, corruption, and cross-process resume. Plain Claude focused on the happy path.
|
||||||
|
2. **Cross-process state**: The `hash()` non-determinism finding required reasoning about Python's hash randomization across interpreter invocations — a subtle runtime property that isn't visible from reading the code in isolation.
|
||||||
|
3. **Data loss scenarios**: Finding #4 (timeout → "done" → lost forever) requires understanding the interaction between `wait_and_retrieve`'s timeout branch and the caller's unconditional status assignment. This is a 2-module interaction that single-pass implementation doesn't systematically check.
|
||||||
|
|
||||||
|
### Where Guardian added noise
|
||||||
|
|
||||||
|
1. **Finding #2 (batch_id validation)**: Technically valid but the Anthropic SDK already rejects malformed IDs. Low practical risk.
|
||||||
|
2. **Finding #1 (API key source)**: Valid but matches existing patterns throughout the codebase — flagging it here without flagging it elsewhere is inconsistent.
|
||||||
|
|
||||||
|
### The Maker problem
|
||||||
|
|
||||||
|
The ArcheFlow Maker introduced a bug (hash-based indexing) that the baseline avoided. This happened because:
|
||||||
|
- The Maker was working from a task description, not reading the existing sequential rewrite code as closely
|
||||||
|
- The Creator's plan (when used in dogfood #1) over-specified some things and under-specified others
|
||||||
|
- Working through an intermediary (plan → implementation) introduces information loss
|
||||||
|
|
||||||
|
This is a structural weakness of the PDCA model: the Plan-to-Do handoff can corrupt information.
|
||||||
|
|
||||||
|
## Conclusions
|
||||||
|
|
||||||
|
### Complexity threshold confirmed
|
||||||
|
|
||||||
|
| Task type | Orchestration value |
|
||||||
|
|-----------|-------------------|
|
||||||
|
| Simple (pattern-following, single file) | **Negative** — adds cost, Maker introduces bugs |
|
||||||
|
| Medium (multi-file feature, clear scope) | **Neutral** — extra code but similar outcome |
|
||||||
|
| Complex (error handling, state, async, resume) | **Positive** — Guardian catches design-level bugs |
|
||||||
|
|
||||||
|
The differentiator is **error path coverage**. Guardian's systematic "what if this fails?" analysis catches bugs that single-pass implementation misses because implementers focus on making things work, not on making failures safe.
|
||||||
|
|
||||||
|
### The honest ROI question
|
||||||
|
|
||||||
|
For this task: Guardian caught 1 bug the baseline missed (timeout data loss). That bug would have caused real data loss in production when a batch times out. The cost was ~7 extra minutes and a Maker-introduced bug that had to be fixed.
|
||||||
|
|
||||||
|
Is preventing a production data loss bug worth 7 extra minutes? Yes. But only because this was a task where data loss was possible. For a pure UI change or a refactor with no persistence, the answer would be no.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Improvement Hypotheses
|
||||||
|
|
||||||
|
Based on both dogfood runs, here are concrete hypotheses about how to improve ArcheFlow's value-to-cost ratio:
|
||||||
|
|
||||||
|
### H1: Guardian-Only Mode (skip Plan/Do orchestration)
|
||||||
|
|
||||||
|
**Observation**: In both dogfoods, the Maker produced equivalent-or-worse code than plain Claude. The value came entirely from the Guardian review.
|
||||||
|
|
||||||
|
**Hypothesis**: A "review-only" mode where the user implements normally and then runs ArcheFlow as a post-implementation review would capture the Guardian's value without the Maker's overhead.
|
||||||
|
|
||||||
|
**Test**: Implement the same task plain, then run `af-review` (Guardian + Skeptic on the diff). Compare bug catch rate to full PDCA.
|
||||||
|
|
||||||
|
**Expected outcome**: Same bug catch rate, ~60% less cost.
|
||||||
|
|
||||||
|
### H2: Pre-Implementation Threat Modeling (Guardian before Maker)
|
||||||
|
|
||||||
|
**Observation**: Guardian found error-handling bugs (timeout, corruption) that the Maker didn't anticipate. If Guardian's "what could go wrong?" analysis ran BEFORE implementation, the Maker could build in error handling from the start.
|
||||||
|
|
||||||
|
**Hypothesis**: Running a lightweight Guardian analysis on the Creator's plan (not the code) would produce a "threat list" that the Maker addresses during implementation, eliminating the need for a fix cycle.
|
||||||
|
|
||||||
|
**Sequence**: Creator → Guardian(plan) → Maker(plan + threats) → Guardian(code)
|
||||||
|
|
||||||
|
**Expected outcome**: Fewer Maker-introduced bugs, shorter fix cycle, Guardian's code review focuses on implementation correctness rather than missing error paths.
|
||||||
|
|
||||||
|
### H3: Differential Review (only review what the Maker DIDN'T get from the plan)
|
||||||
|
|
||||||
|
**Observation**: The Maker copies most of the plan correctly. The bugs are in the gaps — things the plan didn't specify (error handling, cross-process state, timeout recovery).
|
||||||
|
|
||||||
|
**Hypothesis**: Instead of reviewing the entire diff, focus the Guardian on the delta between the plan and the implementation — what the Maker added, changed, or skipped that wasn't in the plan.
|
||||||
|
|
||||||
|
**Test**: Extract the plan's explicit instructions, diff against the implementation, and give Guardian only the unplanned additions.
|
||||||
|
|
||||||
|
**Expected outcome**: Higher signal-to-noise ratio (fewer false positives on code that correctly follows the plan), focused attention on the dangerous gaps.
|
||||||
|
|
||||||
|
### H4: Project Convention Calibration (reduce false positives)
|
||||||
|
|
||||||
|
**Observation**: Guardian flagged API key handling (finding #1) and batch_id validation (finding #2) — both valid in absolute terms but inconsistent with the project's existing patterns. The project doesn't validate IDs or centralize key management anywhere else.
|
||||||
|
|
||||||
|
**Hypothesis**: Injecting a "project conventions" summary before Guardian review (e.g., "this project uses env vars for API keys, does not validate external IDs, handles errors via outer try/except") would let Guardian calibrate its expectations and only flag deviations from convention, not the convention itself.
|
||||||
|
|
||||||
|
**Test**: Run Guardian with and without convention context on the same diff. Count false positives.
|
||||||
|
|
||||||
|
**Expected outcome**: 30-50% reduction in noise findings without missing real bugs.
|
||||||
|
|
||||||
|
### H5: Abandon PDCA for Implementation, Keep It for Review
|
||||||
|
|
||||||
|
**Observation**: Across both dogfoods, the cycle-back mechanism (Plan→Do→Check→Act→cycle back) never triggered. All reviews were APPROVED_WITH_FIXES, and fixes were applied in a single pass. The cyclic model added structural overhead (event tracking, artifact routing, convergence detection) that was never used.
|
||||||
|
|
||||||
|
**Hypothesis**: For most tasks, a linear pipeline (implement → multi-reviewer check → targeted fix) is sufficient. Reserve cyclic PDCA for tasks where reviewers fundamentally reject the approach (not just the implementation).
|
||||||
|
|
||||||
|
**Test**: Compare PDCA standard (cycle-back enabled) vs pipeline (no cycle-back) on 10 tasks. Measure: how often does cycle-back actually improve the outcome?
|
||||||
|
|
||||||
|
**Expected outcome**: Cycle-back triggers in <10% of tasks. Pipeline matches PDCA quality for 90%+ of cases at lower cost.
|
||||||
|
|
||||||
|
### H6: Evidence-Gated Findings Actually Work
|
||||||
|
|
||||||
|
**Observation**: Of Guardian's 5 findings in this dogfood, 3 were substantive (timeout data loss, hash non-determinism, no JSON error handling) and 2 were low-value (API key pattern, batch_id format). The substantive ones cited specific code paths and failure scenarios. The low-value ones cited general principles without evidence of actual exploitation.
|
||||||
|
|
||||||
|
**Hypothesis**: The evidence-gating mechanism added in v0.7.0 (ban hedged phrases, require command output or code citation) would have automatically downgraded finding #2 ("could corrupt log output") while preserving findings #3 and #4 (which cite specific code paths and failure mechanisms).
|
||||||
|
|
||||||
|
**Test**: Re-run the Guardian review with evidence-gating active. Count how many findings survive vs. get downgraded.
|
||||||
|
|
||||||
|
**Expected outcome**: 1-2 findings correctly downgraded, 0 real bugs missed.
|
||||||
|
|
||||||
|
### H7: Shadow Detection for the Maker
|
||||||
|
|
||||||
|
**Observation**: The Maker introduced a bug (hash-based indexing) because it deviated from the existing codebase pattern (enumerate-based indexing). This is the "Rogue" shadow — the Maker going off-script from what the codebase already does.
|
||||||
|
|
||||||
|
**Hypothesis**: A pre-commit check that compares the Maker's implementation against the existing codebase patterns (e.g., "how are chapter indices computed elsewhere in fanout.py?") would catch Rogue deviations before the Guardian review.
|
||||||
|
|
||||||
|
**Test**: Add a "pattern conformance" check to the Do phase that greps for how the modified variables/functions are used elsewhere in the file.
|
||||||
|
|
||||||
|
**Expected outcome**: Catches Rogue shadow bugs at implementation time rather than review time, saving a review cycle.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Recommended Next Steps (Priority Order)
|
||||||
|
|
||||||
|
1. **H1**: Build `af-review` mode (Guardian-only on existing diff) — lowest effort, highest expected ROI
|
||||||
|
2. **H4**: Project convention injection — reduce noise without missing signal
|
||||||
|
3. **H2**: Pre-implementation threat modeling — address the root cause of missing error handling
|
||||||
|
4. **H5**: Default to pipeline strategy, reserve PDCA for rejections
|
||||||
|
5. **H7**: Maker pattern conformance check — reduce Maker-introduced bugs
|
||||||
78
docs/dogfood-2026-04-04.md
Normal file
78
docs/dogfood-2026-04-04.md
Normal file
@@ -0,0 +1,78 @@
|
|||||||
|
# ArcheFlow Dogfood Report: Colette Expose/Pitch Generation
|
||||||
|
|
||||||
|
Date: 2026-04-04
|
||||||
|
Task: Implement expose and pitch generation steps in Colette's fanout pipeline
|
||||||
|
Project: writing.colette (Python, 27 modules, 457 tests)
|
||||||
|
|
||||||
|
## Task Description
|
||||||
|
|
||||||
|
The fanout pipeline in `src/colette/fanout.py` had two placeholder steps (`generate_expose`, `generate_pitch`) that logged "not yet implemented". The task was to replace them with real LLM-powered implementations that generate publishing proposals and pitch letters.
|
||||||
|
|
||||||
|
## Conditions
|
||||||
|
|
||||||
|
| Condition | Strategy | Agents | Time | Lines |
|
||||||
|
|-----------|----------|--------|------|-------|
|
||||||
|
| **Plain Claude** (no orchestration) | None | 0 | ~3 min | 107 (+75 impl, +32 test) |
|
||||||
|
| **ArcheFlow PDCA** (standard workflow) | pdca | 4 (Explorer, Creator, Maker, Guardian) | ~15 min | 230 (+145 impl, +85 test) |
|
||||||
|
|
||||||
|
## Findings
|
||||||
|
|
||||||
|
### Bugs introduced
|
||||||
|
|
||||||
|
| Condition | Bug | Caught by | Severity |
|
||||||
|
|-----------|-----|-----------|----------|
|
||||||
|
| Plain Claude | None | N/A | N/A |
|
||||||
|
| ArcheFlow | `task_type`/`file_path` kwargs passed to `LLMClient.create()` but only exist on `GuardedLLMClient` | Guardian review | CRITICAL (runtime crash on non-guarded clients) |
|
||||||
|
|
||||||
|
**Key observation:** ArcheFlow's Maker introduced a bug that plain Claude avoided. The Guardian caught it, but the net result was: introduce bug + catch bug = extra work for the same outcome.
|
||||||
|
|
||||||
|
### Code comparison
|
||||||
|
|
||||||
|
| Metric | Plain Claude | ArcheFlow |
|
||||||
|
|--------|-------------|-----------|
|
||||||
|
| Implementation lines | 75 | 145 |
|
||||||
|
| Test lines | 32 | 85 |
|
||||||
|
| LLMClient compatibility | Clean (protocol args only) | Needed fix (extra kwargs) |
|
||||||
|
| Prompt detail | Adequate (10 sections listed) | More detailed (explicit section descriptions) |
|
||||||
|
| Defensive coding | Minimal (follows existing patterns) | More (mkdir guards, fallback paths) |
|
||||||
|
| Test thoroughness | Basic (file existence, call count) | More thorough (token accumulation, error states) |
|
||||||
|
|
||||||
|
### Process overhead
|
||||||
|
|
||||||
|
| Phase | Time | Value added |
|
||||||
|
|-------|------|-------------|
|
||||||
|
| Explorer research | ~60s | Low — task was well-scoped, pattern was obvious from reading 2 lines |
|
||||||
|
| Creator proposal | ~45s | Low — 300-line plan for 75-line task, mostly restated what the code already showed |
|
||||||
|
| Maker implementation | ~90s | Same as plain Claude, but produced more verbose code + a bug |
|
||||||
|
| Guardian review | ~30s | Mixed — caught 1 real bug (out of 5 findings, 80% noise) |
|
||||||
|
|
||||||
|
### Why plain Claude won
|
||||||
|
|
||||||
|
1. **Pattern-following task.** Two placeholder functions, one existing pattern to copy. No ambiguity, no design decisions, no security concerns.
|
||||||
|
2. **Direct protocol reading.** Plain Claude checked the `LLMClient.create()` signature and used only standard args. The Maker, working from the Creator's plan (which didn't mention the protocol), used extra kwargs it saw in the `GuardedLLMClient`.
|
||||||
|
3. **Less indirection = fewer errors.** The Creator-to-Maker handoff introduced information loss. The Creator specified "call llm_client.create()" but didn't specify the exact signature constraints. Plain Claude read the source of truth directly.
|
||||||
|
|
||||||
|
### When ArcheFlow would have been worth it
|
||||||
|
|
||||||
|
This task had none of these signals:
|
||||||
|
- Ambiguous requirements (need Explorer)
|
||||||
|
- Multiple valid approaches (need Creator to evaluate)
|
||||||
|
- Security-sensitive code (need Guardian for real threats)
|
||||||
|
- Cross-cutting changes (5+ files, interaction risks)
|
||||||
|
- Unfamiliar codebase (need research phase)
|
||||||
|
|
||||||
|
### Improvement opportunities
|
||||||
|
|
||||||
|
1. **Auto-select should skip orchestration** for pattern-following tasks (placeholder + existing pattern in same file)
|
||||||
|
2. **Creator compact mode** — for simple tasks, emit a 10-line diff-style plan, not a 300-line essay
|
||||||
|
3. **Explorer budget cap** — 60s max for single-file tasks
|
||||||
|
4. **Guardian calibration** — inject project conventions to reduce false positives from 80% to ~40%
|
||||||
|
5. **Baseline capture** — run the same task without ArcheFlow to enable A/B comparison
|
||||||
|
|
||||||
|
## Conclusion
|
||||||
|
|
||||||
|
For this specific task (simple, pattern-following, single-file, well-scoped), ArcheFlow added cost without adding quality. Plain Claude was faster, produced less code, and avoided a bug that the Maker introduced.
|
||||||
|
|
||||||
|
This is not a failure of ArcheFlow's design — it's a calibration problem. The auto-select heuristic should have detected this as a skip-orchestration task. The complexity threshold for ArcheFlow activation needs to be higher than "touches 2+ files."
|
||||||
|
|
||||||
|
**Honest assessment:** ArcheFlow's value-add starts at tasks requiring genuine design decisions, security review, or cross-module coordination. Below that threshold, it's ceremony.
|
||||||
@@ -29,22 +29,32 @@ Three ArcheFlow PDCA cycles in one session, each using ArcheFlow's own orchestra
|
|||||||
- Runnable quickstart example (`examples/runnable-quickstart.md`)
|
- Runnable quickstart example (`examples/runnable-quickstart.md`)
|
||||||
- CHANGELOG completed with missing v0.4.0 entry + roadmap version history
|
- CHANGELOG completed with missing v0.4.0 entry + roadmap version history
|
||||||
|
|
||||||
|
### v0.7.0 — Superpowers-Inspired + Strategy Abstraction (8 commits, 485 lines, 20 files)
|
||||||
|
- Context isolation protocol (attention-filters + all 7 agents)
|
||||||
|
- Structured status tokens: DONE/DONE_WITH_CONCERNS/NEEDS_CONTEXT/BLOCKED
|
||||||
|
- Evidence-gated verification: banned phrases, evidence markers, downgrade-to-INFO
|
||||||
|
- Plan granularity constraint: 2-5 min tasks with file:line + code block + verify
|
||||||
|
- Strategy abstraction: `pdca` (cyclic) vs `pipeline` (linear) vs `auto` (selected by task)
|
||||||
|
- README: experimental status + interdisciplinary framing (psychology + process eng + software eng)
|
||||||
|
- Review fixes: fast→pipeline auto-select, merge guard, evidence check completeness
|
||||||
|
|
||||||
### Key numbers
|
### Key numbers
|
||||||
| Metric | v0.3 → v0.6 delta |
|
| Metric | v0.3 → v0.7 delta |
|
||||||
|--------|-------------------|
|
|--------|-------------------|
|
||||||
| Commits this session | 21 |
|
| Commits this session | 29 |
|
||||||
| Lines added | ~1,277 |
|
| Lines added | ~1,762 |
|
||||||
| Files touched | 25+ |
|
| Files touched | 30+ |
|
||||||
| Lib scripts | 8 → 9 (archeflow-rollback.sh) |
|
| Lib scripts | 8 → 9 (archeflow-rollback.sh) |
|
||||||
| Skills | 24 (all now fleshed out) |
|
| Skills | 24 (all fleshed out, no stubs remain) |
|
||||||
| Review cycles | 3 (Guardian + Skeptic + Sage per round) |
|
| Review cycles | 4 (v0.4: full, v0.5: full, v0.6: fast, v0.7: Guardian-only) |
|
||||||
| Review findings fixed | 11 |
|
| Review findings fixed | 15 |
|
||||||
|
|
||||||
### What to do next
|
### What to do next
|
||||||
1. **End-to-end dogfood** — run `af-run` on a real task (not ArcheFlow itself) to test the full PDCA loop
|
1. **End-to-end dogfood** — run `af-run` on a real task (not ArcheFlow itself) to test both strategies
|
||||||
2. **Hook execution runtime** — the config documents 6 hook events but no runner yet
|
2. **Hook execution runtime** — config documents 6 hook events but no runner yet
|
||||||
3. **Publish** — consider tagging v0.6.0 and announcing on git.xorwell.de
|
3. **Pipeline strategy testing** — exercise the `--strategy pipeline` path on a bug fix
|
||||||
4. **GitHub Action** — automated PR review (roadmap item, low effort)
|
4. **Publish** — tag v0.7.0, consider claude.com/plugins marketplace listing
|
||||||
|
5. **GitHub Action** — automated PR review (roadmap item, low effort)
|
||||||
|
|
||||||
## 2026-04-03: Major Feature Sprint (v0.1 → v0.3)
|
## 2026-04-03: Major Feature Sprint (v0.1 → v0.3)
|
||||||
|
|
||||||
|
|||||||
197
lib/archeflow-review.sh
Executable file
197
lib/archeflow-review.sh
Executable file
@@ -0,0 +1,197 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
# archeflow-review.sh — Get a git diff for Guardian review, with stats.
|
||||||
|
#
|
||||||
|
# Standalone diff helper for af-review. No PDCA orchestration — just extracts
|
||||||
|
# the right diff and reports stats so the Claude Code agent can feed it to
|
||||||
|
# Guardian (or other reviewers).
|
||||||
|
#
|
||||||
|
# Usage:
|
||||||
|
# archeflow-review.sh # Uncommitted changes (staged + unstaged)
|
||||||
|
# archeflow-review.sh --branch feat/batch-api # Branch diff vs main
|
||||||
|
# archeflow-review.sh --commit HEAD~3..HEAD # Commit range
|
||||||
|
# archeflow-review.sh --base develop # Override base branch (default: main)
|
||||||
|
# archeflow-review.sh --stat-only # Only print stats, no diff output
|
||||||
|
#
|
||||||
|
# Output:
|
||||||
|
# Prints the diff to stdout. Stats go to stderr so they don't pollute the diff.
|
||||||
|
# Exit code 0 if diff is non-empty, 1 if empty (nothing to review).
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Globals
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
BASE_BRANCH="main"
|
||||||
|
MODE="uncommitted" # uncommitted | branch | commit
|
||||||
|
TARGET=""
|
||||||
|
STAT_ONLY="false"
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Helpers
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
die() {
|
||||||
|
echo "[af-review] ERROR: $*" >&2
|
||||||
|
exit 1
|
||||||
|
}
|
||||||
|
|
||||||
|
info() {
|
||||||
|
echo "[af-review] $*" >&2
|
||||||
|
}
|
||||||
|
|
||||||
|
# Print diff stats (files changed, insertions, deletions) to stderr.
|
||||||
|
print_stats() {
|
||||||
|
local diff_text="$1"
|
||||||
|
|
||||||
|
local files_changed lines_added lines_removed total_lines
|
||||||
|
files_changed=$(echo "$diff_text" | grep -c '^diff --git' || true)
|
||||||
|
lines_added=$(echo "$diff_text" | grep -c '^+[^+]' || true)
|
||||||
|
lines_removed=$(echo "$diff_text" | grep -c '^-[^-]' || true)
|
||||||
|
total_lines=$(echo "$diff_text" | wc -l | tr -d ' ')
|
||||||
|
|
||||||
|
info "--- Review Stats ---"
|
||||||
|
info "Files changed: ${files_changed}"
|
||||||
|
info "Lines added: +${lines_added}"
|
||||||
|
info "Lines removed: -${lines_removed}"
|
||||||
|
info "Diff size: ${total_lines} lines"
|
||||||
|
|
||||||
|
if [[ "$total_lines" -gt 500 ]]; then
|
||||||
|
info "Warning: large diff (>500 lines). Consider reviewing per-file."
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# Detect the default base branch (main or master).
|
||||||
|
detect_base_branch() {
|
||||||
|
if git show-ref --verify --quiet "refs/heads/main" 2>/dev/null; then
|
||||||
|
echo "main"
|
||||||
|
elif git show-ref --verify --quiet "refs/heads/master" 2>/dev/null; then
|
||||||
|
echo "master"
|
||||||
|
else
|
||||||
|
echo "main"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Argument parsing
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
parse_args() {
|
||||||
|
while [[ $# -gt 0 ]]; do
|
||||||
|
case "$1" in
|
||||||
|
--branch)
|
||||||
|
MODE="branch"
|
||||||
|
TARGET="${2:?Missing branch name after --branch}"
|
||||||
|
shift 2
|
||||||
|
;;
|
||||||
|
--commit)
|
||||||
|
MODE="commit"
|
||||||
|
TARGET="${2:?Missing commit range after --commit}"
|
||||||
|
shift 2
|
||||||
|
;;
|
||||||
|
--base)
|
||||||
|
BASE_BRANCH="${2:?Missing base branch after --base}"
|
||||||
|
shift 2
|
||||||
|
;;
|
||||||
|
--stat-only)
|
||||||
|
STAT_ONLY="true"
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
-h|--help)
|
||||||
|
echo "Usage: $0 [--branch <name>] [--commit <range>] [--base <branch>] [--stat-only]"
|
||||||
|
echo ""
|
||||||
|
echo " (no args) Review uncommitted changes (staged + unstaged)"
|
||||||
|
echo " --branch <name> Review branch diff against base (default: main)"
|
||||||
|
echo " --commit <range> Review a commit range (e.g. HEAD~3..HEAD)"
|
||||||
|
echo " --base <branch> Override base branch (default: auto-detect main/master)"
|
||||||
|
echo " --stat-only Print stats only, no diff output"
|
||||||
|
exit 0
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
die "Unknown argument: $1. Use --help for usage."
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
done
|
||||||
|
}
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Diff extraction
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
get_diff() {
|
||||||
|
local diff_text=""
|
||||||
|
|
||||||
|
case "$MODE" in
|
||||||
|
uncommitted)
|
||||||
|
# Combine staged and unstaged changes against HEAD
|
||||||
|
diff_text=$(git diff HEAD 2>/dev/null || true)
|
||||||
|
if [[ -z "$diff_text" ]]; then
|
||||||
|
# Maybe everything is staged, try just staged
|
||||||
|
diff_text=$(git diff --cached 2>/dev/null || true)
|
||||||
|
fi
|
||||||
|
;;
|
||||||
|
branch)
|
||||||
|
# Verify target branch exists
|
||||||
|
if ! git show-ref --verify --quiet "refs/heads/${TARGET}" 2>/dev/null; then
|
||||||
|
# Maybe it's a remote branch
|
||||||
|
if ! git rev-parse --verify "${TARGET}" &>/dev/null; then
|
||||||
|
die "Branch '${TARGET}' not found."
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
diff_text=$(git diff "${BASE_BRANCH}...${TARGET}" 2>/dev/null || true)
|
||||||
|
;;
|
||||||
|
commit)
|
||||||
|
# Validate commit range resolves
|
||||||
|
if ! git rev-parse "${TARGET}" &>/dev/null 2>&1; then
|
||||||
|
die "Invalid commit range: '${TARGET}'"
|
||||||
|
fi
|
||||||
|
diff_text=$(git diff "${TARGET}" 2>/dev/null || true)
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
echo "$diff_text"
|
||||||
|
}
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# Main
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
main() {
|
||||||
|
# Verify we're in a git repo
|
||||||
|
if ! git rev-parse --is-inside-work-tree &>/dev/null; then
|
||||||
|
die "Not inside a git repository."
|
||||||
|
fi
|
||||||
|
|
||||||
|
parse_args "$@"
|
||||||
|
|
||||||
|
# Auto-detect base branch if not overridden
|
||||||
|
if [[ "$BASE_BRANCH" == "main" ]]; then
|
||||||
|
BASE_BRANCH=$(detect_base_branch)
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Describe what we're reviewing
|
||||||
|
case "$MODE" in
|
||||||
|
uncommitted) info "Reviewing: uncommitted changes vs HEAD" ;;
|
||||||
|
branch) info "Reviewing: branch '${TARGET}' vs '${BASE_BRANCH}'" ;;
|
||||||
|
commit) info "Reviewing: commit range '${TARGET}'" ;;
|
||||||
|
esac
|
||||||
|
|
||||||
|
local diff_text
|
||||||
|
diff_text=$(get_diff)
|
||||||
|
|
||||||
|
# Validate non-empty
|
||||||
|
if [[ -z "$diff_text" ]]; then
|
||||||
|
info "No changes found. Nothing to review."
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
# Print stats to stderr
|
||||||
|
print_stats "$diff_text"
|
||||||
|
|
||||||
|
# Output the diff to stdout (unless stat-only)
|
||||||
|
if [[ "$STAT_ONLY" != "true" ]]; then
|
||||||
|
echo "$diff_text"
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
main "$@"
|
||||||
@@ -1,292 +1,46 @@
|
|||||||
---
|
---
|
||||||
name: act-phase
|
name: act-phase
|
||||||
description: |
|
description: |
|
||||||
Use after the Check phase completes. Collects reviewer findings, prioritizes them, routes fixes to the right agent or tool, applies fixes systematically, and decides whether to exit or cycle.
|
Use after the Check phase completes. Collects reviewer findings, routes fixes, applies them, decides whether to exit or cycle.
|
||||||
<example>Automatically loaded during orchestration after Check phase</example>
|
<example>Automatically loaded during orchestration after Check phase</example>
|
||||||
<example>User: "Run just the act phase on existing findings"</example>
|
|
||||||
---
|
---
|
||||||
|
|
||||||
# Act Phase
|
# Act Phase
|
||||||
|
|
||||||
After all reviewers complete, the Act phase turns findings into fixes and decides whether the cycle is done. This is the bridge between "what's wrong" and "what we do about it."
|
Turn Check phase findings into fixes, then decide: exit or cycle.
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
```
|
```
|
||||||
Check phase output → Collect → Prioritize → Route → Fix → Verify → Exit or Cycle
|
Check output → Collect → Deduplicate → Route → Fix → Exit or Cycle
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Step 1: Finding Collection
|
## Step 1: Collect and Consolidate Findings
|
||||||
|
|
||||||
Parse all reviewer outputs into one consolidated findings table. Use the standardized format from the `check-phase` skill.
|
Parse all reviewer outputs into one table grouped by severity (CRITICAL / WARNING / INFO):
|
||||||
|
|
||||||
```markdown
|
|
||||||
## Findings Summary — Cycle N
|
|
||||||
|
|
||||||
### CRITICAL (must fix before next cycle)
|
|
||||||
| # | Source | Location | Category | Description | Suggested Fix |
|
| # | Source | Location | Category | Description | Suggested Fix |
|
||||||
|---|--------|----------|----------|-------------|---------------|
|
|---|--------|----------|----------|-------------|---------------|
|
||||||
| 1 | guardian | src/auth/handler.ts:48 | security | Empty string bypasses validation | Add length check |
|
| 1 | guardian | src/auth/handler.ts:48 | security | Empty string bypasses validation | Add length check |
|
||||||
| 2 | trickster | src/api/parse.ts:92 | reliability | Null input causes crash | Guard with null check |
|
|
||||||
|
|
||||||
### WARNING (should fix)
|
|
||||||
| # | Source | Location | Category | Description | Suggested Fix |
|
|
||||||
|---|--------|----------|----------|-------------|---------------|
|
|
||||||
| 3 | sage | tests/auth.test.ts:15 | testing | Test names don't describe behavior | Rename to "should reject expired tokens" |
|
|
||||||
| 4 | guardian | src/auth/handler.ts:52 | security | Missing rate limit | Add rate limiter middleware |
|
|
||||||
|
|
||||||
### INFO (nice to have)
|
|
||||||
| # | Source | Location | Category | Description | Suggested Fix |
|
|
||||||
|---|--------|----------|----------|-------------|---------------|
|
|
||||||
| 5 | skeptic | src/auth/handler.ts:30 | design | Consider caching validated tokens | Add TTL cache |
|
|
||||||
```
|
|
||||||
|
|
||||||
### Deduplication
|
### Deduplication
|
||||||
|
|
||||||
Before listing findings, deduplicate across reviewers (same rule as `check-phase`):
|
Same file + same category + similar description = one finding. Use the higher severity, credit all sources (e.g. `guardian + skeptic`).
|
||||||
- Same file + same category + similar description = one finding
|
|
||||||
- Use the higher severity
|
|
||||||
- Credit all sources: `guardian + skeptic`
|
|
||||||
- Don't double-count in severity tallies
|
|
||||||
|
|
||||||
### Cross-Cycle Tracking
|
### Cross-Cycle Tracking (cycle > 1)
|
||||||
|
|
||||||
Compare against prior cycle findings (if cycle > 1):
|
Compare against prior cycle findings:
|
||||||
- **Resolved:** Finding from cycle N-1 no longer present → mark resolved, do not re-raise
|
- **Resolved** — no longer present, mark resolved, do not re-raise
|
||||||
- **Persisting:** Same location + category still present → increment `cycle_count`
|
- **Persisting** — same location + category, increment `cycle_count`
|
||||||
- **New:** Finding not seen before → add with `cycle_count: 1`
|
- **New** — first appearance, `cycle_count: 1`
|
||||||
|
|
||||||
If a finding persists for 2+ consecutive cycles, flag for user escalation (see Step 5).
|
Finding persisting 2+ cycles = flag for escalation (see Step 4).
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Step 2: Fix Routing
|
## Step 2: Fix Routing
|
||||||
|
|
||||||
Not all findings are fixed the same way. Route each finding based on its nature:
|
This is the **canonical routing table** (single source of truth for the whole system):
|
||||||
|
|
||||||
| Category | Fix Route | Rationale |
|
|
||||||
|----------|-----------|-----------|
|
|
||||||
| `security` | Spawn Maker with targeted instructions | Security fixes need tested code changes |
|
|
||||||
| `reliability` | Spawn Maker with targeted instructions | Same — code-level fix with test |
|
|
||||||
| `breaking-change` | Route to Creator in next cycle | Design decision needed |
|
|
||||||
| `design` | Route to Creator in next cycle | Architecture change, not a patch |
|
|
||||||
| `dependency` | Spawn Maker with targeted instructions | Package update or removal |
|
|
||||||
| `quality` | Spawn Maker or apply directly | Depends on scope (see below) |
|
|
||||||
| `testing` | Spawn Maker with targeted instructions | Tests need to be written and run |
|
|
||||||
| `consistency` | Apply directly or spawn Maker | Naming/style → direct. Pattern change → Maker |
|
|
||||||
|
|
||||||
### Direct Fix (no agent)
|
|
||||||
|
|
||||||
Apply directly with Edit tool when **all** of these are true:
|
|
||||||
- The fix is mechanical (typo, naming, formatting, import order)
|
|
||||||
- No behavioral change
|
|
||||||
- No test update needed
|
|
||||||
- Exactly one file affected
|
|
||||||
|
|
||||||
Examples: rename a variable, fix a typo in a string, reorder imports, fix indentation.
|
|
||||||
|
|
||||||
### Maker Fix (spawn agent)
|
|
||||||
|
|
||||||
Spawn a targeted Maker when the fix involves:
|
|
||||||
- Code logic changes
|
|
||||||
- New or modified tests
|
|
||||||
- Multiple files
|
|
||||||
- Any behavioral change
|
|
||||||
|
|
||||||
Provide the Maker with:
|
|
||||||
1. The specific finding(s) to address (not all findings — just the routed ones)
|
|
||||||
2. The file and line location
|
|
||||||
3. The suggested fix from the reviewer
|
|
||||||
4. The Maker's original branch (to apply fixes on top)
|
|
||||||
|
|
||||||
```
|
|
||||||
Agent(
|
|
||||||
description: "Fix: <finding description>",
|
|
||||||
prompt: "You are the MAKER archetype.
|
|
||||||
Apply this fix on branch: <maker's branch>
|
|
||||||
|
|
||||||
Finding: <source> | <severity> | <category>
|
|
||||||
Location: <file:line>
|
|
||||||
Issue: <description>
|
|
||||||
Suggested fix: <fix>
|
|
||||||
|
|
||||||
Rules:
|
|
||||||
1. Fix ONLY this issue — no other changes
|
|
||||||
2. Add/update tests if the fix changes behavior
|
|
||||||
3. Run existing tests — nothing may break
|
|
||||||
4. Commit with message: 'fix: <description>'
|
|
||||||
Do NOT refactor surrounding code.",
|
|
||||||
isolation: "worktree",
|
|
||||||
mode: "bypassPermissions"
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
### Writing/Prose Fix (domain-specific)
|
|
||||||
|
|
||||||
For writing projects (books, stories), voice or prose findings need special context:
|
|
||||||
|
|
||||||
```
|
|
||||||
Agent(
|
|
||||||
description: "Fix: voice drift in <file>",
|
|
||||||
prompt: "You are the MAKER archetype.
|
|
||||||
Apply this prose fix on branch: <maker's branch>
|
|
||||||
|
|
||||||
Finding: <source> | <severity> | <category>
|
|
||||||
Location: <file:line>
|
|
||||||
Issue: <description>
|
|
||||||
|
|
||||||
Voice profile to match: <load from .archeflow/config.yaml or project voice profile>
|
|
||||||
|
|
||||||
Rules:
|
|
||||||
1. Fix the flagged passage to match the voice profile
|
|
||||||
2. Do not rewrite surrounding paragraphs
|
|
||||||
3. Preserve the narrative intent — only change voice/style
|
|
||||||
4. Commit with message: 'fix: <description>'",
|
|
||||||
isolation: "worktree",
|
|
||||||
mode: "bypassPermissions"
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
### Design Fix (route to next cycle)
|
|
||||||
|
|
||||||
Findings that require design changes are NOT fixed in the Act phase. They become structured feedback for the Creator in the next PDCA cycle. Collect them into `act-feedback.md` (see Step 5).
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 3: Fix Application Protocol
|
|
||||||
|
|
||||||
Apply fixes in severity order: CRITICAL first, then WARNING, then INFO. Within the same severity, fix in file order (reduces context switching).
|
|
||||||
|
|
||||||
### For each fix:
|
|
||||||
|
|
||||||
1. **Apply the change** (direct edit or via Maker agent)
|
|
||||||
2. **Emit `fix.applied` event:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"type": "fix.applied",
|
|
||||||
"phase": "act",
|
|
||||||
"agent": "maker",
|
|
||||||
"data": {
|
|
||||||
"source": "guardian",
|
|
||||||
"finding": "Empty string bypasses validation",
|
|
||||||
"file": "src/auth/handler.ts",
|
|
||||||
"line": 48,
|
|
||||||
"severity": "CRITICAL",
|
|
||||||
"before": "<old code>",
|
|
||||||
"after": "<new code>"
|
|
||||||
},
|
|
||||||
"parent": [<seq of the review.verdict that found it>]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
3. **Targeted re-check** (if the fix is non-trivial):
|
|
||||||
- Re-run only the reviewer that raised the finding
|
|
||||||
- Scope the re-check to just the changed file(s)
|
|
||||||
- If the re-check raises new findings → add them to the findings list with source `re-check:<reviewer>`
|
|
||||||
|
|
||||||
### Batching Maker Fixes
|
|
||||||
|
|
||||||
If multiple findings route to the same Maker and affect the same file or tightly coupled files, batch them into a single Maker spawn:
|
|
||||||
|
|
||||||
```
|
|
||||||
Agent(
|
|
||||||
description: "Fix: 3 findings in src/auth/",
|
|
||||||
prompt: "You are the MAKER archetype.
|
|
||||||
Apply these fixes on branch: <maker's branch>
|
|
||||||
|
|
||||||
1. [CRITICAL] src/auth/handler.ts:48 — Empty string bypass → Add length check
|
|
||||||
2. [WARNING] src/auth/handler.ts:52 — Missing rate limit → Add middleware
|
|
||||||
3. [WARNING] tests/auth.test.ts:15 — Bad test names → Rename to behavior descriptions
|
|
||||||
|
|
||||||
Fix all three. Commit each as a separate commit.
|
|
||||||
Run tests after all fixes."
|
|
||||||
)
|
|
||||||
```
|
|
||||||
|
|
||||||
Batch only within the same functional area. Don't batch unrelated fixes — the Maker loses focus.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 4: Exit Decision
|
|
||||||
|
|
||||||
After all fixes are applied, evaluate exit conditions:
|
|
||||||
|
|
||||||
### Decision Tree
|
|
||||||
|
|
||||||
```
|
|
||||||
┌─ Count remaining CRITICAL findings (including from re-checks)
|
|
||||||
│
|
|
||||||
├─ CRITICAL = 0 AND completion criteria met (if defined)
|
|
||||||
│ └─ EXIT: Proceed to merge
|
|
||||||
│
|
|
||||||
├─ CRITICAL = 0 AND completion criteria NOT met
|
|
||||||
│ └─ CYCLE: Feed back "completion criteria failing" to Creator
|
|
||||||
│
|
|
||||||
├─ CRITICAL > 0 AND cycles_remaining > 0
|
|
||||||
│ └─ CYCLE: Build feedback, go to Plan phase
|
|
||||||
│
|
|
||||||
├─ CRITICAL > 0 AND cycles_remaining = 0
|
|
||||||
│ └─ STOP: Report to user with unresolved findings
|
|
||||||
│
|
|
||||||
└─ Same CRITICAL finding persisted 2+ cycles
|
|
||||||
└─ ESCALATE: Stop and ask user for guidance
|
|
||||||
```
|
|
||||||
|
|
||||||
### Emit `cycle.boundary` event:
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"type": "cycle.boundary",
|
|
||||||
"phase": "act",
|
|
||||||
"data": {
|
|
||||||
"cycle": 1,
|
|
||||||
"max_cycles": 2,
|
|
||||||
"exit_condition": "all_approved",
|
|
||||||
"met": false,
|
|
||||||
"critical_remaining": 1,
|
|
||||||
"warning_remaining": 2,
|
|
||||||
"info_remaining": 1,
|
|
||||||
"fixes_applied": 3,
|
|
||||||
"design_issues_forwarded": 1,
|
|
||||||
"next_action": "cycle"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Step 5: Cycle Feedback Protocol
|
|
||||||
|
|
||||||
When cycling back, produce `act-feedback.md` as a structured handoff. This replaces dumping raw findings.
|
|
||||||
|
|
||||||
```markdown
|
|
||||||
## Cycle N Feedback → Cycle N+1
|
|
||||||
|
|
||||||
### For Creator (design changes needed)
|
|
||||||
| # | Source | Severity | Category | Issue | Cycles Open |
|
|
||||||
|---|--------|----------|----------|-------|-------------|
|
|
||||||
| 1 | guardian | CRITICAL | security | SQL injection in user input | 1 |
|
|
||||||
| 2 | skeptic | WARNING | design | Assumes single-tenant only | 1 |
|
|
||||||
|
|
||||||
### For Maker (implementation fixes needed)
|
|
||||||
| # | Source | Severity | Category | Issue | Cycles Open |
|
|
||||||
|---|--------|----------|----------|-------|-------------|
|
|
||||||
| 3 | sage | WARNING | testing | Test assertions too weak | 1 |
|
|
||||||
| 4 | trickster | WARNING | reliability | Error path not tested | 1 |
|
|
||||||
|
|
||||||
### Resolved in This Cycle
|
|
||||||
| # | Source | Issue | How Resolved |
|
|
||||||
|---|--------|-------|--------------|
|
|
||||||
| 5 | guardian | Missing rate limit | Added rate limiter middleware (commit abc123) |
|
|
||||||
| 6 | sage | Test names unclear | Renamed to behavior descriptions (commit def456) |
|
|
||||||
|
|
||||||
### Persisting Issues (escalation candidates)
|
|
||||||
| # | Source | Issue | Cycles Open | Action |
|
|
||||||
|---|--------|-------|-------------|--------|
|
|
||||||
| — | — | — | — | — |
|
|
||||||
```
|
|
||||||
|
|
||||||
**Routing rules** (canonical table — matches orchestration and artifact-routing skills):
|
|
||||||
|
|
||||||
| Source | Category | Routes to | Reason |
|
| Source | Category | Routes to | Reason |
|
||||||
|--------|----------|-----------|--------|
|
|--------|----------|-----------|--------|
|
||||||
@@ -296,76 +50,91 @@ When cycling back, produce `act-feedback.md` as a structured handoff. This repla
|
|||||||
| Sage | quality, consistency | Maker | Implementation refinement |
|
| Sage | quality, consistency | Maker | Implementation refinement |
|
||||||
| Sage | testing | Maker | Test gap, not design flaw |
|
| Sage | testing | Maker | Test gap, not design flaw |
|
||||||
| Trickster | reliability (design flaw) | Creator | Needs redesign |
|
| Trickster | reliability (design flaw) | Creator | Needs redesign |
|
||||||
| Trickster | reliability (test gap) | Maker | Needs more tests |
|
| Trickster | reliability (test gap), testing | Maker | Needs more tests |
|
||||||
| Trickster | testing | Maker | Edge case not covered |
|
|
||||||
|
|
||||||
**Disambiguation rule:** When in doubt: if the fix requires changing the approach, route to Creator. If it requires changing the code within the existing approach, route to Maker.
|
**Disambiguation:** If the fix requires changing the approach → Creator. If it requires changing code within the existing approach → Maker.
|
||||||
|
|
||||||
|
### Direct Fix (no agent)
|
||||||
|
|
||||||
|
Apply with Edit tool when **all** are true:
|
||||||
|
- Mechanical (typo, naming, formatting, import order)
|
||||||
|
- No behavioral change
|
||||||
|
- No test update needed
|
||||||
|
- Single file
|
||||||
|
|
||||||
|
### Maker Fix (spawn agent)
|
||||||
|
|
||||||
|
Spawn a targeted Maker when the fix involves code logic, tests, multiple files, or behavioral changes. Batch findings in the same file area into one Maker spawn.
|
||||||
|
|
||||||
|
```
|
||||||
|
Agent(
|
||||||
|
description: "Fix: <description>",
|
||||||
|
prompt: "You are the MAKER archetype.
|
||||||
|
Branch: <maker's branch>
|
||||||
|
Findings:
|
||||||
|
1. [CRITICAL] file:line — issue → suggested fix
|
||||||
|
2. [WARNING] file:line — issue → suggested fix
|
||||||
|
Rules: fix ONLY these issues, add/update tests if behavior changes,
|
||||||
|
run tests, commit each fix separately as 'fix: <description>'.
|
||||||
|
Do NOT refactor surrounding code.",
|
||||||
|
isolation: "worktree",
|
||||||
|
mode: "bypassPermissions"
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Design Fix (route to Creator)
|
||||||
|
|
||||||
|
Design findings are NOT fixed in Act. Collect them into `act-feedback.md` for the Creator in the next cycle (see Step 5).
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Step 6: Incremental Runs
|
## Step 3: Fix Application
|
||||||
|
|
||||||
Support starting the orchestration from any phase by reusing existing artifacts.
|
Apply in severity order: CRITICAL → WARNING → INFO. Within same severity, group by file.
|
||||||
|
|
||||||
### `--start-from check`
|
For each fix:
|
||||||
|
1. Apply the change (direct edit or via Maker agent)
|
||||||
Re-run Check + Act on existing Do artifacts:
|
2. Emit `fix.applied` event with source, finding, file, severity, before/after
|
||||||
1. Read `.archeflow/artifacts/<run_id>/` for Maker branch and implementation summary
|
3. For non-trivial fixes: re-run only the originating reviewer scoped to changed files. New findings from re-check get added with source `re-check:<reviewer>`
|
||||||
2. Verify the Maker branch still exists (`git branch --list`)
|
|
||||||
3. Spawn reviewers against the existing branch
|
|
||||||
4. Proceed through Act phase normally
|
|
||||||
|
|
||||||
### `--start-from act`
|
|
||||||
|
|
||||||
Re-run Act with existing Check findings:
|
|
||||||
1. Read `.archeflow/artifacts/<run_id>/` for Check phase consolidated output
|
|
||||||
2. Parse findings from the stored reviewer outputs
|
|
||||||
3. Skip finding collection (already done) — proceed from Step 2 (Fix Routing)
|
|
||||||
|
|
||||||
### `--start-from do`
|
|
||||||
|
|
||||||
Re-run Do + Check + Act with existing Plan:
|
|
||||||
1. Read `.archeflow/artifacts/<run_id>/` for Creator's proposal
|
|
||||||
2. Verify proposal exists and is parseable
|
|
||||||
3. Spawn Maker with the existing proposal
|
|
||||||
4. Proceed through Check and Act normally
|
|
||||||
|
|
||||||
### Artifact Verification
|
|
||||||
|
|
||||||
Before starting from a mid-point, verify required artifacts exist:
|
|
||||||
|
|
||||||
```
|
|
||||||
--start-from do → needs: proposal (Creator output)
|
|
||||||
--start-from check → needs: proposal + implementation (Maker branch + summary)
|
|
||||||
--start-from act → needs: proposal + implementation + review outputs
|
|
||||||
```
|
|
||||||
|
|
||||||
If artifacts are missing, report which ones and abort. Don't guess or generate placeholders.
|
|
||||||
|
|
||||||
### Event Continuity
|
|
||||||
|
|
||||||
For incremental runs, emit events with `parent` pointing to the existing artifacts' events:
|
|
||||||
1. Read the existing `<run_id>.jsonl` to find the last `seq` number
|
|
||||||
2. Continue sequence numbering from there
|
|
||||||
3. Set `parent` on the first new event to point to the last event of the prior phase
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Act Phase Checklist (Quick Reference)
|
## Step 4: Exit Decision
|
||||||
|
|
||||||
```
|
```
|
||||||
□ Parse all reviewer outputs into consolidated findings table
|
CRITICAL = 0 AND criteria met → EXIT: proceed to merge
|
||||||
□ Deduplicate across reviewers
|
CRITICAL = 0 AND criteria NOT met → CYCLE: feedback to Creator
|
||||||
□ Compare against prior cycle findings (if cycle > 1)
|
CRITICAL > 0 AND cycles remaining → CYCLE: build feedback, go to Plan
|
||||||
□ Route each finding: direct fix / Maker / Creator feedback
|
CRITICAL > 0 AND no cycles left → STOP: report unresolved to user
|
||||||
□ Apply direct fixes first (fastest)
|
Same CRITICAL persists 2+ cycles → ESCALATE: ask user for guidance
|
||||||
□ Spawn Maker(s) for code fixes (batch by file area)
|
|
||||||
□ Emit fix.applied event for each fix
|
|
||||||
□ Re-check non-trivial fixes with the originating reviewer
|
|
||||||
□ Count remaining CRITICALs after all fixes
|
|
||||||
□ Check completion criteria (if defined)
|
|
||||||
□ Decide: exit / cycle / escalate
|
|
||||||
□ If cycling: produce act-feedback.md with routed findings
|
|
||||||
□ If exiting: proceed to merge (see orchestration skill Step 4)
|
|
||||||
□ Emit cycle.boundary event
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Emit `cycle.boundary` event with: cycle number, max_cycles, critical/warning/info remaining, fixes applied, next action.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Step 5: Cycle Feedback
|
||||||
|
|
||||||
|
When cycling back, produce `act-feedback.md`:
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
## Cycle N → Cycle N+1
|
||||||
|
|
||||||
|
### For Creator (design changes needed)
|
||||||
|
| # | Source | Severity | Category | Issue | Cycles Open |
|
||||||
|
|---|--------|----------|----------|-------|-------------|
|
||||||
|
|
||||||
|
### For Maker (implementation fixes needed)
|
||||||
|
| # | Source | Severity | Category | Issue | Cycles Open |
|
||||||
|
|---|--------|----------|----------|-------|-------------|
|
||||||
|
|
||||||
|
### Resolved This Cycle
|
||||||
|
| # | Source | Issue | How Resolved |
|
||||||
|
|---|--------|-------|--------------|
|
||||||
|
|
||||||
|
### Persisting Issues (escalation candidates)
|
||||||
|
| # | Source | Issue | Cycles Open | Action |
|
||||||
|
|---|--------|-------|-------------|--------|
|
||||||
|
```
|
||||||
|
|
||||||
|
Route findings into Creator vs Maker sections using the routing table in Step 2.
|
||||||
|
|||||||
146
skills/review/SKILL.md
Normal file
146
skills/review/SKILL.md
Normal file
@@ -0,0 +1,146 @@
|
|||||||
|
---
|
||||||
|
name: review
|
||||||
|
description: |
|
||||||
|
Review-only mode. Run Guardian + optional reviewers on an existing diff or branch,
|
||||||
|
without any Plan/Do orchestration. The highest-ROI mode for catching design-level bugs.
|
||||||
|
<example>User: "af-review"</example>
|
||||||
|
<example>User: "Review the last commit"</example>
|
||||||
|
<example>User: "af-review --reviewers guardian,skeptic"</example>
|
||||||
|
---
|
||||||
|
|
||||||
|
# ArcheFlow Review Mode
|
||||||
|
|
||||||
|
Run reviewers on existing code changes without orchestrating implementation.
|
||||||
|
This is the most cost-effective mode — it delivers Guardian's error-path analysis
|
||||||
|
without the Maker overhead.
|
||||||
|
|
||||||
|
## When to Use
|
||||||
|
|
||||||
|
- After you've implemented something and want a quality check
|
||||||
|
- On a PR or branch before merging
|
||||||
|
- When the sprint runner flags a task as DONE_WITH_CONCERNS
|
||||||
|
- As a pre-commit quality gate for complex changes
|
||||||
|
|
||||||
|
## Invocation
|
||||||
|
|
||||||
|
```
|
||||||
|
af-review # Review uncommitted changes
|
||||||
|
af-review --branch feat/batch-api # Review branch diff against main
|
||||||
|
af-review --commit HEAD~3..HEAD # Review last 3 commits
|
||||||
|
af-review --reviewers guardian,skeptic,sage # Choose reviewers (default: guardian)
|
||||||
|
af-review --evidence # Enable evidence-gating (stricter)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Execution
|
||||||
|
|
||||||
|
### Step 1: Get the Diff
|
||||||
|
|
||||||
|
Use `lib/archeflow-review.sh` to extract the diff and stats:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Uncommitted changes (default)
|
||||||
|
DIFF=$(bash lib/archeflow-review.sh)
|
||||||
|
|
||||||
|
# Branch diff against main
|
||||||
|
DIFF=$(bash lib/archeflow-review.sh --branch feat/batch-api)
|
||||||
|
|
||||||
|
# Commit range
|
||||||
|
DIFF=$(bash lib/archeflow-review.sh --commit HEAD~3..HEAD)
|
||||||
|
|
||||||
|
# Override base branch
|
||||||
|
DIFF=$(bash lib/archeflow-review.sh --branch feat/x --base develop)
|
||||||
|
|
||||||
|
# Stats only (no diff output)
|
||||||
|
bash lib/archeflow-review.sh --stat-only
|
||||||
|
```
|
||||||
|
|
||||||
|
The script prints the diff to stdout and stats to stderr. It exits 1 if the diff
|
||||||
|
is empty (nothing to review). For large diffs (>500 lines), it warns on stderr.
|
||||||
|
|
||||||
|
### Step 2: Spawn Reviewers
|
||||||
|
|
||||||
|
Default: Guardian only (fastest, highest ROI).
|
||||||
|
With `--reviewers`: spawn requested reviewers in parallel.
|
||||||
|
|
||||||
|
**Guardian** (always first):
|
||||||
|
```
|
||||||
|
Agent(
|
||||||
|
description: "Guardian: review changes for <project>",
|
||||||
|
prompt: "You are the GUARDIAN archetype — security and risk reviewer.
|
||||||
|
|
||||||
|
Review this diff for: security vulnerabilities, error handling gaps,
|
||||||
|
data loss scenarios, race conditions, and breaking changes.
|
||||||
|
|
||||||
|
For each finding: cite specific code (file:line), state what you tested
|
||||||
|
or observed, state what the correct behavior should be.
|
||||||
|
|
||||||
|
Diff:
|
||||||
|
<DIFF>
|
||||||
|
|
||||||
|
STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED",
|
||||||
|
subagent_type: "code-reviewer"
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Skeptic** (if requested):
|
||||||
|
- Focus: hidden assumptions, edge cases, scalability
|
||||||
|
- Context: diff + any design docs
|
||||||
|
|
||||||
|
**Sage** (if requested):
|
||||||
|
- Focus: code quality, test coverage, maintainability
|
||||||
|
- Context: diff + surrounding code
|
||||||
|
|
||||||
|
**Trickster** (if requested):
|
||||||
|
- Focus: adversarial inputs, failure injection, chaos testing
|
||||||
|
- Context: diff only
|
||||||
|
|
||||||
|
### Step 3: Collect and Report
|
||||||
|
|
||||||
|
Parse each reviewer's output. Show findings:
|
||||||
|
|
||||||
|
```
|
||||||
|
── af-review: <project> ───────────────────────
|
||||||
|
Reviewers: guardian, skeptic
|
||||||
|
|
||||||
|
🛡️ Guardian: 2 findings (1 HIGH, 1 MEDIUM)
|
||||||
|
[HIGH] Timeout marks variant as done — loses batch state (fanout.py:552)
|
||||||
|
[MEDIUM] No JSON error handling on corrupted state (batch.py:310)
|
||||||
|
|
||||||
|
🤔 Skeptic: 1 finding (1 INFO)
|
||||||
|
[INFO] hash() non-deterministic across processes (fanout.py:524)
|
||||||
|
|
||||||
|
Total: 3 findings (1 HIGH, 1 MEDIUM, 1 INFO)
|
||||||
|
────────────────────────────────────────────────
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 4: Evidence Gate (if --evidence)
|
||||||
|
|
||||||
|
When `--evidence` is active, apply the evidence requirements from `archeflow:check-phase`:
|
||||||
|
- Scan findings for banned phrases ("might be", "could potentially", etc.)
|
||||||
|
- Check for evidence markers (exit codes, line numbers, reproduction steps)
|
||||||
|
- Downgrade unsupported findings to INFO
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Integration with Sprint Runner
|
||||||
|
|
||||||
|
The sprint runner can invoke `af-review` automatically:
|
||||||
|
|
||||||
|
| Sprint trigger | Review action |
|
||||||
|
|----------------|--------------|
|
||||||
|
| Task marked DONE_WITH_CONCERNS | Run Guardian on the agent's changes |
|
||||||
|
| Task is L/XL estimate | Run Guardian + Skeptic after completion |
|
||||||
|
| Task involves security keywords | Run Guardian automatically |
|
||||||
|
| User requests | Run specified reviewers |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Cost
|
||||||
|
|
||||||
|
Review-only is 60-80% cheaper than full PDCA:
|
||||||
|
- No Explorer research (~30% of PDCA cost)
|
||||||
|
- No Creator planning (~20% of PDCA cost)
|
||||||
|
- No Maker implementation (already done)
|
||||||
|
- Only reviewer token costs remain
|
||||||
302
skills/sprint/SKILL.md
Normal file
302
skills/sprint/SKILL.md
Normal file
@@ -0,0 +1,302 @@
|
|||||||
|
---
|
||||||
|
name: sprint
|
||||||
|
description: |
|
||||||
|
Workspace sprint runner. Reads queue.json, spawns parallel agent teams across projects,
|
||||||
|
manages lifecycle (commit, push, next task), tracks progress. The main operational mode
|
||||||
|
for ArcheFlow in multi-project workspaces.
|
||||||
|
<example>User: "af-sprint"</example>
|
||||||
|
<example>User: "Run the sprint"</example>
|
||||||
|
<example>User: "af-sprint --slots 5 --dry-run"</example>
|
||||||
|
---
|
||||||
|
|
||||||
|
# Workspace Sprint Runner
|
||||||
|
|
||||||
|
Read the task queue, spawn parallel agents across projects, collect results, commit+push,
|
||||||
|
spawn next batch. Repeat until the queue is drained or budget is exhausted.
|
||||||
|
|
||||||
|
## When to Use
|
||||||
|
|
||||||
|
This is the **primary operational mode** for ArcheFlow in multi-project workspaces.
|
||||||
|
Use it when the user says "run the sprint", "work the queue", "go autonomous", or
|
||||||
|
invokes `af-sprint`.
|
||||||
|
|
||||||
|
Do NOT use `archeflow:run` for individual tasks within a sprint — the sprint runner
|
||||||
|
handles task dispatch internally, using `archeflow:run` only when a task warrants
|
||||||
|
full PDCA orchestration.
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
|
||||||
|
- `docs/orchestra/queue.json` — task queue (managed by `./scripts/ws`)
|
||||||
|
- `./scripts/ws` — workspace CLI for queue operations
|
||||||
|
- Each project is a separate git repo under the workspace root
|
||||||
|
|
||||||
|
## Invocation
|
||||||
|
|
||||||
|
```
|
||||||
|
af-sprint # Run sprint with defaults (4 slots, AUTONOM mode)
|
||||||
|
af-sprint --slots 5 # Max 5 parallel agents
|
||||||
|
af-sprint --dry-run # Show what would run, don't execute
|
||||||
|
af-sprint --priority P0,P1 # Only process P0 and P1 items
|
||||||
|
af-sprint --project writing.colette # Only process items for this project
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Execution Protocol
|
||||||
|
|
||||||
|
### Step 0: Orient
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Load queue and workspace state
|
||||||
|
QUEUE=$(cat docs/orchestra/queue.json)
|
||||||
|
MODE=$(echo "$QUEUE" | jq -r '.mode')
|
||||||
|
```
|
||||||
|
|
||||||
|
Check mode:
|
||||||
|
- `AUTONOM` → proceed without asking
|
||||||
|
- `ATTENDED` → show plan, wait for user approval before each batch
|
||||||
|
- `PAUSED` → report status only, do not start tasks
|
||||||
|
|
||||||
|
Show one-line status:
|
||||||
|
```
|
||||||
|
sprint: AUTONOM · 7 pending (1×P0, 1×P2, 5×P3) · 4 slots
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 1: Select Batch
|
||||||
|
|
||||||
|
Pick tasks for the next batch. Rules:
|
||||||
|
|
||||||
|
1. **Priority cascade**: P0 first, then P1, then P2. Never start P3 unless user explicitly includes it.
|
||||||
|
2. **Dependency check**: Skip tasks whose `depends_on` items aren't all `completed`.
|
||||||
|
3. **One agent per project**: Never run two tasks on the same project simultaneously.
|
||||||
|
4. **Cost-aware concurrency**:
|
||||||
|
- Estimate task cost from `estimate` field: S=cheap, M=moderate, L=expensive, XL=very expensive
|
||||||
|
- **Expensive tasks** (L, XL): max 2 concurrent
|
||||||
|
- **Cheap tasks** (S, M): fill remaining slots
|
||||||
|
- Target mix: 1-2 expensive + 2-3 cheap = 4-5 total
|
||||||
|
5. **Slot limit**: Never exceed `--slots` (default 4).
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Pseudocode for batch selection
|
||||||
|
batch = []
|
||||||
|
used_projects = set()
|
||||||
|
expensive_count = 0
|
||||||
|
|
||||||
|
for priority in ["P0", "P1", "P2"]:
|
||||||
|
for task in queue_items(priority, status="pending"):
|
||||||
|
if len(batch) >= MAX_SLOTS:
|
||||||
|
break
|
||||||
|
if task.project in used_projects:
|
||||||
|
continue # One agent per project
|
||||||
|
if not deps_satisfied(task):
|
||||||
|
continue
|
||||||
|
if task.estimate in ("L", "XL"):
|
||||||
|
if expensive_count >= 2:
|
||||||
|
continue
|
||||||
|
expensive_count += 1
|
||||||
|
batch.append(task)
|
||||||
|
used_projects.add(task.project)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 2: Assess and Dispatch
|
||||||
|
|
||||||
|
For each task in the batch, decide the execution strategy:
|
||||||
|
|
||||||
|
| Signal | Strategy | What happens |
|
||||||
|
|--------|----------|-------------|
|
||||||
|
| Estimate S, clear scope | **Direct** | Spawn Agent() with task description, no orchestration |
|
||||||
|
| Estimate M, multi-file | **Direct+** | Spawn Agent() with task + "read code first, run tests after" |
|
||||||
|
| Estimate L/XL, code | **Feature-dev style** | Agent explores → implements → self-reviews (see below) |
|
||||||
|
| Estimate L/XL, writing | **PDCA** | Use af-run with writing domain archetypes |
|
||||||
|
| Task contains "validate", "test", "lint", "check" | **Direct** | Cheap analytical task, no orchestration |
|
||||||
|
| Task contains "review", "audit", "security" | **Review** | Spawn Guardian + relevant reviewers only |
|
||||||
|
|
||||||
|
### L/XL Code Task Template (feature-dev style)
|
||||||
|
|
||||||
|
For complex code tasks, give the agent a structured process instead of PDCA:
|
||||||
|
|
||||||
|
```
|
||||||
|
Agent(
|
||||||
|
description: "<project>: <task-short>",
|
||||||
|
prompt: "You are working on project <project> at <path>.
|
||||||
|
Task: <task description>
|
||||||
|
|
||||||
|
Follow this process:
|
||||||
|
1. EXPLORE: Read CLAUDE.md, docs/status.md, and the relevant source files.
|
||||||
|
Understand existing patterns before writing anything.
|
||||||
|
2. PLAN: Identify 2-3 files to change. Write a brief plan (what, where, why).
|
||||||
|
If ambiguous, list your assumptions.
|
||||||
|
3. IMPLEMENT: Make the changes. Follow existing code patterns strictly.
|
||||||
|
4. TEST: Run the project's test suite. Fix any failures.
|
||||||
|
5. SELF-REVIEW: Before committing, re-read your diff. Check:
|
||||||
|
- Error handling: what happens when this fails?
|
||||||
|
- Protocol compliance: am I using the right function signatures?
|
||||||
|
- Tests: did I test the important paths?
|
||||||
|
6. COMMIT + PUSH: Conventional commits, signed, pushed.
|
||||||
|
|
||||||
|
<standard rules>
|
||||||
|
|
||||||
|
STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED"
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
This gives the agent feature-dev's structured exploration without the multi-agent overhead.
|
||||||
|
For writing/research L/XL tasks, use af-run instead — archetypes add value where linters don't exist.
|
||||||
|
|
||||||
|
**Agent spawn template:**
|
||||||
|
|
||||||
|
For each task in the batch, spawn an Agent in the SAME message (parallel dispatch):
|
||||||
|
|
||||||
|
```
|
||||||
|
Agent(
|
||||||
|
description: "<project>: <task-short>",
|
||||||
|
prompt: "You are working on project <project> at <path>.
|
||||||
|
Task: <task description>
|
||||||
|
<notes if any>
|
||||||
|
|
||||||
|
Rules:
|
||||||
|
- Read the project's CLAUDE.md first
|
||||||
|
- Commit with: git -c user.signingkey=/home/c/.ssh/id_ed25519_dev.pub commit
|
||||||
|
- NO Co-Authored-By trailers
|
||||||
|
- Conventional commits
|
||||||
|
- Push when done: GIT_SSH_COMMAND='ssh -i /home/c/.ssh/id_ed25519_dev -o IdentitiesOnly=yes' git push origin main
|
||||||
|
- Run tests if the project has them
|
||||||
|
- Report: what you did, what changed, any blockers
|
||||||
|
|
||||||
|
STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED",
|
||||||
|
subagent_type: "general-purpose",
|
||||||
|
isolation: "worktree" # Only for L/XL tasks; S/M tasks run directly
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
**CRITICAL: Spawn all batch agents in a SINGLE message.** This enables parallel execution.
|
||||||
|
Do not spawn them sequentially.
|
||||||
|
|
||||||
|
### Step 3: Mark Running
|
||||||
|
|
||||||
|
After spawning, update the queue:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# For each spawned task
|
||||||
|
./scripts/ws start <task-id> # or manually update queue.json status to "running"
|
||||||
|
```
|
||||||
|
|
||||||
|
If `./scripts/ws start` doesn't exist, update queue.json directly:
|
||||||
|
```python
|
||||||
|
task["status"] = "running"
|
||||||
|
# Write back to docs/orchestra/queue.json
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 4: Collect Results
|
||||||
|
|
||||||
|
As agents complete, process their results:
|
||||||
|
|
||||||
|
1. **Parse status token** from agent output (last line: `STATUS: DONE|...`)
|
||||||
|
2. **Based on status**:
|
||||||
|
- `DONE` → mark completed, note result
|
||||||
|
- `DONE_WITH_CONCERNS` → mark completed, log concerns for user review
|
||||||
|
- `NEEDS_CONTEXT` → mark pending, add concern to notes, skip for now
|
||||||
|
- `BLOCKED` → mark failed, add blocker to notes
|
||||||
|
3. **Update queue**:
|
||||||
|
```bash
|
||||||
|
./scripts/ws done <task-id> -r "<summary of what was done>"
|
||||||
|
# or
|
||||||
|
./scripts/ws fail <task-id> -r "<reason>"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Step 5: Report and Loop
|
||||||
|
|
||||||
|
After batch completes, show sprint status:
|
||||||
|
|
||||||
|
```
|
||||||
|
── Sprint Batch 1 ──────────────────────────────
|
||||||
|
✓ writing.colette fanout run done (45s)
|
||||||
|
✓ book.3sets validation done (30s)
|
||||||
|
△ book.sos meta-book concept needs_context (missing outline)
|
||||||
|
✓ tool.archeflow af-review mode done (60s)
|
||||||
|
|
||||||
|
Queue: 3 completed, 1 blocked, 3 remaining
|
||||||
|
Next batch: 2 items ready
|
||||||
|
────────────────────────────────────────────────
|
||||||
|
```
|
||||||
|
|
||||||
|
Then **immediately select and dispatch the next batch** (Step 1). Don't wait for user input in AUTONOM mode.
|
||||||
|
|
||||||
|
### Step 6: Sprint Complete
|
||||||
|
|
||||||
|
When no more tasks are schedulable (all done, blocked, or P3-only):
|
||||||
|
|
||||||
|
1. Update `docs/control-center.md` Handoff section
|
||||||
|
2. Run `./scripts/ws log --summary "<sprint summary>"` if available
|
||||||
|
3. Show final sprint report:
|
||||||
|
|
||||||
|
```
|
||||||
|
── Sprint Complete ─────────────────────────────
|
||||||
|
Duration: 12 min
|
||||||
|
Tasks: 5 completed, 1 blocked, 1 remaining (P3)
|
||||||
|
Projects touched: 4
|
||||||
|
Commits: 7
|
||||||
|
────────────────────────────────────────────────
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Mode Behavior
|
||||||
|
|
||||||
|
### AUTONOM
|
||||||
|
- Dispatch immediately, no user confirmation
|
||||||
|
- Commit + push after each agent completes
|
||||||
|
- Only pause for BLOCKED tasks or budget exhaustion
|
||||||
|
- Report between batches (one-line status)
|
||||||
|
|
||||||
|
### ATTENDED
|
||||||
|
- Show the selected batch before dispatching
|
||||||
|
- Wait for user to approve: "Proceed with this batch? [y/n]"
|
||||||
|
- After each batch, show results and ask: "Continue to next batch? [y/n/edit]"
|
||||||
|
- "edit" lets the user reprioritize before next batch
|
||||||
|
|
||||||
|
### PAUSED
|
||||||
|
- Show queue status only
|
||||||
|
- Do not dispatch any agents
|
||||||
|
- Useful for reviewing state between sessions
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## When to Use ArcheFlow Orchestration Within Sprint
|
||||||
|
|
||||||
|
Most sprint tasks should be **direct agent dispatch** (no PDCA/pipeline overhead).
|
||||||
|
Only escalate to full orchestration when:
|
||||||
|
|
||||||
|
| Signal | Action |
|
||||||
|
|--------|--------|
|
||||||
|
| Task is S/M, clear scope, single project | Direct dispatch |
|
||||||
|
| Task is L/XL | Use pipeline or PDCA strategy |
|
||||||
|
| Task mentions "security", "auth", "encryption" | Add Guardian review |
|
||||||
|
| Task is a review/audit | Spawn reviewers only (af-review mode) |
|
||||||
|
| Task failed in a previous sprint | Escalate to PDCA with Explorer |
|
||||||
|
|
||||||
|
The sprint runner's job is **throughput**, not perfection. Ship fast, fix forward.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Integration with Existing Tools
|
||||||
|
|
||||||
|
| Tool | How sprint uses it |
|
||||||
|
|------|-------------------|
|
||||||
|
| `./scripts/ws next` | Get next schedulable task |
|
||||||
|
| `./scripts/ws done <id>` | Mark task completed |
|
||||||
|
| `./scripts/ws fail <id>` | Mark task failed |
|
||||||
|
| `./scripts/ws orient` | Initial workspace overview |
|
||||||
|
| `./scripts/ws validate` | Pre-flight queue validation |
|
||||||
|
| `git` per project | Commit + push after each agent |
|
||||||
|
| `archeflow:run` | Only for L/XL tasks needing PDCA |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Error Recovery
|
||||||
|
|
||||||
|
- **Agent crashes mid-task**: Mark task as `failed`, add error to notes, continue with next batch
|
||||||
|
- **Git push fails**: Log the error, do NOT retry. User will handle push conflicts manually.
|
||||||
|
- **Queue file corrupted**: Run `./scripts/ws validate`. If invalid, stop sprint and report.
|
||||||
|
- **Budget exceeded**: Stop sprint, report remaining tasks and estimated cost.
|
||||||
|
- **All tasks blocked**: Report dependency graph, suggest which blockers to resolve first.
|
||||||
@@ -17,16 +17,27 @@ Where `<domain>` is auto-detected: `writing` if `colette.yaml` exists, `research
|
|||||||
|
|
||||||
During runs, follow the `archeflow:presence` skill for output format: show outcomes not mechanics, one line per phase, value at the end.
|
During runs, follow the `archeflow:presence` skill for output format: show outcomes not mechanics, one line per phase, value at the end.
|
||||||
|
|
||||||
## IMPORTANT: When to Activate
|
## IMPORTANT: When to Use What
|
||||||
|
|
||||||
You MUST use ArcheFlow orchestration (load `archeflow:orchestration` skill and follow its steps) for any task that matches:
|
### Use `/af-sprint` (primary mode) when:
|
||||||
|
- User says "run the sprint", "work the queue", "go autonomous"
|
||||||
|
- Multiple tasks are pending across projects
|
||||||
|
- The workspace queue (docs/orchestra/queue.json) has pending items
|
||||||
|
|
||||||
- **New features** -- any feature touching 2+ files
|
### Use `/af-review` when:
|
||||||
- **Refactoring** -- structural changes across modules
|
- User wants to review code before merging
|
||||||
- **Security-sensitive changes** -- auth, encryption, input handling, API keys
|
- A diff, branch, or commit range needs quality check
|
||||||
- **Bug fixes with unclear root cause** -- use Explorer to investigate first
|
- Security-sensitive changes need Guardian analysis
|
||||||
- **Code review requests** -- spawn Guardian + relevant reviewers
|
|
||||||
- **Multi-file changes** -- anything beyond a single-file edit
|
### Use `/af-run` (deep orchestration) when:
|
||||||
|
- **Writing/research tasks** -- archetypes add value where linters don't exist
|
||||||
|
- **Security-sensitive code changes** -- auth, encryption, API keys
|
||||||
|
- **Complex multi-module refactors** with unclear approach
|
||||||
|
|
||||||
|
### Do NOT use ArcheFlow for:
|
||||||
|
- **Single-feature code development** -- use `feature-dev` plugin or work directly
|
||||||
|
- **Simple fixes** -- just do them
|
||||||
|
- **Questions, exploration, reading** -- no code changes needed
|
||||||
|
|
||||||
Choose the workflow based on risk:
|
Choose the workflow based on risk:
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user