feat: add sprint runner and review-only skills

2026-04-04 18:21:19 +02:00
parent aebf55a9a7
commit 6309614bfa
2 changed files with 410 additions and 0 deletions
--- a/skills/review/SKILL.md
+++ b/skills/review/SKILL.md
@@ -0,0 +1,141 @@
+---
+name: review
+description: |
+  Review-only mode. Run Guardian + optional reviewers on an existing diff or branch,
+  without any Plan/Do orchestration. The highest-ROI mode for catching design-level bugs.
+  <example>User: "af-review"</example>
+  <example>User: "Review the last commit"</example>
+  <example>User: "af-review --reviewers guardian,skeptic"</example>
+---
+
+# ArcheFlow Review Mode
+
+Run reviewers on existing code changes without orchestrating implementation.
+This is the most cost-effective mode — it delivers Guardian's error-path analysis
+without the Maker overhead.
+
+## When to Use
+
+- After you've implemented something and want a quality check
+- On a PR or branch before merging
+- When the sprint runner flags a task as DONE_WITH_CONCERNS
+- As a pre-commit quality gate for complex changes
+
+## Invocation
+
+```
+af-review                                    # Review uncommitted changes
+af-review --branch feat/batch-api            # Review branch diff against main
+af-review --commit HEAD~3..HEAD              # Review last 3 commits
+af-review --reviewers guardian,skeptic,sage   # Choose reviewers (default: guardian)
+af-review --evidence                         # Enable evidence-gating (stricter)
+```
+
+---
+
+## Execution
+
+### Step 1: Get the Diff
+
+```bash
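+# Uncommitted changes (staged and unstaged; untracked files are not included)
+DIFF=$(git diff HEAD)
+
+# Branch diff (assumes the branch under review is checked out)
+DIFF=$(git diff main...HEAD)
+
+# Commit range
+DIFF=$(git diff HEAD~3..HEAD)
+
+# If the diff is too large (>500 lines), split by file
+if [[ $(wc -l <<< "$DIFF") -gt 500 ]]; then
+  # Review per-file to keep context focused.
+  # Use the same range here as for DIFF above (HEAD shown for uncommitted changes).
+  FILES=$(git diff --name-only HEAD)
+fi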
+```
+
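The snippet above stops at collecting file names. A minimal sketch of the per-file split itself, assuming the diff text is in git's standard unified format, where each file section begins with a `diff --git` line. The function name `split_diff_by_file` and its `max_lines` parameter are illustrative and not part of ArcheFlow:

```python
import re


def split_diff_by_file(diff_text, max_lines=500):
    """Return one chunk for a small diff, otherwise one chunk per file."""
    if len(diff_text.splitlines()) <= max_lines:
        return [diff_text]
    # Split immediately before every "diff --git" header, keeping the header
    # attached to the file section that follows it.
    parts = re.split(r"(?m)^(?=diff --git )", diff_text)
    return [part for part in parts if part.strip()]
```

Each chunk can then be handed to a reviewer on its own, which keeps the per-agent context small at the cost of losing cross-file context.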
+### Step 2: Spawn Reviewers
+
+Default: Guardian only (fastest, highest ROI).
+With `--reviewers`: spawn requested reviewers in parallel.
+
+**Guardian** (always first):
+```
+Agent(
+  description: "Guardian: review changes for <project>",
+  prompt: "You are the GUARDIAN archetype — security and risk reviewer.
+
+    Review this diff for: security vulnerabilities, error handling gaps,
+    data loss scenarios, race conditions, and breaking changes.
+
+    For each finding: cite specific code (file:line), state what you tested
+    or observed, state what the correct behavior should be.
+
+    Diff:
+    <DIFF>
+
+    STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED",
+  subagent_type: "code-reviewer"
+)
+```
+
+**Skeptic** (if requested):
+- Focus: hidden assumptions, edge cases, scalability
+- Context: diff + any design docs
+
+**Sage** (if requested):
+- Focus: code quality, test coverage, maintainability
+- Context: diff + surrounding code
+
+**Trickster** (if requested):
+- Focus: adversarial inputs, failure injection, chaos testing
+- Context: diff only
+
+### Step 3: Collect and Report
+
+Parse each reviewer's output. Show findings:
+
+```
+── af-review: <project> ───────────────────────
+Reviewers: guardian, skeptic
+
+🛡️ Guardian: 2 findings (1 HIGH, 1 MEDIUM)
+  [HIGH] Timeout marks variant as done — loses batch state (fanout.py:552)
+  [MEDIUM] No JSON error handling on corrupted state (batch.py:310)
+
+🤔 Skeptic: 1 finding (1 INFO)
+  [INFO] hash() non-deterministic across processes (fanout.py:524)
+
+Total: 3 findings (1 HIGH, 1 MEDIUM, 1 INFO)
+────────────────────────────────────────────────
+```
+
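The `Total:` line in the report above can be derived mechanically from the parsed findings. A small sketch, assuming each finding is held as a `(reviewer, severity, message)` tuple; that representation and the `summarize` helper are hypothetical, and only the severities that appear in the sample report are given a fixed order:

```python
from collections import Counter

# Severities seen in the sample report, most severe first (assumed ordering).
SEVERITY_ORDER = ["HIGH", "MEDIUM", "INFO"]


def summarize(findings):
    """Build the 'Total: ...' line from (reviewer, severity, message) tuples."""
    if not findings:
        return "Total: 0 findings"
    counts = Counter(severity for _, severity, _ in findings)
    ordered = [s for s in SEVERITY_ORDER if counts[s]]
    # Any severity not listed above is appended alphabetically.
    ordered += sorted(s for s in counts if s not in SEVERITY_ORDER)
    breakdown = ", ".join(f"{counts[s]} {s}" for s in ordered)
    noun = "finding" if len(findings) == 1 else "findings"
    return f"Total: {len(findings)} {noun} ({breakdown})"
```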
+### Step 4: Evidence Gate (if --evidence)
+
+When `--evidence` is active, apply the evidence requirements from `archeflow:check-phase`:
+- Scan findings for banned phrases ("might be", "could potentially", etc.)
+- Check for evidence markers (exit codes, line numbers, reproduction steps)
+- Downgrade unsupported findings to INFO
+
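The gate described above could be sketched as follows. The authoritative rules live in `archeflow:check-phase`, which is not shown here, so this is an illustration under assumptions: a finding is a dict with `severity` and `text` keys, the banned list contains only the two phrases named above, evidence is detected with two simple regular expressions, and a finding is downgraded when it either uses a banned phrase or carries no evidence marker.

```python
import re

# Only the two phrases named in the text; the full list is defined elsewhere.
BANNED_PHRASES = ("might be", "could potentially")

# Assumed evidence markers: a file:line reference or an explicit exit code.
EVIDENCE_PATTERNS = (
    re.compile(r"\b[\w./-]+\.\w+:\d+\b"),
    re.compile(r"\bexit code \d+\b", re.IGNORECASE),
)


def apply_evidence_gate(finding):
    """Return a copy of the finding, downgraded to INFO if it is unsupported."""
    text = finding["text"]
    hedged = any(phrase in text.lower() for phrase in BANNED_PHRASES)
    evidenced = any(pattern.search(text) for pattern in EVIDENCE_PATTERNS)
    if hedged or not evidenced:
        return {**finding, "severity": "INFO"}
    return dict(finding)
```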
+---
+
+## Integration with Sprint Runner
+
+The sprint runner can invoke `af-review` automatically:
+
+| Sprint trigger | Review action |
+|----------------|--------------|
+| Task marked DONE_WITH_CONCERNS | Run Guardian on the agent's changes |
+| Task is L/XL estimate | Run Guardian + Skeptic after completion |
+| Task involves security keywords | Run Guardian automatically |
+| User requests | Run specified reviewers |
+
+---
+
+## Cost
+
+Review-only is 60-80% cheaper than full PDCA:
+- No Explorer research (~30% of PDCA cost)
+- No Creator planning (~20% of PDCA cost)
+- No Maker implementation (already done)
+- Only reviewer token costs remain
--- a/skills/sprint/SKILL.md
+++ b/skills/sprint/SKILL.md
@@ -0,0 +1,269 @@
+---
+name: sprint
+description: |
+  Workspace sprint runner. Reads queue.json, spawns parallel agent teams across projects,
+  manages lifecycle (commit, push, next task), tracks progress. The main operational mode
+  for ArcheFlow in multi-project workspaces.
+  <example>User: "af-sprint"</example>
+  <example>User: "Run the sprint"</example>
+  <example>User: "af-sprint --slots 5 --dry-run"</example>
+---
+
+# Workspace Sprint Runner
+
+Read the task queue, spawn parallel agents across projects, collect results, commit+push,
+spawn next batch. Repeat until the queue is drained or budget is exhausted.
+
+## When to Use
+
+This is the **primary operational mode** for ArcheFlow in multi-project workspaces.
+Use it when the user says "run the sprint", "work the queue", "go autonomous", or
+invokes `af-sprint`.
+
+Do NOT use `archeflow:run` for individual tasks within a sprint — the sprint runner
+handles task dispatch internally, using `archeflow:run` only when a task warrants
+full PDCA orchestration.
+
+## Prerequisites
+
+- `docs/orchestra/queue.json` — task queue (managed by `./scripts/ws`)
+- `./scripts/ws` — workspace CLI for queue operations
+- Each project is a separate git repo under the workspace root
+
+## Invocation
+
+```
+af-sprint                          # Run sprint with defaults (4 slots, AUTONOM mode)
+af-sprint --slots 5                # Max 5 parallel agents
+af-sprint --dry-run                # Show what would run, don't execute
+af-sprint --priority P0,P1         # Only process P0 and P1 items
+af-sprint --project writing.colette # Only process items for this project
+```
+
+---
+
+## Execution Protocol
+
+### Step 0: Orient
+
+```bash
+# Load queue and workspace state
+QUEUE=$(cat docs/orchestra/queue.json)
+MODE=$(echo "$QUEUE" | jq -r '.mode')
+```
+
+Check mode:
+- `AUTONOM` → proceed without asking
+- `ATTENDED` → show plan, wait for user approval before each batch
+- `PAUSED` → report status only, do not start tasks
+
+Show one-line status:
+```
+sprint: AUTONOM · 7 pending (1×P0, 1×P2, 5×P3) · 4 slots
+```
+
+### Step 1: Select Batch
+
+Pick tasks for the next batch. Rules:
+
+1. **Priority cascade**: P0 first, then P1, then P2. Never start P3 unless user explicitly includes it.
+2. **Dependency check**: Skip tasks whose `depends_on` items aren't all `completed`.
+3. **One agent per project**: Never run two tasks on the same project simultaneously.
+4. **Cost-aware concurrency**:
+   - Estimate task cost from `estimate` field: S=cheap, M=moderate, L=expensive, XL=very expensive
+   - **Expensive tasks** (L, XL): max 2 concurrent
+   - **Cheap tasks** (S, M): fill remaining slots
+   - Target mix: 1-2 expensive + 2-3 cheap = 4-5 total
+5. **Slot limit**: Never exceed `--slots` (default 4).
+
+```python
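+# Sketch of batch selection. `queue` is the list of task dicts from queue.json;
+# the field names "id", "project", and "priority" are assumed here.
+def select_batch(queue, max_slots=4):
+    completed = {t["id"] for t in queue if t["status"] == "completed"}
+    batch = []
+    used_projects = set()
+    expensive_count = 0
+
+    for priority in ["P0", "P1", "P2"]:
+        for task in queue:
+            if len(batch) >= max_slots:
+                return batch  # Slot limit reached
+            if task["priority"] != priority or task["status"] != "pending":
+                continue
+            if task["project"] in used_projects:
+                continue  # One agent per project
+            if not completed.issuperset(task.get("depends_on") or []):
+                continue  # Dependencies not all completed
+            if task["estimate"] in ("L", "XL"):
+                if expensive_count >= 2:
+                    continue  # Max 2 expensive tasks per batch
+                expensive_count += 1
+            batch.append(task)
+            used_projects.add(task["project"])
+    return batch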
+```
+
+### Step 2: Assess and Dispatch
+
+For each task in the batch, decide the execution strategy:
+
+| Signal | Strategy | What happens |
+|--------|----------|-------------|
+| Estimate S, clear scope | **Direct** | Spawn Agent() with task description, no orchestration |
+| Estimate M, multi-file | **Pipeline** | Spawn Agent() with af-run --strategy pipeline |
+| Estimate L/XL, complex | **PDCA** | Spawn Agent() with af-run --strategy pdca |
+| Task contains "validate", "test", "lint", "check" | **Direct** | Cheap analytical task, no orchestration |
+| Task contains "review", "audit", "security" | **Review** | Spawn Guardian + relevant reviewers only |
+
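The table above does not say which row wins when several signals apply to the same task. One possible reading, sketched here as an assumption rather than the runner's actual logic: keyword rows take precedence over estimate rows, and review keywords are checked before the cheap-check keywords. The "clear scope" and "multi-file" qualifiers are not modeled, and `choose_strategy` is a hypothetical helper.

```python
import re

REVIEW_WORDS = {"review", "audit", "security"}
CHECK_WORDS = {"validate", "test", "lint", "check"}


def choose_strategy(description, estimate):
    """Map a task description and S/M/L/XL estimate to a strategy name."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    if words & REVIEW_WORDS:
        return "review"    # Guardian plus relevant reviewers only
    if words & CHECK_WORDS:
        return "direct"    # Cheap analytical task
    if estimate in ("L", "XL"):
        return "pdca"
    if estimate == "M":
        return "pipeline"
    return "direct"
```

Matching on whole words avoids false hits such as "latest" triggering the "test" rule.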
+**Agent spawn template:**
+
+For each task in the batch, spawn an Agent in the SAME message (parallel dispatch):
+
+```
+Agent(
+  description: "<project>: <task-short>",
+  prompt: "You are working on project <project> at <path>.
+    Task: <task description>
+    <notes if any>
+
+    Rules:
+    - Read the project's CLAUDE.md first
+    - Commit with: git -c user.signingkey=/home/c/.ssh/id_ed25519_dev.pub commit
+    - NO Co-Authored-By trailers
+    - Conventional commits
+    - Push when done: GIT_SSH_COMMAND='ssh -i /home/c/.ssh/id_ed25519_dev -o IdentitiesOnly=yes' git push origin main
+    - Run tests if the project has them
+    - Report: what you did, what changed, any blockers
+
+    STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED",
+  subagent_type: "general-purpose",
+  isolation: "worktree"  # Only for L/XL tasks; S/M tasks run directly
+)
+```
+
+**CRITICAL: Spawn all batch agents in a SINGLE message.** This enables parallel execution.
+Do not spawn them sequentially.
+
+### Step 3: Mark Running
+
+After spawning, update the queue:
+
+```bash
+# For each spawned task
+./scripts/ws start <task-id>  # or manually update queue.json status to "running"
+```
+
+If `./scripts/ws start` doesn't exist, update queue.json directly:
+```python
+task["status"] = "running"
+# Write back to docs/orchestra/queue.json
+```
+
+### Step 4: Collect Results
+
+As agents complete, process their results:
+
+1. **Parse status token** from agent output (last line: `STATUS: DONE|...`)
+2. **Based on status**:
+   - `DONE` → mark completed, note result
+   - `DONE_WITH_CONCERNS` → mark completed, log concerns for user review
+   - `NEEDS_CONTEXT` → mark pending, add concern to notes, skip for now
+   - `BLOCKED` → mark failed, add blocker to notes
+3. **Update queue**:
+   ```bash
+   ./scripts/ws done <task-id> -r "<summary of what was done>"
+   # or
+   ./scripts/ws fail <task-id> -r "<reason>"
+   ```
+
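The status handling above can be sketched as a small parser. It assumes the token sits on the last non-empty line of the agent output, as the spawn template requests. Treating output without a recognizable token as a failure is an assumption here, in line with the crash handling under Error Recovery; `parse_status` is an illustrative name.

```python
import re

# Queue status to record for each status token, per the list above.
STATUS_ACTIONS = {
    "DONE": "completed",
    "DONE_WITH_CONCERNS": "completed",
    "NEEDS_CONTEXT": "pending",
    "BLOCKED": "failed",
}

STATUS_RE = re.compile(
    r"^STATUS:\s*(DONE_WITH_CONCERNS|DONE|NEEDS_CONTEXT|BLOCKED)\s*$"
)


def parse_status(agent_output):
    """Return (status_token, queue_status) from the last non-empty line."""
    lines = [ln.strip() for ln in agent_output.splitlines() if ln.strip()]
    match = STATUS_RE.match(lines[-1]) if lines else None
    if match is None:
        return (None, "failed")  # Assumption: no token means the agent failed
    token = match.group(1)
    return (token, STATUS_ACTIONS[token])
```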
+### Step 5: Report and Loop
+
+After batch completes, show sprint status:
+
+```
+── Sprint Batch 1 ──────────────────────────────
+  ✓ writing.colette    fanout run           done (45s)
+  ✓ book.3sets         validation           done (30s)
+  △ book.sos           meta-book concept    needs_context (missing outline)
+  ✓ tool.archeflow     af-review mode       done (60s)
+
+Queue: 3 completed, 1 needs context, 3 remaining
+Next batch: 2 items ready
+────────────────────────────────────────────────
+```
+
+Then **immediately select and dispatch the next batch** (Step 1). Don't wait for user input in AUTONOM mode.
+
+### Step 6: Sprint Complete
+
+When no more tasks are schedulable (all done, blocked, or P3-only):
+
+1. Update `docs/control-center.md` Handoff section
+2. Run `./scripts/ws log --summary "<sprint summary>"` if available
+3. Show final sprint report:
+
+```
+── Sprint Complete ─────────────────────────────
+Duration: 12 min
+Tasks: 5 completed, 1 blocked, 1 remaining (P3)
+Projects touched: 4
+Commits: 7
+────────────────────────────────────────────────
+```
+
+---
+
+## Mode Behavior
+
+### AUTONOM
+- Dispatch immediately, no user confirmation
+- Commit + push after each agent completes
+- Only pause for BLOCKED tasks or budget exhaustion
+- Report between batches (one-line status)
+
+### ATTENDED
+- Show the selected batch before dispatching
+- Wait for user to approve: "Proceed with this batch? [y/n]"
+- After each batch, show results and ask: "Continue to next batch? [y/n/edit]"
+- "edit" lets the user reprioritize before next batch
+
+### PAUSED
+- Show queue status only
+- Do not dispatch any agents
+- Useful for reviewing state between sessions
+
+---
+
+## When to Use ArcheFlow Orchestration Within Sprint
+
+Most sprint tasks should be **direct agent dispatch** (no PDCA/pipeline overhead).
+Escalate beyond direct dispatch only when one of the signals below calls for it:
+
+| Signal | Action |
+|--------|--------|
+| Task is S/M, clear scope, single project | Direct dispatch |
+| Task is L/XL | Use pipeline or PDCA strategy |
+| Task mentions "security", "auth", "encryption" | Add Guardian review |
+| Task is a review/audit | Spawn reviewers only (af-review mode) |
+| Task failed in a previous sprint | Escalate to PDCA with Explorer |
+
+The sprint runner's job is **throughput**, not perfection. Ship fast, fix forward.
+
+---
+
+## Integration with Existing Tools
+
+| Tool | How sprint uses it |
+|------|-------------------|
+| `./scripts/ws next` | Get next schedulable task |
+| `./scripts/ws done <id>` | Mark task completed |
+| `./scripts/ws fail <id>` | Mark task failed |
+| `./scripts/ws orient` | Initial workspace overview |
+| `./scripts/ws validate` | Pre-flight queue validation |
+| `git` per project | Commit + push after each agent |
+| `archeflow:run` | Only for L/XL tasks needing PDCA |
+
+---
+
+## Error Recovery
+
+- **Agent crashes mid-task**: Mark task as `failed`, add error to notes, continue with next batch
+- **Git push fails**: Log the error, do NOT retry. User will handle push conflicts manually.
+- **Queue file corrupted**: Run `./scripts/ws validate`. If invalid, stop sprint and report.
+- **Budget exceeded**: Stop sprint, report remaining tasks and estimated cost.
+- **All tasks blocked**: Report dependency graph, suggest which blockers to resolve first.