feat: add sprint runner and review-only skills
141
skills/review/SKILL.md
Normal file
@@ -0,0 +1,141 @@
---
name: review
description: |
  Review-only mode. Run Guardian + optional reviewers on an existing diff or branch,
  without any Plan/Do orchestration. The highest-ROI mode for catching design-level bugs.
  <example>User: "af-review"</example>
  <example>User: "Review the last commit"</example>
  <example>User: "af-review --reviewers guardian,skeptic"</example>
---

# ArcheFlow Review Mode

Run reviewers on existing code changes without orchestrating implementation.
This is the most cost-effective mode: it delivers Guardian's error-path analysis
without the Maker overhead.

## When to Use

- After you've implemented something and want a quality check
- On a PR or branch before merging
- When the sprint runner flags a task as DONE_WITH_CONCERNS
- As a pre-commit quality gate for complex changes

## Invocation

```
af-review                                    # Review uncommitted changes
af-review --branch feat/batch-api            # Review branch diff against main
af-review --commit HEAD~3..HEAD              # Review last 3 commits
af-review --reviewers guardian,skeptic,sage  # Choose reviewers (default: guardian)
af-review --evidence                         # Enable evidence-gating (stricter)
```

---

## Execution

### Step 1: Get the Diff

```bash
# Pick the diff range for the invocation mode (use exactly one).
# BRANCH is a placeholder for the value passed to --branch.
RANGE="HEAD"               # Uncommitted changes
# RANGE="main...$BRANCH"   # Branch diff against main
# RANGE="HEAD~3..HEAD"     # Commit range

DIFF=$(git diff "$RANGE")

# If diff is too large (>500 lines), split by file
if [[ $(echo "$DIFF" | wc -l) -gt 500 ]]; then
  # Review per-file to keep context focused; list files for the same range
  FILES=$(git diff --name-only "$RANGE")
fi
```
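
The per-file split is only named in the shell snippet, so here is a minimal Python sketch of one way to do it. It assumes the diff text is standard `git diff` output with `diff --git a/... b/...` headers; the function names `split_diff_by_file` and `review_units` are hypothetical, and paths that themselves contain ` b/` are not handled.

```python
import re

# Header that `git diff` emits at the start of each file's section.
FILE_HEADER = re.compile(r"^diff --git a/(.+) b/(.+)$")


def split_diff_by_file(diff_text: str) -> dict[str, str]:
    """Map each changed path to its own slice of a unified diff."""
    chunks: dict[str, str] = {}
    current = None
    for line in diff_text.splitlines(keepends=True):
        header = FILE_HEADER.match(line.rstrip("\r\n"))
        if header:
            current = header.group(2)  # path on the "b/" (new) side
            chunks[current] = ""
        if current is not None:
            chunks[current] += line
    return chunks


def review_units(diff_text: str, max_lines: int = 500) -> list[str]:
    """Return the whole diff if it is small, otherwise one chunk per file."""
    if len(diff_text.splitlines()) <= max_lines:
        return [diff_text]
    # Fall back to the whole diff if no file headers were recognised.
    return list(split_diff_by_file(diff_text).values()) or [diff_text]
```

Each element returned by `review_units` could then be substituted for `<DIFF>` in a separate reviewer prompt.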

### Step 2: Spawn Reviewers

Default: Guardian only (fastest, highest ROI).
With `--reviewers`: spawn requested reviewers in parallel.

**Guardian** (always first):
```
Agent(
  description: "Guardian: review changes for <project>",
  prompt: "You are the GUARDIAN archetype: security and risk reviewer.

    Review this diff for: security vulnerabilities, error handling gaps,
    data loss scenarios, race conditions, and breaking changes.

    For each finding: cite specific code (file:line), state what you tested
    or observed, state what the correct behavior should be.

    Diff:
    <DIFF>

    STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED",
  subagent_type: "code-reviewer"
)
```

**Skeptic** (if requested):
- Focus: hidden assumptions, edge cases, scalability
- Context: diff + any design docs

**Sage** (if requested):
- Focus: code quality, test coverage, maintainability
- Context: diff + surrounding code

**Trickster** (if requested):
- Focus: adversarial inputs, failure injection, chaos testing
- Context: diff only

### Step 3: Collect and Report

Parse each reviewer's output. Show findings:

```
── af-review: <project> ───────────────────────
Reviewers: guardian, skeptic

🛡️ Guardian: 2 findings (1 HIGH, 1 MEDIUM)
  [HIGH] Timeout marks variant as done, losing batch state (fanout.py:552)
  [MEDIUM] No JSON error handling on corrupted state (batch.py:310)

🤔 Skeptic: 1 finding (1 INFO)
  [INFO] hash() non-deterministic across processes (fanout.py:524)

Total: 3 findings (1 HIGH, 1 MEDIUM, 1 INFO)
────────────────────────────────────────────────
```
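
As a rough illustration of the parse-and-tally step, the sketch below assumes reviewers emit one finding per line in the `[SEVERITY] message (file:line)` shape used in the sample report. That line format, the three severity levels, and the names `parse_findings` and `summarize` are assumptions for illustration, not a documented ArcheFlow interface.

```python
import re
from collections import Counter

SEVERITIES = ("HIGH", "MEDIUM", "INFO")  # levels that appear in the sample report
FINDING = re.compile(r"^\[(HIGH|MEDIUM|INFO)\]\s+(.*?)\s+\(([^():\s]+):(\d+)\)$")


def parse_findings(reviewer_output: str) -> list[dict]:
    """Extract findings from lines shaped like '[HIGH] message (file.py:12)'."""
    findings = []
    for line in reviewer_output.splitlines():
        match = FINDING.match(line.strip())
        if match:
            severity, message, path, lineno = match.groups()
            findings.append(
                {"severity": severity, "message": message, "file": path, "line": int(lineno)}
            )
    return findings


def summarize(findings: list[dict]) -> str:
    """Build the '3 findings (1 HIGH, 1 MEDIUM, 1 INFO)' style summary."""
    if not findings:
        return "0 findings"
    counts = Counter(f["severity"] for f in findings)
    breakdown = ", ".join(f"{counts[s]} {s}" for s in SEVERITIES if counts[s])
    noun = "finding" if len(findings) == 1 else "findings"
    return f"{len(findings)} {noun} ({breakdown})"
```

Running `summarize` once per reviewer and once over the combined list gives the per-reviewer lines and the `Total:` line.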

### Step 4: Evidence Gate (if --evidence)

When `--evidence` is active, apply the evidence requirements from `archeflow:check-phase`:
- Scan findings for banned phrases ("might be", "could potentially", etc.)
- Check for evidence markers (exit codes, line numbers, reproduction steps)
- Downgrade unsupported findings to INFO
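
A minimal sketch of such a gate follows. The authoritative rules live in `archeflow:check-phase`, which is not shown here, so everything below is an assumption: the phrase list contains only the two examples quoted above, the marker patterns are guesses at what "exit codes, line numbers, reproduction steps" look like in text, and a hedged phrase is treated as grounds for downgrade on its own.

```python
import re

BANNED_PHRASES = ("might be", "could potentially")  # only the two examples named above
EVIDENCE_MARKERS = (
    re.compile(r"[\w./-]+\.\w+:\d+"),                        # file:line citation
    re.compile(r"\bexit code \d+", re.IGNORECASE),           # observed exit code
    re.compile(r"\brepro(duce[sd]?|duction)?\b", re.IGNORECASE),  # reproduction steps
)


def apply_evidence_gate(finding: dict) -> dict:
    """Downgrade a finding to INFO when it is hedged or cites no evidence."""
    message = finding["message"]
    hedged = any(phrase in message.lower() for phrase in BANNED_PHRASES)
    supported = any(marker.search(message) for marker in EVIDENCE_MARKERS)
    if finding["severity"] != "INFO" and (hedged or not supported):
        return {**finding, "severity": "INFO", "downgraded_from": finding["severity"]}
    return finding
```

Keeping the original severity in `downgraded_from` lets the report show what the reviewer claimed before the gate was applied.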

---

## Integration with Sprint Runner

The sprint runner can invoke `af-review` automatically:

| Sprint trigger | Review action |
|----------------|--------------|
| Task marked DONE_WITH_CONCERNS | Run Guardian on the agent's changes |
| Task is L/XL estimate | Run Guardian + Skeptic after completion |
| Task involves security keywords | Run Guardian automatically |
| User requests | Run specified reviewers |
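
The trigger table can be read as a small decision function. The sketch below is one possible encoding, under several assumptions: the task is a dict with `description`, `estimate`, and `agent_status` keys (hypothetical field names), the security keyword list is illustrative because the table does not enumerate one, and an explicit user request overrides the automatic triggers.

```python
# Illustrative keyword list; the trigger table does not enumerate one.
SECURITY_KEYWORDS = ("security", "auth", "encryption")


def reviewers_for_task(task: dict, requested=None) -> list[str]:
    """Pick reviewers for a finished sprint task, following the trigger table."""
    if requested:  # user request: run exactly the specified reviewers
        return list(requested)
    if task.get("estimate") in ("L", "XL"):
        return ["guardian", "skeptic"]
    description = task.get("description", "").lower()
    # Substring match, so "auth" also catches "authentication".
    mentions_security = any(word in description for word in SECURITY_KEYWORDS)
    if task.get("agent_status") == "DONE_WITH_CONCERNS" or mentions_security:
        return ["guardian"]
    return []  # no trigger fired, so no automatic review
```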

---

## Cost

Review-only costs roughly 60-80% less than a full PDCA run, because it skips most of the cycle:
- No Explorer research (~30% of PDCA cost)
- No Creator planning (~20% of PDCA cost)
- No Maker implementation (already done)
- Only reviewer token costs remain
269
skills/sprint/SKILL.md
Normal file
@@ -0,0 +1,269 @@
---
name: sprint
description: |
  Workspace sprint runner. Reads queue.json, spawns parallel agent teams across projects,
  manages lifecycle (commit, push, next task), tracks progress. The main operational mode
  for ArcheFlow in multi-project workspaces.
  <example>User: "af-sprint"</example>
  <example>User: "Run the sprint"</example>
  <example>User: "af-sprint --slots 5 --dry-run"</example>
---

# Workspace Sprint Runner

Read the task queue, spawn parallel agents across projects, collect results, commit+push,
spawn next batch. Repeat until the queue is drained or budget is exhausted.

## When to Use

This is the **primary operational mode** for ArcheFlow in multi-project workspaces.
Use it when the user says "run the sprint", "work the queue", "go autonomous", or
invokes `af-sprint`.

Do NOT use `archeflow:run` for individual tasks within a sprint. The sprint runner
handles task dispatch internally, using `archeflow:run` only when a task warrants
full PDCA orchestration.

## Prerequisites

- `docs/orchestra/queue.json`: task queue (managed by `./scripts/ws`)
- `./scripts/ws`: workspace CLI for queue operations
- Each project is a separate git repo under the workspace root

## Invocation

```
af-sprint                            # Run sprint with defaults (4 slots, AUTONOM mode)
af-sprint --slots 5                  # Max 5 parallel agents
af-sprint --dry-run                  # Show what would run, don't execute
af-sprint --priority P0,P1           # Only process P0 and P1 items
af-sprint --project writing.colette  # Only process items for this project
```

---

## Execution Protocol

### Step 0: Orient

```bash
# Load queue and workspace state
QUEUE=$(cat docs/orchestra/queue.json)
MODE=$(echo "$QUEUE" | jq -r '.mode')
```

Check mode:
- `AUTONOM` → proceed without asking
- `ATTENDED` → show plan, wait for user approval before each batch
- `PAUSED` → report status only, do not start tasks

Show one-line status:
```
sprint: AUTONOM · 7 pending (1×P0, 1×P2, 5×P3) · 4 slots
```
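
For illustration, the status line and the mode check could be produced as sketched below. Only the top-level `mode` field is confirmed by the `jq` call above; the `items` list and the per-item `status` and `priority` keys are assumed names for the queue schema, and `status_line` and `may_dispatch` are hypothetical helpers.

```python
from collections import Counter


def status_line(queue: dict, slots: int = 4) -> str:
    """Format the one-line sprint status from an already-loaded queue."""
    pending = [t for t in queue["items"] if t["status"] == "pending"]
    per_priority = Counter(t["priority"] for t in pending)
    breakdown = ", ".join(f"{n}×{prio}" for prio, n in sorted(per_priority.items()))
    detail = f" ({breakdown})" if breakdown else ""
    return f"sprint: {queue['mode']} · {len(pending)} pending{detail} · {slots} slots"


def may_dispatch(mode: str) -> bool:
    """PAUSED (or any unknown mode) reports status only and starts no tasks."""
    return mode in ("AUTONOM", "ATTENDED")
```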

### Step 1: Select Batch

Pick tasks for the next batch. Rules:

1. **Priority cascade**: P0 first, then P1, then P2. Never start P3 unless the user explicitly includes it.
2. **Dependency check**: Skip tasks whose `depends_on` items aren't all `completed`.
3. **One agent per project**: Never run two tasks on the same project simultaneously.
4. **Cost-aware concurrency**:
   - Estimate task cost from `estimate` field: S=cheap, M=moderate, L=expensive, XL=very expensive
   - **Expensive tasks** (L, XL): max 2 concurrent
   - **Cheap tasks** (S, M): fill remaining slots
   - Target mix: 1-2 expensive + 2-3 cheap = 4-5 total
5. **Slot limit**: Never exceed `--slots` (default 4).

```python
# Batch selection sketch. Assumes each queue item is a dict with id, project,
# priority, status, and estimate keys, plus an optional depends_on list of ids.
def select_batch(items, max_slots=4, priorities=("P0", "P1", "P2")):
    completed = {t["id"] for t in items if t["status"] == "completed"}
    batch, used_projects, expensive_count = [], set(), 0

    for priority in priorities:  # Priority cascade; P3 only if passed explicitly
        for task in items:
            if len(batch) >= max_slots:
                return batch  # Slot limit reached: stop scanning entirely
            if task["priority"] != priority or task["status"] != "pending":
                continue
            if task["project"] in used_projects:
                continue  # One agent per project
            if not set(task.get("depends_on", [])) <= completed:
                continue  # Unmet dependencies
            if task["estimate"] in ("L", "XL"):
                if expensive_count >= 2:
                    continue  # Max 2 expensive tasks per batch
                expensive_count += 1
            batch.append(task)
            used_projects.add(task["project"])
    return batch
```

### Step 2: Assess and Dispatch

For each task in the batch, decide the execution strategy:

| Signal | Strategy | What happens |
|--------|----------|-------------|
| Estimate S, clear scope | **Direct** | Spawn Agent() with task description, no orchestration |
| Estimate M, multi-file | **Pipeline** | Spawn Agent() with `af-run --strategy pipeline` |
| Estimate L/XL, complex | **PDCA** | Spawn Agent() with `af-run --strategy pdca` |
| Task contains "validate", "test", "lint", "check" | **Direct** | Cheap analytical task, no orchestration |
| Task contains "review", "audit", "security" | **Review** | Spawn Guardian + relevant reviewers only |
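
The table leaves open which row wins when several signals match (for example an L task whose description says "lint"). The sketch below makes one choice explicit: keyword rows take precedence over the estimate, and review keywords beat the cheap-analysis keywords. That precedence, the whole-word matching, and the function name `choose_strategy` are assumptions, not something the table specifies.

```python
import re

DIRECT_KEYWORDS = {"validate", "test", "lint", "check"}
REVIEW_KEYWORDS = {"review", "audit", "security"}
BY_ESTIMATE = {"S": "direct", "M": "pipeline", "L": "pdca", "XL": "pdca"}


def choose_strategy(description: str, estimate: str) -> str:
    """Return 'direct', 'pipeline', 'pdca', or 'review' for a task."""
    # Whole-word match: "test" matches, "latest" and "tests" do not.
    words = set(re.findall(r"[a-z]+", description.lower()))
    if words & REVIEW_KEYWORDS:
        return "review"
    if words & DIRECT_KEYWORDS:
        return "direct"
    return BY_ESTIMATE[estimate]
```

If the opposite precedence is wanted (estimate first, keywords as a tiebreaker), only the order of the three checks changes.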

**Agent spawn template:**

For each task in the batch, spawn an Agent in the SAME message (parallel dispatch):

```
Agent(
  description: "<project>: <task-short>",
  prompt: "You are working on project <project> at <path>.
    Task: <task description>
    <notes if any>

    Rules:
    - Read the project's CLAUDE.md first
    - Commit with: git -c user.signingkey=/home/c/.ssh/id_ed25519_dev.pub commit
    - NO Co-Authored-By trailers
    - Conventional commits
    - Push when done: GIT_SSH_COMMAND='ssh -i /home/c/.ssh/id_ed25519_dev -o IdentitiesOnly=yes' git push origin main
    - Run tests if the project has them
    - Report: what you did, what changed, any blockers

    STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED",
  subagent_type: "general-purpose",
  isolation: "worktree"  # Only for L/XL tasks; S/M tasks run directly
)
```

**CRITICAL: Spawn all batch agents in a SINGLE message.** This enables parallel execution.
Do not spawn them sequentially.

### Step 3: Mark Running

After spawning, update the queue:

```bash
# For each spawned task
./scripts/ws start <task-id>  # or manually update queue.json status to "running"
```

If `./scripts/ws start` doesn't exist, update queue.json directly:
```python
task["status"] = "running"
# Write back to docs/orchestra/queue.json
```
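
A fuller sketch of the direct update is shown below. It assumes the queue file is a JSON object with an `items` list whose entries carry `id` and `status` keys (the real schema is defined by `./scripts/ws` and may differ), and `set_task_status` is a hypothetical helper. The write goes through a temporary file and `os.replace` so that a crash mid-write cannot leave a truncated queue behind.

```python
import json
import os
import tempfile
from pathlib import Path


def set_task_status(queue_path, task_id: str, status: str) -> None:
    """Set one task's status in queue.json and write the file back atomically."""
    path = Path(queue_path)
    queue = json.loads(path.read_text(encoding="utf-8"))
    for task in queue["items"]:  # "items" is an assumed key name
        if task["id"] == task_id:
            task["status"] = status
            break
    else:
        raise KeyError(f"task {task_id!r} not found in {path}")

    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(queue, handle, indent=2)
        handle.write("\n")
    os.replace(tmp_name, path)
```

A call such as `set_task_status("docs/orchestra/queue.json", task_id, "running")` would then stand in for `./scripts/ws start`.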

### Step 4: Collect Results

As agents complete, process their results:

1. **Parse status token** from agent output (last line: `STATUS: DONE|...`)
2. **Based on status**:
   - `DONE` → mark completed, note result
   - `DONE_WITH_CONCERNS` → mark completed, log concerns for user review
   - `NEEDS_CONTEXT` → mark pending, add concern to notes, skip for now
   - `BLOCKED` → mark failed, add blocker to notes
3. **Update queue**:
   ```bash
   ./scripts/ws done <task-id> -r "<summary of what was done>"
   # or
   ./scripts/ws fail <task-id> -r "<reason>"
   ```
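
The status handling above can be sketched as a small parser plus a lookup table. The helper names are hypothetical, and treating output with no recognisable token as `failed` is an assumption borrowed from the crash rule under Error Recovery.

```python
import re

# Status token -> queue status, as listed in the steps above.
QUEUE_STATUS = {
    "DONE": "completed",
    "DONE_WITH_CONCERNS": "completed",
    "NEEDS_CONTEXT": "pending",
    "BLOCKED": "failed",
}


def parse_status(agent_output: str):
    """Return the token from the last non-empty line, or None if absent."""
    lines = [line.strip() for line in agent_output.splitlines() if line.strip()]
    if not lines:
        return None
    match = re.fullmatch(r"STATUS:\s*([A-Z_]+)", lines[-1])
    if match and match.group(1) in QUEUE_STATUS:
        return match.group(1)
    return None


def queue_status_for(agent_output: str) -> str:
    """Map agent output to the queue status to record for the task."""
    token = parse_status(agent_output)
    # Assumption: a missing or malformed token is handled like a crashed agent.
    return QUEUE_STATUS.get(token, "failed")
```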

### Step 5: Report and Loop

After batch completes, show sprint status:

```
── Sprint Batch 1 ──────────────────────────────
  ✓ writing.colette   fanout run          done (45s)
  ✓ book.3sets        validation          done (30s)
  △ book.sos          meta-book concept   needs_context (missing outline)
  ✓ tool.archeflow    af-review mode      done (60s)

  Queue: 3 completed, 1 blocked, 3 remaining
  Next batch: 2 items ready
────────────────────────────────────────────────
```

Then **immediately select and dispatch the next batch** (Step 1). Don't wait for user input in AUTONOM mode.

### Step 6: Sprint Complete

When no more tasks are schedulable (all done, blocked, or P3-only):

1. Update `docs/control-center.md` Handoff section
2. Run `./scripts/ws log --summary "<sprint summary>"` if available
3. Show final sprint report:

```
── Sprint Complete ─────────────────────────────
  Duration: 12 min
  Tasks: 5 completed, 1 blocked, 1 remaining (P3)
  Projects touched: 4
  Commits: 7
────────────────────────────────────────────────
```

---

## Mode Behavior

### AUTONOM
- Dispatch immediately, no user confirmation
- Commit + push after each agent completes
- Only pause for BLOCKED tasks or budget exhaustion
- Report between batches (one-line status)

### ATTENDED
- Show the selected batch before dispatching
- Wait for user to approve: "Proceed with this batch? [y/n]"
- After each batch, show results and ask: "Continue to next batch? [y/n/edit]"
- "edit" lets the user reprioritize before next batch

### PAUSED
- Show queue status only
- Do not dispatch any agents
- Useful for reviewing state between sessions

---

## When to Use ArcheFlow Orchestration Within Sprint

Most sprint tasks should be **direct agent dispatch** (no PDCA/pipeline overhead).
Escalate beyond direct dispatch only when one of the signals below calls for it:

| Signal | Action |
|--------|--------|
| Task is S/M, clear scope, single project | Direct dispatch |
| Task is L/XL | Use pipeline or PDCA strategy |
| Task mentions "security", "auth", "encryption" | Add Guardian review |
| Task is a review/audit | Spawn reviewers only (af-review mode) |
| Task failed in a previous sprint | Escalate to PDCA with Explorer |

The sprint runner's job is **throughput**, not perfection. Ship fast, fix forward.

---

## Integration with Existing Tools

| Tool | How sprint uses it |
|------|-------------------|
| `./scripts/ws next` | Get next schedulable task |
| `./scripts/ws done <id>` | Mark task completed |
| `./scripts/ws fail <id>` | Mark task failed |
| `./scripts/ws orient` | Initial workspace overview |
| `./scripts/ws validate` | Pre-flight queue validation |
| `git` per project | Commit + push after each agent |
| `archeflow:run` | Only for L/XL tasks needing PDCA |

---

## Error Recovery

- **Agent crashes mid-task**: Mark task as `failed`, add error to notes, continue with next batch
- **Git push fails**: Log the error, do NOT retry. User will handle push conflicts manually.
- **Queue file corrupted**: Run `./scripts/ws validate`. If invalid, stop sprint and report.
- **Budget exceeded**: Stop sprint, report remaining tasks and estimated cost.
- **All tasks blocked**: Report dependency graph, suggest which blockers to resolve first.
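
For the all-blocked case, one simple way to suggest an order is to count how many pending tasks each unfinished dependency is holding up. The sketch below assumes queue items are dicts with `id`, `status`, and an optional `depends_on` list; `rank_blockers` is a hypothetical helper, and it counts direct dependencies only, not transitive ones.

```python
from collections import Counter


def rank_blockers(items: list[dict]) -> list[tuple[str, int]]:
    """Return (dependency id, number of pending tasks it blocks), most blocking first."""
    completed = {t["id"] for t in items if t["status"] == "completed"}
    blocked_by = Counter()
    for task in items:
        if task["status"] != "pending":
            continue
        for dep in task.get("depends_on", []):
            if dep not in completed:  # failed, running, pending, or unknown id
                blocked_by[dep] += 1
    return blocked_by.most_common()
```

The first entries in the returned list are the blockers whose resolution would unblock the most tasks.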