refactor: merge attention-filters into check-phase skill

Consolidate the attention-filters skill (122 lines) into check-phase, reducing check-phase from 234 to 110 lines. Removed verbose bash code blocks, 30-line consolidated output example, re-check protocol (belongs in act-phase), and motivational section. Updated all references in README, plugin.json, using-archeflow, and colette-bridge.
2026-04-06 20:41:20 +02:00
parent 55a6ba14c9
commit af1f4e7da7
6 changed files with 61 additions and 308 deletions
--- a/skills/check-phase/SKILL.md
+++ b/skills/check-phase/SKILL.md
@@ -1,233 +1,110 @@
 ---
 name: check-phase
-description: Use when you are acting as Guardian, Skeptic, Sage, or Trickster archetype in the Check phase. Defines shared review rules and output format.
+description: Use when acting as Guardian, Skeptic, Sage, or Trickster in the Check phase. Defines review rules, finding format, attention filters, and spawning protocol.
 ---

 # Check Phase

-Multiple reviewers examine the Maker's implementation in parallel. Each agent definition has its specific protocol — this skill defines the shared rules.
+Reviewers examine the Maker's implementation. This skill defines shared rules, finding format, and spawning protocol.

 ## Shared Rules

-1. **Read the proposal first.** Review against the intended design, not invented requirements.
-2. **Read the actual code.** Use `git diff` on the Maker's branch. Don't review descriptions alone.
-3. **Structured findings.** Use the standardized finding format below for every issue.
-4. **Clear verdict:** `APPROVED` or `REJECTED` with rationale.
-5. **Status tokens are separate from verdicts.** The `STATUS: DONE` line signals the agent finished successfully. The `APPROVED`/`REJECTED` verdict is domain output. A reviewer can be `STATUS: DONE` with verdict `REJECTED` — that is normal. Parse both independently.
+1. Review against the proposal's intended design, not invented requirements.
+2. Read actual code via `git diff` on the Maker's branch.
+3. Use the finding format below for every issue.
+4. Give a clear verdict: `APPROVED` or `REJECTED` with rationale.
+5. `STATUS: DONE` signals agent completion. `APPROVED`/`REJECTED` is domain output. Both are parsed independently.

 ## Finding Format

-Every finding must use this format for cross-cycle tracking:
-
-```
 | Location | Severity | Category | Description | Fix |
 |----------|----------|----------|-------------|-----|
-| src/auth/handler.ts:48 | CRITICAL | security | Empty string bypasses validation | Add length check before processing |
-```
+| src/auth/handler.ts:48 | CRITICAL | security | Empty string bypasses validation | Add length check |

-**Severity:**
- **CRITICAL** — Must fix. Blocks approval.
- **WARNING** — Should fix. Doesn't block alone.
- **INFO** — Nice to have. Never blocks.
+**Severity:** CRITICAL = must fix, blocks approval. WARNING = should fix, doesn't block alone. INFO = nice to have, never blocks.

-**Categories** (use consistently for cross-cycle tracking):
- `security` — Injection, auth bypass, data exposure, secrets
- `reliability` — Error handling, edge cases, race conditions, crashes
- `design` — Architecture, assumptions, scalability, coupling
- `breaking-change` — API compatibility, schema migrations, removals
- `dependency` — New deps, version conflicts, license issues
- `quality` — Readability, maintainability, naming, duplication
- `testing` — Missing tests, weak assertions, untested paths
- `consistency` — Deviates from codebase patterns
+**Categories:** `security` `reliability` `design` `breaking-change` `dependency` `quality` `testing` `consistency`

-## Consolidated Output
+## Evidence Requirements

-After all reviewers finish, compile:
+Every CRITICAL or WARNING must include concrete evidence. Without evidence, downgrade to INFO.

-```markdown
-## Check Phase Results — Cycle N
+**Valid evidence:** command output, exit codes, code citations with line numbers, git diff excerpts, reproduction steps.

-### Guardian: APPROVED
-| Location | Severity | Category | Description | Fix |
-|----------|----------|----------|-------------|-----|
-| src/auth/handler.ts:52 | WARNING | security | Missing rate limit | Add rate limiter middleware |
+**Banned in CRITICAL/WARNING:** "might be", "could potentially", "appears to", "seems like", "may not". Rewrite with evidence or downgrade.

-### Skeptic: APPROVED
-| Location | Severity | Category | Description | Fix |
-|----------|----------|----------|-------------|-----|
-| src/auth/handler.ts:30 | INFO | design | Consider caching validated tokens | Add TTL cache for token validation |
+For each CRITICAL/WARNING, state: (1) what was tested, (2) what was observed, (3) what correct behavior should be.

-### Sage: APPROVED
-| Location | Severity | Category | Description | Fix |
-|----------|----------|----------|-------------|-----|
-| tests/auth.test.ts:15 | WARNING | testing | Test names don't describe behavior | Rename to "should reject expired tokens" |
+## Attention Filters

-### Trickster: REJECTED
-| Location | Severity | Category | Description | Fix |
-|----------|----------|----------|-------------|-----|
-| src/auth/handler.ts:48 | CRITICAL | reliability | Empty string bypasses validation | Add `if (!token || token.trim() === '')` guard |
+Each archetype receives only relevant context. Do not pass everything.

-### Deduplication
-If two reviewers raise the same issue (same file + same category), merge:
-| Guardian + Skeptic | CRITICAL | security | Input not sanitized (src/api.ts:30) | Add validation |
+| Archetype | Receives | Excludes |
+|-----------|----------|----------|
+| Guardian | Maker's git diff + proposal risk section + test results | Explorer research, Creator rationale, other reviewers |
+| Skeptic | Creator's proposal (assumptions + architecture) + confidence scores | Git diff, Explorer research, other reviewers |
+| Sage | Creator's proposal + Maker's diff + implementation summary + test results | Explorer raw research, other reviewer verdicts |
+| Trickster | Maker's git diff + attack surface summary (file types + entry points) | Proposal, research, other reviewers |

-Use the higher severity. Don't double-count in the verdict.
+**Token budget targets:**

-### Verdict: REJECTED — 1 critical finding
-→ Build cycle feedback (see orchestration skill) and feed to Plan phase
-```
+| Archetype | Fast | Standard | Thorough |
+|-----------|------|----------|----------|
+| Guardian | 1500 | 2000 | 2500 |
+| Skeptic | skip | 1500 | 2000 |
+| Trickster | skip | skip | 1500 |
+| Sage | skip | 2500 | 3000 |
+
+**Context isolation:** Agents receive fresh, controller-constructed context only. No session bleed, no cross-agent contamination, no ambient knowledge. Verify zero references to excluded artifacts before spawning.
+
+**Cycle-back filtering (cycle 2+):** Pass structured feedback table only (not full reviewer artifacts). Strip resolved items. Cap at 500 tokens — summarize by severity if exceeded.

 ## Reviewer Spawning Protocol

-This section defines the exact sequence for spawning reviewers in the Check phase.
-
 ### Step 1: Guardian First (mandatory)

-Guardian always runs first, before any other reviewer. It receives the Maker's git diff and the proposal's risk section only.
-
-**Context for Guardian:**
- `git diff main...<maker-branch>` (the actual code changes)
- Risk section from `plan-creator.md` (if present)
- Do NOT include: Explorer research, full proposal, other reviewer outputs
-
-```
-Agent(
-  description: "Guardian: security and risk review for <task>",
-  prompt: "You are the GUARDIAN archetype.
-    Review the diff: <maker's diff>
-    Proposal risks: <risk section from plan-creator.md>
-    Assess: security vulnerabilities, reliability risks, breaking changes, dependency risks.
-    Output: APPROVED or REJECTED with findings in the standardized format.
-    Each finding: | Location | Severity | Category | Description | Fix |",
-  model: <resolve_model guardian $WORKFLOW>
-)
-```
+Guardian always runs first. It receives the Maker's git diff and the proposal's risk section only.

 Save output to `.archeflow/artifacts/${RUN_ID}/check-guardian.md`.

 ### Step 2: A2 Fast-Path Evaluation

-After Guardian completes, parse its output before spawning other reviewers:
+After Guardian completes, count CRITICAL and WARNING findings in its output. If both are zero, and not escalated, and not first cycle of a thorough workflow — skip remaining reviewers and proceed to Act phase.

-```bash
-CRITICAL_COUNT=$(grep -c "| CRITICAL |" ".archeflow/artifacts/${RUN_ID}/check-guardian.md" || echo 0)
-WARNING_COUNT=$(grep -c "| WARNING |" ".archeflow/artifacts/${RUN_ID}/check-guardian.md" || echo 0)
+### Step 3: Parallel Remaining Reviewers

-# A2 fast-path: skip remaining reviewers if Guardian is clean
-# Exception: first cycle of thorough workflows always spawns all reviewers
-if [[ "$CRITICAL_COUNT" -eq 0 && "$WARNING_COUNT" -eq 0 \
-      && "$ESCALATED" != "true" \
-      && ! ("$WORKFLOW" == "thorough" && "$CYCLE" -eq 1) ]]; then
-  echo "Guardian fast-path: 0 CRITICAL, 0 WARNING — skipping remaining reviewers."
-  # Proceed directly to Act phase
-fi
-```
-
-### Step 3: Parallel Reviewer Spawning
-
-If A2 does not trigger, spawn remaining reviewers in parallel based on workflow:
+If A2 does not trigger, spawn remaining reviewers in parallel:

 | Workflow | Reviewers (after Guardian) |
 |----------|--------------------------|
 | `fast` | None (Guardian only) |
-| `fast` (escalated via A1) | Skeptic + Sage |
+| `fast` (escalated) | Skeptic + Sage |
 | `standard` | Skeptic + Sage |
 | `thorough` | Skeptic + Sage + Trickster |

-Spawn all applicable reviewers in a single message with multiple Agent calls:
+Each reviewer gets context per the attention filters above.

-```
-# Standard workflow example — spawn Skeptic and Sage in parallel:
-Agent(
-  description: "Skeptic: challenge assumptions for <task>",
-  prompt: "<Skeptic prompt with Creator's proposal>",
-  model: <resolve_model skeptic $WORKFLOW>
-)
+### Step 4: Collect and Consolidate

-Agent(
-  description: "Sage: holistic quality review for <task>",
-  prompt: "<Sage prompt with proposal + diff + implementation summary>",
-  model: <resolve_model sage $WORKFLOW>
-)
+For each reviewer: save to `.archeflow/artifacts/${RUN_ID}/check-<archetype>.md`, emit `review.verdict` event, record sequence number.
+
+**Deduplication:** If two reviewers raise the same issue (same file + same category), merge into one finding using the higher severity. Don't double-count.
+
+**Verdict:** Count CRITICAL findings across all reviewers (after dedup). Any CRITICAL = `REJECTED`. Otherwise `APPROVED`.
+
+Example consolidated output:
+
+```markdown
+## Check Phase Results — Cycle 1
+### Guardian: APPROVED
+| Location | Severity | Category | Description | Fix |
+|----------|----------|----------|-------------|-----|
+| src/auth.ts:52 | WARNING | security | Missing rate limit | Add rate limiter |
+### Verdict: APPROVED — 0 critical, 1 warning
 ```

-Each reviewer gets context per the attention filters defined in `archeflow:orchestration`:
- **Skeptic:** Creator's proposal (assumptions section focus)
- **Sage:** Creator's proposal + Maker's diff + implementation summary
- **Trickster:** Maker's diff only
+## Timeout Handling

-### Step 4: Collect Results
+Each reviewer has a **5-minute timeout**. On timeout: emit `agent.complete` with `"error": true`, log WARNING, treat as no findings, proceed.

-Wait for all spawned reviewers to return. For each:
-1. Save output to `.archeflow/artifacts/${RUN_ID}/check-<archetype>.md`
-2. Emit `review.verdict` event with findings
-3. Record sequence number for DAG parent tracking
-
-### Timeout Handling
-
-Each reviewer has a **5-minute timeout**. If a reviewer does not return within 5 minutes:
-1. Emit `agent.complete` with `"error": true, "reason": "timeout"`
-2. Log a WARNING — do not block the run
-3. Treat the timed-out reviewer as having delivered no findings (neither approved nor rejected)
-4. Proceed with available verdicts
-
-If Guardian times out, this is a blocking failure — abort the Check phase and report to the user.
-
-### Re-Check Protocol (Act Phase Fixes)
-
-When the Act phase routes findings back to the Maker and the Maker applies fixes in a subsequent cycle, the Check phase re-runs with the updated diff. Reviewers who previously rejected should focus on whether their specific findings were addressed. The structured feedback from `act-feedback.md` provides the mapping of which findings were routed where.
-
---
-
-## Evidence Requirements
-
-Every CRITICAL or WARNING finding must include concrete evidence. Findings without evidence are downgraded to INFO.
-
-### Evidence Types
-
-| Type | Example | When Required |
-|------|---------|---------------|
-| Command output | `npm test` output showing failure | Test-related findings |
-| Exit code | `exit code 1 from eslint` | Tool-based validation |
-| Code citation | `src/auth.ts:48 — \`if (token) { ... }\`` | Logic or security findings |
-| Git diff | `+  db.query(userInput)` (unsanitized) | Implementation review |
-| Reproduction steps | "1. Send POST with empty body, 2. Observe 500" | Runtime behavior findings |
-
-### Banned Phrases
-
-The following phrases are not permitted in CRITICAL or WARNING findings. They indicate speculation, not evidence:
-
- "might be"
- "could potentially"
- "appears to"
- "seems like"
- "may not"
-
-A finding using these phrases must either be rewritten with evidence or downgraded to INFO.
-
-### Verification Protocol
-
-For each CRITICAL or WARNING finding, state:
-
-1. **What was tested** — the specific code path, input, or scenario examined
-2. **What was observed** — the actual behavior or code construct found
-3. **What correct behavior should be** — the expected alternative
-
-### Downgrade Rule
-
-If a reviewer produces a CRITICAL or WARNING finding without any of the evidence types above, the orchestrator downgrades it to INFO and emits a `decision` event:
-
-```bash
-./lib/archeflow-event.sh "$RUN_ID" decision check "" \
-  '{"what":"evidence_downgrade","from":"CRITICAL","to":"INFO","finding":"<description>","reviewer":"<archetype>","reason":"no evidence provided"}'
-```
-
---
-
-## Why Structured Findings Matter
-
-The standardized format enables:
- **Cross-cycle tracking:** Same category + location = same issue. Can detect resolution or regression.
- **Feedback routing:** Security/design findings → Creator. Quality/testing findings → Maker.
- **Shadow detection:** CRITICAL:WARNING ratios, finding counts, and category distributions are measurable.
- **Metrics:** Severity counts feed into the orchestration summary.
+**Exception:** Guardian timeout is blocking — abort Check phase and report to user.