feat: core improvements — feedback loop, attention filters, shadow heuristics, metrics, auto-activation

- Cross-cycle feedback protocol with structured finding format, routing, and resolution tracking - Attention filter enforcement: explicit context include/exclude per archetype - Shadow detection: quantitative checklists with concrete thresholds - Orchestration metrics: per-phase timing, agent count, findings summary - Autonomous mode wiring: checkpoint protocol, session log, stop conditions - Auto-activation: SessionStart hook fires ArcheFlow for implementation tasks without user config - Emoji avatars for all 7 archetypes - Standardized finding format across all reviewers for cross-cycle tracking - Persisted implementation plan in docs/
2026-04-03 06:02:10 +02:00
parent eec1fc3d82
commit d08dc657d1
14 changed files with 553 additions and 85 deletions
--- a/agents/creator.md
+++ b/agents/creator.md
@@ -7,7 +7,7 @@ description: |
 model: inherit
 ---
-You are the **Creator** archetype. You design the solution the team will build.
+You are the **Creator** archetype 🏗️. You design the solution the team will build.
 ## Your Virtue: Decisive Framing
 You turn ambiguity into one clear plan. You scope ruthlessly — what's in AND what's deliberately out. You're honest about your confidence. Without you, the Maker improvises and everyone has a different picture of "done."
--- a/agents/explorer.md
+++ b/agents/explorer.md
@@ -7,7 +7,7 @@ description: |
 model: haiku
 ---
-You are the **Explorer** archetype. You gather context so the team can make informed decisions.
+You are the **Explorer** archetype 🔍. You gather context so the team can make informed decisions.
 ## Your Virtue: Contextual Clarity
 You see the landscape before anyone acts. You map dependencies, spot existing patterns, and surface constraints nobody asked about. Without you, the Creator designs blind and the Maker builds on wrong assumptions.
--- a/agents/guardian.md
+++ b/agents/guardian.md
@@ -7,7 +7,7 @@ description: |
 model: inherit
 ---
-You are the **Guardian** archetype. You protect the system from harm.
+You are the **Guardian** archetype 🛡️. You protect the system from harm.
 ## Your Virtue: Threat Intuition
 You see attack surfaces others walk past. You calibrate your response to actual risk — not theoretical risk. Without you, vulnerabilities ship to production and breaking changes surprise users.
--- a/agents/maker.md
+++ b/agents/maker.md
@@ -6,7 +6,7 @@ description: |
 model: inherit
 ---
-You are the **Maker** archetype. You build what the Creator designed.
+You are the **Maker** archetype ⚒️. You build what the Creator designed.
 ## Your Virtue: Execution Discipline
 You turn plans into working, tested, committed code. Small steps, steady progress, nothing left uncommitted. Without you, proposals stay theoretical and nobody knows if the design actually works.
--- a/agents/sage.md
+++ b/agents/sage.md
@@ -7,7 +7,7 @@ description: |
 model: inherit
 ---
-You are the **Sage** archetype. You judge the work as a whole.
+You are the **Sage** archetype 📚. You judge the work as a whole.
 ## Your Virtue: Maintainability Judgment
 You see the forest, not just the trees. "Will a new team member understand this in 6 months?" You ensure new code fits existing patterns and that quality serves the future, not just the present. Without you, code works today but becomes unmaintainable.
--- a/agents/skeptic.md
+++ b/agents/skeptic.md
@@ -6,7 +6,7 @@ description: |
 model: inherit
 ---
-You are the **Skeptic** archetype. You find the holes in the plan.
+You are the **Skeptic** archetype 🤔. You find the holes in the plan.
 ## Your Virtue: Assumption Surfacing
 You make the implicit explicit. "The plan assumes X — but does X actually hold?" Every challenge comes with an alternative. Without you, the team builds on blind spots and the first user finds what nobody questioned.
--- a/agents/trickster.md
+++ b/agents/trickster.md
@@ -7,7 +7,7 @@ description: |
 model: haiku
 ---
-You are the **Trickster** archetype. You break things so users don't have to.
+You are the **Trickster** archetype 🃏. You break things so users don't have to.
 ## Your Virtue: Adversarial Creativity
 You think like an attacker, a clumsy user, a failing network. You find the edges where code breaks before real users do. Without you, edge cases ship, error paths are untested, and the happy path is all that works.
--- a/docs/plan-core-improvements-2026-04-03.md
+++ b/docs/plan-core-improvements-2026-04-03.md
@@ -0,0 +1,178 @@
 # ArcheFlow Core Improvements Plan
 ## Context
 ArcheFlow's archetype system and PDCA engine are feature-complete in TypeScript (`tool.archeflow/`), but the Claude Code plugin layer (`archeflow/` + `claude-archeflow-plugin/`) has gaps that reduce quality and waste tokens. Two plugin directories exist as near-duplicates. The goal: improve orchestration quality while keeping token usage low.
 ## Scope
 **Implement (High Value):**
 1. Cross-cycle feedback loop — structured issue tracking between PDCA cycles
 2. Consolidate plugin directories — kill `claude-archeflow-plugin/`, keep `archeflow/`
 3. Shadow detection heuristics in skill layer — concrete thresholds, not just prose
 4. Attention filter enforcement — actually filter context per archetype when spawning
 5. Metrics in orchestration skill — lightweight timing + token tracking
 6. Autonomous mode wiring — connect skill to orchestration with progress logging
 **Future Features (park for later):**
 - Web dashboard UI
 - A2A inter-agent negotiation protocol
 - GitHub Action integration
 ## Implementation
 ### 1. Cross-Cycle Feedback Loop
 **Problem:** Check phase outputs go to next Plan cycle as raw text dump. No issue tracking, no resolution status, no routing.
 **Solution:** Add structured feedback format to `archeflow/skills/orchestration/SKILL.md`
 **Changes:**
 - `archeflow/skills/orchestration/SKILL.md` — Add "Cycle Feedback Protocol" section:
  - After Check phase, orchestrator extracts findings into structured format:
    ```
    ## Cycle N Feedback
    ### Unresolved Issues
    - [Guardian] CRITICAL: <issue> → Route to: Creator
    - [Skeptic] WARNING: <assumption> → Route to: Creator
    - [Sage] WARNING: <quality concern> → Route to: Maker
    ### Resolved (from prior cycle)
    - [Guardian] <issue> — resolved in cycle N
    ```
  - Route feedback by archetype: Guardian/Skeptic findings → Creator (design issues), Sage/Trickster findings → Maker (implementation issues)
  - Track resolution: if a finding from cycle N-1 is no longer present in cycle N review, mark resolved
 - `archeflow/skills/plan-phase/SKILL.md` — Add "Prior Feedback" input section to Creator format:
  - Creator must address each unresolved issue explicitly (fix, defer with reason, or dispute)
 - `archeflow/skills/check-phase/SKILL.md` — Standardize finding output format for machine parsing:
  - Each finding: `| Location | Severity | Category | Description | Fix |`
  - Categories: security, reliability, design, quality, testing
 **Token impact:** Slightly more tokens in feedback artifact, but saves full cycles by giving targeted guidance instead of "here's everything, figure it out."
 ### 2. Consolidate Plugin Directories
 **Problem:** `archeflow/` and `claude-archeflow-plugin/` are near-identical (1 commit apart), causing maintenance drift.
 **Solution:** Delete `claude-archeflow-plugin/`, keep `archeflow/` (more recent, cleaner Node.js hook).
 **Changes:**
 - Remove `claude-archeflow-plugin/` directory
 - Verify `archeflow/` is referenced in any workspace config
 - Update any cross-references in docs
 ### 3. Shadow Detection Heuristics in Skill Layer
 **Problem:** TypeScript `ShadowDetector` has concrete thresholds (e.g., >2000 words, >3 tangents), but the skill file only describes shadows in prose. The orchestrator running via Claude Code skills can't use the TypeScript runtime — it needs the heuristics inline.
 **Solution:** Add quantitative detection rules to `archeflow/skills/shadow-detection/SKILL.md`
 **Changes:**
 - `archeflow/skills/shadow-detection/SKILL.md` — For each archetype, add a "Detection Checklist" with concrete metrics the orchestrator can evaluate:
  ```
  ### Explorer → Rabbit Hole
  **Detect:** ANY of:
  - [ ] Output >2000 words without a Recommendation section
  - [ ] >3 tangent topics not in original task
  - [ ] >15 files read
  **Correct:** "Summarize top 3 findings in 300 words. Add Recommendation."
  ```
 - Keep the existing prose for understanding, add checklist for action
 **Token impact:** Negligible — adds ~200 words to skill file, but prevents wasted cycles from undetected shadows.
 ### 4. Attention Filter Enforcement
 **Problem:** The attention-filters skill describes what each archetype should/shouldn't receive, but the orchestration skill doesn't reference or enforce it when spawning agents.
 **Solution:** Add concrete context-assembly instructions to `archeflow/skills/orchestration/SKILL.md`
 **Changes:**
 - `archeflow/skills/orchestration/SKILL.md` — In each phase's agent-spawning step, add explicit context rules:
  ```
  ## Step 1: Plan Phase
  ### Spawn Explorer
  **Context to include:** Task description, relevant file paths
  **Context to exclude:** Prior proposals, review outputs, implementation details
  ### Spawn Creator  
  **Context to include:** Task description, Explorer's Research output
  **Context to exclude:** Raw file contents (Explorer already summarized), review history
  ```
 - Reference the attention-filters skill but inline the actionable rules
 **Token impact:** This is the biggest savings — prevents passing full codebase dumps to every agent. Each agent gets only what it needs.
 ### 5. Metrics in Orchestration Skill
 **Problem:** No timing or cost tracking at the skill layer. The TypeScript metrics collector exists but isn't available when running via Claude Code skills.
 **Solution:** Add lightweight metrics protocol to orchestration skill — track per-phase duration and agent count.
 **Changes:**
 - `archeflow/skills/orchestration/SKILL.md` — Add "Metrics" section:
  - After each phase, log: `Phase | Duration | Agents | Findings`
  - At orchestration end, summarize: total duration, cycles run, agents spawned, findings by severity
  - Format as compact table in orchestration output
 - Keep it lightweight — no token counting (not reliable from skill layer), just timing and counts
 **Token impact:** ~50 extra tokens per orchestration for the summary. Provides data for future optimization.
 ### 6. Autonomous Mode Wiring
 **Problem:** Autonomous mode skill exists as standalone doc but isn't integrated into the orchestration skill's flow.
 **Solution:** Add autonomous mode hooks to orchestration skill.
 **Changes:**
 - `archeflow/skills/orchestration/SKILL.md` — Add "Autonomous Mode" section:
  - When running unattended: auto-commit between cycles, log progress to `.archeflow/session-log.md`
  - Reference stop conditions from autonomous-mode skill
  - Add "between-task checkpoint" protocol: after each task completes, update session log before starting next
 - `archeflow/skills/autonomous-mode/SKILL.md` — Add cross-reference to orchestration skill for execution details
 **Token impact:** Minimal — only adds content to skill files loaded on-demand.
 ## Future Features (add to backlog)
 Add to `archeflow/docs/roadmap.md`:
 - **Web Dashboard**: Real-time orchestration visualization via SSE/WebSocket (`tool.archeflow/packages/web/`)
 - **A2A Protocol**: Direct agent-to-agent negotiation during Check phase (schemas exist in `tool.archeflow`)
 - **GitHub Action**: CI-triggered orchestrations for PR review automation
 ## Files to Modify
 | File | Change |
 |------|--------|
 | `archeflow/skills/orchestration/SKILL.md` | Feedback loop, attention filters, metrics, autonomous hooks |
 | `archeflow/skills/plan-phase/SKILL.md` | Prior feedback input for Creator |
 | `archeflow/skills/check-phase/SKILL.md` | Standardized finding format for parsing |
 | `archeflow/skills/shadow-detection/SKILL.md` | Quantitative detection checklists |
 | `archeflow/skills/autonomous-mode/SKILL.md` | Cross-reference to orchestration |
 | `archeflow/docs/roadmap.md` | New file — future features backlog |
 | Directory | Action |
 |-----------|--------|
 | `claude-archeflow-plugin/` | Delete (redundant) |
 ## Verification
 1. Load each modified skill via `Skill` tool — verify no syntax/formatting errors
 2. Run a test orchestration (fast workflow) on a small task to verify:
   - Attention filters are referenced in agent spawning
   - Check phase outputs use standardized finding format
   - Feedback is structured and routed correctly
 3. Verify shadow detection checklist is actionable (can an orchestrator evaluate each checkbox?)
 4. Confirm `claude-archeflow-plugin/` removal doesn't break any references
 ## Cost-Benefit Summary
 | Change | Token Cost | Quality Gain |
 |--------|-----------|-------------|
 | Cross-cycle feedback | +200/cycle | High — targeted revision instead of blind retry |
 | Consolidate dirs | 0 | Medium — eliminates drift, single source of truth |
 | Shadow heuristics | +200 skill load | Medium — catches dysfunction before it wastes cycles |
 | Attention filters | **-30-50% per agent** | High — massive token savings |
 | Metrics | +50/orchestration | Low-Medium — enables future optimization |
 | Autonomous wiring | +100 skill load | Medium — enables unattended quality runs |
 **Net effect:** Token usage goes DOWN (attention filters save more than everything else adds). Quality goes UP (structured feedback, shadow detection, metrics).
--- a/hooks/hooks.json
+++ b/hooks/hooks.json
@@ -2,7 +2,7 @@
  "hooks": {
    "SessionStart": [
      {
-        "matcher": "startup|clear|compact",
+        "matcher": "startup|resume|clear|compact",
        "hooks": [
          {
            "type": "command",
--- a/skills/check-phase/SKILL.md
+++ b/skills/check-phase/SKILL.md
@@ -11,12 +11,33 @@ Multiple reviewers examine the Maker's implementation in parallel. Each agent de
 1. **Read the proposal first.** Review against the intended design, not invented requirements.
 2. **Read the actual code.** Use `git diff` on the Maker's branch. Don't review descriptions alone.
-3. **Each finding needs:** Location (file:line), severity, description, suggested fix.
+3. **Structured findings.** Use the standardized finding format below for every issue.
-4. **Severity:**
+4. **Clear verdict:** `APPROVED` or `REJECTED` with rationale.
 ## Finding Format
 Every finding must use this format for cross-cycle tracking:
 ```
 | Location | Severity | Category | Description | Fix |
 |----------|----------|----------|-------------|-----|
 | src/auth/handler.ts:48 | CRITICAL | security | Empty string bypasses validation | Add length check before processing |
 ```
 **Severity:**
 - **CRITICAL** — Must fix. Blocks approval.
 - **WARNING** — Should fix. Doesn't block alone.
 - **INFO** — Nice to have. Never blocks.
-5. **Clear verdict:** `APPROVED` or `REJECTED` with rationale.
+
 **Categories** (use consistently for cross-cycle tracking):
 - `security` — Injection, auth bypass, data exposure, secrets
 - `reliability` — Error handling, edge cases, race conditions, crashes
 - `design` — Architecture, assumptions, scalability, coupling
 - `breaking-change` — API compatibility, schema migrations, removals
 - `dependency` — New deps, version conflicts, license issues
 - `quality` — Readability, maintainability, naming, duplication
 - `testing` — Missing tests, weak assertions, untested paths
 - `consistency` — Deviates from codebase patterns
 ## Consolidated Output
@@ -26,19 +47,33 @@ After all reviewers finish, compile:
 ## Check Phase Results — Cycle N
 ### Guardian: APPROVED
- WARNING: Missing rate limit (src/auth/handler.ts:52)
+| Location | Severity | Category | Description | Fix |
 |----------|----------|----------|-------------|-----|
 | src/auth/handler.ts:52 | WARNING | security | Missing rate limit | Add rate limiter middleware |
 ### Skeptic: APPROVED
- INFO: Consider caching validated tokens
+| Location | Severity | Category | Description | Fix |
 |----------|----------|----------|-------------|-----|
 | src/auth/handler.ts:30 | INFO | design | Consider caching validated tokens | Add TTL cache for token validation |
 ### Sage: APPROVED
- WARNING: Test names could be more descriptive
+| Location | Severity | Category | Description | Fix |
 |----------|----------|----------|-------------|-----|
 | tests/auth.test.ts:15 | WARNING | testing | Test names don't describe behavior | Rename to "should reject expired tokens" |
 ### Trickster: REJECTED
- CRITICAL: Empty string bypasses validation (src/auth/handler.ts:48)
+| Location | Severity | Category | Description | Fix |
-  Reproduction: POST /auth with `{"token": ""}`
+|----------|----------|----------|-------------|-----|
-  Expected: 400 | Actual: 500
+| src/auth/handler.ts:48 | CRITICAL | reliability | Empty string bypasses validation | Add `if (!token || token.trim() === '')` guard |
 ### Verdict: REJECTED — 1 critical finding
-→ Feed back to Plan phase for cycle N+1
+→ Build cycle feedback (see orchestration skill) and feed to Plan phase
 ```
 ## Why Structured Findings Matter
 The standardized format enables:
 - **Cross-cycle tracking:** Same category + location = same issue. Can detect resolution or regression.
 - **Feedback routing:** Security/design findings → Creator. Quality/testing findings → Maker.
 - **Shadow detection:** CRITICAL:WARNING ratios, finding counts, and category distributions are measurable.
 - **Metrics:** Severity counts feed into the orchestration summary.
--- a/skills/orchestration/SKILL.md
+++ b/skills/orchestration/SKILL.md
@@ -22,9 +22,13 @@ Assess the task and pick:
 Spawn agents sequentially — Creator needs Explorer's findings.
 ### Explorer (if standard or thorough)
 **Context to include:** Task description, relevant file paths, codebase access.
 **Context to exclude:** Prior proposals, review outputs, implementation details, feedback from previous cycles.
 ```
 Agent(
-  description: "Explorer: research context",
+  description: "🔍 Explorer: research context",
  prompt: "<task description>
    You are the EXPLORER archetype.
    Research the codebase to understand:
@@ -39,18 +43,24 @@ Agent(
 ```
 ### Creator
 **Context to include:** Task description, Explorer's research output. On cycle 2+: prior cycle's structured feedback (see Cycle Feedback Protocol).
 **Context to exclude:** Raw file contents (Explorer already summarized), git diffs, reviewer full outputs.
 ```
 Agent(
-  description: "Creator: design proposal",
+  description: "🏗️ Creator: design proposal",
  prompt: "<task description>
    You are the CREATOR archetype.
    Based on the research findings: <Explorer's output>
    <if cycle 2+: Prior cycle feedback: <structured feedback — see Cycle Feedback Protocol>>
    Design a solution proposal including:
    1. Architecture decisions (with rationale)
    2. Files to create/modify (with specific changes)
    3. Test strategy
    4. Confidence score (0.0 to 1.0)
    5. Risks you foresee
    <if cycle 2+: 6. How you addressed each unresolved issue from prior feedback>
    Be decisive. Ship a clear plan, not a menu of options.",
  subagent_type: "Plan"
 )
@@ -60,12 +70,16 @@ Agent(
 Spawn Maker in an **isolated worktree** so changes don't affect main.
 **Context to include:** Creator's proposal only. On cycle 2+: implementation-routed feedback from Sage/Trickster.
 **Context to exclude:** Explorer's research, Guardian/Skeptic findings (those go to Creator).
 ```
 Agent(
-  description: "Maker: implement proposal",
+  description: "⚒️ Maker: implement proposal",
  prompt: "<task description>
    You are the MAKER archetype.
    Implement this proposal: <Creator's output>
    <if cycle 2+: Implementation feedback from prior cycle: <Sage/Trickster findings only>>
    Rules:
    1. Follow the proposal exactly — don't redesign
    2. Write tests for every behavioral change
@@ -85,9 +99,13 @@ Agent(
 Spawn reviewers **in parallel** — they read the Maker's changes independently.
 ### Guardian
 **Context to include:** Maker's git diff, proposal risk section only.
 **Context to exclude:** Explorer's research, full proposal, other reviewer outputs.
 ```
 Agent(
-  description: "Guardian: security and risk review",
+  description: "🛡️ Guardian: security and risk review",
  prompt: "You are the GUARDIAN archetype.
    Review the changes in branch: <maker's branch>
    Assess:
@@ -96,31 +114,42 @@ Agent(
    3. Breaking changes (API compatibility, schema migrations)
    4. Dependency risks (new deps, version conflicts)
    Output: APPROVED or REJECTED with specific findings.
-    Each finding needs: location, severity (critical/warning/info), description, fix suggestion.
+    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Categories: security, reliability, design, breaking-change, dependency
    Be rigorous but practical — flag real risks, not theoretical ones."
 )
 ```
 ### Skeptic (if standard or thorough)
 **Context to include:** Creator's proposal (focus on assumptions section).
 **Context to exclude:** Git diff details, Explorer's research, other reviewer outputs.
 ```
 Agent(
-  description: "Skeptic: challenge assumptions",
+  description: "🤔 Skeptic: challenge assumptions",
  prompt: "You are the SKEPTIC archetype.
-    Review the changes in branch: <maker's branch>
+    Review the proposal: <Creator's proposal>
    Challenge:
    1. Assumptions in the design — what if they're wrong?
    2. Alternative approaches not considered
    3. Edge cases not tested
    4. Scalability concerns
    Output: APPROVED or REJECTED with counterarguments.
    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Categories: design, quality, testing, scalability
    Be constructive — every challenge must include a suggested alternative."
 )
 ```
 ### Sage (if standard or thorough)
 **Context to include:** Creator's proposal, Maker's git diff, implementation summary.
 **Context to exclude:** Explorer's raw research, other reviewer outputs.
 ```
 Agent(
-  description: "Sage: holistic quality review",
+  description: "📚 Sage: holistic quality review",
  prompt: "You are the SAGE archetype.
    Review the changes in branch: <maker's branch>
    Evaluate holistically:
@@ -129,14 +158,20 @@ Agent(
    3. Documentation (does the change need docs?)
    4. Consistency with codebase patterns
    Output: APPROVED or REJECTED with quality findings.
    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Categories: quality, testing, design, consistency
    Judge like a senior engineer doing a PR review."
 )
 ```
 ### Trickster (if thorough only)
 **Context to include:** Maker's git diff only.
 **Context to exclude:** Everything else — proposal, research, other reviews.
 ```
 Agent(
-  description: "Trickster: adversarial testing",
+  description: "🃏 Trickster: adversarial testing",
  prompt: "You are the TRICKSTER archetype.
    Try to break the changes in branch: <maker's branch>
    Attack vectors:
@@ -145,6 +180,8 @@ Agent(
    3. Error path exploitation
    4. Dependency failure scenarios
    Output: APPROVED or REJECTED with edge cases found.
    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
    Categories: security, reliability, testing
    Think like a QA engineer who gets paid per bug found."
 )
 ```
@@ -157,11 +194,12 @@ Collect all reviewer outputs and decide:
 1. Merge the Maker's worktree branch into the target branch
 2. Report: what was implemented, what was reviewed, any warnings noted
 3. Clean up the worktree
 4. Record metrics (see Orchestration Metrics)
 ### Issues Found (and cycles remaining)
-1. Collect all findings into a feedback summary
+1. Build structured feedback using the Cycle Feedback Protocol below
 2. Go back to Step 1 (Plan) with the feedback
-3. Creator revises the proposal based on reviewer findings
+3. Creator revises the proposal, addressing each unresolved issue
 4. Maker re-implements in a fresh worktree
 5. Reviewers check again
@@ -170,11 +208,156 @@ Collect all reviewer outputs and decide:
 2. Present the best implementation so far (on its branch)
 3. Let the user decide: merge as-is, fix manually, or abandon
 ---
 ## Cycle Feedback Protocol
 After the Check phase, build structured feedback for the next cycle. This replaces dumping raw reviewer output.
 ### 1. Extract Findings
 Parse each reviewer's output into the standardized format:
 ```markdown
 ## Cycle N Feedback
 ### Unresolved Issues
 | Source | Severity | Category | Issue | Route to |
 |--------|----------|----------|-------|----------|
 | Guardian | CRITICAL | security | SQL injection in user input | Creator |
 | Skeptic | WARNING | design | Assumes single-tenant only | Creator |
 | Sage | WARNING | quality | Test names don't describe behavior | Maker |
 | Trickster | CRITICAL | reliability | Empty string bypasses validation | Creator |
 ### Resolved (from cycle N-1)
 | Source | Issue | Resolution |
 |--------|-------|------------|
 | Guardian | Missing rate limit | Added rate limiter middleware |
 ```
 ### 2. Route Feedback
 Not all findings go to the same agent:
 | Finding source | Routes to | Rationale |
 |----------------|-----------|-----------|
 | Guardian (security, breaking-change) | **Creator** | Design must change |
 | Skeptic (design, scalability) | **Creator** | Assumptions need revision |
 | Sage (quality, consistency) | **Maker** | Implementation refinement |
 | Trickster (reliability, testing) | **Creator** if design flaw, **Maker** if test gap | Depends on root cause |
 ### 3. Track Resolution
 Compare cycle N findings against cycle N-1:
 - If a prior finding no longer appears in the same category → mark **resolved**
 - If a prior finding persists → it stays **unresolved** with an incremented cycle count
 - If new findings appear → add as new unresolved issues
 This prevents regression and gives the Creator/Maker a clear list of what to address.
 ---
 ## Orchestration Metrics
 Track lightweight metrics throughout the orchestration. No token counting (unreliable from skill layer) — just timing and outcomes.
 ### Per-Phase Logging
 After each phase completes, note:
 ```
 | Phase | Duration | Agents | Outcome |
 |-------|----------|--------|---------|
 | Plan  | 45s      | 2      | Proposal ready (confidence: 0.8) |
 | Do    | 90s      | 1      | 4 files changed, 8 tests added |
 | Check | 60s      | 3      | 1 REJECTED (Guardian), 2 APPROVED |
 | Act   | —        | —      | Cycle back → feedback built |
 ```
 ### Orchestration Summary
 At orchestration end, include in the report:
 ```markdown
 ## Orchestration Metrics
 | Metric | Value |
 |--------|-------|
 | Workflow | standard |
 | Cycles | 2 of 2 |
 | Total duration | 4m 30s |
 | Agents spawned | 9 |
 | Findings (total) | 5 |
 | Findings (critical) | 1 |
 | Findings (resolved) | 4 |
 | Shadow detections | 0 |
 ```
 Use this data to calibrate future workflow selection — if fast workflows consistently need 0 cycles of revision, the task was well-scoped.
 ---
 ## Autonomous Mode
 When running unattended (overnight sessions, batch queues), add these behaviors to the orchestration loop:
 ### Between-Task Checkpoint
 After each task completes (success or failure):
 1. **Commit and push** all changes immediately
 2. **Update session log** at `.archeflow/session-log.md` with task outcome
 3. **Check stop conditions** before starting next task:
   - 3 consecutive failures → STOP
   - Shadow escalation (same shadow 3+ times) → STOP
   - Test suite broken after merge → REVERT and STOP
   - Destructive action detected → STOP
 ### Session Log Protocol
 Write to `.archeflow/session-log.md` after each task:
 ```markdown
 ## Task N: <description>
 **Workflow:** standard | **Status:** COMPLETED/FAILED
 **Cycles:** 1 of 2
 **Findings:** Guardian APPROVED, Skeptic APPROVED, Sage WARNING (test names)
 **Files changed:** 5 | **Tests added:** 12
 **Branch:** merged to main (commit abc1234) | OR: archeflow/maker-xyz (NOT merged)
 **Duration:** 8 min
 ```
 ### Safety Rules
 - Never force-push. Never modify main history.
 - All work stays on worktree branches until explicitly merged
 - Merges use `--no-ff` — individually revertable
 - Failed tasks leave branches intact for manual inspection
 For full autonomous mode details (task queues, overnight checklists, user controls): load the `archeflow:autonomous-mode` skill.
 ---
 ## Shadow Monitoring
 During orchestration, watch for shadow activation after each agent completes. Quick checklist:
 | Archetype | Shadow | Quick Check |
 |-----------|--------|-------------|
 | Explorer | Rabbit Hole | Output >2000 words without Recommendation section? |
 | Creator | Over-Architect | >2 new abstractions for one feature? |
 | Maker | Rogue | No test files in changeset? Files outside proposal? |
 | Guardian | Paranoid | CRITICAL:WARNING ratio >2:1? Zero approvals? |
 | Skeptic | Paralytic | >7 challenges? <50% have alternatives? |
 | Trickster | False Alarm | Findings in untouched code? >10 findings? |
 | Sage | Bureaucrat | Review >2x code change length? |
 On detection: apply correction prompt from `archeflow:shadow-detection` skill. On second detection of same shadow: replace agent. On 3+ shadows in same cycle: escalate to user.
 ---
 ## Orchestration Report
 After completion, summarize:
-```
+```markdown
 ## ArcheFlow Orchestration Report
 - **Task:** <description>
 - **Workflow:** standard (2 cycles)
@@ -183,4 +366,5 @@ After completion, summarize:
 - **Files changed:** 4 files, +120 -30 lines
 - **Tests added:** 8 new tests
 - **Branch:** archeflow/maker-<id> → merged to main
 - **Metrics:** 9 agents, 4m 30s, 5 findings (4 resolved, 1 info remaining)
 ```
--- a/skills/plan-phase/SKILL.md
+++ b/skills/plan-phase/SKILL.md
@@ -50,3 +50,42 @@ Explorer researches, then Creator designs. Sequential — Creator needs Explorer
 ### Not Doing
 - <adjacent concerns deliberately excluded>
 ```
 ## Creator with Prior Feedback (Cycle 2+)
 When the Creator receives structured feedback from a prior cycle, the proposal must include an additional section addressing each unresolved issue:
 ```markdown
 ## Proposal: <task> (Revision — Cycle N)
 **Confidence:** <0.0 to 1.0>
 ### Prior Feedback Response
 | Issue | Source | Action | Rationale |
 |-------|--------|--------|-----------|
 | SQL injection in user input | Guardian | **Fixed** — added parameterized queries | Direct security fix |
 | Assumes single-tenant | Skeptic | **Deferred** — multi-tenant out of scope | Not in task requirements |
 | Test names unclear | Sage | **Accepted** — routed to Maker | Implementation concern |
 ### Architecture Decision
 <revised design addressing feedback>
 ### Changes
 <updated file list>
 ### Test Strategy
 <updated test cases>
 ### Risks
 <updated risks — include any new risks from the revision>
 ### Not Doing
 <updated scope boundaries>
 ```
 **Rules for addressing feedback:**
 - **Fixed:** Changed the design to resolve the issue. Explain how.
 - **Deferred:** Not addressing now, with explicit reason. Must not be a CRITICAL finding.
 - **Accepted:** Acknowledged and routed to Maker for implementation-level fix.
 - **Disputed:** Disagrees with the finding. Must provide evidence or reasoning.
 CRITICAL findings cannot be deferred or disputed — they must be fixed or the proposal will be rejected again.
--- a/skills/shadow-detection/SKILL.md
+++ b/skills/shadow-detection/SKILL.md
@@ -30,10 +30,11 @@ Maintainability Judgment      → reviews only      → Bureaucrat
 - Reading more than 15 files without producing findings
 - Output is a raw inventory of files with no analysis or recommendation
-**Triggers:**
+**Detection Checklist** (trigger on ANY):
- Output length > 2000 words without a recommendation section
+- [ ] Output >2000 words without a `### Recommendation` section
- More than 3 "see also" or "related" tangents
+- [ ] >3 tangent topics not directly related to the original task
- No patterns or recommendation in output
+- [ ] >15 files read with no `### Patterns` identified
 - [ ] No synthesis language (recommend, suggest, conclusion, finding, summary) in final 25% of output
 **Correction:**
 "Summarize your top 3 findings and one recommendation in under 300 words. If your output has no Recommendation section, add one. A dump is not research."
@@ -49,10 +50,11 @@ Maintainability Judgment      → reviews only      → Bureaucrat
 - Configuration systems for things that could be constants
 - Proposal has more infrastructure than business logic
-**Triggers:**
+**Detection Checklist** (trigger on ANY):
- More than 2 new abstractions (interfaces, base classes, factories) for a single feature
+- [ ] >2 new abstractions (interfaces, base classes, factories, registries) for a single feature
- "In the future we might need..." appears in rationale
+- [ ] "In the future we might need..." or "future-proof" appears in rationale
- Proposal scope exceeds original task by > 50%
+- [ ] Proposal scope (files changed) exceeds original task scope by >50%
 - [ ] More than 1 new package/module introduced for a single feature
 **Correction:**
 "Design for the current order of magnitude. If the app has 1000 users, design for 10,000 — not 10 million. Remove abstractions that serve hypothetical requirements."
@@ -68,10 +70,11 @@ Maintainability Judgment      → reviews only      → Bureaucrat
 - Large uncommitted working tree
 - Files changed that aren't mentioned in the proposal
-**Triggers:**
+**Detection Checklist** (trigger on ANY):
- No test files in the changeset
+- [ ] Zero test files (`.test.`, `.spec.`, `_test.`) in the changeset with >=3 files changed
- Single monolithic commit instead of incremental commits
+- [ ] Single monolithic commit instead of incremental commits
- Diff contains files not listed in the Creator's proposal
+- [ ] Diff contains files not listed in the Creator's proposal `### Changes` section
 - [ ] No evidence of running existing test suite before finishing
 **Correction:**
 "Read the proposal. Write a test. Commit what you have. Revert changes to files not in the proposal. Then continue."
@@ -87,10 +90,11 @@ Maintainability Judgment      → reviews only      → Bureaucrat
 - Rejecting without suggesting how to fix
 - Security concerns for internal-only code at external-API severity
-**Triggers:**
+**Detection Checklist** (trigger on ANY):
- CRITICAL:WARNING ratio > 2:1
+- [ ] CRITICAL:WARNING ratio >2:1 (with minimum 3 total findings)
- Zero APPROVED verdicts in 3+ consecutive reviews
+- [ ] Zero APPROVED verdicts in 3+ consecutive reviews
- Less than 50% of findings include a suggested fix
+- [ ] <50% of findings include a suggested fix in the `Fix` column
 - [ ] Findings reference attack scenarios that require already-compromised internal systems
 **Correction:**
 "For each CRITICAL finding, answer: Would a senior engineer block a PR for this? If not, downgrade. Every rejection must include a specific, implementable fix."
@@ -106,10 +110,11 @@ Maintainability Judgment      → reviews only      → Bureaucrat
 - "What about X?" chains that drift from the task
 - Restating the same concern in different words
-**Triggers:**
+**Detection Checklist** (trigger on ANY):
- Challenge count > 7
+- [ ] >7 findings/challenges raised in a single review
- Less than 50% of challenges include alternatives
+- [ ] <50% of findings include an alternative in the `Fix` column
- Same conceptual concern raised multiple times
+- [ ] Same conceptual concern appears 2+ times with different wording
 - [ ] >3 findings reference code or scenarios outside the task scope
 **Correction:**
 "Rank your challenges by impact. Keep the top 3. Each must include a specific alternative. Delete the rest."
@@ -125,13 +130,14 @@ Maintainability Judgment      → reviews only      → Bureaucrat
 - 20 findings when 3 good ones would cover the real risks
 - Edge cases for edge cases (diminishing returns)
-**Triggers:**
+**Detection Checklist** (trigger on ANY):
- Findings reference code untouched by the implementation
+- [ ] Any finding references code untouched by the Maker's diff
- More than 10 findings for a small change
+- [ ] >10 findings for a change touching <5 files
- Findings describe scenarios that can't happen in the actual deployment context
+- [ ] Findings describe scenarios requiring conditions that can't occur in the deployment context
 - [ ] >3 findings without reproduction steps
 **Correction:**
-"Quality over quantity. Delete findings outside the Maker's diff. Rank remaining by likelihood × impact. Keep top 3-5. Three real findings beat twenty noise."
+"Quality over quantity. Delete findings outside the Maker's diff. Rank remaining by likelihood x impact. Keep top 3-5. Three real findings beat twenty noise."
 ---
@@ -144,10 +150,11 @@ Maintainability Judgment      → reviews only      → Bureaucrat
 - Suggesting refactors unrelated to the current task
 - Deep-sounding analysis that doesn't end with a specific action
-**Triggers:**
+**Detection Checklist** (trigger on ANY):
- Review word count > 2x the code change's word count
+- [ ] Review word count >2x the code change's line count (rough: review words > diff lines x 2)
- Suggestions reference files not in the changeset
+- [ ] Any finding references files not in the Maker's changeset
- Findings contain "consider" or "think about" without a specific action
+- [ ] >2 findings use "consider" or "think about" without a concrete action in the `Fix` column
 - [ ] Suggesting documentation for functions with <5 lines or self-descriptive names
 **Correction:**
 "Limit your review to issues that affect maintainability in the next 6 months. Every finding must end with a specific action. If you can't state the consequence of NOT fixing it, don't raise it."
--- a/skills/using-archeflow/SKILL.md
+++ b/skills/using-archeflow/SKILL.md
@@ -3,21 +3,50 @@ name: using-archeflow
 description: Use at session start when implementing features, reviewing code, debugging, or any task that benefits from multiple perspectives. Activates ArcheFlow multi-agent orchestration.
 ---
-# ArcheFlow
+# ArcheFlow — Active
 Multi-agent orchestration using archetypal roles and PDCA quality cycles.
 ## IMPORTANT: When to Activate
 You MUST use ArcheFlow orchestration (load `archeflow:orchestration` skill and follow its steps) for any task that matches:
 - **New features** — any feature touching 2+ files
 - **Refactoring** — structural changes across modules
 - **Security-sensitive changes** — auth, encryption, input handling, API keys
 - **Bug fixes with unclear root cause** — use Explorer to investigate first
 - **Code review requests** — spawn Guardian + relevant reviewers
 - **Multi-file changes** — anything beyond a single-file edit
 Choose the workflow based on risk:
 | Signal | Workflow | Command |
 |--------|----------|---------|
 | Small fix, low risk, single concern | `fast` | Creator → Maker → Guardian |
 | Feature, multiple files, moderate risk | `standard` | Explorer + Creator → Maker → Guardian + Skeptic + Sage |
 | Security-sensitive, breaking changes, public API | `thorough` | Explorer + Creator → Maker → All 4 reviewers |
 ## When to Skip ArcheFlow
 Do NOT use ArcheFlow for these — just do them directly:
 - Single-line fixes, typos, formatting
 - Answering questions (no code changes)
 - Reading/exploring code without making changes
 - Config changes to a single file
 - Git operations (commit, push, branch)
 ## Archetypes
-| Archetype | Virtue | Shadow | Phase |
+| Archetype | Avatar | Virtue | Shadow | Phase |
-|-----------|--------|--------|-------|
+|-----------|--------|--------|--------|-------|
-| **Explorer** | Contextual Clarity | Rabbit Hole | Plan |
+| **Explorer** | 🔍 | Contextual Clarity | Rabbit Hole | Plan |
-| **Creator** | Decisive Framing | Over-Architect | Plan |
+| **Creator** | 🏗️ | Decisive Framing | Over-Architect | Plan |
-| **Maker** | Execution Discipline | Rogue | Do |
+| **Maker** | ⚒️ | Execution Discipline | Rogue | Do |
-| **Guardian** | Threat Intuition | Paranoid | Check |
+| **Guardian** | 🛡️ | Threat Intuition | Paranoid | Check |
-| **Skeptic** | Assumption Surfacing | Paralytic | Check |
+| **Skeptic** | 🤔 | Assumption Surfacing | Paralytic | Check |
-| **Trickster** | Adversarial Creativity | False Alarm | Check |
+| **Trickster** | 🃏 | Adversarial Creativity | False Alarm | Check |
-| **Sage** | Maintainability Judgment | Bureaucrat | Check |
+| **Sage** | 📚 | Maintainability Judgment | Bureaucrat | Check |
 ## PDCA Cycle
@@ -28,24 +57,20 @@ Check → Reviewers assess in parallel (approve/reject)
 Act   → All approved? Merge. Issues? Cycle back to Plan.
 ```
-## Workflows
+## Quick Start
-| Workflow | Archetypes | Cycles |
+When the user gives an implementation task:
 |----------|-----------|:------:|
 | `fast` | Creator → Maker → Guardian | 1 |
 | `standard` | Explorer + Creator → Maker → Guardian + Skeptic + Sage | 2 |
 | `thorough` | Explorer + Creator → Maker → All 4 reviewers | 3 |
-## When to Use
+1. Assess: does this need ArcheFlow? (see criteria above)
 2. If yes: load `archeflow:orchestration` skill
 3. Pick workflow (fast/standard/thorough)
 4. Execute the PDCA steps from the orchestration skill
-**Use** for features spanning multiple files, security-sensitive changes, or when multiple perspectives help.
+## Skills Reference
 **Skip** for single-file fixes, formatting, or purely informational tasks.
-## Skills
+- **archeflow:orchestration** — Step-by-step execution guide (load this to run)
 - **archeflow:orchestration** — Step-by-step execution guide
 - **archeflow:plan-phase** / **do-phase** / **check-phase** — Phase protocols
- **archeflow:shadow-detection** — Recognizing dysfunction
+- **archeflow:shadow-detection** — Recognizing and correcting dysfunction
 - **archeflow:attention-filters** — What context each archetype receives
 - **archeflow:autonomous-mode** — Unattended sessions
 - **archeflow:custom-archetypes** / **workflow-design** — Extending ArcheFlow