feat: core improvements — feedback loop, attention filters, shadow heuristics, metrics, auto-activation

- Cross-cycle feedback protocol with structured finding format, routing, and resolution tracking
- Attention filter enforcement: explicit context include/exclude per archetype
- Shadow detection: quantitative checklists with concrete thresholds
- Orchestration metrics: per-phase timing, agent count, findings summary
- Autonomous mode wiring: checkpoint protocol, session log, stop conditions
- Auto-activation: SessionStart hook fires ArcheFlow for implementation tasks without user config
- Emoji avatars for all 7 archetypes
- Standardized finding format across all reviewers for cross-cycle tracking
- Persisted implementation plan in docs/
2026-04-03 06:02:10 +02:00
parent eec1fc3d82
commit d08dc657d1
14 changed files with 553 additions and 85 deletions
--- a/skills/orchestration/SKILL.md
+++ b/skills/orchestration/SKILL.md
@@ -22,9 +22,13 @@ Assess the task and pick:
 Spawn agents sequentially — Creator needs Explorer's findings.

 ### Explorer (if standard or thorough)
+
+**Context to include:** Task description, relevant file paths, codebase access.
+**Context to exclude:** Prior proposals, review outputs, implementation details, feedback from previous cycles.
+
 ```
 Agent(
-  description: "Explorer: research context",
+  description: "🔍 Explorer: research context",
  prompt: "<task description>
    You are the EXPLORER archetype.
    Research the codebase to understand:
@@ -39,18 +43,24 @@ Agent(
 ```

 ### Creator
+
+**Context to include:** Task description, Explorer's research output. On cycle 2+: prior cycle's structured feedback (see Cycle Feedback Protocol).
+**Context to exclude:** Raw file contents (Explorer already summarized), git diffs, reviewer full outputs.
+
 ```
 Agent(
-  description: "Creator: design proposal",
+  description: "🏗️ Creator: design proposal",
  prompt: "<task description>
    You are the CREATOR archetype.
    Based on the research findings: <Explorer's output>
+    <if cycle 2+: Prior cycle feedback: <structured feedback — see Cycle Feedback Protocol>>
    Design a solution proposal including:
    1. Architecture decisions (with rationale)
    2. Files to create/modify (with specific changes)
    3. Test strategy
    4. Confidence score (0.0 to 1.0)
    5. Risks you foresee
+    <if cycle 2+: 6. How you addressed each unresolved issue from prior feedback>
    Be decisive. Ship a clear plan, not a menu of options.",
  subagent_type: "Plan"
 )
@@ -60,12 +70,16 @@ Agent(

 Spawn Maker in an **isolated worktree** so changes don't affect main.

+**Context to include:** Creator's proposal only. On cycle 2+: implementation-routed feedback from Sage/Trickster.
+**Context to exclude:** Explorer's research, Guardian/Skeptic findings (those go to Creator).
+
 ```
 Agent(
-  description: "Maker: implement proposal",
+  description: "⚒️ Maker: implement proposal",
  prompt: "<task description>
    You are the MAKER archetype.
    Implement this proposal: <Creator's output>
+    <if cycle 2+: Implementation feedback from prior cycle: <Sage/Trickster findings only>>
    Rules:
    1. Follow the proposal exactly — don't redesign
    2. Write tests for every behavioral change
@@ -85,9 +99,13 @@ Agent(
 Spawn reviewers **in parallel** — they read the Maker's changes independently.

 ### Guardian
+
+**Context to include:** Maker's git diff, proposal risk section only.
+**Context to exclude:** Explorer's research, full proposal, other reviewer outputs.
+
 ```
 Agent(
-  description: "Guardian: security and risk review",
+  description: "🛡️ Guardian: security and risk review",
  prompt: "You are the GUARDIAN archetype.
    Review the changes in branch: <maker's branch>
    Assess:
@@ -96,31 +114,42 @@ Agent(
    3. Breaking changes (API compatibility, schema migrations)
    4. Dependency risks (new deps, version conflicts)
    Output: APPROVED or REJECTED with specific findings.
-    Each finding needs: location, severity (critical/warning/info), description, fix suggestion.
+    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
+    Categories: security, reliability, design, breaking-change, dependency
    Be rigorous but practical — flag real risks, not theoretical ones."
 )
 ```

 ### Skeptic (if standard or thorough)
+
+**Context to include:** Creator's proposal (focus on assumptions section).
+**Context to exclude:** Git diff details, Explorer's research, other reviewer outputs.
+
 ```
 Agent(
-  description: "Skeptic: challenge assumptions",
+  description: "🤔 Skeptic: challenge assumptions",
  prompt: "You are the SKEPTIC archetype.
-    Review the changes in branch: <maker's branch>
+    Review the proposal: <Creator's proposal>
    Challenge:
    1. Assumptions in the design — what if they're wrong?
    2. Alternative approaches not considered
    3. Edge cases not tested
    4. Scalability concerns
    Output: APPROVED or REJECTED with counterarguments.
+    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
+    Categories: design, quality, testing, scalability
    Be constructive — every challenge must include a suggested alternative."
 )
 ```

 ### Sage (if standard or thorough)
+
+**Context to include:** Creator's proposal, Maker's git diff, implementation summary.
+**Context to exclude:** Explorer's raw research, other reviewer outputs.
+
 ```
 Agent(
-  description: "Sage: holistic quality review",
+  description: "📚 Sage: holistic quality review",
  prompt: "You are the SAGE archetype.
    Review the changes in branch: <maker's branch>
    Evaluate holistically:
@@ -129,14 +158,20 @@ Agent(
    3. Documentation (does the change need docs?)
    4. Consistency with codebase patterns
    Output: APPROVED or REJECTED with quality findings.
+    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
+    Categories: quality, testing, design, consistency
    Judge like a senior engineer doing a PR review."
 )
 ```

 ### Trickster (if thorough only)
+
+**Context to include:** Maker's git diff only.
+**Context to exclude:** Everything else — proposal, research, other reviews.
+
 ```
 Agent(
-  description: "Trickster: adversarial testing",
+  description: "🃏 Trickster: adversarial testing",
  prompt: "You are the TRICKSTER archetype.
    Try to break the changes in branch: <maker's branch>
    Attack vectors:
@@ -145,6 +180,8 @@ Agent(
    3. Error path exploitation
    4. Dependency failure scenarios
    Output: APPROVED or REJECTED with edge cases found.
+    Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
+    Categories: security, reliability, testing
    Think like a QA engineer who gets paid per bug found."
 )
 ```
@@ -157,11 +194,12 @@ Collect all reviewer outputs and decide:
 1. Merge the Maker's worktree branch into the target branch
 2. Report: what was implemented, what was reviewed, any warnings noted
 3. Clean up the worktree
+4. Record metrics (see Orchestration Metrics)

 ### Issues Found (and cycles remaining)
-1. Collect all findings into a feedback summary
+1. Build structured feedback using the Cycle Feedback Protocol below
 2. Go back to Step 1 (Plan) with the feedback
-3. Creator revises the proposal based on reviewer findings
+3. Creator revises the proposal, addressing each unresolved issue
 4. Maker re-implements in a fresh worktree
 5. Reviewers check again

@@ -170,11 +208,156 @@ Collect all reviewer outputs and decide:
 2. Present the best implementation so far (on its branch)
 3. Let the user decide: merge as-is, fix manually, or abandon

+---
+
+## Cycle Feedback Protocol
+
+After the Check phase, build structured feedback for the next cycle. This replaces dumping raw reviewer output.
+
+### 1. Extract Findings
+
+Parse each reviewer's output into the standardized format:
+
+```markdown
+## Cycle N Feedback
+
+### Unresolved Issues
+| Source | Severity | Category | Issue | Route to |
+|--------|----------|----------|-------|----------|
+| Guardian | CRITICAL | security | SQL injection in user input | Creator |
+| Skeptic | WARNING | design | Assumes single-tenant only | Creator |
+| Sage | WARNING | quality | Test names don't describe behavior | Maker |
+| Trickster | CRITICAL | reliability | Empty string bypasses validation | Creator |
+
+### Resolved (from cycle N-1)
+| Source | Issue | Resolution |
+|--------|-------|------------|
+| Guardian | Missing rate limit | Added rate limiter middleware |
+```
+
+### 2. Route Feedback
+
+Not all findings go to the same agent:
+
+| Finding source | Routes to | Rationale |
+|----------------|-----------|-----------|
+| Guardian (security, breaking-change) | **Creator** | Design must change |
+| Skeptic (design, scalability) | **Creator** | Assumptions need revision |
+| Sage (quality, consistency) | **Maker** | Implementation refinement |
+| Trickster (reliability, testing) | **Creator** if design flaw, **Maker** if test gap | Depends on root cause |
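
The routing table above could be sketched in Python roughly as follows. This is an illustration only, not part of the skill: the `Finding` dataclass, the `route_finding` function, and the `design_flaw` flag are hypothetical names, and the sketch assumes every Guardian and Skeptic finding goes to the Creator, whatever its category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str                # reviewer archetype, e.g. "Guardian"
    severity: str              # CRITICAL, WARNING, or INFO
    category: str              # e.g. "security", "quality"
    issue: str                 # short description of the problem
    design_flaw: bool = False  # hypothetical flag: root cause lies in the design


def route_finding(finding: Finding) -> str:
    """Return the archetype that should receive this finding next cycle."""
    if finding.source in ("Guardian", "Skeptic"):
        return "Creator"       # the design or its assumptions must change
    if finding.source == "Sage":
        return "Maker"         # implementation refinement
    if finding.source == "Trickster":
        # Depends on root cause: design flaw vs. test gap.
        return "Creator" if finding.design_flaw else "Maker"
    raise ValueError(f"unknown reviewer: {finding.source}")
```

For the Trickster the root cause has to be judged by whoever builds the feedback; the boolean flag here only stands in for that judgment.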
+
+### 3. Track Resolution
+
+Compare cycle N findings against cycle N-1:
+- If a prior finding no longer appears in the same category → mark **resolved**
+- If a prior finding persists → it stays **unresolved** with an incremented cycle count
+- If new findings appear → add as new unresolved issues
+
+This prevents regression and gives the Creator/Maker a clear list of what to address.
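
A minimal Python sketch of the three resolution rules above. It assumes a finding is identified by its `(source, category)` pair, which is one reading of "no longer appears in the same category"; the function name `track_resolution` and the data shapes are hypothetical.

```python
from typing import Dict, Set, Tuple

Key = Tuple[str, str]  # (source, category): an assumed identity for a finding


def track_resolution(
    prior: Dict[Key, int], current: Set[Key]
) -> Tuple[Dict[Key, int], Set[Key]]:
    """Compare cycle N findings against cycle N-1.

    prior maps each unresolved finding to the number of cycles it has been open.
    Returns (unresolved with updated cycle counts, resolved findings).
    """
    # A prior finding that no longer appears is marked resolved.
    resolved = {key for key in prior if key not in current}
    # Persisting findings get an incremented count; new findings start at 1.
    unresolved = {key: prior.get(key, 0) + 1 for key in current}
    return unresolved, resolved
```

Keying on `(source, category)` is coarse: two distinct issues from the same reviewer in the same category would be treated as one finding, so a real tracker might need a finer key.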
+
+---
+
+## Orchestration Metrics
+
+Track lightweight metrics throughout the orchestration. No token counting (unreliable from skill layer) — just timing and outcomes.
+
+### Per-Phase Logging
+
+After each phase completes, note:
+
+```
+| Phase | Duration | Agents | Outcome |
+|-------|----------|--------|---------|
+| Plan  | 45s      | 2      | Proposal ready (confidence: 0.8) |
+| Do    | 90s      | 1      | 4 files changed, 8 tests added |
+| Check | 60s      | 3      | 1 REJECTED (Guardian), 2 APPROVED |
+| Act   | —        | —      | Cycle back → feedback built |
+```
+
+### Orchestration Summary
+
+At orchestration end, include in the report:
+
+```markdown
+## Orchestration Metrics
+| Metric | Value |
+|--------|-------|
+| Workflow | standard |
+| Cycles | 2 of 2 |
+| Total duration | 4m 30s |
+| Agents spawned | 9 |
+| Findings (total) | 5 |
+| Findings (critical) | 1 |
+| Findings (resolved) | 4 |
+| Shadow detections | 0 |
+```
+
+Use this data to calibrate future workflow selection. For example, if tasks run with the fast workflow consistently finish without a revision cycle, they were scoped correctly for it.
+
+---
+
+## Autonomous Mode
+
+When running unattended (overnight sessions, batch queues), add these behaviors to the orchestration loop:
+
+### Between-Task Checkpoint
+
+After each task completes (success or failure):
+1. **Commit and push** all changes immediately
+2. **Update session log** at `.archeflow/session-log.md` with task outcome
+3. **Check stop conditions** before starting next task:
+   - 3 consecutive failures → STOP
+   - Shadow escalation (same shadow 3+ times) → STOP
+   - Test suite broken after merge → REVERT and STOP
+   - Destructive action detected → STOP
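
The stop conditions above can be expressed as a single check run between tasks. The sketch below is illustrative: `SessionState` and `check_stop_conditions` are hypothetical names, and how each field gets populated (test runs, shadow counts, destructive-action detection) is left to the orchestrator.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SessionState:
    consecutive_failures: int = 0
    shadow_counts: Dict[str, int] = field(default_factory=dict)  # shadow name -> detections
    tests_broken_after_merge: bool = False
    destructive_action_detected: bool = False


def check_stop_conditions(state: SessionState) -> Optional[str]:
    """Return the action to take before the next task, or None to continue."""
    if state.consecutive_failures >= 3:
        return "STOP: 3 consecutive failures"
    if any(count >= 3 for count in state.shadow_counts.values()):
        return "STOP: shadow escalation"
    if state.tests_broken_after_merge:
        return "REVERT and STOP: test suite broken after merge"
    if state.destructive_action_detected:
        return "STOP: destructive action detected"
    return None
```

The function only reports a decision; committing, reverting, and writing the session log stay in the checkpoint steps themselves.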
+
+### Session Log Protocol
+
+Write to `.archeflow/session-log.md` after each task:
+
+```markdown
+## Task N: <description>
+**Workflow:** standard | **Status:** COMPLETED/FAILED
+**Cycles:** 1 of 2
+**Findings:** Guardian APPROVED, Skeptic APPROVED, Sage WARNING (test names)
+**Files changed:** 5 | **Tests added:** 12
+**Branch:** merged to main (commit abc1234) | OR: archeflow/maker-xyz (NOT merged)
+**Duration:** 8 min
+```
+
+### Safety Rules
+- Never force-push. Never modify main history.
+- All work stays on worktree branches until explicitly merged
+- Merges use `--no-ff`, so each task's merge can be reverted individually
+- Failed tasks leave branches intact for manual inspection
+
+For full autonomous mode details (task queues, overnight checklists, user controls): load the `archeflow:autonomous-mode` skill.
+
+---
+
+## Shadow Monitoring
+
+During orchestration, watch for shadow activation after each agent completes. Quick checklist:
+
+| Archetype | Shadow | Quick Check |
+|-----------|--------|-------------|
+| Explorer | Rabbit Hole | Output >2000 words without Recommendation section? |
+| Creator | Over-Architect | >2 new abstractions for one feature? |
+| Maker | Rogue | No test files in changeset? Files outside proposal? |
+| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1? Zero approvals? |
+| Skeptic | Paralytic | >7 challenges? <50% have alternatives? |
+| Trickster | False Alarm | Findings in untouched code? >10 findings? |
+| Sage | Bureaucrat | Review >2x code change length? |
+
+On detection: apply correction prompt from `archeflow:shadow-detection` skill. On second detection of same shadow: replace agent. On 3+ shadows in same cycle: escalate to user.
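
The escalation ladder above might look like this in Python. It is a sketch under two assumptions that the text leaves open: detections are counted per cycle, and "3+ shadows" means three or more detections in total rather than three distinct shadows. The name `shadow_response` and the returned labels are hypothetical.

```python
from collections import Counter
from typing import List


def shadow_response(detections: List[str]) -> str:
    """Pick a response to the newest shadow detection in the current cycle.

    detections lists every shadow detected so far this cycle, newest last.
    """
    if not detections:
        raise ValueError("no shadow detected")
    if len(detections) >= 3:
        return "escalate"        # 3+ shadows in the same cycle: hand off to the user
    if Counter(detections)[detections[-1]] >= 2:
        return "replace_agent"   # second detection of the same shadow
    return "correct"             # first detection: apply the correction prompt
```

Checking the escalation threshold first means a third detection always reaches the user, even when it is also a repeat of an earlier shadow.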
+
+---
+
 ## Orchestration Report

 After completion, summarize:

-```
+```markdown
 ## ArcheFlow Orchestration Report
 - **Task:** <description>
 - **Workflow:** standard (2 cycles)
@@ -183,4 +366,5 @@ After completion, summarize:
 - **Files changed:** 4 files, +120 -30 lines
 - **Tests added:** 8 new tests
 - **Branch:** archeflow/maker-<id> → merged to main
+- **Metrics:** 9 agents, 4m 30s, 5 findings (4 resolved, 1 info remaining)
 ```