feat: corrective action framework + CLAUDE.md rewrite + v0.8.0 cleanup

- Extend shadow-detection with 3-layer corrective action framework: archetype shadows, system shadows (tunnel vision, echo chamber, etc.), and policy boundaries (checkpoints, budget gates, circuit breakers) - Rewrite CLAUDE.md with proper guardrails (DO/DO NOT, skill writing rules, 200-line max per skill, no bash pseudo-code in skills) - Update plugin.json to v0.8.0 with consolidated 19-skill list - Update README architecture tree and skills reference - Update using-archeflow version string to v0.8.0 / 19 skills - Remove 8 empty skill directories (absorbed into run skill)
2026-04-06 20:52:27 +02:00
parent 752177528f
commit 130c04fa58
5 changed files with 260 additions and 177 deletions
--- a/skills/shadow-detection/SKILL.md
+++ b/skills/shadow-detection/SKILL.md
@@ -1,66 +1,129 @@
 ---
 name: shadow-detection
-description: Use when monitoring agent behavior for dysfunction, when an agent seems stuck, or when orchestration quality is degrading. Detects and corrects Jungian shadow activation in archetypes.
+description: |
+  Corrective action framework for agent dysfunction, system health, and operational policy.
+  Three layers — archetype shadows, system shadows, policy boundaries — one escalation protocol.
 ---

-# Shadow Detection
+# Corrective Action Framework

-Every archetype has a virtue and a shadow (its destructive inversion). Shadow activates when the virtue is pushed too far.
+Detect dysfunction. Apply corrective action. Escalate if repeated.

-| Archetype | Virtue | Shadow |
-|-----------|--------|--------|
-| Explorer | Contextual Clarity | Rabbit Hole |
-| Creator | Decisive Framing | Over-Architect |
-| Maker | Execution Discipline | Rogue |
-| Guardian | Threat Intuition | Paranoid |
-| Skeptic | Assumption Surfacing | Paralytic |
-| Trickster | Adversarial Creativity | False Alarm |
-| Sage | Maintainability Judgment | Bureaucrat |
+Three layers, one protocol:
+- **Archetype Shadows** — individual agent dysfunction (virtue pushed too far)
+- **System Shadows** — orchestration-level dysfunction (process going wrong)
+- **Policy Boundaries** — operational limits (time, cost, quality thresholds)

 ---

-### Explorer -> Rabbit Hole
-**Detect** (any): output >2000w without Recommendation | >3 tangents | >15 files no patterns | no synthesis in final 25%
-**Correct**: "Summarize top 3 findings and one recommendation in under 300 words."
+## Archetype Shadows

-### Creator -> Over-Architect
-**Detect** (any): >2 new abstractions for a single feature | "future-proof" in rationale | scope exceeds task by >50% | >1 new package for one feature
-**Correct**: "Design for the current order of magnitude. Remove abstractions that serve hypothetical requirements."
+| Archetype | Shadow | Detect (any) | Corrective Action |
+|-----------|--------|-------------|-------------------|
+| Explorer | Rabbit Hole | Output >2000w without Recommendation; >3 tangents; >15 files no patterns; no synthesis in final 25% | "Summarize top 3 findings and one recommendation in 300 words." |
+| Creator | Over-Architect | >2 new abstractions for one feature; "future-proof" in rationale; scope exceeds task >50%; >1 new package | "Design for the current order of magnitude. Remove abstractions for hypothetical requirements." |
+| Maker | Rogue | Zero test files with >=3 files changed; single monolithic commit; files outside proposal; no test run evidence | "Read the proposal. Write a test. Commit. Revert out-of-scope files." |
+| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1 (min 3); zero APPROVED in 3+ reviews; <50% findings include fix; findings require compromised systems | "For each CRITICAL: would a senior engineer block a PR? If not, downgrade. Every rejection needs a specific fix." |
+| Skeptic | Paralytic | >7 challenges; <50% include alternatives; same concern 2+ times reworded; >3 findings outside scope | "Rank by impact. Keep top 3 with alternatives. Delete the rest." |
+| Trickster | False Alarm | Findings in untouched code; >10 findings for <5 files; impossible scenarios; >3 without repro steps | "Delete findings outside the diff. Rank by likelihood x impact. Keep top 3-5." |
+| Sage | Bureaucrat | Review words >2x diff lines; findings outside changeset; >2 "consider" without action; suggesting docs for trivial functions | "Limit to issues affecting maintainability in 6 months. Every finding needs a specific action." |

-### Maker -> Rogue
-**Detect** (any): zero test files with >=3 files changed | single monolithic commit | diff contains files not in proposal | no evidence of running tests
-**Correct**: "Read the proposal. Write a test. Commit what you have. Revert changes to files not in the proposal."
+### Shadow Immunity

-### Guardian -> Paranoid
-**Detect** (any): CRITICAL:WARNING ratio >2:1 (min 3 findings) | zero APPROVED in 3+ reviews | <50% findings include a fix | findings require already-compromised systems
-**Correct**: "For each CRITICAL: would a senior engineer block a PR for this? If not, downgrade. Every rejection must include a specific fix."
+Intensity alone is not a shadow. **Shadow = behavior disconnected from the goal.**

-### Skeptic -> Paralytic
-**Detect** (any): >7 challenges in a single review | <50% include alternatives | same concern appears 2+ times reworded | >3 findings outside task scope
-**Correct**: "Rank challenges by impact. Keep top 3. Each must include a specific alternative. Delete the rest."
-
-### Trickster -> False Alarm
-**Detect** (any): findings reference code untouched by diff | >10 findings for <5 files | impossible deployment scenarios | >3 findings without repro steps
-**Correct**: "Delete findings outside the diff. Rank remaining by likelihood x impact. Keep top 3-5."
-
-### Sage -> Bureaucrat
-**Detect** (any): review words >2x diff lines | findings reference files not in changeset | >2 "consider" without concrete action | suggesting docs for <5-line functions
-**Correct**: "Limit to issues affecting maintainability in the next 6 months. Every finding must end with a specific action."
-
---
-
-## Escalation Protocol
-
-1. **1st detection:** Log the shadow, apply the correction prompt, let the agent continue
-2. **2nd detection (same agent, same shadow):** Replace the agent -- the shadow is entrenched
-3. **3+ agents shadowed in same cycle:** Escalate to user -- the task may need to be broken down
-
-## Shadow Immunity
-
-Some behaviors look like shadows but are not. **Rule of thumb:** shadow = behavior disconnected from the goal. Intensity alone is not a shadow.
-
- Explorer reading 20 files in a monorepo with scattered dependencies -- not a rabbit hole if each file is genuinely relevant
- Creator adding an abstraction -- not over-architect if the current task genuinely needs it
- Guardian blocking with 2 CRITICALs -- not paranoid if both are genuine security vulnerabilities
+- Explorer reading 20 files in a monorepo with scattered deps -- not rabbit hole if each is relevant
+- Guardian blocking with 2 CRITICALs -- not paranoid if both are genuine vulnerabilities
 - Trickster finding 5 edge cases -- not false alarm if all are in changed code with repro steps
- Sage writing a long review -- not bureaucrat if the change is large and every finding is actionable
+
+---
+
+## System Shadows
+
+Orchestration-level dysfunction that isn't tied to one archetype.
+
+| Shadow | Detect | Corrective Action |
+|--------|--------|-------------------|
+| **Tunnel Vision** | All reviewers flag same category (e.g., 4 security findings, 0 quality/testing) | "Redistribute attention. Are we missing quality, testing, or design concerns?" |
+| **Echo Chamber** | Unanimous approval in <30s on standard/thorough workflow | "Suspicious fast consensus. Re-run Guardian with adversarial prompt." |
+| **Gold Plating** | Maker working on INFO fixes while CRITICALs remain open | "Fix CRITICALs first. Park INFO items." |
+| **Analysis Paralysis** | Plan phase >2x longer than Do phase; Explorer spawned 3+ times | "Stop researching. Ship a proposal with known gaps." |
+| **Cargo Cult** | Memory lesson injected but the same finding repeats anyway | "Lesson ineffective. Reword, strengthen, or remove it." |
+| **Broken Window** | 3+ WARNINGs deferred across consecutive runs in the same project | "Accumulated tech debt. Schedule a cleanup sprint." |
+| **Scope Creep** | Maker changes >2x files listed in proposal | "Revert to proposal scope. If more files needed, update the proposal first." |
+
+---
+
+## Policy Boundaries
+
+Operational limits that protect session quality, cost, and resumability.
+
+### Checkpoint Policy
+
+Every **45 minutes** or **3 completed tasks** (whichever first):
+
+1. Commit + push all work in progress
+2. Write handoff summary to `control-center.md`
+3. Log token spend so far
+4. Compare output quality: last task vs first task
+5. If quality degrading -> STOP with clean state
+6. If budget >80% spent -> STOP with clean state
+7. Otherwise -> continue
+
+### Budget Gate
+
+| Threshold | Action |
+|-----------|--------|
+| 50% budget spent | Log warning, continue |
+| 80% budget spent | Downgrade models (sonnet->haiku for reviewers) |
+| 95% budget spent | Complete current task, then STOP |
+| 100% budget | STOP immediately, commit WIP |
+
+### Circuit Breaker
+
+| Trigger | Action |
+|---------|--------|
+| 3 consecutive agent failures/timeouts | STOP. Infrastructure issue, not a code problem. |
+| 3 consecutive task failures in sprint | STOP. Something systemic is wrong. |
+| Same shadow detected 3+ times in one cycle | STOP. Task needs to be broken down or re-scoped. |
+| Test suite broken after merge | Auto-revert, STOP, report. |
+
+### Diminishing Returns
+
+| Signal | Action |
+|--------|--------|
+| Cycle N findings identical to cycle N-1 | STOP cycling. Present best result. |
+| Convergence score <0.5 for 2 consecutive cycles | STOP. "This needs a different approach." |
+| Reviewer finding count increases cycle over cycle | STOP. Implementation is diverging, not converging. |
+
+### Context Pollution
+
+| Signal | Action |
+|--------|--------|
+| >15 memory lessons injected into one prompt | Prune to top 5 by frequency |
+| >20 findings tracked across cycles | Summarize into top 5 themes |
+| Agent prompt exceeds estimated 50% of context window | Strip examples, keep rules only |
+
+---
+
+## Unified Escalation Protocol
+
+All three layers use the same escalation:
+
+| Step | Archetype Shadows | System Shadows | Policy Boundaries |
+|------|-------------------|----------------|-------------------|
+| **1st** | Apply corrective action, let agent continue | Apply corrective action, continue run | Apply boundary action (downgrade, checkpoint) |
+| **2nd** (same issue) | Replace the agent -- shadow is entrenched | Pause run, report to user | Force stop with clean state |
+| **3rd** (pattern) | Escalate to user: "task needs re-scoping" | Escalate to user: "systemic issue" | Escalate to user: "resource limits reached" |
+
+---
+
+## Integration
+
+Shadow checks run **after each agent completes** during orchestration. System shadow checks run **at phase boundaries**. Policy checks run **on a timer and at task boundaries**.
+
+The `run` skill references this framework at:
+- Step 3 (Check phase): archetype shadow monitoring
+- Step 4 (Act phase): convergence/diminishing returns
+- Step 5 (Completion): effectiveness scoring
+- Sprint skill: checkpoint policy between batches