feat: ArcheFlow — multi-agent orchestration plugin for Claude Code

Zero-dependency Claude Code plugin using Jungian archetypes as
behavioral protocols for multi-agent orchestration.

- 7 archetypes (Explorer, Creator, Maker, Guardian, Skeptic, Trickster, Sage)
- ArcheHelix: rising PDCA quality spiral with feedback loops
- Shadow detection: automatic dysfunction recognition and correction
- 3 built-in workflows (fast, standard, thorough)
- Autonomous mode: unattended overnight sessions with full visibility
- Custom archetypes and workflows via markdown/YAML
- SessionStart hook for automatic bootstrap
- Examples for feature implementation and security review

---
name: shadow-detection
description: Use when monitoring agent behavior for dysfunction, when an agent seems stuck, or when orchestration quality is degrading. Detects and corrects Jungian shadow activation in archetypes.
---
# Shadow Detection — The Dark Side of Strength

Every archetype has a **shadow**: the destructive inversion of its core strength. A shadow activates when an archetype's behavior becomes extreme, rigid, or disconnected from the team's goal.

Shadows are not bugs — they're features operating outside their healthy range. Detection and correction are part of the orchestration, not a failure.

## The Seven Shadows
### Explorer → The Rabbit Hole

**Strength inverted:** Curiosity becomes compulsive investigation.

**Symptoms:**
- Research output keeps growing but never synthesizes
- "I found one more thing to check" repeated 3+ times
- Reading more than 15 files without producing findings
- Output is a raw list of files/functions with no analysis or recommendation
- Research time exceeds the implementation estimate

**Triggers:**
- Output length > 2000 words without a recommendation section
- More than 3 "see also" or "related" tangents
- No confidence score or decisive recommendation

**Correction:**
Stop the Explorer. Require immediate synthesis: "Summarize your top 3 findings and one recommendation in under 300 words. Everything else is noise."
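
These triggers are mechanical enough to check in code. A minimal sketch, assuming the orchestrator has access to the agent's raw output; the function name and regex heuristics are illustrative, not part of ArcheFlow:

```python
import re

def explorer_shadow_triggered(output: str) -> bool:
    """Rabbit Hole trigger check: an illustrative sketch, not plugin code."""
    word_count = len(output.split())
    has_recommendation = bool(re.search(r"(?im)^#+\s*recommendation", output))
    has_confidence = bool(re.search(r"(?i)\bconfidence\b\s*[:=]?\s*\d", output))
    tangents = len(re.findall(r"(?i)\bsee also\b|\brelated\b", output))
    return (
        (word_count > 2000 and not has_recommendation)  # verbose, no synthesis
        or tangents > 3                                  # too many side quests
        or not (has_confidence or has_recommendation)    # nothing decisive
    )
```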
---
### Creator → The Perfectionist

**Strength inverted:** Design excellence becomes endless refinement.

**Symptoms:**
- Proposal revised 3+ times without new information driving the revision
- Adding "nice to have" features not in the original task
- Confidence score keeps dropping instead of stabilizing
- Scope expanding with each revision
- "What about..." additions that weren't in Explorer's findings

**Triggers:**
- Revision count > 2 without external feedback
- Proposal scope exceeds original task by > 50%
- Confidence drops below 0.5

**Correction:**
Freeze the proposal. "Ship at current state. Imperfect plans that ship beat perfect plans that don't. Note remaining concerns under 'Risks' and let the Check phase catch them."
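
A comparable sketch for the Perfectionist triggers, assuming the orchestrator tracks revision counts and some measure of scope size (both assumptions; the plugin states these thresholds only in prose):

```python
def creator_shadow_triggered(
    revisions_without_feedback: int,
    proposal_scope_items: int,   # e.g. work items in the current proposal
    task_scope_items: int,       # work items in the original task
    confidence: float,
) -> bool:
    """Perfectionist trigger check: an illustrative sketch only."""
    scope_growth = proposal_scope_items / max(task_scope_items, 1)
    return (
        revisions_without_feedback > 2
        or scope_growth > 1.5    # proposal exceeds the original task by > 50%
        or confidence < 0.5
    )
```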
---
### Maker → The Cowboy

**Strength inverted:** Bias for action becomes reckless shipping.

**Symptoms:**
- Writing code before reading the proposal fully
- No tests, or tests written after implementation (not TDD)
- Large uncommitted working tree ("I'll commit when it's done")
- "Improving" code outside the proposal's scope
- Ignoring existing patterns in favor of "better" approaches

**Triggers:**
- No test files in the changeset
- Single monolithic commit instead of incremental commits
- Files changed that aren't mentioned in the proposal
- More than 50% of the implementation work done without a commit

**Correction:**
Halt implementation. "Read the proposal. Write a test. Commit what you have. Then continue."
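
The Cowboy triggers are observable directly from the working tree. A hypothetical check, assuming the orchestrator can list changed files, proposal files, and commits (none of this is actual plugin code):

```python
def maker_shadow_triggered(
    changed_files: set[str],    # files touched in the working tree
    proposal_files: set[str],   # files the proposal says will change
    commits: int,               # commits made so far
    progress: float,            # estimated fraction of work done, 0.0-1.0
) -> bool:
    """Cowboy trigger check: an illustrative sketch only."""
    has_tests = any("test" in path.lower() for path in changed_files)
    out_of_scope = changed_files - proposal_files   # files the proposal never mentions
    uncommitted_majority = commits == 0 and progress > 0.5
    return (not has_tests) or bool(out_of_scope) or uncommitted_majority
```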
---
### Guardian → The Paranoid

**Strength inverted:** Risk awareness becomes blocking everything.

**Symptoms:**
- Every finding marked CRITICAL
- Blocking on theoretical risks with < 1% probability
- Rejecting 3+ proposals without offering a viable path forward
- Treating internal-only code with external-API threat severity
- Requiring mitigations that cost more than the risk they address

**Triggers:**
- CRITICAL:WARNING ratio > 2:1
- Zero APPROVED verdicts in 3+ consecutive reviews
- Findings reference threat models inappropriate to the context
- No suggested fixes, only rejections

**Correction:**
Recalibrate. "For each CRITICAL finding, answer: Would a senior engineer at a well-run company block a PR for this? If not, downgrade to WARNING. Provide a fix suggestion for every finding you keep as CRITICAL."
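
A sketch of the Paranoid triggers, assuming findings carry a severity and a verdict history is available (assumed inputs, illustrative only):

```python
def guardian_shadow_triggered(
    critical: int,
    warning: int,
    recent_verdicts: list[str],   # most recent review verdicts, e.g. ["REJECTED", ...]
    findings_with_fixes: int,
    total_findings: int,
) -> bool:
    """Paranoid trigger check: an illustrative sketch only."""
    ratio_breach = critical > 2 * warning          # CRITICAL:WARNING ratio > 2:1
    approval_drought = (
        len(recent_verdicts) >= 3 and "APPROVED" not in recent_verdicts[-3:]
    )
    rejections_only = total_findings > 0 and findings_with_fixes == 0
    return ratio_breach or approval_drought or rejections_only
```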
---
### Skeptic → The Paralytic

**Strength inverted:** Critical thinking becomes inability to approve anything.

**Symptoms:**
- More than 7 challenges raised
- Challenges without suggested alternatives
- Questioning requirements that are outside the task scope
- "What if" chains more than 2 levels deep
- Restating the same concern in different words

**Triggers:**
- Challenge count > 7
- Less than 50% of challenges include alternatives
- Challenges reference concerns outside the task scope
- Same conceptual concern raised multiple times

**Correction:**
Force-rank. "Rank your challenges by impact. Keep the top 3. Each must include a specific alternative. Delete the rest."
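
The same idea for the Paralytic, assuming each challenge is recorded with its alternative and a scope flag; this shape is hypothetical, the plugin defines no such schema:

```python
def skeptic_shadow_triggered(challenges: list[dict]) -> bool:
    """Paralytic trigger check: an illustrative sketch only.

    Assumed challenge shape: {"text": str, "alternative": str | None, "in_scope": bool}.
    Spotting the same concern restated in new words needs semantic comparison
    and is left out here.
    """
    if len(challenges) > 7:
        return True
    if challenges:
        with_alternatives = sum(1 for c in challenges if c.get("alternative"))
        if with_alternatives / len(challenges) < 0.5:
            return True
    return any(not c.get("in_scope", True) for c in challenges)
```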
---
### Trickster → The Saboteur

**Strength inverted:** Adversarial testing becomes destructive chaos.

**Symptoms:**
- Modifying code instead of testing it
- "Testing" by breaking things outside the scope of changes
- Finding bugs in unrelated subsystems and claiming the change caused them
- Attacks with no constructive reporting (just "it's broken")
- Enjoying destruction more than improving quality

**Triggers:**
- Agent modifies files that aren't in the Maker's changeset
- Findings reference code untouched by the implementation
- No reproduction steps in findings
- Tone shifts from analytical to gleeful

**Correction:**
Scope enforcement. "You test the CHANGES, not the entire system. Limit attacks to files in the Maker's diff. Every finding must include exact reproduction steps."
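
Apart from the tone shift, a Saboteur check is mostly set arithmetic over file paths, assuming both the Maker's diff and the Trickster's findings are visible (assumed inputs):

```python
def trickster_shadow_triggered(
    modified_files: set[str],    # files the Trickster itself changed
    maker_diff_files: set[str],  # files in the Maker's changeset
    findings: list[dict],        # assumed shape: {"file": str, "repro": str | None}
) -> bool:
    """Saboteur trigger check: an illustrative sketch only."""
    edits_outside_diff = bool(modified_files - maker_diff_files)
    out_of_scope = any(f["file"] not in maker_diff_files for f in findings)
    missing_repro = any(not f.get("repro") for f in findings)
    return edits_outside_diff or out_of_scope or missing_repro
```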
---
### Sage → The Bureaucrat

**Strength inverted:** Holistic judgment becomes documentation bloat.

**Symptoms:**
- Review longer than the code change itself
- Requesting documentation for self-evident code
- Suggesting refactors unrelated to the current task
- Adding "while we're here" improvement suggestions
- Philosophical commentary that doesn't lead to actionable findings

**Triggers:**
- Review word count > 2x the code change's word count
- More than 30% of findings are INFO severity
- Suggestions reference files not in the changeset
- "Consider" or "think about" without a specific recommendation

**Correction:**
Focus. "Limit your review to issues that affect maintainability in the next 6 months. For each finding, state the specific consequence of NOT fixing it. If you can't, it's not worth raising."
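
The first two Bureaucrat triggers reduce to two ratios, assuming word counts and finding severities are tracked (illustrative only):

```python
def sage_shadow_triggered(
    review_words: int,
    change_words: int,
    info_findings: int,
    total_findings: int,
) -> bool:
    """Bureaucrat trigger check: an illustrative sketch only."""
    too_long = review_words > 2 * change_words
    info_heavy = total_findings > 0 and info_findings / total_findings > 0.30
    return too_long or info_heavy
```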
---
## Shadow Escalation Protocol
1. **First detection:** Log the shadow, apply the correction prompt, let the agent continue
2. **Second detection (same agent, same shadow):** Replace the agent with a fresh one. The shadow is entrenched.
3. **Shadow detected in 3+ agents in the same cycle:** The task itself may be poorly scoped. Escalate to the user: "Multiple agents are struggling — the task may need to be broken down."
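
Expressed as state, the protocol is a counter keyed by (agent, shadow) plus a per-cycle set of shadowed agents. A minimal sketch; the class and method names are invented for illustration:

```python
from collections import Counter

class ShadowLedger:
    """Tracks shadow detections and picks the escalation step (illustrative sketch)."""

    def __init__(self) -> None:
        self.detections = Counter()    # keyed by (agent, shadow)
        self.shadowed_agents = set()   # agents that shadowed this cycle

    def record(self, agent: str, shadow: str) -> str:
        """Return the escalation action for a new detection."""
        self.detections[(agent, shadow)] += 1
        self.shadowed_agents.add(agent)
        if len(self.shadowed_agents) >= 3:         # 3+ agents in one cycle
            return "escalate-to-user"              # task may be poorly scoped
        if self.detections[(agent, shadow)] >= 2:  # same agent, same shadow again
            return "replace-agent"                 # shadow is entrenched
        return "apply-correction"                  # first detection: correct and continue

    def new_cycle(self) -> None:
        self.shadowed_agents.clear()
```

In use, the orchestrator would call `record()` on every detection and act on the returned step; the cycle-wide check deliberately takes precedence, since replacing agents one by one won't fix a badly scoped task.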
## Shadow Immunity

Some behaviors LOOK like shadows but aren't:

- Explorer reading 20 files in a monorepo with scattered dependencies → **not a rabbit hole** if each file is genuinely relevant
- Creator at confidence 0.4 → **not perfectionism** if the task is genuinely ambiguous (flag to user instead)
- Guardian blocking with 2 CRITICAL findings → **not paranoia** if both are genuine security vulnerabilities
- Trickster finding 5 edge cases → **not sabotage** if all are in the changed code with reproduction steps

**Rule of thumb:** Shadow = behavior disconnected from the goal. Intensity alone is not a shadow.