| name | description |
|---|---|
| shadow-detection | Use when monitoring agent behavior for dysfunction, when an agent seems stuck, or when orchestration quality is degrading. Detects and corrects Jungian shadow activation in archetypes. |
Shadow Detection
Every archetype has a virtue (its unique contribution) and a shadow (the destructive inversion of that virtue). A shadow activates when the virtue is pushed too far.
Virtue (healthy) → pushed too far → Shadow (dysfunction)
Contextual Clarity → can't stop → Rabbit Hole
Decisive Framing → over-builds → Over-Architect
Execution Discipline → no guardrails → Rogue
Threat Intuition → sees threats only → Paranoid
Assumption Surfacing → questions only → Paralytic
Adversarial Creativity → noise over signal → False Alarm
Maintainability Judgment → reviews only → Bureaucrat
Explorer → Rabbit Hole
Virtue inverted: Contextual Clarity becomes compulsive investigation — or output that dumps without analyzing.
Symptoms:
- Research output keeps growing but never synthesizes
- "I found one more thing to check" repeated 3+ times
- Reading more than 15 files without producing findings
- Output is a raw inventory of files with no analysis or recommendation
Detection Checklist (trigger on ANY):
- Output >2000 words without a `### Recommendation` section
- >3 tangent topics not directly related to the original task
- >15 files read with no `### Patterns` identified
- No synthesis language (recommend, suggest, conclusion, finding, summary) in final 25% of output
Correction: "Summarize your top 3 findings and one recommendation in under 300 words. If your output has no Recommendation section, add one. A dump is not research."
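The Explorer checklist above is mostly mechanical, so it can be sketched as a small function. This is a minimal illustration, not part of ArcheFlow itself: the function name, the trigger strings, and the idea of passing `files_read` and `tangents` in as pre-counted numbers are all assumptions (counting tangents in particular needs human or model judgment).

```python
import re

SYNTHESIS_WORDS = ("recommend", "suggest", "conclusion", "finding", "summary")


def explorer_shadow_triggers(output: str, files_read: int, tangents: int) -> list[str]:
    """Return the Rabbit Hole checklist items that fire (hypothetical helper)."""
    triggers = []
    words = output.split()
    has_recommendation = re.search(r"^### Recommendation", output, re.MULTILINE) is not None
    has_patterns = re.search(r"^### Patterns", output, re.MULTILINE) is not None

    if len(words) > 2000 and not has_recommendation:
        triggers.append("long output without Recommendation section")
    if tangents > 3:
        triggers.append("more than 3 tangent topics")
    if files_read > 15 and not has_patterns:
        triggers.append("more than 15 files read without Patterns")

    # Final 25% of the output, measured in words.
    tail = " ".join(words[len(words) * 3 // 4:]).lower()
    if not any(word in tail for word in SYNTHESIS_WORDS):
        triggers.append("no synthesis language in final 25%")
    return triggers
```

Because the checklist triggers on ANY item, a non-empty return value is enough to apply the correction prompt.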
Creator → Over-Architect
Virtue inverted: Decisive Framing becomes designing at the wrong scale.
Symptoms:
- Abstraction layers for one-time operations
- Future-proofing for requirements that don't exist
- Configuration systems for things that could be constants
- Proposal has more infrastructure than business logic
Detection Checklist (trigger on ANY):
- >2 new abstractions (interfaces, base classes, factories, registries) for a single feature
- "In the future we might need..." or "future-proof" appears in rationale
- Proposal scope (files changed) exceeds original task scope by >50%
- More than 1 new package/module introduced for a single feature
Correction: "Design for the current order of magnitude. If the app has 1000 users, design for 10,000 — not 10 million. Remove abstractions that serve hypothetical requirements."
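A rough sketch of the Over-Architect checklist, assuming the orchestrator has already counted new abstractions, new modules, and files. The function and parameter names are hypothetical, and reading "exceeds original task scope by >50%" as "proposal file count is more than 1.5 times the task's expected file count" is one possible interpretation.

```python
FUTURE_PHRASES = ("in the future we might need", "future-proof")


def creator_shadow_triggers(
    new_abstractions: int,
    new_modules: int,
    rationale: str,
    task_file_count: int,
    proposal_file_count: int,
) -> list[str]:
    """Return the Over-Architect checklist items that fire (hypothetical helper)."""
    triggers = []
    if new_abstractions > 2:
        triggers.append("more than 2 new abstractions")
    text = rationale.lower()
    if any(phrase in text for phrase in FUTURE_PHRASES):
        triggers.append("future-proofing language in rationale")
    # Assumed reading: scope is measured as a file count.
    if proposal_file_count > 1.5 * task_file_count:
        triggers.append("proposal scope exceeds task scope by more than 50%")
    if new_modules > 1:
        triggers.append("more than 1 new package/module")
    return triggers
```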
Maker → Rogue
Virtue inverted: Execution Discipline becomes reckless shipping — or expanding beyond the plan.
Symptoms:
- Writing code before reading the proposal fully
- No tests, or tests written after implementation
- Large uncommitted working tree
- Files changed that aren't mentioned in the proposal
Detection Checklist (trigger on ANY):
- Zero test files (`.test.`, `.spec.`, `_test.`) in the changeset with >=3 files changed
- Single monolithic commit instead of incremental commits
- Diff contains files not listed in the Creator's proposal `### Changes` section
- No evidence of running existing test suite before finishing
Correction: "Read the proposal. Write a test. Commit what you have. Revert changes to files not in the proposal. Then continue."
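The Rogue checklist compares the Maker's changeset against the proposal, which maps naturally onto set operations. The sketch below is illustrative only: the function name and inputs are assumptions, and treating "one commit touching 3 or more files" as monolithic is an assumed threshold that the checklist does not state.

```python
TEST_MARKERS = (".test.", ".spec.", "_test.")


def maker_shadow_triggers(
    changed_files: list[str],
    proposal_files: list[str],
    commit_count: int,
    ran_tests: bool,
) -> list[str]:
    """Return the Rogue checklist items that fire (hypothetical helper)."""
    triggers = []
    has_tests = any(marker in path for path in changed_files for marker in TEST_MARKERS)
    if len(changed_files) >= 3 and not has_tests:
        triggers.append("no test files in changeset")
    # Assumption: a single commit covering 3+ files counts as monolithic.
    if commit_count == 1 and len(changed_files) >= 3:
        triggers.append("single monolithic commit")
    # Files in the diff that the proposal's Changes section never listed.
    unplanned = sorted(set(changed_files) - set(proposal_files))
    if unplanned:
        triggers.append("files outside proposal: " + ", ".join(unplanned))
    if not ran_tests:
        triggers.append("no evidence of test run")
    return triggers
```

The `unplanned` list doubles as the input for the "revert changes to files not in the proposal" step of the correction.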
Guardian → Paranoid
Virtue inverted: Threat Intuition becomes blocking everything — without offering a path forward.
Symptoms:
- Every finding marked CRITICAL
- Blocking on theoretical risks with < 1% probability
- Rejecting without suggesting how to fix
- Security concerns for internal-only code at external-API severity
Detection Checklist (trigger on ANY):
- CRITICAL:WARNING ratio >2:1 (with minimum 3 total findings)
- Zero APPROVED verdicts in 3+ consecutive reviews
- <50% of findings include a suggested fix in the `Fix` column
- Findings reference attack scenarios that require already-compromised internal systems
Correction: "For each CRITICAL finding, answer: Would a senior engineer block a PR for this? If not, downgrade. Every rejection must include a specific, implementable fix."
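The ratio and fix-rate thresholds in the Paranoid checklist are easy to get subtly wrong (for example, dividing by zero when there are no WARNING findings), so a short sketch may help. The `Finding` shape and function name are assumptions; the attack-scenario item needs judgment and is left out.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    severity: str  # "CRITICAL" or "WARNING"; other labels are ignored here
    fix: str = ""  # contents of the Fix column; empty means no fix suggested


def guardian_shadow_triggers(findings: list[Finding], recent_verdicts: list[str]) -> list[str]:
    """Return the Paranoid checklist items that fire (hypothetical helper)."""
    triggers = []
    critical = sum(f.severity == "CRITICAL" for f in findings)
    warning = sum(f.severity == "WARNING" for f in findings)
    # CRITICAL:WARNING > 2:1, cross-multiplied so zero warnings is safe.
    if len(findings) >= 3 and critical > 2 * warning:
        triggers.append("CRITICAL:WARNING ratio above 2:1")
    # Assumed reading: none of the last three verdicts is APPROVED.
    if len(recent_verdicts) >= 3 and "APPROVED" not in recent_verdicts[-3:]:
        triggers.append("no APPROVED verdict in last 3 reviews")
    with_fix = sum(bool(f.fix.strip()) for f in findings)
    if findings and with_fix < 0.5 * len(findings):
        triggers.append("fewer than 50% of findings include a fix")
    return triggers
```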
Skeptic → Paralytic
Virtue inverted: Assumption Surfacing becomes inability to approve anything — drowning signal in tangential concerns.
Symptoms:
- More than 7 challenges raised
- Challenges without suggested alternatives
- "What about X?" chains that drift from the task
- Restating the same concern in different words
Detection Checklist (trigger on ANY):
- >7 findings/challenges raised in a single review
- <50% of findings include an alternative in the `Fix` column
- Same conceptual concern appears 2+ times with different wording
- >3 findings reference code or scenarios outside the task scope
Correction: "Rank your challenges by impact. Keep the top 3. Each must include a specific alternative. Delete the rest."
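Detecting "the same concern in different words" is the hardest Paralytic check to automate. One crude proxy, sketched below, is word overlap (Jaccard similarity) between challenges. This is a stand-in technique chosen for illustration, not a method the checklist prescribes; the function names and the 0.4 threshold are assumptions, and real conceptual duplicates will often need a reviewer's judgment.

```python
def _tokens(text: str) -> set[str]:
    """Lowercased words longer than 3 characters, with trailing punctuation removed."""
    return {w.strip(".,?!").lower() for w in text.split() if len(w) > 3}


def repeated_concerns(challenges: list[str], threshold: float = 0.4) -> list[tuple[int, int]]:
    """Index pairs whose Jaccard word overlap meets the (assumed) threshold."""
    pairs = []
    for i in range(len(challenges)):
        for j in range(i + 1, len(challenges)):
            a, b = _tokens(challenges[i]), _tokens(challenges[j])
            union = a | b
            if union and len(a & b) / len(union) >= threshold:
                pairs.append((i, j))
    return pairs


def skeptic_shadow_triggers(challenges: list[str]) -> list[str]:
    """Two of the Paralytic checklist items (hypothetical helper)."""
    triggers = []
    if len(challenges) > 7:
        triggers.append("more than 7 challenges")
    if repeated_concerns(challenges):
        triggers.append("same concern restated")
    return triggers
```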
Trickster → False Alarm
Virtue inverted: Adversarial Creativity becomes noise — too many low-signal findings drowning the real issues.
Symptoms:
- Testing code that wasn't changed
- Reporting non-bugs as bugs (unrealistic test scenarios)
- 20 findings when 3 good ones would cover the real risks
- Edge cases for edge cases (diminishing returns)
Detection Checklist (trigger on ANY):
- Any finding references code untouched by the Maker's diff
- >10 findings for a change touching <5 files
- Findings describe scenarios requiring conditions that can't occur in the deployment context
- >3 findings without reproduction steps
Correction: "Quality over quantity. Delete findings outside the Maker's diff. Rank remaining by likelihood x impact. Keep top 3-5. Three real findings beat twenty noise."
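The correction above is itself a small algorithm: filter to the diff, rank by likelihood x impact, keep the top few. A minimal sketch, assuming each finding carries a numeric `likelihood` (0 to 1) and `impact` (1 to 5); those fields, their scales, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    file: str
    likelihood: float  # assumed scale: 0.0 to 1.0
    impact: int        # assumed scale: 1 (minor) to 5 (severe)


def prune_findings(findings: list[Finding], diff_files: set[str], keep: int = 5) -> list[Finding]:
    """Drop findings outside the Maker's diff, then keep the highest-scoring ones."""
    in_scope = [f for f in findings if f.file in diff_files]
    ranked = sorted(in_scope, key=lambda f: f.likelihood * f.impact, reverse=True)
    return ranked[:keep]
```

For example, with `keep=3` a list of twenty findings collapses to the three most likely and most damaging issues in the changed files.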
Sage → Bureaucrat
Virtue inverted: Maintainability Judgment becomes bloat — reviews longer than the code, or insight without action.
Symptoms:
- Review longer than the code change itself
- Requesting documentation for self-evident code
- Suggesting refactors unrelated to the current task
- Deep-sounding analysis that doesn't end with a specific action
Detection Checklist (trigger on ANY):
- Review word count >2x the code change's line count (rough: review words > diff lines x 2)
- Any finding references files not in the Maker's changeset
- >2 findings use "consider" or "think about" without a concrete action in the `Fix` column
- Suggesting documentation for functions with <5 lines or self-descriptive names
Correction: "Limit your review to issues that affect maintainability in the next 6 months. Every finding must end with a specific action. If you can't state the consequence of NOT fixing it, don't raise it."
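Two of the Bureaucrat checks reduce to simple counting. The sketch below assumes findings are available as `(text, fix)` pairs and that the diff's line count is already known; both helper names are hypothetical.

```python
VAGUE_PHRASES = ("consider", "think about")


def review_too_long(review_text: str, diff_lines: int) -> bool:
    """True when review words exceed diff lines x 2 (the rough rule above)."""
    return len(review_text.split()) > 2 * diff_lines


def vague_finding_count(findings: list[tuple[str, str]]) -> int:
    """Count findings that use vague phrasing and leave the Fix column empty."""
    return sum(
        1
        for text, fix in findings
        if any(phrase in text.lower() for phrase in VAGUE_PHRASES) and not fix.strip()
    )
```

A `vague_finding_count` above 2, or `review_too_long` returning true, would each be enough to trigger the correction.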
Shadow Escalation Protocol
- First detection: Log the shadow, apply the correction prompt, let the agent continue
- Second detection (same agent, same shadow): Replace the agent with a fresh one. The shadow is entrenched.
- Shadow detected in 3+ agents in the same cycle: The task itself may be poorly scoped. Escalate to the user: "Multiple agents are struggling — the task may need to be broken down."
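The escalation steps above amount to a small state tracker per cycle. This is one possible sketch: the class, the action strings, and the choice to let the 3+ agent rule take priority over agent replacement are assumptions, not documented ArcheFlow behavior.

```python
from collections import Counter


class ShadowTracker:
    """Tracks shadow detections within one orchestration cycle (hypothetical)."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()  # (agent_id, shadow) -> detections

    def record(self, agent_id: str, shadow: str) -> str:
        """Log a detection and return the next action as a string."""
        self._counts[(agent_id, shadow)] += 1
        agents_affected = {agent for agent, _ in self._counts}
        if len(agents_affected) >= 3:
            return "escalate_to_user"   # task may be poorly scoped
        if self._counts[(agent_id, shadow)] >= 2:
            return "replace_agent"      # shadow is entrenched
        return "apply_correction"       # first detection

    def new_cycle(self) -> None:
        """Reset counts when a new cycle starts."""
        self._counts.clear()
```

Usage: call `record` whenever a detection checklist fires, act on the returned string, and call `new_cycle` between cycles so earlier detections do not leak into the next one.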
Shadow Immunity
Some behaviors LOOK like shadows but aren't:
- Explorer reading 20 files in a monorepo with scattered dependencies → not a rabbit hole if each file is genuinely relevant
- Creator adding an abstraction → not over-architect if the abstraction is genuinely needed by the current task
- Guardian blocking with 2 CRITICAL findings → not paranoid if both are genuine security vulnerabilities
- Trickster finding 5 edge cases → not false alarm if all are in the changed code with reproduction steps
- Sage writing a long review → not bureaucrat if the change is large and every finding is actionable
Rule of thumb: Shadow = behavior disconnected from the goal. Intensity alone is not a shadow.