Christian Nennemann d08dc657d1 feat: core improvements — feedback loop, attention filters, shadow heuristics, metrics, auto-activation

- Cross-cycle feedback protocol with structured finding format, routing, and resolution tracking
- Attention filter enforcement: explicit context include/exclude per archetype
- Shadow detection: quantitative checklists with concrete thresholds
- Orchestration metrics: per-phase timing, agent count, findings summary
- Autonomous mode wiring: checkpoint protocol, session log, stop conditions
- Auto-activation: SessionStart hook fires ArcheFlow for implementation tasks without user config
- Emoji avatars for all 7 archetypes
- Standardized finding format across all reviewers for cross-cycle tracking
- Persisted implementation plan in docs/

2026-04-03 06:02:10 +02:00

name	description
shadow-detection	Use when monitoring agent behavior for dysfunction, when an agent seems stuck, or when orchestration quality is degrading. Detects and corrects Jungian shadow activation in archetypes.

Shadow Detection

Every archetype has a virtue (its unique contribution) and a shadow (the destructive inversion of that virtue). A shadow activates when the virtue is pushed too far.

Virtue (healthy)              → pushed too far    → Shadow (dysfunction)

Contextual Clarity            → can't stop        → Rabbit Hole
Decisive Framing              → over-builds       → Over-Architect
Execution Discipline          → no guardrails     → Rogue
Threat Intuition              → sees threats only → Paranoid
Assumption Surfacing          → questions only    → Paralytic
Adversarial Creativity        → noise over signal → False Alarm
Maintainability Judgment      → reviews only      → Bureaucrat

Explorer → Rabbit Hole

Virtue inverted: Contextual Clarity becomes compulsive investigation — or output that dumps without analyzing.

Symptoms:

Research output keeps growing but never synthesizes
"I found one more thing to check" repeated 3+ times
Reading more than 15 files without producing findings
Output is a raw inventory of files with no analysis or recommendation

Detection Checklist (trigger on ANY):

Output >2000 words without a ### Recommendation section
>3 tangent topics not directly related to the original task
>15 files read with no ### Patterns identified
No synthesis language (recommend, suggest, conclusion, finding, summary) in final 25% of output

Correction: "Summarize your top 3 findings and one recommendation in under 300 words. If your output has no Recommendation section, add one. A dump is not research."
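
The Explorer checklist above is mechanical enough to sketch in code. This is a minimal illustration, not the project's implementation: the function name `rabbit_hole_triggers`, the trigger labels, and the idea that `files_read` and `tangent_topics` arrive as pre-computed counts are all assumptions. The "final 25%" is measured in words, which is one possible reading of the checklist.

```python
SYNTHESIS_WORDS = ("recommend", "suggest", "conclusion", "finding", "summary")


def rabbit_hole_triggers(output: str, files_read: int, tangent_topics: int) -> list[str]:
    """Return labels (hypothetical names) for each Explorer checklist item that fires."""
    triggers = []
    words = output.split()

    # Output >2000 words without a "### Recommendation" section.
    if len(words) > 2000 and "### Recommendation" not in output:
        triggers.append("long_without_recommendation")

    # >3 tangent topics (counted upstream; counting them is out of scope here).
    if tangent_topics > 3:
        triggers.append("too_many_tangents")

    # >15 files read with no "### Patterns" heading in the output.
    if files_read > 15 and "### Patterns" not in output:
        triggers.append("many_files_no_patterns")

    # No synthesis language in the final 25% of the output (by word count).
    tail = " ".join(words[len(words) * 3 // 4:]).lower()
    if not any(word in tail for word in SYNTHESIS_WORDS):
        triggers.append("no_synthesis_in_tail")

    return triggers
```

Because the checklist triggers on ANY item, a caller would treat a non-empty list as a detection and send the correction prompt.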

Creator → Over-Architect

Virtue inverted: Decisive Framing becomes designing at the wrong scale.

Symptoms:

Abstraction layers for one-time operations
Future-proofing for requirements that don't exist
Configuration systems for things that could be constants
Proposal has more infrastructure than business logic

Detection Checklist (trigger on ANY):

>2 new abstractions (interfaces, base classes, factories, registries) for a single feature
"In the future we might need..." or "future-proof" appears in rationale
Proposal scope (files changed) exceeds original task scope by >50%
More than 1 new package/module introduced for a single feature

Correction: "Design for the current order of magnitude. If the app has 1000 users, design for 10,000 — not 10 million. Remove abstractions that serve hypothetical requirements."
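
As a rough sketch, the Creator checklist above could be evaluated against a summary of the proposal. The `Proposal` dataclass and its field names are hypothetical; the document does not define a proposal schema, so the counts are assumed to be supplied by whoever parses the proposal.

```python
from dataclasses import dataclass

FUTURE_PHRASES = ("in the future we might need", "future-proof")


@dataclass
class Proposal:
    """Hypothetical summary of a Creator proposal."""
    new_abstractions: int      # interfaces, base classes, factories, registries
    new_modules: int           # new packages/modules introduced
    files_in_task_scope: int   # files the original task implies
    files_in_proposal: int     # files the proposal wants to change
    rationale: str


def over_architect_triggers(p: Proposal) -> list[str]:
    """Return labels (hypothetical names) for each Creator checklist item that fires."""
    triggers = []
    if p.new_abstractions > 2:
        triggers.append("too_many_abstractions")
    if any(phrase in p.rationale.lower() for phrase in FUTURE_PHRASES):
        triggers.append("future_proofing_language")
    # Scope exceeds the original task by more than 50%.
    if p.files_in_task_scope > 0 and p.files_in_proposal > 1.5 * p.files_in_task_scope:
        triggers.append("scope_creep")
    if p.new_modules > 1:
        triggers.append("too_many_modules")
    return triggers
```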

Maker → Rogue

Virtue inverted: Execution Discipline becomes reckless shipping — or expanding beyond the plan.

Symptoms:

Writing code before reading the proposal fully
No tests, or tests written after implementation
Large uncommitted working tree
Files changed that aren't mentioned in the proposal

Detection Checklist (trigger on ANY):

Zero test files (names containing .test., .spec., or _test.) in a changeset of >=3 files
Single monolithic commit instead of incremental commits
Diff contains files not listed in the Creator's proposal ### Changes section
No evidence of running existing test suite before finishing

Correction: "Read the proposal. Write a test. Commit what you have. Revert changes to files not in the proposal. Then continue."
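
The Maker checklist above maps naturally onto a changeset. The sketch below assumes the caller already has the list of changed files, the files named in the proposal's ### Changes section, the commit count, and a flag for whether the existing test suite was run; how those are collected is not specified in this document. Treating "exactly one commit" as monolithic is a crude proxy chosen for illustration.

```python
import re

# File-name patterns from the checklist: .test., .spec., _test.
TEST_FILE = re.compile(r"\.test\.|\.spec\.|_test\.")


def rogue_triggers(
    changed_files: list[str],
    proposal_files: list[str],
    commit_count: int,
    ran_test_suite: bool,
) -> list[str]:
    """Return labels (hypothetical names) for each Maker checklist item that fires."""
    triggers = []

    has_tests = any(TEST_FILE.search(path) for path in changed_files)
    if len(changed_files) >= 3 and not has_tests:
        triggers.append("no_tests")

    # Crude proxy for "single monolithic commit" (an assumption of this sketch).
    if commit_count == 1:
        triggers.append("single_commit")

    # Files in the diff that the proposal never mentioned.
    if set(changed_files) - set(proposal_files):
        triggers.append("files_outside_proposal")

    if not ran_test_suite:
        triggers.append("test_suite_not_run")

    return triggers
```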

Guardian → Paranoid

Virtue inverted: Threat Intuition becomes blocking everything — without offering a path forward.

Symptoms:

Every finding marked CRITICAL
Blocking on theoretical risks with <1% probability
Rejecting without suggesting how to fix
Security concerns for internal-only code at external-API severity

Detection Checklist (trigger on ANY):

CRITICAL:WARNING ratio >2:1 (with minimum 3 total findings)
Zero APPROVED verdicts in 3+ consecutive reviews
<50% of findings include a suggested fix in the Fix column
Findings reference attack scenarios that require already-compromised internal systems

Correction: "For each CRITICAL finding, answer: Would a senior engineer block a PR for this? If not, downgrade. Every rejection must include a specific, implementable fix."
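
Three of the four Guardian checks above are countable; the fourth (attack scenarios that presuppose compromised internal systems) needs judgment and is left out of this sketch. The `Finding` dataclass is hypothetical, as is the verdict string "REJECTED" used in the test data; only CRITICAL, WARNING, and APPROVED come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical shape of one Guardian finding."""
    severity: str   # "CRITICAL" or "WARNING"; other values are ignored by the ratio
    fix: str = ""   # contents of the Fix column, empty if none was suggested


def paranoid_triggers(findings: list[Finding], recent_verdicts: list[str]) -> list[str]:
    """Return labels (hypothetical names) for each countable Guardian check that fires."""
    triggers = []

    criticals = sum(f.severity == "CRITICAL" for f in findings)
    warnings = sum(f.severity == "WARNING" for f in findings)
    # CRITICAL:WARNING ratio >2:1, written without division so zero warnings is safe.
    if len(findings) >= 3 and criticals > 2 * warnings:
        triggers.append("critical_ratio")

    # Zero APPROVED verdicts across the last three reviews.
    last_three = recent_verdicts[-3:]
    if len(last_three) == 3 and "APPROVED" not in last_three:
        triggers.append("no_approvals")

    # Fewer than 50% of findings carry a suggested fix.
    with_fix = sum(bool(f.fix.strip()) for f in findings)
    if findings and with_fix * 2 < len(findings):
        triggers.append("missing_fixes")

    return triggers
```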

Skeptic → Paralytic

Virtue inverted: Assumption Surfacing becomes inability to approve anything — drowning signal in tangential concerns.

Symptoms:

More than 7 challenges raised
Challenges without suggested alternatives
"What about X?" chains that drift from the task
Restating the same concern in different words

Detection Checklist (trigger on ANY):

>7 findings/challenges raised in a single review
<50% of findings include an alternative in the Fix column
Same conceptual concern appears 2+ times with different wording
>3 findings reference code or scenarios outside the task scope

Correction: "Rank your challenges by impact. Keep the top 3. Each must include a specific alternative. Delete the rest."
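
The Skeptic correction above (rank, keep the top 3, require an alternative) can be sketched as a small pruning step. The `Challenge` dataclass and its numeric `impact` score are assumptions; the document does not say how impact is measured.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    """Hypothetical shape of one Skeptic challenge."""
    text: str
    impact: int            # assumed reviewer-assigned score, higher = more important
    alternative: str = ""  # contents of the Fix column


def apply_paralytic_correction(
    challenges: list[Challenge], keep: int = 3
) -> tuple[list[Challenge], list[Challenge]]:
    """Keep the highest-impact challenges and report which still lack an alternative."""
    # sorted() is stable, so ties keep their original order.
    ranked = sorted(challenges, key=lambda c: c.impact, reverse=True)
    kept = ranked[:keep]
    needs_alternative = [c for c in kept if not c.alternative.strip()]
    return kept, needs_alternative
```

Challenges returned in the second list are kept but still have to be given a specific alternative before the review is accepted.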

Trickster → False Alarm

Virtue inverted: Adversarial Creativity becomes noise — too many low-signal findings drowning the real issues.

Symptoms:

Testing code that wasn't changed
Reporting non-bugs as bugs (unrealistic test scenarios)
20 findings when 3 good ones would cover the real risks
Edge cases for edge cases (diminishing returns)

Detection Checklist (trigger on ANY):

Any finding references code untouched by the Maker's diff
>10 findings for a change touching <5 files
Findings describe scenarios requiring conditions that can't occur in the deployment context
>3 findings without reproduction steps

Correction: "Quality over quantity. Delete findings outside the Maker's diff. Rank the rest by likelihood × impact. Keep the top 3-5. Three real findings beat twenty noisy ones."
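
The Trickster correction above amounts to a filter followed by a ranking. A minimal sketch, assuming each finding records the file it concerns plus numeric likelihood and impact scores (the `TricksterFinding` fields and the `triage` name are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class TricksterFinding:
    """Hypothetical shape of one Trickster finding."""
    title: str
    file: str
    likelihood: float  # assumed probability-like score between 0 and 1
    impact: int        # assumed severity score, higher = worse


def triage(
    findings: list[TricksterFinding], diff_files: set[str], keep: int = 5
) -> list[TricksterFinding]:
    """Drop findings outside the Maker's diff, then keep the top few by likelihood x impact."""
    in_scope = [f for f in findings if f.file in diff_files]
    in_scope.sort(key=lambda f: f.likelihood * f.impact, reverse=True)
    return in_scope[:keep]
```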

Sage → Bureaucrat

Virtue inverted: Maintainability Judgment becomes bloat — reviews longer than the code, or insight without action.

Symptoms:

Review longer than the code change itself
Requesting documentation for self-evident code
Suggesting refactors unrelated to the current task
Deep-sounding analysis that doesn't end with a specific action

Detection Checklist (trigger on ANY):

Review word count >2x the code change's line count (roughly: review words > diff lines × 2)
Any finding references files not in the Maker's changeset
>2 findings use "consider" or "think about" without a concrete action in the Fix column
Suggesting documentation for functions with <5 lines or self-descriptive names

Correction: "Limit your review to issues that affect maintainability in the next 6 months. Every finding must end with a specific action. If you can't state the consequence of NOT fixing it, don't raise it."
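
Two of the Sage checks above reduce to simple counting. The sketch below covers the review-length ratio and the vague-language check; findings are passed as assumed `(description, fix)` pairs, and the function name and labels are hypothetical.

```python
VAGUE_PHRASES = ("consider", "think about")


def bureaucrat_triggers(
    review_text: str,
    diff_line_count: int,
    findings: list[tuple[str, str]],
) -> list[str]:
    """Return labels (hypothetical names) for the countable Sage checks that fire.

    `findings` holds (description, fix) pairs; an empty fix means no concrete action.
    """
    triggers = []

    # Review words > diff lines x 2.
    if len(review_text.split()) > 2 * diff_line_count:
        triggers.append("review_longer_than_change")

    # More than 2 findings that hedge without a concrete action in the Fix column.
    vague = sum(
        1
        for description, fix in findings
        if any(p in description.lower() for p in VAGUE_PHRASES) and not fix.strip()
    )
    if vague > 2:
        triggers.append("vague_findings")

    return triggers
```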

Shadow Escalation Protocol

First detection: Log the shadow, apply the correction prompt, let the agent continue
Second detection (same agent, same shadow): Replace the agent with a fresh one. The shadow is entrenched.
Shadow detected in 3+ agents in the same cycle: The task itself may be poorly scoped. Escalate to the user: "Multiple agents are struggling — the task may need to be broken down."
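
The three escalation steps above can be sketched as a small tracker. This is one possible reading, with several assumptions: the class and method names are hypothetical, the user-escalation rule is checked first, and the per-agent count is reset once an agent is replaced so the fresh agent starts clean.

```python
from collections import Counter


class ShadowTracker:
    """Hypothetical tracker that decides which escalation step applies."""

    def __init__(self) -> None:
        self.detections: Counter = Counter()      # (agent, shadow) -> times detected
        self.agents_this_cycle: set[str] = set()  # agents with any shadow this cycle

    def start_cycle(self) -> None:
        """Reset the per-cycle view; per-agent counts are kept (an assumption)."""
        self.agents_this_cycle.clear()

    def record(self, agent: str, shadow: str) -> str:
        """Log one detection and return the action to take."""
        key = (agent, shadow)
        self.detections[key] += 1
        self.agents_this_cycle.add(agent)

        # Shadows in 3+ agents in one cycle: the task may be poorly scoped.
        if len(self.agents_this_cycle) >= 3:
            return "escalate_to_user"
        # Second detection for the same agent and shadow: the shadow is entrenched.
        if self.detections[key] >= 2:
            del self.detections[key]  # the replacement agent starts with a clean count
            return "replace_agent"
        # First detection: log, apply the correction prompt, continue.
        return "apply_correction"
```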

Shadow Immunity

Some behaviors LOOK like shadows but aren't:

Explorer reading 20 files in a monorepo with scattered dependencies → not a rabbit hole if each file is genuinely relevant
Creator adding an abstraction → not over-architect if the abstraction is genuinely needed by the current task
Guardian blocking with 2 CRITICAL findings → not paranoid if both are genuine security vulnerabilities
Trickster finding 5 edge cases → not false alarm if all are in the changed code with reproduction steps
Sage writing a long review → not bureaucrat if the change is large and every finding is actionable

Rule of thumb: Shadow = behavior disconnected from the goal. Intensity alone is not a shadow.
