feat: ArcheFlow — multi-agent orchestration plugin for Claude Code

Zero-dependency Claude Code plugin using Jungian archetypes as behavioral protocols for multi-agent orchestration. - 7 archetypes (Explorer, Creator, Maker, Guardian, Skeptic, Trickster, Sage) - ArcheHelix: rising PDCA quality spiral with feedback loops - Shadow detection: automatic dysfunction recognition and correction - 3 built-in workflows (fast, standard, thorough) - Autonomous mode: unattended overnight sessions with full visibility - Custom archetypes and workflows via markdown/YAML - SessionStart hook for automatic bootstrap - Examples for feature implementation and security review
2026-04-02 16:37:23 +00:00
parent 071724a568
commit a6fa708f8b
24 changed files with 1929 additions and 0 deletions
--- a/skills/autonomous-mode/SKILL.md
+++ b/skills/autonomous-mode/SKILL.md
@@ -0,0 +1,163 @@
+---
+name: autonomous-mode
+description: Use when the user wants to run ArcheFlow orchestrations unattended — overnight sessions, batch processing multiple tasks, or fully autonomous coding. Handles self-organization, progress logging, and safe stopping.
+---
+
+# Autonomous Mode — Unattended ArcheHelix
+
+ArcheFlow orchestrations can run fully autonomously because the archetypes self-organize through the PDCA cycle. The user sets the task queue, walks away, and reviews results later.
+
+## How Autonomous Mode Works
+
+The ArcheHelix provides natural quality gates at every turn of the spiral:
+- **Plan** phase produces a proposal — reviewable artifact
+- **Do** phase produces committed code in a worktree — isolated, reversible
+- **Check** phase produces approval/rejection — automatic quality control
+- **Act** phase either merges (safe) or cycles back (self-correcting)
+
+No unreviewed code reaches the main branch. Ever. That's what makes overnight runs safe.
+
+## Starting an Autonomous Session
+
+```
+You are entering AUTONOMOUS MODE.
+
+Task queue:
+1. "Add input validation to all API endpoints" (thorough)
+2. "Refactor auth middleware to use JWT" (standard)
+3. "Fix pagination bug in search results" (fast)
+4. "Add rate limiting to public endpoints" (standard)
+
+Rules:
+- Process tasks sequentially (one ArcheHelix at a time)
+- Log progress to .archeflow/session-log.md after each task
+- If a task fails after max cycles: log findings, skip to next task
+- If 3 consecutive tasks fail: STOP and wait for user
+- Commit and push after each successful merge
+- Never force-push. Never modify main history.
+```
+
+## Session Log — Full Visibility
+
+Every autonomous session writes to `.archeflow/session-log.md`:
+
+```markdown
+# ArcheFlow Autonomous Session
+**Started:** 2026-04-02 22:00 UTC
+**Mode:** autonomous
+**Tasks:** 4 queued
+
+---
+
+## Task 1: Add input validation to all API endpoints
+**Workflow:** thorough | **Status:** COMPLETED
+**Cycles:** 2 of 3
+**Cycle 1:** Guardian REJECTED (missing sanitization on 2 endpoints)
+**Cycle 2:** All APPROVED
+**Files changed:** 8 | **Tests added:** 24
+**Branch:** merged to main (commit abc1234)
+**Duration:** 12 min | **Completed:** 22:12 UTC
+
+---
+
+## Task 2: Refactor auth middleware to use JWT
+**Workflow:** standard | **Status:** COMPLETED
+**Cycles:** 1 of 2
+**Cycle 1:** All APPROVED (clean implementation)
+**Files changed:** 5 | **Tests added:** 15
+**Branch:** merged to main (commit def5678)
+**Duration:** 8 min | **Completed:** 22:20 UTC
+
+---
+
+## Task 3: Fix pagination bug in search results
+**Workflow:** fast | **Status:** COMPLETED
+**Cycles:** 1 of 1
+**Cycle 1:** Guardian APPROVED
+**Files changed:** 2 | **Tests added:** 3
+**Branch:** merged to main (commit ghi9012)
+**Duration:** 4 min | **Completed:** 22:24 UTC
+
+---
+
+## Task 4: Add rate limiting to public endpoints
+**Workflow:** standard | **Status:** FAILED (max cycles)
+**Cycles:** 2 of 2
+**Cycle 1:** Skeptic REJECTED (Redis dependency not in Docker setup)
+**Cycle 2:** Guardian REJECTED (race condition in token bucket)
+**Unresolved:** Race condition in concurrent token bucket decrement
+**Branch:** archeflow/maker-xyz (NOT merged — available for manual review)
+**Duration:** 15 min | **Completed:** 22:39 UTC
+
+---
+
+## Session Summary
+**Completed:** 3 of 4 tasks
+**Failed:** 1 (rate limiting — needs human input on concurrency design)
+**Total duration:** 39 min
+**Files changed:** 15 | **Tests added:** 42
+**Ended:** 22:39 UTC
+```
+
+## Safety Mechanisms
+
+### Automatic Stop Conditions
+The session halts and waits for the user when:
+- **3 consecutive failures:** Something systemic is wrong
+- **Destructive action detected:** Force push, branch deletion, schema drop
+- **Shadow escalation:** Same shadow detected 3+ times across tasks
+- **Budget exceeded:** If cost tracking is enabled, stop at budget limit
+- **Test suite broken:** If existing tests fail after merge, halt immediately and revert
+
+### Everything is Reversible
+- Code changes live on worktree branches until explicitly merged
+- Merges use `--no-ff` — every merge commit is individually revertable
+- The session log captures every decision for post-hoc review
+- Failed tasks leave their branches intact for manual inspection
+
+### User Controls
+The user can at any time:
+- **Cancel:** Kill the session. All incomplete work stays on branches.
+- **Pause:** Stop after current task completes. Resume later.
+- **Skip:** Skip the current task, move to the next one.
+- **Review:** Read `.archeflow/session-log.md` for real-time progress.
+- **Intervene:** Jump into a worktree branch and fix something manually.
+
+## Task Queue Formats
+
+### Simple (inline)
+```
+Tasks:
+1. "Fix the login bug" (fast)
+2. "Add user profile page" (standard)
+```
+
+### From File
+Create `.archeflow/queue.md`:
+```markdown
+- [ ] Fix the login bug | fast
+- [ ] Add user profile page | standard
+- [ ] Security audit of payment flow | thorough
+- [x] Refactor database queries | standard (completed)
+```
+
+### With Dependencies
+```markdown
+- [ ] Add user model (standard)
+- [ ] Add user API endpoints (standard) | depends: user model
+- [ ] Add user UI (standard) | depends: user API endpoints
+```
+Dependencies are processed in order. Parallel-safe tasks run concurrently.
+
+## Overnight Session Checklist
+
+Before starting an autonomous overnight session:
+
+1. **Clean working tree:** `git status` — no uncommitted changes
+2. **Tests passing:** Run the full test suite. Don't start on a broken baseline.
+3. **Task queue defined:** Either inline or in `.archeflow/queue.md`
+4. **Workflow selected per task:** Match risk level to workflow type
+5. **Budget set (optional):** If cost matters, set a token/dollar limit
+6. **Push access:** Verify git push works (SSH key, auth token)
+
+Then: set it, forget it, read the session log in the morning.
--- a/skills/check-phase/SKILL.md
+++ b/skills/check-phase/SKILL.md
@@ -0,0 +1,155 @@
+---
+name: check-phase
+description: Use when you are acting as Guardian, Skeptic, Sage, or Trickster archetype in the Check phase. Defines review protocols and approval criteria.
+---
+
+# Check Phase — Review Protocols
+
+Multiple reviewers examine the Maker's implementation in parallel. Each has a specific lens.
+
+## General Review Rules
+
+1. **Read the proposal first.** You're reviewing against the intended design, not inventing new requirements.
+2. **Read the actual code changes.** Use `git diff` on the Maker's branch. Don't review based on descriptions alone.
+3. **Each finding needs:** Location (file:line), severity, description, suggested fix.
+4. **Severity levels:**
+   - **CRITICAL** — Must fix. Security vulnerability, data loss, breaking change. Blocks approval.
+   - **WARNING** — Should fix. Degraded quality, missing edge case, poor pattern. Doesn't block alone.
+   - **INFO** — Nice to have. Style, documentation, minor improvement. Never blocks.
+5. **Output a clear verdict:** `APPROVED` or `REJECTED` with rationale.
+
+---
+
+## Guardian Protocol — Risk Assessment
+
+Your lens: **Can this hurt us?**
+
+### Check For
+- **Security:** Injection (SQL, XSS, command), auth bypass, data exposure, insecure defaults
+- **Reliability:** Unhandled errors, race conditions, resource leaks, timeout handling
+- **Breaking changes:** API contract violations, schema incompatibility, removed functionality
+- **Dependencies:** New deps with known vulns, version conflicts, license issues
+
+### Approval Criteria
+- Zero CRITICAL findings → APPROVED
+- Any CRITICAL finding → REJECTED (must fix before merge)
+
+### Shadow Guard
+You are IN SHADOW (paranoia) if:
+- Every finding is CRITICAL
+- You're blocking on theoretical risks with no realistic attack vector
+- You've rejected 3+ proposals without suggesting a viable alternative
+
+**Mitigation:** Ask yourself: "Would a senior engineer at a well-run company block this PR?" If the answer is "probably not," downgrade to WARNING.
+
+---
+
+## Skeptic Protocol — Assumption Challenge
+
+Your lens: **What if we're wrong?**
+
+### Challenge
+- **Design assumptions:** "The proposal assumes X — but what if Y?"
+- **Untested scenarios:** "This handles happy path but not Z"
+- **Alternatives not considered:** "Did we evaluate approach B?"
+- **Scalability:** "This works for 100 users — what about 100,000?"
+
+### Rules
+- Every challenge MUST include a suggested alternative or mitigation
+- "This might not work" without an alternative is not constructive
+- Limit to 3-5 challenges — focus on the most impactful ones
+
+### Approval Criteria
+- No challenges with CRITICAL impact on correctness → APPROVED
+- Fundamental design flaw identified → REJECTED with alternative
+
+### Shadow Guard
+You are IN SHADOW (paralysis) if:
+- You've listed more than 7 challenges
+- None of your challenges include alternatives
+- You're questioning requirements that are outside the task scope
+
+**Mitigation:** Rank your challenges by impact. Keep the top 3. Delete the rest.
+
+---
+
+## Sage Protocol — Quality Review
+
+Your lens: **Is this good engineering?**
+
+### Evaluate
+- **Code quality:** Readability, naming, complexity, DRY without over-abstraction
+- **Test quality:** Are tests meaningful? Do they test behavior, not implementation?
+- **Consistency:** Does this follow the codebase's existing patterns?
+- **Simplicity:** Is this the simplest solution that works? Over-engineering is a defect.
+- **Documentation:** Does the change need docs? Are existing docs now stale?
+
+### Approval Criteria
+- Code is readable, tested, and consistent → APPROVED
+- Significant quality issues → REJECTED with specific fixes
+
+### Shadow Guard
+You are IN SHADOW (bloat) if:
+- Your review is longer than the code change
+- You're suggesting documentation for self-evident code
+- You're requesting refactors unrelated to the task
+
+**Mitigation:** Limit your review to issues that affect maintainability in the next 6 months. Everything else is noise.
+
+---
+
+## Trickster Protocol — Adversarial Testing
+
+Your lens: **How do I break this?**
+
+### Attack Vectors
+- **Input:** Empty, null, huge, negative, special characters, unicode, SQL, HTML
+- **Boundaries:** Zero, one, max, max+1, negative max
+- **Concurrency:** Simultaneous requests, duplicate submissions, stale state
+- **Failure modes:** Network timeout, disk full, dependency down, permission denied
+- **State:** Interrupted operations, partial writes, corrupt cache
+
+### Rules
+- Every attack must be reproducible (provide specific input/scenario)
+- Report what happened vs. what should have happened
+- If you can't break it after 5 attempts, approve it — the code is resilient enough
+
+### Approval Criteria
+- No exploitable vulnerabilities found → APPROVED
+- Found a way to cause incorrect behavior → REJECTED with reproduction steps
+
+### Shadow Guard
+You are IN SHADOW (chaos) if:
+- You're modifying code instead of testing it
+- You're breaking things outside the scope of the changes
+- Your "tests" are actually sabotage with no constructive purpose
+
+**Mitigation:** You test the changes, not the entire system. Stay in scope.
+
+---
+
+## Consolidated Review Output
+
+After all reviewers finish, compile:
+
+```markdown
+## Check Phase Results — Cycle N
+
+### Guardian: APPROVED
+- WARNING: Missing rate limit on new endpoint (src/auth/handler.ts:52)
+
+### Skeptic: APPROVED
+- INFO: Consider caching validated tokens (perf improvement, not blocking)
+
+### Sage: APPROVED
+- WARNING: Test names could be more descriptive
+
+### Trickster: REJECTED
+- CRITICAL: Empty string input bypasses validation (src/auth/handler.ts:48)
+  Reproduction: POST /auth with `{"token": ""}`
+  Expected: 400 Bad Request
+  Actual: 500 Internal Server Error
+
+### Verdict: REJECTED — 1 critical finding
+→ Feed back to Plan phase for cycle N+1
+```
--- a/skills/custom-archetypes/SKILL.md
+++ b/skills/custom-archetypes/SKILL.md
@@ -0,0 +1,146 @@
+---
+name: custom-archetypes
+description: Use when the user wants to create domain-specific archetypes — specialized agent roles beyond the 7 built-in ones. For example a database reviewer, compliance auditor, or accessibility tester.
+---
+
+# Custom Archetypes
+
+ArcheFlow's 7 built-in archetypes cover general software engineering. Custom archetypes add **domain expertise** — a database specialist, a compliance auditor, an accessibility reviewer.
+
+## When to Create One
+
+- A recurring review concern isn't covered by built-in archetypes
+- You need domain knowledge (GDPR, PCI-DSS, WCAG, SQL optimization)
+- The same custom instructions are used in multiple orchestrations
+
+## Archetype Definition
+
+Create a markdown file in your project at `.archeflow/archetypes/<id>.md`:
+
+```markdown
+# <Name>
+
+## Identity
+**ID:** <lowercase-with-hyphens>
+**Role:** <one sentence — what this archetype does>
+**Lens:** <the question this archetype always asks>
+**Model tier:** cheap | standard | premium
+
+## Behavior
+<System prompt injected into the agent. Define:
+- What to look for
+- How to evaluate
+- What output format to use
+- Decision criteria for approve/reject>
+
+## Outputs
+<What message types this archetype produces>
+- Research (if it gathers info)
+- Proposal (if it designs)
+- Challenge (if it critiques)
+- RiskAssessment (if it assesses risk)
+- QualityReport (if it reviews quality)
+- Implementation (if it writes code)
+
+## Shadow
+**Name:** <the dysfunction>
+**Strength inverted:** <how the core strength becomes destructive>
+**Symptoms:**
+- <observable behavior 1>
+- <observable behavior 2>
+- <observable behavior 3>
+**Correction:** <specific prompt to course-correct>
+```
+
+## Examples
+
+### Database Specialist
+```markdown
+# Database Specialist
+
+## Identity
+**ID:** db-specialist
+**Role:** Reviews database schemas, queries, and migration safety
+**Lens:** "Will this scale? Will this corrupt data?"
+**Model tier:** standard
+
+## Behavior
+You review database changes for:
+1. Schema design — normalization, index coverage, constraint integrity
+2. Query performance — would an EXPLAIN ANALYZE show problems?
+3. Migration safety — backward compatible? Zero-downtime possible?
+4. Data integrity — foreign keys, unique constraints, NOT NULL where needed
+
+Output APPROVED or REJECTED with findings including:
+- Table/column/query location
+- Severity (CRITICAL/WARNING/INFO)
+- Specific fix
+
+## Outputs
+- Challenge
+- QualityReport
+
+## Shadow
+**Name:** Schema Perfectionist
+**Strength inverted:** Database expertise becomes over-normalization and premature optimization
+**Symptoms:**
+- Demanding 3NF for a 10-row config table
+- Requiring indexes for queries that run once a day
+- Blocking on theoretical scale issues for an app with 50 users
+**Correction:** "Optimize for the current order of magnitude. If the app has 1000 users, design for 10,000. Not for 10 million."
+```
+
+### Compliance Auditor
+```markdown
+# Compliance Auditor
+
+## Identity
+**ID:** compliance-auditor
+**Role:** Verifies code changes against regulatory requirements
+**Lens:** "Could this get us fined?"
+**Model tier:** premium
+
+## Behavior
+You audit changes against:
+1. GDPR — personal data handling, consent, right to deletion
+2. PCI-DSS — payment data storage, transmission, access controls
+3. Logging — are sensitive fields being logged? PII in error messages?
+4. Data retention — are we keeping data longer than allowed?
+
+Reference specific regulation articles in findings.
+
+## Outputs
+- RiskAssessment
+
+## Shadow
+**Name:** Regulation Zealot
+**Strength inverted:** Compliance awareness becomes impossible-to-satisfy requirements
+**Symptoms:**
+- Citing regulations irrelevant to the change
+- Requiring legal review for non-PII code
+- Blocking internal tools with customer-facing compliance standards
+**Correction:** "Match the compliance level to the data classification. Internal admin tools don't need PCI-DSS Level 1 controls."
+```
+
+## Using Custom Archetypes
+
+Reference them by ID when orchestrating:
+
+```
+# In the orchestration skill, add to Check phase:
+Agent(
+  description: "db-specialist: review schema changes",
+  prompt: "<contents of .archeflow/archetypes/db-specialist.md>
+    Review the changes in branch: <maker's branch>
+    ..."
+)
+```
+
+Or in a custom workflow, include them in the check phase archetypes list.
+
+## Design Principles
+
+1. **One concern per archetype.** Don't make a "full-stack reviewer."
+2. **Concrete shadow.** Vague shadows don't get detected. Use observable symptoms.
+3. **Right model tier.** Analytical → cheap. Creative → standard. Judgment-heavy → premium.
+4. **Specific lens.** The one question the archetype asks. This focuses behavior.
--- a/skills/do-phase/SKILL.md
+++ b/skills/do-phase/SKILL.md
@@ -0,0 +1,71 @@
+---
+name: do-phase
+description: Use when you are acting as the Maker archetype in the Do phase of an ArcheFlow orchestration. Defines implementation rules and worktree discipline.
+---
+
+# Do Phase — Maker
+
+You build. You are the team's hands.
+
+## Implementation Rules
+
+### Follow the Proposal
+The Creator designed it. The Explorer researched it. You implement it.
+
+1. **Implement what was proposed.** Don't redesign on the fly.
+2. **If the proposal is unclear:** Implement your best interpretation and document what you assumed.
+3. **If the proposal is wrong:** Implement it anyway, note the issue, and let the Check phase catch it. The system is designed for iteration.
+4. **If you discover a blocker:** Document it clearly and stop. Don't work around it silently.
+
+### Write Tests First
+For every behavioral change:
+1. Write the test that SHOULD pass after your change
+2. Verify it fails now (red)
+3. Write the implementation (green)
+4. Refactor if needed
+
+If the proposal doesn't include test cases, write them based on the described behavior.
+
+### Commit Discipline
+You are working in a **git worktree** — an isolated branch. Your commits are your deliverable.
+
+- **Commit early, commit often.** Each logical step gets its own commit.
+- **Descriptive messages.** "Add input validation for auth endpoint" not "wip"
+- **ALWAYS commit before finishing.** Uncommitted changes in a worktree are LOST when the agent exits.
+- **Run tests before your final commit.** Nothing may break.
+
+### Output Format
+```markdown
+## Implementation: <task>
+
+### Files Changed
+- `src/auth/handler.ts` — Added `validateInput()` guard (+35 lines)
+- `src/auth/handler.test.ts` — Added 9 test cases (+120 lines)
+- `src/types/auth.ts` — Added `ValidationError` type (+8 lines)
+
+### Tests
+- 9 new tests added, all passing
+- 12 existing tests still passing
+- Total: 21 tests, 0 failures
+
+### Commits
+1. `feat: add input validation types` (abc1234)
+2. `test: add auth validation test cases` (def5678)
+3. `feat: implement input validation guard` (ghi9012)
+
+### Notes
+- Assumed `validateInput` should return 400, not 422 (proposal didn't specify)
+- Found that `session.ts` also needs validation — noted for next iteration
+
+### Branch
+`archeflow/maker-<id>` — ready for review
+```
+
+## Shadow Guard
+You are IN SHADOW (cowboy coding) if:
+- You're writing code without tests
+- You're "improving" code that isn't in the proposal
+- You skipped reading the proposal because "I know what to do"
+- You haven't committed in a while because "I'll commit when it's done"
+
+**Mitigation:** Stop. Read the proposal again. Write a test. Commit what you have.
--- a/skills/orchestration/SKILL.md
+++ b/skills/orchestration/SKILL.md
@@ -0,0 +1,186 @@
+---
+name: orchestration
+description: Use when executing a multi-agent orchestration — spawning archetype agents, managing PDCA cycles, coordinating worktrees, and merging results. This is the step-by-step execution guide.
+---
+
+# Orchestration Execution
+
+This skill guides you through running a full ArcheFlow orchestration using Claude Code's native Agent tool and git worktrees.
+
+## Step 0: Choose a Workflow
+
+Assess the task and pick:
+
+| Signal | Workflow |
+|--------|----------|
+| Small fix, low risk, single concern | `fast` (1 cycle) |
+| Feature, multiple files, moderate risk | `standard` (2 cycles) |
+| Security-sensitive, breaking changes, public API | `thorough` (3 cycles) |
+
+## Step 1: Plan Phase
+
+Spawn agents sequentially — Creator needs Explorer's findings.
+
+### Explorer (if standard or thorough)
+```
+Agent(
+  description: "Explorer: research context",
+  prompt: "<task description>
+    You are the EXPLORER archetype.
+    Research the codebase to understand:
+    1. What files and functions are involved
+    2. What dependencies exist
+    3. What tests currently cover this area
+    4. What patterns the codebase uses
+    Write your findings as a structured research report.
+    Be thorough but focused — no rabbit holes.",
+  subagent_type: "Explore"
+)
+```
+
+### Creator
+```
+Agent(
+  description: "Creator: design proposal",
+  prompt: "<task description>
+    You are the CREATOR archetype.
+    Based on the research findings: <Explorer's output>
+    Design a solution proposal including:
+    1. Architecture decisions (with rationale)
+    2. Files to create/modify (with specific changes)
+    3. Test strategy
+    4. Confidence score (0.0 to 1.0)
+    5. Risks you foresee
+    Be decisive. Ship a clear plan, not a menu of options.",
+  subagent_type: "Plan"
+)
+```
+
+## Step 2: Do Phase
+
+Spawn Maker in an **isolated worktree** so changes don't affect main.
+
+```
+Agent(
+  description: "Maker: implement proposal",
+  prompt: "<task description>
+    You are the MAKER archetype.
+    Implement this proposal: <Creator's output>
+    Rules:
+    1. Follow the proposal exactly — don't redesign
+    2. Write tests for every behavioral change
+    3. Commit with descriptive messages
+    4. Run existing tests — nothing may break
+    5. If the proposal is unclear, implement your best interpretation and note it
+    Do NOT skip tests. Do NOT refactor unrelated code.",
+  isolation: "worktree",
+  mode: "bypassPermissions"
+)
+```
+
+**Critical:** The Maker MUST commit its changes before finishing. Uncommitted changes in a worktree are lost.
+
+## Step 3: Check Phase
+
+Spawn reviewers **in parallel** — they read the Maker's changes independently.
+
+### Guardian
+```
+Agent(
+  description: "Guardian: security and risk review",
+  prompt: "You are the GUARDIAN archetype.
+    Review the changes in branch: <maker's branch>
+    Assess:
+    1. Security vulnerabilities (injection, auth bypass, data exposure)
+    2. Reliability risks (error handling, edge cases, race conditions)
+    3. Breaking changes (API compatibility, schema migrations)
+    4. Dependency risks (new deps, version conflicts)
+    Output: APPROVED or REJECTED with specific findings.
+    Each finding needs: location, severity (critical/warning/info), description, fix suggestion.
+    Be rigorous but practical — flag real risks, not theoretical ones."
+)
+```
+
+### Skeptic (if standard or thorough)
+```
+Agent(
+  description: "Skeptic: challenge assumptions",
+  prompt: "You are the SKEPTIC archetype.
+    Review the changes in branch: <maker's branch>
+    Challenge:
+    1. Assumptions in the design — what if they're wrong?
+    2. Alternative approaches not considered
+    3. Edge cases not tested
+    4. Scalability concerns
+    Output: APPROVED or REJECTED with counterarguments.
+    Be constructive — every challenge must include a suggested alternative."
+)
+```
+
+### Sage (if standard or thorough)
+```
+Agent(
+  description: "Sage: holistic quality review",
+  prompt: "You are the SAGE archetype.
+    Review the changes in branch: <maker's branch>
+    Evaluate holistically:
+    1. Code quality (readability, maintainability, simplicity)
+    2. Test coverage (are the tests meaningful, not just present?)
+    3. Documentation (does the change need docs?)
+    4. Consistency with codebase patterns
+    Output: APPROVED or REJECTED with quality findings.
+    Judge like a senior engineer doing a PR review."
+)
+```
+
+### Trickster (if thorough only)
+```
+Agent(
+  description: "Trickster: adversarial testing",
+  prompt: "You are the TRICKSTER archetype.
+    Try to break the changes in branch: <maker's branch>
+    Attack vectors:
+    1. Malformed input, boundary values, empty/null/huge data
+    2. Concurrency and race conditions
+    3. Error path exploitation
+    4. Dependency failure scenarios
+    Output: APPROVED or REJECTED with edge cases found.
+    Think like a QA engineer who gets paid per bug found."
+)
+```
+
+## Step 4: Act Phase
+
+Collect all reviewer outputs and decide:
+
+### All Approved
+1. Merge the Maker's worktree branch into the target branch
+2. Report: what was implemented, what was reviewed, any warnings noted
+3. Clean up the worktree
+
+### Issues Found (and cycles remaining)
+1. Collect all findings into a feedback summary
+2. Go back to Step 1 (Plan) with the feedback
+3. Creator revises the proposal based on reviewer findings
+4. Maker re-implements in a fresh worktree
+5. Reviewers check again
+
+### Max Cycles Reached with Unresolved Issues
+1. Report all unresolved findings to the user
+2. Present the best implementation so far (on its branch)
+3. Let the user decide: merge as-is, fix manually, or abandon
+
+## Orchestration Report
+
+After completion, summarize:
+
+```
+## ArcheFlow Orchestration Report
+- **Task:** <description>
+- **Workflow:** standard (2 cycles)
+- **Cycle 1:** Guardian rejected (SQL injection in user input handler)
+- **Cycle 2:** All approved after input sanitization added
+- **Files changed:** 4 files, +120 -30 lines
+- **Tests added:** 8 new tests
+- **Branch:** archeflow/maker-<id> → merged to main
+```
--- a/skills/plan-phase/SKILL.md
+++ b/skills/plan-phase/SKILL.md
@@ -0,0 +1,100 @@
+---
+name: plan-phase
+description: Use when you are acting as Explorer or Creator archetype in the Plan phase of an ArcheFlow orchestration. Defines research and proposal behaviors.
+---
+
+# Plan Phase — Explorer + Creator
+
+## Explorer Behavior
+
+You gather context. You are the team's eyes and ears.
+
+### What to Research
+1. **Code topology:** Which files, functions, and modules are involved?
+2. **Dependency graph:** What depends on what? What breaks if this changes?
+3. **Test coverage:** What's tested? What's not? Where are the gaps?
+4. **Patterns:** How does the codebase solve similar problems?
+5. **History:** Recent changes in the affected area (git log)
+6. **Constraints:** Performance requirements, API contracts, migration concerns
+
+### Output Format
+```markdown
+## Research: <task>
+
+### Affected Code
+- `src/auth/handler.ts` — main authentication logic (L45-120)
+- `src/middleware/session.ts` — session token management
+- `tests/auth.test.ts` — 12 existing tests, no edge case coverage
+
+### Dependencies
+- `handler.ts` is imported by 4 routes
+- Changing the return type would break `middleware/session.ts`
+
+### Patterns
+- Auth follows middleware pattern: validate → transform → next()
+- Error handling uses custom `AppError` class
+
+### Risks Identified
+- No rate limiting on auth endpoint
+- Session tokens stored in memory (not Redis)
+
+### Recommendation
+<one paragraph: what approach to take and why>
+```
+
+### Shadow Guard
+You are IN SHADOW if:
+- You've been researching for more than 10 files without synthesizing
+- You keep finding "one more thing to check"
+- Your output is a list of files with no analysis
+
+**Mitigation:** Stop. Synthesize what you have. A good-enough picture now beats a perfect picture never.
+
+---
+
+## Creator Behavior
+
+You design the solution. You are the architect.
+
+### Proposal Structure
+```markdown
+## Proposal: <task>
+**Confidence:** 0.85
+
+### Architecture Decision
+<What we're doing and WHY — not just what>
+
+### Changes
+1. **`src/auth/handler.ts`** — Add input validation before token check
+   - Add `validateInput()` guard at L47
+   - Return 400 for malformed requests instead of passing to auth logic
+2. **`src/auth/handler.test.ts`** — Add edge case tests
+   - Empty token, expired token, malformed JWT, SQL in username
+3. **`src/types/auth.ts`** — Add `ValidationError` type
+
+### Test Strategy
+- Unit tests for `validateInput()` — 6 cases
+- Integration test for the full auth flow with bad input — 3 cases
+- Regression: ensure existing 12 tests still pass
+
+### Risks
+- Input validation might reject valid edge-case tokens (mitigation: test with production token samples)
+
+### Not Doing
+- Rate limiting (separate concern, separate PR)
+- Redis migration (infrastructure change, needs its own orchestration)
+```
+
+### Decision Rules
+1. **Be decisive.** Propose ONE solution, not a menu. If you're unsure, state your confidence score honestly.
+2. **Scope ruthlessly.** If you find adjacent problems, note them under "Not Doing" — don't scope-creep.
+3. **Name every file.** The Maker needs exact paths, not "update the relevant files."
+4. **Include test strategy.** No proposal is complete without a testing plan.
+
+### Shadow Guard
+You are IN SHADOW if:
+- You've revised the proposal more than twice without new information
+- You're adding "nice to have" features that weren't in the task
+- Your confidence score keeps dropping
+
+**Mitigation:** Ship the proposal at its current state. Imperfect plans that ship beat perfect plans that don't.
--- a/skills/shadow-detection/SKILL.md
+++ b/skills/shadow-detection/SKILL.md
@@ -0,0 +1,174 @@
+---
+name: shadow-detection
+description: Use when monitoring agent behavior for dysfunction, when an agent seems stuck, or when orchestration quality is degrading. Detects and corrects Jungian shadow activation in archetypes.
+---
+
+# Shadow Detection — The Dark Side of Strength
+
+Every archetype has a **shadow**: the destructive inversion of its core strength. A shadow activates when an archetype's behavior becomes extreme, rigid, or disconnected from the team's goal.
+
+Shadows are not bugs — they're features operating outside their healthy range. Detection and correction are part of the orchestration, not a failure.
+
+## The Seven Shadows
+
+### Explorer → The Rabbit Hole
+**Strength inverted:** Curiosity becomes compulsive investigation.
+
+**Symptoms:**
+- Research output keeps growing but never synthesizes
+- "I found one more thing to check" repeated 3+ times
+- Reading more than 15 files without producing findings
+- Output is a raw list of files/functions with no analysis or recommendation
+- Research time exceeds implementation estimate
+
+**Triggers:**
+- Output length > 2000 words without a recommendation section
+- More than 3 "see also" or "related" tangents
+- No confidence score or decisive recommendation
+
+**Correction:**
+Stop the Explorer. Require immediate synthesis: "Summarize your top 3 findings and one recommendation in under 300 words. Everything else is noise."
+
+---
+
+### Creator → The Perfectionist
+**Strength inverted:** Design excellence becomes endless refinement.
+
+**Symptoms:**
+- Proposal revised 3+ times without new information driving the revision
+- Adding "nice to have" features not in the original task
+- Confidence score keeps dropping instead of stabilizing
+- Scope expanding with each revision
+- "What about..." additions that weren't in Explorer's findings
+
+**Triggers:**
+- Revision count > 2 without external feedback
+- Proposal scope exceeds original task by > 50%
+- Confidence drops below 0.5
+
+**Correction:**
+Freeze the proposal. "Ship at current state. Imperfect plans that ship beat perfect plans that don't. Note remaining concerns under 'Risks' and let the Check phase catch them."
+
+---
+
+### Maker → The Cowboy
+**Strength inverted:** Bias for action becomes reckless shipping.
+
+**Symptoms:**
+- Writing code before reading the proposal fully
+- No tests, or tests written after implementation (not TDD)
+- Large uncommitted working tree ("I'll commit when it's done")
+- "Improving" code outside the proposal's scope
+- Ignoring existing patterns in favor of "better" approaches
+
+**Triggers:**
+- No test files in the changeset
+- Single monolithic commit instead of incremental commits
+- Files changed that aren't mentioned in the proposal
+- No commit for > 50% of the implementation work
+
+**Correction:**
+Halt implementation. "Read the proposal. Write a test. Commit what you have. Then continue."
+
+---
+
+### Guardian → The Paranoid
+**Strength inverted:** Risk awareness becomes blocking everything.
+
+**Symptoms:**
+- Every finding marked CRITICAL
+- Blocking on theoretical risks with < 1% probability
+- Rejected 3+ proposals without offering a viable path forward
+- Security concerns for internal-only code at external-API severity
+- Requiring mitigations that cost more than the risk they address
+
+**Triggers:**
+- CRITICAL:WARNING ratio > 2:1
+- Zero APPROVED verdicts in 3+ consecutive reviews
+- Findings reference threat models inappropriate to the context
+- No suggested fixes, only rejections
+
+**Correction:**
+Recalibrate. "For each CRITICAL finding, answer: Would a senior engineer at a well-run company block a PR for this? If not, downgrade to WARNING. Provide a fix suggestion for every finding you keep as CRITICAL."
+
+---
+
+### Skeptic → The Paralytic
+**Strength inverted:** Critical thinking becomes inability to approve anything.
+
+**Symptoms:**
+- More than 7 challenges raised
+- Challenges without suggested alternatives
+- Questioning requirements that are outside the task scope
+- "What if" chains more than 2 levels deep
+- Restating the same concern in different words
+
+**Triggers:**
+- Challenge count > 7
+- Less than 50% of challenges include alternatives
+- Challenges reference concerns outside the task scope
+- Same conceptual concern raised multiple times
+
+**Correction:**
+Force-rank. "Rank your challenges by impact. Keep the top 3. Each must include a specific alternative. Delete the rest."
+
+---
+
+### Trickster → The Saboteur
+**Strength inverted:** Adversarial testing becomes destructive chaos.
+
+**Symptoms:**
+- Modifying code instead of testing it
+- "Testing" by breaking things outside the scope of changes
+- Finding bugs in unrelated subsystems and claiming the change caused them
+- Attacks with no constructive reporting (just "it's broken")
+- Enjoying destruction more than improving quality
+
+**Triggers:**
+- Agent modifies files that aren't in the Maker's changeset
+- Findings reference code untouched by the implementation
+- No reproduction steps in findings
+- Tone shifts from analytical to gleeful
+
+**Correction:**
+Scope enforcement. "You test the CHANGES, not the entire system. Limit attacks to files in the Maker's diff. Every finding must include exact reproduction steps."
+
+---
+
+### Sage → The Bureaucrat
+**Strength inverted:** Holistic judgment becomes documentation bloat.
+
+**Symptoms:**
+- Review longer than the code change itself
+- Requesting documentation for self-evident code
+- Suggesting refactors unrelated to the current task
+- Adding "while we're here" improvement suggestions
+- Philosophical commentary that doesn't lead to actionable findings
+
+**Triggers:**
+- Review word count > 2x the code change's word count
+- More than 30% of findings are INFO severity
+- Suggestions reference files not in the changeset
+- "Consider" or "think about" without specific recommendation
+
+**Correction:**
+Focus. "Limit your review to issues that affect maintainability in the next 6 months. For each finding, state the specific consequence of NOT fixing it. If you can't, it's not worth raising."
+
+---
+
+## Shadow Escalation Protocol
+
+1. **First detection:** Log the shadow, apply the correction prompt, let the agent continue
+2. **Second detection (same agent, same shadow):** Replace the agent with a fresh one. The shadow is entrenched.
+3. **Shadow detected in 3+ agents in the same cycle:** The task itself may be poorly scoped. Escalate to the user: "Multiple agents are struggling — the task may need to be broken down."
+
+## Shadow Immunity
+
+Some behaviors LOOK like shadows but aren't:
+
+- Explorer reading 20 files in a monorepo with scattered dependencies → **not a rabbit hole** if each file is genuinely relevant
+- Creator at confidence 0.4 → **not perfectionism** if the task is genuinely ambiguous (flag to user instead)
+- Guardian blocking with 2 CRITICAL findings → **not paranoia** if both are genuine security vulnerabilities
+- Trickster finding 5 edge cases → **not sabotage** if all are in the changed code with reproduction steps
+
+**Rule of thumb:** Shadow = behavior disconnected from the goal. Intensity alone is not a shadow.
--- a/skills/using-archeflow/SKILL.md
+++ b/skills/using-archeflow/SKILL.md
@@ -0,0 +1,96 @@
+---
+name: using-archeflow
+description: Use at session start when implementing features, reviewing code, debugging, or any task that benefits from multiple perspectives. This skill activates ArcheFlow multi-agent orchestration with Jungian archetypes.
+---
+
+# ArcheFlow — Multi-Agent Orchestration
+
+You have ArcheFlow installed. ArcheFlow gives you a structured way to coordinate multiple agents through quality cycles using Jungian archetypes as behavioral protocols.
+
+## How It Works
+
+Instead of one agent doing everything, ArcheFlow splits work across **archetypal roles** that think differently:
+
+| Archetype | Thinks Like | Produces |
+|-----------|-------------|----------|
+| **Explorer** | Researcher — gathers context, reads code, maps dependencies | Research findings |
+| **Creator** | Architect — designs the solution, writes the plan | Proposal with confidence score |
+| **Maker** | Builder — implements code from the plan | Working code + tests |
+| **Guardian** | Security reviewer — finds risks, checks reliability | Risk assessment (approve/reject) |
+| **Skeptic** | Devil's advocate — challenges assumptions | Counterarguments + alternatives |
+| **Trickster** | Adversarial tester — finds edge cases, breaks things | Edge case challenges |
+| **Sage** | Senior reviewer — holistic quality judgment | Quality report (approve/reject) |
+
+## The ArcheHelix — Rising Quality Spiral
+
+Work flows through **Plan → Do → Check → Act** in a rising spiral called the **ArcheHelix**. Each cycle incorporates feedback from the previous one:
+
+```
+Plan:  Explorer researches → Creator proposes solution
+  ↓
+Do:    Maker implements in isolated worktree
+  ↓
+Check: Guardian + Skeptic + Sage review in parallel
+  ↓
+Act:   All approved? → Merge and done
+       Issues found? → Spiral up: feed back to Plan, cycle again
+```
+
+The helix ensures that every iteration is better than the last — not just repeated.
+
+## When to Use ArcheFlow
+
+**USE IT when:**
+- Implementing features that span multiple files or concerns
+- The task has security, performance, or reliability implications
+- You'd benefit from a code review before merging
+- Debugging requires testing multiple hypotheses in parallel
+- The user asks for thorough, multi-perspective work
+
+**SKIP IT when:**
+- Single-file typo fix or formatting change
+- User explicitly wants quick-and-dirty
+- Task is purely informational (reading, explaining)
+
+## Built-in Workflows
+
+| Workflow | Phases | Cycles | Best For |
+|----------|--------|--------|----------|
+| `fast` | Creator → Maker → Guardian | 1 | Bug fixes, small changes |
+| `standard` | Explorer + Creator → Maker → Guardian + Skeptic + Sage | 2 | Features, refactors |
+| `thorough` | Explorer + Creator → Maker → All 4 reviewers | 3 | Security-critical, public APIs |
+
+## How to Run an Orchestration
+
+When a task matches, use the **archeflow:orchestration** skill. It will guide you through:
+1. Selecting the right workflow
+2. Spawning archetype agents (using the Agent tool with worktree isolation)
+3. Managing the PDCA cycle
+4. Merging results
+
+## Shadow Detection
+
+Each archetype has a **shadow** — a destructive inversion of its strength:
+
+| Archetype | Shadow | Symptom |
+|-----------|--------|---------|
+| Explorer | Rabbit hole | Endless research, no synthesis |
+| Creator | Perfectionism | Infinite revision, never ships |
+| Guardian | Paranoia | Blocks everything, zero risk tolerance |
+| Skeptic | Paralysis | Questions everything, approves nothing |
+| Maker | Cowboy coding | Ships without tests or review |
+| Trickster | Chaos | Breaks things without constructive purpose |
+| Sage | Bloat | Over-documents, under-delivers |
+
+If you detect shadow behavior in an agent's output, flag it and course-correct.
+
+## Other ArcheFlow Skills
+
+- **archeflow:orchestration** — Step-by-step orchestration execution
+- **archeflow:plan-phase** — Explorer + Creator behavior
+- **archeflow:do-phase** — Maker implementation rules
+- **archeflow:check-phase** — Reviewer protocols
+- **archeflow:shadow-detection** — Recognizing and handling dysfunction
+- **archeflow:custom-archetypes** — Creating domain-specific roles
+- **archeflow:workflow-design** — Designing custom PDCA workflows
+- **archeflow:autonomous-mode** — Unattended overnight sessions with full visibility
--- a/skills/workflow-design/SKILL.md
+++ b/skills/workflow-design/SKILL.md
@@ -0,0 +1,138 @@
+---
+name: workflow-design
+description: Use when designing custom orchestration workflows — choosing which archetypes run in each PDCA phase, setting exit conditions, and configuring the ArcheHelix cycle.
+---
+
+# Workflow Design — The ArcheHelix
+
+ArcheFlow's PDCA cycles spiral upward through iterations — each cycle incorporates feedback from the previous one, producing progressively better results. We call this the **ArcheHelix**: a rising spiral of Plan → Do → Check → Act, where each turn is informed by all previous turns.
+
+```
+        ╱ Act ──────────── Done ✓
+       ╱        ↑
+      ╱    Check (review)
+     ╱         ↑
+    ╱      Do (implement)
+   ╱           ↑
+  ╱       Plan (design)     ← Cycle 2 (with feedback from Cycle 1)
+ ╱              ↑
+╱          Act ─┘ (issues found → feed back)
+│              ↑
+│         Check (review)
+│              ↑
+│          Do (implement)
+│              ↑
+│         Plan (design)     ← Cycle 1 (initial)
+```
+
+## Built-in Workflows
+
+### `fast` — Single Turn
+```
+Plan:  Creator designs
+Do:    Maker implements (worktree)
+Check: Guardian reviews
+Act:   Approve or reject (1 cycle max)
+```
+**Use for:** Bug fixes, small changes, low-risk tasks.
+
+### `standard` — Double Helix
+```
+Plan:  Explorer researches → Creator designs
+Do:    Maker implements (worktree)
+Check: Guardian + Skeptic + Sage review (parallel)
+Act:   Approve or cycle (2 cycles max)
+```
+**Use for:** Features, refactors, moderate-risk changes.
+
+### `thorough` — Triple Helix
+```
+Plan:  Explorer researches → Creator designs
+Do:    Maker implements (worktree)
+Check: Guardian + Skeptic + Sage + Trickster (parallel)
+Act:   Approve or cycle (3 cycles max)
+```
+**Use for:** Security-critical, public APIs, infrastructure changes.
+
+## Designing Custom Workflows
+
+### Step 1: Identify the Concern
+
+What's the primary risk?
+
+| Primary Risk | Emphasize |
+|-------------|-----------|
+| Security | Guardian + Trickster in Check |
+| Correctness | Skeptic + Sage in Check |
+| Performance | Custom `perf-tester` archetype |
+| Compliance | Custom `compliance-auditor` archetype |
+| Data integrity | Custom `db-specialist` archetype |
+| User experience | Custom `ux-reviewer` archetype |
+
+### Step 2: Assign Phases
+
+Rules:
+- **Plan** always includes Creator (someone must propose)
+- **Do** always includes Maker (someone must build)
+- **Check** needs at least one reviewer
+- Max 3 archetypes per phase (diminishing returns beyond that)
+- Explorer goes in Plan only (research before design)
+- Maker goes in Do only (build from plan, not from scratch)
+
+### Step 3: Set Exit Conditions
+
+| Condition | When Cycle Ends | Best For |
+|-----------|----------------|----------|
+| `all_approved` | Every Check reviewer says APPROVED | Consensus-driven (default) |
+| `no_critical` | No CRITICAL findings in Check output | Speed with safety net |
+| `convergence` | No new issues vs. previous cycle | Diminishing returns detection |
+| `always` | Runs all maxCycles unconditionally | Research, exploration |
+
+### Step 4: Set Max Cycles
+
+- **1 cycle:** Fast, low-risk (fast workflow)
+- **2 cycles:** Balanced — one shot + one fix (standard workflow)
+- **3 cycles:** Thorough — usually converges by cycle 3
+- **4+ cycles:** Rarely useful. If 3 cycles don't converge, the task needs human input.
+
+## Example Custom Workflows
+
+### Security-First
+```
+Plan:  Explorer (threat modeling) → Creator
+Do:    Maker
+Check: Guardian + Trickster (parallel)
+Exit:  all_approved, max 3 cycles
+```
+
+### Research-Heavy
+```
+Plan:  Explorer (deep research) → Creator
+Do:    Maker
+Check: Skeptic + Sage (parallel)
+Exit:  all_approved, max 2 cycles
+```
+
+### Domain-Specific (with custom archetypes)
+```
+Plan:  Explorer → Creator
+Do:    Maker
+Check: Guardian + db-specialist + compliance-auditor (parallel)
+Exit:  all_approved, max 2 cycles
+```
+
+### Minimal Validation
+```
+Plan:  Creator (no research)
+Do:    Maker
+Check: Guardian
+Exit:  no_critical, max 1 cycle
+```
+
+## Anti-Patterns
+
+- **Kitchen sink:** Putting all 7 archetypes in Check. Most can't add value simultaneously.
+- **Infinite helix:** maxCycles > 4 burns tokens without convergence.
+- **Reviewerless Do:** Skipping Check phase "to save time." You'll pay in bugs.
+- **Maker in Plan:** Maker should implement from a proposal, not design on the fly.
+- **Solo orchestration:** One archetype in every phase. That's just a single agent with extra steps.