refactor: consolidate run skill — merge 8 skills into one self-contained PDCA orchestrator
Merge run + orchestration + plan-phase + do-phase + artifact-routing + process-log + attention-filters + convergence + effectiveness into a single 459-line run/SKILL.md.

Before: run skill (890 lines) + 3 prerequisites (~1,300 lines) = ~2,200 lines of context.
After: one self-contained skill (459 lines) with zero prerequisites.

Preserved: PDCA flow, workflow selection, adaptation rules A1-A3, agent prompts, attention filters, feedback routing, convergence detection, effectiveness scoring, shadow monitoring, pipeline strategy, event reference, artifact naming.
Removed: verbose bash code blocks, shell variable tracking, resolve_model() function, lib validation loops, evidence validation bash, redundant event emission blocks.
@@ -1,289 +0,0 @@
|
||||
---
|
||||
name: artifact-routing
|
||||
description: |
|
||||
Inter-phase artifact protocol for ArcheFlow runs. Defines how artifacts are named, stored,
|
||||
routed between agents, and archived across PDCA cycles. Ensures each agent receives exactly
|
||||
the context it needs — no more, no less.
|
||||
<example>Automatically loaded by archeflow:run</example>
|
||||
<example>User: "What does the Maker receive as context?"</example>
|
||||
---
|
||||
|
||||
# Artifact Routing — Inter-Phase Context Protocol
|
||||
|
||||
Every ArcheFlow run produces artifacts — research notes, proposals, diffs, reviews, feedback. This skill defines how those artifacts are named, where they live, what each agent receives, and how they are preserved across cycles.
|
||||
|
||||
## Artifact Directory Structure
|
||||
|
||||
```
|
||||
.archeflow/artifacts/<run_id>/
|
||||
├── plan-explorer.md # Explorer research output
|
||||
├── plan-creator.md # Creator proposal/outline
|
||||
├── do-maker.md # Maker implementation summary
|
||||
├── do-maker-files.txt # List of files created/modified (one path per line)
|
||||
├── check-guardian.md # Guardian review verdict + findings
|
||||
├── check-sage.md # Sage review (if present)
|
||||
├── check-skeptic.md # Skeptic review (if present)
|
||||
├── check-trickster.md # Trickster review (if present)
|
||||
├── act-feedback.md # Structured feedback for next cycle (Cycle Feedback Protocol)
|
||||
├── act-fixes.jsonl # Applied fixes log (one JSON line per fix)
|
||||
├── cycle-1/ # Archived artifacts from cycle 1
|
||||
│ ├── plan-explorer.md
|
||||
│ ├── plan-creator.md
|
||||
│ ├── do-maker.md
|
||||
│ ├── do-maker-files.txt
|
||||
│ ├── check-guardian.md
|
||||
│ ├── check-sage.md
|
||||
│ └── act-feedback.md
|
||||
└── cycle-2/ # Archived artifacts from cycle 2 (if cycle 3 starts)
|
||||
└── ...
|
||||
```
|
||||
|
||||
## Naming Convention
|
||||
|
||||
Artifacts follow the pattern: `<phase>-<agent>.<ext>`
|
||||
|
||||
| Phase | Agent | Filename | Format |
|
||||
|-------|-------|----------|--------|
|
||||
| plan | explorer | `plan-explorer.md` | Markdown research report |
|
||||
| plan | creator | `plan-creator.md` | Markdown proposal with confidence scores |
|
||||
| plan | mini-explorer | `plan-mini-explorer.md` | Focused risk research (only if confidence gate triggers) |
|
||||
| do | maker | `do-maker.md` | Markdown implementation summary |
|
||||
| do | maker | `do-maker-files.txt` | Plain text, one file path per line |
|
||||
| check | guardian | `check-guardian.md` | Markdown verdict + findings table |
|
||||
| check | sage | `check-sage.md` | Markdown verdict + findings table |
|
||||
| check | skeptic | `check-skeptic.md` | Markdown verdict + findings table |
|
||||
| check | trickster | `check-trickster.md` | Markdown verdict + findings table |
|
||||
| act | (orchestrator) | `act-feedback.md` | Structured feedback (see Cycle Feedback Protocol) |
|
||||
| act | (orchestrator) | `act-fixes.jsonl` | JSONL fix log |
|
||||
|
||||
**Rule:** Never invent new artifact names during a run. If a reviewer is skipped (A2 fast-path, reviewer profile), its artifact simply does not exist. Downstream phases check for file existence before reading.
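As a rough illustration of the naming rule, the sketch below builds a working-level artifact path and rejects any name that is not in the table above. The helper `artifact_path` and the `KNOWN_ARTIFACTS` set are hypothetical names chosen for this example; they are not part of ArcheFlow.

```python
from pathlib import Path

# Fixed artifact names copied from the naming table; nothing else is allowed.
KNOWN_ARTIFACTS = {
    "plan-explorer.md",
    "plan-creator.md",
    "plan-mini-explorer.md",
    "do-maker.md",
    "do-maker-files.txt",
    "check-guardian.md",
    "check-sage.md",
    "check-skeptic.md",
    "check-trickster.md",
    "act-feedback.md",
    "act-fixes.jsonl",
}


def artifact_path(run_id: str, phase: str, agent: str, ext: str = "md") -> Path:
    """Return the path for <phase>-<agent>.<ext> (hypothetical helper)."""
    name = f"{phase}-{agent}.{ext}"
    if name not in KNOWN_ARTIFACTS:
        # Enforces "never invent new artifact names during a run".
        raise ValueError(f"unknown artifact name: {name}")
    return Path(".archeflow") / "artifacts" / run_id / name
```

Rejecting unknown names at construction time keeps a typo from silently creating an artifact that no downstream phase will ever read.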
|
||||
|
||||
---
|
||||
|
||||
## Context Injection Rules
|
||||
|
||||
Each agent receives a filtered subset of artifacts. This is the **attention filter** — it controls what context is injected into the agent's prompt.
|
||||
|
||||
### Plan Phase
|
||||
|
||||
| Agent | Receives | Does NOT receive |
|
||||
|-------|----------|-----------------|
|
||||
| **Explorer** | Task description, relevant file paths, codebase access | Prior proposals, review outputs, implementation details |
|
||||
| **Creator** (cycle 1) | Task description, `plan-explorer.md` (if exists) | Raw file contents (Explorer summarized them), git diffs |
|
||||
| **Creator** (cycle 2+) | Task description, `plan-explorer.md`, `act-feedback.md` (Creator-routed findings only) | Raw reviewer outputs, Maker-routed findings |
|
||||
|
||||
**Creator context injection template (cycle 2+):**
|
||||
```markdown
|
||||
## Task
|
||||
<task description>
|
||||
|
||||
## Research (from Explorer)
|
||||
<contents of plan-explorer.md>
|
||||
|
||||
## Feedback from Prior Cycle
|
||||
<Creator-routed section of act-feedback.md only>
|
||||
|
||||
Note: Address each unresolved issue listed above. Explain how your revised proposal resolves it.
|
||||
```
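A minimal sketch of how the Creator template could be assembled, assuming the orchestrator has already extracted the Creator-routed section of `act-feedback.md` as a string. The function names `read_if_exists` and `build_creator_context` are illustrative assumptions, not ArcheFlow APIs.

```python
from pathlib import Path
from typing import Optional


def read_if_exists(path: Path) -> Optional[str]:
    """Return file contents, or None when the optional artifact is absent."""
    return path.read_text(encoding="utf-8") if path.is_file() else None


def build_creator_context(task: str, run_dir: Path, creator_feedback: Optional[str]) -> str:
    """Assemble the Creator prompt; sections with no content are left out."""
    sections = [f"## Task\n{task}"]
    research = read_if_exists(run_dir / "plan-explorer.md")
    if research is not None:
        sections.append(f"## Research (from Explorer)\n{research}")
    if creator_feedback:
        # Only present on cycle 2+, and only the Creator-routed portion.
        sections.append(
            "## Feedback from Prior Cycle\n"
            f"{creator_feedback}\n\n"
            "Note: Address each unresolved issue listed above. "
            "Explain how your revised proposal resolves it."
        )
    return "\n\n".join(sections)
```

On cycle 1, or in a workflow without an Explorer, the same function degrades to just the task section, so one code path covers both rows of the Plan Phase table.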
|
||||
|
||||
### Do Phase
|
||||
|
||||
| Agent | Receives | Does NOT receive |
|
||||
|-------|----------|-----------------|
|
||||
| **Maker** (cycle 1) | `plan-creator.md` (the proposal), `plan-mini-explorer.md` (if exists) | `plan-explorer.md`, reviewer outputs, raw task description |
|
||||
| **Maker** (cycle 2+) | `plan-creator.md`, `plan-mini-explorer.md` (if exists), Maker-routed findings from `act-feedback.md` | Explorer research, Guardian/Skeptic findings (those went to Creator) |
|
||||
|
||||
**Maker context injection template (cycle 2+):**
|
||||
```markdown
|
||||
## Proposal
|
||||
<contents of plan-creator.md>
|
||||
|
||||
## Implementation Feedback from Prior Cycle
|
||||
<Maker-routed section of act-feedback.md only>
|
||||
|
||||
Note: The proposal has been revised to address design-level issues. Focus on the implementation
|
||||
feedback items above (code quality, test gaps, consistency).
|
||||
```
|
||||
|
||||
**Why Maker doesn't get Explorer output:** The Creator already distilled Explorer's research into a concrete proposal. Giving Maker raw research causes scope creep and "Rogue" shadow activation.
|
||||
|
||||
### Check Phase
|
||||
|
||||
| Agent | Receives | Does NOT receive |
|
||||
|-------|----------|-----------------|
|
||||
| **Guardian** | Maker's git diff, risk section from `plan-creator.md` | Full proposal, Explorer research, other reviewer outputs |
|
||||
| **Skeptic** | `plan-creator.md` (assumptions focus) | Git diff details, Explorer research, other reviewer outputs |
|
||||
| **Sage** | `plan-creator.md`, Maker's git diff, `do-maker.md` | Explorer research, other reviewer outputs |
|
||||
| **Trickster** | Maker's git diff only | Everything else |
|
||||
|
||||
**Guardian context injection template:**
|
||||
```markdown
|
||||
## Changes to Review
|
||||
<git diff from Maker's branch>
|
||||
|
||||
## Risk Assessment (from proposal)
|
||||
<risks section extracted from plan-creator.md>
|
||||
|
||||
Review these changes for security, reliability, breaking changes, and dependency risks.
|
||||
```
|
||||
|
||||
**Skeptic context injection template:**
|
||||
```markdown
|
||||
## Proposal to Challenge
|
||||
<contents of plan-creator.md>
|
||||
|
||||
Focus on assumptions, alternatives not considered, edge cases, and scalability.
|
||||
```
|
||||
|
||||
**Sage context injection template:**
|
||||
```markdown
|
||||
## Proposal
|
||||
<contents of plan-creator.md>
|
||||
|
||||
## Implementation Summary
|
||||
<contents of do-maker.md>
|
||||
|
||||
## Changes
|
||||
<git diff from Maker's branch>
|
||||
|
||||
Evaluate code quality, test coverage, documentation, and codebase consistency.
|
||||
```
|
||||
|
||||
**Trickster context injection template:**
|
||||
```markdown
|
||||
## Changes to Attack
|
||||
<git diff from Maker's branch>
|
||||
|
||||
Try to break this. Malformed input, boundaries, concurrency, error paths, dependency failures.
|
||||
```
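The Check Phase attention filter can be pictured as a lookup from reviewer to permitted inputs. The sketch below is one possible encoding; the short keys (`diff`, `risks`, `proposal`, `summary`) and the function `filter_context` are assumptions made for this example.

```python
# Inputs each reviewer receives, transcribed from the Check Phase table.
# "diff" is the Maker's git diff, "risks" is the risk section of
# plan-creator.md, "proposal" is plan-creator.md, "summary" is do-maker.md.
REVIEWER_INPUTS = {
    "guardian": ["diff", "risks"],
    "skeptic": ["proposal"],
    "sage": ["proposal", "diff", "summary"],
    "trickster": ["diff"],
}


def filter_context(reviewer: str, available: dict[str, str]) -> dict[str, str]:
    """Keep only the inputs this reviewer is allowed to see."""
    allowed = REVIEWER_INPUTS[reviewer]
    return {key: available[key] for key in allowed if key in available}
```

Using an allow-list rather than a deny-list means a newly added context source is withheld from every reviewer until someone deliberately grants it.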
|
||||
|
||||
### Act Phase
|
||||
|
||||
No agents are spawned in Act. The orchestrator reads all `check-*.md` artifacts directly.
|
||||
|
||||
---
|
||||
|
||||
## Feedback Routing
|
||||
|
||||
> **This is the canonical routing table.** Other skills (orchestration, act-phase) must match this table exactly. When updating routing rules, update this table first, then sync the others.
|
||||
|
||||
When building `act-feedback.md` after the Check phase, route each finding to the right agent for the next cycle:
|
||||
|
||||
| Finding Source | Finding Category | Routes To | Rationale |
|
||||
|---------------|-----------------|-----------|-----------|
|
||||
| Guardian | security, breaking-change | **Creator** | Design must change |
|
||||
| Guardian | reliability, dependency | **Creator** | Architectural decision needed |
|
||||
| Skeptic | design, scalability | **Creator** | Assumptions need revision |
|
||||
| Sage | quality, consistency | **Maker** | Implementation refinement |
|
||||
| Sage | testing | **Maker** | Test gap, not design flaw |
|
||||
| Trickster | reliability (design flaw) | **Creator** | Needs redesign |
|
||||
| Trickster | reliability (test gap) | **Maker** | Needs more tests |
|
||||
| Trickster | testing | **Maker** | Edge case not covered |
|
||||
|
||||
**Disambiguation rule:** When in doubt: if the fix requires changing the approach, route to Creator. If it requires changing the code within the existing approach, route to Maker.
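The routing table and the disambiguation rule can be sketched as a lookup with a fallback. This is an illustration only: `route_finding` and the `needs_new_approach` flag are hypothetical, and the flag stands in for a judgment the orchestrator has to make itself.

```python
# (source, category) -> target agent, transcribed from the routing table.
# Trickster reliability findings are absent on purpose: the table splits
# them by cause, so they go through the disambiguation rule below.
ROUTES = {
    ("guardian", "security"): "creator",
    ("guardian", "breaking-change"): "creator",
    ("guardian", "reliability"): "creator",
    ("guardian", "dependency"): "creator",
    ("skeptic", "design"): "creator",
    ("skeptic", "scalability"): "creator",
    ("sage", "quality"): "maker",
    ("sage", "consistency"): "maker",
    ("sage", "testing"): "maker",
    ("trickster", "testing"): "maker",
}


def route_finding(source: str, category: str, needs_new_approach: bool = False) -> str:
    """Return "creator" or "maker" for one finding."""
    key = (source.lower(), category.lower())
    if key in ROUTES:
        return ROUTES[key]
    # Disambiguation rule: changing the approach goes to Creator,
    # changing code within the existing approach goes to Maker.
    return "creator" if needs_new_approach else "maker"
```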
|
||||
|
||||
### Feedback File Format
|
||||
|
||||
`act-feedback.md` is split into two sections so each agent can be given only its portion:
|
||||
|
||||
```markdown
|
||||
# Cycle <N> Feedback
|
||||
|
||||
## Creator-Routed Issues
|
||||
| # | Source | Severity | Category | Issue | Suggested Fix |
|
||||
|---|--------|----------|----------|-------|---------------|
|
||||
| 1 | Guardian | CRITICAL | security | SQL injection in user input | Add parameterized queries |
|
||||
| 2 | Skeptic | WARNING | design | Assumes single-tenant only | Add tenant isolation |
|
||||
|
||||
## Maker-Routed Issues
|
||||
| # | Source | Severity | Category | Issue | Suggested Fix |
|
||||
|---|--------|----------|----------|-------|---------------|
|
||||
| 3 | Sage | WARNING | quality | Test names don't describe behavior | Rename to describe expected outcome |
|
||||
| 4 | Sage | INFO | consistency | Import order doesn't match codebase style | Re-order imports |
|
||||
|
||||
## Resolved (from prior cycles)
|
||||
| # | Source | Issue | Resolution | Resolved In |
|
||||
|---|--------|-------|------------|-------------|
|
||||
| 1 | Guardian | Missing rate limit | Added rate limiter middleware | Cycle 1 |
|
||||
|
||||
## Convergence Warnings
|
||||
<any finding that appeared unresolved in 2+ consecutive cycles — requires user input>
|
||||
```
|
||||
|
||||
When injecting feedback into Creator's prompt, include **only** the "Creator-Routed Issues" section.
|
||||
When injecting feedback into Maker's prompt, include **only** the "Maker-Routed Issues" section.
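One way to pull a single section out of `act-feedback.md` is to scan for level-2 headings. The sketch below assumes the file follows the format shown above, with each section introduced by a `## ` heading; `extract_section` is a hypothetical helper name.

```python
def extract_section(feedback_md: str, heading: str) -> str:
    """Return the body under a level-2 heading, up to the next level-2 heading."""
    body = []
    inside = False
    for line in feedback_md.splitlines():
        if line.startswith("## "):
            # Entering a new section: start or stop collecting.
            inside = line[3:].strip() == heading
            continue
        if inside:
            body.append(line)
    return "\n".join(body).strip()
```

Calling `extract_section(text, "Creator-Routed Issues")` yields the text to inject into the Creator prompt, and `"Maker-Routed Issues"` does the same for the Maker. A missing section returns an empty string rather than raising.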
|
||||
|
||||
---
|
||||
|
||||
## Cycle Archiving
|
||||
|
||||
When a PDCA cycle completes and a new cycle begins, archive the current artifacts so they are preserved and the working directory is clean for the next iteration.
|
||||
|
||||
### Archive Procedure
|
||||
|
||||
At the end of each cycle (before starting the next):
|
||||
|
||||
```bash
|
||||
RUN_DIR=".archeflow/artifacts/${RUN_ID}"
|
||||
ARCHIVE_DIR="${RUN_DIR}/cycle-${CYCLE}"
|
||||
|
||||
mkdir -p "$ARCHIVE_DIR"
|
||||
|
||||
# Copy all phase artifacts to archive
|
||||
cp "${RUN_DIR}"/plan-*.md "$ARCHIVE_DIR/" 2>/dev/null || true
|
||||
cp "${RUN_DIR}"/do-*.md "$ARCHIVE_DIR/" 2>/dev/null || true
|
||||
cp "${RUN_DIR}"/do-*.txt "$ARCHIVE_DIR/" 2>/dev/null || true
|
||||
cp "${RUN_DIR}"/check-*.md "$ARCHIVE_DIR/" 2>/dev/null || true
|
||||
cp "${RUN_DIR}"/act-feedback.md "$ARCHIVE_DIR/" 2>/dev/null || true
|
||||
```
|
||||
|
||||
**Do NOT delete** the working-level artifacts after archiving. The next cycle's agents need `act-feedback.md` and `plan-explorer.md` (Explorer cache may reuse prior research). Old artifacts in the working directory get overwritten when the new cycle's agents produce their outputs.
|
||||
|
||||
### Archive Access
|
||||
|
||||
Archived artifacts are read-only references. Use them for:
|
||||
- **Resolution tracking:** Compare `cycle-1/check-guardian.md` findings against `cycle-2/check-guardian.md` to detect resolved/persisting issues
|
||||
- **Convergence detection:** Same finding in `cycle-N/act-feedback.md` and `cycle-N+1/act-feedback.md` → escalate to user
|
||||
- **Post-hoc analysis:** Understanding how a solution evolved across cycles
|
||||
|
||||
---
|
||||
|
||||
## Artifact Existence Checks
|
||||
|
||||
Before injecting an artifact into an agent's context, always check if the file exists. Missing artifacts are expected in certain workflows:
|
||||
|
||||
| Artifact | Missing when |
|
||||
|----------|-------------|
|
||||
| `plan-explorer.md` | Fast workflow (no Explorer) |
|
||||
| `plan-mini-explorer.md` | Confidence gate did not trigger for risk coverage |
|
||||
| `check-skeptic.md` | Fast workflow, or A2 fast-path taken |
|
||||
| `check-sage.md` | Fast workflow, or A2 fast-path taken |
|
||||
| `check-trickster.md` | Non-thorough workflow, or A2 fast-path taken |
|
||||
| `act-feedback.md` | Cycle 1 (no prior feedback exists) |
|
||||
| `act-fixes.jsonl` | Cycle 1, or no fixes applied |
|
||||
|
||||
**Rule:** Never fail because an optional artifact is missing. Check existence, skip injection if absent, and note what was skipped in the event data.
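The check-then-skip rule could look roughly like this. The function `load_optional_artifacts` is an assumption for illustration; how the skipped list reaches the event data is left to the orchestrator.

```python
from pathlib import Path


def load_optional_artifacts(run_dir: Path, names: list[str]) -> tuple[dict[str, str], list[str]]:
    """Read the artifacts that exist and report the ones that were skipped."""
    loaded: dict[str, str] = {}
    skipped: list[str] = []
    for name in names:
        path = run_dir / name
        if path.is_file():
            loaded[name] = path.read_text(encoding="utf-8")
        else:
            # A missing optional artifact is expected, not an error.
            skipped.append(name)
    return loaded, skipped
```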
|
||||
|
||||
---
|
||||
|
||||
## Git Diff as Artifact
|
||||
|
||||
The Maker's git diff is not saved as a file — it is generated on-the-fly from the Maker's worktree branch:
|
||||
|
||||
```bash
|
||||
git diff main...<maker-branch>
|
||||
```
|
||||
|
||||
This ensures reviewers always see the actual current diff, not a stale snapshot. The diff is injected directly into reviewer prompts, not saved to disk.
|
||||
|
||||
Exception: `do-maker-files.txt` IS saved to disk (just the file list, not the full diff) for quick reference by the orchestrator and for archiving purposes.
|
||||
|
||||
---
|
||||
|
||||
## Design Principles
|
||||
|
||||
1. **Minimal context per agent.** Each agent gets only what it needs. Over-injection causes distraction, shadow activation, and wasted tokens.
|
||||
2. **Artifacts are the handoff mechanism.** Agents never communicate directly. All inter-agent data flows through saved artifacts.
|
||||
3. **Files over memory.** Everything is on disk. If a session crashes, artifacts survive. A `--start-from` resume reads artifacts, not session state.
|
||||
4. **Overwrite, don't accumulate.** Working-level artifacts get overwritten each cycle. Archives preserve history. This keeps the working directory simple.
|
||||
5. **Check before inject.** Always verify artifact existence. Gracefully handle missing optional artifacts.
|
||||
@@ -1,249 +0,0 @@
|
||||
---
|
||||
name: convergence
|
||||
description: |
|
||||
Detects convergence, stalling, and oscillation in multi-cycle PDCA runs. Prevents wasted cycles
|
||||
by stopping early when findings are not being resolved or are bouncing between cycles.
|
||||
<example>Automatically loaded during Act phase before exit decision</example>
|
||||
<example>User: "Is the run converging?"</example>
|
||||
---
|
||||
|
||||
# Convergence Detection
|
||||
|
||||
In multi-cycle PDCA runs, the Act phase must decide whether another cycle will help or just waste tokens. This skill provides the analysis: are findings being resolved (converging), staying the same (stalling), or bouncing back (oscillating)?
|
||||
|
||||
## When It Runs
|
||||
|
||||
Convergence analysis runs **after the Check phase completes and before the Act phase exit decision**. It requires at least 2 cycles of data — on cycle 1, it is skipped (no comparison baseline).
|
||||
|
||||
```
|
||||
Check phase → Convergence Analysis → Act phase exit decision
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Finding Comparison
|
||||
|
||||
Extract findings from the current cycle and compare against the previous cycle.
|
||||
|
||||
### Data Sources
|
||||
|
||||
- **Current cycle findings:** Parsed from `check-*.md` artifacts in `.archeflow/artifacts/<run_id>/`
|
||||
- **Previous cycle findings:** Parsed from `check-*.md` artifacts in `.archeflow/artifacts/<run_id>/cycle-<N-1>/`
|
||||
|
||||
Each finding is identified by a composite key: `source + category + file_location + description_keywords`.
|
||||
|
||||
### Finding Categories
|
||||
|
||||
Every finding from the current or previous cycle is classified into exactly one category (a RESOLVED finding is, by definition, absent from the current cycle):
|
||||
|
||||
| Category | Definition |
|
||||
|----------|------------|
|
||||
| **NEW** | Finding not present in any previous cycle |
|
||||
| **RESOLVED** | Was present in the previous cycle, absent in the current cycle |
|
||||
| **PERSISTENT** | Present in both the current and previous cycle (same key) |
|
||||
| **REGRESSED** | Was RESOLVED in the previous cycle (was present in N-2, absent in N-1), but returned in the current cycle |
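The four categories reduce to set operations once each finding has been collapsed to a comparable key. The sketch below assumes exact key equality, which is a simplification of the fuzzy matching described in the next subsection, and it only looks back two cycles; `classify_findings` is a hypothetical name.

```python
from typing import Optional


def classify_findings(
    current: set[str],
    previous: set[str],
    before_previous: Optional[set[str]] = None,
) -> dict[str, set[str]]:
    """Classify finding keys for cycle N given cycles N-1 and (optionally) N-2."""
    older = before_previous or set()
    # Findings in N that were not in N-1: either brand new or returning.
    returned = current - previous
    return {
        "NEW": returned - older,
        "RESOLVED": previous - current,
        "PERSISTENT": current & previous,
        "REGRESSED": returned & older,
    }
```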
|
||||
|
||||
### Matching Algorithm
|
||||
|
||||
Two findings match if:
|
||||
1. Same `source` archetype (guardian, sage, etc.)
|
||||
2. Same `category` (security, reliability, quality, etc.)
|
||||
3. Same or overlapping file location (same file, line within 10 lines)
|
||||
4. 50%+ keyword overlap in description (lowercase, strip punctuation)
|
||||
|
||||
All four conditions must hold. This prevents false matches across unrelated findings.
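A possible implementation of the four-condition match follows. The source does not say how keyword overlap is normalized, so this sketch assumes shared keywords divided by the size of the smaller keyword set; the `Finding` dataclass and its field names are also assumptions.

```python
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str
    category: str
    file: str
    line: int
    description: str


def keywords(text: str) -> set[str]:
    """Lowercase, strip punctuation, split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())


def findings_match(a: Finding, b: Finding) -> bool:
    """True only when all four matching conditions hold."""
    if a.source != b.source or a.category != b.category:
        return False
    if a.file != b.file or abs(a.line - b.line) > 10:
        return False
    ka, kb = keywords(a.description), keywords(b.description)
    if not ka or not kb:
        return False
    # Assumed normalization: overlap relative to the smaller keyword set.
    return len(ka & kb) / min(len(ka), len(kb)) >= 0.5
```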
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Convergence Score
|
||||
|
||||
Calculate a convergence score from the categorized findings:
|
||||
|
||||
```
|
||||
convergence = resolved_count / (resolved_count + new_count + regressed_count)
|
||||
```
|
||||
|
||||
If the denominator is 0 (no resolved, no new, no regressed, only persistent), the score is `0.0` and the status is Stuck: no progress in either direction.
|
||||
|
||||
### Score Interpretation
|
||||
|
||||
| Score Range | Status | Meaning |
|
||||
|-------------|--------|---------|
|
||||
| > 0.8 | **Converging** | Most issues being resolved, few new ones introduced |
|
||||
| 0.5 - 0.8 | **Stalling** | Fixing roughly as many as introducing |
|
||||
| < 0.5 | **Diverging** | Making things worse — more new/regressed than resolved |
|
||||
| 0.0 (all persistent) | **Stuck** | No progress in either direction |
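The score and its interpretation can be sketched as two small functions. The boundary values 0.5 and 0.8 are assigned to "stalling" here, which is how the table reads but is still an assumption; the function names are hypothetical.

```python
def convergence_score(resolved: int, new: int, regressed: int) -> float:
    """resolved / (resolved + new + regressed), or 0.0 when nothing changed."""
    denominator = resolved + new + regressed
    if denominator == 0:
        return 0.0
    return resolved / denominator


def convergence_status(score: float, all_persistent: bool = False) -> str:
    """Map a score to the status names used in the interpretation table."""
    if all_persistent:
        return "stuck"
    if score > 0.8:
        return "converging"
    if score >= 0.5:
        return "stalling"
    return "diverging"
```

For example, three resolved findings, one new, and none regressed give 3 / (3 + 1 + 0) = 0.75, which falls in the 0.5 to 0.8 band.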
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Oscillation Detection
|
||||
|
||||
An oscillating finding is one that bounces between resolved and re-introduced across cycles:
|
||||
|
||||
1. Finding was present in cycle N-2
|
||||
2. Finding was absent in cycle N-1 (resolved)
|
||||
3. Finding is present again in cycle N (regressed)
|
||||
|
||||
This indicates the fix in cycle N-1 was undone or invalidated by other changes in cycle N.
|
||||
|
||||
### Oscillation Rules
|
||||
|
||||
- A single oscillating finding: **flag it** in the convergence report but continue.
|
||||
- Two or more oscillating findings: **STOP** and escalate to the user.
|
||||
- Message: `"Findings X and Y are oscillating between cycles. Manual intervention needed — the automated fixes are interfering with each other."`
|
||||
|
||||
Oscillation tracking requires 3+ cycles of data. On cycles 1-2, oscillation detection is skipped.
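With findings reduced to comparable keys, the three-cycle oscillation check is a short set expression. This sketch assumes exact key equality in place of the fuzzy matching algorithm, and the function names are illustrative.

```python
def oscillating_findings(cycle_n2: set[str], cycle_n1: set[str], cycle_n: set[str]) -> set[str]:
    """Findings present in N-2, absent in N-1, and present again in N."""
    return (cycle_n2 - cycle_n1) & cycle_n


def oscillation_action(oscillating: set[str]) -> str:
    """Apply the oscillation rules: flag one, stop on two or more."""
    if len(oscillating) >= 2:
        return "stop"
    if len(oscillating) == 1:
        return "flag"
    return "continue"
```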
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Early Termination Rules
|
||||
|
||||
The convergence analysis can override the normal Act phase exit decision. If any of these conditions hold, the recommendation is **STOP**:
|
||||
|
||||
| Condition | Threshold | Recommendation |
|
||||
|-----------|-----------|----------------|
|
||||
| Diverging | Score < 0.5 for 2 consecutive cycles | STOP — changes are making things worse |
|
||||
| Stalled | 0 findings resolved between cycles | STOP — no progress, further cycles will not help |
|
||||
| Stuck | All findings are PERSISTENT for 2 consecutive cycles | STOP — automated fixes cannot resolve these |
|
||||
| Oscillating | 2+ findings oscillating | STOP — fixes are interfering with each other |
|
||||
|
||||
When STOP is recommended, the Act phase should:
|
||||
1. **Not** start another PDCA cycle
|
||||
2. Report all unresolved findings to the user
|
||||
3. Present the best implementation so far (on its branch, not merged)
|
||||
4. Include the convergence report explaining why the run was stopped
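The termination table might translate into a single decision function like the one below. It assumes the function is only called on cycle 2+ while unresolved findings remain, and it checks the most specific condition first so the reported reason is as informative as possible. `CycleStats` and `stop_reason` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CycleStats:
    score: float
    resolved: int
    new: int
    regressed: int
    persistent: int
    oscillating: int = 0

    @property
    def all_persistent(self) -> bool:
        """True when the only findings are PERSISTENT ones."""
        return self.persistent > 0 and self.resolved == self.new == self.regressed == 0


def stop_reason(current: CycleStats, previous: Optional[CycleStats] = None) -> Optional[str]:
    """Return why the run should STOP, or None to fall through to normal exit logic."""
    if current.oscillating >= 2:
        return "oscillating"
    if previous is not None:
        # These two rows need two consecutive cycles of evidence.
        if current.all_persistent and previous.all_persistent:
            return "stuck"
        if current.score < 0.5 and previous.score < 0.5:
            return "diverging"
    if current.resolved == 0:
        return "stalled"
    return None
```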
|
||||
|
||||
### Override Behavior
|
||||
|
||||
The convergence STOP recommendation overrides the normal cycle-back logic in the Act phase. Even if `CYCLE < MAX_CYCLES` and there are fixable-looking findings, if convergence says STOP, the run stops.
|
||||
|
||||
The user can always override by explicitly requesting another cycle: `"Run one more cycle anyway"`.
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Integration with Act Phase
|
||||
|
||||
### Event Data
|
||||
|
||||
Convergence data is included in the `cycle.boundary` event emitted by the Act phase:
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "cycle.boundary",
|
||||
"phase": "act",
|
||||
"data": {
|
||||
"cycle": 2,
|
||||
"max_cycles": 3,
|
||||
"exit_condition": "convergence_stop",
|
||||
"met": false,
|
||||
"fixes_applied": 2,
|
||||
"next_action": "stop",
|
||||
"convergence": {
|
||||
"score": 0.25,
|
||||
"status": "diverging",
|
||||
"resolved": 1,
|
||||
"new": 2,
|
||||
"regressed": 1,
|
||||
"persistent": 3,
|
||||
"oscillating": ["Timeline reference mismatch"],
|
||||
"recommendation": "stop",
|
||||
"reason": "Diverging for 2 consecutive cycles"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Decision Tree Update
|
||||
|
||||
The Act phase decision tree (from `act-phase` skill Step 4) gains a new first branch:
|
||||
|
||||
```
|
||||
┌─ Convergence analysis (cycle 2+)
|
||||
│
|
||||
├─ Convergence says STOP
|
||||
│ └─ STOP: Report to user with convergence report
|
||||
│
|
||||
├─ Convergence says CONTINUE
|
||||
│ └─ Fall through to normal exit decision logic
|
||||
│
|
||||
└─ Cycle 1 (no convergence data)
|
||||
└─ Fall through to normal exit decision logic
|
||||
```
|
||||
|
||||
### Act Feedback Enhancement
|
||||
|
||||
When the Act phase builds `act-feedback.md` for the next cycle, it includes the convergence summary at the top:
|
||||
|
||||
```markdown
|
||||
## Convergence Analysis (Cycle 1 → 2)
|
||||
|
||||
Score: 0.75 (stalling)
|
||||
Resolved: 3 | New: 1 | Regressed: 0 | Persistent: 2
|
||||
|
||||
Recommendation: Continue — trend is positive
|
||||
|
||||
### Finding Status
|
||||
| Finding | Status | Cycles |
|
||||
|---------|--------|--------|
|
||||
| SQL injection in user input | RESOLVED | 1 |
|
||||
| Missing rate limit | RESOLVED | 1 |
|
||||
| Test names unclear | RESOLVED | 1 |
|
||||
| Null check missing in parser | PERSISTENT | 2 |
|
||||
| Error path not tested | PERSISTENT | 2 |
|
||||
| New: Unused import introduced | NEW | 1 |
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Convergence Report Format
|
||||
|
||||
The full convergence report is generated as part of the orchestration output:
|
||||
|
||||
```markdown
|
||||
## Convergence Analysis (Cycle N-1 → N)
|
||||
|
||||
**Score:** 0.75 (stalling)
|
||||
**Resolved:** 3 | **New:** 1 | **Regressed:** 0 | **Persistent:** 2 | **Oscillating:** 0
|
||||
|
||||
### Resolved This Cycle
|
||||
| Source | Category | Description |
|
||||
|--------|----------|-------------|
|
||||
| guardian | security | SQL injection in user input handler |
|
||||
| guardian | reliability | Missing rate limit on auth endpoint |
|
||||
| sage | quality | Test names don't describe behavior |
|
||||
|
||||
### New This Cycle
|
||||
| Source | Category | Description |
|
||||
|--------|----------|-------------|
|
||||
| sage | quality | Unused import introduced by fix |
|
||||
|
||||
### Persistent (unresolved across cycles)
|
||||
| Source | Category | Description | Cycles Open |
|
||||
|--------|----------|-------------|-------------|
|
||||
| trickster | reliability | Null check missing in parser | 2 |
|
||||
| sage | testing | Error path not tested | 2 |
|
||||
|
||||
### Oscillating
|
||||
(none)
|
||||
|
||||
**Recommendation:** Continue — trend is positive
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Integration with Memory Skill
|
||||
|
||||
When convergence detects PERSISTENT findings (present for 2+ cycles), these are strong candidates for the `memory` skill's lesson extraction:
|
||||
|
||||
- After a run that had persistent findings, `archeflow-memory.sh extract` will pick these up with higher confidence (they have been confirmed across multiple cycles within a single run).
|
||||
- Persistent findings that also appear in `lessons.jsonl` from prior runs get a double frequency boost (cross-cycle within run + cross-run pattern).
|
||||
|
||||
---
|
||||
|
||||
## Design Principles
|
||||
|
||||
1. **Conservative stopping.** The Diverging and Stuck conditions require 2 consecutive data points before recommending STOP. A single bad cycle might be noise.
|
||||
2. **User has final say.** STOP is a recommendation, not an enforced shutdown. The user can override.
|
||||
3. **Cheap computation.** Keyword matching on finding descriptions, simple arithmetic on counts. No ML, no embeddings.
|
||||
4. **Bounded scope.** Only compares adjacent cycles (N vs N-1, with N-2 for oscillation). Does not attempt to model long-term trends across many cycles.
|
||||
5. **Observable.** All convergence data is included in the `cycle.boundary` event, making it available for post-hoc analysis via the process log.
|
||||
@@ -1,193 +0,0 @@
|
||||
---
|
||||
name: do-phase
|
||||
description: Use when acting as Maker in the Do phase. Defines execution rules, worktree protocol, commit discipline, and output format.
|
||||
---
|
||||
|
||||
# Do Phase
|
||||
|
||||
Maker implements the Creator's proposal. This skill defines the execution protocol — the agent definition (`agents/maker.md`) has the behavioral rules.
|
||||
|
||||
## Execution Protocol
|
||||
|
||||
### 1. Read Before Writing
|
||||
Read the Creator's proposal completely. Identify:
|
||||
- Files to create or modify (the `### Changes` section)
|
||||
- Test strategy (the `### Test Strategy` section)
|
||||
- Scope boundaries (the `### Not Doing` section)
|
||||
|
||||
If the proposal is unclear on any point: implement your best interpretation and note the assumption in your output.
|
||||
|
||||
### 2. Implementation Order
|
||||
For each change in the proposal:
|
||||
1. Write the test first (expect it to fail)
|
||||
2. Implement the change (make the test pass)
|
||||
3. Verify existing tests still pass
|
||||
4. Commit with a descriptive message
|
||||
|
||||
For writing domain (stories, prose):
|
||||
1. Read the outline / scene plan
|
||||
2. Read the voice profile and character sheets
|
||||
3. Draft scene by scene, following the outline's emotional beats
|
||||
4. Self-check: does the voice hold? Does dialogue sound natural?
|
||||
5. Commit after each scene or logical section
|
||||
|
||||
### 3. Commit Discipline
|
||||
|
||||
**CRITICAL: Always commit before finishing.** Uncommitted worktree changes are LOST when the agent exits.
|
||||
|
||||
Commit conventions:
|
||||
```
|
||||
feat: <what was added> # New functionality
|
||||
fix: <what was fixed> # Bug fix within the task
|
||||
test: <what was tested> # Test additions
|
||||
docs: <what was documented> # Documentation only
|
||||
```
|
||||
|
||||
Commit frequency:
|
||||
- **Code:** After each logical step (one feature, one fix, one test suite)
|
||||
- **Writing:** After each scene or section (~500-1000 words)
|
||||
- **Never:** One big commit at the end with everything
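A commit subject can be checked against the four prefixes above with a small regular expression. This is a sketch, not an ArcheFlow feature: the pattern covers only the prefixes listed here, and `is_valid_commit_subject` is a hypothetical helper.

```python
import re

# Only the four prefixes from the Maker commit conventions.
COMMIT_PATTERN = re.compile(r"^(feat|fix|test|docs): \S.*$")


def is_valid_commit_subject(subject: str) -> bool:
    """True when the subject line starts with an allowed prefix and a description."""
    return COMMIT_PATTERN.match(subject) is not None
```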
|
||||
|
||||
### 4. Scope Control
|
||||
|
||||
Do exactly what the proposal says. No more, no less.
|
||||
|
||||
**In scope:**
|
||||
- Files listed in the proposal's `### Changes` section
|
||||
- Tests specified in the `### Test Strategy` section
|
||||
- Dependencies explicitly mentioned
|
||||
|
||||
**Out of scope (even if tempting):**
|
||||
- Refactoring code you noticed while implementing
|
||||
- Adding features not in the proposal
|
||||
- Fixing pre-existing bugs in adjacent code
|
||||
- Updating documentation beyond what the task requires
|
||||
|
||||
If you encounter something that needs fixing but is out of scope: note it in `### Notes` for future work. Don't fix it now.
|
||||
|
||||
### 5. Blocker Protocol
|
||||
|
||||
If you hit a blocker (dependency missing, test infrastructure broken, proposal contradicts codebase):
|
||||
1. Document what's blocked and why
|
||||
2. Document what you completed before the block
|
||||
3. Commit what you have
|
||||
4. Stop and report — don't silently work around it
|
||||
|
||||
## Worktree Protocol
|
||||
|
||||
When running in an isolated git worktree (`isolation: "worktree"`):
|
||||
|
||||
```
|
||||
main branch (untouched)
|
||||
└── archeflow/maker-<run_id> (worktree branch)
|
||||
├── commit: implementation step 1
|
||||
├── commit: implementation step 2
|
||||
└── commit: implementation step 3 (final)
|
||||
```
|
||||
|
||||
- All work stays on the worktree branch
|
||||
- Main branch is never modified directly
|
||||
- The branch name follows the pattern: `archeflow/maker-<run_id>`
|
||||
- After Check phase approves: the orchestrator merges (not the Maker)
|
||||
|
||||
## Output Format
|
||||
|
||||
```markdown
|
||||
## Implementation: <task>
|
||||
|
||||
### Files Changed
|
||||
- `path/file.ext` — What changed (+N -M lines)
|
||||
|
||||
### Tests
|
||||
- N new tests, all passing
|
||||
- M existing tests still passing
|
||||
|
||||
### Commits
|
||||
1. `feat: description` (hash)
|
||||
2. `test: description` (hash)
|
||||
|
||||
### Notes
|
||||
- Assumptions made where proposal was unclear
|
||||
- Out-of-scope issues noticed (for future work)
|
||||
|
||||
### Branch
|
||||
`archeflow/maker-<run_id>` — ready for review
|
||||
```
|
||||
|
||||
For writing domain:
|
||||
```markdown
|
||||
## Draft: <story/chapter title>
|
||||
|
||||
### Scenes Written
|
||||
- Scene 1: <title> (~N words)
|
||||
- Scene 2: <title> (~N words)
|
||||
|
||||
### Word Count
|
||||
- Target: N | Actual: M | Delta: +/-
|
||||
|
||||
### Voice Notes
|
||||
- Dialect usage: N instances (target: moderate)
|
||||
- Food/drink: present in X/Y scenes
|
||||
|
||||
### Commits
|
||||
1. `feat: scene 1 - <title>` (hash)
|
||||
2. `feat: scene 2 - <title>` (hash)
|
||||
|
||||
### Notes
|
||||
- Deviations from outline (with reasoning)
|
||||
```
|
||||
|
||||
## With Prior Feedback (Cycle 2+)
|
||||
|
||||
When the Maker receives feedback from a prior cycle's Check phase:
|
||||
|
||||
1. Read `act-feedback.md` and focus on the `## Maker-Routed Issues` section
|
||||
2. Address each finding marked as "routed to Maker"
|
||||
3. In your output, include a response table:
|
||||
|
||||
```markdown
|
||||
### Feedback Response
|
||||
| Finding | Source | Action |
|
||||
|---------|--------|--------|
|
||||
| Test names unclear | Sage | Fixed — renamed to behavior descriptions |
|
||||
| Missing edge case | Trickster | Added test for empty input |
|
||||
```
|
||||
|
||||
Do not address findings routed to Creator — those were handled in the revised proposal.
|
||||
|
||||
## Quality Checklist (self-check before finishing)
|
||||
|
||||
Before your final commit, verify:
|
||||
- [ ] All proposal changes implemented
|
||||
- [ ] All new tests pass
|
||||
- [ ] All existing tests still pass
|
||||
- [ ] No files modified outside proposal scope
|
||||
- [ ] Every logical step has its own commit
|
||||
- [ ] Output summary is complete and accurate
|
||||
- [ ] Branch name follows convention
|
||||
|
||||
## Test-First Gate
|
||||
|
||||
Before the Maker's output is accepted, the orchestrator validates that tests were included.
|
||||
|
||||
### Validation Logic
|
||||
|
||||
Read `do-maker-files.txt`. Check if any file path matches common test patterns:
|
||||
- `*test*`, `*spec*`, `*.test.*`, `*.spec.*`, `*_test.*`, `*_spec.*`
|
||||
- Files in directories named `test/`, `tests/`, `__tests__/`, `spec/`
|
||||
|
||||
For writing domain projects, this gate is skipped.
|
||||
|
||||
### Outcomes
|
||||
|
||||
| Result | Action |
|
||||
|--------|--------|
|
||||
| Test files found | Pass — proceed to Check phase |
|
||||
| No test files, code domain | **Warn** — emit WARNING event, note in do-maker.md |
|
||||
| No test files + Creator specified tests | **Block** — re-run Maker with test instruction (1 retry) |
|
||||
| Writing domain | Skip gate entirely |
|
||||
|
||||
The block case triggers a targeted re-run with prompt:
|
||||
"The proposal specified these test cases: <test strategy section>. No test files
|
||||
were found in your changes. Add the specified tests before finishing."
|
||||
This is one retry within the Do phase, not a full PDCA cycle.
|
||||
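The gate described above could be sketched roughly as follows. This is a minimal illustration, not the orchestrator's actual code: the function names `is_test_path` and `evaluate_test_gate`, the `domain` and `creator_specified_tests` parameters, and the returned labels are all hypothetical. Note that the broad `*test*` and `*spec*` patterns from the list also match names such as `latest.py`, so a real check may want tighter matching.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Filename patterns and directory names taken from the Validation Logic list.
TEST_NAME_PATTERNS = ("*test*", "*spec*", "*.test.*", "*.spec.*", "*_test.*", "*_spec.*")
TEST_DIR_NAMES = {"test", "tests", "__tests__", "spec"}


def is_test_path(path: str) -> bool:
    """Return True if a changed-file path looks like a test file."""
    p = PurePosixPath(path.strip())
    # Any parent directory named like a test directory counts.
    if any(part in TEST_DIR_NAMES for part in p.parts[:-1]):
        return True
    name = p.name.lower()
    return any(fnmatchcase(name, pattern) for pattern in TEST_NAME_PATTERNS)


def evaluate_test_gate(changed_files, domain="code", creator_specified_tests=False):
    """Map the lines of do-maker-files.txt to an outcome label (labels are hypothetical)."""
    if domain == "writing":
        return "skip"   # the gate is skipped for the writing domain
    if any(is_test_path(f) for f in changed_files if f.strip()):
        return "pass"   # proceed to the Check phase
    # No test files: block only when the Creator's proposal specified tests.
    return "block" if creator_specified_tests else "warn"
```

A caller would read `do-maker-files.txt` line by line and pass the paths in; a `"block"` result corresponds to the single targeted re-run described above.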
@@ -1,200 +0,0 @@
|
||||
---
|
||||
name: effectiveness
|
||||
description: |
|
||||
Track archetype effectiveness across runs. Scores each archetype on signal-to-noise,
|
||||
fix rate, cost efficiency, accuracy, and cycle impact. Recommends model tier changes
|
||||
and archetype removal based on rolling averages.
|
||||
<example>User: "Which reviewers are actually useful?"</example>
|
||||
<example>User: "Show archetype effectiveness report"</example>
|
||||
---
|
||||
|
||||
# Agent Effectiveness Scoring
|
||||
|
||||
Track which archetypes are most useful vs. which waste tokens. Over multiple runs, build a profile of each archetype's effectiveness and use it to optimize team composition and model selection.
|
||||
|
||||
## Storage
|
||||
|
||||
```
|
||||
.archeflow/memory/effectiveness.jsonl # Per-run archetype scores (append-only)
|
||||
```
|
||||
|
||||
## Scoring Dimensions
|
||||
|
||||
For each archetype that participates in a run, calculate these scores:
|
||||
|
||||
| Dimension | How Measured | Weight |
|
||||
|-----------|-------------|--------|
|
||||
| **Signal-to-noise** | useful findings / total findings | 0.30 |
|
||||
| **Fix rate** | findings that led to actual fixes / total findings | 0.25 |
|
||||
| **Cost efficiency** | useful findings per dollar spent | 0.20 |
|
||||
| **Accuracy** | findings not contradicted by other reviewers | 0.15 |
|
||||
| **Cycle impact** | did this archetype's findings lead to cycle exit? | 0.10 |
|
||||
|
||||
### Definitions
|
||||
|
||||
- **Useful finding**: A finding in a `review.verdict` event with `severity >= WARNING` (i.e., severity is `warning`, `bug`, or `critical`) AND `fix_required == true`.
|
||||
- **Actual fix**: A `fix.applied` event whose `source` field matches this archetype (or whose DAG `parent` chain traces back to this archetype's `review.verdict` event).
|
||||
- **Contradicted finding**: Another reviewer's `review.verdict` has `verdict == "approved"` for the same scope where this archetype flagged an issue. Approximation: if archetype A flags N findings but archetype B approves the same code with 0 findings in overlapping severity categories, A's unmatched findings are considered potentially contradicted.
|
||||
- **Cycle impact**: The archetype's findings (with `fix_required == true`) resulted in fixes that were part of the final approved cycle. Determined by checking if `fix.applied` events referencing this archetype exist before the final `cycle.boundary` with `met == true`.
|
||||
|
||||
### Composite Score
|
||||
|
||||
```
|
||||
composite = (signal_to_noise * 0.30)
|
||||
+ (fix_rate * 0.25)
|
||||
+ (cost_efficiency_normalized * 0.20)
|
||||
+ (accuracy * 0.15)
|
||||
+ (cycle_impact * 0.10)
|
||||
```
|
||||
|
||||
**Cost efficiency normalization**: Raw cost efficiency is `useful_findings / cost_usd`. To normalize it to the 0-1 range, use `min(1.0, raw_efficiency / 100)`. The threshold of 100 means that 100 useful findings per dollar counts as perfect efficiency (achievable with haiku on structured reviews).
|
||||
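As a worked illustration of the formula and the normalization rule, here is a small sketch. It makes two assumptions that the text does not state: the boolean `cycle_impact` contributes 1.0 when true and 0.0 when false, and a non-positive cost yields an efficiency of 0.0. The function names are hypothetical.

```python
# Weights from the Scoring Dimensions table.
WEIGHTS = {
    "signal_to_noise": 0.30,
    "fix_rate": 0.25,
    "cost_efficiency": 0.20,
    "accuracy": 0.15,
    "cycle_impact": 0.10,
}


def normalize_cost_efficiency(useful_findings: int, cost_usd: float) -> float:
    """Scale useful findings per dollar into the 0-1 range, capped at 100 per dollar."""
    if cost_usd <= 0:
        return 0.0  # assumption: no meaningful efficiency without a positive cost
    return min(1.0, (useful_findings / cost_usd) / 100)


def composite_score(signal_to_noise, fix_rate, cost_efficiency_normalized,
                    accuracy, cycle_impact: bool) -> float:
    """Weighted sum of the five dimensions; all numeric inputs are expected in 0-1."""
    return (signal_to_noise * WEIGHTS["signal_to_noise"]
            + fix_rate * WEIGHTS["fix_rate"]
            + cost_efficiency_normalized * WEIGHTS["cost_efficiency"]
            + accuracy * WEIGHTS["accuracy"]
            + (1.0 if cycle_impact else 0.0) * WEIGHTS["cycle_impact"])
```

Because the weights sum to 1.0, an archetype that is perfect on every dimension scores 1.0, and each dimension's weight is the most it can contribute.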
|
||||
## Per-Run Scoring
|
||||
|
||||
After `run.complete`, calculate scores for each archetype that participated. The `extract` command does this.
|
||||
|
||||
### Per-Run Score Record
|
||||
|
||||
```jsonl
|
||||
{"ts":"2026-04-03T16:00:00Z","run_id":"2026-04-03-der-huster","archetype":"guardian","signal_to_noise":0.85,"fix_rate":1.0,"cost_efficiency":42.5,"accuracy":1.0,"cycle_impact":true,"composite_score":0.91,"tokens":5000,"cost_usd":0.004,"model":"haiku","findings_total":4,"findings_useful":3,"fixes_applied":3}
|
||||
```
|
||||
|
||||
Appended to `.archeflow/memory/effectiveness.jsonl`.
|
||||
|
||||
### Scoring Non-Review Archetypes
|
||||
|
||||
Only archetypes that produce `review.verdict` events are scored (Guardian, Skeptic, Sage, Trickster, and any custom review archetypes). Non-review archetypes (Explorer, Creator, Maker) are tracked by cost-tracking but not effectiveness-scored, because their output quality is measured differently (by whether the run succeeds, not by individual findings).
|
||||
|
||||
## Aggregate Scoring
|
||||
|
||||
Across runs, maintain rolling averages over each archetype's last 10 runs (computed on-demand, not stored):
|
||||
|
||||
```jsonl
|
||||
{"archetype":"guardian","runs":12,"avg_composite":0.88,"avg_signal_noise":0.82,"avg_cost_efficiency":38.2,"trend":"stable","recommendation":"keep"}
|
||||
{"archetype":"trickster","runs":8,"avg_composite":0.35,"avg_signal_noise":0.20,"avg_cost_efficiency":5.1,"trend":"insufficient_data","recommendation":"consider_removing"}
|
||||
```
|
||||
|
||||
### Trend Calculation
|
||||
|
||||
Compare the average composite score of the last 5 runs to the 5 runs before that:
|
||||
|
||||
- **improving**: last-5 avg > prior-5 avg + 0.05
|
||||
- **declining**: last-5 avg < prior-5 avg - 0.05
|
||||
- **stable**: within +/- 0.05
|
||||
|
||||
If fewer than 10 runs exist, trend is `"insufficient_data"`.
|
||||
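The trend rule could be sketched like this. The function name `trend` and the convention that scores are passed oldest first are assumptions made for the example.

```python
from statistics import mean


def trend(composite_scores, margin=0.05):
    """Classify the trend from per-run composite scores, ordered oldest first."""
    if len(composite_scores) < 10:
        return "insufficient_data"
    last_five = composite_scores[-5:]
    prior_five = composite_scores[-10:-5]
    delta = mean(last_five) - mean(prior_five)
    if delta > margin:
        return "improving"
    if delta < -margin:
        return "declining"
    return "stable"
```

Only the most recent 10 scores matter, so older history does not dilute a recent change.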
|
||||
### Recommendations
|
||||
|
||||
Based on aggregate composite scores:
|
||||
|
||||
| Composite Score | Recommendation | Meaning |
|
||||
|----------------|---------------|---------|
|
||||
| >= 0.70 | `keep` | Archetype is valuable, contributes meaningful findings |
|
||||
| >= 0.40 and < 0.70 | `optimize` | Consider cheaper model or tighter review lens |
|
||||
| < 0.40 | `consider_removing` | Might be wasting tokens, review whether it adds value |
|
||||
|
||||
## Integration Points
|
||||
|
||||
### At Run Start
|
||||
|
||||
When the `run` skill initializes, show a brief effectiveness summary for the team's archetypes:
|
||||
|
||||
```
|
||||
Archetype effectiveness (last 10 runs):
|
||||
guardian: 0.88 (keep) — haiku, $0.004/run avg
|
||||
sage: 0.72 (keep) — sonnet, $0.08/run avg
|
||||
skeptic: 0.45 (optimize) — haiku, $0.003/run avg
|
||||
trickster: 0.32 (consider_removing) — haiku, $0.003/run avg
|
||||
```
|
||||
|
||||
### Model Tier Suggestions
|
||||
|
||||
Cross-reference effectiveness with model assignment:
|
||||
|
||||
- **High effectiveness on cheap model** (composite >= 0.7, model = haiku): "Keep cheap. Working well."
|
||||
- **Low effectiveness on cheap model** (composite < 0.5, model = haiku): "Consider upgrading to sonnet — cheap model may not be capturing issues."
|
||||
- **High effectiveness on expensive model** (composite >= 0.7, model = sonnet): "Try downgrading to haiku — may maintain quality at lower cost."
|
||||
- **Low effectiveness on expensive model** (composite < 0.5, model = sonnet): "Consider removing — expensive and not contributing."
|
||||
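The recommendation thresholds and the four model-tier cases could be combined in a sketch like the one below. The function names and the short return labels (`keep_model`, `upgrade`, and so on) are hypothetical. The text does not say what to suggest for scores between 0.5 and 0.7 or for other models, so the sketch returns `None` there.

```python
def recommendation(avg_composite: float) -> str:
    """Map an aggregate composite score to keep / optimize / consider_removing."""
    if avg_composite >= 0.70:
        return "keep"
    if avg_composite >= 0.40:
        return "optimize"
    return "consider_removing"


def model_suggestion(avg_composite: float, model: str):
    """Cross-reference effectiveness with the assigned model tier."""
    high = avg_composite >= 0.7
    low = avg_composite < 0.5
    if model == "haiku":
        if high:
            return "keep_model"         # cheap and working well
        if low:
            return "upgrade"            # cheap model may be missing issues
    elif model == "sonnet":
        if high:
            return "try_downgrade"      # may hold quality at lower cost
        if low:
            return "consider_removing"  # expensive and not contributing
    return None  # middle band or unknown model: no suggestion defined in the text
```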
|
||||
### Cost-Tracking Integration
|
||||
|
||||
Divide the composite effectiveness score by the estimated cost to get "value per dollar":
|
||||
|
||||
```
|
||||
value_per_dollar = composite_score / cost_usd
|
||||
```
|
||||
|
||||
This metric helps compare archetypes directly: a cheap archetype with moderate effectiveness may have higher value_per_dollar than an expensive one with high effectiveness.
|
||||
|
||||
## Effectiveness Script
|
||||
|
||||
**Location:** `lib/archeflow-score.sh`
|
||||
|
||||
```
|
||||
Usage:
|
||||
archeflow-score.sh extract <events.jsonl> # Score archetypes from a completed run
|
||||
archeflow-score.sh report # Show aggregate effectiveness report
|
||||
archeflow-score.sh recommend <team.yaml> # Recommend model tiers for a team
|
||||
```
|
||||
|
||||
### `extract` Command
|
||||
|
||||
1. Read all events from the JSONL file
|
||||
2. Verify a `run.complete` event exists (scoring incomplete runs is unreliable)
|
||||
3. For each `review.verdict` event:
|
||||
- Count total findings and useful findings (severity >= WARNING, fix_required)
|
||||
- Cross-reference with `fix.applied` events via the `source` field or DAG parent chain
|
||||
- Check for contradictions from other reviewers
|
||||
- Determine cycle impact
|
||||
4. Calculate all scoring dimensions and composite score
|
||||
5. Append per-archetype score records to `.archeflow/memory/effectiveness.jsonl`
|
||||
|
||||
### `report` Command
|
||||
|
||||
1. Read `.archeflow/memory/effectiveness.jsonl`
|
||||
2. Group by archetype
|
||||
3. Calculate rolling averages (last 10 runs per archetype)
|
||||
4. Calculate trends (last 5 vs. prior 5)
|
||||
5. Output a markdown table:
|
||||
|
||||
```markdown
|
||||
# Archetype Effectiveness Report
|
||||
|
||||
| Archetype | Runs | Avg Score | S/N | Fix Rate | Cost Eff | Accuracy | Trend | Rec |
|
||||
|-----------|------|-----------|-----|----------|----------|----------|-------|-----|
|
||||
| guardian | 12 | 0.88 | 0.82 | 0.95 | 38.2 | 0.97 | stable | keep |
|
||||
| sage | 10 | 0.72 | 0.70 | 0.80 | 12.1 | 0.88 | improving | keep |
|
||||
| skeptic | 8 | 0.45 | 0.40 | 0.50 | 22.5 | 0.60 | insufficient_data | optimize |
|
||||
| trickster | 8 | 0.35 | 0.20 | 0.30 | 5.1 | 0.55 | insufficient_data | consider_removing |
|
||||
|
||||
**Model suggestions:**
|
||||
- skeptic (haiku, score 0.45): Consider upgrading to sonnet or tightening review lens
|
||||
- trickster (haiku, score 0.35): Consider removing — low signal, low fix rate
|
||||
```
|
||||
|
||||
### `recommend` Command
|
||||
|
||||
1. Read the team preset YAML file
|
||||
2. For each archetype in the team, look up its effectiveness from `.archeflow/memory/effectiveness.jsonl`
|
||||
3. Cross-reference current model assignment with effectiveness
|
||||
4. Output recommendations:
|
||||
|
||||
```markdown
|
||||
# Model Recommendations for team: story-development
|
||||
|
||||
| Archetype | Current Model | Score | Suggestion |
|
||||
|-----------|--------------|-------|------------|
|
||||
| guardian | haiku | 0.88 | Keep haiku — high effectiveness at low cost |
|
||||
| sage | sonnet | 0.72 | Keep sonnet — quality-sensitive role |
|
||||
| skeptic | haiku | 0.45 | Try sonnet — may improve signal quality |
|
||||
| trickster | haiku | 0.35 | Consider removing from team |
|
||||
```
|
||||
|
||||
## Design Principles
|
||||
|
||||
1. **Append-only.** Score records are immutable facts. Aggregates are computed on-demand.
|
||||
2. **Review archetypes only.** Non-review agents (Explorer, Creator, Maker) are not scored — their value is in the final product, not in individual findings.
|
||||
3. **Relative, not absolute.** Scores are meaningful in comparison (guardian vs. trickster), not as standalone numbers. The thresholds (0.7, 0.4) are starting points — calibrate after 20+ runs.
|
||||
4. **Actionable.** Every report ends with concrete recommendations (keep, optimize, remove, change model).
|
||||
5. **Cheap to compute.** One JSONL scan per report. No databases, no external services.
|
||||
@@ -1,634 +0,0 @@
|
||||
---
|
||||
name: orchestration
|
||||
description: Use when executing a multi-agent orchestration — spawning archetype agents, managing PDCA cycles, coordinating worktrees, and merging results. This is the step-by-step execution guide.
|
||||
---
|
||||
|
||||
# Orchestration Execution
|
||||
|
||||
This skill guides you through running a full ArcheFlow orchestration using Claude Code's native Agent tool and git worktrees.
|
||||
|
||||
## Strategy Selection
|
||||
|
||||
A **strategy** defines the shape of an orchestration run — which phases execute, in what order, and when to iterate. A **workflow** (fast/standard/thorough) controls the depth within a strategy.
|
||||
|
||||
### Available Strategies
|
||||
|
||||
| Strategy | Flow | When to Use |
|
||||
|----------|------|-------------|
|
||||
| `pdca` | Plan -> Do -> Check -> Act (cyclic) | Refactors, thorough reviews, multi-concern tasks |
|
||||
| `pipeline` | Plan -> Implement -> Spec-Review -> Quality-Review -> Verify (linear) | Bug fixes, fast patches, single-concern tasks |
|
||||
| `auto` | Selected by task analysis | Default — let ArcheFlow decide |
|
||||
|
||||
### Strategy Interface
|
||||
|
||||
Every strategy defines:
|
||||
|
||||
- **Phases** — ordered list of execution stages
|
||||
- **Agent mapping** — which archetypes run in each phase
|
||||
- **Transition rules** — conditions for moving between phases
|
||||
- **Iteration model** — cyclic (PDCA) or linear (pipeline)
|
||||
- **Exit conditions** — when the run terminates
|
||||
|
||||
### PDCA Strategy
|
||||
|
||||
The existing orchestration flow (Steps 0-4 below). Cyclic — the Act phase can feed back to Plan for another iteration. Best for tasks requiring multiple review perspectives and iterative refinement.
|
||||
|
||||
### Pipeline Strategy
|
||||
|
||||
Linear flow with no cycle-back. Faster for well-understood tasks where one pass is sufficient.
|
||||
|
||||
| Phase | Agent | Purpose |
|
||||
|-------|-------|---------|
|
||||
| Plan | Creator | Design proposal |
|
||||
| Implement | Maker | Build in worktree |
|
||||
| Spec-Review | Guardian, then Skeptic | Security + assumption check (sequential) |
|
||||
| Quality-Review | Sage | Code quality review |
|
||||
| Verify | (automated) | Run tests, apply targeted fix if CRITICAL |
|
||||
|
||||
No cycle-back — WARNINGs are logged but do not block. CRITICALs in Verify trigger a single targeted fix attempt by the Maker, not a full cycle.
|
||||
|
||||
### Auto-Selection Rules
|
||||
|
||||
When `strategy: auto` (default):
|
||||
|
||||
- Task contains "fix", "bug", "patch", "hotfix" → `pipeline`
|
||||
- Task contains "refactor", "redesign", "review" → `pdca`
|
||||
- Workflow is `thorough` → `pdca` (always)
|
||||
- Workflow is `fast` with single file → `pipeline`
|
||||
- Otherwise → `pdca`
|
||||
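A possible reading of these rules as code is shown below. The list does not state which rule wins when several match, so the precedence here is an assumption: `thorough` is checked first because it is marked "always", then the pipeline keywords, then the PDCA keywords, then the fast single-file case. The function name and parameters are hypothetical, and matching is a plain substring test, so "prefix" would also match "fix".

```python
PIPELINE_KEYWORDS = ("fix", "bug", "patch", "hotfix")
PDCA_KEYWORDS = ("refactor", "redesign", "review")


def select_strategy(task: str, workflow: str = "standard", file_count: int = 0) -> str:
    """Pick 'pipeline' or 'pdca' for strategy: auto (precedence is assumed, see lead-in)."""
    text = task.lower()
    if workflow == "thorough":
        return "pdca"                      # thorough always uses PDCA
    if any(word in text for word in PIPELINE_KEYWORDS):
        return "pipeline"
    if any(word in text for word in PDCA_KEYWORDS):
        return "pdca"
    if workflow == "fast" and file_count == 1:
        return "pipeline"                  # fast workflow touching a single file
    return "pdca"                          # default
```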
|
||||
---
|
||||
|
||||
## Step 0: Choose a Workflow
|
||||
|
||||
If `.archeflow/teams/<name>.yaml` exists, the user can reference a team preset: `"Use the backend team"`. Load the preset's phase config instead of built-in defaults. See `archeflow:custom-archetypes` skill for preset format.
|
||||
|
||||
Otherwise, assess the task and pick:
|
||||
|
||||
| Signal | Workflow |
|
||||
|--------|----------|
|
||||
| Small fix, low risk, single concern | `fast` (1 cycle) |
|
||||
| Feature, multiple files, moderate risk | `standard` (2 cycles) |
|
||||
| Security-sensitive, breaking changes, public API | `thorough` (3 cycles) |
|
||||
|
||||
## Workflow Adaptation Rules
|
||||
|
||||
The initial workflow choice is a starting point, not a commitment. These rules adapt the workflow at runtime. Each rule specifies when it evaluates (which phase boundary).
|
||||
|
||||
### A3: Confidence Gate (evaluates: after Plan, before Do)
|
||||
|
||||
**When:** Creator's confidence table has any axis below 0.5.
|
||||
**Action by axis:**
|
||||
|
||||
| Axis | Score < 0.5 Action |
|
||||
|------|-------------------|
|
||||
| Task understanding | **Pause.** Ask user to clarify before proceeding. Do not spawn Maker. |
|
||||
| Solution completeness | **Upgrade to standard.** Add Explorer before Maker starts. |
|
||||
| Risk coverage | **Spawn mini-Explorer** for the specific risky area (parallel, 5 min max). Maker can proceed. |
|
||||
|
||||
A3 runs before any Do/Check agents spawn, so there are no cancellation issues.
|
||||
|
||||
### A1: Conditional Escalation (evaluates: after Check, before next cycle)
|
||||
|
||||
**When:** Guardian rejects with 2+ CRITICAL findings in a `fast` workflow.
|
||||
**Action:** Escalate to `standard` for the **next cycle** — add Skeptic + Sage to the reviewer roster.
|
||||
**Why:** If Guardian found serious issues, more perspectives help find root causes.
|
||||
**Sticky:** Once escalated, the workflow stays escalated for all remaining cycles. A2 does not apply to escalated workflows.
|
||||
|
||||
### A2: Guardian Fast-Path (evaluates: after Guardian, before spawning other reviewers)
|
||||
|
||||
**When:** Guardian finds 0 CRITICAL and 0 WARNING in a non-escalated `standard` or `thorough` workflow.
|
||||
**Action:** Do not spawn Skeptic, Sage, or Trickster. Proceed directly to Act phase.
|
||||
**Why:** Guardian's security review is the strictest gate. Clean pass = safe to skip additional reviewers.
|
||||
**Critical:** Evaluate A2 **after Guardian completes but before other reviewers are spawned.** Do not spawn reviewers in parallel with Guardian — spawn Guardian first, check A2, then spawn remaining reviewers only if A2 doesn't trigger.
|
||||
**Does not apply to:** Escalated workflows (A1 triggered), or first cycle of `thorough` workflows (Trickster is mandatory on first pass).
|
||||
**Log:** Note "Guardian fast-path taken" in orchestration report.
|
||||
|
||||
### Evaluation Order
|
||||
|
||||
```
|
||||
Plan phase completes → A3 (confidence gate)
|
||||
↓
|
||||
Guardian completes → A2 (fast-path check) → if clean, skip other reviewers
|
||||
↓ if not, spawn other reviewers
|
||||
Check phase done → A1 (escalation check) → if 2+ CRITICALs in fast, next cycle is standard
|
||||
```
|
||||
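The three adaptation rules can be expressed as small predicates, one per phase boundary. This is a sketch under assumptions: the function names, the confidence-axis keys, and the action labels are hypothetical, and an axis missing from the confidence table is treated as not low.

```python
def a3_actions(confidence: dict, threshold: float = 0.5) -> list:
    """After Plan: return the A3 action for every axis scored below the threshold."""
    actions_by_axis = {
        "task_understanding": "pause_and_ask_user",
        "solution_completeness": "upgrade_to_standard",
        "risk_coverage": "spawn_mini_explorer",
    }
    return [action for axis, action in actions_by_axis.items()
            if confidence.get(axis, 1.0) < threshold]


def a2_fast_path(workflow: str, escalated: bool, cycle: int,
                 criticals: int, warnings: int) -> bool:
    """After Guardian: True means skip the remaining reviewers and go to Act."""
    if escalated or workflow not in ("standard", "thorough"):
        return False
    if workflow == "thorough" and cycle == 1:
        return False  # Trickster is mandatory on the first thorough pass
    return criticals == 0 and warnings == 0


def a1_escalates(workflow: str, guardian_rejected: bool, criticals: int) -> bool:
    """After Check: True means the next cycle runs as standard (sticky)."""
    return workflow == "fast" and guardian_rejected and criticals >= 2
```

The caller is responsible for remembering that an A1 escalation happened and passing `escalated=True` to `a2_fast_path` in later cycles.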
|
||||
## Process Logging
|
||||
|
||||
If `.archeflow/events/` exists (or should be created), emit structured events throughout orchestration. See `archeflow:process-log` skill for full schema.
|
||||
|
||||
**Quick reference — emit at these points:**
|
||||
|
||||
```
|
||||
run.start → After workflow selection, before first agent
|
||||
agent.start → Before each Agent tool call
|
||||
agent.complete → After each Agent returns (include duration, tokens, summary, artifacts)
|
||||
decision → When choosing between alternatives (plot direction, approach, fix strategy)
|
||||
phase.transition → At Plan→Do, Do→Check, Check→Act boundaries
|
||||
review.verdict → After each reviewer delivers verdict
|
||||
fix.applied → After each edit addressing a review finding
|
||||
cycle.boundary → End of PDCA cycle
|
||||
shadow.detected → When shadow threshold triggers
|
||||
run.complete → After final Act phase (include totals)
|
||||
```
|
||||
|
||||
**Helper:** `./lib/archeflow-event.sh <run_id> <type> <phase> <agent> '<json>'`
|
||||
|
||||
**Report:** `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl`
|
||||
|
||||
Events are optional — if the events dir doesn't exist, skip logging. Never let logging block orchestration.
|
||||
|
||||
---
|
||||
|
||||
## Model Configuration
|
||||
|
||||
Model assignment per archetype and workflow is configured in `.archeflow/config.yaml` under the `models:` section. The `archeflow:run` skill (section 0c) handles resolution with fallback chain: per-workflow per-archetype > per-workflow default > per-archetype > global default. When spawning agents manually, read the config to select the appropriate model.
|
||||
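The fallback chain could look something like the sketch below. The layout of the `models:` section is not shown in this document, so the nested dictionary shape (`default`, `archetypes`, `workflows`) and the function name `pick_model` are purely illustrative assumptions; only the lookup order comes from the text.

```python
# Hypothetical shape for the models: section of .archeflow/config.yaml.
EXAMPLE_MODELS = {
    "default": "sonnet",                          # global default
    "archetypes": {"guardian": "haiku"},          # per-archetype
    "workflows": {
        "fast": {
            "default": "haiku",                   # per-workflow default
            "archetypes": {"creator": "sonnet"},  # per-workflow per-archetype
        },
        "thorough": {"archetypes": {"guardian": "sonnet"}},
    },
}


def pick_model(models: dict, workflow: str, archetype: str):
    """Resolve a model using the documented order, most specific setting first."""
    workflow_cfg = models.get("workflows", {}).get(workflow, {})
    candidates = (
        workflow_cfg.get("archetypes", {}).get(archetype),  # per-workflow per-archetype
        workflow_cfg.get("default"),                        # per-workflow default
        models.get("archetypes", {}).get(archetype),        # per-archetype
        models.get("default"),                              # global default
    )
    return next((model for model in candidates if model), None)
```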
|
||||
---
|
||||
|
||||
## Step 1: Plan Phase
|
||||
|
||||
Spawn agents sequentially — Creator needs Explorer's findings.
|
||||
|
||||
### Explorer (if standard or thorough)
|
||||
|
||||
**Context to include:** Task description, relevant file paths, codebase access.
|
||||
**Context to exclude:** Prior proposals, review outputs, implementation details, feedback from previous cycles.
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "🔍 Explorer: research context",
|
||||
prompt: "<task description>
|
||||
You are the EXPLORER archetype.
|
||||
Research the codebase to understand:
|
||||
1. What files and functions are involved
|
||||
2. What dependencies exist
|
||||
3. What tests currently cover this area
|
||||
4. What patterns the codebase uses
|
||||
Write your findings as a structured research report.
|
||||
Be thorough but focused — no rabbit holes.",
|
||||
subagent_type: "Explore"
|
||||
)
|
||||
```
|
||||
|
||||
### Creator
|
||||
|
||||
**Context to include:** Task description, Explorer's research output. On cycle 2+: prior cycle's structured feedback (see Cycle Feedback Protocol).
|
||||
**Context to exclude:** Raw file contents (Explorer already summarized), git diffs, reviewer full outputs.
|
||||
|
||||
**Fast workflow only (no Explorer):** The Creator must perform a Mini-Reflect before proposing:
|
||||
1. Restate the task in your own words (catch misunderstandings early)
|
||||
2. List 3 assumptions you're making
|
||||
3. Name the one risk that would cause most damage if wrong
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "🏗️ Creator: design proposal",
|
||||
prompt: "<task description>
|
||||
You are the CREATOR archetype.
|
||||
<if fast workflow (no Explorer): Before proposing, perform a Mini-Reflect:
|
||||
1. Restate the task in one sentence
|
||||
2. List 3 assumptions you're making
|
||||
3. Name the highest-damage risk
|
||||
Then propose.>
|
||||
<if standard/thorough: Based on the research findings: <Explorer's output>>
|
||||
<if cycle 2+: Prior cycle feedback: <structured feedback — see Cycle Feedback Protocol>>
|
||||
Design a solution proposal including:
|
||||
1. Architecture decisions (with rationale)
|
||||
2. Files to create/modify (with specific changes)
|
||||
3. Alternatives considered (at least 2, with rejection rationale)
|
||||
4. Test strategy
|
||||
5. Confidence (scored by axis: task understanding, solution completeness, risk coverage)
|
||||
6. Risks you foresee
|
||||
<if cycle 2+: 7. How you addressed each unresolved issue from prior feedback>
|
||||
Be decisive. Ship a clear plan, not a menu of options.",
|
||||
subagent_type: "Plan"
|
||||
)
|
||||
```
|
||||
|
||||
## Step 2: Do Phase
|
||||
|
||||
Spawn Maker in an **isolated worktree** so changes don't affect main.
|
||||
|
||||
**Context to include:** Creator's proposal only. On cycle 2+: implementation-routed feedback from Sage/Trickster.
|
||||
**Context to exclude:** Explorer's research, Guardian/Skeptic findings (those go to Creator).
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "⚒️ Maker: implement proposal",
|
||||
prompt: "<task description>
|
||||
You are the MAKER archetype.
|
||||
Implement this proposal: <Creator's output>
|
||||
<if cycle 2+: Implementation feedback from prior cycle: <Sage/Trickster findings only>>
|
||||
Rules:
|
||||
1. Follow the proposal exactly — don't redesign
|
||||
2. Write tests for every behavioral change
|
||||
3. Commit with descriptive messages
|
||||
4. Run existing tests — nothing may break
|
||||
5. If the proposal is unclear, implement your best interpretation and note it
|
||||
Do NOT skip tests. Do NOT refactor unrelated code.
|
||||
|
||||
BEFORE finishing — Self-Review Checklist:
|
||||
1. Did I change ALL files listed in the proposal's Changes section?
|
||||
2. Did I add tests for each behavioral change?
|
||||
3. Are there files in my diff NOT listed in the proposal? If yes, revert them.
|
||||
4. Do all existing tests still pass?
|
||||
Report any gaps in your Implementation summary.",
|
||||
isolation: "worktree",
|
||||
mode: "bypassPermissions"
|
||||
)
|
||||
```
|
||||
|
||||
**Critical:** The Maker MUST commit its changes before finishing. Uncommitted changes in a worktree are lost.
|
||||
|
||||
## Step 3: Check Phase
|
||||
|
||||
Spawn Guardian **first**. After Guardian completes, check adaptation rule A2 (fast-path). If A2 triggers (0 CRITICAL, 0 WARNING, non-escalated workflow), skip remaining reviewers and proceed to Act. Otherwise, spawn remaining reviewers **in parallel**.
|
||||
|
||||
**Reviewer spawning protocol:** The canonical sequence (Guardian first, A2 evaluation, parallel spawning, timeout handling) is defined in `archeflow:check-phase` under "Reviewer Spawning Protocol". Follow that protocol for the exact spawning order, context per reviewer, and timeout rules.
|
||||
|
||||
### Guardian (always runs first)
|
||||
|
||||
**Context to include:** Maker's git diff, proposal risk section only.
|
||||
**Context to exclude:** Explorer's research, full proposal, other reviewer outputs.
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "🛡️ Guardian: security and risk review",
|
||||
prompt: "You are the GUARDIAN archetype.
|
||||
Review the changes in branch: <maker's branch>
|
||||
Assess:
|
||||
1. Security vulnerabilities (injection, auth bypass, data exposure)
|
||||
2. Reliability risks (error handling, edge cases, race conditions)
|
||||
3. Breaking changes (API compatibility, schema migrations)
|
||||
4. Dependency risks (new deps, version conflicts)
|
||||
Output: APPROVED or REJECTED with specific findings.
|
||||
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
|
||||
Categories: security, reliability, design, breaking-change, dependency
|
||||
Be rigorous but practical — flag real risks, not theoretical ones."
|
||||
)
|
||||
```
|
||||
|
||||
### Skeptic (if standard or thorough)
|
||||
|
||||
**Context to include:** Creator's proposal (focus on assumptions section).
|
||||
**Context to exclude:** Git diff details, Explorer's research, other reviewer outputs.
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "🤔 Skeptic: challenge assumptions",
|
||||
prompt: "You are the SKEPTIC archetype.
|
||||
Review the proposal: <Creator's proposal>
|
||||
Challenge:
|
||||
1. Assumptions in the design — what if they're wrong?
|
||||
2. Alternative approaches not considered
|
||||
3. Edge cases not tested
|
||||
4. Scalability concerns
|
||||
Output: APPROVED or REJECTED with counterarguments.
|
||||
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
|
||||
Categories: design, quality, testing, scalability
|
||||
Be constructive — every challenge must include a suggested alternative."
|
||||
)
|
||||
```
|
||||
|
||||
### Sage (if standard or thorough)
|
||||
|
||||
**Context to include:** Creator's proposal, Maker's git diff, implementation summary.
|
||||
**Context to exclude:** Explorer's raw research, other reviewer outputs.
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "📚 Sage: holistic quality review",
|
||||
prompt: "You are the SAGE archetype.
|
||||
Review the changes in branch: <maker's branch>
|
||||
Evaluate holistically:
|
||||
1. Code quality (readability, maintainability, simplicity)
|
||||
2. Test coverage (are the tests meaningful, not just present?)
|
||||
3. Documentation (does the change need docs?)
|
||||
4. Consistency with codebase patterns
|
||||
Output: APPROVED or REJECTED with quality findings.
|
||||
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
|
||||
Categories: quality, testing, design, consistency
|
||||
Judge like a senior engineer doing a PR review."
|
||||
)
|
||||
```
|
||||
|
||||
### Trickster (if thorough only)
|
||||
|
||||
**Context to include:** Maker's git diff only.
|
||||
**Context to exclude:** Everything else — proposal, research, other reviews.
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "🃏 Trickster: adversarial testing",
|
||||
prompt: "You are the TRICKSTER archetype.
|
||||
Try to break the changes in branch: <maker's branch>
|
||||
Attack vectors:
|
||||
1. Malformed input, boundary values, empty/null/huge data
|
||||
2. Concurrency and race conditions
|
||||
3. Error path exploitation
|
||||
4. Dependency failure scenarios
|
||||
Output: APPROVED or REJECTED with edge cases found.
|
||||
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
|
||||
Categories: security, reliability, testing
|
||||
Think like a QA engineer who gets paid per bug found."
|
||||
)
|
||||
```
|
||||
|
||||
## Step 4: Act Phase
|
||||
|
||||
Collect all reviewer outputs and decide.
|
||||
|
||||
### Completion Promise (optional)
|
||||
|
||||
If the user defined explicit done criteria with the task, check them now:
|
||||
|
||||
```
|
||||
Completion criteria: <test command passes> AND <Guardian approves>
|
||||
Example: "done when pytest passes and Guardian approves with 0 CRITICAL"
|
||||
```
|
||||
|
||||
If completion criteria are defined, **all criteria must pass** — reviewer approval alone is not sufficient. If tests fail but reviewers approved, cycle back with "tests failing" as feedback to Creator.
|
||||
|
||||
### All Approved (and completion criteria met)
|
||||
1. **Pre-merge hooks:** Check `.archeflow/hooks.yaml` for `pre-merge` hooks. Run them. If `fail_action: abort`, stop and report.
|
||||
2. Merge the Maker's worktree branch into the target branch
|
||||
3. **Post-merge hooks:** Run `post-merge` hooks from `.archeflow/hooks.yaml` if defined. Then run the project's test suite on the merged branch
|
||||
- Tests pass → proceed to step 4
|
||||
- Tests fail → **auto-revert** the merge commit, report the failure, and cycle back with "integration test failure on main" as feedback
|
||||
4. Report: what was implemented, what was reviewed, any warnings noted
|
||||
5. Clean up the worktree
|
||||
|
||||
6. Record metrics (see Orchestration Metrics)
|
||||
|
||||
### Issues Found (and cycles remaining)
|
||||
1. Build structured feedback using the Cycle Feedback Protocol below
|
||||
2. Go back to Step 1 (Plan) with the feedback
|
||||
3. Creator revises the proposal, addressing each unresolved issue
|
||||
4. Maker re-implements in a fresh worktree
|
||||
5. Reviewers check again
|
||||
|
||||
### Max Cycles Reached with Unresolved Issues
|
||||
1. Report all unresolved findings to the user
2. Present the best implementation so far (on its branch)
3. Let the user decide: merge as-is, fix manually, or abandon

---

## Cycle Feedback Protocol

After the Check phase, build structured feedback for the next cycle. This replaces dumping raw reviewer output.

### 1. Extract Findings

Parse each reviewer's output into the standardized format:

```markdown
## Cycle N Feedback

### Unresolved Issues
| Source | Severity | Category | Issue | Route to |
|--------|----------|----------|-------|----------|
| Guardian | CRITICAL | security | SQL injection in user input | Creator |
| Skeptic | WARNING | design | Assumes single-tenant only | Creator |
| Sage | WARNING | quality | Test names don't describe behavior | Maker |
| Trickster | CRITICAL | reliability | Empty string bypasses validation | Creator |

### Resolved (from cycle N-1)
| Source | Issue | Resolution |
|--------|-------|------------|
| Guardian | Missing rate limit | Added rate limiter middleware |
```

### 2. Route Feedback

Not all findings go to the same agent:

| Source | Category | Routes to | Reason |
|--------|----------|-----------|--------|
| Guardian | security, breaking-change | **Creator** | Design must change |
| Guardian | reliability, dependency | **Creator** | Architectural decision needed |
| Skeptic | design, scalability | **Creator** | Assumptions need revision |
| Sage | quality, consistency | **Maker** | Implementation refinement |
| Sage | testing | **Maker** | Test gap, not design flaw |
| Trickster | reliability (design flaw) | **Creator** | Needs redesign |
| Trickster | reliability (test gap) | **Maker** | Needs more tests |
| Trickster | testing | **Maker** | Edge case not covered |

**Disambiguation rule:** When in doubt, route to Creator if the fix requires changing the approach, and to Maker if it requires changing the code within the existing approach.
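
The routing table and the disambiguation rule can be sketched as a small lookup with a fallback. This is only an illustration, not ArcheFlow code: the `Finding` shape, the `changesApproach` flag, and the `routeFinding` function are hypothetical names, while the table entries are copied from the rows above.

```typescript
// Hypothetical sketch of the routing table; all names are illustrative.
type Target = "creator" | "maker";

interface Finding {
  source: string;           // reviewer archetype, lowercase
  category: string;         // e.g. "security", "testing"
  changesApproach: boolean; // does the fix require changing the approach?
}

// Fixed routes copied from the table. Trickster "reliability" is left out
// on purpose: the table splits it into design flaw vs. test gap.
const FIXED_ROUTES: { [key: string]: Target } = {
  "guardian:security": "creator",
  "guardian:breaking-change": "creator",
  "guardian:reliability": "creator",
  "guardian:dependency": "creator",
  "skeptic:design": "creator",
  "skeptic:scalability": "creator",
  "sage:quality": "maker",
  "sage:consistency": "maker",
  "sage:testing": "maker",
  "trickster:testing": "maker",
};

function routeFinding(f: Finding): Target {
  const fixed = FIXED_ROUTES[f.source + ":" + f.category];
  if (fixed !== undefined) return fixed;
  // Disambiguation rule: approach change goes to Creator, otherwise Maker.
  return f.changesApproach ? "creator" : "maker";
}
```

Anything the table does not list, including Trickster reliability findings, falls through to the disambiguation rule, so the orchestrator only has to answer one question per ambiguous finding.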

### 3. Track Resolution

Compare cycle N findings against cycle N-1:
- If a prior finding no longer appears in the same category → mark **resolved**
- If a prior finding persists → it stays **unresolved** with an incremented cycle count
- If new findings appear → add as new unresolved issues

This makes regressions visible and gives the Creator and Maker a clear list of what to address.

### 4. Convergence Detection

If the **same finding** (same category + same file location) appears **unresolved in 2 consecutive cycles**, escalate to user:

> "Finding persists across 2 cycles: [Guardian] CRITICAL security — SQL injection in src/auth.ts:48. This may need human judgment or a different approach."

Do not cycle again blindly. The issue is likely structural (wrong design, not wrong implementation) and needs human input.
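
The resolution tracking and convergence rules in the two subsections above could be combined into one comparison step. The sketch below is an assumption-laden illustration: `TrackedFinding`, `sameFinding`, and `compareCycles` are hypothetical names, and findings are matched on category plus file location (the convergence criterion), which is stricter than the category-only wording of the resolution rule.

```typescript
// Hypothetical sketch; field and function names are illustrative only.
interface TrackedFinding {
  source: string;
  category: string;
  location: string;         // e.g. "src/auth.ts:48"
  cyclesUnresolved: number; // 1 for a finding seen for the first time
}

// Assumption: two findings are "the same" when category and location match.
function sameFinding(a: TrackedFinding, b: TrackedFinding): boolean {
  return a.category === b.category && a.location === b.location;
}

function compareCycles(previous: TrackedFinding[], current: TrackedFinding[]) {
  // Prior findings that no longer appear are resolved.
  const resolved = previous.filter(p => !current.some(c => sameFinding(c, p)));
  // Persisting findings carry an incremented count; new ones start at 1.
  const unresolved = current.map(c => {
    const prior = previous.filter(p => sameFinding(p, c))[0];
    return {
      source: c.source,
      category: c.category,
      location: c.location,
      cyclesUnresolved: prior ? prior.cyclesUnresolved + 1 : 1,
    };
  });
  // Unresolved in 2 consecutive cycles: escalate instead of cycling again.
  const escalate = unresolved.filter(f => f.cyclesUnresolved >= 2);
  return { resolved, unresolved, escalate };
}
```

A non-empty `escalate` list is the signal to stop and ask the user rather than start another cycle.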

### 5. Cross-Archetype Dedup

If two reviewers raise the same issue (same file + same category + similar description), merge into one finding in the consolidated output:

```
| Guardian + Skeptic | CRITICAL | security | Input not sanitized (src/api.ts:30) | Add validation |
```

Don't double-count in severity tallies. Route to the higher-priority destination (Creator over Maker).
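
A minimal sketch of the merge step, under stated assumptions: `ReviewFinding` and `dedupeFindings` are hypothetical names, matching uses file location plus category only (the "similar description" check is left out because it needs judgment), and keeping the higher severity on merge is an assumption the text does not spell out.

```typescript
// Hypothetical sketch; the shapes and names below are illustrative only.
type Severity = "CRITICAL" | "WARNING";
type Target = "creator" | "maker";

interface ReviewFinding {
  sources: string[];
  severity: Severity;
  category: string;
  location: string; // file plus line, e.g. "src/api.ts:30"
  routeTo: Target;
}

function dedupeFindings(findings: ReviewFinding[]): ReviewFinding[] {
  const merged: ReviewFinding[] = [];
  findings.forEach(f => {
    const existing = merged.filter(
      m => m.category === f.category && m.location === f.location
    )[0];
    if (!existing) {
      merged.push({
        sources: f.sources.slice(),
        severity: f.severity,
        category: f.category,
        location: f.location,
        routeTo: f.routeTo,
      });
      return;
    }
    // Same issue from another reviewer: record the source, count it once.
    f.sources.forEach(s => {
      if (existing.sources.indexOf(s) === -1) existing.sources.push(s);
    });
    // Creator outranks Maker as a destination.
    if (f.routeTo === "creator") existing.routeTo = "creator";
    // Assumption: the merged finding keeps the higher severity.
    if (f.severity === "CRITICAL") existing.severity = "CRITICAL";
  });
  return merged;
}
```

Severity tallies computed over the returned list count each merged issue once, which is the point of the dedup step.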

---

## Orchestration Metrics

Track lightweight metrics throughout the orchestration. No token counting (unreliable from skill layer) — just timing and outcomes.

### Per-Phase Logging

After each phase completes, note:

```
| Phase | Duration | Agents | Outcome |
|-------|----------|--------|---------|
| Plan | 45s | 2 | Proposal ready (confidence: 0.8) |
| Do | 90s | 1 | 4 files changed, 8 tests added |
| Check | 60s | 3 | 1 REJECTED (Guardian), 2 APPROVED |
| Act | — | — | Cycle back → feedback built |
```

### Orchestration Summary

At orchestration end, include in the report:

```markdown
## Orchestration Metrics
| Metric | Value |
|--------|-------|
| Workflow | standard |
| Cycles | 2 of 2 |
| Total duration | 4m 30s |
| Agents spawned | 9 |
| Findings (total) | 5 |
| Findings (critical) | 1 |
| Findings (resolved) | 4 |
| Shadow detections | 0 |
```

Use this data to calibrate future workflow selection — if fast workflows consistently need 0 cycles of revision, the task was well-scoped.

---

## Autonomous Mode

When running unattended (overnight sessions, batch queues), add these behaviors to the orchestration loop:

### Between-Task Checkpoint

After each task completes (success or failure):
1. **Commit and push** all changes immediately
2. **Update session log** at `.archeflow/session-log.md` with task outcome
3. **Check stop conditions** before starting next task:
   - 3 consecutive failures → STOP
   - Shadow escalation (same shadow 3+ times) → STOP
   - Test suite broken after merge → REVERT and STOP
   - Destructive action detected → STOP
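
The stop conditions above amount to a small decision function evaluated between tasks. The following is a sketch under assumptions: `SessionState`, `Checkpoint`, and `checkStopConditions` are hypothetical names, and the order in which the conditions are checked is an illustrative choice, not something the skill prescribes.

```typescript
// Hypothetical sketch; state fields and action names are illustrative only.
interface SessionState {
  consecutiveFailures: number;
  shadowCounts: { [shadow: string]: number }; // detections per shadow so far
  testsBrokenAfterMerge: boolean;
  destructiveActionDetected: boolean;
}

interface Checkpoint {
  action: "continue" | "stop" | "revert_and_stop";
  reason?: string;
}

function checkStopConditions(s: SessionState): Checkpoint {
  // Assumption: a broken test suite is checked first because it also needs a revert.
  if (s.testsBrokenAfterMerge) {
    return { action: "revert_and_stop", reason: "test suite broken after merge" };
  }
  if (s.destructiveActionDetected) {
    return { action: "stop", reason: "destructive action detected" };
  }
  if (s.consecutiveFailures >= 3) {
    return { action: "stop", reason: "3 consecutive failures" };
  }
  const escalated = Object.keys(s.shadowCounts).filter(k => s.shadowCounts[k] >= 3);
  if (escalated.length > 0) {
    return { action: "stop", reason: "shadow escalation: " + escalated[0] };
  }
  return { action: "continue" };
}
```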

### Session Log Protocol

**Primary:** Emit `run.complete` event to `.archeflow/events/<run_id>.jsonl` (see Process Logging section above). The event stream is the source of truth.

**Secondary:** Also write a human-readable summary to `.archeflow/session-log.md`:

```markdown
## Task N: <description>
**Workflow:** standard | **Status:** COMPLETED/FAILED
**Cycles:** 1 of 2
**Findings:** Guardian APPROVED, Skeptic APPROVED, Sage WARNING (test names)
**Files changed:** 5 | **Tests added:** 12
**Branch:** merged to main (commit abc1234) | OR: archeflow/maker-xyz (NOT merged)
**Duration:** 8 min
**Events:** `.archeflow/events/<run_id>.jsonl` (full process log)
```

Generate the full Markdown report: `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl`

### Safety Rules
- Never force-push. Never modify main history.
- All work stays on worktree branches until explicitly merged
- Merges use `--no-ff` — individually revertable
- Failed tasks leave branches intact for manual inspection

For full autonomous mode details (task queues, overnight checklists, user controls): load the `archeflow:autonomous-mode` skill.

---

## Shadow Monitoring

During orchestration, watch for shadow activation after each agent completes. Quick checklist:

| Archetype | Shadow | Quick Check |
|-----------|--------|-------------|
| Explorer | Rabbit Hole | Output >2000 words without Recommendation section? |
| Creator | Over-Architect | >2 new abstractions for one feature? |
| Maker | Rogue | No test files in changeset? Files outside proposal? |
| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1? Zero approvals? |
| Skeptic | Paralytic | >7 challenges? <50% have alternatives? |
| Trickster | False Alarm | Findings in untouched code? >10 findings? |
| Sage | Bureaucrat | Review >2x code change length? |

On detection: apply correction prompt from `archeflow:shadow-detection` skill. On second detection of same shadow: replace agent. On 3+ shadows in same cycle: escalate to user.
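
The three-step response in the last paragraph can be read as an escalation ladder. A minimal sketch, assuming the orchestrator keeps two counters; `ShadowAction`, `shadowResponse`, and both parameter names are hypothetical, and checking the cycle-wide limit before the per-shadow limit is an assumption.

```typescript
// Hypothetical sketch of the escalation ladder; names are illustrative only.
type ShadowAction = "apply_correction_prompt" | "replace_agent" | "escalate_to_user";

function shadowResponse(
  detectionsOfThisShadow: number, // how often this shadow has fired, including now
  shadowsThisCycle: number        // all shadow detections in the current cycle
): ShadowAction {
  // Assumption: the cycle-wide limit takes precedence over the per-shadow rule.
  if (shadowsThisCycle >= 3) return "escalate_to_user";
  if (detectionsOfThisShadow >= 2) return "replace_agent";
  return "apply_correction_prompt";
}
```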

---

## Parallel Team Orchestration

When running multiple independent tasks, spawn parallel ArcheFlow teams. Each team runs its own PDCA cycle on a separate worktree.

### Rules

1. **Non-overlapping file scope:** Each team must work on different files. If two tasks touch the same file, run them sequentially.
2. **Independent worktrees:** Each team's Maker gets its own worktree branch (`archeflow/team-1-maker`, `archeflow/team-2-maker`).
3. **First-finished-first-merged:** Teams merge in completion order. Later teams rebase onto the updated main before their own merge.
4. **Merge conflict handling:** If rebase fails, the later team re-runs its Check phase against the merged main. If conflicts are structural, escalate to user.
5. **Max 3 parallel teams:** More than that causes diminishing returns and merge headaches.

### Spawning Parallel Teams

```
# Launch 2-3 teams in a single message with multiple Agent calls:
Agent(description: "🏗️ Team 1: pagination fix (fast)", ...)
Agent(description: "🏗️ Team 2: JWT auth (standard)", ...)
Agent(description: "🏗️ Team 3: logging refactor (fast)", ...)
```

Each team follows the full PDCA steps independently. The orchestrator monitors all teams and handles merges.

---

## Reviewer Profiles

Projects can configure which reviewers matter in `.archeflow/config.yaml`:

```yaml
reviewers:
  always: [guardian]          # Always runs
  default: [sage]             # Runs in standard+thorough
  thorough_only: [trickster]  # Only in thorough
  skip: [skeptic]             # Never runs for this project
```

If no config exists, use the built-in workflow defaults. Profiles save tokens by not spawning reviewers that add little value for the specific project.
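
To make the profile semantics concrete, here is one way the four lists could be resolved into a reviewer set per workflow. It is a sketch, not the actual loader: `ReviewerProfile` mirrors the YAML keys, while `reviewersFor` and the rule that `skip` overrides every other list are assumptions.

```typescript
// Hypothetical sketch; the function name and precedence rule are assumptions.
type Workflow = "fast" | "standard" | "thorough";

interface ReviewerProfile {
  always: string[];
  default: string[];
  thorough_only: string[];
  skip: string[];
}

function reviewersFor(workflow: Workflow, profile: ReviewerProfile): string[] {
  let selected = profile.always.slice();
  // "default" reviewers run in standard and thorough workflows.
  if (workflow !== "fast") selected = selected.concat(profile.default);
  if (workflow === "thorough") selected = selected.concat(profile.thorough_only);
  // Assumption: "skip" wins over every other list; duplicates are dropped.
  return selected.filter(
    (r, i) => profile.skip.indexOf(r) === -1 && selected.indexOf(r) === i
  );
}
```

With the example config above, a `standard` run would spawn Guardian and Sage, and only a `thorough` run would add Trickster.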

## Explorer Cache

If the same code area was explored recently, skip Explorer and reuse prior research:

**Cache hit criteria:** Same files affected (>70% overlap by path) AND prior research is <24 hours old AND no commits to those files since the research.

**On cache hit:** Show the prior research to Creator with a note: "Using cached Explorer research from [timestamp]. If the codebase changed significantly, re-run Explorer."

**On cache miss:** Run Explorer normally.

Cache is stored in `.archeflow/explorer-cache/` as timestamped markdown files. The orchestrator checks for matches before spawning Explorer.
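
The three cache hit criteria can be checked with a single predicate. In this sketch `CachedResearch` and `isCacheHit` are hypothetical, the caller is assumed to supply the newest commit time for the affected files, and overlap is measured as the share of the task's files that the cached research covers, which is one possible reading of "overlap by path".

```typescript
// Hypothetical sketch; names and the overlap definition are assumptions.
interface CachedResearch {
  files: string[];     // paths covered by the prior Explorer run
  createdAtMs: number; // when the research was written (epoch milliseconds)
}

const DAY_MS = 24 * 60 * 60 * 1000;

function isCacheHit(
  taskFiles: string[],
  cached: CachedResearch,
  nowMs: number,
  lastCommitMs: number | null // newest commit touching those files, if any
): boolean {
  if (taskFiles.length === 0) return false;
  const shared = taskFiles.filter(f => cached.files.indexOf(f) !== -1).length;
  const overlap = shared / taskFiles.length;
  const fresh = nowMs - cached.createdAtMs < DAY_MS;
  const untouched = lastCommitMs === null || lastCommitMs <= cached.createdAtMs;
  return overlap > 0.7 && fresh && untouched;
}
```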

## Learning from History

Track which archetypes catch real issues per project over time. After each orchestration, append to `.archeflow/metrics.jsonl`:

```json
{"task": "...", "archetype": "guardian", "findings": 2, "critical": 1, "resolved": 2, "useful": true}
{"task": "...", "archetype": "skeptic", "findings": 3, "critical": 0, "resolved": 0, "useful": false}
```

A finding is **useful** if it was resolved (led to a code change) rather than dismissed.

After 10+ orchestrations, the orchestrator can recommend reviewer profile changes:
- "Skeptic has found 0 useful issues in 8 runs — consider moving to `skip` or `thorough_only`"
- "Guardian catches critical issues in 80% of runs — confirmed as essential"

This is advisory, not automatic. The user decides based on the data.
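
One way the advisory recommendation could be derived from the metrics lines is sketched below. `MetricLine` follows the JSON fields shown above; `suggestDemotions`, the assumption that each distinct `task` value is one orchestration, and the "zero useful runs" criterion are illustrative choices rather than documented behavior.

```typescript
// Hypothetical sketch; the function and its criterion are assumptions.
interface MetricLine {
  task: string;
  archetype: string;
  findings: number;
  critical: number;
  resolved: number;
  useful: boolean;
}

// Returns archetypes that never produced a useful finding, but only once
// enough orchestrations have been recorded.
function suggestDemotions(lines: MetricLine[], minOrchestrations: number = 10): string[] {
  const tasks: string[] = []; // assumption: one distinct task = one orchestration
  const usefulRuns: { [archetype: string]: number } = {};
  lines.forEach(l => {
    if (tasks.indexOf(l.task) === -1) tasks.push(l.task);
    usefulRuns[l.archetype] = (usefulRuns[l.archetype] || 0) + (l.useful ? 1 : 0);
  });
  if (tasks.length < minOrchestrations) return []; // not enough history yet
  return Object.keys(usefulRuns).filter(a => usefulRuns[a] === 0).sort();
}
```

The result is a list of candidates to show the user, in keeping with the advisory nature of the feature; nothing here edits the reviewer profile.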

---

## Orchestration Report

After completion, summarize:

```markdown
## ArcheFlow Orchestration Report
- **Task:** <description>
- **Workflow:** standard (2 cycles)
- **Cycle 1:** Guardian rejected (SQL injection in user input handler)
- **Cycle 2:** All approved after input sanitization added
- **Files changed:** 4 files, +120 -30 lines
- **Tests added:** 8 new tests
- **Branch:** archeflow/maker-<id> → merged to main
- **Metrics:** 9 agents, 4m 30s, 5 findings (4 resolved, 1 info remaining)
```
@@ -1,175 +0,0 @@
---
name: plan-phase
description: Use when acting as Explorer or Creator in the Plan phase. Defines output formats for research and proposals.
---

# Plan Phase

Explorer researches, then Creator designs. Sequential — Creator needs Explorer's findings.

## Explorer Output Format

```markdown
## Research: <task>

### Affected Code
- `path/file.ext` — description (L<start>-<end>)

### Dependencies
- What depends on what, what breaks if changed

### Patterns
- How the codebase solves similar problems

### Risks
- What could go wrong

### Recommendation
<one paragraph: approach + rationale>
```

## Creator Output Format

```markdown
## Proposal: <task>

### Mini-Reflect (fast workflow only — skip if Explorer ran)
- **Task restated:** <one sentence>
- **Assumptions:** 1) ... 2) ... 3) ...
- **Highest-damage risk:** <the one thing that would hurt most if wrong>

### Architecture Decision
<What and WHY>

### Alternatives Considered
| Approach | Why Rejected |
|----------|-------------|
| <option A> | <reason> |
| <option B> | <reason> |

### Changes
1. **`path/file.ext`** — What changes and why
2. **`path/test.ext`** — What tests to add

### Test Strategy
- <specific test cases>

### Confidence
| Axis | Score | Note |
|------|-------|------|
| Task understanding | <0.0-1.0> | <why> |
| Solution completeness | <0.0-1.0> | <gaps?> |
| Risk coverage | <0.0-1.0> | <unknowns?> |

### Risks
- <what could go wrong + mitigations>

### Not Doing
- <adjacent concerns deliberately excluded>
```

**Confidence triggers:** If any axis scores below 0.5, flag it to the orchestrator. Low task understanding → clarify with user. Low solution completeness → consider standard workflow. Low risk coverage → spawn targeted Explorer research.
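
The confidence triggers map each low axis to one follow-up action. A minimal sketch, assuming the three scores have already been parsed out of the proposal; `Confidence`, `FollowUp`, and `confidenceFollowUps` are hypothetical names.

```typescript
// Hypothetical sketch; type and function names are illustrative only.
interface Confidence {
  taskUnderstanding: number;   // 0.0 to 1.0
  solutionCompleteness: number;
  riskCoverage: number;
}

type FollowUp =
  | "clarify_with_user"
  | "consider_standard_workflow"
  | "spawn_targeted_explorer";

function confidenceFollowUps(c: Confidence, threshold: number = 0.5): FollowUp[] {
  const actions: FollowUp[] = [];
  if (c.taskUnderstanding < threshold) actions.push("clarify_with_user");
  if (c.solutionCompleteness < threshold) actions.push("consider_standard_workflow");
  if (c.riskCoverage < threshold) actions.push("spawn_targeted_explorer");
  return actions; // an empty list means nothing needs flagging
}
```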

## Creator with Prior Feedback (Cycle 2+)

When the Creator receives structured feedback from a prior cycle, the proposal must include an additional section addressing each unresolved issue:

```markdown
## Proposal: <task> (Revision — Cycle N)

### What Changed (vs. prior proposal)
- <brief delta: what was added, removed, or redesigned>

### Prior Feedback Response
| Issue | Source | Action | Rationale |
|-------|--------|--------|-----------|
| SQL injection in user input | Guardian | **Fixed** — added parameterized queries | Direct security fix |
| Assumes single-tenant | Skeptic | **Deferred** — multi-tenant out of scope | Not in task requirements |
| Test names unclear | Sage | **Accepted** — routed to Maker | Implementation concern |

### Architecture Decision
<revised design addressing feedback>

### Changes
<updated file list>

### Test Strategy
<updated test cases>

### Confidence
| Axis | Score | Note |
|------|-------|------|
| Task understanding | <0.0-1.0> | <why> |
| Solution completeness | <0.0-1.0> | <gaps?> |
| Risk coverage | <0.0-1.0> | <unknowns?> |

### Risks
<updated risks — include any new risks from the revision>

### Not Doing
<updated scope boundaries>
```

**Rules for addressing feedback:**
- **Fixed:** Changed the design to resolve the issue. Explain how.
- **Deferred:** Not addressing now, with explicit reason. Must not be a CRITICAL finding.
- **Accepted:** Acknowledged and routed to Maker for implementation-level fix.
- **Disputed:** Disagrees with the finding. Must provide evidence or reasoning.

CRITICAL findings cannot be deferred or disputed — they must be fixed or the proposal will be rejected again.

## Task Granularity

Each change item in the Creator's proposal must be a **2-5 minute task** — specific enough that the Maker can implement it without interpretation.

### Requirements per Change Item

Every item in the `### Changes` section must include:

1. **Exact file path** — `src/auth/handler.ts`, not "the auth module"
2. **What to change** — a code block showing the target state or transformation
3. **How to verify** — a command or check that confirms correctness

### Good Example

````markdown
1. **`src/auth/handler.ts:48`** — Add input length validation before token processing
   ```typescript
   if (!token || token.trim().length === 0) {
     throw new ValidationError('Token must not be empty');
   }
   ```
   **Verify:** `npm test -- --grep "empty token"` passes
````

### Bad Example

```markdown
1. **Auth module** — Fix the validation logic
```

This is too vague. Which file? Which function? What does "fix" mean? The Maker will guess.

### Granularity Check

- If a single change item would take **>5 minutes**, split it into smaller items
- If a non-trivial task has **<2 change items**, it is under-specified — the Creator missed something
- Each item should touch **1-2 files** at most. Cross-cutting changes need separate items per file.

---

## Explorer Skip Conditions

Not every task needs Explorer research. Use this decision table:

| Condition | Skip Explorer? | Reason |
|-----------|---------------|--------|
| Task names specific files (1-2) and change is clear | **Yes** | Context is already known |
| Bug fix with stack trace or error message | **Yes** | Root cause is locatable without research |
| High confidence + small scope (single function/class) | **Yes** | Creator can mini-reflect instead |
| Task contains "investigate", "research", "explore" | **No** | Explicit research request |
| Task affects >3 files or unknown scope | **No** | Need dependency mapping |
| Unfamiliar area of codebase (no recent commits by team) | **No** | Need pattern discovery |
| Security-sensitive change (auth, crypto, input handling) | **No** | Need risk surface mapping |

When Explorer is skipped, Creator MUST include the **Mini-Reflect** section in its proposal to compensate for missing research context.
@@ -1,278 +0,0 @@
---
name: process-log
description: |
  Event-based process logging for ArcheFlow orchestrations. Captures every phase transition,
  agent output, decision, and fix as structured JSONL events. Enables post-hoc reports,
  dashboards, and process archaeology.
  <example>Automatically loaded during orchestration</example>
  <example>User: "Show me how this story was made"</example>
---

# Process Log — Event-Sourced Orchestration History

Every ArcheFlow orchestration writes structured events to a JSONL file. Events are the **single source of truth** — all reports (Markdown, dashboards, timelines) are generated views.

## Event Storage

```
.archeflow/events/<run-id>.jsonl     # One file per orchestration run
.archeflow/events/index.jsonl        # Run index (one line per run, for listing)
```

**Run ID format:** `<date>-<slug>` (e.g., `2026-04-03-der-huster`)

## When to Emit Events

Emit an event at each of these points during orchestration:

| Moment | Event Type | Trigger |
|--------|-----------|---------|
| Orchestration starts | `run.start` | After workflow selection, before first agent |
| Agent spawned | `agent.start` | Before each Agent tool call |
| Agent completes | `agent.complete` | After each Agent returns |
| Phase transition | `phase.transition` | Plan→Do, Do→Check, Check→Act |
| Decision made | `decision` | Plot direction chosen, fix applied, workflow adapted |
| Review verdict | `review.verdict` | Guardian/Sage/Skeptic delivers verdict |
| Fix applied | `fix.applied` | After each edit that addresses a review finding |
| Cycle boundary | `cycle.boundary` | End of PDCA cycle, before next (or exit) |
| Shadow detected | `shadow.detected` | Shadow threshold triggered |
| Orchestration ends | `run.complete` | After final Act phase |

## Event Schema

Every event is one JSON line with these required fields (pretty-printed here for readability):

```json
{
  "ts": "2026-04-03T14:32:07Z",
  "run_id": "2026-04-03-der-huster",
  "seq": 4,
  "parent": [2],
  "type": "agent.complete",
  "phase": "plan",
  "agent": "creator",
  "data": { ... }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `ts` | ISO 8601 | Timestamp |
| `run_id` | string | Unique run identifier |
| `seq` | integer | Monotonically increasing sequence number within run |
| `parent` | int[] | Seq numbers of causal parent events. Forms a DAG. `[]` for root events. |
| `type` | string | Event type (see table above) |
| `phase` | string | Current PDCA phase: `plan`, `do`, `check`, `act` |
| `agent` | string or null | Agent archetype that triggered the event |
| `data` | object | Event-type-specific payload (see below) |

### Parent Relationships (DAG)

The `parent` field turns the flat event stream into a directed acyclic graph (agent call graph). This enables:

- **Causal reconstruction:** which agent output caused which downstream action
- **Parallel visualization:** agents sharing a parent ran concurrently
- **Blame tracking:** trace a fix back through review → draft → outline → research

Rules:
- `run.start` has `parent: []` (root node)
- An agent has `parent: [seq of event that triggered it]`
- A phase transition has `parent: [seq of all completing events in prior phase]`
- A fix has `parent: [seq of the review that found the issue]`
- A decision has `parent: [seq of the agent that produced the alternatives]`
- Parallel agents share the same parent (fan-out), phase transitions collect them (fan-in)

Example DAG from a writing workflow:
```
#1 run.start                          []
├── #2 agent.complete (explorer)      [1]
│   └── #3 decision (plot direction)  [2]
├── #4 agent.complete (creator)       [2]      ← explorer informs creator
├── #5 phase.transition (plan→do)     [3,4]    ← fan-in
│   └── #6 agent.complete (maker)     [5]
├── #7 phase.transition (do→check)    [6]
│   ├── #8 review (guardian)          [7]      ← parallel (fan-out)
│   └── #9 review (sage)              [7]      ← parallel (fan-out)
├── #10 phase.transition (check→act)  [8,9]    ← fan-in
├── #11 fix (timeline)                [8]      ← caused by guardian
├── #12 fix (voice drift)             [9]      ← caused by sage
└── #18 run.complete                  [17]
```
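
The schema and the parent rules suggest a cheap consistency check over an event file: sequence numbers must increase, `run.start` must be a root, and every parent must point at an earlier event, which also guarantees the graph is acyclic. The sketch below assumes the JSONL lines have already been parsed; `ArcheFlowEvent` mirrors the field table, and `validateEventStream` is a hypothetical helper, not one of the `lib/` scripts.

```typescript
// Hypothetical sketch; the validator is illustrative, not an ArcheFlow script.
interface ArcheFlowEvent {
  ts: string;
  run_id: string;
  seq: number;
  parent: number[];
  type: string;
  phase: "plan" | "do" | "check" | "act";
  agent: string | null;
  data: { [key: string]: unknown };
}

// Returns a list of problems; an empty list means the stream is consistent.
function validateEventStream(events: ArcheFlowEvent[]): string[] {
  const errors: string[] = [];
  const seen: { [seq: number]: boolean } = {};
  let lastSeq = Number.NEGATIVE_INFINITY;
  events.forEach(e => {
    if (e.seq <= lastSeq) {
      errors.push("#" + e.seq + ": seq is not monotonically increasing");
    }
    if (e.type === "run.start" && e.parent.length !== 0) {
      errors.push("#" + e.seq + ": run.start must have parent []");
    }
    // Parents must already have been seen, so edges only point backwards.
    e.parent.forEach(p => {
      if (!seen[p]) errors.push("#" + e.seq + ": unknown or later parent #" + p);
    });
    seen[e.seq] = true;
    lastSeq = e.seq;
  });
  return errors;
}
```

Run against the example DAG as printed, such a check would flag `#18` because its parent `#17` is not among the listed events.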

## Event Payloads by Type

### `run.start`
```json
{
  "task": "Write short story 'Der Huster'",
  "workflow": "kurzgeschichte",
  "team": "story-development",
  "max_cycles": 2,
  "config": {
    "voice_profile": "vp-giesing-gschichten-v1",
    "persona": "giesinger",
    "target_words": 6000
  }
}
```

### `agent.start`
```json
{
  "archetype": "story-explorer",
  "model": "haiku",
  "prompt_summary": "Research premise, find emotional core, suggest 3 plot directions"
}
```

### `agent.complete`
```json
{
  "archetype": "story-explorer",
  "duration_ms": 87605,
  "tokens": 21645,
  "artifacts": ["docs/01-der-huster-research.md"],
  "summary": "3 plot directions developed, recommended C (Mo is ill + suitcase)"
}
```

### `decision`
```json
{
  "what": "plot_direction",
  "chosen": "C — Mo is ill + suitcase from B",
  "alternatives": [
    {"id": "A", "label": "Mo is gone", "reason_rejected": "Too passive for a 6k-word story"},
    {"id": "B", "label": "The cough doesn't belong to Mo", "reason_rejected": "Too close to a crime story"}
  ],
  "rationale": "Strongest emotional core, fits the voice profile"
}
```

### `review.verdict`
```json
{
  "archetype": "guardian",
  "verdict": "approved_with_fixes",
  "findings": [
    {"severity": "bug", "description": "Timeline: 'Montag' referenced but story starts Dienstag", "fix_required": true},
    {"severity": "recommendation", "description": "Gentrification monologue too long for Alex register", "fix_required": false}
  ]
}
```

### `fix.applied`
```json
{
  "source": "guardian",
  "finding": "Timeline: Montag → Dienstag",
  "file": "stories/01-der-huster.md",
  "line": 302,
  "before": "das Gegenteil von Montag",
  "after": "das Gegenteil von Dienstag"
}
```

### `phase.transition`
```json
{
  "from": "plan",
  "to": "do",
  "artifacts_so_far": ["research.md", "outline.md"],
  "notes": "Explorer recommended direction C, Creator produced 6-scene outline"
}
```

### `cycle.boundary`
```json
{
  "cycle": 1,
  "max_cycles": 2,
  "exit_condition": "all_approved",
  "met": true,
  "fixes_applied": 6,
  "next_action": "complete"
}
```

### `shadow.detected`
```json
{
  "archetype": "story-explorer",
  "shadow": "endless_research",
  "trigger": "output >2000 words without recommendation",
  "action": "correction_prompt_applied",
  "occurrence": 1
}
```

### `run.complete`
```json
{
  "status": "completed",
  "cycles": 1,
  "agents_total": 5,
  "fixes_total": 6,
  "shadows": 0,
  "duration_ms": 1295519,
  "artifacts": [
    "docs/01-der-huster-research.md",
    "docs/01-der-huster-outline.md",
    "stories/01-der-huster.md",
    "docs/01-der-huster-guardian-review.md",
    "docs/01-der-huster-sage-review.md",
    "docs/01-der-huster-process.md"
  ]
}
```

## How to Emit Events

During orchestration, write events using this pattern:

```bash
# Append one event to the run's JSONL file.
# RUN_ID, SEQ, PARENT_SEQ, TYPE, PHASE, AGENT and {...} are placeholders to fill in.
echo '{"ts":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","run_id":"RUN_ID","seq":SEQ,"parent":[PARENT_SEQ],"type":"TYPE","phase":"PHASE","agent":"AGENT","data":{...}}' >> .archeflow/events/RUN_ID.jsonl
```

Or use the helper script:

```bash
./lib/archeflow-event.sh RUN_ID TYPE PHASE AGENT '{"key":"value"}'
```

The orchestration skill should call the event emitter at each trigger point listed in the table above.

## Generating Reports

After orchestration completes (or during, for live progress):

```bash
# Generate markdown process report
./lib/archeflow-report.sh .archeflow/events/2026-04-03-der-huster.jsonl > docs/process-report.md

# List all runs
jq -r '[.run_id, .status, .task] | @tsv' .archeflow/events/index.jsonl
```

## Run Index

After each `run.complete`, append a summary line to `.archeflow/events/index.jsonl`:

```jsonl
{"run_id":"2026-04-03-der-huster","ts":"2026-04-03T16:00:00Z","task":"Write Der Huster","workflow":"kurzgeschichte","status":"completed","cycles":1,"agents":5,"fixes":6,"duration_ms":1295519}
```

## Integration with Existing Skills

- **`orchestration`**: Emit events at phase transitions and after each agent
- **`shadow-detection`**: Emit `shadow.detected` when thresholds trigger
- **`autonomous-mode`**: Use `index.jsonl` for session summaries instead of separate session-log
- **`workflow-design`**: Custom workflows inherit logging automatically

## Design Principles

1. **Append-only.** Never modify or delete events. They are immutable facts.
2. **Self-contained.** Each event has enough context to be understood alone (no forward references).
3. **Cheap.** One `echo >>` per event. No database, no service, no dependencies.
4. **Optional.** If events dir doesn't exist, orchestration works fine without logging. Events are observation, not control flow.
1039
skills/run/SKILL.md
File diff suppressed because it is too large