Compare commits
3 Commits
chore/trim
...
refactor/s
| Author | SHA1 | Date |
|---|---|---|
| | c8bd55d97c | |
| | 55de51aabe | |
| | 1baaa79946 | |
@@ -1,289 +0,0 @@
---
name: artifact-routing
description: |
  Inter-phase artifact protocol for ArcheFlow runs. Defines how artifacts are named, stored,
  routed between agents, and archived across PDCA cycles. Ensures each agent receives exactly
  the context it needs: no more, no less.
  <example>Automatically loaded by archeflow:run</example>
  <example>User: "What does the Maker receive as context?"</example>
---

# Artifact Routing — Inter-Phase Context Protocol
|
||||
|
||||
Every ArcheFlow run produces artifacts — research notes, proposals, diffs, reviews, feedback. This skill defines how those artifacts are named, where they live, what each agent receives, and how they are preserved across cycles.
|
||||
|
||||
## Artifact Directory Structure
|
||||
|
||||
```
.archeflow/artifacts/<run_id>/
├── plan-explorer.md          # Explorer research output
├── plan-creator.md           # Creator proposal/outline
├── do-maker.md               # Maker implementation summary
├── do-maker-files.txt        # List of files created/modified (one path per line)
├── check-guardian.md         # Guardian review verdict + findings
├── check-sage.md             # Sage review (if present)
├── check-skeptic.md          # Skeptic review (if present)
├── check-trickster.md        # Trickster review (if present)
├── act-feedback.md           # Structured feedback for next cycle (Cycle Feedback Protocol)
├── act-fixes.jsonl           # Applied fixes log (one JSON line per fix)
├── cycle-1/                  # Archived artifacts from cycle 1
│   ├── plan-explorer.md
│   ├── plan-creator.md
│   ├── do-maker.md
│   ├── do-maker-files.txt
│   ├── check-guardian.md
│   ├── check-sage.md
│   └── act-feedback.md
└── cycle-2/                  # Archived artifacts from cycle 2 (if cycle 3 starts)
    └── ...
```
|
||||
|
||||
## Naming Convention
|
||||
|
||||
Artifacts follow the pattern: `<phase>-<agent>.<ext>`
|
||||
|
||||
| Phase | Agent | Filename | Format |
|-------|-------|----------|--------|
| plan | explorer | `plan-explorer.md` | Markdown research report |
| plan | creator | `plan-creator.md` | Markdown proposal with confidence scores |
| plan | mini-explorer | `plan-mini-explorer.md` | Focused risk research (only if confidence gate triggers) |
| do | maker | `do-maker.md` | Markdown implementation summary |
| do | maker | `do-maker-files.txt` | Plain text, one file path per line |
| check | guardian | `check-guardian.md` | Markdown verdict + findings table |
| check | sage | `check-sage.md` | Markdown verdict + findings table |
| check | skeptic | `check-skeptic.md` | Markdown verdict + findings table |
| check | trickster | `check-trickster.md` | Markdown verdict + findings table |
| act | (orchestrator) | `act-feedback.md` | Structured feedback (see Cycle Feedback Protocol) |
| act | (orchestrator) | `act-fixes.jsonl` | JSONL fix log |
|
||||
|
||||
**Rule:** Never invent new artifact names during a run. If a reviewer is skipped (A2 fast-path, reviewer profile), its artifact simply does not exist. Downstream phases check for file existence before reading.
|
||||
|
||||
---
|
||||
|
||||
## Context Injection Rules
|
||||
|
||||
Each agent receives a filtered subset of artifacts. This is the **attention filter** — it controls what context is injected into the agent's prompt.
|
||||
|
||||
### Plan Phase
|
||||
|
||||
| Agent | Receives | Does NOT receive |
|-------|----------|-----------------|
| **Explorer** | Task description, relevant file paths, codebase access | Prior proposals, review outputs, implementation details |
| **Creator** (cycle 1) | Task description, `plan-explorer.md` (if exists) | Raw file contents (Explorer summarized them), git diffs |
| **Creator** (cycle 2+) | Task description, `plan-explorer.md`, `act-feedback.md` (Creator-routed findings only) | Raw reviewer outputs, Maker-routed findings |
|
||||
|
||||
**Creator context injection template (cycle 2+):**
|
||||
```markdown
|
||||
## Task
|
||||
<task description>
|
||||
|
||||
## Research (from Explorer)
|
||||
<contents of plan-explorer.md>
|
||||
|
||||
## Feedback from Prior Cycle
|
||||
<Creator-routed section of act-feedback.md only>
|
||||
|
||||
Note: Address each unresolved issue listed above. Explain how your revised proposal resolves it.
|
||||
```
|
||||
|
||||
### Do Phase
|
||||
|
||||
| Agent | Receives | Does NOT receive |
|-------|----------|-----------------|
| **Maker** (cycle 1) | `plan-creator.md` (the proposal), `plan-mini-explorer.md` (if exists) | `plan-explorer.md`, reviewer outputs, raw task description |
| **Maker** (cycle 2+) | `plan-creator.md`, `plan-mini-explorer.md` (if exists), Maker-routed findings from `act-feedback.md` | Explorer research, Creator-routed findings (for example, Guardian and Skeptic findings, which went to Creator) |
|
||||
|
||||
**Maker context injection template (cycle 2+):**
|
||||
```markdown
|
||||
## Proposal
|
||||
<contents of plan-creator.md>
|
||||
|
||||
## Implementation Feedback from Prior Cycle
|
||||
<Maker-routed section of act-feedback.md only>
|
||||
|
||||
Note: The proposal has been revised to address design-level issues. Focus on the implementation
|
||||
feedback items above (code quality, test gaps, consistency).
|
||||
```
|
||||
|
||||
**Why Maker doesn't get Explorer output:** The Creator already distilled Explorer's research into a concrete proposal. Giving Maker raw research causes scope creep and "Rogue" shadow activation.
|
||||
|
||||
### Check Phase
|
||||
|
||||
| Agent | Receives | Does NOT receive |
|-------|----------|-----------------|
| **Guardian** | Maker's git diff, risk section from `plan-creator.md` | Full proposal, Explorer research, other reviewer outputs |
| **Skeptic** | `plan-creator.md` (assumptions focus) | Git diff details, Explorer research, other reviewer outputs |
| **Sage** | `plan-creator.md`, Maker's git diff, `do-maker.md` | Explorer research, other reviewer outputs |
| **Trickster** | Maker's git diff only | Everything else |
|
||||
|
||||
**Guardian context injection template:**
|
||||
```markdown
|
||||
## Changes to Review
|
||||
<git diff from Maker's branch>
|
||||
|
||||
## Risk Assessment (from proposal)
|
||||
<risks section extracted from plan-creator.md>
|
||||
|
||||
Review these changes for security, reliability, breaking changes, and dependency risks.
|
||||
```
|
||||
|
||||
**Skeptic context injection template:**
|
||||
```markdown
|
||||
## Proposal to Challenge
|
||||
<contents of plan-creator.md>
|
||||
|
||||
Focus on assumptions, alternatives not considered, edge cases, and scalability.
|
||||
```
|
||||
|
||||
**Sage context injection template:**
|
||||
```markdown
|
||||
## Proposal
|
||||
<contents of plan-creator.md>
|
||||
|
||||
## Implementation Summary
|
||||
<contents of do-maker.md>
|
||||
|
||||
## Changes
|
||||
<git diff from Maker's branch>
|
||||
|
||||
Evaluate code quality, test coverage, documentation, and codebase consistency.
|
||||
```
|
||||
|
||||
**Trickster context injection template:**
|
||||
```markdown
|
||||
## Changes to Attack
|
||||
<git diff from Maker's branch>
|
||||
|
||||
Try to break this. Malformed input, boundaries, concurrency, error paths, dependency failures.
|
||||
```
|
||||
|
||||
### Act Phase
|
||||
|
||||
No agents are spawned in Act. The orchestrator reads all `check-*.md` artifacts directly.
|
||||
|
||||
---
|
||||
|
||||
## Feedback Routing
|
||||
|
||||
> **This is the canonical routing table.** Other skills (orchestration, act-phase) must match this table exactly. When updating routing rules, update this table first, then sync the others.
|
||||
|
||||
When building `act-feedback.md` after the Check phase, route each finding to the right agent for the next cycle:
|
||||
|
||||
| Finding Source | Finding Category | Routes To | Rationale |
|---------------|-----------------|-----------|-----------|
| Guardian | security, breaking-change | **Creator** | Design must change |
| Guardian | reliability, dependency | **Creator** | Architectural decision needed |
| Skeptic | design, scalability | **Creator** | Assumptions need revision |
| Sage | quality, consistency | **Maker** | Implementation refinement |
| Sage | testing | **Maker** | Test gap, not design flaw |
| Trickster | reliability (design flaw) | **Creator** | Needs redesign |
| Trickster | reliability (test gap) | **Maker** | Needs more tests |
| Trickster | testing | **Maker** | Edge case not covered |
|
||||
|
||||
**Disambiguation rule:** When in doubt: if the fix requires changing the approach, route to Creator. If it requires changing the code within the existing approach, route to Maker.
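
The routing table and the disambiguation rule can be sketched as a lookup with a fallback. This is a minimal illustration, not ArcheFlow's implementation: `ROUTES`, `route_finding`, and the `changes_approach` flag are hypothetical names, and the flag stands in for the human or orchestrator judgment the disambiguation rule asks for.

```python
from typing import Literal

Agent = Literal["creator", "maker"]

# (source, category) -> agent, transcribed from the routing table.
# Trickster "reliability" is deliberately absent: the table splits it into
# design flaw vs. test gap, so it falls through to the disambiguation rule.
ROUTES: dict[tuple[str, str], Agent] = {
    ("guardian", "security"): "creator",
    ("guardian", "breaking-change"): "creator",
    ("guardian", "reliability"): "creator",
    ("guardian", "dependency"): "creator",
    ("skeptic", "design"): "creator",
    ("skeptic", "scalability"): "creator",
    ("sage", "quality"): "maker",
    ("sage", "consistency"): "maker",
    ("sage", "testing"): "maker",
    ("trickster", "testing"): "maker",
}


def route_finding(source: str, category: str, changes_approach: bool = False) -> Agent:
    """Return the agent that should receive a finding in the next cycle.

    `changes_approach` (hypothetical) encodes the disambiguation rule:
    True when the fix requires changing the approach, False when it only
    changes code within the existing approach. It is consulted only when
    the table has no direct entry.
    """
    key = (source.lower(), category.lower())
    if key in ROUTES:
        return ROUTES[key]
    return "creator" if changes_approach else "maker"
```

For example, `route_finding("trickster", "reliability", changes_approach=True)` routes to the Creator, while the same finding without the flag routes to the Maker.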
|
||||
|
||||
### Feedback File Format
|
||||
|
||||
`act-feedback.md` is split into two sections so each agent can be given only its portion:
|
||||
|
||||
```markdown
# Cycle <N> Feedback

## Creator-Routed Issues
| # | Source | Severity | Category | Issue | Suggested Fix |
|---|--------|----------|----------|-------|---------------|
| 1 | Guardian | CRITICAL | security | SQL injection in user input | Add parameterized queries |
| 2 | Skeptic | WARNING | design | Assumes single-tenant only | Add tenant isolation |

## Maker-Routed Issues
| # | Source | Severity | Category | Issue | Suggested Fix |
|---|--------|----------|----------|-------|---------------|
| 3 | Sage | WARNING | quality | Test names don't describe behavior | Rename to describe expected outcome |
| 4 | Sage | INFO | consistency | Import order doesn't match codebase style | Re-order imports |

## Resolved (from prior cycles)
| # | Source | Issue | Resolution | Resolved In |
|---|--------|-------|------------|-------------|
| 1 | Guardian | Missing rate limit | Added rate limiter middleware | Cycle 1 |

## Convergence Warnings
<any finding that appeared unresolved in 2+ consecutive cycles; requires user input>
```
|
||||
|
||||
When injecting feedback into Creator's prompt, include **only** the "Creator-Routed Issues" section.
|
||||
When injecting feedback into Maker's prompt, include **only** the "Maker-Routed Issues" section.
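
Because each agent gets only its own section, the orchestrator needs a way to slice `act-feedback.md` by heading. The sketch below assumes the file uses the `## ` headings shown in the format above; `extract_section` is a hypothetical helper name, not part of ArcheFlow.

```python
def extract_section(feedback_md: str, heading: str) -> str:
    """Return one `## <heading>` section of act-feedback.md, heading included.

    The section runs until the next `## ` heading or the end of the file.
    Returns an empty string when the heading is not present.
    """
    kept = []
    inside = False
    for line in feedback_md.splitlines():
        if line.startswith("## "):
            # Entering a new level-2 section: keep it only if it is the target.
            inside = line[3:].strip() == heading
        if inside:
            kept.append(line)
    return "\n".join(kept).strip()
```

Calling `extract_section(text, "Maker-Routed Issues")` yields only the Maker table, so Creator-routed findings never reach the Maker's prompt.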
|
||||
|
||||
---
|
||||
|
||||
## Cycle Archiving
|
||||
|
||||
When a PDCA cycle completes and a new cycle begins, archive the current artifacts so they are preserved and the working directory is clean for the next iteration.
|
||||
|
||||
### Archive Procedure
|
||||
|
||||
At the end of each cycle (before starting the next):
|
||||
|
||||
```bash
RUN_DIR=".archeflow/artifacts/${RUN_ID}"
ARCHIVE_DIR="${RUN_DIR}/cycle-${CYCLE}"

mkdir -p "$ARCHIVE_DIR"

# Copy all phase artifacts to archive.
# Optional artifacts may be absent, so a failed cp (unmatched glob) is ignored.
cp "${RUN_DIR}"/plan-*.md "$ARCHIVE_DIR/" 2>/dev/null || true
cp "${RUN_DIR}"/do-*.md "$ARCHIVE_DIR/" 2>/dev/null || true
cp "${RUN_DIR}"/do-*.txt "$ARCHIVE_DIR/" 2>/dev/null || true
cp "${RUN_DIR}"/check-*.md "$ARCHIVE_DIR/" 2>/dev/null || true
cp "${RUN_DIR}"/act-feedback.md "$ARCHIVE_DIR/" 2>/dev/null || true
```
|
||||
|
||||
**Do NOT delete** the working-level artifacts after archiving. The next cycle's agents need `act-feedback.md` and `plan-explorer.md` (Explorer cache may reuse prior research). Old artifacts in the working directory get overwritten when the new cycle's agents produce their outputs.
|
||||
|
||||
### Archive Access
|
||||
|
||||
Archived artifacts are read-only references. Use them for:
|
||||
- **Resolution tracking:** Compare `cycle-1/check-guardian.md` findings against `cycle-2/check-guardian.md` to detect resolved/persisting issues
|
||||
- **Convergence detection:** Same finding in `cycle-N/act-feedback.md` and `cycle-N+1/act-feedback.md` → escalate to user
|
||||
- **Post-hoc analysis:** Understanding how a solution evolved across cycles
|
||||
|
||||
---
|
||||
|
||||
## Artifact Existence Checks
|
||||
|
||||
Before injecting an artifact into an agent's context, always check if the file exists. Missing artifacts are expected in certain workflows:
|
||||
|
||||
| Artifact | Missing when |
|----------|-------------|
| `plan-explorer.md` | Fast workflow (no Explorer) |
| `plan-mini-explorer.md` | Confidence gate did not trigger for risk coverage |
| `check-skeptic.md` | Fast workflow, or A2 fast-path taken |
| `check-sage.md` | Fast workflow, or A2 fast-path taken |
| `check-trickster.md` | Non-thorough workflow, or A2 fast-path taken |
| `act-feedback.md` | Cycle 1 (no prior feedback exists) |
| `act-fixes.jsonl` | Cycle 1, or no fixes applied |
|
||||
|
||||
**Rule:** Never fail because an optional artifact is missing. Check existence, skip injection if absent, and note what was skipped in the event data.
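
The check-skip-note rule can be illustrated with a small loader. This is a sketch under assumptions: `load_artifacts` is a hypothetical name, and how the skipped list is written into the event data is left to the caller because the event schema for skipped artifacts is not shown here.

```python
from pathlib import Path


def load_artifacts(run_dir: Path, names: list[str]) -> tuple[dict[str, str], list[str]]:
    """Read the artifacts that exist and report the ones that do not.

    Returns (contents keyed by filename, filenames skipped because absent).
    A missing artifact is never an error.
    """
    found: dict[str, str] = {}
    skipped: list[str] = []
    for name in names:
        path = run_dir / name
        if path.is_file():
            found[name] = path.read_text(encoding="utf-8")
        else:
            skipped.append(name)
    return found, skipped
```

The caller injects only what is in `found` and records `skipped` alongside the run's events.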
|
||||
|
||||
---
|
||||
|
||||
## Git Diff as Artifact
|
||||
|
||||
The Maker's git diff is not saved as a file — it is generated on-the-fly from the Maker's worktree branch:
|
||||
|
||||
```bash
git diff main...<maker-branch>
```
|
||||
|
||||
This ensures reviewers always see the actual current diff, not a stale snapshot. The diff is injected directly into reviewer prompts, not saved to disk.
|
||||
|
||||
Exception: `do-maker-files.txt` IS saved to disk (just the file list, not the full diff) for quick reference by the orchestrator and for archiving purposes.
|
||||
|
||||
---
|
||||
|
||||
## Design Principles
|
||||
|
||||
1. **Minimal context per agent.** Each agent gets only what it needs. Over-injection causes distraction, shadow activation, and wasted tokens.
|
||||
2. **Artifacts are the handoff mechanism.** Agents never communicate directly. All inter-agent data flows through saved artifacts.
|
||||
3. **Files over memory.** Everything is on disk. If a session crashes, artifacts survive. A `--start-from` resume reads artifacts, not session state.
|
||||
4. **Overwrite, don't accumulate.** Working-level artifacts get overwritten each cycle. Archives preserve history. This keeps the working directory simple.
|
||||
5. **Check before inject.** Always verify artifact existence. Gracefully handle missing optional artifacts.
|
||||
@@ -1,249 +0,0 @@
---
name: convergence
description: |
  Detects convergence, stalling, and oscillation in multi-cycle PDCA runs. Prevents wasted cycles
  by stopping early when findings are not being resolved or are bouncing between cycles.
  <example>Automatically loaded during Act phase before exit decision</example>
  <example>User: "Is the run converging?"</example>
---

# Convergence Detection
|
||||
|
||||
In multi-cycle PDCA runs, the Act phase must decide whether another cycle will help or just waste tokens. This skill provides the analysis: are findings being resolved (converging), staying the same (stalling), or bouncing back (oscillating)?
|
||||
|
||||
## When It Runs
|
||||
|
||||
Convergence analysis runs **after the Check phase completes and before the Act phase exit decision**. It requires at least 2 cycles of data — on cycle 1, it is skipped (no comparison baseline).
|
||||
|
||||
```
|
||||
Check phase → Convergence Analysis → Act phase exit decision
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Finding Comparison
|
||||
|
||||
Extract findings from the current cycle and compare against the previous cycle.
|
||||
|
||||
### Data Sources
|
||||
|
||||
- **Current cycle findings:** Parsed from `check-*.md` artifacts in `.archeflow/artifacts/<run_id>/`
|
||||
- **Previous cycle findings:** Parsed from `check-*.md` artifacts in `.archeflow/artifacts/<run_id>/cycle-<N-1>/`
|
||||
|
||||
Each finding is identified by a composite key: `source + category + file_location + description_keywords`.
|
||||
|
||||
### Finding Categories
|
||||
|
||||
Every finding seen in the current or previous cycle is classified into exactly one category:
|
||||
|
||||
| Category | Definition |
|----------|------------|
| **NEW** | Finding not present in any previous cycle |
| **RESOLVED** | Was present in the previous cycle, absent in the current cycle |
| **PERSISTENT** | Present in both the current and previous cycle (same key) |
| **REGRESSED** | Was RESOLVED in the previous cycle (was present in N-2, absent in N-1), but returned in the current cycle |
|
||||
|
||||
### Matching Algorithm
|
||||
|
||||
Two findings match if:
|
||||
1. Same `source` archetype (guardian, sage, etc.)
|
||||
2. Same `category` (security, reliability, quality, etc.)
|
||||
3. Same or overlapping file location (same file, line within 10 lines)
|
||||
4. 50%+ keyword overlap in description (lowercase, strip punctuation)
|
||||
|
||||
All four conditions must hold. This prevents false matches across unrelated findings.
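
The four matching conditions can be sketched as a single predicate. The `Finding` fields and function names below are hypothetical, and one detail is an assumption the text leaves open: keyword overlap is measured against the smaller of the two keyword sets.

```python
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str       # reviewer archetype, e.g. "guardian"
    category: str     # e.g. "security"
    file: str         # path the finding points at
    line: int         # line number within that file
    description: str


def _keywords(text: str) -> set[str]:
    """Lowercase the text, strip punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())


def findings_match(a: Finding, b: Finding, line_window: int = 10,
                   min_overlap: float = 0.5) -> bool:
    """True only when all four conditions hold."""
    # Conditions 1 and 2: same source archetype and same category.
    if a.source != b.source or a.category != b.category:
        return False
    # Condition 3: same file, lines within the window.
    if a.file != b.file or abs(a.line - b.line) > line_window:
        return False
    # Condition 4: at least 50% keyword overlap.
    ka, kb = _keywords(a.description), _keywords(b.description)
    if not ka or not kb:
        return False
    # Assumption: overlap is relative to the smaller keyword set.
    overlap = len(ka & kb) / min(len(ka), len(kb))
    return overlap >= min_overlap
```

Measuring against the smaller set tolerates a reviewer adding a few words to an otherwise unchanged description; a Jaccard ratio would be stricter.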
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Convergence Score
|
||||
|
||||
Calculate a convergence score from the categorized findings:
|
||||
|
||||
```
convergence = resolved_count / (resolved_count + new_count + regressed_count)
```

If the denominator is 0 (no resolved, no new, no regressed; only persistent findings), the score is `0.0` and the status is **Stuck**: no progress in either direction.
|
||||
|
||||
### Score Interpretation
|
||||
|
||||
| Score Range | Status | Meaning |
|-------------|--------|---------|
| > 0.8 | **Converging** | Most issues being resolved, few new ones introduced |
| 0.5 - 0.8 | **Stalling** | Fixing roughly as many as introducing |
| < 0.5 | **Diverging** | Making things worse: more new/regressed than resolved |
| 0.0 (all persistent) | **Stuck** | No progress in either direction |
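
As a worked illustration, the score formula and the interpretation table translate into two small functions. The function names and the `only_persistent` flag are hypothetical; the thresholds are taken from the table, with a score of exactly 0.8 treated as stalling because only scores above 0.8 count as converging.

```python
def convergence_score(resolved: int, new: int, regressed: int) -> float:
    """resolved / (resolved + new + regressed), or 0.0 when the denominator is 0."""
    denominator = resolved + new + regressed
    if denominator == 0:
        return 0.0
    return resolved / denominator


def convergence_status(score: float, only_persistent: bool = False) -> str:
    """Map a score to the status names in the interpretation table."""
    if only_persistent:
        # Zero denominator: nothing resolved, new, or regressed.
        return "stuck"
    if score > 0.8:
        return "converging"
    if score >= 0.5:
        return "stalling"
    return "diverging"
```

With 3 resolved, 1 new, and 0 regressed findings the score is 3 / 4 = 0.75, which falls in the stalling band; with 1 resolved, 2 new, and 1 regressed it is 1 / 4 = 0.25, which is diverging.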
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Oscillation Detection
|
||||
|
||||
An oscillating finding is one that bounces between resolved and re-introduced across cycles:
|
||||
|
||||
1. Finding was present in cycle N-2
|
||||
2. Finding was absent in cycle N-1 (resolved)
|
||||
3. Finding is present again in cycle N (regressed)
|
||||
|
||||
This indicates the fix in cycle N-1 was undone or invalidated by other changes in cycle N.
|
||||
|
||||
### Oscillation Rules
|
||||
|
||||
- A single oscillating finding: **flag it** in the convergence report but continue.
|
||||
- Two or more oscillating findings: **STOP** and escalate to the user.
|
||||
- Message: `"Findings X and Y are oscillating between cycles. Manual intervention needed — the automated fixes are interfering with each other."`
|
||||
|
||||
Oscillation tracking requires 3+ cycles of data. On cycles 1-2, oscillation detection is skipped.
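
Once findings have been reduced to stable keys (for example via the matching algorithm), oscillation detection is plain set arithmetic over three cycles. The sketch below assumes such keys already exist; both function names are hypothetical.

```python
def oscillating_keys(cycle_n2: set[str], cycle_n1: set[str], cycle_n: set[str]) -> set[str]:
    """Findings present in cycle N-2, absent in N-1, and present again in N."""
    return (cycle_n2 - cycle_n1) & cycle_n


def oscillation_action(oscillating: set[str]) -> str:
    """Apply the oscillation rules: one finding is flagged, two or more stop the run."""
    if len(oscillating) >= 2:
        return "stop"
    if len(oscillating) == 1:
        return "flag"
    return "continue"
```

The caller is responsible for skipping this check on cycles 1 and 2, where there is no N-2 data to compare against.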
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Early Termination Rules
|
||||
|
||||
The convergence analysis can override the normal Act phase exit decision. If any of these conditions hold, the recommendation is **STOP**:
|
||||
|
||||
| Condition | Threshold | Recommendation |
|-----------|-----------|----------------|
| Diverging | Score < 0.5 for 2 consecutive cycles | STOP: changes are making things worse |
| Stalled | 0 findings resolved between cycles | STOP: no progress, further cycles will not help |
| Stuck | All findings are PERSISTENT for 2 consecutive cycles | STOP: automated fixes cannot resolve these |
| Oscillating | 2+ findings oscillating | STOP: fixes are interfering with each other |
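
The four STOP conditions can be combined into one check over the previous and current cycle's counts. This is a sketch, not the skill's implementation: `CycleStats` and `stop_reason` are hypothetical names, and the order in which conditions are tested (most specific first) is an assumption, since the table only says that any one of them is sufficient.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CycleStats:
    resolved: int
    new: int
    regressed: int
    persistent: int
    oscillating: int = 0

    @property
    def score(self) -> float:
        d = self.resolved + self.new + self.regressed
        return self.resolved / d if d else 0.0

    @property
    def all_persistent(self) -> bool:
        return self.persistent > 0 and self.resolved + self.new + self.regressed == 0


def stop_reason(previous: Optional[CycleStats], current: CycleStats) -> Optional[str]:
    """Return the name of the first STOP condition that holds, or None to continue."""
    if current.oscillating >= 2:
        return "oscillating"
    if previous is not None and previous.all_persistent and current.all_persistent:
        return "stuck"
    if current.resolved == 0:
        return "stalled"
    if previous is not None and previous.score < 0.5 and current.score < 0.5:
        return "diverging"
    return None
```

`previous` is `None` when only one comparison exists so far, in which case the two-cycle conditions (diverging, stuck) cannot fire.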
|
||||
|
||||
When STOP is recommended, the Act phase should:
|
||||
1. **Not** start another PDCA cycle
|
||||
2. Report all unresolved findings to the user
|
||||
3. Present the best implementation so far (on its branch, not merged)
|
||||
4. Include the convergence report explaining why the run was stopped
|
||||
|
||||
### Override Behavior
|
||||
|
||||
The convergence STOP recommendation overrides the normal cycle-back logic in the Act phase. Even if `CYCLE < MAX_CYCLES` and there are fixable-looking findings, if convergence says STOP, the run stops.
|
||||
|
||||
The user can always override by explicitly requesting another cycle: `"Run one more cycle anyway"`.
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Integration with Act Phase
|
||||
|
||||
### Event Data
|
||||
|
||||
Convergence data is included in the `cycle.boundary` event emitted by the Act phase:
|
||||
|
||||
```json
{
  "type": "cycle.boundary",
  "phase": "act",
  "data": {
    "cycle": 2,
    "max_cycles": 3,
    "exit_condition": "convergence_stop",
    "met": false,
    "fixes_applied": 2,
    "next_action": "stop",
    "convergence": {
      "score": 0.25,
      "status": "diverging",
      "resolved": 1,
      "new": 2,
      "regressed": 1,
      "persistent": 3,
      "oscillating": ["Timeline reference mismatch"],
      "recommendation": "stop",
      "reason": "Diverging for 2 consecutive cycles"
    }
  }
}
```
|
||||
|
||||
### Decision Tree Update
|
||||
|
||||
The Act phase decision tree (from `act-phase` skill Step 4) gains a new first branch:
|
||||
|
||||
```
┌─ Convergence analysis (cycle 2+)
│
├─ Convergence says STOP
│   └─ STOP: Report to user with convergence report
│
├─ Convergence says CONTINUE
│   └─ Fall through to normal exit decision logic
│
└─ Cycle 1 (no convergence data)
    └─ Fall through to normal exit decision logic
```
|
||||
|
||||
### Act Feedback Enhancement
|
||||
|
||||
When the Act phase builds `act-feedback.md` for the next cycle, it includes the convergence summary at the top:
|
||||
|
||||
```markdown
## Convergence Analysis (Cycle 1 → 2)

Score: 0.75 (stalling)
Resolved: 3 | New: 1 | Regressed: 0 | Persistent: 2

Recommendation: Continue, trend is positive

### Finding Status
| Finding | Status | Cycles |
|---------|--------|--------|
| SQL injection in user input | RESOLVED | 1 |
| Missing rate limit | RESOLVED | 1 |
| Test names unclear | RESOLVED | 1 |
| Null check missing in parser | PERSISTENT | 2 |
| Error path not tested | PERSISTENT | 2 |
| New: Unused import introduced | NEW | 1 |
```
|
||||
|
||||
---
|
||||
|
||||
## Convergence Report Format
|
||||
|
||||
The full convergence report is generated as part of the orchestration output:
|
||||
|
||||
```markdown
## Convergence Analysis (Cycle N-1 → N)

**Score:** 0.75 (stalling)
**Resolved:** 3 | **New:** 1 | **Regressed:** 0 | **Persistent:** 2 | **Oscillating:** 0

### Resolved This Cycle
| Source | Category | Description |
|--------|----------|-------------|
| guardian | security | SQL injection in user input handler |
| guardian | reliability | Missing rate limit on auth endpoint |
| sage | quality | Test names don't describe behavior |

### New This Cycle
| Source | Category | Description |
|--------|----------|-------------|
| sage | quality | Unused import introduced by fix |

### Persistent (unresolved across cycles)
| Source | Category | Description | Cycles Open |
|--------|----------|-------------|-------------|
| trickster | reliability | Null check missing in parser | 2 |
| sage | testing | Error path not tested | 2 |

### Oscillating
(none)

**Recommendation:** Continue, trend is positive
```
|
||||
|
||||
---
|
||||
|
||||
## Integration with Memory Skill
|
||||
|
||||
When convergence detects PERSISTENT findings (present for 2+ cycles), these are strong candidates for the `memory` skill's lesson extraction:
|
||||
|
||||
- After a run that had persistent findings, `archeflow-memory.sh extract` will pick these up with higher confidence (they have been confirmed across multiple cycles within a single run).
|
||||
- Persistent findings that also appear in `lessons.jsonl` from prior runs get a double frequency boost (cross-cycle within run + cross-run pattern).
|
||||
|
||||
---
|
||||
|
||||
## Design Principles
|
||||
|
||||
1. **Conservative stopping.** Requires 2 consecutive data points before recommending STOP. A single bad cycle might be noise.
|
||||
2. **User has final say.** STOP is a recommendation, not an enforced shutdown. The user can override.
|
||||
3. **Cheap computation.** Keyword matching on finding descriptions, simple arithmetic on counts. No ML, no embeddings.
|
||||
4. **Bounded scope.** Only compares adjacent cycles (N vs N-1, with N-2 for oscillation). Does not attempt to model long-term trends across many cycles.
|
||||
5. **Observable.** All convergence data is included in the `cycle.boundary` event, making it available for post-hoc analysis via the process log.
|
||||
@@ -1,193 +0,0 @@
---
name: do-phase
description: Use when acting as Maker in the Do phase. Defines execution rules, worktree protocol, commit discipline, and output format.
---

# Do Phase
|
||||
|
||||
Maker implements the Creator's proposal. This skill defines the execution protocol — the agent definition (`agents/maker.md`) has the behavioral rules.
|
||||
|
||||
## Execution Protocol
|
||||
|
||||
### 1. Read Before Writing
|
||||
Read the Creator's proposal completely. Identify:
|
||||
- Files to create or modify (the `### Changes` section)
|
||||
- Test strategy (the `### Test Strategy` section)
|
||||
- Scope boundaries (the `### Not Doing` section)
|
||||
|
||||
If the proposal is unclear on any point: implement your best interpretation and note the assumption in your output.
|
||||
|
||||
### 2. Implementation Order
|
||||
For each change in the proposal:
|
||||
1. Write the test first (expect it to fail)
|
||||
2. Implement the change (make the test pass)
|
||||
3. Verify existing tests still pass
|
||||
4. Commit with a descriptive message
|
||||
|
||||
For writing domain (stories, prose):
|
||||
1. Read the outline / scene plan
|
||||
2. Read the voice profile and character sheets
|
||||
3. Draft scene by scene, following the outline's emotional beats
|
||||
4. Self-check: does the voice hold? Does dialogue sound natural?
|
||||
5. Commit after each scene or logical section
|
||||
|
||||
### 3. Commit Discipline
|
||||
|
||||
**CRITICAL: Always commit before finishing.** Uncommitted worktree changes are LOST when the agent exits.
|
||||
|
||||
Commit conventions:
|
||||
```
feat: <what was added>        # New functionality
fix: <what was fixed>         # Bug fix within the task
test: <what was tested>       # Test additions
docs: <what was documented>   # Documentation only
```
|
||||
|
||||
Commit frequency:
|
||||
- **Code:** After each logical step (one feature, one fix, one test suite)
|
||||
- **Writing:** After each scene or section (~500-1000 words)
|
||||
- **Never:** One big commit at the end with everything
|
||||
|
||||
### 4. Scope Control
|
||||
|
||||
Do exactly what the proposal says. No more, no less.
|
||||
|
||||
**In scope:**
|
||||
- Files listed in the proposal's `### Changes` section
|
||||
- Tests specified in the `### Test Strategy` section
|
||||
- Dependencies explicitly mentioned
|
||||
|
||||
**Out of scope (even if tempting):**
|
||||
- Refactoring code you noticed while implementing
|
||||
- Adding features not in the proposal
|
||||
- Fixing pre-existing bugs in adjacent code
|
||||
- Updating documentation beyond what the task requires
|
||||
|
||||
If you encounter something that needs fixing but is out of scope: note it in `### Notes` for future work. Don't fix it now.
|
||||
|
||||
### 5. Blocker Protocol
|
||||
|
||||
If you hit a blocker (dependency missing, test infrastructure broken, proposal contradicts codebase):
|
||||
1. Document what's blocked and why
|
||||
2. Document what you completed before the block
|
||||
3. Commit what you have
|
||||
4. Stop and report — don't silently work around it
|
||||
|
||||
## Worktree Protocol
|
||||
|
||||
When running in an isolated git worktree (`isolation: "worktree"`):
|
||||
|
||||
```
main branch (untouched)
└── archeflow/maker-<run_id> (worktree branch)
    ├── commit: implementation step 1
    ├── commit: implementation step 2
    └── commit: implementation step 3 (final)
```
|
||||
|
||||
- All work stays on the worktree branch
|
||||
- Main branch is never modified directly
|
||||
- The branch name follows the pattern: `archeflow/maker-<run_id>`
|
||||
- After Check phase approves: the orchestrator merges (not the Maker)
|
||||
|
||||
## Output Format
|
||||
|
||||
```markdown
|
||||
## Implementation: <task>
|
||||
|
||||
### Files Changed
|
||||
- `path/file.ext` — What changed (+N -M lines)
|
||||
|
||||
### Tests
|
||||
- N new tests, all passing
|
||||
- M existing tests still passing
|
||||
|
||||
### Commits
|
||||
1. `feat: description` (hash)
|
||||
2. `test: description` (hash)
|
||||
|
||||
### Notes
|
||||
- Assumptions made where proposal was unclear
|
||||
- Out-of-scope issues noticed (for future work)
|
||||
|
||||
### Branch
|
||||
`archeflow/maker-<run_id>` — ready for review
|
||||
```
|
||||
|
||||
For writing domain:
|
||||
```markdown
|
||||
## Draft: <story/chapter title>
|
||||
|
||||
### Scenes Written
|
||||
- Scene 1: <title> (~N words)
|
||||
- Scene 2: <title> (~N words)
|
||||
|
||||
### Word Count
|
||||
- Target: N | Actual: M | Delta: +/-
|
||||
|
||||
### Voice Notes
|
||||
- Dialect usage: N instances (target: moderate)
|
||||
- Food and drink (Essen/Trinken): present in X/Y scenes
|
||||
|
||||
### Commits
|
||||
1. `feat: scene 1 - <title>` (hash)
|
||||
2. `feat: scene 2 - <title>` (hash)
|
||||
|
||||
### Notes
|
||||
- Deviations from outline (with reasoning)
|
||||
```
|
||||
|
||||
## With Prior Feedback (Cycle 2+)
|
||||
|
||||
When the Maker receives feedback from a prior cycle's Check phase:
|
||||
|
||||
1. Read `act-feedback.md`, focusing on the `## Maker-Routed Issues` section
|
||||
2. Address each finding marked as "routed to Maker"
|
||||
3. In your output, include a response table:
|
||||
|
||||
```markdown
|
||||
### Feedback Response
|
||||
| Finding | Source | Action |
|
||||
|---------|--------|--------|
|
||||
| Test names unclear | Sage | Fixed — renamed to behavior descriptions |
|
||||
| Missing edge case | Trickster | Added test for empty input |
|
||||
```
|
||||
|
||||
Do not address findings routed to Creator — those were handled in the revised proposal.
|
||||
|
||||
## Quality Checklist (self-check before finishing)
|
||||
|
||||
Before your final commit, verify:
|
||||
- [ ] All proposal changes implemented
|
||||
- [ ] All new tests pass
|
||||
- [ ] All existing tests still pass
|
||||
- [ ] No files modified outside proposal scope
|
||||
- [ ] Every logical step has its own commit
|
||||
- [ ] Output summary is complete and accurate
|
||||
- [ ] Branch name follows convention
|
||||
|
||||
## Test-First Gate
|
||||
|
||||
Before the Maker's output is accepted, the orchestrator validates that tests were included.
|
||||
|
||||
### Validation Logic
|
||||
|
||||
Read `do-maker-files.txt`. Check if any file path matches common test patterns:
|
||||
- `*test*`, `*spec*`, `*.test.*`, `*.spec.*`, `*_test.*`, `*_spec.*`
|
||||
- Files in directories named `test/`, `tests/`, `__tests__/`, `spec/`
|
||||
|
||||
For writing domain projects, this gate is skipped.
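
The validation logic can be sketched as a path classifier over the contents of `do-maker-files.txt`. The names below are hypothetical, and one choice is an assumption: the glob patterns are applied to the file name only, while the directory names are checked against the parent path components. Note that the broad `*test*` pattern also matches names such as `contest.py`, so the gate errs toward passing.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns and directory names as listed in the validation logic.
TEST_NAME_PATTERNS = ["*test*", "*spec*", "*.test.*", "*.spec.*", "*_test.*", "*_spec.*"]
TEST_DIR_NAMES = {"test", "tests", "__tests__", "spec"}


def is_test_path(path: str) -> bool:
    """True when a path looks like a test file by name or by parent directory."""
    p = PurePosixPath(path.strip())
    if any(part in TEST_DIR_NAMES for part in p.parts[:-1]):
        return True
    name = p.name.lower()
    return any(fnmatchcase(name, pattern) for pattern in TEST_NAME_PATTERNS)


def has_test_files(files_txt: str) -> bool:
    """Scan the one-path-per-line contents of do-maker-files.txt."""
    return any(is_test_path(line) for line in files_txt.splitlines() if line.strip())
```

The orchestrator would call `has_test_files` only for code-domain runs, since the gate is skipped for writing projects.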
|
||||
|
||||
### Outcomes
|
||||
|
||||
| Result | Action |
|--------|--------|
| Test files found | Pass: proceed to Check phase |
| No test files, code domain | **Warn**: emit WARNING event, note in `do-maker.md` |
| No test files + Creator specified tests | **Block**: re-run Maker with test instruction (1 retry) |
| Writing domain | Skip gate entirely |
|
||||
|
||||
The block case triggers a targeted re-run with prompt:
|
||||
"The proposal specified these test cases: <test strategy section>. No test files
|
||||
were found in your changes. Add the specified tests before finishing."
|
||||
This is one retry within the Do phase, not a full PDCA cycle.
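
The gate described in this section can be sketched in a few lines of Python. This is an illustrative sketch, not the orchestrator's actual code: the function names `is_test_path` and `evaluate_test_gate` are hypothetical, and it assumes the Block row takes precedence over the Warn row when the Creator specified tests.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns and directory names copied from the Validation Logic list.
TEST_FILE_PATTERNS = ("*test*", "*spec*", "*.test.*", "*.spec.*", "*_test.*", "*_spec.*")
TEST_DIR_NAMES = {"test", "tests", "__tests__", "spec"}


def is_test_path(path):
    """Return True if a path from do-maker-files.txt looks like a test file."""
    p = PurePosixPath(path.strip())
    name = p.name.lower()
    if any(fnmatchcase(name, pattern) for pattern in TEST_FILE_PATTERNS):
        return True
    # A file inside a directory named like a test directory also counts.
    return any(part.lower() in TEST_DIR_NAMES for part in p.parts[:-1])


def evaluate_test_gate(changed_files, domain, creator_specified_tests):
    """Return 'skip', 'pass', 'block', or 'warn' following the Outcomes table."""
    if domain == "writing":
        return "skip"
    if any(is_test_path(f) for f in changed_files if f.strip()):
        return "pass"
    # Assumption: Block wins over Warn when the proposal listed tests.
    return "block" if creator_specified_tests else "warn"
```

Note that the broad `*test*` and `*spec*` patterns also match names such as `latest.txt` or `inspector.py`, so a stricter pattern list may be worth considering if false passes show up.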
|
||||
@@ -1,200 +0,0 @@
|
||||
---
|
||||
name: effectiveness
|
||||
description: |
|
||||
Track archetype effectiveness across runs. Scores each archetype on signal-to-noise,
|
||||
fix rate, cost efficiency, accuracy, and cycle impact. Recommends model tier changes
|
||||
and archetype removal based on rolling averages.
|
||||
<example>User: "Which reviewers are actually useful?"</example>
|
||||
<example>User: "Show archetype effectiveness report"</example>
|
||||
---
|
||||
|
||||
# Agent Effectiveness Scoring
|
||||
|
||||
Track which archetypes are most useful vs. which waste tokens. Over multiple runs, build a profile of each archetype's effectiveness and use it to optimize team composition and model selection.
|
||||
|
||||
## Storage
|
||||
|
||||
```
|
||||
.archeflow/memory/effectiveness.jsonl # Per-run archetype scores (append-only)
|
||||
```
|
||||
|
||||
## Scoring Dimensions
|
||||
|
||||
For each archetype that participates in a run, calculate these scores:
|
||||
|
||||
| Dimension | How Measured | Weight |
|-----------|-------------|--------|
| **Signal-to-noise** | useful findings / total findings | 0.30 |
| **Fix rate** | findings that led to actual fixes / total findings | 0.25 |
| **Cost efficiency** | useful findings per dollar spent | 0.20 |
| **Accuracy** | findings not contradicted by other reviewers | 0.15 |
| **Cycle impact** | did this archetype's findings lead to cycle exit? | 0.10 |
|
||||
|
||||
### Definitions
|
||||
|
||||
- **Useful finding**: A finding in a `review.verdict` event with `severity >= WARNING` (i.e., severity is `warning`, `bug`, or `critical`) AND `fix_required == true`.
|
||||
- **Actual fix**: A `fix.applied` event whose `source` field matches this archetype (or whose DAG `parent` chain traces back to this archetype's `review.verdict` event).
|
||||
- **Contradicted finding**: Another reviewer's `review.verdict` has `verdict == "approved"` for the same scope where this archetype flagged an issue. Approximation: if archetype A flags N findings but archetype B approves the same code with 0 findings in overlapping severity categories, A's unmatched findings are considered potentially contradicted.
|
||||
- **Cycle impact**: The archetype's findings (with `fix_required == true`) resulted in fixes that were part of the final approved cycle. Determined by checking if `fix.applied` events referencing this archetype exist before the final `cycle.boundary` with `met == true`.
|
||||
|
||||
### Composite Score
|
||||
|
||||
```
composite = (signal_to_noise * 0.30)
          + (fix_rate * 0.25)
          + (cost_efficiency_normalized * 0.20)
          + (accuracy * 0.15)
          + (cycle_impact * 0.10)
```
|
||||
|
||||
**Cost efficiency normalization**: Raw cost efficiency is `useful_findings / cost_usd`. To normalize to 0-1 range, use: `min(1.0, raw_efficiency / 100)`. The threshold of 100 means "100 useful findings per dollar" is considered perfect efficiency (achievable with haiku on structured reviews).
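
As a worked illustration of the formula and the normalization rule, here is a minimal Python sketch. The function names are hypothetical, and two edge cases are assumptions not covered by the text: a run with zero findings scores 0 on signal-to-noise and fix rate, and a zero-cost run with useful findings counts as perfectly efficient.

```python
# Weights from the Scoring Dimensions table.
WEIGHTS = {
    "signal_to_noise": 0.30,
    "fix_rate": 0.25,
    "cost_efficiency": 0.20,
    "accuracy": 0.15,
    "cycle_impact": 0.10,
}
PERFECT_EFFICIENCY = 100.0  # useful findings per dollar treated as a perfect score


def normalized_cost_efficiency(findings_useful, cost_usd):
    """min(1.0, raw_efficiency / 100), with a guard for zero cost (assumption)."""
    if cost_usd <= 0:
        return 1.0 if findings_useful > 0 else 0.0
    return min(1.0, (findings_useful / cost_usd) / PERFECT_EFFICIENCY)


def composite_score(findings_total, findings_useful, fixes_applied,
                    cost_usd, accuracy, cycle_impact):
    """Weighted sum of the five dimensions, each in the 0-1 range."""
    if findings_total > 0:
        signal_to_noise = findings_useful / findings_total
        fix_rate = min(1.0, fixes_applied / findings_total)
    else:
        signal_to_noise = fix_rate = 0.0  # assumption: nothing found, nothing earned
    dimensions = {
        "signal_to_noise": signal_to_noise,
        "fix_rate": fix_rate,
        "cost_efficiency": normalized_cost_efficiency(findings_useful, cost_usd),
        "accuracy": accuracy,
        "cycle_impact": 1.0 if cycle_impact else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in dimensions.items())
```

For example, 4 findings with 3 useful, 3 fixes, a cost of $0.05, accuracy 1.0, and cycle impact gives roughly 0.78: the raw efficiency of 60 findings per dollar normalizes to 0.6.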
|
||||
|
||||
## Per-Run Scoring
|
||||
|
||||
After `run.complete`, calculate scores for each archetype that participated. The `extract` command does this.
|
||||
|
||||
### Per-Run Score Record
|
||||
|
||||
```jsonl
|
||||
{"ts":"2026-04-03T16:00:00Z","run_id":"2026-04-03-der-huster","archetype":"guardian","signal_to_noise":0.85,"fix_rate":1.0,"cost_efficiency":42.5,"accuracy":1.0,"cycle_impact":true,"composite_score":0.91,"tokens":5000,"cost_usd":0.004,"model":"haiku","findings_total":4,"findings_useful":3,"fixes_applied":3}
|
||||
```
|
||||
|
||||
Appended to `.archeflow/memory/effectiveness.jsonl`.
|
||||
|
||||
### Scoring Non-Review Archetypes
|
||||
|
||||
Only archetypes that produce `review.verdict` events are scored (Guardian, Skeptic, Sage, Trickster, and any custom review archetypes). Non-review archetypes (Explorer, Creator, Maker) are tracked by cost-tracking but not effectiveness-scored, because their output quality is measured differently (by whether the run succeeds, not by individual findings).
|
||||
|
||||
## Aggregate Scoring
|
||||
|
||||
Across all runs, maintain rolling averages (computed on-demand, not stored):
|
||||
|
||||
```jsonl
|
||||
{"archetype":"guardian","runs":12,"avg_composite":0.88,"avg_signal_noise":0.82,"avg_cost_efficiency":38.2,"trend":"stable","recommendation":"keep"}
|
||||
{"archetype":"trickster","runs":8,"avg_composite":0.35,"avg_signal_noise":0.20,"avg_cost_efficiency":5.1,"trend":"insufficient_data","recommendation":"consider_removing"}
|
||||
```
|
||||
|
||||
### Trend Calculation
|
||||
|
||||
Compare the average composite score of the last 5 runs to the 5 runs before that:
|
||||
|
||||
- **improving**: last-5 avg > prior-5 avg + 0.05
|
||||
- **declining**: last-5 avg < prior-5 avg - 0.05
|
||||
- **stable**: within +/- 0.05
|
||||
|
||||
If fewer than 10 runs exist, trend is `"insufficient_data"`.
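
The trend rule can be sketched as follows. This is a hedged illustration: the function name `trend` and its parameters are hypothetical, and the input is assumed to be one archetype's composite scores ordered oldest to newest.

```python
def trend(composite_scores, window=5, tolerance=0.05):
    """Compare the last `window` runs with the `window` runs before them."""
    if len(composite_scores) < 2 * window:
        return "insufficient_data"
    last = composite_scores[-window:]
    prior = composite_scores[-2 * window:-window]
    delta = sum(last) / window - sum(prior) / window
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "declining"
    return "stable"
```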
|
||||
|
||||
### Recommendations
|
||||
|
||||
Based on aggregate composite scores:
|
||||
|
||||
| Composite Score | Recommendation | Meaning |
|----------------|---------------|---------|
| >= 0.70 | `keep` | Archetype is valuable, contributes meaningful findings |
| 0.40 - 0.69 | `optimize` | Consider cheaper model or tighter review lens |
| < 0.40 | `consider_removing` | Might be wasting tokens, review whether it adds value |
|
||||
|
||||
## Integration Points
|
||||
|
||||
### At Run Start
|
||||
|
||||
When the `run` skill initializes, show a brief effectiveness summary for the team's archetypes:
|
||||
|
||||
```
|
||||
Archetype effectiveness (last 10 runs):
|
||||
guardian: 0.88 (keep) — haiku, $0.004/run avg
|
||||
sage: 0.72 (keep) — sonnet, $0.08/run avg
|
||||
skeptic: 0.45 (optimize) — haiku, $0.003/run avg
|
||||
trickster: 0.32 (consider_removing) — haiku, $0.003/run avg
|
||||
```
|
||||
|
||||
### Model Tier Suggestions
|
||||
|
||||
Cross-reference effectiveness with model assignment:
|
||||
|
||||
- **High effectiveness on cheap model** (composite >= 0.7, model = haiku): "Keep cheap. Working well."
|
||||
- **Low effectiveness on cheap model** (composite < 0.5, model = haiku): "Consider upgrading to sonnet — cheap model may not be capturing issues."
|
||||
- **High effectiveness on expensive model** (composite >= 0.7, model = sonnet): "Try downgrading to haiku — may maintain quality at lower cost."
|
||||
- **Low effectiveness on expensive model** (composite < 0.5, model = sonnet): "Consider removing — expensive and not contributing."
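
The recommendation thresholds and the four model-tier cases can be expressed as two small functions. This is a sketch under assumptions: the function names are hypothetical, only the two model names used in this document (`haiku`, `sonnet`) are handled, and scores between 0.5 and 0.7 return no tier suggestion because the list above does not cover that band.

```python
def recommendation(avg_composite):
    """Map an aggregate composite score to keep / optimize / consider_removing."""
    if avg_composite >= 0.70:
        return "keep"
    if avg_composite >= 0.40:
        return "optimize"
    return "consider_removing"


def model_tier_suggestion(avg_composite, model):
    """Return a suggestion string, or None when no case applies."""
    high = avg_composite >= 0.7
    low = avg_composite < 0.5
    if model == "haiku":
        if high:
            return "Keep cheap. Working well."
        if low:
            return "Consider upgrading to sonnet"
    elif model == "sonnet":
        if high:
            return "Try downgrading to haiku"
        if low:
            return "Consider removing"
    return None  # middle band or unknown model: not specified in the text
```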
|
||||
|
||||
### Cost-Tracking Integration
|
||||
|
||||
Divide effectiveness by estimated cost to get "value per dollar":
|
||||
|
||||
```
|
||||
value_per_dollar = composite_score / cost_usd
|
||||
```
|
||||
|
||||
This metric helps compare archetypes directly: a cheap archetype with moderate effectiveness may have higher value_per_dollar than an expensive one with high effectiveness.
|
||||
|
||||
## Effectiveness Script
|
||||
|
||||
**Location:** `lib/archeflow-score.sh`
|
||||
|
||||
```
|
||||
Usage:
|
||||
archeflow-score.sh extract <events.jsonl> # Score archetypes from a completed run
|
||||
archeflow-score.sh report # Show aggregate effectiveness report
|
||||
archeflow-score.sh recommend <team.yaml> # Recommend model tiers for a team
|
||||
```
|
||||
|
||||
### `extract` Command
|
||||
|
||||
1. Read all events from the JSONL file
|
||||
2. Verify a `run.complete` event exists (scoring incomplete runs is unreliable)
|
||||
3. For each `review.verdict` event:
|
||||
- Count total findings and useful findings (severity >= WARNING, fix_required)
|
||||
- Cross-reference with `fix.applied` events via the `source` field or DAG parent chain
|
||||
- Check for contradictions from other reviewers
|
||||
- Determine cycle impact
|
||||
4. Calculate all scoring dimensions and composite score
|
||||
5. Append per-archetype score records to `.archeflow/memory/effectiveness.jsonl`
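
Steps 1 to 3 of `extract` might look roughly like the Python sketch below. The real script is a shell script and the real event schema lives in the process-log skill, so every field name here (`type`, `archetype`, `findings`, `severity`, `fix_required` as keys on each event or finding) is an assumption made for illustration only.

```python
import json

# Assumed severity ordering; "warning" and above count toward useful findings.
SEVERITY_RANK = {"info": 0, "warning": 1, "bug": 2, "critical": 3}


def count_findings(event_lines):
    """Tally total and useful findings per archetype from JSONL event lines."""
    events = [json.loads(line) for line in event_lines if line.strip()]
    if not any(e.get("type") == "run.complete" for e in events):
        raise ValueError("no run.complete event; refusing to score an incomplete run")
    tallies = {}
    for event in events:
        if event.get("type") != "review.verdict":
            continue
        tally = tallies.setdefault(
            event["archetype"], {"findings_total": 0, "findings_useful": 0}
        )
        for finding in event.get("findings", []):
            tally["findings_total"] += 1
            rank = SEVERITY_RANK.get(str(finding.get("severity", "")).lower(), -1)
            if rank >= SEVERITY_RANK["warning"] and finding.get("fix_required") is True:
                tally["findings_useful"] += 1
    return tallies
```

Cross-referencing `fix.applied` events, contradiction checks, and cycle impact would build on the same pass over the events and are left out here.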
|
||||
|
||||
### `report` Command
|
||||
|
||||
1. Read `.archeflow/memory/effectiveness.jsonl`
|
||||
2. Group by archetype
|
||||
3. Calculate rolling averages (last 10 runs per archetype)
|
||||
4. Calculate trends (last 5 vs. prior 5)
|
||||
5. Output a markdown table:
|
||||
|
||||
```markdown
# Archetype Effectiveness Report

| Archetype | Runs | Avg Score | S/N | Fix Rate | Cost Eff | Accuracy | Trend | Rec |
|-----------|------|-----------|-----|----------|----------|----------|-------|-----|
| guardian | 12 | 0.88 | 0.82 | 0.95 | 38.2 | 0.97 | stable | keep |
| sage | 10 | 0.72 | 0.70 | 0.80 | 12.1 | 0.88 | improving | keep |
| skeptic | 8 | 0.45 | 0.40 | 0.50 | 22.5 | 0.60 | insufficient_data | optimize |
| trickster | 8 | 0.35 | 0.20 | 0.30 | 5.1 | 0.55 | insufficient_data | consider_removing |

**Model suggestions:**
- skeptic (haiku, score 0.45): Consider upgrading to sonnet or tightening review lens
- trickster (haiku, score 0.35): Consider removing -- low signal, low fix rate
```
|
||||
|
||||
### `recommend` Command
|
||||
|
||||
1. Read the team preset YAML file
|
||||
2. For each archetype in the team, look up its effectiveness from `.archeflow/memory/effectiveness.jsonl`
|
||||
3. Cross-reference current model assignment with effectiveness
|
||||
4. Output recommendations:
|
||||
|
||||
```markdown
|
||||
# Model Recommendations for team: story-development
|
||||
|
||||
| Archetype | Current Model | Score | Suggestion |
|
||||
|-----------|--------------|-------|------------|
|
||||
| guardian | haiku | 0.88 | Keep haiku — high effectiveness at low cost |
|
||||
| sage | sonnet | 0.72 | Keep sonnet — quality-sensitive role |
|
||||
| skeptic | haiku | 0.45 | Try sonnet — may improve signal quality |
|
||||
| trickster | haiku | 0.35 | Consider removing from team |
|
||||
```
|
||||
|
||||
## Design Principles
|
||||
|
||||
1. **Append-only.** Score records are immutable facts. Aggregates are computed on-demand.
|
||||
2. **Review archetypes only.** Non-review agents (Explorer, Creator, Maker) are not scored — their value is in the final product, not in individual findings.
|
||||
3. **Relative, not absolute.** Scores are meaningful in comparison (guardian vs. trickster), not as standalone numbers. The thresholds (0.7, 0.4) are starting points — calibrate after 20+ runs.
|
||||
4. **Actionable.** Every report ends with concrete recommendations (keep, optimize, remove, change model).
|
||||
5. **Cheap to compute.** One JSONL scan per report. No databases, no external services.
|
||||
@@ -11,21 +11,14 @@ description: |
|
||||
|
||||
# Cross-Run Memory
|
||||
|
||||
ArcheFlow forgets everything after each run. If Guardian repeatedly flags the same type of issue (e.g., timeline errors in fiction, missing null checks in code), the next run starts from zero. This skill fixes that by extracting lessons from completed runs and injecting them into future agent prompts.
|
||||
ArcheFlow forgets everything after each run. This skill extracts lessons from completed runs and injects them into future agent prompts, so recurring issues (timeline errors, missing null checks) are caught proactively.
|
||||
|
||||
## Storage
|
||||
|
||||
```
|
||||
.archeflow/memory/lessons.jsonl # Append-only, one lesson per line
|
||||
```
|
||||
|
||||
Each lesson is a single JSON line:
|
||||
|
||||
```jsonl
|
||||
{"id":"m-001","ts":"2026-04-03T14:00:00Z","run_id":"2026-04-03-der-huster","type":"pattern","source":"guardian","description":"Timeline references must match story start day","frequency":2,"severity":"bug","domain":"writing","tags":["continuity","timeline"],"last_seen_run":"2026-04-03-der-huster","runs_since_last_seen":0}
|
||||
{"id":"m-002","ts":"2026-04-03T15:00:00Z","run_id":"2026-04-03-der-huster","type":"preference","source":"user_feedback","description":"User prefers single bundled PR over many small ones","frequency":1,"severity":"info","domain":"general","tags":["workflow"],"last_seen_run":"","runs_since_last_seen":0}
|
||||
{"id":"m-003","ts":"2026-04-04T10:00:00Z","run_id":"2026-04-04-auth-fix","type":"archetype_hint","source":"sage","description":"Voice drift most common in long monologue passages","frequency":3,"severity":"warning","domain":"writing","tags":["voice","prose"],"archetype":"story-sage","last_seen_run":"2026-04-04-auth-fix","runs_since_last_seen":0}
|
||||
{"id":"m-004","ts":"2026-04-04T11:00:00Z","run_id":"2026-04-04-auth-fix","type":"anti_pattern","source":"maker","description":"Splitting auth middleware into per-route handlers causes duplication","frequency":1,"severity":"warning","domain":"code","tags":["auth","middleware"],"last_seen_run":"2026-04-04-auth-fix","runs_since_last_seen":0}
|
||||
.archeflow/memory/archive.jsonl # Decayed lessons (frequency reached 0)
|
||||
.archeflow/memory/audit.jsonl # Injection audit trail
|
||||
```
|
||||
|
||||
## Lesson Types
|
||||
@@ -33,245 +26,95 @@ Each lesson is a single JSON line:
|
||||
| Type | Source | Description |
|
||||
|------|--------|-------------|
|
||||
| `pattern` | Auto-detected | Recurring finding across runs (same category + similar description) |
|
||||
| `preference` | Manual | User correction or workflow preference (added via CLI) |
|
||||
| `preference` | Manual | User correction or workflow preference (injected immediately, skips frequency threshold) |
|
||||
| `archetype_hint` | Auto-detected | Per-archetype insight (e.g., Sage catches voice drift in monologues) |
|
||||
| `anti_pattern` | Manual or auto | Something that was tried and failed — avoid repeating |
|
||||
| `anti_pattern` | Manual or auto | Something that was tried and failed -- avoid repeating |
|
||||
|
||||
## Lesson Fields
|
||||
## Lesson JSON Fields
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| `id` | string | Unique ID, format `m-NNN` (monotonically increasing) |
|
||||
| `ts` | ISO 8601 | When the lesson was created or last updated |
|
||||
| `id` | string | `m-NNN` (monotonically increasing) |
|
||||
| `ts` | ISO 8601 | Created or last updated |
|
||||
| `run_id` | string | Run that created or last triggered this lesson |
|
||||
| `type` | string | One of: `pattern`, `preference`, `archetype_hint`, `anti_pattern` |
|
||||
| `source` | string | Archetype or `user_feedback` that originated the lesson |
|
||||
| `type` | string | `pattern`, `preference`, `archetype_hint`, `anti_pattern` |
|
||||
| `source` | string | Archetype name or `user_feedback` |
|
||||
| `description` | string | Human-readable lesson text |
|
||||
| `frequency` | integer | How many times this lesson was triggered |
|
||||
| `severity` | string | `bug`, `warning`, `info`, or `recommendation` |
|
||||
| `frequency` | integer | Times this lesson was triggered |
|
||||
| `severity` | string | `bug`, `warning`, `info`, `recommendation` |
|
||||
| `domain` | string | `writing`, `code`, `general`, or project-specific |
|
||||
| `tags` | string[] | Keywords for matching and filtering |
|
||||
| `archetype` | string or null | For `archetype_hint` type — which archetype this applies to |
|
||||
| `last_seen_run` | string | Run ID where this lesson was last matched |
|
||||
| `runs_since_last_seen` | integer | Counter for decay — incremented each run that does NOT trigger this lesson |
|
||||
| `archetype` | string? | For `archetype_hint` -- which archetype this applies to |
|
||||
| `last_seen_run` | string | Run ID where last matched |
|
||||
| `runs_since_last_seen` | integer | Counter for decay |
|
||||
|
||||
Example:
|
||||
```jsonl
|
||||
{"id":"m-001","ts":"2026-04-03T14:00:00Z","run_id":"2026-04-03-der-huster","type":"pattern","source":"guardian","description":"Timeline references must match story start day","frequency":2,"severity":"bug","domain":"writing","tags":["continuity","timeline"],"last_seen_run":"2026-04-03-der-huster","runs_since_last_seen":0}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Auto-Detection
|
||||
|
||||
After each `run.complete`, the orchestrator runs lesson extraction:
|
||||
After each `run.complete`, extract lessons from findings:
|
||||
|
||||
```bash
|
||||
./lib/archeflow-memory.sh extract .archeflow/events/<run_id>.jsonl
|
||||
```
|
||||
|
||||
### Extraction Algorithm
|
||||
The script reads `review.verdict` events, matches findings against existing lessons by keyword overlap (50%+ threshold), increments frequency on matches, and creates new candidate lessons (frequency: 1) for unmatched findings with severity >= WARNING.
|
||||
|
||||
1. **Read all `review.verdict` events** from the completed run's JSONL.
|
||||
2. **For each finding** in each verdict:
|
||||
a. Tokenize the finding description into keywords (lowercase, strip punctuation).
|
||||
b. Compare keywords against each existing lesson's description + tags.
|
||||
c. **Match threshold:** 50%+ keyword overlap between finding and lesson.
|
||||
3. **If match found:** Update the existing lesson:
|
||||
- Increment `frequency` by 1
|
||||
- Update `ts` to now
|
||||
- Update `last_seen_run` to current run ID
|
||||
- Reset `runs_since_last_seen` to 0
|
||||
4. **If no match AND severity >= WARNING:** Add as candidate lesson with `frequency: 1`.
|
||||
5. **Candidates become active** when `frequency >= 2` (triggered in a second run).
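
The matching step (2a to 2c) can be sketched in Python as below. This is not the shell script's implementation: the function names are hypothetical, and because the text does not say what the 50% is measured against, the sketch assumes it is the share of the finding's keywords that also appear in the lesson's description or tags. No stop-word filtering is applied.

```python
import re


def keywords(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(finding_description, lesson):
    """Share of the finding's keywords found in the lesson (assumed definition)."""
    finding_kw = keywords(finding_description)
    if not finding_kw:
        return 0.0
    lesson_kw = keywords(lesson["description"]) | keywords(" ".join(lesson.get("tags", [])))
    return len(finding_kw & lesson_kw) / len(finding_kw)


def find_match(finding_description, lessons, threshold=0.5):
    """Return the best-matching lesson at or above the threshold, else None."""
    best = max(lessons, key=lambda l: overlap(finding_description, l), default=None)
    if best is not None and overlap(finding_description, best) >= threshold:
        return best
    return None
```

A match would then trigger the updates in step 3; a miss with severity at or above WARNING would create a new candidate with `frequency: 1`.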
|
||||
|
||||
### Promotion Rule
|
||||
|
||||
A finding that appears in only one run stays at `frequency: 1` — it might be a one-off. Once the same pattern appears in a second run (matched by keyword overlap), it gets promoted to `frequency: 2` and becomes eligible for injection.
|
||||
|
||||
This prevents noise from single-run anomalies while still capturing genuine recurring issues quickly.
|
||||
|
||||
---
|
||||
**Promotion rule:** A finding needs `frequency >= 2` (seen in 2+ runs) before injection. This filters out one-off noise. Preferences skip this threshold.
|
||||
|
||||
## Injection
|
||||
|
||||
At run start, before spawning agents, the orchestrator injects relevant lessons:
|
||||
Before spawning agents, inject relevant lessons:
|
||||
|
||||
```bash
|
||||
LESSONS=$(./lib/archeflow-memory.sh inject <domain> <archetype>)
|
||||
```
|
||||
|
||||
### Injection Rules
|
||||
Rules: filters by domain (or `general`), optionally by archetype, requires `frequency >= 2`, sorts by frequency descending, caps at 10 lessons. Lessons with `frequency >= 5` are always injected regardless of filters.
|
||||
|
||||
1. Read `lessons.jsonl`.
|
||||
2. Filter by `domain` (exact match or `general`) and optionally by `archetype`.
|
||||
3. Only include lessons with `frequency >= 2` (confirmed patterns).
|
||||
4. Sort by frequency descending (most common first).
|
||||
5. Cap at **10 lessons** per injection.
|
||||
6. Lessons with `frequency >= 5` are **always injected** regardless of domain/archetype filter (they are universal enough to matter).
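
The six rules, together with the prompt section format, might be sketched as follows. Several points are assumptions rather than documented behavior: the 10-lesson cap is applied to the combined list (including always-injected lessons), an empty archetype argument disables the archetype filter, lessons without an `archetype` field pass that filter, and `preference` lessons bypass the frequency threshold as stated elsewhere in this skill. Function names are hypothetical.

```python
def select_lessons(lessons, domain, archetype="", limit=10):
    """Pick lessons to inject for one agent."""
    selected = []
    for lesson in lessons:
        freq = lesson.get("frequency", 0)
        if freq >= 5:
            selected.append(lesson)  # always injected, filters bypassed
            continue
        if lesson.get("domain") not in (domain, "general"):
            continue
        lesson_archetype = lesson.get("archetype")
        if archetype and lesson_archetype and lesson_archetype != archetype:
            continue
        if freq >= 2 or lesson.get("type") == "preference":
            selected.append(lesson)
    # Most frequent first; Python's sort is stable, so ties keep file order.
    selected.sort(key=lambda l: l.get("frequency", 0), reverse=True)
    return selected[:limit]


def format_lessons(lessons):
    """Render the section appended to an agent's system prompt."""
    if not lessons:
        return ""
    lines = ["## Known Issues (from past runs)"]
    lines += [f"- {l['description']} [seen {l['frequency']}x, {l['source']}]" for l in lessons]
    return "\n".join(lines)
```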
|
||||
|
||||
### Injection Format
|
||||
|
||||
Append to the agent's system prompt as a structured section:
|
||||
Injected as a markdown section appended to the agent's system prompt:
|
||||
|
||||
```markdown
|
||||
## Known Issues (from past runs)
|
||||
- Timeline references must match story start day [seen 3x, guardian]
|
||||
- Voice drift common in monologue passages >200 words [seen 2x, sage]
|
||||
- Missing null checks in API response handlers [seen 5x, guardian]
|
||||
```
|
||||
|
||||
### Integration with Run Skill
|
||||
|
||||
In the `run` skill, after Step 0 (Initialize) and before Step 1 (Plan Phase):
|
||||
|
||||
```bash
|
||||
# Load cross-run memory for this domain
|
||||
MEMORY_LESSONS=$(./lib/archeflow-memory.sh inject "$DOMAIN" "")
|
||||
|
||||
# Inject into Explorer/Creator prompts if non-empty
|
||||
if [[ -n "$MEMORY_LESSONS" ]]; then
|
||||
EXPLORER_PROMPT="${EXPLORER_PROMPT}
|
||||
|
||||
${MEMORY_LESSONS}"
|
||||
CREATOR_PROMPT="${CREATOR_PROMPT}
|
||||
|
||||
${MEMORY_LESSONS}"
|
||||
fi
|
||||
```
|
||||
|
||||
For reviewers in the Check phase, inject archetype-specific lessons:
|
||||
|
||||
```bash
|
||||
GUARDIAN_LESSONS=$(./lib/archeflow-memory.sh inject "$DOMAIN" "guardian")
|
||||
SAGE_LESSONS=$(./lib/archeflow-memory.sh inject "$DOMAIN" "sage")
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Decay
|
||||
|
||||
Lessons that stop being relevant should fade out. After each `run.complete`, apply decay:
|
||||
After each `run.complete`, apply decay: lessons not seen for 10 runs lose 1 frequency. When frequency reaches 0, the lesson is archived.
|
||||
|
||||
```bash
|
||||
./lib/archeflow-memory.sh decay
|
||||
```
|
||||
|
||||
### Decay Algorithm
|
||||
|
||||
1. For every lesson in `lessons.jsonl`:
|
||||
- If `last_seen_run` is NOT the current run → increment `runs_since_last_seen` by 1
|
||||
2. If `runs_since_last_seen >= 10`:
|
||||
- Decrement `frequency` by 1
|
||||
- Reset `runs_since_last_seen` to 0
|
||||
3. If `frequency` drops to 0:
|
||||
- Move the lesson to `.archeflow/memory/archive.jsonl` (append)
|
||||
- Remove from `lessons.jsonl`
|
||||
|
||||
This means a lesson that was seen 5 times but then stops appearing will survive 50 runs of non-triggering before being fully archived (5 decrements x 10 runs each).
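
A minimal Python sketch of one decay pass, assuming lessons are already loaded as dictionaries. The function name `apply_decay` is hypothetical; the real logic lives in `archeflow-memory.sh`.

```python
def apply_decay(lessons, current_run_id, idle_runs=10):
    """Return (active, archived) after applying one decay pass."""
    active, archived = [], []
    for lesson in lessons:
        lesson = dict(lesson)  # work on a copy so the input is not mutated
        idle = lesson.get("runs_since_last_seen", 0)
        if lesson.get("last_seen_run") != current_run_id:
            idle += 1
        if idle >= idle_runs:
            lesson["frequency"] -= 1
            idle = 0
        lesson["runs_since_last_seen"] = idle
        # Lessons whose frequency reached 0 go to archive.jsonl.
        (archived if lesson["frequency"] <= 0 else active).append(lesson)
    return active, archived
```

Running this 50 times on a lesson with `frequency: 5` that never matches again archives it on the 50th pass, which agrees with the arithmetic above.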
|
||||
|
||||
---
|
||||
|
||||
## Manual Management
|
||||
|
||||
### Add a lesson
|
||||
```bash
|
||||
archeflow memory add "User prefers single bundled PR" # Add preference (injected immediately)
|
||||
archeflow memory list # Show all active lessons
|
||||
archeflow memory forget m-002 # Archive a lesson
|
||||
```
|
||||
|
||||
## Audit Trail
|
||||
|
||||
Track which lessons are injected per run and whether they were effective. Pass `--audit <run_id>` to `inject` to write an audit record. After a run, `audit-check <run_id>` compares injected lessons against review findings: no matching finding = helpful (issue prevented), matching finding = ineffective (issue repeated despite injection).
|
||||
|
||||
```bash
|
||||
archeflow memory add "User prefers single bundled PR over many small ones"
|
||||
# Internally: ./lib/archeflow-memory.sh add preference "User prefers single bundled PR over many small ones"
|
||||
./lib/archeflow-memory.sh inject "$DOMAIN" "" --audit "$RUN_ID"
|
||||
./lib/archeflow-memory.sh audit-check <run_id>
|
||||
```
|
||||
|
||||
Manually added lessons start at `frequency: 1` but with type `preference`, which means they are injected immediately (preferences skip the frequency >= 2 threshold).
|
||||
|
||||
### List lessons
|
||||
|
||||
```bash
|
||||
archeflow memory list
|
||||
# Internally: ./lib/archeflow-memory.sh list
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```
|
||||
ID Freq Type Domain Description
|
||||
m-001 3 pattern writing Timeline references must match story start day
|
||||
m-002 1 preference general User prefers single bundled PR over many small ones
|
||||
m-003 5 archetype_hint writing Voice drift most common in long monologue passages
|
||||
m-004 1 anti_pattern code Splitting auth middleware causes duplication
|
||||
```
|
||||
|
||||
### Forget a lesson
|
||||
|
||||
```bash
|
||||
archeflow memory forget m-002
|
||||
# Internally: ./lib/archeflow-memory.sh forget m-002
|
||||
```
|
||||
|
||||
Moves the lesson to `archive.jsonl` regardless of frequency.
|
||||
|
||||
---
|
||||
|
||||
## Integration Points
|
||||
|
||||
| Moment | Action | Script Command |
|
||||
|--------|--------|----------------|
|
||||
| After `run.complete` | Extract lessons from findings | `archeflow-memory.sh extract <events.jsonl>` |
|
||||
| After extraction | Apply decay to all lessons | `archeflow-memory.sh decay` |
|
||||
| Before agent spawn (run start) | Inject relevant lessons | `archeflow-memory.sh inject <domain> <archetype>` |
|
||||
| Before agent spawn | Inject relevant lessons | `archeflow-memory.sh inject <domain> <archetype>` |
|
||||
| User command | Add/list/forget lessons | `archeflow-memory.sh add/list/forget` |
|
||||
|
||||
## Audit Trail
|
||||
|
||||
Track which lessons are injected into each run and whether they were effective.
|
||||
|
||||
### Storage
|
||||
|
||||
```
|
||||
.archeflow/memory/audit.jsonl # Append-only audit log
|
||||
```
|
||||
|
||||
### Injection Audit Record
|
||||
|
||||
When `--audit <run_id>` is passed to the `inject` command, an audit record is written:
|
||||
|
||||
```jsonl
|
||||
{"ts":"2026-04-04T10:00:00Z","run_id":"2026-04-04-auth-fix","domain":"code","archetype":"","lessons_injected":["m-001","m-003"],"lesson_count":2}
|
||||
```
|
||||
|
||||
Usage:
|
||||
```bash
|
||||
./lib/archeflow-memory.sh inject "$DOMAIN" "" --audit "$RUN_ID"
|
||||
```
|
||||
|
||||
### Effectiveness Check
|
||||
|
||||
After a run completes, check whether injected lessons prevented issues:
|
||||
|
||||
```bash
|
||||
./lib/archeflow-memory.sh audit-check <run_id>
|
||||
```
|
||||
|
||||
This command:
|
||||
1. Reads `audit.jsonl` for lessons injected in the given run
|
||||
2. Reads the run's event file for `review.verdict` events
|
||||
3. For each injected lesson, checks keyword overlap between the lesson's description and review findings
|
||||
4. **No matching finding** = `helpful` (the lesson likely prevented the issue)
|
||||
5. **Matching finding** = `ineffective` (the issue repeated despite the lesson being injected)
|
||||
6. Appends effectiveness results to `audit.jsonl`
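
Steps 3 to 5 can be illustrated with the Python sketch below. The record fields `type`, `lesson_id`, and `effectiveness` follow the `jq` query used later in this section; the 50% threshold and the choice to measure overlap against the lesson's keywords are assumptions, since the text only says "keyword overlap". The function names are hypothetical.

```python
import re


def _keywords(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def audit_check(injected_lessons, finding_descriptions, threshold=0.5):
    """Classify each injected lesson as helpful or ineffective."""
    results = []
    for lesson in injected_lessons:
        lesson_kw = _keywords(lesson["description"])
        repeated = any(
            lesson_kw and len(lesson_kw & _keywords(text)) / len(lesson_kw) >= threshold
            for text in finding_descriptions
        )
        results.append({
            "type": "effectiveness_check",
            "lesson_id": lesson["id"],
            "effectiveness": "ineffective" if repeated else "helpful",
        })
    return results
```

Keep in mind that "helpful" here only means the issue did not reappear; it does not prove the lesson caused that.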
|
||||
|
||||
### Effectiveness Over Time
|
||||
|
||||
By querying `audit.jsonl` for effectiveness records, you can measure:
|
||||
- Which lessons consistently prevent issues (high `helpful` count)
|
||||
- Which lessons are not working (high `ineffective` count — consider rewording or removing)
|
||||
- Overall memory system ROI (ratio of helpful to ineffective across all runs)
|
||||
|
||||
```bash
|
||||
# Count effectiveness per lesson
|
||||
jq -r 'select(.type == "effectiveness_check") | [.lesson_id, .effectiveness] | @tsv' .archeflow/memory/audit.jsonl | sort | uniq -c
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Design Principles
|
||||
|
||||
1. **Append-only storage.** New lessons are only ever appended to `lessons.jsonl`; decay rewrites the file in place but preserves all data (archived lessons move to `archive.jsonl`).
|
||||
2. **Conservative promotion.** A finding must appear in 2+ runs before injection. One-offs are noise.
|
||||
3. **Graceful degradation.** If `lessons.jsonl` doesn't exist, injection returns empty — no error, no block.
|
||||
4. **Cheap.** Keyword matching, not embeddings. `jq` for JSON, `grep` for matching. No external services.
|
||||
5. **Bounded.** Max 10 lessons injected per prompt. Prevents context pollution.
|
||||
|
||||
@@ -1,634 +0,0 @@
|
||||
---
|
||||
name: orchestration
|
||||
description: Use when executing a multi-agent orchestration — spawning archetype agents, managing PDCA cycles, coordinating worktrees, and merging results. This is the step-by-step execution guide.
|
||||
---
|
||||
|
||||
# Orchestration Execution
|
||||
|
||||
This skill guides you through running a full ArcheFlow orchestration using Claude Code's native Agent tool and git worktrees.
|
||||
|
||||
## Strategy Selection
|
||||
|
||||
A **strategy** defines the shape of an orchestration run — which phases execute, in what order, and when to iterate. A **workflow** (fast/standard/thorough) controls the depth within a strategy.
|
||||
|
||||
### Available Strategies
|
||||
|
||||
| Strategy | Flow | When to Use |
|----------|------|-------------|
| `pdca` | Plan -> Do -> Check -> Act (cyclic) | Refactors, thorough reviews, multi-concern tasks |
| `pipeline` | Plan -> Implement -> Spec-Review -> Quality-Review -> Verify (linear) | Bug fixes, fast patches, single-concern tasks |
| `auto` | Selected by task analysis | Default -- let ArcheFlow decide |
|
||||
|
||||
### Strategy Interface
|
||||
|
||||
Every strategy defines:
|
||||
|
||||
- **Phases** — ordered list of execution stages
|
||||
- **Agent mapping** — which archetypes run in each phase
|
||||
- **Transition rules** — conditions for moving between phases
|
||||
- **Iteration model** — cyclic (PDCA) or linear (pipeline)
|
||||
- **Exit conditions** — when the run terminates
|
||||
|
||||
### PDCA Strategy
|
||||
|
||||
The existing orchestration flow (Steps 0-4 below). Cyclic — the Act phase can feed back to Plan for another iteration. Best for tasks requiring multiple review perspectives and iterative refinement.
|
||||
|
||||
### Pipeline Strategy
|
||||
|
||||
Linear flow with no cycle-back. Faster for well-understood tasks where one pass is sufficient.
|
||||
|
||||
| Phase | Agent | Purpose |
|-------|-------|---------|
| Plan | Creator | Design proposal |
| Implement | Maker | Build in worktree |
| Spec-Review | Guardian, then Skeptic | Security + assumption check (sequential) |
| Quality-Review | Sage | Code quality review |
| Verify | (automated) | Run tests, apply targeted fix if CRITICAL |
|
||||
|
||||
No cycle-back — WARNINGs are logged but do not block. CRITICALs in Verify trigger a single targeted fix attempt by the Maker, not a full cycle.
|
||||
|
||||
### Auto-Selection Rules
|
||||
|
||||
When `strategy: auto` (default):
|
||||
|
||||
- Task contains "fix", "bug", "patch", "hotfix" → `pipeline`
|
||||
- Task contains "refactor", "redesign", "review" → `pdca`
|
||||
- Workflow is `thorough` → `pdca` (always)
|
||||
- Workflow is `fast` with single file → `pipeline`
|
||||
- Otherwise → `pdca`
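
The auto-selection rules can be sketched as a single function. The precedence is an assumption: the list does not say which rule wins when several apply, so this sketch checks the `thorough` rule first (it is marked "always") and then follows list order. The function name and the `file_count` parameter are hypothetical.

```python
PIPELINE_KEYWORDS = ("fix", "bug", "patch", "hotfix")
PDCA_KEYWORDS = ("refactor", "redesign", "review")


def select_strategy(task, workflow, file_count=None):
    """Resolve `strategy: auto` to 'pipeline' or 'pdca'."""
    text = task.lower()
    if workflow == "thorough":
        return "pdca"  # assumed to override keyword matches
    if any(word in text for word in PIPELINE_KEYWORDS):
        return "pipeline"
    if any(word in text for word in PDCA_KEYWORDS):
        return "pdca"
    if workflow == "fast" and file_count == 1:
        return "pipeline"
    return "pdca"
```

Substring matching mirrors the word "contains" in the rules, which means a task mentioning "prefix" would also match `fix`; word-boundary matching would be a stricter alternative.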
|
||||
|
||||
---
|
||||
|
||||
## Step 0: Choose a Workflow
|
||||
|
||||
If `.archeflow/teams/<name>.yaml` exists, the user can reference a team preset: `"Use the backend team"`. Load the preset's phase config instead of built-in defaults. See `archeflow:custom-archetypes` skill for preset format.
|
||||
|
||||
Otherwise, assess the task and pick:
|
||||
|
||||
| Signal | Workflow |
|--------|----------|
| Small fix, low risk, single concern | `fast` (1 cycle) |
| Feature, multiple files, moderate risk | `standard` (2 cycles) |
| Security-sensitive, breaking changes, public API | `thorough` (3 cycles) |
|
||||
|
||||
## Workflow Adaptation Rules
|
||||
|
||||
The initial workflow choice is a starting point, not a commitment. These rules adapt the workflow at runtime. Each rule specifies when it evaluates (which phase boundary).
|
||||
|
||||
### A3: Confidence Gate (evaluates: after Plan, before Do)
|
||||
|
||||
**When:** Creator's confidence table has any axis below 0.5.
|
||||
**Action by axis:**
|
||||
|
||||
| Axis | Score < 0.5 Action |
|------|-------------------|
| Task understanding | **Pause.** Ask user to clarify before proceeding. Do not spawn Maker. |
| Solution completeness | **Upgrade to standard.** Add Explorer before Maker starts. |
| Risk coverage | **Spawn mini-Explorer** for the specific risky area (parallel, 5 min max). Maker can proceed. |
|
||||
|
||||
A3 runs before any Do/Check agents spawn, so there are no cancellation issues.
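
A hedged sketch of the A3 gate in Python. The dictionary keys and action labels are hypothetical names for the three axes and their actions, and the sketch assumes a low task-understanding score short-circuits the other checks because the Maker is not spawned at all in that case.

```python
def confidence_gate(confidence, threshold=0.5):
    """Return the list of A3 actions for a Creator confidence table."""
    if confidence.get("task_understanding", 1.0) < threshold:
        return ["pause_and_ask_user"]  # nothing else runs until the user clarifies
    actions = []
    if confidence.get("solution_completeness", 1.0) < threshold:
        actions.append("upgrade_to_standard_add_explorer")
    if confidence.get("risk_coverage", 1.0) < threshold:
        actions.append("spawn_mini_explorer")
    return actions
```

An empty list means the gate passes and the Do phase can start unchanged.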
|
||||
|
||||
### A1: Conditional Escalation (evaluates: after Check, before next cycle)
|
||||
|
||||
**When:** Guardian rejects with 2+ CRITICAL findings in a `fast` workflow.
|
||||
**Action:** Escalate to `standard` for the **next cycle** — add Skeptic + Sage to the reviewer roster.
|
||||
**Why:** If Guardian found serious issues, more perspectives help find root causes.
|
||||
**Sticky:** Once escalated, the workflow stays escalated for all remaining cycles. A2 does not apply to escalated workflows.
|
||||
|
||||
### A2: Guardian Fast-Path (evaluates: after Guardian, before spawning other reviewers)
|
||||
|
||||
**When:** Guardian finds 0 CRITICAL and 0 WARNING in a non-escalated `standard` or `thorough` workflow.
|
||||
**Action:** Do not spawn Skeptic, Sage, or Trickster. Proceed directly to Act phase.
|
||||
**Why:** Guardian's security review is the strictest gate. Clean pass = safe to skip additional reviewers.
|
||||
**Critical:** Evaluate A2 **after Guardian completes but before other reviewers are spawned.** Do not spawn reviewers in parallel with Guardian — spawn Guardian first, check A2, then spawn remaining reviewers only if A2 doesn't trigger.
|
||||
**Does not apply to:** Escalated workflows (A1 triggered), or first cycle of `thorough` workflows (Trickster is mandatory on first pass).
|
||||
**Log:** Note "Guardian fast-path taken" in orchestration report.
|
||||
|
||||
### Evaluation Order
|
||||
|
||||
```
|
||||
Plan phase completes → A3 (confidence gate)
|
||||
↓
|
||||
Guardian completes → A2 (fast-path check) → if clean, skip other reviewers
|
||||
↓ if not, spawn other reviewers
|
||||
Check phase done → A1 (escalation check) → if 2+ CRITICALs in fast, next cycle is standard
|
||||
```
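
As a rough sketch of the two Check-phase rules in the order shown above, the predicates below encode A2 and A1 as described in this section. The `GuardianResult` type and the function names are assumptions made for illustration; only the conditions (0 CRITICAL and 0 WARNING, non-escalated, not the first `thorough` cycle, 2+ CRITICAL in `fast`) come from the rules.

```python
from dataclasses import dataclass


@dataclass
class GuardianResult:  # hypothetical container for Guardian's verdict
    critical: int
    warning: int
    approved: bool


def a2_fast_path(workflow: str, escalated: bool, cycle: int, g: GuardianResult) -> bool:
    """True if the remaining reviewers can be skipped after Guardian."""
    if escalated or workflow not in ("standard", "thorough"):
        return False
    if workflow == "thorough" and cycle == 1:
        return False  # Trickster is mandatory on the first thorough pass
    return g.critical == 0 and g.warning == 0


def a1_escalate(workflow: str, g: GuardianResult) -> bool:
    """True if the next cycle should run as an escalated standard workflow."""
    return workflow == "fast" and not g.approved and g.critical >= 2
```

Because A1 is sticky, a caller would store the escalation flag once `a1_escalate` returns true and pass it as `escalated` on every later cycle.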
|
||||
|
||||
## Process Logging
|
||||
|
||||
If `.archeflow/events/` exists (or should be created), emit structured events throughout orchestration. See `archeflow:process-log` skill for full schema.
|
||||
|
||||
**Quick reference — emit at these points:**
|
||||
|
||||
```
|
||||
run.start → After workflow selection, before first agent
|
||||
agent.start → Before each Agent tool call
|
||||
agent.complete → After each Agent returns (include duration, tokens, summary, artifacts)
|
||||
decision → When choosing between alternatives (plot direction, approach, fix strategy)
|
||||
phase.transition → At Plan→Do, Do→Check, Check→Act boundaries
|
||||
review.verdict → After each reviewer delivers verdict
|
||||
fix.applied → After each edit addressing a review finding
|
||||
cycle.boundary → End of PDCA cycle
|
||||
shadow.detected → When shadow threshold triggers
|
||||
run.complete → After final Act phase (include totals)
|
||||
```
|
||||
|
||||
**Helper:** `./lib/archeflow-event.sh <run_id> <type> <phase> <agent> '<json>'`
|
||||
|
||||
**Report:** `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl`
|
||||
|
||||
Events are optional — if the events dir doesn't exist, skip logging. Never let logging block orchestration.
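
The "optional, never blocking" behaviour can be illustrated with a small Python stand-in for the shell helper. This is a sketch under assumptions: the record field names (`ts`, `type`, `data`, and so on) are placeholders, and the real schema is the one defined in the `archeflow:process-log` skill.

```python
import json
import time
from pathlib import Path


def emit_event(root, run_id, event_type, phase, agent, data) -> bool:
    """Append one JSON line to the run's event file; return False instead of raising."""
    events_dir = Path(root) / ".archeflow" / "events"
    if not events_dir.is_dir():
        return False  # events are optional: no directory, no logging
    record = {
        "ts": time.time(),  # field names here are illustrative, not the real schema
        "run_id": run_id,
        "type": event_type,
        "phase": phase,
        "agent": agent,
        "data": data,
    }
    try:
        with open(events_dir / f"{run_id}.jsonl", "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
    except (OSError, TypeError, ValueError):
        return False  # a logging failure must not block orchestration
    return True
```

Returning a boolean rather than raising keeps the orchestration loop free of logging-related error handling.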
|
||||
|
||||
---
|
||||
|
||||
## Model Configuration
|
||||
|
||||
Model assignment per archetype and workflow is configured in `.archeflow/config.yaml` under the `models:` section. The `archeflow:run` skill (section 0c) handles resolution with fallback chain: per-workflow per-archetype > per-workflow default > per-archetype > global default. When spawning agents manually, read the config to select the appropriate model.
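
The fallback chain can be read as "first non-empty value wins". The sketch below assumes a particular shape for the parsed `models:` section (`workflows`, `archetypes`, and `default` keys); that layout and the model names are hypothetical, so check the `archeflow:run` skill for the real structure.

```python
from typing import Optional


def resolve_model(models: dict, workflow: str, archetype: str) -> Optional[str]:
    """Walk the fallback chain and return the first configured model, if any."""
    per_workflow = models.get("workflows", {}).get(workflow, {})
    candidates = [
        per_workflow.get("archetypes", {}).get(archetype),  # per-workflow per-archetype
        per_workflow.get("default"),                        # per-workflow default
        models.get("archetypes", {}).get(archetype),        # per-archetype
        models.get("default"),                              # global default
    ]
    return next((m for m in candidates if m), None)
```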
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Plan Phase
|
||||
|
||||
Spawn agents sequentially — Creator needs Explorer's findings.
|
||||
|
||||
### Explorer (if standard or thorough)
|
||||
|
||||
**Context to include:** Task description, relevant file paths, codebase access.
|
||||
**Context to exclude:** Prior proposals, review outputs, implementation details, feedback from previous cycles.
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "🔍 Explorer: research context",
|
||||
prompt: "<task description>
|
||||
You are the EXPLORER archetype.
|
||||
Research the codebase to understand:
|
||||
1. What files and functions are involved
|
||||
2. What dependencies exist
|
||||
3. What tests currently cover this area
|
||||
4. What patterns the codebase uses
|
||||
Write your findings as a structured research report.
|
||||
Be thorough but focused — no rabbit holes.",
|
||||
subagent_type: "Explore"
|
||||
)
|
||||
```
|
||||
|
||||
### Creator
|
||||
|
||||
**Context to include:** Task description, Explorer's research output. On cycle 2+: prior cycle's structured feedback (see Cycle Feedback Protocol).
|
||||
**Context to exclude:** Raw file contents (Explorer already summarized), git diffs, reviewer full outputs.
|
||||
|
||||
**Fast workflow only (no Explorer):** The Creator must perform a Mini-Reflect before proposing:
|
||||
1. Restate the task in one sentence, in your own words (catch misunderstandings early)
|
||||
2. List 3 assumptions you're making
|
||||
3. Name the one risk that would cause most damage if wrong
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "🏗️ Creator: design proposal",
|
||||
prompt: "<task description>
|
||||
You are the CREATOR archetype.
|
||||
<if fast workflow (no Explorer): Before proposing, perform a Mini-Reflect:
|
||||
1. Restate the task in one sentence
|
||||
2. List 3 assumptions you're making
|
||||
3. Name the highest-damage risk
|
||||
Then propose.>
|
||||
<if standard/thorough: Based on the research findings: <Explorer's output>>
|
||||
<if cycle 2+: Prior cycle feedback: <structured feedback — see Cycle Feedback Protocol>>
|
||||
Design a solution proposal including:
|
||||
1. Architecture decisions (with rationale)
|
||||
2. Files to create/modify (with specific changes)
|
||||
3. Alternatives considered (at least 2, with rejection rationale)
|
||||
4. Test strategy
|
||||
5. Confidence (scored by axis: task understanding, solution completeness, risk coverage)
|
||||
6. Risks you foresee
|
||||
<if cycle 2+: 7. How you addressed each unresolved issue from prior feedback>
|
||||
Be decisive. Ship a clear plan, not a menu of options.",
|
||||
subagent_type: "Plan"
|
||||
)
|
||||
```
|
||||
|
||||
## Step 2: Do Phase
|
||||
|
||||
Spawn Maker in an **isolated worktree** so changes don't affect main.
|
||||
|
||||
**Context to include:** Creator's proposal only. On cycle 2+: implementation-routed feedback from Sage/Trickster.
|
||||
**Context to exclude:** Explorer's research, Guardian/Skeptic findings (those go to Creator).
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "⚒️ Maker: implement proposal",
|
||||
prompt: "<task description>
|
||||
You are the MAKER archetype.
|
||||
Implement this proposal: <Creator's output>
|
||||
<if cycle 2+: Implementation feedback from prior cycle: <Sage/Trickster findings only>>
|
||||
Rules:
|
||||
1. Follow the proposal exactly — don't redesign
|
||||
2. Write tests for every behavioral change
|
||||
3. Commit with descriptive messages
|
||||
4. Run existing tests — nothing may break
|
||||
5. If the proposal is unclear, implement your best interpretation and note it
|
||||
Do NOT skip tests. Do NOT refactor unrelated code.
|
||||
|
||||
BEFORE finishing — Self-Review Checklist:
|
||||
1. Did I change ALL files listed in the proposal's Changes section?
|
||||
2. Did I add tests for each behavioral change?
|
||||
3. Are there files in my diff NOT listed in the proposal? If yes, revert them.
|
||||
4. Do all existing tests still pass?
|
||||
Report any gaps in your Implementation summary.",
|
||||
isolation: "worktree",
|
||||
mode: "bypassPermissions"
|
||||
)
|
||||
```
|
||||
|
||||
**Critical:** The Maker MUST commit its changes before finishing. Uncommitted changes in a worktree are lost.
|
||||
|
||||
## Step 3: Check Phase
|
||||
|
||||
Spawn Guardian **first**. After Guardian completes, check adaptation rule A2 (fast-path). If A2 triggers (0 CRITICAL, 0 WARNING, non-escalated workflow), skip remaining reviewers and proceed to Act. Otherwise, spawn remaining reviewers **in parallel**.
|
||||
|
||||
**Reviewer spawning protocol:** The canonical sequence (Guardian first, A2 evaluation, parallel spawning, timeout handling) is defined in `archeflow:check-phase` under "Reviewer Spawning Protocol". Follow that protocol for the exact spawning order, context per reviewer, and timeout rules.
|
||||
|
||||
### Guardian (always runs first)
|
||||
|
||||
**Context to include:** Maker's git diff, proposal risk section only.
|
||||
**Context to exclude:** Explorer's research, full proposal, other reviewer outputs.
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "🛡️ Guardian: security and risk review",
|
||||
prompt: "You are the GUARDIAN archetype.
|
||||
Review the changes in branch: <maker's branch>
|
||||
Assess:
|
||||
1. Security vulnerabilities (injection, auth bypass, data exposure)
|
||||
2. Reliability risks (error handling, edge cases, race conditions)
|
||||
3. Breaking changes (API compatibility, schema migrations)
|
||||
4. Dependency risks (new deps, version conflicts)
|
||||
Output: APPROVED or REJECTED with specific findings.
|
||||
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
|
||||
Categories: security, reliability, design, breaking-change, dependency
|
||||
Be rigorous but practical — flag real risks, not theoretical ones."
|
||||
)
|
||||
```
|
||||
|
||||
### Skeptic (if standard or thorough)
|
||||
|
||||
**Context to include:** Creator's proposal (focus on assumptions section).
|
||||
**Context to exclude:** Git diff details, Explorer's research, other reviewer outputs.
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "🤔 Skeptic: challenge assumptions",
|
||||
prompt: "You are the SKEPTIC archetype.
|
||||
Review the proposal: <Creator's proposal>
|
||||
Challenge:
|
||||
1. Assumptions in the design — what if they're wrong?
|
||||
2. Alternative approaches not considered
|
||||
3. Edge cases not tested
|
||||
4. Scalability concerns
|
||||
Output: APPROVED or REJECTED with counterarguments.
|
||||
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
|
||||
Categories: design, quality, testing, scalability
|
||||
Be constructive — every challenge must include a suggested alternative."
|
||||
)
|
||||
```
|
||||
|
||||
### Sage (if standard or thorough)
|
||||
|
||||
**Context to include:** Creator's proposal, Maker's git diff, implementation summary.
|
||||
**Context to exclude:** Explorer's raw research, other reviewer outputs.
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "📚 Sage: holistic quality review",
|
||||
prompt: "You are the SAGE archetype.
|
||||
Review the changes in branch: <maker's branch>
|
||||
Evaluate holistically:
|
||||
1. Code quality (readability, maintainability, simplicity)
|
||||
2. Test coverage (are the tests meaningful, not just present?)
|
||||
3. Documentation (does the change need docs?)
|
||||
4. Consistency with codebase patterns
|
||||
Output: APPROVED or REJECTED with quality findings.
|
||||
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
|
||||
Categories: quality, testing, design, consistency
|
||||
Judge like a senior engineer doing a PR review."
|
||||
)
|
||||
```
|
||||
|
||||
### Trickster (if thorough only)
|
||||
|
||||
**Context to include:** Maker's git diff only.
|
||||
**Context to exclude:** Everything else — proposal, research, other reviews.
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "🃏 Trickster: adversarial testing",
|
||||
prompt: "You are the TRICKSTER archetype.
|
||||
Try to break the changes in branch: <maker's branch>
|
||||
Attack vectors:
|
||||
1. Malformed input, boundary values, empty/null/huge data
|
||||
2. Concurrency and race conditions
|
||||
3. Error path exploitation
|
||||
4. Dependency failure scenarios
|
||||
Output: APPROVED or REJECTED with edge cases found.
|
||||
Each finding: | file:line | CRITICAL/WARNING/INFO | category | description | fix |
|
||||
Categories: security, reliability, testing
|
||||
Think like a QA engineer who gets paid per bug found."
|
||||
)
|
||||
```
|
||||
|
||||
## Step 4: Act Phase
|
||||
|
||||
Collect all reviewer outputs and decide.
|
||||
|
||||
### Completion Promise (optional)
|
||||
|
||||
If the user defined explicit done criteria with the task, check them now:
|
||||
|
||||
```
|
||||
Completion criteria: <test command passes> AND <Guardian approves>
|
||||
Example: "done when pytest passes and Guardian approves with 0 CRITICAL"
|
||||
```
|
||||
|
||||
If completion criteria are defined, **all criteria must pass** — reviewer approval alone is not sufficient. If tests fail but reviewers approved, cycle back with "tests failing" as feedback to Creator.
|
||||
|
||||
### All Approved (and completion criteria met)
|
||||
1. **Pre-merge hooks:** Check `.archeflow/hooks.yaml` for `pre-merge` hooks. Run them. If a hook with `fail_action: abort` fails, stop and report.
2. Merge the Maker's worktree branch into the target branch
3. **Post-merge hooks:** Run `post-merge` hooks from `.archeflow/hooks.yaml` if defined. Then run the project's test suite on the merged branch
   - Tests pass → proceed to step 4
   - Tests fail → **auto-revert** the merge commit, report the failure, and cycle back with "integration test failure on main" as feedback
4. Report: what was implemented, what was reviewed, any warnings noted
5. Clean up the worktree
6. Record metrics (see Orchestration Metrics)
|
||||
|
||||
### Issues Found (and cycles remaining)
|
||||
1. Build structured feedback using the Cycle Feedback Protocol below
|
||||
2. Go back to Step 1 (Plan) with the feedback
|
||||
3. Creator revises the proposal, addressing each unresolved issue
|
||||
4. Maker re-implements in a fresh worktree
|
||||
5. Reviewers check again
|
||||
|
||||
### Max Cycles Reached with Unresolved Issues
|
||||
1. Report all unresolved findings to the user
|
||||
2. Present the best implementation so far (on its branch)
|
||||
3. Let the user decide: merge as-is, fix manually, or abandon
|
||||
|
||||
---
|
||||
|
||||
## Cycle Feedback Protocol
|
||||
|
||||
After the Check phase, build structured feedback for the next cycle. This replaces dumping raw reviewer output.
|
||||
|
||||
### 1. Extract Findings
|
||||
|
||||
Parse each reviewer's output into the standardized format:
|
||||
|
||||
```markdown
|
||||
## Cycle N Feedback
|
||||
|
||||
### Unresolved Issues
|
||||
| Source | Severity | Category | Issue | Route to |
|
||||
|--------|----------|----------|-------|----------|
|
||||
| Guardian | CRITICAL | security | SQL injection in user input | Creator |
|
||||
| Skeptic | WARNING | design | Assumes single-tenant only | Creator |
|
||||
| Sage | WARNING | quality | Test names don't describe behavior | Maker |
|
||||
| Trickster | CRITICAL | reliability | Empty string bypasses validation | Creator |
|
||||
|
||||
### Resolved (from cycle N-1)
|
||||
| Source | Issue | Resolution |
|
||||
|--------|-------|------------|
|
||||
| Guardian | Missing rate limit | Added rate limiter middleware |
|
||||
```
|
||||
|
||||
### 2. Route Feedback
|
||||
|
||||
Not all findings go to the same agent:
|
||||
|
||||
| Source | Category | Routes to | Reason |
|--------|----------|-----------|--------|
| Guardian | security, breaking-change | **Creator** | Design must change |
| Guardian | reliability, dependency | **Creator** | Architectural decision needed |
| Skeptic | design, scalability | **Creator** | Assumptions need revision |
| Sage | quality, consistency | **Maker** | Implementation refinement |
| Sage | testing | **Maker** | Test gap, not design flaw |
| Trickster | reliability (design flaw) | **Creator** | Needs redesign |
| Trickster | reliability (test gap) | **Maker** | Needs more tests |
| Trickster | testing | **Maker** | Edge case not covered |
|
||||
|
||||
**Disambiguation rule:** When in doubt: if the fix requires changing the approach, route to Creator. If it requires changing the code within the existing approach, route to Maker.
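
One way to picture the routing table together with the disambiguation rule is a lookup with a fallback. This is only a sketch: the `needs_redesign` flag is a hypothetical input standing in for the orchestrator's judgment about whether the fix changes the approach.

```python
CREATOR, MAKER = "Creator", "Maker"

# Fixed (source, category) routes taken from the table above.
ROUTES = {
    ("Guardian", "security"): CREATOR,
    ("Guardian", "breaking-change"): CREATOR,
    ("Guardian", "reliability"): CREATOR,
    ("Guardian", "dependency"): CREATOR,
    ("Skeptic", "design"): CREATOR,
    ("Skeptic", "scalability"): CREATOR,
    ("Sage", "quality"): MAKER,
    ("Sage", "consistency"): MAKER,
    ("Sage", "testing"): MAKER,
    ("Trickster", "testing"): MAKER,
}


def route(source: str, category: str, needs_redesign: bool = False) -> str:
    """Route a finding; unlisted or ambiguous pairs fall back to the disambiguation rule."""
    fixed = ROUTES.get((source, category))
    if fixed is not None:
        return fixed
    # Trickster reliability findings and anything unlisted: approach change -> Creator.
    return CREATOR if needs_redesign else MAKER
```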
|
||||
|
||||
### 3. Track Resolution
|
||||
|
||||
Compare cycle N findings against cycle N-1:
|
||||
- If a prior finding no longer appears in the same category → mark **resolved**
|
||||
- If a prior finding persists → it stays **unresolved** with an incremented cycle count
|
||||
- If new findings appear → add as new unresolved issues
|
||||
|
||||
This prevents regression and gives the Creator/Maker a clear list of what to address.
|
||||
|
||||
### 4. Convergence Detection
|
||||
|
||||
If the **same finding** (same category + same file location) appears **unresolved in 2 consecutive cycles**, escalate to user:
|
||||
|
||||
> "Finding persists across 2 cycles: [Guardian] CRITICAL security — SQL injection in src/auth.ts:48. This may need human judgment or a different approach."
|
||||
|
||||
Do not cycle again blindly. The issue is likely structural (wrong design, not wrong implementation) and needs human input.
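
Resolution tracking and convergence detection both reduce to comparing two sets of findings. The sketch below assumes each finding is a dict with `category` and `location` keys (hypothetical field names) and treats that pair as the identity of a finding, as the convergence rule describes.

```python
from typing import Dict, List


def diff_cycles(previous: List[dict], current: List[dict]) -> Dict[str, List[dict]]:
    """Split findings into resolved, persisting, and new between two consecutive cycles."""
    def key(finding: dict):
        return (finding["category"], finding["location"])

    prev_keys = {key(f) for f in previous}
    curr_keys = {key(f) for f in current}
    return {
        "resolved": [f for f in previous if key(f) not in curr_keys],
        "persisting": [f for f in current if key(f) in prev_keys],
        "new": [f for f in current if key(f) not in prev_keys],
    }
```

Any entry in `persisting` has been unresolved for two consecutive cycles and is a candidate for escalation to the user.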
|
||||
|
||||
### 5. Cross-Archetype Dedup
|
||||
|
||||
If two reviewers raise the same issue (same file + same category + similar description), merge into one finding in the consolidated output:
|
||||
|
||||
```
|
||||
| Guardian + Skeptic | CRITICAL | security | Input not sanitized (src/api.ts:30) | Add validation |
|
||||
```
|
||||
|
||||
Don't double-count in severity tallies. Route to the higher-priority destination (Creator over Maker).
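
A possible shape for the merge step is sketched below. It makes two simplifying assumptions that go beyond the text: findings with the same file and category are treated as duplicates without comparing descriptions, and the merged finding keeps the higher of the two severities. Field names are hypothetical.

```python
from typing import List

SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "CRITICAL": 2}


def dedup(findings: List[dict]) -> List[dict]:
    """Merge findings that share file and category; Creator wins over Maker."""
    merged = {}
    for f in findings:
        key = (f["file"], f["category"])
        if key not in merged:
            merged[key] = {**f, "sources": [f["source"]]}
            continue
        m = merged[key]
        if f["source"] not in m["sources"]:
            m["sources"].append(f["source"])
        if SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[m["severity"]]:
            m["severity"] = f["severity"]  # assumption: keep the higher severity
        if f["route"] == "Creator":
            m["route"] = "Creator"  # higher-priority destination
    return list(merged.values())
```

Counting severities over the returned list, rather than over the raw reviewer outputs, avoids the double-counting the rule warns about.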
|
||||
|
||||
---
|
||||
|
||||
## Orchestration Metrics
|
||||
|
||||
Track lightweight metrics throughout the orchestration. No token counting (unreliable from skill layer) — just timing and outcomes.
|
||||
|
||||
### Per-Phase Logging
|
||||
|
||||
After each phase completes, note:
|
||||
|
||||
```
|
||||
| Phase | Duration | Agents | Outcome |
|
||||
|-------|----------|--------|---------|
|
||||
| Plan | 45s | 2 | Proposal ready (confidence: 0.8) |
|
||||
| Do | 90s | 1 | 4 files changed, 8 tests added |
|
||||
| Check | 60s | 3 | 1 REJECTED (Guardian), 2 APPROVED |
|
||||
| Act | — | — | Cycle back → feedback built |
|
||||
```
|
||||
|
||||
### Orchestration Summary
|
||||
|
||||
At orchestration end, include in the report:
|
||||
|
||||
```markdown
|
||||
## Orchestration Metrics
|
||||
| Metric | Value |
|
||||
|--------|-------|
|
||||
| Workflow | standard |
|
||||
| Cycles | 2 of 2 |
|
||||
| Total duration | 4m 30s |
|
||||
| Agents spawned | 9 |
|
||||
| Findings (total) | 5 |
|
||||
| Findings (critical) | 1 |
|
||||
| Findings (resolved) | 4 |
|
||||
| Shadow detections | 0 |
|
||||
```
|
||||
|
||||
Use this data to calibrate future workflow selection. For example, if tasks run under `fast` consistently finish without a revision cycle, they were well-scoped for that workflow.
|
||||
|
||||
---
|
||||
|
||||
## Autonomous Mode
|
||||
|
||||
When running unattended (overnight sessions, batch queues), add these behaviors to the orchestration loop:
|
||||
|
||||
### Between-Task Checkpoint
|
||||
|
||||
After each task completes (success or failure):
|
||||
1. **Commit and push** all changes immediately
|
||||
2. **Update session log** at `.archeflow/session-log.md` with task outcome
|
||||
3. **Check stop conditions** before starting next task:
|
||||
- 3 consecutive failures → STOP
|
||||
- Shadow escalation (same shadow 3+ times) → STOP
|
||||
- Test suite broken after merge → REVERT and STOP
|
||||
- Destructive action detected → STOP
|
||||
|
||||
### Session Log Protocol
|
||||
|
||||
**Primary:** Emit `run.complete` event to `.archeflow/events/<run_id>.jsonl` (see Process Logging section above). The event stream is the source of truth.
|
||||
|
||||
**Secondary:** Also write a human-readable summary to `.archeflow/session-log.md`:
|
||||
|
||||
```markdown
|
||||
## Task N: <description>
|
||||
**Workflow:** standard | **Status:** COMPLETED/FAILED
|
||||
**Cycles:** 1 of 2
|
||||
**Findings:** Guardian APPROVED, Skeptic APPROVED, Sage WARNING (test names)
|
||||
**Files changed:** 5 | **Tests added:** 12
|
||||
**Branch:** merged to main (commit abc1234) | OR: archeflow/maker-xyz (NOT merged)
|
||||
**Duration:** 8 min
|
||||
**Events:** `.archeflow/events/<run_id>.jsonl` (full process log)
|
||||
```
|
||||
|
||||
Generate the full Markdown report: `./lib/archeflow-report.sh .archeflow/events/<run_id>.jsonl`
|
||||
|
||||
### Safety Rules
|
||||
- Never force-push. Never modify main history.
|
||||
- All work stays on worktree branches until explicitly merged
|
||||
- Merges use `--no-ff` — individually revertable
|
||||
- Failed tasks leave branches intact for manual inspection
|
||||
|
||||
For full autonomous mode details (task queues, overnight checklists, user controls): load the `archeflow:autonomous-mode` skill.
|
||||
|
||||
---
|
||||
|
||||
## Shadow Monitoring
|
||||
|
||||
During orchestration, watch for shadow activation after each agent completes. Quick checklist:
|
||||
|
||||
| Archetype | Shadow | Quick Check |
|-----------|--------|-------------|
| Explorer | Rabbit Hole | Output >2000 words without Recommendation section? |
| Creator | Over-Architect | >2 new abstractions for one feature? |
| Maker | Rogue | No test files in changeset? Files outside proposal? |
| Guardian | Paranoid | CRITICAL:WARNING ratio >2:1? Zero approvals? |
| Skeptic | Paralytic | >7 challenges? <50% have alternatives? |
| Trickster | False Alarm | Findings in untouched code? >10 findings? |
| Sage | Bureaucrat | Review >2x code change length? |
|
||||
|
||||
On detection: apply correction prompt from `archeflow:shadow-detection` skill. On second detection of same shadow: replace agent. On 3+ shadows in same cycle: escalate to user.
|
||||
|
||||
---
|
||||
|
||||
## Parallel Team Orchestration
|
||||
|
||||
When running multiple independent tasks, spawn parallel ArcheFlow teams. Each team runs its own PDCA cycle on a separate worktree.
|
||||
|
||||
### Rules
|
||||
|
||||
1. **Non-overlapping file scope:** Each team must work on different files. If two tasks touch the same file, run them sequentially.
|
||||
2. **Independent worktrees:** Each team's Maker gets its own worktree branch (`archeflow/team-1-maker`, `archeflow/team-2-maker`).
|
||||
3. **First-finished-first-merged:** Teams merge in completion order. Later teams rebase onto the updated main before their own merge.
|
||||
4. **Merge conflict handling:** If rebase fails, the later team re-runs its Check phase against the merged main. If conflicts are structural, escalate to user.
|
||||
5. **Max 3 parallel teams:** More causes diminishing returns and merge headaches.
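
Rule 1 can be checked mechanically before any team is spawned. The following sketch assumes each team's planned file list is already known (for example from the task description); the function name and input shape are illustrative only.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def scope_conflicts(scopes: Dict[str, List[str]]) -> List[Tuple[str, str, List[str]]]:
    """Return (team_a, team_b, shared_files) for every pair of teams that overlap."""
    conflicts = []
    for (a, files_a), (b, files_b) in combinations(scopes.items(), 2):
        shared = set(files_a) & set(files_b)
        if shared:
            conflicts.append((a, b, sorted(shared)))
    return conflicts
```

Teams named in a conflict would be run sequentially; the rest can proceed in parallel, up to the limit of three.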
|
||||
|
||||
### Spawning Parallel Teams
|
||||
|
||||
```
|
||||
# Launch 2-3 teams in a single message with multiple Agent calls:
|
||||
Agent(description: "🏗️ Team 1: pagination fix (fast)", ...)
|
||||
Agent(description: "🏗️ Team 2: JWT auth (standard)", ...)
|
||||
Agent(description: "🏗️ Team 3: logging refactor (fast)", ...)
|
||||
```
|
||||
|
||||
Each team follows the full PDCA steps independently. The orchestrator monitors all teams and handles merges.
|
||||
|
||||
---
|
||||
|
||||
## Reviewer Profiles
|
||||
|
||||
Projects can configure which reviewers matter in `.archeflow/config.yaml`:
|
||||
|
||||
```yaml
|
||||
reviewers:
|
||||
always: [guardian] # Always runs
|
||||
default: [sage] # Runs in standard+thorough
|
||||
thorough_only: [trickster] # Only in thorough
|
||||
skip: [skeptic] # Never runs for this project
|
||||
```
|
||||
|
||||
If no config exists, use the built-in workflow defaults. Profiles save tokens by not spawning reviewers that add little value for the specific project.
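
To make the profile semantics concrete, here is a minimal sketch of how a roster might be derived from the parsed `reviewers:` mapping. It assumes `skip` overrides every other list, which the config comments suggest but do not state outright.

```python
from typing import Dict, List


def reviewer_roster(cfg: Dict[str, List[str]], workflow: str) -> List[str]:
    """Build the reviewer list for a workflow from a parsed `reviewers:` mapping."""
    roster = list(cfg.get("always", []))
    if workflow in ("standard", "thorough"):
        roster += cfg.get("default", [])
    if workflow == "thorough":
        roster += cfg.get("thorough_only", [])
    skip = set(cfg.get("skip", []))  # assumption: skip wins over the other lists
    return [name for name in roster if name not in skip]
```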
|
||||
|
||||
## Explorer Cache
|
||||
|
||||
If the same code area was explored recently, skip Explorer and reuse prior research:
|
||||
|
||||
**Cache hit criteria:** Same files affected (>70% overlap by path) AND prior research is <24 hours old AND no commits to those files since the research.
|
||||
|
||||
**On cache hit:** Show the prior research to Creator with a note: "Using cached Explorer research from [timestamp]. If the codebase changed significantly, re-run Explorer."
|
||||
|
||||
**On cache miss:** Run Explorer normally.
|
||||
|
||||
Cache is stored in `.archeflow/explorer-cache/` as timestamped markdown files. The orchestrator checks for matches before spawning Explorer.
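
The three cache-hit criteria can be combined into one predicate. In this sketch, "overlap by path" is interpreted as the share of the task's files that the cached research already covers; that interpretation, the function name, and the parameters are assumptions, not something the text pins down.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def is_cache_hit(
    task_files: Iterable[str],
    cached_files: Iterable[str],
    researched_at: datetime,
    last_commit_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Apply the >70% overlap, <24 hours, and no-commits-since criteria."""
    task = set(task_files)
    if not task:
        return False
    overlap = len(task & set(cached_files)) / len(task)
    fresh = now - researched_at < timedelta(hours=24)
    untouched = last_commit_at is None or last_commit_at <= researched_at
    return overlap > 0.7 and fresh and untouched
```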
|
||||
|
||||
## Learning from History
|
||||
|
||||
Track which archetypes catch real issues per project over time. After each orchestration, append to `.archeflow/metrics.jsonl`:
|
||||
|
||||
```json
|
||||
{"task": "...", "archetype": "guardian", "findings": 2, "critical": 1, "resolved": 2, "useful": true}
|
||||
{"task": "...", "archetype": "skeptic", "findings": 3, "critical": 0, "resolved": 0, "useful": false}
|
||||
```
|
||||
|
||||
A finding is **useful** if it was resolved (led to a code change) rather than dismissed.
|
||||
|
||||
After 10+ orchestrations, the orchestrator can recommend reviewer profile changes:
|
||||
- "Skeptic has found 0 useful issues in 8 runs — consider moving to `skip` or `thorough_only`"
|
||||
- "Guardian catches critical issues in 80% of runs — confirmed as essential"
|
||||
|
||||
This is advisory, not automatic. The user decides based on the data.
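
A recommendation like the Skeptic example could be derived from `.archeflow/metrics.jsonl` roughly as follows. The sketch reads the `archetype` and `useful` fields shown above; the `min_runs=8` cutoff simply mirrors the example message and is an assumption rather than a documented threshold.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def summarize(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Count runs and useful runs per archetype from metrics.jsonl lines."""
    stats = defaultdict(lambda: {"runs": 0, "useful_runs": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        entry = stats[record["archetype"]]
        entry["runs"] += 1
        entry["useful_runs"] += 1 if record.get("useful") else 0
    return dict(stats)


def skip_candidates(stats: Dict[str, Dict[str, int]], min_runs: int = 8) -> List[str]:
    """Archetypes with enough runs and no useful findings; advisory only."""
    return sorted(
        name for name, s in stats.items()
        if s["runs"] >= min_runs and s["useful_runs"] == 0
    )
```

The output is a list of suggestions to show the user, in keeping with the advisory nature of this feature.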
|
||||
|
||||
---
|
||||
|
||||
## Orchestration Report
|
||||
|
||||
After completion, summarize:
|
||||
|
||||
```markdown
|
||||
## ArcheFlow Orchestration Report
|
||||
- **Task:** <description>
|
||||
- **Workflow:** standard (2 cycles)
|
||||
- **Cycle 1:** Guardian rejected (SQL injection in user input handler)
|
||||
- **Cycle 2:** All approved after input sanitization added
|
||||
- **Files changed:** 4 files, +120 -30 lines
|
||||
- **Tests added:** 8 new tests
|
||||
- **Branch:** archeflow/maker-<id> → merged to main
|
||||
- **Metrics:** 9 agents, 4m 30s, 5 findings (4 resolved, 1 info remaining)
|
||||
```
|
||||
@@ -1,175 +0,0 @@
|
||||
---
|
||||
name: plan-phase
|
||||
description: Use when acting as Explorer or Creator in the Plan phase. Defines output formats for research and proposals.
|
||||
---
|
||||
|
||||
# Plan Phase
|
||||
|
||||
Explorer researches, then Creator designs. Sequential — Creator needs Explorer's findings.
|
||||
|
||||
## Explorer Output Format
|
||||
|
||||
```markdown
|
||||
## Research: <task>
|
||||
|
||||
### Affected Code
|
||||
- `path/file.ext` — description (L<start>-<end>)
|
||||
|
||||
### Dependencies
|
||||
- What depends on what, what breaks if changed
|
||||
|
||||
### Patterns
|
||||
- How the codebase solves similar problems
|
||||
|
||||
### Risks
|
||||
- What could go wrong
|
||||
|
||||
### Recommendation
|
||||
<one paragraph: approach + rationale>
|
||||
```
|
||||
|
||||
## Creator Output Format
|
||||
|
||||
```markdown
|
||||
## Proposal: <task>
|
||||
|
||||
### Mini-Reflect (fast workflow only — skip if Explorer ran)
|
||||
- **Task restated:** <one sentence>
|
||||
- **Assumptions:** 1) ... 2) ... 3) ...
|
||||
- **Highest-damage risk:** <the one thing that would hurt most if wrong>
|
||||
|
||||
### Architecture Decision
|
||||
<What and WHY>
|
||||
|
||||
### Alternatives Considered
|
||||
| Approach | Why Rejected |
|
||||
|----------|-------------|
|
||||
| <option A> | <reason> |
|
||||
| <option B> | <reason> |
|
||||
|
||||
### Changes
|
||||
1. **`path/file.ext`** — What changes and why
|
||||
2. **`path/test.ext`** — What tests to add
|
||||
|
||||
### Test Strategy
|
||||
- <specific test cases>
|
||||
|
||||
### Confidence
|
||||
| Axis | Score | Note |
|
||||
|------|-------|------|
|
||||
| Task understanding | <0.0-1.0> | <why> |
|
||||
| Solution completeness | <0.0-1.0> | <gaps?> |
|
||||
| Risk coverage | <0.0-1.0> | <unknowns?> |
|
||||
|
||||
### Risks
|
||||
- <what could go wrong + mitigations>
|
||||
|
||||
### Not Doing
|
||||
- <adjacent concerns deliberately excluded>
|
||||
```
|
||||
|
||||
**Confidence triggers:** If any axis scores below 0.5, flag it to the orchestrator. Low task understanding → clarify with user. Low solution completeness → consider standard workflow. Low risk coverage → spawn targeted Explorer research.
|
||||
|
||||
## Creator with Prior Feedback (Cycle 2+)
|
||||
|
||||
When the Creator receives structured feedback from a prior cycle, the proposal must include an additional section addressing each unresolved issue:
|
||||
|
||||
```markdown
|
||||
## Proposal: <task> (Revision — Cycle N)
|
||||
|
||||
### What Changed (vs. prior proposal)
|
||||
- <brief delta: what was added, removed, or redesigned>
|
||||
|
||||
### Prior Feedback Response
|
||||
| Issue | Source | Action | Rationale |
|
||||
|-------|--------|--------|-----------|
|
||||
| SQL injection in user input | Guardian | **Fixed** — added parameterized queries | Direct security fix |
|
||||
| Assumes single-tenant | Skeptic | **Deferred** — multi-tenant out of scope | Not in task requirements |
|
||||
| Test names unclear | Sage | **Accepted** — routed to Maker | Implementation concern |
|
||||
|
||||
### Architecture Decision
|
||||
<revised design addressing feedback>
|
||||
|
||||
### Changes
|
||||
<updated file list>
|
||||
|
||||
### Test Strategy
|
||||
<updated test cases>
|
||||
|
||||
### Confidence
|
||||
| Axis | Score | Note |
|
||||
|------|-------|------|
|
||||
| Task understanding | <0.0-1.0> | <why> |
|
||||
| Solution completeness | <0.0-1.0> | <gaps?> |
|
||||
| Risk coverage | <0.0-1.0> | <unknowns?> |
|
||||
|
||||
### Risks
|
||||
<updated risks — include any new risks from the revision>
|
||||
|
||||
### Not Doing
|
||||
<updated scope boundaries>
|
||||
```
|
||||
|
||||
**Rules for addressing feedback:**
|
||||
- **Fixed:** Changed the design to resolve the issue. Explain how.
|
||||
- **Deferred:** Not addressing now, with explicit reason. Must not be a CRITICAL finding.
|
||||
- **Accepted:** Acknowledged and routed to Maker for implementation-level fix.
|
||||
- **Disputed:** Disagrees with the finding. Must provide evidence or reasoning.
|
||||
|
||||
CRITICAL findings cannot be deferred or disputed — they must be fixed or the proposal will be rejected again.
|
||||
|
||||
## Task Granularity
|
||||
|
||||
Each change item in the Creator's proposal must be a **2-5 minute task** — specific enough that the Maker can implement it without interpretation.
|
||||
|
||||
### Requirements per Change Item
|
||||
|
||||
Every item in the `### Changes` section must include:
|
||||
|
||||
1. **Exact file path** — `src/auth/handler.ts`, not "the auth module"
|
||||
2. **What to change** — a code block showing the target state or transformation
|
||||
3. **How to verify** — a command or check that confirms correctness
|
||||
|
||||
### Good Example
|
||||
|
||||
````markdown
1. **`src/auth/handler.ts:48`** — Add input length validation before token processing
   ```typescript
   if (!token || token.trim().length === 0) {
     throw new ValidationError('Token must not be empty');
   }
   ```
   **Verify:** `npm test -- --grep "empty token"` passes
````
|
||||
|
||||
### Bad Example
|
||||
|
||||
```markdown
|
||||
1. **Auth module** — Fix the validation logic
|
||||
```
|
||||
|
||||
This is too vague. Which file? Which function? What does "fix" mean? The Maker will guess.
|
||||
|
||||
### Granularity Check
|
||||
|
||||
- If a single change item would take **>5 minutes**, split it into smaller items
|
||||
- If a non-trivial task has **<2 change items**, it is under-specified — the Creator missed something
|
||||
- Each item should touch **1-2 files** at most. Cross-cutting changes need separate items per file.
|
||||
|
||||
---
|
||||
|
||||
## Explorer Skip Conditions
|
||||
|
||||
Not every task needs Explorer research. Use this decision table:
|
||||
|
||||
| Condition | Skip Explorer? | Reason |
|-----------|---------------|--------|
| Task names specific files (1-2) and change is clear | **Yes** | Context is already known |
| Bug fix with stack trace or error message | **Yes** | Root cause is locatable without research |
| High confidence + small scope (single function/class) | **Yes** | Creator can mini-reflect instead |
| Task contains "investigate", "research", "explore" | **No** | Explicit research request |
| Task affects >3 files or unknown scope | **No** | Need dependency mapping |
| Unfamiliar area of codebase (no recent commits by team) | **No** | Need pattern discovery |
| Security-sensitive change (auth, crypto, input handling) | **No** | Need risk surface mapping |
|
||||
|
||||
When Explorer is skipped, Creator MUST include the **Mini-Reflect** section in its proposal to compensate for missing research context.
|
||||
@@ -1,278 +0,0 @@
|
||||
---
|
||||
name: process-log
|
||||
description: |
|
||||
Event-based process logging for ArcheFlow orchestrations. Captures every phase transition,
|
||||
agent output, decision, and fix as structured JSONL events. Enables post-hoc reports,
|
||||
dashboards, and process archaeology.
|
||||
<example>Automatically loaded during orchestration</example>
|
||||
<example>User: "Show me how this story was made"</example>
|
||||
---
|
||||
|
||||
# Process Log — Event-Sourced Orchestration History
|
||||
|
||||
Every ArcheFlow orchestration writes structured events to a JSONL file. Events are the **single source of truth** — all reports (Markdown, dashboards, timelines) are generated views.
|
||||
|
||||
## Event Storage
|
||||
|
||||
```
|
||||
.archeflow/events/<run-id>.jsonl # One file per orchestration run
|
||||
.archeflow/events/index.jsonl # Run index (one line per run, for listing)
|
||||
```
|
||||
|
||||
**Run ID format:** `<date>-<slug>` (e.g., `2026-04-03-der-huster`)
|
||||
|
||||
## When to Emit Events
|
||||
|
||||
Emit an event at each of these points during orchestration:
|
||||
|
||||
| Moment | Event Type | Trigger |
|--------|-----------|---------|
| Orchestration starts | `run.start` | After workflow selection, before first agent |
| Agent spawned | `agent.start` | Before each Agent tool call |
| Agent completes | `agent.complete` | After each Agent returns |
| Phase transition | `phase.transition` | Plan→Do, Do→Check, Check→Act |
| Decision made | `decision` | Plot direction chosen, fix applied, workflow adapted |
| Review verdict | `review.verdict` | Guardian/Sage/Skeptic delivers verdict |
| Fix applied | `fix.applied` | After each edit that addresses a review finding |
| Cycle boundary | `cycle.boundary` | End of PDCA cycle, before next (or exit) |
| Shadow detected | `shadow.detected` | Shadow threshold triggered |
| Orchestration ends | `run.complete` | After final Act phase |
|
||||
|
||||
## Event Schema
|
||||
|
||||
Every event is one JSON line with these required fields (pretty-printed here for readability; in the file each event occupies a single line):
|
||||
|
||||
```jsonl
|
||||
{
|
||||
"ts": "2026-04-03T14:32:07Z",
|
||||
"run_id": "2026-04-03-der-huster",
|
||||
"seq": 4,
|
||||
"parent": [2],
|
||||
"type": "agent.complete",
|
||||
"phase": "plan",
|
||||
"agent": "creator",
|
||||
"data": { ... }
|
||||
}
|
||||
```
|
||||
|
||||
| Field | Type | Description |
|-------|------|-------------|
| `ts` | ISO 8601 | Timestamp |
| `run_id` | string | Unique run identifier |
| `seq` | integer | Monotonically increasing sequence number within run |
| `parent` | int[] | Seq numbers of causal parent events. Forms a DAG. `[]` for root events. |
| `type` | string | Event type (see table above) |
| `phase` | string | Current PDCA phase: `plan`, `do`, `check`, `act` |
| `agent` | string or null | Agent archetype that triggered the event |
| `data` | object | Event-type-specific payload (see below) |
|
||||
|
||||
### Parent Relationships (DAG)
|
||||
|
||||
The `parent` field turns the flat event stream into a directed acyclic graph (agent call graph). This enables:
|
||||
|
||||
- **Causal reconstruction:** which agent output caused which downstream action
|
||||
- **Parallel visualization:** agents sharing a parent ran concurrently
|
||||
- **Blame tracking:** trace a fix back through review → draft → outline → research
|
||||
|
||||
Rules:
|
||||
- `run.start` has `parent: []` (root node)
|
||||
- An agent event has `parent: [seq of the event that triggered it]`
|
||||
- A phase transition has `parent: [seq of all completing events in prior phase]`
|
||||
- A fix has `parent: [seq of the review that found the issue]`
|
||||
- A decision has `parent: [seq of the agent that produced the alternatives]`
|
||||
- Parallel agents share the same parent (fan-out), phase transitions collect them (fan-in)
|
||||
|
||||
Example DAG from a writing workflow:
|
||||
```
|
||||
#1 run.start []
|
||||
├── #2 agent.complete (explorer) [1]
|
||||
│ └── #3 decision (plot direction) [2]
|
||||
├── #4 agent.complete (creator) [2] ← explorer informs creator
|
||||
├── #5 phase.transition (plan→do) [3,4] ← fan-in
|
||||
│ └── #6 agent.complete (maker) [5]
|
||||
├── #7 phase.transition (do→check) [6]
|
||||
│ ├── #8 review (guardian) [7] ← parallel (fan-out)
|
||||
│ └── #9 review (sage) [7] ← parallel (fan-out)
|
||||
├── #10 phase.transition (check→act) [8,9] ← fan-in
|
||||
├── #11 fix (timeline) [8] ← caused by guardian
|
||||
├── #12 fix (voice drift) [9] ← caused by sage
|
||||
└── #18 run.complete [17]
|
||||
```
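
Blame tracking amounts to walking `parent` links back to the root. A minimal sketch, assuming the events have already been reduced to a `seq -> parent list` mapping (the `ancestors` helper is hypothetical, and the mapping mirrors events #1 to #12 of the example above):

```python
# seq -> parent seqs, copied from the example DAG (#1 through #12 only).
PARENTS = {
    1: [], 2: [1], 3: [2], 4: [2], 5: [3, 4], 6: [5],
    7: [6], 8: [7], 9: [7], 10: [8, 9], 11: [8], 12: [9],
}


def ancestors(parents, seq):
    """Return every event that causally precedes `seq`, sorted by seq number."""
    seen = set()
    stack = list(parents[seq])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return sorted(seen)


# Trace the timeline fix (#11) back through review, draft, outline, research.
print(ancestors(PARENTS, 11))
```

Because `seq` is monotonically increasing and parents always point backwards, the walk terminates without any explicit cycle check beyond the `seen` set.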
|
||||
|
||||
## Event Payloads by Type
|
||||
|
||||
### `run.start`
|
||||
```json
|
||||
{
|
||||
"task": "Write short story 'Der Huster'",
|
||||
"workflow": "kurzgeschichte",
|
||||
"team": "story-development",
|
||||
"max_cycles": 2,
|
||||
"config": {
|
||||
"voice_profile": "vp-giesing-gschichten-v1",
|
||||
"persona": "giesinger",
|
||||
"target_words": 6000
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### `agent.start`
|
||||
```json
|
||||
{
|
||||
"archetype": "story-explorer",
|
||||
"model": "haiku",
|
||||
"prompt_summary": "Research premise, find emotional core, suggest 3 plot directions"
|
||||
}
|
||||
```
|
||||
|
||||
### `agent.complete`
|
||||
```json
|
||||
{
|
||||
"archetype": "story-explorer",
|
||||
"duration_ms": 87605,
|
||||
"tokens": 21645,
|
||||
"artifacts": ["docs/01-der-huster-research.md"],
|
||||
"summary": "3 plot directions developed, recommended C (Mo krank + Koffer)"
|
||||
}
|
||||
```
|
||||
|
||||
### `decision`
|
||||
```json
|
||||
{
|
||||
"what": "plot_direction",
|
||||
"chosen": "C — Mo krank + Koffer aus B",
|
||||
"alternatives": [
|
||||
{"id": "A", "label": "Mo ist weg", "reason_rejected": "Zu passiv für 6k-Story"},
|
||||
{"id": "B", "label": "Huster gehört nicht Mo", "reason_rejected": "Zu Krimi-nah"}
|
||||
],
|
||||
"rationale": "Stärkster emotionaler Kern, passt zum Voice Profile"
|
||||
}
|
||||
```
|
||||
|
||||
### `review.verdict`
|
||||
```json
|
||||
{
|
||||
"archetype": "guardian",
|
||||
"verdict": "approved_with_fixes",
|
||||
"findings": [
|
||||
{"severity": "bug", "description": "Timeline: 'Montag' referenced but story starts Dienstag", "fix_required": true},
|
||||
{"severity": "recommendation", "description": "Gentrification monologue too long for Alex register", "fix_required": false}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### `fix.applied`
|
||||
```json
|
||||
{
|
||||
"source": "guardian",
|
||||
"finding": "Timeline: Montag → Dienstag",
|
||||
"file": "stories/01-der-huster.md",
|
||||
"line": 302,
|
||||
"before": "das Gegenteil von Montag",
|
||||
"after": "das Gegenteil von Dienstag"
|
||||
}
|
||||
```
|
||||
|
||||
### `phase.transition`
|
||||
```json
|
||||
{
|
||||
"from": "plan",
|
||||
"to": "do",
|
||||
"artifacts_so_far": ["research.md", "outline.md"],
|
||||
"notes": "Explorer recommended direction C, Creator produced 6-scene outline"
|
||||
}
|
||||
```
|
||||
|
||||
### `cycle.boundary`
|
||||
```json
|
||||
{
|
||||
"cycle": 1,
|
||||
"max_cycles": 2,
|
||||
"exit_condition": "all_approved",
|
||||
"met": true,
|
||||
"fixes_applied": 6,
|
||||
"next_action": "complete"
|
||||
}
|
||||
```
|
||||
|
||||
### `shadow.detected`
|
||||
```json
|
||||
{
|
||||
"archetype": "story-explorer",
|
||||
"shadow": "endless_research",
|
||||
"trigger": "output >2000 words without recommendation",
|
||||
"action": "correction_prompt_applied",
|
||||
"occurrence": 1
|
||||
}
|
||||
```
|
||||
|
||||
### `run.complete`
|
||||
```json
|
||||
{
|
||||
"status": "completed",
|
||||
"cycles": 1,
|
||||
"agents_total": 5,
|
||||
"fixes_total": 6,
|
||||
"shadows": 0,
|
||||
"duration_ms": 1295519,
|
||||
"artifacts": [
|
||||
"docs/01-der-huster-research.md",
|
||||
"docs/01-der-huster-outline.md",
|
||||
"stories/01-der-huster.md",
|
||||
"docs/01-der-huster-guardian-review.md",
|
||||
"docs/01-der-huster-sage-review.md",
|
||||
"docs/01-der-huster-process.md"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## How to Emit Events
|
||||
|
||||
During orchestration, write events using this pattern:
|
||||
|
||||
```bash
|
||||
# Append one event to the run's JSONL file
|
||||
echo '{"ts":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","run_id":"RUN_ID","seq":SEQ,"parent":[PARENT_SEQ],"type":"TYPE","phase":"PHASE","agent":"AGENT","data":{...}}' >> .archeflow/events/RUN_ID.jsonl
|
||||
```
|
||||
|
||||
Or use the helper script:
|
||||
|
||||
```bash
|
||||
./lib/archeflow-event.sh RUN_ID TYPE PHASE AGENT '{"key":"value"}'
|
||||
```
|
||||
|
||||
The orchestration skill should call the event emitter at each trigger point listed in the table above.
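
For callers that are not shell scripts, the same append can be sketched in Python. This is not the helper script's implementation: `emit_event` is a hypothetical function, and deriving `seq` by counting existing lines is an assumption that only holds while a single writer appends to the file.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def emit_event(events_dir, run_id, event_type, phase, agent, data, parent=()):
    """Append one event line; return its seq, or None if logging is disabled."""
    events_dir = Path(events_dir)
    if not events_dir.is_dir():
        return None  # events are optional: no directory, no logging

    log = events_dir / f"{run_id}.jsonl"
    seq = 1
    if log.exists():
        with log.open(encoding="utf-8") as fh:
            # Assumption: single writer, so next seq = number of lines + 1.
            seq = sum(1 for line in fh if line.strip()) + 1

    event = {
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "run_id": run_id,
        "seq": seq,
        "parent": list(parent),
        "type": event_type,
        "phase": phase,
        "agent": agent,
        "data": data,
    }
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, ensure_ascii=False) + "\n")
    return seq
```

Returning the assigned `seq` lets the caller pass it as `parent` for the events that follow.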
|
||||
|
||||
## Generating Reports
|
||||
|
||||
After orchestration completes (or during, for live progress):
|
||||
|
||||
```bash
|
||||
# Generate markdown process report
|
||||
./lib/archeflow-report.sh .archeflow/events/2026-04-03-der-huster.jsonl > docs/process-report.md
|
||||
|
||||
# List all runs
|
||||
jq -r '[.run_id, .status, .task] | @tsv' .archeflow/events/index.jsonl
|
||||
```
|
||||
|
||||
## Run Index
|
||||
|
||||
After each `run.complete`, append a summary line to `.archeflow/events/index.jsonl`:
|
||||
|
||||
```jsonl
|
||||
{"run_id":"2026-04-03-der-huster","ts":"2026-04-03T16:00:00Z","task":"Write Der Huster","workflow":"kurzgeschichte","status":"completed","cycles":1,"agents":5,"fixes":6,"duration_ms":1295519}
|
||||
```
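
The index line can be derived from the run's own events rather than assembled by hand. A sketch under assumptions: `index_entry` and `append_index` are hypothetical helpers, and the mapping of `agents_total` to `agents` and `fixes_total` to `fixes` is inferred from the matching values in the examples above, not from a documented rule.

```python
import json


def index_entry(events):
    """Build the index summary from a run's run.start and run.complete events."""
    start = next(e for e in events if e["type"] == "run.start")
    done = next(e for e in events if e["type"] == "run.complete")
    return {
        "run_id": done["run_id"],
        "ts": done["ts"],
        "task": start["data"]["task"],
        "workflow": start["data"]["workflow"],
        "status": done["data"]["status"],
        "cycles": done["data"]["cycles"],
        "agents": done["data"]["agents_total"],   # assumed field mapping
        "fixes": done["data"]["fixes_total"],     # assumed field mapping
        "duration_ms": done["data"]["duration_ms"],
    }


def append_index(index_path, events):
    """Append the summary as one JSON line, matching the append-only principle."""
    with open(index_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(index_entry(events), ensure_ascii=False) + "\n")
```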
|
||||
|
||||
## Integration with Existing Skills
|
||||
|
||||
- **`orchestration`**: Emit events at phase transitions and after each agent
|
||||
- **`shadow-detection`**: Emit `shadow.detected` when thresholds trigger
|
||||
- **`autonomous-mode`**: Use `index.jsonl` for session summaries instead of separate session-log
|
||||
- **`workflow-design`**: Custom workflows inherit logging automatically
|
||||
|
||||
## Design Principles
|
||||
|
||||
1. **Append-only.** Never modify or delete events. They are immutable facts.
|
||||
2. **Self-contained.** Each event has enough context to be understood alone (no forward references).
|
||||
3. **Cheap.** One `echo >>` per event. No database, no service, no dependencies.
|
||||
4. **Optional.** If the events directory does not exist, orchestration works fine without logging. Events are observation, not control flow.
|
||||
1027
skills/run/SKILL.md
File diff suppressed because it is too large
@@ -5,176 +5,62 @@ description: Use when monitoring agent behavior for dysfunction, when an agent s
|
||||
|
||||
# Shadow Detection
|
||||
|
||||
Every archetype has a **virtue** (its unique contribution) and a **shadow** (the destructive inversion of that virtue). A shadow activates when the virtue is pushed too far.
|
||||
Every archetype has a virtue and a shadow (its destructive inversion). Shadow activates when the virtue is pushed too far.
|
||||
|
||||
```
|
||||
Virtue (healthy) → pushed too far → Shadow (dysfunction)
|
||||
|
||||
Contextual Clarity → can't stop → Rabbit Hole
|
||||
Decisive Framing → over-builds → Over-Architect
|
||||
Execution Discipline → no guardrails → Rogue
|
||||
Threat Intuition → sees threats only → Paranoid
|
||||
Assumption Surfacing → questions only → Paralytic
|
||||
Adversarial Creativity → noise over signal → False Alarm
|
||||
Maintainability Judgment → reviews only → Bureaucrat
|
||||
```
|
||||
| Archetype | Virtue | Shadow |
|-----------|--------|--------|
| Explorer | Contextual Clarity | Rabbit Hole |
| Creator | Decisive Framing | Over-Architect |
| Maker | Execution Discipline | Rogue |
| Guardian | Threat Intuition | Paranoid |
| Skeptic | Assumption Surfacing | Paralytic |
| Trickster | Adversarial Creativity | False Alarm |
| Sage | Maintainability Judgment | Bureaucrat |
|
||||
|
||||
---
|
||||
|
||||
## Explorer → Rabbit Hole
|
||||
**Virtue inverted:** Contextual Clarity becomes compulsive investigation — or output that dumps without analyzing.
|
||||
### Explorer -> Rabbit Hole
|
||||
**Detect** (any): output >2000 words without a Recommendation section | >3 tangents | >15 files read with no patterns identified | no synthesis in final 25%
|
||||
**Correct**: "Summarize top 3 findings and one recommendation in under 300 words."
|
||||
|
||||
**Symptoms:**
|
||||
- Research output keeps growing but never synthesizes
|
||||
- "I found one more thing to check" repeated 3+ times
|
||||
- Reading more than 15 files without producing findings
|
||||
- Output is a raw inventory of files with no analysis or recommendation
|
||||
### Creator -> Over-Architect
|
||||
**Detect** (any): >2 new abstractions for a single feature | "future-proof" in rationale | scope exceeds task by >50% | >1 new package for one feature
|
||||
**Correct**: "Design for the current order of magnitude. Remove abstractions that serve hypothetical requirements."
|
||||
|
||||
**Detection Checklist** (trigger on ANY):
|
||||
- [ ] Output >2000 words without a `### Recommendation` section
|
||||
- [ ] >3 tangent topics not directly related to the original task
|
||||
- [ ] >15 files read with no `### Patterns` identified
|
||||
- [ ] No synthesis language (recommend, suggest, conclusion, finding, summary) in final 25% of output
|
||||
### Maker -> Rogue
|
||||
**Detect** (any): zero test files with >=3 files changed | single monolithic commit | diff contains files not in proposal | no evidence of running tests
|
||||
**Correct**: "Read the proposal. Write a test. Commit what you have. Revert changes to files not in the proposal."
|
||||
|
||||
**Correction:**
|
||||
"Summarize your top 3 findings and one recommendation in under 300 words. If your output has no Recommendation section, add one. A dump is not research."
|
||||
### Guardian -> Paranoid
|
||||
**Detect** (any): CRITICAL:WARNING ratio >2:1 (min 3 findings) | zero APPROVED in 3+ reviews | <50% findings include a fix | findings require already-compromised systems
|
||||
**Correct**: "For each CRITICAL: would a senior engineer block a PR for this? If not, downgrade. Every rejection must include a specific fix."
|
||||
|
||||
### Skeptic -> Paralytic
|
||||
**Detect** (any): >7 challenges in a single review | <50% include alternatives | same concern appears 2+ times reworded | >3 findings outside task scope
|
||||
**Correct**: "Rank challenges by impact. Keep top 3. Each must include a specific alternative. Delete the rest."
|
||||
|
||||
### Trickster -> False Alarm
|
||||
**Detect** (any): findings reference code untouched by diff | >10 findings for <5 files | impossible deployment scenarios | >3 findings without repro steps
|
||||
**Correct**: "Delete findings outside the diff. Rank remaining by likelihood x impact. Keep top 3-5."
|
||||
|
||||
### Sage -> Bureaucrat
|
||||
**Detect** (any): review words >2x diff lines | findings reference files not in changeset | >2 "consider" without concrete action | suggesting docs for <5-line functions
|
||||
**Correct**: "Limit to issues affecting maintainability in the next 6 months. Every finding must end with a specific action."
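
Some Detect rules are mechanical enough to check in code. As an illustration only, here are two of the Explorer checks; `explorer_shadow_triggers` is a hypothetical helper, the synthesis keywords are the ones named in the Explorer checklist, and plain substring matching is an assumption that will produce some false negatives and positives.

```python
# Keywords that count as synthesis language, matched as substrings.
SYNTHESIS_WORDS = ("recommend", "suggest", "conclusion", "finding", "summary")


def explorer_shadow_triggers(output):
    """Return the Rabbit Hole checks that fire for one Explorer output."""
    words = output.split()
    triggers = []

    # Check 1: more than 2000 words and no Recommendation heading.
    if len(words) > 2000 and "### Recommendation" not in output:
        triggers.append("long output without Recommendation section")

    # Check 2: no synthesis language in the final 25% of the words.
    tail = " ".join(words[len(words) * 3 // 4:]).lower()
    if not any(word in tail for word in SYNTHESIS_WORDS):
        triggers.append("no synthesis language in final 25%")

    return triggers
```

Any non-empty result counts as a detection, since the rules trigger on ANY match.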
|
||||
|
||||
---
|
||||
|
||||
## Creator → Over-Architect
|
||||
**Virtue inverted:** Decisive Framing becomes designing at the wrong scale.
|
||||
## Escalation Protocol
|
||||
|
||||
**Symptoms:**
|
||||
- Abstraction layers for one-time operations
|
||||
- Future-proofing for requirements that don't exist
|
||||
- Configuration systems for things that could be constants
|
||||
- Proposal has more infrastructure than business logic
|
||||
|
||||
**Detection Checklist** (trigger on ANY):
|
||||
- [ ] >2 new abstractions (interfaces, base classes, factories, registries) for a single feature
|
||||
- [ ] "In the future we might need..." or "future-proof" appears in rationale
|
||||
- [ ] Proposal scope (files changed) exceeds original task scope by >50%
|
||||
- [ ] More than 1 new package/module introduced for a single feature
|
||||
|
||||
**Correction:**
|
||||
"Design for the current order of magnitude. If the app has 1000 users, design for 10,000 — not 10 million. Remove abstractions that serve hypothetical requirements."
|
||||
|
||||
---
|
||||
|
||||
## Maker → Rogue
|
||||
**Virtue inverted:** Execution Discipline becomes reckless shipping — or expanding beyond the plan.
|
||||
|
||||
**Symptoms:**
|
||||
- Writing code before reading the proposal fully
|
||||
- No tests, or tests written after implementation
|
||||
- Large uncommitted working tree
|
||||
- Files changed that aren't mentioned in the proposal
|
||||
|
||||
**Detection Checklist** (trigger on ANY):
|
||||
- [ ] Zero test files (`.test.`, `.spec.`, `_test.`) in the changeset with >=3 files changed
|
||||
- [ ] Single monolithic commit instead of incremental commits
|
||||
- [ ] Diff contains files not listed in the Creator's proposal `### Changes` section
|
||||
- [ ] No evidence of running existing test suite before finishing
|
||||
|
||||
**Correction:**
|
||||
"Read the proposal. Write a test. Commit what you have. Revert changes to files not in the proposal. Then continue."
|
||||
|
||||
---
|
||||
|
||||
## Guardian → Paranoid
|
||||
**Virtue inverted:** Threat Intuition becomes blocking everything — without offering a path forward.
|
||||
|
||||
**Symptoms:**
|
||||
- Every finding marked CRITICAL
|
||||
- Blocking on theoretical risks with < 1% probability
|
||||
- Rejecting without suggesting how to fix
|
||||
- Security concerns for internal-only code at external-API severity
|
||||
|
||||
**Detection Checklist** (trigger on ANY):
|
||||
- [ ] CRITICAL:WARNING ratio >2:1 (with minimum 3 total findings)
|
||||
- [ ] Zero APPROVED verdicts in 3+ consecutive reviews
|
||||
- [ ] <50% of findings include a suggested fix in the `Fix` column
|
||||
- [ ] Findings reference attack scenarios that require already-compromised internal systems
|
||||
|
||||
**Correction:**
|
||||
"For each CRITICAL finding, answer: Would a senior engineer block a PR for this? If not, downgrade. Every rejection must include a specific, implementable fix."
|
||||
|
||||
---
|
||||
|
||||
## Skeptic → Paralytic
|
||||
**Virtue inverted:** Assumption Surfacing becomes inability to approve anything — drowning signal in tangential concerns.
|
||||
|
||||
**Symptoms:**
|
||||
- More than 7 challenges raised
|
||||
- Challenges without suggested alternatives
|
||||
- "What about X?" chains that drift from the task
|
||||
- Restating the same concern in different words
|
||||
|
||||
**Detection Checklist** (trigger on ANY):
|
||||
- [ ] >7 findings/challenges raised in a single review
|
||||
- [ ] <50% of findings include an alternative in the `Fix` column
|
||||
- [ ] Same conceptual concern appears 2+ times with different wording
|
||||
- [ ] >3 findings reference code or scenarios outside the task scope
|
||||
|
||||
**Correction:**
|
||||
"Rank your challenges by impact. Keep the top 3. Each must include a specific alternative. Delete the rest."
|
||||
|
||||
---
|
||||
|
||||
## Trickster → False Alarm
|
||||
**Virtue inverted:** Adversarial Creativity becomes noise — too many low-signal findings drowning the real issues.
|
||||
|
||||
**Symptoms:**
|
||||
- Testing code that wasn't changed
|
||||
- Reporting non-bugs as bugs (unrealistic test scenarios)
|
||||
- 20 findings when 3 good ones would cover the real risks
|
||||
- Edge cases for edge cases (diminishing returns)
|
||||
|
||||
**Detection Checklist** (trigger on ANY):
|
||||
- [ ] Any finding references code untouched by the Maker's diff
|
||||
- [ ] >10 findings for a change touching <5 files
|
||||
- [ ] Findings describe scenarios requiring conditions that can't occur in the deployment context
|
||||
- [ ] >3 findings without reproduction steps
|
||||
|
||||
**Correction:**
|
||||
"Quality over quantity. Delete findings outside the Maker's diff. Rank remaining by likelihood x impact. Keep top 3-5. Three real findings beat twenty noise."
|
||||
|
||||
---
|
||||
|
||||
## Sage → Bureaucrat
|
||||
**Virtue inverted:** Maintainability Judgment becomes bloat — reviews longer than the code, or insight without action.
|
||||
|
||||
**Symptoms:**
|
||||
- Review longer than the code change itself
|
||||
- Requesting documentation for self-evident code
|
||||
- Suggesting refactors unrelated to the current task
|
||||
- Deep-sounding analysis that doesn't end with a specific action
|
||||
|
||||
**Detection Checklist** (trigger on ANY):
|
||||
- [ ] Review word count >2x the code change's line count (rough: review words > diff lines x 2)
|
||||
- [ ] Any finding references files not in the Maker's changeset
|
||||
- [ ] >2 findings use "consider" or "think about" without a concrete action in the `Fix` column
|
||||
- [ ] Suggesting documentation for functions with <5 lines or self-descriptive names
|
||||
|
||||
**Correction:**
|
||||
"Limit your review to issues that affect maintainability in the next 6 months. Every finding must end with a specific action. If you can't state the consequence of NOT fixing it, don't raise it."
|
||||
|
||||
---
|
||||
|
||||
## Shadow Escalation Protocol
|
||||
|
||||
1. **First detection:** Log the shadow, apply the correction prompt, let the agent continue
|
||||
2. **Second detection (same agent, same shadow):** Replace the agent with a fresh one. The shadow is entrenched.
|
||||
3. **Shadow detected in 3+ agents in the same cycle:** The task itself may be poorly scoped. Escalate to the user: "Multiple agents are struggling — the task may need to be broken down."
|
||||
1. **1st detection:** Log the shadow, apply the correction prompt, let the agent continue
|
||||
2. **2nd detection (same agent, same shadow):** Replace the agent -- the shadow is entrenched
|
||||
3. **3+ agents shadowed in same cycle:** Escalate to user -- the task may need to be broken down
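
The three escalation steps can be sketched as one decision function. Two points are assumptions rather than part of the protocol: detections are tracked per cycle as `(agent, shadow)` pairs, and user escalation takes precedence over agent replacement when both apply. `escalation_action` and its return labels are hypothetical.

```python
from collections import Counter


def escalation_action(detections, agent, shadow):
    """Decide what to do after a new detection.

    `detections` lists every (agent, shadow) pair seen in the current cycle,
    including the one that was just detected.
    """
    agents_shadowed = {name for name, _ in detections}
    if len(agents_shadowed) >= 3:
        return "escalate_to_user"      # task may need to be broken down
    if Counter(detections)[(agent, shadow)] >= 2:
        return "replace_agent"         # the shadow is entrenched
    return "apply_correction"          # first detection: log, correct, continue
```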
|
||||
|
||||
## Shadow Immunity
|
||||
|
||||
Some behaviors LOOK like shadows but aren't:
|
||||
Some behaviors look like shadows but are not. **Rule of thumb:** shadow = behavior disconnected from the goal. Intensity alone is not a shadow.
|
||||
|
||||
- Explorer reading 20 files in a monorepo with scattered dependencies → **not a rabbit hole** if each file is genuinely relevant
|
||||
- Creator adding an abstraction → **not over-architect** if the abstraction is genuinely needed by the current task
|
||||
- Guardian blocking with 2 CRITICAL findings → **not paranoid** if both are genuine security vulnerabilities
|
||||
- Trickster finding 5 edge cases → **not false alarm** if all are in the changed code with reproduction steps
|
||||
- Sage writing a long review → **not bureaucrat** if the change is large and every finding is actionable
|
||||
|
||||
**Rule of thumb:** Shadow = behavior disconnected from the goal. Intensity alone is not a shadow.
|
||||
- Explorer reading 20 files in a monorepo with scattered dependencies -- not a rabbit hole if each file is genuinely relevant
|
||||
- Creator adding an abstraction -- not over-architect if the current task genuinely needs it
|
||||
- Guardian blocking with 2 CRITICALs -- not paranoid if both are genuine security vulnerabilities
|
||||
- Trickster finding 5 edge cases -- not false alarm if all are in changed code with repro steps
|
||||
- Sage writing a long review -- not bureaucrat if the change is large and every finding is actionable
|
||||
|
||||
@@ -20,16 +20,10 @@ This is the **primary operational mode** for ArcheFlow in multi-project workspac
|
||||
Use it when the user says "run the sprint", "work the queue", "go autonomous", or
|
||||
invokes `af-sprint`.
|
||||
|
||||
Do NOT use `archeflow:run` for individual tasks within a sprint — the sprint runner
|
||||
Do NOT use `archeflow:run` for individual tasks within a sprint -- the sprint runner
|
||||
handles task dispatch internally, using `archeflow:run` only when a task warrants
|
||||
full PDCA orchestration.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- `docs/orchestra/queue.json` — task queue (managed by `./scripts/ws`)
|
||||
- `./scripts/ws` — workspace CLI for queue operations
|
||||
- Each project is a separate git repo under the workspace root
|
||||
|
||||
## Invocation
|
||||
|
||||
```
|
||||
@@ -46,21 +40,12 @@ af-sprint --project writing.colette # Only process items for this project
|
||||
|
||||
### Step 0: Orient
|
||||
|
||||
```bash
|
||||
# Load queue and workspace state
|
||||
QUEUE=$(cat docs/orchestra/queue.json)
|
||||
MODE=$(echo "$QUEUE" | jq -r '.mode')
|
||||
```
|
||||
Load queue from `docs/orchestra/queue.json`. Check mode (`AUTONOM` / `ATTENDED` / `PAUSED`).
|
||||
Show one-line status: `sprint: AUTONOM | 7 pending (1xP0, 1xP2, 5xP3) | 4 slots`
|
||||
|
||||
Check mode:
|
||||
- `AUTONOM` → proceed without asking
|
||||
- `ATTENDED` → show plan, wait for user approval before each batch
|
||||
- `PAUSED` → report status only, do not start tasks
|
||||
|
||||
Show one-line status:
|
||||
```
|
||||
sprint: AUTONOM · 7 pending (1×P0, 1×P2, 5×P3) · 4 slots
|
||||
```
|
||||
- `AUTONOM` -- proceed without asking
|
||||
- `ATTENDED` -- show plan, wait for user approval before each batch
|
||||
- `PAUSED` -- report status only, do not start tasks
|
||||
|
||||
### Step 1: Select Batch
|
||||
|
||||
@@ -69,234 +54,111 @@ Pick tasks for the next batch. Rules:
|
||||
1. **Priority cascade**: P0 first, then P1, then P2. Never start P3 unless user explicitly includes it.
|
||||
2. **Dependency check**: Skip tasks whose `depends_on` items aren't all `completed`.
|
||||
3. **One agent per project**: Never run two tasks on the same project simultaneously.
|
||||
4. **Cost-aware concurrency**:
|
||||
- Estimate task cost from `estimate` field: S=cheap, M=moderate, L=expensive, XL=very expensive
|
||||
- **Expensive tasks** (L, XL): max 2 concurrent
|
||||
- **Cheap tasks** (S, M): fill remaining slots
|
||||
- Target mix: 1-2 expensive + 2-3 cheap = 4-5 total
|
||||
4. **Cost-aware concurrency**: L/XL tasks (expensive) max 2 concurrent. Fill remaining slots with S/M tasks. Target mix: 1-2 expensive + 2-3 cheap.
|
||||
5. **Slot limit**: Never exceed `--slots` (default 4).
|
||||
|
||||
```python
|
||||
# Pseudocode for batch selection
|
||||
batch = []
|
||||
used_projects = set()
|
||||
expensive_count = 0
|
||||
|
||||
for priority in ["P0", "P1", "P2"]:
|
||||
for task in queue_items(priority, status="pending"):
|
||||
if len(batch) >= MAX_SLOTS:
|
||||
break
|
||||
if task.project in used_projects:
|
||||
continue # One agent per project
|
||||
if not deps_satisfied(task):
|
||||
continue
|
||||
if task.estimate in ("L", "XL"):
|
||||
if expensive_count >= 2:
|
||||
continue
|
||||
expensive_count += 1
|
||||
batch.append(task)
|
||||
used_projects.add(task.project)
|
||||
```
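
A runnable variant of the selection rules, for experimenting with a queue outside the sprint runner. The item shape is an assumption: `depends_on` and `estimate` are named above, while the `id`, `project`, `priority`, and `status` keys are guesses at the queue format and may not match `queue.json`.

```python
def select_batch(items, max_slots=4, max_expensive=2):
    """Pick the next batch from a list of queue item dicts (assumed shape)."""
    completed = {item["id"] for item in items if item["status"] == "completed"}
    batch = []
    used_projects = set()
    expensive = 0

    for priority in ("P0", "P1", "P2"):  # P3 is never selected automatically
        for task in items:
            if len(batch) >= max_slots:
                return batch  # slot limit reached
            if task["priority"] != priority or task["status"] != "pending":
                continue
            if task["project"] in used_projects:
                continue  # one agent per project
            if not set(task.get("depends_on", [])) <= completed:
                continue  # unmet dependency
            if task["estimate"] in ("L", "XL"):
                if expensive >= max_expensive:
                    continue  # expensive-task cap
                expensive += 1
            batch.append(task)
            used_projects.add(task["project"])
    return batch
```

Returning as soon as the slot limit is hit avoids scanning lower priorities once the batch is full.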
|
||||
|
||||
### Step 2: Assess and Dispatch
|
||||
|
||||
For each task in the batch, decide the execution strategy:
|
||||
|
||||
| Signal | Strategy | What happens |
|
||||
|--------|----------|-------------|
|
||||
| Estimate S, clear scope | **Direct** | Spawn Agent() with task description, no orchestration |
|
||||
| Estimate M, multi-file | **Direct+** | Spawn Agent() with task + "read code first, run tests after" |
|
||||
| Estimate L/XL, code | **Feature-dev style** | Agent explores → implements → self-reviews (see below) |
|
||||
| Estimate L/XL, writing | **PDCA** | Use af-run with writing domain archetypes |
|
||||
| Task contains "validate", "test", "lint", "check" | **Direct** | Cheap analytical task, no orchestration |
|
||||
| Task contains "review", "audit", "security" | **Review** | Spawn Guardian + relevant reviewers only |
|
||||
| Signal | Strategy |
|
||||
|--------|----------|
|
||||
| Estimate S, clear scope | **Direct** -- Agent with task description, no orchestration |
|
||||
| Estimate M, multi-file | **Direct+** -- Agent with "read code first, run tests after" |
|
||||
| Estimate L/XL, code | **Feature-dev** -- Agent explores, plans, implements, tests, self-reviews, commits |
|
||||
| Estimate L/XL, writing | **PDCA** -- Use af-run with writing domain archetypes |
|
||||
| validate/test/lint/check tasks | **Direct** -- cheap analytical, no orchestration |
|
||||
| review/audit/security tasks | **Review** -- spawn Guardian + relevant reviewers only |
|
||||
|
||||
### L/XL Code Task Template (feature-dev style)
|
||||
### L/XL Code Task Template
|
||||
|
||||
For complex code tasks, give the agent a structured process instead of PDCA:
|
||||
Give the agent a structured process:
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "<project>: <task-short>",
|
||||
prompt: "You are working on project <project> at <path>.
|
||||
Task: <task description>
|
||||
Agent(prompt: "You are working on <project> at <path>. Task: <description>
|
||||
|
||||
Follow this process:
|
||||
1. EXPLORE: Read CLAUDE.md, docs/status.md, and the relevant source files.
|
||||
Understand existing patterns before writing anything.
|
||||
2. PLAN: Identify 2-3 files to change. Write a brief plan (what, where, why).
|
||||
If ambiguous, list your assumptions.
|
||||
3. IMPLEMENT: Make the changes. Follow existing code patterns strictly.
|
||||
4. TEST: Run the project's test suite. Fix any failures.
|
||||
5. SELF-REVIEW: Before committing, re-read your diff. Check:
|
||||
- Error handling: what happens when this fails?
|
||||
- Protocol compliance: am I using the right function signatures?
|
||||
- Tests: did I test the important paths?
|
||||
1. EXPLORE: Read CLAUDE.md, docs/status.md, relevant source files.
|
||||
2. PLAN: Identify files to change, write brief plan (what, where, why).
|
||||
3. IMPLEMENT: Follow existing code patterns strictly.
|
||||
4. TEST: Run project test suite, fix failures.
|
||||
5. SELF-REVIEW: Re-read diff -- error handling, protocol compliance, test coverage.
|
||||
6. COMMIT + PUSH: Conventional commits, signed, pushed.
|
||||
|
||||
<standard rules>
|
||||
|
||||
STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED"
|
||||
)
|
||||
STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED")
|
||||
```
|
||||
|
||||
This gives the agent feature-dev's structured exploration without the multi-agent overhead.
|
||||
For writing/research L/XL tasks, use af-run instead — archetypes add value where linters don't exist.
|
||||
### Agent Spawn Template
|
||||
|
||||
**Agent spawn template:**
|
||||
|
||||
For each task in the batch, spawn an Agent in the SAME message (parallel dispatch):
|
||||
Spawn ALL batch agents in a **single message** (parallel execution). Each agent gets:
|
||||
|
||||
```
|
||||
Agent(
|
||||
description: "<project>: <task-short>",
|
||||
prompt: "You are working on project <project> at <path>.
|
||||
Task: <task description>
|
||||
<notes if any>
|
||||
|
||||
prompt: "You are working on <project> at <path>. Task: <description>
|
||||
Rules:
|
||||
- Read the project's CLAUDE.md first
|
||||
- Commit with: git -c user.signingkey=/home/c/.ssh/id_ed25519_dev.pub commit
|
||||
- NO Co-Authored-By trailers
|
||||
- Conventional commits
|
||||
- Push when done: GIT_SSH_COMMAND='ssh -i /home/c/.ssh/id_ed25519_dev -o IdentitiesOnly=yes' git push origin main
|
||||
- Commit: git -c user.signingkey=/home/c/.ssh/id_ed25519_dev.pub commit
|
||||
- NO Co-Authored-By trailers, conventional commits
|
||||
- Push: GIT_SSH_COMMAND='ssh -i /home/c/.ssh/id_ed25519_dev -o IdentitiesOnly=yes' git push origin main
|
||||
- Run tests if the project has them
|
||||
- Report: what you did, what changed, any blockers
|
||||
|
||||
STATUS: DONE | DONE_WITH_CONCERNS | NEEDS_CONTEXT | BLOCKED",
|
||||
subagent_type: "general-purpose",
|
||||
isolation: "worktree" # Only for L/XL tasks; S/M tasks run directly
|
||||
isolation: "worktree" # Only for L/XL tasks; S/M run directly
|
||||
)
|
||||
```
|
||||
|
||||
**CRITICAL: Spawn all batch agents in a SINGLE message.** This enables parallel execution.
|
||||
Do not spawn them sequentially.
|
||||
|
||||
### Step 3: Mark Running
|
||||
|
||||
After spawning, update the queue:
|
||||
|
||||
Update the queue after spawning:
|
||||
```bash
|
||||
# For each spawned task
|
||||
./scripts/ws start <task-id> # or manually update queue.json status to "running"
|
||||
```
|
||||
|
||||
If `./scripts/ws start` doesn't exist, update queue.json directly:
|
||||
```python
|
||||
task["status"] = "running"
|
||||
# Write back to docs/orchestra/queue.json
|
||||
./scripts/ws start <task-id> # or update queue.json status to "running" directly
|
||||
```
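When `./scripts/ws start` is unavailable, the direct queue update could look like the sketch below. The layout of `docs/orchestra/queue.json` is not shown in this diff, so the top-level `tasks` list and the `id` key are assumptions for illustration; only the `status` field and the value `"running"` come from the text.

```python
import json
from pathlib import Path


def mark_running(queue_path: Path, task_id: str) -> bool:
    """Set one task's status to "running" and write the queue back.

    Assumes (hypothetically) a top-level "tasks" list of objects with
    "id" and "status" keys; adjust to the real queue.json layout.
    """
    queue = json.loads(queue_path.read_text(encoding="utf-8"))
    for task in queue.get("tasks", []):
        if task.get("id") == task_id:
            task["status"] = "running"
            queue_path.write_text(
                json.dumps(queue, indent=2) + "\n", encoding="utf-8"
            )
            return True
    return False  # task id not found; the file is left untouched
```

Returning `False` instead of raising keeps the caller free to decide whether an unknown task id should stop the sprint.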
|
||||
|
||||
### Step 4: Collect Results
|
||||
|
||||
As agents complete, process their results:
|
||||
Parse status token from agent output. Based on status:
|
||||
- `DONE` -- mark completed, note result
|
||||
- `DONE_WITH_CONCERNS` -- mark completed, log concerns for user review
|
||||
- `NEEDS_CONTEXT` -- mark pending, add concern to notes, skip for now
|
||||
- `BLOCKED` -- mark failed, add blocker to notes
|
||||
|
||||
1. **Parse status token** from agent output (last line: `STATUS: DONE|...`)
|
||||
2. **Based on status**:
|
||||
- `DONE` → mark completed, note result
|
||||
- `DONE_WITH_CONCERNS` → mark completed, log concerns for user review
|
||||
- `NEEDS_CONTEXT` → mark pending, add concern to notes, skip for now
|
||||
- `BLOCKED` → mark failed, add blocker to notes
|
||||
3. **Update queue**:
|
||||
```bash
|
||||
./scripts/ws done <task-id> -r "<summary of what was done>"
|
||||
# or
|
||||
./scripts/ws fail <task-id> -r "<reason>"
|
||||
```
|
||||
Update: `./scripts/ws done <task-id> -r "<summary>"` or `./scripts/ws fail <task-id> -r "<reason>"`
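The status handling above can be sketched as a small parser plus a lookup table. The four tokens and their queue states come from the list; the function names and the choice to treat a missing token as `failed` are assumptions, picked to match how agent crashes are handled.

```python
import re
from typing import Optional

# Matches the final report line, e.g. "STATUS: DONE_WITH_CONCERNS".
STATUS_RE = re.compile(
    r"^STATUS:\s*(DONE_WITH_CONCERNS|DONE|NEEDS_CONTEXT|BLOCKED)\s*$"
)

# Queue state for each status token, as described in Step 4.
QUEUE_STATE = {
    "DONE": "completed",
    "DONE_WITH_CONCERNS": "completed",
    "NEEDS_CONTEXT": "pending",
    "BLOCKED": "failed",
}


def parse_status(agent_output: str) -> Optional[str]:
    """Return the status token from the last non-empty line, or None."""
    lines = [line.strip() for line in agent_output.splitlines() if line.strip()]
    if not lines:
        return None
    match = STATUS_RE.match(lines[-1])
    return match.group(1) if match else None


def queue_state_for(agent_output: str) -> str:
    """Map agent output to a queue state.

    Assumption: output without a parseable token counts as "failed".
    """
    return QUEUE_STATE.get(parse_status(agent_output), "failed")
```

Only the last non-empty line is inspected, so a report that quotes the template line `STATUS: DONE | DONE_WITH_CONCERNS | ...` earlier in its body is not misread as a result.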
|
||||
|
||||
### Step 5: Report and Loop
|
||||
|
||||
After batch completes, show sprint status:
|
||||
Show batch status, then **immediately select next batch** (no user prompt in AUTONOM mode):
|
||||
|
||||
```
|
||||
── Sprint Batch 1 ──────────────────────────────
|
||||
✓ writing.colette fanout run done (45s)
|
||||
✓ book.3sets validation done (30s)
|
||||
△ book.sos meta-book concept needs_context (missing outline)
|
||||
✓ tool.archeflow af-review mode done (60s)
|
||||
|
||||
-- Sprint Batch 1 --------------------------------------------------
|
||||
+ writing.colette fanout run done (45s)
|
||||
+ book.3sets validation done (30s)
|
||||
! book.sos meta-book concept needs_context
|
||||
+ tool.archeflow af-review mode done (60s)
|
||||
Queue: 3 completed, 1 blocked, 3 remaining
|
||||
Next batch: 2 items ready
|
||||
────────────────────────────────────────────────
|
||||
--------------------------------------------------------------------
|
||||
```
|
||||
|
||||
Then **immediately select and dispatch the next batch** (Step 1). Don't wait for user input in AUTONOM mode.
|
||||
|
||||
### Step 6: Sprint Complete
|
||||
|
||||
When no more tasks are schedulable (all done, blocked, or P3-only):
|
||||
|
||||
When no more tasks are schedulable:
|
||||
1. Update `docs/control-center.md` Handoff section
|
||||
2. Run `./scripts/ws log --summary "<sprint summary>"` if available
|
||||
3. Show final sprint report:
|
||||
|
||||
```
|
||||
── Sprint Complete ─────────────────────────────
|
||||
Duration: 12 min
|
||||
Tasks: 5 completed, 1 blocked, 1 remaining (P3)
|
||||
Projects touched: 4
|
||||
Commits: 7
|
||||
────────────────────────────────────────────────
|
||||
```
|
||||
2. Run `./scripts/ws log --summary "<sprint summary>"`
|
||||
3. Show final report with duration, tasks completed/blocked/remaining, projects touched, commits
|
||||
|
||||
---
|
||||
|
||||
## Mode Behavior
|
||||
|
||||
### AUTONOM
|
||||
- Dispatch immediately, no user confirmation
|
||||
- Commit + push after each agent completes
|
||||
- Only pause for BLOCKED tasks or budget exhaustion
|
||||
- Report between batches (one-line status)
|
||||
|
||||
### ATTENDED
|
||||
- Show the selected batch before dispatching
|
||||
- Wait for user to approve: "Proceed with this batch? [y/n]"
|
||||
- After each batch, show results and ask: "Continue to next batch? [y/n/edit]"
|
||||
- "edit" lets the user reprioritize before next batch
|
||||
|
||||
### PAUSED
|
||||
- Show queue status only
|
||||
- Do not dispatch any agents
|
||||
- Useful for reviewing state between sessions
|
||||
|
||||
---
|
||||
|
||||
## When to Use ArcheFlow Orchestration Within Sprint
|
||||
|
||||
Most sprint tasks should be **direct agent dispatch** (no PDCA/pipeline overhead).
|
||||
Only escalate to full orchestration when:
|
||||
|
||||
| Signal | Action |
|--------|--------|
| Task is S/M, clear scope, single project | Direct dispatch |
| Task is L/XL | Use pipeline or PDCA strategy |
| Task mentions "security", "auth", "encryption" | Add Guardian review |
| Task is a review/audit | Spawn reviewers only (af-review mode) |
| Task failed in a previous sprint | Escalate to PDCA with Explorer |
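One way to read the escalation table as code is sketched below. The route labels, the function name, and the precedence between signals are assumptions, since the table does not say which signal wins when several apply.

```python
SECURITY_KEYWORDS = ("security", "auth", "encryption")


def route_task(size: str, description: str, is_review: bool = False,
               failed_before: bool = False) -> str:
    """Pick a dispatch route for one sprint task.

    Assumed precedence: review/audit, then prior failure, then size.
    Route labels are illustrative, not ArcheFlow identifiers.
    """
    if is_review:
        return "reviewers-only"
    if failed_before:
        route = "pdca-with-explorer"
    elif size.upper() in ("L", "XL"):
        route = "pipeline-or-pdca"
    else:
        route = "direct"
    # Keyword match is a plain substring test, so "auth" also hits "author".
    if any(word in description.lower() for word in SECURITY_KEYWORDS):
        route += "+guardian"
    return route
```

A stricter keyword match, such as whole-word matching, would cut false positives at the cost of missing variants like "authentication".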
|
||||
|
||||
The sprint runner's job is **throughput**, not perfection. Ship fast, fix forward.
|
||||
|
||||
---
|
||||
|
||||
## Integration with Existing Tools
|
||||
|
||||
| Tool | How sprint uses it |
|------|-------------------|
| `./scripts/ws next` | Get next schedulable task |
| `./scripts/ws done <id>` | Mark task completed |
| `./scripts/ws fail <id>` | Mark task failed |
| `./scripts/ws orient` | Initial workspace overview |
| `./scripts/ws validate` | Pre-flight queue validation |
| `git` per project | Commit + push after each agent |
| `archeflow:run` | Only for L/XL tasks needing PDCA |
|
||||
|
||||
---
|
||||
| Mode | Dispatch | Between batches | Stops for |
|------|----------|----------------|-----------|
| **AUTONOM** | Immediate | One-line status, no pause | BLOCKED or budget exhaustion |
| **ATTENDED** | Show batch, wait for approval | Show results, ask "Continue? [y/n/edit]" | User decision |
| **PAUSED** | No dispatch | -- | Always (status display only) |
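The dispatch column of the mode table reduces to a small gate, sketched here. The three mode names come from the table; the `Mode` enum and `may_dispatch` are hypothetical names, not part of the sprint runner.

```python
from enum import Enum


class Mode(Enum):
    AUTONOM = "AUTONOM"
    ATTENDED = "ATTENDED"
    PAUSED = "PAUSED"


def may_dispatch(mode: Mode, user_approved: bool = False) -> bool:
    """Whether a selected batch may be dispatched right now."""
    if mode is Mode.AUTONOM:
        return True           # immediate, no confirmation
    if mode is Mode.ATTENDED:
        return user_approved  # batch is shown first, then approved
    return False              # PAUSED: status display only
```

For example, `may_dispatch(Mode.ATTENDED)` stays `False` until the user answers the approval prompt, while `Mode.PAUSED` never dispatches regardless of approval.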
|
||||
|
||||
## Error Recovery
|
||||
|
||||
- **Agent crashes mid-task**: Mark task as `failed`, add error to notes, continue with next batch
|
||||
- **Git push fails**: Log the error, do NOT retry. User will handle push conflicts manually.
|
||||
- **Queue file corrupted**: Run `./scripts/ws validate`. If invalid, stop sprint and report.
|
||||
- **Budget exceeded**: Stop sprint, report remaining tasks and estimated cost.
|
||||
- **All tasks blocked**: Report dependency graph, suggest which blockers to resolve first.
|
||||
- **Agent crash**: Mark `failed`, continue with next batch
|
||||
- **Git push fails**: Log error, do NOT retry -- user handles conflicts
|
||||
- **Queue corrupted**: Run `./scripts/ws validate`, stop if invalid
|
||||
- **Budget exceeded**: Stop sprint, report remaining tasks and estimated cost
|
||||
- **All blocked**: Report dependency graph, suggest which blockers to resolve first
|
||||
|
||||
@@ -5,180 +5,51 @@ description: Use at session start when implementing features, reviewing code, de
|
||||
|
||||
# ArcheFlow -- Active
|
||||
|
||||
Multi-agent orchestration using archetypal roles and PDCA quality cycles.
|
||||
|
||||
## Session Start
|
||||
|
||||
On activation, print ONE line:
|
||||
On activation, print ONE line then proceed silently:
|
||||
```
|
||||
archeflow v0.7.0 · 25 skills · <domain> domain
|
||||
```
|
||||
Where `<domain>` is auto-detected: `writing` if `colette.yaml` exists, `research` if paper/thesis files exist, `code` otherwise. Then proceed silently — no further announcement unless `archeflow:run` is invoked.
|
||||
Domain auto-detected: `writing` if `colette.yaml` exists, `research` if paper/thesis files, `code` otherwise.
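The detection order can be sketched as follows. The check for `colette.yaml` comes from the text; the glob patterns for "paper/thesis files" are a guess, because the skill does not specify which filenames count.

```python
from pathlib import Path

# Hypothetical patterns: the text only says "paper/thesis files".
RESEARCH_PATTERNS = ("paper*", "thesis*")


def detect_domain(root: Path) -> str:
    """Return "writing", "research", or "code" for a project root."""
    if (root / "colette.yaml").exists():
        return "writing"
    for pattern in RESEARCH_PATTERNS:
        # next(..., None) stops at the first match instead of listing all.
        if next(root.glob(pattern), None) is not None:
            return "research"
    return "code"
```

The writing check runs first, so a Colette project that also contains a `paper.md` is still reported as `writing`.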
|
||||
|
||||
During runs, follow the `archeflow:presence` skill for output format: show outcomes not mechanics, one line per phase, value at the end.
|
||||
## When to Use What
|
||||
|
||||
## IMPORTANT: When to Use What
|
||||
|
||||
### Use `/af-sprint` (primary mode) when:
|
||||
- User says "run the sprint", "work the queue", "go autonomous"
|
||||
- Multiple tasks are pending across projects
|
||||
- The workspace queue (docs/orchestra/queue.json) has pending items
|
||||
|
||||
### Use `/af-review` when:
|
||||
- User wants to review code before merging
|
||||
- A diff, branch, or commit range needs quality check
|
||||
- Security-sensitive changes need Guardian analysis
|
||||
|
||||
### Use `/af-run` (deep orchestration) when:
|
||||
- **Writing/research tasks** -- archetypes add value where linters don't exist
|
||||
- **Security-sensitive code changes** -- auth, encryption, API keys
|
||||
- **Complex multi-module refactors** with unclear approach
|
||||
|
||||
### Do NOT use ArcheFlow for:
|
||||
- **Single-feature code development** -- use `feature-dev` plugin or work directly
|
||||
- **Simple fixes** -- just do them
|
||||
- **Questions, exploration, reading** -- no code changes needed
|
||||
|
||||
Choose the workflow based on risk:
|
||||
|
||||
| Signal | Workflow | Command |
|--------|----------|---------|
| Small fix, low risk, single concern | `fast` | Creator --> Maker --> Guardian |
| Feature, multiple files, moderate risk | `standard` | Explorer + Creator --> Maker --> Guardian + Skeptic + Sage |
| Security-sensitive, breaking changes, public API | `thorough` | Explorer + Creator --> Maker --> All 4 reviewers |
|
||||
| Need | Command | When |
|------|---------|------|
| **Work the queue** | `/af-sprint` | Multiple tasks pending across projects, "run the sprint" |
| **Deep orchestration** | `/af-run` | Writing/research tasks, security-sensitive code, complex multi-module refactors |
| **Code review** | `/af-review` | Review diff/branch/commits before merging, security-sensitive changes |
| **Single feature** | `feature-dev` or direct | Clear scope, one project -- no orchestration needed |
|
||||
|
||||
## When to Skip ArcheFlow
|
||||
|
||||
Do NOT use ArcheFlow for these -- just do them directly:
|
||||
Do NOT use for: single-line fixes, questions, reading/exploring, config tweaks, git ops.
|
||||
|
||||
- Single-line fixes, typos, formatting
|
||||
- Answering questions (no code changes)
|
||||
- Reading/exploring code without making changes
|
||||
- Config changes to a single file
|
||||
- Git operations (commit, push, branch)
|
||||
## Workflow Selection
|
||||
|
||||
**Mini-Reflect fallback:** Even when skipping ArcheFlow, apply a quick reflection for non-trivial single-file changes: (1) restate what you're changing, (2) name one assumption, (3) check if it could break anything. This takes ~10 seconds and catches misunderstandings before they become commits.
|
||||
|
||||
## Archetypes
|
||||
|
||||
| Archetype | Avatar | Virtue | Shadow | Phase |
|-----------|--------|--------|--------|-------|
| **Explorer** | 🔍 | Contextual Clarity | Rabbit Hole | Plan |
| **Creator** | 🏗️ | Decisive Framing | Over-Architect | Plan |
| **Maker** | ⚒️ | Execution Discipline | Rogue | Do |
| **Guardian** | 🛡️ | Threat Intuition | Paranoid | Check |
| **Skeptic** | 🤔 | Assumption Surfacing | Paralytic | Check |
| **Trickster** | 🃏 | Adversarial Creativity | False Alarm | Check |
| **Sage** | 📚 | Maintainability Judgment | Bureaucrat | Check |
|
||||
|
||||
## PDCA Cycle
|
||||
|
||||
```
|
||||
Plan --> Explorer researches, Creator proposes
|
||||
Do --> Maker implements in isolated worktree
|
||||
Check --> Reviewers assess in parallel (approve/reject)
|
||||
Act --> All approved? Merge. Issues? Cycle back to Plan.
|
||||
```
|
||||
|
||||
## Progress Indicators
|
||||
|
||||
During orchestration, emit phase markers so the user can track progress:
|
||||
|
||||
```
|
||||
--- ArcheFlow: <task> -------------------------
|
||||
Workflow: standard (2 cycles max)
|
||||
|
||||
🔍 [Plan] Explorer researching... done (35s)
|
||||
🏗️ [Plan] Creator designing proposal... done (25s, confidence: 0.8)
|
||||
⚒️ [Do] Maker implementing... done (90s, 4 files, 8 tests)
|
||||
🛡️ [Check] Guardian reviewing... APPROVED
|
||||
🤔 [Check] Skeptic challenging... APPROVED (1 INFO)
|
||||
📚 [Check] Sage reviewing... APPROVED
|
||||
[Act] All approved -- merging... merged to main
|
||||
|
||||
--- Complete: 3m 10s, 1 cycle -----------------
|
||||
```
|
||||
|
||||
Update each line as agents complete. This gives the user real-time visibility without interrupting the flow.
|
||||
|
||||
## Dry-Run Mode
|
||||
|
||||
When the user asks "what would ArcheFlow do?" or uses `--dry-run`, show the plan without executing:
|
||||
|
||||
```
|
||||
Dry run for: "Add JWT authentication"
|
||||
Workflow: standard (2 cycles)
|
||||
Agents: 🔍 Explorer --> 🏗️ Creator --> ⚒️ Maker --> 🛡️ Guardian + 🤔 Skeptic + 📚 Sage
|
||||
Est. agents: 6 per cycle, 12 max
|
||||
Worktree: yes (isolated branch)
|
||||
Proceed? [y/n]
|
||||
```
|
||||
|
||||
## Quick Start
|
||||
|
||||
When the user gives an implementation task:
|
||||
|
||||
1. Assess: does this need ArcheFlow? (see criteria above)
|
||||
2. If yes: load `archeflow:orchestration` skill
|
||||
3. Pick workflow (fast/standard/thorough)
|
||||
4. Execute the PDCA steps from the orchestration skill
|
||||
5. Emit progress indicators throughout (see above)
|
||||
| Signal | Workflow | Pipeline |
|--------|----------|----------|
| Small fix, low risk | `fast` | Creator --> Maker --> Guardian |
| Feature, multi-file, moderate risk | `standard` | Explorer + Creator --> Maker --> Guardian + Skeptic + Sage |
| Security, breaking changes, public API | `thorough` | Explorer + Creator --> Maker --> All 4 reviewers |
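The workflow table can be written down as data, which makes the agent count per cycle easy to check. The `WORKFLOWS` mapping and `agents_per_cycle` are illustrative names; reading "All 4 reviewers" as Guardian, Skeptic, Trickster, and Sage is an inference from the Check-phase archetypes.

```python
# Archetypes per PDCA phase for each workflow (Act has no agents of its own).
WORKFLOWS = {
    "fast": {
        "plan": ["Creator"],
        "do": ["Maker"],
        "check": ["Guardian"],
    },
    "standard": {
        "plan": ["Explorer", "Creator"],
        "do": ["Maker"],
        "check": ["Guardian", "Skeptic", "Sage"],
    },
    "thorough": {
        "plan": ["Explorer", "Creator"],
        "do": ["Maker"],
        "check": ["Guardian", "Skeptic", "Trickster", "Sage"],
    },
}


def agents_per_cycle(workflow: str) -> int:
    """Count the agents spawned in one PDCA cycle of a workflow."""
    return sum(len(agents) for agents in WORKFLOWS[workflow].values())
```

Under this reading, `standard` comes to 6 agents per cycle, which agrees with the "Est. agents: 6 per cycle" figure in the dry-run example.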
|
||||
|
||||
## Available Commands
|
||||
|
||||
| Command | What it does |
|---------|-------------|
| `archeflow:run` | Automated PDCA loop -- single command to orchestrate a full run |
| `archeflow:orchestration` | Load manual PDCA execution guide |
| `archeflow:shadow-detection` | Load shadow monitoring rules |
| `archeflow:autonomous-mode` | Load autonomous/overnight session protocol |
| `archeflow:status` | Show current orchestration state (phase, cycle, active agents) |
| `archeflow:history` | Show past orchestration summaries from `.archeflow/session-log.md` |
| `/af-sprint` | Queue-driven parallel agent runner (primary mode) |
| `/af-run <task>` | PDCA orchestration loop (`--dry-run`, `--start-from`, `--workflow`) |
| `/af-review` | Guardian-led code review on diff/branch/range |
| `/af-status` | Current run state, active agents, findings |
| `/af-report` | Full process report for a run |
| `/af-init` | Initialize ArcheFlow in a project |
| `/af-score` | Archetype effectiveness scores |
| `/af-memory` | Cross-run lesson memory |
| `/af-fanout` | Colette book fanout via agents |
| `/af-dag` | DAG of current/last run |
|
||||
|
||||
### `archeflow:status`
|
||||
Read `.archeflow/state.json` (if exists) and report:
|
||||
- Current task, phase, and cycle
|
||||
- Active agents and their status
|
||||
- Findings so far (by severity)
|
||||
- Time elapsed
|
||||
## Mini-Reflect Fallback
|
||||
|
||||
### `archeflow:history`
|
||||
Read `.archeflow/session-log.md` and show the last 5 orchestration summaries in compact format.
|
||||
|
||||
## Skills Reference (All 24)
|
||||
|
||||
### Core Orchestration
|
||||
- **archeflow:run** -- Automated PDCA execution loop with `--start-from` and `--dry-run`
|
||||
- **archeflow:orchestration** -- Step-by-step manual execution guide
|
||||
- **archeflow:plan-phase** -- Explorer and Creator output formats and protocols
|
||||
- **archeflow:do-phase** -- Maker implementation rules and worktree commit strategy
|
||||
- **archeflow:check-phase** -- Shared reviewer protocols and output format
|
||||
- **archeflow:act-phase** -- Post-Check decision logic: collect findings, route fixes, exit or cycle
|
||||
|
||||
### Quality and Safety
|
||||
- **archeflow:shadow-detection** -- Quantitative dysfunction detection and correction
|
||||
- **archeflow:attention-filters** -- Context optimization per archetype
|
||||
- **archeflow:convergence** -- Detects convergence, stalling, and oscillation in multi-cycle runs
|
||||
- **archeflow:artifact-routing** -- Inter-phase artifact protocol for naming, storage, and routing
|
||||
|
||||
### Process Intelligence
|
||||
- **archeflow:process-log** -- Event-sourced JSONL logging with DAG parent relationships
|
||||
- **archeflow:memory** -- Cross-run learning from recurring findings
|
||||
- **archeflow:effectiveness** -- Archetype scoring on signal-to-noise, fix rate, cost efficiency
|
||||
- **archeflow:progress** -- Live progress file watchable from a second terminal
|
||||
|
||||
### Integration
|
||||
- **archeflow:colette-bridge** -- Bridges ArcheFlow with the Colette writing platform
|
||||
- **archeflow:git-integration** -- Git-per-phase commits, branch-per-run, rollback
|
||||
- **archeflow:multi-project** -- Cross-repo orchestration with dependency DAG and shared budget
|
||||
|
||||
### Configuration
|
||||
- **archeflow:custom-archetypes** -- Create domain-specific roles
|
||||
- **archeflow:workflow-design** -- Design custom workflows with per-phase archetype assignment
|
||||
- **archeflow:domains** -- Domain adapters for writing, research, and non-code workflows
|
||||
- **archeflow:cost-tracking** -- Budget enforcement and model tier recommendations
|
||||
- **archeflow:templates** -- Template gallery for sharing workflows, teams, and setup bundles
|
||||
- **archeflow:autonomous-mode** -- Unattended overnight sessions
|
||||
|
||||
### Meta
|
||||
- **archeflow:using-archeflow** -- This skill: session-start activation and quick reference
|
||||
Even when skipping ArcheFlow, apply for non-trivial changes:
|
||||
1. Restate what you're changing
|
||||
2. Name one assumption
|
||||
3. Check if it could break anything
|
||||
|
||||