feat: workflow intelligence, quality loop, completion promises, parallel teams

Sprint 1 — Workflow Intelligence (A1-A3):
- Conditional escalation: fast→standard on 2+ CRITICALs
- Guardian fast-path: skip remaining reviewers on clean pass
- Confidence-triggered escalation: pause/upgrade/probe on low scores

Sprint 2 — Quality Loop (B1-B2, B5-B6):
- Maker self-review checklist before submitting to Check phase
- Proposal diff ("What Changed") on cycle 2+ revisions
- Convergence detection: escalate to user if same finding persists 2 cycles
- Cross-archetype dedup: merge duplicate findings from different reviewers

Sprint 3 — Completion & Verification (B3-B4):
- Completion promise: user-defined done criteria checked in Act phase
- Post-merge verification: run tests on main, auto-revert on failure

Sprint 4 — Parallel & Scale (C1-C4):
- Parallel team orchestration: 2-3 independent teams with merge gate
- Task dependency graph in autonomous queue format
- Auto-resume on interruption via .archeflow/state.json
- Budget-aware scheduling with automatic workflow downgrade
2026-04-03 06:17:53 +02:00
parent 5139f1ad89
commit 83e09b70f2
4 changed files with 162 additions and 7 deletions


@@ -147,7 +147,57 @@ Create `.archeflow/queue.md`:
- [ ] Add user API endpoints (standard) | depends: user model
- [ ] Add user UI (standard) | depends: user API endpoints
```
Dependencies are processed in order: a task with `depends: X` waits until X completes successfully. Tasks without dependencies or with resolved dependencies can run in parallel (see Parallel Team Orchestration in the orchestration skill).
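The scheduling rule above can be sketched as a small ready-set helper. This is an illustrative sketch, not the actual queue parser; the `(name, depends)` pair format is an assumption:

```python
def ready_tasks(queue, completed):
    """Return tasks whose dependencies have all completed successfully.

    queue: list of (name, depends) pairs; depends is a list of task names.
    completed: set of task names that finished successfully.
    """
    return [name for name, depends in queue
            if name not in completed and all(d in completed for d in depends)]

queue = [
    ("user model", []),
    ("user API endpoints", ["user model"]),
    ("user UI", ["user API endpoints"]),
]
# Nothing done yet: only the dependency-free task is runnable.
print(ready_tasks(queue, set()))           # ['user model']
# Once the model lands, the API task unblocks; the UI still waits.
print(ready_tasks(queue, {"user model"}))  # ['user API endpoints']
```

Calling this after every task completion yields the next parallel-safe batch.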
### With Completion Criteria
```markdown
- [ ] Fix login bug | fast | done: login_test.py passes
- [ ] Add rate limiting | standard | done: Guardian approves AND load_test.sh passes
```
Completion criteria are checked in the Act phase. If the test command fails even when reviewers approve, the task cycles back.
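As a minimal sketch of that Act-phase check (the commands here are placeholders standing in for e.g. `pytest login_test.py`), all criteria are ANDed with reviewer approval:

```python
import subprocess

def completion_met(done_cmd, reviewers_approved):
    """All criteria must pass: the done command exits 0 AND reviewers approve."""
    tests_pass = subprocess.run(done_cmd, shell=True).returncode == 0
    return tests_pass and reviewers_approved

# `true`/`false` are stand-ins for the user's real done command:
assert completion_met("true", reviewers_approved=True)
assert not completion_met("false", reviewers_approved=True)   # tests fail: cycle back
assert not completion_met("true", reviewers_approved=False)   # approval missing
```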
## Budget-Aware Scheduling
Set a token or cost budget for the session. The orchestrator tracks estimated cost per task and adapts:
```
Budget: $5.00 (or ~2M tokens)
```
| Budget Remaining | Action |
|-----------------|--------|
| > 50% | Run tasks at their selected workflow level |
| 25-50% | Downgrade `thorough` → `standard`, `standard` → `fast` |
| < 25% | Run remaining tasks as `fast` only |
| Exhausted | Stop. Log remaining tasks as "skipped — budget exhausted" |
Budget is tracked per-task in the session log. Estimated cost = agents spawned x model tier pricing.
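The downgrade table translates into a small selection function. This is a sketch; the function name and fraction-based interface are assumptions:

```python
def select_level(requested, budget_remaining_frac):
    """Pick the workflow level for the next task given remaining budget.

    Mirrors the budget table: returns None when the budget is exhausted
    (the task is logged as skipped).
    """
    downgrade = {"thorough": "standard", "standard": "fast", "fast": "fast"}
    if budget_remaining_frac <= 0:
        return None                    # Exhausted: stop, log as skipped
    if budget_remaining_frac < 0.25:
        return "fast"                  # < 25%: fast only
    if budget_remaining_frac <= 0.50:
        return downgrade[requested]    # 25-50%: one level down
    return requested                   # > 50%: run as selected
```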
## Auto-Resume on Interruption
If a session is interrupted (crash, timeout, user cancel), save state for resumption:
### On Interruption
Write `.archeflow/state.json`:
```json
{
"session_id": "...",
"current_task": 2,
"current_phase": "check",
"current_cycle": 1,
"completed_tasks": [1],
"queue": ["task3", "task4"],
"worktree_branch": "archeflow/maker-abc",
"timestamp": "2026-04-03T22:15:00Z"
}
```
### On Next Session Start
If `.archeflow/state.json` exists:
1. Report: "Found interrupted ArcheFlow session from [timestamp]. Task [N] was in [phase] phase."
2. Offer: "Resume from where we left off? Or start fresh?"
3. If resume: pick up from the saved phase. The worktree branch is still intact.
4. If fresh: clean up state file and worktrees, start over.
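A sketch of the resume check, assuming the state file has the shape shown above (function name and path handling are illustrative):

```python
import json
import os
import tempfile
from pathlib import Path

def resume_message(state_path=".archeflow/state.json"):
    """Return the resume offer for an interrupted session, or None if clean."""
    path = Path(state_path)
    if not path.exists():
        return None
    state = json.loads(path.read_text())
    return (f"Found interrupted ArcheFlow session from {state['timestamp']}. "
            f"Task {state['current_task']} was in {state['current_phase']} phase.")

# Demonstrate against a throwaway state file:
tmp = os.path.join(tempfile.mkdtemp(), "state.json")
Path(tmp).write_text(json.dumps({
    "timestamp": "2026-04-03T22:15:00Z",
    "current_task": 2,
    "current_phase": "check",
}))
print(resume_message(tmp))
# Found interrupted ArcheFlow session from 2026-04-03T22:15:00Z. Task 2 was in check phase.
```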
## Overnight Session Checklist


@@ -66,6 +66,12 @@ After all reviewers finish, compile:
|----------|----------|----------|-------------|-----|
| src/auth/handler.ts:48 | CRITICAL | reliability | Empty string bypasses validation | Add `if (!token || token.trim() === '')` guard |
### Deduplication
If two reviewers raise the same issue (same file + same category), merge:
| Guardian + Skeptic | CRITICAL | security | Input not sanitized (src/api.ts:30) | Add validation |
Use the higher severity. Don't double-count in the verdict.
### Verdict: REJECTED — 1 critical finding
→ Build cycle feedback (see orchestration skill) and feed to Plan phase
```


@@ -17,6 +17,38 @@ Assess the task and pick:
| Feature, multiple files, moderate risk | `standard` (2 cycles) |
| Security-sensitive, breaking changes, public API | `thorough` (3 cycles) |
## Workflow Adaptation Rules
The initial workflow choice is a starting point, not a commitment. These rules adapt the workflow at runtime based on signals from Creator and Guardian. Evaluate in order after each relevant phase.
### A1: Conditional Escalation (fast → standard)
**When:** Guardian rejects with 2+ CRITICAL findings in a `fast` workflow.
**Action:** Escalate to `standard` for the next cycle — add Skeptic + Sage to the reviewer roster.
**Why:** If Guardian found serious issues, more perspectives help find root causes.
**Override:** If A3 already escalated to `thorough`, do nothing (already higher).
### A2: Guardian Fast-Path (skip remaining reviewers)
**When:** Guardian finds 0 CRITICAL and 0 WARNING in a `standard` or `thorough` workflow.
**Action:** Skip Skeptic, Sage, and Trickster. Proceed directly to Act phase (merge).
**Why:** Guardian's security review is the strictest gate. Clean pass = safe to merge without additional review cost.
**Override:** Suppressed if A1 just triggered in the same cycle. Do not fast-path an escalated workflow.
**Log:** Note "Guardian fast-path taken — remaining reviewers skipped" in orchestration report.
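Rules A1 and A2 can be expressed as one decision function evaluated after Guardian's review. A minimal sketch under assumed names (the real rules live in prose above; this only illustrates the ordering and the A1-suppresses-A2 override):

```python
def adapt_workflow(level, criticals, warnings):
    """Apply A1 then A2 after Guardian reviews.

    Returns (new_level, fast_path): fast_path=True skips remaining reviewers.
    A1 escalation suppresses the fast-path in the same cycle.
    """
    if level == "fast" and criticals >= 2:
        return "standard", False          # A1: escalate for the next cycle
    if level in ("standard", "thorough") and criticals == 0 and warnings == 0:
        return level, True                # A2: Guardian fast-path
    return level, False

assert adapt_workflow("fast", 2, 0) == ("standard", False)
assert adapt_workflow("standard", 0, 0) == ("standard", True)
assert adapt_workflow("thorough", 1, 0) == ("thorough", False)
```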
### A3: Confidence-Triggered Escalation
**When:** Creator's confidence table has any axis below 0.5.
**Action by axis:**
| Axis | Score < 0.5 Action |
|------|-------------------|
| Task understanding | **Pause.** Ask user to clarify. Do not proceed until >0.5 or user overrides. |
| Solution completeness | **Upgrade to standard.** Add Explorer before proceeding. |
| Risk coverage | **Spawn mini-Explorer** for the specific risky area (parallel, 5 min max). Don't block other phases. |
Multiple axes can trigger simultaneously — handle in parallel.
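Rule A3 reduces to a lookup over the confidence table. A sketch, assuming these axis key names (the actual table's field names may differ):

```python
def a3_actions(confidence):
    """Map each axis scoring below 0.5 to its escalation action (rule A3).

    Multiple axes can trigger at once; callers handle the actions in parallel.
    """
    rules = {
        "task_understanding": "pause and ask user to clarify",
        "solution_completeness": "upgrade to standard, add Explorer",
        "risk_coverage": "spawn mini-Explorer for the risky area",
    }
    return [action for axis, action in rules.items()
            if confidence.get(axis, 1.0) < 0.5]

assert a3_actions({"task_understanding": 0.9, "risk_coverage": 0.3}) == \
    ["spawn mini-Explorer for the risky area"]
```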
## Step 1: Plan Phase
Spawn agents sequentially — Creator needs Explorer's findings.
@@ -97,7 +129,14 @@ Agent(
3. Commit with descriptive messages
4. Run existing tests — nothing may break
5. If the proposal is unclear, implement your best interpretation and note it
Do NOT skip tests. Do NOT refactor unrelated code.
BEFORE finishing — Self-Review Checklist:
1. Did I change ALL files listed in the proposal's Changes section?
2. Did I add tests for each behavioral change?
3. Are there files in my diff NOT listed in the proposal? If yes, revert them.
4. Do all existing tests still pass?
Report any gaps in your Implementation summary.",
  isolation: "worktree",
  mode: "bypassPermissions"
)
@@ -199,13 +238,27 @@ Agent(
## Step 4: Act Phase
Collect all reviewer outputs and decide.
### Completion Promise (optional)
If the user defined explicit done criteria with the task, check them now:
```
Completion criteria: <test command passes> AND <Guardian approves>
Example: "done when pytest passes and Guardian approves with 0 CRITICAL"
```
If completion criteria are defined, **all criteria must pass** — reviewer approval alone is not sufficient. If tests fail but reviewers approved, cycle back with "tests failing" as feedback to Creator.
### All Approved (and completion criteria met)
1. Merge the Maker's worktree branch into the target branch
2. **Post-merge verification:** Run the project's test suite on the merged branch
   - Tests pass → proceed to step 3
   - Tests fail → **auto-revert** the merge commit, report the failure, and cycle back with "integration test failure on main" as feedback
3. Report: what was implemented, what was reviewed, any warnings noted
4. Clean up the worktree
5. Record metrics (see Orchestration Metrics)
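The merge-verify-revert flow can be sketched as follows. The command runner is injected so the flow is testable without a real repository, and the branch and test-script names are illustrative; `git revert -m 1` undoes a merge commit while keeping the pre-merge mainline as the parent:

```python
def merge_and_verify(branch, run):
    """Merge the Maker's branch, verify on the result, auto-revert on failure.

    `run` executes a command list and returns its exit code.
    """
    run(["git", "merge", "--no-ff", branch])
    if run(["./run_tests.sh"]) != 0:
        # Revert the merge commit itself; -m 1 selects the mainline parent.
        run(["git", "revert", "-m", "1", "--no-edit", "HEAD"])
        return "reverted: integration test failure on main"
    return "merged"

# Simulate a failing post-merge test suite:
calls = []
def failing_run(cmd):
    calls.append(cmd[1] if cmd[0] == "git" else cmd[0])
    return 1 if cmd[0] == "./run_tests.sh" else 0

assert merge_and_verify("archeflow/maker-abc", failing_run).startswith("reverted")
assert calls == ["merge", "./run_tests.sh", "revert"]
```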
### Issues Found (and cycles remaining)
1. Build structured feedback using the Cycle Feedback Protocol below
@@ -266,6 +319,24 @@ Compare cycle N findings against cycle N-1:
This prevents regression and gives the Creator/Maker a clear list of what to address.
### 4. Convergence Detection
If the **same finding** (same category + same file location) appears **unresolved in 2 consecutive cycles**, escalate to user:
> "Finding persists across 2 cycles: [Guardian] CRITICAL security — SQL injection in src/auth.ts:48. This may need human judgment or a different approach."
Do not cycle again blindly. The issue is likely structural (wrong design, not wrong implementation) and needs human input.
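Keying findings by (category, location) makes persistence a set intersection. A minimal sketch of that check:

```python
def persistent_findings(prev_cycle, curr_cycle):
    """Return findings unresolved across two consecutive cycles.

    A finding is keyed by (category, location); anything present in both
    cycles is escalated to the user rather than cycled again.
    """
    return sorted(set(prev_cycle) & set(curr_cycle))

prev = {("security", "src/auth.ts:48"), ("style", "src/ui.ts:10")}
curr = {("security", "src/auth.ts:48"), ("perf", "src/db.ts:5")}
assert persistent_findings(prev, curr) == [("security", "src/auth.ts:48")]
```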
### 5. Cross-Archetype Dedup
If two reviewers raise the same issue (same file + same category + similar description), merge into one finding in the consolidated output:
```
| Guardian + Skeptic | CRITICAL | security | Input not sanitized (src/api.ts:30) | Add validation |
```
Don't double-count in severity tallies. Route to the higher-priority destination (Creator over Maker).
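The merge rule above (same file + category, keep the higher severity, join reviewer names) can be sketched as a dictionary fold over the raw findings; the tuple layout is an assumption for illustration:

```python
SEVERITY_RANK = {"CRITICAL": 2, "WARNING": 1, "INFO": 0}

def dedup(findings):
    """Merge findings sharing (location, category): keep the higher severity
    and join reviewer names so tallies are not double-counted.

    findings: iterable of (reviewer, severity, category, location).
    """
    merged = {}
    for reviewer, severity, category, location in findings:
        key = (location, category)
        if key in merged:
            prev_rev, prev_sev = merged[key]
            merged[key] = (prev_rev + " + " + reviewer,
                           max(prev_sev, severity, key=SEVERITY_RANK.get))
        else:
            merged[key] = (reviewer, severity)
    return merged

out = dedup([
    ("Guardian", "CRITICAL", "security", "src/api.ts:30"),
    ("Skeptic", "WARNING", "security", "src/api.ts:30"),
])
assert out[("src/api.ts:30", "security")] == ("Guardian + Skeptic", "CRITICAL")
```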
---
## Orchestration Metrics
@@ -364,6 +435,31 @@ On detection: apply correction prompt from `archeflow:shadow-detection` skill.
---
## Parallel Team Orchestration
When running multiple independent tasks, spawn parallel ArcheFlow teams. Each team runs its own PDCA cycle on a separate worktree.
### Rules
1. **Non-overlapping file scope:** Each team must work on different files. If two tasks touch the same file, run them sequentially.
2. **Independent worktrees:** Each team's Maker gets its own worktree branch (`archeflow/team-1-maker`, `archeflow/team-2-maker`).
3. **First-finished-first-merged:** Teams merge in completion order. Later teams rebase onto the updated main before their own merge.
4. **Merge conflict handling:** If rebase fails, the later team re-runs its Check phase against the merged main. If conflicts are structural, escalate to user.
5. **Max 3 parallel teams:** More causes diminishing returns and merge headaches.
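Rules 1 and 5 amount to batching tasks by file-scope overlap. A greedy sketch under assumed names (each task carries its declared file set; overlapping tasks fall into later, sequential batches):

```python
def parallel_batches(tasks, max_teams=3):
    """Group tasks with non-overlapping file scopes into batches of at most
    max_teams; a task that overlaps every open batch starts a new one.

    tasks: list of (name, files) pairs, files being a set of paths.
    """
    batches = []
    for name, files in tasks:
        placed = False
        for batch in batches:
            if len(batch) < max_teams and all(
                    not (files & other) for _, other in batch):
                batch.append((name, files))
                placed = True
                break
        if not placed:
            batches.append([(name, files)])
    return batches

tasks = [
    ("pagination fix", {"src/list.ts"}),
    ("JWT auth", {"src/auth.ts"}),
    ("auth logging", {"src/auth.ts", "src/log.ts"}),
]
# The two auth tasks share src/auth.ts, so they land in separate batches.
batches = parallel_batches(tasks)
```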
### Spawning Parallel Teams
```
# Launch 2-3 teams in a single message with multiple Agent calls:
Agent(description: "🏗️ Team 1: pagination fix (fast)", ...)
Agent(description: "🏗️ Team 2: JWT auth (standard)", ...)
Agent(description: "🏗️ Team 3: logging refactor (fast)", ...)
```
Each team follows the full PDCA steps independently. The orchestrator monitors all teams and handles merges.
---
## Orchestration Report
After completion, summarize:


@@ -77,6 +77,9 @@ When the Creator receives structured feedback from a prior cycle, the proposal m
```markdown
## Proposal: <task> (Revision — Cycle N)
### What Changed (vs. prior proposal)
- <brief delta: what was added, removed, or redesigned>
### Prior Feedback Response
| Issue | Source | Action | Rationale |
|-------|--------|--------|-----------|