From ee5dfa70b8a7ce165ad9e0bdb23e610634fa2652 Mon Sep 17 00:00:00 2001
From: Christian Nennemann
Date: Fri, 3 Apr 2026 11:41:06 +0200
Subject: [PATCH] feat: add multi-project orchestration with dependency DAG and shared budget

---
 skills/multi-project/SKILL.md | 629 ++++++++++++++++++++++++++++++++++
 1 file changed, 629 insertions(+)
 create mode 100644 skills/multi-project/SKILL.md

diff --git a/skills/multi-project/SKILL.md b/skills/multi-project/SKILL.md
new file mode 100644
index 0000000..4e94421
--- /dev/null
+++ b/skills/multi-project/SKILL.md
@@ -0,0 +1,629 @@
+---
+name: multi-project
+description: |
+  Multi-project orchestration for workspaces with 20+ repos. Builds a dependency DAG across
+  projects, runs independent sub-runs in parallel, shares artifacts between dependent projects,
+  and enforces a shared budget. Each sub-run uses the standard `run` skill internally.
+  User: "archeflow:multi-project" with a multi-run.yaml
+  User: "Run this across archeflow, colette, and giesing"
+  User: "archeflow:multi-project --dry-run"
+---
+
+# Multi-Project Orchestration
+
+Coordinates ArcheFlow runs across multiple projects in a workspace. Each project gets its own
+PDCA run (via the standard `run` skill), but dependencies between projects are respected, artifacts
+are shared, and budget is tracked globally.
+
+## Prerequisites
+
+Load these skills (they are referenced throughout):
+- `archeflow:run` — single-project PDCA execution loop
+- `archeflow:process-log` — event schema and DAG parent rules
+- `archeflow:artifact-routing` — artifact naming, context injection, cycle archiving
+- `archeflow:cost-tracking` — cost aggregation and budget enforcement
+- `archeflow:domains` — domain detection per project
+
+## Invocation
+
+```
+archeflow:multi-project                          # Read from .archeflow/multi-run.yaml
+archeflow:multi-project --config path/to.yaml    # Explicit config file
+archeflow:multi-project --dry-run                # Plan phase only for all projects, show cost estimate
+archeflow:multi-project --resume                 # Resume a failed/paused multi-run
+```
+
+---
+
+## Multi-Run Definition
+
+A multi-run is defined in YAML, either in `.archeflow/multi-run.yaml` or passed via `--config`.
+
+```yaml
+name: "giesing-gschichten-v2"
+description: "Write second story with improved ArcheFlow + Colette integration"
+
+projects:
+  - id: archeflow
+    path: "../archeflow"           # Relative to workspace root, or absolute
+    task: "Add memory injection to run skill"
+    workflow: fast                 # fast | standard | thorough (optional, auto-select if omitted)
+    domain: code                   # Optional, auto-detected if omitted
+    depends_on: []                 # No dependencies — can start immediately
+
+  - id: colette
+    path: "../writing.colette"
+    task: "Add story-specific voice validation command"
+    workflow: standard
+    domain: code
+    depends_on: []                 # Independent of archeflow — runs in parallel
+
+  - id: giesing
+    path: "."
+    task: "Write story #2 using improved tools"
+    workflow: kurzgeschichte
+    domain: writing
+    depends_on: [archeflow, colette]  # Waits for both to complete
+
+budget:
+  total_usd: 15.00               # Hard cap — stops all projects when exceeded
+  per_project_usd: 10.00         # Soft cap — warns but does not stop
+
+parallel: true                   # Run independent projects concurrently (default: true)
+```
+
+### Definition Rules
+
+- `id` must be unique within the multi-run.
+- `path` is resolved relative to the directory containing the YAML file unless absolute.
+- `depends_on` references other project `id` values. Cycles are rejected at validation time.
+- `workflow` and `domain` are optional. If omitted, the `run` skill auto-selects per project.
+- At least one project must have an empty `depends_on` (otherwise the DAG has no entry point).
+
+---
+
+## Workspace Registry Integration
+
+If `docs/project-registry.md` exists at the workspace root, the multi-project skill can:
+
+1. **Auto-discover paths:** When `path` is omitted from a project entry, look up the project `id` in the registry to find its directory.
+2. **Validate existence:** Before starting, verify that every project path exists on disk. Abort with a clear error if a path is missing.
+3. **Show registry status:** In the progress table, include the project's current sprint goal from the registry alongside the multi-run status.
+4. **Update registry:** After the multi-run completes, update each project's status in the registry if meaningful changes were made (new features, completed sprint goals).
+
+---
+
+## Execution Steps
+
+### 0. Validate and Initialize
+
+**0a. Parse and validate the multi-run definition:**
+
+```
+1. Read the YAML file.
+2. Validate all required fields (name, projects with id/path/task).
+3. Resolve all paths to absolute paths.
+4. Verify each path exists on disk.
+5. Build the dependency DAG.
+6. Check for cycles — abort if any detected.
+7. Identify the entry-point projects (depends_on is empty).
+8. Verify at least one entry-point exists.
+```
+
+**0b. Generate multi-run ID and directory structure:**
+
+```bash
+MULTI_RUN_ID="$(date -u +%Y-%m-%d)-${name}"
+
+# Master event file
+mkdir -p .archeflow/events
+touch .archeflow/events/${MULTI_RUN_ID}.jsonl
+
+# Cross-project artifact directory
+mkdir -p .archeflow/artifacts/${MULTI_RUN_ID}
+for project in ${PROJECT_IDS}; do
+  mkdir -p .archeflow/artifacts/${MULTI_RUN_ID}/${project}
+done
+
+# Progress file
+touch .archeflow/multi-progress.md
+```
+
+**0c. Emit `multi.start`:**
+
+```jsonl
+{"ts":"...","run_id":"","seq":1,"parent":[],"type":"multi.start","phase":"init","agent":null,"data":{"name":"giesing-v2","description":"...","projects":["archeflow","colette","giesing"],"parallel":true,"budget_total_usd":15.00,"dag":{"archeflow":[],"colette":[],"giesing":["archeflow","colette"]}}}
+```
+
+**Track state throughout the multi-run:**
+- `MULTI_RUN_ID` — unique multi-run identifier
+- `MULTI_SEQ` — master event sequence counter
+- `PROJECT_STATUS` — map of project_id to status (`pending | running | completed | failed | blocked | skipped`)
+- `PROJECT_RUN_IDS` — map of project_id to its sub-run_id
+- `TOTAL_COST` — running cost total across all projects
+- `REMAINING_BUDGET` — budget minus total cost
+
+---
+
+### 1. Dependency Resolution
+
+Build a topological sort of the project DAG. This determines execution order.
+
+```
+Given:
+  archeflow: depends_on=[]
+  colette:   depends_on=[]
+  giesing:   depends_on=[archeflow, colette]
+
+Topological layers:
+  Layer 0 (immediate): [archeflow, colette]   # No deps, start now
+  Layer 1:             [giesing]               # Depends on Layer 0
+```
+
+**Algorithm:**
+1. Find all projects with zero unmet dependencies. These form the current layer.
+2. When a project completes, remove it from the dependency lists of all downstream projects.
+3. Any project whose dependency list becomes empty moves to the ready queue.
+4. Repeat until all projects are complete, failed, or blocked.
+
+**Cycle detection:** Before starting, verify the DAG is acyclic. Use Kahn's algorithm — if after processing all nodes the sorted list is shorter than the project list, there is a cycle. Report which projects form the cycle and abort.
+
+---
+
+### 2. Parallel Execution
+
+For each project in the ready queue, start a sub-run. Independent projects run concurrently.
+
+**Starting a sub-run:**
+
+```
+For each ready project:
+  1. Set PROJECT_STATUS[project_id] = "running"
+  2. Generate sub-run ID: MULTI_RUN_ID/project_id
+     (e.g., "2026-04-03-giesing-v2/archeflow")
+  3. Emit project.start to master event file
+  4. cd into the project's path
+  5. Invoke archeflow:run with:
+     - run_id = MULTI_RUN_ID/project_id
+     - workflow = project.workflow (or auto-select)
+     - domain = project.domain (or auto-detect)
+     - budget = min(per_project_budget, remaining_total_budget)
+     - artifact_dir = .archeflow/artifacts/MULTI_RUN_ID/project_id/
+  6. The sub-run emits its own events to its own JSONL file
+     inside the project's directory (standard run behavior)
+```
+
+**Concurrency model:**
+
+When `parallel: true` (default), spawn independent projects as parallel subagents:
+
+```
+Agent(
+  description: "Multi-project sub-run: ",
+  prompt: "Run archeflow:run in with task: .
+    Run ID: /
+    Workflow:
+    Domain:
+    Budget: $
+    Save artifacts to: .archeflow/artifacts///
+    When complete, report: status, cost, artifact list, and any issues.",
+  isolation: "worktree",
+  mode: "bypassPermissions"
+)
+```
+
+Launch all Layer 0 projects simultaneously. As each completes, check if any Layer 1+ projects become unblocked.
+
+When `parallel: false`, run projects sequentially in topological order. Still respect dependencies — a project does not start until all its dependencies have completed.
+
+---
+
+### 3. Master Events
+
+All multi-run-level events are written to `.archeflow/events/.jsonl`. These track the overall orchestration, not individual PDCA phases (those go to each project's own event file).
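
A minimal sketch of how a coordinator might append master events to the JSONL file, assuming the field set this document names for the shared schema (ts, run_id, seq, parent, type, phase, agent, data). The `MasterLog` class, its method names, and passing `parent` explicitly are hypothetical choices for illustration; the actual parent rules live in `archeflow:process-log` and are not reproduced here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


class MasterLog:
    """Append-only JSONL writer for multi-run-level events (hypothetical helper)."""

    def __init__(self, events_dir, multi_run_id):
        self.run_id = multi_run_id
        self.path = Path(events_dir) / f"{multi_run_id}.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.seq = 0  # plays the role of MULTI_SEQ

    def emit(self, event_type, phase, data, parent=None):
        """Write one event line and return the event dict."""
        self.seq += 1
        event = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "run_id": self.run_id,
            "seq": self.seq,
            "parent": parent or [],  # assumption: caller supplies parent seqs
            "type": event_type,
            "phase": phase,
            "agent": None,
            "data": data,
        }
        # One JSON object per line, appended so earlier events are never rewritten.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")
        return event
```

Appending one line per event keeps the file usable for resume: replaying it top to bottom reconstructs the project states without any other bookkeeping.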
+
+#### Master Event Types
+
+| Event | When | Key Data |
+|-------|------|----------|
+| `multi.start` | Multi-run begins | Project list, DAG, budget |
+| `project.start` | A sub-run launches | project_id, run_id, path |
+| `project.complete` | A sub-run finishes successfully | project_id, status, cost, artifacts |
+| `project.failed` | A sub-run fails | project_id, error, cost_so_far |
+| `project.blocked` | A dependency failed, blocking this project | project_id, blocked_by |
+| `project.unblocked` | All dependencies met, project can start | project_id, unblocked_by |
+| `project.skipped` | User chose to skip a blocked project | project_id, reason |
+| `budget.warning` | Budget threshold crossed | spent, budget, percent |
+| `budget.exceeded` | Hard budget cap hit | spent, budget, halted_projects |
+| `multi.complete` | All projects done (or halted) | status, projects_completed, total_cost |
+
+#### Example Master Event Stream
+
+```jsonl
+{"seq":1,"type":"multi.start","phase":"init","data":{"name":"giesing-v2","projects":["archeflow","colette","giesing"],"parallel":true,"budget_total_usd":15.00}}
+{"seq":2,"type":"project.start","phase":"run","data":{"project":"archeflow","run_id":"2026-04-03-giesing-v2/archeflow","path":"/home/c/projects/archeflow"}}
+{"seq":3,"type":"project.start","phase":"run","data":{"project":"colette","run_id":"2026-04-03-giesing-v2/colette","path":"/home/c/projects/writing.colette"}}
+{"seq":4,"type":"project.complete","phase":"run","data":{"project":"archeflow","status":"completed","run_id":"2026-04-03-giesing-v2/archeflow","cost_usd":1.20,"artifacts":["plan-explorer.md","plan-creator.md","do-maker.md","check-guardian.md"]}}
+{"seq":5,"type":"project.complete","phase":"run","data":{"project":"colette","status":"completed","run_id":"2026-04-03-giesing-v2/colette","cost_usd":1.80,"artifacts":["plan-creator.md","do-maker.md","check-guardian.md","check-sage.md"]}}
+{"seq":6,"type":"project.unblocked","phase":"run","data":{"project":"giesing","unblocked_by":["archeflow","colette"]}}
+{"seq":7,"type":"project.start","phase":"run","data":{"project":"giesing","run_id":"2026-04-03-giesing-v2/giesing","path":"/home/c/projects/book.giesing-gschichten"}}
+{"seq":8,"type":"project.complete","phase":"run","data":{"project":"giesing","status":"completed","run_id":"2026-04-03-giesing-v2/giesing","cost_usd":3.50,"artifacts":["plan-explorer.md","plan-creator.md","do-maker.md","check-guardian.md","check-sage.md"]}}
+{"seq":9,"type":"multi.complete","phase":"done","data":{"status":"completed","projects_completed":3,"projects_failed":0,"total_cost_usd":6.50,"budget_remaining_usd":8.50}}
+```
+
+---
+
+### 4. Cross-Project Artifacts
+
+When project B depends on project A, B's agents can access A's artifacts. This is the primary mechanism for cross-project information flow.
+
+#### Artifact Directory Layout
+
+```
+.archeflow/artifacts//
+├── archeflow/                    # Sub-run artifacts from archeflow
+│   ├── plan-explorer.md
+│   ├── plan-creator.md
+│   ├── do-maker.md
+│   ├── do-maker-files.txt
+│   └── check-guardian.md
+├── colette/                      # Sub-run artifacts from colette
+│   ├── plan-creator.md
+│   ├── do-maker.md
+│   └── check-sage.md
+└── giesing/                      # Sub-run artifacts from giesing (depends on both)
+    ├── plan-explorer.md          # Explorer can reference upstream artifacts
+    ├── plan-creator.md
+    ├── do-maker.md
+    └── check-guardian.md
+```
+
+#### Cross-Project Context Injection
+
+When a dependent project's sub-run starts, inject upstream artifact summaries into the Explorer's prompt:
+
+```markdown
+## Upstream Project Results
+
+### archeflow (completed)
+Summary: Added memory injection to run skill.
+Key artifacts:
+- plan-creator.md:
+- do-maker.md:
+
+### colette (completed)
+Summary: Added story-specific voice validation command.
+Key artifacts:
+- plan-creator.md:
+- do-maker.md:
+
+Use these results as context. The changes from these projects are available in their
+respective directories and have been committed to their branches.
+```
+
+**Rules for cross-project injection:**
+- Only inject summaries, not full artifacts (keep context small).
+- If an upstream artifact is large (>200 lines), extract the summary/overview section only.
+- The dependent project's Explorer has filesystem access to read full upstream artifacts if needed.
+- Cross-project injection happens ONLY in the Plan phase (Explorer and Creator). The Maker works from the Creator's proposal, which already incorporates upstream context.
+
+---
+
+### 5. Budget Coordination
+
+The multi-run has a shared budget across all projects.
+
+#### Budget Hierarchy
+
+```
+total_usd: 15.00        # Hard cap — stops ALL projects when exceeded
+per_project_usd: 10.00  # Soft cap — warns but does not stop individual project
+```
+
+#### Budget Tracking
+
+Maintain a running total across all sub-runs:
+
+```
+TOTAL_COST = sum of all project costs reported in project.complete events
+REMAINING  = total_usd - TOTAL_COST
+```
+
+#### Budget Enforcement Points
+
+1. **Before starting a sub-run:**
+   - Estimate the sub-run cost (based on workflow and domain).
+   - If estimated cost > REMAINING: warn and ask user (attended) or halt (autonomous).
+
+2. **After each sub-run completes:**
+   - Update TOTAL_COST with actual cost from the sub-run.
+   - If TOTAL_COST > total_usd * warn_at_percent: emit `budget.warning`.
+   - If TOTAL_COST > total_usd: emit `budget.exceeded`, halt remaining projects.
+
+3. **Per-project soft cap:**
+   - Each sub-run receives `min(per_project_usd, REMAINING)` as its budget.
+   - The `run` skill's own budget enforcement handles the per-project cap.
+   - If a project exceeds per_project_usd, it warns but continues (soft cap).
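
The enforcement points above can be sketched as a small bookkeeping object. This is an illustration under stated assumptions, not the `cost-tracking` skill's implementation: the `Budget` class and its method names are hypothetical, `warn_at_percent` is treated as a value on a 0 to 100 scale, and its default of 75 is invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    total_usd: float               # hard cap across all projects
    per_project_usd: float         # soft cap handed to each sub-run
    warn_at_percent: float = 75.0  # assumed default, 0-100 scale
    spent_usd: float = 0.0         # plays the role of TOTAL_COST

    @property
    def remaining_usd(self):
        """REMAINING, clamped at zero once the hard cap is passed."""
        return max(self.total_usd - self.spent_usd, 0.0)

    def sub_run_budget(self):
        """Budget passed to a sub-run: min(per_project_usd, REMAINING)."""
        return min(self.per_project_usd, self.remaining_usd)

    def can_start(self, estimated_usd):
        """Pre-start check: does the estimate fit in what is left?"""
        return estimated_usd <= self.remaining_usd

    def record(self, cost_usd):
        """Add a finished sub-run's cost; return the event type to emit, if any."""
        self.spent_usd += cost_usd
        if self.spent_usd > self.total_usd:
            return "budget.exceeded"
        if self.spent_usd > self.total_usd * self.warn_at_percent / 100:
            return "budget.warning"
        return None
```

Note that this sketch returns `budget.warning` on every call once the threshold is crossed; a real coordinator would probably remember that it has already warned.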
+
+#### Budget Events
+
+```jsonl
+{"seq":5,"type":"budget.warning","data":{"spent_usd":11.50,"budget_usd":15.00,"percent":77,"message":"Budget 77% consumed"}}
+{"seq":8,"type":"budget.exceeded","data":{"spent_usd":15.30,"budget_usd":15.00,"halted_projects":["giesing"],"message":"Hard budget cap exceeded. Halting remaining projects."}}
+```
+
+---
+
+### 6. Failure Handling
+
+Failures in one project affect downstream projects but not independent ones.
+
+#### Failure Scenarios
+
+| Scenario | Action |
+|----------|--------|
+| Project fails (run error, test failure, max cycles) | Mark as `failed` in master events. Independent projects continue. |
+| Dependency of project X failed | Mark X as `blocked`. Do not start X. |
+| Budget exceeded mid-run | Halt the current project. Mark remaining as `blocked`. |
+| All entry-point projects fail | Entire multi-run fails. No downstream projects can start. |
+
+#### Blocked Project Resolution
+
+When a project is blocked because a dependency failed, offer three options:
+
+1. **Skip:** Mark the blocked project as `skipped`. Continue with other independent projects.
+2. **Retry:** Re-run the failed dependency. If it succeeds, unblock downstream projects.
+3. **Abort:** Stop the entire multi-run. Report what completed and what did not.
+
+In **autonomous mode**, the default action is `skip` — blocked projects are skipped, independent projects continue, and the multi-run completes with partial results.
+
+In **attended mode**, prompt the user with the options above.
+
+#### Failure Events
+
+```jsonl
+{"seq":4,"type":"project.failed","data":{"project":"archeflow","error":"Max cycles reached with unresolved CRITICAL findings","cost_usd":2.10}}
+{"seq":5,"type":"project.blocked","data":{"project":"giesing","blocked_by":["archeflow"],"reason":"Dependency 'archeflow' failed"}}
+```
+
+---
+
+### 7. Progress Tracking
+
+Maintain a live progress file at `.archeflow/multi-progress.md`. Update it after every project state change.
+
+````markdown
+# Multi-Run: giesing-v2
+Started: 2026-04-03T14:00:00Z
+
+| Project | Status | Domain | Phase | Detail |
+|---------|--------|--------|-------|--------|
+| archeflow | completed | code | -- | 1 cycle, $1.20 |
+| colette | running | code | DO | maker drafting |
+| giesing | blocked | writing | -- | waiting for colette |
+
+## Budget
+| | Amount |
+|---|--------|
+| Spent | $3.00 |
+| Budget | $15.00 |
+| Remaining | $12.00 |
+| Utilization | 20% |
+
+## Dependency Graph
+```
+archeflow ----\
+               +---> giesing
+colette ------/
+```
+
+## Timeline
+- 14:00:00 — Started archeflow, colette (parallel)
+- 14:05:23 — archeflow completed ($1.20, 1 cycle)
+- 14:06:10 — colette DO phase, maker drafting
+````
+
+Update this file after:
+- A project starts
+- A project changes phase (via status polling or sub-agent reporting)
+- A project completes or fails
+- A project becomes unblocked
+- Budget threshold is crossed
+
+---
+
+### 8. Completion
+
+When all projects are complete (or blocked/skipped with no more actionable items):
+
+**8a. Emit `multi.complete`:**
+
+```jsonl
+{"seq":9,"type":"multi.complete","phase":"done","data":{"status":"completed","projects_completed":3,"projects_failed":0,"projects_skipped":0,"total_cost_usd":6.50,"budget_remaining_usd":8.50,"duration_ms":600000}}
+```
+
+Status values:
+- `completed` — all projects finished successfully
+- `partial` — some projects completed, some failed/skipped
+- `failed` — no projects completed successfully
+- `halted` — stopped due to budget or user abort
+
+**8b. Generate multi-run report:**
+
+```markdown
+# Multi-Run Report: giesing-v2
+
+## Summary
+| Metric | Value |
+|--------|-------|
+| Projects | 3 |
+| Completed | 3 |
+| Failed | 0 |
+| Total cost | $6.50 / $15.00 |
+| Duration | 10m 00s |
+
+## Per-Project Results
+### archeflow
+- **Status:** completed
+- **Task:** Add memory injection to run skill
+- **Workflow:** fast (1 cycle)
+- **Cost:** $1.20
+- **Key artifacts:** plan-creator.md, do-maker.md
+
+### colette
+- **Status:** completed
+- **Task:** Add story-specific voice validation command
+- **Workflow:** standard (1 cycle)
+- **Cost:** $1.80
+- **Key artifacts:** plan-creator.md, do-maker.md, check-sage.md
+
+### giesing
+- **Status:** completed
+- **Task:** Write story #2 using improved tools
+- **Workflow:** kurzgeschichte (2 cycles)
+- **Cost:** $3.50
+- **Key artifacts:** plan-explorer.md, do-maker.md, check-guardian.md
+
+## Dependency Graph Execution
+archeflow (Layer 0) ----> completed
+colette (Layer 0) ----> completed
+giesing (Layer 1) ----> unblocked ----> completed
+
+## Cost Breakdown
+| Project | Plan | Do | Check | Total |
+|---------|------|----|-------|-------|
+| archeflow | $0.20 | $0.60 | $0.40 | $1.20 |
+| colette | $0.30 | $0.80 | $0.70 | $1.80 |
+| giesing | $0.50 | $2.00 | $1.00 | $3.50 |
+| **Total** | **$1.00** | **$3.40** | **$2.10** | **$6.50** |
+```
+
+**8c. Update master event index:**
+
+Append to `.archeflow/events/index.jsonl`:
+
+```jsonl
+{"run_id":"2026-04-03-giesing-v2","ts":"2026-04-03T14:10:00Z","type":"multi","task":"Write second story with improved ArcheFlow + Colette integration","status":"completed","projects":3,"total_cost_usd":6.50}
+```
+
+**8d. Update workspace registry (if applicable):**
+
+If `docs/project-registry.md` exists and project statuses changed meaningfully, update the registry entries for affected projects.
+
+---
+
+## Dry-Run Mode
+
+When `--dry-run` is specified:
+
+1. Validate the multi-run definition (DAG, paths, budget).
+2. For each project (in topological order), run `archeflow:run --dry-run` to get a cost estimate and plan preview.
+3. Display a summary:
+
+```
+Multi-Run Dry Run: giesing-v2
+  Projects: 3
+  Dependency layers: 2
+  Parallel execution: yes
+
+  Layer 0 (parallel):
+    archeflow  — fast workflow, code domain
+                 Estimated cost: $0.50-1.50
+    colette    — standard workflow, code domain
+                 Estimated cost: $1.00-3.00
+
+  Layer 1 (after Layer 0):
+    giesing    — kurzgeschichte workflow, writing domain
+                 Estimated cost: $2.00-5.00
+
+  Total estimated cost: $3.50-9.50
+  Budget: $15.00 (sufficient)
+
+  Proceed? [y/n]
+```
+
+4. Do NOT emit `multi.complete`. The multi-run is paused.
+5. If user says yes, start the full multi-run using the validated config.
+
+---
+
+## Resume Mode
+
+When `--resume ` is specified:
+
+1. Read the master event file `.archeflow/events/.jsonl`.
+2. Reconstruct `PROJECT_STATUS` from events (which projects completed, failed, are pending).
+3. Identify resumable projects:
+   - `failed` projects can be retried.
+   - `blocked` projects whose blockers are now `completed` (e.g., after manual fix) can start.
+   - `pending` projects that were never started can start if their deps are met.
+4. Display current state and ask for confirmation.
+5. Continue the multi-run from where it left off, appending to the existing master event file.
+
+Resume emits a `multi.resume` event:
+
+```jsonl
+{"seq":10,"type":"multi.resume","phase":"init","data":{"resumed_from":"2026-04-03-giesing-v2","projects_completed":["archeflow"],"projects_to_run":["colette","giesing"]}}
+```
+
+---
+
+## Integration with Existing Skills
+
+| Skill | Integration Point |
+|-------|-------------------|
+| `run` | Each sub-run is a standard `archeflow:run` invocation. The multi-project skill wraps and coordinates multiple runs. |
+| `process-log` | Master events follow the same schema (ts, run_id, seq, parent, type, phase, agent, data). Sub-run events use the standard event types. |
+| `artifact-routing` | Each sub-run follows standard artifact routing internally. Cross-project artifacts follow the injection rules in Section 4. |
+| `cost-tracking` | Per-project costs come from sub-run `run.complete` events. The multi-project skill aggregates them and enforces the shared budget. |
+| `domains` | Each project auto-detects its domain independently. Different projects in the same multi-run can have different domains. |
+| `git-integration` | Each sub-run manages its own branch. The multi-project skill does not merge across repos — each project's Act phase handles its own merge. |
+| `autonomous-mode` | Multi-project runs are autonomous-mode-friendly. Budget enforcement is strict (halt, don't prompt). Blocked projects are skipped. |
+
+---
+
+## Progress Display
+
+Throughout the multi-run, display live progress:
+
+```
+━━━ ArcheFlow Multi-Run: giesing-v2 ━━━━━━━━━━━━━━━━━━━
+Projects: 3 | Budget: $15.00 | Parallel: yes
+
+[archeflow]  fast/code              -> running  (Plan: Creator designing...)
+[colette]    standard/code          -> running  (Do: Maker implementing...)
+[giesing]    kurzgeschichte/writing -> blocked  (waiting: archeflow, colette)
+
+Cost: $1.80 / $15.00 (12%)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
+
+Update the display when:
+- A project changes state (start, phase change, complete, fail, unblock)
+- Budget thresholds are crossed
+
+---
+
+## Error Handling
+
+| Error | Response |
+|-------|----------|
+| YAML parse error | Abort before starting. Report the parse error with line number. |
+| Dependency cycle detected | Abort. Report which projects form the cycle. |
+| Project path does not exist | Abort. Report the missing path. |
+| Sub-run agent fails to return | Mark project as failed (5-min timeout per the `run` skill). Continue independent projects. |
+| Master event write fails | Log warning. Continue orchestration. Events are observation, not control flow. |
+| Artifact directory creation fails | Abort the affected project. This is blocking for cross-project artifact sharing. |
+| Budget exceeded mid-project | Halt that project immediately. Emit `budget.exceeded`. Skip downstream dependents. |
+
+---
+
+## Design Principles
+
+1. **Each project is autonomous.** Sub-runs use the standard `run` skill without modification. The multi-project skill is a coordinator, not a replacement.
+2. **DAG over sequence.** Dependencies are declared, not implied by order. Independent projects always run in parallel when possible.
+3. **Shared budget, independent domains.** Budget is global, but each project detects its own domain, selects its own workflow, and manages its own artifacts.
+4. **Fail forward.** A failure in one project does not halt independent projects. Only downstream dependents are blocked.
+5. **Artifacts are the interface.** Projects communicate through saved artifacts, not shared memory or direct agent-to-agent messaging.
+6. **Resume over restart.** Multi-runs can be resumed from any point. Master events provide enough state to reconstruct progress.
+7. **Registry-aware.** When a workspace registry exists, use it for discovery and keep it updated. When it does not exist, everything still works.
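
As an illustration of the "DAG over sequence" principle and the layering described in Section 1, the following is a rough sketch of a Kahn-style layered sort with cycle detection. The function name `topological_layers` and the choice to raise `ValueError` are assumptions made for this example; the skill itself only describes the algorithm in prose.

```python
def topological_layers(dag):
    """Group project ids into layers that can run in parallel.

    `dag` maps each project id to the list of ids it depends on,
    in the same shape as the `dag` field of the multi.start event.
    """
    unmet = {pid: set(deps) for pid, deps in dag.items()}

    # Reject depends_on entries that name a project not in the run.
    unknown = {d for deps in unmet.values() for d in deps} - set(unmet)
    if unknown:
        raise ValueError(f"unknown dependency ids: {sorted(unknown)}")

    layers = []
    while unmet:
        # Projects with zero unmet dependencies form the current layer.
        ready = sorted(pid for pid, deps in unmet.items() if not deps)
        if not ready:
            # Nothing can start: the remaining projects are on a cycle
            # or sit downstream of one.
            raise ValueError(f"dependency cycle involving: {sorted(unmet)}")
        layers.append(ready)
        for pid in ready:
            del unmet[pid]
        # Completed projects no longer block anything downstream.
        for deps in unmet.values():
            deps.difference_update(ready)
    return layers
```

For the example definition, `topological_layers({"archeflow": [], "colette": [], "giesing": ["archeflow", "colette"]})` yields two layers, with `archeflow` and `colette` in the first and `giesing` in the second.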