feat: add automated PDCA loop, domain adapters, cost tracking, DAG renderer
- skills/run: automated PDCA execution loop with --start-from, --dry-run
- skills/artifact-routing: inter-phase artifact protocol with context injection
- skills/act-phase: structured review→fix pipeline with cycle feedback
- skills/domains: domain adapter system (writing, code, research)
- skills/cost-tracking: per-agent cost estimation, budget enforcement
- lib/archeflow-dag.sh: ASCII DAG renderer from JSONL events
- lib/archeflow-report.sh: updated with DAG section, cycle diff, --dag/--summary flags
skills/cost-tracking/SKILL.md (new file, 327 lines)

---
name: cost-tracking
description: |
  Cost aggregation, budget enforcement, and model selection for ArcheFlow orchestrations.
  Tracks per-agent and per-run token costs, enforces budgets, and recommends the cheapest
  model that meets quality requirements per archetype and domain.
  <example>User: "How much did that orchestration cost?"</example>
  <example>Automatically active when budget is configured</example>
---

# Cost Tracking — Budget-Aware Orchestration

Every ArcheFlow orchestration consumes LLM tokens. This skill tracks costs per agent and per run, enforces budgets, and recommends cost-optimal model assignments.

## Model Pricing Table

Current pricing (update when models change):

| Model | Input ($/M tokens) | Output ($/M tokens) | Notes |
|-------|-------------------:|--------------------:|-------|
| `claude-opus-4-6` | 15.00 | 75.00 | Highest quality, use sparingly |
| `claude-sonnet-4-6` | 3.00 | 15.00 | Good balance of quality and cost |
| `claude-haiku-4-5` | 0.80 | 4.00 | Cheap, fast, good for structured tasks |

**Prompt caching** (when applicable): 90% discount on cached input tokens. The orchestrator should structure system prompts to maximize cache hits (archetype instructions, voice profiles, and domain context are cache-friendly since they repeat across agents in a run).

**Batches API**: 50% discount on all tokens. Use for non-time-sensitive bulk operations (validation passes, consistency checks).

## Per-Agent Cost Tracking

Every `agent.complete` event includes cost data:

```json
{
  "type": "agent.complete",
  "data": {
    "archetype": "story-explorer",
    "duration_ms": 87605,
    "tokens_input": 15000,
    "tokens_output": 6000,
    "tokens_cache_read": 8000,
    "model": "haiku",
    "estimated_cost_usd": 0.02,
    "summary": "3 plot directions developed, recommended C"
  }
}
```

### Cost Calculation

```
cost = (tokens_input - tokens_cache_read) * input_price / 1_000_000
     + tokens_cache_read * input_price * 0.10 / 1_000_000
     + tokens_output * output_price / 1_000_000
```
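
As a concrete sketch, the formula can be expressed in Python (illustrative only; the prices come from the pricing table above, and `estimate_cost` is a hypothetical helper name, not an existing `lib/` function):

```python
# Illustrative sketch of the cost formula above. Prices are $/M tokens
# from the pricing table; the function name is hypothetical.
PRICES = {
    "haiku":  {"input": 0.80, "output": 4.00},
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus":   {"input": 15.00, "output": 75.00},
}

CACHE_READ_DISCOUNT = 0.10  # cached input tokens cost 10% of the normal rate

def estimate_cost(model, tokens_input, tokens_output, tokens_cache_read=0):
    """Return the estimated cost in USD for one agent invocation."""
    p = PRICES[model]
    uncached = (tokens_input - tokens_cache_read) * p["input"]
    cached = tokens_cache_read * p["input"] * CACHE_READ_DISCOUNT
    output = tokens_output * p["output"]
    return (uncached + cached + output) / 1_000_000
```

Applied to 15k input / 6k output / 8k cache-read on haiku, this yields roughly $0.03.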

If exact token counts are unavailable (Claude Code doesn't always expose them), estimate based on character count:

```
estimated_tokens = character_count / 4  # rough heuristic
```

Mark estimated costs with `"cost_estimated": true` in the event data so reports can distinguish measured from estimated values.

## Run-Level Aggregation

The `run.complete` event includes cost totals:

```json
{
  "type": "run.complete",
  "data": {
    "status": "completed",
    "total_tokens_input": 95000,
    "total_tokens_output": 33000,
    "total_tokens_cache_read": 42000,
    "total_cost_usd": 1.45,
    "budget_usd": 10.00,
    "budget_remaining_usd": 8.55,
    "agents_total": 5,
    "cost_by_phase": {
      "plan": 0.35,
      "do": 0.72,
      "check": 0.38
    },
    "cost_by_model": {
      "haiku": 0.12,
      "sonnet": 1.33
    }
  }
}
```
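
The aggregation itself is mechanical: fold `agent.complete` events into the run totals. A minimal sketch (field names follow the event schemas in this skill; `aggregate_run` is a hypothetical name):

```python
# Illustrative sketch: fold agent.complete events (one JSON object per line
# of the event log) into run-level totals. Field names follow the event
# schemas in this skill; the function name is hypothetical.
import json

def aggregate_run(event_lines):
    totals = {"total_tokens_input": 0, "total_tokens_output": 0,
              "total_tokens_cache_read": 0, "total_cost_usd": 0.0,
              "agents_total": 0, "cost_by_model": {}}
    for line in event_lines:
        event = json.loads(line)
        if event["type"] != "agent.complete":
            continue  # skip run.start, phase events, etc.
        d = event["data"]
        totals["total_tokens_input"] += d["tokens_input"]
        totals["total_tokens_output"] += d["tokens_output"]
        totals["total_tokens_cache_read"] += d.get("tokens_cache_read", 0)
        totals["total_cost_usd"] += d["estimated_cost_usd"]
        totals["agents_total"] += 1
        by_model = totals["cost_by_model"]
        by_model[d["model"]] = by_model.get(d["model"], 0.0) + d["estimated_cost_usd"]
    return totals
```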

### Cost Summary in Orchestration Report

After each orchestration, the report includes a cost section:

```markdown
## Cost Summary

| Phase | Model(s) | Tokens (in/out) | Cost |
|-------|----------|-----------------|------|
| Plan | haiku, sonnet | 32k / 12k | $0.35 |
| Do | sonnet | 40k / 15k | $0.72 |
| Check | haiku, sonnet | 23k / 6k | $0.38 |
| **Total** | | **95k / 33k** | **$1.45** |

Budget: $10.00 | Spent: $1.45 | Remaining: $8.55
```

## Budget Configuration

Budgets are defined in team presets or `.archeflow/config.yaml`:

```yaml
# .archeflow/config.yaml
budget:
  per_run_usd: 10.00     # Max cost per orchestration run
  per_agent_usd: 3.00    # Max cost per individual agent
  daily_usd: 50.00       # Max daily spend across all runs
  warn_at_percent: 75    # Warn when this % of budget is consumed
```

```yaml
# Team preset override
name: story-development
domain: writing
budget:
  per_run_usd: 5.00      # Writing runs are usually cheaper
```

Team preset budget overrides the global config for that run.

### Budget Precedence

1. Team preset `budget` (if set)
2. `.archeflow/config.yaml` `budget`
3. No budget (unlimited) — costs are still tracked but not enforced
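
This precedence is small enough to sketch directly (config shapes are assumed from the YAML examples above; `resolve_budget` is a hypothetical name):

```python
# Illustrative sketch of the budget precedence above. Config dict shapes
# are assumed from the YAML examples; the function name is hypothetical.
def resolve_budget(team_preset, global_config):
    """Return the effective budget dict, or None for unlimited."""
    if team_preset.get("budget"):        # 1. team preset wins
        return team_preset["budget"]
    if global_config.get("budget"):      # 2. fall back to global config
        return global_config["budget"]
    return None                          # 3. unlimited; costs still tracked
```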

## Budget Enforcement

Budget checks happen at two points:

### 1. Pre-Agent Check (before spawning)

Before each agent is spawned, estimate its cost and check against the remaining budget:

```
estimated_agent_cost = estimate_tokens(archetype, task_complexity) * model_price
remaining_budget = budget - sum(costs_so_far)

if estimated_agent_cost > remaining_budget:
    WARN: "Estimated cost for {archetype} (${estimated}) would exceed remaining budget (${remaining}). Continue? [y/N]"
```

**In autonomous mode**: if the budget would be exceeded, STOP the run and report. Do not prompt — there is no one to answer.

**In attended mode**: warn and ask the user. They can approve the overage or stop.

### 2. Post-Agent Check (after completion)

After each agent completes, update the running total and check:

```
if total_cost > budget * warn_at_percent / 100:
    WARN: "Budget ${warn_at_percent}% consumed (${total_cost} of ${budget})"

if total_cost > budget:
    STOP: "Budget exceeded (${total_cost} of ${budget}). Run halted."
```
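
Both checks can be sketched as small predicates (semantics follow the pseudocode above; the function names are hypothetical):

```python
# Illustrative sketch of the two budget checks above. Threshold semantics
# follow the pseudocode; function names are hypothetical.
def pre_agent_check(estimated_agent_cost, budget, costs_so_far):
    """True if the next agent's estimate fits in the remaining budget."""
    remaining = budget - sum(costs_so_far)
    return estimated_agent_cost <= remaining

def post_agent_check(total_cost, budget, warn_at_percent=75):
    """Return "stop", "warn", or "ok" for the updated running total."""
    if total_cost > budget:
        return "stop"  # budget exceeded: halt the run
    if total_cost > budget * warn_at_percent / 100:
        return "warn"  # warn threshold crossed, keep running
    return "ok"
```

In autonomous mode a `"stop"` result halts the run immediately; in attended mode the pre-agent check failing triggers the `[y/N]` prompt instead.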

### Pre-Agent Cost Estimation

Rough token estimates by archetype (calibrate over time with actual data from `metrics.jsonl`):

| Archetype | Typical Input | Typical Output | Notes |
|-----------|--------------:|---------------:|-------|
| Explorer | 8k | 4k | Research, reads many files |
| Creator | 12k | 6k | Receives Explorer output, produces plan |
| Maker | 15k | 12k | Largest output (implementation/prose) |
| Guardian | 10k | 3k | Reads diff, structured output |
| Skeptic | 8k | 3k | Reads proposal, structured challenges |
| Sage | 12k | 4k | Reads diff + proposal |
| Trickster | 8k | 4k | Reads diff, generates test cases |

These are starting estimates. After 10+ runs, use actual averages from `metrics.jsonl` instead.
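
Computing those averages is a one-pass fold over `metrics.jsonl`. A sketch, assuming each line carries `archetype`, `tokens_input`, and `tokens_output` fields (the exact record shape may differ; `calibrate_estimates` is a hypothetical name):

```python
# Illustrative sketch: derive per-archetype token averages from
# metrics.jsonl. The record shape is an assumption; the function
# name is hypothetical.
import json
from collections import defaultdict

def calibrate_estimates(metrics_lines):
    """Return {archetype: (avg_input, avg_output)} from JSONL records."""
    sums = defaultdict(lambda: [0, 0, 0])  # input total, output total, count
    for line in metrics_lines:
        rec = json.loads(line)
        s = sums[rec["archetype"]]
        s[0] += rec["tokens_input"]
        s[1] += rec["tokens_output"]
        s[2] += 1
    return {a: (s[0] / s[2], s[1] / s[2]) for a, s in sums.items()}
```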

## Cost-Aware Model Selection

Each archetype has a recommended model tier based on the quality requirements of its role:

### Default Model Assignments (Code Domain)

| Archetype | Model | Rationale |
|-----------|-------|-----------|
| Explorer | haiku | Research is structured extraction — cheap model handles it well |
| Creator | sonnet | Design decisions need reasoning quality |
| Maker | sonnet | Implementation needs quality to avoid rework cycles |
| Guardian | haiku | Security/risk review is checklist-driven — structured and cheap |
| Skeptic | haiku | Challenge generation follows patterns — cheap |
| Sage | sonnet | Holistic quality judgment needs nuance |
| Trickster | haiku | Adversarial testing is systematic — cheap |

### Writing Domain Overrides

Writing tasks need higher quality for prose-generating agents:

| Archetype | Model | Rationale |
|-----------|-------|-----------|
| Explorer / story-explorer | haiku | Research is still cheap |
| Creator | sonnet | Outline design needs narrative judgment |
| Maker | **sonnet** | Prose quality is the product — cannot be cheap |
| Guardian | haiku | Plot/continuity checks are structured |
| Skeptic | haiku | Premise challenges are structured |
| Sage / story-sage | **sonnet** | Voice and craft judgment need taste |
| Trickster | haiku | Reader-confusion analysis is systematic |

**When to escalate to opus**: Only for final-pass prose polishing on high-stakes content (book manuscripts, not short stories). Never for review or research agents. The user must explicitly opt in via:

```yaml
# Team preset
model_overrides:
  maker: opus   # Only for final polish pass
```

### Domain-Driven Model Selection

The effective model for each agent is resolved in this order:

1. **Team preset `model_overrides`** (highest priority — explicit choice)
2. **Domain `model_overrides`** (from `.archeflow/domains/<name>.yaml`)
3. **Custom archetype `model` field** (from archetype YAML frontmatter)
4. **Archetype default** (from the tables above)

Example resolution for `story-sage` in a writing run:

- Team preset says nothing about story-sage → skip
- Writing domain says `story-sage: sonnet` → **use sonnet**
- Archetype YAML says `model: sonnet` → would have been used if the domain didn't specify
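
A minimal sketch of this resolution order (input shapes are assumed; `resolve_model` is a hypothetical name):

```python
# Illustrative sketch of the model resolution order above. Input shapes
# and the final "haiku" fallback are assumptions; names are hypothetical.
def resolve_model(archetype, team_overrides, domain_overrides,
                  archetype_yaml, defaults):
    """Return the effective model for one agent."""
    if archetype in team_overrides:      # 1. team preset model_overrides
        return team_overrides[archetype]
    if archetype in domain_overrides:    # 2. domain model_overrides
        return domain_overrides[archetype]
    if archetype_yaml.get("model"):      # 3. archetype YAML frontmatter
        return archetype_yaml["model"]
    return defaults.get(archetype, "haiku")  # 4. archetype default table

# story-sage in a writing run: the domain override wins
model = resolve_model("story-sage", {}, {"story-sage": "sonnet"},
                      {"model": "sonnet"}, {})  # → "sonnet"
```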

## Cost Optimization Strategies

### 1. Prompt Caching

Structure prompts so that stable content comes first (maximizes cache prefix hits):

```
[System prompt — archetype instructions]    ← cached across agents in same run
[Domain context — voice profile, persona]   ← cached across agents in same run
[Phase context — Explorer output, proposal] ← changes per agent
[Task-specific instructions]                ← changes per agent
```

Estimated savings: 30-50% on input tokens for runs with 5+ agents.

### 2. Guardian Fast-Path (A2)

When Guardian approves with 0 issues, skip Skeptic/Sage/Trickster. This saves 2-3 agent calls per cycle. See the `archeflow:orchestration` skill, rule A2.

Typical savings: $0.30-0.80 per skipped cycle (depending on models).

### 3. Explorer Cache

Reuse recent Explorer research instead of re-running. See the `archeflow:orchestration` skill, Explorer Cache section.

Typical savings: $0.02-0.05 per cache hit (haiku Explorer).

### 4. Batches API for Bulk Operations

When running consistency checks, validation passes, or other non-time-sensitive work across multiple files, use the Batches API (50% discount):

```yaml
# Mark agents as batch-eligible in team presets
batch_eligible:
  - guardian   # Structured review, can wait
  - skeptic    # Challenge generation, can wait
```

Only use batches when the user is not waiting for real-time results (overnight runs, autonomous mode).

### 5. Early Termination

If the first cycle produces a clean Guardian pass (A2 fast-path) AND the Maker's self-review checklist is clean, skip the remaining cycles even if `max_cycles > 1`. This avoids spending tokens on unnecessary verification.

## Daily Cost Tracking

Across runs, maintain a daily cost ledger:

```
.archeflow/costs/<YYYY-MM-DD>.jsonl
```

Each line is one run's cost summary:

```jsonl
{"run_id":"2026-04-03-der-huster","cost_usd":1.45,"tokens_input":95000,"tokens_output":33000,"models":{"haiku":2,"sonnet":3},"domain":"writing"}
{"run_id":"2026-04-03-auth-refactor","cost_usd":2.10,"tokens_input":120000,"tokens_output":45000,"models":{"haiku":3,"sonnet":2},"domain":"code"}
```

Daily budget enforcement reads this file to check `daily_usd` limits before starting new runs.
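
A sketch of that check, assuming the ledger record shape shown above (`daily_budget_ok` is a hypothetical name):

```python
# Illustrative sketch of the daily check above: sum today's ledger and
# decide whether a new run may start. Record shape follows the ledger
# example in this skill; the function name is hypothetical.
import json

def daily_budget_ok(ledger_lines, daily_usd, next_run_estimate_usd=0.0):
    """True if today's spend plus the next run's estimate fits daily_usd."""
    spent = sum(json.loads(line)["cost_usd"] for line in ledger_lines)
    return spent + next_run_estimate_usd <= daily_usd
```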

### Cost Report Command

```bash
# Show today's costs
./lib/archeflow-costs.sh today

# Show costs for a date range
./lib/archeflow-costs.sh 2026-04-01 2026-04-03

# Show costs for a specific run
./lib/archeflow-costs.sh run 2026-04-03-der-huster
```

## Integration with Other Skills

- **`orchestration`**: Calls pre-agent and post-agent budget checks. Includes the cost summary in the orchestration report.
- **`process-log`**: Cost data is embedded in `agent.complete` and `run.complete` events. No separate cost events needed.
- **`domains`**: Reads `model_overrides` from the active domain to determine the effective model per agent.
- **`autonomous-mode`**: Enforces the budget strictly (no prompts — just stop when the budget is exceeded). Uses the daily budget to limit overnight spend.
- **`workflow-design`**: Custom workflows can specify per-phase model assignments that override domain defaults.

## Design Principles

1. **Track always, enforce optionally.** Cost data is in every event regardless of whether a budget is set. Budget enforcement is opt-in.
2. **Estimate before spend.** Always estimate before spawning an agent. Surprises are worse than slightly inaccurate estimates.
3. **Cheapest model that works.** Default to haiku. Upgrade to sonnet only when the task demonstrably needs it. Opus is user-opt-in only.
4. **Transparent.** Every cost shows up in the orchestration report. No hidden token spend.
5. **Learn from history.** After enough runs, replace estimates with actual averages from `metrics.jsonl`.