--- name: cost-tracking description: | Cost aggregation, budget enforcement, and model selection for ArcheFlow orchestrations. Tracks per-agent and per-run token costs, enforces budgets, and recommends the cheapest model that meets quality requirements per archetype and domain. User: "How much did that orchestration cost?" Automatically active when budget is configured --- # Cost Tracking — Budget-Aware Orchestration Every ArcheFlow orchestration consumes LLM tokens. This skill tracks costs per agent and per run, enforces budgets, and recommends cost-optimal model assignments. ## Model Pricing Table Current pricing (update when models change): | Model | Input ($/M tokens) | Output ($/M tokens) | Notes | |-------|--------------------:|---------------------:|-------| | `claude-opus-4-6` | 15.00 | 75.00 | Highest quality, use sparingly | | `claude-sonnet-4-6` | 3.00 | 15.00 | Good balance of quality and cost | | `claude-haiku-4-5` | 0.80 | 4.00 | Cheap, fast, good for structured tasks | **Prompt caching** (when applicable): 90% discount on cached input tokens. The orchestrator should structure system prompts to maximize cache hits (archetype instructions, voice profiles, and domain context are cache-friendly since they repeat across agents in a run). **Batches API**: 50% discount on all tokens. Use for non-time-sensitive bulk operations (validation passes, consistency checks). 
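The pricing and discount rules above can be sketched as a small lookup helper. This is a minimal sketch, not part of ArcheFlow: the `PRICING` dict mirrors the table, and `cost_usd` is a hypothetical helper name.

```python
# Sketch of the pricing table and discount rules above (illustrative names).

PRICING = {  # $ per million tokens: (input, output)
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.80, 4.00),
}

CACHE_READ_DISCOUNT = 0.90  # cached input tokens cost 10% of the input rate
BATCH_DISCOUNT = 0.50       # Batches API: 50% off all tokens

def cost_usd(model, tokens_input, tokens_output, tokens_cache_read=0, batch=False):
    """Estimate cost in USD for one agent call under the rules above."""
    input_price, output_price = PRICING[model]
    uncached = tokens_input - tokens_cache_read
    cost = (
        uncached * input_price
        + tokens_cache_read * input_price * (1 - CACHE_READ_DISCOUNT)
        + tokens_output * output_price
    ) / 1_000_000
    return cost * (BATCH_DISCOUNT if batch else 1.0)
```

For example, `cost_usd("haiku", 15_000, 6_000, tokens_cache_read=8_000)` comes out at roughly $0.03.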
## Per-Agent Cost Tracking

Every `agent.complete` event includes cost data:

```jsonl
{"type": "agent.complete", "data": {"archetype": "story-explorer", "duration_ms": 87605, "tokens_input": 15000, "tokens_output": 6000, "tokens_cache_read": 8000, "model": "haiku", "estimated_cost_usd": 0.03, "summary": "3 plot directions developed, recommended C"}}
```

### Cost Calculation

```
cost = (tokens_input - tokens_cache_read) * input_price / 1_000_000
     + tokens_cache_read * input_price * 0.10 / 1_000_000
     + tokens_output * output_price / 1_000_000
```

If exact token counts are unavailable (Claude Code doesn't always expose them), estimate based on character count:

```
estimated_tokens = character_count / 4  # rough heuristic
```

Mark estimated costs with `"cost_estimated": true` in the event data so reports can distinguish measured from estimated values.

## Run-Level Aggregation

The `run.complete` event includes cost totals:

```jsonl
{"type": "run.complete", "data": {"status": "completed", "total_tokens_input": 95000, "total_tokens_output": 33000, "total_tokens_cache_read": 42000, "total_cost_usd": 1.45, "budget_usd": 10.00, "budget_remaining_usd": 8.55, "agents_total": 5, "cost_by_phase": {"plan": 0.35, "do": 0.72, "check": 0.38}, "cost_by_model": {"haiku": 0.12, "sonnet": 1.33}}}
```

### Cost Summary in Orchestration Report

After each orchestration, the report includes a cost section:

```markdown
## Cost Summary

| Phase | Model(s) | Tokens (in/out) | Cost |
|-------|----------|-----------------|------|
| Plan | haiku, sonnet | 32k / 12k | $0.35 |
| Do | sonnet | 40k / 15k | $0.72 |
| Check | haiku, sonnet | 23k / 6k | $0.38 |
| **Total** | | **95k / 33k** | **$1.45** |

Budget: $10.00 | Spent: $1.45 | Remaining: $8.55
```

## Budget Configuration

Budgets are defined in team presets or `.archeflow/config.yaml`:

```yaml
# .archeflow/config.yaml
budget:
  per_run_usd: 10.00   # Max cost per orchestration run
  per_agent_usd: 3.00  # Max cost per individual agent
  daily_usd:
50.00 # Max daily spend across all runs warn_at_percent: 75 # Warn when this % of budget is consumed ``` ```yaml # Team preset override name: story-development domain: writing budget: per_run_usd: 5.00 # Writing runs are usually cheaper ``` Team preset budget overrides the global config for that run. ### Budget Precedence 1. Team preset `budget` (if set) 2. `.archeflow/config.yaml` `budget` 3. No budget (unlimited) — costs are still tracked but not enforced ## Budget Enforcement Budget checks happen at two points: ### 1. Pre-Agent Check (before spawning) Before each agent is spawned, estimate its cost and check against remaining budget: ``` estimated_agent_cost = estimate_tokens(archetype, task_complexity) * model_price remaining_budget = budget - sum(costs_so_far) if estimated_agent_cost > remaining_budget: WARN: "Estimated cost for {archetype} (${estimated}) would exceed remaining budget (${remaining}). Continue? [y/N]" ``` **In autonomous mode**: if budget would be exceeded, STOP the run and report. Do not prompt — there is no one to answer. **In attended mode**: warn and ask the user. They can approve the overage or stop. ### 2. Post-Agent Check (after completion) After each agent completes, update the running total and check: ``` if total_cost > budget * warn_at_percent / 100: WARN: "Budget ${warn_at_percent}% consumed (${total_cost} of ${budget})" if total_cost > budget: STOP: "Budget exceeded (${total_cost} of ${budget}). Run halted." 
``` ### Pre-Agent Cost Estimation Rough token estimates by archetype (calibrate over time with actual data from `metrics.jsonl`): | Archetype | Typical Input | Typical Output | Notes | |-----------|-------------:|---------------:|-------| | Explorer | 8k | 4k | Research, reads many files | | Creator | 12k | 6k | Receives Explorer output, produces plan | | Maker | 15k | 12k | Largest output (implementation/prose) | | Guardian | 10k | 3k | Reads diff, structured output | | Skeptic | 8k | 3k | Reads proposal, structured challenges | | Sage | 12k | 4k | Reads diff + proposal | | Trickster | 8k | 4k | Reads diff, generates test cases | These are starting estimates. After 10+ runs, use actual averages from `metrics.jsonl` instead. ## Cost-Aware Model Selection Each archetype has a recommended model tier based on the quality requirements of its role: ### Default Model Assignments (Code Domain) | Archetype | Model | Rationale | |-----------|-------|-----------| | Explorer | haiku | Research is structured extraction — cheap model handles it well | | Creator | sonnet | Design decisions need reasoning quality | | Maker | sonnet | Implementation needs quality to avoid rework cycles | | Guardian | haiku | Security/risk review is checklist-driven — structured and cheap | | Skeptic | haiku | Challenge generation follows patterns — cheap | | Sage | sonnet | Holistic quality judgment needs nuance | | Trickster | haiku | Adversarial testing is systematic — cheap | ### Writing Domain Overrides Writing tasks need higher quality for prose-generating agents: | Archetype | Model | Rationale | |-----------|-------|-----------| | Explorer / story-explorer | haiku | Research is still cheap | | Creator | sonnet | Outline design needs narrative judgment | | Maker | **sonnet** | Prose quality is the product — cannot be cheap | | Guardian | haiku | Plot/continuity checks are structured | | Skeptic | haiku | Premise challenges are structured | | Sage / story-sage | **sonnet** | Voice and craft 
judgment need taste |
| Trickster | haiku | Reader-confusion analysis is systematic |

**When to escalate to opus**: Only for final-pass prose polishing on high-stakes content (book manuscripts, not short stories). Never for review or research agents. The user must explicitly opt in via:

```yaml
# Team preset
model_overrides:
  maker: opus  # Only for final polish pass
```

### Domain-Driven Model Selection

The effective model for each agent is resolved in this order:

1. **Team preset `model_overrides`** (highest priority — explicit choice)
2. **Domain `model_overrides`** (from `.archeflow/domains/.yaml`)
3. **Custom archetype `model` field** (from archetype YAML frontmatter)
4. **Archetype default** (from the table above)

Example resolution for `story-sage` in a writing run:

- Team preset says nothing about story-sage → skip
- Writing domain says `story-sage: sonnet` → **use sonnet**
- Archetype YAML says `model: sonnet` → would have been used if the domain didn't specify

## Cost Optimization Strategies

### 1. Prompt Caching

Structure prompts so that stable content comes first (maximizes cache prefix hits):

```
[System prompt — archetype instructions]    ← cached across agents in same run
[Domain context — voice profile, persona]   ← cached across agents in same run
[Phase context — Explorer output, proposal] ← changes per agent
[Task-specific instructions]                ← changes per agent
```

Estimated savings: 30-50% on input tokens for runs with 5+ agents.

### 2. Guardian Fast-Path (A2)

When Guardian approves with 0 issues, skip Skeptic/Sage/Trickster. This saves 2-3 agent calls per cycle. See `archeflow:orchestration` skill, rule A2.

Typical savings: $0.30-0.80 per skipped cycle (depending on models).

### 3. Explorer Cache

Reuse recent Explorer research instead of re-running. See `archeflow:orchestration` skill, Explorer Cache section.

Typical savings: $0.02-0.05 per cache hit (haiku Explorer).

### 4.
Batches API for Bulk Operations When running consistency checks, validation passes, or other non-time-sensitive work across multiple files, use the Batches API (50% discount): ```yaml # Mark agents as batch-eligible in team presets batch_eligible: - guardian # Structured review, can wait - skeptic # Challenge generation, can wait ``` Only use batches when the user is not waiting for real-time results (overnight runs, autonomous mode). ### 5. Early Termination If the first cycle produces a clean Guardian pass (A2 fast-path) AND the Maker's self-review checklist is clean, skip the remaining cycles even if `max_cycles > 1`. This avoids spending tokens on unnecessary verification. ## Daily Cost Tracking Across runs, maintain a daily cost ledger: ``` .archeflow/costs/.jsonl ``` Each line is one run's cost summary: ```jsonl {"run_id":"2026-04-03-der-huster","cost_usd":1.45,"tokens_input":95000,"tokens_output":33000,"models":{"haiku":2,"sonnet":3},"domain":"writing"} {"run_id":"2026-04-03-auth-refactor","cost_usd":2.10,"tokens_input":120000,"tokens_output":45000,"models":{"haiku":3,"sonnet":2},"domain":"code"} ``` Daily budget enforcement reads this file to check `daily_usd` limits before starting new runs. ### Cost Report Command ```bash # Show today's costs ./lib/archeflow-costs.sh today # Show costs for a date range ./lib/archeflow-costs.sh 2026-04-01 2026-04-03 # Show costs for a specific run ./lib/archeflow-costs.sh run 2026-04-03-der-huster ``` ## Integration with Other Skills - **`orchestration`**: Calls pre-agent and post-agent budget checks. Includes cost summary in orchestration report. - **`process-log`**: Cost data is embedded in `agent.complete` and `run.complete` events. No separate cost events needed. - **`domains`**: Reads `model_overrides` from the active domain to determine effective model per agent. - **`autonomous-mode`**: Enforces budget strictly (no prompts — just stop on budget exceeded). Uses daily budget to limit overnight spend. 
- **`workflow-design`**: Custom workflows can specify per-phase model assignments that override domain defaults. ## Design Principles 1. **Track always, enforce optionally.** Cost data is in every event regardless of whether a budget is set. Budget enforcement is opt-in. 2. **Estimate before spend.** Always estimate before spawning an agent. Surprises are worse than slightly inaccurate estimates. 3. **Cheapest model that works.** Default to haiku. Upgrade to sonnet only when the task demonstrably needs it. Opus is user-opt-in only. 4. **Transparent.** Every cost shows up in the orchestration report. No hidden token spend. 5. **Learn from history.** After enough runs, replace estimates with actual averages from `metrics.jsonl`.
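Principle 2 ("estimate before spend") can be sketched end to end by combining the archetype token estimates with the pricing table to produce a pre-spawn cost estimate and a go/no-go decision. All names here are illustrative, not ArcheFlow APIs; the numbers mirror the tables earlier in this skill.

```python
# Sketch of the pre-agent budget check (illustrative names, not ArcheFlow APIs).

PRICES = {"haiku": (0.80, 4.00), "sonnet": (3.00, 15.00)}  # $/M tokens (in, out)

ESTIMATES = {  # archetype -> (typical input tokens, typical output tokens)
    "explorer": (8_000, 4_000),
    "creator": (12_000, 6_000),
    "maker": (15_000, 12_000),
    "guardian": (10_000, 3_000),
    "skeptic": (8_000, 3_000),
    "sage": (12_000, 4_000),
    "trickster": (8_000, 4_000),
}

def estimate_agent_cost(archetype, model):
    """Pre-spawn cost estimate from the starting token estimates."""
    tok_in, tok_out = ESTIMATES[archetype]
    price_in, price_out = PRICES[model]
    return (tok_in * price_in + tok_out * price_out) / 1_000_000

def pre_agent_check(archetype, model, budget_usd, spent_usd, autonomous=False):
    """Return (ok, message). In autonomous mode an overage stops the run."""
    est = estimate_agent_cost(archetype, model)
    remaining = budget_usd - spent_usd
    if est <= remaining:
        return True, f"ok: ~${est:.2f} of ${remaining:.2f} remaining"
    if autonomous:
        return False, f"STOP: ~${est:.2f} exceeds remaining ${remaining:.2f}"
    return False, f"WARN: ~${est:.2f} exceeds remaining ${remaining:.2f}. Continue? [y/N]"
```

In attended mode the caller surfaces the `Continue? [y/N]` prompt; in autonomous mode a `False` result halts the run, matching the enforcement rules above.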