---
name: cost-tracking
description: Cost aggregation, budget enforcement, and model selection for ArcheFlow orchestrations. Tracks per-agent and per-run token costs, enforces budgets, and recommends the cheapest model that meets quality requirements per archetype and domain. <example>User: "How much did that orchestration cost?"</example> <example>Automatically active when budget is configured</example>
---

# Cost Tracking — Budget-Aware Orchestration

Every ArcheFlow orchestration consumes LLM tokens. This skill tracks costs per agent and per run, enforces budgets, and recommends cost-optimal model assignments.

## Model Pricing Table

Current pricing (update when models change):

| Model | Input ($/M tokens) | Output ($/M tokens) | Notes |
|-------|--------------------|--------------------|-------|
| claude-opus-4-6 | 15.00 | 75.00 | Highest quality, use sparingly |
| claude-sonnet-4-6 | 3.00 | 15.00 | Good balance of quality and cost |
| claude-haiku-4-5 | 0.80 | 4.00 | Cheap, fast, good for structured tasks |

**Prompt caching** (when applicable): 90% discount on cached input tokens. The orchestrator should structure system prompts to maximize cache hits (archetype instructions, voice profiles, and domain context are cache-friendly since they repeat across agents in a run).

**Batches API**: 50% discount on all tokens. Use it for non-time-sensitive bulk operations (validation passes, consistency checks).

## Per-Agent Cost Tracking

Every `agent.complete` event includes cost data:

```json
{
  "type": "agent.complete",
  "data": {
    "archetype": "story-explorer",
    "duration_ms": 87605,
    "tokens_input": 15000,
    "tokens_output": 6000,
    "tokens_cache_read": 8000,
    "model": "haiku",
    "estimated_cost_usd": 0.02,
    "summary": "3 plot directions developed, recommended C"
  }
}
```

### Cost Calculation

```
cost = (tokens_input - tokens_cache_read) * input_price / 1_000_000
     + tokens_cache_read * input_price * 0.10 / 1_000_000
     + tokens_output * output_price / 1_000_000
```

If exact token counts are unavailable (Claude Code doesn't always expose them), estimate based on character count:

```
estimated_tokens = character_count / 4   # rough heuristic
```

Mark estimated costs with `"cost_estimated": true` in the event data so reports can distinguish measured from estimated values.
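As a minimal sketch, the formula and the character-count fallback above in Python (pricing constants come from the table; function names are illustrative, not part of the plugin's actual API):

```python
# Prices from the pricing table, in $/M tokens: (input, output).
PRICES = {
    "opus":   (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku":  (0.80, 4.00),
}
CACHE_DISCOUNT = 0.10  # cached input tokens cost 10% of the normal rate


def agent_cost_usd(model, tokens_input, tokens_output, tokens_cache_read=0):
    """Cost of one agent call, applying the 90% prompt-cache discount."""
    in_price, out_price = PRICES[model]
    uncached = tokens_input - tokens_cache_read
    return (uncached * in_price
            + tokens_cache_read * in_price * CACHE_DISCOUNT
            + tokens_output * out_price) / 1_000_000


def estimate_tokens(character_count):
    """Fallback when exact counts are unavailable: ~4 chars per token."""
    return character_count // 4
```

Note that plugging the example event's numbers into the formula gives a slightly higher figure than the event's `estimated_cost_usd` of 0.02, which is itself rounded.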

## Run-Level Aggregation

The `run.complete` event includes cost totals:

```json
{
  "type": "run.complete",
  "data": {
    "status": "completed",
    "total_tokens_input": 95000,
    "total_tokens_output": 33000,
    "total_tokens_cache_read": 42000,
    "total_cost_usd": 1.45,
    "budget_usd": 10.00,
    "budget_remaining_usd": 8.55,
    "agents_total": 5,
    "cost_by_phase": {
      "plan": 0.35,
      "do": 0.72,
      "check": 0.38
    },
    "cost_by_model": {
      "haiku": 0.12,
      "sonnet": 1.33
    }
  }
}
```
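A sketch of the aggregation, assuming `agent.complete` payloads shaped like the per-agent event above (the `phase` field is supplied per event here for illustration; it does not appear in the per-agent schema shown earlier):

```python
from collections import defaultdict


def aggregate_run(agent_events, budget_usd=None):
    """Fold agent.complete events into run.complete cost totals."""
    totals = {
        "total_tokens_input": 0,
        "total_tokens_output": 0,
        "total_tokens_cache_read": 0,
        "total_cost_usd": 0.0,
        "agents_total": 0,
        "cost_by_phase": defaultdict(float),
        "cost_by_model": defaultdict(float),
    }
    for ev in agent_events:
        d = ev["data"]
        totals["total_tokens_input"] += d["tokens_input"]
        totals["total_tokens_output"] += d["tokens_output"]
        totals["total_tokens_cache_read"] += d.get("tokens_cache_read", 0)
        totals["total_cost_usd"] += d["estimated_cost_usd"]
        totals["agents_total"] += 1
        totals["cost_by_phase"][d["phase"]] += d["estimated_cost_usd"]
        totals["cost_by_model"][d["model"]] += d["estimated_cost_usd"]
    if budget_usd is not None:
        totals["budget_usd"] = budget_usd
        totals["budget_remaining_usd"] = round(
            budget_usd - totals["total_cost_usd"], 2)
    return totals
```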

## Cost Summary in Orchestration Report

After each orchestration, the report includes a cost section:

```markdown
## Cost Summary
| Phase | Model(s) | Tokens (in/out) | Cost |
|-------|----------|-----------------|------|
| Plan  | haiku, sonnet | 32k / 12k | $0.35 |
| Do    | sonnet | 40k / 15k | $0.72 |
| Check | haiku, sonnet | 23k / 6k | $0.38 |
| **Total** | | **95k / 33k** | **$1.45** |

Budget: $10.00 | Spent: $1.45 | Remaining: $8.55
```

## Budget Configuration

Budgets are defined in team presets or `.archeflow/config.yaml`:

```yaml
# .archeflow/config.yaml
budget:
  per_run_usd: 10.00        # Max cost per orchestration run
  per_agent_usd: 3.00       # Max cost per individual agent
  daily_usd: 50.00          # Max daily spend across all runs
  warn_at_percent: 75       # Warn when this % of budget is consumed
```

```yaml
# Team preset override
name: story-development
domain: writing
budget:
  per_run_usd: 5.00         # Writing runs are usually cheaper
```

A team preset budget overrides the global config for that run.

### Budget Precedence

1. Team preset budget (if set)
2. `.archeflow/config.yaml` budget
3. No budget (unlimited) — costs are still tracked but not enforced
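The precedence can be sketched as a per-key merge (whether a preset replaces the whole budget block or only the keys it sets is an assumption; the example preset above only overrides `per_run_usd`):

```python
def effective_budget(config_budget, preset_budget):
    """Merge budgets in precedence order: preset keys win over config keys.

    Returns {} when neither is set: costs are tracked but not enforced.
    """
    merged = dict(config_budget or {})
    merged.update(preset_budget or {})
    return merged
```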

## Budget Enforcement

Budget checks happen at two points:

### 1. Pre-Agent Check (before spawning)

Before each agent is spawned, estimate its cost and check it against the remaining budget:

```
estimated_agent_cost = estimate_tokens(archetype, task_complexity) * model_price
remaining_budget = budget - sum(costs_so_far)

if estimated_agent_cost > remaining_budget:
    WARN: "Estimated cost for {archetype} (${estimated}) would exceed remaining budget (${remaining}). Continue? [y/N]"
```

**In autonomous mode**: if the budget would be exceeded, STOP the run and report. Do not prompt — there is no one to answer.

**In attended mode**: warn and ask the user. They can approve the overage or stop.
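A sketch of the pre-agent gate under these rules (`ask_user` stands in for the attended-mode prompt and is an assumed helper, not the plugin's API):

```python
def pre_agent_check(estimated_cost, spent_so_far, budget, autonomous,
                    ask_user=lambda msg: False):
    """Return True to spawn the agent, False to halt the run."""
    if budget is None:
        return True  # no budget configured: track only, never block
    remaining = budget - spent_so_far
    if estimated_cost <= remaining:
        return True
    if autonomous:
        # No one to answer a prompt: stop the run and report.
        return False
    # Attended mode: the user may approve the overage.
    return ask_user(
        f"Estimated cost ${estimated_cost:.2f} exceeds remaining "
        f"budget ${remaining:.2f}. Continue?"
    )
```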

### 2. Post-Agent Check (after completion)

After each agent completes, update the running total and check:

```
if total_cost > budget * warn_at_percent / 100:
    WARN: "${warn_at_percent}% of budget consumed (${total_cost} of ${budget})"

if total_cost > budget:
    STOP: "Budget exceeded (${total_cost} of ${budget}). Run halted."
```
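The post-agent check reduces to a small decision function; this is a sketch with illustrative names:

```python
def post_agent_check(total_cost, budget, warn_at_percent=75):
    """Return 'stop', 'warn', or 'ok' for the updated running total."""
    if budget is None:
        return "ok"     # no budget: track only
    if total_cost > budget:
        return "stop"   # budget exceeded: halt the run
    if total_cost > budget * warn_at_percent / 100:
        return "warn"   # warn threshold crossed: continue with a warning
    return "ok"
```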

## Pre-Agent Cost Estimation

Rough token estimates by archetype (calibrate over time with actual data from `metrics.jsonl`):

| Archetype | Typical Input | Typical Output | Notes |
|-----------|---------------|----------------|-------|
| Explorer | 8k | 4k | Research, reads many files |
| Creator | 12k | 6k | Receives Explorer output, produces plan |
| Maker | 15k | 12k | Largest output (implementation/prose) |
| Guardian | 10k | 3k | Reads diff, structured output |
| Skeptic | 8k | 3k | Reads proposal, structured challenges |
| Sage | 12k | 4k | Reads diff + proposal |
| Trickster | 8k | 4k | Reads diff, generates test cases |

These are starting estimates. After 10+ runs, use actual averages from `metrics.jsonl` instead.
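A sketch of pre-spawn estimation from the table above (constants and names are illustrative):

```python
# Starting token estimates per archetype: (typical input, typical output).
ARCHETYPE_TOKENS = {
    "explorer":  (8_000, 4_000),
    "creator":   (12_000, 6_000),
    "maker":     (15_000, 12_000),
    "guardian":  (10_000, 3_000),
    "skeptic":   (8_000, 3_000),
    "sage":      (12_000, 4_000),
    "trickster": (8_000, 4_000),
}


def estimate_agent_cost_usd(archetype, model_prices):
    """Pre-spawn estimate; model_prices is (input, output) in $/M tokens."""
    tokens_in, tokens_out = ARCHETYPE_TOKENS[archetype]
    in_price, out_price = model_prices
    return (tokens_in * in_price + tokens_out * out_price) / 1_000_000
```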

## Cost-Aware Model Selection

Each archetype has a recommended model tier based on the quality requirements of its role.

### Default Model Assignments (Code Domain)

| Archetype | Model | Rationale |
|-----------|-------|-----------|
| Explorer | haiku | Research is structured extraction — cheap model handles it well |
| Creator | sonnet | Design decisions need reasoning quality |
| Maker | sonnet | Implementation needs quality to avoid rework cycles |
| Guardian | haiku | Security/risk review is checklist-driven — structured and cheap |
| Skeptic | haiku | Challenge generation follows patterns — cheap |
| Sage | sonnet | Holistic quality judgment needs nuance |
| Trickster | haiku | Adversarial testing is systematic — cheap |

### Writing Domain Overrides

Writing tasks need higher quality for prose-generating agents:

| Archetype | Model | Rationale |
|-----------|-------|-----------|
| Explorer / story-explorer | haiku | Research is still cheap |
| Creator | sonnet | Outline design needs narrative judgment |
| Maker | sonnet | Prose quality is the product — cannot be cheap |
| Guardian | haiku | Plot/continuity checks are structured |
| Skeptic | haiku | Premise challenges are structured |
| Sage / story-sage | sonnet | Voice and craft judgment need taste |
| Trickster | haiku | Reader-confusion analysis is systematic |

**When to escalate to opus**: Only for final-pass prose polishing on high-stakes content (book manuscripts, not short stories). Never for review or research agents. The user must explicitly opt in via:

```yaml
# Team preset
model_overrides:
  maker: opus    # Only for final polish pass
```

## Domain-Driven Model Selection

The effective model for each agent is resolved in this order:

1. Team preset `model_overrides` (highest priority — explicit choice)
2. Domain `model_overrides` (from `.archeflow/domains/<name>.yaml`)
3. Archetype default (from the table above)
4. Custom archetype `model` field (from archetype YAML frontmatter)

Example resolution for story-sage in a writing run:

- Team preset says nothing about story-sage → skip
- Writing domain says `story-sage: sonnet` → use sonnet
- Archetype YAML says `model: sonnet` → would have been used if the domain didn't specify
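The resolution order can be sketched as follows (argument names are illustrative, not the plugin's API):

```python
def resolve_model(archetype, preset_overrides, domain_overrides,
                  archetype_default, archetype_yaml_model=None):
    """Resolve the effective model for one agent, highest priority first."""
    if archetype in (preset_overrides or {}):
        return preset_overrides[archetype]   # 1. team preset override
    if archetype in (domain_overrides or {}):
        return domain_overrides[archetype]   # 2. domain override
    if archetype_default:
        return archetype_default             # 3. archetype default table
    return archetype_yaml_model              # 4. custom archetype YAML field
```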

## Cost Optimization Strategies

### 1. Prompt Caching

Structure prompts so that stable content comes first (maximizes cache prefix hits):

```
[System prompt — archetype instructions]     ← cached across agents in same run
[Domain context — voice profile, persona]    ← cached across agents in same run
[Phase context — Explorer output, proposal]  ← changes per agent
[Task-specific instructions]                 ← changes per agent
```

Estimated savings: 30-50% on input tokens for runs with 5+ agents.
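The arithmetic behind that estimate, as a sketch: each cached input token costs 10% of the normal rate, so a run where half the input tokens hit the cache saves 45% on input cost (helper names are hypothetical):

```python
def input_cost_with_cache(tokens_input, cache_hit_fraction, input_price):
    """Input-token cost in USD when a fraction of the prompt hits the cache."""
    cached = tokens_input * cache_hit_fraction
    uncached = tokens_input - cached
    # Cached tokens are billed at 10% of the normal input rate.
    return (uncached * input_price + cached * input_price * 0.10) / 1_000_000


def cache_savings_fraction(cache_hit_fraction):
    """Fractional saving on input cost: each cached token saves 90%."""
    return 0.9 * cache_hit_fraction
```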

2. Guardian Fast-Path (A2)

When Guardian approves with 0 issues, skip Skeptic/Sage/Trickster. This saves 2-3 agent calls per cycle. See archeflow:orchestration skill, rule A2.

Typical savings: $0.30-0.80 per skipped cycle (depending on models).

### 3. Explorer Cache

Reuse recent Explorer research instead of re-running. See archeflow:orchestration skill, Explorer Cache section.

Typical savings: $0.02-0.05 per cache hit (haiku Explorer).

### 4. Batches API for Bulk Operations

When running consistency checks, validation passes, or other non-time-sensitive work across multiple files, use the Batches API (50% discount):

```yaml
# Mark agents as batch-eligible in team presets
batch_eligible:
  - guardian    # Structured review, can wait
  - skeptic     # Challenge generation, can wait
```

Only use batches when the user is not waiting for real-time results (overnight runs, autonomous mode).

### 5. Early Termination

If the first cycle produces a clean Guardian pass (A2 fast-path) AND the Maker's self-review checklist is clean, skip the remaining cycles even if `max_cycles > 1`. This avoids spending tokens on unnecessary verification.

## Daily Cost Tracking

Across runs, maintain a daily cost ledger:

```
.archeflow/costs/<YYYY-MM-DD>.jsonl
```

Each line is one run's cost summary:

```json
{"run_id":"2026-04-03-der-huster","cost_usd":1.45,"tokens_input":95000,"tokens_output":33000,"models":{"haiku":2,"sonnet":3},"domain":"writing"}
{"run_id":"2026-04-03-auth-refactor","cost_usd":2.10,"tokens_input":120000,"tokens_output":45000,"models":{"haiku":3,"sonnet":2},"domain":"code"}
```

Daily budget enforcement reads this file to check daily_usd limits before starting new runs.
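A sketch of reading the ledger for daily enforcement, assuming the file layout above (function names are illustrative, not the plugin's API):

```python
import json
from datetime import date
from pathlib import Path


def daily_spend_usd(costs_dir=".archeflow/costs", day=None):
    """Sum cost_usd across one day's ledger (<YYYY-MM-DD>.jsonl)."""
    day = day or date.today().isoformat()
    ledger = Path(costs_dir) / f"{day}.jsonl"
    if not ledger.exists():
        return 0.0  # no runs recorded for that day
    with ledger.open() as f:
        return sum(json.loads(line)["cost_usd"]
                   for line in f if line.strip())


def daily_budget_allows(daily_usd_limit, costs_dir=".archeflow/costs"):
    """Check the daily_usd limit before starting a new run."""
    return daily_spend_usd(costs_dir) < daily_usd_limit
```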

## Cost Report Command

```bash
# Show today's costs
./lib/archeflow-costs.sh today

# Show costs for a date range
./lib/archeflow-costs.sh 2026-04-01 2026-04-03

# Show costs for a specific run
./lib/archeflow-costs.sh run 2026-04-03-der-huster
```

## Integration with Other Skills

- **orchestration**: Calls the pre-agent and post-agent budget checks. Includes the cost summary in the orchestration report.
- **process-log**: Cost data is embedded in `agent.complete` and `run.complete` events. No separate cost events are needed.
- **domains**: Reads `model_overrides` from the active domain to determine the effective model per agent.
- **autonomous-mode**: Enforces the budget strictly (no prompts — just stop when the budget is exceeded). Uses the daily budget to limit overnight spend.
- **workflow-design**: Custom workflows can specify per-phase model assignments that override domain defaults.

## Design Principles

1. **Track always, enforce optionally.** Cost data is in every event regardless of whether a budget is set. Budget enforcement is opt-in.
2. **Estimate before spend.** Always estimate before spawning an agent. Surprises are worse than slightly inaccurate estimates.
3. **Cheapest model that works.** Default to haiku. Upgrade to sonnet only when the task demonstrably needs it. Opus is user-opt-in only.
4. **Transparent.** Every cost shows up in the orchestration report. No hidden token spend.
5. **Learn from history.** After enough runs, replace estimates with actual averages from `metrics.jsonl`.