---
name: cost-tracking
description: Cost aggregation, budget enforcement, and model selection for ArcheFlow orchestrations. Tracks per-agent and per-run token costs, enforces budgets, and recommends the cheapest model that meets quality requirements per archetype and domain. <example>User: "How much did that orchestration cost?"</example> <example>Automatically active when budget is configured</example>
---

# Cost Tracking — Budget-Aware Orchestration

Every ArcheFlow orchestration consumes LLM tokens. This skill tracks costs per agent and per run, enforces budgets, and recommends cost-optimal model assignments.

## Model Pricing Table

Current pricing (update when models change):

| Model | Input ($/M tokens) | Output ($/M tokens) | Notes |
|-------|--------------------|--------------------|-------|
| claude-opus-4-6 | 15.00 | 75.00 | Highest quality, use sparingly |
| claude-sonnet-4-6 | 3.00 | 15.00 | Good balance of quality and cost |
| claude-haiku-4-5 | 0.80 | 4.00 | Cheap, fast, good for structured tasks |

**Prompt caching** (when applicable): 90% discount on cached input tokens. The orchestrator should structure system prompts to maximize cache hits (archetype instructions, voice profiles, and domain context are cache-friendly since they repeat across agents in a run).

**Batches API**: 50% discount on all tokens. Use it for non-time-sensitive bulk operations (validation passes, consistency checks).

## Per-Agent Cost Tracking

Every `agent.complete` event includes cost data:

```json
{
  "type": "agent.complete",
  "data": {
    "archetype": "story-explorer",
    "duration_ms": 87605,
    "tokens_input": 15000,
    "tokens_output": 6000,
    "tokens_cache_read": 8000,
    "model": "haiku",
    "estimated_cost_usd": 0.02,
    "summary": "3 plot directions developed, recommended C"
  }
}
```

### Cost Calculation

```
cost = (tokens_input - tokens_cache_read) * input_price / 1_000_000
     + tokens_cache_read * input_price * 0.10 / 1_000_000
     + tokens_output * output_price / 1_000_000
```

If exact token counts are unavailable (Claude Code doesn't always expose them), estimate based on character count:

```
estimated_tokens = character_count / 4   # rough heuristic
```

Mark estimated costs with `"cost_estimated": true` in the event data so reports can distinguish measured from estimated values.
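As a minimal sketch, the formula and the character-count fallback above in Python (pricing constants come from the table; function names are illustrative, not part of the plugin's actual API):

```python
# Prices from the pricing table, in $/M tokens: (input, output).
PRICES = {
    "opus":   (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku":  (0.80, 4.00),
}
CACHE_DISCOUNT = 0.10  # cached input tokens cost 10% of the normal rate


def agent_cost_usd(model, tokens_input, tokens_output, tokens_cache_read=0):
    """Cost of one agent call, applying the 90% prompt-cache discount."""
    in_price, out_price = PRICES[model]
    uncached = tokens_input - tokens_cache_read
    return (uncached * in_price
            + tokens_cache_read * in_price * CACHE_DISCOUNT
            + tokens_output * out_price) / 1_000_000


def estimate_tokens(character_count):
    """Fallback when exact counts are unavailable: ~4 chars per token."""
    return character_count // 4
```

Note that plugging the example event's numbers into the formula gives a slightly higher figure than the event's `estimated_cost_usd` of 0.02, which is itself rounded.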

## Run-Level Aggregation

The `run.complete` event includes cost totals:

```json
{
  "type": "run.complete",
  "data": {
    "status": "completed",
    "total_tokens_input": 95000,
    "total_tokens_output": 33000,
    "total_tokens_cache_read": 42000,
    "total_cost_usd": 1.45,
    "budget_usd": 10.00,
    "budget_remaining_usd": 8.55,
    "agents_total": 5,
    "cost_by_phase": {
      "plan": 0.35,
      "do": 0.72,
      "check": 0.38
    },
    "cost_by_model": {
      "haiku": 0.12,
      "sonnet": 1.33
    }
  }
}
```
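A sketch of the aggregation, assuming `agent.complete` payloads shaped like the per-agent event above (the `phase` field is supplied per event here for illustration; it does not appear in the per-agent schema shown earlier):

```python
from collections import defaultdict


def aggregate_run(agent_events, budget_usd=None):
    """Fold agent.complete events into run.complete cost totals."""
    totals = {
        "total_tokens_input": 0,
        "total_tokens_output": 0,
        "total_tokens_cache_read": 0,
        "total_cost_usd": 0.0,
        "agents_total": 0,
        "cost_by_phase": defaultdict(float),
        "cost_by_model": defaultdict(float),
    }
    for ev in agent_events:
        d = ev["data"]
        totals["total_tokens_input"] += d["tokens_input"]
        totals["total_tokens_output"] += d["tokens_output"]
        totals["total_tokens_cache_read"] += d.get("tokens_cache_read", 0)
        totals["total_cost_usd"] += d["estimated_cost_usd"]
        totals["agents_total"] += 1
        totals["cost_by_phase"][d["phase"]] += d["estimated_cost_usd"]
        totals["cost_by_model"][d["model"]] += d["estimated_cost_usd"]
    if budget_usd is not None:
        totals["budget_usd"] = budget_usd
        totals["budget_remaining_usd"] = round(
            budget_usd - totals["total_cost_usd"], 2)
    return totals
```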

## Cost Summary in Orchestration Report

After each orchestration, the report includes a cost section:

```markdown
## Cost Summary
| Phase | Model(s) | Tokens (in/out) | Cost |
|-------|----------|-----------------|------|
| Plan  | haiku, sonnet | 32k / 12k | $0.35 |
| Do    | sonnet | 40k / 15k | $0.72 |
| Check | haiku, sonnet | 23k / 6k | $0.38 |
| **Total** | | **95k / 33k** | **$1.45** |

Budget: $10.00 | Spent: $1.45 | Remaining: $8.55
```

## Budget Configuration

Budgets are defined in team presets or `.archeflow/config.yaml`:

```yaml
# .archeflow/config.yaml
budget:
  per_run_usd: 10.00        # Max cost per orchestration run
  per_agent_usd: 3.00       # Max cost per individual agent
  daily_usd: 50.00          # Max daily spend across all runs
  warn_at_percent: 75       # Warn when this % of budget is consumed
```

```yaml
# Team preset override
name: story-development
domain: writing
budget:
  per_run_usd: 5.00         # Writing runs are usually cheaper
```

A team preset budget overrides the global config for that run.

### Budget Precedence

1. Team preset budget (if set)
2. `.archeflow/config.yaml` budget
3. No budget (unlimited) — costs are still tracked but not enforced
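The precedence can be sketched as a per-key merge (whether a preset replaces the whole budget block or only the keys it sets is an assumption; the example preset above only overrides `per_run_usd`):

```python
def effective_budget(config_budget, preset_budget):
    """Merge budgets in precedence order: preset keys win over config keys.

    Returns {} when neither is set: costs are tracked but not enforced.
    """
    merged = dict(config_budget or {})
    merged.update(preset_budget or {})
    return merged
```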

## Budget Enforcement

Budget checks happen at two points:

### 1. Pre-Agent Check (before spawning)

Before each agent is spawned, estimate its cost and check it against the remaining budget:

```
estimated_agent_cost = estimate_tokens(archetype, task_complexity) * model_price
remaining_budget = budget - sum(costs_so_far)

if estimated_agent_cost > remaining_budget:
    WARN: "Estimated cost for {archetype} (${estimated}) would exceed remaining budget (${remaining}). Continue? [y/N]"
```

**In autonomous mode**: if the budget would be exceeded, STOP the run and report. Do not prompt — there is no one to answer.

**In attended mode**: warn and ask the user. They can approve the overage or stop.
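A sketch of the pre-agent gate under these rules (`ask_user` stands in for the attended-mode prompt and is an assumed helper, not the plugin's API):

```python
def pre_agent_check(estimated_cost, spent_so_far, budget, autonomous,
                    ask_user=lambda msg: False):
    """Return True to spawn the agent, False to halt the run."""
    if budget is None:
        return True  # no budget configured: track only, never block
    remaining = budget - spent_so_far
    if estimated_cost <= remaining:
        return True
    if autonomous:
        # No one to answer a prompt: stop the run and report.
        return False
    # Attended mode: the user may approve the overage.
    return ask_user(
        f"Estimated cost ${estimated_cost:.2f} exceeds remaining "
        f"budget ${remaining:.2f}. Continue?"
    )
```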

### 2. Post-Agent Check (after completion)

After each agent completes, update the running total and check:

```
if total_cost > budget * warn_at_percent / 100:
    WARN: "${warn_at_percent}% of budget consumed (${total_cost} of ${budget})"

if total_cost > budget:
    STOP: "Budget exceeded (${total_cost} of ${budget}). Run halted."
```
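The post-agent check reduces to a small decision function; this is a sketch with illustrative names:

```python
def post_agent_check(total_cost, budget, warn_at_percent=75):
    """Return 'stop', 'warn', or 'ok' for the updated running total."""
    if budget is None:
        return "ok"     # no budget: track only
    if total_cost > budget:
        return "stop"   # budget exceeded: halt the run
    if total_cost > budget * warn_at_percent / 100:
        return "warn"   # warn threshold crossed: continue with a warning
    return "ok"
```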

## Pre-Agent Cost Estimation

Rough token estimates by archetype (calibrate over time with actual data from `metrics.jsonl`):

| Archetype | Typical Input | Typical Output | Notes |
|-----------|---------------|----------------|-------|
| Explorer | 8k | 4k | Research, reads many files |
| Creator | 12k | 6k | Receives Explorer output, produces plan |
| Maker | 15k | 12k | Largest output (implementation/prose) |
| Guardian | 10k | 3k | Reads diff, structured output |
| Skeptic | 8k | 3k | Reads proposal, structured challenges |
| Sage | 12k | 4k | Reads diff + proposal |
| Trickster | 8k | 4k | Reads diff, generates test cases |

These are starting estimates. After 10+ runs, use actual averages from `metrics.jsonl` instead.
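A sketch of pre-spawn estimation from the table above (constants and names are illustrative):

```python
# Starting token estimates per archetype: (typical input, typical output).
ARCHETYPE_TOKENS = {
    "explorer":  (8_000, 4_000),
    "creator":   (12_000, 6_000),
    "maker":     (15_000, 12_000),
    "guardian":  (10_000, 3_000),
    "skeptic":   (8_000, 3_000),
    "sage":      (12_000, 4_000),
    "trickster": (8_000, 4_000),
}


def estimate_agent_cost_usd(archetype, model_prices):
    """Pre-spawn estimate; model_prices is (input, output) in $/M tokens."""
    tokens_in, tokens_out = ARCHETYPE_TOKENS[archetype]
    in_price, out_price = model_prices
    return (tokens_in * in_price + tokens_out * out_price) / 1_000_000
```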

## Cost-Aware Model Selection

Each archetype has a recommended model tier based on the quality requirements of its role.

### Default Model Assignments (Code Domain)

| Archetype | Model | Rationale |
|-----------|-------|-----------|
| Explorer | haiku | Research is structured extraction — cheap model handles it well |
| Creator | sonnet | Design decisions need reasoning quality |
| Maker | sonnet | Implementation needs quality to avoid rework cycles |
| Guardian | haiku | Security/risk review is checklist-driven — structured and cheap |
| Skeptic | haiku | Challenge generation follows patterns — cheap |
| Sage | sonnet | Holistic quality judgment needs nuance |
| Trickster | haiku | Adversarial testing is systematic — cheap |

### Writing Domain Overrides

Writing tasks need higher quality for prose-generating agents:

| Archetype | Model | Rationale |
|-----------|-------|-----------|
| Explorer / story-explorer | haiku | Research is still cheap |
| Creator | sonnet | Outline design needs narrative judgment |
| Maker | sonnet | Prose quality is the product — cannot be cheap |
| Guardian | haiku | Plot/continuity checks are structured |
| Skeptic | haiku | Premise challenges are structured |
| Sage / story-sage | sonnet | Voice and craft judgment need taste |
| Trickster | haiku | Reader-confusion analysis is systematic |

**When to escalate to opus**: Only for final-pass prose polishing on high-stakes content (book manuscripts, not short stories). Never for review or research agents. The user must explicitly opt in via:

```yaml
# Team preset
model_overrides:
  maker: opus    # Only for final polish pass
```

## Domain-Driven Model Selection

The effective model for each agent is resolved in this order:

1. Team preset `model_overrides` (highest priority — explicit choice)
2. Domain `model_overrides` (from `.archeflow/domains/<name>.yaml`)
3. Archetype default (from the table above)
4. Custom archetype `model` field (from archetype YAML frontmatter)

Example resolution for story-sage in a writing run:

- Team preset says nothing about story-sage → skip
- Writing domain says `story-sage: sonnet` → use sonnet
- Archetype YAML says `model: sonnet` → would have been used if the domain didn't specify
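The resolution order can be sketched as follows (argument names are illustrative, not the plugin's API):

```python
def resolve_model(archetype, preset_overrides, domain_overrides,
                  archetype_default, archetype_yaml_model=None):
    """Resolve the effective model for one agent, highest priority first."""
    if archetype in (preset_overrides or {}):
        return preset_overrides[archetype]   # 1. team preset override
    if archetype in (domain_overrides or {}):
        return domain_overrides[archetype]   # 2. domain override
    if archetype_default:
        return archetype_default             # 3. archetype default table
    return archetype_yaml_model              # 4. custom archetype YAML field
```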

## Cost Optimization Strategies

### 1. Prompt Caching

Structure prompts so that stable content comes first (maximizes cache prefix hits):

```
[System prompt — archetype instructions]     ← cached across agents in same run
[Domain context — voice profile, persona]    ← cached across agents in same run
[Phase context — Explorer output, proposal]  ← changes per agent
[Task-specific instructions]                 ← changes per agent
```

Estimated savings: 30-50% on input tokens for runs with 5+ agents.
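The arithmetic behind that estimate, as a sketch: each cached input token costs 10% of the normal rate, so a run where half the input tokens hit the cache saves 45% on input cost (helper names are hypothetical):

```python
def input_cost_with_cache(tokens_input, cache_hit_fraction, input_price):
    """Input-token cost in USD when a fraction of the prompt hits the cache."""
    cached = tokens_input * cache_hit_fraction
    uncached = tokens_input - cached
    # Cached tokens are billed at 10% of the normal input rate.
    return (uncached * input_price + cached * input_price * 0.10) / 1_000_000


def cache_savings_fraction(cache_hit_fraction):
    """Fractional saving on input cost: each cached token saves 90%."""
    return 0.9 * cache_hit_fraction
```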

2. Guardian Fast-Path (A2)

When Guardian approves with 0 issues, skip Skeptic/Sage/Trickster. This saves 2-3 agent calls per cycle. See archeflow:orchestration skill, rule A2.

Typical savings: $0.30-0.80 per skipped cycle (depending on models).

### 3. Explorer Cache

Reuse recent Explorer research instead of re-running. See archeflow:orchestration skill, Explorer Cache section.

Typical savings: $0.02-0.05 per cache hit (haiku Explorer).

### 4. Batches API for Bulk Operations

When running consistency checks, validation passes, or other non-time-sensitive work across multiple files, use the Batches API (50% discount):

```yaml
# Mark agents as batch-eligible in team presets
batch_eligible:
  - guardian    # Structured review, can wait
  - skeptic     # Challenge generation, can wait
```

Only use batches when the user is not waiting for real-time results (overnight runs, autonomous mode).

### 5. Early Termination

If the first cycle produces a clean Guardian pass (A2 fast-path) AND the Maker's self-review checklist is clean, skip the remaining cycles even if `max_cycles > 1`. This avoids spending tokens on unnecessary verification.

## Daily Cost Tracking

Across runs, maintain a daily cost ledger:

```
.archeflow/costs/<YYYY-MM-DD>.jsonl
```

Each line is one run's cost summary:

```json
{"run_id":"2026-04-03-der-huster","cost_usd":1.45,"tokens_input":95000,"tokens_output":33000,"models":{"haiku":2,"sonnet":3},"domain":"writing"}
{"run_id":"2026-04-03-auth-refactor","cost_usd":2.10,"tokens_input":120000,"tokens_output":45000,"models":{"haiku":3,"sonnet":2},"domain":"code"}
```

Daily budget enforcement reads this file to check daily_usd limits before starting new runs.
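A sketch of reading the ledger for daily enforcement, assuming the file layout above (function names are illustrative, not the plugin's API):

```python
import json
from datetime import date
from pathlib import Path


def daily_spend_usd(costs_dir=".archeflow/costs", day=None):
    """Sum cost_usd across one day's ledger (<YYYY-MM-DD>.jsonl)."""
    day = day or date.today().isoformat()
    ledger = Path(costs_dir) / f"{day}.jsonl"
    if not ledger.exists():
        return 0.0  # no runs recorded for that day
    with ledger.open() as f:
        return sum(json.loads(line)["cost_usd"]
                   for line in f if line.strip())


def daily_budget_allows(daily_usd_limit, costs_dir=".archeflow/costs"):
    """Check the daily_usd limit before starting a new run."""
    return daily_spend_usd(costs_dir) < daily_usd_limit
```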

## Cost Report Command

```bash
# Show today's costs
./lib/archeflow-costs.sh today

# Show costs for a date range
./lib/archeflow-costs.sh 2026-04-01 2026-04-03

# Show costs for a specific run
./lib/archeflow-costs.sh run 2026-04-03-der-huster
```

## Integration with Other Skills

- **orchestration**: Calls the pre-agent and post-agent budget checks. Includes the cost summary in the orchestration report.
- **process-log**: Cost data is embedded in `agent.complete` and `run.complete` events. No separate cost events are needed.
- **domains**: Reads `model_overrides` from the active domain to determine the effective model per agent.
- **autonomous-mode**: Enforces the budget strictly (no prompts — just stop when the budget is exceeded). Uses the daily budget to limit overnight spend.
- **workflow-design**: Custom workflows can specify per-phase model assignments that override domain defaults.

## Design Principles

1. **Track always, enforce optionally.** Cost data is in every event regardless of whether a budget is set. Budget enforcement is opt-in.
2. **Estimate before spend.** Always estimate before spawning an agent. Surprises are worse than slightly inaccurate estimates.
3. **Cheapest model that works.** Default to haiku. Upgrade to sonnet only when the task demonstrably needs it. Opus is user-opt-in only.
4. **Transparent.** Every cost shows up in the orchestration report. No hidden token spend.
5. **Learn from history.** After enough runs, replace estimates with actual averages from `metrics.jsonl`.