ArcheFlow Dogfood Report: Colette Expose/Pitch Generation

Date: 2026-04-04
Task: Implement expose and pitch generation steps in Colette's fanout pipeline
Project: writing.colette (Python, 27 modules, 457 tests)

Task Description

The fanout pipeline in src/colette/fanout.py had two placeholder steps (generate_expose, generate_pitch) that logged "not yet implemented". The task was to replace them with real LLM-powered implementations that generate publishing proposals and pitch letters.

Conditions

| Condition | Strategy | Agents | Time | Lines |
|---|---|---|---|---|
| Plain Claude (no orchestration) | None | 0 | ~3 min | 107 (+75 impl, +32 test) |
| ArcheFlow PDCA (standard workflow) | pdca | 4 (Explorer, Creator, Maker, Guardian) | ~15 min | 230 (+145 impl, +85 test) |

Findings

Bugs introduced

| Condition | Bug | Caught by | Severity |
|---|---|---|---|
| Plain Claude | None | N/A | N/A |
| ArcheFlow | task_type/file_path kwargs passed to LLMClient.create() but only exist on GuardedLLMClient | Guardian review | CRITICAL (runtime crash on non-guarded clients) |

Key observation: ArcheFlow's Maker introduced a bug that plain Claude avoided. The Guardian caught it, but the net result was: introduce bug + catch bug = extra work for the same outcome.
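The failure mode is easy to reproduce in isolation. The sketch below is illustrative only: the `create()` signatures, the `prompt` parameter, the `PlainClient` stand-in, and both `generate_expose` variants are assumptions made for the example, not Colette's actual code. Only the names `LLMClient`, `GuardedLLMClient`, `task_type`, and `file_path` come from this report.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical protocol: the arguments every client must accept."""

    def create(self, prompt: str) -> str: ...


class PlainClient:
    """Stand-in for a non-guarded client that implements the protocol exactly."""

    def create(self, prompt: str) -> str:
        return f"response to: {prompt}"


class GuardedLLMClient:
    """Stand-in for the guarded client, which accepts extra keyword arguments."""

    def create(self, prompt: str, task_type: str = "", file_path: str = "") -> str:
        return f"[{task_type}] response to: {prompt}"


def generate_expose_buggy(client: LLMClient) -> str:
    # Passes kwargs that are not part of the protocol. This works with the
    # guarded client and raises TypeError with any other implementation.
    return client.create("Write an expose", task_type="expose", file_path="out.md")


def generate_expose_fixed(client: LLMClient) -> str:
    # Protocol arguments only, so every conforming client works.
    return client.create("Write an expose")
```

Under these assumptions, `generate_expose_buggy(PlainClient())` raises `TypeError` at call time, while the fixed variant runs against both clients. A static type checker would also be expected to reject the buggy call, since the extra keywords are not declared on the protocol.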

Code comparison

| Metric | Plain Claude | ArcheFlow |
|---|---|---|
| Implementation lines | 75 | 145 |
| Test lines | 32 | 85 |
| LLMClient compatibility | Clean (protocol args only) | Needed fix (extra kwargs) |
| Prompt detail | Adequate (10 sections listed) | More detailed (explicit section descriptions) |
| Defensive coding | Minimal (follows existing patterns) | More (mkdir guards, fallback paths) |
| Test thoroughness | Basic (file existence, call count) | More thorough (token accumulation, error states) |

Process overhead

| Phase | Time | Value added |
|---|---|---|
| Explorer research | ~60s | Low: task was well-scoped, pattern was obvious from reading 2 lines |
| Creator proposal | ~45s | Low: 300-line plan for 75-line task, mostly restated what the code already showed |
| Maker implementation | ~90s | Same as plain Claude, but produced more verbose code + a bug |
| Guardian review | ~30s | Mixed: caught 1 real bug (out of 5 findings, 80% noise) |

Why plain Claude won

  1. Pattern-following task. Two placeholder functions, one existing pattern to copy. No ambiguity, no design decisions, no security concerns.
  2. Direct protocol reading. Plain Claude checked the LLMClient.create() signature and used only standard args. The Maker, working from the Creator's plan (which didn't mention the protocol), used extra kwargs it saw in the GuardedLLMClient.
  3. Less indirection = fewer errors. The Creator-to-Maker handoff introduced information loss. The Creator specified "call llm_client.create()" but didn't specify the exact signature constraints. Plain Claude read the source of truth directly.
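The handoff gap described in points 2 and 3 could in principle be checked mechanically: before passing optional keyword arguments, compare them against the signature of the callable actually being invoked. The helper below is a minimal sketch using the standard library's `inspect` module. `unsupported_kwargs` and the two example functions are hypothetical names introduced here, not part of Colette or ArcheFlow.

```python
import inspect
from typing import Any, Callable


def unsupported_kwargs(func: Callable[..., Any], kwargs: dict[str, Any]) -> list[str]:
    """Return the keyword names in `kwargs` that `func` would reject."""
    params = inspect.signature(func).parameters.values()
    # A **kwargs catch-all accepts any keyword, so nothing can be rejected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return []
    keyword_capable = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {p.name for p in params if p.kind in keyword_capable}
    return sorted(name for name in kwargs if name not in accepted)


# Hypothetical signatures standing in for the protocol and the guarded client.
def protocol_create(prompt: str) -> str:
    return ""


def guarded_create(prompt: str, task_type: str = "", file_path: str = "") -> str:
    return ""


extra = {"task_type": "expose", "file_path": "out.md"}
rejected_by_protocol = unsupported_kwargs(protocol_create, extra)
rejected_by_guarded = unsupported_kwargs(guarded_create, extra)
```

Here `rejected_by_protocol` lists both extra names and `rejected_by_guarded` is empty. The same mismatch appears as soon as the plan is checked against the protocol rather than against one concrete client.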

When ArcheFlow would have been worth it

This task had none of these signals:

  • Ambiguous requirements (need Explorer)
  • Multiple valid approaches (need Creator to evaluate)
  • Security-sensitive code (need Guardian for real threats)
  • Cross-cutting changes (5+ files, interaction risks)
  • Unfamiliar codebase (need research phase)
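The signals above could be folded into an activation check. The sketch below is one possible shape, not ArcheFlow's actual auto-select logic: `TaskSignals`, `file_count_rule`, and `signal_rule` are hypothetical names. It also assumes this task counted as two files (implementation plus tests), which the "touches 2+ files" threshold in the conclusion implies but the report does not state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSignals:
    """Hypothetical task description; field names are illustrative."""

    ambiguous_requirements: bool = False
    multiple_valid_approaches: bool = False
    security_sensitive: bool = False
    unfamiliar_codebase: bool = False
    files_touched: int = 1


def file_count_rule(task: TaskSignals) -> bool:
    """File-count threshold only: orchestrate whenever 2+ files are touched."""
    return task.files_touched >= 2


def signal_rule(task: TaskSignals) -> bool:
    """Orchestrate only when at least one of the listed signals is present."""
    return (
        task.ambiguous_requirements
        or task.multiple_valid_approaches
        or task.security_sensitive
        or task.unfamiliar_codebase
        or task.files_touched >= 5  # cross-cutting change
    )


# Assumption: the expose/pitch task touched fanout.py plus one test file.
colette_task = TaskSignals(files_touched=2)
```

With these definitions, `file_count_rule(colette_task)` selects orchestration while `signal_rule(colette_task)` skips it, which matches the outcome this report argues for.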

Improvement opportunities

  1. Auto-select should skip orchestration for pattern-following tasks (placeholder + existing pattern in same file)
  2. Creator compact mode — for simple tasks, emit a 10-line diff-style plan, not a 300-line essay
  3. Explorer budget cap — 60s max for single-file tasks
  4. Guardian calibration — inject project conventions to reduce false positives from 80% to ~40%
  5. Baseline capture — run the same task without ArcheFlow to enable A/B comparison
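For the baseline capture in item 5, the A/B comparison reduces to a handful of ratios. The sketch below computes them from the numbers in the Conditions table. `RunStats` and `overhead` are hypothetical helpers introduced for illustration, and the times are the approximate values reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStats:
    """Measurements for one condition, taken from the Conditions table."""

    minutes: float
    impl_lines: int
    test_lines: int

    @property
    def total_lines(self) -> int:
        return self.impl_lines + self.test_lines


def overhead(baseline: RunStats, candidate: RunStats) -> dict[str, float]:
    """Return candidate/baseline ratios, rounded to two decimals."""
    return {
        "time": round(candidate.minutes / baseline.minutes, 2),
        "impl_lines": round(candidate.impl_lines / baseline.impl_lines, 2),
        "test_lines": round(candidate.test_lines / baseline.test_lines, 2),
        "total_lines": round(candidate.total_lines / baseline.total_lines, 2),
    }


plain = RunStats(minutes=3, impl_lines=75, test_lines=32)
archeflow = RunStats(minutes=15, impl_lines=145, test_lines=85)
ratios = overhead(plain, archeflow)
```

On this task the ratios come out to roughly 5x the time and about 2.15x the total lines for ArcheFlow relative to plain Claude.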

Conclusion

For this specific task (simple, pattern-following, single-file, well-scoped), ArcheFlow added cost without adding quality. Plain Claude was faster, produced less code, and avoided a bug that the Maker introduced.

This is not a failure of ArcheFlow's design — it's a calibration problem. The auto-select heuristic should have detected this as a skip-orchestration task. The complexity threshold for ArcheFlow activation needs to be higher than "touches 2+ files."

Honest assessment: ArcheFlow's value-add starts at tasks requiring genuine design decisions, security review, or cross-module coordination. Below that threshold, it's ceremony.