4.5 KiB
ArcheFlow Dogfood Report: Colette Expose/Pitch Generation
Date: 2026-04-04 Task: Implement expose and pitch generation steps in Colette's fanout pipeline Project: writing.colette (Python, 27 modules, 457 tests)
Task Description
The fanout pipeline in src/colette/fanout.py had two placeholder steps (generate_expose, generate_pitch) that logged "not yet implemented". The task was to replace them with real LLM-powered implementations that generate publishing proposals and pitch letters.
Conditions
| Condition | Strategy | Agents | Time | Lines |
|---|---|---|---|---|
| Plain Claude (no orchestration) | None | 0 | ~3 min | 107 (+75 impl, +32 test) |
| ArcheFlow PDCA (standard workflow) | pdca | 4 (Explorer, Creator, Maker, Guardian) | ~15 min | 230 (+145 impl, +85 test) |
Findings
Bugs introduced
| Condition | Bug | Caught by | Severity |
|---|---|---|---|
| Plain Claude | None | N/A | N/A |
| ArcheFlow | task_type/file_path kwargs passed to LLMClient.create() but only exist on GuardedLLMClient |
Guardian review | CRITICAL (runtime crash on non-guarded clients) |
Key observation: ArcheFlow's Maker introduced a bug that plain Claude avoided. The Guardian caught it, but the net result was: introduce bug + catch bug = extra work for the same outcome.
Code comparison
| Metric | Plain Claude | ArcheFlow |
|---|---|---|
| Implementation lines | 75 | 145 |
| Test lines | 32 | 85 |
| LLMClient compatibility | Clean (protocol args only) | Needed fix (extra kwargs) |
| Prompt detail | Adequate (10 sections listed) | More detailed (explicit section descriptions) |
| Defensive coding | Minimal (follows existing patterns) | More (mkdir guards, fallback paths) |
| Test thoroughness | Basic (file existence, call count) | More thorough (token accumulation, error states) |
Process overhead
| Phase | Time | Value added |
|---|---|---|
| Explorer research | ~60s | Low — task was well-scoped, pattern was obvious from reading 2 lines |
| Creator proposal | ~45s | Low — 300-line plan for 75-line task, mostly restated what the code already showed |
| Maker implementation | ~90s | Same as plain Claude, but produced more verbose code + a bug |
| Guardian review | ~30s | Mixed — caught 1 real bug (out of 5 findings, 80% noise) |
Why plain Claude won
- Pattern-following task. Two placeholder functions, one existing pattern to copy. No ambiguity, no design decisions, no security concerns.
- Direct protocol reading. Plain Claude checked the
LLMClient.create()signature and used only standard args. The Maker, working from the Creator's plan (which didn't mention the protocol), used extra kwargs it saw in theGuardedLLMClient. - Less indirection = fewer errors. The Creator-to-Maker handoff introduced information loss. The Creator specified "call llm_client.create()" but didn't specify the exact signature constraints. Plain Claude read the source of truth directly.
When ArcheFlow would have been worth it
This task had none of these signals:
- Ambiguous requirements (need Explorer)
- Multiple valid approaches (need Creator to evaluate)
- Security-sensitive code (need Guardian for real threats)
- Cross-cutting changes (5+ files, interaction risks)
- Unfamiliar codebase (need research phase)
Improvement opportunities
- Auto-select should skip orchestration for pattern-following tasks (placeholder + existing pattern in same file)
- Creator compact mode — for simple tasks, emit a 10-line diff-style plan, not a 300-line essay
- Explorer budget cap — 60s max for single-file tasks
- Guardian calibration — inject project conventions to reduce false positives from 80% to ~40%
- Baseline capture — run the same task without ArcheFlow to enable A/B comparison
Conclusion
For this specific task (simple, pattern-following, single-file, well-scoped), ArcheFlow added cost without adding quality. Plain Claude was faster, produced less code, and avoided a bug that the Maker introduced.
This is not a failure of ArcheFlow's design — it's a calibration problem. The auto-select heuristic should have detected this as a skip-orchestration task. The complexity threshold for ArcheFlow activation needs to be higher than "touches 2+ files."
Honest assessment: ArcheFlow's value-add starts at tasks requiring genuine design decisions, security review, or cross-module coordination. Below that threshold, it's ceremony.