ArcheFlow Dogfood Report: Colette Expose/Pitch Generation

Date: 2026-04-04
Task: Implement expose and pitch generation steps in Colette's fanout pipeline
Project: writing.colette (Python, 27 modules, 457 tests)

Task Description

The fanout pipeline in src/colette/fanout.py had two placeholder steps (generate_expose, generate_pitch) that logged "not yet implemented". The task was to replace them with real LLM-powered implementations that generate publishing proposals and pitch letters.

Conditions

Condition	Strategy	Agents	Time	Lines
Plain Claude (no orchestration)	None	0	~3 min	107 (+75 impl, +32 test)
ArcheFlow PDCA (standard workflow)	pdca	4 (Explorer, Creator, Maker, Guardian)	~15 min	230 (+145 impl, +85 test)

Findings

Bugs introduced

Condition	Bug	Caught by	Severity
Plain Claude	None	N/A	N/A
ArcheFlow	`task_type`/`file_path` kwargs passed to `LLMClient.create()` but only exist on `GuardedLLMClient`	Guardian review	CRITICAL (runtime crash on non-guarded clients)

Key observation: ArcheFlow's Maker introduced a bug that plain Claude avoided. The Guardian caught it, so the final code was correct, but introducing and then catching the bug was extra work for the same outcome.
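The failure mode can be reproduced in a few lines. This is a minimal sketch, assuming a simplified `create(prompt)` signature: the real `LLMClient.create()` signature is not shown in this report, so `prompt`, `PlainClient`, and both `generate_pitch_*` helpers are hypothetical. Only the names `LLMClient`, `GuardedLLMClient`, `task_type`, and `file_path` come from the finding above.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Assumed stand-in for the protocol: standard arguments only."""

    def create(self, prompt: str) -> str: ...


class PlainClient:
    """Hypothetical non-guarded client implementing only the protocol args."""

    def create(self, prompt: str) -> str:
        return f"response to: {prompt}"


class GuardedLLMClient:
    """Stand-in for the guarded client, which accepts the two extra kwargs."""

    def create(self, prompt: str, task_type: str = "", file_path: str = "") -> str:
        return f"guarded response to: {prompt}"


def generate_pitch_buggy(client: LLMClient) -> str:
    # Passes kwargs that are not part of the protocol.
    # Works with GuardedLLMClient, raises TypeError with any other client.
    return client.create("Write a pitch letter", task_type="pitch", file_path="pitch.md")


def generate_pitch_safe(client: LLMClient) -> str:
    # Uses protocol arguments only, so every conforming client works.
    return client.create("Write a pitch letter")
```

Under these assumptions, `generate_pitch_buggy(PlainClient())` raises `TypeError` at call time, while `generate_pitch_safe` works with both clients. A static type checker would likely also flag the extra kwargs against the protocol.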

Code comparison

Metric	Plain Claude	ArcheFlow
Implementation lines	75	145
Test lines	32	85
LLMClient compatibility	Clean (protocol args only)	Needed fix (extra kwargs)
Prompt detail	Adequate (10 sections listed)	More detailed (explicit section descriptions)
Defensive coding	Minimal (follows existing patterns)	More (mkdir guards, fallback paths)
Test thoroughness	Basic (file existence, call count)	More thorough (token accumulation, error states)

Process overhead

Phase	Time	Value added
Explorer research	~60s	Low: the task was well scoped and the pattern was obvious from reading 2 lines
Creator proposal	~45s	Low: a 300-line plan for a 75-line task that mostly restated what the code already showed
Maker implementation	~90s	Same as plain Claude, but with more verbose code and a bug
Guardian review	~30s	Mixed: caught 1 real bug, but 4 of its 5 findings (80%) were noise
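The figures in the tables can be checked with a few lines of arithmetic. All inputs below are the approximate values reported above. Note that the four listed phases sum to well under the ~15 min total, and the report does not say where the remaining time went.

```python
# Approximate phase durations in seconds, copied from the table above.
phase_seconds = {
    "Explorer research": 60,
    "Creator proposal": 45,
    "Maker implementation": 90,
    "Guardian review": 30,
}
listed_total = sum(phase_seconds.values())      # 225 s across the four phases
reported_total = 15 * 60                        # ~15 min from the Conditions table
unaccounted = reported_total - listed_total     # 675 s not broken down by phase

# Guardian signal-to-noise: 1 real bug out of 5 findings.
findings, real_bugs = 5, 1
noise_rate = (findings - real_bugs) / findings  # 0.8

# Overall overhead relative to plain Claude (~3 min, 107 lines).
time_ratio = reported_total / (3 * 60)          # 5.0
line_ratio = 230 / 107                          # about 2.15
```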

Why plain Claude won

Pattern-following task. Two placeholder functions, one existing pattern to copy. No ambiguity, no design decisions, no security concerns.
Direct protocol reading. Plain Claude checked the LLMClient.create() signature and used only standard args. The Maker, working from the Creator's plan (which didn't mention the protocol), used extra kwargs it saw in the GuardedLLMClient.
Less indirection = fewer errors. The Creator-to-Maker handoff introduced information loss. The Creator specified "call llm_client.create()" but didn't specify the exact signature constraints. Plain Claude read the source of truth directly.
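One way to make the signature constraint explicit at a handoff is to check call kwargs against the protocol itself. The following is a sketch only: `protocol_violations` is a hypothetical helper, and the `create(prompt)` signature is an assumed simplification, since the real one is not shown in this report.

```python
import inspect
from typing import Any, Protocol


class LLMClient(Protocol):
    """Assumed stand-in for the protocol: standard arguments only."""

    def create(self, prompt: str) -> str: ...


def protocol_violations(method_name: str, kwargs: dict[str, Any]) -> list[str]:
    """Return the kwarg names that the protocol method does not accept."""
    params = inspect.signature(getattr(LLMClient, method_name)).parameters
    allowed = set(params) - {"self"}
    return sorted(set(kwargs) - allowed)


# The Maker's call, as described above, would be flagged:
maker_call = {"prompt": "Write a pitch letter", "task_type": "pitch", "file_path": "pitch.md"}
flagged = protocol_violations("create", maker_call)

# A protocol-only call passes:
clean = protocol_violations("create", {"prompt": "Write a pitch letter"})
```

A check like this could run in a test double or as part of the plan handed to the Maker. It treats the protocol definition as the source of truth rather than any one concrete client.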

When ArcheFlow would have been worth it

This task had none of these signals:

Ambiguous requirements (need Explorer)
Multiple valid approaches (need Creator to evaluate)
Security-sensitive code (need Guardian for real threats)
Cross-cutting changes (5+ files, interaction risks)
Unfamiliar codebase (need research phase)

Improvement opportunities

Auto-select should skip orchestration for pattern-following tasks (placeholder + existing pattern in same file)
Creator compact mode — for simple tasks, emit a 10-line diff-style plan, not a 300-line essay
Explorer budget cap — 60s max for single-file tasks
Guardian calibration — inject project conventions to reduce false positives from 80% to ~40%
Baseline capture — run the same task without ArcheFlow to enable A/B comparison
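The first suggestion could be sketched using the five signals from the previous section. This is an illustrative heuristic, not ArcheFlow's actual auto-select logic: `TaskSignals`, `select_strategy`, and the field names are hypothetical, and the file-count threshold of 5 is taken from the "5+ files" signal above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSignals:
    """Hypothetical inputs to an auto-select heuristic (not ArcheFlow's real API)."""

    ambiguous_requirements: bool = False
    multiple_valid_approaches: bool = False
    security_sensitive: bool = False
    unfamiliar_codebase: bool = False
    files_touched: int = 1


def select_strategy(signals: TaskSignals, cross_cutting_threshold: int = 5) -> str:
    """Return "pdca" if any signal fires, otherwise "none" (no orchestration)."""
    needs_orchestration = (
        signals.ambiguous_requirements
        or signals.multiple_valid_approaches
        or signals.security_sensitive
        or signals.unfamiliar_codebase
        or signals.files_touched >= cross_cutting_threshold
    )
    return "pdca" if needs_orchestration else "none"
```

Under this sketch, a task that touches two files and fires no other signal maps to `"none"`, while any single signal, such as `security_sensitive=True`, selects `"pdca"`.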

Conclusion

For this specific task (simple, pattern-following, single-file, well-scoped), ArcheFlow added cost without adding quality. Plain Claude was faster, produced less code, and avoided a bug that the Maker introduced.

This is not a failure of ArcheFlow's design — it's a calibration problem. The auto-select heuristic should have detected this as a skip-orchestration task. The complexity threshold for ArcheFlow activation needs to be higher than "touches 2+ files."

Honest assessment: ArcheFlow's value-add starts at tasks requiring genuine design decisions, security review, or cross-module coordination. Below that threshold, it's ceremony.
