New subsection in Discussion framing Wiggum Breaks as the formal boundary between autonomous and human-supervised operation. Derives HITL from convergence theory rather than pre-defined approval gates. Covers oscillation, divergence, and repeated shadow detection as provably unproductive conditions that trigger human escalation.
881 lines
39 KiB
TeX
881 lines
39 KiB
TeX
\documentclass[11pt,a4paper]{article}
|
|
|
|
% ---- Packages ----
|
|
\usepackage[utf8]{inputenc}
|
|
\usepackage[T1]{fontenc}
|
|
\usepackage{amsmath,amssymb}
|
|
\usepackage{graphicx}
|
|
\usepackage{booktabs}
|
|
\usepackage{hyperref}
|
|
\usepackage{xcolor}
|
|
\usepackage{listings}
|
|
\usepackage{subcaption}
|
|
\usepackage{tikz}
|
|
\usetikzlibrary{shapes,arrows.meta,positioning,fit,calc}
|
|
\usepackage[numbers]{natbib}
|
|
\usepackage{geometry}
|
|
\geometry{margin=1in}
|
|
|
|
% ---- Listings style ----
|
|
\lstset{
|
|
basicstyle=\ttfamily\small,
|
|
breaklines=true,
|
|
frame=single,
|
|
framesep=3pt,
|
|
columns=flexible,
|
|
keepspaces=true,
|
|
showstringspaces=false,
|
|
commentstyle=\color{gray},
|
|
keywordstyle=\color{blue!70!black},
|
|
}
|
|
|
|
% ---- Title ----
|
|
\title{%
|
|
ArcheFlow: Multi-Agent Orchestration with\\
|
|
Archetypal Roles and PDCA Quality Cycles%
|
|
}
|
|
|
|
\author{
|
|
Christian Nennemann\\
|
|
Independent Researcher\\
|
|
\texttt{chris@nennemann.de}\\
|
|
\texttt{https://github.com/XORwell/archeflow}
|
|
}
|
|
|
|
\date{April 2026}
|
|
|
|
\begin{document}
|
|
\maketitle
|
|
|
|
% ============================================================
|
|
\begin{abstract}
|
|
We present \textsc{ArcheFlow}, an open-source orchestration framework for
|
|
multi-agent software engineering that assigns \emph{archetypal roles}---derived
|
|
from Jungian analytical psychology---to LLM agents and coordinates them through
|
|
\emph{Plan--Do--Check--Act} (PDCA) quality cycles. Each of seven archetypes
|
|
(Explorer, Creator, Maker, Guardian, Skeptic, Trickster, Sage) carries a defined
|
|
cognitive virtue and a quantitatively detected \emph{shadow}---a failure mode
|
|
triggered when the virtue becomes excessive. The framework implements a
|
|
three-layer corrective action system (archetype shadows, system shadows, policy
|
|
boundaries) that detects and mitigates agent dysfunction during autonomous
|
|
operation. We describe ArcheFlow's architecture as a zero-dependency plugin for
|
|
Claude Code, detail its attention filtering, feedback routing, convergence
|
|
detection, and effectiveness scoring mechanisms, and discuss connections to
|
|
recent work on persona stability in language models
|
|
\citep{lu2026assistant}. ArcheFlow demonstrates that structured persona
|
|
assignment with shadow detection can maintain productive agent behavior across
|
|
extended autonomous sessions spanning multiple projects and quality domains
|
|
(code, prose, research). The system is publicly available under the MIT license.
|
|
\end{abstract}
|
|
|
|
% ============================================================
|
|
\section{Introduction}
|
|
\label{sec:introduction}
|
|
|
|
The rise of agentic coding assistants---tools that autonomously write, test,
|
|
review, and commit code---has created a new class of software engineering
|
|
challenges. While individual LLM agents can produce competent code, the quality
|
|
of autonomous output degrades under conditions that are well-known from human
|
|
software teams: reviewers who rubber-stamp, architects who over-engineer,
|
|
implementers who ignore specifications, and testers who optimize for coverage
|
|
metrics rather than real defects.
|
|
|
|
These failure modes are not merely analogies. \citet{lu2026assistant}
|
|
demonstrate that language models occupy a measurable \emph{persona space} and
|
|
can drift from their trained Assistant identity during extended conversations,
|
|
particularly under emotional or philosophical pressure. Their ``Assistant
|
|
Axis''---a dominant directional component in activation space---predicts when
|
|
models will exhibit uncharacteristic behavior. If a single model drifts, a
|
|
multi-agent system where each agent maintains a distinct persona faces
|
|
compounded persona management challenges.
|
|
|
|
ArcheFlow addresses this problem by drawing on two established frameworks:
|
|
\begin{enumerate}
|
|
\item \textbf{Jungian archetypal psychology} \citep{jung1968archetypes}, which
|
|
provides a taxonomy of cognitive orientations---each with a productive
|
|
\emph{virtue} and a destructive \emph{shadow}---that map naturally onto
|
|
software engineering roles.
|
|
\item \textbf{PDCA quality cycles} \citep{deming1986out}, which provide a
|
|
convergence mechanism for iterative refinement with measurable exit criteria.
|
|
\end{enumerate}
|
|
|
|
The contribution of this paper is threefold:
|
|
\begin{itemize}
|
|
\item We present a \emph{shadow detection framework} that quantitatively
|
|
identifies agent dysfunction---not through sentiment analysis or output
|
|
classification, but through structural metrics (output length, finding ratios,
|
|
scope violations) specific to each archetype's failure mode (Section~\ref{sec:shadows}).
|
|
\item We describe \emph{attention filters} and \emph{feedback routing} mechanisms
|
|
that constrain what each agent sees and where its output flows, preventing the
|
|
information overload and echo chamber effects that plague na\"ive multi-agent
|
|
systems (Section~\ref{sec:attention}).
|
|
\item We demonstrate that PDCA convergence detection---including oscillation
|
|
analysis and divergence scoring---provides principled stopping criteria for
|
|
iterative review cycles (Section~\ref{sec:convergence}).
|
|
\end{itemize}
|
|
|
|
ArcheFlow is implemented as a zero-dependency plugin (Bash + Markdown) for
|
|
Claude Code\footnote{\url{https://claude.ai/claude-code}}, Anthropic's CLI
|
|
coding assistant. It has been used in production across a portfolio of 10--30
|
|
repositories spanning code, creative writing, and academic research.
|
|
|
|
% ============================================================
|
|
\section{Related Work}
|
|
\label{sec:related}
|
|
|
|
\subsection{Multi-Agent Software Engineering}
|
|
|
|
Multi-agent systems for software engineering have proliferated since 2024.
|
|
\citet{hong2024metagpt} propose MetaGPT, which assigns human-like roles
|
|
(product manager, architect, engineer) to LLM agents and enforces structured
|
|
communication through Standardized Operating Procedures (SOPs). ChatDev
|
|
\citep{qian2024chatdev} simulates a virtual software company with role-playing
|
|
agents communicating through natural language chat. SWE-Agent
|
|
\citep{yang2024sweagent} focuses on single-agent benchmark performance on
|
|
GitHub issues, demonstrating that tool-augmented agents can resolve real-world
|
|
bugs.
|
|
|
|
These systems share a common limitation: roles are defined by \emph{job
|
|
descriptions} rather than \emph{cognitive orientations}. A ``product manager''
|
|
agent may behave identically to a ``tech lead'' agent when both receive the same
|
|
context, because the role boundary is semantic rather than structural. ArcheFlow
|
|
addresses this through attention filters (Section~\ref{sec:attention}) that
|
|
physically restrict what each agent perceives, ensuring that role differences
|
|
manifest in behavior rather than merely in prompts.
|
|
|
|
\subsection{Persona Stability in Language Models}
|
|
|
|
\citet{lu2026assistant} identify the ``Assistant Axis'' in LLM activation
|
|
space---a linear direction capturing the degree to which a model operates in its
|
|
default helpful mode versus an alternative persona. Their key findings are
|
|
directly relevant to multi-agent orchestration:
|
|
|
|
\begin{enumerate}
|
|
\item \textbf{Persona space is low-dimensional}: only 4--19 principal
|
|
components explain 70\% of persona variance across 275 character archetypes.
|
|
\item \textbf{Drift is predictable}: user message embeddings predict response
|
|
position along the Assistant Axis ($R^2 = 0.53$--$0.77$).
|
|
\item \textbf{Drift correlates with harm}: models are more liable to produce
|
|
harmful outputs when drifted from the Assistant identity ($r = 0.39$--$0.52$).
|
|
\end{enumerate}
|
|
|
|
ArcheFlow's shadow detection (Section~\ref{sec:shadows}) can be understood as an
|
|
\emph{application-level} analog to activation capping: where \citet{lu2026assistant}
|
|
constrain neural activations to maintain persona stability, ArcheFlow constrains
|
|
\emph{behavioral outputs} through quantitative triggers and corrective prompts.
|
|
Both approaches recognize that productive personas require active stabilization,
|
|
not merely initial assignment.
|
|
|
|
\subsection{Quality Cycles in Software Engineering}
|
|
|
|
The Plan--Do--Check--Act (PDCA) cycle, formalized by \citet{deming1986out} and
|
|
rooted in Shewhart's statistical process control \citep{shewhart1939statistical},
|
|
is the dominant quality improvement framework in manufacturing and has been
|
|
applied to software engineering through agile retrospectives and continuous
|
|
improvement. To our knowledge, ArcheFlow is the first system to apply PDCA
|
|
cycles to multi-agent LLM orchestration with formal convergence detection and
|
|
oscillation analysis.
|
|
|
|
\subsection{Jungian Archetypes in Computing}
|
|
|
|
While Jungian archetypes have been applied in user experience design
|
|
\citep{hartson2012ux}, brand strategy, and game design, their application to
|
|
AI agent systems is novel. The closest related work is in computational
|
|
creativity, where archetypal narratives have been used to structure story
|
|
generation \citep{winston2011strong}. ArcheFlow extends this to software
|
|
engineering by mapping archetypal virtues and shadows to measurable engineering
|
|
outcomes.
|
|
|
|
% ============================================================
|
|
\section{Architecture}
|
|
\label{sec:architecture}
|
|
|
|
ArcheFlow is a plugin for Claude Code that operates entirely through prompt
|
|
engineering, shell scripts, and file-based communication. It has zero runtime
|
|
dependencies beyond Bash and a compatible LLM backend.
|
|
|
|
\begin{figure}[t]
|
|
\centering
|
|
\begin{tikzpicture}[
|
|
node distance=1.2cm and 2cm,
|
|
phase/.style={draw, rounded corners, minimum width=2.5cm, minimum height=0.8cm, font=\small\bfseries},
|
|
agent/.style={draw, rounded corners, minimum width=2cm, minimum height=0.6cm, font=\small, fill=blue!5},
|
|
arrow/.style={-{Stealth[length=3mm]}, thick},
|
|
label/.style={font=\scriptsize, text=gray},
|
|
]
|
|
|
|
% PDCA Cycle
|
|
\node[phase, fill=yellow!20] (plan) {Plan};
|
|
\node[phase, fill=green!20, right=of plan] (do) {Do};
|
|
\node[phase, fill=orange!20, right=of do] (check) {Check};
|
|
\node[phase, fill=red!15, right=of check] (act) {Act};
|
|
|
|
% Plan agents
|
|
\node[agent, below left=0.8cm and 0.3cm of plan] (explorer) {Explorer};
|
|
\node[agent, below right=0.8cm and 0.3cm of plan] (creator) {Creator};
|
|
|
|
% Do agent
|
|
\node[agent, below=0.8cm of do] (maker) {Maker};
|
|
|
|
% Check agents
|
|
\node[agent, below left=0.8cm and -0.2cm of check] (guardian) {Guardian};
|
|
\node[agent, below=0.8cm of check] (skeptic) {Skeptic};
|
|
\node[agent, below right=0.8cm and -0.2cm of check] (sage) {Sage};
|
|
|
|
% Arrows
|
|
\draw[arrow] (plan) -- (do);
|
|
\draw[arrow] (do) -- (check);
|
|
\draw[arrow] (check) -- (act);
|
|
\draw[arrow, dashed] (act.south) -- ++(0,-0.5) -| node[label, below, pos=0.25] {cycle back} (plan.south);
|
|
|
|
% Agent connections
|
|
\draw[-] (plan.south) -- (explorer.north);
|
|
\draw[-] (plan.south) -- (creator.north);
|
|
\draw[-] (do.south) -- (maker.north);
|
|
\draw[-] (check.south) -- (guardian.north);
|
|
\draw[-] (check.south) -- (skeptic.north);
|
|
\draw[-] (check.south) -- (sage.north);
|
|
|
|
\end{tikzpicture}
|
|
\caption{ArcheFlow PDCA cycle with archetypal agent assignments. The dashed arrow represents cycle-back when reviewers find issues. A Trickster agent (not shown) joins the Check phase in \texttt{thorough} workflows.}
|
|
\label{fig:pdca}
|
|
\end{figure}
|
|
|
|
\subsection{Components}
|
|
|
|
The system comprises four component types:
|
|
|
|
\begin{description}
|
|
\item[Agent personas] (\texttt{agents/*.md}): Behavioral protocols for each
|
|
archetype, defining the agent's cognitive lens, output format, and quality
|
|
criteria. Each persona is a Markdown file loaded as a system prompt.
|
|
|
|
\item[Skills] (\texttt{skills/*/SKILL.md}): Operational instructions that
|
|
Claude Code follows to orchestrate the PDCA cycle. The core \texttt{run} skill
|
|
(466 lines) is self-contained---it encodes the complete orchestration protocol
|
|
including workflow selection, agent spawning, attention filtering, convergence
|
|
checking, and exit decisions.
|
|
|
|
\item[Library scripts] (\texttt{lib/*.sh}): Ten Bash scripts handling
|
|
infrastructure concerns: JSONL event logging, git operations (per-phase
|
|
commits, branch management, rollback), cross-run memory, progress tracking,
|
|
effectiveness scoring, and run replay.
|
|
|
|
\item[Hooks] (\texttt{hooks/}): Session-start hook that auto-activates
|
|
ArcheFlow and injects the domain detection logic.
|
|
\end{description}
|
|
|
|
\subsection{Execution Modes}
|
|
|
|
ArcheFlow provides three execution modes optimized for different use cases:
|
|
|
|
\begin{description}
|
|
\item[Sprint] (\texttt{/af-sprint}): Queue-driven parallel dispatch. Reads a
|
|
priority-ordered task queue, spawns 3--5 agents across different projects
|
|
simultaneously, collects results, commits, and starts the next batch. Designed
|
|
for throughput over ceremony.
|
|
|
|
\item[Review] (\texttt{/af-review}): Guardian-led post-implementation review
|
|
on existing diffs, branches, or commit ranges. No planning or implementation
|
|
orchestration---pure quality analysis.
|
|
|
|
\item[Run] (\texttt{/af-run}): Full PDCA orchestration for complex tasks
|
|
requiring structured exploration, design, implementation, and multi-perspective
|
|
review.
|
|
\end{description}
|
|
|
|
\subsection{Domain Adaptation}
|
|
|
|
ArcheFlow adapts its terminology and quality criteria based on domain detection:
|
|
\texttt{code} (diffs, tests, security), \texttt{writing} (voice consistency,
|
|
dialect authenticity, narrative structure), and \texttt{research} (source quality,
|
|
argument coherence, citation accuracy). Domain is auto-detected from project
|
|
contents or specified in configuration.
|
|
|
|
% ============================================================
|
|
\section{The Seven Archetypes}
|
|
\label{sec:archetypes}
|
|
|
|
Each archetype embodies a cognitive orientation with a defined virtue (productive
|
|
mode) and shadow (destructive mode). \Cref{tab:archetypes} summarizes the
|
|
complete taxonomy.
|
|
|
|
\begin{table}[t]
|
|
\centering
|
|
\caption{The seven ArcheFlow archetypes with their PDCA phase assignments,
|
|
cognitive virtues, and shadow failure modes.}
|
|
\label{tab:archetypes}
|
|
\begin{tabular}{@{}llllll@{}}
|
|
\toprule
|
|
\textbf{Archetype} & \textbf{Phase} & \textbf{Virtue} & \textbf{Shadow} & \textbf{Model Tier} \\
|
|
\midrule
|
|
Explorer & Plan & Contextual Clarity & Rabbit Hole & Haiku \\
|
|
Creator & Plan & Decisive Framing & Over-Architect & Sonnet \\
|
|
Maker & Do & Execution Discipline & Rogue & Sonnet \\
|
|
Guardian & Check & Threat Intuition & Paranoid & Sonnet \\
|
|
Skeptic & Check & Assumption Surfacing & Paralytic & Haiku \\
|
|
Trickster & Check & Adversarial Creativity & False Alarm & Haiku \\
|
|
Sage & Check & Maintainability Judgment & Bureaucrat & Haiku \\
|
|
\bottomrule
|
|
\end{tabular}
|
|
\end{table}
|
|
|
|
The archetype--shadow pairing is not metaphorical; it is the core mechanism
|
|
for maintaining agent quality. The virtue describes \emph{what} the archetype
|
|
contributes; the shadow describes what happens when that contribution becomes
|
|
excessive. An Explorer who never stops researching (Rabbit Hole) delays the
|
|
entire pipeline. A Guardian who rejects everything (Paranoid) prevents any
|
|
code from shipping.
|
|
|
|
\subsection{Cost-Aware Model Assignment}
|
|
|
|
Not all archetypes require the same model capability. Analytical tasks
|
|
(exploration, assumption checking, code quality review) can be performed by
|
|
cheaper models (Haiku), while creative tasks (architecture design,
|
|
implementation, security analysis) benefit from more capable models (Sonnet).
|
|
This tiered assignment reduces per-run costs by 40--60\% compared to using the
|
|
most capable model for all agents, with no observed quality degradation in
|
|
analytical roles.
|
|
|
|
% ============================================================
|
|
\section{Shadow Detection and Corrective Action}
|
|
\label{sec:shadows}
|
|
|
|
\subsection{Archetype Shadows}
|
|
|
|
Shadow detection is \emph{quantitative, not sentiment-based}. Each archetype has
|
|
a specific trigger condition derived from structural properties of its output:
|
|
|
|
\begin{table}[h]
|
|
\centering
|
|
\caption{Shadow detection triggers. Each trigger is evaluated automatically
|
|
after the agent completes.}
|
|
\label{tab:shadows}
|
|
\begin{tabular}{@{}lll@{}}
|
|
\toprule
|
|
\textbf{Archetype} & \textbf{Shadow} & \textbf{Trigger} \\
|
|
\midrule
|
|
Explorer & Rabbit Hole & Output $> 2000$ words without Recommendation section \\
|
|
Creator & Over-Architect & $> 2$ new abstractions for a single feature \\
|
|
Maker & Rogue & No tests in changeset, or files outside proposal scope \\
|
|
Guardian & Paranoid & CRITICAL:WARNING ratio $> 2{:}1$, or zero approvals \\
|
|
Skeptic & Paralytic & $> 7$ challenges with $< 50\%$ having alternatives \\
|
|
Trickster & False Alarm & Findings in untouched code, or $> 10$ total findings \\
|
|
Sage & Bureaucrat & Review length $> 2\times$ code change length \\
|
|
\bottomrule
|
|
\end{tabular}
|
|
\end{table}
|
|
|
|
The escalation protocol follows a three-strike pattern:
|
|
\begin{enumerate}
|
|
\item \textbf{First detection}: Inject a correction prompt that names the
|
|
shadow and redirects the agent toward its virtue.
|
|
\item \textbf{Second detection} (same shadow, same run): Replace the agent
|
|
with a fresh instance.
|
|
\item \textbf{Third detection}: Escalate to the user for manual intervention.
|
|
\end{enumerate}
|
|
|
|
\subsection{System Shadows}
|
|
|
|
Beyond individual archetype dysfunction, ArcheFlow monitors for
|
|
\emph{system-level} failure modes:
|
|
|
|
\begin{description}
|
|
\item[Echo Chamber]: Multiple reviewers produce identical findings, suggesting
|
|
they are confirming each other rather than applying independent judgment.
|
|
Detected when $> 60\%$ of findings across reviewers share the same
|
|
file-and-category tuple.
|
|
|
|
\item[Tunnel Vision]: All findings cluster in a single file or module while
|
|
the changeset spans multiple. Detected when $> 80\%$ of findings target
|
|
$< 20\%$ of changed files.
|
|
|
|
\item[Scope Creep]: Maker modifies files not mentioned in the Creator's
|
|
proposal. Detected by comparing \texttt{do-maker-files.txt} against the
|
|
proposal's file list.
|
|
\end{description}
|
|
|
|
\subsection{Policy Boundaries and the Wiggum Break}
|
|
|
|
The third layer enforces operational limits through budget gates, cycle
|
|
limits, and checkpoint policies. When limits are exceeded, the system
|
|
triggers a \emph{Wiggum Break}\footnote{Named after Chief Wiggum from
|
|
\emph{The Simpsons}---a nod to both ``policy enforcement'' and the
|
|
Ralph Loop plugin for Claude Code.}---a circuit breaker that halts
|
|
execution, saves state, and reports to the user.
|
|
|
|
Wiggum Breaks are classified as \emph{hard} (halt immediately) or
|
|
\emph{soft} (finish current task, then halt):
|
|
|
|
\begin{description}
|
|
\item[Hard breaks]: 3 consecutive agent failures, 3 consecutive shadow
|
|
detections in one run, test suite broken after merge, 2+ oscillating
|
|
findings.
|
|
\item[Soft breaks]: convergence score $< 0.5$ for 2 consecutive cycles,
|
|
findings unchanged between cycles, budget $> 95\%$ spent.
|
|
\end{description}
|
|
|
|
Each Wiggum Break emits a \texttt{wiggum.break} event capturing the
|
|
trigger, run state, and unresolved findings for post-run analysis.
|
|
|
|
\subsection{Connection to the Assistant Axis}
|
|
|
|
The shadow detection framework addresses the same fundamental problem identified
|
|
by \citet{lu2026assistant}: models drift from productive personas during
|
|
extended operation. Where their work identifies drift in activation space and
|
|
proposes activation capping as a mitigation, ArcheFlow operates at the
|
|
\emph{behavioral} level---detecting drift through output structure rather than
|
|
internal representations, and correcting through prompt injection rather than
|
|
activation manipulation.
|
|
|
|
This application-level approach has a practical advantage: it requires no access
|
|
to model internals and works with any LLM backend, including API-only models
|
|
where activation-level interventions are impossible. The tradeoff is that
|
|
behavioral detection is necessarily coarser than activation-level measurement
|
|
and can only detect drift after it manifests in output, not before.
|
|
|
|
% ============================================================
|
|
\section{Attention Filters and Information Flow}
|
|
\label{sec:attention}
|
|
|
|
A key design principle is that each agent receives \emph{only the information
|
|
relevant to its role}. This is implemented through \emph{attention filters}---rules
|
|
governing which artifacts from prior phases are injected into each agent's
|
|
context.
|
|
|
|
\begin{table}[h]
|
|
\centering
|
|
\caption{Attention filter matrix. Each agent receives only the artifacts marked
|
|
with \checkmark.}
|
|
\label{tab:attention}
|
|
\begin{tabular}{@{}lccccc@{}}
|
|
\toprule
|
|
\textbf{Agent} & \textbf{Task} & \textbf{Explorer} & \textbf{Creator} & \textbf{Diff} & \textbf{Reviews} \\
|
|
\midrule
|
|
Explorer & \checkmark & & & & \\
|
|
Creator & \checkmark & \checkmark & & & \\
|
|
Maker & \checkmark & & \checkmark & & \\
|
|
Guardian & & & (risks) & \checkmark & \\
|
|
Skeptic & & & \checkmark & & \\
|
|
Sage & & & \checkmark & \checkmark & \\
|
|
Trickster & & & & \checkmark & \\
|
|
\bottomrule
|
|
\end{tabular}
|
|
\end{table}
|
|
|
|
The rationale for attention filtering is twofold:
|
|
|
|
\begin{enumerate}
|
|
\item \textbf{Independence}: Reviewers who see each other's findings tend to
|
|
converge on a shared narrative rather than applying independent judgment. By
|
|
isolating reviewer inputs, ArcheFlow ensures that each reviewer contributes a
|
|
genuinely distinct perspective.
|
|
|
|
\item \textbf{Focus}: An agent given everything tends to address everything,
|
|
producing diluted analysis. The Trickster, for example, receives \emph{only}
|
|
the diff---no design rationale, no risk analysis---forcing it to evaluate the
|
|
code purely on its own terms.
|
|
\end{enumerate}
|
|
|
|
In PDCA cycle 2+, the feedback from the Act phase is routed selectively:
|
|
Creator-routed issues go to the Creator, Maker-routed issues go to the Maker.
|
|
Neither sees the other's feedback, preventing defensive responses to criticism
|
|
that was directed elsewhere.
|
|
|
|
% ============================================================
|
|
\section{Feedback Routing}
|
|
\label{sec:routing}
|
|
|
|
When the Check phase identifies issues, the Act phase must decide where to route
|
|
each finding for the next cycle. ArcheFlow uses a deterministic routing table
|
|
based on the source archetype and finding category:
|
|
|
|
\begin{table}[h]
|
|
\centering
|
|
\caption{Feedback routing table. Findings are routed to the agent best equipped
|
|
to address them, preventing cross-contamination.}
|
|
\label{tab:routing}
|
|
\begin{tabular}{@{}llll@{}}
|
|
\toprule
|
|
\textbf{Source} & \textbf{Category} & \textbf{Routes To} & \textbf{Rationale} \\
|
|
\midrule
|
|
Guardian & security, breaking-change & Creator & Design must change \\
|
|
Guardian & reliability, dependency & Creator & Architectural decision \\
|
|
Skeptic & design, scalability & Creator & Assumptions need revision \\
|
|
Sage & quality, consistency & Maker & Implementation refinement \\
|
|
Sage & testing & Maker & Test gap, not design flaw \\
|
|
Trickster & reliability (design flaw) & Creator & Needs redesign \\
|
|
Trickster & reliability (test gap) & Maker & Needs more tests \\
|
|
\bottomrule
|
|
\end{tabular}
|
|
\end{table}
|
|
|
|
The disambiguation principle: if fixing the issue requires changing the
|
|
\emph{approach}, route to Creator. If it requires changing the \emph{code within
|
|
the existing approach}, route to Maker. Findings that persist across two
|
|
consecutive cycles are escalated to the user rather than cycled indefinitely.
|
|
|
|
% ============================================================
|
|
\section{Convergence Detection}
|
|
\label{sec:convergence}
|
|
|
|
\subsection{Convergence Score}
|
|
|
|
In PDCA cycle 2+, ArcheFlow compares current findings against the previous cycle
|
|
and classifies each as \textsc{New}, \textsc{Resolved}, \textsc{Persistent}, or
|
|
\textsc{Regressed}. The convergence score is:
|
|
|
|
\begin{equation}
|
|
C = \frac{|\textsc{Resolved}|}{|\textsc{Resolved}| + |\textsc{New}| + |\textsc{Regressed}|}
|
|
\label{eq:convergence}
|
|
\end{equation}
|
|
|
|
\begin{table}[h]
|
|
\centering
|
|
\caption{Convergence score interpretation and corresponding actions.}
|
|
\label{tab:convergence}
|
|
\begin{tabular}{@{}lll@{}}
|
|
\toprule
|
|
\textbf{Score Range} & \textbf{Status} & \textbf{Action} \\
|
|
\midrule
|
|
$C > 0.8$ & Converging & Continue if cycles remain \\
|
|
$0.5 \leq C \leq 0.8$ & Stalling & Continue with caution \\
|
|
$C < 0.5$ & Diverging & Stop if 2 consecutive diverging cycles \\
|
|
$C = 0$ & Stuck & Stop immediately \\
|
|
\bottomrule
|
|
\end{tabular}
|
|
\end{table}
|
|
|
|
\subsection{Oscillation Detection}
|
|
|
|
A finding is \emph{oscillating} if it was present in cycle $n-2$, absent in
|
|
cycle $n-1$, and present again in cycle $n$. Two or more oscillating findings
|
|
trigger an immediate stop with escalation to the user, as oscillation indicates
|
|
a fundamental tension in the review criteria that automated cycles cannot
|
|
resolve.
|
|
|
|
\subsection{Adaptive Workflow Escalation}
|
|
|
|
Convergence detection interacts with workflow selection through Rule A1: if a
|
|
\texttt{fast} workflow and Guardian finds $\geq 2$ CRITICAL findings, the next
|
|
cycle escalates to \texttt{standard} (adding Skeptic and Sage reviewers). Once
|
|
escalated, the workflow remains escalated for the duration of the run.
|
|
|
|
Conversely, Rule A2 provides a \emph{fast-path}: if Guardian finds zero CRITICAL
|
|
and zero WARNING findings, remaining reviewers are skipped entirely, and the
|
|
system proceeds directly to Act. This optimization reduces the cost of runs
|
|
where the Maker's implementation is clean.
|
|
|
|
% ============================================================
|
|
\section{Evidence Validation}
|
|
\label{sec:evidence}
|
|
|
|
Reviewer findings are subject to evidence validation before they influence
|
|
routing decisions. A CRITICAL or WARNING finding is downgraded to INFO if:
|
|
|
|
\begin{itemize}
|
|
\item It uses \emph{banned hedging phrases} without supporting evidence:
|
|
``might be'', ``could potentially'', ``appears to'', ``seems like'', ``may not''.
|
|
\item It contains \emph{no evidence}: no command output, code citation, line
|
|
reference, or reproduction steps.
|
|
\end{itemize}
|
|
|
|
This mechanism addresses a well-known failure mode of LLM reviewers: generating
|
|
plausible-sounding but unsupported concerns. By requiring evidence for
|
|
high-severity findings, ArcheFlow forces reviewers to ground their analysis in
|
|
the actual changeset rather than speculation.
|
|
|
|
Downgrades are tracked in the event log but do \emph{not} modify the original
|
|
artifact files, preserving the complete reviewer output for post-run analysis.
|
|
|
|
% ============================================================
|
|
\section{Effectiveness Scoring}
|
|
\label{sec:effectiveness}
|
|
|
|
After each completed run, ArcheFlow scores review archetypes across five
|
|
dimensions:
|
|
|
|
\begin{table}[h]
|
|
\centering
|
|
\caption{Effectiveness scoring dimensions and their weights.}
|
|
\label{tab:effectiveness}
|
|
\begin{tabular}{@{}lp{7cm}r@{}}
|
|
\toprule
|
|
\textbf{Dimension} & \textbf{Description} & \textbf{Weight} \\
|
|
\midrule
|
|
Signal-to-noise & Ratio of useful findings to total findings & 0.30 \\
|
|
Fix rate & Fraction of findings that led to applied fixes & 0.25 \\
|
|
Cost efficiency & Useful findings per dollar of model inference cost & 0.20 \\
|
|
Accuracy & Fraction not contradicted by other reviewers & 0.15 \\
|
|
Cycle impact & Whether findings contributed to cycle exit decision & 0.10 \\
|
|
\bottomrule
|
|
\end{tabular}
|
|
\end{table}
|
|
|
|
Scores accumulate in a cross-run memory file
|
|
(\texttt{.archeflow/memory/effectiveness.jsonl}). After 10+ completed runs,
|
|
the system recommends model tier changes (e.g., promoting a Haiku-tier reviewer
|
|
to Sonnet if its signal-to-noise is consistently high) and, in extreme cases,
|
|
archetype removal for persistently low-scoring reviewers.
|
|
|
|
% ============================================================
|
|
\section{Cross-Run Memory}
|
|
\label{sec:memory}
|
|
|
|
ArcheFlow maintains a lesson-learning system that persists across runs. When
|
|
recurring findings are detected---the same category of issue appearing in
|
|
multiple runs---the system stores a lesson and injects it into future agents
|
|
as additional context.
|
|
|
|
Lessons decay over time: each lesson has a relevance counter that increments on
|
|
reuse and decrements on irrelevance. Lessons that fall below a threshold are
|
|
archived rather than injected, preventing the accumulation of stale guidance.
|
|
|
|
The memory system also performs regression detection: if a previously resolved
|
|
issue reappears, it is flagged as a regression with higher priority than a
|
|
fresh finding.
|
|
|
|
% ============================================================
|
|
\section{Implementation}
|
|
\label{sec:implementation}
|
|
|
|
ArcheFlow is implemented in approximately 6,700 lines across three layers:
|
|
|
|
\begin{itemize}
|
|
\item \textbf{Skills} (19 Markdown files, $\sim$2,500 lines): Operational
|
|
instructions for Claude Code, written as imperative protocols. The core
|
|
\texttt{run} skill encodes the complete PDCA orchestration in 466 lines.
|
|
|
|
\item \textbf{Agent personas} (7 Markdown files, $\sim$700 lines): Behavioral
|
|
protocols defining each archetype's cognitive lens, output format, and
|
|
self-review checklist.
|
|
|
|
\item \textbf{Library scripts} (10 Bash scripts, $\sim$3,500 lines): Event
|
|
logging, git operations, memory management, progress tracking, effectiveness
|
|
scoring, and run replay.
|
|
\end{itemize}
|
|
|
|
The system uses no database, no API server, and no runtime dependencies beyond
|
|
Bash 4+ and a Claude Code installation. All state is stored in JSONL event logs
|
|
and Markdown artifact files. This zero-dependency architecture was a deliberate
|
|
design choice: orchestration infrastructure that itself requires complex setup
|
|
and maintenance undermines the autonomy it is supposed to enable.
|
|
|
|
\subsection{Git Integration}
|
|
|
|
ArcheFlow creates per-phase commits, enabling fine-grained rollback. The Maker
|
|
operates in a git worktree---an isolated working copy---so its changes do not
|
|
affect the main branch until explicitly merged. If post-merge tests fail, the
|
|
system auto-reverts the merge and cycles back with ``integration test failure''
|
|
feedback.
|
|
|
|
\subsection{Run Replay}
|
|
|
|
All orchestration decisions are logged as \texttt{decision.point} events,
|
|
enabling post-hoc analysis. The replay system provides:
|
|
\begin{itemize}
|
|
\item \textbf{Timeline view}: chronological sequence of all decisions with
|
|
confidence scores.
|
|
\item \textbf{Weighted what-if}: re-evaluation of the ship/block outcome
|
|
using different reviewer weights, answering questions like ``would the outcome
|
|
have changed if we weighted Guardian 2x and Sage 0.5x?''
|
|
\item \textbf{Cross-run comparison}: side-by-side analysis of decision
|
|
patterns across runs.
|
|
\end{itemize}
|
|
|
|
% ============================================================
|
|
\section{Multi-Domain Application}
|
|
\label{sec:domains}
|
|
|
|
ArcheFlow's archetype system extends beyond code. The framework has been
|
|
deployed across three domains:
|
|
|
|
\subsection{Software Engineering}
|
|
|
|
The primary domain. Archetypes map to standard engineering roles: Explorer
|
|
performs codebase research, Creator designs architecture, Maker writes code,
|
|
and the Check-phase archetypes review for security (Guardian), design flaws
|
|
(Skeptic), edge cases (Trickster), and overall quality (Sage).
|
|
|
|
\subsection{Creative Writing}
|
|
|
|
In writing mode, the same archetype structure applies with adapted quality
|
|
criteria. Custom archetypes (story-explorer, story-sage) replace or augment
|
|
the defaults. The framework integrates with Colette, a voice profiling system
|
|
that maintains consistent authorial voice across chapters. Quality gates check
|
|
for voice consistency, dialect authenticity, and narrative structure rather
|
|
than test coverage and security.
|
|
|
|
\subsection{Academic Research}
|
|
|
|
In research mode, quality criteria shift to source quality, argument coherence,
|
|
citation accuracy, and methodological rigor. The Guardian reviews for logical
|
|
fallacies and unsupported claims rather than security vulnerabilities.
|
|
|
|
% ============================================================
|
|
\section{Discussion}
|
|
\label{sec:discussion}
|
|
|
|
\subsection{Archetypes vs. Role Descriptions}
|
|
|
|
The key distinction between ArcheFlow's approach and prior multi-agent systems
|
|
is the \emph{shadow} mechanism. A role description tells an agent what to do;
|
|
an archetype tells an agent what to do \emph{and what doing too much of it
|
|
looks like}. This bidirectional specification creates a bounded operating
|
|
range for each agent, preventing the unbounded optimization that leads to
|
|
dysfunction.
|
|
|
|
The connection to \citet{lu2026assistant}'s persona axis is instructive.
|
|
They show that model personas exist on a continuum, with the Assistant identity
|
|
at one extreme and theatrical/mystical identities at the other. ArcheFlow's
|
|
archetypes deliberately position agents \emph{away} from the default Assistant
|
|
toward specific cognitive orientations---but the shadow mechanism prevents them
|
|
from drifting too far, maintaining a productive operating range analogous to
|
|
what \citeauthor{lu2026assistant} achieve through activation capping.
|
|
|
|
\subsection{Wiggum Breaks as Human-in-the-Loop Boundaries}
|
|
|
|
A central question in autonomous agent systems is: \emph{when should the
|
|
system stop acting and ask a human?} Most frameworks treat this as an
|
|
implementation detail---a timeout, a retry limit, an exception handler.
|
|
ArcheFlow treats it as a first-class architectural concept through the
|
|
\emph{Wiggum Break}.
|
|
|
|
The Wiggum Break defines the \textbf{formal boundary between autonomous and
|
|
human-supervised operation}. It is not a failure mode; it is the system's
|
|
\emph{designed} response to situations where autonomous resolution is
|
|
provably unproductive:
|
|
|
|
\begin{itemize}
|
|
\item \textbf{Oscillation} (finding present $\to$ absent $\to$ present)
|
|
indicates a genuine tension in the review criteria that no amount of
|
|
cycling will resolve---only human judgment about which criterion takes
|
|
priority.
|
|
|
|
\item \textbf{Divergence} (convergence score $< 0.5$ for two consecutive
|
|
cycles) indicates that the implementation is getting worse with each
|
|
iteration---the agents lack the context or capability to solve the
|
|
problem, and continuing wastes resources.
|
|
|
|
\item \textbf{Repeated shadow detection} (same dysfunction three times)
|
|
indicates that the corrective action framework has exhausted its
|
|
options---the task structure is incompatible with the assigned archetype,
|
|
and a human must re-scope.
|
|
\end{itemize}
|
|
|
|
This framing inverts the typical HITL paradigm. Rather than asking
|
|
``how much autonomy should the system have?'' and pre-defining approval
|
|
gates, ArcheFlow asks ``under what conditions is autonomy
|
|
\emph{provably unproductive}?'' and derives the HITL boundary from
|
|
convergence theory. The system runs autonomously by default and escalates
|
|
only when it can demonstrate---through quantitative metrics, not
|
|
heuristics---that continued autonomous operation will not improve the
|
|
outcome.
|
|
|
|
This approach has three advantages over pre-defined approval gates:
|
|
|
|
\begin{enumerate}
|
|
\item \textbf{Adaptive autonomy}: Simple tasks never trigger a Wiggum
|
|
Break; complex tasks trigger one quickly. The HITL boundary adapts to
|
|
task difficulty without manual configuration.
|
|
|
|
\item \textbf{Auditable escalation}: Every Wiggum Break emits a
|
|
\texttt{wiggum.break} event with the trigger condition, run state, and
|
|
unresolved findings. The human receives not just a request for help,
|
|
but a structured summary of \emph{why} autonomous resolution failed
|
|
and what specifically needs their judgment.
|
|
|
|
\item \textbf{Minimal interruption}: Pre-defined gates (``approve every
|
|
PR'', ``review every design'') interrupt the human on tasks the system
|
|
could have handled autonomously. Convergence-derived breaks interrupt
|
|
only when the system has evidence that it cannot proceed productively.
|
|
\end{enumerate}
|
|
|
|
The Wiggum Break thus operationalizes a principle from resilience
|
|
engineering: the system should be \emph{autonomy-seeking} (preferring to
|
|
resolve issues itself) but \emph{escalation-ready} (able to produce a
|
|
useful handoff when self-resolution fails). The quality of the handoff---not
|
|
just the fact of escalation---is what makes HITL effective.
|
|
|
|
\subsection{Limitations}
|
|
|
|
\begin{enumerate}
|
|
\item \textbf{No activation-level control}: ArcheFlow operates purely at the
|
|
prompt level. It cannot detect persona drift before it manifests in output,
|
|
unlike activation-level approaches \citep{lu2026assistant}.
|
|
|
|
\item \textbf{Single LLM backend}: The current implementation targets Claude
|
|
Code. While the architectural principles are model-agnostic, the skill and
|
|
hook system is specific to Claude Code's plugin API.
|
|
|
|
\item \textbf{Evaluation methodology}: We have not conducted controlled
|
|
experiments comparing ArcheFlow's output quality against baselines (single-agent,
|
|
role-based multi-agent without shadows, PDCA without archetypes). The system
|
|
has been evaluated through production use across real projects, which
|
|
demonstrates practical utility but not causal attribution.
|
|
|
|
\item \textbf{Shadow trigger thresholds}: The quantitative thresholds
|
|
(e.g., 2000 words for Rabbit Hole, ratio $> 2{:}1$ for Paranoid) were
|
|
determined empirically through iterative use and may not generalize across
|
|
all codebases and domains.
|
|
\end{enumerate}
|
|
|
|
\subsection{Future Work}
|
|
|
|
\begin{enumerate}
|
|
\item \textbf{Activation-level integration}: Combining behavioral shadow
|
|
detection with the Assistant Axis measurement from \citet{lu2026assistant}
|
|
could provide earlier and more reliable drift detection, particularly for
|
|
open-weight models where activations are accessible.
|
|
|
|
\item \textbf{Controlled evaluation}: A systematic comparison across standard
|
|
benchmarks (SWE-bench, HumanEval) would establish whether the archetype +
|
|
PDCA approach provides measurable quality improvements over simpler
|
|
orchestration strategies.
|
|
|
|
\item \textbf{Archetype discovery}: Rather than hand-designing archetypes,
|
|
the persona space analysis from \citet{lu2026assistant} could be used to
|
|
identify \emph{natural} cognitive orientations that models adopt, potentially
|
|
revealing useful archetypes that human intuition would not suggest.
|
|
|
|
\item \textbf{Cross-model persona stability}: Investigating whether shadow
|
|
triggers calibrated for one model family transfer to others, or whether
|
|
per-model calibration is necessary.
|
|
\end{enumerate}
|
|
|
|
% ============================================================
|
|
\section{Conclusion}
|
|
\label{sec:conclusion}
|
|
|
|
ArcheFlow demonstrates that multi-agent LLM orchestration benefits from
|
|
structured persona management---not just telling agents \emph{what to do},
|
|
but actively monitoring and correcting \emph{how they do it}. The combination
|
|
of Jungian archetypes (providing a principled taxonomy of cognitive virtues and
|
|
their failure modes) with PDCA quality cycles (providing convergence guarantees
|
|
and principled stopping criteria) produces an orchestration framework that
|
|
maintains productive agent behavior across extended autonomous sessions.
|
|
|
|
The shadow detection mechanism---quantitative triggers for archetype-specific
|
|
dysfunction---addresses the same persona stability challenge identified by
|
|
\citet{lu2026assistant} at the application level, requiring no access to model
|
|
internals and working with any LLM backend. While coarser than activation-level
|
|
approaches, behavioral shadow detection is practical, interpretable, and
|
|
immediately deployable.
|
|
|
|
ArcheFlow is open-source under the MIT license and available at
|
|
\url{https://github.com/XORwell/archeflow}.
|
|
|
|
% ============================================================
|
|
\section*{Acknowledgments}
|
|
|
|
The author thanks the Claude Code team at Anthropic for building the plugin
|
|
infrastructure that made ArcheFlow possible, and the authors of
|
|
\citet{lu2026assistant} for the Assistant Axis framework that informed the
|
|
theoretical grounding of shadow detection.
|
|
|
|
% ============================================================
|
|
\bibliographystyle{plainnat}
|
|
\bibliography{references}
|
|
|
|
\end{document}
|