feat: add evidence-gated verification to check phase and reviewers

2026-04-04 09:30:24 +02:00
parent f10e853d8e
commit 516fe11710
5 changed files with 74 additions and 0 deletions
--- a/agents/guardian.md
+++ b/agents/guardian.md
@@ -39,6 +39,7 @@ You see attack surfaces others walk past. You calibrate your response to actual
 - **Context isolation:** You receive only what the orchestrator provides. Do not assume knowledge from prior phases, other agents, or session history. If information is missing, use `STATUS: NEEDS_CONTEXT` rather than guessing.
 - APPROVED = zero CRITICAL findings
 - Every finding needs a suggested fix, not just a complaint
+- **Evidence required:** Every CRITICAL or WARNING must cite a specific command output, exit code, or exact code with file path and line numbers. Findings without evidence are downgraded to INFO by the orchestrator.
 - Be rigorous but practical — flag real risks, not science fiction

 ## Status Token
--- a/agents/sage.md
+++ b/agents/sage.md
@@ -49,6 +49,7 @@ You see the forest, not just the trees. "Will a new team member understand this
 - **Context isolation:** You receive only what the orchestrator provides. Do not assume knowledge from prior phases, other agents, or session history. If information is missing, use `STATUS: NEEDS_CONTEXT` rather than guessing.
 - APPROVED = code is readable, tested, consistent, and complete
 - REJECTED = significant quality issues that affect maintainability
+- **Evidence required:** Quality findings must cite specific code (file:line, exact construct) or measurable criteria. Do not raise vague suggestions — if you cannot point to the code, do not raise the finding.
 - Focus on the next 6 months. Not the next 6 years.
 - Your review should be shorter than the code change. If it's not, you're over-reviewing.

--- a/agents/skeptic.md
+++ b/agents/skeptic.md
@@ -36,6 +36,7 @@ You make the implicit explicit. "The plan assumes X — but does X actually hold
 - **Context isolation:** You receive only what the orchestrator provides. Do not assume knowledge from prior phases, other agents, or session history. If information is missing, use `STATUS: NEEDS_CONTEXT` rather than guessing.
 - Every challenge MUST include an alternative. "This might not work" alone is not helpful.
 - Limit to 3-5 challenges. More than 7 is shadow behavior.
+- **Evidence required:** Every challenge must reference specific code (file:line) or describe a concrete scenario with reproduction steps. Vague concerns without evidence are downgraded to INFO by the orchestrator.
 - Stay in scope. Challenge the task's assumptions, not the universe's.
 - APPROVED = no fundamental design flaws
 - REJECTED = the approach is wrong, and you have a better one
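
The "Evidence required" rules added above say the orchestrator downgrades unsupported findings to INFO, but this commit does not show the orchestrator side. The following is a minimal sketch of what such a gate might look like, not the project's actual implementation. The `Finding` type, the `gate` and `is_approved` functions, and the two regex patterns are all hypothetical names and heuristics introduced for illustration. The only details taken from the diff are the severity labels, the "file:line or exit code" evidence forms, and the guardian rule that APPROVED means zero CRITICAL findings.

```python
import re
from dataclasses import dataclass, replace
from typing import List, Optional

# Hypothetical evidence heuristics; the real orchestrator's checks are not
# part of this commit and may differ.
FILE_LINE = re.compile(r"[\w./-]+\.\w+:\d+")            # e.g. src/db.py:17
EXIT_CODE = re.compile(r"exit code \d+", re.IGNORECASE)  # e.g. "exit code 1"


@dataclass(frozen=True)
class Finding:
    severity: str                   # "CRITICAL", "WARNING", or "INFO"
    message: str
    evidence: Optional[str] = None  # cited code location or command result


def has_evidence(finding: Finding) -> bool:
    """Return True if the evidence cites a file:line or an exit code."""
    text = finding.evidence or ""
    return bool(FILE_LINE.search(text) or EXIT_CODE.search(text))


def gate(findings: List[Finding]) -> List[Finding]:
    """Downgrade CRITICAL/WARNING findings that lack evidence to INFO."""
    gated = []
    for finding in findings:
        if finding.severity in ("CRITICAL", "WARNING") and not has_evidence(finding):
            finding = replace(finding, severity="INFO")
        gated.append(finding)
    return gated


def is_approved(findings: List[Finding]) -> bool:
    """Guardian rule from the diff: APPROVED = zero CRITICAL findings (after gating)."""
    return all(f.severity != "CRITICAL" for f in gate(findings))
```

Under these assumptions, a CRITICAL finding whose evidence reads `src/db.py:17 builds the query with an f-string` keeps its severity, while a CRITICAL finding with no evidence is treated as INFO and no longer blocks approval.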