refactor: simplify memory and shadow-detection skills

Trim verbose implementation details that duplicate what the bash helper
scripts already handle. Memory skill: 278 -> 120 lines. Shadow detection
skill: 180 -> 66 lines. All essential protocols, tables, and commands
preserved; removed redundant algorithm descriptions, multiple examples,
and narrative prose.
2026-04-06 20:42:47 +02:00
parent 55a6ba14c9
commit 1baaa79946
2 changed files with 83 additions and 354 deletions


@@ -11,21 +11,14 @@ description: |
# Cross-Run Memory
ArcheFlow forgets everything after each run. If Guardian repeatedly flags the same type of issue (e.g., timeline errors in fiction, missing null checks in code), the next run starts from zero. This skill fixes that by extracting lessons from completed runs and injecting them into future agent prompts.
ArcheFlow forgets everything after each run. This skill extracts lessons from completed runs and injects them into future agent prompts, so recurring issues (timeline errors, missing null checks) are caught proactively.
## Storage
```
.archeflow/memory/lessons.jsonl # Append-only, one lesson per line
```
Each lesson is a single JSON line:
```jsonl
{"id":"m-001","ts":"2026-04-03T14:00:00Z","run_id":"2026-04-03-der-huster","type":"pattern","source":"guardian","description":"Timeline references must match story start day","frequency":2,"severity":"bug","domain":"writing","tags":["continuity","timeline"],"last_seen_run":"2026-04-03-der-huster","runs_since_last_seen":0}
{"id":"m-002","ts":"2026-04-03T15:00:00Z","run_id":"2026-04-03-der-huster","type":"preference","source":"user_feedback","description":"User prefers single bundled PR over many small ones","frequency":1,"severity":"info","domain":"general","tags":["workflow"],"last_seen_run":"","runs_since_last_seen":0}
{"id":"m-003","ts":"2026-04-04T10:00:00Z","run_id":"2026-04-04-auth-fix","type":"archetype_hint","source":"sage","description":"Voice drift most common in long monologue passages","frequency":3,"severity":"warning","domain":"writing","tags":["voice","prose"],"archetype":"story-sage","last_seen_run":"2026-04-04-auth-fix","runs_since_last_seen":0}
{"id":"m-004","ts":"2026-04-04T11:00:00Z","run_id":"2026-04-04-auth-fix","type":"anti_pattern","source":"maker","description":"Splitting auth middleware into per-route handlers causes duplication","frequency":1,"severity":"warning","domain":"code","tags":["auth","middleware"],"last_seen_run":"2026-04-04-auth-fix","runs_since_last_seen":0}
.archeflow/memory/archive.jsonl # Decayed lessons (frequency reached 0)
.archeflow/memory/audit.jsonl # Injection audit trail
```
## Lesson Types
@@ -33,245 +26,95 @@ Each lesson is a single JSON line:
| Type | Source | Description |
|------|--------|-------------|
| `pattern` | Auto-detected | Recurring finding across runs (same category + similar description) |
| `preference` | Manual | User correction or workflow preference (added via CLI) |
| `preference` | Manual | User correction or workflow preference (injected immediately, skips frequency threshold) |
| `archetype_hint` | Auto-detected | Per-archetype insight (e.g., Sage catches voice drift in monologues) |
| `anti_pattern` | Manual or auto | Something that was tried and failed; avoid repeating |
| `anti_pattern` | Manual or auto | Something that was tried and failed -- avoid repeating |
## Lesson Fields
## Lesson JSON Fields
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique ID, format `m-NNN` (monotonically increasing) |
| `ts` | ISO 8601 | When the lesson was created or last updated |
| `id` | string | `m-NNN` (monotonically increasing) |
| `ts` | ISO 8601 | Created or last updated |
| `run_id` | string | Run that created or last triggered this lesson |
| `type` | string | One of: `pattern`, `preference`, `archetype_hint`, `anti_pattern` |
| `source` | string | Archetype or `user_feedback` that originated the lesson |
| `type` | string | `pattern`, `preference`, `archetype_hint`, `anti_pattern` |
| `source` | string | Archetype name or `user_feedback` |
| `description` | string | Human-readable lesson text |
| `frequency` | integer | How many times this lesson was triggered |
| `severity` | string | `bug`, `warning`, `info`, or `recommendation` |
| `frequency` | integer | Times this lesson was triggered |
| `severity` | string | `bug`, `warning`, `info`, `recommendation` |
| `domain` | string | `writing`, `code`, `general`, or project-specific |
| `tags` | string[] | Keywords for matching and filtering |
| `archetype` | string or null | For `archetype_hint` type — which archetype this applies to |
| `last_seen_run` | string | Run ID where this lesson was last matched |
| `runs_since_last_seen` | integer | Counter for decay — incremented each run that does NOT trigger this lesson |
| `archetype` | string? | For `archetype_hint` -- which archetype this applies to |
| `last_seen_run` | string | Run ID where last matched |
| `runs_since_last_seen` | integer | Counter for decay |
Example:
```jsonl
{"id":"m-001","ts":"2026-04-03T14:00:00Z","run_id":"2026-04-03-der-huster","type":"pattern","source":"guardian","description":"Timeline references must match story start day","frequency":2,"severity":"bug","domain":"writing","tags":["continuity","timeline"],"last_seen_run":"2026-04-03-der-huster","runs_since_last_seen":0}
```
---
## Auto-Detection
After each `run.complete`, the orchestrator runs lesson extraction:
After each `run.complete`, extract lessons from findings:
```bash
./lib/archeflow-memory.sh extract .archeflow/events/<run_id>.jsonl
```
### Extraction Algorithm
The script reads `review.verdict` events, matches findings against existing lessons by keyword overlap (50%+ threshold), increments frequency on matches, and creates new candidate lessons (frequency: 1) for unmatched findings with severity >= WARNING.
1. **Read all `review.verdict` events** from the completed run's JSONL.
2. **For each finding** in each verdict:
a. Tokenize the finding description into keywords (lowercase, strip punctuation).
b. Compare keywords against each existing lesson's description + tags.
c. **Match threshold:** 50%+ keyword overlap between finding and lesson.
3. **If match found:** Update the existing lesson:
- Increment `frequency` by 1
- Update `ts` to now
- Update `last_seen_run` to current run ID
- Reset `runs_since_last_seen` to 0
4. **If no match AND severity >= WARNING:** Add as candidate lesson with `frequency: 1`.
5. **Candidates become active** when `frequency >= 2` (triggered in a second run).
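
The matching step above can be sketched in plain bash. This is an illustration only, not the code in `archeflow-memory.sh`: the function names `tokenize` and `overlap_percent` are hypothetical, and the sketch assumes "overlap" means the share of the finding's keywords that also occur in the lesson's description plus tags.

```bash
# Hypothetical helper: lowercase the text, replace punctuation with spaces,
# and print one unique keyword per line.
tokenize() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -c '[:alnum:]\n' ' ' \
    | tr -s ' ' '\n' \
    | sed '/^$/d' \
    | sort -u
}

# Hypothetical helper: percentage of the finding's keywords that also appear
# in the lesson text. Assumption: overlap is measured relative to the finding.
# Integer division rounds down.
overlap_percent() {
  local finding_words lesson_words word total=0 shared=0
  finding_words=$(tokenize "$1")
  # Pad with spaces so every lesson keyword can be matched as " word ".
  lesson_words=" $(tokenize "$2" | tr '\n' ' ')"
  for word in $finding_words; do
    total=$((total + 1))
    case "$lesson_words" in
      *" $word "*) shared=$((shared + 1)) ;;
    esac
  done
  if [ "$total" -eq 0 ]; then
    echo 0
  else
    echo $((shared * 100 / total))
  fi
}

finding="Timeline reference does not match story start day."
lesson="Timeline references must match story start day continuity timeline"
overlap_percent "$finding" "$lesson"   # prints 62 (5 of 8 finding keywords shared)
```

With the 50% threshold, this finding would update the existing lesson rather than create a new candidate. Note that plain keyword matching treats `reference` and `references` as different words, so a finding's wording affects whether it matches.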
### Promotion Rule
A finding that appears in only one run stays at `frequency: 1` — it might be a one-off. Once the same pattern appears in a second run (matched by keyword overlap), it gets promoted to `frequency: 2` and becomes eligible for injection.
This prevents noise from single-run anomalies while still capturing genuine recurring issues quickly.
---
**Promotion rule:** A finding needs `frequency >= 2` (seen in 2+ runs) before injection. This filters out one-off noise. Preferences skip this threshold.
## Injection
At run start, before spawning agents, the orchestrator injects relevant lessons:
Before spawning agents, inject relevant lessons:
```bash
LESSONS=$(./lib/archeflow-memory.sh inject <domain> <archetype>)
```
### Injection Rules
Rules: `inject` filters by domain (exact match or `general`) and optionally by archetype, requires `frequency >= 2`, sorts by frequency descending, and caps the result at 10 lessons. Lessons with `frequency >= 5` are always injected regardless of the domain/archetype filter.
1. Read `lessons.jsonl`.
2. Filter by `domain` (exact match or `general`) and optionally by `archetype`.
3. Only include lessons with `frequency >= 2` (confirmed patterns).
4. Sort by frequency descending (most common first).
5. Cap at **10 lessons** per injection.
6. Lessons with `frequency >= 5` are **always injected** regardless of domain/archetype filter (they are universal enough to matter).
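
A rough sketch of these selection rules, under several assumptions that are not taken from `archeflow-memory.sh`: lessons have already been flattened into tab-separated rows (`id`, `frequency`, `type`, `domain`, `archetype`, `source`, `description`), a lesson with an empty `archetype` applies to every archetype, and the cap of 10 also covers the always-injected lessons. The function name `select_lessons` is hypothetical.

```bash
# Hypothetical helper. Args: domain, archetype ("" for any), TSV file.
# Columns (assumed): id, frequency, type, domain, archetype, source, description.
select_lessons() {
  local domain=$1 archetype=$2 file=$3
  awk -F '\t' -v domain="$domain" -v archetype="$archetype" '
    {
      freq = $2 + 0
      universal    = (freq >= 5)                        # always injected
      domain_ok    = ($4 == domain || $4 == "general")
      archetype_ok = (archetype == "" || $5 == "" || $5 == archetype)
      threshold_ok = (freq >= 2 || $3 == "preference")  # preferences skip the threshold
      if (universal || (domain_ok && archetype_ok && threshold_ok))
        # Prefix the frequency as a sort key; it is cut off again below.
        printf "%d\t- %s [seen %dx, %s]\n", freq, $7, freq, $6
    }
  ' "$file" \
    | sort -s -t "$(printf '\t')" -k1,1nr \
    | head -n 10 \
    | cut -f2-
}
```

Called as `select_lessons code "" lessons.tsv`, this prints bullet lines in the `[seen Nx, source]` style, most frequent first. Keeping the frequency as a temporary sort key lets `sort`, `head`, and `cut` do the ordering and capping without further scripting.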
### Injection Format
Append to the agent's system prompt as a structured section:
Injected as a markdown section appended to the agent's system prompt:
```markdown
## Known Issues (from past runs)
- Timeline references must match story start day [seen 3x, guardian]
- Voice drift common in monologue passages >200 words [seen 2x, sage]
- Missing null checks in API response handlers [seen 5x, guardian]
```
### Integration with Run Skill
In the `run` skill, after Step 0 (Initialize) and before Step 1 (Plan Phase):
```bash
# Load cross-run memory for this domain
MEMORY_LESSONS=$(./lib/archeflow-memory.sh inject "$DOMAIN" "")
# Inject into Explorer/Creator prompts if non-empty
if [[ -n "$MEMORY_LESSONS" ]]; then
EXPLORER_PROMPT="${EXPLORER_PROMPT}
${MEMORY_LESSONS}"
CREATOR_PROMPT="${CREATOR_PROMPT}
${MEMORY_LESSONS}"
fi
```
For reviewers in the Check phase, inject archetype-specific lessons:
```bash
GUARDIAN_LESSONS=$(./lib/archeflow-memory.sh inject "$DOMAIN" "guardian")
SAGE_LESSONS=$(./lib/archeflow-memory.sh inject "$DOMAIN" "sage")
```
---
## Decay
Lessons that stop being relevant should fade out. After each `run.complete`, apply decay:
After each `run.complete`, apply decay: a lesson not seen for 10 consecutive runs loses 1 from its `frequency`. When `frequency` reaches 0, the lesson is archived.
```bash
./lib/archeflow-memory.sh decay
```
### Decay Algorithm
1. For every lesson in `lessons.jsonl`:
- If `last_seen_run` is NOT the current run → increment `runs_since_last_seen` by 1
2. If `runs_since_last_seen >= 10`:
- Decrement `frequency` by 1
- Reset `runs_since_last_seen` to 0
3. If `frequency` drops to 0:
- Move the lesson to `.archeflow/memory/archive.jsonl` (append)
- Remove from `lessons.jsonl`
This means a lesson that was seen 5 times but then stops appearing will survive 50 runs of non-triggering before being fully archived (5 decrements x 10 runs each).
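
The arithmetic behind that 50-run figure can be checked with a small simulation. This is a minimal sketch of the counters only, with no file handling; `decay_step` and `runs_until_archived` are hypothetical names, not functions from `archeflow-memory.sh`.

```bash
# Hypothetical helper: apply one run of decay to a single lesson.
# Args: frequency, runs_since_last_seen, seen_this_run (1 or 0).
# Prints "<frequency> <runs_since_last_seen> <state>" (state: active|archived).
decay_step() {
  local frequency=$1 idle_runs=$2 seen_this_run=$3
  if [ "$seen_this_run" -eq 0 ]; then
    idle_runs=$((idle_runs + 1))
  fi
  if [ "$idle_runs" -ge 10 ]; then
    frequency=$((frequency - 1))
    idle_runs=0
  fi
  if [ "$frequency" -le 0 ]; then
    echo "$frequency $idle_runs archived"
  else
    echo "$frequency $idle_runs active"
  fi
}

# Count how many consecutive non-triggering runs a lesson survives
# before it would be moved to the archive.
runs_until_archived() {
  local frequency=$1 idle_runs=0 runs=0 state=active
  while [ "$state" = active ]; do
    set -- $(decay_step "$frequency" "$idle_runs" 0)
    frequency=$1
    idle_runs=$2
    state=$3
    runs=$((runs + 1))
  done
  echo "$runs"
}

runs_until_archived 5   # prints 50
```

A lesson at `frequency: 2`, the minimum for injection, would by the same logic drop below the injection threshold after 10 idle runs and be archived after 20.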
---
## Manual Management
### Add a lesson
```bash
archeflow memory add "User prefers single bundled PR" # Add preference (injected immediately)
archeflow memory list # Show all active lessons
archeflow memory forget m-002 # Archive a lesson
```
## Audit Trail
Track which lessons are injected per run and whether they were effective. Pass `--audit <run_id>` to `inject` to write an audit record. After a run, `audit-check <run_id>` compares injected lessons against review findings: no matching finding means `helpful` (the issue was likely prevented); a matching finding means `ineffective` (the issue repeated despite injection).
```bash
archeflow memory add "User prefers single bundled PR over many small ones"
# Internally: ./lib/archeflow-memory.sh add preference "User prefers single bundled PR over many small ones"
./lib/archeflow-memory.sh inject "$DOMAIN" "" --audit "$RUN_ID"
./lib/archeflow-memory.sh audit-check <run_id>
```
Manually added lessons start at `frequency: 1` but with type `preference`, which means they are injected immediately (preferences skip the frequency >= 2 threshold).
### List lessons
```bash
archeflow memory list
# Internally: ./lib/archeflow-memory.sh list
```
Output:
```
ID Freq Type Domain Description
m-001 3 pattern writing Timeline references must match story start day
m-002 1 preference general User prefers single bundled PR over many small ones
m-003 5 archetype_hint writing Voice drift most common in long monologue passages
m-004 1 anti_pattern code Splitting auth middleware causes duplication
```
### Forget a lesson
```bash
archeflow memory forget m-002
# Internally: ./lib/archeflow-memory.sh forget m-002
```
Moves the lesson to `archive.jsonl` regardless of frequency.
---
## Integration Points
| Moment | Action | Script Command |
|--------|--------|----------------|
| After `run.complete` | Extract lessons from findings | `archeflow-memory.sh extract <events.jsonl>` |
| After extraction | Apply decay to all lessons | `archeflow-memory.sh decay` |
| Before agent spawn (run start) | Inject relevant lessons | `archeflow-memory.sh inject <domain> <archetype>` |
| Before agent spawn | Inject relevant lessons | `archeflow-memory.sh inject <domain> <archetype>` |
| User command | Add/list/forget lessons | `archeflow-memory.sh add/list/forget` |
## Audit Trail
Track which lessons are injected into each run and whether they were effective.
### Storage
```
.archeflow/memory/audit.jsonl # Append-only audit log
```
### Injection Audit Record
When `--audit <run_id>` is passed to the `inject` command, an audit record is written:
```jsonl
{"ts":"2026-04-04T10:00:00Z","run_id":"2026-04-04-auth-fix","domain":"code","archetype":"","lessons_injected":["m-001","m-003"],"lesson_count":2}
```
Usage:
```bash
./lib/archeflow-memory.sh inject "$DOMAIN" "" --audit "$RUN_ID"
```
### Effectiveness Check
After a run completes, check whether injected lessons prevented issues:
```bash
./lib/archeflow-memory.sh audit-check <run_id>
```
This command:
1. Reads `audit.jsonl` for lessons injected in the given run
2. Reads the run's event file for `review.verdict` events
3. For each injected lesson, checks keyword overlap between the lesson's description and review findings
4. **No matching finding** = `helpful` (the lesson likely prevented the issue)
5. **Matching finding** = `ineffective` (the issue repeated despite the lesson being injected)
6. Appends effectiveness results to `audit.jsonl`
### Effectiveness Over Time
By querying `audit.jsonl` for effectiveness records, you can measure:
- Which lessons consistently prevent issues (high `helpful` count)
- Which lessons are not working (high `ineffective` count — consider rewording or removing)
- Overall memory system ROI (ratio of helpful to ineffective across all runs)
```bash
# Count effectiveness per lesson
jq -r 'select(.type == "effectiveness_check") | [.lesson_id, .effectiveness] | @tsv' .archeflow/memory/audit.jsonl | sort | uniq -c
```
---
## Design Principles
1. **Append-only storage.** `lessons.jsonl` is append-only during writes; decay rewrites the file in place but preserves all data (archived lessons move to `archive.jsonl`).
2. **Conservative promotion.** A finding must appear in 2+ runs before injection. One-offs are noise.
3. **Graceful degradation.** If `lessons.jsonl` doesn't exist, injection returns empty — no error, no block.
4. **Cheap.** Keyword matching, not embeddings. `jq` for JSON, `grep` for matching. No external services.
5. **Bounded.** Max 10 lessons injected per prompt. Prevents context pollution.