docs: add ArcheFlow roadmap v0.9-v0.12
This commit is contained in:
235
docs/plans/archeflow-roadmap-v1.md
Normal file
235
docs/plans/archeflow-roadmap-v1.md
Normal file
@@ -0,0 +1,235 @@
|
||||
# ArcheFlow Roadmap — From Framework to Tool
|
||||
|
||||
Status: Planning (2026-04-06)
|
||||
Context: v0.8.0 shipped — consolidated skills, corrective action framework, 110 tests. The scaffolding is solid. Now make it genuinely useful.
|
||||
|
||||
## Guiding Principle
|
||||
|
||||
Every feature must close a feedback loop or remove friction. No features that add complexity without measurable improvement in either speed, cost, or quality.
|
||||
|
||||
---
|
||||
|
||||
## Tier 1: Make the Sprint Runner Smart (highest impact)
|
||||
|
||||
### 1.1 Queue from Git Issues
|
||||
|
||||
**Problem:** Manual `queue.json` is the biggest friction point. Nobody wants to maintain a JSON file by hand.
|
||||
|
||||
**Solution:** `./scripts/ws sync-issues` that:
|
||||
- Reads Gitea/GitHub issues via API (`gh issue list` or Gitea REST)
|
||||
- Maps labels to priority: `P0`=critical/blocker, `P1`=high, `P2`=medium, `P3`=low/enhancement
|
||||
- Maps labels to estimate: `size/S`, `size/M`, `size/L`, `size/XL` (default: M)
|
||||
- Extracts `depends_on` from "blocks #N" / "depends on #N" in issue body
|
||||
- Upserts into `queue.json` (doesn't overwrite manual edits, merges by issue ID)
|
||||
- Skips issues with `wontfix`, `duplicate`, `question` labels
|
||||
|
||||
**Scope:** One script in `scripts/`, ~100 lines. Gitea API + GitHub API (detect from remote URL). Needs API token in env var `GITEA_TOKEN` or `GITHUB_TOKEN`.
|
||||
|
||||
**Test:** bats tests with mock API responses (curl fixture files).
|
||||
|
||||
### 1.2 Cost Estimation
|
||||
|
||||
**Problem:** Users don't know what a sprint will cost before running it.
|
||||
|
||||
**Solution:** `/af-sprint --dry-run` shows estimated cost:
|
||||
```
|
||||
Sprint estimate: 7 tasks, ~18 agents, est. $1.20-$2.40, ~12 minutes
|
||||
P1: writing.colette fanout (L) — est. $0.50, 4 agents
|
||||
P1: tool.archeflow review (M) — est. $0.15, 2 agents
|
||||
...
|
||||
Proceed? [y/n]
|
||||
```
|
||||
|
||||
**How:** Track actual token counts per task size (S/M/L/XL) in `.archeflow/memory/cost-history.jsonl`. After 5+ tasks per size bucket, use median. Before that, use defaults: S=$0.05, M=$0.15, L=$0.50, XL=$1.50.
|
||||
|
||||
**Scope:** Update `sprint` skill with estimation section. Add cost logging to `archeflow-event.sh` (include `tokens_used` in `agent.complete` data). New script `lib/archeflow-cost.sh` for estimation.
|
||||
|
||||
### 1.3 Smart Workflow Selection
|
||||
|
||||
**Problem:** Current auto-selection uses keyword matching ("fix" -> pipeline). This is crude.
|
||||
|
||||
**Solution:** Analyze the actual task + codebase signals:
|
||||
|
||||
| Signal | Source | Workflow |
|
||||
|--------|--------|----------|
|
||||
| Files matching `auth|crypto|secret|token|session` | task description + file paths | -> thorough |
|
||||
| Public API changes (OpenAPI spec modified, exported functions changed) | git diff | -> thorough |
|
||||
| <3 files changed, all in same dir | git diff | -> fast/pipeline |
|
||||
| Test files only | git diff | -> pipeline |
|
||||
| Historical: this project's last 3 runs needed 0 cycles | memory | -> fast |
|
||||
| Historical: this project's last run had 2+ CRITICALs | memory | -> thorough |
|
||||
|
||||
**Scope:** Add to the `run` skill's Strategy Selection section. Read git diff stats + memory lessons before choosing. ~20 lines of logic replacing the current keyword table.
|
||||
|
||||
---
|
||||
|
||||
## Tier 2: Close the Learning Loop
|
||||
|
||||
### 2.1 Confidence Calibration
|
||||
|
||||
**Problem:** Creator's confidence scores (0.0-1.0) are self-reported and uncalibrated. A Creator that always says 0.8 but gets rejected 40% of the time is not useful.
|
||||
|
||||
**Solution:** After each `run.complete`, log calibration data:
|
||||
```jsonl
|
||||
{"run_id":"...","creator_confidence":{"task":0.8,"solution":0.7,"risk":0.6},"actual_outcome":"rejected","cycles":2,"criticals":1}
|
||||
```
|
||||
|
||||
At run start, inject calibration context into Creator prompt:
|
||||
```
|
||||
Your historical calibration: You rate task understanding at 0.8 avg,
|
||||
but 35% of runs with that score needed cycle-back. Consider scoring
|
||||
more conservatively.
|
||||
```
|
||||
|
||||
**Scope:** New field in `archeflow-memory.sh` calibration store. ~30 lines in `run` skill to log + inject. Needs 5+ runs before meaningful.
|
||||
|
||||
### 2.2 Archetype Auto-Tuning
|
||||
|
||||
**Problem:** The effectiveness scoring system exists (`archeflow-score.sh`) but nothing acts on it.
|
||||
|
||||
**Solution:** After 10+ runs, auto-generate recommendations:
|
||||
```
|
||||
Archetype Recommendations (based on 15 runs):
|
||||
Guardian: essential (caught real issues in 80% of runs)
|
||||
Sage: keep (useful findings in 60% of runs)
|
||||
Skeptic: demote to thorough-only (useful in 20%, mostly INFO)
|
||||
Trickster: keep for thorough (caught 2 bugs Guardian missed)
|
||||
```
|
||||
|
||||
Add to `/af-score` output. Store recommendation in config as `reviewers.recommended`:
|
||||
```yaml
|
||||
reviewers:
|
||||
recommended:
|
||||
always: [guardian]
|
||||
default: [sage]
|
||||
thorough_only: [skeptic, trickster]
|
||||
# Auto-generated 2026-04-06 from 15 runs. Override with explicit config.
|
||||
```
|
||||
|
||||
**Scope:** Update `archeflow-score.sh` with recommendation logic. Update `run` skill to read recommended config. Add to `af-score` skill display.
|
||||
|
||||
### 2.3 Campaign Memory
|
||||
|
||||
**Problem:** Related runs (e.g., "harden all API endpoints") don't share context.
|
||||
|
||||
**Solution:** Optional `--campaign <id>` flag on `/af-run`:
|
||||
- Links runs under a campaign ID
|
||||
- Cross-run context: "In Run 1, we found the auth pattern uses middleware X. In Run 2, the same pattern applies."
|
||||
- Campaign-level progress: "3/8 endpoints hardened, 2 CRITICALs remaining"
|
||||
- Campaign memory injected into Explorer/Creator prompts
|
||||
|
||||
**Scope:** New field in event schema. Campaign index in `.archeflow/campaigns/`. Update memory injection to filter by campaign. ~50 lines in `run` skill.
|
||||
|
||||
---
|
||||
|
||||
## Tier 3: Integrate with Real Workflow
|
||||
|
||||
### 3.1 Findings as PR Comments
|
||||
|
||||
**Problem:** Review findings live in `.archeflow/artifacts/`. Nobody reads artifact files — they read PR comments.
|
||||
|
||||
**Solution:** After Check phase, if a PR exists for the branch:
|
||||
```bash
|
||||
# Post each CRITICAL/WARNING as a PR review comment
|
||||
gh api repos/{owner}/{repo}/pulls/{pr}/comments \
|
||||
--field body="🛡️ **Guardian** [CRITICAL/security]\n\n${description}\n\nSuggested fix: ${fix}" \
|
||||
--field path="${file}" --field line="${line}"
|
||||
```
|
||||
|
||||
**Scope:** New `--pr <number>` flag on `/af-run` and `/af-review`. Script `lib/archeflow-pr.sh` for posting comments. Falls back gracefully if no PR or no API token.
|
||||
|
||||
### 3.2 CI Hook Mode
|
||||
|
||||
**Problem:** ArcheFlow runs manually. It should run automatically on PRs.
|
||||
|
||||
**Solution:** Lightweight CI integration:
|
||||
```yaml
|
||||
# .github/workflows/archeflow-review.yml (or Gitea equivalent)
|
||||
on: pull_request
|
||||
jobs:
|
||||
review:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- run: claude --plugin-dir ./archeflow -p "/af-review --branch ${{ github.head_ref }} --pr ${{ github.event.number }}"
|
||||
```
|
||||
|
||||
Only runs Guardian (fast, cheap). Posts findings as PR comments. No PDCA overhead.
|
||||
|
||||
**Scope:** Template workflow file in `examples/ci/`. Update `review` skill to support `--pr` flag. Documentation.
|
||||
|
||||
### 3.3 Watch Mode
|
||||
|
||||
**Problem:** You have to remember to run `/af-review` after pushing.
|
||||
|
||||
**Solution:** `/af-watch` — background process that monitors a branch:
|
||||
- Uses `git log --since` polling (every 60s)
|
||||
- On new commits: auto-run `/af-review` on the diff
|
||||
- Posts findings as PR comments if PR exists
|
||||
- Respects budget gate from corrective action framework
|
||||
|
||||
**Scope:** New skill `af-watch/SKILL.md` (~30 lines). Uses the `loop` skill infrastructure. Low priority — CI hook mode covers most use cases.
|
||||
|
||||
---
|
||||
|
||||
## Tier 4: Replay and Analysis
|
||||
|
||||
### 4.1 Decision Journal
|
||||
|
||||
**Problem:** No visibility into why ArcheFlow made specific choices during a run.
|
||||
|
||||
**Solution:** Already started with `archeflow-decision.sh` and `archeflow-replay.sh`. Extend:
|
||||
- Log every decision point: workflow selection, A1/A2/A3 triggers, fix routing, shadow detections
|
||||
- `/af-replay <run_id> --timeline` shows the decision chain
|
||||
- `/af-replay <run_id> --whatif --workflow thorough` simulates: "What would thorough have found?"
|
||||
|
||||
**Scope:** Mostly built. Needs integration into the `run` skill (emit `decision.point` events at each choice). The replay script needs the what-if simulation logic.
|
||||
|
||||
### 4.2 Run Comparison
|
||||
|
||||
**Problem:** No way to evaluate whether workflow X is better than workflow Y for a project.
|
||||
|
||||
**Solution:** `/af-replay compare <run_a> <run_b>`:
|
||||
```
|
||||
Run A (standard, 4m30s, $0.80): 5 findings, 4 resolved, 1 INFO remaining
|
||||
Run B (thorough, 12m, $2.10): 7 findings, 6 resolved, 1 INFO remaining
|
||||
Delta: +2 findings (both INFO), +165% cost, +167% time
|
||||
Verdict: Standard was sufficient for this task.
|
||||
```
|
||||
|
||||
**Scope:** Update `archeflow-replay.sh` with comparison mode. Needs at least 2 runs on similar tasks.
|
||||
|
||||
---
|
||||
|
||||
## Implementation Order
|
||||
|
||||
```
|
||||
v0.9.0 — Sprint Intelligence
|
||||
1.1 Queue from issues
|
||||
1.2 Cost estimation
|
||||
1.3 Smart workflow selection
|
||||
|
||||
v0.10.0 — Learning Loop
|
||||
2.1 Confidence calibration
|
||||
2.2 Archetype auto-tuning
|
||||
2.3 Campaign memory
|
||||
|
||||
v0.11.0 — Integration
|
||||
3.1 Findings as PR comments
|
||||
3.2 CI hook mode
|
||||
3.3 Watch mode (stretch)
|
||||
|
||||
v0.12.0 — Analysis
|
||||
4.1 Decision journal (mostly done)
|
||||
4.2 Run comparison
|
||||
```
|
||||
|
||||
Each version is independently shippable. No version depends on a later one.
|
||||
|
||||
## What NOT to Build
|
||||
|
||||
- **Web dashboard** — Terminal is the interface. Don't add a server.
|
||||
- **Embedding-based memory** — Keyword matching works. Don't add vector DBs.
|
||||
- **Agent marketplace** — Focus on the 7 built-in archetypes being excellent.
|
||||
- **Multi-user collaboration** — ArcheFlow is a single-user tool. Git is the collaboration layer.
|
||||
- **Plugin system for plugins** — ArcheFlow IS a plugin. Don't go meta.
|
||||
Reference in New Issue
Block a user