
Christian Nennemann 325935226d feat: principle #36 — ephemeral execution environments

2026-03-31 22:29:04 +00:00

16 KiB


AI Dev Principles

Living document. Principles discovered while building with AI agents. Extracted from CLAUDE.md files, memory, project configs, and observed patterns. Drop ideas in chat — they land here.

Architecture & Design

1. Single Source of Truth

One file, one config, one definition. Never maintain the same information in two places.

Dockerfile: one shared by all agents
Queue: queue.json is canonical, markdown table is a view
Voice profiles: YAML is source, generated text is output

Test: If you change something, how many files do you need to touch? If more than one, you have an SSOT violation.
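This is a minimal sketch of "queue.json is canonical, markdown table is a view". It assumes a hypothetical queue format: a JSON list of items with `id`, `priority`, `agent`, and `title` fields. The table is always regenerated from the JSON and never edited directly, so a change touches exactly one file.

```python
import json

def render_queue_table(queue_json: str) -> str:
    """Regenerate the markdown view from the canonical JSON queue (hypothetical schema)."""
    items = json.loads(queue_json)
    lines = ["| ID | Priority | Agent | Title |", "|---|---|---|---|"]
    # "P0" < "P1" < "P2" < "P3" also holds as plain string comparison.
    for item in sorted(items, key=lambda i: i["priority"]):
        lines.append(f"| {item['id']} | {item['priority']} | {item['agent']} | {item['title']} |")
    return "\n".join(lines)

example = (
    '[{"id": 2, "priority": "P1", "agent": "cursor", "title": "Fix UI"},'
    ' {"id": 1, "priority": "P0", "agent": "claude", "title": "Deploy"}]'
)
print(render_queue_table(example))
```

Because the table is derived, any hand edit would be overwritten on the next render. That is the SSOT property in practice.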

2. Vertical Spike Before Framework

Validate architecture through working code, not docs. Don't write 1000 lines of specs for zero lines of working code. Frameworks are extracted from spikes, not theorized upfront.

Origin: afet validation strategy — spike first, then extract. Dogfood with a real project.

3. Convention Over Configuration

Reduce decisions by establishing patterns that just work.

Conventional commits everywhere (feat:, fix:, docs:)
docs/status.md in every project (same format)
Queue priority: P0 > P1 > P2 > P3
Agent picks up work without being told which item is next
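This sketch shows how an agent can pick up work without being told which item is next. It assumes each queue item is a dict with hypothetical `id`, `priority`, `status`, and optional `agent` fields. The function keeps only the open items this agent may take, then picks the lowest P-number, oldest first.

```python
from typing import Optional

PRIORITY_ORDER = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}

def next_item(queue: list[dict], agent: str) -> Optional[dict]:
    """Return the highest-priority open item for this agent; ties go to the lowest id."""
    candidates = [
        item for item in queue
        if item.get("status") == "open" and item.get("agent") in (None, agent)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda i: (PRIORITY_ORDER[i["priority"]], i["id"]))

queue = [
    {"id": 1, "priority": "P2", "status": "open", "agent": "claude"},
    {"id": 2, "priority": "P0", "status": "done", "agent": "claude"},
    {"id": 3, "priority": "P1", "status": "open", "agent": None},
    {"id": 4, "priority": "P0", "status": "open", "agent": "cursor"},
]
print(next_item(queue, "claude"))  # item 3: the P0 items are done or belong to cursor
```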

4. Agent-First Design

Build frameworks and tools that agents can use, not just humans. Measure agent ergonomics.

Schema-free entities (type + status + JSON meta) — agents don't need to learn fixed schemas
Config-driven: new types/statuses/fields = TOML, never code changes
Standard tool interfaces (MCP) so any agent can plug in

Origin: tool.affordance — 380 tests, agent ergonomics testing built in.

5. Don't Make Me Think (About Infrastructure)

Dev environment changes propagate automatically. No manual rebuild, no "remember to also update X".

One Dockerfile, auto-detected by all consumers
devcontainer features for language runtimes
post-create scripts for project-specific setup
If a human has to remember a step, it will be forgotten

Cost & Efficiency

6. Cheapest Model per Task

Don't use Opus for what Haiku can do.

Task	Model	Cost
Validation, guardrails, diff	Haiku	$0.80/MTok
Creative writing, rewrites	Sonnet	$3/MTok
Architecture, complex reasoning	Opus	When needed

Two-pass approach: Haiku draft + Sonnet polish = 80% savings vs Sonnet throughout.
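A small cost model makes the two-pass arithmetic easy to check. This sketch treats the table's per-MTok prices as flat rates for all tokens. Real pricing separates input and output tokens, so this is a simplification, and the token volumes in the example are made up. At these rates Haiku alone is about 73% cheaper per token than Sonnet, so the savings a real workflow reaches depend on how many tokens each pass consumes.

```python
# USD per million tokens, taken from the table (flat-rate simplification).
PRICE_PER_MTOK = {"haiku": 0.80, "sonnet": 3.00}

def cost(model: str, tokens: int) -> float:
    return PRICE_PER_MTOK[model] * tokens / 1_000_000

def two_pass_savings(draft_tokens: int, polish_tokens: int, sonnet_only_tokens: int) -> float:
    """Fraction saved by Haiku draft + Sonnet polish versus an all-Sonnet run."""
    two_pass = cost("haiku", draft_tokens) + cost("sonnet", polish_tokens)
    baseline = cost("sonnet", sonnet_only_tokens)
    return 1 - two_pass / baseline

# Hypothetical volumes: 10M drafting tokens, Sonnet re-touches 1M of them,
# compared with an all-Sonnet run of the same 10M tokens.
print(f"{two_pass_savings(10_000_000, 1_000_000, 10_000_000):.0%}")  # prints 63%
```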

7. Budget-aware by Default

Track every token. No surprise bills.

--dry-run before expensive operations
Cost estimation before fan-out
CostGuard with hard limits per session
Batch API for bulk (50% discount)
Prompt caching for repeated system prompts (90% on reads)
Report estimated vs actual after runs
Budget decisions >$10 require user consent
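CostGuard is only specified as "hard limits per session", so this is a hypothetical minimal version. It refuses any operation whose estimate would push the session past its hard limit. It also demands explicit consent above $10 and produces the estimated-vs-actual line after each run. The class and method names are illustrative.

```python
class BudgetExceeded(RuntimeError):
    pass

class CostGuard:
    """Hypothetical per-session guard: a hard limit plus a consent threshold."""

    def __init__(self, hard_limit_usd: float, consent_threshold_usd: float = 10.0):
        self.hard_limit = hard_limit_usd
        self.consent_threshold = consent_threshold_usd
        self.spent = 0.0

    def check(self, estimated_usd: float, user_consented: bool = False) -> None:
        """Call before an operation; raises instead of silently overspending."""
        if self.spent + estimated_usd > self.hard_limit:
            raise BudgetExceeded(
                f"would spend {self.spent + estimated_usd:.2f} > limit {self.hard_limit:.2f}"
            )
        if estimated_usd > self.consent_threshold and not user_consented:
            raise BudgetExceeded(f"{estimated_usd:.2f} USD needs explicit user consent")

    def record(self, actual_usd: float, estimated_usd: float) -> str:
        """Call after the operation; returns the estimated-vs-actual report line."""
        self.spent += actual_usd
        return f"estimated {estimated_usd:.2f}, actual {actual_usd:.2f}, session total {self.spent:.2f}"

guard = CostGuard(hard_limit_usd=25.0)
guard.check(4.0)
print(guard.record(actual_usd=3.5, estimated_usd=4.0))
```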

8. Batch, Cache, Deduplicate

Before implementing any API operation, proactively optimize:

Batch: group N items per call (100 rows/insert, not 1)
Cache: static reference data between runs
Dedup: hash-based skip of already-processed items
Incremental: upserts and delta syncs, not full re-processing
Parallelize: async/concurrent where rate limits allow
Respect limits: throttle proactively, don't react to 429s

State the optimization strategy when proposing a workflow.
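Two of these optimizations are small enough to sketch directly: chunking items into batches, and skipping items whose content hash was already processed. The in-memory `seen` set stands in for whatever persistent store a real pipeline would use; that substitution is an assumption.

```python
import hashlib
import json
from collections.abc import Iterable, Iterator

def batched(items: list, size: int) -> Iterator[list]:
    """Yield consecutive chunks of at most `size` items (e.g. 100 rows per insert)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def content_hash(item: dict) -> str:
    """Stable hash: sorted keys make logically equal dicts hash the same."""
    return hashlib.sha256(json.dumps(item, sort_keys=True).encode()).hexdigest()

def new_items(items: Iterable[dict], seen: set[str]) -> list[dict]:
    """Drop items already processed (by hash) and remember the rest."""
    fresh = []
    for item in items:
        h = content_hash(item)
        if h not in seen:
            seen.add(h)
            fresh.append(item)
    return fresh

seen: set[str] = set()
rows = [{"id": i} for i in range(250)] + [{"id": 0}]  # one duplicate
for batch in batched(new_items(rows, seen), 100):
    print(len(batch))  # one "insert" call per batch: 100, 100, 50
```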

Execution & Error Handling

9. Checkpoint / Resume

Every long-running operation must be interruptible and resumable.

Save progress incrementally, not all at the end
Hash-based dedup so restarting skips completed work
JSONL for append-only progress logs
Killing a stuck task must not lose completed work
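This sketch combines the points above. Each completed item is appended to a JSONL log keyed by a content hash, and a restart skips every key already in the log. The function names and the log location are illustrative.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable

def item_key(item: str) -> str:
    """Content hash used to recognise work that is already done."""
    return hashlib.sha256(item.encode()).hexdigest()

def load_done(log_path: Path) -> set[str]:
    """Collect keys of completed items from the append-only JSONL log."""
    done: set[str] = set()
    if not log_path.exists():
        return done
    for line in log_path.read_text().splitlines():
        try:
            done.add(json.loads(line)["key"])
        except json.JSONDecodeError:
            # A run killed mid-write can leave a truncated line; that item simply reruns.
            continue
    return done

def run(items: list[str], log_path: Path, process: Callable[[str], object]) -> int:
    """Process items not yet in the log; return how many were processed in this run."""
    done = load_done(log_path)
    processed = 0
    with log_path.open("a") as log:
        for item in items:
            key = item_key(item)
            if key in done:
                continue  # finished in an earlier run
            result = process(item)
            log.write(json.dumps({"key": key, "result": result}) + "\n")
            log.flush()  # write each record out right away instead of at the end
            processed += 1
    return processed

log_file = Path(tempfile.mkdtemp()) / "progress.jsonl"
print(run(["a", "b", "c"], log_file, str.upper))       # 3
print(run(["a", "b", "c", "d"], log_file, str.upper))  # 1, only "d" is new
```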

10. Diagnose Before Retrying

When something fails, understand why before trying again.

Read the error message. Check logs. Understand the cause.
Never retry the same command hoping for a different result
Never add broad try/except or || true to suppress errors
Never sleep-loop waiting for things to work
Fix the root cause, then try once more

11. Fail Forward

Don't block on one broken thing. Document it, next item.

Error → log it → move to next queue item
Missing dependency → note it → work on something else
Rate limit → save progress → switch tasks
Unsolvable error → document in status, move on
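This is a minimal sketch of the fail-forward loop. The single `except` sits at the queue-item boundary and records the specific cause in a status structure. That is different from wrapping individual calls to hide errors. The item shape and the `handle` callback are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("queue")

def work_through(queue: list[dict], handle) -> dict[str, list]:
    """Process every item; a failure is logged with its cause and the loop moves on."""
    status: dict[str, list] = {"done": [], "blocked": []}
    for item in queue:
        try:
            handle(item)
        except Exception as exc:  # item boundary only: record the error, never hide it
            log.error("item %s failed: %r", item["id"], exc)
            status["blocked"].append({"id": item["id"], "reason": repr(exc)})
            continue
        status["done"].append(item["id"])
    return status

def handle(item: dict) -> None:
    # Hypothetical handler: item 2 needs a tool that isn't installed.
    if item.get("needs") == "go":
        raise FileNotFoundError("go not installed")

print(work_through([{"id": 1}, {"id": 2, "needs": "go"}, {"id": 3}], handle))
```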

12. Read Before Write

Understand existing code before changing it. Match the project's patterns, don't invent new ones.

No placeholder stubs (todo!(), unimplemented!(), bare pass) in production code
Fix lint/test failures before committing
If a pre-commit hook fails, make a NEW commit (never --amend after failure)
Never silence warnings to make code compile — fix the root cause

Autonomy & Agents

13. Autonomous but Auditable

Agents work independently. Humans can always follow what happened.

Status logs updated after every sprint
Control center as handoff document between sessions
Conventional commits with meaningful messages
No silent failures — document and move on

14. Parallel by Default

Never sit idle when there's work that can run concurrently.

Multiple agents on independent projects simultaneously (up to 4-5)
Background scouts while foreground work continues
If blocked on one thing, pick up the next queue item
Use idle time productively — check the task list

15. Dual-Agent Routing

Different agents for different task shapes. Route to strengths.

Claude Code: long-running, autonomous, shell-native, writing, research, tests
Cursor: interactive, codebase-aware, multi-file edits, UI/web, PR-scoped
Each queue item has an agent field. Agents only pick their own items.
Handoff protocol between agents to avoid collisions

16. Session Handoff Protocol

Every session writes a handoff so the next session can resume without re-discovery.

Fill in "Letzte Session" (last session) in the control-center before ending
Update project's docs/status.md with what was done
Fields: date, channel, projects touched, completed, blocked, next step
Read handoff at session start — this IS the context
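The handoff fields above fit a small record type. This sketch renders one as a markdown block that could be appended to docs/status.md. The class name and the rendered layout are assumptions, since the exact format isn't specified.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Handoff:
    """Handoff fields: date, channel, projects touched, completed, blocked, next step."""
    session_date: date
    channel: str
    projects: list[str]
    completed: list[str] = field(default_factory=list)
    blocked: list[str] = field(default_factory=list)
    next_step: str = ""

    def to_markdown(self) -> str:
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {i}" for i in items) or "- (none)"
        return (
            f"## Session {self.session_date.isoformat()} ({self.channel})\n"
            f"Projects: {', '.join(self.projects)}\n\n"
            f"Completed:\n{bullets(self.completed)}\n\n"
            f"Blocked:\n{bullets(self.blocked)}\n\n"
            f"Next step: {self.next_step}\n"
        )

h = Handoff(date(2026, 3, 31), "claude-code", ["dispatch"], ["Added emergency stop"], [], "Containerize workers")
print(h.to_markdown())
```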

17. Worktree Safety

When agents work in isolated worktrees, protect uncommitted work.

Agents must commit before finishing
Check for uncommitted changes before deleting worktrees
Save work as named branches before cleanup
Never use TeamDelete as a shortcut — it destroys worktrees
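This sketch shows the pre-deletion check and assumes the `git` CLI is on PATH. `git status --porcelain` lists modified and untracked files, and empty output means clean. `git worktree remove` without `--force` also refuses unclean worktrees, so the guard is doubled. The function names and the backup branch naming are hypothetical.

```python
import subprocess
from pathlib import Path

def has_uncommitted_changes(porcelain: str) -> bool:
    """`git status --porcelain` prints one line per changed or untracked file."""
    return bool(porcelain.strip())

def safe_remove_worktree(repo: Path, worktree: Path, backup_branch: str) -> None:
    """Refuse to delete a dirty worktree; otherwise keep its HEAD as a named branch, then remove it."""
    status = subprocess.run(
        ["git", "-C", str(worktree), "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    if has_uncommitted_changes(status):
        raise RuntimeError(f"{worktree} has uncommitted changes; commit them first")
    # Save the worktree's current commit under a named branch before cleanup
    # (fails, and so aborts removal, if that branch name already exists).
    subprocess.run(["git", "-C", str(worktree), "branch", backup_branch, "HEAD"], check=True)
    # No --force: git also refuses on its own to remove a worktree that is not clean.
    subprocess.run(["git", "-C", str(repo), "worktree", "remove", str(worktree)], check=True)
```

The backup branch matters most when the worktree is on a detached HEAD, where commits would otherwise be reachable only through the worktree being deleted.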

Documentation & Knowledge

18. Documentation as First-Class Deliverable

Not an afterthought — parallel to code.

Every architectural decision gets an ADR
Master Prompts and Book docs updated alongside code
Security by design, not bolted on
New concepts captured immediately, not retroactively

19. Script Everything

Multi-step workflows, automations, reusable commands → save as scripts in scripts/. Never just execute ephemeral commands.

Include a brief comment header explaining what the script does
Reproducibility, auditability, handoff clarity
If you did it twice, it's a script

20. Memory as Institutional Knowledge

Persistent memory bridges sessions. Only save what a future session would need.

API quirks, DB schemas, key architectural decisions
Not routine operations or task progress
Check at session start to avoid re-discovery
Update or remove stale memories

Capture & Learning

21. Zero-Friction Capture

Ideas, principles, and decisions are capturable in the moment without switching tools.

The conversation is the inbox
Agent triages and routes automatically
Principles are extracted proactively from observed patterns
User validates by not objecting

22. Proactive Principle Detection

When a decision is made, a pattern repeats, or an approach is confirmed — check if there's an underlying principle worth capturing. Don't ask. Just add it and mention briefly.

Content Production

23. Voice Profile Consistency

Writing follows defined voice profiles with guardrail enforcement.

Kombi B: essayistic + provocative-analytical
No fictional characters, no coaching language, no listicles
"Follow the money" in every chapter
Structure: structural analysis → philosophy → data → practical exercise
Automated guardrail checks before publishing

24. Persona / Series / Volume Hierarchy

Content scales through structured inheritance.

Persona (author identity) → Series (universe rules) → Volume (individual book) → Fan-Out (publisher × language variants)
Volume override > series default > persona default > global default
One persona can write multiple series; each series has shared terminology and rules
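The override order maps directly onto Python's `collections.ChainMap`, which searches its mappings left to right and returns the first hit. The setting keys below are hypothetical examples, not the project's real configuration.

```python
from collections import ChainMap

global_defaults = {"language": "en", "tone": "neutral", "chapter_length": 4000}
persona = {"tone": "provocative-analytical"}
series = {"language": "de", "glossary": "series-glossary.yaml"}
volume = {"chapter_length": 6000}

# Volume override > series default > persona default > global default.
settings = ChainMap(volume, series, persona, global_defaults)

print(settings["chapter_length"])  # 6000 (volume override)
print(settings["language"])        # de (series default)
print(settings["tone"])            # provocative-analytical (persona default)
```

A fan-out variant (publisher × language) could be one more mapping pushed onto the front with `settings.new_child(...)`.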

Infrastructure

25. Tool Auto-Provision

When an agent needs a tool that isn't installed, it should install it automatically — not block and ask.

Container should support on-demand tool installation (apt, npm, go install, curl binary)
Dockerfile covers the 80% case; auto-install covers the long tail
Log what was installed so it can be baked into the Dockerfile later
Never let a missing jq or go derail a 30-minute sprint

Origin: "If something like go is missing, we need a way to pass tools through or auto-install them."
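This sketch shows auto-provisioning with an install log. It assumes a container user that may run `apt-get` against a current package index, and a Go toolchain for `go install`. The `INSTALLERS` table and `ensure_tool` are hypothetical. `runner` is injectable so the logic can be exercised without installing anything.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical tool -> install command table; the Dockerfile covers the common tools.
INSTALLERS = {
    "jq": ["apt-get", "install", "-y", "jq"],
    "rg": ["apt-get", "install", "-y", "ripgrep"],
    "gopls": ["go", "install", "golang.org/x/tools/gopls@latest"],
}

def ensure_tool(name: str, install_log: Path, runner=subprocess.run) -> str:
    """Install `name` if missing and log the command so it can be baked into the Dockerfile."""
    if shutil.which(name):
        return "present"
    cmd = INSTALLERS.get(name)
    if cmd is None:
        raise LookupError(f"no installer known for {name!r}")
    runner(cmd, check=True)
    with install_log.open("a") as log:
        log.write(" ".join(cmd) + "\n")
    return "installed"
```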

Quality & Process

26. PDCA Every Sprint

Plan-Do-Check-Act after every sprint, not just at the end. Check catches bugs before they compound.

Plan: define features + acceptance criteria
Do: implement with team, commit after each feature
Check: test in production, read debug logs, try bad inputs, verify on mobile
Act: fix everything found before starting next sprint
Never skip Check. A shipped bug costs 10x more than a caught bug.

Origin: Sprint 1-3 each had a PDCA cycle that caught rate limiting issues, SSE race conditions, and Caddy routing gaps.

27. Test in Production (for fast prototyping)

For single-user tools in rapid prototyping, test against the real deployment, because mocks hide integration bugs. Take this with a grain of salt: it applies to MVPs and personal tools. For multi-user, shared, or safety-critical systems, use proper staging environments and test suites.

Fast prototyping: curl against live API, try PWA on real phone, submit real jobs
Production-grade: staging environment, automated test suite, canary deploys
The principle is about speed of feedback, not skipping quality gates
Know when you've graduated from prototype to product — then add proper testing

Origin: "Commit regularly and test in production, no mocks!" (during rapid MVP sprint)

28. Changelog as First-Class Artifact

Every project gets a CHANGELOG.md. Updated with every sprint. The user should never have to ask "what changed?"

Reverse-chronological, grouped by version/sprint
Include Added/Changed/Security/Fixed sections
Link to relevant commits if helpful
Update it DURING the sprint, not after

Origin: "I need good changelogs to keep up with everything that's going on."

29. Emergency Stop (Not-Aus)

Every autonomous system needs a kill switch. One button, kills everything, no confirmation cascade.

Cancel all running jobs immediately
Pause the system (workers stop polling)
Log the event as critical
Resume button to unpause
Visible at all times, not buried in a menu

Origin: "And we need an emergency stop button ;)"
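This is a hypothetical in-process kill switch built on `threading.Event`. One call cancels every registered job, pauses polling, and logs at critical level. A second call resumes. The class and method names are illustrative.

```python
import threading

class EmergencyStop:
    """One button: cancel all running jobs, pause polling, log critically."""

    def __init__(self) -> None:
        self._stopped = threading.Event()
        self._running: dict[str, threading.Event] = {}  # job id -> cancel flag
        self._lock = threading.Lock()
        self.log: list[tuple[str, str]] = []

    def register(self, job_id: str) -> threading.Event:
        """Jobs get a cancel flag they must check between steps."""
        cancel = threading.Event()
        with self._lock:
            self._running[job_id] = cancel
        return cancel

    def should_poll(self) -> bool:
        """Workers call this before fetching new work."""
        return not self._stopped.is_set()

    def trigger(self) -> None:
        """No confirmation cascade: stop polling and cancel everything at once."""
        self._stopped.set()
        with self._lock:
            for cancel in self._running.values():
                cancel.set()
            count = len(self._running)
            self._running.clear()
        self.log.append(("CRITICAL", f"emergency stop: cancelled {count} job(s)"))

    def resume(self) -> None:
        self._stopped.clear()
        self.log.append(("INFO", "resumed"))

stop = EmergencyStop()
flag = stop.register("job-1")
stop.trigger()
print(flag.is_set(), stop.should_poll())  # True False
stop.resume()
print(stop.should_poll())                 # True
```

Cancellation here is cooperative: jobs must check their flag. Jobs running as subprocesses or containers would also need to be killed, for example via `subprocess.Popen.terminate()`.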

30. Self-Monitoring (Guardian Pattern)

The system monitors itself. A background watchdog checks health every N minutes and logs findings.

Check: stuck jobs, dead workers, error spikes, DB connectivity
Log structured findings to a queryable debug_log
Agent can read the logs to self-diagnose
Future: alert the user via push/webhook when degraded
Clean up old logs automatically

Origin: "We should have a guardian who checks every other minute what's going on."
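This sketch shows one guardian pass covering two of the listed checks: stuck jobs and dead workers. The `Job`/`Worker` shapes and the 30-minute and 5-minute thresholds are assumptions. Findings come back as structured dicts so they can be written to the debug log and queried later.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Job:
    id: str
    status: str
    updated_at: datetime

@dataclass
class Worker:
    name: str
    last_heartbeat: datetime

def guardian_check(
    jobs: list[Job],
    workers: list[Worker],
    now: datetime,
    stuck_after: timedelta = timedelta(minutes=30),
    dead_after: timedelta = timedelta(minutes=5),
) -> list[dict]:
    """One watchdog pass: return structured findings instead of printing them."""
    findings = []
    for job in jobs:
        if job.status == "running" and now - job.updated_at > stuck_after:
            findings.append({"level": "warn", "component": "jobs", "msg": f"job {job.id} stuck"})
    for w in workers:
        if now - w.last_heartbeat > dead_after:
            findings.append({"level": "error", "component": "workers", "msg": f"worker {w.name} silent"})
    return findings
```

A scheduler, or a simple loop with `time.sleep(120)`, would call this every other minute and append the findings to the debug log.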

31. Debug Logs as Agent Interface

Structured debug logs aren't just for humans — they're an API for the agent to understand system health.

Queryable by level, component, time range
Secret-safe (auto-redact tokens, keys, passwords)
Agent reads them between sprints to catch issues
Self-healing: agent detects error patterns and applies fixes

Origin: Built during dispatch development — agent reads /debug/logs to diagnose production issues.
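This is a minimal sketch of secret-safe logging. It redacts values that follow common secret-bearing keys or a `Bearer` scheme before a message is stored. The patterns are illustrative, not a complete secret detector, and would need extending for the token formats a system actually uses.

```python
import re

# Illustrative patterns: group 1 = key name, group 2 = separator, group 3 = secret value.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(token|api[_-]?key|password|secret)\b(\s*[=:]\s*)(\S+)"),
    re.compile(r"(?i)\b(Bearer)(\s+)([A-Za-z0-9._~+/=-]+)"),
]

def redact(message: str) -> str:
    """Replace secret values with a placeholder, keeping the key name for context."""
    for pattern in SECRET_PATTERNS:
        message = pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", message)
    return message

print(redact("login ok password=hunter2 Authorization: Bearer abc.def"))
```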

32. Multi-Layer Auth for Admin Endpoints

Regular API operations and admin/debug operations need different auth levels.

Regular token: job CRUD, worker operations
Admin token: debug logs, stats, worker management, emergency stop
Rate limiting: stricter on admin endpoints
Never share the same token for both levels

Origin: "I hope we have multi-level authentication behind that..."
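This sketch shows two-level token checks and assumes a hypothetical endpoint-to-scope table. Tokens are compared with `hmac.compare_digest` to avoid timing leaks. The function also refuses to run if both levels share one token.

```python
import hmac

# Hypothetical scopes: which token level each endpoint requires.
ENDPOINT_SCOPE = {
    "POST /jobs": "regular",
    "GET /jobs": "regular",
    "GET /debug/logs": "admin",
    "POST /emergency-stop": "admin",
}

def authorize(endpoint: str, presented: str, tokens: dict[str, str]) -> bool:
    """Accept only the token for the endpoint's own level; unknown endpoints are denied."""
    if hmac.compare_digest(tokens["regular"].encode(), tokens["admin"].encode()):
        raise ValueError("regular and admin tokens must not be the same")
    scope = ENDPOINT_SCOPE.get(endpoint)
    if scope is None:
        return False
    return hmac.compare_digest(presented.encode(), tokens[scope].encode())
```

Here the admin token deliberately does not unlock regular endpoints. Whether admin should be a superset is a design choice the principle leaves open. Stricter rate limiting on admin routes would sit in front of this check and isn't shown.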

33. Container-First Development

Use containers wherever possible — for isolation, reproducibility, and security.

Dev environments: devcontainer (one Dockerfile for all agents)
Agent execution: run Claude Code in sandboxed containers (claudine)
Worker jobs: execute in ephemeral containers, not on the host directly
Dispatch workers: should spin up containers per job (isolation, cleanup, no state leakage)
Testing: container-based test environments matching production
Production: containerized services (not bare-metal pip installs)

The goal is not containers for containers' sake — it's isolation + reproducibility + disposability. A crashed job shouldn't affect the host. A rogue agent shouldn't access other projects.

Origin: "We should also make a point of using containers sensibly wherever we can."

How to apply:

Dispatch Sprint 4+: Workers should optionally run jobs inside containers
claudine already does this for Claude Code sessions
Dev environment already uses .devcontainer/Dockerfile
Next step: containerized worker execution (docker/podman per job)

34. Git Worktrees for Agent Isolation

When multiple agents work on the same repo, use git worktrees instead of switching branches in a single checkout. Each agent gets its own working copy on its own branch without cloning, because all worktrees share one repository.

Agent writes to its own worktree — no merge conflicts during work
Main branch stays clean until merge
Agents can work in parallel on the same files
Worktree = disposable sandbox. Commit → merge → delete.
Cheaper than containers for code-only isolation (no image build, instant)

Combination with containers: Container for runtime isolation (process, network, filesystem). Worktree for code isolation (git history, no conflicts). Best of both:

Agent runs in container (sandboxed execution)
Container mounts a worktree (isolated code copy)
Agent commits to worktree branch
Team lead merges worktree branches → main

Origin: "Just like git worktree, which is probably also very clever."

Caveat: Worktree safety is critical — agents MUST commit before worktree deletion. See Principle #17 (Worktree Safety).
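This sketch shows the per-agent setup. It assumes the `git` CLI, a `main` base branch, and a hypothetical `agent/<name>` branch convention. `git worktree add -b <branch> <path> <base>` creates the branch and its working copy in one step.

```python
import subprocess
from pathlib import Path

def worktree_add_cmd(repo: Path, agent: str, base: str = "main") -> list[str]:
    """Build the git command giving `agent` its own branch and working copy next to the repo."""
    path = repo.parent / f"{repo.name}-{agent}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", f"agent/{agent}", str(path), base]

def create_agent_worktree(repo: Path, agent: str) -> Path:
    """Run the command and return the new worktree path (second-to-last argument)."""
    cmd = worktree_add_cmd(repo, agent)
    subprocess.run(cmd, check=True)
    return Path(cmd[-2])
```

One practical wrinkle arises when combining this with containers. A linked worktree's `.git` is a small file pointing back into the main repository's `.git/worktrees/` directory, by default via an absolute path. Git commands inside a container that mounts only the worktree will therefore fail unless the main repository's `.git` is mounted at the same path too.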

(inbox — unsorted ideas)

35. Recursive Agent Dispatch

Agents should be able to dispatch sub-jobs to other agents/workers. Not just top-down human→agent, but agent→agent.

Agent working on a task can submit sub-tasks via MCP/API
Sub-tasks run on different workers (different machines, different capabilities)
Parent agent monitors sub-task completion and integrates results
Enables: "fix this bug" → agent runs tests on server, checks docs on laptop, submits PR via Gitea

Key constraint: Prevent infinite recursion. Max dispatch depth (e.g., 3 levels). Cost budgets per job chain.

Origin: "What if the agents could even dispatch jobs themselves via MCP or API ;)"
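This sketch implements the recursion guard described in the key constraint. Every sub-job gets its parent's depth plus one and draws from one budget shared by the whole job chain. `Job`, `JobChain`, and `dispatch` are hypothetical names. The budget reserves estimated cost rather than reconciling actual spend.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 3  # max dispatch depth, e.g. 3 levels

@dataclass
class JobChain:
    """Budget shared by a root job and every sub-job it spawns."""
    budget_usd: float
    spent_usd: float = 0.0

@dataclass
class Job:
    task: str
    chain: JobChain
    depth: int = 0
    children: list["Job"] = field(default_factory=list)

    def dispatch(self, task: str, estimated_usd: float) -> "Job":
        """Submit a sub-task; refuse if it recurses too deep or overspends the chain."""
        if self.depth + 1 > MAX_DEPTH:
            raise RuntimeError(f"max dispatch depth {MAX_DEPTH} reached")
        if self.chain.spent_usd + estimated_usd > self.chain.budget_usd:
            raise RuntimeError("job chain budget exhausted")
        self.chain.spent_usd += estimated_usd
        child = Job(task, self.chain, self.depth + 1)
        self.children.append(child)
        return child

root = Job("fix this bug", JobChain(budget_usd=5.0))
tests = root.dispatch("run tests on server", 1.0)
```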

36. Ephemeral Execution Environments

Code execution should be disposable. Run in a fresh environment, extract results, throw away the environment.

Containers (Claudine): self-hosted, free, full control
Cloud microVMs (Vercel Sandbox): managed, instant snapshots, network firewall
The job shouldn't care WHERE it runs — same interface, different backends
Dispatch should abstract over execution backends: local worker, container, cloud sandbox

Origin: Comparing Claudine (self-hosted containers) with Vercel Sandbox (managed microVMs) — same concept, different trade-offs.
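This sketch expresses "same interface, different backends" with a `typing.Protocol`. `LocalBackend` and `DockerBackend` are illustrative. A cloud sandbox backend such as Vercel Sandbox would implement the same `run` method through its own SDK; that code is omitted here.

```python
import subprocess
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Result:
    exit_code: int
    stdout: str

class ExecutionBackend(Protocol):
    """All a job sees: run a command somewhere, get a result back."""
    def run(self, command: list[str]) -> Result: ...

class LocalBackend:
    def run(self, command: list[str]) -> Result:
        proc = subprocess.run(command, capture_output=True, text=True)
        return Result(proc.returncode, proc.stdout)

class DockerBackend:
    """Fresh container per job; `--rm` discards it when the command exits."""
    def __init__(self, image: str) -> None:
        self.image = image

    def run(self, command: list[str]) -> Result:
        proc = subprocess.run(
            ["docker", "run", "--rm", self.image, *command], capture_output=True, text=True
        )
        return Result(proc.returncode, proc.stdout)

def run_job(backend: ExecutionBackend, command: list[str]) -> Result:
    """Dispatch code depends only on the protocol, not on where the job runs."""
    return backend.run(command)
```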

(inbox — unsorted ideas)

Least-privilege agent access: Agents should SSH as a dedicated non-root user (e.g. deploy@) with scoped sudo for only what they need (systemctl, caddy reload). No root SSH long-term.
Immutable deploy artifacts: Agent builds a tarball/image, uploads it, runs a deploy script. Never edits files in-place on production.
Multi-node same-repo: Multiple workers (laptop + vserver) working on the same git repo but on different features. Merge conflicts → Gitea Actions hook or team-lead merge strategy. Start small: one worker per repo. Scale: worktrees per feature.
Prioritize small+working over ambitious+broken: "Better to start small and working, then do more." Collect ideas, implement incrementally, never ship broken.
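This sketch covers the build half of an immutable deploy. It packs the source into a versioned tarball, refuses to overwrite an existing version, and returns a SHA-256 that a deploy script can verify after upload. The paths and the `app-<version>` naming are assumptions. Upload and the deploy script itself are environment-specific and omitted.

```python
import hashlib
import tarfile
from pathlib import Path

def build_artifact(src_dir: Path, out_dir: Path, version: str) -> tuple[Path, str]:
    """Pack src_dir into a versioned tarball and return its path and SHA-256."""
    out_dir.mkdir(parents=True, exist_ok=True)
    artifact = out_dir / f"app-{version}.tar.gz"
    if artifact.exists():
        # Immutable: a version is built once; changes mean a new version.
        raise FileExistsError(f"{artifact} already exists; artifacts are never overwritten")
    with tarfile.open(artifact, "w:gz") as tar:
        tar.add(src_dir, arcname=f"app-{version}")
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
    return artifact, digest
```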

Drop new principles here. They get organized on next pass.
