Apr 25, 2026

Stop Telling Your Agents What To Do

Most people configure AI agents with prose. A README that says “don’t edit generated files.” A system prompt that says “always use ISO dates.” A CLAUDE.md that says “run tests before committing.”

This is the equivalent of writing “please wash your hands” on a bathroom wall and expecting 100% compliance.

Agents are forgetful. Not because they’re bad, but because that’s the nature of stateless inference over a context window. Every session starts fresh. Every prompt competes with a million other tokens for attention. Your carefully written rule in paragraph 47 of AGENTS.md? The agent skipped it. Not maliciously. It just didn’t weight it highly enough this time.

The fix isn’t writing better prose. The fix is making the prose unnecessary.

Desire paths

In urban planning, a desire path is the trail worn into grass where people actually walk, ignoring the paved sidewalk. Steve Yegge applied the term to agents: “desire paths clearly work. Just make your tool work the way agents want it to work.” Agents have desire paths too. They read files. They edit files. They run git commands. They commit. They push. That’s the natural flow, the grass trail.

Infrastructure that sits on a desire path is almost impossible to circumvent. A pre-commit hook runs every time the agent commits, and agents commit constantly. A PreToolUse hook fires every time it tries to edit a file, dozens of times per session. The agent doesn’t need to know the rule exists. It just walks into the wall.

Infrastructure that sits off the desire path is a different story. A script the agent has to decide to run, a validation command mentioned in the README: these are only as reliable as the agent’s memory. And we’ve established how reliable that is.

Here’s the enforcement ladder, from weakest to strongest:

- Level 0, prose instruction: the agent ignores a rule written in AGENTS.md and edits a computed file directly.
- Level 1, skill prompt: a skill injects the rule into context, guiding the agent correctly, but the instruction can get lost in longer sessions.
- Level 2, script: the agent uses a validation script, but only because it happened to know about it. A fresh session might not.
- Level 3, injected context: a SessionStart hook automatically injects available tools and guards, but the agent can still choose to edit directly.
- Level 4, hook on desire path: a PreToolUse hook blocks the edit before it happens. The agent self-corrects and investigates the source data instead.
- Level 5, schema at boundary: a Zod schema and pre-commit hook reject any inconsistent data at commit time. The agent doesn’t even attempt the edit.

Higher levels rely on structural constraints enforced by the harness rather than on prose instructions.
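Level 3 can be tiny. Here is a sketch of a SessionStart hook, assuming a harness like Claude Code that appends the hook’s stdout to the agent’s context at session start; the guard list itself is illustrative:

```shell
#!/bin/sh
# SessionStart hook sketch: whatever this prints on stdout is injected
# into the agent's context at the start of every session.
list_guards() {
  cat <<'EOF'
Available guards and tools in this repo:
- pnpm validate: schema-checks all computed data
- scripts/: generators for everything under data/ (never hand-edit data/)
EOF
}

list_guards
```

The agent never has to remember that the validation script exists; every session opens with it already in context.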

The critical distinction is between “available” and “unavoidable.” A script that exists is available. A pre-commit hook is unavoidable. A Zod schema that runs on pnpm validate is available. A Zod schema that runs inside the file-write hook is unavoidable.

Every rule you can move one rung up the ladder is a rule that fails less often. But more importantly: every rule you can move onto a desire path is a rule that enforces itself.

What this looks like in practice

I run a personal finance + life planning repo. It has a deterministic pipeline: raw bank CSVs → normalized transactions → enriched data → computed metrics → forecasts. The pipeline produces numbers. AI agents read those numbers and produce opinions.

The critical invariant: agents must never hand-edit computed outputs. If they do, the numbers stop being reproducible, can’t be tested, and the git history becomes meaningless.

Here’s how this rule evolved:

Version 1: prose (bad)

AGENTS.md, paragraph buried in section 2:

Agents DO NOT: Hand-edit files under data/metrics/, data/forecasts/, data/categorized/, data/transactions/, data/snapshots/, or anything else derived.

This worked maybe 80% of the time. One in five sessions, the agent would “helpfully” fix a typo in a metrics file, or append a row to a transaction CSV. Then I’d spend 20 minutes figuring out why pnpm rebuild produced different numbers than what was committed.

Version 2: hook (good)

A short bash script wired as a PreToolUse hook:
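Something like this sketch. The five protected directories come from the AGENTS.md rule above; the function names are mine, and the exit-code-2 convention follows Claude Code’s hook protocol, so adjust for your harness:

```shell
#!/usr/bin/env bash
# PreToolUse hook sketch: refuse edits to computed outputs.
# Exit code 2 tells the harness to block the tool call and
# show stderr to the agent.

# Return 0 when a path is a derived artifact that must not be hand-edited.
is_protected() {
  case "$1" in
    */data/metrics/*|*/data/forecasts/*|*/data/categorized/*|*/data/transactions/*|*/data/snapshots/*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Decide the hook's exit status for one attempted file path.
check_edit() {
  if is_protected "$1"; then
    echo "Blocked: $1 is a computed output. Write a generator in scripts/ instead." >&2
    return 2
  fi
  return 0
}
```

The real hook ends by pulling the path out of the JSON the harness pipes to stdin, e.g. `check_edit "$(jq -r '.tool_input.file_path // empty')"; exit $?`.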

Now the agent can’t edit metrics files. Not “shouldn’t,” can’t. The hook fires before the tool executes and returns exit code 2, which the harness interprets as a block. The agent gets an error message that tells it what to do instead (“write a generator in scripts/”).
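How the script gets onto the desire path depends on the harness. In Claude Code, for example, a PreToolUse hook is registered in `.claude/settings.json`; the script path here is hypothetical:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "$CLAUDE_PROJECT_DIR/hooks/protect-derived.sh" }
        ]
      }
    ]
  }
}
```

The matcher fires on every Edit or Write tool call, which is exactly the desire path we care about.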

Compliance went from ~80% to 100%. Not because the agent got smarter. Because the infrastructure got smarter.

More examples

Once you see the pattern, it’s everywhere.

Geoffrey Huntley puts it this way:

“What I’m coming to understand is software modularity is more important than ever before. As the agents are forgetful what ya gotta do is push stuff down as an infrastructure concern. Less the agent has to do the better outcomes.”

He’s talking about logging. I’m talking about everything. The principle is the same: if you can push a concern out of the agent’s responsibility and into the system’s structure, do it. Every time.

The boundary

The art is in drawing the line between what’s deterministic and what’s probabilistic. Hooks, schemas, type systems, scripts: these never forget, never drift, never get creative. The LLM reads outputs, spots patterns, proposes decisions, writes new code. It’s creative and fallible.

You want the agent to be as capable as possible, which means giving it freedom to be creative within the constraints the infrastructure enforces. Not “don’t edit metrics files or else.” Just a wall that says “this is not a door.”

Agents don’t get worse when you add constraints. They get better. They stop wasting tokens on things they shouldn’t be doing and focus on the things only they can do: reading the data, spotting what’s off, proposing what to do about it.

The checklist

Next time you write a rule in a README, CLAUDE.md, or system prompt, ask two questions:

1. Can this move up the ladder?

2. Is it on a desire path?

If it’s off the desire path, it’s a tool the agent can use. If it’s on the desire path, it’s a wall the agent can’t avoid. The second is always better.

A pnpm validate script is good. A pre-commit hook that runs pnpm validate is better. Same code, same schema, but now it’s unavoidable because committing is a desire path.
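A sketch of that hook, living in `.git/hooks/pre-commit`; the wrapper function is mine, and `pnpm validate` is assumed to be the repo’s validation script:

```shell
#!/bin/sh
# .git/hooks/pre-commit sketch: committing is a desire path, so a check
# run here is unavoidable. Any non-zero exit aborts the commit.

# Run the given validation command; translate failure into a clear message.
gate_commit() {
  if "$@"; then
    return 0
  fi
  echo "Commit blocked: validation failed. Regenerate computed files; do not hand-edit them." >&2
  return 1
}
```

The hook file ends with `gate_commit pnpm validate`; git aborts the commit whenever that exits non-zero.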

If the rule can climb a rung, or can move onto a desire path, move it. The prose version is a bug report: it means your infra has a gap.

Save prose for the things that are genuinely judgment calls. Those belong in prose because they’re fuzzy: no schema or hook can decide them for you. Everything else is infrastructure waiting to be born.