Apr 25, 2026

Stop Telling Your Agents What To Do

Most people configure AI agents with prose. A README that says “don’t edit generated files.” A system prompt that says “always use ISO dates.” A CLAUDE.md that says “run tests before committing.”

This is the equivalent of writing “please wash your hands” on a bathroom wall and expecting 100% compliance.

Agents are forgetful. Not because they’re bad, but because that’s the nature of stateless inference over a context window. Every session starts fresh. Every prompt competes with a million other tokens for attention. Your carefully written rule in paragraph 47 of AGENTS.md? The agent skipped it. Not maliciously. It just didn’t weight it highly enough this time.

The fix isn’t writing better prose. The fix is making the prose unnecessary.

Desire paths

In urban planning, a desire path is the trail worn into grass where people actually walk, ignoring the paved sidewalk. Steve Yegge applied the term to agents: “desire paths clearly work. Just make your tool work the way agents want it to work.” Agents have desire paths too. They read files. They edit files. They run git commands. They commit. They push. That’s the natural flow, the grass trail.

Infrastructure that sits on a desire path is almost impossible to circumvent. A pre-commit hook runs every time the agent commits, and agents commit constantly. A PreToolUse hook fires every time it tries to edit a file, dozens of times per session. The agent doesn’t need to know the rule exists. It just walks into the wall.

Infrastructure that sits off the desire path is a different story. A script the agent has to decide to run, a validation command mentioned in the README: these are only as reliable as the agent’s memory. And we’ve established how reliable that is.

Here’s the enforcement ladder, from weakest to strongest:

- Level 0, prose instruction: the agent ignores a rule written in AGENTS.md and edits a computed file directly.
- Level 1, skill prompt: a skill injects the rule into context, guiding the agent correctly, but the instruction can get lost in longer sessions.
- Level 2, script: the agent uses a validation script, but only because it happened to know about it. A fresh session might not.
- Level 3, injected context: a SessionStart hook automatically injects available tools and guards, but the agent can still choose to edit directly.
- Level 4, hook on desire path: a PreToolUse hook blocks the edit before it happens. The agent self-corrects and investigates the source data instead.
- Level 5, schema at boundary: a Zod schema and pre-commit hook reject any inconsistent data at commit time. The agent doesn’t even attempt the edit.

Higher levels rely on structural constraints enforced by the harness rather than on prose instructions.
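Level 3 can be tiny. Here is a sketch of a SessionStart hook, assuming a harness like Claude Code that appends the hook’s stdout to the agent’s context at session start; the guard list itself is illustrative:

```shell
#!/bin/sh
# SessionStart hook sketch: whatever this prints on stdout is injected
# into the agent's context at the start of every session.
list_guards() {
  cat <<'EOF'
Available guards and tools in this repo:
- pnpm validate: schema-checks all computed data
- scripts/: generators for everything under data/ (never hand-edit data/)
EOF
}

list_guards
```

The agent never has to remember that the validation script exists; every session opens with it already in context.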

The critical distinction is between “available” and “unavoidable.” A script that exists is available. A pre-commit hook is unavoidable. A Zod schema that runs on pnpm validate is available. A Zod schema that runs inside the file-write hook is unavoidable.

Every rule you can move one rung up the ladder is a rule that fails less often. But more importantly: every rule you can move onto a desire path is a rule that enforces itself.

What this looks like in practice

I run a personal finance + life planning repo. It has a deterministic pipeline: raw bank CSVs → normalized transactions → enriched data → computed metrics → forecasts. The pipeline produces numbers. AI agents read those numbers and produce opinions.

The critical invariant: agents must never hand-edit computed outputs. If they do, the numbers stop being reproducible, can’t be tested, and the git history becomes meaningless.

Here’s how this rule evolved:

Version 1: prose (bad)

AGENTS.md, paragraph buried in section 2:

Agents DO NOT: Hand-edit files under data/metrics/, data/forecasts/, data/categorized/, data/transactions/, data/snapshots/, or anything else derived.

This worked maybe 80% of the time. One in five sessions, the agent would “helpfully” fix a typo in a metrics file, or append a row to a transaction CSV. Then I’d spend 20 minutes figuring out why pnpm rebuild produced different numbers than what was committed.

Version 2: hook (good)

A short bash script wired as a PreToolUse hook:
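Something like this sketch. The five protected directories come from the AGENTS.md rule above; the function names are mine, and the exit-code-2 convention follows Claude Code’s hook protocol, so adjust for your harness:

```shell
#!/usr/bin/env bash
# PreToolUse hook sketch: refuse edits to computed outputs.
# Exit code 2 tells the harness to block the tool call and
# show stderr to the agent.

# Return 0 when a path is a derived artifact that must not be hand-edited.
is_protected() {
  case "$1" in
    */data/metrics/*|*/data/forecasts/*|*/data/categorized/*|*/data/transactions/*|*/data/snapshots/*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Decide the hook's exit status for one attempted file path.
check_edit() {
  if is_protected "$1"; then
    echo "Blocked: $1 is a computed output. Write a generator in scripts/ instead." >&2
    return 2
  fi
  return 0
}
```

The real hook ends by pulling the path out of the JSON the harness pipes to stdin, e.g. `check_edit "$(jq -r '.tool_input.file_path // empty')"; exit $?`.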

Now the agent can’t edit metrics files. Not “shouldn’t,” can’t. The hook fires before the tool executes and returns exit code 2, which the harness interprets as a block. The agent gets an error message that tells it what to do instead (“write a generator in scripts/”).
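How the script gets onto the desire path depends on the harness. In Claude Code, for example, a PreToolUse hook is registered in `.claude/settings.json`; the script path here is hypothetical:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "$CLAUDE_PROJECT_DIR/hooks/protect-derived.sh" }
        ]
      }
    ]
  }
}
```

The matcher fires on every Edit or Write tool call, which is exactly the desire path we care about.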

Compliance went from ~80% to 100%. Not because the agent got smarter. Because the infrastructure got smarter.

More examples

Once you see the pattern, it’s everywhere.

Geoffrey Huntley puts it this way:

“What I’m coming to understand is software modularity is more important than ever before. As the agents are forgetful what ya gotta do is push stuff down as an infrastructure concern. Less the agent has to do the better outcomes.”

He’s talking about logging. I’m talking about everything. The principle is the same: if you can push a concern out of the agent’s responsibility and into the system’s structure, do it. Every time.

The boundary

The art is in drawing the line between what’s deterministic and what’s probabilistic. Hooks, schemas, type systems, scripts: these never forget, never drift, never get creative. The LLM reads outputs, spots patterns, proposes decisions, writes new code. It’s creative and fallible.

You want the agent to be as capable as possible, which means giving it freedom to be creative within the constraints the infrastructure enforces. Not “don’t edit metrics files or else.” Just a wall that says “this is not a door.”

Agents don’t get worse when you add constraints. They get better. They stop wasting tokens on things they shouldn’t be doing and focus on the things only they can do: reading the data, spotting what’s off, proposing what to do about it.

The checklist

Next time you write a rule in a README, CLAUDE.md, or system prompt, ask two questions:

1. Can this move up the ladder?

2. Is it on a desire path?

If it’s off the desire path, it’s a tool the agent can use. If it’s on the desire path, it’s a wall the agent can’t avoid. The second is always better.

A pnpm validate script is good. A pre-commit hook that runs pnpm validate is better. Same code, same schema, but now it’s unavoidable because committing is a desire path.
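A sketch of that hook, living in `.git/hooks/pre-commit`; the wrapper function is mine, and `pnpm validate` is assumed to be the repo’s validation script:

```shell
#!/bin/sh
# .git/hooks/pre-commit sketch: committing is a desire path, so a check
# run here is unavoidable. Any non-zero exit aborts the commit.

# Run the given validation command; translate failure into a clear message.
gate_commit() {
  if "$@"; then
    return 0
  fi
  echo "Commit blocked: validation failed. Regenerate computed files; do not hand-edit them." >&2
  return 1
}
```

The hook file ends with `gate_commit pnpm validate`; git aborts the commit whenever that exits non-zero.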

If the rule can climb a rung, or can move onto a desire path, move it. The prose version is a bug report: it means your infra has a gap.

Save prose for the things that are genuinely judgment calls. Those belong in prose because they’re fuzzy: no schema or hook can decide them for you. Everything else is infrastructure waiting to be born.