# How AGENTS.md works (and why most AI coding agents need one)

AGENTS.md is the file every agent in VibeKit reads on every turn. It's not a system prompt — it's a per-app contract that scopes the agent's behavior, encodes architectural decisions, and survives context resets. Here's why it matters and how it's built.

## The problem AGENTS.md is solving

If you've used Claude Code, Cursor agent mode, or Devin for more than an hour, you've noticed a pattern: the agent is great at one turn and forgets everything before it. It re-reads the same files. It re-asks the same architectural questions. It "decides" to use kebab-case routes, you correct it, and it follows your convention for the rest of the session. Tomorrow you're back at square one.

The naive fix is "use a longer context window." That doesn't actually work. Even with a million-token window, every fresh conversation starts with no awareness of past decisions. The agent is technically capable of reading prior chats; it just doesn't, because nothing in its prompt tells it to.

The fix is to externalize the things you want remembered into a file the agent reads every turn. In VibeKit that file is AGENTS.md, and it lives in every app's repo at the root.

This post is about what's actually in that file, why each piece is there, and the small number of architectural decisions that make persistent agent memory work in production.

## What AGENTS.md is, concretely

It's a Markdown file. Around 5–8KB. It sits at the root of every VibeKit-hosted app's GitHub repo (we wire up the GitHub integration automatically on first deploy), alongside package.json and README.md. The agent's instructions tell it to read this file on every turn before doing anything substantive.

Here's an excerpt from one — the actual template that ships with every new VibeKit app, generated at provision time by renderAgentsMd (src/services/agent-templates.ts:87):

# my-dashboard — Agent

You are the AI agent for **my-dashboard** (live at https://my-dashboard.vibekit.bot).

The product is called "my-dashboard". Refer to it that way in conversation. The
GitHub repo (user/my-dashboard) is just where the code is stored — its name may
differ from the product name (renames don't propagate to GitHub). When in doubt
about what to call this project, use "my-dashboard".

Port: 4001 | Container: vk-my-dashboard

## Setup (every session)
```bash
source .vibekit-env   # loads VIBEKIT_API_URL, VIBEKIT_API_KEY,
                      # VIBEKIT_SUBDOMAIN, VIBEKIT_APP_ID
source .env           # app env vars
```

The full file is closer to 200 lines and covers four things:

1. **Identity** — what to call the app, what the live URL is, what container it runs in
2. **Defaults** — when to use tools vs. just reply, what to do after deploying, when to commit
3. **Mechanics** — how to deploy, how the container runs, what API endpoints exist
4. **Anti-patterns** — common mistakes that look reasonable but break the system
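As a rough illustration of the Identity bucket, here is a minimal sketch of how a few of those lines could be rendered at provision time. It is not the real `renderAgentsMd`: the `AppIdentity` type, the `renderIdentitySection` name, and the URL and container naming rules are assumptions inferred from the single excerpt above.

```typescript
// Illustrative sketch only. This is NOT the real renderAgentsMd; the field
// names and the URL/container naming rules are assumptions inferred from the
// one excerpt shown in this post.
interface AppIdentity {
  name: string;      // product name the agent should use in conversation
  subdomain: string; // assumed to map to https://<subdomain>.vibekit.bot
  port: number;      // port the app container listens on
}

function renderIdentitySection(app: AppIdentity): string {
  return [
    `You are the AI agent for **${app.name}** (live at https://${app.subdomain}.vibekit.bot).`,
    "",
    // Assumption: the container name is "vk-" plus the subdomain.
    `Port: ${app.port} | Container: vk-${app.subdomain}`,
    "",
  ].join("\n");
}

const identityExample = renderIdentitySection({
  name: "my-dashboard",
  subdomain: "my-dashboard",
  port: 4001,
});
```

Keeping the renderer a pure function of the app record is what would make it cheap to regenerate the file whenever the platform's view of the app changes.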

That fourth bucket is the most underrated. Most "system prompt" content focuses on what to do; AGENTS.md spends roughly equal real estate on what *not* to do, with explanations. Examples I've ended up writing the hard way:

- Sandbox failures are not permission bugs. chmod, chown, sudo, docker, ssh, aws, systemctl all fail by design — that's expected. Never tell the user a workspace file is "not writable" because chmod returned an error.
- React/Vite without a build step → container can't serve it
- Listening on localhost instead of 0.0.0.0 → not reachable
- Treating a chmod/chown/sudo failure as evidence a file is unwritable — those commands are sandbox-blocked by design; the file is yours to edit.


Each of those bullets exists because an agent run failed on it once. In the sandbox case, the user asked "why did the agent give up?", and the answer was "it interpreted a sandbox-by-design block as a real permission problem." Without AGENTS.md, that lesson dies with the conversation. With it, the lesson generalizes: every agent on every app starts from the corrected baseline.
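The sandbox rule can also be read as a small decision procedure. The sketch below is an assumption about how one might encode it, not something VibeKit ships: the command list comes from the bullet above, while `classifyCommandFailure` and the `FailureKind` labels are hypothetical names.

```typescript
// Commands the post lists as blocked by design inside the sandbox.
const SANDBOX_BLOCKED_COMMANDS = new Set([
  "chmod", "chown", "sudo", "docker", "ssh", "aws", "systemctl",
]);

// Hypothetical labels for the two interpretations of a failed command.
type FailureKind = "sandbox-by-design" | "real-failure";

// Looks only at the program name (first word of the command line).
// A blocked program failing says nothing about whether a file is writable.
function classifyCommandFailure(commandLine: string): FailureKind {
  const program = commandLine.trim().split(/\s+/)[0] ?? "";
  return SANDBOX_BLOCKED_COMMANDS.has(program)
    ? "sandbox-by-design"
    : "real-failure";
}
```

For example, a failed `chmod +x deploy.sh` would be classified as `"sandbox-by-design"`, so the agent should go ahead and edit the file instead of reporting it as unwritable.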

## Why this isn't just a system prompt

The instinct most engineers have when they hear "file the agent reads every turn" is: "isn't that just a system prompt?" There's overlap, but they're not the same thing — and the differences are what make AGENTS.md work as a persistence mechanism.

A system prompt:
- Lives in your application code or your model-provider config
- Is the same for every user of the application
- Changes only when you redeploy
- Is not visible to the user
- Is not editable by the agent itself

AGENTS.md:
- Lives in the user's GitHub repo, owned by them
- Is per-app, customized at provision time with the app's name, subdomain, port, container ID
- Changes whenever the user (or the agent) commits to it
- Is fully visible — it's a regular file on disk
- Is editable by the agent, which is explicitly instructed to write what it learns back to the sibling memory file ("Update MEMORY.md with important findings, decisions, and lessons learned")

The key property is the last one. AGENTS.md is **not just a contract you push down to the agent — it's a notebook the agent writes back to.** When an agent solves a tricky bug, the AGENTS.md instructions tell it to log the lesson. The next agent run, days later, reads that lesson and doesn't repeat the bug.

For this to work the file needs to be:

1. **Checked into git** — survives container restarts, redeploys, even cancellation
2. **Per-app, not global** — different apps have different conventions; mixing them poisons context
3. **Bounded in size** — too large and you blow the model's context window every turn

Number 3 is the boring constraint that ends up driving design. The current template is around 5–6KB, which sits right at the limit: the agent gateway applies a hard cap of **5,000 characters** when injecting AGENTS.md into per-turn context, and anything over that threshold is truncated with a marker. You'll see the warning in the gateway logs: `workspace bootstrap file AGENTS.md is 9726 chars (limit 5000); truncating in injected context`. So the agent is also instructed to put longer-form notes into `MEMORY.md` (a sibling file the agent owns and writes to freely) rather than letting AGENTS.md grow unbounded.
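To make the cap concrete, here is a minimal sketch of what a truncation step like this could look like. The 5,000-character limit and the warning format are taken from the log line quoted above; the function name, the marker text, and the choice not to count the marker against the limit are assumptions, not the gateway's actual behavior.

```typescript
const BOOTSTRAP_CHAR_LIMIT = 5000;

// Hypothetical marker text; the real gateway's marker is not shown in this post.
const TRUNCATION_MARKER = "\n[...truncated...]";

interface InjectedFile {
  content: string;  // what actually lands in per-turn context
  warning?: string; // log line, present only when truncation happened
}

// Note: string length here counts UTF-16 code units, which only approximates
// "characters" for text outside the Basic Multilingual Plane.
function prepareBootstrapFile(
  name: string,
  raw: string,
  limit: number = BOOTSTRAP_CHAR_LIMIT,
): InjectedFile {
  if (raw.length <= limit) {
    return { content: raw };
  }
  return {
    // Assumption: the marker is appended after the cut and not counted in the limit.
    content: raw.slice(0, limit) + TRUNCATION_MARKER,
    warning: `workspace bootstrap file ${name} is ${raw.length} chars (limit ${limit}); truncating in injected context`,
  };
}
```

The important property is that truncation is silent from the agent's point of view: whatever sits past the cut simply never reaches the model, which is why the most important rules belong near the top of the file.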

## The three-file pattern: AGENTS.md, MEMORY.md, STATUS.md

In production, persistent memory in VibeKit is actually three files plus a directory of daily logs, not one file. Each has a different access pattern:

| File | Owner | Read frequency | Write frequency | Size budget |
|---|---|---|---|---|
| `AGENTS.md` | Platform (regenerated on rename) | Every turn | Rarely | ~5KB |
| `MEMORY.md` | Agent | When user asks for "real work" | When agent learns something | Unbounded, agent-managed |
| `STATUS.md` | Platform (live updates) | When user asks about app health | Continuously, by infra | ~2KB |
| `memory/YYYY-MM-DD.md` | Agent | On real work (today's file only) | Whenever something happens | One file per day |

This separation matters because the read pattern is different per file. AGENTS.md is read by *every* agent run — even trivial replies — so it has to be small and stable. MEMORY.md is only read on real work tasks, so it can be much larger. STATUS.md is only relevant when the agent is investigating live behavior. Daily logs in `memory/` are mostly write-only — they exist so that "what did I do last Tuesday" is answerable.

If you tried to do this with one big file, you'd hit one of two failure modes:

- **All in AGENTS.md**: the file balloons past the context budget. Agent now spends its entire context on memory, has none left for the actual task.
- **All in MEMORY.md**: the agent never reads it on trivial turns (greetings, status checks), so identity context evaporates.

Splitting by access pattern is what makes the system stable. The reason most "agent memory" implementations don't work is they pick one file and try to make it serve both jobs.
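The split can be summarized as a function from turn type to file list. This is a simplified sketch, not VibeKit code: `TurnKind`, `dailyLogPath`, and `filesForTurn` are hypothetical names, the daily log date is assumed to be UTC, and STATUS.md is lumped in with every work turn even though the table scopes it more narrowly to health questions.

```typescript
type TurnKind = "trivial" | "work";

// Builds the daily log path, e.g. memory/2025-01-07.md.
// Assumption: the date is taken in UTC.
function dailyLogPath(day: Date): string {
  return `memory/${day.toISOString().slice(0, 10)}.md`;
}

// AGENTS.md is part of every turn; the larger files are only pulled in
// when the user asks for real work.
function filesForTurn(kind: TurnKind, today: Date): string[] {
  const always = ["AGENTS.md"];
  if (kind === "trivial") {
    return always;
  }
  return [...always, "STATUS.md", "MEMORY.md", dailyLogPath(today)];
}
```

A greeting costs one small file; a debugging request costs four. That asymmetry is the whole point of splitting by access pattern.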

## How the agent is told to read it

The hard part of "agent reads file every turn" isn't the file. It's getting the agent to actually follow the instruction to read it without being asked. This is brittle: most LLMs will ignore instructions to do work they don't see immediate value in.

What actually works in the AGENTS.md template:

````markdown
## Setup (every session)
```bash
source .vibekit-env   # loads VIBEKIT_API_URL, VIBEKIT_API_KEY, ...
source .env           # app env vars
```
When the user asks you to do real work, also read: STATUS.md (live
health/logs), MEMORY.md (your accumulated knowledge), memory/YYYY-MM-DD.md
(today's log). For greetings or trivial messages, skip — answer from context.
````

Two things about this:

1. **The setup is concrete, not aspirational.** "Read AGENTS.md" doesn't work, because the agent is already in a context that includes AGENTS.md. The instruction is to source `.vibekit-env`, which has a verifiable side effect (the env vars are now defined). Concrete actions stick; abstract reminders don't.
2. **The "skip for trivial messages" carve-out is load-bearing.** Without it, the agent reads MEMORY.md on every "hi" and burns 30+ seconds (and tokens) on a non-task. The carve-out is what keeps voice-call latency bearable: a "yes I'm here" reply doesn't have to traverse the whole memory layer.

Empirically, the carve-out cuts tool-call counts on trivial turns by close to an order of magnitude. Without it, the agent will dutifully run git status, cat package.json, and read STATUS.md before answering "hey are you there" — every single time. A "yes I'm here" reply that takes 8 seconds because it's traversing the whole memory layer feels broken to a user, even though the agent is technically doing what its instructions told it to.

## The "be resourceful, check the file" failure mode

There's a specific anti-pattern AGENTS.md is designed to prevent, and it shows up across nearly every agent platform: the default-to-resourceful-investigation loop.

The agent gets a prompt like "make the hero copy shorter." The reasonable instinct, and the instinct most agent system prompts encourage, is to:

1. Read the current homepage HTML
2. Read the deployed version to confirm what's live
3. Check git log for recent changes
4. Look at the README to understand the project
5. Then edit the hero

That's four tool calls before the edit even starts. For "shorter hero copy", a 30-token text edit, you've burned 50,000 input tokens just gathering context, plus half a minute of latency and a few cents of API spend.

AGENTS.md addresses this directly:

- Default to no tool calls. Aim for ≤3 tool calls per turn. Only exceed when
  the user explicitly asks you to fix, build, deploy, debug, or investigate —
  and even then, prefer one targeted read over a sweep.
- The "be resourceful, check the file" rule applies when the user asks for
  code work. It does NOT mean read public/index.html 60 times before saying hello.

The framing matters: it's not "be lazy" — it's "be appropriately resourced for the task at hand." Most coding agents are tuned by their providers to look thorough, because thoroughness reads as competence in benchmarks. In production, with a real user waiting and real tokens metered, thoroughness without judgment is just expensive.
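In VibeKit this rule lives in the prompt, so the agent enforces it on itself. If you wanted to enforce the same budget in code, a minimal sketch might look like the following. The limit of 3 and the list of verbs come from the excerpt above; `ToolBudget`, `userAskedForWork`, and `budgetFor` are hypothetical names, and matching on exact words is a deliberate simplification.

```typescript
const DEFAULT_TOOL_BUDGET = 3;

// Verbs from the template that justify going past the default budget.
const WORK_VERBS = ["fix", "build", "deploy", "debug", "investigate"];

// Exact-word match only: "fix" matches, "fixed" does not.
function userAskedForWork(message: string): boolean {
  const words = message.toLowerCase().split(/[^a-z]+/);
  return WORK_VERBS.some((verb) => words.includes(verb));
}

class ToolBudget {
  private used = 0;
  private readonly softLimit: number;
  private readonly mayExceed: boolean;

  constructor(softLimit: number, mayExceed: boolean) {
    this.softLimit = softLimit;
    this.mayExceed = mayExceed;
  }

  // Records one tool call and returns true if it fits the budget.
  // Returns false, without recording, once the soft limit is spent and
  // the turn is not an explicit work request.
  tryUse(): boolean {
    if (this.used >= this.softLimit && !this.mayExceed) {
      return false;
    }
    this.used += 1;
    return true;
  }
}

function budgetFor(message: string): ToolBudget {
  return new ToolBudget(DEFAULT_TOOL_BUDGET, userAskedForWork(message));
}
```

With this sketch, "make the hero copy shorter" gets three tool calls and is refused a fourth, while "please debug the deploy" is allowed to go past the soft limit.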

## What goes in AGENTS.md vs MEMORY.md vs the system prompt

This is the question I get most often when I describe the architecture: where does each kind of context belong? Some heuristics that have held up:

Goes in the system prompt (gateway-level, identical for every agent run on every app):

- General coding agent personality and behavior
- Tool-use formatting (how to invoke Bash, Edit, Write)
- Output structure expectations (markdown, plain text, JSON)
- Universal safety rules ("never expose API keys")

Goes in AGENTS.md (per-app, identical across turns within an app):

- App identity (name, URL, subdomain, port)
- App-specific deploy mechanics (the deploy-workspace endpoint, status flow)
- Constraints unique to this hosting environment (256MB memory, sandbox limits)
- Anti-patterns that have bitten this platform's users specifically

Goes in MEMORY.md (per-app, agent-edited, read on demand):

- Architectural decisions specific to this app ("we use Drizzle, not Prisma")
- Bug fixes that should not be re-introduced
- User preferences this user has stated ("commit messages should be lowercase")
- In-progress work spanning multiple sessions

Goes in conversation history (per-conversation):

- The actual back-and-forth of the current task

The line between AGENTS.md and MEMORY.md is the most subtle. The principle: AGENTS.md is what the platform knows; MEMORY.md is what the agent and user have learned. AGENTS.md is regenerated whenever the platform redefines what an app is (rename, port change, container migration). MEMORY.md is only edited by the agent itself, and the platform never overwrites it.

When a user renames an app, the platform regenerates AGENTS.md to reflect the new name and subdomain — see the rename handler in src/routes/hosting.ts. The handler explicitly only writes AGENTS.md; MEMORY.md isn't touched because the architectural decisions captured there are still valid. This split is what makes it safe for the platform to mutate AGENTS.md without losing the agent's accumulated knowledge.
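The ownership split is easiest to see in code. The sketch below is not the handler in src/routes/hosting.ts; it models the workspace as an in-memory map, and `Workspace`, `renderAgentsMdStub`, and `handleRenameSketch` are hypothetical names chosen for illustration.

```typescript
// Workspace modeled as path -> file contents (an in-memory stand-in for the repo).
type Workspace = Map<string, string>;

// Stand-in for the real template renderer; it produces only the identity line.
function renderAgentsMdStub(name: string, subdomain: string): string {
  return `You are the AI agent for **${name}** (live at https://${subdomain}.vibekit.bot).\n`;
}

// Regenerates AGENTS.md for the new identity and returns a new workspace.
// Every other file, including MEMORY.md, is carried over unchanged.
function handleRenameSketch(
  workspace: Workspace,
  newName: string,
  newSubdomain: string,
): Workspace {
  const next = new Map(workspace);
  next.set("AGENTS.md", renderAgentsMdStub(newName, newSubdomain));
  return next;
}
```

Because the platform only ever writes the one file it owns, a rename can be repeated any number of times without risk to anything the agent has recorded.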

## What I'd build differently if starting over

Three things I'd revisit:

1. The 5KB cap is arbitrary. The original number came from a back-of-envelope estimate of how much context to spend on per-turn boilerplate. In hindsight some apps need 8KB; some only 2KB. A smarter cap would be percentage-of-context-budget, not absolute bytes.

2. AGENTS.md isn't versioned. When I update the template, all newly-provisioned apps get the new version, but existing apps stay on whatever version they were created with — until the next rename triggers regeneration. There's a regenerateAgentsMd batch operation that fixes this, but running it is opt-in. A cleaner design would auto-update on platform updates with a migration log so users can see what changed.

3. The "trivial vs work" judgment is in the agent's hands. Right now the agent decides whether a turn is a greeting (no tools) or a work request (read MEMORY.md). When it gets that judgment wrong — interpreting a real request as a greeting — the user sees a confidently wrong answer with no investigation. A platform-level classifier ("is this a real task?") that ran before the agent saw the message would give better worst-case behavior.
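For the third item, a platform-level classifier does not have to be sophisticated to improve the worst case. The following is a minimal sketch under stated assumptions: `classifyTurn` and the greeting pattern are hypothetical, and the design choice is to treat anything that is not an obvious greeting as work, so mistakes cost extra tool calls instead of producing a confidently wrong answer.

```typescript
type TurnClass = "trivial" | "work";

// Hypothetical allowlist of whole-message greetings and acknowledgements.
// The message must consist of nothing but one of these phrases plus
// trailing whitespace or punctuation.
const GREETING_PATTERN =
  /^(hi|hey|hello|thanks|thank you|are you there|hey are you there)[\s!?.]*$/i;

// Defaults to "work" whenever the message is not clearly a greeting.
function classifyTurn(message: string): TurnClass {
  return GREETING_PATTERN.test(message.trim()) ? "trivial" : "work";
}
```

A message like "hi, the deploy is failing" falls through to `"work"` because it contains more than the greeting itself, which is exactly the case where an agent-side judgment call is most likely to go wrong.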

Each of these is a real tradeoff, not a clear win. The current design ships and works for the vast majority of turns. But if you're building something similar, those are the spots I'd push on first.
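For the first item, a percentage-based cap is simple arithmetic. This sketch assumes roughly 4 characters per token, which is a common rule of thumb and not a real tokenizer; `bootstrapCharCap` and its parameters are hypothetical.

```typescript
// Rough heuristic for English text; an actual tokenizer would give different numbers.
const ASSUMED_CHARS_PER_TOKEN = 4;

// Returns the character cap for AGENTS.md given the model's context window
// (in tokens) and the share of it reserved for per-turn boilerplate (in percent).
function bootstrapCharCap(contextWindowTokens: number, percentOfContext: number): number {
  if (percentOfContext <= 0 || percentOfContext > 100) {
    throw new RangeError("percentOfContext must be in (0, 100]");
  }
  return Math.floor(
    (contextWindowTokens * percentOfContext * ASSUMED_CHARS_PER_TOKEN) / 100,
  );
}
```

Under these assumptions, reserving 1% of a 200,000-token window gives a cap of 8,000 characters, while the same 1% of a 32,000-token window gives 1,280. The cap then scales with the model instead of being fixed per platform.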

## Closing

The takeaway isn't "every agent platform should have an AGENTS.md file." It's that persistent memory in agents is an architecture problem, not a model capability. You can't fix it by upgrading to a larger context window. You can't fix it by giving the model better instructions in the system prompt. The thing that fixes it is structuring memory across multiple files with different access patterns, and giving both the platform and the agent ownership over different parts of that memory graph.

If your agent forgets things between sessions, the bug isn't in the model — it's in the absence of a place for the model to remember things to.

If you want to see how this works in production, VibeKit gives every app you build its own agent with this architecture. Free with BYOK (bring your own Claude or OpenAI key) — your AI charges go straight to the provider. The persistent agent page has the architecture summary; this post is the longer-form why.

Coming soon: a follow-up on MEMORY.md as an editable knowledge base — what works, what fails, and how the agent decides when something is worth writing down. Subscribe via @609.sol on X to catch it.