What persistent agents actually remember — and what they shouldn't

Every AI coding tool now claims 'persistent memory.' Most of them just have a longer chat window. Real agent persistence is a layered system: durable files, an indexed workspace, agent-side notes, and the AI provider's own context. Here's what we put where, and the few things we deliberately let the agent forget.

"Persistent" is doing too much work

Read the marketing page for almost any AI coding tool released in the past year and you'll find "persistent agent" in the headline. What it actually means varies wildly:

Chat history persistence. The conversation thread is saved. The agent re-reads the last N messages on each call. Most tools mean this.
Workspace persistence. The code files survive across sessions. Less common than you'd think — some tools wipe the filesystem each session.
Agent-side note persistence. The agent itself can leave notes for its future self, scoped to the project. Few tools have this.
Cross-device persistence. The same agent state is available whether you're on web, iOS, or the Telegram bot. Almost no tool has this.

VibeKit's agent has all four. They're not the same thing, they live in different storage backends, and they fail in different ways. Calling all of them "memory" is what makes the category confusing.

The four layers, what's in each

Here's the actual taxonomy I've ended up with after running this in production for a few months.

1. Chat history (durable, in Postgres)

Every user message, every agent response, every system event tied to an app gets a row in agent_messages. Indexed by app_id, ordered by created_at. This is the "did you say this two weeks ago" memory.

The user sees this in the chat UI on every device — open the iOS app, scroll up, the conversation from your laptop is there. Open the Telegram bot for the same app, same history. The agent reads the last ~50 messages on every turn to keep context.
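Here is a minimal sketch of that table and the replay query. It uses sqlite3 so it runs standalone; in production this lives in Postgres, where the SQL is nearly identical but the placeholder style depends on the driver. Only agent_messages, app_id, and created_at come from the description above. The id, role, and content columns and the recent_messages helper are assumptions.

```python
import sqlite3

# Hypothetical schema: only the table name, app_id, and created_at come from the post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_messages (
    id         INTEGER PRIMARY KEY,
    app_id     TEXT    NOT NULL,
    role       TEXT    NOT NULL,  -- 'user', 'agent', or 'system'
    content    TEXT    NOT NULL,
    created_at INTEGER NOT NULL   -- epoch seconds
);
CREATE INDEX IF NOT EXISTS idx_agent_messages_app_time
    ON agent_messages (app_id, created_at);
"""


def recent_messages(conn: sqlite3.Connection, app_id: str, limit: int = 50) -> list[tuple[str, str]]:
    """Return the newest `limit` (role, content) rows for one app, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM agent_messages "
        "WHERE app_id = ? ORDER BY created_at DESC, id DESC LIMIT ?",
        (app_id, limit),
    ).fetchall()
    # Newest-first lets LIMIT keep the tail; reverse so the agent replays in order.
    return list(reversed(rows))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for i in range(120):
        conn.execute(
            "INSERT INTO agent_messages (app_id, role, content, created_at) "
            "VALUES (?, ?, ?, ?)",
            ("app-1", "user" if i % 2 == 0 else "agent", f"message {i}", 1_700_000_000 + i),
        )
    window = recent_messages(conn, "app-1")
    print(len(window), window[0][1], window[-1][1])  # 50 message 70 message 119
```

Querying newest-first and reversing in code lets LIMIT keep the tail of the conversation while the agent still reads it in chronological order. The composite (app_id, created_at) index is what keeps that query cheap as a thread grows.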

What this is good for: continuity, "as I mentioned earlier", referring back to user-stated preferences.

What this is not: semantic search over your entire codebase. Chat history is linear: it's indexed by app and time, not by content.

2. The workspace (durable, on EFS)

Each app's code lives in /mnt/efs/workspaces/app-<id>/, a real filesystem the agent reads, writes, and runs builds against. The container bind-mounts that directory, the agent's editor tools operate on it directly, and git runs against it.

This is "the codebase" in the literal sense. Files persist forever (until the user deletes the app). The agent can cat, grep, and ls like a normal Unix process.

What this is good for: "what did this file say yesterday?", line-level edits, multi-file refactors with awareness of the whole tree.

What this is not: automatic summarization. The agent has to actually look at the files to know what's in them.
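A minimal sketch of how editor tools might map a relative path onto that directory. The root comes from the post; workspace_root and resolve_in_workspace are hypothetical names, and the escape check is an assumption about how a tool would stay inside one app's workspace, not a description of VibeKit's code. It needs Python 3.9+ for Path.is_relative_to.

```python
from pathlib import Path

# Root from the post; both helpers below are hypothetical.
WORKSPACES_ROOT = Path("/mnt/efs/workspaces")


def workspace_root(app_id: str, base: Path = WORKSPACES_ROOT) -> Path:
    """Directory holding one app's code, e.g. /mnt/efs/workspaces/app-42."""
    return base / f"app-{app_id}"


def resolve_in_workspace(app_id: str, relative: str, base: Path = WORKSPACES_ROOT) -> Path:
    """Turn an editor-tool path into an absolute one, refusing paths that leave the workspace."""
    root = workspace_root(app_id, base).resolve()
    # resolve() collapses ".." and follows symlinks; an absolute `relative` replaces root entirely.
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"{relative!r} resolves outside {root}")
    return candidate
```

With this in place, resolve_in_workspace("42", "src/App.tsx") lands under app-42's directory, while "../app-43/.env" or "/etc/passwd" raises ValueError instead of touching another app's files.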

3. Agent-side notes (semi-durable, in workspace files)

The agent leaves notes for its future self in versioned files. The most important is AGENTS.md, a per-app contract that scopes the agent's behavior, but there are several:

AGENTS.md: the contract. What kind of app this is, what the architecture commitments are, what to never touch.
HEARTBEAT.md: a log of recent decisions, updated by the agent every few turns.
TOOLS.md: the tools the agent has access to (rendered into the prompt).
SOUL.md / IDENTITY.md: the agent's named persona for this app (each app gets its own agent).

These are real files in the workspace. They survive deploys, they survive container restarts, they survive the user logging out and back in months later. They're the agent's memory of itself.

What this is good for: "remember not to use yarn, the user prefers pnpm", "remember the auth flow you set up last week".

What this is not: infallible. The agent can lie to itself in these files, or write notes that age poorly. We treat them as advisory, not authoritative.
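One way to keep the notes advisory is to say so when they're rendered into the prompt. This sketch reads whichever note files exist and prefixes them with a caveat. The render_notes name, the file order, the per-file character cap, and the header wording are all assumptions.

```python
from pathlib import Path

# File names come from the post; the order and the cap are assumptions.
NOTE_FILES = ["AGENTS.md", "SOUL.md", "IDENTITY.md", "TOOLS.md", "HEARTBEAT.md"]


def render_notes(workspace: Path, max_chars_per_file: int = 8_000) -> str:
    """Concatenate the note files that exist, framed as advisory rather than ground truth."""
    sections = []
    for name in NOTE_FILES:
        path = workspace / name
        if not path.is_file():
            continue  # a missing note is normal, e.g. for a brand-new app
        text = path.read_text(encoding="utf-8")
        if len(text) > max_chars_per_file:
            text = text[:max_chars_per_file] + "\n[truncated]"
        sections.append(f"## {name}\n{text.strip()}")
    if not sections:
        return ""
    header = (
        "Notes you left for yourself in earlier sessions. They may be stale; "
        "if they disagree with the code in the workspace, trust the code."
    )
    return header + "\n\n" + "\n\n".join(sections)
```

Putting the caveat in the prompt, next to the notes, is what lets the workspace code win when a note has aged badly.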

4. The AI provider's context window (ephemeral, per-call)

This is the only memory that's actually scoped to a single API call. Whichever provider serves the call (Anthropic, OpenAI, or OpenRouter), the model has a fixed context window: 200K tokens for Claude Sonnet, 128K+ for GPT-5. Inside that window, the agent has everything it needs for the current turn:

System prompt (instructions + persona)
Recent chat history (the durable Postgres rows, replayed into the window)
File contents (the workspace files the agent decided to read)
Tool results from this turn

That window resets on every call. There's no "memory" inside it — just whatever we put there.

This is the layer that's most often confused with "persistent memory." It isn't. The other three layers are what make persistence possible by reconstructing the window on each call.
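The reconstruction can be sketched as a function that rebuilds the window from scratch on every call. This is a hypothetical illustration: TurnInputs, build_window, and the four-characters-per-token estimate are assumptions (a real implementation would count with the provider's tokenizer), and dropping the oldest history first is one reasonable policy, not necessarily the one VibeKit uses.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); swap in the provider's tokenizer in practice.
    return max(1, len(text) // 4)


@dataclass
class TurnInputs:
    system_prompt: str                                        # instructions + persona
    history: list[str]                                        # replayed chat rows, oldest first
    file_reads: dict[str, str] = field(default_factory=dict)  # path -> contents read this turn
    tool_results: list[str] = field(default_factory=list)     # outputs from this turn only


def build_window(inputs: TurnInputs, budget_tokens: int) -> list[str]:
    """Assemble the context from durable layers; nothing carries over from the last call."""
    fixed = [f"<file path={p}>\n{body}\n</file>" for p, body in inputs.file_reads.items()]
    fixed += inputs.tool_results
    used = estimate_tokens(inputs.system_prompt) + sum(estimate_tokens(p) for p in fixed)
    # Spend the remaining budget on the newest history first, then restore chronological order.
    kept = []
    for message in reversed(inputs.history):
        cost = estimate_tokens(message)
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    return [inputs.system_prompt, *kept, *fixed]
```

Because the function takes only durable inputs (replayed Postgres rows, workspace files, this turn's tool output), calling it twice with the same inputs yields the same window. Nothing lives in the window itself between calls.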

What we deliberately let the agent forget

Persistence is a knife with two edges. Some things should not persist:

Half-formed reasoning. If the agent went down a wrong path and the user redirected it ("no, don't do that"), we don't want the wrong path to keep showing up in future prompts. Chat history captures the redirect; the bad reasoning gets diluted by everything that comes after.
One-off API responses. Tool call results (the actual JSON from a fetch, the output of an ls -la) live in the context window for the turn and are then summarized into a one-line note in HEARTBEAT.md rather than retained verbatim. Persisting them all would blow up the workspace.
Secrets. API keys the user pasted, tokens, credentials. These never get written to disk by the agent. The encryption boundary is enforced at write time.
User regret. If the user says "actually delete that file" or "undo that commit", we let them. Persistence shouldn't be a roach motel.

The principle: persist what helps the agent be coherent across time, forget what wastes context window or creates risk.
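To make the second and third items concrete, here is a minimal sketch of a HEARTBEAT.md write path: tool output is collapsed to one line before it's logged, and any entry that looks like a credential is rejected at write time. Rejecting on a regex match is a swapped-in technique for illustration; the post describes an encryption boundary, and the patterns, function names, and log-line format here are all hypothetical.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

# Illustrative patterns only; a real scanner needs a far broader set.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),              # "sk-" style provider API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key IDs
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private keys
]


class SecretInNoteError(ValueError):
    """Raised instead of writing something that looks like a credential."""


def summarize_tool_result(tool: str, output: str, max_len: int = 120) -> str:
    """Collapse a tool's full output to one short line rather than keeping it verbatim."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    first = lines[0] if lines else "(no output)"
    summary = f"{tool}: {first} ({len(lines)} lines)"
    return summary if len(summary) <= max_len else summary[: max_len - 3] + "..."


def append_heartbeat(workspace: Path, entry: str) -> None:
    """Append one decision-log line to HEARTBEAT.md, refusing likely secrets."""
    if any(p.search(entry) for p in SECRET_PATTERNS):
        raise SecretInNoteError("refusing to write a likely secret to HEARTBEAT.md")
    entry = " ".join(entry.splitlines())  # keep each log entry on a single line
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    with open(workspace / "HEARTBEAT.md", "a", encoding="utf-8") as f:
        f.write(f"- {stamp} {entry}\n")
```

For example, a 40-line ls -la becomes a single entry like "ls: total 48 (40 lines)", while an entry containing a pasted API key never reaches disk.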

How this fails

A few honest ways persistence breaks in practice:

Workspace ownership drift. UID mismatches between EFS (mounted over NFSv4) and the container left some files owned by nobody:nobody, which the agent then couldn't overwrite. We fixed this with an EFS Access Point that forces UID 989 for every request through the mount, but it was a real bug.
Note rot. Agent-side notes (AGENTS.md, HEARTBEAT.md) can become stale if the code changes underneath them. We have a periodic sweep that re-asks the agent to audit its own notes, but it's imperfect.
Chat history bloat. A user with 2,000 messages in one app has a lot of context to replay. We only inject the last ~50 by default, summarize older content into the system prompt, and rely on the workspace + notes for older state.
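The Access Point was the fix, but ownership drift like this is easy to detect with a periodic sweep. This diagnostic sketch lists everything under a workspace not owned by an expected UID. find_foreign_owned and the check_owners.py script name are hypothetical, 989 is simply the UID the Access Point enforces, and it assumes a POSIX host where st_uid is meaningful.

```python
import os
import sys
from pathlib import Path


def find_foreign_owned(root: Path, expected_uid: int) -> list[Path]:
    """Return every file or directory below root whose owner UID isn't expected_uid."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # lstat judges a symlink by its own owner rather than its target's.
            if os.lstat(path).st_uid != expected_uid:
                offenders.append(path)
    return sorted(offenders)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_owners.py <workspace-dir> <uid>
    for p in find_foreign_owned(Path(sys.argv[1]), int(sys.argv[2])):
        print(p)
```

Running it as python check_owners.py /mnt/efs/workspaces/app-42 989 should print nothing on a healthy workspace; any nobody:nobody files show up in the output.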

What this enables

When all four layers are working, the agent behavior you get is:

Open the iOS app a week later → "where did we leave off?" → agent has the answer
Push a commit from Cursor on your Mac → next iPhone agent session knows about it
Tell the agent on Monday "we're using pnpm for this project" → on Friday it's still using pnpm
Crash a container → restart from the same EFS workspace, no loss

That's what "persistent agent" should mean. Not just "longer chat window."

If you want to see the layered model in action, start a project and come back to it a week later. Watch the agent re-orient itself. The persistence isn't a feature; it's the substrate the agent stands on.