What persistent agents actually remember — and what they shouldn't

Every AI coding tool now claims "persistent memory." Most of them just have a longer chat window. Real agent persistence is a layered system: durable files, an indexed workspace, agent-side notes, and the AI provider's own context. Here's what we put where, and the few things we deliberately let the agent forget.

By Brian Boisjoli · 5 min read · agents, memory, architecture

"Persistent" is doing too much work

If you read the marketing pages for any AI coding tool released in the past year, "persistent agent" is in the headline. What it actually means varies wildly: a durable chat history, a workspace of real files, notes the agent writes for itself, or simply a large context window on the provider's API.

VibeKit's agent has all four. They're not the same thing, they live in different storage backends, and they fail in different ways. Calling all of them "memory" is what makes the category confusing.

The four layers, and what's in each

Here's the actual taxonomy I've ended up with after running this in production for a few months.

1. Chat history (durable, in Postgres)

Every user message, every agent response, every system event tied to an app gets a row in agent_messages. Indexed by app_id, ordered by created_at. This is the "did you say this two weeks ago" memory.

The user sees this in the chat UI on every device — open the iOS app, scroll up, the conversation from your laptop is there. Open the Telegram bot for the same app, same history. The agent reads the last ~50 messages on every turn to keep context.

What this is good for: continuity, "as I mentioned earlier", referring back to user-stated preferences.

What this is not: semantic search over your entire codebase. Chat history is linear and unindexed.
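The read path for this layer is simple enough to sketch. The table and column names (agent_messages, app_id, created_at) and the ~50-message limit come from this post; sqlite3 stands in for Postgres here, and the role/body columns and helper name are illustrative assumptions:

```python
# Hedged sketch of the chat-history layer: rows in agent_messages keyed by
# app_id, ordered by created_at, with the agent reading the last ~50 per turn.
# sqlite3 stands in for Postgres; role/body columns are assumptions.
import sqlite3

TURN_CONTEXT_LIMIT = 50  # "the last ~50 messages" per turn

def last_messages(db, app_id, limit=TURN_CONTEXT_LIMIT):
    rows = db.execute(
        "SELECT role, body FROM agent_messages "
        "WHERE app_id = ? ORDER BY created_at DESC LIMIT ?",
        (app_id, limit),
    ).fetchall()
    return list(reversed(rows))  # oldest-first, ready for the prompt

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE agent_messages "
    "(app_id TEXT, role TEXT, body TEXT, created_at INTEGER)"
)
for i in range(60):
    db.execute(
        "INSERT INTO agent_messages VALUES ('app-1', 'user', ?, ?)",
        (f"msg {i}", i),
    )

history = last_messages(db, "app-1")
print(len(history), history[-1][1])  # prints "50 msg 59"
```

Note the reversal: fetching DESC with a LIMIT and then flipping the rows is the usual way to get "the most recent N, oldest first" without scanning the whole history.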

2. The workspace (durable, on EFS)

Each app's code lives in /mnt/efs/workspaces/app-<id>/ — a real filesystem the agent reads, writes, and runs builds against. The container bind-mounts it. The agent's editor tools read and write here. git operates on this directory.

This is "the codebase" in the literal sense. Files persist forever (until the user deletes the app). The agent can cat, grep, and ls like a normal Unix process.

What this is good for: "what did this file say yesterday?", line-level edits, multi-file refactors with awareness of the whole tree.

What this is not: automatic summarization. The agent has to actually look at the files to know what's in them.
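One practical consequence of "a real filesystem per app" is that every path the agent touches has to stay inside its own directory. The mount root comes from the post; the traversal guard and helper name below are illustrative assumptions:

```python
# Hedged sketch of workspace path resolution. The /mnt/efs/workspaces/app-<id>/
# layout comes from the post; the escape check is an illustrative guard.
from pathlib import Path

WORKSPACE_ROOT = Path("/mnt/efs/workspaces")

def workspace_path(app_id: str, relative: str) -> Path:
    root = (WORKSPACE_ROOT / f"app-{app_id}").resolve()
    target = (root / relative).resolve()
    # Refuse paths that escape the app's directory, e.g. "../app-43/secrets".
    if root != target and root not in target.parents:
        raise ValueError(f"path escapes workspace: {relative}")
    return target

print(workspace_path("42", "src/index.ts"))
# /mnt/efs/workspaces/app-42/src/index.ts
```

Resolving both sides before comparing is what catches `..` tricks; comparing raw strings would not.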

3. Agent-side notes (semi-durable, in workspace files)

The agent leaves notes for its future self in versioned files. The most important is AGENTS.md, a per-app contract that scopes behavior, though it isn't the only one.

These are real files in the workspace. They survive deploys, they survive container restarts, they survive the user logging out and back in months later. They're the agent's memory of itself.

What this is good for: "remember not to use yarn, the user prefers pnpm", "remember the auth flow you set up last week".

What this is not: infallible. The agent can lie to itself in these files, or write notes that age poorly. We treat them as advisory, not authoritative.
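"Advisory, not authoritative" translates directly into the loading code: a missing notes file is not an error, and the contents are capped rather than trusted to be reasonable. AGENTS.md is named in the post; the helper name and the size cap are illustrative assumptions:

```python
# Hedged sketch of loading agent-side notes. AGENTS.md comes from the post;
# load_notes and the 16 KB cap are illustrative assumptions.
from pathlib import Path
import tempfile

def load_notes(workspace: Path, max_bytes: int = 16_384) -> str:
    notes = workspace / "AGENTS.md"
    if not notes.is_file():
        return ""  # advisory: having no notes yet is fine
    # Cap size so a runaway notes file can't eat the context window.
    return notes.read_text(errors="replace")[:max_bytes]

with tempfile.TemporaryDirectory() as d:
    ws = Path(d)
    assert load_notes(ws) == ""  # fresh workspace, no notes
    (ws / "AGENTS.md").write_text("Use pnpm, not yarn.\n")
    print(load_notes(ws))  # prints "Use pnpm, not yarn."
```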

4. The AI provider's context window (ephemeral, per-call)

This is the only memory that's actually scoped to a single API call. Anthropic / OpenAI / OpenRouter give the model a context window — 200K tokens for Claude Sonnet, 128K+ for GPT-5. Inside that window, the agent has everything it needs for the current turn: the recent chat history, the contents of whatever workspace files it has read, and its own notes such as AGENTS.md.

That window resets on every call. There's no "memory" inside it — just whatever we put there.

This is the layer that's most often confused with "persistent memory." It isn't. The other three layers are what make persistence possible: on every call, the window is rebuilt from them.
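Rebuilding the window from the durable layers can be sketched as a pure function: the three persistent inputs go in, a fresh message list comes out, and nothing about the result survives the call. The function name and message shapes below are illustrative assumptions, not VibeKit's actual API:

```python
# Hedged sketch of reconstructing the ephemeral context window per call.
# build_call_context and the dict message shape are illustrative assumptions.
def build_call_context(system_prompt, notes, history, open_files):
    messages = [{"role": "system", "content": system_prompt}]
    if notes:  # agent-side notes, e.g. AGENTS.md (advisory)
        messages.append({"role": "system", "content": f"Notes:\n{notes}"})
    for role, body in history:  # last ~50 rows of durable chat history
        messages.append({"role": role, "content": body})
    for path, text in open_files.items():  # workspace files read this turn
        messages.append({"role": "user", "content": f"{path}:\n{text}"})
    return messages

msgs = build_call_context(
    "You are the app's agent.",
    "Use pnpm, not yarn.",
    [("user", "add dark mode"), ("assistant", "on it")],
    {"src/theme.css": ":root { color-scheme: light; }"},
)
print(len(msgs))  # prints 5
```

The point of the sketch is the direction of dependency: the window is derived state, and the three durable layers are the source of truth.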

What we deliberately let the agent forget

Persistence cuts both ways: some things should not persist.

The principle: persist what helps the agent be coherent across time, forget what wastes context window or creates risk.

How this fails

A few honest ways persistence breaks in practice: notes in AGENTS.md age poorly, or were wrong when the agent wrote them; chat history is linear, so a preference stated weeks ago can scroll out of the ~50-message window; and workspace files only help if the agent actually reads them.

What this enables

When all four layers are working, the agent behavior you get is simple to describe: you can close the app on one device, reopen it weeks later on another, and the agent picks up the conversation, the codebase, and its own prior decisions where it left them.

That's what "persistent agent" should mean. Not just "longer chat window."

If you want to see the layered model in action, start a project and come back to it a week later. Watch the agent re-orient itself. The persistence isn't a feature; it's the substrate the agent stands on.

Try VibeKit
Every app gets its own AI agent. Free tier with BYOK.
Start Building →