Persistent agent memory — the three layers most tutorials skip

When you hear 'persistent agent memory' you probably think vector database. That's one layer of three, and it's the one most apps reach for first when they actually need a different one. Here's how we run persistence on VibeKit — workspace, session, and tool-tracked — and when each one is the right answer.

By Brian Boisjoli · 8 min read · Tags: agents, memory, persistence, architecture, vibekit

The question that gets answered wrong

Open any agent-building tutorial and you'll see the same diagram: agent → vector database → retrieval → context. "Persistent memory" usually means "RAG with embeddings." It's a real layer, but it's the wrong default for most of what users call memory.

The actual question is: when the conversation ends and the user comes back tomorrow, what should the agent remember?

That has at least three answers and they're not interchangeable.

We run VibeKit — an AI agent platform that gives each user a persistent coding agent with its own workspace, file history, environment, and conversational state. We had to figure this out for production. Here's the model that worked, the model we tried first and replaced, and how to know which layer your use case actually wants.

Layer 1 — Workspace memory (the agent's filesystem)

This is the layer that gets skipped most often, and it's almost always the one the user means.

When a user says "the agent should remember my project," they mean the agent should pick up tomorrow with the same files, the same package.json, the same database schema, the same commits-in-progress that they left. They don't mean "the agent should recall the gist of yesterday's chat" — they mean the literal artifacts of the work should still be there.

On VibeKit, each app has a workspace directory on EFS (Amazon Elastic File System, AWS's network filesystem) mounted into the agent's Fargate container at /home/agent/workspace. The workspace persists across container restarts, agent reprovisioning, model swaps, and even moves between Fargate tasks. It is bigger than memory in the colloquial sense: it's the agent's entire working environment, frozen between sessions.

What this layer is good for:

What it's not good for:

If you're building an agent and you don't have this layer, you don't have persistence — you have a chatbot. Adding it is not the same as adding a vector DB. It's mounting a real filesystem (or S3, or a Postgres jsonb column, or whatever — the storage medium matters less than the fact that the artifacts persist).

A practical detail people miss is that the workspace also persists the state of the environment: installed dependencies, configured env vars, partially applied migrations. When the agent comes back, it shouldn't need to redo any of this. On VibeKit we use EFS Access Points with per-app POSIX UIDs so that file ownership stays correct across container restarts. That fixed a real bug where ownership drift was forcing reinstalls on every wake. The fix isn't glamorous; it's filesystem plumbing that keeps the persistence promise.
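To make the ownership-drift problem concrete, here is a minimal sketch of a check an agent runtime could run on wake. It is not VibeKit's code: the function name `find_ownership_drift` and the idea of passing in the expected UID are assumptions for illustration, and it relies on POSIX ownership semantics (Linux or macOS).

```python
import os


def find_ownership_drift(workspace: str, expected_uid: int) -> list[str]:
    """Return paths under `workspace` not owned by `expected_uid`.

    Hypothetical helper: checks every directory that os.walk visits and
    every file inside it. Symlinked directories are not descended into.
    """
    drifted = []
    for root, _dirs, files in os.walk(workspace):
        candidates = [root] + [os.path.join(root, name) for name in files]
        for path in candidates:
            # lstat inspects a symlink itself rather than its target.
            if os.lstat(path).st_uid != expected_uid:
                drifted.append(path)
    return drifted
```

An empty result means the per-app UID still owns everything, so nothing needs reinstalling. A non-empty result is the signal to fix ownership rather than rebuild the environment.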

Layer 2 — Session memory (the chat transcript)

This is the layer most tutorials show. It's the one where the agent recalls "yesterday we were discussing X."

The naive implementation is to dump every message into a SQL table keyed by user and session, then replay all of them on the next turn. This works for short conversations. It breaks once the agent has been talking to a user for two weeks and you're trying to stuff 400 messages into a 200K-token context window.
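For reference, the naive version looks roughly like the sketch below, using Python's built-in `sqlite3`. The table layout and the function names (`open_store`, `append_message`, `replay`) are assumptions for illustration, not a schema from any particular product.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a message store keyed by user and session."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id TEXT NOT NULL,
               session_id TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def append_message(conn, user_id: str, session_id: str, role: str, content: str) -> None:
    # `with conn` commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO messages (user_id, session_id, role, content) "
            "VALUES (?, ?, ?, ?)",
            (user_id, session_id, role, content),
        )


def replay(conn, user_id: str, session_id: str) -> list[dict]:
    """Return the full transcript in insertion order. This is the part that stops scaling."""
    rows = conn.execute(
        "SELECT role, content FROM messages "
        "WHERE user_id = ? AND session_id = ? ORDER BY id",
        (user_id, session_id),
    ).fetchall()
    return [{"role": role, "content": content} for role, content in rows]
```

Nothing here is wrong as storage. The weakness is that `replay` returns everything, so the prompt grows without bound.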

The actual model on VibeKit is closer to: truncate by recency, summarize by significance.

The hardest part isn't the storage. It's the cadence of compaction. If you summarize too eagerly, you lose nuance. If you wait too long, you blow context budgets and the agent loses its grip. We compact when context utilization crosses ~70% of the available window, with a deliberate hand-off message ("I'm summarizing our earlier discussion for memory; continuing…"). The user sees the seam, which we've found increases trust more than hiding it.
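The compaction trigger described above can be sketched as follows. This is an illustration under stated assumptions, not VibeKit's implementation: `estimate_tokens` uses a crude four-characters-per-token guess, `summarize` is a caller-supplied function (in practice a model call), and `keep_recent` is a hypothetical knob for how many recent messages stay verbatim.

```python
from dataclasses import dataclass
from typing import Callable

COMPACTION_THRESHOLD = 0.70  # compact once ~70% of the window is used
HANDOFF = "I'm summarizing our earlier discussion for memory; continuing…"


@dataclass
class Message:
    role: str
    content: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Swap in a real tokenizer.
    return max(1, len(text) // 4)


def utilization(messages: list[Message], window_tokens: int) -> float:
    return sum(estimate_tokens(m.content) for m in messages) / window_tokens


def maybe_compact(
    messages: list[Message],
    window_tokens: int,
    summarize: Callable[[list[Message]], str],
    keep_recent: int = 10,
) -> list[Message]:
    """Fold older messages into a summary once utilization reaches the threshold."""
    if utilization(messages, window_tokens) < COMPACTION_THRESHOLD:
        return messages
    cut = len(messages) - keep_recent
    if cut <= 0:
        return messages  # nothing older than the recent tail to fold
    older, recent = messages[:cut], messages[cut:]
    summary = Message("system", "Summary of earlier discussion: " + summarize(older))
    # The hand-off message is the visible seam the user sees.
    return [summary, Message("assistant", HANDOFF), *recent]
```

The recent tail is kept verbatim (truncate by recency) and everything before it goes through `summarize` (summarize by significance). How well the summary preserves nuance depends entirely on that function.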

A small but important point: session memory is per-conversation, workspace memory is per-app. Same user, same VibeKit app, different conversation threads = same workspace, different session transcripts. This split matters because it lets users open a fresh chat ("I'm starting over") without losing their work files. Conflating these layers is a common bug. Don't.
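One way to keep the two scopes from being conflated is to make them separate objects with separate keys. The sketch below is a hypothetical data model (the class and field names are assumptions), showing that a fresh conversation gets an empty transcript while the app's workspace path stays the same.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class Conversation:
    conversation_id: str
    transcript: list[str] = field(default_factory=list)  # per-conversation


@dataclass
class App:
    app_id: str
    workspace_path: str  # per-app: shared by every conversation
    conversations: dict[str, Conversation] = field(default_factory=dict)

    def start_conversation(self) -> Conversation:
        """Open a fresh thread without touching the workspace."""
        convo = Conversation(conversation_id=uuid4().hex)
        self.conversations[convo.conversation_id] = convo
        return convo
```

With this shape, "I'm starting over" maps to `start_conversation()`, and deleting a workspace has to be a separate, deliberate operation.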

When session memory is the right layer:

When it's not enough:

Layer 3 — Tool-tracked memory (the agent's structured state)

This is the layer almost nobody builds explicitly, and it's the one that quietly does the most work.

When the agent calls a tool — a Write to a file, an Edit, a Bash command, a Read of a doc — that action is itself memory. The fact that the agent ran npm install fastify yesterday is recorded in the workspace's package.json. The fact that it created an admin route is recorded in the source file. The fact that it ran a migration is recorded in the database. None of this needs a vector store. The system-of-record IS the memory.

The reason this matters: it makes recovery deterministic. When the agent wakes up tomorrow, it doesn't need to recall "did I install fastify yesterday?" — it can read package.json and know. The state of the world IS the memory; the agent re-derives context from it on demand.
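As a small illustration of re-deriving a fact from the system of record, the sketch below answers "is fastify a dependency?" by reading package.json instead of consulting the transcript. The helper name `is_dependency_declared` is an assumption. Note that it checks what the manifest declares, not whether node_modules is actually populated.

```python
import json
from pathlib import Path


def is_dependency_declared(workspace: str, package: str) -> bool:
    """Return True if `package` appears in package.json dependencies or devDependencies."""
    manifest = Path(workspace) / "package.json"
    if not manifest.is_file():
        return False
    data = json.loads(manifest.read_text(encoding="utf-8"))
    return any(
        package in data.get(section, {})
        for section in ("dependencies", "devDependencies")
    )
```

The answer is the same whichever model asks and whatever the transcript says, which is what makes recovery deterministic.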

We push this hard on VibeKit. The agent's instruction file (AGENTS.md) says: "always read the current state before deciding the next action." That sounds obvious, but most agent loops we've seen take "what I remember" as the source of truth and only fall back to "what's on disk" when memory contradicts itself. We invert it. The disk is the source of truth. The agent's working memory is a cache of the disk. When they disagree, the disk wins.
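The "disk wins" rule can be sketched as a reconciliation step that runs before the agent decides anything. This is a hypothetical helper, not VibeKit's agent loop: `remembered` stands in for whatever file contents the agent is carrying in working memory, keyed by workspace-relative path.

```python
from pathlib import Path


def reconcile(remembered: dict[str, str], workspace: str) -> tuple[dict[str, str], list[str]]:
    """Refresh remembered file contents from disk.

    Returns (fresh_state, stale_paths). Disk content always replaces the
    remembered copy; files that no longer exist are dropped from the state.
    """
    root = Path(workspace)
    fresh: dict[str, str] = {}
    stale: list[str] = []
    for rel_path, cached_text in remembered.items():
        file_path = root / rel_path
        if not file_path.is_file():
            stale.append(rel_path)  # remembered a file that is gone
            continue
        on_disk = file_path.read_text(encoding="utf-8")
        if on_disk != cached_text:
            stale.append(rel_path)  # memory had drifted from disk
        fresh[rel_path] = on_disk  # disk wins, unconditionally
    return fresh, stale
```

The `stale` list is useful for logging how often memory and disk disagree, but the returned state never depends on it.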

This has a great practical consequence: model failures are recoverable. If the agent crashes mid-session, the next agent (same or different model) can read the workspace and pick up. There's no "agent personality" lost in the crash, because there was no personality stored separately from the artifacts.

When tool-tracked memory is the right layer:

When it's not:

When the vector DB is the right answer

We haven't talked about RAG yet because it usually isn't what people need first. But it has its place. The vector-DB layer is right when:

  1. The corpus is genuinely larger than context. If you have 10,000 docs the agent might need to query, you need retrieval. If you have 50, just stuff them in system context with structured prefixing — cheaper, more reliable, no embedding-drift problems.
  2. Search is semantic, not lookup. "Find me docs related to authentication strategies" needs RAG. "Read AGENTS.md" doesn't.
  3. The user explicitly wants knowledge-base behavior. The agent recalling from a pinned reference library, citing sources, etc.
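The first criterion is easy to check mechanically. The sketch below tries to pack a small corpus into the system context with a labeled prefix per document and returns `None` when it doesn't fit, which is the point where retrieval becomes worth considering. The `### DOC:` prefix format, the function names, and the four-characters-per-token estimate are all assumptions for illustration.

```python
from typing import Optional


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Swap in a real tokenizer.
    return max(1, len(text) // 4)


def build_context_or_none(docs: dict[str, str], budget_tokens: int) -> Optional[str]:
    """Return one prefixed context string if every doc fits the budget, else None."""
    sections = [f"### DOC: {name}\n{body}" for name, body in sorted(docs.items())]
    context = "\n\n".join(sections)
    if estimate_tokens(context) <= budget_tokens:
        return context
    return None  # corpus exceeds the budget: consider retrieval
```

Sorting by name keeps the prompt stable between runs, so the same corpus always produces the same context.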

Most production agent systems we've looked at have RAG bolted on as the first persistence layer and no workspace layer at all. Then they wonder why users can't pick up yesterday's work. The fix isn't more retrieval. It's adding the filesystem they should have had on day one.

A practical decision flow

Working through which layer your agent needs:

  1. Does the user create artifacts that should persist? (code, docs, generated images, datasets) → You need Layer 1: workspace. Not negotiable. This is the floor.
  2. Will conversations span more than 10–20 turns? → You need Layer 2: session memory with compaction. A SQL table isn't enough; you need a summarization strategy.
  3. Does the agent take actions that change canonical state? (database writes, API calls, file edits) → You need Layer 3: tool-tracked. Make it explicit. Tell the agent (in its instruction file) to re-derive context from the state of the world, not from its own memory.
  4. Does the agent need to recall from a corpus larger than context? → Now consider RAG. And only now.
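The four questions above can be written down as a small checklist function, which is handy for auditing an existing design. The names here (`AgentProfile`, `required_layers`, the layer labels) are hypothetical, and the ordering of the result mirrors the order of the questions.

```python
from dataclasses import dataclass


@dataclass
class AgentProfile:
    creates_artifacts: bool        # code, docs, generated images, datasets
    long_conversations: bool       # more than roughly 10-20 turns
    changes_canonical_state: bool  # database writes, API calls, file edits
    corpus_exceeds_context: bool   # recall from more than fits in context


def required_layers(profile: AgentProfile) -> list[str]:
    """Map the four yes/no answers to the layers they call for, in priority order."""
    layers = []
    if profile.creates_artifacts:
        layers.append("workspace")
    if profile.long_conversations:
        layers.append("session+compaction")
    if profile.changes_canonical_state:
        layers.append("tool-tracked")
    if profile.corpus_exceeds_context:
        layers.append("rag")
    return layers
```

For a typical coding agent with a small reference set, `required_layers(AgentProfile(True, True, True, False))` yields the first three layers and no RAG.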

Most apps that say "we need persistent memory" need (1), (2), and (3) in that order, and don't need (4) at all. The default conversation in the agent community goes (4) first because it's the most discussed and the most vendor-marketed. It's also usually the least urgent.

Why we wrote this

We were getting search traffic for "persistent memory for ai agents" — and our existing post on the topic was at position 37, deep enough that nobody clicked. The query was right; the answer we'd given was too narrow.

Persistent memory isn't one thing. It's at least three. Getting that taxonomy right is the difference between an agent that seems to remember and an agent that actually picks up where the user left off — same workspace, same conversation thread, same understanding of the world it's been changing.

If you're building something agent-shaped right now, audit which layers you have. There's a very good chance you're missing the most important one and have invested heavily in the least important one. We did. The fix wasn't bigger memory. It was the right kind.
