Most 'AI agents' aren't agents — a definition that actually means something

The word 'agent' has been slapped on every LLM product launched in the past 18 months. Most of them aren't agents under any honest definition. Here's a tight three-property test, and how popular tools score against it — including VibeKit.

By Brian Boisjoli · 8 min read · agents, opinion, taxonomy

The category is on fire and it's mostly a vocabulary problem

In the past 18 months, "AI agent" has gone from a research-paper term to a marketing word that means whatever the speaker wants it to mean. It's now applied to:

  - chatbots that can call a tool or two
  - IDE assistants with an "agent mode"
  - workflow automations with an LLM step bolted on
  - autonomous coding loops that run unattended for hours
  - thin LLM wrappers branded as "assistants"

These are wildly different things. Calling them all "agents" is like calling a thermostat, a self-driving car, and a chess engine all "AI" — technically defensible at the broadest level, useless for actually distinguishing what's in front of you.

This post is an attempt to fix the vocabulary. Not because I think semantics matter for their own sake, but because the conflation lets bad products hide behind good products' marketing. I'm going to define "agent" in a tight, falsifiable way, then score popular tools against it. I'll be honest about where VibeKit lands.

A three-property definition

An AI agent, to be worth calling an agent, has to do three things:

  1. Take goal-directed actions on the world — not just text out, but actual side effects: file edits, API calls, deploys, transactions.
  2. Loop autonomously — when its first action doesn't complete the goal, it observes the result and decides what to do next, without being re-prompted by a human for each step.
  3. Carry state across turns within a task — what it learns mid-task informs subsequent decisions in that same task. (Not "remember me across sessions" — that's a separate, harder problem. Just: doesn't reset every time it acts.)

That's it. Three properties: action, autonomy, state. I'll call them A/A/S.

A/A/S is not a high bar. It's a floor. Most "AI agent" products fail at one of the three. The ones that pass aren't necessarily good products — they're just minimally entitled to the name.
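
To make the three properties concrete, here is a minimal sketch of a loop that satisfies A/A/S. Everything in it (`Task`, `decide_action`, the toy broken-file hunt) is hypothetical scaffolding for illustration, not any particular product's API:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    goal: str
    observations: list = field(default_factory=list)  # state carried across turns (S)

def run_agent(task, decide_action, execute, goal_met, max_steps=10):
    """Act on the world, observe the result, and repeat until the goal is met."""
    for _ in range(max_steps):
        action = decide_action(task)                 # decision informed by prior observations
        result = execute(action)                     # side effect on the world (A: action)
        task.observations.append((action, result))  # learning accumulates mid-task (S: state)
        if goal_met(task):
            return True                              # loop closed without re-prompting (A: autonomy)
    return False

# Toy world: three files, one is broken; the loop checks them until it finds it.
files = {"a.py": "ok", "b.py": "ok", "c.py": "BUG"}

def decide(task):
    checked = {name for name, _ in task.observations}
    return next(f for f in files if f not in checked)  # pick an unchecked file

def execute(filename):
    return files[filename]

def done(task):
    return any(result == "BUG" for _, result in task.observations)

task = Task(goal="find the broken file")
run_agent(task, decide, execute, done)  # checks a.py, b.py, then finds c.py
```

Strip any one property and the sketch degenerates: remove `execute` and it's an LLM emitting text, remove the loop and it's a chatbot waiting to be re-prompted, remove `observations` and every step starts from scratch.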

A few things this definition deliberately doesn't require:

  - a specific model, framework, or architecture
  - human-level planning or open-ended reasoning
  - memory that persists across sessions (property 3 explicitly sets that aside)

What it does require: actually closing a loop. That's the load-bearing part. Without autonomy, you have a chatbot with extra steps. Without state, you have a stateless function. Without action, you have an LLM.

Scoring the popular tools

Let me run this against everything I see called an "agent" in 2026.

Cursor agent mode

Claude Code

Devin

Aider

ChatGPT (default)

ChatGPT Custom GPTs

LangChain "agents" (the term in their docs)

Zapier / Make / n8n with LLM steps

"Shopping assistant," "research assistant," "writing assistant" (LLM wrappers)

Lovable / Bolt / v0 (vibe-coding tools)

VibeKit

This is the part I'm obligated to be honest about.

But I also want to be honest about where VibeKit's "agent" claim is weaker than it could be.

I'd call VibeKit a "real but constrained" agent. If Devin is at the upper end of the agent spectrum and Custom GPTs are below it entirely, VibeKit sits comfortably above the threshold but well short of "AI software engineer."

Why this taxonomy matters

If you only read up to here, fine. The detail of which tool scores how doesn't matter much. What matters is having a way to read marketing copy with calibrated skepticism.

When a launch post says "we're building agents" — what they actually mean might be:

  - a chatbot that can call tools when asked
  - a fixed workflow with an LLM step inside it
  - an actual autonomous loop that acts, observes, and retries

The reason this conflation is harmful isn't pedantic. It's that the underlying engineering for each is wildly different. A real agent has to handle infinite loops, runaway costs, partial failures mid-task, unbounded state growth, and a dozen other failure modes that a chatbot-with-tools doesn't. Buying a "chatbot with tools" thinking you got "real agent" — or vice versa — produces predictable disappointment.
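
Those failure modes are exactly why a real agent loop ends up wrapped in explicit budgets. A hedged sketch of the guards, not any product's actual implementation — the cap values and the `step` interface are assumptions for illustration:

```python
class BudgetExceeded(Exception):
    pass

def guarded_loop(step, budget_usd=1.00, max_steps=25, max_state_bytes=1_000_000):
    """Wrap one act-observe step with the guards an agent loop needs:
    an iteration cap (infinite loops), a spend cap (runaway cost), and a
    state-size cap (unbounded state growth)."""
    spent, state = 0.0, []
    for i in range(max_steps):
        result = step(state)                 # one act-observe cycle; returns a dict
        spent += result.get("cost", 0.0)
        state.append(result)
        if spent > budget_usd:
            raise BudgetExceeded(f"spent ${spent:.2f} after {i + 1} steps")
        if sum(len(str(s)) for s in state) > max_state_bytes:
            raise BudgetExceeded("state grew past limit; summarize or truncate")
        if result.get("done"):
            return state
    raise BudgetExceeded(f"no result after {max_steps} steps")
```

The point of the caps is that failure surfaces as a typed error you can handle, instead of a silent runaway — a chatbot-with-tools never needs any of this because a human re-prompt bounds every step.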

How to test in 30 seconds

If you're evaluating a product that calls itself an agent, do this:

  1. Give it a task with a step that will fail the first time you describe it. ("Find the broken link in this README" — but the broken link is described in a way that requires checking the file.)
  2. Watch what it does after the first action returns nothing useful. Does it try a different approach? Or does it return "I don't know"?

That single test separates real agents from chatbots-with-tools. Real agents observe the failure, form a new hypothesis, try again. Chatbots return their best guess and stop.
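
The test above can be sketched as a tiny harness, assuming (hypothetically) that the product exposes its run as a trace of (action, result) pairs:

```python
def failure_test(run_task, task):
    """Classify a product by what it does after its first action fails.

    `run_task` is a hypothetical hook that executes the task and returns
    the trace of (action, result) pairs the product produced along the way.
    """
    trace = run_task(task)
    if len(trace) < 2:
        return "chatbot-with-tools"          # one guess, then it stopped
    first_action, _ = trace[0]
    retried = any(action != first_action for action, _ in trace[1:])
    return "agent" if retried else "chatbot-with-tools"

# A product that observes the empty grep result and tries a different approach:
agent_run = lambda task: [("grep README", ""), ("read README", "dead link on line 12")]
# A product that returns its first guess and stops:
chatbot_run = lambda task: [("grep README", "")]

failure_test(agent_run, "find the broken link")    # "agent"
failure_test(chatbot_run, "find the broken link")  # "chatbot-with-tools"
```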

I run this test on every "agent" product I evaluate. Roughly half pass. The other half — including some very expensive products from companies with very loud marketing — return their first guess and call it done.

The takeaway

"Agent" became a marketing word in 2024–2025 the same way "AI" became one in 2017. It will keep being used loosely; you can't unring that bell. What you can do is build a private vocabulary that distinguishes:

  - chatbots with tool access
  - workflows with LLM steps inside them
  - real agents that close the act-observe loop

Use the right word for the right product. When a vendor blurs the categories, that's information. Sometimes they're confused; sometimes they're hiding limitations behind a stronger word. Both worth noticing.


If you want to see VibeKit's agent loop in action, start a free app with BYOK and give it the failure-test from above. The agent runs in a real container, hits real failures, and either recovers or tells you what's blocking it. That's the floor of what an agent should do.

Read the architecture details: How AGENTS.md works covers the persistent-memory layer that keeps state across sessions. Persistent AI agent is the product overview.

For new posts, follow @609.sol on X.

Try VibeKit
Every app gets its own AI agent. Free tier with BYOK.
Start Building →