Switchback — per-turn cascade model router for AI agents
Open-source library that classifies each AI agent turn, starts on the cheapest model in your ladder that can plausibly handle it, then escalates mid-turn only when the cheap one fumbles. Brand-locked. MIT.
$ npm install vibekit-switchback
$ npx vibekit-switchback-mcp # MCP server for Claude Desktop / Cursor / Cline
What problem does it solve?
Most AI agent products pick one model per app and use it for every turn. A "fix this typo" turn and a "refactor the auth flow" turn both hit the same flagship — wasted spend on the trivial half, no escalation safety on the complex half.
Switchback fixes both ends. A small classifier model (default openai/gpt-5.4-mini, ~$0.0001/call) scores each user turn as trivial, standard, or complex. The agent starts on the cheapest model in your ladder that can plausibly handle the work — then bumps up a tier mid-turn if signals say the cheap one is fumbling. Same agent loop. Same brand. Just spending what the turn actually needs.
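To make the tier-to-model step concrete, here is a minimal sketch of how a classifier label could pick the starting rung. The `Tier` type, the `TIER_OFFSET` mapping, and `initialModel` are illustrative assumptions, not the library's actual API; the real mapping may differ.

```typescript
// Hypothetical sketch: names and mapping are assumptions, not vibekit-switchback's API.
type Tier = 'trivial' | 'standard' | 'complex';

// Assumed mapping: each tier starts one rung higher on the ladder.
const TIER_OFFSET: Record<Tier, number> = { trivial: 0, standard: 1, complex: 2 };

function initialModel(ladder: readonly string[], tier: Tier): string {
  if (ladder.length === 0) {
    throw new Error('ladder must contain at least one model');
  }
  // Clamp so a short ladder (for example two tiers) never overflows.
  const index = Math.min(TIER_OFFSET[tier], ladder.length - 1);
  return ladder[index];
}

const ladder = ['openai/gpt-5.4-mini', 'openai/gpt-5.4', 'openai/gpt-5.5'];
const trivialStart = initialModel(ladder, 'trivial');
const complexStart = initialModel(ladder, 'complex');
```

With the three-tier ladder above, a trivial turn starts on the first entry and a complex turn starts on the last; on a two-tier ladder, both standard and complex turns would start on the second entry.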
How the cascade works
| Round | What happens | If it works | If it fumbles |
|---|---|---|---|
| 0 | Classifier picks initial tier; agent runs with cheapest model that can handle it | Done. Telemetry logs cheap tier. | Empty tool-exit? Refusal? Escalate. |
| 1+ | Next round runs on the bumped tier with a new x-openclaw-model header (no DB write, no reload) | Done. Telemetry logs both tiers. | Three or more tool errors in the round's steps? Escalate again. |
| Top | Flagship runs the rest of the turn | Done. Telemetry logs full climb. | Capped — never exceeds your ladder. |
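The climb in the table can be sketched as a single step function that is capped at the top of the ladder. `nextTier` and `headersForRound` are hypothetical helpers written for illustration; only the `x-openclaw-model` header name comes from the table, and how your agent runtime consumes that header is an assumption.

```typescript
// Hypothetical helpers, not part of the vibekit-switchback API.

// Returns the next model up the ladder, or null when already on the flagship.
function nextTier(ladder: readonly string[], current: string): string | null {
  const index = ladder.indexOf(current);
  if (index === -1) {
    throw new Error(`model not in ladder: ${current}`);
  }
  // Capped: the cascade never climbs past the last entry.
  if (index === ladder.length - 1) return null;
  return ladder[index + 1];
}

// Assumed usage: the bumped model is passed per round through a request header,
// so no stored configuration has to change between rounds.
function headersForRound(model: string): Record<string, string> {
  return { 'x-openclaw-model': model };
}

const exampleLadder = ['openai/gpt-5.4-mini', 'openai/gpt-5.4', 'openai/gpt-5.5'];
const bumped = nextTier(exampleLadder, 'openai/gpt-5.4-mini');
const capped = nextTier(exampleLadder, 'openai/gpt-5.5');
```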
What makes Switchback different
Mid-turn escalation
Other model routers like NotDiamond and Martian pick a model at request start and that's that. Switchback can swap models between agent rounds if the cheap tier gives up. Requires deep agent-loop integration; catches failures other routers can't.
Brand-locked by design
You hand Switchback one ladder per user. An Anthropic OAuth user gets an all-Anthropic ladder (Haiku → Sonnet → Opus). An OpenAI key user gets all OpenAI (mini → gpt-5.4 → gpt-5.5). Your user's bill stays inside the provider they connected — every tier.
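If you build ladders dynamically, a guard like the one below can confirm that a ladder stays inside one provider before you hand it over. This is a sketch that assumes model IDs follow the `provider/model` form used elsewhere on this page; `providerOf` and `isBrandLocked` are hypothetical helpers, and `anthropic/example-model` is a placeholder ID, not a real model name.

```typescript
// Hypothetical guard, assuming IDs of the form "provider/model".
function providerOf(modelId: string): string {
  const slash = modelId.indexOf('/');
  if (slash <= 0) {
    throw new Error(`expected "provider/model", got: ${modelId}`);
  }
  return modelId.slice(0, slash);
}

// True when every rung of the ladder shares the first rung's provider.
function isBrandLocked(ladder: readonly string[]): boolean {
  if (ladder.length === 0) return false;
  const first = providerOf(ladder[0]);
  return ladder.every((id) => providerOf(id) === first);
}

const openAiOnly = isBrandLocked(['openai/gpt-5.4-mini', 'openai/gpt-5.4', 'openai/gpt-5.5']);
// Placeholder ID below is for illustration only.
const mixed = isBrandLocked(['openai/gpt-5.4-mini', 'anthropic/example-model']);
```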
Agent-loop signals
Five escalation triggers built on real agent execution, not chat-completion patterns:
- Empty tool exit — model called tools, returned no text, two rounds in a row
- Tool error spike — three or more tool errors in the round's steps
- Continuation exhaustion — round 5+ without progress
- Sentinel after retry — explicit "empty response" sentinel survives the retry
- Refusal signal — assistant reply starts with "I can't" / "I'm unable"
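The five triggers above can be sketched as one pure function over per-round signals. The `RoundSignals` shape, the trigger names, the check order, and the 0-based round index are all assumptions made for illustration; the library's own signal types may look different.

```typescript
// Hypothetical signal shape; field names are assumptions, not the library's types.
interface RoundSignals {
  round: number;                     // 0-based round index within the turn (assumed)
  consecutiveEmptyToolExits: number; // rounds in a row with tool calls but no text
  toolErrorsThisRound: number;
  madeProgress: boolean;
  sentinelAfterRetry: boolean;       // "empty response" sentinel survived the retry
  replyText: string;
}

type Trigger =
  | 'empty_tool_exit'
  | 'tool_error_spike'
  | 'continuation_exhaustion'
  | 'sentinel_after_retry'
  | 'refusal';

// Matches replies that start with "I can't" or "I'm unable" (straight or curly apostrophe).
const REFUSAL_PREFIX = /^\s*(?:I can['’]t|I['’]m unable)/i;

// Returns the first trigger that fires, or null when the round looks healthy.
function detectTrigger(s: RoundSignals): Trigger | null {
  if (s.consecutiveEmptyToolExits >= 2) return 'empty_tool_exit';
  if (s.toolErrorsThisRound >= 3) return 'tool_error_spike';
  if (s.round >= 5 && !s.madeProgress) return 'continuation_exhaustion';
  if (s.sentinelAfterRetry) return 'sentinel_after_retry';
  if (REFUSAL_PREFIX.test(s.replyText)) return 'refusal';
  return null;
}

const healthyRound: RoundSignals = {
  round: 0,
  consecutiveEmptyToolExits: 0,
  toolErrorsThisRound: 0,
  madeProgress: true,
  sentinelAfterRetry: false,
  replyText: 'Done.',
};
```

Keeping detection pure makes each threshold easy to unit test apart from the agent loop.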
Telemetry baked in
Every cascade.telemetry(state) call returns a JSON record ready to write to your logs: initial model, final model, escalation count, per-trigger event log, and per-tier tokens. Ship the same telemetry contract that VibeKit ships internally.
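As a rough picture of what such a record could look like, the sketch below uses the field names from the quick-start comment (`initial_model`, `final_model`, `escalations`, `signals`, `per_tier_tokens`). The field types, the `EscalationEvent` shape, and the `buildTelemetry` helper are assumptions for illustration; check the README for the real contract.

```typescript
// Assumed shapes: only the top-level field names come from this page.
interface EscalationEvent {
  round: number;
  reason: string;
  from: string;
  to: string;
}

interface TelemetryRecord {
  initial_model: string;
  final_model: string;
  escalations: number;
  signals: EscalationEvent[];
  per_tier_tokens: Record<string, number>;
}

// Hypothetical helper that assembles a record from an escalation log.
function buildTelemetry(
  initialModel: string,
  events: readonly EscalationEvent[],
  perTierTokens: Record<string, number>,
): TelemetryRecord {
  const finalModel = events.length > 0 ? events[events.length - 1].to : initialModel;
  return {
    initial_model: initialModel,
    final_model: finalModel,
    escalations: events.length,
    signals: [...events],
    per_tier_tokens: { ...perTierTokens },
  };
}

const record = buildTelemetry(
  'openai/gpt-5.4-mini',
  [{ round: 0, reason: 'tool_error_spike', from: 'openai/gpt-5.4-mini', to: 'openai/gpt-5.4' }],
  { 'openai/gpt-5.4-mini': 1200, 'openai/gpt-5.4': 3400 },
);
// One JSON object per line is convenient for log pipelines.
const logLine = JSON.stringify(record);
```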
Quick start
import { createCascade, createOpenRouterClassifier } from 'vibekit-switchback';
const cascade = createCascade({
  ladder: ['openai/gpt-5.4-mini', 'openai/gpt-5.4', 'openai/gpt-5.5'],
  classify: createOpenRouterClassifier({ apiKey: process.env.OPENROUTER_API_KEY }),
});
const state = await cascade.init({ userMessage: 'add a dark-mode toggle' });
// First round with state.initialModel...
// After each round, call cascade.evaluate(state, signals, round) and
// cascade.escalate(state, decision.reason, round, tokensSoFar) if escalate.
const telemetry = cascade.telemetry(state);
// { initial_model, final_model, escalations, signals[], per_tier_tokens }
Full API reference in the GitHub README.
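The comments in the quick start describe an evaluate-then-escalate step after each round. One way that loop might be wired is sketched below. Everything here is inferred from those comments: the `Decision` shape (`escalate`, `reason`), the idea that `escalate` updates state in place, the `RoundResult` type, and the `runTurn` and `runRound` names are assumptions, not the documented API.

```typescript
// ASSUMPTION: shapes inferred from the quick-start comments; see the README for real types.
interface Decision {
  escalate: boolean;
  reason: string;
}

interface CascadeLike<State, Signals> {
  evaluate(state: State, signals: Signals, round: number): Decision | Promise<Decision>;
  escalate(state: State, reason: string, round: number, tokensSoFar: number): unknown;
}

// Hypothetical result of one agent round, produced by your own agent loop.
interface RoundResult<Signals> {
  done: boolean;
  signals: Signals;
  tokens: number;
}

// Runs rounds until the agent finishes or maxRounds is hit; returns rounds executed.
async function runTurn<State, Signals>(
  cascade: CascadeLike<State, Signals>,
  state: State,
  runRound: (state: State, round: number) => Promise<RoundResult<Signals>>,
  maxRounds = 8,
): Promise<number> {
  let tokensSoFar = 0;
  for (let round = 0; round < maxRounds; round++) {
    const result = await runRound(state, round);
    tokensSoFar += result.tokens;
    if (result.done) return round + 1;

    // Ask the cascade whether this round's signals warrant a bump.
    const decision = await cascade.evaluate(state, result.signals, round);
    if (decision.escalate) {
      // Assumed to update state so the next round runs on the higher tier.
      await cascade.escalate(state, decision.reason, round, tokensSoFar);
    }
  }
  return maxRounds;
}
```

In this sketch `runRound` reads the current model from `state`, so the agent loop itself never needs to know that an escalation happened.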
Hosted classifier endpoint
Don't want to manage an OpenRouter key just for the classifier? Hit the hosted endpoint with a VibeKit API key — $0.001 per call billed against credits:
$ curl https://vibekit.bot/api/v1/switchback/route \
    -H "Authorization: Bearer vk_..." \
    -H "Content-Type: application/json" \
    -d '{"userMessage": "add a dark mode toggle",
         "ladder": ["openai/gpt-5.4-mini", "openai/gpt-5.4"]}'
Get a key at app.vibekit.bot.
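The same request can be assembled in TypeScript. The sketch below only mirrors the URL, headers, and body from the curl example; `buildRouteRequest` is a hypothetical helper. Because the response shape is not described on this page, it leaves the response untyped rather than guessing at it.

```typescript
// Hypothetical helper that mirrors the curl example; not part of the library.
interface RouteRequest {
  url: string;
  init: {
    method: 'POST';
    headers: Record<string, string>;
    body: string;
  };
}

function buildRouteRequest(
  apiKey: string,
  userMessage: string,
  ladder: readonly string[],
): RouteRequest {
  return {
    url: 'https://vibekit.bot/api/v1/switchback/route',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ userMessage, ladder }),
    },
  };
}

// Usage (inside an async function; response fields are not documented here):
//   const { url, init } = buildRouteRequest(myKey, 'add a dark mode toggle',
//     ['openai/gpt-5.4-mini', 'openai/gpt-5.4']);
//   const response = await fetch(url, init);
//   const payload: unknown = await response.json();
```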
MCP server for Claude Desktop, Cursor, and Cline
Drop Switchback into any Model Context Protocol client as a stdio server. Exposes two tools — classify_turn and recommend_model — that any MCP-aware AI assistant can call to decide whether to spend on a flagship model for the next step.
# In your MCP client's config (e.g. Claude Desktop):
{
  "mcpServers": {
    "switchback": {
      "command": "npx",
      "args": ["-y", "vibekit-switchback-mcp"],
      "env": { "OPENROUTER_API_KEY": "sk-or-..." }
    }
  }
}
When NOT to use Switchback
- You only have one model. Save the classifier latency. Switchback is for cascades — at least two tiers.
- You don't run an agent loop. The mid-turn escalation only helps if you have multiple rounds per user turn. For one-shot chat-completion routing, you want NotDiamond, Martian, or OpenRouter's auto instead.
- Latency-critical chat under 200ms first-token. The classifier adds ~200-500ms before round 0's first token.
Frequently asked questions
What is a cascade model router?
A cascade router classifies each request by complexity, runs it on the cheapest model in your ladder that can plausibly handle it, then escalates to a more capable model mid-turn if the cheap one gives up. Switchback adds five escalation triggers built on real agent execution: empty tool exits, tool-error spikes, continuation exhaustion, retry sentinels, and refusal phrases.
How is this different from NotDiamond or Martian?
NotDiamond and Martian pick a model at request start; Switchback can swap models mid-turn between agent rounds, requires deep agent-loop integration, and stays inside whatever brand-locked ladder you hand it. Your user keeps their own provider across every tier.
Does Switchback cost anything?
The vibekit-switchback library is MIT-licensed and free. You pay for the classifier call (~$0.0001/turn via your OpenRouter key) and for whatever models the cascade runs against on your own key. The hosted endpoint bills $0.001 per call against VibeKit credits.
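For a rough sense of scale, the classifier prices quoted above can be turned into a per-volume estimate. This sketch covers classifier cost only, treats the approximate $0.0001 figure as exact, and works in micro-dollars to avoid floating-point drift; `classifierCostUsd` and the constants are illustrative names.

```typescript
// Prices from this page, expressed in micro-dollars (1 USD = 1,000,000 micro-dollars).
const MICROS_PER_USD = 1000000;
const OWN_KEY_MICROS_PER_CALL = 100;  // ~$0.0001 per turn via your OpenRouter key
const HOSTED_MICROS_PER_CALL = 1000;  // $0.001 per call on the hosted endpoint

// Classifier cost only; model spend for the cascade itself is separate.
function classifierCostUsd(turns: number, microsPerCall: number): number {
  return (turns * microsPerCall) / MICROS_PER_USD;
}

const ownKeyCost = classifierCostUsd(10000, OWN_KEY_MICROS_PER_CALL);
const hostedCost = classifierCostUsd(10000, HOSTED_MICROS_PER_CALL);
```

At 10,000 turns this works out to about $1 of classifier spend on your own key versus $10 through the hosted endpoint.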
What does "brand-locked" mean?
You hand Switchback one ladder per user. An Anthropic OAuth user gets an all-Anthropic ladder. An OpenAI key user gets all OpenAI. Switchback stays inside whatever ladder you give it — your user's bill never crosses into another provider.
Where did Switchback come from?
Born inside VibeKit as the cascade router behind every "Auto" model turn. Extracted as a standalone library so the same approach can ship in any AI agent product. The classifier-first cascade thesis is heavily inspired by Factory.ai's Factory Router — same shape, different implementation.