Switchback — per-turn cascade model router for AI agents

Open-source library that classifies each AI agent turn, starts on the cheapest model in your ladder that can plausibly handle it, then escalates mid-turn only when the cheap one fumbles. Brand-locked. MIT.

$ npm install vibekit-switchback
$ npx vibekit-switchback-mcp     # MCP server for Claude Desktop / Cursor / Cline

What problem does it solve?

Most AI agent products pick one model per app and use it for every turn. A "fix this typo" turn and a "refactor the auth flow" turn both hit the same flagship — wasted spend on the trivial half, no escalation safety on the complex half.

Switchback fixes both ends. A small classifier model (default openai/gpt-5.4-mini, ~$0.0001/call) scores each user turn as trivial, standard, or complex. The agent starts on the cheapest model in your ladder that can plausibly handle the work — then bumps up a tier mid-turn if signals say the cheap one is fumbling. Same agent loop. Same brand. Just spending what the turn actually needs.
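A minimal sketch of how a classifier tier could map to a starting rung. The tier names come from the text above; the TIER_RANK table and the startingModel helper are hypothetical, and the library's actual mapping may differ.

```typescript
type Tier = 'trivial' | 'standard' | 'complex';

// Assumed ordering: a higher rank means a harder turn.
const TIER_RANK: Record<Tier, number> = { trivial: 0, standard: 1, complex: 2 };

// Hypothetical helper: pick the rung whose index matches the tier rank,
// clamped so that a short ladder still yields a valid model.
function startingModel(tier: Tier, ladder: readonly string[]): string {
  if (ladder.length === 0) {
    throw new Error('ladder must contain at least one model');
  }
  const index = Math.min(TIER_RANK[tier], ladder.length - 1);
  return ladder[index];
}

const exampleLadder = ['openai/gpt-5.4-mini', 'openai/gpt-5.4', 'openai/gpt-5.5'];
console.log(startingModel('trivial', exampleLadder)); // openai/gpt-5.4-mini
console.log(startingModel('complex', exampleLadder.slice(0, 2))); // openai/gpt-5.4
```

The clamp matters for two-rung ladders: a complex turn still starts on a model that exists in the ladder you supplied.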

How the cascade works

Round | What happens | If it works | If it fumbles
0 | Classifier picks initial tier; agent runs with cheapest model that can handle it | Done. Telemetry logs cheap tier. | Empty tool-exit? Refusal? Escalate.
1+ | Next round runs on the bumped tier with new x-openclaw-model header (no DB write, no reload) | Done. Telemetry logs both tiers. | Three tool errors in steps? Escalate again.
Top | Flagship runs the rest of the turn | Done. Telemetry logs full climb. | Capped: never exceeds your ladder.
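The climb in the table can be sketched as a pure function. This is an illustration under stated assumptions, not the library's internals: bumpTier, headersForRound, and simulateClimb are hypothetical names, and only the x-openclaw-model header name comes from the text.

```typescript
// Move one rung up the ladder, capped at the top (flagship) index.
function bumpTier(currentIndex: number, ladder: readonly string[]): number {
  return Math.min(currentIndex + 1, ladder.length - 1);
}

// Build the per-round override header named in the table.
function headersForRound(model: string): Record<string, string> {
  return { 'x-openclaw-model': model };
}

// Simulate a turn: each entry in `fumbles` says whether that round fumbled.
// Returns the model used for every round that actually ran.
function simulateClimb(
  ladder: readonly string[],
  startIndex: number,
  fumbles: readonly boolean[],
): string[] {
  const modelsUsed: string[] = [];
  let index = startIndex;
  for (const fumbled of fumbles) {
    modelsUsed.push(ladder[index]);
    if (!fumbled) break; // the round succeeded, so the turn ends here
    index = bumpTier(index, ladder); // otherwise climb, never past the top
  }
  return modelsUsed;
}
```

Because bumpTier clamps at the last index, a turn that keeps fumbling stays on the flagship instead of leaving the ladder.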

What makes Switchback different

Mid-turn escalation

Other model routers like NotDiamond and Martian pick a model at request start and that's that. Switchback can swap models between agent rounds if the cheap tier gives up. Requires deep agent-loop integration; catches failures other routers can't.

Brand-locked by design

You hand Switchback one ladder per user. An Anthropic OAuth user gets an all-Anthropic ladder (Haiku → Sonnet → Opus). An OpenAI key user gets all OpenAI (mini → gpt-5.4 → gpt-5.5). Your user's bill stays inside the provider they connected — every tier.
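One way to keep ladders brand-locked in your own code is to key them by the provider the user connected. A sketch, assuming model IDs use a provider/model format as in the quick start; the Anthropic IDs below are placeholders, not real model identifiers, and ladderFor and isBrandLocked are hypothetical helpers.

```typescript
type Provider = 'anthropic' | 'openai';

const LADDERS: Record<Provider, readonly string[]> = {
  // Placeholder IDs standing in for Haiku, Sonnet, and Opus.
  anthropic: ['anthropic/haiku', 'anthropic/sonnet', 'anthropic/opus'],
  openai: ['openai/gpt-5.4-mini', 'openai/gpt-5.4', 'openai/gpt-5.5'],
};

// Return the single ladder for the provider the user connected.
function ladderFor(provider: Provider): readonly string[] {
  return LADDERS[provider];
}

// Sanity check: every rung shares the same "provider/" prefix.
function isBrandLocked(ladder: readonly string[]): boolean {
  const brands = new Set(ladder.map((model) => model.split('/')[0]));
  return brands.size === 1;
}
```

Running isBrandLocked at startup is a cheap guard against a mixed ladder slipping into config.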

Agent-loop signals

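Five escalation triggers built on real agent execution, not chat-completion patterns: empty tool exits, tool-error spikes, continuation exhaustion, retry sentinels, and refusal phrases.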
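A rough sketch of how triggers like these might be checked after a round. The RoundSignals shape, the threshold of three tool errors, and the refusal phrase list are all assumptions made for illustration; the library's real signal format may look different.

```typescript
type Trigger =
  | 'empty_tool_exit'
  | 'tool_error_spike'
  | 'continuation_exhaustion'
  | 'retry_sentinel'
  | 'refusal_phrase';

// Hypothetical per-round signal shape.
interface RoundSignals {
  emptyToolExit: boolean;
  toolErrorCount: number;
  continuationsExhausted: boolean;
  retrySentinelSeen: boolean;
  responseText: string;
}

const TOOL_ERROR_THRESHOLD = 3; // assumed threshold
const REFUSAL_PHRASES = ["i can't help with", 'i am unable to']; // illustrative only

// Return the first trigger that fires, or null when the round looks healthy.
function firstTrigger(signals: RoundSignals): Trigger | null {
  if (signals.emptyToolExit) return 'empty_tool_exit';
  if (signals.toolErrorCount >= TOOL_ERROR_THRESHOLD) return 'tool_error_spike';
  if (signals.continuationsExhausted) return 'continuation_exhaustion';
  if (signals.retrySentinelSeen) return 'retry_sentinel';
  const text = signals.responseText.toLowerCase();
  if (REFUSAL_PHRASES.some((phrase) => text.includes(phrase))) return 'refusal_phrase';
  return null;
}
```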

Telemetry baked in

Every cascade.telemetry(state) call returns a JSON record ready to write to your logs: initial model, final model, escalation count, per-trigger event log, per-tier tokens. Ship the same telemetry contract that VibeKit ships internally.
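For logging, the record could be typed and serialized along these lines. The field names follow the quick-start comment (initial_model, final_model, escalations, signals, per_tier_tokens); the field types, the entry shape inside signals, and the toLogLine helper are assumptions.

```typescript
// Assumed types for the telemetry record; only the field names are from the docs.
interface TelemetryRecord {
  initial_model: string;
  final_model: string;
  escalations: number;
  signals: { round: number; reason: string }[];
  per_tier_tokens: Record<string, number>;
}

// Hypothetical helper: emit one JSON line per turn, with a caller-supplied timestamp.
function toLogLine(record: TelemetryRecord, timestamp: string): string {
  return JSON.stringify({ ts: timestamp, ...record });
}

const sampleRecord: TelemetryRecord = {
  initial_model: 'openai/gpt-5.4-mini',
  final_model: 'openai/gpt-5.4',
  escalations: 1,
  signals: [{ round: 0, reason: 'empty_tool_exit' }],
  per_tier_tokens: { 'openai/gpt-5.4-mini': 1200, 'openai/gpt-5.4': 3400 },
};
console.log(toLogLine(sampleRecord, new Date().toISOString()));
```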

Quick start

import { createCascade, createOpenRouterClassifier } from 'vibekit-switchback';

const cascade = createCascade({
  ladder: ['openai/gpt-5.4-mini', 'openai/gpt-5.4', 'openai/gpt-5.5'],
  classify: createOpenRouterClassifier({ apiKey: process.env.OPENROUTER_API_KEY }),
});

const state = await cascade.init({ userMessage: 'add a dark-mode toggle' });

// Run the first agent round on state.initialModel.
// After each round, call cascade.evaluate(state, signals, round). If the
// returned decision says to escalate, call
// cascade.escalate(state, decision.reason, round, tokensSoFar).

const telemetry = cascade.telemetry(state);
// { initial_model, final_model, escalations, signals[], per_tier_tokens }

Full API reference in the GitHub README.
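The round loop described in the comments might be wired up like this. It is a sketch against a stand-in CascadeLike interface rather than the real package types: the argument order of evaluate and escalate follows the quick start, while the decision.escalate flag, synchronous return values, the RoundResult shape, and the runRound callback are assumptions. Check the README for the actual signatures.

```typescript
// Assumed decision shape; only `reason` appears in the quick start.
interface Decision {
  escalate: boolean;
  reason: string;
}

// Stand-in for the cascade object, limited to the two calls used here.
interface CascadeLike<S> {
  evaluate(state: S, signals: unknown, round: number): Decision;
  escalate(state: S, reason: string, round: number, tokensSoFar: number): void;
}

// Hypothetical result of running one agent round in your own loop.
interface RoundResult {
  signals: unknown;
  tokens: number;
  done: boolean;
}

// Drive one turn; returns the number of rounds that ran.
async function runTurn<S>(
  cascade: CascadeLike<S>,
  state: S,
  runRound: (state: S, round: number) => Promise<RoundResult>,
  maxRounds = 8,
): Promise<number> {
  let tokensSoFar = 0;
  for (let round = 0; round < maxRounds; round++) {
    const result = await runRound(state, round);
    tokensSoFar += result.tokens;
    const decision = cascade.evaluate(state, result.signals, round);
    if (decision.escalate) {
      cascade.escalate(state, decision.reason, round, tokensSoFar);
      continue; // next round runs on the bumped tier
    }
    if (result.done) return round + 1;
  }
  return maxRounds; // safety cap so a stuck turn cannot loop forever
}
```

The maxRounds cap is a design choice of this sketch, separate from the ladder cap the cascade enforces.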

Hosted classifier endpoint

Don't want to manage an OpenRouter key just for the classifier? Hit the hosted endpoint with a VibeKit API key — $0.001 per call billed against credits:

$ curl https://vibekit.bot/api/v1/switchback/route \
    -H "Authorization: Bearer vk_..." \
    -H "Content-Type: application/json" \
    -d '{"userMessage": "add a dark mode toggle",
         "ladder": ["openai/gpt-5.4-mini", "openai/gpt-5.4"]}'

Get a key at app.vibekit.bot.
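The same request from TypeScript, as a sketch using the global fetch available in Node 18+ and browsers. The URL, headers, and body fields mirror the curl call; the response format is not documented here, so the result is left untyped. buildRouteRequest and routeTurn are hypothetical helper names.

```typescript
interface RouteRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the request separately so it can be inspected or tested without network access.
function buildRouteRequest(apiKey: string, userMessage: string, ladder: string[]): RouteRequest {
  return {
    url: 'https://vibekit.bot/api/v1/switchback/route',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ userMessage, ladder }),
    },
  };
}

// Send the request; the response shape is unknown, so return it as-is.
async function routeTurn(apiKey: string, userMessage: string, ladder: string[]): Promise<unknown> {
  const { url, init } = buildRouteRequest(apiKey, userMessage, ladder);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Switchback route request failed: HTTP ${response.status}`);
  }
  return response.json();
}
```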

MCP server for Claude Desktop, Cursor, and Cline

Drop Switchback into any Model Context Protocol client as a stdio server. Exposes two tools — classify_turn and recommend_model — that any MCP-aware AI assistant can call to decide whether to spend on a flagship model for the next step.

# In your MCP client's config (e.g. Claude Desktop):
{
  "mcpServers": {
    "switchback": {
      "command": "npx",
      "args": ["-y", "vibekit-switchback-mcp"],
      "env": { "OPENROUTER_API_KEY": "sk-or-..." }
    }
  }
}

When NOT to use Switchback

Frequently asked questions

What is a cascade model router?

A cascade router classifies each request by complexity, runs it on the cheapest model in your ladder that can plausibly handle it, then escalates to a more capable model mid-turn if the cheap one gives up. Switchback adds five escalation triggers built on real agent execution: empty tool exits, tool-error spikes, continuation exhaustion, retry sentinels, and refusal phrases.

How is this different from NotDiamond or Martian?

NotDiamond and Martian pick a model at request start. Switchback can swap models mid-turn between agent rounds, which requires deep agent-loop integration, and it stays inside whatever brand-locked ladder you hand it. Your user keeps their own provider across every tier.

Does Switchback cost anything?

The vibekit-switchback library is MIT-licensed and free. You pay for the classifier call (~$0.0001/turn via your OpenRouter key) and for whatever models the cascade runs against on your own key. The hosted endpoint bills $0.001 per call against VibeKit credits.
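To put those classifier prices in perspective, here is a back-of-the-envelope calculation. It covers the classifier call only, not the models the cascade runs, and treats the approximate $0.0001 figure as exact. The helper name and the 10,000-turn volume are illustrative.

```typescript
// Prices from the text, expressed in micro-dollars to keep the arithmetic exact.
const OWN_KEY_CLASSIFIER_MICROS = 100; // ~$0.0001 per turn via your OpenRouter key
const HOSTED_ENDPOINT_MICROS = 1000; // $0.001 per call via the hosted endpoint

// Classifier-only cost in USD for a given number of turns.
function classifierCostUsd(turns: number, microsPerCall: number): number {
  return (turns * microsPerCall) / 1000000;
}

console.log(classifierCostUsd(10000, OWN_KEY_CLASSIFIER_MICROS)); // 1
console.log(classifierCostUsd(10000, HOSTED_ENDPOINT_MICROS)); // 10
```

At 10,000 turns, the classifier costs about $1 on your own key and $10 through the hosted endpoint.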

What does "brand-locked" mean?

You hand Switchback one ladder per user. An Anthropic OAuth user gets an all-Anthropic ladder. An OpenAI key user gets all OpenAI. Switchback stays inside whatever ladder you give it — your user's bill never crosses into another provider.

Where did Switchback come from?

Born inside VibeKit as the cascade router behind every "Auto" model turn. Extracted as a standalone library so the same approach can ship in any AI agent product. The classifier-first cascade thesis is heavily inspired by Factory.ai's Factory Router — same shape, different implementation.

Switchback powers VibeKit's "Auto" mode
Every Auto turn in our hosted product runs through this cascade. Try the whole platform — every app gets its own AI agent.