Running SEO for a real business with an AI agent — the playbook we use on vibekit.bot

We're a tiny team. Our SEO work — keyword research, baseline capture, query-intent audits, decision logs, blog drafts — is mostly driven by an AI agent reading a single instruction file. Here's the exact setup, what it does well, what we still do by hand, and the artifacts you can copy.

Why this post exists

SEO at a small company is a tax on time you don't have. There's a pull-the-numbers task, a write-the-baseline task, a re-read-last-month's-decisions task, a check-the-hold-window task, a draft-the-meta-rewrite task. None of them are hard. All of them want an hour, twice a week, forever. They never get done because they sit beneath whatever ships next.

We ran into this exact problem at VibeKit and stopped doing it manually. Today the work happens by handing a single instruction file to an AI agent, watching it pull data, propose changes, and check itself against rules we wrote once. We don't do the rote part. We do the strategy part. That ratio — agent on the loop, human on the strategy — is what makes any of it sustainable for a team of one.

This post is the exact setup. It's not theoretical. The file the agent reads is checked into our repo, the scripts are real, and the workflow has caught real mistakes (including one that nearly pruned 69% of our impression base on stale data).

The core artifact: one instruction file the agent reads

Everything starts with a single markdown file at docs/seo.md — about 300 lines, version-controlled, read by the agent on every relevant turn. It has six sections:

Workflow rules — "capture a baseline before editing any SEO-relevant file", "14-day hold per URL", "one change per commit". These are the guardrails. Without them the agent will happily rewrite three meta tags in one commit and we'll never know which one moved the needle.
Open Hypotheses table — what's currently in flight, when it ships, when it's eligible for review, what the verdict is.
Decisions table — resolved hypotheses with the verdict and the reasoning. Lost / Won / Inconclusive. This is the institutional memory; without it we repeat the same losing experiments quarterly.
Watch list — pages getting organic movement (impression growth or position drift) without a deliberate edit. Recorded so the agent has the full picture at review time.
Strategic updates — when a thesis changes (e.g. "we were optimizing the wrong pages all along"), the reasoning gets a dated entry. The agent reads these to understand current intent, not just current rules.
Tools — exact script paths and what each does.

The file is the agent's brain. It's also legible to humans. When I hand off this work, I hand off this file.

The structure is straightforward enough to copy from this post alone: six sections, each with a clear job. The scripts the file references are four short Python files (~50 lines each), small enough to rebuild from the descriptions in the toolchain section below.
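If the agent is allowed to edit docs/seo.md, one cheap guard is a check that the skeleton survives each session. This is a minimal sketch. It assumes each of the six sections is a markdown heading that starts with the names used above. The real file's heading text is an assumption, and nothing in this post says VibeKit runs such a check.

```python
import re

# Assumed heading names, taken from the six sections described above.
REQUIRED_SECTIONS = [
    "Workflow rules",
    "Open Hypotheses",
    "Decisions",
    "Watch list",
    "Strategic updates",
    "Tools",
]

# A markdown ATX heading: 1-6 '#', a space, the text, optional closing '#'s.
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*#*\s*$", re.MULTILINE)


def find_sections(markdown_text: str) -> list:
    """Return the text of every markdown heading, in document order."""
    return [m.group(1).strip() for m in HEADING.finditer(markdown_text)]


def missing_sections(markdown_text: str) -> list:
    """List required sections that no heading starts with (case-insensitive)."""
    headings = [h.lower() for h in find_sections(markdown_text)]
    return [
        name
        for name in REQUIRED_SECTIONS
        if not any(h.startswith(name.lower()) for h in headings)
    ]
```

To use it, call missing_sections(Path("docs/seo.md").read_text()) from a script or CI step. A non-empty result means a session deleted or renamed a section the agent depends on.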

The agent's actual job, broken down

Here's what happens when I tell the agent "pull GSC and tell me what to update today":

Run scripts/gsc-recent.py — pulls 7-day window vs prior 7-day from Google Search Console via service-account auth.
Compare to numbers cited in seo.md — flags drift (e.g. "the watch list says /claude-code is pos 8.2, current data shows pos 9.6 — drift accelerating").
Cross-reference against the rules — for each page that needs an edit, check the 14-day hold (when was the last commit touching it?). Skip if too soon.
Check the Open Hypotheses table — if a page is in a pending hypothesis with an eligibility date that's passed, flag it for resolution.
Propose specific edits — not "update the title", but "current title: X, proposed: Y, reason: Z, eligibility window: now-clear". Each one is a single-vector change.
Wait for me to approve.

Most of this is rote. The agent doesn't decide whether we should rewrite a meta — that's a strategy call that depends on what we're trying to prove this quarter. It surfaces options and grounds them in the data.
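The first two steps above come down to date arithmetic and a comparison. This is a minimal sketch under a few assumptions. It uses a 2-day reporting lag, because GSC's newest days are usually incomplete; tune that number to what your property shows. The 1.0-position drift threshold is hypothetical, and the function names are illustrative, not the ones in gsc-recent.py.

```python
from datetime import date, timedelta


def comparison_windows(today: date, days: int = 7, lag_days: int = 2):
    """Return ((cur_start, cur_end), (prev_start, prev_end)) as inclusive date ranges.

    lag_days skips the newest days, which GSC usually hasn't finished reporting;
    2 is an assumed default, not a documented constant.
    """
    cur_end = today - timedelta(days=lag_days)
    cur_start = cur_end - timedelta(days=days - 1)
    prev_end = cur_start - timedelta(days=1)
    prev_start = prev_end - timedelta(days=days - 1)
    return (cur_start, cur_end), (prev_start, prev_end)


def position_drift(cited: float, current: float, threshold: float = 1.0):
    """Compare a position cited in seo.md with fresh GSC data.

    GSC average position is 1-based and lower is better, so a positive delta
    means the page slipped. Returns a message when |delta| >= threshold, else None.
    """
    delta = round(current - cited, 1)
    if abs(delta) < threshold:
        return None
    direction = "slipped" if delta > 0 else "improved"
    return f"cited pos {cited}, current pos {current}: {direction} by {abs(delta)}"


if __name__ == "__main__":
    current, prior = comparison_windows(date.today())
    print(f"current {current[0]}..{current[1]}, prior {prior[0]}..{prior[1]}")
    # The watch-list example from the post: seo.md says 8.2, fresh data says 9.6.
    print("/claude-code:", position_drift(8.2, 9.6))
```

The two windows never overlap, and they are the same length. That keeps a 7d-vs-prior-7d delta comparable even when the lag setting changes.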

What the agent does NOT do without human review:

Edit any landing page or blog file. The 14-day hook still blocks commits; the agent respects it.
Resolve a hypothesis as Won or Lost. Those are calls I make, written into the Decisions table by hand (or with the agent drafting and me approving the verbatim text).
Decide strategic direction. When we shifted from "optimize existing pages" to "expand right-audience surface" on 2026-05-29, that was a one-hour conversation between me and the agent — and the conversation itself became a dated update block in seo.md so future-me and future-agent both have the reasoning.

The toolchain (it's smaller than you'd think)

Four Python scripts and one git hook. That's it.

scripts/seo-baseline.py <path> — captures a 7-day GSC snapshot for a URL and prints a markdown row ready to paste into the Open Hypotheses table. Forces the baseline-first rule by being the path of least resistance.
scripts/gsc-recent.py — 7d vs prior 7d totals + top pages + top queries. Run before any meaningful edit decision.
scripts/gsc-perf.py — 28d vs prior 28d. The monthly trend read.
scripts/bing-pull.py — Bing Webmaster Tools (because ChatGPT Search and Microsoft Copilot use Bing's index).
.githooks/pre-commit — blocks commits that touch SEO-gated files (src/routes/static.ts, content/blog/**, src/routes/blog.ts, public/sitemap*.xml, public/robots.txt) if the last commit touching them was less than 14 days ago. The agent knows how this hook behaves, so it won't try to commit an edit that's going to be blocked.

The three GSC scripts read from ~/.config/vibekit/gsc-key.json, a GCP service-account key used with the read-only Search Console scope. bing-pull.py reads its API key from an env var (BING_WEBMASTER_KEY). Both setups are 10 minutes of one-time work.
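The post's hook is a short bash script. A Git hook can be any executable file with a shebang, so the same 14-day rule can also be sketched in Python. This sketch assumes the hold is tracked per file rather than per URL, and that the gated paths are exactly the ones listed above. It is not the hook in the VibeKit repo.

```python
#!/usr/bin/env python3
"""Pre-commit sketch: block commits to SEO-gated files inside the 14-day hold."""
import fnmatch
import subprocess
import sys
import time

# Gated paths from the post. fnmatch's "*" also matches "/", so
# "content/blog/*" covers nested files the way "content/blog/**" intends.
GATED_PATTERNS = [
    "src/routes/static.ts",
    "src/routes/blog.ts",
    "content/blog/*",
    "public/sitemap*.xml",
    "public/robots.txt",
]
HOLD_SECONDS = 14 * 24 * 60 * 60


def is_gated(path: str) -> bool:
    return any(fnmatch.fnmatchcase(path, pattern) for pattern in GATED_PATTERNS)


def hold_violations(last_commit_ts: dict, now: float, hold_seconds: int = HOLD_SECONDS) -> list:
    """Return gated paths whose last commit is younger than the hold window.

    last_commit_ts maps path -> Unix timestamp of the last commit touching it,
    or None when the file has no history yet (new files are never held).
    """
    return sorted(
        path
        for path, ts in last_commit_ts.items()
        if is_gated(path) and ts is not None and now - ts < hold_seconds
    )


def git(*args: str) -> str:
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout


def main() -> int:
    """Check staged files; return a non-zero exit code if any are still on hold."""
    staged = git("diff", "--cached", "--name-only").splitlines()
    last = {}
    for path in staged:
        # %ct is the committer timestamp of the newest commit touching the path.
        out = git("log", "-1", "--format=%ct", "--", path).strip()
        last[path] = int(out) if out else None
    blocked = hold_violations(last, time.time())
    for path in blocked:
        print(f"pre-commit: {path} last changed <14 days ago; hold not cleared", file=sys.stderr)
    return 1 if blocked else 0
```

To use this as the actual hook, end the file with sys.exit(main()) and make it executable (chmod +x). If the hooks live in .githooks/ as they do here, Git only runs them after git config core.hooksPath .githooks.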

Giving the agent access to GSC (without giving it the keys)

This is the part most people overthink, so it's worth a section of its own.

The agent does not need an OAuth flow, an MCP server, or a credential vault to use Google Search Console. It needs a script on disk that knows how to authenticate, and the ability to run that script. That's it.

Here's the whole pattern:

Service-account key on disk, in a known path. GCP gives you a JSON file with a private key when you create a service account. Put it at ~/.config/vibekit/gsc-key.json (or wherever your scripts expect). Lock the file with chmod 600. The agent never sees the contents; only the script does.
Script authenticates and prints structured output. A handful of lines of Python: load the key, request a token, hit the Search Console API, print formatted rows to stdout. No streaming, no callbacks, no state. Just text on stdout.
Agent runs the script via Bash. It sees the printed table or markdown. It doesn't see the key file, doesn't handle the token, doesn't know your GCP project ID. In least-privilege terms, the agent gets exactly the access that reading the script's output gives it, and nothing more.

The actual auth snippet (this is the entirety of the credential-handling code across all three GSC scripts):

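import os

from google.oauth2 import service_account
from google.auth.transport.requests import Request

# Service-account key on disk (chmod 600); only this script reads it.
KEY_PATH = os.path.expanduser('~/.config/vibekit/gsc-key.json')
creds = service_account.Credentials.from_service_account_file(
    KEY_PATH,
    # Read-only scope: the token can query GSC data but never modify the property.
    scopes=['https://www.googleapis.com/auth/webmasters.readonly'])
creds.refresh(Request())  # exchanges a signed JWT for a short-lived access token
token = creds.token  # use as Bearer in any GSC API call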

That's it. No env vars, no secret manager, no per-request OAuth dance. The token's lifetime is managed by the Google client library; you never serialize it.
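Here is one way to spend that token using only the standard library. This is a sketch against the documented Search Analytics query endpoint (POST .../webmasters/v3/sites/{siteUrl}/searchAnalytics/query). The helper names and the markdown column layout are hypothetical; they are not the post's actual scripts.

```python
import json
import urllib.parse
import urllib.request

API = "https://www.googleapis.com/webmasters/v3/sites/{site}/searchAnalytics/query"


def build_query(site_url: str, start: str, end: str, dimension: str = "page", limit: int = 10):
    """Return (url, body) for a Search Analytics query; siteUrl must be percent-encoded."""
    url = API.format(site=urllib.parse.quote(site_url, safe=""))
    body = {"startDate": start, "endDate": end, "dimensions": [dimension], "rowLimit": limit}
    return url, body


def run_query(token: str, url: str, body: dict) -> dict:
    """POST the query using the bearer token from the auth snippet."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def to_markdown(response: dict) -> str:
    """Render response rows as a markdown table for the agent to read on stdout."""
    lines = ["| key | clicks | impressions | ctr | position |", "|---|---|---|---|---|"]
    for row in response.get("rows", []):  # "rows" is absent when there is no data
        lines.append(
            f"| {row['keys'][0]} | {row['clicks']:.0f} | {row['impressions']:.0f} "
            f"| {row['ctr']:.1%} | {row['position']:.1f} |"
        )
    return "\n".join(lines)
```

The site URL depends on the property type. A domain property takes the form sc-domain:example.com; a URL-prefix property uses the full prefix, such as https://example.com/. A call would then look like print(to_markdown(run_query(token, *build_query("sc-domain:example.com", "2026-06-02", "2026-06-08")))).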

Why this beats MCP for this job

A lot of recent agent tutorials reach for MCP (Model Context Protocol) to give agents access to third-party APIs. MCP is great for interactive tools — file system, shell, browser. For read-only data APIs like GSC, it's overkill and adds three problems:

An extra server to run. MCP needs a listener process that the agent talks to. One more thing to start, monitor, and restart.
Vendor coupling. MCP works well with Claude Desktop. It's still rough on other agents and CLIs as of mid-2026.
Less auditable. A script you wrote yourself is 50 lines you read once and forget about. An MCP server is a moving dependency.

Compare: a Python script + a service-account key. The agent runs the script, parses the output, makes decisions. If the agent changes (you swap Claude for something else next year), the script doesn't change. If the API output format changes, you edit one file. There's nothing else in the loop.

This pattern works for almost any "give my agent read access to X" problem: GSC, Bing Webmaster, Stripe, GitHub stats, Airtable, your own database. Write a script with a service-account or read-only API key, hand the agent the script. Done.

The minimum GCP setup (10 minutes, one time)

For people who haven't done this before:

In Google Cloud Console, create a new service account in a project. Name it something like vibekit-gsc-readonly. No roles needed at the GCP project level.
From its "Keys" tab, create a new JSON key. Download the file.
Save it: mkdir -p ~/.config/vibekit && mv ~/Downloads/your-key-file.json ~/.config/vibekit/gsc-key.json && chmod 600 ~/.config/vibekit/gsc-key.json.
In Google Search Console, open your property → Settings → Users and permissions → Add user. The email is the service account address (it looks like vibekit-gsc-readonly@your-project-id.iam.gserviceaccount.com). Set the permission to Restricted; that's enough for read access.
Enable the Search Console API on the GCP project.
Run python3 scripts/gsc-recent.py. If the auth works, you'll see your last 7 days of GSC data. If it doesn't, the error is almost always "service account isn't a property user": re-do the Users and permissions step above.

The whole thing — service account, key file, GSC property add, API enable — is ~10 minutes. The script is then permanent. Every agent you ever use against this codebase reads from the same key file with the same scope.
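Most setup failures can be caught before the first API call. This is a minimal preflight sketch that assumes the key path used above. It checks that the file exists, is chmod 600, and looks like a service-account key, then prints the client_email to paste into Search Console. It cannot verify the Search Console user grant; only a real API call can do that.

```python
import json
import os
import stat
import sys

DEFAULT_KEY = os.path.expanduser("~/.config/vibekit/gsc-key.json")


def check_key_file(path: str) -> list:
    """Return problems with a service-account key file; an empty list means it looks usable."""
    if not os.path.isfile(path):
        return [f"{path} does not exist"]
    problems = []
    # Group/other permission bits should all be off (chmod 600 or stricter).
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path} has mode {oct(mode)}; run chmod 600")
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError) as exc:  # JSONDecodeError is a ValueError
        return problems + [f"{path} is not readable JSON: {exc}"]
    if not isinstance(data, dict) or data.get("type") != "service_account":
        problems.append('key file "type" is not "service_account"')
    elif not data.get("client_email"):
        problems.append("key file has no client_email")
    return problems


def main() -> int:
    """CLI entry point: check the key, then print the address to add in GSC."""
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_KEY
    problems = check_key_file(path)
    for problem in problems:
        print("key check:", problem)
    if problems:
        return 1
    with open(path, encoding="utf-8") as f:
        # This address goes into Search Console > Settings > Users and permissions.
        print("add as a Restricted user:", json.load(f)["client_email"])
    return 0
```

Wrapped in a script that ends with sys.exit(main()), this makes a quick first command to run before gsc-recent.py on a new machine.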

Why scripts and not direct MCP tool calls

To restate the principle: the agent might be Claude today and something else tomorrow. Scripts that read a key file and print structured output work for any agent. The instruction file (seo.md) tells the agent which script to run for which task. No vendor lock-in below the agent layer, no credentials touched by the model, no MCP server to babysit.

The honest limits

A few things this setup does not solve:

The agent can write a great blog post draft, but it can't decide what to write about. Topic selection still requires human input — looking at what's converting, what queries we're underserving, what we have a unique perspective on. The agent surfaces the underserved queries; I pick which one matters.
The agent doesn't proofread its own writing for voice. Generic AI-generated posts read as generic AI-generated posts. We use Claude for drafts, I rewrite the parts that don't sound like me. The post you're reading is exactly that — agent-drafted, human-edited for voice and the specific war stories.
Quarterly strategy is mine. The agent runs the playbook. It doesn't decide whether the playbook is the right playbook this quarter. When we shifted from optimizing existing pages to expanding the blog (this post is part of that), the agent surfaced the data — but the call to change strategy was made over coffee, not in a tool call.
Some questions the agent can inform but can't answer. "Should we go after the lovable-alternative query?" is a market-positioning question. The data input ("Lovable is the hottest AI app-builder, no existing page") comes from the agent. The decision is human and durable.

What you'd ship in your first weekend

If you want to copy this for your own business:

Write docs/seo.md (or whatever you call it). Even a 50-line version. Sections: Workflow rules, Open Hypotheses, Decisions, Watch list, Tools. Don't worry about content; the structure is the point.
Set up GSC API access. Create a GCP service account, enable the Search Console API, and add the service account as a Restricted user on your GSC property. Save the JSON key to ~/.config/yourapp/gsc-key.json. Total time: 20 minutes.
Write one script. Just gsc-recent.py. The other tools can wait. Get the 7d-vs-prior-7d comparison working and reading from your service-account key.
Write one rule. Just the 14-day hold. Pre-commit hook. One file, ten lines of bash. (Our hook's at .githooks/pre-commit if you want a starting point.)
Hand the file to your agent and ask it to "review SEO and propose changes." It will pull GSC, read the rules, propose 2-3 edits. You'll see immediately whether your instruction file is precise enough — if the agent suggests something the file should have caught, the file is missing a rule. Add it. The file gets smarter with every session.
Ship one experiment. Pick the smallest possible change with a captured baseline. Wait 14 days. Resolve. Add the verdict to Decisions.
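The bookkeeping in that last step is a date and a table row. This sketch assumes a hypothetical Open Hypotheses column layout (URL, change, shipped, eligible, baseline, verdict). The post doesn't show the columns seo-baseline.py actually prints.

```python
from dataclasses import dataclass
from datetime import date, timedelta

HOLD = timedelta(days=14)  # the 14-day hold from the workflow rules


@dataclass
class Hypothesis:
    url: str
    change: str
    shipped: date
    baseline_clicks: int
    baseline_impressions: int
    baseline_position: float

    @property
    def eligible(self) -> date:
        """First day the change can be reviewed."""
        return self.shipped + HOLD

    def row(self) -> str:
        """One markdown row for the Open Hypotheses table (column set is hypothetical)."""
        return (
            f"| {self.url} | {self.change} | {self.shipped.isoformat()} | "
            f"{self.eligible.isoformat()} | {self.baseline_clicks} clicks / "
            f"{self.baseline_impressions} impr / pos {self.baseline_position:.1f} | pending |"
        )


def due_for_review(hypotheses, today: date) -> list:
    """Hypotheses past their hold window that still need a Won/Lost/Inconclusive call."""
    return [h for h in hypotheses if today >= h.eligible]
```

Storing the eligibility date in the row, instead of recomputing it later, means both the agent and a human reading seo.md can see when a verdict is due.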

You're now running the loop. It's not glamorous and it's not magic. It's just bookkeeping with a teammate that never forgets to check.

The compound effect

The seo.md file at the start of May 2026 was 50 lines and had two rules. Today it's 350+ lines, has 12 resolved decisions, six open hypotheses, a watch list, a content backlog, and a Bing strategy. Every single addition came from an agent session — either the agent flagging a gap, me catching the agent in a near-mistake, or both of us noticing a drift and writing the new rule into the file so the next session inherits it.

The file is the institutional memory of our SEO work. It's not a vanity exercise. When I'm gone for a week, the agent reads it and picks up exactly where the work was. When something goes wrong, the post-mortem goes into the Decisions table and becomes a rule. The thing compounds at the speed we encounter new failure modes.

This is the part of AI-agent-assisted work I find most underrated. It's not the speed of the individual tasks. It's that every session deposits a rule that future sessions inherit. The file grows. The work gets cheaper to do. You build a moat in your own ops.

If you're a small team doing growth work with finite hours, this is one of the highest-leverage things you can set up. Cost: one weekend. Payoff: SEO that actually compounds.