Running Google Ads from the Terminal with Custom AI Skills

TLDR: A paid search manager at Kilo built a terminal-based workflow for managing Google Ads using custom AI skills and CLI tooling. Every mutating operation ships with a dry-run default, the system accumulates API knowledge across sessions, and model selection is deliberate — expensive models for strategy, cheaper ones for mechanical execution.

Summary:

The premise here is deceptively simple: most paid search work is not intellectually hard, it is just tedious. Renaming 40 ad groups, building a campaign from a spec, auditing tracking templates across an account: none of these tasks requires real judgment, but each involves enough manual steps that a mistake is almost inevitable when done by hand. The author, who has been in paid search for 12 years, decided to replace the Google Ads UI almost entirely with a CLI workflow built on Kilo's tooling and a set of custom skills written in plain markdown.

The architecture is worth understanding in detail. There are three layers: rules, which are always active and serve as the "do not shoot yourself in the foot" guardrails; skills, which load on demand for specific types of work; and scripts, which are the actual Python files making API calls. The rules file encodes conventions that must never be violated — no live API writes without an explicit execute flag, new campaigns always created paused, every mutating script ships with a dry-run mode that prints a full diff before touching anything. The skills are reference documents written for a future session that has forgotten everything. They list existing scripts, what each one does, which patterns it demonstrates, and any API quirks discovered along the way. The scripts themselves are generated by agents reading those skills, not written from scratch each time.
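The dry-run convention described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's script: the `--execute` flag name, the ad group dictionaries, and the `apply_change` callback (a stand-in for whatever function performs the real API write) are all assumptions made for the example.

```python
import argparse


def plan_renames(ad_groups, old_prefix, new_prefix):
    """Return (id, old_name, new_name) for every ad group whose name would change."""
    changes = []
    for group in ad_groups:
        name = group["name"]
        if name.startswith(old_prefix):
            changes.append((group["id"], name, new_prefix + name[len(old_prefix):]))
    return changes


def format_diff(changes):
    """Render the planned changes as a reviewable text diff."""
    lines = []
    for group_id, old, new in changes:
        lines.append(f"ad_group {group_id}")
        lines.append(f"  - {old}")
        lines.append(f"  + {new}")
    return "\n".join(lines)


def run(argv, ad_groups, apply_change):
    """Print the diff, then write only if --execute was passed.

    apply_change is a hypothetical callback standing in for the real API call.
    Returns the number of changes actually written.
    """
    parser = argparse.ArgumentParser(description="Rename ad groups (dry-run by default).")
    parser.add_argument("--old-prefix", required=True)
    parser.add_argument("--new-prefix", required=True)
    parser.add_argument("--execute", action="store_true",
                        help="Actually apply the changes; without it nothing is written.")
    args = parser.parse_args(argv)

    changes = plan_renames(ad_groups, args.old_prefix, args.new_prefix)
    print(format_diff(changes))

    if not args.execute:
        print(f"DRY RUN: {len(changes)} change(s) planned, nothing written.")
        return 0

    for group_id, _old, new in changes:
        apply_change(group_id, new)
    print(f"EXECUTED: {len(changes)} change(s) written.")
    return len(changes)
```

The point of the design is that the safe path requires no flag at all: forgetting an argument produces a diff, never a write.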

What makes this interesting architecturally is the compounding effect. The author describes discovering mid-task that the Google Ads API returns STRING_TOO_SHORT when you try to clear a tracking URL field with an empty string. That is genuinely non-obvious from the documentation. Once discovered, the fix goes into the skill. Every subsequent session that touches tracking templates now knows this without having to discover it again. Same with the fact that trial conversions live in a different metrics field than regular conversions, or that segments and cost metrics cannot coexist in the same GAQL query for search term views. Each API footgun, once hit, becomes permanent institutional knowledge encoded in markdown. The author estimates 600 lines of skills is worth more than all the Python in the repository combined — the markdown is what makes the Python safe to generate from a prompt.
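The "discover once, record forever" loop might look something like the sketch below. The summary does not describe how the author's skill files are laid out, so the `## API quirks` heading, the bullet format, and the `record_quirk` helper are all hypothetical; the sketch also assumes the quirks section is the last section in the file.

```python
from pathlib import Path

# Hypothetical section heading; the real skill files may be organized differently.
QUIRKS_HEADING = "## API quirks"


def record_quirk(skill_path, symptom, note):
    """Append a quirk to a markdown skill file unless it is already recorded.

    Returns True if the file was updated, False if the entry already existed.
    Assumes the quirks section, if present, is the final section of the file.
    """
    path = Path(skill_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    entry = f"- {symptom}: {note}"

    if entry in text:
        return False  # already known, nothing to rediscover

    if QUIRKS_HEADING not in text:
        separator = "\n\n" if text.strip() else ""
        text = text.rstrip("\n") + separator + QUIRKS_HEADING + "\n"
    if not text.endswith("\n"):
        text += "\n"

    path.write_text(text + entry + "\n", encoding="utf-8")
    return True
```

A call such as `record_quirk("tracking.md", "STRING_TOO_SHORT", "returned when clearing a tracking URL field with an empty string")` would then make the finding available to every later session that loads the skill.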

Model selection by task type is another deliberate choice worth highlighting. Planning and strategy get the expensive reasoning model. Once the plan is locked and execution begins (read the skill, follow the pattern, write the Python, run the dry-run), that mechanical work goes to a cheaper, faster model. The author is explicit about not wanting to pay for reasoning horsepower on tasks that are purely pattern-following. This is a practical cost discipline, but it also forces a useful cognitive habit: distinguishing which work actually requires judgment and which work is just labor.
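Expressed as code, the routing rule is little more than a lookup table. The sketch below is illustrative only: the summary names no specific models, so the two model identifiers are placeholders, and `TaskKind` and `pick_model` are hypothetical names.

```python
from enum import Enum


class TaskKind(Enum):
    PLANNING = "planning"    # strategy, structure, anything needing judgment
    EXECUTION = "execution"  # read the skill, follow the pattern, run dry-run


# Placeholder identifiers, not real model names.
MODEL_FOR_TASK = {
    TaskKind.PLANNING: "expensive-reasoning-model",
    TaskKind.EXECUTION: "cheap-fast-model",
}


def pick_model(kind):
    """Return the model identifier assigned to a kind of task."""
    return MODEL_FOR_TASK[kind]
```

Making the mapping explicit forces the question the author cares about: each task has to be classified as judgment or labor before any model is invoked.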

The weekly reporting workflow crystallizes the philosophy cleanly. The script produces numbers through mechanical aggregation. Commentary, the "why did this spike" paragraph, the what-we-are-going-to-do-about-it section — those stay human. The author trusts the numbers to be computed accurately but does not trust any model to write the actual story from those numbers alone. There is no attempt to fully automate the judgment. The automation stops exactly where judgment begins.
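The split between computed numbers and human commentary can be sketched as follows. The row fields (`campaign`, `clicks`, `cost`, `conversions`) and the report layout are assumptions for illustration; the summary does not say which metrics the author's weekly report actually contains.

```python
from collections import defaultdict


def aggregate(rows):
    """Sum clicks, cost, and conversions per campaign (purely mechanical)."""
    totals = defaultdict(lambda: {"clicks": 0, "cost": 0.0, "conversions": 0.0})
    for row in rows:
        bucket = totals[row["campaign"]]
        bucket["clicks"] += row["clicks"]
        bucket["cost"] += row["cost"]
        bucket["conversions"] += row["conversions"]
    return dict(totals)


def render_report(totals):
    """Render the computed numbers and leave the commentary section empty."""
    lines = ["Weekly numbers (computed)"]
    for campaign in sorted(totals):
        t = totals[campaign]
        # Cost per conversion is undefined when there were no conversions.
        cpa = f"{t['cost'] / t['conversions']:.2f}" if t["conversions"] else "n/a"
        lines.append(
            f"{campaign}: clicks={t['clicks']} cost={t['cost']:.2f} "
            f"conversions={t['conversions']:.1f} cpa={cpa}"
        )
    lines.append("")
    lines.append("Commentary (human-written)")
    lines.append("TODO: why did this move, and what are we doing about it?")
    return "\n".join(lines)
```

The script never tries to fill in the commentary section; it only leaves a marker where the human explanation belongs.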

Key takeaways:

Every mutating script defaults to dry-run, preventing accidental API writes during development or review
Skills are reference documents for future sessions — they accumulate API quirks, conventions, and discovered footguns over time
Model selection is deliberate: expensive reasoning models for strategy, cheaper execution models for mechanical pattern-following
New campaigns are always created paused; enabling is a manual UI step, never automated
Commentary and judgment are explicitly excluded from automation: the numbers are computed automatically, the interpretation stays manual

Why do I care: The pattern here maps cleanly onto how I think about agentic tooling in any domain, not just paid search. The key insight is that skills-as-markdown are doing the same job as a well-maintained internal wiki, but they are actually read and acted on because they sit in the agent's context window. Most teams accumulate institutional knowledge in Confluence pages nobody opens. This setup makes that knowledge active. The compounding argument is also real — the marginal value of documenting an API quirk once is enormous compared to rediscovering it repeatedly. The one thing I'd push back on is the implicit assumption that dry-run output is sufficient review. For a campaign with 751 ad groups, a text diff is still a lot of cognitive surface area, and the author admits human mistakes are "easy to miss." The safety comes from the convention being enforced consistently, not from the reviewer catching everything. That is a meaningful distinction worth being honest about.
