In partnership with

Stop babysitting your coding agents

Agents can generate code. Getting it right for your system, team conventions, and past decisions is the hard part – you end up wasting time and tokens in correction loops.

MCPs give agents access to information but not understanding. The teams pulling ahead use a context engine to give agents exactly what they need.

  • Where teams get stuck on the AI maturity curve

  • How a context engine solves for quality, efficiency, and cost

  • Live demo: the same coding task with and without a context engine

Friday night before I left for the Costa Brava. Two Claude Code windows side by side. Same prompts. Two of my own apps, same stack at 95%.

Left window: Claude writes the mutation, types a command into its terminal, reads the JSON that comes back, spots a wrongly-cast field, fixes it, retypes. 3 iterations in total autonomy. I come back, say "have a good weekend," commit validated.

Right window: Claude writes the mutation and stops. It pings me. "Can you open the admin and check that this works?" I click. I refresh. Re-ping. My wife starts raising her voice. Whatever, we'll see Monday.

Same Claude. Same me. One folder of difference: a homegrown CLI.


What the 2026 stack forgot

Every guide (agents folder, CLAUDE.md, MCP servers…) lists the same layers. One is missing: a CLI that exposes your business logic as typed commands the agent can chain and verify on its own.

Not a throwaway script in scripts/. Not a REST wrapper. A real kernel: bun run cli partner sync --dry-run, structured JSON out, readable exit codes. The CLI becomes the nervous system the agent palpates, while your UI stays the human interface.

Over 30 days, the app with a CLI shipped 1.8x more commits than the other. Same backend. Same framework. Same prompt. The whole gap comes from the write → run → read → fix loop the agent can close on its own, without going through you.

The full breakdown, published this morning. You'll find:

  • The pattern that turns business logic into a surface the agent can audit

  • The 4 design mistakes that keep a CLI cosmetic (and why scripts/ doesn't count)

  • The Convex + bun template I run on this repo

The surrounding context, if you want to dig in

Phil

PS: reply with the last spot where Claude/Codex asked you to click in a UI when it could have verified itself. I'm collecting patterns for a follow-up on which commands pay back the fastest when you expose them first.

PPS: if you're reading this thinking "I don't even know how to open a terminal," that's exactly what my book Vibe Coding, For Real unpacks — the 8-step process to ship a real app without coding yourself. The CLI comes after; the book gets you to the starting line.

Reply

Avatar

or to participate

Keep Reading