Stop babysitting your coding agents
Friday night, before I left for the Costa Brava. Two Claude Code windows side by side. Same prompts. Two of my own apps, 95% the same stack.
Left window: Claude writes the mutation, types a command into its terminal, reads the JSON that comes back, spots a wrongly cast field, fixes it, and reruns the command. Three iterations, fully autonomous. I come back, say "have a good weekend," and the commit is validated.
Right window: Claude writes the mutation and stops. It pings me. "Can you open the admin and check that this works?" I click. I refresh. Re-ping. My wife starts raising her voice. Whatever, we'll see Monday.
Same Claude. Same me. One folder of difference: a homegrown CLI.
What the 2026 stack forgot
Every guide (agents folder, CLAUDE.md, MCP servers…) lists the same layers. One is missing: a CLI that exposes your business logic as typed commands the agent can chain and verify on its own.
Not a throwaway script in scripts/. Not a REST wrapper. A real kernel: bun run cli partner sync --dry-run, structured JSON out, readable exit codes. The CLI becomes the nervous system the agent probes, while your UI stays the human interface.
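To make "typed commands, structured JSON out, readable exit codes" concrete, here is a minimal sketch of what such a kernel could look like. Everything in it is an assumption for illustration: the record shape, the function names (partnerSync, runCli), and the exit-code values are hypothetical, not the actual CLI behind this post.

```typescript
// Hypothetical sketch: the record fields, function names, and exit codes are
// illustrative assumptions, not the author's actual CLI.

type PartnerInput = { id: string; revenue: unknown };

type CliResult =
  | { ok: true; command: string; dryRun: boolean; synced: number }
  | { ok: false; command: string; error: string; field?: string };

// A small, documented set of exit codes the agent can branch on.
const EXIT = { OK: 0, VALIDATION: 2, USAGE: 64 } as const;

// Stand-in for real business logic: validates records, writes nothing here.
function partnerSync(records: PartnerInput[], dryRun: boolean): CliResult {
  const command = "partner sync";
  for (const r of records) {
    if (typeof r.revenue !== "number") {
      return {
        ok: false,
        command,
        error: `revenue must be a number for partner ${r.id}`,
        field: "revenue",
      };
    }
  }
  // A real implementation would persist the records here when dryRun is false.
  return { ok: true, command, dryRun, synced: records.length };
}

// Dispatches argv to a command and returns JSON text plus an exit code.
function runCli(
  argv: string[],
  records: PartnerInput[],
): { exitCode: number; stdout: string } {
  const dryRun = argv.includes("--dry-run");
  const [group, action] = argv.filter((a) => !a.startsWith("--"));
  let result: CliResult;
  let exitCode: number;
  if (group === "partner" && action === "sync") {
    result = partnerSync(records, dryRun);
    exitCode = result.ok ? EXIT.OK : EXIT.VALIDATION;
  } else {
    const command = [group, action].filter(Boolean).join(" ");
    result = { ok: false, command, error: "unknown command" };
    exitCode = EXIT.USAGE;
  }
  return { exitCode, stdout: JSON.stringify(result) };
}
```

In a real entry point you would pass the process arguments to runCli, print stdout, and exit with exitCode. The point of the shape is that the agent never has to guess: one JSON object per run, one exit code per outcome.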
Over 30 days, the app with a CLI shipped 1.8x more commits than the other. Same backend. Same framework. Same prompt. The whole gap comes from the write → run → read → fix loop the agent can close on its own, without going through you.
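The write → run → read → fix loop can be sketched in a few lines. This is a hypothetical model of what the agent does, not code from either app: closeLoop, run, and fix are names invented for the illustration, with run standing in for "execute the CLI command" and fix for "revise the code based on the JSON report."

```typescript
// Hypothetical sketch of the verify loop; `run` and `fix` are stand-ins for
// "execute the CLI command" and "revise the code from the JSON report".

type RunOutput = { exitCode: number; stdout: string };

function closeLoop<T>(
  initial: T,
  run: (candidate: T) => RunOutput,
  fix: (candidate: T, report: unknown) => T,
  maxIterations = 3,
): { candidate: T; iterations: number; passed: boolean } {
  let candidate = initial;
  for (let i = 1; i <= maxIterations; i++) {
    // Run: execute the command against the current attempt.
    const { exitCode, stdout } = run(candidate);
    // Exit code 0 means the attempt verified itself; no human needed.
    if (exitCode === 0) return { candidate, iterations: i, passed: true };
    // Read + fix: parse the structured error and revise, unless out of tries.
    if (i < maxIterations) candidate = fix(candidate, JSON.parse(stdout));
  }
  return { candidate, iterations: maxIterations, passed: false };
}
```

Without a CLI, run is a human clicking through the admin, and every pass through the loop costs a ping and a refresh.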
The full breakdown, published this morning. You'll find:
The pattern that turns business logic into a surface the agent can audit
The 4 design mistakes that keep a CLI cosmetic (and why scripts/ doesn't count)
The Convex + bun template I run on this repo
The surrounding context, if you want to dig in
Why CLIs Beat MCP for AI Agents — And How to Build Your Own CLI Army — the underlying why. Context cost, composability, why CLIs beat MCP servers for coding agents.
I Run 15 AI Clones of Myself in Parallel — what unlocks when your agents can verify themselves: you launch 15 of them with no babysitter.
I Stopped Vibe Coding and Started "Prompt Contracts" — how to frame what the agent does with those commands, without restarting from scratch every attempt.
Phil
PS: reply with the last spot where Claude/Codex asked you to click in a UI when it could have verified itself. I'm collecting patterns for a follow-up on which commands pay back the fastest when you expose them first.
PPS: if you're reading this thinking "I don't even know how to open a terminal," that's exactly what my book Vibe Coding, For Real unpacks — the 8-step process to ship a real app without coding yourself. The CLI comes after; the book gets you to the starting line.



