Design Loops, Not Prompts. The Honest Version Of Steinb...

Peter Steinberger, the Austrian developer behind PSPDFKit who sold that company in 2021 and now works at OpenAI, posted this on X on June 7, 2026:

Here's your monthly reminder that you shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents.

Coding Twitter went off. A pile of think-pieces followed. A few of them ran with view counts and reply-split percentages that nobody can actually verify, so I am leaving those out and dealing with the argument itself, because the argument is the part that matters.

This is the shape of the workflow series I have been writing for the last two months. Loops are what /loop and /schedule already do. Skills, rules, memory, adversarial subagents on worktrees, hooks. Each piece is a slot in a loop. So this post is less "new idea" and more "here is what is real about the idea, here is what is overstated, and here is the cost problem nobody seems to want to talk about."

The argument, plainly

Steinberger's claim has two parts.

One. A modern coding agent is not a calculator you feed inputs to. It is a worker that needs an environment. The environment includes tools it can call, a memory it can read, a way to verify its own output, and a trigger that fires it. A prompt is only one of those five things.

Two. If you spend your time hand-crafting the prompt, you are doing the cheap, fungible part. If you spend your time building the loop around the agent, you are building the part that compounds. The prompt is a sentence. The loop is a system.

Boris Cherny, who runs Claude Code at Anthropic, said the same thing in a recent interview that The New Stack and Office Chai both picked up:

I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops.

Addy Osmani, a Google engineering lead, published a full essay calling it "loop engineering" and naming the primitives that make a real loop possible. Three respected practitioners, same direction. It is not a stunt take.

What Osmani actually said (and what some summaries get wrong)

A lot of the secondary write-ups paraphrase Osmani's framework as five blocks: skills, context injection, sub-agents, connectors, state files. That is not quite what he wrote.

In his Substack essay, Osmani names six primitives: scheduled automations, isolated workspaces (worktrees), skills, MCP connectors, maker/checker sub-agents, and durable on-disk memory. The two pieces that frequently get dropped from secondary summaries are the ones that do most of the work in practice: the scheduler that fires the loop without you sitting there, and the worktree that gives each parallel agent its own sandbox so they cannot stomp on each other.

If you read a summary that only lists five primitives without "scheduled automations" and "worktrees," you are reading a flattened version. Go to the source.

The concrete example Osmani uses is a morning CI loop: agents open worktrees, draft fixes for the overnight failures, review each other's drafts, open PRs. That is six primitives in one workflow. Take any of them out and the loop falls apart.

How this maps onto the workflow series here

This is where the argument stops being abstract. Every Osmani primitive already has a Claude Code surface that ships with the tool.

Scheduled automations are /schedule. A cron-style trigger that runs an agent on Anthropic's infrastructure whether your laptop is open or not. Morning report at 8am. Nightly test sweep at 2am. Weekly dependency audit on Monday.

In-session loops are /loop. The watchdog that polls a deploy, summarizes test failures, keeps the context window tidy while you work. Loops that die when the session dies, but cost almost nothing because they piggyback on your prompt cache.

Isolated workspaces are git worktrees, and the adversarial subagents pattern Cherny himself runs daily. Spawn three agents with different review roles (security, performance, style) on three worktrees, let them disagree, merge what survives. Uncorrelated reviewers find uncorrelated bugs.

Skills are the SKILL.md layer that lives in ~/.claude/skills/ and turns a repeatable workflow into a slash command. Rules scope constraints to a directory. Memory persists facts across sessions. Each layer has a job; pile them all into CLAUDE.md and the model stops reading carefully around the 800-line mark.

Hooks are the deterministic auto-approve layer at .claude/hooks.json. PreToolUse, PostToolUse, Stop, UserPromptSubmit. Stop clicking approve on npm test 47 times a day. Wire the loop to act instead.

MCP connectors are how the loop touches the outside world. Calendar, GitHub, your file system, an internal API. The loop's reach is the union of its connectors.

Six primitives, six surfaces, all shipping. The Steinberger tweet is not a prediction. It is a description of what people running Claude Code or OpenClaw at scale have already been doing for six months.

Where the discourse oversells the idea

I want to be honest about three places the loops-not-prompts framing leans on hype.

One. The "OpenClaw was built in a single hour" story is wrong. It is repeated everywhere. The one-hour figure refers to Steinberger's original Clawdbot prototype in November 2025, not the production OpenClaw codebase that hit roughly 180,000 GitHub stars in three months and kept growing through early 2026, per Hive Security's timeline. The production project took real engineering. Conflating the two makes the rest of the workflow look like magic when it is craft.

Two. Loops without verification are just expensive prompt chains. Steinberger's own philosophy, in the interviews he has given on The Pragmatic Engineer and The Wantrepreneur, is that code works well with AI because it is verifiable. Compile, run, test, lint. You close the loop by checking. If your loop has no verifier (no test suite, no lint pass, no diff review, no human checkpoint), it is not a loop. It is a fire hose.

Three. The architecture has shipped, the security model has not. OpenClaw's seven-component architecture (Channel System, Gateway, Plug-ins/Skills, Agent Runtime, Memory & Knowledge, LLM Provider, Local Execution, per the arXiv writeup) is genuinely well-factored. The marketplace built on top of it is not. In January 2026, OpenClaw shipped CVE-2026-25253, a critical (CVSS 8.8) WebSocket-hijack RCE that let a single malicious link steal the auth token and pop remote code execution. Patched within four days in v2026.1.29, but it shipped. A month later Koi Security audited ClawHub and found 341 malicious skills out of roughly 2,857 in the marketplace. A skill marketplace where roughly twelve percent of listings push macOS stealer malware is not a foundation. It is a warning.

A loop that pulls a poisoned skill from a poisoned marketplace and runs it on your machine is the worst possible version of "automation." That is the failure mode the prompt-engineering era did not have because nobody had built the runway for it yet. The loop era has.

The unsexy cost problem

Here is the honest counter-argument to the entire framing.

A prompt is bounded. You write it, you send it, you read the answer, you stop. You know what the call cost.

A loop is not bounded. The whole point is that it runs on a schedule, fires off subagents, calls tools, retries on failure, escalates to bigger models when the small model gets stuck. Every one of those is a token-cost multiplier. The Towards AI piece on the $47,000 agent-loop incident (where a poorly-bounded loop ran overnight and produced essentially nothing) is the cleanest example of the failure mode, and it was just one team. The boring observability layer (daily caps, iteration limits, alerts at half the cap, lint on every new skill, read-only by default on every connector) is what stops your "loop engineering" from becoming a story your CFO tells at parties.

The Cherny quote is real and it is right. But Cherny works at the company that ships the model. His token cost is internal. Yours is not. If you are an SMB or an indie operator, the cost discipline matters more than the loop design, because a loop without cost discipline is a self-inflicted denial-of-service attack on your own credit card.

That is the part the tweet did not say.

A concrete starter loop you can actually run

If you are coming to this from the Claude Code series and you want one loop today, here is the simplest one that pays for itself.

A morning loop, fired by /schedule '0 8 * * *':

Pull the last 24 hours of git commits and open PRs on the repo.
Run the test suite against main.
Lint everything that changed.
Summarize failures, slow tests, and untested new files into a one-page brief.
Drop the brief in a file and (optionally) email it to you.

Five primitives in one loop. Scheduler fires it. Subagent for the test run. Skill for the brief format. Memory for "what I already told you yesterday so I do not repeat it." Hook to auto-approve the safe Bash calls.

If you have not built it, you do not have a loop. You have a habit of reading CI dashboards at 9am.

The Apple Silicon piece

A practical wrinkle for anyone running this on a Mac. The model under the loop matters less than people think, but the inference layer it runs on matters a lot, because loops fire often.

Ollama 0.19 shipped its MLX backend on March 30, 2026 and, on M5 Max running Qwen3.5-35B-A3B (bf16), measured prefill rising from 1,154 to 1,810 tok/s (+57%) and decode from 58 to 112 tok/s (+93%) versus the prior backend. That is Ollama's own published number on one chip with one model. Independent benchmarks (Andreas K's MLX vs llama.cpp comparison, Antek Apetanovic's Qwen3.5 run) put the MLX advantage in a range of roughly 20% to 87% on sub-14B models, dropping to near zero above 27B as memory bandwidth dominates. The flat "MLX is 20 to 30% faster" line you see in some posts is a single point inside a wide range, not a constant.

If your loops are firing dozens of times an hour on a local model, the chip-and-runtime choice compounds. I am building a free Apple Silicon Local AI Advisor (no signup, no telemetry, all client-side) that takes your chip and RAM and gives a source-cited shortlist of runtimes and model sizes that actually fit, with every number labeled measured, vendor-claimed, or engineering judgment. That tool lands soon.

So is the prompt dead

No. You still write the prompt that goes into the loop. You still tune the prompt when the loop produces garbage. The loop is the new unit of work, but the prompt is still the smallest brick inside it.

The honest version of the argument is not "stop prompting." It is stop optimizing the brick when the building plan is the bottleneck. If your bottleneck is "Claude does not understand what I want," yes, work on the prompt. If your bottleneck is "I am the one re-triggering Claude every twenty minutes," you have a loop problem, not a prompt problem. Identify the actual bottleneck before you swallow whoever's framing happens to be loud that week.

This is the version of Steinberger's tweet I think holds up under scrutiny. The verifiable parts (a real founder, a real argument, a real corroborating quote from Cherny, a real essay from Osmani naming the six primitives) are enough. The unverifiable bolt-ons (specific view counts, specific reply splits, an OpenClaw "built in one hour," a single-statistic ClawHub 12% pulled from a different audit's denominator) do not add anything you need.

Build the loop. Bound the cost. Verify the output. Pull skills from places you trust. The rest is engineering.

A book mention, since this is the audience for it

If you are running an under-thirty-person business and trying to figure out what part of the AI stack is worth your own time versus an agency's quote, the spine of that argument is The $100 Network ($9.99 on Kindle, Digital Empire series). The book covers the boring observability layer in more detail than a blog post can, including the specific cost-control patterns that make a loop economy actually work at SMB scale.

Fact-check notes and sources

Peter Steinberger founded PSPDFKit and sold the company in 2021. He joined OpenAI in February 2026 after also receiving offers from Anthropic and Meta; he turned Meta down. Sources: Wikipedia: Peter Steinberger, TechCrunch, Feb 15, 2026, Implicator: "Peter Steinberger Chose OpenAI".
The June 7, 2026 tweet text is verifiable at x.com/steipete/status/2063697162748260627. Specific view counts and reply-split percentages cited in some secondary write-ups vary across sources and are not used here.
Boris Cherny is the creator and head of Claude Code at Anthropic. The "I don't prompt Claude anymore" quote is reported by The New Stack's "Loop Engineering" coverage, Office Chai, and Digg.
Addy Osmani's loop-engineering essay names six primitives (scheduled automations, isolated workspaces, skills, MCP connectors, maker/checker sub-agents, durable on-disk memory). Source: Addy Osmani: Loop Engineering and The New Stack coverage.
OpenClaw's seven-component architecture (Channel System, Gateway, Plug-ins/Skills, Agent Runtime, Memory & Knowledge, LLM Provider, Local Execution) is documented in arXiv 2603.27517 and the ppaolo Substack architecture overview.
CVE-2026-25253 (OpenClaw WebSocket auth-token theft leading to RCE, CVSS 8.8) was disclosed January 26, 2026 and patched in v2026.1.29 on January 30, 2026. Sources: SonicWall threat advisory, The Hacker News, ProArch advisory.
Koi Security's ClawHub audit (Oren Yomtov, February 2026) found 341 malicious skills out of roughly 2,857 audited, with most tied to a campaign called ClawHavoc distributing the AMOS macOS stealer. Sources: The Hacker News, SC Media, CyberInsider.
OpenClaw GitHub star counts: roughly 180,000 by late January 2026 (about three months after the November 2025 launch), crossing 250,000 by March 2026. Sources: Hive Security timeline, Inbounter 2026 timeline.
Ollama 0.19 shipped its MLX backend for Apple Silicon on March 30, 2026, with Ollama's own benchmarks on M5 Max showing Qwen3.5-35B-A3B prefill of 1,154 → 1,810 tok/s (+57%) and decode of 58 → 112 tok/s (+93%). Source: Ollama blog: MLX.
MLX vs llama.cpp on sub-14B Apple Silicon shows a range of roughly 20% to 87% improvement, narrowing to near zero above ~27B. Sources: Andreas K, MLX vs llama.cpp, Antek Apetanovic, Qwen3.5 Apple Silicon benchmark.
The $47,000 agent-loop incident framing is from Towards AI's "We Spent $47,000 Running AI Agents in Production", October 2025.

This post is informational, not consulting advice. Mentions of Peter Steinberger, Boris Cherny, Addy Osmani, Anthropic, OpenAI, Google, Meta, Apple, and other third parties are nominative fair use. No affiliation is implied.

Stop Prompting, Start Designing Loops. The Honest Version Of A Tweet That Set Off Coding Twitter.