Part of the Claude Code workflow series. Start with the install primer; then what to do after install; then this post for the multi-agent review pattern the Claude Code team themselves use daily.
Review-by-consensus is a bad pattern for code review and a worse pattern for LLM code review. If you spawn two subagents and ask them to "check each other's work," they mostly agree. Agreement feels like validation. Most of the time it's just two instances of the same base model averaging out the same blind spots.
Boris Cherny's publicly-described workflow does the opposite: he spawns subagents with deliberately different review roles and lets them disagree. The idea is simple — a security reviewer, a performance reviewer, and a style reviewer all critique the same change. Where they agree, you trust it. Where they disagree, you read the arguments and make the call yourself.
Pair that with worktree isolation (one git worktree per parallel agent) and you get a pattern that scales from "refactor one file" to "migrate 400 files from Framework A to Framework B" without losing your mind.
Why "adversarial" matters
Two agents sent the same prompt produce correlated answers. Two agents sent different prompts aimed at different failure modes produce much less correlated answers. That decorrelation is the whole point.
Practical example. You've just written a new API endpoint. Three reviewers:
- Security reviewer — prompt: "Look only for: auth bypass, injection, secret leakage, rate-limit gaps, missing input validation. Report each finding with severity and an exploit sketch. Ignore style and performance."
- Performance reviewer — prompt: "Look only for: N+1 queries, unbounded loops, missing cache keys, blocking I/O on the request path, missing indexes. Ignore security and style."
- Style / maintainability reviewer — prompt: "Look only for: naming clarity, function length, untyped parameters, missing JSDoc on public APIs, pattern consistency with the rest of the repo. Ignore security and performance."
Run all three in parallel. Aggregate their findings. The three reviewers routinely catch different things. Security finds the SQL injection that performance missed. Performance finds the unbounded loop that style didn't care about. Style finds the misnamed variable that will cost you thirty minutes in two weeks.
If you instead had asked one agent "review this endpoint," you'd have gotten a mediocre paragraph that touched none of those in depth. Role specialization is the delta.
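The run-in-parallel-then-aggregate step can be wrapped in a small shell harness. A minimal sketch: the real `claude --worktree` invocation is shown as a comment and stubbed with `echo` so the control flow runs anywhere, and the findings-file names are illustrative, not a Claude Code convention:

```shell
#!/bin/sh
# Sketch: run three role-scoped reviewers in parallel, then aggregate findings.
review() {
  role=$1; prompt=$2
  # Real invocation would be:
  # claude --worktree "${role}-review" -- "$prompt" > "findings-$role.txt"
  echo "[$role] $prompt" > "findings-$role.txt"   # stub so the sketch is runnable
}
review security "Auth bypass, injection, secret leakage in /api/billing only" &
review perf     "N+1 queries, blocking I/O, missing indexes in /api/billing only" &
review style    "Naming, typing, repo pattern consistency in /api/billing only" &
wait                                              # all three sessions run concurrently
cat findings-security.txt findings-perf.txt findings-style.txt   # aggregate pass
```

The aggregation step is deliberately dumb — concatenate and read. The value is in the role-scoped prompts, not the plumbing.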
Worktree isolation — the bit that makes it safe
Subagents writing to the same branch stomp each other. Subagents writing to isolated git worktrees don't.
```shell
claude --worktree security-review -- "Review the new /api/billing endpoint for security"
claude --worktree perf-review -- "Review the new /api/billing endpoint for performance"
claude --worktree style-review -- "Review the new /api/billing endpoint for style"
```
Each command creates a separate worktree off the current branch, spawns a Claude session scoped to that worktree, and runs the review. The three sessions share no filesystem state. They can all edit, all test, all commit, and nothing collides.
Under the hood `claude --worktree <name>` is roughly equivalent to:

```shell
git worktree add ../myrepo-<name> <current-branch>
cd ../myrepo-<name>
claude
```
Worktrees share the object database (cheap — no duplication of blob storage) but have their own working tree and HEAD. Claude sessions each have their own filesystem sandbox.
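You can see the shared object database directly in a throwaway repo — the worktree's `.git` is a small pointer file back into the main repository, not a second copy:

```shell
#!/bin/sh
# Throwaway demo: one repo, one extra worktree, one shared object store.
repo=$(mktemp -d)
cd "$repo"
git init -q main
cd main
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "root"
git worktree add ../review-wt     # new working tree + HEAD, on a new branch
git worktree list                 # two entries, backed by one object database
cat ../review-wt/.git             # pointer file: "gitdir: .../main/.git/worktrees/review-wt"
```

Deleting a worktree (`git worktree remove`) costs nothing but the checkout itself; the commits stay in the shared store.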
A real five-agent spawn
For changes bigger than a single endpoint — say, migrating one package to a new dependency — Cherny's published default is five parallel agents. The recipe:
- Lead agent — reads the change description, decomposes the work into 5–30 independent units, emits a plan. You approve or revise the plan before anyone writes code.
- Implementation agents (N) — one per unit, each in its own worktree. Each implements, tests, and opens a PR against the main branch.
- Review agents — after implementation, the adversarial review agents (security / perf / style) run against the aggregate diff.
- Lead agent (again) — reads all review output, resolves disagreements (or flags them for you), and produces the merge plan.
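The recipe above can be sketched as a two-wave shell loop. The unit names, `PLAN.md`, and the prompts are invented for illustration, and the `claude` calls are commented out and replaced with logging so the skeleton is runnable as-is:

```shell
#!/bin/sh
# Five-agent skeleton: lead plan -> N implementers -> 3 reviewers -> merge plan.
units="billing-core billing-api billing-tests"   # would come from the lead agent's plan
for unit in $units; do
  # claude --worktree "impl-$unit" -- "Implement unit $unit per PLAN.md, test, open a PR" &
  echo "spawned impl-$unit" >> spawn.log &       # stub
done
wait                                             # implementation wave finishes first
for role in security perf style; do
  # claude --worktree "review-$role" -- "Review the aggregate diff as the $role reviewer" &
  echo "spawned review-$role" >> spawn.log &     # stub
done
wait                                             # review wave finishes
# claude -- "Read review output, resolve or flag disagreements, emit a merge plan"
```

The two `wait` barriers are the important part: reviewers only see finished work, and the lead agent only sees finished reviews.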
This is roughly what /batch does under the hood for mechanical migrations. For anything more subjective — architecture changes, API redesigns — you run the pattern manually so you retain judgment at the decomposition step.
The merge strategy that doesn't blow up
Five parallel agents = five parallel branches. Merging them naively creates merge conflicts nobody wants to resolve.
The strategy that works:
- Each agent gets a non-overlapping scope. Pre-plan it. If agent A edits `src/billing/` and agent B edits `src/billing/calculator.ts`, you're going to have a bad time. Decompose so each agent owns a directory or a module, not a file.
- Merges happen in dependency order. If agent C depends on a type agent A created, merge A first, rebase C on top, then merge.
- The lead agent runs the merge, not you. After all agents report done, the lead agent pulls each branch, merges in dependency order, resolves trivial conflicts, runs the full test suite, and reports. You review the final merged diff in one pass.
- Failed agents don't block the merge. If agent D crashed or got stuck, merge A / B / C, flag D as needing human attention, move on. Don't hold up the 80% that worked because the 20% didn't.
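A lead-agent merge loop along these lines covers the last three points. The branch names and the `status-<branch>` marker-file convention are invented for the sketch; any done/failed signal works:

```shell
#!/bin/sh
# Merge completed agent branches in dependency order; skip failures, stop on conflict.
for branch in agent-a agent-b agent-c agent-d; do   # dependency order, A first
  if [ "$(cat "status-$branch" 2>/dev/null)" != "done" ]; then
    echo "SKIP $branch: crashed or stuck, needs human attention" >> merge.log
    continue                                        # don't block the branches that worked
  fi
  if ! git merge --no-edit "$branch" >> merge.log 2>&1; then
    echo "CONFLICT on $branch: stopping for human review" >> merge.log
    break
  fi
done
# then: run the full test suite on the merged result and report in one pass
```

The `continue` on a missing or failed status marker is the "80% that worked" rule made concrete.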
Where the pattern breaks down
The pattern is a multiplier, not a silver bullet. Four places it falls apart:
Decomposition quality. If the lead agent decomposes poorly, the five implementation agents are each doing 20% of the wrong work. Review the plan before approving. Push back on vague unit descriptions.
Coupled modules. Some code genuinely resists decomposition. A tightly-coupled state machine in one file can't be split across three agents. Recognize this during planning and keep the change single-agent.
Unstable APIs mid-change. If agent A is redesigning a type that agents B and C are also using, the type shifts while they're working and they end up referencing the old shape. Mitigation: merge API changes first (A alone), then spawn B and C on top of the merged result.
Review agent fatigue. If the security reviewer sees the same pattern across all 25 units, by unit 10 it stops flagging. Rotate review agents or reset context between units to keep attention fresh.
When to use this pattern
- Mechanical migrations across many files — framework swaps, dependency upgrades, naming-convention rollouts.
- Security-critical changes where you want adversarial coverage — auth, payments, PII handling.
- Pre-release code reviews where you want more than one lens on the diff.
- Learning-mode — watching three reviewers argue about a change teaches you more about your own code than reading any single AI review.
When NOT to use it:
- Small changes. A one-file bug fix doesn't need five agents. Run it once, review it yourself, move on.
- Exploration. If you don't know what you want, decomposing into parallel units is premature. Do the exploration with a single agent first.
- Cost-sensitive sessions. Five parallel agents cost ~5× one agent. For hobby projects, this adds up.
What you actually set up (start-of-day)
If you want to run this pattern tomorrow:
- Make sure your repo is in a git worktree-ready state (`git worktree list` should work; any modern git version is fine).
- Write three short review-role prompts and save them as skills: `.claude/skills/review-security.md`, `.claude/skills/review-perf.md`, `.claude/skills/review-style.md`. Each contains its targeted review prompt.
- Document in CLAUDE.md that adversarial review is the expected pattern for changes to `src/auth/`, `src/billing/`, and any file over 300 lines.
- First time you spawn parallel agents, do it on a throwaway branch. Feel the merge ergonomics. Tune the decomposition.
After the first run the pattern stays in your muscle memory. The skills live in the repo for the next person who joins the team.
Related reading
- AI Terminal Kickstart — install prereq.
- CLI installed — now what? — the overview that introduces worktrees and subagents.
- Skills vs Rules vs Memory — package each review role as a reusable skill.
- Hooks implementation guide — pair adversarial review with hooks that auto-approve the "boring" pre-review steps.
- /loop vs /schedule — how to keep the review subagents polling the main branch while they work.
Fact-check notes and sources
- Boris Cherny's workflow — summarized in his thread, the `howborisusesclaudecode.com` writeup, and Pragmatic Engineer's "How Claude Code is built".
- Anthropic: How Anthropic teams use Claude Code — includes the adversarial-review language.
- Anthropic: Building a C compiler with a team of parallel Claudes — a worked example of decomposition + parallel implementation + aggregation.
- Claude Code docs: Subagents — the canonical reference for the spawning API.
- Paddo: 10 Tips from Inside the Claude Code Team — practical decomposition advice.
Informational, not engineering consulting advice. The claude --worktree flag reflects Q1 2026 CLI behavior. Verify against the official changelog before depending on specific syntax. Mentions of Anthropic, Claude Code, Boris Cherny, and linked publications are nominative fair use.