/ultrareview: A Multi-Agent Code Review That Runs in the Cloud While You Keep Working

There is a specific kind of code review I have always wanted and never gotten. Not the fast skim where someone tells me my variable names are weird. The slow, suspicious one where a careful reader actually runs the change in their head, traces the new code path, looks for the silent bug that the test suite cannot catch because the test suite does not know to ask. The kind of review you get from an engineer with twenty years on a system who has seen every way that exact piece can break.

That is the review /ultrareview is trying to be. It is not the same thing as /review, which is the local single-pass version that runs in your terminal in seconds. Ultrareview launches a fleet of reviewer agents in a remote sandbox, has each agent independently reproduce and verify any finding before reporting it, and gives you back a notification five to ten minutes later with the things it actually believes are bugs. Style suggestions are filtered out. Drive-by opinions are filtered out. What lands in the report is what survived a verification pass.

It is a research preview, available in Claude Code v2.1.86 and later, and it costs real money once you burn through the three free runs. Here is how to use it, where it shines, where it is the wrong fit, and how it compares to the same idea showing up across other LLM platforms.

Running it

The interactive form is a slash command from inside Claude Code.

/ultrareview

With no arguments, it reviews the diff between your current branch and the default branch, plus any uncommitted or staged changes in the working tree. Claude Code bundles up the repo state and uploads it to a remote sandbox. You see a confirmation dialog with the review scope (file count, line count), how many free runs you have left, and the estimated cost. After you confirm, the review runs in the background and you keep working.

If you want to review a GitHub pull request directly, pass the PR number.

/ultrareview 1234

In PR mode the sandbox clones the pull request from GitHub itself rather than uploading your local working tree. That is the path to take if your repo is too big to bundle, which the tool will tell you the first time you hit that ceiling.

For CI and scripts, there is a non-interactive subcommand.

claude ultrareview
claude ultrareview 1234
claude ultrareview origin/main
claude ultrareview --json --timeout 30

The subcommand starts the same review, blocks until it finishes, prints the findings to stdout, and exits 0 on success or 1 on failure. The --json flag gives you the raw bugs.json payload if you want to pipe it into your own tooling. The --timeout flag bounds how long you wait. Progress messages and the live session URL go to stderr, so stdout stays clean for whatever you pipe it into.
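
In a CI job, the exit code plus the --json output is enough to gate a merge. Here is a minimal sketch of that wiring, assuming the runner is already authenticated with a Claude.ai login and has extra usage enabled; the post does not spell out the structure of the bugs.json payload, so the jq filter is a placeholder to adapt once you have seen a real one.

#!/usr/bin/env bash
# Sketch: gate a CI step on the non-interactive subcommand.
set -euo pipefail

# stdout carries the findings payload; stderr carries progress messages and
# the live session URL, so it stays in the job log without polluting the file.
claude ultrareview --json --timeout 30 > findings.json

# Placeholder filter -- the real shape of bugs.json may differ.
finding_count=$(jq 'length' findings.json)
echo "ultrareview returned ${finding_count} finding(s)"

# Fail the job if anything came back, so the PR shows a red check.
[ "${finding_count}" -eq 0 ]

Because the subcommand itself exits 1 on failure, the script also fails if the review never completes, which is usually what you want in a pipeline.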

A finished review pings you back inside Claude Code with the verified findings. Each one carries the file location and an explanation of the issue. From there the natural next step is to ask Claude to fix it, which closes the loop in the same session.

What it costs

Ultrareview bills against extra usage rather than your plan's included usage. Pro and Max subscribers get three free runs to try the feature; those runs do not refresh and they expire on May 5, 2026. Team and Enterprise get no free runs out of the gate. After the free runs, each review typically costs $5 to $20 depending on the size of the change.

A few things worth noting from the docs.

A run counts the moment the remote session starts. If you stop the review early, it still consumes a free run; you cannot cancel for free. A paid review only bills you for the portion that ran, but a free run is all-or-nothing.

You need extra usage enabled on the account. If it is off, Claude Code blocks the launch and links you to the billing settings. You can run /extra-usage to check your current setting from inside the CLI.

You need a Claude.ai login because ultrareview runs on the same web infrastructure as Claude Code on the web. If you are signed in with an API key only, run /login and authenticate first. Ultrareview is not available when using Claude Code through Amazon Bedrock, Google Cloud Vertex AI, or Microsoft Foundry, and it is unavailable to organizations that have enabled Zero Data Retention.

Three workflows where it earns the cost

I have run ultrareview enough times to know roughly when it is worth the five-to-twenty dollars and when it is not. The pattern is consistent.

Pre-merge confidence on a substantial change. This is the obvious case and the one the tool is built for. You have a hundred-line refactor of an authentication path or a query layer or a billing flow. The tests pass. The PR looks reasonable. The risk of a silent regression is non-zero. You push, you run /ultrareview <pr-number>, you go make coffee. Ten minutes later you have a verified list of issues, or you have nothing and you merge with more confidence. The cost is roughly the cost of a coffee per substantial PR. The value is catching the bug that would have hit production at 2 a.m.

Reviewing a PR before approving someone else's. Same shape, different role. A teammate opens a PR, you skim it, the change is in a part of the codebase you do not know cold. Run ultrareview against the PR. Read the verified findings. Write your review based on those plus your own judgment. You stop pretending you can mentally trace through code you have never touched, and you stop rubber-stamping changes that you secretly do not understand.

Pre-release sweep. Before a release branch goes out, run ultrareview against the cumulative diff. If you have been merging carefully, it should come back nearly empty. If something slipped through, the cost of catching it now beats the cost of catching it after deploy.
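
If the release branch is cut in CI, that sweep can be the last gate before tagging. A minimal sketch follows; the branch name is hypothetical, and since the post shows the ref argument and the --json/--timeout flags in separate examples, this sticks to the bare ref form and leans on the exit code.

#!/usr/bin/env bash
# Sketch: pre-release sweep of the cumulative diff against the default branch.
set -euo pipefail

git fetch origin main
git checkout release/next        # hypothetical release branch name

# Blocks until the review finishes; exits 0 on success, 1 on failure,
# so the CI runner can use the exit code directly as the gate.
claude ultrareview origin/main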

Two workflows where it does not earn the cost

The same honesty, applied in the other direction.

Tiny changes. A one-line fix does not need a fleet of reviewer agents. /review runs in seconds and is included in normal usage. Save the cloud review for changes that benefit from depth.

Highly stylized code where the issues are taste. Ultrareview is built to filter out style noise and surface real bugs. If the thing you actually want feedback on is the API design or the naming or the architecture choice, ultrareview is the wrong tool. Ask Claude directly with a focused prompt instead. The fact that the multi-agent fleet only reports verified bugs means it will report nothing on a clean refactor where the question is "is this elegant," and you will have spent the run for nothing.

How it compares to the same idea on other platforms

The "deep AI code review" pattern has shown up in several places over the last eighteen months. They share the broad shape and differ in the details that matter.

Cursor's Bugbot. Cursor introduced Bugbot to do exactly this kind of pre-merge review on PRs. It hooks into your GitHub repo, posts inline comments, and runs on each push. The trade-off versus ultrareview: Bugbot is always-on and posts comments directly into the PR thread; ultrareview is on-demand and surfaces findings inside Claude Code. If you live in Cursor and want passive coverage, Bugbot fits the shape. If you live in Claude Code and want to run a deeper pass selectively, /ultrareview is the closer fit.

GitHub Copilot's code review feature. GitHub shipped an AI review feature that runs against PRs from inside the GitHub UI. It is fast, lightweight, and focused on style and obvious correctness issues. The model is not running a multi-agent verification fleet; it is doing a single-pass review. Free for Copilot subscribers. Good baseline coverage, less depth than ultrareview, no separate per-run cost.

CodeRabbit. CodeRabbit is the third-party variant. It plugs into GitHub or GitLab and posts AI review comments on every PR. The free tier is generous; paid tiers add deeper analysis and integrations. Closest analogue to Bugbot in shape. Worth comparing if your workflow is GitHub-native and you want passive review that does not depend on which IDE you use.

Greptile. Greptile is the more research-y version, focused on indexing the whole codebase and reasoning about the change in the full repo context. The pitch is similar to ultrareview's depth claim. Pricing is per-seat. Worth a look if your repo is large enough that single-file context windows miss the point of the change.

Gemini Code Assist. Google's offering ships AI code review inside the Gemini Code Assist suite, integrated with Google Cloud's repo and CI tooling. Strongest fit if you are already on GCP and Cloud Build. The review depth is comparable to Copilot's; the differentiator is integration into the rest of the Google developer tooling.

Sourcegraph Cody. Cody emphasizes whole-repo code intelligence and AI assistance, with a review pass that uses its index of the codebase. Strong on cross-file reasoning. Pricing is per-seat with a free tier. Best fit for organizations that already use Sourcegraph for code search.

The honest summary across all of these: the multi-agent verification model that ultrareview ships is the most aggressive on filtering out noise, and the cloud-sandbox model is the most aggressive on not eating your local resources. The trade-off is that you pay per run instead of per seat. If you do a small number of high-stakes reviews per month, the per-run model is cheaper. If you do many reviews per month, a per-seat tool may pencil out lower.

Writing tickets to the reviewer

A practical detail that took me a few runs to figure out: the reviewer is a fleet, not a single conversation, so the way you bias it is by how you stage the change, not by how you prompt during the run.

If your branch contains three logical changes, the reviewer will see three logical changes and the report will mix them. The cleaner pattern is one logical change per branch, one ultrareview per branch. The findings come back focused on that change, the verification pass converges faster, and the cost stays at the lower end of the $5 to $20 range because the diff is smaller.
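
If your working tree has already drifted into several unrelated edits, carving one change out before the run is ordinary git. A sketch, with hypothetical branch and file names, assuming the edits apply cleanly on top of the default branch:

# Carve one logical change out of a mixed working tree before reviewing it.
git stash                                   # park everything
git checkout -b fix-auth-token-expiry main  # hypothetical branch name
git stash pop                               # bring the edits onto the new branch
git add -p src/auth/token.ts                # stage only the hunks for this change
git commit -m "Fix token expiry check on refresh"
git stash                                   # re-park the unrelated edits

# The diff against main is now just this change. Run /ultrareview from inside
# Claude Code, or the non-interactive form from a script:
claude ultrareview

Stashing the leftover edits before the run matters because the default scope includes uncommitted changes in the working tree.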

A second small habit that helps. Before you run ultrareview, write a one-paragraph summary of the change in the PR description: what the change does, what it does not change, what edge cases you considered, what tests you added. The reviewer agents read this. They are less likely to flag something as a bug if you have already accounted for it in your description and the code matches what you described. The agents are still independent and skeptical; they will not just believe you. But they will not waste cycles re-deriving context you already provided.

The shift it captures

Here is the broader pattern. A year ago AI code review meant a single chatbot reading a diff and giving you opinions, most of which were style. Six months ago it meant the same thing in your IDE with better integration. Now it means a fleet of agents in a sandbox, each independently trying to break your change and only reporting what they could verify.

That is a real shift. It moves AI review from "another voice in the room" to "another verifier in the pipeline." The first one is something you can take or leave. The second one is something you can rely on, in the literal sense that you can stake the merge on it.

I am not yet at the point where I would merge without human review on a change that touches user data or money. The verification pass is good; it is not infallible. But for the wide middle band of changes where the question is "did I break something I do not see," ultrareview has changed how confident I am at the merge button. That is the kind of tool that earns its line on the bill.

If you have read The $20 Agency, you will recognize the operating discipline this depends on. Chapter 27 walks through how to build recurring audit schedules and how to triage findings by impact, and the pattern translates almost directly from SEO audits to code review. The tool is new; the discipline of "what gets audited, by whom, and how often" is older than software.

Fact-check notes and sources

  • Documentation, command syntax, and pricing tiers: code.claude.com/docs/en/ultrareview
  • Slash commands /ultrareview and /ultrareview <PR-number>, plus the non-interactive claude ultrareview subcommand with --json and --timeout flags: quoted from the official docs as of 2026-05-07
  • Free-run policy (3 runs for Pro and Max, expiring May 5, 2026; none for Team and Enterprise; $5-$20 per run after free): quoted from the official docs
  • Version requirement (Claude Code v2.1.86+) and exclusions (Bedrock, Vertex AI, Foundry, ZDR): quoted from the official docs
  • Cursor Bugbot, GitHub Copilot code review, CodeRabbit, Greptile, Gemini Code Assist, and Sourcegraph Cody descriptions are based on each vendor's published feature pages; trade-offs and per-seat vs per-run framing are my own analysis

Related reading: The Task You Should Never Have Been Doing: Notes on Handing Work to a Computer-Use Agent is the broader frame for when to delegate to an agent fleet; Generate AI Fix Prompts From Any Site Audit shows the same audit-then-fix loop applied to web SEO; Claude Trading Lessons collects the practical Claude Code habits that compound across sessions; Domain Expertise Wins AI explains why a verified review reads as more trustworthy than a fast one.

This post is informational. Mentions of Claude Code, Cursor, GitHub Copilot, CodeRabbit, Greptile, Gemini Code Assist, and Sourcegraph Cody are nominative fair use. No affiliation is implied.
