# How a Small Business Runs AI Agents Without a $47,000 Surprise Bill

An engineering team woke up to a $47K AWS bill because two AI agents got stuck in a loop for 11 days. Here's the cheat-sheet version for small-business owners: the four safeguards that prevent 99% of agent disasters, and the honest answer to whether you should run agents at all yet.

Author: J.A. Watte
Published: May 14, 2026
Source: https://jwatte.com/blog/blog-ai-agent-cost-controls-smb/

---

The most expensive AI mistake of 2025 wasn't a hack or a leak. It was two agents stuck in a polite, infinite conversation for eleven days while their team slept.

An engineer named Kusireddy [wrote this up in October 2025](https://medium.com/towards-artificial-intelligence/we-spent-47000-running-ai-agents-in-production-c3-1234567890). Four LangChain agents, coordinating with each other to help users research market data. Week 1 cost $127. Week 2, $891. Week 3, $6,240. Week 4, $18,400 and climbing; the final bill came to $47,000 before someone pulled the plug. The agents weren't doing anything malicious. They were just asking each other clarifying questions, indefinitely, with no instruction to stop.

For a 30-person engineering team running production AI, $47K is an expensive lesson. For a five-person small business that just turned on the new [Claude for Small Business plugin](/blog/blog-claude-for-small-business-walkthrough/), the same mistake at a smaller scale could be a $4,700 lesson, still a punch in the gut. The good news: the safeguards that prevent the disaster are simple, and most of them take five minutes to set once you know they exist.

This is the SMB-owner version of the playbook.

## What an "AI agent" actually is, in plain English

When you ask Claude a question and Claude answers, that's not an agent. That's a chat.

An agent is Claude (or any AI) given the ability to take real actions in the world. Reading your QuickBooks, sending an email, posting to Slack, scheduling a calendar event. The agent receives a goal ("collect outstanding invoices"), and it decides on its own which actions to take, in what order, to achieve the goal.

When you turn on a workflow like the invoice-chase skill in Claude for Small Business, you are running an agent. The agent reads QuickBooks, decides which invoices are overdue, drafts an email, sends it, and logs the result. It's doing a small loop of decisions, and each loop costs a few cents in AI tokens plus the time the actions take.

Now imagine the agent gets confused. It reads the QuickBooks record, can't tell if the invoice was paid, asks itself for clarification, reads the record again, still can't tell, asks again. Each cycle costs a few cents. Run for a day, that's a few dollars. Run for a week unattended, that's a few hundred. Run for eleven days with multiple agents asking each other questions, that's $47,000.
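To make the compounding concrete, here's that back-of-envelope arithmetic as a tiny script. Both constants are illustrative assumptions for a typical small workflow, not figures from the incident:

```python
# Back-of-envelope cost of a stuck agent loop. Both constants are
# illustrative assumptions, not figures from the $47K incident.
COST_PER_CYCLE_USD = 0.03   # tokens burned per think-act-check cycle
SECONDS_PER_CYCLE = 10      # how long one cycle takes

def runaway_cost(days: float, agents: int = 1) -> float:
    """Dollars burned if `agents` agents loop continuously for `days` days."""
    cycles = days * 24 * 3600 / SECONDS_PER_CYCLE
    return cycles * COST_PER_CYCLE_USD * agents

print(f"One agent, one day:   ${runaway_cost(1):,.0f}")
print(f"Four agents, 11 days: ${runaway_cost(11, agents=4):,.0f}")
```

Even with these modest per-cycle numbers, one stuck agent burns a few hundred dollars a day, and a multi-agent loop left alone for a week and a half lands comfortably in five figures.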

This isn't theoretical. It's the most common failure mode in production AI, and it happens to engineering teams that should know better.

## The four safeguards

Set these once, never think about them again. None of them require code. All of them work on Claude, OpenAI, or any major AI platform.

### Safeguard 1. A hard daily cost cap

Every major AI platform now lets you set a maximum daily spend. This is the single most important setting. If you do nothing else from this post, do this one.

On Anthropic, log into your billing dashboard and set a "monthly spending limit" plus a "daily spending limit." The daily limit should be roughly 2x to 3x your expected daily usage. For a small business running the invoice-chase skill, expected daily usage is probably $1 to $5. Set the cap at $20. If the agent goes into a loop, it'll burn $20 before the platform cuts it off. That's a small lesson, not a catastrophe.

On OpenAI, the equivalent is "usage limits" in the billing settings.

This single setting would have stopped the $47K disaster on day one. They didn't have it on. The platform was perfectly happy to keep billing them.

### Safeguard 2. A max-iterations limit per task

When an agent works on a task, it loops: think, act, check, think, act, check. A reasonable task is 5 to 20 loops. An infinite loop is, well, infinite.

Every agent framework lets you cap the number of loops per task. For Claude skills, the default cap is reasonable. For DIY agents built with LangChain, CrewAI, or similar, you have to set this yourself. The right number is the smallest one that actually completes your task; usually 10 to 20 is plenty.

If you're not building agents yourself, you don't need to touch this. If you've hired a vendor to build an agent for you, ask them explicitly: "What's the max-iterations cap on each agent task?" If they don't know, they haven't set one. Don't sign the contract until they do.
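For anyone who does build their own, the cap itself is a few lines. This is a minimal sketch, not any framework's actual API; `llm_step` and `task_done` are hypothetical stand-ins you'd replace with real calls:

```python
def run_agent(task, llm_step, task_done, max_iterations=15):
    """Run one agent task with a hard iteration cap.

    `llm_step` performs one think-act-check cycle; `task_done` checks
    whether the goal is met. Both are placeholders for your framework.
    """
    state = task
    for _ in range(max_iterations):
        state = llm_step(state)
        if task_done(state):
            return state
    # Cap reached: fail loudly instead of looping (and billing) forever.
    raise RuntimeError(f"Hit {max_iterations}-iteration cap on task: {task!r}")

# Demo with stub functions: the task "completes" after three cycles.
steps = []
result = run_agent(
    "chase invoices",
    llm_step=lambda s: steps.append(s) or f"step-{len(steps)}",
    task_done=lambda s: len(steps) >= 3,
)
```

The design point is the `raise` at the end: a capped agent that can't finish should fail where you'll see it, not quietly start over.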

### Safeguard 3. A timeout per agent run

A task should complete in seconds to minutes, not days. Set a timeout so any single agent run gets killed after, say, 5 minutes. If the agent's stuck, the platform kills it and reports the failure to you. You investigate. You don't wake up to a bill.

In Claude for Small Business, individual skill runs already have built-in timeouts in the low minutes. Don't disable them.
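For DIY builds, a timeout wrapper is similarly small. A standard-library sketch; note that Python threads can't be force-killed, so a production version would run the agent in a subprocess for a true kill:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

def run_with_timeout(agent_fn, timeout_s=300):
    """Run `agent_fn` (a zero-argument callable) and give up after `timeout_s`.

    Sketch only: this stops *waiting* at the deadline, but a stuck thread
    keeps running until the interpreter exits. Use a subprocess when you
    need the runaway work actually killed.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(agent_fn)
    try:
        return future.result(timeout=timeout_s)
    except FuturesTimeout:
        raise RuntimeError(f"Agent run exceeded {timeout_s}s; investigate before retrying")
    finally:
        pool.shutdown(wait=False)
```

The important behavior is the `RuntimeError`: a timed-out run surfaces as a failure you investigate, not a retry loop.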

### Safeguard 4. Email or Slack alerts when usage doubles

Most AI platforms now let you set an alert when daily usage exceeds a threshold. Set it at half your daily cap (so the alert fires before the cap actually kicks in).

For a $20/day cap, set the alert at $10. If the alert fires mid-day, you know there's a problem before it's too late. Nine times out of ten, the alert is benign (busy week, more invoices than usual). The tenth time, it catches a runaway before it costs you a thousand dollars.
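If your platform's built-in alerting isn't flexible enough, a daily cron job that checks spend and pings a Slack incoming webhook is a dozen lines. A sketch: the webhook URL is yours to fill in, and fetching today's spend from your platform's usage API is left as a comment because that call varies by provider:

```python
import json
import urllib.request

DAILY_CAP_USD = 20.0
ALERT_AT_USD = DAILY_CAP_USD / 2   # fire before the cap does

# Replace with your own Slack incoming-webhook URL.
SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

def should_alert(todays_spend: float, threshold: float = ALERT_AT_USD) -> bool:
    """Pure threshold check, kept separate so it's easy to test."""
    return todays_spend >= threshold

def post_alert(todays_spend: float) -> None:
    """Post a spend warning to a Slack incoming webhook."""
    body = {"text": f"AI spend alert: ${todays_spend:.2f} today "
                    f"(alert line ${ALERT_AT_USD:.2f}, cap ${DAILY_CAP_USD:.2f})"}
    req = urllib.request.Request(
        SLACK_WEBHOOK,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# In the cron job:
#   spend = ...  # fetch today's spend from your platform's usage API
#   if should_alert(spend):
#       post_alert(spend)
```

Splitting the threshold check from the webhook call means the logic that decides when to wake you up can be tested without sending anything.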

## What to do once a month

This is the entire ongoing maintenance:

1. Look at your AI bill. Five minutes. Compare to last month.
2. If it's way higher than expected, ask the platform: which agent? Which task? They have a usage dashboard. Find the spike.
3. If it's about the same, you're fine. Close the tab.

Most months are about the same. The whole point of the safeguards is that the disasters never reach you; the platform caught them before you had to look.

## The two failure modes the safeguards don't catch

Being honest about the limits of this approach.

**Quiet wrongdoing.** The agent's working fine but it's doing the wrong thing. Sending follow-up emails to customers who already paid. Misclassifying receipts in the books. Inviting a vendor to your Slack who shouldn't be there. The safeguards above don't catch this; they catch loops and runaways, not subtle errors.

The fix here isn't a budget cap. It's the "run one full cycle and watch" step from the [Claude for Small Business walkthrough](/blog/blog-claude-for-small-business-walkthrough/). For the first month, you watch every action the agent takes. After a month of "no, that's fine," you can relax. Most small businesses don't need to micro-monitor forever; they need to micro-monitor for the first 30 days while the agent gets calibrated.

**Tool-use errors.** The agent decides to use a tool (like deleting a file, or writing to a database) and gets it wrong. The connector then does the wrong thing in the real world.

The fix is the principle of least permission. When you connect QuickBooks to Claude, don't grant the agent permission to delete invoices. Grant it permission to read invoices and create reminders. The smaller the agent's permission set, the smaller the blast radius if it makes a mistake.

For each connector you add, ask yourself: what's the worst the agent could do with this access? If the answer involves the word "delete" or "send money," you probably want to restrict the permissions.
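If you're wiring up connectors yourself (or writing requirements for a vendor), least permission reduces to an explicit allowlist that every tool call passes through. A sketch; the connector and action names are illustrative, not any real connector API:

```python
# Least-permission allowlist: every action the agent may take, spelled out.
# Connector and action names are illustrative, not a real API.
ALLOWED_ACTIONS = {
    "quickbooks": {"read_invoice", "create_reminder"},  # no delete, no payments
    "gmail":      {"read_message", "create_draft"},     # no auto-send yet
    "calendar":   {"read_event"},
}

def authorize(connector: str, action: str) -> None:
    """Raise unless (connector, action) is explicitly allowed."""
    if action not in ALLOWED_ACTIONS.get(connector, set()):
        raise PermissionError(f"blocked: {connector}.{action} is not on the allowlist")

# The agent's tool layer calls authorize() before every real-world action:
authorize("quickbooks", "create_reminder")   # allowed, returns silently
```

Anything not on the list is denied by default, so a confused agent that decides to "clean up" old invoices hits a `PermissionError` instead of your books.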

## Should an SMB even run agents yet?

Honest answer: yes, but only for narrow, well-defined, low-risk tasks.

**Good first agents:**

- Invoice chase (send reminders for overdue invoices)
- Calendar triage (auto-decline or reschedule meetings based on rules)
- Email categorization (label inbox; don't reply automatically yet)
- Document organization (sort uploaded receipts into folders)
- First-draft writing (draft a follow-up, you review and send)

**Don't agent yet (or only with a human approval step before action):**

- Anything that sends money
- Anything that signs contracts
- Anything that talks directly to customers without your review on first contact
- Anything legal, compliance, or regulatory in nature
- Anything where a mistake creates a public record (Twitter post, public review reply, court filing)

The pattern is: agent-friendly tasks are repetitive, low-stakes, and easy to undo. Agent-unfriendly tasks are public, high-stakes, or one-way (you can't take back the email to the IRS).

The good news: most of the tedious daily grind in a small business is in the first category. The agents pay off there fastest.

## The audit tools that catch this

I built three of these specifically for the SMB-agent use case.

- **[LLM Retrieval Cost Estimator](/tools/llm-retrieval-cost-estimator/).** Before you turn on a workflow, paste in a sample of the data the agent will touch and the tool gives you an estimated cost per run plus a monthly projection. If the projection looks bigger than your daily cap, that's the signal to either narrow the workflow or raise the cap.
- **[LLM Tokenizer Efficiency](/tools/llm-tokenizer-efficiency/).** Shows you which of your prompts (or skill definitions) are token-bloated and how to compress them without losing meaning. Cutting prompt size cuts cost, sometimes dramatically.
- **[AI Model Recommender](/tools/ai-model-recommender/).** For each workflow, recommends which model tier to use. Many agent tasks don't need Claude Opus and can run on Haiku for 1/10th the cost. The tool maps the workflow to the right tier.

And for the safety side:

- **[FBI Fraud Reflex Card for SMBs](/tools/fbi-fraud-reflex-card/).** Pattern-matching for agent risks beyond cost: data leakage, vendor fraud, social engineering. Pairs well with the safeguards above.
- **[API Secret Leakage Audit](/tools/api-secret-leakage-audit/).** If you're connecting any AI to any service, scan your codebase first to make sure you don't have an API key leaking that would let an attacker rack up bills against your account.

## A reasonable starter configuration

For a 25-person service business turning on Claude for Small Business this week, this is what I'd set:

- Daily spending limit: $20
- Monthly spending limit: $300
- Alert at $10/day
- Max iterations per task: default (Claude's built-in)
- Timeout per run: default (Claude's built-in)
- Connectors: read-only on QuickBooks, read-and-create on Gmail (no auto-send for the first month), read-only on Calendar

This combination gives you a hard ceiling on the worst-case scenario ($300/month even if everything goes wrong) while letting the agent actually do useful work. After 30 days of watching, you can decide whether to raise the limits, add auto-send, or grant write access to additional connectors.
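The same configuration written down as one checkable dict, if you want it in version control. The key names are illustrative; the real settings live in your platform's billing and connector dashboards:

```python
# Starter configuration as one dict. Key names are illustrative; the
# real settings live in the platform's billing and connector dashboards.
STARTER_CONFIG = {
    "daily_spend_limit_usd": 20,
    "monthly_spend_limit_usd": 300,
    "alert_threshold_usd": 10,
    "connectors": {
        "quickbooks": ["read"],
        "gmail": ["read", "create_draft"],   # no auto-send for month one
        "calendar": ["read"],
    },
}

# Sanity checks: the alert fires before the daily cap, and the monthly
# limit (not 30x the daily cap) is the true worst-case ceiling.
assert STARTER_CONFIG["alert_threshold_usd"] < STARTER_CONFIG["daily_spend_limit_usd"]
assert STARTER_CONFIG["monthly_spend_limit_usd"] < 30 * STARTER_CONFIG["daily_spend_limit_usd"]
```

The two asserts encode the relationships that matter: the alert threshold sits below the daily cap, and the monthly limit binds before thirty bad days could.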

## The deeper version

[The $100 Network](https://www.amazon.com/dp/B0FB1J28J9) ($9.99 on Kindle, Digital Empire series) covers the broader argument: that the under-$100 AI stack is now powerful enough for a small business to run multi-agent workflows that used to require a dedicated engineering team, but that the safeguards have to scale with the autonomy. If you want the full map of how to build out from one skill to ten, that's the book.

## Related reading

- [Claude for Small Business walkthrough](/blog/blog-claude-for-small-business-walkthrough/), the announcement post and setup guide.
- [Before you pay an agency $3,500/month for proprietary AI](/blog/blog-spot-ai-vendor-markup/), the F12 check for evaluating AI vendors.
- [AI employees stack for small business in 2026](/blog/blog-ai-employees-small-business-stacks-2026/), the broader stack map.
- [LLM Retrieval Cost Estimator deep-dive](/blog/blog-tool-llm-retrieval-cost-estimator/), the math behind the cost-control approach.
- [Opus 4.7 rankings and early-adopter cost](/blog/blog-opus-4-7-rankings-early-adopter-cost/), tier-by-tier model cost comparison.

## Fact-check notes and sources

- Kusireddy's $47,000 production-agent post: [We Spent $47,000 Running AI Agents in Production. Here's What Nobody Tells You About A2A and MCP.](https://medium.com/towards-artificial-intelligence) on Towards AI, October 2025. The $127 → $891 → $6,240 → $18,400 weekly trajectory, the 11-day loop, the LangChain four-agent configuration, and the $47K final figure are his.
- Anthropic's billing-controls dashboard (daily and monthly spending limits) documented at [docs.anthropic.com](https://docs.anthropic.com).
- OpenAI usage-limit settings documented at [platform.openai.com/docs/guides/production-best-practices](https://platform.openai.com/docs/guides/production-best-practices).
- Model Context Protocol (MCP) announcement: [Anthropic Model Context Protocol release](https://www.anthropic.com/news/model-context-protocol), March 2024.

*This post is informational, not financial, contract, or AI-implementation advice. Mentions of Anthropic, OpenAI, LangChain, QuickBooks, and other third-party services are nominative fair use. No affiliation is implied.*


---

Canonical HTML: https://jwatte.com/blog/blog-ai-agent-cost-controls-smb/
RSS: https://jwatte.com/feed.xml
JSON Feed: https://jwatte.com/feed.json
Hero image: https://jwatte.com/images/blog-ai-agent-cost-controls-smb.webp
