I Cut a Recurring AI Bill by More Than Half in an Afternoon, Without Touching Quality

The model bill on a small AI project I run had crept up to roughly 140 dollars a month. Nothing dramatic, no runaway loop, no leak. Just a number that kept climbing while I wasn't looking. I spent an afternoon on it, changed almost no actual logic, and brought it under 25 dollars a month with no drop in what the project produces.

The interesting part is that none of the wins came from writing better code. They came from looking at things I had stopped looking at. Here is the checklist, in the order that found the most money fastest.

1. Find out what you are actually paying for

This was the single biggest win and it took twenty minutes.

I assumed the bill was the news desk, the part that reads articles and writes summaries. It wasn't. The same API key was quietly paying for two completely different jobs: the news desk, and a translation step that converts every foreign-language story into English. When I split the bill apart, the news desk was about a third of it. The translation was the other two thirds.

I had been about to optimize the wrong thing.

One shared key, or one shared account, will happily blend the cost of five different jobs into a single scary number, and your instinct about which job is expensive is usually wrong. Before you change anything, separate the spend by job. Most billing dashboards let you group by API key, by model, or by project. Group it. The thing you were sure was the problem often isn't.
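
Here is a minimal sketch of that grouping step, assuming you can export usage as rows with a job label and a cost. The field names (`job`, `cost_usd`) and the figures are hypothetical placeholders, not the format of any particular billing dashboard.

```python
from collections import defaultdict


def spend_by_job(rows):
    """Sum cost per job and return (job, cost, share) tuples, largest first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["job"]] += row["cost_usd"]
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (job, cost, cost / grand_total if grand_total else 0.0)
        for job, cost in ranked
    ]


# Hypothetical export rows; real exports will use different field names.
usage_rows = [
    {"job": "news_desk", "cost_usd": 1.50},
    {"job": "translation", "cost_usd": 2.00},
    {"job": "translation", "cost_usd": 1.00},
]

for job, cost, share in spend_by_job(usage_rows):
    print(f"{job:12s} ${cost:6.2f}  {share:.0%}")
```

If the export only has an API key or a model name per row, group on that instead. The point is to get a ranked list before touching anything.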

Once I could see it, the translation fix was obvious: I switched it from a paid model back to a free translation endpoint that is perfectly good for the job. Two thirds of the bill, gone, in one setting change.

2. Read your schedules before you read your code

The second win was a cron job nobody had looked at in a month.

A batch process was set to run twice a day. The second run regenerated the exact same set of outputs as the first and saved them right on top of the morning's work. It was paying full price to overwrite itself. The corpus it read from barely changed between the two runs, so the second batch was almost pure waste. I cut it to once a day and added a much lighter midday refresh of just the headline content instead.

Right next to it, a market-data snapshot was firing every thirty minutes. Forty-eight runs a day. The job itself takes about twenty seconds, but per-minute billing, such as GitHub Actions uses, rounds every run up to a full minute. So forty-eight tiny runs cost forty-eight billed minutes a day for about sixteen minutes of actual work, two thirds of it rounding tax, for data that only needs to be hourly. I halved the schedule and the cost halved with it.
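
The rounding arithmetic is easy to check. This sketch assumes billing that rounds each run up to a whole minute and uses the figures above (a job of about twenty seconds, run every thirty minutes versus hourly). The function names are my own.

```python
import math


def billed_minutes_per_day(runs_per_day, runtime_seconds):
    """Billed minutes when every run is rounded up to a whole minute."""
    return runs_per_day * math.ceil(runtime_seconds / 60)


def actual_minutes_per_day(runs_per_day, runtime_seconds):
    """Minutes of real compute, with no rounding."""
    return runs_per_day * runtime_seconds / 60


every_half_hour = billed_minutes_per_day(48, 20)  # 48 billed minutes
hourly = billed_minutes_per_day(24, 20)           # 24 billed minutes
real_work = actual_minutes_per_day(48, 20)        # 16.0 minutes of compute

print(every_half_hour, hourly, real_work)
```

Because each run bills as a full minute regardless of its twenty seconds of work, the cost tracks the number of runs, not the runtime. Halving the schedule is the lever that halves the bill.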

Your schedules are where money leaks silently, because a cron job never complains. It just runs, and bills, forever. Open the list. For each scheduled job ask two questions: does this need to run this often, and is anything downstream actually using every run? You will find at least one that answers no.

3. Only regenerate when the input actually changed

The third idea is the one I will reuse on everything from now on.

Most expensive AI jobs regenerate something on a timer. A daily summary, an hourly digest, a nightly report. They run whether or not the underlying data moved. On a slow news day, my project was paying to rewrite a report about the same stories it summarized hours earlier.

The fix is a skip gate. Before the expensive job runs, take a cheap fingerprint of its input, in my case the set of the most recent story links, and compare it to the fingerprint from the last time it ran. If the input barely changed, skip the job entirely. If it changed a lot, run it. I set the threshold so that the job only fires when at least a third of the input is new.

The gate fails open, which matters: if the fingerprint check errors or there's no baseline yet, it runs the job rather than silently skipping it. You never want a cost optimization that quietly suppresses your output. You want one that pays only when there is something new to pay for.

This pattern fits almost any timed regeneration. Hash each input item, store the hashes, compare next time, skip if the set is basically the same. It is a dozen lines and it stops you paying to rebuild things that did not change.
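
A minimal sketch of such a gate follows. It is not my production code. It assumes the input is a list of story links and that the fingerprint can live in a small JSON file. It stores one hash per link rather than a single hash of everything, because a single hash can only say "changed" or "unchanged", not how much. The function names and the state-file layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path

NEW_FRACTION_THRESHOLD = 1 / 3  # fire only when at least a third of the input is new


def fingerprint(links):
    """Cheap fingerprint: the set of SHA-256 digests of the individual links."""
    return {hashlib.sha256(link.encode("utf-8")).hexdigest() for link in links}


def should_run(links, state_path):
    """Return True if the expensive job should run. Fails open on any problem."""
    try:
        path = Path(state_path)
        if not path.exists():
            return True  # no baseline yet: run
        previous = set(json.loads(path.read_text()))
        current = fingerprint(links)
        if not current:
            return True  # nothing to compare: fail open
        new_fraction = len(current - previous) / len(current)
        return new_fraction >= NEW_FRACTION_THRESHOLD
    except Exception:
        return True  # the check itself broke: run rather than skip silently


def record_run(links, state_path):
    """Store the fingerprint after the expensive job has actually run."""
    Path(state_path).write_text(json.dumps(sorted(fingerprint(links))))
```

Call `should_run` before the expensive job, and call `record_run` only after the job succeeds. That way a skipped run leaves the baseline alone, and small changes keep accumulating until they cross the threshold.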

4. Move each job to the cheapest tool that can do it

A lot of cost is using a heavy tool for a light job out of habit.

The translation step in point one is one example: a frontier model doing work a free endpoint handles fine. There are others. Counting things does not need a model at all; a few lines of code do it for free and more accurately, which I wrote about separately in the post on AI making up numbers. The shallow, formulaic parts of a job can often run on a smaller, cheaper model while the genuinely hard reasoning stays on the expensive one.

The question for every step is: what is the cheapest thing that produces an acceptable result here? Not the best thing. The cheapest acceptable thing. You will be surprised how often the honest answer is "a free endpoint" or "ten lines of code."
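
As one illustration of the "ten lines of code" answer, here is a sketch of counting stories per source without a model. The record shape and the source names are hypothetical.

```python
from collections import Counter


def count_by_source(stories):
    """Count stories per source with plain code; no model call involved."""
    return Counter(story["source"] for story in stories)


# Hypothetical story records.
stories = [
    {"source": "wire_a", "title": "First story"},
    {"source": "wire_b", "title": "Second story"},
    {"source": "wire_a", "title": "Third story"},
]

counts = count_by_source(stories)
print(counts.most_common())
```

The result is exact and costs nothing per run, which is the bar any model-based version of the same step would have to beat.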

5. Watch for the thing that is quietly growing

The last check is less a fix and more a habit. When I separated the spend, one job was not just expensive, it was growing week over week, climbing as the project added more sources to process. Left alone it would have doubled again. A number that is merely high is a one-time fix. A number that is climbing is the one that turns into a real problem if you ignore it. Sort your costs by trend, not just by size, and deal with the climbers first.
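
Sorting by trend can be as simple as the sketch below. It assumes you have a few weeks of per-job cost totals, oldest first. The job names and numbers are made up for illustration, and the average week-over-week growth rate is only one reasonable way to score a trend.

```python
def weekly_growth(costs):
    """Average week-over-week growth rate for weekly costs, oldest first."""
    rates = [
        (later - earlier) / earlier
        for earlier, later in zip(costs, costs[1:])
        if earlier > 0
    ]
    return sum(rates) / len(rates) if rates else 0.0


def rank_by_trend(weekly_costs_by_job):
    """Return job names ordered from fastest-growing to fastest-shrinking."""
    return sorted(
        weekly_costs_by_job,
        key=lambda job: weekly_growth(weekly_costs_by_job[job]),
        reverse=True,
    )


# Hypothetical weekly totals in dollars, oldest week first.
weekly_costs = {
    "news_desk": [10.0, 10.0, 10.0, 10.0],
    "translation": [8.0, 10.0, 12.0, 15.0],
    "snapshot": [5.0, 5.0, 4.0, 4.0],
}

print(rank_by_trend(weekly_costs))
```

A job can sit low in the size ranking and still top this list, which is exactly the case worth catching early.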

The afternoon, in total

Separate the bill by job. Read the schedules. Gate regeneration on real change. Use the cheapest tool that works. Watch the trend. None of that is clever engineering. It is just looking, on purpose, at things that quietly bill you while you are busy. The reward for an afternoon of looking was a bill cut by more than half, with the output unchanged.

If you are tired of paying every month for work you could do yourself with a few cheap tools, that is the whole premise of The $20 Dollar Agency, one of my $9.99 guides.

Fact-check notes and sources

  • The bill figures, the one-third versus two-thirds split, and the per-job costs are from my own project's billing data and logs.
  • GitHub Actions bills each job by the minute and rounds each run up to the nearest full minute, which is why many short scheduled runs cost more than their runtime suggests: GitHub Actions billing.
  • Anthropic's Message Batches API is billed at a 50% discount versus standard calls, and prompt caching pricing is documented here: Anthropic pricing.

This post is informational, not financial or consulting advice. It describes work on my own software. No affiliation with any vendor named is implied.

Last updated: April 2026