A background job on a project I run had quietly become the most expensive thing in the whole operation. It scrapes about a hundred news sources, and it had grown to take up to 150 minutes per pass, running seven times a day. The runtime kept climbing as I added sources. Left alone, it was on track to cost more than everything else combined.
The fix took about a dozen lines and cut a typical run from over two hours to around thirty minutes, with the same coverage. Here is what was wrong and how the same change applies to a lot of slow jobs.
The job was waiting, not working
The original loop did the obvious thing. For each source: fetch the page, wait a polite moment, pull out the articles, move to the next source. One at a time, start to finish.
The problem is what "fetch the page" actually means. Your machine sends a request to some news site's server and then waits. It waits for that server to wake up, find the page, and send it back. That round trip can take a second, or five, or thirty if the site is slow. During every bit of that wait, your machine is doing nothing at all. It is sitting idle, holding its breath, until the response comes back. Then it does a tiny bit of work and starts waiting on the next source.
Multiply that by a hundred sources, each with its own waiting, and most of the two and a half hours was not work. It was waiting in line, one source at a time.
This is the difference between a job that is busy thinking and a job that is busy waiting. If your job is crunching numbers, it is using the processor and going faster means a faster processor. But if your job is mostly waiting on the network, the processor is bored. You do not need a faster machine. You need to stop waiting on things one after another.
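The one-at-a-time shape described above can be sketched as follows. This is a minimal illustration, not the project's actual scraper: `fakeFetch`, `scrapeSequentially`, and the delay values are made-up names and numbers, and the "fetch" only sleeps so the waiting is easy to see.

```javascript
// Hypothetical stand-in for a network fetch: resolves after `ms` milliseconds.
const fakeFetch = (source, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(`${source}: ok`), ms));

// One source at a time: total time is the SUM of every individual wait.
async function scrapeSequentially(sources, ms) {
  const started = Date.now();
  const results = [];
  for (const source of sources) {
    // The machine is idle for the whole duration of this await.
    results.push(await fakeFetch(source, ms));
  }
  return { results, elapsedMs: Date.now() - started };
}

// Five sources at about 20 ms each add up to roughly 100 ms in total.
scrapeSequentially(['a', 'b', 'c', 'd', 'e'], 20).then(({ elapsedMs }) => {
  console.log(`sequential run took ~${elapsedMs} ms`);
});
```

Scale the sleep up to real network latencies and the source list up to a hundred, and the sum is where the hours go.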
Wait on several things at once
While source A's server is taking its time, nothing stops you from also asking source B, source C, and source D. All four servers can be working on your requests at the same time. None of the individual fetches gets faster. You just stop standing in a single-file line.
So instead of one loop, I run a small pool of workers. Each worker pulls the next source off a shared list, handles it, then grabs the next one. With four workers, up to four sources are in flight at any moment. When one finishes, that worker immediately claims the next unclaimed source. As long as the job is dominated by waiting, the wall-clock time drops by roughly a factor equal to the number of workers. Four workers, about four times faster, give or take.
The whole pool is about this much code:
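async function runPool(items, concurrency, worker) {
  let next = 0; // shared counter: index of the next unclaimed item
  const run = async () => {
    while (next < items.length) {
      const i = next++; // claim an item; no await between check and claim
      try { await worker(items[i], i); }
      catch (e) { console.error('item failed, continuing:', e.message); }
    }
  };
  // Start at most `concurrency` workers and wait for all of them to drain the list.
  await Promise.all(
    Array.from({ length: Math.min(concurrency, items.length) }, run)
  );
}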
That is the entire idea. You start a fixed number of workers, they share one counter, and each one keeps claiming the next item until the list runs out.
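To see the speedup without touching the network, here is a small usage sketch. `runPool` is repeated from the listing above so the snippet runs on its own; the `demo` function, the eight source names, and the 50 ms sleep are illustrative assumptions.

```javascript
// runPool is repeated from the listing above so this snippet is self-contained.
async function runPool(items, concurrency, worker) {
  let next = 0;
  const run = async () => {
    while (next < items.length) {
      const i = next++;
      try { await worker(items[i], i); }
      catch (e) { console.error('item failed, continuing:', e.message); }
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(concurrency, items.length) }, run)
  );
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical demo: every "source" pretends to take 50 ms to respond.
async function demo() {
  const sources = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'];
  const done = [];
  const started = Date.now();
  await runPool(sources, 4, async (source) => {
    await sleep(50);
    done.push(source);
  });
  // 8 items across 4 workers is 2 rounds of ~50 ms,
  // versus ~400 ms if the same items ran one at a time.
  return { done, elapsedMs: Date.now() - started };
}

demo().then(({ done, elapsedMs }) =>
  console.log(`${done.length} sources in ~${elapsedMs} ms`));
```

The claim step is safe without locks because JavaScript runs one piece of code at a time: there is no await between checking the counter and incrementing it, so two workers cannot claim the same index.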
Bounded, not unleashed
The instinct after you understand this is to fire all hundred requests at once. Do not. Unbounded concurrency is its own disaster. A hundred simultaneous fetches can exhaust your machine's memory, trip rate limits, and look like an attack to the sites you are reading. You get blocked, and you deserve it.
A small fixed pool is the sweet spot. Four to eight workers captures almost all of the speedup while staying gentle. I used four. It kept memory flat and kept me a polite guest on every server.
What you must not break when you parallelize
Speed is easy. Speed without breaking the things the slow version got right is the actual job. Four things mattered.
Do not hammer any single server. I run the sources concurrently, but each individual source keeps its original pause between page requests. The concurrency is across different sites, never piling onto one. Four different servers being asked one polite question each is fine. One server being asked forty questions at once is not.
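A minimal sketch of that rule, assuming each source is an object with a `pageUrls` array and that the page fetcher is passed in as a function. `scrapeSource`, `fetchPage`, and `pauseMs` are hypothetical names, not the project's code.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical per-source worker. Pages of ONE source are requested strictly
// one after another, with a pause between requests. Any concurrency comes
// only from running several of these workers for DIFFERENT sources.
async function scrapeSource(source, fetchPage, pauseMs) {
  const pages = [];
  for (let i = 0; i < source.pageUrls.length; i++) {
    if (i > 0) await sleep(pauseMs); // polite gap before each follow-up request
    pages.push(await fetchPage(source.pageUrls[i]));
  }
  return pages;
}
```

Handing a function like this to the pool as the worker keeps the politeness where it was: the pool decides how many sites are being read at once, and each site still sees a single unhurried visitor.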
Keep saving as you go. The slow version wrote its progress to disk after each source, so a crash at minute 90 still left 90 minutes of results. The fast version keeps that exactly. Each worker saves the moment it finishes an item. If the job dies, you have everything completed so far, not an empty file.
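One way to save per item, sketched under assumptions: the post does not say what the on-disk format is, so this uses one JSON line per finished source. `saveResult`, `loadResults`, and the record fields are hypothetical.

```javascript
const fs = require('node:fs');

// Hypothetical checkpoint helper: append one JSON line per finished source.
// appendFileSync is synchronous, so two workers finishing at nearly the same
// moment cannot interleave their writes within this single Node.js process.
function saveResult(path, source, articles) {
  const record = { source, articles, savedAt: new Date().toISOString() };
  fs.appendFileSync(path, JSON.stringify(record) + '\n');
}

// After a crash, everything saved so far can be read back line by line.
function loadResults(path) {
  if (!fs.existsSync(path)) return [];
  return fs.readFileSync(path, 'utf8')
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}
```

Calling `saveResult` as the last step of the worker means a run that dies partway still leaves one complete line for every source that finished.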
Isolate failures. One source that errors or hangs must not take down the other ninety-nine. In the pool above, each item runs inside its own try/catch, so a single bad fetch logs a line and the pool keeps going. A hang never throws on its own, so it is the per-source timeout that turns a stuck site into an error the try/catch can absorb. The old loop already did this per item; the new one has to keep doing it.
Keep the safety nets. There was a per-source timeout so one stuck site could not stall everything, and an overall time budget so the job always exits cleanly even on a bad day. Both carried straight over. Concurrency does not replace your timeouts. It runs inside them.
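Both nets can be sketched in a few lines. This is one common way to build them, not necessarily how the original job does it: `withTimeout` and `makeDeadline` are hypothetical helpers, and the millisecond values you pass are up to you.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
// Note that this only stops WAITING for the work; it does not cancel the
// underlying request, which would need something like an AbortController.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins. Always clear the timer so it cannot keep
  // the process alive after the work is done.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical overall budget: workers ask this before claiming more work.
function makeDeadline(budgetMs) {
  const stopAt = Date.now() + budgetMs;
  return { expired: () => Date.now() >= stopAt };
}
```

Inside a pool worker, the per-source call would be wrapped as `await withTimeout(scrape(source), 60_000, source.name)`, where `scrape` and the 60-second figure are placeholders. A timeout rejection lands in the same try/catch as any other failure, so the worker logs it and moves on.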
How to test it without running the whole thing
I did not want to find out whether the concurrency was correct by watching a live two-hour scrape. So I pulled the pool out into its own little function, the one above, and wrote a fast test that runs it with a fake worker that just sleeps and records what happened. No network, no real scraping. In a few milliseconds it proved the things I actually cared about: never more than four in flight at once, every item handled exactly once, a thrown error does not kill the pool, and the stop signal halts it cleanly. Only after that passed did I run it against a few real sources to confirm the live path behaved. Cheap test first, expensive test second.
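A test along those lines might look like the sketch below. It is an illustration, not the author's test. The listing above has no stop check, so `runPoolWithStop` adds a hypothetical `shouldStop` callback to make that part testable; `check` and `testPool` are also made-up names.

```javascript
// Variant of the pool with a hypothetical `shouldStop` callback, checked
// before each claim. This parameter is an assumption for illustration.
async function runPoolWithStop(items, concurrency, worker, shouldStop = () => false) {
  let next = 0;
  const run = async () => {
    while (next < items.length && !shouldStop()) {
      const i = next++;
      try { await worker(items[i], i); }
      catch (e) { console.error('item failed, continuing:', e.message); }
    }
  };
  await Promise.all(
    Array.from({ length: Math.min(concurrency, items.length) }, run)
  );
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
function check(condition, message) {
  if (!condition) throw new Error(message);
}

async function testPool() {
  const items = Array.from({ length: 20 }, (_, i) => i);
  const seen = [];
  let inFlight = 0;
  let peak = 0;

  // Fake worker: sleeps, records what happened, and fails once on purpose.
  await runPoolWithStop(items, 4, async (item) => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await sleep(2);
    inFlight--;
    seen.push(item);
    if (item === 7) throw new Error('simulated failure');
  });

  check(peak === 4, 'never more than four in flight');
  check(seen.length === items.length, 'a thrown error must not kill the pool');
  check([...seen].sort((a, b) => a - b).join() === items.join(),
    'every item handled exactly once');

  // Stop signal: trips once six items are handled; the pool should halt early.
  const handled = [];
  await runPoolWithStop(items, 4, async (item) => {
    await sleep(2);
    handled.push(item);
  }, () => handled.length >= 6);
  check(handled.length >= 6 && handled.length < items.length,
    'stop signal halts the pool before the list is exhausted');

  return { peak, handledBeforeStop: handled.length };
}

testPool().then((summary) => console.log('pool test passed', summary));
```

Items already in flight when the signal trips are allowed to finish, which is why the stop check asserts a range rather than an exact count.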
Where this applies
Any time you have a loop shaped like "for each item, go fetch or call something over the network, then handle the result," you have a candidate. A nightly job that hits fifty API endpoints. An image pipeline that calls a service once per file. A script that checks five hundred URLs. A mailout to a thousand subscribers. If the bottleneck is waiting on other machines rather than computing on yours, a small bounded pool is usually the highest-leverage change available, and it is a dozen lines.
The slow version felt safe because it was simple. But simple and slow was costing real money and getting slower every week. A dozen lines bought back two hours a pass and stopped the climb.
If you are running lean technical work for yourself or for clients and want the larger playbook for doing more with a tiny stack, that is what The $20 Dollar Agency is about, one of my $9.99 guides.
Related reading
- I Cut a Recurring AI Bill by More Than Half in an Afternoon
- Your AI Is Quietly Making Up the Numbers in Your Reports
- The $50/Month AI Stack for a Small Business
Fact-check notes and sources
- The runtime figures (up to 150 minutes per pass, seven passes a day, around 30 minutes after the change) and the source count are from my own project's run logs.
- GitHub Actions bills by runner-minute, so cutting wall-clock time cuts the bill directly: GitHub Actions billing.
- The pattern shown is standard bounded concurrency over Promises in JavaScript; Promise.all resolves when all the workers finish: MDN, Promise.all.
This post is informational, not consulting advice. It describes work on my own software. No affiliation with any vendor named is implied.