# Claude on a Schedule and at Scale: Routines, GitHub Actions, the Batch Discount, and Model-Routed Gates

Put Claude on a schedule with routines, then run it cheaply at volume with GitHub Actions parallelism, the Batches API 50% discount, and model-routed gates.

Author: J.A. Watte
Published: June 27, 2026
Source: https://jwatte.com/blog/claude-routines-and-scale/

---

There is the Claude you drive and the Claude that runs without you. This post is about the second one: getting work done on a timer, on an event, or in bulk, and doing it without paying full price for every call. There are two halves to it. The managed half is routines, where Claude Code handles the scheduling and execution for you. The coded half is your own pipeline, where you trade convenience for control and a much lower bill at volume.

## Routines: Claude Code on a timer or a trigger

A routine is a saved Claude Code configuration, a prompt plus the repositories, connectors, and environment it needs, that runs on its own. In the desktop app you create one and choose a flavor, Remote or Local, and that choice decides almost everything else.

A **remote routine** runs in the cloud on Anthropic's infrastructure. Your laptop does not need to be on. It runs without permission prompts, as a full Claude Code session scoped to what you gave it, and it can commit, open pull requests, and call your connectors under your own identity. By default it can only push to branches that start with `claude/`, a safety guard you can lift per repository when you actually want it writing to a shared branch.

A **local routine** runs on your own machine instead. You reach for this when the task needs your real files, your local tools, or an environment that only exists on your computer. The catch is the one I covered in the [desktop app post](/blog/claude-desktop-app/): it only fires while the app is open and the machine is awake. If the computer is asleep at the scheduled time the run is skipped, and when you wake it the app runs a single catch-up for the most recent missed slot and drops the older ones. There is a keep awake option if a local routine has to be dependable.

You create either kind three ways: the web at claude.ai/code/routines, the `/schedule` command inside a session where you just describe what you want, or the Routines panel in the desktop app. All three write to the same account, so a routine made in one shows up in the others.

### What can set one off

Remote routines support three kinds of trigger:

- **A schedule**, on a recurring cadence like hourly, daily, weekdays, or weekly, or as a one off at a specific time. The minimum interval is one hour.
- **An API call**, a per routine endpoint you POST to with a bearer token, with an optional text field for passing in the context of that particular run.
- **A GitHub event**, once the Claude GitHub App is on the repository, firing on pull request and release events and filterable down to the author, branch, labels, and the rest.

Between the three, almost any automation people reach for already has a trigger waiting for it.

### The guardrails worth knowing

Routines are a research preview as I write this, so treat the specifics as a moving target. A few are worth keeping in mind. There are daily run caps by plan, on the order of a few runs a day on Pro and more on the higher tiers, with one off runs not counting against the cap. Everything a routine does happens under your identity, your commits, your connector accounts, so scope each one to only the repositories and tools it actually needs. And on team plans an owner can switch routines off for the whole organization.

## When you outgrow routines: your own pipeline

Routines are the right tool for a handful of scheduled jobs. They are the wrong tool for running Claude over thousands of items, because each routine is one session with a daily cap, not a bulk processor. When the job is "score these ten thousand records" or "summarize every document in this bucket," you write the loop yourself, and the question becomes how to run it fast and cheap. Three levers do almost all the work: parallelism, the batch discount, and routing.

## Parallelism in GitHub Actions

If your bulk job already lives in CI, the simplest way to go faster is to shard it across runners with a matrix. Each shard processes its slice of the work in its own job, in parallel:

```yaml
jobs:
  analyze:
    strategy:
      max-parallel: 4
      matrix:
        shard: [1, 2, 3, 4]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python analyze.py --shard ${{ matrix.shard }} --of 4
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

The ceiling is not the number of runners, it is your account's rate limits, the requests and tokens per minute Anthropic will accept. Push too many calls at once and you start getting 429 responses. The SDK retries those with backoff on its own, but you should still design for the limit rather than against it. That is what `max-parallel` is for: cap the number of shards running at once so the combined call rate stays under your ceiling. And keep the bigger point in view: parallel real time calls all pay full price. Going wide finishes the job sooner, but the per call cost is unchanged. For sheer volume, the next lever does.

## The Batches API: half price for work that can wait

The Batches API runs your requests asynchronously at half the normal token price. You hand it up to a hundred thousand requests at once, it works through them, and you collect the results when they are ready. Most batches finish within an hour, the ceiling is twenty four, and the results stay available for a few weeks. Everything the regular API does, tools, vision, prompt caching, works inside a batch too.

The trade is in that word asynchronous:

- **For it:** the tokens cost fifty percent less, the throughput is enormous, and you do not have to manage your own concurrency or backoff.
- **Against it:** it is not interactive, so it is wrong for anything a user is waiting on. The results come back in whatever order they finish, so you have to label each request and match results by that label. And you poll for completion rather than getting an immediate answer.

Coding it is short. Tag every request with a `custom_id`, submit the batch, wait, then match results back by that id:

```python
import time
from anthropic import Anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = Anthropic()

batch = client.messages.batches.create(requests=[
    Request(
        custom_id=f"item-{i}",
        params=MessageCreateParamsNonStreaming(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        ),
    )
    for i, prompt in enumerate(prompts)
])

while client.messages.batches.retrieve(batch.id).processing_status != "ended":
    time.sleep(30)

# results arrive in any order, so key by custom_id, never by position
for result in client.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        text = next(b.text for b in result.result.message.content if b.type == "text")
        save(result.custom_id, text)
```

If the work can wait, batch it instead of hammering the real time API in parallel for something a batch does for half the money.

## Model-routed gates: spend the expensive model only where it counts

The last lever is about which model handles each item. On a big pile of work, most items are easy and a few are hard, and paying the top model for every one of them is waste. A model routed gate fixes that. A cheap, fast model looks at each item first and decides whether it is simple or hard, and only the hard ones go to the expensive model. The gate itself is a tiny classification call:

```python
def gate(item: str) -> str:
    r = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=5,
        messages=[{"role": "user", "content":
            f"Reply with one word, SIMPLE or HARD. Item:\n{item}"}],
    )
    label = next(b.text for b in r.content if b.type == "text").strip().upper()
    return "HARD" if "HARD" in label else "SIMPLE"
```

Per million tokens, Haiku runs about a dollar in and five out, Sonnet three and fifteen, Opus five and twenty five. A Haiku gate costs a fraction of a single Opus call, so if it lets you answer the easy majority with a cheaper model and reserve the top model for the genuinely hard slice, the saving dwarfs the triage. Two cautions. The gate can misjudge, so make hard the safe default and route up whenever it is unsure, because a wrong cheap answer costs more than a right expensive one. And remember the gate is itself a call, so it only wins when the routing saves more than the triage spends, which is to say on real volume.

## Putting the levers together

The cheapest correct pipeline for bulk work stacks all three. Gate every item with a cheap model. Send the large, can wait majority to the Batches API at half price. Keep real time parallel calls, in GitHub Actions or wherever your job runs, for the slice that genuinely needs an answer now. Then wrap the whole thing in a routine so it runs on a schedule or on an event without you starting it.

This is ordinary cost discipline applied to model calls, spend effort and money where they change the outcome and automate the rest. That same discipline is the spine of [The $97 Launch](https://the97dollarlaunch.com/), which is about building real things on a small budget by being deliberate about where the dollars go.

## Related reading

- [Claude Code Loop vs Schedule](/blog/claude-code-loop-vs-schedule/) compares running something on an interval against scheduling it, which is the choice underneath every routine.
- [Claude on the Desktop](/blog/claude-desktop-app/) covers local routines and the awake constraint in the context of the desktop app.
- [Claude Cowork](/blog/claude-cowork-for-knowledge-work/) is the knowledge work side of the same scheduling feature.
- [Claude Code Without the Terminal](/blog/claude-code-beyond-the-terminal/) maps the cloud and web surfaces where remote routines live.

## Fact-check notes and sources

- Routines, triggers, the Remote and Local split, branch-push defaults, and daily caps: [Automate work with routines](https://code.claude.com/docs/en/routines), the [routine API trigger](https://platform.claude.com/docs/en/api/claude-code/routines-fire), and [scheduling recurring tasks in Cowork](https://support.claude.com/en/articles/13854387-schedule-recurring-tasks-in-claude-cowork).
- The Batches API (50% pricing, up to 100,000 requests per batch, asynchronous completion, results keyed by `custom_id`): [Message Batches](https://platform.claude.com/docs/en/build-with-claude/batch-processing) and [pricing](https://claude.com/pricing).
- Rate limits that cap real-time parallelism, and per-model pricing used in the routing math: [rate limits](https://platform.claude.com/docs/en/api/rate-limits) and [models overview](https://platform.claude.com/docs/en/about-claude/models/overview).

*Written from my own hands-on use of Claude and the Claude API. Mentions of Claude, Claude Code, and Anthropic are nominative; this site is independent, and no affiliation or endorsement is implied. This post is informational, not professional advice.*


---

Canonical HTML: https://jwatte.com/blog/claude-routines-and-scale/
RSS: https://jwatte.com/feed.xml
JSON Feed: https://jwatte.com/feed.json
Hero image: https://jwatte.com/images/claude-routines-and-scale.webp
