
The Task You Should Never Have Been Doing: Notes on Handing Work to a Computer-Use Agent

Claude-the-chatbot is a better search engine. Claude-that-drives-your-computer is something else.

The chatbot answers. You still fetch, click, format, paste, repeat. The computer-use version opens the browser itself, reads the documents, edits the spreadsheet, and hands you the thing. Those two experiences aren't adjacent on a productivity chart. They're different products. And the only question worth asking about the second one is: which work do I let it have?

Not all work. Not yet, and maybe not ever for some of it. But a surprising amount — and more than I expected the first time I ran the math.

Four quick questions before you hand anything off

I've been running computer-use agents on about fifteen different workflows across three businesses. The ones that paid off and the ones that wasted my time separated cleanly. Four questions tell you which bucket a candidate task is in. Run them in order; if any answer is bad, you have your answer.

Is the work procedural or judgment-heavy? Procedural means the right answer is defined by a sequence — open page, read field, put value in cell, next page. Judgment-heavy means the right answer depends on taste, context, or a market read. Agents are very good at the first and cheerfully wrong at the second. Automate the procedural slice. Keep the judgment for yourself.

Are you doing the task because it grows the business or because you have to? A weekly reconciliation you do because the books won't do themselves is a better automation target than a pricing experiment that might or might not pay. The value of handing off is roughly the hours you get back — and boring hours are worth the same as interesting ones.

Does the task run on interfaces the agent can actually use? Today, that means web apps, browser-based admin panels, most SaaS tools, standard documents, and local files. It does not mean legacy desktop apps, anything behind a phone-tap SSO, most CAPTCHAs, or UIs built around drag gestures. Know your edge.

Can you inspect the output before it matters? When the output is a spreadsheet you eyeball, errors are cheap. When the output is a public tweet, an approved refund, or an email to a client list, errors are expensive. Automate the first kind freely. The second kind gets a human-in-the-loop step.

A task that answers well on all four is a handoff candidate. Three is probably worth trying. Two or fewer, don't bother.

Three I said yes to

These are real. Numbers are approximate but close enough that you could plan against them.

Weekly competitive blog pulse. For five named competitor blogs, pull the most recent fifteen posts each. Note title, publish date, word count, heading outline, whether the post cites a primary source. Put it in one spreadsheet, sorted by publish date.

I used to do this twice a year because it was three hours of clicking and counting. Now it runs weekly. About eleven minutes of agent time, two minutes of my review. What changed in the business: I caught a topic shift about a month earlier than I used to, and I stopped launching a piece into territory a competitor had already saturated. That one near-miss paid for a year of the agent.
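The assembly step at the end of that workflow is simple enough to sketch. This is a minimal, hypothetical version of the "one spreadsheet, sorted by publish date" output, assuming the agent has already fetched and parsed each post into a dict; the field names here are my own invention, not anything the agent requires.

```python
import csv
import io

def build_pulse_csv(posts):
    """Assemble the weekly pulse spreadsheet as CSV text.

    posts: list of dicts with keys title, date (ISO string), word_count,
    headings (list of strings), and cites_primary_source (bool).
    Rows are sorted by publish date, newest first.
    """
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["date", "title", "words", "heading outline", "primary source?"])
    # ISO dates sort correctly as strings, so no date parsing is needed.
    for p in sorted(posts, key=lambda p: p["date"], reverse=True):
        writer.writerow([
            p["date"],
            p["title"],
            p["word_count"],
            " > ".join(p["headings"]),
            "yes" if p["cites_primary_source"] else "no",
        ])
    return out.getvalue()
```

The point of keeping this step deterministic is that the agent's judgment is confined to reading the posts; the spreadsheet itself is just plumbing you can eyeball in two minutes.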

Monthly invoice reconciliation. Open the accounting tool's vendor invoice list. For each invoice above a dollar threshold, locate the matching purchase order in the contracts folder. Flag any invoice where the total differs from the PO, line items mismatch, or there's no PO at all. Output an exception report sorted by dollar value.

A CFO friend used to eat three hours on this at the end of every month. Now it's a fifteen-minute agent job and she reads only the exceptions. Two things worth noting. She does not let the agent approve or pay anything — that would fail the fourth question. And she double-checked the agent's output manually for two months before trusting it enough to skim. Both are good hygiene.
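The exception logic itself fits in a few lines. Here is a minimal sketch of the flagging pass, assuming invoices and purchase orders have already been extracted into plain dicts; every field name and the threshold are assumptions for illustration, not the real accounting tool's schema.

```python
def exception_report(invoices, purchase_orders, threshold=500.0):
    """Flag invoices above a dollar threshold that have no matching PO,
    a total that differs from the PO, or mismatched line items.

    invoices: list of dicts with id, po_number, total, line_items.
    purchase_orders: dict mapping PO number -> dict with total, line_items.
    Returns (invoice_id, reason) pairs sorted by dollar value, descending.
    """
    exceptions = []
    for inv in invoices:
        if inv["total"] <= threshold:
            continue  # below the threshold: not worth human attention
        po = purchase_orders.get(inv["po_number"])
        if po is None:
            exceptions.append((inv["total"], inv["id"], "no matching PO"))
        elif abs(po["total"] - inv["total"]) > 0.01:
            diff = inv["total"] - po["total"]
            exceptions.append((inv["total"], inv["id"],
                               f"total differs from PO by {diff:+.2f}"))
        elif sorted(po["line_items"]) != sorted(inv["line_items"]):
            exceptions.append((inv["total"], inv["id"], "line items mismatch"))
    # Biggest dollar exposure first, so the reviewer reads top-down.
    exceptions.sort(reverse=True)
    return [(inv_id, reason) for _, inv_id, reason in exceptions]
```

Note what's absent: there is no "approve" branch. The function only produces the exception list a human reads, which is exactly the boundary the fourth question draws.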

Pre-publication link check. For a blog post in drafts, verify every outbound link returns 200, every in-page anchor targets an element that actually exists, and every internal link points at a live page on the site. Output a short checklist.

Three minutes. Catches maybe one real issue per ten posts. Before the agent, I did this only sometimes, found broken links via reader email, and retrofitted fixes. Small task, huge quality lift. The best kind of handoff.
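For the curious, the whole check is a short script. This is a stdlib-only sketch under assumptions of my own (the draft is HTML text; internal-link verification against the live site is left out); it is not the agent's actual implementation.

```python
from html.parser import HTMLParser
from urllib.error import URLError
from urllib.request import Request, urlopen

class LinkCollector(HTMLParser):
    """Collect every href, plus the set of anchor targets (id attributes)."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        if "id" in attrs:
            self.ids.add(attrs["id"])

def check_draft(html_text):
    """Return a checklist of problems: dead in-page anchors and non-200 links."""
    parser = LinkCollector()
    parser.feed(html_text)
    problems = []
    for href in parser.links:
        if href.startswith("#"):
            # In-page anchor: the target element must exist in this document.
            if href[1:] not in parser.ids:
                problems.append(f"missing anchor target: {href}")
        elif href.startswith("http"):
            # Outbound link: a HEAD request should come back 200.
            try:
                status = urlopen(Request(href, method="HEAD"), timeout=10).status
            except URLError:
                status = None
            if status != 200:
                problems.append(f"bad link ({status}): {href}")
    return problems
```

An empty return value is the "all clear" line on the checklist; anything else goes in front of a human before the post ships.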

Three I said no to

In the interest of honesty.

Drafting customer support replies. Fails question four on a regular basis. The replies sound right and occasionally promise things we don't do. Errors like that are expensive and not caught by spreadsheet-style review. Support replies still come from a human. The agent helps with research — pulling customer history from the CRM — but the words are mine.

Writing market commentary for the newsletter. Fails question one. This is taste work dressed up as fact work. The agent produces prose that's technically accurate, structurally ordinary, and unmistakably not me. I went back to writing it myself and stopped pretending that was a weakness to fix.

Final approval on a pull request. Fails question four. An agent can surface style violations, failing tests, and known anti-patterns — that's useful. Letting it approve and merge is letting it make a judgment call with a production-level blast radius. Human review stays in the loop.

The small set of habits that keeps it safe

Three things I do every time.

A separate browser profile with no saved passwords, no active banking sessions, no cookies for anything material. The agent operates in a sandbox that has access to exactly what the task needs and nothing else.

Read-only API keys wherever the data source supports them. The agent can fetch analytics but not change them. Pull invoices but not approve them. Read a repository but not push to main.

Watch the first run. I tail the logs on any new task end-to-end for the first execution. After one clean run I let it go unattended. Before that, I'm paying attention.

None of this is paranoia. It's what you'd do with a new intern who needed admin access to get anything done. The agent is an intern now. Treat it like one.

The shift

Here's what clicked for me. A year ago the question was "can the AI do this task?" The question now is "given that it can, should I have been doing this task in the first place?"

For a lot of the fetch-extract-arrange work that fills a knowledge worker's week, the honest answer is no. That work was never valuable because it was your work — it was valuable because until recently there was no reasonable way to have it done without it being someone's. That changed. Quietly. Somewhere around the middle of 2024.

What you keep is the work that actually requires you. The decisions. The taste calls. The customer conversations. The strategy. What you hand off is the work that only required you because the alternative didn't exist.

That's the whole point. The tools are new; the discipline of deciding what to delegate has been the same for a thousand years.


Related tools: Mega Analyzer and Batch Analyzer pair well with a computer-use agent that crawls a competitor's sitemap; the CLAUDE.md Generator produces the memory file that makes repeated handoffs more reliable.

Informed in part by a Medium piece titled "How To Automate Your Stock Research in 23 Minutes With Claude Co-work" from the AI in Trading channel. Framing and examples in this post are my own; the underlying observation about computer-use agents shifting the delegation question is shared across the community.
