
Paperclip: When Your AI Agents Need an Org Chart, Not a Prompt

Most of us still talk to AI the way we talk to a search box. One question, one answer, repeat. The minute you have more than one agent running, that frame breaks. You stop asking "what should I prompt?" and start asking "who is supposed to be doing this?"

That second question is what Paperclip is built around. It is an open-source, self-hosted platform for managing AI agents the way you would manage a small company. There is an org chart, the agents have roles and reporting lines, every task is a ticket, every agent has a monthly budget, and each role wakes up on a schedule. The whole thing runs locally with npx paperclipai onboard --yes. No vendor account, no SaaS lock-in, MIT license.

The pitch on the homepage is honest about the shift in mental model: "The mental model is a company you are running, not a tool you are using." If that line lands for you, you already understand why a ChatGPT thread is the wrong shape for a five-agent operation.

What you actually get

The pieces fit together much the way a small startup does.

An org chart. Roles like CEO, CMO, and CTO, with reporting lines between them. You can run multiple isolated companies in a single Paperclip deployment, which matters if you contract for clients or you want a personal workspace separate from business work.

Heartbeats. Each role has a schedule. The Content Writer wakes up every four hours, looks at its inbox, picks the highest-priority ticket, does the work, and goes back to sleep. The SEO Analyst wakes up every eight hours. The Social Manager every twelve. You set the cadence; the agents do not run constantly burning tokens.
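Paperclip manages these schedules itself, but the cadence is easy to reason about in cron terms. This is an analogy only, not Paperclip configuration, and the heartbeat URL and paths are made up for illustration:

```shell
# Analogy only: Paperclip fires heartbeats internally on its own schedule.
# $PAPERCLIP_URL and the /heartbeat/<role> path are hypothetical.
0 */4 * * *    curl -fsS -X POST "$PAPERCLIP_URL/heartbeat/content-writer"
0 */8 * * *    curl -fsS -X POST "$PAPERCLIP_URL/heartbeat/seo-analyst"
0 */12 * * *   curl -fsS -X POST "$PAPERCLIP_URL/heartbeat/social-manager"
```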

Tickets. Every task is a ticket with an originator, an assignee, a goal, an output, and a full audit trail. Ticket #1042 might be "draft three social posts about Tuesday's product launch." The CMO opened it, assigned it to the Social Manager, and it carries the company mission as ambient context so the writer does not start from a blank slate.

Budgets per agent. "Monthly budgets per agent. When they hit the limit, they stop." Plain hard cap. Your agents cannot accidentally rack up a four-figure API bill on a runaway loop because each one has a wall.

Governance. You approve hires before an agent goes live. You can override decisions. You can read the trace of every tool call. None of this is theoretical; it is wired into the ticket model.

The runtime is whatever you want

Paperclip is intentionally not opinionated about which model or runtime sits behind a role. The line they keep coming back to is: "If it can receive a heartbeat, it's hired." In practice that means the platform fires an HTTP request at a configurable target, the target wakes up, the target works the ticket queue, the target reports back. Whatever you put in that target is up to you.

The adapters that ship include Claude Code, Codex, Cursor's CLI, OpenClaw, raw Bash scripts, and arbitrary HTTP webhooks. The webhook one is the escape hatch: if the runtime you want is not a first-class adapter, you write a thirty-line shim and it becomes one.
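A shim of that shape might look like the sketch below. The payload fields, the stdin/stdout contract, and the response shape are all assumptions for illustration, not Paperclip's documented adapter interface:

```shell
#!/usr/bin/env bash
# Hypothetical webhook shim. Assumes the heartbeat delivers the ticket
# as JSON on stdin and a JSON status line on stdout reports back;
# the actual adapter contract may differ.
set -euo pipefail

handle_heartbeat() {
  local payload ticket_id goal result
  payload="$(cat)"
  # Crude field extraction keeps the sketch dependency-free;
  # a real shim would use jq.
  ticket_id="$(printf '%s' "$payload" | sed -n 's/.*"id": *"\([^"]*\)".*/\1/p')"
  goal="$(printf '%s' "$payload" | sed -n 's/.*"goal": *"\([^"]*\)".*/\1/p')"

  # Do the actual work here: call your service, script, or model.
  result="processed: $goal"

  # Report completion in whatever shape the platform expects.
  printf '{"ticket":"%s","status":"done","output":"%s"}\n' "$ticket_id" "$result"
}

# Usage (hypothetical): the heartbeat pipes the ticket in,
# the shim prints the status back.
#   printf '%s' "$TICKET_JSON" | handle_heartbeat
```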

What that looks like in practice, role by role:

Claude Code as the CTO

Claude Code is the strongest fit for engineering roles because of the file-editing and multi-step tool use. You point Paperclip at a long-running Claude Code session, hand it a CLAUDE.md that describes the company, its mission, and the codebase conventions, and let the heartbeat fire it on a schedule.

A representative ticket flow looks like this. The CEO role opens ticket #2210, "rebuild the analytics dashboard query layer to use the new metric service." The ticket carries the company mission, the dashboard's current pain points, and the deadline. Claude Code wakes up on its next heartbeat, reads the ticket, opens the relevant repo, plans the migration, edits the files, runs the tests, and writes a status update back into the ticket trace. If it hits a decision it cannot make alone (a schema change with downstream effects), it opens a sub-ticket for the CEO and goes back to sleep.

You do not need a screen up for any of this.

OpenAI Codex as a backend specialist

Codex is well-suited for narrowly scoped engineering tasks where you want fast turnaround on a known pattern. In Paperclip, that maps to a Senior Engineer role with a tight budget and an aggressive heartbeat (every two hours). The ticket queue feeding it is curated by the CTO role, not by you. Codex never sees the broad strategy; it sees focused tickets like "implement the cursor-based pagination for the /events endpoint to match this spec."

The point of having both Claude Code and Codex inside the same org chart is not that one is better. They have different sweet spots. Claude Code does well on multi-file refactors and ambiguous specs. Codex does well on narrow, well-scoped patterns. The CTO role decides which one a given ticket goes to.

Cursor CLI as the QA reviewer

Cursor's CLI agent is good for review and verification because the runtime is fast and the model is tuned for inline code reasoning. In Paperclip you wire it as the QA Reviewer, scheduled to wake up every six hours. Its inbox is automatically populated with every ticket marked "ready for review" by the engineering roles. It pulls the ticket, runs the code, runs the tests, files comments back into the ticket trace, and either marks the ticket "approved" or kicks it back with a reproduction.

This is the role where the budget cap matters most. Reviews can spiral. You set the cap, the agent stops at the cap, the work waits until the next month or until you raise it. Both outcomes are fine. What is not fine is a runaway loop, and Paperclip removes that failure mode entirely.

A Bash script as the DevOps role

Some of the most useful "agents" are not agents in the LLM sense. The DevOps role in a small operation might be a fifty-line Bash script that pulls the latest deployments, runs a security check, posts a summary back into Paperclip's ticket system, and exits. It receives a heartbeat the same way Claude Code does. Paperclip does not care that there is no LLM in the loop. The org chart is the abstraction; the runtime is whatever does the work.
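A minimal sketch of that idea, with the deploy check stubbed out and the reporting endpoint left as a hypothetical placeholder:

```shell
#!/usr/bin/env bash
# Non-LLM "agent" sketch for a DevOps role. The reporting endpoint
# below is a placeholder, not a documented Paperclip API.
set -euo pipefail

check_deployments() {
  # Stand-in for a real check (kubectl, gh, a status API): count
  # lines containing "failed" in whatever status output is piped in.
  local failures
  failures="$(grep -c 'failed' || true)"
  if [ "$failures" -eq 0 ]; then
    echo "all deployments healthy"
  else
    echo "$failures deployment(s) failing"
  fi
}

# On each heartbeat: run the check, post the summary into the ticket
# trail, exit. (Endpoint is hypothetical.)
#   summary="$(my-deploy-tool status | check_deployments)"
#   curl -fsS -X POST "$PAPERCLIP_URL/tickets/$TICKET_ID/comment" -d "$summary"
```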

A webhook for anything else

If you have a service that already does the work and you just want it on a schedule with budget controls and audit trails, you wire it as an HTTP webhook role. The heartbeat hits the URL. The service does the thing. The service reports back. You get the same coordination shell around it.

Walking through a realistic small-business setup

The fastest way to get a feel for how the pieces compose is to walk through a small content operation.

You run a four-person content shop, except none of the four people are people. The CEO role is a Claude Code agent with a long company-mission document and a weekly heartbeat. Its job is strategy and prioritization, not writing. The CMO role is a Cursor CLI agent on a daily heartbeat that reads competitor coverage and opens tickets for content the company should write. The Content Writer role is another Claude Code agent on a four-hour heartbeat that drafts the content the CMO has commissioned. The Editor role is a Codex agent that wakes up after the writer and refines the prose against a style guide.

You set the budgets. CEO: $40 per month. CMO: $80. Writer: $300. Editor: $120. The total is $540 a month.

Now you go on vacation for a week. The CMO finds three competitor pieces, opens three tickets. The CEO wakes up on Sunday, reads the tickets, prioritizes two of them, kills the third because it does not align with the mission. The Writer drafts the two pieces over the next forty-eight hours. The Editor refines them. By the time you check in, you have two finished drafts in your review queue, a clear ticket trail explaining every decision, and a budget report showing you spent $42 of your $540.

You did not write a single prompt that week. You wrote one CLAUDE.md two months ago and it has been compounding ever since.

How this compares to the other agent orchestration frameworks

Three names come up if you go shopping for this kind of platform. They solve different problems.

CrewAI and AutoGen are libraries, not platforms. You write Python code that defines roles, you run a script, the agents talk to each other inside that one process, and when the script ends the state goes away. Excellent for one-off complex tasks. Not the right shape for an ongoing operation where you want budgets, audit trails, and roles that wake up on their own.

LangGraph is a graph-shaped workflow runner. You define nodes and edges and the graph executes once, end to end. The mental model is a state machine, not a company. Better than CrewAI for production reliability, still not the right shape if what you actually want is "agents with jobs that run on a schedule."

Paperclip is the org chart shape. Persistent state. Per-agent budgets. Scheduled heartbeats. Multi-company isolation. Audit trails by default. The cost is that you give up the inline Python ergonomics of CrewAI and the graph-shaped explicitness of LangGraph. The benefit is that the abstraction matches the actual shape of an ongoing operation.

If your work is "run a complex multi-agent task once," use CrewAI or LangGraph. If your work is "have agents do things on a schedule for the next year," use Paperclip.

What I would not use it for

A few honest constraints.

If you only have one agent and one workflow, this is overkill. You do not need an org chart for a single role. A cron job and a CLAUDE.md file are enough.

If your agents need to coordinate in seconds rather than hours, the heartbeat model is too coarse. Paperclip is built around the assumption that work happens on the timescale of human attention, not the timescale of distributed systems. For tight real-time loops, look at LangGraph or build something custom.

If you do not want to self-host, you are not the audience. There is no managed Paperclip cloud. You run it on your machine or on a small VPS, you back up the database, you handle the upgrades. That is by design and it is the right call for a tool that holds your operational context, but it is not the same shape as a SaaS product.

The shift it forces you to make

The reason this approach feels right to me is the same reason a CLAUDE.md file feels right. Both are bets that the bottleneck for AI work is no longer the model. The model is fine. The bottleneck is the operating shell around the model: what does it know, what is it allowed to do, when does it run, what does it cost when it runs, who reviews its output.

Paperclip is one answer to that. The org-chart shape is not the only valid shape, but it is a coherent one, and it has the property that as your operation gets more complex, the abstraction does not have to change. You add roles. You add reporting lines. You raise budgets where the work justifies it. You let the system grow the way you would let a real team grow.

The first time you set up a heartbeat schedule and walk away for a week, the feeling is unfamiliar. It is the same feeling a founder has the first time the team operates without them in the room. You start trusting the structure instead of the sessions, and the work compounds.

If you have read The $97 Launch, you will recognize the underlying pattern. Chapter 41 calls it the Master AI Prompt: a context-rich, role-defined, mission-anchored prompt that turns a generic model into a specialized employee. Paperclip is what happens when you take that pattern and make it a platform instead of a paragraph.

Fact-check notes and sources

  • Paperclip homepage and product description: paperclip.ing
  • Source repository, install commands, and runtime requirements (Node.js 20+, pnpm 9.15+): github.com/paperclipai/paperclip
  • Quickstart command quoted verbatim from homepage: npx paperclipai onboard --yes
  • Budget-cap quote ("Monthly budgets per agent. When they hit the limit, they stop") and the runtime-agnostic quote ("If it can receive a heartbeat, it's hired") are direct from the Paperclip homepage as of 2026-05-07
  • License (MIT, self-hosted, no required account): confirmed via the repo README

Related reading: The Task You Should Never Have Been Doing: Notes on Handing Work to a Computer-Use Agent covers the delegation question that Paperclip operationalizes; The Agent Protocol Stack goes a layer deeper on the infrastructure agents need to interoperate; Generate AI Fix Prompts From Any Site Audit shows the Master AI Prompt pattern applied to a single tool; Domain Expertise Wins AI explains why context-rich roles outperform generic prompts.

This post is informational. Mentions of Paperclip, Claude Code, Codex, Cursor, OpenClaw, CrewAI, AutoGen, and LangGraph are nominative fair use. No affiliation is implied.
