# The Case for an AI That Remembers You: Why Persistent Memory Beats a Fresh Prompt Every Time

An open-source agent called Hermes crossed 110,000 GitHub stars in ten weeks because of one feature: it remembers what you taught it last time. Here is what that changes for a small operator who runs the same workflow week after week.

Author: J.A. Watte
Published: May 15, 2026
Source: https://jwatte.com/blog/blog-agent-that-remembers-you/

---

Anyone who has used a chat-based AI tool for more than a week has felt the moment.

You spend twenty minutes explaining to Claude or ChatGPT how your business works. The dietary restrictions of your café's regulars. The way you draft client check-in emails. The fact that you call the back room "the cold pantry" not "storage two." The model gets it. The next conversation, none of it exists. You start over.

In late April 2026, August G. Osei published a Medium piece titled ["Hermes Agent Just Beat Claude Code on GitHub. But the Agent on My Server Already Does What Hermes Promises"](https://medium.com/@augustozz/hermes-agent-just-beat-claude-code-on-github-001c6a388582). The headline overstates the case: Claude Code is a closed commercial product, so comparing star counts between the two repositories is not apples-to-apples. But the underlying observation is the thing worth absorbing. An open-source AI agent framework called Hermes, released by [Nous Research](https://github.com/NousResearch/hermes-agent) in February 2026, crossed 110,000 GitHub stars in under ten weeks with no marketing budget. The reason it spread that fast is a single feature.

Hermes remembers.

## What "an agent that remembers" actually means

Most AI tools you have used run a fresh conversation every time. You type a prompt, the model reads it, the model answers, the model forgets. The next prompt starts from zero context. The way you compensate is by re-pasting your instructions every session, which is fine when the instructions fit in three lines and absurd when they fit in three pages.

A persistent agent is the same model underneath, wrapped in a memory layer that does three things.

- It writes down what it learned in this session.
- It pulls up what it learned in past sessions when relevant.
- It updates its model of you, your business, and your preferences over time.

In Hermes's case, the framework explicitly creates "skills" from experience, refines those skills through continued use, and builds a model of the user across sessions, per the project's own description on its [landing page](https://hermes-agent.nousresearch.com/). The tagline is "the agent that grows with you," and the architecture matches the slogan.

There is nothing magical about it. The persistence is a database. The retrieval is similarity search. The model is the same Claude or GPT or Kimi or whatever you point the framework at. What is new is that the boring infrastructure for "remembering" finally exists as a free, MIT-licensed tool you can install on a $5-a-month server.
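To show how little magic is involved, here is a minimal sketch of that memory layer in Python: SQLite for the persistence, and a toy bag-of-words cosine similarity standing in for a real embedding model. The class, schema, and method names are invented for illustration; this is not Hermes's actual implementation.

```python
import math
import sqlite3
from collections import Counter

# Toy embedding: a bag-of-words vector. A real agent would call an
# embedding model here; this stand-in keeps the sketch self-contained.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    """Write notes down during a session; pull the relevant ones back up later."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, text TEXT)"
        )

    def remember(self, text: str) -> None:
        self.db.execute("INSERT INTO notes (text) VALUES (?)", (text,))
        self.db.commit()

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Brute-force scan; a production system would use a vector index.
        q = embed(query)
        rows = [r[0] for r in self.db.execute("SELECT text FROM notes")]
        rows.sort(key=lambda t: cosine(q, embed(t)), reverse=True)
        return rows[:k]

mem = MemoryStore()
mem.remember("Client A wants vendor names in all caps")
mem.remember("The back room is called the cold pantry, not storage two")
print(mem.recall("vendor names format for client a", k=1))
# → ['Client A wants vendor names in all caps']
```

Swap in a real embedding model and a file-backed database path, and this is the whole trick: notes go in during one session and come back out, ranked by relevance, in the next.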

## Why this matters more for a small operator than for a big one

A large company hires people to remember its customers. A small operator does it themselves, in their head, with the gaps that come with running everything at once.

A small operator who runs the same workflow every week is the best-fit user of a persistent-memory agent. The fresh-prompt model penalizes repetition. The persistent-memory model rewards it.

Four examples.

## Example 1. A real estate agent who works one specific neighborhood

She does 30 transactions a year, mostly in two zip codes. Every property comes through the same lender. Every contract uses the same title company. Every staging vendor she trusts has a different preference for which day of the week she can book them.

The fresh-prompt approach: she copies a 1,200-word context document into every chat session before she can ask the AI anything useful.

The persistent approach: the agent already knows the lender's preferred submission window, the title company's three favored escrow contacts, and each staging vendor's blackout dates. When she says "draft an offer letter for 1402 Sage Avenue," the agent already has the variables. She edits one sentence.

The savings are not in tokens. The savings are in the cognitive cost of re-explaining herself thirty times a year.

## Example 2. A bookkeeper managing twenty client books

She has twenty clients, each with their own chart of accounts, their own preferred categorization rules, and their own quirks. Client A wants vendor names in caps. Client B always categorizes Uber rides under "client meetings" because she only uses Uber for meetings. Client C insists on a separate row for sales tax even though her state does not require it.

Fresh-prompt: every time she opens a new session to draft a month-end summary, she remembers (or forgets) one of those quirks.

Persistent: the agent has a small note per client. The note grows. After six months, the agent drafts month-end summaries that already follow each client's rules, and she edits the one sentence she did not anticipate.

## Example 3. A landscaper who quotes residential jobs

He estimates a job, walks the property, talks to the homeowner about which trees they want to keep, which beds they hate weeding, the kid who is allergic to bee stings so no flowering plants near the back deck. He writes the quote that night. Three weeks later, the homeowner accepts. He has to remember everything from the walkthrough.

The persistent agent took voice notes on the walkthrough, transcribed them, and summarized them into a structured note attached to the homeowner's record. Three weeks later, when he asks for "the proposal for the Sage Avenue job," the agent already knows about the kid's bee-sting allergy and proposes a backyard layout that respects that constraint. He does not have to remember.

## Example 4. A consultant running monthly diagnostic engagements

She does monthly business diagnostics for ten clients. Each diagnostic looks at last month's numbers, compares against her benchmarks, and recommends one or two actions. The fresh-prompt model means she writes a fresh prompt every month for every client.

The persistent agent remembers what she recommended last month, whether the client acted on it, and how the numbers moved as a result. The next month's diagnostic is not a fresh report. It is a follow-up. It already knows what worked and what did not.

The diagnostic gets sharper every cycle because the agent learns what mattered to that specific client.

## The harness matters more than the model

Osei makes a useful observation in his Medium piece, citing an experiment by the AI infrastructure company LangChain. LangChain held the underlying AI model constant and changed only the "harness" around it: the instructions, the constraints, the feedback loops, the memory, the orchestration. The benchmark score moved from 52.8 percent to 66.5 percent, and the ranking jumped from outside the top 30 to the top 5. Zero model changes.

Translated for a small operator. The model you use (Claude, GPT, something else) matters less than the harness you build around it. The harness is your system prompt, your saved client notes, your stored preferences, the way you log past outputs. Most people running AI tools for their business spend their effort picking the right model. The actual gains live in the harness.
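To make "the harness" concrete, here is a hypothetical sketch of the string assembly involved. The client names, notes, and prompt layout are invented, and the model call itself is deliberately left out, because it is the only part that is not plain bookkeeping:

```python
# Hypothetical harness sketch. Everything *around* the model call is
# ordinary string assembly over stored notes; that is the whole harness.

SYSTEM_PROMPT = "You draft month-end summaries. Follow each client's rules exactly."

# Stored preferences, accumulated across sessions (illustrative data).
client_notes = {
    "client_a": ["Vendor names in all caps.", "Uber rides go under 'client meetings'."],
}
# Log of past outputs, so the next run is a follow-up, not a fresh report.
past_outputs = {
    "client_a": ["March summary: flagged duplicate vendor payment."],
}

def build_prompt(client: str, task: str) -> str:
    notes = "\n".join(f"- {n}" for n in client_notes.get(client, []))
    history = "\n".join(f"- {o}" for o in past_outputs.get(client, []))
    return (
        f"{SYSTEM_PROMPT}\n\n"
        f"Client notes:\n{notes}\n\n"
        f"Previous outputs:\n{history}\n\n"
        f"Task: {task}"
    )

prompt = build_prompt("client_a", "Draft the April month-end summary.")
# `prompt` then goes to whichever model you point the harness at.
```

The model sees a richer prompt every cycle, and none of the enrichment required touching the model.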

A persistent-memory agent is, fundamentally, a harness. Hermes is one open-source example. There are others. The pattern matters more than any specific tool.

## What it costs to run

For a small operator, the cost ladder of moving to a persistent agent breaks out roughly like this, drawn from Osei's own bill and the Hermes project documentation:

- A small cloud server to host the agent. About $5 a month, from a provider like Hetzner or DigitalOcean. (You only need this for the self-hosted route. Managed alternatives exist.)
- Whichever AI model you point it at. Saioc's [social-posting build](/blog/blog-50-month-ai-stack-smb/) runs around $5 of Claude API per month for fifty workflow runs. A persistent agent serving four small workflows for one operator might run $15 to $50 a month in model calls depending on how chatty you are.
- The Hermes framework itself. Free, MIT-licensed, no per-seat fee.

So $20 to $55 a month all-in for one operator's full persistent-memory agent setup. That is not nothing. It is also less than most people spend on coffee.

The catch, as Osei puts it: "the last 20 percent of setup is where most people quietly stop." Connecting the agent to web search needs an API key. Connecting it to your Google Drive is a separate step. Wiring it to Slack or Telegram for the interface is its own configuration. None of the steps are hard. The combined friction is enough that most installs end up sitting idle.
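The friction Osei describes amounts to collecting a handful of credentials, one connector at a time. A hypothetical sketch of what that wiring boils down to; the key names and structure are invented for illustration, not Hermes's actual configuration format:

```python
# Hypothetical connector configuration; key names are illustrative,
# not Hermes's real settings. Each entry is one of the "last 20 percent"
# steps: sign up somewhere, obtain the credential, paste it in.
import os

connectors = {
    "model": {"api_key": os.environ.get("ANTHROPIC_API_KEY")},        # the model you point it at
    "web_search": {"api_key": os.environ.get("SEARCH_API_KEY")},      # separate signup, separate key
    "google_drive": {"credentials_file": "credentials.json"},         # OAuth consent flow, its own step
    "telegram": {"bot_token": os.environ.get("TELEGRAM_BOT_TOKEN")},  # the chat interface
}

enabled = [name for name, cfg in connectors.items() if any(cfg.values())]
print(f"{len(enabled)} of {len(connectors)} connectors wired up")
```

Each individual line is trivial. The install that sits idle is the one where two of the four credentials never got filled in.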

If you do not want to spend a Saturday wiring things up, the workaround is the [boring infrastructure layer](/blog/blog-boring-ai-layer-decides-smb-margin/) approach. Pay someone the price of one agency retainer to do the wiring once, then never pay again.

## When to bother

If your business runs the same workflow every week, with the same context, on different inputs (different clients, different days, different properties), a persistent-memory agent will pay for itself within a month.

If you run one-off projects, each totally different from the last, the gains are smaller. Fresh-prompt tools work fine for that.

The sweet spot: repeatable workflows, the same handful of customers or projects, accumulating context that is annoying to re-paste. That describes about 80 percent of small-business operations.

## Fact-check notes and sources

- August G. Osei, "Hermes Agent Just Beat Claude Code on GitHub. But the Agent on My Server Already Does What Hermes Promises," Medium, April 29, 2026.
- The 110,000 GitHub stars figure: cited in Osei's piece. Verify the current count directly at [github.com/NousResearch/hermes-agent](https://github.com/NousResearch/hermes-agent); the count will have moved since Osei's piece was published.
- The LangChain benchmark experiment (52.8 percent to 66.5 percent with no model change): cited by Osei. Verify the original write-up by searching LangChain's blog. Treat the specific percentages as second-hand until verified directly.
- The Hermes Agent project itself, MIT-licensed, single-command install: [github.com/NousResearch/hermes-agent](https://github.com/NousResearch/hermes-agent), [hermes-agent.nousresearch.com](https://hermes-agent.nousresearch.com/).
- All four operator examples are composite illustrations of real workflow patterns. The specific cost figures are typical ranges, not data from a single business.

## Related reading

- [The $50 AI stack for any small business](/blog/blog-50-month-ai-stack-smb/), the foundational pipeline before you add a memory layer on top.
- [The boring AI layer decides your SMB margin](/blog/blog-boring-ai-layer-decides-smb-margin/), why infrastructure investments compound and model choice does not.
- [Spot the spread](/blog/blog-spot-the-spread-find-money-in-your-business/), using the persistent-memory pattern to track structural opportunities month over month.
- [The Claude skill linter](/blog/blog-tool-claude-skill-linter/), keeping your skill definitions clean as your harness grows.
- [Claude trading lessons](/blog/blog-claude-trading-lessons/), the long-form thinking on AI as a thoughtful collaborator.

If you are running a small portfolio of customer-facing workflows and want the wiring guide for scaling beyond one operator's brain, The $100 Network on Amazon Kindle covers the multi-customer compound pattern this post hints at.


---

Canonical HTML: https://jwatte.com/blog/blog-agent-that-remembers-you/
RSS: https://jwatte.com/feed.xml
JSON Feed: https://jwatte.com/feed.json
Hero image: https://jwatte.com/images/blog-agent-that-remembers-you.webp
