# MemPalace, Explained — And A Generator For Your First Setup

MemPalace is a local-first, zero-cloud AI memory system that stores raw conversations verbatim in ChromaDB, organized in a spatial hierarchy. This post explains how it works and ships a setup generator.

Author: J.A. Watte
Published: April 20, 2026
Source: https://jwatte.com/blog/blog-tool-mem-palace-gen/

---

If you've had the same conversation with Claude three times because each new session forgets what you told it last week, you've hit the problem [MemPalace](https://github.com/jwatte/mempalace) was built to solve. MemPalace is a Python library that gives LLMs persistent, long-running memory without a cloud service, without an API key, and without a summarizer-LLM deciding what was important.

The [MemPalace Setup Generator](/tools/mem-palace-gen/) walks you from "nothing" to "working palace" in about ten minutes. You fill in your role, work style, and the projects you juggle. It outputs the identity file, the wing/room tree, the MCP config for Claude Code or Cursor, a Python snippet for local-model integration, and an AI walk-through prompt you can paste into Claude for the install-and-verify step.

## What MemPalace actually does

Every other memory system I've tried (Mem0, Zep, Letta) works the same way: after each conversation, an LLM summarizer reads the exchange, extracts "important facts," and stores a compressed, structured version. The LLM decides what mattered, and that decision is lossy in a way that bites later. "User prefers PostgreSQL" gets stored; the six sentences explaining *why* do not. The next time you ask a nuanced question, the model has the conclusion without the reasoning.

MemPalace inverts the approach. Raw conversations go into a local [ChromaDB](https://www.trychroma.com/) vector database *verbatim*. No LLM touches them on the write path. The intelligence lives in how the data is organized and retrieved, not in what gets stored.

The architecture borrows directly from the ancient [method of loci](https://en.wikipedia.org/wiki/Method_of_loci), the mnemonic technique Greek orators used to memorize long speeches by placing ideas in imagined rooms of a palace. MemPalace makes that literal:

- **Wings** are top-level domains: a project, a client, a life area. "orion_project" is a wing; so is "personal".
- **Rooms** are focused areas inside a wing. Inside "orion_project" you might have rooms for "auth," "database," "deployment."
- **Halls** are metadata labels applied across wings: Work, Health, Relationships, Travel, General. A memory about refactoring a stressful project lives in the "orion_project" wing but can be tagged Work *and* Health.
- **Drawers** are the atomic units. Each drawer holds a vectorized chunk of raw text plus numeric weights for importance and emotional resonance.
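To make the hierarchy concrete, here is a minimal sketch of how a drawer might be modeled, assuming a plain Python dataclass. The `Drawer` class, its field names, and the `in_hall` helper are hypothetical illustrations, not MemPalace's actual schema; the point is that wing and room are single-valued location fields, while halls are a multi-valued tag set that cuts across wings.

```python
from dataclasses import dataclass, field


# Hypothetical data model for illustration only; not MemPalace's real schema.
@dataclass
class Drawer:
    wing: str                                      # top-level domain, e.g. "orion_project"
    room: str                                      # focused area inside the wing, e.g. "database"
    text: str                                      # raw conversation chunk, stored verbatim
    halls: set[str] = field(default_factory=set)   # cross-wing labels such as "Work", "Health"
    importance: float = 0.0                        # numeric importance weight
    emotional: float = 0.0                         # numeric emotional-resonance weight


def in_hall(drawers: list[Drawer], hall: str) -> list[Drawer]:
    """Return every drawer tagged with a hall, regardless of which wing it lives in."""
    return [d for d in drawers if hall in d.halls]


palace = [
    Drawer("orion_project", "database", "Refactored the migration scripts under deadline pressure...",
           halls={"Work", "Health"}, importance=0.8, emotional=0.6),
    Drawer("personal", "running", "Signed up for the spring half marathon.",
           halls={"Health"}, importance=0.5, emotional=0.4),
]

print([d.wing for d in in_hall(palace, "Health")])  # ['orion_project', 'personal']
```

A hall query spans both wings, while a wing or room lookup stays inside one branch of the tree.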

When your AI needs memory, MemPalace first pre-filters by wing and room to constrain the search space, then runs similarity search against that subset. That matters: flat vector search across everything you've ever said is slow and noisy, and spatial pre-filtering shrinks the search space to what's probably relevant before any embeddings are compared.
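The pre-filter-then-rank idea can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and `search` is a hypothetical helper, not MemPalace's retrieval code. In a ChromaDB-backed store the same idea would typically be expressed as a metadata `where` filter passed alongside the query; whether MemPalace does exactly that is an assumption.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: lowercase bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(drawers: list[dict], query: str, wing=None, room=None, k: int = 5) -> list[dict]:
    # Step 1: spatial pre-filter. Only drawers in the requested wing/room survive.
    candidates = [d for d in drawers
                  if (wing is None or d["wing"] == wing)
                  and (room is None or d["room"] == room)]
    # Step 2: similarity ranking runs over the reduced candidate set only.
    q = embed(query)
    ranked = sorted(candidates, key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    return ranked[:k]


drawers = [
    {"wing": "orion_project", "room": "auth", "text": "we chose JWT tokens with short expiry"},
    {"wing": "orion_project", "room": "database", "text": "we chose postgres for the token store"},
    {"wing": "personal", "room": "travel", "text": "token of appreciation for the hotel staff"},
]

hits = search(drawers, "which tokens did we choose", wing="orion_project", room="auth")
print([h["text"] for h in hits])  # ['we chose JWT tokens with short expiry']
```

Only the drawers that survive the wing/room filter are ever scored, which is where the speed and noise reduction come from: the "travel" drawer that also mentions "token" never competes.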

## The four-layer memory stack

Every session, MemPalace builds your context in four layers:

- **Layer 0: Identity.** A small file (~50-100 tokens) with your role, coding style, and current focus. Loads every session. This is the generator's first deliverable.
- **Layer 1: Top memories.** The 15 highest-weighted drawers across the whole palace, ranked by combined importance and emotional weight. ~500-800 tokens.
- **Layer 2: Topic-specific context.** Pulled from the wing/room matching the current conversation topic. ~200-500 tokens.
- **Layer 3: Deep semantic search.** Full-palace similarity search, only when the agent explicitly asks for it.
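The layering can be sketched as a context builder that fills each layer up to a token budget. Everything here is a hypothetical illustration: the function names, the ~4-characters-per-token heuristic, and the Layer 3 budget (chosen so all four layers together top out near the ~5,000-token burst mentioned below) are assumptions, not MemPalace's implementation. The Layer 0-2 budgets take the upper ends of the ranges above.

```python
# Hypothetical sketch of four-layer context assembly; names and budgets are
# illustrative assumptions, not MemPalace's API.
BUDGETS = {"identity": 100, "top": 800, "topic": 500, "deep": 3600}  # tokens per layer


def approx_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English text.
    return max(1, len(text) // 4)


def take_within(texts: list[str], budget: int) -> list[str]:
    # Keep items in order until the next one would exceed the layer's budget.
    kept, used = [], 0
    for text in texts:
        cost = approx_tokens(text)
        if used + cost > budget:
            break
        kept.append(text)
        used += cost
    return kept


def top_memories(drawers: list[dict], n: int = 15) -> list[dict]:
    # Layer 1: highest combined importance + emotional weight, palace-wide.
    return sorted(drawers, key=lambda d: d["importance"] + d["emotional"], reverse=True)[:n]


def build_context(identity: str, drawers: list[dict], wing=None, room=None, deep_hits=None) -> str:
    sections = [("Identity", take_within([identity], BUDGETS["identity"]))]  # Layer 0
    top = [d["text"] for d in top_memories(drawers)]
    sections.append(("Top memories", take_within(top, BUDGETS["top"])))     # Layer 1
    if wing is not None:                                                     # Layer 2
        topical = [d["text"] for d in drawers
                   if d["wing"] == wing and (room is None or d["room"] == room)]
        sections.append(("Topic", take_within(topical, BUDGETS["topic"])))
    if deep_hits:                                                            # Layer 3, on request only
        sections.append(("Deep search", take_within(deep_hits, BUDGETS["deep"])))
    # Empty layers are dropped rather than rendered as blank headers.
    return "\n\n".join(f"## {name}\n" + "\n".join(items)
                       for name, items in sections if items)
```

Layers 0 and 1 are built the same way every session; only the `wing`/`room` arguments and the optional `deep_hits` change with the conversation.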

The stack is the thing worth paying attention to. Most memory systems give you one knob: "here are all the retrieved memories." MemPalace gives you four, with layer 0 fixed and layers 2 and 3 adapting to the conversation. A typical session injects ~1,500 tokens of memory; a deep-context question can burst to ~5,000. Compare that with an agent that dumps 100K tokens of chat history into context on every request, and the token economics alone justify the architecture.
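The token comparison is simple arithmetic. A minimal sketch using the figures from this section (about 1,500 tokens typical, about 5,000 on a deep burst, 100K for a full-history dump):

```python
# Figures from this section; the comparison is a plain ratio of context sizes.
TYPICAL_MEMORY_TOKENS = 1_500
BURST_MEMORY_TOKENS = 5_000
FULL_HISTORY_TOKENS = 100_000


def reduction(memory_tokens: int, history_tokens: int = FULL_HISTORY_TOKENS) -> float:
    """How many times smaller the injected memory is than a full-history dump."""
    return history_tokens / memory_tokens


print(f"typical session: {reduction(TYPICAL_MEMORY_TOKENS):.1f}x smaller")  # 66.7x
print(f"deep burst:      {reduction(BURST_MEMORY_TOKENS):.1f}x smaller")    # 20.0x
```

At those figures a typical session carries roughly 67 times fewer context tokens than replaying the history, and even a deep burst is 20 times smaller.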

## The benchmark that matters

MemPalace's README reports a 96.6% recall@5 (R@5) on [LongMemEval](https://arxiv.org/abs/2410.10813), the standard benchmark for long-term memory in LLMs. That's higher than any other open-source memory system as of April 2026: above Mem0 (~49%) and Zep (~64%), and comparable to the best paid options. The interesting finding behind that number is that raw verbatim storage plus decent embeddings beats LLM-curated summaries on retrieval. The industry had assumed summarization was the smart move; LongMemEval says it mostly isn't.
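For readers new to the metric: R@5 asks whether the evidence needed to answer a question shows up among the top five retrieved items. The sketch below uses one common, hit-rate style definition, where a question counts as a hit if any gold-evidence item appears in the top k, averaged over all questions. LongMemEval's own scoring script may define or aggregate recall differently, and the session ids here are made up for illustration.

```python
def hit_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> bool:
    # True if any gold-evidence id appears among the first k retrieved ids.
    return any(r in relevant for r in retrieved[:k])


def recall_at_k(results: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    # results: one (retrieved_ids, relevant_ids) pair per benchmark question.
    hits = sum(hit_at_k(retrieved, relevant, k) for retrieved, relevant in results)
    return hits / len(results)


# Made-up retrieval results for four questions.
results = [
    (["s3", "s9", "s1", "s4", "s7"], {"s1"}),          # evidence at rank 3: hit
    (["s2", "s5", "s6", "s8", "s0", "s4"], {"s4"}),    # evidence only at rank 6: miss at k=5
    (["s4", "s1", "s2", "s3", "s5"], {"s4", "s9"}),    # evidence at rank 1: hit
    (["s7", "s8", "s9", "s1", "s2"], {"s6"}),          # evidence never retrieved: miss
]

print(recall_at_k(results))  # 0.5
```

Read this way, a 96.6% R@5 means the right evidence landed in the top five for about 97 of every 100 questions.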

Two caveats before the hype carries you away:

1. The headline "100% benchmark score" MemPalace originally claimed in some marketing was measured on a variant of LongMemEval in which a proprietary compression dialect was used to reconstruct the ground-truth text. The clean 96.6% figure, on the standard unmodified dataset, is the number that matters.
2. MemPalace is new. Small team. Expect rough edges in 2026 that mature in 2027. Budget time for handholding.

## When MemPalace is the right tool

Use it when:

- You want persistent memory that runs entirely on your laptop: no cloud, no API key, no invoice.
- You care about privacy (medical, legal, client work where data can't leave the device).
- You're a solo developer or small team. Zero-cloud means zero monthly bill.
- You want to experiment with AI memory before committing to a paid service.

Skip it when:

- You need enterprise SSO, audit logs, and a managed backup story. Go with Mem0 (Pro tier, $249/mo as of April 2026).
- You need temporal reasoning ("what did we decide last month versus this month?"). [Zep/Graphiti's](https://github.com/getzep/graphiti) Neo4j-backed temporal graph is purpose-built for that, starting around $25/mo.
- You want your agents to manage their own memory the way an OS manages RAM. [Letta](https://github.com/letta-ai/letta) is architecturally the most interesting option, but you're adopting a whole agent runtime, not just a memory layer.

## The setup loop the generator encodes

1. **Fill the generator form.** Role, stack, communication style, projects. Pick an archetype.
2. **Copy the identity file.** This is Layer 0, saved as `~/.mempalace/identity.txt`.
3. **Run the install commands.** `git clone … && pip install -e ".[dev]"` then `mempalace init`.
4. **Create wings.** One per project or life domain. Don't pre-create 20 empty rooms. Let them emerge.
5. **Mine existing content** (optional): code, chat logs, notes. The miner is deterministic and rule-based, so expect misclassifications and inspect the first run.
6. **Connect via MCP** for Claude Code or Cursor, or via the Python API for local models.

## The parts people get wrong

**Over-engineering up front.** The generator defaults to 3-5 rooms per wing. Resist the urge to pre-create 20. MemPalace mines existing content into rooms deterministically; you'll find half your plan was wrong and the right structure emerges from actual use.

**Expecting the miner to be perfect.** It classifies by directory path, filename, and keyword frequency, so files with ambiguous content can land in the wrong room. Check the first mining run and move memories by hand; the system won't auto-correct.
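To see why ambiguous files misland, here is a hypothetical rule-based classifier in the spirit of that description: directory path first, then filename, then keyword frequency in the body. The `ROOM_KEYWORDS` table, the rule order, and the tie-breaking are assumptions for illustration; MemPalace's actual miner rules may differ.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical keyword table for one wing; MemPalace's real rules may differ.
ROOM_KEYWORDS = {
    "auth": {"auth", "login", "jwt", "oauth", "session"},
    "database": {"db", "database", "postgres", "migration", "schema"},
    "deployment": {"deploy", "deployment", "docker", "ci", "kubernetes", "release"},
}


def classify(path: str, text: str, default: str = "general") -> str:
    p = PurePosixPath(path.lower())
    # Rule 1: a directory name that matches a room keyword wins outright.
    for part in p.parent.parts:
        for room, words in ROOM_KEYWORDS.items():
            if part in words:
                return room
    # Rule 2: tokens in the filename, e.g. "deploy_prod.sh" -> {"deploy", "prod"}.
    stem_tokens = set(re.split(r"[^a-z0-9]+", p.stem))
    for room, words in ROOM_KEYWORDS.items():
        if stem_tokens & words:
            return room
    # Rule 3: keyword frequency in the body. max() keeps the first room on a tie,
    # so an ambiguous file silently lands in whichever room is listed first.
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    scores = {room: sum(counts[w] for w in words) for room, words in ROOM_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


print(classify("src/auth/handlers.py", ""))                          # auth
print(classify("notes/ideas.md", "the postgres migration broke login"))  # database
print(classify("notes/misc.md", "login then migrate db"))            # auth
```

The last call scores one keyword for "auth" and one for "database"; the tie goes to whichever room is listed first, which is exactly the kind of silent misfile worth catching by hand on the first run.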

**Treating identity.txt like a biography.** It's a boot sequence, not a résumé. 50-100 tokens. "Senior backend dev, Python/TypeScript, prefers explicit over implicit, current focus: auth migration." That's it. Every extra paragraph is one less paragraph of memory that fits.
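A small guard keeps the identity file honest. This sketch assumes the `~/.mempalace/identity.txt` location from the setup steps and a rough four-characters-per-token estimate; `write_identity` is a hypothetical helper, and a real tokenizer will count differently.

```python
from pathlib import Path
from typing import Optional

MAX_IDENTITY_TOKENS = 100  # upper end of the 50-100 token guidance above


def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token for English, rounded up.
    return (len(text) + 3) // 4


def write_identity(text: str, path: Optional[Path] = None) -> int:
    """Write a one-line identity file and return its estimated token count."""
    if path is None:
        path = Path.home() / ".mempalace" / "identity.txt"  # location assumed from the setup steps
    line = " ".join(text.split())  # collapse newlines and runs of spaces into one boot line
    tokens = estimate_tokens(line)
    if tokens > MAX_IDENTITY_TOKENS:
        raise ValueError(f"identity is ~{tokens} tokens; trim it to {MAX_IDENTITY_TOKENS} or fewer")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(line + "\n", encoding="utf-8")
    return tokens
```

Calling `write_identity("Senior backend dev, Python/TypeScript, prefers explicit over implicit, current focus: auth migration.")` writes the one-line boot sequence and returns its estimated size; pasting in a résumé raises an error instead.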

## Methodology cross-reference

Chapter 2 of [The $97 Launch](https://the97dollarlaunch.com/), *GitHub as Your Content Engine*, makes the adjacent argument that treating public code as an authority signal requires you to care about it. MemPalace is the equivalent discipline applied to your *private* context: your working memory deserves as much engineering as your public work. Chapter 26 of [The $100 Network](https://the100dollarnetwork.com/), *Monitoring at Scale*, is where persistent memory for agent fleets becomes load-bearing.

If MemPalace sticks for you, the upgrade path is straightforward: pair it with [Ollama](https://ollama.com/) for local LLM inference, wire both into a Docker stack (see the [Docker Generator](/tools/docker-gen/)), and you have a fully local AI workflow with no cloud dependencies at all.
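For the local-model half of that stack, here is a minimal sketch of sending memory-augmented context to Ollama's HTTP API using only the standard library. It assumes Ollama is running on its default port and that the model named in the call has already been pulled. How you fetch `memories` from MemPalace is left out, and `build_prompt` is a hypothetical prompt layout, not the generator's actual output.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(identity: str, memories: list[str], question: str) -> str:
    # Hypothetical layout: identity first, then retrieved memories, then the question.
    memory_block = "\n".join(f"- {m}" for m in memories)
    return f"{identity}\n\nRelevant memories:\n{memory_block}\n\nQuestion: {question}"


def build_payload(model: str, prompt: str) -> bytes:
    # stream=False asks Ollama for a single JSON object instead of streamed chunks.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def ask_local_model(model: str, prompt: str, url: str = OLLAMA_URL) -> str:
    # Passing data= makes urllib send a POST request.
    req = urllib.request.Request(url, data=build_payload(model, prompt),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Usage: `ask_local_model("llama3.1", build_prompt(identity, memories, "What did we decide about auth?"))`, where `"llama3.1"` is a placeholder for whichever model you have pulled.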


