
MemPalace, Explained — And A Generator For Your First Setup

If you've had the same conversation with Claude three times because each new session forgets what you told it last week, you've hit the problem MemPalace was built to solve. MemPalace is a Python library that gives LLMs persistent, long-running memory without a cloud service, without an API key, and without a summarizer-LLM deciding what was important.

The MemPalace Setup Generator walks you from "nothing" to "working palace" in about ten minutes. You fill in your role, work style, and the projects you juggle. It outputs the identity file, the wing/room tree, the MCP config for Claude Code or Cursor, a Python snippet for local-model integration, and an AI walk-through prompt you can paste into Claude for the install-and-verify step.

What MemPalace actually does

Every other memory system I've tried (Mem0, Zep, Letta) works the same way: after each conversation, an LLM summarizer reads the exchange, extracts "important facts," and stores a compressed, structured version. The LLM decides what mattered. That decision is lossy in a way that bites later: "User prefers PostgreSQL" gets stored; the six sentences explaining why do not. Next time you ask a nuanced question, the model has the conclusion without the reasoning.

MemPalace inverts the approach. Raw conversations go into a local ChromaDB vector database verbatim. No LLM touches them on the write path. The intelligence lives in how the data is organized and retrieved, not in what gets stored.
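The write path is simple enough to sketch in a few lines. This is an illustrative model of the verbatim-storage idea, not the actual MemPalace API (all names here are assumptions); the point is that the chunk is stored exactly as received, with no model on the write path:

```python
# Illustrative write path: the chunk goes in verbatim, tagged with its
# spatial location. No summarizer touches it.
palace = {}  # (wing, room) -> list of raw text chunks


def remember(wing, room, raw_text):
    """Store the conversation chunk exactly as received."""
    palace.setdefault((wing, room), []).append(raw_text)


remember(
    "orion_project", "database",
    "We chose PostgreSQL because the team already runs it in production "
    "and JSONB covers our semi-structured data.",
)
```

The six sentences explaining *why* PostgreSQL survive intact, because nothing ever decided they were unimportant.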

The architecture borrows directly from the ancient method of loci, the mnemonic technique Greek orators used to memorize long speeches by placing ideas in the imagined rooms of a palace. MemPalace makes that literal:

  • Wings are top-level domains. A project, a client, a life area. "orion_project" is a wing. "personal" is a wing.
  • Rooms are focused areas inside a wing. Inside "orion_project" you might have rooms for "auth," "database," "deployment."
  • Halls are metadata labels applied across wings. Work / Health / Relationships / Travel / General. A memory about refactoring a stressful project lives in the "orion_project" wing but can be tagged Work and Health.
  • Drawers are the atomic units. Each drawer holds a vectorized chunk of raw text plus numeric weights for importance and emotional resonance.

When your AI needs memory, MemPalace pre-filters by wing and room (semantic retrieval within a constrained space), then runs similarity against that subset. That matters: flat vector search across everything you've ever said is slow and noisy. Spatial pre-filtering shrinks the search space to what's probably relevant before any embedding comparison runs.
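The retrieval step can be sketched in a dozen lines. A pure-Python illustration of the filter-then-rank idea (assumed names, not the library's API), with drawers as plain dicts:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def retrieve(drawers, query_vec, wing, room=None, k=5):
    # 1. Spatial pre-filter: keep only drawers in the target wing (and room).
    pool = [d for d in drawers
            if d["wing"] == wing and (room is None or d["room"] == room)]
    # 2. Similarity ranking runs only against the reduced pool.
    pool.sort(key=lambda d: cosine(d["embedding"], query_vec), reverse=True)
    return pool[:k]


drawers = [
    {"wing": "orion_project", "room": "auth", "embedding": [1.0, 0.0],
     "text": "JWT rotation decision"},
    {"wing": "orion_project", "room": "auth", "embedding": [0.0, 1.0],
     "text": "OAuth provider notes"},
    {"wing": "personal", "room": "health", "embedding": [1.0, 0.0],
     "text": "marathon training plan"},
]
hits = retrieve(drawers, [1.0, 0.0], wing="orion_project", room="auth", k=2)
```

The "personal" drawer never enters the comparison at all, even though its embedding is identical to the query. That exclusion is the whole point of the spatial layer.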

The four-layer memory stack

Every session, MemPalace builds your context in four layers:

  • Layer 0: Identity. A small file (~50-100 tokens) with your role, coding style, and current focus. Loads every session. This is the generator's first deliverable.
  • Layer 1: Top memories. The 15 highest-weighted drawers across the whole palace, sorted by importance + emotional weight. ~500-800 tokens.
  • Layer 2: Topic-specific context. Pulled from the wing/room matching the current conversation topic. ~200-500 tokens.
  • Layer 3: Deep semantic search. Full-palace similarity search, only when the agent explicitly asks.

The stack is the thing worth paying attention to. Most memory systems give you one knob: "here are all the retrieved memories." MemPalace gives you four, with layer 0 stable and the higher layers adaptive. A typical session injects ~1,500 tokens of memory; a deep-context question can burst to ~5,000. Compare that with an agent that dumps 100K tokens of chat history into context on every request, and the token economics alone justify the architecture.
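The four layers above can be sketched as one assembly function. Illustrative only, with assumed field names; MemPalace's real assembly logic may differ:

```python
def build_context(identity, all_drawers, topic_drawers, deep_results=None):
    """Assemble the four-layer context.

    identity:      Layer 0 string, always loaded (~50-100 tokens)
    all_drawers:   dicts with "text", "importance", "emotional_weight"
    topic_drawers: drawers from the wing/room matching the current topic
    deep_results:  Layer 3 hits, only on explicit request
    """
    layers = [identity]  # Layer 0: stable boot sequence

    # Layer 1: top 15 drawers palace-wide, by importance + emotional weight
    top = sorted(all_drawers,
                 key=lambda d: d["importance"] + d["emotional_weight"],
                 reverse=True)[:15]
    layers += [d["text"] for d in top]

    # Layer 2: topic-specific context
    layers += [d["text"] for d in topic_drawers]

    # Layer 3: deep full-palace search, adaptive burst
    if deep_results:
        layers += [d["text"] for d in deep_results]

    return "\n\n".join(layers)
```

Layers 0 and 1 are the same every session; layers 2 and 3 are where the token count flexes between ~1,500 and ~5,000.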

The benchmark that matters

MemPalace's README reports a 96.6% R@5 score on LongMemEval, the standard benchmark for long-term memory in LLMs. As of April 2026 that is above any other open-source memory system: higher than Mem0 (~49%) and Zep (~64%), and comparable to the best paid options. The interesting finding behind the number is that raw verbatim storage plus decent embeddings beats LLM-curated summaries on retrieval. The industry had assumed summarization was the smart move; LongMemEval says it mostly isn't.

Two caveats before the hype carries you away:

  1. The headline "100% benchmark score" MemPalace originally claimed in some marketing came from a variant of LongMemEval in which a proprietary compression dialect was used to reconstruct the ground-truth text. The clean 96.6% figure, on the standard unmodified dataset, is the number that matters.
  2. MemPalace is new. Small team. Expect rough edges in 2026 that mature in 2027. Budget time for handholding.

When MemPalace is the right tool

Use it when:

  • You want persistent memory that runs entirely on your laptop. No cloud, no API key, no invoice.
  • You care about privacy (medical, legal, client work where data can't leave the device).
  • You're a solo developer or small team. Zero-cloud means zero monthly bill.
  • You want to experiment with AI memory before committing to a paid service.

Skip it when:

  • You need enterprise SSO, audit logs, and a managed backup story. Go with Mem0 (Pro tier, $249/mo as of April 2026).
  • You need temporal reasoning ("what did we decide last month versus this month?"). Zep/Graphiti's Neo4j-backed temporal graph is purpose-built for that; it starts around $25/mo.
  • You want your agents to manage their own memory the way an OS manages RAM. Letta is architecturally the most interesting option, but you're adopting a whole agent runtime, not just a memory layer.

The setup loop the generator encodes

  1. Fill the generator form. Role, stack, communication style, projects. Pick an archetype.
  2. Copy the identity file. This is Layer 0. ~/.mempalace/identity.txt.
  3. Run the install commands. git clone … && pip install -e ".[dev]" then mempalace init.
  4. Create wings. One per project or life domain. Don't pre-create 20 empty rooms. Let them emerge.
  5. Mine existing content (optional). Code, chat logs, notes. The miner is deterministic; expect misclassifications; inspect the first run.
  6. Connect via MCP for Claude Code or Cursor, or via the Python API for local models.
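For step 6's local-model path, the wiring looks roughly like this. A hypothetical sketch: `build_context`, `remember`, and the `palace` object are assumed names, not MemPalace's documented Python API, so check the real library before copying:

```python
def answer_with_memory(palace, llm, user_msg, wing):
    """Hypothetical glue between a memory palace and a local model.

    `palace.build_context` / `palace.remember` are assumed method names;
    `llm` is any callable that maps a prompt string to a reply string
    (e.g. a thin wrapper around a local Ollama model).
    """
    # Read path: layers 0-2 assembled for the current topic
    context = palace.build_context(topic=user_msg, wing=wing)
    prompt = f"{context}\n\nUser: {user_msg}\nAssistant:"
    reply = llm(prompt)
    # Write path: store the raw exchange verbatim, no summarization
    palace.remember(wing=wing, text=f"User: {user_msg}\nAssistant: {reply}")
    return reply
```

The shape is the important part: context in before the call, raw exchange out after it, every turn.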

The parts people get wrong

Over-engineering up front. The generator defaults to 3-5 rooms per wing. Resist the urge to pre-create 20. MemPalace mines existing content into rooms deterministically; you'll find half your plan was wrong and the right structure emerges from actual use.

Expecting the miner to be perfect. It classifies by directory path, filename, and keyword frequency. Files with ambiguous content land in the wrong room. Check the first mining run and move memories by hand; the system won't auto-correct.

Treating identity.txt like a biography. It's a boot sequence, not a résumé. 50-100 tokens. "Senior backend dev, Python/TypeScript, prefers explicit over implicit, current focus: auth migration." That's it. Every extra paragraph is one less paragraph of memory that fits.

Methodology cross-reference

Chapter 2 of The $97 Launch ("GitHub as Your Content Engine") makes the adjacent argument that treating public code as an authority signal requires you to care about it. MemPalace is the equivalent discipline applied to your private context: your working memory is worth as much engineering as your public work. Chapter 26 of The $100 Network ("Monitoring at Scale") is where persistent memory for agent fleets becomes load-bearing.

If MemPalace sticks for you, the upgrade path is straightforward: pair it with Ollama for local LLM inference, wire both into a Docker stack (see the Docker Generator), and you have a fully local AI workflow with no cloud dependencies at all.
