Somewhere in the past year, three separate engineering teams building production AI coding tools arrived at the same conclusion independently. When an LLM needs to remember things between sessions, the best default storage layer isn't a managed vector database. It's a plain Markdown file sitting in the project directory.
Claude Code uses CLAUDE.md. Manus uses todo.md. OpenClaw uses MEMORY.md alongside dated journal files. None of them started with something fancier and later fell back to files. They evaluated the options and picked files first.
This isn't an argument against databases. It's an observation that for the way most developers actually use AI coding tools, a flat file handles the job better than infrastructure you have to provision, pay for, and maintain. The reasons are more interesting than they sound.
Why files win the default
Cache economics. LLM providers price cached tokens significantly cheaper than uncached ones. Anthropic's prompt caching, for example, charges roughly a tenth of the standard rate for tokens that hit the cache. The cache works by matching stable prompt prefixes. If the first chunk of your prompt is the same across requests, the provider skips reprocessing it. A Markdown file loaded at the top of every session creates exactly that kind of stable prefix. A vector database that retrieves different chunks each time defeats the cache on every call.
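Here's what that looks like in practice: a minimal sketch using Anthropic's Python SDK, with the model name and file path as illustrative placeholders.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The project memory file becomes the stable prompt prefix.
with open("CLAUDE.md") as f:
    project_memory = f.read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model name
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": project_memory,
            # Mark the prefix cacheable: later requests with identical
            # file contents read it from cache at the discounted rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the build process."}],
)
print(response.content[0].text)
```

As long as CLAUDE.md doesn't change, every session after the first hits the cache on that prefix.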
Human readability. You can open CLAUDE.md in any editor and read it. You can edit it with your hands. You can grep it. You can diff it. You can version-control it with git. Try doing any of that with embeddings stored in a vector index. The file is simultaneously the LLM's memory and your documentation. No translation layer, no query language, no dashboard.
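Concretely, the standard toolchain applies unchanged:

```bash
grep -n "deploy" CLAUDE.md       # search project memory like any other file
git log -p -- CLAUDE.md          # see how the memory evolved over time
git diff HEAD~5 -- CLAUDE.md     # compare against an earlier version
```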
Attention placement. Language models attend most strongly to tokens near the beginning and end of the context window. The middle gets less attention. This is the "lost in the middle" problem that every retrieval-augmented system has to work around. A curated Markdown file keeps the most important context short and positioned where the model will actually read it. A vector retrieval system returns chunks ranked by similarity, not by position, so the critical piece can end up buried in the middle of a long context.
Zero infrastructure. No server to run. No embedding model to choose and maintain. No index to rebuild when your schema changes. No cold-start latency on the first query. The file is just there. It loads in milliseconds. It works offline. It works on a plane.
The three patterns that ship
Every tool that uses file-based memory converges on one of three shapes. You don't have to use the exact filenames, but understanding the patterns helps you build your own.
Pattern 1: The project instruction file
This is CLAUDE.md in Claude Code, or its Cursor equivalent, .cursorrules. A single file in the project root that loads automatically at the start of every session. It holds:
- Build commands and stack details
- Code-style preferences and anti-patterns to avoid
- Known quirks and gotchas specific to the codebase
- Security constraints and deployment procedures
The key discipline: keep it under 200 lines. Anything longer starts consuming context budget that should go to the actual task. If a section grows past what fits, split it into a subdirectory-level file that only loads when the model is working in that part of the codebase.
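Here's a hypothetical skeleton to show the shape; every name and command in it is illustrative:

```markdown
# Project: acme-api

## Stack and commands
- Python 3.12, FastAPI, Postgres 16
- Build: `make dev`; test: `make test`; deploy: `make deploy-staging`

## Style
- Type hints everywhere; no bare `except:` clauses
- Small, focused modules over utils grab-bags

## Gotchas
- `tests/fixtures/` is generated; never edit it by hand
- The `legacy/` package is frozen; don't refactor it

## Constraints
- Never commit directly to `main`
- Secrets live in the vault, never in `.env` files
```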
This pattern works in every AI coding tool that supports system-level instructions. In Codex, you'd place it in a project-level instructions file such as AGENTS.md. In Cursor, .cursorrules at the root. In Aider, the --read flag can load a Markdown file into every session. In Gemini CLI, a GEMINI.md file at the project root, or paste or pipe the content as context.
Pattern 2: The running checklist
Manus popularized this with todo.md. During a complex multi-step task, the agent writes and continuously rewrites a checklist file tracking what's done, what's next, and what's blocked. Each rewrite puts the current plan into the most recent part of the context, which is exactly where the model attends most strongly.
You don't need Manus to use this pattern. In Claude Code, you can ask the model to maintain a TODO.md or use the built-in task tracking. In Codex, you can include "update the checklist after each step" in your instructions. The point is that the model's plan isn't floating in conversation history where it will scroll away. It's pinned to a file that gets reloaded.
This is particularly useful for tasks that span dozens of tool calls. Without the checklist, the model loses track of its own plan around call fifteen or twenty. With the checklist, it re-reads the plan every time it updates the file.
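Mid-task, the checklist might look like this (an illustrative sketch, not Manus's actual format):

```markdown
# Task: migrate auth from sessions to JWT

- [x] Audit current session usage (7 call sites)
- [x] Add token issue/verify helpers
- [ ] Swap middleware to JWT verification   <- current step
- [ ] Update the 7 call sites
- [ ] BLOCKED: staging secrets rotation needs ops approval
```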
Pattern 3: The memory journal
OpenClaw's approach: a MEMORY.md index file pointing to individual memory entries, plus dated files (memory/2026-04-29.md) for session-specific notes. The index stays small. The individual files hold the detail. The system flushes context to disk when the conversation approaches the context limit, and restores relevant pieces when needed later.
In Claude Code, this pattern maps directly to the auto-memory system in ~/.claude/projects/. The model writes small memory files with frontmatter (name, description, type) and maintains an index in MEMORY.md. Each memory loads only when relevant. The index is capped at 200 lines to prevent bloat.
You can build the same thing manually in any tool. Create a memory/ directory. Write a short Markdown file for each thing worth remembering. Maintain an index. Load the index at session start. Load individual files when the model needs them. It's a filing cabinet, not a database.
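On disk, that filing cabinet might look like this; the layout, filenames, and frontmatter fields are illustrative:

```
memory/
  MEMORY.md                   # index: one line per entry, kept short
  2026-04-29-deploy-fix.md    # individual entries, loaded on demand
  2026-05-02-api-quirks.md
```

And a sample entry:

```markdown
---
name: deploy-fix
description: Staging deploys need the jump host, not a direct connection
type: operational
---
Staging deploys time out from outside the VPN. Run `ssh jump-1`
first, then `make deploy-staging` from there.
```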
How to set this up in your tools
Claude Code
Already built in. CLAUDE.md at the project root loads automatically. The auto-memory system in ~/.claude/projects/ handles cross-session persistence. To get the most out of it:
- Keep CLAUDE.md focused on instructions, not history
- Let the auto-memory system handle session-specific observations
- Review MEMORY.md periodically and prune stale entries
- Use subdirectory CLAUDE.md files for subsystem-specific context
If you haven't set this up yet, the CLAUDE.md Generator on this site builds the initial file from a 10-question form.
OpenAI Codex CLI
Codex reads project-level instruction files; current versions of the CLI load an AGENTS.md from the repository root automatically. Create a Markdown file with your project context and reference it in your Codex configuration if you use a different name. The same principles apply: stable prefix, curated content, under 200 lines. Codex's sandboxed execution model makes the instruction file especially important for communicating constraints the model can't infer from the code alone.
Cursor
Drop a .cursorrules file in your project root. Cursor loads it automatically. Same shape as CLAUDE.md: stack details, style rules, known gotchas. Cursor also supports @docs references for pulling in external documentation, which layers on top of the file-based memory.
Gemini CLI
Gemini CLI reads a GEMINI.md file from the project root, so the same convention applies. The pipe-in workflow also works: create your context file, then start sessions with cat project-context.md | gemini or paste the content at the start. For batch workflows, prepend the context file to every prompt.
Aider
aider --read context.md loads a file as read-only context for every session. Aider's git-native approach means your context file is version-controlled by default, so you get a history of how your project memory evolved.
When you actually need a vector database
Files stop being enough when one of these conditions is true.
Your memory corpus exceeds what fits in a context window. If you have thousands of memory entries and you need semantic search across all of them, a vector index is the right tool. The inflection point is usually somewhere around 50 to 100 separate memory files. Below that, a curated index and selective loading work fine.
You need concurrent multi-agent access with consistency guarantees. Two agents writing to the same Markdown file at the same time will corrupt it. A database gives you atomicity and isolation. If you're running parallel agents that share memory, files alone won't hold.
You need fuzzy semantic retrieval. Keyword grep doesn't find paraphrases. If your memory says "the deploy pipeline uses GitHub Actions" and the query is "how do we ship code to production," grep won't match. A vector search will. For small corpora this doesn't matter because you can load everything. For large ones, it does.
The hybrid approach that works best in practice: keep files as the primary interface. Build a lightweight vector index over those files using something like sqlite-vec if you need search. The files remain human-readable and git-tracked. The index is derived, rebuildable, disposable. You get both the cache benefits of stable file prefixes and the retrieval benefits of semantic search, without committing to a managed database service.
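A minimal sketch of that derived index in Python, assuming the sqlite-vec package and a deliberately fake embed() where a real embedding model would go; the paths, dimensions, and database name are all illustrative:

```python
import hashlib
import sqlite3
from pathlib import Path

import sqlite_vec  # pip install sqlite-vec


def embed(text: str, dim: int = 384) -> list[float]:
    # Stand-in for a real embedding model: a deterministic pseudo-vector
    # derived from a hash. Swap in actual sentence embeddings for real use.
    h = hashlib.sha256(text.encode()).digest()
    return [(h[i % len(h)] - 128) / 128.0 for i in range(dim)]


db = sqlite3.connect("memory-index.db")  # derived, rebuildable, disposable
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS memories USING vec0(embedding float[384])")
db.execute("CREATE TABLE IF NOT EXISTS paths(id INTEGER PRIMARY KEY, path TEXT)")

# Index every memory file; the Markdown files stay the source of truth.
for i, md in enumerate(sorted(Path("memory").glob("*.md")), start=1):
    db.execute("INSERT INTO memories(rowid, embedding) VALUES (?, ?)",
               (i, sqlite_vec.serialize_float32(embed(md.read_text()))))
    db.execute("INSERT INTO paths(id, path) VALUES (?, ?)", (i, str(md)))

# Semantic lookup: nearest memory files to a natural-language question.
query = sqlite_vec.serialize_float32(embed("how do we ship code to production"))
for rowid, distance in db.execute(
    "SELECT rowid, distance FROM memories WHERE embedding MATCH ? AND k = 3 "
    "ORDER BY distance", (query,)):
    path, = db.execute("SELECT path FROM paths WHERE id = ?", (rowid,)).fetchone()
    print(path, round(distance, 3))
```

Because the index is derived, you can delete memory-index.db at any time and rebuild it from the files.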
The mental model
Think of the context window as RAM and the filesystem as disk. You wouldn't design an application that tries to keep its entire database in RAM at all times. You'd keep the hot data in memory and page in the rest as needed.
That's exactly what file-based memory does for an LLM. The project instruction file is the hot data, always loaded. The memory journal entries are on disk, paged in when relevant. The context window stays clean. The model stays focused. The cost stays predictable.
And you can read every byte of it in your text editor.
If you're building a business on top of these tools and want the complete map of which AI coding tool handles which task best, The $20 Dollar Agency covers the full AI tool stack from first install to daily production use. Search "The $20 Dollar Agency" on Amazon Kindle.
Related reading
- Why CLAUDE.md Generator exists — the tool that builds the initial project instruction file
- Top AI CLIs and how to use them with our generators — Claude Code, Gemini CLI, aichat, Aider, and how to pipe context into each
- How to validate an AI coding model before you trust it — the pre-upgrade checklist for when model updates ship
- Five lessons from using Claude Code on a live codebase — what persistent project memory actually looks like in practice
- Two CLIs, one workflow: Codex alongside Claude Code — running both daily with shared project context
Fact-check notes and sources
- Prompt caching pricing (cached tokens ~10x cheaper): Anthropic's prompt caching documentation prices cache reads at 0.1x the base input token rate for Claude models. Anthropic prompt caching docs.
- "Lost in the middle" attention pattern: Liu et al., "Lost in the Middle: How Language Models Use Long Contexts" (2023). Showed performance degrades for information placed in the middle of long context windows. arXiv:2307.03172.
- Manus todo.md pattern and tool-call averages (~50 per task): Based on Manus team public engineering blog posts describing their agent architecture and operational patterns, 2026.
- OpenClaw hybrid retrieval (sqlite-vec, 0.7/0.3 vector/text weight split): Based on OpenClaw's public repository documentation describing their memory architecture.
- CLAUDE.md 200-line cap and memory file conventions: Claude Code documentation and the auto-memory system described in Anthropic's product documentation for Claude Code.
This post is informational, not consulting or financial advice. Mentions of Anthropic, OpenAI, Google, Manus, OpenClaw, Cursor, and Aider are nominative fair use. No affiliation is implied.