This is part four of a five-part series on the practical AI and web stack for a small or medium business. Earlier parts covered where your AI runs, where your site lives, and how you take money. This part is about the most over-sold idea in the space: pointing AI at your own documents so it can answer questions about your business. It is real, it is cheap, and it is worth it less often than the demos suggest.
Two terms get thrown around. Retrieval-augmented generation, or RAG, just means the AI looks things up in your documents before it answers, so it responds from your manuals and policies instead of from general knowledge. An agent means the AI can take actions, not just answer, like creating a draft, updating a record, or sending something for approval. Both are useful. Both are also easy to over-build.
When RAG is worth it, and when a shared doc wins
The honest test is volume and repetition. RAG earns its keep when the same questions get asked constantly against a body of documents too large to skim.
- An HVAC contractor whose techs keep calling the office to ask about install specs, warranty terms, and error codes buried across dozens of equipment manuals. That is a perfect RAG case: high question volume, a large document set, and answers that are in the manuals but slow to find.
- A motel night desk fielding the same guest-policy questions at 2 a.m., when no manager is awake. Also a good case: repetitive, after-hours, and answerable from existing policy documents.
- A small firm where new hires keep asking "what's our process for X." Borderline. If it is a handful of processes, a single well-organized shared document with a search box answers it for free. If it is hundreds of pages across many files, RAG starts to pay.
Here is the part the demos skip: if your "knowledge base" is five pages, you do not need RAG. You need one clean document and a search box. RAG is for when the documents are too many to read and the questions are too frequent to handle by hand. Below that line, you are building infrastructure to solve a problem a Google Doc already solved.
The cheap stack, and what it actually costs
When RAG does fit, the build is modest and the running cost is trivial. The pieces, in plain terms: a tool that reads and chunks your documents (LlamaIndex is the common one), a place to store the searchable index (Chroma to prototype, or pgvector inside a Postgres database you already run so you do not add a second system), and a model to write the answers, which can be a cheap cloud tier or a local model on your own machine for privacy.
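To make those pieces concrete, here is a minimal sketch of the chunk, index, and retrieve loop using only the Python standard library. It is not how LlamaIndex, Chroma, or pgvector work internally: word overlap stands in for the embedding similarity a real vector store would compute, and the file names, document text, and function names are all made up for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # file name, kept so the final answer can cite it
    text: str


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_document(source, text, size=40):
    """Split one document into fixed-size word windows."""
    words = text.split()
    return [Chunk(source, " ".join(words[i:i + size]))
            for i in range(0, len(words), size)]


def overlap_score(question, chunk):
    """Count shared words. A real build would compare embeddings instead."""
    q = Counter(tokenize(question))
    c = Counter(tokenize(chunk.text))
    return sum(min(q[w], c[w]) for w in q)


def retrieve(question, chunks, top_k=2):
    """Return the best-matching chunks, dropping any with no overlap."""
    scored = [(overlap_score(question, c), c) for c in chunks]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def build_prompt(question, hits):
    """Assemble the text a model would receive, with sources labeled."""
    context = "\n".join(f"[{c.source}] {c.text}" for c in hits)
    return ("Answer only from the sources below and cite the file name.\n"
            f"{context}\nQuestion: {question}")


# Hypothetical documents, invented for this example.
DOCS = {
    "furnace_manual.txt": "Error code E4 means the flame sensor is dirty. "
                          "Clean the sensor and restart the unit.",
    "warranty.txt": "The compressor warranty lasts ten years from the install date.",
}

CHUNKS = [chunk for name, text in DOCS.items()
          for chunk in chunk_document(name, text)]
hits = retrieve("What does error code E4 mean?", CHUNKS)
prompt = build_prompt("What does error code E4 mean?", hits)
print(prompt)
```

The last step, sending the prompt to a cloud or local model, is left out on purpose. The shape is what matters: split the documents, find the few passages that match the question, and hand only those to the model along with their source names.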
The cost is not what people fear. Turning your documents into a searchable index is a one-time job that costs cents per thousand pages. Each question a user asks then costs a fraction of a cent at the cheap API tiers from part one of this series. A contractor answering a few hundred tech questions a month is looking at a couple of dollars, not a budget line. If you want to estimate your own numbers before building, I made a retrieval cost estimator.
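The arithmetic behind that claim can be sketched in a few lines. Every number here is a placeholder I picked for illustration, not a quoted price: the per-million-token rates, the tokens per page, and the tokens per question are all assumptions to replace with current figures from your provider.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    # All rates are hypothetical, in US dollars per million tokens.
    embed_per_million: float
    input_per_million: float
    output_per_million: float


def indexing_cost(pages, tokens_per_page, prices):
    """One-time cost to embed every page of the document set."""
    return pages * tokens_per_page / 1_000_000 * prices.embed_per_million


def query_cost(context_tokens, answer_tokens, prices):
    """Cost of one question: retrieved context in, written answer out."""
    cost_in = context_tokens / 1_000_000 * prices.input_per_million
    cost_out = answer_tokens / 1_000_000 * prices.output_per_million
    return cost_in + cost_out


def monthly_cost(questions, context_tokens, answer_tokens, prices):
    """Running cost for a month of questions at a steady volume."""
    return questions * query_cost(context_tokens, answer_tokens, prices)


# Placeholder inputs: swap in real rates and your own volumes.
prices = Prices(embed_per_million=0.02, input_per_million=1.00, output_per_million=5.00)
one_time = indexing_cost(pages=1000, tokens_per_page=500, prices=prices)
per_month = monthly_cost(questions=300, context_tokens=4000, answer_tokens=300, prices=prices)

print(f"Index 1,000 pages once: ${one_time:.2f}")
print(f"300 questions a month: ${per_month:.2f}")
```

With these placeholder inputs the one-time indexing job comes to about a cent and the monthly bill stays under two dollars. The term to watch is the retrieved context per question, since it dominates the per-query cost.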
One nuance worth knowing: for a lot of small-business knowledge, a tidy structured list of facts beats a fancy semantic search. If your "documents" are really a price list, a parts catalog, or a policy table, organizing them cleanly often outperforms a vector database, and costs less. I wrote about that trade in why an ontology often beats embeddings for an SMB.
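As a rough illustration of that point, a parts catalog held as plain structured data can be queried exactly, with no index and no model call. The part numbers, names, and prices below are invented for the example.

```python
# Hypothetical catalog: in practice this might be a spreadsheet or a database table.
PARTS = {
    "FS-100": {"name": "Flame sensor", "price_usd": 24.00},
    "IG-220": {"name": "Hot surface igniter", "price_usd": 38.50},
    "CP-310": {"name": "Run capacitor", "price_usd": 17.25},
}


def lookup_part(part_number):
    """Exact lookup by part number, tolerant of case and stray spaces."""
    return PARTS.get(part_number.strip().upper())


def search_by_name(term):
    """Plain substring match on the part name, returned in a stable order."""
    term = term.strip().lower()
    return sorted(num for num, part in PARTS.items()
                  if term in part["name"].lower())


print(lookup_part(" fs-100 "))
print(search_by_name("igniter"))
```

An exact lookup either returns the right row or returns nothing, which is usually what you want for prices and part numbers. A semantic search can return a plausible near miss instead.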
When to add an agent, and when not to
An agent is the step from answering to acting. It is genuinely useful for multi-step jobs that are tedious but well-defined: take a new lead, look up the customer, draft a quote, and queue it for your review. The key word is review. The right design for a small business keeps a human approving anything that touches money, safety, or a customer-facing send.
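One way to picture that review step is a gate that runs harmless actions right away and parks anything sensitive until a person approves it. This is a sketch under my own assumptions: the category names, the `Action` and `ReviewQueue` classes, and the action names are hypothetical, and a real system would persist the queue rather than hold it in memory.

```python
from dataclasses import dataclass, field

# Categories that always need a human sign-off, per the rule above.
SENSITIVE = {"money", "safety", "customer_send"}


@dataclass
class Action:
    name: str
    touches: frozenset = frozenset()  # which sensitive areas this action affects


@dataclass
class ReviewQueue:
    pending: list = field(default_factory=list)
    executed: list = field(default_factory=list)

    def submit(self, action):
        """Run safe actions immediately; hold sensitive ones for review."""
        if action.touches & SENSITIVE:
            self.pending.append(action)
            return "queued"
        self.executed.append(action)
        return "executed"

    def approve(self, name):
        """Called by a person: release one pending action by name."""
        for action in self.pending:
            if action.name == name:
                self.pending.remove(action)
                self.executed.append(action)
                return True
        return False


queue = ReviewQueue()
lookup_result = queue.submit(Action("look_up_customer"))
quote_result = queue.submit(
    Action("send_quote", frozenset({"money", "customer_send"})))
print(lookup_result, quote_result)
```

The agent can do the lookup and the drafting on its own, but the quote only goes out after someone calls `approve("send_quote")`.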
Skip the agent when a single answer or a simple fixed workflow does the job. A tool that drafts a reply does not need to be an "agent"; it needs to draft a reply. Reaching for agents on simple tasks adds cost, latency, and failure modes for no benefit. When you do build one, put a hard spending cap on it and log what it does, because an agent stuck in a retry loop is the one way these systems run up a real bill. The patterns for that are in AI agent cost controls for small business.
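A hard cap and a log can be as small as the sketch below. The `SpendGuard` class, its limits, and the per-step costs are hypothetical, and it assumes your code can estimate the cost of each model call before or after making it.

```python
import logging

logger = logging.getLogger("agent")


class BudgetExceeded(RuntimeError):
    """Raised when the agent would go past its spend or step limit."""


class SpendGuard:
    def __init__(self, cap_usd, max_steps):
        self.cap_usd = cap_usd
        self.max_steps = max_steps
        self.spent_usd = 0.0
        self.steps = 0

    def charge(self, step_name, cost_usd):
        """Record one step, or refuse it if it would break a limit."""
        over_budget = self.spent_usd + cost_usd > self.cap_usd
        over_steps = self.steps + 1 > self.max_steps
        if over_budget or over_steps:
            logger.warning("blocked %s: spent=%.4f steps=%d",
                           step_name, self.spent_usd, self.steps)
            raise BudgetExceeded(step_name)
        self.spent_usd += cost_usd
        self.steps += 1
        logger.info("ran %s: cost=%.4f total=%.4f",
                    step_name, cost_usd, self.spent_usd)


# Simulate an agent stuck retrying the same step at 2 cents a try.
guard = SpendGuard(cap_usd=0.05, max_steps=10)
attempts = 0
try:
    while True:
        guard.charge("draft_quote_retry", 0.02)
        attempts += 1
except BudgetExceeded:
    print(f"stopped after {attempts} attempts, spent ${guard.spent_usd:.2f}")
```

The loop would run forever on its own. The guard stops it at the first step that would cross the cap, and the log shows which step was looping.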
The failure modes to plan for
Three things go wrong, and all three are manageable if you expect them.
- Bad documents in, bad answers out. RAG answers from what you feed it. If your manuals are outdated or contradictory, the AI will confidently repeat the mistake. Clean the source first; the system is only as good as the documents behind it.
- Confident wrong answers. Any model can state something false as if it were certain. For anything that matters, keep the source citation visible so a person can check, and keep a human in the loop on decisions with consequences.
- Quiet cost creep on agents. Answering questions is cheap. An agent that retries in a loop is not. Caps and logs, always.
None of these is a reason to avoid the technology. They are reasons to start small: one real problem, with a human checking the output. That is the approach behind everything I build, and it is the argument of The $97 Launch. Point it at one painful, repetitive, document-heavy question, prove it saves real time, and only then expand.
The honest summary
Build RAG when the same questions hit a document set too big to skim. Use a shared doc and a search box when it is small. Keep your facts clean, keep a citation visible, and keep a human approving anything that matters. Add an agent only for multi-step jobs with a clear definition of done, and cap its spend. Do that and you get a genuinely useful assistant on your own knowledge for a couple of dollars a month. The last part of this series steps back to the biggest decision of all: whether to hire for this, train for it, or hand it off.
The series
- Previous: Part 3, Square and a portable storefront
- Part 4 (this post): RAG and agents over your own documents
- Next: Part 5, hire vs upskill vs outsource the AI work
Related reading
- Why an ontology often beats embeddings for an SMB, when structure beats semantic search
- AI agent cost controls for small business, keeping an agent's bill in check
- Claude for small business, a walkthrough, a hands-on starting point
- Retrieval cost estimator, price your own RAG workload
Fact-check notes and sources
Tool capabilities and API prices change; treat specifics as approximate mid-2026 figures and confirm before relying on them.
- Retrieval and indexing stack: LlamaIndex, Chroma, pgvector, Ollama.
- Query and embedding costs are based on the cheap-tier LLM API prices in part one of this series, using per-million-token rates from OpenAI, Google (Gemini), and Anthropic (Claude).
This post is informational and not legal, financial, or professional advice. Product names are current as of mid-2026 and change; verify before relying on them. No affiliation with the tools mentioned is implied.