24. Retrieval-Augmented Generation Basics

Overview and links for this section of the guide.

What this section is for

Section 24 gives you the fundamentals you need to build and debug grounded Q&A systems.

You’ll learn how to reason about:

  • when RAG is the right tool,
  • how chunking and metadata determine retrieval quality,
  • what embeddings do (and what they don’t),
  • how ranking/reranking improves relevance,
  • how prompts enforce “use sources faithfully.”

Builder framing

RAG is a pipeline. Most wins come from boring engineering: data cleanliness, chunking, evaluation, and logs.

The minimum components of a RAG system

A practical RAG system needs all of the following; a minimal sketch of how the pieces fit together appears after the list:

  • Document ingestion: load text + metadata from your corpus.
  • Chunking: split docs into retrievable units with stable ids.
  • Embeddings: convert chunks and queries into vectors for similarity search.
  • Storage: store chunk text + metadata + embeddings in a retrievable form.
  • Retrieval: select candidate chunks for a user query (with filters).
  • Ranking/reranking: choose the best subset under a context budget.
  • Prompt composition: present sources and rules clearly.
  • Answer format: structured output with citations and “not found.”
  • Evaluation: an eval set that detects regressions and hallucination.
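
To make the component boundaries concrete, here is a minimal sketch in Python. Every name in it is illustrative rather than a specific library's API: the hash-based embed() is only a stand-in for a real embedding model, and a plain list stands in for a vector store.

```python
# Minimal sketch of the RAG component boundaries (illustrative, not a library API).
from dataclasses import dataclass, field
import hashlib
import math

@dataclass
class Chunk:
    chunk_id: str                      # stable id: makes answers citable and auditable
    text: str
    metadata: dict = field(default_factory=dict)
    embedding: list[float] = field(default_factory=list)

def chunk_document(doc_id: str, text: str, size: int = 500) -> list[Chunk]:
    """Chunking: split a document into fixed-size pieces with stable ids."""
    return [
        Chunk(chunk_id=f"{doc_id}#{i}", text=text[start:start + size],
              metadata={"doc_id": doc_id})
        for i, start in enumerate(range(0, len(text), size))
    ]

def embed(text: str, dims: int = 64) -> list[float]:
    """Placeholder embedding: hashes words into a fixed-size unit vector.
    Swap in a real embedding model; this exists only to make the sketch runnable."""
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query: str, index: list[Chunk], top_k: int = 5) -> list[Chunk]:
    """Retrieval: cosine similarity over an in-memory 'store' (a real system
    would use a vector database and apply metadata filters here)."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, c.embedding)), c) for c in index]
    return [c for _, c in sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]]

# Ingestion + storage, collapsed into a plain list for the sketch:
index: list[Chunk] = []
for chunk in chunk_document("handbook", "…text loaded from your corpus…"):
    chunk.embedding = embed(chunk.text)
    index.append(chunk)

for hit in retrieve("example question about the corpus", index):
    print(hit.chunk_id, hit.text[:60])
```

The point of the sketch is the seams: each component can be swapped (a real embedding model, a vector database, a reranker) without touching the others.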

The predictable failure modes

When RAG feels “bad,” it’s usually one of these:

  • Bad chunking: the answer spans boundaries or key definitions are missing.
  • Wrong retrieval: top-k results are semantically close but not actually relevant.
  • Missing filters: retrieval ignores permissions or document types.
  • Context packing mistakes: too many chunks, not enough instruction, no stable ids.
  • Prompt injection: retrieved docs contain instructions that override your system goals.
  • No evaluation: you can’t tell if you made it better or worse.

Section 24 gives you tools to diagnose these systematically.
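
In the meantime, most of these failures are visible the moment you log what retrieval actually handed to the model. Here is a minimal debugging sketch; the ScoredChunk shape, the "role" metadata key, and the four-characters-per-token estimate are assumptions to replace with whatever your stack actually uses.

```python
# Illustrative debug helper: inspect retrieval output before blaming the model.
from dataclasses import dataclass, field

@dataclass
class ScoredChunk:
    chunk_id: str
    score: float                       # similarity or reranker score
    text: str
    metadata: dict = field(default_factory=dict)

def debug_retrieval(query: str, results: list[ScoredChunk],
                    allowed_roles: set[str], token_budget: int = 3000) -> None:
    """Log scores, ids, permissions, and a rough context-size estimate."""
    total_tokens = 0
    print(f"query: {query!r}")
    for r in results:
        est_tokens = len(r.text) // 4  # rough heuristic, not a real tokenizer
        total_tokens += est_tokens
        flag = ""
        if r.metadata.get("role") not in allowed_roles:
            flag = "  [permission mismatch: should have been filtered]"  # 'missing filters'
        print(f"  {r.chunk_id}  score={r.score:.3f}  ~{est_tokens} tokens{flag}")
    if total_tokens > token_budget:
        # 'context packing' failure: too many chunks for the budget
        print(f"  WARNING: ~{total_tokens} tokens exceeds the {token_budget}-token budget")
```

If every score is low, suspect chunking or embeddings before touching the prompt.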

A practical “RAG basics” workflow

  1. Start with a small corpus: 5–20 documents that matter.
  2. Chunk with stable ids: make it citable and auditable.
  3. Build a tiny retrieval demo: query → top 5 chunks (inspect results).
  4. Add a grounding prompt: answer only from chunks; include citations (sketched after this list).
  5. Create an eval set: 25 questions; record failures.
  6. Iterate: improve chunking, retrieval, and prompts based on failures.
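
Here is a minimal sketch of steps 4 and 5. The prompt wording, the citation format, and the eval-case fields are placeholders to adapt, not a fixed recipe, and the model call itself is left out.

```python
# Step 4: compose a grounding prompt from (chunk_id, text) pairs chosen by retrieval.
def build_grounded_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    sources = "\n\n".join(f"[{cid}]\n{text}" for cid, text in chunks)
    return (
        "Answer the question using ONLY the sources below.\n"
        "Cite the id of every source you rely on, e.g. [doc#3].\n"
        "If the sources do not contain the answer, reply exactly: NOT FOUND.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )

# Step 5: an eval case is just a question plus what a correct answer must contain.
eval_set = [
    {"question": "Example question about your corpus?",
     "must_cite": ["doc#3"],           # chunk ids a correct answer should cite
     "must_contain": ["key fact"]},    # strings a correct answer should mention
]

def passes(answer: str, case: dict) -> bool:
    """Crude regression check: required citations and facts are present."""
    return (all(cid in answer for cid in case["must_cite"])
            and all(fact.lower() in answer.lower() for fact in case["must_contain"]))
```

Re-run the eval set after every chunking, retrieval, or prompt change; step 6 only works if failures are recorded somewhere you can compare.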

Section 24 map (24.1–24.5)

Where to go next