25. Building a RAG App (Project 2)

Overview and links for this section of the guide.

What this project builds

Project 2 is an end-to-end RAG (retrieval-augmented generation) app: a system that answers questions about your documents and cites its sources.

Unlike a “chat over docs” demo, this project is designed to be:

  • grounded: answers are constrained to sources,
  • auditable: you can see which chunks influenced the answer,
  • maintainable: indexes update as docs change,
  • measurable: evaluation catches regressions.

Scope choice

This project focuses on the core RAG pipeline and guardrails. You can wrap it in a CLI or web UI later; the backend contract is the hard part.

Project deliverables (minimum viable)

By the end of Section 25, you should have:

  • A spec: clear “done” criteria and non-goals.
  • An indexer: ingest → chunk → embed → store (repeatable, idempotent); see the sketch after this list.
  • A query path: retrieve → compose prompt → answer (with citations).
  • Validation: schema validation and “not found” behavior.
  • Evaluation harness: a small eval set + regression detection.
  • Maintenance plan: updates, deletions, and re-embedding strategy.
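
To make the indexer deliverable concrete, here is a minimal sketch in Python. The fixed-size chunker, the placeholder embed() function, and the chunks.jsonl layout are illustrative assumptions; swap in your own splitter, embedding model, and store. Idempotency comes from using content hashes as chunk ids, so re-running the indexer skips chunks it has already embedded.

```python
# Minimal indexer sketch: ingest -> chunk -> embed -> store (idempotent).
# The chunker, embed() placeholder, and JSONL layout are illustrative assumptions.
import hashlib
import json
from pathlib import Path

CHUNK_SIZE = 800      # characters per chunk; tune for your corpus
CHUNK_OVERLAP = 100   # overlap reduces evidence being split across chunk boundaries


def chunk_text(text: str) -> list[str]:
    """Naive fixed-size chunking; replace with a structure-aware splitter."""
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]


def embed(text: str) -> list[float]:
    """Placeholder embedding; replace with a real embedding model call."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:16]]


def index_docs(doc_dir: str, index_path: str) -> None:
    """Repeatable: chunk ids are content hashes, so re-runs skip unchanged chunks."""
    out = Path(index_path)
    seen = set()
    if out.exists():
        seen = {json.loads(line)["chunk_id"] for line in out.read_text().splitlines()}

    with out.open("a", encoding="utf-8") as f:
        for doc in sorted(Path(doc_dir).glob("*.txt")):
            for pos, chunk in enumerate(chunk_text(doc.read_text(encoding="utf-8"))):
                chunk_id = hashlib.sha256(chunk.encode("utf-8")).hexdigest()[:16]
                if chunk_id in seen:
                    continue  # already indexed; keeps the pipeline idempotent
                f.write(json.dumps({
                    "chunk_id": chunk_id,
                    "doc": doc.name,
                    "position": pos,
                    "text": chunk,
                    "embedding": embed(chunk),
                }) + "\n")


if __name__ == "__main__":
    index_docs("docs/", "chunks.jsonl")
```

Because chunk ids are derived from content, an edited paragraph produces a new id; the stale chunk can then be removed in a later cleanup pass, which is where the maintenance plan hooks in.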

Reference architecture (simple but real)

A minimal but production-shaped architecture includes:

  • Document store: source docs and extracted text.
  • Chunk store: chunks with ids + metadata + text.
  • Vector index: embeddings for chunks (with metadata filters).
  • Retrieval layer: query embedding + filters + top-k search + rerank (optional).
  • Prompt composer: context packing + grounding rules.
  • Answer validator: JSON/schema checks + citation checks.
  • Audit log: store question, retrieved chunk ids, answer, model versions.

In the beginning, “stores” can be files on disk. The key is the interfaces and artifacts: ids, metadata, and reproducible pipelines.
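
To show what those interfaces look like end to end, here is a minimal file-backed query path, assuming the chunks.jsonl layout from the indexer sketch above. Retrieval is brute-force cosine similarity (fine for a small corpus; swap in a real vector index later), and generate() is a stub standing in for your model client.

```python
# Minimal query path sketch: retrieve -> compose prompt -> answer -> audit log.
# Assumes the chunks.jsonl layout from the indexer sketch; generate() is a placeholder.
import json
import math
from pathlib import Path


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], index_path: str, top_k: int = 5) -> list[dict]:
    """Brute-force top-k over the on-disk chunk store."""
    chunks = [json.loads(line) for line in Path(index_path).read_text().splitlines()]
    chunks.sort(key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return chunks[:top_k]


def compose_prompt(question: str, chunks: list[dict]) -> str:
    """Grounding rules live in the prompt: cite chunk ids, refuse when unsupported."""
    context = "\n\n".join(f"[{c['chunk_id']}] {c['text']}" for c in chunks)
    return (
        "Answer using ONLY the sources below. Cite chunk ids in square brackets.\n"
        "If the sources do not contain the answer, reply exactly: NOT_FOUND.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def generate(prompt: str) -> str:
    """Placeholder for the model call; wire up your LLM client of choice here."""
    return "NOT_FOUND"


def answer(question: str, query_vec: list[float], index_path: str, log_path: str) -> str:
    chunks = retrieve(query_vec, index_path)
    reply = generate(compose_prompt(question, chunks))
    # Audit log: enough to reconstruct why the system said what it said.
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps({
            "question": question,
            "retrieved": [c["chunk_id"] for c in chunks],
            "answer": reply,
        }) + "\n")
    return reply
```

Note that the query embedding must come from the same embed() used at indexing time, otherwise the similarity scores are meaningless.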

Common project failure modes

  • No spec: the system “works” but you can’t tell what “correct” means.
  • No eval set: changes feel better until a user finds a bad answer.
  • Bad chunking: retrieval can’t find the evidence even though it exists.
  • Weak grounding: the model answers from vibes when sources are missing (see the validator sketch after this list).
  • No maintenance: stale indexes break trust as docs evolve.
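
A cheap guardrail against weak grounding is a validator that fails closed: if the model signalled "not found", or its citations do not match the retrieved chunk ids, return a safe fallback instead of the raw answer. This sketch assumes the bracketed-chunk-id citation format used in the prompt composer above.

```python
# Minimal answer-validation sketch: enforce citations and "not found" behavior.
# Assumes answers cite 16-hex-character chunk ids in square brackets.
import re

NOT_FOUND = "NOT_FOUND"
FALLBACK = "I couldn't find a well-supported answer in the indexed documents."


def validate_answer(reply: str, retrieved_ids: set[str]) -> str:
    """Pass the answer through only if every citation points at a retrieved chunk."""
    if reply.strip() == NOT_FOUND:
        return FALLBACK
    cited = set(re.findall(r"\[([0-9a-f]{16})\]", reply))
    if not cited or not cited <= retrieved_ids:
        return FALLBACK  # uncited or mis-cited answer: fail closed
    return reply
```

Schema validation works the same way: ask the model for a fixed JSON shape, parse it, and fall back when parsing or validation fails.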

Section 25 map (25.1–25.5)

Where to start