34.2 Repo indexing strategy


Level 1: Brute Force (Small Repos)

If your project has fewer than 50 files, don't overthink it. Traverse the directory, concatenate every non-ignored file into one big string (XML-wrapped, so the model can tell where one file ends and the next begins), and stuff it into the context window.
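A minimal sketch of the Level 1 approach, assuming Python. The ignore list (`IGNORED_DIRS`), the function name `pack_repo` and the `<repo>`/`<file path=...>` tag names are hypothetical choices, not a fixed format. A real tool would also honour `.gitignore`, which this sketch does not parse.

```python
import os
from pathlib import Path

# Hypothetical ignore list; a real tool would also parse .gitignore.
IGNORED_DIRS = {".git", "node_modules", "dist", "build"}


def pack_repo(root: str) -> str:
    """Concatenate every non-ignored text file under `root` into one XML-wrapped string."""
    root_path = Path(root)
    parts = ["<repo>"]
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in IGNORED_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip files that are not valid UTF-8 text (e.g. most binaries)
            rel = path.relative_to(root_path).as_posix()
            # File contents go in verbatim (not XML-escaped): fine for a prompt,
            # but the result is not guaranteed to be well-formed XML.
            parts.append(f'<file path="{rel}">\n{text}\n</file>')
    parts.append("</repo>")
    return "\n".join(parts)
```

Calling `pack_repo(".")` from the project root gives you a single string to prepend to the user's question.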

Gemini 1.5 Pro has a 2M-token context window, which is enough to hold most side projects in their entirety.
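Before sending the packed string, it is worth checking that it actually fits. This sketch uses the rough rule of thumb of about 4 characters per token, which is an assumption: real tokenizers differ, so use your provider's token-counting endpoint when you need an exact number. `RESERVED_TOKENS` is a hypothetical headroom for your instructions and the model's reply.

```python
# Rough heuristic (assumption): about 4 characters per token for English text and code.
# Real tokenizers vary; use the provider's token-counting endpoint for an exact figure.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW_TOKENS = 2_000_000  # Gemini 1.5 Pro's context window
RESERVED_TOKENS = 50_000  # hypothetical headroom for the prompt and the model's answer


def estimate_tokens(text: str) -> int:
    """Estimate the token count, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(
    text: str,
    window: int = CONTEXT_WINDOW_TOKENS,
    reserved: int = RESERVED_TOKENS,
) -> bool:
    """True if the estimated size leaves `reserved` tokens free in the window."""
    return estimate_tokens(text) <= window - reserved
```

If `fits_in_context` returns `False`, that is the signal to move up to Level 2.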

Level 2: The File Map (Medium Repos)

If you have a 100MB monorepo, you can't send everything: it won't fit in the context window.

Strategy:
1. Generate a "File Map": a tree structure of all file paths.
2. Send the File Map to the model first.
3. The user asks: "Update the login page."
4. The model looks at the map and says: "I need to read `src/pages/login.tsx` and `src/components/auth-form.tsx`."
5. A tool fetches those 2 files.
6. The model generates the answer.
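Steps 1, 4 and 5 can be sketched as two small helpers: one renders the File Map as an indented tree, the other reads back whichever files the model named. The function names, the ignore list and the convention that the model wraps paths in backticks are all assumptions made for illustration.

```python
import re
from pathlib import Path

IGNORED_DIRS = {".git", "node_modules", "dist", "build"}  # hypothetical ignore list


def build_file_map(root: str) -> str:
    """Step 1: render every file path under `root` as an indented tree."""
    root_path = Path(root).resolve()
    lines = [root_path.name + "/"]

    def _walk(directory: Path, depth: int) -> None:
        # Directories first, then files, each group sorted by name.
        for entry in sorted(directory.iterdir(), key=lambda p: (p.is_file(), p.name)):
            if entry.is_dir():
                if entry.name in IGNORED_DIRS:
                    continue
                lines.append("  " * depth + entry.name + "/")
                _walk(entry, depth + 1)
            else:
                lines.append("  " * depth + entry.name)

    _walk(root_path, 1)
    return "\n".join(lines)


def fetch_requested_files(root: str, model_reply: str) -> dict[str, str]:
    """Step 5: read the files the model asked for in step 4."""
    root_path = Path(root).resolve()
    contents: dict[str, str] = {}
    # Assumption: the model names files inside backticks, e.g. `src/pages/login.tsx`.
    for rel in re.findall(r"`([^`]+)`", model_reply):
        path = (root_path / rel).resolve()
        # Refuse paths that escape the repo root or do not point at a real file.
        if root_path not in path.parents or not path.is_file():
            continue
        contents[rel] = path.read_text(encoding="utf-8")
    return contents
```

Parsing file names out of free-form prose is brittle; a sturdier design is to expose a file-reading tool through your provider's function-calling interface so that step 4 arrives as a structured request. The path check matters either way, because the model can name any path, including one outside the repo.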

Level 3: Embeddings (Large Repos)

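For massive codebases (Google scale), you chunk every function, embed each chunk, and index the vectors in a vector DB; at query time you retrieve the chunks most similar to the request. This is complex to maintain: every edit means re-chunking and re-embedding the changed code, and keeping the index in sync with the repo (cache invalidation) is hard. We will stick to Level 1 or 2 for this project.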
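To make the moving parts (and the cache-invalidation problem) concrete, here is a toy, in-memory sketch. It swaps in two simplifications you would not ship: a bag-of-words dictionary instead of a real embedding model, and fixed 40-line windows instead of true per-function chunks. Chunks are keyed by a content hash, so re-indexing a file only embeds chunks whose text actually changed. All names here are hypothetical.

```python
import hashlib
import math
import re
from collections import Counter


def toy_embed(text: str) -> dict[str, float]:
    """Stand-in for a real embedding model: an L2-normalised bag of identifier tokens."""
    counts = Counter(re.findall(r"[a-z_]\w*", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are already unit length, so the dot product is the cosine similarity.
    return sum(weight * b.get(token, 0.0) for token, weight in a.items())


def chunk_lines(text: str, size: int = 40) -> list[str]:
    """Simplification: fixed line windows instead of true per-function chunks."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


class ChunkIndex:
    """In-memory stand-in for a vector DB."""

    def __init__(self) -> None:
        self.vectors: dict[str, dict[str, float]] = {}  # content hash -> embedding (cache)
        self.chunks: dict[tuple[str, str], str] = {}  # (path, content hash) -> chunk text
        self.embed_calls = 0

    def index_file(self, path: str, text: str) -> None:
        # Invalidation: forget this file's old chunks, then re-add the current ones.
        self.chunks = {key: c for key, c in self.chunks.items() if key[0] != path}
        for chunk in chunk_lines(text):
            digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
            if digest not in self.vectors:  # only embed content we have never seen
                self.vectors[digest] = toy_embed(chunk)
                self.embed_calls += 1
            self.chunks[(path, digest)] = chunk
        # Note: stale entries in self.vectors are never evicted in this sketch.

    def search(self, query: str, k: int = 3) -> list[tuple[str, str]]:
        """Return the k (path, chunk) pairs most similar to the query."""
        q = toy_embed(query)
        ranked = sorted(self.chunks, key=lambda key: cosine(q, self.vectors[key[1]]), reverse=True)
        return [(path, self.chunks[(path, digest)]) for path, digest in ranked[:k]]
```

Even in this toy, you can see where the maintenance cost comes from: every save has to go through `index_file`, and a real system also has to evict stale vectors and survive renames, which this sketch ignores.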
