Author: Yu Tang (PageIndex)

The Rise of Agentic Retrieval Over Vector Indexing

The success of Claude Code has shown that, in RAG, you can skip heavyweight vector databases and let the LLM itself handle retrieval using simple tools, such as a well-written llms.txt and grep calls. Surprisingly, this minimalist approach delivers more accurate and faster retrieval, demonstrating that reasoning-driven retrieval can outperform embedding-based methods in both precision and latency. This insight challenges the default assumptions behind mainstream RAG systems.

We take the same principle beyond code.

**PageIndex (GitHub)** is a vectorless, reasoning-based retrieval framework that mirrors how human experts read, navigate, and extract knowledge from long, complex documents. Instead of relying on chunking and vector similarity search, PageIndex transforms documents into a tree-structured, in-context index and enables LLMs to perform agentic reasoning over that structure for context-aware retrieval. The retrieval process is traceable and interpretable, and requires no vector database or chunking.
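To make the idea concrete, here is a minimal sketch of what a tree-structured, in-context index might look like. The node fields (`title`, `summary`, `page_range`) and the flat outline rendering are assumptions for illustration, not PageIndex's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str        # short section summary (assumed to be LLM-generated)
    page_range: tuple   # (start, end) pages this node covers
    children: list = field(default_factory=list)

def render_outline(node, depth=0):
    """Flatten the tree into an indented outline an LLM can reason over in-context."""
    lines = [f"{'  ' * depth}- {node.title}: {node.summary}"]
    for child in node.children:
        lines.extend(render_outline(child, depth + 1))
    return lines

# A toy document tree (hypothetical content)
doc = Node("Annual Report", "Full-year results", (1, 120), [
    Node("Financials", "Revenue and costs", (10, 40)),
    Node("Risk Factors", "Market and legal risks", (41, 60)),
])
outline = "\n".join(render_outline(doc))
```

Given such an outline, the model can decide which branch to descend into and which pages to read next, the same way a human expert navigates a table of contents.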

For more details, see the blog post on the PageIndex framework.


The RAG Pipeline We All Know, but Increasingly Use Less

Classic RAG Pipelines

The classic RAG pipeline works like this:

split content into chunks → embed → store in a vector DB → semantic search → (blend with keyword search) → (rerank) → stuff the context → answer.

It works, but it’s complex to build, hard to maintain, and slow to iterate, often more infrastructure than you really need.
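The steps above can be sketched end to end in a few lines. This toy version uses bag-of-words counts as a stand-in for a real embedding model and an in-memory list as the "vector DB"; everything here is illustrative, not a production pipeline:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (stands in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. split content into chunks
corpus = "PageIndex builds a tree index. Vector RAG embeds chunks. Grep searches text."
chunks = [s.strip() for s in corpus.split(".") if s.strip()]

# 2-3. embed and store in a (here: in-memory) vector DB
index = [(chunk, embed(chunk)) for chunk in chunks]

# 4. semantic search for the closest chunk
query = embed("how does vector RAG work")
top_chunk = max(index, key=lambda item: cosine(query, item[1]))[0]

# 5. stuff the context and answer (this is where an LLM call would go)
context = top_chunk
```

Every stage here is a moving part you must build, tune, and keep in sync with your documents, which is exactly the maintenance burden described above.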

Agentic Retrieval is Emerging

In contrast, a new wave of coding agents like Claude Code takes a refreshingly simple approach:

Benchmark comparison of different retrieval methods from Lance Martin’s blog post

In real-world tests on developer docs, practitioners have found that a well-crafted llms.txt (URLs + succinct descriptions) plus simple tool calls, such as grep, outperforms vector DB pipelines for various coding tasks. It’s not only more accurate, but also dramatically easier to maintain and update.
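A minimal sketch of this agentic loop, under stated assumptions: the llms.txt format (one `URL: description` per line), the page contents, and the keyword-overlap scoring that stands in for the LLM's judgment are all hypothetical:

```python
import re

# A hand-written llms.txt: one "URL: description" entry per line (format is an assumption)
llms_txt = """\
/docs/install: how to install the SDK and set API keys
/docs/retrieval: retrieval modes, reranking, and filters
/docs/agents: building tool-calling agents
"""

# Toy page contents keyed by URL
docs = {
    "/docs/install": "Run pip install sdk. Set the API key via environment variable.",
    "/docs/retrieval": "Retrieval supports semantic and keyword modes with reranking.",
    "/docs/agents": "Agents call tools like grep in a loop until the task is done.",
}

def pick_url(question):
    """Step 1: the agent scans llms.txt descriptions to pick a page.
    Keyword overlap stands in for the LLM's actual reasoning."""
    q = set(re.findall(r"\w+", question.lower()))
    best, score = None, -1
    for line in llms_txt.strip().splitlines():
        url, desc = line.split(":", 1)
        overlap = len(q & set(re.findall(r"\w+", desc.lower())))
        if overlap > score:
            best, score = url.strip(), overlap
    return best

def grep(url, pattern):
    """Step 2: a grep-style tool call over the chosen page."""
    return [s.strip() for s in docs[url].split(".") if re.search(pattern, s, re.I)]

url = pick_url("where are retrieval reranking filters configured")
hits = grep(url, r"rerank")
```

There is nothing to re-embed or re-index when the docs change: update llms.txt and the pages, and the next query picks up the changes immediately.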

Why does this minimalist approach work so well?