Author: Yu Tang (PageIndex)

The Rise of Agentic Retrieval Over Vector Indexing

The success of Claude Code has shown that, in RAG, you can skip heavyweight vector databases and let the LLM itself handle retrieval using simple tools, such as a well-written llms.txt and grep calls. Surprisingly, this minimalist approach delivers more accurate and faster retrieval, demonstrating that reasoning-driven retrieval can outperform embedding-based methods in both precision and latency. This insight challenges the default assumptions behind mainstream RAG systems.

We take the same principle beyond code.

**PageIndex (GitHub)** is a vectorless, reasoning-based retrieval framework that mirrors how human experts read, navigate, and extract knowledge from long, complex documents. Instead of relying on chunking and vector similarity search, PageIndex transforms documents into a tree-structured, in-context index and enables LLMs to perform agentic reasoning over that structure for context-aware retrieval. The retrieval process is traceable and interpretable, and requires no vector database or chunking.
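To make the idea concrete, here is a minimal sketch of what a tree-structured, in-context index might look like. The node fields (`title`, `summary`, `page_range`) and the flat outline rendering are assumptions for illustration, not PageIndex's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str        # short section summary (assumed to be LLM-generated)
    page_range: tuple   # (start, end) pages this node covers
    children: list = field(default_factory=list)

def render_outline(node, depth=0):
    """Flatten the tree into an indented outline an LLM can reason over in-context."""
    lines = [f"{'  ' * depth}- {node.title}: {node.summary}"]
    for child in node.children:
        lines.extend(render_outline(child, depth + 1))
    return lines

# A toy document tree (hypothetical content)
doc = Node("Annual Report", "Full-year results", (1, 120), [
    Node("Financials", "Revenue and costs", (10, 40)),
    Node("Risk Factors", "Market and legal risks", (41, 60)),
])
outline = "\n".join(render_outline(doc))
```

Given such an outline, the model can decide which branch to descend into and which pages to read next, the same way a human expert navigates a table of contents.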

For more details, see the blog post on the PageIndex framework.


The RAG Pipeline We All Know, but Increasingly Use Less

Classic RAG Pipelines

The classic RAG pipeline works like this:

split content into chunks → embed → store in a vector DB → semantic search → (blend with keyword search) → (rerank) → stuff the context → answer.

It works, but it’s complex to build, hard to maintain, and slow to iterate, often more infrastructure than you really need.
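The steps above can be sketched end to end in a few lines. This toy version uses bag-of-words counts as a stand-in for a real embedding model and an in-memory list as the "vector DB"; everything here is illustrative, not a production pipeline:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (stands in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. split content into chunks
corpus = "PageIndex builds a tree index. Vector RAG embeds chunks. Grep searches text."
chunks = [s.strip() for s in corpus.split(".") if s.strip()]

# 2-3. embed and store in a (here: in-memory) vector DB
index = [(chunk, embed(chunk)) for chunk in chunks]

# 4. semantic search for the closest chunk
query = embed("how does vector RAG work")
top_chunk = max(index, key=lambda item: cosine(query, item[1]))[0]

# 5. stuff the context and answer (this is where an LLM call would go)
context = top_chunk
```

Every stage here is a moving part you must build, tune, and keep in sync with your documents, which is exactly the maintenance burden described above.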

Agentic Retrieval is Emerging

In contrast, a new wave of coding agents like Claude Code takes a refreshingly simple approach:

Benchmark comparison of different retrieval methods from Lance Martin’s blog post

In real-world tests on developer docs, practitioners have found that a well-crafted llms.txt (URLs + succinct descriptions) plus simple tool calls, such as grep, outperforms vector DB pipelines for various coding tasks. It’s not only more accurate, but also dramatically easier to maintain and update.
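A minimal sketch of this agentic loop, under stated assumptions: the llms.txt format (one `URL: description` per line), the page contents, and the keyword-overlap scoring that stands in for the LLM's judgment are all hypothetical:

```python
import re

# A hand-written llms.txt: one "URL: description" entry per line (format is an assumption)
llms_txt = """\
/docs/install: how to install the SDK and set API keys
/docs/retrieval: retrieval modes, reranking, and filters
/docs/agents: building tool-calling agents
"""

# Toy page contents keyed by URL
docs = {
    "/docs/install": "Run pip install sdk. Set the API key via environment variable.",
    "/docs/retrieval": "Retrieval supports semantic and keyword modes with reranking.",
    "/docs/agents": "Agents call tools like grep in a loop until the task is done.",
}

def pick_url(question):
    """Step 1: the agent scans llms.txt descriptions to pick a page.
    Keyword overlap stands in for the LLM's actual reasoning."""
    q = set(re.findall(r"\w+", question.lower()))
    best, score = None, -1
    for line in llms_txt.strip().splitlines():
        url, desc = line.split(":", 1)
        overlap = len(q & set(re.findall(r"\w+", desc.lower())))
        if overlap > score:
            best, score = url.strip(), overlap
    return best

def grep(url, pattern):
    """Step 2: a grep-style tool call over the chosen page."""
    return [s.strip() for s in docs[url].split(".") if re.search(pattern, s, re.I)]

url = pick_url("where are retrieval reranking filters configured")
hits = grep(url, r"rerank")
```

There is nothing to re-embed or re-index when the docs change: update llms.txt and the pages, and the next query picks up the changes immediately.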

Why does this minimalist approach work so well?