The success of Claude Code has shown that in RAG, you can skip heavyweight vector databases and let the LLM itself handle retrieval with simple tools, such as a well-written llms.txt file and grep calls. Surprisingly, this minimalist approach delivers more accurate and faster retrieval, demonstrating that reasoning-driven retrieval can outperform embedding-based methods in both precision and latency. This insight challenges the default assumptions behind mainstream RAG systems.
We take the same principle beyond code.
**PageIndex (GitHub)** is a vectorless, reasoning-based retrieval framework that mirrors how human experts read, navigate, and extract knowledge from long, complex documents. Instead of relying on chunking and vector similarity search, PageIndex transforms documents into a tree-structured, in-context index and enables LLMs to perform agentic reasoning over that structure for context-aware retrieval. The retrieval process is traceable and interpretable, and requires no vector database or chunking.
For more details, see the blog post on the PageIndex framework.
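To make the idea concrete, here is a minimal sketch of reasoning-based retrieval over a tree index: the document is represented as a small table-of-contents tree, and the LLM is asked to reason over that tree and pick the sections worth reading in full. The JSON node layout and the `llm()` helper below are illustrative assumptions, not PageIndex’s actual schema or API.

```python
import json

# A hypothetical tree-of-contents index for one document. PageIndex's real
# node schema lives in its repo; this only illustrates the general shape.
doc_tree = {
    "title": "Annual Report 2023",
    "node_id": "0000",
    "nodes": [
        {
            "title": "Financial Statements",
            "node_id": "0001",
            "summary": "Balance sheet, income statement, cash flows...",
            "nodes": [
                {
                    "title": "Notes to the Accounts",
                    "node_id": "0002",
                    "summary": "Accounting policies and risk disclosures...",
                }
            ],
        },
        {
            "title": "Risk Factors",
            "node_id": "0003",
            "summary": "Market, credit, and liquidity risk...",
        },
    ],
}


def select_nodes(llm, question: str, tree: dict) -> list[str]:
    """Ask the LLM to reason over the table-of-contents tree and return the
    node_ids worth opening in full -- no embeddings, no similarity search."""
    prompt = (
        "You are given a document's table of contents as JSON:\n"
        f"{json.dumps(tree, indent=2)}\n\n"
        f"Question: {question}\n"
        "Think about which sections likely contain the answer, then reply "
        "with a JSON list of node_ids to open."
    )
    return json.loads(llm(prompt))  # e.g. ["0001", "0002"]
```

Because section selection is an explicit reasoning step over readable titles and summaries, every retrieval decision can be traced back to a node in the tree, which is what makes the process interpretable.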
The classic RAG pipeline works like this:
split content into chunks → embed → store in a vector DB → semantic search → (blend with keyword search) → (rerank) → stuff the context → answer.
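Expressed in code, that pipeline looks roughly like the sketch below; `embed`, `rerank`, and `llm` are injected placeholders for the embedding model, reranker, and LLM a real system would wire in, and the fixed-size chunking is a deliberately naive stand-in for whatever splitter is used in practice.

```python
import numpy as np


def classic_rag(question, documents, embed, rerank, llm, chunk_size=800, top_k=5):
    """One pass through the classic RAG pipeline, with hypothetical helpers."""
    # 1. Split content into fixed-size chunks
    chunks = [
        doc[i : i + chunk_size]
        for doc in documents
        for i in range(0, len(doc), chunk_size)
    ]

    # 2. Embed every chunk (in a real system these vectors go into a vector DB)
    vectors = np.array([embed(c) for c in chunks])

    # 3. Semantic search: cosine similarity between the query and each chunk
    q = np.array(embed(question))
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    candidates = [chunks[i] for i in np.argsort(-scores)[:50]]

    # 4. Rerank the candidates (keyword-search blending would also slot in here)
    top_chunks = rerank(question, candidates)[:top_k]

    # 5. Stuff the context and answer
    context = "\n\n".join(top_chunks)
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```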
It works, but it’s complex to build, hard to maintain, and slow to iterate on, often amounting to more infrastructure than you really need.
In contrast, a new wave of coding agents like Claude Code takes a refreshingly simple approach: give the model a concise index of the docs (llms.txt) plus basic tools like grep, and let it retrieve by reasoning.

*Benchmark comparison of different retrieval methods, from Lance Martin’s blog post*
In real-world tests on developer docs, practitioners have found that a well-crafted llms.txt (URLs + succinct descriptions) plus simple tool calls, such as grep, outperforms vector DB pipelines for various coding tasks. It’s not only more accurate, but also dramatically easier to maintain and update.
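A rough sketch of that loop, assuming a local docs/ folder containing an llms.txt index and a generic `llm()` helper (real agents such as Claude Code drive the same steps through tool calls rather than a fixed script):

```python
import subprocess
from pathlib import Path


def answer_from_docs(llm, question: str, docs_dir: str = "docs") -> str:
    # llms.txt: one line per document -- a path or URL plus a succinct description
    index = Path(docs_dir, "llms.txt").read_text()

    # Step 1: the model reasons over the index and picks a file and a search term
    choice = llm(
        f"Docs index:\n{index}\n\nQuestion: {question}\n"
        "Reply as '<relative_path> <grep_pattern>' for the most relevant doc."
    )
    rel_path, pattern = choice.split(maxsplit=1)

    # Step 2: plain grep with a few context lines instead of a vector search
    hits = subprocess.run(
        ["grep", "-n", "-i", "-C", "3", pattern, str(Path(docs_dir, rel_path))],
        capture_output=True,
        text=True,
    ).stdout

    # Step 3: answer from the matched passages
    return llm(f"Passages:\n{hits}\n\nQuestion: {question}")
```

Updating this setup means editing llms.txt or the docs themselves; there is no index to re-embed, which is why it is so much easier to maintain than a vector DB pipeline.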
Why does this minimalist approach work so well?