A technique that enhances language model outputs by first retrieving relevant information from external knowledge sources, then using that retrieved context to generate more accurate, grounded, and up-to-date responses.
In Depth
Retrieval-Augmented Generation (RAG) addresses a core limitation of language models: their knowledge is frozen at training time. A model trained on data up to January 2024 knows nothing about events from February 2024 onward. RAG solves this by adding an external retrieval step before generation. When a user asks a question, the system first searches a knowledge base (a document database, a vector store, or a web search engine) to find relevant context. This retrieved information is then included in the model's prompt, allowing it to generate responses grounded in current, specific, and verifiable sources.
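The retrieve-then-generate flow can be sketched in a few lines. The retriever below is a toy word-overlap ranker and the prompt template is a hypothetical format, chosen only to keep the example self-contained; a real system would plug in an actual search index and an LLM client.

```python
def retrieve(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Inject retrieved passages into the prompt as grounding context."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# The resulting prompt is what gets sent to the language model, so the answer
# is grounded in retrieved text rather than in the model's frozen training data.
```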
A typical RAG pipeline has three components: an indexing stage that encodes documents as embeddings and stores them in a vector database (Pinecone, Weaviate, Chroma); a retrieval stage that embeds the user's query and finds the most semantically similar document chunks; and a generation stage where the retrieved chunks are injected into the model's prompt as context. The model then synthesizes an answer from this context rather than relying solely on its parametric knowledge. This architecture enables grounding, reduces hallucination, and allows source attribution.
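A minimal, self-contained sketch of these three stages follows. The `embed()` function here is a toy hashing-based embedding used only to make the example runnable; a production pipeline would call a learned embedding model and store the vectors in a vector database such as those named above.

```python
import numpy as np


def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash each token into a fixed-size unit vector."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


# Indexing stage: embed document chunks and store the vectors.
chunks = [
    "RAG retrieves documents before generation.",
    "Vector databases store embeddings for similarity search.",
    "Fine-tuning updates model weights on new data.",
]
index = np.stack([embed(c) for c in chunks])

# Retrieval stage: embed the query and find the most similar chunks.
query = "How does RAG find relevant context?"
scores = index @ embed(query)  # cosine similarity, since vectors are unit-norm
top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:2]]

# Generation stage: inject the retrieved chunks into the model's prompt.
prompt = (
    "Use the context to answer.\n\n"
    "Context:\n" + "\n".join(top_chunks) + "\n\n"
    f"Question: {query}\nAnswer:"
)
# `prompt` would now be sent to the language model for grounded generation.
```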
RAG has become the standard approach for building production LLM applications because it offers clear advantages over fine-tuning: it requires no model retraining, the knowledge base can be updated in real time, sources are traceable for verification, and it works with any base model. RAG has its own challenges, however: retrieval quality is critical (irrelevant context can hurt more than no context at all), chunking strategy strongly affects performance, and for some use cases long-context models are beginning to offer an alternative. Advanced RAG techniques include re-ranking retrieved results, multi-hop reasoning across multiple retrieval steps, and hybrid search that combines semantic and keyword matching.
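As one illustration of these refinements, the sketch below blends a keyword-overlap score with a stand-in semantic score and re-ranks by the combined value, which is the core idea behind hybrid search. Real systems typically use BM25 for the keyword side and embedding similarity for the semantic side, and the weighting `alpha` is an arbitrary assumption here.

```python
def _trigrams(s: str) -> set[str]:
    """Character trigrams, used as a crude stand-in for semantic similarity."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def keyword_score(query: str, doc: str) -> float:
    """Fraction of query words that appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)


def semantic_score(query: str, doc: str) -> float:
    """Jaccard similarity over character trigrams (a toy proxy for embeddings)."""
    q, d = _trigrams(query.lower()), _trigrams(doc.lower())
    return len(q & d) / max(len(q | d), 1)


def hybrid_rank(query: str, docs: list[str], alpha: float = 0.5) -> list[str]:
    """Re-rank documents by a weighted blend of semantic and keyword scores."""
    scored = [
        (alpha * semantic_score(query, doc) + (1 - alpha) * keyword_score(query, doc), doc)
        for doc in docs
    ]
    return [doc for _, doc in sorted(scored, reverse=True)]
```

The same re-ranking pattern generalizes: a first-pass retriever returns candidates cheaply, and a second scoring function (here the blended score, in practice often a cross-encoder) reorders them before the top results are placed in the prompt.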
RAG connects language models to external knowledge bases, enabling accurate, up-to-date, and verifiable responses — it is the dominant architecture for production LLM applications.