
RAG Chunking Strategies That Actually Work in 2026

Chunking is the most unglamorous and most impactful part of any RAG system. Here's what we've learned about doing it right — and the mistakes that silently destroy retrieval quality.

[Figure: Visual comparison of different chunking strategies applied to the same document]

Why chunking matters so much

In a RAG system, chunking is the process of splitting source documents into pieces that can be individually embedded and retrieved. It seems like a minor preprocessing step. It isn't.

Chunking determines what the retrieval system can find. If a relevant piece of information spans two chunks, neither chunk alone contains the full answer, and the model may hallucinate or say it doesn't know. If chunks are too large, the embedding becomes a noisy average of multiple topics, and retrieval precision drops. If they're too small, you lose context.

Getting chunking right is the single highest-leverage optimization most RAG teams can make — and it requires surprisingly little code.

The common strategies

Fixed-size chunking

Split the document into chunks of a fixed token count (e.g., 512 tokens) with overlap between consecutive chunks (e.g., 50 tokens).

This is the simplest approach and works surprisingly well as a baseline. The overlap ensures that information near chunk boundaries isn't completely lost. Most teams should start here and only move to more sophisticated strategies when they've identified specific quality problems that chunking improvements can solve.
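The approach fits in a few lines. A minimal sketch, where whitespace "tokens" stand in for a real tokenizer (a production system would count tokens with the embedding model's own tokenizer):

```python
def chunk_fixed(text, size=512, overlap=50):
    """Split text into fixed-size chunks of `size` tokens with
    `overlap` tokens shared between consecutive chunks."""
    # Whitespace split is a stand-in for a real tokenizer.
    tokens = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for i in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[i:i + size]))
        if i + size >= len(tokens):
            break  # the last window already covers the tail
    return chunks
```

Because the window advances by `size - overlap`, the last `overlap` tokens of each chunk reappear at the start of the next, so facts sitting on a boundary survive in at least one chunk.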

Recursive character splitting

Split on natural boundaries — paragraphs first, then sentences, then words — with a target chunk size. This preserves semantic units better than fixed-size splitting because it avoids cutting mid-sentence or mid-paragraph.

This is the default recommendation for most use cases. It's slightly more complex than fixed-size splitting but produces meaningfully better retrieval quality because chunks tend to be semantically coherent.
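The core idea can be sketched as follows: try the coarsest separator first, and only fall back to finer ones for pieces that are still too large. This is a simplified, character-based version of what libraries like LangChain's `RecursiveCharacterTextSplitter` do:

```python
def recursive_split(text, max_len=1000, seps=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator available, recursing with
    finer separators for any piece still over max_len characters."""
    if len(text) <= max_len:
        return [text]
    if not seps:
        # No separators left: hard cut as a last resort.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, rest = seps[0], seps[1:]
    pieces, buf = [], ""
    for part in text.split(sep):
        candidate = buf + sep + part if buf else part
        if len(candidate) <= max_len:
            buf = candidate  # greedily pack parts up to the limit
        else:
            if buf:
                pieces.append(buf)
            if len(part) > max_len:
                # A single paragraph/sentence too big: recurse finer.
                pieces.extend(recursive_split(part, max_len, rest))
                buf = ""
            else:
                buf = part
    if buf:
        pieces.append(buf)
    return pieces
```

Because paragraph breaks are tried before sentence breaks, chunks end at natural boundaries whenever the size budget allows it.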

Semantic chunking

Use an embedding model to detect topic boundaries within the document, splitting where the semantic content shifts. This produces the most semantically coherent chunks but is significantly more expensive (requires embedding every sentence or paragraph to detect boundaries) and harder to tune.

Use this when your documents contain multiple distinct topics and you need high retrieval precision. The overhead is justified for knowledge bases where quality matters more than indexing speed.
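The mechanism is easiest to see with a toy similarity function. The sketch below starts a new chunk wherever adjacent sentences drop below a similarity threshold; the bag-of-words cosine here is purely illustrative, and a real implementation would call an embedding model instead:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Group consecutive sentences; break where similarity between
    neighbors falls below `threshold` (a topic shift)."""
    # Bag-of-words stands in for a real embedding model.
    embed = lambda s: Counter(s.lower().split())
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(sent)) < threshold:
            chunks.append(" ".join(current))  # topic boundary detected
            current = [sent]
        else:
            current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

The threshold is the hard part to tune in practice: too high and every sentence becomes its own chunk, too low and the document never splits.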

Start simple: recursive character splitting with a chunk size of 512–1024 tokens and 10% overlap is the right starting point for 80% of RAG use cases. Only move to more complex strategies when you've measured that they improve retrieval quality on your data.

Document-structure-aware chunking

For structured documents (HTML, Markdown, PDFs with headers), split along the document's own structure: sections, subsections, paragraphs. This preserves the author's intended logical units and typically produces the most meaningful chunks.

This is the best approach when your documents have reliable structure, but requires parsing logic for each document format.
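For Markdown, the parsing logic is modest. A minimal sketch that splits on header lines and keeps each section's heading attached to its body:

```python
import re

def split_markdown_sections(md):
    """Split a Markdown document into (heading, body) pairs,
    one per section, using the document's own header lines."""
    sections, heading, body = [], None, []
    for line in md.splitlines():
        if re.match(r"^#{1,6}\s", line):  # an ATX header starts a section
            if heading is not None or body:
                sections.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("# ").strip(), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body).strip()))
    return sections
```

Sections that exceed the chunk-size budget can then be handed to a finer splitter, so structure-aware and size-based splitting compose naturally.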

The parent-child pattern

One of the most effective advanced patterns: embed small chunks (for precise retrieval) but pass their larger parent chunks to the LLM (for sufficient context).

The implementation: split documents into small chunks (e.g., 256 tokens), embed these for retrieval, but maintain a mapping to their parent chunk (e.g., the full section or a 1024-token window). When a small chunk is retrieved, pass the parent chunk to the model.

This gives you the best of both worlds: precise retrieval (because small chunks produce focused embeddings) and rich context (because the model sees the surrounding information).
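A minimal sketch of the mapping, again using whitespace "tokens" and a naive keyword match as a stand-in for vector search:

```python
def build_parent_child_index(sections, child_size=50):
    """Index small child chunks for retrieval, each keeping a pointer
    to the parent section that will be passed to the LLM."""
    parents, children = [], []
    for pid, section in enumerate(sections):
        parents.append(section)
        tokens = section.split()  # stand-in for a real tokenizer
        for i in range(0, len(tokens), child_size):
            children.append({
                "text": " ".join(tokens[i:i + child_size]),
                "parent_id": pid,  # hit on the child, return the parent
            })
    return parents, children

def retrieve(query_word, parents, children):
    # Keyword match stands in for embedding similarity search.
    for child in children:
        if query_word in child["text"]:
            return parents[child["parent_id"]]  # full parent context
    return None
```

The only extra state is the `parent_id` on each child; everything else is the same pipeline with a lookup added after retrieval.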

Chunk size: the critical variable

The optimal chunk size depends on several factors: how topically focused your documents are, how much surrounding context the model needs to answer typical queries, and how your embedding model behaves on longer inputs.

A practical approach: test chunk sizes of 256, 512, and 1024 tokens on your evaluation set and measure retrieval quality at each. The optimal size is rarely the one you'd guess.
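A small harness makes this sweep concrete. In this sketch, the metric is a simple hit rate (does the expected answer string appear in the top-k chunks?) and a keyword-overlap retriever stands in for real vector search; both are simplifications for illustration:

```python
def hit_rate(retrieve, eval_set, k=5):
    """Share of questions whose expected answer string appears in
    the top-k retrieved chunks. A thin stand-in for recall@k."""
    hits = sum(
        any(answer in chunk for chunk in retrieve(question, k))
        for question, answer in eval_set
    )
    return hits / len(eval_set)

def sweep_chunk_sizes(corpus, eval_set, sizes=(256, 512, 1024)):
    """Re-chunk, re-index, and score the same eval set per size."""
    results = {}
    tokens = corpus.split()  # whitespace stand-in for a tokenizer
    for size in sizes:
        chunks = [" ".join(tokens[i:i + size])
                  for i in range(0, len(tokens), size)]

        def retrieve(q, k, chunks=chunks):
            # Keyword overlap stands in for embedding similarity.
            qwords = set(q.lower().split())
            scored = sorted(chunks,
                            key=lambda c: -len(qwords & set(c.lower().split())))
            return scored[:k]

        results[size] = hit_rate(retrieve, eval_set, k=5)
    return results
```

Swapping in your real retriever and a recall@k or MRR metric turns this into a usable evaluation loop.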

Metadata enrichment

Chunks should carry metadata that helps both retrieval and the LLM: the source document, the section heading the chunk came from, the publication date, and the chunk's position within the document.

This metadata enables filtered retrieval ("find information from documents published after 2024") and helps the model cite its sources accurately.
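In code, this is just a record attached to each chunk plus a pre-filter before (or alongside) vector search. The field names below are illustrative, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str     # originating document, e.g. a file path or URL
    section: str    # heading the chunk was taken from
    published: str  # ISO date string; lexical compare works for filters
    position: int   # chunk index within the document, for citations

def filter_by_date(chunks, after):
    """Metadata pre-filter: keep only chunks published after `after`."""
    return [c for c in chunks if c.published > after]
```

Most vector stores accept such filters natively, so the metadata narrows the candidate set before similarity scoring rather than after.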

Common chunking mistakes

The failure modes we see most often follow from the tradeoffs above: cutting mid-sentence or mid-paragraph so no single chunk contains the full answer, skipping overlap so boundary information is lost, making chunks so large that embeddings become noisy averages of multiple topics, ignoring document structure when it's available, and picking a chunk size once without ever measuring alternatives.

Chunking isn't sexy, but it's the foundation. Get it right and everything downstream gets better.
