The maximum amount of text (measured in tokens) that a language model can process in a single input-output interaction — determining how much conversation history, document content, or context the model can 'see' at once.
In Depth
A language model's context window is the maximum number of tokens it can accept as input and generate as output in a single interaction. Everything the model 'knows' during a conversation (the system prompt, conversation history, uploaded documents, and its own previous responses) must fit within this window. The original GPT-3 had a 2K token context window (roughly 1,500 words), and GPT-3.5 raised this to 4K. Modern models have expanded dramatically: GPT-4 Turbo supports up to 128K tokens, Claude offers up to 200K tokens, and Gemini 1.5 Pro supports up to 1 million tokens.
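A quick way to make these numbers concrete is to count tokens before sending text to a model. The sketch below is a minimal illustration using OpenAI's tiktoken tokenizer; tiktoken is an illustrative choice not mentioned above, and other model families use their own tokenizers, so the counts are only approximate outside OpenAI models. The window sizes in the table are the figures quoted above and vary by model variant.

```python
# Minimal sketch: estimate whether a piece of text fits in a model's context window.
# Assumes the tiktoken package is installed (pip install tiktoken).
import tiktoken

# Approximate context window sizes quoted in the text above (tokens).
CONTEXT_WINDOWS = {
    "gpt-3.5": 4_096,             # 4K
    "gpt-4-turbo": 128_000,       # 128K
    "claude-3": 200_000,          # 200K
    "gemini-1.5-pro": 1_000_000,  # 1M
}

def count_tokens(text: str) -> int:
    """Count tokens using the cl100k_base encoding (used by recent OpenAI models)."""
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))

def fits(text: str, model: str) -> bool:
    """Return True if the text alone fits within the model's context window."""
    return count_tokens(text) <= CONTEXT_WINDOWS[model]

if __name__ == "__main__":
    sample = "The context window sets how much text the model can see at once. " * 200
    print(count_tokens(sample), "tokens")
    print("fits gpt-3.5:", fits(sample, "gpt-3.5"))
```

In practice the budget must also leave room for the system prompt and the model's own response, so applications typically reserve a margin rather than filling the window completely.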
Context window size has profound implications for what a model can do. A small context window limits conversations to short exchanges and prevents the model from processing long documents. A large context window enables the model to analyze entire books, codebases, or lengthy conversations while maintaining coherence. However, larger context windows come with tradeoffs: they require more memory and compute (attention cost grows quadratically with sequence length), increase latency and cost, and models may struggle to attend equally to information at different positions, a challenge known as the 'lost in the middle' problem.
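One concrete way to see the memory tradeoff is the key-value (KV) cache that transformer inference keeps for every token in the window. The back-of-the-envelope sketch below uses hypothetical model dimensions (layer count, hidden size, numeric precision) chosen purely for illustration; real models differ, but the linear growth of the cache with context length is the point.

```python
# Back-of-the-envelope sketch of KV-cache memory versus context length.
# The model dimensions here are hypothetical placeholders, not any specific model.

def kv_cache_bytes(seq_len: int, n_layers: int = 32, hidden_dim: int = 4096,
                   bytes_per_value: int = 2) -> int:
    """Memory for keys and values: 2 tensors * layers * tokens * hidden_dim * bytes."""
    return 2 * n_layers * seq_len * hidden_dim * bytes_per_value

for tokens in (4_096, 128_000, 1_000_000):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>9,} tokens -> ~{gib:,.1f} GiB of KV cache")
```

Under these assumed dimensions, a 4K window needs about 2 GiB of cache, a 128K window tens of gigabytes, and a 1M window hundreds, which is why long-context serving is expensive even before the quadratic attention cost is considered.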
The context window is not the same as memory. Once a conversation exceeds the context window, earlier messages must be truncated or dropped by the application; the model retains no persistent memory of them. This is fundamentally different from how humans remember previous conversations. Techniques like RAG (Retrieval-Augmented Generation), summarization of conversation history, and external memory systems are used to work around context window limits. The race to build longer context windows, and to improve models' ability to effectively use the full context, is one of the most active areas of LLM research.
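The simplest of these workarounds is a sliding window: keep the system prompt, then keep as many of the most recent messages as fit under a token budget. The sketch below is one naive version of that policy; the four-characters-per-token estimate and the message format are assumptions made for illustration, and a real application would use the model's actual tokenizer.

```python
# Minimal sketch of sliding-window truncation of conversation history.
# The 4-characters-per-token estimate is a crude assumption for illustration only.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def truncate_history(messages: list[dict], budget: int) -> list[dict]:
    """messages: [{'role': 'system'|'user'|'assistant', 'content': str}, ...]"""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    # Walk backwards from the newest message, keeping whatever still fits.
    for msg in reversed(rest):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize chapter one. " * 50},
    {"role": "assistant", "content": "Chapter one introduces... " * 50},
    {"role": "user", "content": "And chapter two?"},
]
print(truncate_history(history, budget=200))
```

More sophisticated systems replace the dropped turns with a running summary or retrieve only the relevant passages via RAG, trading completeness for a history that always fits in the window.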
The context window determines how much text an LLM can process at once — it sets the boundary between what the model can 'see' and what it cannot, directly shaping AI capabilities and costs.