The maximum amount of text (measured in tokens) that a language model can process in a single input-output interaction — determining how much conversation history, document content, or context the model can 'see' at once.
In Depth
A language model's context window is the maximum number of tokens it can accept as input and generate as output in a single interaction. Everything the model 'knows' during a conversation (the system prompt, conversation history, uploaded documents, and its own previous responses) must fit within this window. The original GPT-3 had a 2K token context window (roughly 1,500 words), and GPT-3.5 raised this to 4K. Modern models have expanded dramatically: GPT-4 Turbo supports up to 128K tokens, Claude offers up to 200K tokens, and Gemini 1.5 Pro supports up to 1 million tokens.
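A quick way to make these numbers concrete is to count tokens before sending text to a model. The sketch below is a minimal illustration using OpenAI's tiktoken tokenizer; tiktoken is an illustrative choice not mentioned above, and other model families use their own tokenizers, so the counts are only approximate outside OpenAI models. The window sizes in the table are the figures quoted above and vary by model variant.

```python
# Minimal sketch: estimate whether a piece of text fits in a model's context window.
# Assumes the tiktoken package is installed (pip install tiktoken).
import tiktoken

# Approximate context window sizes quoted in the text above (tokens).
CONTEXT_WINDOWS = {
    "gpt-3.5": 4_096,             # 4K
    "gpt-4-turbo": 128_000,       # 128K
    "claude-3": 200_000,          # 200K
    "gemini-1.5-pro": 1_000_000,  # 1M
}

def count_tokens(text: str) -> int:
    """Count tokens using the cl100k_base encoding (used by recent OpenAI models)."""
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))

def fits(text: str, model: str) -> bool:
    """Return True if the text alone fits within the model's context window."""
    return count_tokens(text) <= CONTEXT_WINDOWS[model]

if __name__ == "__main__":
    sample = "The context window sets how much text the model can see at once. " * 200
    print(count_tokens(sample), "tokens")
    print("fits gpt-3.5:", fits(sample, "gpt-3.5"))
```

In practice the budget must also leave room for the system prompt and the model's own response, so applications typically reserve a margin rather than filling the window completely.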
Context window size has profound implications for what a model can do. A small context window limits conversations to short exchanges and prevents the model from processing long documents. A large context window enables the model to analyze entire books, codebases, or lengthy conversations while maintaining coherence. However, larger context windows come with tradeoffs: they require more memory and compute (attention cost grows quadratically with sequence length), increase latency and cost, and models may struggle to attend equally to information at different positions, a challenge known as the 'lost in the middle' problem.
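One concrete way to see the memory tradeoff is the key-value (KV) cache that transformer inference keeps for every token in the window. The back-of-the-envelope sketch below uses hypothetical model dimensions (layer count, hidden size, numeric precision) chosen purely for illustration; real models differ, but the linear growth of the cache with context length is the point.

```python
# Back-of-the-envelope sketch of KV-cache memory versus context length.
# The model dimensions here are hypothetical placeholders, not any specific model.

def kv_cache_bytes(seq_len: int, n_layers: int = 32, hidden_dim: int = 4096,
                   bytes_per_value: int = 2) -> int:
    """Memory for keys and values: 2 tensors * layers * tokens * hidden_dim * bytes."""
    return 2 * n_layers * seq_len * hidden_dim * bytes_per_value

for tokens in (4_096, 128_000, 1_000_000):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>9,} tokens -> ~{gib:,.1f} GiB of KV cache")
```

Under these assumed dimensions, a 4K window needs about 2 GiB of cache, a 128K window tens of gigabytes, and a 1M window hundreds, which is why long-context serving is expensive even before the quadratic attention cost is considered.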
The context window is not the same as memory. Once a conversation exceeds the context window, earlier messages must be truncated or dropped by the application; the model retains no persistent memory of them. This is fundamentally different from how humans remember previous conversations. Techniques like RAG (Retrieval-Augmented Generation), summarization of conversation history, and external memory systems are used to work around context window limits. The race to build longer context windows, and to improve models' ability to effectively use the full context, is one of the most active areas of LLM research.
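The simplest of these workarounds is a sliding window: keep the system prompt, then keep as many of the most recent messages as fit under a token budget. The sketch below is one naive version of that policy; the four-characters-per-token estimate and the message format are assumptions made for illustration, and a real application would use the model's actual tokenizer.

```python
# Minimal sketch of sliding-window truncation of conversation history.
# The 4-characters-per-token estimate is a crude assumption for illustration only.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def truncate_history(messages: list[dict], budget: int) -> list[dict]:
    """messages: [{'role': 'system'|'user'|'assistant', 'content': str}, ...]"""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    # Walk backwards from the newest message, keeping whatever still fits.
    for msg in reversed(rest):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize chapter one. " * 50},
    {"role": "assistant", "content": "Chapter one introduces... " * 50},
    {"role": "user", "content": "And chapter two?"},
]
print(truncate_history(history, budget=200))
```

More sophisticated systems replace the dropped turns with a running summary or retrieve only the relevant passages via RAG, trading completeness for a history that always fits in the window.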
The context window determines how much text an LLM can process at once — it sets the boundary between what the model can 'see' and what it cannot, directly shaping AI capabilities and costs.