A parameter that controls the randomness and creativity of a language model's output — low temperature produces focused, deterministic responses while high temperature yields more diverse, surprising, and creative text.
In Depth
Temperature is a hyperparameter that controls the shape of the probability distribution over possible next tokens when a language model generates text. Technically, each logit (raw score) the model outputs is divided by the temperature before the softmax function converts the logits into probabilities. A temperature of 1.0 leaves the model's native probability distribution unchanged. Temperatures below 1.0 sharpen the distribution, making high-probability tokens even more likely and suppressing unlikely ones, which produces more predictable, focused output. Temperatures above 1.0 flatten the distribution, giving lower-probability tokens a greater chance of being selected, which produces more varied and creative output.
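To make the mechanism concrete, here is a minimal Python sketch of temperature scaling followed by softmax; the four logit values are invented purely for illustration:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then convert to probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for four candidate next tokens.
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Running this shows the top token's probability rising from roughly 0.61 at T=1.0 to about 0.84 at T=0.5, and falling to about 0.43 at T=2.0, exactly the sharpening and flattening described above.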
At temperature 0 (or very close to it), the model always selects the highest-probability token — a strategy called greedy decoding. This produces the most deterministic, consistent output, ideal for factual queries, coding tasks, or any context where reproducibility matters. At high temperatures (0.8-1.2), the model explores a wider range of token choices, which is useful for creative writing, brainstorming, or generating diverse options. At very high temperatures (>1.5), output often becomes incoherent as even very low-probability tokens have a meaningful chance of selection.
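The greedy special case is straightforward to sketch. The following builds on softmax_with_temperature from the previous example and treats a temperature of exactly 0 as argmax (an assumption; some APIs instead clamp 0 to a tiny positive value):

```python
import random

def sample_next_token(logits, temperature, rng=random):
    """Pick a next-token index: greedy at T == 0, weighted sampling otherwise."""
    if temperature == 0:
        # Greedy decoding: deterministically take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Drawing ten tokens at each setting makes the contrast visible:
# T=0 repeats index 0 every time, while higher temperatures mix in other tokens.
logits = [2.0, 1.0, 0.5, -1.0]
for t in (0.0, 0.7, 1.5):
    print(f"T={t}:", [sample_next_token(logits, t) for _ in range(10)])
```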
Temperature is one of several sampling parameters. Top-P (nucleus sampling) limits token selection to the smallest set of tokens whose cumulative probability exceeds a threshold P. Top-K restricts selection to the K highest-probability tokens. These parameters can be combined with temperature for fine-grained control. In practice, most applications use temperatures between 0.0 and 1.0, and because most LLM APIs expose temperature as a user-configurable setting, it is one of the most accessible knobs for controlling AI behavior.
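As a sketch of how these filters compose with temperature (the ordering here, temperature scaling first and filtering second, is a common convention rather than a universal rule, and real samplers vary):

```python
def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Keep only tokens allowed by Top-K and Top-P, then renormalize.

    top_k == 0 disables the Top-K filter; top_p == 1.0 disables Top-P.
    """
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # K highest-probability tokens only
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for i, p in ranked:
            kept.append((i, p))
            cumulative += p
            if cumulative >= top_p:  # smallest set whose mass reaches P
                break
        ranked = kept
    total = sum(p for _, p in ranked)
    filtered = [0.0] * len(probs)
    for i, p in ranked:
        filtered[i] = p / total
    return filtered

# Scale by temperature first, then restrict to the nucleus before sampling.
probs = softmax_with_temperature([2.0, 1.0, 0.5, -1.0], temperature=0.8)
print(top_k_top_p_filter(probs, top_k=3, top_p=0.9))
```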
Temperature controls the creativity-accuracy tradeoff in language models — lower values produce focused, deterministic output while higher values enable more diverse, creative generation.