A parameter that controls the randomness and creativity of a language model's output — low temperature produces focused, deterministic responses while high temperature yields more diverse, surprising, and creative text.
In Depth
Temperature is a hyperparameter that controls the shape of the probability distribution over possible next tokens when a language model generates text. Technically, each logit (raw score) the model outputs is divided by the temperature before the softmax function converts the logits into probabilities. A temperature of 1.0 leaves the model's native probability distribution unchanged. Temperatures below 1.0 sharpen the distribution, making high-probability tokens even more likely and suppressing unlikely ones, which produces more predictable, focused output. Temperatures above 1.0 flatten the distribution, giving lower-probability tokens a greater chance of being selected, which produces more varied and creative output.
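To make the mechanism concrete, here is a minimal Python sketch of temperature scaling followed by softmax; the four logit values are invented purely for illustration:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then convert to probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for four candidate next tokens.
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Running this shows the top token's probability rising from roughly 0.61 at T=1.0 to about 0.84 at T=0.5, and falling to about 0.43 at T=2.0, exactly the sharpening and flattening described above.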
At temperature 0 (or very close to it), the model always selects the highest-probability token — a strategy called greedy decoding. This produces the most deterministic, consistent output, ideal for factual queries, coding tasks, or any context where reproducibility matters. At high temperatures (0.8-1.2), the model explores a wider range of token choices, which is useful for creative writing, brainstorming, or generating diverse options. At very high temperatures (>1.5), output often becomes incoherent as even very low-probability tokens have a meaningful chance of selection.
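The greedy special case is straightforward to sketch. The following builds on softmax_with_temperature from the previous example and treats a temperature of exactly 0 as argmax (an assumption; some APIs instead clamp 0 to a tiny positive value):

```python
import random

def sample_next_token(logits, temperature, rng=random):
    """Pick a next-token index: greedy at T == 0, weighted sampling otherwise."""
    if temperature == 0:
        # Greedy decoding: deterministically take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Drawing ten tokens at each setting makes the contrast visible:
# T=0 repeats index 0 every time, while higher temperatures mix in other tokens.
logits = [2.0, 1.0, 0.5, -1.0]
for t in (0.0, 0.7, 1.5):
    print(f"T={t}:", [sample_next_token(logits, t) for _ in range(10)])
```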
Temperature is one of several sampling parameters. Top-P (nucleus sampling) limits token selection to the smallest set of tokens whose cumulative probability exceeds a threshold P. Top-K restricts selection to the K highest-probability tokens. These parameters can be combined with temperature for fine-grained control. In practice, most applications use temperatures between 0.0 and 1.0, and because most LLM APIs expose temperature as a user-configurable setting, it is one of the most accessible knobs for controlling AI behavior.
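As a sketch of how these filters compose with temperature (the ordering here, temperature scaling first and filtering second, is a common convention rather than a universal rule, and real samplers vary):

```python
def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Keep only tokens allowed by Top-K and Top-P, then renormalize.

    top_k == 0 disables the Top-K filter; top_p == 1.0 disables Top-P.
    """
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]  # K highest-probability tokens only
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for i, p in ranked:
            kept.append((i, p))
            cumulative += p
            if cumulative >= top_p:  # smallest set whose mass reaches P
                break
        ranked = kept
    total = sum(p for _, p in ranked)
    filtered = [0.0] * len(probs)
    for i, p in ranked:
        filtered[i] = p / total
    return filtered

# Scale by temperature first, then restrict to the nucleus before sampling.
probs = softmax_with_temperature([2.0, 1.0, 0.5, -1.0], temperature=0.8)
print(top_k_top_p_filter(probs, top_k=3, top_p=0.9))
```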
Temperature controls the creativity-accuracy tradeoff in language models — lower values produce focused, deterministic output while higher values enable more diverse, creative generation.