
Temperature

Learn what Temperature means in AI and machine learning, with examples and related concepts.

Definition

Temperature is a parameter that controls how random or deterministic an LLM’s output is. It scales the probability distribution over tokens before the model picks the next one.

The name comes from thermodynamics, where higher temperature means more molecular randomness. The same idea applies here: higher temperature means more randomness in token selection.

How It Works

Temperature modifies the softmax function that converts raw model scores (logits) into probabilities:

Original logits for next token:
  "Paris":  5.0
  "Lyon":   2.0
  "Berlin": 1.0

Temperature = 0.2 (focused):
  "Paris":  >99.9%    ← almost certain
  "Lyon":    <0.1%
  "Berlin":  <0.1%

Temperature = 1.0 (balanced):
  "Paris":  93.6%
  "Lyon":    4.7%
  "Berlin":  1.7%

Temperature = 2.0 (creative):
  "Paris":  73.6%    ← much less certain
  "Lyon":   16.4%
  "Berlin": 10.0%

Mathematically: P(token_i) = exp(logit_i / T) / Σ_j exp(logit_j / T), where T is the temperature. As T approaches 0 the distribution collapses onto the highest-scoring token; as T grows it flattens toward uniform.
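The scaled softmax is easy to verify directly. Below is a minimal sketch that reproduces the distributions above for the example logits; the function name is illustrative, not from any library:

```python
import math

def softmax_with_temperature(logits, T):
    """P_i = exp(logit_i / T) / sum_j exp(logit_j / T)."""
    scaled = [l / T for l in logits]
    m = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 2.0, 1.0]                  # "Paris", "Lyon", "Berlin"
for T in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, T)
    print(f"T={T}:", [round(p, 3) for p in probs])
# T=1.0 gives [0.936, 0.047, 0.017]: lowering T concentrates mass on "Paris",
# raising T spreads it toward "Lyon" and "Berlin".
```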

Why It Matters

Choosing the right temperature is one of the easiest ways to improve LLM output quality:

Use Case                 Recommended Temperature   Why
Code generation          0 - 0.2                   Code needs to be correct, not creative
Data extraction / JSON   0                         Deterministic output for parsing
General Q&A              0.3 - 0.7                 Balance accuracy with natural language
Creative writing         0.8 - 1.0                 Varied, interesting prose
Brainstorming            1.0 - 1.5                 Diverse, unexpected ideas
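These recommendations can be encoded as a small lookup table. The task labels, default values, and the pick_temperature helper below are illustrative choices drawn from the ranges above, not a standard API:

```python
# Hypothetical defaults drawn from the recommended ranges above.
RECOMMENDED_TEMPERATURE = {
    "code": 0.0,          # code generation: correct, not creative
    "extraction": 0.0,    # data extraction / JSON: deterministic for parsing
    "qa": 0.5,            # general Q&A: balance accuracy and natural language
    "creative": 0.9,      # creative writing: varied prose
    "brainstorm": 1.2,    # brainstorming: diverse, unexpected ideas
}

def pick_temperature(task: str) -> float:
    """Return a default temperature for a task type, falling back to a balanced 0.7."""
    return RECOMMENDED_TEMPERATURE.get(task, 0.7)
```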

Setting temperature too high for factual tasks causes hallucinations. Setting it too low for creative tasks produces repetitive, boring text.

Example

from anthropic import Anthropic

client = Anthropic()

prompt = "Write a one-sentence product tagline for an AI code editor."

# Low temperature — consistent, safe output
for i in range(3):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=50,
        temperature=0,
        messages=[{"role": "user", "content": prompt}]
    )
    print(f"T=0 run {i+1}: {response.content[0].text}")

# → All 3 outputs will be identical or near-identical

# Higher temperature — varied, creative output
for i in range(3):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=50,
        temperature=1.0,
        messages=[{"role": "user", "content": prompt}]
    )
    print(f"T=1 run {i+1}: {response.content[0].text}")

# → Each output will be different

# Practical pattern: use low temperature for structured output
import json

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,
    temperature=0,  # deterministic for reliable JSON
    messages=[{
        "role": "user",
        "content": "Extract {name, price, category} as JSON from: 'The Sony WH-1000XM5 headphones cost $349 in the audio category.'"
    }]
)

data = json.loads(response.content[0].text)
# → typically {"name": "Sony WH-1000XM5", "price": 349, "category": "audio"}
# (json.loads fails if the model wraps the JSON in markdown fences;
#  prompt for raw JSON only, or strip fences before parsing)

Temperature vs Top-p

Temperature and top-p both control randomness, but in different ways. Temperature rescales the entire probability distribution, making every token more or less likely relative to the others. Top-p (nucleus sampling) instead truncates the distribution to the smallest set of highest-probability tokens whose cumulative probability reaches p, and samples only from that set.

Most APIs let you set both, but it’s best to adjust one at a time. Anthropic’s API supports both temperature and top_p parameters.

Key Takeaways

Temperature rescales the probability distribution over next tokens: low values sharpen it toward the most likely token, high values flatten it. Use temperature 0 for code, data extraction, and structured output; 0.8 or above for creative work. Tune temperature or top-p, not both at once.

Part of the DeepRaft Glossary — AI and ML terms explained for developers.