
Temperature

Learn what Temperature means in AI and machine learning, with examples and related concepts.

Definition

Temperature is a parameter that controls how random or deterministic an LLM’s output is. It scales the probability distribution over tokens before the model picks the next one.

The name comes from thermodynamics, where higher temperature means more molecular randomness. The same idea applies here: higher temperature means more randomness in token selection.

How It Works

Temperature modifies the softmax function that converts raw model scores (logits) into probabilities:

Original logits for next token:
  "Paris":  5.0
  "Lyon":   2.0
  "Berlin": 1.0

Temperature = 0.2 (focused):
  "Paris":  >99.9%    ← almost certain
  "Lyon":    <0.1%
  "Berlin":  <0.1%

Temperature = 1.0 (balanced):
  "Paris":  93.6%
  "Lyon":    4.7%
  "Berlin":  1.7%

Temperature = 2.0 (creative):
  "Paris":  73.6%    ← much less certain
  "Lyon":   16.4%
  "Berlin": 10.0%

Mathematically: P(token_i) = exp(logit_i / T) / Σ_j exp(logit_j / T), where T is the temperature. As T approaches 0 the distribution collapses onto the highest-scoring token; as T grows it flattens toward uniform.
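The scaled softmax is easy to verify directly. Below is a minimal sketch that reproduces the distributions above for the example logits; the function name is illustrative, not from any library:

```python
import math

def softmax_with_temperature(logits, T):
    """P_i = exp(logit_i / T) / sum_j exp(logit_j / T)."""
    scaled = [l / T for l in logits]
    m = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 2.0, 1.0]                  # "Paris", "Lyon", "Berlin"
for T in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, T)
    print(f"T={T}:", [round(p, 3) for p in probs])
# T=1.0 gives [0.936, 0.047, 0.017]: lowering T concentrates mass on "Paris",
# raising T spreads it toward "Lyon" and "Berlin".
```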

Why It Matters

Choosing the right temperature is one of the easiest ways to improve LLM output quality:

Use Case                 Recommended Temperature   Why
Code generation          0 - 0.2                   Code needs to be correct, not creative
Data extraction / JSON   0                         Deterministic output for parsing
General Q&A              0.3 - 0.7                 Balance accuracy with natural language
Creative writing         0.8 - 1.0                 Varied, interesting prose
Brainstorming            1.0 - 1.5                 Diverse, unexpected ideas
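These recommendations can be encoded as a small lookup table. The task labels, default values, and the pick_temperature helper below are illustrative choices drawn from the ranges above, not a standard API:

```python
# Hypothetical defaults drawn from the recommended ranges above.
RECOMMENDED_TEMPERATURE = {
    "code": 0.0,          # code generation: correct, not creative
    "extraction": 0.0,    # data extraction / JSON: deterministic for parsing
    "qa": 0.5,            # general Q&A: balance accuracy and natural language
    "creative": 0.9,      # creative writing: varied prose
    "brainstorm": 1.2,    # brainstorming: diverse, unexpected ideas
}

def pick_temperature(task: str) -> float:
    """Return a default temperature for a task type, falling back to a balanced 0.7."""
    return RECOMMENDED_TEMPERATURE.get(task, 0.7)
```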

Setting temperature too high for factual tasks causes hallucinations. Setting it too low for creative tasks produces repetitive, boring text.

Example

from anthropic import Anthropic

client = Anthropic()

prompt = "Write a one-sentence product tagline for an AI code editor."

# Low temperature — consistent, safe output
for i in range(3):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=50,
        temperature=0,
        messages=[{"role": "user", "content": prompt}]
    )
    print(f"T=0 run {i+1}: {response.content[0].text}")

# → All 3 outputs will be identical or near-identical

# Higher temperature — varied, creative output
for i in range(3):
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=50,
        temperature=1.0,
        messages=[{"role": "user", "content": prompt}]
    )
    print(f"T=1 run {i+1}: {response.content[0].text}")

# → Each output will be different

# Practical pattern: use low temperature for structured output
import json

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,
    temperature=0,  # deterministic for reliable JSON
    messages=[{
        "role": "user",
        "content": "Extract {name, price, category} as JSON from: 'The Sony WH-1000XM5 headphones cost $349 in the audio category.'"
    }]
)

data = json.loads(response.content[0].text)
# → typically {"name": "Sony WH-1000XM5", "price": 349, "category": "audio"}
# (json.loads fails if the model wraps the JSON in markdown fences;
#  prompt for raw JSON only, or strip fences before parsing)

Temperature vs Top-p

Temperature and top-p both control randomness, but in different ways. Temperature rescales the entire probability distribution, making every token more or less likely relative to the others. Top-p (nucleus sampling) instead truncates the distribution to the smallest set of highest-probability tokens whose cumulative probability reaches p, and samples only from that set.

Most APIs let you set both, but it’s best to adjust one at a time. Anthropic’s API supports both temperature and top_p parameters.

Key Takeaways

Temperature rescales the probability distribution over next tokens: low values sharpen it toward the most likely token, high values flatten it. Use temperature 0 for code, data extraction, and structured output; 0.8 or above for creative work. Tune temperature or top-p, not both at once.

Part of the DeepRaft Glossary — AI and ML terms explained for developers.