By modifying llama.cpp to accept negative temperature values, a researcher demonstrated that LLaMA then generates its least probable tokens, many of them anomalous tokens whose embeddings lie near the centroid of the embedding space, resulting in nonsensical output that exposes limitations in language model behavior.
When language models generate text, the temperature parameter typically controls creativity: lower values produce predictable outputs, while higher values increase randomness. But what happens when temperature drops below zero? A recent experiment with Meta's LLaMA model reveals unexpected behavior that challenges intuitive expectations.
In neural networks, temperature scaling adjusts the softmax function applied to the logits (raw prediction scores). At temperature $T$, probabilities are computed as $p_i = \frac{e^{z_i / T}}{\sum_j e^{z_j / T}}$. This mirrors the Boltzmann distribution from statistical mechanics, where a negative $T$ inverts the distribution, making the least likely states the most probable. Most physical systems cannot sustain negative temperatures because their energy spectra are unbounded, but a language model's logits range over a finite vocabulary, so the inversion is at least mathematically well defined.
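To make the inversion concrete, here is a small self-contained sketch (illustrative toy values, not taken from any model's code) that evaluates this temperature-scaled softmax over three logits at $T = 1$ and $T = -1$:

```cpp
// Illustrative only: temperature-scaled softmax over a toy logit vector,
// showing how a negative temperature inverts the token ranking.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// p_i = exp(z_i / T) / sum_j exp(z_j / T), computed in a numerically stable way.
std::vector<double> softmax_with_temperature(const std::vector<double>& logits, double temp) {
    std::vector<double> probs;
    probs.reserve(logits.size());
    for (double z : logits) probs.push_back(z / temp);           // scale by 1/T

    const double max_scaled = *std::max_element(probs.begin(), probs.end());
    double sum = 0.0;
    for (double& p : probs) {
        p = std::exp(p - max_scaled);                            // subtract max for stability
        sum += p;
    }
    for (double& p : probs) p /= sum;
    return probs;
}

int main() {
    // Toy logits: token 0 is the model's clear favorite, token 2 its least likely choice.
    const std::vector<double> logits = {4.0, 1.0, -2.0};
    const double temps[] = {1.0, -1.0};
    for (double temp : temps) {
        const std::vector<double> probs = softmax_with_temperature(logits, temp);
        std::printf("T = %+.1f ->", temp);
        for (double p : probs) std::printf(" %.4f", p);
        std::printf("\n");
    }
    // At T = +1.0 token 0 dominates (~0.95); at T = -1.0 the ranking is inverted
    // and token 2 dominates instead.
    return 0;
}
```

Subtracting the maximum scaled logit before exponentiating keeps the computation stable even when dividing by a very small positive or negative temperature.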
To test this, the researcher modified llama.cpp, an open-source inference engine for LLaMA models. By default, llama.cpp treats any temperature $T \leq 0$ as a request for greedy (argmax) sampling. A one-line change replaced the guard if (temp <= 0) with if (temp == 0), allowing negative values to reach the temperature-scaling step. The experiment ran the LLaMA-7B and 13B models at $T = -0.001$ with other sampling techniques (repetition penalty, top-k, top-p) disabled.
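Only the changed condition is quoted in the post, so the snippet below is a hypothetical reconstruction with invented names (apply_temperature and its signature are not the actual llama.cpp code); it is meant only to show why relaxing the guard lets a negative value reach the scaling step:

```cpp
// Hypothetical sketch of the guard change described above; the function name and
// structure are invented for illustration and are not the actual llama.cpp source.
#include <vector>

// Scales logits in place by 1/temp. Returns true when the caller should fall
// back to greedy (argmax) sampling instead of sampling from the softmax.
bool apply_temperature(std::vector<float>& logits, float temp) {
    // Original guard: if (temp <= 0) -> greedy, so a negative temperature
    // never reaches the division below.
    // Modified guard: only temp == 0 triggers greedy sampling.
    if (temp == 0.0f) {
        return true;
    }

    // With temp < 0 this division flips the sign of every logit, so the
    // softmax applied afterwards favors the originally least likely tokens.
    for (float& z : logits) {
        z /= temp;
    }
    return false;
}
```

At $T = -0.001$ the division both flips and greatly magnifies the logits, so the subsequent softmax is nearly deterministic and the run behaves like a mirror image of greedy decoding.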
The results were striking. At $T = 0.001$, outputs were coherent (e.g., "Temperature is a concept that describes..."). At $T = -0.001$, however, generations degenerated into repetitive gibberish such as "Хронологија entferne Kontrola", tokens that analysis showed lie near the centroid of LLaMA's embedding space. These appear to be low-information tokens the model struggles to interpret: in follow-up tests, LLaMA refused to output "entferne" even when explicitly prompted to, substituting unrelated words such as "get".
This behavior occurs because a negative temperature amplifies the probability of tokens with low, near-zero logits, which cluster near the embedding centroid. As prior research notes, centroid-proximal tokens often exhibit anomalous properties in transformer models. While the outputs look random, they are in fact systematically improbable: essentially an "anti-creative" mode that maximizes surprise at every step.
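In the limit of a vanishingly small negative temperature this inversion becomes exact: assuming a unique minimum, the softmax concentrates all probability on the lowest-scoring token,

$$\lim_{T \to 0^-} \frac{e^{z_i / T}}{\sum_j e^{z_j / T}} = \begin{cases} 1 & \text{if } z_i = \min_j z_j, \\ 0 & \text{otherwise,} \end{cases}$$

so sampling at $T = -0.001$ is effectively anti-greedy decoding: at every step the model emits the single token it rates least plausible in context.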
Practically, negative temperature sampling has no obvious application, since the outputs are unusable for tasks like text generation. The experiment's main value is in what it reveals about model internals: centroid-proximal tokens act as blind spots, and the model's refusal to reproduce them even under direct instruction suggests alignment mechanisms or architectural biases. The findings are also tied to LLaMA's specific tokenizer and embedding structure, so results may differ for other models. For full details, see the original blog post.
