Machine Learning FAQ
Why do LLMs sometimes repeat themselves or get stuck in loops during generation?
LLMs sometimes repeat themselves or get stuck in loops because text generation is a local next-token process. At each step, the model only decides what token should come next given the current context. It is not directly optimizing a global notion such as “avoid repetition across the whole answer.”
That means a repetition can become self-reinforcing. Once the model emits a pattern, that pattern becomes part of the context for the next step, which can make the same continuation even more likely.
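The self-reinforcing dynamic can be illustrated with a deliberately tiny toy "model" (a hand-written lookup table, not a real LLM). Under purely local, greedy choices, a cycle in the model's next-token preferences becomes a loop in the output:

```python
# Toy illustration only: a hypothetical next-token "model" that always
# picks its single most likely continuation given the last token.
NEXT = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",   # points back to "the", creating a 4-token cycle
}

def greedy_generate(prompt, steps):
    tokens = list(prompt)
    for _ in range(steps):
        # Purely local decision: only the current context matters,
        # and nothing penalizes having produced this token before.
        tokens.append(NEXT[tokens[-1]])
    return tokens

print(" ".join(greedy_generate(["the"], 8)))
# the cat sat on the cat sat on the
```

A real LLM's transition preferences are soft rather than hard-coded, but the mechanism is the same: once the repeated pattern dominates the context, the locally best continuation keeps being the one that extends it.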

Several factors make loops more likely.
Low-diversity decoding
If you use greedy decoding, a very low temperature, or a very small top-k, the model repeatedly picks the locally most probable token. That can collapse into repetitive phrasing.
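These decoding knobs can be sketched in a few lines. This is a minimal, self-contained sampler (the names and defaults are illustrative, not tied to any particular library): temperature rescales the logits before the softmax, and top-k truncates the candidate set.

```python
import math
import random

def sample(logits, temperature=1.0, top_k=None):
    """Sample a token index from raw logits with temperature and optional top-k."""
    scaled = [x / temperature for x in logits]
    # Rank indices by scaled logit; optionally keep only the top k.
    idx = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        idx = idx[:top_k]
    # Softmax over the surviving candidates (shifted for numerical stability).
    m = max(scaled[i] for i in idx)
    weights = [math.exp(scaled[i] - m) for i in idx]
    total = sum(weights)
    return random.choices(idx, [w / total for w in weights])[0]

logits = [2.0, 1.8, 0.5, -1.0]
# top_k=1 is exactly greedy decoding: token 0 every time.
print(sample(logits, top_k=1))
# Near-zero temperature behaves almost greedily; at temperature 1.0,
# token 1 (logit 1.8) also wins a sizeable fraction of the time.
print([sample(logits, temperature=1.0) for _ in range(10)])
```

Lowering the temperature sharpens the distribution toward the top token, which is exactly why it trades diversity for the kind of "safe" locally probable choices that can loop.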
Weak stopping structure
If the prompt does not clearly define what a complete answer looks like, the model may continue a pattern instead of ending cleanly.
Exposure mismatch
During training, the model conditions on ground-truth text prefixes (teacher forcing). During generation, it conditions on its own outputs. This mismatch is often called exposure bias: if the model starts drifting into a repetitive region, each new repeated token can make the drift worse.
Distributional habits
Some training patterns really are repetitive, especially list continuations, boilerplate phrasing, or dialogue markers. The model may overuse those familiar continuations.

This is why repetition is partly a model problem and partly a decoding problem.
Common mitigations include:
- using a less rigid decoding strategy
- improving the prompt so the desired format is clearer
- adding repetition penalties or stop sequences
- instruction finetuning or preference tuning to make answers more natural
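Two of the decoding-side mitigations above can be sketched briefly, assuming you have access to the raw logits before sampling. The divide-positive/multiply-negative form of the repetition penalty follows the common CTRL-style formulation; the function names here are illustrative, not a real library API.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.3):
    """Dampen the logits of tokens that have already been emitted.

    CTRL-style penalty: positive logits are divided by the penalty,
    negative logits are multiplied, so both move toward less likely.
    """
    out = list(logits)
    for t in set(generated_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

def hit_stop_sequence(text, stop=("\n\n", "END")):
    """Stop generation as soon as any stop sequence appears in the output."""
    return any(s in text for s in stop)

print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], penalty=2.0))
# [1.0, -2.0, 0.5]
print(hit_stop_sequence("the answer is 42 END"))
# True
```

In practice these run inside the decoding loop: the penalty is applied to the logits at every step, and the stop-sequence check decides when to cut the output.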
In short, LLMs repeat themselves when local next-token choices start reinforcing the same pattern. That is especially likely under low-diversity decoding, weak prompts, or situations where the model is conditioning on its own repetitive outputs.