LLMs sometimes repeat themselves or get stuck in loops because text generation is a local next-token process. At each step, the model only decides what token should come next given the current context. It is not directly optimizing a global notion such as “avoid repetition across the whole answer.”

That means a repetition can become self-reinforcing. Once the model emits a pattern, that pattern becomes part of the context for the next step, which can make the same continuation even more likely.

Autoregressive generation feeds each newly produced token back into the model as part of the next step’s context.
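
To make that feedback loop concrete, here is a minimal sketch of autoregressive decoding in Python. The model is faked with a hypothetical `next_token_logits` stub; the relevant part is that each chosen token is appended to the context that conditions the next choice.

```python
import random

VOCAB_SIZE = 50

def next_token_logits(context):
    # Hypothetical stand-in for a real model's forward pass, which would
    # return one logit per vocabulary token given the full context.
    rng = random.Random(hash(tuple(context)))
    return [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]

def generate(prompt_tokens, max_new_tokens=20):
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        logits = next_token_logits(context)
        # Greedy choice: the locally most probable token, nothing global.
        next_token = max(range(VOCAB_SIZE), key=lambda t: logits[t])
        # The new token immediately becomes part of the next step's input,
        # which is how an emitted pattern can reinforce itself.
        context.append(next_token)
    return context
```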

Several factors make loops more likely.

Low-diversity decoding

If you use greedy decoding, a very low temperature, or an overly narrow top-k setting, the model keeps choosing the locally safest, most probable token at every step. That can collapse into repetitive phrasing.
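
As a sketch of how those knobs act on the next-token distribution, here is one common way temperature scaling and top-k truncation are implemented over a logit vector (the function name is my own, not from any particular library):

```python
import math
import random

def sample_top_k(logits, k=50, temperature=1.0):
    """Sample a token id using temperature scaling and top-k truncation.

    k=1 is exactly greedy decoding, and temperature -> 0 approaches it;
    a small k and a low temperature both concentrate probability on the
    locally safest tokens, which is where repetition tends to set in.
    """
    # Keep only the k highest-logit candidates.
    top = sorted(range(len(logits)), key=lambda t: logits[t], reverse=True)[:k]
    # Temperature-scaled softmax over the surviving candidates.
    scaled = [logits[t] / max(temperature, 1e-8) for t in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return random.choices(top, weights=weights, k=1)[0]
```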

Weak stopping structure

If the prompt does not clearly define what a complete answer looks like, the model may continue a pattern instead of ending cleanly.
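One lightweight fix is to give generation an explicit endpoint. The sketch below assumes a hypothetical `generate_step` callable and simply truncates at a stop sequence; real inference APIs typically expose an equivalent stop parameter.

```python
def generate_with_stop(generate_step, prompt, stop="\n\n", max_steps=200):
    """Generate until a stop sequence appears or a length cap is hit.

    `generate_step` is a hypothetical callable returning the next chunk
    of text given the prompt plus everything generated so far.
    """
    out = ""
    for _ in range(max_steps):
        out += generate_step(prompt + out)
        if stop in out:
            return out[:out.index(stop)]  # end cleanly at the stop marker
    return out
```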

Exposure mismatch

During training, the model predicts each token given a true text prefix (teacher forcing). During generation, it conditions on its own outputs instead; this mismatch is often called exposure bias. If generation starts drifting into a repetitive region, each new repeated token can make the drift worse.
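
A minimal sketch of the two regimes, with `model` and `pick` left as placeholders (any forward pass and any decoding rule):

```python
def teacher_forced_predictions(model, true_tokens):
    # Training-time view: every prediction conditions on the ground-truth
    # prefix, so the model never sees its own earlier mistakes.
    return [model(true_tokens[:i]) for i in range(1, len(true_tokens))]

def free_running_generation(model, pick, prompt_tokens, n_steps):
    # Inference-time view: every prediction conditions on previously
    # generated tokens, so one repetitive choice reshapes all later contexts.
    context = list(prompt_tokens)
    for _ in range(n_steps):
        context.append(pick(model(context)))
    return context
```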

Distributional habits

Some patterns in the training data really are repetitive, especially list continuations, boilerplate phrasing, and dialogue markers. The model may overuse those familiar continuations.

Decoding settings directly influence which candidate token the model picks next, which affects whether generation stays varied or collapses into repetitive patterns.

This is why repetition is partly a model problem and partly a decoding problem.

Common mitigations include:

  • using a less rigid decoding strategy (for example, moderate-temperature sampling or nucleus/top-p sampling)
  • improving the prompt so the desired format is clearer
  • adding repetition penalties or stop sequences (see the sketch after this list)
  • instruction finetuning or preference tuning to make answers more natural
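
For the penalty route, one common formulation (similar in spirit to the CTRL-style repetition penalty) rescales the logits of tokens that have already been generated. The sketch below is an illustration, not any library's exact implementation:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Make already-generated tokens less likely on the next step.

    One common formulation: positive logits of seen tokens are divided
    by `penalty` and negative ones multiplied by it, so repeats lose
    probability mass either way.
    """
    logits = list(logits)
    for t in set(generated_ids):
        logits[t] = logits[t] / penalty if logits[t] > 0 else logits[t] * penalty
    return logits
```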

In short, LLMs repeat themselves when local next-token choices start reinforcing the same pattern, which is especially likely under low-diversity decoding, weak prompts, or situations where the model is conditioning on its own repetitive outputs.