Researchers demonstrate that autoregressive models can develop emergent temporal abstractions: internal controllers that compress sequences of low-level actions into learned, reusable behaviors. This "internal RL" approach enables efficient exploration in hierarchical tasks with sparse rewards, where naive token-by-token sampling makes long rewarded action sequences exponentially unlikely to be discovered. The result could accelerate progress in foundation models for complex decision-making.
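The intuition behind temporal abstraction can be illustrated with a toy sketch. The code below is not from the paper; the option names and functions are hypothetical, and it simply contrasts sampling primitive actions one at a time with sampling learned macro-behaviors that each expand into several primitives.

```python
import random

PRIMITIVES = ["up", "down", "left", "right"]

# Hypothetical learned "options": each compresses a sequence of
# primitive actions into a single high-level choice.
OPTIONS = {
    "go_to_door": ["up", "up", "right"],
    "open_door": ["right", "down"],
}

def flat_rollout(steps, rng):
    # Token-by-token sampling: each primitive is chosen independently,
    # so a specific k-step sequence has probability (1/4)**k.
    return [rng.choice(PRIMITIVES) for _ in range(steps)]

def hierarchical_rollout(num_options, rng):
    # Sampling at the option level: one choice expands into a whole
    # coherent sub-behavior, shrinking the effective search space.
    trajectory = []
    for _ in range(num_options):
        trajectory.extend(OPTIONS[rng.choice(list(OPTIONS))])
    return trajectory

rng = random.Random(0)
goal = OPTIONS["go_to_door"] + OPTIONS["open_door"]
# A flat sampler reproduces this 5-step goal with probability
# (1/4)**5, while a 2-option controller needs only (1/2)**2.
```

The exponential gap between those two probabilities is what makes option-level exploration tractable under sparse rewards: the agent searches over a handful of coherent behaviors instead of every combination of primitive steps.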