The current AI arms race is largely a game of brute force. For years, the industry has followed a simple, expensive mantra: more data and more parameters equal more intelligence. But as the costs of training massive models skyrocket and the physical limits of hardware loom, a different philosophy is gaining traction. It is not about making the model larger, but making it more efficient through a process called recursion.
The conversation reached a fever pitch recently when Y Combinator released a video titled “Recursion Is The Next Scaling Law In AI,” breaking down Hierarchical Reasoning Models (HRM) and Tiny Recursive Models (TRM). The premise is compelling: instead of a linear path where data flows through a hundred different layers, a recursive model can loop the same layer multiple times, essentially “thinking” through a problem until it reaches a solution.
While the Y Combinator presentation has sparked a new wave of interest, veteran AI researchers and critics, including Grigory Sapunov, argue that this “new” scaling law is actually a homecoming. The concepts driving HRM and TRM are rooted in a lineage of “depth recursion” that dates back several years, most notably to the 2018 Universal Transformer (UT). By ignoring these predecessors, the current hype cycle misses a crucial distinction: the difference between using recursion for memory and using it for computation.
Depth vs. Sequence: The Architecture of Thought
To understand why this distinction matters, one must look at how transformers—the engine behind ChatGPT and Claude—actually process information. In standard models, each layer is unique: once a token passes through Layer 1, it moves to Layer 2 and never returns. Stacking dozens of these unique layers is computationally expensive and memory-intensive.
Early attempts at recursion, such as Transformer-XL and the Compressive Transformer (both 2019), focused on recursion over the sequence. These models were designed to handle longer strings of text by recycling previous segments of data, effectively acting as a form of extended memory. They solved the “context window” problem, but they didn’t change how the model reasoned.
The Universal Transformer took a different path: recursion over depth. Instead of having 24 different layers, a UT model might have one highly optimized layer that the data passes through repeatedly. In this setup, recursion is not about memory; it is about computation. The model applies the same set of weights over and over, refining its understanding of a token with each pass.
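The idea of recursion over depth can be sketched in a few lines. This is a deliberately toy illustration, not the actual Universal Transformer: the `shared_layer` function (a linear map with a residual connection and nonlinearity, standing in for a full attention block) and all names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # toy hidden size
W = rng.normal(scale=0.1, size=(d, d))   # ONE set of weights, reused every pass

def shared_layer(x):
    # Stand-in for a transformer block: residual + nonlinear transform.
    return x + np.tanh(x @ W)

def depth_recursion(x, n_passes):
    # The same weights are applied repeatedly, refining the representation
    # with each pass instead of handing it to a new layer.
    for _ in range(n_passes):
        x = shared_layer(x)
    return x

x0 = rng.normal(size=(1, d))             # one "token" representation
refined = depth_recursion(x0, n_passes=6)
```

The key property is that depth (number of passes) is decoupled from parameter count: six passes cost six forward computations but store only one weight matrix.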
This approach led to the development of “adaptive computation time.” While a model like ALBERT (A Lite BERT, 2019) uses shared weights but a fixed number of iterations, the Universal Transformer can theoretically decide how much “thought” a specific token requires. A simple word like “the” might pass through the layer once and be finished, while a complex mathematical variable might be “simmered” through ten iterations before the model feels confident in the result.
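Adaptive computation time can be sketched with a per-token halting score. The mechanism below is a simplified illustration of the idea (accumulate a "confidence" probability each pass and stop once it crosses a threshold); the specific functions, weights, and threshold are invented for this sketch, not taken from the UT paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
W = rng.normal(scale=0.1, size=(d, d))
w_halt = rng.normal(scale=0.1, size=d)   # hypothetical halting head

def step(x):
    # Shared layer, as in depth recursion.
    return x + np.tanh(x @ W)

def halt_prob(x):
    # Scalar "confidence" for this token after the current pass.
    return 1.0 / (1.0 + np.exp(-(x @ w_halt)))

def adaptive_depth(x, threshold=0.99, max_passes=10):
    # Keep iterating until accumulated halting probability crosses the
    # threshold: easy tokens stop early, hard tokens "simmer" longer.
    p_total, passes = 0.0, 0
    while p_total < threshold and passes < max_passes:
        x = step(x)
        p_total += float(halt_prob(x))
        passes += 1
    return x, passes

x0 = rng.normal(size=d)
x_out, n_passes = adaptive_depth(x0)
```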
The Hardware Gamble and the Edge Computing Win
The shift toward recursive models isn’t just a theoretical exercise in computer science; it is a pragmatic response to the soaring cost of hardware. As the industry pushes toward “edge AI”—bringing powerful models to smartphones, wearables, and IoT devices—the memory footprint becomes the primary bottleneck.
A standard transformer requires memory for every single layer it possesses. If a model has 24 layers, it needs the storage and bandwidth to support 24 distinct sets of weights. A recursive model, by contrast, can be an order of magnitude smaller because it reuses the same weights. This drastically reduces the need to constantly shuffle data from High Bandwidth Memory (HBM) to the accelerator’s SRAM, a process that often slows down computation and wastes energy.
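The arithmetic behind that savings is straightforward. Using an illustrative (hypothetical) figure of 7 million parameters per block:

```python
layers = 24
params_per_layer = 7_000_000          # hypothetical per-block parameter count

standard = layers * params_per_layer  # 24 distinct weight sets to store/stream
recursive = params_per_layer          # one shared set, looped 24 times

print(standard // recursive)          # → 24, i.e. 24x fewer weights in memory
```

Compute cost per forward pass is unchanged (24 block applications either way); what shrinks is the weight storage and, crucially, the HBM-to-SRAM traffic.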
| Model Approach | Recursion Type | Primary Benefit | Key Example |
|---|---|---|---|
| Sequence-based | Over Sequence | Extended Context/Memory | Transformer-XL |
| Depth-based | Over Depth | Computational Efficiency | Universal Transformer |
| Block-based | Over Layer Blocks | Balanced Scale/Reasoning | Ouro / Huginn |
This efficiency enables “test-time scaling.” In traditional models, if you want a better answer, you often have to prompt the model to “think step-by-step” in the output text. Recursive models allow for reasoning inside the latent space; you can simply run the recursion deeper at the moment of inference to achieve a higher-quality result without needing to retrain the model from scratch.
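Test-time scaling falls out of the same structure: because the weights are shared, the loop count is a free knob at inference. The sketch below is a toy illustration with made-up weights, not a real trained model.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
W = rng.normal(scale=0.05, size=(d, d))  # stands in for trained shared weights

def shared_block(x):
    return x + np.tanh(x @ W)

def infer(x, depth):
    # Same weights; "depth" is chosen at inference time, no retraining.
    for _ in range(depth):
        x = shared_block(x)
    return x

x = rng.normal(size=(1, d))
fast = infer(x, depth=2)       # cheap, shallow answer
careful = infer(x, depth=12)   # more latent "thinking" on the same model
```

A fixed-depth transformer has no analogous dial: its depth is frozen into the number of distinct layers it was trained with.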
The Rise of the Looped Transformers
The current landscape is seeing a proliferation of “Looped Transformers,” which are essentially evolved versions of the Universal Transformer. Recent projects like Ouro—named after the Ouroboros, the snake eating its own tail—and Huginn are implementing “LoopLMs.” Unlike the minimal setups of the past that repeated a single layer, these models repeat entire blocks of layers, combining the stability of deep networks with the efficiency of recursion.
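The block-level variant can be sketched the same way: several distinct layers form a block, and the whole block is looped. This is an illustrative toy, not the actual Ouro or Huginn architecture.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
# A "block" of four DISTINCT toy layers...
block_weights = [rng.normal(scale=0.05, size=(d, d)) for _ in range(4)]

def block(x):
    for W in block_weights:
        x = x + np.tanh(x @ W)
    return x

def loop_lm_forward(x, loops=3):
    # ...and the whole block looped: effective depth 4 * 3 = 12,
    # while storing only 4 layers' worth of parameters.
    for _ in range(loops):
        x = block(x)
    return x

x = rng.normal(size=(1, d))
y = loop_lm_forward(x)
```

Repeating a multi-layer block rather than a single layer is a middle ground: more representational diversity per pass than a pure UT, far fewer parameters than a fully unrolled stack.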
New research, including papers such as “Loop, Think, & Generalize” and “Hyperloop Transformers,” suggests that the field is moving toward a hybrid future. The goal is to combine these recursive structures with other modern optimizations: sparse Mixture of Experts (MoE), low-bit quantization, and mHC (multi-head compression). When these technologies converge, the result could be a model that possesses the reasoning capabilities of a giant LLM but fits comfortably on a piece of wearable hardware.
As the AI community looks toward 2026, the focus is shifting from how many parameters a model has to how effectively it can iterate. The industry is moving away from the “bigger is better” era and toward a period of architectural elegance, where the ability to recurse may prove more valuable than the ability to scale.
The next major milestone for this architecture will be the release of larger-scale pre-trained Looped Language Models, which are expected to demonstrate whether depth recursion can maintain stability at the scale of frontier models. Updates on these implementations are typically tracked via arXiv and major AI research hubs.
Do you think recursive AI will finally bring “true” reasoning to our devices, or is this just another optimization?
