
Scaling Evolution Strategies to Billion‑Parameter Models with Low‑Rank Perturbations

Source: arXiv:2511.16652

Evolution Strategies (ES) have long been prized for their black‑box optimization prowess, especially in reinforcement learning (RL) where gradients are noisy or unavailable. Yet the classic ES formulation—adding full‑rank Gaussian perturbations to every parameter—becomes prohibitively expensive once models exceed a few hundred million weights. In their November 2025 preprint, Bidipta Sarkar and colleagues introduce EGGROLL (Evolution Guided General Optimization via Low‑rank Learning), a method that replaces the costly full‑rank perturbation matrix with a pair of low‑rank factors. The result is a dramatic reduction in both memory footprint and forward‑pass cost, while preserving the convergence properties of standard ES.

“EGGROLL scales backprop‑free optimization to large population sizes for modern large neural network architectures with billions of parameters.” — arXiv abstract

The Bottleneck of Classic ES

A vanilla ES run with a population of size $N$ perturbs each $m \times n$ weight matrix of the parameters $\theta$ by adding a dense Gaussian noise matrix $E_i$ with scale $\sigma$ for each worker $i$. Each worker evaluates its perturbed model, and the update is aggregated as

$$\Delta\theta \approx \frac{1}{N\sigma^2}\sum_{i=1}^{N} E_i R_i,$$

where $R_i$ is the return of worker $i$. Generating and applying each $E_i$ costs $O(mn)$ in memory and compute, which quickly becomes prohibitive once a model's total parameter count reaches the billions.
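For concreteness, here is a minimal NumPy sketch of this full-rank estimator for a single weight matrix. The helper `evaluate_return`, the hyperparameters, and the loop structure are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def es_update_full_rank(W, evaluate_return, pop_size=64, sigma=0.02, lr=0.01):
    """One vanilla ES step on a single m-by-n weight matrix.

    evaluate_return is a black-box function (hypothetical here) that runs a
    rollout with the perturbed weights and returns a scalar score.
    """
    m, n = W.shape
    grad_est = np.zeros_like(W)
    for _ in range(pop_size):
        eps = np.random.randn(m, n)            # full-rank noise: O(m*n) memory per worker
        R = evaluate_return(W + sigma * eps)   # score the perturbed model
        grad_est += R * eps                    # accumulate R_i * eps_i
    # With E_i = sigma * eps_i, this matches (1/(N*sigma^2)) * sum_i E_i R_i.
    grad_est /= pop_size * sigma
    return W + lr * grad_est
```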

EGGROLL sidesteps this by factorizing the perturbation:

$$E \approx A B^\top,\qquad A \in \mathbb{R}^{m\times r},\; B \in \mathbb{R}^{n\times r},\; r \ll \min(m,n).$$

Because the dense base multiply can be shared across the whole population, each member's forward pass now needs only $O(r(m+n))$ additional operations, and the auxiliary storage per layer shrinks from $mn$ to $r(m+n)$. The authors prove that as $r$ grows, the low-rank update converges to the full-rank update at a rate of $O(1/r)$, so the approximation does not compromise learning quality.
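The sketch below illustrates why this helps: the factored perturbation is applied without ever materializing the dense $m \times n$ matrix. The $1/\sqrt{r}$ scaling of $A$ and all function names are assumptions made for the example, not details taken from the paper:

```python
import numpy as np

def sample_low_rank_factors(m, n, r, rng):
    # Scaling A by 1/sqrt(r) keeps the entries of A @ B.T at roughly unit
    # variance (a common convention; the paper's exact choice may differ).
    A = rng.standard_normal((m, r)) / np.sqrt(r)
    B = rng.standard_normal((n, r))
    return A, B

def perturbed_forward(W, A, B, x, sigma=0.02):
    """Compute (W + sigma * A @ B.T) @ x without forming the dense perturbation.

    Cost is O(mn) for the shared base multiply plus only O(r(m+n)) for the
    low-rank correction, versus O(mn) extra per member for full-rank noise.
    """
    return W @ x + sigma * (A @ (B.T @ x))

rng = np.random.default_rng(0)
m, n, r = 1024, 1024, 8
W = rng.standard_normal((m, n)) * 0.01
x = rng.standard_normal(n)
A, B = sample_low_rank_factors(m, n, r, rng)
y = perturbed_forward(W, A, B, x)
```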

Empirical Validation

The paper presents three key experiments:

  1. Tabula‑Rasa RL – On standard benchmarks (e.g., Atari, MuJoCo), EGGROLL matches the performance of full‑rank ES while reducing runtime by up to 4× on an 8‑GPU cluster.
  2. LLM Reasoning – When applied to a 1.5B‑parameter transformer, EGGROLL outperforms Group Relative Policy Optimization (GRPO) on few‑shot reasoning tasks, achieving 12% higher accuracy on the GSM8K dataset.
  3. Integer‑Only Language Models – EGGROLL enables stable pre‑training of a recurrent language model that operates entirely in 8‑bit integers, a setting where back‑propagation is notoriously difficult.

These results suggest that low‑rank perturbations can be a practical replacement for full‑rank noise in large‑scale black‑box optimization, opening the door to training models that were previously out of reach for ES.

Why It Matters for Developers

  • Parallelism without Back‑Propagation – ES is naturally amenable to distributed execution: workers need to exchange only random seeds and scalar returns, as sketched after this list. EGGROLL keeps this advantage while making the per‑worker workload tractable on commodity GPUs.
  • Robustness to Noisy Objectives – Many modern RL environments, especially in robotics or game‑AI, produce highly stochastic rewards. ES remains a reliable tool; EGGROLL simply makes it scalable.
  • Integer‑Only Training – With the industry’s shift toward mixed‑precision and quantized inference, the ability to train end‑to‑end models in low‑precision formats is a significant competitive edge.
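The parallelism point relies on the standard ES "seed trick": each worker broadcasts an integer seed and a scalar return, and the coordinator regenerates the low-rank factors locally, so no perturbation matrices ever cross the network. The sketch below uses illustrative names and is not the paper's implementation:

```python
import numpy as np

def regenerate_factors(seed, m, n, r):
    # Deterministically rebuild the worker's factors from its seed.
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, r)) / np.sqrt(r)
    B = rng.standard_normal((n, r))
    return A, B

def aggregate_low_rank_updates(results, m, n, r, pop_size, sigma):
    """results: list of (seed, return) pairs gathered from the workers.

    Reconstruct each worker's factors from its seed and accumulate the
    low-rank analogue of the ES update; the perturbation only becomes a
    dense matrix here, when folding the update back into the weights.
    """
    update = np.zeros((m, n))
    for seed, R in results:
        A, B = regenerate_factors(seed, m, n, r)
        update += R * (A @ B.T)
    return update / (pop_size * sigma)
```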

“EGGROLL enables stable pre‑training of nonlinear recurrent language models that operate purely in integer datatypes.” — arXiv abstract

Looking Forward

The community will likely explore several extensions:

  • Adaptive Rank Selection – Dynamically tuning (r) during training could balance speed and fidelity.
  • Hybrid ES‑Gradient Methods – Combining low‑rank ES with sparse gradient updates may yield even faster convergence.
  • Hardware Acceleration – GPUs and TPUs already provide efficient matrix‑multiply primitives; specialized kernels for low‑rank perturbations could further reduce latency.

For now, EGGROLL represents a compelling step toward democratizing large‑scale, back‑prop‑free training. As models grow and compute budgets stretch, techniques that reduce per‑sample cost without sacrificing performance will become indispensable.
