PyTorch’s New Knapsack Solver Cuts Memory Footprint by 20×
When training deep neural networks, the amount of GPU‑side memory available for storing intermediate activations can become a hard bottleneck. PyTorch mitigates this by memory planning: it decides which tensors to keep in memory and which to recompute during the backward pass. The decision is formulated as a classic 0/1 knapsack problem, where each operation is an item with a weight (the memory it occupies) and a value (the time saved by not recomputing it).
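In other words, with a binary choice xᵢ ∈ {0, 1} for each candidate activation i, the planner maximizes Σᵢ vᵢ·xᵢ (total recomputation time avoided) subject to Σᵢ wᵢ·xᵢ ≤ W (the activation‑memory budget), which is exactly the textbook 0/1 knapsack; the symbols here are generic notation rather than identifiers from the PyTorch code.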
From a simple DP table to a sliding‑window, Hirschberg solver
The original implementation, `dp_knapsack`, builds a full two‑dimensional DP table of shape (number_of_operations, max_memory_budget) and then backtracks through it to recover the optimal set of operations. While correct, this approach is memory‑hungry and slow for large models.
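For reference, here is a minimal, framework‑independent sketch of that baseline (the function name and item encoding are illustrative, not PyTorch's actual code); the full table of (n + 1) × (capacity + 1) entries is exactly what dominates peak memory:

```python
from typing import List, Tuple


def dp_knapsack_full_table(weights: List[int], values: List[float],
                           capacity: int) -> Tuple[float, List[int]]:
    """Baseline 0/1 knapsack: build the whole DP table, then backtrack through it."""
    n = len(weights)
    table = [[0.0] * (capacity + 1) for _ in range(n + 1)]  # O(n * capacity) memory
    for i in range(1, n + 1):
        w, v = weights[i - 1], values[i - 1]
        for c in range(capacity + 1):
            table[i][c] = table[i - 1][c]                                # skip item i-1
            if w <= c:
                table[i][c] = max(table[i][c], table[i - 1][c - w] + v)  # or take it
    # Backtrack: an entry that differs from the row above means the item was taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if table[i][c] != table[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return table[n][capacity], chosen[::-1]
```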
Horace He, one of the architects of PyTorch’s memory planner, introduced two optimisations:
- Sliding‑window DP – instead of keeping the entire table, only the current and previous rows are stored. This reduces the table storage from n × W to 2 × W.
- Hirschberg's divide‑and‑conquer – a classic linear‑space technique, originally devised for sequence alignment, used here to reconstruct the knapsack solution without keeping the full table that backtracking would need. By splitting the item list recursively and computing partial DP rows, the algorithm determines the optimal split point and builds the solution on the fly.
Combined, these techniques cut the peak memory needed for the DP table by roughly a factor of 20, enabling PyTorch to handle models with thousands of operations that previously would have crashed on a 64 GB machine.
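To make the idea concrete, here is a minimal, self‑contained sketch of a generic 0/1 knapsack solved this way (illustrative names, not PyTorch's actual implementation): a single reusable row stands in for the "current and previous rows", and recursive splitting recovers the chosen items without ever materializing the n × W table.

```python
from typing import List


def best_row(weights: List[int], values: List[float], capacity: int) -> List[float]:
    """Best achievable value for every budget 0..capacity, using one reusable row.

    Iterating budgets downward lets a single row play the role of the
    previous/current row pair, so storage is O(capacity), not O(n * capacity).
    """
    row = [0.0] * (capacity + 1)
    for w, v in zip(weights, values):
        for c in range(capacity, w - 1, -1):  # downward: each item used at most once
            row[c] = max(row[c], row[c - w] + v)
    return row


def hirschberg_knapsack(weights: List[int], values: List[float], capacity: int) -> List[int]:
    """Return the indices of the chosen items without storing the full DP table."""
    n = len(weights)
    if n == 0 or capacity <= 0:
        return []
    if n == 1:  # base case: a single item is taken iff it fits
        return [0] if weights[0] <= capacity else []

    mid = n // 2
    left = best_row(weights[:mid], values[:mid], capacity)
    right = best_row(weights[mid:], values[mid:], capacity)

    # Find how the optimal solution splits the budget between the two halves.
    split = max(range(capacity + 1), key=lambda c: left[c] + right[capacity - c])

    chosen_left = hirschberg_knapsack(weights[:mid], values[:mid], split)
    chosen_right = hirschberg_knapsack(weights[mid:], values[mid:], capacity - split)
    return chosen_left + [mid + i for i in chosen_right]


# Toy example: weights = activation sizes, values = recomputation time saved.
print(hirschberg_knapsack([4, 3, 2, 5, 1], [10.0, 7.0, 4.0, 9.0, 2.0], capacity=9))
# -> [0, 1, 2]: the exact optimum, found with only O(capacity)-sized rows.
```

The divide‑and‑conquer still does O(n × W) work overall (the per‑level cost halves geometrically), but it only ever holds a few rows of length W + 1 in memory, which is where the large reduction in the planner's footprint comes from.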
How it works in practice
```python
import torch
import torch._functorch.config as fconfig

# Enable the new solver
fconfig.activation_memory_budget_solver = fconfig.dp_knapsack_sliding_hirschberg
```
After setting this flag early in your script, PyTorch automatically uses the new solver for all subsequent memory planning. The solver remains experimental and is only available when you build PyTorch from the main branch; it is not yet part of a released wheel.
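For a rough end‑to‑end check, something along these lines should work. This is a sketch under a few assumptions: a CUDA device, a source build that includes the solver, and the existing `activation_memory_budget` knob in `torch._functorch.config` (1.0, the default, keeps every activation); the model and the 0.5 budget are arbitrary.

```python
import torch
import torch._functorch.config as fconfig

# Keep only ~half of the activations and recompute the rest during backward;
# the knapsack solver decides which ones are worth keeping.
fconfig.activation_memory_budget = 0.5

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda()
compiled = torch.compile(model)  # memory planning runs during compilation

x = torch.randn(64, 1024, device="cuda", requires_grad=True)
torch.cuda.reset_peak_memory_stats()
compiled(x).sum().backward()
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
```

Comparing the printed peak with and without the reduced budget (or with the old solver selected) gives a quick sanity check that the planner is actually doing its job.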
Alternatives
| Solver | Speed | Accuracy | Dependencies |
|---|---|---|---|
| `dp_knapsack` | Baseline | Exact | None |
| `dp_knapsack_sliding_hirschberg` | ~37 % faster | Exact | None |
| `ilp_knapsack` | Much faster | Exact | SciPy |
| `greedy_knapsack` | Fastest | Approximate | None |
If you can afford an additional dependency, `ilp_knapsack` (based on an integer‑linear‑programming solver) is considerably faster than either DP variant. For pure speed at the cost of exactness, `greedy_knapsack` is the go‑to choice.
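To see why the greedy option trades exactness for speed, here is a generic value‑density heuristic of the kind such solvers use (an illustration of the idea, not PyTorch's actual `greedy_knapsack`):

```python
from typing import List


def greedy_by_density(weights: List[int], values: List[float], capacity: int) -> List[int]:
    """Take items in decreasing value/weight order until the budget is exhausted.

    Runs in O(n log n), but can miss the optimum, e.g. when one dense item
    crowds out a better-valued combination of lighter ones.
    """
    order = sorted(range(len(weights)), key=lambda i: values[i] / weights[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + weights[i] <= capacity:
            chosen.append(i)
            used += weights[i]
    return chosen
```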
Why this matters
Memory planning is a hidden cost in deep learning pipelines. A 20× reduction in the solver's peak memory means much larger graphs can be planned at all, enabling bigger models or batch sizes, and it lowers the risk of out‑of‑memory failures while the plan is being computed, which can be costly in distributed settings. Moreover, because the solver remains exact, its runtime improvement shortens planning time without introducing any extra recomputation.
The sliding‑window, Hirschberg solver is a concrete example of how algorithmic research—here, a classic divide‑and‑conquer trick—can have immediate, practical impact on a widely used framework.
“The knapsack analogy is a great way to understand the trade‑offs PyTorch makes on the fly.” – Anonymous PyTorch contributor
Next steps for developers
- Try it out – Build PyTorch from source and set the flag as shown above. Monitor `torch.cuda.memory_allocated()` to verify the reduced peak.
- Report bugs – The author welcomes issue reports on GitHub; tag the maintainer @jmaczan.
- Consider alternatives – If you need the absolute fastest planner, experiment with `ilp_knapsack` or `greedy_knapsack`.
The new solver demonstrates that even mature libraries still have room for optimization. As models grow and memory budgets shrink, such algorithmic refinements will become increasingly valuable.
Source: https://jedrzej.maczan.pl/2025_11_21_dp_knapsack_sliding_hirschberg