Researchers establish a mathematical correspondence between hierarchical decision trees and diffusion models, introduce the Global Trajectory Score Matching optimization principle, and demonstrate practical applications in tabular data generation and model distillation.
Unifying Decision Trees and Diffusion Models: A Theoretical Bridge with Practical Applications
Researchers Sai Niranjan Ramachandran and Suvrit Sra have established a mathematical correspondence between two seemingly disparate machine learning paradigms: decision trees and diffusion models. Their paper, "Trees to Flows and Back: Unifying Decision Trees and Diffusion Models," accepted to ICML 2026, argues that these model classes, traditionally viewed as fundamentally different (one discrete and hierarchical, the other continuous and dynamic), share deep mathematical connections in appropriate limiting regimes.
Theoretical Foundations: Bridging Discrete and Continuous
The core contribution of the paper is a rigorous mathematical framework that connects hierarchical decision trees with diffusion processes. A decision tree makes a sequence of discrete split decisions that partition the feature space into regions, while a diffusion model uses a continuous stochastic process that gradually transforms noise into data samples.
The researchers demonstrate that in certain limiting regimes, hierarchical decision trees can be viewed as discrete approximations of continuous diffusion processes. This correspondence is not merely superficial; it reveals that both model classes can be understood through the lens of trajectory optimization in their respective spaces.
"Our work shows that what appears to be a fundamental dichotomy between tree-based methods and diffusion models is actually a manifestation of the same underlying principles operating in different domains," the authors explain.
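For readers who prefer symbols, the two objects being compared can be written in their standard textbook forms. This is background notation only, not the paper's own construction, and the symbols (T, R_ℓ, c_ℓ, f, g, p_t) are chosen here for illustration.

```latex
% A regression tree with L leaves: piecewise constant over regions R_1, ..., R_L
T(x) = \sum_{\ell=1}^{L} c_\ell \, \mathbf{1}[x \in R_\ell]

% Forward (noising) diffusion, written as an SDE with drift f and noise scale g
\mathrm{d}x_t = f(x_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t

% Reverse-time SDE used for generation; it depends on the score \nabla_x \log p_t
\mathrm{d}x_t = \left[ f(x_t, t) - g(t)^2 \,\nabla_x \log p_t(x_t) \right]\mathrm{d}t
              + g(t)\,\mathrm{d}\bar{W}_t
```

The tree is a discrete, piecewise-constant map, while the diffusion is driven by a continuous score function; the correspondence described above concerns how the former can approximate the latter in a limiting regime.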
Global Trajectory Score Matching: A Unifying Optimization Principle
Central to this unification is Global Trajectory Score Matching (GTSM), a novel optimization principle that applies to both model classes. The researchers show that an idealized version of gradient boosting is asymptotically optimal for GTSM, which offers a theoretical justification for the empirical success of gradient-boosted trees.
This principle offers a fresh perspective on how both decision trees and diffusion models can be optimized. Rather than treating these approaches as fundamentally different optimization problems, GTSM provides a common framework that explains their shared behavior and guides their improvement.
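To make the link between boosting and score estimation concrete, here is a minimal sketch. It does not implement GTSM, whose exact objective is not given in this article. It shows ordinary single-noise-level denoising score matching fitted by gradient boosting with regression stumps on 1-D Gaussian data. The function names (`fit_stump`, `boost`, `predict`, `mse`) and all hyperparameters are assumptions made for illustration.

```python
import random

def fit_stump(xs, order, residuals):
    """Least-squares regression stump on 1-D inputs.

    `order` holds the indices of `xs` sorted by value.
    """
    n = len(xs)
    total = sum(residuals)
    best_gain, best = -1.0, None
    left_sum = 0.0
    for k in range(1, n):
        left_sum += residuals[order[k - 1]]
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # cannot split between identical values
        right_sum = total - left_sum
        # Maximizing this quantity minimizes the squared error of the split.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if gain > best_gain:
            best_gain = gain
            best = (0.5 * (lo + hi), left_sum / k, right_sum / (n - k))
    return best

def boost(xs, targets, rounds=50, lr=0.1):
    """Gradient boosting on squared loss: each stump fits the current residuals."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        # Residuals are the negative gradient of 0.5 * (target - pred)^2.
        residuals = [t - p for t, p in zip(targets, preds)]
        thr, left, right = fit_stump(xs, order, residuals)
        stumps.append((thr, lr * left, lr * right))
        preds = [p + (lr * left if x <= thr else lr * right)
                 for x, p in zip(xs, preds)]
    return stumps

def predict(stumps, x):
    return sum(left if x <= thr else right for thr, left, right in stumps)

def mse(stumps, xs, targets):
    return sum((predict(stumps, x) - t) ** 2 for x, t in zip(xs, targets)) / len(xs)

# Toy data (an assumption): clean samples x0 ~ N(0, 1), noised with std sigma.
rng = random.Random(0)
sigma = 0.5
xs, targets = [], []
for _ in range(2000):
    x0 = rng.gauss(0.0, 1.0)
    eps = rng.gauss(0.0, 1.0)
    xs.append(x0 + sigma * eps)    # noised sample x_t
    targets.append(-eps / sigma)   # denoising score matching regression target

stumps = boost(xs, targets)
mse_before = mse([], xs, targets)  # empty ensemble predicts 0 everywhere
mse_after = mse(stumps, xs, targets)

# For this toy setup the exact score of the noised marginal is -x / (1 + sigma^2).
for x in (-1.5, 0.0, 1.5):
    print(x, round(predict(stumps, x), 3), "exact:", round(-x / (1 + sigma ** 2), 3))
```

The ensemble is a piecewise-constant approximation of a smooth score function, which gives some intuition for why a tree-based learner can stand in for the score network of a diffusion model.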
Practical Applications: TreeFlow and Dsmtree
The theoretical contributions are complemented by two practical implementations that demonstrate the value of this unification:
TreeFlow: Efficient Tabular Data Generation
TreeFlow is a novel generative model for tabular data that builds on the mathematical correspondence between trees and diffusion processes. According to the paper, TreeFlow achieves competitive generation quality on tabular datasets, with the following reported advantages over existing methods:
- Higher fidelity in generated samples
- 2× computational speedup compared to baseline diffusion models
- Improved interpretability through the inherent structure of decision trees
This is particularly valuable for tabular data, where diffusion models have historically lagged behind their performance in image domains. TreeFlow brings the benefits of diffusion-based generation to structured data while maintaining computational efficiency.
Dsmtree: Distilling Tree Knowledge into Neural Networks
The second practical contribution, Dsmtree, addresses the challenge of transferring the hierarchical decision logic from tree models into neural networks. This distillation method preserves the interpretability and structured reasoning of decision trees within the flexible framework of neural networks.
The researchers report that Dsmtree matches teacher performance within 2% on many benchmarks while providing the benefits of neural network architectures. This is significant because it allows for the deployment of tree-derived knowledge in settings where neural networks are preferred, such as in deep learning pipelines or resource-constrained environments.
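The general idea of distilling a tree into a network can be illustrated with a small sketch. This is not the Dsmtree procedure, which this article does not describe. It is plain output-matching distillation: a one-hidden-layer network trained on the leaf probabilities of a hand-written tree. The names `teacher_tree` and `StudentMLP`, the toy data, and the hyperparameters are all hypothetical.

```python
import math
import random

def teacher_tree(x):
    """Hypothetical depth-2 tree; each leaf stores P(class = 1)."""
    if x[0] <= 0.5:
        return 0.1
    return 0.3 if x[1] <= 0.5 else 0.9

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class StudentMLP:
    """One-hidden-layer network with tanh units and a sigmoid output."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0, 1) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(w[0] * x[0] + w[1] * x[1] + b)
             for w, b in zip(self.w1, self.b1)]
        p = sigmoid(sum(v * hj for v, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def sgd_step(self, x, q, lr):
        h, p = self.forward(x)
        dz = p - q  # gradient of soft-label cross-entropy w.r.t. the output logit
        for j, hj in enumerate(h):
            dh = dz * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh
            self.w2[j] -= lr * dz * hj
            self.w1[j][0] -= lr * dh * x[0]
            self.w1[j][1] -= lr * dh * x[1]
            self.b1[j] -= lr * dh
        self.b2 -= lr * dz

def soft_cross_entropy(model, data):
    total = 0.0
    for x, q in data:
        _, p = model.forward(x)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard the logarithms
        total -= q * math.log(p) + (1.0 - q) * math.log(1.0 - p)
    return total / len(data)

def sample(rng, n):
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [(x, teacher_tree(x)) for x in points]  # labels come from the teacher

rng = random.Random(1)
train, held_out = sample(rng, 200), sample(rng, 500)

student = StudentMLP()
loss_before = soft_cross_entropy(student, train)
for _ in range(150):
    rng.shuffle(train)
    for x, q in train:
        student.sgd_step(x, q, lr=0.1)
loss_after = soft_cross_entropy(student, train)

# Fraction of held-out points where student and teacher make the same hard decision.
agreement = sum((student.forward(x)[1] > 0.5) == (q > 0.5)
                for x, q in held_out) / len(held_out)
print(f"loss {loss_before:.3f} -> {loss_after:.3f}, agreement {agreement:.2%}")
```

Training on the teacher's leaf probabilities rather than hard labels is a common design choice in distillation, since the soft targets carry more information about the teacher's confidence in each region.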
Evaluation and Benchmark Results
The paper presents extensive experimental validation across multiple datasets and tasks. For TreeFlow, the authors demonstrate competitive performance on standard tabular benchmarks including UCI datasets and financial data. The 2× speedup is particularly notable, as it addresses one of the primary computational limitations of diffusion models.
For Dsmtree, the researchers evaluated the method on classification and regression tasks and report that the distilled networks stay within 2% of their teacher decision trees on many benchmarks, while offering the advantages of neural network representations.
Implications and Future Directions
This work has several important implications for the machine learning community:
Theoretical Understanding: It provides a deeper mathematical understanding of both decision trees and diffusion models, revealing their connections rather than treating them as separate paradigms.
Algorithm Design: The GTSM principle opens new avenues for designing optimization algorithms that work effectively across both discrete and continuous model spaces.
Model Interpretability: By connecting the interpretable nature of decision trees with the generative capabilities of diffusion models, this work may help develop more interpretable generative models.
Computational Efficiency: The demonstrated speedup in TreeFlow suggests that diffusion models can be made more practical for real-world applications, particularly with tabular data.
The authors suggest several promising directions for future work, including extending the theoretical framework to other model classes, developing more efficient implementations of TreeFlow, and exploring applications in domains like healthcare and finance where both interpretability and generative modeling are valuable.
Conclusion
"Trees to Flows and Back" represents a significant theoretical advance in machine learning, establishing a rigorous mathematical connection between decision trees and diffusion models. Through the introduction of Global Trajectory Score Matching and practical implementations like TreeFlow and Dsmtree, this work not only advances our understanding of these model classes but also provides new tools for practitioners.
As machine learning continues to evolve, such unifying frameworks become increasingly important for developing more efficient, interpretable, and powerful models. This paper contributes to that effort by revealing hidden connections between seemingly disparate approaches and demonstrating their practical value.
For those interested in exploring this work further, the paper is available on arXiv (arXiv:2605.00414) and will be presented at the Forty-Third International Conference on Machine Learning (ICML 2026).

