
Timber: AOT Compiling Classical ML Models to Native C for Microsecond Inference

AI & ML Reporter

Timber converts XGBoost, LightGBM, scikit-learn, CatBoost, and ONNX models into optimized native C code and serves them via a local HTTP API, with a claimed 336x speedup over Python inference.

The Timber project introduces an interesting approach to accelerating classical machine learning inference by compiling trained models into native C99 code. Positioned as "Ollama for classical ML models," Timber targets tree-based models commonly used in production systems but often hampered by Python's runtime overhead.

How Timber Works

Timber functions as an Ahead-of-Time (AOT) compiler that takes trained models from popular ML frameworks and converts them into optimized native C code. The workflow is straightforward:

  1. Load a model using timber load model.json --name model-name
  2. Serve it using timber serve model-name
  3. Make predictions via HTTP API calls

The compiled models are served through a local HTTP API with endpoints similar to Ollama's design:

  • /api/predict (POST) - Run inference
  • /api/generate (POST) - Alias for predict
  • /api/models (GET) - List loaded models
  • /api/model/:name (GET) - Get model metadata
  • /api/health (GET) - Health check
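Assuming a JSON request body, a client call against the predict endpoint might look like the sketch below. The field names in the payload ("model", "features") and the port number are guesses for illustration, not taken from Timber's documentation:

```python
import json
import urllib.request

# Hypothetical payload shape; check Timber's README for the real schema.
payload = {"model": "model-name", "features": [0.12, 1.4, 0.0, 3.3]}

req = urllib.request.Request(
    "http://localhost:8080/api/predict",  # port is a placeholder
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With a Timber server running, you would send it with:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
print(req.get_method(), req.full_url)
```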

The project supports several major ML frameworks:

  • XGBoost: JSON format models
  • LightGBM: Text, .model, and .lgb files
  • scikit-learn: Pickle files (.pkl, .pickle)
  • CatBoost: JSON exports
  • ONNX: TreeEnsembleClassifier/Regressor operators
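To make the tree-model inputs concrete, here is a toy evaluator over a heavily simplified tree record. The field names used here ("split", "threshold", "yes", "no", "leaf") are illustrative only and do not match any framework's real export schema:

```python
import json

# A toy tree loosely in the spirit of a per-tree JSON export;
# real XGBoost/CatBoost schemas differ in structure and naming.
TREE = json.loads("""
{"split": 2, "threshold": 0.5,
 "yes": {"leaf": -0.4},
 "no":  {"split": 0, "threshold": 1.2,
         "yes": {"leaf": 0.1},
         "no":  {"leaf": 0.7}}}
""")

def eval_tree(node, x):
    """Walk one tree: interior nodes test a feature, leaves hold scores."""
    while "leaf" not in node:
        branch = "yes" if x[node["split"]] < node["threshold"] else "no"
        node = node[branch]
    return node["leaf"]

print(eval_tree(TREE, [2.0, 0.0, 0.3]))  # x[2] < 0.5, so -0.4
```

A compiler like Timber parses structures of this general kind and emits equivalent native code rather than walking them at runtime.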

Performance Claims

The project's most striking claim is a 336x speedup over Python XGBoost inference. The benchmark methodology provides useful context:

  • Hardware: Apple M2 Pro, 16GB RAM, macOS
  • Model: XGBoost binary classifier with 50 trees, max depth 4, 30 features
  • Dataset: breast_cancer from sklearn
  • Measurement: In-process latency (not HTTP round-trip)
  • Baseline: Python XGBoost (booster.predict)

The benchmarks focus on single-sample inference latency, which is the relevant metric for real-time applications. The reported latency is approximately 2 microseconds per native call, which is impressive for tree-based models.
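A minimal sketch of that in-process measurement style (with a trivial stand-in for the model call; this is not Timber's actual benchmark script):

```python
import statistics
import time

def predict(x):
    # Stand-in for a compiled model call; a real benchmark would
    # invoke the native library or Python booster here.
    return 1.0 if x[0] > 0.5 else 0.0

sample = [0.7, 0.1, 0.3]

# Warm up, then time many single-sample calls and report the median,
# mirroring in-process latency (not HTTP round-trip) methodology.
for _ in range(1_000):
    predict(sample)

timings_ns = []
for _ in range(10_000):
    t0 = time.perf_counter_ns()
    predict(sample)
    timings_ns.append(time.perf_counter_ns() - t0)

median_us = statistics.median(timings_ns) / 1000
print(f"median single-sample latency: {median_us:.2f} us")
```

Reporting a median over many warmed-up calls avoids cold-start and scheduler noise dominating microsecond-scale measurements.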

The project includes reproducible benchmark scripts for those who want to verify these claims independently. This transparency is commendable and follows best practices for performance benchmarking.

Comparison with Alternatives

Timber enters a space with several existing solutions for accelerating tree-based model inference:

Python Runtime

The baseline approach uses Python with the original framework (XGBoost, LightGBM, etc.). This offers:

  • Pros: Familiar development workflow, broad framework support
  • Cons: High runtime overhead (50-200MB process footprint), latency in the 100s of microseconds to milliseconds range

ONNX Runtime

ONNX Runtime provides a standardized inference engine with:

  • Pros: Broad model ecosystem, hardware acceleration options
  • Cons: Still carries significant runtime overhead (MBs to 10s of MBs), typically 100s of microseconds latency

Treelite

Treelite is specifically designed for gradient boosted decision trees:

  • Pros: Low-latency when compiled, mature GBDT support
  • Cons: Separate compile/runtime flow, framework-specific

lleaves

lleaves is LightGBM-focused with:

  • Pros: Lower latency than pure Python, LightGBM-optimized
  • Cons: Python runtime still required, LightGBM-specific

Timber's key differentiator is eliminating the Python runtime entirely from the inference hot path, producing standalone C99 binaries with minimal dependencies. This approach offers advantages for edge deployments and regulated environments where deterministic performance is critical.
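Conceptually, the win comes from turning a data-driven tree walk into straight-line branch code. A toy sketch of the difference (in Python for brevity; Timber itself emits C99, and its actual codegen is not shown here):

```python
# A generic runtime walks node records at inference time.
NODES = {0: ("split", 2, 0.5, 1, 2), 1: ("leaf", -0.4), 2: ("leaf", 0.7)}

def interpreted(x):
    nid = 0
    while NODES[nid][0] == "split":
        _, feat, thr, yes, no = NODES[nid]
        nid = yes if x[feat] < thr else no
    return NODES[nid][1]

def compiled(x):
    # What AOT-compiled output amounts to: the same splits baked
    # into nested branches, with no dispatch or record lookups.
    if x[2] < 0.5:
        return -0.4
    return 0.7

assert interpreted([0, 0, 0.3]) == compiled([0, 0, 0.3]) == -0.4
```

The compiled form gives the optimizer fixed feature indices and thresholds to work with, which is where much of the claimed speedup over interpreted traversal comes from.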

Limitations and Caveats

Despite the impressive performance claims, Timber has several limitations worth noting:

  1. ONNX Support: Currently limited to TreeEnsembleClassifier/Regressor operators. ONNX graphs containing other operators are not supported.

  2. CatBoost: Only supports JSON exports, not native binary formats.

  3. scikit-learn: While it supports major tree estimators, uncommon or custom estimator wrappers may fail during parsing.

  4. Security: As with any tool that parses Python pickle files, there are security considerations. The documentation explicitly states "only load trusted artifacts" when using pickle parsing.

  5. XGBoost Format: Support is primarily for JSON-model based XGBoost exports. Binary booster formats are not the primary input path.

  6. Benchmark Dependencies: Optional benchmark backends (ONNX Runtime, Treelite, lleaves) are only included if explicitly installed and configured.

Potential Use Cases

The project identifies several compelling use cases where Timber's approach makes particular sense:

  1. Fraud/Risk Systems: Teams running classical models in low-latency transaction paths where microseconds matter.

  2. Edge/IoT Deployments: When deploying models to gateways or embedded devices with limited resources.

  3. Regulated Industries: Finance, healthcare, and automotive sectors needing deterministic artifacts and clear audit trails.

  4. Platform/Infrastructure Teams: Looking to replace Python model-serving overhead with lightweight native binaries.

The small artifact size (~48KB for an example model) and lack of runtime dependencies make Timber particularly attractive for resource-constrained environments.

Technical Implementation

While the repository doesn't provide extensive details about the compilation internals, the fact that it produces C99 code is significant. C99 offers excellent portability across platforms and architectures, which aligns with Timber's goal of supporting edge deployments.

The project includes a technical paper that presumably goes deeper into the compilation approach and optimization strategies. For practitioners interested in the technical details, this would be essential reading.

Roadmap

The project's roadmap indicates several planned improvements:

  1. Better framework/version compatibility coverage
  2. Broader ONNX operator support beyond tree ensembles
  3. Enhanced embedded deployment profiles (ARM Cortex-M/RISC-V presets)
  4. Expanded benchmark matrices and public reproducibility reports
  5. Additional safety/regulatory tooling for audit and MISRA-C workflows

Conclusion

Timber presents a pragmatic approach to accelerating classical ML inference by leveraging AOT compilation to native code. The 336x performance claim over Python is substantial, though the benchmark focuses specifically on single-sample inference latency rather than end-to-end throughput or real-world deployment scenarios.

For teams working with tree-based models in latency-sensitive applications, particularly those targeting edge deployments or operating in regulated environments, Timber offers a compelling alternative to traditional Python-based serving. The elimination of runtime dependencies and deterministic performance characteristics are significant advantages.

However, practitioners should carefully evaluate the limitations around framework support and the security implications of parsing model exports. The project's transparency with benchmarks and reproducible scripts is commendable and should give confidence to those considering adoption.

As machine learning deployment increasingly spans from cloud to edge, tools like Timber that optimize for performance, portability, and determinism will likely see growing relevance in the ML infrastructure landscape.
