A new open-source port allows Apple Silicon Mac users to generate detailed 3D meshes from single images without NVIDIA hardware, creating 400K+ vertex models in about 3.5 minutes on M4 Pro chips.
Microsoft's TRELLIS.2 image-to-3D generation model has been successfully ported to run natively on Apple Silicon Macs, enabling creators to generate detailed 3D content without requiring expensive NVIDIA GPUs. The open-source project, available on GitHub, represents a significant technical achievement in making cutting-edge 3D generation tools more accessible to Apple users.
What is TRELLIS.2?
TRELLIS.2 is a state-of-the-art model developed by Microsoft Research that can convert 2D images into detailed 3D meshes. This technology has applications across gaming, virtual reality, product visualization, and digital content creation. The original implementation required NVIDIA GPUs with CUDA support, limiting its accessibility to users with specialized hardware.
The new port, developed by shivampkumar, transforms this CUDA-only implementation to work with Apple Silicon through PyTorch MPS (Metal Performance Shaders), eliminating the need for NVIDIA hardware. This breakthrough makes advanced 3D generation capabilities available to the millions of users with M1, M2, M3, or M4 Macs.
Technical Achievements
The porting process involved replacing several CUDA-only dependencies with pure PyTorch and Python implementations:
Sparse 3D Convolution: The original flex_gemm kernel was replaced with a custom implementation in backends/conv_none.py that builds spatial hashes of active voxels, gathers neighbor features, applies weights via matrix multiplication, and scatter-adds results back.
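The hash/gather/matmul/scatter-add pattern described above can be sketched in pure PyTorch roughly as follows. This is a simplified illustration, not the port's actual conv_none.py: the function name, the 3×3×3 kernel layout, and the per-offset weight shape are all assumptions.

```python
import torch

def sparse_conv3d(coords, feats, weight):
    """Sparse 3x3x3 convolution computed over active voxels only.

    coords: (N, 3) int tensor of active voxel positions
    feats:  (N, C_in) features, one row per active voxel
    weight: (27, C_in, C_out), one matrix per kernel offset
    """
    # Spatial hash: voxel coordinate -> row index
    table = {tuple(c.tolist()): i for i, c in enumerate(coords)}
    out = feats.new_zeros(coords.shape[0], weight.shape[-1])
    offsets = [(dx, dy, dz) for dx in (-1, 0, 1)
                            for dy in (-1, 0, 1)
                            for dz in (-1, 0, 1)]
    for k, off in enumerate(offsets):
        # Gather: for each active voxel, look up its neighbor at this offset
        src, dst = [], []
        for i, c in enumerate(coords.tolist()):
            j = table.get((c[0] + off[0], c[1] + off[1], c[2] + off[2]))
            if j is not None:
                src.append(j)
                dst.append(i)
        if not src:
            continue
        # Apply this offset's weight via matmul, scatter-add into the output
        contrib = feats[src] @ weight[k]
        out.index_add_(0, torch.tensor(dst), contrib)
    return out
```

Because only active voxels are visited, memory stays proportional to the occupied volume, but the Python-level loop over kernel offsets and neighbors is a large part of why this path is slower than a fused CUDA kernel.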
Mesh Extraction: The mesh extraction functionality was reimplemented using Python dictionaries instead of CUDA hashmap operations. The new implementation builds coordinate-to-index lookup tables and triangulates quads using normal alignment heuristics.
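The two ingredients named above, a coordinate-to-index lookup table and a normal-alignment quad split, can be sketched like this. These helpers are hypothetical simplifications for illustration, not the port's actual mesh-extraction code.

```python
import numpy as np

def index_vertices(verts):
    """Build a coordinate -> index lookup table, deduplicating shared corners
    with a plain Python dict instead of a CUDA hashmap."""
    table, unique, indices = {}, [], []
    for v in verts:
        key = tuple(v)
        if key not in table:
            table[key] = len(unique)
            unique.append(v)
        indices.append(table[key])
    return np.asarray(unique, dtype=float), indices

def triangulate_quad(quad, verts):
    """Split a quad (a, b, c, d) into two triangles, picking the diagonal
    whose triangle normals agree most (a simple alignment heuristic)."""
    a, b, c, d = (np.asarray(verts[i], dtype=float) for i in quad)
    def normal(p, q, r):
        n = np.cross(q - p, r - p)
        return n / (np.linalg.norm(n) + 1e-12)
    s1 = normal(a, b, c) @ normal(a, c, d)  # diagonal a-c
    s2 = normal(a, b, d) @ normal(b, c, d)  # diagonal b-d
    if s1 >= s2:
        return [(quad[0], quad[1], quad[2]), (quad[0], quad[2], quad[3])]
    return [(quad[0], quad[1], quad[3]), (quad[1], quad[2], quad[3])]
```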
Attention Mechanism: The sparse attention module was patched to use PyTorch's SDPA (Scaled Dot-Product Attention) instead of flash_attn, enabling the transformers to work without CUDA.
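A minimal sketch of such a patch is shown below. It assumes the flash_attn convention of (batch, seq, heads, head_dim) inputs, so it transposes to SDPA's (batch, heads, seq, head_dim) layout and back; the port's actual shim may handle masks, dropout, and variable-length batches differently.

```python
import torch
import torch.nn.functional as F

def sdpa_attention(q, k, v):
    """Drop-in stand-in for a flash_attn-style call using PyTorch's
    built-in SDPA, which runs on CPU, CUDA, and MPS alike.

    Assumes q, k, v are (batch, seq, heads, head_dim), as flash_attn
    expects; SDPA wants (batch, heads, seq, head_dim).
    """
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2)
```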
Device-Agnostic Code: All hardcoded .cuda() calls throughout the codebase were patched to use the active device instead, allowing the model to run seamlessly on Apple Silicon.
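The usual shape of this kind of fix, sketched here as a generic pattern rather than the port's exact code, is to resolve the device once and thread it through instead of calling .cuda():

```python
import torch

def get_device():
    """Pick the best available backend: CUDA, then Apple's MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = get_device()
x = torch.randn(4, 4).to(device)  # instead of torch.randn(4, 4).cuda()
```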
Performance and Capabilities
On an M4 Pro with 24GB of unified memory, the system can generate 400K+ vertex meshes from single images in approximately 3.5 minutes. The output includes vertex-colored OBJ and GLB files ready for use in various 3D applications.
The generation process consists of several stages:
- Model loading: ~45 seconds
- Image preprocessing: ~5 seconds
- Sparse structure sampling: ~15 seconds
- Shape SLat sampling: ~90 seconds
- Texture SLat sampling: ~50 seconds
- Mesh decoding: ~30 seconds
Memory usage peaks at around 18GB of unified memory during generation, which is why at least 24GB is recommended.
Limitations and Trade-offs
Despite impressive performance, the port has several limitations compared to the original CUDA implementation:
No Texture Export: Texture baking requires nvdiffrast, a CUDA-only differentiable rasterizer. The current implementation exports meshes with vertex colors only.
Hole Filling Disabled: Mesh hole filling requires cumesh, which is CUDA-dependent. Generated meshes may contain small holes.
Performance Differences: The pure-PyTorch sparse convolution is approximately 10x slower than the CUDA flex_gemm kernel, representing the main performance bottleneck.
Inference Only: The port currently supports inference only, with no training capabilities.
Setup and Usage
The project requires macOS on Apple Silicon (M1 or later), Python 3.11+, and at least 24GB of unified memory. Users need approximately 15GB of disk space for model weights.
Setup involves cloning the repository, logging into HuggingFace (the model weights are gated), and running the setup script, which creates a virtual environment, installs dependencies, and clones and patches TRELLIS.2. After activating the environment, users can generate 3D models from images with the generate.py script.
The project includes several options for customization:
- --seed: Random seed for generation (default: 42)
- --output: Output filename without extension (default: output_3d)
- --pipeline-type: Pipeline resolution (options: 512, 1024, 1024_cascade; default: 512)
Broader Implications
This port represents a significant step forward in democratizing 3D content creation. By making state-of-the-art image-to-3D generation accessible to Apple Silicon users without requiring specialized hardware, the project lowers barriers for indie developers, artists, and small studios to create immersive 3D experiences.
The availability of such tools on consumer-grade hardware could accelerate adoption of 3D content across various industries, from e-commerce product visualization to educational content creation. As Apple continues to expand its silicon lineup, tools like TRELLIS.2 for Apple Silicon could become increasingly important in the creative workflow.
The project also highlights the growing importance of platform-agnostic AI research, where models are designed to run across different hardware architectures rather than being locked into specific vendor ecosystems.
For developers interested in exploring the code or contributing to the project, the GitHub repository contains all necessary information, including detailed technical documentation and setup instructions. The porting code is released under the MIT License, while the upstream model weights are subject to their respective licenses.