I ported Microsoft's TRELLIS.2 (a 4B-parameter image-to-3D model) to run on Apple Silicon via PyTorch MPS. The original requires CUDA with flash_attn, nvdiffrast, and custom sparse convolution kernels, none of which work on a Mac.
I replaced the CUDA-specific ops with pure-PyTorch alternatives: a gather-scatter sparse 3D convolution, SDPA attention for sparse transformers, and a Python-based mesh extraction replacing CUDA hashmap operations. Total changes are a few hundred lines across 9 files.
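For readers curious what a gather-scatter sparse conv means in practice, here is a minimal pure-PyTorch sketch, not the port's actual code: it assumes voxels stored as a coordinate tensor plus a feature tensor, and computes a submanifold convolution (outputs only at existing voxel sites) by looking up neighbours per kernel offset, gathering their features, and scatter-adding the weighted result.

```python
import torch

def sparse_conv3d(coords, feats, weight):
    """Gather-scatter submanifold sparse 3D convolution (illustrative sketch).
    coords: (N, 3) integer voxel coordinates
    feats:  (N, C_in) per-voxel features
    weight: (K, K, K, C_in, C_out) dense kernel
    """
    K = weight.shape[0]
    N, _ = feats.shape
    C_out = weight.shape[-1]
    r = K // 2
    # Hash each coordinate to its row index for O(1) neighbour lookup.
    table = {tuple(c): i for i, c in enumerate(coords.tolist())}
    out = torch.zeros(N, C_out, dtype=feats.dtype, device=feats.device)
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            for dz in range(-r, r + 1):
                # Gather: pair each output voxel with the input voxel
                # at this offset, if one exists.
                pairs = []
                for i, c in enumerate(coords.tolist()):
                    j = table.get((c[0] + dx, c[1] + dy, c[2] + dz))
                    if j is not None:
                        pairs.append((i, j))
                if not pairs:
                    continue
                out_idx, in_idx = zip(*pairs)
                w = weight[dx + r, dy + r, dz + r]  # (C_in, C_out)
                # Scatter: accumulate this offset's contribution.
                # out_idx entries are unique per offset, so += is safe.
                out[list(out_idx)] += feats[list(in_idx)] @ w
    return out
```

A real implementation would vectorize the neighbour lookup (e.g. with sorted coordinate keys and `torch.searchsorted`) instead of a Python dict, but the gather-multiply-scatter structure is the same. The SDPA swap is simpler: calls into flash_attn are replaced with `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a backend available on the current device.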
It generates ~400K-vertex meshes from single photos in about 3.5 minutes on an M4 Pro (24 GB). Not as fast as an H100 (where it takes seconds), but it works offline with no cloud dependency.
Most 'runs on Mac' ports are a wrapper around a cloud call or a quantized shell of the original model. Replacing the CUDA-specific kernels with pure-PyTorch alternatives is the kind of work that ages well, because the next CUDA-locked research release is three weeks away. One question: how much of the gather-scatter sparse conv is reusable for other TRELLIS-like architectures, or is it bespoke to this one?
Nice work, although this model is not very good. I tried a lot of different image-to-3D models; the one from meshy.ai is the best, and TRELLIS is in the useless tier. I really hope some good open-source models appear in this domain.
Does it support multi-view input?
This is fantastic, great work. I will attempt to run it on my 16GB M1 but I doubt it'll run.
Out of curiosity, how did you go about replacing the CUDA-specific ops? Did you rely on any resources, or just experience? Would love to learn more.
Great. It could potentially go much faster if rewritten in Metal shaders.
How much RAM does this use? Only sitting on 8 GB right now, I'm trying to figure out if I should buy 24 GB when it's time for a replacement or spring for 32.
That's always been possible with the MPS backend; the reason people choose to omit it in HF Spaces/demos is that HF doesn't offer an MPS backend. People would rather have the thing work at full speed than 10x slower just for compatibility.
This is the kind of porting work I always hope for when I see a CUDA-only release. Have you thought about publishing the gather-scatter sparse 3D convolution and SDPA attention swaps as a standalone toolkit or writeup? A lot of folks running models locally on Apple Silicon hit the same wall with flash_attn, nvdiffrast, and custom sparse kernels and end up redoing the same work.