Hacker News

Show HN: Run TRELLIS.2 Image-to-3D generation natively on Apple Silicon

172 points by shivampkumar | today at 12:07 AM | 29 comments

I ported Microsoft's TRELLIS.2 (a 4B-parameter image-to-3D model) to run on Apple Silicon via PyTorch MPS. The original requires CUDA with flash_attn, nvdiffrast, and custom sparse convolution kernels, none of which work on a Mac.
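A minimal sketch of the usual device-selection pattern for such a port (illustrative, not the repo's actual code): prefer MPS when available, fall back to CUDA, then CPU.

```python
# Hedged sketch: standard device fallback for a Mac port of a CUDA-first model.
# The function name is illustrative, not from trellis-mac itself.
import torch

def pick_device() -> torch.device:
    """Prefer Apple's MPS backend, then CUDA, then CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")
```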

I replaced the CUDA-specific ops with pure-PyTorch alternatives: a gather-scatter sparse 3D convolution, SDPA attention for sparse transformers, and a Python-based mesh extraction replacing CUDA hashmap operations. Total changes are a few hundred lines across 9 files.
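To illustrate the gather-scatter approach mentioned above, here is a hedged, self-contained sketch of a sparse 3D convolution in pure PyTorch: for each of the 27 kernel offsets, look up which occupied voxels have a neighbor at that offset (gather), transform the neighbor features, and accumulate into the output (scatter). This is the general technique, not the repo's actual implementation; names, layouts, and the Python-dict hashmap are assumptions, and it assumes coordinates lie inside the grid so the hash is collision-free.

```python
# Hedged sketch of a gather-scatter sparse 3D convolution in pure PyTorch.
# Illustrative only; the real port's API and layout may differ.
import itertools
import torch

def sparse_conv3d(coords, feats, weight, grid=64):
    """coords: (N, 3) int voxel coords; feats: (N, C_in); weight: (27, C_in, C_out)."""
    # Hash each coordinate to a scalar key so neighbors can be looked up.
    keys = (coords[:, 0] * grid + coords[:, 1]) * grid + coords[:, 2]
    lut = {int(k): i for i, k in enumerate(keys)}  # Python-side hashmap

    out = feats.new_zeros(feats.shape[0], weight.shape[2])
    for k, off in enumerate(itertools.product((-1, 0, 1), repeat=3)):
        nb = coords + torch.tensor(off)
        nb_keys = (nb[:, 0] * grid + nb[:, 1]) * grid + nb[:, 2]
        # Gather: index of the occupied neighbor at this offset, or -1 if empty.
        src = torch.tensor([lut.get(int(key), -1) for key in nb_keys])
        mask = src >= 0
        # Scatter-add the transformed neighbor features into the outputs.
        out[mask] += feats[src[mask]] @ weight[k]
    return out
```

The Python loop over 27 offsets is cheap; the per-offset gather and matmul are the bulk of the work and run as regular dense PyTorch ops, which is what makes this portable to MPS.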

It generates ~400K-vertex meshes from single photos in about 3.5 minutes on an M4 Pro (24GB). Not as fast as an H100 (where it takes seconds), but it works offline with no cloud dependency.

https://github.com/shivampkumar/trellis-mac


Comments

sebakubisz | today at 7:26 AM

This is the kind of porting work I always hope for when I see a CUDA-only release. Have you thought about publishing the gather-scatter sparse 3D convolution and SDPA attention swaps as a standalone toolkit or writeup? A lot of folks running models locally on Apple Silicon hit the same wall with flash_attn, nvdiffrast, and custom sparse kernels and end up redoing the same work.
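The flash_attn-to-SDPA swap this commenter refers to is typically a small adapter like the following sketch (assumed, not the repo's actual code): flash_attn_func takes tensors in (batch, seq, heads, head_dim) layout, while torch's scaled_dot_product_attention expects (batch, heads, seq, head_dim), hence the transposes.

```python
# Hedged sketch of a flash_attn -> SDPA drop-in; signatures are illustrative.
import torch
import torch.nn.functional as F

def sdpa_attention(q, k, v):
    """q, k, v: (batch, seq, heads, head_dim), the layout flash_attn uses."""
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # -> (B, H, S, D)
    out = F.scaled_dot_product_attention(q, k, v)      # portable fused attention
    return out.transpose(1, 2)                         # back to (B, S, H, D)
```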

sergiopreira | today at 8:58 AM

Most 'runs on Mac' ports are a wrapper around a cloud call or a quantized shell of the original model. Going after the CUDA-specific kernels with pure-PyTorch alternatives is the kind of work that ages well, because the next CUDA-locked research release is three weeks away. One question: how much of the gather-scatter sparse conv is reusable for other TRELLIS-like architectures, or is it bespoke to this one?

gondar | today at 2:32 AM

Nice work, although this model is not very good. I tried a lot of different image-to-3D models; the one from meshy.ai is the best, and TRELLIS is in the useless tier. I really hope there will be some good open-source models in this domain.

drbscl | today at 10:57 AM

Does it support multi-view input?

petargyurov | today at 7:12 AM

This is fantastic, great work. I will attempt to run it on my 16GB M1 but I doubt it'll run.

Out of curiosity, how did you go about replacing the CUDA specific ops? Any resources you relied on or just experience? Would love to learn more.

antirez | today at 9:24 AM

Great. It could potentially go much faster rewritten in terms of Metal shaders.

kennyloginz | today at 2:40 AM

So much effort, but no examples on the landing page.

post-it | today at 4:04 AM

How much RAM does this use? Only sitting on 8 GB right now, I'm trying to figure out if I should buy 24 GB when it's time for a replacement or spring for 32.

villgax | today at 1:34 AM

That's always been possible with the MPS backend; the reason people choose to omit it in HF Spaces/demos is that HF doesn't offer an MPS backend. People would rather have the thing work at its best speed than at 10x worse speed just for compatibility.

jmatthews | today at 3:15 AM

Well done
