Hacker News

sebakubisz today at 7:26 AM

This is the kind of porting work I always hope for when I see a CUDA-only release. Have you thought about publishing the gather-scatter sparse 3D convolution and SDPA attention swaps as a standalone toolkit or writeup? A lot of folks running models locally on Apple Silicon hit the same wall with flash_attn, nvdiffrast, and custom sparse kernels and end up redoing the same work.


Replies

shivampkumar today at 8:33 AM

That makes a lot of sense. I'm looking into whether someone has already done this well; if not, I'll try to do it myself.