frankc • yesterday at 5:45 PM • 1 reply

My main thought is: would this allow me to speed up prompt processing for large MoE models? That is the real bottleneck on the M3 Ultra. The generation speed in tokens per second is pretty good.

Replies

embedding-shape • yesterday at 6:00 PM

tinygrad does have pretty neat support for sharding things across various devices relatively easily, and that would help. I'm guessing you'd hit the bandwidth ceiling transferring stuff back and forth instead, though.
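
For a rough feel of that bandwidth concern, here is a back-of-envelope sketch in plain Python. It does not use tinygrad or model what tinygrad actually does. It assumes, as a simplification, that each layer's activations for the whole prompt cross the link once in each direction. The function names and every number in it (prompt length, hidden size, layer count, link speed) are hypothetical placeholders, not measurements of any real hardware.

```python
def transfer_seconds(num_bytes: float, link_gb_per_s: float) -> float:
    """Time to move num_bytes over a link with the given bandwidth in GB/s."""
    return num_bytes / (link_gb_per_s * 1e9)


def activation_bytes(prompt_tokens: int, hidden_size: int, bytes_per_value: int = 2) -> int:
    """Size of one layer's activations for the whole prompt (2 bytes = fp16)."""
    return prompt_tokens * hidden_size * bytes_per_value


def round_trip_seconds(prompt_tokens: int, hidden_size: int, num_layers: int,
                       link_gb_per_s: float, bytes_per_value: int = 2) -> float:
    """Total transfer time if activations go out and come back once per layer.

    Assumption: one round trip per layer, no overlap with compute.
    """
    per_layer = activation_bytes(prompt_tokens, hidden_size, bytes_per_value)
    return 2 * num_layers * transfer_seconds(per_layer, link_gb_per_s)


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute your own model and link.
    seconds = round_trip_seconds(
        prompt_tokens=8192,
        hidden_size=8192,
        num_layers=64,
        link_gb_per_s=4.0,
    )
    print(f"estimated transfer time per prompt: {seconds:.2f} s")
```

If the estimate comes out comparable to or larger than the current prompt processing time, the link rather than the compute would be the limit, which is the ceiling described above.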
