Hacker News

zkmon · yesterday at 2:51 PM (2 replies)

I'm guessing 3.5-27b would beat 3.6-35b; MoE is a bad idea here. For the same VRAM, the 27b would leave a lot more room for context, and the quality of the work depends directly on context size, not just the "B" number.
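The tradeoff described above can be sketched with a back-of-envelope budget: at fixed VRAM, whatever the weights don't consume is left for KV cache, i.e. context. The architecture numbers below (layer count, KV heads, head dim, quantization) are illustrative assumptions, not the actual models' configs.

```python
# Rough VRAM budget: weights + KV cache must fit together, so smaller
# weights buy longer context. All model numbers are illustrative
# assumptions, not the real 27b/35b configurations.

GIB = 2**30

def max_context_tokens(vram_gib, params_b, quant_bytes=0.5,
                       layers=48, kv_heads=8, head_dim=128, kv_bytes=2):
    """Tokens of fp16 KV cache that fit after loading ~Q4 weights."""
    weight_bytes = params_b * 1e9 * quant_bytes
    free = vram_gib * GIB - weight_bytes
    per_token = 2 * layers * kv_heads * head_dim * kv_bytes  # K and V
    return max(int(free // per_token), 0)

dense_27b = max_context_tokens(24, 27)
larger_35b = max_context_tokens(24, 35)
print(dense_27b, larger_35b)  # the 27b leaves notably more KV room
```

With these made-up numbers the 27b fits roughly half again as much context into a 24 GiB card as the 35b, which is the commenter's point in miniature.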


Replies

zozbot234 · yesterday at 2:59 PM

MoE is not a bad idea for local inference if you have fast storage to offload to, and this is quickly becoming feasible with PCIe 5.0 interconnect.
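A rough roofline view of why interconnect speed matters for offload: during decode only the routed experts' weights are needed per token, and any expert not resident in VRAM must be streamed over the link. The link figure (~64 GB/s for PCIe 5.0 x16) is a peak number; the per-token byte count and hit rate are made-up assumptions.

```python
# Upper bound on decode speed when expert weights are offloaded:
# tokens/s is limited by how fast the *missing* experts can be
# streamed over the interconnect. PCIe 5.0 x16 peaks near 64 GB/s;
# the other numbers are illustrative assumptions.

def offload_bound_tokens_per_sec(active_bytes_per_token, cache_hit_rate,
                                 link_bytes_per_sec=64e9):
    fetched = active_bytes_per_token * (1.0 - cache_hit_rate)
    return float("inf") if fetched == 0 else link_bytes_per_sec / fetched

# e.g. ~2 GB of routed expert weights per token at ~Q4 quantization
print(offload_bound_tokens_per_sec(2e9, 0.9))  # 90% of experts in VRAM
print(offload_bound_tokens_per_sec(2e9, 0.5))  # only half resident
```

The point of the sketch: with a high VRAM hit rate the link stops being the bottleneck, which is why fast storage plus PCIe 5.0 makes local MoE plausible.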

perbu · yesterday at 4:50 PM

MoE is excellent for unified-memory inference hardware like the DGX Spark, Mac Studio, etc. The large memory means you can fit quite a few B's, and the smaller active experts keep those tokens flowing fast.
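The "smaller experts keep tokens flowing" claim follows from decode being memory-bandwidth-bound: per token you must read every *active* parameter once, so tokens/s scales with active bytes, not total bytes. The bandwidth and active-size figures below are illustrative assumptions, not specs for any particular machine or model.

```python
# Bandwidth-bound decode estimate: each generated token reads all
# active parameters once from memory. Bandwidth, active sizes, and
# quantization here are illustrative assumptions.

def decode_tokens_per_sec(mem_bw_bytes_per_sec, active_params_b,
                          bytes_per_param=0.5):
    return mem_bw_bytes_per_sec / (active_params_b * 1e9 * bytes_per_param)

bw = 800e9  # ~800 GB/s, a plausible unified-memory figure
dense = decode_tokens_per_sec(bw, 27)  # dense: all 27B params active
moe = decode_tokens_per_sec(bw, 4)     # MoE: e.g. ~4B active of 35B total
print(dense, moe)  # the MoE's small active set decodes much faster
```

Unified memory is the enabler here: it holds the full parameter count cheaply, while the routed experts keep the per-token read small.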