logoalt Hacker News

Flere-Imsahoyesterday at 7:37 PM3 repliesview on HN

Yeah the future is probably a number of highly specialised small models you can run on your own hardware rather than massive frontier models in the cloud.

That's what I'm betting on anyway.


Replies

thewebguydyesterday at 7:43 PM

That seems to be what Microsoft is betting on also based on what was shown at the BUILD keynote today + that new surface ultra and the surface mini PC with the new Nvidia chip. Nadella really played up local AI as the main use case they have in mind.

search_facilityyesterday at 7:49 PM

MOE basically work that way already, QWEN/etc with low active params (A-number in name) allows to inference big models locally (only active params have to fit into memory)

girvoyesterday at 10:11 PM

Step 3.7 Flash on my Asus GB10 based mini pc is incredibly close to that today. I’m very impressed, and that’s without MTP to boost performance