Hacker News

taffydavid, today at 7:32 AM

Noob q: can advancements like this, targeted at local inference, have bonus effects for cloud inference? Presumably, if you can get great results on cheaper hardware, that also means less resource usage and less power draw on cutting-edge hardware?

Will advancements like this ultimately reduce the carbon footprint of AI?


Replies

goldenarm, today at 8:59 AM

Consumer and server hardware are quite different, especially Google's TPUs. Models served at that scale notably have much larger mixture-of-experts ratios and more complex caching systems. At such scale and inference budgets, cloud providers are incentivised to optimize as much as possible.

Also, Google DeepMind has a six-month embargo on strategic papers, so I bet the juiciest quantization tech isn't public yet.