taffydavid • today at 7:32 AM • 1 reply

Noob q: can advancements like this targeted at local inference have bonus effects for cloud inference? Presumably if you can get great results on cheaper hardware that also equates to less resource usage on cutting edge hardware, and less power draw?

Will advancements like this ultimately reduce the carbon footprint of AI?

Replies

goldenarm • today at 8:59 AM

Consumer and server hardware are quite different, especially Google's TPUs. Server-side deployments notably run models with much larger mixture-of-experts ratios and more complex caching systems. At that scale and with those inference budgets, providers are incentivised to optimize as much as possible.

Also, Google DeepMind has a six-month embargo on strategic papers, so I bet the juiciest quantization tech isn't public yet.
