Hacker News

zozbot234 · last Thursday at 4:17 PM

This is why I'd like to see a lot more focus on batched inference with lower-end hardware. If you just do a tiny amount of tok/day and can wait for the answer to be computed overnight or so, you don't really need top-of-the-line hardware even for SOTA results.
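The workflow described here (accumulate non-urgent prompts during the day, batch-run them overnight on cheap hardware) can be sketched as a tiny job queue. Everything below is a hypothetical illustration: the queue file name and the `run_model` stub are placeholders, not a real API; in practice `run_model` would call whatever slow local backend you use (e.g. a CPU-bound llama.cpp run).

```python
import json
import time
from pathlib import Path

QUEUE = Path("prompt_queue.jsonl")  # hypothetical queue file


def enqueue(prompt: str) -> None:
    """Append a non-urgent prompt during the day; no GPU work happens yet."""
    with QUEUE.open("a") as f:
        f.write(json.dumps({"prompt": prompt, "ts": time.time()}) + "\n")


def run_model(prompt: str) -> str:
    """Placeholder for a slow local inference call; swap in a real backend."""
    return f"[answer to: {prompt}]"


def drain_queue() -> list[dict]:
    """Run overnight: process every queued prompt in one batch, amortizing
    model load/startup cost across all of them."""
    if not QUEUE.exists():
        return []
    jobs = [json.loads(line) for line in QUEUE.read_text().splitlines()]
    results = [{"prompt": j["prompt"], "answer": run_model(j["prompt"])}
               for j in jobs]
    QUEUE.unlink()  # queue is empty again for tomorrow
    return results


if __name__ == "__main__":
    enqueue("summarize yesterday's logs")
    enqueue("draft a refactor plan for module X")
    for r in drain_queue():
        print(r["answer"])
```

The point of the batching is that model weights are loaded once per drain instead of once per question, which is what makes low-end hardware workable for this usage pattern.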


Replies

mistercheese · last Friday at 4:15 AM

That’s a good point. I think I saw Together.ai offering that, but for some reason I just never think to throw random non-urgent coding tasks at it overnight.

deaux · last Thursday at 6:01 PM

> If you just do a tiny amount of tok/day and can wait for the answer to be computed overnight or so

But they can't? The usage pattern is the polar opposite: most people running these models locally just ask them a few questions throughout the day. They want the answers now, or at least within a minute.
