
deaux • yesterday at 4:06 PM • 1 reply

While they can be run locally, and most of the discussion on HN is about that, I bet that if you look at total tok/day, local usage is a tiny amount compared to total cloud inference, even for these models. Most people who do use them locally just run a prompt every now and then.

Replies

zozbot234 • yesterday at 4:17 PM

This is why I'd like to see a lot more focus on batched inference with lower-end hardware. If you only need a tiny amount of tok/day and can wait for the answer to be computed overnight or so, you don't really need top-of-the-line hardware, even for SOTA results.
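A rough back-of-envelope sketch of why batching helps in this scenario, under loudly stated assumptions: decoding is treated as purely memory-bandwidth-bound, so each decode step has to stream all model weights once, and a single pass over the weights produces one token for every sequence in the batch. The numbers (about 100 GB of weights streamed from an SSD at about 5 GB/s, an 8-hour overnight window) and the helper names are hypothetical, and the model ignores KV-cache traffic, prefill, and the batch size at which compute becomes the bottleneck.

```python
# Back-of-envelope sketch, not a benchmark. Assumes decoding is purely
# memory-bandwidth-bound: every decode step streams all weights once, and
# that single pass yields one token per sequence in the batch.
# Ignores KV-cache traffic, prefill, and compute limits at large batch sizes.


def decode_step_seconds(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time for one decode step if all weights are read once per step."""
    return weight_bytes / bandwidth_bytes_per_s


def tokens_in_window(weight_bytes: float, bandwidth_bytes_per_s: float,
                     batch_size: int, window_s: float) -> int:
    """Tokens generated in a time window: each full step yields batch_size tokens."""
    step_s = decode_step_seconds(weight_bytes, bandwidth_bytes_per_s)
    steps = int(window_s // step_s)  # only count steps that fully complete
    return steps * batch_size


if __name__ == "__main__":
    # Hypothetical setup: ~100 GB of weights streamed from an SSD at ~5 GB/s,
    # run over an 8-hour overnight window.
    weights = 100e9
    ssd_bandwidth = 5e9
    overnight = 8 * 3600
    for batch in (1, 8, 64):
        print(f"batch={batch:>2}: {tokens_in_window(weights, ssd_bandwidth, batch, overnight)} tokens")
```

Under these assumptions one step takes about 20 s, so a single sequence gets roughly 1,440 tokens overnight while a batch of 64 gets roughly 92,000 for the same weight traffic. That amortization is the reason batching matters most when the hardware is slow.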

