
wongarsu, today at 12:41 PM

Anything below one billion parameters you can run on the CPU at acceptable speed.

For larger sizes you still can; it just gets progressively slower. For a simple classification task (small input, tiny output, and you can constrain the output to a couple of tokens) you could even run something like a 4B or 8B model on the CPU.
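To put rough numbers on "slower and slower", here is a back-of-envelope sketch. It assumes token generation is limited by memory bandwidth, i.e. every generated token has to read all the weights once, so tokens/s is about bandwidth divided by model size in bytes. The 50 GB/s bandwidth and the 4-bit and 8-bit weight sizes are illustrative assumptions of mine, not measurements, and the function names are made up for this example. It ignores prompt processing, the KV cache, and compute limits, so treat the results as an optimistic ceiling.

```python
def model_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in bytes."""
    return params_billion * 1e9 * bits_per_weight / 8


def est_decode_tokens_per_sec(
    params_billion: float,
    bits_per_weight: float,
    mem_bandwidth_gb_s: float,
) -> float:
    """Upper-bound estimate: one full pass over the weights per generated token."""
    return mem_bandwidth_gb_s * 1e9 / model_bytes(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Assumed desktop-class memory bandwidth; substitute your own machine's figure.
    bandwidth_gb_s = 50.0
    for size_b in (0.5, 1, 4, 8):
        for bits in (8, 4):
            rate = est_decode_tokens_per_sec(size_b, bits, bandwidth_gb_s)
            print(f"{size_b:>4}B @ {bits}-bit: ~{rate:.1f} tokens/s (ceiling)")
```

For a classification task that only needs a couple of output tokens, even a low tokens/s figure can be fine, since most of the time goes into reading the (short) input rather than generating.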