Hacker News

zozbot234, yesterday at 6:39 PM

10 or 15 minutes a day is what the inference workload looks like on fairly small models. Once you start streaming weights in from SSD, things slow down considerably and become quite power hungry.
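
A rough back-of-envelope sketch of the slowdown, assuming a dense model where generating each token requires reading every weight once, so decode speed is capped by how fast the weights can be read. The model size and both bandwidth figures are hypothetical values picked for illustration, not measurements of any real hardware, and the function name is made up for this example.

```python
def decode_tokens_per_second(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode speed.

    Assumes every generated token reads all weights exactly once, so the
    rate is limited by read bandwidth divided by total weight size.
    """
    if weight_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("weight size and bandwidth must be positive")
    return bandwidth_bytes_per_s / weight_bytes


GB = 1e9

# Hypothetical figures, chosen only to illustrate the ratio.
MODEL_BYTES = 40 * GB      # assumed size of the quantized weights
RAM_BANDWIDTH = 100 * GB   # assumed memory bandwidth, bytes per second
SSD_BANDWIDTH = 5 * GB     # assumed SSD sequential read rate, bytes per second

ram_rate = decode_tokens_per_second(MODEL_BYTES, RAM_BANDWIDTH)
ssd_rate = decode_tokens_per_second(MODEL_BYTES, SSD_BANDWIDTH)
slowdown = ram_rate / ssd_rate

print(f"weights in RAM:    {ram_rate:.3f} tokens/s (upper bound)")
print(f"weights from SSD:  {ssd_rate:.3f} tokens/s (upper bound)")
print(f"slowdown factor:   {slowdown:.0f}x")
```

Under these assumptions the slowdown is just the ratio of the two bandwidths. The same number of tokens then keeps the machine busy that many times longer, which is one plausible reason the workload ends up costing more energy overall.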