
Aurornis today at 2:52 PM

Most people won’t be dumping 100K tokens into it at once, but I agree that the prefill time accumulated over a session adds up to a lot.

This is also a problem for all of the Mac local LLMs. Macs are a great way to get a lot of high-bandwidth memory, but their compute is very far behind current-gen dedicated GPUs. Some of the expensive Mac Studio setups let you run very large models at usable tokens/s, but you can be waiting a long time for prefill to finish before those tokens start generating.
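To put rough numbers on that wait, here is a back-of-the-envelope sketch. It assumes prefill and decode each run at a constant throughput, which real setups only approximate. The function names and the throughput figures in the usage note are made up for illustration, not measurements of any Mac or GPU.

```python
def time_to_first_token(prompt_tokens: int, prefill_tokens_per_s: float) -> float:
    """Seconds spent processing the prompt before the first output token appears."""
    if prefill_tokens_per_s <= 0:
        raise ValueError("prefill_tokens_per_s must be positive")
    return prompt_tokens / prefill_tokens_per_s


def total_response_time(
    prompt_tokens: int,
    output_tokens: int,
    prefill_tokens_per_s: float,
    decode_tokens_per_s: float,
) -> float:
    """Prefill wait plus generation time, assuming constant throughput for both."""
    if decode_tokens_per_s <= 0:
        raise ValueError("decode_tokens_per_s must be positive")
    prefill_s = time_to_first_token(prompt_tokens, prefill_tokens_per_s)
    decode_s = output_tokens / decode_tokens_per_s
    return prefill_s + decode_s


if __name__ == "__main__":
    # Hypothetical throughputs, chosen only to show the shape of the problem.
    wait = time_to_first_token(100_000, prefill_tokens_per_s=500.0)
    total = total_response_time(100_000, 500, 500.0, 20.0)
    print(f"time to first token: {wait:.0f} s, total: {total:.0f} s")
```

With those placeholder figures, a 100,000-token prompt spends 200 s in prefill and only 25 s generating 500 tokens, so a decode speed that looks usable on paper says little about how long you actually wait.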


Replies

Tepix today at 5:52 PM

When you're using OpenCode it's easy to reach 100,000 tokens after a while.