Hacker News

ipdashc · yesterday at 2:18 PM

> Good models will require multiple Taalas chips

I guess that makes sense. Is this feasible, or does the added latency between chips kill any of the performance gains?


Replies

wmf · yesterday at 3:40 PM

Using multiple chips seems to work fine for Cerebras and Groq, so it should also work for Taalas. It does sound challenging to reach >10K tok/s, but inter-chip latency could be below 1 µs, which is a small part of the per-token budget.
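The latency argument above can be made concrete with some back-of-the-envelope arithmetic. At 10K tok/s each token gets 100 µs, so a few sub-microsecond hops use only a few percent of it. The Python sketch below assumes a few things that are not from the thread and are not Taalas specs: a hypothetical 4-chip split, a layer-wise (pipeline-style) partition where each token crosses every link once in sequence so hop latencies simply add, and the 1 µs per-hop figure taken as an upper bound.

```python
# Back-of-the-envelope check: how much of each token's time budget is
# spent on chip-to-chip hops? All numbers below are illustrative assumptions.


def per_token_budget_us(tokens_per_second: float) -> float:
    """Microseconds available per generated token at a given decode rate."""
    return 1_000_000 / tokens_per_second


def hop_overhead_fraction(tokens_per_second: float,
                          hops_per_token: int,
                          hop_latency_us: float) -> float:
    """Share of the per-token budget consumed by inter-chip latency.

    Assumes the model is split layer-wise across chips (pipeline style), so
    each token crosses every link once, one after another, and the hop
    latencies add up linearly.
    """
    hop_total_us = hops_per_token * hop_latency_us
    return hop_total_us / per_token_budget_us(tokens_per_second)


if __name__ == "__main__":
    rate = 10_000      # target decode speed in tok/s (from the comment)
    chips = 4          # hypothetical chip count
    hops = chips - 1   # links crossed per token in a 4-stage pipeline
    latency_us = 1.0   # assumed per-hop latency (upper bound from the comment)

    budget = per_token_budget_us(rate)
    share = hop_overhead_fraction(rate, hops, latency_us)
    print(f"budget per token: {budget:.0f} us")
    print(f"inter-chip share: {share:.1%}")
```

With these assumptions it reports a 100 µs budget with about 3% spent on hops. The share grows linearly with both chip count and per-hop latency, so a link ten times slower (10 µs per hop) would eat roughly 30% of the budget.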