Hacker News

wmf, yesterday at 3:40 PM

Using multiple chips seems to work fine for Cerebras and Groq, so it should also work for Taalas. It does sound challenging to reach >10K tok/s, but latency could be below 1 µs, which is a small part of the token budget (at 10K tok/s, each token gets 100 µs).