
ipdashc • yesterday at 2:18 PM

> Good models will require multiple Taalas chips

I guess that makes sense. Is this feasible, or does the added latency between chips kill any of the performance gains?

Replies

Using multiple chips seems to work fine for Cerebras and Groq, so it should also work for Taalas. It does sound challenging to reach >10K tok/s, but inter-chip latency could be below 1 µs, which is a small fraction of the ~100 µs per-token budget at 10K tok/s.
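A back-of-the-envelope sketch of that budget. It assumes a single sequential decode stream in which each token crosses every inter-chip link once. The function names, the chip counts, and the 1 µs hop figure are illustrative assumptions, not Taalas specifications.

```python
def per_token_budget_us(tokens_per_sec: float) -> float:
    """Time available per token, in microseconds, for one sequential decode stream."""
    return 1_000_000 / tokens_per_sec


def hop_overhead_fraction(tokens_per_sec: float, hop_latency_us: float, n_hops: int) -> float:
    """Fraction of the per-token budget spent on inter-chip hops.

    Hypothetical model: every token pays n_hops * hop_latency_us of pure
    link latency, with no overlap between hops and compute.
    """
    return n_hops * hop_latency_us / per_token_budget_us(tokens_per_sec)


if __name__ == "__main__":
    target = 10_000  # tok/s, the figure from the comment
    print(f"budget per token: {per_token_budget_us(target):.1f} us")  # 100.0 us
    # Assumed ~1 us per hop; one hop per chip boundary crossed.
    for hops in (2, 4, 8):
        frac = hop_overhead_fraction(target, 1.0, hops)
        print(f"{hops} hops -> {frac:.1%} of the per-token budget")
```

Under these assumptions even eight sub-microsecond hops use only a few percent of the budget. Real systems may add serialization, synchronization, or pipeline bubbles on top of raw link latency.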
