Hacker News

maz1b last Saturday at 7:57 PM

AFAIK, they don't have any deals or partnerships with Groq or Cerebras or any of those kinds of companies, so how did they do this?


Replies

tcdent last Saturday at 8:01 PM

Inference already runs on shared hardware, so by default you aren't getting the full bandwidth of the system. This most likely just allocates more resources to your request.
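A minimal sketch of that idea: if a shared pool splits capacity across requests by a priority weight, a "fast" tier simply gets a bigger slice. All names and the 4x weight here are illustrative assumptions, not anything the provider has documented.

```python
from dataclasses import dataclass

@dataclass
class Request:
    request_id: str
    tier: str  # "standard" or "fast"

# Assumed weight ratio for illustration only; the real ratio is unknown.
TIER_WEIGHTS = {"standard": 1.0, "fast": 4.0}

def allocate_capacity(requests, total_gpu_seconds):
    """Split a pool's GPU-seconds across requests proportionally to tier weight."""
    total_weight = sum(TIER_WEIGHTS[r.tier] for r in requests)
    return {
        r.request_id: total_gpu_seconds * TIER_WEIGHTS[r.tier] / total_weight
        for r in requests
    }

shares = allocate_capacity(
    [Request("a", "standard"), Request("b", "fast"), Request("c", "standard")],
    total_gpu_seconds=60.0,
)
print(shares)  # the "fast" request gets 4x the share of each standard one
```

Under this toy model, nothing about the hardware changes; the scheduler just skews the split toward higher-paying requests.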

hendersoon last Saturday at 7:58 PM

Could well be running on Google TPUs.

rvz yesterday at 8:31 AM

The models are running on Google TPUs.