Hacker News

maz1b, yesterday at 2:56 PM (5 replies)

Pretty huge move. Google and its TPUs are looking far more prescient now; I think they are on their 7th generation. The same goes for the offshoots they inspired, like the LPU, and perhaps others like Cerebras and its Wafer Scale Engine.

However, based on first impressions, this seems to be meant for the inference side and not training, which is also an interesting choice.


Replies

skeledrew, yesterday at 4:07 PM

Training is pretty much a one-time cost, and that cost is already coming down with architectural improvements. Inference, though, is an ongoing cost that over time takes orders of magnitude more resources, so making it far more efficient means much greater gains over time.

ggcr, today at 10:19 AM

With reinforcement learning, inference is now very present in the post-training stages too.

forrestthewoods, yesterday at 3:57 PM

Inference costs are higher than training now. I think.

Nvidia is king of general-purpose training chips. But inference chips can be specialized.

cactusplant7374, yesterday at 9:47 PM

Cerebras's Codex Spark 5.3 has been a huge flop: small context window and an old model. But hopefully they can improve so that we can benefit from 1000 tokens/second with GPT 5.5.

zer00eyz, yesterday at 4:09 PM

> early testing shows that Jalapeño will deliver performance per watt substantially better than current state-of-the-art

We're starting to see what really matters here, and though this is hand-wavy, the TPU makes similar claims.

I think Google's memo about having no moat still stands (see: https://newsletter.semianalysis.com/p/google-we-have-no-moat... if you are unaware). It kind of makes sense that all of this is looking more like the hardware race among IBM, DEC, Cray, and Sun from the 1960s to the 1990s. History doesn't repeat, but it often rhymes, and I suspect these efforts will follow the same trajectory.
