
Legend2440, last Friday at 10:36 PM

You are missing the point. This is a technology demonstration on prototype hardware, and no one intends it to be seriously useful.

Their architecture has fundamental speed and efficiency advantages over GPUs or Cerebras. They expect to scale up to real LLMs by splitting a model layer-wise across several chips, which they can do without incurring any throughput penalty.
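To make the "no throughput penalty" claim concrete, here is a rough back-of-envelope sketch of a layer-wise pipeline. It is not the vendor's model: the function name `pipeline_stats`, the stage times, and the link latency are all made-up illustrative values. It also assumes the pipeline is kept full with independent requests and that chip-to-chip transfers overlap with compute. Under those assumptions, steady-state throughput is set by the slowest stage, while per-request latency grows with the number of chips.

```python
def pipeline_stats(stage_times_s, link_latency_s=0.0):
    """Return (latency_s, throughput_per_s) for a linear pipeline of chips.

    stage_times_s: time each chip spends on its slice of layers (hypothetical).
    link_latency_s: chip-to-chip hop latency (hypothetical), assumed to
    overlap with compute so it adds latency but does not limit throughput.
    """
    if not stage_times_s:
        raise ValueError("need at least one stage")
    hops = len(stage_times_s) - 1
    # One request must pass through every stage and every link in order.
    latency = sum(stage_times_s) + hops * link_latency_s
    # With the pipeline full, one result completes per slowest-stage interval.
    throughput = 1.0 / max(stage_times_s)
    return latency, throughput


# Illustrative numbers only: a small model on one chip versus a model
# with 4x the layers split evenly across four chips.
one_chip = pipeline_stats([1e-3])
four_chips = pipeline_stats([1e-3] * 4, link_latency_s=1e-5)
print("one chip   (latency s, throughput/s):", one_chip)
print("four chips (latency s, throughput/s):", four_chips)
```

The catch this sketch glosses over is that a single autoregressive stream cannot fill the pipeline by itself, since each token depends on the previous one. The equal-throughput result only holds when enough independent requests are in flight.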


Replies

Kirby64, last Friday at 10:43 PM

> They expect to scale up to real LLMs by splitting a model layer-wise across several chips, which they can do without incurring any throughput penalty.

I’ll patiently wait to see this in reality. Their demonstration hardware is a 250W chip with an enormous die area for the model size. They’re making a lot of claims, but until they deliver, it’s nearly vaporware in my view.

I’d be happy to be proven wrong, but I think they’re going to run into hardware realities quite soon if they think they can just chain a bunch of chips together to achieve the same performance on larger model sizes.

__alexs, yesterday at 8:38 AM

Actually it's the opposite. Per mm² of silicon it's massively less efficient, and making enough chips and powering them is a major bottleneck right now. Worse, scaling to larger models requires more of our absolute best quality silicon manufacturing, whereas e.g. an H200 mostly just needs more memory.