Hacker News

Kirby64 last Friday at 10:01 PM, 4 replies

But it’s irrelevant. 750 tokens/s on a full frontier model is useful. 15,000 tokens/s of poor-quality output is much less useful, no matter how much scaffolding you put around it.


Replies

Legend2440 last Friday at 10:36 PM

You are missing the point. This is a technology demonstration on prototype hardware, and no one intends it to be seriously useful.

Their architecture has fundamental speed and efficiency advantages over GPUs or Cerebras. They expect to scale up to real LLMs by splitting a model layer-wise across several chips, which they can do without incurring any throughput penalty.
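To make the "no throughput penalty" claim concrete, here is a back-of-the-envelope sketch of an idealized layer-wise pipeline. Nothing in it comes from the vendor: the function name `pipeline_stats`, the per-chip stage times, and the hop cost are all made-up assumptions for illustration. It only shows the general property of pipelining, which is that per-token latency grows with the number of chips while steady-state throughput is set by the slowest stage.

```python
def pipeline_stats(stage_times_us, hop_time_us=0):
    """Idealized model of a network split layer-wise across chips.

    stage_times_us: time (microseconds) each chip spends on one token
                    for its slice of layers (hypothetical numbers).
    hop_time_us:    chip-to-chip transfer time per token (hypothetical).
    Returns (latency_us, tokens_per_second).
    """
    hops = len(stage_times_us) - 1
    # Latency: a single token has to pass through every stage and every hop.
    latency_us = sum(stage_times_us) + hop_time_us * hops
    # Throughput: once the pipeline is full, one token leaves per
    # bottleneck interval, regardless of how many stages there are.
    bottleneck_us = max(max(stage_times_us), hop_time_us)
    return latency_us, 1_000_000 / bottleneck_us


# Small model on one chip vs. a 4x deeper model on four chips,
# each chip doing the same amount of work per token.
small = pipeline_stats([1000])
large = pipeline_stats([1000, 1000, 1000, 1000])
print("1 chip :", small)
print("4 chips:", large)
```

In this toy model the four-chip pipeline has the same aggregate tokens/s as the single chip but four times the per-token latency. One caveat: autoregressive decoding of a single sequence cannot keep the pipeline full on its own, because each token depends on the previous one, so the aggregate figure assumes enough concurrent sequences are in flight. Whether that holds for this particular hardware is not something the thread establishes.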

trollbridge yesterday at 12:54 PM

I’ve been using a near-frontier model at 1,000 t/s for a month now. It’s very useful for agentic coding.

It does require new approaches for me personally since I get a lot less time to think or read its output.

windexh8er last Friday at 10:11 PM

I think you missed the point and either don't understand or aren't considering the utility of small language models (SLMs).

huflungdung last Friday at 11:18 PM

[dead]