Kirby64 • last Friday at 10:01 PM • 4 replies • view on HN

But it’s irrelevant. 750 tokens/s on a full frontier model is useful. 15,000 poor-quality tokens/s is much less useful, no matter how much scaffolding you put around it.

Replies

Legend2440 • last Friday at 10:36 PM

You are missing the point. This is a technology demonstration on prototype hardware, and no one intends it to be seriously useful.

Their architecture has fundamental speed and efficiency advantages over GPUs or Cerebras. They expect to scale up to real LLMs by splitting a model layer-wise across several chips, which they can do without incurring any throughput penalty.
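
(Editor's sketch, not from the commenter: a toy model of why a layer-wise split need not cost throughput. It assumes an idealized pipeline with zero interconnect cost and enough independent requests in flight to keep every chip busy; `pipeline_stats` and the stage times are hypothetical. A single autoregressive sequence cannot fill the pipeline by itself, since each token depends on the previous one.)

```python
def pipeline_stats(stage_times):
    """Return (latency, throughput) for an idealized pipeline.

    stage_times: time each chip spends on its slice of layers, per token.
    Assumes zero transfer cost between chips and a pipeline kept full by
    independent requests (both are simplifying assumptions).
    """
    if not stage_times or any(t <= 0 for t in stage_times):
        raise ValueError("stage_times must be non-empty and positive")
    latency = sum(stage_times)           # one token must pass through every stage
    throughput = 1.0 / max(stage_times)  # the slowest stage sets the steady-state rate
    return latency, throughput


# One chip running the whole model: 1.0 time unit per token.
single = pipeline_stats([1.0])

# The same layers split evenly across four chips: 0.25 time units per stage.
split = pipeline_stats([0.25, 0.25, 0.25, 0.25])

print(single)  # (1.0, 1.0)
print(split)   # (1.0, 4.0)
```

In this toy model, per-token latency stays the same while steady-state throughput scales with the number of balanced stages. Real hardware would add transfer time between chips and lose some throughput to uneven stages.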

trollbridge • yesterday at 12:54 PM

I’ve been using 1,000 t/s on a near-frontier model for a month now. It’s very useful for agentic coding.

It does require new approaches for me personally since I get a lot less time to think or read its output.

windexh8er • last Friday at 10:11 PM

I think you missed the point and either don't understand or aren't considering the utility of SLMs.

huflungdung • last Friday at 11:18 PM

[dead]
