Hacker News

The path to ubiquitous AI (17k tokens/sec)

659 points by sidnarsipur yesterday at 10:32 AM | 376 comments

Comments

notenlish yesterday at 11:06 AM

Impressive stuff.

GaggiX yesterday at 1:02 PM

For fun I'm imagining a future where you would be able to buy an ASIC with a hard-wired 1B-parameter LLM in it for cents, and it could be used everywhere.

standeven yesterday at 3:51 PM

Holy shit this is fast. It generated a legible, original, two-paragraph story on given topics in 0.025s.

PrimaryExplorer yesterday at 12:51 PM

This is absolutely mind-blowing speed. Imagine this with Opus or 5.2.

YetAnotherNick yesterday at 11:48 AM

17k tokens/sec works out to $0.18/chip/hr for a chip the size of an H100 if they want to compete with the market rate[1]. But 17k tokens/sec could lead to some new use cases.

[1]: https://artificialanalysis.ai/models/llama-3-1-instruct-8b/p...
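A quick back-of-the-envelope sketch of that figure (my own, not from the comment; the per-token rate below is an assumption reverse-engineered from the $0.18/hr number, not a quoted market price):

```python
# Rough economics sketch: what a chip must earn per hour to match a given
# per-million-token rate. The rate here is an assumption, not an official price.
tokens_per_sec = 17_000
price_per_million_tokens = 0.003  # USD per 1M output tokens (assumed)

tokens_per_hour = tokens_per_sec * 3600               # 61.2M tokens/hr
revenue_per_hour = tokens_per_hour / 1e6 * price_per_million_tokens
print(f"${revenue_per_hour:.2f}/chip/hr")             # ~$0.18/chip/hr at the assumed rate
```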

OrvalWintermute yesterday at 4:32 PM

wow that is fast!

heliumtera yesterday at 7:45 PM

Yep, this is the most exciting demo for me yet. Holy cow this is unbelievably fast.

The most impressive demo since GPT-3, honestly.

Since we already have open-source models that are plenty good, like the new Kimi K2.5, all I need is the ability to run them at a moderate speed.

Honestly, I am not bullish on capabilities that models do not yet have; it seems we have seen it all, and the only advancement has been context size.

And honestly, I would claim this is the market sentiment as well: Anthropic showed Opus 4.6 first, but the big release was actually Sonnet, the model people would use routinely. Nobody gave a shit about Gemini 3.1 Pro; 3.0 Flash was very successful...

Despite all the developments of the last 12 months, no new use cases have opened up for me. But given this insane speed, even with a limited model/context size, we would approach AI very differently.

hagbard_c yesterday at 1:35 PM

Holy Hallucinations, Batman...

You: What is the airspeed of an unladen swallow/

Jimmy:

A reference to the classic line from Monty Python and the Holy Grail!

Thus far it is correct, but here it comes:

In the movie, a group of medieval scholars, the Knights who say "Ni!" (also known as the Bridgekeepers), ask a king's servant, a blacksmith, if an unladen swallow makes good time. The blacksmith, played by John Cleese, responds with the now-famous line:

"And what's the airspeed velocity of an unladen swallow?"

To which the blacksmith responds:

"What do you mean? An unladen swallow?"

The blacksmith then explains that an unladen swallow is when a swallow (a type of bird) is flying in the air without carrying any burden, implying that it is moving at a speed that is not affected by carrying any weight or load.

Nope, that is not at all how it went. No blacksmiths, and it's not related to the Knights who say 'Ni!' - who have nothing scholarly about them; they wanted a shrubbery and demanded another one when their wish was fulfilled. It was the Bridgekeeper - not related to the aforementioned knights - who asked three questions and would throw anyone who could not answer all three into the chasm. When he asked Arthur - played by Graham Chapman, not Cleese - the question about the airspeed of the swallow, Arthur asked back whether he meant an African or European swallow. The Bridgekeeper did not have an answer and was thrown into the chasm; problem solved.

However, in reality, swallows are not typically "unladen" in flight. They are small birds that fly at relatively fast speeds, usually around 10-15 km/h (6-9 mph), but they are not usually carrying any burdens!

Needless LLM-blabber.

The "airspeed velocity of an unladen swallow" has become a meme and a cultural reference point, often used humorously or ironically to refer to situations where someone is trying to make an absurd or non-sensical argument or ask an absurd question.

Somewhat correct but not necessary in this context.

The correct answer to the question would have been "Do you mean an African or European swallow?", followed by a short reference to the movie.

Of course this demo is not about the accuracy of the model - 'an old Llama', as mentioned elsewhere in this thread - but it does show that speed isn't everything. For generating LLM slop this hardware implementation probably offers an unbeatable price/performance ratio, but it remains to be seen whether it can be combined with larger, less hallucination-prone models.

pelasaco yesterday at 1:08 PM

Is it already available to buy, or is this a “pay now, get it later” kind of new ASIC miner? Sorry for being skeptical, but AI is the new "crypto coin", and the crypto bros are still around.


small_model yesterday at 11:59 AM

Scale this, then close the loop: have fabs spit out new chips with the latest weights every week, placed into servers by robots. How long before AGI?

fragkakis yesterday at 11:47 AM

The article doesn't say anything about the price (it will be expensive), but this doesn't look like something the average developer would purchase anyway.

An LLM's effective lifespan is a few months (i.e. the amount of time it is considered top-tier), so it wouldn't make sense for a user to purchase hardware that would be superseded in a couple of months.

An LLM hosting service, however, where the chip would operate 24/7, would be able to make up for the investment.
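To make the utilization argument concrete, here is a hypothetical break-even sketch; the chip price is a made-up placeholder (the article gives no price) and the hourly revenue reuses the assumed figure from the earlier comment:

```python
# Hypothetical break-even comparison: occasional personal use vs. 24/7 hosting.
# All numbers are placeholder assumptions, not figures from the article.
chip_price = 5_000.0      # USD, assumed purchase price
revenue_per_hour = 0.18   # USD/hr at an assumed market token rate (see above)

for label, hours_per_day in [("personal use, 2h/day", 2), ("hosting service, 24/7", 24)]:
    break_even_days = chip_price / (revenue_per_hour * hours_per_day)
    print(f"{label}: ~{break_even_days:,.0f} days to break even")
```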

viftodi yesterday at 11:26 AM

I tried the trick question I saw here before, about making 1000 with nine 8s using only addition.

I know it's not a reasoning model, but I kept pushing it, and eventually it gave me this as part of its output:

888 + 88 + 88 + 8 + 8 = 1060, too high... 8888 + 8 = 10000, too high... 888 + 8 + 8 +ประก 8 = 1000,ประก

I googled the strange symbol; it seems to mean 'set' in Thai?
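For reference, the puzzle is normally posed with eight 8s (888 + 88 + 8 + 8 + 8 = 1000); with nine 8s and addition alone it is impossible, since any sum of runs of 8s built from nine 8s is divisible by 9 while 1000 is not. A small brute-force sketch (mine, not from the thread) confirms both cases:

```python
from itertools import product

# Brute-force check of the "make 1000 with only 8s and addition" puzzle.
# Addends are runs of 8s; we count how many 8s each combination consumes.
terms = [8, 88, 888, 8888]
digit_cost = {8: 1, 88: 2, 888: 3, 8888: 4}

def solutions(total_eights):
    """Yield addend counts per term that sum to 1000 using exactly `total_eights` 8s."""
    ranges = [range(total_eights // digit_cost[t] + 1) for t in terms]
    for counts in product(*ranges):
        eights_used = sum(c * digit_cost[t] for c, t in zip(counts, terms))
        value = sum(c * t for c, t in zip(counts, terms))
        if eights_used == total_eights and value == 1000:
            yield dict(zip(terms, counts))

print(list(solutions(8)))  # [{8: 3, 88: 1, 888: 1, 8888: 0}] -> 8+8+8+88+888 = 1000
print(list(solutions(9)))  # [] -> no solution with nine 8s
```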
