windexh8er • last Friday at 7:58 PM • 4 replies

What about 15k tokens per second? [0] I remember looking at this earlier in the year and it being so fast that it feels fake. And, yes, this model is old - but still awesome for what it is.

[0] https://chatjimmy.ai/

Replies

Kirby64 • last Friday at 8:36 PM

It’s not just old, it’s also tiny and quantized. It’s Llama 3.1 8B at 3/6-bit quantization. This is the type of thing you can run on almost any device…
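
For a rough sense of why a model that size fits on almost any device, here is a back-of-the-envelope sketch of the weight footprint. It assumes "8b" means roughly 8e9 parameters and counts weights only (no KV cache, activations, or quantization metadata), so treat the numbers as approximate; `weight_bytes` is just an illustrative helper.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in bytes."""
    return n_params * bits_per_weight / 8


# Assumption: "8b" is taken as 8e9 parameters; the exact count differs slightly.
N_PARAMS = 8_000_000_000

# 3 and 6 bits bracket the "3/6-bit" quantization; 16-bit is shown for comparison.
for bits in (3, 6, 16):
    gb = weight_bytes(N_PARAMS, bits) / 1e9
    print(f"{bits:>2}-bit: ~{gb:.0f} GB of weights")
```

Under those assumptions the quantized weights land somewhere between about 3 GB and 6 GB, versus about 16 GB at 16-bit.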

ehsankia • yesterday at 6:03 AM

I just tried it, and the answer is nonsense.

I asked it something simple (list some good indie puzzle games), and half the answers were games that don't exist. IMO quality > speed.

partsch • last Friday at 9:04 PM

They baked the LLM into a CPU

calvinmorrison • yesterday at 1:15 AM

at 15K tokens/s... do you need code anymore
