Taalas is interesting. 16,000 TPS for Llama on a chip.
Neat! I had been wondering if anyone was trying to implement a model directly in silicon. We're getting closer to having chatty talking toasters every day now!
It's exciting to see, but look at the die size for only an 8B model.
I wonder how many tokens per second they could get if they put Mercury 2 on a chip.
On a very old model, it's more like 16,000 garbage words/s.