https://chatjimmy.ai is a demo of the "burn the model onto an ASIC" approach sold by Taalas[0], which they use to run Llama 3.1 8B at ~17,000 tokens per second.
Not to downplay their accomplishment, but Llama 3.1 8B is badly outdated at this point. It's cool that they were able to accelerate a model in silicon, but it also feels wasteful given how limited that model is.