
Etheryte · yesterday at 12:44 PM · 3 replies

It is incredibly fast, on that I agree, but even the simple queries I tried got very inaccurate answers. That makes sense: it's essentially a trade-off of how much time you give it to "think". But if it's fast to the point of having no accuracy, I'm not sure I see the appeal.


Replies

andrewdea · yesterday at 1:27 PM

The hardwired model is Llama 3.1 8B, a lightweight model from two years ago. Unlike other models, it doesn't use "reasoning": the time between question and answer is spent predicting the next tokens. It doesn't run faster because it spends less time "thinking"; it runs faster because its weights are hardwired into the chip rather than loaded from memory. A larger model on a larger hardwired chip would run about as fast and give far more accurate results. That's what this proof of concept shows.
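For intuition on the "hardwired vs. loaded from memory" point, here's a rough back-of-envelope sketch. The byte-per-parameter count and bandwidth figures are illustrative assumptions, not specs of this chip:

```python
# Back-of-envelope: single-batch token generation is usually memory-bandwidth
# bound, because every new token streams roughly all the weights past the
# compute units. All numbers below are illustrative assumptions.

PARAMS = 8e9             # Llama 3.1 8B
BYTES_PER_PARAM = 2      # assuming fp16/bf16 weights
WEIGHT_BYTES = PARAMS * BYTES_PER_PARAM

def tokens_per_second(effective_bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode speed given how fast weights can be read."""
    return (effective_bandwidth_gb_s * 1e9) / WEIGHT_BYTES

print(f"~{tokens_per_second(3_000):.0f} tok/s with ~3 TB/s of HBM")
print(f"~{tokens_per_second(100_000):.0f} tok/s if weights sit next to the logic")
```

The point is only that the bottleneck is weight access, not "thinking" time, so putting the weights on the die changes the ceiling by orders of magnitude.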

kaashif · yesterday at 12:46 PM

If it's incredibly fast at a 2022 state-of-the-art level of accuracy, then surely it's only a matter of time until it's incredibly fast at a 2026 level of accuracy.

scotty79 · yesterday at 1:15 PM

I think it might be pretty good for translation, especially when fed small chunks of the content at a time so it doesn't lose track on longer texts.
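A minimal sketch of that chunk-at-a-time idea, assuming a hypothetical `translate(text) -> str` callable backed by the fast model (the paragraph-splitting rule and function names are placeholders, not anything from the article):

```python
def translate_document(document: str, translate) -> str:
    """Translate a long text one paragraph at a time.

    `translate` is any callable mapping a short source-language string to its
    translation; keeping each call small helps the model stay on track.
    """
    chunks = [p for p in document.split("\n\n") if p.strip()]
    return "\n\n".join(translate(chunk) for chunk in chunks)
```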