Hacker News

exabrial (yesterday at 12:46 AM)

I feel like we need an entirely new type of silicon for LLMs. Something completely focused on bandwidth and storage probably at the sacrifice of raw computation power.
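The intuition here can be made concrete with a rough calculation: autoregressive decoding streams essentially all model weights from memory once per generated token, so single-stream tokens/sec is capped by memory bandwidth divided by model size, regardless of FLOPs. The sketch below uses illustrative numbers (parameter count, weight precision, bandwidth), not measured figures:

```python
# Back-of-envelope sketch: why LLM decoding is memory-bandwidth-bound.
# Generating one token requires reading (roughly) every weight once,
# so tokens/sec <= memory bandwidth / total weight bytes.
# All numbers below are illustrative assumptions.

def decode_tokens_per_sec(params_billions: float,
                          bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream autoregressive tokens/sec."""
    weight_gb = params_billions * bytes_per_param  # GB read per token
    return bandwidth_gb_s / weight_gb

# A hypothetical 8B-parameter model with 16-bit weights on ~1 TB/s memory:
print(decode_tokens_per_sec(8, 2, 1000))  # ~62.5 tok/s ceiling

# Same model quantized to 8-bit weights:
print(decode_tokens_per_sec(8, 1, 1000))  # ~125 tok/s ceiling
```

Under these assumptions, raising compute doesn't help at all; only more bandwidth (or smaller/closer weights) moves the ceiling, which is the trade-off the comment is pointing at.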


Replies

garethsprice (yesterday at 12:50 PM)

Something like this? (Llama 3.1-8B etched into custom silicon, delivering 16,000 tok/s without using much PCIe bandwidth):

- https://taalas.com/the-path-to-ubiquitous-ai/
- https://chatjimmy.ai/
