I feel like we need an entirely new type of silicon for LLMs: something focused on memory bandwidth and storage, probably at the sacrifice of raw compute power.
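Rough napkin math on why bandwidth dominates: at batch size 1, generating each token streams the full weight set from memory, so tokens/sec is capped at bandwidth divided by model size in bytes. A sketch of that bound (the bandwidth figures here are illustrative assumptions, not measurements from any specific chip):

```python
def decode_tok_per_s(params: float, bytes_per_param: float, bandwidth_bps: float) -> float:
    """Upper bound on batch-1 decode throughput: bandwidth / model size.

    Assumes memory bandwidth is the only bottleneck and every generated
    token reads all weights exactly once (no speculative decoding, no batching).
    """
    return bandwidth_bps / (params * bytes_per_param)

# Llama 3.1-8B at fp16 is ~16 GB of weights.
# Assumed HBM-class bandwidth (~3.35 TB/s, roughly H100-like):
gpu = decode_tok_per_s(8e9, 2, 3.35e12)
# Hypothetical weights-on-die part with ~100 TB/s effective on-chip bandwidth:
onchip = decode_tok_per_s(8e9, 2, 100e12)
print(f"{gpu:.0f} tok/s (off-chip HBM) vs {onchip:.0f} tok/s (weights on die)")
```

The point of the exercise: the same model jumps from hundreds to thousands of tok/s purely by moving the weights closer to the compute, with no change in FLOPs.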
Something like this? (Llama 3.1-8B etched into custom silicon delivering 16,000 tok/s, doesn't use much PCIe bandwidth):
- https://taalas.com/the-path-to-ubiquitous-ai/
- https://chatjimmy.ai/