Hacker News

RedCinnabar, yesterday at 5:38 PM, 4 replies

Call me back when you can run these models on 16GB of RAM and any recent i5/i7. Until then, there’s no point in using these toy models.


Replies

guax, yesterday at 7:48 PM

It's so funny, these "toy models" would have been the wet dreams of researchers not even 5 years ago.

Progress marches without mercy.

giancarlostoro, yesterday at 5:39 PM

You need it to run in about 8 GB so you have extra space for the context window.

Catloafdev, yesterday at 5:40 PM

Hello, it's the internet calling, today is that day.

https://github.com/ikawrakow/ik_llama.cpp

Edit: it's gonna be slow if you're not using any VRAM. But it's possible. Software isn't going to speed that up anytime soon, it's just a hardware bandwidth limit.
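
To put a rough number on that bandwidth limit, here is a back-of-the-envelope sketch. It assumes generation is purely memory-bound, so every active weight is streamed from RAM once per token. The function name, the 4-bit quantization, and the 3B-active-parameter figure are illustrative assumptions, not numbers from this thread or measurements of ik_llama.cpp.

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          active_params_billion: float,
                          bits_per_weight: float) -> float:
    """Upper bound on decode speed if each token reads all active weights once."""
    # Billions of parameters * bits / 8 = gigabytes read per generated token.
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Theoretical peak of dual-channel DDR4-3200:
# 3200 MT/s * 8 bytes per transfer * 2 channels = 51.2 GB/s.
ddr4_dual_channel_gb_s = 3200e6 * 8 * 2 / 1e9

# Hypothetical dense 35B model at 4 bits per weight.
dense_bound = max_tokens_per_second(ddr4_dual_channel_gb_s, 35.0, 4.0)

# Hypothetical MoE that only touches 3B parameters per token (assumed figure).
moe_bound = max_tokens_per_second(ddr4_dual_channel_gb_s, 3.0, 4.0)

print(f"dense: {dense_bound:.1f} tok/s, MoE: {moe_bound:.1f} tok/s")
```

Real throughput lands below these bounds, since sustained bandwidth is lower than the theoretical peak and the context cache also has to be read. The point is only that the ceiling scales with bytes moved per token, which no software trick removes.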

jboss10, yesterday at 8:08 PM

They can be run on 32GB of RAM with 8GB of VRAM. I don't think these will run on 16GB for a while. (35B MoE)
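
The arithmetic behind that, as a rough sketch: it assumes 4-bit quantized weights and ignores the context cache and runtime overhead, so real usage is higher. The helper name and the bit width are assumptions for illustration.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# 35B parameters at an assumed 4 bits per weight.
weights_gb = weight_footprint_gb(35.0, 4.0)

fits_in_16gb = weights_gb < 16           # RAM only, before OS and context
fits_in_32gb_plus_8gb = weights_gb < 32 + 8  # RAM plus VRAM

print(weights_gb, fits_in_16gb, fits_in_32gb_plus_8gb)
```

Under these assumptions the weights alone come to 17.5 GB, which already exceeds 16GB before any context is allocated.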
