
akulbe • yesterday at 10:36 PM • 1 reply

Any chance you'd be willing to talk further about your setup? I have 2 x 3090s in a local machine, and I'm still left with questions about how best to use stuff locally.

Replies

sheeshkebab • today at 12:25 AM

You can only run heavily quantized models on RTX 30/40/50-series GPUs (32 GB of VRAM or less), and you probably want MoE versions like Qwen 35b for this to run at a speed somewhat comparable to Claude. It's still not there, to be honest, but it's getting there. Personally I mess around with llama.cpp on an M5 Max with 128 GB. It's a decent setup for trying various medium-sized things, and it runs LLMs surprisingly well without quantization, at least the MoE models.
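To put rough numbers on the VRAM point, here is a back-of-the-envelope sketch of how much memory the weights alone take at different precisions. Everything in it is an assumption for illustration: "35b" is read as 35 billion parameters, the function names `weight_gb` and `fits` are made up, the 0.8 headroom factor is a guess at what is left after KV cache and runtime overhead, and 24 GB per 3090 is the card's stock VRAM. Real quantized files vary in size depending on the format.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal GB.

    Ignores KV cache, activations, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, headroom: float = 0.8) -> bool:
    """True if the weights fit in `headroom` (assumed fraction) of memory."""
    return weight_gb(params_billion, bits_per_weight) <= memory_gb * headroom


# Hypothetical setups taken from the thread: one 3090, two 3090s, 128 GB unified memory.
setups = {"1 x 3090": 24, "2 x 3090": 48, "128 GB unified": 128}

for bits in (16, 8, 4):
    size = weight_gb(35, bits)
    ok = [name for name, mem in setups.items() if fits(35, bits, mem)]
    print(f"35B @ {bits}-bit: ~{size:.1f} GB weights, fits on: {ok}")
```

Under these assumptions a 35B model needs about 70 GB for weights at 16 bits and about 17.5 GB at 4 bits, which is why the unquantized version is only plausible on the 128 GB machine while the 3090s need quantization. Note that an MoE model still has to hold all of its experts in memory; the speed benefit comes from only a subset of the parameters being active per token.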

