You're not sharing what quantization you're using. In my experience, anything below Q8 and smaller than ~30B tends to be basically useless locally, at least for what you'd typically use codex et al. for; I'm sure it works for very simple prompts.
But as soon as you go below Q8, the models get stuck in repetition loops, get the tool-calling syntax wrong, or just start outputting gibberish after a short while.
Will do that in an edit to the post.