Hacker News

BoredomIsFun · today at 8:06 AM · 1 reply

All the LFM models I've tried seemed to suffer from serious coherence issues. I found the Gemmas best at tasks requiring rock-solid coherent output; even Qwen isn't comparable.


Replies

1dom · today at 8:50 AM

I think context length is important to consider here.

I find Gemmas really good for a short conversation with maybe 3 or 4 exchanges of a few paragraphs each, which covers a surprisingly large number of interactions.

For anything longer form though, particularly with larger code contexts, Qwen is far more useful for me personally.

I'm not an expert in this field, but my understanding is that Qwen uses a hybrid gated attention mechanism, whereas Gemma's hybrid includes a sliding-window attention mechanism, which makes it look like it favours the most recent tokens a little too much at times. There's a rough sketch of the idea below.
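To make the sliding-window point concrete, here's a minimal sketch of what such an attention mask does (purely illustrative; the function name and window size are mine, and real Gemma models interleave local and global attention layers rather than using only this):

    import torch

    def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
        # Each query position i may attend to key positions j where
        # j <= i (causal) and i - j < window (recency window).
        i = torch.arange(seq_len).unsqueeze(1)  # query positions
        j = torch.arange(seq_len).unsqueeze(0)  # key positions
        return (j <= i) & (i - j < window)

    # With seq_len=6 and window=3, anything more than 2 tokens back is
    # masked out entirely in these layers, which is one way older
    # context can fade compared with full global attention.
    print(sliding_window_mask(6, 3).int())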

This is all in the context of local quantized models; I'm aware both have larger cloud variants that wouldn't suffer as much.