Hacker News

iknowstuff, last Monday at 7:02 PM

Gemma 31B scoring below 26B-A4B?


Replies

gertlabs, last Monday at 7:10 PM

In one-shot coding, surprisingly, yes, by a decent amount. And it isn't a sample size issue. In agentic tasks, no: https://gertlabs.com/?agentic=agentic

My early takeaway is that Gemma 26B-A4B is the best tuned of the bunch, but because it is small and has few active params, it's severely constrained by context (large inputs and tasks with large required outputs tank Gemma 26B-A4B's performance). We're working on a clean visualization for this; the data is there.

It's not uncommon for a sub-release of a model to show improvements across the board on its model card, but actually have mixed real performance compared to its predecessor (sometimes even being worse on average).
