Hacker News

ACCount37 yesterday at 11:07 PM

And then that 10M param GRAM went and got its shit kicked in by Grok 4.20 Blaze It Edition - on the same ARC-AGI battery. I know how that story goes.

It's the pattern with those "stupid specific architectures". Very good at this one thing. But only ever "good for their size", and only to a point.

They don't scale up and they don't generalize. Go far enough on task complexity and LLMs just kill them.

Does that make them useless? As an LLM replacement, yes. In general? Maybe not, I can think of things. But I have yet to find any paper demonstrating a real-world use.


Replies

onlyrealcuzzo today at 12:37 AM

GRAM is something you add onto an LLM... It's not an LLM replacement. It's like an MLA caching layer, an MoE routing layer, or a speculative decoder at the end...