Hacker News

sometimelurker, yesterday at 6:20 PM

I looked into the "GRAM" stuff that a sibling comment links to. A few points:

- this gets reinvented/rediscovered constantly under different names

- it can't be trained very well (right now, will change)

- massive theoretical improvements over current models (a sampled token carries only log_2(vocabsize) ≈ 17 bits, while the residual stream has thousands of dimensions, so recursing on the hidden state means ~3 OoM more information bandwidth)

- BUT it can't be interpreted or aligned <- this is why no one uses it and no one talks about it. the idea is 100% obvious to all the frontier labs and there is a good reason why it isn't used

I follow this stuff closely and I think I know what I'm talking about. (Edited for formatting.)
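For anyone who wants to check the bandwidth arithmetic in the third bullet, here is a rough back-of-envelope sketch. Everything numeric in it is an assumption for illustration, not something taken from the GRAM material: a vocabulary of 2^17 tokens (which is what gives exactly 17 bits), a residual stream width of 4096, and a guessed 4 usable bits per dimension. It compares raw capacity only, so treat the result as an upper bound and not as a measured gain.

```python
import math


def token_bits(vocab_size: int) -> float:
    """Bits carried by one sampled token: log2 of the vocabulary size."""
    return math.log2(vocab_size)


def hidden_state_bits(d_model: int, bits_per_dim: float) -> float:
    """Raw capacity of one residual-stream vector, in bits."""
    return d_model * bits_per_dim


def bandwidth_ratio(vocab_size: int, d_model: int, bits_per_dim: float) -> float:
    """How many times more bits a hidden state can carry than one token."""
    return hidden_state_bits(d_model, bits_per_dim) / token_bits(vocab_size)


# All three values below are assumptions chosen for illustration.
VOCAB_SIZE = 2 ** 17      # 131072 tokens, so log2 is exactly 17
D_MODEL = 4096            # "thousands of dimensions"
BITS_PER_DIM = 4.0        # guessed usable precision per dimension

ratio = bandwidth_ratio(VOCAB_SIZE, D_MODEL, BITS_PER_DIM)
orders_of_magnitude = math.log10(ratio)

print(f"bits per token:        {token_bits(VOCAB_SIZE):.0f}")
print(f"bits per hidden state: {hidden_state_bits(D_MODEL, BITS_PER_DIM):.0f}")
print(f"ratio:                 {ratio:.0f}x (~{orders_of_magnitude:.1f} OoM)")
```

With these assumed numbers the ratio comes out just under 1000x, which is where a "~3 OoM" figure can come from. Assuming the full 16 bits of a bf16 value per dimension pushes it to roughly 3.6 OoM, so the estimate is quite sensitive to the bits-per-dimension guess.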


Replies

onlyrealcuzzo, yesterday at 8:15 PM

> - this gets reinvented/rediscovered constantly under different names

What are the different names? I haven't seen this before.

> - it can't be trained very well (right now, will change)

If you're sure it will change, why are you certain that it hasn't yet? And if it has proven a 5000x boost in reasoning, why aren't they exploring this path more aggressively?

> the idea is 100% obvious to all the frontier labs and there is a good reason why it isn't used

Surely someone is willing to take a 5000x boost in reasoning on a small research model... None of them have even tried anything resembling this AFAIK. It does not seem like something 100% obvious to them.

l674, yesterday at 6:29 PM

Could you explain how/why GRAM cannot be interpreted or aligned the way current LLMs are? I'm not very familiar with how it works.
