Hacker News

jumploops, today at 6:29 AM (2 replies)

Completely agree!

It’s interesting to me how similar attempting to understand LLMs is to neuroscience.

“When we turn this bit off, this other thing happens… if we change these weights the Eiffel Tower is now in Rome”

We’re basically just probing around and trying to reverse engineer an emergent system.

To your point, this system may be quite different from model to model (human to human) although some similarities likely occur.

The comment I was responding to tried to belittle the OP’s understanding of transformers by mentioning that running an LLM at scale is much harder than the simple whiteboard diagram.

My point was simply that we don’t know why they work, and all the extra optimizations aren’t the “thing” that makes it emergent.

Simply scaling the “GPT” is good enough to see it, so the OP’s awe should stand.

(On a side note, what other architectures can we scale to find similar emergent behavior?)


Replies

galaxyLogic, today at 8:24 PM

Isn't the LLM simply predicting what the next sentences should be after the user's input, using its algorithm and the data it has extracted from existing texts on the internet? The algorithm that does that could have many different designs, some better and some worse at predicting what output makes the most sense next.

So what is it that we don't understand about why they work? The algorithm? We have the code. Why the specific algorithm makes such good predictions? I see it as a generalization of trying to predict who wins the Kentucky Derby.

trollbridge, today at 10:12 AM

Computer vision ends up displaying emergent behaviour. It just "figures out" things.