Hacker News

sulam yesterday at 11:28 PM

Fair, I should define what I mean by “under the hood.” I mean that models are still just being fed a stream of text (or other tokens, in the case of video and audio models), asked to predict the next token, and then asked to do that again. Nobody has discovered a technique that is different from that, at least not one that is in production. If you think there is one and people are just keeping it secret, well, you clearly don’t know how these places work. The elaborations that make this more interesting than the original GPT/Attention stuff are: 1) there is more than one model in the mix now, even though you may only be told you’re interacting with “GPT 5.4”; 2) there’s a significant amount of fine-tuning with RLHF in specific domains that each lab considers important, whether because of benchmarks, strategy, or just conviction (DeepMind, we see you). There’s also a lot of work going into speeding up inference and making it cheaper to operate. I probably shouldn’t forget tool use, for that matter, since that’s the only reason they can count the r’s in strawberry these days.
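A minimal sketch of that predict-append-repeat loop. The “model” here is a hypothetical hand-written bigram table (`NEXT_TOKEN_PROBS`), not a real network: an actual LLM conditions on the whole context and has a vocabulary of many thousands of tokens. Decoding is greedy so the output is deterministic.

```python
# Hypothetical toy "model": maps the last token to a distribution over next tokens.
# A real LLM conditions on the entire context window, not just the last token.
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"ran": 0.8, "</s>": 0.2},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}


def predict_next(context: list[str]) -> dict[str, float]:
    """Return a probability distribution over the next token given the context."""
    return NEXT_TOKEN_PROBS[context[-1]]


def generate(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    """Autoregressive loop: predict one token, append it, feed the result back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = predict_next(tokens)
        next_token = max(probs, key=probs.get)  # greedy: take the most likely token
        tokens.append(next_token)
        if next_token == "</s>":  # end-of-sequence token stops generation
            break
    return tokens


print(generate(["<s>"]))  # prints ['<s>', 'the', 'cat', 'sat', '</s>']
```

Everything interesting about a real model lives inside `predict_next`; the loop around it is this simple.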

None of that changes the concept that a model is fundamentally just very good at predicting what the next element in the stream should be, modulo the randomness injected when that element is sampled, controlled by a temperature setting. Why does that end up looking like intelligence? Well, because we see the model’s ability to be plausibly correct over a wide range of topics and we get excited.
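To make the temperature point concrete: a common approach divides the model’s raw scores (logits) by a temperature T before the softmax, then samples from the result. Low T sharpens the distribution toward the top token; high T flattens it. This is a minimal sketch with made-up logits; production samplers often layer extra tricks such as top-k or top-p filtering on top.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Sample the index of the next token from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 5.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

rng = random.Random(0)  # seeded so the draws are reproducible
print([sample_next(logits, 1.0, rng) for _ in range(10)])
```

The underlying scores never change; only how peaked the sampling distribution is does, which is why the same prompt can yield different continuations.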

Btw, don’t take this reductionist approach as being synonymous with thinking these models aren’t incredibly useful and transformative for multiple industries. They’re a very big deal. But OpenAI shouldn’t give up because Opus 4.whatever is doing better on a bunch of benchmarks that are either saturated or in the training data, or have been RLHF’d to hell and back. This is not AGI.


Replies

stavros yesterday at 11:50 PM

Everybody says "but they just predict tokens" as if that weren't just an "I hope you won't think too much about this" sleight of hand.

Why does predicting the next token mean that they aren't AGI? Please clarify the exact logical steps there, because I make a similar argument that human brains are merely electrical signals propagating, and not real intelligence, but I never really seem to convince people.

famouswaffles today at 12:58 AM

Next-token prediction is just the training objective. I could describe your reply to me as “next-word prediction” too, since the words necessarily come out one after another. But that framing is trivial. It tells you what the system is being optimized to do, not how it actually does it.

Model training can be summed up as: 'This is what you have to do (objective); figure it out. Oh, and here's a little skeleton that might help you out (architecture).'
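The objective/architecture split can be made concrete with a toy. This is a minimal sketch, assuming a hypothetical nine-word corpus and the simplest possible "architecture": a table of next-token scores indexed by the previous token (a bigram model), trained by plain gradient descent on next-token cross-entropy. Nothing tells the table which words follow which; the optimizer works that out from the objective alone. A real LLM's architecture is a transformer with billions of parameters, and what it learns is far less transparent than this table.

```python
import math


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def train_bigram(corpus: list[str], steps: int = 200, lr: float = 1.0):
    """Fit a previous-token -> next-token logit table by gradient descent.

    Objective: mean of -log p(actual next token) over all adjacent pairs.
    Returns (logit table, word->index map, final loss).
    """
    vocab = sorted(set(corpus))
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    W = [[0.0] * V for _ in range(V)]  # all zeros: uniform predictions at the start
    pairs = [(idx[a], idx[b]) for a, b in zip(corpus, corpus[1:])]
    for _ in range(steps):
        grads = [[0.0] * V for _ in range(V)]
        for p, n in pairs:
            probs = softmax(W[p])
            for j in range(V):
                # d(-log softmax(row)[n]) / d row[j] = probs[j] - 1[j == n]
                grads[p][j] += (probs[j] - (1.0 if j == n else 0.0)) / len(pairs)
        for i in range(V):
            for j in range(V):
                W[i][j] -= lr * grads[i][j]
    loss = sum(-math.log(softmax(W[p])[n]) for p, n in pairs) / len(pairs)
    return W, idx, loss


corpus = "the cat sat the cat ran the dog sat".split()
W, idx, loss = train_bigram(corpus)
probs_after_the = softmax(W[idx["the"]])
print(round(loss, 3), round(probs_after_the[idx["cat"]], 3), round(probs_after_the[idx["dog"]], 3))
```

After training, the row for `the` favors `cat` over `dog`, drifting toward the 2:1 split in the corpus as training runs longer, even though that fact was never written into the code.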

We spend millions of dollars and months training these frontier models precisely because the training process figures out numerous things we don't know or understand. Every day, in service of their replies (in service of 'predicting the next token'), Large Language Models perform sophisticated internal procedures far more complex than anything any human has come up with or understands. So when someone says they 'know how the models work under the hood', it's all very silly.

heavyset_go today at 12:08 AM

> Btw, don’t take this reductionist approach as being synonymous with thinking these models aren’t incredibly useful and transformative for multiple industries. They’re a very big deal. But OpenAI shouldn’t give up because Opus 4.whatever is doing better on a bunch of benchmarks that are either saturated or in the training data, or have been RLHF’d to hell and back. This is not AGI.

It's sad that you have to add this postscript lest you be accused of being ignorant or anti-AI because you acknowledge that LLMs are not AGI.

torginus today at 12:03 AM

If you wrote your comment after reading all the others in the chain, and then typed your response in one go, you 'just' did next-token prediction based on textual input.

I would still argue that does not prevent you from having intelligence, so that's why this argument is silly.