This thought that “maybe we are just next token predictors too” is not particularly clever. Most of us have thought about it, but a bit of experience with LLMs makes it obvious that’s not what’s going on here. I think it’s a bit like listening to a recording of a person and swearing there’s an actual person inside the recording device because the audible output is indistinguishable from the real thing. Why would you do that? You wouldn’t, unless you have no idea how a recording device works, in which case it seems like magic.
A one-way audio channel is indeed too weak for a listener to distinguish a live person from a recording, but a bidirectional audio channel is easily strong enough: the listener can verbally ask the person-or-recording a question and see whether the question is acknowledged.
I claim that a modern frontier LLM can be given simple instructions that make it impossible for a person to reliably distinguish it from a person over a bidirectional text-only medium.
Thank you for your completion
> a bit of experience with LLMs makes it obvious that’s not what’s going on here
I feel like that overstates the point quite a bit. There's a lot that's similar: neurotransmitter release is stochastic at the vesicle level, ion channels open and close probabilistically, and post-synaptic responses are noisy. A given neuron receiving identical input twice doesn't produce identical output. Neither brains nor LLMs have a central decider that forms intent and then implements it. In both, decisions emerge from network dynamics; a “decision” is a description of what the system did, not a separate cause (see Libet's experiments).
Now, pretty clearly there's a lot that's different, and of course we don't understand brains well enough to say just how similar they are to LLMs, but that's the point: it's an interesting thought experiment, and shutting it down with a virtual eyeroll is sad.