You can learn a lot from a model by asking about its sizing, although not necessarily anything about the sizing itself.
For instance, you can learn how much introspection has been trained in during RL, and you can sometimes learn whether output from other models has been incorporated into the RL training data.
I think of self-knowledge conversations with models as a recent nicety, and I stand by my assessment that this model was not trained using modern frontier RL workflows.
> you can’t use software to figure out the “process” used to manufacture the chip it is running on.
This seems so incorrect that I don't even know where to start parsing it. All chips are designed and analyzed with software; analysis of an unknown chip, say, starts with etching away the layers, imaging them, and then analyzing those images, all with software. But maybe another way to say that is "I don't understand your analogy."
> For instance, you can learn how much introspection has been trained in during RL,
That's not introspection: it's a simulacrum of it. Introspection, done right, lets you actually learn things about how your own mind functions (which I can't do reliably, but I have done on occasion, and occasionally I discover something that's true for humans in general, which I can later find described in the academic literature). Language models are inherently incapable of that. You could probably design a neural architecture that is capable of observing its own function by altering its operation; perhaps a recurrent or spiking neural network could learn such a behaviour under carefully engineered circumstances, although every training process I know of would have the model ignore whatever signals it was getting from its own architecture.
> all chip analysis, say of an unknown chip, starts with etching away layers
Good luck running any software on that chip afterwards.
> I don't even know where to start parsing it.
If it helps, the key part is: "that it is running on".
You can't use software running on a chip to analyse images of that same chip once it's been disassembled, because a disassembled chip can't run software!
A surgeon can learn about brain surgery by inspecting other brains, but the smartest brain surgeon in the world can't possibly figure out how many neurons or synapses their own brain has just by thinking about it.
Your meat substrate is inaccessible to your thoughts in exactly the same way that the number of weights, model architecture, runtime stack, CUDA driver version, and so on are totally inaccessible to an LLM.
It can be told, after the fact, in the same manner that a surgeon might study how brains work in a series of lectures, but that is fundamentally distinct.
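To make that concrete, here's a toy serving-harness sketch (every name in it is illustrative, not any real API): the host process can trivially query its own runtime stack, but none of that reaches the model unless the harness explicitly serializes it into the token stream.

```python
import platform
import sys

# The serving process can trivially inspect its own runtime stack...
runtime_facts = {
    "python": platform.python_version(),
    "platform": platform.platform(),
    "executable": sys.executable,
}

# ...but the model only ever "sees" the text it is fed. On its own,
# the prompt contains none of those facts:
prompt = "User: what hardware are you running on?\nAssistant:"

# The only way the model can "know" its runtime is if the harness
# injects the facts into the prompt, after the fact:
informed_prompt = (
    f"System: you are being served on {runtime_facts['platform']}.\n"
    + prompt
)
```

The asymmetry is the whole point: `runtime_facts` lives in the serving code's address space, not in the weights, so without that last injection step the model has nothing to introspect.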
PS: Most ChatGPT models didn't know what they were called either, and tended to report the name and properties of their predecessor model, which was in their training set. OpenAI eventually got fed up with people thinking this was a fundamental flaw (it isn't), and baked this specific set of metadata into the system prompt and/or the post-training phase.
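The "baked into the system prompt" fix looks roughly like this sketch (the model name, cutoff date, and message shape here are made up for illustration; this is not OpenAI's actual prompt):

```python
# Identity metadata the weights don't "know": the operator supplies it
# at serving time as the first message of every conversation.
IDENTITY_METADATA = {
    "name": "example-model-v2",     # hypothetical name
    "knowledge_cutoff": "2024-06",  # hypothetical cutoff
}

messages = [
    {
        "role": "system",
        "content": (
            f"You are {IDENTITY_METADATA['name']}, with a knowledge "
            f"cutoff of {IDENTITY_METADATA['knowledge_cutoff']}."
        ),
    },
    {"role": "user", "content": "What model are you?"},
]
```

When asked "What model are you?", the model then just reads its own name back out of the context window, which is telling, not introspection.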