Hacker News

oersted · yesterday at 10:36 PM

> The neurons serve as a biological filter: the training system translates screen pixels and ray-cast distances into electrical zaps, the living cells fire spikes, and those counts feed straight into a PyTorch decoder that maps them to Doom actions. The PPO agent, CNN encoder, and entire reward loop run on ordinary silicon elsewhere. Cole’s ablation modes make the split testable: set decoder output to random or zero and the game still plays. The CL1 hardware interface works exactly as advertised. What remains unproven is whether 200,000 human neurons can ever carry the policy instead of just riding along.

Yeah… That’s quite the smoking gun.

So it’s quite likely then that the neurons are just acting as a bad conductor. The electrodes read a noisy version of the signals that go into the neurons, and they just train a CNN with PPO to remove that noise, get the proper inputs, and learn a half-decent policy for playing the game.

If this worked as advertised they shouldn’t need a CNN decoder at all! The raw neuron readout should be interpreted as game inputs directly.

Besides, they are not streaming the video into the neurons at all, just the horizontal position of the enemies and the distance, or some variant of that. In that sense it’s barely more than Pong, isn’t it? If the enemy is left, rotate left; if right, rotate right; if centered, shoot. At a stretch: if the enemy is far, go forward; if close, go back. The rest of the time, just move randomly. Indeed, the behavior in the video is essentially that…
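That trivial policy fits in a few lines. This is just a sketch of the rule the comment describes; the observation format (signed enemy angle, distance) and all the thresholds are made-up assumptions, not anything from the project:

```python
import random

def heuristic_policy(enemy_angle, enemy_dist, center=5.0, far=300.0, near=100.0):
    """Hand-coded Doom policy from the comment: steer toward the enemy,
    shoot when centered, wander when nothing is visible.
    Angles are signed degrees (negative = left); thresholds are arbitrary."""
    if enemy_angle is None:  # no enemy on screen: move randomly
        return random.choice(["TURN_LEFT", "TURN_RIGHT", "MOVE_FORWARD"])
    if abs(enemy_angle) <= center:  # roughly centered
        if enemy_dist > far:
            return "MOVE_FORWARD"   # "if enemy far, go forward"
        if enemy_dist < near:
            return "MOVE_BACKWARD"  # "if enemy close, go back"
        return "ATTACK"             # "if enemy center, shoot"
    return "TURN_LEFT" if enemy_angle < 0 else "TURN_RIGHT"
```

If a policy this small already clears the level, it sets a low bar that any learned component (silicon or biological) has to beat.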

While we are at it, the encoded input signal itself is already pretty close to a decent policy if mapped directly to the keys (how much enemy left, center, right), even without any CNN, PPO or neurons.

EDIT: It seems the readme does address these concerns, and the described setup differs significantly from the description in the critical blog post. Still not entirely convincing to me, since a lot of weights are being trained in silicon around the neurons, but it sounds better. I don’t have time right now to look deeper into it. They outline some interesting details, though.

> Quote from: https://raw.githubusercontent.com/SeanCole02/doom-neuron/mai...

Isn't the decoder/PPO doing all the learning?

No, this is precisely why there are ablations. The footage you see in the video was taken using a 0-bias full linear readout decoder, meaning that the action selected is a linear function of the output spikes from the CL1; the CL1 is doing the learning. There is a noticeable difference when using the ablation (both random and 0 spikes result in zero learning) versus actual CL1 spikes.
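A zero-bias linear readout is easy to sketch, and the sketch also makes the "0 spikes" ablation concrete: with no bias term, an all-zero spike vector gives all-zero logits, so the argmax collapses to a constant action and no information can flow from the substrate. Channel and action counts below are invented for illustration, not taken from the project:

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_actions = 32, 5                  # illustrative sizes only
W = rng.normal(size=(n_actions, n_channels))   # learned readout weights

def select_action(spike_counts, W):
    """Zero-bias linear readout: action = argmax(W @ spike_counts)."""
    return int(np.argmax(W @ spike_counts))

real_spikes = rng.poisson(3.0, size=n_channels).astype(float)
zero_spikes = np.zeros(n_channels)             # the "0 spikes" ablation

action = select_action(real_spikes, W)         # depends on the spikes
ablated = select_action(zero_spikes, W)        # logits all zero: always action 0
```

With such a readout, any structure in the chosen actions has to come from the spikes themselves, which is presumably the point of the ablation.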

Isn't the encoder/PPO doing all the learning?

This question largely assumes that the cells are static, which is incorrect; the system is not a memoryless feed-X-in-get-Y machine. Both the policy and the cells are dynamical systems: biological neurons have internal state (membrane potential, synaptic weights, adaptation currents). The same stimulation delivered at different points in training will produce different spike patterns, because the neurons have been conditioned by prior feedback. During testing, we froze the encoder weights and still observed improvements in reward.

How is DOOM converted to electrical signals?

We train an encoder in our PPO policy that dictates the stimulation pattern (frequency, amplitude, pulses, and even which channels to stimulate). Because the CL1 spikes are non-differentiable, the encoder is trained through PPO policy gradients using the log-likelihood trick (REINFORCE-style), i.e., by including the encoder’s sampled stimulation log-probs in the PPO objective rather than backpropagating through spikes.
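The log-likelihood (score-function) trick can be illustrated without any differentiable path through the spikes. Below is a minimal NumPy sketch, not the project's code: a hypothetical linear encoder outputs per-channel stimulation probabilities, a pattern is sampled, and the gradient of the sampled pattern's log-probability, weighted by a stand-in advantage, updates the encoder. For a Bernoulli parameterised by logits z, d log p(stim)/dz = stim - p, which is the identity the sketch relies on:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, n_channels = 4, 8                    # illustrative sizes only

# Linear "encoder": game features -> per-channel stimulation logits.
W = rng.normal(scale=0.1, size=(n_channels, obs_dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

obs = rng.normal(size=obs_dim)                # stand-in game features
p = sigmoid(W @ obs)                          # prob of stimulating each channel
stim = (rng.random(n_channels) < p).astype(float)  # sampled stimulation pattern

# The spikes the pattern evokes are non-differentiable; only a reward
# signal (here a stand-in for a PPO advantage estimate) comes back.
advantage = 1.7

# Score function: d log p(stim | z) / dz = stim - p for Bernoulli logits z,
# so the encoder gradient is an outer product with the observation.
grad_W = advantage * np.outer(stim - p, obs)
W += 0.05 * grad_W                            # ascend: favor rewarded patterns
```

No gradient ever crosses the biological substrate; the encoder only needs the log-probability of what it sampled and a scalar credit signal, which is exactly why this estimator suits non-differentiable spikes.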


Replies

NooneAtAll3 · today at 10:24 AM

> If this worked as advertised they shouldn’t need a CNN decoder at all!

yeah!

the whole point was to make neurons BE the neural net