BenoitP • today at 1:24 PM • 0 replies • view on HN

> the whole process remains differentiable: we can even propagate gradients through the computation itself. That makes this fundamentally different from an external tool. It becomes a trainable computational substrate that can be integrated directly into a larger model.

IMHO this is the key point where the technique has an unfair advantage over a traditional interpreter.

How disruptive is differentiability? To me it would mean that some tweaking can happen inside an LLM program at training time, like changing a constant or switching from one function call to another. Can we run gradient descent effectively inside this huge space? And how different is it from tool-calling against a pool of learned programs (think GitHub, but for LLM programs written in classic languages)?
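To make the "change a constant, switch a function" idea concrete, here is a toy sketch in plain Python with hand-derived gradients. Everything in it is my own assumption for illustration and not the article's method: the two candidate functions (`c * x` and `x + c`), the sigmoid gate that softly picks between them, the target `y = 3 * x`, and the names `program`, `loss_and_grads` and `train`.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def program(x, c, a):
    # Hypothetical "soft" program: a gate w in (0, 1) blends two candidate
    # function calls instead of hard-coding one of them.
    w = sigmoid(a)
    return w * (c * x) + (1.0 - w) * (x + c)

def loss_and_grads(xs, ys, c, a):
    # Mean squared error plus its gradients w.r.t. the constant c
    # and the gate logit a, derived by hand with the chain rule.
    w = sigmoid(a)
    n = len(xs)
    loss = grad_c = grad_a = 0.0
    for x, y in zip(xs, ys):
        err = program(x, c, a) - y
        loss += err * err / n
        grad_c += 2.0 * err * (w * x + (1.0 - w)) / n
        grad_a += 2.0 * err * w * (1.0 - w) * (c * x - (x + c)) / n
    return loss, grad_c, grad_a

def train(steps=2000, lr=0.05):
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ys = [3.0 * x for x in xs]   # assumed target behaviour: y = 3 * x
    c, a = 1.0, 0.0              # start with the wrong constant and an undecided gate
    initial_loss, _, _ = loss_and_grads(xs, ys, c, a)
    for _ in range(steps):
        _, grad_c, grad_a = loss_and_grads(xs, ys, c, a)
        c -= lr * grad_c
        a -= lr * grad_a
    final_loss, _, _ = loss_and_grads(xs, ys, c, a)
    return initial_loss, final_loss, c, sigmoid(a)

if __name__ == "__main__":
    print(train())
```

In this toy, gradient descent moves both the constant and the choice of function at once, which a hard `if` inside a classic interpreter would not allow. Note that the gate only approaches a hard choice asymptotically, and two parameters say nothing about whether this scales to the huge space of real programs, which is the open question.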