Except learning to reason is a far cry from curve fitting. Our brains have more than five parameters.
It's the statistics equivalent of 'no one needs more than 640kb of RAM'
speak for yourself!
reasoning capability might just be some specific combinations of mirror neurons.
even some advanced math usually evolves applying patterns found elsewhere into new topics
I agree, I don't think gradient descent is going to work in the long run for the kind of luxurious & automated communist utopia the technocrats are promising everyone.
After a quick content browse, my understanding is this is more like with a very compressed diff vector, applied to a multi billion parameter model, the models could be 'retrained' to reason (score) better on a specific topic , e.g. math was used in the paper