logoalt Hacker News

hodgehog11yesterday at 7:27 AM1 replyview on HN

I should have been more specific then. The problem isn't that the search space is too large to explore. The problem is that the search space is so large that the training procedure actively prefers to restrict the search space to maximise short term rewards, regardless of hyperparameter selection. There is a tradeoff here that could be ignored in the case of chess, but not for general math problems.

This is far from unsolvable. It just means that the "apply RL like AlphaGo" attitude is laughably naive. We need at least one more trick.


Replies

vatsachaktoday at 12:12 AM

The other trick could be bootstrapping through mathlib.

As you said brute forcing the search space as the starting procedure would take way too long for the AI to build intuition.

But if we could give it a million or so lemmas of human math, that would be a great starting point.