logoalt Hacker News

christianstumpyesterday at 3:54 PM1 replyview on HN

When you write "there's a notable difference between performing a literature search versus solving something de novo", you suggest that the questions we provided can be solved doing a literature search.

This is incorrect. What is correct is the following: When understanding the existing literature on a question in the dataset, one can derive the answer without creating new mathematics research.

So the difference is "searching the literature" vs "understanding the literature" that made me believe it. But if you didn't that's even better!


Replies

fc417fc802yesterday at 10:58 PM

I did not suggest that, no. I stress that claiming a possibility is not the same as claiming a fact.

I observed that the two things are quite different in terms of model capabilities. That's relevant when considering how to interpret the results of the benchmark. We need to differentiate between (at minimum) reproducing an (approximately) verbatim answer from the training set, assembling disparate items from the training set into an answer piecewise, and performing novel logical inference using items from the training set.

I further speculated about the intent of the authors but you seem to be saying that my guess was wrong. In response I will observe that for any problem that's known to be solved it's likely to be quite difficult if not impossible to confidently determine that the model performed a de novo derivation as opposed to finding pieces of the answer in various places.

Of course there's absolutely nothing wrong with the latter! It's just important to be aware of the possibility when drawing conclusions about model capabilities.