0x3f • today at 10:56 AM

I think this is a good way to test a certain kind of capability, but as to whether LLMs would pass such a test, I'm guessing almost certainly not. If you've ever used one for research, you'll know it's very 'in' the current literature, whatever that may be. It's an incredible retrieval tool, and it will glibly evaluate any novel ideas that you feed in, but its analyses are often incorrect when there's a paucity of directly relevant training data.

alt Hacker News