Hacker News

0x3f, today at 10:56 AM

I think this is a good way to test a certain kind of capability, but as to whether LLMs would pass such a test, I'm guessing almost certainly not. If you've ever used one for research, you'll have seen that it's very 'in' the current literature, whatever that may be. It's an incredible retrieval tool, and it will glibly evaluate any novel ideas that you feed in, but its analyses are often incorrect when there's a paucity of directly relevant training data.