logoalt Hacker News

twotwotwoyesterday at 6:39 PM0 repliesview on HN

For folks that like this kind of question, SimpleBench (https://simple-bench.com/ ) is sort of neat. From the sample questions (https://github.com/simple-bench/SimpleBench/blob/main/simple... ), a common pattern seems to be for the prompt to 'look like' a familiar/textbook problem (maybe with detail you'd need to solve a physics problem, etc.) but to get the actually-correct answer you have to ignore what the format appears to be hinting at and (sometimes) pull in some piece of human common sense.

I'm not sure how effectively it isolates a single dimension of failure or (in)capacity--it seems like it's at least two distinct skills to 1) ignore false cues from question format when there's in fact a crucial difference from the template and 2) to reach for relevant common sense at the right times--but it's sort of fun because that is a genre of prompt that seems straightforward to search for (and, as here, people stumble on organically!).