Hacker News

icedchai, today at 12:42 AM

Yes, if it can't answer this common-sense question correctly, what else has it screwed up and buried among all that slop?

Claude Opus 4.6 failed at first, even in "extended thinking" mode. I had to give it a pretty big hint for it to get the right answer: "Remember, my goal is to actually wash the car!" Only then did it get the correct answer. I will now call myself a Prompt Engineer.