
simianwords, today at 6:36 AM

> The complete failure of Claude to play Pokemon, something a small child can do with zero prior instruction

cherry-picking, because Gemini and GPT have beaten it. Claude doesn't have a good vision setup

> The "how many r's are in strawberry" question

models have been able to do this since 2024

> The "should I drive or walk to the car wash" question

the SOTA models get it right with reasoning

> fact that right now, today all models are very frequently turning out code that uses APIs that don't exist, syntax that doesn't exist, or basic logic failures.

not when you use a harness. even humans can't write code that works on the first attempt.
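
to make "harness" concrete, here's a rough sketch of the loop I mean: generate code, run it, and feed the error back for another try. everything here is made up for illustration. `fake_model` is a hypothetical stand-in for a real model call, and plain `exec` stands in for what would really be a sandboxed test run.

```python
import traceback


def run_candidate(source):
    """Execute candidate code; return (ok, error_text)."""
    try:
        # Assumption: exec in a fresh namespace stands in for a sandboxed run.
        exec(compile(source, "<candidate>", "exec"), {"__name__": "__candidate__"})
        return True, ""
    except Exception:
        return False, traceback.format_exc()


def harness(generate, max_attempts=3):
    """Call generate(feedback) until the code runs cleanly or attempts run out."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        source = generate(feedback)
        ok, error = run_candidate(source)
        if ok:
            return source, attempt
        feedback = error  # the next attempt gets to see the failure
    return None, max_attempts


def fake_model(feedback):
    """Hypothetical model: first calls an API that doesn't exist, then fixes it."""
    if feedback is None:
        return "import math\nassert math.squareroot(4) == 2\n"
    return "import math\nassert math.sqrt(4) == 2\n"


source, attempts = harness(fake_model)
print(attempts)
```

the first attempt dies on the nonexistent `math.squareroot`, the traceback goes back in, and the second attempt passes. the point is only that a made-up API gets caught by running the code, not that any particular tool works this way.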