failed the car wash test. i think instead of positioning as a general-purpose reasoning model, the...

findjashua • today at 7:10 PM • 1 reply

failed the car wash test.

i think instead of positioning it as a general-purpose reasoning model, they'd have more success focusing on a specific use case (e.g. coding agent) and benchmarking against the SOTA open models for that use case (e.g. qwen3-coder-next)

Replies

Jianghong94 • today at 7:15 PM

Honestly, I don't understand why they, or any fast-and-error-prone model, position themselves as coding agents; in my experience, I'd much rather work with a slow-but-correct model and let it run a longer session than hand-hold a fast-but-wrong model.

alt Hacker News
