failed the car wash test.
i think instead of postiioning as a general purpuse reasoning model, they'd have more success focusing on a specific use case (eg coding agent) and benchmark against the sota open models for the use case (eg qwen3-coder-next)
Honestly I don't understand why they/any fast-and-error-prone model position themselves as coding agents; my experience tells me that I'd much rather working with a slow-but-correct model and let it run longer session than handholding a fast-but-wrong model.