Hacker News

akie today at 6:58 AM

Why is Haiku the benchmark, though? With code generation, don't we primarily care about the quality of the code, not the speed or efficiency with which it's generated?


Replies

NitpickLawyer today at 7:28 AM

You would be surprised how much code Haiku writes behind the scenes, with the whole 'plan w/ Opus, spawn subagents w/ Haiku' thing that cc does. And you'd be surprised how useful the small models can be under some guidance / hand-holding. You can daily-drive gpt5-mini and still find it useful. They're not as good as the big ones, obviously, and can't handle a project start-to-finish on their own, but given a well-scoped task, they'll do it just fine.

epolanski today at 7:16 AM

I'm not sure I follow, but I'll give you a very fresh example.

I was implementing a re-print functionality in my warehouse management system.

Opus 4.8 high took 24m1s and 87k tokens. Haiku took 6m30s and 41k tokens.

After that, I had to make (minor) adjustments to both, but Haiku allowed me to iterate faster. Code quality for that somewhat trivial use case was similar.

Actually, I would even say that Opus provided a subpar solution: instead of fixing an issue where the carrier label PDF wasn't saved as the state machine progressed to the latest step, it went for a much more complex solution of re-generating the labels from scratch. That is also wrong, as it was de facto booking the carriers twice for the same order.

Haiku simply added another field on the terminal state that carried the already generated URLs.
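To make the difference concrete, here is a minimal sketch of the "carry the URLs on the terminal state" idea. It is not the actual warehouse code: `Order`, `Step`, `FakeCarrier`, `label_urls` and the `labels.example` URL are all hypothetical names made up for illustration, and the real state machine surely has more steps.

```python
from dataclasses import dataclass, field
from enum import Enum


class Step(Enum):
    CREATED = "created"
    LABELS_BOOKED = "labels_booked"
    DONE = "done"  # terminal step


@dataclass
class Order:
    order_id: str
    step: Step = Step.CREATED
    # Hypothetical field: label URLs are carried along so the terminal
    # state still has them available for a re-print.
    label_urls: list[str] = field(default_factory=list)


class FakeCarrier:
    """Stand-in for a carrier API; counts bookings so double booking is visible."""

    def __init__(self) -> None:
        self.bookings = 0

    def book_label(self, order_id: str) -> str:
        self.bookings += 1
        return f"https://labels.example/{order_id}/{self.bookings}.pdf"


def book_labels(order: Order, carrier: FakeCarrier) -> None:
    # The only place that talks to the carrier: one booking per order.
    order.label_urls = [carrier.book_label(order.order_id)]
    order.step = Step.LABELS_BOOKED


def complete(order: Order) -> None:
    # Move to the terminal step; label_urls is kept, not dropped.
    order.step = Step.DONE


def reprint(order: Order) -> list[str]:
    # Re-print only reads the stored URLs and never calls the carrier again.
    if order.step is not Step.DONE:
        raise ValueError("re-print is only available once the order is done")
    return list(order.label_urls)
```

With this shape, calling `reprint` any number of times leaves the carrier's booking count at one, whereas a re-print that regenerates labels would call `book_label` again on every request.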

I don't think it's a good idea to default to the highest effort/biggest model without taking into account the time it takes and the task complexity.

IMHO we should experiment rather than assume that what the rest of the community does is the best practice.
