Hacker News

_345 today at 5:15 AM | 7 replies

This makes so much sense as to why I've always felt that Opus 4.8 was leagues ahead of GPT 5.5. It's so good at taking underspecified requirements and filling in the gaps with sensible approaches for your project.


Replies

nsingh2 today at 5:48 AM

Why supply underspecified requirements in the first place? Both models are good at challenging assumptions/edge cases and asking questions to clarify, but seemingly only when explicitly asked (i.e. something like a "brainstorm" skill).

I don't think either harness does enough to encourage the model to challenge all assumptions and ask questions, maybe because users might find it annoying. That step is basically a requirement IMO.

I've found all of the GPT-5 models to be very nit-picky. That is useful for code review and mathematics (important for my work), but it seemingly gets in the way of "aesthetic" code, e.g. overly defensive code that covers every edge case, even unlikely ones.

There also seems to be a tradeoff between flexibility and instruction following. In my experience Opus will sometimes ignore instructions but can "fill in the blanks" more, whereas GPT-5.5 follows instructions better, perhaps at the cost of rigidity.

root-parent today at 10:39 AM

The best benchmarks are the ones you create yourself.

It's not my experience that Opus is leagues ahead or even superior. In any case, since GPT 5.5 has Instant, Medium, High, Extra High and Pro, shouldn't the comparison be against GPT on Pro, rather than Extra High as seems to be the case in the table?

CSMastermind today at 6:26 AM

Man, I don't know if I'm living in a crazy bubble or something, but GPT 5.5 is light-years better than Opus 4.8 for me, to the point where I'm honestly wondering how you're evaluating them or what kind of work you're doing.

There's specific tasks that Opus does better on like Frontend Dev and Design but for anything else 5.5 just laps it.

m3kw9 today at 3:49 PM

Better for vibe coders who always underspecify. But how does it know whether you are underspecifying, or whether you specified things properly and it overrode your specification?

zuzululu today at 6:04 AM

Same observation here: Opus 4.8 (and I don't understand the people constantly defending GPT 5.5) was significantly more mature. It would even push back against anything off-putting, whereas GPT 5.5 will happily agree and do what is asked, though I would note that it takes several tries.

4.8 also requires more than one prompt, but its output is significantly higher quality and offers more insight.

Fable 5 is a different beast however.

re-thc today at 5:32 AM

> It's so good at taking underspecified requirements and filling in the gaps with sensible approaches for your project.

At a high level, yes. But it misses low-level or other non-functional requirements in its own ways, so I wouldn't say Opus is strictly better.

It's also possible that it's more of a harness problem than a model problem.

hypfer today at 6:44 AM

Similarly, it explains to me why people found Claude so amazing, while I just thought "eh."

Tool expectations