
NitpickLawyer · yesterday at 8:35 AM · 3 replies

Take math and coding for example:

- in math, if they can solve a problem, or a class of problems, they'll solve it. If you use a "thinking" model + maj@x (majority vote over several sampled solutions; rough sketch below), you'll get strong results. But if you try, for example, to have the model consider a particular way or method of exploring a problem, it'll default to "solving" mode. It's near impossible to have it do anything with a math problem other than solve it. Say "explore this part, in this way, using this method". Can't do it. It'll maybe play a bit, but then it enters "solving" mode and continues to solve the problem the way it was trained.

In practice, this means that "massively parallel" test-time compute becomes harder to do with these models, because you can't "guide" them towards particular aspects of a problem. They are extremely "stubborn".

- in coding it's even more obvious. Ask them to produce any of the often-tested, often-demoed 0-shot tasks (an SPA, a game, a visualisation, etc.) and they do it. Convincingly.

But ask them to look at a piece of code and extract meaning, and they fail. Or ask them to reverse an implementation: figure out what a function does and reverse its use, or make it do something else, and they fail.
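
(For anyone who hasn't seen the maj@x shorthand above: it just means sampling x independent solutions and keeping the most common final answer. A minimal sketch in Python; generate_solution and extract_final_answer are hypothetical placeholders for whatever model call and answer parser you actually use.)

    import random
    from collections import Counter

    # Hypothetical placeholders: swap in your real model call and answer parser.
    def generate_solution(problem: str) -> str:
        # In reality: one sampled completion from a "thinking" model at temperature > 0.
        return f"...reasoning... final answer: {random.choice(['42', '41', '42'])}"

    def extract_final_answer(solution: str) -> str:
        # In reality: parse the boxed/final result out of the completion.
        return solution.rsplit("final answer:", 1)[-1].strip()

    def maj_at_x(problem: str, x: int = 8) -> str:
        # maj@x: sample x independent solutions, keep the most common final answer.
        answers = [extract_final_answer(generate_solution(problem)) for _ in range(x)]
        return Counter(answers).most_common(1)[0][0]

    print(maj_at_x("What is 6 * 7?"))

The point being: this only helps when "solve it end to end" is the behaviour you want; it gives you no handle to steer how the model explores the problem.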


Replies

vintermann · yesterday at 9:25 AM

Oof, that sounds frustrating. Yeah, I can relate to this failure mode; it's basically "did you mean (more likely query)" turned up to 11.

It does sound like an artifact of the dialog/thinking tuning though.

CuriouslyC · yesterday at 12:38 PM

That's the thing people miss about GPT-5: it's incredibly steerable in a way a lot of models aren't.

elbear · yesterday at 8:41 AM

It sounds like some people.