Hacker News

codethief · yesterday at 11:24 PM · 6 replies

> > Why are you handwaving things away though? I've got you on max effort. I even patched the system prompts to reduce this.

In my experience, prompts like this one, which 1) ask for a reason behind an answer (when the model won't actually be able to provide one) and 2) are somewhat standoffish, don't work well at all. You'll just push the model the other way.

What works much better is to tell the model to take a step back and re-evaluate. Sometimes it also helps to explicitly ask it to look at things from a different angle XYZ, in other words, to add some entropy to get it away from the local optimum it's currently at.
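To make the "step back" nudge concrete: it's really just one more turn appended to the conversation before the next model call. The message shape and wording below are my own illustration, not any particular vendor's API:

```python
def nudge_step_back(conversation):
    """Append a 're-evaluate' turn to an existing chat transcript.

    `conversation` is a list of {"role": ..., "content": ...} dicts in the
    generic chat-message shape. The nudge wording is just one example of
    adding entropy to push the model off the local optimum it's sitting in.
    Returns a new list; the original transcript is left untouched.
    """
    nudge = {
        "role": "user",
        "content": (
            "Take a step back and re-evaluate your last answer from scratch. "
            "Consider at least one angle you have not used so far."
        ),
    }
    return conversation + [nudge]
```

You'd call this on your message list right before the next API request, rather than arguing with the model's previous answer in place.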


Replies

mrandish · today at 1:07 AM

> when the model won't actually be able to provide one

This is key. In my experience, asking an LLM why it did something is usually pointless. In a subsequent round, it generally can't meaningfully introspect on its prior internal state, so it's just referring to the session transcript and extrapolating a plausible sounding answer based on its training data of how LLMs typically work.

That doesn't necessarily mean the reply is wrong because, as usual, a statistically plausible sounding answer sometimes also happens to be correct, but it has no fundamental truth value. I've gotten equally plausible answers just pasting the same session transcript into another LLM and asking why it did that.

matheusmoreira · yesterday at 11:32 PM

That's good advice. I managed to get the session back on track by doing that a few turns later. I started making it very explicit that I wanted it to really think things through. It kept asking me for permission to do things, so I had to explicitly prompt it to trace through and resolve every single edge case it ran into, but it seems to be doing better now. It's running a lot of adversarial tests right now, and the results at least seem more thorough and acceptable. It's gonna take a while to fully review the output though.

It's just that Opus 4.6 DISABLE_ADAPTIVE_THINKING=1 doesn't seem to require me to do this at all, or at least not as often. It'd fully explore the code and take into account all the edge cases and caveats without any explicit prompting from me. It's a really frustrating experience to watch Anthropic's flagship subscription-only model burn my tokens only to end up lazily hand-waving away hard questions unless I explicitly tell it not to do that.

I have to give it to Opus 4.7 though: it recovered much better than 4.6.

christina97 · today at 4:24 AM

This is frankly one of the most frustrating things about LLMs: sometimes I just want to drive it into a corner. “Why the f** did you do X when I specifically told you not to?”

It never leads to anything helpful. I don't generally find it necessary to drive humans into a corner. I'm not sure it's because the model is explicitly not a human, so I don't feel bad for it; I think it's more that it's always so bland and entirely unable to respond to even a slight bit of negative sentiment, both in that it genuinely can't exert more effort into getting things right when someone is frustrated with it, and in that it is always equally nonchalant and inflexible.

j-bos · today at 1:03 AM

Yeah, for anyone seriously using these models I highly recommend reading the Mythos system card, especially the sections on analyzing its internal, non-verbalized states. Saves a lot of banging your head against the wall.

nelox · today at 12:43 AM

Precisely. I find Grok’s multi-agent approach very useful here. I have a custom agent configured as a validator.
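A validator-agent setup like that boils down to a generate/validate loop. The sketch below stubs out the model calls (`generate` and `validate` are stand-ins; I'm not assuming anything about Grok's actual API):

```python
def run_with_validator(task, generate, validate, max_rounds=3):
    """Generator/validator loop.

    `generate(task, feedback)` produces a draft (feedback is None on the
    first round); `validate(task, draft)` returns (accepted, feedback).
    Regenerate, feeding the validator's feedback back in, until the
    validator accepts the draft or the rounds run out.
    Returns (last_draft, accepted).
    """
    feedback = None
    draft = None
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        accepted, feedback = validate(task, draft)
        if accepted:
            return draft, True
    return draft, False
```

In a real setup both callables would be model calls with different system prompts; the point is that the validator never sees its own drafts, only the generator's.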

noodletheworld · today at 2:44 AM

> What works much better is to tell the model to take a step back and re-evaluate.

I desperately hate that modern tooling relies on “did you perform the correct prayer to the Omnissiah”

> to add some entropy to get it away from the local optimum

Is that what it does? I don't think that's what it does, technically.

I think that's just anthropomorphizing a system that behaves in a non-deterministic way.

A more meaningful solution is almost always “do it multiple times”.

That is a solution that sometimes makes sense because the system is probability-based, but even then, when you're hitting an opaque API with multiple hidden caching layers, /shrug, who knows.
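“Do it multiple times” has a simple form when answers can be compared for equality: sample the same prompt N times and keep the majority answer. The sketch below stubs the model call with a plain callable (it's one instance of the idea, not the only one):

```python
from collections import Counter

def best_of_n(sample, n=5):
    """Sample the same prompt n times and keep the most common answer.

    `sample` stands in for one LLM call returning a string. Majority
    voting only helps when answers are short enough to compare exactly
    (e.g. a number or a label), not for free-form prose.
    Returns (winner, fraction_of_samples_agreeing).
    """
    answers = [sample() for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n
```

A low agreement fraction is itself a useful signal that the question sits in exactly the high-variance regime being described here.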

This is why I firmly believe prompt engineering and prompt hacking are just fluff.

It's both mostly technically meaningless (observing random variance over a sample so small you can't see actual patterns) and obsolete once models/APIs change.

Just ask Claude to rewrite your request “as a prompt for claude code” and use that.

I bet it won't be any worse than the prompt you write by hand.
