Hacker News

misnome · yesterday at 5:52 PM · 6 replies

I've been playing with 3.5:122b on a GH200 for the past few days for Rust/React/TS work, and while it's clearly sub-Sonnet, with tight task descriptions it can get small-to-medium tasks done OK - as well as Sonnet if the scope is small.

The main quirk I've found is that it has a tendency to decide halfway through following my detailed instructions that it would be "simpler" to just... not do what I asked, and I find it has stripped all the preliminary support infrastructure for the new feature out of the code.


Replies

sheepscreek · yesterday at 7:30 PM

That sounds awfully similar to what Opus 4.6 does on my tasks sometimes.

> Blah blah blah (second-guesses its own reasoning half a dozen times, then goes) Actually, it would be simpler to just ...

Specifically on Antigravity, I've noticed it doing that while trying to "save time" to stay within some artificial deadline.

It might have something to do with the system messages and the reinforcement/realignment messages that are interwoven into the context (but never displayed to end-users) to keep the agents on task.

storus · yesterday at 8:40 PM

> to decide halfway through following my detailed instructions that it would be "simpler" to just... not do what I asked

That's likely coming from the 3:1 ratio of linear to quadratic attention layers. The latest DeepSeek suffers from it too, whereas the original R1 never exhibited it.
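For context on what a "3:1 ratio" means here: hybrid architectures interleave cheap linear-attention layers (O(n) in sequence length, but history is compressed into a fixed-size state) with full quadratic-attention layers (O(n²), but lossless recall). A minimal sketch of such an interleave pattern (the layer counts and naming are illustrative, not the actual DeepSeek or Qwen configs):

```python
# Sketch of a 3:1 linear-to-quadratic attention interleave.
# Linear attention summarizes the past into a fixed-size state, which
# can blur fine-grained instructions given many tokens ago; the
# periodic full-attention layers are what preserve exact recall.

def layer_pattern(num_layers: int, linear_per_quadratic: int = 3) -> list[str]:
    """Return the attention type used at each layer index."""
    pattern = []
    for i in range(num_layers):
        # Every (linear_per_quadratic + 1)-th layer is full attention.
        if (i + 1) % (linear_per_quadratic + 1) == 0:
            pattern.append("quadratic")
        else:
            pattern.append("linear")
    return pattern

print(layer_pattern(8))
# ['linear', 'linear', 'linear', 'quadratic',
#  'linear', 'linear', 'linear', 'quadratic']
```

The intuition behind the comment is that with only one lossless layer in four, detailed multi-step instructions are more likely to degrade as the context grows.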

shaan7 · yesterday at 7:38 PM

> that it would be "simpler" to just... not do what I asked

That sounds too close to what I feel on some days xD

reactordev · yesterday at 6:11 PM

Turn down the temperature and you’ll see less “simpler” short cuts.
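For intuition on why this helps: temperature rescales the logits before sampling, and a lower value concentrates probability mass on the top-ranked token, so the model is less likely to wander onto an unplanned "simpler" path. A toy illustration with made-up logit values:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into sampling probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

# Higher temperature: the distribution flattens, off-plan tokens
# get sampled more often.
print(softmax_with_temperature(logits, 1.5))

# Lower temperature: the top token dominates and decoding becomes
# close to deterministic.
print(softmax_with_temperature(logits, 0.2))
```

The trade-off is that very low temperatures can also make the model repetitive or unwilling to deviate when deviation is actually warranted.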

slices · yesterday at 10:23 PM

I've seen behavior like that when the model wasn't being served with a sufficiently sized context window.
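One common way this happens: many local serving stacks default to a small context (often 2-4k tokens) and silently truncate the oldest part of the prompt, which drops the detailed instructions mid-task. As one hedged example, if you're serving through Ollama, the context size can be raised via a Modelfile (the model tag below is a placeholder, not necessarily the one the parent commenter used):

```shell
# Create a variant of the model with a 32k context window.
cat > Modelfile <<'EOF'
FROM qwen3:32b
PARAMETER num_ctx 32768
EOF

ollama create qwen3-32k -f Modelfile
ollama run qwen3-32k
```

Other servers (llama.cpp, vLLM, etc.) have their own equivalent flags; the point is to check that the configured window actually covers your prompt plus the agent's scratch work.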

Aurornis · yesterday at 9:19 PM

> The main quirk I've found is that it has a tendency to decide halfway through following my detailed instructions that it would be "simpler" to just... not do what I asked,

This is my experience with the Qwen3-Next and Qwen3.5 models, too.

I can prompt with strict instructions saying "** DO NOT..." and it follows them for a few iterations. Then it has a "realization" that it would be simpler to just do the thing I told it not to do, which leads it straight into the dead end I was trying to avoid.