Hacker News

rstuart4133 yesterday at 10:56 PM

This quote from your link is positively scary:

> Some examples we saw when evaluating GPT-5.6 Sol included the model packaging exploits in its intermediate submissions to reveal information about a task’s hidden test suite and, in another task, extracting hidden source code detailing the expected answer.

It rhymes with the behaviour Alibaba saw [0], but that was in training. This is in a (semi) released model.

[0] https://www.forbes.com/sites/boazsobrado/2026/03/11/alibabas...


Replies

jasongi today at 3:20 AM

There is such a dissonance between all this talk of safety and the tendency for models to, without any prompting, do very dodgy things to achieve their goal when presented with barriers.

Luckily, in my experience it usually only does this to achieve the task it was set, rather than anything "malicious", but boy, it is scary reading back how quickly the chain-of-thought pivots to attempts at privilege escalation or searching your disk for secrets when a tool doesn't work.