System card: https://deploymentsafety.openai.com/gpt-5-6-preview
Flagged activity can also trigger account-level review across relevant conversations and risk signals, consistent with our terms and policies around content retention and review. Looking beyond a single conversation helps our systems distinguish persistent malicious behavior from legitimate dual-use security work, where similar technical concepts may appear in very different contexts.
Fascinating! Every conversation you have with these "more capable" models will be monitored and joined up and then your entire account might one day be tagged as Distiller or Cyber Threat Actor or whatnot. When combined with identity verification (which isn't discussed in this press release), expect people to be falsely flagged and banned from ever using OpenAI models again.
Wish I could find the thread from last week where discussions of exactly this kind of thing were dismissed as daft and outlandish.
Another year, and OpenAI comes up with yet another naming scheme for their models. First it was integers (GPT2, GPT3). Then they added friendly names (remember Ada, Babbage, Curie, Davinci?), but decided against it. Instead we got dot integers (GPT3.5), then letter-number modifiers (o1), plus word modifiers like o1-pro, o3-mini, or -mini-high, or codex, codex-max, Pro, etc.
Now they've got friendly cosmic names. And they want us to believe that this time they're gonna stick to a naming convention? I'll believe it when they do 3 releases in a row without inventing a new naming scheme.
Guess it's just another price bump hidden behind output token speed.
TLDR - It's not quite Mythos but it uses about 5 times fewer tokens, and those tokens are also cheaper?
https://pbs.twimg.com/media/HLwuJLvbwAAOfQZ?format=jpg&name=...
they're trying to be anthropic with these model names
whoa, a new model that surpasses benchmarks of other models? wild.
Could not care less.
Doesn't it strike anyone as strange that SOL, TERRA, and LUNA are all quasi-scam crypto tickers?
GPT 5.5 in Codex is so much worse than Opus, and sometimes worse than Sonnet. I don't think 5.6 Sol will be anywhere near Fable, let alone Mythos. Probably slightly better than Opus. Maybe not even.
Time to create more LLM based startups.
* House design plans from prompts
* Government surveillance of public communication
* Extracting world/spatial concepts from language models (do we really need world/spatial models now?)
* Driverless City planning startups
* Election vote rigging/harvesting startups
* Video game NPC backstory startups (all NPCs in GTA 6 go to work, go home, shower, go to sleep now?)
Keep moving, don't doom.
I can’t help but think that these benchmarks are completely fake. Sam even posted a benchmark on X a couple days ago of how the ‘complete version’ of 5.5 cyber was already ahead of Mythos, apparently. This just feels like absolute nonsense. The impact of Mythos on the industry was clear and in front of everyone’s eyes: the number of vulnerabilities Mozilla fixed, the vulnerabilities and exploits Anthropic showcased in that blog post about the Chrome sandbox escape, etc. And now we’re supposed to believe this 5.5 cyber is already ahead of Mythos, ok. And yeah, gpt 5.6 is even further ahead, alright.
I hate not being able to use the latest models. There needs to be a much faster resolution to whatever is happening with the federal government.