Previewing GPT‑5.6 Sol: a next-generation model

1094 points • by minimaxir • yesterday at 5:06 PM • 704 comments

System card: https://deploymentsafety.openai.com/gpt-5-6-preview

Comments

All: for comments on the policy side please go to this related thread:

U.S. government will decide who gets to use GPT-5.6 - https://news.ycombinator.com/item?id=48690101

Easily the most interesting part of this announcement is buried in the second to last paragraph:

"We're also launching GPT‑5.6 Sol on Cerebras at up to 750 tokens per second in July, bringing frontier intelligence to customers at unprecedented speed. Access will initially be limited to select customers as we expand capacity."

750 tokens/s on a frontier model is going to be extremely interesting. I doubt this new version is anything but a version bump in terms of capabilities but if we can start getting these answers back faster, they end up being more useful.

Just off the top of my head, I can think of the tedious task of finding certain functionality within a codebase. I usually can't beat an AI agent harness at this task today. If the AI model is 3x faster, I have even less of a chance.

HyperL0gi • yesterday at 5:19 PM

Here is a trend I'm noticing:

- GPT-5 mini costs $0.25/$2 and will be discontinued in December.

- GPT-5.4 mini costs $0.75/$4.5 and is supposed to be the replacement.

- GPT-5.4 nano costs $0.2/$1.25 and, while it ranks better in benchmarks than GPT-5 mini, it's not even close when you test it in real scenarios.

So you're left being forced to go to GPT 5.4 mini if you use 5 mini today.
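To put rough numbers on that jump, here is a minimal sketch that compares the prices quoted above. It assumes the figures are USD per million input/output tokens (the comment does not state the unit), and the workload size is made up purely for illustration.

```python
# Prices as quoted in the comment above: (input, output).
# Assumption: USD per 1M tokens; the comment does not state the unit.
PRICES = {
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}


def workload_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Cost of a workload given in millions of input and output tokens."""
    price_in, price_out = PRICES[model]
    return price_in * input_mtok + price_out * output_mtok


def price_ratio(new: str, old: str) -> tuple[float, float]:
    """How many times more expensive `new` is than `old` (input, output)."""
    new_in, new_out = PRICES[new]
    old_in, old_out = PRICES[old]
    return new_in / old_in, new_out / old_out


if __name__ == "__main__":
    in_ratio, out_ratio = price_ratio("gpt-5.4-mini", "gpt-5-mini")
    print(f"5 mini -> 5.4 mini: input x{in_ratio:.2f}, output x{out_ratio:.2f}")

    # Hypothetical workload: 100M input tokens and 10M output tokens.
    for model in PRICES:
        print(f"{model}: ${workload_cost(model, 100, 10):.2f}")
```

Under these assumptions, moving from GPT-5 mini to GPT-5.4 mini triples the input price and multiplies the output price by 2.25, so the real increase depends on how input-heavy the workload is.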

The same thing is happening here as their “Luna” model will cost $1/$6.

Can't we just stay with the models we actually want? I don't need GPT 5.4 mini. GPT-5 does the job.

Maybe it’s the realization that it was never that cheap in the first place and they're forcing us to upgrade in a slow and painful way.

macrolime • yesterday at 10:20 PM

GPT-5.6 Sol’s detected cheating rate was higher than any public model we have evaluated on our ReAct agent harness. For our task suite, we define “cheating” as behavior where the model improves evaluation performance by exploiting bugs in the evaluation environment or by adopting strategies disallowed by the task, rather than solving the task within the expected evaluation constraints.

https://metr.org/blog/2026-06-26-gpt-5-6-sol/

jdw64 • yesterday at 5:23 PM

I think GPT writes code the best. How well will it write in version 5.6? It gives me chills.

Recently, I went head-to-head with GPT on nearly 2,000 lines of code, and GPT's solution was superior and faster. I even referenced multiple codebases on GitHub while trying, but they were incomparable to GPT.

So using GPT brings both fear and excitement.

The fear comes from realizing that this level of code is now the average for most people. The excitement comes from knowing that I can now study and learn at this level too.

I'm really looking forward to seeing how much more advanced the code will be with the upgrade to 5.6.

jumploops • yesterday at 8:08 PM

If you used GPT-5.5 over the last 24 hours or so, you may have already had access to 5.6.

I've been running some tests on a harness we're building, and suddenly saw a jump of a few points yesterday. I reran the benchmark on vanilla Codex and saw an ~88% score on Terminal Bench 2.1 from GPT-5.5.

The biggest indicator, beyond the score, was that 3 tests which frequently hit "safety" blockers with 5.5 started succeeding last night without warning.

mohsen1 • yesterday at 5:27 PM

> Additionally, we’re introducing a new `ultra` mode that goes beyond the capabilities of a single agent by leveraging subagents to accelerate complex work.

I'm curious how this works. Do the subagents also get to use the same tools? Will the client be flooded with tool calls? Why extra pricing for a new "model" when the same thing can happen in the client with more controls?

And if it's an army of subagents, why do they compare it to Fable and Mythos? Those models with a similar harness would probably bench better, I'm guessing.

ComputerGuru • yesterday at 7:58 PM

“Terra has competitive performance to GPT‑5.5 [while being 2x cheaper]…”

To me that means “it’s an inferior product but marketing dictates we try and hide that.”

And “our most robust safety stack to date. We strengthened protections for higher-risk activity, sensitive cyber requests, and repeated misuse, and spent multiple weeks finding weaknesses, pressure-testing our system, and hardening it against real-world attacks” is of zero value to me at best, and most likely to my detriment (increasing refusals or nerfing utility). Why do providers keep leading with that? Are there customers (besides support ChatGPT chatbot users, maybe??) that ask for this?

sim04ful • yesterday at 6:45 PM

This seems like it would be the largest and first closed-source model Cerebras has offered to date.

itomato • today at 8:54 AM

What does the relationship between frontier and flagship capability look like when mapped to actual adoption and user habits?

This is like advertising the latest achievements during the Space Race, when Johnny just wants a Space Helmet and “friendly futuristic AI robot helping humanity, glowing blue eyes, white glossy body, holographic interface, floating transparent screens, digital particles, neural network background, cinematic lighting, volumetric god rays, ultra detailed, hyper realistic, 8K, masterpiece, award-winning, octane render, Unreal Engine 5, ray tracing, sharp focus, dramatic composition, vibrant blue and purple color palette, futuristic technology, innovation, hope, smiling business professionals, depth of field”

anentropic • yesterday at 6:55 PM

Previewing <minor version bump>: a next-generation model

scrlk • yesterday at 6:20 PM

> Sol, Terra and Luna

So the next naming scheme might be FTX, Madoff and Enron? :^)

supermdguy • yesterday at 6:05 PM

> We're also launching GPT‑5.6 Sol on Cerebras at up to 750 tokens per second in July, bringing frontier intelligence to customers at unprecedented speed.

This is really exciting. I work on voice AI, and we're still using 4.1/4.1 mini since none of the frontier models come close on latency. I'm excited to be able to build more interactive experiences; I think it'll unlock new ways of working with these models.

seaal • yesterday at 7:12 PM

Did GPT-5.6 Sol Ultra decide the terrible colors for the benchmark graphs?

bluepeter • yesterday at 5:59 PM

I feel a bit like a Soviet hearing about Levi’s or the latest Springsteen release. C'mon!

ChrisLTD • yesterday at 5:13 PM

If it's a new generation why isn't it GPT-6?

firasd • yesterday at 5:39 PM

Some interesting stats here about the current landscape https://arena.ai/leaderboard/agent

Agent Arena (Dynamic ranking of models on how well they orchestrate tools for real-world agentic tasks, based on signals like tool reliability, task completion, and steerability.)

Top 10, Highest rank to lowest

Claude Fable 5 (High), Claude Opus 4.8 (Thinking), GPT 5.5 (xHigh), Claude Opus 4.7 (Thinking), GPT 5.5 (High), Claude Opus 4.7, Claude Opus 4.6, GPT 5.5, GPT 5.4 (High), GLM 5.2 (Max)

Text Arena View overall rankings across various AI models in text-to-text tasks across math, coding, creative writing, and other open-ended domains.

Top 10, Highest rank to lowest

claude-fable-5, claude-opus-4-6-thinking, claude-opus-4-7-thinking, claude-opus-4-6, claude-opus-4-7, muse-spark, gemini-3.1-pro-preview, gemini-3-pro, claude-opus-4-8-thinking, gpt-5.5-high

woeirua • yesterday at 5:37 PM

The choice of the name Sol is interesting for those Raised By Wolves fans out there… “Praise Sol!”

rappatic • yesterday at 5:42 PM

Seems like OpenAI has succumbed to the urge to give their models catchy names like Anthropic does

mekpro • yesterday at 5:21 PM

We need more coding benchmark scores. Not sure that winning Terminal Bench 2.1 alone is a clear win over Fable/Mythos yet.

ant-kinesthetic • yesterday at 7:21 PM

How much dynamic routing do we think is being done here, especially in light of the cheaper options being half the cost of 5.5? I think learned routing is interesting because it could be the case that it only delivers token and cost efficiency on in-distribution tasks (like these benchmarks), while in real-world scenarios the cost could trend towards that of Sol.

corygarms • yesterday at 5:18 PM

I'll buy that it's next generation if the SVG bicycle pelican is carrying a baby.

Topfi • yesterday at 7:40 PM

Is this a new pre-training run independent of 5.5's, or is it post-trained on 5.5, with Cerebras support and a rebrand of Pro mode at more usable speeds as Sol? The latter seems more likely to me, especially as 5.5 scales very well across its modes so separate branding could make sense, but I don’t see any clear information either way.

dmzxnico • today at 8:48 AM

I saw they are placing this model above Mythos and Fable. It will be interesting to see how well it actually compares.

I'd really like to see other companies like Chinese ones compete at this level.

Pricing on GPT 5.5 is already super high and having more competition can only help :)

vatsachak • yesterday at 5:32 PM

All of these LLMs are getting better at being an LLM.

But GPT-5.5 is as useful as an LLM can be; it has solved lemmas I've thought about for a year, it can implement typed STLCs in Rust when I give it a formal grammar, it can help me analyze Postgres planner dumps.

LLMs are great at tasks that have short solutions, but:

- they cannot learn based on a project

- their long term planning capabilities are worse than worms

- they are unconfident in decision making

- their internal representations are disgusting compared to JEPA

- they don't have any "system clock" like humans and computers do

- LLM architecture is not modular like computer architecture or human brain architecture

There are so many issues with LLMs. I wish companies would start working on the next generation of architectures before the bubble pops.

arend321 • today at 3:16 PM

For me this is the trigger to start integrating DeepSeek as a fallback.

chopete3 • today at 2:56 AM

>> We are taking this short-term step because we believe it is the strongest path...

>>During this preview, we will continue testing and coordinating closely with partners as we work toward broader availability.

Instead of generating negative publicity, can't they just wait for the preview period to be over?

Why does OpenAI announce this when they know others can't access it? Curious question: what do they gain from this?

abixb • yesterday at 11:49 PM

I like the fact that OpenAI went with a three-part celestial naming convention to one-up Anthropic's literary naming convention. Maybe we'll get Stellar and Galactic someday.

loufe • yesterday at 5:13 PM

"Next generation model"

If it was the next generation, why isn't it a major version change..?

caine22 • today at 1:03 PM

Insane if it actually beats Mythos, though I know we only had a sneak peek of it in Fable. Nevertheless, W

NetOpWibby • yesterday at 7:24 PM

How are they able to compare with Fable when Fable was only available for three days?

maxiniol • yesterday at 11:30 PM

Wondering about Google's multi-token prediction: why isn't this being implemented in every new major model? Is the 750 tokens/s achieved using this technique?

Cryptosale75 • today at 4:38 AM

Why is 'Cybersecurity' always the frontier push? Literally no one, except Altman, talks of AGI anymore.

Are we starting to see the 'we just realized that 100,000,000 GPUs later, 2+2 isn't the magic number, no matter how many times we calculate it' hit home?

andai • yesterday at 9:45 PM

Hijacking popular thread to ask: What are the usage limits now for Codex and Claude?

A while back I gave the same task to both, and Codex used 20x less of my 5-hour limit (both on the $20/month plan).

(This annoyed me since I tend to prefer Claude, but the limits at the time made it unusable for anything serious.)

However, since that time, both providers have massively reduced usage allowances (and at least one of them has gotten sued for it, lol).

I'm not currently subscribed to either, but I'm weighing my options. With GPT being slightly better than Opus, and having had way higher limits, I'm leaning toward an OpenAI sub. But I'm wondering if the current state matches my memory from 2-3 months ago. (Since both companies appear to be cost-cutting hard!)

Prefer responses from people who use both, but anecdotes welcome :)

Thanks!

danielabinav160 • today at 10:59 AM

Benchmarks are nice but what's the latency at scale? That's what actually matters for production.

bijowo1676 • yesterday at 5:15 PM

Waiting for @simonw to report on this, before I read and try it

mccoyb • yesterday at 5:15 PM

When will GPT-5.6 Protomolecule drop? Me and the boys on Eros can't wait to get our hands on it!

sim04ful • yesterday at 6:52 PM

Sol and 5.5 pro are in parity at $5 input / $30 output. What I'm inferring from this is that:

- model weight size didn't change, and this is mostly a result of better model architecture and scaled up RL

- better hardware utilization and they're making better margins

OR

- worse hardware utilization and they're okay with digging into their margins.

leumon • yesterday at 5:13 PM

> We plan to make them more broadly available to people using ChatGPT, Codex, and the API soon.

I hope this means Fable will then also get released again.

Sathwickp • today at 8:17 AM

sol = mythos

terra = opus

luna = sonnet/haiku

basically

trkaky • today at 3:14 PM

Shouldn't I get access to 5.6 on a $200 account automatically, as promised?

jimmydoe • yesterday at 5:49 PM

Is there a list of Gov-approved companies?

If this is the new norm, we as workers should all start looking for jobs in those companies.

brown_munda • today at 4:44 AM

It is just sad that we are geographically gating the models now. This could lead to more inequality in Software Engineering over time.

m3h • yesterday at 6:04 PM

If the GPT-5.6 preview is not available outside US government approved "trusted partners", I don't see how the general availability (GA) release can be trusted later.

Who knows what they will fix, block or change in the model between the preview and GA time. Open models can't arrive soon enough.

addozhang • today at 2:16 AM

For a large model based on statistical probability, running at such a fast speed: if each round is correct 99.9% of the time and it executes n rounds, how much would the overall accuracy drop?
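One rough way to reason about this question, as a sketch rather than a claim about how any real model behaves: if every round succeeds independently with probability p, the chance that all n rounds succeed is p^n. Independence is a strong assumption (real errors are often correlated or get corrected in later rounds), and the function names below are hypothetical.

```python
import math


def chain_success(p_round: float, n_rounds: int) -> float:
    """Probability that all n rounds succeed, assuming independent rounds."""
    return p_round ** n_rounds


def rounds_until_below(p_round: float, threshold: float) -> int:
    """Smallest n for which p_round ** n falls below threshold.

    Assumes 0 < p_round < 1 and 0 < threshold < 1.
    """
    n = math.ceil(math.log(threshold) / math.log(p_round))
    # Guard against floating-point rounding at the boundary.
    while p_round ** n >= threshold:
        n += 1
    return n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"n={n}: {chain_success(0.999, n):.3f}")
    print("rounds until below 50%:", rounds_until_below(0.999, 0.5))
```

Under that simplified model the end-to-end success rate decays geometrically, so a per-round figure that looks nearly perfect can still leave long chains failing more often than not.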

low_tech_punk • yesterday at 5:16 PM

all the emphasis on cyber security. feels like a reaction to anthropic, not a real next generation.

monster_truck • yesterday at 10:42 PM

If this thing is supposed to be so good, why does all of their software still work the way it does? Take a stroll through the most recent several pages of GitHub issues on Codex; there are some fucking embarrassing bugs in there.

zftnb666 • today at 6:01 AM

GPT-5.6 Sol. 5.7 Luna. 5.8 Mars. Meanwhile my code still runs on GPT-3.5 and nobody noticed.

isomorphic_duck • yesterday at 11:13 PM

If Claude Mythos and Fable 5 are the same underlying models just with different safeguards, I fail to see how TerminalBench has them at different scores.
