For anyone using LLMs heavily for coding, this shouldn't be too surprising. It was just a matter of time.
Mathematicians make new discoveries by building and applying mathematical tools in new ways. It is tons of iterative work, following hunches and exploring connections. While it is true that LLMs can't truly "make discoveries" since they have no sense of what that would mean, they can Monte Carlo every mathematical tool at a narrow objective and see what sticks, then build on that or combine improvements.
Reading the article, that seems to be exactly how the discovery was made: an LLM used a "surprising connection" to go beyond the expected result. But the result has no meaning without the human intent behind the objective, the human understanding to value the new pathway the AI used (more valuable than the result itself, by far), and the mathematical language (built by humans) to explore the concept.
I think one interesting thing to point out is that the proof (disproof) was done by finding a counterexample of Erdős' original conjecture.
I agree with one of the mathematician's responses in the linked PDF that this is somewhat less interesting than proving the actual conjecture was true.
In my eyes proving the conjecture true requires a bit more theory crafting. You have to explain why the conjecture is correct by grounding it in a larger theory while with the counterexample the model has to just perform a more advanced form of search to find the correct construction.
Obviously this search is impressive, not naive, and requires many steps along the way to prove connections to the counterexample, but instead of developing new deep mathematics the model is still just connecting existing ideas.
Not to discount this monumental achievement. I think we're really getting somewhere! To me, and this is just vibes based, I think the models aren't far from being able to theory craft in such a way that they could prove more complicated conjectures that require developing new mathematics. I think that's just a matter of having them able to work on longer and longer time horizons.
As I have stated before, AI will win a Fields Medal before it can manage a McDonald's.
A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.
LLMs are just the beginning, we'll see more specialized math AI resembling StockFish soon.
The proof brings unexpected, sophisticated ideas from algebraic number theory to bear on an elementary geometric question.
The more I read about these achievements the more I get a feeling that a lot of the power of these models comes from having prior knowledge on every possible field and having zero problems transferring to new domains.
To me the potential beauty of this is that these tools might help us break through the increasing super-specialization that humans in science have to go through today. On the one hand that specialization is important; on the other hand it limits people in terms of the tooling and inspiration they have access to.
From the companion paper:
> The argument relies crucially on ideas that may, at least in retrospect, be attributed to Ellenberg-Venkatesh, Golod-Shafarevich, and Hajir-Maire-Ramakrishna.
Can someone please elaborate on this?
I like how everyone laughed when OpenAI said their models will have "PhD-Level Intelligence" and now the goalpost has been moved to if AI can create new math (i.e., not PhD-Level, but Leibniz/Euler/Galois level.)
The summarized chain of thought for this task (linked in the blogpost) is 125 pages. That's an insane scale of reasoning, quite akin to what Anthropic has been teasing with Mythos.
Is there a reason why we only hear of Erdos problems being solved? I would imagine there are a myriad of other unsolved problems in math, but every single ChatGPT "breakthrough in math" I come across on r/singularity and r/accelerate is an Erdos problem.
This is impressive, no question.
Without knowing everything this model has been trained on, though, it is pretty hard to ascertain the extent to which it arrived at this "on its own". The entire AI industry has been (not so secretly) paying a lot of experts in many fields to generate large amounts of novel training data. Novel training data that isn't found anywhere else (they hoard it) and which could actually contain original ideas.
It isn't likely that someone solved this and then just put it in the training data, although I honestly wouldn't put that past OpenAI. More interesting though is the extent to which they've generated training data that may have touched on most or all of the "original" tenets found in this proof.
We can't know, of course. But until these things are built in a non-clandestine manner, this question will always remain.
What strikes me in this case (and I haven't seen in other comments) is that it's a _disproof_ of a conjecture put forth by Erdős and supported (at least according to OpenAI) by other professional mathematicians. Erdős, one of the greats, thought that the bound was O(n^{1 + o(1)}), which GPT disproved.
We can argue about recombination/interpolation of training data in LLMs, but even if this was an interpolation, the result was contrarian rather than a confirmation. Any system that can identify an error in Erdős's thinking seems very useful to me (though perhaps he did not spend much time thinking about or checking this particular conjecture).
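For readers who want the claim written out, here is a rough sketch of what is being disproved. The notation u(n) for the maximum number of unit-distance pairs among n points in the plane is my own choice, and the constants c and C are unspecified. The two classical bounds are standard (the lower bound is Erdős's rescaled-grid construction, the upper bound is due to Spencer, Szemerédi and Trotter). The last line is only the logical negation of the conjecture; the actual exponent obtained in the new result is not stated in this thread.

```latex
% u(n): maximum number of pairs at distance exactly 1 among n points
% in the plane (notation assumed for this sketch).

% Classical bounds, for constants c, C > 0:
\[
  n^{1 + c/\log\log n} \;\le\; u(n) \;\le\; C\, n^{4/3}.
\]

% Erdős's conjecture (the statement being disproved):
\[
  u(n) = n^{1 + o(1)}.
\]

% What a disproof has to establish: some fixed gain in the exponent,
\[
  \exists\, \varepsilon > 0 \;:\; u(n) \ge n^{1 + \varepsilon}
  \quad \text{for infinitely many } n.
\]
```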
One thing that seems certain is that OpenAI models hold a distinct lead in academics over Anthropic and Google models.
For those in academics, is OpenAI the vendor of choice?
Would be interesting to know what kind of preparatory work actually went into this - how long did it take to construct an input that produced a real result, and how much input did they get from actual mathematicians to guide refining it
I actually tried using GPT-5.5 Pro on this problem recently. It thought it was making progress on one path, but it made so many mistakes that it didn't feel worth it pushing further. It'll be interesting to check whether it's the same route. I got partial results (proved in Lean) that improve on the best-known results for four Erdős problems with GPT-5.5 Pro
To paraphrase Gwynne Shotwell: “Not too bad for just a large Markov chain, eh?”
Can anyone find (or draw) a picture of the construction?
I guess if this stuff is going to make my employment more precarious, it’d be nice if it also makes some scientific breakthroughs. We’ll see
A few questions that the blog did not answer; if anyone knows, that would be great:
- Does anyone know if this was 1 minute of inference or 1 month?
- How many times did the model say it was done disproving before it was found out that the model was wrong/hallucinating?
- One of the graphs says the model produced the right answer almost half the time at peak compute. Did I understand that right? What does peak compute mean here?
Not to dismiss the AI, but the important part is that you still need someone able to recognize these solutions in the first place. A lot of things were just hidden in plain sight before AI, but no one noticed, or no one had the framework, in maths or whatever field they specialize in, to recognize those feats.
How do you even get an LLM to try to solve one of these problems? When I ask it just comes back with the name of the problem and saying "it can't be done"
I think it's worth being skeptical of this. There's a way too common pattern of "AI Lab Shows AI Doing Something Only Humans Can Do", only for a bunch of important caveats and limitations to be discovered after the initial hype. And of course, the correction never seems to be as viral as the hype. I'll believe it when a mathematician actually reads the 100+ pages of reasoning.
Timothy Gowers' tweet about this: "If you are a mathematician, then you may want to make sure you are sitting down before reading further."
woah.
Speaking as a postdoc in math, I must say that this is rather exciting. This is outside of my field, but the companion remarks document is quite digestible. It appears as though the proof here is fairly inspired by results in the literature, but the tweaks are non-trivial. Or, at least to me, they appear substantial enough that I would consider the entire publication novel and exciting.
Many of my colleagues and I have been experimenting with LLMs in our research process. I've had pretty great success, though fairly rarely do they solve my entire research question outright like this. Usually, I end up with a back-and-forth process of refinements and questions on my end until eventually the idea becomes apparent. Not unlike my traditional research refinement process, just better. Of course, I don't have access to the model they're using =) .
Nevertheless, one thing that struck me in this writeup, was the lack of attribution in the quoted final response from the model. In a field like math, where most research is posted publicly and is available, attribution of prior results is both social credit and how we find/build abstractions and concentrate attention. The human-edited paper naturally contains this. I dug through the chain-of-thought publication and did actually find (a few of) them. If people working on these LLMs are reading, it's very important to me that these are contained in the actual model output.
One more note: the comments on articles like these on HN and otherwise are usually pretty negative / downcast. There's great reason for that, what with how these companies market themselves and how proponents of the technology conduct themselves on social media. Moreover, I personally cannot feel anything other than disgust seeing these models displace talented creatives whose work they're trained on (often to the detriment of quality). But, for scientists, I find that these tools address the problem of the exploding complexity barrier in the frontier. Every day, it grows harder and harder to contain a mental map of recent relevant progress by simple virtue of the amount being produced. I cannot help but be very optimistic about the ambition mathematicians of this era will be able to scale to. There still remain lots of problems in current era tools and their usage though.
How did they jump from finding counter-examples (disproof) to a proof?
AI isn't going to supercharge science but I wouldn't be as dismissive as other posters here.
I’m curious about the “autonomous” claim. Usually these systems require a human to guide and verify steps, clarify problems, etc. Are they claiming that the reinforcement model wasn’t given any inputs, tools, guidance, or training data from humans?
This topic and discussion are out of my league. What is the implication here? That LLMs aren't a dead end?
To all AI skeptics:
What is preventing AI from continuing to improve until it is absolutely better than humans at any mental task?
If we compare AI now vs 2022 the difference is outstandingly stark. Do you believe this improvement will just stop before it eclipses all humans in everything we care about?
I wonder whether there will be progress in string theory from these kinds of applications of AI.
Nice. By the year 2100, 200 Erdos problems will have been solved by AI. Let's build more data centers.
While the result is impressive, this blog post is extremely disappointing.
- It does not show an example of the new best solution, nor explain why they couldn't show an example (e.g. if the proof was not constructive)
- It does not even explain the previous best solution. The diagram of the rescaled unit grid doesn't indicate what the "points" are beyond the normal non-scaled unit grid. I have no idea what to take away from it.
- Its description of the new proof just cites some terms of art with no effort made to actually explain the result.
If this post were not on the OpenAI blog, I would assume it was slop. I understand advanced pure mathematics is complicated, but it is entirely possible to explain complicated topics to non-experts.
Another entry in a growing list from the last couple of months (interestingly, mostly OpenAI):
1. Erdos 1196, GPT-5.4 Pro - https://www.scientificamerican.com/article/amateur-armed-wit...
There are a couple of other Erdos wins, but this was the most impressive, prior to the thread in question. And it's completely unsupervised.
Solution - https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...
2. Single-minus gluon tree amplitudes are nonzero, GPT-5.2 - https://openai.com/index/new-result-theoretical-physics/
3. Frontier Math Open Problem, GPT-5.4 Pro and others - https://epoch.ai/frontiermath/open-problems/ramsey-hypergrap...
4. GPT-5.5 Pro - https://gowers.wordpress.com/2026/05/08/a-recent-experience-...
5. Claude's Cycles, Claude Opus 4.6 - https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cyc...
I would have thought a triangular grid works better than a grid of squares. You get ~3n links vs ~2n for the square grid. Curious what the AI came up with.
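A quick way to sanity-check the ~3n vs ~2n figures is to count nearest-neighbour pairs on small patches. This is only a sketch: the function name, the offset lists and the k-by-k patch shape are my own assumptions, and it counts only nearest-neighbour unit distances in an unscaled lattice. Erdős's classical construction instead rescales the square grid so that a distance realised by many different pairs becomes the unit, so this comparison says nothing about which lattice wins there.

```python
# Count pairs of lattice points at distance exactly 1 on a k-by-k patch.
# Points are stored as integer coordinates (i, j) in the lattice basis, so
# no floating-point distance checks are needed.
#
# Square lattice basis: e1 = (1, 0), e2 = (0, 1).
# Triangular lattice basis: e1 = (1, 0), e2 = (1/2, sqrt(3)/2), where the
# squared length of a*e1 + b*e2 is a^2 + a*b + b^2.


def count_unit_pairs(k, half_offsets):
    """Count unit-distance pairs among the k*k points (i, j), 0 <= i, j < k.

    half_offsets holds one offset from each +/- pair of unit-length
    neighbour directions, so every pair is counted exactly once.
    """
    points = {(i, j) for i in range(k) for j in range(k)}
    count = 0
    for i, j in points:
        for di, dj in half_offsets:
            if (i + di, j + dj) in points:
                count += 1
    return count


# Unit-length directions, one from each opposite pair.
SQUARE_OFFSETS = [(1, 0), (0, 1)]
# (1, -1) has squared length 1 - 1 + 1 = 1 in the triangular basis.
TRIANGULAR_OFFSETS = [(1, 0), (0, 1), (1, -1)]

if __name__ == "__main__":
    for k in (10, 30, 100):
        n = k * k
        sq = count_unit_pairs(k, SQUARE_OFFSETS)
        tri = count_unit_pairs(k, TRIANGULAR_OFFSETS)
        print(f"n={n}: square {sq / n:.2f} per point, triangular {tri / n:.2f} per point")
```

On a k-by-k patch the counts come out to 2k(k-1) for the square grid and 3k^2 - 4k + 1 for the triangular one, so the per-point ratios approach 2 and 3 as the patch grows. Both are still linear in n.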
Which model did this? Is it available to the public?
As this becomes more common it makes me wonder where the LLM ends and the harness begins.
The underlying model may still effectively be a stochastic parrot, but used properly that can do impressive things and the various harnesses have been getting better and better at automating the use of said parrot.
"The proof came from a general-purpose reasoning model, not a system built specifically to solve math problems or this problem in particular, and represents an important milestone for the math and AI communities."
How central is this problem in discrete geometry? Could anyone with knowledge of the field reply?
> AI is about to start taking a very serious role in the creative parts of research, and most importantly AI research itself. While this progress is not unexpected, it reinforces the urgency we feel about understanding this next phase of AI development, the challenges of aligning very intelligent systems, and the future of human-AI collaboration.
I find this hyperbolic, but ya gotta juice up the upcoming IPO. I hate that they took an interesting announcement and reminded me why I hate tech and our society at the end.
I wonder how much this cost vs a Math Professor or a team of Math Professors.
Can someone explain to me what their "prompting-scaffolding" is to make it work?
can the AI please tell us what to do now that all knowledge work will become unemployment?
Every time I interact even with OpenAI's pro model, I am forced to come to the conclusion that anything outside the domain of specific technical problems is almost completely hopeless outside of a simple enhanced search and summary engine.
For example, these machines, if scaling intellect so fiercely that they are solving bespoke mathematics problems, should be able to generate mundane insights or unique conjectures far below the level of intellect required for highly advanced mathematics - and they simply do not.
Ask a model to give you the rundown and theory on a specific pharmacological substance, for example. It will cite the textbook and meta-analyses it pulls, but be completely incapable of any bespoke thinking on the topic. A random person pursuing a bachelor's in chemistry can do this.
Anything at all outside of the absolute facts, even the faintest conjecture, feels completely outside of their reach.
Is this something that can be made explainable to someone without any of the relevant background, or is this one of those things where all that background is needed to understand it? Because I have no idea what's going on here, but would like to.
The back and forth in this discussion reveals to me we are sorting through a kind of philosophical debate about intelligence. That alone tells me LLMs are doing something novel.
Important note: this was not done with a special mathematics harness or specialized workflow.
I wonder if it has anything to do with the fact that AI is a grid of grid-calculating grids. It seems like it would be especially well suited to finding solutions about grids. That is until you consider the fact that even 1 trillion billion grids is still not anywhere close to an infinite grid. So, probably slop.
Absolutely no proof that any LLM actually found the result, and just a mention of an "internal model". Served to you by one of the biggest liars in the world.
Why would anyone believe this to be true even for a split second?
To the “LLMs just interpolate their training data” crowd:
Ayer, and in a different way early Wittgenstein, held that mathematical truths don’t report new facts about the world. Proofs unfold what is already implicit in axioms, definitions, symbols, and rules.
I think that idea is deeply fascinating, and I have no problem with the fact that we still credit mathematicians with discoveries.
So either “recombining existing material” isn’t disqualifying, or a lot of Fields Medals need to be returned.