Transformers are inherently succinct

134 points • by brandonb • yesterday at 6:50 PM • 38 comments • view on HN

This paper is being published at ICLR 2026 (top AI conference), and was selected as one of three outstanding papers.

Comments

The last line of the abstract has the most important takeaway.

> As a consequence of this succinctness, we show that basic verification problems for transformers, such as emptiness and equivalence, are provably intractable: specifically, EXPSPACE-complete.

If you were hoping to formally prove the correctness of a large transformer, it turns out that you're going to need an exponentially larger amount of space to do your verification, more than you could possibly afford.

doug_durham • yesterday at 9:30 PM

This is a truly important paper. It formalizes the intuition that many in the field have. We can stop wasting time doing formal analysis of LLMs. If you have a problem that requires formal verification, don't use an LLM. You can use an LLM to help you build such a system, but the LLM can't be the system.

➕ show 2 replies

dang • yesterday at 7:09 PM

Discussed (a bit) here:

Transformers Are Inherently Succinct (2025) - https://news.ycombinator.com/item?id=48014197 - May 2026 (9 comments)

thesz • yesterday at 8:35 PM

My comment in the previous discussion of that paper: https://news.ycombinator.com/item?id=48014197

Authors used LTL (linear temporal logic) to express, basically, non-reduced non-ordered binary decision diagrams. Or just binary decision diagrams, BDDs.

BDDs are almost guaranteed to have exponential size because they do not employ reduction (sharing of common expressions). Reduced BDDs are more succinct and reduced ordered BDDs are even more succinct.

Also, transformers in the paper are constructed, not trained. Training any model to express some truth table is very hard. They also did not perform comparison with, say, Kolmogorov-Arnold representation, which is also universal approximator.

So this paper is not as deep as one may think it is.

➕ show 2 replies

parasti • yesterday at 8:18 PM

Paper went over my head but is this in any way related to my experience of Claude Opus 4.8 using increasingly terse language with very short, overloaded words? Lately I've been having trouble parsing the things it writes about my own code, it's using the kind of compressed language that you see typically in git commit message subject lines but relentless, always on.

➕ show 3 replies

lkm0 • yesterday at 7:59 PM

I had no idea that LLMs (or the transformer architecture) were within reach of complexity theory. But if transformers "can be" exponentially more succinct than RNNs, doesn't that mean we're approaching optimality?

➕ show 2 replies

dosisking • today at 9:36 AM

Wow, there's more than meets the eye

sometimelurker • today at 12:12 PM

reminds me of kolmogorov complexity

lee_ars • yesterday at 9:14 PM

"Why use lot word, when few word do trick?" —Optimus Prime

➕ show 1 reply

measurablefunc • yesterday at 8:18 PM

What about the other direction? What languages are expressible w/ RNNs & LTLs that require exponential blowup for transformers?

➕ show 2 replies

spacebacon • today at 9:50 AM

Semiotics territory

➕ show 1 reply

brandonb • yesterday at 6:56 PM

This paper is being published at ICLR 2026 (top AI conference), and was selected as one of three outstanding papers.

➕ show 1 reply

jalospinoso • yesterday at 8:47 PM

[flagged]

alt Hacker News

Transformers are inherently succinct

Comments