> Clearly the theory that LLM's can't "extrapolate" is woefully incomplete at best (and most likely simply incorrect).
What example is there where an LLM has extrapolated? All I've seen is a data set so large, combined with an extra decomposition process, that interpolation feels like extrapolation if you don't look closely enough.
> but a theory of why further advancements can't solve the deficiencies
How about LeCun's?