Thanks for the additional reading. I've often thought about LLMs and their ability to represent the physical world and its laws, and I have always concluded that it is not really possible to do so with "just" text tokens and their relations in a latent space. It looks to me like several different approaches are being taken to tackle this:
* You could instruct your LLM to interact with a simulator to run experiments and infer behaviour
* You could modify the transformer model and feed it spatially relevant data rather than text, as is done in the paper above
* You could change the architecture to be more conducive to representing a world state, e.g. LeCun's JEPA world model.
* You could further enhance some of the above by using a differentiable physics engine (e.g. NVIDIA Newton) to compute losses directly.
But at the end of the day, if a model is to have any hope of consistently producing realistic physics, it HAS to learn the laws of nature in some form or other. It looks to me that the next big leap could be achieved by combining the last two approaches.
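To make the "differentiable physics engine computes the loss" idea a bit more concrete, here is a toy sketch in plain Python. It is not NVIDIA Newton or any real engine: the simulator is a hand-written falling-ball integrator, the gradient comes from a tiny hand-rolled forward-mode autodiff class, and every name (`Dual`, `simulate`, `loss`, `fit_gravity`) is made up for illustration. The single learnable parameter `g` stands in for whatever a real model would have to learn.

```python
"""Toy differentiable physics: recover gravity g from an observed fall."""


class Dual:
    """A value together with its derivative w.r.t. one chosen parameter."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    @staticmethod
    def lift(x):
        # Plain numbers become constants (derivative 0).
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.val, -self.dot)

    def __sub__(self, other):
        return self + (-Dual.lift(other))

    def __rsub__(self, other):
        return Dual.lift(other) + (-self)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def simulate(g, steps=100, dt=0.01, y0=1.0):
    """Drop a ball from height y0 (semi-implicit Euler); return the heights."""
    y, v = y0, 0.0
    heights = []
    for _ in range(steps):
        v = v - g * dt  # gravity changes the velocity
        y = y + v * dt  # velocity changes the height
        heights.append(y)
    return heights


def loss(g, observed):
    """Mean squared error between simulated and observed heights."""
    simulated = simulate(g, steps=len(observed))
    total = 0.0
    for s, o in zip(simulated, observed):
        diff = s - o
        total = total + diff * diff
    return total * (1.0 / len(observed))


def fit_gravity(observed, g_init=5.0, lr=5.0, iterations=100):
    """Gradient descent on g, with the gradient taken through the simulator."""
    g = g_init
    for _ in range(iterations):
        out = loss(Dual(g, 1.0), observed)  # seed dg/dg = 1
        g -= lr * out.dot
    return g


if __name__ == "__main__":
    observed = simulate(9.81)  # stand-in for measured data
    print("estimated g:", fit_gravity(observed))
```

The point of the sketch is only that the loss is computed by running the physics, and its gradient flows back through every integration step. In a real setup the scalar `g` would presumably be replaced by network weights and the hand-written integrator by an actual differentiable engine with reverse-mode autodiff.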
P.S.: I like discussing such topics. If anyone knows a forum or Discord with like-minded people, please let me know :)