Really unfortunate that I forgot what YT video i saw about this just 2 weeks ago.
It was about Googles PaLM-E evolution and progress. It basically has two models one which controls the robot, the other is a llm and they are combined together in some attention layer.
Must have been https://www.youtube.com/watch?v=2mrGMMmrVNE from Welch Labs.