Reinforcement learning can solve a Rubik’s Cube. An LLM that hasn’t been trained to solve a Rubik’s Cube cannot.