LLM models can not do spacial reasoning. I haven't tried with GPT, however, Claude can not solv...

dataviz1000 • yesterday at 8:05 PM • 7 replies

LLMs cannot do spatial reasoning. I haven't tried with GPT, but Claude cannot solve a Rubik's Cube no matter how much prompt engineering I try. I got Opus 4.6 to solve ~70% of the puzzle, but then it got stuck. At $20 a run, it is prohibitively expensive.

The point is that if we can prompt an LLM to reason about three dimensions, we will likely be able to apply that to math problems it currently can't solve.

I should release my Rubik's Cube MCP server with the challenge to see if someone can write a prompt to solve a Rubik's Cube.

Replies

variodot • today at 10:30 AM

I’ve had a similar experience building a geometry/woodworking-flavored web app with Three.js and SVG rendering. It’s been kind of wild how quickly the SOTA models let me approach a new space in spatial development and 3D rendering (or SA optimization approaches, for that matter). That said, they still make easy "3D app" mistakes, like flipping the z-axis or misreading coordinate conventions. But these models make similar mistakes with CSS and page awareness. Both require good verification loops to be effective.
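As an illustration of the kind of check such a verification loop might include, here is a minimal sketch that tests whether a set of basis vectors is right-handed. The function names are hypothetical and not taken from the app described above. It relies on two conventions: Three.js world space is right-handed, while SVG's y-axis points down, which is one common source of flipped axes.

```python
# Hypothetical helper names; a sketch of one possible convention check.

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def is_right_handed(x_axis, y_axis, z_axis):
    """True when (x cross y) points the same way as z."""
    return dot(cross(x_axis, y_axis), z_axis) > 0

# A right-handed basis passes; flipping z makes it left-handed.
standard = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
flipped_z = ((1, 0, 0), (0, 1, 0), (0, 0, -1))
print(is_right_handed(*standard))
print(is_right_handed(*flipped_z))
```

A check like this could run against the axes a model generates before anything is rendered, so a flipped z-axis fails fast instead of showing up as a mirrored scene.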

embedding-shape • yesterday at 9:41 PM

> I should release my Rubik's Cube MCP server with the challenge to see if someone can write a prompt to solve a Rubik's Cube.

Do it, I'm game! You nerdsniped me immediately and my brain went "That sounds easy, I'm sure I could do that in a night" so I'm surely not alone in being almost triggered by what you wrote. I bet I could even do it with a local model!

versteegen • today at 4:43 AM

Interesting (would like to hear more), but solving a Rubik's Cube would appear to be a poor way to measure spatial understanding or reasoning. Ordinary human spatial intuition lets you think about how to move a tile to a certain location, but not really how to make consistent progress towards a solution; what's needed is knowledge of solution techniques. I'd say what you're measuring is 'perception' rather than reasoning.

Melatonic • yesterday at 11:01 PM

What about a model designed for robotics and vision? Seems like an LLM trained on text would inherently not be great for this.

DeepMind's other models, however, might do better?

holoduke • today at 11:26 AM

I bet I can even do it with the smallest Gemma 4 model using a prompt of at most 500 characters.

snet0 • yesterday at 9:24 PM

How are you handing the cube state to the model?
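The thread does not say how the cube state is encoded. Purely as an illustrative sketch (not the MCP server mentioned above), one common text encoding is a per-face grid of sticker labels. Everything below, including the function names and the face order `URFDLB`, is an assumption made for the example. Each face is indexed row-major as seen from outside the cube, with the up face toward the top for the four side faces.

```python
# Hypothetical text encoding of a cube; not the MCP server discussed in this thread.
FACES = "URFDLB"  # up, right, front, down, left, back

def solved_cube():
    """Each face is a list of 9 stickers, row-major, labeled by its face letter."""
    return {f: [f] * 9 for f in FACES}

def rotate_face_cw(stickers):
    """Rotate one 3x3 face a quarter turn clockwise, as seen looking at that face."""
    order = [6, 3, 0, 7, 4, 1, 8, 5, 2]
    return [stickers[i] for i in order]

def move_U(cube):
    """Return a new cube after a clockwise quarter turn of the up face."""
    new = {f: list(s) for f, s in cube.items()}
    new["U"] = rotate_face_cw(cube["U"])
    # The top rows of the side faces cycle F -> L -> B -> R -> F.
    new["L"][0:3] = cube["F"][0:3]
    new["B"][0:3] = cube["L"][0:3]
    new["R"][0:3] = cube["B"][0:3]
    new["F"][0:3] = cube["R"][0:3]
    return new

def to_prompt(cube):
    """Serialize the cube as labeled 3x3 grids, one block per face."""
    blocks = []
    for f in FACES:
        s = cube[f]
        rows = [" ".join(s[r * 3:r * 3 + 3]) for r in range(3)]
        blocks.append(f"{f}:\n" + "\n".join(rows))
    return "\n\n".join(blocks)

def is_plausible(cube):
    """Cheap sanity check: every sticker label appears exactly 9 times."""
    flat = [x for f in FACES for x in cube[f]]
    return all(flat.count(f) == 9 for f in FACES)

cube = move_U(solved_cube())
print(to_prompt(cube))
```

With an encoding like this, a tool could return `to_prompt(cube)` after every move, and a check such as `is_plausible` could catch a corrupted state before it reaches the model.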

Torkel • yesterday at 9:37 PM

*yet
