Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark

305 points • by jetter • today at 10:38 AM • 116 comments

Comments

Last weekend I bought my wife a bike off Marketplace. It was in good condition but was missing one of the internal cable routing grommets. I gave Claude pictures of the pill-shaped hole on its own and with my digital calipers measuring the long and short directions.

Gave it a short prompt and it gave me an OpenSCAD model with everything parametrized. I printed it in TPU with no changes and it was nearly perfect on the first try. Claude had subtracted 0.3 mm from the X/Y dimensions for clearance; I lowered it to 0.1 mm and it's perfect.

Much easier shape than ancient Roman architecture but still very cool how easy it was.
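
A minimal Python sketch of the parametric approach described above, assuming a stadium (pill) outline and assuming the clearance is subtracted once from each overall dimension rather than from each side. The names `Stadium`, `plug_outline`, and `to_openscad` and the example measurements are hypothetical; the emitted OpenSCAD uses only the standard `linear_extrude`, `hull`, `translate`, and `circle` modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stadium:
    """Pill-shaped (stadium) outline, in mm."""
    length: float  # overall size along X (long direction)
    width: float   # overall size along Y (short direction)


def plug_outline(hole: Stadium, clearance: float) -> Stadium:
    """Shrink a measured hole so the printed plug fits.

    Assumption: the clearance is subtracted once from each overall
    dimension (X and Y), not from each side.
    """
    if hole.length < hole.width:
        raise ValueError("length must be >= width for a stadium")
    if not 0 <= clearance < hole.width:
        raise ValueError("clearance must be in [0, width)")
    return Stadium(hole.length - clearance, hole.width - clearance)


def to_openscad(s: Stadium, height: float) -> str:
    """Emit OpenSCAD for the stadium: hull of two end-cap circles, extruded."""
    r = s.width / 2
    cx = s.length / 2 - r  # X offset of each end-cap centre
    return (
        f"linear_extrude(height={height}) hull() {{\n"
        f"  translate([{-cx:.3f}, 0]) circle(r={r:.3f}, $fn=64);\n"
        f"  translate([{cx:.3f}, 0]) circle(r={r:.3f}, $fn=64);\n"
        "}\n"
    )


# Hypothetical caliper readings; 0.1 mm is the clearance the commenter settled on.
plug = plug_outline(Stadium(length=12.0, width=6.0), clearance=0.1)
print(to_openscad(plug, height=4.0))
```

Keeping the clearance as its own parameter is what turns the 0.3 mm to 0.1 mm adjustment into a one-number change.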

jlhawn • today at 4:34 PM

> Antigravity was the only autonomous agent that implemented the Pantheon’s signature interior ceiling pattern: repeated square coffers visible through the oculus.

That is seriously really impressive. I looked at the 3D model and didn't even think to LOOK INSIDE the building before reading this.

Here's [1] the 3D model with `show_cutaway` enabled.

[1] https://modelrift.com/models/pantheon-benchmark-antigravity-...

tjoff • today at 6:46 PM

I've had such a bad time trying to do this myself. You might get a half-way decent draft on the first try and then you start to "debug" this and after a very frustrating session you realize that the model can't properly "see" the results. That is, you just can't iterate on it, at all.

I'm guessing that most harnesses/tools will resize an image before processing and in doing so will lose enough detail to make it much harder to reason about, especially with wireframe images.

I'm sure I'm holding it wrong, but this test didn't really test this. It was just a one-off. That breaks down pretty quickly, especially if you don't have reference pictures of what you are trying to create.

mellosouls • today at 11:13 AM

Antigravity may well Top whatever benchmark, but:

My Antigravity (forced) replacement for Gemini CLI requires me to log in via the browser every time I use it, and my Antigravity IDE won't update at all, so:

If it's ok I'd prefer they just work on reaching a baseline acceptable rollout before worrying about being Top in anything.

P.S. Actual title:

OpenSCAD LLM Benchmark: Building the Pantheon

ponyous • today at 2:05 PM

I've run tons of benchmarks for OpenSCAD with all kinds of models and setups, and what I realised is:

- Models are very jagged (they might excel at one type of 3D model, but not another)

- Gemini models are the least jagged in my experience and have the best image understanding

- Gemini models are also the most creative (which may be undesirable if you want a precise CAD part)

- Overall, this benchmark doesn't prove much, because one 3D model (and one attempt) is just not enough. I usually test on at least a dozen models, each generated 3 times, and should really do much more, but it's too pricey for a solo dev.

Still, thanks for publishing this. Will definitely run Flash 3.5 soon to see how it performs.
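
The repeated-runs approach in the list above can be summarized per model by averaging the runs for each test object, then reporting both the overall mean and the spread across objects as a rough proxy for "jaggedness". A minimal sketch; the score scale, model names, test objects, and data layout are all hypothetical.

```python
from statistics import mean

# scores[model][test_object] = one score per generation attempt
Scores = dict[str, dict[str, list[float]]]


def summarize(scores: Scores) -> dict[str, tuple[float, float]]:
    """Return {model: (overall mean, max-min spread of per-object means)}."""
    summary = {}
    for model, by_object in scores.items():
        # Average repeated attempts first so one lucky run doesn't dominate.
        object_means = [mean(runs) for runs in by_object.values()]
        spread = max(object_means) - min(object_means)
        summary[model] = (mean(object_means), spread)
    return summary


example = {
    "model_a": {"bracket": [8, 9, 10], "gear": [2, 3, 4]},
    "model_b": {"bracket": [6, 6, 6], "gear": [5, 5, 5]},
}
print(summarize(example))  # model_a: slightly higher mean, much larger spread
```

A single-object, single-run benchmark collapses both numbers into one score, which is why it can't distinguish a consistent model from a jagged one.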

1970-01-01 • today at 1:40 PM

Creating a single real-world object and declaring it a benchmark? No, it doesn't work that way for a robust tool. You need to do something like Iron Chef, with a Greek architecture theme and a panel of judges that declares the winner. This is just seeing which tool subjectively makes the best-looking Pantheon.

seemaze • today at 5:11 PM

I'm unconvinced; this is one of the most iconic historical buildings, with tomes written about it and plenty of existing photographs and public models to train on.

I would be more interested in benchmarking the modeling of an anonymous structure based on provided references alone. It kind of feels like the shallow magic of watching an LLM one-shot a to-do app.

seniorsassycat • today at 4:39 PM

I tried having Claude Code design a snap-fit, vase-mode printed box. It ultimately didn't work out: it couldn't get the tolerances right and kept designing features that wouldn't print in vase mode.

OpenSCAD needs unit tests. It would be powerful to assert that a profile doesn't have a slope greater than 45°, that the intersection of two objects is empty, or that a part has a specific volume.

It also needs cutaway views. I got okay results using boxes to remove everything except a sliver, so I could view a slice and the internal details. But without hatch marks, texture, or outlines, it can be hard to make out the forms.
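
One way to approximate the first of these checks is to keep the wall profile as data and test it before any geometry is generated. A minimal Python sketch of a 45° overhang test for a revolved vase-mode wall, assuming the wall is described as (radius, z) points in mm listed bottom to top; the function names and the sample profile are hypothetical, and this runs outside OpenSCAD rather than inside it.

```python
import math

Point = tuple[float, float]  # (radius, z) in mm, listed bottom to top


def max_overhang_deg(profile: list[Point]) -> float:
    """Largest angle from vertical of any segment in a revolved wall profile.

    A single vase-mode wall overhangs whether it leans in or out, so the
    absolute radial change is used.
    """
    worst = 0.0
    for (r0, z0), (r1, z1) in zip(profile, profile[1:]):
        dz = z1 - z0
        if dz <= 0:
            return 90.0  # flat or downward step: cannot be printed as one spiral wall
        worst = max(worst, math.degrees(math.atan2(abs(r1 - r0), dz)))
    return worst


def check_printable(profile: list[Point], limit_deg: float = 45.0) -> None:
    """Raise AssertionError if any segment overhangs more than limit_deg."""
    angle = max_overhang_deg(profile)
    assert angle <= limit_deg, f"overhang {angle:.1f} deg exceeds {limit_deg} deg"


check_printable([(20, 0), (25, 10), (25, 30)])  # passes: worst segment is about 26.6 deg
```

The intersection and volume checks would need real geometry queries, which this sketch does not attempt.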

dhfbshfbu4u3 • today at 11:32 AM

Still a long way from shorting Autodesk.

As a side note, Autodesk released an agentic assistant back in December for Fusion. Six months later it is still quite bad.

sjia • today at 6:37 PM

Isn't CadQuery closer than OpenSCAD to professional, traditional CAD / mechanical engineering workflows? I'm not sure which model (ChatGPT, Gemini, or Claude Code) is better at CadQuery code generation.

thedougd • today at 4:25 PM

I've been trying out MCP servers for FreeCAD, with mixed results.

One area where I had near-magic results was providing a land survey that includes a written description of the plat. It took those directions and beautifully reconstructed the boundaries to exact precision in CAD.

Where I ran into trouble was creating good constraints on sketches without being overly explicit. I kept running into it creating distance constraints from an arbitrary point instead of referencing other elements in the diagram, as a human drafter would by default.
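
For context on the survey case: a written boundary description is often a sequence of bearings and distances (metes and bounds), which becomes coordinates through simple trigonometry, and a correct traverse should close back on its starting point. A minimal Python sketch, assuming quadrant bearings such as N 45°30' E, a flat plane with east as +X and north as +Y, and a hypothetical tuple format for each call; it is not tied to any FreeCAD MCP server.

```python
import math

# One call: (N/S, degrees, minutes, E/W, distance), e.g. ("N", 45, 30, "E", 120.0)
Call = tuple[str, int, int, str, float]


def azimuth_deg(ns: str, deg: int, minutes: int, ew: str) -> float:
    """Convert a quadrant bearing to an azimuth measured clockwise from north."""
    theta = deg + minutes / 60
    if ns == "N":
        return theta if ew == "E" else 360 - theta
    return 180 - theta if ew == "E" else 180 + theta


def traverse(calls: list[Call], start=(0.0, 0.0)) -> list[tuple[float, float]]:
    """Walk the calls and return the (x=east, y=north) vertices, start included."""
    x, y = start
    points = [(x, y)]
    for ns, deg, minutes, ew, dist in calls:
        az = math.radians(azimuth_deg(ns, deg, minutes, ew))
        x += dist * math.sin(az)
        y += dist * math.cos(az)
        points.append((x, y))
    return points


def closure_error(points: list[tuple[float, float]]) -> float:
    """Distance between the last vertex and the first; about 0 for a closed boundary."""
    (x0, y0), (xn, yn) = points[0], points[-1]
    return math.hypot(xn - x0, yn - y0)


# A hypothetical 100 x 100 square lot described as four calls.
lot = [("N", 0, 0, "E", 100.0), ("S", 90, 0, "E", 100.0),
       ("S", 0, 0, "W", 100.0), ("N", 90, 0, "W", 100.0)]
print(closure_error(traverse(lot)))
```

Because each call is fully determined by its bearing and distance, this kind of input leaves little room for ambiguity, which may be part of why it reconstructed so cleanly compared with inferring sketch constraints.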

debarshri • today at 11:51 AM

I have been using GPT-5.5 to build a video game. The benchmark sounds about right. It generates assets and sprites that are good enough, if not close to AAA-level games. Will check Antigravity now.

lithiumii • today at 3:53 PM

That's actually a reason for me to try it again. My past attempts to use LLMs for OpenSCAD have greatly improved my own OpenSCAD skills.

usermac • today at 4:08 PM

I've been using LLMs to do my OpenSCAD work for over two years now. It's always where I start (and end).

emmanuelsemugga • today at 4:22 PM

This is a really important project. Preserving humanity’s knowledge and making it openly accessible, including in formats usable by AI systems, feels like one of the most valuable things happening right now. Thank you for the clear technical instructions and the bulk download options.

Projects like Anna’s Archive make it much easier for researchers and builders to work responsibly with large datasets.

pshirshov • today at 3:24 PM

That's curious, I've been trying to do some parametric modeling with Claude - and its performance was abysmal.

faangguyindia • today at 11:41 AM

Why aren't specialized CAD-generating LLMs showing up? In the future, are we going to have the same model for everything, from programming to creative writing to CAD?

a3w • today at 12:03 PM

Claude Code 2.1 / Opus 4.7 looks best to me: the dome and ceiling structure are more correct than in the others.

Why is this ranked in the middle rather than on par with the top two?

megiddo • today at 12:12 PM

This would be the same Antigravity 2.0 that "surprise, no longer an IDE, did I forget to mention that? Lolol."

jdw64 • today at 11:57 AM

To be brutally honest, I'm disappointed with Antigravity. It feels incredibly un-Google-like. The AI billing models are fragmented, and the Antigravity IDE is currently tripping over something as trivial as a basic Electron deployment config bug.

Don't get me wrong, I don't think AI coding is a bad thing. For East Asians like myself, it levels the playing field with Westerners, so as long as you rigorously review the AI's output, it's a perfectly viable tool.

However, the absolute farce we just witnessed with the Antigravity 2.0 update really raises doubts about whether 'vibe coding' can actually be trusted. If even a behemoth like Google is dropping the ball like this, it says a lot.

ReptileMan • today at 11:16 AM

The only thing moving faster than AI these days is the goalposts. Three years ago we would have been amazed if models were able to produce anything; now we have the luxury of nitpicking. Even the worst entries in the benchmark are quite impressive.

Onplana • today at 1:45 PM

Going to try it. Just downloaded. Will see how it compares to Claude Code.

anony-123 • today at 1:49 PM

So, given this benchmark, does it mean Antigravity is better than Claude Code with the Opus model? I once tried Antigravity and it was just disappointing.

nycdatasci • today at 12:07 PM

And yet 300+140=460. A very jagged surface indeed. https://gemini.google.com/share/c2a187275e26

dilap • today at 2:36 PM

Why Codex GPT-5.5 High instead of Extra High, I wonder?

u8 • today at 2:00 PM

It's crazy how I can see articles like this, but in my practical everyday use Antigravity is a horrible consumer experience. The TUI is broken. You cannot type input while the model is outputting text; otherwise both get messed up and the TUI renders a sickly blob of text. There are no keyboard shortcuts to switch between planning and execution mode, and no way to directly load skills.

The usage limits are too aggressive, too. I tried to generate a quick Deno Fresh website to act as a redirect to my GitHub from socials (literally the simplest possible thing I could have asked of it), and scaffolding alone chewed through my five-hour token limit.

To me, as a developer of CLI developer tooling, it's obvious not a lot of thought or testing went into this product, but as Google has said before: "the models are the product".

spiderfarmer • today at 11:23 AM

Next month they'll be beaten again.

And next year Google will probably sunset Antigravity.

If it doesn't make Google billions, don't trust them.

bobbycastorama • today at 12:08 PM

Why are half of the comments on Hacker News from stereotypical AI bros whose lives revolve around tech, and the other half from sceptical commentators whose lives also revolve around tech but who are disappointed with its performance?!

Where are the normal people :/

beanjuiceII • today at 11:44 AM

Google... no thanks.

fnordpiglet • today at 3:45 PM

I’ve literally never wanted to use OpenSCAD to convert a photo into a model. Usually I have a functional requirement, such as making an enclosure, with a spec sheet for the enclosed device to work from.

Claude 4.6 in Claude Code, before the lobotomy, was able to take a PSU spec sheet and my requirements for glands and ports, then use the YAPP and OpenSCAD MCPs to build, iteratively and unassisted, an end-to-end printable enclosure perfectly suited to the PSU: the right dimensions, screw holes, mountings, grilles, gland ports, everything, placed for optimal printing. This was the moment I felt like LLMs had really arrived.

A photo of a building? Why? That’s a mesh problem and is about fidelity. A technical spec sheet and diagrams turned into a functional print, with intelligent choices about the functional part baked in? That’s useful.
