I’ve been working on this for about a year through four major rewrites. Godogen is a pipeline that takes a text prompt, designs the architecture, generates 2D/3D assets, writes the GDScript, and tests it visually. The output is a complete, playable Godot 4 project.
Getting LLMs to reliably generate functional games required solving three specific engineering bottlenecks:
1. The Training Data Scarcity: LLMs barely know GDScript. It has ~850 classes and a Python-like syntax that will happily let a model hallucinate Python idioms that fail to compile. To fix this, I built a custom reference system: a hand-written language spec, full API docs converted from Godot's XML source, and a quirks database for engine behaviors you can't learn from docs alone. Because 850 classes blow up the context window, the agent lazy-loads only the specific APIs it needs at runtime.
2. The Build-Time vs. Runtime State: Scenes are generated by headless scripts that build the node graph in memory and serialize it to .tscn files. This avoids the fragility of hand-editing Godot's serialization format. But it means certain engine features (like `@onready` or signal connections) aren't available at build time—they only exist when the game actually runs. Teaching the model which APIs are available at which phase — and that every node needs its owner set correctly or it silently vanishes on save — took careful prompting but paid off.
3. The Evaluation Loop: A coding agent is inherently biased toward its own output. To stop it from cheating, a separate Gemini Flash agent acts as visual QA. It sees only the rendered screenshots from the running engine—no code—and compares them against a generated reference image. It catches the visual bugs text analysis misses: z-fighting, floating objects, physics explosions, and grid-like placements that should be organic.
Architecturally, it runs as two Claude Code skills: an orchestrator that plans the pipeline, and a task executor that implements each piece in a `context: fork` window so mistakes and state don't accumulate.
Everything is open source: https://github.com/htdt/godogen
Demo video (real games, not cherry-picked screenshots): https://youtu.be/eUz19GROIpY
Blog post with the full story (all the wrong turns) coming soon. Happy to answer questions.
This actually produces more impressive results than I expected. My understanding was that models are quite poor at spatial reasoning/understanding, so I'm surprised it can generate such good assets. Do you use different models for the 3d generation?
Great work but why not use C# instead of GDScript?
LLMs are really good at C# (and tscn files for some reason), so that solves the "LLMs suck at GDScript" problem. Also, C# can be cheaper in terms of token usage (even accounting for not having to load the additional APIs): one agent writes the interfaces, another one fills in the details.
Saying this because I had really enjoyed vibecoding a Godot game in C# - and it was REALLY painful to vibecode with GDScript.
Upvoting this! And thanks!
[flagged]
What is the development loop like with this? There’s a lot of folks successfully building games with agents already on the AI gamedev Discord server. So I’m wondering if there were some shorter paths to your goal. You might want to exchange notes with folks there.