
Tiberium, today at 6:44 AM

You can do this yourself with LLMs, but I admit that it's still not a trivial process. For the actual best results and the highest chance of progressing quickly, you really need a $200 OpenAI/Anthropic sub and a lot of free time. I seriously recommend OpenAI over Anthropic for this, as in my experience GPT 5.5 is far more thorough in reverse engineering and makes mistakes less often.

Next, you need a reverse engineering suite, either Ghidra (open-source) or IDA (paid), plus some tooling to expose it to an LLM. I have a custom IDAPython-based CLI that lets LLMs trivially call into IDA's Python APIs from the command line, but there are dozens of different Ghidra/IDA MCP/CLI/SQL projects out there.
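To give an idea of what "exposing the suite to an LLM" boils down to, here is a minimal sketch of a JSON-in/JSON-out bridge. This is not my actual CLI: `FakeDatabase`, `handle_request` and the two command names are all made up for illustration, and the in-memory dict just stands in for the real IDA/Ghidra database.

```python
import json


class FakeDatabase:
    """Stand-in for a disassembler database (hypothetical, not an IDA/Ghidra API)."""

    def __init__(self, functions):
        self.functions = dict(functions)  # address (int) -> name (str)

    def list_functions(self):
        return [
            {"address": hex(ea), "name": name}
            for ea, name in sorted(self.functions.items())
        ]

    def rename(self, address, name):
        ea = int(address, 16)
        if ea not in self.functions:
            raise KeyError(f"no function at {address}")
        self.functions[ea] = name
        return {"address": address, "name": name}


def handle_request(db, request_json):
    """Dispatch one JSON request from the agent; always answer with JSON."""
    try:
        request = json.loads(request_json)
        command = request["command"]
        args = request.get("args", {})
        if command == "list_functions":
            result = db.list_functions()
        elif command == "rename":
            result = db.rename(args["address"], args["name"])
        else:
            raise ValueError(f"unknown command: {command}")
        return json.dumps({"ok": True, "result": result})
    except (KeyError, ValueError) as exc:
        # Return errors as data so the agent can read them and retry.
        return json.dumps({"ok": False, "error": str(exc)})
```

In a real setup the two methods would call into the suite's scripting API instead of a dict. The important part is that the agent gets structured output and readable errors back.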

After that, it becomes more mechanical: you just let the agent (or multiple agents) explore and start renaming things (it's easier to start with functions + globals). It's better if the renames are applied directly in the suite you're using, so that all names show up everywhere. This is actually quite a quick process, especially for older/smaller games. Once you have enough function names, you need to instruct the LLMs to add actual types, so that the decompile doesn't have raw casts and offsets, but real fields + types. They will also need to apply types to globals and functions.
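To illustrate the "raw offsets to real fields" step, here is a rough sketch of turning a list of observed field accesses into a C struct declaration, with the unexplored gaps kept as explicit padding. The `build_struct` helper and the `Player` fields are hypothetical. It also assumes every offset is naturally aligned for its type, otherwise the compiler would insert its own padding and the layout would drift.

```python
def build_struct(name, fields, total_size):
    """Render a C struct from (offset, c_type, size_in_bytes, field_name) tuples.

    Gaps between known fields become char arrays so the known offsets stay put.
    """
    lines = [f"struct {name} {{"]
    cursor = 0
    for offset, c_type, size, field_name in sorted(fields):
        if offset < cursor:
            raise ValueError(f"field {field_name} overlaps the previous field")
        if offset > cursor:
            lines.append(f"    char unknown_{cursor:02x}[{offset - cursor}];")
        lines.append(f"    {c_type} {field_name}; /* +0x{offset:02x} */")
        cursor = offset + size
    if cursor > total_size:
        raise ValueError("fields extend past the declared struct size")
    if cursor < total_size:
        lines.append(f"    char unknown_{cursor:02x}[{total_size - cursor}];")
    lines.append("};")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example: two fields recovered out of a 0x18-byte object.
    print(build_struct(
        "Player",
        [(0x10, "float", 4, "health"), (0x00, "int", 4, "id")],
        0x18,
    ))
```

Once a declaration like this is applied in the suite, something like `*(float *)(a1 + 16)` in the decompile becomes a plain field access, which is exactly what you want before exporting.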

Afterwards comes the hardest part: you can export this very well named/annotated decompile and do one of:

1) Have the LLM try to directly polish the code enough to be compilable. Despite the decompile quality, this isn't as hard as it sounds.

2) Have the LLM recreate the project from scratch in another language, something like https://banteg.xyz/posts/crimsonland/. It can be easier to get initial results this way, but you're sure to hit tons of bugs due to the differences in implementations.

The 1st approach is more thorough, especially because you can find a matching C/C++ compiler and start doing actual decomp.dev-like work: binary matching functions so that your source code compiles down to the exact same bytes as in the original binary. This is the longest, hardest part. Human community projects take years to complete, and LLMs still struggle to get 100% matches, so you might spend weeks to months and tons of LLM usage; an agent can take like an hour to match a single 1000-byte function in the worst case.
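For a feel of what "matching" means in the simplest possible form, here is a sketch that compares the original bytes of a function against what your rebuilt source compiles to. `match_report` is a made-up helper and the bytes in the example are arbitrary. Real matching tools work at the instruction level and mask out things like relocated addresses, while this naive positional compare falls apart as soon as one instruction changes length.

```python
def match_report(original: bytes, rebuilt: bytes) -> dict:
    """Naive byte-for-byte comparison of two compiled versions of one function."""
    compared = min(len(original), len(rebuilt))
    mismatches = [i for i in range(compared) if original[i] != rebuilt[i]]
    longest = max(len(original), len(rebuilt))
    matching = compared - len(mismatches)

    if mismatches:
        first_mismatch = mismatches[0]
    elif len(original) != len(rebuilt):
        first_mismatch = compared  # one version simply ends earlier
    else:
        first_mismatch = None

    return {
        "exact": original == rebuilt,
        "percent": 100.0 * matching / longest if longest else 100.0,
        "first_mismatch": first_mismatch,
    }


if __name__ == "__main__":
    # Arbitrary example bytes: same length, last two bytes differ.
    original = bytes.fromhex("5589e58b4508")
    rebuilt = bytes.fromhex("5589e58b550c")
    print(match_report(original, rebuilt))
```

An agent loop for matching is basically: edit the source, recompile, run a comparison like this, repeat until `exact` is true or you give up on that function.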

As a small note: you do not need a 100% binary match for the game to be absolutely playable. Compilers are often very tricky to steer, so you might already have code that does exactly the same thing, but a slightly different local variable layout can make the compiler use different registers.