I gave Codex 5.4 Playwright MCP access to the site and a prompt of "Use Playwright CLI Skill to open https://playstarfling.com/ and load the game. Work out how to play it, and devise a strategy to win." After a about half a dozen attempts it had figured the game out. Then I prompted it to "Score as much as you can." It wrote itself an auto-play script that just keeps going.
I stopped it running at 10866. That's currently the high score. I appreciate that this is pointless and proves nothing, but I've been experimenting with automating testing games (I work at a gaming company at the moment) so it felt like an opportunity to try an experiment.
Do tell? How did it play the game, did you watch? Just took forever with every shot, or how did that play out with the LLM induced latency?
Are you sure the script is actually testing the gameplay? Given it can see the entire source code of the game.