simonw, yesterday at 10:21 PM

A bit odd that this talks about AutoGPT and declares it a failure. Gary quotes himself describing it like this:

> With direct access to the Internet, the ability to write source code and increased powers of automation, this may well have drastic and difficult to predict security consequences.

AutoGPT was a failure, but Claude Code / Codex CLI / the whole category of coding agents fit the above description almost exactly and are effectively AutoGPT done right, and they've been a huge success over the past 12 months.

AutoGPT was way too early - the models weren't ready for it.


Replies

lbrito, yesterday at 10:44 PM

> they've been a huge success over the past 12 months

They lose billions of dollars annually.

In what universe is that a business success?

anonymous908213, yesterday at 10:35 PM

Have they actually been a huge success, though? You're one of the most active advocates here, so I want to ask you what you make of "the Codex app". More specifically, the fact that it's a shitty Electron app. Is this not a perfect use case for agents? Why can OpenAI, with unlimited agents, not let them loose on the codebase with instructions to replace Electron with an appropriate cross-platform native framework, or even a per-platform native GUI? They said they chose Electron for ease of portability for cross-platform delivery, but they could allocate 1, 10, or 1000 agents to develop native Linux and Windows ports of the macOS codebase they started with. This is not even a particularly serious endeavour. I have coded a cross-platform chat application myself with more advanced features than what Codex offers, and chat GUIs are really among the most basic things you can build; practically every consumer-targeted GUI application eventually shoves a chat box into a significantly more complex framework.

The conclusion that seems readily apparent to me, as it has always been, is that these "agents" are completely incapable of creating production-grade software suitable for shipping, or even meaningfully modifying existing software for a task like a port. Like the one-shot game they demo'd, they can make impressive proof-of-concepts, but nothing any user would actually use, and nothing with a foundation suitable for developers to build upon.
