These models are so powerful. It's totally possible to build entire software products in the ...

xrd • today at 6:10 PM • 7 replies • view on HN

These models are so powerful.

It's totally possible to build entire software products in the fraction of the time it took before.

But, reading the comments here, the behaviors from one version to another point version (not major version mind you) seem very divergent.

It feels like we are now able to manage incredibly smart engineers for a month at the price of a good sushi dinner.

But it also feels like you have to be diligent about adopting new models (even same family and just point version updates) because they operate totally differently regardless of your prompt and agent files.

Imagine managing a team of software developers where every month it was an entirely new team with radically different personalities, career experiences and guiding principles. It would be chaos.

I suspect that older models will be deprecated quickly and unexpectedly, or, worse yet, will be swapped out with subtle different behavioral characteristics without notice. It'll be quicksand.

Replies

simonw • today at 6:21 PM

I had an interesting experience recently where I ran Opus 4.6 against a problem that o4-mini had previously convinced me wasn't tractable... and Opus 4.6 found me a great solution. https://github.com/simonw/sqlite-chronicle/issues/20

This inspired me to point the latest models at a bunch of my older projects, resulting in a flurry of fixes and unblocks.

➕ show 4 replies

nly • today at 10:47 PM

I keep giving the top Anthropic, Google and OpenAI models problems.

They come up with passable solutions and are good for getting juices flowing and giving you a start on a codebase, but they are far from building "entire software products" unless you really don't care about quality and attention to detail.

jama211 • today at 6:16 PM

Yeah I keep maintaining a specific app I built with gpt 5.1 codex max with that exact model because it continues to work for the requests I send it, and attempts with other models even 5.2 or 5.3 codex seemed to have odd results. If I were superstitious I would say it’s almost like the model that wrote the code likes to work on the code better. Perhaps there’s something about the structure it created though that it finds easier to understand…

seizethecheese • today at 6:12 PM

> It feels like we are now able to manage incredibly smart engineers for a month at the price of a good sushi dinner.

In my experience it’s more like idiot savant engineers. Still remarkable.

➕ show 1 reply

worldsavior • today at 6:19 PM

Sushy dinner? What are you building with AI, a calculator?

WarmWash • today at 6:20 PM

I have long suspected that a large part of people's distaste for given models comes from their comfort with their daily driver.

Which I guess feeds back to prompting still being critical for getting the most out of a model (outside of subjective stylistic traits the models have in their outputs).

HardCodedBias • today at 8:30 PM

"These models are so powerful."

Careful.

Gemini simply, as of 3.0, isn't in the same class for work.

We'll see in a week or two if it really is any good.

Bravo to those who are willing to give up their time to test for Google to see if the model is really there.

(history says it won't be. Ant and OAI really are the only two in this race ATM).

alt Hacker News

Replies