simonw • yesterday at 6:56 PM • 2 replies

How hard have you tried?

I've been finding that the Opus 4.5/4.6 and GPT-5.2/5.3 models really have represented a step-change in how good they are at running long tasks.

I can one-shot prompt all sorts of useful coding challenges now that previously I would have expected to need multiple follow-ups to fix mistakes the agents made.

For example, I got all of this from a single prompt: https://github.com/simonw/research/tree/main/cysqlite-wasm-w... That includes this demo page: https://simonw.github.io/research/cysqlite-wasm-wheel/demo.h... Here is the prompt I used: https://github.com/simonw/research/pull/79

Replies

aeyes • yesterday at 7:03 PM

What do you mean? The generated script just downloads the sources and runs pyodide: https://github.com/simonw/research/blob/main/cysqlite-wasm-w...

There are maybe 5 relevant lines in the script, and nothing complex at all that would require running for days.

basilgohar • yesterday at 7:04 PM

Can you share any examples of these one-shot prompts? I haven't yet gotten to the point where I can get those kinds of results.
