Hacker News

pornel, today at 7:56 PM (2 replies)

[meta] I wonder why people have such wildly different bars for what counts as "good" agentic coding?

In a way, it's absolutely amazing that we've gone from "Playing 'Set a Timer' on Apple Music" intelligence to something that may pass the Turing Test, but in practical terms the small models are still far from what I'd call "good" for anything more than a tech demo.

To me, 7B models are just a fuzzy echo of Wikipedia. Gemma models at 4-bit are too clumsy to even reliably generate JSON for tool calls or copy a line of code to apply a patch.
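To make "reliably generate JSON for tool calls" concrete, here is a minimal sketch of the kind of check an agent harness might run on each model reply. The `name`/`arguments` shape, the `parse_tool_call` helper, and the `apply_patch` tool name are assumptions for illustration, not any particular framework's API.

```python
import json
from typing import Optional


def parse_tool_call(raw: str, known_tools: set) -> Optional[dict]:
    """Return the parsed call if it is well-formed, otherwise None.

    Assumed (hypothetical) shape: {"name": <str>, "arguments": <object>}.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        # Truncated output, trailing commas, stray prose around the JSON, etc.
        return None
    if not isinstance(call, dict):
        return None
    name = call.get("name")
    if not isinstance(name, str) or name not in known_tools:
        # Missing, non-string, or hallucinated tool name.
        return None
    if not isinstance(call.get("arguments"), dict):
        return None
    return call


tools = {"apply_patch"}  # hypothetical tool name
good = '{"name": "apply_patch", "arguments": {"line": 3}}'
bad = '{"name": "apply_patch", "arguments": {"line": 3}'  # missing closing brace

print(parse_tool_call(good, tools) is not None)
print(parse_tool_call(bad, tools) is not None)
```

A model that fails this check on a noticeable fraction of turns forces the harness to retry or repair the output, which is the babysitting cost in question.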

Qwen needs so much detail and babysitting to stop it from doom looping or losing the plot that the instructions I need to give are usually longer than the code I end up keeping.

Is there some magic prompt that I don't know? Do other people just have a lot more patience, or way lower expectations?


Replies

papersail, today at 8:12 PM

I had similar doubts. I think expectations differ because the workload differs. For small scripts, glue code, or simple CRUD changes, smaller models such as Qwen3.6-27B can work far better than they do on a larger, messier code base.

verdverm, today at 10:09 PM

There is a lower bar (and it gets lower over time), but IME the config you are describing is still too low.

Qwen/Gemma in the 27/35B range @fp8 are better than gemini-2.5 but worse than gemini-3.1. You can run DS4-flash @fp8 on two DGX Sparks, and things keep getting better. DiffusionGemma came out recently with 4x token generation speeds.

tl;dr - the models you appear to be trying with are too small or too quant'd
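For a rough sense of what "too small or too quant'd" trades away, here is a back-of-the-envelope sketch of weight memory at different sizes and precisions. It assumes weights only (parameters times bits per weight) and ignores KV cache, activations, and runtime overhead, so real usage will be higher; `weight_gb` is a made-up helper name.

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (10^9 bytes), weights only."""
    return params_billions * bits_per_weight / 8


# Configurations mentioned in the thread, plus the same 27B model at 4-bit.
for label, params, bits in [
    ("7B @ 4-bit", 7, 4),
    ("27B @ 4-bit", 27, 4),
    ("27B @ fp8", 27, 8),
]:
    print(f"{label}: ~{weight_gb(params, bits)} GB")
```

Under these assumptions a 7B model at 4-bit needs about 3.5 GB for weights, while a 27B model at fp8 needs about 27 GB, which is roughly the gap between the two setups being compared.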