Hacker News

hmokiguess yesterday at 7:32 PM (19 replies)

Does anyone actually use these smaller models for coding? If so, how? I usually Opus everything. Is the play to plan/design/architect with a heavier model, then delegate structured tasks to these smaller ones? I'd appreciate hearing from someone who has tried and tested both paths.


Replies

linuxhansl yesterday at 7:51 PM

I am using Opus 4.x at work, and these "smaller" (20-80bn, 3-4bn active) models at home. Unfortunately there is no comparison, yet (IMHO anyway).

With Opus I can work, trust its designs, architecture suggestions, and code changes, even in a complex code base.

The smaller models seem to "try". They work for smaller tasks, but for more complex tasks it's often more work than doing it myself.

I wish it were different, and maybe in a year or two it will be.

motoboi today at 4:05 PM

Unless you are token rich, you'll have to find a way pretty soon.

For tasks (like kubernetes, linux, reports, database exploration and such) I use GLM5.1. Faster is actually smarter in those cases. And much cheaper too.

Opus 4.8 is for the unknown. Things I don't know how to do myself.

0123456789ABCDE yesterday at 8:18 PM

>Is the play to plan/design/architect with a heavier model than delegate structured tasks to these smaller ones?

always has been

claude code has opusplan — uses opus while in plan mode, switches to sonnet for execution.

https://code.claude.com/docs/en/model-config#opusplan-model-...

edit: you can make it work with sonnet for planning, and haiku for execution, or any other combination you fancy to work with.

https://code.claude.com/docs/en/model-config#control-the-mod...

hedgehog yesterday at 9:07 PM

Yes. Divide execution of a change into separate responsibilities. Designate the main chat as the "orchestrator", Opus. You designate a goal, then tell it to grind until it gets there using the following sub-agents in sequence:

1. Step execution (Sonnet): Work for 30 minutes / 100k tokens at the direction of the Orchestrator

2. Review (Opus): Scrutinize the previous step's work for errors and fidelity to the instructions, fix any problems, and record opportunities to improve the agent configuration + tools to reduce errors and token usage (record those to a file).

3. Self-improvement (Opus): Implement the highest impact self-improvement items that don't require user intervention.

Repeat: Until the orchestrator session's token budget is exhausted (set it to 1M or whatever).

The underlying rationale is to keep each step manageable to maximize adherence to instructions and minimize cost (even cached tokens cost something). Prompt tokens are much cheaper than generated ones, so to the extent Opus mostly reviews rather than drives, that saves a lot too. Self-improvement steps are very expensive but the improvements compound; if you're going to run a job for days or weeks it's way more expensive not to do them.
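A minimal Python sketch of the execute / review / self-improve loop described above, in case it helps to see the control flow. Everything here is hypothetical: `call_model` stands in for whatever harness call you actually use, the model names are just labels, and the per-step limit (30 minutes / 100k tokens) is not enforced.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical harness hook: (model_name, prompt) -> (output_text, tokens_used)
ModelCall = Callable[[str, str], tuple[str, int]]


@dataclass
class Orchestrator:
    call_model: ModelCall
    token_budget: int = 1_000_000  # "set it to 1M or whatever"
    tokens_used: int = 0
    improvement_log: list[str] = field(default_factory=list)

    def _run(self, model: str, prompt: str) -> str:
        """Call a model and charge its tokens against the session budget."""
        output, tokens = self.call_model(model, prompt)
        self.tokens_used += tokens
        return output

    def grind(self, goal: str) -> int:
        """Cycle through the three sub-agents until the budget is spent.

        The budget is only checked at the start of each cycle, so the last
        cycle may overshoot it. Returns the number of completed cycles.
        """
        cycles = 0
        while self.tokens_used < self.token_budget:
            # 1. Step execution with the cheaper model.
            work = self._run("sonnet", f"Goal: {goal}\nDo the next step.")
            # 2. Review with the stronger model; keep its notes.
            review = self._run("opus", f"Review and fix this work:\n{work}")
            self.improvement_log.append(review)
            # 3. Self-improvement based on the review notes.
            self._run("opus", f"Implement the highest-impact improvement from:\n{review}")
            cycles += 1
        return cycles
```

In a real setup the review notes would go to a file rather than an in-memory list, and the orchestrator itself would be a model deciding what the next step is; the sketch only shows the sequencing and the budget check.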

Edit: I do this in Claude Code with the Anthropic models as well as Qwen family models for offline use.

pkaye yesterday at 11:03 PM

Because the Haiku model is quite cheap and doesn't screw up too often, I used it for interactive coding on my existing projects under the older Copilot plans.

For simple features I don't have a full plan worked out. I write a bit of code, then tell the model in a short one-line prompt what it should do. Sometimes I put temporary comments in the code to give it guidance. Generally, if the code change is within a file or package, Haiku is good enough to follow what you ask and not mess up too much. I also have skills created over time to give it guidance. There were some months on GitHub Copilot when I had excess credits left at the end of the month that I frantically tried to use up.

Even the AI code completions can be pretty good on their own. Sometimes I write some temporary comments describing what the code should do and just press Tab-Tab-Tab and the entire function is done.

I think there is a tendency for people to go for the advanced models thinking they will screw up less, but if you really understand the code it's easier to do it interactively with a lesser model.
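The "within a file or package" rule of thumb can be written down as a tiny router. This is only a sketch under my own assumptions: "package" is approximated as "parent directory", and the model names `small-model` and `large-model` are placeholders, not real identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    description: str
    files: tuple[str, ...]  # paths the change is expected to touch


def pick_model(change: Change, small: str = "small-model", large: str = "large-model") -> str:
    """Use the small model when every touched file lives in one directory."""
    # Treat the parent directory as the "package"; root-level files map to "".
    packages = {f.rsplit("/", 1)[0] if "/" in f else "" for f in change.files}
    return small if len(packages) <= 1 else large
```

For example, a change touching `ui/button.py` and `ui/style.py` stays on the small model, while one touching `api/x.py` and `db/y.py` goes to the large one. A directory check is a crude proxy for scope, so treat it as a starting point.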

ojr yesterday at 7:37 PM

I use Gemini 3 Flash. I've seen the Claude Code setups; people bullish on Anthropic are driving up token usage, but I am able to produce outcomes for a fraction of the money.

veselin yesterday at 8:06 PM

Claude Code itself spins up a lot of its subagents with Haiku. The model has a low hallucination rate, so it is great for exploration tasks. I guess that will be the best use of this model as well. That covers a lot of tokens: many tasks spin up multiple exploration agents before the planning or fixing, which is then just a few tool calls.

killermouse0 yesterday at 7:40 PM

I was wondering the same. I guess it makes sense to use a heavyweight model to make the entire design and split the work so that smaller models (possibly local ones?) would then do the coding... But how would I even do that? I'm using Claude Code. Would I need support for this within the harness?

XCSme yesterday at 10:53 PM

Not sure if it's considered small in any way, but DeepSeek V4 Flash is really decent.

axi0m yesterday at 8:50 PM

In my experience, smaller models like Haiku 4.5 have indeed shown very convincing results on specific, scoped tasks (themselves generated by a more capable model such as Opus 4.6). We use this kind of workflow in production to optimize speed, efficiency, and costs.

lanthissa yesterday at 8:17 PM

I used to use Opus for everything, but that's not an option once you move to a multi-agent system unless you're working on something like high-end research. I could easily spend 3k a day if I were using Opus as just a normal dev.

As we build a better and better harness and better feedback/verifiers, we're switching more to 3.5 flash. I think Chinese models would work too, but we can't use those atm.

Generally there's a coordinator running Opus and an ever-growing set of skills and subagents that take actions using weaker models and send feedback to the coordinator.

I'm pretty convinced at this point that we're past the level of intelligence needed for most tasks most devs do, and that the level needed will trend down as we build better harnesses for our own codebases.

cush yesterday at 8:45 PM

Implicitly, yes. A lot of harnesses will invoke small models to do small changes, saving time and tokens.

newusertoday yesterday at 7:39 PM

Plan using Opus, execute using a local model.

glaslong yesterday at 8:13 PM

I keep trying to, because I really want to make qwen 3.6 35b work for end implementation of a fleshed out spec (mostly for local data privacy reasons).

...but I spend so much more time correcting it, or building pipelines to try, retry, and converge, that it's rarely worthwhile for me in either time or $ spent vs Opus.
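For anyone wondering what a "try, retry, and converge" pipeline looks like, here is a rough Python sketch. It is not the setup described above, just one plausible shape: `generate` and `verify` are hypothetical callables you would supply (e.g. a model call and a test runner), and the model names are placeholders.

```python
from typing import Callable, Optional

# Hypothetical hooks, supplied by the caller:
Generate = Callable[[str, str], str]     # (model, prompt) -> candidate output
Verify = Callable[[str], Optional[str]]  # None on pass, else an error message


def converge(
    spec: str,
    generate: Generate,
    verify: Verify,
    local_model: str = "local-model",
    fallback_model: str = "large-model",
    max_local_attempts: int = 3,
) -> tuple[str, str, int]:
    """Retry the local model with verifier feedback, then escalate once.

    Returns (candidate, model_used, number_of_generate_calls).
    The fallback candidate is returned without being verified.
    """
    prompt = spec
    for attempt in range(1, max_local_attempts + 1):
        candidate = generate(local_model, prompt)
        error = verify(candidate)
        if error is None:
            return candidate, local_model, attempt
        # Feed the verifier's complaint back into the next local attempt.
        prompt = f"{spec}\n\nPrevious attempt failed verification:\n{error}"
    # Local attempts exhausted: hand the original spec to the larger model.
    candidate = generate(fallback_model, spec)
    return candidate, fallback_model, max_local_attempts + 1
```

Note that the escalation step sends the spec to the larger model, which may defeat the point if the reason for going local is data privacy; in that case you would drop the fallback and just report the failure.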

ebbi yesterday at 8:21 PM

I use it for smaller changes that I need to make, mainly on UI fixes or some easy logic fixes.

scotty79 yesterday at 8:18 PM

In DeepSWE anything from Anthropic is a whole class lower than what's achievable with gpt-5.5

So by using Opus you are using the "smaller" model. Well, not really smaller, just worse. The actual smaller models can at least be faster.

altmanaltman yesterday at 8:13 PM

I actually find planning/design easier with a smaller model and implementation with a larger one. I'm mostly working manually with the model on planning and design, the decisions are mine, and smaller models are faster. And when there's a clear design/way forward, the bigger models are usually better at understanding the overall context and applying the specific patch they were assigned. I call it the 1-2 punch system, where you throw the first light punch, then the harder punch when it's actually important to hit properly. I know it goes against the standard of throwing the biggest model at design, but in my experience the bigger models try to do TOO MUCH and take a lot of time, which is not good in the design/arch/boilerplate phase.

claud_ia today at 10:02 AM

[flagged]

wd021 today at 2:12 PM

[dead]