bel8 • yesterday at 8:13 PM • 18 replies

It's a start and I welcome competition but I don't think I ever used small cloud models like Haiku 4.5. They are cute but for serious coding they tend to waste your expensive time.

And this certainly won't bring me back to GitHub Copilot, which I cancelled yesterday.

GitHub Copilot had competitive pricing until yesterday when they changed from per-request to one of the most expensive per-token quotas. Seriously, take a look at their burning subreddit for some laughs: https://www.reddit.com/r/GithubCopilot

I have since changed to DeepSeek Flash on high, which is Sonnet+ level for almost free.

If I feel I still need smarter models, I might sign up for $20/mo Codex to use GPT 5.5, which, in my opinion, is the best I can access right now.

Replies

fnordpiglet • yesterday at 10:16 PM

I use larger models to organize work into a topologically sorted task graph and pin smaller models to the tasks depending on their complexity, with a larger model evaluating the work and patching where necessary. This uses Haiku quite often for routine work. As a result, I’m able to do multi-hour, highly complex work with superior results and a much lower bill: a parent orchestrator can do a massive amount of labor within a single context window by effectively organizing work, reviewing quality, and integrating where needed. I don’t use Haiku directly, but it’s often 30-40% of any major effort’s token use. This further improves time to completion as well as cost, and I find Haiku is better at following literal instructions and plans without “second guessing,” while Opus-class models second-guess in their thinking constantly.

As such, Haiku isn’t a waste of my time; it saves enormous amounts of time for me. But I spent a large amount of time building the orchestration system up front and iterating on it to get here. Interestingly, I found my experience as a director and later a distinguished engineer gave me the tools to build it and get it working well and reliably end to end: the dynamics of multi-agent workflows of varying capability are not a lot different from the dynamics of a 1000-engineer organization.
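
A minimal sketch of the general pattern described above, not the commenter's actual system: tasks are ordered with a topological sort and each one is pinned to a model tier by a complexity score. The task names, the scores, the threshold, and the labels "small-model" and "large-model" are all hypothetical placeholders; a real orchestrator would also dispatch the work and have the larger model review it.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "design_schema": set(),
    "write_migration": {"design_schema"},
    "update_handlers": {"design_schema"},
    "integration_review": {"write_migration", "update_handlers"},
}

# Hypothetical complexity scores (1 = routine, 5 = hard), e.g. set by a planner.
COMPLEXITY = {
    "design_schema": 5,
    "write_migration": 2,
    "update_handlers": 2,
    "integration_review": 4,
}


def pick_model(complexity, threshold=3):
    """Pin routine tasks to the small model and the rest to the large one."""
    return "small-model" if complexity < threshold else "large-model"


def build_schedule(dependencies, complexity):
    """Return (task, model) pairs in an order that respects dependencies."""
    order = TopologicalSorter(dependencies).static_order()
    return [(task, pick_model(complexity[task])) for task in order]


schedule = build_schedule(DEPENDENCIES, COMPLEXITY)
for task, model in schedule:
    print(f"{task:20s} -> {model}")
```

In this toy graph the two routine tasks land on the small model, while the design step and the final review stay on the large one.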

SwellJoe • yesterday at 10:30 PM

I've been doing benchmarking of various models for finding hard security bugs, and my faith in Haiku (and Sonnet, even) has dropped precipitously in the process. Self-hosted Qwen 3.6 27B consistently outperforms both for finding security bugs, which was a shocking result. I expected Qwen to be around Haiku level, maybe a little worse, and I definitely expected it to be worse than Sonnet.

And, DeepSeek and MiMo perform much better than Haiku and Sonnet, near Opus/GPT 5.5 levels, at a fraction of the cost.

There's seemingly no reason to ever use Haiku or Sonnet, if you're not getting it for free or as part of a subscription (that you don't usually saturate).

GaryBluto • yesterday at 8:34 PM

Almost exactly the same story here. I've also had little to no refusals from DeepSeek, with its Chinese values meaning substantially less friction when it comes to things like reverse engineering, finding copyrighted files, working with dubiously-sourced source code, et cetera. I don't think I'd go back to Copilot even if they dropped prices by 90%.

lambda • yesterday at 9:32 PM

Yeah, seems like this is in the range of Qwen 3.6, Gemma 4, Nemotron 3 Super, and the like. There are a lot of models, including much smaller, cheaper ones (like Qwen 3.6 35B-A3B), that are similarly competitive with Haiku. I can run these on my laptop; I don't need to rent them from Microsoft.

I suppose if you're reeling at the new Copilot bill but want to stay in their ecosystem, this gives you something to use, but for most folks, there's a plethora of better options.

Hfuffzehn • today at 9:57 AM

Agreed. Seems like this could have been a nice model if we were still in the old GitHub Copilot free request/premium multiplier mode. It could have been a good compromise to somehow rein in the costs for Microsoft.

But with Copilot now just charging per-token prices, I don't see how this is competitive with Chinese models.

It is probably telling that you can't find the costs in the announcement. Input $0.75, cached input $0.075, and output $4.50 might be competitive with Haiku, but nobody in their right mind uses Haiku, and Anthropic has abandoned it, chasing the tokenmaxers who aren't thinking about budgets.

So I guess they are aiming for corporate customers who are bound to Microsoft through compliance approval, will soon start seeing their budgets explode, and have to find some corporate compromise.
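
For a rough sense of scale, here is a small sketch that turns the prices quoted above into a per-session cost. It assumes the figures are US dollars per million tokens, which the comment does not state, and the token counts in the example are made up for illustration.

```python
# Prices quoted in the comment above. The per-million-token unit is an
# assumption; the comment gives only the bare numbers.
PRICE_PER_MILLION = {"input": 0.75, "cached_input": 0.075, "output": 4.50}


def session_cost(input_tokens, cached_input_tokens, output_tokens,
                 prices=PRICE_PER_MILLION):
    """Return the estimated cost in dollars for one session's token usage."""
    total = (
        input_tokens * prices["input"]
        + cached_input_tokens * prices["cached_input"]
        + output_tokens * prices["output"]
    )
    return total / 1_000_000


# Hypothetical agentic session: 1M fresh input, 4M cached input, 200k output.
cost = session_cost(1_000_000, 4_000_000, 200_000)
print(f"${cost:.2f}")  # $1.95
```

Swapping another model's prices into the dictionary gives a like-for-like comparison for the same usage profile.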

hparadiz • yesterday at 8:46 PM

The $20/month ChatGPT plan that comes with Codex is good value. Even just having premium ChatGPT is nice. I get rate limited regularly but it still lets me do most things.

Aperocky • today at 1:42 PM

If you use claude-code, Haiku is used under the hood for certain tasks. I'm not sure which ones, but there's some kind of routing that goes to Haiku automatically.

nate • yesterday at 8:52 PM

The small stuff has its place. I have this Safari extension and needed a way to quickly title people's chat histories. Haiku is the fast, cheap thing to come up with decent titles for blocks of text. I feel like there's a bunch of those little things lying around you need a model for. I'm even finding Apple's Foundation Model is super useful for stuff like that. Even summarizing an article. It's like equally awful at doing it, but gets enough done to still be useful as a way to be like "oh yeah, this article is actually worth reading."

alkonaut • yesterday at 8:51 PM

Won’t (presumably) all the market actors converge on similar pricing? If OpenAI stopped operating on subsidies and charged the true costs, and their most token-hungry customers are the ones that switch to Anthropic and others, then those providers' pricing model switch will also be around the corner.

Unless of course we’re thinking Copilot will be more expensive than others longer term. But is that a reasonable assumption?

vidarh • yesterday at 10:09 PM

Haiku does quite well if given a detailed plan. That means much more detail than you otherwise would give, but you can still save over e.g. having Opus or Sonnet do everything by having them expand their initial plans into more specific levels of detail and feed those to Haiku (or similar-level models).

I personally wouldn't use models of that class directly, though. I'd use them in a harness as a "backend" for more capable models. And Haiku itself, as opposed to other smaller models, is still expensive.

eli • yesterday at 11:38 PM

Makes sense as part of a larger coding workflow, especially if it’s fast. Using a trillion-parameter model to figure out how to call a targeted edit tool or generate a commit message is a waste. The same goes for narrow tasks like “make the background darker” or “rename this function and update callers.”

verdverm • yesterday at 8:24 PM

I've been having really good results with DeepSeek-v4-flash, qwen-3.6-moe, and the older gemini-3-flash-preview. (recent geminis suck hard)

Small models are more than enough for the majority of tasks these days. Plan and review with the bigger ones, let the little ones explore and implement.

OpenCode Go is $10/month for the open weight models with nice quotas: https://opencode.ai/go

bbstats • yesterday at 11:38 PM

What application/UI are you using DeepSeek Flash high on? Still Copilot or something else?

partiallypro • yesterday at 9:10 PM

> "GitHub Copilot had competitive pricing until yesterday when they changed from per-request to one of the most expensive per-token quotas. Seriously, take a look at their burning subreddit for some laughs"

AI is expensive and it has been heavily subsidized. If you think $20/mo flat for Codex/Claude will last versus a more usage-based model, you're in for a shock. Especially once these companies go public and have to meet investor expectations.

epolanski • today at 6:55 AM

> They are cute but for serious coding they tend to waste your expensive time.

90% of corporate job tasks are trivial enough that Haiku can handle them.

Just this morning I was implementing a reprint functionality in our warehouse management system, which needed to print carrier labels and delivery notes again for a specific order.

It essentially had to do the same workflow of print, but instead of generating and uploading the pdfs, it only had to fetch and print them.

It took Opus 4.8 high 24m01s and 87k tokens. It took Haiku 6m30s and half the tokens.

So I'm not really sure what you mean by "wasting your expensive time" here. I think you really don't experiment with these tools and assume higher effort, bigger model => time saved, but that's true only when tasks are much bigger and complex enough that a smaller/less precise model would fail or land work of much lower quality.

LoganDark • today at 12:16 AM

I really hope one day there is something like Opus 4.8 but with Cerebras' speed -- they reach over 1,000t/s on gpt-oss-120b but that model is seemingly not even properly trained for tool calling. But watching it slam out several entire screens of thinking/reasoning per second is amazing. I'd love that with Opus quality.

emsign • yesterday at 8:47 PM

I wonder when THEY make it illegal to vote with your wallet.
