Hacker News

c7b today at 9:59 AM

Gotta say, I've lost all interest in cloud-based AI products. There are too many cool features and workflows that I was once excited about and can't or don't use anymore, for a variety of reasons (price hikes, subjective nerfs, features that disappeared altogether or were replaced, ...), for me to even remember them all. It's tiring.

I've set up a small rig, mostly settled on Qwen3.6 and I'm slowly adding features myself. It probably can't compete with Claude. I don't even know, I've stopped checking. It's providing a ton of value to me as is, and it only keeps getting better. All it takes is to realize that it doesn't actually matter if the grass is (maybe even objectively) greener somewhere else. Feels so good to know that it won't change under my feet. I've got this amazing, highly extensible tool, and it's mine.


Replies

jug today at 6:14 PM

I often feel like we're nowadays mostly pushing AI development by way of fine-tuning differences. Like how new editions of Claude are tuned for agentic coding, which might even be detrimental if you're using it for non-agentic coding. Or how Fable 5 does in fact look great, but at a huge cost for inference and with a high likelihood of post-launch nerfs or limit/price revisions. Or how Gemini 3.5 has more liberal limits but on the other hand underperforms a bit.

It's like we're mostly treading water at this point. New editions are released, a version number increases, but I have to wonder whether all steps are forward or the models are just tuned differently, with similar actual perf per dollar as when this year began.

Most of the progress in fact seems, to me, to be happening with small models. Like your Qwen. Or Gemma 4 31B, which is kinda magic, especially when considering multilingual abilities. So yes, in that sense I can see "development", probably as we refine data sets and training methods, but I see it less on the big hulking beasts with daily limits (unless you turn it up to 11 like Fable).

Edit: As I posted this, I saw a "before and after" comparison for Fable and the reintroduced version is seeing a catastrophic drop in BridgeBench performance as they're still mucking with the model. Go figure... https://x.com/Hesamation/status/2072692225100612032

kamranjon today at 5:28 PM

I'm really happy this is one of the top comments here; I am fully local as well.

Just wanted to leave a note for folks who might not have the memory to run a big 32 GB model - I just found out there are some pruned models that have really good performance, and if I had a smaller machine I might try this pruned unsloth Q4 quant of GLM 4.7 flash that sits at 14 GB: https://huggingface.co/unsloth/GLM-4.7-Flash-REAP-23B-A3B-GG...

I usually use LM Studio for this type of thing but unsloth has their own studio type app that might be even better suited for these quants.

I used GLM 4.7 flash as my main model for months and it was an incredibly tenacious model and very very fast - I think on restricted hardware, this could be a great choice.
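
For anyone sizing a machine for a quant like the one above, here is a rough back-of-the-envelope sketch, not an exact calculation. It assumes the weights dominate memory use and that a Q4_K_M-style quant averages roughly 4.8 bits per weight (an assumption; the real figure varies by quant type and model). The function names and the 2 GB headroom for KV cache and runtime buffers are hypothetical and only for illustration.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * bits_per_weight / 8 bits per byte / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   memory_gb: float, headroom_gb: float = 2.0) -> bool:
    """True if the weights plus a flat headroom allowance fit in memory_gb.

    headroom_gb is an assumed allowance for KV cache and runtime buffers.
    """
    needed = estimate_weights_gb(params_billion, bits_per_weight) + headroom_gb
    return needed <= memory_gb


if __name__ == "__main__":
    size = estimate_weights_gb(23, 4.8)  # 23B parameters, assumed ~4.8 bits/weight
    print(f"23B at ~4.8 bits/weight: about {size:.1f} GB")
    print("fits in 16 GB:", fits_in_memory(23, 4.8, 16))
```

Under those assumptions a 23B model lands close to the 14 GB mentioned above. Longer contexts need more headroom than the flat 2 GB used here.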

unleaded today at 11:23 AM

Qwen3.6-35B-A3B-UD-Q4_K_M runs at about 11 tokens/second on my poor old 1060. Absolutely nuts how far we've come

pyreko today at 6:12 PM

Same here, been happy throwing Qwen3.6 on my old MBP - no, it's not as fast as Claude, which I use at work, but it works well enough locally and I don't have to worry about credits or shit like the rug getting pulled out from under me in terms of capabilities.

JSR_FDED today at 10:14 AM

This sounds very appealing. What size Mac mini would I need for that?

deadbabe today at 2:48 PM

People want to make it seem like you need to always use the latest and greatest frontier models to be taken seriously as a developer.

You really don’t need them. After a certain point, bigger models give diminishing returns. If you can get 80% of the productivity gain with a free local model, use the local model. It will still be way faster than doing everything by hand, but you also don’t have to pay for tokens to a cloud provider and the tools won’t be ripped away from you on a whim.

This is the new attitude enlightened people should adopt. Reject the arms race.

cyanydeez today at 11:00 AM

I never got into any of the AI models because it was clear local-first was going to be more valuable if they were to replace coding tasks.

I tried out a few models and ended up going with either Qwen3-Coder-Next (no think, just do) or Qwen3.6-35B (thinking, w/ llama.cpp token budget). I created a customized prompt that works fairly well up to around 60k tokens; past that it's a toss-up whether it has poisoned itself or I've directly steered it wrong. When it's clear that's happened and it's important to continue, I ask it to write a doc and then start fresh.
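
The "write a doc, then start fresh" routine above could be sketched roughly like this. It is a minimal illustration under stated assumptions, not my actual setup: the `Session` class, `rough_token_count`, and the 4-characters-per-token heuristic are all hypothetical, and a real setup would use the model's own tokenizer and send the handoff prompt to the model.

```python
from dataclasses import dataclass, field


def rough_token_count(text: str) -> int:
    """Crude estimate, assuming about 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class Session:
    budget_tokens: int = 60_000  # roughly where the context starts to degrade
    messages: list[str] = field(default_factory=list)
    used_tokens: int = 0

    def add(self, text: str) -> None:
        """Record a message and count it against the budget."""
        self.messages.append(text)
        self.used_tokens += rough_token_count(text)

    def should_restart(self) -> bool:
        """True once the estimated context size reaches the budget."""
        return self.used_tokens >= self.budget_tokens

    def handoff_prompt(self) -> str:
        """Prompt asking the model to write the doc that seeds the next session."""
        return ("Write a short doc covering the goal, decisions made so far, "
                "and open tasks, so a fresh session can continue from it.")

    def restart_with(self, doc: str) -> "Session":
        """Start a fresh session containing only the handoff doc."""
        fresh = Session(budget_tokens=self.budget_tokens)
        fresh.add(doc)
        return fresh
```

The idea is to check `should_restart()` after each exchange, send `handoff_prompt()` once it trips, and pass the model's answer to `restart_with()`.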

I don't know how anyone could have witnessed the last 2 decades of American VC-funded tech startups and tell themselves, "you know, this will be a reliable technology with no hidden problems".

Even a sober technical evaluation is just two steps:

1. You're proposing to build an app on a non-deterministic model.

2. That model is hosted behind a non-deterministic system (model alignment, model guardrails, system context subterfuge, cost/token pricing)

---

So you want to build your app and you think you're going to keep up with both #1 and #2?

hathym today at 10:34 AM

Same here, I’ve removed my credit card from Copilot and won’t be renewing

anon373839 today at 10:22 AM

What features/workflows have you added?
