
trouve_search yesterday at 5:38 PM | 14 replies

OK, I'm 100% rooting for both Mistral and task focused small models.

But Mistral has fallen really far behind since 2025Q3. It seems they can't get good reasoning models working at even medium context sizes, which is necessary to be at the table right now.

Gemma4 and Qwen3.6 are currently best in the small size; Mistral's "small" model has ~4x the parameter count at 120B and isn't even competing with models a quarter its size.

A year ago, with Mistral Small 3.1, they were keeping up, but they've fallen into irrelevancy right now.

If Mistral seriously wants to play the on-prem and small task-specific model game, a decent proxy would be to build models that get the r/localLlama crowd excited


Replies

barrell today at 6:13 PM

I think it really depends on what you’re doing. I use Mistral for many tasks in https://phrasing.app and their models blow models many times their size out of the water.

None of my tasks use reasoning though (reasoning actually kills the performance), so perhaps that’s why. Still, I just had to rewrite my pipeline, and Mistral was faster, cheaper, and substantially better than any alternative.

ar0 yesterday at 6:47 PM

I agree. I am a paying Le Chat Pro user, really rooting for a European alternative. But the quality difference between Mistral and the frontier labs is growing too big to ignore. It’s worrying to me that they didn’t talk much about new models at the conference, because that is really where their focus should be IMHO.

I am wondering what is holding them back, though: Money? Compute? Skills? Training data? My fear is that you only get really good models by training on very dubious data (outputs from the frontier models, etc.) and that Mistral is too European and too enterprisey to take those risks.

greyskull yesterday at 6:37 PM

> task focused small models

This is tangential, and forgive my ignorance here, but is there an inherent reason why there aren't smaller, focused models from the frontier model providers?

I'm thinking something like a software-specific subset of Opus that is the default for use in Claude Code. Smaller, cheaper to deploy and consume, maybe faster.

baq yesterday at 6:27 PM

Agreed. The next price increase from frontier labs (and the inevitable limit cuts in subscription tiers) will have people thinking real hard about their model providers, and that's when Mistral should be ready. However, given their recent performance, I don't have my hopes up.

raincole today at 10:44 AM

> they've fallen into irrelevancy right now

It's a very charitable take, as Mistral has never really left the realm of irrelevancy.

It's only a matter of time before the EU falls back to hosting Chinese models in EU datacenters.

rhdunn yesterday at 7:59 PM

Yeah. I run LLM models locally and for me 22B-32B is the largest I'm willing to invest in trying out.

Even though Mistral 4 has 6B active parameters per token (allowing 3-3.5 per token parameters to be loaded on a 4090), the ~240GB download + storage is pushing the limits of being able to try this out locally, especially if you are downloading and evaluating multiple models.

It also makes it harder for other people to make downstream finetunes like with what happened with the older Mistral/Magistral models.
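
For reference, the ~240GB figure is about what you'd expect from simple arithmetic. This is a minimal sketch, assuming the 120B parameter count quoted upthread and 16-bit weights (both assumptions, not numbers from a model card). The helper name `weight_size_gb` is made up for illustration, and it counts weights only, not KV cache or activations.

```python
def weight_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Assumed: 120B total parameters stored at 16 bits each.
full_download = weight_size_gb(120, 16)

# A MoE model still has to store every expert, so the download scales with
# total parameters, not with the 6B that are active per token.
active_only = weight_size_gb(6, 16)

print(f"full weights: {full_download:.0f} GB, active per token: {active_only:.0f} GB")
```

So the small active parameter count helps with compute per token, but not with the download or the disk space.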

chartpath today at 12:24 AM

I find Mistral Medium 3.5 with OpenCode is perfectly fine if you're willing to talk to it in a more fine-grained way about actual code. For me that's fine because even with huge frontier models I don't like trying to vibe prompt like a product manager.

coredev_ yesterday at 7:11 PM

I don't agree that they are falling behind. Using both chat and cli I get what I need and it's comparable to "sota" when I compare.

arkh today at 10:05 AM

Mistral is entering the "let's extract as much money from EU taxpayers as we can" phase of a European tech company that did not get bought by a US one.

They'll end like Dailymotion, just a zombie company.

echelon yesterday at 5:55 PM

Nobody trying to compete with Google, OpenAI, and Anthropic should be playing the small models / local models game.

Foundation model labs should be building very large reasoning models, then leaving it to the community to distill them down.

You can't scale a small model up, but you can scale a large model down.

I'm convinced the only way we'll have a seat at the table in the future and avoid total runaway takeoff is if there are very large models within 80% of the capabilities of the frontier models. Tiny RTX models do diddly squat to remain competitive.

Build open weights models for running on H200s. I'll spin them up on RunPod or Lambda.

lettergram yesterday at 6:04 PM

We actually found that Mistral Small 4, quantized to 4-bit, was comparable to Qwen 3.6 27B and is roughly the same size. At least in our experience on our use cases, quantizing the Mistral model worked far better than trying to quantize the Qwen family.

Fully agree with your point though: Mistral in general is far behind where I'd expect, and Qwen in particular is crushing it at the smaller sizes.

Personally, I'd consider anything 20B params and above a "medium" model. Small being <20B and large >100B. I think obviously we can get to the huge 1-2T param models, but frankly the margin of accuracy improvement for the speed hit is kinda insane (1-2% for many metrics).
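
Rough numbers behind "roughly the same size", plus the size buckets from this comment. This is only a sketch: it assumes the 120B figure quoted upthread for Mistral's small model and 16-bit weights for the unquantized Qwen 27B. Both are assumptions, the function names are made up, and real 4-bit files come out somewhat larger because of quantization scales and layers kept at higher precision.

```python
def weight_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9


def size_class(params_billions: float) -> str:
    """Buckets as described above: small < 20B, medium 20B-100B, large > 100B."""
    if params_billions < 20:
        return "small"
    if params_billions <= 100:
        return "medium"
    return "large"


# Assumed parameter counts and precisions, for illustration only.
mistral_q4_gb = weight_size_gb(120, 4)   # 120B at 4 bits per weight
qwen_16bit_gb = weight_size_gb(27, 16)   # 27B at 16 bits per weight

print(f"Mistral 4-bit: ~{mistral_q4_gb:.0f} GB ({size_class(120)} by parameter count)")
print(f"Qwen 16-bit:   ~{qwen_16bit_gb:.0f} GB ({size_class(27)} by parameter count)")
```

Under those assumptions the two land within about 10% of each other on disk, even though one counts as "large" and the other as "medium" by parameter count.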

thatsadude today at 4:48 AM

Nawh, they trained on test since Llama 2, no wonder.

kergonath yesterday at 10:10 PM

> a decent proxy would be to build models that get the r/localLlama crowd excited

I don’t really disagree with your post, but this is not exactly right. That subreddit seems to go from hype train to hype train every week; I haven’t found anything really insightful in it for quite a while now.

dyauspitr yesterday at 8:32 PM

Mistral is bad bad. For its use cases I feel like India’s Sarvam is doing better.
