Hacker News

Mistral AI Releases Forge

685 points | by pember | yesterday at 9:04 PM | 174 comments

Comments

kioleanu, today at 8:04 AM

I like Mistral, it hits the exact sweet spot between cost and my data staying in the EU, without a significant drop in quality, but man are their model naming conventions confusing af. They mention they have a model called Devstral 2, which is neither Codestral nor Devstral. I want to use it, but the API only lists devstral-2512, devstral-latest, devstral-medium-latest, devstral-medium-2507, devstral-small, devstral-small-2507.

I think devstral-latest should be it, no? So I write to support and get an answer 12 hours later that says oh no, Devstral 2 is definitely called Devstral 2, followed by a page of instructions on how to set it up in IntelliJ... generated with AI. The screens it refers to don't exist and never did.

ogou, today at 5:19 AM

Don't sleep on Mistral. Highly underrated as a general-service LLM. Cheaper, too. Their emphasis on bespoke modelling over generalized megaliths will pay off. There are all kinds of specialized datasets and restricted-access stores that can benefit from their approach. Especially in the highly regulated EU.

Not everyone is obsessed with code generation. There is a whole world out there.

mark_l_watson, yesterday at 11:58 PM

I am rooting for Mistral with their different approach: not really competing on the largest and most advanced models, instead doing custom engineering for customers and generally serving the needs of EU customers.

upghost, today at 3:07 AM

> Pre-training allows organizations to build domain-aware models by learning from large internal datasets.

> Post-training methods allow teams to refine model behavior for specific tasks and environments.

How do you suppose this works? They say "pretraining", but I'm certain the amount of clean data available in proper dataset format is nowhere near enough to make a "foundation model". Do you suppose what they're calling "pretraining" is actually SFT, and "post-training" is... more SFT?

There's no way they mean "start from scratch". Maybe they generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.
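A toy sketch of what that seeding could look like. Everything here is invented for illustration; a real pipeline would use a strong teacher model to paraphrase and expand each snippet into many pairs, not a fixed template:

```python
# Toy sketch: turn a handful of internal "company documents" into
# synthetic instruction/response pairs for SFT. A fixed template
# stands in for the teacher model; all names and docs are made up.

def make_sft_pairs(docs):
    """Generate (instruction, response) pairs from raw doc snippets."""
    pairs = []
    for title, body in docs.items():
        # Two Q/A pairs per document; a real pipeline would sample
        # many diverse paraphrases per section from a teacher model.
        pairs.append({
            "instruction": f"Summarize the internal policy '{title}'.",
            "response": body,
        })
        pairs.append({
            "instruction": f"What does '{title}' say about its topic?",
            "response": body,
        })
    return pairs

docs = {
    "VPN access": "Engineers request VPN access through the IT portal.",
    "Data retention": "Logs are retained for 90 days, then purged.",
}
pairs = make_sft_pairs(docs)
print(len(pairs))  # two pairs per document
```

The resulting pairs would then be fed to ordinary SFT, which is why this looks more like distillation of the seed model than true pretraining.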

jcmartinezdev, today at 9:53 AM

Mistral is doing some really great stuff lately. Sure, it's hard to compete with OpenAI and Anthropic and their models, but they're taking some interesting angles and designing their product in unique ways.

I really like what they're doing and I'll be watching them a lot more closely. I'd love to work for them, btw!

roxolotl, yesterday at 11:36 PM

Mistral has been releasing some cool stuff. Definitely behind on frontier models, but they're working a different angle. Was just talking at work about how hard model training is for a small company, so we'd probably never do it. But with tools like this, and the new unsloth release, training feels more in reach.

ryeguy_24, today at 1:52 AM

How many proprietary use cases truly need pre-training or even fine-tuning, as opposed to a RAG approach? And at what point does it make sense to pre-train/fine-tune? Curious.

dmix, today at 1:14 AM

This is definitely the smart path for making $$ in AI. I noticed MongoDB is also going into this market with https://www.voyageai.com/ targeting business RAG applications and offering consulting for company-specific models.

dash2, today at 6:30 AM

I think it’s interesting what this approach suggests about who will profit from AI. I’m sceptical that having huge numbers of GPUs is a moat. After all, real humans – even geniuses – are trained on much much less data than the whole Internet. But proprietary and specialised data could very well be a moat. It’s hard to train a scientist/lawyer/analyst without reading a lot of science/law/finance. Companies’ proprietary data might encode a great deal of irreplaceable knowledge. Seems as if Mistral is taking this bet.

losvedir, today at 12:09 PM

> Forge enables enterprises to build models that internalize their domain knowledge. Organizations can train models on large volumes of internal documentation, codebases, structured data, and operational records. During training, the model learns the vocabulary, reasoning patterns, and constraints that define that environment.

I'm probably really out of date at this point, but my impression was that fine-tuning never really worked that well for knowledge acquisition, and that some variety of RAG is the way to go here. Fine-tuning can affect the "voice", but not really the knowledge.
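For contrast, a bare-bones sketch of the RAG side of this argument: the knowledge stays in a retrievable store and gets injected into the prompt, rather than baked into weights. Word-overlap scoring stands in for real embeddings here, and all the documents are invented:

```python
# Toy RAG retrieval: score documents by word overlap with the query,
# then splice the best hit into the prompt as context. A real system
# would use embedding similarity; the corpus is fabricated.

def tokenize(text):
    return set(text.lower().split())

def retrieve(query, corpus, k=1):
    """Return the k documents with the most query-word overlap."""
    q = tokenize(query)
    scored = sorted(corpus, key=lambda d: len(q & tokenize(d)), reverse=True)
    return scored[:k]

corpus = [
    "The export pipeline writes parquet files to the warehouse bucket.",
    "On-call rotations change every Monday at 09:00 UTC.",
    "Invoices are reconciled by the billing service nightly.",
]
hits = retrieve("when do on-call rotations change", corpus)
prompt = f"Context: {hits[0]}\n\nQuestion: when do on-call rotations change?"
print(hits[0])
```

The model then answers from the injected context, so the store can be updated without touching the weights, which is exactly the property fine-tuning lacks.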

csunoser, yesterday at 11:47 PM

Huh. I initially thought this was just another fine-tuning endpoint. But apparently they are partnering up with customers on the pretraining side as well. And RL too? Jeez, RL envs are really hard to get right. Best wishes, I guess.

todteera, today at 9:44 AM

Interesting how Mistral is investing in training models for industry-specific use cases. With the commoditization of intelligence by base models, they're probably looking to create value from specialized verticals.

alansaber, today at 12:50 PM

I find the Mistral "middle" between small LMs and 1T-parameter LMs compelling. Models that are big enough to be performant but specialised for domains and tasks: this is what I assumed we'd always head towards.

jbverschoor, today at 7:39 AM

ASML and ESA as clients means something. I don't expect to see the first name anywhere else on a logo list.

vincentbusch, today at 5:50 PM

lol the AI-generated support reply about their own AI model is peak 2026

the naming mess is wild though. i ran into similar confusion trying to set up mistral for a side project — ended up just guessing which endpoint was the right one

andai, today at 1:19 AM

They mention pretraining too, which surprises me. I thought that was prohibitively expensive?

It's feasible for small models, but I thought small models weren't reliable for factual information?

zby, today at 6:40 AM

My bet is that the solution to continuous learning is external storage. There is a lot of talk about context engineering, but I have not seen anyone treating context as the main bottleneck and building a system around that. It would show that even "context engineering" is kind of the wrong term, because context does not enter the LLM in some mysterious way: it goes through the prompt, and the whole model of passing chat history back and forth is not the most efficient use of the prompt's limits.
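A minimal sketch of that idea, with word counts standing in for tokens and all "memories" invented: rebuild the prompt each turn from an external store under a hard budget, most relevant first, instead of replaying the whole chat history:

```python
# Sketch of the external-storage idea: facts live in a store, and each
# turn we pack the highest-overlap ones into the prompt until a hard
# budget runs out. Word counts stand in for real tokenization.

def build_prompt(query, memory, budget=20):
    """Pack the most query-relevant memories into the prompt, skipping
    any that would exceed the word budget, then append the query."""
    q = set(query.lower().split())
    ranked = sorted(memory, key=lambda m: len(q & set(m.lower().split())),
                    reverse=True)
    picked, used = [], 0
    for m in ranked:
        cost = len(m.split())
        if used + cost > budget:
            continue  # this memory would blow the budget; skip it
        picked.append(m)
        used += cost
    return "\n".join(picked + [query])

memory = [
    "User prefers answers in French.",
    "Project deadline is March 3.",
    "User works on the billing service.",
]
prompt = build_prompt("What is the project deadline?", memory, budget=5)
print(prompt)
```

The point is that the prompt is a scarce resource to be allocated per turn, not an append-only transcript, which is the inefficiency the comment is pointing at.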

tho23i42342397, today at 10:35 AM

Interesting. Does this actually scale, though? I've never seen enterprises whose "internal knowledge" exists in proper readable form - it's often in code, and more importantly in the people who wrote it.

I recall that even at Google - with its own search engine and so on - the best way to understand anything was to read code or to reach out to those who wrote them. I don't know how it works in places that work with the "real world" like ASML.

Often the issue is not even documentation - it's just extremely hard to include all the nuances in text and still have it be readable (code documentation comes to mind).

Interestingly, I strongly feel this is also where LLMs (and some of our more textually-obsessed academics) fail.

hermit_dev, today at 3:04 AM

The future of AI is specialization, not just achieving benevolent knowledge as fast as we can at the expense of everything and everyone along the way. I appreciate and applaud this approach. I am looking into a similar product myself. Good stuff.

rorylawless, today at 1:12 AM

The fine-tuning endpoint is deprecated according to the API docs. Is this the replacement?

https://docs.mistral.ai/api/endpoint/deprecated/fine-tuning

thecopy, today at 9:21 AM

Looks interesting. But how do you explore, test, or use it? The product page (https://mistral.ai/products/forge) doesn't contain anything useful either. Just "Contact us".

Disappointing.

Aldipower, today at 8:27 AM

I cannot keep up with their products, model names and releases. What is what for? Their marketing texts do not make sense to me. Is there a nice overview somewhere?

I am a simple stupid Le Chat user with a small mind and the Tredict MCP Server connected to it (to Le Chat, not my mind), which works ok-ish. :-)

speedgoose, today at 6:37 AM

I was enthusiastic, but it's "contact us" pricing for now. I was expecting a classic cloud LLM forge with public pricing.

apexalpha, today at 9:58 AM

This looks good, but how much money are we talking here? Are we 'retraining' an entire model, just adding enterprise data to the public dataset?

whatever1, today at 6:42 AM

I thought that for pretraining to work and reasoning to emerge, you need internet-scale data. How can Forge achieve that with just internal company data (unless said company is AT&T or something)?

krinne, today at 8:45 AM

I wasn't able to find a way to access this - is it something accessible only to enterprises?

Would love to take it for a spin, if that is even possible.

Havoc, today at 12:35 PM

Good for them. Really hope they find market fit.

spacesh1psoda, today at 9:41 AM

Go EU!

aavci, today at 2:27 AM

How does this compare to fine-tuning?

burgerquizz, today at 9:33 AM

Can I use Mistral to read my source code and teach it, so I don't need to inject the whole doc and consume tokens every single time?

supernes, today at 5:40 AM

> Code agents are becoming the primary users of developer tools, so we built Forge for them first, not

... for humans.

bsjshshsb, today at 1:09 AM

Is training or FT > context? Anyone have experience?

Is it possible to retrain daily or hourly as info changes?

dragochat, today at 11:55 AM

where sample notebook/script? where github? where signup?

...learn a thing or two from NVIDIA or gtfo
