MAI-Code-1-Flash

517 points • by EvanZhouDev • yesterday at 6:47 PM • 243 comments

https://microsoft.ai/models/mai-code-1-flash/

https://microsoft.ai/pdf/MAI-Code-1-Flash-Model-Card.PDF

Launching seven new MAI models: https://microsoft.ai/news/building-a-hillclimbing-machine-la...

Comments

camelmel • yesterday at 7:40 PM

Huh, according to that model card this is a 137B total parameter model.

Performance doesn't seem that good:

- MAI-Code-1-Flash (137B-A5B) = 51% on SWE-bench Pro

- Qwen3.6-35B-A3B = 49.5% on SWE-bench Pro (https://huggingface.co/Qwen/Qwen3.6-35B-A3B)

They benchmark against Claude Haiku, but Haiku is not good; it's worse than tiny open models you can run locally or via API at 10% of the cost.

bel8 • yesterday at 8:13 PM

It's a start and I welcome competition, but I don't think I've ever used small cloud models like Haiku 4.5. They are cute, but for serious coding they tend to waste your expensive time.

And this certainly won't bring me back to GitHub Copilot, which I cancelled yesterday.

GitHub Copilot had competitive pricing until yesterday when they changed from per-request to one of the most expensive per-token quotas. Seriously, take a look at their burning subreddit for some laughs: https://www.reddit.com/r/GithubCopilot

I have since changed to DeepSeek Flash on high, which is Sonnet+ level for almost free.

If I feel I still need smarter models, I might sign up for $20/mo Codex to use GPT 5.5, which, in my opinion, is the best I can access right now.

hmokiguess • yesterday at 7:32 PM

Does anyone actually use these smaller models for coding? If so, how? I usually Opus everything. Is the play to plan/design/architect with a heavier model, then delegate structured tasks to these smaller ones? I'd appreciate hearing from someone who has tried and tested both paths.

motoboi • today at 4:03 PM

To understand Microsoft's AI problems right now, observe that NONE of the models announced are available for use, even in Microsoft Foundry, which is the place where you add models to your account.

I understand the GitHub Copilot rollout takes time, but why can't we consume the models via Microsoft's own API after launch?

Anthropic models are available in Foundry the moment they are launched, but Microsoft's own models are not.

cwillu • yesterday at 9:27 PM

What is with people reimplementing window scrolling badly?

capten • yesterday at 7:12 PM

It's so weird to me that the benchmarks remain so low, but the models are marketed as revolutionary. And if you say that low coding capabilities aren't a problem, tell that to the token price hike and the 'general use' model setup.

Why not sell it as a math agent? Why do I have to set up 4 agents to check each others' work?

AntiRush • yesterday at 6:57 PM

The introductory blog post has a lot more information

https://microsoft.ai/news/introducingmai-code-1-flash/

and the model card

https://microsoft.ai/pdf/MAI-Code-1-Flash-Model-Card.PDF

The broader announcement of 7 MAI models seems to be where the 5B active in the title comes from

https://microsoft.ai/news/building-a-hillclimbing-machine-la...

mekpro • today at 2:44 PM

The technical report is very detailed and would help the 'reinforcement learning' of future researchers. Thanks, Microsoft!

eterevsky • yesterday at 11:42 PM

They are comparing it to Haiku 4.5. Not Opus, not Sonnet, but Haiku, the smallest Anthropic model, 3 versions old.

Hfuffzehn • today at 12:43 PM

So I guess the important link the marketing department forgot is this one: https://docs.github.com/en/copilot/reference/copilot-billing...

Model Input Cached input Output

MAI-Code-1-Flash $0.75 $0.075 $4.50

Comparing to

Claude Haiku 4.5 $1.00 $0.10 $5.00

looks fine.

But they also forgot to include the benchmarks comparing to

GPT-5.4 mini $0.75 $0.075 $4.50

Those would have been helpful.
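
For a rough sense of what those list prices mean on a concrete workload, here is a minimal sketch. It assumes the prices above are USD per 1M tokens (the table as quoted does not state the unit), and the token counts in the example are made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """List prices as quoted in the table above (unit assumed, see below)."""
    input: float
    cached_input: float
    output: float


# Values copied from the quoted table.
PRICES = {
    "MAI-Code-1-Flash": Price(0.75, 0.075, 4.50),
    "Claude Haiku 4.5": Price(1.00, 0.10, 5.00),
    "GPT-5.4 mini": Price(0.75, 0.075, 4.50),
}

# ASSUMPTION: prices are per one million tokens; the table does not say.
TOKENS_PER_PRICE_UNIT = 1_000_000


def workload_cost(price: Price, input_tokens: int,
                  cached_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one workload at the given prices."""
    total = (
        input_tokens * price.input
        + cached_tokens * price.cached_input
        + output_tokens * price.output
    )
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Hypothetical agent session: mostly cached context, modest output.
    for name, price in PRICES.items():
        usd = workload_cost(price, input_tokens=200_000,
                            cached_tokens=800_000, output_tokens=50_000)
        print(f"{name}: ${usd:.3f}")
```

Note that this only compares price per token. If one model needs fewer tokens to finish the same task, the per-task cost gap would be different from what the list prices alone suggest.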

ChicagoDave • today at 2:23 PM

I’m not sure the message should be benchmarking.

The eye-opener is clean licensed data with filters for AI content (not sure how you do that).

If MSFT builds up using an ethical approach, there is a large anti-AI audience that might take note.

efields • yesterday at 7:43 PM

Please test your websites in Safari. Almost all of your iOS users use it by default, and the desktop experience is pretty close to the mobile experience, so testing is easy.

That scroll effect is jank city for me (yeah yeah works fine in Chrome/Edge).

OsrsNeedsf2P • yesterday at 6:55 PM

So it's trained on the SWE Bench Pro evalset

tosh • yesterday at 7:17 PM

not open weight or at least I did not find anything indicating open weight

deckar01 • yesterday at 8:04 PM

If only they had launched that yesterday I might have avoided Copilot auto model selection using a 9x model, quietly burning my monthly quota in a single afternoon.

tgtweak • today at 2:34 PM

Is anyone using Haiku 4.5?

Why not showcase it against something in a similar domain like Qwen3.6 or Gemma 4?

mentos • yesterday at 7:26 PM

Shouldn’t the next model's focus be on system design rather than code?

Seems like the work from a good system design to code is practically solved.

Now it’s a matter of the design of the system. Or is that represented in these evals?

AJRF • yesterday at 8:17 PM

Copilot brand is tarnished, so time to bung everything under MAI?

jnwatson • today at 12:30 AM

I had to remind myself what Haiku is even for. Anthropic hasn't spent a lot of recent marketing on it.

When I need a light model, I reach for Sonnet. It is nearly free on the max plans, and quite fast. I don't see a place for Haiku in regular coding.

Haiku I guess is when you need summarization/categorization at scale.

Microsoft setting Haiku as the benchmark is a low bar.

ajyoon • yesterday at 7:13 PM

Scroll wheel hijacked on this entire domain

zoobab • today at 7:23 AM

"It is built end-to-end by Microsoft using clean and appropriately licensed data."

Well, there's still no list or publication of the training data.

smcleod • yesterday at 9:12 PM

I don't see the point in comparing yourself to Haiku which is not only useless for coding but also old. No thanks Microsoft.

schmorptron • today at 11:16 AM

Maybe this will replace raptor-mini as the "free" model on copilot plans? (but I don't see it at all yet on the student plan, in vscode or the cli)

ronbenton • today at 1:00 AM

>Build for developers, not benchmarks

That sounds like something you say when you don't benchmark well

ramaseshanms • today at 11:13 AM

How not to flex:

"MAI-Code-1-Flash outperforms Claude Haiku 4.5"

aubanel • today at 12:08 PM

Raw feedback to the team: 1. The model looks awesome. 2. The artificially smoothed scrolling on your page feels really bad!

mchl-mumo • today at 5:57 AM

The UI has Mustafa Suleyman written all over it. There seems to be as much effort in rebranding MAI as in training.

npn • yesterday at 9:01 PM

I personally do not like Microsoft, but congrats to them on releasing this model.

While the scores are not good compared to other open-weight models, the important thing to note is that their training data (as they claim) is very clean, without any synthetic datasets.

onlyrealcuzzo • yesterday at 7:05 PM

Gemma 4 26B-A4B scored exceptionally well with 20% less params, so this isn't unprecedented.

dang • yesterday at 10:07 PM

Related ongoing thread:

MAI-Thinking-1 - https://news.ycombinator.com/item?id=48374362 - June 2026 (64 comments)

giancarlostoro • yesterday at 8:02 PM

Mark Zuckerberg must be in crisis. Microsoft releasing models that compete with Claude's models. Meanwhile the only thing anyone knows about Mark's models is that they help you get hacked more easily.

mmaunder • yesterday at 7:49 PM

You lost me at forced scrolling. Ugh!

bguberfain • yesterday at 7:21 PM

It is good to see big companies like Microsoft launching LLMs. They have large amounts of compute power and good scientists to create useful models.

GaryBluto • yesterday at 8:31 PM

What's with the lack of Microsoft design language on the website? It's painfully obvious they're trying to emulate Anthropic's style here and it looks tacky.

hootz • yesterday at 7:12 PM

I'd love to see a tokens per second metric. I always prioritize speed over raw intelligence for flash models.

ruined • yesterday at 9:48 PM

wtf are they doing to the scroll on that page

halapro • today at 4:33 AM

In a few languages MAI means no/never, so it's an apt name for a Microsoft offering.

striking • yesterday at 7:35 PM

To be clear about the size of the model: MAI-Code-1-Flash is 137B A5B.

notenkidev • today at 5:11 AM

Curious how this handles token cost visibility. One of the biggest pain points with AI coding tools right now is having no idea what you're actually spending per project.

tornikeo • today at 5:31 AM

Where's the pelican when you need it the most?

cainxinth • yesterday at 9:40 PM

Claude Haiku 4.5 results with 60% fewer tokens. Sounds good, but they don't list token costs.

gslepak • yesterday at 7:33 PM

Would be cool if this were an open model.

randomsc • yesterday at 9:16 PM

“Build for developers, not benchmarks” is the worst marketing line I've ever heard

arunkant • yesterday at 10:21 PM

Why do websites still hijack scrolling? It sucks

gruntled-worker • today at 12:03 AM

"Mai" means "never" in Italian. Ain't gonna happen.

jMyles • yesterday at 7:43 PM

I'd really like to get back to an autocomplete flow, ideally with some context shared and optimized with my larger agent models.

But it seems like, by and large, even the faster models are now aimed at longer-running agentic flows and not sub-1s autocomplete. Or am I wrong about that?

LoganDark • yesterday at 7:32 PM

"Clean data" is impossible. Language models have polluted the landscape to such a degree it's impossible to filter them out now. OpenAI has no doubt discarded or muddled their dataset that was used to train the original ChatGPT, so there may be no dataset in existence now that isn't contaminated.

ilia-a • yesterday at 8:18 PM

I mean they are comparing themselves to Haiku of all things, geez that's not a good start...
