Hacker News

Kimi K2.6: Advancing open-source coding

665 points | by meetpateltech | yesterday at 3:28 PM | 347 comments

Comments

simonw | yesterday at 4:53 PM

Accessed via OpenRouter, this one decided to wrap the SVG pelican in HTML with controls for the animation speed: https://gisthost.github.io/?ecaad98efe0f747e27bc0e0ebc669e94...

Transcript and HTML here: https://gist.github.com/simonw/ecaad98efe0f747e27bc0e0ebc669...

game_the0ry | yesterday at 4:23 PM

There is some humor in the fact that China (of all countries) is pioneering possibly the world's most important tech via open source, while we (the US) are doing the exact opposite.

gertlabs | yesterday at 10:55 PM

Early benchmarks show tremendous improvement over Kimi K2 Thinking, which didn't perform well on our benchmarks (and we do use best available quantization).

Kimi K2.6 is currently the top open weights model in one-shot coding reasoning, a little better than GLM 5.1, and still a strong contender against SOTA models from ~3 months ago (comparable to Gemini 3.1 Pro Preview).

Agentic tests are still running, check back tomorrow. Open weights models typically struggle with longer contexts in agentic workflows, but GLM 5.1 still handled them very well, so I'm curious how Kimi ends up. Both the old Kimi and the new model are on the slower side, so that's a consideration that makes them probably less usable for agentic coding work, regardless. The old Kimi K2 model was severely benchmaxxed, and was only really interesting in the context of generating more variation and temperature, not for solving hard problems. The new one is a much stronger generalist.

Overall, the field of open weights models is looking fantastic. A new near-frontier release every week, it seems.

Comprehensive, difficult-to-game benchmarks at https://gertlabs.com/?mode=oneshot_coding

elfbargpt | yesterday at 4:23 PM

I've always been surprised Kimi doesn't get more attention than it does. It's always stood out to me in terms of creativity and quality, and it has been my favorite model for a while (but I'm far from an authority).

kburman | yesterday at 5:35 PM

Has anyone here used Kimi for actual work?

I tried it once; although it looks amazing on benchmarks, my experience was just okay-ish.

On the other hand, Qwen 3.6 is really good. It’s still not close to Opus, but it’s easily on par with Sonnet.

nickandbro | yesterday at 3:56 PM

Wow, if the benchmarks check out with the vibes, this could almost be like a DeepSeek moment, with Chinese AI now being neck and neck with SOTA models from US labs.

m4rkuskk | yesterday at 4:58 PM

I have been testing it in my app all morning, and the results line up with 4.6 Sonnet. This is just a "vibe" feeling with no real testing. I'm glad we have some real competition to the "frontier" models.

XCSme | yesterday at 5:57 PM

In my tests[0] it does only slightly better than Kimi K2.5.

Kimi K2.6 seems to struggle most with puzzle/domain-specific and trick-style exactness tasks, where it shows frequent instruction misses and wrong-answer failures.

It is probably a great coding model, but a bit less intelligent overall than SOTA models.

[0]: https://aibenchy.com/compare/moonshotai-kimi-k2-6-medium/moo...

waynevdm | today at 10:52 AM

With agents running at that scale and for an extended period, surely they would need to pay for external services like APIs, compute, and data. Would everything be based on subscriptions or API usage?

ninjahawk1 | yesterday at 7:33 PM

I often wonder whether, the same way early computers used to take up an entire room but now fit in your pocket, the equivalent of a data center will one day be a single physical device like today's phone. And if that's the case, would it happen much quicker, since technology has been speeding up year by year?

candl | yesterday at 5:19 PM

Are there any coding plans for this (i.e., no token limit, just an API call limit)? Recently my account failed to be billed for GLM on z.ai and my subscription expired because of this... the pricing for GLM went through the roof in recent months, though...

sixhobbits | yesterday at 7:27 PM

I tried it out with my normal mixed-up wolf, goat, cabbage problem and it couldn't solve it. Sonnet 4.6 also can't, but Opus 4.7 has no problems.

Details here [0]

[0] https://techstackups.com/comparisons/kimi-2.6-vs-opus-4.7-an...

mariopt | yesterday at 4:30 PM

Really excited to try this one. I've been using Kimi 2.5 for design and it's really good, but borderline useless on backend/advanced tasks.

Also discovered that using OpenCode instead of the Kimi CLI really hurts the model's performance (2.5).

lbreakjai | yesterday at 4:15 PM

I have a subscription through work and I've been trialing it; so far it looks on par with, if not better than, Opus.

pt9567 | yesterday at 4:17 PM

Wow - $0.95 input / $4 output. If it's anywhere near Opus 4.6, that's incredible.
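
For a rough sense of what those numbers mean in practice, here is a back-of-the-envelope cost sketch. It assumes the quoted prices are per million tokens (the usual convention for API pricing, though the comment doesn't state the unit), so treat the rates and units as assumptions.

```python
# Assumed unit: USD per 1M tokens (not stated in the comment above).
IN_RATE = 0.95   # input tokens
OUT_RATE = 4.00  # output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the quoted rates."""
    return (input_tokens * IN_RATE + output_tokens * OUT_RATE) / 1_000_000

# e.g. a hypothetical agentic turn: 50k tokens in, 2k tokens out
print(f"${request_cost(50_000, 2_000):.4f}")
```

At those assumed rates, even a large 50k-token context costs only a few cents per turn, which is where the "incredible if true" reaction comes from.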

dmix | yesterday at 4:55 PM

I'm pretty sure Kimi is what Cursor uses for their "Composer 2" model. It works pretty well as a fallback when Claude runs out, but it's definitely a downgrade.

Alifatisk | yesterday at 7:02 PM

Damn it, they stopped offering Kimmmmy, their sales AI agent which allowed you to bargain for lower subscription prices.

verdverm | yesterday at 4:17 PM

https://huggingface.co/moonshotai/Kimi-K2.6

Is this the same model?

Unsloth quants: https://huggingface.co/unsloth/Kimi-K2.6-GGUF

(work in progress; no GGUF files yet, and a header message says as much)

rane | yesterday at 8:38 PM

Added support for Kimi in https://github.com/raine/claude-code-proxy and it does appear to work surprisingly well with Claude Code, although the usage limit for the entry tier doesn't seem as generous as I'd have expected.

irthomasthomas | yesterday at 3:44 PM

Beats Opus 4.6! They missed claiming the frontier by a few days.

throwaw12 | yesterday at 6:10 PM

Beats Opus and Open Source?

I really hope this holds true in real world use cases as well and not only benchmarks. Congrats to Kimi team!

ttul | yesterday at 7:09 PM

Am I being paranoid in questioning whether the CPC would have something to gain by monitoring coding sessions with Chinese coding AI models? Coding models receive snippets of our intellectual property all day long. It's a bit of a gold mine, no?

greenavocado | yesterday at 4:18 PM

I pray the benchmark figures are true so I can stop paying Anthropic, after they screwed me over this quarter by dumbing down their models, making usage quotas ridiculously small, and demanding KYC paperwork.

Banditoz | yesterday at 4:25 PM

If the benchmarks are private, how do we reproduce the results? I looked up Humanity's Last Exam (https://agi.safe.ai/), which this model uses, and I can't seem to access it.

swingboy | yesterday at 4:01 PM

Exciting benchmarks if true. What kind of hardware do they typically run these benchmarks on? Apologies if my terminology is off, but I assume they're using an unquantized version that wouldn't run on even the beefiest MacBook?

dogscatstrees | yesterday at 8:11 PM

This Kimi website looks like a stylesheet from the '90s. They could learn a thing or two about typeface design. Steve Jobs would be incensed by this.

dygd | yesterday at 5:38 PM

> Agent Swarms, Elevated: Match 100 Jobs and Generate 100 Tailored Resumes

Model seems quite capable, but this use-case is just yikes. As if interviewing isn't already a hellscape.

antirez | yesterday at 5:36 PM

Here I analyze the same linenoise PR with Kimi K2.6, Opus, and GPT: https://www.youtube.com/watch?v=pJ11diFOjqo

Unfortunately the generation of the English audio track is work in progress and takes a few hours, but the subtitles can already be translated from Italian to English.

TLDR: It works well for the use case I tested it against. Will do more testing in the future.

OsamaJaber | yesterday at 8:00 PM

The modified MIT clause is sneakier than people think. Hit 100M users or $20M a month and you have to slap "Kimi K2.6" on your UI. That covers any consumer app worth building. Not really open, more like free until you matter. Llama pulled the same move.

esafak | yesterday at 4:07 PM

K2.5 was already pretty decent so I would try this. Starting at $15/month: https://www.kimi.com/membership/pricing

edit: Note that you can run it yourself with sufficient resources (e.g., companies), or access it from other providers too: https://openrouter.ai/moonshotai/kimi-k2.6/providers
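
Since OpenRouter exposes an OpenAI-compatible chat completions endpoint, accessing the model from there can be sketched with nothing but the standard library. This is a minimal sketch, not official usage: the model slug `moonshotai/kimi-k2.6` is inferred from the provider URL above, and the `OPENROUTER_API_KEY` environment variable name is an assumption.

```python
# Minimal sketch of calling Kimi K2.6 via OpenRouter's OpenAI-compatible
# chat completions API. Verify the model slug and auth setup before use.
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str, model: str = "moonshotai/kimi-k2.6") -> urllib.request.Request:
    """Construct the HTTP request; the API key is read from the environment."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        OPENROUTER_URL, data=json.dumps(payload).encode(), headers=headers
    )

def ask(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same payload shape works with the official `openai` client by pointing its `base_url` at OpenRouter, which is how many of the tools mentioned in this thread integrate it.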

cassianoleal | yesterday at 4:55 PM

If only their API wasn't tied to a Google or phone login...

thomasahle | yesterday at 9:10 PM

Does it run on Nvidia or Huawei?

nisegami | yesterday at 4:23 PM

The choice of example task for Long-Horizon Coding is a bit spooky if you squint, since it's nearing the territory of LLMs improving themselves.

jauntywundrkind | yesterday at 5:59 PM

I really wish some of these very-long-horizon runs were themselves open sourced (openly released, open access). Have the harness set up to automatically git-commit the transcript and code, and offload writing the commit messages. Release it all.

This sounds so so so cool. It would be so amazing to see this unfurl:

> Kimi K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac. By implementing and optimizing model inference in Zig—a highly niche programming language—it demonstrated exceptional out-of-distribution generalization. Across 4,000+ tool calls, over 12 hours of continuous execution, and 14 iterations, Kimi K2.6 dramatically improved throughput from ~15 to ~193 tokens/sec, ultimately achieving speeds ~20% faster than LM Studio.

cmrdporcupine | yesterday at 4:56 PM

Running it through opencode to their API and... it definitely seems like it's "overthinking" -- watching the thought process, it's been going for pages and pages and pages diagnosing and "thinking" things through... without doing anything. Sitting at 50k+ output tokens used now just going in thought circles, complete analysis paralysis.

Might be a configuration or prompt issue. I guess I'll wait and see, but I can't get use out of this now.

oliver236 | yesterday at 4:30 PM

Isn't this better than Qwen?

XCSme | yesterday at 5:17 PM

(commented on the wrong thread, HN doesn't let me delete it :( )
