DeepSeek v4

1297 points • by impact_sy • today at 3:01 AM • 929 comments

https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

Comments

The speed of progress here is wild. It feels like the hard part is shifting from having access to a strong model to actually building trustworthy systems around it.

aquir • today at 6:41 AM

It is great! I asked the question I always ask new models ("What would Iain M. Banks think about the current state of AI?") and it gave me a brilliant answer! Funnily enough, the answer contained multiple criticisms of its own creators ("Chinese state entities", "Social Credit System").

jfxia • today at 6:36 AM

Is V4 still not a multi-modal model?

yanis_t • today at 6:58 AM

Is there a harness as good as Claude Code that can be used with open-weight models?

taosx • today at 3:47 AM

Merge? https://news.ycombinator.com/item?id=47885014

clark1013 • today at 5:14 AM

Looking forward to DeepSeek Coding Plan

aliljet • today at 4:35 AM

How can you reasonably get near-frontier performance (at any tps at all) on hardware you own? Maybe for under 5k in cost?

dannyw • today at 8:21 AM

Are there better providers for running inference on this right now? I know it's launch day, but OpenRouter showing 30 tps isn't looking great.

namegulf • today at 4:27 AM

Is there a quantized version of this?

KaoruAoiShiho • today at 3:57 AM

SOTA on MRCR (or would have been a few hours earlier; it was beaten by 5.5). I've long thought of this as the most important non-agentic benchmark, so this is especially impressive. It beats Opus 4.7 here.

sibellavia • today at 5:32 AM

A few hours after GPT5.5 is wild. Can’t wait to try it.

GuardCalf • today at 8:14 AM

I like this. The more competitors there are, the more we the users benefit.

fbrncci • today at 10:12 AM

Take that, Anthropic, and your shenanigans.

JonChesterfield • today at 8:32 AM

Anyone worked out how much hardware one needs to self host this one?

apexalpha • today at 6:17 AM

This Flash model might be affordable for OpenClaw. I run it on my Mac with 48 GB of RAM now, but it's slowish.

reenorap • today at 4:19 AM

Which version fits in a Mac Studio M3 Ultra 512 GB?

swrrt • today at 4:04 AM

Is there any visualised benchmark or scoreboard comparing the latest models? DeepSeek v4 and GPT-5.5 seem to be groundbreaking.

ghstinda • today at 11:56 AM

So many models, not enough time.

WhereIsTheTruth • today at 6:21 AM

Interesting note:

"Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly."

So it's going to be even cheaper

cztomsik • today at 6:54 AM

So is this the first AI lab to use Muon for its frontier model?

mariopt • today at 4:48 AM

Does DeepSeek have a coding plan?

raincole • today at 4:07 AM

History doesn't always repeat itself.

But if it does, then in the following week we'll see DeepSeek V4 flood every AI-related online space, with thousands of posts swearing that it's better than the latest models from OpenAI/Anthropic/Google but costs only pennies.

Then a few weeks later it'll be forgotten by most.

rvz • today at 4:00 AM

The paper is here: [0]

I was expecting the release this month [1], since everyone had forgotten about it and was not reading the papers they were releasing, and 7 days later here we have it.

One key point to look at is DeepSeek's optimization of the residual design of the network architecture: manifold-constrained hyper-connections (mHC), from this paper [2]. It makes it possible to train the model efficiently, especially together with the hybrid attention mechanism designed for it.

There was not much discussion of it here some months ago [3], but again, the paper is a recommended read.

I wouldn't trust the benchmarks directly, but would wait for others to try it for themselves to see if it matches the performance of frontier models.

Either way, this is why Anthropic wants to ban open-weight models, and I cannot wait for the quantized versions, which should be released shortly.

[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...

[1] https://news.ycombinator.com/item?id=47793880

[2] https://arxiv.org/abs/2512.24880

[3] https://news.ycombinator.com/item?id=46452172

tcbrah • today at 6:03 AM

Giving Meta a run for its money, especially when Meta was supposed to be the poster child for OSS models. DeepSeek is really overshadowing them right now.

cl08 • today at 7:15 AM

Any way to connect this to Claude Code?

sergiotapia • today at 5:40 AM

Using it with opencode, it sometimes generates commands like:

    bash({"command":"gh pr create --title "Improve Calendar module docs and clean up idiomatic Elixir" --body "$(cat <<'EOF'
    Problem
    The Calendar modu...

It is as if it emits the tool call as output text without actually running the bash command, so ultimately the PR is not created. I wonder if it's a model thing or an opencode thing.
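
One possible reading of the snippet above (an assumption, not a diagnosis of the model or of opencode): the inner double quotes around the title are not escaped, so the argument object would not parse as JSON, and a harness could end up treating it as plain text. A minimal Python sketch with a hypothetical, shortened payload shows the difference between the unescaped form and one produced by `json.dumps`:

```python
import json

# Hypothetical tool-call arguments, modeled on the snippet above.
# The inner quotes around the title are not escaped, so this is not valid JSON.
broken = '{"command":"gh pr create --title "Improve docs" --body "text""}'


def is_valid_json(payload: str) -> bool:
    """Return True if payload parses as JSON, False otherwise."""
    try:
        json.loads(payload)
        return True
    except json.JSONDecodeError:
        return False


# json.dumps escapes the inner quotes, so the resulting payload parses
# and round-trips back to the original command string.
command = 'gh pr create --title "Improve docs" --body "text"'
fixed = json.dumps({"command": command})

print(is_valid_json(broken))
print(is_valid_json(fixed))
print(json.loads(fixed)["command"] == command)
```

Whether this is what actually happens in the case above would need the raw model output to confirm.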

tariky • today at 5:27 AM

Has anyone tried making a web UI with it? How good is it? For me, Opus is only worth it because of that.

zurfer • today at 7:32 AM

Lots of great stuff, but the plot in the paper is just chart crime: different shades of gray for the reference models, and sometimes you see 4 models and sometimes 3.

ls612 • today at 4:07 AM

How long does it usually take for folks to make smaller distills of these models? I really want to see how this will do when brought down to a size that will run on a MacBook.

augment_me • today at 6:30 AM

Amaze amaze amaze

cubefox • today at 8:49 AM

Abstract of the technical report [1]:

> We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens. DeepSeek-V4 series incorporate several key upgrades in architecture and optimization: (1) a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency; (2) Manifold-Constrained Hyper-Connections (mHC) that enhance conventional residual connections; (3) and the Muon optimizer for faster convergence and greater training stability. We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline that unlocks and further enhances their capabilities. DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, redefines the state-of-the-art for open models, outperforming its predecessors in core tasks. Meanwhile, DeepSeek-V4 series are highly efficient in long-context scenarios. In the one-million-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2. This enables us to routinely support one-million-token contexts, thereby making long-horizon tasks and further test-time scaling more feasible. The model checkpoints are available at https://huggingface.co/collections/deepseek-ai/deepseek-v4.

1: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main...
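
The parameter counts in the abstract give a rough lower bound on weight memory for self-hosting. The following is a back-of-the-envelope Python sketch, assuming weights dominate and ignoring KV cache, activations, and runtime overhead. The bit widths are illustrative assumptions, not published quantization formats for these checkpoints.

```python
# Total (not activated) parameter counts, taken from the abstract.
MODELS = {
    "DeepSeek-V4-Pro": 1.6e12,
    "DeepSeek-V4-Flash": 284e9,
}


def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


for name, params in MODELS.items():
    for bits in (16, 8, 4):  # illustrative precisions (assumption)
        print(f"{name} @ {bits}-bit: ~{weight_gb(params, bits):,.0f} GB")
```

Under these assumptions, Flash at 4 bits comes to roughly 142 GB of weights and Pro at 4 bits to roughly 800 GB, so only the former would fit in 512 GB of memory, before counting any cache or overhead.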

casey2 • today at 8:21 AM

Already over a billion tokens on OpenRouter in under 5 hours.

gigatexal • today at 6:42 AM

Has anyone used it? How does it compare to GPT-5.5 or Opus 4.7?

coolThingsFirst • today at 6:35 AM

I got an API key without entering credit card details. I didn’t know they had a free plan.

luew • today at 5:35 AM

We will be hosting it soon at getlilac.com!

punkpeye • today at 5:48 AM

Incredible model quality to price ratio

frozenseven • today at 4:11 AM

Better link:

https://news.ycombinator.com/item?id=47885014

https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro

donbreo • today at 7:06 AM

Aaaand it still can't name all the states in India, or say what happened in 1989.

hongbo_zhang • today at 4:35 AM

congrats

slopinthebag • today at 5:10 AM

OMG

OMG ITS HAPPENING

dhruv3006 • today at 4:58 AM

Ah now !
