Hacker News

anonym29 · yesterday at 5:12 PM

I know it's only tangentially relevant, but I've been baffled by the interest in DeepSeek V4 Flash. It's larger, less efficient, and in many cases performs worse than Minimax M2.7 on both objective benchmarks and my real-world sniff test (admittedly, n=1). DS4F hallucinates at extraordinary rates while M2.7 does not. The 196k context length that M2.7 was natively trained up to represents neither a hard technical ceiling (this is metadata that can easily be adjusted), nor a meaningful degradation threshold - I've personally run it past a 330k-token context window where it maintained full coherence and still completed my one-shot agentic task to my satisfaction.
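For anyone curious what "metadata that can easily be adjusted" typically means in practice: for HuggingFace-style checkpoints the advertised context window is usually the `max_position_embeddings` field in the model's `config.json`. A minimal sketch of bumping it (the function name and file path are illustrative, not anything Minimax ships; raising the value does not retrain the model, it only lifts the advertised limit):

```python
import json

def extend_context(config_path: str, new_max: int) -> dict:
    """Rewrite max_position_embeddings in a HF-style config.json.

    Hypothetical helper: lifts the advertised context ceiling so the
    serving stack stops truncating, without changing any weights.
    """
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["max_position_embeddings"] = new_max
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg

# e.g. extend_context("minimax-m2.7/config.json", 330_000)
```

Whether the model stays coherent past its trained length is an empirical question, as the n=1 330k run above suggests; some serving stacks also expect RoPE scaling settings to be adjusted alongside this field.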


Replies

solarkraft · yesterday at 9:13 PM

> The 196k context length that M2.7 was natively trained up to represents neither a hard technical ceiling (this is metadata that can easily be adjusted), nor a meaningful degradation threshold

FWIW, I find that in OpenCode it starts becoming erratic after around 80k tokens (sometimes less).

NitpickLawyer · yesterday at 5:36 PM

M2.7 is no longer open source; it's been moved to an NC license. It's an OK model, but IME, out of the big 5 Chinese models (ds, glm, kimi, minimax and qwen), DS models have generally shown better generalisation and real-world usability than all the others, even when their benchmark scores were lower. Less benchmaxxxing, basically.

DS4 also has some neat new arch improvements, giving it a lot of context at lower VRAM usage. So it will be cheaper to serve, B for B, than previous models.

antirez · yesterday at 6:14 PM

May I ask what you used for DS4F inference? It's a model with a very low hallucination rate in my tests.
