Hacker News

ZAYA1-8B: An 8B MoE Model with 760M Active Params Matching DeepSeek-R1 on Math

20 points by steveharing1 | today at 8:56 AM | 19 comments

Comments

throwaw12 · today at 10:25 AM

> The math and coding part is impressive but the agentic one is not.

I think this is very important for eventually becoming a viable replacement for coding models, because most of the time coding harnesses leverage tool calls to gather context and then write a solution.
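To make the point concrete, the loop a coding harness runs looks roughly like this, a minimal sketch where `chat` and `tools` are hypothetical stand-ins for the LLM API and the harness's tool implementations (file reads, grep, test runs):

```python
# Minimal sketch of an agentic tool-call loop, the part of coding harnesses
# the comment is referring to. `chat` and `tools` are hypothetical: in a real
# harness, `chat` wraps an LLM API call and `tools` wraps filesystem/shell access.
def run_agent(task, chat, tools, max_steps=10):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = chat(messages)              # model proposes a tool call or a final answer
        messages.append(reply)
        if reply.get("tool_call") is None:  # no more context needed: done
            return reply["content"]
        name, args = reply["tool_call"]     # e.g. ("read_file", {"path": "a.py"})
        result = tools[name](**args)        # gather context for the next model turn
        messages.append({"role": "tool", "content": result})
    return None                             # gave up after max_steps
```

A model weak at this loop will misformat tool calls or stop gathering context too early, which is why strong math/coding scores alone don't make a usable coding model.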

I am hopeful that one day we can replace Claude and OpenAI models with local SOTA LLMs.

Havoc · today at 11:15 AM

0.76B active params and it's vaguely competitive at coding sounds promising.

LM studio doesn't let me actually run this yet though: "Unsupported safetensors format: null"
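For what it's worth, that error message suggests the loader is reading the `format` key from the file's metadata and finding nothing. A safetensors file starts with an 8-byte little-endian header length followed by a JSON header, with optional string metadata under `"__metadata__"`. A minimal sketch of inspecting that field (the file built here is a hypothetical stand-in, not an actual ZAYA1 shard):

```python
import json
import os
import struct
import tempfile

def read_safetensors_header(path):
    """Read the JSON header of a .safetensors file.

    Layout: the first 8 bytes are a little-endian u64 giving the header
    length N; the next N bytes are a JSON object describing the tensors,
    with optional string metadata under "__metadata__".
    """
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(n))

# Build a tiny file whose metadata lacks a "format" entry (e.g. "pt"),
# mimicking what a loader might reject as "format: null".
header = {"__metadata__": {}}
payload = json.dumps(header).encode()
with tempfile.NamedTemporaryFile(suffix=".safetensors", delete=False) as f:
    f.write(struct.pack("<Q", len(payload)) + payload)
    path = f.name

meta = read_safetensors_header(path).get("__metadata__", {})
print(meta.get("format"))  # None here — a missing key serializes as "null"
os.remove(path)
```

Running the same check against the downloaded shards would tell you whether the files ship without a `format` entry or whether it's a loader bug.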

2ndorderthought · today at 10:22 AM

I've been saying it for a long time now: I think small models are the future for LLMs. It's been fun watching experiments probe just how much better models get by making them insanely large, but it's not sustainable.

No, I am not saying this model is a drop-in Claude replacement. But I think in 2 years we might be really surprised at what can be done on a desktop with commodity hardware, no connection to the internet, and a few models that span a subset of tasks.

Really happy to see AMD throw their hat in the ring. It's a good day for AMD investors. I know a lot of AI bros will scoff at this, but completing your first training run is a big deal for a new lab. AMD is on their way despite Nvidia having years of runway.
