Hacker News

Granite 4.1: IBM's 8B Model Matching 32B MoE

295 points by steveharing1, yesterday at 10:31 AM | 191 comments

https://research.ibm.com/blog/granite-4-1-ai-foundation-mode...


Comments

2ndorderthought, yesterday at 10:54 AM

I test drove it yesterday. It's pretty impressive at 8b. Runs on commodity hardware quickly.

Qwen3.6 35b a3b is still my local champion but I may use this for auto complete and small tasks. Granite has recent training data which is nice. If the other small models got fine tuned on recent data I don't know if I would use this at all, but that alone makes it pretty decent.

The 4b they released was not good for my needs but could probably handle tool calls or something

show 5 replies
cbg0, yesterday at 11:21 AM

The real "sleeper" might be https://huggingface.co/ibm-granite/granite-vision-4.1-4b if the benchmarks hold up for such a small model against frontier models for table & semantic k:v extraction.

show 1 reply
smj-edison, yesterday at 3:47 PM

On the topic of local models, is there a good equivalent to something like Claude's chat interface? I've recently started transitioning to open models after getting fed up with Claude's usage limits (I'm not in a position to drop $200/month), and for coding tasks Kimi 2.6 has been about the same as Sonnet in my experience. The only thing I've found myself missing is a nice interface to ask it questions and have it help me with my math assignments.

show 10 replies
Havoc, yesterday at 10:55 AM

Interesting to see a pivot away from MoE by both IBM and Mistral, while the larger classes of SOTA models all seem to be sticking with it.

Quick vibe check of it (8B @ Q6): seems promising. Bit of a clinical tone, but I can see that being useful for data processing and similar. You don't really want an LLM that spams you with emojis sometimes...

show 2 replies
sexylinux, today at 12:14 PM

Is this a model that will create reliable output or will it also produce errors?

0xbadcafebee, yesterday at 12:55 PM

People complain a lot about LLM-written articles, but the human comments here on HN are far worse. Mostly a bunch of people extremely proud of themselves for not reading an LLM-written article, and then a bunch of people who take it at face value and make the model seem almost useful, and one comment that actually looked at other benchmarks. Good ol' humanity, good at... being emotional... and not doing analysis...

The article makes some good points about model design (how different size models within a family can get similar results, how to filter out hallucination, math result reinforcement), so that's worth understanding. It's analyzing a paper, which only discussed 3 sizes of the same model family. But what the article doesn't say is, compared to other model families, Granite 4.1 8B sucks. The only benchmark it does well at compared to other models is non-hallucination and instruction following. Qwen 3.5 4B (among other models) easily outclass it on every other metric.

This article teaches a valuable lesson about reading articles in general. You can take useful information away from them (yes, despite being written by LLM). But you should also use critical thinking skills and be proactive to see if the article missed anything you might find relevant.

show 11 replies
simonw, yesterday at 3:56 PM

The Granite 4.1 3B model is only 2GB from Unsloth: https://huggingface.co/unsloth/granite-4.1-3b-GGUF

I ran it in LM Studio and got a pleasingly abstract pelican on a bicycle (genuinely not bad for a tiny 3B model - it can at least output valid SVG): https://gist.github.com/simonw/5f2df6093885a04c9573cf5756d34...

show 1 reply
100ms, yesterday at 11:13 AM

> Full stop.

Why do people not edit out obvious sloppification and still expect to have readers left?

show 3 replies
nielsbot, yesterday at 11:48 PM

Very much an aside, but I'm struck by IBM's consistent iconic design language. For me it harkens all the way back to the futuristic design in 2001: A Space Odyssey from 1968. But you can also see it in their old mainframe hardware designs and other places.

pjmalandrino, yesterday at 12:48 PM

Very impressive series of SLMs by IBM here.

I have been using it with their Chunkless RAG concept and it is fitting very well! (for the curious: https://github.com/scub-france/Docling-Studio)

I'm convinced that SLMs are a real part of the solution for truly integrated AI in processes...

dash2, yesterday at 12:38 PM

Nah, I ain't reading that. If they can't be bothered to get a human to write it, it can't be that important. I'm glad for them though. Or sorry that happened.

show 4 replies
dimitrismrtzs, yesterday at 1:32 PM

The 8B class closing the gap with 32B is the real story of 2026 for anyone running models locally. I've been using smaller models for agent tool-use and the progress this year is real.

The gap that still matters most isn't intelligence — it's consistency on structured output. When you chain 5+ tool calls in sequence, even a small per-call reliability difference compounds fast. Would love to see Granite 4.1 benchmarked specifically on multi-step function calling rather than just general benchmarks.
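The compounding effect described above is easy to make concrete. A minimal sketch (the specific success rates are illustrative assumptions, not Granite or Qwen numbers):

```python
# Hypothetical illustration: how a small per-call reliability gap
# compounds across a chain of sequential tool calls.
def chain_success(per_call: float, steps: int) -> float:
    """Probability that all `steps` independent tool calls succeed."""
    return per_call ** steps

# A 3-point per-call gap becomes a ~14-point gap over 5 chained calls.
print(round(chain_success(0.99, 5), 3))  # 0.951
print(round(chain_success(0.96, 5), 3))  # 0.815
```

This assumes independent failures, which is optimistic for real agent loops where one malformed output often derails every step after it.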

latentframe, today at 10:00 AM

The limit is shifting from scaling parameters to scaling data quality, but compute is still the big constraint.

agunapal, yesterday at 11:34 AM

If you really think about why MoE came into existence, it's to save significant cost during training; I don't think there was any concrete evidence of performance gains for comparable MoE vs dense models. Over the years, I believe all the new techniques employed in post-training have made the models better.

show 2 replies
woadwarrior01, yesterday at 1:59 PM

The most salient thing about these models is that they're non-reasoning models. This makes them very token-efficient and particularly well suited for local inference, where decoding is usually slower than with datacenter GPUs.

Link to HF collection: https://huggingface.co/collections/ibm-granite/granite-41-la...

mdp2021, yesterday at 5:03 PM

I read that IBM pioneered the concept of "shifting through "mid-training" from "guessing the next token" to "guessing the next logical step"". I am wondering how far the research is from "enhancing apparent reasoning" to "achieving solid, reliable reasoning".

If techniques existed to shift from "guess the next highly probable" token to "guess the best next logical step", as some interpreted said research, should not that be the foremost objective?

dissahc, yesterday at 12:26 PM

qwen3.5 9b outperforms granite 4.1 30b by a huge amount (32 vs 15 on artificialanalysis benchmark)... i have no idea what made the writer of this article say so many demonstrably incorrect things

RandyOrion, yesterday at 5:06 PM

Although the performance claim of 8b dense matching 32b moe is somewhat questionable, thank you granite team for releasing small dense LLMs.

mdp2021, yesterday at 10:50 AM

Wish they also released an embedding model, in line with their previous ones: compact (while good)...

show 2 replies
RugnirViking, yesterday at 10:51 AM

Sounds interesting. Here's hoping they release a 32B model; that's a pretty good sweet spot for feasibility of home setups.

edit: I just realised they do actually have a 30b release alongside this. Haven't tried it yet.

show 1 reply
SwellJoe, yesterday at 4:50 PM

I wish AI slop articles were somehow automatically flagged and deaded. They're all flowery verbose piles of crap. Yeah, the model is interesting, but the article is trash. I can't believe real humans are willing to sign their name to this stuff.

theblazehen, yesterday at 12:42 PM

> models are judged by GPT-4

An interesting choice

cubefox, yesterday at 1:14 PM

It's strange that they don't include reasoning training (RLVR). Their justification doesn't sound convincing:

> While reasoning models have grown in popularity in recent years, their abilities aren’t always the most efficient way to get a result. In enterprise settings, token costs and speed are often as important as performance. That is why turning to less expensive, non-reasoning models with similar benchmark performance for select tasks like instruction following and tool calling makes sense for enterprise users.

I guess they currently don't have the ability to do proper RLVR.

nickpsecurity, yesterday at 8:38 PM

On changing the training mix, H2O did that with Danube in 2024:

https://arxiv.org/pdf/2401.16818

With those results, I would've already done that in any models I got to train. There's also the principle that the LLM's are often better at what they saw last in their training set. That also justifies putting more logic, code, and math in at the end for an analytical or coding model. So, a few precedents for that technique already.

rasz, today at 2:34 AM

me: "Bits 5-4 indicate the video mode", how do you derive VIDEO_MASK_00b = 0x03C0 from this?

Granite 4.1 8B: "I apologize for the confusion earlier. Let's clarify the derivation of the VIDEO__MASK_00b based on the description "Bits 5-6 indicate the video mode" from the equipment_list section.

Given that bits 5 and 6 are used for video mode, we need to create a mask that isolates these two bits:

Bit 5 is the least significant bit of the group (2^5 = 32 decimal, 0b00100000 in 8-bit form). Bit 6 is the next bit up (2^6 = 64 decimal, 0b01000000 in 8-bit form). Combining these bits: ...

Therefore, the correct mask for VIDEO_MASK_00b should be 0x00C0"

Errors on top of errors when converting a description into binary numbers. It's hopeless for a basic task like parsing/generating headers :(
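For reference, a bit-range mask is just a run of ones shifted up to the low bit of the range, which makes the model's answers easy to check. A minimal sketch (the function name is illustrative):

```python
def bit_mask(lo: int, hi: int) -> int:
    """Mask covering bits lo..hi inclusive (lo is the least significant)."""
    width = hi - lo + 1
    return ((1 << width) - 1) << lo

# "Bits 5-4 indicate the video mode" -> bits 4..5
print(hex(bit_mask(4, 5)))  # 0x30

# Even under the model's own misreading ("bits 5-6"), the mask
# would be 0x60 -- not 0x00C0, and nowhere near 0x03C0.
print(hex(bit_mask(5, 6)))  # 0x60
```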

robotmaxtron, yesterday at 12:35 PM

"open source"

show me.

show 1 reply
tokenhub_dev, yesterday at 12:29 PM

[flagged]

samagragune, yesterday at 4:46 PM

[dead]

whalesalad, yesterday at 11:53 AM

[flagged]