dimitrismrtzs, yesterday at 1:32 PM

The 8B class closing the gap with 32B is the real story of 2026 for anyone running models locally. I've been using smaller models for agent tool-use and the progress this year is real.

The gap that still matters most isn't intelligence, it's consistency on structured output. When you chain 5+ tool calls in sequence, even a small per-call reliability difference compounds fast, because the chain only succeeds if every single call does. Would love to see Granite 4.1 benchmarked specifically on multi-step function calling rather than just on general benchmarks.
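
To put rough numbers on the compounding, here is a back-of-the-envelope sketch. It assumes each tool call succeeds independently with the same probability, which real agents won't strictly satisfy, and the reliability values are illustrative, not measurements of any model. The function name `chain_success_rate` is made up for this example.

```python
def chain_success_rate(per_call_reliability: float, num_calls: int) -> float:
    """Probability that every call in a chain succeeds.

    Assumes calls fail independently and share one success probability,
    so the result is simply per_call_reliability ** num_calls.
    """
    if not 0.0 <= per_call_reliability <= 1.0:
        raise ValueError("per_call_reliability must be between 0 and 1")
    if num_calls < 0:
        raise ValueError("num_calls must be non-negative")
    return per_call_reliability ** num_calls


if __name__ == "__main__":
    # Illustrative per-call reliabilities (hypothetical, not benchmark results).
    for p in (0.99, 0.95, 0.90):
        for n in (1, 5, 10):
            rate = chain_success_rate(p, n)
            print(f"per-call {p:.0%}, {n:>2} calls -> whole chain {rate:.1%}")
```

Under those assumptions, a 4-point gap per call (99% vs 95%) turns into roughly 95% vs 77% end-to-end over five calls, which is why per-call consistency matters more than it looks on a single-call benchmark.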