Terrible article, repetitive AI slop.
But Gemma really is very impressive. The premise that people are paying for GPT-3.5, or using it for serious work, is weird, though. GPT-3.5 was bad enough to convince a lot of folks they didn't need to worry about AI: good enough to be a chatbot for some category of people, but not good enough to write code that worked, or prose that could pass for human. (Human-sounding prose is still a challenge for current SOTA models, as this article written by Claude proves, but code is mostly solved by frontier models.)
Tiny models are what I find most exciting about AI, though. Gemma 2B isn't Good Enough for anything beyond chatting, AFAIC, and even then it's not very smart. But Gemma 31B, or the MoE 26BA4B, probably is Good Enough. And those run on modest hardware, relatively speaking: a 32GB GPU, even an old one, can run either at 4-bit quantization, and they're competitive with frontier models of 18 months ago. They can write code in popular languages, and the code works. They can use tools. They can find bugs. Their prose is decent, though still obviously AI slop: too wordy, too flowery. But you could build real, good software using nothing but Gemma 4 31B, if you're already a good programmer who knows when the LLM is going off on a bizarre tangent. For tasks where correctness can be checked with tools, a model at the level of Gemma 4 31B can do the job, if slower and with a lot more hand-holding than Opus 4.6 needs.
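For anyone wondering whether a 31B model at 4-bit really fits in 32GB, here's the back-of-envelope math I'm going off. The overhead factor and KV-cache budget are my own rough assumptions, not measured numbers; real quantized files carry per-group scales, and some layers (embeddings, norms) are often kept at higher precision:

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float,
                      overhead: float = 1.15) -> float:
    """Approximate weight footprint in GB.

    overhead ~15% is an assumed fudge factor for quantization scales
    and the layers that stay at higher precision.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

weights = quantized_size_gb(31, 4)  # ~17.8 GB
kv_budget = 4.0                     # assumed rough budget for KV cache + activations
print(f"weights ≈ {weights:.1f} GB, total ≈ {weights + kv_budget:.1f} GB of 32 GB")
```

So roughly 22GB all-in, with headroom to spare on a 32GB card; context length is what eats the rest.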
The Prism Bonsai 1-bit 8B model is crazy, too. Less than 2GB on disk, shockingly smart for a tiny model (but also not Good Enough, by my definition above; it's similarly weak to Gemma 2B in my limited testing), and plenty fast on modest hardware.
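The under-2GB figure checks out on paper, assuming "1-bit" here means a BitNet-style ternary encoding (~1.58 bits per weight) rather than literal 1-bit; that encoding is my assumption, since I don't know Prism Bonsai's actual format:

```python
def disk_size_gb(params_billion: float, bits_per_weight: float,
                 overhead: float = 1.2) -> float:
    """Approximate on-disk size in GB; overhead is an assumed ~20%
    for scales, embeddings, and metadata kept at higher precision."""
    return params_billion * 1e9 * bits_per_weight / 8 * overhead / 1e9

# Ternary "1-bit" ≈ 1.58 bits/weight (log2(3))
print(f"{disk_size_gb(8, 1.58):.1f} GB")  # ≈ 1.9 GB, consistent with the <2 GB claim
```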
Small models are getting really interesting. When the AI bubble pops (or whatever happens to normalize things, so normal people can buy RAM and GPUs again), we'll be able to do a lot with local models.