Hacker News

wg0 · yesterday at 12:10 PM

After TurboQuant and Gemma 4, I came across the following video[0] of Gemma running on a local machine at 50 tokens/second.

That already looks like Sonnet 3.x- and 4-level capability to me: the model in question (Gemma 4) sets up a whole Python project with a UI, installs Python libraries using uv, etc.

Add this Simple Self Distillation to the picture, and by 2028 I see coding-model providers becoming cheaper with much more generous usage limits, while power users would mostly be running their own models anyway.

Anyone using these models as "non-deterministic transpilers" from natural language to code (i.e. experienced engineers who can write the code themselves) would probably not be paying any AI provider.

[0] https://www.youtube.com/watch?v=-_hC-C_Drcw


Replies

spiderfarmer · yesterday at 12:48 PM

I always wonder how much smaller and faster models could be if they were trained only on the latest versions of the languages I use: for me, that would be PHP, SQL, HTML, JS, CSS, Dutch, and English, plus tool use for my OS of choice (macOS).

Right now it feels like hammering a house onto a nail instead of the other way around.

red75prime · yesterday at 2:08 PM

> power users would be mostly running their own models

...with a fair amount of supervision, while frontier models would be running circles around them using project-specific memory and on-demand training (or whatever we have by then).
