I'm assuming GP means 'run inference locally on GPU or RAM'. You can run really big L...

all2 • yesterday at 12:39 AM • 2 replies • view on HN

I'm assuming GP means 'run inference locally on GPU or RAM'. You can run really big LLMs on local infra, they just do a fraction of a token per second, so it might take all night to get a paragraph or two of text. Mix in things like thinking and tool calls, and it will take a long, long time to get anything useful out of it.

Replies

hxtk • yesterday at 7:30 AM

I’ve been experimenting with this today. I still don’t think AI is a very good use of my programming time… but it’s a pretty good use of my non-programming time.

I ran OpenCode with some 30B local models today and it got some useful stuff done while I was doing my budget, folding laundry, etc.

It’s less likely to “one shot” apples to apples compared to the big cloud models; Gemini 3 Pro can one shot reasonably complex coding problems through the chat interface. But through the agent interface where it can run tests, linters, etc. it does a pretty good job for the size of task I find reasonable to outsource to AI.

This is with a high end but not specifically AI-focused desktop that I mostly built with VMs, code compilation tasks, and gaming in mind some three years ago.

guerrilla • yesterday at 4:08 AM

Yes, this is what I meant. People are running huge models at home now, I assumed people could do it on premises or in a data center if you're a business, presumably faster... but yeah it definitely depends on what time scales we're talking.

➕ show 2 replies

alt Hacker News

Replies