Hacker News

fennecfoxy · yesterday at 9:12 AM

>It's here, right now.

I mean I've been forcing my good old 1080 Ti to run local models since shortly after LLaMA first leaked.
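(For reference, that setup is basically a few lines with llama-cpp-python these days. The model path and settings below are just placeholders, not a recommendation; the point is that a 4-bit 7B model fits comfortably in the 1080 Ti's 11 GB:)

    # pip install llama-cpp-python (built with CUDA for GPU offload)
    from llama_cpp import Llama

    llm = Llama(
        model_path="./llama-2-7b.Q4_K_M.gguf",  # placeholder GGUF filename
        n_gpu_layers=-1,  # offload all layers; lower this if VRAM runs out
        n_ctx=4096,       # context window
    )

    out = llm("Q: Why run models locally? A:", max_tokens=64, stop=["Q:"])
    print(out["choices"][0]["text"])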

But I wouldn't say "local models are here" in the same way as "year of the Linux desktop!111"

Until someone can just go out and buy some sort of "AI pod" they can take home, plug in, and hit one button on a mobile app to select a model (or even just pick from models hidden behind various personas), I wouldn't say it's quite there yet.

It's important that the average consumer can do it. I think the limitations there are: things are changing too quickly, RAM and compute components are exceedingly expensive right now, and we're still waiting on better controls/harnesses to stop consumers not just from shooting themselves in the foot, but from blowing their foot clean off.

It would be interesting to see a Taalas-like chip in a product, although there's so much changing at the moment: diffusion-based models, Google's Turboquant (which, as someone who has almost always had to run quantized models, makes a lot of sense to me), and so on.
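(If anyone's wondering why quantization matters so much on consumer cards, here's a toy sketch of plain symmetric int8 quantization. To be clear, this is the generic idea, not Turboquant's actual method, which I haven't dug into:)

    import numpy as np

    # Toy symmetric int8 quantization of a weight tensor: store 1 byte per
    # weight plus one fp32 scale, cutting memory ~4x vs fp32 (~2x vs fp16).
    w = np.random.randn(4096, 4096).astype(np.float32)

    scale = np.abs(w).max() / 127.0          # map the largest weight to +/-127
    q = np.round(w / scale).astype(np.int8)  # what you keep on disk/VRAM
    w_hat = q.astype(np.float32) * scale     # dequantize at inference time

    print("mean abs error:", np.abs(w - w_hat).mean())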


Replies

skillina · yesterday at 11:10 AM

What is the use case you see for non-technical users self-hosting? I think it’s important that tools remain available but I don’t expect it to be adopted by “average consumers.”

I’m interested in self-hosting for privacy and control. I already owned the hardware I’m testing with, so my spend is limited to time and electricity.

The “LLM pods” you describe will be loaded with spyware and adware (see: smart TVs), and average consumers won't max out their compute around the clock, so data centers will naturally make more efficient use of hardware by maximizing utilization.

cl0ckt0wer · yesterday at 12:16 PM

There are local AI pods already. They're around $2k for a low-end one.