Yes, but that comes at the cost of using a dumber llm. The state of the art ones are only available ...

kennywinker • yesterday at 6:59 PM • 1 reply • view on HN

Yes, but that comes at the cost of using a dumber llm. The state of the art ones are only available via commercial api, and the best self-hostable models require $10,000+ gpus.

This is a problem for coding as smarter really has an impact there, but there are so so so many tasks that an 8b model that runs on a $200 gpu can handle nicely. Scrape this page and dump json? Yeah that’s gonna be fine.

This is my conclusion based on a week or so of using ollama + qwen3.5:3b self hosted on a ~10 year old dell optiplex with only the built-in gpu. You don’t need state of the art to do simple tasks.

Replies

TheDong • today at 12:41 AM

> Scrape this page and dump json? Yeah that’s gonna be fine.

Only gonna be fine on a trusted page, an 8b model can be prompt injected incredibly trivially compared to larger ones.

➕ show 1 reply

alt Hacker News

Replies