Hacker News

esquire_900 · yesterday at 5:07 AM · 4 replies

Cost-wise it does not seem very effective. 0.5 token/sec (the optimized one) is 3600 tokens an hour, which costs about 200-300 watts for an active 3090 + system. Running 3600 tokens on OpenRouter at $0.40 per million tokens for Llama 3.1 (3.3 costs less) is about $0.00144. That money buys you about 2-3 watts (in the Netherlands).
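The comparison can be sketched as a quick back-of-the-envelope calculation. Note that 0.5 tok/s actually works out to 1800 tokens an hour, and the $0.30/kWh electricity rate below is an assumption for illustration, not a figure from the comment:

```python
# Rough $/hour comparison of local vs hosted inference, using the
# comment's figures. Assumed (not from the comment): an electricity
# rate of ~$0.30/kWh.
watts = 250                      # midpoint of the 200-300 W estimate
tokens_per_hour = 0.5 * 3600     # 0.5 tok/s over an hour = 1800 tokens
price_per_mtok = 0.40            # $ per 1M tokens (Llama 3.1 on OpenRouter)
kwh_price = 0.30                 # $/kWh, assumed

local_cost = watts / 1000 * kwh_price                 # energy cost per hour
hosted_cost = tokens_per_hour / 1e6 * price_per_mtok  # API cost, same tokens

print(f"local:  ${local_cost:.5f}/h")   # $0.07500/h
print(f"hosted: ${hosted_cost:.5f}/h")  # $0.00072/h
```

Even with the corrected token count, the hosted route comes out roughly two orders of magnitude cheaper per token under these assumptions.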

Great achievement for privacy inference nonetheless.


Replies

teo_zero · yesterday at 6:33 AM

I think we use different units. In my system there are 3600 seconds per hour, and watts measure power.

Aerroon · yesterday at 5:29 AM

Something to consider is that input tokens have a cost too. They are typically processed much faster than output tokens. If you have long conversations, then input tokens will end up being a significant part of the cost.

It probably won't matter much here though.
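The effect of long conversations can be sketched like this. The per-token prices and the helper function are illustrative assumptions, not any real provider's rate card:

```python
# Total request cost includes input (prompt) tokens as well as output tokens.
# Prices are hypothetical: input tokens are usually billed cheaper than output.
def request_cost(input_tokens, output_tokens,
                 in_price_per_mtok=0.20, out_price_per_mtok=0.40):
    """Cost in dollars for one request at $/1M-token rates."""
    return (input_tokens * in_price_per_mtok
            + output_tokens * out_price_per_mtok) / 1e6

# A long conversation re-sends the growing history as input every turn,
# so input tokens can dominate the bill even at the lower rate.
print(request_cost(20_000, 500))   # 20k-token context, short answer
print(request_cost(200, 500))      # short prompt, same answer
```

At these assumed rates the long-context request costs $0.0042 versus $0.00024 for the short one, with almost all of the difference coming from input tokens.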

qoez · yesterday at 12:09 PM

OpenRouter is highly subsidized. This might be cheaper in the long run once these companies shift to taking profits.

thatwasunusual · yesterday at 10:18 AM

> Cost wise it does not seem very effective.

Why is this so damn important? Isn't it more important to end up with the best result?

I (in Norway) use a homelab with Ollama to generate a report every morning. It's slow, but it runs between 5 and 6 am, when energy prices are at a low, and it doesn't matter if it takes 5 or 50 minutes.
