> it also would use less electricity
How would it use less electricity? I’d like to learn more.
That's not true: an LLM running on device would use MORE electricity. Service providers running batch>1 inference are far more efficient per watt. Local inference is limited to batch=1, which is very inefficient.
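A rough back-of-envelope sketch of why batching matters (all numbers below are hypothetical, just to show the shape of the argument): decode is largely memory-bandwidth-bound, so each step pays roughly the same cost to stream the model weights whether it serves one request or many, and a batched server amortizes that cost across every token in the batch.

```python
# Illustrative sketch, not measurements: assume each decode step pays a fixed
# energy cost to stream the weights, plus a small marginal cost per token in the batch.
WEIGHT_READ_JOULES_PER_STEP = 5.0   # hypothetical energy to stream weights once per step
PER_TOKEN_COMPUTE_JOULES = 0.05     # hypothetical marginal energy per token in the batch

def joules_per_token(batch_size: int) -> float:
    """Energy per generated token when one decode step serves `batch_size` requests."""
    step_energy = WEIGHT_READ_JOULES_PER_STEP + PER_TOKEN_COMPUTE_JOULES * batch_size
    return step_energy / batch_size

for b in (1, 8, 32):
    print(f"batch={b:>2}: ~{joules_per_token(b):.2f} J/token")

# batch= 1: ~5.05 J/token
# batch= 8: ~0.68 J/token
# batch=32: ~0.21 J/token
```

Under these toy assumptions, the batch=1 case (local inference) burns roughly 25x the energy per token of a batch=32 server; the real ratio depends on the hardware and model, but the direction is the same.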