Number of params isn’t really the relevant metric imo. Top models don’t support local inference. More relevant is tokens per dollar or per second.
Number of parameters is at least a proxy for model capability.
You can achieve incredible tok/dollar or tok/sec with Qwen3 0.6b.
It just won't be very good for most use cases.
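To put rough numbers on the tok/dollar point, here's a minimal sketch. The function name and the throughput and cost figures are made up for illustration; they are not benchmarks of Qwen3 0.6b or any other model.

```python
def tokens_per_dollar(tokens_per_second: float, cost_per_hour: float) -> float:
    """Tokens generated per dollar, given throughput and hourly compute cost."""
    if cost_per_hour <= 0:
        raise ValueError("cost_per_hour must be positive")
    return tokens_per_second * 3600 / cost_per_hour


# Hypothetical figures, for illustration only (not measured).
small = tokens_per_dollar(tokens_per_second=200.0, cost_per_hour=0.5)
large = tokens_per_dollar(tokens_per_second=20.0, cost_per_hour=2.0)

print(f"small model: {small:,.0f} tok/$")
print(f"large model: {large:,.0f} tok/$")
```

The small model wins easily on this metric, which says nothing about whether its output is any good for your use case.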
It does, since you can run this model locally on a < $3k machine.
It's an open source model. Why wouldn't it be relevant for people who want to self-host?