I wonder if it's better to have a machine somewhere running a model for you, maybe shared with a few others. I could probably justify an M6 Mac Studio with, hopefully, 256 GB of RAM, and give a few people access to one agreed-upon model. I think laptops are maybe too warm and clunky for this.
The problem is that the moment you introduce shared remote hardware, there's a slippery slope leading right back down to "just pay an inference host for model tokens". If you're transmitting your prompts over the internet to a trusted host, you might as well just let that host be DeepInfra or together.ai or one of the many other providers already in that business.