How do you do on-device inference while preserving battery life?
It's not limited to just the mobile device. You could have a MacBook/mini/studio as part of your local "cluster", with inference running across all of them and scheduled based on each device's power source.
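A minimal sketch of that power-aware routing idea, in Python: it checks whether the current machine is on AC power via psutil and, if it's on battery, tries to offload the request to a plugged-in machine on the LAN. The node address, the /generate endpoint, and the run_local_model helper are all assumptions for illustration, not any real API.

    # Power-aware inference routing: run locally when plugged in,
    # offload to an always-powered LAN node when on battery.
    import psutil
    import requests

    # Hypothetical always-plugged-in machines on the local network.
    CLUSTER_NODES = ["http://mac-studio.local:8080"]

    def on_ac_power() -> bool:
        battery = psutil.sensors_battery()
        # Desktops report no battery at all; treat that as AC power.
        return battery is None or battery.power_plugged

    def run_local_model(prompt: str) -> str:
        # Placeholder for the actual on-device model call.
        return f"[local model output for: {prompt!r}]"

    def infer(prompt: str) -> str:
        if on_ac_power():
            return run_local_model(prompt)  # plugged in, burn the watts here
        for node in CLUSTER_NODES:
            try:
                resp = requests.post(f"{node}/generate",
                                     json={"prompt": prompt}, timeout=30)
                resp.raise_for_status()
                return resp.json()["text"]
            except requests.RequestException:
                continue  # node unreachable, try the next one
        return run_local_model(prompt)  # no node available, fall back to local

A real scheduler would also weigh model size, network latency, and battery percentage rather than a binary plugged/unplugged check, but the routing decision has the same shape.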
Or use something like Taalas' hardwired model (HC1) instead of running one on general-purpose GPUs, which are flexible but power-hungry.
https://www.cnx-software.com/2026/02/22/taalas-hc1-hardwired...