Some tasks don’t require SOTA models. For translating short texts I use Gemma 3n on my iPhone: it’s faster and better than Apple Translate or Google Translate, and it works offline. Local models are also useful if you can break certain tasks, like JSON healing, down into small focused coding steps.
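To give an idea of what "small and focused" means here, a minimal rule-based sketch of a JSON-healing step (closing dangling brackets and stripping trailing commas). This is a hypothetical illustration, not any particular library; in practice you'd hand the messier cases to a small local model and keep the deterministic fixes in code:

```python
import json
import re

def heal_json(text: str) -> str:
    """Repair two common JSON defects: unclosed brackets/braces
    and trailing commas. Raises if the result is still invalid."""
    # Close any brackets/braces left open, ignoring ones inside strings.
    stack = []
    in_string = False
    escape = False
    for ch in text:
        if escape:
            escape = False
        elif ch == "\\":
            escape = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]" and stack:
                stack.pop()
    text += "".join(reversed(stack))
    # Strip trailing commas before a closing bracket/brace.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    json.loads(text)  # sanity check: still raises if unrepairable
    return text
```

A deterministic pass like this handles truncated output cheaply; only what it can't fix needs a model call, which keeps the prompt small enough for an on-device model.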
Do you use E2B or E4B?
How does that work? Wouldn't it be slow loading the weights into memory every time you launch it?