I had a good run with Gemma 4 E2B Unsloth 4Q: https://youtube.com/shorts/XLsAnz5aAAI
The E4B model doesn’t fit on my phone’s TPU, so it swaps to RAM. The QAT version means more accuracy, which is good!
How do you know it swaps to RAM rather than running on the TPU?
I'd be interested in testing this on my Pixel.
How were you getting anything useful out of that? We found the (unquantized!) E2B model to be completely useless at even the simplest real-world classification tasks.