When I run models on my phone, either through the web browser or via an app, is there any chance they use the phone's NPU, or will they be GPU-only?
I don't really understand what the interface to the NPU looks like from the perspective of a non-system caller, if one exists at all. This is a Samsung device, but I'm wondering about the general principle.