
refulgentis · yesterday at 6:08 PM

Something's really screwy with on-device models from Google. I can't put my finger on what, and I think being ex-Google is screwing with my ability to evaluate it.

Cherry-picking something that's quick to evaluate:

"High throughput: Processes up to 60 frames per second on a Google Pixel, enabling real-time, on-device video analysis and interactive experiences."

You can download an APK from the official Google project for this, linked from the blog post: https://github.com/google-ai-edge/gallery?tab=readme-ov-file...

If I download it and run it on a Pixel Fold, with the 2B model (half the size of the one the 60 fps claim is made for), it takes 6.2-7.5 seconds to begin responding (3 samples, 3 different photos). Generation speed is shown at 4-5 tokens per second, slightly slower than what llama.cpp does on my phone. (I maintain an AI app that, inter alia, wraps llama.cpp on all platforms.)

So, *0.16* frames a second, not 60 fps.
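
For transparency, here's the back-of-envelope arithmetic from my own measurements above (illustrative only, not a benchmark harness):

  // Treat time-to-first-response for a single image as the per-frame latency.
  val secondsPerFrame = 6.2                   // best of my 3 samples (range 6.2-7.5 s)
  val effectiveFps = 1.0 / secondsPerFrame
  println("%.2f fps".format(effectiveFps))    // ~0.16 fps, vs. the claimed 60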

The blog post is jammed with claims about how special this is for on-device use and performance that just... seemingly aren't true. At all.

- Are they missing a demo APK?

- Was there some massive TPU leap since the Pixel Fold release?

- Is there a lot of BS in there that they're pretty sure won't be called out in a systematic way, given the amount of effort it takes to get this inferencing?

- I used to work on Pixel, and I remember thinking that it seemed like there weren't actually public APIs for the TPU. Is that what's going on?

In any case, either:

A) I'm missing something big, or

B) they are lying, repeatedly, big time, in a way that would be exposed near-immediately once you actually tried building on it, since it supposedly "enables real-time, on-device video analysis and interactive experiences."

Everything I've seen the last year or two indicates they are lying, big time, regularly.

But if that's the case:

- How are they getting away with it, over this length of time?

- How come I never see anyone else mention these gaps?


Replies

mlsu · yesterday at 6:56 PM

It looks to me from the marketing copy that the vision encoder can run at 60 fps.

> MobileNet-V5-300M

Which makes sense, as it's 300M parameters and probably far less complex; it isn't a multi-billion-parameter transformer.
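
If that reading is right, the budget math at least works out for an encoder-only claim (rough numbers of my own, not from the post):

  // 60 fps leaves about 16.7 ms per frame: plausible for a ~300M-parameter
  // vision encoder on a mobile accelerator, nowhere near enough for full
  // multimodal LLM decode at 4-5 tokens/s.
  val frameBudgetMs = 1000.0 / 60.0           // ≈ 16.7 ms per frame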

catchmrbharath · yesterday at 6:29 PM

The APK that you linked runs the inference on the CPU and does not run it on the Google Tensor chip.
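
A minimal sketch of what that difference looks like in plain LiteRT/TensorFlow Lite terms (illustrative only; I don't know the Gallery app's actual stack, these are just the standard TFLite delegate APIs):

  import org.tensorflow.lite.Interpreter
  import org.tensorflow.lite.nnapi.NnApiDelegate
  import java.io.File

  fun buildInterpreter(modelFile: File, useAccelerator: Boolean): Interpreter {
    val options = Interpreter.Options()
    if (useAccelerator) {
      // NNAPI can route ops to the Tensor accelerator if the drivers support
      // them; unsupported ops silently fall back to CPU.
      options.addDelegate(NnApiDelegate())
    } else {
      // Plain multi-threaded CPU path, which would explain the numbers above.
      options.setNumThreads(4)
    }
    return Interpreter(modelFile, options)
  }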
