Hacker News

mlsu, last Thursday at 6:56 PM

From the marketing copy, it looks to me like the vision encoder can run at 60 FPS.

> MobileNet-V5-300M

Which makes sense, as it's only 300M in size and probably far less complex, not a multi-billion-parameter transformer.


Replies

refulgentis, last Thursday at 7:03 PM

I agree that's the most likely interpretation - does it read as a shell game to you? Like, it can do that, but once you involve the thing that can actually use the output, it's 1/100th of that? Do they have anything that does stuff with the outputs from just MobileNet? If they don't, how are they sure I can build the 60 FPS realtime audiovisual experiences they say I can?
