Showing the murder dog reading a gauge, using $$$ worth of model time, is kinda not an amazing demo. We already know how to read gauges with machine vision. We also know how to order digital gauges out of industrial catalogs for under $50.
Agreed. I'm unclear on what the highlight of this post is. Is it the multimodality of the model (which could replace classical computer vision), the reasoning part, or the overall wrapper that makes it very easy to develop on top of?
Completely agree. I get that this is a stepping stone toward future, more reliable robots, but I found the demonstration underwhelming.
I think where this gets interesting is when you can just drop these robotic systems into an environment that wasn't set up specifically to handle them. The $50 for your gauge isn't really the cost: it's the engineering time to go through the whole environment and adapt it so the robotic system can deal with each specific task, each of which requires some bespoke setup.