Hacker News

dbcurtis, yesterday at 9:47 PM

It is easy to underestimate how much one relies on senses other than vision. You hear many kinds of noises that indicate road surface, traffic, etc. You feel road-surface imperfections telegraphed through the steering wheel. You feel accelerations in your butt, and infer loss of traction from the response to the accelerator and the motion of the vehicle. Second, the human eye has far more dynamic range than any camera, and is mounted on an exquisite PTZ platform. Then, turning to the model: you are classifying obstacles and agents at a furious rate, and making predictions about the agents' behavior. So I agree in part that the models need work, but the models need to be fed, and IMHO computer vision alone is not a sufficient sensor feed.

Consider an exhaust condensation cloud coming from a vehicle's tailpipe -- it could be opaque to a camera/computer-vision system. Can you model your way out of that? Or is it more useful to do sensor fusion of vision data with radar data (to which the cloud is transparent) and other modalities like lidar? A multi-modal sensor feed is going to simplify the model, which in the end translates into lower compute load.
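To make the exhaust-cloud point concrete, here is a minimal sketch of one simple fusion strategy: confidence-weighted averaging of range estimates from two sensors. All names (`Detection`, `fuse`) and the numbers are hypothetical illustrations, not any real autonomous-driving stack; production systems use far more sophisticated approaches (e.g. Kalman filters over tracked objects).

```python
from dataclasses import dataclass

@dataclass
class Detection:
    distance_m: float   # estimated range to the obstacle
    confidence: float   # 0.0 (no usable signal) .. 1.0 (certain)

def fuse(camera: Detection, radar: Detection) -> Detection:
    """Confidence-weighted fusion of two range estimates.

    Each sensor's estimate is weighted by its own confidence, so an
    occluded camera (confidence near zero) contributes almost nothing
    and the radar return dominates.
    """
    total = camera.confidence + radar.confidence
    if total == 0.0:
        # Neither sensor sees anything at all.
        return Detection(float("inf"), 0.0)
    distance = (camera.distance_m * camera.confidence
                + radar.distance_m * radar.confidence) / total
    # Fused confidence: at least as confident as the best single sensor.
    return Detection(distance, max(camera.confidence, radar.confidence))

# Exhaust-cloud scenario: camera nearly blind, radar sees through the cloud.
camera = Detection(distance_m=5.0, confidence=0.05)  # cloud read as a nearby wall
radar = Detection(distance_m=42.0, confidence=0.90)  # actual vehicle ahead
fused = fuse(camera, radar)
```

The key property is that the fused estimate degrades gracefully: when one modality is blinded, the output tracks the modality that still has signal, rather than the model having to reason its way around a wall of opaque pixels.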