Probably both vision and dexterity. The first mistake we make as roboticists and engineers might be to treat the two as separate problems, or to assume there is a solution in which the two live separate lives.
https://rodneybrooks.com/why-todays-humanoids-wont-learn-dex...
Agreed. The solution will likely be some vision foundation model that sends controls directly to the robot ("move here, grab, move there"), trained by Amazon with RL to integrate collision avoidance, object detection, grasp-point detection, grasp verification, etc.