This would be a more convincing take if reasoning LLMs didn't already exist. Given the growth in capability over the last few years alone, nothing in your description ("several minute explanation of how the item description and the slight differentiations of the boxes") seems beyond what an AI could solve by the time humanoid robots are ready to physically traverse a warehouse.
Your last point is also interesting, given that a robot is perhaps more amenable to such instruction, which would create cascading savings. Each human has to be trained individually, and any one of them could fail. A robot can essentially copy its "brain" to all the others.
Or, likely more accurately, download the latest brain, trained on all the robots' aggregate experiences, from the Amazon hivemind HQ.