I've been using Gemini for that - it feels like it practically thinks in images (or "possesses impressive visual intelligence," as Google execs would put it).