logoalt Hacker News

echelonyesterday at 10:06 PM0 repliesview on HN

Speaking of browser automation, are there any LLMs or tools that hook up to actual desktop browsers and can automate the keyboard and mouse?

Which LLMs best drive these? Claude/Gemini, etc., or is anything local actually competent at it?

Can they understand layout and visual cues with a VLM or multimodality?

Are they robust enough to interact with threejs and videos and whatnot, or can they just blindly navigate the DOM?