logoalt Hacker News

grbshyesterday at 7:08 PM1 replyview on HN

Why not just use Claude by itself? Opus and Sonnet are great at producing pixel coordinates and tool usages from screenshots of UIs. Curious as to what your framework gives me over the plain base model.


Replies

anerliyesterday at 7:16 PM

Hey! To have a framework that can effectively control browser agents, you need systems to interact with the browser, but also pass relevant content from the page to the LLM. Our framework manages this agent loop in a way that enables flexible agentic execution that can mix with your own code - giving you control but in a convenient way. Claude and OpenAI computer use APIs/loops are slower, more expensive, and tailored for a limited set of desktop automation use cases rather than robust browser automations.