Hacker News

dvt, yesterday at 6:51 PM (3 replies)

I’m working on a DOM agent and I think MCP is overkill. You have a few “layers” you can infer by just executing some simple JS (e.g., visible text, clickable surfaces, forms, etc.). 90% of the time, the agent can infer the full functionality, except for the obvious edge cases (which trip up even humans): infinite scrolling, hijacked navigation, etc.
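
A minimal sketch of what extracting those layers might look like, assuming a plain `document`-like object is available. The helper name `extractLayers`, the selector list, and the output shape are all hypothetical choices for illustration, not anything from an existing agent or from MCP:

```javascript
// Hypothetical helper: pull a few "layers" out of a document-like object.
// Only standard DOM members are used: querySelectorAll, getAttribute,
// tagName, innerText, and textContent.
function extractLayers(doc) {
  // innerText is rendering-aware in browsers, so it approximates visible text.
  const text = (el) => (el.innerText || el.textContent || "").trim();

  // Assumed heuristic for "clickable surfaces"; real pages will need more cases.
  const clickableSelector =
    'a[href], button, [role="button"], [onclick], input[type="submit"]';

  const clickables = Array.from(doc.querySelectorAll(clickableSelector)).map(
    (el) => ({
      tag: el.tagName.toLowerCase(),
      label: text(el) || el.getAttribute("aria-label") || "",
    })
  );

  const forms = Array.from(doc.querySelectorAll("form")).map((form) => ({
    action: form.getAttribute("action") || "",
    fields: Array.from(form.querySelectorAll("input, select, textarea")).map(
      (field) => ({
        name: field.getAttribute("name") || "",
        type: field.getAttribute("type") || field.tagName.toLowerCase(),
      })
    ),
  }));

  return { visibleText: text(doc.body), clickables, forms };
}
```

In a browser this would be called as `extractLayers(document)` and the result serialized for the model. It does nothing about the edge cases mentioned above (infinite scrolling, hijacked navigation).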


Replies

Garlef, yesterday at 7:11 PM

Question: Are you writing this under the assumption that the proposed WebMCP is for navigating websites? If so, it is not. From what I've gathered, it is an alternative to providing an MCP server.

Instead of letting the agent call a server (MCP), the agent downloads JavaScript and executes it itself (WebMCP).

0x696C6961, yesterday at 6:59 PM

In what world is this simpler than just giving the agent a list of functions it can call?
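
For concreteness, here is a rough sketch of what "a list of functions" could look like. Everything in it (the `tools` array, the tool names, `describeTools`, `callTool`) is made up for illustration and is not the WebMCP or MCP API:

```javascript
// Hypothetical tool list: each entry has a name, a description for the
// model, and a function to run. Names and shapes are illustrative only.
const tools = [
  {
    name: "search_products",
    description: "Search the catalog by keyword.",
    run: ({ query }) => ({ query, results: [] }),
  },
  {
    name: "add_to_cart",
    description: "Add a product to the cart by SKU.",
    run: ({ sku, qty = 1 }) => ({ ok: true, sku, qty }),
  },
];

// What the agent would be shown: names and descriptions, not implementations.
function describeTools(list) {
  return list.map(({ name, description }) => ({ name, description }));
}

// Dispatch a call the agent picked; unknown names fail loudly.
function callTool(list, name, args) {
  const tool = list.find((t) => t.name === name);
  if (!tool) throw new Error(`Unknown tool: ${name}`);
  return tool.run(args);
}
```

With this shape the agent never has to guess at page structure: it reads `describeTools(tools)` and then asks for something like `callTool(tools, "add_to_cart", { sku: "A1" })`.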

Mic92, yesterday at 7:02 PM

Do you expose the accessibility tree of a website to LLMs? What do you do with websites that lack one? Some agents I saw use screenshots, but that also seems kind of wasteful. Something in between would be interesting.
