Hacker News

Show HN: PageAgent, A GUI agent that lives inside your web app

62 points by simon_luv_pho today at 5:01 PM | 33 comments

Hi HN,

I'm building PageAgent, an open-source (MIT) library that embeds an AI agent directly into your frontend.

I built this because I believe there's a massive design space for deploying general agents natively inside the web apps we already use, rather than treating the web merely as a dumb target for isolated bots.

Currently, most AI agents operate from external clients or server-side programs, effectively leaving web development out of the AI ecosystem. I'm experimenting with an "inside-out" paradigm instead. By dropping the library into a page, you get a client-side agent that interacts natively with the live DOM tree and inherits the user's active session out of the box, which works perfectly for SPAs.
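
As a rough illustration of what "interacting natively with the live DOM" could involve, here is a minimal sketch, assuming the agent first reduces the page to a numbered list of interactive elements and then applies an action the model picks by number. Every name here (`NodeLike`, `indexInteractive`, `describe`, `execute`, `AgentAction`) is hypothetical and not taken from PageAgent's actual API.

```typescript
// Hypothetical sketch: none of these names come from PageAgent itself.

// Minimal structural type that real HTMLElements also satisfy,
// so the logic can be exercised outside a browser.
interface NodeLike {
  tagName: string;
  textContent: string | null;
  getAttribute(name: string): string | null;
  click(): void;
}

interface IndexedElement {
  index: number;
  label: string;
  node: NodeLike;
}

// The only action in this sketch; a real agent would need more (type, scroll, ...).
type AgentAction = { type: "click"; index: number };

const INTERACTIVE_TAGS = ["A", "BUTTON", "INPUT", "SELECT", "TEXTAREA"];

// Keep only elements a user could act on and number them in document order.
function indexInteractive(nodes: NodeLike[]): IndexedElement[] {
  return nodes
    .filter(
      (n) =>
        INTERACTIVE_TAGS.indexOf(n.tagName.toUpperCase()) !== -1 ||
        n.getAttribute("role") === "button"
    )
    .map((node, index) => ({
      index,
      label: (node.getAttribute("aria-label") ?? node.textContent ?? "").trim(),
      node,
    }));
}

// Compact text view of the page that could be placed in a model prompt.
function describe(elements: IndexedElement[]): string {
  return elements
    .map((e) => `[${e.index}] <${e.node.tagName.toLowerCase()}> ${e.label}`)
    .join("\n");
}

// Apply the action the model chose; returns false for an unknown index.
function execute(action: AgentAction, elements: IndexedElement[]): boolean {
  const target = elements[action.index];
  if (!target) return false;
  target.node.click();
  return true;
}
```

In a browser, the input could be something like `Array.from(document.querySelectorAll<HTMLElement>("*"))`. Because the code runs in the page itself, any click it dispatches happens inside the user's existing session.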

To handle cross-page tasks, I built an optional browser extension that acts as a "bridge". This allows the web-page agent to control the entire browser with explicit user authorization. Instead of a desktop app controlling your browser, your web app is empowered to act as a general agent that can navigate the broader web.
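
To make the "explicit user authorization" idea concrete, here is a minimal sketch of a gate the bridge could apply before forwarding a page's request to the browser. It assumes authorization is tracked per origin and that only http(s) URLs are allowed; `BridgeGate`, `BridgeCommand`, and `BridgeResult` are hypothetical names, and this is not a description of how the actual extension works.

```typescript
// Hypothetical sketch of an authorization gate; not PageAgent's actual protocol.

type BridgeCommand =
  | { kind: "navigate"; url: string }
  | { kind: "openTab"; url: string };

type BridgeResult =
  | { ok: true; command: BridgeCommand }
  | { ok: false; reason: string };

class BridgeGate {
  // Origins the user has explicitly approved.
  private granted = new Set<string>();

  grant(origin: string): void {
    this.granted.add(origin);
  }

  revoke(origin: string): void {
    this.granted.delete(origin);
  }

  // Decide whether a command from `origin` may be forwarded to the browser.
  handle(origin: string, command: BridgeCommand): BridgeResult {
    if (!this.granted.has(origin)) {
      return { ok: false, reason: "origin not authorized by user" };
    }
    // Assumption: only plain web URLs are allowed, which rejects javascript:, file:, etc.
    if (!/^https?:\/\//i.test(command.url)) {
      return { ok: false, reason: "unsupported URL scheme" };
    }
    return { ok: true, command };
  }
}
```

The design choice in this sketch is default-deny: a page that the user has not approved gets a refusal with a reason, and approval can be withdrawn at any time with `revoke`.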

I'd love to start a conversation about the viability of this architecture, and what you all think about the future of in-app general agents. Happy to answer any questions!


Comments

simon_luv_pho today at 5:07 PM

This is highly experimental right now, but here are some quick links for anyone wanting to dig deeper:

- GitHub: https://github.com/alibaba/page-agent

- Live Demo (No sign-up): https://alibaba.github.io/page-agent/ (you can drag the bookmarklet from here to try it on other sites)

- Browser Extension: https://chromewebstore.google.com/detail/page-agent-ext/akld...

I'd be really interested in feedback on the security model of giving client-side agents extension-bridge access, and I'm happy to take questions on the implementation!

jadbox today at 10:21 PM

Firefox support?

mentalgear today at 6:59 PM

> Data processed via servers in Mainland China

Appreciate the transparency, but maybe you could add some alternatives, preferably European ones?

general_reveal today at 7:18 PM

I’ve been thinking about something like this. If it’s just a one-line script import, how the heck are you trusting natural language to translate to commands for an arbitrary UI?

The only thing I can think of is you had the AI rewrite and embed selectors on the entire build file and work with that?

dzink today at 6:53 PM

Is this affiliated with the Chinese company Alibaba? Any chance data goes there too?

pscanf today at 5:59 PM

Very cool!

I'm particularly impressed by the bookmark "trick" to install it on a page. Despite having spent 15 years developing for the browser, I had somehow missed that feature of the bookmarks bar. But awesome UX for people to try out the tool. Congrats!

Mnexium today at 7:21 PM

Curious - how does it perform with captchas and other "are you human" stuff on the web?

coreylane today at 6:52 PM

Looks cool! Are you open to adding AWS Bedrock or LiteLLM support?

MeteorMarc today at 6:41 PM

Confusing name because of the existence of Pageant, the PuTTY agent.

popalchemist today at 7:32 PM

Does it support long-click / click-and-drag?

jauntywundrkind today at 5:43 PM

Not exactly the same, but I'd also point to Paul Kinlan's FolioLM as a very interesting project in this space. It's a very nice browser extension:

> Collect and query content from tabs, bookmarks, and history - your AI research companion. FolioLM helps you collect sources from tabs, bookmarks, and history, then query and transform that content using AI.

https://github.com/PaulKinlan/NotebookLM-Chrome https://chromewebstore.google.com/detail/foliolm/eeejhgacmlh...
