Hacker News

oleg2025, today at 7:28 AM

A couple of months ago I was inspired by kubectl and built desktopctl, a CLI to control GUI apps. It uses a combination of OCR and the Accessibility API on Mac, represents the UI as markdown, and exposes actions for mouse and keyboard.

My core idea was that the "fast" perception loop is fully local and GPU-optimised for UI tokenisation and change detection. The "slow" control loop requires an LLM roundtrip and uses a token-efficient markdown interface in the CLI output.
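To illustrate the fast/slow split: a rough sketch of tile-based change detection, where the local loop hashes small regions of each frame and only the changed regions would need re-tokenising or an LLM roundtrip. This is not desktopctl's actual code. The function names, the tile size, and the use of plain Python lists on the CPU instead of GPU buffers are all assumptions made for illustration.

```python
import hashlib

def tile_hashes(frame, tile=2):
    """Hash each tile x tile block of a 2D grid of pixel values (hypothetical helper)."""
    hashes = {}
    for y in range(0, len(frame), tile):
        for x in range(0, len(frame[0]), tile):
            block = tuple(tuple(row[x:x + tile]) for row in frame[y:y + tile])
            hashes[(y, x)] = hashlib.sha1(repr(block).encode()).hexdigest()
    return hashes

def changed_tiles(prev, curr, tile=2):
    """Return the sorted (y, x) origins of tiles that differ between two frames."""
    before, after = tile_hashes(prev, tile), tile_hashes(curr, tile)
    return sorted(key for key in after if before.get(key) != after[key])

# Two 4x4 "screenshots" that differ by a single pixel.
frame_a = [[0, 0, 0, 0] for _ in range(4)]
frame_b = [row[:] for row in frame_a]
frame_b[3][2] = 255  # one pixel changes in the bottom-right tile

dirty = changed_tiles(frame_a, frame_b)
print(dirty)
```

If `dirty` is empty, the slow loop can be skipped entirely; otherwise only the listed regions need a fresh look.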

It uses relatively stable identifiers for controls, so agents can script common actions, e.g. `desktopctl pointer click --id btn_save` doesn't require the UI tokenisation loop.
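A minimal sketch of how an agent might wrap that one command from Python. Only the `desktopctl pointer click --id ...` invocation comes from the comment above; the `click_command` and `click` helpers and the `dry_run` flag are hypothetical names added for illustration, and the dry-run default means nothing is executed unless desktopctl is installed and you opt in.

```python
import shlex
import subprocess

def click_command(control_id):
    """Build the argv for the click-by-id command quoted in the comment."""
    return ["desktopctl", "pointer", "click", "--id", control_id]

def click(control_id, dry_run=True):
    """Return the shell-quoted command; run it only when dry_run is False."""
    argv = click_command(control_id)
    if not dry_run:
        # Assumes desktopctl is on PATH; raises CalledProcessError on failure.
        subprocess.run(argv, check=True)
    return shlex.join(argv)

print(click("btn_save"))
```

Because the identifier is relatively stable, a script like this can be replayed without going through perception again.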

https://github.com/yaroshevych/desktopctl/tree/main


Replies

oleg2025, today at 7:43 AM

I've learned that, compared to APIs, human interfaces are slow and messy, but there is actually a lot of science behind them. Good apps expose information well and are optimised for clicks, typing, etc.

The best GUIs make great use of muscle memory, which makes them perfect candidates for scripting via CLI. For example, a simple sequence like "open Notes app, hit Cmd+F, enter search term, read list of results" can be one Bash command invoked by an AI agent.