My friend and I were building UI-based automation tools for small businesses and kept running into a few recurring problems:
- Computer-use agents (CUAs) are powerful, but often unreliable and slow when controlling UIs directly.
- Workflows vary: some can be handled deterministically by an RPA, while others need a bit more intelligence from a CUA, even though both often share many of the same "actions".
We wanted an easier way to record UI-based automation scripts that could:
- Be run by humans, like a lightweight RPA, or
- Be invoked by CUAs as tool calls — improving their reliability and speed.
So we open-sourced the SDK we’ve been using internally. It currently works on macOS, and lets you:
- Record desktop interactions with any app that exposes accessibility info.
- Record browser interactions via the Chrome extension (link in the GitHub repo).
- Replay recordings deterministically like an RPA, or integrate the generated UI script as a callable tool for your CUA.
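To make the two modes concrete, here is a minimal Python sketch of the idea. It is not the SDK's actual API: `Step`, `replay`, `as_tool`, the handler table, and the target strings are all hypothetical names made up for illustration, and the handlers only log what they would do instead of touching a real UI.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Step:
    """One recorded UI action (hypothetical format, not the SDK's)."""
    action: str      # e.g. "click" or "type"
    target: str      # made-up identifier for a UI element
    value: str = ""  # text to enter, if any


Handler = Callable[[Step], None]


def replay(steps: List[Step], handlers: Dict[str, Handler]) -> int:
    """Run steps in recorded order; fail fast on an unknown action.

    Returns the number of steps executed.
    """
    for i, step in enumerate(steps):
        handler = handlers.get(step.action)
        if handler is None:
            raise ValueError(f"step {i}: no handler for action {step.action!r}")
        handler(step)
    return len(steps)


def as_tool(name: str, description: str, steps: List[Step],
            handlers: Dict[str, Handler]) -> dict:
    """Wrap a recording as a named tool an agent could call in one shot."""
    return {
        "name": name,
        "description": description,
        "run": lambda: replay(steps, handlers),
    }


# Stand-in handlers: they append to a log rather than drive a real UI.
log: List[str] = []
handlers: Dict[str, Handler] = {
    "click": lambda s: log.append(f"click {s.target}"),
    "type": lambda s: log.append(f"type {s.value!r} into {s.target}"),
}

recording = [
    Step("click", "button:New Invoice"),
    Step("type", "field:Customer", "Acme Co"),
    Step("click", "button:Save"),
]

tool = as_tool("create_invoice", "Create an invoice for a customer.",
               recording, handlers)
```

A human-driven run would call `replay(recording, handlers)` directly, while an agent would be handed `tool` and call `tool["run"]()` as a single step instead of clicking through the UI itself.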
We’d love feedback from anyone working on UI automation, RPAs, or CUAs — especially around reliability and edge-case handling.