Hey HN! I'm excited to show off Cerebellum, a new open-source tool designed to automate browser tasks using large language models (LLMs).
Cerebellum is an LLM-based AI agent that operates websites with the mouse and keyboard, much like you would, so it works on any site whether or not it has an API. The only requirement is a Selenium-supported browser such as Chrome, Firefox, or Safari. It’ll even work on Electron apps with a bit of setup!
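To make the idea concrete, here is a rough sketch of the observe/decide/act loop that an agent of this kind runs. This is not Cerebellum's actual API: `Action`, `Browser`, `run_agent`, `FakeBrowser`, and `scripted_planner` are all hypothetical names invented for illustration. In a real setup the planner would send the screenshot to an LLM, and the browser would presumably wrap a Selenium WebDriver (for example `get_screenshot_as_png()` and `ActionChains`); here both are stubbed so the snippet runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


@dataclass
class Action:
    """One step chosen by the planner (hypothetical shape, for illustration)."""
    kind: str      # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class Browser(Protocol):
    """Minimal surface the loop needs; a real one could wrap Selenium."""
    def screenshot(self) -> bytes: ...
    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


Planner = Callable[[str, bytes, List[Action]], Action]


def run_agent(browser: Browser, planner: Planner, goal: str,
              max_steps: int = 10) -> List[Action]:
    """Observe the page, ask the planner for one action, perform it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = planner(goal, browser.screenshot(), history)
        history.append(action)
        if action.kind == "done":
            break
        if action.kind == "click":
            browser.click(action.x, action.y)
        elif action.kind == "type":
            browser.type_text(action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    return history


@dataclass
class FakeBrowser:
    """Stand-in browser that records what it was asked to do."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> bytes:
        return b"fake-png"

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


def scripted_planner(goal: str, screenshot: bytes,
                     history: List[Action]) -> Action:
    """Stand-in for the LLM: replays a fixed script instead of reasoning."""
    script = [Action("click", 100, 200), Action("type", text="hello"),
              Action("done")]
    return script[len(history)]


if __name__ == "__main__":
    browser = FakeBrowser()
    run_agent(browser, scripted_planner, goal="search for hello")
    print(browser.log)
```

Because the loop only ever sees pixels and emits mouse and keyboard actions, nothing in it depends on the target site exposing an API.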
Cerebellum is MIT-licensed, and we’d love both your feedback and contributions. The GitHub repo is here: https://github.com/theredsix/cerebellum
Currently, it uses Claude 3.5 Sonnet’s newly released computer use ability, but the ultimate vision is to crowdsource a high-quality set of browser sessions to train an open-source local model. There is still a lot of work to do (a Python SDK, support for more LLMs, ...), so help is very welcome.
I'm eager to hear what you think!
— Han Wang