Those are good ones. I've fiddled with similar systems before; do you have a rough success rate? I know they can be finicky, especially as you execute through a chain-of-thought action plan, or however you're doing it.
Anything that improves chain-of-thought reasoning improves planning. Right now the longer tasks Art mentioned, like logging in, have been succeeding around 80% of the time, while simpler ones have been higher. Our main issue at the moment is figuring out how to keep the server up :/ we're getting a little more traffic than expected. To bump those success rates up (which we need to), though, we really, really need to fine-tune additional models, which we're planning out right now.
I have a few ideas around that, mostly going down the RL route (with a twist) mixed with some knowledge graph work. We'll give an update when we push that!
We have an API server where we execute all the agent reasoning/planning jobs, and then we stream the browser commands to the client. We mention this in the "How it works" section on the website. This is the main reason we have the 5-bots-a-day limit. It's cheap for us to run for now, but if anyone would like us to ship a version where you use your own API keys locally (plug-and-play), let us know!
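For anyone curious what that split looks like, here's a minimal sketch under loose assumptions, not our actual code: the command shape (`BrowserCommand`), the `plan()` stub, and the `DailyQuota` helper are all hypothetical names for illustration. The server does the planning and enforces the daily limit, and the client only receives a stream of browser commands to execute.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterator, List, Tuple


@dataclass
class BrowserCommand:
    """Hypothetical command format sent to the client."""
    action: str      # assumed vocabulary: "navigate", "type", "click"
    target: str      # URL or CSS selector (assumed)
    value: str = ""


@dataclass
class DailyQuota:
    """Hypothetical per-user cap on bot runs per calendar day."""
    limit: int = 5
    _runs: Dict[Tuple[str, date], int] = field(default_factory=dict)

    def try_consume(self, user_id: str, today: date) -> bool:
        key = (user_id, today)
        used = self._runs.get(key, 0)
        if used >= self.limit:
            return False
        self._runs[key] = used + 1
        return True


def plan(task: str) -> List[BrowserCommand]:
    # Stand-in for the server-side reasoning/planning step (model calls
    # would happen here). Hard-coded purely for illustration.
    return [
        BrowserCommand("navigate", "https://example.com/login"),
        BrowserCommand("type", "#username", "alice"),
        BrowserCommand("click", "button[type=submit]"),
    ]


def run_bot(user_id: str, task: str, quota: DailyQuota,
            today: date) -> Iterator[BrowserCommand]:
    # Check the quota eagerly, before any planning work is done.
    if not quota.try_consume(user_id, today):
        raise PermissionError("daily bot limit reached")
    # Hand commands out one at a time; a real server would push each one
    # to the client over a socket or server-sent events.
    return iter(plan(task))
```

Keeping planning server-side means the client never needs model API keys. A "bring your own keys" local version would, roughly speaking, move the `plan()` step onto the user's machine.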
Happy to hear everyone's thoughts if you try the app out! Even if you just have ideas about how agents might look in their final form, there are so many avenues this tech can take, and we have a ton of wild ideas we'll be building, so stay tuned. :D