If this can't run full-local, isn't that basically a botnet? You're talking about installing a kernel-level driver that receives instructions on what to do from a cloud service.
Great point! Yes, you're correct that the actual "agent" lives in the cloud and its actions are executed by a proxy running on the desktop. Hopefully at some point we can set up a straightforward installation procedure to run the AI models entirely on the desktop, but that's constrained by desktop specs for now. VMs and desktops with the specs to handle that would be prohibitively expensive for a lot of teams trying to build these automations.
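To make the cloud/desktop split concrete, here is a rough sketch of what the desktop-side proxy step could look like. It is an illustration only, not the actual implementation: the JSON action format, `ALLOWED_ACTIONS`, and `execute_action` are all hypothetical names. The idea is that the proxy validates each instruction from the cloud agent against a fixed allowlist before acting on it.

```python
import json

# Hypothetical allowlist: the proxy refuses any action type outside this set.
ALLOWED_ACTIONS = {"click", "type", "screenshot"}


def execute_action(raw_message: str) -> dict:
    """Validate one JSON instruction from the cloud agent and report the result.

    This sketch only describes what it would do; a real proxy would call an
    OS-level input API at the points marked below.
    """
    try:
        action = json.loads(raw_message)
    except json.JSONDecodeError:
        return {"ok": False, "error": "invalid JSON"}
    if not isinstance(action, dict):
        return {"ok": False, "error": "instruction must be a JSON object"}

    kind = action.get("type")
    if kind not in ALLOWED_ACTIONS:
        return {"ok": False, "error": f"action not allowed: {kind}"}

    if kind == "click":
        x, y = action.get("x"), action.get("y")
        if not (isinstance(x, int) and isinstance(y, int)):
            return {"ok": False, "error": "click needs integer x and y"}
        # A real proxy would move the pointer and click here.
        return {"ok": True, "performed": f"click at ({x}, {y})"}

    if kind == "type":
        text = action.get("text")
        if not isinstance(text, str):
            return {"ok": False, "error": "type needs a text string"}
        # A real proxy would send keystrokes here.
        return {"ok": True, "performed": f"type {len(text)} characters"}

    # Remaining allowed action: a real proxy would capture the screen here.
    return {"ok": True, "performed": "screenshot"}
```

Because the proxy only accepts a small, fixed set of action types, anything else the cloud side sends (for example a hypothetical `"shell"` action) is rejected rather than executed.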
Unfortunately, there isn't a viable computer use model that can be run locally yet. I'm extremely excited for the day that happens, though. Essentially, the key capability that makes a model a computer use model is precise coordinate generation.
So if you come across a local model that can do that well, let us know! We're also keeping a close watch.
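If you want a quick way to judge whether a local model generates coordinates precisely enough, one simple check is to count how often its predicted click point lands inside the target element's bounding box. The sketch below assumes you already have predicted `(x, y)` points and ground-truth boxes in the same pixel space; `Box` and `click_accuracy` are made-up names for illustration, not part of any existing benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned target region in screen pixels (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def click_accuracy(predictions: list[tuple[int, int]], targets: list[Box]) -> float:
    """Fraction of predicted click points that fall inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


# Two hypothetical buttons; the first click hits its button, the second misses.
buttons = [Box(0, 0, 100, 40), Box(200, 200, 260, 230)]
clicks = [(50, 20), (100, 100)]
print(click_accuracy(clicks, buttons))  # 0.5
```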
You're correct that ByteDance did release UI-TARS, which sounds like a really good open source computer use model according to some articles I read. You could run that locally. We haven't tested it, so I don't know how it performs, but it sounds like it's definitely worth exploring!
I don't know much about training your own computer use model, other than that it would probably be a very hefty, very expensive task.
However, I believe ByteDance released UI-TARS, which is an excellent open source computer use model according to some articles I read. You could run that locally. We haven't tested it, so I don't know how it performs, but it sounds like it's definitely worth exploring!