I don't know much about training your own computer-use model, other than that it would probably be a very hefty and very expensive undertaking.
However, I believe ByteDance released UI-TARS, which, according to some articles I've read, is an excellent open-source computer-use model. You could run it locally. We haven't tested it ourselves, so I can't say how well it performs, but it sounds like it's definitely worth exploring!