
Ask HN: Deep learning enabled GUI automation tools? - swayson
I was wondering, has anybody found really good tools, potentially cross-platform, for GUI automation that leverage image detection from computer vision models, say convolutional neural networks?

How about open-source alternatives?
======
RMPR
Not really a tool, but Python is widely used for deep learning, so you can
combine PyTorch, TensorFlow, [insert your DL framework here] with PyAutoGUI [1]
to achieve exactly what you're asking. If you feel PyAutoGUI is too
"manual", I built a kind of frontend for it [2].

[1]:
[https://github.com/asweigart/pyautogui](https://github.com/asweigart/pyautogui)

[2]: [https://github.com/rmpr/atbswp](https://github.com/rmpr/atbswp)

~~~
swayson
I have been looking into pyautogui and wondering how I can hook up a custom
backend for the bounding box detection, which appears not to be supported.

I guess wrapping pyautogui might be the way to go. Is my understanding correct?

atbswp looks very valuable, thanks for sharing.

~~~
RMPR
> wondering how I can hook up a custom backend for the bounding box
> detection, which appears not to be supported.

You can take a screenshot with:

    pyautogui.screenshot()

With your neural network you can get the coordinates of what you want from
that screenshot, and then act on them with pyautogui. In many cases a neural
network can even be overkill. Take a look at this:
[https://vimeo.com/352072921](https://vimeo.com/352072921) The script takes a
screenshot of the webpage, recognizes the currently highlighted word with
pytesseract, and types it in with pyautogui. Simple.
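
A rough sketch of the screenshot, detect, act loop, in case it helps. The names `box_center`, `click_detected`, `my_detector` and `run_with_pyautogui` are made up for illustration and are not part of pyautogui, and the `(left, top, width, height)` box format is just an assumption about what your model returns. The only pyautogui calls used are `pyautogui.screenshot()` and `pyautogui.click(x, y)`.

```python
from typing import Callable, Optional, Tuple

# Assumed box format: (left, top, width, height) in screenshot pixels.
Box = Tuple[int, int, int, int]


def box_center(box: Box) -> Tuple[int, int]:
    """Return the pixel at the centre of a bounding box."""
    left, top, width, height = box
    return left + width // 2, top + height // 2


def click_detected(
    detect: Callable[[object], Optional[Box]],
    screenshot: Callable[[], object],
    click: Callable[[int, int], None],
) -> bool:
    """Take a screenshot, run the detector on it, and click the centre of its box.

    Returns False (and clicks nothing) when the detector finds no target.
    """
    image = screenshot()
    box = detect(image)
    if box is None:
        return False
    x, y = box_center(box)
    click(x, y)
    return True


def my_detector(image) -> Optional[Box]:
    """Hypothetical placeholder: run your own model here, return a box or None."""
    return None


def run_with_pyautogui() -> bool:
    """Wire the loop to the real screen; needs pyautogui and a display."""
    import pyautogui  # imported lazily so the helpers above work without it

    return click_detected(my_detector, pyautogui.screenshot, pyautogui.click)
```

Passing the screenshot and click functions in as arguments is only there so you can test the detector wiring with fakes before letting it move the real mouse. One thing to watch for: on some high-DPI displays the screenshot resolution may not match the coordinate space that `pyautogui.click` uses, so you might need to scale the box first.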

~~~
swayson
This is great, thanks!

