> Could probably write a book about the lessons learned getting a "vision first"...

hugs · on Aug 8, 2024

Agreed on the need for a demo. #1 on the TODO list! If I know at least one person will read it, I might even do a blog, too! :)

The rise of multi-modal LLMs is making "vision first" plausible. However, my basic test is asking these models to find the X,Y screen coordinates of the number "1" on a screenshot of a calculator app. ChatGPT-4o still can't do it. Same with LLaVA 1.5 last I tried. But I'm sure it'll get there someday soon.

Yeah, SikuliX was dependent on old school "classic" OpenCV methods. No machine learning involved. To some extent those methods still work in highly constrained domains like UI automation... But I'm looking forward to sprinkling in some AI magic when it's ready.
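For anyone curious what the "classic" approach looks like, it's roughly template matching: slide a small reference image over the screenshot and keep the position with the best pixel-level score. Here's a minimal pure-Python sketch of the idea, not SikuliX's actual code. The function name `find_template` and the toy grayscale grids are made up for illustration, and a real tool would use an optimized library routine instead of nested loops.

```python
def find_template(screen, template):
    """Return ((x, y), score) for the best match of template inside screen.

    Both arguments are 2D lists of grayscale values. The score is the sum of
    squared differences between template and screen patch (lower is better).
    Hypothetical helper for illustration only.
    """
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best_score, best_xy = None, None
    # Try every top-left position where the template fits entirely on screen.
    for y in range(sh - th + 1):
        for x in range(sw - tw + 1):
            score = sum(
                (screen[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(th)
                for dx in range(tw)
            )
            if best_score is None or score < best_score:
                best_score, best_xy = score, (x, y)
    return best_xy, best_score


# Toy 5x4 "screenshot" with a bright 2x2 "button" whose top-left is x=3, y=1.
screen = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 9, 8],
    [0, 0, 0, 7, 9],
    [0, 0, 0, 0, 0],
]
button = [[9, 8], [7, 9]]
location, score = find_template(screen, button)
```

This works when the button renders pixel-for-pixel the same every time, which is why it holds up in constrained UI automation and falls over once scaling, theming, or anti-aliasing changes.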

localfirst · on Aug 8, 2024

You already have a fan! Feel free to contact me if you need more traffic; I'll be sure to spread the word.