- Advisor wanted a one-button way to run a convoluted research prototype I had made, and I didn't want to have to dig into Cocoa to figure out how to programmatically click/select options in a few desktop apps.
- Worked at a company where there was a silly employee training slideshow+quiz, so I had Sikuli wait for the next arrow to show up once the audio was finished and click it.
- Wanted to heat up my GPU to warm a brownie, so I opened one of those WebGL water demos and had Sikuli repeatedly pick up a ball and drop it in the water.
I like brownies better when they're warm, so I came up with a few ways to warm the pre-packaged treats while still sitting in my office. I first tried sitting the brownie atop my laptop charger: http://images.reclipse.net/warmed_brownie.jpg
This worked fairly well, but the charger was less hot when my laptop was at 100% charge, so I sought alternate methods for heating up the brownie. Around that time I had stumbled across some recent WebGL demos, and noticed that the fan on my laptop would spin up on one demo in particular (http://madebyevan.com/webgl-water/).
This provided a more consistent heat than the charger, since usually my laptops would be completely charged by the end of lunchtime. I had my own laptop and a company-provided laptop, so it wasn't hard to set one of them aside as a brownie warmer.
Brownie-wise, I would wait a while because lunch would fill me up. But at 3 or 4 pm, I'd be hungry again and something the size of a brownie would really hit the spot.
I'm not sure why this is like nails on a chalkboard for me. Maybe it's the programmer equivalent of dog-earing book pages.
I don't dog-ear book pages, if that makes you feel any better.
(also I dog ear pages so we're even)
I remember running into issues as soon as anything changed regarding resolution, scaling, graphic settings, color scheme etc.
Pretty fun for making toy programs to automate stuff, though.
Super useful for automating things that are easy to handle by looking for patterns/things on screen and hard to handle with APIs (or the lack thereof).
You do need to be careful about timing and about capturing the right images, so tuning things to work under all conditions is a bit of an art. Being able to recover from a failure so you can continue testing is another bit of art.
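To make the timing and recovery points concrete, here is a minimal sketch in plain Python of the two patterns I mean: polling with a timeout instead of a fixed sleep, and retrying a step after a recovery action. The names `wait_until`, `run_step`, and `StepFailed` are made up for illustration and are not part of Sikuli (Sikuli's own `wait` and `exists` already take a timeout); the callables you pass in would wrap whatever screen checks and clicks your script does.

```python
import time


class StepFailed(Exception):
    """Raised when a step still fails after every retry (hypothetical name)."""


def wait_until(condition, timeout=10.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it returns something truthy or the timeout expires.

    Returns the truthy value, or None on timeout. The clock and sleep
    parameters exist only so the helper can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        result = condition()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)


def run_step(action, recover, attempts=3):
    """Run action(); if it raises, call recover() and try again.

    recover() is where you would put "close the stray dialog" or
    "go back to the start screen" so that later tests can still run.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as error:  # deliberately broad: any failure triggers recovery
            last_error = error
            recover()
    raise StepFailed(str(last_error))
```

Polling with a deadline tends to be both faster and less flaky than a fixed sleep, and keeping the recovery action separate from the step makes it easier to reuse one "reset the UI" routine across many tests.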
As to why this is news: there seems to be a new release, 1.1.0, out (or coming soon?). Sikuli development seemed to have almost died a few years back, but it has made a comeback over the last ~2.5 years, which is nice.
We've found that the "Got kicked off for every build" continuous integration process you mention is the crucial part to achieving success with this type of test automation -- if you're going to invest the effort in writing reliable tests, you want to be getting value out of them by running them as often as possible and as early as possible.
Sikuli is good at image matching. For me, Sikuli broke when I started to take images of text. The font would render differently in GNOME and in the VM (vncserver/twm) that Jenkins ran the tests on. I ended up creating Docker images of the test environment so that the environment would be the same on Jenkins and on the testers' machines.
Debian has Sikuli packages: libsikuli-script-java and sikuli-ide. I've also written a Dockerfile for Sikuli on Debian wheezy.
I did find some use for it. My girlfriend got addicted to some online flash Mahjong game and no matter how hard I tried, I could not post a better score than her. With a bit of Sikuli scripting, I was posting top scores in no time!
Not quite. Sikuli tries to figure out where things are by doing a visual match. This works very well for things like automating applications or sites where page elements are fixed (e.g. finding an option in a menu or using a search engine). But it works terribly when trying to overlay semantic structure on dynamically-generated data. For example, it has no way of knowing that a list of movies is split up on multiple pages, with each movie having multiple genres, a cast, and multiple reviews, each of which has a rating and an author.
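As a rough illustration of what "visual match" means here, this is a toy template matcher in plain Python. It is not Sikuli's implementation (its real matcher is more sophisticated and, as far as I know, builds on OpenCV); `find_template` and the `max_diff` tolerance are names I made up for this sketch, and images are just lists of rows of grayscale values.

```python
def find_template(screen, template, max_diff=0):
    """Return (row, col) of the best match of template inside screen, or None.

    The score is the sum of absolute pixel differences; a match is accepted
    only if the best score is at most max_diff.
    """
    screen_h, screen_w = len(screen), len(screen[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    best_pos = None
    best_diff = None
    # Slide the template over every position where it fits entirely.
    for r in range(screen_h - tmpl_h + 1):
        for c in range(screen_w - tmpl_w + 1):
            diff = sum(
                abs(screen[r + i][c + j] - template[i][j])
                for i in range(tmpl_h)
                for j in range(tmpl_w)
            )
            if best_diff is None or diff < best_diff:
                best_pos, best_diff = (r, c), diff
    if best_diff is not None and best_diff <= max_diff:
        return best_pos
    return None
```

All the matcher ever gets back is a position on the screen. Nothing in it knows that one matched region is a movie title and another is a review belonging to that movie, which is the gap I'm describing.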
There's also the additional drawback that it is hard to parallelize things in Sikuli (you would need heavyweight VMs, and there are no obvious "breaks" in the flow). So doing something at scale is not feasible.
With ParseHub, one of the goals is to make it easy to express relationships (and we think we've done a really good job). We also automatically figure out how to split a job up across an entire fleet of servers.
Hope that offers some insight. Email me at firstname.lastname@example.org if you have any other questions.
Ultimately Tesseract was primarily designed to operate on text which had been printed and then scanned, whereas the text on screen is lower resolution, anti-aliased, on a coloured background, etc etc.
Some further details of our OCR investigations here: http://stb-tester.com/blog/2014/04/14/improving-ocr-accuracy...
The TL;DR version is: training Tesseract on your font doesn't help; scaling up the text 3x before passing it to Tesseract gives a massive improvement (I don't know if Sikuli does this); normalising ligatures & punctuation gives an additional slight improvement.
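For the normalisation step, a minimal sketch of the idea in Python follows. This is an assumption about what such normalisation can look like, not the exact rules from the blog post: the `PUNCTUATION` table, `normalise`, and `ocr_matches` are illustrative names, and the set of characters folded here is deliberately small. The 3x scaling step needs an image library and is not shown.

```python
import unicodedata

# Typographic punctuation that NFKC leaves untouched, mapped to plain forms.
PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
})


def normalise(text):
    """Fold ligatures and typographic punctuation so comparisons are less brittle."""
    # NFKC expands compatibility characters, e.g. the "fi" ligature U+FB01 -> "fi".
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(PUNCTUATION)
    # Collapse runs of whitespace, which OCR output often gets wrong.
    return " ".join(text.split())


def ocr_matches(expected, ocr_output):
    """Compare expected text with OCR output after normalising both sides."""
    return normalise(expected) == normalise(ocr_output)
```

Normalising both the expected string and the OCR result means a test doesn't fail just because the engine returned a ligature glyph or a curly quote where the test author typed the plain character.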