Sikuli Script (sikuli.org)
148 points by llamataboot on Sept 30, 2014 | 31 comments

I used Sikuli back in 2011-12 for some random automation tasks, and often wish I would remember to use it more.

- Advisor wanted a one button way to run a convoluted research prototype I had made, and I didn't want to have to dig into Cocoa to figure out how to programmatically click/select options in a few desktop apps.

- Worked at a company and there was a silly employee training slideshow+quiz, so I had Sikuli wait for the next arrow to show up once the audio was finished and click it.

- Wanted to heat up my GPU to warm a brownie, so I opened one of those WebGL water demos and had Sikuli repeatedly pick up a ball and drop it in the water.

> - Wanted to heat up my GPU to warm a brownie

Please elaborate.

Sure thing. I was an intern at Adobe working on a new programming language. The office has a cafeteria, and I would often buy lunch there and get a brownie to save for later.

I like brownies better when they're warm, so I came up with a few ways to warm the pre-packaged treats while still sitting in my office. I first tried sitting the brownie atop my laptop charger: http://images.reclipse.net/warmed_brownie.jpg

This worked fairly well, but the charger was less hot when my laptop was at 100% charge, so I sought alternate methods for heating up the brownie. For some reason, I had stumbled across some recent WebGL demos at the time, and noticed that the fan on my laptop would spin up on that particular demo. (http://madebyevan.com/webgl-water/)

This provided a more consistent heat than the charger, since usually my laptops would be completely charged by the end of lunchtime. I had my own laptop and a company-provided laptop, so it wasn't hard to set one of them aside as a brownie warmer.

Wow. I thought the "to warm a brownie" thing was a joke about WebGL performance. That's a hilarious story.

Nah, it's real. Adobe had a WebGL competitor, Stage3D, at the time, but I didn't work on that at all. As an intern, I would run a few useful benchmarks per day, but it was more reliable to run things I knew would heat up a laptop instantly, so I used WebGL demos.

Brownie-wise, I would wait a while because lunch would fill me up. But at 3 or 4 pm, I'd be hungry again and something the size of a brownie would really hit the spot.

Low hanging fruits.

> - Wanted to heat up my GPU to warm a brownie, so I opened one of those WebGL water demos and had Sikuli repeatedly pick up a ball and drop it in the water.

I'm not sure why this is like nails on a chalkboard for me. Maybe it's the programmer equivalent of dog-earing book pages.

It was just for fun. Part of my job at the time was evaluating benchmarks, so things like the water simulation were on my mind. I don't remember if there were microwaves on each floor that I might have used instead.

I don't dog-ear book pages, if that makes you feel any better.

Reminds me of the Wired bitcoin miner[0]. They bought an early ASIC miner and used it to heat coffee on a livestream.

[0] http://www.wired.com/2013/05/butterfly_live/

(also I dog ear pages so we're even)

I'm gonna post this XKCD, because it's eerily appropriate (once WebGL manages to fix any performance bugs): http://xkcd.com/1172/

I haven't worked anywhere with tasty nearby brownies since then. =-/ I know I ate something unhealthy often at the Stata Center's cafe, but I think it was a large cookie that didn't need to be heated.

:,( You can buy brownies before or after, no?

I used to use AHK a lot for interacting with a game. Sikuli seemed neat when I heard about it, but it turns out that it only supports pretty much a single method of capture and only a single method of relaying mouse events. This made it completely unusable for that purpose. AHK supports pretty much every method available to Windows. The simplicity of use was attractive, but it needs a ton of work under the hood before it'd be viable as a general purpose automation tool.

I also experimented with it a few years ago, trying to build an automated GUI testing framework, and it turns out that it's way too fragile and non-portable to be usable for that use case.

I remember running into issues as soon as anything changed regarding resolution, scaling, graphic settings, color scheme etc.

Pretty fun for making toy programs to automate stuff, though.

Sikuli is freaking great, though not sure how this is news, it's been around.

Super useful for automating things that are easy to handle by looking for patterns/things on screen and hard to handle with APIs (or lack thereof)

We used Sikuli for test automation in a pretty large project with a Windows UI. Got kicked off by a TeamCity agent for every build and worked really nicely.

Thumbs up.

You do need to be careful about timing and getting the right images so tuning things to work under all conditions is a bit of an art. Also being able to recover from a failure so you can continue testing is another bit of art.
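A rough sketch of the timing and recovery points above, in plain Python rather than Sikuli's own API. The names `wait_until` and `run_steps` are made up for illustration, and `check` stands in for whatever "is this image on screen yet?" call your tool provides:

```python
import time


def wait_until(check, timeout=10.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns something truthy or the timeout expires.

    Returns the truthy result, or None if the timeout was reached.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)


def run_steps(steps):
    """Run every (name, step) pair, recording failures instead of aborting."""
    failures = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Keep going so one broken screen does not hide later results.
            failures.append((name, repr(exc)))
    return failures
```

Polling with a deadline instead of a fixed sleep is what usually takes most of the flakiness out of timing, and collecting failures lets the rest of the run continue. A real suite would probably also reset the application to a known state after a failed step.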

As to why this is news there seems to be a new release out (or soon?) 1.1.0 ... Sikuli development seems to have almost died a few years back but it's made a comeback over the last ~2.5 years which is nice.

I do a similar style of UI test automation for set-top boxes / smart TVs, with stb-tester[1].

We've found that the "Got kicked off for every build" continuous integration process you mention is the crucial part to achieving success with this type of test automation -- if you're going to invest the effort in writing reliable tests, you want to be getting value out of them by running them as often as possible and as early as possible.

[1] http://stb-tester.com/

Nice to see sikuli on the front page. I'm currently using it to test a legacy application.

Sikuli is good at image matching. For me sikuli broke when I started to take images of text. The font would render differently in gnome and the vm (vncserver/twm) jenkins ran the tests on. I ended up creating docker images of the test environment so the docker image would be the same on jenkins and the testers' machines.

Debian has a sikuli package libsikuli-script-java and sikuli-ide. I've also written a docker file for sikuli on debian wheezy [1].

[1] https://github.com/jesg/sikuli

Wrote a WoW fishing bot with Sikuli a few years back - it worked pretty well.

I was a big fan of Sikuli back in the day, but I found it a bit unreliable for automation. No matter how much I tweaked it, it seemed to be a bit unpredictable.

I did find some use for it. My girlfriend got addicted to some online flash Mahjong game and no matter how hard I tried, I could not post a better score than her. With a bit of Sikuli scripting, I was posting top scores in no time!

Reminds me a bit of Scratch[1], a tool that became pretty popular when the Raspberry Pi came out. Very simple programming interfaces that work with simple graphics, but can do a lot of things.

[1]: http://scratch.mit.edu/

Sikuli is being used to create test plans for the VistA system (documented at http://www.osehra.org ). It works with GUI stand-alone executables and with web pages from a browser.

Does anyone use this for web scraping dynamic links created by javascript, pulled from dev tools "network" tab?

Works with the browser, right? So this is the ultimate visual scraping tool, import.io and ParseHub are useless now?

One of the founders of ParseHub here.

Not quite. Sikuli tries to figure out where things are by doing a visual match. This works very well for things like automating applications or sites where page elements are fixed (e.g. finding an option in a menu or using a search engine). But it works terribly when trying to overlay semantic structure on dynamically-generated data. For example, it has no way of knowing that a list of movies is split up on multiple pages, with each movie having multiple genres, a cast, and multiple reviews, each of which has a rating and an author.

There's also the additional drawback that it is hard to parallelize things in Sikuli (you would need heavyweight vms, and there are no obvious "breaks" in the flow). So doing something at scale is not feasible.

With ParseHub, one of the goals is to make it easy to express relationships (and we think we've done a really good job). We also automatically figure out how to split a job up across an entire fleet of servers.

Hope that offers some insight. Email me at serge@parsehub.com if you have any other questions.

What are its capabilities with OCR?

It has OCR, but it wasn't working so great. It uses Tesseract. I'm not absolutely sure why it wasn't working well in the past, possibly something to do with different fonts/display rendering (e.g. ClearType and such). It "almost" worked so maybe it got better or maybe there's some tuning you can do. Didn't spend too much time on it.

OCR is never perfect. I do a lot of automated UI testing in a way similar to Sikuli, and while we do rely on OCR a lot, you have to use certain workarounds (like fuzzy matching instead of looking for a perfect match of your expected text).
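To make the fuzzy-matching workaround concrete, here is a minimal sketch using Python's standard `difflib`. The helper name `fuzzy_equal` and the 0.8 threshold are assumptions for illustration, not anything taken from Sikuli or a particular test framework:

```python
from difflib import SequenceMatcher


def fuzzy_equal(expected, actual, threshold=0.8):
    """Return True if the OCR output is close enough to the expected text."""
    # ratio() is 1.0 for identical strings and falls towards 0.0 as they differ.
    ratio = SequenceMatcher(None, expected.lower(), actual.lower()).ratio()
    return ratio >= threshold


# A typical OCR slip: "i" misread as "1".
print(fuzzy_equal("Settings", "Sett1ngs"))  # True
print(fuzzy_equal("Settings", "Network"))   # False
```

The threshold is a trade-off: too low and different menu items start matching each other, too high and ordinary OCR noise fails the test.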

Ultimately Tesseract was primarily designed to operate on text which had been printed and then scanned, whereas the text on screen is lower resolution, anti-aliased, on a coloured background, etc etc.

Some further details of our OCR investigations here: http://stb-tester.com/blog/2014/04/14/improving-ocr-accuracy...

The TLDR version is: Training Tesseract on your font doesn't help; scaling up the text 3x before passing it to tesseract gives a massive improvement (I don't know if Sikuli does this); normalising ligatures & punctuation gives an additional slight improvement.
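For the ligature and punctuation part, one possible reading in Python is to fold both the expected text and the OCR output into a plain form before comparing. This is only a sketch: the `normalise` helper and the exact replacement table are assumptions, not what the blog post necessarily does, and the 3x scaling step is left out because it needs an imaging library.

```python
import unicodedata

# Typographic punctuation mapped to plain ASCII (an illustrative, partial table).
_PUNCT = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
})


def normalise(text):
    """Expand ligatures and flatten typographic punctuation for comparison."""
    # NFKC splits compatibility ligatures such as U+FB01 into "fi".
    text = unicodedata.normalize("NFKC", text)
    return text.translate(_PUNCT)


print(normalise("\ufb01le"))    # file
print(normalise("it\u2019s"))   # it's
```

Applying the same function to both sides of the comparison means it does not matter which side contained the ligature or the curly quote.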

Great project, used with great success to automate testing of legacy app across multiple virtual machines.

How is this news? Sikuli has been around for many years.
