Looking at the source right now. Noticed comments in the code along the lines of "selenium is really slow traversing the dom", etc. Also noticed the script uses the non-headless Firefox WebDriver. Wouldn't it have been much faster to use GhostDriver or some similar headless solution?
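For what it's worth, the kind of headless setup I have in mind would look roughly like this with the Python Selenium bindings. This is just a sketch to show the idea, not the paper's script; the URL is a placeholder, and I'm using headless Firefox as a stand-in for GhostDriver/PhantomJS:

    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")          # run Firefox without a visible window
    driver = webdriver.Firefox(options=options)
    driver.get("https://example.org")          # placeholder URL
    print(driver.title)
    driver.quit()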
Hi, one of the authors here. First, though I worked on the paper, I am not employed by CGD, so these comments are my own.
The main reason for using the non-headless Firefox WebDriver was that we wanted the script to access the site just like a human user. This made it easy to explain to non-technical people exactly how we had gotten the data. We didn't want to do anything that could be seen as circumventing the interface that the World Bank had created for that purpose.
Up to a point, performance was not a concern. In fact, as slow as Selenium is, we still artificially limited the speed of the script by waiting three seconds between each set of queries. However, selecting options could take Selenium tens of seconds, so that part was done with JavaScript instead.
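Roughly, the pattern looked like the sketch below. To be clear, this is an illustration of the approach rather than the actual script; the element ID, option values, and URL are made up:

    import time
    from selenium import webdriver

    driver = webdriver.Firefox()               # non-headless, as described above
    driver.get("https://example.org/query")    # placeholder URL

    def select_option(element_id, value):
        # Setting the value and firing a change event directly in the page is
        # much faster than driving the <select> through Selenium's own API.
        driver.execute_script(
            "var el = document.getElementById(arguments[0]);"
            "el.value = arguments[1];"
            "el.dispatchEvent(new Event('change'));",
            element_id, value)

    for country in ["ALB", "DZA", "AGO"]:      # hypothetical option values
        select_option("countrySelect", country)
        # ... submit the form and read the results here ...
        time.sleep(3)                          # artificial delay between query sets

    driver.quit()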
Unless you need to evaluate JavaScript or take screenshots of the rendered page, is there any point at all in using a WebDriver like that instead of building a plain old scraper?
I agree: using Selenium seems like an unnecessary waste of time and CPU resources when replicating the GET/POST requests and parsing the HTML responses with a simple Perl or Python script would have sufficed.
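Something along these lines would have covered it. The endpoint and form fields below are hypothetical, and I'm sketching it in Python with requests and BeautifulSoup rather than Perl:

    import requests
    from bs4 import BeautifulSoup

    session = requests.Session()
    resp = session.post(
        "https://example.org/query",           # hypothetical endpoint
        data={"country": "ALB", "year": "2012"},
        timeout=30,
    )
    resp.raise_for_status()

    # Parse the returned HTML table without ever rendering the page.
    soup = BeautifulSoup(resp.text, "html.parser")
    for row in soup.select("table.results tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:
            print(cells)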