Node.js for web scraping usually is the obvious choice: Scraping using jQuery sy...

jqueryin · on Nov 12, 2013

Did you bother to read the post? It pitted two different DOM traversal libraries against each other:

* cheerio (https://github.com/MatthewMueller/cheerio)

* PhpQuery (https://code.google.com/p/phpquery/wiki/jQueryPortingState)

Both of these use a jQuery-esque syntax, so your comment regarding DOM traversal in PHP is a moot point.

tehwebguy · on Nov 12, 2013

Yeah, the CSS-style selectors and methods are the same. I assumed he was referring to the fact that it's all JS.

When you're scraping, it's great to be able to do a test run in the browser console and then just paste the code into your Node script without any language porting.

It's not an argument that Node is better or faster than PHP, just that some find it easier to hack a scraper together this way.
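
A rough sketch of that console-to-Node workflow, assuming the page has jQuery loaded in the browser and cheerio is installed on the Node side. The `a.title` selector and both function names are made up for illustration; they're not from the post.

```javascript
// The selector logic takes `$` as a parameter, so the same function body can
// be tried in a browser console (passing jQuery) and reused in Node (passing
// the function returned by cheerio.load).
// 'a.title' is a hypothetical selector, not taken from any real page.
function extractLinks($) {
  return $('a.title')
    .map(function (i, el) {
      return { text: $(el).text().trim(), href: $(el).attr('href') };
    })
    .get(); // unwrap the jQuery/cheerio collection into a plain array
}

// Node side: assumes `npm install cheerio` and that `html` was fetched elsewhere.
// cheerio is required lazily so the snippet loads even where it isn't installed.
function extractLinksFromHtml(html) {
  const cheerio = require('cheerio');
  return extractLinks(cheerio.load(html));
}
```

In the console you'd call `extractLinks(jQuery)`; in a script you'd call `extractLinksFromHtml(html)`. Nothing between the two needs porting.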

wldlyinaccurate · on Nov 12, 2013

> Even if Node was 5x slower than PHP I would still go for Node because of its easy jQuery syntax

That "jQuery syntax" has nothing to do with the language itself. jQuery uses Sizzle[0], which is a CSS selector library for JavaScript. There are plenty of PHP libraries which provide CSS selectors, such as the Symfony CssSelector component[1].

[0] https://github.com/jquery/sizzle

[1] https://github.com/symfony/CssSelector

deanc · on Nov 12, 2013

The argument you really should be making is that the JavaScript syntax is familiar. jQuery and its methods for traversing the DOM can trivially be implemented in any language, e.g. PHP:

http://symfony.com/doc/current/components/dom_crawler.html#n...
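
To give a feel for how little is involved, here's a minimal sketch of a chainable `find`/`text` wrapper over a plain object tree. It's in JavaScript only for convenience and uses nothing beyond arrays and recursion. The node shape (`tag`, `classes`, `text`, `children`) and the names `matches`, `descendants`, and `wrap` are assumptions for this example, and it only understands `tag`, `.class`, and `tag.class` selectors, nowhere near what Sizzle or a real DOM library handles.

```javascript
// Hypothetical node shape: { tag, classes: [...], text, children: [...] }.
// Supports only "tag", ".class", and "tag.class" selectors.
function matches(node, selector) {
  const [tag, cls] = selector.split('.');
  if (tag && node.tag !== tag) return false;
  if (cls && !(node.classes || []).includes(cls)) return false;
  return true;
}

// All nodes below `node`, in document order.
function descendants(node) {
  const out = [];
  for (const child of node.children || []) {
    out.push(child, ...descendants(child));
  }
  return out;
}

// Wrap a list of nodes in an object whose methods return new wrappers,
// which is what makes jQuery-style chaining possible.
function wrap(nodes) {
  return {
    nodes,
    find(selector) {
      const found = [];
      for (const n of nodes) {
        for (const d of descendants(n)) {
          if (matches(d, selector) && !found.includes(d)) found.push(d);
        }
      }
      return wrap(found);
    },
    text() {
      // Concatenate a node's own text with the text of everything below it.
      const collect = (n) =>
        (n.text || '') + (n.children || []).map(collect).join('');
      return nodes.map(collect).join('');
    },
    get length() {
      return nodes.length;
    },
  };
}
```

With a tree like `{ tag: 'ul', children: [{ tag: 'li', classes: ['item'], text: 'a' }] }`, `wrap([tree]).find('li.item').text()` returns `'a'`. The same structure carries over to PHP or anything else with arrays and recursion.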