
Node.js is usually the obvious choice for web scraping:

Scraping using jQuery syntax such as:

  var names = [], surnames = [];
  $('table tr').each(function(ix, el) {
    var cells = $(el).find('td');
    names   .push(cells.eq(0).text());  // first column
    surnames.push(cells.eq(1).text());  // second column
  });
is more familiar to most web developers than the PHP syntax.

Even if Node was 5x slower than PHP I would still go for Node because of its easy jQuery syntax.




Did you bother to read the post? It pitted two different DOM traversal libraries against each other:

* cheerio (https://github.com/MatthewMueller/cheerio)

* phpQuery (https://code.google.com/p/phpquery/wiki/jQueryPortingState)

Both of these use a jQuery-esque syntax, so your comment regarding DOM traversal in PHP is a moot point.


Yeah, the CSS-style selectors and methods are the same; I assumed he was referring to the fact that it's all JS.

When you are scraping, it's great to be able to do a test run in the browser console and then just paste the code into your Node script without any language porting.

It's not an argument that it's better or faster than PHP, just that some find it easier to hack a scraper together this way.
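
To make the "paste it into Node" point concrete, here is a rough sketch, not anything from the post. It assumes cheerio on the Node side and a page that already loads jQuery on the browser side; extractNames is a made-up helper name. The traversal only talks to whatever `$` it is handed, so the same function body can run in both places:

```javascript
// Hypothetical helper: accepts any jQuery-compatible `$`, e.g. the page's
// jQuery in a browser console or the function returned by cheerio.load(html).
function extractNames($) {
  var names = [], surnames = [];
  $('table tr').each(function (ix, el) {
    var cells = $(el).find('td');
    names.push(cells.eq(0).text());    // text of the first column
    surnames.push(cells.eq(1).text()); // text of the second column
  });
  return { names: names, surnames: surnames };
}

// Browser console (assumes the page has jQuery loaded):
//   extractNames(jQuery);
// Node (assumes cheerio is installed and `html` holds the fetched page):
//   var cheerio = require('cheerio');
//   extractNames(cheerio.load(html));
```

Passing `$` in as a parameter is just one way to do it; relying on a global `$` in both environments would work as well.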


> Even if Node was 5x slower than PHP I would still go for Node because of its easy jQuery syntax

That "jQuery syntax" has nothing to do with the language itself. jQuery uses Sizzle[0], which is a CSS selector library for JavaScript. There are plenty of PHP libraries which provide CSS selectors, such as the Symfony CssSelector component[1].

[0] https://github.com/jquery/sizzle

[1] https://github.com/symfony/CssSelector


The argument you really should be making is that the JavaScript syntax is familiar. jQuery and its methods for traversing the DOM can trivially be implemented in any language, e.g. PHP:

http://symfony.com/doc/current/components/dom_crawler.html#n...



