
Noodlejs - Node.js web scraper - charlieirish
http://noodlejs.com/
======
premasagar
I didn't submit this, but I did oversee the project.

Noodle is a Node-based web scraper that also handles JSON, XML and other file
formats. It was initially built as a hack project to replace a core subset of
YQL.

All responses can be served as JSONP, to allow for cross-domain scraping from
a website's front-end.

Selector queries can be used to grab a subset of a document - e.g. CSS
selectors for HTML documents and dot-notation for JSON documents. It lets you
request multiple documents in a single HTTP request, and a few other things.
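To make the dot-notation idea concrete, here is a minimal sketch of resolving a dotted path against a parsed JSON document. It is not Noodle's implementation; `selectByPath` and the sample document are hypothetical names made up for illustration.

```javascript
// Hypothetical helper, not part of Noodle: resolve a dot-notation path
// such as "feed.items.1.title" against an already-parsed JSON document.
function selectByPath(doc, path) {
  return path.split('.').reduce(function (node, key) {
    // Once a segment is missing, keep returning undefined instead of throwing.
    return (node === null || node === undefined) ? undefined : node[key];
  }, doc);
}

// Made-up sample document for the sketch.
var sampleDoc = JSON.parse(
  '{"feed": {"items": [{"title": "First"}, {"title": "Second"}]}}'
);

var secondTitle = selectByPath(sampleDoc, 'feed.items.1.title');
var missingValue = selectByPath(sampleDoc, 'feed.nope.title');

console.log(secondTitle);  // Second
console.log(missingValue); // undefined
```

Array indices work as plain path segments here because JavaScript arrays accept string keys such as "1".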

I helped to guide an intern, Aaron Acerboni, at my company, Dharmafly, when he
built it last year.

------
hmottestad
I'd recommend formatting the JSON, like this:

    demoElement.innerHTML = JSON.stringify(data, null, 4);

Then use a "<pre>" tag for the demoElement so the indentation is preserved.
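As a rough illustration of what the third argument to `JSON.stringify` does, the sketch below pretty-prints a made-up `data` object with a 4-space indent. The sample data is an assumption; only the `JSON.stringify(data, null, 4)` call comes from the suggestion above.

```javascript
// Made-up sample result; real scraper output will look different.
var data = { results: [{ text: 'Example' }] };

// null = no replacer function, 4 = indent each nesting level by four spaces.
var pretty = JSON.stringify(data, null, 4);

console.log(pretty);
// {
//     "results": [
//         {
//             "text": "Example"
//         }
//     ]
// }

// In a browser, assigning the string via textContent instead of innerHTML
// (demoElement.textContent = pretty) avoids any "<" in the scraped data
// being parsed as markup.
```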

~~~
premasagar
A good idea. I've just added that to the docs site.

------
nonchalance
For those looking for the source, you can get it manually:

[http://npm.im/noodlejs](http://npm.im/noodlejs) is the NPM package

[https://registry.npmjs.org/noodlejs/-/noodlejs-0.2.0.tgz](https://registry.npmjs.org/noodlejs/-/noodlejs-0.2.0.tgz)
is the package source

    $ curl -kO https://registry.npmjs.org/noodlejs/-/noodlejs-0.2.0.tgz

The code is pure JS (not a C++ add-on).

------
AnSavvides
I get a 404 when I try to access the project's GitHub page or download it.

~~~
premasagar
Whoops. Now fixed. Not sure how that happened.

------
lkinc
README:
[https://npmjs.org/package/noodlejs](https://npmjs.org/package/noodlejs)

Npm package: npm install noodle

Npm tarball: curl -kO
[https://registry.npmjs.org/noodlejs/-/noodlejs-0.2.0.tgz](https://registry.npmjs.org/noodlejs/-/noodlejs-0.2.0.tgz)

~~~
lkinc
noodle should be noodlejs

------
bdcravens
Unless I'm reading this wrong, this scraper is a slightly prettier API for what
most scrapers do: a simple curl, then structuring the data with selectors.

Like most, it ignores pages that require state or anything dynamically
generated on the client.

------
harryf
How does this compare with Cheerio? (See the discussion on HN from yesterday:
[https://news.ycombinator.com/item?id=6273905](https://news.ycombinator.com/item?id=6273905))

~~~
irickt
I can't answer your question without reading further, but I note that Cheerio
is a dependency of Noodle.

------
Misiek
How do I get the list of URLs in the example on
[http://noodlejs.com/](http://noodlejs.com/)?

The selector 'h3.r a' only returns the list of names.

~~~
premasagar
Simply replace `extract: text` with `extract: href` for the anchor's href
attribute. You can do this with any kind of attribute.
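A small sketch of the change, for anyone skimming: the two query objects below differ only in `extract`. The `selector` and `extract` values come from this thread; the `url` and `type` field names, the example.com address, and the `buildQuery` helper are assumptions for illustration, and how the query is sent to Noodle is not shown.

```javascript
// Hypothetical helper: builds a query object of the shape discussed above.
// The `url` and `type` fields are assumptions, not confirmed in this thread.
function buildQuery(url, selector, extract) {
  return { url: url, type: 'html', selector: selector, extract: extract };
}

// Placeholder address; substitute the page you want to scrape.
var pageUrl = 'http://example.com/search?q=javascript';

// Link text (the "names") versus the anchor's href attribute (the URLs).
var namesQuery = buildQuery(pageUrl, 'h3.r a', 'text');
var linksQuery = buildQuery(pageUrl, 'h3.r a', 'href');

console.log(JSON.stringify([namesQuery, linksQuery], null, 4));
```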

------
BaconJuice
This is very cool! Thanks for sharing.

