

Show HN: Ruby gem to scrape a web page - daviducolo
https://github.com/davidesantangelo/webinspector

======
nathan_f77
Good work, but you might not have heard about Mechanize:
[https://github.com/sparklemotion/mechanize](https://github.com/sparklemotion/mechanize)

~~~
superplussed
The chance that he never heard of Mechanize is approximately 0.

~~~
nathan_f77
True, I checked his profile and he seems like a very experienced web
developer. I guess he wanted to make something that was a little bit easier to
use for simple web scraping.

------
mapgrep
I was surprised there's no way to query the page beyond the small list of
element accessors you provide (body, url, scheme, host, port, title,
description, links, images, meta).

When you're putting together a tool like this, it's nice to give the user some
way to "escape" your framework and get at the lower-level underlying data.

Why not offer something like:

    page.selector('h2 p') # returns Nokogiri elements
    page.h1               # calls method_missing, returns Nokogiri elements
    page.p                # ditto
    page.noko             # returns the underlying Nokogiri doc

Also, you forgot to include the body accessor in the "Accessing inspected
data" section of the docs.
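A minimal sketch of the escape hatch suggested above, assuming the page object simply wraps a parsed document and delegates to it. To keep it runnable without gems, `FakeDoc` is a hypothetical stand-in for a Nokogiri document that only implements `css(selector)`. `Page`, `selector`, and `noko` are the illustrative names from the comment, not webinspector's actual API.

```ruby
# Hypothetical stand-in for a Nokogiri document: maps CSS selectors
# to arrays of matching element strings.
class FakeDoc
  def initialize(elements_by_selector)
    @elements = elements_by_selector
  end

  # Like Nokogiri's Document#css, takes a CSS selector; here it just
  # returns an Array of strings (empty when nothing matches).
  def css(selector)
    @elements.fetch(selector, [])
  end
end

# Sketch of a page wrapper with an "escape hatch" to the underlying doc.
class Page
  def initialize(doc)
    @doc = doc
  end

  # Arbitrary CSS query, passed straight through to the document.
  def selector(css)
    @doc.css(css)
  end

  # Direct access to the underlying parsed document.
  def noko
    @doc
  end

  # Treat unknown zero-argument calls such as page.h1 as tag lookups;
  # anything else falls back to the normal NoMethodError.
  def method_missing(name, *args, &block)
    return super unless args.empty? && block.nil? && tag_name?(name)

    @doc.css(name.to_s)
  end

  def respond_to_missing?(name, include_private = false)
    tag_name?(name) || super
  end

  private

  # Only plain lowercase tag names (h1, p, div, ...) are delegated, so
  # conversion hooks like to_str or to_ary are never intercepted.
  def tag_name?(name)
    name.to_s.match?(/\A[a-z][a-z0-9]*\z/)
  end
end

doc  = FakeDoc.new("h1" => ["<h1>Hello</h1>"], "h2 p" => ["<p>intro</p>"])
page = Page.new(doc)
page.selector("h2 p") # => ["<p>intro</p>"]
page.h1               # => ["<h1>Hello</h1>"]
page.noko.equal?(doc) # => true
```

Restricting `method_missing` to tag-like names and pairing it with `respond_to_missing?` keeps the dynamic accessors from swallowing typos or Ruby's implicit conversion calls.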

------
mjands
Similar gem that scrapes OGP and oEmbed tags as well as HTML tags. It also
uses Faraday, and it supports serializing/deserializing the underlying data:
[https://github.com/socialcast/link_preview](https://github.com/socialcast/link_preview)

------
purephase
Always nice to have alternatives. Mechanize is certainly the big player in this
space, but I like the use of Faraday here.

Thanks for sharing.

------
AznHisoka
What does this use underneath? I wouldn't use it unless I knew whether it uses
libcurl or something else.

~~~
JustinAiken
Looks like Net::HTTP:
[https://github.com/davidesantangelo/webinspector/blob/master...](https://github.com/davidesantangelo/webinspector/blob/master/lib/web_inspector/page.rb#L95)

Would be easy enough to swap out if you prefer something else, though; Faraday
supports a good number of adapters.
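A minimal sketch of what "swappable" can look like, assuming the loader takes its HTTP client as an injectable callable. `PageFetcher` and the `fetcher:` keyword are hypothetical names, not webinspector's API. The default uses stdlib `Net::HTTP.get_response`; any object responding to `call(uri)` (for example a thin wrapper around a Faraday connection, where the swap is done by picking an adapter) could be passed instead.

```ruby
require "net/http"
require "uri"

# Hypothetical page loader whose HTTP client can be swapped out.
class PageFetcher
  # Default fetcher: stdlib Net::HTTP, returning the body as a String.
  # Note: this simple version does not follow redirects.
  NET_HTTP = lambda do |uri|
    response = Net::HTTP.get_response(uri)
    raise "HTTP #{response.code} for #{uri}" unless response.is_a?(Net::HTTPSuccess)

    response.body
  end

  def initialize(fetcher: NET_HTTP)
    @fetcher = fetcher
  end

  # Parse the URL, then delegate the actual request to the injected fetcher.
  def fetch(url)
    @fetcher.call(URI.parse(url))
  end
end

# Usage: inject a stub instead of hitting the network (e.g. in tests).
stub = ->(uri) { "<html><title>#{uri.host}</title></html>" }
html = PageFetcher.new(fetcher: stub).fetch("https://example.com/")
puts html # prints <html><title>example.com</title></html>
```

Injecting the client keeps the scraping logic independent of the transport, so switching from Net::HTTP to libcurl-based or other clients is a one-argument change.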

