Ask HN: How you scrape websites? - seriousQ
======
remx
Greasemonkey[0] is fairly handy for scraping. You can even include remote
resources like jQuery if you're targeting specific DOM elements in the page.
For storing the data, you can just use localStorage (for example, if you're
paginating).

[0] [https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/](https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/)

------
stefanpie
99% of the time I can easily throw together a simple Python script with the
BeautifulSoup and requests libraries to do any kind of basic scraping, from
text to images to API endpoints. And most of the time, if I need to store
the data nicely, I can either use the built-in csv or json modules or use an
SQLite database as a single file (the sqlite3 module is also part of the
standard library).
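A minimal sketch of that workflow, assuming the bs4 and requests packages are installed; the URL, the `h2` selector, and the function names are placeholders to adapt to your target site:

```python
import csv
import io

import requests
from bs4 import BeautifulSoup

def fetch(url):
    """Download a page; requests raises on HTTP errors."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.text

def parse_titles(html):
    """Pull the text of every <h2>; swap in whatever selector you need."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.select("h2")]

def to_csv(rows):
    """Write one column of results using the stdlib csv module."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["title"])
    writer.writerows([r] for r in rows)
    return buf.getvalue()

# Usage (hits the network):
#   print(to_csv(parse_titles(fetch("https://example.com"))))
```

Swap `to_csv` for `json.dumps` or a few `sqlite3` inserts if you want a different storage format.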

------
brandonlipman
So many ways to do this. What are you trying to scrape? If it's lightweight,
you could use a Chrome plugin called Data Miner. Right now I am using
Beautiful Soup for most of my scraping and love it. Python is incredibly easy
to learn and powerful.

------
motyar
I use PHP
[http://motyar.info/webscrapemaster/](http://motyar.info/webscrapemaster/)

------
egfx
Try another one: the qkast web clipper [http://qkast.com](http://qkast.com)

------
jetti
It has been a while since I last did this, but I used HTMLAgilityPack for .NET.

------
limeblack
I've used Phantom.js before. It really depends on the site. Some sites need a
full browser running, in which case you could use Selenium.

------
Philomath
If you like JavaScript you may give Nightmare.js a try. It's so simple yet so
powerful. I've used it quite a lot and never had a complaint.

------
hawkweed
JSoup[0] works fine for me.

[0] [https://jsoup.org/](https://jsoup.org/)

------
curiousgal
Scrapy

~~~
espiii
This tool felt like it descended from the gods of software. It's brilliant.

------
dhruvkar
Depending on how complicated the site is, it's either:

* Selenium with Chrome running in a virtual display

* Python requests with BeautifulSoup

------
Madeindjs
Ruby & the Nokogiri gem. You can also use the Anemone gem to crawl the
complete website.

------
tjalfi
wget -mrnp [https://www.example.com](https://www.example.com)

~~~
bbcbasic
| grep ... | sed ... | uniq ... | etc ...
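In the same spirit, a stdlib-only Python pass over a downloaded page — pull out the links and dedupe them (the `a.next`-style selectors above don't apply here; this just walks every `<a href>`):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect every <a href> value seen while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def unique_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    # Dedupe while preserving first-seen order (like sort -u, minus the sort).
    return list(dict.fromkeys(parser.links))
```

Handy when you've already mirrored a site with wget and just want to post-process the files.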

------
gerenuk
Celery with gevent/process.

