
Using Scrapy to Build Your Own Dataset - rbanffy
https://medium.com/@GalarnykMichael/using-scrapy-to-build-your-own-dataset-64ea2d7d4673
======
slezyr
Just in time. That XPath example is really useful for me.

I haven't finished the article yet, but I have one question. Is there a way
in Scrapy to append only new data after the initial scrape?

For example, the site at the moment of scraping:

    
    
      Article 1  
      Article 2
    

Then we run Scrapy again after some time, and the site looks like this:

    
    
      Article 1  
      Article 2  
      Article 3  
      Article 4
    

Can it just append articles 3 and 4 to the file/db table and stop scraping?

~~~
karma_fountain
This project might give you some ideas:
[https://github.com/TeamHG-Memex/scrapy-crawl-once](https://github.com/TeamHG-Memex/scrapy-crawl-once).

That guy has quite a few scraping projects.
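
To make the "only append what's new" idea concrete, here is a minimal stdlib-only sketch. It is not the API of scrapy-crawl-once or of Scrapy itself; the function names (`open_seen_store`, `append_new`) and the `seen` table are made up for illustration, and it assumes each article has a stable key such as its title or URL.

```python
import sqlite3


def open_seen_store(path=":memory:"):
    """Open (or create) a table remembering which article keys were stored.

    Hypothetical helper: pass a file path instead of ":memory:" so the
    table survives between runs.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen (key TEXT PRIMARY KEY)")
    return conn


def append_new(conn, scraped_keys):
    """Store keys not seen before and return them in their original order."""
    fresh = []
    for key in scraped_keys:
        already = conn.execute(
            "SELECT 1 FROM seen WHERE key = ?", (key,)
        ).fetchone()
        if already is None:
            conn.execute("INSERT INTO seen (key) VALUES (?)", (key,))
            fresh.append(key)
    conn.commit()
    return fresh


conn = open_seen_store()
# First run: the site lists two articles.
first_run = append_new(conn, ["Article 1", "Article 2"])
# Second run: two more articles have appeared.
second_run = append_new(
    conn, ["Article 1", "Article 2", "Article 3", "Article 4"]
)
print(first_run)
print(second_run)
```

In a real spider the same check would probably sit in an item pipeline or a middleware, so that already-seen pages are skipped before they are parsed again.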

------
jonotime
Scrapy is great. I have been using it at work to populate the Solr index for
our site search page. It is very extensible and powerful, but it helps to
understand Twisted if you need to do anything advanced.

------
thesehands
Remember to scrape responsibly, check terms of service, robots.txt etc.
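
As a rough sketch of what checking robots.txt can look like, the standard library's `urllib.robotparser` can answer "may I fetch this URL?" before a request is made. The robots.txt content, the `my-bot` user agent, and the example.com URLs below are invented for illustration; Scrapy also has a `ROBOTSTXT_OBEY` setting for doing this inside a crawl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, parsed offline so the example needs no network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether our (made-up) user agent may fetch each URL.
allowed = parser.can_fetch("my-bot", "https://example.com/articles/1")
blocked = parser.can_fetch("my-bot", "https://example.com/private/data")

# Requested pause between requests, in seconds, if the site declares one.
delay = parser.crawl_delay("my-bot")

print(allowed, blocked, delay)
```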

~~~
eccfcco15
Why? Isn’t it better to use a VPS or two (or even just a VPN), and strong rate
limiting? I.e., try not to be noticed regardless of the robots.txt.

~~~
JosephRedfern
Because it's considered polite.

------
CamelCaseName
I used Scrapy extensively last term in my non-CS co-op. Really loved how
quickly I could make beautiful dashboards with a bit of Python / VBA.

My problem with Scrapy was that there was a lot of manual work involved for
each site, but perhaps that says more about my proficiency than about
Scrapy.

What's the next step from here in terms of gathering, cleaning, and analyzing
data?

------
daotoad
Scrapy? Really?

Scrapie is a transmissible spongiform encephalopathy, similar to mad cow
disease, in sheep and goats.

[https://en.wikipedia.org/wiki/Scrapie](https://en.wikipedia.org/wiki/Scrapie)

~~~
CamelCaseName
The "Py" is for Python.

