

Web Crawling: Why is Service a Better Option than Software? - grepsr
http://www.grepsr.com/blog/web-crawling-why-can-service-be-a-better-option-than-software/

======
weland
This kind of article is honestly insulting to the intelligence of anyone who
has written more than a dozen lines of code. _It literally says nothing_ while
"magically" hiding the fact that "services" are just "software" handled by
someone else. Seriously?

~~~
icebraining
Hmm?

 _" Grepsr (...) uses software as a service"_

 _" With service mode, however, the software that runs the service (...)"_

 _" A powerful crawling software runs at the back of our system (...)"_

------
lesingerouge
Having had direct contact with all the issues of massive crawls (we're talking
tens of thousands of websites daily), I can say that it depends on the scale
and "business-criticality" of the crawl. For the types of crawls their product
seems to be targeted at (quite small scale, with a very precise "point and
gather" approach), going the service route can be efficient.

But the author fails to mention that the outsourcing decision holds true only
for non-mission-critical crawls (whether for marketing, accounting or any
other operations); it's totally infeasible if your company is doing big number
crunching. When the product is the data, you unquestionably need to control
the accuracy and timeliness of the crawl.
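To make the small-scale "point and gather" idea concrete, here is a minimal sketch of targeted extraction using only the Python standard library. The class name, CSS class, and HTML snippet are hypothetical; a real crawl would fetch pages over HTTP, but the parsing runs on a static snippet here so the example is self-contained.

```python
from html.parser import HTMLParser

class PriceGatherer(HTMLParser):
    """Point-and-gather: collect the text of every element with class="price",
    ignoring everything else on the page."""
    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # Flag that the next text node belongs to a targeted element.
        if ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())
            self._in_price = False

# In a real crawl this HTML would come from urllib.request.urlopen(url).read().
page = '<ul><li class="price">$9.99</li><li class="price">$14.50</li></ul>'
gatherer = PriceGatherer()
gatherer.feed(page)
print(gatherer.prices)  # ['$9.99', '$14.50']
```

At this scale the whole "crawler" is a page fetch plus a handful of selectors, which is why outsourcing it to a service can be cheaper than maintaining software in-house; the calculus changes once accuracy and freshness across millions of pages are the product itself.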

~~~
grepsr
Right now most of our revenue is from bigger projects servicing data-centric
products. We cater to both types of projects -- small and big.

Not all companies (consulting firms, investment banks, equity researchers)
have the expertise to build an in-house solution for bigger projects. This is
where a service has the upper hand over using a tool.

------
known
[http://www.bbc.co.uk/news/technology-23866614](http://www.bbc.co.uk/news/technology-23866614)

    
    
        Google handles more than 100 billion searches each month or more than three billion per day.
        Google has to crawl more than 20 billion pages a day to keep up with the ever-changing web.
        Google has found 60 trillion web addresses so far, up from one trillion in 2008.
        Google has indexed pages from over 230 million domains.
        A query on Google travels 750 miles each way, to and from the data centre.

