
Ask HN: Who crawls websites on a regular basis and why? - sshaginyan
Hi folks,

I'm thinking about building a web crawling service very similar to kimonolabs. Before I do so, I'm trying to figure out who my target audience should/could be.

I started off thinking about a tool that I would personally use, which is sort of a web polling trigger. For example: log in to website A, sort a list by relevance, check if the first item on the relevance-sorted list is greater than X, and poll until true (once true, send an email/send an SMS/make an API call); then log in to website B, insert the item from website A into an input field, and submit. If I were to start off with this, who would use it the most? (Maybe a financial analyst? Investor? Sales? Market research?)

I've also been thinking about building the service particularly for data scientists/analysts. Some features would include visualizing datasets, clustering analysis, sentiment analysis, relational and non-relational database modeling (similar to MySQL Workbench) directly in the browser, and integrations with IBM Watson (https://www.ibm.com/watson/developercloud/personality-insights.html).

What do you guys think?
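
For anyone trying to picture the "web polling trigger" from the post, here is a minimal sketch in Python. It only covers the poll-until-true part. The names `poll_until`, `read_value`, and `on_trigger` are made up for illustration; logging in to website A, reading the first item, and the email/SMS/API call would all live inside the callables you pass in.

```python
import time
from typing import Callable, Optional


def poll_until(
    read_value: Callable[[], float],
    threshold: float,
    on_trigger: Callable[[float], None],
    interval_seconds: float = 60.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call read_value() until it returns something greater than threshold.

    read_value is a hypothetical stand-in for "log in to website A and read
    the first item of the sorted list"; on_trigger stands in for the email,
    SMS, or API call. Returns True if the trigger fired, False if
    max_attempts ran out first.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        value = read_value()
        attempts += 1
        if value > threshold:
            on_trigger(value)
            return True
        # Wait before checking again so the target site is not hammered.
        sleep(interval_seconds)
    return False
```

Passing `sleep` in as a parameter is just a convenience so the loop can be exercised without actually waiting. A real service would also need back-off, error handling, and session management, none of which is shown here.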
======
teapot01
My only suggestion is you should find a problem and build a solution. It
sounds like you have a solution and you want to man-handle it onto a problem.

~~~
sshaginyan
I'm just wondering what problems Kimono, Apifier, and 80Legs are trying to
solve.

------
Jugurtha
This is all great to write down as a picture of what it could eventually
become, but I think it has to be chunked into more manageable units first. I
mean, whether you have the individual skills required or not, it'll start by
requesting a page and inevitably forgetting to show the respect Unicode
deserves.

That's what I'm doing to learn.

- See how a page is structured (does it use schema.org's stuff?): its fields,
  URL pattern, sitemap, resource URLs, selectors, etc.
- Fetch a page.
- Parse it and extract data.
- Save data.
- Rinse, repeat.
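
The fetch, parse, save loop above could look roughly like this with only the Python standard library. This is a sketch, not a real crawler: the helper names (`fetch`, `parse_links`, `save_links`), the choice of extracting links, and the `links` table layout are all assumptions made for illustration. The one deliberate detail is decoding with the charset the server declares, which is the Unicode step that tends to get forgotten.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it with the charset the server declares."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the Content-Type header names no charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkParser(HTMLParser):
    """Collect (href, text) pairs from <a> tags that carry an href."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_links(html: str) -> list:
    """Parse an HTML string and return its (href, text) pairs."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def save_links(conn: sqlite3.Connection, page_url: str, links: list) -> None:
    """Append the extracted links to a (hypothetical) links table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (page TEXT, href TEXT, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO links VALUES (?, ?, ?)",
        [(page_url, href, text) for href, text in links],
    )
    conn.commit()
```

"Rinse, repeat" would then be a loop that calls `fetch`, `parse_links`, and `save_links` for each URL, ideally with a delay between requests and a check of the site's robots.txt.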

I'm also learning about D3 and many cool things.

Check out
[http://atlas.media.mit.edu/en/profile/country/usa/](http://atlas.media.mit.edu/en/profile/country/usa/)

As to your target audience, I think you'll probably serve the Many-Faced God.
In a gold rush, people who sell shovels make a good living. You probably want
to make something that helps with decision making or triggers buying according
to price (from your description).

------
cblock811
I used to work for a company, Zillabyte, that had a lot of web data. Mostly
marketers and sales people were looking for lead generation. Let's say you
work for Mixpanel, and you want to know all the websites out there with
Kissmetrics installed. Even better, a report every month showing who
uninstalled their analytics tools. Others looked for more general signals than
JavaScript snippets (weak text analysis), but that was the bulk of what people
asked me for.
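
To make the lead-generation example concrete, here is a rough sketch of spotting analytics snippets in crawled HTML and then diffing two monthly snapshots to see who uninstalled a tool. The marker patterns in `SIGNATURES` are assumed, illustrative strings, not a verified fingerprint list, and the function names and snapshot shape (site mapped to a set of tool names) are hypothetical. This is not a description of how Zillabyte did it.

```python
import re

# Assumed, illustrative markers; real snippets vary by version and loader.
SIGNATURES = {
    "kissmetrics": re.compile(r"_kmq|kissmetrics\.com", re.IGNORECASE),
    "mixpanel": re.compile(r"mixpanel\.init|mxpnl\.com", re.IGNORECASE),
}


def detect_tools(html: str) -> set:
    """Return the names of the tools whose marker appears in the page source."""
    return {name for name, pattern in SIGNATURES.items() if pattern.search(html)}


def uninstalled(previous: dict, current: dict, tool: str) -> list:
    """List sites that had `tool` in the previous snapshot but not the current one.

    Each snapshot maps a site name to the set returned by detect_tools().
    Sites missing from the current snapshot are skipped, since a failed
    crawl is not evidence of an uninstall.
    """
    return sorted(
        site
        for site, tools in previous.items()
        if tool in tools and site in current and tool not in current[site]
    )
```

Plain string matching like this misses snippets loaded through tag managers or injected at runtime, so treat any hit list as a starting point rather than a definitive report.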

------
tyingq
80legs has a customer page with a few short blurbs about why each specific
customer is crawling.
[http://www.80legs.com/our-customers.html](http://www.80legs.com/our-customers.html)

Competitive analysis is, I suspect, one of the more popular reasons, and there
is not much public info on that, for obvious reasons.

