

80 legs: Web Crawler as a Service - luckystrike
http://www.80legs.com/index.html

======
westside1506
Hi guys. We were actually about to do an "Ask HN: Review our startup" post,
but I guess someone beat us to it.

So, please review our startup. :)

We are launching the beta today to a handful of users and will be letting in
more and more users over time.

One other note: We don't just offer crawling. Our model is actually to allow
you to analyze the web content that you discover. Using your own custom code
that you push into 80legs, you can do sophisticated text processing, image
processing, look inside PDFs, etc.

~~~
luckystrike
Sorry for hijacking your plan to post here first, but I found the idea
incredibly cool and useful, and didn't know you guys were around here on HN. 80
legs can potentially save a lot of effort for people/companies who need to
crawl the web for data and analyze it.

Hope this really works out well for all of you.

P.S. Just in case you are curious, I got a reference to your
application from someone who was following the Web 2.0 Expo.

~~~
westside1506
Thanks for the nice comments, luckystrike. We had a great time at the Web 2.0
Expo and we've been overwhelmed by all the interest in our service. We're
pretty hopeful that we've built something that people want. :)

------
mjs
Interesting, it's a botnet! From the FAQ: "How can the prices be so low?"
"Plura pays developers to embed lightweight widgets in their desktop
applications or websites. These widgets harness the idle and excess bandwidth
and computing power on the computers of people using the applications and
websites."

~~~
westside1506
Plura affiliates actually accept responsibility for getting the permission of
their users. Plura encourages disclosure and has found that it is actually
very well received by the users once it is explained. It always works out
better for Plura affiliates when they disclose. To that end, Plura has
actually changed its TOS with affiliates so that they directly take
responsibility for getting user acceptance.

Most Plura apps/websites give users opt-in/opt-out capabilities. Rather than
anything ill-intentioned, the actual model is really that Plura gives
application developers a means of offering their application at a discount (or
free) to users who don't mind trading their excess computer resources for the
app. For those who don't want Plura+free, the application developer can give
them other options (pay, ads, whatever).

Once the users really understand it, they are almost always happy that the
developer has a new means of monetization so that the developer will continue
to improve the software they are using.

BTW, this all runs in a secure Java sandbox where nothing can actually see the
user's data, disk, what programs are running, or anything else about the
computer. Plura has gone to great lengths to try to sanitize the entire
process and be good guys.

~~~
kiba
Interesting way to earn money. However, why can't a regular user run a Plura
client too, so that they can earn cash themselves?

~~~
westside1506
There's certainly no problem with individual users doing it. Just contact
Plura through the web form at <http://pluraprocessing.com/contact.php>.

Alternatively, you can use one of our affiliates to raise money for charity
(not related to our company) <http://donatebot.com>

------
gojomo
Very interesting service! A number of questions...

What User-Agent do you use?

Do you crawl non-textual resources?

Do you save all headers from the crawled responses?

Do you perform any processing on the returned content (like de-chunking or
decompressing), or can it be retrieved verbatim?

If two customers request the same URL/site be crawled, are their requests
merged so the site is only crawled once?

Do you save the exact time of the request (not trusting the returned 'Date'
header)?

