

80legs sets its web crawler free - raghus
http://venturebeat.com/2009/12/21/80legs-web-crawler-free/

======
e1ven
I love the idea behind 80legs, and their Plura program is a great way to help
monetize webgames, but I just can't get past their interface.

Too many of the things that I want to do require custom code. And while it's
great that they support uploading custom code right in the window, the
implementation makes it pretty difficult.

For one, since all the code needs to be manually reviewed, I can't rapidly
iterate. Granted, I'm only a mediocre coder, but the way I tend to program
with a new API is to write a Hello World, then expand it outward. Add in one
feature, test, expand it to another, until it does most of what I want. Then,
I send it through a sample-set of data, and if it works, I'm good to go.
There's no way to do this with the 80legs program.

The second problem is that the code has to be in Java. I like Java. I use Java
on my own projects. But java isn't a very fun language. I don't know many
people who wake up in the morning excited because they get to write Java apps.
The thing is, the JVM supports a multitude of scripting languages. Give us
access to them. Let me upload Ruby code for an idea, click-and-run it against
some sample data, and see what happens.

I have an old test project I worked on a year ago that identifies images and
tracks their sources. Something like 80legs would be perfect to "seed" it
and fill it up with data. But figuring out how to make it work is too much of
a hassle versus doing the crawling myself, and the speed difference isn't
worth it to me. Whether it gets done in a week or five weeks doesn't change
very much in the end.

In any event, I do wish them well. It's a very innovative program, and with
some implementation work, I think it could do very well.

~~~
axod
>> "But java isn't a very fun language."

No language is "fun". It's what you create with the language that is fun :)

The worst thing about the tech industry is the constant language wars.

~~~
e1ven
Sure, that's a fair point, but ensuring type safety is something I do on
projects because I'm afraid of the long-term consequences if I don't, not
because it makes me all giddy inside.

------
PanMan
I think it's a confusing title: it's now free under 100K pages. But that
cost $0.20 before; hardly a hurdle. If you wanted to spider 110K pages, before
it would have cost $0.21. Now it suddenly is $100/month. Doesn't seem free to
me, or a better deal.
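
To put rough numbers on that comparison, here is a minimal sketch. It assumes
the old price was a flat per-page rate (inferred from the $0.20-per-100K
figure above) and that the new plan is free up to 100K pages and a flat
$100/month beyond that. The constants and both function names are
illustrative guesses, not 80legs' actual price list, and treating exactly
100K pages as free is also an assumption.

```python
# Assumed figures, taken from the comment above rather than any official source.
OLD_RATE_PER_MILLION_USD = 2.00   # inferred from "$0.20 per 100K pages"
FREE_PAGE_LIMIT = 100_000         # pages per month covered by the free tier
PAID_PLAN_MONTHLY_USD = 100.00    # flat fee once the free tier is exceeded


def old_cost(pages):
    """Cost under the assumed old pay-per-page pricing."""
    return pages * OLD_RATE_PER_MILLION_USD / 1_000_000


def new_cost(pages):
    """Cost under the assumed new pricing: free tier, then a flat monthly fee."""
    return 0.0 if pages <= FREE_PAGE_LIMIT else PAID_PLAN_MONTHLY_USD


if __name__ == "__main__":
    for pages in (50_000, 110_000, 1_000_000):
        print(f"{pages:>9} pages: old ${old_cost(pages):.2f}, new ${new_cost(pages):.2f}")
```

Under these assumptions, anything just over the free limit jumps from cents
to the full monthly fee, which is the cliff being complained about.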

~~~
bravura
I really enjoyed 80legs and used them a lot, until they changed their pricing
structure.

Crawling-as-a-service makes sense. Thousands upon thousands of people need
crawling as a service.

Crawling on a subscription basis? Not so much. How many organizations are just
crawling crawling crawling, and need to do so all the time?

Regardless, I wish that I had been grandfathered in to the old pricing
structure. I've been using 80legs since the beginning, and have been an
advocate for it from day one. It really sucks that, having helped promote the
service, I am now forced to get a cut-back in affordable service.

------
gfodor
Usually when you're going to do a price hike like this you want to have some
other big news to offset the fact that if readers read between the lines they
realize they are paying ridiculously more today than yesterday for your
service. The fact that they're wrapping it into this "free" banner just makes
it even more disingenuous.

Better would have been "80legs is changing our pricing structure. But, as a
gift to everyone who has been using our original (albeit short-sighted)
pricing structure, anyone on the old pricing structure will continue to be
able to use it until Q1 2011." or something. Along with a feature release,
this would have avoided the impending blowback once people understand what's
really changed.

~~~
bravura
I really wish they had grandfathered me in to the old pricing.

------
bluebird
Their service would take off much more if they offered a Python, Ruby, or
JavaScript API.

~~~
buro9
No.

The service would take off much more if, instead of defining search patterns
as regular expressions, they were defined as jQuery-style expressions that
acknowledged the DOM and allowed you to find all <title> tags that exist in
the <head>. Yes, you can do this with regexp, but parsing HTML shouldn't be a
regexp task.
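
As a rough illustration of the difference, here is a minimal sketch that
finds <title> text only when it sits inside <head>, using Python's standard
html.parser rather than a regular expression. It is not how 80legs works and
it is not a real selector engine; HeadTitleParser and head_titles are made-up
names for this example.

```python
from html.parser import HTMLParser


class HeadTitleParser(HTMLParser):
    """Collect the text of <title> elements that appear inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag == "head":
            self.in_head = True
        elif tag == "title" and self.in_head:
            self.in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current title.
        if self.in_title:
            self.titles[-1] += data


def head_titles(html):
    """Return the stripped text of every <title> found within <head>."""
    parser = HeadTitleParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]
```

A <title> nested in an inline SVG in the body is skipped because the parser
tracks where it is in the document, which is the context a plain regexp
lacks.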

Oh, I'd like to see email gateways too... point a stream of emails at it and
parse those. I'm thinking of scenarios like tripit.com taking in tons of
different emails and parsing them to extract travel info.

~~~
icey
I'm building something right now that includes page parsing, and so far I've
only been building in regex support. I like your jQuery selector idea as well.
Are there any other ways you can think of that would make searching the
contents of a page programmatically easier for you?

~~~
qeorge
May I suggest taking a look at Parsley? It's the syntax they use on
www.parselets.com. The documentation for implementing it in your own apps is a
little sparse, but the data format is awesome. Here's one that describes
scraping HN:

<http://parselets.com/parselets/yc/14>

Might not be a fit for your project, but in terms of describing parsing
instructions to a crawler, it's the best format I've ever seen.

~~~
icey
I'm not crawling, but that is pretty interesting looking. I'll bookmark it and
take a look later for sure - thanks!

------
tlrobinson
I like how the registration page includes Hacker News as an option for the
"where did you hear about us?" question...

~~~
jdrock
We actually get the most traffic and signups when an article about us is
posted on HN.

------
cpach
I think the title is confusing. I was expecting to see an article about 80legs
releasing open source code.

~~~
jdrock
Actually, a lot of the code for our 80Apps is open source.

