
Show HN: Apify – Turn any website into an API - jancurn
https://www.apify.com/
======
jancurn
Hey guys, this is Jan, co-founder of Apify.

Two years ago we showed HN Apifier - a hosted web crawler for developers -
during our time in the Y Combinator Fellowship.

Today we're launching the largest upgrade of Apifier to date - a new product
called Actor and a complete redesign of our website. Also, we changed our name
to Apify.

Actor is a new serverless computing platform that enables execution of
arbitrary pieces of code in the Apify cloud (we call them "acts"). For
example, you can have an act to run a web automation job in headless Chrome
with Puppeteer, to post-process data from the crawler in order to remove
duplicates or to upload new contacts into your CRM. The possibilities are
unlimited.
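
To illustrate the de-duplication use case mentioned above, here is a minimal Python sketch of the post-processing logic. The record shape, the `url` key, and the function name `dedupe_records` are assumptions made for illustration; this is not Apify's API, only the kind of code such a job might run.

```python
from __future__ import annotations

from typing import Any, Iterable


def dedupe_records(records: Iterable[dict[str, Any]], key: str = "url") -> list[dict[str, Any]]:
    """Return records with duplicates removed, keeping the first record seen per key value.

    Records that lack the key are always kept, since there is nothing to compare.
    """
    seen: set[Any] = set()
    unique: list[dict[str, Any]] = []
    for record in records:
        if key not in record:
            unique.append(record)
            continue
        value = record[key]
        if value in seen:
            continue  # a record with this key value was already kept
        seen.add(value)
        unique.append(record)
    return unique


# Hypothetical crawler output; the field names are illustrative only.
crawled = [
    {"url": "https://example.com/a", "title": "A"},
    {"url": "https://example.com/b", "title": "B"},
    {"url": "https://example.com/a", "title": "A (again)"},
]
unique_records = dedupe_records(crawled)
```

Keeping the first occurrence preserves crawl order, which is usually what you want when later records are re-visits of the same page.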

We also launched a library
([https://www.apify.com/library](https://www.apify.com/library)), where you
can find acts and crawlers built by other people and share yours to support
the web scraping community. There are already several acts and people are
adding new ones almost every day.

We're really looking forward to hearing what you think about Actor and to
seeing what you can build with it. If you have any ideas, questions or
feedback, just let us know at support@apify.com or here.

------
RileyJames
[https://www.apify.com/pricing](https://www.apify.com/pricing)

This is the best pricing page I’ve seen on mobile, hands down. Pricing pages
are so often broken completely, or a list of boxes (difficult to compare
plans). This beautifully handles both. Well done.

[https://imgur.com/gallery/i5WvP](https://imgur.com/gallery/i5WvP)

~~~
pedalpete
I had to open up the pricing page just to see what you were raving about.

I'll admit it's well done, and a simple solution, though I'm not sure it
deserves the rave review.

What ones do you hate so much?

------
fiatjaf
I've used Apifier a lot of times and it is the best of all the similar products
on the market (or at least the other 4 (?) I've tried).

~~~
jancurn
Thank you :)

------
RHSman2
A replacement for the most awesome Kimono Labs!

------
boundlessdreamz
Currently I use PhantomJS via Selenium hub for a product and would like to
migrate to Chrome, but couldn't find much information on how to run a Chrome
cluster. So it would be great if you could shed some light on how you have set
this up.

~~~
jancurn
Our crawler
([https://www.apify.com/docs/crawler](https://www.apify.com/docs/crawler))
currently runs on PhantomJS, while Actor
([https://www.apify.com/docs/actor](https://www.apify.com/docs/actor)) can run
arbitrary jobs, including jobs that use headless Chrome and/or Puppeteer - for
example, see
[https://www.apify.com/docs/actor#examples-puppeteer](https://www.apify.com/docs/actor#examples-puppeteer)

BTW, we're working on migrating Crawler to headless Chrome.

If you could provide more details about your use case, I'm sure we will figure
out how to do it with Chrome.

------
ryeguy_24
Love this. How many paying users do you have so far?

~~~
jakubbalada
(co-founder here) We have hundreds of customers, half of them on recurring
subscriptions, half just one-time customers paying for crawler configurations.

------
anotheryou
How legal is web scraping?

~~~
jakubbalada
It typically depends on what you do with the scraped data. There is an
interesting recent US court ruling [1].

[1] [http://www.reuters.com/article/us-microsoft-linkedin-
ruling/...](http://www.reuters.com/article/us-microsoft-linkedin-ruling/u-s-
judge-says-linkedin-cannot-block-startup-from-public-profile-data-
idUSKCN1AU2BV)

~~~
seanwilson
If a court decides the usage wasn't legal, what's your liability as the
service that provided the crawler?

~~~
jancurn
We consider ourselves only as service providers. We provide technical means to
perform the crawling but the responsibility for the actual crawling is with
our users. It's similar to Amazon AWS - they only provide infrastructure and
it's your responsibility not to use it for anything illegal.

~~~
seanwilson
Just curious, but how do you enforce this assignment of responsibility? Do you
make the user agree to terms of service before crawling?

~~~
jancurn
Yes, every user has to agree to the Terms of Use when signing up.

------
dannyhw
Gonna check it out. I always use request and cheerio on node so it's almost
identical. Is this an actual headless browser though? When I have to go that
route it's painfully slow.

~~~
jancurn
The crawler uses PhantomJS, but you can also create jobs that use a plain HTML
downloader or even cheerio. The platform is very flexible.
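
For comparison with the headless-browser route, here is a minimal sketch of the "plain HTML" approach: parsing already-downloaded markup without running a browser. It uses Python's standard `html.parser` as a stand-in for cheerio; the `LinkExtractor` class, the `extract_links` helper, and the sample markup are assumptions made for illustration, not part of the Apify platform.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags in static HTML (no JavaScript is executed)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list[str]:
    """Return all link targets found in the given HTML string, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Illustrative static page; in practice this would be the downloaded response body.
sample_html = """
<html><body>
  <a href="/pricing">Pricing</a>
  <a name="top">No href here</a>
  <a href="https://example.com/docs">Docs</a>
</body></html>
"""
found_links = extract_links(sample_html)
```

This only sees what is in the raw HTML, so it is fast, but it will miss content that a page renders with JavaScript; that is the case where a real browser is needed.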

~~~
dannyhw
Very cool. I'm looking at it more in depth and I'm pretty impressed. This is
basically all things I've been doing for years, but encapsulated in a way that
I'm pretty sure end users will be able to use. I've always built these beasts
and tried to evangelize for a MEAN stack as the best option but people haven't
been ready. I have never met a problem I felt promises were a good solution
for, but you've done a really great job of using them to let people who can't
think asynchronously get things done.

Pseudo URLs are also something I've done before but are never the optimal
solution. Again though, this is something that will allow a total beginner to
catch up to me about 75%, and I've been developing for 25 years.
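
For readers unfamiliar with pseudo URLs, here is a minimal sketch of how one could be matched. It assumes a syntax in which a pseudo URL is an ordinary URL where parts enclosed in square brackets are regular expressions and everything else is literal; the thread itself does not spell out the syntax, and the function name `pseudo_url_to_regex` is hypothetical.

```python
from __future__ import annotations

import re


def pseudo_url_to_regex(pseudo_url: str) -> re.Pattern[str]:
    """Compile a pseudo URL into a regex: bracketed parts are regex, the rest is literal."""
    parts: list[str] = []
    literal: list[str] = []
    chunk: list[str] = []
    depth = 0  # bracket nesting, so character classes like [a-z] work inside a directive
    for ch in pseudo_url:
        if ch == "[":
            if depth == 0:
                parts.append(re.escape("".join(literal)))
                literal = []
            else:
                chunk.append(ch)
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                parts.append("(?:" + "".join(chunk) + ")")
                chunk = []
            else:
                chunk.append(ch)
        elif depth > 0:
            chunk.append(ch)
        else:
            literal.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '[' in pseudo URL")
    parts.append(re.escape("".join(literal)))
    return re.compile("".join(parts))


# Illustrative pattern: any page whose slug consists of word characters or dashes.
pattern = pseudo_url_to_regex(r"http://www.example.com/pages/[(\w|-)*]")
is_match = pattern.fullmatch("http://www.example.com/pages/my-page") is not None
```

Using `fullmatch` means the whole URL has to fit the pattern, so a URL outside `/pages/` is rejected rather than partially matched.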

