

Amazon EC2 with Hadoop is 4 times more expensive than running on our own cluster - phaefele
http://deepvalue.net/ec2-is-380-more-expensive-than-internal-cluster/
Deep Value runs Hadoop at scale on EC2, but we find that running our own cluster is significantly cheaper. By significantly, I mean 3.8 times cheaper.
======
senthilnayagam
Are you implying Amazon Web Services is a high-margin business? That would be
good for Amazon's stock.

I agree virtualised server performance is lower than equivalent hardware, but
a factor of 4 seems very high and improbable.

Assuming you don't use S3 locally (I don't see EMC mentioned anywhere in your
local cluster), we should remove that from the comparison.

~~~
phaefele
Some of the difference comes about due to the number of cores in the machines
we purchased - they are 16 core machines (each E5-2650 has 8 cores.)

In terms of storage, we are utilizing HDFS with commodity 3TB drives, thus no
specialized storage from EMC (or the like.)

------
vbm
I am not surprised by your calculation. Amazon has to earn money from the
service it provides; it does not operate on a no-profit, no-loss basis. It is
well known that EC2 is great if you have to get your system started quickly
without worrying about the hardware infrastructure. It is also good for quick
initial scaling. It stays good up to some level of scale, but beyond that it
is not advisable. If it were cheap, why wouldn't other big companies use
Amazon rather than managing their own data centers?

~~~
phaefele
Understood that Amazon needs to make a profit, but these are still dollars
that I need to pay, and that I could feasibly be using for something else
(hiring great developers perhaps.)

I think the advantage of EC2 is exactly fast scaling, and an easy sell to
management (no big upfront cost.) But once we started using it at size and
consistently, it turned out to be cheaper to run our own hardware, as the
blog post attests.

We do all the racking and cabling using "remote hands" directed from our SAs
in India, and so far for us this has not been the complex part. Getting
Hadoop configured and our software running efficiently is several orders of
magnitude harder, and I don't think EC2 helps here. If anything it hinders, as
we are dealing with virtual hardware. There have been several posts about
"lemon" EC2 instances and how you should test your instance before using it.

~~~
dnu
Using EC2 also takes away a part of the risk. If your startup fails in a year,
then you don't get stuck in the end with a pile of hardware (for which you
paid big bucks).

I guess that the best approach would be to use EC2 in the beginning, when you
don't really know how much hardware you need, and later on use EC2 only for
demand spikes.

~~~
phaefele
Yes - this is what we are doing. With this 4x cost differential however, we
calculated that it takes only 6-8 months to have the hardware pay for itself,
so if you plan to be in business that long you are better off buying.
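
A rough sketch of how a payback period like that can be worked out. All
figures below are made-up placeholders, not Deep Value's actual costs, and
the function name payback_months is just for illustration. How the 3.8x
figure itself is derived is in the blog post and is not modeled here.

```python
def payback_months(hardware_cost, own_monthly_cost, ec2_monthly_cost):
    """Months until the monthly saving from leaving EC2 covers the hardware purchase.

    Returns None when running your own cluster saves nothing per month,
    in which case the hardware never pays for itself.
    """
    monthly_saving = ec2_monthly_cost - own_monthly_cost
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


# Hypothetical inputs, chosen only to illustrate the arithmetic.
ec2_monthly = 40_000.0   # assumed EC2 bill per month
own_monthly = 10_000.0   # assumed colo, power and remote-hands cost per month
hardware = 210_000.0     # assumed upfront purchase price of the cluster

months = payback_months(hardware, own_monthly, ec2_monthly)
print(f"Hardware pays for itself after about {months:.1f} months")
```

With these placeholder inputs the saving is 30,000 per month, so a 210,000
purchase is recovered in 7 months. Plugging in your own bill and quotes is
the only way to get a number that means anything for your workload.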

------
okrasz
I would also suggest taking into account other cloud options. Amazon is very
often not the cheapest option. Just compare it here:
<http://www.cloudorado.com/>

------
ankushb
Nice analysis. EC2 certainly comes with its own problems: it is difficult to
manage, and it is hard to integrate an EC2 Hadoop cluster with a private
cluster.

~~~
phaefele
We have run into issues with VPC being tied to a specific zone and the
availability in that zone.

