
Why we ditched Amazon AWS - defied
http://blog.testingbot.com/2013/12/06/testingbot-has-moved-to-its-own-cloud
======
programminggeek
This is a pet peeve of mine, but when you are running your own servers in a
datacenter, that is not "your own cloud", it's just your own servers.

I realize the cloud is just a marketing term and it doesn't really mean
anything, but if I move my single Heroku dyno app to my own dedicated server
sitting in my basement, did I just move it to my own private cloud?

~~~
dpark
If you have a server farm that allocates VMs dynamically on demand, you have
something much different than a dedicated server, and it is quite reasonable
to call this a private cloud.

~~~
asdasf
Except people were doing that long before the "cloud" marketing nonsense
started.

~~~
dpark
You just don't like the word "cloud" at all, and that's fine, but that is a
different complaint from the original, which was basically that the term is
being misused when combined with the term "private". If we are going to use
the term "cloud" and assign it a meaning, we can certainly use it to refer to
private clouds. If EC2 is a cloud for me, is it not also a cloud when Amazon
builds services on it?

~~~
michaelt
If we're going to assign a meaning to the term 'cloud', shouldn't we assign it
to something we don't already have a word for, so it actually describes
something?

If I start two VMs on my desktop and we call that a 'cloud' then the word
'cloud' isn't terribly useful in describing anything.

~~~
dpark
I don't think the accepted definition of "cloud" is just "group of VMs". Your
two VMs do not constitute a cloud. You can set up ten thousand VMs on a
thousand servers and not have anything that would reasonably be considered a
cloud if you did it manually. "Cloud" generally implies simplicity and
automation. You could build a cloud with physical machines instead of VMs so
long as you have automatic ways to allocate and manage the machines'
lifecycles.

------
UK-AL
"AWS is expensive, as soon as we start an instance, we’re billed for the
entire hour, even if we only need to run a 2 minute test on it."

What you really needed to do was design an algorithm that keeps machines
running over a period of time, following the general demand trend, rather
than shutting down and instantly booting up a machine every time a customer
decides to run tests.
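Here is a minimal Python sketch of that idea. It uses a hypothetical `WarmPoolPlanner` that tracks concurrent test demand with an exponential moving average, and it only terminates surplus idle VMs just before their next billed hour starts. The smoothing factor, the 25% headroom, and the 55-minute cutoff are illustrative assumptions, not anything TestingBot described. It also leaves aside the "clean VM for every run" requirement raised further down the thread.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class WarmPoolPlanner:
    """Hypothetical planner: size a pool of running VMs from the demand trend."""
    alpha: float = 0.3      # EMA smoothing factor (assumed)
    headroom: float = 1.25  # keep 25% spare capacity (assumed)
    min_size: int = 1       # never drop below one warm VM (assumed)
    ema: Optional[float] = None

    def observe(self, concurrent_tests: int) -> None:
        # Exponential moving average of how many tests run at the same time.
        if self.ema is None:
            self.ema = float(concurrent_tests)
        else:
            self.ema = self.alpha * concurrent_tests + (1 - self.alpha) * self.ema

    def target_size(self) -> int:
        # Number of VMs to keep running: smoothed demand plus headroom.
        if self.ema is None:
            return self.min_size
        return max(self.min_size, math.ceil(self.ema * self.headroom))


def should_terminate(idle: bool, above_target: bool,
                     minutes_into_billed_hour: int, cutoff: int = 55) -> bool:
    # Under hourly billing the current hour is already paid for, so a surplus
    # idle VM is only worth killing just before the next hour would start.
    return idle and above_target and minutes_into_billed_hour >= cutoff
```

If the planner is fed a steady demand of 10 concurrent tests, `target_size()` settles at 13 VMs. Below that, idle machines stay up until minute 55 of their paid hour.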

~~~
lotsofcows
What you really needed to do was move to Google who bill by the minute.

Not a Google shill by the way - I don't have a preference yet. However, for
that particular complaint, Google has the solution.

~~~
octix
I think you get billed for at least 10min... still better than 1 hour though.

~~~
Florin_Andrei
Right, a 10-minute minimum upfront, then billing by the minute. The final
partial minute is dropped from billing if it was used for less than 15 seconds.

Or so I remember from reading the ToS yesterday.
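Here is a small Python sketch of the two billing models as this thread describes them, applied to TestingBot's case of short, throwaway test VMs. EC2 rounds every started instance-hour up to a full hour, per the article. Google charges a 10-minute minimum and then bills per minute, with the 15-second rule as Florin_Andrei recalls it. These rules come from the comments, not from either provider's current price list.

```python
def ec2_billed_minutes(runtime_seconds: int) -> int:
    # Hourly billing as described in the article: any started hour counts in full.
    hours = -(-runtime_seconds // 3600)  # ceiling division
    return 60 * max(hours, 1)


def gce_billed_minutes(runtime_seconds: int, minimum_minutes: int = 10) -> int:
    # Per-minute billing with a 10-minute minimum. A trailing partial minute
    # shorter than 15 seconds is dropped, as recalled in the comment above.
    full_minutes, leftover_seconds = divmod(runtime_seconds, 60)
    minutes = full_minutes + (1 if leftover_seconds >= 15 else 0)
    return max(minimum_minutes, minutes)


if __name__ == "__main__":
    runs = 30          # hypothetical: 30 separate 2-minute test runs
    run_seconds = 120
    print("EC2:", runs * ec2_billed_minutes(run_seconds), "billed minutes")
    print("GCE:", runs * gce_billed_minutes(run_seconds), "billed minutes")
```

For 30 separate two-minute runs, the hourly model bills 1800 minutes and the 10-minute-minimum model bills 300.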

The command-line cloud utility is so much easier than Amazon's. Everything is
in one place. ssh keys are managed automatically. In less than 1 hour I was
able to do everything with Google, whereas it took me a couple days to feel
comfortable with AWS.

------
jonaldomo
For those unfamiliar with Amazon, Amazon AWS consists of over 25 different
services. This article focuses on EC2, the virtual server service.

I use S3 and DynamoDB regularly and think the pricing is better than most. I
don't use it for pricing though. I use it because I don't have to worry about
load balancing, adding multiple servers, running out of disk space. Set it up
and forget about it.

------
gfodor
I'm confused. Noisy neighbors and the "round up to the hour" AWS billing can
both be fixed by more active instance management. It's not trivial but it's
also not incredibly complex. Moving completely off of AWS has huge
ramifications for your business in the long run, some positive, some negative,
and it seems weird to make such a bold and disruptive move based upon two
issues that are both well known and fairly straightforward to address.

~~~
JoeAltmaier
He has a constraint that 'instance management' doesn't fix: he wants a clean
VM for every test run. By definition that precludes instance management?

~~~
anko
It's a pretty arbitrary constraint. I wonder if he's looked into docker or
other methods of containerisation?

------
wikiburner
That's funny, I was just reading this blog post from Adrian Holovaty on some
of the issues he ran into with deploying Python/Django on Heroku and how great
AWS has worked out for him with Soundslice:

[http://www.holovaty.com/writing/aws-notes/](http://www.holovaty.com/writing/aws-notes/)

He just about had me sold.

Anyway, I'm definitely in the "avoid sysadmin work if at all possible" camp.
Does anyone have any thoughts on AWS vs Heroku for Django? Does Adrian's
solution seem reasonable?

~~~
mamcx
My main concern is keeping the database (Postgres) healthy and alive. I can
handle the app server, but the database server is the one that scares me ;)

~~~
griffordson
RDS recently added cross region snapshot copies, and more importantly, cross
region replication for at least MySQL. Those were the last two features I was
waiting for to jump to RDS. In fact, I was in the process of setting up cross
region backups using just the cross region snapshot copy when they released
the new cross region read replicas feature.

I'm very much looking forward to moving our manually managed cross data center
MySQL replication/admin to RDS. But I also don't have to scale MySQL to
ridiculous levels either, I just need very high availability.
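Here is a rough Python sketch using boto3 for anyone who wants to try the two RDS features griffordson mentions. It assumes boto3 is installed and credentials are configured, and the identifiers and regions are placeholders. For cross-region operations the source is referenced by its ARN, and passing `SourceRegion` lets boto3 compute the presigned URL the request needs. Only the request-building helpers run without AWS access.

```python
from typing import Any, Dict


def replica_request(source_db_arn: str, replica_id: str,
                    source_region: str) -> Dict[str, Any]:
    # Arguments for a cross-region rds.create_db_instance_read_replica call.
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_db_arn,  # must be an ARN across regions
        "SourceRegion": source_region,  # boto3 computes the presigned URL from this
    }


def snapshot_copy_request(source_snapshot_arn: str, target_id: str,
                          source_region: str) -> Dict[str, Any]:
    # Arguments for a cross-region rds.copy_db_snapshot call.
    return {
        "SourceDBSnapshotIdentifier": source_snapshot_arn,
        "TargetDBSnapshotIdentifier": target_id,
        "SourceRegion": source_region,
    }


def _rds_client(dest_region: str):
    import boto3  # imported lazily so the request helpers work without boto3
    # Cross-region calls are made from a client in the *destination* region.
    return boto3.client("rds", region_name=dest_region)


def create_cross_region_replica(dest_region: str, source_db_arn: str,
                                replica_id: str, source_region: str) -> Dict[str, Any]:
    return _rds_client(dest_region).create_db_instance_read_replica(
        **replica_request(source_db_arn, replica_id, source_region))


def copy_snapshot_cross_region(dest_region: str, source_snapshot_arn: str,
                               target_id: str, source_region: str) -> Dict[str, Any]:
    return _rds_client(dest_region).copy_db_snapshot(
        **snapshot_copy_request(source_snapshot_arn, target_id, source_region))
```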

------
apawloski
"Noisy neighbors: sometimes instances would behave much slower than usual,
because other people on the same hypervisor were using all the hypervisor’s
resources"

The Xen hypervisor actually has pretty good resource allocation between
virtual machines, and although there are academic attacks [1], I'm interested
to hear what evidence you've seen of neighbors hogging your resources.

[1] [http://pages.cs.wisc.edu/~rist/papers/rfa.pdf](http://pages.cs.wisc.edu/~rist/papers/rfa.pdf)
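One cheap piece of evidence is CPU steal time. A Linux guest reports it in `/proc/stat` when the hypervisor tells it how long its virtual CPUs were runnable but not scheduled. The Python sketch below assumes a Linux guest whose hypervisor exposes steal accounting. It samples the aggregate `cpu` line twice and reports the stolen share of the interval. What threshold counts as a "noisy neighbor" is left open.

```python
import time
from typing import List

# Field order of the aggregate "cpu" line in /proc/stat:
# user nice system idle iowait irq softirq steal guest guest_nice
STEAL = 7


def parse_cpu_line(stat_text: str) -> List[int]:
    for line in stat_text.splitlines():
        if line.startswith("cpu "):  # aggregate line, not cpu0, cpu1, ...
            return [int(field) for field in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before: str, after: str) -> float:
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    if len(b) <= STEAL or len(a) <= STEAL:
        raise ValueError("kernel does not report steal time")
    # guest/guest_nice are already counted inside user/nice, so only the
    # first eight fields make up the total.
    deltas = [new - old for old, new in zip(b[:8], a[:8])]
    total = sum(deltas)
    return 100.0 * deltas[STEAL] / total if total > 0 else 0.0


def sample_steal(interval_s: float = 5.0) -> float:
    # Read /proc/stat twice, interval_s apart, on the machine being measured.
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        after = f.read()
    return steal_percent(before, after)
```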

~~~
asdasf
I think most "noisy neighbor" complaints actually come from the noisy
neighbors. EC2 lets you use CPU resources of your neighbors if they aren't
using them. So people hammer the machine and expect that level of performance,
then when a neighbor does something and suddenly you are scaled back to only
using the resources you actually pay for, you feel it is slow and complain.

But I think if EC2 didn't let you borrow CPU, nobody would use it because they
would realize how absurdly underpowered EC2 offerings really are.

~~~
lsc
>But I think if EC2 didn't let you borrow CPU, nobody would use it because
they would realize how absurdly underpowered EC2 offerings really are.

yeah, that's pretty much the crux of virtualization; it's a lot like buying
bandwidth. Yes, your upstream is oversubscribing. Yes, if they do this right,
99% of the time, you won't notice the oversubscribe; you will get more service
for less money vs. something that isn't oversubscribed. But, oversubscription
needs to be managed carefully.

One thing I've noticed? If your upstream has a 1000Mbps line, and sells 1000
unlimited 10Mbps ports, you are almost never going to see contention, even
though it's a 10x oversubscribe.

If your upstream has a 1000Mbps port, and sells 10 1000Mbps ports? that's the
same 10x oversubscribe. But I /guarantee/ that you will hit contention at
least once a week. Probably more often. If people expect 1000Mbps reliable off
that, they will be very unhappy. (Of course, if you setup your QoS properly,
and tell the customers that it's 100Mbps CIR and up to a gigabit of best-
effort burst? it can work out just fine. But nobody is going to get a reliable
full gigabit out of that deal.)

90% of your users are using like 10% of your resources. But you've always got
a few who are running torrents (or, in the case of CPU, mining primecoins) In
the 1000 10Mbps port situation? it doesn't really matter if you've got a few
bittorrent users. In the 10 users who can all completely fill the pipe
situation? it matters a /lot/
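The point here is about variance, not the oversubscription ratio, and a quick binomial model shows it. The Python sketch below assumes each port is independently either idle or sending at full line rate, and is active a fixed fraction of the time (5% here, an arbitrary choice). It counts contention as more ports being active than the uplink can carry at full speed. Both scenarios are 10x oversubscribed.

```python
import math


def contention_probability(ports: int, port_mbps: float, uplink_mbps: float,
                           p_active: float) -> float:
    """P(more ports active than the uplink can serve at full rate).

    Assumes each port is independently active with probability p_active
    (0 < p_active < 1) and, when active, sends at its full line rate.
    """
    max_full_rate = int(uplink_mbps // port_mbps)  # ports the uplink carries at once
    log_p = math.log(p_active)
    log_q = math.log1p(-p_active)
    total = 0.0
    for k in range(max_full_rate + 1, ports + 1):
        # Binomial term C(ports, k) p^k (1-p)^(ports-k), computed in log space
        log_term = (math.lgamma(ports + 1) - math.lgamma(k + 1)
                    - math.lgamma(ports - k + 1) + k * log_p + (ports - k) * log_q)
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    for ports, speed in [(1000, 10), (10, 1000)]:
        p = contention_probability(ports, speed, 1000, 0.05)
        print(f"{ports} x {speed}Mbps on a 1000Mbps uplink: contention {p:.2e}")
```

Under these assumptions, the ten 1000Mbps ports see contention about 8.6% of the time. For the thousand 10Mbps ports the probability is well under one in a million.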

That's the thing about CPU sharing, though; most of the time you don't put
that many guests on one machine, and often you give each guest the ability to
use the whole machine (when it's otherwise idle) - so you are in the situation
of selling 10 1000Mbps links when you only have 1 1000Mbps uplink, which ends
in tears if anyone actually expects a reliable 1000Mbps uplink. (Now, if
everyone understands that it's actually 100Mbps CIR that can burst to
1000Mbps, then sure, people can be happy. But you have to be careful with
those expectations. With the 1000 10Mbps links on a 1Gbps uplink, customers
can treat their 10Mbps link as 10Mbps dedicated, and 99% of the time, they
will get what they expect.)
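The "100Mbps CIR that can burst to 1000Mbps" arrangement is commonly enforced with a token bucket. Below is a simplified single-bucket marker in Python, not the full two-bucket srTCM from RFC 2697. Tokens refill at the CIR up to a bucket depth. Traffic that finds enough tokens is marked committed, and the rest is marked best-effort and carried only if the uplink has spare capacity. The rates and bucket size in the example are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CirMarker:
    """Single token bucket: traffic within the CIR (plus a burst allowance)
    is 'committed'; anything beyond it is 'best-effort'."""
    cir_bps: float                  # committed information rate, bits/second
    bucket_bits: float              # bucket depth (committed burst size), bits
    tokens: Optional[float] = None
    last_t: float = 0.0

    def __post_init__(self) -> None:
        if self.tokens is None:
            self.tokens = self.bucket_bits  # start with a full bucket

    def mark(self, t: float, packet_bits: int) -> str:
        # Refill at the CIR for the time since the previous packet, capped at the depth.
        self.tokens = min(self.bucket_bits,
                          self.tokens + (t - self.last_t) * self.cir_bps)
        self.last_t = t
        if self.tokens >= packet_bits:
            self.tokens -= packet_bits
            return "committed"
        return "best-effort"  # forwarded only if the uplink has spare capacity
```

With a 100Mbps CIR and a 1Mbit bucket, a burst of 100 back-to-back 1500-byte packets at t=0 gets 83 marked committed and the remaining 17 best-effort. After 10ms of quiet the bucket is full again.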

~~~
larrys
"If your upstream has a 1000Mbps port, and sells 10 1000Mbps ports? that's the
same 10x oversubscribe. But I /guarantee/ that you will hit contention at
least once a week. Probably more often. If people expect 1000Mbps reliable off
that, they will be very unhappy. "

I like that example.

Reminds me of accounts receivable and bad debt.

Better to have 1000 customers that owe you $30 each rather than 10 customers
that owe you $3,000 each. I'm not factoring into this example the cost of
billing or customer service. Strictly that if you have 1000 customers it's
much less aggravating and you don't lose sleep at night worrying about a big
customer that doesn't pay a bill.
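The same variance argument applies to the receivables example. Assume, purely for illustration, that each customer independently fails to pay with the same probability (2% below). The expected bad debt is then identical in both cases. The standard deviation, however, is ten times larger with ten big customers, because it scales with the amount owed per customer times the square root of the number of customers.

```python
import math
from typing import Tuple


def bad_debt_stats(customers: int, owed_each: float,
                   p_default: float) -> Tuple[float, float]:
    """Mean and standard deviation of total unpaid receivables, assuming each
    customer independently fails to pay with probability p_default."""
    mean = customers * owed_each * p_default
    std = owed_each * math.sqrt(customers * p_default * (1 - p_default))
    return mean, std


if __name__ == "__main__":
    for customers, owed in [(1000, 30), (10, 3000)]:
        mean, std = bad_debt_stats(customers, owed, 0.02)
        print(f"{customers} customers owing ${owed}: mean ${mean:.0f}, std ${std:.0f}")
```

With these numbers both portfolios expect about $600 of bad debt. The standard deviation is roughly $133 for the thousand small customers versus roughly $1,328 for the ten large ones.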

~~~
lsc
Yup. And either way, you can solve the problem with capital. (Having a bunch
of money lying around to cover the shortfall in the case of A/R, or having a
very large burstable uplink with a small commit in the case of overselling
bandwidth.)

Of course, burstable uplinks all suffer from the 'best effort' issue... if you
go beyond your CIR, well, there's usually headroom, but not always.

------
ckdarby
Blog down, clearly shouldn't have ditched AWS

------
chrisferry
Looks like "their cloud" couldn't handle Hacker News' traffic. Had they set up
their blog in an AutoScaling group in EC2 this would not have been a problem
:)

------
fat0wl
Ahhhh whew. I thought this was gonna be an article about people switching back
to Heroku & my blood was gonna get all angried up.

On the other hand, people wanting to set up their own server clusters...
VMs... Linux container stuff... that's heartwarming :)

~~~
dblacc
I was just about to launch my first app on Heroku... what's wrong with it?

~~~
fat0wl
actually that may be a good use case for it!

It's really seamless and nice; it just ends up being expensive after a while if
you are trying to do enterprise stuff, and their add-on services aren't
particularly cheap. The customizability of AWS and private setups means you
have to do a bit of server admin, but it is generally cheaper and can give
better performance.

EDIT: Also note (my biggest gripe) that I see big performance swings and
queueing issues that aren't really correlated with traffic. Plus, they
introduce platform/API changes intermittently that make their admin UIs kinda
buggy, or you have to change your workflow to integrate with their services
(though the APIs for this are usually OK). I don't know, I feel like they make
a lot of money making people pay for threads, since the performance of RoR
overall is kinda questionable, along with the performance of their platform.
I'm going to try the Play framework soon and see if there are fewer of these
issues.

------
patrickg_zill
I would encourage you to look at OpenVZ for anything Linux-based. It starts
faster and uses fewer resources than KVM.

The gain might not be much however, given how powerful today's CPUs are.

~~~
notacoward
Also, OpenVZ has faulty POSIX support in some cases. I'm particularly aware of
how the lack of xattr support has bitten many GlusterFS users, but the same
lack could affect anyone who wants to use SELinux, ACLs, or just xattrs for
their own sake.
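Before putting GlusterFS, or anything else that relies on extended attributes, inside a container, it can be worth probing the filesystem directly. This Python sketch only checks the `user.` namespace with `os.setxattr` and `os.getxattr`, which Python provides only on Linux. SELinux labels and POSIX ACLs live in the `security.` and `system.` namespaces and may need root to test, so a `True` here is not proof that those work.

```python
import errno
import os
import tempfile

# Error codes meaning "this filesystem does not support extended attributes".
_UNSUPPORTED = {getattr(errno, name) for name in ("ENOTSUP", "EOPNOTSUPP")
                if hasattr(errno, name)}


def supports_user_xattrs(directory: str) -> bool:
    """Return True if files in `directory` accept a user.* extended attribute."""
    if not hasattr(os, "setxattr"):
        return False  # os.setxattr/os.getxattr are only provided on Linux
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        os.setxattr(path, "user.probe", b"1")
        return os.getxattr(path, "user.probe") == b"1"
    except OSError as exc:
        if exc.errno in _UNSUPPORTED:
            return False
        raise
    finally:
        os.remove(path)
```

Run it against the directory a brick or data volume would use inside the container, for example a hypothetical `/mnt/brick1`.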

------
oomkiller
Any reason why you didn't just use OpenStack instead?

~~~
defied
We didn't really try OpenStack, heard good things about it but we wanted to
really keep it as fast as possible without any overhead.

Now we have a simple nodejs daemon which spawns and destroys VMs through
libvirt.
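defied's daemon is written in Node.js. As a rough Python analogue, the libvirt-python bindings can do the same spawn-and-destroy cycle. The sketch below assumes the `libvirt` Python module, a local `qemu:///system` connection, and a disposable qcow2 disk per run (for example, a copy-on-write overlay of a base image). The domain XML is deliberately minimal, and the helper names are hypothetical. `createXML` starts a transient domain, so calling `destroy()` removes it entirely once the test is done.

```python
import uuid
import xml.etree.ElementTree as ET


def build_domain_xml(name: str, disk_path: str,
                     memory_mib: int = 1024, vcpus: int = 1) -> str:
    # Minimal KVM domain definition with a single virtio qcow2 disk.
    dom = ET.Element("domain", type="kvm")
    ET.SubElement(dom, "name").text = name
    ET.SubElement(dom, "memory", unit="MiB").text = str(memory_mib)
    ET.SubElement(dom, "vcpu").text = str(vcpus)
    os_el = ET.SubElement(dom, "os")
    ET.SubElement(os_el, "type", arch="x86_64").text = "hvm"
    devices = ET.SubElement(dom, "devices")
    disk = ET.SubElement(devices, "disk", type="file", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="qcow2")
    ET.SubElement(disk, "source", file=disk_path)
    ET.SubElement(disk, "target", dev="vda", bus="virtio")
    return ET.tostring(dom, encoding="unicode")


def spawn_test_vm(conn, disk_path: str):
    # conn is a libvirt connection, e.g. libvirt.open("qemu:///system").
    name = f"test-{uuid.uuid4().hex[:8]}"
    return conn.createXML(build_domain_xml(name, disk_path), 0)  # transient domain


def destroy_test_vm(dom) -> None:
    dom.destroy()  # hard power-off; a transient domain is then gone entirely
```

A test run would open the connection with `libvirt.open("qemu:///system")` and call `spawn_test_vm(conn, overlay_path)`. After running the test against that VM, it would call `destroy_test_vm(vm)` and finally `conn.close()`.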

------
tsmith
Awesome blog post Jochen! And thanks for the Gridcentric shout-out :-)

------
seaghost
It would be better to say it doesn't fit your business model.

~~~
octix
What business model doesn't want to reduce costs?

~~~
omni
You're missing the point. EC2 specifically didn't work for them because they
were constantly spinning up new instances and killing old ones, so they were
repeatedly incurring the minimum one hour cost. If your business model doesn't
call for that sort of a workflow, the article is irrelevant to you.

~~~
korg250
Exactly. I use EC2 by running my instances 24/7. That article is useless.

Plus, Amazon has servers here in Brazil, so that's a plus for me.

