
We're moving. Goodbye Rackspace. - suhail
http://code.mixpanel.com/amazon-vs-rackspace/
======
newobj
Wow, that's a brutal under-the-bus-tossing. But well deserved. Most of the
points are spot-on.

I've operated a fleet of several dozen machines on EC2 for 2+ years, and I can
tell you that the number of times boxes have gone down never to come back
(which most people tend to think happens regularly on EC2) is incredibly
small.

We actually run a Hadoop cluster (including DFS) on spot instances. We never
lose the spot bid. We pay way less than the going rate for the compute time.
Less than the reserved instance rate. It's awesome. Yes, you obviously need a
plan to deal with your cluster vanishing in the blink of an eye. It's not too
hard.

I would second another commenter's caution on EBS though. We never put it into
production. Personally I never experienced an ephemeral drive failure that had
repercussions - when they (rarely) occurred, the drive was RAIDed or in our
Hadoop cluster (i.e. redundant). We experimented twice with putting our DBs
on EBS, and both times, literally within 24 hours, we experienced a
catastrophic failure of the EBS volume, one time unrecoverable. So that put me
pretty well off EBS.

I can't say that the performance is necessarily the best, and we do experience
the occasional odd asymmetrical inter-machine latencies (e.g. 300ms to
establish a tcp connection in one direction, but normal <1ms in the other),
but for the most part AWS is just awesome.

~~~
gruseom
_Yes, you obviously need a plan to deal with your cluster vanishing in the
blink of an eye._

How long do the spot instances typically run before they vanish?

~~~
cperciva
A spot instance should stay around until the auction price is higher than what
you bid. That price has been quite stable recently, so as long as you bid
slightly higher than that rate you'll probably have your instance for months.

~~~
count
I actually did a market analysis of the EC2 spot price (they give you a year
of price history in JSON format). The spot price was under the reserved price
for all but a few hours since it came out. The savings in general were around
30-50% (depending on instance size), with higher savings coming from using
spot market pricing in the non-VA markets (CA, EU, Asia), where prices were
higher to start with.

Why don't people just bid the reserve price on the spot market, and never lose
the bid? You'll still pay less...
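
The analysis described above can be sketched in a few lines. This assumes a
simplified price-history format and made-up prices (the real EC2 API response
is richer, and the reserved rate shown is illustrative, not Amazon's actual
pricing):

```python
import json

# Hypothetical simplified price history; the real API returns far more points
history_json = """
[{"timestamp": "2010-01-01T00:00:00Z", "price": 0.031},
 {"timestamp": "2010-06-01T00:00:00Z", "price": 0.029},
 {"timestamp": "2010-11-01T00:00:00Z", "price": 0.030}]
"""

RESERVED_HOURLY = 0.043  # illustrative effective hourly rate for a reserved instance

prices = [point["price"] for point in json.loads(history_json)]
avg_spot = sum(prices) / len(prices)
savings = 1 - avg_spot / RESERVED_HOURLY

print(f"avg spot ${avg_spot:.3f}/hr vs reserved ${RESERVED_HOURLY:.3f}/hr "
      f"-> {savings:.0%} savings")
print("spot ever above reserved:", any(p > RESERVED_HOURLY for p in prices))
```

With these made-up numbers the savings come out around 30%, in line with the
range quoted above.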

~~~
cperciva
_Why don't people just bid the reserve price on the spot market, and never
lose the bid?_

Because you _will_ lose the bid if a disaster occurs. A lot of reserved
instances have been purchased for disaster-recovery purposes but aren't
actively being used; those are instead being sold as spot instances. If a
disaster occurs, the supply of spot instances will drop dramatically.

------
xal
My honest question is why there is this odd loyalty to virtual environments in
this community. I realize that it may be boring but you guys are passing up
insane savings that can be had by using colocation. All cloud providers are
very expensive when you actually do the math and you need more than 10
servers.

Our example may be a bit extreme, but we are just building out a new
datacenter at a colocation facility and will recover the entire up-front
investment (about 150k, we have the cash to not need leasing) in a bit over
half a year.

~~~
tdavis
Dedicated (rented) servers offer similar performance improvements and cost
reductions without the extreme upfront cost (obviously costing more over
their lifetime as a result). But these days dedicated hosting really seems
like a blind spot. Maybe it's just that I never completely jumped on the
cloud bandwagon (or was running servers far before it got rolling), but I
really haven't found a generic use case for cloud hosting. Planning
infrastructure upgrades isn't rocket science; you'll have more than 10
minutes' notice to get a new machine. There are certainly many specific use
cases for cloud hosting, but as a generic hosting tier I find it incredibly
overused - and companies waste money and are forced to deal with unreliable
performance (EBS I/O anyone?!) as a result.

~~~
ergo98
GoGrid lets you mix and match dedicated and cloud servers, which offers the
performance and value of a baseline of dedicated, and the flexibility of the
cloud. I find it to be unparalleled.

~~~
tdavis
Softlayer (my preferred dedicated host for years) does this as well. I agree
that it's currently impossible to beat the combination of bare metal
performance and cost efficiency with the ability to spin up cloud servers
within one's own VLAN.

------
jread
CPU is a major bottleneck for Rackspace Cloud. All instance sizes get the same
4 cores and about the same compute resources. CPU performance is roughly the
same on a 1GB cloud server as an 8GB cloud server, you are just paying for
more memory. Rackspace also uses ONLY Opteron 2374 2.20 GHz processors. EC2 on
the other hand offers linear CPU performance improvement on larger sized
instances. EC2 on the other hand uses heterogeneous hardware: Opteron 2218 or
Xeon E5430 for m1 instances; Xeon E5410 for c1 instances; Xeon X5550 for m2
instances and Xeon X5570 (hyper-threaded to 16 cores) for the cluster compute
instances. EBS on the cluster instance is also much faster than local disk IO
in the Rackspace Cloud based on testing I've done (due to non-blocking 10G
network). Here are a couple of references for this:

http://blog.cloudharmony.com/2010/05/what-is-ecu-cpu-benchmarking-in-cloud.html

http://blog.cloudharmony.com/2010/06/disk-io-benchmarking-in-cloud.html

------
powdahound
Make sure you explore the limits of EBS before assuming it's a perfect
solution. We've found it to have incredibly slow throughput at times.

Some reference links:

      - http://orion.heroku.com/past/2009/7/29/io_performance_on_ebs
      - http://victortrac.com/EC2_Ephemeral_Disks_vs_EBS_Volumes
      - http://stu.mp/2009/12/disk-io-and-throughput-benchmarks-on-amazons-ec2.html
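
The linked posts mostly boil down to sequential dd runs. A minimal version
you can run on any mounted volume (EBS or ephemeral) looks roughly like this;
the size and path are just examples:

```shell
# Sequential write: 64 MB, flushed to disk before dd reports its timing
# (without conv=fdatasync you'd mostly be measuring the page cache)
dd if=/dev/zero of=/tmp/ebs_bench bs=1M count=64 conv=fdatasync

# Sequential read back (note: may still be served from cache on a re-read)
dd if=/tmp/ebs_bench of=/dev/null bs=1M

rm -f /tmp/ebs_bench
```

The MB/s figure dd prints for the write is the number to compare between
volume types, and to re-run at different times of day given the variability
reported here.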

~~~
recampbell
I've also seen highly variable IO performance (i.e., slow/stuck) on EBS
volumes. Sometimes, all reads/writes block for several seconds or minutes,
with iowait going to 100% on an m1.small.

example:
http://developer.amazonwebservices.com/connect/message.jspa?messageID=201918#201918

However, I've decided that the other benefits of EBS volumes are just too
important to give up (snapshotting, lazy-load, re-attachment). Instead, I
plan to monitor for such situations and blow away the node when I detect it.

But really, I don't understand why Amazon doesn't fix this. It's happened to
me 3-4 times in a relatively small installation. Surely their monitoring can
detect this?
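
One way to do that monitoring is to watch the iowait counter in /proc/stat.
A rough sketch (Linux only; the threshold and the approach are my own
illustration, not the commenter's actual setup):

```python
def iowait_fraction(stat_cpu_line):
    """Fraction of CPU time spent in iowait, from the aggregate 'cpu' line
    of /proc/stat (field order: user nice system idle iowait irq softirq ...).

    These are cumulative since-boot counters, so real monitoring should
    sample twice and diff; this just computes the lifetime fraction.
    """
    fields = [int(x) for x in stat_cpu_line.split()[1:]]
    return fields[4] / sum(fields)

# Made-up sample resembling a box stuck on a frozen EBS volume
sample = "cpu 1000 0 500 2000 96500 0 0 0"
if iowait_fraction(sample) > 0.9:  # arbitrary alert threshold
    print("iowait pegged - replace this node")
```

On a live box you would read the first line of /proc/stat on a timer and
trigger the replacement when the diffed fraction stays high.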

~~~
moe
_But really, I don't understand why Amazon doesn't fix this._

When you select your instance-type then you get to choose between "Low",
"Moderate", "High" and "Very High" I/O performance.

Want to take a guess at which instance types suffer first when there is high
I/O demand or contention?

~~~
recampbell
I can handle suffering, but completely freezing?

~~~
moe
That is likely not intended, so I guess it's probably inherent to how Xen
and/or their I/O layer does resource allocation.

I, too, have seen these freezes occasionally on small instances. If you think
that is harsh then try one of the new micros. ;-)

Anyways, I haven't had such a freeze last longer than a minute yet, and the
instance would always recover. If yours does _not_ recover then that would
clearly be a bug.

------
adriand
This is a rather unpleasant review of a service we were about to move over to.

For companies that have moderate performance requirements (e.g. visitors in
the range of 30k or 40k per day across a range of web apps and sites),
reasonable but by no means expert level server administration skills, the need
for a redundant environment to satisfy SLAs with clients (e.g. two app servers
+ load balancer + master/slave db servers), and the desire to focus mainly on
software development instead of server admin, what companies does the HN
community recommend?

We've been considering Rackspace Cloud and Linode, but are open to any
suggestions. We also have a quote for a standard, managed four-server +
hardware load balancer deployment in front of us but it is pricey
($3000+/month).

~~~
madaxe
Linode are wonderful - best support I've had from anyone, ever. Nice
instances, too!

~~~
boyter
What are they like for CPU usage? I have been looking at VPS.NET and other
such providers to ensure I get a reasonable slice of CPU which I can use for
other things.

Most VPS providers I have come across don't like it if you start working out
the millionth digit of Pi, but sometimes you have a task that takes a while
and that you can't run offsite.

~~~
easp
I believe each physical host has at least 8 real cores. They give each
instance 4 virtual cores. As long as CPU time is available, an instance can
use the full capacity of four cores. If there is contention for CPU time,
every instance gets an equal share of CPU, which is also its fair share,
because all the instances on a box are the same size/price point. Further,
larger instances share the host with a proportionally smaller number of other
instances.

My experience, which seems to be shared by just about anyone who has published
benchmarks, is that Linode is appropriately conservative on the degree to
which they oversell CPU, because generally, it appears that a task that wants
100% of four cores does in fact get close to 100% of four cores.

The only thing to look out for is that there is quite a difference in single
threaded performance between their newer hardware and their older hardware.
Our app is written in one of the popular interpreted dynamic languages.
Generic benchmarks for that language showed a 2-3x difference on some tests
when run on the different hardware, and our app showed similar differences on
CPU-bound tasks.

There are a few implications of this:

First, it complicates things when you try to make your staging environment
mirror production. The quick check is to look at /proc/cpuinfo and see if
clock, cache, and model line up. If not, open a ticket and ask them to
migrate instances so things match.
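
That quick check can be a one-liner (Linux; the exact field names vary by
architecture and kernel version, so this is just an illustration):

```shell
# Pull the fields worth comparing between two hosts; diff the output
# from staging and production to see if the hardware generations match
grep -E 'model name|cpu MHz|cache size' /proc/cpuinfo | sort -u
```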

Second, while they try to size things appropriately so that your guaranteed
CPU is the same regardless of which generation of host you are on, your peak
CPU is going to vary dramatically since you get 4 cores either way.

------
dangrover
Rackspace cloud's DNS stuff stinks. No way to add TXT records -- you have to
open a ticket! Sure you can host it yourself, but every other cloud provider
has this in their UI.

I get the feeling they're just in "maintenance mode" over there and don't have
anyone working hard on improving the offerings.

~~~
danudey
Their DNS used to be great, if simple. You could put in a record for your new
server and by the time you'd logged in, the record was live.

Now, it's gotten to the point where their DNS updates so slowly that I just
re-use old subdomains instead of creating new ones, just so that I can get my
project tested today rather than tomorrow.

I'm working on moving everything over to Amazon; more flexibility. I like
Rackspace Cloud, but they're falling behind and they either don't know, don't
care, or can't catch up.

------
jread
Another distinguishing factor between Rackspace Cloud and EC2 or Linode is
bandwidth caps. Rackspace limits outbound public network throughput, from
10 Mbps for 256MB instances up to 70 Mbps for 16GB instances. EC2 and Linode
both provide an uncapped GigE Internet uplink for instances of any size.
http://cloudservers.rackspacecloud.com/index.php/Frequently_Asked_Questions#Is_there_a_throughput_limit_on_my_server.27s_network_interface_card.3F

~~~
megaman821
This actually becomes really important when you set up a small server as a
proxy server. We had to bump up our HAProxy server to a 512MB instance in
order to get adequate bandwidth to serve our sites.

------
cperciva
Quoth the article: _Lastly, we moved over to the Rackspace Cloud because they
cut a deal with YCombinator (one of the many benefits of being part of YC)._

Can anyone say what this deal is, or is it secret?

~~~
mikeyur
For a separate project I've worked on (non-YC), we negotiated 50% off
bandwidth and somewhat reduced server rates. You just need to ask for it.

Not sure what the YC deal is, but I assume it's similar.

~~~
aaroneous
I had a similar experience: with minimal negotiating we were able to get 50%
off their initial quote. I also renegotiated our terms later within the
contract period with relative ease.

They pad their pricing pretty heavily, so don't be afraid to ask.

------
powdahound
Goodbye code.mixpanel.com too? Page won't load. :(

~~~
suhail
funny part is...it's on the rackspace cloud.

Though probably our fault since we gave it a crappy box.

Edit: 512MB of RAM should be enough...

~~~
suhail
Apache2 basically just OOM'd

~~~
chaosmachine
Disable keepalives:

http://www.kalzumeus.com/2010/06/19/running-apache-on-a-memory-constrained-vps/
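
The gist of the linked article, sketched as a prefork config fragment. The
numbers here are ballpark guesses for a 512MB box, not the article's exact
values; they depend entirely on your per-process memory footprint:

```apache
# Keepalives pin an entire Apache process per idle connection; turn them off
KeepAlive Off

# Cap worker count so worst-case memory stays under the box's RAM,
# e.g. ~25 MB/process x 15 workers = ~375 MB on a 512 MB VPS
<IfModule mpm_prefork_module>
    StartServers          2
    MinSpareServers       2
    MaxSpareServers       5
    MaxClients           15
    MaxRequestsPerChild 500
</IfModule>
```

Check your actual per-process RSS (e.g. with ps) before picking MaxClients.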

------
yesimahuman
Another control panel complaint: DNS. For some reason beyond me you have to
choose a server just to configure DNS. Added on top of the fact that the
control panel is really slow, it just becomes a pain to use.

~~~
nl
Yes!!

I was trying to do a DNS change and I couldn't find the DNS panel anywhere
until I Googled it.

------
zemaj
I moved all my services from EC2 to Rackspace Cloud about 2 years ago, but I'm
regretting it.

Rackspace Cloud does one thing well - small instances have great value CPU &
local IO performance. If your app is CPU or local IO bound, splitting it
across multiple 256MB instances on Rackspace Cloud will get you huge
performance relative to price. I've been worried that this would degrade as
the service grew, but that hasn't been the case.

Unfortunately many other 'features' of Rackspace Cloud have been poor to awful
recently. Some anecdotal stories:

1) We haven't been able to make images of our servers or restore ANY backup
of our servers for months. There is a bug in the backup service where, if you
have spaces in the names of your Cloud Files containers (completely unrelated
to the backup service), all images fail to restore. We can't remove the
spaces in the container names because you can't rename containers (only
delete them), and there's too much data tied to different parts of our
infrastructure in there.

2) In relation to the issue above, we have had a ticket open for over 2
months, to which we continually post updates with new information, asking for
resolution. We never receive updates to the ticket itself and only get
information when contacting their live chat. The response is always "we're
working on it". I could live with it if this was a short period, or not an
absolutely vital part of their service, but come on - all backups broken for 2
months! No timelines on resolution. No ticket responses. No happy.

3) While CPU value is great on small instances, you get the short end of the
stick on large instances, as other posters here have said. You don't get
significantly improved performance above the 2GB servers. CPU capacity
certainly does not double as their documentation says.

4) Cloud Files latency is awful. Individual read/writes take 300-1000ms. Fine
for a small number of large files. Impossible for a large number of small
files. (Having said that, being able to upload files and publish to CDN in a
click has saved me lots of time for static files I need to quickly publish).

5) Resizing mid-to-large instances is impossible. We recently tried to resize
a 1GB (40GB disk) server to a 2GB (80GB disk) and it took OVER 3 DAYS. No,
really. It didn't complete. The resize process takes the server down towards
the end. We had to get Rackspace to cancel the resize and manually spin up
another server and transfer the files ourselves. To make it worse, we couldn't
act on this issue initially because Rackspace insisted that the process was
"almost complete" from 12 hours onwards. 2.5 days later we just gave up. We
managed to do the manual transfer ourselves in a couple of hours. Even worse,
Rackspace didn't seem to think it was unusual for the process to take 3 days
or express any desire to investigate further.

6) The web interface has awful performance at scale. Once you go above 20
cloud servers every single page load takes 10+ seconds. As the original poster
says, the number of errors it spits out about not being able to complete
operations is insane. It's rare I can go in there planning on doing something
and not have to contact support to fix something broken on their end.

7) They're taking the entire web interface and API offline for 12 hours this
week! You won't be able to spin up or take down any of your servers. Why? So
they can fix a billing issue related to Cloud Sites (a service we don't use).

I've always been a champion of Rackspace Cloud and Rackspace in general, but
sadly I would no longer recommend them to people. I'm starting to make
contingency plans and looking for other providers again.

~~~
jread
Rackspace Cloud used to have the upper hand on smaller sized instances. This
is no longer the case with EC2's new 640MB micro instance at about $8/mo
reserved. Not a top performer, but it gets the job done for lower-traffic
servers.

~~~
russell_h
For anything that needs much CPU at all that isn't going to be the case. Any
Rackspace node will be literally an order of magnitude faster for anything CPU
bound, compared to an EC2 micro instance.

On the other hand if you just need memory (without necessarily having the
clock cycles to quickly access it), or just something that will eventually
handle occasional requests, a micro instance should be just fine.

~~~
alex1
Just curious, are there any benchmarking results that show this?

~~~
jread
This is true; generally all Rackspace Cloud servers perform about the same in
terms of CPU and disk IO, so the small Rackspace instances will outperform
small EC2 instances. However, they don't scale well, and on the high end they
under-perform relative to comparable EC2 instances. Here is some sample data
validating this (these web service links will provide XML-formatted benchmark
results):

1GB Rackspace vs EC2 Micro - CPU Performance:
http://cloudharmony.com/ws/getServerBenchmarkResults?serverId=ec2-us-east.linux.t1.micro|rs-1gb&benchmarkId=ccu&ws-format=xml

1GB Rackspace vs EC2 Micro - IO Performance (using EC2 EBS):
http://cloudharmony.com/ws/getServerBenchmarkResults?serverId=ec2-us-east.linux.t1.micro|rs-1gb&benchmarkId=iop&ws-format=xml

16GB Rackspace vs EC2 cc.4xlarge - CPU Performance:
http://cloudharmony.com/ws/getServerBenchmarkResults?serverId=ec2-us-east.linux.cc.4xlarge|rs-16gb&benchmarkId=ccu&ws-format=xml

16GB Rackspace vs EC2 cc.4xlarge - IO Performance:
http://cloudharmony.com/ws/getServerBenchmarkResults?serverId=ec2-us-east.linux.cc.4xlarge|rs-16gb&benchmarkId=iop&ws-format=xml

~~~
jread
More benchmark results (in a UI) for Rackspace Cloud, EC2 and others are
available here:

<http://cloudharmony.com/benchmarks> (click on the 'View benchmark results'
link on the bottom left)

------
barrydahlberg
I'm surprised to see a setup using 50+ instances running on Rackspace Cloud -
wouldn't it have made sense to start moving towards dedicated servers by then?

~~~
suhail
We need the elasticity. I can't wait 24 hours for new boxes to come.

~~~
tnm
Capacity planning. It's a thing. Try it out.

~~~
suhail
Capacity planning is hard when data volume doubles or triples overnight.

~~~
abhay
No one denies that capacity planning is hard. There are books written on the
subject. The points you make are exactly the reason why you need to do
capacity planning and plan for mitigating failures. If you aren't planning on
2x (in fact more) growth then I'm confused as to what kind of growth you
really expect in your service.

If you aren't giving yourself room for expected and unexpected loads, you're
doing it wrong. Add capacity and load testing to your process.

~~~
moe
_If you aren't giving yourself room for expected and *unexpected* loads,
you're doing it wrong._

You're using that word; I'm not sure it means what you think it means.

Over here in the real world, many applications (and notably web-applications)
have one thing in common: They change all the time.

Your capacity plan from October might have been amazingly accurate for the
software that was deployed and the load signature that was observed then.

Sadly now, in November, we have these two new features that hit the database
quite hard. Plus, to add insult to injury, there's another _old_ feature (that
we had basically written off already) that is suddenly gaining immense
popularity - and nobody can really tell how far that will go.

Sound familiar?

------
jread
<quote>Amazon has a CDN and servers distributed globally. This is important to
Mixpanel as websites all over the world are sending us data. There’s nothing
like this on Rackspace.</quote>

Actually, Rackspace Cloud offers CDN services with Cloud Files through
Limelight, although it does not support some features that CloudFront does,
like CNAMEs and streaming.

~~~
suhail
But not physical servers, which is really what I am referring to.

~~~
jread
Rackspace Cloud has data centers in Texas and Illinois. However, they don't
let you choose (your account is assigned to one data center at creation
time). EC2 certainly offers a lot more flexibility in this regard, with data
center regions in California, Virginia, Ireland and Singapore. I spoke with
Rackspace at Interop a few weeks ago, and they told me they are working on
expanding to additional data centers, including an international one very
soon, and on offering choice.

~~~
0x44
You can "choose" insofar as you can email support and they can manually change
what data-center your next VM and all subsequent ones will be built in. It
really should be exposed through the API, but the people who write the
customer-facing API haven't done it.

~~~
jread
That is not the story I got when I contacted support and asked if I could
deploy a VM specifically to the Chicago cloud. Rackspace's support told me
this was not possible and I'd have to set up an entirely new account in order
to deploy to that data center.

~~~
0x44
Well, I guess that's one way of doing it. :(

I wrote the code in question, though. It wasn't supposed to be an _advertised_
option, but when I wrote it it was supposed to be a usable one.

------
mikey_p
I'd also add that their billing software can't keep things straight. I had a
couple of servers that I spun up for a demo at a meetup that somehow ended up
with the same name. This prevented me from deleting them, and tells me that
their control panel has concurrency issues, since you shouldn't be able to
create two servers with the same name.

I didn't notice the issue until 2 months later (I thought I had successfully
deleted the servers) because all of a sudden I received a huge bill that
contained over 1100 hours of usage for each instance, for that month alone.
WTF? Turns out their software had failed to bill me the previous month, so I
didn't notice any change.

Their response to my ticket about not being able to delete servers was to tell
me the steps that I had to take to fix it (renaming the servers). I really
wish when you had a ticket for stuff like that they'd actually act on it
instead of just telling you how to fix stuff and expecting you to do it
yourself.

------
lisper
It's even worse than that:

http://rondam.blogspot.com/2010/03/danger-will-robinson-rackspace-cloud.html

------
joecode
One potential reason not to move: Security.

A good friend deep in the security community once told me, off hand, that EC2
was "owned." I didn't take this too seriously until another good friend, who
has been working at Amazon for the past several years, told me that engineers
at Amazon were generally forbidden from using AWS due to security concerns.

That much said, I still decided to use EC2/RDS/S3 to host the infrastructure
of my latest startup. It is just too convenient to walk away from. Once it
matters, I can move the critical stuff to dedicated servers.

EDIT: To clarify, I'm not suggesting that Amazon knows AWS is "owned" and
offers it to others anyway. I'm only noting that, for certain critical
services, they themselves do not appear willing to take the risk.

~~~
cperciva
I've worked with Amazon Web Services security people in the past, and while
they're not perfect (nobody is) I have always had the impression that they
take security seriously. AWS has many very large customers, including the US
government and companies handling HIPAA-restricted data; based on the
assumption that Amazon employees don't want to be thrown in jail for 10 years,
I think it's safe to say that if EC2 is "0wned" as you claim, it's certainly
not well known within Amazon.

~~~
tptacek
For what it's worth, accidentally (or even negligently) violating HIPAA is
fantastically unlikely to get you charged criminally.

~~~
rbranson
Yeah but "0wning" EC2 would most certainly get you charged criminally under a
number of laws.

~~~
tptacek
Colin was implying that negligent management of EC2 could leave Amazon
employees criminally liable. Obviously anybody who "0wned" EC2 is already a
criminal.

