
Rackspace Goes Down. Again. Takes The Internet With It. Again. - vaksel
http://www.techcrunch.com/2009/12/18/rackspace-down/
======
tamersalama
I can't believe how TechCrunch is turning into a tech-tabloid.

Yes, there are issues (perhaps sometimes serious) that need reporting - but -
the way it is done by TC isn't quite the quality I'm expecting.

~~~
SamAtt
I agree with you but I'd bet those are TechCrunch's most popular posts. In
fact, the author (MG Siegler) was essentially hired because he's good at
writing those types of posts.

People like trashy journalism. It's like the Tiger Woods thing. I don't mind
the networks reporting on it but the fact that it's been the top item on
"legitimate" news sites should tell you how modern day journalism works.

------
tybris
Amazon just started looking a whole lot more attractive. They may not have
such fanatical support, but they do know how to fix their stuff.

~~~
cmelbye
Well, Rackspace knows how to fix their stuff too, which is why it's up again
after the problem appeared. ;-)

------
strlen
Attention Internet start-ups: operations is your core competency. You can't
just expect to push your application code to "the cloud" and have somebody
else make it scalable and fault tolerant.

It's fine to use a managed hosting service provider if you're _just getting
started_ and paying month-to-month based on credit cards and can't afford
networking gear (hardware itself can be leased). However, it shocks me to see
venture funded, post-series A start-ups _exclusively_ relying on others to do
their operations (including systems administration).

Problem with outsourcing your operations to somebody else is that you're
outsourcing it to somebody who has _zero_ knowledge of your application and is
also responsible for at the very least _dozens_ of other customers.
Essentially, rather than developing _your own_ vertical technology team,
you're relying on a horizontal technology team whose resources multiple
other companies (including your competitors) are fighting for.

That's _exactly_ how poorly run big companies function (multiple engineering
teams competing for attention of monolithic operations, SCM, release, QA etc...
teams). Well run Internet giants, however, function much differently. If you
look at Google's job openings, you can note that they don't hire IT/Systems
Administrators (aside from data-center technicians and corporate IT). Instead,
they have Site Reliability Engineers (SREs) assigned to specific properties.
The SREs are actually engineers who are able to write and debug code and
deeply understand the application-specific stack they operate rather than
treat it as a black box.

Sure, there are great business and technical reasons (edit: it said financial,
which I felt was an unfair strawman) to use a managed/"cloud" provider
(EC2 and Rackspace cloud are very attractive due to the ability to add/remove
nodes as traffic goes up and down, as well as provision machines ad-hoc for
analytics tasks/MapReduce) -- but even then, you're not off the hook for
operations. It's still _your_ responsibility to ensure that at the very least,
there's a "hot standby" disaster recovery site. Yes, it's not easy - but
running a successful business isn't supposed to be.

~~~
wrs
Comparing Techcrunch to Google is a bit lopsided, don't you think? Operations
for most websites is like vehicle maintenance for a pizza delivery company:
essential for success, but hardly their "core competency". This is especially
true if the website is something relatively standard like a blog or news site.
It takes quite a bit of scale and customization before it makes sense to hire
full-time dedicated sysadmins.

~~~
revicon
> Operations for most websites is like vehicle maintenance for a pizza
> delivery company: essential for success, but hardly their "core competency".

Very interesting analogy.

~~~
strlen
Would you fly on an airline that treated their aircraft maintenance the same
way? So why would you expect advertisers to run their ads on sites that don't
have a _person who understands the application_ carrying a pager?

------
jmonegro
TechCrunch ought to get a little green light image in their sidebar that goes
red and displays the text "Rackspace is down" every time it is.

~~~
Hoff
Yeah, and followed by a GoogleWave of Erlang threads on HN. Please.

------
pclark
Linode.

~~~
bentlegen
Linode had several hours of downtime back in October:
[http://www.linode.com/forums/archive/o_t/t_4778/host_reboots...](http://www.linode.com/forums/archive/o_t/t_4778/host_reboots_october_27th_2009.html)

Not to knock on Linode - I use them - but as earlier commenters have stated,
downtime is a fact of life, and no host is immune.

------
dnsworks
It's amusing that Techcrunch considers themselves and a handful of other
startups "the internet".

This is why we architect provider independent solutions. Keep your TTLs low.
Replicate your site to another provider, and be prepared to hit the switch
when your primary provider inevitably fails. All datacenters lose power, all
networks get borked, and all storage systems fail. It's a fact of life.
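
To make "be prepared to hit the switch" a bit more concrete, here is a minimal sketch of the decision logic only. Everything in it is an assumption for illustration: the `FailoverState` class, the three-failed-checks threshold, and the example addresses are hypothetical, and the actual health check and DNS update depend on your providers, so they are left out.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverState:
    """Tracks which site should receive traffic (hypothetical helper)."""
    primary: str                # address at the primary provider
    standby: str                # address of the replica at another provider
    threshold: int = 3          # consecutive failed checks before switching
    failures: int = 0
    active: str = field(init=False, default="")

    def __post_init__(self) -> None:
        # Start out serving from the primary provider.
        self.active = self.primary

    def record_check(self, primary_ok: bool) -> str:
        """Feed in one health-check result; return the address to serve from."""
        if primary_ok:
            # Primary is healthy: reset the counter and fail back.
            self.failures = 0
            self.active = self.primary
        else:
            self.failures += 1
            # Require several failures in a row so one dropped probe
            # does not flip traffic back and forth.
            if self.failures >= self.threshold:
                self.active = self.standby
        return self.active


def worst_case_switch_seconds(ttl: int, check_interval: int, threshold: int) -> int:
    """Rough upper bound on how long clients keep hitting the dead site:
    time to detect the failure plus time for cached DNS records to expire."""
    return check_interval * threshold + ttl
```

This is also why the low TTL matters: with a 60 second TTL and checks every 30 seconds, the rough bound above is 150 seconds, while a one-day TTL would leave cached clients pointed at the dead provider for most of a day. Resolvers that ignore TTLs will take longer than this estimate.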

~~~
iigs
Yeah the snotty attitude advanced in the article is pretty offensive. If you
run something on the internet and rely on it for your dinner, you don't really
have an excuse for single points of failure. Do your job.

It's exceptionally easy to engineer around these issues if you don't just
outsource all of your thinking to a provider.

~~~
drusenko
I don't know, I sort of side with TechCrunch on this one. Yes, you can
replicate a site (some easier than others, and a blog should be relatively
easy), but that's still no excuse for datacenter downtime, and Rackspace has
had an excessive amount of downtime.

_Any_ datacenter downtime is serious business -- at a good datacenter, it's a
small-ish incident every few years. Rackspace, on the other hand, seems to
have a large incident every few months, meaning it's probably best to take
your servers elsewhere.

~~~
ghshephard
I used to think that Data Center Downtime was serious business, but I've come
around to believing that it shouldn't be. If your data center going down is a
problem, then you don't really have a very robust failover plan.

If, instead, you _plan_ for the Data Center to go down, and treat a data
center failure as a trivial issue, then your DR plans start to become
significantly more robust.

In particular, the companies that I admire are ones that routinely swap their
DR and production facilities - and when a production data center goes down
they don't even bother to wait - they just light up the DR center and are
back in business.

Most of the SaaS Financial Hosting companies (Oracle Financials) that I've
talked with will provide you with that feature.

------
antidaily
Holiday party. Someone spilled egg nog on a surge protector.

------
ryansloan
This seems like a good candidate for <http://sadtrombone.com>

