Tell HN: Rackspace apparently lost some storage volumes
134 points by sz4kerto 5 days ago | 63 comments
First, RS has informed us that they're migrating some volumes and that'll require downtime:

1st of Oct:

"The above volume has been selected as available to clone and migrate to prevent being affected by the maintenance listed below. If the volume migration is not completed before October 16 at 22:00 CST, it will be affected by the below work. New volumes and clones of existing volumes will be placed in the new datacenter location and will not be affected by the below work."

5 hours ago:

"Unfortunately you have one or more volumes that we have not yet been able to bring back online. The following volumes are impacted by this issue:"

Then a few minutes ago:

"Following extensive troubleshooting we have been unable to bring the host server on which your volume is hosted back online and as such we are unable to recover data for your device.

You have the option to deploy a new Cloud Block Storage device."

This is completely anecdotal:

If you find yourself on Rackspace hosting I would absolutely migrate to something more mature. Personally I've had great luck with Digital Ocean and Linode for basic/affordable VPS hosting, with AWS being a great cloud provider with an incredible tool-box at your disposal. Hell, GCP, Azure... there are just so many better options.

IMO Rackspace has been on a steady decline for years. They were a choice provider back in the early 2000's when managed/unmanaged beige box/shared cPanel/Plesk hosting was common. Back then their support was incredible. Sometime around 2010 though things started to go downhill, and I experienced more and more support issues with them. These would range from annoying all the way to "that shouldn't happen" (like losing volumes, which happened to me).

The final straw was when a dedicated server was turned down by mistake, causing a major outage for 20+ of my clients at the time. This was 10 years ago and I've never looked back. They did a pretty good job at turning me into an advocate against their services.

Oh and the Slicehost transition was crap too. That sucked to get over to a first-gen VPS provider only to have it sucked right back into the beast that you were trying to escape from in the first place...

Sorry for the mini rant. I'm still not over the PTSD that Rackspace has caused me.

Currently, to delete a cloud server, you're required to call support. The option to delete servers via the API and interface has been removed. On more than one occasion I've been pitched by the "support engineer" to sign up for something akin to reserved instance pricing.

They claim it's a security measure.

I've never had an issue with service or availability. But, the price per performance on the VM instances is so much better elsewhere.

Was working to migrate to DO, but now I'm fast tracking that process.

> Currently, to delete a cloud server, you're required to call support.

This is insanity to me. I'd leave a VPS/cloud provider so fast if they started this. Even back in the day I always had the option to delete a Rackspace Cloud instance.

Wow. Just wow.

Finance department meeting:

MBA1: So we've figured out that we make a lot of money from people just being too lazy to delete a server.

CFO: Great, how do we make it even harder to delete a server?

MBA2: We could remove the delete button.....

MBA1: I don't think that's a very good idea

CFO: Then how will they delete servers?

MBA2: Fax us!

CFO: Brilliant, I'll tell an engineer to develop the web code.

How often do you think this type of conversation happens?

I worked in an industry comparable to RS and we pretty much used the same tactics, but the call came from the CTO.

Often. I heard something similar in an ecommerce/digital subscription context.

Huh, when did this happen? I deleted cloud servers via the GUI a few months ago as we are migrating stuff to AWS, but I still have some clients on Rackspace.

Rackspace has been around forever but they are a bunch of children working out of an abandoned mall like it's Day of the Dead (which they've decorated like a discount Facebook HQ). They hire a few well-known open source people but it's clown-town the rest of the way down. This is mostly because they are stingy and risk averse. Exhibit A: Who headquarters in San Antonio? Maybe once IBM or HP hit their heads a few more times one of them will buy up Rackspace and be done with it.

I think OP meant technological and process maturity, not age.

I worked for a Rackspace competitor in the mid 2000's and can confirm that every business decision in that space is driven by short-term growth and revenue. When I worked there, the pay was terrible (although I didn't know it at the time) and the "servers" were literally rows upon rows of cheap tower PCs shoved into wire shelving. The owner of the company made millions but never invested it back into the company beyond whatever was required for horizontal expansion. (He preferred to spend it on exotic cars and vacation homes.)

There is close to zero long-term thinking in this space and as a result, a decade or two later, cloud providers are eating these guys alive.

Employee in the same space you are referring to, though bare-metal. The wreckage owners can make by thinking of the short-term gains is baffling. We are so behind the curve that we'll never make up for it.

Oh yes, definitely was not trying to argue they were mature because of their age but rather that they weren't in spite of it. Of course that is a fallacy of progress, but you get the idea.

Worked for them in the Herndon DC. Forget the name of their lovecraftian ticketing/control interface (which was always breaking) but it was hands down the worst interface I've ever worked with. Their remote SA team was not very good either. The supply/parts folks serving DC techs earned bonuses on having bad parts recycled into production and the parts delivery was not managed properly. One xmas I saw the same motherboard 5 times after returning it as defective each time. I resigned and was then fired. Only place that ever happened. Was so happy to leave I didn't care but it has come back to haunt me once or twice.

You quit and then you were fired? How did they manage to do that - Virginia is a right to work state, you can quit any time. They can't fire you for.... quitting first?

Eh - this has happened to me. Literally word-for-word: "You can't quit! You're fired!"

I know it seems like it could be right out of a Simpsons episode or w/e but managers can have some pretty backwards egos. The same manager tried telling my colleagues that I had been fired and like 2-3 people were like, "Nope - actually he quit. Thanks for screwing us out of 2 weeks of project hand-off..."

The 4 letter word you are looking for is “core”

Careful with DigitalOcean, though. I had my entire account frozen (and all servers stopped) for an hour due to spinning up dedicated CPU instances to do some video transcoding.

Fortunately for us, we don't have anything mission-critical on DigitalOcean. Running a similar load on Vultr (high frequency CPU nodes) simply got us a polite email asking us to switch to dedicated nodes, which we happily did.

Did I understand this right, that they stopped your servers because you spun up some dedicated CPU instances? Isn't that what you're supposed to do?

Appreciate this anecdote. I tested their new PaaS recently and found the build times to be ungodly slow, much more so than Heroku.

I used Rackspace from 2012-2017 ish and it went noticeably downhill during that time. I had Linode servers at the same time and they had better uptime and better support for a fraction of the price, so that’s where I moved everything. No regrets.

Similar. I used to be a happy RS customer 6 years ago, then everything just got more complicated and less stable. Now that they are public, they should have the cash to avoid any major issues like this. There's just no reason to use them with G Cloud, AWS, and Digital Ocean around (happy customer with them too, plus AWS).

Rackspace is very mature. However, I'm not experienced with their cloud offerings, so I'm less sure about those.

It's kinda surprising this ever happened; no cloud company allows this, with numerous safeguards in place. This is very embarrassing for RS.

> Rackspace is very mature

Nahhh - it went public and everything is a race to the "bottom line" while exec/investors skim money off of the company vs. re-invest in an ever-changing market. Their stagnation has effectively put them back in the crib maturity wise.

It's not surprising when a company goes public and everything becomes about the almighty dollar first, and any sort of technical/support excellence second. Seen it happen time and time again in tech...

Vendor problems... amirite?

(I'm the OP)

We didn't have anything important there, migrated to GCP a while ago.

> If you find yourself on Rackspace hosting I would absolutely migrate to something more mature.

Rackspace has actually been around longer than DigitalOcean. I'm not arguing that they're better or worse though because I haven't used either in years.

Yes - I'm aware. The age of a product does not correlate to the maturity of the product.

Thanks for the "correction"

Maturity has nothing to do with age; even 50-year-olds can be immature.

You can't be a child forever, but you can always be childish.

I wouldn't say maturity has nothing to do with age, but point taken.

There are more components to maturity than age.

Is this news? This is a block storage failure on Rackspace. EBS volumes on Amazon fail all the time, as anyone who manages a large number of instances probably knows (0.1%-0.2% per year for normal volumes):

> Amazon EBS offers a higher durability volume (io2), that is designed to provide 99.999% durability with an annual failure rate (AFR) of 0.001%, where failure refers to a complete or partial loss of the volume. For example, if you have 100,000 EBS io2 volumes running for 1 year, you should expect only one io2 volume to experience a failure. This makes io2 ideal for business-critical applications such as SAP HANA, Oracle, Microsoft SQL Server and IBM DB2 that will benefit from higher uptime. io2 volumes are 2000 times more reliable than typical commodity disk drives, which fail with an AFR of around 2%. All other Amazon EBS volumes are designed to provide 99.8%-99.9% durability with an AFR of between 0.1% - 0.2%,
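
To put those AFR figures in fleet terms, here is a rough back-of-the-envelope sketch. It assumes volumes fail independently of each other, which is a simplification (a dead host or a botched maintenance window takes out several at once), and the function names are made up for illustration:

```python
def expected_failures(volumes: int, afr_percent: float) -> float:
    """Expected number of volumes lost per year at a given AFR (in percent)."""
    return volumes * afr_percent / 100.0


def prob_at_least_one_failure(volumes: int, afr_percent: float) -> float:
    """Chance of losing at least one volume in a year.

    Assumes independent failures, which real fleets only approximate.
    """
    survive_one = 1.0 - afr_percent / 100.0
    return 1.0 - survive_one ** volumes


if __name__ == "__main__":
    # AFR values taken from the quoted AWS text: io2, then the 0.1%-0.2% range.
    for afr in (0.001, 0.1, 0.2):
        for fleet in (100, 1_000, 100_000):
            print(
                f"AFR {afr}% x {fleet} volumes: "
                f"~{expected_failures(fleet, afr):.2f} failures/year, "
                f"P(>=1 loss) = {prob_at_least_one_failure(fleet, afr):.1%}"
            )
```

With the quoted io2 numbers this reproduces the "one failure per 100,000 volumes per year" example, and at 0.1%-0.2% a fleet of a thousand normal volumes should expect to lose one or two a year.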


A few months ago I found that Rackspace had left open their prod logging system for their global load balancer / firewall system (known as BlueFlood).

Reporting the open Elasticsearch server to them was one of the most bizarre experiences of my life. Ultimately the system was exposed for about 2 weeks.

You can find some screenshots of what their backend looked like, which came in at about 3 TB daily, here: https://blog.12security.com/rackspace-bluefood-breach/

I'm surprised, always thought the Rackspace premium price was to prevent or significantly reduce these issues. That being said, I've had quite a few problems with them over the last year or so that are making me reconsider. The first-tier support they now outsource to India is frustrating. CloudFiles entries become "inaccessible" after saving for no known reason a couple of times a month. I had a managed MySQL slave replica down for 5 hours because they changed the IP of the master (mandatory maintenance) and failed to update the replica, and there was no way to do this on my own.

AWS and Azure might not be better, but they're definitely cheaper.

> I'm surprised, always thought the Rackspace premium price was to prevent or significantly reduce these issues.

I've found that these systems are often there to make the business-minded customer feel better vs. offering up any sort of highly-competent technical support. Rackspace fits this bill for me although I haven't been their customer for close to 10 years due to how badly they've burnt me.

On the contrary, free-tier Linode support has always been incredibly technically competent and has never let me down for the 10+ years I've been with them. If anyone from Linode is reading this - you guys are awesome, and your support staff is amazing. I can't say enough nice things about them as they've done so much to support me for a solid decade!

All I'm saying is paying for support vs. not paying doesn't mean you'll have any more technically competent help in my real-world experience. It just means "you're paying for support" on paper, and typically is just an up-sell from legacy hosting providers like Rackspace or GoDaddy. Free from Linode is 100X paid from Rackspace - all depends on the company.

I moved a company off Linode a few years ago because of ongoing intermittent packet loss (think 93%) for subsets of our customers, they never were able to do anything about it and just pointed fingers at the ISPs, who of course I have no direct contact or relationship with. Wouldn’t recommend.

Rackspace support is not just “help you when you are having trouble,” they will do sysadmin tasks for you, basically as much as you want. It’s more like light professional services + typical support. That’s why it costs more.

Yes, well aware. In-industry we call this "managed" and "un-managed" hosting.

All of my issues actually happened with "managed" hosting 10 years ago. I was in my early-20's and still getting my feet underneath me as a Linux administrator so having a backup was something I advocated when I was provisioning our hosting.

Frankly, it was probably pretty good for career development because their support staff was clueless with anything Linux administration, and if I needed an answer quick I was my only trustworthy resource. Like - basic LAMP (Linux, Apache2, MySQL, PHP) was completely beyond them even though that was the bread and butter of their business.

I haven't sprung for "managed" hosting ever since because it just taught me that I was the best person to handle my interests... and it lit a fire under my butt to get way better at my Linux chops!

> AWS and Azure might not be better, but they're definitely cheaper.

I can't speak for AWS as we don't use it but as a former client of Rackspace and a current client of Azure I can assure you that Azure is better and cheaper by a country mile.

> AWS and Azure might not be better, but they're definitely cheaper.

Azure has "ZERO % Annualized Failure Rate". It gives me such a cozy feeling.

Amen. We moved to Azure after a DDOS attack which took out our Rackspace instances and we have never looked back.

Rackspace was sold to a private equity company in late 2016. Most of their senior staff has jumped ship since then, and services have been decreasing in quality. It's essentially a different company, and not one I'd host anything important on.

Is there anyone that gets bought out by a PE company that doesn't go to absolute shit? QlikTech, Sencha, TravisCI all got bought and suffered a marked decline in service and roadmap confidence. Nowadays, when I hear someone is bought by a PE firm, I start looking for the exits.

I really wonder about this when it comes to the longer term viability of tech-focused PE firms. The following seem to be absolute truisms:

1. The second a company is bought out by PE, all the good employees run for the exits. Corporate America isn't exactly known for its empathy and compassion, but working for a PE-owned company means you can be sure the ownership gives 0 shits about anyone working at the company. I have never heard anything but horror stories of those working for PE-owned shops.

2. Given #1, it means the products of said companies always turn to shit pretty fast.

This makes me really wonder about how PE firms can really work their financial magic in the long term. Perhaps they're just better at extracting value out of companies that are already declining in the first place, so they just suck the blood dry before they dispose of the carcass?

That's exactly what they do. Buy when it's good, extract as much money as they can and then sell it on or run it into the ground.

Personally, I think PE works something like this: it's easier to cut costs than create value, so they buy something, and cut costs like crazy. It'll work with almost anything with enough money coming in. But I think they often cut deep enough to compromise the long term health of the company, like if you suddenly cut your food budget to almost zero by eating nothing but ramen. In the short term, it's a budget miracle. Keep it up, and you'll die from not getting the vitamins and minerals you need. Before the long term consequences are fully understood and come to pass, they sell the company back off.

I'd agree with another comment here. It doesn't even take a declining company, just one small enough to purchase (realistically, you can't buy Apple, it's just too big) with a stable amount of cash coming in. Getting the cash coming in is hard, that takes creativity and creating value, so they need that part in the bag before purchasing the company. But if you have the cash coming in, any fool can just start cutting costs.

The stable income also helps with leverage. You can use the company you're buying as security for a loan to buy the same company based on the stable income. Ta-da! That's another reason they need companies with cash coming in: leverage, so they can flip bigger companies.

I'm convinced any idiot with a billion dollars laying around could do it.

re: the latter, the company doesn't have to be declining - although that usually makes it easier to buy. The transaction from the point of view of a PE company is a very different game than say, a product person looking at what could be "done" with a company.

I remember when LimeLight Networks got bought out by PE and their services turned to crap in about six months. It got so bad we got lawyers involved to break our contract with them.

Ha! I've been through something similar (company being sold).

In my experience, a company that passes from one owner to another is basically a completely different company.

This is a shame, I had many years of fantastic experiences with them (running 100's of machines) up till about 2010 when I left a role & have been using AWS since.

This is incredibly frustrating, I can sympathize. But this is why we do backups even on cloud storage. I had the same thing happen with an EBS volume on AWS becoming unresponsive, unable to attach and load.

A few years ago I tried to cancel something, but it didn't work. Luckily my credit card was expiring anyway, so I just did nothing. But then they sent me more and more aggressive emails, so I put them in my killfile. Two years later I discovered that they were still sending me emails.

> First, RS has informed us that they're migrating some volumes and that'll require downtime.

I'm not trying to blame you, just trying to figure out what happened.

Reading the October 1st message, it didn't sound like they were going to clone and migrate it, but that they were letting you know something you could do to avoid the below work (which you didn't quote).

What was the description of the work?

Did you initiate a clone and migrate?

All storage will fail at some point, this storage failed during a maintenance window that was advertised two weeks out.

I'm somewhat surprised they are able to declare defeat in less than 24 hours, I would expect moving the drives to another host would at least be able to recover the data, but I don't know how they designed their system, and some systems are more byzantine than others.

> Following extensive troubleshooting we have been unable to bring the host server on which your volume is hosted back online ...

How is not being able to bring a server online connected to the storage that server provides?

Surely they'd remove the physical storage devices from the dead server, put them in another one, and bring it up?

... unless they're hosting their servers with a third party they can't access? eg AWS or similar, which would be kind of ironic. ;)

yeah hardware issues happen, it's unfortunate but it happens on a very regular basis. backups are important

The idea behind cloud services is that you pay someone to manage that.

I think the idea is that these simple storage volumes are much lower-level than that. If you backed them up automatically it'd be hard to provide the performance that people are using them to get. You build durable storage on top of simple storage volumes.

Yup, it's ironic, however, that in the end you have to manage that stuff yourself anyway.

It depends on how the vendor presents and handles it. I don't know what RS promises look like.

However, Hetzner Cloud, for example, has an explicit choice between triple-redundant Ceph-based storage for VMs, or hypervisor-local SSD storage. That's good and explicit. I can make an informed choice about speed or durability.

> triple-redundant ceph-based storage

Just to point out, that's still not a backup system. That's only one software bug (in your systems or theirs) away from having no working current data set. ;)

Yes, I did not mean to imply it is a backup. You still need backups. However, that redundant storage takes care of .. well redundancy, if you need it. Storage with zero redundancy on the drive or the service level (think of ES) is stressful to say the least.

Especially Rackspace -- isn't their price tag because it's all managed hosting?

On one hand, I agree with you. On the other hand, you say this is a hardware issue but do you know for a fact that is what happened?

Migration sounds to me like you migrate data from one state to the other state, then delete the old state. To me, it would be logical that before you delete the old state, you verify whether the migration was successful.
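
A minimal sketch of the copy, verify, then delete order I mean, with plain files standing in for volumes. This is not how Rackspace or anyone else actually migrates block storage; `sha256_of` and `migrate` are hypothetical helpers for illustration only:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(src: Path, dst: Path) -> None:
    """Copy src to dst and delete src only once the copy is verified."""
    shutil.copy2(src, dst)

    if sha256_of(src) != sha256_of(dst):
        # The copy is bad: discard it and leave the old state untouched.
        dst.unlink(missing_ok=True)
        raise RuntimeError(f"verification failed, keeping {src}")

    # Only now is it safe to drop the old state.
    src.unlink()
```

The point is just the ordering: the old copy stays around until the new one has been checked, so a failed migration costs you time rather than data.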

providers like this don't migrate data for no reason. there was probably a bad drive in the array or something, or some other known defect, so they are opting to migrate off the host before things get worse
