What the hell. It is so easy to configure multi-region glacier backups, mfa dele...

gregrata · on March 18, 2019

The key words you probably need to look at are "multi-petabyte". Not saying they shouldn't be doing something but it all costs - and at multi-petabytes, it cooooosts

1 Petabyte (and they have multiple) S3 - $30,000 a month, $360,000 a year

S3 - reduced redundancy - $24,000 a month, $288,000 a year

S3 - infrequent access - $13,100 a month, $157,000 a year

Glacier - $7340 a month - $88,000 a year

zxcvbn4038 · on March 18, 2019

Add in transit and cdn and Tumblr’s AWS bill was seven figures a month. A bunch of us wanted to build something like Facebook’s haystack do away with S3 altogether, but the idea kept getting killed because of concerns over all the places the S3 URLs were hard coded and also breaking 3rd party links to content in the bucket (for years you could link to the bucket directly - still can for content more then a couple years old)

PostOnce · on March 18, 2019

Well, the business was acquired for $500,000,000 and a single employee probably costs what backing up two petabytes of data for a year (on glacier) does.

They could also always use tapes, for something as critical as the data that is the blood of your business.

Imagine if facebook lost everyones' contact lists, how bad would that be for their business? Backups are cheap insurance.

FussyZeus · on March 18, 2019

Backups are still a hard sell for management, though. No matter how many companies die a quick and painful death when they lose too much business critical data, the bossmen just can't wrap their heads around spending $100k for what they perceive as no benefit.

Same problems with buying things like antivirus software or even IT management utilities; when they're doing their job, there's no perceivable difference. It's only when shit goes sideways that the value is demonstrated.

Hell you could take this a step further for IT as a whole; if IT is doing their job well, they're invisible. Then they can the entire department, outsource to offsite support, and the business starts hemorrhaging employees and revenue because nobody can get anything done.

magduf · on March 18, 2019

>No matter how many companies die a quick and painful death when they lose too much business critical data, the bossmen just can't wrap their heads around spending $100k for what they perceive as no benefit.

Yeah, but what exactly IS the benefit? The business doesn't die if something really bad happens? Is that really important though?

Consider the two alternatives:

1) The business spends $x00k/year on backups. IF something happens, they're saved, and business continues as normal. However, this money comes out of their bottom line, making them less profitable.

2) The business doesn't bother with backups, and has more profit. The management can get bigger bonuses. But IF something bad happens, the company goes under, but then what happens to the managers who made these decisions? They just go on to another job at another company, right?

I'm not sure I see the benefit of backups here.

FussyZeus · on March 18, 2019

> Yeah, but what exactly IS the benefit? The business doesn't die if something really bad happens? Is that really important though?

I mean the way management gets on me when we have outages, you'd think that was a significant priority?

magduf · on March 18, 2019

They can get more money in the short term by pushing you harder, and there's zero cost to them to go yell at you. If they could get a bigger bonus by ignoring outages, they'd do that, but instead, they can get a bigger bonus by pushing you to reduce outages without any additional resources.

meko · on March 20, 2019

You'd be absolutely right but it's still a sad state of affairs.

softawre · on March 18, 2019

The managers that make these decisions need to have equity.

magduf · on March 18, 2019

Seems like they do just fine with big golden parachutes. Why tie their compensation to the company's performance when they can just have a big payout whenever they leave under any circumstances?

ConceptJunkie · on March 18, 2019

I worked at a place that lost their entire CVS repository. The only reason they were able to restore it at all was because I made daily backups of the code myself. Sure, a lot of context data was still still lost, but at least there was some history preserved.

antt · on March 18, 2019

They are expensive until the business goes bankrupt.

Bartweiss · on March 19, 2019

I wouldn't be surprised if this was actually the rationale for not having backups.

Tumblr is apparently fragile and tech-debt laden on engineering side, stagnant on users, and unprofitable. At a certain point, it's a coherent decision to just say "a few days of downtime would seal our fate, the business can only be saved if everything goes right", and not spend any money on mitigation.

ummonk · on March 18, 2019

88k per year per petabyte is a small price to pay to protect your entire business from being wiped out.

OscarTheGrinch · on March 18, 2019

Devil's advocate: it depends on how many petabytes you have. This cloud of uncertainty over your uploads could be seen as the hidden cost of using a free platform.

dotancohen · on March 18, 2019

> cloud of uncertainty

So far as Myspace (or Tumblr apparently) is concerned, it is "somebody else's computer of uncertainty".

pmlnr · on March 18, 2019

There are Supermicro chassis' out there with 106x14TB drives in 4u, super deep racks.

1PB is nothing today.

idlewords · on March 19, 2019

Building such a storage behemoth is not the challenging part. Filling it with data, backing it up, and keeping the RAID rebuild time under load on such monster drives below the average drive failure time is the challenging part.

crest · on March 19, 2019

At that scale it makes sense to start thinking about alternatives to RAID e.g. an object storage with erasure coding should work well for a code base already using the S3 API. In theory even minio should be enough, but I never had enough spare hardware to perform a load test of that scale.

bufferoverflow · on March 18, 2019

Or they can just have their own backup solution for a lot cheaper. 8TB = $140 on Amazon.

1 petabyte = 125 drives = $17,500 (one-time cost).

It will probably cost more to connect all these drives to some sort of a server. Though 125 is within the realm of what a simple USB should be able to handle (127 devices per controller).

whoami_whereami · on March 18, 2019

And how many days of downtime are you willing to tolerate while you are restoring that petabyte of data from your contraption? Let's say you have a 10Gbps internet connection (not cheap) all the way through to the Amazon data center, the data transfer will only take about 12 days per petabyte then.

Getting petabytes of storage isn't the problem, transferring the data back and forth is.

Bartweiss · on March 19, 2019

This is all true, but it sort of presupposes competence.

Taking a full month to recover a downed social media platform isn't really acceptable, but it's still better than being literally unable to recover it at all. Spending a small fortune to ship hardware to an AWS datacenter and convincing/paying them to load it directly would probably also be worthwhile, when we're talking about simply losing a $500M company. If the claim here about "no backup" is true, it's so profoundly stupid that everything I know about best practices sort of goes out the window. Approaches that any sensible person would consider unacceptably slow and unreliable are still a step up from a completely blank playbook.

(I guess the theory might be that Tumblr is such a trashfire it can't be restored, or would lose so much value in days/weeks of downtime that there's no point in even planning for that. Again, I don't really know how you run cost-benefit analyses when it's not entirely clear the project has benefits.)

yayr · on March 19, 2019

you can just colocate that server

whoami_whereami · on March 19, 2019

And where does Amazon offer colo services? What they offer is Direct Connect at certain (non-Amazon) data centers. That costs about 20k per year for a 10Gb port, ON TOP of the colocation and cross connect fees you are paying at the data center where you want to establish the connection. If you want to bring the restore time down to 12 hours, you need 24 connections (and you need at least as many servers, no single server can handle 240Gb of traffic), so we are now at about 480k+X (large X!) per year per petabyte just for the connections you need in case you have a catastrophic failure (establishing such a connection takes days or even weeks, even if ports are available immediately, so you can't establish the connections "on demand").

That's not even talking about availability, as you are now getting into the realm where it starts to get questionable whether even Amazon has enough backhaul capacity available at those locations so that you can actually max out 50+ 10Gb connections simultaneously.

QuinnyPig · on March 19, 2019

At this scale there is no “just.”

lugg · on March 18, 2019

That's like a developer or two..

Wth?

zwily · on March 18, 2019

MFA delete at least doesn’t cost any extra.

quotemstr · on March 18, 2019

So, roughly the cost of one or two good engineers? Not having backups is penny wise and pound foolish.

ConceptJunkie · on March 18, 2019

"Penny wise and pound foolish" is the universal motto of management everywhere.

de_watcher · on March 18, 2019

They'll lose it as soon as they try to configure that.