Trello is running on a diesel generator and may go down at anytime (fogcreek.com)
59 points by guiambros 1815 days ago | 65 comments

Let me update this: we've been working all day on moving Trello to AWS, a project which is now underway. If all goes well (fingers crossed, etc) we'll be up and running on AWS very soon. There are still other services in the flooded New York data center (FogBugz, Kiln, Copilot) and we are making plans to physically move servers to a new data center as quickly as possible if the flooded data center falls over, but a physical move of 5 racks could take a whole day so I hope that doesn't happen.

Separately, Stack Exchange, which is in the same data center, is running off of its hot backup in Oregon.

We would have liked to have completely redundant data centers for FogBugz on Demand and Kiln on Demand to avoid even a few hours of downtime, but because those services rely heavily on giving every customer their own SQL database, there is almost no reasonable way to get fast failover to a different data center. We can do it with Stack Exchange because there are only a couple of hundred databases. We've been building a SAN solution which will make it possible to hot swap out to another datacenter, one day, but that project is not complete.

Why would you have needed fast failover in this case? Sandy's trajectory was predictable enough that even a day-long migration could have been completed with time to spare.

> but because those services rely heavily on giving every customer their own SQL database, there is almost no reasonable way to get fast failover to a different data center

Why did you say you could do fast failover to LA in 2007? [1] It was the case then that every customer had their own SQL database too.

I'm not trying to be snarky, your team is obviously putting forth some heroic effort. I'm just an affected customer that is curious why the question seems to have two mutually exclusive answers.

[1] http://webcache.googleusercontent.com/search?q=cache:lHEK939...

I'd be curious to see if there's any update to FBOD architecture. The idea of using self-hosted servers running SQL Server and IIS already sounded eccentric 5 years ago, but given the number of options today, it'd be hard to still justify such a monolithic solution.

In all fairness, Joel recognized in the article that the initial decision was based on their previous experience with Microsoft -- and AWS was still incipient in 2007 -- so it made sense at the time. I doubt this is still the case. Even less if you consider the costs of Sandy - duplicated servers, migration, hours lost, unhappy clients, etc.

iSCSI over DRBD on Linux seems to be one solution. I know of a highly available VoIP company that uses this in production.

And maybe this sounds snarky, but AWS is your "making it better" solution? Seriously?

If aws is up and their DC is down, in what way is that not better?

Sorry for not being clear - I was referring to AWS' not so stellar uptime.

Except for this event. Where pretty much every DC in NY/NJ is compromised, yet no blips in us-east.

Probably because Amazon US-East is in Northern VA, not NY/NJ. http://aws.amazon.com/about-aws/globalinfrastructure/

NoVA didn't get hit that hard by Sandy, unlike NY/NJ, so data centers around here mostly didn't have problems. The few that I have equipment in didn't even lose utility power.

According to this post, the datacenter people are hauling 55 gallon drums up 17 flights of stairs. Diesel is ~7 pounds/gallon so a 55 gallon drum is ~385 pounds not including the container. Ouch.

Last we heard it's a bucket brigade doing 5 gallons at a time. Fog Creek president Michael Pryor is participating but mostly it seems to be the crew from Peer 1 networks doing the hard work, we don't want to take too much credit.
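A quick sketch of the arithmetic in the two comments above, using the ~7 lb/gallon figure quoted there. The function names are made up for illustration, and container weight is ignored.

```python
import math

# Commenter's rough figure: diesel weighs about 7 lb per US gallon.
DIESEL_LB_PER_GALLON = 7.0


def fuel_weight_lb(gallons: float) -> float:
    """Weight of the fuel alone, excluding the drum or bucket."""
    return gallons * DIESEL_LB_PER_GALLON


def bucket_trips(drum_gallons: float, bucket_gallons: float) -> int:
    """Bucket loads needed to carry one drum's worth of fuel."""
    return math.ceil(drum_gallons / bucket_gallons)


print(fuel_weight_lb(55))   # full 55-gallon drum
print(fuel_weight_lb(5))    # one 5-gallon bucket
print(bucket_trips(55, 5))  # bucket loads per drum
```

So a 5-gallon bucket is a far more manageable load per climb, at the cost of about eleven trips for every drum.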

I used Trello today, now I appreciate all that dragging and dropping and sorting and arguing (erm... discussion) we had around our board even more. Thank you, even though I would have kept using you guys even if it had been down all day.

Bellevue Hospital had (has?) the same kind of operation going on.

That's an insane real world human effort so that geeks can move bits around a screen!

Hopefully they have stair-dollies.

My surprise is that a respected team and high profile company would make such a basic mistake of relying on a single datacenter, within a single availability zone -- in lower Manhattan. And knowing since last week that Sandy was coming, and this was a real threat.

And I'm not even talking about Trello (which is still just a hobby for Joel & team), but this also brought down Fog Creek and all their commercial services and paying customers.

The positive side of Sandy (if there's any) is that people will really take more seriously the idea of "expect the best; plan for the worst". And Amazon will likely see a spike of new customers in the coming days.

I'm baffled that the powers that be in NYC didn't learn from Houston's experience with Tropical Storm Allison in June 2001. All the hospitals at the Texas Medical Center had their basements and ground floors flooded; there went their generators [1]. (I was part of a large group of volunteers from all over the city that helped to evacuate patients for transfer; it was spooky seeing a grand piano floating in a below-grade-level lobby of one of the hospitals.)

My then-employer's building had its basement flooded; we were on floors 19 through 25, but our electricity and phones and Internet were gone --- and this, three weeks before the end of Q2. Our developers and IT people hauled computers down the stairs, and we moved to temporary space for several weeks, but we still missed, that is, we didn't achieve the sales and revenue targets that we had forecast for analysts.

I'm given to understand that because of the lessons learned from Allison, Houston isn't quite as vulnerable to flooding any more.

[1] http://en.wikipedia.org/wiki/Tropical_Storm_Allison#Texas

The bottom line is that people just don't plan for 100 year storms. I suspect that's because the number of things that go wrong once every 100 years is so large and the cost of handling each one is so disproportionate to the risk that it's not cost effective. Think about it this way: under what circumstances would you double your hardware budget to avoid 1 day of downtime every 100 years?
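The trade-off in the comment above can be sketched as a simple expected-value comparison. Every dollar figure below is hypothetical and chosen only to illustrate the shape of the argument; the model also ignores reputation, contractual penalties, and risk aversion.

```python
def expected_annual_loss(return_period_years: float, loss_per_event: float) -> float:
    """Average yearly cost of an event that occurs once per return period."""
    return loss_per_event / return_period_years


def mitigation_pays_off(return_period_years: float,
                        loss_per_event: float,
                        annual_mitigation_cost: float) -> bool:
    """True if the yearly mitigation spend is below the expected yearly loss."""
    return annual_mitigation_cost < expected_annual_loss(
        return_period_years, loss_per_event)


# Hypothetical inputs: one day of downtime costs $200,000,
# and a second data center costs $150,000 per year.
print(expected_annual_loss(100, 200_000))
print(mitigation_pays_off(100, 200_000, 150_000))
```

With numbers like these the redundant site costs far more per year than the loss it is expected to prevent, which is the commenter's point. The conclusion flips quickly if the loss per event is much larger or the event is much more frequent.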

This is exactly right. The truth is you can spend infinite money on various scenarios, and that cost has to be paid. The manufacturer side of this is 'warranty', you guarantee that something won't break in 90 days, a year, 5 years, what have you and you price to that probability.

Another risk in New York that will take data centers offline: earthquakes. One will happen at some point, but the chance of it happening in the next 100 years is small (in contrast to Santa Clara, where we're much more likely to get an earthquake than a flood). So you could pay to have your Manhattan high rise put onto base isolators and proofed against an earthquake, but what size earthquake? Magnitude 3? Magnitude 6? Magnitude 9? And then if your building is still standing brightly after the Magnitude 9 earthquake are your fiber optics still there? How about the network tie point? Did it fall into a hole in the ground? So the cost to make all of Manhattan resistant to a 9.0 earthquake?

It's impressive that they are carrying up the fuel. I might be inclined to see if I could tractor in a 12kW generator to run one elevator. Sure you'd be burning fuel at both ends but it would be easier on the crew hauling the petrol.

Magnitude 5.0 is 1 in 100. Magnitude 6.0 is more like 1 in 500. You do a cost-benefit analysis for the different return rates.

Hospitals should definitely be built to handle a magnitude 5 or more. You're talking about a high probability of loss of life if they can't handle a 1 in 100 year event. Data centers, who cares? It's cost vs expected damage, and unless it's a safety-critical server (emergency services co-ordination?) you're probably just as well off shipping the servers somewhere else and eating the cost of the downtime.
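For a feel of what "1 in 100" and "1 in 500" mean over the life of a building, here is a minimal sketch. It assumes each year is independent with probability 1/T of the event, which is the usual textbook simplification and not something the thread states.

```python
def prob_at_least_one(return_period_years: float, horizon_years: int) -> float:
    """Chance of at least one event within the horizon,
    assuming independent years with annual probability 1/T."""
    annual_probability = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_probability) ** horizon_years


for period in (100, 500):
    for horizon in (10, 50, 100):
        p = prob_at_least_one(period, horizon)
        print(f"1-in-{period} event over {horizon} years: {p:.1%}")
```

Under that assumption a 1-in-100 event has roughly a 63% chance of showing up at least once in a century, so "rare" events are not that rare over a long enough planning horizon.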

As for getting a 12kW generator in a natural disaster, generators are in extremely high demand. You'll have trouble finding one. The other fun thing about generators in a disaster - people usually don't test them, and end up panicking in a dark basement with no idea how to switch them on and hook them up.

Due to flooding in the basement and diesel fuel spilled everywhere, they wouldn't power up the elevators even if they could. The building's entire electrical system is under a salt water/diesel mixture.

Your team's dedication is amazing.

Having been lucky enough not to be placed in the situation you are in, I doubt I could have gone to such lengths to get everything back online so quickly. I admire everything you guys are doing.


When the Brooklyn-Battery tunnel is flooded, it's hard to blame anyone else in the city for being insufficiently risk-averse.

You might as well blame a data-center for going down when the building it's in burns to the ground.

Trello has switched to AWS as of this posting. The Trello team are insane, and I'm very proud to be part of the same company that they are.

Also, pumps must always be located at the bottom to move liquid uphill by more than ~30'. Storing large tanks of diesel fuel on the 17th floor (or higher) is heavy and perhaps dangerous.

An always-submerged in-tank pump, not unlike that of a water well, powered via a sealed line from the generators above could, in principle, avoid this problem.

I hope the generators are able to power some sort of mechanical lifting arrangement. Manually hauling the drums is mighty hard.

I remember it being 30' for water, but it should be a bit more for diesel (since diesel is not as dense as water). Mercury, an obviously much denser fluid, can only be pulled up by a vacuum a much shorter distance (around 760 millimeters iirc).

Still, the extra height isn't going to buy you much.

When I was carrying buckets of diesel up the stairs, I did notice that it was a lot lighter than water. Still not great after 17 flights, but a lot lighter than a bucket full of water.

I should've remembered the density difference...

According to omniscient Wikipedia, diesel is 83% the density of water, so the 33 feet that water can be lifted via suction becomes ~40'.
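The suction limit discussed above follows from h = P_atm / (rho * g): a pump at the top can use at most one atmosphere to raise the column. A minimal sketch, assuming standard sea-level pressure and ignoring vapor pressure and friction, so real pumps manage less than this ceiling. The 0.83 relative density for diesel is the figure quoted in the thread.

```python
# Theoretical suction-lift ceiling: h = P_atm / (rho * g).
STANDARD_ATMOSPHERE_PA = 101_325.0  # standard sea-level pressure
STANDARD_GRAVITY = 9.80665          # m/s^2
WATER_DENSITY = 1000.0              # kg/m^3
METERS_TO_FEET = 3.28084


def max_suction_lift_m(relative_density: float) -> float:
    """Tallest column (in meters) one atmosphere can support
    for a liquid of the given density relative to water."""
    density = relative_density * WATER_DENSITY
    return STANDARD_ATMOSPHERE_PA / (density * STANDARD_GRAVITY)


for name, rel in [("water", 1.0), ("diesel", 0.83), ("mercury", 13.5951)]:
    height = max_suction_lift_m(rel)
    print(f"{name}: {height:.2f} m ({height * METERS_TO_FEET:.1f} ft)")
```

This gives about 34 ft for water (slightly above the rounded 33 ft used in the thread), about 41 ft for diesel, and the familiar 760 mm for mercury. Either way it is nowhere near 17 floors.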

The Trieste used the buoyancy of (incompressible) gasoline to enable a return trip from the bottom of the Mariana Trench.

Strong work lifting all that fuel!

While this is certainly a very rare event, I'm very glad that we've gone with redundant self-hosted solutions. Trello is great and is highly respected by the startup crowd, but to be held hostage to someone else's data decisions does not work for me once you get beyond the most basic level of a functioning company.

I've been involved in self-hosting from 100% down to 70% of our infrastructure for the last 15 years (the percentage has been dropping the last five to six years). This has been with companies with as few as five employees (various startups) and as many as 2500 (Netscape) - with a median of around 500 employees (large enough to have a couple of remote data centers and a local corporate data center).

I've never done the full 15 year assessment, but, on average, the smart people throwing themselves into hosting my business applications have performed at, or above, any level that I ever could.

People always talk about down time from third parties, but always seem to forget how much downtime self-hosting induces - particularly as you usually can't afford the 24x7 well-staffed NOC and operations team.

The temptation is always to say "I can do it better myself" - but the reality is you probably can find someone out there who can do it better than you, and, even when you "self host" - 95% of the time that really means relying on someone else to provide you a data center with generators/diesel/HVAC/security/SONET/etc...

With that said - Fog Creek is a tiny little operation, that probably operates at, or below the level of our own Operations team - so it's a coin toss as to whether they would do better than us at hosting.

On the flip side - I'm pretty confident that Gmail, on average, has had better uptime the last 10 years than the mail servers I've managed. (Though, the last 5 years have been remarkably stable for self-hosted email. Exchange has come a long way...)

Indeed! For me all hosted solutions are automatically dead. Too many startups just disappear so quickly that I would never do business with them. (And while FogCreek hasn't disappeared so far, at least WebPutty is gone)

Given that this is an unpopular view around here anyway, I want to add another thought:

If your company has its own webpage at datacenter A, including payment from datacenter/service B with CDN C and Twitter-integration D, using hosted FogBugz for support mails, hosting its CSS stylesheets on TRELLO which was backed by Google AppEngine, and using email provider E...

Why don't you have downtime at least once a quarter???

How can a system that relies on so many components and other companies be reliable? With all those AWS outages? I don't understand this.

If you host most of your stuff yourself, either everything works, or nothing. But not all goes down at the same time.

Just to be clear, WebPutty is not gone. It's operational for another few months on Fog Creek's dime[1]. After that (or now, if you'd like) you can easily host it yourself on App Engine, probably for free, now that it's open source[2].

[1] https://www.webputty.net/ [2] https://github.com/fogcreek/webputty

If a service one once used is retired, open sourcing it is of course the very best outcome. Thanks for that!!!

However (regarding AppEngine based solutions), as far as I know, if AppEngine goes down, there is no way to have any backup, is there? Only Google can host AppEngine apps.

This is always true until you're the one without hosting.

This story reminds me a little of this: http://en.wikipedia.org/wiki/Interdictor_(blog)

It turns out that many backup power plans have not been designed with longterm flooding in mind.

The Trello database can't be that big. Why not just copy it, zip it up, and transfer it over to the west coast?

I can't tell for sure, but a little Googling seems to indicate that there is natural gas available in at least some New York neighborhoods.

Why aren't data centers built where natural gas is available? A natural gas powered generator could run from the gas lines, which should be more reliable than fuel that has to be trucked in.

Natural gas and steam are pervasive in NYC. I think they are used less often because you cannot control delivery and running big gas pipes in a retrofitted building is expensive.

Keep in mind that Manhattan has NEVER flooded before.

I suspect because Natural Gas Lines can fail in a disaster situation as well, where you can "always" truck in diesel. If I'm ever building an insane data center I'd do both now!

NYU operates a CoGen plant under 251 Mercer that runs on natural gas and can fail over to diesel. It provides power, steam, and cold water for air conditioning to many of surrounding properties. The net result is that NYU servers frequently stay up during power outages, while their ISP usually goes down.

"The superhuman folks at the data center are hauling 55 gallon drums of diesel fuel up 17 flights of stairs."

I can't for the life of me think why someone would put a backup generator 17 floors up, these guys don't know you can buy mains cables longer than 4 feet?

If it absolutely has to be 17 floors up, buy a fucking pump.

Geeks. Sometimes we're so dumb it hurts.

From http://status.fogcreek.com/2012/10/fog-creek-services-update...:

"Here's the physical situation:

The generators are on a high floor in the building and the pumps supplying the generators with fuel are submerged. The best option at this point is for people to physically lug diesel up over a dozen floors, or make other arrangements for pumping fuel to that high floor. "

Yes, a generator not in the basement / ground level makes sense. When a flood happens, it usually happens from the ground floor up ;) Thus why generators are at the upper levels.

Even in this case, using a pump solution makes little sense. You need electricity to power the pump; I suppose you could power it with electricity generated from the remaining fuel. But a 17 floor pipe containing fuel would probably be heavier (and more dangerous) than a long mains power cord, and running a 17 floor cable down to a pump at the bottom of the run makes no more sense than having just the pipe of fuel running the length of 17 floors with the pump at the top of the run.

Time for a winch? I don't think Home Depot stocks a 17 story pipe, and relocating the generator probably will take more time than just winching / carrying up the barrels.

I believe having the pumps at the bottom makes sense because you only have ~14PSI of atmospheric pressure to use to pull the fuel up if the pump were at the top, but you can generate as much pressure as you need at the bottom to push the fuel up.

Generators higher up makes sense, but 17 stories high? I suppose real-estate concerns limit where you can put them though.
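To put rough numbers on the push-versus-pull point above: the static pressure a bottom-mounted pump must supply is P = rho * g * h. This sketch assumes 4 m per floor (an assumption, not a figure from the thread), uses the 83% density quoted elsewhere in the discussion, and ignores pipe friction.

```python
PA_PER_PSI = 6894.757
STANDARD_GRAVITY = 9.80665       # m/s^2
DIESEL_DENSITY = 830.0           # kg/m^3, from the 83%-of-water figure
ASSUMED_FLOOR_HEIGHT_M = 4.0     # assumption for illustration only
ATMOSPHERE_PSI = 101_325.0 / PA_PER_PSI


def push_pressure_psi(height_m: float, density: float = DIESEL_DENSITY) -> float:
    """Static pressure (psi) needed at the bottom to hold a column this tall."""
    return density * STANDARD_GRAVITY * height_m / PA_PER_PSI


height = 17 * ASSUMED_FLOOR_HEIGHT_M
print(f"Column height: {height:.0f} m")
print(f"Needed at the bottom: {push_pressure_psi(height):.1f} psi")
print(f"Available to a pump at the top: {ATMOSPHERE_PSI:.1f} psi")
```

Under these assumptions the pump has to deliver on the order of 80 psi, several times what atmospheric pressure can offer a pump pulling from the top, which is why the pumps sat in the basement that flooded.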

In this case, I think you are correct. They need lots of exhaust, so they have to be on the roof. One Police Plaza just tore the side of their building off to make room for generator exhaust. It's a war zone in lower Manhattan right now.

Wow, that is pretty nuts. Did they install temporary generators, or did they run into unanticipated ventilation issues with existing generators?

Why not store the fuel on the same floor as the generators?

I understand that this might be a fire-safety issue. In which case I would agree that a winch should have been installed.

(I'm possibly missing something else obvious here!)

If you'd read the previous articles about the hurricane's effects on datacenters, you'd know that the diesel pumps in many buildings were wiped out by flooding.

"buy a fucking pump"

I suspect this is good advice, but the image that popped into my head was an elevated generator, a ground-level tank, and an electric pump reliant on either mains or generator power. "It's always worked during our monthly tests!" Whoops!

Probably the elevated generator should be installed with at least a small gravity-fed tank, easily refilled once the pump is operating. I doubt anyone wants a giant fuel depot anywhere inside or on the roof of the building.

Keep in mind that for heights taller than 3 floors (and practically, the limit is shorter than that), pumps have to be located on the same level as the fuel supply because the atmospheric column only has so much mass. Pumps and their electrical inputs should probably be installed with waterproofing if flooding is considered likely.

The fuel is stored in the basement of the datacenter. That flooded, taking out the pump that was supposed to move the fuel stored in the basement up to the 17th floor.

This has nothing to do with buying a pump, that was already there, this has to do with the fact that the diesel tanks and the pumps for them are under water.

In the future I hope they put the pumps on higher floors, and make sure the fuel tanks are completely sealed, so even if the tanks are flooded the pump can keep pumping fuel out of it.

Mind you they are in lower Manhattan, not some random suburban office park.

That was probably their first mistake.

How so? This storm was a whopper. To the point that buoys were reporting waves 5 times taller than anything on record. Historically, Manhattan has been a fine place for a data center. Moreover, Peer1 survived the 2003 blackout, the only other recent event of this magnitude, without service disruption.

You can armchair quarterback this all you want, but unless you're actually making the decisions, perhaps you should consider the information available.

Well sure, and everything is fine until it isn't. It would be impossible for Manhattan to have any kind of Internet without peering and NAPs, but for hosting? I'll choose Nowheresville anytime, which I have done before.

I'm sure there are situations where you need to get someone to the DC quickly, so a Manhattan DC makes a lot of sense for an NYC based company. The current solution already worked the VAST majority of the time too. If they'd hosted it in central nowhere, people would criticize them for not planning appropriately for the times they need someone at the DC.

LOL, I agree. There are a few small data centers in Manhattan. Actually, if they choose a data center like Amazon, they can expect an outage once in a while....

In most buildings, the generator is on the top floor, just saying. Source: I worked at the CBC on Queen Street in Ottawa, ON, and the generator is on the top floor.

Makes sense actually, probably in case of flooding.

Doesn't make that pump any less sensible though!

Apparently the pumps got flooded: http://status.fogcreek.com/2012/10/fog-creek-services-update... (near the bottom)

Well, now I do feel like a cunt.

Past my bedtime, so didn't connect this with that wee bit of weather the East coast USA experienced recently.

Sincerely hope lugging diesel remains the least of their worries.

This is reminiscent of what happened at Fukushima Daiichi, although with less dire consequences.

I suspect that's for ventilation reasons

Why would you host in New York, how dumb? Everyone knows that when aliens attack, disaster strikes, time for nuclear armageddon or it's the end of the world, that it all starts in New York. I've seen the movies so there!

What will I possibly do if I have to go without Trello during a nuclear armageddon.

Go back to reddit.
