Hacker News
Heroku down? (heroku.com)
155 points by neeleshs on June 15, 2012 | 67 comments



This is a more widespread EC2/EBS issue: http://status.aws.amazon.com/


I couldn't see any red circles indicating an issue with EC2/EBS.


The circle is green with a little "note" on it.

"8:50 PM PDT We are investigating degraded performance for some volumes in a single AZ in the us-east-1 region."


A green circle with a little note on it, for "There was a complete power failure"

God only knows what they reserve the red circle for...


Wouldn't that only affect a small subset of visitors? For example, why would I be seeing any issues if I were hitting an Asia Pacific volume instead of a us-east one? Seems like it goes deeper than that.


One problem which we've seen before is: If a large percentage of AWS infrastructure goes down, the customers don't just quietly suffer. Instead they scramble to try and launch infrastructure in other zones or regions, which creates a cascading series of load spikes throughout the AWS system.

AWS is a fascinating science experiment. Pity about the websites, though.


It's now yellow with this: "9:27 PM PDT We continue to investigate this issue. We can confirm that there is both impact to volumes and instances in a single AZ in US-EAST-1 Region. We are also experiencing increased error rates and latencies on the EC2 APIs in the US-EAST-1 Region."

AWS has been historically bad at reporting the severity of their outages promptly.


Is Heroku still in beta? The name has been around for a while. I would have thought the platform would have stabilized and complete outages were very unlikely.

In the past 30 days they have had 2 outages which have lasted more than 2 hours. That's a lot of down time.


This is an Amazon outage.


Not very interesting for a customer of Heroku.


Any customer of Heroku who didn't weigh the cost/benefit of Cloud hosting is a fool and deserves to watch their app have downtime.

Cloud hosting is fantastic but it's a trade off. There are so many layers of abstraction between you and the hardware that you are completely at the mercy of one, two or (more!) technical organizations, each with their own support systems and varying levels of opacity into their infrastructure.

The fact that Amazon went down IS VERY interesting for a customer of Heroku.

And if it isn't, then that customer is a fool for outsourcing so much of their system without even understanding the risks involved.


I don't know; I think Heroku should offer this value proposition: "Hey, we have it covered. If we rely on another cloud, it's on us to build redundancy and reliability on top of it so as not to burden you, the app developer who is paying us to take care of operations and scaling."

What would interest me about an Amazon outage being behind a Heroku outage is keeping a tally. If Heroku didn't manage to build in enough reliability to survive even an Amazon outage in a single region, I'd question whether they were a good fit.


And Amazon had a power outage. That does not mean that customers can't (or won't) blame Amazon.


Dear Heroku -- I know it's my job to make sure my site is available (/thread). However, I think I speak for most enterprise customers when I say I will throw money at your company the second you come up with a multi-zone/highly-available offering.


Throw money at http://appfog.com/; they already support multiple datacenters.


Their pricing page https://console.appfog.com/pricing isn't very enlightening


It's an open offer. Do they support Python? Edit: they do, but it's not clear how to easily deploy a multi-zone application. Could you point me toward some docs?


Right now any single app can be deployed on any one of a number of infrastructures; AppFog is also working on the ability to run one app on multiple infrastructures simultaneously.


AppFog has outages too: http://blog.appfog.com/october-27th-downtime-postmortem/. When you go with a public cloud, your app's well-being is always in third-party hands. If you want to mitigate the risk, host your own private PaaS on your own infrastructure; then you can only point your finger at yourself when outages occur.


Looks like they support everything except Python :(


AppFog supports Python!


https://status.heroku.com/incidents/151

Looks like they didn't actually do any of the remediation steps.


Dear competitors,

Please take this outage as proof that you need to build out your own infrastructure and hire your own operations team in multiple geographic locations.

In the meantime, we will continue to focus on building new features and products that our customers love on our EC2, Heroku, and cloud-based system.


Nah, I'm going to pick dedicated hardware with SSDs for IO consistency. It beats the pants off AWS for a fraction of the price and not much more time commitment, and I'll take that over AWS's potentially better uptime.


not much more time commitment

This is fiction.


How much time commitment do you figure on when your site is down and paying customers are calling and emailing you?


If reliability is that critical, you need multiple data centers. This is far easier to implement with EC2 than by building out hardware.

Also: Most downtime is caused by bad code deployment, poorly-conceived network or system configuration changes, and sysadmins with fat fingers. Do you really think your hired talent is going to be better than Amazon's hired talent?


Sorry to be rude, but that is a bit of an incendiary comment. However, given stats for the past 12 months, I can show that availability for the 2 racks I manage is, in fact, higher than AWS's. Thanks for the compliment!


I don't consider a few hours per month administering servers to be much more than the time I would spend working around Heroku's proprietary app model for the more esoteric things I want to do. I would guess it's less, but I don't really know. Throw saving thousands a month into the mix and it's not something I'd lose sleep over. Database servers that can handle tens of thousands of IOPS and web servers that can handle thousands of uncached requests per second make machine administration a lot more pleasant than it was just 5 years ago. Hardware has been scaling vertically quickly enough that it's no longer strictly necessary to scale horizontally in a massive way as you grow.


This is fact.

Honestly, it's not that hard to set up a server, and furthermore, it's just not that different to maintain a hard server than a virtual server.

And the most important part: when you have your own hardware, you at least maintain control over every aspect of your systems. The value of this cannot be overstated.


it's just not that different to maintain a hard server than a virtual server

I'm guessing you haven't used Heroku. Server setup is "git push heroku master".

you at least maintain control over every aspect of your systems

There are still plenty of things you don't control - network feeds to your cage, continuous power, bugs and failures in the hardware you buy. You cannot provision new systems without either buying machines or having a hot standby, and somebody needs to make a trip to the cage. If you're getting hardware by the month from a service, you will probably have faster turnaround (hours not days), but once again you're giving up some control.


OR, buy a couple of monster 64GB+ RAM systems, add SSDs, and place in LA or Denver.

I have an ancient quad 900 MHz Xeon with 6GB RAM (the customer does not want to migrate) that has an uptime of 1600 days, and total network issues during that period amounted to a few hours (wonky power to a switch).

Cloud is too often comparable to "vapor" in terms of the claims of redundancy and availability.


This solution naively assumes a fixed resource need. Growing startups steadily provision additional resources.

They also try new things with large amounts of data. This requires scaling up to additional machines for hours or days at a time, and then scaling back some or most of them when they optimize new ideas and services for production.

When the new services are a hit with clients, traffic increases and the whole cycle starts again.


And you've noted the EXACT use case for the cloud. If you are a new app, or a hot start-up and get tons of sporadic traffic, the cloud is absolutely where you need to be. To do anything else is beyond foolish.

Now, what I think the original post there is speaking about is for mid to large size enterprise companies that have stable, but significant traffic. In cases like this they must do a cost-benefit calculation because you risk a lot if you don't. Then the cloud might very well not be the right solution, because costs could be 10x more than anything else... so the answer in my mind is not always clear.


I believe FriendFeed ran off of a single machine for most of its life. A few years down the line, I think most people who have only used EC2/Heroku would be shocked at how much traffic a single recent 8+ core Xeon machine with 32+ GB of RAM and RAIDed SSDs can handle, and at what price. Before you reach that point, even a mid-tier VPS is probably a better option than EC2 for most.

Sure, EC2 is probably best for a startup that expects to double every week from a nontrivial starting point and has large machine resource needs per user (viral video startups, for example). The vast, vast majority of startups won't have anything that resembles that kind of growth graph, though, and thus shouldn't blindly follow what the Pinterests of the world do. It's a completely different type of demand. If they find out that they actually are going to have double digit daily organic growth percentages, then they can switch to EC2 before it gets out of hand, but otherwise, it's premature optimization.


http://quora.com is down, too. Seems like an AWS outage.


Once again, proof that "the cloud" isn't always the best solution. I am amazed that a cloud provider like Amazon can still suffer outages, considering the size of its infrastructure and how decoupled it supposedly is; obviously it's not decoupled enough. Perhaps it's my lack of understanding of cloud hosting, but issues like this show that cloud hosting still has a long way to go.


We've distributed our machines between availability zones. We lost a machine with the latest outage. Application impact to our users? None.

No point bemoaning a lack of decoupling, if you don't actually use it.
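The point about using the decoupling that already exists can be made concrete. Below is a minimal sketch of round-robin placement across availability zones, using hypothetical zone and instance names. It calls no AWS API. It only shows that spreading six instances over three zones leaves two thirds of capacity when one zone fails, while putting them all in one zone leaves nothing.

```python
from collections import Counter

# Hypothetical zone and instance names; no AWS API is called here.
ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]


def place_round_robin(instance_ids, zones):
    """Assign instances to zones in turn, so zone counts differ by at most one."""
    return {inst: zones[i % len(zones)] for i, inst in enumerate(instance_ids)}


def surviving_fraction(placement, failed_zone):
    """Fraction of instances still running if every instance in one zone is lost."""
    alive = sum(1 for zone in placement.values() if zone != failed_zone)
    return alive / len(placement)


placement = place_round_robin([f"web-{n}" for n in range(6)], ZONES)
print(Counter(placement.values()))                  # two instances per zone
print(surviving_fraction(placement, "us-east-1a"))  # 4 of 6 instances survive
```

The same check run against a placement that puts everything in one zone returns 0.0, which is the single-AZ situation the thread complains about.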


It's not like Amazon haven't had multizone outages though.


What's the alternative? Go it alone and run your own infrastructure?


Agreed. 99.97% uptime for May is excellent. I've worked with a lot of enterprise systems (including military!) and we would love to have that high a percentage of uptime.


My cheap shared hosting provider has 99.98% this month, and they give you a free month if it goes below 99.8%, which has only happened once in two years.

Somehow Heroku doesn't seem that great to me.
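It can help to translate these percentages into minutes of allowed downtime. The sketch below assumes May's 31 days for the 99.97% figure. For the 99.8% credit threshold it assumes a 30-day month. It gives roughly 13.4 minutes for the former and 86.4 minutes for the latter.

```python
def downtime_budget_minutes(uptime_pct, days):
    """Minutes of downtime allowed over `days` days at `uptime_pct` percent uptime."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)


# May has 31 days; a 30-day month is assumed for the 99.8% credit threshold.
may_budget = downtime_budget_minutes(99.97, 31)
credit_budget = downtime_budget_minutes(99.8, 30)
print(round(may_budget, 1))     # 13.4
print(round(credit_budget, 1))  # 86.4
```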


C'mon. Your cheap shared hosting doesn't compare when it comes to scale and features.


Yes, but it doesn't compare when it comes to price either, considering a single Dyno costs $36/month when I'm paying $44/year for the whole plan.

(And that comes with excellent support. For example, I asked if they were planning to offer Python and they said "Sure, just give us a couple of days to set up a machine with Python for you", even though I was only interested in the cheapest plan.)

I know they don't serve the same market, but I find it strange that a service that costs an order of magnitude more doesn't have a better uptime than cheap shared hosting.
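The "order of magnitude" claim checks out using the two prices quoted in the comment. The sketch compares one dyno at $36/month against the $44/year shared plan. It ignores any free tier or add-ons, and that omission is an assumption.

```python
DYNO_MONTHLY_USD = 36   # single Heroku dyno, as quoted in the comment
SHARED_YEARLY_USD = 44  # whole shared-hosting plan, as quoted in the comment

dyno_yearly = DYNO_MONTHLY_USD * 12
ratio = dyno_yearly / SHARED_YEARLY_USD
print(dyno_yearly, round(ratio, 1))  # 432 9.8
```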


Primarily because in order for amazon to pull a profit, they're not stuffing these systems in high-quality datacenters ;)


Going on last week's outage of 2 hours and last night's of 8 hours, that puts June's uptime at ~98.6%.


Only if you count small outages which affected a fraction of the customers as affecting the entire service. I had 100% uptime during that period using only the multi-AZ redundancy.


EBS is a deep, dark black magic that time has shown doesn't actually work.


I'm not a big EC2 fan but there is not much "black" about network volumes. Also a few hundred thousand customers seem to disagree with the "doesn't work" part.


"proof" - I do not think that word means what you think it means.


I'm pretty sure nothing qualifies as "always the best" solution. "The cloud" can be an imperfect solution yet still be the best solution for certain apps.


Cloud is an excellent solution. The issue is companies not taking advantage of multiple availability zones.


Is anyone else getting a message that "sathish@DOMAIN.com has been unsubscribed from future notifications."? (Redacted just in case.)

There's a notification at the top of the page for me with that message, but it didn't appear in Chrome. Session collision maybe?


I also received that message.


Their current status is that they are investigating issues with their infrastructure provider: https://status.heroku.com/


Very cool new status page look. Timeline style! Kudos to Heroku's team. Too bad I'm in such a bad mood when I visit it.


It seems that Heroku has been down for at least 4 hours in June. This makes the June uptime less than 99.5%.


Between this (now 8 hour!?!?) stretch and the 2 hour outage last week, June uptime is down in the mid-98% range.


There have been ~12.5 hours of downtime in June. That means their uptime for June so far is 96.5%. No wonder they decided to show May uptime instead.
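The June figures above depend on what period the downtime is divided by. The sketch below assumes a 30-day June. For the last line it assumes that about 15 days had elapsed when the thread was posted. Under those assumptions, 4 hours gives just under 99.5% and 10 hours gives about 98.6%. The 96.5% figure only holds against the elapsed days; over the whole month the same 12.5 hours would be about 98.3%.

```python
def uptime_pct(downtime_hours, period_hours):
    """Percentage of the period during which the service was up."""
    return 100 * (1 - downtime_hours / period_hours)


JUNE_HOURS = 30 * 24     # full month: 720 hours
ELAPSED_HOURS = 15 * 24  # assumption: about half of June had elapsed

print(round(uptime_pct(4, JUNE_HOURS), 2))        # 99.44
print(round(uptime_pct(2 + 8, JUNE_HOURS), 2))    # 98.61
print(round(uptime_pct(12.5, JUNE_HOURS), 2))     # 98.26
print(round(uptime_pct(12.5, ELAPSED_HOURS), 2))  # 96.53
```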


Yup. Serving up error messages for me. Let's hope things are resolved a bit quicker than last time.


http://hootsuite.com is offline as well.


I've only been using Heroku for the last month or so, since we're working with a client who manages it themselves. I'm surprised at how much downtime there's been. Is this typical, or is it just an unlucky stretch?


This is why http://AppFog.com/ is investing in multiple IaaS and is not being hit nearly as hard. You can still sign up and even create apps.


Looks like they are back. Now the fun part starts for the rest of us. Time to make sure that everything started and that the apps are running.


My apps are down, and so is the heroku.com homepage.


Yes it is. Again. Damn it.


Heroku is back up.



