Then I clicked on the "down=no" link for fun, and the page partially loaded for me. I refreshed it and got the whole front page loaded. And then one more time and got the "Service unavailable" again...
This applies more to purchases from a single, exceptional source than to those that can be made from multiple providers. Say my usual lunch spot is closed or out of an item: I can walk down the street to another place (or a drugstore, etc.). However, if you're selling hard-to-find exclusive items, or we've got an established relationship and I don't need the item right now, I'll simply get it later.
On the macro scale, it makes me suspect that shorter interruptions to service don't have a significant regional financial impact.
Though this is all armchair economics.
If I may, could I suggest another possibility:
People know, and trust, Amazon.
"I can't load the shopping site. Maybe it'll work if I reboot."
It's quite different for a site that's not as well known as this one.
From a customer standpoint, their A-to-z Guarantee probably helps a lot.
Musing over my own post, I can think of instances where the same would not be true. A financial trading platform in particular: trades are already occurring at volume, so lost time means lost trades.
Thanks for sharing a non-obvious data point.
I thought for sure I'd have missed it and this would be one of those reports where the service was back up before the story gained traction, but as of 12:07 PM Pacific/US time I cannot navigate to Amazon's home page.
The amazing thing about this for me is that it reminds me that it was only a few years ago that even the biggest sites would have fairly frequent multi-hour outages, but these days it is pretty rare for this sort of thing to happen, particularly on a retail or otherwise direct-money generating site.
In a factory in 1965, maybe, but no good employer is going to fire someone for making a mistake, no matter how costly.
Depends on the factory, and I doubt it's much more or less likely in the 21st century than it was in 1965. In all times and places, you have enlightened and unenlightened people. In all times and places you have good and bad leaders.
"In every time, in every place, the deeds of men remain the same."
Citation needed. I'm pretty sure there are fewer war deaths and fewer kittens burned for entertainment per head now than 200 years ago.
Yes, there is less bad stuff, but the quote says nothing of frequency.
The problem with your attitude is that it's based upon a premise that is almost never true: that screwups are caused by incompetence, and that they have singular (or overwhelmingly singular) sources.
Neither of these assumptions bears out in reality, and certainly not in our industry.
The vast majority of downtime events trace back to systemic failures rather than freak events, and are more often catalyzed by momentary lapses than by long-standing incompetence. Do we penalize the tech who clicked the wrong link on a dashboard, or the guy who built the dashboard so that a critical action has no safeties or confirmations? Or do we penalize the manager for not establishing any documented protocols for triggering critical actions?
The only reasonable stance here is to collectively take responsibility for the failure. It may feel good to hang someone out to dry, but in all likelihood their failure was only the final link in a long chain of failures that extended well beyond themselves.
You root-cause the event (going deeper than "a tech clicked on the wrong thing"), fix that root cause, and move on.
A team of good people should learn from their mistakes and reduce hazards along the way. But bumper bowling is no fun for experienced players. It's a balance, and it does tend to shift as a company grows.
The store went through periods of relative stability and relative instability, and in the periods when it was not doing so well, it (or a major piece of functionality) would go down in some key area at least once a week, sometimes multiple times a week during the holidays.
While it's been several years and I'm sure they've improved reliability, the sheer mass of the store made it very slow to evolve. As an ex-Amazonian, I sometimes go and check for bugs that were issues back in the day; several of them have come back over the years, which is not surprising given that the entire group working on the parts I worked on disbanded because bad management drove so many people off. (A one-two punch in that case: a bad manager backed by another bad manager, neither of whom had any technical knowledge.)
At the time I worked there, large swaths of code in the store had no responsible team, because the owning team had been disbanded in one of the regular employee shuffles. Amazon had a tendency to assemble a team to build a feature, launch it, get the PR and the stock bump, then disband the team and put its members on other projects. Of course, some of these features stuck around if they were successful, but there was a lot of cruft from past efforts: local restaurant menus, the movie times system, various "social shopping features" (a perennial favorite to try again and again). Hell, they used to have catalogs for mail-order merchants: scanned paper catalogs!
At the time, they were claiming that "AWS is what we built the Amazon store on!" (This was totally false: S3 was engineered completely separately from the store, and to its credit, since Obidos and Gurupa were crap. The only thing the store shared with AWS for at least the first several years was being hosted in some of the same datacenters.)
At least at the time I worked there, I'd call it a mess held together by the code equivalent of duct tape and baling wire.
One of the things Amazon excels at is customer service, so when these problems affected customers, their bacon was often saved by customer support fixing the problem manually (e.g., messed-up orders).
Granted, operating at Amazon's scale is no trivial matter. But Amazon is more a retailer and stock-marketing company (e.g., one of their primary products is Amazon stock) than an engineering company.
I'm kinda amazed that people perceive them as a "tech giant" along with Google and Facebook. That shows the power of a good (actually, GREAT) side business like AWS. They get the credit for building something good and scalable with AWS, but of course it was a separate team led by a senior executive with enough political clout to shelter that team.
Except for the part where Gurupa enables scores of developers to build web apps that make hundreds of service calls yet emit results faster than the website we're using right now.
As for why, it's easy: taxes. Much like Walmart rents its stores from an LLC it owns to write off taxes and limit the liability of its largest revenue sector, Amazon can write off its server costs since it can "rent" them from AWS LLC. While AWS makes a good chunk of change, it has nothing on amazon.com, so by making AWS its own entity (and, even better for them, a publicly available one) they get a gigantic tax write-off, and AWS's capital expenditures save them taxes. All in all, the shell game must save Amazon millions, just as it does for Walmart.
Does anyone have statistics for Amazon homepage uptime? I don't remember the last time I heard about Amazon being down.
And an hour after I read Patrick's (patio11) article on the Rails vulnerabilities. It's a scary day indeed.
* Or so I was told in a job interview with the big A a few years back.
The possibility for theft and fraud would be so massive if every dev at Amazon had write access to production that I find it nearly impossible to believe this is true.
One of the reasons I left Amazon was that I was given the job of deploying code regularly (about weekly) at 1 AM or so. One evening, there was a problem caused by another team's work, so it escalated and we spent 6 hours dealing with it. We rolled the change back right away, but for contractual reasons their code had to be fixed and deployed, and there was an interdependency. (Fortunately, it wasn't my team's mistake, but I had to be there to help test it, etc.) So it was finally working at 7 AM, and I stuck around for 30 minutes to make sure it kept working before going to sleep around 7:45 AM.
I emailed my boss about it, and of course he was getting emails the whole while as the ticket's status was changing.
Still, the fact that I showed up at 10:15 for the 10 AM meeting that morning was "unacceptable," and I got chewed out. (~2 hours of sleep!)
I made the mistake of thinking that my HR rep might be someone to talk to about this, because I wasn't sure how to make it clear to my boss that it was kinda unreasonable (especially since I'd told him I'd be late for the meeting)... and that's when I found out that everything I told her was written up in an email and sent to him, resulting in getting chewed out yet again for going to HR!
The lesson: as a programmer, never work for a boss who can't program, or at least, be very wary of it!
1 - 45/(60*24*365)
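For reference, that expression is the yearly availability left after 45 minutes of downtime. A minimal Python sketch of the same arithmetic, assuming a 365-day year as the expression does (the `availability` helper is just an illustrative name):

```python
# Availability implied by a given amount of downtime per year.
# Assumes a 365-day year, matching 60*24*365 in the expression above.
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 minutes


def availability(downtime_minutes: float) -> float:
    """Fraction of the year the site was up, given total downtime in minutes."""
    return 1 - downtime_minutes / MINUTES_PER_YEAR


if __name__ == "__main__":
    a = availability(45)
    print(f"{a:.6f} ({a * 100:.4f}% uptime)")
```

That works out to about 99.9914%, so 45 minutes a year still clears "four nines" (99.99%), which allows roughly 52.6 minutes of downtime per year.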
All the internal links seem to be working fine.
edit: added less ugly link
For some classes of items, they can sell at cost and still make money, because their operations are allegedly so good that they can turn over the inventory before their own payment to the supplier is due.
For example, say Amazon buys a book today and payment is due to the publisher in 30 days. They sell the book tomorrow at cost. Now they get to sit on the full price of the book for the rest of the month. In fact, take that money and buy another book, and sell it right away too. Keep that up, and you have a very big pool of money always sitting in your bank account. Money that can be profitably invested in other activities.
Why would a publisher give them 30 days to pay? Because they're Amazon. It's good to be big.
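The float described above can be sketched as a small simulation. Every figure in it is a hypothetical assumption for illustration, not Amazon's actual terms: a $10 book, one book bought on credit and sold at cost each day (same day, for simplicity), and the 30-day payment window from the example.

```python
from collections import deque


def cash_float(days: int, price: float = 10.0, terms_days: int = 30,
               books_per_day: int = 1) -> list[float]:
    """Cash on hand at the end of each day when stock sells at cost
    before the supplier invoice is due.

    Hypothetical model: each day we buy `books_per_day` books on credit,
    sell them at cost for immediate cash, and pay each invoice exactly
    `terms_days` days after the purchase.
    """
    cash = 0.0
    invoices: deque = deque()  # (due_day, amount), oldest first
    history = []
    for day in range(days):
        amount = price * books_per_day
        invoices.append((day + terms_days, amount))  # owed to the supplier later
        cash += amount  # sold at cost; the customer pays now
        # Pay every invoice that falls due today.
        while invoices and invoices[0][0] <= day:
            _, owed = invoices.popleft()
            cash -= owed
        history.append(cash)
    return history


if __name__ == "__main__":
    history = cash_float(60)
    print(history[0], history[29], history[-1])
```

With those assumptions the cash on hand climbs for 30 days and then holds steady at $300 (price × books per day × payment days): money that earns nothing as margin but can be put to work elsewhere, which is the point of the comment.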
I think it's awesome. Imagine if Google had run a bunch of low-rent punch-the-monkey display ads early on. It would have killed them. Facebook vs MySpace is another good example of what happens when you focus on long-term value creation versus short-term profit taking.
Say you want a bite of the tablet market dominated by Apple: it's easy, make a somewhat decent tablet for cheap and there you have it.
If you want a bite of an Amazon-dominated market, well, good luck with that, and while you're at it, hope that Amazon isn't planning to get into the market you're in.
It seems their strategy relies on tiny margins. Maybe under a different set of circumstances Amazon would change their stance, but I don't think ramping up prices is currently part of their plans.
If you are going up against the loss-leader Kindle, though, it is going to be a lot harder.
There was an article on HN a couple of weeks ago about precisely this topic; it's a decent, informative read.
HN Discussion: http://news.ycombinator.com/item?id=5112998
$61.1 billion (yearly revenue) / 31,556,926 seconds = $1,936.18/second
That's still a huge amount of money ... $7mm an hour?
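The arithmetic in these two comments checks out. A quick Python sketch reproducing it from the quoted figures; note it is a flat yearly average and ignores daily and holiday peaks.

```python
# Average revenue rate from the figures quoted in the thread.
ANNUAL_REVENUE_USD = 61.1e9     # yearly revenue, as quoted
SECONDS_PER_YEAR = 31_556_926   # about 365.2422 days, as quoted

per_second = ANNUAL_REVENUE_USD / SECONDS_PER_YEAR
per_hour = per_second * 3600    # flat average; real traffic peaks and dips

print(f"${per_second:,.2f}/second, ${per_hour / 1e6:.2f}M/hour")
```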
There will almost certainly be some number of people who would have stopped by Amazon right now and made some impulse purchases. At the scale Amazon operates, the amount of inconvenience needed to push off the marginal purchase is almost certainly minuscule (see the frequent reports on how milliseconds of page load time affect the likelihood of purchase).
(Surely there's some loss from being down, but it's not a simple loss = current order rate * downtime argument).
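The objection to the naive formula can be made concrete. A minimal sketch, assuming a hypothetical `recovery_fraction`: the share of would-be orders that customers simply place later once the site is back. No one in the thread gives a figure for it, so the values in the usage example are purely illustrative, as is the one-hour outage.

```python
def naive_loss(order_rate_per_hour: float, downtime_hours: float) -> float:
    """The 'loss = current order rate * downtime' estimate."""
    return order_rate_per_hour * downtime_hours


def adjusted_loss(order_rate_per_hour: float, downtime_hours: float,
                  recovery_fraction: float) -> float:
    """Naive loss minus the orders that are deferred rather than abandoned.

    recovery_fraction is a hypothetical parameter in [0, 1]: the share of
    orders customers place later once the site is back up.
    """
    if not 0.0 <= recovery_fraction <= 1.0:
        raise ValueError("recovery_fraction must be between 0 and 1")
    return naive_loss(order_rate_per_hour, downtime_hours) * (1.0 - recovery_fraction)


if __name__ == "__main__":
    rate = 7_000_000  # roughly the average $/hour estimated above
    for recovered in (0.0, 0.5, 0.9):
        print(f"recovered={recovered:.0%}: lost ${adjusted_loss(rate, 1.0, recovered):,.0f}")
```

Only purchases that are never made (like the impulse buys mentioned above) count toward the adjusted loss; everything deferred shifts revenue in time rather than losing it.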
On a side note, the first thing I did was google "amazon down" and see that it was (http://www.isitdownrightnow.com/amazon.com.html). Then I came here, and I'm proud to say this post was #1.
Still, downtime is money, even if it isn't a world-changing amount of it.
It's sad that "tech" bloggers don't research and report on newsworthy things anymore; they just take what's on Hacker News and call it news.
These Twitter gems demonstrate the cluelessness of the "hackers":
I know from the e-commerce side that when walmart.com went down last year, we saw a traffic increase (enough to actually link it to the Walmart outage). I wonder if it'll happen here.
P.S. Wild guess. I have no idea how much they sell during peak hours.
Why does this post have 168 points?
Bogus, I can assure you.
Don't post outages here.