
Prime Down: Amazon’s sale day turns into fail day - koolba
https://techcrunch.com/2018/07/16/prime-down-amazons-sale-day-turns-into-fail-day/
======
camtarn
Kinda wish I still worked there and could see the post-outage writeup.

My bet's on some unforeseen bottleneck that affects search and static pages.
Almost everything within Amazon is crazy scalable, but there are some bits
where you scale them up and their behaviour changes radically. For instance, a
service's cache misses might skyrocket as customers get distributed over a
wider set of servers, causing service response times to increase just a little
bit on average, tipping a dependent service over into more frequent timeouts,
causing its downstream service to blow a timeout-percentage 'software fuse'
and stop using that service... etc etc.
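
For anyone who hasn't met the pattern, here is a minimal sketch of a timeout-percentage fuse of the kind described above. It is not Amazon's implementation: the class name `TimeoutFuse`, the window size, the 50% threshold, and the 100 ms timeout are all assumptions chosen for illustration.

```python
from collections import deque


class TimeoutFuse:
    """Hypothetical fuse: blows when too many recent calls timed out."""

    def __init__(self, window=20, max_timeout_ratio=0.5):
        # Sliding window of recent outcomes; True means the call timed out.
        self.results = deque(maxlen=window)
        self.max_timeout_ratio = max_timeout_ratio
        self.blown = False

    def record(self, timed_out):
        self.results.append(timed_out)
        window_full = len(self.results) == self.results.maxlen
        ratio = sum(self.results) / len(self.results)
        # Only judge once the window is full, so one early timeout cannot trip it.
        if window_full and ratio > self.max_timeout_ratio:
            self.blown = True

    def allow_call(self):
        return not self.blown


def simulate(latencies_ms, timeout_ms=100, window=4, max_timeout_ratio=0.5):
    """Feed observed latencies through the fuse; stop calling once it blows."""
    fuse = TimeoutFuse(window, max_timeout_ratio)
    calls_made = 0
    for latency in latencies_ms:
        if not fuse.allow_call():
            break
        calls_made += 1
        fuse.record(latency > timeout_ms)
    return calls_made, fuse.blown
```

With a 100 ms timeout, a dependency that drifts from about 90 ms to about 110 ms goes from never timing out to timing out on most calls, which is how a small average slowdown can blow the fuse.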

Given that each of those services (and many more possibly-related ones) will
have an on-call engineer paged into a conference call when the manure hit the
rotating ventilation apparatus, there are going to be a lot of unhappy people
cancelling their weekend plans right now. I definitely don't miss that aspect
of the job!

~~~
throwaway427
smile.amazon.com was working fine during the outage, if that helps narrow it
down...

~~~
azhenley
It wasn't though. I saw an item being listed as a Prime Day deal on the normal
Amazon site, then I searched for it through smile.amazon.com and it wasn't
there. Went back to non-smile and it was there (but throwing errors so I
couldn't click it anyway...)

~~~
Alex3917
I'm seeing this also. Same item is $169 on smile.amazon.com, versus $79 on
www. Neither is working properly though.

------
rezashirazian
Something no one has mentioned yet, could it be that the engineering force at
Amazon is no longer what it used to be?

I can personally point to two friends who I consider top notch engineers and
designers that have left Amazon because of its toxic culture. I'm sure I'm not
the only one with these anecdotal examples; we've all heard the stories. At the
end of the day, years of poor work/life balance, overly aggressive
management and a frugal approach to everything make for a weak argument for A
players to stick around.

Could this be an example of crumbling engineering standards at Amazon?

~~~
throwaway289355
AWS employee and bar raiser here.

>Something no one has mentioned yet, could it be that the engineering force at
Amazon is no longer what it used to be?

In many regards, yes. The bar had to be lowered to meet the demands of growth.
We've also taken in a lot of hires from companies that have brought their
culture and friends with them. The culture at Amazon is not what it was even 2
years ago. It is in many places day 2.

No one also seems to notice that Amazon retail often suffers widespread issues
like this. We can count on SEV1s happening during peak as things blow up
badly. This has happened several years in a row, and sadly the themes are
pretty much the same across all: forgot to scale (yes...really) or some stupid
system bottleneck. It doesn't help that Amazon retail has a good amount of its
workforce based in India and seemingly disconnected from the Seattle based
leadership.

~~~
nikofeyn
what is a bar raiser?

and when has amazon ever been an engineering force? i have always felt the
website and service experience is a relic of the 2000s. more often than not, i
get the answer “our system can’t do that” from customer service.

~~~
throwaway289355
>[https://www.socialtalent.com/blog/recruitment/raising-the-
ba...](https://www.socialtalent.com/blog/recruitment/raising-the-bar-
unconventional-interview-method-really-works)

I think Amazon has taken on an outsized image to many people that just isn't
true. We have good engineers in many organizations, but we don't pay enough,
have the right strategy, or take care of individuals well enough to lure the
kind of great folks you find at other big tech companies. In many ways, Amazon
is a retailer that does technology because it found a way to make money from
it. The DNA is still MBAs/finance and retail.

~~~
deskamess
How much do you think the on-call contributes to engineers leaving? You would
think support tools and support personnel could help to retain engineers.

~~~
jhall1468
No, bad design and refusal to manage technical debt is the issue. Oncall only
matters in some orgs and even then only matters where the tech debt is totally
out of control.

Bottom line is Amazon is a product culture not an engineering culture and that
makes it really easy to leave for Google or unicorns that really appreciate
tech debt tradeoffs.

------
rpeden
If only they had access to some kind of scalable cloud hosting service, they
could've completely avoided this sort of outage. :)

Jokes aside, I admire the work of the team(s) responsible for Amazon's web
site. I use it so often and encounter glitches so rarely that it really stands
out when something _does_ go wrong.

~~~
coryfklein
Serious question: I've heard that Bezos's approach with building out
commercial units is to break down each part of the vertical into separate
commercially-viable components. Idea being if AWS doesn't make sense for 3rd
parties to use then it may not be economical to use it internally.

Now the question part: would Amazon ever secretly run Amazon.com in a multi-
cloud setup, balancing between AWS, GCE, Azure, etc?

~~~
nevir
As someone that worked at Amazon a long time ago—back when AWS was just
getting started—can confirm (historically). Publicly there were a myriad of
AWS services; internally all we could use was S3 (for many years), if we were
lucky. AWS being born of Amazon's "spare capacity" is an urban legend.

Nowadays, I hear it's quite different, and much of AWS is more rapidly
dogfooded.

~~~
chronid
Some services (before I left) just did not have the sheer capacity to have
Amazon.com as a customer at peak. The service teams just said "nope, sorry,
you're going to kill us".

The requirements for prime day/black friday/cyber monday were mind-boggling.

~~~
xkjkls
That was quickly adjusted. I think like 3 years ago, Amazon publicly was
saying that every day to AWS they were _adding_ the same amount of computing
power that was used to run Amazon.com when it was a $10 billion business. AWS
is massively greater in scale than Amazon.com at this point.

~~~
ec109685
Scaling services for many independent businesses is somewhat of a different
challenge than “vertically” scaling for one large one like amazon.com.

------
tres
Funny how this comes on the heels of aggressively expanding their workforce
and trying to leverage themselves in a hundred different directions...

Maybe it's just me and my confirmation bias at work, but it seems that the
core value proposition that Amazon provided -- high value, low margins on
_products_ -- has been eroding before our eyes.

Seems so much like the transition Microsoft made... too much focus on
"synergies" and leveraging... not enough on keeping the bilges dry and the
engine running.

It's funny... Fred Brooks wrote about this in 1975... and we're still making
the same mistakes _forty years later_. There are real limitations to how
quickly any organization can grow. Even awesome companies who are excellent at
building organizations -- places like Amazon and Microsoft -- can't
organize this law of software development away.

~~~
klodolph
> Seems so much like the transition Microsoft made... too much focus on
> "synergies" and leveraging... not enough on keeping the bilges dry and the
> engine running.

The companies that just keep the bilges dry and the engine running are the
ones that we love, but they’re gone because they got made irrelevant. Or they
got absorbed into something larger. Microsoft has a bunch of failed
initiatives (Windows Phone, Zune) plus a bunch of successful ones (Azure,
Xbox, Office 365).

If you’re up for classic books like Brooks check out _The Innovator's
Dilemma_. You have to try to expand in a hundred different directions because
you don’t know which one of those hundred directions will be relevant next
decade, and you have to be unafraid of cannibalizing your core business
because if you don’t eat yourself then someone else will eat you instead.

~~~
marcosdumay
The Innovator's Dilemma is certainly not about synergies between products. By
its conclusion, it's worthless to try to diversify within the same structure,
you'd better create a new company and turn the old one into a holding.

~~~
klodolph
Why would you think it’s about synergies? I’m not sure why you would think
it’s about synergies. Or why it would be “worthless to try to diversify within
the same structure.”

Summarizing the book here would be a bit of a disservice—but one of the points
of the book is that there are economic reasons why companies focus on their
most profitable core products, and there are economic reasons why that kind of
focus can result in the company collapsing when the market moves forward. This
isn’t some kind of imperative—the book isn’t saying, ”therefore, you should
create a new company.” It’s more descriptive, “this is how big, successful
tech companies can suddenly fail.”

~~~
marcosdumay
> Or why it would be “worthless to try to diversify within the same
> structure.”

?

The book has 2 chapters on that single point. And repeats it everywhere else.

On synergies, the OP post was about it, not about disruption.

------
wenc
I wonder how Alibaba Cloud handles similar events [1], where there are bursts
of 256k/s transactions and ~1bil packages being shipped out.

Do they just do brute-force massive scale out?

Amazon's US market is big, but my understanding is that number of online users
in China (> 400mil) exceeds the population of the US (~325mil), which makes me
wonder if the folks there think about data architecture a little differently
than we do.

[1] [https://qz.com/1127087/singles-day-crazy-stats-from-
alibabas...](https://qz.com/1127087/singles-day-crazy-stats-from-alibabas-
online-shopping-extravaganza/)

~~~
rannn
Probably every day at Alibaba is a prime day in terms of transactions...

~~~
akvadrako
True but they still have Singles Day, so it's not like they have constant
load.

------
lanius
I actually managed to add an item on sale to my cart, but my cart is now empty
after refreshing. Oh well, maybe next year.

~~~
phy6
This made me laugh more than it should have.

------
blueside
I can't help but feel terrible for the team of people there that ultimately
gets blamed for this, I hope they can get some sleep tonight.

~~~
matte_black
A night of missed sleep didn’t hurt anybody.

~~~
coryfklein
It's not so much the missed sleep, as it is the 48 hours of intense heart-
rending stress.

~~~
matte_black
I’m blown away by the sudden and swift down votes to my original comment.

These engineers work at a world class company and are paid vast sums of money
to not fuck things up. They live way better off than the majority of the
country and their mere presence makes life more expensive and stressful for
communities around them.

To suggest they cannot go a mere 48 hours or less without sleep on one of
their company’s most hyped days is out of touch.

~~~
toomuchtodo
Amazon is not world class. The pay is not comparatively great. There is no
retail job (including Amazon) where undue stress and sleepless nights are
warranted.

You're not saving lives, you're selling books and cat litter on the internet.

~~~
cheeze
Is their pay not comparatively great? Usually when I see this statement, folks
are comparing seattle to silicon valley 1:1 which isn't a fair comparison.
Seattle is expensive but not _that_ expensive. My friends who work at amzn
seem to be compensated in line with everyone else I know, but maybe I'm wrong?

~~~
toomuchtodo
The vesting schedule and the sweatshop-esque environment arguably don't make the
pay worth it (compared to what you can make elsewhere).

Disclaimer: My knowledge is based solely off of public reporting and first
hand experiences of SWEs and TAMs no longer at Amazon/AWS.

------
helper
I can't login to my root aws account right now. It's pretty annoying that the
root account login for aws is tied to amazon.com retail accounts.

~~~
all_blue_chucks
Using the root account is a poor practice anyway...

~~~
helper
Is it really so difficult to imagine that I needed to perform an action that
can only be done from the root account?

For example, the pen test authorization request form can only be filled out
from the root account.

------
russellbeattie
I work in a non-retail part of Amazon _and_ I'm on vacation. Hasn't stopped
friends and family from texting me about this though. As if I personally can
go in and reboot a server or something. Hope we get it sorted soon!

If you're affected by this, please accept my unofficial thanks for your
patience and understanding. (If you're a coworker in retail, good luck getting
things up and running!) :-)

------
itsEtai
Help strikers in Europe by NOT purchasing from Amazon today. Thank you.

~~~
mlrtime
"Europe" strikes as often as the wind changes.

~~~
antonvs
Which only raises the question, why doesn't the US? Working conditions must be
so fantastic here that no-one feels the need to strike.

It can't be that companies have crippled unions so that they can treat wage
and working conditions as a one-sided negotiation. Surely not...

~~~
drstewart
How come it doesn't raise the question of "why doesn't Canada?"

~~~
smt88
1) They do, 2) they have more worker-friendly laws

------
kolleykibber
It's also impossible to log into the AWS console. Surely something of such
importance should be separate to the ecommerce site.

~~~
lawnchair
Down for me also and it's being reported on
[https://status.aws.amazon.com/](https://status.aws.amazon.com/)

------
ComputerGuru
I know TFA is about technical failures, but the deals themselves are also
incredibly lacklustre. I was expecting _at least_ the Warehouse Deals part of
Prime Day to come through, typically 15 to 20% off all used offerings. This
year however, Amazon restricted it to only select listings which translated to
a few hundred items total. Very sad.

~~~
HillaryBriss
I didn't see anything that appealed to me either. One other person told me the
same.

Maybe Amazon's overloaded system was caused by shoppers checking back more
frequently than in past years because they can't find really good deals this
year either _but keep looking harder and harder anyway?_

------
biztos
What about the advertising money already spent?

I read somewhere (sorry, forgot where) that Amazon had been pushing sellers to
spend like mad on ads within Amazon.com for Prime Day, apparently it gave you
a big edge over whatever the algorithms suggest.

Those sellers will have missed their sales targets, and will consider the ad
spend to have been wasted. Will they get it back?

~~~
fencepost
And vendors who boosted supply based on anticipated sales (both discounts and
purchases driven by discounts). Are vendors going to find themselves with
thousands of extra widgets on-hand but without the anticipated purchasing
frenzy they were counting on to sell them?

~~~
CamelCaseName
Yup, and if you have too much inventory, you can either remove it or sell it
at a discount, both are expensive options.

Otherwise you'll get Amazon telling you, "Hey buddy! You sure are using up a
bunch of space and not selling much. Why don't you not do that? We're limiting
the amount of storage space you can use for Q3."

------
rc-1140
... Is there anything even good for Prime Day this year? Or the year before?
Two years ago I remember seeing at least some Dell Workstations that could be
repurposed into cheapo home servers. Most of the stuff seems to be odds, ends,
and stuff that the various Chinese product-clone companies couldn't get rid
of.

~~~
JacobJans
I got a baby car seat for half off, which saved a lot of money, actually.

------
oflannabhra
Is there enough historical, public data available to estimate the amount of
money Amazon is losing per second?

~~~
CamelCaseName
Last year they sold $2.4B, and this year I've seen estimates around $3.4B -
$4B.

Prime day is 36 hours long, but I bet sales are weighted heavily in the first
few hours.

So 3.7B / 36 / 60 / 60 = $28.5K per second, and then maybe double or triple
that for the first hour or two after 3PM and that'll give you an idea of the
scale.
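
The back-of-envelope division above can be sketched as a few lines of Python. The flat 36-hour spread, the midpoint of the two estimates, and the 2x/3x peak multipliers are the assumptions stated in this comment, not measured figures; the function names are made up for illustration.

```python
def revenue_per_second(total_usd, hours=36):
    """Average revenue per second if sales were spread evenly over the event."""
    return total_usd / (hours * 60 * 60)


def peak_range(total_usd, multipliers=(2, 3), hours=36):
    """Rough per-second range for the busiest hours, using guessed multipliers."""
    base = revenue_per_second(total_usd, hours)
    return tuple(base * m for m in multipliers)


if __name__ == "__main__":
    low, high = 3.4e9, 4.0e9      # estimates quoted above
    midpoint = (low + high) / 2   # 3.7B
    print(f"average: ${revenue_per_second(midpoint):,.0f} per second")
    lo_peak, hi_peak = peak_range(midpoint)
    print(f"peak guess: ${lo_peak:,.0f} to ${hi_peak:,.0f} per second")
```

This only bounds lost revenue if the sales are truly lost rather than deferred, which is the caveat about people postponing their shopping.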

There are also knock-on effects, like reduced trust in Amazon/fewer orders for
the rest of prime week, but also positive effects, like people who will just
defer their shopping.

For what it's worth -- my sales are below my 30 day average. Glad I didn't go
all out this year in terms of advertising.

------
rootbear
The web site is a bit of a mess. And it seems that, just since this morning,
my entire Wish List has vanished. Foolish me for not having a backup...

~~~
antonvs
Check again, mine was missing but it's back now. It's doubtful they'll
actually lose data, issues are probably just due to services that are offline.

------
jcampbell1
I'm seeing the page automatically reload every second or so. Maybe some bad
javascript, though it isn't adding entries to the history. Looks like they
have a script that is DDoSing them.

~~~
phy6
Same, the search results page keeps refreshing, void of any items.

------
rajathagasthya
What happens to people responsible for the crash today (infra, culprit
services)? Does Amazon take some kind of "action" since Prime Day is a huge,
once-a-year event for Amazon?

~~~
chronid
They will have to write a detailed post-mortem (with many people with titles
starting from director watching it every week). Based on what comes out from
the post-mortem automation/testing will be implemented to remove/mitigate the
failure from the equation in the future.

Unless this is something egregious (ex: a manager not allowing the team to
"scale up" in preparation for the event) no one will get fired. Tempers may
flare a bit if it is something stupid (it's usually not).

Nothing different from what happened with the (very public) DynamoDB and S3
failures of yesteryear.

------
macshaggy
Well, everything I wanted to buy is simply not on sale. If Prime Day isn't
supported by those things that I want or need then why would I participate?

~~~
fma
I've participated before only to be disappointed. I didn't even bother looking
at their 'deals' this year. I consider this their garage sale to get rid of
junk...unless you want to buy Amazon products like Ring.

------
hmcm55
changing the url to smile.amazon.com works

~~~
degenerate
People should be using Amazon Smile all the time, outage or no outage :)

I use this Chrome extension that rewrites all amazon pages to use it:
[http://www.smilealways.io/](http://www.smilealways.io/)

FF equivalent: [https://addons.mozilla.org/en-US/firefox/addon/auto-
smile/](https://addons.mozilla.org/en-US/firefox/addon/auto-smile/)

~~~
52-6F-62
It doesn't appear to be available for Canada, unfortunately. I didn't even
know it existed.

------
wheeler5x5
How about some love for their marketing guy who made it all happen?

Jeff probably said "make it rain, let's see if your hordes can take down
Amazon.com," and this guy basically accepted, and succeeded at, the challenge.

------
mirimir
Huh? Amazon seems OK now.

Edit: But hmmm, Quora just went down.

> 504. Gateway Timeout.

> Quora is temporarily unavailable.

> Please wait a few minutes and try again.

And they use AWS, right?

Edit: I just got as far as search, order, cart. But no account as Mirimir, so
...

Edit: Re Quora -
[http://downdetector.com/status/quora](http://downdetector.com/status/quora)

I wonder what other AWS stuff is down. If that's it, anyway.

------
anigbrowl
Industrial action is cool and good
[https://twitter.com/kadybat/status/1018926864767676416](https://twitter.com/kadybat/status/1018926864767676416)

------
dawhizkid
I’m curious what the repercussions are, if any, once a post mortem is completed
and teams or individuals that contributed to the outage are identified. Is
“causing” this a fireable offense?

~~~
selectiveshift
Not unless there was malicious intent or willful negligence. Amazon is a data
driven company. The data shows that a “blame” culture results in more
incidents. (Airline industry taught us this:
[https://www.faa.gov/about/initiatives/maintenance_hf/library...](https://www.faa.gov/about/initiatives/maintenance_hf/library/documents/media/human_factors_maintenance/understanding_the_safety_culture.a_communicational_approach_to_%60blame%60_options_in_asrs_incident_report_narratives.pdf))

~~~
smt88
Do you know this from experience? Or are you just guessing?

------
ovi256
FWIW, it worked fine in Western Europe since 1 GMT, 8 hours ago.

------
bluedino
This is the page everyone seems to be getting

[https://i.imgur.com/vpIHDpA.jpg?1](https://i.imgur.com/vpIHDpA.jpg?1)

~~~
plandis
Same on mobile. Perhaps Amazon retail is transitioning to a
dog-photos-as-a-service business model?

~~~
eat_veggies
did Dapps stand for Dog apps the whole time?

alternatively, Amazon rebrands their entire cloud lineup to PAWS

------
mrep
It's up, aaaaannnnddddd now it's back down...

------
mikestew
It probably doesn't help that the mobile app appears to load a random picture
of a cute dog every time I press the "retry" button. So you can guess what I'm
doing, trying to get it to load a new "Dogs of Amazon" pic.

Probably should have gone with goatse, reduce the load.

EDIT: do _NOT_ search for "goatse" on your work connection. That alone, even
if you've never heard the word, should tell you why I suggested it as an
alternative.

~~~
ergothus
Interestingly, despite not being a dog person, I found the first instance of a
dog to be a cute, human-relatable inclusion. On later pages I found it
frustrating and annoying.

I'm sure there's a fundamental UX principle or two at work there, but I won't
pretend to truly know what they are.

~~~
794CD01
It's called the Duck Hunt principle.

~~~
ergothus
Any sources for me to read up on? Googling just gave me the BattleChess duck
story, some unrelated UX examples using ducks, and info about how the duck
hunt "gun" worked.

~~~
jldugger
I've never heard the term before, but I do recall plenty of people playing
duck hunt shooting at the dog when the game said they failed. Although, he
kinda earned a laser zapping by laughing at your failures, right?

------
hartator
It's interesting that you can have hired the best talents in the world, but
still have major outages. I wonder if there is a way to ensure more people = more
stability. Sounds a bit stupid, but maybe if each datacenter has its own
software team working on the same issue, it will be very redundant work, but
maybe it will be more organic in its failure?

~~~
gm-conspiracy
Or a team of 9 pregnant women making a baby in only one month?

~~~
hartator
Should work because 9 months / 9 = 1 month.

You probably want a distributed team though, with maybe fetus being thrown
around by pneumatic mail.

~~~
cncrnd
You're joking, but I do wonder if you can speed up fetal development in some
way.

------
anonu
Consumerism at its best... Probably for the better that Amazon Prime is down
(at least for me - I'll speak for myself).

Honestly, my life got better when I stopped getting packages to my door. Only
buy the most essential things you need - sell the stuff you don't. Better to
live an uncluttered life without "things"...

------
blairanderson
I am unable to buy/bid on my Amazon AMS ads still as of 2:12PM.

------
aviv
Yikes AWS console down

------
downrightmike
I mean, I see it fail from the fact that there aren't any deals I want. Maybe
if their fire tablets ran full android, but there isn't really anything on
sale.

------
jbeales
Just tried to do a search, got no results. Google search found me the product
page, (which worked).

If you can't even search for products, there's a problem!

------
dingo_bat
TFA is a super buggy webpage.

------
coryfklein
Hasn't this happened previously? It may become part of the celebration each
year.

------
oh-kumudo
Amazon is worse than Alibaba even, what a failure today.

------
XalvinX
I just realized the influence of Hacker News. I read techcrunch every day and
it is rare for an article, any article, to get more than 2 or 3 comments
(although, to be fair, they just added commenting in the last few
months)...this one got 35 already!

Anyways, I think that this is NOT a fail for Amazon at all, but a major win.
They obviously created all kinds of attention! I'm sure they will get it right
next time, and maybe, if they think fast, can reverse this situation by
offering those who were disappointed a 2nd chance at an even-greater
discount...at least that's what I would do.

------
kazishariar
It's hush-hush in the industry that CAT5 and its predecessor the CAT5+1, both
are shielded with peanut butter. It's also a little-known fact that most
datacenters or PCs as we in the NOC tend to call them. Are also built around
tropical rainforests, which provide both security from the common man, as well
as naturally cool down the millions of heatsinks needed for the cloud. But an
even lesser known fact is that monkeys, yes monkies love peanut butter. --I'll
leave the rest to your imagination. But one of those fat guys, drinking beer,
with a hoodie, and in need of a bath left one of the windows open. The rest...
well, the rest was prime.

------
sj4nz
After reading Ryan Holiday's "Trust Me I'm Lying" and knowing how well AWS
usually handles surges of traffic, I am not so sure that this isn't a
marketing stunt that gets them an amazing amount of press. Cynical?

~~~
Jach
Could also be an A/B test of some new infrastructure design to see if it's
ready to deploy in November. I have no idea if they still hold everything
together with Perl (via this templating system:
[http://www.masonhq.com/](http://www.masonhq.com/)) but it wouldn't surprise
me, and it also wouldn't surprise me that there'd be occasional pushes to
replace it with something "modern" (or at least friendlier to the revolving
door of new grads).

~~~
jimbofisher1
This would go down as one of the worst A/B tests in history if that is the
case, which I highly doubt.

------
rednerrus
My guess is this is an intentional marketing ploy. Think of how much press
this is generating. Frontpage of HN. No way we are talking about this
otherwise.

~~~
NathanCH
Amazon is on the front page nearly every day for some reason or other (:

~~~
wetpaws
If you are big enough it's hard to miss you.

