
Internal documents show how Amazon scrambled to fix Prime Day glitches - FactolSarin
https://www.cnbc.com/2018/07/19/amazon-internal-documents-what-caused-prime-day-crash-company-scramble.html
======
jamesrwhite
Has "Sable" been spoken about externally before? I found a few details about
it on a LinkedIn profile:
[https://www.linkedin.com/in/kokamd/](https://www.linkedin.com/in/kokamd/).

"Sable has been hugely successful in the company: it provides storage and
computation environment to over 850 teams, running 2450 unique applications,
hosts 10,000 data-sets comprising of 2.05PB of (replicated) data across all
retail regions. It handles about 2.2 trillion transactions per day with
average client-side latencies of 3ms. In the first half of 2016, there were
two outages of the cluster in North America due to operator errors, and this
brought down multiple lines of business, not just Amazon.com. Leading the
project to reduce SABLE blast radius. This involves segregating the data-sets
and providing seamless fail-over to alternate storage solutions (for example
DynamoDB) for data that is very critical such as Amazon catalog."

"SABLE is Amazon's e-commerce storage and computation platform. It hosts some
of the most important applications running on Amazon.com. Altogether there are
about 400 teams (including Shopping Cart, Items, Prices) that use SABLE within
Amazon.com for high performance storage, caching, and new business object
derivation services.

Part of the team that built SABLE from ground up. SABLE uses BDB (Berkley
Database) for persistence and Libevent for non-blocking I/O. My specific
feature contributions include: repartitioning support (add/reduce fleet size
without any impact to live production traffic), SABLE reactors (computation
platform for propagating prices, availability to the website). I optimized
SABLE reactors by eliminating worklogs (which is the bookkeeping mechanism
used to keep track of the amount of work that needs to be done in the system)
which resulted in 30% reduction in our fleet size.

As a manager, I had five directs, and our team launched: Quality of Service
for SABLE reactors (this helped in faster prices propagation to the website),
Shopping Cart on SABLE platform (increased the availability of Shopping Cart
application, more than what Oracle could provide)."
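
To make the "seamless fail-over to alternate storage" idea from the first quote a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: `FailoverReader`, `StoreUnavailable`, and `make_store` are hypothetical names, the stores are in-memory stand-ins, and none of this reflects how Sable or DynamoDB is actually wired up inside Amazon. The only idea taken from the quote is that critical data sets (the catalog, say) get an alternate store to read from when the primary is down, while everything else just fails.

```python
class StoreUnavailable(Exception):
    """Raised by a (hypothetical) store client when it cannot serve a read."""


def make_store(data, healthy=True):
    """Build an in-memory stand-in for a storage client.

    `data` maps (dataset, key) tuples to values; `healthy=False`
    simulates an outage of the whole store.
    """
    def read(dataset, key):
        if not healthy:
            raise StoreUnavailable(dataset)
        return data[(dataset, key)]
    return read


class FailoverReader:
    """Read from the primary store; fall back only for critical data sets."""

    def __init__(self, primary, fallback, critical_datasets):
        self.primary = primary
        self.fallback = fallback
        self.critical = set(critical_datasets)

    def get(self, dataset, key):
        try:
            return self.primary(dataset, key)
        except StoreUnavailable:
            # Non-critical data sets are not replicated to the fallback
            # in this sketch, so the outage is surfaced to the caller.
            if dataset not in self.critical:
                raise
            return self.fallback(dataset, key)


if __name__ == "__main__":
    primary = make_store({}, healthy=False)  # simulated primary outage
    fallback = make_store({("catalog", "item-1"): "widget"})
    reader = FailoverReader(primary, fallback, ["catalog"])
    print(reader.get("catalog", "item-1"))
```

Splitting data sets into critical and non-critical is what keeps the blast radius small here: an outage of the primary still hurts, but the reads that matter most keep working.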

~~~
judge2020
There are listings for Sable development on amazon.jobs:
[https://www.amazon.jobs/en/search?base_query=sable](https://www.amazon.jobs/en/search?base_query=sable)

> Our NoSQL storage platform processes more than 1 trillion transactions per
> day to serve Amazon country-specific and private-label websites and internal
> Amazon systems.

------
CharlesW
> _" I'm confident we'll deliver an even better experience next year," he
> wrote in the email._

"Even better."

Ugh, I detest this kind of PR-written language. Acknowledge failure.

"We failed, and we're sorry we let down our customers. We'll try our best to
provide a shopping experience that's up to our standards, as well as our
customers' expectations, next year."

This is not that difficult.

~~~
ttul
How would you know?

~~~
vokep
Because it's surprisingly easy to empathize with. The fundamental pattern of
choosing between _doing the right thing_ and _doing the easy thing_ is all it
really comes down to. The right thing isn't nearly as difficult to do as it
sounds.

------
wittekm
(Note: my knowledge here is out of date as of 2015.)

Sable is relatively ancient and badly in need of deprecation. (If I remember
correctly, as of 2011 it was sort of in a "please consider alternate storage
solutions" mode.) It is the closest to a single point of failure I can think
of at Amazon.

Hopefully this is the kick in the ass they need to move to something more
robust.

------
mercwear
It's amazing to me that CNBC was able to pull such a long piece out of this.
The image they used of Bezos is funny too, as if he was ashamed or something.
I'm guessing he had a particularly good Prime Day since Amazon stock hit a new
high.

------
VectorLock
So many jokes to be made about Amazon having to contact their AWS
representative to get their limits raised, etc.

~~~
justicezyx
Not sure what you mean. Running Amazon retail on AWS demonstrates AWS's
maturity and Amazon's commitment to using the same infrastructure for internal
and external users. That is a bold strategy, efficient in the long term, and
one that many cloud providers struggle to achieve.

~~~
judge2020
Sable is run on non-AWS infra to prevent a single point of failure for the
entire AWS side of the internet. They likely still have an auto-scaling system
specifically for Sable, but without the large pool of AWS servers it could
otherwise have scaled onto.

