
The 2002 mandate for internal communication systems at Amazon - anacleto
https://www.sametab.com/blog/frameworks-for-remote-working
======
Twirrim
Yegge's post was very interesting reading, and I took away similar lessons
from it. I was at Amazon at the time, however, and there were things that
certainly were no longer true:

>3) There will be no other form of interprocess communication allowed: no
direct linking, no direct reads of another team’s data store, no shared-memory
model, no back-doors whatsoever. The only communication allowed is via service
interface calls over the network.

API first... except if you happened to be one of a certain number of new
services that somehow managed to get away with not presenting an API, even
though an API would make every service team's life easier.

> 5) All service interfaces, without exception, must be designed from the
> ground up to be externalizable. That is to say, the team must plan and
> design to be able to expose the interface to developers in the outside
> world. No exceptions.

Except, similar to above, where teams apparently decided they didn't want to
think that way at all and management just let them.

> 6) Anyone who doesn’t do this will be fired.

Unless your exception is perceived as providing value to the company. Then
you'll get lauded, and everyone is told they'll need to use your JS-laden,
web-only interface, and to hell with any automation.

Mostly those exceptions just reinforced in my mind just how right the Bezos
email Yegge paraphrased actually was.

~~~
leoh
How do services talk to each other without an API? Is it something like "put a
non-well-documented object into a queue?"

~~~
morelisp
A queue if you're lucky!

There's also:

\- Hire an intern / "Customer Service Representative" / "Technical Account
Specialist" to manually copy data from one service into another

\- Dump some file in a directory and hope something is treating that directory
like a queue

\- Read/write from the same database (/ same table)

Or the classic Unix trajectory of increasingly bad service communication:

\- Read/write from the same local socket and (hopefully) same raw memory
layouts (i.e. C structs) (because you've just taken your existing serialized
process and begun fork()ing workers)

\- that, but with some mmap'd region (because the next team of developers
doesn't know how to select())

\- that, but with a local file (because the next team of developers doesn't
know how to mmap())

\- that, but with some NFS file (for scaling!)

\- that, but with some hadoop fs file (for big data!)

Obviously all of these are at some level an 'application programming
interface'. But then, technically so is rowhammering the data you want into
the next job.
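The "dump some file in a directory and hope something is treating it like a queue" pattern above is depressingly easy to sketch. This is a minimal illustration of the anti-pattern (the directory path and record shape are made up, not any real Amazon system):

```python
import json
import os
import tempfile

# Hypothetical spool directory; real systems pick something equally arbitrary.
SPOOL_DIR = os.path.join(tempfile.gettempdir(), "orders-spool")

def producer_write(record: dict) -> None:
    """Dump a record into the directory and hope something is polling it."""
    os.makedirs(SPOOL_DIR, exist_ok=True)
    # Write to a temp name first, then rename, so the consumer never sees
    # a half-written file -- the one concession to correctness.
    fd, tmp_path = tempfile.mkstemp(dir=SPOOL_DIR, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(record, f)
    os.rename(tmp_path, tmp_path[:-4] + ".json")

def consumer_drain() -> list:
    """Poll the directory, process every file, and "ack" by deleting it.
    A crash between read and remove loses or replays data; there is no
    ordering guarantee, no retry, and no visibility timeout."""
    records = []
    for name in sorted(os.listdir(SPOOL_DIR)):
        if not name.endswith(".json"):
            continue  # skip in-flight temp files
        path = os.path.join(SPOOL_DIR, name)
        with open(path) as f:
            records.append(json.load(f))
        os.remove(path)
    return records
```

Everything a real queue gives you (delivery guarantees, ordering, backoff) is silently absent, which is the point.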

~~~
saalweachter
Don't forget the most important step.

"Think of the acronym CSV. Don't look up the definition of the format, just
meditate on the idea of the format for a bit. Then write your data in the
format you have just imagined is CSV, making whatever choices you feel
personally best or most elegant regarding character escapes. Pass this file on
to your downstream readers, assuring them it is a CSV file, without
elaborating on how you have redefined that."

~~~
wanderer2323
This is awesome, where does it come from? Google does not give me anything.

~~~
saalweachter
The quotation marks are stylistic rather than for attribution.

My personal experience comes from ingesting product feeds from online stores.
Misapplication of \ from other encodings was the most common sin, but I'm
pretty sure I saw about three dozen others, from double-comma to null-
terminated strings to re-encoding offending characters as hex escapes. (And,
of course, TSV files called CSV files, with the same suite of problems.)

------
eigen-vector
This was not exactly a Jeff Bezos mandate but the result of an engineering
brainstorm. The mandate came more out of a "how to scale Amazon for the next
decade" discussion. In large companies, where distributed/independent teams
are as important as distributed systems, this ended up being the only way to
operate.

Initially, during the good old days of Amazon, there was what you'd call a
single data warehouse. It made sense initially for every system that processed
an order to access the data by querying that warehouse—this meant that the
processes would be distributed (different services), while the data would be
centralized. It also meant that any change to the way the data was stored in
the warehouse meant deploying code to a hundred places.

The most important problem this addressed was, however, different. A
centralized data warehouse meant that every customer request bubbled up into N
queries to the warehouse (where N is the number of services that needed access
to the data—billing, ordering, tracking...).

The mandate summarized in one line would be this—"the data should go to the
services, not the other way round." Voilà, microservices.

~~~
dang
Ok, we'll take Bezos out of the title above.

~~~
eigen-vector
Thank you, dang! This is certainly a more accurate title.

------
throwawayy98121
Hi! I’m a senior engineer at Amazon. Throwaway account but I’ll try to respond
to questions if anyone cares to ask.

Yeah we use services heavily, but there’s plenty of teams dumping data to S3
or using a data lake.

There’s also the “we need to do this but management doesn’t see value so let’s
dump it on the intern or SDE 1, who we won’t really mentor or guide and then
blame, forcing them to switch teams as soon as they can.”

If you work at another company and think we have our stuff figured out at
Amazon, we really don’t. We have brilliant people, many of whom are straight
up assholes who will throw you under the bus. We have people who are kind and
will help you gain all kinds of engineering skills. We also have people who
are scum-of-the-Earth shit people who work at Amazon because I don’t think any
other sane company or workplace would tolerate them. We have extremes on the
garbage-people end of the spectrum, unfortunately.

Sorry long rant - point being - it’s good to learn how we do things. The
internal email on services is pretty unique. I learned about it when I was an
SDE 1 back in the day. But - don’t take it as gospel. It doesn’t mean you need
to build services.

I can think of any number of examples where we follow anti-patterns because no
one gives a shit about the pattern, whether it’s a service, a bucket, a queue,
a file attached to the system used for scrum tasks, or shit passed over
email... we care about value at the end of the day. If you don’t provide
sufficient value at Amazon’s bar, they have no problem tossing you out the
window.

------
xyzzyz
> While the third point makes all the difference in the world, what Amazon
> really did get right that Google didn’t was an internal communication system
> designed to make all the rest possible.

> Having teams acting like individual APIs and interacting with one another
> through interfaces over the network was the catalyst of a series of
> consequent actions that eventually made possible the realization of AWS in a
> way that couldn’t have been possible otherwise.

Google has worked this way since time immemorial. That’s what protocol buffers
are for: to create services and pass data between them using well defined
interfaces.

~~~
gowld
A protocol buffer is a serialized data object, not a service API. It doesn't
(and didn't) prevent anyone from using shared memory, shared database, or
shared flat files to communicate.

Also, 2002 _is_ time immemorial. Google was founded in 1998. Protocol buffers
were invented in 2001.

~~~
atombender
Protobufs were invented for Stubby, the RPC layer which is apparently used for
absolutely everything inside Google. It's existed since at least 2001, and
uses protobufs as the RPC serialization. gRPC is based on Stubby (though not
the actual implementation).

~~~
repolfx
Yeah. Google is actually a much better example of this mentality than Amazon
is, if I'm reading the thread right. Google Cloud isn't behind AWS because of
some service architecture nonsense. It's behind because Google started later,
and it started later because for the longest time (I was there) the senior
management had the following attitude:

_"Why would we sell our cloud platform? We can always make more money and
have higher leverage by running our own services on it and monetising with
ads; merely selling hardware and software services is a comparatively
uninteresting and low margin business."_

Selling Google's platform (and it really _is_ a platform) is an obvious idea
that occurred to everyone who was there. It didn't happen because of an
explicit executive decision not to, not because Bezos was some kind of savant.

I think Google could have really dominated the cloud space if they'd been a
bit more strategic. The problems were all cultural, not technological. For
instance they are culturally averse to trusted partnerships of any kind (not
just Google of course, that's a tech industry thing). There are only two
levels of trust:

\- Internal employee, nearly fully trusted.

\- External person or firm, assumed to be a highly skilled malicious attacker

There's nothing in between. So if your infrastructure can't handle the most
sophisticated attack you can think of, it can't be externalised at all. If it
can't scale automatically to a million customers overnight, it can't be
externalised at all.

There's really no notion in Google's culture of "maybe we should manually vet
companies and give them slightly lower trust levels than our employees in
return for money". It's seen as too labour intensive and not scalable enough
to be interesting. But it'd have allowed them to dominate cloud technology
years earlier than AWS or Azure.

~~~
rwmj
I have to wonder if maybe Google's management weren't right. Why are Google in
cloud, a low margin business?

------
sputknick
I worked at an organization that had a similar declaration. Here's how it
played out:

1\. Everyone is super excited for other teams to share their data

2\. Everyone wants an exception from sharing their own data because it's too
hard or too sensitive to share.

3\. Eventually everything gets shared, but it takes 3-4 times longer than it
really should.

~~~
rb808
4\. A couple of years later you want to retire an obsolete interface but you
can't, because a handful of systems use it and they don't have budget to change.

~~~
jrd259
True, this happens, but you're still better off than if you had tight coupling.
In the absolute worst case you can build a shim implementation to support the
obsolete use case. You cannot do this when callers are directly reading your
database/memory structures.
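The shim approach could look something like this (all names here are hypothetical, invented to illustrate the point):

```python
# Hypothetical names: the new service exposes get_ticket_v2; legacy
# callers still invoke the obsolete get_ticket_v1 signature. Because
# they call through an interface instead of reading our tables, we can
# keep them alive with a thin adapter while the internals change freely.

def get_ticket_v2(ticket_id: str) -> dict:
    """Current interface: structured result with explicit status."""
    return {"id": ticket_id, "status": "open", "assignee": "alice"}

def get_ticket_v1(ticket_id: str) -> tuple:
    """Obsolete interface, preserved as a shim over v2.
    Old callers expected a bare (id, is_open) tuple."""
    ticket = get_ticket_v2(ticket_id)
    return (ticket["id"], ticket["status"] == "open")
```

No such translation layer is possible when the legacy consumer is a direct SQL connection to your tables.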

~~~
wil421
Yep, I’ve implemented new ticketing systems that have to talk to the ancient
ticket system for certain things or vice versa. The ancient system had direct
DB connections that had too many down/upstream dependencies and not enough
budget or political backing.

------
lytfyre
IIRC when Yegge accidentally posted that rant, the entirety of Amazon corp got
IP banned from Hacker News from _everyone_ rushing to view and comment.

------
FrojoS
Something like Conway’s Law was also recently cited by Elon Musk (jump to
3:30)
[https://www.reddit.com/r/SpaceXLounge/comments/dbttaw/everyd...](https://www.reddit.com/r/SpaceXLounge/comments/dbttaw/everyday_astronaut_a_conversation_with_elon_musk/)

~~~
degenerate
Direct link to 3:30:
[https://youtu.be/cIQ36Kt7UVg?t=206](https://youtu.be/cIQ36Kt7UVg?t=206)

------
goatinaboat
At a previous company, a senior manager took Yegge’s blog post and presented
it internally as his own original work.

Hilarity ensued.

------
tomduncalf
Found the original post from Yegge a really interesting and thought provoking
read. Didn’t realise from the context that he originally accidentally posted
it as a public rather than private Google+ post!

His follow up post explaining this, and with an interesting anecdote about
presenting to Jeff Bezos, is archived here (seeing as G+ has, ironically (or
not) given the context, shut down):
[https://gist.github.com/dexterous/1383377#file-the-post-
retr...](https://gist.github.com/dexterous/1383377#file-the-post-retraction-
message)

------
darksaints
At least as of 3 years ago when I left, the software systems that drove the
mandate towards SOA were still massive systems that communicated almost purely
through a monolithic Oracle database. It was the software system(s) that was
responsible for all automation and accounting at fulfillment centers. This is
one of those rare times where I actually think a full rewrite from scratch
would have been a better idea.

~~~
dodobirdlord
They got there in the end.

[https://aws.amazon.com/solutions/case-studies/amazon-
fulfill...](https://aws.amazon.com/solutions/case-studies/amazon-fulfillment-
aurora/)

------
prepend
I wish the actual body of the email were available and published. I’ve only
read Yegge’s account of the note and didn’t see it in any of the Bezos books.

I suppose it’s nice that the email, or really any Amazon emails, have not been
leaked.

~~~
boldslogan
I could only find his autobiography. What other books does he have or has he written?

~~~
prepend
Brad Stone wrote a Bezos/Amazon bio called “The Everything Store.”

------
Waterluvian
> 6) Anyone who doesn’t do this will be fired.

So I've never worked at a company over 150 people. Is this... a normal thing
for an email? Maybe I'm just one of those softies but an email with that line
would throw me off my day and cause a serious hit to my morale and confidence
of working there.

~~~
jeffbarr
I strongly believe that Steve was exaggerating for effect here. In my 17 years
at Amazon I have never seen or heard of a threat of this nature. The overall
intent of the email was to tell teams to decouple, decentralize, and to own
their own destinies.

~~~
solarengineer
An ex-Netflix person, who has since moved to Amazon, spoke at a client site
three weeks ago. He casually mentioned things such as "we forgive the first
time, and we fire the second time". From how he spoke, we felt that this might
be the norm in Silicon Valley and related places.

I have much respect for what he has achieved, so I didn't interrupt to
question such a fear-inducing mindset.

~~~
everdrive
With such important services as streaming Seinfeld, it's easy to see why such
a scorched earth policy is necessary.

~~~
dodobirdlord
There are compounding productivity boosts available when a team can all trust
each other to basically never cut corners or make sloppy mistakes. Removing a
tenth team member who is not up to the bar of the rest of the team can make
the remaining 9 members each more than 11% more productive.

Of course, this strategy has its downsides. You can't ever hire juniors. You
can't really hire people in and train them up at all, because everything has
been built under the assumption that only experts will ever touch it. This
makes an organization that operates like this inherently parasitic to the
industry, only capable of hiring in experienced employees from other
companies.

------
ineedasername
The article mentions this as dog fooding, but does that really apply here? Did
they do this with the idea in mind that they'd turn this stuff into a product?
It struck me as Bezos wanting things built for the future, reducing technical
debt, and the product-ification was an excellent byproduct, but perhaps not
intentional.

~~~
morelisp
At the time Amazon was building out their merchant portal as a white-label-ish
service for other large retailers to sell products online. The 'customers' in
the memo would be other merchants, and the early AWS offerings (e.g. SQS)
reflect this. "Elastic" clouds weren't really on the menu yet, but obviously
part of the point is that you can offer it to customers regardless of where
the architectural fad goes.

~~~
function_seven
I remember the early days of target.com and (I think?) toysrus.com being
thinly skinned versions of Amazon.

~~~
gowld
Yep. Also one of the large British retailers and a few others.

------
duxup
Eat your own dogfood.

You can't sell to customers effectively if your flagship product only works
because it has access to resources the customers will never have... and it is
designed around that flagship's needs and not your customer's needs.

~~~
morelisp
AWS is largely a side-effect of this memo, not its instigator. At the time,
Amazon's dog food would be books/clothes/literal dog food.

------
dang
Yegge's article never says it was an email. What should the title be?

Edit: I've taken a rather lame crack at it and am open to improvements.

~~~
jcrites
The circumstances that Yegge described happened somewhat before my time, but I
suppose you could call it an "internal goal" or "internal mandate".

Amazon's not really big on "mandates" in general, but the term seems to fit
Yegge's characterization of what happened. "Internal goal" would be another
way to phrase it. E.g., "The single most important technical goal in the
history of Amazon".

------
Invictus0
A lot of interesting thoughts here but the author doesn't really wrap them
into a conclusion. A whole lot of words to say "they all work and it depends".

------
cm2012
I love some of Amazon's executive policies. From what I've read, everyone has
to write a multi-page paper before executive meetings, and everyone has to
read it, so the meeting goes smoothly with everyone understanding the issues.
I hate how no one reads anything in most organizations.

~~~
kylek
Not sure about execs, but this happens in engineering meetings (regarding new
features being implemented or other semi-major changes). Whoever is initiating
the meeting writes up a paper describing the terminology, the nature of the
change and why it's needed, how it will be implemented etc. The entire dev
team (+ maybe other dev teams within the group), management (the initiator's
boss + 1 level above, maybe other dev team managers too) start the meeting
with hard printouts of the paper, armed with red pens. The meeting "starts"
with ~15 mins or so of silence for everyone to review the paper in the room
from start to finish. Then the paper is reviewed end to end and torn up on the
way. Often there are multiple of these meetings (e.g. the first one went badly,
or things change along the way of building/implementing it and questions come
up).

~~~
QuercusMax
That sounds like a low-fi, synchronous, in-person version of reviewing a
Google Doc (with comments, suggestions, etc).

~~~
plandis
I’ve generally found that forcing people to dedicate time to read and discuss
a design is more fruitful than a Google Doc.

------
jrochkind1
> what Amazon really did get right that Google didn’t was an internal
> communication system designed to make all the rest possible.

I'm not following what he means. What is the thing he is describing as "an
internal communication system" here? That made all the rest possible? What
is/was this internal communications system?

~~~
akhilcacharya
I'm assuming Yegge was referring to the RPC framework.

~~~
jrochkind1
"an internal communication system" does sound like something like an "RPC
framework", but Yegge's paraphrase actually says "It doesn’t matter what
technology they use. HTTP, Corba, Pubsub, custom protocols — doesn’t matter.
Bezos doesn’t care."

I read this as saying different teams/services don't have to use the same
thing either. That doesn't sound like an "RPC framework" or "an internal
communications system" at all. It seems to leave the door open to everyone
doing things in a diverse mishmash. Which isn't what I'd call "an internal
communications system" at all.

But was/is there in fact an Amazon-specific "RPC framework" that all Amazon
services use, some consistent framework used consistently across services? I
haven't heard much about this before, so I am curious to learn more. I haven't
heard of an Amazon 'RPC framework' before, or what it's called. And OP doesn't
specify it either; does the rest of the audience know what's being talked
about, and I'm just missing context?

If _that_ is the thing that the OP thinks is really what Amazon got right...
then the interesting thing is figuring out how it went from the paraphrased
email, which doesn't actually demand such a thing, to.... such a thing. Who
designed or chose this "RPC framework"? When? How? How'd they get everyone to
use the same one? If _that 's_ the thing Amazon got right, there are some
steps missing between the Yegge-paraphrased email and there, since the email
doesn't actually even call for such a thing.

Or is that not what happened at all, and I'm still not sure what OP means by
"an internal communication system" being the thing Amazon got right.

~~~
blandflakes
This edict was before my time at Amazon, so I can't speak to whether there was
an RPC framework in existence when this was mandated.

By the time I arrived, however, there was a cross-language RPC framework that
integrated with Amazon's monitoring, request tracing, and build infrastructure
(for building and releasing client versions). It was very full-featured and
the de-facto system for creating a service. Most of our communication in my
organization was done using this framework, and systems that violated the
"only communicate over a service boundary" mandate were real problem children.

~~~
jrochkind1
Interesting, people don't talk about this much, although the OP seems to be
aware of it and think it was important.

Does anyone know if there's been much written on how this came to be and what
it looked like? If not, it would be a useful thing to write about!

Cause it does seem like a really important thing: without it, the narrative
seems to be that you make a decree like Bezos', and bing bang magic, you get
what AWS got. Where in fact, successfully pulling off that RPC framework seems
to be really important, and undoubtedly took a lot of work, good successful
design, and social organizing to get everyone to use it (perhaps by making it
the easy answer to Bezos' mandate). But none of that stuff just happens; some
have failed where AWS succeeded, and the mandate alone isn't enough.

~~~
blandflakes
I think a lot of Amazon's internal tooling is sort of "unpublished" - I've
not found a great reference for a lot of the really excellent dev support they
had.

The AWS story is particularly interesting because a lot of the internal setup
I was doing at the time was on old fashioned metal. There was an internal
project called Move to AWS (MAWS) that encouraged using newly-developed
integrations with the AWS systems that the public was using.

In other words, AWS lived alongside old-fashioned provisioning practices up
until even the early 2010s.

------
thefounder
The issue, if you are developing under such requirements, is that the product
will end up quite expensive. A simple messaging or authentication feature
becomes a fully fledged multi-tier service, maybe with super admins,
owners/admins, and clients. Dev budget is not an issue for Amazon though...

------
notacoward
A few things I'd add today:

* Every service must provide latency and error-rate metrics.

* Every service must be capable of generating and/or responding to backpressure when things become overloaded.

* Every service must be prepared to support multitenancy.
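As one illustration of the backpressure point, a service can shed load by bounding its work queue and rejecting new work when full, instead of buffering without limit until it falls over. A minimal sketch (not any particular Amazon practice; names are invented):

```python
import queue

class BackpressureError(Exception):
    """Signal the caller to slow down or retry later (think HTTP 429)."""

class BoundedWorkQueue:
    """Reject new work when full instead of buffering without limit."""

    def __init__(self, capacity: int):
        self._q = queue.Queue(maxsize=capacity)

    def submit(self, job) -> None:
        try:
            self._q.put_nowait(job)
        except queue.Full:
            # Push the overload back to the caller rather than falling over.
            raise BackpressureError("overloaded, try again later") from None

    def take(self):
        return self._q.get_nowait()
```

The key design choice is that overload becomes an explicit, fast error the caller can react to (retry with backoff, route elsewhere) rather than unbounded latency growth.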

------
d--b
The thing to point out is that Bezos is a real techie, and while any business
guy would have built Amazon on top of MSFT or Google cloud, the fact that he
knows about infrastructure made it possible for Amazon to build AWS.
------
busterarm
Reading Bezos' mandate email puts a smile across my face, every time.

------
thrower123
Why does the title of this keep getting flopped around? It's shifted three or
four times today. I thought it was supposed to be the title, or the subtitle,
and avoid paraphrasing.

------
iagooar
> 6) Anyone who doesn’t do this will be fired.

I would have so much loved this approach in the last corporate job I had. It
would have changed so many things in such a short time...

------
dmh2000
Here's an article about how the idea of AWS came about. The main takeaway is
that it evolved, and the article has a lot of 'we' in it, not only 'Jeff':

[https://techcrunch.com/2016/07/02/andy-jassys-brief-
history-...](https://techcrunch.com/2016/07/02/andy-jassys-brief-history-of-
the-genesis-of-aws/)

------
brown9-2
It’s such a loss that Yegge doesn’t blog anymore.

~~~
rctay89
...you were saying? :) [https://medium.com/@steve.yegge/google-to-grab-one-
year-late...](https://medium.com/@steve.yegge/google-to-grab-one-year-
later-3e1e4df321f3)

------
emmelaich
> doesn't matter what technology they use. HTTP, Corba, Pubsub, custom
> protocols

So a JDBC interface and published schema would count?

------
totaldude87
>> Anyone who doesn’t do this will be fired

Right, motivating everyone.. check..

------
ga-vu
Do other (Silicon Valley) companies do the same?

------
jordache
Is this trying to be Stratechery in format?

------
darkstar999
Author should ctrl-f for the many erroneous double spaces. </ocd>

~~~
namdnay
Macbook keyboard maybe?

------
iamleppert
We have robotic baristas here in SF, but no one uses them. Why? People want to
have their food prepared and served by a real human being, in most cases. The
food tastes better when it's served to you by a real person.

~~~
cthalupa
I believe you replied to the wrong story, unless robot baristas serving you
coffee was meant to be a metaphor for using APIs for programmatic
communication vs. SFTPing csv files, or something.

------
arkitaip
If you use an adblocker like uBlock Origin, you can add the following rule:
news.ycombinator.com##.pagetop

Unfortunately it removes ALL of the top navbar but I've found it really useful
to get around HN's damaging and useless gamification metric.

~~~
ianmobbs
What?

