
Making Netflix's Data Infrastructure Cost-Effective - el_duderino
https://netflixtechblog.com/byte-down-making-netflixs-data-infrastructure-cost-effective-fee7b3235032
======
jonplackett
When I read something like this I realise just how immense a lead tech
companies are building vs 'normal' companies. There's just a layer of expertise
in how to streamline and organise themselves that anyone without a 1000 strong
engineering team and huge engineering budget is just never going to think
about.

~~~
hibikir
Having worked at some of those companies, along with small startups, the
biggest difference isn't really plain old sophistication, but expenses so high
as to make the engineering effort worthwhile.

If your production system is 10 instances on some random cloud, a 10%
efficiency savings saves you 1 instance, so maybe $2k a year. Taking into
account opportunity costs vs doing things that raise revenue, said startup
would consider the effort a waste of time unless it took a few hours.

With the same architecture, but instead 20K instances, then suddenly that 10%
is saving 2000 instances, and 4 million a year. Unless there's a major
engineering shortage, chances are that spending over a month on 4 million in
yearly savings will be completely justified, and would even be a highlight in
some SRE's review.
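
To put rough numbers on that comparison, here is a minimal sketch. It assumes
the ~$2k per instance per year implied above; the function name and the flat
per-instance price are illustrative assumptions, not anyone's real billing.

```python
# Assumed figure taken from the comment above: roughly $2k per instance per year.
COST_PER_INSTANCE_PER_YEAR = 2_000


def yearly_savings(instances: int, efficiency_gain: float) -> tuple[int, int]:
    """Return (instances saved, dollars saved per year) for a fleet."""
    saved = round(instances * efficiency_gain)
    return saved, saved * COST_PER_INSTANCE_PER_YEAR


# Same 10% optimization, applied to a startup-sized and a big-tech-sized fleet.
for fleet in (10, 20_000):
    saved, dollars = yearly_savings(fleet, 0.10)
    print(f"{fleet:>6} instances: {saved} saved, ${dollars:,}/year")
```

Same optimization, same effort; only the size of the fleet changes the answer.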

Chances are that the optimization wasn't even any harder in the big tech
company: It's just that small savings on big piles of money are suddenly worth
it. It's not possible to find millions of dollars in loose change between the
proverbial couch cushions if you didn't have the chance to spend hundreds of
millions in the first place. Heck, in a growth company, even at that size, 4
million might not be enough savings, as there might be even bigger things.

Same with dev tooling: Saving a company 5% CI times is not great with 10
developers, but if you are making five thousand developers more productive,
suddenly you can hire an entire team of very specialized developers, and it's
not a luxury.

This is also why copying what large companies do is often foolish when you are
small: The tradeoffs are going to be completely different. What would be an
unacceptable flaw in a large company is just fine in your startup.

The real trick IMO is the 200-800 range: Large enough that the simple
solutions for small companies have probably broken down and are causing pain,
but nowhere near the staffing to hire yourself a team to, say, add types to
Ruby, get a team to build around the weaknesses of your database, or whichever
other problem you have that any member of FAANG would just throw 10 million
dollars in staffing costs at without batting an eye. I've seen way too many
companies that stopped being able to grow their company at those intermediate
sizes, and get stuck in technical hell.

~~~
kabes
The problem is that managers at small companies with some, but insufficient,
technical knowledge also read these articles and think they need to do the
same. I come across overengineered stuff at small companies all the time for
things they don't need but only do because it's what Google/Facebook is doing.

~~~
mlthoughts2018
Engineers do this mostly, not as much managers (even less so for technical
managers). Managers typically either want things done yesterday and would see
this as wasteful over-engineering, or would be delegating and trusting a
senior or staff engineer if they said this was truly necessary.

------
gavinray
Why does Netflix have half a dozen different database/database-esque systems?

Snowflake, Hive, Cassandra, RDS, Druid, Presto

Is there a good reason for this that people without experience at FAANG orgs
can't grasp? To the naive, it seems like mixing Postgres, MySQL, Mongo, and
Oracle.

Also, why is it on Medium behind a paywall? They are worth, quite literally,
nearly $200 billion =/

~~~
dmlittle
Not sure about all of them but different databases have different use cases.
It's not necessarily Postgres vs. MySQL where they are competing databases for
the same use case. Snowflake, for example, is a data warehouse that (I
believe) stores data in a columnar format. This makes it much better at
performing aggregate queries (how many shows were watched on netflix today?)
but a lot slower for specific queries (where in this episode is gavinray
currently at?).
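
A toy sketch of that trade-off, assuming made-up viewing events (the field
names and values are hypothetical, and this says nothing about how Snowflake
actually lays out data):

```python
# Hypothetical viewing events, stored row by row (one record per event).
rows = [
    {"user": "gavinray", "show": "A", "position_s": 120},
    {"user": "dmlittle", "show": "B", "position_s": 45},
    {"user": "kabes", "show": "A", "position_s": 300},
]

# The same data stored column by column (one list per field).
columns = {key: [r[key] for r in rows] for key in rows[0]}


def count_views(show: str) -> int:
    # Aggregate query: only the "show" column has to be scanned.
    return columns["show"].count(show)


def position_of(user: str) -> dict:
    # Point lookup: find the row index, then gather a value from every column.
    i = columns["user"].index(user)
    return {key: col[i] for key, col in columns.items()}


print(count_views("A"))
print(position_of("gavinray"))
```

The aggregate touches one list; the point lookup has to reassemble a record
from all of them, which is the part a row store gets for free.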

~~~
gavinray
Hive is a Data Warehouse that runs on top of Hadoop.

Snowflake is a Data Warehouse/Lake, but it's also its own custom SQL DB.

Cassandra is a NoSQL DB

RDS is running (some standard relational DB)

Apache Druid is a columnar analytics DB centered around realtime uses. It has
its own query language and delegates to Apache Calcite for specific
DB/datasource drivers. Can integrate with Kafka/Hadoop.

Presto is (to my understanding) like a meta-DB that can query multiple
databases. Similar to an integrated Apache Calcite or Google's ZetaSQL.

There are a LOT of overlapping concerns here, which is why the confusion.

Essentially they have 2-3 different products in several categories targeting
generally the same usecase.

~~~
nemothekid
AFAICT, Hive/Snowflake are the only overlapping concerns here, and I can see
why they would have both (Snowflake may be the better product, but at the end
of the day Netflix owns their Hive install).

Everything else solves different problems at Netflix scale. I'm spitballing
here but Cassandra could be used for metadata serving (high throughput,
embarrassingly parallel reads with high uptime), RDS for their billing system
(transactions, ACID, etc), Druid for realtime OLAP and Presto as an interface
to Hive/Snowflake.

A smaller company wouldn't need this level of complexity (if you aren't large,
you could probably serve your metadata from MySQL, and just use Snowflake as
your OLAP engine).

------
kohtatsu
Hard sign-up walled :<

[https://i.imgur.com/mSk4uiI.png](https://i.imgur.com/mSk4uiI.png)

~~~
kofejnik
nope; opened just fine in an anon chrome window with ublock origin

------
RcouF1uZ4gsC
I wonder if Netflix is spending a lot of money and effort on things that
consumers don't really care about.

I don't really care for personalized recommendations. As long as the movies
and TV Shows have some type of keyword or grouping, I am happy. Given the
relatively fixed size of the catalog, and that most movies and TV shows are
already classified (for example as Action or Romance or Comedy etc) this
doesn't seem like something where a lot of data would do a lot to help.

For me, most streaming movie companies are pretty equivalent in their
experience. The only data I really need them to keep track of about me is what
I have watched so I know if I have already watched something before, and where
I am in the movie/tv series, so I can pick back up when I log in again.
Everyone pretty much does it. As for streaming quality, basically everyone
knows how to transcode movies and put them on a CDN. The size and quality of
the catalog is the biggest differentiator for me.

~~~
syspec
"I personally don't use feature X"

------
SergeAx
Am I getting it right that Netflix's own hardware is negligible or non-existent
and all their infrastructure is deployed to AWS?

~~~
yencabulator
Netflix runs its own CDN system. That's a significant amount of hardware to
own & manage, but their operating costs are probably less than what a
conventional data center would have, because it's in the ISP's interest to
host a local cache (to minimize bandwidth use).

[https://datacenterfrontier.com/mapping-netflix-content-
deliv...](https://datacenterfrontier.com/mapping-netflix-content-delivery-
network/) [2016]
[https://openconnect.netflix.com/](https://openconnect.netflix.com/)

------
discodave
Why on earth would Netflix want their blog behind a paywall?

Absurd.

~~~
timy2shoes
You can open it in incognito mode to avoid the paywall.

~~~
ramraj07
Problem solved, absurdity not solved.

------
TheGuyWhoCodes
Wow so a very large portion of their spend is on Cassandra. I bet it's heavily
modified for their needs but I wonder if Scylladb would be able to replace
some of those clusters for a considerable saving in maintenance time and
money.

