
How Netflix uses Druid for realtime insights - uaas
https://netflixtechblog.com/how-netflix-uses-druid-for-real-time-insights-to-ensure-a-high-quality-experience-19e1e8568d06
======
PedroBatista
"To keep reading this story, create a free account."

Seriously? I know it's the Medium hustle but someone at Netflix should know
better.

~~~
yingw787
I think one cool feature Medium could introduce is enterprise permissioning.
If you have an enterprise plan and pay Medium oodles of money, you should be
able to maintain your own permissioning controls and decide who sees your
articles while still having access to the Medium social network. Best of both
worlds.

~~~
dehrmann
> oodles of money

Medium isn't worth oodles, though. Looking at Squarespace and Wordpress
pricing, for a company's tech blog, it's worth $25-$100 per month. Unless
there's value from the Medium brand...

------
shaklee3
Can anyone from Netflix comment on why not Clickhouse? We tried Druid, and the
performance was pretty bad compared to Clickhouse.

~~~
modarts
Can confirm Clickhouse is generally faster across most typical workloads.

~~~
jpgvm
Clickhouse is much more resource efficient in many cases but is less flexible
and, importantly, less extensible than Druid.

Druid can easily be extended through available 3rd party extensions and you
can write your own to implement custom serialisation formats, aggregations,
connect to new streaming systems, read directly from whatever cold storage you
have etc.

In the Clickhouse model you have to work out a lot more of that stuff
yourself, though these days it can read from Kafka directly, which is useful.
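For example, roughly (a minimal sketch using the clickhouse-driver Python
client; the hosts, topic, and schema here are made up):

    # Minimal sketch: ClickHouse consuming a Kafka topic via its Kafka table
    # engine. Host names, the topic, and the schema are hypothetical.
    from clickhouse_driver import Client

    client = Client("clickhouse-host")

    # Target table that queries actually hit.
    client.execute("""
        CREATE TABLE events (
            ts DateTime,
            bitrate UInt32
        ) ENGINE = MergeTree ORDER BY ts
    """)

    # Kafka-engine table acting as the consumer.
    client.execute("""
        CREATE TABLE events_queue (
            ts DateTime,
            bitrate UInt32
        ) ENGINE = Kafka
        SETTINGS kafka_broker_list = 'kafka:9092',
                 kafka_topic_list = 'playback-events',
                 kafka_group_name = 'clickhouse-consumer',
                 kafka_format = 'JSONEachRow'
    """)

    # Materialized view that continuously moves consumed rows into the target.
    client.execute("""
        CREATE MATERIALIZED VIEW events_mv TO events AS
        SELECT ts, bitrate FROM events_queue
    """)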

One thing that matters for Clickhouse vs Druid at big scale is the rather
large difference in indexing approaches. Clickhouse uses bloom filters and
other probabilistic data structures to index large chunks of data; for the
most part, though, actually checking for rows requires a full scan of that
chunk to strip false positives.

This is different from Druid, which uses full inverted indices for dimension
filtering.

The tradeoff is basically that Clickhouse is cheaper, especially when scaling
out, but Druid is faster, especially when the cluster is under heavy
concurrent query load, like serving analytics dashboards or data exploration
interfaces to users.

Clickhouse excels when you want to scan most but not all the data most of the
time. Namely reporting or bulk analytics queries that will hit most rows in a
block.
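
Roughly, the two filtering styles look like this (a toy Python sketch with
made-up data, not either engine's actual on-disk format):

    # Toy contrast of the two filtering styles; data and layout are made up.
    rows = [{"country": "US"}, {"country": "BR"},
            {"country": "US"}, {"country": "FR"}]

    # Druid-style: an inverted/bitmap index maps each dimension value to its
    # row ids, so a filter is a direct lookup (plus bitmap intersections for
    # multi-dimension filters) with no row scan.
    inverted = {}
    for i, row in enumerate(rows):
        inverted.setdefault(row["country"], set()).add(i)
    druid_matches = sorted(inverted["US"])            # [0, 2]

    # Clickhouse-style: a probabilistic skip index per granule (block of rows)
    # can only say "definitely absent" or "maybe present"; granules that
    # survive still get scanned to strip false positives.
    granule_size = 2
    clickhouse_matches = []
    for start in range(0, len(rows), granule_size):
        granule = rows[start:start + granule_size]
        maybe_present = any(r["country"] == "US" for r in granule)  # stand-in for a bloom filter
        if maybe_present:
            for i, r in enumerate(granule, start):    # scan the surviving granule
                if r["country"] == "US":
                    clickhouse_matches.append(i)

    assert druid_matches == clickhouse_matches == [0, 2]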

I consider both to be excellent databases.

~~~
gianm
Druid committer here. (Also, I think we've met before in SF!)

One thing I wanted to add with regard to performance. Druid does indeed get a
big boost from the fact that it uses inverted indexes for filtering. It also
gets a boost from having a wide variety of approximate algorithms you can use
if you want (for things like topN, count distinct, set
difference/intersection, quantiles, etc). But straight scan performance has
been improving quite a bit recently too.

The biggest change related to straight scan perf is fully vectorizing the
query engine, which is partially done as of the latest release (0.17):
https://druid.apache.org/docs/latest/querying/query-context.html#vectorizable-queries
In benchmarks, the implementation so far has been posting row scan rate
improvements in the 2-3x range. I expect we'll be able to round it out and
have it work for all queries over the next couple of releases. The multiples
involved mean this is quite meaningful if you do a
lot of straight scans.
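
For instance, you can opt a native query into the vectorized engine through
the `vectorize` query context flag. A rough sketch (the datasource, interval,
and broker address are hypothetical):

    # Rough sketch: forcing the vectorized engine via the "vectorize" context
    # flag. Datasource, interval, and broker address are hypothetical.
    import requests

    query = {
        "queryType": "timeseries",
        "dataSource": "playback_events",
        "intervals": ["2020-03-01/2020-03-02"],
        "granularity": "minute",
        "aggregations": [
            {"type": "longSum", "name": "errors", "fieldName": "error_count"}
        ],
        # "true" vectorizes when possible; "force" errors out if the query
        # can't be vectorized, which is handy for benchmarking.
        "context": {"vectorize": "force"},
    }

    resp = requests.post("http://broker:8082/druid/v2", json=query)
    print(resp.json())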

There's plenty of other stuff going on too: our latest release added parallel
merging of large result sets. Our next one (0.18) is going to add a new, more
efficient hash aggregation engine. That next release is also going to add a
JOIN operator -- not perf related, but probably the number one most requested
feature.

~~~
barrkel
I was reading the details on inverted index usage in Druid, but what is
described seems to be bitmap indexes, not inverted indexes.

Inverted indexes map distinct values in a column to a list of document ids
containing the value. Bitmap indexes map distinct values to an array of
booleans the same length as the number of documents, with true for presence
and false for absence. Both index types can be highly compressed, of course.

Can you clarify what Druid is using?

~~~
gianm
Logically, an array of booleans and a set of integers are equivalent. So in
the Druid developer community we usually use the terms interchangeably. But to
be precise, our indexes are all stored as bitmaps and compressed with bitmap
compression libraries.
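
A quick way to see the equivalence, ignoring compression (plain Python, not
Druid's actual storage code):

    # The same index entry two ways: a set of matching row ids (posting list)
    # vs. a bitmap over all rows. They convert losslessly in both directions.
    num_rows = 8
    row_ids = {1, 3, 4}                               # "set of integers" view
    bitmap = [i in row_ids for i in range(num_rows)]  # "array of booleans" view

    assert {i for i, bit in enumerate(bitmap) if bit} == row_ids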

------
devmunchies
> During software updates, we enable the new version for a subset of users and
> ... compare how the new version is performing vs the previous version. Any
> regression in the metrics gives us a signal to abort the update and revert

How do you account for the possibility that the update only performs badly
because it’s different than what users are used to, but would actually be an
improvement in the long run?

~~~
Twirrim
> How do you account for the possibility that the update only performs badly
> because it’s different than what users are used to, but would actually be an
> improvement in the long run?

That's assuming that the update contains a UI/UX change. A lot of updates that
will roll out won't include that; they'll be fixing or optimising things.

------
magicalhippo
> How can we be confident that updates are not harming our users?

Maybe this is why they finally figured out the auto-playing trailer stuff was
a horrible change (I cancelled my account over that). One can only hope.

------
woolly
Is it possible to view this article without creating a medium account?

~~~
tschmidleithner
It looks like Medium is opening their paywall modal based on viewed articles
(and cookies). Therefore, you could try private browsing.

~~~
shivaas
Can confirm. Opening the link in incognito worked for me.

------
alexandercrohde
I wonder how this choice compares to using a hosted solution like Splunk (both
in capabilities and cost).

~~~
prtaylor
That's an interesting thought.

------
zerop
So this is behind a paywall. I am sure Netflix is not marking this story as
paywalled, and Medium cannot put this story behind a paywall without consent
from Netflix.

~~~
manigandham
It's not. It's medium.com asking for signin (which is a free account) but you
can delete your cookies and reload to view.

~~~
zerop
I am already logged in and it is still a featured story. It's counted against
the 3 featured stories I can read in a month. So it is behind a paywall.

~~~
rad_gruchalski
Then log out. It’s not a paywall. Log out and you can read it. Medium is
playing you.

~~~
tjoff
Proper response: delete account and don't visit medium again.

Even better: sites such as hn should never allow links to sites employing dark
patterns.

~~~
manigandham
People are interested in the content. No need for HN to start filtering
because of the blog service. You can avoid the site if you want.

~~~
tjoff
There most certainly is a pressing need for sites such as HN to start caring
about stuff like this.

If people start to care about being exploited, the content will move to a
more ethical site. Or it might influence Medium.

~~~
manigandham
People don't care, though: they can make their own choices, and they're
clearly still sharing and reading on Medium. You're asking for HN to
arbitrarily enforce _your wishes_ on top of what users are doing.

~~~
tjoff
They do, and it is hardly my wishes alone.

If 1% don't care and share the crap, the other 99% still have to suffer for it.

~~~
rad_gruchalski
That's assuming the 99% care to suffer. I don't: if I see a post from the
Wall Street Journal, I don't click. I know it's a paywall. People have a
choice to click or not.

I do agree that Medium is a bit of a mess where one is "less privileged" when
signed in.

But what we should be asking for is companies like Netflix not using Medium
in the first place.

~~~
tjoff
Ban medium from HN and companies like netflix will switch in 5 minutes.

------
polskibus
I wonder if materialize.io could handle such workloads at this stage.

~~~
adamfeldman
Materialize, today, looks to be limited to running on a single node.
[https://materialize.io/docs/overview/architecture](https://materialize.io/docs/overview/architecture)

Netflix's workload would likely exhaust the resources of even a vertically-
scaled single node.

~~~
derefr
Mind you, given the “Timely Dataflow” abstraction Materialize operates on top
of, if you give it a query that only requires certain result rows from one of
its mat views, then Materialize is only going to compute the intermediate rows
(and, further back, retrieve the source rows) required to “render” the
particular result-rows you ask for. (Sort of like how Excel, in memory-
constrained conditions, only computes the intermediate cell-values required to
render the cells currently in view, and thus, if an intermediate cell requires
an `XMLHttpRequest` call to resolve, that call won’t fire if the cell’s value
doesn’t need resolving yet.)

Because of that, you don’t really need to scale Materialize in a sharding
sense. You can just have a bunch of “the same” Materialize node (i.e. every
node is just a freestanding clone of a template node, with exactly the same sources
and matviews) and then hit them with the parts of a map-reduce query launched
by, say, Citus—where Citus was thinking it was talking to a bunch of Citus
shard nodes each holding a table-shard named X, but was actually talking to
Materialize nodes each holding a matview named X. As long as the query sent
from the map-reduce job to each node is constrained in its WHERE clause to
only the part of the data it _expects_ to get from that node—rather than
relying on the node to know what data it has—then the Materialize nodes would
each just do the work required to supply that data (including only pulling in
the parts of the configured sources required to compute that result.)

That’s just my intuition from how Materialize presents itself as PG-wire-
protocol compatible, though; I haven’t tried this myself, and there might be
some footguns in the path of anyone really trying to implement it.
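
A rough sketch of that fan-out, assuming a hypothetical matview and shard key
(Materialize speaks the Postgres wire protocol, so a stock Postgres driver
should be able to talk to it):

    # Rough sketch: identical Materialize clones, each asked only for its
    # slice of the same matview via an explicit WHERE predicate, with the
    # final merge done client-side. Node addresses, the view name, and the
    # shard column are hypothetical.
    import psycopg2

    nodes = ["materialize-0", "materialize-1", "materialize-2"]

    def shard_query(node_index, host):
        conn = psycopg2.connect(host=host, port=6875, user="materialize",
                                dbname="materialize")
        with conn.cursor() as cur:
            # The predicate tells this node which slice we expect from it, so
            # it only does the dataflow work needed to answer for that slice.
            cur.execute(
                "SELECT region, sum(errors) FROM playback_errors "
                "WHERE mod(user_id, %s) = %s GROUP BY region",
                (len(nodes), node_index),
            )
            return cur.fetchall()

    # Client-side "reduce": merge the partial aggregates from each node.
    totals = {}
    for i, host in enumerate(nodes):
        for region, errors in shard_query(i, host):
            totals[region] = totals.get(region, 0) + errors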

And, of course, this is all irrelevant the moment you write a query that needs
a pure reduce (e.g. the computation of a current finite-state-machine state
over an event-stream source) rather than a map-reduce. Druid/Clickhouse/etc.
can probably “scale” those, in at least the Hadoop “move the job around
between each serial stage, so each stage has data-locality for the data of
that stage” sense; while Materialize would give you no benefit at all in such
a job over just querying a plain PG view defined on top of a Foreign Data
Wrapper source.

~~~
quodlibetor
> if you give it a query that only requires certain result rows from one of
> its mat views, then Materialize is only going to compute the intermediate
> rows

This is absolutely correct!

> You can just have a bunch of “the same” Materialize node (i.e. every node
> just freestanding clone of a template node, with exactly the same sources
> and matviews) and then hit them with the parts of a map-reduce query

This should work, but we have been thinking about it/testing it differently
internally. In general you should be able to create materialized views on
different "shards" that have different `where` conditions, allowing you to
control memory that way. This technique does require data that is actually
partitionable in this way, same as it must be partitionable in mapreduce.
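
For example (hypothetical source, shard key, and view names; the exact syntax
may differ slightly), each node would maintain only its slice:

    # Sketch of per-shard materialized views: each node maintains only its
    # partition of the data, which bounds its memory use.
    import psycopg2

    def create_shard_view(host, shard, total_shards):
        conn = psycopg2.connect(host=host, port=6875, user="materialize",
                                dbname="materialize")
        conn.autocommit = True
        with conn.cursor() as cur:
            cur.execute(f"""
                CREATE MATERIALIZED VIEW errors_by_region AS
                SELECT region, sum(errors) AS errors
                FROM playback_events
                WHERE mod(user_id, {total_shards}) = {shard}
                GROUP BY region
            """)

    for i, host in enumerate(["materialize-0", "materialize-1", "materialize-2"]):
        create_shard_view(host, i, 3)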

> this is all irrelevant the moment you write a query that needs a pure reduce

Of course, with materialize's sinks you can spin up a bunch of `materialized`s
and connect them for a final reduce after data has gone through e.g. kafka or
shared files. Being able to write joins and aggregates across heterogeneous
sources makes this kind of workload actually pretty pleasant.

------
siddcoder
Wondering if any performance comparison was done with Apache Pinot?

------
dikei
I tried setting up a Druid cluster once; it has so many components that it's
hard to manage. Also, Druid didn't support SQL at the time, and we didn't want
to learn its weird query language due to the lack of documentation.

