
DDSketch: A fast, fully-mergeable quantile sketch with relative-error guarantees - jbarciauskas
https://arxiv.org/abs/1908.10693
======
homin
Author here. We wanted to be able to graph p99, p99.9 metrics with arbitrary
ranges, and found the existing solutions were not accurate enough for our
needs. Happy to answer any questions.

Code here:

[https://github.com/DataDog/sketches-go](https://github.com/DataDog/sketches-go)

[https://github.com/DataDog/sketches-py](https://github.com/DataDog/sketches-py)

[https://github.com/DataDog/sketches-java](https://github.com/DataDog/sketches-java)

~~~
heinrichhartman
Nice work! Averaging percentiles is well known to give terrible results. Glad
to see more people taking this problem seriously and providing viable
alternatives!

A note on Accuracy: At Circonus, we have been using a version of HDR-
Histograms [1] for many years to aggregate latency distributions, and
calculate accurate aggregated percentiles. Accuracy was never a problem
(worst-case error <5%, usually _much_ better).

If I read your evaluation results correctly, you also found HDR-Histograms to
be as accurate as or more accurate than DDSketch, correct?

The differentiators from HDR Histograms seem to be merging speed and size,
where DDSketch seems to have an edge.

One thing that is not immediately clear to me from reading the paper: how much
of the distribution function can be reconstructed from the sketch? E.g. for
SLO calculations one is often interested in latency bands: "How many requests
were faster than 100ms?" [2].

Is it possible to approximate CDF values ("lower counts") from the sketch with
low/bounded error?
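
For reference, here is a toy sketch (not any particular library's API; names
and numbers are made up) of what such a band query looks like against a
log-bucketed histogram that keeps one count per bucket:

    import math
    from collections import Counter

    # Toy log-bucketed histogram: bucket i holds the count of values in
    # (gamma^(i-1), gamma^i].
    alpha = 0.01                            # target relative accuracy
    gamma = (1 + alpha) / (1 - alpha)

    def bucket_index(x):
        return math.ceil(math.log(x, gamma))

    latencies = [0.004, 0.052, 0.087, 0.093, 0.140, 1.900]  # seconds, made up
    counts = Counter(bucket_index(x) for x in latencies)

    # "How many requests were faster than 100ms?" becomes a prefix sum over
    # bucket counts. The result is the exact count of values below the nearest
    # bucket boundary, which is within a factor of gamma (~1 + 2*alpha) of the
    # 100ms threshold.
    threshold = bucket_index(0.100)
    faster = sum(c for i, c in counts.items() if i <= threshold)
    print(faster)  # 4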

[1] [https://github.com/circonus-labs/libcircllhist](https://github.com/circonus-labs/libcircllhist)

[2] [http://heinrichhartmann.com/pdf/Heinrich%20Hartmann%20-%20La...](http://heinrichhartmann.com/pdf/Heinrich%20Hartmann%20-%20Latency%20SLOs%20Done%20Right%20\(FOSDEM,%202019\).pdf)

~~~
homin
HDR is great for the use case where you can bound your range beforehand and
merging is not a requirement, but those limitations were also the reasons we
needed to develop DDSketch.

~~~
heinrichhartman
Sorry, but I don't follow this argument:

(1) HDR-Histogram merges are 100% accurate and very fast (few microseconds)

(2) The range of HDR-Histogram is bounded in the same way that floating-point
numbers are bounded. Hence the name "High Dynamic Range":

> For example, a Histogram could be configured to track the counts of observed
> integer values between 0 and 3,600,000,000,000 while maintaining a value
> precision of 3 significant digits across that range.

[http://hdrhistogram.github.io/HdrHistogram/](http://hdrhistogram.github.io/HdrHistogram/)

Circllhist offers a default range of 10^-128 .. 10^+127, which has been more
than ample for all use-cases I have seen.

[https://github.com/circonus-labs/libcircllhist/blob/master/s...](https://github.com/circonus-labs/libcircllhist/blob/master/src/circllhist.h#L66)
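
To make that concrete, here is a toy model (my sketch, not the library's
actual code) of circllhist-style log-linear buckets: two significant decimal
digits plus a base-10 exponent:

    import math
    import random

    # Toy log-linear bucketing: keep a base-10 exponent and two significant
    # decimal digits, and reconstruct a value as its bucket midpoint.
    def to_bucket(x):
        k = math.floor(math.log10(x))           # exponent, e.g. -3 for 4.2ms
        m = math.floor(x / 10 ** k * 10) / 10   # mantissa, two significant digits
        return m, k

    def reconstruct(m, k):
        return (m + 0.05) * 10 ** k             # midpoint of the bucket

    # A one-byte exponent already covers 10^-128 .. 10^+127, and the relative
    # error of the reconstruction never exceeds 5%, whatever the magnitude.
    random.seed(1)
    worst = 0.0
    for _ in range(100_000):
        x = 10 ** random.uniform(-128, 127)
        m, k = to_bucket(x)
        worst = max(worst, abs(reconstruct(m, k) - x) / x)
    print(worst)  # <= 0.05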

------
germanjoey
Hey, congratulations, this is a really cool algorithm! Thanks for sharing it.

I'm interested in this paper because I worked on a somewhat related problem
some time ago, but got stuck on how to handle data that morphs into a mixed-
modal distribution. Modes that are close together are no big deal, but modes
that are spaced more exponentially apart are tricky to deal with. For an
example of something that would be in DataDog's purview, imagine trying to
sketch the histogram of response times from an endpoint that sometimes took a
"fast" path (e.g. a request for a query whose result was cached), sometimes
took a "normal" path, and sometimes took a "slow" path (e.g. a query with a
flag that requested additional details to be computed). If the response times
from the slow path are much bigger than the others, e.g. by an order of
magnitude, their statistics might essentially drown out the data from the
other two paths, since you're using them to calculate bin size.
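
(To make the drown-out effect concrete, here is a toy example with made-up
numbers: a fixed-width histogram whose bin size is derived from the observed
range loses the fast mode entirely.)

    import random

    # Three latency modes around 5ms, 50ms and 500ms, pushed through a
    # fixed-width histogram whose bin size comes from the observed range.
    random.seed(0)
    samples = ([random.gauss(0.005, 0.001) for _ in range(5000)] +   # fast path
               [random.gauss(0.050, 0.010) for _ in range(4000)] +   # normal path
               [random.gauss(0.500, 0.100) for _ in range(1000)])    # slow path

    bins = 50
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins                    # the slow path dictates bin width

    # How many of the 50 bins does the fast path actually occupy?
    fast_bins = {int((x - lo) / width) for x in samples[:5000]}
    print(round(width, 4), len(fast_bins))      # bins are ~17ms wide; the entire
                                                # fast mode collapses into one bin,
                                                # so its percentiles are lost.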

I noticed you had some results from measuring DDSketch's performance on a
mixed-modal distribution that looked pretty good (that "power" distribution on
the last page). Have you done any more investigation in this area? E.g. how
messy/mixed can the data be before the sketch starts to break down?

------
MrBuddyCasino
How does this compare to t-digest?

[https://github.com/tdunning/t-digest/blob/master/docs/t-dige...](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)

~~~
MrBuddyCasino
Ah there is a short comparison in "related work":

> The problems of having high relative errors on the larger quantiles has been
> addressed by a line of work that still uses rank error, but promises lower
> rank error on the quantiles further away from the median by biasing the data
> it keeps towards the higher (and lower) quantiles [7], [8], [17]. The latter,
> dubbed t-digest, is notable as it is one of the quantile sketch
> implementations used by Elasticsearch [18]. These sketches have much better
> accuracy (in rank) than uniform-rank-error sketches on percentiles like the
> p99.9, but they still have high relative error on heavy-tailed data sets.
> Like GK they are only one-way mergeable.

------
skyde
Great! So is the main difference more accuracy on average, or more the fact
that the maximum possible error is bounded?

~~~
homin
Most previous sketches used what they called "rank accuracy". Say your inputs
were [2^1, 2^2, ..., 2^1000]. The actual p95 is 2^950, and a 0.01 rank-accurate
sketch would be allowed to give you any value between 2^940 and 2^960 (which
can be up to a factor of 1024 away).

A 0.01 relative-accurate sketch has to give you a value within 1% of the
actual p95.

The above example is contrived, but with real web request data, there is often
a very long tail, and rank accurate sketches quickly start giving values far
from the actual percentiles.

So, to answer your question more directly: it's that the maximum possible
error is bounded.
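
To spell out the arithmetic of that contrived example (just the definitions at
work, not any particular implementation):

    # Inputs are [2^1, 2^2, ..., 2^1000]; the true p95 is the value at rank 950.
    values = [2 ** i for i in range(1, 1001)]
    true_p95 = values[950 - 1]                       # 2^950

    # A 0.01 rank-accurate sketch may return any value whose rank is within
    # 1000 * 0.01 = 10 of rank 950, i.e. anything from 2^940 to 2^960.
    worst_rank_answer = values[960 - 1]
    print(worst_rank_answer / true_p95)              # 1024.0, a factor of 2^10 off

    # A 0.01 relative-accurate sketch must return v with |v - 2^950| <= 0.01 * 2^950,
    # i.e. within 1% of the true quantile, however stretched the tail is.
    print(1.01 * true_p95 / true_p95)                # ~1.01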

