
Monarch: Google’s Planet-Scale In-Memory Time Series Database [pdf] - ngaut
http://www.vldb.org/pvldb/vol13/p3181-adams.pdf
======
kccqzy
As someone interested in PL theory, I've long found the query language exposed
by Monarch surprisingly interesting (briefly discussed in section 5.1 but the
description doesn't quite do it justice). It's a functional language, a breath
of fresh air compared to "real programming languages" in use at Google like
C++, Java, or Go.

The most interesting idea is that its native data types are _time series_ of
integers, doubles, distributions of doubles, booleans, and tuples of the
above. This means that the data you operate on intrinsically consist of many
timestamped data points. It's easy to apply an operation to each point of the
data, and it's also easy to apply operations on a rolling window, or on
successive points. This gives the language the feel of an array-based
language, but better: the elements are timestamped and the array can be
sparse.

Furthermore, the fields attached to each data point add more dimensions of
aggregation beyond the inherent time dimension, so the language takes on the
feel of a native multi-dimensional array language. It feels amazing to
program in it. You can easily do sophisticated queries like figuring out how
many standard deviations each task's RPC latency for a specific call is above
or below all tasks' mean, for outlier detection.
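To make that concrete (plain Python with made-up latency numbers, not
Monarch's actual language), the outlier query reduces to something like:

```python
from statistics import mean, stdev

# Hypothetical per-task RPC latency samples (ms) for one call; in Monarch
# these would be time series with fields, but plain lists suffice here.
latency = {
    "task-0": [12.1, 11.8, 12.4],
    "task-1": [11.9, 12.2, 12.0],
    "task-2": [30.5, 29.8, 31.2],   # the misbehaving task
}

per_task_mean = {task: mean(xs) for task, xs in latency.items()}
fleet_mean = mean(per_task_mean.values())
fleet_stdev = stdev(per_task_mean.values())

# How many standard deviations each task sits above or below the fleet mean:
z_scores = {task: (m - fleet_mean) / fleet_stdev
            for task, m in per_task_mean.items()}
outliers = [task for task, z in z_scores.items() if abs(z) > 1.0]
```

In the real language the grouping over fields and the windowing over time are
built in, so the query is a short pipeline rather than hand-rolled loops.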

~~~
apk
The query language is the brainchild of John Banning, one of the authors of
the paper, and has a long history behind it. In 2007 or so he started working
on a replacement for Borgmon's rule language; the thinking at the time was
that the main problem with Borgmon was that its language was surprising and
difficult for casual users to grasp. (And with a monitoring language, there
are only casual users.)

That work eventually resulted in a language called Optic, which was indeed
(IMO) a very nice cleanup of Borgmon. Ultimately though that work got shelved
in favor of Monarch, whose focus was less on the language problems of Borgmon
and more on the points listed in the introduction of the paper, especially
points 1, 3, and 4 (at least in my memory).

The underpinnings of the query data model and execution model got hashed out
reasonably well as part of the first implementation of Monarch, which started
in earnest in late 2008 or early 2009. But the textual form of the query
language suffered for quite a long time after that. I wrote the first crappy
version of an operators-joined-by-pipes language sometime in 2010. ("Language"
is a generous term; John liked to refer to it in a kindly way as "an
impoverished notation.") But it was clear even then that the basics of that
syntax were appealing: they lined up nicely with how our users mentally
constructed their queries. "You start with the raw data; then apply a rate;
then aggregate by these fields; then take the maximum over the last five
minutes" etc.
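That mental model maps almost one-to-one onto composable stages. A
plain-Python sketch of the pipe-of-operators shape, with invented operator
names, data layout, and numbers (not Monarch's actual API):

```python
# A series is a sorted list of (timestamp_seconds, value) points, here all
# aligned on the same timestamps for simplicity.

def rate(series):
    """Per-second rate of change between successive points of a counter."""
    return [(t2, (v2 - v1) / (t2 - t1))
            for (t1, v1), (t2, v2) in zip(series, series[1:])]

def group_by_sum(tables, key_fn):
    """Sum series point-wise across all field tuples mapping to the same key."""
    out = {}
    for fields, series in tables.items():
        k = key_fn(fields)
        if k not in out:
            out[k] = list(series)
        else:
            out[k] = [(t, a + b) for (t, a), (_, b) in zip(out[k], series)]
    return out

def window_max(series, seconds):
    """Maximum over a trailing window ending at each point."""
    return [(t, max(v2 for t2, v2 in series if t - seconds < t2 <= t))
            for t, _ in series]

# Hypothetical raw counters, keyed by (job, task) fields:
raw = {
    ("job-a", "task-0"): [(0, 0.0), (60, 600.0), (120, 1800.0)],
    ("job-a", "task-1"): [(0, 0.0), (60, 300.0), (120, 600.0)],
}

# "Start with the raw data; apply a rate; aggregate by job; max over 5 min":
summed = group_by_sum({f: rate(s) for f, s in raw.items()},
                      key_fn=lambda fields: fields[0])
result = window_max(summed["job-a"], seconds=300)
```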

Through a couple of revisions over the subsequent few years, that
"impoverished notation" eventually got embedded, through some awful operator
overloading, as a kind of DSL inside of Python. But it was clear to everyone
that it would be impossible to release that publicly to GCP users; it was much
too clunky, and also by then tied inextricably to Python idiosyncrasies. So in
about 2015, give or take, we came back to the question of what a better
textual notation might look like.

The obvious first choice was to see if we could somehow twist SQL into being
useful, possibly with some custom functions or very minor extensions. Around
this time there was a large effort going on to standardize several different
SQL dialects that were being used by internal systems (BigQuery/Dremel's SQL
dialect was not the same as Spanner's dialect, etc). So it felt like there was
a convenient opportunity to somehow fit time series data into the same model.

John did a bunch of due diligence to try to make that idea work, but it just
wouldn't fly. I remember a list he had of about fifty of the most common kinds
of queries, written with a SQL version next to (an early version of) Monarch's
current query language. Nearly everyone he showed it to, across the spectrum
of experience and seniority, both SWE and SRE, said "of course I'd rather read
and write SQL, let me look at that list"... and then went through it carefully
and came out thinking, well, maybe not.

I don't know if there are any interesting conclusions to draw from the history
of it, except that language design is really hard. I agree that it's a fun
little language, and I'm very happy that John and the team managed to get it
out publicly in Stackdriver.

~~~
1010011010
Googlers still have to use the terrible python dsl (“mash”). Even worse: they
have to use it wrapped in a different terrible python dsl (“gmon”). Sigh.

~~~
lokar
I was mostly using the new notation when I left in 2018.

------
jeffbee
Interesting that this paper contains hard numbers hinting at Google's absolute
scale. They say Monarch has 144000 leaves. Even if each leaf is assigned only
1 CPU core -- which is probably a low estimate because who would do that? --
that makes Google's monitoring stack a Top 100 supercomputer.
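Back-of-the-envelope, with an assumed per-core throughput that is not from
the paper:

```python
# Rough sanity check on the "Top 100 supercomputer" claim.
leaves = 144_000
cores_per_leaf = 1        # the deliberately low assumption above
gflops_per_core = 20      # ballpark for a server core with SIMD; an assumption
total_pflops = leaves * cores_per_leaf * gflops_per_core / 1e6
# ~2.9 PFLOPS, which was in the neighborhood of the low-petaflops machines
# near the bottom of a circa-2020 Top 100 list.
```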

The only other places I've seen Google give out hard numbers were a
presentation by Jeff Dean mentioning map-reduce core-years consumed per day,
and a footnote in a paper that mentions how much CPU time Google Exacycle
gives away to scientific computing every day. All of these calibration points
are eye-opening.

~~~
saddlerustle
Yeah the "supercomputer" ranking is a bit of a joke. Every mid-sized google dc
would count as a top 10 supercomputer.

~~~
jankeymeulen
I work at one such "medium sized" Google DC. Supercomputers typically have
much denser interconnects, whereas we have a much more traditional topology.

------
m0zg
Question for googlers (and ex-googlers). Is there anything out there that's as
convenient as /streamz, but in opensource form?

~~~
yegle
If you are looking for /varz, [https://prometheus.io](https://prometheus.io)

If you really want /streamz, with simple aggregation for distributions,
descriptions for metrics, etc. in an HTML page, I haven't encountered one yet.

~~~
m0zg
No, looking for streamz specifically. I'd like to be able to see how each node
is doing and have automatic, zero-config aggregation in k8s a-la Borg/Monarch.
The service I built was one of the first large users of Monarch at Google many
years ago. All because I couldn't be bothered to learn Borgmon. :-)

------
polskibus
I wonder if Google will ever open source it.

~~~
kyrra
Googler, opinions are my own.

Open sourcing anything at Google tends to be difficult because much of the
technology we build sits on top of other internal technology that has not
been released; untangling all of those internal dependencies takes a great
amount of effort.

It should be noted that Prometheus is an open source recreation of the
precursor to Monarch (developed by a former Google SRE who has since
returned).

[https://prometheus.io/](https://prometheus.io/)

~~~
fizixer
But when you publish a paper on Monarch, aren't you giving away the core idea?

In the case of PageRank, at least it was published after Google had already
dominated the search space, i.e., many years after the conception,
implementation, and utilization.

I wonder if Google papers are internally reviewed before publishing so as to
make sure only partial information is revealed, and not the secret sauce.

~~~
joshuamorton
[Also a Googler, opinions mine] In addition to what the other people are
saying: there are some limitations to monarch (or really, the data upload
path) that are quite annoying, so monarch isn't even necessarily the "best".
It's just very good. There are ways to improve it.

The issue is, even if you give away the secret sauce, that doesn't really
help with making the secret sauce scale, and nobody who isn't a large cloud
provider needs a custom solution like monarch anyway. Prometheus or Datadog
work fine for everyone else. This might be interesting reading for those
companies, but it also might not be, because those products can't be as
centralized as monarch is (imagine if Prometheus had an API and ran a
centralized cluster of data-ingestion servers, and you sent your time series
to that global, Prometheus-owned cluster).

~~~
jldugger
The one thing I really want (which apparently Monarch has) is histogram
retention. I'm often called upon to summarize service latency as global p50
and p95, and at the sheer volume of data we have, we aggregate that metric.
Thus I am left calculating an average of p95s, which isn't super useful.

To the best of my knowledge, nothing else in the market does that.
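To see why the histogram retention matters: merging bucketed histograms and
then taking the quantile recovers the true global p95, while averaging
per-shard p95s can be wildly off. A sketch with invented bucket bounds and
traffic:

```python
# Made-up bucket upper bounds (ms) and per-shard bucket counts; the point is
# the technique, not the numbers.
bounds = [25, 50, 100, 250, 500]

def percentile_from_buckets(counts, q):
    """Approximate the q-quantile by interpolating inside its bucket."""
    total = sum(counts)
    target, seen = q * total, 0
    for i, c in enumerate(counts):
        if c and seen + c >= target:
            lo = bounds[i - 1] if i > 0 else 0.0
            return lo + (bounds[i] - lo) * (target - seen) / c
        seen += c
    return float(bounds[-1])

shard_a = [1000, 0, 0, 0, 0]   # 1000 fast requests, all under 25 ms
shard_b = [0, 0, 0, 0, 10]     # 10 slow requests, 250-500 ms

# Averaging per-shard p95s weights the tiny shard equally with the big one:
averaged_p95 = (percentile_from_buckets(shard_a, 0.95) +
                percentile_from_buckets(shard_b, 0.95)) / 2   # ~256 ms

# Merging the histograms first gives the p95 of the real global population:
merged = [a + b for a, b in zip(shard_a, shard_b)]
merged_p95 = percentile_from_buckets(merged, 0.95)            # ~24 ms
```

Bucket counts are additive across shards and across time windows, which is
what makes the global quantile computable at all.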

~~~
jeffbee
Stackdriver/Google Cloud monitoring is backed by Monarch, so if you want the
flavor of a Monarch distribution-valued metric, see the docs:

[https://cloud.google.com/monitoring/api/ref_v3/rest/v3/Typed...](https://cloud.google.com/monitoring/api/ref_v3/rest/v3/TypedValue#Distribution)

Since the distribution is represented by a CDF of buckets, there's no
guarantee that you'll get an accurate representation of the median or any
other quantile. On the other hand you'll get an exact average.
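A small illustration of that trade-off (made-up samples and buckets; the
retained state mirrors the idea of counts plus a carried sum, not the actual
Distribution schema): the mean comes out exact while the median is only an
interpolation.

```python
from bisect import bisect_left

samples = [3.0, 7.0, 8.0, 12.0, 40.0]   # hypothetical latencies
bounds = [5.0, 10.0, 50.0]              # bucket upper bounds

counts = [0] * (len(bounds) + 1)        # one overflow bucket past the last bound
for s in samples:
    counts[bisect_left(bounds, s)] += 1
count, total = len(samples), sum(samples)

# The mean is exact because the sum was carried along with the counts:
mean_exact = total / count              # 14.0, identical to mean(samples)

# The median can only be interpolated inside its bucket (true median is 8.0):
target, seen = 0.5 * count, 0
for i, c in enumerate(counts[:-1]):
    if c and seen + c >= target:
        lo = bounds[i - 1] if i > 0 else 0.0
        median_est = lo + (bounds[i] - lo) * (target - seen) / c
        break
    seen += c
```

The estimate lands at 8.75 here: right bucket, but only as accurate as the
bucketing scheme allows.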

------
ur-whale
Given the size of the beast, I wonder what monitors it?

~~~
ithkuil
A borgmon

