The most interesting idea is that its native data types are time series of integers, doubles, distributions of doubles, booleans, and tuples of the above. This means that the data you operate on intrinsically consists of many timestamped data points. It's easy to apply an operation to each point, and just as easy to apply operations over a rolling window or to successive points. This gives the language the feel of an array-based language, but better, because the elements are timestamped and the array can be sparse.
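As a toy illustration (plain Python with invented data; not Monarch's actual API or syntax), here is roughly what that feels like: pointwise operations and rolling windows apply across every timestamped point, gaps and all, without explicit loops over time:

    # Sparse timestamped series: timestamp (seconds) -> value; note the gap at t=30.
    points = {0: 120.0, 10: 135.0, 20: 128.0, 40: 150.0, 50: 147.0}

    # Pointwise operation: applied to each timestamped point.
    scaled = {t: v / 1000.0 for t, v in points.items()}

    # Rolling-window operation: max over the trailing 25 seconds at each point.
    window = 25
    rolling_max = {
        t: max(v2 for t2, v2 in points.items() if t - window < t2 <= t)
        for t in points
    }
    print(rolling_max)  # {0: 120.0, 10: 135.0, 20: 135.0, 40: 150.0, 50: 150.0}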
Furthermore, the presence of fields in each data point adds more dimensions of aggregation beyond the inherent time dimension. Now the language has the feel of a native multi-dimensional array language. It feels amazing to program in. You can easily write sophisticated queries, like figuring out how many standard deviations each task's RPC latency for a specific call is above or below the mean across all tasks, for outlier detection.
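For a sense of what that outlier query boils down to, here is a minimal sketch in plain Python (invented task names and latencies), applied to a single instant rather than a whole series:

    from statistics import mean, stdev

    # Hypothetical data: latency (ms) per task for one RPC method at one instant.
    latency_by_task = {f"task/{i}": ms for i, ms in enumerate(
        [12.1, 11.8, 12.4, 11.9, 12.0, 12.2, 11.7, 12.3, 12.0, 35.0])}

    mu = mean(latency_by_task.values())
    sigma = stdev(latency_by_task.values())
    z_scores = {task: (v - mu) / sigma for task, v in latency_by_task.items()}

    # Flag tasks more than two standard deviations from the mean.
    print({t: round(z, 2) for t, z in z_scores.items() if abs(z) > 2})
    # Only task/9 (the 35.0 ms task) exceeds the threshold.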
That work eventually resulted in a language called Optic, which was indeed (IMO) a very nice cleanup of Borgmon. Ultimately though that work got shelved in favor of Monarch, whose focus was less on the language problems of Borgmon and more on the points listed in the introduction of the paper, especially points 1, 3, and 4 (at least in my memory).
The underpinnings of the query data model and execution model got hashed out reasonably well as part of the first implementation of Monarch, which started in earnest in late 2008 or early 2009. But the textual form of the query language suffered for quite a long time after that. I wrote the first crappy version of an operators-joined-by-pipes language sometime in 2010. ("Language" is a generous term; John liked to refer to it in a kindly way as "an impoverished notation.") But it was clear even then that the basics of that syntax were appealing: they lined up nicely with how our users mentally constructed their queries. "You start with the raw data; then apply a rate; then aggregate by these fields; then take the maximum over the last five minutes" etc.
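As a rough sketch of that operators-joined-by-pipes shape (invented names, rendered in Python; not the actual syntax of Monarch's language), the mental model chains cleanly:

    class Query:
        """Toy pipeline builder; each operator appends a step."""
        def __init__(self, steps):
            self.steps = steps
        def rate(self):
            return Query(self.steps + ["rate"])
        def group_by(self, *fields):
            return Query(self.steps + ["group_by " + ", ".join(fields)])
        def window_max(self, duration):
            return Query(self.steps + ["window_max " + duration])
        def __str__(self):
            return " | ".join(self.steps)

    def fetch(metric):
        return Query(["fetch " + metric])

    q = fetch("rpc/server/latency").rate().group_by("job", "zone").window_max("5m")
    print(q)  # fetch rpc/server/latency | rate | group_by job, zone | window_max 5m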
Through a couple of revisions over the subsequent few years, that "impoverished notation" eventually got embedded, through some awful operator overloading, as a kind of DSL inside of Python. But it was clear to everyone that it would be impossible to release that publicly to GCP users; it was much too clunky, and also by then tied inextricably to Python idiosyncrasies. So in about 2015, give or take, we came back to the question of what a better textual notation might look like.
The obvious first choice was to see if we could somehow twist SQL into being useful, possibly with some custom functions or very minor extensions. Around this time there was a large effort going on to standardize several different SQL dialects that were being used by internal systems (BigQuery/Dremel's SQL dialect was not the same as Spanner's dialect, etc). So it felt like there was a convenient opportunity to somehow fit time series data into the same model.
John did a bunch of due diligence to try to make that idea work, but it just wouldn't fly. I remember a list he had of about fifty of the most common kinds of queries, written with a SQL version next to (an early version of) Monarch's current query language. Nearly everyone he showed it to, across the spectrum of experience and seniority, both SWE and SRE, said "of course I'd rather read and write SQL, let me look at that list"... and then went through it carefully and came out thinking, well, maybe not.
I don't know if there are any interesting conclusions to draw from the history of it, except that language design is really hard. I agree that it's a fun little language, and I'm very happy that John and the team managed to get it out publicly in Stackdriver.
If you ignore the issue of alignment, I actually think that a more conventional array-based language, either something SQL-like or numpy-like, would be more accessible to most people. And things like windowing are incredibly unintuitive for most users.
The only other places I've seen Google give out hard numbers were a presentation by Jeff Dean mentioning map-reduce core-years consumed per day, and a footnote in a paper that mentions how much CPU time Google Exacycle gives away to scientific computing every day. All of these calibration points are eye-opening.
A single-core design allows a very simple concurrency model, without having to worry about cache ping-ponging, false sharing, or myriad other issues. The parallelism is applied at higher layers: each leaf has multiple replicas, so collectively they can still use many cores effectively.
I don't see that the paper gives enough information to help us prune the design space here.
The reason supercomputers are so tiny compared to cloud data centers is that cloud computing is highly and mostly trivially parallel, while supercomputer workloads are tightly coupled (mesh) and serial.
I haven't found anything as good as Monarch and the internal dashboard, though. Monarch always made sense to me, by not doing anything clever. For example, if you want to aggregate across different streams, you have to "align" them first, and the alignment is the operation that defines how to make up missing points. Other systems don't have that explicit alignment step, and its absence makes me struggle with everything else I want to do.
As far as I can tell, Prometheus and InfluxDB have implicit alignment, but are never clear on what the rules are. At my last job, I collected some of the exact same metrics as I did at Google, and while I knew exactly how to get the charts I wanted with Monarch, I could never figure out how to get them with InfluxDB (I later found out that it simply didn't support my use case). (The exact use case was collecting counters from network devices opportunistically, and then generating a bandwidth chart for a section of the network. At Google it was easy: I could do the differencing to turn the packet counters into "bytes sent over the last X seconds", then align the streams so that data was available at each timestamp, then aggregate over topology. With InfluxDB... the query language lets you express that, but it doesn't yield correct results. I complained about it on HN and the author did a lot of handwaving about how what I want to do is wrong, or something, and so I just wrote my own thing instead.)
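Here is a minimal sketch in plain Python (invented devices and counter values) of that whole pipeline: difference the monotonic counters into rates, do the explicit alignment step onto a shared time grid (here by carrying the last known rate forward, one possible rule for making up missing points), then aggregate across the topology:

    # device -> [(timestamp_sec, byte_counter)], sampled opportunistically.
    streams = {
        "router-a": [(0, 1000), (9, 1900), (21, 3100), (30, 4000)],
        "router-b": [(2, 500), (12, 1100), (19, 1520), (31, 2240)],
    }

    def to_rates(samples):
        # Difference a monotonic counter into (timestamp, bytes/sec) points.
        return [(t1, (v1 - v0) / (t1 - t0))
                for (t0, v0), (t1, v1) in zip(samples, samples[1:])]

    def align(rates, grid):
        # The explicit alignment step: carry the last known rate forward so
        # every stream has a value at every output timestamp.
        out, i, last = [], 0, 0.0
        for t in grid:
            while i < len(rates) and rates[i][0] <= t:
                last = rates[i][1]
                i += 1
            out.append(last)
        return out

    grid = [15, 25, 35]  # shared output timestamps
    aligned = {dev: align(to_rates(s), grid) for dev, s in streams.items()}

    # Aggregate over topology: total bytes/sec at each aligned timestamp.
    total = [sum(vals) for vals in zip(*aligned.values())]
    print(dict(zip(grid, total)))  # {15: 160.0, 25: 160.0, 35: 160.0}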
The other thing I miss from Monarch is the default visualization for histograms. I have to reproduce it manually in Grafana with a series of queries like histogram_quantile(0.99, ...), histogram_quantile(0.90, ...), histogram_quantile(0.85, ...), and so on, where in Monarch that was just the default visualization. Again, the open source world has some defaults, but I can't make heads or tails of what they are (try a default histogram visualization in Grafana)... Monarch just did the right thing by default.
I miss it.
If you really want something like /streamz (an HTML page with simple aggregation for distributions, descriptions for metrics, and so on), I haven't encountered an equivalent yet.
Open sourcing anything at Google tends to be difficult because much of the technology we build sits on top of other technology that has never been released; untangling all of those internal dependencies takes a great amount of effort.
It should be noted that Prometheus is an open source recreation of Borgmon, the precursor to Monarch (developed by a former Google SRE who has since returned).
Monarch might be an exception to that, though, because, as noted in the paper, it has to be built without many dependencies, since almost everything else depends on it for monitoring and alerting.
That being said, I'm sure it would be extremely difficult to open source now since it was never built with open sourcing in mind.
In the case of PageRank, at least it was published after Google had already dominated the search space, i.e., many years after the conception, implementation, and utilization.
I wonder if Google papers are internally reviewed before publishing so as to make sure only partial information is revealed, and not the secret sauce.
They aren’t worried about ‘giving away the idea’, they just don’t have an easy technical way to open source just the one component.
The thing is, Monarch is not the secret sauce that makes Google money. Every other company in the world could be running Monarch, and Google's revenue and market position would not be affected.
Publicizing this is a win-win situation. The idea gets exposure, other devs can potentially benefit, and Google gets some good PR as a place where devs get to work on cool stuff.
The issue is that even if you give away the secret sauce, that doesn't really help anyone make it scale, and nobody that isn't a large cloud provider needs a custom solution like Monarch. Prometheus or Datadog work fine for everyone else. This might be interesting reading for those companies, but it also might not be, because their products can't be as centralized as Monarch is (imagine if Prometheus exposed an API and ran a centralized cluster of data-ingestion servers, and you wrote your time series to that global, Prometheus-owned cluster).
I'm not actually sure that's true. It's like many other things inside Google - people outside don't necessarily understand the value or know what's actually possible, because they've never experienced anything similar. It's sort of like trying to discuss the finer points of the taste of oysters with someone who has never tasted them.
I would very much like the feature set of Monarch (and streamz), without the maintenance overhead or even the insane scale. Very, very few companies out there need to run anything at anywhere near "billion-user" scale, but literally all of them could benefit from the painless and detailed monitoring that Monarch offers.
To the best of my knowledge, nothing else in the market does that.
Since the distribution is represented by a CDF of buckets, there's no guarantee that you'll get an accurate representation of the median or any other quantile. On the other hand, you'll get an exact average, since the count and sum are tracked exactly alongside the buckets.
Pet peeve: it can calculate and graph something it calls a quantile, but if the value is in the middle of a large bucket, it will just interpolate, and the result can be terribly misleading. It'd be much better if it gave lower/upper bounds.
I try to use distributions via questions of the form "what fraction of values are less than / greater than N [which I've verified is a bucket boundary]?". This gives you an answer you can trust.
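A small sketch of that bound-based approach (plain Python, invented buckets): report the edges of the bucket a quantile falls in rather than a single interpolated number, and note that the trustworthy query, the fraction below a bucket boundary, needs no interpolation at all:

    bounds = [0, 10, 50, 100, 500]  # bucket boundaries (e.g. latency in ms)
    counts = [4, 10, 3, 1]          # counts per [bounds[i], bounds[i+1]) bucket

    def quantile_bounds(q, bounds, counts):
        # Return the (lower, upper) edges of the bucket containing quantile q.
        target = q * sum(counts)
        seen = 0
        for i, c in enumerate(counts):
            seen += c
            if seen >= target:
                return bounds[i], bounds[i + 1]
        return bounds[-2], bounds[-1]

    print(quantile_bounds(0.50, bounds, counts))  # (10, 50): the median is
    # somewhere in [10, 50); any single number in between is interpolation.

    # The trustworthy query: fraction of values below a bucket boundary.
    print(sum(counts[:2]) / sum(counts))  # 0.777... exactly, no guessing.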
The core idea is a distributed time-series DBMS... not much to give away there. It "gives away" some architectural novelty, but it's not the solution to P=NP. These papers typically describe engineering feats more than they do a revolutionary idea.
As impressive as "planet-scale" sounds, monitoring systems have been essential at a lot of companies, big and small, and there are tons of companies dedicated to this problem domain.
So it is not untapped territory. Google solved a problem that is relevant to Google and its scale. The solution is cool, but only because the problem exists in the first place.
Without Google's problem, the solution would be seen as incredibly over-engineered and unnecessary.