
Heatmaps Make Ops Better - rhema
https://www.honeycomb.io/blog/heatmaps-make-ops-better/
======
HenryBemis
I trade forex. Every week I download my trades of that week and put all of
them in a heatmap. I've been doing this for quite a while now, and with just one
look I can see where I do well and where I don't. I then take a screenshot and
compare it to screenshots of previous weeks.

This gives me the ability, in a few seconds, just browsing through the
screenshots, to see whether my changes in tactics and my bots help, and which
way is more productive/profitable.

To KonSchubert's point, scatter plots help when I have a small number of one-
or two-dimensional attributes and the data is pretty 'clean'. When I have
multiple attributes (profit, duration of trade, etc.) heatmaps work better (for me).

~~~
adtac
Very interesting. Can we see an example heatmap of yours from any week?

------
kjeetgill
Super cool! It reminds me of Netflix's Flamescope[0], also from Brendan Gregg's
work at [https://medium.com/netflix-techblog/netflix-
flamescope-a57ca...](https://medium.com/netflix-techblog/netflix-
flamescope-a57ca19d47bb).

The reason heatmaps are "news" is that not everyone has had formal exposure
to data/stats curricula. The concept is easy, straightforward, and nearly
obvious, but unless you're in a position to have datasets to try to
understand (such as a class where datasets and questions are posed for
you!) you just don't hear about this stuff.

What often happens is that you have a problem you're trying to understand:
e.g. "Why does this crash?" or "Why is this slow on Wednesday?", and you use
the data you have available to solve it. You don't normally have access to the
kinds of data at the resolutions you need for something like heatmaps to
come into play. You're ops, not the application engineer, and you just have
50-90-95-99 latency percentile graphs pre-aggregated in minute windows in
Nagios or Grafana and maybe a few more of those for IO, NET, CPU, and thread
counts and you're trying to correlate between these to form a hypothesis.
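
A minimal sketch of that resolution gap, assuming a made-up minute of traffic
(the function names, the 50 ms bucket width, and the latency values are all
hypothetical, not taken from any particular monitoring tool). It contrasts the
four numbers a pre-aggregated percentile graph keeps per minute with the
per-bucket counts a heatmap column would need for the same minute:

```python
from collections import Counter

def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list (p is an integer 0..100)."""
    # Integer ceiling of p/100 * n avoids floating-point rounding surprises.
    rank = max(1, (p * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]

def summarize_minute(latencies_ms):
    """What a pre-aggregated dashboard keeps for one minute: four numbers."""
    ordered = sorted(latencies_ms)
    return {p: percentile(ordered, p) for p in (50, 90, 95, 99)}

def histogram_minute(latencies_ms, bucket_ms=50):
    """What a heatmap column needs for the same minute: a count per latency bucket."""
    return Counter(int(ms // bucket_ms) * bucket_ms for ms in latencies_ms)

# Hypothetical minute of traffic: 900 fast requests and 100 slow ones.
minute = [20] * 900 + [480] * 100

summary = summarize_minute(minute)
column = histogram_minute(minute)
print(summary)                  # four percentile values
print(sorted(column.items()))   # (bucket lower bound in ms, request count)
```

With this input the summary says p50 and p90 are 20 ms and p95 and p99 are
480 ms, but only the bucket counts show that the traffic splits into two
separate clusters with nothing in between. Once only the percentiles are
stored, that shape cannot be recovered.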

If it's important enough AND you have bad luck AND it can't be solved with
more hardware, THEN you go deeper. You get to start actually trying to decide
what new data is worth collecting ad-hoc, during what time intervals, etc.
Only then do you even have data with the resolution where it's worth talking
about heatmaps.

That's why this is an interesting reintroduction, or news, to people. The post
helps justify the data that you'd need to collect in order to use heatmaps as
much as it advocates heatmaps themselves.

~~~
sciurus
I don't get why you think metrics at low resolution or about resource
utilization aren't usefully visualized via heatmaps. They absolutely are!

Let's say your application's aggregate p99 latency goes up. Generate a heatmap
showing time (x-axis), percent CPU utilization (y-axis), and number of servers
at that utilization (z-axis, color saturation). Oh, turns out one of your 500
application servers has developed much higher CPU utilization than the rest!
Better go kill the intern's bitcoin miner and bring latency back to normal.
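
The heatmap described above can be sketched in a few lines. This is only an
illustration under stated assumptions: the fleet data is synthetic, the
function names and the 10-point bucket width are hypothetical, and the
rendering uses text characters in place of color saturation.

```python
import random
from collections import Counter

def utilization_heatmap(samples, bucket_width=10):
    """samples[t] is a list of per-server CPU utilization percentages at time step t.

    Returns one Counter per time step mapping bucket lower bound -> server count.
    """
    heatmap = []
    for per_server in samples:
        column = Counter()
        for pct in per_server:
            # Clamp 100% into the top bucket so buckets run 0, 10, ..., 90.
            bucket = min(int(pct // bucket_width) * bucket_width, 100 - bucket_width)
            column[bucket] += 1
        heatmap.append(column)
    return heatmap

def render(heatmap, bucket_width=10):
    """Text rendering: rows are utilization buckets (high at top), columns are time steps."""
    shades = " .:*#"  # denser character = more servers in that cell
    peak = max(max(col.values()) for col in heatmap)
    lines = []
    for bucket in range(100 - bucket_width, -1, -bucket_width):
        cells = []
        for col in heatmap:
            count = col.get(bucket, 0)
            # Any non-empty cell gets at least the lightest visible shade.
            idx = 0 if count == 0 else 1 + (count - 1) * (len(shades) - 2) // max(peak - 1, 1)
            cells.append(shades[idx])
        lines.append(f"{bucket:3d}% |" + "".join(cells))
    return "\n".join(lines)

# Synthetic fleet: 500 servers at 20-40% CPU; one server jumps to 95% at step 10.
random.seed(0)
samples = []
for t in range(20):
    fleet = [random.uniform(20, 40) for _ in range(499)]
    fleet.append(95.0 if t >= 10 else random.uniform(20, 40))
    samples.append(fleet)

heatmap = utilization_heatmap(samples)
print(render(heatmap))
```

In the rendered grid the bulk of the fleet shows up as dense rows at 20% and
30%, and the single runaway server appears as a faint row at 90% starting at
step 10. Each cell holds a server count, so the picture stays readable whether
the fleet has 50 hosts or 5,000.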

[http://www.brendangregg.com/HeatMaps/utilization.html](http://www.brendangregg.com/HeatMaps/utilization.html)

~~~
kjeetgill
Your example is a strange one. A line graph showing time (x-axis), percent CPU
utilization (y-axis) would very readily show you a single host as an outlier
well above the rest.

~~~
sciurus
A line graph like that isn't easy to render or interpret once you have many
hosts. Take the example in the blog post I linked: utilization of 300 hosts
(5,312 CPUs).

Here's the data visualized as a line graph:
[http://www.brendangregg.com/HeatMaps/lines-allcpus-
image.png](http://www.brendangregg.com/HeatMaps/lines-allcpus-image.png)

Here it is as a heatmap: [http://www.brendangregg.com/HeatMaps/cpu-
utilization-heatmap...](http://www.brendangregg.com/HeatMaps/cpu-utilization-
heatmap-600.png)

Which one conveys more information?

------
jaytaylor
For contrast, I went back and read Brendan Gregg's explanation of heatmaps and
found the concise explanation of the mechanics clear and easy to understand:

[http://www.brendangregg.com/heatmaps.html](http://www.brendangregg.com/heatmaps.html)

TBH, the Honeycomb overview felt long and difficult to get through. Does
anyone argue against heatmaps and demand a case be made justifying "why
heatmaps"? This read like it was fighting hard in defense of heatmaps; against
whom, I'm not sure :)

~~~
swiftcoder
Ops teams at tech firms often don't have much of a formal background in stats.
I used to see folks giving the "10 ways not to visualise data" talk at least
once a year, and still most service dashboards were just rank upon rank of
line plots with multiple different scales on each axis...

------
konschubert
So, yeah, heatmaps are a thing and they have advantages and disadvantages
compared to scatter plots. Welcome to plotting 101.

