Show HN: Visualizing disk IO activity using log-scale banded graphs (bvckup2.com)
207 points by apankrat on Apr 11, 2018 | 69 comments

Author here.

We had a need for a tool to graph IO activity at arbitrary sampling rates, so there was finally a solid excuse to do some datavis work :)

The thing I wanted to show is not the program itself, but the idea of factoring values and displaying factorization to cover several magnitude levels in one go. I suspect that this has been done before, but I couldn't find anything similar when I was researching this beforehand.

This visualization is really cool and innovative! It's useful for seeing the changes in the stream of values even if they are small relative to absolute magnitude (e.g., 1000001 -> 1000005 -> 1000003 etc).

The traditional way to solve this is to rescale the y axis (for example, the Google stock price chart ranges from 1000 to 1200). But this doesn't work for graphing IO because it spikes up and down too often over a huge range.

Your visualization plots the base-1024 digits so whenever the IO is sustained at a certain range, the top few digits will be held constant and the next digit will show the variation within that scale.
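To make the "base-1024 digits" reading concrete, here is a minimal Python sketch of the factoring step. The function name `factor_1024`, the four-band layout and the `UNITS` labels are assumptions for illustration, not the author's actual code.

```python
UNITS = ("B", "KB", "MB", "GB")  # assumed band labels, lowest unit first


def factor_1024(value, levels=4):
    """Split a non-negative integer into base-1024 digits, lowest unit first.

    The top band keeps whatever is left over, so it may exceed 1023.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    for _ in range(levels - 1):
        value, digit = divmod(value, 1024)
        digits.append(digit)
    digits.append(value)  # remaining quotient goes to the top band
    return digits


sample = 1 * 1024**3 + 2 * 1024**2 + 3 * 1024
print(dict(zip(UNITS, factor_1024(sample))))
# prints {'B': 0, 'KB': 3, 'MB': 2, 'GB': 1}
```

Each digit would then be drawn as a bar inside its own band, which is why a sustained rate keeps the upper bands constant while the lower ones vary.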

Unfortunately I don't think this solves the spiking problem completely. It's mentally hard to parse since two adjacent values are only comparable whenever their top digits match.

For example it's only meaningful to compare (1gb+2mb+3kb) with (1gb+2mb+5kb) on the kb graph. If it spikes down to 0gb+0mb+0kb, you need to switch to reading the gb graph (since comparing the kb value to one before is pretty meaningless unless for some crazy reason you actually care about the mod 1024 value of two very different numbers).

To fix this, you should color code the "runs" of values that are meaningful to compare! In other words, kb values have the same color only if their gb/mb values match. And mb values have the same color if their gb value matches.

I like the idea of using a color signal to show a changed value, with the change cascading to lower "digits". So, for instance, if the GB change, then so do MB, KB, and B. If GB and MB remain the same, then only KB and B will change.
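One way to sketch the cascading rule in Python: give every band a "run id" that is bumped whenever any higher-order digit differs from the previous sample, then pick a color per run id. The name `run_ids` and the digit-list input (lowest unit first, e.g. [B, KB, MB, GB]) are assumptions for illustration; nothing here comes from the tool itself.

```python
def run_ids(samples):
    """Return, for each sample, one run id per band.

    samples: list of digit lists, lowest unit first.
    A band's run id changes whenever any digit above that band changes,
    so two values in a band share an id only if they are comparable.
    """
    ids = []
    prev = None
    counters = []
    for digits in samples:
        if prev is None:
            counters = [0] * len(digits)
        else:
            for band in range(len(digits)):
                # Compare only the digits above this band.
                if digits[band + 1:] != prev[band + 1:]:
                    counters[band] += 1
        ids.append(list(counters))
        prev = digits
    return ids


samples = [[0, 3, 2, 1], [0, 5, 2, 1], [0, 0, 0, 0]]
for digits, ids in zip(samples, run_ids(samples)):
    print(digits, ids)
```

The top band never changes its run id, since nothing sits above it. A small palette indexed by `run_id % len(palette)` might keep the number of distinct colors down.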

Hard part would be how to do this over a long span of columns without the whole thing looking like a unicorn puked on the chart...

I've never seen something similar before either, and it is a very useful visualization.

I'm thinking about how to do something similar in my monitoring. How would you extend it to allow zooming on the X-axis? With a continuous log scale, or a simple linear scale, it's easy to know how to compress the data horizontally: take the min/max/avg of the datapoints that go into one horizontal pixel. I'm not sure how to apply that to your discrete log scale; it seems to work on the assumption that points along the X-axis are discrete.

Edit: I said Y-axis, I meant X-axis. I'm embarrassed.
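For reference, the conventional min/max/avg reduction mentioned above can be sketched in Python like this. The function name `downsample` and the even bucket split are assumptions; this only covers the plain linear or continuous-log case, not the banded one.

```python
def downsample(values, width):
    """Compress values into at most `width` columns of (min, max, avg)."""
    if width <= 0:
        raise ValueError("width must be positive")
    n = len(values)
    cols = min(width, n)  # never create empty buckets
    columns = []
    for px in range(cols):
        # Integer bucket bounds spread the samples as evenly as possible.
        start = px * n // cols
        end = (px + 1) * n // cols
        bucket = values[start:end]
        columns.append((min(bucket), max(bucket), sum(bucket) / len(bucket)))
    return columns


print(downsample([1, 2, 3, 4, 5, 6], 3))
# prints [(1, 2, 1.5), (3, 4, 3.5), (5, 6, 5.5)]
```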

> How would you extend it to allow zooming on the Y-axis?

Well, the whole point is to NOT need to do any Y-scaling. What would be the case for zooming in?

I'm sorry, I meant X-axis. I'm embarrassed.

Ha. Good question, actually.

The simplest option would obviously be to just show a conventional candlestick when zoomed out. There might be an option to do something interesting with rendering the stick when min/max have the same higher-order factors... but it might also end up looking like a royal mess.

Nice work. Do you plan to open source it?

I see no point in that. It's trivial to do in several hours from scratch.

Even if you think your work is trivial, it may not be so for others. :) Also, gives an opportunity for people to extend the current feature set. Entirely your choice though.

And a great way for anyone wanting to start some Windows programming to see an example of a nice simple program with a GUI that interacts with Win32 APIs.

I came to the comments to look for a link to the source. I'm a systems programmer but don't have much knowledge of how you guys do things in Win32. A ~70kb executable with nice graphs was enticing and I was really interested to see the source. But as it is, I can't afford to download closed freeware that can't be audited just for some nice graphs.

Not meaning to be rude, but how do you run Windows itself if you cannot audit it? How do you fetch updates from Microsoft if you cannot audit them? What about other applications, e.g. anything from Adobe or Microsoft (Office, etc.)?

Seems to be an odd complaint to make, to me. You could run ProcessMonitor to see what the application is doing.

Or read the excellent Windows Internals books if you've got a few weeks spare (and for Windows 10 it is missing Volume 2 at the moment - given that Windows 10 is a constantly moving target the poor book will be out of date as soon as it is written, let alone published).

Do you audit apps on Android / iOS / macOS too? I find my Android device dials home all the time and flings data out at an astounding rate, which is disappointing.

How do you use your satellite/cable/TV these days? How do you audit your entertainment systems? Mine came with the GPL on a bit of paper in the box given that it's got no end of Open Source software used inside it but I was unable to audit it - how did you manage it?

Do you read the source code of every Open Source application before running it? Have you combed over the entire Linux kernel before booting it? What do you think of H. Peter Anvin's bootloader code, given that you use it every single time Linux boots? How long did auditing that take?

At what point do you start to trust an executable or code?

I would assume that someone else would be the one doing the auditing in this case. Some companies have paranoid IT.

That's a lot of code to read before using something!

Windows as a desktop operating system ships everything you need to write GUI apps with a plain C API, which allows you to build fairly small tools. I'd guess this probably just uses good ol' fashioned GDI.

Yep, Win32 API + GDI + UPX.

By UPX, do you mean https://github.com/upx/upx ?

Just stick it into a VM like the rest of us.

How's the GUI setup?

Basic window with stuff drawn on it with GDI and GDI+ in a double-buffered way.

Edit: Scratch the GDI+ bit, it's just GDI.

Maybe you can use slightly different color/background color for the bands? That way it's more obvious that the 3 bands are separated instead of one continuous band.

It started this way, actually. But in the day-to-day use it ends up being an embellishment, because there would typically be a string of read/writes that fall on the first pixel of every band so they'll highlight the boundaries, for free.

Different colors could be a little distracting for a minimal design; maybe there could be gaps between the bands instead. The bands are independent of each other: a 999M 999K 999B sample looks like one continuous band, but it should be 3 separate bars.

This is just pure nitpicking; your current design probably suits your needs already.

How are you reading disk I/O rates? Are there logs from Windows for this?



This relies on IO performance counters enabled for the target drive, but that's typically the case. If they are off, they can be enabled with "diskperf.exe".

>a tool to graph IO activity at arbitrary sampling rates

Why is the sampling rate fixed, then?

Because the public build is not the same as the one we use internally.

A Rainmeter version of this would be very cool.

This is a great idea, and I have one suggestion:

It can be kinda hard to see if two neighboring entities are the same in a given band (kbs, mbs, gbs), because you're looking for the absence of a difference of a pixel or two. Maybe if they are the same and neighboring, join the two columns rather than have them spaced? That would visually instantly tell you that those two samples are the same.

Interesting idea, thanks. I'll give it a try.

This just reminds me that I miss something like iStat Menus [0] on Windows.

Process Explorer [1] can show some small graphs as tray icons but there is not much detail visible without opening the main application's window.

XMeters [2] does a little more, but its configuration is very inflexible and clicking on any item just opens the plain Windows Task Manager instead of showing additional details and graphs like iStat Menus.

[0] https://bjango.com/mac/istatmenus/

[1] https://docs.microsoft.com/en-us/sysinternals/downloads/proc...

[2] https://entropy6.com/xmeters/

To the people who use log-scales for visualization of discrete events (e.g. 0, 1, 2, 3, 4, ...):

How do you deal with a value of 0?

Do you use the same distance between "0" and 1 as between 1 and 10?

On a different note:

In a lot of cases I prefer using something like sqrt(data) or data^(0.x) because a) it smoothly supports the case of 0 and b) it squashes the data less than log.

Regarding b): In a lot of cases I think that log() squashes the information too much and you easily lose a sense of scale. A scaling function like sqrt(data) still squashes data with high values but it maintains the feeling of "Oh, this value is much higher than that value" better than log(data).
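A small Python comparison of the two scalings, as a sketch. The helper names `power_scale` and `log_scale` are made up for illustration, and base-10 log is an arbitrary choice.

```python
import math


def power_scale(value, exponent=0.5):
    """data^(0.x) scaling; exponent=0.5 is sqrt. Defined at 0."""
    return value ** exponent


def log_scale(value):
    """Plain log10 scaling; undefined for values <= 0."""
    if value <= 0:
        raise ValueError("log scale is undefined for values <= 0")
    return math.log10(value)


for v in (1, 100, 10_000):
    print(v, power_scale(v), log_scale(v))
```

Going from 100 to 10,000 multiplies the sqrt-scaled height by 10 but only doubles the log-scaled one, which is the "sense of scale" difference described above.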

The intercept of a log-scaled axis should always be positive, and a value of 0 would not appear on the chart. Often a value of 0 is nonsensical anyway (e.g. pressure, frequency), but Wikipedia [1] has an example of tracking pandemic cases (which tends to be logistic, so it includes an exponential growth phase) and you can see that values of 0 do not appear on the graph at all.

In this case, the slope of each dataset is the important thing, and very low values are probably too noisy to give you information anyway.

[1] https://en.wikipedia.org/wiki/File:Influenza-2009-cases-loga...

The typical approach is to plot log(x+1) instead, which deals with the zeros as long as you know what you’re looking at. It’s nice to add some jitter too, e.g. a uniform random float number in the -0.25 to 0.25 range.
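A minimal Python sketch of that approach. It assumes the jitter is added to the raw count before the transform (the comment does not say which side it goes on); the function name and the `seed` parameter are illustrative.

```python
import math
import random


def log1p_with_jitter(counts, jitter=0.25, seed=None):
    """Return log(x + 1) for each count, with uniform jitter on x.

    Assumes non-negative counts and jitter < 1, so x + jitter + 1 stays
    positive and the log is always defined.
    """
    rng = random.Random(seed)
    return [math.log1p(c + rng.uniform(-jitter, jitter)) for c in counts]


print(log1p_with_jitter([0, 0, 1, 3, 10], seed=1))
```

The jitter keeps repeated discrete values (a long run of zeros, say) from collapsing onto a single point.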

Recommend testing the site on mobile. I stared confused at a black screen for a while before realizing that it had a massive fixed width and I had to pan right to see anything.

Yep, noted.

We are a desktop software shop, so mobile browser support hasn't been a top priority.

But that doesn’t mean people who want desktop software aren’t browsing HN on their phone...

For me it shows the number 2 and nothing else, even if I view source.

I'm confused. What does "2" mean in the context of a web page? If the server broke wouldn't it cause a 500 HTTP response? This just seems odd: 'if error write "2"'.

Maybe it is some kind of secret society code: "I'm sorry Dave, I can not do that... TWO!!!!"

Same here. Looks like the server broke.

Things are OK on the server side.

Ah, scratch that. I see the issue!... Check now?

While I find the graphs not very intuitive to read, they hold great information once you understand how to read them :-)

I don't understand. In most of your samples, there is no bar in the first two bands. Does that mean that your samples are a multiple of 1 MB?

If your read/write rate is not constant, such as in the non-optimized file copy and you display the rate mod 1024 in the first band, I expect to see seemingly random bars that would not show any meaningful information. What am I missing?

> Does that mean that your samples are a multiple of 1 MB?

Not my samples, but the buffer sizes in read/write requests are multiples of 1 MB.

Consider when a disk driver would update its read/write counters - it happens when a request is completed, so even if a request takes longer than a sampling interval to complete, the byte counter will still go up by the buffer size, atomically.
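A toy Python simulation of that effect, as a sketch only: the completion times, the 1 MB buffer and the function name `sample_counter` are all made up for illustration and have nothing to do with how Windows actually exposes the counters.

```python
MB = 1024 ** 2


def sample_counter(completions, buffer_size, sample_times):
    """Return byte deltas seen between consecutive sample times.

    completions: times at which requests complete; the byte counter only
    moves at those moments, by a whole buffer each time.
    """
    deltas = []
    prev_total = 0
    for t in sample_times:
        total = buffer_size * sum(1 for c in completions if c <= t)
        deltas.append(total - prev_total)
        prev_total = total
    return deltas


# Requests finish at irregular times; the counter is sampled once a second.
deltas = sample_counter([0.3, 1.7, 1.9, 4.2], MB, [1, 2, 3, 4, 5])
print([d // MB for d in deltas])
# prints [1, 2, 0, 0, 1]
```

Every delta is a whole number of buffers, so the B and KB bands stay empty no matter how the completions line up with the sampling interval.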

Here is a quick, far less useful visualization of file activity using Gource :)

This was run on OS X; Linux users can use strace to watch for open files, with much better performance.

    # Using lsof in a loop (no sudo required)
    (while :; do lsof |gsed -r "s#(\w+)\s+.\s+(.+)#$(gdate +"%s")\|\1\|M\|\2#;t;d";done) | \
      gource --realtime --filename-time 2 --highlight-users --1280x800 --log-format custom -

    # Using Dtrace (better option, could replace with opensnoop possibly but still requires dtrace)
    sudo dtrace -n 'syscall::open:entry { printf("%u|%s|%s",walltimestamp,execname,copyinstr(arg0)); }' | \
      gsed -r 's#.+ ([0-9]{10}).+\|(.+)\|(.+).+#\1\|\2\|M\|\3#;t;d' | \
      gource --realtime --filename-time 2 --highlight-users --1280x800 --log-format custom -

Really neat! Any plans for Linux support? I would love to do some guided optimizations using this tool.

Zero plans, sorry.

This is an internal tool that just happens to produce good looking graphs, so we released a build to let people play with it if they want to. It's not a beta of a product, more of a tech demo... if even that.

Is this going to be part of Bvckup 2 by any chance?

I think it's going into this: https://ccsiobench.com/

I am very much tempted to somehow shovel it in, but I don't think it belongs there. Perhaps only as an Easter egg.

A better target would be a Grafana plugin.

Yes, I’d love to be able to do this kind of y-axis scaling in Grafana. I was just messing with a network I/O bandwidth graph in Grafana this morning and thinking about log scales. It hadn’t occurred to me to band it by binary/1024 orders of magnitude, but I like the result here.

You can do log scales in Grafana, but not with this banding thing

On servers you can collect those disk IO (and many other) metrics with Telegraf, store them in a local or remote InfluxDB (storing locally may change the disk IO measurements, but probably by a constant factor), and visualize them with Grafana.

There are plenty of alternative tools for that, but the point of the article is that using a log scale helps the visualization.

Unreadable for me on a 15 inch 1920x1080 monitor. I can't see the MB/KB/b labels. It's tiny and should be resizable. I don't know what the lines indicate, so I can only read the real-time data and have to guess what past values represent.

This is a really cool visualization, and a nice implementation of an idea. Thanks for sharing it

The images really need to be larger or at the least be available in a higher resolution.

Page isn't loading for me (I just get a "2", no other content), but archive.org appears to have it cached: https://web.archive.org/web/20180411180009/https://bvckup2.c...

It was a hiccup on the server side. Sorry about that. Should be back to normal now.

(For Windows)

As I said in another comment, the interesting part that I wanted to share is how the data is graphed, and not that there's an executable.

Clever idea!

I could see this being nicely applicable to system tray icons for monitoring tools. 16 pixels - 4 log bands with 0.25 precision.
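A rough Python sketch of the 16-pixel layout: four bands of four pixels, so each pixel stands for a quarter of a band. Rounding any non-zero digit up to at least one pixel is an assumption, as are the names `band_pixels` and `icon_column`.

```python
import math


def band_pixels(digit, pixels_per_band=4, base=1024):
    """Map one base-1024 digit to a bar height of 0..pixels_per_band."""
    return min(pixels_per_band, math.ceil(digit * pixels_per_band / base))


def icon_column(digits, pixels_per_band=4):
    """Bar heights for one icon column, given digits lowest unit first."""
    return [band_pixels(d, pixels_per_band) for d in digits]


# 1 GB + 2 MB + 3 KB, as [B, KB, MB, GB] digits
print(icon_column([0, 3, 2, 1]))
# prints [0, 1, 1, 1]
```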

Your work is very pleasant to look at, Alex!

Thanks, glad you like it!

Great tool; however, the font size is really small and hard to read.

I like it, though the size is way too small. Resource Monitor gives this information as well, btw. Edit: reread your post for the arbitrary file size thing; makes more sense now.

Couple requests: option to scroll the timeline, option to export the timeline, ability to view more than one drive at a time.

You can't get "Read X bytes since last reboot" from the standard performance counters, so, no, Resource Monitor doesn't have this data. The best it can do is the IO rate at one-second resolution.
