
Visualizing Incomplete and Missing Data - kawera
https://flowingdata.com/2018/01/30/visualizing-incomplete-and-missing-data/
======
frumiousirc
This is an article with lots of great examples of what not to do.

The "Gaps" example is more abstract art than data representation.

The "Category" examples lack any explanation of the colors in the figure
captions.

"Zooming" is a selection, and selection introduces bias. It should be
accompanied by numbers indicating efficiency and bias. There is also no
indication of the statistical uncertainty of these measurements. Give me that
and let me decide for myself whether the data is too "granular".

Interpolation is not a good suggestion in any case, and the example shown
neither involves sparse data nor actually performs an interpolation.

~~~
nstrayer
I tend to disagree with this assessment. Nathan has a Ph.D. in statistics,
and as a Ph.D. candidate in the same field, I know for certain he has done a
lot of thinking about this.

- For the gaps example, I think it's quite clear what's going on when gaps
are present. Is the NYT flowchart not clear? This is valuable advice, because
a lot of visualization tools (ggplot, d3) will linearly interpolate over these
values for you, so it takes a deliberate effort to avoid that.

- The category example lacks color legends because they are not necessary in
this situation. He was just showing that you can display missingness as its
own category alongside the other categorical values. The plots are only
examples of that plot type.

- Zooming _can_ introduce bias, but ultimately any visualization of data is
'zoomed' in to some degree. As for representing uncertainty, I'm not sure why
that applies here, since the article isn't about that.

- The claim that interpolation is a bad suggestion in any case is... well,
just not true. Missing-data imputation is a huge subfield of statistics, and a
lot of theory and application has gone into its methods. When interpolation
isn't a good idea, the 'gaps' approach seems like a fair substitute, but there
are certainly cases where interpolation is appropriate.

There are absolutely cases where each of the listed techniques is not
appropriate, but this is more of a toolbox of methods, and the visualizer, who
is familiar with the data, should decide which method suits it.

I'm adding this response because the comment above was the only one on the
article. I sometimes use the comments to decide which articles to read, and I
would hate to see someone pass this one up because of an (in my opinion)
misguided critique.

