Violin plots are a great spin on boxplots that help show the distribution.
You could rotate the violin plot to show how the density changes on the other axis, for those graphs where that would be useful, but that requires looking at the data visually and making decisions, which is the whole point of the article.
In general, scatter plots can be very valuable but difficult to compare among trials. Violin plots help mitigate that, though you're correct that it comes at a cost.
In the mass media especially, the mean is often bandied about as the only statistic, and treated as if it were definitive.
The median is the value at which 50% of houses trade higher and 50% trade lower.
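To make the mean-versus-median point concrete, here is a small sketch. The house prices are made up purely for illustration; the only point is that a single expensive sale drags the mean far away from what a typical house trades for, while the median stays put.

```python
from statistics import mean, median

# Hypothetical sale prices (invented for illustration): four ordinary
# houses and one very expensive outlier.
prices = [200_000, 220_000, 250_000, 270_000, 3_000_000]

avg = mean(prices)    # pulled upward by the single outlier
mid = median(prices)  # half the houses sold above this, half below

print(f"mean:   {avg:,.0f}")
print(f"median: {mid:,.0f}")
```

Here the mean is more than three times the median, even though four of the five houses sold for 270,000 or less.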
* Linear relationship between predictor and response and no multicollinearity
* No auto-correlation (statistical independence of the errors)
* Homoscedasticity (constant variance) of the errors
* Normality of the residual (error) distribution
As the paper suggests, plotting the data visually will help you avoid violating these assumptions, but checking the assumptions with statistical tests would work too. For example, you can look at your residuals (errors) as an indicator of good fit. If your residuals do not follow a normal distribution, this is typically a warning sign that your R² score is dubious.
There are a few statistical tests for residual normality; in particular, the Jarque-Bera test is common and available in scipy as scipy.stats.jarque_bera.
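As a rough sketch of what that check could look like, the snippet below fits a straight line to two synthetic datasets and runs the Jarque-Bera test from scipy.stats on the residuals. The data, the seed, and the helper name residuals_of_linear_fit are assumptions made up for this example; only scipy.stats.jarque_bera and numpy.polyfit are real library calls.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 10, n)

def residuals_of_linear_fit(x, y):
    """Fit y = slope * x + intercept by least squares and return the residuals.

    Hypothetical helper, written only for this example.
    """
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Same linear signal, two different noise models (both synthetic):
# symmetric Gaussian noise, and skewed (centered exponential) noise.
y_normal = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
y_skewed = 3.0 + 2.0 * x + (rng.exponential(1.0, n) - 1.0)

# jarque_bera returns the test statistic and a p-value; a small p-value
# is evidence against normally distributed residuals.
stat_normal, p_normal = stats.jarque_bera(residuals_of_linear_fit(x, y_normal))
stat_skewed, p_skewed = stats.jarque_bera(residuals_of_linear_fit(x, y_skewed))

print(f"Gaussian noise: JB = {stat_normal:.2f}, p = {p_normal:.3g}")
print(f"skewed noise:   JB = {stat_skewed:.2f}, p = {p_skewed:.3g}")
```

The skewed case should be rejected decisively. Note that the test only looks at skewness and kurtosis, so passing it says nothing about the other assumptions in the list.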
So, I would argue, you don't even need to visualize the data. I describe this more here: http://www.eggie5.com/104-linear-regression-assumptions
Until you get into high dimensions, it probably doesn't hurt too much to visualize the data. Additionally, it can be helpful to understand what signal has been left in the residuals (e.g., you fit a linear model but failed to include a quadratic term), which is something hypothesis tests aren't as good at telling you.
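A minimal sketch of the "signal left in the residuals" idea, using synthetic data that I made up for the purpose: the true relationship has a quadratic term, a straight-line fit leaves that term behind in the residuals, and adding the term removes it. In practice you would usually spot this by plotting residuals against x; the correlation with x**2 is just a numeric stand-in for that picture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (assumed for illustration): linear + quadratic + small noise.
x = np.linspace(-3, 3, 200)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 0.3, x.size)

# Model 1: straight line only, so the quadratic term is missing.
lin_coefs = np.polyfit(x, y, 1)
lin_resid = y - np.polyval(lin_coefs, x)

# Model 2: includes the quadratic term.
quad_coefs = np.polyfit(x, y, 2)
quad_resid = y - np.polyval(quad_coefs, x)

# How much of the x**2 pattern is still sitting in each set of residuals?
corr_lin = np.corrcoef(lin_resid, x**2)[0, 1]
corr_quad = np.corrcoef(quad_resid, x**2)[0, 1]

print(f"linear fit:    corr(residuals, x^2) = {corr_lin:.3f}")
print(f"quadratic fit: corr(residuals, x^2) = {corr_quad:.3f}")
```

The linear fit's residuals track x**2 closely, which is the U-shape you would see in a residual plot. A normality test on those residuals might flag a problem, but it would not tell you that the fix is a quadratic term.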
I feel it might be a useful example for illustrating the intuition behind ZCA and the Wasserstein metric.