
Why I don't use ggplot2 - simplystats
http://simplystatistics.org/2016/02/11/why-i-dont-use-ggplot2/
======
jrauser
I suppose if you've already memorized the arcane syntax of base R's plotting
functions, along with a giant laundry list of highly customized visualizations
available via libraries, then perhaps ggplot2 really is slower for exploratory
plotting. But for anyone coming to R fresh, the expressiveness and coherence
of ggplot2's grammar absolutely blows away base R for fast exploratory
graphics.

For working data scientists in industry, there is an extremely important
middle ground between exploratory plots that only I will ever see, and
publication quality plots. The most important plots I make are those that I'll
share with a small audience within my company. These need to be clear and
aesthetically pleasing, but not polished to an extreme degree. Again, ggplot2
excels in this regime.

~~~
wyldfire
> arcane syntax

Python's seaborn has visualization capability that seems to compare well with
ggplot2. Considerably more readable, IMO.

~~~
ves
I switched entirely from R to python, with matplotlib and seaborn, for data
science. It's a much nicer environment. R is horrifically unreadable and has a
lot of absolutely terrible unexpected behaviors, and I like python better as
an actual programming language.

~~~
glial
Yeah but now you have to use pandas instead of dplyr :-(

------
ellisv
I don't agree with the author on some of these points, but it seems perfectly
fine for what it is, given that the title is "Why _I_ don't use ggplot2" not
"Why _you_ shouldn't use ggplot2".

------
pinewurst
The author makes an interesting point that ggplot2 defaults often look
polished enough to make a novice find them acceptable when they're really not.

It's a beautifully polished and well-documented R package, but often I feel
like I've wrestled a bear to get a figure just right. The effort though seems
worth it for the results and I don't feel that way about the base graphics.

------
hyperbovine
Having trouble getting past the unreadable font and broken scrollbar in order
to read this article about how difficult ggplot2 is to use.

~~~
simplystats
Didn't say I used it for blogging :)

~~~
avn2109
Good god please fix the scrolling. FWIW I immediately switch to reader mode,
which strips all your ads/everything but the text out, because the scroll
interaction is broken.

------
jmount
I use a bit less of the Hadleyverse than my colleagues like. But ggplot2 has a
number of advantages over base graphics (even if you were to figure out base
graphics). One being the plot is a structure or value (returned by plot
construction) and not a bunch of side effects happening in a viewport. This
orientation is much more compatible with functional programming (the author
hints at this with the "compatible with piping" point).
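
To make the "plot is a value" point concrete, here is a minimal sketch. It assumes ggplot2 is installed and uses the built-in mtcars data; the helper name add_trend is hypothetical and only for illustration.

```r
library(ggplot2)

# Build the plot as an ordinary R object; nothing is drawn yet.
base_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Because the plot is a value, it can be passed to a function and extended.
# add_trend is a hypothetical helper, not part of ggplot2.
add_trend <- function(p) {
  p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
}

with_trend <- add_trend(base_plot)

# The original object is untouched: it still has one layer, the new one has two.
length(base_plot$layers)
length(with_trend$layers)

# Drawing only happens when the object is printed.
# print(with_trend)
```

With base graphics the equivalent calls draw straight onto the current device, so there is no object to hand around or modify later.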

------
minimaxir
Using a very, very simplistic chart and dataset for comparing base graphics
and ggplot2 is cheating a bit. Base graphics can become extremely unwieldy
when dealing with nonunivariate data, especially if you want specific theming.

The ggplot2 code is not formatted one-function-per-line like the base code. It
may be slightly more LoC for base graphics, but it's _very_ clear what is
happening in the ggplot2 code and its design, which is a far more important
attribute than LoC.

~~~
jpatrick
I think intermediate-complexity graphics are where ggplot2 excels. If you want
to do something like plot a bunch of small multiples, you'll have a much
better time of it using facet_grid than wrangling with base graphics.

Once your graph reaches a certain level of complexity, though, or requires a
certain degree of customization, I think base graphics regain the edge. At
that point I prefer the level of control you get by drawing things from
scratch with points(), lines(), text(), etc.
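
A rough sketch of both ends of that trade-off, assuming ggplot2 is installed. The mtcars columns are picked only for illustration.

```r
library(ggplot2)

# Small multiples: one panel per combination of cyl (rows) and gear (columns).
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(cyl ~ gear)
# print(p)

# Drawing from scratch in base graphics: start with an empty plot,
# then add each element yourself.
plot(mtcars$wt, mtcars$mpg, type = "n", xlab = "wt", ylab = "mpg")
points(mtcars$wt, mtcars$mpg, pch = 19)

# A fitted line, drawn explicitly over a grid of x values.
fit <- lm(mpg ~ wt, data = mtcars)
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)
lines(wt_grid, predict(fit, newdata = data.frame(wt = wt_grid)))

# A label placed by hand next to the first observation.
text(mtcars$wt[1], mtcars$mpg[1], labels = rownames(mtcars)[1], pos = 4)
```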

------
chaosfox
Actually, making heatmaps is rather easy; check out:
[http://docs.ggplot2.org/current/geom_tile.html](http://docs.ggplot2.org/current/geom_tile.html)

~~~
thisisdave
It's easy if your data is already in two columns for x and y. If it's a
raster, then you have to remember the right command from reshape2 or tidyr or
write one yourself.

~~~
_Wintermute
I really disagree with the argument about heatmaps. It's hardly difficult to
convert a matrix into a long dataframe that can be used with ggplot.
Literally one more word.

    ggplot(melt(matrix), aes(x = Var1, y = Var2, fill = value)) + geom_raster()
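
A self-contained version of the one-liner above, as a sketch. It assumes the melt() in question is the one from reshape2 (the comment does not say which package) and uses the built-in volcano matrix as stand-in data.

```r
library(ggplot2)
library(reshape2)  # assumed source of melt(); not stated in the comment

# volcano is a numeric matrix that ships with R.
# melt() on a matrix without dimnames yields one row per cell, with
# columns Var1 (row index), Var2 (column index) and value.
long <- melt(volcano)
head(long)

p <- ggplot(long, aes(x = Var1, y = Var2, fill = value)) +
  geom_raster()
# print(p)
```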

~~~
hadley
Jeff was probably thinking of heat maps with trees on the margins.
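
For reference, assuming "trees on the margins" means clustered heat maps with dendrograms, base R covers that case with stats::heatmap(). A minimal sketch on the built-in mtcars data:

```r
# heatmap() clusters rows and columns and draws the dendrograms
# in the left and top margins.
m <- as.matrix(mtcars)
hm <- heatmap(m, scale = "column")

# The (invisible) return value records the row and column orderings used.
str(hm$rowInd)
str(hm$colInd)
```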

------
_Wintermute
I don't use base graphics because making a simple legend is a PITA, when
every other plotting library seems to handle it fairly easily.
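
A sketch of what a simple legend takes in base graphics. The colours and the mtcars columns are arbitrary choices for illustration.

```r
# The mapping from groups to colours has to be built by hand ...
cyl <- factor(mtcars$cyl)
cols <- c("steelblue", "darkorange", "forestgreen")

plot(mtcars$wt, mtcars$mpg, col = cols[cyl], pch = 19,
     xlab = "wt", ylab = "mpg")

# ... and then repeated in the legend() call, along with the position,
# the labels and the point symbol.
legend("topright", legend = levels(cyl), col = cols, pch = 19, title = "cyl")
```

Nothing ties the legend to the plot, so if the colour vector and the legend arguments drift apart the legend is silently wrong.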

------
casca
Site down, try the cache:
[https://webcache.googleusercontent.com/search?q=cache:http%3...](https://webcache.googleusercontent.com/search?q=cache:http%3A%2F%2Fsimplystatistics.org%2F2016%2F02%2F11%2Fwhy-i-dont-use-ggplot2%2F)

------
tmalsburg2
I see ggplot2 not so much as a replacement for base graphics but rather for
lattice, and as such it is pretty amazing. Downsides of ggplot2 are the poor
default theme and the fact that it's horribly slow. For bigger data sets it is
often two orders of magnitude slower than base graphics, which can be
prohibitive.

~~~
glup
Interesting, I think of ggplot as the replacement for base graphics.
Lattice is for fast, ugly plots but with fast development time; I could always
build things up piece-by-piece with base graphics but ggplot makes it more
convenient.

~~~
tmalsburg2
Sure you can use ggplot2 as a replacement for base. But at least for me, the
killer feature of both lattice and ggplot2 is faceting.

------
vsbuffalo
I really like all plotting systems in R. First, I used base graphics for a few
years—and loved it. You learn your way around par(), commit esoteric argument
names to memory (oma, mar, mgp, mfrow, etc). It feels powerful — you're just
drawing on a screen; its history traces to the original pen plotters. Second,
I learned lattice. You can't help but fall in love with lattice after a year
or two of creating panel plots in base graphics. The biggest learning curve
with lattice is panel functions, but once you learn to throw a browser() in a
panel function for stack variable introspection, you can do anything.
Somewhere on a dusty bookshelf is a well-worn lattice book I splurged on while
taking an R course at UCD.

I like this article, because I think for production graphics, the author has a
point. If you're placing lines, points, and labels on a screen — you can
create anything. You can draw polygons and arcs. It's like drawing with raw
SVG. But I'd have a hard time thinking of an exploratory data analysis
situation where I wouldn't reach for ggplot2 first. Since it looks at dataframe
column types (integers, factors, numerics), it automatically matches these to
the appropriate type of color gradient. Coloring a scatter plot by a potential
confounder is one additional argument to aes(), e.g. aes(x, y,
color=other_col). More than once during EDA I've done this and seen some
horrifying pattern in data that shouldn't be there. That's a powerful tool for
one extra function argument — the cost of checking for a confounder with color
(or shape) is essentially near zero.
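
A small sketch of that one-argument check, assuming ggplot2 is installed. mtcars and the columns hp and cyl stand in for real data and a real suspected confounder.

```r
library(ggplot2)

# Plain scatter plot.
p_plain <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Same plot coloured by a numeric column: ggplot2 picks a continuous gradient.
p_numeric <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point()

# Coloured by a factor instead: ggplot2 picks a discrete palette.
p_factor <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# print(p_numeric); print(p_factor)
```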

I'd make the case that this is a more costly operation in base graphics, and
is thus much less likely to be done. You may already have your plots in a for
loop to create panels, then you have a few extra lines for adjusting margins
and axes (rather than facet_wrap(~col)). It took a lot of code to set that up
— there's already a lot of cruft when you just need to do a quick inspection.
Then you need to create a vector of appropriate size of colors, and then map
this to data. Sure it's easy-ish, but _it takes at least double the time as
color=some_col_. In EDA visualization, I want every single barrier to checking
a confounder to be as small as possible—which is what ggplot2 does.
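
For comparison, a sketch of the base graphics route described above: panels from a for loop, manual margins, and a hand-built colour vector. The column names and colours are illustrative only.

```r
# One panel per level of the grouping column, laid out by hand.
groups <- sort(unique(mtcars$gear))
old_par <- par(mfrow = c(1, length(groups)), mar = c(4, 4, 2, 1))

# Build a colour vector of the right length and map it to the data.
confounder <- factor(mtcars$cyl)
palette_cols <- c("steelblue", "darkorange", "forestgreen")
point_cols <- palette_cols[confounder]

for (g in groups) {
  idx <- mtcars$gear == g
  plot(mtcars$wt[idx], mtcars$mpg[idx],
       col = point_cols[idx], pch = 19,
       xlim = range(mtcars$wt), ylim = range(mtcars$mpg),  # shared axes
       xlab = "wt", ylab = "mpg", main = paste("gear =", g))
}

# Restore the previous graphics settings.
par(old_par)
```

In ggplot2 the roughly equivalent request is color = factor(cyl) inside aes() plus facet_wrap(~gear).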

That said, I really liked this article because I do agree that going from EDA
visualization to production is a hassle. Just after reading this, I remade
some production ggplots with base graphics and love the simple aesthetic —
which to mirror in ggplot takes a lot of hassle.

What I really long for is a lower-level data to visualization mapping (like
d3) in R. d3 is a pain to learn, but it's really the only data abstraction
(even though it is a low-level abstraction) that is seemingly limitless in
what it does and can do. I always hope for a general data-join grammar like
d3's to be the norm, built on top of base plotting (analogously: svg
elements), and then have abstractions like ggplot for tabular data built on
top of that.

~~~
Lofkin
What do you think of bokeh:
[https://github.com/DataWookie/MonthOfJulia](https://github.com/DataWookie/MonthOfJulia)

------
JohnLeTigre
Nothing that basic archiving can't fix. He should just maintain a script
containing his ggthemes of predilection for reuse.
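
One way such a script could look, as a sketch only. It assumes plain ggplot2 themes rather than the ggthemes package, and the name theme_house and its settings are hypothetical.

```r
library(ggplot2)

# Hypothetical reusable theme, kept in a script and loaded with source().
theme_house <- function(base_size = 12) {
  theme_bw(base_size = base_size) +
    theme(
      panel.grid.minor = element_blank(),
      legend.position = "bottom"
    )
}

# Apply it to a single plot ...
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_house()

# ... or make it the default for the rest of the session.
# theme_set(theme_house())
```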

~~~
Fomite
I've been astonished, doing research, at how un-portable "The graph I used for
something like this last time" code has been.

------
benbenolson
Looks like the site is down.

------
hackaflocka
Question to the author: have you tried a GUI like Deducer for creating ggplot2
graphs? If yes, what was your experience like? If no, why not?

