
Plotnine: A grammar of graphics for Python - carlosgg
https://github.com/has2k1/plotnine
======
peatmoss
I feel like a lot of attempts to recreate ggplot2 end up being superficial
because they don't recognize / duplicate the power of the underlying Grid
graphics that ggplot2 uses.

I know that web technologies are all the rage these days, but at least for
static, publication-ready graphics, Grid is a really nice substrate, with
well-thought-out lower-level abstractions.

EDIT: I should also add that it's documented within an inch of its life should
anyone feel that it's worth recreating:
[https://stat.ethz.ch/R-manual/R-devel/library/grid/html/grid...](https://stat.ethz.ch/R-manual/R-devel/library/grid/html/grid-package.html)

~~~
platz
In what way does eschewing the underlying Grid graphics make a less desirable
experience for a ggplot2 port?

~~~
peatmoss
The ports I see feel like a < 100% enumeration of the plots in the mainline
ggplot2 package. However, there are heaps of great extensions to ggplot2 that
I suspect are in part due to there being a carefully thought out set of
abstractions at the low level of Grid that mesh nicely with the high level
abstractions of ggplot.

ggplot2 being built on top of Grid means that modestly complex stuff is easy
(in ggplot2 by itself), but that it's relatively easy to drop down into the
lower layer (grid) to do more.

------
has2k1
Surprised to see this at the top. I am the creator* of plotnine. The most
common question seems to be: what should you expect of plotnine? The answer: a
high-quality implementation of a grammar of graphics with an API that closely
matches ggplot2, and more.

I also want other packages to be able to build off of plotnine, e.g. a package
with the functionality of Seaborn could be built off of plotnine. The only
constraint should be whether the backend -- in this case Matplotlib -- stands
in the way. Matplotlib is evolving (though slowly) and has a very receptive
community, so there is lots of hope.

* - Many people contributed to its history.

~~~
closed
I watched your refactor of yhat's py ggplot branch, and was disappointed when
glamp dropped in a totally new implementation out of the blue. Thanks for all
your hard work--glad it is its own package now :).

~~~
has2k1
Well, I think it was different priorities. My main objective for contributing
was to have a full on grammar of graphics package in python. I appreciate
those warm feelings from afar.

------
sirrice
Recreating and keeping up with Hadley's hard work is challenging, particularly
because ggplot2's layout and extensions are really nice and continue to
evolve.

As an alternative that preserves the full power of Wickham's implementation,
pygg[1] is a Python wrapper that provides R's ggplot2 syntax in Python and
runs everything in R.

[1] [https://github.com/sirrice/pygg](https://github.com/sirrice/pygg)

------
skierscott
Another grammar of graphics: altair[1]. Altair's plot specifications are
simpler and easier to read, e.g.

    
    
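    from altair import Chart

    # df is assumed to be a pandas DataFrame with 'age', 'height' and 'sex' columns
    Chart(df).mark_point().encode(
        x='age', y='height', color='sex')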
    

Also, see Jake VanderPlas's talk giving an overview of Python visualization
tools at
[https://youtube.com/watch?v=FytuB8nFHPQ](https://youtube.com/watch?v=FytuB8nFHPQ)

[1]:[https://altair-viz.github.io](https://altair-viz.github.io)

------
vignesh_m
If this is an implementation of ggplot2, what does it offer over
[http://ggplot.yhathq.com/](http://ggplot.yhathq.com/)?

I don't mean to undermine your project, just wanted to know about significant
differences.

~~~
kaffee
\- yhat's ggplot has 256 commits, 13 contributors, last commit Nov 2016.

\- plotnine has 1,283 commits, 42 contributors, most recent commit is 3 days
ago.

~~~
asdfgadsfgasfdg
These comparisons are pointless. Would you like to compare loc as well? How
about man hours spent?

The only comparison that is important is how well the two projects work. I
have no idea how well plotnine works yet (but I intend to find out). I do know
that ggplot works OK - and seeing as it leverages matplotlib if there is
anything that isn't implemented I can finish the plot off manually.

EDIT: it seems that plotnine also leverages matplotlib and produces nicer plots
for some common cases :).

~~~
cle
It's not pointless at all. For long-term maintenance, the community strength
and level of active development are just as important as (and sometimes more
important than) minor feature differences.

~~~
jacobolus
I would often rather use a decade-old project that was developed solo by a
world-class expert dumping code over the wall once every 6 months than a
community project being hacked on by 100 amateurs.

Without additional context I find recency of last commit and number of
committers to be almost impossible to draw useful conclusions from.

~~~
StavrosK
I wish this contrarian argument would die already. It amounts to basically
this:

"On average, X is better than Y."

"Ah but I would rather have the top end of Y than the bottom end of X,
therefore comparing averages is useless."

Yes, a brand new Ford Focus would be better than a Ferrari that doesn't run,
but generally Ferrari is the better brand.

~~~
jacobolus
I wish this both-lazy-and-condescending missing-the-point hand-wavy-analogy
argument style would die already.

The earlier poster in this thread implied that number of contributors and
recency of commits in one of two competing github projects was evidence that
it was better.

My point is that these are inadequate (often totally misleading) heuristics
unless both projects are otherwise extremely similar, which they usually are
not, and even then are usually not very useful heuristics compared to other
ways of comparing the projects.

Unless you know who the authors are, what the project management/organization
style is, how the project is funded / what level of commitment the authors
have, what the project release cycle is like, etc., or unless you directly
examine the code yourself, the only thing that looking at the most recent git
commit tells you is how recently someone published public code changes. Which
is not something that anyone evaluating two projects cares about directly, but
only as some heuristic signal of other features that might be more costly to
examine.

But note that commit recency doesn’t give a remotely useful sense of how
extensible the project is, how readable or efficient the code is, how well
designed the API is, how good the documentation is, how friendly the community
is, how competent the project management is, .....

If we want to make a car analogy, it’s like choosing which car to buy based on
how frequently the company introduces new models, or how many engineers they
employ, rather than based on customer reviews, reliability estimates,
accessibility of mechanics, gas mileage, top speed, or storage capacity.

Your argument is basically analogous to: _“because the average car with
frequent updates is better than the average car with infrequent model updates,
criticizing that as a primary criterion for choosing a car is an invalid
argument”._ Notice that you haven’t even bothered to examine whether your
premise about the relation between updates and quality is true, or whether
that average relationship makes update frequency a practically useful
heuristic or not.

~~~
StavrosK
In the absence of any other information, the more recently updated codebase is
preferred to the less recently updated one, for the same reason that an
abandoned codebase is dispreferable.

------
Waterluvian
What is meant by "a grammar"?

Is it the way we concatenate functions to create what's essentially a sentence
of what we want the plot to be?

~~~
perturbation
ggplot2 was inspired by:
[https://www.amazon.com/Grammar-Graphics-Statistics-Computing...](https://www.amazon.com/Grammar-Graphics-Statistics-Computing/dp/0387245448)

and Hadley Wickham wrote about it in
[http://vita.had.co.nz/papers/layered-grammar.pdf](http://vita.had.co.nz/papers/layered-grammar.pdf).

I'm no expert, but I think that one of the main ideas is to separate the
elements of making a plot from the way that the data is presented. For
example, in ggplot2, you have the data that will go into the graph, the type
of plot (or "geometry") that defines how the data are presented (scatterplot,
bar plot, etc.), and then various "layers" that can be added that affect
style.

In order to split a plot into subplots, you simply define how it is to be
faceted (what column should be used to define groups). Grammar-of-graphics
moves plotting away from the "turtle graphics" model and lets you specify what
_should_ be done. Then ggplot figures out how to do it, kind of like SQL vs.
writing for loops to retrieve information.
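
To make the SQL comparison a bit more concrete, here is a rough pure-Python sketch of the idea. It is a toy, not how ggplot2 or plotnine is actually implemented: the `spec` dictionary, the `resolve` function and the sample rows are all made up for illustration. The point is only that the user writes down what the plot is (geometry, mappings, facet column) and a separate piece of code works out the per-panel marks.

```python
from collections import defaultdict

# Toy data: one dict per observation (made-up values, for illustration only).
rows = [
    {"age": 30, "height": 170, "sex": "F"},
    {"age": 41, "height": 182, "sex": "M"},
    {"age": 25, "height": 165, "sex": "F"},
]

# Declarative part: say WHAT the plot is, not how to draw it.
spec = {
    "geom": "point",                     # how each row is presented
    "aes": {"x": "age", "y": "height"},  # which column feeds which channel
    "facet": "sex",                      # one panel per distinct value
}


def resolve(spec, rows):
    """Turn a spec plus data into per-panel lists of (x, y) marks."""
    x_col = spec["aes"]["x"]
    y_col = spec["aes"]["y"]
    facet_col = spec.get("facet")
    panels = defaultdict(list)
    for row in rows:
        # Without a facet column, everything lands in a single panel.
        key = row[facet_col] if facet_col else "all"
        panels[key].append((row[x_col], row[y_col]))
    return dict(panels)


panels = resolve(spec, rows)
print(panels)
```

Removing the `"facet"` entry from `spec` collapses everything into one panel without touching any loop, which is roughly the "say what, not how" property described above.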

~~~
Waterluvian
Aha! Thank you. So it's kind of like a declarative way of plotting.

------
stared
I get that it is ggplot-based, but as it is Python, why not be more idiomatic
and use chaining instead of adding?
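
For anyone wondering how the two spellings relate: in Python, `+` is just a call to `__add__`, so adding and chaining can be two names for the same operation. A minimal sketch follows, assuming a made-up `Plot` class with a hypothetical `add` method; this is not plotnine's real class or API, and plain strings stand in for layer objects.

```python
class Plot:
    """Toy plot specification; NOT plotnine's real class."""

    def __init__(self, data, layers=()):
        self.data = data
        self.layers = tuple(layers)

    def __add__(self, layer):
        # ggplot-style composition: `plot + layer` returns a new plot
        # and leaves the original untouched.
        return Plot(self.data, self.layers + (layer,))

    def add(self, layer):
        # Method-chaining spelling of the same operation (hypothetical name).
        return self + layer


base = Plot(data=[1, 2, 3])

# Strings stand in for real layer objects here.
added = base + "geom_point" + "facet_wrap"
chained = base.add("geom_point").add("facet_wrap")

print(added.layers == chained.layers)
print(base.layers)
```

One possible argument for `+` is that layers stay ordinary standalone values that can be stored in a variable and reused across plots, whereas a pure chaining API tends to need one method on the plot class per kind of layer. That is a design trade-off, not a statement about why plotnine chose it.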

------
coldtea
This is nice and all, but the syntax and names are totally unintuitive.

If I'm to dig in the manual, I might as well build my plots with the standard
syntax of any random plotting library.

Is this "grammar of graphics" any good if you invest more time in it?

~~~
npgatech
I find it the opposite. The syntax is the most intuitive of all plotting
applications.

Layers are as follows [1]

1\. Data

2\. Aesthetic mappings

3\. Statistical transformation (stat)

4\. Geometric object (geom)

5\. Position adjustment

Once you get the hang of this, it becomes easy to create new plots purely from
the understanding of the layers. In matplotlib or even in Seaborn, I find
myself constantly Googling for examples.
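
As a rough illustration of why knowing the five parts is enough, here is a toy model of a layer as a plain record. The `Layer` class and its field names are hypothetical, chosen to mirror the list above; they are not ggplot2's or plotnine's actual objects, and the data values are made up.

```python
from dataclasses import dataclass, replace


@dataclass
class Layer:
    """Toy record with the five layer parts (hypothetical names)."""
    data: list                   # 1. data
    mapping: dict                # 2. aesthetic mappings
    stat: str = "identity"       # 3. statistical transformation
    geom: str = "point"          # 4. geometric object
    position: str = "identity"   # 5. position adjustment


people = [
    {"age": 30, "height": 170, "sex": "F"},
    {"age": 41, "height": 182, "sex": "M"},
]

# A scatterplot: raw values, drawn as points.
scatter = Layer(data=people, mapping={"x": "age", "y": "height"})

# A bar chart of counts is the same data with four parts swapped out.
bars = replace(
    scatter,
    mapping={"x": "sex"},
    stat="count",
    geom="bar",
    position="stack",
)

print(scatter)
print(bars)
```

In this sketch a "new kind of plot" is just a different combination of the same five fields, which is the sense in which the layer list can be composed from understanding alone.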

ggplot2 is the most beautiful thing to happen in visualization space!

[1] Wickham, Hadley, and Carson Sievert. "4.4.1 Layers." Ggplot2: Elegant
Graphics for Data Analysis. Dordrecht: Springer, 2016. N. pag. Print.

------
lorenzfx
This looks interesting, but I find the documentation somewhat lacking. Is the
user supposed to know ggplot?

