Thus allowing you to tweak visualizations on the fly without touching code. My workflow is:
sql -> dataframe -> pivottable
This is not a dig at matplotlib, which is undeniably powerful. It's more of an alternative for those of us who want good-enough, flexible, interactive visualizations without getting into the minutiae of matplotlib.
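A minimal sketch of the sql -> dataframe -> pivot part of that workflow. The `sales` table and its columns are made up for illustration, and pandas' `pivot_table` stands in here for whatever interactive pivot-table widget is used as the last step:

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "Q1", 10.0),
        ("north", "Q2", 12.0),
        ("south", "Q1", 7.0),
        ("south", "Q2", 9.0),
        ("south", "Q2", 1.0),
    ],
)

# sql -> dataframe
df = pd.read_sql_query("SELECT region, quarter, amount FROM sales", conn)
conn.close()

# dataframe -> pivot table: total amount per region (rows) and quarter (columns)
pivot = df.pivot_table(
    index="region", columns="quarter", values="amount", aggfunc="sum"
)
print(pivot)
```

From here the dataframe (or the pivot) can be handed to an interactive front end, so the slicing happens in the UI rather than in code.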
I always say that matplotlib is more of a low-level charting API - you can do whatever you need, but it'll take a long time and a lot of code. Better is stuff like seaborn, pandas' charting support, and the new ggplot port and altair.
Also underutilized is pandas' to_excel, to_clipboard stuff where you can then transfer to whatever application and back for editing / graphing purposes IMHO.
It's more domain-specific, but I do like the seaborn library.
If you've not had to endure that hazing ritual, then yes, there are better options.
I'm a bit biased, as I wrote this particular section (most of the rest is Ben's work), but the plotting method overview is a very useful cheatsheet: http://nbviewer.jupyter.org/github/WeatherGod/AnatomyOfMatpl...
It gives you a compact visual representation of what the main plotting methods do and the differences between them.
Improving my Python along the way (and getting to know numpy and matplotlib) has been a great experience over the last 2 days, although the progress seems "slow" (translating formulas, n-armed bandits to code ...). My best tip: download cheatsheets for numpy, pandas, matplotlib, Python, ... They have been good for getting to know the language and the libraries for ML.
So this tutorial/information is landing in good hands at a very opportune time ;) Thanks!
It is designed to be familiar to people who already know MATLAB, and it does that quite well. So it is not "unnecessary", it's like that for a reason. I agree tho' that someone who has never touched MATLAB might want to plot directly from Pandas, or maybe use Seaborn.
I think the "organic" part of the API is very well integrated and completely optional in most places. For instance, you can use the matlabish shortcut "subplot(111)" or you can spell out the parameters in a more pythonic way as "subplot(1, 1, 1)" (rows, columns, plot number), or with keywords as "subplots(nrows=1, ncols=1)".
Yes, and this was a mistake. Other than logical indexing, one is better off forgetting _everything_ about MATLAB.
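A small sketch comparing the shorthand with the spelled-out forms. As far as I know, `pyplot.subplot` takes the grid description as positional arguments (rows, columns, index), while the keyword spelling belongs to `pyplot.subplots`. The Agg backend is chosen only so this runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig1 = plt.figure()
ax_short = plt.subplot(111)  # MATLAB-style three-digit shorthand

fig2 = plt.figure()
ax_long = plt.subplot(1, 1, 1)  # same grid: nrows, ncols, index

# Object-oriented form, where the grid can be given by keyword
fig3, ax_kw = plt.subplots(nrows=1, ncols=1)

# All three axes occupy the only cell of a 1x1 grid
geometries = [
    ax.get_subplotspec().get_geometry() for ax in (ax_short, ax_long, ax_kw)
]
print(geometries)

plt.close("all")
```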
For specific examples, I've looked idly at doing chord diagrams and trees in ggplot. Chord diagrams don't seem to exist, you can do them with another R package but I don't think it interacts with ggplot. There's a ggtree library that does interact with ggplot, but in kind of a weird way that I wouldn't describe as "you can now draw trees with ggplot".
(I haven't explored too closely. This was in the context of "I want to write a ggplot for python, and I'm curious whether supporting chord diagrams and trees at some point in the future is plausible".)
I've said it before, but the attempts to replicate ggplot2 usually fail to implement the full stack. As a result we have somewhat incomplete implementations of surface features of ggplot2, without the depth that is afforded by the grid / ggplot2 combo.
ggplot2 is much nicer than matplotlib and d3 as long as you stay on the beaten path; which is wide enough to accommodate the majority of use cases. But the second you step off that beaten path, it's hell.
It just looks like there's a fundamental tradeoff between visualization expressiveness and API complexity.
Btw, Seaborn is a framework on top of matplotlib that replicates the semantics of ggplot2 if that's what you're after.
There's ggpy (formerly ggplot), but that also doesn't have a grammar going on under the hood, and on top of that it pretty directly translates ggplot's API to python. ggplot's API might be fine for R, but it's wildly unpythonic. ggpy doesn't understand where it's coming from or where it's going to. Plus it's buggy and incomplete and seems abandoned.
The one I'm currently looking at is plotnine, which keeps the API but at least it also keeps the grammar.
I was working on my own library until I found plotnine, with IMO a better API (and a rudimentary CLI). Now I'm looking at building it on top of plotnine, and when I have a POC I plan to get in touch with the author of that and see if he's interested in adopting it. (I think this is a long shot, but worth trying.)
Building abstractions is just a matter of writing very simple imperative functions. Modifying existing abstractions is a matter of copying and pasting out of the source code for existing ones.
It can involve lots of tedious trial-and-error, and the documentation is a little terse, but it's about as close to drawing by hand as you can get.
Ggplot2 is great for prototyping, but when you want to really own your graphics, go for Base.
There's also a middle path, called Lattice. It's built on the same library as Ggplot2 (called Grid), but lets you dig down into the guts a little more easily, at the cost of your graphs looking "older", since it's based on Trellis graphs from SAS.
I will agree on reproducibility and portability across graphics devices (not to mention across installations). Grid takes care of so much annoyance in that regard.
On the other hand, matplotlib is ridiculously powerful as an embedded plotting library; the limiting factor there is definitely not flexibility, but rather performance, which limits applicability to larger data sets -- you can work around that in many ways, or use something different. (I wrote more than one application-specific OpenGL plot renderer). Performance also limits interactive use, especially on non-desktop devices.
Disclaimer: Very few technical docs I've seen for "modern" software are of convincing quality. Most are poor or don't exist in the first place. Further Disclaimer: I don't know how to do better and write docs I'd consider at best "meh-ok". People able to write technical texts well seem to be incredibly rare (likely an educational gap; besides 08/15 standard English courses I've never seen lectures or courses on technical writing at a university).
There's a reason Technical Writer is a profession in and of itself: this stuff is much harder than people give it credit for.
I'm really curious, as I use matplotlib in interactive mode frequently, but I find it too slow for showing real-time updates of some ongoing compute process. (It literally slows down my computations. I thought about coming up with some kind of multiprocess answer to this, but it would still have slow updates.)
Actually I'd be really interested in a simpler library than matplotlib that is specifically designed for live, parallel updates using super fast GPU-driven drawing methods.
There's also moviepy, which integrates well with matplotlib (it takes RGB arrays as inputs to build the video file, but it has a helper function that converts matplotlib figures into numpy RGB arrays - that should be easy to parallelize). Link: http://zulko.github.io/blog/2014/11/29/data-animations-with-...
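For a sense of what "figure to RGB array" means, here is a rough sketch that uses only matplotlib's Agg canvas and numpy. It is not moviepy's own helper; `figure_to_rgb` is a hypothetical name, and the sine-wave data is made up. The resulting list of frames is the kind of input a video writer would consume:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering, no GUI needed
import matplotlib.pyplot as plt
import numpy as np


def figure_to_rgb(fig):
    """Render fig with the Agg canvas and return an (H, W, 3) uint8 array."""
    fig.canvas.draw()
    rgba = np.asarray(fig.canvas.buffer_rgba())
    return rgba[..., :3].copy()  # drop alpha, detach from the canvas buffer


x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots(figsize=(4, 3), dpi=100)
(line,) = ax.plot(x, np.sin(x))

# One frame per phase shift; each frame is an independent numpy array
frames = []
for phase in np.linspace(0, np.pi, 5):
    line.set_ydata(np.sin(x + phase))
    frames.append(figure_to_rgb(fig))

plt.close(fig)
print(len(frames), frames[0].shape, frames[0].dtype)
```

Since each frame depends only on its own phase value, the loop body is the piece one would hand to separate worker processes (each with its own figure) to parallelize rendering.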
If there's a time dimension, there should be an animation.
This animation is part of a blog post I just published. I also have a notebook on GitHub with an example (data & code):
 - https://medium.com/football-crunching/the-zone-where-it-happ...
 - https://github.com/rjtavares/football-crunching/blob/master/...
If you are a researcher and you want to publish in B&W (something still very common in fields like Physics and Astrophysics), no other plotting library for Python comes near.
You can choose filling patterns, line patterns, annotate with LaTeX, etc. And, although hard, you can make your final product look as polished and perfect as you want (and as you are willing to take the time). No other library for Python comes near in these aspects.
There are simpler tools and it's easy to get a good enough looking plot, but if you want to get that perfect one exactly as you need, there's no way around Matplotlib (at least amongst the well known Python plotting libraries).
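As a rough illustration of the B&W features mentioned above (line patterns, fill patterns, math annotations), here is a small sketch. The data is made up, and it uses matplotlib's built-in mathtext rather than a full LaTeX installation (which would need `text.usetex`):

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 4, 100)
fig, (ax_lines, ax_bars) = plt.subplots(1, 2, figsize=(8, 3))

# Distinguish curves by line pattern instead of color
for style, k in zip(["-", "--", ":"], [1, 2, 3]):
    ax_lines.plot(
        x, np.exp(-x / k), color="black", linestyle=style,
        label=rf"$e^{{-x/{k}}}$",
    )
ax_lines.annotate(
    r"$\tau = 1$", xy=(1, np.exp(-1)), xytext=(2, 0.8),
    arrowprops=dict(arrowstyle="->"),
)
ax_lines.legend()

# Distinguish bars by fill (hatch) pattern
hatches = ["//", "..", "xx"]
bars = ax_bars.bar(["a", "b", "c"], [3, 5, 2], color="white", edgecolor="black")
for bar, hatch in zip(bars, hatches):
    bar.set_hatch(hatch)

fig.canvas.draw()  # force a render so any mathtext errors surface here
```

From here, `fig.savefig(...)` to a vector format such as PDF is the usual route for print.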
1. Easy to embed MPL graphics in Tkinter GUIs. Granted, my programs are not intended to be professional looking, but if I want to write standalone software, e.g., for an automated experiment or industrial test, it invariably needs one or two graphs in a dialog.
But being able to visualise the problem or your solution is so important to build more intuition and become a better wannabe data scientist.
I am finding the API a lot cleaner than Matplotlib, and it is very nice to have the ability to do integrated interactive plots in Jupyter.
I've spent hours trying to get matplotlib to render on screen on OSX, and followed the instructions in countless Stack Overflow answers and blog posts.
I still can't.
Edit: Just noticed you didn't mention Jupyter so I guess disregard if you aren't using it.
Basically, currently I am just accessing the HN API to plot the score graphs for one or multiple story items.
So this doesn't really generate any valuable insights. But I'd be happy for suggestions about what type of data/visualization would be valuable. Good to have a challenge to tackle :)
The thing you'll need to do is that when in R you write unquoted expressions in your aes, in plotnine you need to quote them. So `aes(x=foo/3, y=bar, color=..baz..)` becomes `aes(x='foo/3', y='bar', color='..baz..')`.