Effectively Using Matplotlib (pbpython.com)
375 points by kercker 83 days ago | 76 comments

I needed Jupyter as a medium for sharing information within my team, but matplotlib has too steep a learning curve to expect everyone to adopt it as tribal knowledge, considering it was not a core part of their jobs. I found a compromise using the wonderful jupyter_pivottablejs library:


Thus allowing you to tweak visualizations on the fly without touching code. My workflow is:

sql -> dataframe -> pivottable
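A minimal sketch of the sql -> dataframe -> pivot table steps, assuming a hypothetical in-memory SQLite table `sales`. In a notebook the interactive last step would be `pivot_ui(df)` from the pivottablejs package (left as a comment), so pandas' static `pivot_table` stands in for it here.

```python
import sqlite3

import pandas as pd

# Hypothetical table, created in memory purely for illustration
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, quarter TEXT, revenue REAL);
INSERT INTO sales VALUES
  ('north', 'Q1', 100.0), ('north', 'Q2', 120.0),
  ('south', 'Q1', 80.0),  ('south', 'Q2', 95.0);
""")

# sql -> dataframe
df = pd.read_sql_query("SELECT region, quarter, revenue FROM sales", conn)

# dataframe -> pivot table (a static stand-in for the interactive UI)
table = df.pivot_table(index="region", columns="quarter",
                       values="revenue", aggfunc="sum")

# In Jupyter, the interactive version would be:
# from pivottablejs import pivot_ui
# pivot_ui(df)
```

The point of the interactive widget is that colleagues can re-pivot the same `df` by dragging fields around instead of editing the `pivot_table` arguments.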

This is not a dig at matplotlib, which is undeniably powerful. It's more an alternative for those of us who want to convey good-enough, flexible, interactive visualizations without getting into the minutiae of matplotlib.

This is brilliant!

I always say that matplotlib is more of a low-level charting API - you can do whatever you need, but it'll take a long time and a lot of code. Better options are seaborn, pandas' charting support, the new ggplot port, and altair.

Also underutilized, IMHO, are pandas' to_excel and to_clipboard methods, which let you move data into whatever application you like (and back) for editing or graphing.

I've commented on this before, but matplotlib is sort of stuck between a rock and a hard place of supporting the cruft of MATLAB plotting syntax, and trying to be pythonic. I'm still a big fan of it because I've grown with it, but I also don't expect that it is the future of python technical plotting.

It's more domain-specific, but I do like the seaborn library.

Those of us who like matplotlib like it because we got into those minutiae with MATLAB many years before.

If you've not had to endure that hazing ritual, then yes, there are better options.

State of visualization in Python by Jake VanderPlas:


Another useful guide is Ben Root's Anatomy of Matplotlib tutorial: https://github.com/WeatherGod/AnatomyOfMatplotlib

I'm a bit biased, as I wrote this particular section (most of the rest is Ben's work), but the plotting method overview is a very useful cheatsheet: http://nbviewer.jupyter.org/github/WeatherGod/AnatomyOfMatpl...

It gives you a compact visual representation of what the main plotting methods do and the differences between them.

I'm picking up deep reinforcement learning and documenting my progress in Jupyter notebooks.

Improving my Python along the way (and getting to know numpy and matplotlib) has been a great experience over the last 2 days, although progress seems "slow" (translating formulas, n-armed bandits to code...). My best tip: download cheatsheets for numpy, pandas, matplotlib, Python, etc. They have been good for getting to know the language and the libraries for ML.

So this tutorial will be put to good use at a very opportune time ;) Thanks!

Could you share those cheatsheets? I'm trying to teach myself ML, but I come from a Java background, so it's really funny writing 30 lines of procedural code only to find out that a single line of functional/numpy code would have done the same trick.
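To illustrate the point about procedural code versus NumPy, here is a hypothetical example (not from the thread): population variance computed with explicit loops, then with a single vectorized NumPy expression.

```python
import numpy as np

values = np.arange(1, 11, dtype=float)  # illustrative data: 1.0 .. 10.0


def variance_loop(xs):
    # Procedural, Java-style: two explicit passes with accumulators
    mean = 0.0
    for x in xs:
        mean += x
    mean /= len(xs)
    total = 0.0
    for x in xs:
        total += (x - mean) ** 2
    return total / len(xs)


def variance_numpy(xs):
    # Vectorized: the subtraction and squaring apply to the whole array at once
    return np.mean((xs - xs.mean()) ** 2)
```

In practice `np.var(values)` already does this; the hand-written vectorized version is just to show the shape of the one-liner.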

A cool matplotlib feature I recently learned about is that it supports LaTeX for text rendering [1]. You can go as far as rendering LaTeX math in titles and labels, or just make the plot fonts match your text and/or figure captions so the plot fits nicely into your paper.
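A small sketch of math-formatted labels. It uses matplotlib's built-in mathtext (dollar-sign markup), which needs no TeX installation; turning on `text.usetex`, as the linked page describes, hands text rendering to a real LaTeX install instead, so that line is left commented out here. The plotted function is illustrative.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# Text between dollar signs is parsed by matplotlib's mathtext engine
ax.set_title(r"$f(x) = \sin(x)$")
ax.set_xlabel(r"$x$ [rad]")

# Full LaTeX rendering (requires a working LaTeX installation):
# plt.rcParams["text.usetex"] = True

buf = io.BytesIO()
fig.savefig(buf, format="png")
```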

[1] http://matplotlib.org/users/usetex.html

I've recently started using this option in gnuplot using the epslatex terminal [1]. Makes for very attractive plots and is relatively simple to use. For those looking for a Matplotlib alternative, I highly recommend it.

[1] http://www.gnuplotting.org/output-terminals/

matplotlib is an example of an unnecessarily complex and confusing "organic" API. That's why there is so much resistance to using it: trivial things require non-trivial knowledge of the internals and confusing boilerplate.

> matplotlib is an example of unnecessarily complex and confusing "organic" API

It is designed to be familiar to people who already know MATLAB, and it does that quite well. So it is not "unnecessary", it's like that for a reason. I agree tho' that someone who has never touched MATLAB might want to plot directly from Pandas, or maybe use Seaborn.

The problem is that both Pandas and Seaborn are customized by passing their kwargs onto matplotlib, or by giving you back matplotlib axes objects. You have to break through the abstraction pretty much immediately. You can't really put the finishing touches on your graphs without also knowing matplotlib.
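A minimal sketch of that pattern with pandas (seaborn's axes-level functions similarly accept and return a matplotlib Axes, but are not shown): extra keywords pass through to matplotlib, and the returned Axes is where the finishing touches happen. The data and labels are illustrative.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data
df = pd.DataFrame({"year": [2014, 2015, 2016, 2017], "users": [10, 25, 40, 70]})

fig, ax = plt.subplots()
# color and linestyle are not pandas concepts; they are forwarded to matplotlib
returned = df.plot(x="year", y="users", ax=ax,
                   color="black", linestyle="--", legend=False)

# pandas hands back a matplotlib Axes, so the finishing touches are plain matplotlib
returned.set_ylabel("Users")
returned.set_title("Growth")
returned.spines["top"].set_visible(False)
```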

I didn't have any significant MATLAB background when I started using matplotlib and I found it a joy to use.

I think the "organic" part of the API is very well integrated and completely optional in most places. For instance, you can use the matlabish shortcut "subplot(111)", spell the grid out as "subplot(1, 1, 1)" (nrows, ncols, index), or go fully pythonic with "fig, ax = plt.subplots(nrows=1, ncols=1)".
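A sketch of equivalent ways to create a single subplot, from the MATLAB-style shorthand to the keyword-based object-oriented call (each on its own figure so they don't interfere):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# MATLAB-style shorthand: nrows, ncols, index packed into one integer
fig1 = plt.figure()
ax1 = plt.subplot(111)

# The same grid as separate positional arguments (nrows, ncols, index)
fig2 = plt.figure()
ax2 = plt.subplot(1, 1, 1)

# Object-oriented style: keyword arguments, figure and axes returned explicitly
fig3, ax3 = plt.subplots(nrows=1, ncols=1)
```

The first two act on pyplot's "current figure"; the last gives you explicit handles, which is what most modern tutorials recommend building on.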

I am fairly comfortable with doing basic matrix work in MATLAB and I am fairly comfortable using Python, but I have found matplotlib to be quite hard to get the hang of. Esp when you add it to pandas which is also slightly idiosyncratic in its own way, and then all of these are different from numpy, which is a mess of its own.

>It is designed to be familiar to people who already know MATLAB

Yes, and this was a mistake. Other than logical indexing, one is better off forgetting _everything_ about MATLAB.
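For reference, NumPy keeps MATLAB-style logical indexing in the form of boolean masks; a short sketch with illustrative values:

```python
import numpy as np

temps = np.array([12.5, 18.0, 21.3, 9.8, 25.1])  # illustrative readings

# Boolean mask, like MATLAB's temps(temps > 20)
warm = temps[temps > 20]

# Masks also work on the left-hand side of an assignment
clipped = temps.copy()
clipped[clipped < 10] = 10.0
```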

I agree with your sentiment, but at the same time I've yet to see anyone do it better. Other popular tools like d3 have the same problems; I think some of it may just be an intrinsic cost of expressive visualization frameworks.

I personally like the design of ggplot2. Not necessarily a design everyone will love, but it's pretty consistent and systematic in its design around the "grammar of graphics". Works especially well if you buy into the entire Hadley Wickham ecosystem of R packages: ggplot2, dplyr, reshape2.

I really like ggplot2, but I'm not sure it really competes with matplotlib or d3. It feels like, if it doesn't already support the thing you want, implementing it in ggplot is a lot harder than in mpl or d3.

For specific examples, I've looked idly at doing chord diagrams and trees in ggplot. Chord diagrams don't seem to exist, you can do them with another R package but I don't think it interacts with ggplot. There's a ggtree library that does interact with ggplot, but in kind of a weird way that I wouldn't describe as "you can now draw trees with ggplot".

(I haven't explored too closely. This was in the context of "I want to write a ggplot for python, and I'm curious whether supporting chord diagrams and trees at some point in the future is plausible".)

ggplot2 is built on Grid Graphics, which is a well thought-out, low-level graphics layer. Part of the reason so many extensions to ggplot2 get built is that Grid is easy (albeit lower-level) to work with.


I've said it before, but the attempts to replicate ggplot2 usually fail to implement the full stack. As a result we have somewhat incomplete implementations of surface features of ggplot2, without the depth that is afforded by the grid / ggplot2 combo.

Spot on.

ggplot2 is much nicer than matplotlib and d3 as long as you stay on the beaten path, which is wide enough to accommodate the majority of use cases. But the second you step off that path, it's hell.

It just looks like there's a fundamental tradeoff between visualization expressiveness and API complexity.

Btw, Seaborn is a framework on top of matplotlib that replicates the semantics of ggplot2 if that's what you're after.

Oh? I've looked at seaborn, but it didn't seem very ggplot-ish. No grammar going on.

There's ggpy (formerly ggplot), but that also doesn't have a grammar going on under the hood, and on top of that it pretty directly translates ggplot's API to python. ggplot's API might be fine for R, but it's wildly unpythonic. ggpy doesn't understand where it's coming from or where it's going to. Plus it's buggy and incomplete and seems abandoned.

The one I'm currently looking at is plotnine, which keeps the API but at least it also keeps the grammar.

I was working on my own library until I found plotnine, with IMO a better API (and a rudimentary CLI). Now I'm looking at building it on top of plotnine, and when I have a POC I plan to get in touch with the author of that and see if he's interested in adopting it. (I think this is a long shot, but worth trying.)

I am the author, do let me know what you are trying to do. I too have some ideas of what to add to plotnine or build on top of it.

I'll try to get in touch tomorrow. By default I'll open an issue on github, unless there's another way you'd prefer me to contact you?

Another library to look at these days in this area is Altair (https://altair-viz.github.io/), which is based on the vega spec.

Then use R's built-in "base graphics". They afford literally pixel-by-pixel control if you know what you're doing, in a 100% declarative API that isn't so hard to learn.

Building abstractions is just a matter of writing very simple imperative functions. Modifying existing abstractions is a matter of copying and pasting out of the source code for existing ones.

It can involve lots of tedious trial-and-error, and the documentation is a little terse, but it's about as close to drawing by hand as you can get.

ggplot2 is great for prototyping, but when you want to really own your graphics, go for base.

There's also a middle path called Lattice. It's built on the same library as ggplot2 (Grid), but lets you dig into the guts a little more easily, at the cost of your graphs looking "older", since it's based on Trellis graphics from S.

Base R graphics are a little like Matplotlib in being an organic API. I'd argue that Grid is a better starting point than base R, if you're planning to create a reproducible plot type.

Organic in what sense? Other than "pch" it's much less magical than matplotlib (to me), and its primitives are exposed alongside the convenience functions like hist() with ugly defaults we all love to hate.

I will agree on reproducibility and portability across graphics devices (not to mention across installations). Grid takes care of so much annoyance in that regard.

Yeah, the mishmash of convenience functions and pch were pretty much exactly what I was thinking about. Also, I will vehemently agree that base R graphics are still far less magical than matplotlib. So, I guess I meant "like matplotlib, but to a far less extreme degree" :-)

I definitely am not a fan of R's inconvenient convenience functions. A saner set of wrappers for R base graphics is on my long to-do list, at least something with better defaults. Someone had started a project that goes in the right direction called "compactr", but it seems abandoned: https://cran.r-project.org/package=compactr

How about gnuplot?

Gnuplot needs to die. It is not GNU. It is not libre open source.

On the one hand, yes, if you're looking at the API design and the docs from a "clean slate" perspective, then they are clearly not very good. [1]

On the other hand, matplotlib is ridiculously powerful as an embedded plotting library; the limiting factor there is definitely not flexibility, but rather performance, which limits applicability to larger data sets -- you can work around that in many ways, or use something different. (I wrote more than one application-specific OpenGL plot renderer). Performance also limits interactive use, especially on non-desktop devices.

[1] Disclaimer: Very few technical docs I've seen for "modern" software are of convincing quality. Most are poor or don't exist in the first place. Further disclaimer: I don't know how to do better, and the docs I write I'd consider "meh-ok" at best. People able to write technical texts well seem to be incredibly rare (likely an educational gap: besides run-of-the-mill standard English courses, I've never seen lectures or courses on technical writing at a university).

> Very few technical docs I've seen for "modern" software are of convincing quality.

There's a reason Technical Writer[0] is a profession in and of itself: this stuff is much harder than people give it credit for.

[0]: https://en.wikipedia.org/wiki/Technical_writer

One often-overlooked aspect of matplotlib is its animation capabilities. There should be more animations in data-sciency stuff (there's a reason small GIFs spread so easily on the internet).
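A minimal sketch of the animation module: `FuncAnimation` calls an update function once per frame, and `PillowWriter` writes a GIF without needing ffmpeg (Pillow is already a matplotlib dependency). The sine-wave data and output filename are illustrative assumptions.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))


def update(frame):
    # Shift the phase and reuse the existing Line2D rather than re-plotting
    line.set_ydata(np.sin(x + frame / 5))
    return (line,)


anim = FuncAnimation(fig, update, frames=20, interval=50)

out_path = os.path.join(tempfile.gettempdir(), "sine_wave.gif")
anim.save(out_path, writer=PillowWriter(fps=20))
```

Updating an existing artist's data, as `update` does, is generally cheaper than clearing the axes and plotting again each frame.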

We recently released a thin wrapper around matplotlib that makes it easier to do live plots [1] (since matplotlib has a few gotchas). We use it in a fair number of projects internally, since it makes it easier to monitor performance of various models as they are trained, which shortens the code-test loop.

[1] https://github.com/IGITUGraz/live-plotter

Are you just referring to ion() and draw(), or do you mean something more specific to animation?

I'm really curious, as I use matplotlib in interactive mode frequently, but I find it too slow for showing real-time updates of an ongoing compute process. (It literally slows down my computations. I thought about coming up with some kind of multiprocess answer to this, but it would still have slow updates.)

Actually I'd be really interested in a simpler library than matplotlib that is specifically designed for live, parallel updates using super fast GPU-driven drawing methods.

I'm talking about the animation module: https://matplotlib.org/2.0.0/api/animation_api.html

There's also moviepy, which integrates well with matplotlib (it takes RGB arrays as input to build the video file, and it has a helper function that converts matplotlib figures into numpy RGB arrays - that should be easy to parallelize). Link: http://zulko.github.io/blog/2014/11/29/data-animations-with-...
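The figure-to-array conversion that this workflow depends on can be sketched in plain matplotlib: render a figure on the Agg canvas and read its pixel buffer as a NumPy array. This is an illustrative stand-in, not moviepy's own helper; `figure_to_rgb` and `make_frame` are hypothetical names.

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def figure_to_rgb(fig):
    """Render `fig` with the Agg canvas and return an (H, W, 3) uint8 RGB array."""
    canvas = FigureCanvasAgg(fig)
    canvas.draw()
    rgba = np.asarray(canvas.buffer_rgba())  # shape (H, W, 4)
    return rgba[..., :3].copy()              # drop the alpha channel


def make_frame(phase):
    # Each frame is an independent Figure, which is what should make
    # rendering frames in parallel straightforward
    fig = Figure(figsize=(4, 3), dpi=100)
    ax = fig.add_subplot()
    x = np.linspace(0, 2 * np.pi, 100)
    ax.plot(x, np.sin(x + phase))
    return figure_to_rgb(fig)


# A pre-built list of frames, ready to hand to a video-writing tool
frames = [make_frame(p) for p in np.linspace(0, np.pi, 5)]
```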

Ah thanks, I haven't used that.

I question whether or not animation would really help anything. Rather, I'd wager it would be like most animated PowerPoints.

An example of something I made with matplotlib: https://streamable.com/dui9k

If there's a time dimension, there should be an animation.

Like that a lot. Is that underpinned by ChyronHego data by any chance?

Thanks! That's actually data I collected myself.

This animation is part of a blog post I just published [1]. I also have a notebook on github with an example (data & code - [2])

[1] - https://medium.com/football-crunching/the-zone-where-it-happ...

[2] - https://github.com/rjtavares/football-crunching/blob/master/...

If you're using the animation to add another axis to the plot, that can be extremely useful. But using it to make annotations zip around the screen would be pathetic.

It's useful in that it allows visualising an extra dimension, most likely time, but I agree it has great potential for misuse. Also it doesn't work on printed material.

My current issue with the animation module is that there is no way to clear an animation away. You can clear the figure, but the animation will redraw on top of the cleared figure. The only workaround is to delete the FuncAnimation object, but that is difficult to do in a language without deterministic destruction.

Have you tried moviepy? It works by converting matplotlib figures into numpy arrays (you can even pre-build a list of arrays and then just iterate over the list).


It can be used for interactive widgets as well.

Example: [1]. As part of a Python library I wrote for querying and manipulating annotations of a medical image dataset, I used matplotlib to add a very basic DICOM viewer that interactively flips through slices of chest CT scans and displays annotation info.

[1]: https://raw.githubusercontent.com/pylidc/pylidc/master/img/v...
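A rough sketch of that kind of slice viewer using `matplotlib.widgets.Slider`; the random `volume` array is a hypothetical stand-in for CT data, and this is not the pylidc implementation.

```python
import matplotlib
matplotlib.use("Agg")  # swap for an interactive backend to actually drag the slider
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.widgets import Slider

# Hypothetical stand-in for a CT volume: 10 slices of 32x32 noise
rng = np.random.default_rng(0)
volume = rng.random((10, 32, 32))

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)  # leave room for the slider
image = ax.imshow(volume[0], cmap="gray")

slider_ax = fig.add_axes([0.2, 0.05, 0.6, 0.04])
slider = Slider(slider_ax, "slice", 0, volume.shape[0] - 1, valinit=0, valstep=1)


def show_slice(val):
    # Swap the pixel data of the existing image instead of calling imshow again
    image.set_data(volume[int(round(val))])
    fig.canvas.draw_idle()


slider.on_changed(show_slice)
```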

You may already be aware of it, but dicompyler is a nice DICOM viewer built in Python, and it's quite easy to build plugins for it.

I want to vouch for Matplotlib, I can see it gets a bad reputation when compared to these new shiny frameworks like plotly, but it's vastly more powerful.

If you are a researcher and you want to publish in B&W (something still very common in fields like Physics and Astrophysics), no other plotting library for Python comes near.

You can choose fill patterns, line patterns, annotate with LaTeX, etc. And, although it's hard, you can make your final product look as polished and perfect as you want (provided you're willing to take the time). No other library for Python comes near in these aspects.
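A short sketch of black-and-white styling with illustrative data: hatch patterns distinguish the bars, line styles distinguish the curves, and the legend labels use mathtext, so nothing depends on color.

```python
import io

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

categories = ["A", "B", "C"]
values = [3, 5, 2]            # illustrative values
hatches = ["//", "..", "xx"]  # one fill pattern per bar

fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(8, 3))

# White bars with black edges; the hatch pattern carries the distinction
bars = ax_bar.bar(categories, values, color="white", edgecolor="black")
for bar, hatch in zip(bars, hatches):
    bar.set_hatch(hatch)

# Black lines told apart by line style, labeled with math text
x = np.linspace(0, 1, 50)
for power, style in zip([1, 2, 3], ["-", "--", ":"]):
    ax_line.plot(x, x ** power, color="black", linestyle=style,
                 label=rf"$x^{power}$")
ax_line.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")
```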

There are simpler tools and it's easy to get a good enough looking plot, but if you want to get that perfect one exactly as you need, there's no way around Matplotlib (at least amongst the well known Python plotting libraries).

MPL is my go-to graphing tool, but admittedly it's probably because I learned it first and now it's a habit. Almost every Python / Jupyter tutorial starts you out with MPL. But there are two things I like about it:

1. It's easy to embed MPL graphics in Tkinter GUIs. Granted, my programs are not intended to be professional looking, but if I want to write standalone software, e.g., for an automated experiment or an industrial test, it invariably needs one or two graphs in a dialog.

2. If what you want is a static graph (no interaction), that's what MPL produces. With other packages that I've tried, every graph is its own JavaScript program running in the browser. A Jupyter notebook with dozens of graphs begins to bog down my computer.

This looks like a great resource! I am currently picking up deep learning and one of the things that they understandably don't cover much is how to use matplotlib.

But being able to visualise the problem or your solution is so important to build more intuition and become a better wannabe data scientist.

I used matplotlib for a very long time. Now, I suggest using Bokeh.


I am finding the API a lot cleaner than Matplotlib, and it is very nice to have the ability to do integrated interactive plots in Jupyter.

Biggest matplotlib frustration:

I've spent hours trying to get matplotlib to render on screen on OSX, and followed countless stackoverflow and blog posts instructions.

I still can't.

I had this. It turned out that my issue was that "%matplotlib" and "%matplotlib inline" are different, and I was using one when I needed to use the other (I forget which).

Edit: Just noticed you didn't mention Jupyter so I guess disregard if you aren't using it.

This may be completely off base, but is the issue that you get a runtime error telling you that Python is not installed as a framework?

I've had issues on Windows before, but not on OS X. I always just installed with pip and it worked.

I'm currently learning to use Matplotlib by visualizing HN activity (very early stages), so this comes in very handy. Thanks for sharing.

Interesting! Do you plan to share your findings?

I have one up on GitHub. But as I am a Python beginner, I am not sure it's very sophisticated :)

Basically, currently I am just accessing the HN API to plot the score graphs for one or multiple story items. https://github.com/martinweigert/hacker_news_analysis

So this doesn't really generate any valuable insights. But I'd be happy for suggestions about what type of data/visualization would be valuable. Good to have a challenge to tackle :)

How does matplotlib compare with gnuplot?

Anyone know of a good tutorial for plotnine? I'm new to graphing in python and am attracted to this because it should crossover to ggplot2 in R (which I'd also like to learn, but doing python for now). Will ggplot2 tutorials for R be enough to get going with plotnine?

From what I've seen, pretty much. It seems to be a pretty direct translation (though I've found some bugs that I haven't filed yet).

The main thing to know is that where in R you write unquoted expressions in your aes, in plotnine you need to quote them. So `aes(x=foo/3, y=bar, color=..baz..)` becomes `aes(x='foo/3', y='bar', color='..baz..')`.

A fantastic and sorely needed tutorial on modern matplotlib usage. I really appreciated his description of the MATLAB-style API vs the object-oriented API, and also the explanation of how to use it with pandas' shortcut plotting methods.
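The two APIs the tutorial contrasts, sketched side by side with illustrative data:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# MATLAB-style (pyplot state machine): calls act on the "current" figure/axes
plt.figure()
plt.plot(x, y)
plt.title("Implicit current axes")
implicit_ax = plt.gca()  # fetch the axes pyplot was implicitly drawing on

# Object-oriented style: keep explicit handles and call methods on them
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Explicit axes")
```

The explicit style scales better once a figure has several subplots, because every call names the axes it affects.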

Having the graph go beyond a point with the last axis number under it is annoying as hell and everyone who does that should feel bad.

Very informative. This clears up a lot of doubts I had, because I was doing a lot of snippet copying for my plots before.

Everyone, check out toyplot! It is a very easy python module for plotting.

Are there any advantages of using matplotlib versus say ggplot2?

From my personal experience, mpl's 3D plotting capabilities are pretty terrible (just try log-scaling your axes), and looking into Mayavi as a replacement has been on my list for a while.

I really had my hopes up for VisPy for this (GPU-accelerated 3D plotting). It was supposed to be the convergence of 3 or 4 previously existing OpenGL-based plotting libraries. However, 4 years after starting development, it's still in early stages and not really usable unless you want to code shaders by hand.

Mayavi is really powerful for 3D plotting and 3D data visualization. Unfortunately, the learning curve is quite steep and the documentation is not the best. Also, they've built their own Python tools for building Mayavi, so many things in Mayavi are done in its own, rather unique, way.

Mayavi is also not amazing (though much better). AFAIK there are no good options right now. If anyone has any suggestions, aside from coding your own thing in PyVTK, I'd really like to know.

When I was looking into 3d plotting I saw the problems with Mayavi and went with plotly instead.

Does plotly have fast 3d?

