
Seaborn: a high-level Python interface for drawing statistical graphics - danso
https://github.com/mwaskom/seaborn
======
jordigh
While we're talking about stats and Python, could I convince someone here to
implement a fast medcouple for statsmodels? I can't do it myself because I
read R's GPL'ed code in order to understand the algorithm. Using my
understanding, I wrote the following high-level description of it:

[https://en.wikipedia.org/wiki/Medcouple](https://en.wikipedia.org/wiki/Medcouple)

This should be taken as the design spec of a clean-room reverse engineering,
so that we can have a free, fast and non-copylefted implementation. It's not
that I have a problem with copyleft (in fact, I prefer it), but I really want
statsmodels to fix their implementation, and they're GPL-phobic.

Since Seaborn has boxplots, implementing an adjusted boxplot seems relevant.

edit: Oh, one more thing. I'd love any feedback on how to improve the "design
spec", in case I wasn't able to make it clear enough.
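
For anyone who wants to see what is being computed before tackling the fast version, here is a minimal sketch of the naive O(n^2) medcouple, written only from the definition in the description linked above. It is not the fast algorithm and not statsmodels' or R's code; the function name `naive_medcouple` and the helper `kernel` are made up for illustration.

```python
from statistics import median


def naive_medcouple(data):
    """O(n^2) medcouple straight from the definition (illustration only)."""
    xs = sorted(data, reverse=True)
    m = median(xs)
    # Center on the median; values equal to the median land in both halves.
    z_plus = [x - m for x in xs if x >= m]    # descending, zeros last
    z_minus = [x - m for x in xs if x <= m]   # descending, zeros first
    p = len(z_plus)

    def kernel(i, j):
        a, b = z_plus[i], z_minus[j]
        if a == b:
            # Only possible when both equal the median (a == b == 0):
            # use the sign-based tie kernel instead of dividing by zero.
            s = p - 1 - i - j
            return (s > 0) - (s < 0)
        return (a + b) / (a - b)

    # Median of the kernel over every (upper half, lower half) pair.
    values = [kernel(i, j) for i in range(p) for j in range(len(z_minus))]
    return median(values)
```

On symmetric data such as `[1, 2, 3, 4, 5]` this returns 0, and a sample with a long right tail such as `[1, 2, 3, 4, 5, 20, 30]` gives a positive value. It builds all pairs in memory, so it is only useful as a reference for checking a faster implementation on small inputs.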

~~~
shoyer
Are you really suggesting that it's not possible for you to write an independent
implementation of an algorithm from the version you once read? Is this sort of
"clean room" approach typical? It strikes me as absurdly cautious.

~~~
jordigh
It is what is known to be legally safe. We do it all the time in GNU Octave,
and we always tell people to not read Matlab source code when implementing
Octave functions.

Besides, I just don't feel like it's fair to the R copyright authors. They
worked hard to produce an implementation and they copylefted it, and I
_heavily_ relied on their implementation in order to reimplement it myself.

According to the United States Copyright Office[1], the algorithm itself can't
be copyrighted, so that's why I wrote a high-level description of the
algorithm.

\----

[1] "Copyright protection is not available for ideas, program logic,
algorithms, systems, methods, concepts, or layouts."

[http://www.copyright.gov/circs/circ61.pdf](http://www.copyright.gov/circs/circ61.pdf)

~~~
shoyer
I agree -- you shouldn't read sources you don't want to treat as derivative
_while_ you're writing an independent implementation. So yes, the code you've
already written is GPL.

But suppose you decide to reimplement the algorithm now, months later (based
only on the notes you wrote on Wikipedia). I'm not a lawyer, but I would say
that's almost certainly independent, unless you have extraordinary memory.

~~~
jordigh
I don't know. Maybe. I don't know how a judge and a jury would interpret that
situation. They might agree with you or they might not. The only safe
jurisprudence I know of is clean-room design, and it's what the SFLC documents
I have received recommend.

------
danso
tl;dr:

I'm new to Seaborn and matplotlib in general, but Seaborn is a wrapper on top
of matplotlib, and from what I can tell, was born partly out of frustration
with how hard it is to get matplotlib graphics to look decent out-of-the-box.
Which makes it, in one sense, kind of like what ggplot2 was to R's standard
plotting tools.

However, Seaborn has a more object-oriented API, among other things:

[http://stanford.edu/~mwaskom/software/seaborn/introduction.html](http://stanford.edu/~mwaskom/software/seaborn/introduction.html)

> Seaborn’s goals are similar to those of R’s ggplot, but it takes a different
> approach with an imperative and object-oriented style that tries to make it
> straightforward to construct sophisticated plots. If matplotlib “tries to
> make easy things easy and hard things possible”, seaborn aims to make a
> well-defined set of hard things easy too.

There already is an attempt to port ggplot over to Python, and its authors'
opinion is that its API should look like R's ggplot2, which means the syntax
is not Pythonic: [http://ggplot.yhathq.com/](http://ggplot.yhathq.com/)

~~~
acadien
Also matplotlib recently added stylesheets

[http://matplotlib.org/users/style_sheets.html](http://matplotlib.org/users/style_sheets.html)

which gets you ~75% of the way to ggplot-style plots. I've found a
number of edge cases where the plots don't turn out right when using the
ggplot sheet. BUT the important part is you can set your own default plotting
style with a single line of code and keep everything nice and pythonic (well
kind of pythonic since you're using matplotlib...)
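
For reference, a minimal sketch of that single line in context, assuming matplotlib 1.4 or later (where `matplotlib.style` is available). The `Agg` backend and the toy data are assumptions added here so the snippet runs without a display; only `plt.style.use("ggplot")` is the line in question.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt

# The one-liner: switch the default look to the bundled "ggplot" style sheet.
plt.style.use("ggplot")

# Toy data, purely for illustration.
fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

`plt.style.available` lists the style names bundled with the installed version, and `plt.style.use` also accepts a path to your own style file.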

~~~
makmanalp
There is also [http://ggplot.yhathq.com/](http://ggplot.yhathq.com/) which is
buggy for complex graphs but rapidly getting better - indispensable for me.

~~~
FiReaNG3L
Is this still being developed? The last commit on GitHub was 6 months ago.

------
gammarator
Seaborn is fantastic--it produces high-quality plots with a minimum of code.
The aesthetics are great and easily tuned.

Check out the gallery for examples:
[https://web.stanford.edu/~mwaskom/software/seaborn/examples/index.html](https://web.stanford.edu/~mwaskom/software/seaborn/examples/index.html)

------
nichochar
I've used this forever, and I love it.

You can set up pandas and IPython and write automatic data analysis scripts that
pump out absolutely beautiful graphs with hardly any effort at all.

Thanks so much to all the people contributing to this awesome Python project.

------
astrobiased
Seaborn is my favorite statistical plotting package in Python. I wrote an
astro plotting package that digs deep into the Matplotlib internals and it was
not easy. Big props to the developer behind Seaborn and the great aesthetics
he imbued it with.

------
izzypark
I love using Seaborn for my plots, and I find myself referring to the color
palettes for other data visualizations too! I wish there were more of a
variety, but thanks to those who have contributed to the project.

