
Wes McKinney, the developer of Pandas - gk1
https://qz.com/1126615/the-story-of-the-most-important-tool-in-data-science/
======
aklein
The majority of credit is due Wes, no doubt, for bringing pandas to life, but
it's a glaring omission not to mention the huge contributions from the wider
community toward pandas (shout-out to Jeff Reback!)
[https://github.com/pandas-dev/pandas/graphs/contributors](https://github.com/pandas-dev/pandas/graphs/contributors)

~~~
wesm
It's true -- as Jeff (lead core dev/maintainer the last several years) often
says "Wes gets the kudos, I get the hate-mail"

------
em500
Wes noted himself that pandas.read_csv() with its 50 or so parameters probably
accounts for a substantial part of its popularity :)

~~~
kortex
For years, I maintained my own python tool for loading/saving CSVs from numpy
formats. It was slow, buggy, constantly hitting edge cases. When
`pd.read_csv()` and `.to_csv()` came onto the scene, it was like the clouds
opened up and a chorus of angels sang. And then you have all the other `read`
and `to` functions, it's glorious.
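
A minimal sketch of that round trip, assuming pandas is installed. The sample data and column names are made up for illustration, and an in-memory buffer stands in for a file on disk:

```python
import io

import pandas as pd

# Hypothetical sample data; a real workflow would pass a file path instead.
raw = "name;score\nalice;1.5\nbob;NA\n"

# read_csv options cover many edge cases: here a custom separator and
# an explicit marker for missing values.
df = pd.read_csv(io.StringIO(raw), sep=";", na_values=["NA"])

# to_csv with no path returns the CSV text as a string.
csv_text = df.to_csv(index=False)

# Reading the written text back gives an equal frame.
round_trip = pd.read_csv(io.StringIO(csv_text))
```

The same pattern applies to the other `read_*` / `to_*` pairs, though each format has its own options.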

------
chalence
I'm a fan and frequent user of Pandas, but does the increase in Stack Overflow
questions indicate a surge in popularity, or difficulty in use? I for one run
into Pandas issues frequently and often find myself searching for the succinct
solution (though, to be fair, this may also be an indication of the impressive
scope Pandas strives for).

~~~
agawronski
I think you're right. There are many, many things for which I end up
searching Stack Overflow excessively, because I am surprised there isn't a
better way of achieving the task. Try to do a cross join in pandas; it's
deeply dissatisfying.
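
For illustration, a sketch of the usual workaround, assuming two small made-up frames; the helper column name `_key` is hypothetical:

```python
import pandas as pd

# Hypothetical sample frames.
colors = pd.DataFrame({"color": ["red", "blue"]})
sizes = pd.DataFrame({"size": ["S", "M", "L"]})

# Workaround: give both frames the same constant key, merge on it,
# then drop the helper column. Every row pairs with every row.
cross = (
    colors.assign(_key=1)
    .merge(sizes.assign(_key=1), on="_key")
    .drop(columns="_key")
)
```

Newer pandas releases (1.2 and later) accept `how="cross"` in `merge`, which does this directly.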

~~~
nas
Pandas is useful and I don't want to bad-mouth it, as people obviously find it
useful. However, it has a complicated API and contains about 200k lines of
code. So it is no surprise that documentation is a challenge and that
there are a lot of Stack Overflow questions. For example, figuring out which
methods return copies of the data and which return views is hard.
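
A small sketch of the copy/view distinction at the NumPy level, which is what sits underneath. This is only an illustration; what pandas itself returns depends on the operation and the pandas version:

```python
import numpy as np

a = np.arange(6)

view = a[1:4]          # basic slicing returns a view onto a's memory
copy = a[[1, 2, 3]]    # integer-array ("fancy") indexing returns a copy

view[0] = 100          # writes through to a
copy[1] = -1           # leaves a untouched

# np.shares_memory reports whether two arrays overlap in memory.
shares_view = np.shares_memory(a, view)
shares_copy = np.shares_memory(a, copy)
```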

Compare with dplyr. It solves a similar problem as pandas does but has a
vastly simpler API. To be fair, Pandas does do more, but dplyr is also more
flexible. I looked at implementing something like dplyr in Python, but you
really need to have a lazy evaluation syntax. dplyr makes extensive use of
this feature of R. On the downside, it can be very confusing to new users, as
it is hard to debug this lazy evaluation code.

Rather than adopting Pandas to build our product, I built a very minimal
version of it (on top of numpy) that only does what we need. That was some
extra work, but I'm happy I did it, as we avoid this huge dependency. I
understand quite well what my little minimal version does; it is only about
1000 lines of Python code and some tiny C extensions.

------
darksaints
I really love pandas and dplyr, but honestly both of them are inferior to
modern SQL. In my workflows, I’ve almost exclusively replaced them with
Postgres and its foreign data wrappers, spitting out the results to a text
file and then loading them into R or Python.

It’s a more complicated environment for sure, but still more efficient.

~~~
sevensor
I've found pandas great for interactive sessions, for the most part, but I
found doing joins was way too fiddly and I'd much rather do it in SQL. Could
be I missed an important pandas concept in there somewhere that would have
made it make more sense, or maybe the API has improved since the last time I
tried. Generally I've found my SQL ends up being clearer and more readable
after the fact.
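
For comparison, a sketch of a simple join in pandas next to the SQL it roughly corresponds to. The table and column names are made up for illustration:

```python
import pandas as pd

# Hypothetical tables.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 20, 30]})
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Ann", "Bob"]})

# Roughly: SELECT o.order_id, o.customer_id, c.name
#          FROM orders o LEFT JOIN customers c USING (customer_id)
# validate raises if customer_id is not unique on the right-hand side,
# which catches accidental row duplication.
joined = orders.merge(
    customers, on="customer_id", how="left", validate="many_to_one"
)
```

Orders without a matching customer keep their row and get a missing value in `name`.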

~~~
chrisgd
[https://github.com/yhat/pandasql](https://github.com/yhat/pandasql)

------
projectramo
Dan should do a Wes McKinney vs Hadley Wickham type article about the future
of data science instead. (hint: very bright)

I use pandas more than any other tool, but every time I look at R, I go
through a period of re-evaluating all my life decisions.

~~~
wesm
Hadley and I are also friends and collaborators! I think we're going to see a
lot more interesting collaborations between the R and Python communities in
the future, since at the end of the day we're solving a lot of the same
problems.

~~~
projectramo
Oh yeah, I didn't mean to imply that it wouldn't all be in good fun. I was
thinking of the obvious counterpoint for the author to use.

You know Heinrich Wolfflin, one of the pioneers of art history, invented this
breakthrough method of teaching where he would set up dual projectors because
demonstrating contrasts was a much more effective pedagogical tool than simply
describing a painting.

------
Bishonen88
_Today, McKinney works full time on Pandas and other open-source data science
projects as a software engineer (...)_

What? It seems McKinney has committed almost nothing to pandas for the past
few years (not to diminish his contribution per se, but the article made it
sound like he's still dedicated to improving pandas).

[https://github.com/pandas-dev/pandas/graphs/contributors](https://github.com/pandas-dev/pandas/graphs/contributors)

~~~
wesm
I've been working on innovating core computational and IO infrastructure for
pandas (and projects like pandas) -- much of this work has been happening in
other codebases. See:
[http://wesmckinney.com/blog/apache-arrow-pandas-internals/](http://wesmckinney.com/blog/apache-arrow-pandas-internals/)

~~~
catawbasam
And it's looking good, too. Thank you.

------
sillysaurus3
_Basically, Pandas makes it so that data analysis tasks that would have taken
50 complex lines of code in the past now only take 5 simple lines_

A metric worth underlining.

------
teekert
Pandas, Seaborn and Jupyter. A gift from the gods for any (starting but also
more advanced) programming biologist.

------
newyankee
Great, but no love for R and Hadley Wickham ?

~~~
ct0
I thought the article was going to be about Hadley too. But I'm pretty sure
he was the HR star yesterday.

------
smortaz
In case you're interested in reading his book online - or 'running' it, here
it is:

[https://notebooks.azure.com/wesm/libraries/python-for-
data-a...](https://notebooks.azure.com/wesm/libraries/python-for-data-
analysis)

If you want to read, click on any notebook, for example:

[https://notebooks.azure.com/wesm/libraries/python-for-
data-a...](https://notebooks.azure.com/wesm/libraries/python-for-data-
analysis/html/ch08.ipynb)

If you want to run, click Clone, sign in, then Run. It's basically a
collection of Jupyter notebooks. This is from his personal repo.

[disclaimer: work at msft]

~~~
cschmidt
Or you could clone his GitHub repo, and do it in Jupyter:

[https://github.com/wesm/pydata-book](https://github.com/wesm/pydata-book)

I was really pleased to notice that the second edition of the "Pandas book"
([https://www.amazon.com/Python-Data-Analysis-Wrangling-
IPytho...](https://www.amazon.com/Python-Data-Analysis-Wrangling-
IPython/dp/1491957662/)) just came out in late October. I'm about halfway
through reading it now.

------
anon104
"The man behind Pandas, the most important _Python tool_ in data science"

~~~
make3
Not even. That's unarguably NumPy.

... Followed by scikit-learn. Then, huge projects like {TensorFlow, PyTorch}.
Only then, Pandas.

~~~
anon104
"The man behind Pandas, a _reasonably_ important _Python_ tool in data
science"

~~~
make3
The title used to read "Pandas, the most important data science tool".

------
NelsonMinar
Pandas really is great. It's not just that it's a convenient library, it's
also really nicely implemented and super efficient. And it has a lot of tools
that guide the user towards doing things the right way.

------
amelius
I tried to zoom in on the photo to actually see his face, but the website
hijacked zooming on mobile.

~~~
aleyan
Which photo? The header photo at the beginning of the article, with a man
working on an Apple laptop, is not Wes McKinney. That is a stock photo of "a
man works at his computer at the Airbnb office headquarters in San Francisco"
[1]. There is a portrait of the actual Wes McKinney a little way down the
article.

I don't know why QZ thought it was a good idea to start a "Meet the man"
article with a large photo of a man who is not the titular man.

[1] [https://www.citylab.com/environment/2017/09/lab-report-
shari...](https://www.citylab.com/environment/2017/09/lab-report-sharing-
economy-climate-impacts/540267/)

------
neaden
Wes McKinney is a good guy, and Pandas is very useful. The title of this
article is grandiose though.

~~~
larrydag
Exactly. I thought this was an article about John Chambers, Ross Ihaka, and
Robert Gentleman.

[https://en.wikipedia.org/wiki/R_(programming_language)](https://en.wikipedia.org/wiki/R_\(programming_language\))

~~~
PunchTornado
Funny, but you have to admit pandas is the lingua franca of data science now.

A couple of years ago the whole team was using only R. Now some guys use it
here and there a couple of days a month.

~~~
larrydag
Not necessarily in my world. My theory...

Programmers grok pandas.

Statisticians grok R.

~~~
dagw
That matches my observations very well.

------
mynewtb
Can we update the title to 'behind the pandas library'?

~~~
gk1
Added "Pandas" to title to make it less clickbaity.

