
Python for Data Analysis – A Critical Line-By-Line Review - TedPetrou
https://medium.com/dunder-data/python-for-data-analysis-a-critical-line-by-line-review-5d5678a4c203
======
TedPetrou
Hey all,

I wrote a very detailed review of the book, Python for Data Analysis (2nd
edition) by Wes McKinney.

Here is a high-level summary:

PDA is written very much like a reference manual, methodically covering one
feature or operation before moving on to the next. The current version of the
official documentation is a much more thorough reference guide if you are
looking to learn pandas in that manner.

There is very little actual data analysis and almost no teaching of common
techniques or theory that are crucial to making sense of data.

The vast majority of examples use randomly generated or contrived data that
bear little resemblance to what data actually look like in the real world.

For the most part, the operations are learned in isolation, independent from
other parts of the pandas library. This is not how data analysis happens in
the real world, where many commands from different sections of the library
are combined to get a desired result.
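To illustrate what combining commands from different parts of the library can look like, here is a minimal sketch on made-up data. The `sales` DataFrame, its columns, and the particular chain are hypothetical examples, not taken from PDA, and it assumes a recent pandas release.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration only.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "product": ["A", "A", "B", "B", "A"],
    "units": [5, 3, 8, 2, 7],
    "price": [2.0, 2.0, 1.5, 1.5, 2.0],
})

# One chain that mixes column creation (assign), filtering (query),
# grouping/aggregation (groupby + sum), and sorting (sort_values).
revenue_by_region = (
    sales
    .assign(revenue=lambda d: d["units"] * d["price"])
    .query("units > 2")
    .groupby("region")["revenue"]
    .sum()
    .sort_values(ascending=False)
)
print(revenue_by_region)
```

Each step on its own is simple; the point is that a real question ("which region earns the most from orders of more than two units?") needs several of them together.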

Although the commands will work for the current pandas version 0.21, it is
clear that the book was not updated past version 0.18, which was released in
March of 2016. This is apparent because the on parameter, which the resample
method gained in version 0.19, is absent from PDA. The powerful and popular
function merge_asof was also added in version 0.19 and is not mentioned once
in the book.
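For readers who have not used the two 0.19 additions mentioned above, here is a minimal sketch of each. The DataFrames and column names (`events`, `trades`, `quotes`, `time`, `ticker`) are hypothetical toy data, and the code assumes a recent pandas release.

```python
import pandas as pd

# --- resample(..., on=...) ---
# Hypothetical events with the timestamps stored in a regular column.
events = pd.DataFrame({
    "date": pd.to_datetime(["2016-03-01", "2016-03-01", "2016-03-02", "2016-03-04"]),
    "value": [1, 2, 3, 4],
})

# With `on`, resample groups by a datetime column directly.
daily = events.resample("D", on="date")["value"].sum()

# Older equivalent: move the column into the index first, then resample.
daily_old = events.set_index("date").resample("D")["value"].sum()

# --- merge_asof ---
# Hypothetical trades and quotes; both must be sorted by the `on` key.
trades = pd.DataFrame({"time": [1, 5, 10], "ticker": ["A", "A", "B"],
                       "price": [10.0, 10.5, 20.0]})
quotes = pd.DataFrame({"time": [0, 4, 6, 9], "ticker": ["A", "A", "B", "B"],
                       "bid": [9.9, 10.4, 19.8, 19.9]})

# For each trade, take the most recent quote (time <= trade time)
# for the same ticker; direction="backward" is the default.
matched = pd.merge_asof(trades, quotes, on="time", by="ticker")
print(daily, matched, sep="\n\n")
```

Empty days in the resampled result (here 2016-03-03) sum to 0, and merge_asof is an inexact "nearest prior key" join rather than the exact-match join done by merge.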

There were numerous instances where it was clear that the book was not updated
to show more modern code. For instance, the take method is almost never used
anymore and has been completely replaced by the .iloc indexer. There were
also many instances where code snippets could be significantly transformed by
using completely different syntax, which would result in much better
performance and readability.
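A small sketch of the take versus .iloc point, on a hypothetical Series. Both select by integer position and return the same result here; the .iloc form is the one more commonly seen in current code.

```python
import pandas as pd

# Hypothetical Series with string labels.
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

# Positional selection with the older take method...
via_take = s.take([0, 2])

# ...and the equivalent with the .iloc indexer.
via_iloc = s.iloc[[0, 2]]
print(via_iloc)
```
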

One of the most confusing things for newcomers to pandas is the multiple ways
to select data with the indexers [], .loc, and .iloc. There are not enough
detailed explanations for the reader to walk away with a thorough
understanding of each.
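A compact sketch of how the three indexers differ, on a hypothetical DataFrame whose integer row labels deliberately do not match row positions (the usual source of confusion). It assumes a recent pandas release.

```python
import pandas as pd

# Hypothetical data; labels 10/20/30 differ from positions 0/1/2 on purpose.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [34, 29, 41]},
    index=[10, 20, 30],
)

# [] with a column name selects a column...
ages = df["age"]
# ...and with a boolean Series it filters rows.
over_30 = df[df["age"] > 30]

# .loc selects by label: row labelled 20, column "name".
by_label = df.loc[20, "name"]      # 'Bob'

# .iloc selects by integer position: first row, first column.
by_position = df.iloc[0, 0]        # 'Ann'

# Slices differ too: .loc includes the end label, .iloc excludes the end position.
loc_slice = df.loc[10:20]          # rows labelled 10 and 20
iloc_slice = df.iloc[0:1]          # only the row at position 0
print(over_30, loc_slice, iloc_slice, sep="\n\n")
```

The inclusive-versus-exclusive slicing rule is a common stumbling block worth spelling out when teaching the indexers.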

~~~
Bishonen88
It seems the author (I'm aware that it's most likely OP) became quite active
on Twitter and Medium just a couple of weeks or months ago, which overlaps
heavily with the publication of the Pandas Cookbook.

I wonder if the rather heavily negative review of Wes's book is supposed to
bring more attention to the author's own publication? It worked for me in any
case: Safari Books Online seems to have both books. Let's see how much better
the newer one will be.

~~~
TedPetrou
Yes, as I mention in the article I am the author of Pandas Cookbook. You can
see a much deeper discussion on the datascience subreddit:
[https://www.reddit.com/r/datascience/comments/7fv3tr/python_...](https://www.reddit.com/r/datascience/comments/7fv3tr/python_for_data_analysis_a_critical_linebyline/).

