
Python Data Science Handbook - type0
https://jakevdp.github.io/PythonDataScienceHandbook/
======
pixelmonkey
We're using this book for a "book club" at work, doing 1 chapter every 2
weeks. Chapter 1 covers Jupyter, 2 covers numpy, 3 pandas, 4 matplotlib, and 5
machine learning. We just made it through the first 4 chapters, and they lay a
good foundation for those libraries. I suspect chapter 5, which covers
scikit-learn and machine learning techniques, is the meatiest and most
interesting chapter. It is a long chapter, so we will spend a month on it.

I recommend combining this book with McKinney's Pandas book[1] and the
author's excellent YouTube presentations at PyCon and PyData. Start with
"Statistics for Hackers"[2] by Jake VanderPlas and then look for his others.

[1]:
[http://shop.oreilly.com/product/0636920050896.do](http://shop.oreilly.com/product/0636920050896.do)

[2]:
[https://www.youtube.com/watch?v=Iq9DzN6mvYA](https://www.youtube.com/watch?v=Iq9DzN6mvYA)

~~~
nafizh
The notebooks say they are an excerpt from the book, but another page says
you can read the book in its entirety at the posted link. So do the notebooks
contain all of the book's content, or only part of it?

~~~
pixelmonkey
To my knowledge, the notebooks include all, or nearly all, of the content in
the print book. Jake mentions a few times in talks that the notebooks are
"compiled" into the O'Reilly book format. The nice thing about having the book
as notebooks is that you can literally "run the book as code" just by pointing
Jupyter at the cloned repo.

------
tony_cannistra
I work with Jake (the author) at the eScience Institute at the University of
Washington (though I'm merely a grad student) and can say that he is not only
an extraordinary data scientist and educator but is a great guy as well. He
worked extraordinarily hard on this, so I'm very glad to see it on the front
page of HN. I'll be sure to show him the screenshot tomorrow!

------
sbassi
Looks like Jupyter Notebooks are a new standard for sharing live Python code,
as with my Python for Bioinformatics book: [http://py3.us](http://py3.us)

~~~
danso
My problem with Jupyter is that it doesn't produce good plaintext files, i.e.
you can't edit them using your favorite text editor.
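
For context, an .ipynb file is a JSON document, so the code cells can be pulled
out into plain text with the standard library alone. The following is only a
minimal sketch, assuming the common nbformat v4 layout (a top-level "cells"
list whose entries carry "cell_type" and "source"). The function names and the
demo notebook are made up for illustration and are not part of Jupyter.

```python
import json
from pathlib import Path


def extract_code_cells(notebook: dict) -> str:
    """Return the source of every code cell, joined as plain text."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # "source" may be a single string or a list of line strings
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    # Separate cells with a blank line and end with a newline
    return "\n\n".join(chunks) + "\n"


def notebook_to_script(ipynb_path: str, py_path: str) -> None:
    """Hypothetical helper: read a notebook file, write its code as text."""
    notebook = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    Path(py_path).write_text(extract_code_cells(notebook), encoding="utf-8")


if __name__ == "__main__":
    # Tiny in-memory notebook, invented for this example
    demo = {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title"]},
            {"cell_type": "code", "source": ["import math\n", "x = math.pi"]},
            {"cell_type": "code", "source": "print(x)"},
        ]
    }
    print(extract_code_cells(demo), end="")
```

This only goes one way (notebook to text) and drops outputs and markdown, so
it helps with reading and diffing rather than round-trip editing.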

~~~
bovine3dom
There's a vi mode plugin [1] for it which helps me get by.

[1] [https://github.com/lambdalisue/jupyter-vim-binding](https://github.com/lambdalisue/jupyter-vim-binding)

------
samwestdev
What math level should I have to understand this book?

~~~
sandwell
You don't need a strong math background. I got through a few books (PDSH
included) and a couple of MOOCs on data sci/ML over the last couple of years
with high school level math skills + some extra reading. Not everything is
explained in minute detail but there are plenty of other sources to supplement
it if you want to go deeper.

------
neves
It looks great. How would I convert it to the mobi format so I can read it in
my kindle?

~~~
hdra
You can buy it. It's available for Kindle on Amazon. Also available DRM-free on
ebooks.com.

Still a bummer that O'Reilly stopped selling books directly. There have been so
many recently published books that I'm interested in that I can no longer
purchase.

~~~
neves
I know, but it is really pretty expensive if you live in a third world country
and don't earn a wage in dollars.

Nice tip about ebooks.com. I was also an O'Reilly DRM-free orphan.

------
monkeydust
Currently working through "Hands-On Machine Learning with Scikit-Learn and
TensorFlow", which is solid, but this seems like a good primer for it.

------
sgpl
Has anyone gone through this book? If so, what were your thoughts?

~~~
claytonjy
Particularly curious how it compares to Wes' Python for Data Analysis, aside
from the sklearn stuff.

~~~
akg_67
One of the challenges with Wes's book is that it is quite old (2014). A lot of
the commands/functions/code mentioned in the book are obsolete or have been
removed, so the code fails.

The OP book is relatively recent (2016). The majority of the code still runs as
written. Only a few of the commands/functions mentioned generate deprecation
warnings. This book also covers packages and ML exhaustively. I have gone
through this book cover to cover and enjoyed it. This is the first and only
book I have found that covers data analysis with Python comprehensively. I
wish the author had covered data cleaning a little bit more.

~~~
pixelmonkey
The Wes McKinney Pandas book has a 2nd edition coming out next month. Raw
edition is already available on Safari here.

[http://shop.oreilly.com/product/0636920050896.do](http://shop.oreilly.com/product/0636920050896.do)

Release date slated for October 2017.

------
tomrod
Wow! Thanks for this!

