Python Data Science Handbook (jakevdp.github.io)
455 points by type0 on Aug 29, 2017 | 27 comments



We're using this book for a "book club" at work, doing one chapter every two weeks. Chapter 1 covers Jupyter, 2 covers NumPy, 3 pandas, 4 Matplotlib, and 5 machine learning. We just made it through the first four chapters, and they lay a good foundation for those libraries. I suspect chapter 5, which covers scikit-learn and machine learning techniques, is the meatiest and most interesting. It's a long chapter, so we'll spend a month on it.

I recommend combining this book with McKinney's Pandas book[1] and the author's excellent YouTube presentations at PyCon and PyData. Start with "Statistics for Hackers"[2] by Jake VanderPlas and then look for his others.

[1]: http://shop.oreilly.com/product/0636920050896.do

[2]: https://www.youtube.com/watch?v=Iq9DzN6mvYA


Can others attend your book club? I've been attending https://www.meetup.com/Math-and-Algorithm-Reading-Group/ over at BuzzFeed HQ. This book is very much in the same vein.


The notebooks say they are an excerpt from the book, but elsewhere it says you can read the entire book at the posted link. So do the notebooks contain all of the book's content, or only part of it?


To my knowledge, the notebooks include all, or almost all, of the content in the print book. Jake has mentioned a few times in talks that the notebooks are "compiled" into the O'Reilly book format. The nice thing about having the book as notebooks is that you can literally "run the book as code" just by pointing Jupyter at a clone of the repo.


Very good idea. Thinking of doing the same.


I work with Jake (the author) at the eScience Institute at the University of Washington (though I'm merely a grad student) and can say that he is not only an extraordinary data scientist and educator but a great guy as well. He worked incredibly hard on this, so I'm very glad to see it on the front page of HN. I'll be sure to show him the screenshot tomorrow!


Looks like Jupyter Notebooks are becoming the standard for sharing live Python code, as in my Python for Bioinformatics book: http://py3.us


My problem with Jupyter is that it doesn't produce clean plaintext files, i.e. you can't comfortably edit notebooks in your favorite text editor.
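For context, a .ipynb file is JSON: each cell's source is stored as a string or list of strings, and outputs (including plots, base64-encoded) are stored inline, which is what makes hand-editing and diffing awkward. Below is a minimal sketch using only the standard library, assuming the nbformat 4 layout, that clears code-cell outputs so the file is closer to plain text. The helper names are hypothetical; dedicated tools such as nbstripout do this more thoroughly.

```python
import json


def strip_outputs(nb: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs and execution counts cleared."""
    cleaned = json.loads(json.dumps(nb))  # cheap deep copy; leaves the input untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop text, tables, and embedded images
            cell["execution_count"] = None  # drop the In[n] counter
    return cleaned


def strip_file(src_path: str, dst_path: str) -> None:
    """Read a notebook from src_path and write an output-free copy to dst_path."""
    with open(src_path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")


if __name__ == "__main__":
    # A tiny hand-built nbformat-4 notebook with one markdown cell and one executed code cell.
    demo = {
        "nbformat": 4, "nbformat_minor": 2, "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n"]},
            {"cell_type": "code", "metadata": {}, "execution_count": 3,
             "source": ["x = 1 + 1\n", "x"],
             "outputs": [{"output_type": "execute_result", "execution_count": 3,
                          "metadata": {}, "data": {"text/plain": ["2"]}}]},
        ],
    }
    print(json.dumps(strip_outputs(demo)["cells"][1], indent=1))
```

Even stripped, the file is still JSON, so this helps with diffs more than with editing in a plain editor.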


There's a vi mode plugin [1] for it which helps me get by.

[1] https://github.com/lambdalisue/jupyter-vim-binding


I'll add that there's an Emacs Jupyter mode, if that's of any interest.

I agree in principle that a better plaintext format could be interesting, but I don't see how embedding graphs etc. could easily be done without a special interface of some kind.

That said, I can imagine a mode where you simply have any editor open on one half of the screen and a browser that autorefreshes on the other. This is more or less how I work with Emacs and Evince when working on LaTeX, and it's great. Synchronising the vertical scroll position with the cursor position might be a challenge.


When you're finished, you can use

  jupyter nbconvert --to python your_notebook.ipynb

to get your work in script form (a your_notebook.py file).
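If nbconvert isn't at hand, the core of this conversion can be approximated with the standard library: keep code cells verbatim and turn markdown cells into comments. This is a rough sketch, not nbconvert's actual exporter, which also handles IPython magics, templates, and metadata. `notebook_to_script` and `convert` are hypothetical helper names.

```python
import json


def notebook_to_script(nb: dict) -> str:
    """Flatten a notebook dict into Python source: code cells verbatim, markdown as comments."""
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat allows a cell's source to be a single string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code":
            chunks.append(text)
        elif cell.get("cell_type") == "markdown":
            # Prefix every markdown line with "#" so the result stays valid Python.
            chunks.append("\n".join("# " + line if line else "#" for line in text.splitlines()))
        # Other cell types (e.g. "raw") are skipped in this sketch.
    return "\n\n".join(chunks) + "\n"


def convert(path: str) -> str:
    """Read a .ipynb file and return its script form as a string."""
    with open(path, encoding="utf-8") as f:
        return notebook_to_script(json.load(f))
```

Usage: `print(convert("your_notebook.ipynb"))`. Unlike nbconvert, this sketch doesn't translate `%magic` lines, so notebooks that use them won't run as plain Python.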


Yeah, you can. Atom has Hydrogen, and there's one for VS Code too.


What level of math do I need to understand this book?


You don't need a strong math background. Over the last couple of years I got through a few books (PDSH included) and a couple of MOOCs on data science/ML with high-school-level math plus some extra reading. Not everything is explained in minute detail, but there are plenty of other sources to supplement it if you want to go deeper.


It looks great. How would I convert it to mobi format so I can read it on my Kindle?


You can buy it. It's available for Kindle on Amazon, and also DRM-free on ebooks.com.

It's still a bummer that O'Reilly stopped selling books directly. There have been so many recently published books I'm interested in that I can no longer purchase.


I know, but it's really pretty expensive if you live in a third-world country and don't earn a wage in dollars.

Nice tip about ebooks.com. I was also orphaned when O'Reilly stopped selling its DRM-free books.


I'm currently working through "Hands-On Machine Learning with Scikit-Learn and TensorFlow", which is solid, but this seems like a good primer for it.


Has anyone gone through this book? If so, what were your thoughts?


Particularly curious how it compares to Wes' Python for Data Analysis, aside from the sklearn stuff.


One of the challenges with Wes's book is that it is quite old (2014). A lot of the commands and functions used in the book have since been deprecated or removed, so much of the code fails.

The OP's book is relatively recent (2016). Most of the code still runs as written; only a few of the functions it uses generate deprecation warnings. This book also covers the packages and ML exhaustively. I have gone through it cover to cover and enjoyed it. It's the first and only book I've found that covers data analysis with Python comprehensively. I wish the author had covered data cleaning a bit more.
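When running older book code against newer library versions, it helps to surface deprecation warnings explicitly instead of letting them scroll past. Below is a minimal sketch using the standard `warnings` module. `old_api` is a hypothetical stand-in for a deprecated library function. pandas and NumPy typically signal deprecations with `DeprecationWarning` or `FutureWarning`, so the sketch collects both.

```python
import warnings


def old_api(x):
    """Hypothetical stand-in for a deprecated library function."""
    warnings.warn("old_api is deprecated; use new_api", DeprecationWarning, stacklevel=2)
    return x * 2


def find_deprecations(func, *args, **kwargs):
    """Call func and return (result, messages of any deprecation-type warnings it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = func(*args, **kwargs)
    messages = [str(w.message) for w in caught
                if issubclass(w.category, (DeprecationWarning, FutureWarning))]
    return result, messages


if __name__ == "__main__":
    result, msgs = find_deprecations(old_api, 21)
    print(result, msgs)  # 42 ['old_api is deprecated; use new_api']

    # Alternatively, make deprecations fatal, scoped to this block only.
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            old_api(1)
        except DeprecationWarning as exc:
            print("caught:", exc)
```

For a whole script, `python -W error::DeprecationWarning script.py` turns these warnings into exceptions from the command line.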


The Wes McKinney pandas book has a 2nd edition coming out next month. The raw (early release) edition is already available on Safari here:

http://shop.oreilly.com/product/0636920050896.do

Release date slated for October 2017.


Good to know. I've been recommending Wes's book for some time now to people new to data work in Python, but between the discussion here and the consistently high quality of Jake's blog posts and demos, I'll have to keep this one in mind.

Out of curiosity, what sorts of material had you hoped he'd cover on data cleaning?


Any book recommendations for a good discussion of data cleaning?


I use this along with Chris Albon's similar repo of recipes (http://shop.oreilly.com/product/0636920023784.do).

It is a great complement to Wes McKinney's "Python for Data Analysis": it reads more like a recipe book, whereas Wes' book focuses on the internals. Also, JVP includes more than just pandas and NumPy goodies.

Highly recommended. Fork it to create your own curated handbook.


Chris Albon: https://chrisalbon.com


Wow! Thanks for this!



