A Junior Data Scientist Bookshelf (including Free Versions and HN Discussions) (ghyslain.me)
157 points by gghyslain on Dec 22, 2016 | 9 comments

Thanks everyone for the positive feedback. I have not had much time yet to write full reviews of all the books, but I'll work on it - so far this page is more of a personal "bookmark". But to reply to @Nekopa and @carlsednaoui, here is a short review of the first few books.

I took a really pragmatic approach to reading them, focusing first only on the parts relevant to my projects.

# An Introduction to Statistical Learning (ISL) / The Elements of Statistical Learning (ESL)

I focused on chapters 8-9 of ISL, on Tree-Based Methods and SVMs, two algorithms I used for my dissertation project. I found ISL to provide very clear explanations of the algorithms with just enough mathematical formalism.

I have a good math background so ESL was interesting to go through. But I am more of a practical person, and I found ISL to be more suited for me when it came down to working on my project and supporting my choices.

# Python Machine Learning

Really great hands-on book! Sebastian Raschka does a good job of guiding you through all the steps of an ML project: data pre-processing, feature engineering, model selection... All the steps are defined and covered with practical examples.

I strongly recommend this book if you are just starting out with ML and feel "lost" about how to start your own project.

# Taming Text

I decided to use text data I had available for my dissertation project. However, halfway through the book I realized my dataset was too small to apply any of the techniques described there. I still like the practical approach, and in the end the book gave me a good idea of what can be done with text.

# Advanced Analytics with Spark

I picked up this book once I started working on putting my project into production - we use Apache Spark (Scala) at work.

It provided me with a good introduction to Spark, BUT it's based on the RDD API, and as stated on the Spark website: "As of Spark 2.0, the RDD-based APIs in the spark.mllib package have entered maintenance mode. The primary Machine Learning API for Spark is now the DataFrame-based API in the spark.ml package."

I'm now mostly relying on the Spark docs / API; I'm not aware of any up-to-date books yet :)

"R for Data Science" by Garrett Grolemund & Hadley Wickham was recently completed.


The ebook is free online; you can also buy it from Amazon & O'Reilly.

Nice list! I especially like that you added Resonate in there.

Could you add your personal reason for keeping these books on your shelf? That would make the page more interesting, and maybe help you out with your job search, as it will give an insight into your thought processes.

I don't get why Resonate is there. From the Amazon description I gather that this book helps with creating meaningful presentations (?), yet it landed on a list of Data Science books, and now I read a positive comment about it. Can anyone elaborate on why this book is worth an aspiring Data Scientist's attention?

As a senior practitioner in the field, I feel a few years removed from my initial learning chunk. I really like this list as a throwback to see how I would have done it today.

I'm a software engineer, about a year into a career shift into data science / machine learning. I'd second the majority of the books. Some were new to me. I did a lot of on-the-job learning, but Intro To Statistical Learning was my first formal treatment of much of the material outside 1:1 mentorship. I wrote up my thoughts here: https://www.linkedin.com/pulse/introduction-statistical-lear...

I'm also halfway through the Deep Learning Book now. I'm really enjoying it. I got turned on to the book because a large number of people and reading groups at work (LinkedIn) had organized around it when it first came out in HTML format a few months ago.

Still debating whether I should start with An Introduction to Statistical Learning (ISL) or Bishop's Pattern Recognition and Machine Learning (PRML). I really don't like using R (always been a Python person). Both have rave reviews on Amazon. Any thoughts?

Solid start. I'd strongly suggest adding some Bayesian modeling books; start with Gelman.

If you look at the academic lineage of many of these authors, it will also help you understand how they get stuck in their little biases.

This is awesome, thanks for sharing. Ditto what Nekopa said, curious to hear why you like each of these resources.
