
Learn the basics of data science with these books - becewumuy
https://hackernoon.com/wannabe-data-scientists-learn-the-basics-with-these-7-books-1a41cfbbdd34#.7hsu30e4f
======
jeroenjanssens
I'm flattered to see Data Science at the Command Line next to these great
titles, but I'm not sure if I would recommend it to learn the basics of data
science.

DSATCL discusses the ideas behind various cleaning and visualization
approaches and several machine learning algorithms, but only briefly. My personal
recommendation would be to first gain some experience with these topics using
Python and/or R. If you're curious afterwards how the Unix command
line can help with data science, well, then there's only one book I can think
of! ;)

~~~
dasboth
I agree. I'd start with something like Joel Grus's Data Science From Scratch
to get a handle on the basics in Python (or whatever the R equivalent is, I'm
not familiar with R books).

I do however find myself more and more wishing I knew data science-specific
Unix commands, and I think I know what book to get to solve that problem... :)

~~~
nthot
R for Data Science by Hadley Wickham is a good R equivalent. It also acts as a
high-level overview of the Hadleyverse/tidyverse (ggplot2, tidyr, dplyr, etc.).
R4DS is free online [1].

[1] http://r4ds.had.co.nz/

------
rcar
I'd just throw in an extra plug for Python for Data Analysis. Though the title
might sound a little bland, it's a good, practical summary of how to use
pandas for the sorts of data analysis you often have to do in data science
work.

~~~
ploika
I'd add the disclaimer that while Python for Data Analysis is a great resource
for learning pandas, which itself is invaluable for data science in Python,
the book doesn't cover machine learning or statistical inference in any great
detail. That's not a criticism; it's just (mostly) beyond the scope of the
book.

~~~
rcar
A fair point for sure, which is actually one of the reasons why I do tend to
recommend the book.

ML and stats are generally the more flashy and well-known parts of data
science, and so I've found that people new to the field often don't have major
difficulties finding resources for learning them or finding the
self-motivation to dive into them. The data cleanup, on the other hand, is often
the more important work to be done on projects while simultaneously being seen
as the less enjoyable part. Learning how to do it well makes it a more
interesting process, and pandas and this book lay a good foundation for that.

------
jonathanstrange
IMHO, data science == applied statistics, but you'd better know a lot about the
underlying mathematics before you come to any conclusions.

------
rm_dash_rf
Where can I get #2?

2. Business value in the ocean of data, by Fajszi, Cser & Fehér

------
blahi
bah.

Statistics in Plain English.

Data Analysis Using Regression by Gelman

Introduction to/Elements of Statistical Learning by Hastie, Tibshirani et al.
I recommend reading the Introduction and using the bigger book as reference
material when tackling a problem.

Bayesian Data Analysis, 3rd edition by Gelman.

You need calc 1 & 2 and matrix algebra somewhere along the way.

Lots of papers, googling and doing. That's when you've got the basics covered.
You start being "operational" after Data Analysis Using Regression.

When you start working on a problem, you need to go through the relevant
literature first. Nobody is an expert or even half-good in more than 2 or 3
(small) areas of statistics. Read the literature, take notes and create a plan
first.

