
Free Data Science Books - LearnDataSci
http://www.learndatasci.com/free-books/
======
shubhamjain
Although there is no denying that this is a valuable resource, I have started
to get turned off by lists of n books to learn something. They can be
valuable, but they can also be overwhelming and leave someone perplexed about
how to get started. I believe technical books should be used to complement
your knowledge of a field, not to get started in it. For example, "Secrets of
the JavaScript Ninja" will be very valuable to me because I already have
experience in JS and it will help me understand some of the caveats that I
might have overlooked. The best way has always been to start implementing
something in the subject and dive into everything you uncover.

A blog post submitted here expressed the same sentiment [1]:

> I can’t fully explain how immensely unmotivating it is to be given a huge
> list of resources without any context. It’s akin to a teacher handing you a
> stack of textbooks and saying “read all of these”. I struggled with this
> approach when I was in school. If I had started learning data science this
> way, I never would have kept going.

[1]: [https://www.dataquest.io/blog/how-to-actually-learn-data-sci...](https://www.dataquest.io/blog/how-to-actually-learn-data-science/)

~~~
doctorcroc
Second the dataquest post. Information without structure can be overwhelming,
and it's important to know what the optimal ways to learn something are.
Arguably this is why formal schooling was created - to provide a framework for
learning...

------
piraze
At least "Python for Data Analysis" is a pirate copy. Wonder how many others
are too. But as long as you make money from affiliate links you don't care,
right?

~~~
LearnDataSci
What makes it seem like Python for Data Analysis is a pirated copy? I figured
since it was hosted from Canisius College it would be legally distributed.

I don't want to host pirated content, so if it is I will remove it.

~~~
piraze
The book is not listed at
[http://www.oreilly.com/openbook/](http://www.oreilly.com/openbook/)

Also the PDF has a link to a notorious ebook pirate platform on every page. If
you really believe content on college pages is legal, you must be very naive.
I've never seen a naive webmaster that uses domain privacy though.

~~~
coroxout
Personally I wasn't surprised to see (possibly) pirated content on an .edu
site with a ~username URL, as the ~ suggested a student's page, where
unauthorised content might pop up to share with classmates and stay up
undetected by the college.

What surprised me is that the owner of the Canisius page appears to be
teaching staff rather than a student. The other books hosted there seem to be
legitimately freely available, however, so I'm guessing that was also a naive
mistake.

------
ching_wow_ka
If you're a beginner, you're probably going to be too overwhelmed by the
options. I often find it more productive to email a few different
professors, researchers, or students in the field you want to learn and ask
them for suggestions.

That's not to say this isn't helpful. This is from my own personal experience.

~~~
larrydag
Also get plugged into a local meetup/user group. They are popping up
everywhere. Here are some examples of R user groups.
[http://blog.revolutionanalytics.com/local-r-groups.html](http://blog.revolutionanalytics.com/local-r-groups.html)

------
dbhattar
I would also add [http://mmds.org/](http://mmds.org/) to the list. The link to
the book is
[http://infolab.stanford.edu/~ullman/mmds/book.pdf](http://infolab.stanford.edu/~ullman/mmds/book.pdf).

~~~
ching_wow_ka
It's there. "Mining of Massive Datasets"

------
yoklov
Is anybody aware of good books/resources on machine learning/data science in
Matlab?

My SO has been trying to learn ML to further her work for a couple of months
now, and has had a hard time with it. She's quite intelligent, but isn't a
terribly experienced programmer (she's been writing Matlab for a couple of
years now, but mostly in a scientific setting)... Either way, I suspect part
of the problem is that most of the explanations are in a language unfamiliar
to her, and expect her to learn or translate it in addition to the concepts.

~~~
maurits
Andrew Ng, the man behind the excellent ML course on Coursera, has an
introduction to Deep Learning using Matlab.

[1]: Wiki with code, exercises and explanation

[2]: Video lecture one with a recap on back-propagation

[3]: Video lecture two on Sparse Auto Encoders

[4]: Handouts

In terms of books, _Bayesian Reasoning and Machine Learning_ [5] is Matlab
based. So is the _Handbook of Monte Carlo methods_ [6].

[1]:
[http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial](http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial)

[2]:
[http://www.stanford.edu/class/cs294a/video1.html](http://www.stanford.edu/class/cs294a/video1.html)

[3]:
[http://www.stanford.edu/class/cs294a/video2.html](http://www.stanford.edu/class/cs294a/video2.html)

[4]:
[http://www.stanford.edu/class/cs294a/handouts.html](http://www.stanford.edu/class/cs294a/handouts.html)

[5]:
[http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=...](http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage)

[6]:
[http://www.maths.uq.edu.au/~kroese/montecarlohandbook/](http://www.maths.uq.edu.au/~kroese/montecarlohandbook/)

------
fitzwatermellow
I noticed something last night while watching the Djokovic US Open
quarter-final. It featured an "IBM Insights" segment which claimed to have
mined 8 years' worth of Majors competitions to generate stats. And one
interesting result it was able to produce went something like this: in past
matches where Djokovic was able to return only 25% of his opponent's serves,
he still won 85% of the time. The implication being that such is the strength
of his defensive game.

While this is no doubt really interesting, I find I am getting diminishing
returns from outputting stats like this from big dumps of past historical
data. What I would like to be able to show is a live heat graph style stats
tracker, where each point in the match updates my belief net about who is
winning, or playing better. Of course, the final outcome may be upended by
some fluke occurrence such as a Hail Mary pass in the final seconds which is
what makes sports interesting, but nonetheless I think a live tracker would
say a lot more than the actual score of the match.

So, I am wondering if anyone has specific resources for real-time online data
mining at web scale, for high-throughput data streams? And I agree with
shubhamjain above, libraries and repos are preferable to books and academic
journals ;)
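
For what it's worth, the "update my belief after each point" part can be
sketched in a few lines. This is only a minimal sketch, assuming each point is
an independent coin flip with an unknown bias and a uniform Beta(1, 1) prior;
the `PointTracker` name and the sample outcomes are made up for illustration,
and it does not address the web-scale part of the question.

```python
from dataclasses import dataclass


@dataclass
class PointTracker:
    """Beta-Bernoulli belief about the chance that player A wins a point."""

    wins: float = 1.0    # prior pseudo-count of points won by A (assumed uniform prior)
    losses: float = 1.0  # prior pseudo-count of points lost by A

    def update(self, a_won_point: bool) -> float:
        """Fold one observed point into the belief and return the new estimate."""
        if a_won_point:
            self.wins += 1
        else:
            self.losses += 1
        return self.estimate()

    def estimate(self) -> float:
        """Posterior mean of the probability that A wins the next point."""
        return self.wins / (self.wins + self.losses)


tracker = PointTracker()
stream = [True, True, False, True]  # made-up point outcomes, not real match data
history = [tracker.update(point) for point in stream]
print(history)
```

Each call to `update` returns the posterior mean after that point, so
`history` is the series a live chart would plot.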

~~~
msellout
I don't understand that IBM Insights note about Djokovic. Can you explain
more?

~~~
hguant
Without doing the math - Djokovic is such a strong player that even if he's
only returning a quarter of your serves, meaning you're 3/4 of the way to
winning your set (I don't tennis, sorry if I'm getting the terms wrong), he's
still probably going to beat you.

~~~
elechi
Well, that's a close explanation, except I think you're confusing set and
match. For men's tennis, it takes 3 sets to win the match, with the potential
of playing 5 sets.

I'm actually not sure that the math is true, though. (Or I really don't
understand what the stat is saying.) Let's say that for every 4 serves, you
win 3 and Djokovic wins 1. At that rate you win every game (each one to 15),
which gives you every set. I don't see how Djokovic ever wins a game, let
alone the set or match.
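
To put a rough number on that, here is a sketch under two assumptions that
are mine, not IBM's: points are independent, and "returning 25%" means the
server wins each point with probability 0.75. The function name is
hypothetical; the formula is the usual one for a game played to four points,
win by two.

```python
def game_win_probability(p: float) -> float:
    """Chance the server wins a game if they win each point with probability p."""
    q = 1.0 - p
    # Win before deuce: to love, to 15, or to 30.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce (three points each), then get two points clear of the opponent.
    reach_deuce = 20 * p**3 * q**3
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return before_deuce + reach_deuce * win_from_deuce


hold = game_win_probability(0.75)
print(f"server holds: {hold:.3f}, returner breaks: {1 - hold:.3f}")
# prints "server holds: 0.949, returner breaks: 0.051"
```

Under those assumptions the returner breaks serve only about 5% of the time,
so winning sets that way would be rare, which makes me suspect the stat is
measuring something else.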

------
anacleto
Great resources.

I would add these great ebooks on Cloud Computing and AWS Certifications:

_The Cloud Computing Job Market_

With this eBook you will learn how Cloud Computing is changing the IT industry
and creating a complete set of new roles for companies and businesses
worldwide. Information and data to start your cloud computing career.

Link [0] [https://cloudacademy.com/ebooks/cloud-computing-job-market-3...](https://cloudacademy.com/ebooks/cloud-computing-job-market-3/)

_A Guide to AWS Certification Exams_

Introduction to the full range of Amazon Web Services certification exams:
learn what, why, and how to pass just the right exam for you.

Link [1] [https://cloudacademy.com/ebooks/guide-aws-certification-exam...](https://cloudacademy.com/ebooks/guide-aws-certification-exams-2/)

_AWS Solutions Architect Certification_

Study guide to Amazon Web Services' Solutions Architect certification exam:
tips and suggestions on how, what, and where to learn.

Link [2] [https://cloudacademy.com/ebooks/aws-solutions-architect-cert...](https://cloudacademy.com/ebooks/aws-solutions-architect-certification-1/)

------
noobermin
Honest question: is ML/DS something you can just pick up and be hired[0]?
Maybe I'm ignorant, but I'd think employers would look for a degree in some
related field to actually consider you for a position doing it.

[0] As in how you can pick up web hacking, do a few websites and create a
reputation and get hired that way without a formal degree.

~~~
hellofunk
There was a thread on here a month or two ago about this. In general, it was
noted that it's best (for both employment as well as just getting stuff done)
to have a deep understanding of a particular area of ML rather than a general
understanding of many areas. Usually those with a deep understanding have
focused on it in school. But the latter group of generalists is a much larger
group in the software industry, since most of us did not go to school for this
specifically.

------
geoff-codes
Suggest putting this in a repo somewhere, in the vein of:

[https://github.com/vhf/free-programming-books/blob/master/fr...](https://github.com/vhf/free-programming-books/blob/master/free-programming-books.md)

[https://github.com/ligurio/free-software-testing-books/blob/...](https://github.com/ligurio/free-software-testing-books/blob/master/free-software-testing-books.md)

etc.

------
viewer5
Any specific recommendations from anyone?

~~~
weavie
The Elements of Statistical Learning together with the online course
([http://www.r-bloggers.com/in-depth-introduction-to-machine-l...](http://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/))
makes for a great introduction.

EDIT: Oops I should have said "An Introduction to Statistical Learning with
Applications in R" rather than The Elements of Statistical Learning. The
Elements book goes into way too much depth to be a good introduction to the
subject.

~~~
snoman
Similarly, An Introduction to Statistical Learning With Applications in R is
like a practical version of (or companion to) Elements. I very much enjoyed
it.

~~~
craigching
And the Stanford version of the same class linked above for ISLR is, in my
opinion, better:

[https://lagunita.stanford.edu/courses/HumanitiesandScience/S...](https://lagunita.stanford.edu/courses/HumanitiesandScience/StatLearning/Winter2015/about)

------
alceufc
"Mining of Massive Datasets" by Leskovec, Rajaraman and Ullman is very good.

Although the post gives a link to the Amazon page of the book, PDFs of the
chapters are free to download at the official book web site[1].

[1] [http://www.mmds.org/](http://www.mmds.org/)

------
LordKano
I really like this kind of stuff.

It's my opinion that our educational process is a bit too heavy on algorithms
and languages while being a bit too light on data structures.

I like to brush up on this subject matter from time to time just to keep
myself sharp.

------
DarkTree
Anyone recommend any of the R books listed or know of any great R books for
purchase?

~~~
larrydag
My favorite intro to R book is The Art of R Programming by Norman Matloff
[http://www.amazon.com/The-Art-Programming-Statistical-Softwa...](http://www.amazon.com/The-Art-Programming-Statistical-Software/dp/1593273843)

------
alador
Nice books collection. Thanks :)

~~~
LearnDataSci
:)

------
crazypyro
Why are you hijacking my scroll speed...

Your "smooth-scroll" library is completely breaking my touchpad scroll with an
Acer c720 Chromebook. One slight movement (which should be a few pixels
scroll) is moving me over half-way down the screen. Makes your site unusable
with this touchpad as accidental scrolling sometimes happens and moves the
screen a whole page away, especially when trying to right click open links
because the gestures are similar.

~~~
LearnDataSci
Hmm. Interesting. I just implemented the smooth scroll yesterday so I will
definitely check that out. Thanks for the input.

~~~
DeusExMachina
Smooth scrolling is already implemented correctly in the browser. Your
implementation is a hack that hijacks the normal behaviour a user is
accustomed to and gives back a version that feels wrong to interact with,
even without performance issues.

