
Principles and Techniques of Data Science - charlysl
https://www.textbook.ds100.org/
======
charlysl
Homeworks, labs, projects:
[https://github.com/DS-100/sp19](https://github.com/DS-100/sp19)

Course design: [https://youtu.be/HITIm3KoU2U](https://youtu.be/HITIm3KoU2U)

Course website: [http://www.ds100.org/sp19/](http://www.ds100.org/sp19/)

~~~
edshiro
This looks great! Thanks for sharing. Interestingly enough, judging from the
table of contents, this book seems to start with a more pragmatic (and
welcome) approach: you write some Python code and look at data visualisation
techniques before delving into stats.

Is there any chapter that stands out to you?

~~~
charlysl
You're welcome!

I haven't done the course yet; I've just found it. But from the rationale
video, the course seems to be more about weaving recurring fundamental data
science concepts throughout, emphasizing one particular concept or technique
in each chapter, so I guess it would make more sense to take it as a
whole.

It is intended as a "glue" course, taken after CS fundamentals and before
core data science courses like statistics, machine learning and databases. It
gives students context for what lies ahead, and just enough to be dangerous
and start doing data science stuff.

If this is what you are after, you may also want to consider CMU's "Practical
Data Science", which seems to take a similar approach. It has videos, covers
much more machine learning and big data, and is also very current, but it
doesn't have such a nice companion online book (though the notes look great)
and has much less statistics: [http://datasciencecourse.org](http://datasciencecourse.org)

Both look like great DS intro courses from top universities; we are spoilt.

And then, also from Berkeley, there is "Data 8", which is intended for those
who want an intro to data science, but don't have any programming or college
math knowledge yet; it also has a similar online book with working links to
Jupyter notebooks: [http://data8.org/sp19/](http://data8.org/sp19/) (and
videos:
[https://www.youtube.com/playlist?list=PLXbeRfilLvMoC3QZKxRrp...](https://www.youtube.com/playlist?list=PLXbeRfilLvMoC3QZKxRrpseZNXjImWyof))

------
shubh2336
Shouldn't joins be explained as a Cartesian product instead of with Venn
diagrams [1] when relating them to sets?

[1]
[https://www.textbook.ds100.org/ch/05/cleaning_structure.html...](https://www.textbook.ds100.org/ch/05/cleaning_structure.html#Joins)

~~~
EForEndeavour
As I understand things, you're right that the Cartesian product (AKA the cross
join) cannot be nicely depicted using Venn diagrams. However, Venn diagrams are
a great way to depict the set logic that applies to the join keys of left,
right, inner, and outer joins.
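To make both views concrete, here is a minimal pure-Python sketch. The toy tables and the `cross_join`, `inner_join` and `left_join` helpers are hypothetical, written for illustration rather than taken from the textbook. An inner join is the Cartesian product filtered to pairs whose keys match, while the Venn diagram only describes which keys each join type keeps.

```python
from itertools import product

# Hypothetical toy tables: each row is (key, value).
left = [(1, "a"), (2, "b"), (3, "c")]
right = [(2, "x"), (3, "y"), (4, "z")]


def cross_join(l, r):
    """Cartesian product: pair every left row with every right row."""
    return [lrow + rrow for lrow, rrow in product(l, r)]


def inner_join(l, r):
    """Inner join: the Cartesian product, filtered to pairs whose keys match."""
    return [lrow + rrow for lrow, rrow in product(l, r) if lrow[0] == rrow[0]]


def left_join(l, r):
    """Left outer join: inner-join matches, plus unmatched left rows padded with None."""
    pad = (None,) * (len(r[0]) if r else 0)
    out = []
    for lrow in l:
        matches = [lrow + rrow for rrow in r if rrow[0] == lrow[0]]
        out.extend(matches or [lrow + pad])
    return out


# The Venn-diagram view: set logic on the join keys only.
left_keys = {row[0] for row in left}
right_keys = {row[0] for row in right}
key_sets = {
    "inner": left_keys & right_keys,
    "left": left_keys,
    "right": right_keys,
    "outer": left_keys | right_keys,
}

print(len(cross_join(left, right)))  # 9 (3 x 3 pairs)
print(inner_join(left, right))       # [(2, 'b', 2, 'x'), (3, 'c', 3, 'y')]
print(left_join(left, right))        # [(1, 'a', None, None), (2, 'b', 2, 'x'), (3, 'c', 3, 'y')]
```

The key sets say which keys appear in each result, which is what the diagrams draw; the actual rows come from filtering (and, for outer joins, padding) the product.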

~~~
robgt
See here for an example of an argument against using Venn diagrams to depict
joins: [https://dzone.com/articles/say-no-to-venn-diagrams-when-expl...](https://dzone.com/articles/say-no-to-venn-diagrams-when-explaining-joins)
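One limitation of the Venn picture is easy to demonstrate independently of the linked article: it shows which keys survive a join, not how many rows come out. In this minimal sketch (hypothetical data, using the same filtered-product idea as a plain list comprehension), the key intersection has a single element, yet the inner join returns 2 x 2 = 4 rows because every matching pair from the Cartesian product is kept.

```python
from itertools import product

# Hypothetical tables in which key 2 appears twice on each side.
left = [(1, "a"), (2, "b"), (2, "c")]
right = [(2, "x"), (2, "y"), (3, "z")]


def inner_join(l, r):
    """Inner join: the Cartesian product, filtered to pairs whose keys match."""
    return [lrow + rrow for lrow, rrow in product(l, r) if lrow[0] == rrow[0]]


shared_keys = {row[0] for row in left} & {row[0] for row in right}
joined = inner_join(left, right)

print(shared_keys)  # {2}: one key in the Venn intersection
print(len(joined))  # 4: both left rows with key 2 pair with both right rows with key 2
```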

~~~
EForEndeavour
Thanks! That link sent me down a rabbit hole in which I learned valuable
things about SQL that I didn't even realize I lacked.

------
tronko
Any way to print this manual or buy a hard copy?

~~~
rahimnathwani
It should be possible to build a PDF from source. The setup guide is here:
[https://github.com/DS-100/textbook/blob/master/SETUP.md](https://github.com/DS-100/textbook/blob/master/SETUP.md)

I tried the following in a Python 3.7 virtual environment, but it didn't quite
work:

    
    
      sudo apt-get update
      sudo apt-get install -y --no-install-recommends npm calibre jekyll ca-certificates
      git clone https://github.com/DS-100/textbook
      cd textbook
      pip install -r requirements.txt
      pip install datascience # due to version conflict
      pip install --upgrade folium # due to version conflict
      pip install beautifulsoup4
      pip install lxml py-mathjax # not sure if these are needed
      sudo npm install -g gitbook-cli
      sudo gitbook fetch
      sudo gitbook install
      make build

~~~
halfeatenpie
I'd assume it's because you have the --no-install-recommends flag on your
apt-get call. Maybe something you're doing needs the recommended (but not
required) packages. I haven't tried it myself, but that's my assumption at
first glance, so take it with a grain of salt.

~~~
rahimnathwani
Sorry, I misled you a bit there. I didn't actually use that flag when I did
it, as I already had the packages installed.

------
iron0013
I haven’t looked at the material yet, but I did try to read Deborah Nolan’s
book Data Science in R, and it was a confounding experience. I remember
thinking “the material in this book is so far from anything that I’ve ever
heard described as ‘Data Science’ that it renders the phrase useless.”

~~~
thousandautumns
I've never looked at Data Science in R, but Hadley Wickham's R for Data Science
is great in my opinion. It's really applicable, down to earth, and focuses much
more on the meat of data science (data manipulation and munging,
visualization, relational data, and efficient programming) than on the
typical "fit a neural network to this idealized toy data set!"

It's also available for free online at
[https://r4ds.had.co.nz/](https://r4ds.had.co.nz/)

------
mleonard
The Data 8 videos are online. Are the Data 100 videos online too? Thanks.

~~~
charlysl
No, I searched and searched, but to no avail.

~~~
mleonard
Me too. Thanks for replying.

