
Ask HN: What are the best resources to learn Python for Data Analysis - jbmorgado
There are a great many posts on HN about resources for learning R for Data Analysis, but very few about Python, even though it is also considered an excellent language for the task.

What are the best online courses, books, and blogs for learning Data Analysis in Python?
======
hatmatrix
_Python for Data Analysis_
[http://shop.oreilly.com/product/0636920023784.do](http://shop.oreilly.com/product/0636920023784.do)

~~~
geebee
There have been some good recommendations here for machine learning and data
_science_, but I'm not sure that's what the OP is looking for here (or maybe
it is; certainly no harm in posting them).

But yeah, if what you want to know is how to query, organize, filter, trim,
format, reformat, munge, and finagle data, you're probably looking for
something more like the O'Reilly book hatmatrix mentioned above.
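As a rough illustration of that kind of munging, here is a minimal pandas sketch. The data, column names, and cleaning steps are all hypothetical, picked only to show trimming, type coercion, filtering, and reshaping in a few lines:

```python
import pandas as pd

# Hypothetical messy input: stray whitespace, inconsistent case, numbers stored as text.
raw = pd.DataFrame({
    "city": ["  Boston", "boston ", "Denver", "DENVER", "Austin"],
    "year": [2015, 2016, 2015, 2016, 2016],
    "sales": ["100", "120", "80", "95", "n/a"],
})

clean = raw.assign(
    city=raw["city"].str.strip().str.title(),           # trim whitespace, normalize case
    sales=pd.to_numeric(raw["sales"], errors="coerce"),  # unparseable "n/a" becomes NaN
).dropna(subset=["sales"])                               # filter out rows with no usable sales

# Reformat from long to wide: one row per city, one column per year.
wide = clean.pivot(index="city", columns="year", values="sales")
print(wide)
```

The pandas docs sections on working with text data and on reshaping/pivot tables cover these operations in much more depth.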

If not the book, I'd recommend just going through pandas as much as possible.
Nothing wrong with just going through the online docs.

Oh, one more thing, I'm personally a huge fan of pandasql as well. It's a nice
library that allows you to query a pandas data frame as if it were a SQL table
(joins work with other data frames). Pretty much whatever is available in
SQLite will be available through pandasql.
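pandasql itself is called as `sqldf(query, locals())`. To avoid the extra dependency, the sketch below reproduces the same idea with plain pandas and the standard-library `sqlite3` module: copy the frames into an in-memory SQLite database, then run SQL against them. As far as I understand, this is roughly what pandasql does behind the scenes. The `orders`/`customers` tables are made-up example data:

```python
import sqlite3

import pandas as pd

# Hypothetical example data.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 20, 10, 30],
    "amount": [25.0, 45.0, 15.0, 60.0],
})
customers = pd.DataFrame({"customer_id": [10, 20, 30], "name": ["Ann", "Bob", "Cy"]})

# Load both frames into an in-memory SQLite database as tables.
conn = sqlite3.connect(":memory:")
orders.to_sql("orders", conn, index=False)
customers.to_sql("customers", conn, index=False)

# Any SQLite SQL works here, including joins across the two frames.
query = """
SELECT c.name, SUM(o.amount) AS total
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.customer_id
GROUP BY c.name
ORDER BY total DESC, c.name
"""
totals = pd.read_sql_query(query, conn)  # result comes back as a DataFrame
conn.close()
print(totals)
```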

There have been a few spats on the interwebs about whether it's better to do
things in SQL or with data frame operations. Personally, I use both: I find
some things are far easier to do with a query, and then I transition over to
pandas and numpy when I get to programming/mathy things.
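For comparison, a join-and-aggregate query like "total order amount per customer" can also be written purely with data frame operations. This is a minimal sketch on hypothetical data; `merge` plays the role of the SQL `JOIN`, and `groupby(...).sum()` the role of `GROUP BY ... SUM`:

```python
import pandas as pd

# Hypothetical example data.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 20, 10, 30],
    "amount": [25.0, 45.0, 15.0, 60.0],
})
customers = pd.DataFrame({"customer_id": [10, 20, 30], "name": ["Ann", "Bob", "Cy"]})

totals = (
    orders.merge(customers, on="customer_id", how="inner")  # JOIN ... ON customer_id
    .groupby("name", as_index=False)["amount"].sum()        # GROUP BY name, SUM(amount)
    .rename(columns={"amount": "total"})
    .sort_values(["total", "name"], ascending=[False, True])  # ORDER BY total DESC, name
    .reset_index(drop=True)
)
print(totals)
```

Which style reads better tends to depend on the task: set-style filtering and joins are often clearer in SQL, while anything involving numerical computation usually fits more naturally in pandas/numpy.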

Lastly - if you _do_ want to do data science/ML stuff, I'd recommend going
over to scikit-learn and just going through all the examples, trying things
out on your own datasets.
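The scikit-learn examples mostly follow the same load, split, fit, and score pattern. A minimal sketch of that pattern, assuming scikit-learn is installed; the bundled iris dataset and the scaler-plus-logistic-regression pipeline are chosen here only for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset, so nothing needs to be downloaded.
X, y = load_iris(return_X_y=True)

# Hold out a stratified test set; fixed random_state for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale features, then fit a classifier, as a single estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out set
print(f"test accuracy: {accuracy:.3f}")
```

Swapping in your own dataset usually means replacing only the `load_iris` line with your own feature matrix `X` and label vector `y`.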

~~~
jbmorgado
OP here. You are right, my interest is actually in the Data _Analysis_ part:
what the most up-to-date tools are and the best practices for using them.
This is because I already have a good understanding of the scientific
part (i.e. statistical analysis) from academia and from my job as a
researcher.

Still, I find that a good share of Data _Science_ resources start by
giving a good introduction to the _Analysis_ part, so they are also
relevant answers to this question.

Thank you all.

------
jmportilla
Shameless self-promotion: [https://www.udemy.com/python-for-data-science-and-machine-le...](https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp/?couponCode=REDDITPY19)

~~~
jbmorgado
I had actually found that course and it seemed quite interesting. I know that
you are its creator, but it would be nice to have a third party give some
input about what they liked and didn't like about it.

------
pythonbull
Data Science from Scratch [http://amzn.to/2dD9Iba](http://amzn.to/2dD9Iba)

Python for Data Analysis [http://amzn.to/2dDw6fL](http://amzn.to/2dDw6fL)

Web Scraping with Python: Collecting Data from the Modern Web
[http://amzn.to/2eov4dZ](http://amzn.to/2eov4dZ)

Python Machine Learning [http://amzn.to/2eobdt3](http://amzn.to/2eobdt3)

[http://sebastianraschka.com/books.html](http://sebastianraschka.com/books.html)

------
mtmail
"Data Science from Scratch"
[http://shop.oreilly.com/product/0636920033400.do](http://shop.oreilly.com/product/0636920033400.do)

------
naftaliharris
[http://sebastianraschka.com/](http://sebastianraschka.com/) has some great
articles for machine learning in Python.

------
joeclark77
Not necessarily a learning resource, but I'd like to plug the Anaconda
distribution of Python. [https://www.continuum.io/](https://www.continuum.io/)
It includes most of the commonly used libraries/packages in data analytics, so
at ASU I made all my students download it as a starting point. Three things it
gives you right out of the box are IPython (a better Python shell), Spyder
(an RStudio-like IDE for Python), and Jupyter Notebooks.

For learning, I'd recommend taking something like Janert's "Data Analysis with
Open Source Tools" and going through it chapter by chapter, trying to figure out how
to implement the various analyses in Python. That book in particular uses a
different technology in every chapter for its tutorial exercises, so ignore
those. But the exposition of the concepts is fantastic.

------
fitzwatermellow
Download Anaconda:

[https://www.continuum.io](https://www.continuum.io)

Take an icy plunge right into the "Titanic: Machine Learning from Disaster"
dataset ;)

[https://www.kaggle.com/c/titanic](https://www.kaggle.com/c/titanic)

------
source99
I tried reading books and even taking a night course, but nothing did the trick
except a real side project that required me to learn the proper tools and
techniques. It turned into a full-time job that I am really enjoying.

Now I am able to read the books because I desire the knowledge to further my
passion.

~~~
jbmorgado
I also follow that approach (more out of eagerness to do something than from a
didactic point of view), but the problem with it is that you end up bypassing
the best practices in the area.

------
twunde
I've found General Assembly's data science course to be pretty good at getting
you up and running:
[https://github.com/justmarkham/DAT8](https://github.com/justmarkham/DAT8)

------
IndianAstronaut
Videos from the SciPy conference (on using Python for mathematical
computation and data mining) are posted on YouTube. An excellent resource.

------
probinso
The internet is a pretty good resource.

