
An Introduction to Statistics with Python - Lofkin
http://work.thaslwanter.at/Stats/html/
======
Tycho
IPython Notebook is great. Making it the norm for exploring/teaching ideas in
a coding context is a great trend.

One other thing I hope comes along is annotated mathematical
formulas/expressions. Any time I see a mathematical symbol I want to be able
to hover over it and get its definition (in the context it's being used) and
other relevant information. So often I see formulas in articles/books where
they don't explain everything. It makes it really hard for outsiders/amateurs
to catch on to the ideas being discussed.

~~~
nileshtrivedi
Hover is not possible on touch devices. From 5 possible interactions (left
click, right click, left double-click, right double-click, hover), we've gone
to 2 (short press, long press). On the other hand, we have acquired gestures
like pinch-zoom, so it may not be a big loss.

~~~
tangue
High-end Samsung Galaxy phones have a kind of hover effect (called "air-view")
through an infrared sensor, so it's quite possible that in the future this
could become a common feature.

------
tunnuz
Also relevant:
[http://greenteapress.com/thinkstats](http://greenteapress.com/thinkstats) by
Allen B. Downey (free PDF but available as O'Reilly print).

~~~
gjreda
His Think Bayes is another good one:
[http://greenteapress.com/thinkbayes/](http://greenteapress.com/thinkbayes/)

------
graffitici
Can people comment as to the quality of this article? Would it be a good place
for a refresher on statistics? Are there any other such projects that people
can recommend?

I like the way it has a strong Python focus.

~~~
maxpupmax
Well Python is right there in the title! ;)

It looks more like an ebook, and it seems to cover most of the essentials of
your typical undergrad course, plus bootstrapping and an intro to Bayesian
statistics. From what I scanned, it might be a little slow if you're just
looking for a stats refresher. Depending on your level of experience, I'd
recommend the numpy and pandas docs as a "next step up" from the link (and
maybe a bit more practical, IMO).
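Since bootstrapping comes up, here is a minimal sketch of a percentile bootstrap confidence interval for a mean, written with numpy. The function name `bootstrap_mean_ci`, the resample count, the seed and the sample values are all made up for illustration; this is not code from the ebook.

```python
import numpy as np

def bootstrap_mean_ci(data, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Each row is one resample of the same size as data, drawn with replacement
    idx = rng.integers(0, data.size, size=(n_resamples, data.size))
    boot_means = data[idx].mean(axis=1)
    # The alpha/2 and 1 - alpha/2 percentiles of the bootstrap means form the interval
    lower, upper = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return data.mean(), lower, upper

sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9]  # hypothetical data
mean, lo, hi = bootstrap_mean_ci(sample)
print(f"mean={mean:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

The percentile method is the simplest bootstrap interval; the vectorized indexing (`data[idx]`) is the kind of numpy idiom the docs cover well.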

~~~
fluential
Also relevant - Statistics for Engineers Tutorial at SRECon15 -
[https://github.com/HeinrichHartmann/StatisticsTutorial](https://github.com/HeinrichHartmann/StatisticsTutorial)

------
Lofkin
More good Python resources: [http://web.bryant.edu/~bblais/statistical-inference-for-ever...](http://web.bryant.edu/~bblais/statistical-inference-for-everyone-sie.html)

Harvard's data science class (CS109), in Python:
[http://cs109.github.io/2014/](http://cs109.github.io/2014/)

------
jeo1234
I still have to finish reading it, but I wonder how Python will ultimately
stack up against R.

~~~
Lofkin
Python has more consistent syntax across stats, time series and general
programming (for the packages it does have), and a better Bayesian inference
package (PyMC3 > Stan).

Python is also better than R for ad hoc statistical modeling and algorithm
development (with Numba you can write Python code that runs on the order of C
speed), general programming, scraping, natural language processing,
agent-based modeling, etc.
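To illustrate the Numba point, a rough sketch: a hand-written single-pass (Welford) sample-variance loop decorated with Numba's `njit`. The function name is hypothetical, and the fallback decorator is an assumption added so the sketch still runs (slowly, as plain Python) when Numba isn't installed; actual speedups depend on the workload.

```python
import numpy as np

try:
    from numba import njit  # optional JIT compiler
except ImportError:
    # Fallback so the sketch runs without Numba (plain, slower Python)
    def njit(func):
        return func

@njit
def welford_variance(xs):
    """Sample variance via Welford's single-pass algorithm."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(welford_variance(data))  # ≈ 4.571 (= 32/7)
```

Explicit loops like this are slow in CPython; `njit` compiles the function to machine code on its first call, which is where the near-C speed comes from.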

Python is also better for GIS, optimization, symbolic math, and larger
datasets (with Blaze, Dask and PySpark).

R right now is a bit better for visualization, reporting and exploratory data
analysis (though I think this will change soon with Bokeh and Blaze), and it
has many more esoteric stats packages.

With statsmodels, PyMC3, pandas, scikit-learn, etc., you can probably do 98%
of what you need in Python without dipping into R's more esoteric packages
(with some exceptions). For everything else, you can call R packages with
rpy2. That way you get the advantages of working in one language (not being
spread too thin) and the advantages Python offers, while still leveraging R's
wealth of packages.

Calling R packages is a bit more difficult through Python, but not as hard as
remembering two languages' syntax and gluing together a two-language workflow.

That is why I chose Python. I can also write Excel add-ins in Python
(xlwings) instead of using VBA.

~~~
data_scientist
Genuine question: why do you find PyMC3 better than Stan? PyMC3 is still in
alpha, while Stan has been stable for many years. They both implement the same
algorithm (NUTS), so is it just the nicer syntax or something else?

~~~
peatmoss
I've made minimal use of Stan, and not really used Pymc3, but from a quick
look, it seems Pymc3 is a bit more integrated than RStan. In RStan, you end up
writing Stan code as an alien, wrapping the foreign syntax in quotes, and then
shoveling the code as a string into Stan. There isn't really an R grammar for
Stan as near as I can tell.

------
jtth
Python: We Still Can't Do Generalized Linear Mixed Models, So Call R.

~~~
tadlan
Or use PyMC3 for the Bayesian equivalent. It's better anyway. Yes, one can
also use rpy2. I think GLMM support is a GSoC project or an upcoming PR, but
I'm not 100% sure.

