
Learning Python for Social Scientists (2015) - kome
https://nealcaren.github.io/python-tutorials/
======
apohn
One of the reasons R has been popular among social scientists is that you have
libraries dedicated to statistical methods that are less common in Python.
Examples include Mixed Effects Models, Hierarchical Linear Models,
Psychometrics/Measurement Theory, Structural Equation Modeling, Survival
Analysis, and a wide range of other techniques that shine when you have small
amounts of data and lots of practical/physical barriers around data collection
and experimentation.

Even for methods such as linear and generalized linear regression, there's
just a range of libraries and diagnostic capabilities in R that aren't present
in Python libraries.

Granted, a lot of these statistical methods can be expressed in Stan or some
other probabilistic programming engines that have interfaces in Python.
However, when you are a social scientist and not a statistician, there's a big
difference between having a few functions you can call and expressing a
likelihood function.
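
To make that gap concrete, here is a minimal sketch of what "expressing a likelihood" involves for even the simplest case, a Gaussian linear regression, in plain Python. The function name `gaussian_log_likelihood` and the toy data are made up for illustration; an actual Stan model looks different, but it asks for the same kind of thinking.

```python
import math

def gaussian_log_likelihood(intercept, slope, sigma, xs, ys):
    """Log-likelihood of y ~ Normal(intercept + slope * x, sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for x, y in zip(xs, ys):
        resid = y - (intercept + slope * x)
        # Log of the normal density evaluated at this residual.
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    return total

# Toy data, roughly y = 1 + 2x (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

good = gaussian_log_likelihood(1.0, 2.0, 0.2, xs, ys)
bad = gaussian_log_likelihood(0.0, 1.0, 0.2, xs, ys)
print(good > bad)  # parameters close to the data-generating line score higher
```

And that is before you pick an optimizer or sampler, which is exactly the part a ready-made function hides from you.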

~~~
nerdponx
In social science research, "getting the model right" is often the goal (i.e.
explaining _why_ something happens), and "making good predictions" is of
secondary importance (i.e. good predictions only matter insofar as they
indicate that your model is good). That's one reason you end up doing involved
probability modeling, compared to just slapping it with machine learning like
you'd do in an industry setting. The flipside is that, as you said, many
social scientists don't have the programming chops to do it themselves.

The other big advantage IMO is that R straddles the space between
"traditional stats package" (SAS, Stata, SPSS, Minitab, etc.) and "programming
language". We here know that it's more or less the latter, but the "batteries
included" standard library with first-class support for a "data-frame" type
makes it feel much like the former, especially if that's what you're already
familiar with. It's a gradual transition from a stats package to general-
purpose programming, and you don't have to jump through hoops, install third-
party packages for basic stats work, or deal with ugly/counterintuitive
syntax.

~~~
thanatropism
A problem with Python is the lack of a "best practices" style for scientific
computing.

I find it amazing that Linear Regression in scikit-learn doesn't return
standard errors/t-scores for the coefficients. (These can be useful for model
selection, for example). I know about statsmodels and pysal, but the workflow
with these is already slightly different.
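
For reference, here is a minimal sketch of the quantities I mean, for the one-predictor case only, in plain Python. The function name `simple_ols` and the data are made up for illustration, and it assumes the fit is not perfect (nonzero residual variance).

```python
import math

def simple_ols(xs, ys):
    """Fit y = a + b*x by least squares.

    Returns {name: (estimate, standard_error, t_statistic)}.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    # Assumes rss > 0; a perfect fit would divide by zero here.
    return {
        "intercept": (intercept, se_intercept, intercept / se_intercept),
        "slope": (slope, se_slope, slope / se_slope),
    }

# Toy data (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

for name, (est, se, t) in simple_ols(xs, ys).items():
    print(f"{name}: estimate={est:.3f} se={se:.3f} t={t:.1f}")
```

The multi-predictor version needs the inverse of X'X, which is where you start wanting a library to do it for you.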

~~~
nerdponx
_I find it amazing that Linear Regression in scikit-learn doesn't return
standard errors/t-scores for the coefficients_

Case in point! You don't typically need those for machine learning, so why
bother with them in a machine learning toolkit/library?

~~~
j7ake
You don't need them if you don't need to diagnose your analysis. But the hard
part is of course thinking about why your model doesn't work, which is the
strength of R.

------
polpenn
> Social scientists don't really analyze images much, but that might be the
> next big thing.

Satellite images have been used in research in economics to estimate poverty
and deforestation.

~~~
james1071
That all depends - there are some geography departments that do a lot of
data-driven analysis, as do some economists. There really is a lot of overlap,
with various disciplines doing data-driven work. I would also expect that
criminologists might be doing similar things.

------
mayankkaizen
Link to very first notebook by Julia Evans is dead.

