

The Popularity of Data Analysis Software - sampo
http://r4stats.com/articles/popularity/

======
cwal37
As someone who uses Python and R at work, used Matlab in the past, and used
SAS, STATA, and SPSS briefly in grad school I found this really interesting.
However, and I realize this might be difficult to account for, a lot of
analytics actually just happens in Excel. For quick and dirty statistics or
basic linear regressions, Excel represents a widely known quantity. In
particular, I've been stuck using Excel in the past just because the final
product is heading off to an individual or group for whom anything else is
"too technical".

I realize Python didn't make the cut-off to break out its academic research
standing, but I would be curious to see its growth.

In terms of my personal impressions (could be wrong of course, just my
feeling/experience) of each that I've used:

SAS - Arcane and annoying. I am happy to be in a position where my group has
no interest in using it. I assume it made sense when other packages couldn't
handle enormous datasets, but there are plenty now that can. I assume
institutional inertia is keeping it in place, particularly at the federal
level.

STATA - For fields that previously relied on SAS, it seems to be replacing SAS
to some extent, among younger econometrics people in particular. I didn't use
it much
beyond the basics, but my healthcare economics PhD candidate roommate swore by
it. Purpose-driven software.

SPSS - I once had an economics professor refer to it as "a toy for
kindergarteners." I kind of view it as a more statistically-focused Excel for
people who need a bit more power but can't be bothered to learn any coding.
Too in-between for my uses. Seems to be thoroughly entrenched in the social
sciences, psychology in particular.

MATLAB - My first introduction to non-Excel graphing solutions, and it totally
blew my mind in that regard. I don't have an engineering background, and I
didn't use it for those specific functions. It was the first programming-type
thing I stuck with consistently in my life, and I honestly enjoyed it. I found
working through data and statistics in MATLAB to be an utter joy compared to
slogging through Excel, particularly the customization available for quickly
manipulating data and figures.

Python - Transitioned into Python (mostly scipy, numpy, statsmodels, pandas,
matplotlib) when I got my current job as we need to quickly iterate and be
able to test a bunch of different things. Used the Python(X,Y) suite as a
crutch for a while, but now I enjoy trying to find ways to slim things down
with "stock" Python. I'm certainly not a great (or even good) programmer, but
it's nice to be able to execute any random idea I have over the course of a
single Saturday. Seems to have the most flexibility and power out of
everything I've used due to its nature as an actual general programming
language. Sometimes library support can be spotty for specific statistical
operations. I love Spyder, as having a simple variable explorer is something I
was really going to miss coming from MATLAB.
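The kind of quick iteration described above is often just a least-squares fit
over a DataFrame. A minimal sketch with numpy and pandas (the data and column
names here are invented for illustration):

```python
import numpy as np
import pandas as pd

# Invented example data: a noisy linear relationship y ≈ 2x + 1.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.linspace(0, 10, 50)})
df["y"] = 2.0 * df["x"] + 1.0 + rng.normal(scale=0.1, size=len(df))

# Quick ordinary least squares: design matrix [x, 1] against y.
A = np.column_stack([df["x"], np.ones(len(df))])
slope, intercept = np.linalg.lstsq(A, df["y"], rcond=None)[0]
print(slope, intercept)  # close to 2.0 and 1.0
```

This is the sort of "random idea on a Saturday" snippet the libraries make
easy; for anything beyond a basic fit, statsmodels fills in the standard
errors, p-values, and diagnostics.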

R - First thought, "Wow, everything's been done before!" In a day (my first
day ever using R) I had a bunch of annoying regressions and analysis up and
running, which had taken me more than a week in Python. There seem to be
examples for everything, and base R includes a tremendous amount
of statistical capability. I prefer Python, as I feel I'm getting more general
personal development out of it, but I use R to quickly spot-check any results
from Python or Excel.
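For what it's worth, statsmodels' formula API borrows R's `y ~ x` syntax, so
spot-checking an R result from Python can use nearly the same incantation as
R's `lm(y ~ x, data=df)`. A hedged sketch with invented data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented data with a known linear relationship y ≈ 3x - 2.
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": np.linspace(0, 10, 100)})
df["y"] = 3.0 * df["x"] - 2.0 + rng.normal(scale=0.5, size=len(df))

# R-style formula, as in R's lm(y ~ x, data=df).
fit = smf.ols("y ~ x", data=df).fit()
print(fit.params["x"], fit.params["Intercept"])
```

`fit.summary()` then prints roughly the same regression table `summary(lm(...))`
gives in R.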

~~~
kiyoto
>I realize this might be difficult to account for, a lot of analytics actually
just happens in Excel. For quick and dirty statistics or basic linear
regressions, Excel represents a widely known quantity. In particular, I've
been stuck using Excel in the past just because the final product is heading
off to an individual or group for whom anything else is "too technical".

As someone who has used (or could use) any of the listed software, another
benefit of Excel (in addition to its pervasiveness and popularity with
"non-technical" people) is its rapid prototyping capability. If your data is
small-ish and
requires no serious stats calculation (which accounts for a lot of data
analysis, especially in the early phase), Excel is super fast to prototype in.
You can do basic math, aggregation & filtering and charting all within a few
keystrokes.

Another benefit that I see is that, as an interface, a spreadsheet lets you
see each step of the work. As a programming environment, a spreadsheet is just
a mess, but I find it easier to track what I am doing in Excel than working
inside an R/Python REPL and staring at my history or currently defined
variables.

~~~
sampo
Personally, I don't work in the REPL much, but write the analysis in a script
file, so that it can be re-run if something changes upstream.

