
Ask HN: What programming language to learn for a statistician? - Slamchunk
My SO is currently a statistician/data researcher who primarily works in Excel/VBA/SQL.

As she works in the public sector, we would like to broaden her employment prospects, and looking at most J/Ds for data scientist or similar stats-based roles, programming is a must.

So which is the most suitable language to learn? Python? R?

TL;DR: Data scientists, what did you start out learning? What worked? What would you do differently?
======
e_py
As others have said here, Python seems like a good choice; plus, it's a "real
programming language". I mean, you can use it for more stuff: web programming,
web scraping, etc. There are a lot of libraries, even for game development.

If you can learn more than one, I would recommend learning R as well, rather
than other technologies such as Octave or MATLAB.

------
joeclark77
I'd go with Python and specifically the scientific + numerical libraries.
There are books like "Python for Data Analysis" from O'Reilly.

Five years ago I would have said "R", but Python enthusiasts have been
replicating what R does in Python at a frantic pace, and R will never really
replicate what Python brings to the table.

------
NumberCruncher
As a statistician I spend ca. 80% of my time collecting and transforming data,
maybe because I never had the luxury of having one or two data engineers of my
own to do that for me. Having worked with SAS, SPSS, Matlab and Python and tried
some other tools, I would say that the choice of statistical programming
language does not make a big difference. Once you understand a modelling
process, you can reproduce/use it in any language as long as there is a
documented package for it.

On the other hand, knowing how to work with data is IMHO more important. Being
an SQL pro, knowing how to think in data sets instead of data records, knowing
when to use flat tables, and knowing how to use vectorization and matrix
manipulation even for everyday tasks, especially in "in-memory" systems, is
essential.
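To make the "data sets instead of data records" distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and its values are hypothetical, made up for illustration. Both functions compute per-region totals: the first pulls every record and accumulates in a loop, the second states the desired result as one GROUP BY query and lets the database do the aggregation.

```python
import sqlite3

# Hypothetical toy data: (region, amount) sales records.
rows = [("north", 120.0), ("south", 80.0), ("north", 30.0),
        ("east", 55.0), ("south", 20.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

def totals_row_by_row(conn):
    # Record-at-a-time thinking: fetch each row and accumulate by hand.
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def totals_set_based(conn):
    # Set-based thinking: describe the whole result in one query.
    query = ("SELECT region, SUM(amount) FROM sales "
             "GROUP BY region ORDER BY region")
    return dict(conn.execute(query).fetchall())

print(totals_set_based(conn))  # → {'east': 55.0, 'north': 150.0, 'south': 100.0}
```

Both give the same answer here; the set-based version simply scales better and reads closer to the question being asked.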

I would say SQL + R/Python makes a good combination. With that you can solve a
lot of problems in at least two different ways. R is being integrated step by
step into DWHs (data warehouses), which makes a lot of things easier. I hope SAS
dies a short and painful death, but it could also be a valid choice.

------
sfrailsdev
I'd note that SAS is still used in the public sector and enterprise to a
larger degree than you'd think, just because of legacy usage. For example, the
FDA had to clarify a few years ago that R was okay, because their regulations
required data in the SAS version 5 transport format for electronic submissions,
and when I worked in a county health department, we used SAS.

Perl was also the bioinformatics golden child for a while, and I expect there
are still people using it for that purpose in industry.

That said, looking beyond the public sector, I'd look at Python as broadening
her prospects more than anything else, just because it's more broadly used in
a variety of industries, and a general understanding of it is more broadly
applicable.

I'd suggest she also learn enough R to import and export data. If there are R
scripts she needs to use, Python can be used to run R from the command line:
she can import data, process it in R, and export the results back into Python
as an intermediate step.
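A minimal sketch of that round trip, assuming R is installed and `Rscript` is on the PATH. The R snippet, the temporary file names, and the `doubled` column are hypothetical placeholders; the exchange format is plain CSV, written with Python's csv module and read/written in R with `read.csv`/`write.csv`.

```python
import csv
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical R step: read a CSV, add a derived column, write a CSV.
R_CODE = """
args <- commandArgs(trailingOnly = TRUE)
df <- read.csv(args[1])
df$doubled <- df$value * 2
write.csv(df, args[2], row.names = FALSE)
"""

def build_rscript_command(script_path, in_csv, out_csv):
    # Rscript <script> <input> <output>; R sees the last two as trailing args.
    return ["Rscript", str(script_path), str(in_csv), str(out_csv)]

def run_r_step(values):
    # Assumes R is installed; fail early with a clear message if it is not.
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        script = tmp_dir / "step.R"
        script.write_text(R_CODE)
        in_csv, out_csv = tmp_dir / "in.csv", tmp_dir / "out.csv"
        # Export from Python.
        with in_csv.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["value"])
            writer.writerows([v] for v in values)
        # Process in R; check=True raises if the R script fails.
        subprocess.run(build_rscript_command(script, in_csv, out_csv), check=True)
        # Import the results back into Python.
        with out_csv.open(newline="") as f:
            return list(csv.DictReader(f))
```

Usage: `run_r_step([1, 2])` would return one dict per row with `value` and `doubled` keys (as strings, since they come from CSV).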

------
huac
From a statistics perspective R is the language to learn.

Python is good for data engineering, pipelining, etc., but R is the best for
analysis:

- RStudio is a much friendlier interface than IPython/Jupyter notebooks

- Python's visualization libraries can't come close to ggplot2

- Python lacks an effective grammar of data manipulation similar to dplyr or
magrittr.
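For anyone who hasn't used dplyr, the "grammar" idea is a small set of verbs (filter, mutate, summarise) chained left to right with a pipe such as magrittr's `%>%`. The sketch below imitates that style in plain Python on a list of dicts. `pipe`, `filter_rows`, `mutate`, and `summarise_by` are hypothetical helpers written for illustration, not any library's API, and the flight records are made up.

```python
from functools import reduce

def pipe(data, *steps):
    # Thread data through each step left to right, like magrittr's %>%.
    return reduce(lambda acc, step: step(acc), steps, data)

def filter_rows(pred):
    # Keep only rows for which pred(row) is true.
    return lambda rows: [r for r in rows if pred(r)]

def mutate(**new_cols):
    # Add (or overwrite) columns computed from each row.
    return lambda rows: [{**r, **{k: f(r) for k, f in new_cols.items()}}
                         for r in rows]

def summarise_by(key, col):
    # Sum `col` within each group defined by `key`.
    def step(rows):
        out = {}
        for r in rows:
            out[r[key]] = out.get(r[key], 0) + r[col]
        return out
    return step

# Hypothetical flight records.
flights = [
    {"carrier": "AA", "dist": 500, "delay": 10},
    {"carrier": "UA", "dist": 300, "delay": -5},
    {"carrier": "AA", "dist": 200, "delay": 30},
    {"carrier": "UA", "dist": 1000, "delay": 20},
]

result = pipe(
    flights,
    filter_rows(lambda r: r["delay"] > 0),
    mutate(delay_per_100mi=lambda r: r["delay"] * 100 / r["dist"]),
    summarise_by("carrier", "delay_per_100mi"),
)
print(result)  # → {'AA': 17.0, 'UA': 2.0}
```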

I think HN is more engineering-focused, hence the greater exposure to Python.
Of the places I've worked at or interviewed with for data science, one was fully
Python (though it has a high engineering bar for data scientists and very few
data engineers), and the rest had a reasonable split of R and Python. Your SO
will be fine either way, but might find R more intuitive and better suited to
statistics work.

------
fitzwatermellow
Dedicated probabilistic modeling environments are also gaining ground and may
become standard in the near future.

The holy grail is something like: feed in some data or parameters and have an
algorithm generate the corresponding correct Bayesian inference and posterior
distribution. It's very easy for scientists, even with years of knowledge and
experience, to implement things incorrectly ;)
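As a small illustration of what "generate the posterior" means, the sketch below computes the posterior for a success rate two ways: the closed-form conjugate Beta-Binomial update, and a brute-force grid approximation of prior times likelihood. This is not how Stan or Figaro work internally (Stan, for instance, primarily uses Hamiltonian Monte Carlo); it only shows the kind of computation such tools automate. The Beta(2, 2) prior and the data (7 successes in 10 trials) are made up.

```python
import math

def beta_binomial_posterior(alpha, beta, successes, trials):
    # Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior.
    return alpha + successes, beta + (trials - successes)

def grid_posterior_mean(alpha, beta, successes, trials, n_grid=10_000):
    # Evaluate unnormalized prior * likelihood at grid midpoints in (0, 1),
    # then normalize the weights to get the posterior mean.
    thetas = [(i + 0.5) / n_grid for i in range(n_grid)]
    weights = []
    for t in thetas:
        log_prior = (alpha - 1) * math.log(t) + (beta - 1) * math.log(1 - t)
        log_lik = successes * math.log(t) + (trials - successes) * math.log(1 - t)
        weights.append(math.exp(log_prior + log_lik))
    total = sum(weights)
    return sum(t * w for t, w in zip(thetas, weights)) / total

a_post, b_post = beta_binomial_posterior(2, 2, 7, 10)
print((a_post, b_post))                               # → (9, 5)
print(round(grid_posterior_mean(2, 2, 7, 10), 3))     # → 0.643
```

The conjugate posterior is Beta(9, 5) with mean 9/14 ≈ 0.643; the grid estimate agreeing with it is a cheap sanity check of the kind that catches hand-derivation mistakes.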

Check out Stan and Figaro:

[http://mc-stan.org/](http://mc-stan.org/)

[https://www.cra.com/technical-expertise/probabilistic-modeli...](https://www.cra.com/technical-expertise/probabilistic-modelingprogramming)

------
lnx01
Python or R, or a combination of both:
[https://en.wikipedia.org/wiki/R_(programming_language)](https://en.wikipedia.org/wiki/R_\(programming_language\))

------
tnmrnis
R is what companies that are leaving SPSS and SAS are switching to. While I
also like Python for data science, R seems to be more popular in the industry.

------
wkoszek
Python. It has all the libraries in the world and it's easy. Plus, you'll be
able to do some normal, general-purpose programming in it.

------
exyi
I'd also recommend having a look at F#. It's a functional language, so it
might be more intuitive for a mathematician to learn. It can also call R
functions, so you can use all the R statistical functions and packages.

------
yeasayer
Julia

------
Ilikeruby
Ruby

------
chintanshah24
Python IMO.

