
Ask HN: Learn Python, R? Or something else? - socrates1998
Hi Hacker News, I am doing a statistics project for some American football teams and I think it's going to require me to learn R.

Should I just learn R right away? Or should I learn another programming language first (like Python), then learn R?

I have some limited experience in web development, doing web design and building websites mainly in WordPress. As such, I know some HTML, CSS, and PHP.

Just looking to see if I could learn R without knowing much else.
======
RockyMcNuts
You can learn R without knowing anything else. It's a good statistics package
but a dated and quirky programming language.

I'd consider doing the Stanford Statistical Learning class which starts off
with teaching you R.
[https://lagunita.stanford.edu/courses/HumanitiesSciences/Sta...](https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/info)

Also recommend Swirl, which is an interactive tutorial -
[http://swirlstats.com/students.html](http://swirlstats.com/students.html)

Or you can go straight to Julia, which is a modern statistical package and
language without R's quirks [http://julialang.org/](http://julialang.org/)

If you're eventually going to put your project on the Web, or just want to
learn programming the right way, might be better off doing it in python.

~~~
phillc73
What makes R a dated and quirky language?

I know a reasonable amount of R, but I don't really know any other programming
language. I did once go looking at Julia, and had thought to write a few
simple things using it, but in the end just ran out of time and did it with R.

Learning R has been one of the most challenging and enjoyable things I've done
over the last three years. However, I would be interested to know where
Python, Go or Julia would make my life better or easier.

I mostly write R scripts to analyse and graphically display interpretations of
data. There are so many contributed libraries, it feels like I am really
spoiled for choice.

I also write Shiny web apps. Is there a Python equivalent of Shiny? I have
heard of Django, but is that really as easy as just writing a Python script
and deploying on a server? I always thought Django was more of a complete
framework which also required server side coding.

Anyway, I would genuinely like to know what advantage Python or Julia or Go
would provide me with over just continuing with R.

~~~
RockyMcNuts
Julia claims performance closer to Fortran or C, with the expressiveness of a
high-level scripting language. It seems to have enough traction that it's
probably the way of the future.

R quirks ... the <- syntax, the ~ formula syntax, looping through a dataframe
is horrifically slow when you can't vectorize, multiple legacy object
semantics, poor parallel/multiprocessing support, poor support for datasets
that don't fit in memory ... upside is any statistical method probably has a
decent implementation in R

python - most comprehensive ecosystem and libraries beyond statistics, e.g.
Web, numerical and scientific computing, machine learning (Tensorflow), NLP,
generally a good language to learn to program in, pretty easy and forgiving
while also being reasonably expressive, performant, offering functional as
well as object oriented features/styles.

the good folks at plotly are working on a shiny equivalent (dash) but it's not
out yet. Django + matplotlib or bokeh or some client-side graphics like
plotly.js is potentially powerful but not really as integrated as shiny.

~~~
kprybol
Julia's biggest hurdle is the lack of well functioning DataFrames (or the
current fork, DataTables). Tons of issues around nullable arrays, etc. have
really slowed progress. I do think it's got a ton of upside, but I've found
reimplementing my R or Python scripts in Julia to be too much of a
hassle. Costs of reimplementation greatly outweigh the not insignificant gains
in speed.

Also check out this article on updates to R 3.4. R tends to be fast enough for
most work (I use it regularly on one-off analysis or things that won't ever
make it farther than ad-hoc reporting/findings but can't imagine using it in
production systems). The listed changes should go a long way towards making R
just fast(er) enough for dealing with larger datasets (doesn't help with
datasets larger than memory though). For large datasets all the momentum seems
to be moving towards Spark (sparklyr is RStudio's SparkR integration. Very
much a beta but getting better by the day). On the Python front, Dask is
awesome for larger-than-memory computation and has no equivalent in R.

~~~
mindcrime
_For large datasets all the momentum seems to be moving towards Spark
(sparklyr is RStudio's SparkR integration._

Worst case, you can always use MPI with R and run on a Beowulf cluster. Of
course that might not help if you want to use a function from a library, and
the library itself expects everything to be in memory on one node, but at
least it gives you another option for parallelization.

~~~
kprybol
Absolutely, though as you mention, removing the ability to use packages and
the necessity of writing statistical code that properly accounts for data
being spread out across multiple nodes would likely be out of the reach of
your everyday/typical R user. An open sourced alternative to Revolution
R/Microsoft R Server's out of core processing backend + distributed analytics
packages would be a huge addition to the R language.

------
mindcrime
If you definitely need to use R, I'd say just learn R. R is different enough
from most other languages that I don't think you'd get a lot of value from,
say, learning Python first.

Why do I say that? First of all, R's syntax is quirky and different enough
from (Python|Java|C|Ruby|etc.) that you might almost find it harder to learn R
if you're already used to something else. Second, aside from the syntax the
biggest thing to get used to with R is that it's very much vector oriented.
Basically you're always working with vectors, even when you only have what you
would otherwise think of as a single scalar value. You just put in a vector of
length 1. Anyway, that whole paradigm is different enough from other
programming languages, that you might as well just learn it that way from the
beginning.
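
To make the "everything is a vector" point concrete, here is a rough sketch in Python of how R treats a single value as a length-1 vector and recycles it across a longer one. The function name `recycle_add` and the numbers are made up for illustration; R does this natively (and warns when the lengths don't divide evenly, which this sketch does not).

```python
from itertools import cycle, islice

def recycle_add(a, b):
    """Element-wise a + b, recycling the shorter list the way R recycles vectors."""
    if not a or not b:
        return []  # a zero-length operand gives a zero-length result
    n = max(len(a), len(b))
    # cycle() repeats each list forever; islice() stops at the longer length.
    return [x + y for x, y in islice(zip(cycle(a), cycle(b)), n)]

yards = [3, 7, 12]   # roughly c(3, 7, 12) in R
bonus = [5]          # a "scalar" is just a vector of length 1

print(recycle_add(yards, bonus))  # → [8, 12, 17]
```

In R itself the equivalent would simply be `c(3, 7, 12) + 5`, which is why the paradigm feels different from writing explicit loops.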

Now to be fair, there are libraries and things that let other languages act
and feel a bit more like R, but I'm intentionally not considering those right
now, as that would just be one more complication to deal with. And if you are
locked in on using R for whatever reason, there's no need to complicate life.

The only other question I would have, is whether or not you absolutely must
use R at all. If you have the option to choose your language, you can do
pretty much anything that you can do in R, using Python, or Octave, or
probably many other languages. If that's an option, then you just need to
decide which would be easier / more useful for you. And while I won't take
sides in general, I will say that Python _may_ be a little bit easier to learn
in general, but then you're back to using external libraries for more of the
statistical / numerical stuff.

 _Just looking to see if I could learn R without knowing much else._

My guess is that you can. R has some quirks, but there's nothing especially
scary about it. Depending on how much you already know about statistics, you
may find that learning and understanding the math is more difficult than
learning to use R.

------
mateo411
What dataset are you using? I would be interested in checking it out.

If you don't know R or Python, I would say that learning Python might work out
better for you. Python is a general purpose programming language, whereas R is
really good at stats and visualization. Python is also pretty good at this,
you can use pandas, matplotlib, and scikit-learn.

~~~
socrates1998
It's a string of qualitative data that I want to interpret, like the team runs
the ball Right, then Right, then Left.

My goal is to be able to analyze the next play and possibly give a probability
attached to where the coach will call the play.

But that's just the first step. I would probably need to do a lot of different
stuff with the data.
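
As a rough idea of that first step, here is a minimal sketch in Python that counts which play follows which and turns the counts into probabilities. It assumes only the previous play matters (a first-order Markov chain), which is a simplification, and the play sequence is made up rather than taken from the real data.

```python
from collections import Counter, defaultdict

def transition_probabilities(plays):
    """Estimate P(next play | current play) from consecutive pairs."""
    counts = defaultdict(Counter)
    for current, following in zip(plays, plays[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for current, nexts in counts.items()
    }

# Made-up sequence for illustration only.
plays = ["Right", "Right", "Left", "Right", "Left", "Left", "Right", "Right"]
probs = transition_probabilities(plays)

print(probs["Right"])  # → {'Right': 0.5, 'Left': 0.5}
```

With real data you would probably want to condition on more than the last play (down, distance, field position), but the counting idea stays the same.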

~~~
mateo411
Is it possible for me to get the data? I can trade you 11 years of NFL point
spreads.

~~~
socrates1998
[https://www.dropbox.com/s/3m9mr24rahwwzoz/St%20Thomas%20Aq%2...](https://www.dropbox.com/s/3m9mr24rahwwzoz/St%20Thomas%20Aq%20Break%20down%20data%202016%20copy.xlsx?dl=0)

~~~
mateo411
Got it. Can you send me an email? I would like to work with you on this. My
email is in my profile.

~~~
socrates1998
Sorry, I can't see your email in your profile, it might be because I don't
have enough karma on HN.

My email is mtgprivatelearning@gmail.com

------
syntexis
Learn R first, it's ideal for your project and not difficult.

------
kasperset
R should not be ignored if you are doing statistics. It has first class
support for stats built in plus there are so many packages available to do
the heavy lifting.

------
dec0dedab0de
If you're just analyzing the data R seems to be the right choice. If you're
going to need to gather and parse the data then Python is a much better
general purpose language.

------
clumsysmurf
I've been asking myself the same question. I often hear of Go as a better
Python, so I was hoping to find a good numerical / statistical story for Go
...

Bruce Eckel had a blog about this, I think you'll find the discussion around R
/ Python / Mathematica interesting

[http://bruceeckel.github.io/2015/02/15/why-not-go-there/](http://bruceeckel.github.io/2015/02/15/why-not-go-there/)

------
77ko
I started and found Python easier than R. Python is a lot more 'English
readable' while R is more like the code you see on Hollywood screens, somewhat
indecipherable with magic incantations.

As a starter, you probably need something like dataquest[0] or Udacity's[1]
data courses.

[0] [https://www.dataquest.io](https://www.dataquest.io) [1] www.udacity.com

------
anotheryou
And if Python: 2 or 3?

(I'm kinda decided and want to do a bit of statistics, but also have one
general purpose language I know a bit better)

~~~
Sir_Substance
R is good for statistics and nothing else. It is, however, really good at
statistics.

Python is more general purpose. If you want to do things that are not
statistics in the future, python makes more sense.

If you're going to get started with python, get started with python 3.

~~~
drewrv
One thing to consider is that most data projects require a fair bit of
manipulation or munging, and Python is particularly good at that kind of
stuff.
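
For a flavour of what that munging looks like, here is a small sketch using only the Python standard library. The column names and the messy sample rows are assumptions made up for illustration, not the layout of any real play-by-play file.

```python
import csv
import io
from collections import Counter

# Made-up raw export with inconsistent case, stray spaces and a missing value.
raw = """down,direction,yards
1, right ,4
2,LEFT,
3,Right,11
1,left,-2
"""

def clean_rows(text):
    """Normalise direction labels and turn yards into ints (None if missing)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        yards = row["yards"].strip()
        rows.append({
            "down": int(row["down"]),
            "direction": row["direction"].strip().capitalize(),
            "yards": int(yards) if yards else None,
        })
    return rows

rows = clean_rows(raw)
# Tally how often each cleaned-up direction appears.
print(Counter(r["direction"] for r in rows))
```

Once the labels are consistent, counting or grouping them is a one-liner, which is most of the appeal.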

------
brunosaboia
As others said, R is the way to go if you decide to work solely with
statistics. My opinion towards R is that it needs some improvements, sometimes
it feels like a language made for prototyping only. But that is maybe its
goal, so in the end it could be a good thing to match academic needs.

------
eb0la
For me the best part of R is documentation.

Using Python for data will help you learn something that can be used for
more... but be warned that Dataframes on stuff like Spark don't work with list
comprehensions (my favorite Python feature).

------
busterarm
If this is a one-off and you don't expect to need to do more of this kind of
thing, I'd say learn Python, otherwise R.

Fortran is always fun though. :)

------
Mikeb85
Yes. Learn R. Download RStudio. If you do statistics nothing beats it.

~~~
jonbaer
Rodeo is an equivalent IDE for Python -
[https://www.yhat.com/products/rodeo](https://www.yhat.com/products/rodeo) ...
also Jupyter Lab (notebooks) ...
[https://github.com/jupyterlab/jupyterlab](https://github.com/jupyterlab/jupyterlab)

~~~
kprybol
Worth mentioning that Rodeo is still unstable and definitely not a
feature-for-feature equivalent of RStudio (I also worry about speed as the size of the
project grows as it's based on the same backend as the Atom IDE which has been
dogged by speed complaints almost from day one). As for Jupyter Lab, the
readme itself says that it's not yet ready for general use. Currently there is
no true Python equivalent to RStudio (unfortunately).

~~~
jimmyswimmy
Spyder is reasonably close, unless I'm missing some major feature present in
RStudio.

~~~
Mikeb85
Can't speak for Spyder, but one killer feature of RStudio is the ability to
easily make C++ modules that your R project uses (Rcpp rocks - you can ignore
most of the annoying things about C++). They compile and link with zero
effort. Also the ability to easily install modules. And browse documentation.
And create projects in R or RMarkdown. Create modules, basic scripts, web
apps. And publish things. And see your data in a spreadsheet format. And
inspect all your objects/data/functions. All in one place.

Aside from RStudio, I love R's lispy/apl-ish semantics, and easy integration
with Fortran/C++ (and a host of other languages, including Java, Haskell,
Ruby, Prolog, Lisp, etc...).

I'm sure Python is great, but R is made for stats, is amazing, and RStudio
beats every IDE I've ever used, regardless of language.

------
probinso
R is a bad programming language. Learn R.

