Hacker News

R is certainly a unique language, but when it comes to statistics I haven't seen anything else that compares. I often see this R vs Python comparison made (not that this particular article has that slant) as a "come drink the Python Kool-Aid; it tastes better" pitch.

Yes; Python is a better general purpose language. It is inferior though when it comes specifically to statistical analysis. Personally I don't even try to use R as a general purpose language. I use it for data processing, statistics, and static visualizations. If I want dynamic visualizations I process in R then typically do a hand off to JavaScript and use D3.

Another clear advantage of R is that it is embedded into so many other tools. Ruby, C++, Java, Postgres, SQL Server (2016); I'm sure there are others.




> R is certainly a unique language

I'd say R is a _terrible_ language. Its types are just really different from every major programming language, and it's horrible for an experienced programmer to use.

I totally agree that R has fantastic libraries, but I'd like to see people focus on improving libraries for Python rather than sticking with R, which as a language is less well-designed than Python.

[I use R for most of my stats, I also use Matlab and Python]


I think you're wrong. R is an excellent language, targeted specifically around the problems you commonly see when doing data analysis. On the whole the standard libraries aren't particularly good, but I think the language is good.

That said, the language is often taught poorly. Here's my attempt to do better: http://adv-r.had.co.nz


Well, time to bring out my favorite dead horse to beat:

   - http://stackoverflow.com/questions/1815606/rscript-determine-path-of-the-executing-script
   - http://stackoverflow.com/questions/3452086/getting-path-of-an-r-script
(where you already commented, so it's not like this is something new...)

I would say that any language that does not have a facility to get the path of the current file is not 'excellent' under the criteria an experienced programmer would use for assessing it.

Now, I very well know that those criteria are different from what scientists use, but still...
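For contrast, here is a minimal sketch of the kind of facility meant here, as it looks in Python, where a script can read its own location from `__file__`. The script name `where_am_i.py`, the `HERE` variable and the `locate_script` helper are made up for illustration; the sketch writes a throwaway script, runs it, and checks that the path it reports is its own.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical script body: it records the absolute path of its own file.
SCRIPT_SOURCE = "from pathlib import Path\nHERE = Path(__file__).resolve()\n"


def locate_script():
    """Write the script to a temp dir, run it, return (actual path, path it reported)."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp).resolve() / "where_am_i.py"
        script.write_text(SCRIPT_SOURCE)
        # run_path executes the file and sets __file__ for it.
        script_globals = runpy.run_path(str(script))
        return script, script_globals["HERE"]


actual, reported = locate_script()
print(reported == actual)
```

In an ordinary script the whole thing reduces to the single expression `Path(__file__).resolve()`; the temp-file scaffolding is only there to make the sketch self-contained.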


I think R is a great language for certain applications - namely statistics and some data analysis. Your work has certainly made it better.

However, from a computer language design point of view, it leaves a lot to be desired. Its type system seems very complicated, and while the language tries to do what it thinks you want, it's not always clear what is going on (are you working on a matrix or a dataframe that has been cast into a matrix?).

For me, R is one of those languages that is good in a certain domain, but once you get out of that domain, it makes things more complicated than they need to be. It just isn't a general purpose language. By far, the biggest problems I've seen have been people who only know R (mainly stats people or biologists) trying to do something in R that would be a quick 10-line Python/Perl/Ruby/whatever script.

Normally for a language design, you aim to make easy things easy, and difficult things possible. For R, it seems like it makes difficult things easy and easy things difficult. Maybe that's the tradeoff that was needed. :)

That said - please keep doing what you're doing. You've made my R work vastly easier.


Hadley,

Thank you for all of your hard work! Keep on keeping on; your contributions have been phenomenal!


> as it explains some of R’s quirks and shows how some parts that seem horrible do have a positive side.

That sounds promising, I'll check it out, thanks.

I think R is a great tool, but I maintain that it is not a well-designed language by modern standards.


Could you give a couple of examples, where R is substantially superior to Python?


I'm not qualified to comment on how good or bad a language R is. But it is maddening how package developers don't follow some convention for naming functions. I load a package that I haven't used recently and I know the function I want but can't remember if it is called my_function, myFunction, my.function, or MyFunction. Google published an R styleguide, https://google-styleguide.googlecode.com/svn/trunk/Rguide.xm.... Does anybody follow it?


Definitely with you there. Even Perl has more consistency. And thanks for the guide link! :)


Hmm what do you mean about the types being different?

My experience was exactly the opposite -- the first time I saw R syntax (actually, it was S-Plus back then...), I thought it was the most intuitive and powerful system I'd ever seen -- this was after fairly extensive experience in C and C++, as well as a few others.

Now, I don't quite think so any more, because there are many rather tricky things buried under the surface (e.g. how many people really understand how exactly environments work?) -- but the majority of R programmers will never have to deal with them in their code...

Also, I have definitely done general-purpose coding in R -- for a lot of things it is completely adequate. Python has more general-purpose functions and libraries of course, similarly to how R has more statistical ones.


I've used Python for years and decided to teach myself R for a master's class I'm taking.

I have to disagree. Its main model is generic function method dispatching. It can feel odd at first to someone coming from the C++ style of OO where objects own methods, not methods owning objects. But it's a legitimate OO style with its own advantages. [1]

I've found the more I use R, the more intuitive a lot of its operations are. It's relatively easy to "guess" what you ought to do to accomplish what you want. More so than other languages I've learned.

1. https://en.wikipedia.org/wiki/Dynamic_dispatch
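For Python readers, a rough analogue of the generic-function style described above is `functools.singledispatch`: the generic function owns its methods, and the type of the first argument picks which one runs (R's S3 generics similarly dispatch on the class of the first argument). This is only a sketch; the `summary` generic and its methods are names invented for the example, not a port of any R function.

```python
from functools import singledispatch


@singledispatch
def summary(x):
    """Generic function: fallback used when no method matches the argument's type."""
    return f"object of type {type(x).__name__}"


@summary.register
def _(x: list):
    # Method for lists, registered on the generic rather than defined inside a class.
    return f"list of length {len(x)}"


@summary.register
def _(x: str):
    # Method for strings, added without touching the str class itself.
    return f"string with {len(x)} characters"


print(summary([1, 2, 3]))
print(summary("abc"))
print(summary(3.5))
```

The design point is that new methods can be attached to `summary` from anywhere, without editing the classes they apply to.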


When people argue that R is a terrific language, I remind them that it has 4 (four) object systems which differ from each other in subtle ways. It's a programmer's nightmare.

It's not the worst language in the world, but it isn't a terrific language either.


I'd also say the CRAN repository is awful: it discourages collaboration, and its packages are typically written by small groups of academics who write the worst documentation I have ever seen.


I blame the R documentation standards. They force a package author to produce a useless alphabetically-listed pdf, and many people just stop at that point.

Without any standards at all, people would have at least produced a readme.txt, which would have been a huge improvement -- e.g. I much prefer working with unfamiliar user-written Matlab packages :)


I don't know why so many people complain about R documentation, I think it's pretty good. The PDFs are useless for sure, but you don't have to use that. Emacs displays documentation pages in a split window. Or you can use a web browser.

https://stat.ethz.ch/R-manual/R-devel/doc/html/packages.html


Function docs are fine, but they are not really that helpful in figuring out how to use a new package.


It seems that you are in fact looking for vignettes. Examples of use:

library('zoo');
vignette('zoo');

#####

library('ggplot2');
vignette(package='ggplot2')


I am (sort of); but most packages don't have vignettes. Zoo and ggplot2 (and a few other major packages) have great documentation, but they are an exception.


It certainly needs a PyPI-like rating or popularity system.


As an experienced programmer who started using R in the early days, I feel that dealing with its quirks got me ready for the current modern languages.


Just to toss another name into the ring, I'd say that Fortran is pretty suitable for numeric calculations of all sorts.

I like R as a higher level language (or I guess tools like SPSS or preferably PSPP for even higher level stuff). These days I do most of my academia stuff with R (mostly hypothesis and equivalence testing and the things related to it like power analysis etc.)

I've never really looked into Python which is strange because I use it as a "glue language" quite often. I think I'll investigate Python a bit more next time I have to actually collect and clean up the data before using it. Right now I'm more of a consumer (mostly using data from our experiments that are turned into CSV)


Absolutely; modern Fortran is great and is syntactically rather close to Matlab (and to an extent R as well).

The main difficulty with Fortran is IMO the lack of an extensive standard library -- sure, you can find code out there to do almost anything, but then you need to figure out linking/calling conventions/possibly incompatible data models for each new library you bring in...

But, as another poster mentioned, it is quite straightforward to call Fortran from R :)


Matlab was originally released as a Fortran library (pre 1.0), so it keeps a lot of that heritage even though it's probably C/C++ now: http://www.mathworks.com/company/newsletters/articles/the-or...


Yeah -- although doesn't it actually pre-date Fortran 90? I wonder which direction the influence went :)


> Just to toss another name into the ring, I'd say that Fortran is pretty suitable for numeric calculations of all sorts.

> I like R as a higher level language (or I guess tools like SPSS or preferably PSPP for even higher level stuff). These days I do most of my academia stuff with R (mostly hypothesis and equivalence testing and the things related to it like power analysis etc.)

You can see R as some sort of glue language around libraries written in lower-level languages like C++, C or Fortran (I believe a large part, if not all, of the matrix functionality R uses for linear regression and statistical analysis (PCA) is written in Fortran).

Fortran code runs much faster, but you don't want to use it to do exploratory analysis ("I have this data about people; what if I filter out the people earning more than X before checking whether there is a correlation between the average age at which men get married and their incomes?").


> I'd say that Fortran is pretty suitable for numeric calculations of all sorts.

It is indeed. And R works with Fortran quite easily.


Could you provide an example in stat analysis where Python is clearly inferior? In the article, R seems to have the advantage of having many useful stat functions baked in vs having to import specific modules in Python. I'm wondering if your proficiency in R is being weighed in your evaluation of R -- maybe Python's statistical analysis tools have much to offer, but you are more aware of R's toolsets.


I'm primarily a Python user and can say that there's no contest that R has many packages that Python does not have an equivalent of yet. This includes stats stuff and especially finance/trading. Definitely not a showstopper for me but if I were to recommend one or the other to people at work with no programming skills, I would have to choose R for the breadth of existing packages.


>>but if I were to recommend one or the other to people at work with no programming skills, I would have to choose R for the breadth of existing packages.

My 2 cents: If someone has no programming background, then building a foundation on Python will allow them to do much, much more than building a foundation on R -- unless of course they only care about statistical analysis and have no inclination to code more generally. I learned both at the same time even though I had no use for Python at the time (was and still am a professor), but I use it almost every day now and very much enjoy it!


Agree completely. I should have qualified that: most people at my spot/industry (finance) would be using it as an Excel replacement and just want to get things done; hence the value of existing packages.


thank you for your reply


Also, ML academics tend towards R for reference implementations of novel algorithms. They are often available in R first. This cuts both ways; sometimes the Python implementation that comes later misses some subtleties of the R implementation that the original authors nailed, and other times the R implementation is a proof of concept, while a later implementation is more real-world ready. But the latest and greatest tends to be available in R long before it has made its way into e.g. SciPy.


I can't easily do GAMs or SEM in Python.


Great comparison. However, I find R's syntax obtuse and baroque. Like a shovel with a compartment that carries tweezers. Advocates tend to argue that for moving dirt, this 'R' shovel is far more precise than an ordinary 'Python' shovel. But Python is in fact more like the toolshed in which both tools are housed, plus a whole lot more.


I think the new packages from Hadley Wickham are beautiful and straightforward.

https://cran.rstudio.com/web/packages/dplyr/vignettes/introd...

End example from an airplane arrival and departure dataset:

flights %>%
  group_by(year, month, day) %>%
  select(arr_delay, dep_delay) %>%
  summarise(
    arr = mean(arr_delay, na.rm = TRUE),
    dep = mean(dep_delay, na.rm = TRUE)
  ) %>%
  filter(arr > 30 | dep > 30)
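For readers who don't read R, here is a rough sketch of the same three steps (group, summarise, filter) in plain standard-library Python. The four rows are made up purely for illustration and are not taken from the real flights dataset; `None` stands in for R's `NA`, and `mean_skipping_missing` is a hypothetical helper approximating `mean(x, na.rm = TRUE)`.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only: (year, month, day, arr_delay, dep_delay).
flights = [
    (2013, 1, 1, 40.0, 35.0),
    (2013, 1, 1, 50.0, None),
    (2013, 1, 2, 5.0, 10.0),
    (2013, 1, 2, None, 0.0),
]


def mean_skipping_missing(values):
    # Rough stand-in for mean(x, na.rm = TRUE): ignore missing values.
    present = [v for v in values if v is not None]
    return mean(present) if present else float("nan")


# group_by(year, month, day)
groups = defaultdict(list)
for year, month, day, arr_delay, dep_delay in flights:
    groups[(year, month, day)].append((arr_delay, dep_delay))

# summarise(arr = mean(arr_delay), dep = mean(dep_delay))
daily = {
    key: {
        "arr": mean_skipping_missing(a for a, _ in rows),
        "dep": mean_skipping_missing(d for _, d in rows),
    }
    for key, rows in groups.items()
}

# filter(arr > 30 | dep > 30)
delayed = {k: v for k, v in daily.items() if v["arr"] > 30 or v["dep"] > 30}
print(delayed)
```

Each step has to be spelled out by hand here, which is the contrast with the piped version above.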


Well yeah, and I use them, but they're a bandaid over the fundamental problem that just like in Perl, in R TIMTOWTDI. It's the classic 'we have 12 standards, time to make a unifying one - now we have 13' problem. I've sort of gotten used to it now, but it was majorly difficult at first for me (after having programmed for nearly 20 years) to get used to the concept that any task can be done in 20 different ways, each one just as 'valid' or 'easy' or 'maintainable' as the others. At least in C++ there are 20 bad ways to do something, and one good one - the way that Sutter covered in his columns. I know it's not quite fair to compare 'just' the C++ programming language to R and all its packages, but still.


Just curious, what in particular did you find obtuse?

It's not like R does not have obtuse and baroque parts, it certainly does, and their obtus-ity is rather high, but IMO they are not parts of the language a casual user would likely encounter...

On the other hand, Python has quite a few pitfalls itself -- but I suspect a casual user would, for example, run into Python default arguments a bit sooner than she would run into R environments :)
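A minimal sketch of the default-argument pitfall, for anyone who hasn't hit it yet. The function names are made up for the example; the behaviour shown is standard Python, where a default value is evaluated once, at function definition time.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` shares the same list.
    bucket.append(item)
    return bucket


def append_ok(item, bucket=None):
    # Common workaround: use None as a sentinel and build a fresh list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_bad(1)
second = append_bad(2)
print(second)            # [1, 2]
print(first is second)   # True
print(append_ok(1), append_ok(2))  # [1] [2]
```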


Using R from within Python works pretty well for all those unique R packages which don't have a Python equivalent.


Thanks; I suspected support for embedding R within Python already existed as well, but I wasn't sure about that one.


rpy2 is the Python library you probably want: http://rpy2.readthedocs.org/



