
R, the master troll of statistical languages (2012) - misiti3780
http://www.talyarkoni.org/blog/2012/06/08/r-the-master-troll-of-statistical-languages/
======
Gravityloss
The problem is that here the "generic" tools like "apply" are not really
generic. Hence you can not use your intelligence and creativity to deduce a
working program using a small toolset that you know fully.

Instead you must use a huge number of special tools that do only a few things.
The code is hard to read, hard to write, slow, the compiled version is big. It
is also error prone since you must use a large number of different paradigms.
Some tools might take their arguments slightly differently. The code gets
produced by googling for everything (or, if there's a decent built-in help
system, by using that).

It seems strange that such concepts, like generic data types, operators and
functions are not more widely spread. For example if you can calculate
"max(a-b)" even if "a" is a matrix and "b" is a scalar, that's already quite
nice. This can increase productivity a lot.
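
As a rough illustration of that kind of genericity (a minimal Python sketch, not code from the thread; `subtract` and `deep_max` are hypothetical helpers made up here), a single subtraction routine can accept scalars and nested lists on either side, so "max(a-b)" works whether "a" is a matrix or a plain number:

```python
from numbers import Number


def subtract(a, b):
    """Elementwise a - b; a scalar on either side is broadcast over a nested list."""
    if isinstance(a, Number) and isinstance(b, Number):
        return a - b
    if isinstance(a, Number):
        return [subtract(a, y) for y in b]
    if isinstance(b, Number):
        return [subtract(x, b) for x in a]
    if len(a) != len(b):
        raise ValueError("shape mismatch")
    return [subtract(x, y) for x, y in zip(a, b)]


def deep_max(x):
    """Largest element of a scalar or an arbitrarily nested list."""
    if isinstance(x, Number):
        return x
    return max(deep_max(item) for item in x)


a = [[1, 5, 3],
     [7, 2, 9]]  # a 2x3 "matrix" stored as nested lists
b = 4            # a scalar

print(deep_max(subtract(a, b)))  # largest entry of a is 9, so this prints 5
```

The point is only that the caller never converts types by hand; real array libraries implement the same idea far more efficiently.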

Because such design decisions are so fundamental, they are rarely found in
"matlab-like" extensions for other programming languages: instead you must
constantly do type conversions and transformations by hand and use awkward
middleman functions to access data types. Most of your code ends up being
housekeeping, with little actual progress.

See for example the problems with Julia where column and row vectors are
totally different types: <http://2pif.info/op/julia.html>

~~~
pygy_
_> See for example the problems with Julia where column and row vectors are
totally different types: <http://2pif.info/op/julia.html> _

That was one year ago. They are still different types, but they behave as you
would expect with respect to arithmetic.

    
    
        julia> x = [1,2,3]
        3-element Int64 Array:
         1
         2
         3
    
        julia> y = [1 2 3]
        1x3 Int64 Array:
         1  2  3
    
        julia> y'
        3x1 Int64 Array:
         1
         2
         3
    
        julia> x + y'
        3x1 Int64 Array:
         2
         4
         6
    

Their size and number of dimensions still differ, but that is mostly a
concern for library writers.

    
    
        julia> ndims(x)
        1
    
        julia> ndims(y)
        2
    
        julia> ndims(y')
        2
    
        julia> size(x)
        (3,)
    
        julia> size(y)
        (1,3)
    

See [https://groups.google.com/forum/?fromgroups=#!msg/julia-
dev/...](https://groups.google.com/forum/?fromgroups=#!msg/julia-
dev/W2C9-HbXH3o/P8A8jo7FHSoJ) for the discussion that resulted from that post.

 _Edit:_ I should add that there's now a mailing list dedicated to statistics
with Julia: <https://groups.google.com/forum/#!forum/julia-stats>

~~~
alok-g
This is pretty much the only thing that I do not like about Julia. Matlab also
uses this concept, and I have had my Matlab code broken several times because
of this issue.

~~~
foobarqux
Trailing singleton dimensions in N-D arrays were a far worse problem, which I
believe Julia fixed.

------
minimax
R has a rich heritage of trolling unsuspecting programmers coming from other
languages. In ancient versions of R from the early 2000s, the underscore
operator was synonymous with '<-', aka assignment. This is why in R some use
periods for separators (ice.cream) rather than underscores (ice_cream), though
just to make things more lulzy some package authors have started to use
Java-style camel case (iceCream) too.

I saw an example of R code using underscore for assignment posted on twitter a
couple of months ago. See if you can make heads or tails of this:
[http://www.stat.washington.edu/hoff/Code/hoff_raftery_handco...](http://www.stat.washington.edu/hoff/Code/hoff_raftery_handcock_2002_jasa/dist.r)

~~~
pygy_
_> This is why in R some use periods for separators (ice.cream) rather than
underscores (ice_cream)_

... and since the dot is legal in identifiers, they use $ as the lookup operator.

    
    
        > ice.cream$flavor
        [1] "chocolate"    "vanilla"    "butterscotch advocado surprise"

------
guylhem
This is the very same criticism that was (and still is) made about Perl.

 _> > I won’t bother to explain all of these; the point is that, as you can
see, they all return the same result (namely, the first column of the
ice.cream data frame, named ‘flavor’)._

Having many ways to do one thing is good - whatever floats your boat, you know.
Just for the given examples, I see how one may be better in a loop (the one
where you use 1) while another one may be better when you type it on the
command line to check stuff (the one with flavor).

 _> > The answer is that when you’re trying to learn a new programming
language, you typically do it in large part by reading other people’s code–_

No. You learn by doing. You write some stuff, test it, and if you don't get
the result you expect go back to try and figure out what's wrong.

And even before doing that, you set aside some time to _LEARN_ about the
language.

But maybe, just maybe, the problem is not with the tool but with the person
using it?

 _> > I have to confess that I’ve never set aside much time to really learn it
very well; what basic competence I’ve developed has been acquired almost
entirely by reading the inline help and consulting the Oracle of Bacon Google
when I run into problems. I’m not very good at setting aside time for reading
articles or books or working my way through other people’s code (probably the
best way to learn), so the net result is that I don’t know R nearly as well as
I should._

At least that's honest. Maybe you don't do well with a language (any
language!) because huh, you didn't take time to study it and expect it to
magically work??

I know I'm no good at Java, but I also know why - I never took some time to
actually learn it. I can do things with Java, but not very complex things, and
when I fight my way out of a problem I caused, I won't blame Java but myself
and my lack of knowledge of Java.

~~~
Locke1689
_No. You learn by doing. You write some stuff, test it, and if you don't get
the result you expect go back to try and figure out what's wrong._

No. This is how you learn to do whatever simple things you want in a language.
The way you learn how to do things _correctly_ is mainly by reading code. If
you have an expert to review your code that's better, but almost no one has
that opportunity.

Edit:

I guess I should come right out and say what I'm thinking: not everyone's
opinions about a programming language are equal. I suppose this is why people
are constantly having discussions about the pluses and minuses of "dynamically
typed" languages, while type theorists don't even recognize "dynamic typing"
as a form of typing. Expert analysis has consistently shown problems with the
R language design and implementation. It's not a good language. Its features
are often misused or poorly used and it doesn't have a strong sense of what
support it wants to give to its users.

[http://channel9.msdn.com/Events/Lang-NEXT/Lang-
NEXT-2012/Why...](http://channel9.msdn.com/Events/Lang-NEXT/Lang-
NEXT-2012/Why-and-How-People-Use-R)

<http://www.cs.purdue.edu/homes/jv/pubs/ecoop12.pdf>

~~~
TrevorFancher
I've been interested in type theory for a while, but haven't found a good
avenue for getting into it.

Could people here list any resources on type theory that come to mind?
Books, blogs, people, etc. A book that explains the fundamentals would be
great.

~~~
benbataille
While not being exactly about type theory like Pierce's books (you can't waste
your money on those), I really like the part about types in "The
Implementation of Functional Programming Languages" by Peyton Jones (the
chapters about the type system are written by Peter Hancock). It's seen mostly
from a technical point of view. The book notably contains a fully explained
implementation of a type checker.

You can check it online while waiting for the Pierce ones:
[http://research.microsoft.com/en-
us/um/people/simonpj/papers...](http://research.microsoft.com/en-
us/um/people/simonpj/papers/slpj-book-1987/PAGES/V.HTM)

~~~
benbataille
Sorry for the ambiguity. It's certainly not a waste of money. I found "Types
and Programming Languages" really good (for whatever that means, I'm far from
being an authority on the subject). It seems to be widely used as a textbook
and is a reference in the subject. While I didn't read it, I expect the second
one to be in the same vein.

------
mwexler
R is a great set of statistics wrapped in a crime of a programming language.
You have to fight the latter to get to the former. Is the fight worth it? I
and many others say yes... but it sucks that we even have to make that
decision.

------
jamesjporter
The metaphor I use to explain R is that most programming languages are like
different varieties of swiss army knife: general tools that can do a lot of
different stuff with relative ease. R, on the other hand, is like a fillet
knife: it's totally obtuse for _most things_ , but in spite of its oddities it
outshines all other choices at one specific task (statistical data analysis /
filleting a fish respectively). Depending on what sort of camping trip you're
going on you might want to bring one or the other or both.

------
alexholehouse
Language issues aside, R is being propelled by an _awesome_ group of
developers (such as Hadley Wickham, Dirk Eddelbuettel, John Myles White, Brian
Ripley etc). These people are why R is as successful as it has become - both
through continued work on various aspects and packages and direct interaction
with users new and old. Frankly, there should be some kind of slightly awkward
parade for them all.

~~~
cschmidt
I'd say that R is awesome, and that R is a fairly annoying language, all at
the same time. (I'll go back to writing some R code now.)

------
jimmar
I think R's biggest strength is the package ecosystem. I would not trade
the large community of active package developers for slightly friendlier
syntax.

~~~
carterschonwald
Indeed, the ecosystem of actively developed numerical libs is the only upside
to R over other tools. It's a pretty large cliff for any other tool chain to
climb over.

\--someone who's writing numeric/ data analysis tools in Haskell.

~~~
pilgrim689
Off-topic but: Can you comment on the significance of this cliff for a team
considering moving from R to Haskell for data analysis? Is the availability
for statistics packages really sparse in Haskell?

~~~
carterschonwald
The answer is: it depends! Shoot me an email at Wellposed and I can try to
better answer your question.

I am quite literally building a full data analysis stack (as a product) in
Haskell, some parts of which will be available as a sort of proprietary
augmented version of the Haskell Platform, and some parts are / will be open
source.

I do think that there are compelling reasons to consider Haskell / GHC for
analytical workloads, but it really depends on the details.

The principal cliff is just the HUGE number of (mostly poorly designed)
libraries for many standard analyses written in R. There are some nice
engineering approaches to circumvent this, and there are some really exciting
libs that are uniquely awesome and handy in Haskell land.

A notable example is AD, a really easy to use auto differentiation lib by
Edward Kmett, which has a really exciting refactor that's nearly done that will
make it usable by mortal Haskellers :)
<http://hackage.haskell.org/package/ad> and <https://github.com/ekmett/ad>
(I've some neat bits I'll hopefully be adding to AD myself in the next month)

~~~
pilgrim689
Thanks! I'll keep an eye out for these releases. :)

------
dnc
When I think about R I find it difficult to stop making analogies between it
and JavaScript. Both are dynamic, "script" languages with C-based syntax and
with LISP-like nature that lurks underneath it. Both have functions as first
class citizens. Both are inconsistent in different ways. Those are just the
few that first come to mind ... Once I "discovered" these analogies, they
reduced the frustration I had felt about R and helped me to adopt and learn
it.

~~~
rck
That makes sense - they were both inspired by Scheme (R is in many ways just a
Lisp that uses M-expressions), but evolved to deal with immediate concerns
that, in retrospect, have created difficulties as the years have gone by. It
makes R and JavaScript great and frustrating at the same time.

------
tocomment
So why not just use Python these days?

~~~
rm999
At this point, mostly library support. Python is quickly getting closer to
being a viable R replacement, but it's simply not there yet IMO. The two
biggest holes in python for me are:

* The lack of built-in dataframes and libraries to work on them (like plyr). Pandas seems to be getting pretty good, but it's still not as mature as R's solutions.

* Visualization. ggplot for R is great, matplotlib for python not so good IMO. I've heard Bokeh and rplot are attempting to bring ggplot functionality to python. Again, not nearly as mature as R's solution.

I'd love to move to Python because R is not a fun language to develop software
in. But at this point, R is by far the better tool for working with data (for
my needs at least).

~~~
hadley
I'd love to hear what makes R not-fun to develop in. I'm always on the
lookout for common pain points.

~~~
rm999
A large pain point for me is remembering the huge number of useful commands in
R - I use at least 5-10x as many keywords in R as I did in C++. As I mention
in another comment I basically need a reference (usually google and ?) at all
times to keep track of these commands and their parameters. In many ways it's
great that all these functions exist because I can analyze datasets an order
of magnitude more quickly (in development time) than I could in a more
traditional language, but developing in R is far more stressful to me. When
I'm writing production-level code I constantly have to worry about readability
and whether someone else who is reading my code will intuitively understand
what I'm doing. With so many commands and so many ways to do things this can
be challenging. I find my R code is often lower 'quality' than what I write in
something like Python or C.

There are many little idiosyncrasies in R's syntax, I feel like I never
grokked the language. For example, pretty much anytime I see '~' I have to
relearn what is going on. From a mathematical perspective I appreciate that
vectors are indexed from 1 instead of 0, but from a programming perspective it
can be annoying.

BTW, thanks for contributing so many great things to R, I owe a lot of what I
do to you.

~~~
hadley
I think a big problem is not just the language, but how it's usually taught -
in other words, I think you shouldn't need such a large vocabulary if you
learn the right primitives (which don't always exist in base R). The lack of a
solid common foundation is also what makes code readability a challenge.

Part of the problem is the base packages: as soon as you open R you have ~1600
functions you can use, and you obviously can't memorise a significant
proportion of them. Learning R is as much about what you don't learn as what
you do.

~~~
rm999
Great point, I'll have to reevaluate how I use R. My training in R was very
informal and mostly involved reading other people's code, so my "vocabulary"
is probably a union of other people's and is probably too large.

Actually one thing that has helped in the past year was reading your
split/apply/combine paper and using the plyr package more.

~~~
hadley
I'm working on a curriculum for basic programming in R, hopefully that will
eventually help outline what everyone should know about R, regardless of what
they use it for.

------
gmac
_R has a million related built-in functions like sapply(), tapply(), lapply(),
and vapply()_

Exactly that. I've always found R a horribly confusing mess, compared to
general programming languages (I'm proficient in Ruby, JS, Obj-C).
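
For readers coming from those general-purpose languages, here is a rough sketch of how three members of the family differ, written as hypothetical Python analogs (these helpers only mimic the R names; they are not R's implementations, and this `vapply` checks only the result type, whereas R's FUN.VALUE template also checks length):

```python
from collections import defaultdict


def lapply(xs, f):
    """Analog of lapply(): always returns a list, one result per element."""
    return [f(x) for x in xs]


def vapply(xs, f, expected_type):
    """Analog of vapply(): like lapply(), but fails loudly on an unexpected result type."""
    out = []
    for x in xs:
        result = f(x)
        if not isinstance(result, expected_type):
            raise TypeError(
                f"expected {expected_type.__name__}, got {type(result).__name__}"
            )
        out.append(result)
    return out


def tapply(values, groups, f):
    """Analog of tapply(): apply f to the values falling in each group."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {group: f(vals) for group, vals in buckets.items()}


scoops = [3, 1, 5, 2]
flavor = ["chocolate", "vanilla", "chocolate", "vanilla"]

print(lapply(scoops, lambda n: n * 2))
print(vapply(flavor, len, int))
print(tapply(scoops, flavor, max))
```

sapply() sits between the first two: it behaves like lapply() and then tries to simplify the result to a vector or matrix, which is convenient interactively but makes the return type depend on the data.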

On the other hand, compared to some of the other commercial stats packages,
it's beautiful and logical and reasonable. I regularly use Stata, where you're
only allowed one data table in memory at once, and almost everything relies on
side effects and Byzantine macros. E.g., want to calculate a mean? First,
'summarise' the variable, then assign 'r(mean)' to a var name, then quote that
the right way to be substituted into an expression where it's needed.

------
camus
Let's stop using the word 'troll' at this point, ok? It's clearly an insult.
It doesn't make sense to call a language that way.

------
WayneDB
This was actually a fantastic introduction to an important aspect of R which I
had only briefly explored in the past.

