
ARrgh: a newcomer's angry guide to R (2013) - nkurz
http://tim-smith.us/arrgh/index.html
======
nkrumm
I gave up on R long ago. In Python, pandas, numpy, scipy, scikit-learn and
statsmodels are often enough to replace basic R functionality. When I need an
actual R package, I've found it's almost always worth the time to build a
wrapper on rpy2, or use IPython notebook's rmagic extension.

On the note of building wrappers: it's still a good idea to rpy2-wrap basic
statistical tests and present them in both ecosystems. The R functions are
battle-tested and have been looked over by far more statisticians and
mathematicians than their Python counterparts (or so it seems).

~~~
hadley
I gave up on Python long ago. Anything I want to do in Python, I can do just
as easily in R.

Rcpp provides a great environment for intermingling high performance C++ with
expressive R code. R has all the features you'd want for a modern development
environment: good IDE, unit testing, documentation conventions, ... It's easy
to turn analyses into interactive apps with shiny. There's a package for every
model you can think of. You can connect to databases, you can talk to web APIs
and scrape web pages.

~~~
jzwinck
R makes it hard for teams to collaboratively develop code in libraries. No, R
packages are not a solution, because each user has to "build" and "install"
them every time another user checks in an update.

R doesn't have a debugger on par with Python's pdb. It also doesn't show you
tracebacks by default when errors occur, which is a huge problem for mere
mortals when something goes wrong.

If you're working alone, using well-tested libraries, none of the above will
stop you. But if you work on a team, these are big problems (for which Python
has solutions built in).

Since you mentioned connecting to databases, let's talk about using MySQL in
R. I downloaded a popular package to do that, and found that the function to
query the database is called "fetch()". Just "fetch()", not "mysql.fetch()",
not "database_fetch()", not "connection$fetch()", but "fetch()". That kind of
sloppy naming is par for the course in R, and it's a problem when your project
becomes larger than yourself.

~~~
hadley
Most of those problems have existing solutions.

1\. Do you know about browser()? That gives you an interactive debugger on par
with most programming languages. Also see the GUI wrapper to the debugger in
RStudio.

2\. Share a library (a collection of installed packages). I'm not sure how
you're doing this with python libraries, but there's probably a
straightforward equivalent in R.

3\. Once MySQL implements the new DBI 0.3.0 interface, the function name
will be dbFetch(). But at heart the function is named that way because R uses
generic function style OO, rather than message-passing OO - it's nothing to do
with sloppiness. I'd recommend reading up a bit on the advantages and
disadvantages of each style of OO.
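
For readers more used to Python, the distinction in point 3 can be sketched with `functools.singledispatch`, which is roughly the generic-function style: one free-standing `fetch()` whose implementation is chosen by the type of its first argument. The classes `MySQLResult` and `SQLiteResult` are hypothetical stand-ins invented for this illustration; they are not part of DBI or of any real database driver.

```python
from functools import singledispatch


class MySQLResult:
    """Hypothetical stand-in for a MySQL result set."""

    def __init__(self, rows):
        self.rows = rows

    # Message-passing style: the method belongs to the object.
    def fetch(self, n):
        return self.rows[:n]


class SQLiteResult:
    """Hypothetical stand-in for a SQLite result set."""

    def __init__(self, rows):
        self.rows = rows


# Generic-function style: a single free-standing function; the
# implementation is selected by the type of the first argument.
@singledispatch
def fetch(result, n):
    raise TypeError(f"no fetch method for {type(result).__name__}")


@fetch.register
def _(result: MySQLResult, n):
    return result.rows[:n]


@fetch.register
def _(result: SQLiteResult, n):
    return list(result.rows)[:n]


mysql_rows = fetch(MySQLResult([1, 2, 3]), 2)    # generic-function call
sqlite_rows = fetch(SQLiteResult((4, 5, 6)), 2)  # same name, other type
method_rows = MySQLResult([1, 2, 3]).fetch(2)    # message-passing call
```

In the generic-function style a bare name like `fetch` is shared across every class that registers a method for it, which is why it carries no `mysql.` prefix.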

------
epistasis
>The documentation is inanely bad. I can't explain it.

I'm surprised that the author is saying this as I've experienced exactly the
opposite. R completely documents all the arguments and outputs of its
functions, and documentation is easy to pull up by function, and this is
almost universal both for distribution and community packages. Additionally
the documentation often includes vignettes that show full examples.

In contrast, Python documentation most often consists of long pages that
mention functions, but do not describe arguments or the output. I've found
almost no Python documentation to be adequate, outside of some of the core
functions. And when it is adequate, it's exceedingly verbose, and lacking in
examples, basically the worst of all worlds.

~~~
mapcar
I agree, I've read other gripes about R function documentation but it's one of
the better ones for community software. Python's documentation seems focused
on implementation from a programmer's perspective, but often not as helpful
for actual application of the function.

------
acqq
Wow, a language used to code formulas, in which it's dangerous to use
single-letter variables:

[http://tim-smith.us/arrgh/atomic.html](http://tim-smith.us/arrgh/atomic.html)

"This also means that you shouldn't ever assign useful quantities to variables
named T and F. Sorry. Other variable names that you cannot use are c, q, t
(!), C, D, and I."

Note the contradiction between that limitation and the name of the language.
Makes the name even more exceptional.

Is he right? What's with the scope? Can't I introduce a new T in my function
thus just hiding the global one from it, but otherwise not disturbing
anything? (I don't know R, I'm just asking, reading that the variables have
the function scope)

~~~
rcthompson
Yes, all those single-letter names are just ordinary variables that you can
overwrite. Doing so is nearly always a terrible idea.

The article is being a little unclear when it says "cannot use". You can use
literally any variable name in R if you really want to. If the name you want
is already a reserved word (e.g. "for", "else", "function"), or if it is not a
syntactically valid token (e.g. '@!":%$>"@;'), then you just have to enclose
it in backquotes. So the following is valid R:

    
    
        `for` <- 1:5
        `function` <- 5:1
        `TRUE` <- `for` / `function`
        `@!":%$>"@;` <- `TRUE`^2
        print(`@!":%$>"@;`)

~~~
mapcar
and if you overwrite variables like "c", you can always invoke the original
concatenation function as "base::c".
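
A rough Python analogue, offered only as a sketch for comparison: a name assigned inside a function hides the built-in within that function alone, and the original stays reachable through the `builtins` module, much as `base::c` does in R. The function name `demo` is made up for the illustration.

```python
import builtins


def demo():
    # Shadows the built-in `sum` inside this function only.
    sum = 10
    # The original is still reachable through its defining module.
    return sum, builtins.sum([1, 2, 3])


shadowed, total = demo()
```

Outside `demo()`, `sum` still refers to the built-in function.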

------
weissguy
If I ever strike it rich, I swear to god I'm donating $5,000,000 to the cause
of reaching total feature parity between the best of R's packages and
NumPy/SciPy.

~~~
otoburb
If you strike it rich, would you like to hedge a bit by betting a percentage
of the $5M donation on a promising horse in the race called Julia?

~~~
_almosnow
I'd really like Julia to become the winner of that race.

I've used everything but Octave (sorry Stallman :[) and coming from a CS
background, no other language/platform made me feel more at home than Julia.

------
alilja
The worst part of R is that array indices begin at 1, and trying to get an
array at 0 will fail silently by returning 0. I've spent many a night trying
to figure out why all my data is wrong because my_array[0] * frame$column is
returning the wrong numbers.

~~~
princeb
this gets brought up again and again with no agreement, and the only advice I
know how to give you is that when working in any language that has mathematics
as its primary focus (Mathematica, MATLAB, Julia, R), you use 1-index, and in
other languages you use 0-index.

~~~
redacted
alilja's issue isn't 1-indexing, it's R's completely insane decision to return
usable numerical values for an array access error. In Mathematica or Matlab
accidentally using 0-indexing would lead to obvious errors, while in R it
often doesn't, especially if your data is already sparse or has many zeros.
~~~
hadley
That's not true. If you index with 0 you get a zero-length vector back.

------
kephra
The worst point of R is debugging. R scripts often fail without reporting any
line number, even if the script starts with "options(error=traceback)". And I
have never seen line numbers for warnings.

So you get errors, crashes and warnings, and the only way to debug R is to
inject message() statements all over the code.

~~~
hadley
You probably want options(error = recover). If you're not getting line numbers
you need to upgrade R. You might find
[http://adv-r.had.co.nz/Exceptions-Debugging.html](http://adv-r.had.co.nz/Exceptions-Debugging.html)
helpful.

------
joyofdata
> The more you learn about the R language, the worse it will feel.

The opposite is true in my experience.

> R makes me want to kick things almost every time I use it.

Maybe R is not your biggest problem.

> The documentation is inanely bad. I can't explain it.

Good point!

