
Advanced R programming - iamtechaddict
http://adv-r.had.co.nz
======
danso
> _Although R has its quirks, I truly believe that at its heart it is an
> elegant and beautiful language. While R is a fairly mature language, we are
> still learning how to craft elegant R code: much code seen in the wild is
> written in haste to solve a pressing problem, and has not been rewritten to
> aid understanding._

First of all, what a great endeavor by Hadley...if "all" he had done was
produce ggplot2 (and write a great book about it), that's enough to cement his
elite status. However, what I don't get is... _why R?_ After a few days of
hacking, I was able to produce some nice graphics with ggplot2, but I have to
say that it was by far the hardest high-level language I've had to learn as a
programmer...I haven't used it enough to _love_ it, so I'm not at the stage that
I am with JavaScript. That is, I know of JavaScript's problems but know of the
strengths that sometimes derive from its weirdness...and of course, JS is too
ubiquitous to just ignore. However, with R, it just seems some of its quirks
are just _bad_.

I guess my question is aimed more at the angle of: _how does R do the things
it does so well?_ ggplot2 is great enough to learn R for it alone. And some of
the data munging methods, such as `melt`, don't seem to have a well-supported
port in all the other popular languages. I know that Python's pandas has
one...Ruby does not. Is there something about R the language that makes it
especially good at its data and statistical methods (in the way Matlab is
geared toward matrix manipulation)? Or is it just that R was so heavily
adopted by the stats community that, if they had picked another language, that
language would have functionality just as great as R's?
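
For anyone who hasn't met `melt`: it reshapes "wide" data (one column per measurement) into "long" data (one row per measurement). Below is a minimal sketch of that idea in plain Python, using only the standard library. The function signature, the `variable`/`value` column names, and the sample data are assumptions made up for illustration; this is not how reshape2 or pandas implement it, and the output row order here (record by record) may differ from theirs.

```python
def melt(rows, id_vars, var_name="variable", value_name="value"):
    """Reshape wide records into long form.

    Each input record yields one output record per non-identifier column.
    """
    long_rows = []
    for row in rows:
        # Identifier columns are copied unchanged onto every output row.
        ids = {key: row[key] for key in id_vars}
        for column, value in row.items():
            if column in id_vars:
                continue
            long_rows.append({**ids, var_name: column, value_name: value})
    return long_rows


# Hypothetical wide table: one temperature column per month.
wide = [
    {"city": "Oslo", "jan": -4, "jul": 17},
    {"city": "Cairo", "jan": 14, "jul": 28},
]

long_form = melt(wide, id_vars=["city"])
for record in long_form:
    print(record)
```

The long form is what plotting tools in the ggplot2 style tend to want, since a single `variable` column can then be mapped to color or facet.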

Note: I suffer from selection bias, though...a lot of the people I chat with
are data scientists, where R is so ubiquitous. It may be that Python pandas is
_just as good_ as the R libraries, but I just know more R-users than Python-
users.

~~~
jzwinck
Python's NumPy, SciPy, Pandas, Matplotlib, SciKits, and StatsModels are very
formidable, and have most of the good stuff R has, plus Python itself has a
lot more good stuff (from Boost Python to really basic stuff like argparse),
minus some horrible stuff that R has (such as the affinity for global
functions like `rm()` which seem to be named like Unix tools but which do
other things, or the `c()` function which is impossible to Google for, or the
abysmal default error reporting, or the use of dots in variable names).

But R has some things going for it. There are some algorithms and tools which
exist in R but nowhere in Python (this set seems to both shrink and grow over
time as both languages add more stuff). R's overly-terse syntax for some
things is annoying for maintainers of R code, but R hackers enjoy it because
they tend to be all about banging out piles of stuff quickly.

R also comes with a lot of stuff included that in the Python world would fall
under many different umbrellas (see the several names I mentioned at the
beginning--those are just some of the basics). Whether it's true or not, R
users perceive Python as being relatively balkanized, with that long list of
packages just to get started, and with the Python 2 vs. 3 divide which has
plagued it for years and will continue for a while still.

~~~
tfigment
My experience with R is about 2 years old but your comments are spot on. I
selected R initially because it had the only autoregressive-moving-average
(ARMA) calculation that was both good and fast, which my users had requested
for some data extrapolation. I could see its promise but I'll be
damned if it wasn't the most annoying language to use for general things like
accessing a database to get the data. I eventually got everything to work
but it was not easy to automate and deploy.

Ultimately the ARMA calc didn't do what they wanted mostly because ARMA was
the wrong thing to use on the dataset in the first place, IMNSHO. This could
be my general lack of experience with R, but I've been programming for 15+ years
and it was one of the rougher languages to work with.

Anyway, I ported the code to Python with numpy, scipy, and scikits (most
significantly the time series stuff), and it was much easier to pull in the
data, apply smoothing filters, and do some general data cleanup. But ARMA was
nowhere to be seen, so I settled for simple linear and quadratic fits, which I
think did a better job of forecasting. I really liked some things that R did
automatically, like adding confidence intervals to the forecasts when trending
data. I was actually tempted to port the ARMA libraries to Python over this
but didn't want to dedicate the time to debug and validate it. R was really
good for interactive manipulation but Python was better for actual
deployment.
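
To make the "simple linear fit" idea concrete, here is a rough sketch of an ordinary least-squares line fit used to extrapolate a series, written in plain Python with no third-party packages. It is not the code described above: the function names `fit_line` and `forecast` and the sample series are made up for illustration, and it leaves out the quadratic fit, the smoothing filters, and the confidence intervals.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of x/y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(xs, ys, future_xs):
    """Extrapolate the fitted line to new x values."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in future_xs]


# Made-up series that lies exactly on y = 2x + 1.
t = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]

predictions = forecast(t, y, [5, 6])
print(predictions)
```

A point forecast like this carries no uncertainty estimate, which is the gap the automatic confidence intervals in R were filling.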

~~~
hadley
Connecting to databases in R is way harder than it should be. It's something I
want to work on in the future.

------
bedatadriven
Wow! Terrific. We've needed a resource like this for a long time in the R
community, and Hadley is the one to write it!

------
Wonnk13
This is a tremendous resource on the level of John Chambers's book Software
for Data Analysis.

------
CharlesMerriam1
I always look at a language's error handling. First piece I see is 'There are
three ways that a function can fail' followed by a six-item list.

No one expects the exception.

~~~
hadley
That chapter (like the entire book) is still a work in progress and I'll
hopefully fix the most egregious errors before publication ;)

------
joelthelion
Is the whole book available in one page somewhere?

~~~
hadley
No, because it will be for sale eventually, and that's the deal I struck with
my publisher. But if you dig around in
[https://github.com/hadley/adv-r](https://github.com/hadley/adv-r) you can
find a script to make a single pdf...

------
leondutoit
This is a great contribution to the community, thanks so much. I'm sure it
will make writing R code even more enjoyable.

