

How much of R is written in R? - g-garron
http://www.r-bloggers.com/how-much-of-r-is-written-in-r/

======
etrain
I think both lines of code and number of files are terrible metrics to use to
compare sizes of code base across three very different languages. I don't have
any experience with Fortran, but as a seasoned R and C hacker, I don't think
the two languages could be much more different.

Well written R code tends to be incredibly compact, because the functions
available in base R are plentiful and the language is both heavily functional
and vector-oriented. The amount of manual memory management and explicit
looping required by C easily inflates the lines of code.

A better metric, perhaps, would be to count the number of functions written in
each language. Of course, there are issues of style there, but I think that
would lead to a more comparable estimate. I don't know of a tool that does
that for C. Does anyone know of one, or should I go the parser route?
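
For what it's worth, a rough version of the "count functions instead of lines" idea can be sketched without a full parser. The block below is only a heuristic, not an existing tool. The regular expressions, the helper names (`count_c_functions`, `count_r_functions`) and the sample sources are all made up for illustration. They assume C definitions start at column 0 and that R functions are bound to a name with `<-` or `=` at the start of a line. Macros, K&R-style definitions and anonymous R functions will be miscounted or skipped.

```python
import re

# Heuristic (assumption): a C function definition starts at column 0 with a
# return type, then a name, a parameter list, and an opening brace.
C_FUNC = re.compile(
    r"^[A-Za-z_][\w\s\*]*?\b([A-Za-z_]\w*)\s*\([^;{}]*\)\s*\{",
    re.MULTILINE,
)

# Heuristic (assumption): an R function is a name assigned `function(...)`.
R_FUNC = re.compile(
    r"^\s*([A-Za-z.][\w.]*)\s*(?:<-|=)\s*function\s*\(",
    re.MULTILINE,
)

# Control-flow keywords that can look like a call followed by a brace.
C_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}


def count_c_functions(source: str) -> int:
    """Count likely function definitions (not prototypes) in C source text."""
    names = [m.group(1) for m in C_FUNC.finditer(source)]
    return sum(1 for name in names if name not in C_KEYWORDS)


def count_r_functions(source: str) -> int:
    """Count named function definitions in R source text."""
    return len(R_FUNC.findall(source))


# Hypothetical sample inputs, not taken from the R source tree.
C_SAMPLE = """
int add(int a, int b);

static int add(int a, int b)
{
    return a + b;
}

double *scale(double *x, int n) {
    for (int i = 0; i < n; i++) { x[i] *= 2.0; }
    return x;
}
"""

R_SAMPLE = """
square <- function(x) x^2
add.one = function(x) {
  x + 1
}
result <- sapply(1:3, function(i) i)
"""

if __name__ == "__main__":
    print("C functions:", count_c_functions(C_SAMPLE))
    print("R functions:", count_r_functions(R_SAMPLE))
```

The prototype and the anonymous function passed to `sapply` are deliberately not counted. Anything meant to produce numbers worth quoting would still want a real parser for each language.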

------
rgovostes
R is a pretty terrible language for many reasons, and it always surprises me
when it rises to the top on HN.

But in terms of code quality within R itself, I was surprised to find that a
number of its .c files are actually machine-translated Fortran, so I'm
guessing the author's statistics are not far off.

I discovered this when I decided to confirm my suspicions that my (the?) most
frequently used function, t(), which computes the transpose of a matrix, was
implemented about as naively as possible. If R developers were really
concerned with speed, this is probably the first place to start optimizing.
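
To make "naive" concrete, here is a sketch of the two loop structures usually contrasted for a transpose. It is not R's actual t() code, which is not quoted here. It assumes a matrix stored as a flat column-major list (the layout R uses for matrices). The function names and the `block` parameter are made up for illustration. In pure Python the interpreter overhead hides any cache effect, so the blocked version shows only the structure a compiled implementation might use, not a speedup.

```python
def transpose_naive(a, rows, cols):
    """Transpose a rows x cols matrix stored as a flat column-major list."""
    out = [0] * (rows * cols)
    for i in range(rows):
        for j in range(cols):
            # Input element (i, j) sits at i + j * rows; in the cols x rows
            # result, element (j, i) sits at j + i * cols.
            out[j + i * cols] = a[i + j * rows]
    return out


def transpose_blocked(a, rows, cols, block=32):
    """Same result, but visiting the matrix in block x block tiles.

    In a compiled language, tiling keeps reads and writes within a small
    region of memory at a time, which tends to be friendlier to the cache.
    """
    out = [0] * (rows * cols)
    for i0 in range(0, rows, block):
        for j0 in range(0, cols, block):
            for i in range(i0, min(i0 + block, rows)):
                for j in range(j0, min(j0 + block, cols)):
                    out[j + i * cols] = a[i + j * rows]
    return out


if __name__ == "__main__":
    # The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] in column-major order.
    m = [1, 4, 2, 5, 3, 6]
    print(transpose_naive(m, 2, 3))
    print(transpose_blocked(m, 2, 3, block=2))
```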

~~~
gfodor
Can you go into more detail? I've been toying with learning R for ad-hoc
analysis but if there is a better alternative worth learning I'd love to hear
about it.

~~~
dagw
Depends on what you mean by "better". MATLAB and python+numpy will almost
certainly run faster than R in almost all situations, and they are also far
more pleasant to program in (in my opinion).

However R has the advantage that it will have support for every obscure
statistical analysis routine you can ever think of. It also has better support
for reading in data from all kinds of sources and handling things like missing
and invalid data. So if your goal is to quickly read in a bunch of data sets
(that are small enough that performance isn't a critical issue) from arbitrary
sources, run a bunch of statistical functions on that data and turn those
results into pretty graphs, then R is pretty great.

~~~
onan_barbarian
Correct. The language R is _just plain weird_ (insane ideas about
scoping/binding that seem to be completely unlike anything you've seen in any
reasonably designed language in the last 20 years) and not very efficient to
boot, unless you hit one of the bits that's just C under the hood.

However, the vast, vast repository of every statistical analysis under the sun
- not just 'core R' but everything that any statistician has hacked up - is
unparalleled.

My 'coping with R' strategy is to do all the heavy lifting data manipulation
in C/C++/Python, then do one-shot things in R. I just pass CSV files around,
but there are tighter integrations of R and Python if you want to look into
that.
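
A minimal sketch of the Python side of that CSV handoff, assuming missing values should be written as the string NA so that R's `read.csv` picks them up as missing with its default settings. The helper name `write_for_r` and the column names are hypothetical.

```python
import csv
import io


def write_for_r(rows, fieldnames, handle):
    """Write dict rows as CSV, spelling missing values the way R expects."""
    writer = csv.DictWriter(
        handle,
        fieldnames=fieldnames,
        restval="NA",          # used when a row lacks a column entirely
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        # None would otherwise be written as an empty field.
        writer.writerow({k: ("NA" if v is None else v) for k, v in row.items()})


if __name__ == "__main__":
    # Hypothetical data: one explicit None and one absent key.
    data = [{"id": 1, "score": 2.5}, {"id": 2, "score": None}, {"id": 3}]
    buffer = io.StringIO()
    write_for_r(data, ["id", "score"], buffer)
    print(buffer.getvalue())
```

On the R side, reading the resulting file with `read.csv` should then give a numeric `score` column with `NA` in the last two rows.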

~~~
dagw
Just as an aside, "coping with R" would be an awesome concept for a book or
series of blog posts.

~~~
onan_barbarian
Or maybe 'Living with R', sort of akin to the self-help-book 'Living with
Chronic Fatigue Syndrome' type genre.

------
joblessjunkie
<http://www.ohloh.net/p/rproject/analyses/latest>

~~~
palish
Interesting... According to ohloh, Ubuntu has just 4k lines of code:
<http://www.ohloh.net/p/ubuntu>

Must be some pretty dense code!

~~~
rat
Seems unrelated to Ubuntu; it seems to be this project:
<http://code.google.com/p/zecurrencyconverter/>

------
rubergly
The HN title's typo is very confusing...

------
zeratul
How R competes with other data mining environments:

<http://www.kdnuggets.com/2011/08/poll-languages-for-data-mining-analytics.html>

------
cschmidt
It is certainly a good thing (for speed) that the majority of lines of R are
written in C. I used to work in a shop that did much of our development in R.
We always used to joke "R is really fast if you write it in C".

~~~
singingfish
R is quite slow, but there are two ways to improve this state of affairs.

1\. Use the functional-style stuff rather than loops (especially the apply
family of functions).

2\. For large data sets avoid the default memory management which involves
loading everything into memory all at once. The sqlite dataframe stuff is
probably a good default for larger data sets :)
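
The idea behind point 2, sketched in Python rather than R: stream the rows into SQLite and let SQL do the aggregation, so only the small result comes back into memory. This is an analogy under stated assumptions, not the R sqlite data frame package mentioned above. The function name `grouped_means`, the table layout and the `db_path` parameter are made up for illustration.

```python
import sqlite3


def grouped_means(rows, db_path=":memory:"):
    """Compute the mean value per group inside SQLite.

    rows: any iterable of (group, value) pairs, consumed lazily.
    db_path: ":memory:" for the demo; a file path keeps the data on disk.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE obs (grp TEXT, value REAL)")
        # executemany accepts an iterator, so the rows are never all held
        # in a Python list at once.
        conn.executemany("INSERT INTO obs VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT grp, AVG(value) FROM obs GROUP BY grp ORDER BY grp"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical observations produced by a generator.
    stream = ((("a", "b")[i % 2], float(i)) for i in range(10))
    print(grouped_means(stream))
```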

It's still slow though.

~~~
hardboiled
R has always been a niche language/platform for dataists/statisticians and
because of that it really hasn't had the contributions of programming language
developers and modern techniques.

It'd be cool to see R evolve more quickly as a language (implementation wise).
We're getting hints of it with the new byte code compiler though.

That would be ideal.

------
swah
Why didn't he use cloc ?

~~~
shabble
My usual tool is sloccount (<http://www.dwheeler.com/sloccount/>), but it
doesn't identify R as a language (and discounts it entirely, it appears).

------
spp
How well does R compete with SPSS?

~~~
TalGalili
In what metric? The GUI of SPSS is better than R (which doesn't have a GUI,
though interesting competitors are available like Rcmdr and Deducer). In terms
of everything else (performance, graphics, statistical tools, programming
language), then from what I understand - R is the winner without a doubt...

~~~
spp
I learned R some time ago in university. I've heard of lots of newly graduated
colleagues that now work doing "SPSS consulting" (whatever that means) for big
businesses. But I can't really see how you could use R to parse financial
data, because I lack the financial background to understand what it means.
Maybe that's what SPSS provides, and what singingfish means with "you don't
need to know what you're doing".

~~~
billswift
Actually, I read his comment as meaning that SPSS was more useful for the sort
of "cookbook stat" usually taught in business schools.

