

An R programmer looks at Julia - TalGalili
http://www.r-bloggers.com/an-r-programmer-looks-at-julia/

======
dkarl
If you read the original blog post, you won't get a headache from the hundreds
of missing spaces in the r-bloggers version:
http://dmbates.blogspot.com/2012/04/r-programmer-looks-at-julia.html

------
haberman
I've written a few hundred lines of R sporadically over the last several
years. The absolute worst thing about it in my opinion is the type system. It
does not matter how many times I use R, I cannot for the life of me remember
or understand the difference between vectors, arrays, lists, data frames, and
matrices. A list is sort of like a mix between an array and a map, a matrix is
sorta like a 2d vector but can have row/column names, an array is like a
matrix but different, and a data frame is like a heterogeneous matrix. And
converting between them is always tricky.

As much as R may be capable of, I just can't get past how inconsistent and
complicated its basic types are.

~~~
chubot
The terminology is weird. I'm not an R expert, but here's how I think of it:

vector: this one is clear based on the name; it's a homogeneous sequence (with
very aggressive type conversion). A sequence of strings, a sequence of
numerics, etc. One thing worth knowing is that there are no scalar types, so
c(1) == 1. That is, the value 1 is identical to the singleton vector
containing 1. Also the empty vector c() is identical to NULL! is.null(c()) ==
TRUE. Weird.
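
To make the "aggressive type conversion" concrete, here is a toy Python model of what c() does when you mix types. This is only a sketch: the `combine` helper and the `RANK` table are made up for illustration, and only four of R's vector types are modelled (logical < integer < double < character). Python's string forms also differ from R's (str(True) is "True", where R gives "TRUE").

```python
# Toy model of R's coercion order when combining values with c().
# Assumption: only four of R's vector types are modelled here.
RANK = {bool: 0, int: 1, float: 2, str: 3}  # logical < integer < double < character


def combine(*values):
    """Mimic c(...): coerce every value to the widest type present."""
    if not values:
        return None  # in R, c() is NULL
    target = max((type(v) for v in values), key=RANK.__getitem__)
    return [target(v) for v in values]


print(combine(True, 2))   # [1, 2]
print(combine(1, 2.5))    # [1.0, 2.5]
print(combine(1, "a"))    # ['1', 'a']
print(combine())          # None
```

The last two calls are the surprising ones: a single string turns the whole vector into strings, and an empty combine gives back the NULL-like value.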

list: the name is confusing, but I think of it basically like a dict in
Python. And the syntax is the same: list(a=1, b=2) vs dict(a=1, b=2). I think
you can use it like a sequence as you are saying, but I never use them that
way. Lists are for ad hoc composite types -- if I want to return 2 values from
a function, I return a list() of them. I think you can convert lists to
environments easily, or they are the same -- also similar to Python's dicts.

data frame: This is the core type AFAICT, it is basically a collection of
named column vectors of the same length. e.g. data.frame(name=c("a", "b",
"c"), value=c(1,2,3)). This seems pretty intuitive. A row has different types
(like a DB relation) but the columns have the same type since a column is a
vector.
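
In Python terms, that mental model looks roughly like a dict of equal-length lists. This is a sketch only: `make_frame` and `row` are hypothetical helpers written for this comment, not R's or any library's API, and real data frames do a lot more than this.

```python
# Rough Python analogue of an R data frame: named columns of equal length.
# `make_frame` and `row` are hypothetical helpers, not part of any library.

def make_frame(**columns):
    """Build a dict of columns, rejecting columns of unequal length."""
    lengths = {len(col) for col in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    return {name: list(col) for name, col in columns.items()}


def row(frame, i):
    """A row mixes types: it takes one value from each column."""
    return {name: col[i] for name, col in frame.items()}


df = make_frame(name=["a", "b", "c"], value=[1, 2, 3])
print(row(df, 1))  # {'name': 'b', 'value': 2}
```

Each column stays homogeneous, while a row pulled out with `row` mixes a string and a number, which is the DB-relation shape described above.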

matrix: I don't use these too much, but it basically seems like a homogeneous
type like vector, except you specify the dimensions.

array: I don't use this, but the R documentation says "A 2-dimensional array
is the same thing as a matrix". So I think I am confused and what I typed
above is an "array", and matrix is the special 2D case. Yes the names are bad.
I think of a matrix as having an arbitrary number of dimensions (e.g. in matlab).

I think where it gets confusing is that there are all these arbitrary
conversions. And you can use things more than the prescribed ways, so you
might stumble across code that uses them wrong. But after a fair amount of R
programming, there is my mental model, whether right or wrong :)

I think a lot of the mess comes from the fact that dealing with real data is
just messy. R takes the mess and makes the common case convenient, and people
like that. But it's like Perl in that it's a "Do what I mean" language and
tries to guess a lot, rather than "Do what I say" like Python. And when it's
guessing your intent wrong it can leave you very frustrated, as with Perl.

~~~
TalGalili
Hi chubot,

Two things:

1) A data.frame is in fact a list of vectors of the same length "compacted"
together.

2) I find the types very "sensible" for a person doing statistics. But I guess
(almost) everything makes sense once you get used to it...

------
chimeracoder
I have come to love R (for what I use it for), but reading this makes me
realize how unusual my R-workflow must be, because most of the 'advantages' of
Julia over R don't really come up in my daily workflow anymore - it seems
that's likely because I've adapted to the shortcomings of R and have twisted
other tools to my needs. I'll add Julia to my list of languages to check out
in more detail, because perhaps Julia could replace my need for this rather
esoteric workflow that I've developed out of sheer necessity.

I use Python (NumPy/SciPy) for most of the data preprocessing, and perhaps
that's why. I used to do this in R, and I realized that it's just a lot easier
to get done in Python (and it ends up being faster anyway). The problem is
that Python/NumPy/SciPy still doesn't lend itself _quite_ as well as R does to
certain aspects of the statistician's use case. It's possible that things have
changed since the last time I evaluated the two, but I still find it easier to
prototype various _models_ in R, even if I do all of the prerequisite data
munging in a different environment.

I understand that R, like Perl, is 'blessed' (pun intended) with two
different, incompatible type systems - in fact, this is the reason I avoid
using R's type system, and whenever I'm advising newcomers, I always recommend
the same. I don't write statistical packages, so this doesn't come up, but
when I find myself needing to write a method in R, I ask myself if this would
actually be done more easily another way instead. Generally, I find the answer
is 'yes, yes it would'.

I really do think the problem is the type system. The kind of type system that
lends itself well to data manipulation is not the same type system that lends
itself well to model manipulation - when I think about it, I've unconsciously
segregated my workflow into two parts, doing everything naturally done with
Python's type system in Python, and likewise for R. Maybe that's just the way
that I happen to approach data manipulation, but I think it's
non-coincidental. R's relative homoiconicity (compared to Python) makes it really
nice for some things, but there are other warts with its typing that are just
too annoying to work around, when a python shell is just a few keystrokes
away.

I guess the answer is (as always!) to use a purely homoiconic Lisp dialect, so
you get the best of both worlds, but that's asking a lot of statisticians.

I really have come to love R for what it does do, though. Of all the
statistical software packages I've seen (comparable: SAS, SPSS, Stata,
MATLAB), it's far and away the best (and the GNU license makes it very, _very_
attractive to broke students looking to avoid the still-absurdly-priced
student licenses for the alternatives). That said, I still sigh every time I
realize that I'm essentially gluing together two separate runtime environments
for something that should really be easily integrated. I do what I do now
because it ends up being faster than using either Python or R for everything,
but it still strikes me as weird that a language so perfect for munging data
(Python) can still be so awkward for analyzing it, and vice versa.

~~~
drunkpotato
That is very interesting. I use Python to pre-process data for Matlab, and
have been giving serious thought lately to learning R for its free license and
easy(?) integration with Hadoop. Can you briefly comment on the advantages of
R over Matlab aside from licensing?

~~~
chimeracoder
If you're already used to Matlab, then you may not find my comments as
relevant. _If_ you were already proficient in both, then they're both
interchangeable for many tasks (which is in fact why I always recommend
learning R over learning Matlab).

Licensing isn't just a minor thing - getting Matlab to run on non-Debian Linux
is a painful ordeal. I never actually got it working, because I never bothered
to debug its cryptic error messages, and since it's distributed as a
precompiled binary, I wasn't going to sit around trying to patch it. A
corollary is that R is easier to integrate into other toolkits, and there are
a _ridiculous_ number of freely available R libraries that make your life
easier.

My issues with Matlab may be things that someone familiar with the language
would care less about. That said, I find Matlab to be incredibly, incredibly
irritating, and I think that's because its design is tailored towards people
with minimal experience with other programming languages (like research
scientists), whereas R's design is simply based off of S - so I find it
violates the Principle of Least Surprise less. Matlab is not like Lisp or
Haskell (where the journey of understanding the language is valuable in
itself) - it's really just a means to an end (number crunching), so the POLS
is _especially_ important.

R, unlike Matlab, imposes almost no restrictions on the structure of a
program. The way I see it, Matlab makes Java's broken one-class-per-file model
even worse, by imposing _more_ filesystem-level restrictions on my program.

R, unlike Matlab, uses a type system that's more familiar to someone used to
programming with multiple datatypes, as opposed to someone used to thinking in
terms of strictly numerical structures. I never got the hang of when I should
index with () or {} or [] in Matlab ... I'd have to look it up to tell you. R, on
the other hand, is more like Python in this regard - even if it's not _quite_
as clean as Python, it makes basic things like importing/manipulating CSVs much
easier than Python (or even Excel, which is even designed around that exact
purpose).

R, unlike Matlab, returns the last value computed, not the last values with
the same local names as the return value names.

R, unlike Matlab, uses a more intuitive (to me) definition of dimensions (and
of row- vs. column-vectors). I spent 80% of my time in Matlab figuring out how
to get dimensions to match in a robust manner, and I've never had to do that
in R.

You get the idea - my frustrations with the language itself are mostly with
the fact that it's so unlike most other languages, and it's too much of a
hassle to learn. My frustrations with the language environment are that the
free alternative (R) is much easier to work with, and much more
cross-platform.

~~~
drunkpotato
Thank you, that's a good list! Dealing with cell arrays and text vs. numerics
is why I do my pre-processing in Python. Matlab's job is to read in the data,
run it through various algorithms, collect accuracy statistics, and show
plots.

For us the issue is not so much Matlab as a programming language, but rather
availability of new algorithms and ease of parallel processing. The licensing
issues involved in getting the parallel toolbox running on multiple
workstations seem like a headache, which is part of what is motivating us to
look at R.

------
eddie_the_head
I wonder if anyone has written up a comparison on R and J.

~~~
TalGalili
There is now:
http://www.r-bloggers.com/comparing-julia-and-r%E2%80%99s-vocabularies/

~~~
eddie_the_head
That's comparing R and Julia, not R and J.

