

When to use Excel, when to use R? - rams
http://michaelmilton.net/?p=286

======
secretasiandan
I would stress that with Excel, changing the underlying data set, especially
dimensions of your data is annoying and not very automate-able. This is the
case even when the analysis is the exact same or very similar as the original.
You either need to copy formulas over for every data set/sheet or write VBA
code to automate this, which still isn't as flexible as in R.

Excel is better if you want to pore over the data: study a piece, scroll down,
study some more. R is better if you can easily pinpoint the areas you think
are interesting and index directly into the array. Excel has indexing
capabilities (F5 which goes to an address), but not by row/column names.

Additionally, you can't do multi-step analysis that involves segmentation/sub-
setting of the data very easily with Excel. This is related to the first point
about changing an underlying data set or its dimensions. There are ways around
this, but they are VERY annoying.

I use R for the majority of my analysis. I used to be a Matlab user but needed
to find a cheaper tool when I left my job. I still use excel for simple data
inspection and when I need to send my analysis to other people to peruse.

~~~
3pt14159
"I would stress that with Excel, changing the underlying data set, especially
dimensions of your data is annoying and not very automate-able. This is the
case even when the analysis is the exact same or very similar as the original.
You either need to copy formulas over for every data set/sheet or write VBA
code to automate this, which still isn't as flexible as in R."

In certain circumstances you are right. Wanting to use the same spreadsheet
for a very, very similar task is often annoying. This can be avoided if you
knew from the start that you wanted to make something more reusable, but sadly
that isn't always the case. Of course there are ways of automating them (VBA
as you mentioned, or DB connections when appropriate, sometimes UDF, or just
organizing the data in the correct way) but this is definitely one of my major
annoyances.

"Excel has indexing capabilities (F5 which goes to an address), but not by
row/column names."

Actually with the name manager you can do this. Simply name the row(s) and
column(s) you want to index by, press F5, and type "my_col,my_row" and you are
there. It even does the "and" selection for you, which you can easily exit out
of if you want. Not sure if it is in versions prior to Excel 2010, but I don't
really find myself using this all that much.

"Additionally, you can't do multi-step analysis that involves
segmentation/sub-setting of the data very easily with Excel. This is related
to the first point about changing an underlying data set or its dimensions.
There are ways around this, but they are VERY annoying."

Your first statement is only correct if you mean "you can't do _some_ multi-
step analyzes" because some of them are just plain simple. Also, I don't find
the occasional annoyances (which typically have to do with using match(),
vlookup(), cross_product(), or similar functions) anywhere nearly as annoying
as some of the other things I have to deal with when programming. Dates can be
a pain in the ass. Even in Ruby.

I'll be clear in saying I have no idea about what R can do. I've only toyed
with it due to lack of time, but I do know that Excel can do a bunch of really
cool things. Solver, remote UDFs written in any programming language
(including Ruby running on a linux box), forms for pointy haired bosses,
pretty awesome charts if you know what you are doing, cubes, goal seek, data
tables, macros, formula auditing, db connections, etc.

What I don't like is the general vibe that Excel is, and only is, for MBAs and
R is, and only is, for comp sci kids. That just isn't true, and I don't have
to know anything about R to make that claim because I know some comp sci kids
some of the time get a ton of value out of Excel. Right tool, right job.

------
uuid
The arguments are good and sound, but could be simpler:

If you can "grasp" your data by looking at it, use excel. If your data is too
much to look at, use R.

Excel lends itself more to shallow exploratory approaches, while with R, you
have to think first (or rather: always).

~~~
stcredzero
What about a graphical front end to R? I wonder if 90% of R's use cases could
be represented in the form of a flowchart diagram, which then would generate R
code. One could have the automatically re-run the script when the diagram
changes and render graphs and other output. I was thinking of having a "remote
control" for such an app on the iPad, with the ability to use the
keyboard/mouse and script certain things by hand to enable the more esoteric
10%.

~~~
carbocation
I wonder if Rcmdr would be up your alley:

<http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/>

Screenshot: [http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/Rcmdr-
screenshot....](http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/Rcmdr-
screenshot.html)

------
jedbrown
Please don't use spreadsheets for statistical analysis (even the simple kind
that the spreadsheet claims to support).

<http://www.jstatsoft.org/v34/i04/paper>

------
vtail
The arguments are good, assuming you're the only person working on the
analysis.

As a consultant I'm always constrained to use the greatest common denominator
- that is, something that my clients can use, modify and extend _themselves_
after I leave. In 99.9% of the cases, that means Excel, regardless of the
task.

~~~
secretasiandan
While your clients need to look at the output of the analysis or the end
result, they don't need to see every branch of it that you have explored.

I will do serious exploratory analysis in R and then show my conclusions and
maybe a few alternatives in Excel, but that doesn't mean you have to always
work in Excel. Further, I could not be anywhere near as productive in Excel as
I could in R.

~~~
vtail
More often than not people are interested in tools and ability to generate
answers themselves, not just answers.

~~~
uuid
A very valid point.

------
dmnd
I'm just starting to go beyond Excel, but I chose to learn Python and scipy
over R due to reading that R's scripting language didn't match up to Python.
Am I handicapping myself by ignoring R?

~~~
brettnak
You could always use sage if you want scientific computing in python. It makes
interacting between different libraries like SciPy, NumPy, and a ton of other
math libraries pretty strait forward. You can even choose your version of
numbers if you want. It also has rpy.

www.sagemath.com

------
Calamitous
So here's a hypothetical question: let's say you have a programmer who knows
next to nothing about statistics, knows how to navigate around Excel fairly
well, and once allowed himself to be scared away by R.

Let's further assume he's suddenly inherited a large volume of load testing
data and a mandate to "make something out of it."

What's a starting point? I, I mean, _he_ hears about the great visualization
stuff in R and understands the importance of it but has no clue where to
start.

Would HN have any advice for him?

~~~
samuel
His book "Head First Data Analysis" seems a fairly gentle introduction to data
analysis. I've read it and it's enough to learn a couple of tricks. Where to
go from here it's harder to figure out. I'm refreshing my stats classes with
the Pearson, and looking for a good introductory R book.

This stats course from Berkeley isn't bad, too:
<http://www.stat.berkeley.edu/classes/s133/schedule.html>

~~~
Calamitous
Thanks :)

------
klochner
MBA: Excel

CS: {R, Python, Matlab, Mathematica}

~~~
chbarts
CS people might be put off by the fact R is really a pretty lousy language if
you've seen any other language (like, for example, Python). It's obvious that
it wasn't a programmer who designed the language R uses.

This blog posts points out a list of specific gripes about the language:
<http://tjic.com/?p=10739> There is some real weirdness to this language.

The R Programming Language for Programmers:
<http://www.johndcook.com/R_language_for_programmers.html> does a lot to
ameliorate this, but the fact the weirdness is there at all makes a programmer
a little uneasy.

~~~
xtho
Think of it as a DSL, statisticians like to use. While it has its dark sides,
its vector-orientation is well suited for a language whose main purpose is
data manipulation.

~~~
chbarts
> Think of it as a DSL

I can't let this stand as a blanket defense of crappy languages. We have Lua.
We have Python. We have Scheme. We have multiple other languages that can be
used as a good way to access the goodies provided by your wonderful hand-
crafted libraries regardless of what they do. There is no reason for you to
invent your own language just so people will be able to use your library code
from a REPL.

Sadly, back in the dark ages, this was not true. However, in the future,
anyone who tries to defend the horrible design of a new language with "Think
of it as a DSL" gets to debug a 1000 KLOC application written in ANS COBOL
1968, which is a DSL for fixed-field database munging.

~~~
xtho
Let's put things in perspective: R is S and S had it's first appearance in the
1970s. It's current reincarnation is from the late 1980s. I remember that
other statistical packages back then had a similar looking syntax. This is
also the reason why lua/python stand no chance against R: they simply lack the
vast abundance of statistical packages and tutorials. The best thing they can
do is use R as an inferior process (IIRC there is a package for incanter that
does that).

The point is though that S/R isn't that badly suited for what it was created
for. There are a few pitfalls but those are explained in the langage
definition. But unfortunately nobody RTFM nowadays (like people used to do
back then in what you call the dark ages) because it's easier to start
screaming for Mommy and write stupid blog posts that prove nothing but that
those people haven't read the language definition.

And BTW, python sucks.

------
forkandwait
I find R syntax to be really annoying, and I am trying to work more in Octave,
a matlab compatible system. I don't like the mostly gratuituous complexity of
the R type hierarchy, and prefer to have everything be a matrix. I also detest
the use of periods as plain text. FWIW.

<http://www.octave.org>

------
tel
Honestly, once you learn them both, it's best to use Illustrator when you need
good presentation and R all other times. I can think of very little that Excel
can do that can't be done faster in R and prettier with Illustrator.

So I'd condense the argument to: if you're not comfortable with R/Illustrator
and don't have time to learn, use Excel.

~~~
jamesshamenski
can you easily update data feeds to illustrator without having to redo your
drafted work?

~~~
tel
It's generally for finishing and publication. For rapidly updating graphics,
the libraries lattice and ggplot2 are both good enough looking.

------
helwr
matplotlib is my best friend, matlab is my second best friend

excel? are you kidding me

~~~
Estragon
Yep, I use matplotlib pretty much exclusively for this kind of thing. I have
my eye on clojure's Incanter, though.

~~~
phren0logy
Incanter is really shaping up. I can't say enough good things about it.

------
tcc619
has anyone used both R and matlab/octave?

i've used matlab but not R. For machine learning, I found matlab great because
of its great matrix support.

what's the pros/cons of R? what advantages does R have over matlab?

