
R for Excel Users - elisebreda
http://blog.yhat.com/posts/R-for-excel-users.html
======
roel_v
The thing is though - R is just not Excel. For many tasks (blasphemy!), Excel
is simply better.

Take quick scenario evaluation. In Excel, I change one or two cells and see
everything update immediately; in R, you need to re-run your analysis and find
the outputs you're after again. There is built-in support for scenarios in
Excel. In R you have to code your analysis around it, and do the formatting
for comparing, too.

Or take the libraries. Anything that is not statistics is just a pain in R.
Everything is possible, sure - but high friction. PMT()/IPMT() functions in R?
Good luck. Sure it's easy to code yourself (which is the advice you get when
asking R users %| ) but I'm using something high level to not have to bother
with that sort of thing!

Graphs? Yeah, there's plot() which is straight out of 1960, or ggplot - which
is easy for the simple things and then devolves into afternoons chasing
obscure manual pages for this or that setting. Here's one: plot 360 degrees of
a sine wave, and it's first and second derivatives. Then explain an Excel user
how that works. (this is both because anything non-stats is bolted onto R, and
because ggplot is designed around stats graphing.)

Keeping a matrix of data, like a simple database? Sure R can read dozens of
file formats from CSV to HDF, but actually editing/maintaining that data is a
pain in the ass. Excel, just add a sheet and use vlookup - it will let you
sort and filter and copy and validate, all without leaving one software
package.

Yes, there are many things wrong with Excel. FFS, there is a conference on how
to not screw things up in real life with Excel. But saying 'just use R' is
silly. For most applications where starry-eyed grad students advocate R over
Excel (because hey, nail/hammer, right?), Excel is just the better choice,
even if that means workflows that make programming-literate folk die a little
inside every time we have to work with them.

~~~
gordon_shotwell
From the tone of your post I'm thinking that you've made up your mind on this
point, but there really is a lot of richness and extensibility in R which
isn't available in Excel. For instance my blog is written in an R library
called Blogdown, you can process 5 GB datasets on your laptop, send a
reproducible analysis to a colleague, stuff like that.

I guess the thing I would ask you to consider when thinking about this stuff
is the amount of time you've spent learning Excel. At the time I was learning
R I probably had several thousand hours of focused Excel practice under my
belt, and I could do a lot with the tool. So Excel was a way better tool for
me than R because I was an Excel expert and an R novice. After now putting in
about that same amount of time working with R I can say it's a much more
powerful and extensible tool. But if you mostly work in areas where
Excel...uh...excels, then there's no real reason to make the switch.

~~~
CJefferson
You don't seem to have really answered any of the points, like good in R to
interactively edit and view data, and update the graphs automatically. Also,
in R I would love an excel style graph wizard -- it could even output the R
required.

I need to use R about once a year -- each time I've forgotten everything from
last time, and use a combination of stackoverflow and swearing to do whatever
I need to do.

~~~
gordon_shotwell
Hmmm, I'm not sure there is an Excel style graph wizard, although there's this
package which is close [https://www.r-statistics.com/2016/11/ggedit-
interactive-ggpl...](https://www.r-statistics.com/2016/11/ggedit-interactive-
ggplot-aesthetic-and-theme-editor/)

For PML() kinds of functions, I often just google something like "Excel PML()
function in R" and something usually turns up: \-
[https://cran.r-project.org/web/packages/optiRum/optiRum.pdf](https://cran.r-project.org/web/packages/optiRum/optiRum.pdf)
\-
[https://gist.github.com/econ-r/dcd503815bbb271484ff](https://gist.github.com/econ-r/dcd503815bbb271484ff)

Another good tactic is to follow some of the R quants on twitter. A really
popular package for this is
[http://www.quantmod.com/](http://www.quantmod.com/)

------
vkb
This is a fantastic article for intermediate beginners. On HN, everyone is a
senior data scientist working with Spark and Keras and Tensorflow and deep
learning.

In the real world, there is a huge chasm of difference between people just
learning Excel and developers, not many people even understand why you would
switch away from the former when it's so convenient, which is why the
difficulty v.s. complexity chart is so great, and may actually speak to people
in an approachable way.

There are a lot of tutorials for how to do hard things and how to do easy
things, but not a lot for how to think of the hard things in terms of the easy
things, and this falls in that category. Another good book on this topic is
Data Smart by John Foreman, where he goes over basic data science skills in
Excel.

~~~
minimaxir
> This is a fantastic article for intermediate beginners. On HN, everyone is a
> senior data scientist working with Spark and Keras and Tensorflow and deep
> learning.

Although there is a ML/AI selection bias in HN, there are certainly a lot of
people on HN who fall into the intermediate/beginner category (there is a lot
of demand for R tutorials which I have been working on), although I would
argue that dplyr can legitimately be used at the advanced level. And certainly
Keras/Tensorflow is overkill for common business problems.

~~~
vkb
There are definitely a lot of people who fall into those categories, but the
articles/links give the impression that everyone is senior. Which is why it's
great when articles like this come around.

------
ktamura
So glad to see this. As someone who learned both Excel and SQL on the first
job (it's called being an analyst at a trading firm), I always felt that SQL
was a "more powerful but clunkier cousin" of Excel and wished there was a
tutorial for SQL aimed at Excel ninjas.

Well, fast forward half a decade and nobody wrote it, so I did so awhile back.
Of all the things I wrote for work, it continues to be the most popular:

[https://blog.treasuredata.com/blog/2014/12/05/learn-sql-
by-c...](https://blog.treasuredata.com/blog/2014/12/05/learn-sql-by-
calculating-customer-lifetime-value-part-1/)

~~~
dgudkov
Check out my EasyMorph ([http://easymorph.com](http://easymorph.com)) -- you
can get the best of both worlds with it -- SQL-like logic applied to tabular
datasets in an Excel-like fashion: visual, reactive, and with built-in
visualizations.

------
rhcom2
Also R is 1 indexed instead of 0 indexed. (Arrays are numbered [1,2,3] instead
of [0,1,2] for the beginner).

I forget that probably every 5 minutes when working in R.

~~~
Mikeb85
So is Fortran, Julia, MATLAB, etc...

~~~
peatmoss
It's only the Jonny-come-lately languages like Algol and its descendants that
are zero-based :-)

~~~
Avshalom
Algol, like Ada, actually requires you to specify your bounds and is perfectly
happy to have 0,1,87 or -62 as your first index.

~~~
auxym
So does excel VBA.

But man, are arrays a pain in VBA.

~~~
eon1
Why do you say so? I find them pretty easy to handle. Variant arrays can be
pulled and pushed straight from/to Range objects, you have LBound and Ubound
to iterate over arrays if you don't feel like you can trust the base, or
otherwise you have For Each. And you have Collection and Dictionary if you
have different needs - you can even define a class, instance it and push to a
Dictionary, now you have a key-value store full of your Object.

------
_Wintermute
That's a nice tutorial for beginners, though sticking dplyr code in functions
is skirting dangerously close to the world of hurt that is non-standard
evaluation.

The example:

    
    
        join_and_summarize <- function(df, colour_df){
            left_join(df, colour_df, by = "cyl") %>%
            group_by(colour) %>%
            summarize(mean_displacement = mean(disp))
        }
    

Will go really badly wrong if someone following the tutorial simply replaces
the `disp` with a function argument.

~~~
dandermotj
The standard evaluation versions of all dplyr functions are available, just
add an underscore to the end: filter_, select_, ...

~~~
minimaxir
A cool feature of select_ I discovered accidently is that you can pass in a
vector of column names to get the corresponding columns, which is _much_
easier than abusing which() to lookup indices.

~~~
_Wintermute
Not sure if I'm misunderstanding, but you've always been able to slice on a
vector of column names.

    
    
        cols <- c("colA", "colB")
        dataframe[, cols]

~~~
minimaxir
True, although not as a part of a dplyr chain, which is cleaner.

~~~
disgruntledphd2
extract and extract2 from magrittr also work. I do normally use dplyr though.

------
krembo
This dude should learn to use PowerPivot, it will blow his mind and get him
back to the old beloved Excel hands. Most of his examples of "why move to R"
can be easily solved with PowerPivot and PowerQuery, with a nice GUI and not a
clunky command line.

------
saosebastiao
Brings back great memories. Excel/VBA was my first programming "language"...a
basic requirement of the supply chain management job I was doing. The limits
of Excel for data analysis pushed me to R (+SQL!) which was my first real
programming language. Since then I've done professional work in Clojure,
Scala, OCaml, F#, Typescript, Prolog, and a handful of constraint/optimization
languages like AMPL, Mosel, and Minizinc.

------
SubiculumCode
There is a lot to like in this article, but the artificial division between
Excel and VBA for Excel is well, artificial. I've since moved to other coding
environments like R and python, but in the past I frequently used VBA to parse
and tabulate excel files; Excel VBA is quite capable of a lot of complex
manipulations...often done slow as hell...but capable.

------
losteverything
Great article. But I'm an old spreadsheet jockey in non tech jobs now. I use
Sheets for quick and dirty stuff.

Where does one "get" R?

~~~
phillc73
For Linux users, it is probably installable through the distribution's package
manager.

For others, the R language website:
[https://www.r-project.org](https://www.r-project.org)

Also try the IDE from RStudio to make life more pleasant:
[https://www.rstudio.com](https://www.rstudio.com)

Good luck! It is a fun journey and a big, deep, wonderful, rewarding rabbit
hole.

------
kanaugust
Instead of choosing between the two extreme ends of the spectrum, Clicking
Buttons vs. Writing Code, you can use a tool like Exploratory
([https://exploratory.io](https://exploratory.io)) (Disclaimer, I’m a co-
founder of Exploratory), which provides a modern and interactive UI to access
to all the amazing things of R without needing to write R code. You can get
immediate value out of R in the first few minutes with its grammar based
(dplyr verbs) data wrangling and beautiful and interactive visualization.
Everything you do in Exploratory is recorded as R function that you see in UI
so you can learn R while you work on data, and you can export your analysis
works as reproducible R scripts. Many of our users use Exploratory to not just
learn R, but also learn data analysis / science methods, which to me is more
important than learning the tools themselves.

------
IndianAstronaut
One of the best ways to get a good handle on basic R is to switch all the
Excel stuff to R. Force yourself to use only R and you will find that you have
a really good grasp of the environment in a couple months.

------
oli5679
It would be great if Microsoft integrated native R/Python into Excel.

------
lucb1e
This is not "R for spreadsheet users", it's "why bother learning R in the
first place". Was hoping for the former.

------
thanatropism
Has anyone used the Rodeo IDE they're selling?

~~~
minimaxir
Rodeo is oriented more toward Python users...as RStudio is an unbeatable IDE
for R.

------
jtcond13
This article is pretty good instruction, but _very_ good SEO. I know quite a
few people asking this question.

------
morsmodr
Kudos for using Kill Bill analogy!

~~~
gordon_shotwell
I'm thinking of writing a package which displays R errors as Pai Mei gifs.

~~~
cwyers
I would kill for an R package that displays R error messages as something
useful. Especially if it's happening 200 lines deep in a library somewhere, it
would be nice for that library to catch the error and display something that
pertains to what I wrote instead of just throwing whatever error occurs on
that line. Or at least tell me what library the error occurred in so I don't
have to guess.

