

The SAS vs. R Debate - ulam2
http://inside-bigdata.com/2014/03/01/sas-versus-r/

======
jzwinck
The article says this particular instance of the debate started in 2011.
Things have shifted a little since then, and I think Python has won more
mindshare with Pandas, SciPy, NumPy, and all the rest. I've used both Python
and R, and think the next debate will be between those two, as people find
that R is not a very good programming language and lacks decent libraries for
things like web scraping.

Python can be a single tool that integrates with every part of your workflow.
R right now still wins in the number of algorithms implemented in it (there
are statistical methods available in R but not in Python), and R has terser
syntax, which some people like for interactive use. But for really Big
Data, terse syntax and an endless variety of esoteric algorithms are not as
important as, say, robust error handling and debugging (a weak area in R, but
a strong one in Python).

~~~
krick
> R right now still wins in the number of algorithms implemented in it

Can you please list some important tools implemented in R but not in Python?
I'd like to know how big the gap is.

~~~
mjfl
Is there an equivalent Python package to R's {stats}?

[http://stat.ethz.ch/R-manual/R-patched/library/stats/html/00...](http://stat.ethz.ch/R-manual/R-patched/library/stats/html/00Index.html)

~~~
jofer
Have a look at statsmodels:
[http://statsmodels.sourceforge.net/](http://statsmodels.sourceforge.net/) and
for more basic things, scipy.stats:
[http://docs.scipy.org/doc/scipy/reference/stats.html](http://docs.scipy.org/doc/scipy/reference/stats.html)

It's not a one-to-one match, but the majority of the functionality is there.
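As one small, hedged illustration of the overlap: R's `t.test(x, y)` defaults to Welch's unequal-variance test, and `scipy.stats.ttest_ind(x, y, equal_var=False)` computes the same statistic in Python. The sketch below computes Welch's t and its Welch-Satterthwaite degrees of freedom with only the standard library, so the formula is visible. The function name `welch_t` and the sample data are made up for illustration.

```python
import math
import statistics


def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    nx, ny = len(x), len(y)
    # statistics.variance uses the n - 1 (sample) denominator, like R and scipy.
    vx = statistics.variance(x) / nx
    vy = statistics.variance(y) / ny
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical samples, for illustration only.
x = [5.1, 4.9, 5.6, 5.8, 6.0]
y = [6.3, 6.1, 6.8, 7.0, 6.6]
t, df = welch_t(x, y)
print(f"t = {t:.3f}, df = {df:.2f}")
```

On the same data, `scipy.stats.ttest_ind(x, y, equal_var=False).statistic` should agree with `t`. In practice you would use scipy or statsmodels, which also return p-values.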

------
RobinL
I use both Python (pandas) and base SAS at work for the UK government.

I have lots of experience in SAS, and enjoy using it. The macro language
allows for very succinct solutions to difficult data manipulation problems.

However, given SAS's huge expense it's difficult for me to identify any
'killer' areas where it's significantly better than open source tools. Indeed,
I find pandas faster and easier to use for many problems.

I find it hugely frustrating that the government pays so much money for SAS
licences and training when most people use it for simple use cases, where they
would be better off picking up transferable skills (e.g. Python, SQL, R).

My understanding is that SAS is supposed to be good at processing very large
datasets because it uses RAM efficiently (only the program data vector, or PDV,
is held in RAM). But in reality, only a small minority of users process
datasets that are too big for RAM (e.g. 16 GB+), and there are probably better
tools for that use case.
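A rough Python analogy of the PDV idea, not a description of how SAS is actually implemented: a DATA step reads one observation into the PDV, runs its statements, writes the row out, and moves on, so memory use stays roughly flat no matter how long the input is. The generator-style loop below mimics that pattern over a CSV file. The column names, the derived `total` variable, and the `data_step` function are hypothetical.

```python
import csv
import io


def data_step(source, target):
    """Stream rows one at a time, loosely like a DATA step filling its PDV."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames + ["total"])
    writer.writeheader()
    for pdv in reader:  # only the current row is held in memory
        # Hypothetical derived variable, like `total = price * qty;` in SAS.
        pdv["total"] = float(pdv["price"]) * int(pdv["qty"])
        if pdv["total"] > 0:  # like a subsetting IF: failing rows are not output
            writer.writerow(pdv)


src = io.StringIO("item,price,qty\napple,0.5,10\npear,0.75,0\n")
out = io.StringIO()
data_step(src, out)
print(out.getvalue())
```

With real files you would pass `open("in.csv", newline="")` and `open("out.csv", "w", newline="")` instead of `StringIO` objects. pandas' `read_csv(..., chunksize=...)` offers a similar chunked pattern when a whole table will not fit in RAM.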

One user here comments that SAS is like an 'improved Excel'. In fact, I find
pandas much closer to Excel than SAS because (in the IPython Notebook, at least)
you get nice visual representations of your tables, and it usually isn't
difficult to translate an Excel operation into a pandas one. I especially like
the multi-index and pivot-table capabilities. With a background in VBA
for Excel, it's also relatively easy to pick up Python.
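To make the Excel comparison concrete: an Excel pivot table that sums sales by region (rows) and quarter (columns) is roughly `df.pivot_table(index="region", columns="quarter", values="sales", aggfunc="sum")` in pandas. The standard-library sketch below shows what that aggregation does. The data, column names, and the `pivot_sum` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records, one dict per spreadsheet row.
rows = [
    {"region": "North", "quarter": "Q1", "sales": 100},
    {"region": "North", "quarter": "Q2", "sales": 150},
    {"region": "South", "quarter": "Q1", "sales": 80},
    {"region": "North", "quarter": "Q1", "sales": 20},
]


def pivot_sum(records, index, columns, values):
    """Sum `values` for every (index, columns) pair, like an Excel pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for r in records:
        table[r[index]][r[columns]] += r[values]
    # Convert to plain dicts; empty cells stay absent (Excel shows blanks).
    return {k: dict(v) for k, v in table.items()}


result = pivot_sum(rows, "region", "quarter", "sales")
print(result)
# {'North': {'Q1': 120, 'Q2': 150}, 'South': {'Q1': 80}}
```

pandas returns a DataFrame instead, with missing cells filled with `NaN` by default. That result can then be sliced via its row and column labels much like the nested dict above.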

None of this is quite so obvious in SAS, which has an unusual DATA step
and macro programming language. It's very powerful, but quite unintuitive
to begin with because everything relies on the program data vector.

~~~
Fomite
I regularly produce data sets that are too big for RAM, and not having to
worry about that in SAS was a luxury.

I'd say SAS's two big "Killer Features" are the DATA step and SAS Press. I
have yet to find R or Python nearly as pleasant to work with for
manipulating the data set itself when compared to SAS, and SAS Press is
excellent at putting out books detailing a given type of analysis and how to
implement it in SAS. I still turn to them for basic references even when not
using SAS.

------
JasonCEC
My company uses R, Shiny, and Rserve for nearly _everything_. R is a great
programming language, if you need to quickly and efficiently develop stats-based
features for medium-sized data.

R excels (get it?) at creating reproducible, fault tolerant, consistent
functions that can be automated, packaged, applied to a variety of data types,
and then extended later.

Our web stack is Shiny on AWS, and we call our APIs built in R (ML, images,
data, etc) from Android using Rserve.

A lot of the (programming?) criticisms of R will be 'solved' or become
non-issues in the next few years. Multithreading, implicit vectorization, better
memory handling, GPU functions, among other things, are all in the pipe :)
(That said, the syntax _is_ a little weird to get used to.)

-----

* We're hiring for very senior positions in data science and for more general R programmers. Contact me if you're interested (JasonCEC [at] Gastrograph.com)

[edited for spelling]

~~~
bsg75
Are you using the open-source or "Professional" edition of Shiny?
([http://www.rstudio.com/shiny/server/](http://www.rstudio.com/shiny/server/))

~~~
JasonCEC
We're a beta client for Shiny Pro. The security features and server monitoring
are quite good, and well worth it!

------
zmmmmm
The problem with R is that it's just not a very good programming language.
It's great for interactive analysis, but dismal for building higher level
abstractions. It's like the PHP or MySQL of the data analysis world. Data
types get magically converted all over the place, the global namespace is just
a giant playground for every module to pollute, it has something like 5
different object systems all with subtle differences. All the defaults that
are set for the convenience of interactive use undermine any kind of reliable
use for building on as a platform (for example, the "simplification" concept
where a 1 column data frame often magically turns into a vector).
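A Python analogy of that "simplification" trap. The toy dict-of-lists table stands in for a data frame, and the `subset_columns` helper is hypothetical; it only mimics the default `drop = TRUE` behaviour of R's `df[, cols]`. Here `len()` plays the role of R's `length()`, which returns the column count for a data frame but the element count for a vector. So generic code gets a different, silently wrong answer once a single column is selected. In R itself the safe form is `df[, "a", drop = FALSE]`.

```python
def subset_columns(table, cols, drop=True):
    """Mimic R's `df[, cols]`: with drop=True, one column collapses to a list.

    `table` is a toy data frame: a dict mapping column name -> list of values.
    """
    selected = {c: table[c] for c in cols}
    if drop and len(cols) == 1:
        return selected[cols[0]]  # "simplified" to a plain vector
    return selected


def n_columns(frame):
    """Generic code that assumes it was handed a data frame (dict of columns)."""
    return len(frame)


df = {"a": [1, 2, 3], "b": [4, 5, 6]}
print(n_columns(subset_columns(df, ["a", "b"])))  # 2, as expected
print(n_columns(subset_columns(df, ["a"])))  # 3: silently counts rows instead
print(n_columns(subset_columns(df, ["a"], drop=False)))  # 1, the safe form
```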

I've forced myself to use R intensively for a couple of years now, but I must
say it's still a relief every time I bail out and get back to a "real"
programming language.

------
dekhn
My favorite part about the SAS v R debate was the conclusion of this article:
[http://bits.blogs.nytimes.com/2009/02/16/sas-warms-to-open-s...](http://bits.blogs.nytimes.com/2009/02/16/sas-warms-to-open-source-one-letter-at-a-time/)

"""In the article, Ms. Milley said, “I think it addresses a niche market for
high-end data analysts that want free, readily available code. We have
customers who build engines for aircraft. I am happy they are not using
freeware when I get on a jet.”

To her credit, Ms. Milley addressed some of the critical comments head-on in a
subsequent blog post."""

(Boeing uses R heavily and when you fly on their aircraft, you're flying on
open source)

------
opensandwich
Since SAS is a relatively simple language, why can't someone just write a
transcompiler that translates a subset of SAS into R? That way you
have the best of both worlds (sort of).

The most difficult part would be how to treat "by" statements
(SAS) vs. split-apply-combine (R).
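To show what such a translation might involve: SAS BY-group processing streams rows sorted on the BY key and exposes `FIRST.` and `LAST.` flags. Split-apply-combine instead groups first and then applies a function to each group. Below is a minimal sketch of one possible mapping onto Python's `itertools.groupby`, which, like SAS, only groups adjacent rows and therefore needs sorted input. The SAS snippet in the comment, the `sales` data, and `by_group_totals` are illustrative, not taken from any existing translator.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative SAS being translated (input already sorted by id):
#   data totals;
#     set sales;
#     by id;
#     if first.id then total = 0;
#     total + amount;
#     if last.id then output;
#     keep id total;
#   run;

sales = [
    {"id": 1, "amount": 10},
    {"id": 1, "amount": 5},
    {"id": 2, "amount": 7},
]


def by_group_totals(rows, key):
    """Emit one record per BY group, like the DATA step above."""
    rows = sorted(rows, key=itemgetter(key))  # PROC SORT equivalent
    out = []
    for value, group in groupby(rows, key=itemgetter(key)):
        total = 0  # FIRST.id: reset the accumulator
        for row in group:
            total += row["amount"]  # sum statement: total + amount
        out.append({key: value, "total": total})  # LAST.id: output
    return out


print(by_group_totals(sales, "id"))
```

The hard part a transcompiler faces is that SAS lets arbitrary statements run at `FIRST.`/`LAST.` boundaries inside a single pass. A group-then-apply model has to restructure such code rather than map it line by line.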

Self-plug: I made a quick hack about a month ago for SAS to Python. I'm
sure someone with more programming experience than me could produce something
much better (I come from a maths background).

[http://nbviewer.ipython.org/gist/chappers/8747253/stan_examp...](http://nbviewer.ipython.org/gist/chappers/8747253/stan_example.ipynb)
[https://github.com/chappers/Stan](https://github.com/chappers/Stan)

~~~
groovy2shoes
BAE Systems has a product called NetReveal that includes a compiler called
"DataServer" which compiles a large subset of SAS into Java. Legend has it
that the original author wrote the first version of DataServer in 6 hours on a
train ride to visit his mother.

As a programmer, I was constantly frustrated by it. I felt that SAS as a
language was pretty restrictive, especially when your algorithm wasn't a
natural fit for the dataset model that it uses. It was like trying to shove a
square peg into a round hole. It wasn't uncommon for me to sneak some Java
straight into the output, but that wasn't really a sustainable / maintainable
way to use it.

So your suggestion is doable, but I'm not convinced it's worthwhile.

~~~
opensandwich
That is unfortunate. I have heard of NetReveal (I used the Detica platform at
my previous job) but never knew that about DataServer!

I suppose the demand just isn't there. If only more SAS users at least had an
interest in programming...

------
pistolpete20
I used to work for one of the largest U.S. insurance companies. They were
always behind in transitioning to new technology (Excel 2003 could be found
there in 2013). That being said, the entire statistical modeling team and
research department made the switch to R and Python. Only a few clung to SAS,
but they realized they would be forced to move to R, since any collaborative
work would need to be converted to R rather than to SAS.

I believe the future will be R versus Python, and SAS will not be a part of it.

~~~
christopheraden
I work in big insurance, too. How did you handle porting your legacy code into
(R|Python)?

We had tens of thousands of lines of SAS code written over the course of 10
years (macros, clinical programs, reporting functionality), so the higher-ups
never saw switching to Python or R as worthwhile or feasible, especially with
the Biostats team busy on existing projects.

------
mbq
Questions on StackOverflow: 49 878 R, 2 191 SAS; on Stats StackExchange: 5 524
R, 260 SAS.

------
stcredzero
Where does Stata fit into all of this, and why is it never discussed on HN?

~~~
bertil
Stata is seen as a less powerful, more usable, and cheaper version of SAS; it
also comes off as less flexible than R when it comes to more complex queries.
Companies with large teams can afford what statisticians would see as a better
tool; R is seen as the favored tool of the lone analyst with an extensive
background, or of the hard-science academic. Stata, on the other hand, is
mainly favored by social-science practitioners and some more
statistics-inclined marketing people, and neither group frequents Hacker News
much.

------
ropz
These people:

[http://www.teamwpc.co.uk/](http://www.teamwpc.co.uk/)

produce a compiler, tools, etc. that run the SAS language.

(disclaimer: I interviewed there last year)

~~~
reeses
That's awesome. Can you tell us anything about how close they have come and
how they've avoided SAS's competitive litigation? :)

~~~
dataminded
They have been sued and a very interesting ruling came out of it.

[http://en.wikipedia.org/wiki/SAS_Institute_Inc._v_World_Prog...](http://en.wikipedia.org/wiki/SAS_Institute_Inc._v_World_Programming_Ltd)

~~~
reeses
Wow, parts of that ruling are stunning. Thank you.

------
Fede_V
Is this even a discussion? Anyone serious about analyzing data will use either
R, Python (with Pandas/SciPy, etc.), or Julia. For truly immense data sets that
require pipelines, you'll use tools like Spark, Hadoop, etc. But SAS is
basically a slightly improved Excel.

~~~
jensgk
> SAS is basically a slightly improved Excel.

SAS is a pretty big system, and I have worked with SAS for about 10 years, and
I can't see any similarity with Excel at all. Which part of the SAS system do
you think resembles Excel? Here is a list of their products
([http://support.sas.com/documentation/productaz/index.html](http://support.sas.com/documentation/productaz/index.html))

Btw. I do think I am doing serious data analysis in a bank :-)

~~~
flatfilefan
He might have only seen Enterprise Guide, which indeed resembles Excel. But
that's a good thing, actually.

~~~
Fomite
That's my guess. Personally, for 99% of my uses for SAS, all I see is the
script window, the log, and the output window. That's about as far from
'Excel' as you can get while still _having_ an interface.

