As an aside, I used R briefly for an econometrics/philosophy course I took. I recognized immediately it was a powerful, functional language. What I wonder, though, is if the scientific programming libraries in Python might eventually be a better environment. Surely there must be some R users here who can comment.
if the scientific programming libraries in Python might eventually be a better environment.
I also believe R is easier for statisticians who cannot and do not want to know how to program. There are many advanced techniques that are one-liners.
In other news: Programmers Captivated by Power of C
R, at least in the beginning, comes across more like a powerful set of Excel formulas, so I think a non-programmer might pick it up faster without having good programming form.
(Not to say one is more powerful than the other, or that R programmers aren't as good as Python programmers, etc, I just think it's a difference in community viewpoints)
One major advantage over Python is that it's vectorised, so you can say things like A + 1 when A is a vector (or matrix). A bit like Matlab in that regard, only that it doesn't suck as much as the Matlab ``language''.
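To make the vectorisation point concrete, here is a minimal sketch in R. The variable names are made up for illustration; the only thing it relies on is R's built-in element-wise arithmetic.

```r
# A numeric vector; adding a scalar applies to every element.
A <- c(1, 2, 3)
A_plus_one <- A + 1

# The same works for a matrix (filled column by column).
M <- matrix(1:4, nrow = 2)
M_plus_one <- M + 1

print(A_plus_one)
print(M_plus_one)
```

No explicit loop is needed in either case; the scalar is recycled across the vector or matrix.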
This is not very accurate. NumPy / SciPy provide vectorized matrix libraries, significantly faster than both R and Matlab for matrix operations. No argument though that Matlab as a language truly sucks =)
i find that too limiting, found python+numarray+matplotlib and never looked at matlab ever since ... never regret
ironically i got a phone interview request from matlab at the end that year (i was on vacation, never got to that)
And it turns out I still have the syllabus lying around, so I don't need to try to explain it myself: http://www.cs.vt.edu/~scschnei/syllabus.pdf
- All statistical models have assumptions. Even if a model looks like it fits the data, make sure the data doesn't violate those assumptions. If it does, the model doesn't fit.
- Causation can be inferred, with confidence, just by analyzing data.
Honestly, the econometrics stuff was presented poorly. What looked like pages from a book were put up on the projector (and in some cases, I think they were book pages), and the professor would just talk through the page. Picking up anything worthwhile from his lectures was hard - he knew the class had a varied background (some CS, some philosophy, some economics, even one person from marketing), but he still went faster than my prob/stat background could keep up.
The causal inference stuff was presented better, but I think the subject matter is more intuitive in general. The philosophy professor's math was graph theory, which I have a firmer grasp of.
However, if you need to do anything systematic, do NOT use R: bugs are elusive and extremely tedious to debug.
Also, here are interesting links I've found comparing R and similar statistical programs. Sorry, don't know much about Numpy's capabilities.
R equivalent in speed to Matlab:
Someone's research for a good data analysis language:
If you are working with a lot of heterogeneous data in R it becomes a real headache. Merging data frames seems like it should just work, but if one of your sets of keys (strings) is stored as 'factors' (what I am calling 'legacy S+ functionality'; I'm sure they're useful for many algorithms), you'll end up with garbage. There's a hack you can put in your code ('options(stringsAsFactors=FALSE)') which alleviates some of this, but in general I found aligning data to be a huge pain. If you're running regressions, this is pretty important.
haven't tried sage but have heard good things. NumPy is a good alternative because it's extremely well implemented and has consistent behavior across the board. Extensibility (with Fortran, Cython/Pyrex, C/C++) is clean and easy. Never thought I'd write Fortran 77 code being born quite a few years after '77 but it's an easy way to speed up simple procedural algorithms 50x or more.
If you want to merge data frames that were created w/ different factors, perhaps the easiest thing to do is turn your factors into strings?
If d is your data frame, then:
d$factorVar <- as.character( d$factorVar )
merge your two data frames, then
merged$factorVar <- as.factor( merged$factorVar )
should set you right...
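Spelled out as a complete example, this could look like the sketch below. The data frames `left` and `right` and their columns `key`, `x` and `y` are made up for illustration; only `as.character`, `merge` and `as.factor` come from the suggestion above.

```r
# Two hypothetical data frames whose key columns are factors
# with different sets of levels.
left  <- data.frame(key = factor(c("a", "b", "c")), x = 1:3)
right <- data.frame(key = factor(c("b", "c", "d")), y = c(10, 20, 30))

# Turn the factor keys into plain strings before merging.
left$key  <- as.character(left$key)
right$key <- as.character(right$key)

# merge() does an inner join on the shared key by default.
merged <- merge(left, right, by = "key")

# Convert the key back to a factor if later code expects one.
merged$key <- as.factor(merged$key)

print(merged)
```

Only the keys present in both frames ("b" and "c") survive the default inner join; pass `all = TRUE` to `merge` if you want to keep unmatched rows as well.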
Our largest and most profitable fund (around US$25B), which includes some of our brightest people and runs on the Linux server infrastructure I design, also uses R and Python.
Trades are made based on the models created in these languages by our research teams. Our traders simply execute what our researchers' models tell them to, when the models tell them to.
So both R and Python are fundamental parts of our business without which our best products could not function.
We don't use freeware either, as none of R, Python, or Linux is freeware. They are licensed software, with OSD-compatible licenses.
Can the traders be eliminated? Why pay them if all they do is carry out orders? (I am asking sincerely)
My guess is largely, they can be. The actual trades could be automated (this would become increasingly necessary with rapid-fire [millisecond] trading, which we don't do now but could in future). The meatware oversight could be consolidated to a smaller group of individuals.
Actually for critical control software, I want it to be boring and simple.
However, the argument is still obviously laughable.
“I think it addresses a niche market for high-end data analysts that want free, readily available code,” said Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”
Wow, is that a FUD statement if I ever heard one. Pretty cynical stuff from Anne H. Milley.
But the person he's quoting, like you, confuses Open Source software / Free Software with freeware, which is generally considered to be unlicensed or public domain software.
These people are not dumb.
I can access the paper because my school subscribes. If anyone wants to read it, figure out how to email me and I'll send a copy.
While searching, I did find a brief email from Ihaka talking about the jump to Lisp: https://stat.ethz.ch/pipermail/r-devel/2008-May/049501.html
In the clinical space, it's all SAS. Pretty much the de facto standard.
Admittedly, I think R and PDL do different things... (I have never played with R).
There are CPAN modules to use R directly from Perl (e.g. R::* and Statistics::R).
I have a stats friend who's been singing the praises of R & PDL for donkey's years.
That's an interesting reaction from the first designers of the program.
The tooling, library integration, and debuggers aren't as good as, say, Python, though.
It's an actual programming language. Like Python, you can use both non-OOP and OOP styles. You can define your own packages and namespaces (unlike MATLAB where there's just one big namespace). There are hundreds of contributed packages, you don't need to buy separate toolboxes. Not to forget, it's GPL, you can look at the source and learn a few things from people who know what they're doing (to name a few: Brian Ripley, Terry Therneau, Douglas Bates, Bill Venables).
On the other hand, the learning curve can be a bit steep at the beginning, but I feel it's definitely worth it. It's not perfect, and I stub my toe on obscure language features from time to time, but to paraphrase Winston Churchill, R is the worst statistical language, except for all the others that have been tried from time to time...
I think that Andrew Robinson's introduction is pretty decent (http://cran.r-project.org/doc/contrib/Robinson-icebreaker.pd...), but there are many others at http://cran.r-project.org/other-docs.html
The downsides are, well, it's slow for large data sets and debugging can be difficult. But as a desktop / rapid development platform for statistics it is without peer, IMO.
PS -- unlike Matlab, which often costs thousands of dollars, and the Statistics Toolbox, which costs thousands more, R is free. This is pretty important on its own -- instead of paying $5k per server, workstation, and home PC, install it on any Linux, Mac, or Windows box you have and get to work for $0.00.