

Comparison of data analysis packages: R, Matlab, SciPy, Excel, SAS, SPSS, Stata - lrajlich
http://anyall.org/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/

======
trapper
The biggest problem is that the design decisions of some of the projects
listed are pretty poor. Who decided that dataset size should be limited to
available memory? Not only is this a severe limitation of the framework, it
also affects the programmer's mentality. It shows. Take a look at the R
source (and tests _cough_) and you will see the assumptions.

[speaking from experience writing both in memory and disk based data analysis
packages]

Disk-based work is pretty straightforward. It's not rocket science, and
designing for it makes your code much, much faster even when working in
memory, in most circumstances.
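
For instance, a single streaming pass over a file is only a few lines. A
minimal Python sketch, assuming a flat binary file of float64 values (the
file layout and chunk size are illustrative, not from any of the packages
discussed):

    import numpy as np

    def streaming_mean(path, chunk_elems=1_000_000):
        """Mean of a float64 file, read in chunks, never all in RAM."""
        total, count = 0.0, 0
        with open(path, "rb") as f:
            while True:
                chunk = np.fromfile(f, dtype=np.float64, count=chunk_elems)
                if chunk.size == 0:
                    break
                total += chunk.sum()
                count += chunk.size
        return total / count if count else float("nan")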

~~~
joe_the_user
I would like to think that it wouldn't be that hard to adapt an in-memory
system to use something like a memory-mapped file, or even a custom cached
memory-mapped file. Of course, such a system might not be designed to
avoid page swaps/cache misses.
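
Having said that, a minimal sketch of the memory-mapped route, assuming
NumPy and a hypothetical flat file of float64 values:

    import numpy as np

    # memmap lets the OS page data in on demand, so the "array" can be
    # bigger than RAM; sequential access is fine, random access thrashes.
    data = np.memmap("big_dataset.f64", dtype=np.float64, mode="r")
    print(data.size, data.sum())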

Granted, this is about how the system could evolve - the possibility might
not help a simple user now.

~~~
trapper
Well, having done exactly that, you really do need a rewrite or a
fundamental refactoring. It will touch most functions in your codebase.

Memory mapping won't help when you have a 100 GB+ file, and as you say it
gets slow, since it's definitely not optimal.

You also need custom indexing structures and data caching strategies for
most algorithms that aren't easily moved to disk, and unfortunately most
aren't. The other issue is that you end up doing a lot of research, because
there just aren't many people who have done this. It's a time sink.
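
As a rough illustration of the caching side, here is a toy Python sketch of
a cached block reader (the class, block size, and cache size are made up
for illustration, not our actual code):

    from functools import lru_cache

    BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks

    class BlockReader:
        """Serve byte ranges from a big file, keeping hot blocks cached."""
        def __init__(self, path, cached_blocks=256):
            self._f = open(path, "rb")
            self._block = lru_cache(maxsize=cached_blocks)(self._load)

        def _load(self, block_no):
            self._f.seek(block_no * BLOCK_SIZE)
            return self._f.read(BLOCK_SIZE)

        def read(self, offset, length):
            out = bytearray()
            while length > 0:
                block_no, start = divmod(offset, BLOCK_SIZE)
                piece = self._block(block_no)[start:start + length]
                if not piece:
                    break  # past end of file
                out += piece
                offset += len(piece)
                length -= len(piece)
            return bytes(out)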

I must say it was awesome seeing our decision tree system run on huge
datasets (tested at > 100 GB) in a similar time (~30 seconds) to an
in-memory database, after indexing.

------
rdixit
SciPy + NumPy + matplotlib; get Enthought's distribution for a one-stop
shop and an interactive shell via IPython. My 2 cents. In the short term,
though, Matlab legacy code is definitely a big plus for many
scientific/data analysis applications.
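
A minimal sketch of that stack in action (the data here is made up):

    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt

    x = np.random.normal(loc=5.0, scale=2.0, size=10_000)
    print(stats.describe(x))  # n, min/max, mean, variance, skew, kurtosis

    plt.hist(x, bins=50)
    plt.title("Sample distribution")
    plt.show()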

------
urlwolf
I'm using R for largish datasets. When compiled on 64-bit Linux, it can
address plenty of memory. The Windows version is limited to 1.5 GB, even on
64-bit Windows. This is bound to change, as Revolution Computing has a
64-bit Windows R that works (and adds some parallel libraries), but that is
not free, and it's still in beta. We are getting one.

I think 64-bit R + Resolver One (no limits on spreadsheet size) can get a
lot done in a very visual way. I just happen to catch bugs faster when I
color-code cells, but this is impossible to do in straight R (fix() and
edit() suck!) or in Excel, because of spreadsheet limits. The combo I
propose (I don't have it yet) will be expensive, but worth it I think.

------
ScottWhigham
For the Windows crowd, there is also SQL Server Reporting Services, which
is "free" if you already have a SQL Server license. It's more of a
roll-your-own reporting package, but it's quite easy to use.

------
mwexler
The article was helpful, but read the comments as well; they flesh out some
real-life experiences from folks who have used multiple packages.

------
Radix
Where does Maple fit in here? I have repeatedly heard of it as an alternative
to Matlab.

~~~
acangiano
Maple is Mathematica's main competitor.

