

R in a 64 bit world - erehweb
http://www.win-vector.com/blog/2015/06/r-in-a-64-bit-world/

======
rxin
Coincidentally, R on Spark (SparkR) was also announced today:
http://databricks.com/blog/2015/06/09/announcing-sparkr-r-on-spark.html

It will ship in Spark 1.4 and lets you run R on a cluster of machines or on a
single multicore machine.

------
iaw
I once hit a problem in R while trying to run a mixture model that depended on
some underlying Fortran code: the Fortran code couldn't handle the initial
size of the value to be minimized.

The only solution I found was to completely rewrite the code in Python to
avoid the problem. I was chuckling for a while about hitting an unsolvable
Fortran problem in 2014.

~~~
hyperbovine
That's a puzzling bug as Python floats are just doubles, which Fortran
definitely has. But I identify with the larger point: if you do scientific
computing for any amount of time you _will_ run into Fortran code. Old,
convoluted, unmaintainable, and often wicked fast and devoid of bugs. A really
eye-popping amount of netlib, *pack, etc. being used in production right this
very moment either relies on Fortran routines, or is calling C code that was
ported from an equivalent Fortran routine. It's the result of some really
smart people putting in a lot of time and effort over the past 30 years; if it
ain't broke...
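
To make the "Python floats are just doubles" point concrete, here is a minimal sketch using only the standard library. The helper name `float_is_ieee_double` is made up for illustration, and the check assumes CPython on a typical platform where the C `double` is IEEE 754 binary64.

```python
import struct
import sys


def float_is_ieee_double() -> bool:
    """Return True if this interpreter's float looks like an IEEE 754 binary64.

    Hypothetical helper for illustration: it checks the mantissa width,
    the maximum binary exponent, and the native size of a C double.
    """
    info = sys.float_info
    return (
        info.mant_dig == 53              # 53 significand bits (52 stored + 1 implicit)
        and info.max_exp == 1024         # largest binary exponent of a double
        and struct.calcsize("d") == 8    # a native C double occupies 8 bytes
    )


# Integers above 2**53 can no longer all be represented exactly as floats.
LIMIT = 2 ** sys.float_info.mant_dig
```

On such a platform a Python float and a Fortran `DOUBLE PRECISION` value carry the same range and precision, which is why a bug that goes away merely by moving to Python is surprising.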

~~~
angersock
As a friend of mine in academia is discovering...it can be quite hard to tell
if it _is_ broke.

~~~
rspeer
About two years ago I encountered a bizarre bug where, if I asked SciPy for
the eigenvalues of a particular small matrix that I was using as a test case,
it would consistently give a different result on my desktop computer than
anywhere else. But when I tried to isolate the test case, it went away. It
would only happen if I ran three other test cases in a particular order first.
Or if I ran that one test case 12 times in a row, it would fail the twelfth
time.

I really _wanted_ to find out what was going on. I looked through the code,
from SciPy to ARPACK to the underlying ATLAS calls, at which point it became
completely opaque to me.

I still don't know whether it was the fault of ARPACK or ATLAS or what. I just
put the test cases in a different order, in which they consistently passed
just like they passed for everyone else, and a few system upgrades later the
problem stopped happening.

~~~
hyperbovine
ATLAS compiles differently on different machines ("Automatically Tuned Linear
Algebra Software") so this doesn't come as a complete surprise. Agree that
that's a really annoying bug though. Do you happen to remember the matrix?
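
One general reason differently tuned builds can disagree is that floating-point addition is not associative, so a library that reorders the same arithmetic can return a slightly different answer. The sketch below illustrates only that general effect in plain Python; it is not a reproduction of the bug described above, whose cause is unknown, and the input values are made up for illustration.

```python
import math


def sum_in_order(values):
    """Add the values left to right using ordinary double-precision addition."""
    total = 0.0
    for v in values:
        total += v
    return total


# Illustrative values: two large terms that cancel, plus two small ones.
values = [1e16, 1.0, -1e16, 1.0]

forward = sum_in_order(values)            # adds in the order given
reordered = sum_in_order(sorted(values))  # same numbers, ascending order
exact = math.fsum(values)                 # correctly rounded sum, for reference

print(forward, reordered, exact)
```

The three results differ even though the inputs are identical, which is why numerical tests usually compare against a tolerance rather than demanding bit-for-bit equality.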

------
skybrian
While I'm not a data scientist, I have been doing Euler problems in Julia for
fun and the situation seems better there, at least when it comes to the
foundations.

~~~
Lofkin
Definitely.

