

Announcing pqR: A faster version of R - tmoertel
http://radfordneal.wordpress.com/2013/06/22/announcing-pqr-a-faster-version-of-r/

======
bravura
For those who don't know, Radford Neal is a highly respected statistician and
machine learning researcher.

His most well-known contribution is showing the community how to train neural
networks in a Bayesian way.

He is at Toronto, where he received his Ph.D. under Geoff Hinton (the father
of deep learning).

~~~
jtmcmc
Also he invented tons of kickass sampling techniques and wrote what I consider
to be the best review of MCMC of all time
[http://www.cs.toronto.edu/~radford/ftp/review.pdf](http://www.cs.toronto.edu/~radford/ftp/review.pdf)

------
tardigrade
I don't understand why this wouldn't just be merged into the upcoming versions
of R. Creating a new project is strange; does anyone know why they might have
done this? I'm sure there's a good reason that I'm just unaware of.

~~~
jergosh
R core developers are notoriously resistant to change. A number of glaring
inefficiencies have gone unfixed for years, despite people submitting patches
etc.

~~~
hadley
R core is generally motivated by ensuring that R continues to work as is, not
by improving performance. You can argue whether or not this is a good idea,
but in the absence of a comprehensive unit test suite, it's pretty hard to
improve performance without breaking behaviour.

~~~
radfordneal
It's difficult to see how this rationale can possibly justify ignoring a 10x
speed up in vector-matrix multiplies (and similar speedups for some other
matrix multiplies) that can be achieved with a modification affecting a dozen
or so lines of easily-checked code.

------
epistasis
Radford Neal: oh, let me take a break from ground-breaking stats work to
double the speed of R.

I'm very thankful.

------
scottfr
Looks great!

One thing R really needs is some sort of dead simple pass-by-reference
mechanism for functions. Creating object copies every time you call a
function on an object is a real performance killer.

~~~
_delirium
If a function doesn't modify a passed data frame, R doesn't actually copy it.
It's formally pass-by-value, but the implementation uses a copy-on-write
approach, so no copy is made when the function only reads the parameter's
values. That at least covers the common case of passing a bunch of data to a
function that builds a statistical model.

Of course that doesn't help in the case where you do want the called function
to modify the data, but in-place rather than by making a copy.
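
As a rough illustration of the copy-on-write idea described above, here is a toy model in Python. It is not how R is implemented; the class names `_Buffer` and `CowVector` and the `share` method are invented for this sketch. Passing an argument is modelled as creating a second handle to the same storage, and the copy is deferred until a handle with shared storage is written to.

```python
class _Buffer:
    """Storage plus a count of the handles that point at it (hypothetical)."""
    def __init__(self, values):
        self.values = list(values)
        self.refs = 1


class CowVector:
    """Toy copy-on-write vector: handles share storage until one writes."""
    def __init__(self, values):
        self._buf = _Buffer(values)

    def share(self):
        # Models passing the vector to a function: new handle, no copy.
        other = CowVector.__new__(CowVector)
        other._buf = self._buf
        self._buf.refs += 1
        return other

    def __getitem__(self, i):
        return self._buf.values[i]

    def __setitem__(self, i, value):
        # The copy happens only here, and only if the storage is shared.
        if self._buf.refs > 1:
            self._buf.refs -= 1
            self._buf = _Buffer(self._buf.values)
        self._buf.values[i] = value

    def shares_storage_with(self, other):
        return self._buf is other._buf


def read_only(v):
    arg = v.share()          # no copy is made
    return arg[0] + arg[1]


def modifies(v):
    arg = v.share()
    arg[0] = 99              # first write triggers the copy
    return arg


x = CowVector([1, 2, 3])
total = read_only(x)
y = modifies(x)
```

After this runs, `x` still holds its original values and `y` has its own storage. The count is never decremented when a handle is simply dropped, so this toy can copy more often than strictly necessary.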

~~~
radfordneal
If you look at the "future directions" section of pqR's version of the "R
Internals" manual, you'll see a brief mention of a plan to implement "call by
name" parameter passing in the style of Algol 60, which should address this
issue. Before that happens, however, pqR will improve the tracking of
references to reduce the number of unnecessary copies made when parameters are
passed by value, which may be more than you realize in past versions of R.

~~~
hadley
It was my understanding that R basically already does implement call-by-name -
arguments to a function are passed by name and looked up in the calling
environment until you first modify them. Is my understanding incorrect, or do
Algol 60's call-by-name semantics mean something different?

~~~
radfordneal
Sort of. That's why it shouldn't be too hard a modification to implement. For
a call-by-name argument, you just have to evaluate the "promise" every time,
rather than just the first time. (Assignment to a call-by-name argument will
be a bit trickier, but not impossible.)
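
To make the distinction concrete, here is a small Python sketch, not R or pqR code. The classes `Promise` and `ByName` and the helper `use_twice` are hypothetical names for illustration. A promise evaluates its expression the first time it is forced and caches the result; a call-by-name argument re-evaluates the expression on every use, so it sees changes made in between.

```python
class Promise:
    """Evaluate the thunk on first use, then reuse the cached value."""
    def __init__(self, thunk):
        self._thunk = thunk
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._thunk()
            self._done = True
        return self._value


class ByName:
    """Call-by-name: re-evaluate the thunk every time it is used."""
    def __init__(self, thunk):
        self._thunk = thunk

    def force(self):
        return self._thunk()


def use_twice(arg, side_effect):
    # Read the argument, change the caller's state, then read it again.
    first = arg.force()
    side_effect()
    second = arg.force()
    return first, second


state = {"i": 0}

def bump():
    state["i"] += 1

by_need = use_twice(Promise(lambda: state["i"] * 10), bump)

state["i"] = 0
by_name = use_twice(ByName(lambda: state["i"] * 10), bump)
```

Here `by_need` is `(0, 0)` because the second read reuses the cached value, while `by_name` is `(0, 10)` because the expression is evaluated again after `bump()` runs. Assignment through a call-by-name argument is not modelled.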

------
makeset
How does the "helper threads" mechanism interact with existing code explicitly
using multicore operations like mclapply? Inadvertently spawning extra "helper
threads" from each of the explicit processes per core would not be pretty.

~~~
radfordneal
At present, pqR waits for all helper threads to be idle before doing a fork in
the "parallel" package, and disables use of helper threads in the child
processes (and temporarily in the parent, since it will wait for the child
processes before doing anything more).

~~~
makeset
Sounds great, thank you. I'm off to build this thing.

------
perlgeek
Are there any plans to merge that back into the main R implementation? And if
not, what are the reasons for keeping a separate fork? Backwards
compatibility?

------
ehsanu1
Off topic:

Maybe it's just me, but the log scale for the relative program times is pretty
confusing. While it probably makes the improvement more obvious, it doesn't
help me understand the actual magnitude of improvement visually, without
having to look at the scale and figure out the actual numbers.

~~~
simcop2387
I think what would have worked better is normalizing the interpreted to 1.0
and then having the pqR results set against that. That'd make the graph far
less noisy and much easier to interpret for making the case of pqR being
faster. Right now there's much more information there than needs to be for
that with every different version of both being represented.
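
A minimal sketch of that normalization in Python, using made-up timings (the test names and numbers below are invented for illustration and are not taken from the post): divide each candidate time by the baseline time for the same test, so the baseline is 1.0 by construction.

```python
def relative_times(baseline, candidate):
    """Return candidate time / baseline time for each test name."""
    return {name: candidate[name] / baseline[name] for name in baseline}


# Hypothetical timings in seconds, NOT real benchmark results.
baseline = {"test_a": 2.0, "test_b": 8.0}
faster = {"test_a": 1.0, "test_b": 2.0}

ratios = relative_times(baseline, faster)
```

With these invented numbers, `ratios` is `{"test_a": 0.5, "test_b": 0.25}`, i.e. the candidate takes half and a quarter of the baseline time.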

------
minimaxir
Is this version compatible with RStudio?

~~~
radfordneal
At present, pqR is compatible with RStudio only if you configure with the
--disable-helper-threads option (i.e., no automatic parallelization) along with
the --enable-R-shlib option that's needed for RStudio. This is a minor glitch
in how pqR is linked that will be fixed soon.

------
nkurz
Revolution Analytics[1] is also claiming a lot of speed improvements. Is there
a sense yet of how pqR compares? Some of their speed up comes from linking
Intel's Math Kernel Library. Does this duplicate the "helper thread" approach
pqR uses or would they complement each other?

[1] I'm not familiar with them other than their website. My impression is that
they are real, but the website feels just "slick" enough to make me uncertain.
Are they considered reputable?

~~~
carterschonwald
A friend of mine was testing Revolution's tools at his job recently and found a
few frustration points: namely, it's based upon a relatively old version of R,
and while they claim to support larger-than-RAM numerics, it's apparently
quite thrashy.

[There are some interesting subtleties to supporting larger-than-RAM computation
well enough for it to beat distributed, which I'm trying to do for my own
work, so I found it quite exciting to hear that current analytical tooling
vendors don't do it terribly well :) ]

------
_anshulk
Phenomenal!

> Since pqR has not yet been tested on Windows and Mac systems, trying to
> install it on such a system is not currently recommended.

Can't wait to switch to it on OS X. Installing on my server to play with it...

------
joelthelion
This is cool, but wouldn't it be a better idea to reimplement the interpreter
from the ground up on a solid platform such as pypy?

------
mikevm
Any idea why the R Core team haven't accepted his patches?

------
sunseb
No way I'd use this... In French, PQ = toilet paper lol. :-/

~~~
pi18n
Some English speakers claim they are uninterested in Coq for similar reasons.
I guess if you want to ignore a good tool because of a bad name, go ahead. But
it seems to be throwing the baby out with the bathwater.

~~~
draugadrotten
What's in a name?

While I do not suggest ignoring tools such as pqR, coq or gimp because of their
childish and immature names, I do think it is bad manners to name your tools
in an offensive way.

It also makes people talk about the bad name you chose for your tool instead
of the properties of your tool, and surely that's not what you wanted, as a
creator.

If you want your tool to be considered a professional tool used by
professionals, please name it like a professional would. If you want your tool
to be used by children, by all means, name it like a child would.

~~~
rflrob
I think it's unreasonable to expect the namers of a language to know every
conceivable double entendre in every language. Coq was named by French
researchers, and means "rooster", and pqR is obviously from the sequence of
letters. Neither of those seems particularly immature to me, in context.

~~~
extra88
Yes, it's the sequence of letters but that's not all. From the first line in
TFA, "pqR — a “pretty quick” version of R."

------
phalina
Good work Radford Neal!

