
Ask HN: Any experience running R within Postgres? - prmph
Would anyone like to share their experiences running R in Postgres? I&#x27;m evaluating accessing R packages from a remote Postgres db rather than using something like OpenCPU
======
apohn
Could you describe what you are trying to do in more detail?

It's been a few years, but I used Pl/R for this.

[http://www.joeconway.com/plr/doc/doc.html](http://www.joeconway.com/plr/doc/doc.html)
[http://www.joeconway.com/plr/](http://www.joeconway.com/plr/)

The use case was a fairly straightforward one. We needed to run a snippet of
R-code on some data in Postgres and wanted to limit the data transfer distance
between the database and the R session. We could also have just installed R on
the same server and had it call Postgres locally, but having being able to
call the R function via SQL was very convenient.

Debugging and package management was a pain, but once it was running things
were fairly smooth after that. At some companies the IT function will
rightfully scream in horror at Pl/R.

I'm don't knows how Pl/R moves the data between Postgres and R. I've seen a
lot of cases with other software where the data is written to disk and then
read into R. This is then done in reverse for the return trip. For larger
datasets or for small datasets that arrive at high frequency (e.g. streaming
or mini-batch applications) this may be problematic and it may make sense to
use R's JRI ([https://www.rforge.net/JRI/](https://www.rforge.net/JRI/))
instead.

~~~
prmph
So you still ned an external R engine? I was under the impression that the R
engine would be embedded in the Postgres database

~~~
apohn
Yes, the R engine is external. It can be installed on the same machine to
improve performance. You'll have to manage R and Postgres separately, but the
communication between them can be streamlined (e.g. what is done in PL/R).

This is typically how R-integration is done. The benefit of this is that the R
process is isolated from the DB engine process and any issues with the R
engine will not cause a major failure with the DB engine. If somebody
integrated the R runtime in the same process as Postgres, the R session
crashing could lead to entire DB going down as well.

MSSQL's and Oracle's R integration work the same way. They spawn separate R
processes and communicate with them.

Note that there's a nuance to "spawn a separate R process." Some pieces of
software spawn R using the command line and communicate using temporary
directories. Others use JRI, which loads the R library into Java and calls R
using API calls.

