
In-database R coming to SQL Server 2016 - Hansi
http://blog.revolutionanalytics.com/2015/05/r-in-sql-server.html
======
bladecatcher
While this is a good concept in theory, I'd be skeptical about building on top
of such a system. The primary reason is the slowness of R. I built heavy-duty
data mining systems using a stack of kdb/q and R. In my experience, R, when
used for simple clustering algorithms like k-means and k-medoids, slowed down
my system by a factor of nearly 70. This is despite running parallelized
versions of these algorithms (by means of the SPRINT R package) using mpiexec.

IMO, there is a very big gap in this space. There is an urgent need for
high-performance data inference languages. MATLAB is decent, but is still too
clunky for my taste. Plus, I prefer the simplicity of a file-mapped,
column-oriented database like the one offered by kdb. As KDB is too expensive
for me right now, I'm considering building on top of the excellent J
language/JDB database stack for my big data needs.

~~~
gaius
This will all be down to the implementation of DataFrame, no?

In other words, if you put q on top of an R DataFrame, would you expect the
same performance? Or if you ran R on top of K?

~~~
bladecatcher
I'm not sure if I understand your question correctly. kx provides shared
libraries to make R and q talk to each other. The specific approach I followed
used a shared library that brings R into the kdb+ memory space, meaning all
the R statistical routines and graphing capabilities can be invoked directly
from kdb+. Using this method means data is not passed between remote
processes.

~~~
gaius
Ah I assumed you were loading a vanilla DataFrame from the column store. My
bad :-)

------
gtrubetskoy
For what it's worth, PostgreSQL has had this since 2003.
[http://www.joeconway.com/plr/](http://www.joeconway.com/plr/)

IMHO scripts running in a database server never work all that well - debugging
is a nightmare. At least this has been my experience from trying PG plpython a
few years ago.

Link to the original announcement email:
[http://www.postgresql.org/message-id/3E514A46.2040604@joecon...](http://www.postgresql.org/message-id/3E514A46.2040604@joeconway.com)

~~~
hadley
R does have some pretty cool tools for after-the-fact debugging, like
dump.frames, but few people know about them.

------
fs111
How does that work when R is GPL licensed? Doesn't that make SQL server a
derived work?

~~~
toddkazakov
Revolution have their own implementation of R called OpenR.

~~~
hadley
No, that's not their own implementation. It's GNU R bundled with some extras.

------
Jake232
I'm always wary of these kinds of ideas, of executing code on/within my
database server.

I know it's apparently sandboxed, but that didn't work out too well for
ElasticSearch recently:
[https://jordan-wright.github.io/blog/2015/03/08/elasticsearc...](https://jordan-wright.github.io/blog/2015/03/08/elasticsearch-rce-vulnerability-cve-2015-1427/).

~~~
SigmundA
SQL is code running in the database server. Most SQL implementations (TSQL,
PL/SQL, PGSQL) are Turing complete as well.
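
To illustrate the point that plain SQL already expresses general computation inside the server, here is a minimal sketch using a recursive common table expression. It assumes SQLite (through Python's standard `sqlite3` module) as a stand-in engine, which is not one of the dialects named above; the function name `factorials` is hypothetical.

```python
import sqlite3


def factorials(n):
    """Return [(i, i!)] for i = 1..n, computed by the SQL engine itself."""
    conn = sqlite3.connect(":memory:")
    try:
        rows = conn.execute(
            """
            WITH RECURSIVE fact(i, value) AS (
                SELECT 1, 1                      -- base case: 1! = 1
                UNION ALL
                SELECT i + 1, value * (i + 1)    -- step: (i+1)! = i! * (i+1)
                FROM fact
                WHERE i < ?                      -- stop once i reaches n
            )
            SELECT i, value FROM fact ORDER BY i
            """,
            (n,),
        ).fetchall()
    finally:
        conn.close()
    return rows
```

The loop runs entirely inside the database engine; the host language only submits the query and reads back the rows.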

------
saosebastiao
I hope it's not as bad as PL/R. Seemed like a good idea, but the performance
was so terrible that it was essentially useless.

~~~
jeltz
Any idea what makes PL/R slow? Is it time spent doing data conversions from
PostgreSQL types to R types?

~~~
tankenmate
This article[0] shows exactly this problem, and it links to patches that
handle the issue. The article is 4.5 years old; I'm not sure whether the
patches have been upstreamed.

[0]
[http://www.credativ.co.uk/credativ-blog/2010/07/postgresql-t...](http://www.credativ.co.uk/credativ-blog/2010/07/postgresql-topic-of-the-day---plr-performance-improvements)

------
sorokod
How is Java inside Oracle DB doing nowadays?

------
modarts
Shudder

------
atorralb
Just so you know, it's already in SAP HANA... It's nice to see that programming
languages are part of a database and not just an extension.

~~~
toddkazakov
It's not actually in the database. It's a lousy implementation which does not
use the distributed nature of HANA. You fetch the data first and then you do
your analyses in R which is hosted near the DB. If you want parallel
processing in R, you're on your own.
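
The difference between fetching the data and analysing it outside, versus running the computation where the data lives, can be sketched roughly as below. This is only an illustration under stated assumptions: it uses SQLite via Python's `sqlite3` module rather than HANA or R, and the `readings` table and all function names are hypothetical.

```python
import sqlite3


def build_db():
    """Create a small in-memory table of (sensor, value) readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)],
    )
    return conn


def mean_fetch_then_analyze(conn):
    """Pull every row out of the database, then aggregate in the host language."""
    totals = {}
    for sensor, value in conn.execute("SELECT sensor, value FROM readings"):
        running_sum, count = totals.get(sensor, (0.0, 0))
        totals[sensor] = (running_sum + value, count + 1)
    return {sensor: s / c for sensor, (s, c) in totals.items()}


def mean_in_database(conn):
    """Let the engine aggregate; only one row per sensor leaves the database."""
    query = "SELECT sensor, AVG(value) FROM readings GROUP BY sensor"
    return dict(conn.execute(query))
```

Both functions give the same per-sensor means, but the first transfers every row to the caller while the second transfers only the aggregated result.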

------
vittore
Why, Microsoft? Why? Why not Python?

~~~
SigmundA
I would say why not improve the CLR support that has been there since SQL
2005?

~~~
BrentOzar
> I would say why not improve the CLR support that has been there since SQL
> 2005?

Because it won't cause new license sales, whereas integrating R might.

