
PG-Strom: SSD-To-GPU Direct SQL Execution - blopeur
https://github.com/pg-strom/devel
======
blopeur
Slides :
[https://www.slideshare.net/kaigai/20170602ossummitanintellig...](https://www.slideshare.net/kaigai/20170602ossummitanintelligentstorage)

~~~
walterbell
Could OpenCL (AMD, FPGA) be supported in the future?

~~~
eriknstr
A bit off topic on my side here but, I've been a aware of both OpenCL and
FPGAs for quite a while but never thought of these being combined in any way.

Would anyone care to comment on their experience with regards to the following
questions I have about this?

1\. Are you using an FPGA implementation of OpenCL, or are you synthesizing a
specific piece of OpenCL code into the FPGA?

2\. How does performance compare between OpenCL running on a high-end GPU and
a high-end FPGA?

3\. How does performance compare between implementing someting in FPGA
directly and implementing it in OpenCL and running that on an FPGA?

4\. How does working with OpenCL for FPGA compare to working with the FPGA
directly? Iow, what are the advantages and disadvantages you find of OpenCL as
opposed to Verilog or VHDL (or other ways you program an FPGA)?

5\. Based on the above, if you had X amount of dollars that could buy you
either a number of GPUs or a (probably different) number of FPGAs, which would
you choose? Consider also the electricity cost of running these as I've heard
that FPGAs generally waste less electricity because they produce less heat.

------
microcolonel
This seems like a perfect fit for AMD's SSG architecture (1TiB flash storage
directly on the GPU board).

~~~
arnon
It's Nvidia only, it has direct CUDA

------
ricw
I love the concept.p But it's hard to tell whether this is ready for
production. Anyone have any hands on experience with it?!

~~~
nikanj
This is the first time the concept has ever been shown publicly. You must be
exceptionally brave to use it in production. Most CTOs are cynical, and always
wait for the .1 version

~~~
jeltz
Hm? I have not followed PGStorm but the project has been around for at least 5
years, probably more.

~~~
kaigai
Early version of PG-Strom didn't plan SSD2GPU Direct SQL Execution, however,
some users concerns didn't concern about CPU performance only but I/O also.
So, I began to implement the feature to accelerate I/O also. Its development
started at Dec-2015.

------
en4bz
Note that GPUDirect is only available on nVidia Quadro and Tesla series GPUs.

~~~
nosefouratyou
Does that mean that this library[1] requires GPUDirect? That's sad.

[1]: [https://github.com/pg-strom/nvme-strom](https://github.com/pg-
strom/nvme-strom)

~~~
kaigai
Yes, Tesla or Quadro are required. In addition, Tesla P40/P100 are recommended
due to size of i/o mappable region.

------
redm
It's great to see this type of evolution in database performance, maybe this
will become the standard someday.

The only thing missing is that there isn't a comparable project for MySQL.

~~~
microcolonel
Not trying to be a snoot here, but what is the point of using MySQL today? I
can't think of something that I'm aware it does better than Postgres.

~~~
Filligree
Interoperating with code and data files that were made for/by MySQL.

~~~
microcolonel
Code I can understand, data a bit less. I'm just thinking that no MySQL
program which has already been written is going to have any reliance on the
performance characteristics of this flash-based database system, since it
wasn't a thing when they were written.

If you could change the shape of your tables to suit this approach, then
surely you could adopt Postgres.

------
raverbashing
Wondering how much the "moving data to the GPU" overhead won't help to make
things significantly faster

~~~
throwaway91111
Really depends on the shape of the data and query at hand. If you're doing
aggregate computations over million/billions of scalar-sized rows, it could be
quite fast. However joining two cold tables will still be more limited by
schema and disk read than anything.

~~~
arnon
Unless you're doing skipping and indexing properly....

------
X86BSD
This seems pretty nice. Does anyone know if there is anything similar in the
oracle camp? This seems like a big win over them if not.

~~~
nova1313
I suspect it might cut into their Exadata appliance market if they did, but
I'd love to hear anything from Oracle indicating they were adding something
like this to a new version. If you try to go massively parallel for workloads
they seem to push us towards Exadata. Parallel table/index scans may be your
best bet at the moment on Oracle.

As far as I know you can't do anything like custom scan providers in Oracle.
I've tried searching a few times for functionality like that and always came
up empty. Even if that was an available feature, I'm not sure how you'd even
begin costing it for the optimizer to take that into account, unless it was
something you'd just enable via hint.

I have OpenCL code running from within Oracle stored procedures via Java. You
can at least write your own logic from within the DB to utilize your GPUs.
I've done things like push 2 tables into GPU memory and let the GPU do the
joins once you manually load the columns to join on. Return the set to SQL and
use an indexed column to return the data you want from the table. It's not
something I'd consider running in production, but it was a neat exercise and
you can get it to work.

------
sixdimensional
I still am hoping for affordable SSD to FPGA direct execution someday!

~~~
ccrush
Direct execution? With GPUDirect from NVIDIA you could have an FPGA get a
memcpy() from graphics RAM to its data has. Still, not sure how you plan on
using this. I would hope that NVIDIA could give us programmable fabric on the
GPU silicon, and it could somehow access each compute units instruction
execution pipeline. Would be interesting indeed. Or just massive, fast FPGAs
with included highly parallel compute IP built in. Or all new NVIDIA cards
just ship a very fast FPGA that runs 3D graphics by default, but can be
reprogrammed to do CUDA compatible computation and its open source so users
can add new instructions or modify existing ones or even reconfigure the
number or arrangement of cores.

~~~
sixdimensional
Thinking something along the lines of, implementing/modelling SQL operators in
FPGA hardware, using massive parallelization through the FPGA. That needs
direct I/O access to very fast storage to be optimal, and probably still
additional processing via a CPU or GPU.

Expensive custom data warehouse appliances like Netezza already do something
like this, but I'm hoping for an implementation that is less expensive and
more generally available, that can be exploited to accelerate any database's
operations. Bonus points if it can scale up to a distributed model somehow.

------
throwaway-1209
Where clause is the worst possible use case for GPU: GPUs don't tolerate
branches or non-coalesced memory access all that well, and WHERE executes a
ton of branches. Even so, things seem to be faster. Go figure.

~~~
ris
I think it hugely depends on what your WHERE clause is.

~~~
throwaway-1209
I guess if your where clause is very selective, you could avoid the hit since
the same branch is taken by all threads in a warp. But then there are better
approaches such as BRIN indexes, RLE and dictionary encoding with which you
don't even touch the data that doesn't match your clause.

