
GPGPU Accelerates PostgreSQL - lelf
http://www.slideshare.net/kaigai/gpgpu-accelerates-postgresql
======
JonnieCache
So glad to see this coming along.

If any of the project team are reading, what I'd like to see most is GPU-
accelerated point-in-polygon lookups in PostGIS, ST_Contains and so forth.

~~~
vardump
It's likely that GPUs are slower than CPUs for spatial data structures.
Getting the data to the GPU and the results back just takes too long. Point
in polygon is also very branchy in the general case, and GPUs are really bad
with branchy code, so it's very unlikely you could GPU-accelerate such a
query at all if it operates on point coordinates and a set of polygon
vertices. At least it would be very hard.

Edit: right, after thinking about it, the branches can be optimized out. It
could be fast if the polygon is stored as a set of sorted segments: just
parallel compares and some boolean logic.

Which leaves the problem of getting the data to the GPU, because you can
definitely stream the same comparisons on the CPU much faster (memory-
bandwidth limited) than you can stream the data to the GPU over PCIe.

So for a 2-CPU-socket system, such as a Xeon E5, I'd bet on the CPU. PCIe
4.0 with 16 lanes would give 30 GB/s (not sure if PCIe 4.0 is supported
anywhere yet), vs. aggregate CPU memory bandwidth of up to 150-200 GB/s. A
dual-socket Xeon E5 supports at least 1 TB of RAM (16x 16 GB buffered DDR4).
32 GB DDR4 memory modules exist as well, I think, and larger DIMM banks than
16 slots can be supported; 16 is just the number of slots on typical
2-socket mainboards.
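
As a rough sketch of that arithmetic (the bandwidth figures are the ones
quoted above, and the working-set size is an assumption, not a measurement),
the transfer alone dominates:

```c
/* Back-of-envelope timing: how long it takes just to push a working set
 * through a 16-lane PCIe link versus scanning it from local DRAM.
 * All figures are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    double table_gb  = 64.0;   /* hypothetical working set, in GB        */
    double pcie_gb_s = 30.0;   /* ~PCIe 4.0 x16, as quoted above         */
    double dram_gb_s = 175.0;  /* midpoint of the 150-200 GB/s aggregate */

    printf("copy to GPU over PCIe: %.2f s\n", table_gb / pcie_gb_s);
    printf("scan from local DRAM : %.2f s\n", table_gb / dram_gb_s);
    return 0;
}
```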

In a more realistic setting the CPU would be even further ahead. All of this
is ignoring GPU latency issues, which can be anywhere from microseconds to
tens of milliseconds in pathological cases.

Unless the data was on the GPU in the first place... I think a single GPU
can currently have up to 12 GB of RAM; maybe larger GPUs exist too. That's
just not much RAM compared to what is typical for CPUs. Currently the
smallest amount of RAM a standard dual-socket Xeon E5 v3 server can have is
64 GB, if all memory channels are to have at least one DIMM.

~~~
1Bad
Point in polygon actually seems like a good problem for the GPU. It can be
calculated with a simple angle calculation performed in parallel against all
segments. Trig functions are fairly heavyweight, and done serially or with a
lesser degree of threading on the CPU this becomes a large computation,
making it a good candidate for offloading to the GPU.

~~~
colanderman
Point-in-polygon is usually done with simple vector arithmetic, not trig.
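
For concreteness, a minimal sketch of that kind of test (the classic
crossing-number / ray-casting approach, not PostGIS's actual ST_Contains
implementation): each edge needs only compares, a multiply and a divide, and
the parity toggle can be turned into boolean/arithmetic operations if you
want to vectorize it or run one point per GPU lane.

```c
#include <stdio.h>

typedef struct { double x, y; } pt;

/* Returns 1 if p is inside the polygon v[0..n-1], else 0 (points exactly
 * on an edge may go either way). No trig involved. */
static int point_in_polygon(pt p, const pt *v, int n) {
    int inside = 0;
    for (int i = 0, j = n - 1; i < n; j = i++) {
        /* Edge (v[j], v[i]) straddles the horizontal ray from p... */
        if (((v[i].y > p.y) != (v[j].y > p.y)) &&
            /* ...and the crossing point lies to the right of p:    */
            (p.x < (v[j].x - v[i].x) * (p.y - v[i].y) /
                   (v[j].y - v[i].y) + v[i].x))
            inside = !inside;  /* toggle crossing parity */
    }
    return inside;
}

int main(void) {
    pt square[4] = { {0, 0}, {4, 0}, {4, 4}, {0, 4} };
    pt a = { 2, 2 }, b = { 5, 1 };
    printf("a inside: %d\n", point_in_polygon(a, square, 4)); /* 1 */
    printf("b inside: %d\n", point_in_polygon(b, square, 4)); /* 0 */
    return 0;
}
```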

------
majc2
As an aside, if this is of interest, the third run of the GPGPU course from
Coursera/University of Illinois starts in January. See:
[https://www.coursera.org/course/hetero](https://www.coursera.org/course/hetero)

------
joelthelion
It's great to see that they're using OpenCL. GPU computation desperately
needs standardization, and this could help bring OpenCL drivers on par with
CUDA.

~~~
vardump
A lot is lost in translation when it comes to OpenCL. Current nVidia and AMD
GPUs are just wide SIMD machines, much like current x86 cores; GPUs are
simply wider, with much less cache and slower serial execution.

~~~
fiatmoney
"Current nVidia and AMD GPUs are just wide SIMD machines"

That's not really true; you can kind of treat it that way, depending on the
algorithm (e.g., matrix multiplication breaks down cleanly that way), but
there are serious flexibility advantages in the
"single-instruction-multiple-thread" model vs. SIMD. For example, consider
streaming large numbers of hash lookups - difficult to express clearly with
a pure vector-processing model.
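
A minimal sketch of what that looks like, assuming a simple open-addressing
table with linear probing (the layout, hash mix and sizes are illustrative,
not anything from the article): each lookup runs a data-dependent number of
probe steps, which is awkward to express as lockstep vector lanes but
natural when every element simply gets its own thread, as in the SIMT model.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE (1u << 20)          /* power of two, for cheap masking */

typedef struct { uint64_t key; uint64_t value; } slot_t;  /* key 0 = empty */

static uint32_t hash64(uint64_t k) {   /* simple mix, illustrative only   */
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    return (uint32_t)k & (TABLE_SIZE - 1);
}

/* Probe one key with linear probing; returns 0 if the key is absent. */
static uint64_t lookup(const slot_t *table, uint64_t key) {
    uint32_t i = hash64(key);
    while (table[i].key != 0) {        /* data-dependent trip count */
        if (table[i].key == key)
            return table[i].value;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}

/* Batched form: on a GPU each iteration would be one thread; on a CPU the
 * irregular while-loop above is what resists a clean pure-SIMD mapping.  */
static void lookup_batch(const slot_t *table, const uint64_t *keys,
                         uint64_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = lookup(table, keys[i]);
}

int main(void) {
    slot_t *table = calloc(TABLE_SIZE, sizeof *table);
    uint64_t keys[3] = { 42, 1000, 7 }, out[3];
    if (!table)
        return 1;

    for (int k = 0; k < 3; k += 2) {   /* insert keys 42 and 7 only */
        uint32_t i = hash64(keys[k]);
        while (table[i].key != 0)
            i = (i + 1) & (TABLE_SIZE - 1);
        table[i].key = keys[k];
        table[i].value = keys[k] * 10;
    }

    lookup_batch(table, keys, out, 3);
    for (int k = 0; k < 3; k++)
        printf("key %llu -> %llu\n",
               (unsigned long long)keys[k], (unsigned long long)out[k]);
    free(table);
    return 0;
}
```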

~~~
vardump
Right, I should have mentioned GPUs have a ton of hardware threads. Then
again, they have to, GDDR5 memory access can take a _microsecond_. Try latency
like that on a generic CPU, hyperthreading or not...

So: GPUs are wide SIMD machines with a lot of hardware threads, massive branch
and glacial memory latencies. When there's a branch or memory latency, HW
simply switch thread. GPUs don't care about serial execution performance.

------
maaaats
Most servers I've used don't even have a GPU. It will be interesting to see
how this and other GPGPU applications for server software will shape server
farms in the future.

~~~
perlgeek
In the scientific computing community, servers with GPUs are pretty common,
and available off-the-shelf. See for example
[http://www.supermicro.com/products/nfo/gpu.cfm](http://www.supermicro.com/products/nfo/gpu.cfm)

So asking the folks that already use that stuff could give pretty accurate
predictions.

------
adamtj
This is very interesting, but not as a CPU-saving optimization. I'm sure it
does that very well, but that's not why it's important. Rather, this seems to
me like the next step toward the inevitable future of PostgreSQL as
arbitrarily scalable, and as _the_ general query engine that ties together
whatever physical data stores you happen to use.

It seems obvious to me that pushing the Foreign Data Wrapper layer with work
like this is how we eventually break through the RDBMS scalability barrier of
the individual host. In the future, I'm sure you'll see similar work where
_the_ GPU won't be the GPU and the PCI bus won't be the PCI bus. Rather,
they'll be _a_ host and the network. A database service (database cluster in
Postgres's nomenclature) will eventually run not just on a single machine, but
on a single cluster of machines. Instead of a cluster of machines for
redundancy, you'll have a cluster of clusters.

Postgres is really two things in one: a physical layer of bytes in pages in
files, and a logical layer of queries on tables of records. The most important
piece in the future will be the logical layer. The FDW layer will naturally be
extended and generalized until it is fully as powerful as the current physical
layer. At that point, it can be made _THE_ API through which the logical layer
accesses data. The current physical layer will then be nothing more than the
default implementation of that general API.

At that point, we can move whole or partial tables to other hosts. Perhaps the
autovacuum daemon will gain a sibling in the autosharding daemon. The query
optimizer will need to care not just about disk IO but also network IO, and
will need to start considering the non-uniform performance characteristics of
different tables. Some tables will be driven by Postgres's default physical
storage engine. Others will be driven by other RDBMSs, or by NoSQL key/value
or document stores, or other data stores. They may be on the same machine or a
different one.

Postgres will transform into a query engine on top of whatever data stores
best fit your workload. I expect the query engine will learn about columnar
stores and be able to mix those in a single query with the row stores,
key/value stores and document stores that it already understands. PostgreSQL
will be a central point through which you can aggregate, analyze, and
manipulate any and all of your data. It needn't be intrusive or disruptive:
you can still use a normal redis client for your redis store, but you can also
use Postgres to manipulate that data with SQL and to combine it with other
tables, whole RDBMSs, other NoSQL stores, spreadsheets, web services, or
anything else. Maybe it will even make things like Map/Reduce frameworks
redundant.

I don't typically follow Postgres's internal discussions, so maybe this is
already being discussed and planned. Or, maybe it's so obvious that nobody
even needs to talk about it. Or, perhaps I'm just some wide-eyed idealist who
doesn't understand the fundamental problems preventing such a thing from ever
being practical.

------
nrzuk
I absolutely love the concept and really want to buy a graphics card just to
play with this on my development box. I find it quite exciting how some
applications are utilising graphics processing power.

But I can't help but wonder what the sysadmin's response is going to be when
I start asking for additional graphics cards to be added to his perfectly
built 2U database servers!

~~~
marcosdumay
Size is nothing.

You are also asking the sysadmin to install a closed, unreliable kernel-level
piece of software with that GPU.

~~~
darkarmani
> You are also asking the sysadmin to install a closed, unreliable kernel-
> level piece of software with that GPU

Unlike all of the closed network equipment they already manage.

------
fitshipit
This reminds me a little of the Netezza data warehouse appliance's
architecture: a query planner in front of lots of little nodes, each with one
disk, one CPU, and an FPGA. Every query is a full table scan: each node
flashes the WHERE clause to the FPGA and slurps the whole disk through it.

------
jvickers
Does anyone know if or when the GPGPU acceleration will be available in the
normal Postgres install?

Is this acceleration switched on by default, or will it be?

~~~
ris
Too far off in the distant future to tell.

------
thomasfoster96
This + Amazon RDS would be pretty awesome for mapping.

------
sjtrny
In other breaking news, grass is green and water is wet. Obviously throwing
more power at the problem ends up with faster execution.

There's a limit to GPGPU acceleration though: the tiny amount of RAM. We
need to adopt a shared-memory architecture like those found in games
consoles. A single massive pool of RAM would further unlock potential power.

~~~
hbogert
It's not that simple. When you look at architectures like AMD's Fusion
platform, the differences in latencies and bandwidth play a huge role,
making it a much more nuanced story. One of the papers showing this:
[http://link.springer.com/article/10.1007/s00450-012-0209-1#p...](http://link.springer.com/article/10.1007/s00450-012-0209-1#page-1)

~~~
Sanddancer
The paper's unfortunately paywalled, but it's two years old, and AMD's
Fusion platform has added a few interesting features in the interim, like
being able to just throw a pointer over the wall to the GPU instead of
copying the entire data structure over, which makes me wonder if the paper
needs to be revised to address this. GPGPU is a rapidly changing field, and
just a couple of years is a big difference.

