

Accelerating SQL[ite] Database Operations on a GPU with CUDA [pdf] - there
http://www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_gpgpu10.pdf

======
lsb
This is really cool, but something doesn't smell right.

1) They give numbers from a single-core CPU implementation, and from a many-
core GPU implementation, and they discuss numbers from a many-core CPU
implementation as well, like some of the new four- or eight-core chips;
without many-core CPU numbers, it's comparing apples to oranges.

2) What indices are on the table? Does the GPU have any indices? A read-only
database is certainly going to have indices, and (hopefully) covering indices
for all the big queries.

3) SQLite has strongly-typed _values_ not _columns_ (you can stick a string in
an integer column and that's just fine) and it seems like they disregarded
much of the SQLite type system, only focusing on integer values. I'd like to
know how much time is spent on type checking values.

------
eigma
The queries are extremely simple (in appendix after references) and, in the
absence of indexes, require independent evaluation of all of the rows in the
table. This maps extremely well to the CUDA architecture which has hundreds of
memory-coupled independent processing elements.

But this isn't anywhere near the typical usage scenario of SQL databases. If
it was, scaling databases would be trivial by sharding tables across many
servers. In real life, you have indexes and joins, which translate into random
access patterns, which do not perform very well on CUDA.

------
romland
Heh. When I first saw the document I thought there is no way I am going to
read all that now. Surprisingly, the sections that I did end up reading were
quite easy to digest (and it's 5am here). I recommend bookmarking it and
reading it later if you can't do it now. This is genuinely interesting.

The gem which you might be looking for is:

 _The queries executed on the GPU were an average of 35X faster than those
executed through the serial SQLite virtual machine._

(my apologies to authors for reducing the entire document to that)

------
andrewvc
_The contribution of this paper is to implement and demon- strate a SQL
interface for GPU data processing. This in- terface enables a subset of SQL
SELECT queries on data that has been explicitly transferred in row-column form
to GPU memory._

This is fascinating stuff, but having to keep the entire DB in GPU memory
poses some obviously non-trivial problems in terms of practical applications.

------
rabidsnail
When are databases ever cpu bound?

~~~
lsb
I was going to give you a snarky answer, because one of the preconditions for
the paper is that it's CPU vs GPU operating on an in-memory database (so
there's no disk access by definition), but in thinking about it perhaps it's
not even CPU bound but cache-miss bound, and you could trade the latter for
the former by compressing the data into a trie or something smarter than just
a B+tree.

