Show HN: my database engine for GPU (sourceforge.net)
364 points by antonmks on Jan 26, 2012 | 60 comments



Excellent work! I have a very small (marketing-related) nitpick:

Would you consider hosting your project on github instead of SourceForge? I don't say this to be trendy; github just has a much more pleasing interface, in my opinion. It's also faster, subjectively at least.


I second this. I'd rather be at the dentist than on a SourceForge page.

I swear it's the "myspace.com" of open source hosting.


Another practical consideration: I clicked on the "manual.txt" file and SF redirected me to a download page. I would have preferred to read the file in-place (like on github).


Congratulations, well done!

Our team has built an in-memory DB (hybrid, both column and row store available, also using advanced compression). We have investigated GPUs for scans and found that while they can outperform CPUs, the bottleneck is transferring the data to the card. Since the card's memory is only in the gigabytes and our databases are typically much larger, this is rather limiting, because you have to copy data back and forth.

Do you know what the maximum amount of GPU memory for a blade-like system currently is?


It is 6 GB for a Tesla GPU. PCI Express transfer speed is 5 GB/s; with a compression ratio averaging 4:1, the effective speed becomes 20 GB/s, so transfer speed is not a bottleneck anymore.
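
A quick back-of-the-envelope sketch of that effective-bandwidth claim, using the figures above (illustrative host-side code, not part of the project):

    // Back-of-the-envelope check of the effective-bandwidth claim:
    // 5 GB/s PCIe, ~4:1 compression. Figures are illustrative, not measured.
    #include <cstdio>

    int main() {
        const double pcie_gb_s   = 5.0;  // raw PCI Express transfer speed
        const double compression = 4.0;  // average compression ratio (4:1)

        // Compressed columns cross the bus and are decompressed on the GPU,
        // so every transferred byte expands to `compression` bytes of data.
        std::printf("effective bandwidth: %.0f GB/s of uncompressed data\n",
                    pcie_gb_s * compression);
        return 0;
    }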


I see. So for a large dataset you would still hit a bottleneck compared to a main-memory store which also uses compression. But nonetheless it's a good idea to do certain operations on hot data on the GPU.

Which operations benefit the most, in your experience? I would guess scans with simple predicates and expressions. What about aggregations?


I had investigated aggregates using the GPU for building Dwarf petacubes (http://www.cs.umd.edu/~nick/projects/Dwarf.pdf), and yes, it looks viable.


I had not heard of that data structure before, thanks!

Sadly, it appears to be patented and thus of no use to me. http://www.cs.umd.edu/~nick/projects/Dwarf.html I think it's really quite crass and mercantile for UMD to patent basic computer science algorithms and data structures like that.


Except that hard drives have a much lower transfer rate?


But we're already working on the premise that the data is in main memory, so the bottleneck is the GPU<->memory transfer rate.


Do you compress per row/column or over a given set?


By column. FOR (frame of reference) compression for integers and decimals, FOR-DELTA for already-sorted columns, and dictionary compression for strings.
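
For readers unfamiliar with FOR, here is a simplified sketch of the idea for one integer column. The names and layout are made up for illustration, not the actual on-disk format; a real engine would bit-pack the deltas instead of storing them as 64-bit values.

    // Minimal sketch of frame-of-reference (FOR) compression for one integer
    // column: store the block minimum once, then only the small offsets.
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct ForBlock {
        int64_t reference;             // minimum value in the block
        int     bits_per_value;        // bits needed to hold (value - reference)
        std::vector<uint64_t> deltas;  // offsets from the reference
    };

    ForBlock for_compress(const std::vector<int64_t>& column) {
        ForBlock block{};
        if (column.empty()) return block;

        block.reference = *std::min_element(column.begin(), column.end());
        int64_t max_delta = 0;
        for (int64_t v : column)
            max_delta = std::max(max_delta, v - block.reference);

        block.bits_per_value = 1;
        while (block.bits_per_value < 63 &&
               (int64_t(1) << block.bits_per_value) <= max_delta)
            ++block.bits_per_value;

        for (int64_t v : column)
            block.deltas.push_back(uint64_t(v - block.reference));
        return block;
    }

    // FOR-DELTA (used for already-sorted columns) stores the differences
    // between consecutive values instead, which are smaller still.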


This makes me think of CPU/GPU architectures where both are on the same die, but of course that won't work with CUDA.


Can you share a link to the project?


You can find info here (unfortunately not just an easy download)

http://www.sap.com/hana/index.epx


I know that one pretty well :)


Super cool, and thanks for sharing!

Are you familiar with APL/K/Q/J? These implement column-store databases with a programming language that is significantly more expressive and usable. K and J are also significantly faster than any other SQL database -- it would be interesting to compare K's ultra-optimized in-memory processing to the GPU -- I'm not at all sure the GPU is going to win here in practice.

See https://github.com/kevinlawler/kona for an open source K implementation (the wiki has docs). The same language, in a newer version (with database integration) that uses English words instead of ASCII chars, is called Q; see http://kx.com/q/d/ (kdb1, q1 if you like longer, kdb, q if you like terser), and http://code.kx.com/wiki/Main_Page


My first thought looking at this was "Ooooh, it would be so nice if my GPU could parse Q.."

In fact, Q thoroughly sold me on tightly coupling a small expressive language with a parallel-friendly datastore. So many data problems require just this combination, and most general computing languages are surprisingly bad at it in comparison.


I'm not sure K's in-memory processing is incompatible with running on a GPU, either. I've been working on a fork of Kona that has a very different architecture (e.g. a bytecode VM, generational mark/sweep GC, lexical scope), and I plan on experimenting with CUDA/OpenCL once the fundamentals are set.


Oh, K's in-memory processing model is surely compatible with modern GPUs - at least for most primitives.

The thing I'm not sure about is whether the transfer overhead (GPU, CPU, Memory) would still be worth it in the general case. There's a question of what you're actually comparing to - e.g. you can have a (relatively) naive implementation, like the first commit of Kona was. You can have Arthur's cache-optimized-but-still-pure-C-and-CPU implementation (I think the main kona branch is comparable these days, but I'm not sure), which is faster.

And then, you can compare it to an SSE2/SSE4 version, still on the CPU; such a version does not exist afaik, but is easier to code than you would expect if you use ispc (https://github.com/ispc/ispc/tree/master/examples) or cilk-plus -- simple things with no data dependencies, as many K primitives are, often get a 4x speedup with ISPC.

Oh, and by the way, thanks for your work on kona and this message -- I've only been following kevinlawler's branch, but I'll start following you too.


The transfer overhead would only need to happen once if there's a sequence of GPU-able primitives, though. The dataflow analysis for that isn't too hard.

I'll check ispc out too, thanks. My fork* is using some SSE, via GCC-specific extensions (http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html). It could be special-cased into Kona as-is, but I've found a better way to implement scalars than the elaborate preprocessor hack I did there. (Besides, tuning the reference counting would probably have a bigger impact, something like Deutsch-Bobrow at least.)

* It's not on github, yet; I don't usually post things there until they're reasonably usable end-to-end. My kona fork is mostly for pull requests; I'm talking about a completely new implementation.
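
For reference, the GCC vector extensions linked above look roughly like this when applied to a hypothetical K-style "add two numeric vectors" primitive; this is an illustrative sketch, not code from Kona or from my fork.

    // Rough illustration of GCC's vector extensions: an element-wise add that
    // GCC/Clang lower to SSE/AVX instructions without explicit intrinsics.
    #include <cstddef>
    #include <cstring>

    typedef double v4df __attribute__((vector_size(32)));  // 4 doubles per vector

    void vec_add(const double* a, const double* b, double* out, size_t n) {
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            v4df va, vb;
            std::memcpy(&va, a + i, sizeof va);   // unaligned-safe loads
            std::memcpy(&vb, b + i, sizeof vb);
            v4df vc = va + vb;                    // element-wise add on the whole vector
            std::memcpy(out + i, &vc, sizeof vc);
        }
        for (; i < n; ++i)                        // scalar tail
            out[i] = a[i] + b[i];
    }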

Would you like to continue this discussion off HN? My contact info is in my profile.


Very, very cool. I'm always happy to see more ways to utilize the tremendous performance/price of GPUs.

So, what are the pros and cons of this approach? For a given hardware and energy budget, what kinds of tasks would be better suited to this kind of GPU implementation? Conversely, what kinds of tasks would still be faster to run on a Core i7?


Agreed, seems like it has a lot of potential.

I remember some time back, when GPUs were first being looked at for HPC uses, I suggested a client look into them, but they were quickly dismissed due to their floating point limitation (which may have changed).

But DBs don't have huge FP needs. Another place GPUs are making a big difference is video editing, utilizing CUDA or OpenCL.


I think their floating point performance is improving.


GPUs have always had amazing floating point performance for single precision floats - that's basically all graphics is. The problem is that double precision support was traditionally non-existent, was slow until recently, and is still much slower than single precision performance.


Double precision performance is now just half the single precision speed, which is the same ratio as on the CPU. Many applications are bandwidth bound, so running in double vs. single doesn't end up with a 2x difference anyway.


Now this is the kind of thing I come to HN for... but why, oh why, SourceForge?


github would be better for browsing, but the source is only 117k - I'm surprised how much of a barrier a few clicks and a few seconds were for me. (Partly, I expected it to be bigger.) The 21 files total 18,832 lines; there's bison, lex, and a bunch of CUDA files. Much of it is structured as doing similar things for similar cases, so it's not that long in effect. It seems to include a reasonably complete implementation of SQL in the bison grammar etc. (and there are two example files of SQL query sequences). I'm impressed.

There are some line comments, but no overviews. :(


This is awesome. :)

Someone did something similar for PostgreSQL here:

http://wiki.postgresql.org/wiki/PGStrom


That looks absolutely brilliant. So is the database all stored in the GPU memory? Or in main memory? Or does your engine handle storing to disk as well?

Pretty novel use of a GPU though, and if it's quick, it could be really useful.


Thank you! The data is stored on disk and only the columns necessary for the transformations are brought to the GPU. If the data does not fit into the limited GPU memory, it is processed piece by piece. I tested it on terabyte-sized files, so the data does not have to fit into GPU memory.
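
To make the "piece by piece" idea concrete, here is a simplified CUDA sketch of streaming one column through the GPU in fixed-size chunks. The names are made up for illustration and error handling is omitted; this is not the actual engine code.

    // Illustrative sketch of chunked column processing on the GPU: copy a
    // fixed-size chunk of one column to the device, run a filter kernel on
    // it, and repeat until the whole column has been scanned.
    #include <algorithm>
    #include <vector>
    #include <cuda_runtime.h>

    __global__ void count_matches(const long long* col, size_t n,
                                  long long threshold,
                                  unsigned long long* hits) {
        size_t i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n && col[i] > threshold)
            atomicAdd(hits, 1ULL);
    }

    unsigned long long scan_column(const std::vector<long long>& column,
                                   long long threshold,
                                   size_t chunk_elems = 6000000) {
        long long* d_chunk;
        unsigned long long* d_hits;
        cudaMalloc(&d_chunk, chunk_elems * sizeof(long long));
        cudaMalloc(&d_hits, sizeof(unsigned long long));
        cudaMemset(d_hits, 0, sizeof(unsigned long long));

        for (size_t off = 0; off < column.size(); off += chunk_elems) {
            size_t n = std::min(chunk_elems, column.size() - off);
            cudaMemcpy(d_chunk, column.data() + off, n * sizeof(long long),
                       cudaMemcpyHostToDevice);
            count_matches<<<(unsigned)((n + 255) / 256), 256>>>(
                d_chunk, n, threshold, d_hits);
        }

        unsigned long long hits = 0;
        cudaMemcpy(&hits, d_hits, sizeof hits, cudaMemcpyDeviceToHost);
        cudaFree(d_chunk);
        cudaFree(d_hits);
        return hits;
    }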


Looks like a very interesting approach to me given the recent developments.

Still, I have a few questions regarding your numbers. Can you specify how much data you actually stream from disk? Doing some back-of-the-envelope calculations (8 columns out of 1.8B rows, assuming 4 bytes each), you would need to stream ~300 MB/s from disk, which seems very unlikely for your setup (unless you have a super fast SATA drive or an SSD).

Now, with only 4 GB of RAM, you are constantly filling your RAM and transporting data to the GPU. Plus you need to compress the data somewhere (on the GPU?).

Do I understand the manual.txt correctly that you can only achieve this performance when the data has been written to disk compressed beforehand (while sorting it)?

I may be wrong, but the group by and join looked like versions requiring sorted data.

Can you please give a little more details on that?


Here are the details: Integers and decimals are both stored as 8 bytes, so one record takes 42 bytes. After compression it takes about 8 bytes (I use FOR - frame of reference - compression), and it works really well with decimal numbers (check the range of decimal numbers in the lineitem.tbl file - usually you can encode them in just a few bits). So I can stream about 10 million records per second from my disk, which makes it exactly 180 seconds for 1.8B rows (CUDA calculations are done in a background process). Records are decompressed on the GPU and processed there. Actually the main bottleneck is not the disk access but the latency of CUDA calls - each call takes about 15 ms, and when you process the entire file in chunks of, say, 6 million records, it really adds up.
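
A rough sanity check of those numbers (illustrative arithmetic only; the ~80 MB/s disk rate below is derived from the quoted 10M compressed records per second at ~8 bytes each):

    // Rough sanity check of the figures above: 1.8B rows, ~8 bytes per
    // compressed record, ~10M records/s from disk, ~15 ms per CUDA call,
    // chunks of ~6M records.
    #include <cstdio>

    int main() {
        const double rows        = 1.8e9;   // lineitem rows in the test
        const double bytes_raw   = 42.0;    // per record, uncompressed
        const double bytes_cmp   = 8.0;     // per record, after FOR compression
        const double recs_per_s  = 10e6;    // compressed records streamed from disk
        const double cuda_call_s = 0.015;   // latency of one CUDA call
        const double chunk_rows  = 6e6;     // records per processing chunk

        // Disk-limited scan time: matches the quoted 180 seconds.
        std::printf("scan time: ~%.0f s for %.1f GB compressed (%.1f GB raw)\n",
                    rows / recs_per_s,
                    rows * bytes_cmp / 1e9, rows * bytes_raw / 1e9);

        // Launch overhead: ~300 chunks; every extra CUDA call issued per chunk
        // adds roughly this much on top, which is how the latency adds up once
        // several kernels run per chunk.
        std::printf("per-call overhead: ~%.1f s per call per chunk (%.0f chunks)\n",
                    (rows / chunk_rows) * cuda_call_s, rows / chunk_rows);
        return 0;
    }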


Interesting. I wonder how a GPU could best be put to use in a database engine. Joins, I suppose? Would be cool to read more about this implementation, however the readme is a bit sparse.


antonmks, did you consider using OpenCL instead? If so, why did CUDA win out?


Whether it's still relevant or not, CUDA has a lot of mindshare in the academic community. Partially this is due to marketing and having their product out first, but OpenCL also has to go through some more abstractions. There was a quite popular forum post[1] a while ago where someone claimed a 5x perf improvement from switching from OpenCL to CUDA, and while I haven't tested it myself, it's the sort of statistic that sticks in your mind.

______________________

1. http://forums.nvidia.com/index.php?showtopic=109684


Column store - compare it with MonetDB (open source) and similar. This design is suitable for read-mostly databases. It's a trade-off: faster reads and much better compression, but the price is that writes usually become an order of magnitude slower. It isn't good for transactional work (OLTP).

http://en.wikipedia.org/wiki/Column-oriented_DBMS

Also, the author talks (in other comments) about transfer rate, when the real issue with GPUs is latency. Most GPUs don't have decent RAM caches and it's very hard to cover the latency.

Also: http://reddit.com/r/programming/comments/oxq6a/a_database_en...


Very nice. I'm glad people are using their GPUs for something more productive than farming bitcoins.


Nvidia GPUs only, right? Does it work with ATI as well? CUDA is probably more advanced, so it's no surprise.


I love this kind of novel thinking. I'd definitely like to read more detailed articles on how this works.


If you look up columnar databases, you can find a whole host of research on how these things store data. Storing values column by column to do vector processing is pretty common, but I can't recall seeing one built for a GPU before. Kudos to the dev for diving in to implementing one, it's pretty neat stuff.

Try finding research papers about MonetDB, Vertica/C-Store, or Vectorwise for some background. Or follow the links here:

http://www.dbms2.com/category/database-theory-practice/colum...


I was looking at this a number of weeks ago to see if it would be feasible to speed up any SQL queries. There is a paper about this for SQLite: http://www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_gpg...


What is the point of using "å" instead of "a"?


Those are two different characters. They are not interchangeable. It looks like the guy's name is Anton, a Swedish name. Thus he will create names with "å"-s in them if he wants to.

Will this cause any problems? It might. That still does not mean that "a" is a proper substitute for "å".

Should he have tried making a name from just English letters? That might have been a good idea. But that is a different recommendation than the one you gave.

"STARGÅTE", on the other hand, is an example of gratuitous replacement of "A" by "Å".


In Stargate, it's a nod to Earth's point-of-origin symbol being Å (well, it's not exactly Å, but it's very similar)[1].

[1] http://stargate.wikia.com/wiki/Point_of_origin


>It looks like the guy's name is Anton, a Swedish name

This Anton is clearly Russian: http://ru.linkedin.com/pub/anton-k/32/253/695

And Алёнка (Alyonka) is a diminutive form of the Russian name Алёна (Alyona): http://ru.wikipedia.org/wiki/Алёнка


What's the point of using y instead of u?

Å and A are not at all the same letter, although I agree that using characters outside of the ASCII set for naming projects on the internet is probably a bad idea. If I had to transliterate Ålenkå to Latin chars, it would probably be Oolenko, but even that's not really that close. The Latin alphabet has a horribly limited set of vowels.


The transliteration of Å is usually "AA". In fact the "circle" above the A is really a lower case "a". It's just that as time passed, it "devolved" into a circle.


I always thought it was "oa". As in boat vs båt (the swedish word for boat). It is pronounced kinda like "oa", essentially a big round long "o" sound with the mouth muscles relaxing just before the sound is completed, making an OOO-a sound.


One benefit: it's very Googleable.


If you can type it.


Ålenkå.

Yep, perfectly typable. Though I did have to look up the deadkey for ˚ as it's not a diacritic I use often (Option-K on the Mac OS X US Extended keymap; on Windows or Linux US International keymaps you can do it without a deadkey as Right-Alt W, or on the Mac US keymap as Option-A).

I highly recommend that people use and learn the "extended" or "international" keymaps on their system, so they can learn how to type these sorts of characters easily. As more and more systems are Unicode enabled, and you interact with more people internationally who would prefer their names to be spelled properly, it's a useful skill to have (and doesn't really take much work).


I wouldn't say it's related in any way to the alphabet in use. Take the English 'a' as an example: skate and car produce two different sounds out of the same letter.

Alphabets transmit words, much less so sounds.


Many alphabets do a good job transmitting sounds. Spanish, French or German are easy to read. English is particularly bad at having any correlation between spelling and pronunciation. "Spelling bee" would make little sense in most languages.


Sure, but F and E are completely different letters, even if they are almost identical (except for the little bar at the base of the E that is missing on the F).

A and Å are different letters, and can't be substituted for each other.


In English. Other languages have a much stronger correlation between characters and sounds.


There are languages with true alphabets, i.e. there is a one-to-one relation between written letters and sounds. A good example is Turkish. Once you learn the sounds of the 29 letters in the alphabet, you can perfectly pronounce all Turkish words.


> Alphabets transmit words, much less so sounds.

In English, that is. Some languages using Latin script always use the same sound for the same letter (which is how it should be done, IMHO).


Google will handle it for you my friend.

http://www.google.co.il/search?q=alenka%20gpu


But only if you know to append "gpu" to the search. Just searching for 'Alenka' doesn't yield useful results (yet): http://www.google.dk/search?sourceid=chrome&ie=UTF-8&...



