
For some reason I am surprised that anyone would want to ship the data to the GPU whole and as-is. Wouldn't it make more sense to use a representative, transformed "GPU-ready" data set, both much smaller in size and designed specifically for the queries that are to be optimized?



We are not shipping all of the data to the GPU as a whole. We will be releasing some whitepapers that explain this in more detail, but let's get a few things clear. Data is sent to the GPU compressed, since that is how it is stored. We can decompress VERY quickly on GPUs (30-50 GB/s is easily achievable on a K80) because each of our columns is compressed using one of our cascading compression algorithms (which we believe offer the best balance of compression ratio and throughput). We are a column store and only send over the columns that are actually used in processing. For example:
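The comment doesn't say which encodings the cascade uses, so as a rough illustration only, here is a minimal sketch of the general idea of cascading lightweight encodings: run-length encoding, then delta encoding of the run values, then picking a bit width for a bit-packing stage. It assumes a sorted, non-negative integer column, and the function names are hypothetical; this is not the poster's algorithm.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def rle_encode(values: List[int]) -> Tuple[List[int], List[int]]:
    """Run-length encode: collapse consecutive repeats into (value, run length) pairs."""
    run_values: List[int] = []
    run_lengths: List[int] = []
    for v in values:
        if run_values and run_values[-1] == v:
            run_lengths[-1] += 1
        else:
            run_values.append(v)
            run_lengths.append(1)
    return run_values, run_lengths


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each value's difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def bit_width(values: List[int]) -> int:
    """Smallest number of bits that holds every value (assumes non-negative values)."""
    return max(1, max((v.bit_length() for v in values), default=0))


def cascade_compress(column: List[int]) -> Dict[str, object]:
    run_values, run_lengths = rle_encode(column)  # stage 1: RLE
    deltas = delta_encode(run_values)  # stage 2: delta on the run values
    return {
        "deltas": deltas,
        "run_lengths": run_lengths,
        # Stage 3: bit widths a bit-packing step would use (the packing itself is omitted).
        "delta_bits": bit_width(deltas),
        "length_bits": bit_width(run_lengths),
    }


def cascade_decompress(encoded: Dict[str, object]) -> List[int]:
    # Undo delta encoding with a prefix sum (a parallel scan on a GPU).
    run_values = list(accumulate(encoded["deltas"]))
    # Expand each run (on a GPU: a scan over run lengths followed by a scatter).
    out: List[int] = []
    for value, length in zip(run_values, encoded["run_lengths"]):
        out.extend([value] * length)
    return out


column = [100, 100, 100, 101, 101, 105, 105, 105, 105, 110]
encoded = cascade_compress(column)
print(encoded["deltas"], encoded["run_lengths"])  # [100, 1, 4, 5] [3, 2, 4, 1]
print(cascade_decompress(encoded) == column)  # True
```

Under these assumptions the ten-value column reduces to 4 deltas at 7 bits plus 4 run lengths at 3 bits, i.e. 40 bits versus 320 bits as raw 32-bit integers. Both decode steps are scans, which is part of why this style of encoding decompresses well on GPUs.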

select id, name, age, avg(income) from people group by gender

In this case only the income and gender columns would actually be sent to the GPU, and they would be sent compressed to increase the "effective" bandwidth over PCIe. Even more interesting is that id, name, and age would be pulled from our horizontal (row-oriented) store instead of our compressed columnar store, in order to minimize the number of IOPS needed to fill the result set.
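A toy sketch of the column-pruning idea for the `avg(income) ... group by gender` part of the query, assuming a simple in-memory dict-of-lists column store. The table contents and the names `people`, `columns_for_gpu`, and `group_avg` are made up for illustration, and the CPU loop stands in for what would be a GPU kernel.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical column store: one list per column, all of the same length.
people: Dict[str, List] = {
    "id": [1, 2, 3, 4],
    "name": ["ann", "bob", "cy", "di"],
    "age": [34, 41, 29, 52],
    "gender": ["f", "m", "m", "f"],
    "income": [70_000, 50_000, 60_000, 90_000],
}


def columns_for_gpu(group_by: List[str], aggregates: List[str]) -> Set[str]:
    """Only the grouping keys and aggregated columns take part in the heavy computation."""
    return set(group_by) | set(aggregates)


def group_avg(table: Dict[str, List], key: str, value: str) -> Dict[object, float]:
    """AVG(value) GROUP BY key, reading only the two columns it needs."""
    sums: Dict[object, float] = defaultdict(float)
    counts: Dict[object, int] = defaultdict(int)
    for k, v in zip(table[key], table[value]):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


needed = columns_for_gpu(group_by=["gender"], aggregates=["income"])
shipped = {name: people[name] for name in needed}  # 2 of the 5 columns
print(sorted(shipped))  # ['gender', 'income']
print(group_avg(shipped, "gender", "income"))  # {'f': 80000.0, 'm': 55000.0}
```

Here `id`, `name`, and `age` never enter the aggregation; in the design described above they would be fetched separately from the row store when the result set is assembled.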


Once you read and prune out the dataset to only include the relevant data, then what's left for the GPU to do?


Transforming the data into a GPU-ready form would, of course, not be as trivial (or as effectively redundant) as pruning it. It would produce a secondary data structure, like an index on a column, though in this case one destined to be processed in the math-oriented, high-branch-cost setting of a GPU.
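The comment doesn't specify what such a structure would be. One common example (an assumption here, not something the poster describes) is a sorted dictionary encoding: each distinct value is stored once, rows hold small integer codes, and predicates become one integer comparison per row. The function names below are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Sorted dictionary of distinct values plus one integer code per row."""
    dictionary = sorted(set(column))
    code_of = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in column]


def equality_mask(dictionary: List[str], codes: List[int], literal: str) -> List[int]:
    """Evaluate `column = literal` as an integer compare per row, producing a 0/1 mask."""
    i = bisect_left(dictionary, literal)  # one lookup per query, not per row
    if i == len(dictionary) or dictionary[i] != literal:
        return [0] * len(codes)  # literal not present: no row matches
    # On a GPU this per-row compare maps to a select with no divergent branches.
    return [int(c == i) for c in codes]


def less_than_mask(dictionary: List[str], codes: List[int], literal: str) -> List[int]:
    """Sorted dictionary preserves order, so `column < literal` becomes `code < bound`."""
    bound = bisect_left(dictionary, literal)
    return [int(c < bound) for c in codes]


dictionary, codes = dictionary_encode(["m", "f", "f", "m", "x"])
print(dictionary, codes)  # ['f', 'm', 'x'] [1, 0, 0, 1, 2]
print(equality_mask(dictionary, codes, "f"))  # [0, 1, 1, 0, 0]
```

The trade-off is the one the thread is circling: building the dictionary costs extra work and space up front, in exchange for cheaper, uniform per-row work at query time.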


This is basically the bread and butter of columnar compute systems, not just GPU ones. GPU systems just get to throw more compute at the problem, and so do even better at these kinds of space/time trade-offs. Interestingly, most big data systems are increasingly columnar.
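To put rough numbers on the "effective bandwidth" and space/time trade-off being discussed, here is a back-of-the-envelope model. It assumes transfer and decompression are pipelined, so the slower stage sets the rate; r is the compression ratio, B_link is the PCIe bandwidth actually achieved, and B_dec is the GPU's decompression throughput in uncompressed bytes per second. The r = 3 and 12 GB/s figures are illustrative assumptions, not measurements; 40 GB/s is simply a point inside the 30-50 GB/s range quoted above.

```latex
B_{\text{eff}} = \min\left( r \cdot B_{\text{link}},\; B_{\text{dec}} \right)

r = 3,\quad B_{\text{link}} = 12\ \text{GB/s},\quad B_{\text{dec}} = 40\ \text{GB/s}
\;\Rightarrow\; B_{\text{eff}} = \min(36,\ 40) = 36\ \text{GB/s}
```

Under these assumptions the GPU sees about three times as much column data per second as it would if the same columns were sent uncompressed (12 GB/s), and the bottleneck stays on the link rather than on decompression.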




