However, there are so many types of databases around. Lambda-style architectures are all the rage now: you keep one database for your transactional workload and another for analytics. Analytics is a huge market, worth multiple billions of dollars every year, and it has become one of the most important inputs to steering a business and deciding on new strategy.
Larger businesses don't just 'go for it' anymore; they analyze, inspect, and dig deep into their historical data to find out whether something is worth doing.
GPUs tend to lend themselves well to analytics, in contrast to transactions. Specifically, columnar databases: when the columns are all of the same data type and data locality is high, GPUs perform /very/ well.
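To make the columnar point concrete, here's a minimal Python sketch (the table and values are made up for illustration). Each column is a contiguous, uniformly typed buffer, which is exactly the layout that GPU (and SIMD) hardware reads efficiently:

```python
from array import array

# Row-oriented storage: each row is a heterogeneous record.
rows = [
    {"id": 1, "price": 9.99, "qty": 3},
    {"id": 2, "price": 4.50, "qty": 1},
    {"id": 3, "price": 12.00, "qty": 7},
]

# Column-oriented storage: each column is one contiguous,
# uniformly typed buffer ("d" = float64, "q" = int64).
price = array("d", (r["price"] for r in rows))
qty = array("q", (r["qty"] for r in rows))

# An analytical aggregate only touches the columns it needs;
# on a GPU this maps to one coalesced read per column.
revenue = sum(p * q for p, q in zip(price, qty))
print(revenue)
```

The aggregate never looks at `id` at all, which is the other half of why columnar stores win for analytics: you scan only the bytes you need.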
Regarding your sorting point: you're right that you may not really want to sort everything. But what if you want to perform a `JOIN` on a bunch of data?
It makes more sense to sort it first, because the JOIN becomes much faster: with both sides sorted, matching keys is a single merge pass instead of repeated lookups.
Now, if you can run a really fast sort on a GPU, you're saving precious processing time.
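A rough sketch of the sort-merge join idea in Python (the tables here are invented; on a real GPU the sort step would be a parallel primitive like radix sort, which is where the speedup lives):

```python
def sort_merge_join(left, right):
    """Join two lists of tuples on their first element.

    Sorting both sides first turns the join into a single linear
    merge pass; matching keys becomes trivial comparison of the
    current heads rather than a lookup per row.
    """
    key = lambda t: t[0]
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit all right-side rows sharing this key.
            j0 = j
            while j < len(right) and key(right[j]) == lk:
                out.append(left[i] + right[j][1:])
                j += 1
            i += 1
            if i < len(left) and key(left[i]) == lk:
                j = j0  # rewind for the next left row with the same key
    return out

orders = [(2, "pen"), (1, "ink"), (2, "pad")]
users = [(1, "alice"), (2, "bob")]
print(sort_merge_join(orders, users))
```

The merge itself is O(n + m) once the inputs are sorted, so the whole join cost is dominated by the sort, the part the GPU is good at.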
If you're running analytical workloads on big data sets, you're typically I/O bound to start with. It seems like managing the movement of little pieces of data back and forth to the GPU for compute is going to be a big PITA, add lots of little latencies, and gain you absolutely nothing. What am I missing there?
2. What if you only push indexes or similar up to the GPU, like an AB-tree index? You keep all of the 'heavy' stuff down below and only upload a compact representation of it, to be matched back to the actual data later.
3. Think compression/decompression done on the GPU directly.
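On point 3, here's a toy illustration of the kind of lightweight column compression that works well on GPUs; run-length encoding is used as one representative scheme, with made-up data:

```python
from itertools import groupby

def rle_encode(values):
    """Run-length encode a column into (value, run_length) pairs.

    Sorted or low-cardinality columns compress extremely well, and
    decompression is embarrassingly parallel: each run can be
    expanded by an independent GPU thread, so you ship the small
    compressed column over the bus and decompress on the device.
    """
    return [(v, len(list(g))) for v, g in groupby(values)]

def rle_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]

col = ["US", "US", "US", "DE", "DE", "FR"]
packed = rle_encode(col)
print(packed)  # [('US', 3), ('DE', 2), ('FR', 1)]
assert rle_decode(packed) == col
```

That's the answer to the I/O objection above: you pay bus transfer for the compressed bytes, not the raw column, and spend GPU cycles (which are cheap) to expand them.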
What about parallel queries over a Restriction-Union normalized data model?
I think the benefits would be similar in nature to columnar stores as you note:
> GPUs tend to lend themselves well to analytics, in contrast to transactions. Specifically, columnar databases: when the columns are all of the same data type and data locality is high, GPUs perform /very/ well.