
Do GPU-optimized databases threaten Oracle, Splunk and Hadoop? - tmostak
http://diginomica.com/2016/04/11/do-gpu-optimized-databases-threaten-the-hegemony-of-oracle-splunk-and-hadoop/
======
gesman
Leaving sarcastic comments aside, I think the valid point is made - the rise
of commodity, garbage-quality hardware overmanufactured in China and
subsequent rise of "commodity-proud" software empowered by penny-wise /
dollar-stupid business processes are overdue for correction.

Instead of putting 20 rusty bicycles together and claiming they are a
revolutionary fuel- and cost-efficient rocket ship, why don't we build a rocket
ship from the beginning that actually flies quite well?

Hardware and chips optimized for DB engines, queries and huge amounts of
streaming data will be welcomed.

~~~
txdv
GPUs are already getting slower, CPUs have been stagnant for a while (no 2x
every few years), except if they just double the core count. I think in time
specialized domain hardware will make financial sense because it will be the
only way to grow.

~~~
onion2k
_GPUs are already getting slower_

Do you mean the rate of change is slowing down, or that they're _actually_
getting slower?

~~~
tmostak
Based on Nvidia's Pascal launch I think you could argue that not only are GPUs
getting faster but that the rate of change just dramatically accelerated (due
to a process node shrink and introduction of super-fast HBM2 memory).

~~~
vetinari
Pascal already launched?

I'm not getting excited until Nvidia really ships something that keeps up with
their promises. For the past few years, their products were underwhelming
compared to marketing materials before launch.

~~~
jandrese
Only in expensive compute boards. No GPUs yet.

------
tomlock
Interestingly, it seems like GPU acceleration will be available in postgresql
in one of the next few releases [0].

From that page it seems that, once enabled, there aren't any special
requirements to get a GPU accelerating a query. As a result, I'd be surprised if
"GPU optimized" databases overtake regular-db-with-gpu-acceleration-addins.

[0]
[https://wiki.postgresql.org/wiki/PGStrom](https://wiki.postgresql.org/wiki/PGStrom)

~~~
tmostak
Not all GPU databases are created equal. Notwithstanding the amazing
functionality of Postgres and the impressive work of the PGStrom team, afaik
they are still limited to a single GPU and do not cache data on board the GPU.
It certainly makes Postgres much faster but is still likely one to two orders
of magnitude slower than something like MapD that runs on up to 16 GPUs per
server and caches hot data in GPU RAM.

Disclosure: I am the CEO of MapD.

~~~
fsaintjacques
I'm interested in knowing what subset of SQL MapD supports. How does it handle
expressions on strings and complex data types (tstamp zoned, dates?), etc.

~~~
tmostak
It can do most of the basic expressions on strings (i.e. like/ilike) and can
do the standard time operations (date_trunc, extract), with the goal of being
fully SQL-92 compliant this year.

You have to remember that many of our customers use us with our visual
analytics frontend - they don't care about what happens behind the scenes,
just that they can interactively explore billions of records with near-zero
lag.

I'm a huge fan of PostgreSQL and certainly if you need an ACID transactional
database with the latest SQL support and extensions you can't go wrong. But if
you need the fastest analytics performance there might be more-suitable
platforms. Different tools for different jobs.

~~~
swasheck
> I'm a huge fan of PostgreSQL and certainly if you need an ACID transactional
> database with the latest SQL support

This answered my question. So now for the follow-up. Are they limited to one
core because their goal is to be ACID? As much as people like to blame
performance on the tenets of ACID, it's still pretty important for a great
deal of things.

~~~
anarazel
Queries are not limited to a single core anymore in the upcoming 9.6 release.
There's a lot more that can be parallelized than what's in 9.6 though (sorting
most importantly probably).

------
CyanLite2
It doesn't threaten anybody because
Oracle/Microsoft/Splunk/SAP/Hadoop/Spark/etc can just add in GPU-optimized
code themselves.

------
Waleedasif322
Funny how it mentions USPS using GPUdb to "process complex queries and display
2D visualizations in the time it takes to load a Web page", yet every time I
visit the post office, it takes at least 8 seconds after scanning a prepaid
package for the package details to appear on the screen. They need to port
that tech over to where it matters.

~~~
kingnothing
If they can do all that, how about adding it to their tracking number system
so I can see where my package is and an estimate of what time it will arrive?

------
jkot
GPU databases have been around for a while, but not much has changed.

I think the bigger threat is cheap memory and the rise of in-memory computing.
Today you can have a workstation with half a TB of RAM for a fairly reasonable
price. Hadoop is already being crushed by Spark.

~~~
sgt101
Hadoop is not map reduce, repeat Hadoop is not map reduce.

Also Spark is not a solution, it is an engine.

We run Spark on HDFS using all the paraphernalia of Hadoop to maintain some
sort of sanity around it. How do you manage access? Encryption? Scheduling?

~~~
jkot
I think we can agree that many users are migrating away from batch processing.

------
pdeva1
The point overlooked in this article is how much costlier VRAM is compared to
RAM. Storing the same amount of data in VRAM as in RAM would cost you an order
of magnitude more. Not to mention that DBs like MapD are not distributed, so you
are limited to the number of GPUs you can cram into a single box.

------
jerven
I don't think they really threaten Oracle, even for analytics where this makes
sense. The performance increase over in-memory on SPARC M7 per price point
won't be that insane. So, just as with in-memory DBs, the main question is how
long before Oracle accelerates its own DB with this kind of tech. I think they
have only 3 years before Oracle will be there.

~~~
sgt101
The challenge for Oracle is not how to do it, but how to make it pay. The
price point for a TB class homebrew gpu solution is ~$30k. Oracle
implementations used to be in the $500k range. This is why Oracle must move to
the cloud and take customers with it.

Oracle's problem: customers who are keen on the cloud may well go to GCE, MS or
AWS. Customers who are not so keen seek to save money with open source and
commodity implementations.

If you are sitting on a huge installed base the very last thing you want to
see is disruptive technology shaking out your customer base. Oracle has seen
three waves in five years - Hadoop, Cloud and now GPU/SCM. It's a tribute to
the software and strategy of Oracle that they aren't bleeding rivers of red
ink.

------
frozenport
If the 2000s didn't kill Oracle what will?

In the case of Hadoop, it might be possible to transparently translate the
scripts to a GPU backend.

~~~
lmeyerov
Yep, multiple Spark->GPU projects already happening.

------
rkrzr
AFAIK GPUs only excel at data-parallel tasks (i.e. doing the exact same
operation on thousands of data points in parallel, as in a matrix
multiplication). So I wonder how they utilize this for ad hoc SQL queries?
Does anybody have any pointers to some papers maybe?

~~~
matt4077
SQL is somewhat parallel. All row-level computations are independent and a
WHERE maps nicely to a reduce(). I'm not sure if these aren't limited by I/O,
though.

Here's a recent article by NVIDIA on using GPUs for graph computation which is
somewhat related:
[https://devblogs.nvidia.com/parallelforall/gpus-graph-predic...](https://devblogs.nvidia.com/parallelforall/gpus-graph-predictive-analytics/)
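
To make the row-level independence concrete, here is a minimal sketch in plain
Python. It is not how any particular GPU database works; the columns, the
predicate and the chunk size are all made up for illustration. The WHERE
predicate is checked per row (a filter), each chunk produces a partial SUM, and
the partials are combined with a reduce, which is roughly the shape a GPU
kernel could take with one thread per row and one partial result per block:

```python
from functools import reduce

# Hypothetical column-oriented table: one list per column.
price = [5.0, 12.5, 7.0, 30.0, 2.5, 18.0, 9.5, 40.0]
qty = [1, 3, 2, 1, 10, 4, 6, 2]


def partial_sum(start, stop):
    """Roughly: SELECT SUM(price * qty) WHERE price > 8, over rows [start, stop)."""
    total = 0.0
    for i in range(start, stop):
        if price[i] > 8.0:              # WHERE: each row is tested independently
            total += price[i] * qty[i]  # row-level expression
    return total


def parallel_style_query(chunk_size=3):
    """Split the rows into chunks, aggregate each chunk, then combine."""
    n = len(price)
    # Each chunk stands in for work that could run concurrently on a GPU.
    partials = [
        partial_sum(start, min(start + chunk_size, n))
        for start in range(0, n, chunk_size)
    ]
    # Final reduce over the per-chunk partial sums.
    return reduce(lambda a, b: a + b, partials, 0.0)


result = parallel_style_query()
```

Because the chunks do not depend on each other, the chunk size only changes how
the work is split, not the answer. Joins and GROUP BY need more machinery than
this sketch shows.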

~~~
jandrese
I think it will be highly dependent on how you structure your queries. Some
will be absolutely I/O limited while others will be barely faster than on the
CPU. Piping the data out to the GPU is definitely going to be an issue too
unless your database is small enough to fit in GPU memory (in which case the
whole approach is probably overkill anyway).
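
A back-of-the-envelope way to reason about the transfer cost, as a sketch only:
the model below assumes a scan is purely bandwidth-bound, and every number in
the example call is an illustrative placeholder, not a measurement of any real
system. The function name and its parameters are hypothetical.

```python
def scan_times(table_gb, pcie_gb_per_s, gpu_mem_gb_per_s, cpu_mem_gb_per_s):
    """Estimate seconds for one full scan under a bandwidth-only model."""
    return {
        # CPU scans the table straight out of system RAM.
        "cpu_from_ram": table_gb / cpu_mem_gb_per_s,
        # Table is already resident in GPU memory.
        "gpu_data_resident": table_gb / gpu_mem_gb_per_s,
        # Table must first cross the bus, then be scanned on the GPU.
        "gpu_data_over_bus": table_gb / pcie_gb_per_s + table_gb / gpu_mem_gb_per_s,
    }


# Placeholder inputs chosen only to show the shape of the trade-off.
times = scan_times(
    table_gb=64,
    pcie_gb_per_s=16,
    gpu_mem_gb_per_s=256,
    cpu_mem_gb_per_s=64,
)
```

With inputs like these, the copy over the bus dominates, so the GPU only wins
when the data is already sitting in GPU memory or when the same data is scanned
many times per transfer.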

------
capkutay
Oracle already sells A LOT of Exadata; premium priced machines to run
databases on overdrive. I think they would be fine competing against a GPU-
optimized database.

~~~
sitkack
Open source gpu databases are just presales support.

------
lmeyerov
Maybe a good time to point out we've been specializing more on the visual
analytics side (mentioned companies are more like DBs or traditional Tableau)
by connecting GPUs in the browser to GPUs in the datacenter: graphistry.com .
And, we're hiring ;-)

------
Gratsby
LOL. Someone sold USPS a solution to the traveling salesman problem.

~~~
SixSigma
It is a very lucrative sector

[http://www.paragontruckrouting.com/](http://www.paragontruckrouting.com/)

~~~
Gratsby
It's funny to think of all the money and effort spent compared against the
savings (in money, energy, and time) you can put in place by simply telling
delivery drivers to not make left turns and putting a reasonably intelligent
dispatcher in place.

~~~
SixSigma
Paragon does multiple drivers / vehicle / load splitting etc.

------
edward
"Any headline that ends in a question mark can be answered by the word no."

[http://enwp.org/Betteridge's_law_of_headlines](http://enwp.org/Betteridge's_law_of_headlines)

------
TheRealPomax
Because we've all had enough of buzzfeed: the article doesn't even come close
to bothering to actually answer the question. Decide for yourself whether that
makes it link bait or not.

