
SQL on Distributed GPUs now available on AWS Marketplace - felipe_aramburu
https://blog.blazingdb.com/sql-on-distributed-gpus-now-available-on-aws-marketplace-c2b80fe012aa
======
qeternity
This just seems like something riding GPU hype. We run a medium-sized MemSQL
cluster, and get better performance on larger tables with more complex joins
than they do in their demo video...and we do this for much cheaper than the
lowest g2 instance + AMI. Our data is quite narrow, so fits in RAM quite well,
but I would bet that even Citus would fare quite well against this.

~~~
obstinate
I suppose it depends on what operations you do on your table and what your
query stream looks like. It's hard to imagine compute being the bottleneck for
most SQL workloads . . .

~~~
Wheaties466
A PostgreSQL query before version 9.6 is limited to a single CPU core.

Depending on the amount of data it has to query it absolutely can take
forever.

~~~
qeternity
Even now, it's only multicore when doing full table scans (iirc...)

~~~
jasonmp85
True (see here: [https://www.postgresql.org/docs/9.6/static/parallel-plans.html](https://www.postgresql.org/docs/9.6/static/parallel-plans.html)),
but in PostgreSQL 10 there are Parallel Bitmap Heap Scans and Parallel
Index Scans (see here: [http://rhaas.blogspot.com/2017/03/parallel-query-v2.html](http://rhaas.blogspot.com/2017/03/parallel-query-v2.html)).

------
nl
For those puzzled, the use case for this is rapid, interactive data
exploration via reporting tools.

------
cwyers
Go to their documentation website. Search for "ACID." 0 results found. I'm
shocked to find that you can make massive performance gains in a relational
database if you make no guarantees whatsoever that data will actually be
stored.

~~~
tyingq
From [https://docs.blazingdb.com/blog/welcome-to-blazingdb](https://docs.blazingdb.com/blog/welcome-to-blazingdb)

_"BLAZINGDB IS NOT: Built for transactions. Try to run your webapp backend
in BlazingDB and we'll scream STOP! At the very least, we'll find your
audacity impressive."_

There's no support for INSERT, UPDATE, or DELETE. There's a proprietary way to
ingest data. No way at all to delete or update rows as far as I can see.

So, I'm not quite sure what the popular use cases are here. Fast queries of
relatively static data?

~~~
cwyers
> So, I'm not quite sure what the popular use cases are here. Fast queries of
> relatively static data?

Yeah, and... their "impressive" benchmarks (which they don't publish nearly
enough information about to be anything like impressive, and I'm sorry, but
the idea that Postgres is something like seven times as fast as MySQL on a
straight-up JOIN is not something I am going to believe without evidence) look
a lot less impressive if you add the time it takes to transfer your data out
of an actually useful database to this read-only replica for querying. What is
this intended to do? And what do GPUs have to do with it?

~~~
felipe_aramburu
Relatively static is correct. The data you bring into BlazingDB is normally
brought in ready to be operated on, with the understanding that it will not
change on a regular basis. There are many datasets that already exist, do
not change, and are absolutely enormous that people want to analyze. GPUs
allow us to quickly compress and decompress data to reduce file I/O
bottlenecks, perform massive transformations much more quickly than on CPU,
and give us the computational capacity to find the best ways to optimize the
layout of the information to take the most advantage of data skipping. The
intention here is to provide you access to arbitrarily large amounts of data
with a horizontally scalable database solution that has orders of magnitude
more computational capacity than a CPU-based solution.
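
For readers unfamiliar with the term, "data skipping" can be illustrated with a minimal Python sketch. This is not BlazingDB's implementation. It assumes a hypothetical layout in which each block of a column stores its minimum and maximum value, and the names `Block`, `make_blocks`, and `scan_range` are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A chunk of one column plus min/max metadata (hypothetical layout)."""
    values: List[int]

    def __post_init__(self) -> None:
        # Metadata computed once at load time, which suits static data.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def make_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks."""
    return [Block(column[i:i + block_size])
            for i in range(0, len(column), block_size)]


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # Skip the block if its min/max range cannot overlap the predicate.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


# A sorted column is the best case: the predicate touches a single block.
blocks = make_blocks(list(range(1000)), block_size=100)
matches, blocks_read = scan_range(blocks, 250, 260)
```

With the sorted column above, only one of the ten blocks has to be read. On an unsorted column most blocks would overlap the predicate, which is why the layout of the data matters for skipping.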

~~~
tyingq
Might be good to get that "read+append only" message more upfront. Redshift
has update, delete, etc, and you're making direct comparisons.

You do push the "data warehouse" message, but it may not be clear that
anything other than append would require reloading the entire database.

~~~
felipe_aramburu
Update / Delete is forthcoming. We have a version of it working but have not
pushed it yet because it is part of a broader set of changes that have not
been made available yet.

------
pokoleo
> The main admin login credentials are simply “admin” with your AWS assigned
> Instance-ID as the password.

There's got to be a better way to do this... right?

~~~
felipe_aramburu
This was suggested to me by AWS Marketplace. They said it needed to be one
click, which meant I could not receive user input from people loading the AMI
through something like SSH. If you want to change this you can reach out to us
on [https://docs.blazingdb.com/discuss](https://docs.blazingdb.com/discuss)

------
Moter8
Why did you [author] add these tiny gifs into the blogpost? They change enough
to distract from the text and aren't clickable / are too small to actually
understand. (Apparently only the first gif is sorta-zoomable)

------
mi100hael
This is my first time hearing about BlazingDB. Looks like it's a pretty new
application. Anyone have any experience actually running it in production that
can speak to its real-world performance, reliability, etc?

------
makmanalp
See also: [https://www.mapd.com/](https://www.mapd.com/)

~~~
felipe_aramburu
We love what MapD is doing. We operate in different spaces. We have no
visualization platform nor intention of making one. Instead of trying to solve
problems in GPU RAM we focus on solving problems that require disk based
storage. So whereas MapD will be able to perform faster queries on problems
that fit in GPU RAM, we are focused on problems that require orders of
magnitude more storage than what can fit into GPU RAM. Think dumping a
datalake into a database to make it all queryable.
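
As a rough illustration of the disk-based (out-of-core) idea, here is a minimal Python sketch, not BlazingDB's or MapD's code. It assumes the data arrives as a stream and that only one chunk is held in memory at a time. `read_chunks` and `out_of_core_sum` are hypothetical names.

```python
from typing import Iterable, Iterator, List, Tuple


def read_chunks(rows: Iterable[int], chunk_size: int) -> Iterator[List[int]]:
    """Yield successive chunks so only one chunk is in memory at a time."""
    chunk: List[int] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly smaller, chunk.
        yield chunk


def out_of_core_sum(rows: Iterable[int], chunk_size: int) -> Tuple[int, int]:
    """Aggregate chunk by chunk; return the total and the largest chunk held."""
    total = 0
    peak_rows_in_memory = 0
    for chunk in read_chunks(rows, chunk_size):
        peak_rows_in_memory = max(peak_rows_in_memory, len(chunk))
        total += sum(chunk)
    return total, peak_rows_in_memory


# range() stands in for a dataset too large to load at once.
total, peak = out_of_core_sum(range(1_000_000), chunk_size=10_000)
```

The aggregate covers a million rows while never holding more than 10,000 of them, which is the trade being described: slower than an all-in-memory query, but not bounded by memory size.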

~~~
makmanalp
So more out-of-core stuff. Sweet, thank you!

------
ganfortran
No benchmark? I don't really think this holds up against Redshift.

~~~
felipe_aramburu
You can try out the community version and compare it to a small Redshift
instance.

~~~
tycho01
I was curious, what's different about the community edition? The pricing page
didn't really say.

~~~
felipe_aramburu
The community edition works with a single GPU and does not allow horizontal
scaling.

