
Great to see more competition in the GPU + DW space! Some questions:

1. How does BlazingDB compare to MapD?

2. How do you skip ingest - are you using Apache Arrow like Dremio as an efficient data representation format?

3. Do you have any benchmarks, or where would you see BlazingDB on this list, using the same hardware as MapD? http://tech.marksblogg.com/benchmarks.html

4. Can you run your solution in a cluster? :)

1) We are focused on the data lake. We love MapD; they are doing kick-ass stuff. We are focused more on operating on data from disk and from cloud storage services like S3, or HDFS implementations.

2) We read Parquet files into our own caching system. We often use Arrow APIs, though we do not rely on Arrow for our data representation.
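To make the "read files into our own caching system" idea concrete, here is a minimal sketch of a column cache keyed by (file, column). The class and loader names are my own assumptions for illustration, not BlazingDB's actual design:

```python
# Hypothetical column cache: load each (file, column) pair once, reuse after.
# Names and structure are assumptions, not BlazingDB's real caching system.
class ColumnCache:
    def __init__(self, loader):
        self._loader = loader  # function (path, column) -> list of values
        self._cache = {}

    def get(self, path, column):
        key = (path, column)
        if key not in self._cache:  # miss: decode the column, then memoize
            self._cache[key] = self._loader(path, column)
        return self._cache[key]

# Stand-in loader; a real engine would decode a Parquet column chunk here.
calls = []
def fake_loader(path, column):
    calls.append((path, column))
    return [1, 2, 3]

cache = ColumnCache(fake_loader)
first = cache.get("sales.parquet", "price")
second = cache.get("sales.parquet", "price")  # served from cache
print(len(calls))  # loader ran only once
```

The point of the sketch is only the access pattern: repeated scans of the same column hit the cache instead of re-reading the file.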

3) We have run client-side benchmarks but have not yet performed a standardized, replicable benchmark for people to validate. We have been a VERY small team to date and will make that available as soon as we can. You CAN launch BlazingDB instances from the AWS Marketplace to see how it performs.

4) You sure can. A large part of BlazingDB's focus is on distribution. You can add nodes during runtime.

Could you elaborate on the GPU + data lake part? Memory transfer latency to and from the GPU is significant compared to GPU computing power. A data lake may mean multiple heterogeneous data sources, with or without schemas. How does a GPU help cope with that?

So depending on your sources, e.g. whether they are compressed or not, the GPU can greatly speed up I/O by compressing and decompressing directly on the GPU. This is particularly meaningful when you can keep the data compressed during the transfer to the GPU and decompress it for use once it's on the device. It doesn't solve most of the problems of working with heterogeneous sources. That being said, GPUs definitely allowed us to speed up how we get data out of Apache Parquet, and we anticipate we will benefit from their speed when adding more compressed file formats.
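The transfer-compressed-then-decompress argument can be sketched with a back-of-the-envelope model. All the numbers below (PCIe bandwidth, on-GPU decompression throughput, compression ratio) are assumed for illustration, not measurements:

```python
# Toy model: is it faster to ship raw bytes over PCIe, or ship compressed
# bytes and decompress on the GPU? All figures are assumptions.
PCIE_GBPS = 12.0        # assumed effective host->device bandwidth, GB/s
GPU_DECOMP_GBPS = 50.0  # assumed on-GPU decompression throughput, GB/s
data_gb = 8.0           # uncompressed column size, GB
ratio = 4.0             # assumed compression ratio (e.g. RLE on repetitive keys)

raw_transfer = data_gb / PCIE_GBPS
compressed_transfer = (data_gb / ratio) / PCIE_GBPS + data_gb / GPU_DECOMP_GBPS

print(f"raw: {raw_transfer:.2f}s  compressed+decompress: {compressed_transfer:.2f}s")
```

Under these assumed numbers the compressed path wins because the PCIe link, not the GPU, is the bottleneck; the win shrinks as the compression ratio approaches 1.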

What types of compression do you use? I did some work on streamvbyte, and it seems relevant.

Right now we support RLE, RLE Delta RLE, Delta RLE, Dictionary, and Bitpacking. Many of these are combined together. It does look interesting; I am checking it out.
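For readers unfamiliar with the schemes listed, here are toy pure-Python encoders for two of them (RLE and Delta) and how they compose; this is illustrative only, not BlazingDB's GPU implementation:

```python
# Toy RLE and Delta encoders -- illustrative, not BlazingDB's GPU code.
def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return [(v, n) for v, n in out]

def delta_encode(values):
    """Store the first value, then successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

# Delta followed by RLE ("Delta RLE") works well on mostly-sequential columns:
ids = [100, 101, 102, 103, 200, 201, 202]
deltas = delta_encode(ids)  # [100, 1, 1, 1, 97, 1, 1]
print(rle_encode(deltas))   # [(100, 1), (1, 3), (97, 1), (1, 2)]
```

The composed form is why these schemes are often chained: a sorted ID column turns into long runs of the delta value 1, which RLE then collapses.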
