Could you elaborate on the GPU + data lake part? Memory transfer latency to and from the GPU is significant compared to GPU compute power. A data lake may mean multiple heterogeneous data sources, with or without a schema. How does a GPU help with coping with that?



So depending on your sources, e.g. whether they are compressed or not, the GPU can greatly speed up I/O by compressing and decompressing directly on the GPU. This is particularly meaningful when you can keep data in its compressed state as you transfer it to the GPU and decompress it for use once it's there. It doesn't solve most of the problems of working with heterogeneous sources. That being said, GPUs definitely allowed us to speed up how we get data out of Apache Parquet, and we anticipate we will be able to benefit from their speed when adding more compressed file formats.
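As a concrete illustration of the pattern (a minimal sketch, not necessarily this project's code path): RAPIDS cuDF exposes GPU-side Parquet reading in a single call, so the compressed bytes are what cross the PCIe bus and the decode runs on-device. The file path here is a placeholder.

    import cudf

    # Parquet pages are read from storage and decoded/decompressed on the GPU,
    # so only the (smaller) compressed representation is transferred.
    df = cudf.read_parquet("events.parquet")
    print(df.head())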


What types of compression do you use? I did some work on streamvbyte, and it seems relevant.


Right now we support RLE, RLE Delta RLE, Delta RLE, Dictionary, and Bitpacking; many of these are combined together. streamvbyte does look interesting, I am checking it out.
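A minimal sketch (with CuPy, and with illustrative names, not these exact kernels) of why RLE in particular decodes well on a GPU: the whole expansion reduces to scans, scatters, and gathers, all of which are standard parallel primitives.

    import cupy as cp

    def rle_decode(values: cp.ndarray, lengths: cp.ndarray) -> cp.ndarray:
        """Expand RLE pairs (value, run length) into a flat array on the GPU.
        Assumes every run length is > 0."""
        # Exclusive scan over run lengths gives each run's start offset.
        offsets = cp.cumsum(lengths) - lengths
        total = int(lengths.sum())
        # Scatter a 1 at each run start; an inclusive scan over the markers
        # then yields the run index that each output element belongs to.
        starts = cp.zeros(total, dtype=cp.int64)
        starts[offsets] = 1
        run_id = cp.cumsum(starts) - 1
        # Gather the run values out by index.
        return values[run_id]

    vals = cp.asarray([5, 7, 9])
    lens = cp.asarray([3, 1, 2])
    print(rle_decode(vals, lens))  # [5 5 5 7 9 9]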



