+1 to StarRocks. You can cut out a lot of the weight associated with denormalization, which ClickHouse almost forces you to do. Crazy big cluster sizes as well
YES!! Normalization has been the BEST PRACTICE since the DBMS was invented in like the 70s, and somehow we just totally forgot about it over the past 10 years for OLAP. Denormalization means 10x more expensive storage, impossible schema evolution because of data backfilling, extra pipelines you have to build and maintain, and a cost that scales with the data size and your business growth. I was just talking with a buddy of mine about this and how to run JOINs on the fly without denormalization pipelines, using StarRocks in this case.
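To make the backfilling point concrete, here is a minimal sketch of joining at query time against normalized tables. It uses Python's built-in sqlite3 module purely as a stand-in, not StarRocks, and the `customers` and `orders` tables and their columns are made up for illustration. The idea is only that a changed dimension attribute is a one-row update, with no rewrite of the fact table.

```python
import sqlite3


def build_demo():
    """Create a tiny normalized schema in memory (hypothetical tables)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
        INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.0), (12, 2, 3.0);
        """
    )
    return conn


def revenue_by_region(conn):
    """Join facts to the dimension at query time instead of storing region per order."""
    rows = conn.execute(
        """
        SELECT c.region, SUM(o.amount)
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    ).fetchall()
    return dict(rows)


conn = build_demo()
print(revenue_by_region(conn))

# Reassigning a customer touches one dimension row; no order rows are rewritten.
conn.execute("UPDATE customers SET region = 'APAC' WHERE id = 2")
print(revenue_by_region(conn))
```

In a denormalized layout the same region change would mean rewriting every historical order row for that customer, which is the backfill cost I'm complaining about.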
Many developers are seriously lacking in database fundamentals, especially over the past 10 years. I regularly see tables that don't have primary keys or indexes of any sort. It boggles the mind.
Quite a long time ago I worked on a system that used Cassandra to store some data. Cassandra used about 100 gigabytes of storage across all of the nodes in the cluster. At some point we needed to upgrade from version 2 to 3. I decided to take a backup using Cassandra's included tooling. After taking a snapshot and compressing it, the size was 14 megabytes. I'm still in awe that this system existed.
Maybe they should rename it to their migration options page. Or maybe I'll just ask ChatGPT what the best alternative is...
Still, pretty useful stuff. It also feels like Rockset had been moving a little too slowly in recent years, but congrats to them on finding a new home.
Funny that I hadn’t heard of them in the database space till they showed up at the top of ClickBench. Makes me wonder what other interesting projects I’m missing out on in China.
Apache Kylin was actually China’s very first top-level Apache project. It did come out of eBay, but the work all originated in China. It’s a really cool solution to query acceleration.
Very cool to see this! Kylin has been a super fun project to be a part of. A unique approach to OLAP/query acceleration that's pretty much the best way to deal with huge volumes of data.
It's great to see them collaborating with Women Who Code to get the word out and grow our open source family. If anyone is interested in learning more about the project, check us out here: http://kylin.apache.org/
Thanks for sharing this! Apache Kylin is an awesome project. It has been around for a few years and has been getting a lot of attention in Europe and Asia, but still seems to be relatively unknown by a lot of folks I talk to in the US.
If people are interested in getting involved with the community, or just want to check Kylin out, you can find the page for the project here: http://kylin.apache.org/
Kylin is a really cool project, and how it's using OLAP for faster analytics on large datasets/big data is interesting - especially given how old the technology is. The Kylin community has found some creative ways of breathing new life into OLAP. If you'd like to join the project you can learn more here: http://kylin.apache.org/