
Merick · 2025-09-05T18:37:32

Curious: since you mention Doris, have you tried looking into StarRocks?

DataWizard1986 · 2025-09-05T18:41:18

+1 to StarRocks. You can cut out a lot of the overhead that comes with denormalization, which ClickHouse almost forces on you. Crazy big cluster sizes as well.

esafak · 2025-09-05T19:00:09

https://medium.com/@marvin_data/testing-query-speed-for-duck...

apwell23 · 2025-09-05T18:53:46

What about Druid? I don't hear much about Druid these days. Has it fallen off?

sdairs · 2025-09-05T22:18:34

Yeah, it's legacy tech at this point.

nrjames · 2025-09-05T19:07:36

Not yet, but it's on the list! This is R&D work that I'm doing on the side, when I have time. Do you prefer StarRocks to Doris?

Merick · 2025-09-06T05:01:38

Yeah. Interestingly, StarRocks was originally a fork of Doris, but these days it tends to outperform Doris in most of the use cases I’ve read up on.

Merick · 2025-09-04T00:46:56

Good stuff, always happy to see StarRocks get some recognition. I'm always surprised when I talk to folks who haven't heard of it.

Merick · 2025-04-10T00:21:15

YES!! Normalization has been the BEST PRACTICE since the DBMS was invented back in the 70s, and somehow we just totally forgot about it for OLAP over the past 10 years. Denormalizing means 10x more expensive storage, impossible schema evolution because of data backfilling, extra pipelines you have to build and maintain, and costs that scale with your data size and business growth. I was just talking with a buddy of mine about this, and about how to run JOINs on the fly with StarRocks instead of building denormalization pipelines.
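A minimal sketch of the trade-off in that comment, assuming Python's built-in sqlite3 as a stand-in (this is not StarRocks, and the `customers`, `orders`, and `orders_wide` tables plus the helper functions are hypothetical). The normalized schema answers the question with a JOIN at query time, so a change to a customer touches one row; the denormalized `orders_wide` table goes stale until a backfill rewrites every affected order row.

```python
# Sketch only: SQLite stands in for an OLAP engine; all table and function
# names here are hypothetical illustrations, not any product's API.
import sqlite3


def build(conn: sqlite3.Connection) -> None:
    # Normalized: each customer attribute is stored exactly once.
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(customer_id), amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "EU"), (2, "US")])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0)],
    )
    # Denormalized: a "pipeline" copies region onto every order row.
    conn.execute(
        "CREATE TABLE orders_wide AS SELECT o.order_id, o.amount, c.region "
        "FROM orders o JOIN customers c USING (customer_id)"
    )
    conn.commit()


def revenue_by_region_join(conn: sqlite3.Connection) -> list:
    # JOIN on the fly: always reflects the current customers table.
    return conn.execute(
        "SELECT c.region, SUM(o.amount) FROM orders o "
        "JOIN customers c USING (customer_id) GROUP BY c.region ORDER BY c.region"
    ).fetchall()


def revenue_by_region_wide(conn: sqlite3.Connection) -> list:
    # Reads the pre-joined copy, which is only as fresh as the last backfill.
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders_wide GROUP BY region ORDER BY region"
    ).fetchall()


def move_customer(conn: sqlite3.Connection, customer_id: int, new_region: str) -> None:
    # Normalized: a single row changes.
    conn.execute(
        "UPDATE customers SET region = ? WHERE customer_id = ?", (new_region, customer_id)
    )


def backfill_wide(conn: sqlite3.Connection, customer_id: int, new_region: str) -> int:
    # Denormalized: every order row for that customer must be rewritten.
    cur = conn.execute(
        "UPDATE orders_wide SET region = ? WHERE order_id IN "
        "(SELECT order_id FROM orders WHERE customer_id = ?)",
        (new_region, customer_id),
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    move_customer(conn, 1, "APAC")
    print(revenue_by_region_join(conn))  # [('APAC', 12.5), ('US', 3.0)]
    print(revenue_by_region_wide(conn))  # [('EU', 12.5), ('US', 3.0)] (stale)
    print(backfill_wide(conn, 1, "APAC"))  # 2 rows rewritten
```

With two orders the backfill is trivial, but its cost grows with the fact table, which is the scaling point the comment makes.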

icedchai · 2025-04-10T02:22:39

Many developers are seriously lacking in database fundamentals, and it has gotten worse over the past 10 years. I regularly see tables that don't have primary keys or indexes of any sort. It boggles the mind.

sidewndr46 · 2025-04-10T02:24:06

Quite a long time ago I worked on a system that used Cassandra to store some data. The Cassandra data took up about 100 gigabytes of storage across all of the nodes in the cluster. At some point we needed to upgrade from version 2 to 3, so I decided to take a backup using Cassandra's included tooling. After I took a snapshot and compressed it, the backup was 14 megabytes. I'm still in awe that this system existed.

Merick · on June 21, 2024

I'll always remember Rockset for their ridiculous comparison page: https://rockset.com/real-time-analytics-comparison/

Maybe they should rename it to their migration options page. Or maybe I'll just ask ChatGPT what the best alternative is...

Still, pretty useful stuff. It also feels like Rockset had been moving a little too slowly in recent years, but congrats to them on finding a new home.

riku_iki · on June 21, 2024

It was funny seeing their front page call it the "World's faster analytical and search database" while listing a 90MB/s streaming ingest speed.

teej · on June 21, 2024

These pages are done for SEO. You get loads of inter-linked pages rich with keywords that match user searches exactly.

Merick · on June 21, 2024

Seconding the StarRocks project: best performance out there, and the community is great. Tons of support.

Merick · on Feb 27, 2023

StarRocks. It’s a Linux Foundation project now, but a lot of the initial team and community behind it came from China.

https://github.com/StarRocks/starrocks

Funny that I hadn’t heard of them in the database space until they showed up at the top of ClickBench. It makes me wonder what other interesting projects from China I’m missing out on.

Merick · on June 10, 2020

Apache Kylin was actually China’s very first top-level Apache project. It did come out of eBay, but the work all originated in China. It’s a really cool approach to query acceleration.

You can learn about it on the community page here: http://kylin.apache.org/

It’s pretty popular across China, and I’ve seen it come up a bunch in Europe/South America, but in the U.S. it’s pretty new to a lot of folks.

Merick · on June 4, 2020

Very cool to see this! Kylin has been a super fun project to be a part of. It takes a unique approach to OLAP/query acceleration that's pretty much the best way to deal with huge volumes of data.

It's great to see them collaborating with Women Who Code to get the word out and grow our open source family. If anyone is interested in learning more about the project, check us out here: http://kylin.apache.org/

Merick · on May 4, 2020

Thanks for sharing this! Apache Kylin is an awesome project. It has been around for a few years and has been getting a lot of attention in Europe and Asia, but it still seems to be relatively unknown to a lot of folks I talk to in the US.

If people are interested in getting involved with the community, or just want to check Kylin out, you can find the page for the project here: http://kylin.apache.org/

Merick · on April 29, 2020

Kylin is a really cool project, and the way it uses OLAP for faster analytics on large datasets and big data is interesting, especially given how old the technology is. The Kylin community has found some creative ways of breathing new life into OLAP. If you'd like to join the project, you can learn more here: http://kylin.apache.org/