At previous employer, we built a system using Druid as the primary store of repo...

Dylan16807 · on March 19, 2017

Make sure you're looking at updated prices for RAM too. 16x16GB of registered ECC DDR3 is about the same price and enormously faster.

sologoub · on March 20, 2017

Sure, but I believe the available chassis limited us to far fewer than 16 slots.

Dylan16807 · on March 20, 2017

Well, the first Google result for "1u 16 dimms" is a refurbished chassis+motherboard+PSU for a hundred bucks. Brand new costs more but not terribly so; the main cost is the RAM whether you go 8 slots or 16.

These SSDs have situational uses but unless you want 10+ TB in one server you can get a system with >50% as much actual RAM for the same price.

sologoub · on March 20, 2017

It's not the cost. We ran standardized chassis, so whatever our ops had is what they had...

Redsquare · on March 19, 2017

Would you choose to run with Druid again?

sologoub · on March 19, 2017

For that use case, absolutely! We made do with a version that could not even support label appends (limited joins). The current version would let us get by with far fewer workarounds.

The probabilistic HyperLogLog data type is also a game changer compared to, say, Redshift, but again it's only viable if you are dealing with counting (estimating) unique entities across billions of rows and super-wide dimension sets.
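To make the HyperLogLog idea concrete, here is a minimal Python sketch of the general technique. It is a toy estimator, not Druid's actual implementation: the class name, the choice of SHA-1 as the hash, and the default of 2^12 registers are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog cardinality estimator (illustrative, not Druid's code)."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper; valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Assumption: a 64-bit slice of SHA-1 serves as the hash function.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Sketches combine by taking the per-register maximum, which is what
        # lets partial aggregates be rolled up without rescanning raw rows.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(100_000):
        hll.add(f"user-{i}")
    print(f"estimated uniques: {hll.count():.0f} (true value: 100000)")
```

The memory footprint stays fixed at 4,096 small registers no matter how many rows are added, and with that many registers the standard error is roughly 1.04 / sqrt(4096), or about 1.6%. That trade of a small error for constant space is why the approach only pays off at very large cardinalities.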

If you are doing a general purpose analytics store, Redshift is hard to beat because of reliability and ease of implementation.

Druid is a purpose-built race car. Redshift is a good crossover: far less headache, and it can do almost any job well enough, but you won't get the same tunability or performance (when tuned right) at scale. That said, I'm continuously impressed with what Redshift can actually do, despite the humble feature set.

Druid's main weakness is lack of SQL support, so it's not a great analyst datastore. You pretty much have to wrap it in a reporting app.

scapecast · on March 20, 2017

Hi sologoub - can you elaborate a bit on the tuning for Redshift you're referring to? What's the pain there? Asking because we're building a performance management product for Redshift, I'd love your input! lars at intermix dot io

otterley · on March 19, 2017

What do you think of ClickHouse vis-a-vis Druid and Redshift?

sologoub · on March 20, 2017

Don't have any experience with that tech, but from reading the marketing landing page it sounds more akin to MemSQL than Redshift, in that it seems to include options for streaming ingestion.

If I were to take on a similar project, I might POC MemSQL or Citus DB, and possibly BigQuery (if the project is built on Google Cloud as opposed to AWS or raw iron).