Also, in my experience, building the scatter-gather query and re-aggregation functionality is usually the easiest part. The hard part is figuring out how to build fair multi-tenancy and QoS into what is essentially a massively parallel, user-facing, real-time data lake.
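For anyone unfamiliar with the pattern: "scatter-gather" just means fanning a query out to every shard and merging the partial results. A minimal sketch in Python (the in-memory "shards" and the sum-merge are made-up illustrations, not any particular system's API):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards": each holds a slice of (key, value) rows.
SHARDS = [
    {"a": 1, "b": 2},
    {"a": 3, "c": 4},
    {"b": 5, "c": 6},
]

def query_shard(shard, keys):
    """Scatter phase: a shard returns partial results for the keys it holds."""
    return {k: v for k, v in shard.items() if k in keys}

def scatter_gather(keys):
    """Fan the query out to every shard, then re-aggregate (sum) the partials."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda s: query_shard(s, keys), SHARDS)
    result = {}
    for partial in partials:
        for k, v in partial.items():
            result[k] = result.get(k, 0) + v
    return result

print(scatter_gather({"a", "b"}))  # {'a': 4, 'b': 7}
```

The fan-out/merge itself really is this simple; everything hard (admission control, per-tenant quotas, stragglers) lives around it.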
We can always group stuff in a higher level category.
There's no difference between backend, frontend, gaming, embedded, etc.; essentially they're all bit manipulators.
But... What's the purpose here?
> data warehouses like Snowflake and Redshift
Are fundamentally the same, and I have yet to see any reason other than “marketing shenanigans” and “avoiding benchmarks” as to why they should be given their own special category. Call them all modern OLAP, or call them all data warehouses; it doesn’t matter.
> general-purpose data placement algorithm for query serving systems that improves latency by maximizing query parallelism, spreading out shards that are frequently queried together.
This is cool; it will be interesting to see whether the added parallelism wins over the network overhead and the added coordination required. Maybe there are ways to shift where that line lies as well?
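That line can be sketched with a toy cost model (entirely made up, not from the paper): the slowest shard does `work / fanout`, while the coordinator pays a fixed cost per shard contacted, so latency is minimized near `sqrt(work / overhead)` and wider fan-out loses beyond that:

```python
def latency(work, fanout, overhead):
    # Slowest shard's share of the work, plus a per-shard coordination cost.
    # Both parameters are illustrative, not measured from any real system.
    return work / fanout + overhead * fanout

work, overhead = 100.0, 1.0
best = min(range(1, 51), key=lambda n: latency(work, n, overhead))
print(best)  # 10 -- i.e. sqrt(100 / 1); past this, coordination dominates
```

Anything that shrinks the per-shard overhead (cheaper RPCs, fewer coordination round trips) shifts the optimal fan-out higher.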
Given that the likes of ClickHouse and Druid can be made user-facing and also support backend analytics workloads, doesn’t that imply that Snowflake/Redshift are just outright less capable?
Neither ClickHouse nor Druid can hold a candle to what Snowflake can do in terms of query capabilities, or the flexibility and richness of its product.
That’s just scratching the surface. They’re completely different product categories IMO, although they have a lot of technical / architectural overlap depending on how much you squint.
Devil is in the details basically.
Do you have something specific in mind?
My previous experience with Snowflake was that the query functionality was lacking, performance was subpar (at best), and half the purported features were a joke (looking at you, “Kafka integration”) or just gimmicky (the time travel feature).
(Disclosure: SingleStoreDB cofounder)