thanks for your summary and valuable feedback. a few notes from the crate team:
1. JOINs are on our roadmap. we might not reach full JOIN support anytime soon (INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS), but simple use-cases such as 1:n equi-joins shouldn't be too hard.
2. after re-reading our documentation on sharding i have to admit that we need to improve it. we mention it briefly here: https://crate.io/docs/current/sql/ddl.html#sharding and more will be added asap
3. if we have growing datasets we typically work with partitioned tables. we pushed that change just a few days ago (https://github.com/crate/crate/blob/master/docs/sql/partitio...), so it is not yet in the RPM build.
4. no, you can specify routing: https://crate.io/docs/current/sql/ddl.html#routing
5. agreed that the docs can be improved, and we're at the very beginning: https://crate.io/docs/current/sql/ddl.html#replication - rack awareness,... there are quite a few replica settings on our roadmap. i'd be happy to hear your most important additions to replica policies.
i promise, we're working hard and trying to get there in less than a few years :)
Basic JOIN support would be good, but I think it is misleading to advertise Crate as having SQL support. Most users would assume that the SQL everyone knows, ANSI SQL, is supported, which it definitely isn't yet.
As I said, sharding is mentioned, but there are no details about how it is actually implemented. A potential user could check the code, but realistically this is unlikely, as many users have probably not given a huge amount of thought to how rows are routed to different shards. It's a really important issue, because if you're doing range-based sharding like MongoDB does by default, that changes the kind of key the data should be partitioned on.
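To illustrate why the routing scheme affects the choice of key, here is a minimal Python sketch contrasting hash-based and range-based routing. It says nothing about how Crate actually routes rows; the shard count, the range boundaries and the function names `hash_shard` and `range_shard` are all made up for the example.

```python
import bisect
import zlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash routing: the shard depends on a hash of the key, not on its order."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


# Hypothetical split points: shard 0 holds keys below "2014-02",
# shard 1 holds ["2014-02", "2014-03"), and so on; shard 3 takes the rest.
RANGE_BOUNDS = ["2014-02", "2014-03", "2014-04"]


def range_shard(key: str, bounds=RANGE_BOUNDS) -> int:
    """Range routing: the shard depends on where the key sorts among the bounds."""
    return bisect.bisect_right(bounds, key)


# Monotonically increasing keys, e.g. one row per day in April 2014.
keys = [f"2014-04-{day:02d}" for day in range(1, 9)]

for key in keys:
    print(key, "hash ->", hash_shard(key), "range ->", range_shard(key))
```

With range routing every one of these April keys lands on the last shard, so a time-ordered key concentrates new writes on one shard. With hash routing the shard is unrelated to key order, so the same key is less likely to create a hotspot, but range scans over adjacent keys no longer stay on one shard.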
With regard to point 4, your documentation says "If a primary key constraint is defined, the routing column definition can be omitted or must match a primary key column." I read this as: if a primary key is defined, the routing column must either match it or be left unspecified. Put more simply, you can only shard on the primary key. If this isn't the case, I think this needs re-wording.
In terms of replication, the explanation of the ranges of replicas is confusingly worded. I was wondering what the use-case for this is? Surely the idea of replicas is to determine how many node failures you want to support, and then set the count at the minimum number required to support this so as not to waste resources. Also, if you set a range, how does Crate determine where in that range to set the number of replicas? Is it as many as possible?