
I'm looking pretty seriously at MongoDB and I've heard that Cassandra is worth considering. I do a lot of data warehousing / statistical analytics which generally means some sort of star schema-based reporting with lots of crosstabs, dimensions, etc.

Can anyone relate their experience with either of these two platforms? Would either be a good choice for live querying in these types of applications? I know you can use MapReduce to eventually get the data you need, but I need to support queries that respond in (well) under a second, even for very large data sets.




Like any other data store, MongoDB will do what you need in less than a second... *if the database fits in memory*. Once queries have to go to disk, it doesn't matter much how little overhead the storage engine has; it's going to be slow.


HBase is probably closer to what you're looking for.


Why?


It basically depends on 2 things:

1) What kind of API do you need? It'll have to be simpler than SQL, likely key-value with secondary indices managed at either the API layer or application layer.

2) Which 2 of Consistency, Availability and Partition Tolerance do you need? Is "gets there eventually" enough or do you need "can read it right after I write it" guarantees?
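To make point 1 concrete, here is a rough sketch of what "secondary indices managed at the application layer" can look like. It is not tied to MongoDB, Cassandra, or HBase: the store is just an in-memory dict, and the class and method names (`KVStoreWithIndex`, `put`, `find_by`) are made up for illustration.

```python
from collections import defaultdict


class KVStoreWithIndex:
    """Toy key-value store with one application-managed secondary index.

    Hypothetical example: a real store would persist data and would have to
    keep the index consistent with the rows across failures.
    """

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.rows = {}                  # primary key -> record (dict)
        self.index = defaultdict(set)   # field value -> set of primary keys

    def _unindex(self, key, record):
        # Drop the key from the index bucket of the record's field value.
        value = record.get(self.indexed_field)
        if value is not None and value in self.index:
            self.index[value].discard(key)
            if not self.index[value]:
                del self.index[value]

    def put(self, key, record):
        # On overwrite, remove the stale index entry before adding the new one.
        old = self.rows.get(key)
        if old is not None:
            self._unindex(key, old)
        self.rows[key] = dict(record)
        value = record.get(self.indexed_field)
        if value is not None:
            self.index[value].add(key)

    def get(self, key):
        # Plain primary-key lookup, the only query the store itself offers.
        return self.rows.get(key)

    def delete(self, key):
        old = self.rows.pop(key, None)
        if old is not None:
            self._unindex(key, old)

    def find_by(self, value):
        # Secondary lookup: resolve keys via the index, then fetch each row.
        return [self.rows[k] for k in sorted(self.index.get(value, ()))]


store = KVStoreWithIndex("region")
store.put("o1", {"region": "EU", "total": 10})
store.put("o2", {"region": "US", "total": 5})
store.put("o3", {"region": "EU", "total": 7})
eu_totals = [r["total"] for r in store.find_by("EU")]
```

The point is that every write now costs two updates (row plus index), and the application is responsible for keeping them in sync. That is the trade you make for giving up SQL-style query planning.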


MongoDB (and SQL DBs) do ad-hoc queries quite well as long as your data fits on a single machine.

Cassandra does ad-hoc queries quite well if you're willing to use MapReduce.

If you have a small number of frequently run queries and are willing to use MR for the others, maybe that is a happy place for you.



