* A big benefit of SQL is transactions; I have a hard time seeing transactional work done efficiently with MongoDB.
* Additionally, joins cannot be consistent under the MongoDB model (shards, reads from secondaries, ...).
While this idea is charming, I don't see it reaching any serious production environment.
Still, it's cool that Citus implemented a somewhat working prototype that is useful for some use cases. Kudos! :)
* We find that in the context of analytics, transactions aren't as important (they are still relevant, but people are OK if results are a bit off).
* For the MongoDB sharding model, CitusDB already keeps similar metadata and moves around shards accordingly. By syncing MongoDB's config server metadata to the CitusDB master, we can ensure that shards are moved around properly. We can also run queries across shards using this metadata.
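To sketch what that enables (the collection and column names here are hypothetical, not from the post): once the shard metadata is synced, an analytic query over a sharded collection can be issued as ordinary SQL on the CitusDB master, which fans it out across the shards:

```sql
-- Hypothetical example: 'orders' is a sharded MongoDB collection
-- exposed through the synced metadata. The master routes the query
-- to the shards and merges the partial results.
SELECT customer_id, count(*) AS order_count, sum(amount) AS total
FROM orders
WHERE created_at >= '2014-01-01'
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10;
```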
And thanks a bunch for the encouragement. As you noted, we think of this as addressing some interesting use-cases, and are looking to get the community's feedback.
Is there some sort of advantage to this that I'm missing? It just seems like reinventing the wheel.
"In the following, we show a rather simple query, but complex queries that involve sub-selects, SQL window functions, and collection joins are also possible. In fact, you can even join a MongoDB collection with a PostgreSQL table."
"And the really exciting part for us is that CitusDB can easily enable analytic queries for sharded MongoDB installations, as it is built on PostgreSQL and can already distribute SQL queries."
This company builds CitusDB, "a distributed database that lets you run SQL queries over very large data sets. Designed for analytical queries, CitusDB enables real-time responsiveness." This apparently allows their software to work with MongoDB.
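A hedged sketch of the join the post describes (table and column names are my own invention): with a collection exposed as a foreign table, joining it against a regular PostgreSQL table is plain SQL:

```sql
-- 'events' is assumed to be a foreign table backed by a MongoDB
-- collection; 'users' is an ordinary local PostgreSQL table.
SELECT u.email, count(*) AS event_count
FROM events e
JOIN users u ON u.id = e.user_id
GROUP BY u.email
ORDER BY event_count DESC;
```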
We use this tool to replicate data from our main MongoDB database to a PostgreSQL one (eventually consistent).
We haven't written a comprehensive README yet.
The only documentation we have so far are slides: http://viennot.biz/promiscuous
Stay tuned :)
I'd say that if you are working with diverse documents with few or no relationships, a document-oriented database is necessarily a better fit for the task at hand than a relational one. It's a tautology.
If you need many JOINs and your data is homogeneous, there's a mismatch: you probably should have chosen a relational database in the first place, as it better fits that model.
Durable? NO. It is trivial to distribute data between multiple nodes using MongoDB and have writes land on some or all of them. I have yet to see a clustered SQL system that makes the process as simple as any NoSQL database.
Fewer application errors? Well, this just makes no sense. Why would a database cause application errors? Either the fetch/store request fails or it doesn't.
Lower TCO? NO. MongoDB has an order of magnitude lower TCO than any database I've used. Period. It is trivial to cluster, shard, deploy, and manage, and being schema-less there are no migrations to worry about. Ease of use is one of the big reasons MongoDB is popular right now.
And PostgreSQL's HSTORE seems like a bit of a hack to me. It has a ridiculously complex syntax, and I'm still not sure how it handles querying other data types, e.g. dates.
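To make the date point concrete (table and key names made up): hstore stores every value as text, so comparing against another type means pulling the value out and casting it:

```sql
-- hstore values are always text; a date comparison needs a cast.
SELECT *
FROM events
WHERE (attributes -> 'created_at')::date >= '2014-01-01';
```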
The thing is, an RDBMS really is a superset of most of the other data models floating out there: entity-attribute-value, document, plain key-value, whatever. Your everyday, ordinary RDBMS can do it, with transactions, durability, ease of administration, whatever. When you use a NoSQL storage system, you're explicitly trading off one or more of the capabilities of an RDBMS to get some other benefit. The one thing that Postgres isn't good at, as far as I know, is graph-like data structures.
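As a minimal sketch of the superset point (names hypothetical): a plain key-value store is just a two-column table, and you get transactions and durability for free:

```sql
CREATE TABLE kv (key text PRIMARY KEY, value text);

-- Both writes commit atomically, or neither does.
BEGIN;
INSERT INTO kv VALUES ('user:42', '{"name": "Ada"}');
UPDATE kv SET value = '{"name": "Ada Lovelace"}' WHERE key = 'user:42';
COMMIT;
```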
It would be great to have the Mongo one listed there as well.
* If the query goes over a single table and doesn't have filter clauses, then the cost of converting BSON data to PostgreSQL tuples becomes the bottleneck.
* If the query goes over a single table, doesn't have filter clauses and touches only a few columns, then the cost of reading the data from MongoDB (over TCP) becomes the bottleneck.
* If the query joins several tables or has complex sub-selects, then the lack of accurate data statistics forces the planner into bad execution plans, and that becomes the bottleneck.
For most queries though, we found that the query performance was reasonable. PostgreSQL processed around 200K-400K documents per second per CPU core.
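To illustrate the statistics point (table names are hypothetical): EXPLAIN on a join involving the foreign table shows the plan that the misestimated row counts produced, e.g. a nested loop where a hash join would have been cheaper:

```sql
-- With no accurate statistics on the foreign table, the planner's
-- row estimates (and therefore the join strategy) can be far off.
EXPLAIN
SELECT c.name, count(*)
FROM mongo_events e
JOIN customers c ON c.id = e.customer_id
GROUP BY c.name;
```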
So this comes as a good surprise. I've been excited reading about Postgres, and now this seems like a very good fit for us.
-- find bad data
SELECT * FROM table WHERE ...
UPDATE table SET ... WHERE ...
We found that translating the rest of SQL amounted to writing a database query executor, and instead opted for leaving that to Postgres.
For these translations, what use cases did you have in mind?
JOIN is the hardest part, I think.
For nested documents, we have some ideas on how to make that accessible, and talk about it in our README: https://github.com/citusdata/mongo_fwd/blob/master/README