Hi there, we are mostly targeting analytic use-cases with this. SQL already has a lot of analytics functionality built into the language, so we thought it might be interesting to enable it.
* We find that in the context of analytics, transactions aren't as important (they are still relevant, but people are OK if results are a bit off).
* For the MongoDB sharding model, CitusDB already keeps similar metadata and moves around shards accordingly. By syncing MongoDB's config server metadata to the CitusDB master, we can ensure that shards are moved around properly. We can also run queries across shards using this metadata.
And thanks a bunch for the encouragement. As you noted, we think of this as addressing some interesting use-cases, and are looking to get the community's feedback.
Your idea is great, but you will end up with the average Joe using it for other cases, and then MongoDB becomes a non-transactional database engine with a SQL interface and a sharding feature, which will be much slower and have fewer features than any RDBMS with proper replication functionality.
Maybe they want to reuse complex analytic queries and visualization tools built for relational databases, while still enjoying the advantages of MongoDB. They mention this in the first paragraph.
"This way, you can run complex analytic queries, join different collections, and benefit from SQL data visualization tools."
"In the following, we show a rather simple query, but complex queries that involve sub-selects, SQL window functions, and collection joins are also possible. In fact, you can even join a MongoDB collection with a PostgreSQL table."
"And the really exciting part for us is that CitusDB can easily enable analytic queries for sharded MongoDB installations, as it is built on PostgreSQL and can already distribute SQL queries."
This company builds CitusDB, "a distributed database that lets you run SQL queries over very large data sets. Designed for analytical queries, CitusDB enables real-time responsiveness." This apparently allows their software to work with MongoDB.
Those slides gave me a headache, so I'll just point out for the benefit of everyone that there is a README, and that it's some form of replication via AMQP queues.
Well, that's about the particular implementations, not the theory. If PostgreSQL is faster and less resource-intensive than MongoDB, that says nothing about their theoretical underpinnings. The whole point of the NoSQL movement is that one approach to working with data and its structure (relational databases with SQL) cannot be the best answer to every possibility in that space. Not because of particularities of the implementation (e.g. speed), but because a model that matches your data gives you an easier time working with it; there's no mismatch.
I'd say that if you are working with diverse documents that have few or no relationships, a document-oriented database is necessarily a better fit for the task at hand than a relational one. It's a tautology.
If you need many JOINs and your data is homogeneous, there's a mismatch. You probably should have chosen a relational database in the first place, as it better fits that model.
Agreed. It's hilarious to watch people who keep trying to set up this RDBMS-versus-NoSQL confrontation. As though every data model could be perfectly handled by an RDBMS if only I learned enough SQL.
Faster? NO. In my use case I have one domain object with 30 embedded relationships. MongoDB: one request. SQL: 30 joins.
Durable? NO. It is trivial to distribute data between multiple nodes using MongoDB and have writes land on some or all of them. I have yet to see a clustering SQL system that makes the process as simple as any NoSQL database.
Fewer application errors? Well, this just makes no sense. Why would a database be causing application errors? Either the fetch/store request fails or it doesn't.
Lower TCO? NO. MongoDB has an order of magnitude lower TCO than any database I've used. Period. It is trivial to cluster, shard, deploy, and manage, and being schema-less, there are no migrations to worry about. Ease of use is one of the big reasons why MongoDB is popular right now.
I don't think you understood him. There would not be many joins. You can use an RDBMS as a document database by storing documents in a single column, with key values in another column. For example, storing JSON in a text field (Postgres also has a native json type now, I believe). You'd also want an indexes table. Another approach that also works well is to use a document model where you might have three tables, one of which has columns like (key, name, value); all your data goes in that table. Reddit uses something like this approach, based on previous Hacker News discussions.
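For illustration, here is roughly what those two approaches look like in plain SQL (table and column names are invented for the example):

    -- Approach 1: store whole documents as JSON, keyed by an id.
    CREATE TABLE documents (
        key  text PRIMARY KEY,
        body json   -- or plain text on Postgres versions without the json type
    );

    -- Approach 2: one row per attribute, entity-attribute-value style.
    CREATE TABLE attributes (
        key   text,   -- which document the row belongs to
        name  text,   -- attribute name
        value text,   -- attribute value, stored as text
        PRIMARY KEY (key, name)
    );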
But you are talking more about a key/value model, which I see as a different use case from the document model.
And PostgreSQL's HSTORE seems like a bit of a hack to me. It has a ridiculously complex syntax, and I'm still not sure how it handles querying other data types, e.g. dates.
HSTORE stores values as strings, so in your query you'd have to cast to a date, or whatever type you want to get out of it.
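For example, assuming an hstore column named attrs with a date stored under a 'due' key (hypothetical names), the cast looks like:

    -- hstore values come back as text, so cast before comparing as a date
    SELECT *
    FROM tasks
    WHERE (attrs -> 'due')::date < current_date;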
The thing is, an RDBMS really is a superset of most of the other data models floating out there: entity-attribute-value, document, plain key-value, whatever. Your everyday ordinary RDBMS can do it, with transactions, durability, ease of administration, whatever. When you use a NoSQL data storage system you're explicitly trading off one or more of the capabilities of an RDBMS to get some other benefit. The one thing that Postgres isn't good at, as far as I know, is a graph-like data structure.
In addition to calling Mongo from within Postgres, there's a variety of other Foreign Data Wrappers, including ones that let you call MySQL, Oracle, Redis, and CouchDB. You can find the list of FDWs at http://pgxn.org/tag/foreign%20data%20wrapper/
It would be great to have the Mongo one listed there as well.
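The general shape of using any of these wrappers is the same; for the Mongo one it would look roughly like this (the exact option names depend on the wrapper's documentation, so treat these as placeholders):

    -- Install the wrapper and point it at a MongoDB instance.
    CREATE EXTENSION mongo_fdw;

    CREATE SERVER mongo_server
        FOREIGN DATA WRAPPER mongo_fdw
        OPTIONS (address '127.0.0.1', port '27017');

    -- Expose a collection as a foreign table; columns map to document fields.
    CREATE FOREIGN TABLE events (
        user_id    int,
        created_at timestamp
    )
    SERVER mongo_server
    OPTIONS (database 'analytics', collection 'events');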
We measured performance across several SQL queries, and found that it depends. Assuming that the working set fits into memory:
* If the query goes over a single table and doesn't have filter clauses, then the cost of converting BSON data to PostgreSQL tuples becomes the bottleneck.
* If the query goes over a single table, doesn't have filter clauses and touches only a few columns, then the cost of reading the data from MongoDB (over TCP) becomes the bottleneck.
* If the query joins several tables or has complex sub-selects, then the lack of accurate statistics on the foreign data pushes the planner toward bad execution plans, and that becomes the bottleneck (see the EXPLAIN sketch below).
For most queries though, we found that the query performance was reasonable. PostgreSQL processed around 200K-400K documents per second per CPU core.
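If you want to see which of these cases a given query falls into, looking at the plan PostgreSQL picks is a good start (here "events" stands for a hypothetical foreign table backed by a Mongo collection):

    -- EXPLAIN ANALYZE shows the chosen plan plus actual row counts and timing
    EXPLAIN ANALYZE
    SELECT user_id, count(*)
    FROM events
    WHERE created_at >= date '2013-01-01'
    GROUP BY user_id;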
This comes at the right time. While I was reading through the Postgres docs, a colleague of mine spent time doing the same for MongoDB. Our goal is to have, on each computer at the studio, a local database with most of our files from the game - 3d models, assets, etc. - and we are looking at what is going to suit us. He was able to model and load all our level maps into Mongo; we already have the asset information in SQLite, and all our shaders are cached there.
So this comes as a good surprise. I've been excited reading about Postgres, and now this seems like a very good fit for us.
Hi there - the collection doesn't need to be homogeneous; if the query references a field that's missing from a document, the wrapper interprets that value as null.
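So, for example, if only some documents carry a hypothetical discount field, a query over the foreign table simply returns NULL for the documents that lack it:

    -- documents without a "discount" field come back with discount IS NULL
    SELECT order_id, discount
    FROM orders
    WHERE discount IS NULL;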
* A big benefit of SQL is transactions, and I have a hard time seeing those handled efficiently on top of MongoDB.
* Additionally, joins cannot be consistent with the MongoDB model (shards, reading from secondaries, ...).
While this idea is charming, I don't see this reaching any serious production environment.
But it's cool; Citus implemented a somewhat working prototype that is useful for some use cases. Kudos! :)