
MongoDB 3.2: Now powered by Postgres - buffyoda
https://www.linkedin.com/pulse/mongodb-32-now-powered-postgresql-john-de-goes
======
preinheimer
tl;dr: There's a new "BI Connector" which will allow you to connect your
business intelligence tools to MongoDB using the Postgres wire protocol (which
many speak). This is somehow bad because Postgres is also popular, and maybe
people will use Postgres now. Also: the author (who has a competing connector)
knows the names of a lot of people at MongoDB.

~~~
mikey_p
My reading of the article was not that it uses the protocol, but that it
actually unpacks it into an actual Postgres database. This obviously looks bad
for MongoDB if their analytics literally uses a competing database.

[https://github.com/asya999/yam_fdw](https://github.com/asya999/yam_fdw)

~~~
buffyoda
If you're interested in how much computation gets pushed down into the MongoDB
database, versus how much gets pulled back into PostgreSQL, you can find some
examples here:

[http://slamdata.com/blog/2015/12/08/nosql-analytics-for-mongodb.html](http://slamdata.com/blog/2015/12/08/nosql-analytics-for-mongodb.html)

Warning: It's not pretty.

------
parenthephobia
tl;dr 2.0:

1) MongoDB Inc have made a "foreign data wrapper" for PostgreSQL which enables
MongoDB databases to be accessed from within PostgreSQL.

2) This makes data in MongoDB databases accessible to existing analytics
software for SQL databases, which is often made by companies with lots of
money.

3) The CTO of SlamData Inc, which makes analytics software for NoSQL
databases, thinks that MongoDB Inc shouldn't have done that.

tl;dr 3.0:

A company faces increased competition; isn't happy about it.

~~~
makomk
It enables MongoDB databases to be accessed from within PostgreSQL at a
massive performance penalty compared to just storing your data in Postgres in
the first place, because the particular kind of foreign data wrapper they're
using has limited ability to make use of MongoDB's query functionality and has
to literally load the entire contents of the database into Postgres for
anything non-trivial. Which means you're better off just using Postgres. This
is bad for MongoDB because their business model relies on people actually
using MongoDB rather than the competition.

------
onetwotree
Frankly, if I were tasked with integrating existing BI tools with MongoDB, I'd
immediately start looking at ways to "escape" the anemic Mongo ecosystem to
something a bit richer. A Postgres FDW seems like an excellent design.

Of course, I'm a bit of a Postgres partisan, and a Mongo refugee, but it still
seems like a solid engineering decision, and most of this guy's arguments seem
to hinge on "BUT POSTGRES IS TEH ENEMY!".

~~~
buffyoda
Well, you might start off down that path, but eventually you'd find that if
you try to execute analytics via PostgreSQL via FDW via Multicorn via MongoDB,
you're only able to push conjunctions of simple relational operators on
original (non-derived) fields in the source collection.

What that means is virtually any query will end up executing (via PostgreSQL
via FDW via Multicorn via MongoDB) by first pulling out all (!) the data from
all (!) source collections, relocating it to PostgreSQL, and then executing the
query there. In fact, these full collection scans might be repeated
multiple times, especially for nested data, crosses, and other types of
operations.

And then you'd decide that "solid engineering decision" wasn't so solid after
all. Then hopefully you'd quit MongoDB and go work on PostgreSQL full time.
;-)
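
To make the pushdown limit described above concrete, here is a minimal Python sketch. It is not Multicorn's code or the BI Connector's code: the `Qual` tuple is a hypothetical stand-in loosely modeled on the qualifier objects a Multicorn wrapper receives, and `split_quals`, `MONGO_OPS`, and `source_fields` are names invented for illustration. It assumes that only simple comparisons on fields that exist in the source collection can be turned into a MongoDB filter, and that everything else has to be evaluated in PostgreSQL after the rows are fetched.

```python
from collections import namedtuple

# Hypothetical stand-in for a pushed-down qualifier (assumption, not a real API).
Qual = namedtuple("Qual", ["field_name", "operator", "value"])

# Simple relational operators that have a direct MongoDB query operator.
MONGO_OPS = {
    "=": "$eq",
    "<>": "$ne",
    "<": "$lt",
    "<=": "$lte",
    ">": "$gt",
    ">=": "$gte",
}


def split_quals(quals, source_fields):
    """Split a conjunction of quals into (mongo_filter, residual).

    mongo_filter is a dict usable as a MongoDB query filter.
    residual holds the quals that must be checked in PostgreSQL.
    """
    clauses = []
    residual = []
    for q in quals:
        op = MONGO_OPS.get(q.operator)
        # Only simple operators on original (non-derived) fields are pushed.
        if op is None or q.field_name not in source_fields:
            residual.append(q)
        else:
            clauses.append({q.field_name: {op: q.value}})
    # An empty filter matches every document: a full collection scan.
    mongo_filter = {"$and": clauses} if clauses else {}
    return mongo_filter, residual
```

Under these assumptions, a query such as `age > 30 AND name LIKE 'A%'` pushes only the `age` comparison, and a query whose only condition is on a derived value pushes nothing, so the filter is `{}` and every document in the collection is transferred.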

------
ahachete
It's nothing new that PostgreSQL is a great tool for doing analytics, even
coming from MongoDB. I'm very happy that MongoDB took this route; it says a
lot about their capabilities in the non-OLTP world.

Having said that, I very biasedly say that there's a much better solution to
this connector, which doesn't flatten out the MongoDB data: it's called ToroDB
([https://github.com/torodb/torodb](https://github.com/torodb/torodb)).

ToroDB, open source, speaks the MongoDB protocol, transforms documents to
relational tables (without any kind of flattening, and without having to
define any schema) and stores data in a RDBMS. More precisely, PostgreSQL.

Current development version (repl branch) speaks the replication protocol, and
hence can replicate live from a MongoDB into PostgreSQL. No connector needed,
no flattening, no FDWs, nothing else. Just add a new "slave" (ToroDB) to your
replica set and you're good to go.

It goes even further: if you want pure data warehousing, ToroDB will soon
support GreenPlum. Some initial benchmarks
([http://www.slideshare.net/8kdata/torodb-scaling-postgresql-like-mongodb](http://www.slideshare.net/8kdata/torodb-scaling-postgresql-like-mongodb),
slide #42) show 25x-75x improvement between doing aggregate
queries in MongoDB and their equivalent queries in GreenPlum's distributed
SQL.

Now that MongoDB 3.2 ships with PostgreSQL "included", feel free to try
ToroDB. The original is always better :)

Note: I am a ToroDB developer.

~~~
buffyoda
I think ToroDB is super cool, and I wish your project the best of luck! You
can't go wrong building something on PostgreSQL. :)

That said, PostgreSQL FDW is NOT a great option for MongoDB analytics. Not
only is the data model so different that you lose the ability to answer many
types of questions, but Multicorn supports only basic pushdown (conjunctions
of simple relational operators on original columns).

What this means is that analytics via PostgreSQL via FDW via Multicorn via
MongoDB suffers from (a) very poor expressive power, and (b) ridiculously slow
performance, since nearly any type of query will require at least one full
table scan on all the source tables (in some cases, especially with arrays,
many more full table scans may be required for a single query!).

Better off just using ToroDB. Am I right? :)

~~~
ahachete
Thank you, John De Goes. I definitely agree. While PostgreSQL FDWs are a great
way of extending Postgres, I don't see them as a good fit for this use case.
Not only are a lot of pushdowns unsupported (although that is in the
process of being improved), but more importantly, as you mentioned, this
connector is going to impose a lot of full table scans for even the simplest
queries. I'm dying to benchmark this connector against ToroDB. But
unfortunately, the MongoDB proprietary license agreement explicitly forbids
any kind of benchmark. I guess they have reasons to do so ;)))))))

I cannot be objective in saying that you would be better off using ToroDB, but I
definitely think so.

I also want to congratulate you. I think Quasar and SlamData have gone very
far, and I'd encourage you to keep on pushing it. While this connector may or
may not adversely affect SlamData, there's always room for differentiation and
improvement. Good luck!

------
CurtMonash
Classic marketing pitch from a little company that wants to claim it's much
more significant than it is:

1. Claim a must-have set of requirements that ...
2. ... happen to match its product's feature set ...
3. ... but not its competitors'.

[http://slamdata.com/whitepapers/characteristics-of-nosql-analytics-systems/](http://slamdata.com/whitepapers/characteristics-of-nosql-analytics-systems/)
is presumably the core of the argument.

I tend not to pay attention to such claims until the company rephrases them
more honestly.

That said, a brief discussion of what is really happening is in
[http://www.dbms2.com/2015/09/10/mongodb-update/](http://www.dbms2.com/2015/09/10/mongodb-update/).
Would more be better? Sure.

------
bro-stick
The author tries to paint Mongo as an embarrassingly short-sighted, pseudo-
enterprise company that can't share its toys with others.

Mongo could refute this by demonstrating collaboration efforts and solutions,
with a solution marketplace similar to those of Atlassian and VMware. On the partner
side, cross-selling, cross promotions and collaborative sales/product
strategies can reduce conflict and wasted/duplicated/unaligned effort that can
lead to sour partner experiences.

------
bricss
The guru author should marry Postgres :)

------
anotherevan
MongoDB: the snapchat of databases.

~~~
doug1001
brilliant.

makes me wonder if snapchat uses mongo; i can't think of one more suited to
snapchat's unique selling point.

------
click170
Is this posted anywhere besides LinkedIn? Would like to read but am not
willing to give page views to LinkedIn.

~~~
aidos
That's really silly. LinkedIn produce some really interesting engineering
content.

EDIT: I'm not sure what Pulse is, but it looks like aggregated content.
Anyway, here's an article from the LinkedIn engineering team that's well
worth a read:
[https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying](https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)

~~~
bro-stick
Perhaps a "magazine" HR/engineering uses to promote inbound candidate flow and
knowledge sharing.

------
alexkavon
Looks like it's time to switch databases. BLOAT. RIP Mongo.

~~~
bro-stick
Or Mongo may need to refine collaboration with partners to sell more deployed
customer solutions... that's the only takeaway from this I see.

