
Call me maybe: MongoDB (2013) - cirwin
http://aphyr.com/posts/284-call-me-maybe-mongodb
======
garretraziel
I'm building a small application using Node.js and MongoDB and I'm planning to
host it on Openshift or Heroku. All that hate that MongoDB takes here on HN
makes me reconsider technologies I am using.

I will not have many relations in my database (model User, model Document,
User owns Document... and that's all) so I thought that NoSQL databases will
do. Plus, MongoDB lets me use GridFS - I'm planning to store pdf presentations
in it.

If I should drop MongoDB, what other technology should I use? Or should I fall
back to Postgres + ORM and manage my files in filesystem manually?

I don't want to start a flame war; I am looking for advice. I have considered
MongoDB to be "good enough" as GridFS lets me store my files without a hassle,
but after all that I read on the Internet, now I am not so sure.

~~~
threeseed
Most of the MongoDB hate comes from the PostgreSQL crowd.

It's actually a fine database for many domain models i.e. lots of nested data
and GridFS does work pretty well. That said don't use GridFS. Use something
like S3 and reference the files.

And if you are planning to use MongoDB then look at MongoLab/MongoHQ. Anyone
who says you should run your own database should NOT be listened to. Use a
hosted solution if you are starting off small. You don't want to be spending
your valuable time testing your backups (just one of the many operational
activities most people don't do).

~~~
jamesaguilar
> Most of the MongoDB hate comes from the PostgreSQL crowd.

Or anyone who doesn't like losing their customers' data.

------
StavrosK
Is there _any_ datastore in this series that behaved correctly in a partition?
I've seen ElasticSearch, Redis, Riak, Mongo, and all of them crapped their
pants.

~~~
nemothekid
Cassandra, Zookeeper and Kafka didn't shit the bed.

~~~
dbenhur
On Cassandra, some CQL collection operations behaved well (adding elements to
a set). Everything else Kyle tested there was demonstrated to lose data.

Kafka lost data all over the place under Jepsen; though Kyle offered great
respect to the team and expects them to deliver optionally configured safer
semantics (at a performance cost) in future releases.

Riak was totally solid when using CRDTs and turning off the insane LWW
default.
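
To illustrate the difference between a last-write-wins (LWW) merge and a set CRDT, here is a minimal sketch. It is not Riak's or Cassandra's implementation; the class names `LWWRegister` and `GSet` and the timestamps are hypothetical, chosen only to show why union-style merges keep writes from both sides of a partition while LWW drops one of them.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class LWWRegister:
    """Last-write-wins register: merge keeps only the newest-timestamped value."""
    value: FrozenSet[str]
    timestamp: int

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # The older write is discarded entirely, even if it was acknowledged.
        return self if self.timestamp >= other.timestamp else other


@dataclass(frozen=True)
class GSet:
    """Grow-only set CRDT: merge is set union, so no add is ever dropped."""
    items: FrozenSet[str] = frozenset()

    def add(self, item: str) -> "GSet":
        return GSet(self.items | {item})

    def merge(self, other: "GSet") -> "GSet":
        return GSet(self.items | other.items)


# Two sides of a partition each accept one write, then the partition heals.
lww_merged = LWWRegister(frozenset({"a"}), timestamp=1).merge(
    LWWRegister(frozenset({"b"}), timestamp=2)
)
gset_merged = GSet().add("a").merge(GSet().add("b"))

print(sorted(lww_merged.value))   # only the later write survives
print(sorted(gset_merged.items))  # both writes survive
```

With LWW the write of "a" is silently lost after the merge; with the grow-only set both elements are retained.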

------
olegp
For all the hate MongoDB gets, I have to say building queries in server side
JS using MongoDB JSON syntax rather than SQL style queries is the way to go.

Check out these two examples:

[https://github.com/olegp/stick-blog-pg/blob/master/lib/serve...](https://github.com/olegp/stick-blog-pg/blob/master/lib/server.js) - uses Postgres

[https://github.com/olegp/stick-blog/blob/master/lib/server.j...](https://github.com/olegp/stick-blog/blob/master/lib/server.js) - uses MongoDB - much easier to construct
dynamic queries using the JSON syntax
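
As a rough sketch of the point about dynamic queries (not taken from the linked repositories), the same optional filters can be built either as a MongoDB-style filter document or as a parameterized SQL string. The collection/table name `posts`, the field names, and both function names are hypothetical; `$gte` is the standard MongoDB comparison operator.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_post_filter(author: Optional[str] = None,
                      published: Optional[bool] = None,
                      min_year: Optional[int] = None) -> Dict[str, Any]:
    """Build a MongoDB-style filter document: each criterion is just a key."""
    query: Dict[str, Any] = {}
    if author is not None:
        query["author"] = author
    if published is not None:
        query["published"] = published
    if min_year is not None:
        query["year"] = {"$gte": min_year}
    return query


def build_post_sql(author: Optional[str] = None,
                   published: Optional[bool] = None,
                   min_year: Optional[int] = None) -> Tuple[str, List[Any]]:
    """Build equivalent parameterized SQL: clauses and parameters are tracked separately."""
    clauses: List[str] = []
    params: List[Any] = []
    if author is not None:
        clauses.append("author = ?")
        params.append(author)
    if published is not None:
        clauses.append("published = ?")
        params.append(published)
    if min_year is not None:
        clauses.append("year >= ?")
        params.append(min_year)
    sql = "SELECT * FROM posts"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


mongo_query = build_post_filter(author="olegp", min_year=2013)
sql_query, sql_params = build_post_sql(author="olegp", min_year=2013)
print(mongo_query)
print(sql_query, sql_params)
```

The filter document is plain data that can be merged or passed around before it reaches the driver, whereas the SQL version has to keep the clause text and the parameter list in step.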

There are of course plenty of other things a relational DB like Postgres has
going for it, so I've been experimenting with having a MongoDB interface to
Postgres with data stored using the JSON datatype. It's now feature complete
and passes all unit tests with read performance exceeding that of Mongo in
some benchmarks:

[https://github.com/olegp/pg-mongo](https://github.com/olegp/pg-mongo)

~~~
buster
It's probably nice if you only write JS (say, you write a node.js app). To me
SQL is still the best query language we have. It's usable amongst a wide range
of database servers, from tiny embedded sqlite to oracle. It's _easy_.

One area where NoSQL falls short is the lack of a common query language, imo. You
chose MongoDB in your project and want to migrate to another database? Not
that easy. Switch from MySQL to Postgres? Probably not a drop-in replacement,
but much easier to do.

~~~
twic
> To me SQL is still the best query language we have.

OQL! Okay, so nobody ever implemented OQL. But there are OQL-inspired query
languages in production which i prefer to SQL for routine use, such as JPQL.

One reason for that preference is the ability to join through foreign keys
with a syntax which resembles property access on objects:

    
    
      select e
      from Employee e
      where e.department.head.manager.level = 'VP'
    

This beats the equivalent SQL:

    
    
      select e.*
      from Employee e
      join Department d using (department_id)
      join Employee h on d.head_id = h.employee_id
      join Employee m on h.manager_id = m.employee_id
      where m.level = 'VP'
    

Admittedly, whilst JPQL is nice for this sort of routine fetch-and-filter
stuff, it lacks the more powerful features of SQL like window functions,
recursive common table expressions, etc. I don't often need those, but when i
do, it would be rather painful to do without them.
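
For anyone who wants to try the explicit join chain, here is a small sketch that runs the SQL version against an in-memory SQLite database. Only the table and column names used in the query come from the example above; the `name` column, the rows, and the choice of SQLite are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Department (department_id integer primary key, head_id integer);
    create table Employee (
        employee_id   integer primary key,
        name          text,
        level         text,
        department_id integer,
        manager_id    integer
    );
    -- Department 10 is headed by Hank, whose manager Vera is a VP.
    insert into Employee values (1, 'Vera', 'VP',       null, null);
    insert into Employee values (2, 'Hank', 'Director', 10,   1);
    insert into Employee values (3, 'Eve',  'Engineer', 10,   2);
    -- Department 20 is headed by Hal, whose manager Max is not a VP.
    insert into Employee values (4, 'Max',  'Manager',  null, null);
    insert into Employee values (5, 'Hal',  'Director', 20,   4);
    insert into Employee values (6, 'Ed',   'Engineer', 20,   5);
    insert into Department values (10, 2);
    insert into Department values (20, 5);
""")

# Same join chain as the SQL above, selecting just the name for readability.
rows = conn.execute("""
    select e.name
    from Employee e
    join Department d using (department_id)
    join Employee h on d.head_id = h.employee_id
    join Employee m on h.manager_id = m.employee_id
    where m.level = 'VP'
    order by e.employee_id
""").fetchall()

names = [row[0] for row in rows]
print(names)  # employees whose department head reports to a VP
```

Each hop in the JPQL path expression (`department`, `head`, `manager`) corresponds to one explicit join here.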

------
_JamesA_
Does anyone have experience/comparisons with OrientDB?
[http://www.orientechnologies.com/orientdb/](http://www.orientechnologies.com/orientdb/)

It seems to be much lesser known but it ticks all the right boxes compared to
other NoSQL datastores.

------
Gonzih
With the rise of RethinkDB, it would be lovely to see a similar post on it.

~~~
hopeless
I've looked into RethinkDB and I really like it but… you have to migrate your
data between each 1.x -> 1.y release which might be trivial early on but
impossible at a larger scale :-/

[http://rethinkdb.com/stability/](http://rethinkdb.com/stability/)

~~~
neumino
This will not be needed for the next releases (if everything goes as planned)
[https://github.com/rethinkdb/rethinkdb/issues/1010#issuecomm...](https://github.com/rethinkdb/rethinkdb/issues/1010#issuecomment-47996409)

~~~
hopeless
Wow! That's a huge leap forward. I can look seriously at RethinkDB again for a
new project. Is there a public roadmap where I could have seen that coming?

~~~
fwr
You missed an opportunity to say that you need to rethink using it.

------
ulisesrmzroche
This is ancient stuff though, is this still relevant today?

~~~
leif
Yes, there are still problems with the election protocol, e.g. [1]. The right
kind of network partitions can cause multiple primaries to stay up
indefinitely, accepting writes on both sides of the partition, which will
eventually be rolled back. There is another problem with the election protocol
that allows writes acknowledged by a majority of machines to be rolled back
after an election.

Both of these problems can be fixed by using something like Raft[2] or Paxos
for elections, rather than the ad hoc mechanisms used today.
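
The core rule that prevents two primaries in Raft-style elections can be sketched as follows. This is a simplified illustration and not MongoDB's, TokuMX's, or a complete Raft implementation: it leaves out log comparison, timeouts, and heartbeats, and the `Node` and `run_election` names are hypothetical. It only shows that a candidate needs votes from a strict majority of the whole cluster and that each node votes at most once per term.

```python
from typing import Dict, Iterable


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.current_term = 0
        self.voted_for: Dict[int, str] = {}  # term -> candidate this node voted for

    def request_vote(self, term: int, candidate: str) -> bool:
        """Grant at most one vote per term and reject stale terms."""
        if term < self.current_term:
            return False
        self.current_term = term
        if term not in self.voted_for:
            self.voted_for[term] = candidate
        return self.voted_for[term] == candidate


def run_election(candidate: Node, reachable: Iterable[Node],
                 cluster_size: int, term: int) -> bool:
    """The candidate wins only with votes from a strict majority of the full cluster."""
    votes = sum(1 for node in reachable if node.request_vote(term, candidate.name))
    return votes > cluster_size // 2


a, b, c, d, e = (Node(n) for n in "abcde")

# A partition splits the five nodes into {a, b} and {c, d, e}.
minority_wins = run_election(a, [a, b], cluster_size=5, term=1)
majority_wins = run_election(c, [c, d, e], cluster_size=5, term=1)

# After the partition heals, a retry by `a` in the same term still fails,
# because c, d and e have already voted for c in term 1.
retry_wins = run_election(a, [a, b, c, d, e], cluster_size=5, term=1)

print(minority_wins, majority_wins, retry_wins)
```

Because any two majorities of the same cluster overlap in at least one node, and that node votes only once per term, two candidates cannot both win the same term.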

In TokuMX[3], we're currently working on replacing the election algorithm with
something similar to Raft, that will eliminate these sources of data loss.
We've heard that MongoDB is also working on fixing replication, but we don't
know what their exact plans are (they have a bigger challenge since they need
to stay compatible with their existing replication algorithms, which use
timestamps as transaction identifiers) or whether these fixes will end up in
2.8 or in a later version.

[1]:
[https://jira.mongodb.org/browse/SERVER-9848](https://jira.mongodb.org/browse/SERVER-9848)

[2]:
[https://ramcloud.stanford.edu/wiki/download/attachments/1137...](https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf)

[3]: [http://docs.tokutek.com/tokumx](http://docs.tokutek.com/tokumx)

~~~
ericingram
Once again, a TokuMX engineer steps up to explain the issue and offer a potential
solution. I can't help but wonder why MongoDB engineers aren't doing this. But
no matter, just glad we're using TokuMX.

~~~
nevi-me
I think if I was working from someone's codebase, with about 80-90% of what I
need already built-in, I would have the time and resources to make
improvements on my fork, and shine glorious over the software making the base
of my product.

I wanted to try TokuMX months ago, but when I learnt that the version at the
time was based on Mongo 2.2 I shied away from it, because I need GeoJSON
capabilities. I remember that with 2.6 one of TokuTek's engineers said that
they needed to look at Mongo's code and start playing catch-up, I don't know
if they've done that so far.

What will Mongo 2.8 mean for TokuMX? We're seeing document-level locking,
possible B-Tree improvements (I presume Toku's R-Tree/Fractals [can't remember
which they use] will still be superior), possible transactions (although
what's on JIRA hasn't convinced me so far) and a few other improvements and
Performance Boosting Things. So to what extent will Toku remain relevant if
they don't keep up to date with Mongo, because in my case, using their
versions based on 2.2, their ideology of being 'a drop-in replacement for
MongoDB' doesn't work.

I'll go to their Github page and try to see whether they've merged the 2.6
codebase to their latest versions though :) EDIT: from looking at their
release changelogs, as of October last year, they were in parity with Mongo
2.4, with the exception of geo-indices and full-text search, and 2.6 is still
an open milestone.

It kind of feels like the Joyent vs Strongloop thing on Node.js, but I wonder
if TokuTek employees push bug-fixes upstream to Mongo, or whether they just
fix them on TokuMX and use that as a selling-point; again with this I'll have
to do some digging to inform my opinion, but I'd appreciate if someone who
knows could clarify it.

~~~
zardosht
Another engineer at Tokutek here. As you see, we are up to 2.4, and have been
investigating 2.6 and Geo. With all possible features, whether they be from
MongoDB 2.6 or things we innovate on our own like partitioned collections, we
prioritize and address them based on customer and user feedback.

Also, 2.6 is not an all or nothing proposition that needs to be done in one
release. Features with the most demand (whether it be the new write commands
or aggregation framework improvements) will be done before others. We've done
this before. When we released 1.0 that was based on 2.2, we also released hash
based sharding with it which was a 2.4 feature. We did so because users
demanded it.

As for pushing bug fixes upstream, we file bugs when we see them. Our VP of
engineering was a winner in the MongoDB 2.6 bug hunt with SERVER-12878.
SERVER-9848 and SERVER-14382 are among the bugs I've filed.

~~~
nevi-me
Thanks for the response, I read a post on the mongo-user group, and that's
what I noticed, that a number of features are ported as and when necessary.
Don't read what I say in a very negative sense, because I'm mostly curious,
and it's my opinion that sometimes the little that we (I) get exposed to
regarding TokuMX specifically is that it's superior to Mongo, that it's a
"choose us or lose out" thing, but that happens when one doesn't follow a
certain topic, but only sees it being mentioned here and there (understandable
since Mongo has been the subject of "my start-up failed, and I blame it on
Mongo; so burn Mongo" kind of discussions).

One more question if you don't mind: since MongoDB will support various
storage engines from 2.8, including Tokutek's storage engine (can't remember
its name); notwithstanding other innovations on TokuMX, would switching from
mmap to Tokutek's storage engine mean that one ends up with Mongo having geo-
indices and other bells, while having TokuMX's main feature?

~~~
zardosht
Your last question is a bit loaded with a bunch of "ifs", so let's unwind it.
I don't know what MongoDB will "support" as far as other engines go. But
assuming we, Tokutek, release something that we support that is our engine
plugged into 2.8 using MongoDB's storage engine plugin, then according to the
design we heard about at MongoDBWorld, that product will be what you think it
is: Mongo with geo and "other bells", and TokuMX's compression + write
performance.

But 2.8 is a bit away and the storage engine API is a very fresh development.
I don't think anyone is in a position to be able to really guarantee what it
would look like and how TokuFT ([https://github.com/Tokutek/ft-index/](https://github.com/Tokutek/ft-index/)) will plug into it. I definitely
cannot make any promises.

If you are interested in TokuMX + some missing features from MongoDB (sounds
like geo), and don't mind discussing your needs and use cases with our sales
guys, please give us feedback at
[http://www.tokutek.com/contact/](http://www.tokutek.com/contact/). As I
mentioned previously, user feedback drives what we do, so at the very least,
you can provide some additional data points.

------
onedev
People still use MongoDB in 2014?

~~~
ulisesrmzroche
Yup. Super big names too, like Verizon, Forbes, and the British Government.
So...

~~~
gaius
I'll wager all of those have more COBOL than they do any other technology.
Probably 1000x more than they're running on MongoDB.

