
Mongodb – not evil, just misunderstood - Kristories
http://siddharth-ravichandran.com/2013/12/04/mongodb-not-evil-just-misunderstood/
======
yummyfajitas
The "no migrations" claim is a lie. Migrations still exist, they just happen
at the time of reading:

    
    
        def parseMongoRow(jsonBlob):
            # The "migration" happens at read time: every historical document
            # shape needs its own branch (dates assumed to be ISO strings).
            x = None
            if jsonBlob['date'] < '2012-12-02':
                x = jsonBlob['foo']['bar']
            elif jsonBlob['date'] < '2013-06-01':
                x = mergeFields(jsonBlob['bar'], jsonBlob['baz'])
            else:
                x = jsonBlob['x']
            ...
    

For expiring fields in a redis database, this isn't a big deal. Your code
stinks between [time of migration, time of migration + ttl]. For a permanent
datastore, ouch.

I also do not understand why they are using MongoDB for this. They describe a
schema involving shipments, shippoints and orders, and one of these things
does not occur without the other (an attempt at justifying the document based
data model?). I.e., something like this:

    
    
        CREATE TABLE orders (...);
    
        CREATE TABLE shippoints (...);
    
        CREATE TABLE shipments (
          order_id BIGINT REFERENCES orders(id) NOT NULL,
          ship_from_id BIGINT REFERENCES shippoints(id) NOT NULL,
          ship_to_id BIGINT REFERENCES shippoints(id) NOT NULL
        );
    

You can't have a shipment without a shippoint. Black magic!

They have several hundred shipments/day, and let's be generous and assume
there are 100 updates/notes per shipment. We are talking maybe 50,000
inserts/day (omfg big data, invest in a 1TB hard disk!) and it sounds like
data that's considerably more important than ad impressions or pageviews.
(Well actually _it fits in ram_, so maybe the 1TB hard disk on a dedicated
server is overkill.)

Also, consider the daily aggregate generator in 3 lines of SQL rather than 236
lines of JavaScript:

    
    
        SELECT date, carrier, zone, COUNT(id), SUM(price)
              FROM shipments
              GROUP BY date, carrier, zone;
    

I don't get it. How is mongodb even remotely the right tool for this job?

~~~
djrobstep
Mongo also stores the same column/attribute names with every single row. 50000
inserts? 50000 (usually identical) sets of attribute names stored (compared to
once for SQL). And to my knowledge they are stored uncompressed. For a Big
Data database, it doesn't seem very good at efficiently storing big data.

~~~
jaimebuelta
This could be a problem, for sure. There are some tools on top of MongoDB that
try to reduce it; for example, mongoengine will allow you to define a
"compressed name" to store in the DB, meaning that you'll see 'timestamp' in
your codebase, but it will be stored as 't'.
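
For instance, with mongoengine that mapping is declared on the field itself (a
sketch; the document and field names here are hypothetical):

    
    
        from mongoengine import Document, DateTimeField, StringField
    
        class Shipment(Document):
            # Attribute is 'timestamp' in code, but stored as the key 't'
            # in every document, shaving a few bytes off each row.
            timestamp = DateTimeField(db_field='t')
            carrier = StringField(db_field='c')
    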

------
specialp
I have been using MongoDB since the beta days, but after many replica
set/sharding bugs springing up I am less inclined to use it. That was the
major feature we used it for. Addressing this article:

-Yes, you can make aggregate results using map-reduce (in JavaScript, on a single core, idempotent) and you can store the aggregates as separate documents. I do not see what is so magical about this.

-It is true that you do not have a set schema and thus no migrations, but you do have to have a schema in mind for nearly everything, as something has to make sense of each field.

If you add fields to a document and you already have documents in there, you
are going to have to add that data to them unless you want it to be nil. If
you rename a field you have to iterate over all documents and rename it. If
you remove a field you have to remove it from all existing documents if you do
not want that data around anymore. Certainly you do not HAVE to do any of
this, but if you are fetching meaningful data from a field you need to.
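
In practice those script-based "migrations" look something like this in
pymongo (a rough sketch; the collection and field names are hypothetical):

    
    
        from pymongo import MongoClient
    
        db = MongoClient().shipping  # hypothetical database name
    
        # Backfill a newly added field so old documents aren't nil/missing
        db.shipments.update_many({'carrier': {'$exists': False}},
                                 {'$set': {'carrier': 'unknown'}})
    
        # Rename a field across every existing document
        db.shipments.update_many({}, {'$rename': {'ship_pt': 'ship_point'}})
    
        # Remove a field you no longer want to keep around
        db.shipments.update_many({}, {'$unset': {'legacy_notes': ''}})
    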

And you need to index on whatever you want to search on or the performance is
terrible, as Mongo is not very good on disk, so you have to keep that in mind
too. I think most people were attracted to Mongo due to "no migrations" and no
schema, but when you work with it, in almost any use case you find yourself
using an unenforced schema and doing script-based "migrations". The recent
Mongo criticism is that it markets itself as an all-purpose DB, but in reality
document stores are not that.
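
Creating the index itself is the easy part; the catch is simply that you must
remember to do it for every query pattern (a pymongo sketch with hypothetical
field names):

    
    
        import pymongo
    
        db = pymongo.MongoClient().shipping  # hypothetical database name
    
        # Without an index covering the query, Mongo falls back to a
        # collection scan, which is where the terrible performance comes from.
        db.shipments.create_index([('date', pymongo.ASCENDING),
                                   ('carrier', pymongo.ASCENDING)])
    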

------
rdtsc
> How can the default writes be fire and forget? It just made sense, given all
> the information to configure it the way you prefer I would always go with
> this approach

Yes, we should bring unacknowledged writes back as a default.

(Often a technology's greatest detractors are its own fans; they are digging
its grave without even realizing it.)

------
nailer
Why I (most recently) dislike Mongo:

The current _stable_ _official_ node driver silently wraps all exceptions,
including in unrelated code launched from DB callbacks. This is acknowledged
by MongoDB inc.

[https://groups.google.com/forum/#!topic/node-mongodb-native/...](https://groups.google.com/forum/#!topic/node-mongodb-native/AZewJaX4YuE)

------
p4lindromica
I think this article misunderstands the implications of the fire-and-forget
write concern. Fire and forget means that mongo accepts a write without
acknowledgement, so it may never be persisted to disk. It appears the author
uses fire and forget for notes, which are acceptable to lose, and for tracking
information, which is not acceptable to lose.

The author states the only downside of fire and forget is that data may not be
available on subsequent queries. While this may be true, this should not be
what you are worrying about. The downside is that when your primary crashes,
or becomes overloaded to the point where your slaves are all lagging
significantly and you need to switch primaries, the data is inconsistent and
you will lose that write.
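
For reference, this is literally one connection option in the drivers (a
pymongo sketch; the connection string is just a placeholder):

    
    
        from pymongo import MongoClient
    
        # w=0: "fire and forget" -- the driver does not wait for any
        # acknowledgement, so a lost write raises no error at all.
        unsafe = MongoClient('mongodb://localhost:27017/', w=0)
    
        # w='majority', j=True: wait until a majority of replica-set members
        # have journaled the write. Slower, but it survives a failover.
        safe = MongoClient('mongodb://localhost:27017/', w='majority', j=True)
    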

------
djur
MongoDB isn't necessarily bad, but there are a number of valid reasons it's
received a lot of bad publicity.

It was a high-profile representative of the NoSQL trend, and that trend is
experiencing a backlash. Since it has a relatively easy learning curve and a
JavaScript-centric interface (JS also being trendy), it got a lot more uptake
by startups and inexperienced developers wanting to learn about NoSQL (or move
away from the supposedly moribund relational model).

The ad-hoc querying makes it tempting to build object mappers around Mongo and
write the kind of code you would with an ORM. A lot of NoSQL refugees did
exactly that. You can get pretty far doing that before the caveats with that
approach kick in. It's much harder to fall into that trap with something like
Riak.

And then MongoDB itself has had some poorly chosen defaults and design
decisions. These wouldn't be so high-profile if it wasn't for the other
issues.

------
coldtea
> _The advantages of schemaless documents are priceless. Not having to migrate
> is just one of the perks._

Was the first sign that this person doesn't know what he is talking about...

Of course you still have to migrate. Either your DB or your code, because you
can't handle a change in how you store stuff without one or the other.

"No migrations" is true only for the most trivial of changes, and only if
you're willing to handle special cases in your code.

~~~
Pxtl
I'd wager that the vast majority of schema changes are adding new columns or
new tables-as-child-relations, so to be fair those are free in a schemaless
DB. So the "trivial" case is actually pretty common.

Honestly, I think the difference between schemaless and SQL corresponds very
closely to the difference between statically and dynamically typed languages.
The schemaless approach provides faster iteration and the ability to simply
express things that would require elaborate language features in a statically
typed language. The difference is that in a DB you have to support the data
created by all your previous versions, so a DB requires far more discipline
than application code.

I can see the appeal, but I'm not sure that it's worth it.

------
codex
For every positive article on MongoDB there are ten which are negative. Now
that's a strong brand.

~~~
davidw
Yep - when articles about something are prefaced with things like "not really
evil" or "not as bad as it's made out to be", it's a pretty sure sign that
your brand is tarnished.

------
tilsammans
I guess I am old-school, but you are listing a bunch of reasons why MongoDB is
not evil, yet each and every one of these reasons turns out to be extremely
risky business, none of which applies to relational datastores. The thing I
took away from your post is that had you used a 20-year-old relational
datastore you would have had none of these issues.

> The advantages of schemaless documents are priceless. Not having to migrate
> is just one of the perks. Our schemas were largely in the form of Orders
> (having many) Shipments (going_from) ShipPoint (to) ShipPoint

You say priceless. I don't think it means what you think it means. A migration
is costly but also pretty rare. I migrate PostgreSQL tables with 100k+ rows as
a matter of routine; it's over before you know it. The schema you are using
(orders have many shipments going from point to point) is easily expressed in
a relational schema and, once defined, would hardly ever need to change, if at
all, during the lifetime of the application. So what if I need to add a column
here or there? It won't matter at all. Do you have more than 100 million
documents in MongoDB? I guess you don't. Even if you do, relational has that
covered too.

> This doesn’t always have to be the case, though it significantly contributes
> to Mongodb’s fast writes.

What you are saying is that I need to change MongoDB's defaults in order to
make it safe. Relational databases are safe out of the box, no change necessary.

> We add a lot of Notes to each shipment [...] it doesn’t critically affect
> the business workflows of the application.

Say _what_?

You're fine with data, even notes, being lost? That is completely acceptable
to you? I guess this is what shocks me most. You kids think it's normal to
lose data, and consider storing a note to be optional or something. It baffles
me. If a note is optional, why have it in the first place?

> They do but since most of the stuff is memory mapped

Translates to: you need to have your data in RAM. This does not scale at all.
It doesn't even begin to scale to the level where MySQL was. TEN years ago.

> Here is a simplified snapshot

What follows is a class that is 236 lines long. Two hundred and thirty-six
lines long. Dear sir, if this is your simplified code I fear what your actual
production code looks like. If you committed that to one of my repos we would
have a very serious talk. Also you would do this exactly once during your
career at my company.

> I haven’t even touched upon the replication and sharding features that
> Mongodb offers which I will reserve for another post.

Which every relational store also offers.

> To summarise I feel Mongodb is awesome

Why is it awesome? You have only shown me why it is horrible. I have seen
nothing that is awesome. Optional data persistence, huge RAM requirements,
complex application-level code to deal with reports: this is all stuff that
you can do better, faster and more reliably with a relational solution.

~~~
raverbashing
"A migration is costly but also pretty rare. I migrate PostgreSQL with 100k+
rows as a matter of routine, it's over before you know it."

Not if your application is still in development.

100k+ rows? Not complicated. Try with 23M rows.

Nobody would risk migrating that table.

~~~
acdha
How does this change with MongoDB? The question of development support applies
equally to both, but it's usually _MUCH_ safer to add a SQL column (default
null, etc.) than to dive into a thicket of app-specific JS.

This has been happening for decades in the RDBMS world – even the
ultra-conservative Oracle admins I've worked with were willing to come out of
in-place retirement long enough to do something like that.

~~~
raverbashing
Changing code is one thing, changing the DB is another. A deployment that
changes only the code is simpler than one that changes the DB.

How many DBs do you have? Testing (usually local), staging, production? For
how many sites?

It works the same, if row['new_field'] do_something() else do_something_else()
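
Something like this (a Python sketch; the field and helper names are
hypothetical):

    
    
        def handle(row):
            # Documents written before 'new_field' existed simply lack the key,
            # so the read path branches on its presence instead of migrating.
            if 'new_field' in row:
                do_something(row['new_field'])
            else:
                do_something_else()
    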

You have unit tests to make sure it works.

And if it was so safe and easy to do, PostgreSQL wouldn't have added the json
field.

~~~
acdha
> A deployment that changes only the code is simpler than one that changes the
> DB.

In your experience, perhaps, but that's reversed in many other places.

As for everything else, your point is only accurate if you assume that
migrations are done by hand. If you use a migration library it's impossible to
forget to apply a migration to a database, so there's no problem working with
copies or even forks of databases.
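
With something like Alembic, for example, the column addition discussed above
becomes a small versioned script rather than a hand-run statement (a sketch;
revision identifiers omitted, table and column names hypothetical):

    
    
        from alembic import op
        import sqlalchemy as sa
    
        def upgrade():
            # Adding a nullable column is cheap and leaves existing rows valid
            op.add_column('shipments', sa.Column('notes', sa.Text(), nullable=True))
    
        def downgrade():
            op.drop_column('shipments', 'notes')
    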

> And if it was so safer and easy to do it PostgreSQL wouldn't have added the
> json field.

You're implying causation incorrectly: hstore and JSON are useful for cases
where you explicitly do not want schema enforcement or can't afford the
performance impact of normalization. This is not saying that migrations are
hard, merely that not all problems have the same best solution.

------
larsmak
About a year ago I implemented a sub-system relying heavily on MongoDB. The
load is not that immense, a couple of hundred requests per minute. The dataset
is large, however: several hundred GB, spread over millions of documents.
Also, updates happen in batches during the night, while reads are happening
all the time. I have not had to touch the system since it was put in
production over a year ago - it just runs.

MongoDB is a tool; understand its strengths and limitations and it will serve
you well. We achieved great performance for our use case through correct
schema design / partitioning of data, and sane use of indexes - which are
excellent in MongoDB. If you need to scale large, you need to store the data
in such a way that it does not require many resources to fetch it, i.e. you
must store the data according to your read requirements. This is even more
true in "key-value" systems like Cassandra, which are more limited in how you
can store data. MongoDB is very flexible, so it's a lot easier to shoot
yourself in the foot.

------
camus2
Mongodb is not evil, it was just sold as something it is not.

10gen were good at marketing, but businesses require results, not marketing.

MongoDB solves little problems.

------
smagch
You may want to take a look at the "MongoDB Gotchas" discussion.

[https://news.ycombinator.com/item?id=4745067](https://news.ycombinator.com/item?id=4745067)

~~~
koffiezet
Looking at that list, I would conclude:

"MongoDB Gotchas and How To Avoid Them: Don't use MongoDB"

It doesn't inspire much confidence in reliability to be honest...

------
shaneofalltrad
I have seen a lot of good use and success stories with MongoDB as well. This
negative reaction lately makes me want to give it a shot and see why there is
so much fear.

~~~
crassus
Nobody denies that it's an attractive API, but the problems I've seen are in
companies at scale. You won't run into insurmountable problems at first, only
later when performance matters.

~~~
goldenkey
I actually have first-hand experience from a small data set of merely 37,000
records of a couple of fields each. I needed to run a batch update to
calculate a property of each document. The property is dependent upon a count
of other similar documents. It turned out to be so unbearably slow that there
was no way to do the batch update without waiting hours. I searched long and
hard for a solution to the write/read locks but there is none. So this is my
holy grail to share, and if you don't use Mongo... more power to you.

I ended up figuring out a workaround, but it left a bad taste in my mouth. My
workaround was to create a temporary collection and insert all the previous
records, with their Category attribute updated, in a single .insert call.

I am more than sure that an SQL UPDATE query using a subselect for the
COUNT(*) would probably have executed in less than a couple of seconds. That's
what's so sad about MongoDB. And I even had everything indexed; it made almost
no difference. The write/read locks are _murder_.

    
    
        houses.db.eval(function(){
            const tmp = db[Math.random().toString(36).slice(2)];
            tmp.insert(
                db.houses.find().map(function(e){
                    e.Block = db.houses.count({
                        Street: e.Street,
                        Sector: {$lte: e.Sector},
                        Quadrant: 1
                    });
                    return e;
                })
            );
            tmp.renameCollection("houses", true);
        }, {nolock: true}, function(err){
            dbCallback(err);
            console.log("Done!");
            process.exit();
        });
    

Eh.
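
For comparison, roughly the same batch update through pymongo's bulk API might
look like this (just a sketch; the per-document counts would still be the
expensive part):

    
    
        from pymongo import MongoClient, UpdateOne
    
        db = MongoClient().test  # hypothetical database name
    
        ops = []
        for house in db.houses.find():
            block = db.houses.count_documents({
                'Street': house['Street'],
                'Sector': {'$lte': house['Sector']},
                'Quadrant': 1,
            })
            ops.append(UpdateOne({'_id': house['_id']},
                                 {'$set': {'Block': block}}))
    
        # One bulk round trip instead of tens of thousands of update calls
        if ops:
            db.houses.bulk_write(ops, ordered=False)
    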

~~~
lennel
that is a paragon of maintainability if I ever saw one.

~~~
goldenkey
I hope that is sarcasm :-)

------
Pitarou
Disappointing. I want to like MongoDB, but if this is the best a MongoDB
apologist can come up with, I'll stick to SQL with a decent O-R mapping.

~~~
pkolaczk
Fortunately MongoDB is not the end of NoSQL. Some other NoSQL datastores have
huge advantages over relational database systems, especially in terms of
reliability, scalability and performance-to-price ratio.

~~~
poseid
what do you think of [http://www.arangodb.org/](http://www.arangodb.org/) ?

------
smegel
> Mongodb expects that your working set fits into RAM along with the indexes
> for your database.

I don't really get this. Surely any database system is going to be faster if
all frequently updated/accessed pages fit into ram...what makes mongo special
in this regard? Why does it degrade so badly when it has to access disk
(beyond the obvious)?

~~~
jaimebuelta
The problem with this in MongoDB in particular is (I think) due to the fact
that it tries to work transparently in memory, leaving all the internals of
moving things between disk and memory to the OS. Therefore, the DB can be
hitting disk without knowing it, and without being able to apply any
mitigation strategy. Maybe other DBs can deal with this in a more intelligent
way, degrading more gracefully.

But yes, this is a problem on any DB. The moment it hits disk, performance is
terrible.

~~~
acdha
> But yes, this is a problem on any DB. The moment it hits disk, performance
> is terrible.

This is an over-simplification for any mature database unless all you care
about are massive, completely random read workloads[1]. RAM will obviously be
faster but there's a difference between terrible and performing at the level
of the underlying disk subsystem. With a decent system, commits per second
should track the underlying storage array's IOPS, large read / write traffic
should be capable of approaching disk bandwidth, etc.

Where things do get bad is when the database is doing more random I/O than
required by the workload – if storage is fragmented, it's playing sim-MySQL
and creating lots of temporary tables, etc. you will see performance which is
pathologically worse. This is a bug and should be fixed.

1. If writes aren't disk I/O limited, you should expect data loss because
it's lying about durability.

------
nasalgoat
After my experience with MongoDB at scale - I was running one of the largest
MongoDB installations in the world according to 10gen - I have since run into
the arms of postgres and wish to never repeat the horrors I experienced over
there.

I think the main issue is that people want something to be simple, and now
that I'm dealing with the somewhat esoteric and opaquely documented postgres,
I can understand that feeling. Native replication and auto-failover (via
pgpool) are a bit of a black box on postgres, but under MongoDB they were
fairly simple.

The problem is, it's complicated for a reason, and that reason is scale. What
took me over 100 masters in MongoDB will only need two postgres boxes, so it's
worth dealing with the Oracle of Open Source to make it happen.

------
poseid
Has anyone here looked at the discussion on ArangoDB a bit further up -
[https://news.ycombinator.com/item?id=6859767](https://news.ycombinator.com/item?id=6859767)
\- also a schemaless, but compressible, document datastore?

