
Think before you Mongo - sundip
http://blog.runnable.com/post/149000201856/think-before-you-mongo
======
ta0967
like yesterday's submission of "Why you should never use MongoDB (2013)"
([https://news.ycombinator.com/item?id=12290739](https://news.ycombinator.com/item?id=12290739)),
this is all just plain sad because it was totally unnecessary: the failure of
hierarchical databases was obvious in the 1960s and was the background of
Edgar F. Codd's work on the relational model.

unfortunately, this industry is dominated by cocky PFYs (of any age) convinced
they don't need to study the history of their field and, as a result, don't even
recognize they're repeating 50-year-old mistakes. sure, SQL has fallen so
short of the promise of the relational model it's not even funny, but don't
conflate the model with the query language, folks!

~~~
catnaroek
The relational model _does_ have deficiencies, but the right way to address
them is with _more_ powerful schemas, not less. See: “categorical databases”.

(Think about it this way: SQL is Java. NoSQL is your typical extremely
forgiving dynamic language. We need a database equivalent of ML and Haskell.)

NoSQL is simply the result of not wanting to think about the logical structure
of data. Plain intellectual laziness.

~~~
jimbokun
"NoSQL is simply the result of not wanting to think about the logical
structure of data."

NoSQL was an attempt to scale by sacrificing some of the capabilities of the
relational model. Key value stores scale great, at the cost of having almost
no query capabilities to speak of.

Now, some developers may adopt NoSQL due to the ease of getting a new project
started. But I don't think that was the main motivation of the developers of
the major NoSQL databases.

(Although NoSQL is so broad I'm sure there are counterexamples.)

~~~
catnaroek
> NoSQL was an attempt to scale by sacrificing some of the capabilities of the
> relational model.

You can give up global consistency without sacrificing local (single-node)
consistency. And normalization isn't an all-or-nothing proposition: you can
select the kind of schema that best fits your needs. Unlike the case with
NoSQL, which just says “lalala... I can't hear you” whenever you bring up
consistency.

> Key value stores scale great, at the cost of having almost no query
> capabilities to speak of.

Far more worrisome is the loss of data integrity guarantees. It's okay to let
me selectively disable these guarantees when I don't need them (say, by using
a less structured schema), but a “database management system” that doesn't let
me enforce the intended structure of my data, under any configuration, is
simply not worthy of the name “database management system”.

------
jlaustill
The author freely admits they tried to do a relational schema in a non-
relational database. Then proceeds to say it was the wrong choice of product,
instead of design. This confuses me. I use MongoDB as my main database, and it
works great for me. However, I did have to learn that it works differently,
which meant writing my code differently.

My question is this: if you tried to use Postgres and designed your tables
super super wide and embedded everything in large object data types and then
it got too complicated to manage and figure out, would you consider that a
failure of Postgres, or your failure as a developer/architect?

I will have to chalk this up to a good laugh myself. Lastly, I've used
relational databases for most of my career, and I'm very good at writing SQL.
But after learning MongoDB and how to use it properly, there isn't a lot that
I would want to go back to an RDBMS for. And with the future roadmap of
MongoDB, I don't see that changing.

------
dccoolgai
I agree... but at this point, it's tough to see with all the ink that's been
spilled on these issues for years how you could think anything else. Maybe I
read HN too much, but the manifold problems with MongoDB have been widely
publicized for the past 6 years... it seems pretty close to conventional
wisdom that you're going to have those problems if you decide to use Mongo.

~~~
eldavido
You have to distinguish between "Mongo the database" and "Mongo as it's used by
companies"; the latter causes far more problems than the former.

I last used MongoDB seriously in 2012-2015. We had myriad operations problems
including inconsistent indexing across shards (where some shards had an index
created and others didn't, it was baffling), issues with the balancer not
moving chunks properly, and more. Also it's just _different_ than other DBs
with its lack of transactional consistency (I think they've made progress on
building this), but that's part of why it's fast.

However, the bigger problem is that document databases -- in general -- enable
a kind of software development where the model sort of emerges over time,
rather than being carefully designed from the beginning. Yes, it's flexible,
but you pay an absolutely enormous cost down the line dealing with
inconsistent documents. It's not like code where if you do something stupid,
you can fix it over time with refactoring and "remodeling" -- data has
_mass_. You can get into a situation where, with a large data set, it can take
a week or more just to run the migration script required to scan an entire
collection and rewrite a few billion documents into a new, better format.

There is no such thing as a "schemaless" database. That's like saying, oh
sure, we just have a bunch of 1s and 0s in memory -- our data is
"structureless". The question is whether the database enforces the schema, or
not. And I think that in a lot of cases, it's a lot worse to have an
"uncodified schema" than a rigid, but at least well-defined, one, that's
consistent across the data at all points.
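
To make the "uncodified schema" point concrete, here is a minimal Python sketch that surveys which shapes actually exist in a pile of documents. The sample documents and the function names are hypothetical, invented for illustration; a real collection would be read through whatever driver you use.

```python
from collections import Counter


def field_signature(doc):
    """Return the sorted (field, type name) pairs describing one document's shape."""
    return tuple(sorted((key, type(value).__name__) for key, value in doc.items()))


def survey_shapes(docs):
    """Count how many documents share each shape."""
    return Counter(field_signature(doc) for doc in docs)


# Hypothetical documents, as they might accumulate in a collection over time.
users = [
    {"name": "ann", "age": 31},
    {"name": "bob", "age": "42"},         # age stored as a string
    {"name": "cy", "age": 27},
    {"full_name": "Dee Doe", "age": 50},  # field renamed in a later release
]

shapes = survey_shapes(users)
for shape, count in shapes.most_common():
    print(count, shape)
```

Every distinct shape the survey reports is a schema the application code has to handle somewhere, whether or not anyone wrote it down.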

Sidenote: It's also occurred to me over the past few years that it's almost
impossible to impose a consistent schema on a large enough dataset. If you
truly are dealing with "big data" (TB/PB scale) maybe go straight to the
document store or columnar store because doing a migration is outright impossible,
but don't be so quick to write it off for GB-scale datasets.

~~~
dccoolgai
I would say the latter and the former are inextricably intertwined; i.e. Mongo
causes problems for companies _because_ it does things you would not expect a
database to do, like your example of phantom indexes...and pretty much all
bets being off when you try to leverage sharding... the one thing it was
supposed to be able to do to scale...

~~~
eldavido
Couldn't agree more.

We had more problems with sharding over the years than you could imagine...the
distributed locking mechanism didn't work a lot of the time, the balancer
didn't work, weird consistency issues between the config servers,
configuration that didn't get replicated across all shards, stupid shard key
selection (admittedly our fault but there really should be better guidance on
this topic), etc.

------
elmigranto
> Building an MVP on a super-tight schedule (early-stage apps, school
> projects, hackathons, etc.)

This sounds like Proof of Concept, not Minimum Viable Product.

~~~
pbreit
Still almost always better off on SQL (SQLite or Postgres).

------
larrik
I think Mongo is GREAT at one very important thing:

Storing other people's data.

If you need to consume other services, especially if it's more than one, it's
hard to beat Mongo (or other NoSQL databases). You get a lot of search power
(not always the easiest to tap, but you get it), and your app won't break when
they change their formats. If you need a LOT of it, even better.

I would never use it as my main datastore, though. At least not for any
projects I can think of offhand.

------
ergo14
PostgreSQL 9.4/9.5 offers nice support for JSON type data structures, if
someone wants to iterate fast - maybe they should start with that as a
baseline.

------
devuo
It's absurd to use a schemaless database for the simple sake of "rapid"
iteration! That is so, so incredibly inefficient. You need — NEED — to
understand your domain data (its structure, volume, usage, etc.) before you
even begin writing a single line of code or picking a specific DB vendor. This
— and only this — will rightfully inform you of what your actual technical
requirements are.

------
abhishekash
If we treat Mongo as nothing more than "schemaless", we are sure to abuse it
after some time. There are several other factors to consider. Mongo is a NoSQL
DB and not a Greek god of data storage. So, do not curse it for your sins.

------
seagreen
Are there any key/value JSON databases that enforce JSON Schema (or some other
schema language)? That seems way better to me in situations where you actually
care about data integrity.

~~~
findjashua
Any particular reason you want it at the db layer? Wouldn't having it at the
query builder/orm layer suffice?

~~~
seagreen
I think it would be useful in various situations.

In my particular case clients of different quality are connecting to a local
datastore. I'd like to make sure that even if they mess up validation the data
in the store still matches the schemas it claims to. Of course, I could have
them connect to a local process instead and have that process handle
validation before the data goes in the store, but it's always nice to avoid
intermediaries.
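
A rough Python sketch of what validation at the store layer could look like, so that a sloppy client cannot write a malformed document. This is not JSON Schema: a hand-rolled field-to-type map stands in for the schema language, and `ValidatingStore`, `validate`, and `ValidationError` are hypothetical names for illustration only.

```python
class ValidationError(ValueError):
    """Raised when a document does not match the store's schema."""


def validate(doc, schema):
    """Check doc against a tiny schema of {field: expected_type}.

    Every schema field is required and no extra fields are allowed.
    """
    missing = schema.keys() - doc.keys()
    extra = doc.keys() - schema.keys()
    if missing or extra:
        raise ValidationError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for field, expected in schema.items():
        if not isinstance(doc[field], expected):
            raise ValidationError(f"{field!r} should be {expected.__name__}")


class ValidatingStore:
    """In-memory key/value store that refuses documents that fail validation."""

    def __init__(self, schema):
        self._schema = schema
        self._data = {}

    def put(self, key, doc):
        validate(doc, self._schema)  # reject before anything is written
        self._data[key] = dict(doc)

    def get(self, key):
        return self._data[key]


store = ValidatingStore({"name": str, "age": int})
store.put("u1", {"name": "ann", "age": 31})
try:
    store.put("u2", {"name": "bob", "age": "42"})  # wrong type for age
except ValidationError as err:
    print("rejected:", err)
```

Because the check lives in `put`, it holds no matter which client is writing, which is the property asked for above.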

------
stonewhite
Is it that time of the year, when mongo-hating blogs start popping up like
daisies?

------
BillFinchDba
I am coming from an 18-year SQL admin/dev background, and am both an MCDBA and
MongoDB certified DBA. My dev team and I have been incorporating MongoDB into
our stack for about 5 years now in multiple use-cases.

I am using MongoDB in my production environment for things like consuming
incoming rates from different vendors, storing and serving pay stubs and
client invoices to our web customers, controlling MSMQ message queues,
archiving client emails for historical audits, etc. The key here is that they
are document oriented entities, not normalized relational data.

What it comes down to is a willingness to be agnostic in selection of your
database platform. Or more to the point, to let your use-cases drive the
platform instead of the reverse. If you are going to develop a use-case that
requires frequent partial updates, JOINs between multiple data structures, and
traditional entity normalization, then a traditional RDBMS is appropriate. If
your case allows for a denormalized mode of storage where all of the relevant
information is contained within one document structure, and you can benefit
from a fluid design, then MongoDB could be a good fit. If you need ephemeral
key-value pair structures such as in session state caching, then something
like Redis may be more in line with the requirements. They all have sweet
spots that they fill well...

We all tend to have our preferred DB "hammer" to drive developmental "nails".
What I propose is that we need to have an entire database toolbox from which
to choose the right tool for each job.

Others have commented here about the need to thoughtfully plan before you
write one line of code and/or choose your DB platform. I have to agree
completely. You can map a typical relational use-case to MongoDB very easily
to start with, but you do need to be able to enforce some level of control.
MongoDB does this now with document validation.

[https://docs.mongodb.com/manual/core/document-validation/](https://docs.mongodb.com/manual/core/document-validation/)

You also now have document level atomicity and transaction-like behavior as of
MongoDB 3.2.

[https://docs.mongodb.com/manual/core/write-operations-atomic...](https://docs.mongodb.com/manual/core/write-operations-atomicity/)

MongoDB, like the rest of us, is constantly iterating and improving. If you
have not looked at it lately and are basing your opinions on earlier versions,
I would encourage you to take another look...

Just my humble opinion...

------
fiatjaf
Think before you Nodejs.

~~~
mafro
Please elaborate. This comment provides little to the discussion.

~~~
beached_whale
Not nodejs, but I am in the process of tracking down an event loop loop (e.g.
emit( event_id ) somewhere in the path of an event listener of that name). I
assume this can happen within node too.

The stack trace is amazing to see.

~~~
debacle
What tool are you using that allows you to see the stack trace on async
events? That seems like a good debugger.

~~~
beached_whale
The code is based on boost asio, so the io_service is on the same thread as
your code if you only start 1. Then after that I am in gdb/lldb. Asio's
io_service handles the async calls portion and dispatches to my callback. From
there on it is no longer necessarily async. The fun is that I have a higher
level event handler sitting above this to register/unregister listeners, like
one would in node. If one of those listeners happens to emit a message to what
it is also listening for, boom.

long story short, it's not the async code per se, but after that.
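
For anyone who wants to see the failure mode in miniature, here is a small Python sketch (not the asio code described above) of a synchronous emitter in which a listener re-emits the event it is handling. The `Emitter` class and its re-entrancy guard are hypothetical, written only to illustrate the problem and one possible way to fail fast instead of overflowing the stack.

```python
from collections import defaultdict


class ReentrantEmitError(RuntimeError):
    """Raised when an event is emitted while it is already being dispatched."""


class Emitter:
    def __init__(self):
        self._listeners = defaultdict(list)
        self._active = set()  # events currently being dispatched

    def on(self, event, listener):
        self._listeners[event].append(listener)

    def emit(self, event, *args):
        if event in self._active:
            raise ReentrantEmitError(f"{event!r} emitted from inside its own listener")
        self._active.add(event)
        try:
            # Iterate over a copy so listeners may register more listeners safely.
            for listener in list(self._listeners[event]):
                listener(*args)
        finally:
            self._active.discard(event)


bus = Emitter()
bus.on("tick", lambda: bus.emit("tick"))  # listener emits what it listens for
try:
    bus.emit("tick")
except ReentrantEmitError as err:
    print(err)
```

Since the guard tracks every event currently in flight, it also trips on indirect cycles where one event's listener emits a second event whose listener emits the first.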

------
CodeSheikh
Any thoughts on Amazon's DynamoDB while we are all fresh on the Mongo
discussion here? Thanks.

~~~
buckbova
I'd like to know this too. I've just started a proof of concept s3-lambda-
dynamodb project and I'd like to know what others think about it.

Also I know document DBs don't aggregate well, but wouldn't it be appropriate
to have your app backend in NoSQL and have ETL or other duplication to an
RDBMS or possibly Cassandra?
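
Roughly what I have in mind, as a Python sketch: nested documents get flattened into relational rows so the aggregation can run in SQL. SQLite (from the standard library) stands in for the RDBMS here, and the order documents, table names, and `flatten_order` helper are all made up for illustration.

```python
import sqlite3


def flatten_order(doc):
    """Split one nested order document into an order row and its line-item rows."""
    order_row = (doc["_id"], doc["customer"]["name"])
    item_rows = [(doc["_id"], item["sku"], item["qty"]) for item in doc["items"]]
    return order_row, item_rows


def load(conn, docs):
    """Create the target tables if needed and copy every document into them."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, customer TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_items "
        "(order_id TEXT REFERENCES orders(id), sku TEXT, qty INTEGER)"
    )
    for doc in docs:
        order_row, item_rows = flatten_order(doc)
        conn.execute("INSERT INTO orders VALUES (?, ?)", order_row)
        conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", item_rows)
    conn.commit()


# Hypothetical documents as the app backend might store them.
docs = [
    {"_id": "o1", "customer": {"name": "ann"},
     "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"_id": "o2", "customer": {"name": "bob"},
     "items": [{"sku": "A", "qty": 5}]},
]

conn = sqlite3.connect(":memory:")
load(conn, docs)
totals = conn.execute(
    "SELECT sku, SUM(qty) FROM order_items GROUP BY sku ORDER BY sku"
).fetchall()
print(totals)
```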

------
rakibtg
"We’ve lived and we’ve learned, and now we’re in the process of migrating from
MongoDB to PostgreSQL." Why PostgreSQL instead of MySQL?

~~~
shandor
> Why PostgreSQL instead of MySQL?

Why, is there something inherently better in MySQL that makes their second
choice also suboptimal? Genuine question, I have very limited knowledge on
DBs.

------
NDizzle
Mongo only pawn in game of life.

