
Why I Migrated Away From MongoDB - svs
http://svs.io/post/31724990463/why-i-migrated-away-from-mongodb
======
gregjor
You were fortunate to recognize that MongoDB was the wrong tool for your job,
and lucky to be able to move to Postgres instead of continuing to throw your
time and effort away. I see the ad hominem "you're an ignorant idiot" attacks
already started, along with advice like using regexes to do case-insensitive
searches. Watching the NoSQL "movement" encounter the problems RDBMSs fixed 20
years ago and then hand-wave and kludge them away is frustrating. I wrote
about some of this in <http://typicalprogrammer.com/?p=14>.

Look at the bright side: programmers who are writing NoSQL-backed apps are
creating the fossil fuel that will keep programmers who know RDBMSs working
into our retirement years. I already have more work than I can do fixing web
apps that were built around crap data management tools that failed to scale
beyond a few thousand users. Your Postgres expertise will still be a money-
making skill long after MongoDB is forgotten.

~~~
gaius
The thing with the NoSQL guys is that many of them seem not to be in a
position to make an educated comparison. For example, an, uhh, _enthusiastic_
MongoDB advocate recently informed me that MongoDB was superior to Oracle
because in Oracle you had to poll a table to see if it changed. Except, no,
that isn't actually true:
[http://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_...](http://docs.oracle.com/cd/B19306_01/appdev.102/b14251/adfns_dcn.htm#BGBBHGAH)
\- and that document is from 2005. And you could do a trigger and an AQ
message/callback 5 years before that (at least). You haven't needed to poll an
Oracle database for changes in a loooong time.

Basically, you have to double-check and cross-reference every evangelism
point, because as you say, the NoSQL guys are encountering issues the RDBMS
community addressed years ago (7 in my example, but for the sharding stuff,
20+ years) - except they think they're discovering it for the first time!

~~~
taligent
This kind of rubbish really needs to stop. Just because you don't agree with
or understand their choices does not mean that the majority of "NoSQL guys"
are ignorant or uneducated.

Some of the biggest companies, e.g. Twitter, Foursquare, Google, and Amazon,
all rely on NoSQL.

The real issue I see is that by dismissing NoSQL as only for fools, RDBMS
developers are failing to see why these systems are popular to begin with.
Take PostgreSQL, for example, and how difficult it is to shard/replicate
compared to CouchDB or MongoDB. This is an area PostgreSQL should see as an
opportunity for improvement.

But dismissing huge groups of people as uneducated just makes you seem
uneducated.

~~~
gaius
I think we did "sharding" with relational databases... Back in the 80s. Then
we got fast hash joins and partitioning and it turns out that the
disadvantages of sharding just aren't worth it. The NoSQL crew will figure
this out too around 2030 :-p

~~~
einhverfr
Be a little careful with this level of disdain.

Sharding may have been available in the 1980s, but what it led to in some
products is quite amazing. Consider Teradata's clustering ability, which is
sort of like sharding your database but without the disadvantages typically
associated with it. Postgres-XC now offers something similar as a semi-fork of
PostgreSQL.

Basically what we are talking about here is a two-tier database layer where
storage and coordination are separated, and two phase commit is used between
these two. Thus the coordination tier can enforce referential integrity
between storage nodes if necessary and thus allow write-extensibility.

This isn't something without uses. For high-end, high-write-load databases,
serving very large amounts of traffic (think airline reservations), this has
been a typical approach for quite some time.

The fundamental problem though is that once you give up on local consistency
over a given domain, you cannot have any guarantee of global consistency. The
current relational approaches (Postgres-XC and Teradata) both enforce ACID
compliance. BASE doesn't offer any consistency guarantee and therefore it is
only good for throw-away data.

~~~
gaius
Oh absolutely, but what you're talking about there, people do with CICS today,
and that's even older than the 80s. CICS is a technology I have a lot of
respect for.

But my point is - when I need to use something like that, I know that's what
I'm using. I don't imagine that it's some new invention. Hell, a lot of what
the NoSQL guys think they're inventing, IBM did back then too - IMS.

~~~
einhverfr
"Those who don't understand Unix are condemned to reinvent it, poorly." –
Henry Spencer

------
rbranson
I'm no fan of MongoDB, but this same advice goes for any NoSQL data store. I
am an Apache Cassandra contributor and community MVP, but my advice stays the
same: it's best just to start with a SQL database and go from there. Read some
books and learn it well: the "SQL Cookbook" from O'Reilly is great, and so is
"The Art of SQL." Premature optimization continues to be the root of all evil.

~~~
stickfigure
_it's best just to start with a SQL database and go from there._

This is bad advice. It's best to understand your problem domain and use the
tools that are most appropriate. You see two types of posts a lot on HN:

* "I picked a NoSQL database for a problem domain with a better relational fit." Those posts look like this one.

* "I picked an RDBMS for a problem domain with a better NoSQL fit." Those posts are usually titled something like "How I scaled Postgres to XYZ qps" and describe an insane amount of re-engineering and operational hell. Oddly, these posts are usually proud of the accomplishment rather than embarrassed that they picked the wrong tools in the first place.

There are upsides and downsides to RDBMSes. From my experience, you should be
leaning towards NoSQL systems (of which there are many, each suited to
different use cases) when you have very large scaling needs (in terms of
dataset and qps), heavily polymorphic data, or data that has ambiguous
structure.

It's been stated many times already: Use the right tool for the right job.

~~~
gregjor
This advice -- the most frequent reply in this thread -- is the same as saying
you should be able to accurately predict the future. Database management is a
big and complicated topic. Especially with emerging tools like Mongo there is
no way reading a book is going to give you the expertise and experience you
need to choose the right tool. You do your best and maybe make the wrong
decision. It's not stupid to go down the wrong path. What's stupid is
continuing down the path after you've encountered one problem after another.

~~~
stickfigure
It doesn't require the ability to predict the future, just some familiarity
with the tools. The OP's use case (small amounts of data, low traffic, lots of
aggregation) is perfect for an RDBMS, and anyone with even casual familiarity
with both SQL and NoSQL systems could easily prognosticate that.

------
bunderbunder
_Fourthly, and this one completely blew my mind - somewhere along the stack of
mongodb, mongoid and mongoid-map-reduce, somewhere there, type information was
being lost. I thought we were scaling hard when one of our customers suddenly
had 1111 documents overnight. Imagine my disappointment when I realised it was
actually four 1s, added together. They’d become strings along the way._

I've been having a similar problem with an SQLite data store, only the other
way around. Strings were getting converted to numbers, with leading zeros that
were significant and needed to be maintained being lost along the way.

It sucked all the fun out of dynamic typing for me. At least in combination
with automatic type conversions. Having to think about type and when to make
transitions across type boundaries when you need to is just a little light
busywork. Having to worry about type and transitions across type boundaries
being made contrary to your intentions is a downright PITA and, it turns out,
a serious quality control issue.

~~~
zemo
Mongo accepts the data you give it. If you have a type-conversion error, it's
in your application layer. I use Mongo daily and have never seen this problem,
because I'm using a statically typed language. This seems like more of a
complaint about Ruby than Mongo.

I use Mongo daily on a Go project, and I actually think it's pretty annoying;
I'm not trying to be a Mongo apologist, but ... this type conversion argument
doesn't seem to be very fair to Mongo.

~~~
lucisferre
Bad code is bad code, in _any_ language.

I'm a little disappointed by how _any_ post about moving from X to Y
(especially if X is Mongo) makes the top of the front page on HN. This is not
really a very good or insightful post. It's one person's experience and
anecdotes about the pain points of learning a new technology. Mildly
interesting, but not really expository in any way.

------
dccoolgai
"To be honest, the decision to use MongoDb was an ill-thought out one. Lesson
learned - thoroughly research any new technology you introduce into your
stack, know well the strengths and weaknesses thereof and evaluate honestly
whether it fits your needs or not - no matter how much hype there is
surrounding said technology."

I think you are not alone in learning this lesson with this particular
technology. Fortunately it's one I learned by proxy from working adjacent to a
team that decided to introduce Mongo into their stack...but I still wake up at
night and hear the screams of "You have to put the whole dataset on
RAM?"...you weren't there, man...we lost a lot of good guys...

You have to draw a clean line between "stuff it is really fun and enlightening
to play with" and "stuff you introduce into your stack".

~~~
trafficlight
>> "You have to put the whole dataset on RAM?"

I'm pretty new to the whole database thing, but how is MongoDB different from
Postgres or Mysql in this respect? In a traditional database, the data is
pulled directly from the hard drive. Why does Mongo suffer a performance hit
and Mysql doesn't?

~~~
otterley
Because MongoDB mmap(2)s its backing stores into its process memory space.
It's a naive approach to persistence - it's very fast and simple, but if you
overcommit (i.e. you store more in the database than you have memory
available), page-thrashing results.

MySQL's InnoDB table engine, on the other hand, uses direct I/O (in the
recommended scenario) and manages the buffer pool independently of the kernel.
Its buffer pool manager is specifically designed for the typical workloads
MySQL is used for ([http://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-
pool.ht...](http://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-pool.html)) -
as opposed to the naive LRU that most OSes employ for their filesystem
buffers.
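The mmap-style persistence otterley describes can be sketched with Python's stdlib `mmap` module (a stand-in illustration, nothing MongoDB-specific): a write to the mapped memory is a write to the backing file, and the kernel decides which pages stay resident.

```python
import mmap
import os
import tempfile

# Create a small "backing store" file.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello world")
    os.close(fd)

    # Map the file into the process address space, as mmap(2)-based
    # storage engines do with their data files.
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[0:5] = b"HELLO"   # mutate mapped memory...
            mm.flush()           # ...and the bytes on disk change too

    with open(path, "rb") as f:
        data = f.read()
    print(data)  # b'HELLO world'
finally:
    os.unlink(path)
```

The simplicity is the appeal; the page-thrashing appears when the mapped data exceeds physical memory and the OS's generic LRU starts evicting pages the database is about to need.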

~~~
jeremyjh
It's hardly naive; it's optimized for an application that needs to keep its
entire working set in RAM, which is why sharding is so fundamental to the
design. Not all apps need that, which is, once again...

------
jamesli
I am both a database guy and a software engineer. As a software engineer, I
kind of understand the hype behind NoSQL. As a database guy who has spent
years studying how database engines work under the hood, many NoSQL
implementations make me wonder how powerful marketing can be.

In general, I love the ideas behind NoSQL. I can still feel the excitement
when reading the BigTable and MapReduce papers. HBase, Hadoop, Redis, etc. are
awesome products. I use some of them in my work. But some other NoSQL
products? As engineers, we must understand the implementation and be fully
aware of its limitations, instead of believing the marketing materials.
Well, if all you want is to test a toy product, to build a prototype, or your
product has low concurrency and low data size and you have no concerns about
operations, it certainly looks like they make your development easier. But in
these scenarios, any good relational database won't add significant burden
either.

~~~
taligent
> As engineers, we must understand the implementation and be fully aware of
> its limitations, instead of believing the marketing materials.

And as engineers we must understand that most other engineers do take their
role seriously and evaluate products on their merits.

Implying that they are falling for "marketing" just because you don't agree
with their choice and then lecturing them for their choice doesn't make you
come across well.

~~~
einhverfr
I do think that the popularity of MySQL, however, owes a lot to it being used
by non-engineers for simple web apps, though ;-)

------
jaimebuelta
Mmm, not sure about some of the complaints...

\- You can make case-insensitive searches on the DB using regexes
([http://www.mongodb.org/display/DOCS/Advanced+Queries#Advance...](http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-
RegularExpressions)). A simple case-insensitive regex is not too bad
performance-wise, but in general, case-insensitive searches should be avoided
for search purposes (you can normalize everything to lower case, or use an
equivalent trick)
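The normalize-on-write trick can be sketched in plain Python (the stdlib `re` module standing in for Mongo's regex queries; the names are hypothetical):

```python
import re

names = ["Alice", "ALICE", "alice", "Bob"]

# 1. Case-insensitive regex at query time: flexible, but the engine has to
#    test every candidate value, so an ordinary index can't help.
regex_hits = [n for n in names if re.match(r"^alice$", n, re.IGNORECASE)]

# 2. Normalize on write: store a lowercased copy alongside the original and
#    do exact (and therefore indexable) lookups against it.
store = [{"name": n, "name_lower": n.lower()} for n in names]
exact_hits = [d["name"] for d in store if d["name_lower"] == "alice"]

print(regex_hits)  # ['Alice', 'ALICE', 'alice']
print(exact_hits)  # ['Alice', 'ALICE', 'alice']
```

Both return the same matches; the difference is whether the work happens on every query or once at insert time.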

\- The proper way of doing an audit (and searching it later) is to make an
independent collection with a reference to the document(s) in the other
collection(s). Then you can index by user, date, or any other field and leave
the main collection alone. The embedded access collection described doesn't
look very scalable.
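The separate-audit-collection idea translates directly to the relational world; here is a minimal sqlite3 sketch (table and column names invented for illustration):

```python
import sqlite3

# Audit entries live in their own table, keyed back to the main record,
# rather than being embedded in it - so they can be indexed and queried
# without ever touching the documents table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE audit (
        id INTEGER PRIMARY KEY,
        doc_id INTEGER REFERENCES documents(id),
        user TEXT,
        at TEXT
    );
    CREATE INDEX audit_user ON audit(user, at);
""")
conn.execute("INSERT INTO documents VALUES (1, 'receipt-0001')")
conn.executemany("INSERT INTO audit (doc_id, user, at) VALUES (?, ?, ?)",
                 [(1, "svs", "2012-09-01"), (1, "ana", "2012-09-02")])

# "Everything user svs touched" is an index lookup on the audit table alone.
rows = conn.execute(
    "SELECT doc_id, at FROM audit WHERE user = 'svs'").fetchall()
print(rows)  # [(1, '2012-09-01')]
```

The same shape works in Mongo: an `audit` collection holding a reference to the main document, indexed on user and date.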

\- Making map-reduce queries is tricky (at least for me). I think the guys at
10gen realize that, and the new aggregation framework is a way of dealing with
this. Anyway, the main advantage of SQL is this kind of thing, the rich query
capabilities. Even if MongoDB allows some, compared with other NoSQL DBs, if
there is a lot of work in defining new queries, a SQL DB is probably the best
fit, as that is where SQL excels.
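The "rich query capabilities" point in miniature, using sqlite3 as a stand-in for any SQL database (hypothetical data):

```python
import sqlite3

# An ad hoc aggregation that takes a map function, a reduce function, and
# some plumbing in map-reduce is one declarative statement in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipts (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO receipts VALUES (?, ?)",
                 [("acme", 10.0), ("acme", 5.0), ("globex", 7.5)])

totals = conn.execute(
    "SELECT customer, SUM(amount) FROM receipts "
    "GROUP BY customer ORDER BY customer").fetchall()
print(totals)  # [('acme', 15.0), ('globex', 7.5)]
```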

I don't truly believe in this "you should research everything before starting"
(I mean, I believe in research, but the "you should do your homework" argument
is overused too often. Sometimes you make a decision based on data that
changes later, or is incomplete), as there are a lot of situations where you
find problems as you go, not in the first steps. But, according to the
description, it looks like PostgreSQL is a better match and the transition
hasn't been too painful, so we can classify this as a "bump in the
road"/"success story". Also, the DB schema is probably much better known now
than it was in the beginning, which simplifies the use of a relational DB.

~~~
datasage
> The proper way of doing an audit (and search later) is to make an
> independent collection with a reference to the other(s) document in a
> different collection. Then you can index by user, date, or any other field
> and leave the main collection alone. The described embedded access
> collection doesn't look very scalable.

I think this point is very important on the RDBMS side too. There are cases,
even with relational datastores, that would perform better if the dataset were
built to the query.

The difficulty comes into play when you are trying to keep the denormalized
data up to date as the base dataset changes.

~~~
einhverfr
> "I think this point is very important even in the RDBMS side."

This is true. After all you can't extract an audit trail from a deleted
record. Simply in terms of information management it often makes sense to
represent this as separate info.

> "There are cases, even with relational datastores that would perform better
> if the dataset was built to the query."

The big problem with doing it that way is that you are screwed as soon as
requirements change. I recently blogged about not-1NF designs in PostgreSQL
(nested data structures for subset constraints), and the lesson I took away
was that you really don't want to have your select queries hitting the same
tables you maintain for inserts. You want your queries hitting a normalized
data structure even if the data comes in as something different.

------
daveman
As an analytics professional who was pressured into a MongoDB environment, I
feel the OP's pain. If you want to do gymnastics with your data (aggregations
of aggregations, joining result sets back onto data), SQL expressions are a
thousand times easier than Mongo constructs (e.g. map-reduces). We usually
ended up scraping data out of Mongo and dumping records into a SQL database
before doing our transformations.

All that said, our developers loved the ease of simple retrieval and
insertion, and of course the scalability. So I guess you ultimately need to
base your decisions on your priorities.

I don't fault the OP though, since it's hard to know just how limiting NoSQL
will be until you try to do all the things you used to assume were database
table stakes (no pun intended).

------
dkhenry
Completely aside from the article: the level of vitriolic discourse in this
topic is astounding. I am amazed that, as a community, discussions of database
engines can draw out such mean-spirited anger. I have never downvoted as many
comments on HN in a single thread as I have on this topic. I don't care which
side of the debate you come down on; there is no excuse for belittling and
insulting others in a technical forum. That's right, I am looking at you,
gregjor, gaius, and zemo.

In this case it appears to mostly be those arguing for Postgres, but I
wouldn't care if you were arguing for sunshine and unicorns - there is a way
to behave civilly and you're not doing it.

~~~
gregjor
You can find this level of discourse in plenty of topics every day.
Programmers draw blood over indenting with tabs or spaces. It's geek
entertainment.

I have never been upvoted so much on HN. I admit to strong opinions and light
sarcasm but you'll have to show me where I've been uncivil, belittling or
insulting, except perhaps in response to people who insulted me.

------
jaequery
I've run into similar issues to the ones you describe. Things that can be done
so simply and quickly in SQL were bewilderingly difficult to do in Mongo.

The schema-less database approach also seems attractive at first, but updating
your data whenever your "app schema" changes starts to become a pain real
quick.

Now I can't really live w/o having a schema first; it actually saves you a lot
more time in the long run (even the short run). Being schema-less means you
can't really do anything too fancy w/ your data (generate reports, advanced
search, etc...)

~~~
lmm
You still need to update your data when doing schema changes under SQL, and
you have a lot less control over the process.

And you can do anything to your data without a schema, you just need to build
your app as a service that provides access to it.

IME SQL schemas do more harm than good; usually you end up with a schema
that's subtly weaker than what's actually valid for your application, and the
difference between the two models will trip you up at the worst possible time.
Have a small, distinct set of classes that you store, enforce that you store
only those (and don't access the storage layer any other way), enforce that
they remain backwards compatible, and enforce that you can't create invalid
instances. But application code is the best place to do all these things.

~~~
einhverfr
_IME SQL schemas do more harm than good_

Completely disagree here. The basic tradeoff is between flexible input and
flexible output. Without a rigid schema, ad hoc reporting is impossible
because you don't have an ability to articulate reporting criteria. I.e. no
declarative schema means no declarative reporting query.

I suppose that's ok as long as you never need to report on anything..... Might
work....

~~~
lmm
I'm not saying don't have a rigid schema, just don't enforce it at the storage
layer. If you're doing an ad-hoc report then you wouldn't have indexes in
place for it in the SQL case, so I don't see how it's any worse or harder in
mongodb.

~~~
einhverfr
> _If you're doing an ad-hoc report then you wouldn't have indexes in place
> for it in the SQL case, so I don't see how it's any worse or harder in
> mongodb._

Only true in a case where you have to index everything you might want to
search on, like with InnoDB. In PostgreSQL all you really need are your
foreign key indexes and a couple (if that) of criteria indexes and you are
good. That's more of an InnoDB limitation than a relational limitation.
Basically, InnoDB tables are primary key indexes and they can only be
traversed in key order, not physical order, so sequential scans are painful....

~~~
lmm
If that level of indexing is sufficient I don't see why you can't just do the
same thing in mongodb.

~~~
einhverfr
>"If that level of indexing is sufficient I don't see why you can't just do
the same thing in mongodb."

What level of indexing is sufficient depends a great deal on the specifics of
the database layout on disk. In InnoDB for example, sequential scans are very
costly, and primary key lookups are very cheap. This is because the table is
more or less contained in the primary key index and this must be traversed _in
key order_ since physical order is not supported. This means a sequential scan
of a table means lots of random disk I/O and OS prefetching is useless.

So to address this you end up indexing everything you want to search on later.
Note that non-pk indexes are a little slower in InnoDB because you have to
traverse the index to find the primary key value, then you have to traverse
the primary key index to retrieve the table info.

In PostgreSQL things work differently. The table is a series of pages on disk
and rows are allocated from these as a heap. You can scan a table, but not an
index, in physical order in PostgreSQL. Therefore PostgreSQL sequential scans
on tables are typically a lot faster than on MySQL, because they are
sequential, rather than random, page reads. Indexes point at the tuple ID,
which stores the page number and row number within a page. An index scan is a
tree traversal followed by processing the pages indicated by the tuple IDs.

This leads to a bunch of interesting things: Adding indexes is usually a
performance win with InnoDB. However for PostgreSQL, it will typically look up
what indexes it has and balance index scans against sequential scans of
tables. Unlike InnoDB, sequential scans sometimes win out planner-wise, esp.
on small tables.

So what indexes you need depends quite highly on how things are organized.
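The planner behavior described here can be observed in miniature with sqlite3's EXPLAIN QUERY PLAN (a different engine than InnoDB or PostgreSQL, so only the general idea carries over, and the exact plan strings vary by version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER, w INTEGER)")
conn.executemany("INSERT INTO t (v, w) VALUES (?, ?)",
                 [(i % 10, i % 7) for i in range(1000)])
conn.execute("CREATE INDEX t_v ON t(v)")

# A predicate on the indexed column can use the index...
indexed = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE v = 3").fetchall()
# ...while a predicate on an unindexed column forces a sequential scan.
scanned = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE w = 3").fetchall()

print(indexed[0][-1])  # e.g. 'SEARCH t USING INDEX t_v (v=?)'
print(scanned[0][-1])  # e.g. 'SCAN t'
```

Which plan wins for a given query is exactly the kind of engine-specific decision the comment describes: it depends on how the tables are laid out on disk, not on the relational model itself.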

~~~
lmm
That's all pretty interesting, but I still don't see what you get with
postgres that you don't get with mongodb. Your database won't enforce your
schema for you, but I don't see how that means "ad hoc reporting is
impossible".

~~~
einhverfr
I have read through the MongoDB query docs and it does look like you can do
some ad hoc retrieval queries, and some aggregation. But in the SQL world
that's not really the same thing as ad hoc reporting.

I suppose "can't do" is too strong, assuming your reporting matches your
query. However, these things look a lot simpler to do in SQL than in Mongo's
approach, and I don't see how you can reliably transform data on output if you
don't have a guaranteed reliable schema on input. Also, I don't really
understand how you would transform the data in this way with Mongo's API. I
suppose you always could, but it looks painful to my (admittedly uninitiated)
eyes.

How many lines of code are required to express a 50 line SQL query doing 5
joins, complex filters and group-by transformations, etc?

~~~
lmm
Obviously if your report makes assumptions about your data which aren't true
then you might get invalid data out. I absolutely agree with having a single
point through which writes to the data store must pass which enforces
business-level constraints on the data. I just don't find SQL a convenient
form to express those constraints (and my experience has been that any given
business domain will have some constraints that are too complex to express in
the SQL model, forcing you to resort to e.g. triggers - at which point the
constraint is not integrated with the rest of your data model, it's just a
piece of code that runs to check your inserts, which you could do equally well
in any language); I'd rather do it in a "normal" programming language.

I see what you're getting at with reporting now, you're talking about doing
actual calculations on the data? For mongodb I'd probably use its map-reduce
API, at which point you're writing javascript and you can do anything, and
performance should be fine. Though honestly other than performance I've never
had a problem with just fetching the rows I need and doing whatever
transformation/aggregation logic in e.g. python. SQL has never struck me as
particularly typesafe or gaining much advantage from being close to the data;
its sets-oriented way of thinking can be helpful in some places, but it's not
the only language that can do that.
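The fetch-rows-and-aggregate-in-Python approach lmm mentions, in miniature (the rows are hardcoded stand-ins for whatever the datastore returns):

```python
from collections import defaultdict

# Pull raw rows from wherever they live, then do the group-by in
# application code instead of in the query language.
rows = [
    {"customer": "acme", "amount": 10.0},
    {"customer": "acme", "amount": 5.0},
    {"customer": "globex", "amount": 7.5},
]

totals = defaultdict(float)
for r in rows:
    totals[r["customer"]] += r["amount"]

print(dict(totals))  # {'acme': 15.0, 'globex': 7.5}
```

The trade-off is the one conceded in the comment: it works fine until the dataset is too large to ship to the application, which is where in-database aggregation earns its keep.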

If you like SQL as a language for writing logic in I can see why a traditional
database would appeal. But even then I feel like input validation, data
storage and query processing should be separate layers (and I see some
movement towards that with e.g. MySQL offering an interface to do key/value
lookups directly). If SQL is a good way of doing transform/aggregation logic
then it should stand alone, and I'd expect to see SQL-like query layers on top
of key-value stores.

------
dkhenry
Looks like he jumped ship just a bit too early.

<http://docs.mongodb.org/manual/applications/aggregation/>

------
dkarl
_You have to load every document in the database and extract the audit trail
from it, then filter it in your app for the user you’re looking for. Just the
thought of what that would do to my hardware was enough to turn me off the
whole idea._

Naive question from somebody who has done a little reading on and dabbling
with key/document-with-MapReduce style datastores, but who hasn't tackled a
real production problem: I thought running queries over the entire dataset was
one of the assumptions of horizontally scalable document stores? In terms of
avoiding computation, you can only limit queries by document key, which even
if you're clever/lucky doesn't always encode the parameters you're querying
on, or doesn't encode them in the right order, so you should be prepared to
run queries over your entire dataset. Hopefully the queries you run often are
optimized (e.g., using indexes or clever use of key ranges), but in the
general case, you have to be prepared to scan the whole shebang, and that's
supposed to be okay because of horizontal scalability, right?

~~~
enjo
Yep, or you need to build some other construct to support it (i.e. keeping
running tallies and the like).

It's a tradeoff between the benefits of the document store and the loss of
relational data. The blog author here clearly didn't understand the trade-offs
he was making.

As always with these discussions: it's important to use the right tool for the
job. I'm a big fan of what Mongo is doing. I've used it in two higher-scale
projects and have no complaints. Of course, I'm using it in the context in
which Mongo excels.

------
adambard
As a relative idiot when it comes to this sort of thing, I'd like to insert
the following supplementary question: what is the sort of application/dataset
for which Mongo is particularly suited?

I've used it on small projects, and have enjoyed it. Perhaps my data has just
been simple/loosely-coupled enough to never run into these problems?

I read a lot of posts like this on HN before ever trying Mongo, so I've at
least been convinced to always implement the schema at the application layer.
Others seem to keep learning that lesson in harder ways.

~~~
stingraycharles
"what is the sort of application/dataset for which Mongo is particularly
suited"

The majority of the NoSQL databases are based on Amazon's Dynamo: loosely
coupled replication. MongoDB is one of the few (next to Hbase and a few
others) that adopts Google BigTable's architecture: data is divided in Ranges,
and each mongod node serves multiple Ranges.

This means MongoDB is able to provide atomicity where it's harder with other
NoSQL databases. In particular, we need to be able to do some sort of "compare
and swap" operation that is guaranteed to be atomic/consistent, while still
being able to have our mongod nodes distributed over multiple datacenters.

In Dynamo-based architectures, in order to provide the same amount of
atomicity, you always end up writing to at least half + 1 of the replicated
nodes available in your cluster. This is more awkward, and reduces the
flexibility of the whole system (the atomicity guarantee Mongo provides also
works for stored javascript procedures, for example).

Having said that, we've been using MongoDB in production for about 3 years at
this point, but we're far from happy about the availability it provides
(issues like MongoDB not detecting that a node has gone down, failing to fail
over, etc.). We run an HA service, and to date _all_ of our failures in uptime
have been either the fault of our hosting provider or MongoDB not failing over
when it should. As such, we're always looking for a better alternative to move
to, but at the moment MongoDB is about as good as it gets.

~~~
edwardcapriolo
MongoDB is not even remotely close to the BigTable architecture. It has a
different data model and a different sharding model, and just about a
different everything.

~~~
stingraycharles
I know that MongoDB is very different in architecture from BigTable (as
opposed to HBase and BigTable, for example), but I always understood that the
fundamental way they choose to assign and lookup regions to regionservers (or,
in mongodb terminology, shards and shard servers) was based on the BigTable
architecture.

Could you elaborate on the differences in the sharding model between the two?

------
redler
_digiDoc is all about converting paper documents like receipts and business
cards into searchable database, and so a document database seemed like a
logical fit(!)._

It looks like this single initial assumption is where things started going
wrong: conflating the pieces of paper that happen to be called "documents" in
the real world with the concept of a "document" in the context of a system
like MongoDB.

~~~
josephcooney
It's an easy mistake to make - to assume two things with the same name might
be similar. Especially when paper documents have been called thus for a very
long time.

It seems like another instance where all the good names were taken.
[http://jcooney.net/post/2012/04/03/Transaction-Argument-
Clas...](http://jcooney.net/post/2012/04/03/Transaction-Argument-Class-
Message-Service-Agent-Method.aspx)

------
tonynero
The guy is getting such hate in the comments on his site, yet his opening line
is that his choice was ill thought out. Let him express his issues, right?

I chose MongoDB for my last side project, and while working schema-less was
awesome and developing the client-facing part of the project was certainly
quicker, I feel pretty lost on the analytics/BI side of it and couldn't say it
better than him: "Not having JOINs makes your data an intractable lump of mud"

So coming from a relational/SQL background I found MongoDb awesome upfront,
but frustrating later on... and yes I'm off to learn
<http://docs.mongodb.org/manual/applications/aggregation/>

------
stevencorona
The downside, or challenge, with NoSQL (generally speaking) is that you need
to handle your aggregations ahead of time - you need to know what queries
you'll want to run in the future when you store your data. If you have some
new aggregation you want to keep, you'll need to re-process the data (with
Hadoop or something else).
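The "aggregate ahead of time" pattern can be sketched in a few lines of plain Python (purely illustrative; in a real store the tally bump would be an atomic server-side increment issued on the write path):

```python
tallies = {}

def record_event(store, kind):
    """Write path: persist the event AND bump its pre-computed counter."""
    store.setdefault(kind, 0)
    store[kind] += 1

for kind in ["signup", "login", "login", "signup", "login"]:
    record_event(tallies, kind)

# Read path is now a constant-time lookup - no scan, no re-aggregation.
print(tallies)  # {'signup': 2, 'login': 3}
```

The cost is exactly the one named in the comment: an aggregation you didn't anticipate at write time requires re-processing the raw data after the fact.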

It's the trade-off of being able to scale reads and writes horizontally. And
unless you need it, an RDBMS makes sense given the flexibility.

Maybe, instead of looking at NoSQL as a full-on replacement for RDBMS, we can
look at it as a better solution to sharding.

~~~
aneth4
This is the opposite of "agile." It is difficult to know where your product
will be in 2 months let alone 12, so it seems the advice to use SQL first is
sound - unless you enjoy long distractions to solve simple JOINs.

~~~
bthomas
If you are refactoring your app enough to need a different class of joins,
you'll probably need schema and data migrations too.

I think no-schema fits agile quite well. For rapid prototyping, I prefer Mongo
even to SQLite.

------
se85
The guy just jumped on the bandwagon without having a clue.

Just reading this blog, it's clear that MongoDB was not a good fit for him; if
he had bothered to do some research, he would have found this out on day one.

That's the real lesson he should be taking away from this and blogging about,
yet somehow MongoDB are trolls and it's all their fault because of a lack of
features and they have bypassed 40 years of computer science and blah blah
blah blah, excuses, excuses, excuses.

edit: removed a few pointless sentences :-)

------
leothekim
"I can only come to the conclusion that mongodb is a well-funded and elaborate
troll."

It's possible the reasoning he used to use mongodb is the same as the one he
used to abandon it.

------
chaostheory
For me, what killed my enthusiasm for MongoDB is the write locks. Yes, they
have been greatly improved in the 2.x releases, but it's still not good enough
(for me).

------
programminggeek
Look, there are some places where document DBs solve problems more easily than
SQL; in other places they kind of suck. For example, plain old object mapping
is easier with a document DB. Relational DBs tend to make your code
look/feel/act more relational and less object-oriented: your object model
tends to look just like your table structure. This can be good or bad
depending on your viewpoint.

There are some approaches to solve some of the author's problems that end up
making the Mongo system look and feel a lot more like a SQL system because
sometimes data is actually related.

The author could have also taken a different approach to his data schema that
would have fit more of a non-relational worldview.

Software development and architecture is about making choices and working with
and around the limitations of your tools. It doesn't matter if PostgreSQL or
MongoDB are "better". It's about solving a problem using a set of tools you
are comfortable with.

------
mrinterweb
I find this article to be more a reflection of a NoSQL newbie's failed foray
with a document database, someone who later realized that the grass is not as
green as originally perceived. The developer realized that he does not like
map-reduce and missed having joins. I don't see how this person's failed
experience with MongoDB is a reflection on MongoDB.

I think the recent popularity of MongoDB bashing is maybe a testament to
MongoDB's popularity. I'd guess that because MongoDB, with its ad hoc queries,
is probably the closest NoSQL database to an RDBMS, it is attracting many
newcomers.

~~~
jeremyjh
Yes, MongoDB is more of a general-purpose database with lots of features that
remind us of relational databases. It is a purpose-built application database
for applications that would otherwise almost certainly be built with a
relational database. As its users get deeper into their projects and find out
how some of the trade-offs play out, their self-doubt is always "should we
just go relational" - no one is staying up all night wondering if they should
migrate to Riak. If you start out with Riak you almost certainly know why and
are using it in a very specific context.

------
armored_mammal
Can someone confirm that there is no such thing as a case insensitive
index/search in Mongo? If true it seems likely that the author's comments have
some degree of truth, at least when it comes to its usefulness for web and
mobile applications. Storing data only in lowercase isn't a good idea for
obvious reasons, and storing two copies of the same data for searching only,
while not the end of the world, seems a little silly.

~~~
zemo
case-insensitive regex searches are supported.

~~~
lucasjans
Do you know why case-insensitive searches are not recommended? What's the
realistic workaround?

~~~
ericcholis
I've found that a simple {"lastname":/cholis/i} works great. However, trying
to do the same thing for a multi-key search isn't ideal. Specifically,
searching for 3 words in a title using $and with multiple regex queries on a
collection with 100,000+ documents took about 520 ms.

The mongodb documentation suggests that you could have an array with your
keywords, generated from the field you wish to search. Using indexes on
multikeys would make this faster, but your index size would be much larger.[1]

For my project, I'm likely going to adopt a solution like Elasticsearch or
Solr.

[1]
[http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mong...](http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo)
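
The keyword-array workaround the Mongo docs suggest can be sketched like this.
This is a hedged illustration, not code from the docs or the post: the
`_keywords` field name and the tokenizer are assumptions. The idea is to derive
lowercase search terms at write time and store them in an array, so a multikey
index on that array can serve exact lowercase matches instead of a
collection-wide regex scan.

```javascript
// Derive lowercase keywords from a text field and attach them to the document.
// A real app would also index the array: db.posts.ensureIndex({ _keywords: 1 }).
function withKeywords(doc, field) {
  const keywords = [...new Set(
    doc[field]
      .toLowerCase()
      .split(/\W+/)               // crude tokenizer; real apps may want stemming
      .filter(w => w.length > 2)  // drop very short tokens
  )];
  return { ...doc, _keywords: keywords };
}

const doc = withKeywords({ title: "Why I Migrated Away From MongoDB" }, "title");
// A query would then be an indexed exact match:
//   db.posts.find({ _keywords: "mongodb" })
```

As ericcholis notes, the cost is a larger index; the benefit is avoiding the
520 ms-style multi-regex scans on large collections.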

~~~
sureshv
Building over a full text search engine that indexes your db is going to scale
much better in the long run. Plus you get stemming and other niceties to boot
as long as consistency is not of the highest priority.

~~~
r00fus
With tools like SOLR or Elasticsearch, I keep wondering why people try to re-
invent the full-text indexing wheel - these tools make it easy to focus your
app/datastore on your functionality and leave search as an additional service,
to be used whenever you need it.

------
Teef
There are 3 reasons I have gone running and screaming from an RDBMS. 1\.
Software gets large/complex to get meaningful work done. I am all about
data consistency, but at some point it is time to break things up into services
and not have a single database. 2\. If the software is popular enough, everyone
is running to use NoSQL (cache is NoSQL). 3\. Clearly it is not a good storage
solution for everything either, because, for example, in an address book a
nested list greatly simplifies everything. (right tool for the job)

I spent many years hammering away with RDBMSs and by and large it was great
until it wasn't. I try to look at data storage more holistically now, based on
a best guess of the problem. I have tried to convert an application from
PostgreSQL to MongoDB and it failed, but that wasn't MongoDB's fault; it was
because I didn't change the data model to fit a document storage system. I
have also tried to use PostgreSQL for a realtime reporting system and failed
horrifically, and that was not PostgreSQL's fault, it was mine. Amazing what
happens when you stop pushing a chain and start pulling it!

~~~
einhverfr
> _1\. Software gets large / complex to get meaning full work done. I am all
> about data consistency but at some point it is time to break things up into
> services and not have a single database._

This is true. Managing complexity is always an important task. I am not sure
that NoSQL solves this however. Also the best way to break things up is to
loosely couple things, and this requires to some extent that you have ACID
compliance. A good RDBMS, like PostgreSQL or Oracle, will provide tools for
managing that loose coupling.

>"2. If the software is popular enough everyone is running to use NoSQL (cache
is NoSQL)."

Like proverbial lemmings over a cliff....

>"3. Clearly it is not a good storage solution either because for example in
an address book nested list greatly simplifies everything. (right tool for the
job)"

Funny, I thought nesting was what WITH RECURSIVE was for....

I am not saying there aren't use cases for MongoDB or reasons to switch some
applications. For example I can think of a few really cool apps, like maybe a
network back-plane for a huge LDAP directory. Also content management might be
a good fit. But despite your years of experience, it doesn't sound like you
have really looked at how to solve these with good RDBMSs...

------
firemanx
I work for a company that operates in the energy industry. We utilize both
RDBMS and "NoSQL", both have their purposes that they fit in well. We store
customer account and configuration data in Postgres, and use Cassandra to
store time-series statistics and high write volume data.

I have a background in data warehousing in both Oracle and SQL Server, and was
part of the decision to use a polyglot persistence model. I've got at least a
decade's worth of experience in the DW world, and more as a general developer
before that, so I like to think I've got a relatively credible background in a
variety of data stores.

I haven't looked at Mongo much - its durability concerns and the write lock
stuff pushed me away from it early on (I don't mean to disparage it, but that
was where it was at when I evaluated it) - but Cassandra's configurable
consistency levels and operational story at a cluster level are what sold us
for our time-series data (that, and the ability to construct a sparse timeline
and multiplex reads/writes). For anything we need flexible querying with, we
push it into specialized Postgres DBs.

The level of willful ignorance and vitriol in this thread is kind of amazing.
Most of the really experienced DW guys I know are all looking at HBase,
Cassandra and others because they fit a niche that we've all been looking for
in certain data sets at really large scale. It doesn't mean we're ditching our
relational data stores; it just means we're augmenting them with other tools
because they fit the job at hand. To suggest that one tool is absolutely
perfect for every scenario seems a little short-sighted to me, possibly
driven out of inexperience. I don't mean that as an insult - I know a lot of
guys who've been working on the same data sets for 30 years who really do just
need the one tool - however, you've got to realize there are other data sets
and problems for which your hammer just won't fit.

------
ilaksh
On your home page you imply that you can automatically OCR arbitrary
handwritten receipts into an analyzable format.

No one can do that. That is your problem, not MongoDB.

As far as aggregation, use the new Aggregation Framework
[http://docs.mongodb.org/manual/tutorial/aggregation-
examples...](http://docs.mongodb.org/manual/tutorial/aggregation-examples/):

    
    
        db.zipcodes.aggregate(
          { $group: { _id: "$state",
                      totalPop: { $sum: "$pop" } } },
          { $match: { totalPop: { $gte: 10 * 1000 * 1000 } } }
        )
    
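
For intuition, the pipeline above amounts to a group-and-filter. Here is an
in-memory JavaScript equivalent over made-up sample documents (the data is
illustrative only, not from the zipcodes data set): sum `pop` per `state` with
`$group`/`$sum`, then keep states at or above the threshold with
`$match`/`$gte`.

```javascript
// Made-up sample documents standing in for the zipcodes collection.
const zipcodes = [
  { state: "CA", pop: 8000000 },
  { state: "CA", pop: 4000000 },
  { state: "WY", pop: 500000 },
];

// $group / $sum: accumulate total population per state.
const totals = {};
for (const z of zipcodes) {
  totals[z.state] = (totals[z.state] || 0) + z.pop;
}

// $match / $gte: keep only states with at least ten million people.
const result = Object.entries(totals)
  .filter(([, totalPop]) => totalPop >= 10 * 1000 * 1000)
  .map(([state, totalPop]) => ({ _id: state, totalPop }));
```

Here `result` contains only California, with its two documents summed.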

As far as "losing the independence of your data access paths", no you don't.
You are free to use linking instead of embedding wherever you want.
[http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesi...](http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesign-
EmbeddingandLinking)

MongoDB doesn't have a built-in full text search? So what. Most systems with
large amounts of text to search do not rely on the text search capabilities
built into relational databases anyway. People use actual full-text search
engines like Lucene/Solr, Sphinx, Redis, etc. Having said that, if you just
wanted to support lowercase keyword queries with MongoDB, would it really be
so hard to extract and store lowercase keywords from your text, as suggested
here?
[http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mong...](http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo)

If you are trying to add four 1s and get '1111' instead of 4, that is an error
in your application code which has nothing to do with MongoDB. It's a very
common problem with JavaScript. If it is JavaScript, find the code where you
are attempting numeric addition and change it so that instead of saying, for
example, 'total += newNumber', it says 'total += (newNumber * 1)'.

~~~
nnnnnnnn
"On your home page you imply that you can automatically OCR arbitrary
handwritten receipts into an analyzable format. No one can do that. That is
your problem"

Jeez, lay off the confrontational tone. He doesn't say anything about OCR.
Maybe he's using humans to do data entry? In any event, it's completely
irrelevant to the topic of databases.

~~~
ilaksh
I notice you ignored all of my several very specific points directly related
to his issues with the database system and your only comment was a criticism
about the tone you perceived.

OK, maybe he is using humans to do data entry. The home page to me implies
that the process is automatic, but I guess it doesn't rule out the possibility
of humans doing data entry when he says 'tag and categorize'. But if he is
using humans to do data entry instead of some automatic OCR, that is still his
main business problem, rather than MongoDB. The application is relevant to the
database discussion, and Hacker News is about all aspects of startups.

~~~
nnnnnnnn
It's because I have nothing to say about the database stuff. Why are you so
adversarial? I'm not here to cross swords with you; I don't have an opinion on
the matter.

But I did notice your rudeness, and you're now being rude to me. Totally
uncalled for.

~~~
ilaksh
I was not rude to the poster. I corrected him as far as his misguided
complaints about MongoDB and the main problem for his business. That is the
only way to help him.

I was definitely not rude to you either. You suggested that my comment was
irrelevant, and I pointed out that my comment had a lot of relevant content in
it whereas by your own definition of relevance your comment had none.

------
voidr
Relational databases are awesome as long as you are not dealing with amounts
of data that your current hardware can't handle the relational way. There are
some cases where you have a ridiculous number of rows and you simply can't
store them in a relational database, and you are happy to live without the
benefits of relational databases.

If you have millions of rows, you are probably better off with something like
MongoDB, and if you need to search them, you should probably use something
like Sphinx or Lucene anyway. But if you know that you won't have too much
data for the foreseeable future, you should use relational databases. Or you
could simply use both.

~~~
jeltz
A relational database on very modest hardware can handle millions of rows, and
with a solid database, good hardware and a DBA who knows his shit you can
handle billions.

OpenStreetMap has over a billion nodes stored in a PostgreSQL database.

<http://www.openstreetmap.org/stats/data_stats.html>

My point is that you can get very far with a classic relational database
before you have to scale out horizontally.

~~~
einhverfr
Billions of rows, btw, is not a problem even on modest hardware with a half-
decent db, etc. It's analytics on such data where it possibly becomes an
issue. Retrieving one row out of 1 billion is not that much more complex than
retrieving one row out of 1 million, nor is it that much more expensive
computationally, assuming the right index is in place.
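
A back-of-envelope calculation makes this concrete: an indexed lookup touches
roughly log(N) entries, so going from a million to a billion rows adds only
about ten comparisons in a binary tree (real B-trees have high fanout, so the
gap is even smaller). This is a rough model, not a measurement:

```javascript
// Worst-case depth of a balanced binary search over n keys.
const depth = n => Math.ceil(Math.log2(n));

const million = depth(1000000);     // about 20 comparisons
const billion = depth(1000000000);  // about 30 comparisons
```

A thousandfold increase in rows costs only ~50% more comparisons per lookup,
which is why point reads stay cheap while scans and aggregations do not.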

The problem comes with high concurrency, in particular very high write
concurrency, or with very complex queries which require a lot of RAM to do
properly. But that's where you need a solid db, good hardware, and a solid
DBA.

~~~
jeltz
Very true. If you do not do much with your data it is trivial to handle.

------
nashequilibrium
Taking all these comments into account: if startups have to prototype quickly
while trying to find market fit, does it make sense to start off using
something like MongoDB, with the plan to migrate to another database when your
business starts growing? The database space is so confusing right now. It
seems like Postgres is the safest choice, and I also like this post from Adam
D'Angelo: [http://www.quora.com/Quora-Infrastructure/Why-does-Quora-
use...](http://www.quora.com/Quora-Infrastructure/Why-does-Quora-use-MySQL-as-
the-data-store-instead-of-NoSQLs-such-as-Cassandra-MongoDB-or-CouchDB)

------
bitdiffusion
It's not necessarily all or nothing - I have worked on several projects now
each using multiple database-type options: mongodb for read-intensive, loose-
schema type stuff where the growth is generally predictable (e.g. products,
suppliers, logs), postgres for relational-type stuff (orders) and solr for
searching (I know solr isn't a database but people seem hung up on whether
mongodb supports case-insensitive searching - hint: don't use any database for
search).

I doubt that, unless it's extremely simple, any set of requirements are an
exact match to only one of these technologies... mix and match is the future
:P

------
blaines
Very briefly put, I use MongoDB to start off most new projects.

My primary objective is that my application fulfills its use case. Data is
malleable, so you should use the right tools for your needs.

That being said, it sounds like the OP was trying to use a chisel as a
replacement for a toolbox. Basically fighting the software (mongodb) to fit
his requirements, instead of using additional tools.

<http://blog.8thlight.com/uncle-bob/2012/05/15/NODB.html>

<http://blog.heroku.com/archives/2010/7/20/nosql/>

------
ww520
Wow, the first comment on the blog is so vile. He angrily blamed the "victim"
(OP), calling him a talentless developer.

Tools are enablers, supposed to make ordinary people rock stars. If it takes a
rock star to use a tool, the tool fails.

------
DonnyV
If he had just done 10 minutes of research he would have realized MongoDB
isn't for him. That research would also have shown him that MongoDB has no
data constraints: that's all done in the model in your application.

------
harel
You used a tool without researching it first, you jumped a bandwagon without
finding out its destination, you most likely used it wrong because you didn't
RTFM. Now you ditch and diss it. Grow up.

~~~
mattmanser
Very harsh given that he points out he made these mistakes right at the
beginning of his post.

~~~
harel
Fair enough, perhaps a bit harsh. But from that to calling an entire system a
troll... It works for some, it doesn't for others. Usually it's down to use
case and knowledge of how things work. For me MongoDB was a perfect fit as a
datastore alongside a traditional Postgres instance.

------
bassemali
I've never used Ruby on the application layer, but I'd be wary of using an ODM
with MongoDB. The single most shocking issue seems to be at the application /
ODM level. Using the official 10gen-supported drivers gives you more control
and a better understanding of what's going on every step of the way.

Also, a thorough understanding of MongoDB indexing, advanced queries and
schema design would have squashed all of these issues. Has anybody had a more
pleasant experience with a MongoDB ODM?

------
itaborai83
To be fair, some NoSQL solutions were being sold, marketing-wise, as the be-
all and end-all of data solutions. Just google "mongodb mysql migration" and
look at how everyone is/was so eager to jump on the non-relational bandwagon.
Some backlash was to be expected; after all, we might have reached the Trough
of Disillusionment.

------
anthony_barker
Made the same mistake on a banking project 10 years ago (with domino). The
project in question was a project tracking database.

For accounting type problems use a relational database. For document driven
items - e.g. a resume database - nosql works great. For a hybrid pick your
battles... or use both.

------
manorasa
I think the real lesson here is use the right tool for the right job.

------
effinjames
calm down redditors, he just needed a basic, non-massive-scale solution, and
SQL fit him very well. Large-scale data aggregation needs to address disk and
network latency; that's where NoSQL shines. And if you operate at large scale,
no single simple tool will do justice - remember the quote from Google, 'at
scale everything breaks'?

~~~
Karunamon
_calm down redditors_

 _Please_ knock this crap off.

------
jeremyjh
Upvoted for?

------
aliks
Reasons why I dropped MongoDB: 1\. Try $or and $and combined with $near. 2\.
No B-tree index: count() on 10,000 rows = 100% CPU usage ;) Go to google.com
and search site:jira.mongodb.org/browse/ "planned but not scheduled".

