
Guide to MongoDB for startups - optinidus
http://www.optinidus.com/blogs/guide-to-mongodb-for-startups/
======
chris_mongohq
The title "Guide to MongoDB for Startups" is hand-wavy; the article is more
like a high-level overview of MongoDB architecture. A real "Guide to MongoDB
for Startups" would probably be part of a much larger series on technology
choices for startups. Since I found this one lacking, I'll give you my own
version of a Guide to MongoDB for Startups:

* Do you have a customer yet? If not, technology choices do not matter, go build your product as fast as possible and get a customer.

* Is your system starting to have slow performance during max usage? If so, every system will have a few easy optimizations, find those using something like NewRelic. Aside from those, technology choices do not matter, throw money at any scaling problem (at this point, it typically isn't that much money). This will optimize your time for sales and marketing to get more customers.

* Is your system starting to buckle under the weight of customers? If so, go hire someone who can scale everything you learned from your customers, and not someone who bends to the latest trends on HN.

Here is an article on my philosophy and experience with scaling businesses and
architectures:
[http://blog.mongohq.com/changing-the-growth-formula/](http://blog.mongohq.com/changing-the-growth-formula/)

------
mkohlmyr
I find it somewhat depressing that the first comments on any post about
MongoDB generally seem to be short, disparaging and dismissive without
offering much if any substance.

We use MongoDB in production, processing millions of new records every month
and it works great. Our data has differing and evolving schemas and can be
used as JavaScript objects or python dictionaries with little effort. It works
and performs well for us and our use case.

I'd be delighted to find out what these nay-sayers are doing that MongoDB must
fail so incredibly at for them to have such strong negative opinions.

I'm not speaking only to the existing comments on this post but the general
theme of past MongoDB submissions.

~~~
thisishugo
There was a spate of high-profile "why we moved from MongoDB to
$other_database" articles a year or so ago from startups that got burned when
they realised that the all-singing, all-dancing cure-all database they
(thought they) had been sold in fact wasn't.

The issues encountered ranged from MongoDB failing to scale out as easily as
promised, to significant loss of data. There was also some backlash because
MongoDB didn't (and perhaps still doesn't) persist data to disk when it
acknowledged it as received[0].

A general perception grew, rightly or not, that MongoDB was being marketed by
10Gen[1] over and above its capabilities. Since then, the actual message of
"MongoDB is not a drop-in replacement for traditional RDBMSs like PostgreSQL
and MySQL, and is not the ideal solution to every problem" has been filtered
down to "MongoDB is a terrible product with no use cases whatsoever." Such is
the effect of the HN echo chamber.

Much like PHP, MongoDB is now simply a product that you cannot say anything
good about on HN, even if you are in fact finding it an effective tool for
your use case in spite of its shortcomings.

[0] Cynically, because they were trying to win benchmarks. Less cynically,
because the use case was semi-ephemeral data where possibly losing some data
is acceptable.

[1] Now also called MongoDB

~~~
dougcorrea
Do you think it is a MongoDB problem, or do all NoSQL DBs have the same problem?

~~~
camus2
Other NoSQL solutions were not marketed as a replacement for relational
databases. MongoDB might be better now, but you can't take managers for idiots
and expect them to buy into your product again. Right now MongoDB only lives
because of JavaScript and Node.js, since Node.js sucks big time with anything
that is not MongoDB, library- and driver-wise.

~~~
baudehlo
Absolute rubbish. The node Postgres driver is excellent and has performed
flawlessly for us.

------
anuraj
Startups should use a relational database unless their requirements (current
and future) exactly fit a nosql model - which is difficult to predict. You can
migrate later if scalability becomes an issue. Going nosql first (considering
that most startups lack enough design experience) will make things messy and
would make it difficult to scale when needed.

------
nasalgoat
Here's my one-word guide:

Don't.

~~~
hydrogen18
I came here to say this as well.

~~~
tptacek
You came here to write a one-word comment about a database too, found it
already written, so instead decided to write "me too"?

Or did you instead have a detailed and carefully plotted comment about the
problems with Mongodb, but then you read "nasalgoat's" comment and found his
concision so bold and bracing that you were intimidated from contributing your
comment?

If so, I urge you to reconsider.

------
je42
MMh. I would have expected a one-liner: "Use something else" ? :)

------
pdq
Note to the designer of this page: white background with light gray foreground
text is a bad UI decision. Users need more contrast, as this page looks faded.

------
danbruc
_With the increasing data the search engines, e-commerce stores need to
provide us with accurate results. Which means processing huge amount of data.
Which SQL databases were never designed for._

What? There is no limit on how much data you can stuff into a relational
database and it mostly works really well. There are definitely some scenarios where
relational databases are not optimal, for example aggregating huge datasets,
but there are often solutions like materialized views or aggregates. The
statement that RDBMSs are not designed to handle huge datasets is plain wrong.

 _With these growing needs the it was getting even more difficult to define a
fixed structure to the data and the need to have a solution which handles
unstructured data grew even more._

I don't understand how one's inability to come up with a good data model is
related. If you have no data model you will have a hard time working with your
data anyway. You can stuff unstructured data into a string column and modern
RDBMSs also support semistructured data like XML and JSON including operations
to manipulate such data. And even with XML and JSON you still have a data
model, just not a relational one.

I agree that the relational model is somewhat stiff and it is - somewhere
between sometimes and often - a pain to design, implement or evolve a schema.
But a fair amount of the complexity usually comes from the domain you are
modeling and does not go away when you switch to a different data model - your
database may not complain when you stuff documents with seven different
schemas into it, but now the burden is on the code to deal with documents with
seven different formats. Data modeling and model evolution is a difficult
problem and does not go away by switching technologies. The major difference
is the point in time when you recognize that you have a problem.
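
To make the "burden is on the code" point concrete, here is a minimal Python
sketch of what the application ends up doing once documents with several
historical shapes live side by side. The field names (`name`, `email`,
`emails`, `schema_version`) and the three versions are hypothetical, invented
only for illustration; they do not come from the article or from MongoDB.

```python
def normalize_user(doc):
    """Map any known historical shape of a user document to the current one.

    Hypothetical versions, assumed for this sketch:
      v1: {"name": "First Last", "email": "..."}
      v2: {"first_name": ..., "last_name": ..., "email": "..."}
      v3: {"first_name": ..., "last_name": ..., "emails": [...]}
    """
    doc = dict(doc)  # work on a copy, leave the stored document untouched
    version = doc.get("schema_version", 1)  # oldest documents carry no version

    if version == 1:
        # v1 kept the full name in a single string; split on the first space.
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        version = 2

    if version == 2:
        # v2 kept a single address; v3 expects a list of addresses.
        email = doc.pop("email", None)
        doc["emails"] = [email] if email else []
        version = 3

    doc["schema_version"] = version
    return doc
```

Every reader of the collection has to go through something like this, and
every new shape adds another branch. The database never complains, but the
migration logic still exists; it just lives in application code.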

 _So most of the NoSql databases traded consistency for Availability and
Partition Tolerance which was a fundamental shift from the relational world
where in you had to design your schema perfectly so that there were no
inconsistencies in the data ( 1NF, 2NF, 3NF, BCNF)._

Consistency in the sense of the CAP theorem and in the sense of database
schema normalization are almost completely unrelated. This gives me the
impression that the author does not really know what he is talking about.

 _One of the biggest advantages claimed by NoSql databases is Horizontal
scalability , which is the ability to handle more load by adding in more
machines, since the computing power these days has gone cheap compared to the
effort required in to fine tune the app and add more computation power and
memory to an existing machine._

RDBMSs support partitioning across servers as well. It is probably harder to
set up and does not scale as well because the provided guarantees are
stronger, but it is possible.

 _MongoDB uses reader-writer locking mechanism , it gives concurrent access to
reads but exclusive rights for write operations which means it can handle
concurrent read operations but if there is aright operation it will block the
reads until it gets completed ._

So do many RDBMSs. Or they use MVCC and allow even more parallelism.
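
For readers unfamiliar with the term, a reader-writer lock can be sketched in
a few lines of Python. This is a generic textbook version built on
`threading.Condition`, not MongoDB's or any RDBMS's actual implementation, and
it makes no attempt at fairness (a steady stream of readers can starve a
writer).

```python
import threading


class ReadWriteLock:
    """Many concurrent readers, or exactly one writer, never both."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently holding the lock
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            # Readers only wait for an active writer, not for each other.
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                # The last reader out wakes any waiting writer.
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            # A writer needs exclusive access: no readers and no other writer.
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

The granularity question discussed in this thread is about how much data one
such lock protects: a whole server instance, one database, a page, or a row.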

 _Prior to mongodb version 2.2 mongodb had an instance level lock which means
that whenever there was a write operation it used to lock the entire mongodb
instance and even if there was a read queued for a different database it will
have to wait as the write operation blocked the entire mongod instance.

This was changed in 2.2 where in the write operation locked the whole database
instead of the complete instance. So the solution which was left to scale the
write operations was to add in more shards and route the next write query to a
separate mongodb shard instance._

This is still extremely inferior to modern RDBMSs, which usually support
row-level locking.

 _For applications this is by far the most important point to take into
consideration when choosing which database to use, as for write intensive
application this might come in their way to scale._

RDBMSs allow you to opt out of consistency - if you want to read uncommitted
data, you are usually free to do so. I don't see why an RDBMS should
intrinsically allow less parallelism, but admittedly dropping consistency
guarantees is quite contrary to the reasons you usually choose an RDBMS to
begin with.

UPDATE: I probably misinterpreted the last part about locking and scalability
- after reading it again, it sounds like the author actually warns that using
MongoDB may cause scalability issues. I leave my comments as they are although
they sound a bit strange in that light.

TO-THE-DOWN-VOTERS: I would love to hear where I am wrong, my opinion is
neither set in stone nor absolute truth.

~~~
aidanhs
_This is still extremely inferior to modern RDBMSs which usually support row
level locking._

I suppose row-level locking is a necessity when you want to be able to scale
vertically. By comparison, I understand that MongoDB doesn't even try to
support vertical scaling so it makes sense to not bother with complex locking
systems.

Personally, I'd prefer to have the option of scaling without being forced to
increase the number of moving parts in my single most important system (the
database)...but the MongoDB docs make the fair point that there's a ceiling to
vertical scaling. It's probably higher than most people think though.

As an aside, I discovered recently that Informix supports byte-level locks for
'smart large objects' (namely CLOBs and BLOBs). Makes me wonder whether field
level locking will appear some day.

~~~
AlisdairO
The ceiling to vertical scaling is pretty damn high. Last I checked, Stack
Overflow still ran on a master-slave pair of SQL Server boxes with 64 gigs of
RAM each. That's not exactly big iron these days.

Locking has a degree of complexity to it, but implementing page locking (for
example) is not _that_ hard. The fact that the Mongo guys haven't is probably
symptomatic of the fact that they're still relying on the OS to cache data,
rather than implementing their own page manager like most other DBMSs. That
said, given the short duration of Mongo locks, lock granularity isn't as big a
deal with Mongo as some make it out to be.

------
dkhenry
I guess this is a good introduction to some of the internals; however, it's
very, very high level. If I were to give one piece of advice it would be to plan
out your primary keys according to your data access pattern. This is especially
true if you know you will need to be sharding across multiple nodes.
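
A toy Python sketch of why the key matters when sharding. It assumes four
shards, auto-incrementing integer ids, and two made-up routing functions; none
of this is MongoDB's actual routing code. With range partitioning, the newest
ids all land on the last shard, while hashing the same ids spreads them out.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4        # hypothetical cluster size
MAX_KEY = 1_000_000   # hypothetical upper bound of the id space


def range_shard(key):
    """Range partitioning: each shard owns one contiguous slice of the ids."""
    return min(key * NUM_SHARDS // MAX_KEY, NUM_SHARDS - 1)


def hash_shard(key):
    """Hash partitioning: the shard is derived from a hash of the id."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# The 1000 most recent inserts when ids are handed out in increasing order.
recent_ids = range(999_000, 1_000_000)

range_load = Counter(range_shard(k) for k in recent_ids)
hash_load = Counter(hash_shard(k) for k in recent_ids)

print("range partitioning:", dict(range_load))
print("hash partitioning: ", dict(hash_load))
```

The trade-off runs the other way for reads: hashing spreads writes evenly but
turns a range query over ids into a query against every shard, which is why
the choice should follow the access pattern.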

------
mcot2
Any mention of mongodb these days should really be mentioning tokumx. It's got
real transactions, compression, hot backup, and optimized (sequential)
read/writes.

------
lkrubner
MongoDB offers the greatest benefit to those who have an evolving concept of
their schema, and that tends to be startups, though I have worked in large
firms that entirely re-invented their schemas. I worry that I would seem
tedious if I listed the places that I have worked, and yet, on Hacker News,
when I speak in abstract terms, I tend to get downvoted, so I will name a few
specifics.

I worked at Wine Spectator for a year (2010-2011),
[http://www.winespectator.com/](http://www.winespectator.com/) . They had
built their first web site circa 2000 using Oracle, Sun Solaris, and Vignette
with Java templates. Circa 2009 they decided to scrap the old, expensive
system and move to PHP/MySql and the Symfony framework. They could not decide
what their new schema should be, so, in the name of keeping things flexible,
they decided that all data would be in a single table. This table had 240
fields, most of them with generic names such as "modifier_01" and
"modifier_02". This was an organization that was looking for the flexibility
offered by MongoDB, but they tried to cram that flexibility into a relational
database, and they did so by ignoring all the relational features offered by
MySql. This "one size fits all" database table did not work for the last
assignment I was given: import all the old FileMaker Pro databases to a system
running Mysql/PHP/Symfony. I built an entirely different project, with its own
database and schema. Lord knows who is maintaining it now.

Then I worked a year (2011-2012) at Shermans Travel,
[http://www.shermanstravel.com/](http://www.shermanstravel.com/) , which also
tore apart its database. When I first arrived they were trying to save a
system built with MySql and PHP and later forced to conform to the CakePHP
framework. The database had over 300 tables, many of which were no longer in
use. Of the tables that were in use, many had fields that were no longer in
use. The code, and the database, were a sprawling mess, that had evolved
chaotically. (I am emphasizing this chaos because this criticism is often made
of MongoDB: without a schema then how do you keep your data organized? Well,
most of the places I have worked have had relational databases where the data
was completely disorganized). After a few months, the CTO and the tech team
decided on a complete re-write of the code. The tech team was allowed to vote
for either Java or Python or Ruby (no one wanted to use PHP). We voted for
Ruby. We rebuilt the site as 6 apps, using MySql for some of the apps and
MongoDB for some. My last big project there was a rescue effort for a broken
group of 4 database tables in MySql. There was a "users_history" table that
was supposed to track whether a user had subscribed or unsubscribed to various
newsletters we offered, but there had been a bug, apparently for years, such
that many of the "unsubscribe" attempts were not recorded. There were 3 other
tables with somewhat redundant data, and I wrote a script that scanned those
other 3 tables and attempted to funnel the correct data to the 4th table.

I have many more stories like this. I could write a whole book about places
where Oracle, MySql or PostgreSQL was in use, but the data was badly organized.
Unused tables and unused fields are extremely common.

Why do I emphasize the chaos I have encountered? Because the charge of badly
organized data gets thrown at MongoDB a lot. If you would like to read a
scathing attack against MongoDB, read this:

[http://www.sarahmei.com/blog/2013/11/11/why-you-should-never...](http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/)

But to me, this line of argument compares the platonic ideal of relational
data against the actual use of MongoDB. Maybe if Edgar F. Codd designed your
schema to the 7th Normal Form then your schema really is well organized, but I
have not seen anything like this in real life.

What I have seen, in real life, convinces me that every organization has an
informal schema that is constantly evolving, and which can not be maintained
with anything like regularity. Most of the organizations I've been hired at
default to rebuilding everything every 5 or 6 years, because by that point the
old system has grown chaotic. Sarah Mei's description of the dangers of
MongoDB matches my own experience with relational databases: "we figured out
that we had accidentally chosen a cache for our database."

What I like about MongoDB is it openly, boldly declares that the chaos I've
seen is typical, and it facilitates the evolution of the schema which is going
to happen no matter what you do or say. Things evolve, often chaotically. A
programmer has a great idea and works on it in 2007, another programmer takes
over in 2008, the project starts as raw PHP, later is imported into Symfony,
then is re-written in Ruby, then it is broken up into several small apps.
Someone quits. The CTO is fired. Someone new starts working and, for the sake
of simplicity, prefers doing as much as possible as a background task, using
cron scripts. A year later someone joins and is disgusted with the profusion
of cron scripts, they want everything organized around a message queue. A very
good sysadmin joins the team at a time when most of the programmers are weak,
the sysadmin re-writes many of the background scripts, but he prefers Perl for
everything and he implements some data caching strategies that no one
understands.

You may think that I am exaggerating the level of chaos I have seen. I have
not worked for Facebook or Google or Apple and if you tell me that in those
companies everything is well run and well organized, then I will believe you,
as I have no reason to doubt you. But I have worked at a lot of older media
companies in New York City, and what I have seen is constant churn, churn at
every level, churn in the team, churn in the technologies, and churn in the
database.

I know I will be misunderstood, so let me try to clarify this:

I am not saying that chaos is good.

I am not saying that MongoDB is good because it encourages chaos.

I am saying that chaos is a symptom of the fact that most businesses do not
know what their schema should be, and even if they did know what their schema
should be, their needs would be different a year from now. The real schema
needs of the organization (that is, what sets of data should be acquired and
what the relations should be between those sets) are undergoing constant
evolution, and this evolution is necessary, healthy, and unstoppable.

What is the strength of a relational system? Consider Wikipedia's explanation
of Codd's Theorem:

[http://en.wikipedia.org/wiki/Edgar_F._Codd](http://en.wikipedia.org/wiki/Edgar_F._Codd)

"The domain independent relational calculus queries are precisely those
relational calculus queries that are invariant under choosing domains of
values beyond those appearing in the database itself. That is, queries that
may return different results for different domains are excluded. An example of
such a forbidden query is the query "select all tuples other than those
occurring in relation R", where R is a relation in the database. Assuming
different domains, i.e., sets of atomic data items from which tuples can be
constructed, this query returns different results and thus is clearly not
domain independent."

Clearly, this assumes that the relations among the data are known. The
organizations that I work with have no real idea about what relations they
want to establish among their data. They are in a permanent exploratory phase.
I believe these organizations could be described as "pre Codd", but most of
them have been "pre Codd" for decades, and they will always be "pre Codd". If
you force them to specify relations among their data, you will get answers
exactly as useful as these:

[http://blog.jimmyr.com/Funny_student_Exam_Answers_13_2008.ph...](http://blog.jimmyr.com/Funny_student_Exam_Answers_13_2008.php)

MongoDB is useful in this context. Start acquiring data. Don't pretend you
know what your schema is. You do not know what your schema is. The schema is
changing all the time anyway.

Is there a place for relational databases? Yes, because sometimes some parts
of the business become steady for some length of time, and for that part of
the business, capturing fixed sets of data, with fixed relations, is very
useful. But we should not pretend that this situation holds where it does not.
I am not convinced that this is even the general case, though there is an
overwhelming tendency in computer science, and in business, to pretend that
fixed-sets-with-fixed-relations is the general case. If you feel it is, then
you have been working at places facing conditions far more steady than what I
have seen, or perhaps you are simply considering a shorter time frame than I
am.

~~~
AlisdairO
The thing is, you still have a schema. You always have a schema. It's just
defined implicitly in your application rather than explicitly in your
database. I agree that during dev time Mongo conveniently eliminates the
hassle of recreating your DB all the time, but I just don't see how it makes
the problem of schema easier. And even if you desire this flexibility, I can't
see the advantage over using Postgres with a JSON type for the flexible parts
of your data, and relational data for the bits you know about.

And without knowing your schema in advance, how do you ensure consistency? If
you want consistency with mongo you basically need to either group all data
that will get modified in any one logical operation into a single document, or
you need to have (potentially extremely complex) strategies for resolving
conflicts in between modifications to groups of documents. Achieving either
implies a decent bit of domain knowledge.
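
A small Python sketch of the "single document" approach, assuming a
hypothetical order document with `items`, `total_cents` and `item_count`
fields. `$push` and `$inc` are real MongoDB update operators, and an update to
a single document is applied atomically; the document shape and the helper
name are made up for illustration.

```python
def build_add_item_update(sku, qty, price_cents):
    """Build one update that changes every field touched by 'add an item'.

    Because the line items and the running totals live in the same order
    document, one update keeps them consistent with each other.
    """
    return {
        "$push": {"items": {"sku": sku, "qty": qty, "price_cents": price_cents}},
        "$inc": {"total_cents": qty * price_cents, "item_count": qty},
    }


update = build_add_item_update("A1", 2, 500)
print(update)

# With pymongo this would be sent as a single call, for example:
#   orders.update_one({"_id": order_id}, update)
# where `orders` and `order_id` are assumed to exist in the application.
```

Note that deciding up front that items and totals belong in one document is
exactly the kind of domain knowledge described above. If the totals lived in a
separate collection, the application would need its own strategy for keeping
the two in step.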

------
corresation
As mkohlmyr mentions, it is unfortunate that MongoDB is offhandedly dismissed,
however by the second paragraph of the linked article you have to appreciate
how such hostility comes about.

 _But what is it that SQL databases could not solve which lead to the
evolution of NoSql databases_...{some "big data" changes to the world}...
_Which means processing huge amount of data. Which SQL databases were never
designed for._

Firstly, specialized data storage and retrieval, including unstructured or
document oriented, existed long before SQL did. Your filesystem is just such a
system. Everything old is new again.

And secondly, a citation is required for the SQL databases "were never
designed for" huge amounts of data claim. To start with, a definition of huge
is necessary to make such a claim. 1GB? 1TB? 100TB? There are SQL solutions
that work with all of those with ease. Many of the largest databases on the
planet are humming away on SQL systems right now. SQL is abstract from the
underlying platform, so if you have a cluster of 100 machines each in front of
1000 storage arrays each in front of 100 SSDs, it doesn't suddenly become
"NoSQL".

SQL databases are generalized solutions. They generally do not solve specific,
individual problems as well as precisely engineered solutions, which is why
giant companies like Google, with extremely precise needs (e.g. index the web)
have solutions that do a much better job for their purposes. Does that apply
to _your_ needs at all, though? Are you looking at your specific requirements,
engineering precise storage and retrieval that is optimal? Probably not.
Saying "MongoDB is under the umbrella of NoSQL, and NoSQL also kind of
encompasses highly specialized solutions from industry leaders, so it will
work for my startup's contacts database" is very, very poor, misleading
reasoning. But we see it all of the time on HN in regards to solutions like
mongodb. Which again is how you get responses that might seem hostile.

