
Go Cry on Somebody Else's Shoulder: MongoDB is fine - slyphon
http://blog.slyphon.com/post/12435929063/go-cry-on-somebody-elses-shoulder-mongodb-is-fine
======
pilif
In my opinion, there is but one feature that a database really must have:
whatever data I write into it, if I don't delete it, I want to read it back -
unaltered (preferably without needing at least three machines, but I'm willing
to compromise).

Software that cannot provide this single feature just isn't something I would
call database.

Whether it's unsafe default configurations or just bugs, I don't care.

Between these two articles over the weekend and some from earlier, I
personally don't trust MongoDB to still have that feature, and it will take
much more than one article with a strongly worded title to convince me
otherwise.

~~~
smacktoward
But wouldn't you rather pin your mission-critical data to a table and give it
a night it will never forget? :-D

<http://www.youtube.com/watch?v=b2F-DItXtZs>

~~~
Luyt
And with transcript: <http://www.mongodb-is-web-scale.com/>

------
gfodor
The author would have us believe that it's unfair to pick on any piece of
software because it "all sucks." They'd also have us believe that complaining
about your data disappearing in MongoDB is an unfair criticism, and then they
take the logical leap that criticizing the destruction of data and buggy
software somehow says something about your own ability to create backups.
Generally speaking, the people who have been burned by MongoDB have survived
because _they had backups_. That has nothing to do with the fact that their
database nuked their data, which is unacceptable when it happens due to
careless engineering or poor defaults.

Edit: To be fair, if MongoDB were advertised as a "fault-intolerant, ephemeral
database that is fast as heck but subject to failure and data loss, so do not
put mission-critical information in it," then all bets would be off. But we
know that's never going to happen.

~~~
bluemoon
As he highlights, it is software; it will likely have bugs.

~~~
Joeri
Yes, but I've been using Oracle databases for almost a decade and have never
known them to drop data on the floor through bugs (only through user error).
I'm not saying it doesn't happen, just that it's not a common event. It seems
that with Mongo you should expect data loss.

~~~
wisty
Mongo people may say that's a good thing - if you aren't _planning_ on data
loss, you are just begging for a disaster. And Mongo will force you to deal
with recovery early on.

That's no excuse for the DB being buggy, but _some_ of Mongo's problems are
due to hard design constraints - it's not easy to make a DB that is fast,
reliable, and easy to configure. Others are due to it being immature. Some of
it is concerning - it seems it can crumble under heavy write load, which is
not so great for a DB whose selling point is "fast at scale".

Part of Mongo's charm is how it works on a stock system. Traditional DBs
cache data in RAM, then the OS caches what they cached and swaps their cache
to disk. Then you modify something, the OS swaps the DB cache from disk back
to RAM, the DB tells the OS to write the change to disk, invalidating your OS
disk cache, which then... you get the picture. Mongo (and Couch) just use the
OS's cache, which is suboptimal on a tuned machine, but optimal on something
you threw together.
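
A toy sketch of what "using the OS's cache" means in practice (this is not
Mongo's actual storage code, just the memory-mapped-file idea it is built on):
the process writes into a memory-mapped file, and the kernel's page cache
decides when the bytes actually reach disk.

    # Toy illustration of mmap-backed storage: a "write" is just touching
    # memory; the OS flushes the dirty page to disk whenever it sees fit.
    import mmap, os

    fd = os.open("datafile.bin", os.O_RDWR | os.O_CREAT)
    os.ftruncate(fd, 4096)        # preallocate one page
    buf = mmap.mmap(fd, 4096)     # map the file into our address space
    buf[0:5] = b"hello"           # lives in the OS page cache for now
    buf.flush()                   # msync: force it to durable storage
    buf.close()
    os.close(fd)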

~~~
rdtsc
> if you aren't planning on data loss, you are just begging for a disaster.

General rule for picking products -- if a product, by design, is supposed to
teach you a lesson about backup strategies, don't use that product.

~~~
wisty
No, just that there's an upside to their risky design philosophy.

I _like_ Mongo because of its documentation. It's really, really great. And
good documentation = widespread adoption, and a team that actually cares about
users' needs. What they really need is a lengthy tutorial on backups (which
they have already written), linked from every page of their documentation,
because their reliability trade-offs are not something they should be hiding.

~~~
rdtsc
Sure, there is an upside, nothing against that, but the trade-off they made
should have been advertised on their front page (before they fixed the
defaults) in large, bold, flashing letters -- "you might lose your data if you
use this product with default options". That is all.

Why? Because they are making a database, not an RRD logger or an in-memory
caching server.

> What they really need is a lengthy tutorial on backups.

As I put it in the grandparent post: as a general rule, avoid products whose
mission, by design, is to teach you backup discipline. That is all.

> a team that actually cares about users' needs.

You know what is a better way to care about users' needs? Not losing their
data because of a bad design. We are not talking about generating the wrong
color for a webpage, or even about exceptions being thrown and the server
needing a restart; we are talking about data being corrupted silently, without
users noticing. Guess what: even backups become useless. You have no idea your
data is corrupted, so you keep backing up corrupted data.
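
A minimal sketch of one defense against exactly this failure mode (purely
illustrative, not from the thread): checksum each record at write time, so a
read or a backup-verification pass can at least detect silent corruption
instead of faithfully copying it forward.

    # Store a checksum alongside each record; verify it on the way out.
    import hashlib, json

    def wrap(record):
        payload = json.dumps(record, sort_keys=True)
        return {"payload": payload,
                "sha1": hashlib.sha1(payload.encode()).hexdigest()}

    def verify(stored):
        digest = hashlib.sha1(stored["payload"].encode()).hexdigest()
        if digest != stored["sha1"]:
            return None               # corrupted: don't back this up blindly
        return json.loads(stored["payload"])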

------
cap10morgan
The difference between MongoDB and many of the other popular persistent data
stores (relational or not) is one of degree, not of kind.

MongoDB isn't a _fundamentally_ flawed system. It's just that the distance
between what 10gen (and many of its defenders) claim and what MongoDB
delivers is much greater than for most other data storage systems. This is a
subtle thing.

Many people have attempted to use MongoDB for serious, production
applications. The first few times they encounter problems, they assume it's
their fault and go RTFM, ask for help, and exercise their support contract if
they're lucky enough to have one. Eventually it dawns on them that they
shouldn't have to be jumping through these hoops, and that somewhere along the
way they have been misled.

So it's not like anyone is misinterpreting the purpose and/or problem domain
of MongoDB. It's more that they are exploring the available options, reading
what's out there about MongoDB, and thinking, "Gosh, that sounds awfully cool.
It fits what I'm trying to build, and it doesn't seem to have many obvious
drawbacks. I think I'll give that a try." And then they get burned miles
further down the road.

If MongoDB were presented as more of an experimental direction in rearranging
the priorities for a persistent data store, then that would be fine. That's
what it is, and that's great! We should have more of those. But when it's
marketed by 10gen (and others) as a one-size-fits-all, this-should-be-the-new-
default-for-everything drop-in replacement for relational databases, then it's
going to fall short. Far short.

------
cperciva
_I hate to break it to the poster (and I would if they hadn’t chickened out
and actually put their name on their post) but software has bugs._

This is not a valid excuse. This is like running a red light, smashing into
someone, and then telling them "hey, you should have looked before entering
the intersection... you should know that people sometimes run red lights".

Yes, you should have backups. No, that doesn't make data-loss bugs any more
excusable.

~~~
nullymcnull
The anon poster claimed to have deployed an early version of Mongo, at a "high
profile" company with tens of millions of users, and yet seemed surprised by
basic RTFM facts like 'must use getLastError after calls if you need to ensure
your write was taken', even well into a production deploy. That should raise
huge alarm bells for anyone who is considering taking the guy seriously.
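
For reference, the "basic RTFM fact" in question looks something like the
following sketch, using the pymongo driver as it worked around that era
(details vary by driver and version, and handle_failed_write is just a
hypothetical application hook):

    # Fire-and-forget vs. acknowledged writes in old-style pymongo.
    from pymongo import Connection
    from pymongo.errors import OperationFailure

    db = Connection("localhost")["mydb"]

    # Default: the driver returns before the server confirms anything.
    db.events.insert({"user_id": 42, "action": "signup"})

    # safe=True makes the driver issue getLastError after the insert and
    # raise OperationFailure if the server reports the write failed.
    try:
        db.events.insert({"user_id": 42, "action": "signup"}, safe=True)
    except OperationFailure as err:
        handle_failed_write(err)  # hypothetical error handler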

It's just not clear that there were bona fide 'data-loss bugs' in play here.
It seems at least as likely that misuse and misunderstanding of Mongo led to
data loss that could have been avoided.

So, I'd revise your simile. This is more like ignoring a lot of perfectly
safe roads that lead to where you're trying to go, instead choosing to chance
a more exciting-looking shortcut filled with lava pits and dinosaurs. And
putting on a blindfold before driving onto it.

Look, NoSQL is wild and woolly and full of tradeoffs; that's a truism by now.
If you use such tech without thoroughly understanding it, and consequently run
your company's data off a cliff, _absolutely_ it's on you. Mongo does not have
a responsibility to put training wheels on and save naive users from
themselves, because there should not be naive users. These are data stores,
the center of gravity for an application or a business. People involved in
choosing and deploying them should _not_ be whinging about default settings
being dangerous, or about not getting write confirmations _when they didn't
ask for write confirmations_, etc. There's just no excuse for relying blindly
upon default settings. Reading the manual on such tech is not optional. Those
who don't, and run into problems, would be well-advised to chalk it up as a
learning experience and do better next time. Posting "ZOMG X SUCKS BECAUSE I
BURNED MYSELF WITH IT" is just silly, reactionary stuff, and it depresses me
that HN falls for it and upvotes it like it's worth a damn, every freaking
time.

~~~
foolinator
I still can't find a company processing millions of dollars that uses MongoDB
without the help of real databases.

~~~
ceejayoz
Why would you? That's not what it's made for.

No one's doing nuclear physics simulations using JavaScript, but that doesn't
make it useless for client-side validation.

------
cmer
Mongo is fine until it's not. It's been fine for us for many months, but once
you hit its limitations, it's pretty horrible. We're in this situation right
now and we're seriously considering moving back to MySQL or Postgres.

Basically, "it doesn't scale" unless you throw tons of machines/shards at it.

Once they fix a few of their main issues, such as the global write lock, and
fix many of the bugs, it could become an outstanding piece of software. Until
then, I consider it not ready for production use in a write-intensive
application. Knowing what I know now, I certainly would not have switched our
data to MongoDB.

~~~
ubernostrum
_Mongo is fine until it's not._

My experience is that this is true of _every_ database system (relational or
non-). The thing is that they all break in different ways at different points,
and so the smart thing to do is make choices based on that information.

The stupid thing to do is write blog posts about how Software Package X sucks
and nobody should use it for anything.

~~~
CaptainZapp
I have been working with databases in extremely high OLTP workload
environments for 20 or so years.

We're talking enterprise products, mostly Sybase, some Postgresql and very
little Oracle.

Have I encountered bugs?

Sure, tons of them. Some of them grave enough to render the specific version
of the database software unusable in the context of the project I worked on.

However, in all this time I have probably dealt with no more than three to
five corrupt databases, and none of them went corrupt due to a database bug.
Usually it was related to hardware failure.

Arguing that database corruption is inherent in the design of the product is,
from a database perspective, beyond the pale.

A database "breaking" is absolutely not the same as a database blasting your
data into corrupt confetti.

~~~
ubernostrum
So...

If you actually go through the various stuff posted, you find a recurring
theme: people who lose data fall into a pattern of "well, they told me not to
do this, but I did it anyway, so now it must be their fault".

Which, I think you'll find, is a far cry from "database corruption is inherent
in the design".

But hey, learning that sort of thing would require reading; much easier to
jump on a bandwagon, badmouth a product and downvote anyone who disagrees,
amirite?

------
tptacek
I'm not sure how hosting a video for Gawker builds a lot of credibility for
having field tested a database; perhaps there are more details he can provide
about how that is a particularly interesting trial for a database. Among other
things, that seems like "lots and lots of reads, very few writes" and a "very
consistent access pattern regardless" kind of situation.

~~~
slyphon
Ha! Fair point. I thought it was an interesting trial in that all updates to
our user data wound up being published into MongoDB. All the other tools we'd
tried for this purpose - CouchDB, MySQL with both MyISAM and InnoDB, and even
"thousands of .js files in a hashed directory structure" - didn't perform as
well. It allowed us to shift the load from our MySQL database to "something
else", since we were getting killed during our spikes. It was a read-heavy
workload in that case.

The thing that struck me about the original post was how it seemed some of the
complaints were just normal things that people learn when dealing with
clusters under load. "Adding a shard under heavy load is a nightmare." Well, I
mean, _duh_. If you add a shard and the cluster has to rebalance, you're
_adding load_. It's like how you're more likely to get a disk failure during a
RAID rebuild. The correct time to add a shard is during the off hours.

------
sqrt17
New slogan: "MongoDB. Better than JSON file blobs on an NFS store."

~~~
tomlin
JSON file blobs don't require any configuration on my part, so IMHO, JSON is
better than MongoDB.

~~~
randomdata
Unless you only need single-key indexes, your custom-made indexer/querier is
going to require _a lot_ of configuration. And if you only need single-key
indexes, you wouldn't choose MongoDB anyway.
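
To make the point concrete, here is a toy single-key "indexer" over a
directory of JSON blobs (hypothetical layout, nothing from the thread). The
single-key case really is a dozen lines; compound keys, range scans, and
updates are where the real work starts.

    # Scan a directory of JSON files and index them by one field.
    import json, os

    def build_index(directory, key):
        index = {}
        for name in os.listdir(directory):
            with open(os.path.join(directory, name)) as f:
                doc = json.load(f)
            index.setdefault(doc.get(key), []).append(name)
        return index

    by_user = build_index("blobs", "user_id")  # filenames for each user_id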

------
zobzu
I love articles that start with attacks in the title, especially cheap ones
such as "go cry elsewhere", instead of giving real arguments.

6th grade all over again.

------
andrewf
Can somebody recommend a database with an API like Mongo's, but with
performance and durability more like PostgreSQL's or Oracle's?

What I want to do is throw semi-structured JSON data into a database and
define indexes on a few columns that I'd like to do equality and range queries
on. Mongo seems ideal for this, but I don't need its performance, and I want
durability and the ability to run the odd query that covers more data than
fits into RAM, without completely falling over.

Right now, the alternative is to do something like the following in Postgres,
and have the application code extract a few things from the JSON and duplicate
them into database columns when I insert data.

      CREATE TABLE collected_data(
        source_node_id TEXT NOT NULL,
        timestamp INTEGER NOT NULL,
        json_data TEXT);
      CREATE INDEX collected_data_idx ON collected_data(source_node_id, timestamp);
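
The application-side half of that approach might look like the following
sketch (psycopg2 assumed as the driver; any driver would do): extract the
indexed fields from each JSON record and duplicate them into real columns at
insert time.

    # Duplicate the indexable fields into columns; keep the raw JSON too.
    import json
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")

    def store(record):
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO collected_data (source_node_id, timestamp,"
                " json_data) VALUES (%s, %s, %s)",
                (record["source_node_id"], record["timestamp"],
                 json.dumps(record)))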

~~~
socratic
This may be what you mean, but a common approach seems to be to create extra
tables that act as sort-of-implicit, schema-less indexes, e.g.:

<http://news.ycombinator.com/item?id=496946>

A PostgreSQL-specific alternative might be to write triggers in one of the
provided procedural languages to turn your JSON into something indexed or
materialized elsewhere.

Do either of those work for you?

Also, purely out of curiosity, do you have a design reason for only wanting to
store schema-less JSON, or have you just been burned by slow database
migrations in the past?

There seems to be a big community of people who really want to reject schema
and use JSON for everything, and I'm really curious if they (a) don't
understand relational databases, (b) are getting some surprising productivity
gains somehow, (c) have been burned by slow database migrations in the past,
or (d) some other reason.

~~~
andrewf
All of the above would work, but feels less than ideal. I'm pretty comfortable
using a well-schema'd relational database to manage data, but I don't think it
fits something I'm working on atm.

I'm collecting and parsing data from a few different types of sources (think:
some web page scrapers, Twitter, RSS feeds) for later analysis. I want an
intermediate data store where I can throw all of the data together for
querying in the short term (within days).

Some of the features I extract from it will probably be stored for longer-term
use in a regular database. The JSON itself I expect won't ever be referred to
in the long term.

If I can think of a new piece of data I might want to look at, it's very
appealing to be able to just print it out in one of the data-gathering
programs, without having to touch the entire stack top-to-bottom, deploy a new
schema, etc.

------
josephcooney
Anecdotes like the one from the article that ended with "The one thing that
didn’t flinch was MongoDB" don't convince me one bit. When something else
between the end user and the database is the bottleneck, it would be silly to
assume it is the only problem in the entire system. Who is to say that if the
load balancers had been configured differently, or been spec'd higher, their
MongoDB wouldn't have become a smoldering crater?

~~~
spectre
While anecdotal evidence is always suspect, remember that the case in the
article is MongoDB's optimal use case: extremely read-heavy (there is no
indication that they did more than one write that day).

------
KaeseEs
"First of all, with any piece of technology, you should, y’know, RTFM. EVERY
company out there selling software solutions is going to exaggerate how
awesome it is."

Ah, but it isn't the _company_ that's exaggerating the wonders of MongoDB...

------
bigfun
Actually, the default engine for MySQL 5.5 and later is InnoDB, not MyISAM
(for some reason :>).

------
xyan2284
This reminds me of the arguments about which programming language is best.
There is no best technology. Different technologies are designed for different
engineering problems. You can't really blame the technology when you choose
the wrong tool for your problem.

~~~
dextorious
And there are some tools that are shit, and are unfit for every project.

We should never forget that, too. Even in carpentry, there _are_ badly made
hammers.

------
mambodog
If you're going to choose one of these high-performance NoSQL DBs, _you are
trading ACID for that performance_. How hard is this to understand, guys? If
that doesn't suit your purposes, don't use it.

~~~
dextorious
1) No one said we are trading away ALL of ACID. D should never be traded,
period, except for transient data or caches.

2) We don't even get the performance guarantee. See the pastebin post about
how the write lock affects performance and how synchronization with a slave
can go awry.

