
MongoDB's Write Lock - rick446
http://blog.pythonisito.com/2011/12/mongodbs-write-lock.html
======
DenisM
> MongoDB, as some of you may know, has a process-wide write lock.

I've never taken the time to see what MongoDB is, but thanks to this opening
sentence, now I know everything I ever wanted to know about this system.
Having worked for 13 years on database system design, I am pretty confident
that a system not designed with concurrency in mind cannot be retrofitted with
any decent concurrency later.

Thank you Rick for saving me the time.

~~~
rogerbinns
There was a poxy operating system a few years ago. It only ran on one 32 bit
architecture, didn't have multi-processing, had limited device support and a
rather bizarre set of dev tools. Then they tidied things up a bit. But when
they added multi-processing these morons just used a single big kernel lock.
What a bunch of idiots. Obviously anyone using that operating system was blind
and stupid and there were far better solutions.

That operating system is Linux. It started out very simple and was good enough
for many people and then kept evolving. Nowadays the alternatives are mainly
footnotes in history.

MongoDB is also simple. Its locking approach does not give wrong answers. Its
users are happy. Your assertion that the locking can't evolve is bollocks
because you have no clue how the database works. While your statement has many
elements of truth for a relational database, it is meaningless for the current
generation of schemaless/NoSQL databases. Did you know that MongoDB has auto-
sharding and replication, and doesn't need locks for that either?

~~~
DenisM
I would argue Linux ended up with decent concurrency because both Sun and IBM
poured huge heaps of dollars into making it so with their contributions, plus
huge efforts from other contributors. If someone pours the same amount of
effort into MongoDB, they might succeed with a retrofit/rewrite as well.
That's relatively rare, though.

------
orthecreedence
> If you are able to do this, it turns out that the global write lock really
> doesn't affect you. Blocking reads for a few nanoseconds while a write
> completes turns out to be a non-issue. (I have not measured this, but I
> suspect that the acquisition of the global write lock takes significantly
> longer than the actual write.)

Actually, it _does_ affect you. I have worked with mongodb in production in a
high-write scenario with about 1000 clients and it slowed to a crawl. All data
was in memory. The server was not breaking much of a sweat. mongostat showed
upwards of 60 queued reads/writes at any given time.

The only solution was to shard, but I feel like an enormous server like the
one we were using should be able to handle 1000 writing clients.

Keep in mind this was version 1.8. I no longer work at the company where this
happened and cannot testify to the performance of 2.0, but 1.8 has abysmal
write performance.

~~~
mnutt
Were your writes changing the size of the documents so that mongo had to move
them? I've had this happen and it'll cause mongo to grind to a halt.

~~~
orthecreedence
Some of them were, but others were just updating a boolean or an integer
value. For the most part we tried to pad our records, but I'm sure there was
some moving along the way.
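
For readers who haven't seen the trick: padding just means inserting documents with a throwaway filler field so that later in-place updates have room to grow without forcing a relocation. A minimal Python sketch (the "_padding" field name, the 512-byte target, and the JSON-based size estimate are all my own simplifications, not what we actually ran):

```python
# Sketch of the record-padding trick: add a filler field at insert time
# so that updates which grow the document can fit into the space already
# allocated, instead of forcing mongod to move the document on disk.
# JSON length is only a rough stand-in for the real BSON size.
import json

def padded(doc, target_bytes=512):
    """Return a copy of doc with a filler field bringing its rough
    encoded size up to target_bytes; later updates can overwrite it."""
    filler = max(0, target_bytes - len(json.dumps(doc)))
    out = dict(doc)
    out["_padding"] = "x" * filler
    return out

doc = padded({"pageId": "hallo", "done": False})
```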

~~~
rick446
I guess it depends on what % of your writes were simply updating a boolean or
integer value. My benchmark shows that simple updates like that don't affect
query performance much. Writes that take longer probably have different
performance characteristics, YMMV, etc.

------
xxqs
Why is everyone paying so much attention to MongoDB? It has been criticized a
lot for its design and implementation problems, but still for some reason it's
so popular.

To name a few,

* word-unaligned memory structures, which lead to incompatibility with virtually any non-x86 CPU architecture

* explicitly little-endian processing in the server, so there is no way to run the original code on any big-endian CPU architecture.

There has been a patchset which was tested on a SPARC CPU, but last time I
asked the author, the 10gen team completely ignored this effort.

Apart from that, there have been reports of data loss without any error being reported.

~~~
ketralnis
Yeah, how dare people like what you don't like?

> It has been criticized a lot [...] but still for some reason it's so popular

Since you provide no data or sources for "criticized a lot" it's no surprise
that you don't provide the same for "so popular". I assume you mean "I've seen
some headlines on Hacker News about it".

> incompatibility with virtually any non-x86 CPU architecture [...] no way to
> run the original code on any big-endian CPU architecture

Huh. Maybe that's only a problem for people on non-x86 CPUs then?

Look I'm really not a fan of Mongo but "It has been criticized a lot [...] but
still for some reason it's so popular" describes _every technology ever_. Get
over it.

~~~
xxqs
> Maybe that's only a problem for people on non-x86 CPUs then?

Actually, it's a problem for application developers. I cannot rely on a
backend system with limitations like these. So I'll have to go back to an
RDBMS backend or look for other NoSQL alternatives; either way, MongoDB is
definitely off my list.

~~~
nknight
The world runs on x86. You might have some legacy systems running SPARC or
POWER, but those systems are unlikely to reside in MongoDB's target market
anyway.

Arguing everyone else should ditch code that happens to not work on your pet
architecture is a pretty self-centered worldview.

~~~
xxqs
Actually, the number of ARM processors is growing, and not only in the mobile
sector: there have been efforts to bring the ARM architecture into the server
market. China is building its own MIPS-based supercomputer. The SPARC
architecture is still developing, although it's a pity to see it swallowed by
Oracle. And IBM is still shipping PowerPC servers.

Besides, there are huge SPARC-only datacenters still running.

~~~
nknight
> _There have been some efforts to bring ARM architecture into the server
> market._

None of which have really gone anywhere.

> _China is building its own MIPS-based supercomputer._

You think a lot of supercomputers are going to be running Web-targeted NoSQL
datastores?

> _Also the SPARC architecture is actually developing_

So its partisans have insisted for the past decade. The real world doesn't
much care. Sun/Oracle had less than 2% total market share for servers in
2010, by the way -- and that includes their x86 sales.

> _IBM is still shipping PowerPC servers._

How many? To whom? They had 13% market share in 2010, and they sell a lot more
x86 servers than POWER.

> _besides, there are huge SPARC-only datacenters still running._

There won't be for much longer, and the ones that remain mostly run legacy
infrastructure.

There's not even a convincing case for 10% of new servers being non-x86.
Greenfield systems don't care about anything but x86. It's just irrelevant to
pretty much everybody. There's no reason for the MongoDB guys to make it any
sort of priority.

~~~
xxqs
A thoughtful software designer would have built an endian-neutral, memory-
aligned architecture in the first place. In the case of Mongo, the designers
don't seem to really understand how computer hardware operates at a low level.

As for the rest of your points, neither you nor I can tell what the best-
selling server architecture will be in 3 years. Do you intend to design
software with an unpredictable lifespan? I don't.

------
fleitz
Why not just take the write lock out of the db altogether if the data doesn't
need to be journalled? It's obvious at that point that missing/mangled data
is acceptable. It's not particularly amazing that not writing data to disk is
faster than writing to disk. What IS amazing is that the geniuses at 10gen
have somehow managed to make not writing to disk thousands of times slower
than writing to disk.

Who would design a database so shitty that journalling the data impacts
performance? Typically you only need one spindle for a journal to support 100
to 200 data spindles. If you can't pull 80 to 90 MB/sec sustained write from
a log drive, something is seriously wrong.

48 IOPS -- now that's what I call "web scale". Let me just throw out my ACID
database that does 30,000 IOPS on _Win2k3_ of all things, to get 48 IOPS
without journalling.

~~~
peschkaj
MongoDB's journal is another collection that only syncs to disk 10 times a
second. It's not a true journal like a write-ahead log. You can force that
collection to fsync, but... yeah, I get get stupendous performance using a
cheap RAID controller on commodity hardware with an old and busted relational
database. Hooray for 40 years of technology!

------
ilaksh
The thing is that you are still supposed to keep the whole working set in
memory, and use sharding if it's larger than that. Which means that none of
this is really relevant.

Except that now with that new graph showing such good performance on reads
during paging people are going to get confused.

Anyway, you can get 32GB of RAM for $232 or 48GB for $636. Which means that for
90% of applications, you actually don't need to shard. And if the journaling
works (people with a vested interest in relational systems really have to hope
that it doesn't), then I don't have to worry about my data disappearing, even
if I only have one database server, which is also not a recommended design
with MongoDB (or any database really).

I have years of experience with SQL Server, Oracle and MySQL. However, MongoDB
is the most attractive database now because it makes the object-relational
impedance mismatch go away:
http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch

If I can write code (in CoffeeScript using a library like Mongolian Deadbeef)
like this

    
    
    posts.insert
      pageId: "hallo"
      title: "Hallo"
      body: "Welcome to my new blog!"
      created: new Date

    posts.findOne
      pageId: "hallo"
    , (err, post) ->
      # use post here

    posts.find().limit(5).sort(created: 1).toArray (err, array) ->
      # use array here

then why would I want to deal with the separate steps of setting up
relational database tables, creating stored procedures, creating a software
layer to map my objects to my tables, etc., or hiring a DBA?

I believe that most of the hate for MongoDB is fueled by a survival instinct.
The popularity of databases like MongoDB threatens to make years of
experience obsolete and threatens the existence of the DBA profession.
Relational databases are great, but they were an optimization designed to
solve certain problems that most people today just don't have, and now they
have become an unfortunate institutionalized dogma.

~~~
JanezStupar
Speaking from multiple years of NoSQL experience: the Object/Relational
impedance mismatch stays right there where you leave it.

Using K/V stores will only help you write data. But the whole impedance
nightmare is right there, waiting for you to try and make some sense of the
data. Especially if you want to do relations.

And you are dead wrong about relational DBs. It is either due to habit or
because it is a better fit that users demand representations of data that
are best served from a relational source. So you'd better plan your data
models wisely, because you WILL pump this data into a relational source
sooner or later. It would be wise to keep a schema around all the time.

In the end it is merely a question of data normalization and the use case at
hand.

~~~
latch
> The Object/Relational impedance mismatch stays right there where you leave
> it.

Somewhat of an umbrella statement... but I do think that document-based
storage makes a difference. It certainly doesn't erase the mismatch and, as
you say, it varies with the use case at hand, but I think most people would
consider the development experience to have less friction (and that's
certainly been the overwhelming anecdotal evidence I've heard, and can give).

~~~
prodigal_erik
I have to wonder whether that's because many devs started using schemaless
databases for the first time over the last couple of years, and haven't yet
really experienced the nightmare of data which was scribbled on by various
forgotten buggy versions of the apps and never rigorously migrated (because
they're self-selected to regard that as unimportant). I once worked on a Lotus
Notes-based system with documents eventually reaching such nonsensical states
that the dev team couldn't even say what app behavior would be appropriate,
much less what the latest version of all our code would happen to do.

~~~
JanezStupar
Exactly. I have done most of my work on legacy Lotus Domino applications
(think 10-15 years' worth of data, millions of documents spawned by countless
application versions, without any schema tracking whatsoever -- everything is
implicit in the document itself).

Since there is no explicit database schema in these types of databases, what
you didn't do at write time you have to do at read time. And usually you want
to use the latest view or representation of the data. So what do you do with
data that wasn't there ten versions ago? What do you do about data fields in
the wrong format? What do you do with data that is "orphaned" and cannot be
linked to other data? It is still data, and it is still important.

Don't get me wrong, I Love NoSQL and I like to use it. I just have enough
experience with it to know that it is definitely not a silver bullet.

By the way: If someone is looking to hire a guy who is not afraid of tackling
this kind of issue, contact me. I have plenty of experience with coercion of
non relational data into a form suitable for analysis.

------
milkshakes
thank you for this. i don't understand why 10gen didn't put something like
this out in the first place, it would definitely have helped frame a lot of
the more annoying discussions i've had.

~~~
rick446
Glad to help! 10gen actually has a policy of never sharing benchmarks, so
that explains why they never said anything.

------
steve8918
Does anyone know what the performance differences would be between MongoDB and
SQL Server/Oracle if they all had enough RAM to hold the entire dataset in
memory?

I'm only guessing, but it seems to me that any database with its entire
dataset in memory would be very fast, no?

~~~
megaman821
There were some slides showing that PostgreSQL with fsync turned off
performed about the same as MongoDB. There is no MongoDB secret sauce that
makes it any faster than well-established relational DBs, given a few
configuration tweaks to make the comparison even.
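
For anyone who wants to reproduce that kind of comparison, the usual tweak is along these lines (the settings are real PostgreSQL options; whether the slides used exactly these is my assumption):

```
# postgresql.conf -- give up crash-durability, roughly matching the
# guarantees of unjournalled MongoDB, to make the benchmark fair
fsync = off
synchronous_commit = off
```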

------
amalag
I think Mongo is great for some use cases. There are some use cases where the
flexible JSON data model just makes sense. Regarding his benchmarks, he
turned off journaling. I would love to see them with journaling turned on, to
see how much is still relevant.

------
nknight
> _In MongoDB version 2.0 and higher, this is addressed by detecting the
> likelihood of a page fault and releasing the lock before faulting._

I'm assuming MongoDB tries to detect this with OS-specific syscalls. Has there
been any attempt to determine whether it would be even faster and/or more
portable to just unconditionally "read" the pages before acquiring the lock?
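
That pre-touch idea is easy to sketch in plain Python; an anonymous mmap and a threading.Lock stand in for mongod's memory-mapped data files and its global write lock (an illustration of the technique, not MongoDB's actual code):

```python
# Sketch of "touch the pages first": read every page a write will need
# *before* taking the big lock, so any page faults happen outside the
# critical section and the lock is held only for the in-memory copy.
import mmap
import threading

PAGE = mmap.PAGESIZE
write_lock = threading.Lock()

def prefault(buf, offset, length):
    """Read one byte per page in [offset, offset + length) so the OS
    faults those pages in while no lock is held."""
    for pos in range(offset, offset + length, PAGE):
        buf[pos]  # a plain read is enough to trigger the fault

def locked_write(buf, offset, data):
    prefault(buf, offset, len(data))   # faults happen here, lock-free
    with write_lock:                   # critical section stays short
        buf[offset:offset + len(data)] = data

demo = mmap.mmap(-1, 4 * PAGE)         # anonymous mapping for the demo
locked_write(demo, 0, b"hello")
```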

~~~
latch
I forget who, but a fairly high-profile MongoDB user once posted about their
experience, and they mentioned that they always did a find before doing an
update.

Every now and again you'll see this approach get suggested in the groups.

~~~
dolinsky
Sounds like a great extension of the benchmarks provided in the article.

