
A Year with MongoDB - mitchellh
http://blog.engineering.kiip.me/post/20988881092/a-year-with-mongodb
======
rdtsc
> Safe off by default

I think that is fixed now.

But this is the single most appalling design decision they could have made
while also claiming their product was a "database". (And this has been
discussed here before, so just do a search if you will.)

This wasn't a bug, it was a deliberate design decision. OK, that would have
been alright if they had put a bright red warning on their front page: "We
disabled durability by default (even though we call this product a database).
If you run this configuration as a single server your data could be silently
corrupted. Proceed at your own risk." But they didn't. I call that being
"shady". Yup, you read that correctly: you'd issue a write and there would be
no result coming back acknowledging that the data had at least made it into
the OS buffer. I can only guess their motive was to look good in the
benchmarks that everyone likes to run and post on the web. But there were
(still are) real cases of people's data being silently corrupted, noticed only
much later, when backups, and backups of backups, had already been made.

~~~
jbellis
According to <http://www.mongodb.org/display/DOCS/getLastError+Command> it is
still unsafe by default.

~~~
latch
No, the journal file is turned on server-side and you'll get a journal append
every 100ms (by default).

~~~
vsl
That's not what is meant by "safe" here. Safe is when you can be sure that the
data were written (into the journal at least) _by the time the write call in
your code returns_. Doing it up to 100ms later leaves a wide window during
which the application believes the data are safely stored, while they are in
fact in RAM only.
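The difference can be modeled in a few lines of Python. This is a toy sketch, not MongoDB code (`FakeServer`, `write_unsafe` and `write_safe` are invented for illustration): an unacknowledged write returns before anything durable happens, while a safe write returns only after the journal append.

```python
class FakeServer:
    """Toy model of a store that buffers writes in RAM and journals later."""

    def __init__(self):
        self.ram = []      # volatile buffer, lost on crash
        self.journal = []  # durable log

    def write_unsafe(self, doc):
        # Fire-and-forget: the call returns before anything durable happens.
        self.ram.append(doc)

    def write_safe(self, doc):
        # Returns only after the doc has been appended to the journal.
        self.ram.append(doc)
        self.journal.append(doc)

server = FakeServer()
server.write_unsafe({"k": 1})
# The application now believes {"k": 1} is stored, but a crash before
# the next journal flush (up to 100ms away) would silently lose it:
assert {"k": 1} not in server.journal

server.write_safe({"k": 2})
assert {"k": 2} in server.journal  # durable by the time the call returned
```
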

------
Philadelphia
The impression I got after hearing some of the 10gen developers speak at a
conference is that MongoDB has the same essential problem as PHP. It was
written by people without a lot of formal knowledge who, for whatever reason,
aren't interested in researching what's been tried before, what works, and
what doesn't. Because of that, they're always trying to reinvent the wheel,
and make flawed design decisions that keep causing problems.

~~~
latch
One of the silliest things I've read on HN, largely because of how insulting
it is, yet how easy it is to verify.

Rather than making a comment that can be summed up as "i think it's written by
people who don't know what they are doing", why not Google them and find out
what their background is?

The developers at the conference might not have represented the kernel team
(it's a surprisingly large company with a lot of different development
branches (core, drivers, tools, support)).

------
sirn
> We changed the structure of our heaviest used models a couple times in the
> past year, and instead of going back and updating millions of old documents,
> we simply added a “version” field to the document and the application
> handled the logic of reading both the old and new version. This flexibility
> was useful for both application developers and operations engineers.

Ugh, this sounds like a maintenance nightmare. How do you deal with adding an
extra field to the document? Do you ever feel the need to run an on-the-fly
migration of old versions? (And when you do, wouldn't running a migration for
all documents be a better idea?)

I'll admit I'm a non-believer, but every time I see "Schemaless" in MongoDB, I
think "oh, so you're implementing schema in your application?"
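Concretely, I imagine the application-side logic looks something like this hypothetical Python sketch (field names and version numbers are made up; every read path has to funnel through a shim like it):

```python
def normalize_user(doc):
    """Upgrade an old-version document to the current shape, in memory."""
    version = doc.get("version", 1)
    if version == 1:
        # Pretend v1 stored a single "name" string that v2 splits in two.
        first, _, last = doc.get("name", "").partition(" ")
        doc = {**doc, "first_name": first, "last_name": last, "version": 2}
        doc.pop("name", None)
    return doc

# Old and new documents both come out in the v2 shape:
assert normalize_user({"name": "Ada Lovelace", "version": 1}) == {
    "first_name": "Ada", "last_name": "Lovelace", "version": 2,
}
```
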

~~~
radicalbyte
> oh, so you're implementing schema in your application?

Isn't that where the schema belongs? Each document represents a conceptual
whole. It doesn't contain fields which have to be NULL simply because they
weren't in previous versions of the schema.

I've been an RDBMS guy (data warehousing/ETL) for a long time now, and I've
seen a lot of large databases which have been in production for considerable
time. They get messy. Really messy. They become basically unmaintainable.
Apples, oranges and pears all squashed into a schema the shape of a banana.

It's a pretty elegant solution, and is the problem XML/XSD were designed to
solve _.

The cleanest solution that I've seen in production used a relational database
as a blob storage for XML-serialized entities. Each table defined a basic
interface for the models, but each model was free to use its own general
schema. After 10 years it contained a set of very clean individual entities
which were conceptually correct.

_ As opposed to the usage as a serialization format for remoting, which has
been largely replaced with JSON.

~~~
trimbo
> isn't that where the schema belongs? [In the application]

Well, unless you have, you know, multiple applications accessing said data.
Then it's kind of important to keep it in sync, which is why RDBMS exist and
operate the way they do.

In my experience, on a long enough timeline, the probability of needing multi-
application access for your data goes to 1.

~~~
matthewcford
build an api

~~~
quadhome
It's a shame this comment is pithy, because I think it's dead-on.

There are times to integrate at the database level. But the default should be
single-application databases.

The rationale is the same as the grandparent's rationale FOR database
integration. The odds of needing to share data over time are 1.

Given that shared belief, the problem with database integration is that MANY
applications need to share facets of the same data. The single database ends
up having a huge surface area trying to satisfy every application's needs.

The resulting schema will have definitions relevant for applications A, B, and
C but not for X, Y and Z.

But, worse, there are dependencies between each application's working schema.
This means ensuring integrity becomes harder with every application that
integrates.

Finally, integration points are the hardest to change after-the-fact. The more
services that integrate, the less ability to make the inevitable model changes
necessary to fix mistakes/scale/normalize/denormalize/change solutions.

Thus, "build an api" is the best solution. Well-defined APIs and data-flows
between applications help data and process locality and avoid most of the
problems I just listed. The trade-off is you're now meta-programming at the
process level: the complexity doesn't disappear, it's just reconceptualised.

~~~
fusiongyro
> The single database ends up having a huge surface area trying to satisfy
> every application's needs.

This is, more or less, exactly what views are for.

> Thus, "build an api" is the best solution.

And you can do that within the database with stored procedures, perhaps even
with the same language you would use in the front-end (depending). And look at
the advantages you have:

\- No implied N+1 issues because your API is too granular

\- No overfetching because your API is too coarse

\- No additional service layers needed

\- All the information is in the right place to ensure data validity and
performance

Let me be clear: I see these as two viable alternatives and different
situations are going to determine the appropriate tool. I bring this up
because I do think the NoSQL crowd overall has a very distorted and limited
picture of what exactly it is RDBMSes provide and why. If people look
underneath their ORMs, they may find an extremely powerful, capable and mature
system under there that can solve lots of problems well—possibly (but I admit,
not necessarily) even _their own_ problems.

~~~
quadhome
> \- All the information is in the right place to ensure data validity and
> performance

This is where we part ways.

We're talking specifically about integration. That means the systems have
different processes and are talking with different people.

If this is a case of three apps exposing the same process over three different
channels (HTTP, UDP, morse code); then, database-level integration makes
perfect sense.

But as soon as differing behaviors come in, the database level doesn't, by
definition, have enough information to ensure validity. One app thinks columns
X and Y are dependent in one way, the other app views it another way. Now one
or both of those apps are screwed for validity. And this problem grows with
N+1.

I am certainly not arguing against good databases. Stored procedures, views,
etc. are all great even for a single application. But, I am arguing database
level integration should be the rare exception to the rule.

------
luca_garulli
Hey, reading all the bad things, it seems that OrientDB would fit better than
MongoDB for them:

\- Non-counting B-Trees: OrientDB uses the MVRB-Tree, which keeps a counter,
so size() takes ~0 ns

\- Poor Memory Management: OrientDB uses MMAP too, but with many settings to
optimize its usage

\- Uncompressed field names: the same as OrientDB

\- Global write lock: this kills your concurrency! OrientDB handles read/write
locks at segment level so it's really multi-thread under the hood

\- Safe off by default: the same as OrientDB (turn on synch to stay safe or
use good HW/multiple servers)

\- Offline table compaction: OrientDB compacts at each update/delete so the
underlying segments are always well defragmented

\- Secondaries do not keep hot data in RAM: totally different because OrientDB
is multi-master

Furthermore you have Transactions, SQL and support for Graphs. Maybe they
could avoid using an RDBMS for some tasks by using OrientDB for everything.

My $0.02.

~~~
mtrn
Thanks for writing OrientDB! - I tried it, but I was pressed for time, so I
needed something that more or less worked instantly for my requirements -
which in the end was elasticsearch.

TL;DR:

I researched MongoDB and OrientDB for a side project with a somewhat heavy
data structure (10M+ docs, 800+ fields on two to three levels). MongoDB was
blazingly fast, but it segfaulted somewhere in the process (also, index
creation needs extra time and isn't really ad hoc). OrientDB wasn't as fast
and the initial setup was a little harder, but the inserting speed was OK for
a while (500k docs or so), and then it degraded. I also looked at CouchDB, but
I somehow missed the ad-hoc query infrastructure.

My current solution, which works nicely for the moment, is elasticsearch; it's
fast, and it's possible to get a prototype from 0 to 10M docs in about 50
minutes, or less if you load-balance the bulk inserts on a cluster (which is
so easy to set up it's scary) and then let a full copy of the data settle on
each machine in the background.

Disclaimer: since this is a side project, I did only minimal research on each
of the technologies (call it a 5-minute test), and ES clearly won the first
round over both MongoDB and OrientDB.

~~~
AdrianRossouw
I love ES, but I don't really feel comfortable with it as a primary datastore.
We tend to use CouchDB to write to, and ES to query against. It all happens
automagically with a single shell command.

I won't use ES on its own, because I have experienced situations in the past
where the dynamic type mapping functionality gets confused, i.e. the first
time it sees a field, it indexes it as an integer, but then one of the later
records has 'n/a' instead of a number. The entire record became unqueryable
after that, even if it might have stored the original data.

You could fix this by creating the mapping by hand, BEFORE any data has been
imported, as it can't be modified later. But what you have then is a situation
where you have to maintain a schema just to keep it from 'randomly' ignoring
data.
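For reference, the hand-written mapping looks roughly like this sketch (as a Python dict; the index/type/field names are illustrative, and the exact syntax depends on your ES version):

```python
# Pin "amount" to a string type up front, so a later "n/a" value can't
# make ES guess integer and then leave whole records unqueryable.
mapping = {
    "mappings": {
        "record": {  # the document type
            "properties": {
                "amount": {"type": "string"},
            }
        }
    }
}
# This body would be sent once, when creating the index (before any data
# is imported), e.g. PUT /myindex with `mapping` as the JSON payload.

assert mapping["mappings"]["record"]["properties"]["amount"]["type"] == "string"
```
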

You also can't just tell ES to rebuild an index when you need to mess with the
mappings, you have to actually create a new index, change the mappings and
then reimport the data into the new index (possibly from the existing index).

It actually also feels right to me to split storing the data and querying the
data between separate applications, because they have different enough
concerns that being able to scale them out separately is sometimes a boon.

~~~
mtrn
Thank you for your input. I had minor issues with dynamic mapping, too, but
since the data is more or less just strings, I could circumvent ES' mechanism
to infer the datatype from the value by simply using an empty
default-mapping.js. I'll definitely give your approach a try.

------
tolitius
great post. direct and to the point, although there are many more flaws that I
am sure you could have shared.

we tried MongoDB to consume and analyze market feeds, and it failed miserably.
I can add a couple of things to your list:

* if there is a pending write due to an fsync lock, all reads are blocked: <https://jira.mongodb.org/browse/SERVER-4243>

* data loss + 10gen's white lies: [https://jira.mongodb.org/browse/SERVER-3367?focusedCommentId...](https://jira.mongodb.org/browse/SERVER-3367?focusedCommentId=66490&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-66490)

* _re_ sharding is hard. The shard key should be chosen once and for all => that alone kills the schemaless advantage

* moving chunks between shards [manually or auto] can take hours / days depending on the dataset (but we're talking big data, right?)

* aggregates (anything complex, i.e. not SUM, COUNT, MIN, MAX) over several gigs of data take hours (many minutes at best). Not everything can be incremental..

Those are just several. MongoDB has excellent marketing => Meghan Gill is
great at what she does. But besides that, the tech is not quite there (yet?).

Nice going with Riak + PostgreSQL. I would also give Redis a try for things
that you keep in memory => set theory ftw! :)

~~~
taligent
MongoDB is successful because of more than just marketing.

It has great tool support, decent documentation, books, and is accessible.
Plus the whole transition-from-MySQL concept makes it easy to grab onto.

~~~
chad_walters
That all supports the marketing effort. Mongo is optimized for a good out-of-
the-box experience for developers. It is basically the MySQL model -- hook
developers first, fix the fundamentals later. Caveat emptor.

~~~
madworld
Exactly, hook them in, so they question whether or not to deal with the
problems when it falls on its face.

------
lalmalang
We had a very similar situation, ~300 writes per second on AWS. _But_ I
suspect some of this has to do with the fact that most people address scaling
by adding a replica set, rather than the much hairier sharding setup
(<http://www.mongodb.org/display/DOCS/Sharding+Introduction>); this seems
natural b/c mongodb's 'scalability' is often touted. In reality though,
because of the lock, RSs don't really address the problem much, and we
encountered many of the problems described by the OP.

Not to denigrate the work the 10gen guys are doing -- they are obviously
working on a hard problem, and were very helpful, and the mms dashboard was
nice to pinpoint issues.

We decided to switch too in the end, though I still enjoy using mongo for
small stuff here and there.

~~~
jasonmccay
Again ... this is more an indictment of the poor IO performance of Amazon EBS
than of MongoDB as a solution. MongoDB can scale both vertically and horizontally,
but as with anything you scale on Amazon infrastructure, you are going to have
to really think through strategies for dealing with the unpredictable
performance of EBS. There are blog posts galore addressing this fact.

I often think MongoDB has suffered more as a young technology because of the
proliferation of the AWS Cloud and the expectations of EBS performance.

~~~
lalmalang
In fact, I tried on-instance storage too -- it didn't help substantially. The
reality is that many (most?) stacks these days need to be able to live happily
on AWS...

------
jrussbowman
From the beginning I've understood MongoDB to be built around horizontal
scaling as its approach to performance, redundancy and backup. They recently
added journaling for single-server durability, but before that, replication
was how you made sure your data was safe.

It seems to me that when I see complaints about mongodb, it's because people
don't want to horizontally scale it and instead believe vertical scaling
should be more available.

Just seems to me people don't like how mongodb is built, but if used as
intended I think mongodb performs as advertised. In most cases I don't think
it's the tool, rather the one using it.

~~~
mitchellh
[Note: I wrote the blog post]

I'm not at all against horizontal scaling. However, I don't believe that
horizontal scaling should be necessary for a mere 200 updates per second to a
data store that isn't even fsyncing writes to disk.

Think of it in terms of cost per ops. Let's just say 200 update ops per second
is the point at which you need to shard (not scientific, but let's just use
that as a benchmark since that is what we saw at Kiip). MongoDB likes memory,
so let's use high-memory AWS instances as a cost benchmark. I think this is
fair since MongoDB advertises itself as a DB built for the cloud. The cheapest
high-memory instance is around $330/month.

That gives you a cost per op of 6.37e-5 cents per update operation.

Let's compare this to PostgreSQL, which we've had in production for a couple
months at Kiip now. Our PostgreSQL master server has peaked at around 1000
updates per second without issue, and also with the bonus that it doesn't
block reads for no reason. The cost per op for PostgreSQL is 1.27e-5 cents.

Therefore, if you're swimming in money, then MongoDB seems like a great way to
scale. However, we try to be more efficient with our infrastructure
expenditures.

EDIT: Updated numbers, math is hard.

~~~
zzzeek
typo in the numbers? I can get your MongoDB number but not pg:

    
    
        >>> seconds_per_month = 60 * 60 * 24 * 30
        >>> ops_per_month_pg = 1000 * seconds_per_month
        >>> ops_per_month_mg = 200 * seconds_per_month
        >>> 330.0 / ops_per_month_pg * 100
        1.2731481481481482e-05
        >>> 330.0 / ops_per_month_mg * 100
        6.36574074074074e-05

~~~
nivertech
You could have skipped all the calculations and just said that 1000/200 = 5,
i.e. PostgreSQL is 5 times more cost-effective than MongoDB.

------
mrkurt
Part of the lesson here is that if you're doing MongoDB on EC2, you should
have more than enough RAM for your working set. EBS is pretty bad underlying
IO for databases, so you should treat your drives more as a relatively cold
storage engine.

This is the primary reason we're moving the bulk of our database ops to real
hardware with real arrays (and Fusion IO cards for the cool kids). We have a
direct connect to Amazon and actual IO performance... it's great.

~~~
mitchellh
> Part of the lesson here is that if you're doing MongoDB on EC2, you should
> have more than enough RAM for your working set.

We had more than enough RAM for our working set. Unfortunately, due to
MongoDB's poor memory management and non-counting B-trees, even our hot data
would sometimes be purged from memory in favor of cold, unused data, causing
serious performance degradation.

~~~
jasonmccay
I understand your point, but the performance issues still stem from the poor
IO performance of Amazon EBS. As we continue to use it, we continue to find it
to be the source of most people's woes.

If you have solid (even reasonable) IO, then moving things in and out of
working memory is not painful. We have some customers on non-EBS spindles that
have very large working sets (as compared to memory) ... faulting 400-500
times per second, and they hardly notice performance slowdowns.

I think your suggestions are legit, but faulting performance has just as much
to do with IO congestion. That applies to insert/update performance as well.

~~~
ismarc
We are using Mongo on EC2, and RAID 10 with 6 EBS drives outperforms ephemeral
disks when the dataset won't fit in RAM in a raw upsert scenario (our actual
data, loading in historical data). The use of mmap and relying on the OS to
page in/out the appropriate portions is painful, particularly because we end
up with a lot of moves (the padding factor varies between 1.8 and 1.9, and for
our dataset, using a large field on insert and clearing it on update was less
performant than upserts and moves).

There are really two knobs to turn on Mongo: RAM and disk speed. Our
particular cluster doesn't have enough RAM for the dataset to fit in memory, but could
double its performance (or more) if each key range was mmapped individually
rather than the entire datastore the shard is responsible for just because of
how the OS manages pages. We haven't broken down to implement it yet, but with
the performance vs. cost tradeoffs, we may have to pretty soon.

------
axisK
Nice article; the write lock has really been making me think about whether
MongoDB is really the way to go in our own stack.

------
nomoremongo
Yyyyyyyup.

<http://pastebin.com/raw.php?i=FD3xe6Jt>

------
aschobel
We love MongoDB at Catch, it's been our primary backing store for all user
data for over 20 months now.

    
    
      > Catch.com
      > Data Size: 50GB
      > Total Documents: 27,000,000
      > Operations per second: 450 (creates, reads, updates, etc.)
      > Lock % average: 0%
      > CPU load average: 0%
    

The global lock isn't ideal, but Mongo is so fast it hasn't been an issue for
us. You need to keep on top of slow queries and design your schema and indexes
correctly.

We don't want page faults on indexes, so we design them to stay in memory.

I don't get the safety issue, 20 months and we haven't lost any user data.
_shrug_

~~~
bretthoerner
> I don't get the safety issue, 20 months and we haven't lost any user data.
> shrug

Nobody loses any user data until they do.

~~~
siavosh
This should be a deal breaker for any serious app. Does the performance hit of
safe mode negate all other advantages of MongoDB?

~~~
madworld
That's most people's finding. If your dataset can fit in RAM [1] and you don't
care about your data being safe, then there might be an argument for MongoDB.
Once you care about your data, things like Voldemort, Riak, and Cassandra will
eat Mongo's lunch on speed.

[1] But as Artur Bergman so eloquently points out, if your data can fit in
ram, just use a native data-structure (<http://youtu.be/oebqlzblfyo?t=13m35s>)

------
jalons
Am I missing something, or did they say they didn't want to scale mongo
horizontally via sharding, then comment that they're doing so with riak, but
faulting mongodb for requiring it?

~~~
madworld
What they are doing with Riak isn't sharding. Riak was designed from the
ground up as a distributed database. They didn't want to go horizontal when,
based on Mongo's claims, they really shouldn't have to at their data size. The
problem is, Mongo lies about what their database can do, and the fact that
Kiip figured that out is why they didn't want to bother scaling out with mongo
as a band-aid for its problems. It was better for them to just use something
made to scale. That's how I read it, based on the blog and his comments on
this post.

------
gregbair
If you're going to have a (management|engineering|whatever) blog for your
company/project, have a link to your company/project home page prominently
somewhere on the blog.

~~~
pearkes
Good point. For the record, we're at <http://kiip.me>

------
DonnyV
Please upvote collection level locking for MongoDB here.
<https://jira.mongodb.org/browse/SERVER-1240>

------
citricsquid
Not related to the article but the site: has anyone else been getting
"connection interrupted" errors with tumblr recently? If I load a tumblr blog
for the first time in ~24 hours the first and second page loads will result in
connection interrupted, the 3rd and beyond will all load fine.

------
pixelmonkey
This is a pretty epic troll on MongoDB, and some of their points are important
-- particularly global write lock and uncompressed field names, both issues
that needlessly afflict large MongoDB clusters and will likely be fixed
eventually.

However, it's pretty clear from this post that they were not using MongoDB in
the best way. For example, in a small part of their criticism of "safe off by
default", they write:

"We lost a sizable amount of data at Kiip for some time before realizing what
was happening and using safe saves where they made sense (user accounts,
billing, etc.)."

You shouldn't be storing user accounts and billing information in MongoDB.
Perhaps MongoDB's marketing made you believe you should store everything in
MongoDB, but you should know better.

In addition to that data being highly relational, it also requires the
transactional semantics present in mature relational databases. When I read
"user accounts, billing" here, I cringed.

Things that it makes total sense to use MongoDB for:

\- analytics systems: where server write throughput, client-side async
(unsafe) upserts/inserts, and the atomic $inc operator become very valuable
tools.

[http://blog.mongodb.org/post/171353301/using-mongodb-for-real-time-analytics](http://blog.mongodb.org/post/171353301/using-mongodb-for-real-time-analytics)

\- content management systems: where schema-free design, avoidance of joins,
its query language, and support for arbitrary metadata become an excellent set
of tradeoffs vs. tabular storage in an RDBMS.

[http://www.mongodb.org/display/DOCS/How+MongoDB+is+Used+in+M...](http://www.mongodb.org/display/DOCS/How+MongoDB+is+Used+in+Media+and+Publishing)

\- document management systems: I have used MongoDB with great success as the
canonical store of documents which are then indexed in a full-text search
engine like Solr. You can do this kind of storage in an RDBMS, but MongoDB has
less administrative overhead, a simpler development workflow, and less
impedance mismatch with document-based stores like Solr. Further, with GridFS,
you can even use MongoDB as a store for actual files, and leverage MongoDB's
replica sets to spread those files across machines.
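The $inc upsert pattern from the analytics bullet above can be modeled in plain Python (a dict stands in for the collection; real driver code would issue an update with `upsert=True` and a `$inc` document, and the names here are illustrative):

```python
def inc_upsert(collection, key, field, amount=1):
    """Model of MongoDB's update(query, {"$inc": {field: amount}},
    upsert=True): create the counter document if missing, then bump it."""
    doc = collection.setdefault(key, {"_id": key})
    doc[field] = doc.get(field, 0) + amount
    return doc

pageviews = {}
inc_upsert(pageviews, "2012-04-13/home", "views")     # creates the doc
inc_upsert(pageviews, "2012-04-13/home", "views", 4)  # increments in place
assert pageviews["2012-04-13/home"]["views"] == 5
```
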

Is your data relational? Can you benefit from transactional semantics? Can you
benefit from on-the-fly data aggregation (SQL aggregates)? Then use a
relational database!

Using multiple data stores is a reality of all large-scale technology
companies. Pick the right tool for the right job. At my company, we use
MongoDB, Postgres, Redis, and Solr -- and we use them each on the part of our
stack where we leverage their strengths and avoid their weaknesses.

This article reads to me like someone who decided to store all of their
canonical data for an e-commerce site in Solr, and then complains when they
realized that re-indexing their documents takes a long time, index corruption
occurs upon Solr/Lucene upgrades, or that referential integrity is not
supported. Solr gives you excellent full-text search, and makes a lot of
architectural trade-offs to achieve this. Such is the reality of technology
tools. What, were you expecting Solr to make your coffee, too?

Likewise, MongoDB made a lot of architectural tradeoffs to achieve the goals
it set out in its vision, as described here:

<http://www.mongodb.org/display/DOCS/Philosophy>

It may be a cool technology, but no, it won't make your coffee, too.

In the end, the author writes, "Over the past 6 months, we've scaled MongoDB
by moving data off of it. [...] we looked at our data access patterns and
chose the _right tool for the job_. For key-value data, we switched to Riak,
which provides predictable read/write latencies and is completely horizontally
scalable. For smaller sets of relational data where we wanted a rich query
layer, we moved to PostgreSQL."

Excellent! They ended up in the right place.

~~~
jeffdavis
So, they made some mistakes, learned from them, and ended up in the right
place.

Sounds like a great story for a blog post, that others might learn from as
well.

Calling it a troll -- just because their mistakes involved mongo and their
solution did not -- seems harsh.

~~~
pixelmonkey
Fair enough, perhaps it's not quite a troll. And I am not trying to devalue
the post or discussion.

But I'm finding that in this whole SQL vs. NoSQL debate, everyone is
desperately seeking the "one database to store everything" -- rather than
carefully evaluating trade-offs of systems before putting them into
production.

The conclusion of the article suggests that new projects start with
"PostgreSQL (or some traditional RDBMS) first", and then only switch to other
systems "when you find them necessary". Wrong conclusion. Think about what
you're building, and pick the _right_ data store for _your_ data.

~~~
jeffdavis
> everyone is desperately seeking the "one database to store everything" --
> rather than carefully evaluating trade-offs

Agreed.

> Think about what you're building, and pick the right data store for your
> data.

I partially disagree. Most businesses adapt considerably over time, and data
projects almost always expand as far as the engineering team can take them.
Even small businesses have a lot of different kinds of data, all held in
different systems and spreadsheets, and there is a lot of value in bringing
that data together (often driven by accounting).

So, at the beginning, you don't know what your data is, you just have a vague
idea (unless you are an early-stage startup with very specific data management
needs).

(Aside: your _queries_ are at least as important when choosing a data
management system as your _data_.)

Traditional RDBMSs have been designed and have evolved over a long period of
time to offer pretty clear answers for the business data needs of most
"normal" businesses. It makes perfect sense to start with a general solution,
and try to pick out the special cases (e.g. "I need faster response on these
queries") as you go.

That doesn't mean that traditional RDBMSs are the only way to make a general-
purpose data management system. Maybe another architecture or model will come
along that will prove superior; or maybe it's already here and it's just not
mature enough.

But I would give very similar advice in most situations: start with a
traditional RDBMS, and then pick out the special cases as needed. Not all
cases, of course; a key-value system might be great for caching, or you might
need to do analysis that's just not working out with SQL.

------
paulsutter
This article is an excellent articulation of the strengths and (fixable)
issues with mongoDB.

I like MongoDB a lot, and the improvements suggested would really strengthen
the product and could make me more comfortable using it in more serious
applications.

~~~
madworld
How is the global write lock "fixable" without a major rewrite of the
codebase?

Like the article suggested, it would be one thing if they did it for
transaction support. In reality, from looking at the code, it seems like the
global write lock came from not wanting to solve the hard problems other
people are solving.

~~~
geoffeg
DB-level locking is planned for MongoDB 2.2 which should be out within a few
months.

<https://jira.mongodb.org/browse/SERVER-4328>

~~~
spidaman
Meh, if your other option is to use PostgreSQL and get row level locks, a db
level lock is still a fail.

~~~
j-kidd
And here's a great post that provides some insight on how much effort has been
put in by RDBMS vendors to handle locking:

<http://stackoverflow.com/a/872808>

------
radagaisus
Start Up Bloggers listen up! If I click on your blog's logo I want to see your
product, not the blog's main page.

------
Jebus
_Uncompressed field names - If you store 1,000 documents with the key “foo”,
then “foo” is stored 1,000 times in your data set_

Oh my god. I didn't know about this. And I hate short, meaningless and anti-
intuitive field names. Please fix it mongodb devs!

~~~
gatesvp1
This is a long-outstanding bug, over 2 years old now:

<https://jira.mongodb.org/browse/SERVER-863>

Obviously you can vote for them to fix the bug. But it's been two years, so
I'm not sure it's really high on the priority list.

