MongoDB 2.6 Released (mongodb.org)
143 points by francesca on Apr 8, 2014 | 117 comments

A lot of hype... and we still have database-level locking. If document-level is too difficult, at LEAST do collection-level (not that it's much better, but at least it's some real improvement).

I was reading somewhere that Mongo can't do document-level/record-level locking because of mmap'ed files. The whole database is memory-mapped, and mmap doesn't understand the underlying data structure; it views the whole file as one large blob.

Ditching mmap will not be that easy, because most of Mongo's speed and simplicity comes from using mmap.

mmap does complicate things. A traditional database often works using write ahead logs. The log holds changes made to the database over time - so when you want to write a change to your DB, you put 'I'm changing value x.y to 50' in your WAL. Sometime later the actual data pages holding the modified x data structure can be written out to disk. If you have a crash before the data pages get written out, you can 'replay' your WAL file to redo all the lost changes.

Unfortunately, a central requirement of a write ahead log is that the data pages must not get written before the WAL. If that were to occur, and there were a crash, the system wouldn't know that it had to undo the changes to the data pages, leaving you with corrupt data. mmap generally doesn't provide the ability to pin your dirty pages in memory - they're subject to getting flushed any time the system is under memory pressure - so it makes logging a lot more complicated to get right.
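The ordering invariant described above can be sketched as a toy model in Python (invented names, nothing like MongoDB's actual journaling code):

```python
# Toy write-ahead log: the invariant is that a change is durable in the
# log BEFORE the data page it modifies may be flushed to disk.
class ToyWAL:
    def __init__(self):
        self.log = []          # stands in for the on-disk WAL file
        self.pages = {}        # in-memory (dirty) data pages
        self.disk = {}         # pages that have been flushed

    def write(self, key, value):
        self.log.append((key, value))   # 1. log the change first ("x.y = 50")
        self.pages[key] = value         # 2. then modify the page in memory

    def flush_page(self, key):
        # Only legal because the log already contains the change.
        self.disk[key] = self.pages[key]

    def recover_after_crash(self):
        # Replay the log to redo any changes whose pages never got flushed.
        for key, value in self.log:
            self.disk[key] = value
        return self.disk

wal = ToyWAL()
wal.write("x.y", 50)
# crash before flush_page() ran: replaying the WAL restores the change
assert wal.recover_after_crash() == {"x.y": 50}
```

The point of the sketch is the ordering inside write(): with mmap, the OS may flush a dirty page before the corresponding log record hits disk, which is exactly what a WAL must prevent.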

Just to note, the MongoDB journal is basically a write-ahead log and has been around since 1.8. It's not simple, you are correct: it involves remapping portions of memory privately, and leads to some inflated numbers on the virtual-memory reporting side. There's a great write-up here:


That's interesting, thanks!

Yes, and an important limitation stems from this as well.

For a tool like Redis, it's fairly easy for me to accept the limitation that my data size can't exceed available RAM.

But for an indexed document store with full-text query capabilities, it's a lot harder for me to accept that limitation.

Are you implying that a performant Mongo DB must keep all data in RAM? That's not true. We've got almost 1TB in a Mongo DB, and we sure don't have that much RAM.

You do have to be able to keep your indexes in RAM, but that's much less limiting.

Not 1:1, no - thanks for clarifying that. But depending on the content and the indexing, there is a strong correlation between the size of the database and the memory requirements under Mongo DB.

It doesn't use mmap for locking though - and in any case one database is already multiple files (2GB max). There are internal data structures used by MongoDB and the database itself takes care of locking, mmap just gets the data into memory.

"Finally, MongoDB 2.6 lays the foundation for massive improvements to concurrency in MongoDB 2.8, including document-level locking."

They hear you and are working on it.

It was about a year between the first release of 2.4 and the 2.6 release. Should we expect another full year for 2.8?

I believe they are on a ~yearly schedule now.

Heck I'd even be happy for them to do locking on a per file basis. (For those unfamiliar with mongodb, the actual database storage is broken up into 2GB files.)

I believe the 2 GB database limit is only applicable to 32-bit instances[0].

[0]: http://blog.mongodb.org/post/137788967/32-bit-limitations

You are confusing two things. I am talking about the files making up a database. MongoDB doesn't use one file per database - instead it allocates files up to 2GB in size (the first one starts at 64MB and then they double in size until the 2GB limit). So for example a database that is 200GB will consist of (roughly) 100 2GB files.

What you linked to is a consequence of memory mapping. Mapping a single 2GB file in a 32 bit process will use up virtually all the address space and you couldn't map more than one at a time.
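The allocation pattern described (64MB, doubling up to a 2GB cap per file) can be sketched like this; a rough model that ignores preallocation and the separate namespace file:

```python
def data_files_for(db_size_mb):
    """Approximate the data files mmap-era MongoDB allocates for one
    database: 64MB, then doubling each time, capped at 2048MB per file."""
    files, size, total = [], 64, 0
    while total < db_size_mb:
        files.append(size)
        total += size
        size = min(size * 2, 2048)
    return files

files = data_files_for(200 * 1024)   # a 200GB database
print(len(files))   # 105 files in this model -- "roughly 100 2GB files"
```

The five initial sub-2GB files (64 through 1024MB) only cover about 2GB; everything after that is 2GB files, which matches the "roughly 100" figure above.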

Every Mongo release announcement I hold out hope for an improvement to the database-level write lock, and every time I'm disappointed. Considering this is probably the #1 or #2 complaint people have with Mongo, I'm surprised it hasn't been addressed ahead of some of the other features that have made it into recent releases.

I don't know anything about MongoDB but can you give an example where DB level locking is a problem?

High volume of writes

Note also that the lock blocks readers, and that writes are given priority over reads. Consequently even small volumes of writes can have major effects on readers.
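For intuition, writer priority can be modeled as a queue where any waiting writer is serviced before any waiting reader (a toy model of the scheduling policy, not MongoDB's actual lock code):

```python
from collections import deque

# Toy model of a writer-preferring lock queue: arriving writers are
# serviced before any waiting readers, so even a modest trickle of
# writes can hold readers off for a long time.
def service_order(arrivals):
    writers = deque(a for a in arrivals if a.startswith("w"))
    readers = deque(a for a in arrivals if a.startswith("r"))
    order = []
    while writers or readers:
        order.append(writers.popleft() if writers else readers.popleft())
    return order

print(service_order(["r1", "w1", "r2", "w2"]))  # ['w1', 'w2', 'r1', 'r2']
```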

High volume of writes from MANY producers.

From 1 producer, it doesn't matter.

Though the point of Mongo is to be webscale, which implies to me many writers.

What makes the single producer (writer?) case different from multiple, in the context of the effect on readers?

If there is a single high-volume data pump, for example machine generated data, will readers be affected by a continuous "fire hose" of incoming data?

In my experience, even 1 producer caused a full database lock for hours on our production server. We have 1 mongo server and an erroneous scheduled task of ours started at about 6am. It's only one process but the task basically re-syncs an entire collection (which was about 80,000 writes).

That single producer caused wide-scale locking/hanging for all readers on the website and I had to manually stop the task during business hours because of that. Oy!

Isn't that supposed to be the ideal use case for Mongo though?

Not for single-server performance. The database level lock severely limits MongoDB's single server performance. Just look up the sysbench benchmark comparing MongoDB with TokuMX (which I work on)

And it still relies entirely on the OS's scheduling algorithms for caching and IO. mmap is nice but it has no idea you're running a database.

Finally, MongoDB 2.6 lays the foundation for massive improvements to concurrency in MongoDB 2.8, including document-level locking.

This is exciting even if I don't expect it to happen soon.

TokuMX, which I work on, has document level locking and compression right now.

Trying amisaserver; found it somewhere in the comments below. It claims to have MVCC.

sweet then, will give tokumx a shot as well. Thanks

TokuMX does have MVCC

With this release the aggregation framework got super powerful. It now returns a cursor, so we can get the aggregation results and iterate over them. No more 16MB result limitation either...

Agreed, it was annoying to have to use MapReduce for larger result sets.

Can you elaborate for people who don't understand the feature? Is the aggregation result large because of grouping?

Well, aggregation in MongoDB allows you to write SQL-like queries, and before 2.6, if the aggregated result was > 16MB it threw an error; now it gives you a cursor, and you can fetch the results and send them to another collection. For an example aggregation query in Mongo, have a look here: http://docs.mongodb.org/manual/tutorial/aggregation-zip-code...
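For readers unfamiliar with the framework, a $group stage like the one in that zip-code tutorial essentially computes a grouped sum; in plain Python the same computation looks like this (hypothetical sample data, not an actual driver call):

```python
from collections import defaultdict

# Sample documents shaped like the zip-code tutorial's collection
docs = [
    {"state": "NY", "pop": 100},
    {"state": "NY", "pop": 250},
    {"state": "CA", "pop": 300},
]

# Rough equivalent of the pipeline stage:
#   {"$group": {"_id": "$state", "totalPop": {"$sum": "$pop"}}}
totals = defaultdict(int)
for doc in docs:
    totals[doc["state"]] += doc["pop"]

print(dict(totals))   # {'NY': 350, 'CA': 300}
```

With 2.6 the server hands results like these back through a cursor, so the full result set no longer has to fit in a single 16MB response document.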

I've been using Elasticsearch as a primary database for my new project, and it has basically been a good NoSQL db that happens to have great search. However, the peripheral tools (performance testing, hosting) have been a bit rough.

How do the two databases compare now? Is search improving in Mongo, or is that something they are not really worrying about at the moment?

2.4 introduced text indexes for full text search as a beta, and 2.6 finishes the job of fully integrating them into the product - they are fully supported with the new release (including in the aggregation framework).

In terms of how they compare, I'm not familiar enough with Elasticsearch to comment, but for basic text searching needs, the implementation in MongoDB is pretty decent. More here:


if you are looking for a nosql dbms as a service which consumes json and supports SQL and search, you could try amisaserver

have you seen the pricing? it's insane

The price is unbelievably fair. Try deploying mysql in the cloud, and while you are at it you get Redis as well for cache, and then you hit the wall: you need search in your application, so you deploy Elasticsearch as a service as well. Do the math and compare to ours.

Our pricing starts as low as $15 per month

The cheapest instance of Amazon CloudSearch is around $79 monthly. You then have to deploy a transactional DBMS, and then most likely S3 for storage as well.

It's $700/month (otherwise your product isn't needed, because no sharding? PostgreSQL can do most of the stuff)

the hosted version is up to 8GB RAM; don't you think that's low?

SQL Server + Oracle are also insane

$700 annually. Please read well before you misrepresent us.

By the way, Partitioning by Hash and Range, as we have stated, means "Sharding".

the $700/month was mistyped

what I meant is that the correct price is $700 and not $350, since the $350 pricing doesn't include sharding/clustering

If $700/year is insane pricing for your needs, you're definitely not in their target market...

$700/year for enterprise-level solutions is generally regarded as suspiciously inexpensive.

ok, I'm not, but why not offer another not-so-enterprise solution with reduced support?

Insane for what? Penny strapped students? I'm not sure they cater to that market.

This is a good write-up about what's new, with actual numbers: http://devops.com/news/mongodb-2-6-significant-release-mongo... Quote: "MongoDB 2.6 provides more efficient use of network resources; oplog processing is 75% faster; classes of scan, sort, $in and $all performance are significantly improved; and bulk operators for writes improve updates by as much as 5x."

Awesome news. I am excited for the aggregation cursor. As much as I love some of the alternatives that are almost ready I still turn to mongo for a vast majority of my deployments. Hopefully it will keep getting better and pushing others to do the same.

Can someone with more MongoDB experience give me your thoughts on the upgrade difficulty here? Worth doing soon, or waiting for a point release? Does this require a data rebuild/update process (coming from 2.4)?

Upgrade notes are here: http://docs.mongodb.org/master/release-notes/2.6-upgrade/

"the upgrade from MongoDB 2.4 to 2.6 is a binary-compatible drop-in upgrade: shut down the mongod instances and replace them with mongod instances running 2.6."

IMO unless you desperately need one of the new features I would hold off a few weeks. With a release this big I'd expect there will be some bugs and wouldn't be surprised to see 2.6.1 shortly.

This is always best practice for anything important unless you find your life is dull and bereft of emergency.

In the case of Mongo, it's the opposite - the upgrades are needed badly enough that it's worth the risk of a major number release.

an old one, but the circumstances arise so often (app updates) and the consequences are so severe (shark attack!) that it's probably worth posting:


Authentication is the main thing to be aware of (no data rebuild), but full notes here:


And, you have to be on 2.4 first (I know you said you are, but best to make sure for others)

So I've been trying to find the ideal use case for MongoDB, because I have to teach a NoSQL database to some people I am mentoring.

I'm leaning heavily towards couchdb though.


Be sure to check out RethinkDB as well, which is pretty much mongo with a MUCH nicer query interface and saner locking.

http://rethinkdb.com/ http://rethinkdb.com/docs/comparison-tables/

I can also recommend checking out rethinkdb. It feels like MongoDB but without many of the downsides.

Not saying MongoDB is bad (I've used MongoDB and like it), but much of its API design is quirky/bad, locking is an issue, and mmap feels weird too.

I've had it recommended a few times now. I'll add it onto the pile =)

I would stay with CouchDB. Its web interface is really good for inspecting and debugging what is inside the database. Really good for development.

Also, it has a very nice HTTP interface, so you can talk to it straight from the Web via a proxy.

Well, for teaching purposes, independent of suitability for a use case there are multiple courses available for free here:


And, more node.js specific (but also offline capable, and available any time):



the latter is especially useful. I'll pass it along to people for extra reference.

-> an ideal use case: capture and store unstructured data, typically tweets. A tweet's structure is JSON-based, quite complex, with many fields and substructures. It's incredibly easy to store and manipulate such data with MongoDB without even knowing all the details of the fields! It's also a good use case because MongoDB is fairly good at adding data, pretty bad at deleting data.

The same goes for some other JSON document-oriented databases (like Elasticsearch), but MongoDB is a good compromise in many areas. The query language is easy to understand and powerful; the biggest issue is the difficulty of doing complex computation and aggregation. MapReduce helps, the aggregation framework helps too, but in this area SQL is generally much faster, for instance.

Have a look at this: https://github.com/johnwilson/bytengine.

It could have been built with CouchDB; however, some features such as ad-hoc querying and partial document updates make MongoDB a more compelling choice (albeit prone to some scalability issues until MongoDB 2.8, hopefully, lol!).

thanks! I'll take a look at it.

I use Elasticsearch for ad-hoc querying: http://daemon.co.za/2012/05/elasticsearch-5-minutes/

And couchdb does have atomic in-place updates, http://wiki.apache.org/couchdb/Document_Update_Handlers

I used the latter recently to track the last view-time on images, as well as to build a schema migration routine in like 100 lines of code.

edit fixed link

While MongoDB has many use cases, it's just perfect if you need to pass over JSON to your JavaScript. Or need JSON output for any other reason.

On the list of reasons to use MongoDB, shouldn't being able to pass over JSON be at the bottom, given how trivial it is to pass a MySQL row as a JSON array?
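To the parent's point, serializing a relational row as JSON really is near-trivial; a sketch with a hard-coded tuple standing in for a MySQL cursor.fetchone() result:

```python
import json

# Pretend this tuple came back from cursor.fetchone() on a MySQL driver,
# and the column names from cursor.description (hard-coded here).
columns = ("id", "name", "email")
row = (42, "Ada", "ada@example.com")

payload = json.dumps(dict(zip(columns, row)))
print(payload)   # {"id": 42, "name": "Ada", "email": "ada@example.com"}
```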

I can't see how that's a defining feature for it, because couchdb and others also do this really really well. Most of the time you don't even need db drivers, because couchdb is just a REST server.

Some notes on using the shell, and some unforeseen performance changes from the beta: http://comerford.cc/wordpress/2014/03/28/mongodb-2-6-shell-p...

Cursor for aggregate, proper explain for aggregate, index intersection, $redact and other cool operators, Multi* in Geospatial, faster execution and, foundation for document-level locking which should be introduced in MongoDB 2.8. I must say I'm happy with this release.

I've been playing with Meteor and Mongo recently, and Mongo seems a little bit strange from a traditional SQL point of view: do I need to embed or reference? Can anyone recommend a good book or source?
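On the embed-vs-reference question: it's a data-modeling choice, and a hypothetical blog example (plain dicts, no driver involved) shows the two shapes:

```python
# Option 1: embed comments inside the post document.
# Reads are one fetch; good when comments are always loaded with the post.
post_embedded = {
    "_id": "post1",
    "title": "Hello",
    "comments": [
        {"author": "alice", "text": "Nice!"},
        {"author": "bob", "text": "+1"},
    ],
}

# Option 2: reference comments by id, like a foreign key.
# Comments live in their own collection; needs a second query to "join".
post_referenced = {"_id": "post1", "title": "Hello",
                   "comment_ids": ["c1", "c2"]}
comments = {"c1": {"author": "alice", "text": "Nice!"},
            "c2": {"author": "bob", "text": "+1"}}

# The join you do by hand in the referenced model:
joined = [comments[cid] for cid in post_referenced["comment_ids"]]
assert joined == post_embedded["comments"]
```

A common rule of thumb: embed data that is always read with its parent; reference data that is shared, unbounded in size, or queried independently.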

MongoDB Definitive Guide 2nd ed is pretty good

So does this mean document level locking has been implemented or just the foundation for its future implementation has been laid?

I can't hold my breath much longer! :-)

Edit: docs don't make any mention of it but then again they probably haven't updated them yet (fingers crossed!) http://docs.mongodb.org/manual/faq/concurrency/#what-type-of...

It means the foundation has been laid. 2.6 included a lot of refactoring and rewriting of some core subsystems, with the apparent goal of eliminating technical debt so they can make more impactful changes in 2.8.

Don't asphyxiate, tokumx has document-level locking right now. http://github.com/Tokutek/mongo

I am a big fan of their focus on manageability for sharded databases. I am less of a fan of their db internals that might require you to use many more shards than a more performant engine. More details at http://smalldatum.blogspot.com

The blog post was rather vague about 2.6's "better performance". Are there any concrete numbers?

The main reason I use MongoDB on Node is the maturity of the Mongoose ORM - I've used Node-ORM 2, BookshelfJS, and SequelizeJS and none of them felt as mature as Mongoose.

Mongoose is definitely a pleasure to work with, regardless of MongoDB faults. Well-designed.

Great ORM. Has undoubtedly made MongoDB millions by now.

How is it possible to have aggregation cursors without crunching the complete data set in advance (aka map/reduce) and still have consistent and correct results?

Will MongoDB still segfault under certain circumstances?

Everything will segfault under certain circumstances.

The only reasonable assumption is circumstances that do not involve hardware failure, the binary being compiled incorrectly, the source code being modified or replaced by someone downstream, or the libraries it uses being corrupt or replaced by ABI-incompatible variants. None of these are reasonable circumstances; one would then further assume that the person posting has run into reasonable circumstances where MongoDB often crashes, which is not a stretch given the number of bugs filed against it that talk about this kind of issue.


Exactly; well said. It's been about a year but basically I had a replica set where sometimes one replica or the other would segfault and I'd have to manually delete its data files and re-replicate, after which it was fine. Happened about every few months. It was clearly based on application behavior and not an inherent system problem such as those you listed.

Have never seen my python server segfault.


I wonder, what does this mean for TokuMX

It means we have some auditing and backporting work ahead of us in the next few months.

Am glad someone has brought up TokuMX. Tell me, have you used it in production yet?

I wonder why 10gen hasn't made any official comment on the work the folks at Tokutek are doing to enhance Mongodb's features.

> I wonder why 10gen hasn't made any official comment on the work the folks at Tokutek are doing to enhance Mongodb's features.

Why would they comment? What would they say? Toku is basically trying to steal MongoDB's customers, they even use the same basic pricing model.

For me the really irritating thing is the observation that when I want to enjoy flame wars about Mongo, I seem to see a lot of their "try TokuMX" everywhere. Last time I checked they were on the 2.2 codebase, unless that recently changed. Mongo 2.4 was a significant upgrade, and assuming I'm correct above, I feel like the Tokutek team disregard that. I get that it's good marketing to suggest an alternative, but going around criticising that which you've built upon doesn't sit well with me.

Last time you checked was a while ago, it's 2.4 compatible now (except geo and full-text) and has been since TokuMX 1.3.

We generally don't criticize indiscriminately. MongoDB has a lot of good sides and we embrace and extend those, and where it has faults we try to work around or replace them. Our core strength is fast, reliable, compressed storage and MVCC semantics, so obviously we talk about that a lot, but we also understand and acknowledge that a large amount of TokuMX's success, to the degree it has some, is due to the excellent parts of MongoDB.

As an example, I personally am really excited about what MongoDB has done with aggregations in 2.6 (and what seems to be coming down the pipe soon), and I can't wait to merge it in to TokuMX. We all get stronger together.

This is a well put together reply, very politically correct. Good to know you're on TokuMX dev team.

Mongo haven't updated their feeds for Ubuntu. I wonder how long they will take to do so?

They have now. However, 2.6 uses different package names, configuration file name, log file, etc. It is generally mongod instead of the earlier mongodb (e.g. before it was /etc/mongodb.conf and it is now /etc/mongod.conf).

This means no automatic upgrades to 2.6, and sysadmin action to correct config file name etc.

mongodb is the best database in the whole wide world at the moment. I encourage everyone to jump in mongodb for agile web scale development with full big data capability.

I truly cannot tell if this comment is meant to be flamebait, buzzword-laden sarcasm, or NoSQL fanboyism.

Well, I think I must disagree with you here.

I'm sure that it is a viable choice for some use cases; it's just that I haven't found a use case for it yet.

Being able to choose from PostgreSQL, Redis, Cassandra, heck, even ElasticSearch made me always choose one of those over MongoDB, at least for the problems which I had been trying to solve.

By the way, if you are looking for all of the above functionality, provided by all the DBMSs you mentioned, in a single DBMS instance, you can check out amisaserver.com. Polyglot persistence is just another fad.

Doesn't appear to have a community edition for local/private installation...

Don't forget RethinkDB. Very good general-purpose document store that supports joins and has an excellent query language (no DB write lock either!)

Oh? How is it better than rethinkdb (at the same job)?

the1 won't tell you. S/he's teasing.

I have never understood how the definition of "Document Database" is different from "File System".

Really? Have you ever tried to search for specific content in structured documents on a filesystem?

I have, and found what I was looking for. What am I doing wrong?

How much data were you grepping for? How much time did it take? How did you transmit those results over the network?

Or does your data fit in one csv file?

You have too little data and/or too little metadata, and are not taking into account time, flexibility, tooling, etc etc.

Essentially you say something akin to "Death metal is just pulsating air-waves, like jazz, so what's the big difference?".

You are referring to using an index, correct? Because grep is absolutely, madly efficient for a doing a full search.

The index portions of a file system are called files and directories.

Several file names can refer to the same data. Those are called hard links.

So with hard links, I can refer to a Foo by their related Bar.

/foo/foo1
/foo/foo2
/foo/by_bar/bar1
/foo/by_bar/bar2
/foo/by_bar/bar3
/bar/bar1
/bar/bar2
/bar/bar3
/bar/by_foo/foo1
/bar/by_foo/foo2

If I am not mistaken, this accurately describes the limits of MongoDB in terms of mapping relations. I'm not a Mongo expert because no one could convince me otherwise to date, somebody correct me?

>You are referring to using an index, correct? Because grep is absolutely, madly efficient for a doing a full search.

I'm not sure why you imply that a full search is incompatible with an index.

Perhaps you meant "full scan", that is reading everything while searching, instead of "full search" (searching everything). The first is not a prerequisite for the second.

In any case, grep is a very inefficient way of doing a full search. An index is so much faster it's not even funny.
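The difference being argued here is linear scan versus index lookup; a minimal inverted index in Python makes it concrete:

```python
# grep-style search: O(total bytes) on EVERY query -- reads every document
def linear_search(docs, term):
    return [i for i, text in enumerate(docs) if term in text.split()]

# index-style search: pay once to build, then each lookup is a dict access
def build_index(docs):
    index = {}
    for i, text in enumerate(docs):
        for word in set(text.split()):
            index.setdefault(word, []).append(i)
    return index

docs = ["the quick brown fox", "lazy brown dog", "quick tests pass"]
index = build_index(docs)
assert linear_search(docs, "brown") == index["brown"] == [0, 1]
```

Both return the same answer; the index just stops re-reading every document per query, which is the whole argument for a database index over grep.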

>The index portion of a file system are called files and directories.

Those are just indexes for the names of the files and folders, plus a few other select pieces of metadata. Nothing like a full-text search index, or even actual indexes on metadata.

(Some filesystems allow those too, e.g. in BeOS, but nowhere as comprehensive and flexible as using a dedicated tool for this, be it MongoDB or something else).

>Several file names can refer to the same data. Those are called hard links. So with hard links, I can refer to a Foo by their related Bar.

Sounds like a convoluted and inefficient way of building something somewhat like a "document database" with 1/10 the features (if that).

>I'm not a Mongo expert because no one could convince me otherwise to date, somebody correct me?

I'm far from a fan of Mongo, but you seem like you have already made up your mind, and nothing will change it.

Plus, if a filesystem is enough of a document database for you (with no cheating, e.g. piling up tons of hacks and add-ons like external full-text scanning tools), then by all means, use one.

Grep is essentially the slowest way to search content. It has to read every byte. You can do much better with term/field indexing.

Why would you be calling grep from an online application anyway?

Grep doesn't read every byte.

I used the word "read" and not the word "compare" for a reason.

You are missing the big picture here of linear search versus indexed search. Optimizations in grep don't magically make it better than O(n).

It's very close to JSON objects. In fact, it uses JSON/BSON. So it's Hashes of data structures, which MongoDB makes accessible quickly, like a file system must be.

You can also use GridFS to store files in the document database, which actually breaks files into chunks and stores them in collections, also just like a FAT table.
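GridFS's chunking can be sketched as splitting a byte string into fixed-size chunk documents (the real default chunk size is on the order of 256KB; a tiny chunk size is used here for illustration):

```python
def to_chunks(file_id, data, chunk_size):
    """Split data into GridFS-style chunk documents, keyed by (files_id, n)."""
    return [{"files_id": file_id, "n": i // chunk_size,
             "data": data[i:i + chunk_size]}
            for i in range(0, len(data), chunk_size)]

def from_chunks(chunks):
    """Reassemble by ordering on chunk number, like GridFS does on read."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))

blob = b"hello gridfs world"
chunks = to_chunks("f1", blob, chunk_size=5)
assert len(chunks) == 4 and from_chunks(chunks) == blob
```

The chunk documents live in an ordinary collection, which is the sense in which it resembles a FAT table: a file is a named sequence of numbered blocks.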

In the case of MongoDB, you dump a blob called BSON, which can itself be larger than the equivalent JSON. Paradoxically, this is touted as a space-efficient binary serialization; you then read it back using an index or something.
