Hacker News
Ten things I didn’t know about MongoDB (slowping.com)
50 points by iqster on June 10, 2011 | 25 comments

Some things I didn't know:

1- While you can select only specific fields by using {fieldA: 1, fieldB: 1} as a 2nd parameter to find, you can also exclude fields using 0. Interestingly though, you can't mix inclusion and exclusion (which makes sense if you think about it), except to exclude _id. So {name: 1, description: 1, _id: 0} is possible.

2- Sending SIGUSR1 to the running mongod process will rotate the logs

3- We got a 25% memory/storage reduction by shrinking our field names. YMMV. I knew how data was stored, but I didn't know what amount we would specifically save :P

4- Count returns the number of matching documents regardless of paging (skip/limit). This lets you pull a page of documents + get the total # very easily. If you pass true to count, you'll get the # of documents actually returned.

5- Replica set members have a priority of either 0 or 1...future versions will introduce more flexibility
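A toy model of the projection rules from point 1, sketched in pure Python. This only mimics the semantics for illustration; it is not driver or server code, and the function name is my own:

```python
def project(doc, spec):
    """Toy model of MongoDB find() projection semantics (not real driver code).

    spec maps field names to 1 (include) or 0 (exclude). Mixing inclusion and
    exclusion raises, except that _id: 0 may accompany an inclusion spec.
    """
    modes = {v for k, v in spec.items() if k != "_id"}
    if len(modes) > 1:
        raise ValueError("cannot mix inclusion and exclusion in a projection")
    if modes == {1}:  # inclusion: keep listed fields, plus _id unless excluded
        out = {k: doc[k] for k in spec if spec[k] == 1 and k in doc}
        if spec.get("_id", 1):
            out["_id"] = doc["_id"]
        return out
    # exclusion (or an _id-only spec): keep everything not explicitly excluded
    return {k: v for k, v in doc.items() if spec.get(k, 1)}

doc = {"_id": 1, "name": "widget", "description": "round", "secret": "x"}
print(project(doc, {"name": 1, "description": 1, "_id": 0}))
# → {'name': 'widget', 'description': 'round'}
```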
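Point 4, sketched with a plain Python list standing in for a collection (the query, variable names, and numbers here are all made up):

```python
# Toy illustration of cursor count() semantics using a plain list as the
# "collection". count() ignores paging; count(true) respects it.
docs = [{"n": i} for i in range(25)]
skip, limit = 10, 5

matching = [d for d in docs if d["n"] % 2 == 0]  # the query's full result set
total = len(matching)                  # like count(): ignores skip/limit -> 13
page = matching[skip:skip + limit]     # the page actually returned
returned = len(page)                   # like count(true): respects paging -> 3
print(total, returned)  # → 13 3
```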

>3- We got a 25% memory/storage reduction by shrinking our field names. YMMV. I knew how data was stored, but I didn't know what amount we would specifically save :P

I really wish they just had an internal lookup table to do that. I don't want to have to deal with keys like "c1, ba, la" in my application.

Some drivers/mappers support it. However, as much as MongoDB likes to lean on the drivers, this clearly belongs in the server.

With respect to #3, is this happening because field names aren't getting compressed when stored as BSON objects?

P.S. Your ebook and tutorial are great resources. Thanks for sharing!!

No, it's because of the document-oriented storage model: the field names are stored in every single one of the documents. E.g., if in every instance you had 't' instead of 'Title', and 'p' instead of every 'Post', it can add up to a lot.

It is a property of document dbs, though I'm pretty sure implementations could internalize field names. Imma guess 99% of all Mongo collections ever created are pretty damn structured. When creating a collection, you could specify {internalize: false} or something to disable this when you do lean heavily on the schemaless behavior, but otherwise I'd like to see MongoDB take care of this.
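Until the server internalizes field names, a client-side lookup table is easy to sketch. Here's a minimal Python version; the field map and helper names are my own, and I'm using JSON length as a stand-in for BSON size (the repeated-key overhead is the same idea in both encodings):

```python
import json

# Hypothetical lookup table mapping short stored keys to the long names the
# application uses -- the kind of mapping a driver/mapper could hide for you.
FIELD_MAP = {"t": "title", "b": "body", "a": "author"}
REVERSE = {v: k for k, v in FIELD_MAP.items()}

def shrink(doc):
    """Rewrite long application keys to short stored keys before insert."""
    return {REVERSE.get(k, k): v for k, v in doc.items()}

def expand(doc):
    """Rewrite short stored keys back to long keys after a find."""
    return {FIELD_MAP.get(k, k): v for k, v in doc.items()}

posts = [{"title": "Hi", "body": "...", "author": "pg"} for _ in range(1000)]
long_size = len(json.dumps(posts))
short_size = len(json.dumps([shrink(p) for p in posts]))
print(long_size, short_size)          # every document repeats its field names
assert expand(shrink(posts[0])) == posts[0]  # round-trips losslessly
```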

In its defence, Mongo is basically brand new. They've done the cool CAP stuff first, now they're discovering why everyone else keeps banging on about wanting ACID guarantees.

edit: I suppose I should have expected to be downvoted -- I'll elaborate on my thoughts.

The folk who wrote things in the pre-CAP theorem era were not ignorant of the problems of scale. They did their best to attack them and did an amazing job of it.

If you remove constraints, then yes, you can improve performance. But soon you will discover why and how those constraints were imposed in the first place.

The implementers of Mongo are, less some genuine advances, doomed to repeat history per Santayana.

Funny that they say multi-master is bad because we can't reason about it. It always feels like they're out to insult users, telling them they're too stupid to use other things because it's hard. I guess all that durability didn't matter either, until they had time to implement write logs. I guess first-class conflicts won't matter either, until they have real distributed computing support across multiple data centers.

MongoDB is a single-system database with support for replication and sharding. If you want a true distributed database, you'll need to look into Riak or CouchDB.

Riak also has no master!

Yeah. I was just pointing out that 10gen's arguments seem to evolve at its own convenience. Developers in general need to examine things more critically (the why) instead of stopping at "that makes sense."

A few things I didn't know about MongoDB when I started using it (admittedly, this was last year, things might have gotten better):


Another gotcha we found is that you can't do any type of real-time aggregation in a production environment.

There is a group() function, and there is Map Reduce, but both run via the JavaScript engine, which is single-threaded. That means you can never execute more than one aggregation query at a time. So if an aggregation query is taking a long time to run, all other aggregation queries will block and potentially time out.
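For reference, what group()/Map Reduce compute is essentially a group-by with a per-key reduction. A rough sketch of that shape in Python (collection and field names are made up; the thread's complaint is about mongod serializing these through one JS engine, not about the computation itself):

```python
from collections import defaultdict

def map_reduce(docs, map_fn, reduce_fn):
    """Sketch of the map/reduce shape: map emits (key, value) pairs,
    reduce collapses each key's values into one result."""
    buckets = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            buckets[key].append(value)
    return {k: reduce_fn(k, vs) for k, vs in buckets.items()}

orders = [{"cust": "a", "total": 5}, {"cust": "b", "total": 7},
          {"cust": "a", "total": 3}]
result = map_reduce(orders,
                    lambda d: [(d["cust"], d["total"])],  # emit (cust, total)
                    lambda k, vs: sum(vs))                # sum per customer
print(result)  # → {'a': 8, 'b': 7}
```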

Supposedly we'll have better options in 2.0.

In addition to supporting multi-threaded map-reduce, it'd be nice if they

1- Eliminated any memory limits on inline map reduce, or brought back output to temporary collections. If they bring back output to temporary collections, allow them to be created on slaves without participating in replication

2- Polished and released a production-ready version of their MongoDB Hadoop Adapter: https://github.com/mongodb/mongo-hadoop

I didn't know about the global lock either. Because of it, we sometimes see read queries take several seconds, even on collections we are not writing to (we are very write heavy).

That is not much of an explanation for why. It's like saying Chinese food is better for the Chinese.

You know, Mongo DB is Web Scale: http://www.youtube.com/watch?v=b2F-DItXtZs

Too bad their site doesn't scale as well as MongoDB

Crashed for me too

I always find it annoying that you can't really restrict MongoDB's memory consumption.

I'd love to run it alongside an ElasticSearch process and a small Redis server on one machine, but while I can limit ElasticSearch and could theoretically limit Redis, as far as I understand MongoDB would just keep growing (mmapped I/O). I'd love to use it as a second copy of the data in ElasticSearch.

I agree that being able to set a hard limit might be nice in some situations. However, I believe the current implementation pretty much leaves memory management up to the OS, which, hey, I'm no expert, but sounds reasonable to me.

MongoDB won't starve other processes of memory unless the OS decides it should. It seems like the best way to leverage the most amount of available memory.
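A minimal illustration of memory-mapped I/O, the mechanism that (as of this thread) makes MongoDB's memory usage the OS's business: the process maps the data file into its address space and the OS page cache decides what stays resident. Pure Python stdlib, nothing MongoDB-specific:

```python
import mmap
import os
import tempfile

# Map a scratch file and write through the mapping; the OS, not the program,
# manages which pages of the file are resident in RAM and when they flush.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"\x00" * 4096)         # pretend this is a data file
    with mmap.mmap(fd, 4096) as m:       # map it into our address space
        m[0:5] = b"hello"                # writes land in the page cache...
        m.flush()                        # ...and the OS writes them back
    with open(path, "rb") as f:
        data = f.read(5)
    print(data)  # → b'hello'
finally:
    os.close(fd)
    os.remove(path)
```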

"MongoDB should likely not be run in a 32-bit environment." This is sending up alarm bells for me. Why does it matter? Can a Mongo fan explain why this is?

Probably because of the file size limitations of memory-mapped I/O on a 32-bit system: with only ~4 GB of address space to map data files into, total database size is capped at around 2 GB.

And god help you if you reach that limit on an already repaired/compacted db. You can't even connect your shell to delete some data. I had to rm -fr a collection out of the /data folder... thankfully it wasn't production... but I can only imagine...

