
Very cool. We're battling Mongo as we speak. I've never run into having to run a compaction on a DB before; I'd love to know of a way to automate this like every other DB I've ever used. It's probably something I've glossed over, and I'm sure within the first 5 minutes I'll be slapping my forehead.

Btw, do we get a free MongoDB sticker sent to us on completion? I lost my last one.




Without knowing anything about your dataset or insertion/deletion patterns, you may still want to consider power-of-two allocation. It's a new feature in 2.2 that costs some additional disk space up front (new documents potentially get up to 2x padding), but it can save a lot of disk over the long run by eliminating many of the allocation patterns that lead to fragmentation. One (pathological) benchmark I ran saved 800x the disk space after 1,000K (edited from 100K) insertions and deletions of 1-10KB documents.

http://docs.mongodb.org/manual/reference/commands/#usePowerO...
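If you want to try it, it's a per-collection flag in 2.2 set via collMod. A minimal sketch from the mongo shell, assuming a collection called "events" (the name is just a placeholder):

    // Rough sketch, not production-ready: enable power-of-two record
    // allocation on an existing collection. "events" is a placeholder name.
    db.runCommand({ collMod: "events", usePowerOf2Sizes: true })

    // Records written after this point get power-of-two sized allocations,
    // so space freed by deletes is much easier to reuse; existing records
    // keep their old padding until they are rewritten.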


Databases in general don't like to give back space they've allocated, because allocating space is an expensive operation. You probably don't need to compact the DB unless you have a problem with file size (which I have run into in some scenarios).

There's nothing built in that I know of, but you can easily set something up that runs on a regular basis, like a scheduled task or cron job, and issues either the compact command or the repairDatabase command.

http://docs.mongodb.org/manual/reference/command/compact/
http://docs.mongodb.org/manual/reference/command/repairDatab...
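For example, a minimal sketch of a script such a job might run (the "events" collection and "mydb" database names are placeholders; note that compact blocks other operations on the database while it runs):

    // compact-events.js -- hypothetical maintenance script, run e.g. from cron:
    //   0 3 * * 0 mongo mydb /path/to/compact-events.js
    // Schedule it for a quiet window (or compact secondaries one at a time
    // in a replica set), since compact holds a write lock on the database.
    var result = db.runCommand({ compact: "events" });
    printjson(result);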


Speak for your own databases. Mine does it just fine.


I didn't say they can't do it, they just don't like to.


Again, speak for your own databases. There are databases out there that plug each hole every time you delete a row, a.k.a. zero fragmentation.

Only databases that don't fit their working set in memory have to resort to fragmentation to get reasonable performance. If the data set fits in memory, there are other strategies.


It is not only this: moving datums around in the database heap is expensive (invalidating caches, exacerbating thrashing) and touchy (index pointers, concurrency considerations).

But debloating over time is an important property, so it's probably worth it to eventually get it right.


I can hook you up with some stickers. meghan @ 10gen.com



