
Let's also consider that MongoDB's entire I/O architecture and data structure layout are built around a global lock and in-place updates. That makes non-blocking compaction nearly impossible without either a MAJOR overhaul of the codebase or somehow rigging up the replication system to internalize the advocated compact-on-replica-set-slave-and-then-rotate method.

PostgreSQL has an automatic cost-based vacuum and sophisticated adaptive free space and visibility maps. With log-then-checkpoint writes and MVCC, there's no interruption of the database at all to reclaim page space used by deleted rows.

Cassandra also has a write-ahead log, and flushes its in-memory representation of data to immutable files on disk, which actually was a pretty brilliant decision. Dead rows are removed during a minor compaction, which can also merge data files together. The compaction is very fast because the files are immutable and in key-sorted order, making it all lock-free, sequential I/O.
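
To make the compaction point concrete, here's a rough sketch of the idea in Python. It is not Cassandra's actual code: the `(key, timestamp, value)` record layout, the use of `None` as a tombstone, and the `compact` function are all made up for illustration. It also drops tombstones immediately, whereas a real system would keep them around for a grace period first. The point is just that merging immutable, key-sorted files is a single sequential pass with no locks.

```python
import heapq
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical record layout: (key, timestamp, value).
# A value of None stands in for a tombstone (a deleted row).
Record = Tuple[str, int, Optional[str]]


def compact(*runs: Iterable[Record]) -> Iterator[Record]:
    """Merge key-sorted runs into one key-sorted stream.

    Assumes each run is sorted by key and holds at most one version per key.
    Keeps only the newest version of each key and drops dead rows.
    """
    # Order by key, then newest timestamp first, so the first record
    # seen for a key is always the one that wins.
    merged = heapq.merge(*runs, key=lambda r: (r[0], -r[1]))
    last_key = None
    for key, ts, value in merged:
        if key == last_key:
            continue  # older version of a key we already handled
        last_key = key
        if value is not None:  # skip tombstones: the row is dead
            yield (key, ts, value)


# Two immutable "files": an older flush and a newer one.
old_run = [("a", 1, "x"), ("b", 1, "y"), ("c", 1, "z")]
new_run = [("b", 2, None), ("c", 2, "zz"), ("d", 2, "w")]

compacted = list(compact(old_run, new_run))
```

Every input is read front to back exactly once and the output is written front to back, so nothing ever has to be locked or updated in place. Here `b` disappears (its newest version is a tombstone) and `c` keeps only its newer value.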

In theoryland(tm), in-place updates seem better because you're only writing data once. In realityworld(R), the need for compaction, cost of random I/O, data corruption problems, and locking make it suck.


