
If I never expect the dataset to grow past 1GB and a single server, why would I use anything else? It doesn't really fail - none of the issues described were "failures" really. [edit: just to be clear, it didn't crash and burn, and I don't think performance issue == failure] The data loss was not confirmed either: "There appears to be some data loss occurring", and in small deployments you can just use the transaction log.

There's no other project I know of that provides all of: schemaless JSON documents, indexing on any part of them, server-side mapreduce, lots of connectors for different languages, and atomic updates on part of the document. If there is one and it's better than Mongo, I'd switch in a moment.

>> "It doesn't really fail - none of the issues described were "failures" really."

These absolutely were failures.

The author listed several instances in which the database became unavailable, the vendor-supplied client drivers refused to communicate with it, or both. Some of these scenarios included the primary database daemon crashing, secondaries failing to return from a "repairing" to an "online" state after a failure (and unable to serve operations in the cluster), and configuration servers failing to propagate shard config to the rest of the cluster -- which required taking down the entire database cluster to repair.

Each of the issues described above would result in extended application downtime (or at best highly degraded availability), the full attention of an operations team, and potential lost revenue. The data loss concern is also unnerving. In a rapidly-moving distributed system, it can be difficult to pin down and identify the root cause of data loss. However, many techniques such as implementing counters at the application level and periodically sanity-checking them against the database can at minimum indicate that data is missing or corrupted. The issues described do not appear to be related to a journal or lack thereof.
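To make the counter idea concrete, here is a minimal sketch, not anything from the original post. `AuditedStore` and its dict-based `backend` are hypothetical stand-ins; in a real deployment the dict operations would be driver calls, and the tally would need to live somewhere durable that is independent of the database being checked.

```python
from collections import Counter


class AuditedStore:
    """Counts acknowledged writes per collection and audits them later.

    `backend` is a hypothetical stand-in for the database: a dict that
    maps a collection name to a dict of {doc_id: document}.
    """

    def __init__(self, backend):
        self.backend = backend
        self.expected = Counter()  # application-side tally of live documents

    def insert(self, collection, doc_id, doc):
        docs = self.backend.setdefault(collection, {})
        if doc_id not in docs:
            self.expected[collection] += 1  # only new documents change the count
        docs[doc_id] = doc

    def delete(self, collection, doc_id):
        docs = self.backend.get(collection, {})
        if doc_id in docs:
            del docs[doc_id]
            self.expected[collection] -= 1

    def audit(self):
        """Return {collection: (expected, actual)} for every mismatch."""
        mismatches = {}
        for collection, expected in self.expected.items():
            actual = len(self.backend.get(collection, {}))
            if actual != expected:
                mismatches[collection] = (expected, actual)
        return mismatches
```

Running `audit()` periodically won't tell you why documents went missing, but a non-empty result is at least a confirmed discrepancy rather than "there appears to be some data loss".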

Further, the fact that the database's throughput is limited to utilizing a single core of a 16-way box due to a global write lock demonstrates that even when ample IO throughput is available, writes will be stuck contending for the global lock, while all reads are blocked. Being forced to run multiple instances of the daemon behind a sharding service on the same box to achieve any reasonable level of concurrency is embarrassing.

On the "1GB / small dataset" point, keep in mind that Mongo does not permit compactions and read/write operations to occur concurrently. In a write-heavy scenario, as documents are inserted, updated, and deleted, what may be 1GB of data will grow without bound on disk, past 10GB, 16GB, 32GB, and so on, until it is compacted. Unfortunately, compaction also requires that nodes be taken out of service. So even with small datasets, write/update/delete-heavy workloads force you to periodically take nodes out of service, which further compromises the availability of the system.

What's unfortunate is that many of these issues aren't simply "bugs" that can be fixed with a JIRA ticket, a patch, and a couple rounds of code review -- instead, they reach to the core of the engine itself. Even with small datasets, there are very good reasons to pause and carefully consider whether or not your application and operations team can tolerate these tradeoffs.

Just to be 100% clear -- so people don't misunderstand your explanation of Mongo's compaction: Mongo does have a free space map that it uses to attempt to fit new data or resized documents into "holes" left by deleted data. However, compaction will still eventually have to be run, as the data will continue to fragment and eventually things get bad.
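A toy model of why hole reuse alone doesn't stop the growth. This is an illustration under loud assumptions, not Mongo's actual allocator: `ToyDataFile` is a made-up class that uses first-fit placement and never merges or moves holes.

```python
class ToyDataFile:
    """Toy data file with a first-fit free list (illustration only)."""

    def __init__(self):
        self.size = 0    # total bytes the file occupies on disk
        self.live = {}   # doc_id -> (offset, length)
        self.holes = []  # (offset, length) regions left behind by deletes

    def insert(self, doc_id, length):
        for i, (offset, hole_len) in enumerate(self.holes):
            if hole_len >= length:
                # Reuse the hole; any leftover stays behind as a smaller hole.
                leftover = hole_len - length
                if leftover:
                    self.holes[i] = (offset + length, leftover)
                else:
                    del self.holes[i]
                self.live[doc_id] = (offset, length)
                return
        # No hole is big enough, so the file has to grow.
        self.live[doc_id] = (self.size, length)
        self.size += length

    def delete(self, doc_id):
        self.holes.append(self.live.pop(doc_id))

    def live_bytes(self):
        return sum(length for _, length in self.live.values())


f = ToyDataFile()
for i in range(10):
    f.insert(i, 100)         # ten 100-byte documents
for i in range(0, 10, 2):
    f.delete(i)              # delete every other one, leaving 100-byte holes
for i in range(10, 13):
    f.insert(i, 150)         # slightly larger documents fit in no hole
print(f.size, f.live_bytes())
```

In this model the 150-byte documents can't use any of the 100-byte holes, so the file grows even though 500 bytes are sitting free. Smaller documents do reuse holes but leave even smaller fragments behind, which is the "eventually things get bad" part.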

>> "The data loss was not confirmed either: "There appears to be some data loss occurring""

Oh, this mystery is a failure all right, and even the most charitable interpretation would call it a misfeature.

What do you think of Redis? I feel the same way about Mongo for the most part, but have been considering switching.

If you can model your data in Redis data structures, it is excellent. Keep in mind that there is no preferred mechanism for operating Redis when the data is larger than RAM. There are vm and diskstore, both deprecated by antirez, and the focus is on data sets that fit in RAM.

If you can do both of those things, it is awesome.
