People give MSSQL shit for having row-level locks (if you don't use their MVCC option), yet how is it that Mongo runs with a database-wide lock and people don't immediately laugh and walk away? Is the hype so powerful that people just shrug at a huge mutex?
cue: someone telling me SQL query optimization is easy and people are just idiots.
Personally, I would much rather perform a complex query in SQL than in NoSQL. The baseline is likely to be not-terrible, which may be good enough for one-off queries. Optimization is much easier (IMHO): SQL is a DSL for querying and indexing. Compared to having to write code to do that (which seems to be the primary NoSQL approach), the DSL approach seems much more efficient. SQL isn't perfect, but it beats no DSL support, or another DSL that is even worse than SQL :-)
MongoDB has its problems (for instance, document schemas can't be enforced, its disk-space consumption is unintuitive, ...) and relational DBs have their strengths (ACID, SQL being standard across DBs, ...). But very few people ever reach a point where the global write lock becomes a problem. 10gen knows exactly which features are vital to most of their users and which are not...
The relational model takes this observation and concludes that there should be no "natural hierarchy" in the logical model (the physical model is a separate question). It's a _theoretically_ beautiful idea. The counter-intuitive _real-world_ result is that the theoretical approach also yielded faster systems than alternative philosophies. I think that's the reason why the relational model has dominated for the past 30+ years.
You might well argue that that's because a _good_ NoSQL implementation has yet to be created (the no-true-Scotsman argument). That may well prove to be true one day, but I would bet that a good non-relational implementation will be a derivative of the relational world (Postgres with hstore, or Google's F1 with Protobufs), not of a project whose starting axiom is to get rid of SQL and the relational model.
MongoDB is in its own class of terribleness.
A benchmark from Oracle shows InnoDB as being much faster on a read-only workload.
EDIT: By the way, locks are table-level in MyISAM and MEMORY.
Unfortunately the read/write improvements weren't high enough to justify the massive engineering feat it would take to replace MongoDB with TokuMX, mostly due to the lack of commercial support for Toku.
The migration is easy now that they've released a tool to replicate from a stock mongo to TokuMX. They also do have commercial support that we've paid for and their team has been incredibly helpful and responsive.
Just to clarify, are you talking about MSSQL's tendency to escalate locks (e.g. from row to page to extent to table)? They do that to make locks more manageable, and while in most cases it improves performance, in some high-concurrency situations it can be a gigantic PITA (especially given that the ways to strong-arm it into not escalating are not the most helpful).
> Lately I haven't read any news about a company migrating to Mongo, but rather most were either departing from mongo
They had some strange defaults to start with (to give them a serious advantage in small, silly benchmarks), like unacknowledged writes. Yes, you read that correctly: for years their default configuration was to throw write requests over the fence and assume they succeeded.
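To make that concrete, here's roughly what the two modes look like from pymongo (host, database, and collection names here are placeholders; the old pre-MongoClient drivers defaulted to the w=0 behavior):

    from pymongo import MongoClient

    # Fire-and-forget: the old default. The driver doesn't wait for the
    # server at all, so failed writes are silently lost.
    unsafe = MongoClient("localhost", 27017, w=0)

    # Acknowledged and journaled: the server confirms the write hit the
    # journal, and the driver raises an exception on failure.
    safe = MongoClient("localhost", 27017, w=1, j=True)
    safe.app.events.insert_one({"kind": "signup"})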
There are some horror stories of people's databases becoming silently corrupted. Those corrupted databases were then backed up, so the backups were corrupted too.
Eventually they fixed their defaults, but their reputation as an engineering company went downhill (I speak mostly for myself here).
From my experience, a lot of people using MongoDB don't know what they are using. To them, 'unacknowledged writes' and 'database-level locks' sound like Latin. They see short examples, cool mugs, and other cool kids talking about how easy MongoDB is, and they start using it.
Tens of thousands of computer geeks all randomly chose to hate a product. Clearly a coincidence or unlucky alignment of stars, nothing to do with said product, of course.
Lover: MongoDB is so much faster and easier than old school SQL!
Hater: It's fast and easy because it doesn't check whether it has written data correctly, which leads to corrupt data.
Lover: Well, you could always turn on safe writes.
Hater: If you turn on safe writes, it becomes slower than normal SQL databases.
Lover: Well you're just trying to use it the same way you use SQL. MongoDB is great for some problems and not so great for other problems.
Hater: What problems is MongoDB better at solving?
And then that's the end. I've never heard a convincing use case for this thing. If you can change that for me and give me one, I'd be delighted to listen.
It is also very easy to use and manage.
This is a really bold claim that I very much doubt is true. For one, there are numerous other document stores that target the same sort of use cases as MongoDB. For another, there are benchmarks floating around showing that PostgreSQL used as a key-value store is faster than MongoDB as-is. I wouldn't be surprised if you saw similar results with other SQL databases.
That doesn't mean I would want to store financial transactions for a bank in a MongoDB, but it has its place and that place is clearly huge (judging by the number of people using it).
I love it because it is:
- simple to use
- fast enough (with safe writes, thank you!)
- simple to use
- simple to use
Did I mention it was simple to use, which allows me to focus on app instead of wrestling with DB?
I have used Mongo plenty and I really don't want to do it again. With its almost-safe writes it's extremely slow and no simpler to use than Postgres. Also much less flexible, basically a subset of Postgres' data model.
This is something that SQL database fans don't get. A small start-up can't afford to have a team of guys fiddling with incredibly obscure performance tuning settings.
A developer might not know how to optimize a slow-running Postgres query, but he knows how to cache data and denormalize in a manner that makes the query he wants to run fast. And a document store plays nice with this.
Any simple relational beauty of SQL databases falls down once you get over a few million rows, and then the hideous hacks and bizarre settings start coming out. At that point you need a pile of domain-specific knowledge to make them play nice.
Mongo takes the domain-specific knowledge out of the equation and lets the developers apply their existing skill-set to the data.
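To illustrate the point, a minimal sketch with pymongo (all collection and field names are made up): rather than tuning a relational join, you store a document shaped like the page you want to render.

    from pymongo import MongoClient

    db = MongoClient("localhost", 27017).app

    # Normalized-style access: two queries plus app-side stitching.
    post = db.posts.find_one({"_id": 42})
    comments = list(db.comments.find({"post_id": 42}))

    # Denormalized: embed the comments so one lookup serves the page.
    # The trade-off is keeping these copies in sync when a comment changes.
    db.pages.insert_one({
        "_id": 42,
        "title": post["title"],
        "comments": comments,
    })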
If you've got a thousand employees? Get a few DBAs and Postgres.
If you've got, like, five?
People think that staffing up devs creates agility and speed, when in reality it just increases product features.
Scope creep without someone around to voice the needs of operations is a crazy-efficient generator of technical debt.
For example, if you start a background index build on an eventually consistent replica set, indexing on the secondary nodes is done in the foreground. That means you only accept reads from the slaves, but the slaves are unresponsive because of the index build. In this state, if you try to do anything fancy, your data will get corrupted. The only way out is to wait through the outage (which I find pretty hard to do). This is still not solved in 2.4; waiting for 2.6.
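For anyone who hasn't hit this: the trap is that the 'background' flag only holds on the node where the build starts (names below are made up; this is the pre-2.6 behavior being described):

    from pymongo import ASCENDING, MongoClient

    coll = MongoClient("localhost", 27017).app.events

    # Background on the primary -- but on 2.4-era secondaries the same
    # build replicates as a foreground build, blocking reads on those
    # nodes until it finishes.
    coll.create_index([("user_id", ASCENDING)], background=True)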
Replica sets of all secondaries that can't elect a primary because they lost a node, mostly random primary-secondary switches that drop all connections, and occasionally the primary re-electing itself while dropping connections for no apparent reason. Mongo offers tin-foil hats for integrity, consistency, and reliability. So yeah, I'd rather examine and understand why an SQL query is slow, because it is at least deterministic, which almost nothing in Mongo is.
Postgres supports free-form JSON, XML, and hstore document formats, by the way, and CouchDB has its own specific features as a document DB too. I still don't see why people want to stick with Mongo this badly.
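If you haven't tried it, the Postgres document story is already quite usable from Python (a minimal sketch via psycopg2; assumes Postgres 9.4+ for jsonb, and the table/field names are made up):

    import json
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS docs "
                    "(id serial PRIMARY KEY, body jsonb)")
        cur.execute("INSERT INTO docs (body) VALUES (%s)",
                    (json.dumps({"user": "alice", "tags": ["admin"]}),))
        # Schemaless containment query, no migration needed.
        cur.execute("SELECT body->>'user' FROM docs WHERE body @> %s::jsonb",
                    (json.dumps({"tags": ["admin"]}),))
        print(cur.fetchall())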
There's a program running that, 100 times per second, reads a document, MD5's the field, and writes it back. At the same time, it reads a file from the local filesystem, MD5's it, and writes it back. The document and the local filesystem file started with the same value.
After a few thousand kill -9's on the master instance, the local file and the mongo document are still identical.
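For the curious, the loop is about as simple as it sounds; a sketch under assumed names (the real harness presumably differs), using journaled, acknowledged writes:

    import hashlib
    import time

    from pymongo import MongoClient, WriteConcern

    client = MongoClient("localhost", 27017)
    coll = client.test.get_collection(
        "torture", write_concern=WriteConcern(w=1, j=True))

    PATH = "/tmp/torture.txt"  # local file seeded with the same value

    while True:
        # Read the document, hash the field, write the digest back.
        doc = coll.find_one({"_id": "probe"})
        digest = hashlib.md5(doc["value"].encode()).hexdigest()
        coll.update_one({"_id": "probe"}, {"$set": {"value": digest}})

        # Do the same to the local file for comparison.
        with open(PATH, "r+") as f:
            local = hashlib.md5(f.read().encode()).hexdigest()
            f.seek(0)
            f.truncate()
            f.write(local)

        time.sleep(0.01)  # roughly 100 iterations per second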
I've been running MongoDB in production since 2010.
It's definitely possible to use Mongo in a way that isn't safe for your particular use case. But we're doing it correctly.
I haven't lost a bit of data in more than three years of MongoDB.
Mongo has a lot of limitations. We're currently researching various 'big data' solutions, because Mongo doesn't fit that need for us.
For ease of development (in dynamic languages, where your in-program data structures and in-database documents look almost identical), safety, and lack of headaches, MongoDB has been a consistent win for me and the teams I've been on.
It sounds like it might be the latter, which is not a particularly stressful test (because you can't detect data rollback).
I'm more familiar with relational database internals, but I wouldn't be surprised if a DB just optimized out the unchanged-write entirely (they'd still need to read the current row value, but they don't have to invoke any data modification code once they see the value hasn't changed).
For a good test, you really want to simulate a power loss, which you aren't getting when you do a process-level kill, because all the OS caches/buffers survive. You can simulate this with a VM, or with a loopback device. I'd be amazed if MongoDB passed a changing-100-times-a-second test then. I'd be amazed if any database passed it. I'd even be amazed if two filesystems passed :-)
I plan on extending this test by blocking port 27017 with iptables, then doing the kill -9, then wiping out all of the database files. That'll be fun. :)
And you can cache and denormalize with postgres just as easily, and it will probably perform better and not corrupt your data.
What mongo makes easy is things like replication and failover. It can go horribly wrong on that front, but until you see that, it is much easier to get up and running with replication and failover.
Not to mention the complete lack of durability. Even Redis is more durable.
Postgres is trivial to use. Is something slow (which will happen much later than with Mongo)? Add an index, done. I'm not a DBA, you don't need one.
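In practice "add an index" really is the whole tuning session most of the time; for example (made-up table and query, via psycopg2):

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:
        # See why the query is slow...
        cur.execute("EXPLAIN ANALYZE SELECT * FROM orders "
                    "WHERE customer_id = 42")
        print("\n".join(row[0] for row in cur.fetchall()))
        # ...and fix it.
        cur.execute("CREATE INDEX orders_customer_idx "
                    "ON orders (customer_id)")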
> ...a single Python process using gevent and pymongo can copy a large MongoDB collection in half the time that mongodump (written in C++) takes, even when the MongoDB client and server are on the same machine.
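For reference, the shape of that trick is something like this (a rough sketch, not the article's code; the names and the skip/limit chunking are my assumptions, and splitting on _id ranges would scale better):

    from gevent import monkey
    monkey.patch_all()  # make pymongo's sockets cooperative

    import gevent
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    src = client.app.source
    dst = client.app.target

    def copy_chunk(skip, limit):
        # Each greenlet reads one slice and bulk-inserts it, so network
        # waits overlap instead of serializing.
        docs = list(src.find().skip(skip).limit(limit))
        if docs:
            dst.insert_many(docs, ordered=False)

    CHUNK = 1000
    total = src.count_documents({})
    gevent.joinall([gevent.spawn(copy_chunk, s, CHUNK)
                    for s in range(0, total, CHUNK)])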
2. I'm sure the Mongo devs would accept pull requests for mongodump.
2. Doubt it, and there's no reason for me to write one for them.
A couple of my coworkers at Basho have done geospatial work with Riak, our scalable, distributed database: http://basho.com/indexing-the-zombie-apocalypse-with-riak/
That said, other people mention Cloudant deployments here, and that is CouchDB. If you want a Mongo-ish (they are still very different) NoSQL document store (notice I did not say database), there is an extra layer of functionality on top of CouchDB known as GeoCouch. I have never personally used it, but I have been looking for a reason to.