I was never sure, though, if that meant I had never faced a problem suitable for one of these DBMSs or if my mind is just so warped by years of using relational engines (mostly Postgres, or SQLite for simple projects) that I could not think of modeling my data any other way.
Recently though, I had to get familiar with the database schema of the ERP system we use at work, plus some modifications that have been made to it over the years, and it kind of feels to me like somebody was trying to force a square peg through a round hole (i.e. trying to model data in relational terms, either not fully "getting" the relational model or using data that simply refuses to be modeled that way).
I sometimes think the people who wrote the ERP system might have enjoyed a NoSQL DBMS. Then again, with a multi-user ERP system, you <i>really</i> want transactions (personally, I feel that ACID-compliant transactions are the single most useful benefit of RDBMS engines), and most NoSQL engines seem to kind of not have them.
(1) Transactional model. Many NoSQL databases are non-ACID, but others are (Google's stores all have some transactional guarantees). Some databases try to gain efficiency by relaxing their transactional guarantees, some probably just didn't get around to implementing a proper transactional system yet.
(2) Data model. The relational model can be overly restrictive, as you cannot easily represent contained, repeated elements in an object without ending up in a crazy join smorgasbord. Note that that doesn't mean you have to be dynamically typed.
(3) Distribution/sharding/clustering. RDBMSes are traditionally single-machine, and getting them to cluster is usually a huge source of pain. NoSQL databases are often built from the ground up for sharding.
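Point (2) is easiest to see with a sketch. Assuming a hypothetical order "document" with contained, repeated line items and notes (all table and column names here are made up for illustration), the relational version fans out into several tables, and reading one order back already takes multiple joins:

```sql
-- Hypothetical schema: one "order" document becomes three tables.
CREATE TABLE orders      (id BIGINT PRIMARY KEY, customer TEXT);
CREATE TABLE order_lines (order_id BIGINT REFERENCES orders(id),
                          sku TEXT, qty INT);
CREATE TABLE order_notes (order_id BIGINT REFERENCES orders(id),
                          note TEXT);

-- Reassembling the "document" already needs two joins, each extra
-- nested collection adds another, and joining two independent
-- collections at once fans the result rows out multiplicatively.
SELECT o.id, o.customer, l.sku, l.qty, n.note
FROM orders o
LEFT JOIN order_lines l ON l.order_id = o.id
LEFT JOIN order_notes n ON n.order_id = o.id
WHERE o.id = 1;
```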
I think people go for MongoDB mostly for (2) and ease of use. Very few people have an actual big data problem where you really need (3), and for reliability there are simpler solutions (hot standby). (1) makes it so much easier to build reliable systems that it'd be a real deal breaker for me.
I personally don't understand why so many people go for NoSQL; it seems to me like that creates a substantial cost, both in performance and, more importantly, in missing transaction guarantees, with no real benefit, at least none that's obvious to me. MongoDB in particular, with its unacknowledged writes, no real transactions, and no real distribution story, seems like an odd choice.
Pardon me if I'm just sniping on the word "object" here, but if you think of your data as objects then you will find the relational model restrictive.
In my experience, objects are an application concept, closely coupled to an implementation. If you can conceive of your data in implementation-independent terms, i.e. as entities and relationships, then you can put an RDBMS to effective use.
In a distributed scenario, when a partition event occurs, relational databases opt for consistency, whereas NoSQL opts for availability. This is formally correct behaviour by relational DBs, but it comes with a cost: a serial performance component that can't be parallelized. Most NoSQL DBs, in this scenario, go for availability, and may thus eschew some consistency-guarantee work, at a cost in data consistency and with an advantage in parallel performance.
The trick, as ever, is to use each tool for its function. NoSQL and relational DBs are wholly different tools, for wholly different problem classes. Using NoSQL where consistency is paramount irks me to no end, and that is the case 80% of the time I see people using NoSQL. On the other hand, in specific cases, NoSQL DBs are a new, useful tool in my arsenal.
For example, when the site admin says "I want a new archive that I can fill with items; items will have Id (automatic), Name (string), IsMale (bool)". He also wants to do complex queries on this data. That's where NoSQL comes in to help.
And to answer why exactly MongoDb is so popular - it's because it has awesome driver support for every popular language.
I don't understand what's so hard to understand here. It's a simple solution to EAV/nulltable nightmare.
I've only seen EAV used in one system, Magento, but it was a disaster. It's complex and slow to the point that every product is stored both in the EAV model and as a "flattened product".
For systems dealing with sales and the economy in general I would almost always pick an RDBMS; it seems a much more natural fit. The ability to do ad-hoc queries in SQL, rather than map-reduce, is a huge advantage.
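To illustrate that ad-hoc point with a sketch (the sales table and its columns are assumptions, not from any real schema): a question like "revenue per month" is a single SQL statement rather than a hand-written map-reduce job:

```sql
-- Hypothetical sales table: sold_at timestamptz, amount numeric.
-- Ad hoc: revenue per month, grouped and sorted in one statement.
SELECT date_trunc('month', sold_at) AS month,
       sum(amount)                  AS revenue
FROM sales
GROUP BY 1
ORDER BY 1;
```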
Quite frankly, no application I have ever worked on has had to deal with "huge" amounts of data by any common definition (a couple of gigabytes at the most).
And like I said, looking at our ERP system's database I am beginning to understand the appeal of a database without a fixed schema. Some of the tables have dozens of columns, with most of the rows being full of NULL values. So I do get that part, but no application I have ever worked on was like that.
This is generally addressed in a relational design with a star schema. First create a dimension table:
    CREATE TABLE person (
        id BIGINT PRIMARY KEY NOT NULL
    );

    CREATE TABLE person_name (
        person_id BIGINT REFERENCES person(id) UNIQUE,
        name VARCHAR(128) NOT NULL
    );

    CREATE TABLE person_bank_details (
        person_id BIGINT REFERENCES person(id) UNIQUE
        -- further bank-detail columns elided
    );
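With a design along those lines, a sparse record is reassembled with LEFT JOINs; attributes a person lacks are simply absent rows in the attribute tables rather than NULL-filled columns. A sketch (the iban column is an assumed example, since the bank-details columns were elided above):

```sql
-- Hypothetical read-back over the dimension tables above.
SELECT p.id, n.name, b.iban
FROM person p
LEFT JOIN person_name         n ON n.person_id = p.id
LEFT JOIN person_bank_details b ON b.person_id = p.id;
```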
For the most part, for most projects, worrying about multi-master replication is going to be pointless. You can always put some data in a distributed K/V (or document) store and point to that from your SQL if you need to.
It's easy: Overzealous DBAs who insist on normalization at all costs.
A new technology allows developers to try new approaches to the challenges, by sidestepping those DBAs.
Now I can understand where those people are coming from.
Note that Cassandra scales linearly.
MongoDB stores metadata (nearly) uncompressed on a per-document basis, so of course it uses way more disk space. It doesn't store the data in any efficient way either.
Also, it's pretty much unoptimized compared to Postgres, which has been around for a really long time, so it's kinda slow.
MongoDB has a lot of limitations that can really bite you (document size, even though that's the smallest one thanks to GridFS; how you can do indices; even limitations in what your query can look like; etc.).
The only thing that's good about MongoDB is that it's nice for getting something up and running quickly, and that it's a charm to scale (in many different ways) compared to PostgreSQL. If PostgreSQL had something built in(!) coming at least close to that (and development has a strong focus there) it would be perfect.
For all these reasons many companies actually have hybrid systems, because sometimes one thing makes sense and sometimes the other.
The benchmark seems strange, because there are many SQL and NoSQL databases that are faster, and that's a kinda well-known fact. I think everyone who ever had to decide on a database system has known that, even without a benchmark.
This makes it kinda look like an advertisement (look at the company behind the blog).
I've been using PostgreSQL 9.3 with JSON for a while now and it's great. I also know it is possible to scale PostgreSQL, and it's really nice, but there's still a lot more complexity involved (again, depending on the use case).
Just use the right tool, and please let's stop with such shallow comparisons; I think they harm the reputation of database engineers and system architects - and of the authors of such comparisons. When you look for real comparisons, example use cases, typical patterns, or just some help, you always stumble across these things, and they tend to go out of date quickly too, because all well-known databases have a lot of active development going on.
If anything, I hope databases like RethinkDB - and even multi-master PostgreSQL, if we ever see it - learn from the most crucial mistake here: rolling your own consensus. (This goes for databases other than MongoDB; I don't mean to single it out here.)
Stick to proven algorithms: ZAB, Paxos, Raft.
I'm not saying you shouldn't use Mongo, but to say Aphyr's assessment of a database shouldn't be considered in all deployments isn't wise.
So let me leave off with our use case. We have few writes, little need for sharding (our largest customers run on a single node and keep the hot data in memory), we use acknowledged writes (MongoDB can be journaled, you know), and our customers are willing to run three-plus nodes in an HA scenario. The HA is simple to configure. So far, in three years, we have had no data loss.
What more information can I provide to make up for the downvotes?
As an aside, downvotes to me should be used for those that contribute nothing to a conversation. If you disagree with what someone says, state your disagreement so that we all, including me, can benefit from your better experience.
A lot of people may have thought, prior to this benchmark, that MongoDB was the right tool for all high performance JSON-related tasks.
This! A million times! I agree 100% that Postgresql is better than MongoDB in every way except in ease of replication (that's really my only need) for HA. We needed an embedded database in our product and I wanted so bad to use Postgres, but we needed HA and we needed our customers to be able to set it up. This was so important to us that we took the otherwise inferior solution.
Now, that said, I bring up the replication/HA issue every time one of these "postgres is better than mongodb" articles comes up. The last time I posted a Postgres developer responded to me that they're laying the groundwork for a good answer and that it's coming. I can not WAIT for that day!
2ndQuadrant have developed Bi-Directional Replication for 9.4 (http://2ndquadrant.com/en/resources/bdr/, https://wiki.postgresql.org/wiki/BDR_User_Guide) based on the aforementioned logical changeset support in 9.4.
I'd rather spend more time understanding things up-front and have a reliable solution rather than flick a switch and have something which initially works but I'm not too confident in. That's fine for initial development I suppose, but not in production.
Are we really that scared of 'research'?
You've missed the main point of why we use MongoDB. We use it embedded in our product and it's up to our customers to configure HA if they need it. Sure, we document the process for configuring it, but the simpler it is, the less likely a customer is going to have a problem with it.
I would agree with you if it were a database that we maintained in-house, yeah, it's certainly doable. But my main point is that this has to be done in the field at customers who are just trying to use our product.
This article is interesting because Postgres document store options may make it competitive to us when compared to MongoDB.
Early evangelism for Mongodb was overwhelmingly one that hyped performance over all others. It was, somewhat infamously now, webscale.
So now pgsql (since 9.2, but vastly improved in 9.3) can also do the things that MongoDB does, better, if you want to do the document approach (which is a serious debate unto itself). That is news and is interesting.
As for scaling out, I would argue that 9.3 offers more realistic, robust options than MongoDB does.
It is mistakenly evangelized as also being faster, leaner, scalable, more concurrent than anything out there. Some clients would pay extra to redo CRUD using Node.js because it's "Asynchronous and We wanna Pay For Speed and Scalability".
But some devs swear by it, I just fail to see what the advantages are.
That has not been possible with Postgres' json storage type. Instead, the entire JSON blob must be read out, modified, and inserted back in.
This reality is well known to those that understand Postgres, which is why they have HStore. HStore is limited though (particularly to the size of the store), so there is work underway to make it more competitive with MongoDB.
So now they are also releasing a jsonb (b for binary) storage format, which looks promising, but I can't find any information on exactly what its features are. I would love to actually see a benchmark comparing field updates, but this benchmark is not it.
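For what it's worth, the jsonb feature set that did land in 9.4 includes binary storage (no re-parsing on access) plus GIN indexing with containment queries. A minimal sketch - the table and document here are made up:

```sql
-- jsonb is stored parsed, and a GIN index can serve containment.
CREATE TABLE docs (doc jsonb);
CREATE INDEX docs_doc_gin ON docs USING gin (doc);

INSERT INTO docs VALUES ('{"name": "widget", "tags": ["sale", "new"]}');

-- The @> (containment) operator can use the GIN index:
SELECT doc FROM docs WHERE doc @> '{"tags": ["sale"]}';
```

Note this covers reads; the field-update operators discussed above are a separate question.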
MongoDB is a database with trade-offs, downsides, and more crappy edge cases than MySQL, but it exists because at its core it allows data modeling that traditional SQL databases are lacking.
MongoDB has first-class arrays rather than forcing you to do joins. It supports schema-less data, which is rarely needed, but when you need it, it can be very useful. It can do inserts and count increments very quickly (yes, the write lock means you eventually have to put collections in separate databases), which is also useful for certain use cases.
"... later versions will include a complete range of workloads (including deleting, updating, appending, and complex select operations) and they will also evaluate multi-server configurations."
Any update in PostgreSQL will result in a new tuple being inserted, whether it contains json, hstore or anything else; that's the basis of multi-version concurrency control. It'll be the same deal with jsonb.
Not only that, the delta to update the page to contain the new tuple, and a copy of the full page the tuple is being written to (if it's the first change to that page in that checkpoint cycle) are written to the write-ahead log.
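You can watch this happen via PostgreSQL's hidden system columns. A sketch against a throwaway table:

```sql
CREATE TABLE t (doc json);
INSERT INTO t VALUES ('{"a": 1}');

-- ctid = physical tuple location, xmin = creating transaction id
SELECT ctid, xmin, doc FROM t;

-- even a "field update" rewrites the whole tuple...
UPDATE t SET doc = '{"a": 2}';

-- ...so both ctid and xmin change: a brand-new row version was written
SELECT ctid, xmin, doc FROM t;
```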
I'll take issue with a couple of points, though. Postgres has arrays:
* A Postgres Array can end up being stored elsewhere, whereas in MongoDB an array will be contained within the document
* I am also not clear on what exactly can be stuck inside an array (In MongoDB it can be an object that contains more arrays) while maintaining first-class access and updates.
I would love to find more detailed information on these points, but with jsonb they may be moot now.
I will look into the increment issue in PostgreSQL more carefully. I just know that it wasn't feasible in MySQL when I was using it.
You can store anything in an array that you can store as an attribute of a row. There may be some edge cases I don't know about, but generally anything that can be a column can be an array element.
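For instance, since composite types can be column types, they can also be array elements; this nests structure without a join table while keeping elements addressable. A sketch (the contact table and phone type are made up):

```sql
-- An array whose elements are themselves typed records
CREATE TYPE phone AS (kind text, number text);

CREATE TABLE contact (
    name   text,
    phones phone[]
);

INSERT INTO contact VALUES
    ('Ada', ARRAY[ROW('home', '555-1234')::phone,
                  ROW('work', '555-5678')::phone]);

-- first-class element access (note the parentheses for field access)
SELECT name, (phones[1]).kind, (phones[1]).number FROM contact;
```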
Do bear in mind that you'll never get MongoDB-style in-place updates in PostgreSQL, due to MVCC. You may save a round-trip with the entire JSON object once they implement update operators, though.
That being said, I find it weird that now it is cool to make fun of MongoDB. Some people in this thread have even said they want to know if a service is using MongoDB and that they'd not use that service. I am pretty sure they'd be all over Stripe (who store your money-related stuff in MongoDB) in a different thread.
That's a bit scary. There have been several successful attacks against virtual currency exchanges that use MongoDB, exploiting the eventual consistency to the attacker's advantage.
If you handle money, you don't want any inconsistencies in your database, no matter how temporary. You can work around these of course but you really need to know what you're doing.
against depending solely on MongoDB's eventual consistency alone.
The reason, I think, is that few people using it had a good sense of where the pain points would be. Many of us somehow imagined that because we didn't know where mongodb's limitations were, we wouldn't run into them. Cue frustration when you're bitten by some issue you didn't think about.
I do believe there are good uses for it. But they are pretty specialised, and I'd rather use a relational db for most things.
The hatred was always present, it's just that the hype has died down as the early adopters have (re)discovered the sharp corners in both architecture and implementation.
You mean like this?
update table set jsonCol->propertyA = 42;
"As of PostgreSQL 7.4, PL/Python is only available as an "untrusted" language, meaning it does not offer any way of restricting what users can do in it. It has therefore been renamed to plpythonu. The trusted variant plpython might become available again in future, if a new secure execution mechanism is developed in Python. The writer of a function in untrusted PL/Python must take care that the function cannot be used to do anything unwanted, since it will be able to do anything that could be done by a user logged in as the database administrator. Only superusers can create functions in untrusted languages such as plpythonu."
Incidentally, I've been merging my JSON in Perl. It's automatically built into Postgres, and the Hash::Merge library with its configurable behaviour is very handy. Still, looking forward to not doing that. 9.4 is coming before Xmas, right?
The issue with PL/Python is that it's nigh on impossible to properly sandbox Python.
    create table foo (d hstore);
    insert into foo (d) values (hstore(ARRAY['key', 'key2'],
                                       ARRAY['value', 'value2']));
    -- "key"=>"value", "key2"=>"value2"
    update foo set d = d || hstore('key', UPPER(d->'key'));
    -- "key"=>"VALUE", "key2"=>"value2"
The whole point of btree indexes is that they're efficient to query from disk.
A few seeks aren't terrible for small/medium applications, but when you're asking for thousands of queries a second, any disk access is bad news.
I have a script that grows a big Perl hash. It fills up memory and starts paging. The application grinds (almost) to a halt. I tie the hash to a Berkeley DB file, so it writes to disk. Performance is about a third of the in-memory hash, BUT it doesn't slow down when it's too big for memory.
But then again, plenty of shops still use MySQL (and one of my clients uses DB2...).
But just like with MySQL, I imagine those who can migrate to another DB would do so at the first opportunity or wait for the equivalent of MariaDB to make a "drop-in replacement" in lieu of messing with their application code.
But then again, I am going to say the opposite of what I just said to hedge my bet and point out plenty of shops still use MySQL. No serious company like YouTube, Facebook, or until recently Google Ads would use it.
I love sitting in my armchair and passing technical judgements without providing any technical details!
Link in question: https://news.ycombinator.com/newswelcome.html.
I am a huge fan of PostgreSQL, but in order to avoid having to rewrite an application designed for Mongo, I tried TokuMX and was pleasantly surprised by its performance.
People are switching to MongoDB because the developer and deployment experience is so good.
I found that many companies have incredibly bad, not use-case-driven reasons to pick their DB. Not that MongoDB is a bad database, but it has its fair share of issues. Granted, this is the case for every database, but if you pick any database without being aware of those, you end up in a world of pain.
(Context: I consult for backend systems in general)
After that the project grew, I was brought in and needed to do some reports, which were a huge pain in the ass without proper SQL and joins.
I don't intend to have anything to do with people using MongoDB unless they have a really, really good reason for doing so, and it's used alongside other databases.
His company installs and monitors sensors in civic infrastructure: roads, train tracks, bridges, ... These sensors provide constant data streams which need to be stored somewhere before processing. Any raw data older than 2 weeks is worthless, and if 2-3% of data is lost before written it's not much of a problem. More data is coming in all the time any way.
For them MongoDB is the perfect transient cache. Sensor data is processed and the results are stored elsewhere. Easy to expand. Flipping between installations is just a matter of toggling the load balancer, and once an offline cache has been processed, it can be nuked and put back as a fresh system.
So, for a setup where easy size expansion, fast mostly-reliable writes, and ease of use are the primary design constraints, Mongo fits in surprisingly well. It's a fascinating use case.
As in "Our lead dev always wanted to try X, because he likes the name".
Given that, I really need to build an app using project-he-who-may-not-be-named.
The sad state of affairs however is that often developers choose which DB they'd like to work with and (Dev)OPS doesn't get a say in that.
There's a deep lesson here to be learned. One that can make you rich.
WAT? Can you tell us more about your experience? According to mine, developers are moving away "en masse" from MongoDB, because the experience was so bad.
The general issue that I often see is that people don't choose their databases by "is this a good fit or a bad fit for our use case" but rather by "that looks cool" or "we need that to attract talent" or "we did that project with X, so X is also fine for that new, totally different thing". And then they also try and stick with it at all costs, instead of acknowledging that the choice was a bad fit. And then they call you and say "we have an X database in flames, can you come in and rescue our data? Best would be yesterday?"
This is not restricted to document databases. I've seen apps run on MySQL on a cluster with 10 Sun XFires where, after a man-month of index optimization, 3 would have been sufficient. That was like shooting dead fish in a dry barrel. A lot of developers don't want to bother with stuff like "how does that complicated piece of machinery actually behave if I push it?"
This sounds interesting. I've been looking for a good example where a NoSQL DB is better than a relational DB. Could you provide more details?
Unless I'm misunderstanding what Xylakant means, that's certainly doable in Postgresql and has been in other relational databases for a long time.
In the old days you could suddenly switch off the power of a running Informix server and it would correctly restore the state at that very moment after reboot, dropping all partially finished (uncommitted) transactions and keeping all the committed ones, so it was possible to have a clean state of the system.
How does it work? Append-only so-called "physical logs" on direct-access storage, unbuffered by the OS or the hard drives; proper partitioning onto separate physical drives (for real parallelism); etc.
Again, real databases are all about data storage (architecture, data structures, algorithms, design decisions, proper implementation), not some "user-friendly" APIs and "well-written" docs to quickly gain popularity among the ignorant.
Database back-ends are among the hardest problems in programming, and the amount of research that was done in this field in the 80s and 90s is quite remarkable. I doubt that a bunch of punks with professional marketing and sales techniques could adequately replace an implementation of a high-performance, fail-safe storage back-end, which is what a database is.
As someone who evaluated and eventually recommended against implementing MongoDB for a new product, I entirely agree with you. The Mongo implementation was shot after about three months of wasted time after failure in pre-production testing. It was rolled back at great cost to SQL Server with a light document-style abstraction as Mongo failed miserably on precisely what you said: reliability, consistency, scalability and storage management.
Proper relational databases (PostgreSQL, Oracle, SQL Server in my scope of experience) can take an absolute incredible amount of punishment without breaking any guarantees. The relational bit above is optional but it is at least something you can choose to use or not.
For me it always sounded like a bunch of people not willing to learn how to write proper queries and optimize data storage.
Why does that make this article suspicious? Obviously they stand to benefit from more people using PostgreSQL, but they've been exceedingly clear with their methodology - their benchmark is available on Github: https://github.com/EnterpriseDB/pg_nosql_benchmark.
Also contains some comparisons with MongoDB.
That said these articles are pretty pointless. Performance isn't the reason companies are switching to MongoDB.
Again, very basic PoC.
Than to fix up the sharding and clustering in PostgreSQL. It's been an issue for quite a few years now and still not mainlined. I think most PostgreSQL users are simply scaling them vertically.
Sooo I don't know.