No. Every time I've used mongodb we've ended up regretting it for one reason or another. And migrating to a different database after launch is a huge hassle.
I've done a couple projects where we kicked off with postgres using JSONB columns for early iteration. Then we gradually migrated to normal SQL columns as the product matured and our design decisions crystallized. That gave us basically all the benefits of mongodb but with a very smooth journey toward classical database semantics as we locked down features and scaled.
Back in 2010 when the MongoDB hype was high, the well-known company where I was working at the time decided to build the next version of the product using MongoDB. I was on the analytics team and had to code a whole bunch of intricate map-reduce jobs to extract summary data out of Mongo. I'd repeatedly head to the product team and ask them to explain the edge cases I was seeing in the data and they would not be able to give me an answer because the data was on the third or fourth version of their mentally-stored schema and no one knew anymore. All in all, misery.
I decided to check out the hype back then, and started writing tutorials on using PHP with MongoDB. After the 3rd posting, I realized that they were all about being anti-relational, even though you could still have keys to other records in your record. This led to bringing back records, querying more records, then manually filtering records, rinse and repeat.
Iirc, they've turned around on their anti-relational views and now allow for joins.
I looked at OrientDB a while back, but it fell flat with the lack of features and oddness.
If I had more time, I would really dig into ArangoDB.
DynamoDB is a lot more explicit about its tradeoffs. Much of the backlash against Mongo was because it basically claimed to be well-suited to any use case, when its sweet spot was really far narrower. To be successful with Mongo, you need to design the entire app around its limitations, but those limitations were initially downplayed and obscured.
People were convinced that Mongo was a good choice as a default, general purpose db, when it clearly wasn’t for about a million reasons.
I don’t think DynamoDB is marketed or viewed in the same way. The docs are pretty clear about needing to design your data model to specifically work well with Dynamo. People using it seem to generally be aware of its limitations, and deliberately choose to accept them for the sake of performance and scalability. At least that’s my perception.
> I don’t think DynamoDB is marketed or viewed in the same way. The docs are pretty clear about needing to design your data model to specifically work well with Dynamo.
More importantly, AWS is very explicit in making newcomers aware of what DynamoDB's use cases are, that they are very specific niche use cases, and that if users require schemas and joins then they should just stick with either relational databases or graph databases.
Basically when speed and horizontal scalability are very important, and consistency/durability are less important. It’s also pretty good for unstructured or irregularly structured data that’s hard to write a schema for.
Web scraping or data ingestion from apis might be a reasonable use case. Or maybe consumer apps/games where occasional data loss or inconsistency isn’t a big deal.
It can also be used effectively as a kind of durable cache (with a nice query language) in place of redis/memcached if you give it plenty of ram. While its guarantees or lack thereof aren’t great for a database, they’re pretty good for a cache.
DynamoDB is a completely different beast and I would use no other data store unless I had to. It can pretty much handle any transactional workload I need.
It's cheap, it's fast and it scales super high. Don't need much more
DynamoDB is consistent and scalable, not fast and cheap.
DDB is great for storing data that is required to be scalable and never needs to be joined. Add in DAX, developer time necessary to orchestrate transactions, calculate the scaling costs and...that's how AWS gets you.
Plus, local development requires half-complete emulators or a hosted database you're charged for.
No, maybe people should think twice about DynamoDB.
This came up in a thread a few days back, but people considering it should note that Dynamo’s transactions are limited to 25 rows. That may be enough for most operations, but I definitely wouldn’t say it can handle “any transactional workload”. I ran into this limit pretty quickly when trying it out.
I've recently adopted this practice. All new features use a JSONB (or hstore - I'm experimenting with both) field instead of a "real" one until they're bedded in and stop changing. Then I convert the field + data to a real field with a NOT NULL constraint in one easy migration.
So far so good. Being able to join on JSONB fields right in the SQL is awesome
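For anyone curious what that "one easy migration" step looks like, here's a rough sketch in Python/psycopg2 — the table and field names are made up for illustration, not from my actual schema:

    # Promote a key out of a JSONB "extras" column into a real NOT NULL column.
    # Table/column names here are hypothetical.
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    with conn, conn.cursor() as cur:
        cur.execute("ALTER TABLE widgets ADD COLUMN color text")
        cur.execute("UPDATE widgets SET color = extras->>'color'")          # copy the data out
        cur.execute("ALTER TABLE widgets ALTER COLUMN color SET NOT NULL")  # lock it down
        cur.execute("UPDATE widgets SET extras = extras - 'color'")         # optionally drop the old key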
I never understood the appeal of the JSON to SQL columns workflow.
At least with the ORM(-ish) tools I worked with, it always felt much more straightforward to just change classes within the application code and automatically generate the respective migration files to be run on the relational database, than having to interact as a human with json (for the app as well as for business intelligence and reporting/monitoring).
I feel I have handwritten significantly more schema and data migration code for non-sql databases and JSONB in Postgres than for relational databases in the last 10 years.
Sure, once the database grows bigger, those automatically generated migration files don't work as seamlessly anymore and can be dangerous, but no tool or database magically solves all the problems at scale.
> At least with the ORM(-ish) tools I worked with, it always felt much more straightforward to just change classes within the application code and automatically generate the respective migrations...
There's your answer, no? You've got specific tools and workflows you've designed to work with your database in a specific way. You can't just take one workflow and substitute a piece of another workflow and expect it to always work. Your tools and processes would be as useless for me as mine would be for you.
Sure, but almost all the time (I'm not specifically referring to the person in the comment I'm responding to and should have been more clear on that) people were making it fundamentally an sql vs. non-sql argument.
As in you need to switch your entire database system or make a bet on this completely new player instead of battle proven tech, because with relational databases schema migrations are hard. Among a bunch of other questionable claims.
Sure, if you take the vanilla databases without their ecosystem, there is some merit to that, but I don't find it a very practical argument to ignore all the tooling and workflows that do exist and are used by people.
When MongoDB first came out, I was eager to check it out. I was using mainly Django with Postgres at the time, but had my career start on ZODB, an object-oriented non-sql database created in the late 90s. So I was hopeful that MongoDB could give me the best of both worlds. One of the reasons I backed off Mongo very quickly was the lack of tooling as soon as you have something resembling a schema or relationship and you'd want to change it.
> I never understood the appeal of the JSON to SQL columns workflow.
The appeal is non-technical.
Writing good migrations, and having tests around them to ensure they don't leave the DB in an inconsistent state if a subset of them fails, requires a good understanding of an RDBMS and the specific product.
You'd be surprised how many engineers don't meet that criterion.
The JSON to SQL columns workflow allows any developer to offload what a RDBMS does into the client/server/bootstrap code.
At that point the DB is like a key-value store and you get to claim you are using Postgresql.
Is that what I would use if I had reliability and performance in mind?
No... but it's way cheaper (and quicker) to find devs who can do this (the JSON-to-SQL columns workflow) than devs who can write well-tested, reliable and non-hacky migrations.
Often at a startup, reliability and performance have lower priority than getting some features out the door in a day.
>At that point the DB is like a key-value store and you get to claim you are using Postgresql.
I know this statement is a little tongue in cheek but there is a lot of value organizationally to using Postgres as a key-value store vs MongoDB or some other flavor of no-sql. Even if you're not really getting all of the benefits of a relational db, you are still getting ACID transactions, encryption, user/access controls, backup/restore, libraries, etc. You're getting this in a tool that you are comfortable with and that you may be using in other areas where a relational model is more appropriate, and thus have experience with a lot of the management aspects. Obviously if you are a mongo expert and using mongo everywhere this doesn't apply, but in my experience software engineers/data engineers/DBAs with postgres experience are easier to find.
Hey there, thanks for reading my comments and sharing your thoughts.
> in my experience software engineers/data engineers/DBA with postgres experience are easier to find.
I need to understand what "postgres experience" means in this context. Having experience writing postgres client software using some postgres driver is a different kind of experience than installing, provisioning, monitoring and tuning postgres itself.
The former is a day's worth of work for any good engineer; the latter can take decades, and those people are really hard to find.
> you are still getting ACID transactions, encryption, user/access controls, backup/restore, libraries, etc
Sure, at a high level, making a KV-store on top of an RDBMS feels like a grand idea - hey, you get ACID for free and all the RDBMS goodness while getting the flexibility of a KV-store to boot.
As they say - when something sounds too good to be true ...
One small scenario worth thinking about:
What does a KV-store on top of an RDBMS with ACID semantics even mean?
How do you handle merging objects?
Does the RDBMS even support merging objects inside a column?
What happens if I have different attributes in the same object that I then update one after the other into a JSON column on an RDBMS?
Does it fail one of the updates with an error message that data is lost?
Does it automagically merge the data? How does it even know how to handle conflicts?
Hint: No, it cannot - the reason an RDBMS can even offer ACID semantics is that RDBMS operations assume certain rules have been followed. Database decomposition is important for a number of reliability reasons.
There are KV-stores out there that handle all these scenarios natively without the client having to worry about it (and doing a halfassed, buggy job about it because they are not DB designers). They have conflict resolution algorithms, offer CRDTs for the clients to use and engineers who know what they are doing, use them.
Using the right tool for the right job is what experienced engineers do because it not only makes their lives easier but the businesses they help run more reliable and resilient.
I'm using it because the cost of changing the schema for a feature that may only be experimental is too high. I started off changing the schema every time, and it just got too expensive.
Using the JSON-to-SQL workflow, I can mess around with feature design and iterate fast, and then crystallise the schema into SQL once it has stopped changing.
Writing solid migrations can end up being more complex than the feature code itself, and when features are changing fast, writing solid migrations is kinda pointless. The feature may not even make it to production.
But yeah, sure, I'm an idiot who doesn't know how to write proper SQL. rolleyes
> Using the JSON-to-SQL workflow, I can mess around with feature design and iterate fast, and then crystallise the schema into SQL once it has stopped changing.
Sure, you're experimenting and things are in a state of flux.
I get it. Been there done that, but not in production.
Your customers want features over reliability. This is perfectly OK at a startup - the customers already depend on a system elsewhere, but that system does not meet all their needs, so they are experimenting with you. If your company runs into the ground tomorrow, they still have not migrated 100% over to you, so they are "safe". If your software loses data, they probably won't even notice until they really start to move over for real and lose money (if they did not do testing for consistency).
In your position I would be honest and say data can be lost or corrupted and that's the price I am willing to pay for flexibility given the resource constraints I have.
It's too expensive to do things properly and customers don't really want that reliability so we can take some risks.
However, it would be naive to say you just designed an ACID-compliant KV store by implementing some ad hoc KV operations using JSON columns in an RDBMS.
I make a lot of money off the disasters software like this creates once the original architects of such systems have "moved on" to other companies.
Data corruption, loss, inconsistencies, meaningless key relationships, subpar performance (one place had two columns - an ID and a JSONB - and was wondering why their query performance was poor, the DB locked so much, and some updates never appeared to go through or "reset" the data).
There are KV-stores out there, that sell for a pretty penny, handle all KV conflict scenarios natively without the KV-store client having to worry about it (and doing a halfassed, buggy job about it because they are not DB designers and might not even know the bombs they are planting in their code. Maybe ignorance is bliss?). These systems have conflict resolution algorithms, offer CRDTs for the clients to use and engineers who know what they are doing, use them.
> The feature may not even make it to production
Feel free to do whatever you want with code that never makes it to production and affect people's livelihood.
I would argue you don't even need a DB to give you the warm fuzzy feelings and just do everything in memory.
After all, memory is cheap and configuring a DB correctly can get too expensive.
As an aside - if you're working at a company that regularly pushes features that do not even make it to production, there's a miscommunication issue.
Your business is bleeding money.
This is not to say overall, your business is not profitable - its just that it's bleeding money in that specific project and other more profitable ones are making up the slack.
At the minimum, the deliverable should be broken down into a POC and, when it's clear a production need absolutely exists, that POC is delivered production ready.
> Writing solid migrations can end up being more complex than the feature code itself
Sure, that's the price you pay for ACID compliance.
Nothing is free. There is no magic.
> I'm using it because the cost of changing the schema for a feature that may only be experimental is too high. I started off changing the schema every time, and it just got too expensive.
Again, you are free to do whatever you want in an experimental setup but we are talking production here.
If your proposal is you push the same design to production that you use in your experimental/POC setup, in a serious, well used and depended on production environment without anyone noticing, I would really like to know more! My contact info in the profile.
If your claim is you have figured out a way to get something for nothing - hmm.
The ROI of the businesses you help run just trended towards infinity!
Sorry, I can't wade through that much bullshit. I got about halfway through a carefully-written refutation of everything you wrote, but you're not going to listen to me (since you didn't the first time either).
I know what I'm doing. I've been doing this for literally decades. Please stop assuming that people who don't agree with you are doing so from ignorance.
> but you're not going to listen to me (since you didn't the first time either)
I listened to what you had to say the first time (which is why I responded to it) and will continue to listen to what you have to say. My contact info is on my profile.
> Please stop assuming that people who don't agree with you are doing so from ignorance
I make no such assumptions.
I shared why I think the JSON blob on a RDBMS causes more pain than it solves and I look forward to your refutation of everything I wrote.
My contact info is on my profile and I keep it there because I like to know where I am wrong.
Have a great weekend, and I look forward to connecting with you.
Thing is, for the early-stage development or prototyping that JSON-to-SQL is most praised for, to be completely honest, I don't write good migrations either.
I change the application code, trigger the cli to generate the sql migration and am done with it. In 99% of cases I don't even look at the generated file, I don't write specific migration tests, etc. If the migration runs without error and the application tests pass in development and staging, there is a good chance it will in production as well, or the auto rollback saves my ass.
This works until it doesn't and you have to become much more vigilant. But at that point I much rather build upon the relative robustness and consistency of the sql I got so far, than JSON that is all over the place.
Maybe it's my tooling, but almost every time I made a JSON column in Postgres for the sake of saving time in the past, it slowed me down almost immediately after adding it.
> I much rather build upon the relative robustness and consistency of the sql I got so far, than JSON that is all over the place.
Yup.
> almost every time I made a JSON column in Postgres for the sake of saving time in the past, it slowed me down almost immediately after adding it
Backs my experience fixing software from companies that had terrible data loss issues after these JSON hacks reared their ugly head.
I am not saying JSON does not belong in a DB - I have plenty of well-tested and audited code that does just that. What I am saying is that I have seen things go wrong far too often to trust that someone who proposes that approach actually knows what they are getting themselves into.
Plus, it's way cheaper to use a custom built KeyValDB that actually supports object semantics if one does not mean to use RDBMS semantics.
Anyways, I am being severely downvoted and can't respond until a few hours pass between posts, so I won't be responding any longer on this thread, as I have a lot of work to do over the next few days and can't wait around.
See my other comments. Contact info in my profile, would love to be in touch!
The JSON-to-columns approach is a best practice for analytic applications. ClickHouse has a feature called materialized columns that allows you to do this cheaply. You can add new columns that are computed from JSON on existing rows and materialized for new rows.
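For illustration, adding such a column boils down to one DDL statement; here's a rough sketch via the clickhouse-driver Python client, assuming a table `events` with the raw JSON in a String column `raw` (names are made up, not from any real deployment):

    # The new column is computed from the JSON for existing rows and
    # materialized (stored) for newly inserted rows.
    from clickhouse_driver import Client

    client = Client("localhost")
    client.execute(
        "ALTER TABLE events "
        "ADD COLUMN user_id UInt64 MATERIALIZED JSONExtractUInt(raw, 'user_id')"
    )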
If we were going to start from scratch today, we'd probably use Postgres. But, realistically, the primary motivation behind that decision would be that Postgres is available on AWS, and that would centralize more of our operations. (DocumentDB is, of course, available. It's not Mongo. I'd be curious to hear from people who actually had Mongo deployments and were able to move to DocumentDB; it's missing so many of MongoDB's APIs that we physically can't - our applications would not run.)
Mongo isn't that bad. It has limitations. You work within the limitations... or you don't. But I really don't think a valid option is "mongodb fuckin sucks m8, shit tier db". We're not going to be migrating terabytes of data and tens of thousands of lines of code when the benefit is tenuous for our business domain.
Should you use MongoDB today? I'll say No, but not for the reasons anyone else is giving. MongoDB's Cloud Provider agreement has decimated the cloud marketplace for the database. Essentially, if you want to run a version released in the past few years (4.2+), you need to be on their first-party Atlas product. Many other parties, especially the big clouds, are on 3.6 (or have compatibility products like DocumentDB/CosmosDB which target 3.6). Atlas is great. It's fairly priced and has a great UX and operations experience. But, I don't feel comfortable about there being political reasons why I couldn't change providers if that changes. If you have business requirements which demand, say, data in a specific region, or government-class infra, or specific compliance frameworks, Atlas may not be able to meet them.
> We're not going to be migrating terabytes of data
You may have dramatically less 'real data' than mongo makes you think you do. I migrated one of our mid-sized databases out of mongo and into PG a couple years ago. The reduction in size was massive. One table in particular that was storing a small number of numeric fields per doc went from ~10GB to ~50MB. I wouldn't expect this with all datasets of course, but mongo's document + key storage overhead can be massive in some use cases.
This is (probably) an artifact of Mongo's schema-less nature; when you don't have tables with structure, every document you store has to detail its own schema inline.
In a relational database, you have columns with names and types, and that info is shared by all of the rows.
In Mongo, every cell has to specify its name and type, even if that layout is shared by every other cell in the document.
Mongo's way is more flexible, but it's terrible for storage efficiency.
/caveat i know nothing about how mongo is storing data and haven’t used it since 2010
it doesn’t really have to approach the schemas that way. one would think it would be optimized for repeat schema in the same way one might create the schema definition and then reference it in packing and unpacking the data. seems like if there was schema overhead taking up storage unnecessarily that could be optimized relatively easily.
having schema references also might be a good management tool to understand which records vary, potentially due to an application evolving its needs.
> one would think it would be optimized for repeat schema
I don't think that's the problem Mongo is designed to solve. Mongo's promise was the ability to work with un- and semi- structured data, and it would make sense if its optimizations were focused on that problem, not on reducing overhead when someone tries to strongarm it into being MySQL.
Generally speaking, if your data is structured well enough that you can define a schema ahead of time, you're better off with a traditional RDBMS, because that's the problem an RDBMS is designed to solve.
Yes, and the sooner you do it the better. But doing it during project planning / experimentation phase (or when you don't know what the final result should be yet) will really just slow you down. In many ways, very similar to the static / dynamic language trade offs.
Unless this has changed in recent years, the BSON format that Mongo uses is more or less JSON optimized for parsing speed and takes more or less as much space as storing your entire database in JSON.
JSON is a great format for simplicity and readability but as a storage format it's hard to come up with one that's more bloated.
I work in the book industry; our implementation of it (ONIX) is a decent way to allow non-IT professionals to encode complex data in a standardised way, but as a way to store and transmit large amounts of data it's a nightmare. The only thing that saves it is that there is so much repeated data it compresses brilliantly. Ha.
Not OP but I think it's more about the importance of the said data, number of collections to think about and so on. Regarding your point, I would guess some index changes might have had a significant impact here.
I'll give you a reason that not many people mention not to use Mongo. Schema definition acts as a form of documentation for your database. As someone that has come into a legacy project built on Mongo, it's a nightmare trying to work out the structure of the database, especially as there are redundant copies of some data in different collections.
I actually love the idea of providers that abstract the major cloud vendors to run things, like Mongo does with Atlas: you can spin up Mongo on any of the three -- boom, lock-in concerns gone. And the best part is you're both supporting the project and have the creators for tech support.
Regarding your question about the ability to be able to move to DocumentDB from MongoDB, we (Countly Analytics team) weren't able since several APIs in newer MongoDB releases are still not available in DocumentDB.
> Jepsen evaluated MongoDB version 4.2.6, and found that even at the strongest levels of read and write concern, it failed to preserve snapshot isolation. Instead, Jepsen observed read skew, cyclic information flow, duplicate writes, and internal consistency violations. Weak defaults meant that transactions could lose writes and allow dirty reads, even downgrading requested safety levels at the database and collection level.
Then don’t use the defaults? SQL Server used to have an empty password as the default for the sa user and it was trivial to find servers exposed on the internet with the default password. While part of the blame was MS’s, it’s always on the person who does the installation to know what they are doing.
Yes. We have applications running on both PostgreSQL and MongoDB and I find that working with MongoDB is just more pleasant. I think it mostly boils down to my preference of document databases as opposed to relational ones. It feels much more natural to me to embed / nest certain properties within a document instead of spreading it across several tables and then joining everything together to get the complete data. MongoDB makes working with these sometimes nested documents easy (I mean, it better) and there's always Aggregation Pipeline when you need it (something that I again find much more pleasant and readable over SQL).
What always irks me is when somebody suggests PostgreSQL's json (or jsonb) types as an alternative to using MongoDB. All it says is that the person hasn't really invested a lot of time into MongoDB, because there are things that PostgreSQL simply cannot do over a json type, especially when it comes to data updates. Or it can do them, but the query is just overly complicated and often includes sub-queries just to get the indexes into the arrays you want to update. All of that is simple in MongoDB - not really a surprise, that's exactly what it was made for. The last time I worked with PostgreSQL's json I sometimes ended up just pulling the value out of the column entirely, modifying it in memory and setting it back in the db, because that was either way easier or the only way to do the operation I wanted (needless to say, there are only exceptional cases where you can do that safely).
Lastly, if you can easily replace MongoDB with PostgreSQL and its json types or you're missing joins a lot (MongoDB does have left join but it's rarely needed), chances are you haven't really designed your data in a "document" oriented way and there's no reason to use MongoDB in that case.
I'm curious, any chance you remember some of those json/jsonb update hassles?
(not arguing, just curious - when things get hairy in JSON, I give up on SQL and [1] I write a user defined function (CREATE FUNCTION) in JS (plv8) or Python (plpython).
[1] assuming the update code needs to run inside the database, e.g. for performance reasons... otherwise just perform your update in the application, where you presumably have a richer library for manipulating data structures...
I don't remember the specific case (it's been a few years) but I do remember it had something to do with updating an array member. I googled around and found this [0] (the second question) which looks very similar. It's as simple as it gets - find an array member with "value" equal to "blue" and decrease its "qty". In MongoDB you can do that pretty easily and the update should be atomic. The SQL version looks complicated and it's not even the form that you should use (notice the note about the race condition). Then again, maybe there's already a way to do that in PostgreSQL in a more elegant way, I assume the support has improved over the years.
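For comparison, the MongoDB side of that example is basically a one-liner; a minimal pymongo sketch (the collection and field names are hypothetical, not taken from the linked question):

    # Find the array member whose "value" is "blue" and atomically decrement
    # its "qty"; the positional "$" targets the array element matched by the filter.
    from pymongo import MongoClient

    orders = MongoClient()["shop"]["orders"]
    orders.update_one(
        {"items.value": "blue"},
        {"$inc": {"items.$.qty": -1}},
    )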
How do you handle the data integrity issues that are present, even at the strongest configurable integrity levels, or is data integrity not an issue for your application?
And if you’re using C#, the MongoLinq library makes using Mongo with Linq just as easy as using EF with an RDBMS. We were able to easily support both in a product just by passing Linq expressions to different repository classes, and the expressions were translated to either Sql or MongoQuery by the appropriate provider.
Yep, the update operators for MongoDB are really great and not replaceable with Postgres. Now if only MongoDB had a good sharding story it would be a worthy contender for me.
No, I had a sour experience in 2009 where it ate my data, the devs were rather cavalier with "there's a warning on the download page" (I got it through apt), it ate my data again when the OOM killer killed its process.
I didn't like the project attitude of a database being so lax with persistence, so I never used it again.
I feel like MongoDB now is actually a pretty stable product simply through time and investment, however I will never trust the company for using our data to beta test for a decade.
That's my attitude as well. RethinkDB, in comparison, had a much better attitude of "reliable first, fast later". Unfortunately, it turned out that when you're a database, it doesn't matter how much data you lose, only how fast you are while losing it.
> MySQL is slow as a dog. MongoDB will run circles around MySQL because MongoDB is web scale.
> "MongoDB does have some impressive benchmarks, but they do some interesting things to get those numbers. For example, when you write to MongoDB, you don't actually write anything. You stage your data to be written at a later time. If there's a problem writing your data, you're fucked. Does that sound like a good design to you?"
> If that's what they need to do to get those kickass benchmarks, then it's a great design.
> "..... If you were stupid enough to totally ignore durability just to get benchmarks, I suggest you pipe your data to /dev/null. It will be very fast."
> If /dev/null is fast and web scale I will use it. Is it web scale?
> "You are kidding me, right? I was making a joke. I mean, if you're happy writing to a database that doesn't give you any idea that your data is actually written just because you want high performance numbers, why not write to /dev/null? It's fast as hell."
Listening to the community and using Postgres is my biggest regret. In hindsight, given our scale, any database would have worked. There is no built-in solution for high availability with multiple VPSs, and having one server alone isn't enough availability for me.
I love RethinkDB. I used it for 5 years at my previous company. It was document oriented, relational, and stable.
I use Postgres JSON blobs right now, but it's odd having a different syntax for keys/values depending on whether they're in a named row or a JSON blob.
Yes, it's our main DB. I still like it quite a lot; we use Mongoose as an ODM, and it makes adding new stuff so much easier without having to do things like alter table etc. But for our big data stuff we use BigQuery, simply because of cost.
I do like how easy it is to get a mongo instance up and running locally. I found maintenance tasks for mongo are much easier than postgres.
One thing you still need to do is manage indexes for performance; I've had to spend many a day tuning these.
I have come across some rather frustrating issues, for example a count documents call is executed as an aggregate call, but it doesn't do projection using your filters. E.g. you want to count how many times the name 'hacker' appears. It will do the search against name, then do the $count, but because it doesn't do a projection, it will read the whole document in to do this. Which is not good when the property you're searching against has an index, so it shouldn't have to read in the document at all.
Yes, use it with Atlas for every one of my companies' projects.
- The document model is a no-brainer when working with JS on the front-end. I have JSON from the client and dictionaries on the backend (Flask), so it's as easy as dumping into the DB via the pymongo driver. No object relational mapping (see the sketch below).
- Can scale up/down physical hardware as needed so we only pay for what we use
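To illustrate that first point, the whole "persistence layer" can be as thin as this — a sketch with made-up route and collection names, not our actual code:

    # Client JSON goes straight into a collection; no mapping layer in between.
    from flask import Flask, request, jsonify
    from pymongo import MongoClient

    app = Flask(__name__)
    db = MongoClient("mongodb://localhost:27017")["myapp"]

    @app.route("/dashboards", methods=["POST"])
    def create_dashboard():
        doc = request.get_json()                 # plain dict from the client
        result = db.dashboards.insert_one(doc)   # stored as-is
        return jsonify({"id": str(result.inserted_id)}), 201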
I'd echo most of those sentiments, but not all:
+ Support on Atlas is good
+ Set up process is very streamlined and smooth
+ Modifications (scaling IOS etc) are all very easy, but ...
- The Atlas web client had a bug that swapped data types (fixed, but still)
- The database starts off fast, but seemed to get quite slow considering how small it was (fitted entirely in RAM)
- The latest Jepsen report suggests it is still very cavalier with data integrity (http://jepsen.io/analyses/mongodb-4.2.6)
My experience has been the same, though I've only used it for a couple of weeks now.
I love their web interface - does anyone know if there are any cloud offerings for PostgreSQL which are similar?
I can't help noticing that the majority - not all, but the majority - of "No." responses here summarize identically: someone used MongoDB a long time ago (between 5 and 11 years) and ran into a problem, so they stopped using it and will never try or re-evaluate it again.
I'm a bit surprised that developers and systems engineers get burned to the point that they disconnect from the daily reality of their occupation, that software is often shaky in its infancy but almost always improves over time.
Here's the thing: it's not just that early Mongo had issues - sure, whatever, that's life. But they argued that things like silently dropping data if your database gets too large were acceptable because it was documented (and this is just one such issue).
The sole, number one requirement I have for my database is that the data I put in is still there when I go to get it back. Failing that and pretending it's not an issue is more than "being shaky", it's a violation of the trust I put in my data-store. It's great they've fixed those issues now, but that cavalier attitude towards the integrity of my data is what makes me hesitant to use it in future, not the fact that a bug involving data-loss existed.
None of this is to say I'd never use it, but it'd be far harder for me to trust it again vs postgres, couch, rethink, cassandra, or any number of other data-stores that took data integrity seriously from the start.
The problem with MongoDB is not that it was shaky in its infancy. It is that the developers made it very clear from the start that they don't care. Speed is everything, everything else is secondary. This changed a bit over time, but the latest Jepsen report (http://jepsen.io/analyses/mongodb-4.2.6) is still a mess and by now I'm pretty sure that is less "shaky in its infancy" and more a deep seated design problem that will never be fixed.
How often I re-evaluate applications, you mean? I would say it happens somewhat regularly; if there's something I miss with one I switched from, or if I hear or read something good about an application I stopped using.
I guess the more interesting questions are how often you re-implement discarded technologies back into your core stack after they’ve let you down, and how much time you spend re-evaluating your core tech but don’t find enough improvements to incur the switching costs.
For something as fundamental as the database, most developers (and people who rely on them) would probably hope for “rarely” and “not much”. What are the benefits you see from taking a different approach?
I had to rip out a completely schemaless mongodb and replace it with a sql DB. The data was completely relational, and developing without a schema was a pita.
I asked why mongodb was used in the first place. As far as I could tell, the answer was one resume driven dev.
If a document store made sense for our data then we would’ve made mongodb work. It was less “we got burned” and more “why were we using mongo in the first place?”
If you have a problem with product A and you have 10 other options, you find a product B that is good enough and start working with it; there is no need to go back to A from time to time to see if it has improved. And watching all 10 products like a horse race and moving to the best of the hour is not realistic.
I don't think this is limited to devs or databases. You see it sometimes in Firefox/Chrome threads along the lines of "firefox sucks", "when did you last use it?", "4 years ago".
Mongo in particular had marketing plays that were quite deceitful; that particular flavor of distaste can run deeper than some tech issues.
disclaimer: I use mongo in my day to day as a primary DB store.
I've promised myself to never touch MongoDB again.
Worked for a valley startup back in 2013 that picked Mongo as primary data store because the CEO liked the simplicity and was too busy with the future to learn anything more complicated.
I only implemented a couple of features before I got out of that mess. But from my experience, compared to dozens of SQL and NoSQL databases I've worked with; it was definitely the worst option. We spent way too much time dealing with Mongo-specific issues and limitations.
What's your opinion on postgres vs nosql like cassandra and aerospike? Fundamentally, is there any reason that postgres can't scale as well as nosql? If I store key-value data in postgres and add read replicas to scale reads and partitioning to scale writes, will I not be able to keep up with other nosql solutions? If so, what are the reasons?
Unfortunately. And devs are still doing triple lookups (joins) like they’re using sql, forgetting to add indices and designing/evolving schemas sloppily. Data is a mess and there’s an outage every few weeks. It’s also expensive af (probably due to improper use). It’s flexible when you’re trying to “get going” but creates pain later on.
I like SQL because it is way more expressive and helps you answer questions you didn’t know you would have. I find that incredibly valuable. Pretty tough to do in mongo. Generally BQ is good for this (in addition to your app DB) but if you use mongo you’re gonna have a hell of a time stuffing that sloppy schema into any column-oriented DB.
I like some of the serverless GCP dbs like datastore and firestore over mongo. They Index every field and force you to add composite indices on first query run by spitting out an error with a link to create it. If you understand their unique but simple API, limits, and quotas, they work predictably and scale nearly limitlessly.
> I like SQL because it is way more expressive and helps you answer questions you didn’t know you would have
That's precisely one of my points against using object-oriented approaches to model domain data in business-related, ERP-like applications. I always go for simpler data structures representing relational database records instead. Way more flexible.
No, I inherited a project that was using it a number of years back. After 6 months we migrated away to postgres. The data was really relational so it was just the wrong tool for the job. I can see that it might have value as a document store but with postgres' json facilities these days it's hard for me to see a scenario where I'd choose it.
Yes. I've been using mongo for various projects for 8+ years. I like the flexibility of the document model. No migration headaches. The query language is powerful and intuitive. I haven't used the graph features yet, but it is nice to know that Mongo can support it if that need ever comes up. I use mongo Atlas as a managed database for peace of mind.
I use Redis in addition for caching.
For testing/TDD I use mongo-memory-database. It creates an isolated in-memory instance for each of my test suites, so there is no need for mocking.
> Jepsen evaluated MongoDB version 4.2.6, and found that even at the strongest levels of read and write concern, it failed to preserve snapshot isolation. Instead, Jepsen observed read skew, cyclic information flow, duplicate writes, and internal consistency violations. Weak defaults meant that transactions could lose writes and allow dirty reads, even downgrading requested safety levels at the database and collection level.
Interesting. I haven't had a use case where data integrity is critical yet.
A project I'm working on now will have credits and accounts. To accomplish that in mongo, I create a transaction with "pending" status. Then I try to debit the source account and credit the destination account, and I add the pending transaction to the accounts. If that works I set the transaction to "committed" and I remove the pending transactions from the accounts.
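A rough pymongo sketch of that flow, in case it helps anyone picture it — collection and field names are made up, and error handling/recovery of stuck "pending" transactions is left out:

    from pymongo import MongoClient

    db = MongoClient()["bank"]

    def transfer(src, dst, amount):
        # 1. Record the intent as a pending transaction.
        txn_id = db.transactions.insert_one(
            {"src": src, "dst": dst, "amount": amount, "state": "pending"}
        ).inserted_id

        # 2. Debit the source (only if funds allow), credit the destination,
        #    and mark both accounts as carrying this pending transaction.
        db.accounts.update_one(
            {"_id": src, "balance": {"$gte": amount}},
            {"$inc": {"balance": -amount}, "$push": {"pending_txns": txn_id}},
        )
        db.accounts.update_one(
            {"_id": dst},
            {"$inc": {"balance": amount}, "$push": {"pending_txns": txn_id}},
        )

        # 3. Commit the transaction, then clear the pending markers.
        #    (A real version would check matched_count and roll back on failure.)
        db.transactions.update_one({"_id": txn_id}, {"$set": {"state": "committed"}})
        db.accounts.update_many(
            {"_id": {"$in": [src, dst]}}, {"$pull": {"pending_txns": txn_id}}
        )
        return txn_id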
If you haven't already I would absolutely read the Jepsen articles on MongoDB, at least so you are aware of the risks and failure states. There's some stuff about transactions that may be relevant too.
Yes, still use it, together with GridFS. We mostly use it as a persistence store for SOAP message dumps from XML to JSON in Java; everything else usually resides in a Postgres or Aurora RDS database on AWS or some legacy Oracle/MSSQL stuff elsewhere (which we try to get rid of).
The issue with most of this stuff is that a lot of projects just need some persistence and basic querying, and almost any database can do that equally fine. At that point, maintenance, ops workload in general or reliability are the differentiators and most of those go away when you run them as a service with your cloud provider of choice. Which specific one you use is practically selected for you: take all the ones that are compatible with your application or framework and sort by price.
I use postgres for pretty much everything these days, sometimes with Hasura if I need GraphQL.
There was a short period of time (1-2 years or so?) where I did my projects with Mongo, but right now I don't see what it can offer over something extremely stable like postgres, which also handles JSON amazingly well.
Someone needs to explain to me what the benefits of NoSQL with MongoDB are when you have the JSONB column type and the ability to query and insert at the field level with JSON in PostgreSQL? Maybe there's some benefit, but I'm not seeing that major "gotta have it" feature or performance gains. And I ask this question seriously, because I just don't know the answer.
>Someone needs to explain to me what the benefits of NoSQL with MongoDB are when you have the JSONB column type
Schema-less design is a feature of MongoDB, not NoSQL, but it has unfortunately persisted as one of the "benefits" of NoSQL. However, when you look at the top 5 "NoSQL" databases on the DB-Engines rankings, only 2 support "JSON": MongoDB and Elasticsearch. The rest are key-value, or require schemas (Cassandra).
The appeal of NoSQL, to me, has been the ability to scale out to tens, hundreds or thousands of machines relatively easily. This is where Postgres does not shine, and this is the only reason I'd choose something as binding as DynamoDB over something like Postgres.
Having had to work with MongoDB, I'd say the only benefit is you don't need to plan or think through your design, which in my view is not really a benefit. You end up checking for null properties everywhere.
Yes, we are using it at iFunny (20M+ installs on iOS + Android). 180 virtual servers in 12 clusters. Reasons are scalability and reliability. We can turn off any baremetal host under those virtual servers and the system won't even notice (it actually happens 2-3 times a year). We can add/remove replicas to scale reading load and add/remove shards to scale writes.
Yes, still using it for storing relatively unstructured blobs of JSON which I only look up via key but update with various operators ($addToSet, $set, $inc). Also using it as a persistent session store and lately for storing rate-limiting information. I've come to like the MongoDB update operators and features such as the Change Events. However, I will eventually be moving off MongoDB as I need horizontal scaling via shards, and the sharding setup of MongoDB is needlessly complex and brittle. Currently planning to go with ScyllaDB for my large-volume key-value blob datastore and some flavour of SQL for low-volume things which need more complex query behavior (user database, subscriptions). ScyllaDB will present challenges since I still want the ability to update single keys in big JSON blobs, even if my only lookup is by key.
If MongoDB came up with a better sharding experience where every box is equal and you don't need the dance with shards on top of replica sets plus mongos plus config servers plus arbiters I might consider it again.
It is just shocking that you would be on MongoDB and then move away from it for horizontal sharding. As someone who has never used mongo, that seems like the only reason to use it.
No offense, but if I wanted to have managed MongoDB I might as well use AWS DocumentDB.
I honestly believe that this is a key differentiating feature for many databases. ElasticSearch, ScyllaDB, and others work the same way: every node is equal, and in order to scale you just keep adding more boxes, end of story. Compare that with what you have to do with MongoDB.
DocumentDB is a great DB, but at the end of the day it's a forked version of 3.6-ish, so it's missing a ton of new features like multi-collection ACID guarantees.
Plus, whenever you use DocumentDB you're accessing it via the MongoDB official drivers, so you'll be handling that compatibility mess yourself.
I'd be curious if anyone has experience using Mongo or other document DBs as a read model projection database in a CQRS event sourced system.
That is, data which can be reconstructed from a consistent event store at any point, and is organized into structures for fast querying and loading for display on the client.
- Backend load -> update -> store updates must be atomic and consistent
- Client reads are fine being eventually consistent
- Need to store structured data (ie a calendar event with the names of all participants) but also find and update nested data (ie a participant's name changes).
- Client queries need indexed filter and sort capability on specific document fields.
Mongo or similar databases seem like they might be ideal for this use case, since they allow storage of nested document data while still being able to index and perform updates on that nested data, but I haven't really seen a deep dive from anyone using it for this purpose.
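In case it's useful, the nested update/index combination I have in mind looks roughly like this in pymongo — the collection and field names are hypothetical:

    # Read-model projection: a calendar event embeds its participants, but a
    # "participant renamed" event still has to patch every affected document.
    from pymongo import MongoClient

    events = MongoClient()["read_model"]["calendar_events"]
    events.create_index("participants.id")   # index into the nested array for fast lookups

    def on_participant_renamed(participant_id, new_name):
        events.update_many(
            {"participants.id": participant_id},
            {"$set": {"participants.$[p].name": new_name}},
            array_filters=[{"p.id": participant_id}],   # update every matching array member
        )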
I think the saying goes: fool me once, shame on you; fool me for like 7 or 8 years in a row, shame on me.
At this point, if MongoDB could turn on a dime from being an unreliable system to a robust one, that would be the most remarkable part of this story, because I've never seen that happen with any engineering organization. It's rare that engineering departments can significantly improve their quality culture, and it's never fast. Microsoft Windows went from crashing (for me) >=3 times a day to crashing almost never, and it took them over 10 years -- and replacing their kernel twice. I've seen nothing from MongoDB to make me believe they even realize the problem, much less are on their way to solving it.
In the lifetime of a database, 2 years is nothing. And it's always easier for quality to slip than to improve. If they turned it all around since I last touched MongoDB, good for them, but they've burned through all their trust, and it's going to take a lot longer than that to earn it back.
Why would you? The philosophy is the same even if you have better locking. It hasn't changed substantially. The only time you even get schemas is when you use it with something like the BI connectors. And that's just an accommodation.
I am still using it for a few legacy apps. I don't really love it, but it is rock solid for that payload: mostly just storing simple events. For new projects I just use Postgres; especially since they introduced the json datatype, I don't have a need for a noSQL database for the type of projects I am doing.
Yes, we do at WaystoCap. It's our main DB. I like it mostly for the flexibility. I think it's one of the best DBs when you're in pre-product market fit stage and making a lot of changes to the data model.
One of the main cons I've experienced with it is that its beginner-friendly nature and docs lead you to a non-optimal data schema for No-SQL. Even the way it does pagination with Skip is not the performant way to do it.
As areas in our business mature and scale we suffer bottlenecks and have pretty big changes to optimize those areas of the data model.
There seems to be a hate bandwagon that everyone is jumping on; I've seen it on Reddit for about the last year or two. The one guy who talks about the issues with Atlas being the only platform which gets the latest versions does seem quite concerning though.
I was very happy with Scalegrid hosted MongoDB for the 2 years a project ran on it. Prior to that, 1 year on a self-hosted cluster had a failed write due to a network partition and I learned about single write masters and used CouchDB on my next "document-DB-appropriate" project. Atlas ended up being expensive and the service was a bit stand-offish, where Scalegrid's staff were super helpful etc...
Their pricing was the primary factor.
What do I define as "document-DB-appropriate"?
Blobbed stuff where you want it all in the same document sized doses, and always the same dose, to the point where you feel ridiculous putting bits together in the same shape over and over and notice that you CRUD the same dose/shape all day... Objects...
Don't get me wrong, there's times in computer data land when the first 128 bytes are the header and that the data is multiplexed in 32bit chunks with 24 bits zero-padded and this padding tells you "channel 2 is starting now buddy!"
But SQL is certainly a viable option for many things and rather standard and known and supported and stuff...
No, we used MongoDB in 2009-2010 but it was a disaster as performance and integrity fell apart as our data-set grew. Maybe it’s OK now, but I see no reason to go back.
I have not used it for years. I simply prefer relational databases after all.
When a document database makes sense, it happens, I go with CouchDB. Its multi-primary architecture is attractive compared to MongoDB with a single primary node. But I'm thinking about using PostgreSQL and jsonb next time.
There's no reasonable way to compare performance between PostgreSQL and MongoDB because they're not the same kind of product and they don't have a similar query interface.
You can take a particular application and compare performance when using two different databases on the back-end, but then the database itself might not necessarily be your bottleneck, it might also be the way the application is written. Because the database is only a part of the equation, the comparison also doesn't tell you anything about the performance of any other application.
You can get amazing performance results if you don't care about data persistence. Anyway, when postgresql introduced JSONB there was a benchmark for CRUD operations - I can't find it now because it was a long time ago - but I remember postgresql running circles around mongodb.
I am disappointed with the direction that MongoDB has taken these past few years. Going ACID shows in benchmarks [1] and it's not advisable if you are using MongoDB for stats and queues. (No one uses MongoDB for financial transactions despite the changes.)
And the recent change to a restrictive license is worrisome as well. I have been thinking of forking 3.4 and make it back to “true” open source and awesome performance. (If any C++ devs want to help out, reach out to me! username @gmail.com)
I had some positive outcomes using Postgres and Mongo as essentially a front end cache. The use case was unique in that all of the spark/etl jobs ran against postgres, which then triggered caching workers to build up a more document oriented cache that the front end talked to. Essentially allowing the front end to get a KV on steroids and documents already prepared well to be rendered and pre joined.
Against a specific use case. I was never burned by Mongo, but I wouldn’t choose it if I had to have one backend only. It lends itself nicely as part of an ecosystem imho
Our use case is dynamic dashboards generation where a document contains multiple components for our frontend to render. Having a simple unstructured database really helps us build the dashboard efficiently. Using a relational database would increase the development time of each dashboard tremendously and having relational integrity would make it even worse.
Having a simple document with everything needed is a much nicer experience. Granted, our use case is very limited and it is read only.
We've used it for several years; we switched away from it to DocDB last year and then back again in November. We've had applications running on both MongoDB and AWS continuously through 2019, but our clients and team in general needed certain things: DocDB doesn't support everything we've done w/ mongo in the past. Our team also feels it's easier to integrate and translate to clients.
We stopped using it because indexing was annoying, and we found a product (CosmosDB) that automatically indexes everything for us and still has a pretty decent SQL-like query syntax.
I also think the "cool" factor has stopped. It was very chic to use Mongo back in 2010 when everybody else was trying to scale with SQL. Nowadays, DynamoDB/CosmosDB/Cassandra eats Mongo for lunch.
The company went bankrupt, but the game we made[0] still lives. It uses MongoDB to store user accounts, profiles, inventories, match history and all persistent data.
I was the one who suggested using MongoDB when we started (instead of SQL), as the game front-end was JS and the back-end Node.js, meaning that with a MongoDB database we could easily type, store and retrieve data. Overall I think it was a good decision; it ran pretty well with over 200k MAU, ~2-3k concurrent. We never really had performance issues with the database, but we did have to implement our own locking system to make sure that non-atomic operations are performed correctly. I think there are only 3 VPSs running MongoDB (one master, one replica and one back-up I think), so actually the entire database (a few gigabytes with around 1M accounts) only runs on a single cheap VPS.
MongoDB is a pretty good database IMO. I've used it at several companies in the past and wouldn't mind using it again.
My favorite DB is RethinkDB. It's a shame that the company behind it fizzled out and was absorbed by Stripe. I still cannot wrap my mind around why it's not more popular. It's similar to MongoDB but much better. It's the perfect database. It adds constraints which improve the quality of your code. Also RethinkDB scales very well and the control panel that comes with it is mind-bendingly powerful, I'm not kidding; you can click a button to shard or replicate a table over multiple hosts! WTF! I can't say the same about Postgres unfortunately. There is nothing truly remarkable about it.
I use Postgres for one of my projects today but purely because of compatibility reasons with an existing system. I don't understand what all the hype is with Postgres.
So what's the status of RethinkDB? Is it still being developed? Who's working on it? Is it stable and feature-complete or are there any kinds of features missing that only a dedicated (paid) team would be able to pull off? Genuinely curious!
IMO I've come to see DB transactions as a hack because they limit scalability.
It's possible to use two-phase commits instead. Two-phase commits can scale without limit but you have to be more careful when designing your tables and specifying your indexes.
For tables which need atomicity, you can add a 'state' column/field which is either 'pending' or 'settled'. You can have a separate parallel subroutine (or process) in your code to handle settlement. You have to design it in such a way that if the system fails at any point, it will re-process all 'pending' entries in an idempotent way.
It's a bit more work, but it scales without limit. You can add a shard key (or use account IDs) so that you can have multiple processes/hosts working in parallel to insert pending transactions and also multiple processes to settle them in parallel.
For financial transactions, I would separate each transaction into two parts: a debit entry and a credit entry, both initially inserted into the DB as 'pending'. The settlement routine matches them up, making sure to process/settle the debit side of the transaction first; only once the debit is in the settled state does it process/settle the credit side.
If the settlement engine sees any conflict (e.g. a user tries to spend more money than they have by submitting many transactions in parallel), it will accept all the transactions as 'pending' but mark the later ones as canceled (they will not settle), so the corresponding credit entries never settle on the other accounts.
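A minimal sketch of that pending/settled flow (sqlite3 only so it runs standalone, and the column names are my assumptions; in practice you'd shard this by account ID as described above):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE entries (
        txn_id  TEXT,
        side    TEXT,     -- 'debit' or 'credit'
        account TEXT,
        amount  INTEGER,
        state   TEXT      -- 'pending', 'settled' or 'canceled'
    )""")

    def submit(txn_id, from_acct, to_acct, amount):
        # Both sides are written as 'pending'; nothing is final yet.
        db.execute("INSERT INTO entries VALUES (?, 'debit',  ?, ?, 'pending')",
                   (txn_id, from_acct, amount))
        db.execute("INSERT INTO entries VALUES (?, 'credit', ?, ?, 'pending')",
                   (txn_id, to_acct, amount))
        db.commit()

    def settle():
        # Pass 1: decide each pending debit. It settles only if the settled
        # balance covers it; otherwise it is canceled.
        for txn_id, account, amount in db.execute(
                "SELECT txn_id, account, amount FROM entries "
                "WHERE side = 'debit' AND state = 'pending'").fetchall():
            balance = db.execute(
                "SELECT COALESCE(SUM(CASE WHEN side = 'credit' THEN amount "
                "ELSE -amount END), 0) FROM entries "
                "WHERE account = ? AND state = 'settled'", (account,)).fetchone()[0]
            db.execute("UPDATE entries SET state = ? "
                       "WHERE txn_id = ? AND side = 'debit' AND state = 'pending'",
                       ('settled' if balance >= amount else 'canceled', txn_id))
        # Pass 2: a credit settles only after its matching debit has settled.
        # Re-running after a crash just picks up pending rows again (idempotent).
        db.execute("""UPDATE entries SET state = 'settled'
                      WHERE side = 'credit' AND state = 'pending' AND txn_id IN
                      (SELECT txn_id FROM entries WHERE side = 'debit' AND state = 'settled')""")
        db.execute("""UPDATE entries SET state = 'canceled'
                      WHERE side = 'credit' AND state = 'pending' AND txn_id IN
                      (SELECT txn_id FROM entries WHERE side = 'debit' AND state = 'canceled')""")
        db.commit()

    # Seed some settled funds, then submit and settle a transfer.
    db.execute("INSERT INTO entries VALUES ('seed', 'credit', 'alice', 500, 'settled')")
    submit("t1", "alice", "bob", 100)
    settle()
    print(db.execute("SELECT * FROM entries WHERE txn_id = 't1'").fetchall())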
Exactly, and that extra bit of work makes you prone to mistakes. This is what happens when you reimplement transactions in your application. See what happened to a BTC exchange using Mongo that got hacked.
My Mongo experience is a bit outdated (2.4-2.6). It was a bit traumatic and I'm never coming back to it.
If I ever need high atomicity and consistency for financial transactions, I'll just use Postgres with SERIALIZABLE transaction isolation. That solves everything.
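That roughly looks like the following (a sketch with psycopg2; the DSN, table and retry policy are assumptions): run the whole transfer under SERIALIZABLE and retry when Postgres reports a serialization conflict.

    import psycopg2
    from psycopg2 import errors
    from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE

    conn = psycopg2.connect("dbname=app")  # assumed DSN
    conn.set_session(isolation_level=ISOLATION_LEVEL_SERIALIZABLE)

    def transfer(src, dst, amount, retries=5):
        for _ in range(retries):
            try:
                with conn, conn.cursor() as cur:  # commit on success, rollback on error
                    cur.execute("UPDATE accounts SET balance = balance - %s "
                                "WHERE id = %s AND balance >= %s",
                                (amount, src, amount))
                    if cur.rowcount != 1:
                        raise ValueError("insufficient funds")
                    cur.execute("UPDATE accounts SET balance = balance + %s "
                                "WHERE id = %s", (amount, dst))
                return
            except errors.SerializationFailure:
                conn.rollback()   # conflict detected by Postgres: just retry
        raise RuntimeError("could not commit after retries")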
Check out redbook.io; it's a bit outdated, but you can see from Stonebraker's discussion that NoSQL and SQL would merge, which is what is happening (or has happened) right now: Mongo having transactions, and SQL having JSON datatypes.
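The Mongo side of that convergence looks roughly like this nowadays (a sketch with pymongo against a 4.0+ replica set; database and collection names are made up):

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # assumed URI
    accounts = client.bank.accounts  # hypothetical collection

    with client.start_session() as session:
        # Both updates commit or abort together when the block exits.
        with session.start_transaction():
            accounts.update_one({"_id": "alice"}, {"$inc": {"balance": -100}},
                                session=session)
            accounts.update_one({"_id": "bob"}, {"$inc": {"balance": 100}},
                                session=session)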
much of the anti-mongo sentiment is around people who used it in 2009-ish.
it's useful to point out that when using the early version of any software, there will be bugs. that's a trade-off early adopters always need to contend with. unfortunately, it hasn't sat well over time despite every one of the concerns being addressed since.
> much of the anti-mongo sentiment is around people who used it in 2009-ish.
So? I think this is a very plausible way of measuring how much you should trust something. There had to be a downside to the "move fast and break things" approach that MongoDB subscribes to (i.e., add features now, think about reliability later).
As it turns out, they moved fast and broke their reputation.
i agree, there's reputational impact but again it's the trade off that early adopters make and are fully aware of.
When you're the first user of a brand new company's database, you'd better bet there will still be kinks they're working through, and let's face it, you're the guinea pig.
i'm sure mongo is thankful for all these guinea pigs because they helped to define the roadmap and iron out the bugs. however, at the end of the day the 2009 version of a product is entirely different from the 2020 version. there are endless analogies (facebook, google, etc.); the only difference with an open source technology is that users can analyze it under a microscope.
> You make technical decisions based on what happened a decade ago? Do you make all decisions based on ten year old information?
Yes?
The problem was not that MongoDB was young; the problem was that its authors had no idea about databases when they started their work. They stored data in RAM and got the fastest benchmarks because they didn't worry about persisting the data to disk. They outright lied in their documentation about the guarantees.
It wasn't the being early product part, it was the misrepresenting the truth part that hurt their credibility.
DynamoDB for example is much younger than MongoDB, and doesn't suffer from this.
yep, I heard of it and had a similar period of excitement to the one I had with Mongo. Although when they kept postponing the addition of distributed functionality I lost interest. You can't design a good distributed system as an add-on; you need to think about it from day 1.
You seem to think this is about tech. It's not. It's about company culture, trust, and reputation.
Code can change reasonably fast; in some cases 10 years is enough to replace an entire codebase. But company culture has, or can have, an expected lifetime way longer than 10 years. In this case, my heuristic is justified.
I'm saying it's completely illogical to base technical decisions on information that is 10 years old when, just as you can compare the modern iPhone's specs to the first iPod's, you can also look at the modern specs of Mongo, read third-party assessments, and even inspect the source code to see if it meets your needs.
At that point it was webscale-ready. The problem was that MongoDB lied in their documentation about what guarantees it provided. It took them a decade to improve, and now that the data is safer (though still not completely safe), it no longer offers a performance edge over its competition.
Originally it was super fast, because it kept everything in RAM and didn't worry about writing data to disk, which caused many people to lose data.
the hype about postgres is that there is no hype. it's one of those systems that has been working for decades. it has its flaws, but they are well known, and as long as you are aware of them it does exactly what it is supposed to do.
I have to say https://www.mongodb.com/cloud/atlas is awesome. Use it at work and would like to use it on personal projects in the future instead of whatever other DB solution.
Looking at the responses, I'm seeing a fair number of "No" responses. I recall a time when devs would practically rage on you if you dared question the decision to use Mongo. What changed? Is it just an example of dev cultism where the cult devs are the loudest?
I've actually never used it, and probably won't at this point. But, we do use Couchbase at my workplace, and it's worked well for us. The use case is very limited in scope, but it serves our purposes well.
I'd be curious to hear from folks who have used both how they compare.
Depends on what you're doing, I suppose. CouchDB's multimaster concurrency features are still effectively best in class, but there are so many managed DB services now that it doesn't seem like a headline feature.
The RESTful interface is pretty cool. And Couch's in-built security features are near enough a best kept secret.
CouchDB always lost to Mongo in discussions about speed unfortunately. Data safety and security notwithstanding. You can speed it up if you're willing to use Erlang as the primary query language, which wasn't something many were keen to do.
Another big handicap is data modelling in a document DB...
I'm not a data scientist, but anytime I hear complaints about Mongo, and have access to the codebase that's the subject of the complaint, I invariably find whopping great multi-property entities confined to a single document, without anything close to an attempt to normalise the data (or worse, a concerted intellectual effort to stuff everything into a single document...).
Another CouchDB fan here. CouchDB's mango queries are implemented as Erlang map functions under the hood. Users can now get the speed of Erlang with the usability of Mongo-like query syntax.
Thanks for the CouchDB feedback guys! I am curious to try it out. I should have been more clear, I was actually referring to Couchbase (https://www.couchbase.com/) not CouchDB :) Very unfortunate naming conflict considering they are both JSON document stores. Is there something about the word 'couch' that I'm missing?
From my experience with Couchbase, though, I would recommend it to others. They support SDKs in many languages, have decent documentation, and the server has performed well for us. The query language, N1QL, tries to emulate SQL syntax, and I found it nice to work with.
Couchbase was a commercial CouchDB spinoff product, co-created by the programmer who created CouchDB. Much of the API is (was?) the same, but where possible, Erlang was exchanged for C++.
It also has a built-in, in-memory cache layer, which CouchDB doesn't have.
I used CouchDB for a project back in 2012 when all of these things were fairly new. The reason was to embed the DB on an Android device. This turned out to not work as well as advertised. I have no idea what's happened with Couch since, but I've gotten pretty familiar with Mongo in the last 12 months. It's OK, but it's expensive to make it performant. As other commenters here have mentioned, the primary use case for these object/document storage solutions seems to be to remove knowledge of relational DBs as a requirement for developers.
I think it's probably excellent for prototyping, but I wouldn't want it running a heavily used product in production.
Does MongoDB still use multi-maps? That is, can a map still have multiple of the same key? This led to a very tricky bug at my last place of work, where the MongoDB Ruby client would see different values than the Python client. It's not a problem with MongoDB, per se, but many of the clients assume Mongo stores maps, when it really stores multi-maps. And it did hurt my opinion of the database seeing that most of its users were not actually aware of the data structure it was based on.
Yep. That's only because Graylog requires it. We don't put any real data in Mongo. That'd just be silly.
I'd put SNMP telemetry data in Mongo, only because that kind of recording can tolerate some loss. Plainly, I just don't trust Mongo with consistency, availability, or partition tolerance.
And because Mongo's backup facilities suck (they require taking the DB into read-only mode, or accepting no point-in-time consistency, egads), the only good way to do a backup is to put the DB on LVM and take an LVM snapshot.
I got familiar with Mongo because of a piece of software that we acquired last year. It was way out of date, and step one was to bring it up to the latest version. Step two was to change the hardware configuration and indexes to something reasonable. After doing this it's tolerable, but still lacking in speed per cost in my view. We have tentative plans to migrate to a relational database.
There are probably good use cases for Mongo, but I haven't found one yet.
jsonb is limited in its querying capabilities. you're essentially casting the json as a string when storing and querying; it's not truly json. therein lies the advantage of the document model: native secondary indexes, $<queries> for every nested layer, etc.
Postgres's jsonb format is definitely not "essentially" a string. You can get native indexes on json fields in Postgres, and you can query fields however you like.
This is among the shallowest comments I've ever seen on HN. Just because `TreeSet a = [1,2,3]` is surrounded by [ and ], does that mean `a` is a list? When you do `jsonb j = '{"a":"c"}'`, postgres builds a separate data structure for you from that JSON, which allows you to use a btree index.
I don't believe that is true; as far as I understand, jsonb is a binary representation, not text. And there are certainly ways to index jsonb columns: either a functional index on a particular jsonb expression, or a GIN index so that you can perform containment (@>) queries against the whole jsonb contents.
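To make that concrete, here's a small sketch (psycopg2; the table name and DSN are made up) showing both kinds of index and a containment query against jsonb:

    import psycopg2
    from psycopg2.extras import Json

    conn = psycopg2.connect("dbname=app")  # assumed DSN
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, doc jsonb)")
        # GIN index over the whole document for containment (@>) queries.
        cur.execute("CREATE INDEX IF NOT EXISTS events_doc_gin ON events USING GIN (doc)")
        # Expression index on one extracted field, usable for equality lookups.
        cur.execute("CREATE INDEX IF NOT EXISTS events_type_idx ON events ((doc->>'type'))")
        cur.execute("INSERT INTO events (doc) VALUES (%s)",
                    (Json({"type": "signup", "user": {"plan": "pro"}}),))
        # Match on a nested field without touching the rest of the document.
        cur.execute("SELECT id FROM events WHERE doc @> %s::jsonb",
                    (Json({"user": {"plan": "pro"}}),))
        print(cur.fetchall())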
I only use it on one legacy website I maintain. It's used as a more advanced Redis to cache JSON structures and extract just the data that's needed. This use case has proved to be mostly useless. It would have been better to just use Redis which is already used for caching other data.
I haven't used it for a while and I've been planning on learning Postgres after much praise from a friend (and the internet at large). I find DB design and migrations quite difficult though, any book/course recommendation?
I don't have a specific book for you; I myself learned about relational databases at school, and I'm not sure why it isn't taught everywhere.
The things you have problems with aren't really specific to PostgreSQL, so if you are looking for a book: while one based on PostgreSQL might be easier, there might be better books that cover relational databases in general, and what you learn will still apply.
As for migrations, from PostgreSQL's point of view you just use ALTER TABLE, CREATE INDEX, DROP INDEX, etc.; it's fairly trivial.
The problem comes when developers want to have it version controlled, so a framework around it is often built. Typically that functionality comes with whatever framework you are planning to use. For example, if you use Python and the Django framework, it comes with migration functionality, so you just learn what the framework provides.
I studied a different engineering discipline, not CS, so I never really had any course (besides basic C). I've read quite a bit and understand the normal forms, but my questions are more practical: what data should we include, how do we structure it so that it's not too complex but doesn't come back to bite me, etc.
A good example (but not only!) is login. You can have just user session, or session+token in DB, or session + devices (1 active token/device), etc. So maybe seeing business problems and how the DB was structured to help would be the best here.
> The problem comes when developers want to have it version controlled.
Oh maybe that's the reason I found those so tricky. I don't explicitly care about VCS, but I definitely care about migrations going wrong, and hopefully being able to revert if it goes wrong. Maybe if I get myself comfortable enough with manual migrations this is not such a huge topic though.
When structuring data you generally want it to be in 3rd normal form or higher. With the normal forms, the higher you go the less duplication you have in your data, but also the more joins you will do, which can reduce performance.
> A good example (but not only!) is login. You can have just user session, or session+token in DB, or session + devices (1 active token/device), etc. So maybe seeing business problems and how the DB was structured to help would be the best here.
I think you're overthinking it a bit. Also, I don't know if you're saying this, but in case you are: you don't want to mix login information (like user account, name, login password, etc.) with a session. Those things are separate and have different requirements.
For example, let's say your site became so popular that you have scaling problems. It is perfectly fine to take the sessions and put them in a NoSQL store, or (maybe even better) in a cache. This data is accessed on a key/value basis, and while it's not great, it won't kill you if the data disappears due to some outage (users just get logged out).
As for what you store there, session + device or session + token in the DB is purely based on your needs. Frankly, I don't know what you mean by a token in the DB; in fact, I'm not sure what you mean by session + device either. A session is generally just a randomly generated ID; if a user connects from a different device, they will have another session anyway.
I don't think this particular thing is something the relational model dictates, but you generally wouldn't want to place sessions in the users table. Those two things are completely different even though they seem related. You should have a separate users table and a separate sessions table (most frameworks handle sessions for you, so you might not even need to think much about it).
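Something like this is all I mean (a sketch; the column names are just an example, adjust to taste):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # assumed DSN
    with conn, conn.cursor() as cur:
        cur.execute("""CREATE TABLE IF NOT EXISTS users (
            id            serial PRIMARY KEY,
            email         text UNIQUE NOT NULL,
            password_hash text NOT NULL
        )""")
        # Sessions live in their own table (or later a cache) and reference users.
        cur.execute("""CREATE TABLE IF NOT EXISTS sessions (
            token      text PRIMARY KEY,   -- randomly generated session ID
            user_id    integer NOT NULL REFERENCES users(id) ON DELETE CASCADE,
            device     text,               -- optional, if you want one row per device
            expires_at timestamptz NOT NULL
        )""")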
> Oh maybe that's the reason I found those so tricky. I don't explicitly care about VCS, but I definitely care about migrations going wrong, and hopefully being able to revert if it goes wrong. Maybe if I get myself comfortable enough with manual migrations this is not such a huge topic though.
In PostgreSQL (it might not be true in other databases), DDL (Data Definition Language) can run inside a transaction. So you can start a migration with "BEGIN" and then check that what you did worked before you say "COMMIT". Note, though, what I said about developers wanting it version controlled: they do that to avoid mistakes, such as forgetting the BEGIN or accidentally pasting the wrong thing into the shell. So if you have production data you absolutely should use migrations. But if I were you, I would first try to learn by doing it by hand on a test database. Once you understand what is actually being done, migrations don't seem that magical.
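As a sketch, the same hand-run, transactional migration from Python (psycopg2 issues the BEGIN for you implicitly; the table and column names are made up):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # assumed DSN
    cur = conn.cursor()
    try:
        # psycopg2 has already opened a transaction behind the scenes.
        cur.execute("ALTER TABLE users ADD COLUMN last_login timestamptz")
        cur.execute("CREATE INDEX users_last_login_idx ON users (last_login)")
        # Sanity-check the result before making it permanent.
        cur.execute("SELECT count(*) FROM users WHERE last_login IS NULL")
        print(cur.fetchone())
        conn.commit()
    except Exception:
        conn.rollback()  # nothing was applied
        raise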
Anyone have thoughts about Cassandra and how it compares to MongoDB? There seems to be a big enterprise push with Azure's CosmosDB, but I've not heard much about it from people who have actually used it.
Unless you can hire Cassandra engineers who have enough experience maintaining clusters, or you can afford to pay for DSE (DataStax Enterprise) to do that for you, working with Cassandra will be quite the burden.
The advancement of PostgreSQL has been amazing, and much credit is due indeed to the NoSQL community's innovations. I cannot comment on MongoDB except to say that it is quite different from Cassandra.
To summarize, be sure you really need Cassandra and are able to dedicate the appropriate resources to it before taking the plunge.
I had to port the BSON library to an embedded product. I wasn't pleased that their approach to malloc() failure was to call abort(). Fuck it, we're too lazy to report errors and degrade gracefully.
Yes, for prototypes, MVPs, temporary chaotic storages.
NO MongoDB for production workloads. No MongoDB for more than 1000 records in one collection. No MongoDB higher than 4.0.3.
There is huge overlap between Mongo and Firestore. Mongo gives you the ability to finely control many aspects of the DB (sharding, replication, etc.), whereas Firestore manages it for you. Firestore and DynamoDB are also closed source, and you can only run them on their respective cloud providers. I don't have enough direct experience with DynamoDB to give you an accurate comparison.
We have mostly moved off Mongo at this point; there remains a single tiny Mongo cluster running with a handful of collections that aren't worth the time investment to move at the moment. Almost everything moved to PG. The abstract issue we had with Mongo is that the purported 'best practices' for using it were seemingly in conflict with its actual implementation. I should note that these are issues from across the last ~5 years, so it's likely some of this has changed; it's also likely my recollection of the details is not perfect.
Mongo pushes the idea of keeping related data in a single document. So if you have a hierarchy of data, keep it all in a nested document under a 'parent' concept, say an 'Account'. The problem with this is that there is a document limit of 16MB and key overhead is high. At one point we had to write a sharding strategy whereby data would be sharded across multiple documents due to this limit. This also broke atomic updates, so we had to code around that. We also ran into a problem where, for some update patterns, Mongo would read the entire document, modify it, then flush the entire document back to disk. For large documents, this became extremely slow. At one point this required an emergency migration of a database to TokuMX, which has a fast update optimization that avoids this read-modify-write pattern in many cases; as I recall it was something like 25x faster in our particular situation. This same issue caused massive operational problems any time Mongo failed over, as the RAM state of the secondary isn't kept up to date with the master, so updates to large docs would result in massive latency spikes until the secondary could get its RAM state in order. In general we just found that Mongo's recommended pattern of usage didn't scale well at all, which is in contrast to its marketing pitch.
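A workaround like the one described usually ends up looking something like the bucket-pattern sketch below (pymongo; the names and chunk size are illustrative, not our actual code), which is also exactly where single-document atomicity goes away:

    from pymongo import MongoClient

    CHUNK_SIZE = 1000  # illustrative cap per chunk document
    client = MongoClient("mongodb://localhost:27017")  # assumed connection string
    events = client.app.account_events                 # hypothetical collection

    def append_event(account_id, seq, event):
        # seq is a per-account counter maintained elsewhere; it decides which
        # chunk document the entry lands in, keeping each doc well under 16MB.
        events.update_one(
            {"account_id": account_id, "chunk": seq // CHUNK_SIZE},
            {"$push": {"entries": event}},
            upsert=True,
        )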
I think at one point we had something around 6TB of data spread across 3 Mongo clusters. After migrating most of that to PG or other stores and reworking various processes that could now use SQL transactions and views, the data size is a small fraction of what it was in Mongo and everything is substantially faster. In one extreme example there was a process that synced data to an external store as the result of certain updates. Because we couldn't use single documents and had no cross-document update guarantees, we would have to walk almost the entire dataset for this update to guarantee consistency. It got to the point that this process took over 24 hours, and we would schedule these updates to run over a weekend as a result. With the data moved to PG, that same process is now implemented as a materialized view that takes ~20 seconds to build, and we sync every 15 minutes just to be sure. Granted, this improvement isn't just a database change but rather an architectural change; however, Mongo's lack of multi-doc transactions and its document size limit are what drove the architectural design in the first place.
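The materialized-view setup itself is nothing exotic; a generic sketch (the view, table and query here are made up, not our actual schema) looks like:

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # assumed DSN
    conn.autocommit = True                 # REFRESH ... CONCURRENTLY can't run in a tx block
    cur = conn.cursor()
    cur.execute("""CREATE TABLE IF NOT EXISTS orders (
        id serial PRIMARY KEY, account_id integer NOT NULL, total numeric NOT NULL)""")
    cur.execute("""CREATE MATERIALIZED VIEW IF NOT EXISTS account_summary AS
        SELECT account_id, count(*) AS order_count, sum(total) AS revenue
        FROM orders GROUP BY account_id""")
    # CONCURRENTLY needs a unique index but doesn't block readers during refresh.
    cur.execute("""CREATE UNIQUE INDEX IF NOT EXISTS account_summary_pk
        ON account_summary (account_id)""")
    # Run this from a scheduler every 15 minutes.
    cur.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY account_summary")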
Then there are the bugs, of which there were many; the worst was a situation where updates claimed to succeed but actually just dropped the data on the floor. I found a matching issue in Mongo's bug tracker that had been open for years at that point. Ultimately I just can't trust a datastore that has had open data-loss bugs for years, regardless of its current state.
I have never used MongoDB directly, but I can tell you since their license shenanigans I consider it toxic.
Their "we will try some different form of Open Source (which really isn't open source at all, but we still want you to think so, because we know open source is popular) also AWS did something evil by using our software accoding to the license that we used for our software" thing really didn't inspire any trust.