Marketing. Marketing is what MongoDB got right, along with a small helping of being early to market. It's not even the best or most convenient document store; it doesn't have the most features, isn't the most secure, and doesn't handle high availability best. They got basically everything just "good enough" (and sometimes not), paired with stellar marketing.
They marketed their asses off while they iterated to build a document store that was worth the praise they got people to heap onto it.
IIRC the acquisition of the WiredTiger engine was basically a rewrite, because the previous implementation was just bad. So if the database engine wasn't good, distribution wasn't good (until later), schema management wasn't good, and the query language wasn't all that great, what did Mongo really get right? Marketing, and being early to market/new and shiny at the right time (when everyone was deciding that writing the frontend and the backend in the same language, JavaScript, was an awesome thing -- which I don't necessarily disagree with).
If someone needs a database these days, I always just suggest Postgres. If it's good enough for Reddit (https://github.com/reddit/reddit/wiki/Architecture-Overview#...), it's good enough for the next Facebook-for-dogs (unless your problem is completely different, in which case of course you should explore whatever database paradigm fits it best).
The biggest thing MongoDB got right was putting developers first: optimizing for initial experience and development speed.
Most database systems came from a relational-theory background and optimized for performance, operations, and data safety; developer convenience was often sacrificed.
Did anybody ever like writing data migrations? Try explaining the value of a table migration to the Product Manager of an early startup :-).
You can argue whether those were the right tradeoffs, but that's how MongoDB got a lot of steam.
"Did anybody liked writing data migration? Try explaining value of table migration to some Product Manager of early startup :-)."
People are so spoiled these days. What is so hard about writing a table migration? It takes a few minutes of your time. And why would some non-techy product manager even know that you wrote a migration SQL file? It's a minuscule technical detail of the chosen technology, and he will never see any repercussion of it in production. But try explaining to him why, after a few reckless sprints, you have inconsistencies all over the user-facing app -- which you will have if you abuse "schemaless" NoSQL but rely on implicit schemas that change over time without proper checks. And if my product manager pushed only for initial speed, with zero time for basic architecture (and I mean really basic), engineering practices, and such -- you know, the really useful stuff -- I would leave that place without a second thought. It's digging its own hole from the beginning.
And what is the problem with data migrations? Implicit schemas in MongoDB are a much worse problem when your data requirements change. With relational databases you at least have integrity checks at the DB level that guard you from many of the possible problems when doing such migrations.
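To make that concrete, here is a minimal sketch (SQLite for illustration, entirely hypothetical schema) of the kind of DB-level guard rails an implicit schema never gives you:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce FKs on this connection
    conn.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY);
        CREATE TABLE phones (
            user_id INTEGER NOT NULL REFERENCES users(id),
            number  TEXT    NOT NULL CHECK (length(number) > 0)
        );
    """)
    # Both of these fail loudly instead of silently storing bad data:
    # conn.execute("INSERT INTO phones VALUES (999, '555-0100')")  # no such user
    # conn.execute("INSERT INTO phones VALUES (1, '')")            # empty number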
> What is so hard about writing a table migration? It takes a few minutes of your time
Wow. I wrote a migration script for a couple of table rationalisations and it took me a week to make it error-proof and to provide a revert option. Not to mention that you usually need to work with production data locally to test whether the migration will run without errors.
And if you happen to hit a migration error on your production database, it is a terrible experience, one that usually includes downtime.
Maybe my lack of experience writing SQL migrations increased that time, but isn't this exactly the OP's point about being dev-first?
Writing SQL migrations is not easy, I agree with that. But I don't think the issue is with SQL: in my opinion, migrations _in general_ are difficult to do properly, regardless of the technology, especially when production data is involved.
Even when you use a schemaless db like MongoDB, you still have to manage your data (let's say the schema is implicit).
If, for example, you decide that the "phone number" field is no longer a string but a list of objects (each containing the phone number and the kind of number), you will have to update the old data to adapt it to the new model, incurring the risk of messing up production data (see the sketch below).
Of course you can (and it is what many people do) avoid the problem at the data level and "fix it" at the application level (for example, checking every time if you have a single string or the new list), but if you really want you can do the same with a SQL database (keep the "phone" field on the table, but add the new table for the list of numbers).
A disadvantage of the SQL approach is that many are less versed in it than in the application language, but there are many tools available to handle SQL migrations in your framework's language.
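For the phone-number change above, a hedged sketch of what that backfill could look like with pymongo (hypothetical `contacts` collection; field names are made up):

    from pymongo import MongoClient

    contacts = MongoClient()["app"]["contacts"]

    # Rewrite every document whose `phone` is still the old plain string
    # into the new list-of-objects shape.
    for doc in contacts.find({"phone": {"$type": "string"}}):
        contacts.update_one(
            {"_id": doc["_id"]},
            {"$set": {"phone": [{"number": doc["phone"], "kind": "unknown"}]}},
        )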
> Even when you use a schemaless db like MongoDB, you still have to manage your data (let's say the schema is implicit).
Yes, the "secret" to MongoDB not having to write data migrations is simple: you get to simply have them as technical debt for later.
What could go wrong?
Well, that's easy too: an accounting system where each method has a switch with 10+ cases, each case with 30+ lines of code inside it, depending on something like a "record version" or, worse, on implicit tests.
And what could be worse? Well, having a switch with only 9 cases in one of those methods...
> Of course you can (and it is what many people do) avoid the problem at the data level and "fix it" at the application level (for example, checking every time if you have a single string or the new list), but if you really want you can do the same with a SQL database (keep the "phone" field on the table, but add the new table for the list of numbers).
This is essentially how my team handles things while using HBase as our main DB. At first it felt weird not to write migrations at the DB level, and maintaining compatibility is certainly something you have to keep in mind at all times, but we have a pretty solid internal library for handling de/serialization in Scala.
> A disadvantage of the SQL approach is that many are less versed in it than in the application language, but there are many tools available to handle SQL migrations in your framework's language.
I was surprised by this at my previous job. One of the first things I learned about when starting web development was database migrations (Miguel Grinberg's Flask tutorial). Sure, I had to learn Alembic when I started the job, but it's a pretty straightforward API, and the main issue was the lack of transactional DDL in MySQL, which we constantly had to keep in mind. Besides that, it was just writing SQL statements wrapped in Python.
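For reference, a minimal Alembic migration is roughly this (hypothetical revision IDs, table, and column names):

    # versions/xxxx_add_display_name.py
    from alembic import op
    import sqlalchemy as sa

    revision = "a1b2c3d4e5f6"   # assigned by `alembic revision`
    down_revision = None

    def upgrade():
        op.add_column("users", sa.Column("display_name", sa.String(255)))

    def downgrade():
        op.drop_column("users", "display_name")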
What you wrote is data migration. Table/database migrations are just DDL scripts which are usually really small on an ongoing project. For example you add 2 columns in a sprint so you have that in the SQL migration file for that sprint.
A real project usually has a lot of things that change over time, including how the data is stored. Adding columns and tables is basically the only simple case, and even new tables aren't always easy if they are connected to other tables and have constraints that mean they must be pre-filled with something from the old database.
Other changes include splitting one table into multiple tables while keeping all the data, splitting or joining columns while keeping the data, and so on. For large datasets this can cause headaches, and any error causes a lot of extra work.
What about removing a column? What if you've decided that both the current and the next version of the code have to work? Then removing a column is a three-step process: copy to a new column and use the new one, skip a version, remove the old one (see the sketch below).
Migration is hard, but I fail to see how something like MongoDB makes this easier.
Either you kick the can to the code, or do a migration script. Is there some magic style of coding that allows me to use the old data structures with the new code without having to do any special checks?
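A hedged sketch of that three-step (expand/contract) removal, with hypothetical names and SQLite for illustration (DROP COLUMN needs SQLite >= 3.35):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")  # starting point

    # Step 1 (release N): add the new column, backfill it, dual-write from the app.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.execute("UPDATE users SET display_name = fullname")

    # Step 2 (release N+1): code reads and writes only display_name.

    # Step 3 (release N+2): once no deployed version touches it, drop the old column.
    conn.execute("ALTER TABLE users DROP COLUMN fullname")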
...if you're making a relatively small change to a non-mission-critical, non-distributed system that can be taken offline briefly while you migrate, then you can just shut everything down, migrate, roll new versions, and start up again.
If you need to keep everything running while you do a gradual rollout across thousands of DBs, this gets more complicated very quickly - and that's not even taking into account more complex migrations that involve more than just "add / remove this column".
You're definitely right, however, to point out that implicit schemas don't make this better: you have all the same problems, but with less documentation / validation helping you not screw things up. It's also a good point that many businesses / applications never reach this latter stage where migrations become much, much more complex; when you can get away with the former approach, you definitely should.
> What is so hard about writing a table migration? It takes a few minutes of your time.
I have worked on a number of large enterprise projects where there were hundreds and hundreds of migrations over the lifespan. I have never seen it take minutes of time. Or maybe you're just one of those fun developers who never properly tested their work.
But I have definitely seen production outages and subsequent failures to rollback because of edge cases between various individual migrations.
What is the problem with hundreds or even thousands of migrations? The point of writing migration files is that you can start from an empty database, execute every migration file, and end up with the current production database structure. So if you have 1,000 migration files, writing the 1,001st for the current sprint is easy: you have state X and just need to write a diff to the next state.
And I have written many migrations in systems with tens of different DBs/schemas in the same project, which could differ based on geographical region or some other requirement, and never had any big problem.
I understand what migrations are. I just disagree that they are trivial.
Especially since you need to write the rollback script as well. And then test it with data. And then ensure there are no weird interactions with other migrations. Because if there is a problem with any one of them then it can be a huge headache to unwind everything.
Most migrations do not involve fiddling with data (for example, adding a column or table). If they do, yes, you will spend more time on them. But it will still be an order of magnitude better than doing the same thing in MongoDB, where you just have an implicit schema.
I have worked on tens of projects in the past decade, and I have very rarely seen migrations be anything other than automatic, taking one command to be generated.
But often you are adding to reference data or running a SQL transform to produce some new denormalised table, or making the huge array of changes needed to support new use cases. These can't be automated and are hugely error-prone.
If your schema is properly set up after the initial changes to it, then it's not error-prone at all; it's doing what it's supposed to do.
Unless your migration involves "disable all checks, add new columns/tables, move and delete old data, enable all checks".
If you are migrating to a denormalised table, it should obviously crash when your conversion tool does something wrong: the RDBMS just prevented it from succeeding at doing something wrong.
If your migration is actually error prone as in "mangling and corrupting" data I suggest you are doing something wrong and you probably disabled the safety checks in your RDBMS.
> What is so hard about writing a table migration? It takes a few minutes of your time.
The company I work for has tens of TBs of data on most customer systems, and migration is either impossible, as they have no place to store it, or too costly in time to perform.
So while writing the code is not hard, the difficulty (or even possibility) of field upgrades can vastly influence whether or not a feature is considered.
"only a few minutes" - try explaining that to any DBA working in a regulated environment, like healthcare / cGMP: all of our ETL scripts require validation plans & independent review by QA -- nothing touching/persisting data is "a few minutes".
> The biggest thing MongoDB got right was putting developers first: optimizing for initial experience and development speed.
That's absolutely true, but it's also what gave it a bad rep. It put security in the back seat, and since it's "convenient" and just REST, it allows for moronic stuff like talking to your database straight from a client's web browser. Security as an afterthought encourages bad or inexperienced developers to do silly stuff, and since MongoDB focuses primarily on web applications, that is a really, really bad thing. How many leaks have there been simply because the database was exposed to the public internet without any authentication? Too many.
Now the problem is that inexperienced devs will flock to something that makes it that easy, and there's no real way of preventing that, but developers should never be put first - it leads to lazy solutions and compromises on too many other levels.
Yes, I like it. And you know what? My databases have lots of checks in the schema. This way I know that I get correct data when I query the database. In MongoDB I'm not sure: I can throw anything in there, and get anything back. It's just a normal GIGO queue.
And the migrations are fine. You just need to know the tools. I have observed a huge aversion among programmers to learning SQL (though they don't complain when they need to learn HQL).
Nothing comes for free. If your schema doesn't live in your DB, then it lives in your app code. At the end of the day, you have to handle data model changes.
No you don't. The DB should be the single source of truth, and there is no better tool for guarding the integrity of the data than the DB itself. If you have a highly concurrent web app and X connections are changing the same collection, good luck keeping the data sane without checks and transactions at the DB level. And let me not even mention having several different apps/services accessing the same DB.
Managing replication and consensus for me is valuable. Map/reduce infrastructure is valuable. Forcing me to fit my data into a square table model is not helpful. Transactions at the data layer are rarely helpful (transactional behaviour always needs cooperation from the application layer, and that's very difficult to achieve with a traditional RDBMS; almost all database-backed applications I've seen don't actually offer useful transactionality). My tool of choice is Cassandra or similar.
I wouldn’t say this is something it got right - actually something it got wrong!
I have never worked with a company that was sunk by the cost of migrating data between schemas. I have, however, worked with companies where the bugs that come with a schemaless database have almost sunk it. YMMV of course but I think it’s a false trade-off.
I would say that the developer experience was good in the sense that it was very easy to replicate Mongo databases - but that’s more of a system administration thing. I honestly don’t think that schemaless databases add any real value.
We're going through hell right now because the system I inherited uses Mongo for order/transaction/inventory data which benefits HUGELY from being stored in an ACID-compliant database, even when you ignore the fact that the data is highly relational and perfectly suited to a "boring" SQL DB.
Mongo has its place, but its place isn't in most line-of-business applications for SMEs.
"Pick the right tool for the job" has been the biggest casualty in the midst of the schemaless hype we're going through right now.
> "Pick the right tool for the job" has been the biggest casualty in the midst of the schemaless hype we're going through right now.
Even getting to the point of realizing that most domains where NoSQL or schemaless solutions thrive are also domains where you need to be thinking in suites of persistence technology (i.e., where trying to choose a single data persistence technology is crazy) would be a big step.
A beast like Reddit (for example), has clear areas where a document centered schemaless approach would yield big wins, but that doesn't mean it's ideal for handling their advertising billing or their master data.
"The job" is quite often many smaller jobs, each of which needs it's own tool. Sometimes it's a lot easier to put on slippers than to carpet the whole world.
Developer convenience is not a substitute for sitting down and learning SQL, but a good number of people I've met over the years plugging MongoDB et al. have at some point thrown their arms in the air at the prospect of doing that.
Here's the thing: in practice, relational databases aren't hard, nor will most devs ever need to know relational theory to be productive with databases built on it. Performance optimization does get a bit harder, but there are relatively few situations where there's any magic involved; knowing how to read an explain plan and when/where to add an index is what most of it boils down to.
As someone who actually bothered to sit down and spend the effort learning SQL, I agree. I've seen many colleagues in my career say "Oh SQL is hard/stupid/broken/whatever. MongoDB is so easy, it just works," and when pressed on whether they actually bothered to try and learn SQL, the answer is always no. It seems developers today are so lazy that we can't be bothered to learn a technology if it's different from the ones we already know.
> Did anybody ever like writing data migrations? Try explaining the value of a table migration to the Product Manager of an early startup :-).
Data integrity isn't good enough for a product manager? That's like saying "statically typed languages aren't worth the effort, do everything in JavaScript".
Furthermore, MongoDB or not, an application in development needs DB migrations. I don't believe for one second that you can change all your classes or structs and pretend MongoDB will magically update your previous documents and collections to the current state of your application.
But one huge advantage of MongoDB is that it is schemaless, so you have a lot more options for how you migrate documents. For example, you can move all of the data under a particular key, e.g. "v1", and allow two schemas to co-exist within the one document. Significantly less risky than running migration scripts or altering the schema in production.
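A toy illustration of that versioned-document idea (all field names here are hypothetical):

    # One document carrying both schema versions side by side:
    doc = {
        "_id": "contact-42",
        "v1": {"phone": "555-0100"},                                  # old shape
        "v2": {"phones": [{"number": "555-0100", "kind": "work"}]},   # new shape
    }

    # Readers prefer v2 and fall back to v1 until the backfill finishes:
    phones = (doc.get("v2") or {}).get("phones") \
        or [{"number": doc["v1"]["phone"], "kind": "unknown"}]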
But it's just as easy and just as safe to allow two schemas to co-exist within the same database. If you follow the necessary steps for a risk-free data migration, having both schemas coexist simultaneously and doing gradual switchovers, both models are going to be comparably difficult.
What schemaless lets you do is hide your risk better by disguising those steps. "db.table.update({$set: ...}, {multi: true})" looks friendly, but it's the same terrifying operation as running DROP TABLE A the instant you're done generating table B. And it looks easier, but that's because your code's beliefs about which schema it's reading from are hidden and implicit.
In a world where developer convenience wins over performance, operations and data safety a significant number of those same developers argue over whether they should be referred to as computer scientists, engineers, architects, or painters.
Agreed. If a civil engineer chose risky/bad materials for your house because it was convenient and you would get the house 30 days faster, I can't imagine anyone not being mad about that.
So true. Whenever stuff gets "convenient" for me as a developer, I ask myself what the downside of this "convenience" is that will have to be endured by others. Often it's inconveniences for the people who have to operate the thing I develop (one of the greatest things about the DevOps hype is that it made quite a few developers experience first-hand the operational pain their creations inflict, thereby making them less likely to take convenient-for-them shortcuts that make operations much less convenient). Sometimes it's inconveniences for other developers working on the same project. Other times it's inconveniences for the developers maintaining the product later (which might well include me, too).
Whatever the inconvenience is, I need to be able to vouch for it being an acceptable price to pay for my personal convenience. And that is much more likely to be the case if I carry the "inconvenience" part of the equation while others, like the operations people, the end user, or other developers, get the "convenience" part. That's because I consider myself a professional who's getting paid good money to make other people's lives easier, especially if those "other people" vastly outnumber myself or the few other devs on my project team. That is the essence of my job, and it's not meant to be "convenient" for me; developer convenience is therefore one of the least relevant criteria by which I ought to choose how I work.
Developers who highly value "convenient" solutions are bloody amateurs in my book, especially if they can't even envision the ways in which their convenience has to be paid for by others.
Mongo or not, you have to migrate your old data at some point, or as the project grows you'll end up adding layer upon layer of complexity in your code to handle all the accumulated inconsistencies.
> The biggest thing MongoDB got right was putting developers first: optimizing for initial experience and development speed
Specifically, I would say mobile and Node.js developers. The other part that worked well was sharding out of the box. Oh, there were major issues, like index updates skipping documents, but it appeared to work and was the easiest to set up.
> SQL is a great language for ad-hoc exploration of data or for building aggregates over data sets. It’s an absolutely miserable language for programmatically accessing data.
Disagreed. SQL is great for programmatically accessing data, especially with a good client library like SQLAlchemy. I have a much easier time understanding and, most importantly, debugging SQL than MongoDB's query language.
It did not solve the injection issue either: MongoDB injections are a thing[1], and they're much harder to reason about than SQL injections.
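For what it's worth, this is roughly what "programmatic access" looks like in SQLAlchemy (1.4+ Core style; the table and values here are made up):

    from sqlalchemy import (Column, Integer, MetaData, String, Table,
                            create_engine, select)

    metadata = MetaData()
    fruits = Table("fruits", metadata,
                   Column("id", Integer, primary_key=True),
                   Column("color", String),
                   Column("calories", Integer))

    engine = create_engine("sqlite://")
    metadata.create_all(engine)

    # No string concatenation, no placeholder counting:
    stmt = select(fruits).where(fruits.c.color == "red",
                                fruits.c.calories < 100)
    with engine.connect() as conn:
        rows = conn.execute(stmt).fetchall()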
I agree with your disagreement. In fact the writer seems to have it backwards.
The hard part of SQL is learning complex queries for "ad hoc exploration of data or for building aggregates over data sets": joins, unions, window functions, recursive common table expressions, etc. (Nevertheless it's still shorter than doing the same thing in some procedural language.)
The easy part is "programmatically accessing data." What is meant by "programmatically accessing data" anyway, besides "select * from . . ."? The writer's main criticism is:
> If you want to insert data,
> you’ll probably end up constructing the string (?,?,?,?,?,?,?,?)
> and counting arguments very carefully.
With PHP's built-in PDO library, you can use named placeholders instead of question marks:
    // assuming $pdo is an existing PDO connection; values here are hypothetical
    $stmt = $pdo->prepare('select * from fruits
                           where color = :color and calories < :calories');
    $stmt->execute(['color' => 'red', 'calories' => 150]);
I believe the author's point is that you have to use SQLAlchemy for that experience. Natively, the experience is much harder, and that's why ORMs (like SQLAlchemy) have been built. In MongoDB, the driver matches the command-line interface 1:1; there is no need for a higher-level wrapper to use your database.
That's actually a myth. Object mapping is one thing, and it is repetitive, but whenever there is a layer of indirection to "hide SQL from the developer", there will be developer pain and wasted developer time.
SQL is just a language and needs to be learned. It's not hard, and being declarative, it also teaches you to think differently than imperative programming does.
Want to query in Javascript? Sure go ahead, but don't say SQL is harder.
I think a good ORM is one that isn't just "hide SQL from the devs" but rather "hide trivial SQL from the devs and let them get their hands dirty if they want to".
Some ORMs I've worked with allowed me to completely bypass the ORM system while still getting most of the benefits. I recall an ORM I used a long while ago that didn't support reading views as objects, but that could be trivially added by writing a custom query method whose results were unpacked into objects like any other query, and that made views work.
Assembler is just a language and needs to be learned.
Oh wait. "It just needs to be learned" proves nothing; there are always cases for higher-level abstractions. Especially with SQL, which can get extremely complex when it needs to.
I've found there are few ORMs like SQLAlchemy. Most common ORMs hide the SQL, which is what you were talking about. With SQLAlchemy the SQL seeps out into the code, like SQL blending in with Python.
And then you have to maintain your own consistency layer in the application because you'll either duplicate data or use relations in Mongo like you would in a relational database.
MongoDB is excellent for design-first rapid agile development when domains are still maturing. This is common in large system redesigns and start-ups.
Comparing MongoDB to SQL or relational is pointless. They are tools. Given the right circumstances, both serve a purpose.
Relational databases are still necessary for analytics and reporting. (Try using any reporting tool on a schemaless database or a schema that’s a moving target)
I would argue that MongoDB is best used for R&D, but once your domains have matured, quickly moving to a relational data store will provide more sustainability.
I’ve also started incorporating graph databases into my architecture thinking. You should too.
SQL is the most sophisticated language ever built by mankind. It is centuries ahead of the second most productive language. Mix window functions, common table expressions, CUBE/ROLLUP, materialized views, ... and holy smoke...
You would have to try really hard to be unproductive with SQL and any relational system.
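Even SQLite gives you a taste of two of those (CTEs since 3.8.3, window functions since 3.25); a small sketch with made-up data:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales (region TEXT, amount INTEGER);
        INSERT INTO sales VALUES ('east', 10), ('east', 30), ('west', 20);
    """)

    # A CTE feeding a window function: rank regions by total sales.
    rows = conn.execute("""
        WITH totals AS (
            SELECT region, SUM(amount) AS total FROM sales GROUP BY region
        )
        SELECT region, total, RANK() OVER (ORDER BY total DESC) AS rnk
        FROM totals
    """).fetchall()
    print(rows)  # [('east', 40, 1), ('west', 20, 2)]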
> Querying SQL data involves constructing strings, and then – if you want to avoid SQL injection – successfully lining up placeholders between your pile of strings and your programming language. If you want to insert data, you’ll probably end up constructing the string (?,?,?,?,?,?,?,?) and counting arguments very carefully.
Uhm what?
At least in .NET+SQL Server land you can give your SQL queries named parameters. If there's a mismatch you'll get an exception saying so...
Perhaps this is a problem in some other stack (PHP?)
Instead, write a microservice on top of it that holds all the domain knowledge about how the data is structured in the storage layer? Stored procedures are not necessarily business logic; they can be an abstraction layer over the underlying structure, so yes, it can make good sense to place that in the DB.
EDIT: To those mentioning migrations, note that reviewing migrations is like reviewing patches. Sometimes you don't want to review a specific patch but rather look at the whole method and `blame` a specific line or three to get an idea of what the previous developer's intention was. You can't get that kind of overview from reviewing each patch or migration.
I have a system with a lot of logic in the database / the schema is maintained with one of our Rails apps, using the traditional migration tools. It’s pretty awesome.
What about `WHERE foo IN (?,?,?)`? I usually let the language (PHP, Python) handle counting the arguments, but there is no (clean) way to do that with named parameters. And there is no option of mixing named and anonymous parameters, so if there is an `IN` anywhere in the query, then the entire query, and any variation of it, must use anonymous parameters.
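For what it's worth, the usual Python workaround is a handful of lines (a sketch with a hypothetical table):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fruits (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO fruits VALUES (?, ?)",
                     [(1, "apple"), (2, "pear"), (3, "fig")])

    ids = [1, 3]
    placeholders = ", ".join(["?"] * len(ids))  # -> "?, ?"
    rows = conn.execute(
        f"SELECT name FROM fruits WHERE id IN ({placeholders})", ids
    ).fetchall()  # [('apple',), ('fig',)]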
This can be handled automatically if your ORM supports it; even a micro-ORM like Dapper does.
Otherwise, my go-to solution when it isn't supported is to pass the collection as a user-defined table type filled with the values; you can then either use WHERE IN or join on the table variable.
EDIT: my perspective on this is from working with SQL Server and .NET. I don't know enough about Python or PHP development to comment on those.
It's not moot: Dapper doesn't auto-generate the query for you, it's just a thin layer over .NET's SqlClient to reduce boilerplate of converting to and from C# objects.
You still have to write your SQL statements, and in your example you'd still be able to refer to your array by name:
select SomethingID from Something where AnotherThing in @yourNamedArrayParameter
I'm not convinced by the point about the structured operation format.
I agree that manually building SQL strings can be cumbersome. But in practice you can use libraries that allow you to easily create them programmatically.
The JSON queries of MongoDB on the other hand, while they are simple to use programmatically, they are a pain to use outside of your program, for example for running ad-hoc queries, or generating a report from your data. And there is no library to fix this.
The appeal and the pitfall of JSON is just that it's less broken down than tables.
Suppose you built a "contact database," but all it was was the scanned images of people's business cards. You could flip through pages of them, but you can't do searches. But the input is so easy, just scan and save. And the output is likewise easy, just embed the images in a web page.
1. Running optical character recognition would be the first step in breaking down these cards into parsable information.
2. Converting the text into JSON and storing it in MongoDB would be taking it to the next level.
3. Breaking down the JSON into a set of tables would turn it into the most flexible and useful form --- but it's also the most work.
You don't have to take all your data all the way to tables. For example, I leave my website log files as text. Grep and awk are usually all I need. I don't usually import them into tables --- although I have a few times for easier analysis!
Although you don't have to turn everything into tables, I would say there are several applications out there where the developer should have gone all the way and not stopped at JSON.
Slightly off-topic, but once ActiveRecord (Rails), always ActiveRecord. After years of developers complaining about SQL DBs being "difficult" and MongoDB allowing "easy prototyping", I still don't get what's hard about working with PostgreSQL if you use something like Rails.
Mongo is one of the few NoSQL databases to offer "unique" keys. While common in SQL, this is a big differentiator from other NoSQL stores and enables many use cases that would otherwise have ended up in the SQL world.
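With pymongo, that's a one-liner (hypothetical `users` collection and `email` field):

    from pymongo import ASCENDING, MongoClient
    from pymongo.errors import DuplicateKeyError

    users = MongoClient()["app"]["users"]
    users.create_index([("email", ASCENDING)], unique=True)

    users.insert_one({"email": "a@example.com"})
    try:
        users.insert_one({"email": "a@example.com"})  # violates the unique index
    except DuplicateKeyError:
        pass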
> The situation is sufficiently untenable that it’s rare to write apps without making use of an ORM or some similar library, creating additional cognitive overhead to using a SQL database, and creating painful impedance mismatches.
That's only true if you're using PHP, Python, JavaScript, or another dynamically typed language. Mongo documents don't automatically map to classes in C++ or Java; the impedance mismatch still exists and a data-mapper library is still required, just like with an RDBMS.
Talking about an easy experience for failover, and horizontally scaling, I recommend taking a look at MemSQL or NuoDB. I think MemSQL did a really nice job in creating a fast bare-bone SQL solution, with an easy to use management interface. https://docs.memsql.com/operational-manual/v6.0/managing-hig...
Our community began being filled with relatively inexperienced types who treated engineering problems and solutions like a popularity contest. It was suddenly 'smart' to mock it, so they did, barely understanding the basics. In essence a desperate attempt to fill the gaps in their CV.
Meh. If you find yourself mocking, you're probably doing something wrong somewhere
Sorry for the obvious plug, but if one wants solid replication, ACID compliance and a few other 'enterprisey' features, you should try MarkLogic: https://developer.marklogic.com/products
I feel like we got a lot more things right ;)
Other than price, it would be good to hear whether people agree and what's still missing.
Not sure why you were downvoted. It is at least a little exciting to see a successful IPO of a business built on an open-source database, even if it's not the best one. It proves the model is viable, and is a healthy thing for open source in enterprise.