Startup Engineers and Our Mistakes with MongoDB (nemil.com)
94 points by nemild on July 19, 2017 | 114 comments



It terrifies me to see this quote from their CTO:

"MongoDB's CTO disagrees with this statement arguing that nearly 90% of database installations today would benefit from being replaced with MongoDB."

I used to attend "office hours" at MongoDB's office, where guests could ask MongoDB employees for help. Most of my questions involved very complex aggregation queries (that would have been trivial in SQL) that even MongoDB employees could not solve.

While I waited to be helped I would listen to other people ask the same questions over and over and over: "How do I join these two collections? How can I <perform a transaction> and ensure both writes succeed/fail?"

Rather than say "MongoDB doesn't offer this functionality" and perhaps advise them that this database wasn't what they needed for their specific project, the engineers spent the majority of these office hours explaining that they didn't need schemas, transactions, or relations, and advising them on how to hack together something that worked.

I don't think MongoDB is appropriate for 90% of the database installations out there. I don't know the number, but it isn't 90%, it isn't a majority, and I doubt it's even a significant fraction.

MongoDB really needs to explain what it does exceedingly well versus the other distinct offerings in databases (RDBMS, Key Value stores, MPP/warehouses, HDFS, etc) and market that.

What they do right now is somewhere between disingenuous and downright negligent.

Of course it's the responsibility of a company and its staff to choose the right tool for the job, but these postmortems are becoming tiring.


> "MongoDB's CTO disagrees with this statement arguing that nearly 90% of database installations today would benefit (MongoDB shareholders) from being replaced with MongoDB."

If Mongo's database were half as good as their marketing team, they would really have something useful.


I do think that MongoDB is appropriate for 90% of the database installations, but so are MySQL and Postgres. Most of it is WordPress anyway.


I really don't understand your comments.

You're critical that MongoDB employees, whose job it is to help people use the product, showed users approaches to implement their use cases. How is that bad or somehow unique? I've been to Datastax, MySQL, Oracle, Teradata, Cloudera, and Hortonworks workshops before, and all of them did the same thing. It's their job.

Also, MongoDB has been crystal clear about what it offers in comparison to other databases. It is a document database with easy-to-use scalability options, in a market that doesn't have many of those. And the pros/cons of document databases, and how to appropriately model your data, are covered throughout their documentation and by third parties.

And yes, these postmortems are becoming very tiring. Engineers should not be pushing any technology without a spike to determine whether it's appropriate or not. If you don't have a document-oriented data model, then don't use a document store.


> approaches to implement their use cases.

This is where you and your parent disagree. Your parent believes they showed "hacks" and not "approaches to implement their use cases".


MongoDB is web-scale.


So is /dev/null.


Does /dev/null support sharding?


- "do you even know what a shard is?"

- "shards are the secret ingredient in the webscale sauce, they just work"

<3


> By far the most consistent mistake was choosing a non-relational database, when your data was strongly relational. Mongoose's ODM made this mistake surprisingly easy to make, which led to issues down the line.

This mirrors my experiences with Mongo as well. The vast majority of data is relational. Mongoose allows people to make the mistake of structuring their data relationally. But all you are doing is pushing all the joins to the web server (which in Node land is single threaded and compounds the mistake of choosing NoSQL).

There may be some microservices that Mongo is great fit for, but it should not be your core data store.


> But all you are doing is pushing all the joins to the web server (which in Node land is single threaded and compounds the mistake of choosing NoSQL).

Why is pushing the joins "to the web" inherently bad? It's a lot easier to scale web servers than database servers.


Simplified, a JOIN on an SQL server will take advantage of its indexes to grab the data from the disk that matches the specified criteria. SQL databases are built for this sort of thing, and they're really freaking good at it.

Sending that same JOIN to the web server means, at the least, shipping everything from both tables matching the criteria and matching up those two potentially huge blobs of data with each other on the web server. Doing this in an even partially efficient manner will split your query logic between DB queries and other code. Also, it's not unlikely that you'll be doing this JOIN operation in a language that maybe isn't great at optimizing that sort of CPU intense behavior on its own, so you're probably going to need to drill in and optimize on your own (vs letting SQL do its magic). Plus, we're going to need to store all this data in memory while we match it. So now our web servers, while more easily scaled than a DB, are actually pretty damn big boxes with decent memory and CPU, and that's expensive to scale.
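Concretely, doing the join in the web server looks something like this (a minimal sketch in plain JavaScript with made-up data; a SQL JOIN does the equivalent work inside the database, against on-disk indexes):

```javascript
// App-side "join": everything below is work the database would
// normally do for us with its indexes and query planner.

// Pretend these are the full result sets shipped over from the DB.
const users = [
  { id: 1, name: "ada" },
  { id: 2, name: "grace" },
];
const orders = [
  { userId: 1, item: "keyboard" },
  { userId: 2, item: "mouse" },
  { userId: 1, item: "monitor" },
];

// Build a hash index by hand, then match the two data sets
// in the web server's memory.
function joinUsersOrders(users, orders) {
  const byId = new Map(users.map((u) => [u.id, u]));
  return orders.map((o) => ({
    name: byId.get(o.userId).name,
    item: o.item,
  }));
}
```

The Map is a hand-built hash index; with large result sets, both inputs have to sit in the web server's memory while they are matched.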


If you are doing a lot of joins in Mongo - you're doing it wrong. The whole purpose of a document DB is storing your model with the related data as one document.


If we dump everything we might JOIN on to a single document then in many cases we're going to end up with a ridiculously nested document, and/or duplicate data everywhere. Let's try mapping out the project I'm working on right now.

We've got moves, and each move has multiple stops:

  move: {
    shipper: string,
    consignee: string,
    stops: [{
      address: string,
      appt: datetime,
    }] 
  }
So far this looks great. I like this. Yay MongoDB!

But I need to add a driver to each stop... Where do I put the driver? If I put him under the stop I'm going to duplicate his data all over the place. Do I put the move under the driver instead?

  driver: {
    name: string,
    id: string,
    moves: [{
      shipper: string,
      consignee: string,
      stops: [{
        address: string,
        appt: datetime,
      }]
    }]
  }
Ok, not bad. We've got an array of objects under an array of objects but I'm still perfectly comfortable.

Now I need to get a terminal-based look at all the drivers, so I know what every driver is doing in a particular city. Also, drivers can actually have multiple trucks:

  terminal: {
    city: string,
    drivers: [{
      name: string,
      id: string,
      truck: [{
        plate: string,
        moves: [{
          shipper: string,
          consignee: string,
          stops: [{
            address: string,
            appt: datetime,
          }]
        }]
      }]
    }]
  }
That's getting a touch nested but... Drivers actually work for vendors, and vendors might work across multiple terminals. Do we duplicate the vendor across terminals, or the terminal across vendors? Also even the driver/truck relationship is many-to-many... Do we duplicate the truck info? What happens when we need to update a truck? Do we attempt to update across all drivers? Do we duplicate that data too?

  terminal: {
    city: string,
    vendor: [{ // many-to-many! duplicate vendor data?
      id: string,
      drivers: [{
        name: string,
        id: string,
        truck: [{ // many-to-many! duplicate truck data?
          plate: string,
          moves: [{
            shipper: string,
            consignee: string,
            stops: [{
              address: string,
              appt: datetime,
            }]
          }]
        }]
      }]
    }]
  }
Ok, this schema is starting to look like shit and it's only going to get worse. The shipper and consignee should actually be unique objects as well, and we might want to update the contact info on them. Do we duplicate that data and update across all moves?

Also we want to cross-reference the shipper, consignee, billing, and ordering entities for market analysis (the consignee isn't always the customer, and the customer doesn't even always get the bill!) Do we duplicate that information across moves? Do we duplicate moves across entities? More many-to-many relationships...

Maybe I'm just not creative enough to use MongoDB, but I'm really starting to feel like this is a square peg in a round hole.


This isn't rocket science, and people far smarter than us have already figured this stuff out.

https://martinfowler.com/bliki/DDD_Aggregate.html

If you look at this from a Domain Driven Design standpoint, you should model your business and think about your aggregate roots: each collection should be its own aggregate root, and data access should be handled by one class/microservice per aggregate root.

If you modeled your Domain correctly that wouldn't be an issue.

On the other hand, not modeling your domain correctly first would lead to an ungodly, untestable tangle of stored procedures and 10-way joins with a relational database.

The Mongo docs go into best practices that basically mirror the concept of an aggregate root.

https://www.mongodb.com/blog/post/thinking-documents-part-2


The MongoDB document linked says "Referencing should be used to represent complex many-to-many relationships" ... "References are usually implemented by saving the _id field of one document in the related document as a reference. A second query is then executed by the application to return the referenced data."

So... It's like a join, but with two queries instead of one, and you push the join data to the webserver, which brings us back to the start of this comment thread...
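In application code, the documented reference pattern comes out roughly like this (a sketch; the Maps, names, and helper are stand-ins for real collections and queries):

```javascript
// Mongo-style referencing: the related document stores an _id,
// and the application runs a second query to resolve it.
// Plain Maps stand in for collections here.
const drivers = new Map([["d1", { _id: "d1", name: "Ada" }]]);
const moves = new Map([
  ["m1", { _id: "m1", shipper: "Acme", driverId: "d1" }],
]);

// Stand-in for db.collection.findOne({ _id: id })
function findById(collection, id) {
  return collection.get(id);
}

// Query 1: fetch the move. Query 2: resolve the reference.
// This is the "join" -- done by the app, one round trip per reference.
function getMoveWithDriver(moveId) {
  const move = findById(moves, moveId);
  const driver = findById(drivers, move.driverId);
  return { ...move, driver };
}
```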


From your scenarios and what you're asking to get from your data (yes, I understand your problem domain; I wrote mobile software for field service workers in a previous life), you're not doing a lot of complex two-way joins where you need to get all of the data from table A and match it to all of the data from table B. You're starting with a known entity, like a terminal, with references to other known entities. In your scenarios, all your DB is doing is searching tables based on indexes and creating a data set in memory. If you did the same searches one table at a time, starting from your known root (the terminal you are looking for) and querying individual tables based on foreign references, the only performance difference would be the slight overhead of multiple DB calls. You wouldn't even be getting unnecessary data back.

Even if I were using a relational database, I would still separate out my different domains (drivers, terminals, stops, etc.) into different classes/microservices and join the different entities in memory.

This would allow different developers, teams etc. to focus on separate business objects and would allow you to start off with a monolithic code base/app and separate the app out to microservice later without intertwined logic and to mix and match storage based on what makes sense.

I never said always embed everything into one document. I said think through your aggregate roots to decide how to design your apps and your document model.

I've designed this way for the last 10 years - 9 of them using RDBMSs. I don't do complex 10-way joins for apps. I separate out business domains logically, let each module be responsible for its own domain, and then have an overarching orchestrating class that combines the various objects when needed.


> I don't do complex 10 way joins for apps.

That hits home. There's a lot of this in the legacy code base, and yeah, it's painful.

Interesting approach. Thanks for the discussion. I'll have to ponder this a while.


Joins are typically a tiny intersection of 2 data sets. Multiply that by the number of web requests. Moreover, a query planner is far more optimized than the code you'd write on your web server.


Also, scaling databases is not terribly difficult; it is just very "clumsy".

Still, having a proper transaction manager and query planner will give you far better performance compared to doing it on webservers.

Also, aren't you basically moving the complexity of the database to the webserver? That makes your webserver far more complex and harder to scale in comparison. Especially if you need ACID (which is a must if you are doing anything with data storage, imo).


You don't always need ACID. Sometimes eventual consistency is good enough. How would it make the webserver itself more complex? The application itself should just be able to run on n servers.


If data needs to be processed, an option would be to use MongoDB for collecting the data in bulk and later decide how you need to structure it for your needs.

When it comes to reading the data, you read it only from the processed database, which can be anything.

Since you can do your processing completely independently of your web server, you are not necessarily pushing your computational load from the DB to the web server.


> If data needs to be processed, an option would be to use MongoDB for collecting the data in bulk and later decide how you need to structure it for your needs.

Or I can open a file stream, serialize my data to JSON, entity by entity, and dump it all to a file.

The good old file.


Because we want to do partial updates, searches, indexing etc.

Your position can be applied to all databases. Why not abandon them all and just use CSV?


You could always do that, dump the data to s3, and use Athena :-)

https://aws.amazon.com/athena/


And then be locked-in to Amazon. From ubiquitous files to ... a single service provider.


replied to the same question here: https://news.ycombinator.com/item?id=14805812


Then again, if you're going to push into Postgres you might as well store the temp unstructured records there too


Well, that's also an option :) But I believe this is a relatively new option.

Personally, I never got into the hype with the NoSQL and for me the only use is as a convenient API for storing JSON files during prototyping so that I can later decide how my data should be structured.

Definitely tech debt for the future, but a fast-iterated design that later needs its database layer fixed has a better chance of success than a rigid design with the perfect data architecture.


At that point it's harder for me to understand why I'm not just writing out JSON files or something.


You definitely can, but MongoDB provides convenience over storing and managing JSON files on your filesystem through your own efforts.


No, I really can't think of any situation where installing, managing, updating, maintaining, and crying over a MongoDB cluster would be easier for a flat-file datastore than, say, a filesystem or S3. In fact, if one is on AWS, then nothing beats just dumping them in S3 and processing them as necessary. If I'm a startup and I really wanted some querying, then PostgreSQL looks great, with Amazon handling the maintenance and backups.


You can even query flatfiles in s3 using SQL via AWS Athena.


Sure, but it is also another service to keep running. Depends on what scale you're operating at.


Any RDBMS is just a toolset to manage files on some filesystem. If you are dealing with some config files you can spare yourself the trouble and just load them from the filesystem.

NoSQL or classic RDBMS, these tools give you ways to manage your data on the filesystem (or in memory). You can spend time writing code that processes files in a folder, or you can use software that does it for you. You can write code that relates two files in a folder, or you can use software that already does that and just ask it for the outcome.


I don't really have to write that code though; what language doesn't already have file-manipulation libraries?


Well, I guess some people wonder why anyone uses ready-made software when you can achieve the exact same thing with a text editor and a compiler.


That's not the same thing at all; you're suggesting an approach that probably takes more code and definitely takes more setup.


You are underestimating the complexities of file management


Again, depends on scope and scale. If you have a handful of items to process it's not a big deal.


I feel the best thing to come out of MongoDB is that Postgres now handles JSON.


Yes, this was an incredible development, and it led to hilarious projects like ToroDB (the MongoDB API on top of Postgres), which ended up being faster and safer, with relations when you need them.

JSON columns can be incredibly useful and easy to understand for things like user preferences, without the hassle of a metatable.


Call this hilarious if you want, I think it's rather cool, thanks for the pointer. Once this is out of beta it will definitely be useful, if not already.


No doubt the best thing that has happened lately to Postgres as well!

And Postgres can do JSON with vastly more performance than Mongo:

https://www.enterprisedb.com/postgres-plus-edb-blog/marc-lin...


If you're going to get on the "MongoDB was overhyped" bandwagon, you should probably not post links to performance comparisons you pulled off of an EDB Postgres website.


At least EDB used an inspectable and reproducible test. I think posting links to performance comparisons is fine when the performance comparison can be independently verified.


Look at this article on why that benchmark was bullshit:

https://newbiedba.wordpress.com/2017/05/26/thoughts-on-postg...


I suggest then that you get the latest versions of MongoDB and PostgreSQL and run that test and report back. Verifiable isn't the same as verified.


How about built in replication that isn't insane to setup and use? Might be a close second.


The reality of MongoDB is that it's a very specific use case of nonrelational data, which is very rare these days. I'm sad that so many people get looped into Mongo with a MEAN/MERN stack when those apps are almost always the CRUD apps that would benefit from SQL. Why force yourself to maintain a schema implicitly for relational data when you can get error messaging and explicit schemas?

I think MongoDB is a great fit for the right problem. I've been coding for 4 years and have yet to find a problem that fit Mongo best.

As far as quick prototyping goes, this article articulates well why that really isn't the case. Going back to the authors post here on HN, that's a point I would really disagree with the CTO on as well.


"I've been coding for 4 years and have yet to find a problem that fit Mongo best."

I'm going to leave that right there.....


I can't tell what the implication is, but to clarify: I haven't personally encountered a problem that fit Mongo best. I can imagine one.


Today in 2017, there is no real reason to use MongoDB other than in prototyping. I am happily waiting until the final nail is put on the coffin of this overhyped, flawed document store.


The hype is the real issue. Even today, the CTO appears to claim that 90% of apps aren't relational enough to need an RDBMS.

I did a startup in 2011/2012 where we bought into the hype and used MongoDB & Node + Mongoose. It was horrific.

Your app is relational, full stop. You would know if it wasn't. Do you have users? Do those users need to log in? Well, you now have access tokens related to users. Do those users need to create anything at all? You've now related those owned objects back to the user. We're talking the very basics of any application, and Mongo's support for it is basically nil.

We went down the road of embedding relations inside the model (e.g. a "user" has multiple "tickets" to "events"), and then we'd just filter through the "events" table to find the related one. But what if the admin running the event wants a list of all the tickets for his event? We're talking webapp 101 stuff, but we had to write some really gnarly application logic and duplicate a bunch of data to make it fast.
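To make the pain concrete, here is a minimal sketch (invented field names) of what a "list all tickets for an event" lookup turns into once tickets are embedded under users:

```javascript
// Tickets embedded under users: great for "show me MY tickets",
// awkward for "show me every ticket for event X".
const users = [
  { name: "ada", tickets: [{ eventId: "e1", seat: "A1" }] },
  {
    name: "grace",
    tickets: [
      { eventId: "e2", seat: "B1" },
      { eventId: "e1", seat: "A2" },
    ],
  },
];

// Without a separate, indexed tickets collection, the app must walk
// every user and every embedded ticket for each lookup.
function ticketsForEvent(users, eventId) {
  return users.flatMap((u) =>
    u.tickets
      .filter((t) => t.eventId === eventId)
      .map((t) => ({ holder: u.name, seat: t.seat }))
  );
}
```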

It was a complete waste of time.

At the crux of this is a fallacy that easier is always better. It's not. There is a reason SQL databases are a little complicated to use - they have evolved over many decades to fit the needs of real applications. MongoDB is only barely removed from writing raw JSON arrays to disk, which is probably the simplest "database" imaginable.

When you so callously discard the common wisdom on how to store data, you rediscover why it exists, the slow, hard way.

10gen made a business - and a fortune - out of misleading new developers.


>Do you have users? Do those users need to log in? Well, you now have access tokens related to users.

Aren't those tokens supposed to be stored client side anyway? Aren't those tokens supposed to contain encoded information about the user that you decode server side? Why would you store the users token to begin with?


That's a matter of opinion and implementation. Not everyone uses JWTs. What if you want to invalidate all logged-in tokens for a user? You either need to store a blacklist (that's a relation) or set extremely short expiry times.
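A minimal sketch of the blacklist option (in-process only; a real deployment would keep the revocation set in shared storage such as Redis or a database table, which is exactly the relation in question):

```javascript
// Revocation list for signed tokens: the token itself stays valid
// until expiry, so logout/invalidation requires server-side state.
const revoked = new Set();

function revokeTokens(tokens) {
  for (const t of tokens) revoked.add(t);
}

// Run on every authenticated request, in addition to checking the signature.
function isTokenUsable(token) {
  return !revoked.has(token);
}

revokeTokens(["tok-abc", "tok-def"]);
```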


I think he is discussing standard cookie session IDs here, not access tokens. In many web applications, all the information you need is stored within that token and signed by a server-side secret.


Same thing. How do you invalidate a signed cookie?


"Today in 2017, there is no real reason to use Oracle other than in prototyping. I am happily waiting until the final nail is put on the coffin of this overhyped, flawed relational store."

Point being you can say just about the same thing about any other database. They are all flawed for one reason or another. I happen to like Mongo quite a lot, but I rarely use it. You need to understand what situations it works best in and it's unfair to both it and yourself to generalize and wish it dead.


Oracle RDBMS isn't flawed. It is expensive, complex to configure/maintain/extend, expensive in licensing, expensive in storage, and also expensive. Did I mention expensive?

PostgreSQL is eating Oracle's customers little by little.

RDBMS aren't flawed at all, this is technology that has been perfected for the last 40 years. What is flawed is to think that relational data can be easily stored on a document store (typical mistake).

I think document stores are a great thing; the only problem is that MongoDB isn't a good document store.


> I think document stores are a great thing; the only problem is that MongoDB isn't a good document store.

To add to that - you know when you need one. If you don't know you need a document store, you don't need it.

When you try a good document store, like Solr, you realize how appallingly limited Mongo really is. It's not suited for any practical use outside of demo apps.


Solr is not a document store. It's a search engine.

If you actually tried using it as one, you would realise how ignorant your comment is. Both the Solr and Elasticsearch teams have gone on record before stating that they should never be used as the source of truth.


THIS, so much this!!!!

I'm working on overhauling (rewriting, from scratch) a project right now where the previous developer decided in his infinite wisdom that using ES as a database was a good idea.

I have tried repeatedly to explain to him, and the product owner, why using a search engine as a storage database is a horrific idea. Owner gets it, dev doesn't. I asked that dev if he could overhaul the project, what would he do, and he said "not use SQL at all, and use ES 100%". <_<

I think this is the problem with Mongo too... you have people misusing it, and then want to throw the baby out with the bath water and act like Mongo is the problem, when really, the problem is how they are using it.


And yet - it can be used as one. "If you actually tried using it" is a wholly inappropriate thing to say to someone you don't know. We used it in a large application indexing & faceting millions of documents, and it works just fine for the practical purpose of querying & retrieving those documents. Of course, we did mate it with Postgres to build the full app. Each tool to its strength.

There's no need for the vitriol.


Spoken like someone who has never used Oracle with something other than Java. Also, expensive seems like a flaw to me.


"You need to understand what situations it works best in"

I always hear this, but with no "such as" examples given. Could you please provide some?


The same example I provide everywhere. I have successfully used Mongo as a geospatial data store, where I have a public transit app/website, and most objects/documents have geospatial properties.

Sure, there's PostGIS, but saving a GeoJSON multiline which I can slap an index on works for me.

I have grown fond of Mongo because I've been using it for almost 5 years now. I know its limitations, and also what I can do in RDBMS that I can also do in it.

I sometimes build OLAP logical models at work for clients, and most of what I touch at work is SQL, but sometimes when I start a project I use Mongo instead of SQL.

It's only one use-case, but I hope it is something.


Interesting. I haven't used JSON-stores because in my mind it always boils down to the question of how you would link entities together. I can't seem to get a good answer... so let me put this to you:

In a standard RDBMS I would probably have a custom datatype for LatLon (or MGRS):

  create custom type LatLon(...)
  create table Person(id int, name string, location LatLon)
  create table Shop(id int, name string, location LatLon)

  -- Find everyone within 10km of Macy's:
  select p.* from Person p cross join Shop s
  where s.name = 'Macy''s' and with_distance(p.location, s.location, 10km)
How would a JSON-store be better at managing this, given that with RDBMS systems you have indexes to help speed up such cross-table lookups?


Earlier in 2017, Atlassian acquired Trello (MongoDB, Node.js, and Redis tech stack) for $425 million.


... So? Trello is a great product, and it will continue to be a great product no matter where it chooses to store its data.


Funnily enough, the one feature I need to completely replace Jira with Trello is the ability to have the same card on multiple boards, but I guess they couldn't handle that kind of relational data.


Why? I don't personally care if MongoDB succeeds or fails, but if they can listen to their users, fix their issues and come out the other end with a database that works as advertised what is wrong with that? Why is it so important to you that they fail?


Why is it so important to you that they fail?

If it fails, then I can quit having the same discussion repeatedly as I try to talk my co-workers out of using it.


A really great writeup. A revealing exploration of the perils of MongoDB, and of the greater issue of selecting technology for a project.

In companies, there is rarely any mention of use case. Everyone is still trying to be Google or Facebook, and uses "their tools" without a reason why. It's just because that's what everyone else does.

All technology has a place somewhere for some use - but we need to know why we are choosing them.

For example, I choose Ruby on Rails because of its gains in productivity, reliability, security, and pleasure of use. I can quickly sling maintainable code that can scale up to millions of users. And most importantly, I don't have to make a lot of decisions: best practices are defaulted into the framework. Some people shy away from frameworks, favoring "simpler" tools - but that only means they have to re-solve all the problems the framework has already solved.

Likewise, with MongoDB, all those useful things SQL databases do for us need to be re-solved. But that's a database decision, and many people making the decision are just developers in search of a data store. Transactions and ACID are often not even on their radar.

Overall, I think this is a symptom of a lower barrier to entry. Where before you needed a CS degree to be a programmer, now you just need to go to a bootcamp. People are not properly trained in, aware of, or interested in the underpinnings like databases, networking, and security - to their own peril and the peril of the companies they work for.


(I’m the author of this series)

Eliot Horowitz (HN: @ehwizard), MongoDB’s current and founding CTO, reached out after my last post - and spent two hours providing feedback last week in Palo Alto. It was an expansive discussion, and Eliot was reflective and eager to understand the perspectives I had heard. He noted how much it mattered to him what HN thought.

I left with tremendous empathy for the challenges of building a venture-backed database company - while we also disagreed in a few key areas. As engineers, I hope we continue to have thoughtful, if spirited discussions, like Eliot and I had - where we both are open to being wrong and have a desire to understand differing viewpoints.

In the interest of length, I didn’t write the whole interview into my series, but wanted to share key parts of our discussion on HN:

- NoSQL: I felt that NoSQL was overhyped in the early 2010s - and that 10gen’s marketing claims were overblown. Eliot argued that many of NoSQL’s benefits have been realized and that 10gen’s early marketing accurately reflected the changes to come. For example, while he doesn’t love the term NoSQL for its impreciseness, he feels that both the JSON-like data structure and horizontal scaling are here to stay and were in fact the key changes NoSQL led to (not the SQL DSL). In that sense, he argues that Amazon RDS is a form of NoSQL today - and that NoSQL had a powerful impact on the roadmap of existing SQL databases.

- Data Loss: I generally have stayed away from talking about the controversial examples of MongoDB data loss in the earliest days (there are several HN posts that note this). I know personally of only one team that lost data, but it's always hard to tell whether this was a database issue or their own mistake. I did ask Eliot explicitly about this, and he said that these cases were exceedingly rare.

- Defaults: I feel like it’s playing with fire to set bad defaults in a database - with numerous data breaches due to 10gen’s early decisions on authentication, remote login, and encryption (see for example, https://snyk.io/blog/mongodb-hack-and-secure-defaults/ ). For auth, Eliot argues that developers need to take responsibility for exposing MongoDB on public servers - and that the SLA for a self-hosted instance is different than a managed instance (at minimum, I have issues with users having their data exposed to the world through no fault of their own). He disagreed with 10gen’s decision to turn on auth by default in later self-hosted versions once MongoDB ignored remote connections by default (but thought this was the right choice for the managed Atlas service). (But before 2014, the default behavior was no auth - and accepting all remote connections, see https://snyk.io/blog/mongodb-hack-and-secure-defaults/ ; Eliot notes that this took a while because changing the default would have caused issues for existing customers)

I do have concerns when 10gen explicitly targets junior developers (to be fair, he could never have predicted the growth of Node and the interest in the backend among frontend engineers). What he says would have made sense, say, 20 years ago, but with 25% of new software engineers coming from coding bootcamps with non-engineering backgrounds, I worry that defaults matter ever more in dev tools (and even seasoned engineers may mess this up if they're coming from a database with different defaults). We discussed analogies like seat belt lights versus the responsibility of passengers to know better. He also argued that waiting to get all this right - not just auth - would impact database innovation, while I think there's a balance that gets us a lot of the low-hanging fruit (like security).

- Mistakes with MongoDB: Eliot felt that certain posts (such as Sarah Mei’s popular HN post about Diaspora: https://news.ycombinator.com/item?id=6712703 ) misunderstood how to architect MongoDB. Regardless of the particulars, I think that new technology is especially susceptible to issues like this until the community develops broader knowledge. The failure states are well known in many relational databases and there is a broad base of knowledge about how best to architect relational data models. As startup engineers, we have to weigh tradeoffs of new vs old tools, with some new tech being game changing for startups (see PG’s much cited post about the benefits of new languages like Python: http://www.paulgraham.com/gh.html ) while many others are hyped new tech that damages startup productivity.

- SLAs are hard in databases: Early products make choices that customers get used to (unacknowledged saves), and you can’t quickly change the default behavior, especially in a database. 10gen’s early customers loved not waiting for the writes, and only later did 10gen realize that this was an issue for others (the default is very different in nearly every other database). To migrate their first customers without dislocation, they had to hold off on changing the default behavior for longer than they wanted to. In Eliot’s view, their competitors would unfairly argue that this default was a way to juice benchmarks, hoping to stoke anger at MongoDB and cut into their growth. (I tend to have sympathy for 10gen’s perspective, with the controversial, mistaken benchmark that I referred to in part 1 looking to me like an honest mistake.)

Generally, he felt that the issues with MongoDB were few and far between - and that the benefits of MongoDB were game-changing for so many startups. Many of them would not have survived without it.

I’ll add some more notes from the interview in part 3, where I go into 10gen’s early marketing. I also let Eliot know that I would be happy to share/publicize any broader responses/critiques he writes, so that we can have a thoughtful debate that benefits others (and he can point out issues in my arguments).

(Apologies in advance if I’ve made mistakes in representing Eliot’s views)


> To migrate their first customers without dislocation, they had to hold off on changing the default behavior for longer than they wanted to.

This does seem like a sticky situation, but a potential solution springs to mind. Maybe there's a reason this would have been infeasible, but why not introduce a "MongoDB Legacy" product line which would be a fork with unsafe defaults, secondary to the main product/branch with safer defaults? That way the old customers would have a clear upgrade path every release, just like the customers on the safe product, at the small expense of 10gen having to cut 2 releases each time and mind the diffs around options and defaults. Maybe this would have been more expedient than waiting until version 2.6 of MongoDB?


> In that sense, he argues that Amazon RDS is a form of NoSQL today

Lol. What?

RDS is literally hosted RDBMS SQL databases.


I had a similar reaction.


Having personally talked to Eliot many times on the phone going over our substantial issues with Mongo at scale back in 2012, including data loss, I find him saying it was "exceedingly rare" rather amusing.


This brings back some memories from 2012. I was updating an internal tool at the company where I worked, and after a lot of thought I decided to replace a tried and true MySQL database with Mongo. It really seemed like a good idea at the time - we were storing loosely related documents where not having a fixed schema was an advantage.

Any advantage I got was lost by the time I built all the logic to handle the joins I did need to make (those join keys weren't so schemaless after all). And then I needed logic to make sure every record had the right keys, that they made sense, etc. This was all stuff that MySQL had been doing for me.
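For illustration, the hand-rolled join logic described above tends to look something like this (a hypothetical Python sketch; the collections and key names are invented):

```python
# Hypothetical documents, as they might come back from two Mongo collections.
users = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Grace"},
]
orders = [
    {"_id": 10, "user_id": 1, "total": 25},
    {"_id": 11, "user_id": 1, "total": 40},
    {"_id": 12, "user_id": 2, "total": 15},
]

def join_orders_with_users(orders, users):
    """Re-implement by hand what `JOIN ... ON orders.user_id = users._id` does."""
    users_by_id = {u["_id"]: u for u in users}  # build the lookup side
    joined = []
    for order in orders:
        user = users_by_id.get(order["user_id"])  # the "schemaless" key had better exist
        if user is None:
            continue  # or raise -- no foreign key stops dangling references
        joined.append({**order, "user_name": user["name"]})
    return joined
```

Every helper like this re-implements one line of SQL, and nothing stops a stray document from carrying a misspelled `user_id` key.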

But the real kicker was that Mongo made my performance worse. When I migrated my database (roughly 10 million records), I compared before and after sizes of the actual files. Mongo's were 2-3x the size. I didn't realize it before starting, but Mongo's preallocation model gave me huge files with sparsely written records that had room for future updates. But I didn't need that extra room because my data rarely changed. So I ended up with larger files that took longer to scan for unindexed queries (mostly for reporting), which meant I had to index more stuff, which increased my memory usage, and forced me to upgrade to larger servers.

Had I stuck with MySQL, I would have been fine. Many expensive lessons were learned from this, which I guess was valuable.


I'm a long-time MongoDB user and largely a fan of it because I believe the interface is superior to some text-based SQL statements.

But whatever DB you like, if you move the state of your application to another application (i.e, a database) you better make sure you really understand how it works.

For SQL databases, many people think they know how they work, but misconceptions seem widespread. Essentially, many beginners believe that SQL databases guarantee serializability everywhere and all the time, relying on the database (and OR-mappers) too much, in my opinion.

Choosing a system that only offers very few guarantees forces you to think about them more explicitly. On the other hand, if you never bothered to understand SQL, you probably won't bother understanding any DB's restrictions and guarantees either, and then everything goes to sh*t.

Some databases might be easier to understand than others, but I feel MongoDB is on the 'easier' end here. YMMV.


Couple of points:

1. ORMs I'm not sure why you dismiss ORMs. Have you ever used a good ORM? There are a lot of high-quality ORMs out there, that do the heavy lifting, provide type safety, and take care of the boilerplate. There is no reason to write text-based SQL anymore.

2. Relations: Foreign keys are incredibly useful. They're the feature I most miss outside relational databases. They help keep your data in a sane state. If a user deletes his account, you can configure your database with ON DELETE CASCADE to automatically remove all data associated with the user.

3. Normalization: NoSQL databases encourage you to denormalize your data. I've seen NoSQL databases frequently run into problems with stale data - data that is duplicated and not kept updated, where you have multiple out-of-sync versions of the same piece of information. With a NoSQL database, you have to handle all the complications of denormalization. (You, the developer, have to remember to update/delete/etc. from the multiple places the same piece of data lives in.) The database doesn't do it for you. (The database is a dumb key-value store, nothing more.) Relational databases encourage keeping the logical design of your database normalized. To quote Wikipedia: "The preferred method is to keep the logical design normalised, but allow the database management system (DBMS) to store additional redundant information on disk to optimise query response. In this case it is the DBMS software's responsibility to ensure that any redundant copies are kept consistent. This method is often implemented in SQL as indexed views (Microsoft SQL Server) or materialised views (Oracle).".

4. Schemas: The worst thing about NoSQL is the absence of an enforced schema. Schemaless databases are a scourge. There's always a schema -- it's just that it's scattered all over the code. If you join a new company, you have to sift through piles of code to figure out what the structure of the data is. Schemas are like types, and my dislike for dynamically typed languages carries over to schema-less databases. Relational databases make you think carefully about the schema, and specify the schema explicitly.

In the end, you end up needing to do a lot of extra work, and likely end up with a more unstable and buggy system, just to avoid the small amount of totally-worth-it upfront work that setting up a relational database requires.
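To make point 2 concrete, here is what letting the database do the work looks like, sketched with Python's built-in sqlite3 (table names are invented; note that SQLite requires `PRAGMA foreign_keys = ON`, since FK enforcement is off by default there):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ships with FK enforcement off

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
        body    TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO posts VALUES (10, 1, 'hello'), (11, 1, 'world')")

# The database refuses data in an insane state: no post without a real user.
try:
    conn.execute("INSERT INTO posts VALUES (12, 99, 'dangling')")
    dangling_allowed = True
except sqlite3.IntegrityError:
    dangling_allowed = False

# Deleting the account removes all of the user's data automatically.
conn.execute("DELETE FROM users WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
```

The cascade and the integrity check are both declared once, in the schema, instead of being re-remembered at every call site in application code.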


As someone who thinks both systems have their place, there are a few issues in your arguments:

> 1. ... There is no reason to write text-based SQL anymore.

There sure is! Unless your schema is dead simple, you're going to run into places where hand tuning is a requirement. Lots of ORMs are third-rate at best; people swear by them but don't have complex data that would make one throw fits.

> 3. Normalization: NoSQL databases encourage you to denormalize your data. I've seen NoSQL databases frequently run into problems with stale data, with data that is duplicated and not kept updated, where you have multiple out-of-sync versions of the same piece of information.

NoSQL doesn't encourage you to de-normalize, and there are reasons to de-normalize in an RDBMS. Denormalization, when done right, should NOT result in duplication; if it does, you're doing it wrong. Both systems are afflicted by these poor choices. It is easier to screw it up in a document store like Mongo.

> 4. Schemas: The worst thing about NoSQL is the absence of an enforced schema... If you are joining a new company, you have to sift through piles of code to figure what the structure of the data is...

Again, we're not talking about a problem that is exclusive to document stores. You can just as easily have queries all over the codebase that make changing your RDBMS hard. People also do very stupid things like throwing giant JSON blobs (and before that, XML) into their RDBMS and expecting it to work. However, a schema makes dealing with data apart from your application a much simpler task - unless you throw all those rules out the window, and a shocking number of people do.


> Denormalization when done right should NOT result in duplication

Isn't that what it is by definition? Can you give an example of how would you denormalize without causing some sort of data duplication?


There are two main reasons to denormalize: security, and performance.

The former (security) is something I have only ever had to deal with twice. If you're going to cordon off data based on some arbitrary (3rd-party) ruleset, it is a path you can take. For transactional integrity, and from an operations perspective, both cases were perfectly valid.

The latter is one I have also only seen rarely.

I have seen it done to keep a history of actions without cluttering up the primary table. As an example, I worked on a DB with millions of entries for users and a separate table for emails; when the user updated their email, it would add a new row, verify it, and then fetch the newest complete row when the user was queried. Getting rid of all the history in that table provided a fairly significant performance bump.

The second was to split little used columns in a single table out from the frequently accessed ones. The resulting trimmed table then fit in memory and was orders of magnitude faster than going to disk.
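That second case (splitting cold columns out of a hot table) is vertical partitioning, and crucially it involves no duplication; a minimal sketch with Python's built-in sqlite3 and invented table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hot table: narrow rows, fits in memory, hit on every request.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
# Cold table: wide, rarely read columns, 1:1 with users via the shared key.
conn.execute("""
    CREATE TABLE user_profiles (
        user_id INTEGER PRIMARY KEY REFERENCES users(id),
        bio TEXT, preferences TEXT, signup_survey TEXT
    )
""")

conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO user_profiles VALUES (1, 'long bio...', '{}', '...')")

# The common path touches only the narrow table...
email = conn.execute("SELECT email FROM users WHERE id = 1").fetchone()[0]
# ...and the rare path joins the cold columns back in. No value lives in two places.
bio = conn.execute("""
    SELECT p.bio FROM users u
    JOIN user_profiles p ON p.user_id = u.id
    WHERE u.id = 1
""").fetchone()[0]
```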

Is there ever a case to denormalize and duplicate? There are a few:

OLAP systems are unique beasts, and duplicate and denormalize in many a strange way, all in the name of performance and ease of use for end users. The reality is that these systems usually have a SINGLE point of entry for adding and updating records (that aren't materialized views for end users). OLAP systems are weird, because nothing makes sense in them, but most of them (with all their duplicate data) aren't primary stores. Oddly, I have replaced a handful of OLAP RDBMSes with document stores...

The other major reason to duplicate is for "snapshotting". If you need to know the state of one record when another record was created, then you're probably going to end up with duplicate information. In every case where I have seen this, the words "compliance" and "audit trail" are part of the conversation. However, rather than being problematic, the decoupled nature of the duplicate record is desirable.


Most of your points are rife with folly.

1. ORMs are slow, but generally they are worth it, and it's trivial to dive into raw queries when needed for performance reasons.

2. Mongo supports relationships, not sure what you're trying to get at. Reverse delete rules are TRIVIAL to implement in your models. This is NOT the fault of the database engine, but fault of the developer, 100%.

3/4. Lack of normalization and schema is only a problem if you let it be. This is why model layers exist! If you are having to look "all over the code" to figure out the models, then the problem is you don't have any models.

This problem isn't even limited to Mongo either, this can happen on a SQL-backed project too if the developer is an amateur. Not having a model layer is the fault of the developer (or architect), not to be blamed on the DB engine of choice.
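The "model layer" described above can be as small as one class per collection, so the schema lives in exactly one place. A hypothetical Python sketch (the `User` fields and validation rules are invented for illustration; a real project would likely lean on its driver's typed mapping instead):

```python
from dataclasses import dataclass, asdict

@dataclass
class User:
    """The one place the shape of a 'users' document is defined."""
    name: str
    email: str
    age: int

    def to_document(self) -> dict:
        # Validate before anything reaches the (schemaless) store.
        if not isinstance(self.age, int) or self.age < 0:
            raise ValueError("age must be a non-negative integer")
        if "@" not in self.email:
            raise ValueError("email must contain '@'")
        return asdict(self)

# Every write path goes through the model, never through raw dicts.
doc = User(name="Ada", email="ada@example.com", age=36).to_document()
```

With this discipline, "what does a user look like?" has a one-file answer regardless of database engine.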


You should try to not start a comment with an attack. It just makes you look bad. See: http://www.paulgraham.com/disagree.html

Regarding your points:

> 1. ORMs are slow

Maybe you haven't used any recent ORM. Where did you get this idea from? In the JVM world, Hibernate is nearly dead for new projects. EBean[1] and JOOQ[2] are both great choices for server-side Java/Kotlin/Scala projects. On Android / for SQLite, there's DBFlow[3], which I've used and had great experiences with. These ORMs are fast, intuitive, and powerful.

Also relevant is: https://dzone.com/articles/martin-fowler-orm-hate

> 2. Mongo supports relationships

Mongo's approach to relations[4] is far weaker and less powerful than what an RDBMS can handle. It was tacked on later, almost as an afterthought. In addition, joins aren't as easy and straightforward with the kind of relations that NoSQL databases like MongoDB support.

> 3/4. Lack of normalization and schema is only a problem if you let it be. This is why model layers exist!

NoSQL databases actively encourage de-normalization. For example, just take a look at this Firebase blog post[A] or see this page in their documentation[B]. It's almost sad and a shame that a company like Google is encouraging such poor ideas and practices; practices that they likely don't follow themselves (but I could be wrong).

I know some young devs, just graduated from college, who swoon all over databases like Firebase's. They read articles like these that say "schema-free" is cool and denormalization is "normal". Ugh... I almost want to throw up.

Model layers are optional. I recently worked at, and quickly quit, a (Fortune 500) company that used MongoDB, had an incredibly complex data model, and had no model layer or schema enforcement of any kind. Their codebase was thoroughly bug-ridden, and their heavily denormalized data had out-of-sync and stale copies all over the place. It was disgusting.

[1] http://ebean-orm.github.io/

[2] https://www.jooq.org/

[3] https://agrosner.gitbooks.io/dbflow/content/

[4] https://stackoverflow.com/a/6994654

[A] https://firebase.googleblog.com/2013/04/denormalizing-your-d...

[B] https://firebase.google.com/docs/database/web/structure-data


None of your points are derogatory to MongoDB in particular, any more so than to any other database.

Your points are wholly reflective of bad implementors and their implementations, not of the underlying technologies.

If one misuses a tool, that's on them, not on the tool. Further, I think it's pretty fair to say: if it's as easy to mis-use something as it is to use it, then you have a very powerful tool that should be used carefully. If it's hard to mis-use something, then it's probably not a powerful tool.

Basically, to use an analogy, don't use a jack-hammer to re-grout delicate tiles. A chisel and hammer work better, and you definitely wouldn't want to use a hammer and chisel to break up a concrete slab. Similarly, you wouldn't use a handheld rotary tool to cut a concrete slab, but using it to remove grout is perfect. Misuse of a tool leading to damage doesn't mean the tool is flawed, bad, or broken, it means it is being used incorrectly. Can you use it that way? Sure! Should you? Probably not.

ORMs are indeed slower than straight SQL, period, always, 100% of the time. This is not opinion, but a hard fact of simple logic. Generally speaking, it is rather silly to expect a complex abstraction to be faster, leaner, and more efficient than a lower-level one. Adding more code and complexity will never make things execute faster. (Caching does not count, obviously; that's not the point here.)

That F500's total lack of effective systems architecture is the problem, not the database they used.

This is like building a treehouse with nails driven at 90 degree angle, and then blaming the hammer when it falls apart. Totally, man, you definitely shouldn't have used screws, or driven your nails at an angle converse to the lines of force. Keep blaming your tools...


Well

1) It's always a text-based interface. The ORM merely hides the fact. Type safety is weak, as it is with JSON, but neither is un-typed.

2) Yes, they are, and there are a lot of good use cases for relational databases. But if you have a number of services that share nothing (especially not the database), what is the foreign key good for?

3) You can do any style with any database. Overly aggressive normalization in read-heavy environments can be bad, as is de-normalization in write-heavy environments. Choose the right tool for the job.

4) Why all over the code? Can't you make it easy to find the model classes? Doesn't seem so hard. Also, is this so different from, say, EF code-first? And anyway, doesn't every SQL database in fact have multiple schemas (the one in the db, the one in the scripts, the one in the ORM, and the one in the code)? Is that really so different?


> "Type safety is weak, as is with JSON, but both aren't un-typed."

How is type safety "weak" with an ORM? You are using strongly typed objects.


Yes, but they don't match the data types your SQL database supports, do they? You're talking to a remote service here - i.e., varchar, decimal, datetime, etc. in SQL aren't the same as in whatever language you're using.

That's true for JSON as well, but you probably need JSON on the other side (the API) as well...


Yes. When you use Entity Framework's generator to map from a database table to a class, it not only maps the types from the database to your class, it also maps constraints like character length, nullable vs. non-nullable fields, etc.

The same for Mongo if you are using the C# driver. If you get a Mongo collection using the standard C# syntax --

var collection = database.GetCollection<Users>("Users");

You will work with a strongly typed "schema". As long as you are working with the "collection" object, you will be working with strongly typed objects and doing type safe Linq queries.


> There are a lot of high-quality ORMs out there, that do the heavy lifting, provide type safety, and take care of the boilerplate.

I am a happy user and can recommend the following ORMs:

for Java, Hibernate (a classic)

for C# in .NET, how about... NHibernate?

for Python... SQLAlchemy.

Really good software. WARNING: they require you to read the manual.


> Couple of points: 1. ORMs I'm not sure why you dismiss ORMs. Have you ever used a good ORM? There are a lot of high-quality ORMs out there, that do the heavy lifting, provide type safety, and take care of the boilerplate. There is no reason to write text-based SQL anymore.

I use ORMs to do just what the name implies -- to map relational models to objects. If I'm working with objects anyway, either I'm going to end up using an ORM or creating my own object mappings.

> 2. Relations: Foreign keys are incredibly useful. They're the feature I most miss outside relational databases. They help keep your data in a sane state. If a user deletes his account, you can configure your database with ON DELETE CASCADE to automatically remove all data associated with the user.

Or I can just as easily use my User management micro-service to enforce business rules.....

> 3. Normalization: NoSQL databases encourage you to denormalize your data. I've seen NoSQL databases frequently run into problems with stale data - data that is duplicated and not kept updated, where you have multiple out-of-sync versions of the same piece of information. With a NoSQL database, you have to handle all the complications of denormalization. The database doesn't do it for you. Relational databases encourage keeping the logical design of your database normalized, with the DBMS responsible for keeping any redundant copies consistent.

Why would you have your business logic strewn all over your code base instead of using a common library/microservice that all of your code depends on? I wouldn't design a system even with Sql where all the code modifies the database willy-nilly.

> 4. Schemas: The worst thing about NoSQL is the absence of an enforced schema. Schemaless databases are a scourge. There's always a schema -- it's just that it's scattered all over the code. If you join a new company, you have to sift through piles of code to figure out what the structure of the data is. Schemas are like types, and my dislike for dynamically typed languages carries over to schema-less databases. Relational databases make you think carefully about the schema, and specify the schema explicitly.

Why is your schema "scattered all over the code"? I use Mongo with C#. When I'm reading from or writing to a collection, I'm not reading/writing BsonDocuments.

A typical code snippet in C# using the MongoDB driver is:

var collection = database.GetCollection<User>("Users").AsQueryable();

All of my Linq queries, inserts, updates, etc. are strongly typed objects with autocomplete and type safety.

The "User" object is defined in one central place.

> In the end, you end up needing to do a lot of extra work, and likely end up with a more unstable and buggy system, just to avoid the small amount of totally-worth-it upfront work that setting up a relational database requires.

"It's a poor carpenter that blames his tools"


> "It's a poor carpenter that blames his tools"

I think the point of the article is that Mongo is being used mostly by "poor carpenters", and provides worse defaults than SQL does. It's obvious you're operating at an advanced level that young and hungry startups aren't.


...which brings us back to the original point: if you use tools you don't understand, chances are things go wrong.


Key-value stores provide very little. The whole deal with SQL/relational databases is that they provide much, much more. SQL databases like CockroachDB are built on top of RocksDB, a key-value store: https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mappin...

It's the difference between assembly language and a high-level language. Key-value stores are simple, basic, and primitive. What's wrong with that? It's like a farmer saying, "I don't want to learn how to use a tractor (it's too complicated)," and choosing to farm with hand tools instead.

> if you use tools you don't understand

Yea.... key-value stores are incredibly complex and require years of learning to understand.

I could explain a key-value store to a five-year old. In fact, I think even a pigeon can understand the fundamental principle underlying a key-value store. https://en.wikipedia.org/wiki/Pigeon_intelligence

For any non-trivial data model, go with something that can capture your model at a high level, like a relational database or a graph database.
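To make the "simple, basic, and primitive" point concrete, here is a toy key-value store in Python (entirely hypothetical; real stores add persistence and concurrency, but the contract really is this small). Note how anything richer than get-by-key becomes a hand-written scan:

```python
class ToyKVStore:
    """The entire key-value contract: put, get, delete."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def scan(self):
        # The only way to "query": walk everything and filter in the application.
        return self._data.items()

kv = ToyKVStore()
kv.put("user:1", {"name": "Ada", "city": "Paris"})
kv.put("user:2", {"name": "Grace", "city": "NYC"})
# A one-line WHERE clause in SQL becomes a hand-rolled full scan here:
parisians = [v for _, v in kv.scan() if v["city"] == "Paris"]
```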


> Some databases might be easier to understand than others, but I feel MongoDB is on the 'easier' end here. YMMV.

I agree with most of this; relational databases have a lot of moving parts that many developers (myself included) don't fully understand.

However, I believe relational databases (Postgres is what I have experience with) have defaults that are basically correct, and unlikely to cause significant issues, whereas in my experience MongoDB did not have this. MongoDB might be easier to understand on the surface, but I think that's deceptive.


> I believe relational databases (Postgres is what I have experience with) have defaults that are basically correct, and unlikely to cause significant issues, whereas in my experience MongoDB did not have this.

Strongly, strongly agreed. Learning every moving part in InnoDB (as an example) is quite a task. But not knowing about those features will rarely burn you, and it's usually apparent when you need to know something you don't.

Meanwhile Mongo... well, Mongo says a replicated write is complete as soon as it's queued to be written. They eventually updated the default settings to be slightly less catastrophic, but still far from good. What you don't know about Mongo can and will hurt you, which arguably makes the appearance of ease a drawback.

http://hackingdistributed.com/2013/01/29/mongo-ft/
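For context, the application-side mitigation is to request acknowledgement explicitly rather than rely on the default write concern. A mongo shell sketch (the collection and field names are invented; this requires a live replica set, so it's a session fragment rather than standalone runnable code):

```javascript
// Wait until a majority of replica set members have the write, with
// journaling, before insert() returns -- instead of fire-and-forget.
db.orders.insert(
  { item: "widget", qty: 1 },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
);
```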


Man, people just looooove to link 5 year old blogs about fault tolerance, meanwhile that isn't even relevant anymore, and honestly, it wasn't relevant at the time either, because changing the write concern is trivial.

The real nugget of truth here is this; don't use tools that you don't understand... PERIOD.

Further, there are a plethora of alternative engines for Mongo that do have fault tolerance in mind.


+1 - Mongo is much easier to misuse. Not to mention that many users of SQL these days use ORMs that abstract the complex SQL features behind a professional library that actually knows how to handle the small details and only exposes a basic subset of functionality that is good enough for most use cases.


> However, I believe relational databases (Postgres is what I have experience with) have defaults that are basically correct, and unlikely to cause significant issues, whereas in my experience MongoDB did not have this. MongoDB might be easier to understand on the surface, but I think that's deceptive.

Postgres has good and sane defaults. Also, the documentation INSIDE the config file is top notch. The one thing which makes Postgres the gold standard in my opinion is its extensive documentation. Documentation is key to maintaining an RDBMS.

MySQL does some weird things on an initial install, and its documentation is sadly all over the place.


"some text-based SQL statements"

What do you have in mind there? Systems that process SQL statements interpret these statements, validate them for correctness against a schema and then provide an auditable plan. Just because SQL bears some resemblance to spoken/written English (a property that vanishes pretty quick when you get past simple use cases) doesn't mean it isn't rigorously analyzed by database systems; SELECT isn't some alternative form of 'grep.'

Perhaps you're thinking of SQL injection attacks that plague the LAMP stack? If so, then you should know that NoSQL isn't immune to injection attacks. Here[1] is an OWASP page on testing for NoSQL injection vulnerabilities; the same sloppy coding patterns that allow attackers to synthesize SQL statements are also manifest in NoSQL applications.

[1] https://www.owasp.org/index.php/Testing_for_NoSQL_injection
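The "sloppy coding patterns" in question are building query strings (or query documents) out of raw user input. On the SQL side, the fix is parameterization; a sketch with Python's built-in sqlite3 (the table and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

malicious = "x' OR '1'='1"

# Vulnerable: attacker-controlled text is spliced into the statement itself,
# so the OR clause becomes part of the query and matches every row.
vulnerable = conn.execute(
    "SELECT COUNT(*) FROM users WHERE name = '%s'" % malicious
).fetchone()[0]

# Safe: the value is passed out-of-band as a parameter, so it can only
# ever be compared as data -- and matches nothing.
safe = conn.execute(
    "SELECT COUNT(*) FROM users WHERE name = ?", (malicious,)
).fetchone()[0]
```

The NoSQL analogue is the same discipline: never assemble query documents from unvalidated strings; pass user input only as values.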


Being condescending isn't helping your argument here. No, I wasn't thinking of SQL injection, and I do have a vague understanding of the SQL grammar. The NoSQL injections seem a bit contrived and pretty much can't happen with a strongly typed language in the application layer.


> Choosing a system that only offers very few guarantees forces you to think about them more explicitly.

Yes, of course. And it also forces you to implement, on your own: relational (key) constraints, real transactions, etc.

And thus you, on your own, compete against more or less 44 years of research, development, and releases by brilliant computer scientists and engineers (Ingres, 1973, through the latest PostgreSQL, 2017) - scientists who already solved those problems (relational constraints, transactions, etc.) and made said solutions perform to the max possible.

So a better alternative is to simply... learn more about relational databases.


I'm sorry, you're so right. I'm just one of the idiots on the Internet who haven't used a system for 10+ years (or the bulk of their existence at least) and makes condescending comments...


nothing much new here.

mongodb marketing/PR was a problem: advertising a general-purpose RDBMS alternative without the technology to support it.

users were also a problem, using a technology without understanding the architecture and trade-offs.

to be overly critical of "mongodb" though is unproductive. there were a lot of good people working on new software to solve hard problems, and I'd like that to continue.

let's just not repeat the same mistakes: don't get caught up in marketing hype, and carefully evaluate technology decisions.


'One 10gen engineer made this point in analogizing SQL to Cobol, arguing that "SQL is Annoying":'

Why wasn't this the tagline for Mongo to begin with? At least it shows the arrogance, ignorance, and outright stupidity behind the database. This is not an engineer's comment. It cannot possibly be now, decades after SQL has become a de facto standard for some very good reasons (outlined in the article and elsewhere, won't rehash here). It is a marketer's comment, probably spoken by an engineer who isn't qualified to build simple demo apps, let alone anything as complex as a database. Mongo's CTO seems to fit this bill of marketer more than anything else or how would he be able to make the claims he makes with a straight face?

Our industry has a problem with fads and decisions based on feelings. Even if SQL is annoying, it's incredibly stupid to base your tech choices on feelings. It's even dumber to choose a product whose engineers build the product based on feelings. Last I checked, I thought we were an industry of engineers, trying to apply scientific principles and some human ingenuity to build software. Where does 'annoying' fit into that? Or other adjectives I often see thrown around on these boards like 'clean' and 'slick'. I know it's hard to quantify the quality of software and speak about it with any coherence, but this is beyond incoherent. 'Annoying' as applied to SQL tells me nothing. 'Annoying' as applied to the engineer making this incredibly stupid comment tells me that this engineer is either lazy, gullible, or just downright stupid. Have we seriously lost our ability to discern hype from reality that we let people and companies like this dictate our technology choices, throwing out decades of solid research in computer science for what some idiot at a company more specialized in marketing and PR than engineering tells us?

I'm willing and ready to hear well-thought-out criticism of SQL and RDBMS. What I'm not willing and ready to hear is some idiot's feelings about SQL or RDBMS. You want to make a claim? Measure it. Present a report with data. But goddamn, if one of my reports came to me with this kind of stupidity, I'd give them a chance, but if they persisted, I'd fire them. This isn't revolutionary thought. It's not evolutionary thought. It's lazy idiots who don't want to learn SQL, a language so easy that even business people with essentially no computer skills can pick it up.


What about for IoT devices? I could see NoSQL still having a place there, although I could also see it working just as well with a SQL database.


What is special about IoT devices that make them more suited towards a NoSQL solution? Scale? Is that really an issue for the majority of IoT deployments?


Scale, and also the data structure is often very flat; usually just a bunch of records from sensor readings.


Most IoT devices don't really need to be scalable themselves; it's the backend/processing side of IoT that needs to scale. And it mostly already does, on both the network side (IPv6) and the datastore side (TSDBs?).
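For what it's worth, a typical sensor reading really is flat (all names here are made up), and maps onto a relational row just as naturally as onto a document:

```python
# Hypothetical IoT reading: a flat key/value record with no nesting.
from datetime import datetime, timezone

reading = {
    "device_id": "sensor-42",  # illustrative identifiers, not from the article
    "ts": datetime(2017, 7, 19, tzinfo=timezone.utc),
    "metric": "temperature_c",
    "value": 21.5,
}

# The same record expressed as a parameterized SQL insert, to show there
# is nothing document-shaped about it:
sql = (
    "INSERT INTO readings (device_id, ts, metric, value) "
    "VALUES (%s, %s, %s, %s)"
)
params = (reading["device_id"], reading["ts"], reading["metric"], reading["value"])
```

Whether rows like this land in a TSDB, Postgres, or a document store is mostly a question of write volume and retention, not of data shape.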


> Schemas: Schemaless does not mean no schema; instead, it means an implicit schema in the app (a particularly challenging misnomer for anyone outside our industry)

Still a challenge for many in the industry.
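A minimal sketch of what that implicit schema looks like in practice (field names are invented; plain dicts stand in for stored documents): every reader ends up encoding every shape the application has ever written.

```python
# With a schemaless store, the "schema" is whatever each version of the
# app happened to write. These dicts stand in for stored documents.
old_doc = {"name": "Ada", "phone": "555-0100"}     # written by app v1
new_doc = {"name": "Ada", "phones": ["555-0100"]}  # written by app v2

def get_phones(doc):
    """Reader code must handle every historical document shape."""
    if "phones" in doc:       # v2 shape
        return doc["phones"]
    if "phone" in doc:        # v1 shape
        return [doc["phone"]]
    return []

assert get_phones(old_doc) == ["555-0100"]
assert get_phones(new_doc) == ["555-0100"]
```

In an RDBMS a migration would rewrite the old rows once; here the branching lives in application code for as long as v1 documents exist.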


These "Why technology X is worthless" types of posts and statements always give me a bit of a laugh. As with most technologies, there is a) a learning curve, and b) best practices to follow.

Many of the comments here suggest, in my opinion, that the implementation was incorrect or that decisions were made on old data or older versions of MongoDB. Making arguments against any product based on previous versions seems counterproductive. Most of the posts here don't reference a specific MongoDB version, but many reference their experience with the product in 2012.

Assuming these experiences were at the end of 2012 with the then-current version, we are still talking about version 2.2.2. The current stable version is 3.4.6. As you can imagine, there have been many advances to the product in five years, and basing one's knowledge and opinion of MongoDB on old versions isn't logical. Even back in 2012, though, there was a lot of misinformation about MongoDB. A blog post from that period (https://blog.serverdensity.com/does-everyone-hate-mongodb/) pushes back on some of it.

Many other comments seem to stem from a lack of time spent learning MongoDB. It is a non-relational document store; it is NOT a relational database. It requires a different way of thinking about storage design, and attempting to force MongoDB to behave like a SQL database is silly. SQL is indeed a popular option and can work well.
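As a sketch of that different way of thinking (collection and field names here are illustrative, and plain dicts stand in for documents): a document design embeds related records in one document rather than normalizing them into separate tables and joining.

```python
# Document-oriented design: an order embeds its line items, so a single
# fetch returns everything needed to render or total the order, with no join.
order = {
    "_id": "order-1001",
    "customer": "Ada",
    "items": [  # embedded line items; in SQL these would be a child table
        {"sku": "A-1", "qty": 2, "price": 9.99},
        {"sku": "B-7", "qty": 1, "price": 4.50},
    ],
}

# Aggregating over embedded items is straightforward...
total = sum(item["qty"] * item["price"] for item in order["items"])
```

The trade-off cuts the other way too: queries that span documents ("all orders containing SKU A-1 this month") are exactly the cross-collection operations the office-hours anecdote above describes.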

However, non-relational databases work well too, and there are lots of implementations across a lot of companies. A few relatively recent posts are good examples: https://engineering.snagajob.com/mongodb-in-aws-does-it-real... and https://mongomikeblog.wordpress.com/2016/04/29/why-we-went-w... And that doesn't account for the many large companies using it, like Expedia, Facebook, and Forbes, to name a few.

If we want to have a discussion about specific, current features of MongoDB, that's great. There are a lot of them. Many have been mentioned; some have been cited as negatives due to what appears to be poor implementation. If we want to debate the pros and cons of any product, we should be sure we are talking about how things are intended to be implemented, not some hacked-together approach.

Perhaps the marketing spin was too aggressive. I'm not a marketing person; I'll leave it to Mr. Horowitz to back up his claims. But if we are going to have a discussion about the pros and cons of MongoDB, can we at least agree not to talk about old versions? I mean, I had a bad experience with Windows 3.1, so should I not use Windows anymore? ;-)



