The Marketing Behind MongoDB (nemil.com)
322 points by nemild on Aug 29, 2017 | 221 comments



100% of my friends who have used Mongo/similar NoSQL have given up and had a nasty rewrite back to pgSQL.

This seems to be the journey:

1. Lack of migrations is awesome! We can iterate so quickly for MVP

2. Get users

3. Add features, still enjoying the speed of iteration

4. Get more users

5. Start building reporting features for enterprise/customer support/product metrics (i.e. when the real potential for success starts)

6. Realise you desperately need joins, transactions and other SQL features

7. Pause product dev for 1-3+ months to migrate back to SQL, or do some weird parallel development process to move it piecemeal back.

I think the most interesting question, though, is: would they have been able to get the MVP and initial customers that set this off if they had been moving (slightly) slower due to SQL and the slight overhead that comes with it?

My thought is definitely yes.


> I think the most interesting question, though, is: would they have been able to get the MVP and initial customers that set this off if they had been moving (slightly) slower due to SQL and the slight overhead that comes with it?

I've used Postgres and Mongo pretty extensively, and for any reasonably seasoned developer, the startup overhead of an SQL system is a myth. There may be an upfront cost to learning how an RDBMS and SQL work in the first place, but once you're familiar with them, they'll be faster than Mongo on any new project.

The schemaless concept of a document database seems to be the major selling factor in velocity of movement, but once you've got a good handle on a migration framework in the vein of ActiveRecord or other popular software, that's negated completely. It also really doesn't take long before schemaless starts to cause big problems for you in terms of data consistency -- it's not just the big players that get bitten by this.

The simplified query language is another one. SQL is a little bit obtuse, but it's not that bad once you have a handle on it, and a lot of people are familiar with it. Once you add in an ORM layer, the lazy-style access of a framework like Sequel or SQLAlchemy makes the developer experience quite a bit better than any Mongo APIs that I've seen. Also, after you get beyond trivial usage, SQL's flexibility so wildly outstrips Mongo's query documents that it's not even worth talking about.
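
For illustration, here's a minimal SQLAlchemy sketch of that lazy-style access (the models and the in-memory SQLite database are just assumptions to keep it self-contained):

    from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
    from sqlalchemy.orm import Session, declarative_base, relationship

    Base = declarative_base()

    class Author(Base):
        __tablename__ = "authors"
        id = Column(Integer, primary_key=True)
        name = Column(String, nullable=False)
        books = relationship("Book", back_populates="author")

    class Book(Base):
        __tablename__ = "books"
        id = Column(Integer, primary_key=True)
        title = Column(String, nullable=False)
        author_id = Column(Integer, ForeignKey("authors.id"))
        author = relationship("Author", back_populates="books")

    engine = create_engine("sqlite://")  # swap in a Postgres URL for a real project
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add(Author(name="Ursula K. Le Guin", books=[Book(title="The Dispossessed")]))
        session.commit()
        author = session.query(Author).filter_by(name="Ursula K. Le Guin").one()
        # Lazy access: the related rows are only fetched when .books is touched
        print([book.title for book in author.books])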

Postgres on the other hand ships with a great management CLI, a very powerful REPL (psql), and features like data types/constraints/transactions that guarantee you correctness with zero effort on your part. I can only speak for myself, but I'd take Postgres to the hackathon any day of the week.


I totally agree with you, and started writing something about how understanding a good ORM takes nearly all the headache away.

I think the thing people do find slow is a lot of 'documents within documents' in SQL. It turns out this is usually a bad development pattern long term but it is super fast being able to just add docs inside docs with no configuration. It feels very slow writing foreign keys, navigation props and schemas for this in SQL vs JSON, where you can just dump your object in and you're done.

Basically, I think with NoSQL you get some very short-term gain for a lot of long-term pain, and you're right, ORMs and other tooling mostly solve this.

I myself fell for this trap, and while it was a nightmare, it actually matured me as a professional more than anything I've done recently. Regardless of crazy hype, I don't think I'll ever fall for a solution so easily again without evaluating it properly.

I think I assumed the "crowd" had done the tech due diligence on this stuff and it definitely wasn't the case.


I agree that Postgres trumps Mongo for most use cases, but if ORM is the answer, you might be asking the wrong question.

I love that Fowler discusses[1] how to avoid ORMs and offers only two possibilities:

> Either you use the relational model in memory, or you don't use it in the database.

He clearly has a blind spot. The correct answer, at least in some cases, is don't use objects in the first place.

[1]: https://www.martinfowler.com/bliki/OrmHate.html


I've been quite skeptical of the anti-ORM sentiments, since I always found the ORM handy, not a problem etc.

Two things recently started to change my mind:

One is, after profiling some views in our Django app I found that a significant portion of the time in list views (i.e. returning many objects) was spent instantiating ORM model objects. Not on fetching them from the db, but turning them into objects whose imminent destiny is to get turned back into dicts for serializing to JSON.

So we switched to values() queries which return plain dicts, but then the ORM is just a query engine.
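
For illustration, a sketch of that switch, assuming an already-configured Django project with a hypothetical Article model (so a fragment, not a standalone script):

    # Before: each row is instantiated as a full Article model object,
    # only to be turned back into a dict for the JSON response
    articles = Article.objects.filter(published=True)
    payload = [{"id": a.id, "title": a.title} for a in articles]

    # After: values() returns plain dicts straight from the query,
    # skipping model instantiation entirely
    payload = list(Article.objects.filter(published=True).values("id", "title"))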

The second was, in a large app, finding that many of the tables start to require a degree of care to only perform 'safe' queries, i.e. ones supported by appropriate indexes. The integration of the ORM into the framework provides many conveniences, but this tends to involve a lack of awareness/control over exactly which queries get made. Most of the patterns that are 'idiomatic' to the framework also tend to involve a lot of ORM querying logic pushed down in the view layer... while our need to perform only a restricted set of 'safe' queries pushed us in the other direction, towards a repository layer with carefully curated query code.

I'm seriously considering exploring ORM-less development in future


There's no single technology that's perfect for all situations. An ORM is really useful for a lot of very common tasks. Most apps include a lot of reading/writing of single rows and basic list queries for which ORMs are a superior way of working.

But for anything involving complex querying, I typically just go straight to SQL. Most ORMs provide plenty of ways to handle different levels of query need, from full ORM queries, to queries into entity objects, to queries into flat dictionaries.

I don't think ORM-less development is the answer to the wisdom that an ORM isn't a complete and sufficient interface to the database. That's just throwing the baby out with the bathwater.


"The integration of the ORM into the framework provides many conveniences, but this tends to involve a lack of awareness/control over exactly which queries get made."

I think that the lack of visibility is the problem, more than ORMs as such - one of the things that Rails got right is the logging.

In development, Rails will show you every SQL query as it happens, so you can see the blizzard of SQL queries that your request-response cycles are generating, and can spot crazy stuff once you have conditioned yourself to actually read the logs. Conversely, I've found that not being able to easily see what the ORM for your platform is doing can be really disempowering.

I suspect that nicer graphical tools would make this much more accessible: a lot of developers seem to treat logs as noise, and I've seen some junior devs appear almost afraid of them.


It's easy to get logs, but it'd be better not to deploy code with unsafe queries in the first place and then have to scan through the logs to see why your database is on its knees.


I hadn't thought of this before: ORMs probably use 2-3x as much memory as regular structs/dicts. I've started using only the query builder in all my side projects and I've had a good experience so far.


yes, not just the memory... for models with lots of fields, in Python+Django, the time overhead of instantiating these complex heavyweight objects can really add up if you have a whole 'page' of them

I always figured the main overhead was doing I/O with the db, but in these cases it's not always so

interesting blog post about this from the creator of SQLAlchemy: http://techspot.zzzeek.org/2015/02/15/asynchronous-python-an...


The correct answer, at least in some cases, is don't use objects in the first place.

That at least fits my interpretation of what Fowler meant by "use the relational model in memory," though--if you're using a relational database, don't try to directly map objects to relational tables, but take advantage of what SQL offers instead. That's consistent with not using objects in the first place (although I suppose it doesn't require it).


I may indeed have interpreted that uncharitably.


Why do people always trot out the SQL relational databases?

If I had to do it all over again, I'd use a graph database. They work better and are faster for most web apps and social stuff. Including reports and joins.

And in my experience, I have avoided joins and used the database mostly as a key value store. Which makes it perfect for when we need to scale. We just make a CockroachDB adapter and that's it!


The issue is that if you can avoid joins long term, then you either don't have enough data, don't have enough complexity, or (very rarely) genuinely have a data set that doesn't need joins.

If it is option 1 or 2, you just didn't need joins YET. When the time comes that 1 and 2 no longer apply it will be VERY hard to add them.

Sure, you can use both types of databases where each makes the most sense, but at that point you might as well just use the "NoSql" functionality in the relational db. PSQL does a JSON doc store VERY well and even allows deep joins when the time comes.
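
To make that concrete, a rough psycopg2 sketch of the JSONB-as-document-store idea (the connection string, tables and fields are all made up for illustration, and a reachable Postgres instance is assumed):

    import json
    import psycopg2

    conn = psycopg2.connect("dbname=app_example")  # hypothetical database
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS users (
                id    serial PRIMARY KEY,
                email text NOT NULL
            );
            CREATE TABLE IF NOT EXISTS events (
                id      serial PRIMARY KEY,
                user_id integer REFERENCES users (id),
                payload jsonb  -- free-form document, Mongo-style
            );
        """)
        cur.execute("INSERT INTO users (email) VALUES (%s) RETURNING id", ("ada@example.com",))
        user_id = cur.fetchone()[0]
        cur.execute(
            "INSERT INTO events (user_id, payload) VALUES (%s, %s)",
            (user_id, json.dumps({"type": "signup", "plan": "trial", "utm": {"source": "hn"}})),
        )
        # Query inside the document, and join back to relational data when the time comes
        cur.execute("""
            SELECT u.email, e.payload ->> 'plan'
            FROM events e
            JOIN users u ON u.id = e.user_id
            WHERE e.payload @> '{"type": "signup"}'
        """)
        print(cur.fetchall())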

I personally never saw much benefit to graph databases. Sure, they are great for a few data structures, but most of the speed difference comes from being in memory more than anything else. Pretty much everything is fast with a good data structure design and everything in memory.


I have noob questions about what you've written here. I apologize for asking them here, but you seem like you think clearly and know what you are talking about.

If you are in case 3, then is that when NoSQL allegedly starts paying off? If so, is the payoff in actual performance or just not having to write SQL?

I know you said that you could just use the NoSQL features of the relational DB. If you really truly know for certain that you will never need to join your data, then isn't using an entire relational DB software package a lot more overhead?


In case 3 you can get some real speedups, but you usually have to design the db layout for the use case. Step one is "How am I going to look up this data?", then design the data structures for that use case. Often (but not always) in these cases the NoSQL db is just a lookup db, with the system of record being somewhere else and the data transformed into multiple data sets for lookup.

IMHO, most of the NoSQL hype was developers not wanting to learn sql/ddl along with a lot of resume padding. It is also important to note that NoSQL no longer means "No SQL" but really is now "Not Relational and/or ACID" (https://en.wikipedia.org/wiki/ACID). The only thing worse than SQL is everything else created to replace it. Many "NoSQL" products use a limited SQL like syntax as their only query format.

As for relational DB overhead, there is no reason for it to be more than NoSQL (see SQLite) unless you are willing to give up part of ACID. Special cases like this are another use case for NoSQL, but you probably want ACID unless you have data/throughput that makes it nonviable. Otherwise you just ignore the issues (normally what happens) or code for them in your app.


Actually major graph databases store data on disk as well. That's not the reason they are faster. The reason is that all the pointers point directly to the data.

Consider how you'd get the "top 10 books, and their related authors, and their related biographies".

1. A graph database would literally just look at an index of books, and then grab the book records. A relational database would do the same with an index, so far so good.

2. Now comes the difference. For each book, the graph database would just load the list of pointers to related authors and load those. And for each author, it would load the list of pointers to related biographies, which have pointers to related pictures etc. And in O(jk) where j is the number of books to return, and k the maximum number of things to get per book, it's done.

3. Now consider the same step for a relational database. After getting the books, it has to load the authors, and search it for each author. This takes O(k log N) where N is the total number of entries, and grows (albeit slower and slower) with increasing amounts of data. Then once every author record is loaded, it has to enumerate all their ids into a giant list and do it again. The list can be stored incrementally and the searches can be parallelized but at the end of the day ALL JOINS have an extra O(log N) factor, which is what slows down the database, usually by a factor of 10 for data that's in the millions or billions of rows.

4. The design of a graph database naturally encourages using indexes. In relational databases you have to remember to add them. And even after you do, the relational database takes log N longer to do all the joins. And joins are done very often in social networks, fetching related stuff, and other things in a normalized schema.

You can do all the relational algebra stuff while walking a graph, too.

In fact any relational database can be turned into a graph database by just adding a variable-length list of "pointers to related data" to each record which would point to the actual location of the rows in the related table's index. And then manage all relations between rows not as joins but as entries in these lists. And finally, implement support for a graph database language alongside SQL. However, I have not seen any such extensions to InnoDB or Postgres etc. which turn them into graph databases.


I understand that they do persist to disk and store data differently, but the ones I've seen used in practice are in memory during runtime. It has been a few years though, so maybe this has changed. Any examples?

Graph databases also are only fast when doing graph traversal lookups. They all have indexes just like an RDB. Sure, you go from parent->child->grandchild quickly, but how do you find the parent in the first place? An index, normally.


But the parent is one lookup and the children are 1000. That's why.


This is only if the 1000 children are already in memory. Otherwise it is still 1000 lookups, by index. This is why the Graph Dbs on the market tend to be in memory.

I'm not saying Graph DBs don't have their place. They do, but they are the wrong answer for MOST datasets.


NO. Each lookup thereafter is NOT by index. That's where the savings come from. The pointers already contain the exact block where to load the info from.


What's a good resource for getting introduced to "the Postgres of graph databases", including its value proposition over regular relational databases?



From what I've observed, the "crowd" is easily seduced by performance above all other concerns - correctness, security, science, etc. NoSQL was invented and gained popularity b/c it originally was easier to scale via sharding, and that was seductive enough [1] to give it industry momentum. While performance is a feature, it seemed many folks originally advocating NoSQL did not understand or appreciate the mathematical foundations of the relational model. A decade of experience seems to be driving zeitgeist back to scientific fundamentals.

[1]:https://www.youtube.com/watch?v=b2F-DItXtZs


> I've used Postgres and Mongo pretty extensively, and for any reasonably seasoned developer, the startup overhead of an SQL system is a myth. There may be an upfront cost to learning how an RDBMS and SQL work in the first place, but once you're familiar with them, they'll be faster than Mongo on any new project.

> The schemaless concept of a document database seems to be the major selling factor in velocity of movement, but once you've got a good handle on a migration framework in the vein of ActiveRecord or other popular software, that's negated completely. It also really doesn't take long before schemaless starts to cause big problems for you in terms of data consistency -- it's not just the big players that get bitten by this.

I don't disagree with the overall sentiment, but this post is overstating things in a few places...

Schemas add friction in development. You can certainly minimize this friction with experience and tooling, but for a completely new project, no amount of experience or ORM magic is going to "completely negate" that additional friction vs a schemaless system where you can change the shapes of your data and the relationships between them without any restrictions whatsoever.

For rapid prototyping of new projects, it could be entirely reasonable to trade off long term data-consistency benefits of having a well-defined schema in return for faster iteration times. But taking that schemaless prototype to production directly is decidedly unwise, so the final step in a prototyping process like this should involve a refactoring of your data model to introduce a schema/migrations system before that point.

It would be fine to argue that that final refactoring step to make the schemaless prototype production-ready could completely negate the faster initial iteration times afforded by starting out with a schemaless DB (I too would generally agree with this assertion, but we must also remember not all prototypes will go to production, as prototypes are meant to prove out an idea, and the idea itself won't always work out). But the claim that on a brand new project (where the data model is ill-defined and prone to sweeping changes) one can iterate _faster_ while maintaining a schema and migrations system than on a schemaless system (given similar levels of proficiency with both) strikes me as hyperbolic.


> For rapid prototyping of new projects, it could be entirely reasonable to trade off long term data-consistency benefits of having a well-defined schema in return for faster iteration times. But taking that schemaless prototype to production directly is decidedly unwise, so the final step in a prototyping process like this should involve a refactoring of your data model to introduce a schema/migrations system before that point.

I don't get it. If you're not in production anyway, you're not working with real data. And if you're not working with real data, you don't have to migrate anything. The entire problem just straight-up disappears. Just blow away your database and recreate it with the new schema. Modify your test data insertion scripts to match and load them.


Or alter your schema in the client. Quite easy to right-click and 'Add New Column' or ADD COLUMN or whatnot. I'm trying this with a new project and it is resulting in a data model that feels cleaner and hand-built.

One can simply capture the schema when they've found a good structure, and then use that as the starting point for a migration system (like ActiveRecord).
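
A tiny sketch of that loop, using sqlite3 only so it runs as-is (the same ALTER TABLE works from any Postgres/MySQL client):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

    # Mid-prototype, a new need appears: just add the column and keep going
    conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
    conn.execute("INSERT INTO customers (name, email) VALUES (?, ?)", ("Ada", "ada@example.com"))

    # Once the structure settles, dump it as the starting point for your migrations
    print("\n".join(conn.iterdump()))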


The need to maintain an explicit schema and update it every time your data model changes is the point of friction here, not the migrations system. To be perfectly clear, I'm not suggesting maintaining a schema is onerous enough to outweigh the numerous other benefits having an explicit schema provides, even at the prototype stage, but when you're prototyping and don't need to worry about data migrations, having to maintain a schema is certainly not going to be _faster_ than not having to maintain a schema, as the original post claimed.


You always have to maintain an explicit schema -- it's just defined in your code instead of the database. And depending on your tooling, it's not easier/faster to maintain that schema in code vs. the database.

What is worse for schemaless databases is schema versioning. Your code will forever have to support every schema that has ever existed in production data. I have code to support schemaless data that was entered well over a decade ago. If I had simply used a relational database I would just have a single schema to deal with.

During development, schema changes are quite frictionless. It also provides extra safety and documentation, so any friction is completely worth it.


> You always have to maintain an explicit schema -- it's just defined in your code instead of the database.

This is normally known as an implicit schema. It's implicit precisely because you don't explicitly maintain a schema separately from your application code.

> During development, schema changes are quite frictionless. It also provides extra safety and documentation, so any friction is completely worth it.

The second sentence contradicts the first. And I'm not disagreeing with the second. The friction may be worth it, but calling it _frictionless_ was my issue with this post and the original (which went even further by asserting the friction somehow accelerated development).


> It's implicit precisely because you don't explicitly maintain a schema separately from your application code.

But that's not completely true. There is plenty of code necessary for managing the schema and just the schema in a document database. Default values, relationships, validation, and managing previous version of the schema. I have plenty of code that just exists to handle documents in the "old" format.

> The second sentence contradicts the first.

Maybe I should have said almost frictionless instead of quite. And honestly, some things are less work with a schema (like creating a new defaulted boolean column) in an RDBMS than in a schemaless design. So the friction is relative.

> The friction may be worth it, but calling it _frictionless_ was my issue with this post and the original

I have a similar issue with calling a document-store _schemaless_. It's not schemaless, it has a schema. In fact, it has as many schemas as there are changes to the structure of the data. And this, in my opinion, is the biggest negative to that kind of design.


I don't think we're really disagreeing on anything, just talking over each other a bit. Let me try to clarify myself:

> But that's not completely true. There is plenty of code necessary for managing the schema and just the schema in a document database. Default values, relationships, and managing previous version of the schema. I have plenty of code that just exists to handle documents in the "old" format.

This is only the case because you chose to _add_ an explicit schema on top of whatever schemaless DB you were using for maintainability reasons, which is certainly necessary if you're building a production app. However, in this case I was talking specifically about using schemaless DBs in the context of prototyping, and adding an explicit schema (that specifies things like default values, relationships, and migrations) is certainly not mandatory in the prototyping phase when you're not dealing with any real/past data or migrations.

MongoDB and other schemaless databases by default will happily accept whatever document you want to store in it without any care in the world, and this can be a desirable property for iterating as quickly as possible, but in production you definitely want to specify an explicit schema of some kind on top of Mongo as you have done, or just move off of Mongo altogether onto a proper relational database with a mandatory explicit schema.
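
For instance, a quick pymongo sketch of that default behaviour (a local mongod is assumed, and the database/collection names are made up):

    from pymongo import MongoClient

    prototypes = MongoClient().scratch_example.prototypes

    # Nothing stops two completely different shapes from landing in the same collection
    prototypes.insert_one({"name": "Ada", "email": "ada@example.com"})
    prototypes.insert_one({"username": "ada", "contact": {"emails": ["ada@example.com"]}, "tags": []})

    print(prototypes.count_documents({}))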

As to which is the better long-term choice for a production app, I totally agree with everything you said. With schemaless databases, even if you add an explicit schema on top of it, there's always still the possibility to underspecify in your schema and re-introduce implicit data dependencies into your data model and application logic, which can lead to nightmarish bugs in production (I've experienced many instances of this first hand). It's much better to use a database designed around a mandatory explicit schema for relational data for the stronger data consistency guarantees they provide once you start handling real, persistent data.


When you're prototyping, you can be equally as sloppy with a relational database and make the same productivity gains. However, the sort of thing you describe doesn't even sound like a real prototype but rather something fairly trivial:

> by default will happily accept whatever document you want to store in it without any care in the world, and this can be a desirable property for iterating as quickly as possible

I fail to see how a database full of mismatched documents is valuable in prototyping. My own experience with prototyping in an RDBMS is just a constant evolution of the existing sample (sometimes real) data. Adding new columns is trivial, breaking up a single column to a one-to-many is a simple insert..select into a new table as one example. Similar transformation with a document-store involves writing a lot of throw away code.

But from another perspective entirely, I find actually designing the schema to be the best place to start when I prototype an application. If I have the database design correct then designing the corresponding UI or API is almost trivial. Now obviously that's just one style of development but it's a no less valid one. And I don't have to do anything more to move to production.


> I fail to see how a database full of mismatched documents is valuable in prototyping.

Again, as you yourself even brought up, just because you don't have an explicit schema doesn't mean your data needs to be schemaless. There can still be an implicit schema to your data that depends on the shape of the documents you store in your database. And in a prototype without real data (I'm defining real data here as data that can't be trivially discarded without consequence), the documents that actually end up in your database will have a uniform shape simply because your code only operates on the current version of that implicit schema, and older versions can simply be deleted.

> But from another perspective entirely, I find actually designing the schema to be the best place to start when I prototype an application. If I have the database design correct then designing the corresponding UI or API is almost trivial. Now obviously that's just one style of development but it's a no less valid one. And I don't have to do anything more to move to production.

That approach is certainly valid. I just wanted to make it clear that using a schemaless database doesn't mean you can't do any schema design up front, it just affords you the ability to skip the overhead of updating and adhering to an explicit schema at every step in the evolution of your prototype. In other words, you let your code and your product needs drive the changes and growth in your schema, and only look towards solidifying the changes in your schema into an explicit document to adhere strictly to once you have more concrete insights into how your data model needs to look, driven by the data needs of a working prototype.

I feel I've stated my position as clearly as I could, so I'll leave it at that. Feel free to reply if you still take issue with anything I said, but I probably won't be responding.


I think you have stated it well. And I agree that there is overhead in creating a schema. I think, however, where we might disagree is on whether or not that overhead is anything more than trivial.

I think ultimately this is similar to the debate on static vs. dynamic typing.


If you're using an ORM, you can generate the schema automatically. With Hibernate, you set the hibernate.hbm2ddl.auto property [1]. If you set it to 'create' or 'update', it will alter the database schema so that it matches your objects, which is handy for prototyping, and terrible for production. If you set it to 'validate', it will check that the database schema matches your objects, and refuse to start if it doesn't, useless for prototyping but ideal for production.

[1] https://stackoverflow.com/questions/438146/hibernate-hbm2ddl...


That's a neat feature, but as you may already know, ORMs come with their own baggage in the form of object-relational impedance mismatch. There's an entire rather-substantial Wikipedia article on it, so I won't go into it any further myself: https://en.wikipedia.org/wiki/Object-relational_impedance_mi...

The sibling thread also brings up similar issues with ORMs: https://news.ycombinator.com/item?id=15127058


ORMs are a solution to the problem of the impedance mismatch between an object model and its corresponding relational model. It's not that ORMs come with the problem.

I think it is increasingly rare today to see real object domain models anyway - the original raison d'etre of ORMs. Instead it's common to see domain concepts modeled in the data layer and modified by logic (often limited to CRUD) in an application/service/controller layer. The problem with this of course is that as complexity increases, that app/service/controller layer increasingly approaches a big ball of mud.

Edit: grammar


Eh. I've used ORMs and SQL extensively for many years. I think the concerns over mismatch are routinely overstated.


It's such a small amount of friction compared to the advantages of a good RDBMS.


Would you recommend some resource for learning to setup postgres? As a new grad I often times find myself frustrated with setting up postgres or other rdbms so I usually opt for easier systems like amazon RDS, firebase or mongo.

Before docker became popular, I basically had to do something like [this](https://github.com/docker-library/postgres/blob/master/9.6/D...), which I don't think I have mastered till this day.


> As a new grad I often times find myself frustrated with setting up postgres or other rdbms so I usually opt for easier systems like amazon RDS, firebase or mongo.

Yeah, I know exactly what you mean. Everything in Postgres is supremely well documented, but there's just so much documentation, that getting the basics down is pretty hard. I definitely went through a period of building muscle that was a tad frustrating to say the least.

Unfortunately, I don't know a particularly succinct resource that I recommend right now, but I will suggest taking a look at official docs. Maybe the section on managing databases in particular [1]. In general, you should just get good at using createdb, dropdb, and psql (which is mostly just generic SQL).

If you're on a Mac, installation and setup might look as easy as:

    brew install postgresql
    brew services start postgresql
    createdb my-test-db
    psql my-test-db
    dropdb my-test-db
(There's obviously far more there than just that, but at a basic level, most of the commands are reasonably easy to use.)

[1] https://www.postgresql.org/docs/9.6/static/managing-database...


Now setup 3-5 replica nodes with at least automatic master election and failover.


and you need that because ...?


You shouldn't go down when one box or VPS falls over?


For 99% of the situations you just need 2 boxes.


And if you do need more, there's commercial solutions:

https://aws.amazon.com/rds/postgresql/

https://www.citusdata.com/


RDS is very far from optimal for high load PG


And, I've worked in a few that needed more... million+ simultaneous users, and over 380k requests per second, and 5 nines. There really are times you need more than an SQL server can keep up with.

In another case, it was the shape of the data: everything keyed off a single record, but ancillary tables with additional queries or joins meant over 30 joins, or 15 additional queries, to render one resource. It all fit into a single "nosql" record, and Mongo supported the additional indexes needed, so it was a good fit for the use case.

I'm not advocating nosql for everything... I've seen a few that went mongo that would have been better served with sql. But that isn't to say that a document or column-store database is never a better fit from the start.


You shouldn't, and that is trivial to set up.


Docker works fine, but I prefer to run a vagrant machine with ansible to set up postgres + everything else for development. In production I just use RDS.

This is the ansible role I use: https://github.com/ANXS/postgresql


do you guys have replication set up to your dev such that dev will have somewhat current info? If so how do you guys handle that replication?


Since we take nightly backups of the RDS instances and store them in S3, I just wrote some ansible commands to download/import into the local db.


> the startup overhead of an SQL system is a myth.

It's also a myth that devs choose NoSQL over SQL because "SQL is too hard."


It's also a myth that devs choose NoSQL over SQL because "SQL is too hard."

What I see more of is situations where the developers "know" SQL but still just don't want to sit down and figure out their data models.

It's like, the temptation to reach for the nearest short-cut -- whether that means over-reliance on ORMs (or hodgepodge 'data access layers'); or continually munging stuff at the application layer for nearly every operation; or... Mongo -- is always just too great.


Figuring out the data model as you build can be beneficial: you realize what you need as you work with the thing and can add it on the fly. At a certain point, of course, you need to do a cleanup. Getting people to accept the cleanup requirement is the difficult part.


Can't speak for anyone else... for me, the shape of the data I was interacting with was best represented as nested objects... think a classifieds site... the primary record is the ad/entry, but different types of products will have differing ancillary data. That was my use case when first using MongoDB. Why? Because otherwise there were up to 30+ joins in order to bring up all the data around any given classified entry.

Reshaping the data into a document store, with a handful of indexes made a lot of sense. Nearly effortless replication was also a nice feature, and needed for some extra endurance and scale. I'm pretty sad that RethinkDB didn't get farther, as their management interface and approach was a lot better than MongoDB.

To me it's about what the data looks like, what the performance needs are, and how much you want to spend, cross-train, or hire out expertise. Not to mention the lock-in options. There are lots of options, and it depends on the project and use. I've worked on enough applications where at least parts of the application needed to be moved off of SQL, short of spending FAR more on licensing and/or servers that were insanely expensive for the need; the value wasn't good.


Unfortunately, it's one of those myths that is prevalent enough because there are enough proponents of NoSQL who give it as their reason for doing it.

Kind of like how people also say they choose Node.js so they only have to use one language on the server and the client.


The issue is absolutely not the overhead of schemas or managing them.

The issue is the overhead of systems administration when clustering. Clustering is a requirement to support high availability, and Postgres clustering is a real bear to set up. Mongo clustering is easy.

Postgres and most other SQL databases hail from the era when things ran on a "box." They're designed that way. Clustering is a bolt-on. It's that more than queries or schemas that drive NoSQL.

It would have been better to fork Postgres and create a cleaned-up easy to cluster version minus the obnoxious arcana, but that's no fun. It's more fun to reinvent the wheel.

Edit: We migrated from Postgres to RethinkDB for this reason and are mostly happy with it. We miss the data integrity guards of SQL but getting clustering without a full time DBA and Raft consensus fail-over without intervention is a worthy trade off. We still use Postgres for back-end warehousing and analytics but those are not live systems so they can live on a "box."


>I've used Postgres and Mongo pretty extensively, and for any reasonably seasoned developer, the startup overhead of an SQL system is a myth. There may be an upfront cost to learning how an RDBMS and SQL work in the first place, but once you're familiar with them, they'll be faster than Mongo on any new project.

Exactly. I was very new to the RDBMS world and had always preferred to go for MongoDB. Then, on a side project, I decided to learn RDBMS properly and used Postgresql. Once I learnt how transactions and joins work, RDBMS was a totally easy topic.

I have used Jooq instead of an ORM, which again helped me to learn the queries and the underlying system of Postgresql!


In the words of one of my former DBAs: "MySQL makes a pretty good noSQL database".

That is to say you can do a quick and dirty key-value store in MySQL and it will be plenty fast. When you're ready to start normalizing things, just start pulling values out into their own columns and/or tables.

Conversely, when starting a product, don't worry about deeply normalized databases: use a relational DB as a glorified key-value store, then normalize things as you go. You'll get the best of both worlds.
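
A rough sketch of that progression (sqlite3 here purely so the snippet runs as-is; the MySQL version is the same idea):

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Quick and dirty: a relational table used as a key-value store
    conn.execute("CREATE TABLE kv (entity_id INTEGER, attr TEXT, value TEXT)")
    conn.execute("INSERT INTO kv VALUES (1, 'email', 'ada@example.com')")
    conn.execute("INSERT INTO kv VALUES (1, 'plan', 'trial')")

    # Later, promote a frequently queried key into a real column and backfill it
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("""
        INSERT INTO users (id, email)
        SELECT entity_id, value FROM kv WHERE attr = 'email'
    """)
    print(conn.execute("SELECT * FROM users").fetchall())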


Yep. I've done the same thing by using JSONB in postgres.


I've done something similar in most SQL databases, having either an XML or JSON column to store ancillary information that didn't need to be accessed or indexed in ways that aren't programmatic.

For example, if you support Amazon Payments, Paypal and a payment gateway, the shape of the data for the transaction is different for each and rarely looked at, so why not shove it into a bigtext/blob field as XML/JSON?


Yes, relational databases are easy to deal with. A single machine with proper backups can handle the data needs of most companies. The explosion of managed service providers makes this even cheaper, faster and easier to start with.

ORMs can do 95% of most project data access needs and a little bit of SQL knowledge allows for easily creating and editing tables - or use the numerous GUI tools out there.

If anything, SQL actually lets you be more agile by allowing for quick queries to fix data or do a complicated 1-time report without having to write up and deploy code. This probably has more to do with the SQL language itself which is why it's surprising that MongoDB never tried to implement any of it.


But the cost of a server big enough to handle a million simultaneous users can be cost prohibitive if you can get away with a few smaller nodes and something that scales horizontally... not to mention failover/redundancy which can be downright expensive and/or painful in many SQL variants.


That's the whole point - worry about that when you actually have a million simultaneous users, which is a huge number. There are plenty of options like reading from replicas, using a caching layer, or moving specific data to another datastore - all work that's better done when you actually have success and the resources to understand your data access patterns and implement a smart solution when it's needed.

Basic replication has been part of every major relational database for decades. Where is it expensive or painful? Do you really need 100% uptime? There's a vast ecosystem for relational databases that can help with all of this and now you have CitusDB, MemSQL and CockroachDB if you need built-in next-gen scaling and availability.


The cost of dev time spent reimplementing some of the features available in an RDBMS on top of NoSQL stores can be N times the cost of that box.


This is kind of a ridiculous statement given the massive popularity of NoSQL DBs like Cassandra, Redis, and Lucene.

Not to mention what you described seems like a perfectly reasonable architectural evolution. Are we all really expecting to make a single decision on a DB for the entirety of an app's life? It makes a lot of sense to use a flexible db schema while the data takes shape in an app's infancy, then solidify things with a more structured database schema further down the line. If you're abstracting things away in ORMs anyway, it might not even be that painful if you're just switching out DB drivers.


Most people don't use Redis and Lucene as general-purpose data stores; they're normally used for specific purposes.


I would go so far as to say that nobody should ever use Lucene as a data store. It's very easy for it to get corrupted because it was explicitly designed to support fast full-text searches at the expense of persistence (because your data is supposed to be persisted elsewhere).


Nobody uses Redis like a database and expects persistence of data. Redis is usually used in the context of a message queue.

Lucene isn't a database, it's a full text search engine.

Cassandra has a query language which is very close to SQL, so it's not really NoSQL.


> Redis is usually used in the context of a message queue.

Or caches, or simple data stores for things like counts.


> Nobody uses Redis like a database and expects persistence of data.

Plenty of people do, it's got just as good persistence and backups as anything else. It's very versatile and used for lots of situations, with message queues just being one option (although not a great fit until it gets a stream data type).


>Nobody uses Redis like a database and expects persistence of data.

You'd be surprised. I know of at least one (extremely) large company that does / has done this.


Redis is either used as pub sub / message queue but also often it is used for caching similar to memcached.


Caching does not have the same expectations for persistance that a database has. If data in Redis gets blown away, I don't really care. If data in my RDBMS or NoSQLDB gets blown away, it's a huge problem.


Yes, I agree. Most people I know would use RDBMS such as Postgres to store bulk of business data (customer accounts, transactions, receipts etc) and then use complementary systems for other purposes (so could be Memcache/Redis for caching, ElasticSearch for logs etc).

It's not a false dichotomy of using either Postgres or Redis, you can use them alongside each other, they have different uses.


Cassandra is much more difficult than an SQL database to build functionality in, but it can scale linearly with the number of nodes added. I quite enjoyed its deliberately imposed limitations, which help you write a scalable data layer.


0. Finally, an API that actually attempts to fit modern programming paradigms!

I'm not sure the unstructured nature of many NoSQL databases was even the selling feature so much as the NoSQL part. People were growing tired of dealing with lacklustre ORMs and poor integration with their programming tools and were desperate, even in light of the shortcomings, to try something new.

It is funny that we have a thousand and one different general purpose programming languages that try to solve the problems with general purpose computing, but SQL dialects (with slight variation here and there), have very little competition, despite all the problems with the SQL language.

A NoSQL relational database could have potentially hit the sweet spot, but Mongo and similar databases came first and caused enough trouble, as you have described, that I feel like they've killed the momentum in looking for better solutions to access database data (relational ones included). Nowadays many of those people are just happy to have a relational database again, even if that means accepting SQL.


> It is funny that we have a thousand and one different general purpose programming languages that try to solve the problems with general purpose computing, but SQL dialects (with slight variation here and there), have very little competition, despite all the problems with the SQL language.

This is kinda like complaining that there's only one implementation of lambda calculus. Yes, SQL is not directly a relational algebra, but it's pretty close. (Also, tables are not relations and SQL deals with that.)


> This is kinda like complaining that there's only one implementation of lambda calculus. Yes, SQL is not directly a relational algebra, but it's pretty close.

And yet someone proficient in SQL will not necessarily be able to read a traditionally-written lambda calculus expression as the languages differ significantly (and not just because of tables), so I think this proves my point that there are different, and hopefully better, ways to express the interaction with the database. There is no reason why the protocol has to be exactly as SQL is. There are theoretically an infinite number of ways to communicate the same intent to a RDBMS.

For instance, SQL does not do composability well. There is no fundamental reason why you should not be able to write query fragments, simplifying the way many queries are constructed and avoiding the huge mess that monolithic, hard to debug, queries often become. It is simply that SQL does not allow for it. A better API could improve on this significantly, without losing any features a relational database provides. You don't have to throw all database theory out the window just to change the API away from one that was designed 40 years ago.


> And yet someone proficient in SQL will not necessarily be able to read a traditionally-written lambda calculus expression as the languages differ significantly (and not just because of tables), so I think this proves my point that there are different, and hopefully better, ways to express the interaction with the database. There is no reason why the protocol has to be exactly as SQL is. There are theoretically an infinite number of ways to communicate the same intent to a RDBMS.

First, lambda calculus isn't a language -- at least, not in the typical sense. It's a mathematical logic construct. It's more akin to geometry than to a programming language. Same applies to relational algebra.

And of course lambda calculus and relational algebra differ significantly... Their domains are completely disjoint. I was not comparing them in this way. I'll restate my premise:

If we consider SQL a mapping of relational algebra operators to SQL operators [0], then any other language which also maps to relational algebra will not be significantly different. Thus, complaining about SQL being a singularity is kinda like complaining that relational algebra, lambda calculus, or any other mathematical constructs are singularities.

> For instance, SQL does not do composability well. There is no fundamental reason why you should not be able to write query fragments, simplifying the way many queries are constructed and avoiding the huge mess that monolithic, hard to debug, queries often become. It is simply that SQL does not allow for it. A better API could improve on this significantly, without losing any features a relational database provides.

You can compose SQL with views. Non-materialized views act as pure SQL composition. Materialized views allow precalculation of subexpressions.

Most of the big databases also support common table expressions, which allow for decomposition within a single query.

https://en.wikipedia.org/wiki/Hierarchical_and_recursive_que...
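
A small sketch of both forms of composition (sqlite3 here purely so it runs as-is; views and CTEs behave the same way in Postgres and the other big databases):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO orders VALUES (1, 10, 250.0), (2, 10, 40.0), (3, 11, 15.0);

        -- A non-materialized view acts as a reusable, named query fragment
        CREATE VIEW customer_totals AS
            SELECT customer_id, SUM(total) AS lifetime_total
            FROM orders GROUP BY customer_id;
    """)

    # A common table expression decomposes a single query into named steps
    rows = conn.execute("""
        WITH big_spenders AS (
            SELECT customer_id FROM customer_totals WHERE lifetime_total > 100
        )
        SELECT o.id, o.total
        FROM orders o JOIN big_spenders b ON b.customer_id = o.customer_id
    """).fetchall()
    print(rows)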

> You don't have to throw all database theory out the window just to change the API away from one that was designed 40 years ago.

Of course not. The problem is that any new SQL would just be a new mapping for the same underlying relational algebra. "Welcome to the new SQL, same as the old SQL."

[0] It's not, because tables aren't relations. However, the operators are similar enough that parallels are easily drawn.


> First, lambda calculus isn't a language -- at least, not in the typical sense.

There are common ways to express lambda calculus expressions in order for people to communicate those expressions with each other, which is very much a language. After all, we are talking about the language of SQL here, not the mathematical concepts that lie beneath the language. Don't confuse what SQL is able to achieve with the way SQL is written. We are discussing only the latter.

> The problem is that any new SQL would just be a new mapping for the same underlying relational algebra.

And general purpose programming languages map to the same underlying Turing machine. That hasn't stopped us from coming up with variations in language to improve the developer experience, even if they are all quite similar at the end of the day. Let's face it, if you can program in one general purpose programming language, you can program in them all. There are no real differences between them. The only thing to learn when switching languages is the grammar and idioms of the language. The underlying mathematical theory never changes.

You could just as easily say "Welcome to the new C, same as the old C." And yet, the new C has proven to be quite valuable when it comes to developer productivity and happiness. There is very much room for the same in SQL land, if someone was ever motivated enough to look for it. I maintain that NoSQL becoming synonymous with unstructured key-value/document data stores has killed what motivation there may have been though.


Here's my personal version of the story:

With SQL DBs (Oracle, SQLServer and MySQL):

1. SQL database migrations were killing us. Going back and forward in a dev environment was impossible. No hot deploy in production.

2. Could not work well with application user-defined fields: adding columns adhoc to the database, indexing them, normalizing and denormalizing, performance issues, everything was a problem.

3. Blobs holding logging data got unmanageable quickly.

4. Joins were very hard to optimize even though the team had a lot of DBA experience fine tuning databases.

5. Had to build a very complex architecture around the database for a product that was not that complex: cache, search, database, blob store, distributed, etc.

And with all our 1990s and 2000s previous experiences in data warehousing, business intelligence and DB optimization tools, we were still wasting valuable time with SQL design, indexing, query planning and parameter optimization. So we gave MongoDB a try. First as a cache. Later as the only DB.

Our journey:

1. Heard about Mongo. Tried the DB. The driver worked great. To me that's the number one "marketing antics behind MongoDB": their strategy creating drivers and supporting the programmer community.

2. Understood what NoSQL meant and forgot about joins altogether.

3. Understood what NoSQL meant and built transactions into atomic documents.

4. Understood what NoSQL meant and stopped relying on the database for type, primary and foreign key constraints, default values, triggers (argh!), stored procedures (2x argh!), etc.

5. Simplified the architecture with integrated search, queue and cache. Less moving parts = joy.

6. Result: very low maintenance, easy install, configuration, replication and migrations. 99.999% availability.

7. Bonus: we even implemented a very high frequency, atomic distributed semaphore system with a FIFO queue that reaps zombies using Mongo built-in networking features.

So we've reduced DB-related issues by an order of magnitude. How? I think because NoSQL is a way of saying the DB should not be magically answering random queries. A database should be a data store, period -- just store and retrieve data the way the app needs it. Focusing our energies on getting the data right as documents for a document store meant data flows as objects from code in and out of Mongo.

I believe people underestimate how important (and productive) it is to keep the same data structures flowing between the UI (JSON), server (Object/Hash/Dictionary) and DB (document). It makes code easier to read and more resilient to errors.

But SQL DBs come with a convenience layer bolted on to run random user queries with things like OUTER joins and GROUP BYs. For that we need to flatten data into tables, which clashes with typically how data flows in an app.

SQL DBs however are great as the single source of truth for data: a schema can be laid out and enforced independently of code, so it's safely guarded from programmers breaking it. Business sets up a SQL DB so that their reporting people can query data on demand while consultants with zero knowledge of the business can write code limited by constraints managed by DBAs. SQL is even taught at business schools, which is revealing of who its target audience actually is.

Bottom-line: SQL and schema enforcing are end-user features we did not need to build our tool. On the other hand, every single MongoDB feature is something we need and use profusely.


> 2. Understood what NoSQL meant and forgot about joins altogether.

How would you represent a simple invoicing system in MongoDB (e.g. Customers + Products + Orders + OrderLineItems )? NoSQL-for-everything advocates posit two solutions: either denormalize the data by embedding Customer information within an Order document, which also contains an array of OrderLineItems, or use a UUID as a kind-of foreign key and maintain separate relationships. Both approaches have serious problems (data-duplication and inevitable inconsistency in the first, and lack of referential integrity in the second, besides ending-up abusing a NoSQL database as an RDBMS). Is there a better way? Or would you agree that certain classes of problems are best left to RDBMS' domain?


The example you've used (invoices) is actually quite instructive for demonstrating the benefits of a "document store." An invoice, historically, was a literal printed piece of paper. Invoices are actually really annoying to implement in an RDBMS because of so-called "referential integrity" -- an invoice should be a "snapshot in time" of everything that happened when the order was processed, so ideally, when a user views their invoices from the past 2 years, they look the same every time.

Except, oops, your user got married and moved, now your precious "referential integrity" means jack because the generated invoice is flat-out wrong. Product removed from the store? Too bad, needs to stay in the database forever for historical purposes. Prices need to change? Better design the database to handle snapshots of every product state.

If you were implementing this in MongoDB, you'd probably store a UUID and the flattened data at the time of invoice generation, that way you can still query on ids AND not deal with the headache of having a combinatorial explosion of data in your RDBMS.
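
For illustration, a minimal pymongo sketch of that flattened snapshot (a local mongod is assumed; every name and field here is hypothetical):

    from datetime import datetime, timezone
    from uuid import uuid4

    from pymongo import MongoClient

    invoices = MongoClient().shop_example.invoices

    # The invoice is written as a snapshot: the address and line-item prices are
    # copied in as they were at purchase time, so later edits to the customer or
    # product records never rewrite history.
    invoices.insert_one({
        "_id": str(uuid4()),
        "customer_id": "cust-42",                     # still queryable by id
        "billing_address": "1 Main St, Springfield",  # copied, not referenced
        "line_items": [
            {"product_id": "prod-7", "description": "Widget", "unit_price": 9.99, "qty": 3},
        ],
        "issued_at": datetime.now(timezone.utc),
    })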


You would solve this in a RDBMS the same way: de-normalize when you're saving the invoice (example: a line items table with snap shot of current item price, description, etc.)


Yes, which suggests that the "serious problems" mentioned by the grandparent aren't serious (or problems) at all.


In Postgres, you'd simply have a table with a JSON column for the snapshot-in-time contract.

You can then select fields from that JSON for invoices, reports, etc with the arrow operator:

https://www.postgresql.org/docs/9.6/static/functions-json.ht...


With SQL you can denormalize all that (and should) to create that snapshot. But with NoSQL you can't normalize and get back a way to quickly query the number of products sold per month over the last 5 years.


Yes, this is possible with Aggregation and MapReduce: https://docs.mongodb.com/manual/aggregation/
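
For example, a rough pymongo sketch of "units sold per month" with the aggregation pipeline, assuming invoice documents with an issued_at date and a line_items array carrying a qty (as in the hypothetical sketch a few comments up):

    from pymongo import MongoClient

    invoices = MongoClient().shop_example.invoices  # hypothetical collection

    pipeline = [
        {"$unwind": "$line_items"},
        {"$group": {
            "_id": {"year": {"$year": "$issued_at"}, "month": {"$month": "$issued_at"}},
            "units_sold": {"$sum": "$line_items.qty"},
        }},
        {"$sort": {"_id": 1}},
    ]
    for row in invoices.aggregate(pipeline):
        print(row)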


For relative values of "quickly".


Instead of nebulous terms like NoSQL you should instead just look at the damn features because these concepts are orthogonal. MongoDB has transaction isolation on the document level instead of the database level. If you can store everything in a single document then it doesn't matter. If you can't then use a database that supports database level transactions. It doesn't matter if it's a NoSQL or RDBMS database.

I feel a lot of people know that typical NoSQL databases (without database-level transactions) are not suitable for their problem, but they don't know why, and then they just think NoSQL is always bad and RDBMS are always better, when really those NoSQL databases are intended for different problems.


Not the original commenter, but there are some valid cases for NoSQL: some people use it for storing massive amounts of web crawling data. But the thing here is that it's throw-away'ish, and in that case it's often not worth it to add structure (even though there pretty much is structure in everything you look at long enough).

But I do think having any data consisting of, say, items, orders, users, payment in MongoDB is very much a bad idea. Been there.


> I think because NoSQL is a way of saying the DB should not be magically answering random queries.

The reason this is wrong is something that Codd et al learned a while ago: the data is MORE IMPORTANT than the application. Applications change and/or become obsolete; the data doesn't. You will still need to query the same database 50 years from now, but you likely won't have the same application to do it with. That means that everything that is important to the data (schema, constraints and so on) needs to stay with the data.


What was your tool?


What nonsense. I have never considered using Postgres a hindrance due to migrations or anything of the like. If anything, migrations and having a consistent database make it easier to reason about what your code should do, without having to take care of missing pieces of information in NoSQL objects that have become stale because of 'speed of iteration'.


I really wonder what all of these claimants do. Do they change their data models and corresponding procedures, and then just leave all of the stale, nonconforming data in there? How is that working out?


>I think the most interesting question though is would they be able to get MVP and initial customers that set off this if they were moving (slightly) slower due to SQL and slight overhead that comes with?

Why the heck not? People (startups and others) were using RDBMSs and SQL for years before NoSQL was a thing.

Also, if slight delay is a thing, how about the impact of your point 7 on customers?


I have been using Mongo for several years now on a fresh build that follows basically the outline you have here. The fresh build was triggered in part because of the cost (development time and infrastructure) of using Microsoft SQL Server. The only difference is that I got to step 5 and realised I had two very different reporting requirements. One that Mongo works well with (real time aggregated data), and one which SQL works well with (high granularity, long time period, historical stuff). So, I use Mongo for the first requirement, and SQL for the latter.

I see many posts like this, and indeed many replies like it also (your thread currently). Curious, why do they always focus on having only one database to fit all their needs? Is my situation special, can't you just make a decision about what tool is best for the job and use it? Is there some eventuality where having databases specific to needs results in disaster that I am just not aware of?


JSONB in pgSQL fills in so nicely for steps 1-4. Once you need columns and relationships, the relational store is already there!
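
A rough sketch of that path (names invented): start with one jsonb blob per row, then promote a field to a real column once you need to join or constrain on it.

    // promote.mjs — untested sketch
    import pg from 'pg';
    const pool = new pg.Pool();

    // Steps 1-4: just dump documents into a jsonb column
    await pool.query(`CREATE TABLE IF NOT EXISTS events (id bigserial PRIMARY KEY, doc jsonb NOT NULL)`);

    // Later: a field earns a real column, a backfill, and an index
    await pool.query(`ALTER TABLE events ADD COLUMN IF NOT EXISTS user_id uuid`);
    await pool.query(`UPDATE events SET user_id = (doc->>'userId')::uuid WHERE user_id IS NULL`);
    await pool.query(`CREATE INDEX IF NOT EXISTS events_user_id_idx ON events (user_id)`);

    await pool.end();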


I largely agree with your journey and your premise.

However, I would add that I've seen just as many beast-of-a-thing SQL databases as NoSQL ones. In fact, anything I've seen at sufficient scale has lots of wrangling with their database.

While your step 7 usually ends up with a better tech result, it's done at a time when the organization knows the problem domain much (much) better. They have real users. You'd probably get a similar result going from one SQL database to another.


I'm doing the same for another project. Reporting with $unwind can be limiting.

A schemaless DB is a great feature. Joins can be done through an ORM like Laravel's Eloquent, which allows you to join MySQL to Mongo using a cross-join table on the MySQL side.

Adding columns in an MVP product is trivial. Getting a stream of unstructured data and being able to save the differing data to documents in the same collection is an amazing feature that is still needed.


Adding a non-NULL column to an existing table with millions of rows, and then updating the entity/model objects upwards towards the UI is non-trivial - I find myself "queuing up" planned database schema changes and applying them in a batch.

What I'd love is a traditional database system with the concept of "cheap columns" that store "just data" scalar values, which can be easily created, dropped and manipulated - they're addressable so they can be used in queries and statistics/analytics, but the values themselves won't need indexes or referential-integrity - and for an ORM and Scaffolding system to support them - that way I won't pull out my hair when the client says one day "let's go from a single column for the customer's name to having separate title/first/middle/last/suffix columns"... and back again the week after.


So something like PostgreSQL `json` and `jsonb` types? You can select them out into columns, so they should play nicely with ORMs.

https://www.postgresql.org/docs/current/static/datatype-json...
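
A small sketch of how that hybrid looks in practice (table and key names invented): stable fields as real columns, the churny ones in a jsonb bag that you can still query, report on, and index.

    // cheap-columns.mjs — untested sketch
    import pg from 'pg';
    const pool = new pg.Pool();

    await pool.query(`
      CREATE TABLE IF NOT EXISTS customers (
        id    bigserial PRIMARY KEY,
        email text NOT NULL,
        extra jsonb NOT NULL DEFAULT '{}'   -- "just data", no migration needed per key
      )`);

    // This week's name-splitting experiment lives in "extra"
    await pool.query(
      `UPDATE customers SET extra = extra || $1::jsonb WHERE id = $2`,
      [JSON.stringify({ title: 'Dr', first: 'Ada', last: 'Lovelace' }), 1]);

    // Still addressable in queries and analytics
    const { rows } = await pool.query(
      `SELECT extra->>'last' AS last_name, count(*) FROM customers GROUP BY 1`);
    console.log(rows);

    await pool.end();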


It seems like the "column families" of Cassandra would appeal to you.


This! 100% this. I loved phases 1-4. Then 5 hit and it was a horror show. I realized quickly that I was building kludgy extraction routines to get the reports/metrics I needed. Once the horrible process became unmanageable, I would have gladly paid the longer development/iteration time on phases 1-4 in order to have a robust and resilient system which allows proper reporting.


My feeling is that the people who flock to NoSQL have never used a good ORM such as Django to abstract away the raw SQL queries. A good ORM gives you the simplicity of NoSQL plus all the power of a relational database.


I think that is only true of a subset of NoSQL users.

And even if it is true that Django's ORM is very nice, there are tradeoffs to using an ORM that mean you don't get "all the power" of whatever database you are using. Maybe all the power you care about; for instance, power in expressiveness and easy cross platform support, at the expense of performance optimizations that aren't important to you. But it's still a tradeoff.


You're right.


(Author here) You can read parts 1 and 2 of the three part series:

- Part 1, Why Did So Many Startups Choose MongoDB: https://www.nemil.com/mongo/1.html

- Part 2, Startup Engineers and Our Mistakes with MongoDB: https://www.nemil.com/mongo/2.html

You can see most of the notes from my interview with MongoDB's CTO, Eliot here:

https://news.ycombinator.com/item?id=14804765

And the interview notes related to MongoDB's marketing are somewhere in this HN post:

https://news.ycombinator.com/item?id=15124316


I really enjoy your writing style and how you wrote about the "story of humans" rather than writing YAMTT (Yet Another MongoDB Trash Talk).

I'm actually a lot more comfortable now using MongoDB in production after understanding its level of maturity and what the right application is. For the last couple years I was scared off by all the negative HN comments on MongoDB articles.


Thanks for the kind words. Most thoughtful engineers I've met are believers in the right tool for the job and thinking in tradeoffs, not good and bad - and so I'm a big fan of digging into MongoDB's value and understanding your project's needs.

MongoDB has matured as well over the last decade, and it's a testament to the hard work of their engineering team.

You may like this old post that I wrote which echoes the broader points I make in this series about thoughtfully making eng choices and having better eng debates:

https://www.nemil.com/musings/betterdebates.html


So what is the right application for MongoDB? Only when you don't need ACID? Can you give some examples?

Genuinely curious


We used Couchbase (a NoSQL database like MongoDB) for email batch sending (think invitations to a party) and it worked very well. We could store each email sent as a separate, denormalized document so the sender could see EXACTLY how their contact data was replaced in each individual instance, the "View as a Web Page" functionality was trivial (instead of recalculating everything from the normalized forms - which can be blown up by contact data changes, just load the document that you sent out), and its lovely TTL feature meant we could handle the configurable retention policies trivially as well.

It wasn't so good at doing reports (how many customers viewed the email yet, responded, etc.). One thing we talked about doing was just storing a sqlite or h2 database as a document for reporting purposes (if we had been more single threaded that could have worked nicely). We ended up using a separate Sql DB for that.

There are cases where denormalized data is the "right" way to view stuff, and cases where the data really is easier if its normalized, and that is a good reason to push your DB selection one way or another.


Interesting. Seems like that would just be one part of a larger application, though. And for that, my mind just jumps to something like Postgres with `jsonb` fields to store it all denormalized, then using columns to store relations, like the contact it was sent to. Along with other tables for other parts of the application, of course.

This way you aren't complicating your stack by adding more services sooner.


We were replacing a fully SQL email engine (that was starting to fall over due to load) with this more hybrid approach; we had customers, we knew that the business case closed, but the load was starting to overpower the main database, so we spun off separate databases, and bought ourselves a little more overhead by splitting out the normalized and denormalized data. Could well have been a mistake, but we weren't thrilled with Postgres's ability to scale horizontally, so went to CB so we could scale a bit more. (As I recall, we were doing 7m emails a day, and our goal was to support up to 70m with that structure.)


Interesting choice to have people sign up for free stickers. I expect a follow-up article on "The Marketing Behind the Marketing Behind MongoDB" :)


Both the stickers and three-part series were conscious choices.

Etsy did a three-part series on the benefits of MongoDB in the early 2010s.

MongoDB was also well known for its stickers, and their marketing team wrote online about how these stickers helped solicit emails:

"Give away swag: This is a low cost, easy way to build a lead database and get developers to decorate their laptops with your logo!"


I see, thanks for the extra context. By the way, the notion of "marketing attacks" is a powerful one, it's going to stick with me.


I take it you never skateboarded. If you really want to see a veritable marketing orgy, look at the skateboarding world. It's gotten even crazier last time I looked.

This is like the kale salad of that type of marketing. It is definitely effective, though.


Do you know if it ended up ever being a full series? The Etsy site only has the first two parts.


It ended at two, but these were later follow-ups that seemed inspired by the early experience. The first one below especially felt like a coda to the first two:

http://mcfunley.com/why-mongodb-never-worked-out-at-etsy

http://mcfunley.com/choose-boring-technology


There's 100X the interest in hyping technologies, as opposed to realistic analysis. I appreciate that you chose the latter.


I maybe missed it in the posts, but would you mind to explain why you focus on MongoDB in particular and NoSQL in general?


I was in the startup world in the early 2010s, and saw MongoDB used in a number of startups (over ~5 years).

These companies included everything from very early stage Y Combinator backed startups to one of the most famous unicorns (tech issues at growing companies are rarely discussed publicly, so I was lucky that my friends were willing to privately share; compare that to successful tech decisions, which are widely discussed and blogged).

In that time, I heard many stories about the issues companies had with it - and angry debates about whether it was the right choice. In the early years, you were ancient if you used something more conventional than Mongo; in later years, you were dumb if you used Mongo in many startups (both views have their issues).

I wanted to understand:

- Why was MongoDB so popular

- What issues did startup engineers have with it

- Given these issues, why was it chosen in the first place

And most importantly, what lessons does this case study have for future dev tool decisions.


I used MongoDB in production. About 6 years ago, I had a bad experience with it and abandoned it.

In a new job recently, I picked it up again and to my surprise, it performs super well and has lots of new concepts.

1. TTL index: data is automatically removed after a certain time (sketched at the end of this comment)

2. Hidden replica: we can do whatever we want on this node without slowing down production

3. Very easy to use oplog: it's super easy to get access to the oplog, just as if you were working with a normal collection

4. Aggregation Framework is awesome: it's tedious to write at first, however it becomes very clear with the pipeline design

5. JOIN: I was surprised when I discovered this, but it does now have a concept similar to a join with `$lookup`

6. Very low storage: if you use MongoDB as a log/time-series/event database, you will appreciate WiredTiger

7. Metric exposure is awesome: lots of useful metrics

8. The cluster is much easier to manage nowadays and very stable: adding a new node is just a matter of booting up a server, and everything is handled automatically. Compare that with MySQL, where you have to export data and get the binlog position and file name to configure the slave.

One thing that remains the same is that we still have to deal with migrations and backfilling data.

Careful planning and database design are still required; you can't just dump whatever you want into it.
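
For items 1 and 5, roughly what they look like in mongosh (collection names made up):

    // 1. TTL index: documents are purged ~60 days after their createdAt
    db.events.createIndex({ createdAt: 1 }, { expireAfterSeconds: 60 * 24 * 3600 })

    // 5. $lookup: a left-outer-join-like stage between two collections
    db.orders.aggregate([
      { $lookup: {
          from: "customers",
          localField: "customerId",
          foreignField: "_id",
          as: "customer"
      } },
      { $unwind: "$customer" }
    ])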


"but need to spend time debating how to protect ourselves from marketing “attacks”."

I don't have a complete solution to this, but I know I've got two for you:

1. Beware any solution where the same thing is demo'ed every time. It's a sign that either nobody is actually using it, or it's over-tied to the specific thing being demo'ed.

2. Any time you see a solution that massively outperforms some existing and well-developed solution, you must always ask what was dropped from the existing solution to get that speed, on the assumption that the well-developed solution is probably already pretty optimized to be whatever it is. It isn't necessarily bad to drop things, heck 90% of "cloud" technology basically consists of "dropping the things that don't work well in a distributed environment", but you need to know what those things are.

#2 in particular would have saved a lot of people in the context of MongoDB. It doesn't mean Mongo is a bad choice for everything, or at least, not anymore, but you need to know what you're giving up to get there. (And in particular I'd suggest "We can use this technology without having to have discipline!" is... double-edged at the very least. You may not want a bondage-and-discipline tech at a startup, but tools that offer some gentle-but-solid defaults and guidance may help you focus your cognitive energies less on establishing all the rules from scratch and more on whatever your actual problem is.)


There is a common perception in the developer community that "if you build it they will come", which is usually not the case. There's a surprising amount of marketing behind every successful company.


Reminds me of my favorite taoist saying: We honor what lies within our sphere of knowledge, but fail to acknowledge how much we depend on what lies beyond it.


Absolutely. RethinkDB was technically a far superior product but lacked the marketing and fanboyism that MongoDB orchestrated so effectively.

Today, Docker is following MongoDB's playbook.

HN Hiring Trends of MongoDB vs. Docker: http://i.imgur.com/tYKzmaG.png


tbh, I don't disagree with you here. RethinkDB invested in our open source community more than marketing.. it's arguably part of why we didn't succeed as a business.


I personally learned how important sales are after a hard lesson. At the very least, if one cannot explain technology to a salesperson, then the technology is not ready.


Great write up! That was informative and well-written.

As someone who writes content aimed at developers, I'm concerned about the generalized sentiment this creates towards technical content marketing. I stopped coding years ago and started trying to lead teams and figure out how to make businesses grow; quality marketing is a way to do that. In the HackerNews bubble, that's viewed as a step 'down the ladder'.

I think it's important to make a distinction; is the issue that they marketed well or that their product didn't match their marketing? It's the latter -- any company that hopes to succeed in the uber-competitive, fast-moving technical world needs marketing.

If we're making a fancy tool and I write a quality guide demonstrating how you can do "us" 'the ol' fashioned way' in an effort to demonstrate our value, is that wrong? How else am I to do it? You simply won't come if I don't tantalize you.

Digital Ocean had a brilliant content strategy; incentivize users to write tutorials that would bring people to try those tutorials out on their droplets. I hope we don't dismiss technical marketing as evil. Technical marketing can be valuable and be written with care and integrity.


Reminder: they shipped with defaults that would basically throw the data over the wall and never acknowledge it back to the user (because speed and such). Interestingly enough, it still called itself a database, and that right there is the power of marketing.


My own research doesn't support the idea that this "unsafe write" default was chosen by 10gen for benchmarking reasons; rather, it came from an early expected use case (from a footnote in part 1 of my series):

Waiting for writes was off by default:

> This "unchecked" type of write is just supposed to be for stuff like analytics or sensor data, when you're getting a zillion a second and don't really care [if] some get lost [or] if the server crashes.

This was rarely the typical use case in most startups - even though the defaults were based on it for a long time.

It’s unclear to me how long this was the top Google result, and it was earlier in Mongo’s life (2009). This may be one of the benchmarks that led to anger at competing NoSQL vendors - and seems to me like a mistake rather than a malevolent effort.

https://www.nemil.com/mongo/1.html#fn2

_____________________

MongoDB's CTO has also mentioned that if he could go back and change anything it would be this early default, as his earliest customers valued it - but he quickly realized that it caused issues for others.

(Throughout this series, my goal has been to be tough, but fair to both sides)


If your monitoring can't handle what your system is doing, aggregate locally until it can. Having systematically incorrect stats (because loss is correlated with load) is worse than having none.


The real question is how many years they left the unsafe default after people pointed out that it was a problem.


It took a little while (say 1.5ish years from when they realized it was causing issues). According to MongoDB's CTO, this was because they couldn't quickly move their early users away from the behavior they were used to.


And the dodgy benchmarks comparing their ability to write to memory with Postgres ability to write to disk!

Fact: Postgres with its JSONB datatype is a better Mongo than Mongo...


What you wrote isn't a fact, it's a claim, and a shoddy one at that. If you're basing it on anything other than hearsay, I'd love to see your proof.


Source for that fact?


This is around the time JSONB was introduced[1].

There was also (not sure if still developed) an interface called ToroDB, which allows you to plug in a MongoDB application to store data in Postgres. The interface provides Mongo compatible layer and the application still thinks it talks with Mongo. Even with the extra translation layer, ToroDB was faster than original MongoDB[2].

[1] https://www.enterprisedb.com/postgres-plus-edb-blog/marc-lin...

[2] https://www.slideshare.net/8kdata/torodb-internals-how-to-cr...


That EnterpriseDB benchmark is a comically inept hit piece. See: https://newbiedba.wordpress.com/2017/05/26/thoughts-on-postg...


Just to counterbalance the amount of negative sentiment here, we've used MongoDB for Artsy.net, following a recommendation from Foursquare's CTO (and others) since 2010. Elliott has also been very generous with his time, too. MongoDB enabled us to iterate fast and continues to serve well.


Thanks for sharing. It'd be really helpful for others to understand the use cases that made MongoDB a good choice for you in 2010.

Part of what I've felt is missing in the debate is where early MongoDB versions were a good choice, and where they were a poor choice. For example, I've seen it used in a number of places where transactions are really important (a number of fintech companies) - and I'll assume this was not a need for your team?

And as I've stressed in this series, my goal is not to argue that MongoDB is a poor choice, especially as it has matured. Rather, it is to argue that we have to pick the right tool for the job - and can't blindly follow hype or marketing.


I would love to know what MongoDB offered you over MySQL/Postgres?

I always hear you can iterate fast, but for most use cases I cannot see how MongoDB is quicker to iterate with than a relational database. Would love to learn though.


I can't speak for the parent, but there was a time around 2011 when I was pretty excited that MongoDB would let me just start coding without needing to 1) configure postgres's listening port and pg_hba.conf to allow easy local access, and 2) churn through a dozen or so revisions to the schema without having to set up a system for running migration scripts.

I really think a lot of Mongo's success came from that "It just works" experience the first time people tried it.

I later came to see it as a bad tradeoff. Open, unauthenticated access by default created more costs in security than it saved in early prototyping. Automating the creation of my dev environment made it not such a big deal to make Postgres configured right when it booted. And once I'd gotten a SQL migration system that I liked, I just kept re-using it. These things took some time for me to develop and learn, though.


> configure postgres's listening port and pg_hba.conf to allow easy local access

That is literally 30 seconds work!


Only with the benefit of hindsight, which isn't helpful to the newcomer.

To someone new who is just trying to work through some web framework tutorial or hack out a half-formed proof of concept, it can be several minutes (or much longer) of trying, failing, copying error messages into Google, and following some guide written for Debian but failing to get it working because you're running Red Hat (or vice versa). When you stack up enough similar problems getting the rest of the stack to work (e.g. configuring Apache or nginx), it stops a lot of new people in their tracks. When they're given something that just works, it's a godsend.


As someone who has worked largely with MySQL until a recent project that is internal to the media enterprise I work for, MongoDB was a godsend in rapidly developing an app that was formerly using MySQL. I faced an extremely tight deadline to rewrite it in order to be ready for a production period for a national publication and it just worked for the purpose of the app.

The app otherwise required a lot of state handling of JSON objects on the front end, with a lot of parsing to stay compatible with the tables; repairing and rewriting large swaths of that code can still happen, but couldn't happen in time for the deadlines.

I was able to pull the whole works together from the ground up (with improved UX) because of the document model. In this case documents are quite the thing being passed around and managed in the application, though.

In the long term, if the app is implemented into larger projects underway, then I would probably set up some sort of persistent store -- but until then it's been eye-openingly great.

For suitable projects, I agree fully. And for smaller applications, or tentative solutions (that are required in some environments) it's a more efficient option.


How is someone like that qualified to work as a software engineer? These were things I did when I was 13 on our old family Pentium, not something I would want a production engineer to do.


How is someone like that qualified to work as a software engineer?

If software engineering was like civil engineering then using MongoDB would be like building a bridge out of noodles.


I don't think anyone cares to impose a rule saying "You must be this tall to use MongoDB."


Heavy user of Postgres and Mongo. I've used both in production for years. I don't think that there are good or bad databases. Every database has a different focus and thus respective trade-offs.

You know this feeling when you have an idea and want to create a quick prototype to test the idea? By quick I mean you want to have this thing running in a few minutes without any effort. This is possible both with Mongo and Postgres.

But with Mongo you don't even need to create databases or collections (tables) beforehand. You just write to the DB and everything is created if it isn't there yet. This small thing lets the code stay super simple, and no migrations are required. With Docker and the official Mongo image you can set up a web server with Mongo, with authentication, in seconds. Accessing the DB without an ORM is still super comfortable since everything is JSON (in case you work with Node and the Mongo native driver).
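
For what it's worth, the whole "setup" with the Node driver is roughly this (untested sketch, names made up); neither the database nor the collection has to exist before the first insert:

    // prototype.mjs — run with: node prototype.mjs
    import { MongoClient } from 'mongodb';

    const client = new MongoClient('mongodb://localhost:27017');
    await client.connect();

    const ideas = client.db('scratch').collection('ideas');   // created on first write
    await ideas.insertOne({ title: 'half-formed idea', createdAt: new Date() });
    console.log(await ideas.find().toArray());

    await client.close();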

So one of the biggest reasons for me is that I actually start at all, instead of finding an excuse not to build the prototype. Because I am lazy, and most of the time for creating a prototype is spent on setting up the development environment. This is also why people like JSFiddle and Codepen so much: just start coding.

It's great for prototyping, and most of my prototypes stay prototypes. The few which got successful were migrated to a more mature architecture. Sometimes they end up on Postgres, sometimes Cassandra, but often enough they stay with Mongo, just with validators and the like added. All successful projects I started began quick and dirty and slowly evolved into a mature architecture. The best projects were those where I had a prototype running in less than two hours.

It's really the use case. I think disliking a technology just limits your options.


Limiting my options is a good reason for becoming more experienced. The more of my options I limit, and the faster I limit them, the faster I can make things, as opposed to exploring options.

It's why beginners' guides are opinionated: The beginner has no opinions of their own, so they must be given the opinions of someone who has them so they can make progress instead of being overwhelmed by choice. Making progress involves replacing your teachers' opinions with your own, so you can limit your options to a manageable set instead of them being limited for you, but at no point are you truly open to looking at every single thing equally. It would take too much time for very little reward.


Thank you for sharing your experience and practical perspective. That was really insightful - especially the part about how your successful projects start out as rough sketches, quicker to start the better; then mature over time, migrating to other types of databases as needed.

Such a pragmatic approach, to use the right tool for the job, while keeping the advantage of having a range of tools at hand.


I've seen the best success in the NoSQL world when you treat data like a history book rather than a moment in time.

What I mean by this is that creating entities, mutating them, and trying to maintain a state is the natural pattern to assume, but it is often wrong. It leads to many of the issues reported here and elsewhere and people declaring "such and such DB sucks". This is a SQL approach to using data, and NoSQL fails badly at this generally speaking; it should not be used to store "state", since state often requires ACID and relationships.

Treating data as a history book, or log of events over time is a safer way to use NoSQL as an authoritative source. It plays well with their ability to scale simply and naturally and event data has no relations. Everything in the single event is immutable as it is a fact of occurrence. Invalidating a past event is as simple as generating a later event that says that event is now invalid from this point in time.

From this you can generate "entities" which describe a relationship between your events. From your log data you can generate a session, for instance. These are not authoritative sources and if found to be invalid you recalculate the entity, or state, from the collection of events.

I could create an entire social network profile for a person by aggregating and querying the history of events that make up their current state. And the real power is I can generate their state at precisely any point in time in the past and possibly predict what their state will be in the future based on their past stream of events and derivative states. If a past event was found to be invalid, a new event declares this and with a greater timestamp authorizes that fact and I can recalculate the entity that describes their "current state".
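
In code, this "history book" view is basically a fold over the event stream; a minimal sketch (event shapes are invented):

    // Recompute an entity's current state from its ordered events.
    // A bad past event is never edited; a later correcting event is appended
    // and the state is simply recalculated.
    function currentState(events) {
      return events
        .slice()
        .sort((a, b) => a.at - b.at)
        .reduce((state, ev) => {
          switch (ev.type) {
            case 'profileCreated': return { ...state, ...ev.data };
            case 'nameChanged':    return { ...state, name: ev.data.name };
            case 'eventRetracted': return { ...state, retracted: [...(state.retracted || []), ev.data.eventId] };
            default:               return state;
          }
        }, {});
    }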

This takes work and thought though and one of the promises you've seen from some NoSQL vendors is they save you work. The work is always there but you need to get comfortable thinking of data in a different way and not all data needs this approach. But, in many cases, NoSQL is not trash.


I love event sourcing as a pattern. ES has its own overhead, but it is an intriguing pattern that works well for many use cases.


It really does and can be extrapolated to many, many use cases.

To me, this is the real power of document data stores. It's not for everything such as configuration or business rule data. But for a lot of generated data in the form of events it's a useful way to organize data.


I had trouble modeling in SQL because I started with NoSQL and used this exact approach.


Yeah, in the SQL world you'll want to stick with relational algebra and ERDs and then optimize from there.


Happy resident of a MongoDB environment here! In one of my projects, we are collecting IoT "clickstream" data with slightly varying contents, around 20 million inserts per day. These are then transformed into "sessions" that are inserted into another collection. Everything works as it should.

Reporting is based on PowerBI and aggregations are scheduled through cron jobs to refresh the data sources that PowerBI uses. Realtime metrics are not needed, but could easily be implemented through the "session" generator by calling PowerBI Streaming API in parallel when writing a document.


I think what you're describing here is something that is often overlooked. You are treating the same data in 2 ways - events and entities and there's an important distinction.

The events come in and you store them. They are immutable descriptions with a timestamp which provides sequence. There is no relation between any 2 events.

Then you keep another collection of entities which you call "sessions". These comb through the event data and combine them into sessions which can answer different questions.

The event data is the authoritative source and because there are timestamps they are sequenced. This way you can generate session from them and if anything were to happen to make the session collection invalid you can "rollback" to a certain point in time and recalculate the sessions from there.

In the NoSQL world you often want to begin with "events". And then form "entities" from the sequence of events that are useful to you. Because you've stored every action you can always go back and recalculate and create new entities you hadn't yet discovered.


Yes, this is exactly our use case. Every now and then a need for new attributes or metrics appear that are to be calculated or otherwise logically determined for the "sessions" generated. Sometimes a new device model / software version starts providing new attributes with the transmitted event and we can just add metrics and aggregations based on those attributes into the session generator if those attributes are present on an event.


I have a Node server that restarts once a day. On start it reads the state of the app (which is just one big JSON) from MongoDB into a JavaScript variable. When something changes it updates the variable in the Node process and fires an event to update the field in MongoDB. In other words, it only ever reads from MongoDB on server start. This has worked very well so far, but with all the negative comments I'm thinking there must be some kind of drawback or reason I should switch to Postgres JSON instead.

One problem I see is that I need to overwrite the whole thing when saving with Postgres, whereas you can easily just set one field in MongoDB. If I need to overwrite, I may as well just save the whole state to a file instead.

Forgot to add that it does the same for users. Users are one collection which reads the document on login and saves the same way as the global state does on every update. How would that work with Postgres?


I'm not saying you should switch to PG, but since version 9.5 it does allow you to update JSON keys: https://stackoverflow.com/a/35349699/221786

Full docs: https://www.postgresql.org/docs/9.5/static/functions-json.ht...
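
Concretely, setting one key in place looks something like this (untested; the table name and path are assumptions):

    // set-one-field.mjs — update a single key inside the stored JSON state
    import pg from 'pg';
    const pool = new pg.Pool();

    await pool.query(
      `UPDATE app_state
          SET doc = jsonb_set(doc, '{theme}', '"dark"'::jsonb)
        WHERE id = $1`,
      [1]);

    await pool.end();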


While I have no particular stake in whether you rewrite, presumably in the Postgres rewrite, you'd break the monolithic JSON file up to be one row per key you write.

But that on its own would hardly be a reason to change.

"Forgot to add that it does the same for users. Users are one collection which reads the document on login and saves the same way as the global state does on every update. How would that work with Postgre?"

Well, that boils down to "how do relational databases work?", which is not really a suitable HN comment topic. Relational databases are designed fundamentally differently.

It doesn't matter as much as long as you stay under a scaling limit where reading and writing the entire state all the time isn't a problem for you, and there's no need to do any sort of query like "Tell me the last login time for all users in a fraction of a second, please". What is probably the best thing for you to do is A: budget some time to read about and maybe even do some playing with relational databases so that B: when you encounter a problem for which they are the best solution, you realize that you've done so, and can take appropriate action. "Appropriate action" doesn't always mean "rewrite the entire website in Postgres"; sometimes it can be "denormalize the data into a Postgres db, run the queries I need, and throw the database away", which can, in some circumstances, still be faster than trying to convince a NoSQL to do queries it really doesn't want to do. Especially on a production database that's busy trying to do the real work.

The idea here is to make sure you don't get sucked into the tarpit of accidentally trying to replicate relational database features at your application level, which starts so beguilingly small and grows to consume your entire development budget if you start down that road accidentally and don't turn away soon enough, rather than to do things the "right" way for the abstract sake of doing it right.

Oh, and here's a tip that can save your job and/or career: If you ever find yourself in a position where you're trying to implement "transactions" in your application code, immediately stop and go get an appropriate database on the backend. The horrifying thing about this case isn't that you'll fail... it's that you will totally produce a solution that works perfectly in QA, and fails miserably with data destruction and angry (if not actively litigious) customers in production, and the road of "temporary fixes that seem to fix the problem but actually make it worse" is basically never ending. Learn what this looks like so you know it when you see it!


Thank you for your very detailed and good response. I actually have a lot of experience with relational databases, much more than with NoSQL.

The thing with the users collection is that I need to be able to add any arbitrary attribute to the user document, so I can't really use a table with a schema. I guess I could use a related key-value table for those, but the attributes can be anything, including nested objects and arrays. Maybe serializing the value would work, but it still seems that Mongo is a better fit in that case.

I also like seeing the user document as a big JSON without joins when doing DB admin stuff.


> The thing with the users collection is that I need to be able to add any arbitrary attribute to the user document so I can't really use a table with a schema.

You could put the attributes that all users share (username, email address, maybe hashed password) in a table, and then have a jsonb or hstore field for the rest of the attributes. That way you can still programmatically access the elements, and even build indexes over them.

But if you have no real reason to change DB, don't.


That's a good idea. Thanks.


> I need to be able to add any arbitrary attribute to the user document so I can't really use a table with a schema.

I'd love to see a sample of this data


It's not your usual CRUD app. It's a realtime sandbox game in early access.

Some arbitrary stuff I keep adding for various reasons:

    gunsLeftToCompensateDueToChangeInVersionX: 123,
    winnerOfEventX: true,
    didSomethingFunky: Date.now(),
    noOfEnemiesKilledBeforeVersionY: 123,
    inventory: [{id: GENERIC_ITEM_ID, quantity: 123}, {id: uuid, name: "Ultimate Weapon", damage: 10000, someUniqueProp: {doThis: true, uses: 123}}],
    flaggedForCheatingInThisOrThatArena: true
Etc etc. The biggest reason though is just the convenience of adding some state to the player when developing new features. It would be a complete pain having to add schemas for every single situation, even if it's standardised for all players. Now I just add it to the player object and it's saved automatically to the player document.

Not saying what I do is best practice or anything but it enables me to move fast without having to do much DB admin stuff.


Mongo is great for certain use cases. And if it ain't broke don't fix it.


> Mongo is great for certain use cases.

People keep saying this but I haven't seen such a case yet.


Is MongoDB the leader in NoSQL? I stopped using them years ago because of scaling issues. With things like Elasticsearch and DynamoDB, I see no reason to use MongoDB anymore.


It depends on whether you need it to be free as in free beer.

Couchbase is an enterprise grade NoSQL database, it scales very well, and it has a full SQL-like query language, and a lot of large companies use it.

The community edition is free as in free beer and it is also open source. For enterprise use, you can buy a support contract, and there is an enterprise edition.


Curious, why the downvotes?

I wanted to also add that it makes a difference if you only want one server or not. A good small Couchbase cluster has at least 3 nodes.

If you have VirtualBox, puppet and vagrant, this makes it ridiculously easy to bring up a small cluster:

https://github.com/couchbaselabs/vagrants

Note: you still have to use the UI to configure the cluster but this gets the nodes (VMs) up and running quickly.


There are many types of databases and MongoDB remains one of the top choices for document-stores. "NoSQL" is a meaningless term and is yet another point underscoring how better marketing always wins.


Actually, MongoDB is web scale.

https://www.youtube.com/watch?v=b2F-DItXtZs


Elasticsearch is a cool search engine and really excels at some workloads that mostly fall into the realm of analytics and search, but it has its own set of very strict limitations (for example it sucks at updates, cannot do proper joins,...) which make it of limited suitability as a general purpose data store.


Based on the job posts on SO it is. But it's decreasing in popularity quite strongly comparing to Cassandra for example. http://www.reallyhyped.com/?keywords=cassandra%2Credis%2Cmon...


I tried to add SQLite to the graph and it is at the bottom. We could question the relevance of this tool!


Well, it does say "reallyhyped.com" not "reallyused.com".

Joking aside, SQLite is on a ton of devices, yes. But that's primarily because it ships with OS's and is in lots of embedded devices. For web development and other server-side things, I get the feeling that SQLite is not as heavily used as client-server databases.


People usually don't get jobs maintaining SQLite databases. That doesn't mean SQLite isn't used by everyone; it's mostly used in client apps, though.


What about Redis?


Whenever I have talked to people choosing Mongodb, it always came down to two things:

1. Not knowing about the Postgres JSONB column type (which behaves pretty much like MongoDB)

2. Not knowing about Amazon RDS as a single click hosting option for Postgres.

Most people run their own MongoDB servers... which is what gives the impression of "easy to get started".

And here's the other truth as well - People painfully move back to Postgres after losing data. Not because of "oh we can now afford to do SQL" or "oh our data model now needs Postgres". But plain and simple - they lost data that they can't admit to in public.


1. Hardly fair, considering that Postgres JSONB is only 2.5 years old, and the marketing fanfare around Mongo had largely petered out at that point. I've been involved in three companies which use mongo, and all of them selected it before JSONB was in Postgres.


I'm not quite sure what time has got to do with it. I don't believe I mentioned that this was more than 2.5 years ago.

I'm talking contemporary companies - MongoDB still holds far higher startup mindshare than Postgres. The MEAN/MERN stack is still a thing.


Whenever I see a young gullible engineer, all twinkly-eyed and bedazzled by the MongoDB hype (or any other NoSQL hype), I'll send him a link [1] to a rather interesting talk by a very thoughtful Turing award winning engineer named Michael Stonebraker. It takes a couple of days for the charm of NoSQL to wear off, but eventually they'll always see how they have been fooled into believing without a sane level of critique.

[1] https://youtu.be/KRcecxdGxvQ?t=2072


Bah. Stonebraker invented this marketing strategy in the first place. I heard his companies make all sorts of insane claims - from Postgres, to Vertica, to VoltDB. After the first one, I didn't bite on the second and third, only to have some random manager overrule me, and then spend millions migrating off of these proprietary databases later (Vertica and VoltDB).


(Author here) As part of this three part series, MongoDB’s CTO, Eliot Horowitz, was gracious enough to spend two hours chatting with me.

In our discussion, we touched on 10gen’s early marketing strategy, which I've combined with notes from my research:

- 10gen’s Marketing Focus: Eliot noted that much of 10gen’s marketing message was meant for large enterprise CTOs and engineers who made database decisions. If you’re trying to build a database company, this is where most of the money is. But many startup engineers I knew didn’t realize this, and seemed to think the message applied to them. 10gen’s explicit focus on sponsoring hackathons and targeting startups also encouraged these issues.

- Anger on HN: Eliot and I disagreed where some of the MongoDB anger on HN comes from. In his view, a fair bit of it stems from competitors and their supporters. In my view, much of the anger came from 10gen’s fanciful marketing message that outstripped the product in the early days. I believe that if the marketing message had been more thoughtful and the product more mature when the marketing ramped up, the community anger would have been much less (but it likely would have significantly hurt MongoDB's adoption, which is a challenging problem).

- 10gen’s Marketing “Strategy”: Eliot argued that 10gen didn’t have much of an early marketing strategy. In my view, their marketing team made some smart decisions that really set them apart from competitors. First, MongoDB’s Javascript DSL, JSON data store, and onboarding experience were critical differentiators - and their product was early to market. Their marketing team then used this as they built the MongoDB user groups/conference network, pitched NoSQL and the MEAN stack as the future, and brought industry allies to their side.

- Engineers and Marketers: We debated how much role CTOs should play in dev tool marketing. At 10gen, Eliot noted that he was rarely involved, instead focusing on engineering and product. My own view is that engineers should be involved in the marketing message in highly technical products - and have some input into marketing. I also believe that marketing a database is very different from other tools in the engineering stack such as a frontend framework.

Much of this debate stems from the differential objectives of engineers (making the right decisions for their teams) and marketers/founders (convincing customers to use your product, sometimes at any cost).

I let Eliot know that I would support him in sharing any follow-up thoughts, with the hope of spurring a thoughtful debate in our community.

Finally, I won't argue that 10gen's marketing and product were enough on their own to explain the growth amongst startups - the usability of MongoDB was a key reason for its success in startups. Less discussed also was the marketing undertaken by training programs, bootcamps, and conferences (see the marketing around MEAN that inundated Hacker News and Reddit in the early-mid 2010s, as NodeJS was growing).


In his view, a fair bit of it stems from competitors and their supporters

LOL no. 99% of it comes from actual, working DBAs and others with production responsibilities.


"Default settings" and "DBA with production responsibilities" should never coexist in the same post. Since >50% of the complaints are about "default settings", you know the rest.


Also lost to history is 10gen's original intent of being more of a complete platform, with Mongo just being the storage layer. Business Insider ran on that platform. Some details in https://www.usv.com/blog/10gen

(A friend of mine was VP Eng there and was trying to recruit me to work on the larger platform, but as Mongo started gaining traction they dropped the other parts to focus on that)


Hey, just an FYI, there's a typo in the first sentence:

> In 2013, 10gen — the company behind MongoDB — moved into a large 30,000 square foot office in NYC's SoHo neighborhood

That should be midtown, not SoHo (the previous office that they moved from in 2013 was in SoHo, but I'm pretty sure that one was not 30,000 square feet).


Thanks so much, fixed!


>"The Marketing Behind MongoDB"

>"Countless NoSQL databases competed to be the database of choice. MongoDB's marketing strategy helped it become the winner."

Of course, of course!

Because if the strategy was based on MongoDB's scalability and security...


I have always wondered how much of what I see on Hacker News is "sponsored content." Not that there is anything wrong with it.


There is a lot wrong with not declaring sponsored content as such, so much so that it's illegal in many countries.


Many popular tech blog posts are written to promote something - the company that wrote it, the tech that's used, an engineer looking for exposure, etc.


The other question is how many upvotes are "sponsored" upvotes.


A similar success story, albeit a bit earlier in the hype cycle, is Realm.

They started with a lightweight DB engine for embedded systems named TightDB. Getting traction in that space is hard, so they pivoted to become a "mobile database", taking a page out of the MongoDB playbook:

- they built a slick wrapper for iOS and Android to make it extremely simple to get started

- they wrote a fantastic set of tutorials to get started quickly

- they started doing lots of events to grow their community

And they've succeeded in getting quite a lot of companies to use their product, by making something that is fun for developers to use (at least in the beginning).


I feel like a lot of the issues people have with NoSQL or MongoDB come from trying to fit the relational model and SQL way of doing things into MongoDB. MongoDB is a document database. So the things you did in an RDBMS won't work in a document database. Time to think different.

Some of the things I've seen people complain about:

JOINS - should be done in the application (see the sketch at the end of this comment)

SQL - one of the big issues I have with RDBMSes is that application logic lives in both the database and the application, causing all kinds of brittle spaghetti code. If you need that code to live outside of the application, then it's time to build a web service around it or just have a shared library all the apps use.

Reports - should be done in the application, could also create a view in Mongodb and pull report data from that.

Schemas - RDBMS schemas are a huge pain; their restrictiveness causes all kinds of pain when you want a dynamic data store. The best thing about MongoDB's collections is that every document in a collection can have a totally different schema. If you want to do that in an RDBMS you end up hacking it by shoving JSON or XML data into fields. You're basically creating a document database but doing a horrible implementation of it.

ORMs - if you're using an ORM then you're already going down the path of using a NoSQL database. ORMs are hacks for RDBMSes because mapping your data model to SQL is a pain. With MongoDB this is all built into the driver and database; that's why MongoDB Inc creates drivers for all major languages. Plus most drivers have attributes you can decorate your models with that keep things flexible when the structure changes, so nothing blows up.
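
To make the JOINS point above concrete, a hedged sketch of an "application-side join" with the Node driver - two round trips and an in-memory map (collection names invented):

    // app-join.mjs — untested sketch
    import { MongoClient } from 'mongodb';

    const client = new MongoClient('mongodb://localhost:27017');
    await client.connect();
    const db = client.db('shop');

    // Fetch the orders, then batch-fetch their customers and stitch them together
    const orders = await db.collection('orders').find({ status: 'open' }).toArray();
    const customers = await db.collection('customers')
      .find({ _id: { $in: orders.map(o => o.customerId) } }).toArray();

    const byId = new Map(customers.map(c => [String(c._id), c]));
    const joined = orders.map(o => ({ ...o, customer: byId.get(String(o.customerId)) }));

    console.log(joined.length);
    await client.close();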


> JOINS - should be done in the application

Ok... Also tell me, transactions should be done in the application as well?

Great! Next time, I'll do transactions and joins in the application, and discard the 35+ years of experience and performance improvements that any modern RDBMS has for joins and transactions.

> The best thing about Mongodb's collections is that every document in a collection can have a totally different schema.

In which sense is this a good thing?

> ORMS - if your using an orm then your already going down the path of using a NoSQL database. ORMs are hacks for RDBMS because mapping your data model to SQL is a pain.

The ORM simply allows easier interaction with the RDBMS when using an object-oriented programming language, mainly for moving data from tables to objects. The ORM is explicitly for RDBMSes; the "R" in "ORM" stands for Relational.


I start with MongoDB on every single one of my side projects because it’s so easy to get started.

No setting up permission or schema.

Just works.


From reading the comments on HN every time these MongoDB threads come up, I'm beginning to think -- and I really hope I'm wrong about this -- that a lot of people started using MongoDB who had basically no prior experience in using relational DBs. If true, that is just insane. I guess I take it for granted that programmers have at least some computer science background and are familiar with the relational algebra, ACID, etc. It kind of saddens me.


As we all know, these marketing strategies work well only for the initial bubble. In the long term, it's just going to be whoever provides the best content.


Refreshingly honest.



Survival? A planned IPO?


OK, for everyone describing issues with RDBMSes: take Postgres, create a table with an id and a "data" jsonb column, and you will have a NoSQL solution with Nx the performance and Yx the features.
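
i.e., something like this (untested sketch):

    // nosqlish.mjs
    import pg from 'pg';
    const pool = new pg.Pool();

    await pool.query(`CREATE TABLE IF NOT EXISTS docs (id bigserial PRIMARY KEY, data jsonb NOT NULL)`);
    await pool.query(`CREATE INDEX IF NOT EXISTS docs_data_idx ON docs USING GIN (data)`);

    await pool.query(`INSERT INTO docs (data) VALUES ($1)`,
      [JSON.stringify({ kind: 'note', tags: ['quick', 'dirty'] })]);

    // @> is jsonb containment and can use the GIN index
    const { rows } = await pool.query(`SELECT data FROM docs WHERE data @> '{"kind": "note"}'`);
    console.log(rows);

    await pool.end();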


Can somebody point me toward the 'also a marketing strategy for mongodb' comment?

I came here to like it


I think the only strategy would be: give for free to the developers, charge big bucks from the enterprise.


A lot of their growth is due to a lack of mature ORMs. If you're using Django it's super easy to use Postgres. If you're using Node you're out of luck. So for certain ecosystems MongoDB can be easier to work with due to the available libraries.

I personally much prefer Postgres though.


Django's ORM, especially with built-in migrations, is great.

Just don't think it doesn't have any gotchas. It also can't do JOINs (with more than one element)


Summary: learn the relational model of data management, as set out by Edgar F ‘Ted’ Codd almost half a century ago, and enjoy inoculation against hype for all your life.


Except all the hype around CockroachDB


Even their logo reminds me of Mongo's. I wonder if it's intentional (a spoof to be precise).


If only Elixir Phoenix supported it :(


I can't tell if this is a serious comment or not, but there is no reason why you can't use Mongo with Phoenix.

For example- https://tomjoro.github.io/2017-02-09-ecto3-mongodb-phoenix/



