
Goodbye MongoDB, Hello PostgreSQL - YorickPeterse
http://developer.olery.com/blog/goodbye-mongodb-hello-postgresql/
======
jgrahamc
As a greying developer I am most amused by people discovering that 'old'
technologies like SQL databases work really well.

The only useful piece of advice I can give a younger developer is... be
careful when drinking the newtech koolaid.

And one more thing:

    
    
    star = Sequel.lit('*')

    User.select(:locale)
      .select_append { count(star).as(:amount) }
      .select_append { ((count(star) / sum(count(star)).over) * 100.0).as(:percentage) }
      .group(:locale)
      .order(Sequel.desc(:percentage))

just makes me want to cry. Learn SQL rather than wrapping it.

~~~
YorickPeterse
It's not so much about not wanting to write/understand SQL (both are still
very much required), but about composability. If you want to re-use bits of a
SQL query written as a string literal, your only options are string
concatenation or some kind of string builder/template system. In both cases
there's little validation of the query's syntactic correctness until you
actually run it.

While I agree that many ORMs go too far or, even worse, fail to implement
certain powerful features, Sequel (and similar tools, I imagine) strikes a
nice balance. The particular examples I gave are not super useful on their
own, but we already have quite a few queries that are composed/re-used
without having to concatenate strings.

So tl;dr: it's about composability, not being "lazy" or "ignorant" to SQL.
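
To make the composability point concrete, here is a toy sketch in plain Ruby
(NOT Sequel itself - the class and method names are made up for illustration)
of why returning query objects instead of strings allows safe re-use:

```ruby
# Toy immutable query builder (plain Ruby, not Sequel): every call
# returns a new object, so base queries can be shared and extended
# without touching strings.
class Query
  def initialize(table, wheres: [], orders: [])
    @table, @wheres, @orders = table, wheres, orders
  end

  def where(clause)
    Query.new(@table, wheres: @wheres + [clause], orders: @orders)
  end

  def order(clause)
    Query.new(@table, wheres: @wheres, orders: @orders + [clause])
  end

  def to_sql
    sql = "SELECT * FROM #{@table}"
    sql += " WHERE #{@wheres.join(' AND ')}" unless @wheres.empty?
    sql += " ORDER BY #{@orders.join(', ')}" unless @orders.empty?
    sql
  end
end

# One base query, reused by two call sites - no concatenation in sight:
active = Query.new(:users).where("active = true")
recent = active.order("created_at DESC")

puts active.to_sql # SELECT * FROM users WHERE active = true
puts recent.to_sql # SELECT * FROM users WHERE active = true ORDER BY created_at DESC
```

Sequel's datasets work on the same principle (each chained call returns a new
dataset), which is why fragments can be validated and reused without string
surgery.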

~~~
mbreese
How often do you _really_ reuse bits of a SQL query? SQL is the language that
an ORM generates. The question is: how much work is done to avoid using SQL,
and is it worth it?

The only time I've found ORMs useful for composing queries is when I have to
dynamically create a query at runtime based upon user input. And even in that
case, today, I'd probably still just concatenate strings into a proper SQL
statement.

~~~
collyw
I think there is a problem of tooling in SQL, which means it is difficult to
reuse it.

I spent the other day writing a big query, joining about 10 tables. So usually
I start by joining the two main ones. Check that gives me the results I want.
Add in another table or conditions, check again. Repeat until all tables are
joined into the query and conditions are added.

Then I noticed the results of my GROUP_CONCAT were not as I expected. I had a
couple of suspect joins that I tried removing. Same problem. In these
situations, it is often easier to start from scratch, going through the same
process of adding one table at a time and ensuring the query still works at
each step.

We need some kind of unit test equivalent for SQL.

~~~
johan_larson
Mature relational databases support views, pre-built queries that developers
can define and other developers can then use as "tables" in higher-level
queries, allowing some code reuse. This has been around a long time.

------
morgante
This post reflects an interesting technical narrative of companies switching
off MongoDB to more traditional relational databases as they grow.

Importantly, I don't think that's an indictment of MongoDB. Instead, it
highlights the key advantages of NoSQL: ease of use and rapid iteration.

When you're first working on a project, MongoDB is very easy to slap in. You
don't even have to create tables/collections. As you iterate, you don't have
to constantly be updating schemas as your product direction shifts. Thus,
MongoDB is perfect for early startups.

As a company scales, of course, its warts start to show. The very attributes
(like schemaless storage) that were an asset when just a few developers were
hacking on the project become a drawback as the company grows in size.
Reliability becomes more of a concern.

Naturally, people shift to a more traditional database. Personally, I'm almost
ready to switch some of our core functionality from MongoDB to Postgres. This
isn't an indictment of Mongo though.

~~~
distantsounds
Really, you can't spend the 10 minutes designing a table structure in a SQL
database? And now you have to spend months re-inventing the wheel because you
wanted an easy out?

This post reflects developers being lazy instead of doing it right the first
time around. Oh no, you have to log in to the DB and run a CREATE TABLE
statement every few months when you need to scale. Cry some more.

And even then, 'lazy' is subjective. I don't find SQL that hard to implement.
There's a library for practically every language you can think of. What's your
excuse?

~~~
xyzzy123
Actually, I think that's a great way to think about it: NoSQL is the "dynamic
typing" of the database world.

Put another way, it's like "what? you couldn't spend 10 minutes declaring
types everywhere?" - yeah, it's less robust, yet dynamically typed languages
remain popular.

My excuse: When I'm just past the mock stage, and still playing with what UI
functionality should be, sometimes I just want to get some JSON persisted. I'm
changing the shape of the data a lot, discovering the schema as I build. The
persistence is frankly a technical nuisance I wish I didn't have to think
about.

Sometimes, the result is "good enough" and I don't need to go through the
ceremony of gluing in an ORM.
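
For what it's worth, at that stage the persistence can be a few lines of
plain Ruby. A minimal sketch (the class, file path, and record shapes are all
made up for illustration):

```ruby
require 'json'
require 'tmpdir'

# A minimal "just persist some JSON" store: no schema, no ORM,
# and records are free to have different shapes while prototyping.
class JsonStore
  def initialize(path)
    @path = path
  end

  def save(records)
    File.write(@path, JSON.generate(records))
  end

  def load
    File.exist?(@path) ? JSON.parse(File.read(@path)) : []
  end
end

store = JsonStore.new(File.join(Dir.tmpdir, 'prototype.json'))
store.save([
  { 'name' => 'event A', 'extra' => { 'dress_code' => 'casual' } },
  { 'name' => 'event B' } # different shape - fine at this stage
])
p store.load.size # 2
```

Once the shape of the data settles, the same records can be migrated into
real tables.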

~~~
alttab
Your given excuse makes you sound more like a naive amateur than a pragmatic
architect.

~~~
asmosoinio
Just in case GP cares: to me it did not sound like that at all. I think
making prototypes and starting to build the thing is the key to getting to a
good architecture.

------
IgorPartola
> Another way of handling this is defining a schema in your models. For
> example, Mongoid, a popular MongoDB ODM for Ruby, lets you do just that.
> However, when defining a schema using such tools one should wonder why they
> aren’t defining the schema in the database itself.

Bah. It's like they didn't know that schema-free data stores mean "there is no
schema; different objects may have different fields". This is the whole point
of MongoDB: you assume the responsibility of managing the schema. That's a
"feature": you get greater flexibility by assuming more responsibility.
Whether it's a useful feature, I won't say.

Their second reason is much more valid: MongoDB is not consistent/durable,
and all that. It's good for a cache, but not for long-term data.

Here's a third reason I'd give against it as your primary data store: it's
expensive. You have to keep your entire dataset in RAM, but that's not always
necessary. My favorite example is from Foursquare: they had every single
check-in ever made, in MongoDB, in RAM. That's absolutely unnecessary, and
quite silly. Old check-ins are archived data; you don't need them. No user
ever wants to know when/where they checked in three years ago. This is why at
the time they were paying for 68 GB RAM Amazon boxes instead of 4-8 GB boxes.
(I have no idea what they do now. I remember chatting with them on HN after a
catastrophic out-of-memory failure when they filled up the entire 68 GB.)

~~~
colinhowe
We have a dataset much larger than RAM (600 GB vs 60 GB of RAM). It's the
working set that really matters. Accessing data outside the working set can
be slow - unless you have SSDs :)

Regarding consistent/durable: during the past four years, we've not had any
problems on this front that weren't caused by us. We had one issue that was a
misconfiguration on our part, where we allowed writes to a server we had
pulled out of a replica set. We also ran out of space on the logging volume
once and that caused downtime - but we didn't have log rotation or anything
set up.

In general, we've found the failover very reliable and new primaries have come
online without any problems.

That said, schemaless is both a blessing and a curse. Now that Postgres and
MySQL have online alter built-in I'd possibly choose one of them if we were
starting everything again.

~~~
IgorPartola
Well, SSDs aren't nearly as fast as RAM, even today.

I think we are in agreement about the working data vs all data. I am saying
that in most applications your working dataset is much smaller than your total
data set. So why pay for hardware capable of holding your entire dataset in
RAM when you don't need it?

I am surprised you are able to do this with Mongo. Last I checked, it simply
did not handle this case, and started failing miserably if it was not able to
fit all data into RAM.

Re: durability: I am not talking about the server going down and coming back
up. I am talking about whether there is an fsync() when writing data. Set up
a test case where you are writing data very rapidly to MongoDB, then pull the
plug on the box it's running on. It'll come back up, but the data it told you
it just wrote won't be there, because it didn't checkpoint. Did you check that
all your writes succeeded when you had node failures in your cluster? Most
applications don't have the machinery to do this because generally the state
necessary to check this is stored in the database, yet it's the database you
are testing. The only way to test this is to also write logs (also
atomically), and then verify DB data against logs. Or, just use a database
that guarantees durability.
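
The log-then-verify approach described above can be sketched in a few lines
of Ruby (purely illustrative - the hash stands in for the real datastore, and
the names are made up):

```ruby
require 'json'
require 'tmpdir'

# Sketch of log-and-verify: append each intended write to a log and
# fsync before acknowledging, then later diff the log against what
# the database actually holds.
class WriteLog
  def initialize(path)
    File.delete(path) if File.exist?(path)
    @path = path
    @io = File.open(path, 'a')
  end

  def record(id, payload)
    @io.puts(JSON.generate('id' => id, 'payload' => payload))
    @io.fsync # don't acknowledge the write until the log entry is on disk
  end

  # Returns the log entries the database never durably stored.
  def lost(db)
    File.foreach(@path).map { |line| JSON.parse(line) }
                       .reject { |e| db[e['id']] == e['payload'] }
  end
end

db  = {} # stand-in for the real datastore
log = WriteLog.new(File.join(Dir.tmpdir, 'writes.log'))

log.record(1, 'hello')
db[1] = 'hello'
log.record(2, 'world') # pretend the DB lost this one when the plug was pulled

p log.lost(db).size # 1
```

The point is the machinery: without an independently durable log, the
application has no ground truth to verify the database against.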

Re: consistency: MongoDB doesn't support transactions [1]. That's enough to
exclude it from a large number of applications. Anything to do with money, for
example, is out since you really don't want double spending to be a thing.

[1] http://docs.mongodb.org/manual/core/write-operations-atomicity/

~~~
colinhowe
Fair points - we don't allow failed servers back into replica sets, and we
rely on writing to multiple nodes, instead of the disk, as the source of
truth.

This may not suit everyone, and it absolutely does not suit financial
transactions. You can bend Mongo to do it using additional collections and
money-movement logs... but why bother when it's simpler to use
MySQL/Postgres?

------
pizza234
There is a mistake in the article, due to the OP not knowing an arguably
basic notion about MySQL.

> when defining a field as int(11) you can just happily insert textual data
> and MySQL will try to convert it.

This depends on the SQL mode, which is quite flexible. For example,
STRICT_ALL_TABLES will prevent strings from being inserted into INT fields:

    mysql> create table example ( `number` int(11) not null );

    mysql> insert into example (number) values ('wat');
    Query OK, 1 row affected, 1 warning (0,00 sec)

    mysql> SET sql_mode = 'STRICT_ALL_TABLES';

    mysql> insert into example (number) values ('wat');
    ERROR 1366 (HY000): Incorrect integer value: 'wat' for column 'number' at row 1

There are certainly advantages in choosing PostgreSQL over MySQL... this is
just not one of them :-)

~~~
solutionyogi
Personally, I would not use a database which is 'lax' by default instead of
'strict'. What other choices have they made that I need to learn about, or
that will bite me hard in production?

~~~
pizza234
It's not so easy.

MySQL has a large legacy of being used as a very-immediate-although-somewhat-
toy database at its roots.

For example, in absolute terms, I would find much more troubling the usage of
non-transactional tables, justified by meaningless microbenchmarks, which has
been somewhat common for some time.

Nowadays MySQL is definitely reliable, and it has a much more expert
surrounding culture than the past, so if a person/company is willing to put
time and knowledge, it's a reasonable choice.

I don't find automatic conversion so damaging that people should stay away
just because it's the default.

When you reach some level, you definitely need to have a relatively intimate
knowledge of your tools, and at such point, one is far from the "defaults".

------
bgentry
_In the future we might also move our Rails applications over to Sequel, but
considering Rails is so tightly coupled to ActiveRecord we’re not entirely
sure yet if this is worth the time and effort._

Actually, with modern versions of Rails, using Sequel in place of ActiveRecord
isn't bad at all. Nothing in Rails is really tied to ActiveRecord anymore.
There are dependencies on ActiveModel, but you can easily make Sequel::Model
objects conform to that interface. sequel-rails helps with most of it:
https://github.com/TalentBox/sequel-rails

It's really just the migration that's difficult, as that's tough to do
piecemeal and requires good test coverage.

Here's a recent side project of mine that I switched from ActiveRecord to
Sequel pretty quickly once I remembered how extremely limiting the querying
capabilities of ActiveRecord can be:
https://github.com/bgentry/portfolio-api

And if you really want to keep the option of schemaless data storage,
Postgres can now do that with better performance than MongoDB, while keeping
full indexing capabilities:
http://blogs.enterprisedb.com/2014/09/24/postgres-outperforms-mongodb-and-ushers-in-new-developer-reality/

~~~
YorickPeterse
While certainly possible, we simply haven't evaluated it in depth yet.
Between releasing a bunch of upcoming features and upgrading Rails from 3.2
to 4.2, I'd rather hold off on _also_ moving from ActiveRecord to Sequel for
the time being.

~~~
bgentry
yes, by all means upgrade yourself to the latest release of Rails before
attempting to go "off the rails" :)

------
aravan
The article is sort of weird - why expect a document/NoSQL/schemaless DB to
behave like a schema DB? One fundamental fact: you liked MongoDB in the early
days (the first 5 years) because it was damn easy to handle anything you
threw at it, and you learned to build the business around it. In 5 years you
learned enough, the business model matured, and the data model stopped
changing much. Now development has matured, you have a team to work with, and
your worries center on consistency. You might want to go back to SQL. Now you
know what schemas/tables you need - string lengths, data types, constraints,
relationships, etc. - because you learned all that over 5 years. You no
longer need to deal with new developers screwing up the document DB (like
storing an ObjectID as a string in a few places).

Developers who read the article, beware of this bias: you can look back 5
years now, but you can't look forward 5 years. Don't waste time dealing with
database columns and schemas; build the business faster instead. I've found
MongoDB, or any document DB, to be a good fit for agility.

Thousands of businesses move to document DBs because they are schemaless.

While working on a document DB, you should be a master at writing stand-alone
scripts in Python/Perl/Ruby to run every time you break the structure or need
to fix inconsistencies.

edit: fixed typo errors
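
A sketch of the kind of stand-alone fixup script described above, in plain
Ruby over hash documents (the field names are made up; a real script would
iterate a Mongo cursor instead of an array):

```ruby
# Normalize documents where an ID ended up stored as a string in
# some records and as an integer in others.
def normalize(doc)
  doc = doc.dup
  doc['user_id'] = Integer(doc['user_id'], 10) if doc['user_id'].is_a?(String)
  doc
end

docs = [
  { 'user_id' => 42,   'name' => 'a' },
  { 'user_id' => '43', 'name' => 'b' } # inconsistent: ID stored as string
]

fixed = docs.map { |d| normalize(d) }
p fixed.map { |d| d['user_id'] } # [42, 43]
```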

------
super_sloth
Is there anybody here who has run MongoDB at moderate scale with good results?

As in a few terabytes of data, >10k ops/second territory.

I've been really disappointed with its reliability and performance in
situations where I've been around that.

~~~
quackerhacker
I've been using Mongo on 39M+ records (tracking financial tick data) across
73 assets (collections), and my queries take anywhere between 2-3 minutes
depending on complexity.

You can always run db.currentOp() in the mongo shell to see what process is
taking forever as well.

Let me clarify why, though: there are methods of optimizing a query by adding
another field, but since I have to traverse my records with the sort()
cursor, my queries take that long.

~~~
Thiz
Jesus Christ!

I remember querying databases with 100 tables and millions of records in
foxpro a century ago and it took less than a second.

What has happened to the world while I was in cryogenic state? Take me back to
the nitrogen pool!

------
code_duck
I had a business based on another company's API years ago, and they started
changing a lot of their systems to work with Mongo. The data lost a few
useful points, like IDs, and the speed and reliability didn't seem to
improve. My thought at the time was: how about you guys just use PostgreSQL?
My comment on the mailing list about that seemed to be taken as an insult or
naïveté. However, sure enough, two or three years later they said they had
all sorts of problems with Mongo and were switching to… Postgres.

------
esilverberg2
The author's assertion that "Another problem with MySQL is that any table
modification (e.g. adding a column) will result in the table being locked for
both reading and writing. This means that any operation using such a table
will have to wait until the modification has completed." is no longer correct
as of Mysql 5.6:

http://dev.mysql.com/doc/refman/5.7/en/innodb-create-index-overview.html

If you specify ALGORITHM=INPLACE,LOCK=NONE you can alter table without
blocking reads and writes. We have used this method successfully in Amazon RDS
when updating schemas.

~~~
troels
It's not exactly a common operation either, so basing the choice of rdbms on
it seems a bit arbitrary.

~~~
jeltz
It happens a couple of times every month in our production databases. If it
required locking tables, we would have to schedule downtime during the early
morning every time it happens, which would have been a pain.

------
DonnyV
As soon as you have more than one app talking to your database, it's time to
either create a common library that talks to the database and can be shared,
or wrap the database in a service. This has nothing to do with NoSQL vs SQL;
it has to do with architectural composition.

Plus, your problem with checking if a field exists is because Ruby doesn't
support property attributes. This is easily solved in C# using attributes,
which you can fake in Ruby:
http://stackoverflow.com/questions/1085070/how-do-i-fake-c-sharp-style-attributes-in-ruby

Data consistency is what your models are for. No data should be inserted
before being assigned to a model.

Again it sounds like you needed a shared library or just a service wrapped
around your database.

------
jhoffner
Something that I need to call out here: in talking about how to handle the
lack of fixed schemas within Mongo, the OP refers to using if/else blocks to
get around field name changes. This shows a severe lack of understanding of
how to work with Mongo at a fundamental level. If using Mongoid (which he
is), you simply need to run `Post.all.rename(title: :post_title)` to do the
schema change. This may seem like a nitpick to some people, but honestly, if
that concept was lost on the developer, then it's clear that there is little
to no credibility to be had in the reporting of all the other Mongo issues.
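
For readers unfamiliar with it, a bulk rename amounts to the following,
sketched here over plain hashes (Mongoid's `rename` does the equivalent
server-side via MongoDB's `$rename` update operator, without pulling
documents into the app):

```ruby
# Move a field to a new name in every document that still has the
# old name; documents already renamed pass through unchanged.
def rename_field(docs, from, to)
  docs.map do |doc|
    doc = doc.dup
    doc[to] = doc.delete(from) if doc.key?(from)
    doc
  end
end

posts = [{ 'title' => 'old style' }, { 'post_title' => 'already renamed' }]
p rename_field(posts, 'title', 'post_title')
# [{"post_title"=>"old style"}, {"post_title"=>"already renamed"}]
```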

I'm glad that the OP is growing as a developer and starting to understand the
merits of SQL and why having defined schemas is important. SQL is awesome and
it's not going to be replaced by NoSQL (at least not fully). However, please
make sure that you are building the foundation of your understanding by fully
grasping the technologies that you are already using.

------
yawz
> ... we value the following:

> Consistency

With all due respect, you should have had this list before selecting MongoDB.
Consistency - both "eventual consistency" and consistency in the sense you
describe - is better supported outside the NoSQL group.

Also, I don't fully agree with the "schemaless" discussion. The moment you
think about your data, you build a schema in your head, which, then, is
translated into code etc. It's about how far you take this rigid model. Maybe
you should see it as "flexible schema".

------
tannerj
So let me ask a question: what should I use when I do need a schemaless
database? Is NoSQL never the answer? I've got a project that needs to allow
clients to create registration forms for different events that my company
hosts. A lot of the registration data will have a defined schema, e.g. name,
email, address. I feel like that stuff should go in an RDBMS, but all the
event-specific stuff needs to be schemaless. I know I can do custom key/value
tables in an RDBMS, but that doesn't feel right either. Is MongoDB useless as
a database, or are people being bitten for thinking it's a silver bullet and
throwing it at every problem?
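
One common middle ground for this registration use case (all names here are
illustrative) is to keep the well-known fields as real columns and serialize
only the event-specific answers into one JSON column - which, in Postgres,
could be a queryable and indexable jsonb column. Sketched with a plain
struct:

```ruby
require 'json'

# Fixed, schema'd fields live as real attributes/columns; the
# free-form event-specific answers live in a single JSON blob.
Registration = Struct.new(:name, :email, :extra_json) do
  def extra
    JSON.parse(extra_json || '{}')
  end
end

r = Registration.new('Ada', 'ada@example.com',
                     JSON.generate('tshirt_size' => 'M', 'diet' => 'vegan'))

p r.extra['tshirt_size'] # "M"
```

This avoids generic key/value tables while keeping the stable fields
constrained and queryable like any other column.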

~~~
Ixiaus
Riak is amazing and actually scales. But I would only use Riak for high-volume
data storage (similar to S3).

FoundationDB looks great, I haven't used it yet but they appear to have their
heads on right.

PostgreSQL, in its newer versions, has an indexable JSONB data type, so you
can get much the same behavior from Postgres as you do from Mongo, but with a
true RDBMS along with it, a dependable storage engine, etc...

Postgres is harder to scale horizontally though - if you have really high-
volume data writes, you should be using something else for that.

I typically use PostgreSQL for all of my highly "structured" data and Riak for
high-volume and "flatter" data (Postgres also often serves as an index _into_
those objects).

~~~
tannerj
Awesome, thanks for the reply. My use case is barely out of the "toy app"
range. We only do a handful of events each year, and they only draw around
100 attendants. We're talking a very small amount of data. When you do the
PostgreSQL and Riak combo, is it ever on the same/related dataset? What are
you using at the application layer? I'm building this in Rails and I feel
like it would be better to store the structured fields in MySQL and the
variant data in something NoSQL, but I haven't read much about using
ActiveRecord with two different persistence layers. That's interesting about
PostgreSQL with indexable JSONB. I've only ever used MySQL; I really need to
check out Postgres.

~~~
Ixiaus
It's easier to go from something more highly structured to something looser.
Start with a relational db and let it grow then pay attention to what data
gives you the most scale pain and try to move that out to Riak / Cassandra /
etc...

Don't prematurely scale, just pay attention to your metrics, scale vertically
first, then tackle the very specific pain points.

------
imfletcher
While I've not spent much time with the "NoSQL" products, it's mostly because
I've decided to stay with SQL. The bottom line is that most humans think of
data the same way SQL thinks about data. When you talk to people outside of
engineering (i.e. management or customers), they expect the product to be
able to do things that SQL does naturally (normalized data for easy changes
to things like 'user name', joining and sorting on a variety of cross-cutting
properties, etc.). You can always make both do the same thing eventually, but
one will fight you more than the other....

------
Tyguy7
I did a similar migration last year. Loving postgres.

------
alttab
This post outlines some of the most expensive parts of deploying, operating,
and maintaining an application that operates on "NoSQL" databases like Mongo,
and in my experience, DynamoDB:

1) Implicit Schemas. We avoid this completely by doing production migrations
and re-indexes for every new field we add. It's expensive, and we have to
write/test/run scripts in production. At times I wish I could just write a
Rails migration on an RDS instance and call it a day. N-1 compatibility isn't
hard to accomplish with good code reviews.

2) Search. Want to search and join like in the old days? Good luck. Using a
NoSQL DB as the primary authoritative store of record is great, but you'll
need secondary indexes for any searches you want to do. If you need to look
up an object by a new field (or even one that already exists in all of your
data) and you haven't built an index for it, you will have to build one.

3) Serving Clients. Because of the schemaless-blobby nature of writing
clients, things get real messy the minute you have multiple services or
applications reading or writing to a DB. To get around this, you have to put a
service in front of it and serve the data from some RPC technology, which is
an extra step and requires more development and maintenance.

4) Administration. There isn't a long history yet of robust toolsets and
"science" behind the different schemaless NoSQL databases. That means which
one you choose has a huge impact on your ability to fine-tune it, debug
consistency or other expectation issues, and do things like proper failover,
backups, restores, etc. Knowledge doesn't transfer as well between similar
NoSQL DBs, so your mileage will vary more than with SQL databases (MySQL vs
PostgreSQL, for instance).

NoSQL has its place and purpose, but it is rarely as the "one database that
rules them all" that many businesses end up with. I'd be interested in counter
stories.

------
st3fan
Yes if you have one MongoDB database that uses `title` and another one that
uses `post_title` then you have to adjust your code for that.

Guess what. Same thing applies to SQL.

~~~
freebullets
In SQL you can't have a table whose title field is called either `title` or
`post_title` depending on which record you're looking at.

~~~
dhatch387
You easily could have this in SQL if you designed a poor schema or failed to
migrate data from the previous field name.

------
jqm
I haven't used MongoDB in production (the comments regarding reliability have
been around for awhile), but playing around with it, I do like the json format
and query structure.

The issue I have with SQL (MS SQL in the case of work) is the amount of
cleverness involved in some queries I have seen. Among the old-timers, it
seems almost a badge of honor to develop the longest, most clever SQL query
that does everything in one step. Inevitably, there are problems, and people
have trouble figuring out why, because they can't debug parts of the
statement in isolation. It's as if they have written an entire program on one
line and can't test its parts separately. No doubt this is abuse of the
language... and admittedly my SQL skills are not world class, but I keep
scratching my head and wondering why they do this to themselves. If a query
is so complex you can't tell what it is doing, and it misbehaves, perhaps you
would have been better off with some smaller queries you join in the
program... nothing against joins... one or two or three of them... but
really....

I keep asking myself if I'm missing something, but I sort of doubt it.

~~~
prottmann
That is then not a problem of the SQL-Language, it is more a problem of the
database design.

When you need really complex queries, you should think about a new db
structure.

~~~
jqm
The largest database is vendor provided so we are stuck (for now).

Thanks for the insight.

------
harel
I switched a new product in planning from MongoDB to PostgreSQL 9.4. With the
new JSONB columns you get the best of both worlds.

------
hbbio
Interesting to read both the article and comments. It's cool to hate Mongo
nowadays but we use it successfully (for PEPS, open source:
[https://github.com/MLstate/PEPS](https://github.com/MLstate/PEPS)).

The version of MongoDB used is not mentioned in the article. The changes
between versions are a problem with Mongo, but it has improved a lot
recently.

One problem clearly identified is the lack of a model. There are solutions to
this. For instance, with Opa ([http://opalang.org](http://opalang.org)) we
use strong static typing to generate a database model and guarantee that the
whole application sticks to it. That leaves out model changes, but there are
solutions for that too.

Also, there is no need to have whole collections in RAM, but clearly having
enough RAM for the "working set" helps.

In the end, Mongo is no magic, and neither are SQL databases, which have
their share of problems too.

------
remon
I made a little bet with myself when I saw the topic of this post that a good
80% or more of the top comments would be of the "Shouldn't jump on the
MongoDB/NoSQL bandwagon, nothing wrong with SQL" variety combined with various
arguments to support that position or arguments against NoSQL in general or
MongoDB specifically. I won.

Here's the thing; anyone that claims one is superior over the other without
adding a specific use case or context is doing something akin to religious
preaching. Databases are tools. Pick the tool you need for the job. If you
don't understand exactly what a hammer is for and how it does what it does put
effort into gaining that understanding. If after that research you think the
tool has flaws that are important to you then don't use it.

------
tapirl
There is only one NoSQL database that satisfies all of the following
requirements: 1. easy to use, 2. reliable, 3. transaction support, 4. easy to
scale, 5. easy to build indexes.

It is BigTable.

Generally, if your data is not very large, you should use a SQL database.
NoSQL is mainly used for easy scaling, not for its schema-less feature.

------
neunhoef
Interesting article. Did you consider any of the newer NoSQL solutions that
offer a data model similar to MongoDB's but with more (often configurable)
consistency guarantees, like ArangoDB or RethinkDB? I could imagine that the
effort of a migration would have been considerably smaller...

------
ahachete
As the post explained, one of the big problems of migration off of MongoDB is
finding out the "effective" schema of the data, and then migrating it
(according to that schema). By "effective" I mean the resulting structure of
the data stored in the database.

I think it would be very helpful to use ToroDB here
(https://github.com/torodb/torodb). While being MongoDB-compatible, it stores
data structured into PostgreSQL tables, automatically identifying the schema
of the data. Then, just by looking at the created tables, you very easily
have the schema (and the data migrated to that schema). It would make
migration easier.

Disclaimer: I am a ToroDB developer

------
neumino
The problem of schemaless databases mentioned in the article is then replaced
with painful migrations.

The trend of going back to SQL databases now is fueled by the "same" poor
reasons people were going for NoSQL databases. You can't have everything...

~~~
bni
I have found that the more schema you have, i.e. the more structure you have
for your data, the easier migrations become, because there are no surprises.

------
_pmf_
Diving in a cesspit makes one appreciate clean air.

------
pacomerh
It's very interesting to see the number of negative articles about MongoDB,
and at the same time see that it's very popular on job boards. I'd honestly
like to read an article about a success case.

~~~
htsh
The successes tend to be quiet. And folks who are happy don't take to the
internet to defend their choices.

I've been on two straight projects where Mongo has been fine - no better or
worse than an SQL database, but certainly nowhere near as bad as one would
assume reading all the negativity around here. I have experience with it
being stable in production in two different contexts (social games, a very
common use case, being one). A search for "social games mongo" should show
you some success stories.

The thing that makes it a good choice for me when starting a new project is
that you don't really think about it, especially if you're following lean
startup methodology, where the goal is to get a product into the hands of
users as quickly as possible and you anticipate changes to your models. Mongo
is pretty fantastic for this. And when things stabilize, moving data really
isn't that bad or expensive if you modeled based on this assumption (which
you should). And now that PostgreSQL has a pretty good JSON datatype, it
presents a pretty painless path if and when you outgrow Mongo.

------
clay_to_n
A related read, titled Why You Should Never Use MongoDB:
[https://news.ycombinator.com/item?id=6712703](https://news.ycombinator.com/item?id=6712703)

------
dcomartin
Beware: "The Problem of Schemaless" can occur with a relational DB too, once
you start defining columns as nullable. You end up with the same problem.

~~~
efuquen
I wouldn't call it the same problem. Something nullable is similar to an
Optional type (e.g. in Scala, Option[String] would map to a nullable varchar).
That is not the same as being schemaless: you at least still have types that
are being enforced, even if the values might be absent.

I would agree that having too many nullable fields could indicate a problem
with the schema, i.e. you should break things up into more tables.
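The Option analogy can be sketched in Python's typing terms (an illustrative
stand-in for Scala's Option; the function and names are made up):

```python
from typing import Optional

# A nullable varchar behaves like Optional[str]: the value may be absent, but
# when it is present, it is still guaranteed to be a string. A schemaless
# field, by contrast, could hold a string, a number, a nested document...
def greeting(nickname: Optional[str]) -> str:
    # The type forces an explicit check for the "empty" case, just as a
    # nullable column forces a NULL check.
    if nickname is None:
        return "Hello, anonymous"
    return "Hello, " + nickname.upper()  # safe: nickname is definitely a str
```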

~~~
dcomartin
Agree. I'll rephrase to "similar" problem.

------
xgil
I also keep reading about how bad MongoDB is for lots of projects. I don't
think MongoDB is worthless, but from my experience I also have concerns about
it.

If so many people migrate to other technologies, why is MongoDB still so
popular? Have a look at [http://db-engines.com/en/ranking_trend](http://db-engines.com/en/ranking_trend);
MongoDB has just passed PostgreSQL.

------
jbergens
They don't say whether they looked at other databases. I think there are a lot
of alternatives that might have worked better for them than MongoDB, and even
Mongo 3.0 might have worked better. Regarding the schemalessness for simple
new properties, they should probably have fixed that in the ORM layer; it can
add default values if needed. That something is null can happen in SQL as
well.

------
solidsnack9000
> For example, when defining a field as int(11) you can just happily insert
> textual data...

My eyes practically fell out of my head when reading this.
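For comparison, this behavior isn't unique to MySQL: SQLite (without its newer
STRICT tables) is just as loose, as this sketch demonstrates. The table here is
hypothetical; SQLite merely stands in for MySQL's default non-strict mode.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER)")

# Like MySQL in its default non-strict mode, SQLite accepts a string in a
# column declared INTEGER: no error, not even a warning.
conn.execute("INSERT INTO users (age) VALUES ('twenty-five')")
conn.execute("INSERT INTO users (age) VALUES ('25')")

rows = [r[0] for r in conn.execute("SELECT age FROM users ORDER BY id")]
# 'twenty-five' is stored verbatim as text; '25' is silently coerced to the
# integer 25 by SQLite's type affinity rules.
print(rows)
```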

------
barce
The section, "The Problem with Schemaless," blames the technology instead of
whoever put the data in there in the first place.

If the code has to handle both page.title and page_title, this is a feature of
using a schemaless technology.

Also, lots of the issues the author had with MongoDB are also to be found in
MySQL, e.g. taking hours to recover from a corrupt database/datastore.

~~~
acveilleux
It's just that if you have a sizable team (or worse, one spread over a long
time with low overlap) working on a product, you intrinsically get an
accumulation of keys that are duplicated, deprecated, or both in your
schema-less schema.

With a DB that enforces a schema, the overhead of modifying the schema tends
to moderate that tendency.

I once worked in a lab (Ph.D. students are atrocious programmers BTW) where I
introduced a schema-less store (cheesy K/V store) to handle some mundane
metadata caching on some medical imaging. It was intended to store 4-5
attributes per PK; I never touched it after setting it up for what I needed,
but I showed it to colleagues.

Fast forward 10 years and there were something like 3000 attributes defined,
several hundred of which were serialized blobs, with a huge amount of overlap
between the different attributes.

Almost all that because people didn't know what was already in there so they
just did their own thing.

------
cskakun
I'm also moving back to Postgres. Postgres was the first DB I used when I
learned to code PHP. MySQL came later. Just recently, around a year or two
ago, I got introduced to Mongo and all its surrounding hype. Mongo is good for
small apps that need no references and have low requirements for speed and
data integrity.

------
Fiahil
I've been thinking about the same change for a few months. We are running on
MongoDB with a few million entries and around 10GB of data. We use Scala and
ReactiveMongo on the application side, so my main fear is that we'd have a
lot of overhead when trying to migrate to something like "ReactivePostgres".

------
serve_yay
It's very tiring to read things like this. And that is not meant as support
for Mongo, quite the contrary in fact.

~~~
TheOtherHobbes
It's interesting that if you search for MongoDB vs Postgres, you won't find
_any_ articles suggesting you should go from P to M.

Supposedly WiredTiger for M 3.0 will be more everything, including faster and
better.

We'll see.

------
buckbova
The concept of moving from one database technology to another, especially
something like NoSQL to Postgres, sounds like a huge task.

Aside from redesigning the schema/model and changing all the code, what about
new backup/restore procedures? Scaling and performance best practices? Did you
need to hire a postgres expert?

~~~
YorickPeterse
We didn't hire any experts; instead, we educated ourselves on the matter. For
example, one of the first steps we took was to run some rough benchmarks on Pg
to see how it behaved compared to MySQL
([https://github.com/olery/rds-shootout](https://github.com/olery/rds-shootout)).
After that, we mainly discussed PostgreSQL vs MySQL with people who used
either one (or both) in production for a somewhat serious workload.

Backup/restoring is handled by Amazon RDS in our case, in the past we had a
custom backup system for Mongo that backed data up to Amazon S3.

~~~
mtarnovan
Will you publish the results of the benchmark?

------
pmontra
You probably had a replica set and maybe an off-site replica. What are you
using now: a single PostgreSQL instance, a master-slave cluster, or some other
distributed setup? If the latter, which of the many PostgreSQL distributed
technologies are you using? Thanks.

~~~
YorickPeterse
Before (Mongo): 1 primary, 2 secondaries, 2 arbiters.
Now (Postgres): 1 primary.

Most of our applications require write access in some shape or form, so at
least the default replication of Amazon RDS doesn't cut it. Besides that, we
don't really need it so far, and we don't see the need for it in the coming
months either.

~~~
hugopeixoto
One problem I have with postgres right now is that whenever I want to upgrade
the cluster (say from 9.1 to 9.3) there's downtime while the cluster is
upgraded.

Not even vanilla replication helps right now. I think I'd have to use
something like Slony to replicate between different versions of PostgreSQL,
but I've never tried it.

How does upgrading postgresql versions work in RDS?

------
rdtsc
Wonder if they evaluated rethinkdb?

~~~
YorickPeterse
I briefly looked at it, but seeing how young it was (and still is) I'd prefer
not to bet any money on it for now; I'd rather use something tried and proven.

------
spdustin
I feel like I'm missing something. So many comments in the article and here
are along the lines of, "it just issues a warning and not an error."

Am I the only one here who's thinking that this is correct behavior on the
part of the DBMS? Three result codes from an operation: 1) everything is okay,
2) I'm sorry Dave, I can't do that (error) and 3) Okay, if you insist, but I'm
going to change your data to make it work.

Am I the only one who thinks the programmer should be aware of, and respond
appropriately to, ALL THREE, not just 1 and 2? That anything else is just
laziness?

Or is that just me? Am I missing some subtle consideration here that results
in my thought process being naive? If I'm being naive I do want to understand
what I'm missing, because getting schemas right and having my code react when
improper data types are being used is sometimes a pain, ORM or not. But I've
always thought it was the right thing to do.
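One way to picture the three outcomes (purely illustrative; this is not any
real driver's API, and `save_age` is a made-up function) is a result object
the caller has to inspect for warnings as well as errors:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DbResult:
    value: object = None                          # outcome 1: everything is okay
    error: Optional[str] = None                   # outcome 2: refused outright
    warnings: list = field(default_factory=list)  # outcome 3: done, but data altered

def save_age(age):
    """Hypothetical write path that coerces bad input instead of rejecting it."""
    if age is None:
        return DbResult(error="age may not be NULL")
    if not isinstance(age, int):
        # Outcome 3: the store "makes it work" and tells us that it did.
        return DbResult(value=0, warnings=["non-integer age truncated to 0"])
    return DbResult(value=age)

# A careful caller checks all three outcomes, not just error-vs-ok:
result = save_age("twenty-five")
if result.error:
    raise RuntimeError(result.error)
for w in result.warnings:
    print("warning:", w)
```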

~~~
carlio
Personally I feel like 3 should not be an option. Either things are okay or
not okay, I don't like the "sort of okay maybe" option because that implies
lots of fuzzy definitions and boundaries that the developer just has to learn.

Schema says int? Give it an int or fail. Not "it's sort of okay if you give it
a thing which could be coerced into an int according to the database's ideas
of coercibility".

------
snambi
Thank god for the return of sanity.

------
onderkalaci
Who doesn't want NoSQL with ACID properties? Can anyone tell me what is
superior about MongoDB compared to PostgreSQL?

If anybody is going to talk about scalability, there are many good
alternatives to scale PostgreSQL too.

------
atlih
Not that I'm excusing MySQL, but what I do when adding columns to tables with
10+ million rows is create a new table with the new column and then insert
into the new table from the old one.

That turns those few hours into a few seconds.
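The rebuild trick described above, sketched against SQLite (the table and
column names are made up, and SQLite stands in for MySQL; the pattern is the
same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)", [(9.5,), (12.0,)])

# Instead of running ALTER TABLE ... ADD COLUMN against a huge table, build
# the new shape as a fresh table, bulk-copy the old rows, then swap the names.
conn.executescript("""
    CREATE TABLE orders_new (
        id INTEGER PRIMARY KEY,
        total REAL,
        currency TEXT DEFAULT 'USD'   -- the newly added column
    );
    INSERT INTO orders_new (id, total) SELECT id, total FROM orders;
    DROP TABLE orders;
    ALTER TABLE orders_new RENAME TO orders;
""")

row = conn.execute("SELECT id, total, currency FROM orders WHERE id = 1").fetchone()
print(row)
```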

------
neunhoef
For some additional thoughts about schemaless vs. schema-enforcing datastores,
see

[https://gist.github.com/neunhoef/9b6749089775e472d44c](https://gist.github.com/neunhoef/9b6749089775e472d44c)

------
timhon
What system is he using to graph his queries' response times?

~~~
bscanlan
New Relic.

------
jaequery
Care to share your experience in terms of performance of Postgresql's JSONB vs
Mongo? for both selects/inserts? and how about nested search queries?

~~~
wallflower
In general, NoSQL databases are good for storing unstructured data. They start
to fall apart when you want to query that unstructured data. You will hate
yourself if you want to query something that is nested inside something else,
especially if one of the nesting levels is a JSON array. This is because
indexes are required (just like in a normal database). Many times, map-reduce
jobs are required because the queries don't run in a reasonable time. At some
point, because you are storing the entire JSON document, relational databases
start winning the space game (they don't store the name of the column with
every document, like NoSQL databases do).
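The space point at the end is easy to see: a document store repeats every key
name in every document, while a relational row holds only the values. A rough
illustration in Python (the sizes are JSON text lengths, not any database's
actual on-disk format, and the columns are made up):

```python
import json

columns = ("first_name", "last_name", "locale", "created_at")
values = ("Alice", "Example", "en", "2015-03-12")

# Document store: the key names travel with every single document.
doc_bytes = len(json.dumps(dict(zip(columns, values))))

# Relational store: column names live once in the schema; rows hold values only.
row_bytes = len(json.dumps(values))

# The per-document overhead compounds with every row you store.
print(doc_bytes, row_bytes)
```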

------
goshx
Am I alone in thinking that if a programmer writes code that allows a string
to be sent to an integer field in the database, the issue is not with the DBMS?

~~~
troels
Yeah. And since they use ActiveRecord, how would they even go about doing that
in the first place? In all the years I've used MySQL, that has never been an
issue for me. Sure, it has its quirks, but show me a mature piece of software
that doesn't.

------
yoanizer
This may be related:

[https://news.ycombinator.com/item?id=6712703](https://news.ycombinator.com/item?id=6712703)

------
CurtMonash
There are ever more use cases for which relational data models are a poor fit.

For everything else, there are relational DBMS.

------
nmenglund
Does anyone recognize the application used for the metrics screenshots?

~~~
infecto
New Relic

~~~
YorickPeterse
Correct, this is taken from New Relic's APM service.

------
general_failure
I think strict mode turns the warning into an error.

------
michaelochurch
My aversion to NoSQLs is derived from the readiness with which uninformed
people dive into them, integrate them with their products, and create a web of
complexity around something that should ideally be boring and reliable: the
database.

There are so many things that I've heard you "can't do in SQL" that are false,
at least pertaining to Postgres. Semi-structured data, full-text search, "web
scale" programming, geographical indexes... all of these things, people say
"you can't do in a relational database" and that's not true. Postgres is
fucking powerful and can do a lot, very well.

Arguably, PostgreSQL isn't always a "relational database". You can _use_ it as
a key-value store. It just happens that you often _want_ relational logic in a
multi-purpose, long-lived data store. The relational database seems to be an
attractor; the requirements that accrue to a typical in-house "we can do it
better"/NIH non-relational database often converge on it.

NoSQLs have their place at very large scale (100+ TB) and there are plenty of
specialized reasons to use alternative databases-- I doubt that Postgres's
full-text search is competitive with Elasticsearch-- but I feel like most of
the anti-SQL sentiment is against _the language_. And sure, it's an ugly and
outmoded language, but the database is one place where I'd rather have an ugly
language and rock-solid tech than the other way around.

Finally, fuck ORMs.

~~~
Cakez0r
Agree with all of your points. The thing that perhaps annoys me the most is
that people seem to assume that nosql is magically fast and that sql is
slow... Sure, a key-value pair lookup runs like shit off a shovel, but as soon
as you want to run anything more complex than that, it's likely faster with a
relational database and a few indexes...

Usually they're the same people who think running an application on a cloud
platform magically makes it fast, when in reality a VPS would be cheaper and
faster.

~~~
pkolaczk
NoSQL is not a well-defined term - there are huge differences between the
various solutions commonly called "NoSQL", so comparing against "NoSQL" as a
whole is bogus. Different NoSQL systems are designed with different use-cases
in mind. On one side you'll have things like MongoDB, which is easy to set up
but doesn't really scale once your dataset grows out of memory; on the other,
things like Cassandra, which are unmatched by any RDBMS in terms of
performance, scalability, and availability on real-time, transactional
workloads. Of course, your RDBMS of choice might be better at complex JOINs
than Cassandra (which doesn't have JOINs at all, btw), but will it stand a
chance in a performance competition with Apache Spark or Hadoop? I don't
think so.

------
lafar6502
Isn't it a little premature to announce victory? After all, they'd been
using MongoDB for 5 years and it delivered most of the time. Who knows if
this new shiny SQL thing will be sufficient for the next 5 years? I'd say it's
quite likely that after a few rounds of excited development the database will
be slow and crappy again ;)

------
bricss
No comment.. (facepalm) and bad PR.

------
dkyc
MongoDB was never about its benchmarking, scaling or sharding abilities. It
was about two things:

#1: No schema. If you're prototyping, schemas just slow you down. Maybe that
changes as your application matures.

#2: Great drivers. The amount of pain to get a basic CRUD app running with a
SQL backend is just too high: push the data from the browser to the server in
JSON or XML, validate it, convert it into the database schema, sanitize it
using prepared statements or stored procedures, and send it off. Compare that
with MongoDB: _it's just JSON™_.
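The pipeline being described (JSON in, validate, map onto the schema,
parameterized insert) can be compressed into a small sketch; the payload and
table are hypothetical, and SQLite stands in for the SQL backend:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (hotel_id INTEGER NOT NULL, score INTEGER NOT NULL)")

payload = '{"hotel_id": 42, "score": 8}'   # what the browser might send

data = json.loads(payload)                  # 1. parse the incoming JSON
if not isinstance(data.get("score"), int) or not 0 <= data["score"] <= 10:
    raise ValueError("score must be an integer from 0 to 10")  # 2. validate
# 3. map onto the database schema and 4. sanitize via a prepared statement:
conn.execute("INSERT INTO reviews (hotel_id, score) VALUES (?, ?)",
             (data["hotel_id"], data["score"]))

row = conn.execute("SELECT hotel_id, score FROM reviews").fetchone()
print(row)
```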

The one thing that killed MongoDB was lack of reliability. Even when you're a
startup, you may not lose data. I think MongoDB could have easily dropped 50%
of performance & scalability for data security and it would be well off now.
It was never intended to be the safe, sane choice for big enterprises.

