
Evernote blog: WhySQL? - grifaton
http://blog.evernote.com/tech/2012/02/23/whysql/
======
mapleoin
Does anyone have a link to a decent comparison between MySQL and PostgreSQL?
I'm really wondering why so many people use MySQL, even though it supports a
lot fewer SQL features than PostgreSQL.

~~~
morganpyne
Simple reliable replication has been a huge differentiator for a long time;
enough so to put up with a lot of the other faults of MySQL. Have not
revisited Postgres replication in a long time but I have seen that it has been
worked on. Anyone with recent experience in both care to explain how the
replication of both stacks up in recent versions?

~~~
moe
_Simple reliable replication_

I cringe every time I read that. MySQL replication is many things, but _it is
not reliable_ (as anyone who has used it at scale will confirm).

I think the only reason this myth prevails is because hardly anyone ever
actually verifies if their master/slave are in sync. A table checksum can be a
real eye-opener here, especially on a deployment that's been running for while
and undergone schema changes, restarts, network splits, etc.

------
j_col
Very interesting to see them bucking the trends, I love his closing line:

"But we’re relatively satisfied with sharded MySQL storage for Evernote user
account metadata, even though that’s not going to win any style points from
the cool kids."

Indeed, hipsters beware!

------
opendomain
There are other reasons to choose NoSQL For example, when Craigslist was using
mySql and they had to change their schema, it took MONTHS to facilitate the
change across all their slaves. You can also have a mixed strategy of using
both RDBMS and NoSQL to achieve consistency while being able to be flexible to
architecture changes. Lastly- have you looked at total overal cost? Setting up
a large cluster with mySql will have a large operational cost and it may not
be partition tolerant so if the wrong servers go down, it may cascade to your
whole data store.

~~~
zzzeek
Doing data migrations up front in a NoSQL system means you've changed the
fields around in your document types, then you're done. Now 95% of your
documents are wrong, missing those changes. Maybe you have linkages between
different types of document (akin to a foreign key) - now a lot of those might
be pointing to nothing.

In this model, it's the application's job to anticipate and work around these
inconsistencies. Assertions that check for data integrity, if you have them,
have to be modified to work around this. For a lot of the web applications we
talk about these days, who cares - it's people's lists of friends and TODO
notes. It's simple data and some dangling records aren't going to hurt anyone.

In the SQL world, we instead write migration scripts that migrate the
structure and data all at once. This is a little more work up front, but as
long as you stuck to a mostly normalized form and use great tools (which we
now have) this is not a big deal ("MONTHS" to migrate across slaves sounds
like they had some very wrong decisions made earlier on). The application can
move straight to supporting the new structure and entirely forget that the old
one existed. In this world, we can also have really complex relationships
between fields, like when we're storing accounting or medical data linked to
temporal records of changes. The application can consume this structure
without worrying about consistency.

~~~
einhverfr
I think the problem is that when you change a table structure in MySQL, it
takes quite a bit of time to do the actual writes. Adding a column tends to
force a rewrite of the table. Braindead I know....

I think the problem is that MySQL isn't really a standard SQL-world db. It has
some of the advantages of one, but not all of them and some very annoying
gotchas.

------
dabeeeenster
It helps that they have a perfectly shardable product I guess.

~~~
rmc
Yes, if you're Google (where any page can link to any other) or Facebook
(where a person can friend any other) you wouldn't have this. But lots of
businesses that provide software solutions do usually have something that is
very localised.

~~~
HarrisonFisk
It's a bit funny that you use Facebook as an example since they use sharded
MySQL as their primary data store.

------
zv
tldr - it works, we don't care about "being cool"

~~~
rmc
The true hacker style

------
fkn
Can anyone explain the following bit: "They’re cleanly partitioned into 20
million data separate data sets, one per user."

Does it mean they have a database per user? That can't be right is it?

~~~
driverdan
Why not have a database for each user? Evernote's data is partitioned
perfectly for that. Notebooks and notes are accessible to one user or are
public. There is no sharing notes between users.

~~~
mitchellhislop
There is sharing notes between users though - I have several shared notebooks,
each holding shared notes.

~~~
ThaddeusQuay2
Actually, his idea of a database for each user could still work, even though
there is sharing of data between users. Take each database, and turn it into
an executable object, which reads/writes its own data, and which communicates
with other objects for sharing. It's like taking the actor model of
computation, and orienting it for database use. I don't know of any working
example of where this has been done, but I don't see why it wouldn't be
feasible.

<http://en.wikipedia.org/wiki/Actor_model>

------
trustfundbaby
I get where they're coming from, but I do find this to be a little smug :)

See, they haven't run into problems with their setup, as per, MySql 'just
works' for them.

What would be interesting and educational (for me anyway) would be a situation
where folks that ran into serious problems with their SQL setup despite doing
the 'right things' persevered where conventional knowledge would have them
switch to a NoSql solution.

tldr; Dog bites man article, would love to hear from someone that actually
struggled with a SQL solution and soldiered on.

------
bitdiffusion
The notebook/note example is weak - in a nosql database you need to design
your data structure appropriately to get the level of atomicity you require.

Storing an entire notebook in a single document would be the most obvious. I
use postgres all the time and sql is great, but poo-pooing nosql because it
wouldn't work with your relational structure is not the best idea. Also - I
have found a hybrid between nosql (mongodb) and sql (postgres) is ideal - who
says you need to use a single database?

~~~
bni
What about when you want to find all notes that was made a specific day last
month (say for a report)?

Traverse all notebook documents and look at each notes date? Good luck with
that.

~~~
artsrc
Many nosql databases support queries. If you are using one of these then you
are in a better place for features like that than you are with heavily shaded
SQL.

------
artsrc
If the replication is asynchronous then SQL databases are not durable. So the
most important feature of SQL databases generally isn't one.

------
oacgnol
I can imagine that while Evernote has a lot of data to store, it doesn't have
the massive amount of concurrent reads that might occur with an equally large
web app. Do they publish numbers on read/write usage?

------
EtienneK
The biggest news to me was that they are using MySQL.

------
shingen
It's amazing how when you focus on proven (but supposedly boring or old)
technology that just works, and works very well, you can devote a lot of other
resources to the actual product and usability.

Maybe it's the 30 year old in me showing, but I'm sticking with the 'it just
works' crowd. Until some other approach provides a staggeringly overwhelming
reason to switch. I find scaling up with MySQL to be ridiculously easy,
allowing me to focus my time elsewhere. Ram, bandwidth, and fast storage have
gotten substantially cheaper in the last few years, making it that much easier
and cost effective to throw hardware at scaling up. For 99.9% of the Web,
those hardware resources are expanding in value much faster than traffic is
increasing.

(It's understood other developers find it just as easy to take a different
approach)

~~~
luigi
It's amazing how when you use recent technology that has been designed and
engineered from the ground up for the demands of modern web apps, you can
devote a lot of other resources to the actual product and usability.

~~~
geophile
Modern web apps are applications. Applications, over time, run into many
problems that SQL database systems were designed to address:

* The occasional join.

* Complex queries.

* Reporting.

* Schema changes.

* Transactions, with options for trading off strictness and performance.

* Performance problems that are easily addressed using stored procedures.

* Performance tuning with minimal code changes (e.g. adding an index to change the execution plan of a complex query).

* Enforcement of invariants regardless of application code.

Why would you give these up for scalability problems you won't have, with
99.999999% probability?

~~~
gizzlon
_"Why would you give these up for scalability problems you won't have, with
99.999999% probability?"_

I'm not saying people shouldn't use RDBMS', but there are many reasons for
using something else:

    
    
      - The new databases/stores have different features and use cases. Many include features RDBMS' do not have.
      - Having to use table/column's for everything can be quite unnatural and tiresome.
      - RDBMS' are battle-tested and their pros & cons are well known. But they might also be based on legacy models and truths that simply no longer holds.
      - At least for me, using & learning something new is a big motivation booster :)
    

Use whatever is right for the job. But one should at least have a general
knowledge of whats out there: features, pros & cons etc..

~~~
geophile
"The new databases/stores have different features and use cases. Many include
features RDBMS' do not have."

True. My view is that the tradeoff comes out overwhelmingly on the RDBMS side
most of the time.

"Having to use table/column's for everything can be quite unnatural and
tiresome."

Unnatural in the sense that it is different from the first language you learn,
yes. What I find tiresome is low-level, record-at-a-time programming when
doing what could be expressed concisely as, say, a 5-way join with
aggregation.

"RDBMS' are battle-tested and their pros & cons are well known. But they might
also be based on legacy models and truths that simply no longer holds."

I really think you need to learn something about RDBMS fundamentals. Set
theory certainly "holds" to this day, and is certainly not legacy. The
relational model is clumsy for some kinds of data ( (e.g. time series), but
saying that the foundation is obsolete is just objectively false.

"At least for me, using & learning something new is a big motivation booster"

No argument there. But now take the next step and do an objective comparison
between old and new. I have no problem with NoSQL systems. I even built one.
But for nearly all applications, when you account for a wide set of
requirements, (not just what is needed to get the MVP up and running), I think
an RDBMS is the right tool.

~~~
gizzlon
_"What I find tiresome is low-level, record-at-a-time programming when doing
what could be expressed concisely as, say, a 5-way join with aggregation"_

Again, that depends on what you are doing. Converting back and forth between
rows/tables and, say, data stored in hashes, is tiresome. It's also unnatural
when the data doesn't "fit" in the relational model. Of course it works, but
writing all this banal sql is not fun.

 _"I really think you need to learn something about RDBMS fundamentals...."_

Right.. I was talking about the software. Maybe "models" is the wrong word.

 _"But now take the next step and do an objective comparison between old and
new"_

Believe it's pretty objective already.

------
jbverschoor
That was the worst "why we still use mysql"-post ever.

------
4ad
This article is very weak, they insist a lot on ACID, but those are completely
orthogonal concepts from SQL. Most NoSQL products are ACID.

Also, the example itself is very weak as bitdiffusion below pointed out.

~~~
mapgrep
Good point about SQL not having a lock on ACID. CouchDB in particular is very
proud of its out-of-the-box ACID compliance.

People sometimes conflate the DB access approach (document vs relational) with
the storage approach (transactional vs warehouse). This may be because the
NoSQL poster child, MongoDB, at one point defaulted to a non ACID mode of
operation. But you can have a relational DB that's not ACID (MySQL 3) and an
object DB that is (Couch).

I was surprised so much of the original article focused on ACID as though it
were the biggest selling point for an RDBMS. It seems like the biggest win
(right now) is the sheer number of things a typical RDMBS does for you -- not
just ACID but also data integrity (foreign key constraints), automatic index
creation (mostly), and automated schema changes across many records (ALTER
TABLE). The cost, of course, is the up-front effort of fitting your data and
app to the relational model.

