
MySQL is to SQL like MongoDB to NoSQL - geal
http://use-the-index-luke.com/blog/2013-10-01/mysql-is-to-sql-like-mongodb-to-nosql
======
zamalek
While a lot of people people have disagreed with this guy in the past, I find
it very hard to disagree with him in most of his posts.

My exposure, as far as SQL goes, has been MsSQL, PostgreSQL and MySQL. With
absolutely loads of MsSQL, and pretty hairy data structures at that (they have
an aversion to the polyglot approach where I work) - SQL is one of the most
enjoyable things that you can do once you get past the elementary "SELECT
WHERE" (and stop using graphical development tools).

I have been saying what Markus said for a long time, MySQL is why RDBMS has a
bad name and why the "not using SQL" movement exists. I really hope the
MariaDB team spend their time where it is needed most (a join engine that
doesn't suck), if they haven't already.

I mean, at the end of the day you have MySQL that can't even do hash joins,
and then you have PostgreSQL with GEQO:
[http://www.postgresql.org/docs/9.0/static/geqo-pg-
intro.html](http://www.postgresql.org/docs/9.0/static/geqo-pg-intro.html)

~~~
xd
You should know that the fork of MySQL, MariaDB, addressed hash joins well
over a year ago: [https://mariadb.com/kb/en/block-based-join-
algorithms/#block...](https://mariadb.com/kb/en/block-based-join-
algorithms/#block-hash-join)

~~~
VLM
Unfortunately thats only a year ago. I believe MySQL added transactions over a
decade ago and you still run into people who insist MySQL doesn't have
transactions, because five years ago they read something written seven years
ago that was years out of date at that time.

You just don't see behavior like that outside the database community. Hey did
you know that BIND doesn't support IPv6? Yeah version 4.9.3 doesn't support
ipv6. Don't bother me with facts like version 9.0 was released more than
thirteen years ago in 2000.

So once every installed and reasonably security patched mysql server in the
world supports hash joins, the world has at minimum a decade of people who
have no idea what a hash join is, and no idea how or why to use it, none the
less demanding that mysql doesn't support it.

(Bitter? Yeah maybe a little. You can get good advice about OS, or compilers,
off the internet. But overall the internet is useless for database advice, a
couple stars don't outshine the greater cesspool)

~~~
Buttons840
I have long said: "MySQL is the PHP of databases."

In my experience, the database is capable, but the database itself has some
quirks (whether these are due to the database itself or the developers I do
not know). Some things I do know for sure:

\- I worked on a product which supported MsSQL, Oracle, Postgres, and MySQL.
Only MySQL compared character columns in a case insensitive way which caused
us some extra work, but we did fix this using the proper collation.

\- I worked on a Django project. When using MySQL, a Django ORM query would
pull the entire result set and hold it in memory while you iterated over it,
sometimes consuming gigs of ram if we weren't careful. Iterating over a result
set from Postgres would pull (presumably) only one result at a time. Whatever
the case, it didn't cause memory problems like MySQL. This is not MySQL's
fault, but again, it was the exceptionally problematic database.

\- I'm currently working with a MySQL database which is quite troublesome
because foreign-keys are not enforced. See a non-null foreign-key? You might
expect it to refer to an existing record, but you'd be wrong. Again, failure
of the developers, not the database.

\- I also am currently working with a MsSQL database. We had to add a column
to a MsSQL table; no locking, no problem. Then I had to add a column to a
MySQL table, which locked the entire table for the length of the operation.
MySQL was, again, problematic where another database was not.

There are other problems I have only heard of, but I will not mention those.
All the above are personal experiences where MySQL was causing trouble where
other databases were not. It parallels my experience with PHP, both are great
tools if used right, but they seem to be misused more often then other
technologies.

~~~
astrodust
MySQL has been my go-to database for years because it was fast, reliable and
effective. Unfortunately it hasn't been evolving as quickly as other
databases, Postgres in particular.

There's a lot of quirks in MySQL that were excusable a decade a go, but now
they're problems that've been solved in virtually every other database engine.
Working with Sybase was never a pleasant experience, but versions a decade old
have features that MySQL _still_ doesn't have.

Every so often you'll hit the wall with MySQL, little things like applying a
LIMIT in a subselect, or row length limitations, that will make you wonder why
you're using MySQL at all.

MySQL is absolutely the PHP of databases.

~~~
xd
Be aware that when using text or blob types, these only contribute ~12bytes to
the total row length limitation of 65kb.

~~~
astrodust
Actually that's what I thought, but it's not always the case. Sometimes a good
chunk of the TEXT type column is saved in the row itself (~768 characters),
and when the length exceeds that, the remainder is stored in the blob area of
the table store. ([http://www.mysqlperformanceblog.com/2011/04/07/innodb-row-
si...](http://www.mysqlperformanceblog.com/2011/04/07/innodb-row-size-
limitation/))

This depends on your row format, an obscure technical issue most never have to
deal with, but it can be a problem.

------
leif
I may be biased because I work on it, but I think TokuMX solves or will soon
solve all of the "big data scaling" problems that exist in MongoDB.

There is definitely still room for polyglot strategies: Zookeeper is for tiny
but crucially consistent data, Riak is for fancy distributed systems
availability guarantees when you can afford a simple data model, and I believe
Redis has value as a sophisticated programming model for things you can fit in
RAM (but I actually have no Redis experience personally).

But in the past few months, I've grown to be really impressed with the
document model, and the aggregation framework is getting more and more
powerful with each release. I think that's the important part of MongoDB
(sharding's a little messy and Riak seems to have their heads on straighter
with that), and TokuMX takes that and sands off the rough edges you see with
big data sets and concurrency, and I think that's going to end up dominating
MongoDB and being a really compelling point in the NoSQL space.

------
leokun
People like to give MongoDB shit, but is actually pretty fun to use. I
wouldn't use it as a "big data platform." At scale I'd use Cassandra. For
relational data I'd use postgresql. For memory caching, Redis. So when do I
use MongoDB? For prototyping. Why? Because it's fun to use.

~~~
threeseed
Sorry but "fun" or "developer productivity" aren't allowed.

You have to have a database that is expensive, obtuse or difficult to manage
to be taken seriously.

~~~
alrs
"Developer Productivity" that leads to "ops people refuse to work here" isn't
a stellar strategy.

~~~
Diederich
Do 'ops people' tend to not like MongoDB?

~~~
alrs
Despise it.

~~~
Diederich
I've been nothing but an 'ops guy' for 20 years now, and I've been
enthusiastically using MongoDB for my 'devops' work since 2009.

Having said that, I've never used MongoDB 'on the large'. I've always used
MongoDB against datasets that mostly fit into memory, and use some other
solution for 'bigger data'.

I suspect most of the MongoDB dislike is rooted in three points: 1\. The
'Mongo People' really over-hyped it early on. 2\. The decision to not have
single-server disk safety (but providing very elegant multi-server data
safety) was a poorly understood decision. I personally agree with their
direction. Your data isn't really, really safe until it's on more than one
system. 3\. Based on what I've read, I don't think MongoDB scales up (even
with their built-in sharding) as well as some other solutions. Perhaps this is
limited to some kinds of use cases, and not others.

In my opinion, for people who do day to day fast development with dynamic
languages, like myself, MongoDB, used correctly, is really fantastic, at least
for many use cases.

------
taspeotis
At work we have an MS SQL Server instance managing ~1TB of data spread among
various databases.

No one database is > 200GB so it's not "web scale" by any means but sometimes
you need to pull out tricks like indexed/materialised views.

I use Transact-SQL directly via ADO in C++ and ADO .NET in C# and indirectly
via EF and NHibernate.

It's easy to manage schema with SQL Server Data Tools and diagnosing a poorly
performing query is straightforward with graphical execution plans.

Quite frankly, I don't care that there are alternative RDBMS' or alternatives
to traditional RDBMS.

But ... I keep hearing about problem after problem with MySQL. What's the
trick to using it successfully? Is the barrier to entry too low, and problems
are from legions of rank amateurs? i.e. is it as simple as constructing your
schema thoughtfully, with tables that are "well-it's-almost-3NF" and making
some educated guesses about which indexes might be needed in advance?

~~~
raverbashing
MySQL is a good database for basing a CMS on. (for a loose definition of
database)

PostgreSQL is a good SQL database.

And don't take me wrong, I'll take MySQL instead of PostgreSQL for simple and
medium projects.

"Is the barrier to entry too low, and problems are from legions of rank
amateurs?"

Maybe. Well, "low barrier to entry" is not a defect.

~~~
cperciva
_MySQL is a good database for basing a CMS on. (for a loose definition of
database)_

It seems to me that the success of MySQL can be tied directly to people using
"database" for things where what they really need is "filesystem".

~~~
bad_user
How easy is it to put a lock on a file while writing on it? What happens in an
environment where you have multiple frontend servers that need to write to the
same file? Are you going to have like one server that does the reading/writing
from a file and have like a protocol with which other servers can send
commands?

Great, you've just invented a half-assed DBMS.

~~~
lmm
> Great, you've just invented a half-assed DBMS.

My first thought is "Yeah, we already have one of those, it's called MySQL"

------
TazeTSchnitzel
MySQL is also generally poorly "designed". MySQL is to a database as PHP is to
a programming language.

~~~
MagicWishMonkey
I've heard lots of complaints about mysql, do you know if query deadlocks are
more common with mysql than with something like postgres? I've been running
into that a lot lately, and I don't understand what the hell is going on. None
of the queries I'm executing have particularly long run times (usually a few
milliseconds, occasionally 100ms), but I still get random deadlock errors from
time to time. I've never run into this on any other database platform (most of
my experience is with Oracle and SQL Server), and I have no idea what the hell
is going on.

~~~
VLM
We also have no idea whats going on with your system. I ran into this twice
that I can recall. Once was a pathological SELECT FOR UPDATE where it was
locking up way too much stuff, and another was a time where a boring looking
update was locking a whole bunch of seemingly unrelated indexes.

So for the two examples I ran into all I can say is make your transactions as
small as possible and take a close look at ALL the indexes especially if you
think they're unrelated.

There's an option for logging the intimate details of deadlocks into the log
file (or error file or whatever it was) anyway the main PITA is having to
restart the server to enable it. On second thought the real PITA is
operational in the "watched pot never boils" tradition as soon as you enable
it the intermittent problem seems to go away, just dumb luck.

~~~
MagicWishMonkey
The query in question is an UPDATE mytable SET flag=1 WHERE ID IN(...)
statement. I'm assuming that another system was in the middle of a performing
a SELECT statement on the flag column.

In this particular case it's perfectly ok for the select statement to get
slightly incomplete data (e.g. returning a record with a flag that might have
just been changed), so it would be nice if I could configure the table so that
select statements never lock. Is it possible to do something like that?

~~~
VLM
The first stack exchange when I googled was pretty interesting. For
completeness, if you use innodb as your engine

SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

Seems to be the line you are asking for. There's the usual way to make it a
system default in the config file. As for the difference between what you are
asking to do, and what you should do, well... be careful. Especially if you're
feeding the dirty read back into something that could lead to an update. That
could be icky.

There's some way to do myisam stuff.

Generally I've found if you've got two things trying to lock one thing,
overall processing goes a lot faster if one greedy dude grabs it for the whole
thing. Cooperative multitasking doesn't work if it spends way more time lock
switching than actually processing. Of course greedy processing is an
excellent way to jam things up if you're not careful.

The funniest discovery I ever made about that related to importing zillions of
rows with autocommit off it was dramatically faster rather than demanding an
I/O store. This was the kind of data where if someone yanked the power cord it
would be auto-re-imported without any loss; if your only copy of the data is
in RAM and the INSERT you just ran, then autocommit off might be unwise.

Your IN (...) if ... is a select that might none the less be similar to my fun
time a couple years back with dueling indexes. This is the origin of the whole
never index more than you have to meme because it doesn't just waste storage
and write time, an "innocent" looking pathological index intended to make some
report run faster can kill read performance if its touches too many things
other indexes touch while "separate" processing is happening. If you don't
know A leads to B, B leads to C, C leads to D, then why does messing with A
kill D performance when D should be totally separate, ah forgot about that
peculiar linkage...

Note that none of this is a mysql problem. You can shoot yourself in the foot
with a common 9mm or a weirdo caliber. More people get hurt by the 9mm not
because it inherently magically hits feet more often but because theres just
more people using it. I'm quite sure you can screw up Oracle or DB2 the same
way if enough people try hard enough. I guarantee if you google you're going
to find more "mysql did something weird" stories than any other DBMS simply
because more people use mysql.

~~~
MagicWishMonkey
Thank you for helping me, this is great.

------
xd
[http://www.xaprb.com/blog/2013/10/01/mysql-isnt-limited-
to-n...](http://www.xaprb.com/blog/2013/10/01/mysql-isnt-limited-to-nested-
loop-joins/)

~~~
JunkDNA
I note for those who don't get to the bottom, there are comments there that
suggest the author of the post you linked is incorrect. I'm not qualified to
know who is right.

~~~
xd
Both comments are speculative.

Are people, in general, aware that MySQL employs a pluggable storage engine? I
wonder if much of the confusion about MySQLs abilities stems from arguments
from people aware of it against those that have only ever used the stock
engines.

~~~
debacle
Most people use the default storage engine, without really thinking about it,
but almost every project I've worked with recommends innodb - not sure why.

~~~
VLM
There's no surprise it became the default storage engine instead of myisam
around early ver 5.5 or so, it is superior to myisam in many ways.

Personally I like the row level locks and the way it can enforce foreign keys,
which myisam can't do.

------
mattkrea
Why is it that when it comes to databases no one is unbiased?

I cannot find any reliable articles on the web concerning database engines
that I look at and trust the author. It's even more difficult considering I am
far from a pro with databases but quite simply I won't use any MS products and
so I've used MySQL for smaller projects. I've started using MongoDB a lot more
lately and regardless of what people (mostly people who've never used it I
would imagine) say about it I love it.

------
monstrado
Comparing apples to oranges really, but I suppose it's more accurate if you
take into account all the people who are (or could be using) a relational
database without much issue and then switching over.

The article is right though, I've heard scaling MongoDB into the hundreds of
gigs to terabytes is an absolute nightmare for the operation team, the
software is actually very cool when dealing with reasonably sized data that
has an elastic structure.

There are other "NoSQL" databases out there that pay more attention to scale,
like HBase. My HBase cluster is just over a TB compressed, consistently
churning 6k requests a second, without an issue.

------
pmelendez
> "In my eyes, MySQL has done great harm to SQL because many of the problems
> people associate with SQL are in fact just MySQL problems"

That's a very strong and subjective statement. If any, resources heavy
databases like Oracle or PostgreSQL, had brought more users to NoSQL than
MySQL. That's doesn't there is something wrong with those databases, only that
they weren't the right tool for the job is some cases.

Also "MySQL problems" are most of the time due to poor usage, not to the
database system itself, actually when used properly MySQL/MyISAM is a great
tool

~~~
TylerE
That is pure FUD. Postgres is no more resource heavy than MySQL (the stock
config will only use 32MB of ram!), and will perform better on many real world
workloads.

~~~
lmm
I think it's not resource heaviness per se so much as high latency (and low
user-friendliness) in a dev configuration. Postgres takes noticeable time to
start up, both server and client, and the client feels less responsive; its
commands are also more arcane (e.g. mysql's "show tables" is something like
"\d"). Postgres is quite possibly better for a "production" configuration, but
it's much slower to develop with, so developers get to thinking of it as
"slow".

~~~
TylerE
I've never noticed latency issues.

Edit: Timing

Running the client: 12ms Server Startup: <400ms (due to the way OS X services
work hard to time directly) Server Shutdown: 925ms

Those are on a 3 year old Mac Mini - hardly a stud machine.

Why are you using the command line client anyway? You know pg_admin exists and
is free and runs everywhere, right?

------
wcummings
I think a lot of the NoSQL hate comes from people who don't understand the
use-case, and don't build High Availability/Low Latency systems. A lot of the
things I work on require pre-summarized data to provide fast response times at
scale (aggregating normalized data would be too slow) and a NoSQL DB (not
necessarily Mongo, I prefer Couchbase for most things) with a simple but
flexible data model that gives more control to the developer Just Makes Sense.

~~~
ddorian43
is couchbase advancing? they have no release in a long time(excluding the
mobile_stuff) ?

~~~
wcummings
Last release was 2.1, in June afaik

------
threeseed
You would think someone who claims to be an expert on databases would have a
clue about his industry.

MongoDB is a document database and is as different from almost every other
NoSQL database as it is from every SQL database. It very much stands alone and
requires your domain model to be structured in a particular way. To say that
it "represents" NoSQL is ridiculous. And to act like a guide on how to scale
MongoDB to store 100GB is a problem is also ridiculous. I can create domain
models that almost every SQL database would struggle with that MongoDB could
breeze through and vice versa.

And seriously anyone claims Cassandra or Riak are misspent adventures are
simply delusional. They are solving real world problems in particular around
horizontal scaling today that could never be done as cheaply or easily as
before. A master-master cluster that costs nothing, scales linearly and can be
managed by a developer. Would love to know what product existed years ago that
could do that.

Likewise I take exception with the criticism of MySQL. It is an easy to use,
manage and install and has the best tooling bar none. It does what it is
intended to do perfectly. Some people who deal exclusively with ORM layers
will never see the imperfections and just see the ease of use.

~~~
taspeotis
> I can create domain models that almost every SQL database would struggle
> with that MongoDB could breeze through and vice versa.

I am always interested in learning about models that SQL/"traditional relation
model" can't easily (or less easily) represent or query. Last time I asked
someone pointed me to Datomic's EAVT (entity-attribute-value-time) model as a
good example.

Do you know of any more than EAVT?

~~~
mcphilip
One classic example is tree/graph structured data.

I've worked extensively with modeling and querying medical concepts and
relationships in RDBMS. I realize there are tools like recursive common table
expressions an materialized paths that can aide querying such data, but now
that I'm working at a different job using neo4j, I can see how much simpler
the medical informatics domain could be modeled and traversed in a graph
database.

------
sergiosgc
If the article assertion is true, then what is the Postgresql of NoSQL? In OSS
RDBMSs, Postgresql quickly rose to the position of best designed, less quirky
contender (albeit waay slower than MySQL in the past).

Is there a NoSQL equivalent?

~~~
asdasf
>(albeit waay slower than MySQL in the past).

That isn't even remotely close to true. Single threaded "insert 10,000 rows"
benchmarks had postgresql slightly slower, not "waay slower". Concurrent
access benchmarks have always had postgresql faster than mysql.

~~~
sergiosgc
I'm old. When I say in the past, I mean pgsql 4 ;-)

~~~
asdasf
I'm old. I know that there was no postgresql 4. The first release of
postgresql was 6, and what I said was accurate as of the release of version 6,
in 1997 or 1998.

~~~
sergiosgc
[slow bow, with flamboyant hat salute] :)

I never expected the very geek joke to be understood. Very cool!

On a more serious note, I didn't mean to support the ever old myth that
postgresql is slower than mysql. It hasn't been the case for over ten years,
and all the while being a better database in every aspect.

However, go back enough and it was slower (than mysql) for real-world loads.
You could get near enough if you disabled fsync, but then were negating many
of the advantages of pgsql. Even then, in the old days, table based locking
would kill a real database. Table based locking was solved early enough ('98?
'99? around that time). Fsync was solved with WAL, in the performance effort
of the early 7.x series. It was around late '2000 that I came back to
postgresql for good. Until then, it was either mysql for a quick hack or
oracle for anything serious. Since then, it's postgresql for everything (I
don't deal with the scenarios where Oracle still leads).

------
mbroberg
So then does this mean that PostgreSQL is to SQL like CouchDB is to NoSQL?

------
BlobbleBlab
I can tell you from experience that joins are also slow in Oracle Database
Enterprise Edition. Joins are slow.

A nicely normalized relational data model has many advantages, but speed is
not one of them.

------
Roboprog
Maybe that explains why Oracle is pushing MySQL, then. See? Open Source DB
bad, must get the real thing from Oracle!

Or just use PostgreSQL :-)

------
jsemrau
No it's not . Mongo sucks. MYSQL is a great tool.

~~~
cnlwsu
Not even Apple/Google threads have this much fanboisms and useless comments.

