
Mongo DB is web scale - roder
http://www.xtranormal.com/watch/6995033/
======
cscotta
To the good folks at 10Gen, Antirez, the Cassandra project, LinkedIn, Google,
Amazon, and everyone working to advance the state of datastores available for
increasingly specific applications, your work is undeniably important. The
team I work with has been evaluating a handful of emerging datastores for some
applications and is making plans to migrate a few types of data in some of
our systems from one to another. Thank God that we have choice beyond a
standard RDBMS and BDB.

To the armies of bloggers parroting slogan after slogan, ricing benchmarks so
far removed from real-world applications as to place themselves somewhere on
the spectrum between meaningless and malicious in the name of pageviews, well,
that's not helping anyone.

It's great to see people get excited about new technologies. But if
tech/startup culture is one that embraces and celebrates the fail, then we
damn well better also talk about the different areas where many different
datastores are not appropriate choices, and where some of them downright
break, risking extensive downtime or corruption, require unreasonable amounts
of memory for indexes, or multiple machines to provide for durability, or return
inconsistent "postmodern" results, or what have you -- _without_ it being
understood as a personal attack on any one individual, company, or sector.

The productivity of this video is debatable, sure. But perhaps we can
appreciate that it parodies the "magic scaling sauce" image that the "tech
press" (face it, we're it) has given surprisingly young but maturing
technologies. There's no magic sauce, no drop-in answer to "web scale," and
certainly nothing free of difficulty. It's data. If you have a ton of it
and it's valuable, it's worth mountains of expensive programmer time and
effort to ensure its integrity, accessibility, and utility based on the
storage and query requirements of an application.

~~~
EasyCompany
Totally agree. As a beginner programmer, I am totally confused about which
database to use and when, particularly when. I have read so many blog posts
that contradict each other, and I have looked into CouchDB, Cassandra, MySQL,
Neo4j, HBase, Persevere and a few more. I have decided to learn each one well
and judge as my knowledge grows, so I have started with MySQL and CouchDB.

~~~
Devilboy
You should try an RDBMS too :P

~~~
artsrc
There aren't any relational databases doing anything useful, only
non-relational databases, like those based on SQL.

~~~
silentbicycle
To clarify this comment a bit: The conventional RDBMSs people refer to as
relational databases (Oracle, Postgres, MySQL, SQLite, etc.) shoot for
compatibility with the SQL standard, which diverges from the relational model.
For example, you can have duplicate rows in a table, which doesn't make any
sense in the (set-based) relational model. SQL is not "purely" relational.
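
The duplicate-row point can be seen in a few lines. This is only a sketch using Python's built-in sqlite3 module; the `pets` table and its contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; table and column names are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name TEXT, species TEXT)")  # no key, so duplicates are legal

# Insert the same row twice. SQL accepts it; a relation (a set of tuples) could not hold it.
conn.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [("Rex", "dog"), ("Rex", "dog"), ("Tom", "cat")],
)

bag_rows = conn.execute("SELECT name, species FROM pets").fetchall()
set_rows = conn.execute("SELECT DISTINCT name, species FROM pets").fetchall()

print(len(bag_rows))  # 3: the table behaves as a bag (multiset) of rows
print(len(set_rows))  # 2: DISTINCT recovers set semantics
```

Declaring a primary key or a UNIQUE constraint is the usual way to get set-like behavior back from an SQL table.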

If you read Chris Date's books (I recommend starting with _An Introduction to
Database Systems_), he really hammers this point.

There are people who hate SQL because it's too relational, and there are
people who hate SQL because it's _not relational enough_.

------
dmytton
These videos are certainly funny, particularly the iPhone vs HTC one that was
popular a few months ago. But as cscotta pointed out there's an interesting
trend of picking certain flaws in particular databases and highlighting them
without telling the whole story.

Anyone watching this might believe that MongoDB has a major flaw where data
isn't written immediately and you have no idea if it has successfully been
stored to disk. Whilst it's true that inserts are not immediately written by
default, you can a) change the startup config to set the delay time, b) force a
write of all pending changes from the command line, c) force the write from
your call to the insert/update/remove/etc. method in all the libraries, and,
most importantly, d) request that the library method wait and return the
response from MongoDB so you can determine whether the write was successful.

This means you have complete control over when you need fast inserts at the
expense of potential data loss, or when you need to be certain the data has
been written.
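
A rough sketch of that trade-off, in plain Python rather than any real driver: `ToyStore`, its `safe` flag and `lines_on_disk` are hypothetical names invented here, not MongoDB's API. The idea is just that a write can either return immediately or wait until the data has been fsynced.

```python
import json
import os
import tempfile


class ToyStore:
    """Hypothetical append-only store for illustration; not MongoDB's API."""

    def __init__(self, path):
        self.path = path
        self.pending = []  # documents accepted but not yet on disk

    def insert(self, doc, safe=False):
        self.pending.append(doc)
        if safe:
            # Caller asked to wait: flush now and report how many docs were persisted.
            return self.flush()
        return None  # fire-and-forget: the caller learns nothing about durability

    def flush(self):
        """Write all pending documents and fsync the file."""
        written = len(self.pending)
        with open(self.path, "a", encoding="utf-8") as f:
            for doc in self.pending:
                f.write(json.dumps(doc) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.pending.clear()
        return written


def lines_on_disk(path):
    """Count the records that have actually reached the file."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


path = os.path.join(tempfile.mkdtemp(), "toy.jsonl")
store = ToyStore(path)

store.insert({"n": 1})                     # fast, but lost if the process dies now
unsafe_count = lines_on_disk(path)         # 0: nothing has been written yet
acked = store.insert({"n": 2}, safe=True)  # waits until both docs are on disk
safe_count = lines_on_disk(path)           # 2
```

The caller picks per write: skip the wait where losing a record is tolerable, pay for it where it is not.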

~~~
woogley
While you're correct, I don't believe the point was that you had no control
over it. The point was that their benchmarks came from using the delayed-write
setting and compared it against databases that ARE writing to disk. He's
saying that mongo is cheating the benchmark by doing work after the timer has
stopped.
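
To make the "work after the timer has stopped" complaint concrete, here is a small sketch that times plain buffered file writes, not any real database. The function name and record format are made up. It reads the clock twice: once when the last write call returns, and once after the data has been fsynced.

```python
import os
import tempfile
import time


def timed_writes(n):
    """Return (seconds until the last write call returns, seconds until fsync completes)."""
    path = os.path.join(tempfile.mkdtemp(), "bench.log")
    start = time.perf_counter()
    with open(path, "w", encoding="utf-8") as f:
        for i in range(n):
            f.write(f"record {i}\n")            # lands in a buffer, not necessarily on disk
        buffered = time.perf_counter() - start  # a benchmark that stops the clock here...
        f.flush()
        os.fsync(f.fileno())                    # ...never pays for this part
        durable = time.perf_counter() - start
    return buffered, durable


buffered, durable = timed_writes(10_000)
print(f"clock stopped before sync: {buffered:.4f}s")
print(f"clock stopped after sync:  {durable:.4f}s")
```

The gap between the two numbers depends entirely on the machine and filesystem, which is why comparing one system's first number against another system's second number says little.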

~~~
binspace
Maybe cheating is ok. Sure, there is a reason why it's faster. Maybe the
guarantee of the write to disk is not as necessary as some people would
believe.

It's certainly not the same as writing to /dev/null.

Maybe the developer can have other ways of guaranteeing consistency.

------
mrshoe
Ironic that transactions and guaranteed data integrity as performance
tradeoffs are now being used as arguments _for_ MySQL and _against_ those
pesky new up-and-coming open source databases.

~~~
tptacek
Funny: yes. Meaningful: no. The tradeoffs between MySQL and memory-backed k-v
stores are in a different league than the old tradeoffs between MySQL,
Postgres, and Oracle.

~~~
moe
I wouldn't call it a different league.

MySQL had _severe_ data consistency issues before InnoDB came around and even
today, on InnoDB, there is a variety of situations that will cause silent data
truncation or silent data loss.

------
rbranson
tl;dw: 99% of web applications will never need to be "web scale," so quit
worrying and just use an RDBMS.

~~~
psadauskas
99% of web applications will never need relations, so just go the easy route
and don't worry about a relational schema for your data.

~~~
kingkilr
Uhh, in the context of an RDBMS a "relation" doesn't mean a "relationship" in
the sense of a foreign key; it means a relational algebra relation. Saying
most applications will never need relations is like saying most applications
don't use data.

~~~
chopsueyar
Oh Codd!

~~~
old_sound
I'm laughing badly thanks to this comment :D

------
js4all
This is hilarious:

"Why not write to /dev/null? It's fast as hell"

"Does /dev/null support sharding?"

~~~
auxbuss
I choked on that one too, but mainly because it echoes the kind of thing
pseudo-knowledgeable non-tech folk say in real life. At least in my real life.

~~~
js4all
Yeah right. And /dev/null jokes are as old as Unix and always good for a
laugh.

------
mburney
Hilarious video...but it reminds me of the painful fact that I know very
little about databases so choosing between SQL and noSQL for my web app just
seems arbitrary for me. I wish there were some simpler explanations out there.

~~~
dasil003
First learn SQL and the relational paradigm.

SQL represents 40 years of experience designing general data stores that work
for the widest set of applications. SQL databases solve many very hard
problems which you are probably not aware of.

If you jump into NoSQL first you will be reimplementing SQL features in your
application code, and doing a shitty job of it because you have no experience
with data stores that actually solve these problems well.

The reason for the existence of so many NoSQL databases is the rise of web
applications and the need to scale massively. However the majority of apps
will never need to scale beyond a single well-tuned database server anyway. By
the time they do you will have hard problems to solve regardless of what data
store you used. The advantage of SQL is that it's a fantastic hedge on the
evolution of your data usage patterns because it is designed to support ad-hoc
queries well, and the schema prevents bad application code from thrusting your
data into chaos at the first occurrence of a small bug.
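
As one illustration of the schema acting as a safety net, here is a sketch using Python's built-in sqlite3 module. The `accounts` table, its columns and the sample rows are assumptions made up for the example; the point is only that the database itself rejects rows a buggy code path might try to write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
    """
)
conn.execute("INSERT INTO accounts (email, balance) VALUES (?, ?)", ("a@example.com", 10))

# Three rows a small application bug might produce: missing email,
# duplicate email, negative balance.
rejected = []
for bad_row in [(None, 5), ("a@example.com", 5), ("b@example.com", -1)]:
    try:
        conn.execute("INSERT INTO accounts (email, balance) VALUES (?, ?)", bad_row)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

(count,) = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()
print(count)          # 1: only the valid row was stored
print(len(rejected))  # 3: each bad row violated a constraint
```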

Realistically if you knew you had to build an app for 5 million daily users,
and you knew exactly what it was going to do, then an SQL database very well
might be the wrong choice. But in the real world you have a long road ahead
before you hit that scale, and you'll have real data to determine what kind of
alternate data stores can best handle your load. Personally I'm a _huge_ fan
of redis, and its ability to scrape bottlenecks off a MySQL database in a
piecemeal fashion.

~~~
steveklabnik
NoSQL isn't just about scale. Sometimes, modeling data relationally is a real
bitch, and a document store makes more sense.

~~~
mburney
Yeah, the reason I was thinking of using mongo is because I have to write a
simple posting app (i.e. allow for blog or twitter style posts with tags) so
defining a relational schema seems like overkill to me. But I can't be sure.

To the parent: thanks for the edifying comment. I do have some experience with
SQL and relational DBs, but was thinking of using noSQL for some projects.
Your point to thoroughly learn the relational model is well taken.

~~~
foldr
Even here, you might be surprised by how limited some NoSQL stores are. E.g.,
if you have a requirement like "find the first 20 blog posts written by X
after this date."
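
For comparison, that requirement is a single query in SQL. A sketch with Python's built-in sqlite3 module follows; the `posts` table, the index name and the sample data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, posted_on TEXT, title TEXT)"
)
# An index on (author, posted_on) can serve both the filter and the ordering.
conn.execute("CREATE INDEX posts_author_date ON posts (author, posted_on)")

# 30 posts by X, one per day, plus one post by somebody else.
rows = [("X", f"2010-09-{day:02d}", f"post {day}") for day in range(1, 31)]
rows.append(("Y", "2010-09-15", "someone else"))
conn.executemany("INSERT INTO posts (author, posted_on, title) VALUES (?, ?, ?)", rows)

# ISO-8601 date strings sort chronologically, so plain text comparison works here.
first_20 = conn.execute(
    """
    SELECT posted_on, title FROM posts
    WHERE author = ? AND posted_on > ?
    ORDER BY posted_on
    LIMIT 20
    """,
    ("X", "2010-09-05"),
).fetchall()

print(first_20[0])    # ('2010-09-06', 'post 6')
print(len(first_20))  # 20
```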

------
kingnothing
I'm currently using mongodb for a new project at work and the biggest surprise
for me is that there is no elegant solution for doing the SQL equivalent of
COUNT(DISTINCT field). Count exists, distinct exists to return the set to you,
but the combination isn't there.

The only solutions I have found are to check the length of the distinct query,
which takes too long for a large result set, or to write a map reduce function
which takes longer than I'd like and is a large amount of code for
functionality that should already exist in the db.
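
For reference, this is the SQL side of the comparison, sketched with Python's built-in sqlite3 module. The `visits` table and its rows are made up. The second query mimics the "length of the distinct result" workaround, where every distinct value is shipped to the client just to be counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [("ann", "/"), ("bob", "/"), ("ann", "/about"), ("cat", "/"), ("bob", "/faq")],
)

# The database counts distinct values itself and returns a single number.
(in_db,) = conn.execute("SELECT COUNT(DISTINCT user_id) FROM visits").fetchone()

# The workaround: fetch every distinct value and count on the client.
in_app = len(conn.execute("SELECT DISTINCT user_id FROM visits").fetchall())

print(in_db, in_app)  # 3 3
```

Both give the same answer; the difference is how much data crosses the wire when the distinct set is large.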

------
sant0sk1
I hate the fact that I want to upmod this submission.

(I withstood the urge, you probably should too)

------
wccrawford
As usual, use the technology that fits your needs.

If you want a lot of speed and can tolerate the possible (even if unlikely)
data loss, use NoSQL.

If your business requires that your data is guaranteed and always up-to-date
at any moment, then use an RDBMS.

~~~
tszming
Durability is a general concern, not a problem only for NoSQL.

~~~
jchrisa
Some of the NoSQL solutions take durability very seriously, some put it second
to looking good in benchmarks. NoSQL is about choice, and the most durable of
the NoSQL stores are more durable than many of the venerable relational
databases.

For instance, what CouchDB treated as a major bug is the accepted behavior
of many relational databases. (E.g., data isn't lost, but must be recovered via
a long-running process should there be an uncontrolled shutdown.)

Riak and Cassandra also have modes that treat durability as paramount, and
give you better assurances than MySQL or even commercial RDBMS products.

------
sandis
How ironic – an article about "web scale" and a "503 Service Unavailable No
server is available to handle this request." error when I try to open the link
:) (I'm certain that Mongo DB is not the one to blame though)

------
Tichy
Is this some kind of generative cartoon? Plug in a dialogue in text, and it
generates two teddy bears speaking the dialogue?

------
chopsueyar
Funny stuff. How did you make that, and how do you synchronize the mouth
movement with the dialogue?

~~~
studer
<http://www.xtranormal.com/>

"If you can type, you can make movies."

