
MongoGate — or let's have a serious NoSQL discussion - zeit_geist
http://blog.erlang.de/mongogate-or-lets-have-a-serious-nosql-discus
======
muyuu
My big concern with so-called NoSQL solutions is the "culture" that seems to
be brewing there.

If you go to the "Don't use MongoDB" post (
<http://news.ycombinator.com/item?id=3202081> ) you will read some, IMO,
extremely worrying comments from a few pro-NoSQL users including antirez
(Redis).

For some reason NoSQL now apparently means "unreliable datastore for
unimportant, throwaway data" and defaults are chosen accordingly. Why the hell
is that?

NoSQL for me doesn't imply anything other than "no SQL", and at a stretch "no
schema" - this makes a lot of sense for many of us who routinely need to
create databases that are logically trivial. In many cases they are a bunch of
glorified persistent hash tables that usually don't fit in memory. But this
doesn't mean they aren't critical. Why would it have to? This isn't anything
new either; we've had Berkeley DB for a long while. It's just a bit on the dry
side and it may fall short in many cases.
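
To make the "glorified persistent hash table" idea concrete, here is a minimal
sketch using Python's standard-library dbm module, which is in the same family
as Berkeley DB. The key names and the roundtrip helper are made up for
illustration, and the sketch says nothing about durability guarantees under
crashes or load, which is the part this thread is really arguing about.

```python
import dbm
import os
import tempfile


def roundtrip():
    """Write two keys to an on-disk hash table, reopen it, and read one back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "accounts")  # hypothetical database name

        # "c" opens for reading and writing, creating the file if needed.
        with dbm.open(path, "c") as db:
            db[b"user:42"] = b"alice"
            db[b"user:43"] = b"bob"

        # Reopen read-only to show the data outlived the first handle.
        with dbm.open(path, "r") as db:
            return db[b"user:42"]


if __name__ == "__main__":
    print(roundtrip())
```

Keys and values are plain bytes; there is no schema and no query language,
only lookups by key.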

What I was looking forward to and I hoped I could find in the "NoSQL scene" is
an alternative to traditional DBs but without the overhead that many times is
not necessary (but sometimes is, and I intend to continue using PostgreSQL
when appropriate). Ideally, something as simple as mongoDB appears to be
(tried the interactive tutorial).

So when exactly did NoSQL stop meaning "no SQL" and start meaning "unreliable
cache"? Other than the simplicity, I fail to see where it would fit in the
market then (other than the amateur market). There are better, established DB
caching solutions. There are persistence libraries in any moderately popular
language. There are reliable databases that are fast enough when you have the
budget to scale to several dedicated servers.

How about Riak?

~~~
m104
NoSQL has never meant "unreliable datastore for unimportant, throwaway data".
If it did, there would be no need for the MongoDB rant because that poor level
of at-scale reliability would have been understood from the beginning. MongoDB
wasn't marketed as an unreliable data store, so expectations weren't met by
the rant author.

I'm worried about the culture that's brewing as well, but I see it more as an
attempt from some NoSQL supporters to keep MongoDB looking good, even in the
face of serious data integrity issues. The battle lines are forming between
SQL and NoSQL (relational vs. non-relational data stores, really) and there's
a lot of money and reputation at stake. What we don't want is for the facts to
die in a war of rhetoric about the merits of SQL vs. NoSQL. That would be
dumb.

With that said, the first paragraph of the rant is worrying:

"I've kept quiet for awhile for various political reasons, but I now feel a
kind of social responsibility to deter people from banking their business on
MongoDB."

What the hell does "various political reasons" mean? I'm more concerned about
that than any deficiencies in MongoDB's codebase. Is there a well-funded
campaign to silence MongoDB/NoSQL criticism, or is this just one customer's
attempt to save face for choosing the wrong data store?

~~~
Cieplak
Conspiracy theory

<http://news.ycombinator.com/item?id=3066022>

------
mtkd
Looking at HN today - it's full of hate and negativity.

Kudos to the developers that rise above this, often working for nothing, to
build the awesome tools that future generations will use to build awesome
apps.

~~~
mechanical_fish
_it's full of hate and negativity_

You can't spell _NoSQL_ without the word _no_.

This is why I try never to use the word _NoSQL_. It's a flamebait word,
deliberately engineered to add heat rather than light. There's no such thing
as "a NoSQL database"; there are only databases. Even the relational databases
that parse SQL have significant differences, and the databases that don't
speak SQL are all over the map.

~~~
cperciva
_You can't spell NoSQL without the word no._

Nor can you spell it without the word _os_. But I don't think we're talking
about a mouth or other external opening.

A lot of people now read "NoSQL" as "Not Only SQL", which seems more positive
than negative.

~~~
ssmoot
Maybe I've just been living under a rock, but that's the first time I've ever
seen "Not Only SQL".

I can't tell if you're being serious or not.

~~~
cperciva
Not Only SQL has been around for at least two years:
<http://twitter.com/#!/simonw/status/5339626595>

I'm not sure who came up with it or when.

------
j_baker
I'm calling troll.

Having a serious discussion about NoSQL databases begs the exact same question
as having a serious discussion about cancer: what kind would you like to have
a serious discussion about?

I think the most important lesson we can learn from NoSQL in general is that
the idea of a one-size-fits-all database is becoming dated. NoSQL databases
certainly _don't_ solve the problems the author points out, and they probably
never will. In fact that's the point. By not solving one set of problems, you
allow yourself to solve another set of problems.

How about we use databases to solve the problems they were meant to solve,
rather than basing our choices on whatever the popular opinion is at the
moment?

~~~
jeffdavis
"I think the most important lesson we can learn from NoSQL in general is that
the idea of a one-size-fits-all database is becoming dated."

For programming languages, using the "right tool for the job" has little
downside. Perhaps the developers need to learn an extra language, or perhaps
there is some communication overhead between them. But unless the components
are tightly-coupled, there's not much of a loss.

In contrast, the value of the whole data is greater than the sum of the parts.
If you have a website selling products and an inventory management system and
an automatic price-setting tool, it's hard to use a different DBMS for each
one.

Even for data sets that seem unrelated at first, there may be a lot of value
in the small connections between them. This is becoming increasingly apparent
and companies are trying very hard to see these connections. Being in separate
systems just makes that more difficult.

So, there are good reasons to use multiple database systems, but there is also
a much higher cost. Saying "use the right tool for the job" doesn't give any
guidance about when it's worth the cost and when it's not.

~~~
j_baker
I think you're mixing concerns a bit. For data warehousing purposes, I agree
that it's absolutely preferable to have all the data in one place (like
hadoop/HDFS).

For production OLTP stuff, I'd argue that it's a bad idea to do the kind of
processing you're talking about in the database unless you can't avoid it.
Beyond the performance implications, you'll likely have to alter your schema
in unnatural ways that you wouldn't otherwise.

Now, I absolutely agree that you need to do a cost/benefit analysis and that
there _are_ costs associated with having multiple databases. But I don't think
those costs are as high as they would appear on first intuition.

~~~
jeffdavis
I think you can run into problems in OLTP, as well. To stick with the example,
you have three systems: sales from the website, price-setting tool, and
inventory system.

Should the sale happen at all? Not if the inventory is depleted. Sure, you can
put it on back-order, but then you have an unhappy customer.

At what price should the sale happen? It would be nice if you could
automatically raise prices when the inventory drops below 10 units (which may
indicate a demand spike or a supply interruption), for example. If you don't
raise prices soon enough, you're more likely to run into a depleted inventory,
again making the customer unhappy.

And what if you encounter an error moving data between systems? The customer
thinks the sale happened, but it wasn't (or couldn't be) loaded into the
inventory system for some reason. The customer will call a week later asking
why it still isn't shipped, the service rep will be clueless trying to trace
between the systems, and ultimately the customer will be unhappy.

(Just to be clear: properly integrated data management may still be done with
multiple systems. But it's harder.)

------
X4
Hi zeit_geist,

to me the problems you've described in your blog post are specifically
application model problems. I think we shouldn't abstract the application
model into the database, but the database into the application model.

I know a very innovative French developer who wrote an application server,
that comes integrated with the database. In this very way you just call the
exported functions provided by your database directly. How you model your
(re)caching/(re)indexing and other application needs is totally up to you.
This is a) a freedom you barely find anywhere else, b) bare-to-the-metal
development of an application, and c) the most efficient way to develop an
application (because you only implement what you need and don't use a
generalized construct that serves a general purpose very well, but doesn't
scale with your application very well).

I would recommend implementing an application using the pattern that you know
works best for the application. If you don't know it yet, then it's time to
read books that broaden your horizon of available solutions until you can
start developing again.

I will show you an example of what I mean.

<http://gwan.ch/api#kv>

This is what I think is the most elegant way to interact with a(n integrated)
database.

I am curious what you think about this. I know I've not referred to the
points in your post, but I've read it carefully. Thanks for hearing me out.
I'm sorry I didn't post to your blog, but I prefer to post without subscribing
to an external party. You limit the users who can answer this way imho. I'm
not sure if it helps you to keep out trolls/spammers, but it sure helps to
keep response rate low.

~~~
zeit_geist
I have taken a short look at your approach only. I still think KV-stores are
Assembler-like constructs and as such I would apply my criticism to your
approach equally -- please correct me if I (mis-)judge your project! But in
general, I think your approach is a good one.

Regarding comments at my blog: I don't understand what you mean by
"subscribing". According to the settings page, you do not have to register.
You are free to comment there anonymously. That being said, your comment at HN
is highly appreciated. Thank you for taking your time!

------
linuxhansl
"The problem is: NoSQL is not a solution at all. It's a trade-off."

Bingo, that is _precisely_ what NotOnlySQL is all about. For example you trade
some consistency guarantees for the ability to scale out.

Uninformed (has the author heard about the CAP theorem?), either-or diatribes
like this article don't really serve any purpose other than sowing discord.

This is just like "C++ is better than Java is better than..." type flame
wars. :)

We use Oracle and we use HBase. We would never replace Oracle with HBase for
all of our data needs. At the same time we have need for a store that scales
beyond what even Oracle can provide (and yes, we use RAC with multi TB caches
across a database instance).

For the same reason we use Java, C++, Scala, Perl, Closure, Bash, JavaScript,
etc... The right tool for the right job.

Personally what I would like to see is:

* secondary indexes

* snapshot isolation (in lieu of global transactions, which will never scale).

Disclaimer: HBase committer here.
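
For readers unfamiliar with the second wish-list item, here is a toy sketch of
the read side of snapshot isolation: every write creates a new timestamped
version, and a reader only sees versions committed before its snapshot began.
The class and method names are invented for illustration, and this is not how
HBase or any other real store implements it. Write-write conflict detection,
which full snapshot isolation also needs, is left out.

```python
import itertools


class MVCCStore:
    """Toy multi-version store; each put appends a (commit_ts, value) version."""

    def __init__(self):
        self._clock = itertools.count(1)  # monotonically increasing timestamps
        self._versions = {}               # key -> [(commit_ts, value), ...]
        self._last_commit = 0

    def put(self, key, value):
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

    def begin(self):
        """Start a snapshot pinned to the latest committed timestamp."""
        return Snapshot(self, self._last_commit)

    def read_at(self, key, ts):
        # Newest version whose commit timestamp is not after the snapshot.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None


class Snapshot:
    def __init__(self, store, ts):
        self._store = store
        self.ts = ts

    def get(self, key):
        return self._store.read_at(key, self.ts)


if __name__ == "__main__":
    store = MVCCStore()
    store.put("balance", 100)
    snap = store.begin()
    store.put("balance", 50)      # later write, invisible to snap
    print(snap.get("balance"), store.begin().get("balance"))
```

Readers never block writers here because old versions are kept around; the
cost is storage for those versions and some way to garbage-collect them.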

------
yummyfajitas
I find one of his counterexamples amusing:

 _Managing Highly-dimensional data and access to it: ...I'm thinking of e.g.
geo/spatial data here. Where are the solutions out there?_

<http://www.mongodb.org/display/DOCS/Geospatial+Indexing>

~~~
zeit_geist
True. The author forgot to add "scalable" there -- but that is actually what is
meant there.

I have been studying multi-dimensional indexing for more than 3 years now and
have implemented many of the state-of-the-art indexes. None of them is
sufficient, and MongoDB's implementation in particular is insufficient,
especially in the scalability domain.
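
To illustrate why this is hard, here is a sketch of one common trick: mapping
two-dimensional points onto a one-dimensional key with a Z-order (Morton)
curve, so an ordinary sorted index can store them. This is a generic
illustration, not a description of MongoDB's implementation or of any index
mentioned above; the function names and the 16-bit resolution are arbitrary
choices.

```python
def interleave(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Morton code.

    Bits of x land on even positions, bits of y on odd positions.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def quantize(value, lo, hi, bits=16):
    """Map a float in [lo, hi] onto an integer grid cell in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    return int((value - lo) / (hi - lo) * cells)


def morton(lon, lat, bits=16):
    """One-dimensional sort key for a longitude/latitude pair."""
    return interleave(
        quantize(lon, -180.0, 180.0, bits),
        quantize(lat, -90.0, 90.0, bits),
        bits,
    )


if __name__ == "__main__":
    print(morton(13.4, 52.5), morton(13.5, 52.5))
```

Points that are close on the map usually get close keys, but not always: the
curve jumps at cell boundaries, so a range query in two dimensions turns into
several key ranges. That gets worse as dimensions are added.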

------
smoody
The original document to which you refer (that I refuse to refer to as the
"MongoGate" document :-) is about system reliability. Those types of problems
can exist in any database system and are not specific to NoSQL database
systems. The document claims that MongoDB doesn't perform well under very high
loads in a replicated environment.

Yes, NoSQL doesn't fit the problem you're trying to solve. Perhaps there is a
set of problems that are difficult to solve with NoSQL, but there exist sets
of problems for which NoSQL databases are perfectly suited. So, I would modify
your post to state that NoSQL isn't the solution to every problem, but don't
think you're uncovering some big secret, because most people already know
that.

~~~
zeit_geist
I took "MongoGate" (sorry, I just fell in love with that term) as an example
of what happens when overly positive expectations hit the hard ground.
Removing the hype from NoSQL ("it's innovative", "it scales", "the cool guys
use it", yada yada) is what I like to do.

I surely do not uncover any secret there. But I haven't stumbled upon a "what
NoSQL lacks" blog post recently either. You can read my post in many ways, but
the latter one is actually one possible way imho.

------
rfurlan
I just recently published an article describing my experience migrating from
SQL Server to MongoDB, you can read it here:
<http://www.wireclub.com/development/TqnkQwQ8CxUYTVT90/read>

I tried my best to describe both what we gained and what we lost after the
transition. At the end of the day, MongoDB (and other NoSQL solutions) are
different tools for different jobs. Obviously it takes investment to master a
new tool and we almost aborted the migration on two different occasions simply
because we didn't know enough about maximizing MongoDB performance. Now that
the dust has settled and with all things considered, I am glad we didn't.

------
bbulkow
Disclosure: I wrote a product called Citrusleaf, which also plays in the NoSQL
space.

I also want a better discussion of NoSQL. It isn't fair to hate on databases
without understanding the pressures of operations. I saw a friend's company
where a big, fancy Oracle system lost _all_ of its data on their main test/dev
system at a crucial moment - lost over 100,000 user accounts, including those
of executives of key customers. They were forced to merge with a competitor
about 4 months later.

You need to take database backups, you need to stage your systems. You need to
have extra hardware on hand.

Some of our customers at Citrusleaf continue to "run with scissors". I like
the attitude, but we've had to talk sternly with them about the benefits of
staging, bucket testing new releases (app and db), and penciling out the
realistic hardware requirements.

The new crop of distributed databases provide an immense opportunity for all
of us. We can write more agile applications than ever before, and as a
community we all need to understand the _benefits_ of flexibility. This
includes your entire organization.

That being said, there are technology differences between the NoSQL solutions,
and at Citrusleaf we've focused on operations and deployability. My co-founder
ran Yahoo Mobile's engineering and ops group, so understands the tradeoffs. We
have a group in India (hi guys!) of great developers (not support guys) simply
to make sure that when you've got a problem at 3am there's someone to take
care of you.

Performance is important in this agile world, and Citrusleaf has it.
<http://bit.ly/rRlq9V>

A slide I showed at HPTS (the high performance transaction systems conference)
showed a Zynga game on the left, and an EA Facebook game on the right. Zynga
is an amazing machine in terms of getting huge, rich applications to market.
Every pixel is covered with things to do, artwork, everything. And they're
rolling out new games every week, and I haven't ever seen downtime (unlike
Netflix Streaming, which has maintenance on a regular basis).

Zynga has been a huge proponent of NoSQL (but not Mongo) since its inception,
and although I don't know what EA does internally (maybe they use the same
tech but have other agility issues), NoSQL is clearly part of a high scale,
rich application need.

Join or be flattened.

~~~
cperciva
Is there some reason why your benchmarks look like you're cheating?

"The Citrusleaf server node received input from 4 client nodes, the MongoDB
server node received input from 1 client node running 2 client processes, and
the Redis server node received input from 2 client nodes."

I mean -- if you're cheating, that's bad. If you're not cheating, why the hell
do you set things up to look like you're cheating?

