

Fighting The NoSQL Mindset - jbyers
http://www.yafla.com/dforbes/The_Impact_of_SSDs_on_Database_Performance_and_the_Performance_Paradox_of_Data_Explodification/
The full title of this post is "The Impact of SSDs on Database Performance and the Performance Paradox of Data Explodification" but I thought the first heading summed it up better.
======
stingraycharles
I'm getting more and more annoyed by these kinds of articles. Why are people
talking about NoSQL and relational databases as if they're solving the same
problems? There is no "NoSQL" mindset to fight against, and there is no
"RDBMS" mindset to fight against: the only mindset to fight against is a
mindset that says a silver bullet exists.

~~~
wvenable
> Why are people talking about NoSQL and relational databases as if they're
> solving the same problems?

If you can switch from one to the other and back again, how are they not
solving the same problems?

~~~
sjs
used to solve a problem != meant to solve a problem

Let's bring up the age-old vehicle analogy. Any vehicle fundamentally solves
the same problem, but that's a naïve outlook that doesn't acknowledge any
specific use-cases. I can drive a truck down the street to get a Coke or ride
my bicycle across Canada but that doesn't mean bicycles and trucks are _meant_
to solve the same problem. The differences are obvious when you need to move a
pallet of Coke (or 50).

~~~
scotty79
What the stuff is "meant for" is often not important in real life situations.

If you have a pallet of Coke to move, a bike and no driving licence, then you
don't care what the bike was meant for.

If you have to hammer a nail but you only have pliers in your close vicinity,
you hammer it with pliers and don't care what the inventors of pliers meant
them for.

I'm not trying to draw any parallels to RDBMS vs NoSQL just pointing out that
what often matters in real life is what problems are solved with which tools
and why. What the tools were meant for is pie in the sky.

~~~
sjs
For legacy stuff that's true. For new projects we can and should make better
decisions (if politics aren't in the way). Many DB systems are free and open
source. If you have Unix servers you can choose from PostgreSQL, MySQL,
Cassandra, Riak, CouchDB, MongoDB, Voldemort, etc. There is no reason not to
choose the right tool for the job. You can assess them all if you have the
time and resources.

It's not the physical world and we basically always have whichever tool we
need available to us. I mainly use open source software so that may not be
true for everyone I suppose. That's their choice though.

------
steveklabnik
I have a feeling that "I made up a random database as big as Digg's, and look,
I'm getting 1000x the speed!!!" would kind of tend to imply that you're not
accurately replicating their problem.

~~~
madair
Perhaps you missed the point of the whole exercise: To respond to a high-
profile yet poor representation of RDBMS performance.

The OP repeated this reasoning more than once.

He showed that the facts cited by Digg do not make sense unless we take into
account poor database technology, poor database configuration, or poor
database skills, or all three.

Perhaps you disagree with that. However, that's not what you said above. The
OP also discussed the nature of this micro-benchmark, and its relevance
despite his own poor knowledge of the actual data characteristics.

So in other words, he has already directly addressed your concern in advance,
and more than once, as a matter of fact. Considering that, you haven't
actually responded to his article; you just wrote a "tends to" point about
micro-benchmarks. I think it's pretty clear that Dennis Forbes knows a thing
or two about benchmarks.

Blah blah blah to empty air, this comment page is pretty much a fact-free and
nuance-free flame war anyway, so what's the point, sorta embarrassing for the
esteemed HN crowd.

~~~
steveklabnik
> you just wrote a "tends to" point about micro-benchmarks

I am not a database guy. I also don't know enough about Digg's set up to say
with authority if these comments make sense. So I specifically wrote "I feel,"
"tends to," and "imply" because I'm not comfortable making an absolute
statement about the issue.

However... I don't see how this test is in any way relevant. A 30GB database?
Running on totally different hardware?

In any case, re-reading the article again, I see that relevance paragraph now.
I guess I missed it the first time around between all of the flaming, trollish
comments about both NoSQL and MySQL. But I still don't see how we can
extrapolate this test in any way to imply anything about Digg's practices at
all. Then again, it's 8:30am.

~~~
hp1995acer
How does one "miss a paragraph"?

What an unclever excuse for missing the point.

~~~
steveklabnik
Because it was 8:30am when I read this, and I usually wake up at 10.

It still doesn't change my original point, however. Just because he
acknowledges that the benchmark is unrelated to what he's talking about
doesn't excuse him from the fact that it's unrelated to what he's talking
about.

It also doesn't change the fact that the article is still a troll, regardless
of the correctness of his benchmark.

------
psadauskas
After experiencing how easy it is to get started on and develop against
MongoDB, I feel like RDBMS are a premature optimization. It's so much easier to
evolve your data model, and write arbitrary queries, and it's plenty fast
enough for 90% of the web apps out there. Save the RDBMS for when you have
relational data that needs to be faster, or whatever other feature you happen
to need for that part of your app.

~~~
jpcx01
Great point. The thing I'm interested in with these new data storage
technologies is the low-level, rapid-development, early-stage (low-traffic)
portion of a project, since that's where I'm at.

I'd rather be able to remove barriers like having to design a schema, and get
some early efficiencies to develop my app fast and iterate.

Though, my experience in scaling every site that needed to be scaled has
concluded with sharding. So MongoDB sort of fits there as well with its
autosharding capabilities.

~~~
Vitaly
NoSQL databases do not free you from having to design your schema! They do
free you from having to run migrations, but those are trivial while you are
small; migrations are only a problem when you have huge tables. With NoSQL DBs
you can skip migrations and do 'repair on read' schema changes, but you still
need to design your schema. And coming from the SQL world, designing NoSQL
schemas can be a lot of pain since you have to change your perspective
completely. And the crazy naming (of Cassandra and other BigTable derivatives)
doesn't help either: wtf is a column family, etc. :)
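
A rough sketch of what 'repair on read' might look like, assuming documents come back from the store as plain dicts. The field names, the `schema_version` marker, and the in-memory `store` are all hypothetical stand-ins, not any particular database's API:

```python
# Hypothetical example: documents are plain dicts, as a document store might
# return them. Field names and version numbers are invented for illustration.
CURRENT_VERSION = 2


def upgrade(doc):
    """Return a copy of doc upgraded to CURRENT_VERSION."""
    doc = dict(doc)
    version = doc.get("schema_version", 1)
    if version < 2:
        # v1 stored a single "name"; v2 splits it into first/last.
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"] = first
        doc["last_name"] = last
        version = 2
    doc["schema_version"] = version
    return doc


def read_user(store, key):
    """Fetch a document, repairing it if it is still in an old format."""
    doc = store[key]
    if doc.get("schema_version", 1) < CURRENT_VERSION:
        doc = upgrade(doc)
        store[key] = doc  # write the repaired document back
    return doc


# A dict stands in for the database here.
store = {"u1": {"name": "Ada Lovelace"}}
user = read_user(store, "u1")
```

Note that the old and new shapes both had to be designed; the migration is simply spread across reads instead of run up front.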

------
Confusion
_These examples always end up being "we moved from MySQL to NoSQL" rather than
"We moved from Sybase ASE to NoSQL"_

That's because there's a zillion MySQL installs out there, with users that
talk about them. By contrast, there are a _lot_ fewer Sybase installs out
there and their (corporate) users don't talk about them. Go figure that you
only hear about MySQL. But please, keep on spreading the FUD; that just gives
us the edge of using a free, OSS, system.

 _DISCLAIMER: This is not a high-fidelity reproduction of Digg's situation_

And it's probably not even a low-fidelity reproduction. The article gives us
no reason to suppose he actually knew or understood the problem Digg had. He
just shows that it was not a trivial one, as that would've been easy to solve.

~~~
quicksilver03
Equating MySQL with all of the other RDBMS systems out there seems the
other cardinal mistake of those discussions. Just do

sed s/NoSQL/NoMySQL/

and avoid the confusion.

------
astrec
Where are the writes?

~~~
rbranson
Exactly what I'm asking. Anyone can make read benchmarks fast. He's adding
index optimizations such as clustering that kill write performance. Balance
please.

~~~
gill_bates
The original Digg entry specifically makes the entire point that Digg is
willing to sacrifice write performance to improve read performance (which they
did to a massive degree with Cassandra), and then demonstrated how absolutely
horrendous their read performance was.

Need more be said? Seriously?

~~~
skorgu
Cassandra is _hugely_ biased towards making writes fast [1] and there's no
indication in either Digg article that they (somehow) changed that fact in
their deployment by tuning. Digg loads the _complexity_ during the write by
denormalizing but that load is on the application server not the data store.

[1] <http://spyced.blogspot.com/2010/01/cassandra-05.html>
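
To make "denormalizing during the write" concrete, here is a minimal sketch, assuming a feed-style workload. The `followers` data, the function names, and the dict standing in for the data store are all hypothetical; this is not Digg's actual code:

```python
from collections import defaultdict

# Hypothetical data: who follows whom.
followers = {"alice": ["bob", "carol"]}

# Stands in for the data store: one precomputed row per reader.
feeds = defaultdict(list)


def record_vote(user, story_id):
    """Fan one logical write out into one physical write per follower.

    The loop runs in the application, not in the data store.
    """
    for follower in followers.get(user, []):
        feeds[follower].append((user, story_id))


def read_feed(user):
    """A read is a single key lookup; no join is needed."""
    return feeds[user]


record_vote("alice", 42)
```

The store only ever sees simple appends and key lookups, which is why a write-optimized system fits this pattern.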

------
rythie
Cassandra is about scaling _writes_ not reads.

MySQL is slow on writes because you'll get a random write for every index you
maintain (plus the table itself). MySQL's replication will help scale reads but
does nothing for writes. Cassandra is actually said to be _slower_ on reads
and it's thought people will already be using memcache so it's not a problem.

The article doesn't seem to have any writes going on while he is reading.

------
sjs
> Alternately you can just clutch onto NoSQL and bleat about how it changes
> all of the rules anyways, which is the route quite a few have decided to
> pursue.

Bath water and baby gone without a second thought.

Not to mention that later the author says:

> SSDs change everything.

To which I say: "Or you can clutch onto SSDs and bleat about how they change
all of the rules anyway, which is the route this author has decided to
pursue."

------
rbranson
Silly us. The engineers at Google and Amazon don't know what they're doing
either. They should go back to computer science class and let the big
boys and SQL Server run the shop.

~~~
houseabsolute
It's a different problem. I can get a database that holds 30 GB, although I
suspect the actual DB was much larger. I can't get a database that holds >PB.

------
vyrotek
Great article. I have to admit though, I'm a huge fan of SQL Server :) I would
love to see more in depth comparisons.

~~~
learnalist
I too am a huge fan of SQL, what MySQL and Postgres have given to the world and
how it has inspired and set seeds in many of us to use open source, contribute
to open source.

I am dipping my toes into couchdb, just to see what all the noise is about.

Still getting my head around map/reduce and the fact I'm writing in JavaScript.
All that aside, here is the most exciting factor for me.

It is so much easier to write custom functions for it than for SQL. (MySQL,
and yes, I only tried via phpMyAdmin.)

~~~
WorkerBee
_I too am a huge fan of SQL_

Uh, he said "fan of SQL Server", which is Microsoft's SQL product. Fans of SQL
Server tend not to be fans of mySQL at all, since SQL Server makes mySQL look
like a "bottom-feeder" in the original poster's words. Not that I disagree.

------
sliderr
And then suddenly your nice MSSQL database machine explodes, leaving you
wondering why you didn't go for a fault tolerant system.

~~~
WorkerBee
There is lots of stuff in SQL server for backups, hot spares, clustering. It
can be fault-tolerant if you set it up that way. I wonder if you knew about
that functionality, and if not, why make claims in ignorance.

------
schammy
The problem with this article is that he is testing as ONE ACTIVE USER, as if
only one person was ever using digg at any given time. Of course the queries
are going to return thousands of times faster. Try replicating digg's actual
environment, which I know nothing of, but I know the site gets a ton of
traffic and probably has between 10-50,000 users online at any given time.

Now run the same queries you were doing again on your test machine, but
simulating 50K users online at once. Oh, and don't forget about thousands of
writes per second, which was conveniently not part of this test. What's that
you say? The performance is suddenly complete shit? Color me shocked.
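
A minimal sketch of what testing with more than one active user could look like, assuming a thread per simulated client. `run_query` is a hypothetical placeholder (it just sleeps); a real test would issue actual reads and writes against the database:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query():
    """Placeholder for a real database call; returns elapsed seconds."""
    start = time.perf_counter()
    time.sleep(0.001)  # stand-in for query work
    return time.perf_counter() - start


def benchmark(clients, queries_per_client):
    """Run queries from several simulated clients at once.

    Returns a flat list with one latency per query.
    """
    def client(_):
        return [run_query() for _ in range(queries_per_client)]

    with ThreadPoolExecutor(max_workers=clients) as pool:
        per_client = list(pool.map(client, range(clients)))
    return [t for latencies in per_client for t in latencies]


latencies = benchmark(clients=8, queries_per_client=5)
```

Comparing the latency distribution at 1 client against 8, 80, or more is what would show whether the single-user numbers hold up.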

~~~
wanderr
I was going to mention the same thing: there isn't even a mention of the word
concurrency in the whole article.

------
apower
Judging from the comments here, people don't seem to know basic database or
rdbms concepts or usage.

