

MongoDB as a better default data store - trustfundbaby
http://blog.gregweber.info/posts/2011-06-08-high-performance-rb-part2

======
davidw
> Most SQL databases trade off too much in terms of performance and features
> in order to have 2 things that are actually low on the need list of most web
> applications:

> * relational tables

> * transactions

Shudder... I think I'll hold on to my relational tables and transactions as
the default core of my systems, and optimize when I see that they're not fast
enough, thank you very much.

~~~
reedlaw
I see a lot of negative comments that lead me to believe that few here have
seen the problems with RDBMS's that lead some of us to consider NoSQL
alternatives. An application that frequently times out (page loads over 30
seconds) to me is a good candidate for a NoSQL backend, especially considering
we are mainly storing documents. This app was not developed by us and granted
the design is terrible with many N+1 queries (which we are eliminating one by
one), but I'm very eager to consider alternatives to MySQL for this and future
apps. Any comment that dismisses this need off-hand strikes me as somewhat
unaware of the problem. What I would like to see more discussion of is what
are the viable alternatives to RDBMS + ORM (specifically MySQL +
ActiveRecord). I am specifically considering Mongo + Mongoid. Now if anyone
can give me some solid reasons why this is not a good idea, I would be glad to
hear it.

~~~
ironchef
"An application that frequently times out (page loads over 30 seconds) to me
is a good candidate for a NoSQL backend" -Why? What tells you that the
application is spending most of its time waiting on the database? If it's
spending a lot of time waiting on the relational database, I would suggest
it's a good candidate for: 1) query optimization (as you start to surmise with
your N+1 observation) 2) possibly better disk IO 3) possibly a different db
architecture
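
A minimal sketch of the N+1 pattern discussed above and the single-query rewrite that "query optimization" usually means here. The `posts`/`comments` schema and the function names are assumptions made up for illustration (nothing about the actual app is known), and SQLite stands in for MySQL so the example runs on its own.

```python
import sqlite3


def make_db():
    """Build a small in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE comments (
            id INTEGER PRIMARY KEY,
            post_id INTEGER REFERENCES posts(id),
            body TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO posts (id, title) VALUES (?, ?)",
        [(i, f"post {i}") for i in range(1, 6)],
    )
    conn.executemany(
        "INSERT INTO comments (post_id, body) VALUES (?, ?)",
        [(i, f"comment on {i}") for i in range(1, 6) for _ in range(2)],
    )
    return conn


def n_plus_one(conn):
    """One query for the posts, then one more query per post."""
    queries = 1
    posts = conn.execute("SELECT id FROM posts ORDER BY id").fetchall()
    result = {}
    for (post_id,) in posts:
        rows = conn.execute(
            "SELECT body FROM comments WHERE post_id = ? ORDER BY id",
            (post_id,),
        ).fetchall()
        queries += 1
        result[post_id] = [body for (body,) in rows]
    return result, queries


def single_join(conn):
    """Fetch the same data with one JOIN and group it in the application."""
    rows = conn.execute(
        """
        SELECT p.id, c.body
        FROM posts p
        LEFT JOIN comments c ON c.post_id = p.id
        ORDER BY p.id, c.id
        """
    ).fetchall()
    result = {}
    for post_id, body in rows:
        result.setdefault(post_id, [])
        if body is not None:  # posts without comments yield a NULL body
            result[post_id].append(body)
    return result, 1
```

Both functions return the same mapping of post id to comment bodies; the difference is the number of round trips, which grows with the number of posts in the first version and stays at one in the second.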

~~~
reedlaw
We have New Relic installed. MySQL is not the only bottleneck, but it's a big
one. Ruby code is another. New Relic can't show us how much Ruby execution
time is ActiveRecord and how much is our application code, but we recently
upgraded from Rails 2.3 to 3 and noticed some decrease in speed. We have been
implementing memcached and optimizing queries with pure SQL but still have a
long way to go to get decent performance. Most of the slow pages are back end
admin stuff, so it's not absolutely critical, but nevertheless it's left me
with a bad impression of the traditional stack.

~~~
arethuza
But that sounds like a problem with the application, not the stack. What makes
you think that document oriented databases are any more immune to bad
application design than relational databases?

~~~
reedlaw
I believe it's a combination of application and stack that's the problem. I've
been comparing Rails 3 apps to similar apps in other frameworks. Locomotive is
a Rails CMS and it was taking ~1 second per page load. Calypso is a CMS in
Node.js and page loads were nearly instant. Both used MongoDB. Locomotive sped
up when 4 unicorn workers were applied, but it still didn't match Node.js.

------
strictfp
There seems to be a common misconception that relational databases have no
in-memory indexes and query caches and therefore are intrinsically slow. I don't
understand why this is.

~~~
foobarbazetc
Because the people who write these articles are morons?

~~~
troutwine
That is not very charitable of you, is it? I think it's fair to say that in
many areas tools to interface with relational databases are lacking; consider
the horror-shows that are Arel or Hibernate. Also, good schema design is not,
in itself, a trivial task, especially considering how common ignorance of such
principles is. In my experience, a not insignificant number of practitioners
learn a "normalization" peculiar to a specific product--or even a specific
product version--and apply this forward into the future to disastrous effect.

------
wfarr
"Over the next couple of years a lot of developers are going to make good
money moving companies off of Node.js and MongoDB." -
<http://twitter.com/#!/b6n/status/73535660467818496>

~~~
BasDirks
Big words, little substance. This is why I'm not on twitter.

~~~
sausagefeet
The words show up in a standard font for me...

------
antihero
I rather like redis, once you get the hang of doing things in that way, and if
you plan your system so that you know what ways you're going to access your
data in advance, it's really not a problem. Besides, say you want to get a
list of "events" by date, you can regenerate the index, or do the sorting
within the app.

~~~
MatthewPhillips
I'm new to Redis and my approach to this problem has been to create a Sorted
Set with the "score" being the date represented as an integer (in JS it's
getTime()).
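
A rough sketch of that idea, assuming the score is the event time in milliseconds since the epoch (what `getTime()` returns in JS). The `SortedSet` class below is a small in-memory stand-in written for illustration, not a Redis client; its method names only mirror the real Redis commands `ZADD` and `ZRANGEBYSCORE`, and the event names are made up.

```python
import bisect
from datetime import datetime, timezone


def epoch_ms(dt):
    """Milliseconds since the Unix epoch, like JavaScript's getTime()."""
    return int(dt.timestamp() * 1000)


class SortedSet:
    """In-memory stand-in for a Redis sorted set (illustration only)."""

    def __init__(self):
        self._scores = {}   # member -> score
        self._entries = []  # (score, member) pairs, kept sorted

    def zadd(self, score, member):
        """Add a member, or move it if it already exists."""
        if member in self._scores:
            old = (self._scores[member], member)
            self._entries.pop(bisect.bisect_left(self._entries, old))
        self._scores[member] = score
        bisect.insort(self._entries, (score, member))

    def zrangebyscore(self, lo, hi):
        """Members with lo <= score <= hi, ordered by score."""
        return [m for s, m in self._entries if lo <= s <= hi]


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


events = SortedSet()
events.zadd(epoch_ms(utc(2011, 6, 10)), "launch")
events.zadd(epoch_ms(utc(2011, 6, 8)), "meetup")
events.zadd(epoch_ms(utc(2011, 6, 9)), "deploy")
```

Because the set is ordered by score, "events by date" becomes a range query over timestamps, for example `events.zrangebyscore(epoch_ms(utc(2011, 6, 8)), epoch_ms(utc(2011, 6, 9)))`, with no separate index to regenerate.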

------
mike_esspe
Because of the global write lock, you are unlikely to be able to use MongoDB
without memcached if you have a lot of writes. We sometimes have queries
waiting on the lock for a second.

~~~
dolinsky
Indeed, once your app achieves a certain threshold then Memcached comes into
play. Curious if you've looked into sharding as a way to help minimize the
lock time needed?

Per collection locking is coming soon as well
<https://jira.mongodb.org/browse/SERVER-1240>

~~~
mike_esspe
Yes, we are going to use sharding.

But if per document lock could be available, then mongodb performance would be
much more impressive. Though as far as I understand it's not possible with
memory mapped files for some reason.

------
moe
_We can put data into MongoDB faster than we can get it out of MySQL during
the migration._

Yes, with the small caveat that Mongo doesn't respond to any queries[1] while
you do that. So you'd better not try such a bulk operation on a production Mongo.

[1]
<http://2.bp.blogspot.com/_VHQJkYQ5-dY/TUO3RAn8SNI/AAAAAAAABqs/FJKgl_HgBWA/s1600/mongo-rw-fail.png>

~~~
jzawodn
As the original author of that quote, no. I was running MongoDB in "safe" mode
where it doesn't return until the query has been ack'd. Furthermore, I was
running with w=2 meaning that at least one other node also had to ack the
data.

~~~
moe
Well, I've observed the same behaviour on a single-node mongo during bulk
imports, without special configuration changes.

Edit: Not sure why it's getting downvoted. You can reproduce it fairly easily
by performing bulk writes - esp. deletes, but also mongoimport or just a
write-loop. The global lock is a known problem.

------
nverdo
NoSQL solutions have proven their worth, but of course they are not in every
case a replacement for a relational database.

~~~
ironchef
Of course...and relational databases might not be the right tool for every
job. Making a blanket statement that there is a "default" for data in a web
site is pure ignorance of the problem (I'm not saying you're saying
that....)

Of course the original author seems to think that most relational databases
lack performance and / or features. There is no one size fits all situation.
This is why companies like Facebook, Netflix, etc. use combinations of
relational databases and NoSQL databases (although I think Netflix is almost
completely over to SimpleDB now...)

------
ddemaree
I can't help thinking people who say MongoDB should be a "default" for all web
apps are people who've never had to set up, maintain, monitor, debug, or fix a
real MongoDB instance in production. It's a pain in the ass compared to MySQL
or Postgres. Since doing it right involves redundancy, 64-bit environments and
lots of storage space, it can be very expensive to host yourself, and
Mongo-as-a-service options like MongoHQ are also pricey for the level of
storage/performance you get. That's not to say Mongo sucks — I love Mongo, but
I've been burned enough trying to use it on various projects to make sure I
(or my company) have a solid reason for needing it before subjecting myself to
more pain.

In my experience there are always ways to tweak something in your RDBMS setup
to improve performance, like tuning queries or your indexing strategy, or
skipping using an ORM for super-critical stuff, or looking at I/O latency or
blocking issues between your app and DB machines.

~~~
gregwebs
The CTO I am working with is responsible for maintaining both MySQL and
MongoDB data stores himself in addition to writing a lot of code. After
starting with MySQL he has been gradually shifting everything to MongoDB and
would like to just get rid of MySQL.

I personally don't have deep operations experience, and I am sure your
individual experience is valid.

I think that in part, MongoDB is maturing: it now has single server durability
and replica sets. Single server durability means it now can be a default,
whereas before you had to commit to 2 servers as you allude to.

People seem to report it is more critical to have enough memory, and that it
wants more in comparison than a SQL database, so perhaps that limits its
ability to be a default database.

The issue of course isn't whether you _can_ adjust an RDBMS-based
infrastructure to suit your needs, but how much effort that will be in
comparison to an alternative.

------
middus
I don't understand why he talks about an ORM (object-relational mapper).

~~~
gregwebs
I added a more detailed explanation, I hope it makes sense:

In a database with a schema, if you saved a string to an integer field, the
database could either complain or coerce the string to an integer. Either way,
whenever you read a row from the database, you know what type the value will
be. Another issue with schemaless is simply mis-typing the key (column) name.
If you have a field called referrer, but you perform a query using referer,
MongoDB won't know you have done anything wrong. In a SQL database you will
get an error on insert and on querying. An ORM can instead be the one to tell
you about the error.
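
A small sketch of the mis-typed key problem, under stated assumptions: `find` is a hypothetical stand-in for a schemaless document query (it is not the MongoDB API), the `visits` table is invented for the example, and SQLite plays the role of the SQL database. It only covers the column-name half of the argument, since SQLite is itself lenient about column types.

```python
import sqlite3


def find(docs, **criteria):
    """Schemaless stand-in: match documents on whatever keys are given."""
    return [
        doc for doc in docs
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def make_visits():
    """SQL side: the column names are fixed by the schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (url TEXT, referrer TEXT)")
    conn.execute(
        "INSERT INTO visits (url, referrer) VALUES (?, ?)",
        ("/a", "news.ycombinator.com"),
    )
    return conn


def query_visits(conn, column, value):
    """Look up visits by one column.

    The column name is interpolated only to demonstrate the typo;
    real code should not build SQL from untrusted input.
    """
    sql = f"SELECT url FROM visits WHERE {column} = ?"
    return [row[0] for row in conn.execute(sql, (value,))]


docs = [{"url": "/a", "referrer": "news.ycombinator.com"}]

# Correct key: the document is found.
matches = find(docs, referrer="news.ycombinator.com")

# Mis-typed key: no error, just an empty result.
silent_miss = find(docs, referer="news.ycombinator.com")
```

With the schema in place, `query_visits(conn, "referer", ...)` raises `sqlite3.OperationalError` because no such column exists, while the schemaless lookup quietly returns nothing. An ODM layer can restore that check by validating field names before the query is sent.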

~~~
middus
An object-relational mapper maps between objects and relations -- hence the
name. Thus, without relations (-> Mongo) there can be no ORM.

~~~
gregwebs
Perhaps I am using the term too loosely, and I do see people use an acronym
with a D for document now. There are still plenty of relations in Mongo,
particularly when you expose them through Mongoid, rather than being enforced
in a schema: <http://mongoid.org/docs/relations.html>

The MongoDB docs do say that I am supposed to use the term "ODM". I will
update my post. Thanks.

