

How You Should Go About Learning NoSQL - latch
http://openmymind.net/2011/8/15/How-You-Should-Go-About-Learning-NoSQL

======
yummyfajitas
Step 1 in learning NoSQL: learn SQL very well. There is a good chance it
already solves your problems AND ensures your data makes sense.

~~~
mtogo
As a bonus, SQL cares about data reliability*.

* Not that all of "NoSQL" doesn't, but two major "NoSQL" databases, Redis and
MongoDB, both seem to care more about performance than durability.

~~~
rbranson
MySQL cares about data reliability? Maybe if you spend much time agonizing
over the configuration and set it to strict mode.

~~~
mtogo
I generally don't use MySQL, but iirc it confirms that nothing went wrong
saving the data (whereas with mongo there is no confirmation step, you just
have to hope that it was successful), and the data is written to disk.
Additionally, i've heard of zero "Oh no, MySQL just lost 50% of my production
database!" where it wasn't the user's fault, and i've heard enough of that
from Mongo to stay away.

~~~
jasonmccay
Respectfully, this isn't true. Yes, fire and forget is the default behavior,
but there is a confirmation step that you can check. It may be implemented a
bit differently from driver to driver, but it is generally called a "safe
insert". Numerous people use this to ensure their writes across single
databases as well as multi-node master-slave and replica sets.
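
To make the difference concrete, here is a minimal sketch of fire-and-forget versus a checked write. It is a toy in-memory model, not any MongoDB driver's API: `ToyServer`, `insert_fire_and_forget`, `insert_safe` and `get_last_error` are hypothetical names chosen for illustration, loosely mirroring the getLastError check mentioned elsewhere in this thread.

```python
class ToyServer:
    """In-memory stand-in for a database server (hypothetical, not MongoDB)."""

    def __init__(self):
        self.docs = {}
        self.last_error = None

    def insert(self, doc):
        # Record failures instead of raising, so the caller has to ask.
        if doc["_id"] in self.docs:
            self.last_error = "duplicate key: %r" % (doc["_id"],)
            return
        self.docs[doc["_id"]] = doc
        self.last_error = None

    def get_last_error(self):
        return self.last_error


def insert_fire_and_forget(server, doc):
    # The write is sent and the outcome is never inspected.
    server.insert(doc)


def insert_safe(server, doc):
    # Same write, followed by an explicit check of the outcome.
    server.insert(doc)
    err = server.get_last_error()
    if err is not None:
        raise RuntimeError(err)
```

With this model, inserting the same `_id` twice through `insert_fire_and_forget` silently drops the second document, while `insert_safe` raises. Whether a confirmed write has also reached disk is a separate question that this sketch does not address.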

~~~
mtogo
Respectfully, this isn't true. Safe inserts are _safer_ , but not safe. There
could still be a problem writing the data to disk, and (My|Postgre)SQL just
doesn't have this problem.

~~~
jasonmccay
Assuming the changes that were introduced in MongoDB 1.8 for single-server
durability, writes are being put to disk with journaling. So, the data is
written to disk.
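
For anyone unfamiliar with the idea, journaling in general can be sketched as follows: append each change to a log, force it to disk, and only then apply it, so the state can be rebuilt after a crash. This is a toy illustration under my own assumptions (one JSON record per line, `JournaledStore` and `demo_recovery` are made-up names), not a description of how MongoDB 1.8 implements its journal.

```python
import json
import os
import tempfile


class JournaledStore:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.data = {}
        self._replay()
        self._journal = open(journal_path, "a", encoding="utf-8")

    def _replay(self):
        # Rebuild in-memory state from whatever reached the journal.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-write
                self.data[entry["key"]] = entry["value"]

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value})
        self._journal.write(record + "\n")
        self._journal.flush()
        os.fsync(self._journal.fileno())  # ask the OS to push it to disk
        self.data[key] = value

    def close(self):
        self._journal.close()


def demo_recovery():
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    store = JournaledStore(path)
    store.put("a", 1)
    store.put("a", 2)
    store.put("b", 3)
    store.close()  # pretend the process died here
    recovered = JournaledStore(path)
    state = dict(recovered.data)
    recovered.close()
    return state
```

The `fsync` per write is what costs throughput, which is the performance versus durability tradeoff being argued about above.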

------
noelwelsh
My approach was to read the Dynamo paper. It presents the goals of Dynamo
and the technology from which it is constructed in a very compact form. From
there you have a basis for understanding the engineering tradeoffs made by the
other NoSQL databases. Then you can actually make decisions based on what fits
your use case, rather than just choosing Mongo because it's popular. (Don't
get me wrong, Mongo is great in its niche.)
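
One example of the kind of tradeoff the paper lays out is the quorum setting: with N replicas, R nodes consulted per read and W nodes acknowledging each write, choosing R + W > N guarantees that every read set overlaps every write set. A small sketch that checks this rule by brute force (the function names are mine, not from the paper):

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Rule of thumb: read and write quorums must intersect when r + w > n."""
    return r + w > n


def overlap_by_enumeration(n, r, w):
    """Check every possible read set against every possible write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )
```

Lower R or W buys latency and availability at the price of possibly reading stale data, which is the sort of decision you can only make once you know your use case.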

Reading the Google BigTable paper would also be a good idea, as it represents
another major strand of work.

My blog post on Dynamo:
http://untyped.com/untyping/2011/01/21/all-about-amazons-dynamo/

------
nkh
From the article:

 _A lot of NoSQL solutions are about solving specific problems. MongoDB is a
general solution which can (and probably should) be used in 90% of the cases
that you currently use an RDBMS._

As I don't use any NoSQL solutions today, can someone list out a few of
these cases where I am using an RDBMS and I should be using MongoDB?

~~~
mtogo
I'm wondering about this too. For almost everything i work on, the data being
in the correct format, written to disk, and stored reliably is more important
than the speed or other advantages of Mongo.

~~~
jasonmccay
I have a hard time understanding responses like this and, in the end, they
make me think that people a) really don't understand some of these new data
technologies and b) are just as much on the anti-bandwagon as those that sit
on the bandwagon itself.

For people using MongoDB daily, their data is in the correct format, it is
persisted to disk, it is stored reliably and it is fast. On top of that, it
gives them flexibility as their web app changes and new features and data
structures are added, it is easily viewed and manipulated in JSON-format, and
if and when the day comes that they need serious scaling, it helps there as
well. Also, in the world of EC2, it is straight-forward to set up replica sets
to offer redundancy.

All these conversations sound eerily familiar to the Java guys bashing Ruby
and Ruby on Rails about five years ago. If you want to stick with what you
have, then no worries. I just think people should be excited that over the
last couple of years, the "golden hammer" approach to storing data has finally
been overtaken and developers have a choice about what technology best solves
their data problem.

~~~
StavrosK
I have first-hand experience with MongoDB silently eating my data (yes, I was
using the 64-bit version) which I realized a week earlier when I noticed half
the dataset missing.

The reputation (specifically of Mongo) is probably not undeserved.

~~~
jasonmccay
I am not knowledgeable of your situation, but I am curious about your set up,
the intentions you had when you were storing things and other factors in
MongoDB not storing your data. Also, I am curious how long ago this was...what
version of MongoDB you were using and if you were checking getLastError after
the write (assuming you wanted that behavior).

I am not denying your experience, just curious about the whole picture.

~~~
StavrosK
I have a writeup here:

http://www.korokithakis.net/posts/using-mongodb-for-great-science-part-2/

~~~
dolinsky
I really don't see how you couldn't replace MongoDB in that post with any
other database and still be presented with the same issues:

- adding indexes slows writes down and increases memory requirements
- running queries with poor / no indexes will cause you to have I/O
  constraints and an unhappy CPU
- it seems like your data set was better suited for a graph db like neo4j
  than a document db or RDBMS.

As for the data corruption, I didn't see you mention if you were checking for a
response on saves or not.

I think the problems you experienced were due much more to the fact that you
apparently had more data than the machine could handle, not necessarily the
database engine used.

~~~
itaborai83
MongoDB, the "blame the user" database

------
Maro
There are several different NoSQLs.

For example, currently MongoDB is a good choice if you're looking for indices,
ad-hoc queries and want to get up and running quickly on e.g. a web project.
Interestingly, with indices and ad-hoc queries, MongoDB becomes a lot like
MySQL, only you're writing weird And() and Or() functions instead of "SELECT
... WHERE AND ... OR". Also, indices are tied to the documents themselves, so
it's not clear how that scales.
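
To illustrate the comparison, here is a rough sketch of the same filter written both ways. The combinators `eq`, `gt`, `and_` and `or_` are hypothetical helpers I made up for this example (they are not a MongoDB driver API), and SQLite stands in for MySQL because it ships with Python.

```python
import sqlite3

USERS = [
    {"name": "ann", "age": 31, "city": "Oslo"},
    {"name": "bob", "age": 17, "city": "Oslo"},
    {"name": "cy", "age": 45, "city": "Lima"},
]


def eq(field, value):
    return lambda doc: doc.get(field) == value


def gt(field, value):
    return lambda doc: field in doc and doc[field] > value


def and_(*preds):
    return lambda doc: all(p(doc) for p in preds)


def or_(*preds):
    return lambda doc: any(p(doc) for p in preds)


def query_docs():
    # Function-style query over documents.
    pred = or_(and_(eq("city", "Oslo"), gt("age", 18)), eq("name", "cy"))
    return sorted(d["name"] for d in USERS if pred(d))


def query_sql():
    # The same condition as a WHERE clause.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER, city TEXT)")
    conn.executemany("INSERT INTO users VALUES (:name, :age, :city)", USERS)
    rows = conn.execute(
        "SELECT name FROM users "
        "WHERE (city = 'Oslo' AND age > 18) OR name = 'cy' "
        "ORDER BY name"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Both functions select the same rows; the difference is only in how the condition is spelled.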

Shameless plug:

Or you could go more bleeding-edge and check out ScalienDB, which is a
straight key-value store built on Paxos and sharding. Nevertheless, getting
started is easy:

    
    
      https://github.com/scalien/scaliendb/wiki
    

I wrote ScalienDB. The plan going forward is to add a data model and
distributed transactions and in general become a low level data layer
substrate similar to how Google uses its databases.

------
figital
SQL is (an expression of) relational algebra:
<http://en.wikipedia.org/wiki/Relational_algebra>. NoSQL is your (poorly
named) ability to cache stuff (flat datasets) in assorted places.

------
timanglade
It’s obvious the author means well — and thanks to him for plugging the NOSQL
Tapes! — but there are some seriously leaky assumptions at work here about a)
how Mongo, Redis (& RDBMSes) work and b) the extent to which this poorly-
digested (or regurgitated) understanding applies to other NOSQL solutions.

Too many shortcuts, inaccuracies, half-truths and implied errors to list here.
The core idea of learning about NOSQL through first-hand experience (&
dissection) of each project is sound, though.

------
br1
In some RDBMSs, primary and secondary indexes are very different. MsSql
clustered tables store the tuples themselves in the primary index B Tree.
Traversing the data in the order of the primary index incurs no extra disk
seeks. Also, you can add extra columns to secondary indexes to support queries
without searching the primary index at all.
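
A quick way to see the second point is to ask a database for its query plan. The sketch below uses SQLite as a stand-in, since it is bundled with Python; SQL Server's included columns work differently in detail, and the table and index names here are invented for the example. When every column the query needs is in the secondary index, SQLite reports a covering index and never touches the table itself.

```python
import sqlite3


def query_plan(sql):
    """Return SQLite's plan text for a query against a small scores table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scores ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER, note TEXT)"
    )
    # Secondary index that carries the score along with the lookup key.
    conn.execute("CREATE INDEX idx_user_score ON scores (user_id, score)")
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    conn.close()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


covered = query_plan("SELECT score FROM scores WHERE user_id = 1")
not_covered = query_plan("SELECT note FROM scores WHERE user_id = 1")
```

Printing `covered` and `not_covered` side by side shows the first plan using the index alone, while the second still needs a lookup into the table to fetch `note`.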

The emulation of secondary indexes is inefficient. In an RDBMS without built-in
index maintenance you would create table (*LeaderboardId, *ScoreId), not
(*LeaderboardId, ScoreIds). There's no need for comma separated fields.
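
A minimal sketch of that layout, reading the asterisks as marking key columns (my assumption) and using SQLite with made-up table and column names: one row per (leaderboard, score) pair, with both columns forming the key.

```python
import sqlite3


def build_index_table():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE leaderboard_scores ("
        "leaderboard_id INTEGER NOT NULL, "
        "score_id INTEGER NOT NULL, "
        "PRIMARY KEY (leaderboard_id, score_id))"
    )
    conn.executemany(
        "INSERT INTO leaderboard_scores VALUES (?, ?)",
        [(1, 10), (1, 11), (2, 12)],
    )
    return conn


def scores_in(conn, leaderboard_id):
    rows = conn.execute(
        "SELECT score_id FROM leaderboard_scores "
        "WHERE leaderboard_id = ? ORDER BY score_id",
        (leaderboard_id,),
    )
    return [row[0] for row in rows]


def remove_score(conn, leaderboard_id, score_id):
    # Dropping one entry is a single-row delete; no string is parsed or rewritten.
    conn.execute(
        "DELETE FROM leaderboard_scores "
        "WHERE leaderboard_id = ? AND score_id = ?",
        (leaderboard_id, score_id),
    )
```

Adding or removing a score touches exactly one row, which is the inefficiency the comma separated version cannot avoid.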

------
JoshClose
You don't hear as much about it, but RavenDB is an excellent No-SQL solution
in MS land. It was written from the ground up with .NET instead of being
ported. It's become my first choice ahead of MS-SQL Server for .NET projects.

<http://ravendb.net/>

------
rsbrown
Very light reading, but all the same an excellent overview of MongoDB, Redis
and Cassandra. Well done.

------
perfunctory
How can one learn something that is defined by what it is not?

~~~
virmundi
This thought is not mine, but I cannot recall the source: NoSQL is a bad name;
it should be NotSQL. As a result it is a very large umbrella. When one sees
NoSQL it is a safe assumption to think MongoDB. But you could also think DB4O
(I like this much more, in an abstract way). So you can go about learning any
of these technologies since you find an instance of NotSQL. To learn NoSQL,
you are really still able to learn this. You are learning a philosophy rather
than a technology.

~~~
nickknw
I remember reading something by Erik Meijer where he states he also thinks it
is an unfortunate name. I think he suggested coSQL, and gave an interesting
perspective on the relationship between SQL and noSQL.

This was the article:
http://cacm.acm.org/magazines/2011/4/106584-a-co-relational-model-of-data-for-large-shared-data-banks

------
akivabamberger
The plural of index is indices, not indexes...

------
Cloven
"Yesterday I tweeted three simple rules to learning NoSQL. Today I'd like to
expand on that. The rules are:

    
    
        1: Use Mon" CTRL-W

~~~
latch
What path would you suggest people take in learning something that challenges
decades of best practices?

~~~
knieveltech
1st, be damned sure that decades old best practices actually aren't the best
tool for your current application, developer hype be damned.

~~~
latch
Hard to know whether they aren't the best tool for the job without at least
familiarizing yourself with some alternatives.

------
BenSS
I've had a great experience with CouchDB as a NoSQL solution, simply because
it is NOT like your traditional MySQL/Postgres. I'm not trying to map my
existing knowledge to something that is fundamentally different. I think
MongoDB might be a mistake in this regard, because users will attempt to
deploy it similarly to a SQL solution.

