
Ask HN: Learning NoSQL, papers and books - wareotie
In your opinion, which papers and books are mandatory to really understand NoSQL?
======
indogooner
I don't know what your current knowledge of or experience with NoSQL databases
is, but I would suggest starting with the well-known Bigtable paper [1]. After
that, instead of reading more papers, have a look at the AOSA chapter on NoSQL
[2]. You can then either go through the Bigtable paper again to improve your
understanding, if you feel the need, or jump to the Dynamo paper [3]. To
develop your understanding further, I think it would be good to go through the
documentation and source code of some open source databases. This would help
you connect the usage scenarios with the design choices you saw in the papers.

After this it is up to you. The papers reference a lot of distributed systems
literature. If you are interested you can go through the resources here [4].
If you want to go a more hands-on way, I would also recommend reading the AWS
DynamoDB best practices documentation [5] (you can also read up on Cassandra
or CouchDB) to see the practical considerations when using these systems. Then
try to use it or any other NoSQL database in a side project and see whether it
is a good fit. The data modelling would involve thinking hard about use cases
and would also help you compare this to relational systems.

[1] [https://static.googleusercontent.com/media/research.google.c...](https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf)

[2] [http://www.aosabook.org/en/nosql.html](http://www.aosabook.org/en/nosql.html)

[3] [http://www.allthingsdistributed.com/files/amazon-dynamo-sosp...](http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf)

[4] [https://github.com/aphyr/distsys-class](https://github.com/aphyr/distsys-class)

[5] [http://docs.aws.amazon.com/amazondynamodb/latest/developergu...](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html)

------
tzury
Not ranting or trolling, but in the vast majority of cases I've come across,
PostgreSQL or even MySQL or SQLite would have been a better choice.

 _(There must be something appealing to developers about using JSON-style
syntax rather than a Structured Query Language.)_

There should be a solid reason to pick NoSQL in general, and when such a
reason appears, picking the right one among the available NoSQL platforms is
another job.

[https://en.wikipedia.org/wiki/NoSQL](https://en.wikipedia.org/wiki/NoSQL)

~~~
greendisc
> Not ranting or trolling, but in the vast majority of cases I've come across,
> PostgreSQL or even MySQL or SQLite would have been a better choice.

 _This_ is ranting.

I am a Postgres proponent, but saying that PostgreSQL/MySQL/SQLite is the
better choice in the vast majority of cases the parent has come across is
reckless. The words were well chosen, making the rant less obvious.

There aren't good or bad DBs. Every DB has its strengths and its trade-offs.
As much as I like Postgres, there are many use cases for other DBs, including
NoSQL ones. I am not going to feed the troll by arguing why NoSQL can be
terrific or why SQL can be a big struggle. I am on both sides; both SQL and
NoSQL have their place.

It's sad that a thread about learning NoSQL gets hijacked by an unrelated top
comment opposing NoSQL.

~~~
traviscj
There definitely are bad databases. You can easily make a system that is NOT
consistent and NOT available and NOT partition tolerant, for example.

------
hdra
I highly recommend Martin Kleppmann's Designing Data-Intensive Applications
([http://dataintensive.net](http://dataintensive.net)).

It will not only help you understand what "SQL" and "NoSQL" data stores are,
it also covers the differences between them, what problems they are designed
to solve, how they try to solve them, and whether they'll help with your
problems as well.

------
ozanonay
I teach a course on database systems, including one class on distributed
databases (like Dynamo and Spanner) and another on dataflow engines (like
MapReduce/Hadoop and Spark).

Students seem to find the Dynamo paper to be the single most enlightening
resource. It does a great job of explaining Amazon's use case and how the
solution fits the problem. I also reference the relevant Red Book chapter and
some students value that context.

It's worth noting that students are very comfortable with relational DBMSs by
this point, both in theory and in practice. It quickly becomes clear to them
that NoSQL is better called "no transactions", as they know the costs and
benefits of various isolation levels in a traditional RDBMS. If you don't yet
have an undergraduate-level background in database systems I'd encourage you
to seek that out either first or at least along the way to understanding NoSQL
systems. My recommendations for how to do this as a self-learner are up on
[https://teachyourselfcs.com](https://teachyourselfcs.com).

~~~
itcmcgrath
Yet many non-relational systems do support ACID transactions across multiple
resources. Just from Google there are Megastore, Cloud Datastore, Spanner, and
Cloud Firestore.

------
zzzcpan
Distributed systems. Consensus [0], CAP, PACELC theorems [1], CRDTs [2], maybe
Chord DHT [3] for hash rings. Oh, and jepsen.io for actual database choices.

[0]
[https://en.wikipedia.org/wiki/Consensus_(computer_science)](https://en.wikipedia.org/wiki/Consensus_\(computer_science\))

[1]
[https://en.wikipedia.org/wiki/PACELC_theorem](https://en.wikipedia.org/wiki/PACELC_theorem)

[2] [https://en.wikipedia.org/wiki/Conflict-
free_replicated_data_...](https://en.wikipedia.org/wiki/Conflict-
free_replicated_data_type)

[3] [https://en.wikipedia.org/wiki/Chord_(peer-to-
peer)](https://en.wikipedia.org/wiki/Chord_\(peer-to-peer\))
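
For the hash ring idea specifically, a minimal sketch of consistent hashing in
Python may help. This is not Chord itself (there are no finger tables, joins,
or routing between peers); it only shows the core rule that a key belongs to
the first node clockwise from its position on the ring. The names `HashRing`,
`ring_position`, and the node labels are made up for illustration.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on the ring using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring: one point per node, no replicas."""

    def __init__(self, nodes):
        # Sorted (position, node) pairs form the ring.
        self._points = sorted((ring_position(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        if not self._points:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_left(self._points, (ring_position(key), ""))
        return self._points[idx % len(self._points)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # one of the three node labels
```

The property worth noticing is that removing a node only reassigns the keys
that node owned; every other key keeps its owner, which is the reason rings
are used instead of `hash(key) % number_of_nodes`.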

------
TruthSHIFT
The most important thing to understand about NoSQL is when you should use it.
For many circumstances, NoSQL isn't the right tool for the job. The key is
being able to recognize when it is.

I'm still learning how to determine when I should use NoSQL instead of SQL. My
best advice is to carefully consider how you plan on querying your data. If
you plan on making complex queries that link multiple relationships, NoSQL is
not for you.

~~~
analogic
Or in a slightly different form, what I'd personally love to always have an
answer for: is there a way to run this complex query in reasonable time in an
RDBMS, or do we have to force it into a NoSQL-ish solution (say, Solr)?

After I've optimized my queries/indexes to get from 60s to about 4s by running
through the usual stuff and trying not to do anything too stupid, how do I get
it to <200ms? Maybe the better question is how to structure the data so you
don't need the complex query.

------
sciurus
Seven Databases in Seven Weeks [https://pragprog.com/book/rwdata/seven-
databases-in-seven-we...](https://pragprog.com/book/rwdata/seven-databases-in-
seven-weeks)

Designing Data-Intensive Applications
[http://dataintensive.net/](http://dataintensive.net/)

------
brudgers
Same as for SQL databases: _Readings in Database Systems, 5th Edition_ \--
Peter Bailis, Joseph M. Hellerstein, Michael Stonebraker, editors

[http://www.redbook.io/](http://www.redbook.io/)

------
WoodenChair
As a starting point, if you have little background in NoSQL, I strongly
recommend this 1 hour talk by Martin Fowler:
[https://www.youtube.com/watch?v=qI_g07C_Q5I](https://www.youtube.com/watch?v=qI_g07C_Q5I)

It's slightly dated, but it still gives a strong overview of the different
paradigms. The truth is what you want to learn probably differs greatly
depending on the paradigm that fits your application. NoSQL databases can
broadly be categorized into document-oriented, key-value store, columnar, and
graph. This video will help you understand what at least three of those are.
Then you can focus in on books/articles about the paradigm that makes the most
sense for you.

------
rolandm
Video from Martin Fowler about Introduction to NoSQL:
[https://www.youtube.com/watch?v=qI_g07C_Q5I](https://www.youtube.com/watch?v=qI_g07C_Q5I)

Tutorial from Felix Gessert about NoSQL [https://medium.baqend.com/nosql-
databases-a-survey-and-decis...](https://medium.baqend.com/nosql-databases-a-
survey-and-decision-guidance-ea7823a822d)

and Slides [https://www.slideshare.net/felixgessert/nosql-data-stores-
in...](https://www.slideshare.net/felixgessert/nosql-data-stores-in-research-
and-practice-icde-2016-tutorial-extended-version-75275720)

------
zitterbewegung
Designing Data-Intensive Applications [1] is a good all-around book on
building applications and managing the data they handle, including with NoSQL.

[1] See [http://dataintensive.net](http://dataintensive.net)

------
dahart
I don't know of any mandatory books or theory about NoSQL; I picked it up on
the fly using Firebase for a web app. Not affiliated, but I'm a reasonably
happy customer. It's super easy to learn, and they have lots of tips and
pointers about how to use it well, as I'm sure others do.

Their tips are here, and I _think_ this applies to most/all NoSQL (someone
correct me if I'm wrong).
[https://firebase.google.com/docs/database/web/structure-
data](https://firebase.google.com/docs/database/web/structure-data)

The tl;dr is:

\- Avoid complex queries. Structure data so that you can make simple queries
that execute fast.

\- Avoid nesting & flatten data as much as is reasonable.
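
A minimal sketch of what flattening can look like, using plain Python dicts as
a stand-in for a JSON tree. The chat data, key names, and the helpers
`flatten_chats` and `chat_titles` are all hypothetical; this is not Firebase's
API, only the shape of the data.

```python
# Hypothetical chat data; every name here is illustrative.

# Nested: messages live inside each chat, so listing chat titles
# means loading every message as well.
nested = {
    "chats": {
        "one": {
            "title": "Weekend plans",
            "messages": {
                "m1": {"sender": "alice", "text": "Hello"},
                "m2": {"sender": "bob", "text": "Hi"},
            },
        },
    },
}

# Flattened: chat metadata and messages sit under separate top-level keys
# that share the chat id, so each can be read on its own.
flat = {
    "chats": {"one": {"title": "Weekend plans"}},
    "messages": {
        "one": {
            "m1": {"sender": "alice", "text": "Hello"},
            "m2": {"sender": "bob", "text": "Hi"},
        },
    },
}


def flatten_chats(tree):
    """Split nested chats into separate 'chats' and 'messages' maps."""
    chats, messages = {}, {}
    for chat_id, chat in tree["chats"].items():
        chats[chat_id] = {k: v for k, v in chat.items() if k != "messages"}
        messages[chat_id] = dict(chat.get("messages", {}))
    return {"chats": chats, "messages": messages}


def chat_titles(tree):
    """A simple query: it only touches the small 'chats' map."""
    return sorted(chat["title"] for chat in tree["chats"].values())
```

With the flat layout, `chat_titles(flat)` never has to read a message, while
the messages for one chat are still a single lookup by chat id.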

NoSQL is easier to learn & use than SQL, there's a lower barrier to entry, but
the trade-off is that it's less powerful than SQL, so you have to keep your
data simple too.

~~~
jklein11
>Avoid nesting & flatten data as much as is reasonable.

Isn't this contradictory?

~~~
dahart
Yes, it is a little bit - if you mean that one reason to use NoSQL is to store
nested JSON.

This is referring more to schema than data. In part what that means is to
avoid nested indexes... subtle but different than avoiding any nesting at all.
In other words, if you can treat the nested data as a blob, it's probably
okay, but if it's being used for a query, it's adding complexity that can
cause trouble.

Some of the reasons for that are Firebase-specific; they have to do with
security rules and how security can get too complicated if you're not careful
with nesting.

But I'd guess it still applies to other NoSQL data... nesting data as part of
the schema is like making another table, and all the complexity that comes
with it. Except it's a new table you can only get to by going through the
first table.

A common problem with nesting is thinking you got the order right for your use
case and later finding out you sometimes want to index by the inner data
rather than the outer data. If you only have A/B (B nested in A) and you need
to query for As, then you're fine. When you find out you need to query for Bs,
you have a problem.

Firebase even recommends duplicating data, if necessary, to have two indexes
A/B and B/A, rather than trying to query for nested data.
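
A rough sketch of the two-index idea in Python, with dicts standing in for the
JSON tree. Here A is a group and B is a user; the data and the helpers
`invert` and `add_member` are hypothetical, and a real store would need both
writes to happen together (for example in one multi-path update).

```python
from collections import defaultdict

# A/B index: users nested under groups. Fine for "who is in this group?".
members_by_group = {
    "admins": {"alice": True},
    "devs": {"alice": True, "bob": True},
}


def invert(index):
    """Build the B/A index from an A/B index."""
    inverted = defaultdict(dict)
    for outer, inner_keys in index.items():
        for inner in inner_keys:
            inverted[inner][outer] = True
    return dict(inverted)


# B/A index: the same facts duplicated, keyed the other way around.
# Answers "which groups is this user in?" without scanning every group.
groups_by_user = invert(members_by_group)


def add_member(group, user):
    """Write to both indexes so the duplicated data stays in sync."""
    members_by_group.setdefault(group, {})[user] = True
    groups_by_user.setdefault(user, {})[group] = True
```

The cost of the duplication is that every write has to touch both places; the
benefit is that both questions become a direct key lookup.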

~~~
jklein11
It seems like this Stack Overflow question hits on the same issue you ran
into.[1]

It looks like that might be specific to Firebase's implementation because this
can be achieved with MongoDB.[2]

1\. [https://stackoverflow.com/questions/27207059/firebase-
query-...](https://stackoverflow.com/questions/27207059/firebase-query-double-
nested)

2.[https://stackoverflow.com/questions/15654228/sort-by-
embedde...](https://stackoverflow.com/questions/15654228/sort-by-embedded-
document)

~~~
dahart
No, that's not the issue I ran into. Their API has changed since this question
was asked & answered to fix that particular issue.

The bigger issue remains that schema nesting causes a type of complexity that
SQL dbs avoid by always being flat. Even in the answer you linked to, the very
last sentence is: the most important one for people new to NoSQL/hierarchical
databases seems to be "avoid building nests".

Schema nesting in MongoDB is also best avoided, if you can, e.g.:

[https://stackoverflow.com/questions/5108790/mongodb-best-
pra...](https://stackoverflow.com/questions/5108790/mongodb-best-practice-
nesting)

------
manigandham
Start with a general understanding of SQL/NoSQL/ACID/CAP and how they relate:
[https://www.quora.com/What-is-the-relation-between-SQL-
NoSQL...](https://www.quora.com/What-is-the-relation-between-SQL-NoSQL-the-
CAP-theorem-and-ACID/answer/Mani-Gandham)

Then read this book for in-depth details - _Designing Data-Intensive
Applications_ : [https://dataintensive.net/](https://dataintensive.net/)

------
unkown-unknowns
I found the book CouchDB: The Definitive Guide to be a good introduction when
I first read it some years ago. I bought the dead tree edition but they have
an online version that I think may have been updated.

[http://guide.couchdb.org/](http://guide.couchdb.org/)

------
opendomain
NoSQL Distilled, NoSQL For Dummies, Seven Databases in Seven Weeks, NoSQL for
Mere Mortals, Professional NoSQL

and of course the original papers from Amazon and Google.

If you have more questions - contact me at HN AT NoSql dot Com

