

My top 5 tips for modelling your data to scale - mbroberg
https://cloudant.com/blog/my-top-5-tips-for-modelling-your-data-to-scale/

======
ddebernardy
I'm still very puzzled by these eventually consistent systems… Maybe I'm
just getting old…

> Consider Immutable Data

You must be doing something wrong.

Perhaps locking isn't such a bad idea after all?
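
For what it's worth, the usual answer in CouchDB/Cloudant land is revision-based
optimistic concurrency rather than locks: every update must quote the revision
it read, and a stale writer gets a conflict instead of blocking. A toy Python
sketch of the idea (the names and the API are made up, not Cloudant's actual
interface):

    # Toy sketch of revision-based updates (MVCC-style) instead of locks.
    # Names are illustrative; this is not Cloudant's actual API.
    class Conflict(Exception):
        pass

    class DocStore:
        def __init__(self):
            self._docs = {}  # doc_id -> (rev, doc)

        def put(self, doc_id, doc, rev=None):
            current = self._docs.get(doc_id)
            current_rev = current[0] if current else None
            if rev != current_rev:
                # A writer holding a stale revision is rejected rather
                # than blocked on a lock; it must re-read and retry.
                raise Conflict(doc_id)
            new_rev = (rev or 0) + 1
            self._docs[doc_id] = (new_rev, dict(doc))
            return new_rev

    store = DocStore()
    rev1 = store.put("order:1", {"status": "new"})
    rev2 = store.put("order:1", {"status": "paid"}, rev=rev1)
    try:
        store.put("order:1", {"status": "void"}, rev=rev1)  # stale rev
    except Conflict:
        pass  # lost the race: re-read, then retry the update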

> De-Normalise Your Data

The more usual advice is to normalize to the hilt:

http://sqlblog.com/blogs/paul_nielsen/archive/2007/12/12/10-lessons-from-35k-tps.aspx

~~~
mathattack
Can you explain more about why normalizing is better? I get it for OLTP
systems, but for data-analysis performance, denormalizing up front seems to
remove a step every time you want to pull the data.

~~~
ddebernardy
That's a use case where I agree with you 100%, especially when it's an
expensive step: pre-aggregating (and occasionally indexing) a count or sum can
be priceless for data analysis.
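
In CouchDB/Cloudant terms that's essentially what a map/reduce view with the
built-in _count or _sum reduce gives you. A toy Python sketch of the same
write-time aggregation, with invented data:

    # Invented data; the point is that the aggregate is maintained at
    # write time so the analytics read is a lookup, not a scan.
    from collections import defaultdict

    events = [
        {"user": "a", "amount": 10},
        {"user": "b", "amount": 5},
        {"user": "a", "amount": 7},
    ]

    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for e in events:  # in practice this happens incrementally, per write
        t = totals[e["user"]]
        t["count"] += 1
        t["sum"] += e["amount"]

    print(totals["a"])  # {'count': 2, 'sum': 17}, no scan at read time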

OP, in contrast, basically advocates getting rid of joins and such to avoid
network overhead. He's not wrong in his own context, i.e. a document store
with no locking. It just contrasts, to me, with the more conventional tip you
hear in SQL land.
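
To illustrate the trade-off with hypothetical data (plain Python dicts standing
in for a users table, an orders table, and a de-normalised order document):

    # Hypothetical data. Normalised: two lookups, i.e. a "join" done
    # client-side, which in a document store means two network round trips.
    users = {"u1": {"name": "Ann"}}
    orders = {"o1": {"user_id": "u1", "total": 42}}

    order = orders["o1"]              # round trip 1
    buyer = users[order["user_id"]]   # round trip 2

    # De-normalised: the name is copied into the order document, so one
    # fetch answers the query, at the cost of updating every copy if
    # the name ever changes.
    orders_denorm = {"o1": {"user_id": "u1", "user_name": "Ann", "total": 42}}
    order = orders_denorm["o1"]       # one round trip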

Note that OP mentions difficult problems arising at around 1 tps due to the
lack of locking. The reference I linked to is advice on handling throughput
four orders of magnitude higher in SQL Server. The gist of it is that, at
higher transaction rates, disk IO and CPU tend to become the bottleneck before
network access does. Normalizing more helps in that context.
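
To put rough numbers on the IO argument (every figure here is invented except
the 35k tps from the linked post's title):

    # Back-of-envelope sketch: duplicated columns turn into redundant
    # disk writes on every transaction at high throughput.
    tps = 35_000                 # target throughput from the linked post
    dup_bytes_per_row = 200      # assumed size of the duplicated columns

    extra_io = tps * dup_bytes_per_row  # redundant bytes written per second
    print(f"{extra_io / 1e6:.0f} MB/s of disk IO spent on duplicates")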

