

A data modeling challenge: relational vs document oriented for modeling orders - mschireson
http://maxschireson.com/2011/05/10/a-data-modeling-challenge-to-the-relational-is-always-best-set/

======
pradocchia
It's perfectly acceptable to store point-in-time values--whole strings!--in
your relation model, along side the FK. They actually record _different_
information:

1\. the strings are for reconstructing the order as it, at the time it was
placed.

2\. the FKs are for managing and categorizing the order data, eg: How many
orders did customer X place in March? What's the margin on SKU Y? etc. The FKs
facilitate this. Update the SKU description, assuming the product is still the
same product, and all the data flows through properly to the reports.

Really, the relation model is not optimized for "first-order" applications,
such as saving and displaying a complex entity. Document dbs are great at
this.

Instead, it's optimized for second and third-order applications, like "show me
all the clients that ordered X in March but didn't order Y in April." The
whole point of the relation model is to make the data orthogonal to any
particular application. Try asking those second-order questions of your
document-based DB. Yes, it can be done, but at what effort, and with what
degree of confidence in the results? cf.
<http://howfuckedismydatabase.com/nosql/>

EDIT: all that said, I think there really is a benefit to document dbs: maybe
you don't care about second order applications yet. Maybe you just need to
bang something out today.

Since the relational model is orthogonal to your application, it can be a real
bitch to get it "right". How do you know what you will use this data for
tomorrow? You don't, you can make some guesses, but this is where the normal
forms come in: "follow these rules, and you can use your data for anything!"
Well, fine, but the resulting models are often very hard to grok, and hard to
work with. For someone w/ better things to do, this can turn into a real time
sink.

Perhaps it _is_ better to take on a little technical debt today, with the
expectation that tomorrow, if need be, you'll convert or extract to a more
relational form.

~~~
mschireson
Agree on using both point in time values and FKs.

Agree also that there are plenty of second and third order apps where RDBMS is
great.

The non-DB comic is funny.

~~~
pradocchia
You took a real beating on your blog, which prompted my edit.

It may be useful at times to frame the debate in terms of time to market,
secondary apps, etc. The relational model necessarily presupposes you will use
the database for more than one thing, and over an extended period of time,
thus warranting a high up-front investment and agnostic model. This may be
right in some cases, but often times it's more important to publish early,
bootstrap yourself, and start iterating.

~~~
mschireson
Thanks. Didn't expect as much interest as I got - would definitely have worked
longer on the post if I thought thousands of people would have read it :)

My view is not that relational is bad, but that the world needs an
alternative.

------
mattj
I enjoy using mongo / other document stores, but this article is lame. The
contrast is between a single document storing an order, with sub-documents
containing details, versus a table per sub-doc type. They claim you have to
denormalize data in the relational version, which isn't true, and use that
claim to assert that updates are complicated.

Guess what? Denormalizing data can make things complicated in mongo-land, too.
This isn't just a relational db problem.

~~~
davidmathers
Yeah, total straw man. For e.g.:

 _Similarly, the order lines don’t contain product names, but rather foreign
keys which point to a product table_

Well, the product name could be the foreign key. Or it could not be. If you
have 2 different products with the same name (or 1 person with 2 different
names) then you have some complexity. Doesn't matter what kind of database
you're using.

------
tzs
I was playing around with the idea of redoing our customer database at work.
Our customer database contains for each customer their email address, name,
when their service expires, how many licenses they have, and things like that.
Essentially a bunch of named scalar values.

It also contains their transaction history--essentially an event log. Some of
the transactions have auxiliary information such as specific line items on an
order.

A person subscribing to our service can use it on multiple computers (up to a
number determined by exactly what they purchased). So there is a list of the
computers they are using it on, with some information about that use.

The state of a customer could be represented nicely by a nested data
structure, like a Perl hash or a JSON document, and so some kind of document
oriented storage seems like it could work well. Most changes to a customer's
data consist of adding new transactions, and occasionally updating the
expiration date.

We have a customer service interface that our support people use to view and
change a customer's data. Getting all the data by fetching a single JSON
document would be much simpler than the multiple queries that are now done to
grab the data from the MySQL database it is in, spread across many normalized
tables.

In general, with one notable exception, almost everything that deals with a
customer is dealing with one customer at a time. For instance, a support
person is dealing with the customer they have on the phone. The order
processor is dealing the customer's account for the order that it is currently
processing.

It's not even clear we need a database at all for this. Why not just a file
per account, with the file containing a serialized Perl hash or JSON document
or something similar, with a simple classic Unix file locking mechanism used
to prevent corruption? This is easy to scale, for both reads and writes.

That brings me to the notable exception to "dealing with one customer at a
time". Reports. Reports like to do things involving selecting data from a
large number of customers, related by various factors such as having purchased
particular SKUs in a particular time frame, or having their credit cards
charged through a particular gateway, or things like that.

Reports seems to be something that fits in a lot better with the traditional
relational approach.

If the set of reports needed was known up front, I think one could design a
system that uses a document approach for storing customer data, but also
maintains a relational reports database to hold data specifically for reports.
We tend to have a changing set of reports--we do a lot of poking around and
playing with data to try to find ways to improve our system and I'm often
looking at things I had no idea a week before I'd ever need to look at for a
report.

------
VladRussian
the basic point of the article is that author likes documents better than
normalized record sets. There are at least 2 sets of concepts involved -
documents vs. records and denormalization vs. normalization. The author
preference for document format has no bearing on the necessity and
effectiveness of normalization if record approach is chosen.

