
Best Practices for Data Modeling - mjirv
https://www.stitchdata.com/blog/best-practices-for-data-modeling/
======
zackmorris
From the article:

 _In general, when building a data model for end users you 're going to want
to materialize as much as possible. This often means denormalizing as much as
possible so that, instead of having a star schema where joins are performed on
the fly, you have a few really wide tables (many many columns) with all of the
relevant information for a given object available._

Where "materialization" means whether a relation is created as a view (rather
than a table).

I'm not sure I agree with this. Personally, I think it's better to primarily
normalize and write all queries as close to functional programming as
possible. Which means opting for computation on the fly during development and
relying on the underlying database infrastructure to optimize internally. Then
denormalize or dupe data via views and caching after bottlenecks have been
identified. Otherwise you're fighting premature optimization during
development.

That said, there could be a place for materialization in log-structured
storage (LSS):

[https://jvns.ca/blog/2017/06/11/log-structured-
storage/](https://jvns.ca/blog/2017/06/11/log-structured-storage/)

I'm thinking that the future of all of this could be LSS on distributed
consensus algorithms like RAFT, then devoting some amount of cache to building
out materialized relationships that can always be derived from the log. Then
we could have our cake and eat it too with fast durable writes and low-
overhead reads without joins.

If anyone has an example of this, I'd love to see it.

------
coward12345678
This article has more fluff in it than Ron Jeremey's condo circa 1995

~~~
tracer4201
It confirmed the things I’ve learned in the past year. Not sure what you’re
talking about. If you have something more constructive, tell us.

------
rumanator
Does anyone know if there are any resources on how to design data warehouses
for non-relationa data such as images, both unprocessed and processed?

------
throwaway35784
> IDs should get an _id suffix, and primary keys should be called $OBJECT_id
> (e.g., order_id, user_id, subscription_id, order_item_name_id).

Wrong. Be consistent. All id's are named ID or id. Consistency is key.

Most of the article is filler and style choices.

~~~
Ma8ee
No, clarity is much more important than consistency. And I much prefer to see
an expression like

    
    
      O.order_id = CO.order_id
    

than

    
    
      O.Id = CO.order_id

~~~
roland-s
If you're using table aliases like "O" you're already throwing clarity out the
window. The following is perfectly clear and readable IMO:

    
    
        Orders.id = Customers.order_id
    

and also indicates which column is the primary key and which is the foreign
key. What about other column names duplicated across tables, should

    
    
        Orders.created_at = Customers.created_at
    

become

    
    
        Orders.order_created_at = Customers.customer_created_at
    

Scope exists for a reason, don't be afraid to use it!

~~~
Ma8ee
The reason we use aliases is it almost only in toy-examples that most tables
are called things like Order. I won't fill my SELECT with row after row with
CustomerOrderRowHistory...

And your second example kind of prove my point. When it it written out like
that it become obvious that you are trying to compare diffent quantities.
There is clearly something wrong when you are trying to join on
order_created_at = customer_created_at which isn't at all as clear with your
way of writing.

