
Context: I'm a data architect, and I work on HUGE datasets.

The answer to your question is "it depends". In this case, it depends on what you are trying to do with the data. Here's a quick rundown of why you would use each approach, with a final thought at the end.

If the intent is to log data as a snapshot in time, and storage is cheap, then denormalize the data. Querying may be a bit slower, but depending on your database software and how it's priced, trading extra storage for simpler writes can work out cheaper.
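
To make the snapshot case concrete, here's a rough sketch (SQLite via Python purely for illustration; the table and column names are made up). Every row carries a full copy of the facts as they were at write time, so it never goes stale and never needs a join:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Denormalized log: each row is a self-contained snapshot.
    conn.execute("""
        CREATE TABLE order_log (
            logged_at     TEXT,
            customer_name TEXT,     -- copied at write time, not referenced
            product_name  TEXT,     -- copied at write time
            unit_price    REAL,     -- the price as it was when this happened
            quantity      INTEGER
        )
    """)
    # Writes are a single INSERT, which is why this shape suits heavy logging.
    conn.execute(
        "INSERT INTO order_log VALUES (datetime('now'), 'Acme Corp', 'Widget', 9.99, 3)"
    )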

If the intent is to always show the most up-to-date information, then normalize the data. Depending on the complexity of your setup, normalization may be the simplest solution anyway.
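
And here's the normalized version of the same data (again, made-up names). Each fact lives in exactly one place, so an update shows up everywhere immediately, at the cost of joins on every read:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT, unit_price REAL);
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            product_id  INTEGER REFERENCES products(id),
            quantity    INTEGER,
            ordered_at  TEXT
        );
        INSERT INTO customers VALUES (1, 'Acme Corp');
        INSERT INTO products  VALUES (1, 'Widget', 9.99);
        INSERT INTO orders    VALUES (1, 1, 1, 3, datetime('now'));
    """)
    # "Most up-to-date" reads are a join; there is only one copy of each fact.
    rows = conn.execute("""
        SELECT c.name, p.name, p.unit_price, o.quantity
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        JOIN products  p ON p.id = o.product_id
    """).fetchall()

If the price in products changes, every query from then on sees the new price; the denormalized log above keeps the old one, which is exactly what you want for a point-in-time snapshot.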

Usually, denormalization shows up in logging workloads, and normalization is used for indexed data.

If you will have a lot of real-time reads, use a normalized relational database.

If you will have a lot of writes, but relatively few reads, then go with denormalized logging.

In my day-to-day job we use two databases, one normalized and one denormalized, and which one we use depends on what we are doing. Of course, our data is at a massive scale, so the same trade-offs may not apply to you.

If this is a small project, just use a normalized database. Don't overengineer it.



