Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: How to get good at data modelling?
29 points by nomilk on May 9, 2022 | hide | past | favorite | 4 comments
I recently made a database for a side project (10-15 tables, moderate complexity). I estimated it would take 40 hours for the entire db (tables, columns, relations, validations, indexes), but so far it has taken 5x as long and still isn't finished.

I'd like to get better so such blow-outs don't happen in the future. Here's my plan:

1. Before modelling, map out business processes thoroughly on paper or whiteboard, including edgecases.

2. Implement ideas early to inexpensively discover if they work or don't work

3. Experience improves pattern recognition, so practice, practice, practice.

4. Know the types of table associations well

What other things can a self-taught SWE do/read/watch to get better at quickly and competently designing relational databases?




My approach:

1. Model business processes that you expect to be generating data as well as those that will be asking for data. Both sides inform the design: how hard it is to put data in, and how well the representation supports the different ways you want to get it out again.

2. Design the schema. Pen and paper is a good start, as are some tools that let you do it graphically.

3. Start writing down queries you expect to be able to do. Again, pen and paper/whiteboard. These force you to think through the relations and see if they make sense when you're building queries. It isn't uncommon for a schema that makes sense for putting data into a database to not quite be flexible enough to support the kinds of queries you care about.

4. Prototype, test. This entails not just coding up the schema, but connecting it to some source that will populate the database. I often write synthetic data generators to shove a bunch of data in so I can evaluate queries. This can inform schema revisions to address performance issues.

SQLite is invaluable for prototyping. Last time I did this I used the SQLite manager plugin for Firefox to play with the database design as it was being designed in that 4th step above. Even if the plan is to move to a different database, SQLite is good for getting the rough picture right and playing with it with minimal work/friction.

I have a couple database textbooks around that I refer to that are good references for SQL and schema design. I'd have to go dig them out of a box to find the names though - it's been a couple years since the last time I went through this exercise.


Thanks a lot! I need to get better at generating synthetic data. All the libraries exist, I'm just not fluent in them yet.

If you think the books would help, and you recall their names, please do share!


What works for me: prototyping tables in Excel.

Which is a grandiose way of saying: sketching them!

Show a few rows for Table A, and how they connect to rows in Table B. Connect keys using colour coding, e.g. Table A's PK, and its FK in Table B, are both a certain shade of blue.

This can quickly yield a sense of whether the model "feels right". Are copies of data showing up? Are there any situations which can't be represented? What happens over time - if a row in Table A gets replaced/superseded, how does that affect other tables?

Side benefit - knowledge sharing. It's a nice medium for sharing with team mates and other colleagues for a really rapid share of info and design.


My tips (as a DB architect amongst other things):

- Get your business requirements as detailed & complete as possible before you begin.

- Break down your requirements into nouns, which will be entities, and adjectives/properties/descriptives, which will be your columns. Write a list of each. E.g., 'Sale' is a noun, 'Sale Date' is a property of a Sale. Repeat for every requirement. Make a big list.

- Entities NORMALLY but not always become tables. Likewise properties. Find out about strong and weak entities. Look for properties shared between entities, entities that are properties of other entities. Consider using weak entities to map things together, i.e. Appointment only exists if Customer, Branch and FinancialProduct exist. Avoid circular relationships.

- Look through your list for non-trivial dependencies. These are hierarchies, e.g. Branch -> Department -> Manager -> Employee, or transient dependencies e.g. Postcode -> Address -> Person, or in fact any non-direct relationship between one thing and another. Consider how you will map these out - as a series of parent-child tables? Some recursive structure?

- Decide on whether you want a fully normalised model, a partially-normalised model or a denormalised model. Normalised models scale well but without good index and partition management perform like shit. Denormalised models will fly, but good luck updating data and maintaining consistency.

- Build your ERDs before you write any code. You want a conceptual model, then a logical model, and then a physical model, in that order. Your conceptual model links your concepts and establishes your cross-dependencies; the logical model is the formal ERD of the same and your physical model should be homogenous (that is to say, directly mappable) to the SQL DDL which will create your schema. This depends on the product you're deploying to.

- Find out about 1-1, 1-N, M-N relationships. Find out about fan and chasm traps. Find out about datatypes. Avoid direct M-N relationships and structure out your schema with solid foundations. Choose your datatypes well and fit them well to the data. Nobody needs NVARCHAR(MAX)es on every column. The query optimiser will not thank you.

- Prototype your schema - deploy it somewhere. Now insert, update, delete data from it. Query it. What hurts? Can your requirements be met without a pile of abstractions? Make changes and retry.

- How will you handle abstractions? Direct queries? Views? Stored procedures? Indexed views (materialised views)? Will there be some crappy ORM layered on top? What will you feed it on, direct queries or views?

- Make any changes you need, repeat and rinse, repeat and rinse.

- Write a Crayola summary with lots of colourful diagrams and smiley-face stickers and present it back to the business analysts for feedback. Do not use long words.

There is quite a bit more to it than this but these I feel are the essentials.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: