
Design better databases - fordarnold
https://web.archive.org/web/20160408103107/http://dbpatterns.com/
======
dang
The original URL was [http://dbpatterns.com/](http://dbpatterns.com/), which
no longer points to the right content, so we replaced it with the closest
archive.org copy of the original.

------
mortehu
Tip for SQL users:

If you give all your ID fields unique names, e.g. by calling your field
"reservation_id" instead of "id", even in the reservation table, you can do
stuff like:

SELECT * FROM reservation JOIN guest USING (reservation_id);

By doing "USING (reservation_id)" instead of "ON reservation.id =
guest.reservation_id", the field will be automatically deduplicated, so you
don't have to qualify it elsewhere in the query, and "SELECT *" will return
only one copy.

~~~
daxelrod
It's long bothered me that SQL doesn't have a way of saying "join these two
tables based on the key relationships I've already defined in my schema". (Or
maybe some variants do?)

The database already knows that guest.reservation_id is a foreign key
referencing reservation.id. Why should you have to repeat yourself?

~~~
greggyb
And when there are multiple foreign keys between two tables?

~~~
kbsletten
Genuinely curious, what's the use-case for this?

~~~
level3
Some examples:

In project management, tasks might have an assigner_id and assignee_id

In transaction management, transactions might have a sender_id and receiver_id

In sports, matches might have a team1_id and team2_id

~~~
jonahx
> In transaction management, transactions might have a sender_id and
> receiver_id

in normal form, shouldn't you have "transactions", "transaction_senders", and
"transaction_receivers" anyway? similarly for the other cases?

i know sometimes the simplicity of a single table is a valid choice, so your
point still stands, but it does feel like the examples are cases of "well, if
you're were doing _the right way_ (TM), that wouldn't happen"

~~~
dsp1234
_shouldn 't you have "transactions", "transaction_senders", and
"transaction_receivers"_

 _were doing the right way(TM), that wouldn 't happen_

Having a separate transaction_senders and transaction_receivers wouldn't be
_the right way(TM)_ unless it's possible to have multiple senders/receivers.
If a transaction can only have 1 sender/receiver then normalization is
complete when the sender/receiver data is removed from 'transactions' table
and replaced with the sender_id and receiver_id columns.

When introducing transaction_senders/transaction_receivers tables without
further constraints, it would be immediately possible to have multiple
transaction senders and multiple transaction receivers for a single
transaction, which is likely an error.

------
clay_to_n
It's an old-looking website, but I've found this site has some really cool
data models:
[http://www.databaseanswers.org/data_models/index.htm](http://www.databaseanswers.org/data_models/index.htm)

As someone in the healthcare space, looking at some of these models gives me a
better idea of how various aspects of the healthcare industry work, and the
things they interact with. Ex)
[http://www.databaseanswers.org/data_models/patient_data_ware...](http://www.databaseanswers.org/data_models/patient_data_warehouse/index.htm)

~~~
elchief
those models are actually pretty bad

the ones in these books are better:

[https://dba.stackexchange.com/questions/12991/ready-to-
use-d...](https://dba.stackexchange.com/questions/12991/ready-to-use-database-
models-example/23831#23831)

~~~
theli0nheart
This strikes me as a very unhelpful comment. Why are they bad? Why are the
ones in these books better?

~~~
jqm
The site (databaseanswers) is neat and I've gotten some good ideas from it in
the past, but most of the models (that I've looked at anyway) are pretty
simple and you almost really wouldn't need to look at a diagram drawn by
somebody else to intuitively put something like that together. It seems almost
more for someone creating MS Access level applications. Granted there may be
some more complex schemas that I didn't see.

There might be an argument about excessive normalization in some cases also.
Take some of those layouts too far and try to extend them and you might wind
up with tons of little tables. Normalization was I think more important back
in those days (not that it's not still important... but some of the downsides
of going overboard on normalization have become apparent I think.... at least
to me).

------
klodolph
I just clicked on the second "featured" pattern and found this hot mess:

[http://dbpatterns.com/documents/5091f74289cbad03bc958bc0/](http://dbpatterns.com/documents/5091f74289cbad03bc958bc0/)

It has the "let's put a UUID on every row" disease common to designers who
have never really learned anything other than object oriented design. Price is
a string (I guess so you can put "market price" on the fish?), and there's a
currency symbol on every "delivery". The whole thing just makes no sense. 18
stars. Unbelievable.

~~~
skrebbel
I'm daft. What's wrong with using uuids as primary keys everywhere? I'd
appreciate it if you could elaborate :)

~~~
mvitorino
The problem is that most uuids are generated in a way that, when sorted
alphabetically as strings, would have a random order.

Example: t0: 7458e3a9-716b-4352-b2e4-b5b67d0c089b t1:
4d8d753c-1777-439d-8725-b093b1bd8430

Using this as a PK in any relational database will mean the rows are stored in
the clustered index order, which causes extreme fragmentation because the db
engine constantly has to find "holes" in the data pages to insert more recent
records instead of adding those at the bottom of the table as would happen
with any continuously increasing key.

~~~
ww520
This is a non-issue. Most db engines just append new data rows at the end of
the data table and assign an internal row ID for it. The keys (UUID in this
case) are stored in the separate index pages using B+ tree, which searches
random key (UUID) or sequentially incremented key equally well.

UUID key is a problem only if your main query is a range query on the PK of a
clustered indexed table. If your main query is a range query, you should pick
something that can be sorted in the range anyway.

~~~
ddorian43
Actually only postgresql does it this way (by storing data in the heap and not
in the primary index). Mysql(innodb),mssql,oracle uses the store the row in
the primary-key.

Edit: I'm ~wrong, see below.

~~~
ww520
Only Innodb does it by default. The others use heap table by default.

MSSQL allows clustered indexed table as an option to order the physical
storage of rows. Oracle has index-organized table as an option.

Edit: They don't use clustered indexed table by default because record
insertion is very expensive since clustered index forces the table to store
the records contiguously in the index's order. Also Innodb is not truely
clustered indexed. It only stores records contiguously for one page at the B+
Tree leaf level. Records in different pages are scattered all over even if the
index values are sequential.

------
zenogais
Kinda misleading title. Doesn't really have anything to do with database
design. It has to do with sharing Entity-Relationship diagrams. Maybe "Design
better relational models".

------
jordanlev
If anyone is interested, there is a 2-volume set of books called "The Data
Model Resource Book". They've been around for a while, so for more traditional
businesses, but very thorough and broken out by industry:

[http://www.wiley.com/WileyCDA/WileyTitle/productCd-047138023...](http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471380237.html)

[http://www.wiley.com/WileyCDA/WileyTitle/productCd-047135348...](http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471353485.html)

~~~
rlonstein
+1. Definitely worth reading, I have Vols 1 & 2 (Vol 3 came out later). I
found it useful because the author explained the decisions about the models
and traps to avoid.

~~~
rukuu001
> because the author explained the decisions

By far one of the most useful things a resource can do.

It doesn't just broaden your understanding of a domain, but it _can teach you
how to think_ about design problems.

------
koolba
More of an ER modeler than a database designer. The interface is pretty
slick/simple. Here's a sample I just cooked up:
[http://dbpatterns.com/documents/570699101514b428de8893a1](http://dbpatterns.com/documents/570699101514b428de8893a1)

I can't imagine using it for anything real but it was fun to play with.

If the creator is reading this, here's a bug report: If the user hasn't hit
"Save" when they export the schema, they get an empty text file and they'll
think the app is broken. It should either auto-save or prompt the user " _You
have unsaved content, do you want to save before exporting?_ "

UPDATE: One more bug report, the export doesn't escape the columns names at
all. It just naively adds double quotes around them. This breaks with double
quotes in field names ( _yeah yeah which is stupid but still.._ ):

    
    
        CREATE TABLE "foo" (
    	"id" int,
    	"baz" varchar(255),
    	"bar" varchar(255),
    	""test"" varchar(255)
        );

~~~
astazangasta
I found the interface very off-putting. If this is a site about exploring data
models, a "public feed" and "search" is not a great way to interact with
those. In fact, a timeline chronology of when these things were made seems a
great way to expose mostly irrelevant information to a large number of users.

How do I browse these models? Is there a taxonomy of organization? How many
models are in the system? Are there models relevant to general domains of
interest to me? None of this is made apparent.

~~~
zo1
I think they're aiming at the "jsfiddle" field. I.e. Sharing ERDs and such
over SO and other social sites. As opposed to them trying to be some sort of
comprehensive database of standardized DB patterns.

------
cmrdporcupine
Y'know I really miss designing relational schemas. As much as I hate SQL the
language I love the relational model that it is a (butchered and ugly and
compromised) implementation of. But working where I do on the systems I do now
this is something I never have to deal with anymore.

There's something very therapeutic about organizing data using a system of
rules, and the relational data model is a powerful one.

~~~
dv35z
Completely agree with you. I have always hoped that some of the GUI modelers
would allow you to add notes/explanations on tables & relations, explaining
the motivation behind design decisions. Example, "M2M relation here allows us
to 'tag' objects with multiple attributes".

------
iamleppert
Looking around I see lots of people just blindly using UUIDs for everything.
At a prior job, I had a boss who forced me to use UUIDs for everything, before
we had even collected a single row of data.

It hurts my pragmatism to solve a distributed systems problem before we even
had a distributed system! Don't be a sheep and use UUIDs, rather than a simple
integer primary key or composite key that is natural to the table. People say
it's just an id, who cares? But I say people making these kinds of decisions
are probably making other poor choices backed by something they read and don't
understand.

Here's a great article on locality and the need for uniqueness and why UUIDs
should only be introduced when needed and to solve very specific distributed
systems problems. The reality is almost all applications will never need to
have multiple concurrent writers.

[https://eager.io/blog/how-long-does-an-id-need-to-
be/?hn](https://eager.io/blog/how-long-does-an-id-need-to-be/?hn)

~~~
vkjv
I barely ever use uuids. As you say, it solves a fairly specific problem.
Usually when you need to scale to multiple datacenters, it's easier and faster
to add a datacenter column and starting using it with id as a composite key.

The only time I use UUIDs is when I need to pass that id out, but I want to
hidr how many rows there are or the rate that they are being created. However,
I usually use COMBs to prevent fragmentation.

------
gholevas
I built an electron app that allows you to design, generate, and share
Mongoose Schemas if anyone finds it useful:
[http://mongomulch.com](http://mongomulch.com)

------
moron4hire
These schemas seem like pretty simple, straightforward adaptations of specific
problems. This doesn't actually look like any sort of actionable advice on how
to design better schemas. Are we supposed to just osmose the knowledge
somehow? I was expecting something more along the lines of recommendations
like "foreign key relationships should always be indexed, nobody ever came up
with a realistic example where they shouldn't! Why the hell isn't this the
default behavior?"

------
igrekel
Interesting idea but it the way it is currently done isn't that great.

Most of the patterns I've seen are obvious simple things but they are missing
a lot of content like : \- What was the design's intended properties? \-
Expected volumes, access patterns? \- Good places for indices? Maybe the
comments could be of some help but really haven't seen much. I've actually
found more interesting content in the HN comments than on the site itself.

------
hghar
I don't think this must be called "patterns" in computer science patterns are
a model applied to give solutions to recurring problems. I was expecting
something like Martin Fowler Patterns-Enterprise-Application-Architecture but
for databases.

This should be called something like database designs.

~~~
kazinator
So what you're saying is that "design patterns" should only be for code, not
for data?

If you apply some recipe to structure code, that's a "design pattern"; if we
apply some recipe for structuring data, it's a "data(base) design"?

But code is data. But sometimes it's not clear whether something is code or
data, or whether it's closer to being code than to data.

Design patterns incorporate data. The Observer pattern requires a list of
observers that are notified; that is a data structure which we can have in a
database: an observer table joined to an event source by event source ID.

If I make such a database (say for a large number of users to set up
persistent notifications about some interesting interesting things), am I
doing "database design" or implementing the Observer pattern? :)

------
nxzero
One of the keys to understanding SQL, and on that note, I would highly
recommend SQLZoo's interactive tutorial that allows someone to query a (fake)
database and progress using baby steps: [http://sqlzoo.net](http://sqlzoo.net)

------
haddr
Cool website, for off the shelf models. For anything more advanced you should
be careful, as modeling real world scenarios is rarely so abstract and mamy
times you need to make some scenario-specific tradeoffs during the modeling
phase to fulfill your requirements.

------
barryosull
Looks like they’re trying to create a repo of open source DB schemas for
domains. That's putting the cart before the horse in my mind. People don’t
start with DB schemas when building domains, they end up with DB schemas after
modelling them.

------
Scirra_Tom
Had a quick look through, don't understand what audience this is targeted
towards.

------
tacone
I like this site. As simple as it is, it has a lot of potential community-
wise.

\- it could allow sql exports for various platforms \- it could feature an API
so people can write their own framework drivers (for example creating
migrations, importing existing schemas etc)

Lets hope that it gains users, so that the voting system (the star) can become
more useful to filter out the garbage.

Edit: please remove the login wall to see the starred items, it raises the
entry barrier quite a bit. Do you really want new visitors to see low rated
schemas as the first thing?

------
adwf
As a database guy this makes me feel good. From looking at a lot of these
schemas, my job is not going to disappear anytime soon...

Most egregious example for me is probably the prevalence of a lot of "type"
tables when a simple enum column would do. Or maybe the sheer number of UUIDs
that are being thrown around.

I even saw a circular ID chain in one. Would be fun setting up foreign key
constraints for that!

------
raziel2p
Most of these seem extremely simple. I would like to see some examples where a
good database schema isn't so obvious.

------
Mister_Snuggles
When looking at an author, I can see all of their patterns. This one, in
particular, has a pattern that was forked 13 times:
[http://dbpatterns.com/accounts/profile/thaichor/](http://dbpatterns.com/accounts/profile/thaichor/)

How do you see the forks?

------
n13
Was wondering how difficult is to code a UI like this? i.e. SVG with draggable
boxes with the connections/lines?

------
olalonde
Wow, that's really cool! Would be nice to have a "migration" export format for
Rails, Knex.js, etc.

------
jdc0589
I was bored, so here is a quickly thrown together generic "thing" DB model:
[http://dbpatterns.com/documents/5706a53c1514b428de88940c](http://dbpatterns.com/documents/5706a53c1514b428de88940c)

------
burrox
I like it, I think there's a real use case for it. I spent quite some time
looking for examples on how to create an schema for an activity based workflow
webapp.

------
intrasight
An empty schema diagram is to a data modeler what a blank canvas is to an
artist or a blank page to a writer - thrilling but somewhat intimidating.

------
stevesun21
I thought about the similar idea to create domain models to elaborate business
rules with UML ORD rather than database ERD.

------
jmcgough
The search function doesn't seem to be working - I can't get it to return
results ever.

------
geniium
Seems like a big playground...

------
vonklaus
wow. doing my first real data model. tough to find resources like this,
started literally resterday. can't wait to check this out

------
whatnotests
Kinda cool but the UX really needs some re-thinking.

Simple things like dialogs getting stacked on top of each other, using
`prompt("...")` sometimes and modals other times, foreign key relationship
arrows not indicating cardinality, etc --

Looks good but please don't consider it "finished" yet.

------
gherkin0
> Public Newsfeed

> jnichols created new pattern

> Penis

> 24 seconds ago

Ok... maybe they need to start with some patterns about filtering spam and
noise.

