
Death by Database - ScottWRobinson
https://ovid.github.io/articles/death-by-database.html
======
dyeje
I am surprised that their suggested solution was for a many-to-many
relationship. Seems like you would want to stay one-to-many but remove the
unique validations. If you're sharing address records then it seems there
could be many possible frustrating cases. For example:

What if someone goes to create an address, typos it, matches another
customer's address, and then fixes it updating the other customer's address in
the process? The other customer might accidentally send their luxury item to a
stranger if they aren't paying attention during checkout.

The use case with married couples is compelling, but similar to the previous
scenario, what if they got divorced? The one that moves updates their address,
which updates their ex's address as well. Same as the other scenario, you
might end up with a customer sending their item the wrong place.

These might be edge cases but in general I think it's dangerous to change
things like an address without an explicit action from the user.

~~~
gshulegaard
I am not surprised because the problem wasn't correctly defined:

> At the database level, they had a "one-to-many" relationship between
> customers and addresses.

Which the author goes on to say causes:

> That was their first problem. A customer's partner might come into Bob's and
> order something and if the address was entered correctly it would be flagged
> as "in use" and we had to use a different address or deliberately enter a
> typo.

But that's not true. The problem being described is not caused or implied by a
one-to-many relationship between customers and addresses. The actual problem
is that the address table has unique constraints defined in such a way that
duplicate addresses are not allowed.

So given that the problem was incorrectly defined as one-to-many being the
root cause, it makes sense that the proposed solution is many-to-many.

Just to be clear, many-to-many in this use case makes a lot of sense and
depending on what the schema looks like it may be the best solution. I am just
saying that the problem described was not because of "one-to-many".

For example I work on a multi-tenant web app and we store emails (which have a
similar problem with mailing/billing addresses) with a one-to-many
relationship to accounts. We don't have a problem with an e-mail being used in
several accounts... _because our constraints allow it_.

~~~
thirdsun
> But that's not true. The problem being described is not caused or implied by
> a one-to-many relationship between customers and addresses. The actual
> problem is that the address table has unique constraints defined in such a
> way that duplicate addresses are not allowed.

Furthermore the unique constraint most likely didn't even serve its purpose:
How can you really make sure that an address is unique? There are countless
ways to write one and the same address - in its simplest form it might be
variations of Street / Str. or Avenue / Ave. Any information combining
multiple lengthy strings will be prone to resulting in several slightly
different variations.

------
protoster
Addresses is one of those categories for which people should regularly read
the "falsehoods programmers believe about X" list. Here's one:
[https://www.mjt.me.uk/posts/falsehoods-programmers-
believe-a...](https://www.mjt.me.uk/posts/falsehoods-programmers-believe-
about-addresses/)

~~~
petre
Hmm, we have name (enter as many as you like, Li Pong Tai if you wish),
address, country (select, excludes Kosovo and Western Sahara and other
territories pretending to be countries), phone and email fields. Email is
verified with a code. Forcing a certain format very rigidly is bad practice.
Like first name, last name or state, city. What if the customer is Korean or
lives on a ranch in the countryside or is a ranger and uses their work
address? Or worse, lives in the Medina in Morocco.

~~~
wstuartcl
In the US there is also a trend for certain companies to use an address
"fixer" or "reformater" API to automatically wither check and or fix an
entered address. So bad. When I moved into a newly built apartment building a
few years ago there were more than a few sites that just would not accept that
the actual address was real and valid. ick.

------
davnicwil
This isn't about database design, per se, it's about data model design in
general.

Starting a software project by thinking hard about the data model is always
good advice, but even knowing this it is almost always impossible to design
things perfectly up front.

You need to hit problems square on in practice, and reshape your model to deal
with them. If you think otherwise, if you really think you can think through
in advance all the possible unexpected permutations of how your model might
not work in practice, like with this address thing, you are just kidding
yourself. It never happens that way.

So, maybe the best advice is not only to start by thinking about the data
model but to have a mindset of continually thinking about and changing your
data model, and to build your software using languages, tools and technologies
that better accomodate this.

~~~
thirdsun
I agree and I'm surprised potential data models aren't as often discussed and
debated as the choice of technology and stacks.

At least to me considering and planning a lasting data model, despite knowing
that it'll change, is a far more significant focus than choosing a framework -
a choice which is often dictated by your experience and familiarity with those
options anyway. And yet we rarely see discussions about best practices when it
comes to data models. Maybe the requirements and solutions are often too
specific.

~~~
dvfjsdhgfv
> I agree and I'm surprised potential data models aren't as often discussed
> and debated as the choice of technology and stacks.

This. People can discuss stacks and tools for hours, whereas the optimal data
model seems to be a boring topic, even though very often it turns out far more
important in the long run.

------
wonnage
What they're really asking for is multi-user accounts, which is a whole can of
worms... also the address uniqueness requirement seems like it'll leak your
customers' information.

Also don't understand why the system for spamming marketing emails needs to
share the same address table (and associated headaches) as actual customers.
Presumably you will want to spam people who aren't in the system yet...

~~~
tener
Exactly. These should simply be different databases. You could then enrich the
spamming one using the client one, but no need to do the reverse...

------
alphanumeric0
That Linus Torvald quote comes to mind: “Bad programmers worry about the code.
Good programmers worry about data structures and their relationships.”

~~~
tabtab
This is why I believe the future of source code management will be via
relational databases instead of in hierarchical file systems. Large projects
are too involved for trees. The same with GUI specifications: They are too
complex to manage with OOP/OOM. Plus, a GUI spec in a database is (mostly)
language-neutral so that you are not tied to a GUI library written in a
specific programming language. You learn one engine, and then use it for PHP,
C#, Java, Python, etc.

~~~
zzzcpan
Data structures have nothing to do with SQL though. That quote is pretty much
anti-relational databases.

~~~
tabtab
Please elaborate. Non-trivial data structured will either need a database, or
end up resembling one anyhow when you add features they'll eventually need as
they mature (concurrency, data integrity, querying, adding indexes without
changing functionality/design, backups, etc.)

Note that didn't imply that Linux should designed around a DB. However, with
virtualization, the idea of an "OS" may change or vanish. The view is toward
"applications", with the associations between applications, users, file
systems, and machinery more flexible. In that approach, a DB-centric view may
make more sense. A (traditional) "OS" is a machine-centric view.

~~~
zzzcpan
Data-structures are machine-centric, not abstract relational algebra. You can
understand why it's important to think about data structures for the machine.
Which is a completely opposite view from relational databases, where you are
supposed to not think of how data relates to the machine at all.

~~~
tabtab
In practice the boundary between both is fuzzy. Processing DB data can and is
often done in a row-by-row or cell-by-cell basis. Not all computations are
convenient to reduce to relational algebra, at least for most practitioners.
We can't all be Sheldons.

------
kyberias
The author advocates starting with database design before software. Some other
people advocate the opposite: the database is just a detail.

I think the point is to spend time in designing a good domain model regardless
of how it's stored (database).

~~~
wvenable
I also always start with the "database" design or the storage model or
whatever you want to call it. If you do database design with proper foreign
key relationships, etc, the entire rest of the design including the UI and
most of the business rules can all flow directly from that.

You can also do that with a good domain model but the rigors of actually
having to physically construct it in a database ensures less mistakes.

Now, if you're using a loosy-goosy JSON blog storage solution than there is no
advantage to starting at that level.

~~~
bbojan
"Show me your flowcharts and conceal your tables, and I shall continue to be
mystified. Show me your tables, and I won’t usually need your flowcharts;
they’ll be obvious.”

~~~
mooreds
[https://en.wikiquote.org/wiki/Fred_Brooks](https://en.wikiquote.org/wiki/Fred_Brooks)
in case anyone wanted the source.

------
tabtab
I've often found the "proper" database design is too confusing to users and/or
managers. Users will have to understand the complex relationships to enter,
inspect, and trouble-shoot data properly. Often they don't. Simplicity often
trumps logic and the D.R.Y. Principle even if the limits (simplicity) create
extra work in some situations.

Further, you don't know what marketers will dream up in the future and cannot
realistically anticipate enough of their harebrained ideas. Try to keep
marketing separate from production when possible, but sometimes marketers
and/or the bosses want something technically goofy and you have to fudge stuff
to get it.

Warn them about possible long-term consequences, but if they _insist_ on
Frankenstein, you just have to do it. Get your warnings in writing so that you
have a record about their decision when bleep hits the fan later. Further if
you avoid complaining a lot in general, then important complaints carry more
weight. Otherwise, they'll mistake your important warnings for mundane ones.

~~~
Apaec
The db is not only about tables, there's a reason why views exist, the
users/managers should only query views, these are easy to change whenever the
underlying private schema changes.

~~~
tabtab
Views are an imperfect abstraction (as usually implemented). Plus, pre-many-
to-many-ing all relationships will make for a notably slower system even if
views hide many-to-many-ness. (As I mentioned elsewhere, one often can't know
which relationships will grow to many-to-many over time.)

------
claydavisss
Projects and even companies I have witnessed dying are more often than not
done in by underinvesting in databases.

When you start a project with a React UI and just throw whatever you need into
the schema in order to make things come up on the page...you're headed for
disaster.

Or on the flip side...you know how databases work but you don't know how to
say NO to features that are expensive.

This seems to be getting worse over time. Databases are becoming a lost art.

~~~
thrower123
There is a lot of hard-won knowledge that is aging out of the profession
increasingly lately, I have been observing. It's tempting to blame web
development, and its proverbial low bar to entry, but I think the real problem
is computer UX getting too good, such that you can accomplish almost anything
one would desire with a computer, without having to delve into the nasty
implementation details.

I am just barely starting to get grey in my beard (largely from dealing with
frighteningly incompetent consulting firms, rather than age...) but I can
remember the eldritch incantations dealing with autoexec.bat and DOS memory
modes, or the clusterfuck of trying to get printers or new bits of hardware to
work, or the panic of trying to fix BSODs when I'd trashed the system
installing something dodgy from LimeWire or the shovelware bin at WalMart. The
next generation coming through has been shielded from these horrors, and
mostly matured in an environment where computers work reliably; and when they
do fail, it is usually opaque, inscrutable, and largely hidden from their
eyes. Aside from a crash-course in the scientific method for diagnosing and
debugging issues, the old dodgy software world exposed one rather harshly to
many of the underlying realities of the system, and our current software
environments are still mostly built on those foundations, with a few dozen
layers of lipstick applied to the pig.

It certainly doesn't help that most instruction in software engineering either
hews to the abstract and theoretical or the novel, with passing consideration
of the practical realities and the history of the art. Ultimately, we write
code that runs on silicon transistors, not ideal Turing machines, and in a
great many fields we are retreading extensively explored ground, a hamster
wheel of innovation. Every generation seems to have to need to have a go at
yet another build system or object database, or rediscover the model-view-
controller pattern. We delight in making endless new and exciting and broken
wheels, in shameful ignorance of the hard-won lessons of the past.

------
candidtim
This is why I don't think of database as an interface. Instead, create
functional interface with some business meaning (REST API, GraphQL, a Java
interface - whatever suits a particular use case), which will be __versioned
__and properly maintained with __backward compatibility __where possible
(which is, I think, almost always possible). It doesn 't cancel a proper DB
design though, but helps a lot when you need to change it. Or move to another
DB engine altogether.

You know, every problem can be solved with another level of abstraction. No
sarcasm, I think it actually holds here.

~~~
tabtab
While a many-to-many DB relationship can in theory model a 1-to-many
relationship via constraints and/or domain logic coding, it seems kind of
anti-YAGNI to me. Being that ANY 1-to-many relationship can potentially change
into being many-to-many, which ones do you "pre many" and which ones are left
1-to-many? My crystal ball is not that powerful. If it were, I'd be competing
with Warren Buffett instead of puzzling over database design.

------
3pt14159
Same. I spend about 30% of my time writing out the migration and making sure I
haven't missed anything. It saves so many headaches later. It's also one of
the reasons I stubbornly re-implemented Rail's whitespace formatting of
schema.rb into a rake task. I look at that thing 50 times a day. It needs to
be readable, I don't care if it clutters up a pull request review or git diff.

~~~
thirdsun
I'm sure you already know about this gem, but just in case:
[https://github.com/voormedia/rails-erd](https://github.com/voormedia/rails-
erd)

------
castlegloom
I worked on a codebase with a similar address problem. Oh, and there were no
primary keys. Instead, each row referred back to a separate table with four
keyed initials.

Good database design upfront, and vigilant upkeep will keep the application
layer tidier.

------
discordianfish
The real problem here seems to be that changing the schema seems prohibitive
expensive.

~~~
ben509
That's a huge problem with SQL in general. It _ought_ to be trivial to
maintain multiple views of the data to allow you to migrate old clients bit by
bit, but in practice it's rarely done.

Also, it's a business problem. They can always ask for more cruft and IT has a
huge incentive to take the short-sighted route, get promoted and move on.

~~~
tabtab
Re: "[flexible schema change is a] huge problem with SQL in general." \-- I've
been a proponent of the idea of "dynamic relational". You can add any column
on whim to a row just by including it in an INSERT or UPDATE command, unless
an explicit constraint/rule forbid it for a given table or database. You could
even make tables similarly create-on-write. No DBA's needed! (within reason)
Comparisons have to be done with care, though, as with any dynamic/implied
typed system.

But create-on-write doesn't solve cross-table relationship flexibility
problems: such as changes between 1-to-1 to 1-to-many and/or to many-to-many.
If you already have data, that's usually a tricky domain problem regardless of
what kind of database technology you use. It's about the semantics of your
data, not machines.

~~~
Scarbutt
_Re: "[flexible schema change is a] huge problem with SQL in general." _

Graph databases mostly solve these problem but they have not seen much uptake,
don't know why.

~~~
tabtab
Using "Dynamic Relational" (DR) is much more similar to existing RDBMS. You
don't have to relearn much. I'm not necessarily saying DR is objectively
better or worse than graph DB's, only that DR is a smaller learning curve to
those using the most common RDBMS out there.

------
manigandham
Wrong solution and nothing to do with database design.

Why do the addresses even need to be in the ecommerce system to do the
mailing? Couldn't they be in another table? Or a simple CSV? No reason to
create fake customer records to send some spam. Something's missing in this
story.

But if for some reason they are forced to use the same db, multiple customers
sharing the same address doesn't mean they need to share the same row in the
table. Keep the 1:many relation _without the unique constraint_. Customers can
then have multiple addresses, even if they're shared with others.

Addresses are complicated too, so you might as well just have a country field
+ the rest and then use an API to standardize, catching errors and making
queries easy.

~~~
nradov
Most US sellers use some sort of USPS certified address validation and cleanup
software to catch data entry errors and prevent delivery problems. So dumping
everything but the country into a single big field isn't necessarily a viable
design.

[https://www.usps.com/nationalpremieraccounts/manageprocessan...](https://www.usps.com/nationalpremieraccounts/manageprocessandaddress.htm)

~~~
wstuartcl
DO NOT DO THIS.

Have you ever moved into a new house or apartment on a new street or address?
Effectively blocking users because you assume their address is corrupt because
your system is out of date. At some point you have to choose what is better,
trusting that your users are able to enter their address or that an API (from
USPS or one of the dozen or so others out there) is able to understand what is
and is not a valid address.

~~~
nradov
Don't trust your users. Use the address validation software, but if the
address doesn't work allow the user to specifically choose to use their
original entry. That's how most large ecommerce vendors operate now and it
allows a manual override for new addresses.

And the address lists are updated very quickly now for any decent software. So
it's only a problem for a very limited time.

------
yoava
Very shallow post.

Come on, this is not about database design. This is about making the wrong
technical choice, the definition of "I have a hammer, so everything looks like
a nail".

What about export of the addresses, and using a second service on the export?

