

Normalize First - edw519
http://codeeleven.blogspot.com/2008/07/normalize-first.html

======
gecko
I think that the linked blog post goes a bit overboard. Atwood himself says
repeatedly that you should start with a normalized schema; he just also says
that denormalizing for performance reasons should not have such a strong
stigma attached, given that it's the only way to make big, massive databases
work correctly. We can argue over whether Atwood is arguing to optimize a bit
too early, but the general idea he's expressing is neither new nor (at least
among experienced DBAs) disturbing.

~~~
ars
Except that the premise is wrong! Denormalizing will not help you speed up the
database. It's a false argument. It just won't do that.

There are specific table structures combined with specific query type that can
be denormalized to speed things up. But a normal table will not speed up in
the slightest, in fact most of the time it will slow down!

It will slow down because if you need only 1 column out of several, the larger
each row is the slower the access will be.

Denormalizing is even worse for large sites. By normalizing you can often put
some tables on one machine, and others on a different one. For example
screen_name and phone number, which jeff combined. Screen_name is used by the
chat program, and phone number by the profile page.

Run the chat program on a totally different set of machines. All you need to
send it is the user_id.

~~~
IsaacSchlueter
I get what you're saying. But I think "screen_name" is used here to indicate
the IM links that some sites let users put on their profile. They then create
the little ymsgr:sendIM?user_name hyperlinks so that you can IM that person.

So, that IS actually something that would go on the profile page, right with
the phone number.

I'd always heard that normalization should only be done at sane data
boundaries. IE, if you have a bunch of fields that always go together, they
should be in the same table.

~~~
ars
The thing with screen_name was just an example of how normalizing could help
with speed. It wasn't meant to be definitive. Plus for the profile page you
could just get the data from that other server.

>I'd always heard that normalization should only be done at sane data
boundaries. IE, if you have a bunch of fields that always go together, they
should be in the same table.

No, not exactly. It's too big of a subject to explain in a post like this,
plus there is tons of text on it if you search.

But, basically normalization removes duplicated data from the database. It's
also used to allow multiple X to exist for a single Y without having columns
like X_1, X_2, X_3, etc. And finally it's used to show relationships between
things.

There are some people will will simply split a table into two tables along
logical boundaries (but still, just one row in each table). This is not
normalization. It can be done if some data is accessed rarely, or because the
database can not handle such a big table, but it's not because of
normalization.

------
davidmathers
Oh man, this is just as bad as the Coding Horror post.

Four lists:

1\. Things which are relations

\-- relations

\--

2\. Things you can use to represent relations

\-- pen and paper

\-- perl/python/ruby/javascript/etc arrays & hashes

\-- c arrays & pointers

\-- SQL RDBMS tables

\--

3\. Things which the concept of relational normalization applies to

\-- relations

\--

4\. Things which the concept of relational normalization doesn't apply to

\-- pen and paper

\-- perl/python/ruby/javascript/etc arrays & hashes

\-- c arrays & pointers

\-- SQL RDBMS tables

\--

If it sounds like I'm being pedantic, realize that 3 things must be taken into
account when talking about whether or not a group of SQL tables represents a
normalized group of relations: table stucture, row structure, and access
methods.

It's possible to put an entire, fully 3rd normal form, relational database
into a single SQL table.

------
notdarkyet
Not long ago I submitted an Ask HN that requested information on database
design practices but it never got popular. It seems like it would be quite
obvious to most here on how to perform quality database design, but I cannot
find any good resources online. If any or you here have that info, or more
specifically standard practices that are not necessarily in books, please
reply because I tend to feel somewhat lost on the topic.

~~~
IsaacSchlueter
Articles that always seem to bubble up on HN:

1\. _Paul Graham was wrong in that essay, and also he's an idiot._

2\. Anything by Jeff Atwood

3\. _Jeff Atwood was wrong in that blog post, and also he's an idiot._

4\. _How I made my startup (succeed|fail) by working 114 hours a day and
eating only cardboard and pocket lint._

4a. _How we built a web app in 4 minutes for only $0.37_

5\. _Being a (programmer|hacker|founder) is teh awesome._

Not saying that all the articles are on those subjects, but they are HN gold,
that's for sure. If you could somehow work some of these topics in to your
question, it might do better.

~~~
babul
Well look at the demographic of the HN user and the obvious becomes even more
obvious.

------
icey
I saw this over proggit and it made me laugh hard enough that I bookmarked it.

Now I think of it every time I see Atwood's name:

<http://lost-theory.org/realultimatepower/> (Yes, it's a ripoff of
<http://realultimatepower.net>)

------
dhotson
This submission title looks like a troll, but I'll bite.

The good thing about Coding Horror is that it gets people thinking about
issues that they might not have considered before. While the details may not
be 100% correct, I think the general principle he's trying to communicate is
valid.

I can't speak for Jeff, but one of the things that he will probably admit is
that he isn't perfect. I've found that he doesn't necessarily write about
stuff that he is an expert on, but mostly stuff he's learning about or issues
that people should perhaps be thinking about.

I'm sure there are plenty of other websites that debate the finer points of
database design.. perhaps they too should stop blogging in case that someone
takes their advice too literally. :P

~~~
jawngee
Atwood runs Wordpress on a Windows box. Enough said.

~~~
dhotson
There's really no need to be dogmatic about it.

If it's stupid and it works, it's not stupid.

~~~
thwarted
This must be some new method of doing boolean algebra and discrete math I'm
not familiar with.

