How about a column-oriented database? http://en.wikipedia.org/wiki/Column-oriented_DBMS
Would that help much? (I don't know much about databases :S)
There are specific table structures combined with specific query type that can be denormalized to speed things up. But a normal table will not speed up in the slightest, in fact most of the time it will slow down!
It will slow down because if you need only 1 column out of several, the larger each row is the slower the access will be.
Denormalizing is even worse for large sites. By normalizing you can often put some tables on one machine, and others on a different one. For example screen_name and phone number, which jeff combined. Screen_name is used by the chat program, and phone number by the profile page.
Run the chat program on a totally different set of machines. All you need to send it is the user_id.
So, that IS actually something that would go on the profile page, right with the phone number.
I'd always heard that normalization should only be done at sane data boundaries. IE, if you have a bunch of fields that always go together, they should be in the same table.
>I'd always heard that normalization should only be done at sane data boundaries. IE, if you have a bunch of fields that always go together, they should be in the same table.
No, not exactly. It's too big of a subject to explain in a post like this, plus there is tons of text on it if you search.
But, basically normalization removes duplicated data from the database. It's also used to allow multiple X to exist for a single Y without having columns like X_1, X_2, X_3, etc. And finally it's used to show relationships between things.
There are some people will will simply split a table into two tables along logical boundaries (but still, just one row in each table). This is not normalization. It can be done if some data is accessed rarely, or because the database can not handle such a big table, but it's not because of normalization.
1. Paul Graham was wrong in that essay, and also he's an idiot.
2. Anything by Jeff Atwood
3. Jeff Atwood was wrong in that blog post, and also he's an idiot.
4. How I made my startup (succeed|fail) by working 114 hours a day and eating only cardboard and pocket lint.
4a. How we built a web app in 4 minutes for only $0.37
5. Being a (programmer|hacker|founder) is teh awesome.
Not saying that all the articles are on those subjects, but they are HN gold, that's for sure. If you could somehow work some of these topics in to your question, it might do better.
There are millions of references on the subject, but perhaps you could start with a good intro, one of my favorites:
Hope that helps.
And keep on posting. Sooner or later the crowd oughta pick up on it a get your ball rolling.
Ted Codd's big idea is fairly simple: "we can store data in logical, rather than physical, structures and then use logic statements, rather than algorithms, to retrieve the data."
Wikipedia can often end being more confusing than enlightening but the "informal introduction" on this page is a place to start:
Here's a page I just found about predicate/first-order logic:
This book by Chris Date covers a lot of basic relational database ideas, but his writing is a bit tedious:
After that the main mental challenge is just learning to think in sets. Joe Celko has some writings specifically about that in his various books.
After that, the best book I've seen for SQL and practical issues is The Art of SQL:
Hope that helps. There's a lot of really bad writing about relational databases out there.
"The real trick is to spend some time with a good theory book first (I like my DATA & DATABASES), so you have an overview of RDBMS concepts. It makes the language much easier. Can you imagine trying to larn Fortran without knowing algebra?
I like Rick van der Lans book for a language reference and intro. Then move on to Hernandez and Viescas. And finish up with my stuff."
Rick has a book specifically for MySQL:
And, in the beginning there were 2 different teams at IBM trying to implement Ted Codds ideas. The ISBL team stuck faithfully to Codds ideas and the Sequel team kind of bastardized them. But SQL won in the marketplace. Hugh Darwen is still bitter:
1. Things which are relations
2. Things you can use to represent relations
-- pen and paper
-- c arrays & pointers
-- SQL RDBMS tables
3. Things which the concept of relational normalization applies to
4. Things which the concept of relational normalization doesn't apply to
If it sounds like I'm being pedantic, realize that 3 things must be taken into account when talking about whether or not a group of SQL tables represents a normalized group of relations: table stucture, row structure, and access methods.
It's possible to put an entire, fully 3rd normal form, relational database into a single SQL table.
Now I think of it every time I see Atwood's name:
http://lost-theory.org/realultimatepower/ (Yes, it's a ripoff of http://realultimatepower.net)
The good thing about Coding Horror is that it gets people thinking about issues that they might not have considered before. While the details may not be 100% correct, I think the general principle he's trying to communicate is valid.
I can't speak for Jeff, but one of the things that he will probably admit is that he isn't perfect. I've found that he doesn't necessarily write about stuff that he is an expert on, but mostly stuff he's learning about or issues that people should perhaps be thinking about.
I'm sure there are plenty of other websites that debate the finer points of database design.. perhaps they too should stop blogging in case that someone takes their advice too literally. :P
If it's stupid and it works, it's not stupid.