

SQL AntiPatterns Presentation from MySQLConf 2009 - jwinter
http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed

======
gcv
Fascinating. I haven't finished reading the entire presentation yet, but I
already stumbled on something.

The author lists a drawback of the first antipattern ("metadata tribbles"): a
table split into multiples to keep its size down requires tricky querying,
specifically, it requires a union across all splits (slide 10). It also
requires extra effort to keep table structures synchronized (slide 11). Then,
in a solution to his second antipattern ("entity-attribute-value"), he
suggests doing something quite similar to his first antipattern: creating
multiple tables with nearly identical columns and using a union to query
across them (slides 27 and 28).

Hello? So to resolve antipattern 2 you just apply antipattern 1? This is why
precisely why relational databases for most problems look like square pegs
being hammered into round holes.

[EDIT] Not sure if it's worth finishing reading this presentation. Slide 56, a
"solution" to the problem of storing a hierarchy in a relational database.
Breadcrumbs. Not bad, except that the query now requires a "like" clause.
Whoops, can't use an index on that column anymore! (Feel free to correct me if
there exists a RDBMS which can use an index on a "like.") Hope the author
doesn't mind a nice table scan on that query. He doesn't even mention this
problem.

~~~
fendale
Well, certainly in Oracle, it can use an index on a like if it is structured:

    
    
        where col like 'str%'
    

However, doing

    
    
       where col like '%str'
    

Means it cannot use any index. So in the example given, it would be able to
use an index because there is a leading 'path' before the %, assuming MySQL
can do the same thing.

In Oracle you can do hierarchy query's using the 'connect by' syntax, which
means you can structure your table with id, parent_id and get all the rows out
efficiently in a single piece of SQL, which is nice!

~~~
c0un7d0wn
You use a database that supports CTE or at the very least, recursive querying
options, then you format your schema like so:

    
    
      Comments:
      /-----------------------------\
      |   ID   | ParentID |   etc   |
      |-----------------------------|
      |    1   |   Null   |   etc   |
      |    2   |   Null   |   etc   |
      |    3   |     1    |   etc   |
      |    4   |     1    |   etc   |
      |    5   |     4    |   etc   |
      \-----------------------------/
    

You then use a Common Table Expressions query, like this:

    
    
      WITH CommentTree (ParentID, ID, Level)
      AS
      (
        SELECT ParentID, ID, 0 as [Level], 
            FROM Comments
            WHERE ParentID = [ROOT COMMENT ID] OR ParentID IS NULL
        UNION ALL
            SELECT c.ParentID, c.ID, ct.Level + 1
                FROM Comments c
                JOIN CommentTree ct ON ct.ID = c.ParentID
      )
      -- Now actually query the data from the CTE Expression
      SELECT ParentID, ID, Level FROM CommentTree 
         ORDER BY ParentID, Level, ID
    

This returns a table with a calculated column called Level.

    
    
      /---------------------------\
      | ParentID | Level |   ID   |
      |---------------------------|
      |   Null   |   0   |   1    |
      |   Null   |   0   |   2    |
      |    1     |   1   |   3    |
      |    1     |   1   |   4    |
      |    4     |   2   |   5    |
      \---------------------------/
    

Which of course, would represent this tree:

    
    
           Root
          /    \
         2      1
               / \
              3   4
                   \
                    5

------
snorkel
I disagree with slides 20 - 30 (calling "entity-attribute-value" tables an
antipattern). Variable attributes table not only gives you a lot of
application flexibility but also helps performance in that you're not
polluting your primary table with extraneous metadata columns that are hardly
ever referenced. It doesn't surprise me that DB programmer would look at that
and be offended that it's not purely referential -- not everything that
happens in a database has to be referential to everything else. Sometimes
databases are just dumb data stores. Not all problems are solved by
"referential integrity"

~~~
blasdel
The Entity-Attribute-Value idiom (tuplestores, RDF, BigTable, etc.) is by far
my favorite data model. The App Engine 'Datastore' sugar on top of BigTable is
pretty goddamn awesome.

RDF would have been great if the W3C wasn't full of idiot blowhards.

There are some application domains that have not lived up to their potential
because of RDBMS abuse where EAV would fit perfectly -- Desktop Search for
example. All of [Spotlight, Google Desktop, Beagle, Gnome Tracker] use RDBMSs
with a pre-determined set of 'attributes' baked in, crippling any creative
use. Spotlight at least has the decency to let you add your own non-indexed
properties, but it's not very useful. The contents of people's filesystems and
metadata about them will never fit into some spec-writer's schema.

Compared to RDBMSs I think sugared EAV is lot more amenable to being abused in
the "when all you have is a hammer" way -- they start a bit slow but scale
linearly with more resources without getting any slower.

------
gchpaco
I was expecting something more MySQL specific, and was pleasantly surprised.
It's certainly worth thinking twice before violating any of these.

