
Cross-tabulation in MySQL (2001) - alexandros
http://dev.mysql.com/tech-resources/articles/wizard/print_version.html
======
AndrewO
This method has been the bane of my existence for going-on 5 years now. YMMV,
of course, but here's my experience:

Back in 2004, we decided to denormalize a set of records containing year,
quarter, and 15 or so data points going back to 1996. Using this cross-tab
method, we started out with 8 (years) * 4 (quarters/year) * 15 (data points) =
480 columns (a few more for non-denormalized data). 5 years later, we're up to
the high 700s. Needless to say, this pretty much blows out your standard MySQL
admin tools.

Another pitfall is, you really have to make sure your data is pristine (of
course you should be doing that anyway, but reality is usually far from ideal,
especially if you're me in 2004), otherwise you end up with doubling of values
or values included when they shouldn't be. The generated queries are very very
difficult to debug, again blowing out many admin tools and also text editors
(unless you add pretty formatting in your query generator).
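To make the doubling concrete: if a source row gets loaded twice, the conditional aggregation silently sums both copies, and in a 700-column result that's very easy to miss. A small sketch of the failure mode (SQLite for portability; names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics (id INTEGER, year INTEGER, quarter INTEGER, revenue REAL)"
)
# the same fact row loaded twice -- a classic dirty-data mistake
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?, ?)",
    [(1, 1996, 1, 100.0), (1, 1996, 1, 100.0)],
)

sql = """
SELECT id,
  SUM(CASE WHEN year = 1996 AND quarter = 1 THEN revenue END) AS rev_1996_q1
FROM metrics
GROUP BY id
"""
row = conn.execute(sql).fetchone()
print(row)  # (1, 200.0) -- the value is silently doubled
```

Nothing errors out; the pivot just reports 200 where the true value is 100.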

So what would I recommend instead? There are multi-dimensional databases or
ETL tools that will properly put your transactional data in an analytical
style data warehouse. Read Kimball's Data Warehouse Toolkit [1] (ignore the
kinda ratty paper quality—the content is good). The key thing is to
denormalize down to a star schema, but keep your facts and dimensions separate
(things you'll learn in DWT).
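The fact/dimension split Kimball describes looks roughly like this; a minimal star-schema sketch (SQLite via Python, with invented table names, to keep it runnable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- dimensions hold the descriptive, repeating attributes...
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_dept (dept_key INTEGER PRIMARY KEY, name TEXT);
-- ...and the fact table holds only keys plus numeric measures
CREATE TABLE fact_sales (
  date_key INTEGER REFERENCES dim_date,
  dept_key INTEGER REFERENCES dim_dept,
  revenue  REAL
);
INSERT INTO dim_date VALUES (1, 1996, 1);
INSERT INTO dim_dept VALUES (1, 'Development');
INSERT INTO fact_sales VALUES (1, 1, 100.0);
""")

# reports join facts to dimensions and aggregate; no wide pivot needed
total = conn.execute("""
  SELECT d.year, d.quarter, SUM(f.revenue)
  FROM fact_sales f
  JOIN dim_date d ON f.date_key = d.date_key
  GROUP BY d.year, d.quarter
""").fetchone()
print(total)  # (1996, 1, 100.0)
```

A new year or quarter is just more *rows* in `dim_date` and `fact_sales`, not more columns, which is the whole point.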

There's a Java-based OLAP server called Mondrian [2] that may also be worth
your time. I ended up going with ActiveWarehouse and ActiveWarehouse-ETL [3],
but I've felt some things were a little more complicated than they needed to be
(but that's another discussion—and I haven't kept up with their development
lately, so it may have improved).

On a side note, recently I've started wondering if map-reduce style tools
would make sense in this context, but I haven't explored using them that way
yet, so I can't actually make a recommendation there.

[1] [http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimens...](http://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247/ref=sr_1_1?ie=UTF8&s=books&qid=1241130825&sr=8-1)

[2] <http://mondrian.pentaho.org/>

[3] <http://activewarehouse.rubyforge.org/>

~~~
alexandros
Cross-tabulation is not about denormalization. If anything, it is about
producing the reports you want while keeping your data normalized.

~~~
AndrewO
It creates repeating groups (gender -> M,F; location -> Boston, Cagliari, etc;
dept -> Development, Personnel, etc)—to me, that looks like denormalization.

I'm not sure this is what you're objecting to, but whether or not the data is
_stored_ denormalized (as opposed to just using the result set and throwing it
away), is not really of consequence. I think we'll both agree that we can and
should keep our transactional data normalized.

Either way, this method goes against the grain of what SQL is supposed to do
(otherwise you could meta-program queries without having to resort to a Perl
script to string this thing together). My critique still stands that these
queries can grow to intractable sizes.
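Since SQL itself can't enumerate the pivot columns, you end up generating them in a host language (the Perl script mentioned above). A sketch of such a generator (Python instead of Perl, SQLite instead of MySQL, illustrative names), including the "pretty formatting" that keeps the generated query debuggable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (id INTEGER, year INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [(1, 1996, 100.0), (1, 1997, 120.0)],
)

# build one CASE column per distinct year, one column per line so the
# generated SQL stays readable in an editor
years = [y for (y,) in conn.execute("SELECT DISTINCT year FROM metrics ORDER BY year")]
cols = ",\n  ".join(
    f"SUM(CASE WHEN year = {y} THEN revenue END) AS rev_{y}" for y in years
)
sql = f"SELECT id,\n  {cols}\nFROM metrics\nGROUP BY id"
print(sql)

row = conn.execute(sql).fetchone()
print(row)  # (1, 100.0, 120.0)
```

The generator keeps the query in sync with the data, but it doesn't change the underlying problem: the column list still grows with every new dimension value.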

If one is going to be doing a lot of reporting like this, they should really look to
separate their transactional and analytical databases.

But hey, like I said, YMMV. In a simple case it works. But trust me when I say,
after 5 years of data growth and requirement shifts, I wish I'd known then what
a burden it would become. :)

