
Ask HN: What are the best data warehouse and BI resources in print and online?   - 3pt14159
I started formally practicing data analytics a little under a year ago. Since then, I’ve really ramped up my Ruby, C++ (with Boost), bash, and (especially) MySQL skills. I’ve always been an MS Excel geek, but I love the power, versatility, and reusability of MySQL-charged Ruby scripts. Excel is now just the final step for most of my analytics and market intelligence reports. I use Excel to generate pretty graphs, and to calculate the occasional statistical distribution or linear optimization. I will still need to do these types of things, but I would like to expand into other things that I think will help the stakeholders at my company: real-time dashboards with our key business metrics, recurring automated reports, and (possibly) automatic neural networks. This is the type of stuff that a good Business Intelligence (BI) solution would provide painlessly.

After a bit of research around the internet, I discovered that I need a rock-solid data warehouse if I’m going to effectively implement any of the BI products out there. My question is this: What are the best books and online hidden gems about building data warehouses? I need to fully understand not just why a certain method will work better for my circumstances, but also how to implement that method. Code examples are a must. I’m willing to spend a fair bit of time and money on learning this and I know no other place to turn for trustworthy advice. Thanks a ton, you guys and gals are the best!
======
pradocchia
Have you read Kimball's Data Warehouse Toolkit? I'm partial to the first
edition (1996). It's mostly conceptual stuff: design rationale, OLTP vs OLAP,
fact vs dimension, type II SCDs (very important), surrogate keys, aggregate
navigation, etc.
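
Since code examples were requested, here is a rough sketch of what a type II SCD with a surrogate key can look like. It is not taken from the book. The table `dim_customer`, its columns, and the helper `apply_scd2` are hypothetical names made up for illustration, and SQLite stands in for whatever database you actually use.

```python
import sqlite3


def apply_scd2(conn, customer_id, city, as_of):
    """Record a customer's city as a type II slowly changing dimension.

    Hypothetical helper: returns the surrogate key of the current row.
    """
    row = conn.execute(
        "SELECT customer_key, city FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if row is not None and row[1] == city:
        return row[0]  # attribute unchanged; keep the current version
    if row is not None:
        # Close out the old version instead of overwriting it.
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
            "WHERE customer_key = ?",
            (as_of, row[0]),
        )
    cur = conn.execute(
        "INSERT INTO dim_customer "
        "(customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, city, as_of),
    )
    return cur.lastrowid  # new surrogate key


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  TEXT NOT NULL,                      -- natural key from the source system
        city         TEXT NOT NULL,
        valid_from   TEXT NOT NULL,
        valid_to     TEXT,                               -- NULL while the row is current
        is_current   INTEGER NOT NULL
    )
    """
)

k1 = apply_scd2(conn, "C42", "Toronto", "2009-01-01")
k2 = apply_scd2(conn, "C42", "Ottawa", "2009-06-01")
rows = conn.execute(
    "SELECT customer_key, city, valid_to, is_current "
    "FROM dim_customer ORDER BY customer_key"
).fetchall()
```

Fact rows loaded before the move keep pointing at the first surrogate key, so historical reports still see the old city. That is the reason the surrogate key is separate from the natural key.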

Most code in a data warehouse is ETL code, and much of that is obscured behind
the interface of big-name tools like Informatica. I don't think you are
interested in that. People who roll their own have typically rolled with Perl.
Try a search on Perl + ETL and you might find something.

Other items of interest:

* Pentaho, an open-source BI suite. Includes ETL, reporting and multidimensional cubes.

* Column-oriented data stores like Vertica, Sybase IQ, MonetDB, and Infobright. MonetDB and Infobright are both open source. Infobright integrates with MySQL, so that might be up your alley.

* SSDs may or may not turn data warehousing on its head. Yet to be determined.

* Schema-driven ETL is very powerful: define an input view/function, a target table, a few naming conventions, and generate your ETL code directly off the system tables--ETL with opinions, as it were.
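
To make the schema-driven ETL item a bit more concrete, here is a minimal sketch under some loud assumptions: SQLite stands in for the warehouse, its `PRAGMA table_info` stands in for the system tables, and the convention (a view named `v_<target>` whose column names match the target table) is just one I made up for the example. `generate_load_sql` and the table names are hypothetical.

```python
import sqlite3


def generate_load_sql(conn, target_table):
    """Generate an INSERT ... SELECT from catalog metadata.

    Assumed convention: the input view is named 'v_' + target_table and
    exposes columns with the same names as the target columns it feeds.
    """
    source_view = "v_" + target_table

    def columns(name):
        # PRAGMA table_info yields one row per column; index 1 is the name.
        return [r[1] for r in conn.execute(f'PRAGMA table_info("{name}")')]

    source_cols = set(columns(source_view))
    # Keep target column order; skip target-only columns (e.g. load timestamps).
    shared = [c for c in columns(target_table) if c in source_cols]
    if not shared:
        raise ValueError(f"no shared columns between {source_view} and {target_table}")
    col_list = ", ".join(f'"{c}"' for c in shared)
    return (
        f'INSERT INTO "{target_table}" ({col_list}) '
        f'SELECT {col_list} FROM "{source_view}"'
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, placed_at TEXT);
    CREATE TABLE fact_orders (
        order_id   INTEGER,
        amount     REAL,
        order_date TEXT,
        loaded_at  TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- The view holds all the transformation logic; the loader stays generic.
    CREATE VIEW v_fact_orders AS
        SELECT id AS order_id,
               amount_cents / 100.0 AS amount,
               date(placed_at) AS order_date
        FROM raw_orders;
    INSERT INTO raw_orders VALUES (1, 1999, '2009-03-14 15:09:26');
    """
)

sql = generate_load_sql(conn, "fact_orders")
conn.execute(sql)
loaded = conn.execute(
    "SELECT order_id, amount, order_date FROM fact_orders"
).fetchall()
```

The point of the design is that adding a new target table means writing one view, not another hand-written load script. On MySQL the same idea would presumably read column names from `information_schema.columns` instead of the pragma.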

EDIT:

For online resources, I used to read intelligententerprise.com. Can't vouch for
its current quality. Back then it was hit or miss, and promotional items
frequently overshadowed substantive material.

~~~
3pt14159
Thanks a ton. I'll make sure I pick up Kimball's Data Warehouse Toolkit and
check out intelligententerprise.com. The four that pop out most from your post
are Pentaho, MonetDB, Infobright, and schema-driven ETL, so I'll start my
search there. Thanks once again for the help!

~~~
pradocchia
How has it gone? Have you found any helpful sources?

I have since learned that LucidDB may be more appropriate for DW-style
workloads than MonetDB. There was a LucidDB thread on HN recently that had
more details.

