
Data Warehousing 101 - corbet
http://lwn.net/SubscriberLink/452307/f691b8557f05ab8e/
======
eneveu
If you are interested in Data Warehousing, you should read Ralph Kimball's
"The Data Warehouse Toolkit": [http://www.amazon.com/Data-Warehouse-Toolkit-
Complete-Dimens...](http://www.amazon.com/Data-Warehouse-Toolkit-Complete-
Dimensional/dp/0471200247)

When I started learning about BI (Business Intelligence), a few members of the
Pentaho community advised me to read this book. I'm glad I did. Kimball is one
of the "fathers" of data warehousing, and his book had a lot of great insights
for dimensional modeling. It helped me avoid many design mistakes while
building my DWH, and gave me insight I might have taken years to discover.

It's a "theoretical" book, in the sense that it does not focus on any specific
technology; it's also a "practical book", because he uses real-world scenarios
(inventory management, e-commerce, CRM...) to demonstrate the various
dimensional modeling techniques. I also liked the part about BI project
management and encouraging BI in a company (= how to engage users and how to
"sell" a BI project to management).

He also has a newsletter with many DWH design tips (archives here:
<http://www.kimballgroup.com/html/07dt.html> ).

------
thibaut_barrere
Oldies but goodies

<http://philip.greenspun.com/wtr/data-warehousing.html>

(data warehousing for cavemen)

~~~
flannell
Great link. I found it very readable; hope the guy does more. I intend to buy
the book he recommends at the bottom as well.

------
xal
Shopify is pondering open-sourcing our internal tool called Tiller. It runs
all the reporting for our considerable data warehouse efforts, yet it's
lightweight and super fast to get running.

Watch this space.

~~~
rufugee
I'd be very interested in this as well...please submit to HN if this becomes a
reality!

------
billswift
>If you're building an archive, your only requirements are to minimize storage
cost and to make sure the archive can keep up with the generation of data.

And in some of the cases he mentions, be really, _really_ certain you don't
lose data, since some of the laws impose criminal penalties on data loss, and
not necessarily even on the most responsible parties (legislatures have been
getting increasingly psychotic this way).

------
pratikpatel
The last stage of enterprise integration with the DW is through Data Marts,
which are organized into Dimensions and Facts, and allow for dynamic
interfaces for business users to mine their data. My current project is using
Informatica CDC (Change Data Capture) to read multiple source databases
through their logs and aggregate in real time. It's really incredible and
supports reporting requirements of any level of intricacy.
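
To make "Dimensions and Facts" concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table names, columns, and sample rows (dim_date, dim_product, fact_sales) are assumptions invented for illustration; they are not taken from the project described above or from any particular tool.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        amount REAL
    );
""")

# Made-up sample data, only to have something to aggregate.
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20110101, 2011, 1), (20110201, 2011, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "widget", "hardware"), (2, "manual", "books")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(20110101, 1, 3, 30.0),
                  (20110101, 2, 1, 12.5),
                  (20110201, 1, 2, 20.0)])


def sales_by_category_and_month(conn):
    """Join the fact table to its dimensions and aggregate the measures."""
    return conn.execute("""
        SELECT p.category, d.year, d.month, SUM(f.quantity), SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category, d.year, d.month
        ORDER BY p.category, d.year, d.month
    """).fetchall()


rows = sales_by_category_and_month(conn)
for row in rows:
    print(row)
```

The fact table holds only keys and numeric measures, while the descriptive attributes live in the dimensions, so a business user can slice the same facts by category, month, or any other dimension attribute without changing the fact table.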

~~~
r00fus
The article talks about Informatica as an ETL tool provider... I wouldn't
agree it's the "last stage" either; ETL is critical to any (E)DW.

------
dgudkov
This isn't a good article about data warehousing 101. I've been working in
data warehousing since 2004. The core thing in DW is the DWH data model,
because it's the abstraction layer that converts raw transactional data into
a meaningful, consistent, correct and persistent representation of an
organization's activity. Tools (including those mentioned in the article) are
just a means to achieve that goal.

------
mumrah
Hive is a really slick DW tool built on top of Hadoop. It has a SQL-like
language and supports typical DW techniques like table partitioning, key
clustering, etc.
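
For anyone unfamiliar with table partitioning, here is a toy Python sketch of the idea. It is not Hive code and not how Hive is implemented; the PartitionedTable class, the "dt" column, and the sample rows are all made up for illustration. The point is only that when rows are stored in separate buckets per partition value (Hive uses a separate directory per partition), a query that filters on the partition column can skip every other bucket.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy table that stores rows in per-partition buckets, keyed by one column."""

    def __init__(self, partition_column):
        self.partition_column = partition_column
        self.partitions = defaultdict(list)

    def insert(self, row):
        # Route each row to the bucket for its partition value.
        self.partitions[row[self.partition_column]].append(row)

    def scan(self, partition_value=None):
        """Return (rows, partitions_read); a partition filter skips other buckets."""
        if partition_value is None:
            keys = list(self.partitions)
        else:
            keys = [partition_value] if partition_value in self.partitions else []
        rows = [row for key in keys for row in self.partitions[key]]
        return rows, len(keys)


events = PartitionedTable("dt")
events.insert({"dt": "2011-07-01", "user": "a", "clicks": 3})
events.insert({"dt": "2011-07-01", "user": "b", "clicks": 1})
events.insert({"dt": "2011-07-02", "user": "a", "clicks": 5})

# Filtering on the partition column touches only one bucket.
day_rows, partitions_read = events.scan("2011-07-01")
total_clicks = sum(row["clicks"] for row in day_rows)
print(total_clicks, partitions_read)
```

With date-partitioned data, a typical "yesterday's totals" report reads one partition instead of the whole table, which is where most of the benefit comes from.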

------
ajtaylor
As a data warehouse newbie, this was an excellent introduction. I've heard a
few of the names mentioned, but there are lots of new faces to explore.

------
orenmazor
I'd like to learn more about this. Can anybody recommend some cool projects
(open source, or even just ideas) for me to explore?

------
T_S_
This article seems a little out of date. It's missing most of the things I
have looked at over the past year. E.g. Riak, MongoDB, Redis and so on.

------
gaius
Can't mention MapReduce without Oracle Coherence (née Tangosol).

~~~
mwexler
Really? I've mentioned MapReduce for years without mentioning Coherence or
Tangosol.

Or was I missing a joke?

~~~
gaius
Presumably you weren't enumerating available tools for it in the context of
data warehousing?

Because if you were, you made an omission.

