
Ask HN: What is a good book or resource discussing how to create a data warehouse - thorin
I'm very familiar with working with databases in a transactional environment (Oracle/SQL Server/Postgres), and I've done a lot of work creating reports and BI with Jasper, Oracle's tools and Business Objects. I've never created a data warehouse or a star schema, though. I'm aware of Ralph Kimball, but I was wondering if there are any classic texts I should be aware of, or any modern takes on the subject.

I have a rough idea how to do all this and could knock up manually scheduled scripts to do the ETL and get close to the right kind of schema design. I guess what I'm looking for is a case study or an overview of best practice.
======
heynickc
I haven't actually built a cube myself, but I support a few right now. My
boss, who built ours, always refers me to Ralph Kimball's "The Microsoft Data
Warehouse Toolkit" (you mentioned Oracle, so maybe there are equivalent
toolsets out there).

I know you said you're aware of Ralph Kimball, but the first ~100 pages are
broken into 1) Defining the Business Requirements and 2) Designing the
Business Process Dimensional Model. It's really helped me wrap my head around
the original design of the Facts and Dimensions as they relate to the
business.
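
To make the Facts and Dimensions idea concrete, here is a minimal sketch of a star schema, not taken from the book. The table and column names (fact_sales, dim_date, dim_product) and the sample rows are made up for illustration, and it uses Python's built-in sqlite3 module only so that it runs anywhere.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,  -- e.g. 20140101
            year     INTEGER,
            month    INTEGER
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name        TEXT,
            category    TEXT
        );
        -- The fact table holds the measures (quantity, amount) plus one
        -- foreign key per dimension.
        CREATE TABLE fact_sales (
            date_key    INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity    INTEGER,
            amount      REAL
        );
    """)
    # Made-up sample data, for illustration only.
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20140101, 2014, 1), (20140201, 2014, 2)],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [
            (20140101, 1, 3, 30.0),
            (20140101, 2, 1, 12.5),
            (20140201, 1, 2, 20.0),
        ],
    )


def sales_by_category_and_month(conn):
    """Slice the fact table by two dimension attributes."""
    return conn.execute("""
        SELECT p.category, d.year, d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category, d.year, d.month
        ORDER BY p.category, d.year, d.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
for row in sales_by_category_and_month(conn):
    print(row)
```

The point is just the shape: the business process (sales) becomes the fact table, and the things you want to group or filter by (date, product) become the dimensions around it.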

~~~
thorin
Thanks. Amazon already tried to sell me "The Data Warehouse Toolkit: The
Definitive Guide to Dimensional Modeling", which I'd probably go for. I'm
guessing the content would be similar but more database-agnostic, which suits
my preferences.

I feel comfortable doing the reporting and scripting, but it would be good to
validate my design ideas once I get some hands-on experience, hopefully
soon!

Maybe a new question but I wonder if the data warehouse has been superseded a
little by the whole big data / multiple data sources thing?

~~~
heynickc
Yeah, I've wondered that myself. I picked up a book a while back called "Data
Science for Business"
[http://shop.oreilly.com/product/0636920028918.do](http://shop.oreilly.com/product/0636920028918.do)

Something that stood out to me was their discussion about "Big Data
technologies" (Hadoop, HBase...) and how they "support" data mining
techniques. So sometimes I wonder if big data technologies are more for the
processing of the data, like the ETL needed for data warehouses - but since
it's big, we need those special technologies to process it (and do additional
data-sciencey things on it - like calculate a probability of churn,
probability of loan default, etc). End results can still be those high-
performance in-memory objects that we can slice and dice, just like our data
warehouses, if that's how we need to see them.

This is all coming from no practical experience with Hadoop / Big Data, just
research so hopefully someone clarifies :)

------
elchief
This book is good (I've read it), highly rated on Amazon, and reasonably
priced:
[http://www.amazon.com/Agile-Data-Warehouse-Design-Collaborat...](http://www.amazon.com/Agile-Data-Warehouse-Design-Collaborative/dp/0956817203)

