
Show HN: We wrote a book to help data people build scalable analytics stacks - shadowsun7
https://www.holistics.io/books/setup-analytics/
======
vinhdp
I'm a data engineer at a large corporation. At my current company, we use
Pentaho to extract and transform our data from Oracle daily. The transformed
data is loaded into a staging database, where we then model it into a star schema.
Then the final results are bulk loaded to an on-prem DW. The process takes
hours to complete.

I’m interested in moving to a model that continually extracts changed data to
a data lake, then using the power of a cloud data warehouse to read those
files and perform the transformations and modeling in SQL. I guess that's the
ELT concept that you mentioned in the book's summary.
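
A minimal sketch of that ELT flow, under loud assumptions: an in-memory SQLite database stands in for the cloud data warehouse, and the table names (`raw_orders`, `dim_customer`, `fact_orders`) and the `run_elt` helper are hypothetical, not taken from the book. The point is only the ordering: rows are landed untouched first, and the star-schema modeling happens afterwards in SQL.

```python
import sqlite3


def run_elt(raw_orders):
    """Load raw rows as-is, then transform them with SQL inside the database."""
    con = sqlite3.connect(":memory:")  # assumption: stand-in for a cloud DW

    # Extract + Load: land the source rows without reshaping them.
    con.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    con.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

    # Transform: build a tiny star schema (one dimension, one fact) in SQL.
    con.executescript(
        """
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY,
            customer     TEXT UNIQUE
        );
        INSERT INTO dim_customer (customer)
            SELECT DISTINCT customer FROM raw_orders;

        CREATE TABLE fact_orders AS
            SELECT r.order_id, d.customer_key, r.amount
            FROM raw_orders AS r
            JOIN dim_customer AS d ON d.customer = r.customer;
        """
    )
    return con


rows = [(1, "acme", 10.0), (2, "acme", 5.0), (3, "globex", 7.5)]
con = run_elt(rows)
totals = con.execute(
    """
    SELECT d.customer, SUM(f.amount)
    FROM fact_orders AS f
    JOIN dim_customer AS d ON d.customer_key = f.customer_key
    GROUP BY d.customer
    ORDER BY d.customer
    """
).fetchall()
```

Because the raw table is kept, the SQL transform can be rerun or changed later without going back to the source system, which is where the faster adaptation would come from.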

The goal is to reduce latency and allow for more frequent batches, while
making the process more accessible to my team, which has strong SQL skills,
and letting us adapt faster to changing business needs.

This book looks like a good way for me to get a glimpse into those new
processes. Thanks for putting it together.

~~~
huy
I'm one of the authors of the book. Yes, you're right. The book outlines the
transition from the "old world of BI" to the "new world of BI". If you read
through the book, you'll see the biases clearly stated there:

- ELT over ETL

- Data modeling is crucial as part of the BI workflow

- Cloud DW over on-premise DW (but this depends on your org's requirements)

- SQL reporting (Redash, Metabase, Looker, Holistics) over non-SQL reporting
(Tableau, Qlik)

~~~
mritchie712
I'd toss SeekWell[1] in the "new world" category, but we've taken a different
approach. Instead of forcing people to use a whole separate platform for BI,
we decided to tightly integrate with the tools people were already going to
for data (e.g. Google Sheets, Excel, Slack, etc.). We've found teams stay
better informed when the data is in places they're already hanging out.

[1] https://seekwell.io/

------
sondnm
Really well done! The content looks like a great balance. I'm an engineer
working a bit in data engineering, and I find the content of the book relevant
to me.

I like how you explain why the methodologies of the pre-cloud era still hold
lessons that apply today, even though implementation best practices have
changed thanks to the cost model of the cloud.

The section that stood out the most to me, strangely, was not anything to do
with the technology or analytics stack. It was in Chapter 3 – Data Modeling
Layer and Concepts where you discuss the dynamic between the CEO and the data
analyst and the data. This really articulated quite well how our own dynamic
functions at our current company. Even with our current data warehouse, our BI
team is a bottleneck, and it is something becoming more and more apparent to
me. It is my primary motivation in seeking out how best to re-architect our
analytics stack.

~~~
huy
Thank you for sharing your thoughts! Yes, the concepts of data modeling and
self-service analytics are interesting, yet few people fully grasp them.
Chapter 3.1 is probably one of my favorites.

------
alanng
This is just the book I need!

A little bit of context: I am a product manager and I have been working with
data analysts and engineers for a few months, and even though I have tried to
do a lot of research, sometimes I still don't understand what they are saying.

The terms are extremely difficult and vary depending on the site, and it seems
like each company has a different understanding of the same term.

So that's where this book comes in handy. It helped me visualize the big
picture of the whole data analytics landscape. What's more, I understand what
the role and challenges of the data analysts and data engineers in my team
are. I was able to communicate with them in their "language", especially when
I was explaining why we should use ELT instead of ETL (Chap 3, I suppose).

Anyway, I think this book is great for non-tech people like me, but it
requires some experience in the tech industry to get started with. I
definitely recommend it to other PMs who will be working with data people!

------
kentnguyen
I've been looking for something like this for a while. Most of the time when I
go online to search for resources on building an analytics stack, the content
is biased towards the vendor's preferred way of doing things. This looks like
it will give me a high-level understanding of the why behind the proposed
approach.

~~~
huy
This is exactly why we started writing this book. We spoke with a lot of
customers, and a fair share of them said the same thing: they found a lot of
how-tos on the web, but none that is comprehensive and goes deep into the
"why" and the "history" of BI.

------
sixhobbits
Amazing that we can get such high quality resources for free. That said, I'm
not convinced by the example where the CEO uses the "data modelling layer"
(essentially what holistics offers to build for you and what this book is an
ad for). In my experience a good data analyst does far more than "translate"
the business question to SQL. The exec's understanding is not only limited by
not knowing SQL, but also by potential confounds or a billion other things
that can make a seemingly meaningful result meaningless or dangerous.

I don't think simpler technology can give people the magic answers and easy
data access that they crave any more than no-code tools can let people build
complicated _correct_ systems.

~~~
huy
You’re right, and also right that we might be biased. We are actually not
trying to say that with the modeling layer the CEO can remove her reliance on
the DA completely; that would be foolish. We think she can reduce that
reliance _only_ when it comes to getting access to data. We are not talking
about complicated analysis that requires a proper data mindset.

------
shadowsun7
Hey HN. This is something we've been working on for the last three months over
at Holistics.

If you're a data analyst, data engineer, or a founder setting up a data
analytics stack for the first time, this is a book that will give you a soup-
to-nuts overview of an entire field.

Like most books about data analytics, this assumes some amount of technical
competence.

Unlike most books in the space, this is mostly about first principles. About
the ideas behind the tools, not the tools themselves.

The hope is to give you 'just enough to not get lost'. And the book is written
to be read in about 2 hours; in some cases, in no more than two sittings!

There's probably more than a hundred hours of research and writing that went
into this. I'm looking forward to reading your comments.

------
khaito24
Great book, and spot on with the problem statement. One thing worth pointing
out is that I didn't see any mention of testing your data models or of version
control. This should be part of the process of the modern analytics stack to
ensure data quality.

Suggested approaches and tools would be useful, whether that's tools such as
dbt for checking expected field values or frameworks such as
GreatExpectations. What happens to data that doesn't conform to expected
values? How should you handle it? And how can the testing process be
automated? This forms an important part of ensuring data quality and the
reliability of the analysed output.
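
To make the question concrete, here is a minimal sketch of an "expected field values" check in plain Python. It does not use the dbt or GreatExpectations APIs; the `split_by_expectation` helper, the `status` column, and the accepted values are all hypothetical. It shows one possible answer to "what happens to data that doesn't conform": route those rows to a quarantine list instead of dropping them silently.

```python
def split_by_expectation(rows, column, accepted_values):
    """Separate rows whose `column` value is accepted from rows that fail."""
    passed, quarantined = [], []
    for row in rows:
        # A missing column yields None, which fails unless None is accepted.
        if row.get(column) in accepted_values:
            passed.append(row)
        else:
            quarantined.append(row)
    return passed, quarantined


orders = [
    {"order_id": 1, "status": "paid"},
    {"order_id": 2, "status": "refunded"},
    {"order_id": 3, "status": "???"},
    {"order_id": 4},  # status missing entirely
]
good, bad = split_by_expectation(
    orders, "status", {"paid", "refunded", "pending"}
)
```

A check like this could run as a step in a scheduled job, failing the run or raising an alert when `bad` is non-empty, which is one way the testing could be automated.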

~~~
huy
Yes, we did get a fair amount of early feedback on this exact topic of data
quality. But we eventually decided to cover it in a separate sidebar to the
book. It’s also not a simple topic to cover. And the book is supposed to
“give you enough to be dangerous”.

------
thongda
Nice, the illustrations look pretty good. I just took a look at the table of
contents, and it seems to cover a lot of my questions about data analytics as
a product guy. I will spend some time reading it this weekend.

Sending to my data team btw, thanks for sharing

------
scared2
Very nice drawings, what tool was used?

~~~
huy
We used an iPad, an Apple Pencil, and the Paper app :)

