The Data Engineering Handbook (github.com/dataexpert-io)
177 points by matthewhefferon 1 day ago | 19 comments





I highly recommend the book Fundamentals of Data Engineering [1].

Hopefully the authors can update the book soon to reflect the latest information and add an entire chapter on data management, as they did for data architecture.

[1] Fundamentals of Data Engineering:

https://www.oreilly.com/library/view/fundamentals-of-data/97...


I want to second this. As a CTO, I was able to use the themes and concepts to help make better decisions within a domain I didn’t know before reading this book.

This isn’t a handbook wtf. It’s more akin to an Awesome [topic] list like Awesome Python on GitHub showing all sorts of projects and things related to the language.

"This repo has all the resources you need to become an amazing data engineer!"

That's a bold claim. Is this a marketing post for selling courses?


It’s a plug for the author’s discord server (first entry under the “must-join” communities list).

Pro tip for any aspiring DEs: you can ignore 98% of the linked junk on this repo. Learn python, learn SQL, read DDIA, and you’ll do fine.


Almost sounds like faang interview prep

Interesting


He does sell a course, but the repo is open-source with contributions from others. It’s got solid resources, and I found it pretty useful.

It's ONE exercise

Nice list! Although as somebody who works on open source tools for data engineering, it kills me a little to see "companies" as the list header rather than, say, "projects".

(also, shameless plug for my latest project Wimsey, which is not affiliated with any company but does let you test data in a nice, lightweight way: https://github.com/benrutter/wimsey)


Is there a data analyst handbook? Starting a role in that soon (switching from SWE - tired of full-time programming)

Is this still a career path worth investing in? It seemed super hot a couple of years ago and then people stopped talking about it. Saw many comments complaining about lack of jobs.

Despite having 4 years of experience as a SWE (excluding bachelor and CS master), it’s the only job I can get. So there’s that

I think it should be worth it, because you are closer to the product you are building and you help define what to build.

As opposed to programming which is more like plumbing work.


Believe me, there's a lot of plumbing moving stuff from point A to B and dealing with poop ("dirty data" is the industry euphemism) in the data engineering and data analyst space.

In my more analytic moments I try to convince myself that data engineering and analysis is like chemical refining, creating useful byproducts out of raw liquids, but in my cynical moments, the plumbing metaphors for it are just so much more evocative.


Still, somehow, I think it is where it all started in the 1950s. "I have all these numbers, I need a machine to help me do something useful".

And then after this came a huge industry with programmers.

For me it is more like back-to-basics.


I don’t think people stopped analyzing data, but the job titles probably changed? Data scientists and data engineers are probably now doing what data analysts used to do?

r/dataanalysis and r/analytics still get many posts

Kimball's Data Warehousing book is excellent.

"You can save more souls with roller skates and Easy-Bake Ovens than with this 2,000-page sleeping pill" The Simpsons Season 13, Episode 6 :-)

I first learned about star schemas from one of Kimball's books years ago. The content was good, but the writing style wasn’t particularly engaging.

I think the books remain relevant for foundational concepts like dimensional modeling, but Kimball's focus reflected the dominant dbs of the time like Oracle and SQL Server. Columnar databases such as MonetDB were niche and not widely adopted... If I remember right, I don't think Kimball books cover those more than a passing mention.

Are there any more modern books about warehousing out there you would recommend? (other than DDIA, which is brought up all the time these days).



