I highly recommend the book Fundamentals of Data Engineering [1].
Hopefully the authors can update the book soon to reflect the latest information and expand it with another entire chapter on data management, as they did for data architecture.
I want to second this. As a CTO I was able to use the themes and concepts to help make better decisions within a domain I didn’t know before reading this book.
This isn’t a handbook wtf. It’s more akin to an Awesome [topic] list like Awesome Python on GitHub showing all sorts of projects and things related to the language.
Nice list! Although as somebody who works on open source tools for data engineering, it kills me a little to see "companies" as the list header rather than, say, "projects".
(also, shameless plug for my latest project Wimsey which is non-company affiliated but does let you test data in a nice, lightweight way: https://github.com/benrutter/wimsey)
Is this still a career path worth investing in?
It seemed super hot a couple of years ago and then people stopped talking about it. Saw many comments complaining about lack of jobs.
Believe me, there's a lot of plumbing moving stuff from point A to B and dealing with poop ("dirty data" is the industry euphemism) in the data engineering and data analyst space.
In my more analytic moments I try to convince myself that data engineering and analysis is like chemical refining, creating useful byproducts out of raw liquids, but in my cynical moments, the plumbing metaphors for it are just so much more evocative.
I don’t think people stopped analyzing data, but the job titles probably changed? Data scientists and data engineers are probably now doing what data analysts used to do?
"You can save more souls with roller skates and Easy-Bake Ovens than with this 2,000-page sleeping pill" The Simpsons Season 13, Episode 6 :-)
I first learned about star schemas from one of Kimball's books years ago. The content was good, but the writing style wasn’t particularly engaging.
I think the books remain relevant for foundational concepts like dimensional modeling, but Kimball's focus reflected the dominant dbs of the time like Oracle and SQL Server. Columnar databases such as MonetDB were niche and not widely adopted... If I remember right, I don't think Kimball's books give those more than a passing mention.
Are there any more modern books about warehousing out there you would recommend? (other than DDIA, which is brought up all the time these days).
[1] Fundamentals of Data Engineering:
https://www.oreilly.com/library/view/fundamentals-of-data/97...