
Ask HN: Best resources, books, & courses on Data Engineering? - elamje
Hi everyone, I'm searching for resources for learning data engineering, but I can't find anything on Google because of SEO bullsh** ruining the results.

My background is a Computer Engineering undergrad (lots of Java & Clojure, and an intro course to Hadoop), so I'm not starting from scratch. I want books/online resources that are up to date, and extremely practical from the perspective of someone doing big data in tech. E.g. Algorithms for Big Data, Hadoop, Spark, Kafka, Cassandra, Scala, etc.

I really appreciate all suggestions, and comments on the quality of others' suggestions.
======
nw__dataeng
I'd highly recommend reading [Designing Data-Intensive Applications](https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321/ref=sr_1_1?crid=1JFD2NONOK4OG&keywords=designing+data+intensive+applications&qid=1562940898&s=gateway&sprefix=data+intensive+app%2Caps%2C196&sr=8-1).
The book gives you a great overview of designing data systems - foundational
knowledge you'll need in any DE role.

The reason you can't find data engineering materials online is that real
data engineering only happens at a handful of companies, and those
companies keep that knowledge internal and do not share it.

I noticed that you listed tools / frameworks to learn, as well as languages.
Another piece of advice would be to not focus on those because they come and
go (for example, Hadoop is pretty much deprecated in any DE-heavy company).
What lasts is an understanding of distributed systems, distributed query
engines, storage technologies, and algorithms & data structures. If you have a
firm grasp on those, you won't have to start from scratch every time a new
framework is introduced. You'll immediately recognize what problems the tech
is solving and how it solves them, and based on your knowledge you can
connect the dots and know whether that solution is what you need.

Another thing to do is watch CS186 from Berkeley in its entirety. This course
is about relational databases, but will give you the foundation you need to
speak the DE language.

Source: I work as a data engineer at what some would call a big company :)

~~~
elamje
Great advice! I actually got that book last night as I researched more. I’ll
be looking into the Berkeley class as well!

------
mindcrash
List of resources here:

[https://github.com/adilkhash/Data-Engineering-HowTo](https://github.com/adilkhash/Data-Engineering-HowTo)

And here is a (free) book you might like:

[https://github.com/andkret/Cookbook](https://github.com/andkret/Cookbook)

"I get asked super often how to become a Data Engineer. That's why I decided
to start this cookbook with all the topics you need to look into.

It's not only useful for beginners, professionals will definitely like the
case study section."

Also +1 for Kleppmann's book mentioned below. That thing is awesome.

