
I'd highly recommend reading [Designing Data-Intensive Applications](https://www.amazon.com/Designing-Data-Intensive-Applications...). The book gives you a great overview of how data systems are designed, which is foundational knowledge you'll need in any data engineering (DE) role.

You can't find much data engineering material online because real data engineering only happens at a handful of companies, and those companies keep that knowledge base internal rather than sharing it.

I noticed that you listed tools and frameworks to learn, as well as languages. Another piece of advice: don't focus on those, because they come and go (for example, Hadoop is pretty much deprecated at any DE-heavy company). What lasts is an understanding of distributed systems, distributed query engines, storage technologies, and algorithms and data structures. If you have a firm grasp of those, you won't have to start from scratch every time a new framework comes out. You'll immediately recognize what problem the tech is solving and how it solves it, and you can use what you know to connect the dots and decide whether that solution is what you need.

Another thing to do is watch Berkeley's CS186 in its entirety. The course is about relational databases, but it will give you the foundation you need to speak the DE language.

Source: I work as a data engineer at what some would call a big company :)




Great advice! I actually got that book last night as I researched more. I’ll be looking into the Berkeley class as well!



