Hey HN community,
I recently pushed an update to my GitHub repo titled "Practical Data Engineering: A Hands-On Real-Estate Project Guide". This open-source project aims to tackle real-world data engineering challenges while exploring various technologies. It guides you through building a data application that collects, enriches, and visualizes real-estate data, potentially helping you find your dream property.
This project covers web scraping with Beautiful Soup, processing data with Spark and Delta Lake, visualizing with Apache Superset, and much more, all orchestrated on Kubernetes for scalability.
I started this project back in November 2020, mainly to learn and teach data engineering. Three years on, I'm fascinated that even though the data engineering space moves extremely fast, the core of my project, powered by carefully chosen tools from the Open Data Stack, remains relevant to this day. The blog post about this project is my most-searched post on Google, which motivated me to update it.
I updated tools like Dagster to their latest versions and explored new additions such as delta-rs, which lets you work with Delta tables directly from Python.
https://github.com/sspaeti-com/practical-data-engineering
I look forward to your thoughts and to seeing what you would build differently. My future plans are to add Rill Developer as a code-first BI tool and to bring DuckDB or Polars into the mix.
I haven't looked into it in detail, but from the outside it strikes me that you're highlighting the technologies you're using more than the actual insights you're getting from the data. Even in this very post you say you plan to add X, Y and Z to the stack, without considering why you're doing it.
This is perfectly fine if your goal is just to learn all these technologies. But in that case, chances are the project isn't really interesting to anyone but you.
I would encourage you to take a step back and reconsider what you can do next. Now that you know how to use all these tools, how can you pick the best one to answer the best question that can be asked about some interesting problem?
Usually, interesting problems are those that remove a major source of pain for someone else. Very often, that someone else soon becomes your first customer.
Good luck!