
Show HN: Data Engineering Project for Beginners - jkm2155
https://www.startdataengineering.com/post/data-engineering-project-for-beginners-batch-edition/
======
jkm2155
Simple project to help beginners get started with data engineering. This is my
first post on HN; any feedback would be greatly appreciated.

~~~
gifflar
Thanks for the illuminating post. I like how Apache Airflow is used to move
the PySpark script to an S3 location so that it can be read by the EMR step. I
remember working on a project where we wanted to automate a data pipeline
using Airflow and had this problem of how to get our pipeline scripts to the
right locations.

~~~
jkm2155
@gifflar Glad you found it illuminating :). Yeah, moving the Spark script to S3
using an Airflow task is usually the easiest approach.
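
For anyone following along, here is a rough sketch of the pattern discussed
above: copy the script to S3, then point an EMR step at that S3 location. It is
not the code from the post. The function names, bucket, key, and step name are
all hypothetical, and it assumes s3_client is a boto3 S3 client (or anything
with a compatible upload_file method).

```python
def upload_script(s3_client, local_path, bucket, key):
    """Copy a local PySpark script to S3 and return its s3:// URI.

    Assumes s3_client exposes upload_file(Filename, Bucket, Key),
    as a boto3 S3 client does.
    """
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


def build_spark_step(script_uri, name="run_pyspark_script"):
    """Build an EMR step definition that runs the script via spark-submit.

    The step name and the ActionOnFailure value are illustrative choices.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }
```

In an Airflow DAG, upload_script could plausibly be the callable of a
PythonOperator, with the step dict handed to an operator such as
EmrAddStepsOperator in a downstream task, so the script is always in place
before the EMR step tries to read it.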

