I work for a small company and am in charge of an ETL application. I have been on this project for over 18 months but only learned the acronym in the last few months. I am in the process of reworking some aspects of what I built to handle de-duplication better. Since I will be touching core components of the application, this is a good chance to make changes. I would like to learn more about common practices before I reinvent the wheel a second time.
Apache Spark is another option: https://databricks.com/session/building-robust-etl-pipelines...