We've been using the Python Celery task queue library to coordinate Python ETL-style jobs, but as our setup grows more heterogeneous in both languages and job types, we're looking for something built more explicitly for the purpose. Luigi and Airflow are both contenders.
Data engineering is moving in a direction where pipelines are generated dynamically. "Analysis automation", "analytics as a service", and ETL frameworks all require dynamic pipeline generation, and providing services around aggregation, A/B testing and experimentation, anomaly detection, and cohort analysis requires metadata-driven pipeline generation.
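To make "metadata-driven pipeline generation" concrete, here is a minimal, framework-agnostic sketch in plain Python. The experiment names and task stages are invented for illustration; the point is that the task graph is built from metadata at definition time rather than hand-written, which is the same pattern you would use to generate Airflow DAG edges or Luigi `requires()` chains in a loop.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical metadata — in an "analytics as a service" setup this
# might come from a config store or database instead of a literal.
EXPERIMENTS = ["signup_flow", "pricing_page"]

def build_pipeline(experiment):
    """Generate one experiment's task graph from metadata.

    Returns a mapping of task -> set of upstream tasks, the shape
    both Airflow (DAG edges) and Luigi (requires()) ultimately express.
    """
    extract = f"extract_{experiment}"
    clean = f"clean_{experiment}"
    aggregate = f"aggregate_{experiment}"
    report = f"report_{experiment}"
    return {
        clean: {extract},
        aggregate: {clean},
        report: {aggregate},
    }

# Dynamically generate one pipeline per experiment and merge them
# into a single dependency graph.
graph = {}
for exp in EXPERIMENTS:
    graph.update(build_pipeline(exp))

# A valid execution order respecting all dependencies.
order = list(TopologicalSorter(graph).static_order())
```

Adding a new experiment to the metadata yields a whole new pipeline with no new pipeline code, which is what distinguishes this style from statically hand-wired DAGs.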
Airflow is also more state-aware: task dependencies are tracked for every run independently and stored in the Airflow metadata database. Because Airflow owns this metadata, it is much easier to, say, rerun a task and every task downstream of it for a date range. You can perform very precise surgery on false positives/false negatives and easily rerun sub-sections of a workflow in time.
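Airflow exposes this through its CLI (`airflow tasks clear` with a date range and a downstream flag), but the underlying idea is simple enough to sketch in plain Python. The task names and state store below are illustrative stand-ins for Airflow's metadata database: each (task, run date) pair has its own status, so clearing a task plus its downstream tasks for a slice of dates leaves every other task instance untouched.

```python
from datetime import date, timedelta

# Toy dependency graph: task -> tasks immediately downstream of it.
# Names are illustrative, not a real DAG.
DOWNSTREAM = {"extract": ["clean"], "clean": ["aggregate"], "aggregate": []}

def downstream_of(task):
    """All tasks reachable downstream of `task`, including itself."""
    seen, stack = set(), [task]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(DOWNSTREAM[t])
    return seen

def clear(state, task, start, end):
    """Reset `task` and everything downstream for a date range, so a
    scheduler would rerun exactly those task instances and no others."""
    days = [start + timedelta(d) for d in range((end - start).days + 1)]
    for t in downstream_of(task):
        for day in days:
            state[(t, day)] = "cleared"
    return state

# Every task instance succeeded for a week...
state = {(t, date(2024, 1, 1) + timedelta(d)): "success"
         for t in DOWNSTREAM for d in range(7)}
# ...then we surgically clear `clean` (and downstream `aggregate`)
# for just two of those days.
clear(state, "clean", date(2024, 1, 2), date(2024, 1, 3))
```

Because state is keyed per run rather than per task, the rerun is scoped to exactly the dates you name — the "precise surgery" described above.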