
Ask HN: Apache Airflow vs. Celery - nuser5
I see many common features between Airflow and Celery. Airflow can use Celery as an executor, and a DAG in Airflow is similar to a chain in Celery. In which specific cases is Airflow the better choice, and which one is good for production?
======
NightMKoder
Both have their place but are generally used differently. Below is my take on
each, though in theory you could use either system differently.

Airflow executes repeatable processes, usually on a schedule. You probably
won’t want to use Airflow to kick off a process per user - an Airflow run is
“heavy”. Within a run you would usually launch Spark or the like to move and
transform large heaps of data. Airflow works well as an orchestrator that
makes sure all the jobs run in the right order. It’s best used as a
better-structured version of shell scripts for building reporting and data
science pipelines that run, say, once an hour.
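The “right order” guarantee is essentially a topological sort over the DAG of job dependencies. A toy sketch of that idea in plain Python (no Airflow dependency; the task names here are made up, and real Airflow adds scheduling, retries, and backfills on top of this picture):

```python
# Toy model of orchestrator ordering: each job runs only after
# all of its upstream dependencies have finished.
from graphlib import TopologicalSorter

# downstream_task: {upstream dependencies it must wait for}
dag = {
    "extract": set(),
    "transform": {"extract"},
    "report": {"transform"},
    "notify": {"report", "transform"},
}

# static_order() yields tasks so that upstreams always come
# before their downstreams - the order an orchestrator enforces.
order = list(TopologicalSorter(dag).static_order())
print(order)  # upstream tasks always appear before their downstreams
```

In Airflow you would declare the same shape with operators and `>>` dependencies inside a DAG file, and the scheduler would walk it for you on each run.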

Celery is more of a task executor. I don’t have as much experience with Celery
specifically, but background-work systems like it are generally built to do
work reliably and asynchronously from a user request. Think async API
calls - e.g. sending emails. You could just do the work in another thread (or,
in node, just not await it and let it happen somewhere), but you want
something that monitors that these things get done even in the face of kill
-9. Systems like Celery tend to perform better when there aren’t dependencies
between messages and the messages are small - e.g. dealing with a single user.
Usage of Celery-like systems also tends to be near-real-time - you want it
done asynchronously, but “sooner than tomorrow.”
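A stdlib stand-in for the shape of this: the request hands work to a queue and returns immediately, and a worker drains the queue in the background. This is only a sketch - the function and names are illustrative, and what real Celery adds is a durable broker (e.g. RabbitMQ/Redis) plus separate worker processes, which is exactly what survives a kill -9 where an in-process thread would not:

```python
# Toy background-worker pattern: enqueue work, process it asynchronously.
# A real Celery task would be a decorated function routed through a broker.
import queue
import threading

task_queue = queue.Queue()
results = []

def send_email(user_id):
    # Stand-in for the real side effect (sending an email).
    results.append(f"emailed user {user_id}")

def worker():
    while True:
        func, args = task_queue.get()
        if func is None:   # sentinel: shut the worker down
            break
        func(*args)        # a broker-backed system would ack/retry here

t = threading.Thread(target=worker)
t.start()

# The "request handler" returns right away; the email happens later.
task_queue.put((send_email, (42,)))
task_queue.put((None, ()))  # tell the worker to stop
t.join()
print(results)  # -> ['emailed user 42']
```

Note that once this process dies, anything still sitting in the in-memory queue is lost - persisting the queue outside the process is the whole point of Celery’s broker.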

