Local Executor vs Celery Executor vs Kubernetes Executor
Choosing the Best Executor for Your Airflow Workflow 🔧
The Story: How Airflow Runs Tasks
Imagine you have a complex workflow:
- Some tasks need to run on the same machine.
- Some tasks need to run across multiple servers.
- Some tasks need dynamic scaling based on workload.
Now, the question is:
How do you control where and how these tasks run?
In Airflow, executors are the key to this control.
But with multiple types of executors—Local, Celery, and Kubernetes—how do you choose the best one for your environment?
That’s what this article will help you decide!
What Are Executors in Apache Airflow?
An executor in Apache Airflow determines where and how the tasks of a DAG (Directed Acyclic Graph) run. Airflow provides different executors to cater to various environments:
- Local Executor: Executes tasks on the same machine as the scheduler.
- Celery Executor: Runs tasks across multiple machines using a message broker.
- Kubernetes Executor: Dynamically runs tasks as Kubernetes pods in a cloud-native environment.
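Default Executor
Which executor you get out of the box depends on the Airflow version: Airflow 2.x ships with the SequentialExecutor, which runs one task at a time, while Airflow 3 defaults to the LocalExecutor. The Local Executor runs tasks in parallel on the scheduler's machine, which makes it ideal for small environments or when you want to execute tasks on a single machine.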
Pros:
- Simple to set up.
- No external infrastructure needed.
Cons:
- Limited scalability (single machine).
- Not suitable for large-scale tasks or distributed systems.
Why Executors Matter in Airflow
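Without an appropriate executor:
- Your Airflow deployment cannot scale beyond the capacity of a single machine.
- Task execution might become slow or unreliable as tasks queue up for limited resources.
- A failed worker or a heavy workload is harder to absorb. (Retries are configured per task and work with every executor.)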
With the right executor:
- Airflow can scale horizontally (across multiple nodes).
- Task execution becomes more efficient (based on your environment).
- Your workflows become more fault-tolerant and predictable.
The Most Common Executors Explained
1. LocalExecutor
✔ Executes tasks on the same machine as the scheduler.
❌ Limited by the resources of a single machine.
Use case: Ideal for development, small-scale workflows, or when you want to run everything locally on a single machine.
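To make "limited by the resources of a single machine" concrete, here is a toy model of the idea using only the Python standard library. It is a sketch, not Airflow's implementation: the real LocalExecutor runs tasks in separate processes, whereas this example uses threads for brevity. The names `run_locally` and `make_task` are hypothetical, and the `parallelism` argument only loosely mirrors Airflow's `parallelism` setting.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def run_locally(tasks, parallelism=2):
    """Run callables on this machine, at most `parallelism` at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # Submit everything up front; the pool caps how many run concurrently.
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        for name, future in futures.items():
            results[name] = future.result()
    return results


def make_task(name):
    """Build a stand-in task that sleeps briefly and reports completion."""
    def task():
        time.sleep(0.05)
        return f"{name} done"
    return task


tasks = {name: make_task(name) for name in ("task_a", "task_b", "task_c")}
results = run_locally(tasks, parallelism=2)
print(results)
```

With `parallelism=2`, the third task has to wait for a free slot. Raising the cap only helps until the machine itself runs out of CPU or memory, which is the ceiling the other executors are designed to lift.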
2. CeleryExecutor
✔ Runs tasks on distributed worker nodes.
✔ Uses a message broker (e.g., Redis or RabbitMQ) to manage task execution across machines.
❌ Requires a message broker and additional setup.
Use case: Ideal for medium to large-scale workflows where you need to horizontally scale task execution.
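The broker-and-workers pattern can be simulated with the standard library alone. In this sketch an in-memory `queue.Queue` stands in for Redis or RabbitMQ and threads stand in for worker machines; none of it uses Celery itself, and `worker` and `run_distributed` are hypothetical names chosen for illustration.

```python
import queue
import threading


def worker(name, broker, results):
    """Pull task ids from the broker until a None sentinel arrives."""
    while True:
        task_id = broker.get()
        if task_id is None:  # sentinel: no more work for this worker
            break
        results.put((task_id, name))  # record which worker ran the task


def run_distributed(task_ids, worker_names):
    """Queue tasks on a shared broker and let workers compete for them."""
    broker = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(name, broker, results))
        for name in worker_names
    ]
    for thread in threads:
        thread.start()
    for task_id in task_ids:
        broker.put(task_id)
    for _ in threads:
        broker.put(None)  # one sentinel per worker, queued after all tasks
    for thread in threads:
        thread.join()

    assignments = {}
    while not results.empty():
        task_id, name = results.get()
        assignments[task_id] = name
    return assignments


assignments = run_distributed(
    ["task_a", "task_b", "task_c"], ["worker-1", "worker-2"]
)
print(assignments)
```

Which worker picks up which task varies from run to run. That is the point of the pattern: the scheduler only queues work, and adding workers adds capacity without changing the DAG.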
3. KubernetesExecutor
✔ Dynamically runs tasks as Kubernetes pods.
✔ Starts one pod per task, so capacity follows the workload (adding nodes still depends on your cluster's autoscaling).
❌ Requires Kubernetes infrastructure.
Use case: Best for cloud-native environments that are already using Kubernetes, or where task isolation and scalability are key.
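The "one pod per task" idea can be pictured as a function that turns each task into its own pod description. The sketch below builds a minimal Kubernetes Pod manifest as a plain dictionary. It is illustrative only: the helper `pod_manifest_for`, the image name, and the labels are assumptions, and the pods Airflow actually creates carry more fields than shown here.

```python
import re


def pod_manifest_for(dag_id, task_id, image="example.com/airflow:latest"):
    """Build a minimal, illustrative Pod manifest for one task."""
    # Pod names must be lowercase alphanumerics and dashes, so normalize.
    raw = f"{dag_id}-{task_id}".lower()
    name = re.sub(r"[^a-z0-9-]", "-", raw)[:63].strip("-")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "labels": {"dag_id": dag_id, "task_id": task_id},
        },
        "spec": {
            # The pod runs its task once and is not restarted in place.
            "restartPolicy": "Never",
            "containers": [{"name": "base", "image": image}],
        },
    }


manifests = [
    pod_manifest_for("etl", task_id)
    for task_id in ("task_a", "task_b", "task_c")
]
for manifest in manifests:
    print(manifest["metadata"]["name"])
```

Because every task gets its own pod, each one can in principle use a different image or resource request, which is where the isolation benefit comes from.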
Performance Comparison
Here’s how the executors compare on scalability, setup effort, fault tolerance, and isolation:
| Feature | Local Executor | Celery Executor | Kubernetes Executor |
|---|---|---|---|
| Scalability | Low (Single machine) | High (Horizontal scaling) | Very High (Dynamic scaling) |
| Setup Complexity | Low | Medium | High |
| Fault Tolerance | Low (one machine is a single point of failure) | Good (tasks can rerun on another worker) | Good (each attempt runs in a fresh pod) |
| Resource Isolation | None | Limited | Full (Pod isolation) |
| Cloud-Native Fit | Limited | Moderate (workers can be containerized) | Native (built on Kubernetes) |
When to Use Each Executor
- Local Executor: Use for simple, single-machine setups or small-scale tasks. If you’re testing or running small DAGs, the Local Executor is the easiest choice.
- Celery Executor: Best for medium to large-scale workflows where you need horizontal scaling. It’s ideal when running multiple tasks in parallel across different machines.
- Kubernetes Executor: Perfect for cloud-native environments where Kubernetes is already in use. It offers dynamic scaling and excellent resource isolation.
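The guidance above can be written down as a small decision function. This is only a sketch of the rules of thumb in this article: `choose_executor` and its two parameters are hypothetical, and a real decision would also weigh team experience, cost, and task startup latency.

```python
def choose_executor(needs_multiple_machines: bool, runs_on_kubernetes: bool) -> str:
    """Return an executor name following this article's rules of thumb."""
    if runs_on_kubernetes:
        # Kubernetes is already in place, so use it for scaling and isolation.
        return "KubernetesExecutor"
    if needs_multiple_machines:
        # Horizontal scaling without adopting Kubernetes.
        return "CeleryExecutor"
    # Everything fits on one machine.
    return "LocalExecutor"


print(choose_executor(needs_multiple_machines=False, runs_on_kubernetes=False))
print(choose_executor(needs_multiple_machines=True, runs_on_kubernetes=False))
print(choose_executor(needs_multiple_machines=True, runs_on_kubernetes=True))
```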
Visual Example: Executors in Action
Scenario
Let’s assume you’re running a DAG with three tasks: Task A, Task B, and Task C. You want to execute these tasks in parallel and dynamically scale them.
| Executor | Task A | Task B | Task C |
|---|---|---|---|
| Local Executor | Runs on the same machine | Runs on the same machine | Runs on the same machine |
| Celery Executor | Runs on any available worker | Runs on any available worker | Runs on any available worker |
| Kubernetes Executor | Runs in separate pods | Runs in separate pods | Runs in separate pods |
Practical Example: Celery Executor Setup
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def my_task():
    print("Task is running...")


# The executor is not a DAG argument: it is chosen in the Airflow
# configuration, so this DAG runs unchanged on any executor.
with DAG(
    'celery_executor_example',
    default_args={'owner': 'airflow'},
    schedule=None,  # use schedule_interval=None on Airflow versions before 2.4
    start_date=datetime(2024, 1, 1),
) as dag:
    task1 = PythonOperator(
        task_id='task1',
        python_callable=my_task,
    )
```
Input & Output Example
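Input: Configure Airflow to use the Celery Executor in airflow.cfg. A production setup typically also sets result_backend in the [celery] section.
```ini
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
```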
Output: Tasks are distributed across the worker nodes and execute in parallel. The log lines below are illustrative; the exact format and line numbers vary by Airflow version:
```
[2024-01-01 12:00:00,000] {scheduler.py:1729} INFO - Starting scheduler...
[2024-01-01 12:01:00,000] {celery_executor.py:134} INFO - Celery worker executing task1
```
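The settings from the configuration above can also be supplied as environment variables, which Airflow names `AIRFLOW__{SECTION}__{KEY}` and which take precedence over airflow.cfg. The sketch below models that lookup with the standard library only. It is a simplified illustration rather than Airflow's own configuration code, and `env_var_name` and `resolve` are hypothetical helpers.

```python
import configparser

AIRFLOW_CFG = """
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
"""


def env_var_name(section, key):
    """Build the environment variable name for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def resolve(section, key, cfg_text, environ):
    """Simplified lookup: environment variable first, then the config file."""
    override = environ.get(env_var_name(section, key))
    if override is not None:
        return override
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get(section, key)


print(env_var_name("celery", "broker_url"))
print(resolve("core", "executor", AIRFLOW_CFG, environ={}))
print(resolve("core", "executor", AIRFLOW_CFG,
              environ={"AIRFLOW__CORE__EXECUTOR": "LocalExecutor"}))
```

Environment variables are handy in containerized deployments, where the same image can run with a different executor in each environment without editing the config file.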
Executor Best Practices
Choosing the Right Executor
- For small-scale environments: Stick with LocalExecutor if your workflow doesn’t need to scale.
- For medium to large environments: Use CeleryExecutor if you need horizontal scalability but don’t want the complexity of Kubernetes.
- For cloud-native environments: If you're already using Kubernetes, the KubernetesExecutor is the best option for dynamic task execution.
Summary 🧠
- Executors control where and how tasks are run in Airflow.
- LocalExecutor is best for simple, small-scale tasks.
- CeleryExecutor allows for horizontal scaling across multiple machines.
- KubernetesExecutor is ideal for cloud-native environments using Kubernetes.
- Choosing the right executor depends on your scaling needs and infrastructure.
Key Takeaways
- Executors define how tasks are distributed and executed.
- CeleryExecutor and KubernetesExecutor offer more scalability than LocalExecutor.
- The best executor depends on your workflow scale and infrastructure.
What’s Next?
➡️ Scaling Workers & Horizontal Autoscaling
Learn how to scale your Airflow setup horizontally and make use of autoscaling to handle larger workloads.