
Local Executor vs Celery Executor vs Kubernetes Executor

Choosing the Best Executor for Your Airflow Workflow 🔧

The Story: How Airflow Runs Tasks

Imagine you have a complex workflow:

  • Some tasks need to run on the same machine.
  • Some tasks need to run across multiple servers.
  • Some tasks need dynamic scaling based on workload.

Now, the question is:

How do you control where and how these tasks run?

In Airflow, executors are the key to this control.

But with multiple types of executors—Local, Celery, and Kubernetes—how do you choose the best one for your environment?

That’s what this article will help you decide!


What Are Executors in Apache Airflow?

An executor in Apache Airflow determines where and how the tasks of a DAG (Directed Acyclic Graph) run. Airflow provides different executors to cater to various environments:

  • Local Executor: Executes tasks on the same machine as the scheduler.
  • Celery Executor: Runs tasks across multiple machines using a message broker.
  • Kubernetes Executor: Dynamically runs tasks as Kubernetes pods in a cloud-native environment.

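Default Executor

Strictly speaking, Airflow 2.x ships with the SequentialExecutor as its default. It runs one task at a time and is paired with the default SQLite database. LocalExecutor is the natural next step: it runs tasks in parallel on a single machine, but in Airflow 2.x it needs a PostgreSQL or MySQL metadata database instead of SQLite. (Airflow 3 removed the SequentialExecutor and made LocalExecutor the default.) LocalExecutor is ideal for small environments or when you want to execute tasks on a single machine.

  • Pros:
      • Simple to set up.
      • No message broker or cluster needed; it only needs Airflow's metadata database.
  • Cons:
      • Limited scalability (single machine).
      • Not suitable for large-scale workloads or distributed systems.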


Why Executors Matter in Airflow

Without an appropriate executor:

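  • Your Airflow deployment may be stuck on a single machine and unable to scale.
  • Tasks can pile up waiting for free execution slots, so runs become slow.
  • A single overloaded or failed host can interrupt many running tasks at once. (Retries themselves are configured per task and work with every executor.)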

With the right executor:

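  • Airflow can scale horizontally (across multiple nodes) when the workload demands it.
  • Task capacity matches your environment: one machine, a fleet of workers, or a Kubernetes cluster.
  • A failure on one worker or pod affects fewer tasks, so workflows become more fault-tolerant and predictable.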

The Most Common Executors Explained

1. LocalExecutor

✔ Executes tasks as parallel processes on the same machine as the scheduler.
❌ Limited by the resources of a single machine.

Use case: Ideal for development, small-scale workflows, or when you want to run everything locally on a single machine.
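To picture what "parallel on one machine" means, here is a toy model in plain Python. It is an analogy, not Airflow's code: Airflow's LocalExecutor spawns worker processes, while this sketch uses threads for portability, and the `PARALLELISM` constant is an assumption standing in for the `[core] parallelism` setting.

```python
"""Toy model of the LocalExecutor idea: a fixed pool of workers on ONE host.

Analogy only. Threads stand in for the worker processes Airflow really uses,
and PARALLELISM stands in for [core] parallelism (value assumed).
"""
import threading
import time
from concurrent.futures import ThreadPoolExecutor

PARALLELISM = 2  # assumption: at most 2 tasks run at once on this machine


def run_task(task_id, active, lock, peak):
    """Pretend to run a task and record how many tasks run concurrently."""
    with lock:
        active[0] += 1
        peak[0] = max(peak[0], active[0])
    time.sleep(0.05)  # stand-in for real work
    with lock:
        active[0] -= 1
    return f"{task_id}: success"


def run_locally(task_ids):
    """Run every task on this machine; extra tasks wait for a free slot."""
    active, peak = [0], [0]
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=PARALLELISM) as pool:
        results = list(pool.map(lambda t: run_task(t, active, lock, peak), task_ids))
    return results, peak[0]


if __name__ == "__main__":
    results, peak = run_locally(["task_a", "task_b", "task_c"])
    print(results)
    print("peak concurrency:", peak)
```

However many tasks you submit, the peak never exceeds the pool size, which is exactly the single-machine ceiling listed under Cons.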


2. CeleryExecutor

✔ Runs tasks on distributed worker nodes.
✔ Uses a message broker (e.g., Redis or RabbitMQ) to manage task execution across machines.
❌ Requires a message broker and additional setup.

Use case: Ideal for medium to large-scale workflows where you need to horizontally scale task execution.
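The broker pattern behind CeleryExecutor can be sketched in a few lines: the scheduler only enqueues work, and any free worker takes the next item. This is an analogy under stated assumptions (`queue.Queue` plays the Redis/RabbitMQ broker, threads play worker machines, and the worker names are hypothetical), not how Celery or Airflow is implemented.

```python
"""Toy model of the CeleryExecutor pattern: scheduler -> broker -> workers.

Analogy only: queue.Queue stands in for the message broker and threads
stand in for worker machines. Worker names are hypothetical.
"""
import queue
import threading


def run_with_workers(task_ids, worker_names):
    broker = queue.Queue()  # stand-in for Redis or RabbitMQ
    results = {}            # task_id -> worker that ran it
    lock = threading.Lock()

    def worker(name):
        while True:
            task_id = broker.get()
            if task_id is None:  # sentinel: no more work for this worker
                return
            with lock:
                results[task_id] = name

    threads = [threading.Thread(target=worker, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    # The "scheduler" never decides which worker runs a task; it only enqueues.
    for task_id in task_ids:
        broker.put(task_id)
    for _ in threads:
        broker.put(None)  # one sentinel per worker, queued after all tasks
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_with_workers(["task_a", "task_b", "task_c"], ["worker-1", "worker-2"]))
```

Adding capacity means adding another consumer of the same queue, which is why Celery scales horizontally without changing the scheduler.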


3. KubernetesExecutor

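✔ Launches a separate Kubernetes pod for each task.
✔ Capacity follows the workload: pods are created as tasks are queued and removed when they finish (adding nodes still relies on your cluster's autoscaling).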
❌ Requires Kubernetes infrastructure.

Use case: Best for cloud-native environments that are already using Kubernetes, or where task isolation and scalability are key.
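"One pod per task" is easier to picture as a manifest. The function below builds a minimal Kubernetes Pod manifest as a plain dict. It is a simplified, hypothetical sketch: the real KubernetesExecutor builds pods from a pod template plus its own labels and naming, and the image tag and resource numbers here are assumptions.

```python
"""Sketch of a per-task Pod manifest (simplified, hypothetical).

The real KubernetesExecutor derives pods from a pod template; the image tag,
resource values, labels and naming scheme below are assumptions.
"""
import re


def pod_for_task(dag_id, task_id, run_id,
                 image="apache/airflow:2.9.3", cpu="500m", memory="512Mi"):
    # Pod names must be lowercase and DNS-safe: letters, digits and '-'.
    raw = f"{dag_id}-{task_id}".lower()
    name = re.sub(r"[^a-z0-9-]+", "-", raw)[:63].strip("-")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"dag_id": dag_id, "task_id": task_id}},
        "spec": {
            # One attempt per pod in this sketch; retries are left to Airflow.
            "restartPolicy": "Never",
            "containers": [{
                "name": "base",
                "image": image,
                "args": ["airflow", "tasks", "run", dag_id, task_id, run_id],
                # Per-pod requests/limits are what give each task its own resources.
                "resources": {
                    "requests": {"cpu": cpu, "memory": memory},
                    "limits": {"cpu": cpu, "memory": memory},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(pod_for_task("Celery_Example", "Task_1", "manual__2024-01-01"))
```

Because every task gets its own resource requests and limits, one memory-hungry task cannot starve its neighbours, which is the "full isolation" advantage.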


Performance Comparison

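Here’s how the three executors compare:

| Feature | Local Executor | Celery Executor | Kubernetes Executor |
|---|---|---|---|
| Scalability | Low (single machine) | High (add worker machines) | Very high (one pod per task, bounded by cluster capacity) |
| Setup complexity | Low | Medium (broker plus workers) | High (Kubernetes cluster required) |
| Fault tolerance | Low (if the host fails, every running task fails) | Good (losing one worker affects only its tasks) | Excellent (a failed pod affects only its own task) |
| Resource isolation | None (tasks share the host) | Limited (tasks share each worker) | Full (per-pod resources) |
| Kubernetes-native | No | No (though workers can run as pods) | Yes (one pod per task) |

Task retries are configured per task and work with all three executors; what differs is how many tasks a single failure can take down.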
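The Scalability row can be made concrete with some back-of-the-envelope arithmetic. This sketch assumes a single scheduler and ignores pools, per-DAG limits and queue routing; the defaults match Airflow 2.x's documented defaults for `[core] parallelism` (32) and `[celery] worker_concurrency` (16), and the function itself is hypothetical.

```python
"""Rough ceiling on concurrently running tasks for a single-scheduler setup.

Simplified model (assumption): ignores pools, max_active_tasks and queues.
Defaults mirror Airflow 2.x's [core] parallelism and [celery] worker_concurrency.
"""


def max_running_tasks(executor, parallelism=32, workers=1, worker_concurrency=16):
    if executor == "LocalExecutor":
        # One machine: the scheduler's parallelism setting is the ceiling.
        return parallelism
    if executor == "CeleryExecutor":
        # Each worker offers worker_concurrency slots, but the scheduler
        # still won't run more than `parallelism` tasks at once.
        return min(parallelism, workers * worker_concurrency)
    if executor == "KubernetesExecutor":
        # One pod per task: bounded by parallelism (and in practice by cluster capacity).
        return parallelism
    raise ValueError(f"unknown executor: {executor}")


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "Celery worker(s):", max_running_tasks("CeleryExecutor", workers=n))
```

With these defaults one Celery worker caps you at 16 running tasks, and four workers offer 64 slots but are still capped at 32 by `parallelism`, so scaling out usually means raising `parallelism` as well.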

When to Use Each Executor

  • Local Executor: Use for simple, single-machine setups or small-scale tasks. If you’re testing or running small DAGs, the Local Executor is the easiest choice.

  • Celery Executor: Best for medium to large-scale workflows where you need horizontal scaling. It’s ideal when running multiple tasks in parallel across different machines.

  • Kubernetes Executor: Perfect for cloud-native environments where Kubernetes is already in use. It offers dynamic scaling and excellent resource isolation.


Visual Example: Executors in Action

Scenario

Let’s assume you’re running a DAG with three tasks: Task A, Task B, and Task C. You want to execute these tasks in parallel and dynamically scale them.

| Executor | Task A | Task B | Task C |
|---|---|---|---|
| Local Executor | Separate process on the scheduler’s machine | Separate process on the scheduler’s machine | Separate process on the scheduler’s machine |
| Celery Executor | Whichever worker picks it up | Whichever worker picks it up | Whichever worker picks it up |
| Kubernetes Executor | Its own pod | Its own pod | Its own pod |

Practical Example: Celery Executor Setup

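The DAG file itself does not choose an executor, so the same code runs under Local, Celery, or Kubernetes. This version targets Airflow 2.4+ (for the `schedule` argument):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# The executor is NOT set here: DAG() has no `executor` argument.
# It is configured deployment-wide in airflow.cfg ([core] executor)
# or via the AIRFLOW__CORE__EXECUTOR environment variable.


def my_task():
    print("Task is running...")


with DAG(
    dag_id="celery_executor_example",
    default_args={"owner": "airflow"},
    schedule=None,                    # run only when triggered manually
    start_date=datetime(2024, 1, 1),  # fixed date instead of the deprecated days_ago()
    catchup=False,                    # don't backfill past runs
) as dag:
    task1 = PythonOperator(
        task_id="task1",
        python_callable=my_task,
    )
```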

Input & Output Example

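Input: Configure Airflow to use the Celery Executor in airflow.cfg. This assumes the Celery extra is installed (pip install 'apache-airflow[celery]'), a Redis broker is running on localhost, and the metadata database is PostgreSQL or MySQL rather than SQLite:

```ini
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
```

Then start a worker on each worker machine with airflow celery worker.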
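The settings above can also come from environment variables named `AIRFLOW__{SECTION}__{KEY}`, which Airflow gives precedence over airflow.cfg. The helper below is a simplified, hypothetical stand-in for that lookup order (it ignores the `_cmd`/`_secret` variants and Airflow's built-in defaults), handy for checking which executor a given environment would end up with.

```python
"""Simplified lookup: AIRFLOW__{SECTION}__{KEY} env var first, then airflow.cfg.

Hypothetical helper; Airflow's real config loader handles more sources
(_cmd/_secret options, built-in defaults) than this sketch does.
"""
import configparser
import os

CFG_TEXT = """
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
"""


def resolve(section, key, cfg_text=CFG_TEXT, env=None):
    env = os.environ if env is None else env
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in env:  # environment variables win over the file
        return env[env_name]
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get(section, key, fallback=None)


if __name__ == "__main__":
    print(resolve("core", "executor", env={}))
    print(resolve("core", "executor", env={"AIRFLOW__CORE__EXECUTOR": "LocalExecutor"}))
```

This is why a container image can ship one airflow.cfg and still switch executors per environment by setting `AIRFLOW__CORE__EXECUTOR`.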

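Output: The scheduler pushes queued tasks to the broker and any available worker picks them up, so independent tasks can run in parallel on different machines. The log lines below are illustrative; exact messages and file references vary by Airflow version:

```text
[2024-01-01 12:00:00,000] {scheduler.py:1729} INFO - Starting scheduler...
[2024-01-01 12:01:00,000] {celery_executor.py:134} INFO - Celery worker executing task1
```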

Executor Best Practices

Choosing the Right Executor

  • For small-scale environments: Stick with LocalExecutor if your workflow doesn’t need to scale.
  • For medium to large environments: Use CeleryExecutor if you need horizontal scalability but don’t want the complexity of Kubernetes.
  • For cloud-native environments: If you're already using Kubernetes, the KubernetesExecutor is the best option for dynamic task execution.
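The three rules of thumb above fit in a tiny function. It encodes this article's heuristics under the stated assumptions (two yes/no questions about your infrastructure), not an official Airflow rule, and the function name is hypothetical.

```python
"""The article's rule of thumb as code (heuristic, not an official Airflow rule)."""


def choose_executor(needs_horizontal_scaling, already_on_kubernetes):
    # Already running Kubernetes: let each task get its own pod.
    if already_on_kubernetes:
        return "KubernetesExecutor"
    # Need more than one machine but no Kubernetes: broker plus workers.
    if needs_horizontal_scaling:
        return "CeleryExecutor"
    # Everything fits on one machine.
    return "LocalExecutor"


if __name__ == "__main__":
    print(choose_executor(needs_horizontal_scaling=True, already_on_kubernetes=False))
```

The Kubernetes check comes first because, per the guidance above, existing Kubernetes infrastructure tips the balance regardless of scale.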

Summary 🧠

  • Executors control where and how tasks are run in Airflow.
  • LocalExecutor is best for simple, small-scale tasks.
  • CeleryExecutor allows for horizontal scaling across multiple machines.
  • KubernetesExecutor is ideal for cloud-native environments using Kubernetes.
  • Choosing the right executor depends on your scaling needs and infrastructure.

Key Takeaways

  • Executors define how tasks are distributed and executed.
  • CeleryExecutor and KubernetesExecutor offer more scalability than LocalExecutor.
  • The best executor depends on your workflow scale and infrastructure.

What’s Next?

➡️ Scaling Workers & Horizontal Autoscaling
Learn how to scale your Airflow setup horizontally and make use of autoscaling to handle larger workloads.