Task Dependencies in Apache Airflow

You already know how to create tasks using operators.

But tasks alone are like actors without a script:
they need to know when to act and when to wait.

That’s where task dependencies come in.


Why Task Dependencies Matter​

In real data pipelines:

  • You extract data before you transform
  • You validate before you load
  • You notify only after success

Airflow enforces this logic through dependencies.

📌 Rule:

By default, a task will not run until all of its upstream tasks have succeeded.


Real-World Story: Dependency as Trust​

Imagine a relay race 🏃‍♂️:

  • Runner 2 cannot start until Runner 1 hands over the baton
  • If Runner 1 fails, the race stops

This is exactly how Airflow treats dependencies.


Ways to Define Dependencies in Airflow​

Airflow provides three main methods:

  1. set_upstream()
  2. set_downstream()
  3. Bitshift operators (>> and <<)

All three create the same dependency; the only difference is readability.
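To see why the styles are interchangeable, here is a minimal pure-Python sketch. The `Task` class and the `edge_via` helper are hypothetical toy code, not Airflow's operator classes; the sketch only assumes that every style ends up recording the same upstream/downstream edge.

```python
class Task:
    """Toy stand-in for an Airflow task (hypothetical, for illustration only)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()    # task_ids that must finish first
        self.downstream = set()  # task_ids that wait for this task

    def set_downstream(self, other):
        # Record the edge self -> other on both ends.
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)

    def set_upstream(self, other):
        # Same edge, described from the waiting task's side.
        other.set_downstream(self)

    def __rshift__(self, other):
        # a >> b is shorthand for a.set_downstream(b).
        # Returning b is what makes a >> b >> c chain left to right.
        self.set_downstream(other)
        return other

    def __lshift__(self, other):
        # b << a is shorthand for b.set_upstream(a).
        self.set_upstream(other)
        return other


def edge_via(style):
    """Build extract -> transform with one style and return the recorded edge."""
    extract, transform = Task("extract"), Task("transform")
    if style == "set_downstream":
        extract.set_downstream(transform)
    elif style == "set_upstream":
        transform.set_upstream(extract)
    elif style == ">>":
        extract >> transform
    elif style == "<<":
        transform << extract
    return (sorted(extract.downstream), sorted(transform.upstream))


results = {s: edge_via(s) for s in ["set_downstream", "set_upstream", ">>", "<<"]}
for style, edge in results.items():
    print(style, edge)
```

Every style prints the same pair of lists, which is the point: the choice between them is about how the DAG file reads, not about what the scheduler sees.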


1️⃣ Using set_downstream()​

Syntax​

task_a.set_downstream(task_b)

Meaning​

task_b runs after task_a


Example​

extract.set_downstream(transform)

📌 extract → transform


2️⃣ Using set_upstream()​

Syntax​

task_b.set_upstream(task_a)

Meaning​

task_b waits for task_a


Example​

load.set_upstream(transform)

📌 transform → load


3️⃣ Using Bitshift Operators (>> and <<)​

This is the most concise style, and the one the Airflow documentation recommends in most cases.


Right Shift (>>)​

extract >> transform

Means:

extract runs before transform


Left Shift (<<)​

load << transform

Means:

transform runs before load


Full DAG Example: Linear Dependencies​

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from datetime import datetime

with DAG(
    dag_id="task_dependencies_demo",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older releases use schedule_interval
    catchup=False,
    tags=["dependencies", "airflow"],
) as dag:

    # Placeholder tasks that do no work; they only mark steps in the pipeline
    start = EmptyOperator(task_id="start")
    extract = EmptyOperator(task_id="extract")
    transform = EmptyOperator(task_id="transform")
    load = EmptyOperator(task_id="load")
    end = EmptyOperator(task_id="end")

    # Linear chain: each task waits for the task to its left
    start >> extract >> transform >> load >> end

Input & Output Flow​

Input​

  • DAG triggered manually or by schedule

Output​

  • Tasks execute in dependency order: start → extract → transform → load → end
  • Each task waits for its upstream task to succeed
  • The DAG run is marked successful once every task has succeeded

Fan-Out Dependencies (One to Many)​

extract >> [transform, validate]

📌 extract must succeed before either task starts; transform and validate can then run in parallel.


Fan-In Dependencies (Many to One)​

[transform, validate] >> load

📌 load waits until both tasks succeed.
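The list forms work because of how Python resolves `>>`. In `extract >> [transform, validate]` the task's own `__rshift__` receives the list. In `[transform, validate] >> load` a plain list has no `__rshift__`, so Python falls back to the reflected method `__rrshift__` on `load`. The sketch below mimics that mechanism with a hypothetical `Task` class; it is a toy model, not Airflow's actual code.

```python
class Task:
    """Toy task that accepts a single task or a list on either side of >>."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()  # task_ids this task waits for

    def __rshift__(self, other):
        # Fan-out: extract >> [transform, validate]
        targets = other if isinstance(other, list) else [other]
        for target in targets:
            target.upstream.add(self.task_id)
        return other

    def __rrshift__(self, other):
        # Fan-in: [transform, validate] >> load
        # Python calls this because list itself does not implement >>.
        sources = other if isinstance(other, list) else [other]
        for source in sources:
            self.upstream.add(source.task_id)
        return self


extract = Task("extract")
transform = Task("transform")
validate = Task("validate")
load = Task("load")

extract >> [transform, validate]  # one to many
[transform, validate] >> load     # many to one

print(sorted(transform.upstream))
print(sorted(validate.upstream))
print(sorted(load.upstream))
```

One consequence of this mechanism is that a list on both sides, such as `[a, b] >> [c, d]`, cannot work, because neither operand is a task. Airflow provides a `cross_downstream` helper for that case.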


Complex Dependency Example​

start >> extract
extract >> [transform, validate]
[transform, validate] >> load >> end

📊 This creates a diamond-shaped DAG (a fan-out followed by a fan-in), a pattern common in production.
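To see the order in which this diamond could run, the sketch below feeds the same edges to `graphlib.TopologicalSorter` from the Python standard library (3.9+). It only illustrates dependency ordering and is not how Airflow's scheduler is implemented; the `upstream` mapping is written by hand to mirror the three lines above.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it waits for (its upstream tasks).
upstream = {
    "start": set(),
    "extract": {"start"},
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
    "end": {"load"},
}

sorter = TopologicalSorter(upstream)
sorter.prepare()

waves = []  # each wave holds tasks whose upstream tasks have all finished
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    sorter.done(*ready)  # pretend every task in the wave succeeded

for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The third wave contains both transform and validate: nothing orders them relative to each other, so they are free to run at the same time.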


What Happens on Failure?​

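  • Downstream tasks do not run; by default Airflow marks them upstream_failed
  • Tasks that do not depend on the failed task can still run, and the DAG run is then marked failed
  • The failed task's logs show the root cause

📌 This prevents bad data from flowing downstream.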
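Failure propagation can be sketched in plain Python. The function below is a simplified model that assumes the default behaviour only (a task runs when every upstream task succeeded). The state names `success`, `failed`, and `upstream_failed` match Airflow's task states, but `simulate_run` and its arguments are hypothetical.

```python
from graphlib import TopologicalSorter


def simulate_run(upstream, failing):
    """Return a final state per task, assuming each task needs all upstream successes.

    upstream: dict mapping task_id -> set of upstream task_ids
    failing:  set of task_ids whose own execution fails
    """
    states = {}
    # static_order() yields each task only after all of its upstream tasks.
    for task in TopologicalSorter(upstream).static_order():
        if any(states[up] != "success" for up in upstream[task]):
            states[task] = "upstream_failed"  # never executed
        elif task in failing:
            states[task] = "failed"
        else:
            states[task] = "success"
    return states


upstream = {
    "start": set(),
    "extract": {"start"},
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
    "end": {"load"},
}

states = simulate_run(upstream, failing={"validate"})
for task, state in states.items():
    print(task, state)
```

In this run validate fails, so load and end never execute, while transform still succeeds because it does not depend on validate.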


Best Practices (Enterprise Grade)​

✅ Prefer >> and << for readability
✅ Keep dependency chains simple
✅ Use fan-in/fan-out thoughtfully
✅ Avoid overly complex DAG graphs
✅ Visualize DAGs in Graph View


Common Mistakes​

❌ Circular dependencies (Airflow rejects a DAG that contains a cycle)
❌ Mixing dependency styles in one DAG
❌ Overloading DAGs with logic
❌ Forgetting to define a dependency (unlinked tasks can run in any order)
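Why a cycle can never be scheduled is easy to demonstrate: no task in the loop can go first. The sketch below uses the standard library's `graphlib` with made-up task names. Airflow performs its own cycle check when it loads a DAG; this only illustrates the idea, and `find_cycle` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(upstream):
    """Return the tasks forming a cycle, or None if an execution order exists."""
    try:
        TopologicalSorter(upstream).prepare()
    except CycleError as error:
        # args[1] lists the nodes in the cycle; first and last are the same node.
        return error.args[1]
    return None


# extract -> transform -> load -> extract (load feeds back into extract)
cyclic = {
    "extract": {"load"},
    "transform": {"extract"},
    "load": {"transform"},
}
# The same pipeline without the feedback edge
acyclic = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

print(find_cycle(cyclic))
print(find_cycle(acyclic))
```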


Key Takeaways​

  • Task dependencies control execution order
  • set_upstream and set_downstream define relationships
  • Bitshift operators are cleaner and preferred
  • Proper dependencies keep bad data from flowing downstream
  • Clear DAG graphs improve maintainability

Summary​

In this chapter, you learned:

  • Why task dependencies exist
  • Three ways to define dependencies
  • Linear, fan-in, and fan-out patterns
  • Failure behavior in Airflow
  • Professional best practices

🎯 You now control the flow of execution in Airflow.


What’s Next?​

👉 Scheduling & Cron Expressions
Learn how Airflow decides when your DAG runs:

  • Presets (@daily, @hourly)
  • Cron expressions
  • Timezone handling