Sensors: Poking vs Rescheduling, ExternalTaskSensor
Imagine this scenario:
Your pipeline depends on another system: maybe a **file upload**, a **database update**, or another DAG finishing.
You don't want your DAG to fail or run prematurely.
You just want it to wait patiently and efficiently.
This is where Sensors come in.
What Are Sensors in Airflow?
Sensors are specialized operators that:
- Wait for a condition to be true
- Hold back downstream tasks until an event occurs
- Integrate with Airflow scheduling and retries
- Can be efficient in resource usage when configured correctly
Examples of common conditions:
- File exists in S3 or local filesystem
- Database row or table is available
- Another DAG has completed successfully
Poking vs Rescheduling Modes
Poking Mode
- Default behavior for most sensors
- Checks the condition every poke_interval seconds and sleeps in between
- Keeps the task instance running for the entire wait
- Occupies a worker slot the whole time, which gets expensive when the wait is long
from datetime import datetime

from airflow import DAG
from airflow.sensors.filesystem import FileSensor

with DAG(
    dag_id="poke_sensor_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
) as dag:
    wait_for_file = FileSensor(
        task_id="wait_for_file",
        filepath="/data/input/sales_{{ ds }}.csv",  # templated with the run's date
        poke_interval=60,  # check once a minute
        timeout=3600,  # max wait 1 hour
        mode="poke",  # keep the worker slot between checks
    )
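To make the poke-mode behavior concrete, here is a minimal plain-Python sketch of the loop it describes. It is not Airflow's implementation: `wait_in_poke_mode`, `SensorTimeout`, and the injectable `clock` and `sleep` arguments are hypothetical names introduced only for illustration, and the fake clock exists so the example finishes instantly.

```python
import time
from typing import Callable


class SensorTimeout(Exception):
    """Raised when the condition is not met before the timeout expires."""


def wait_in_poke_mode(
    condition: Callable[[], bool],
    poke_interval: float,
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Check `condition` every `poke_interval` seconds until it holds.

    Returns the number of checks performed. The caller stays blocked the
    whole time, which mirrors how a poke-mode sensor keeps its worker slot.
    """
    started = clock()
    checks = 0
    while True:
        checks += 1
        if condition():
            return checks
        if clock() - started >= timeout:
            raise SensorTimeout(f"condition not met after {timeout} seconds")
        sleep(poke_interval)


# Fake clock: each "sleep" just advances the simulated time.
now = [0.0]


def fake_sleep(seconds: float) -> None:
    now[0] += seconds


# The file "arrives" on the third check.
arrivals = iter([False, False, True])
checks = wait_in_poke_mode(
    lambda: next(arrivals), poke_interval=60, timeout=3600,
    clock=lambda: now[0], sleep=fake_sleep,
)
print(checks, now[0])  # 3 120.0
```

The function does not return until the condition holds or the timeout is reached, which is why a long wait in poke mode ties up a worker slot.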
Rescheduling Mode
- More efficient for long waits
- Releases the worker slot between checks
- Reduces resource consumption when many sensors wait at once
- Ideal for shared or cloud environments where worker slots are limited
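# Defined inside the same `with DAG(...)` block as the poke example.
wait_for_file_reschedule = FileSensor(
    task_id="wait_for_file_reschedule",
    filepath="/data/input/sales_{{ ds }}.csv",
    poke_interval=60,
    timeout=3600,
    mode="reschedule",  # free the worker slot between checks
)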
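A rough back-of-the-envelope comparison shows why the mode matters for long waits. The sketch below uses a deliberately simplified model, and everything in it is an assumption for illustration: the function name `worker_slot_seconds`, the 2-second cost per check, and the 2-hour wait. It also ignores the per-check startup overhead that rescheduling adds in a real deployment.

```python
import math


def worker_slot_seconds(
    wait_seconds: int, poke_interval: int, check_seconds: int, mode: str
) -> int:
    """Estimate how long a sensor occupies a worker slot (simplified model)."""
    # Checks run at t = 0, poke_interval, 2 * poke_interval, ...
    # The first check at or after `wait_seconds` succeeds.
    intervals_waited = math.ceil(wait_seconds / poke_interval)
    checks = intervals_waited + 1
    if mode == "poke":
        # The slot is held from the first check through the last one.
        return intervals_waited * poke_interval + check_seconds
    if mode == "reschedule":
        # The slot is held only while a check is actually running.
        return checks * check_seconds
    raise ValueError(f"unknown mode: {mode!r}")


# Condition becomes true after 2 hours; check every 5 minutes; 2 s per check.
poke_cost = worker_slot_seconds(7200, 300, 2, "poke")
reschedule_cost = worker_slot_seconds(7200, 300, 2, "reschedule")
print(poke_cost, reschedule_cost)  # 7202 50
```

Under these assumptions the poke-mode sensor holds a slot for about two hours, while the reschedule-mode sensor holds one for under a minute in total.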
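ExternalTaskSensor
Sometimes your DAG must wait for another DAG, or a specific task in it, to complete. By default, ExternalTaskSensor looks for the external run with the same logical date as the current run, so the two DAGs need aligned schedules unless you set execution_delta or execution_date_fn.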
from airflow.sensors.external_task import ExternalTaskSensor

wait_for_dag = ExternalTaskSensor(
    task_id="wait_for_daily_sales_dag",
    external_dag_id="daily_sales_pipeline",
    external_task_id="load_sales_table",
    allowed_states=["success"],  # states that let this sensor succeed
    failed_states=["failed", "skipped"],  # states that make this sensor fail
    poke_interval=300,  # check every 5 minutes
    timeout=7200,  # give up after 2 hours
)
Input
| Parameter | Value |
|---|---|
| external_dag_id | daily_sales_pipeline |
| external_task_id | load_sales_table |
| poke_interval | 300 sec |
Output
External DAG daily_sales_pipeline/load_sales_table completed successfully
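The sensor's decision on each check can be pictured with a small plain-Python model. This is a sketch of the idea, not Airflow's source: `target_logical_date` and `classify_external_state` are hypothetical helpers. The first assumes that a positive `execution_delta` is subtracted from the current logical date to find the external run. The second shows how `allowed_states` and `failed_states` split the outcomes.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def target_logical_date(
    logical_date: datetime, execution_delta: Optional[timedelta] = None
) -> datetime:
    """Logical date of the external run the sensor looks for."""
    if execution_delta is None:
        # Default: the external run with the same logical date.
        return logical_date
    return logical_date - execution_delta


def classify_external_state(
    state: Optional[str],
    allowed_states: Sequence[str],
    failed_states: Sequence[str],
) -> str:
    """Decide what the sensor does for one observed external state.

    `state` is None when no matching external run or task exists yet.
    """
    if state in failed_states:
        return "fail"
    if state in allowed_states:
        return "succeed"
    return "keep waiting"


allowed = ["success"]
failed = ["failed", "skipped"]
print(classify_external_state("running", allowed, failed))  # keep waiting
print(classify_external_state("skipped", allowed, failed))  # fail
print(classify_external_state("success", allowed, failed))  # succeed

# Upstream DAG is scheduled one hour earlier than this one.
print(target_logical_date(datetime(2024, 1, 2, 6, 0), timedelta(hours=1)))
# 2024-01-02 05:00:00
```

With `failed_states` left empty, an upstream failure falls into the "keep waiting" branch, and the sensor only stops when its own timeout expires.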
Sensor Best Practices
✅ Recommended
- Use reschedule mode for long waits
- Set timeout to avoid endless tasks
- Tune poke_interval to balance responsiveness and resource usage
- Combine with SLAs for monitoring
❌ Avoid
- Poking sensors with very short intervals on long waits
- Waiting for unavailable or unreliable resources
- Ignoring sensor failures or retries
- Using sensors for heavy computation
Common Mistakes
❌ Using poke mode for hours-long waits
❌ Not handling failure states in ExternalTaskSensor
❌ Overloading workers with too many sensors
❌ Forgetting to parameterize file paths or DAG IDs
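On the last point, the FileSensor examples above parameterize the path with `{{ ds }}`, which Airflow renders as the run's logical date in YYYY-MM-DD form. The sketch below imitates that substitution in plain Python so you can see what each run would wait for. `render_ds_path` is a hypothetical stand-in that handles only this one placeholder. It is not Airflow's Jinja rendering.

```python
from datetime import date


def render_ds_path(template: str, logical_date: date) -> str:
    """Replace `{{ ds }}` with the run's date formatted as YYYY-MM-DD."""
    return template.replace("{{ ds }}", logical_date.isoformat())


template = "/data/input/sales_{{ ds }}.csv"
for day in (date(2024, 1, 1), date(2024, 1, 2)):
    print(render_ds_path(template, day))
# /data/input/sales_2024-01-01.csv
# /data/input/sales_2024-01-02.csv
```

A hardcoded path would make every run wait for the same file, so backfills and reruns would pick up the wrong day's data.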
Real-World Use Cases
- Wait for ETL files to arrive before processing
- Trigger downstream DAG only after upstream DAG completes
- Poll external APIs or databases for data availability
- Synchronize cross-system workflows
Summary
Sensors are the gatekeepers of your DAGs.
Key Takeaways:
- Wait for conditions without manual intervention
- Choose poke for short waits, reschedule for long waits
- ExternalTaskSensor ensures DAG dependencies are respected
- Best practices reduce wasted resources and improve reliability
Properly implemented, sensors make your Airflow workflows robust, efficient, and event-driven.
What's Next?
Next in the series:
➡️ Hooks Explained: Database, S3, GCP, Azure