Scheduling & Cron Expressions in Apache Airflow
So far, you've created DAGs and set up task dependencies.
But when do these DAGs actually run?
Airflow is like an orchestra. Without timing, the music is chaotic.
This is where scheduling and cron expressions come into play.
Why Scheduling Matters
Scheduling allows your DAGs to:
- Run automatically at predefined times
- Avoid manual triggering
- Ensure consistent data delivery
Airflow's secret:
When a DAG runs is determined by two settings: start_date and schedule_interval.
Scheduling with schedule_interval
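schedule_interval is the heart of Airflow scheduling. (From Airflow 2.4 onward the same values can be passed through the newer schedule parameter, and schedule_interval is deprecated.) It accepts either:
- A preset string such as @daily
- A cron expression such as 0 6 * * *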
Preset Examples
| Preset | Meaning |
|---|---|
| @once | Run once only |
| @hourly | Every hour |
| @daily | Once per day (midnight) |
| @weekly | Once per week (Sunday midnight) |
| @monthly | First day of month |
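Each preset is shorthand for a cron expression. The lookup below is a small sketch of that relationship, using the equivalents listed in the Airflow documentation. The names PRESET_TO_CRON and to_cron are illustrative helpers, not part of Airflow's API.

```python
from typing import Optional

# Cron equivalents of the recurring presets in the table above.
# "@once" has no cron form because it is not a recurring schedule.
PRESET_TO_CRON = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
}


def to_cron(schedule: str) -> Optional[str]:
    """Return the cron form of a preset; pass other strings through unchanged."""
    if schedule == "@once":
        return None
    return PRESET_TO_CRON.get(schedule, schedule)


print(to_cron("@weekly"))
print(to_cron("15 3 * * *"))
```

Reading a preset as its cron form makes it easier to see exactly when a run fires, for example that @weekly means Sunday at 00:00.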
DAG Example: Daily Schedule
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

# Runs once per day; the two tasks are placeholders that do no work.
with DAG(
    dag_id="daily_schedule_dag",
    start_date=datetime(2024, 1, 1),  # fixed date, never datetime.now()
    schedule_interval="@daily",
    catchup=False,  # do not create runs for intervals already in the past
    tags=["scheduling"],
) as dag:
    start = EmptyOperator(task_id="start")
    end = EmptyOperator(task_id="end")

    start >> end
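The DAG is scheduled daily at midnight (UTC by default) from January 1, 2024. Airflow triggers each run once the day it covers has ended, so the run for January 1 starts at midnight on January 2.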
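Airflow triggers a run only after the period it covers has ended. The sketch below models this for a daily schedule with plain datetime arithmetic. The helper daily_runs is hypothetical and deliberately simplified; it is not Airflow's scheduler logic.

```python
from datetime import datetime, timedelta


def daily_runs(start_date: datetime, count: int) -> list[tuple[datetime, datetime]]:
    """Return (interval_start, trigger_time) pairs for the first `count` daily runs."""
    one_day = timedelta(days=1)
    runs = []
    for i in range(count):
        interval_start = start_date + i * one_day
        # The run for an interval is triggered when that interval ends.
        runs.append((interval_start, interval_start + one_day))
    return runs


for interval_start, trigger_time in daily_runs(datetime(2024, 1, 1), 3):
    print(f"covers {interval_start:%Y-%m-%d}, triggered at {trigger_time:%Y-%m-%d %H:%M}")
```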
Cron Expressions
Cron expressions provide full control over scheduling.
Cron Syntax
Cron expressions follow this format:
┌───────── minute (0 - 59)
│ ┌─────── hour (0 - 23)
│ │ ┌───── day of month (1 - 31)
│ │ │ ┌─── month (1 - 12)
│ │ │ │ ┌─ day of week (0 - 6)
│ │ │ │ │
* * * * *
Examples
| Cron | Meaning |
|---|---|
| 0 6 * * * | Every day at 6 AM |
| 30 2 * * 1 | Every Monday at 2:30 AM |
| 0 0 1 * * | First day of every month at midnight |
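To make the five fields concrete, here is a minimal standard-library sketch that checks whether a given datetime matches a cron expression. It is an illustration only, not the parser Airflow uses. The helpers expand_field and cron_matches are hypothetical, and they support only `*`, single numbers, comma lists, ranges, and `/step` on `*` or a range.

```python
from datetime import datetime

# Allowed (low, high) range for each of the five cron fields, in order:
# minute, hour, day of month, month, day of week (Sunday = 0).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field into the set of values it allows."""
    values: set[int] = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if `moment` falls on a minute selected by `expression`."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated cron fields")
    minute, hour, dom, month, dow = (
        expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)
    )
    if moment.minute not in minute or moment.hour not in hour or moment.month not in month:
        return False
    dom_ok = moment.day in dom
    dow_ok = (moment.isoweekday() % 7) in dow  # convert to cron's Sunday = 0
    # Classic cron rule: when both day fields are restricted, either may match.
    if fields[2] != "*" and fields[4] != "*":
        return dom_ok or dow_ok
    return dom_ok and dow_ok


# January 1, 2024 was a Monday.
print(cron_matches("30 2 * * 1", datetime(2024, 1, 1, 2, 30)))
print(cron_matches("0 6 * * *", datetime(2024, 1, 1, 7, 0)))
```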
DAG Example: Custom Cron
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="cron_schedule_dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 6 * * *",  # Every day at 6 AM
    catchup=False,
    tags=["scheduling", "cron"],
) as dag:
    start = EmptyOperator(task_id="start")
    end = EmptyOperator(task_id="end")

    start >> end
Start Date and Catchup
- start_date: The point from which Airflow begins scheduling; no run covers a period before it
- catchup: Whether Airflow should backfill the runs missed between start_date and the present
Example: Backfill Disabled
catchup=False
This prevents Airflow from creating runs for the intervals between start_date and now, which is usually preferred for beginners.
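The effect of catchup can be sketched with a simplified model: list every interval that has fully elapsed since start_date, then either keep them all (catchup=True) or keep only the latest one (catchup=False). The function runs_to_create is a hypothetical illustration that assumes the DAG is being enabled for the first time at `now`; the real scheduler tracks more state.

```python
from datetime import datetime, timedelta


def runs_to_create(
    start_date: datetime, now: datetime, interval: timedelta, catchup: bool
) -> list[datetime]:
    """List the interval starts for which runs would be created at `now`."""
    completed = []
    interval_start = start_date
    # An interval is complete once its end is no longer in the future.
    while interval_start + interval <= now:
        completed.append(interval_start)
        interval_start += interval
    if catchup:
        return completed  # every missed interval is backfilled
    return completed[-1:]  # only the most recent completed interval


start = datetime(2024, 1, 1)
now = datetime(2024, 1, 4, 12, 0)
one_day = timedelta(days=1)

print(runs_to_create(start, now, one_day, catchup=True))
print(runs_to_create(start, now, one_day, catchup=False))
```

With a start_date far in the past, catchup=True can create a large number of runs at once, which is why the flag is usually disabled while learning.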
Timezone Handling
Airflow 2.x supports timezone-aware DAGs:
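from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from pendulum import timezone

with DAG(
    dag_id="timezone_dag",
    # A timezone-aware start_date makes the whole DAG timezone-aware.
    start_date=datetime(2024, 1, 1, tzinfo=timezone("Asia/Kolkata")),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    start = EmptyOperator(task_id="start")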
The schedule is evaluated in the DAG's timezone (Asia/Kolkata here) rather than UTC, so @daily fires at midnight India time.
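The gap between local midnight and UTC midnight is easy to check with the standard library. This sketch uses a fixed UTC+05:30 offset as a stand-in for Asia/Kolkata, which is an assumption that holds because India observes no daylight saving time. A zone with DST would need a real timezone database.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+05:30 offset standing in for Asia/Kolkata (no DST in India).
IST = timezone(timedelta(hours=5, minutes=30), name="IST")

midnight_ist = datetime(2024, 1, 2, 0, 0, tzinfo=IST)
midnight_ist_in_utc = midnight_ist.astimezone(timezone.utc)

print(midnight_ist.isoformat())         # 2024-01-02T00:00:00+05:30
print(midnight_ist_in_utc.isoformat())  # 2024-01-01T18:30:00+00:00
```

A DAG scheduled for local midnight therefore appears at 18:30 on the previous day wherever times are displayed in UTC.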
Input & Output Flow
Input
- schedule_interval (preset or cron)
- start_date
- catchup flag
- Optional timezone
Output
- DAG automatically triggered at the scheduled times
- Tasks executed in order according to dependencies
- Logs available in UI
Best Practices (Professional)
- Use presets for common schedules
- Use cron for complex scheduling
- Always define start_date
- Set catchup=False unless historical runs are needed
- Leverage timezone-aware scheduling in production
Common Mistakes
- Using a dynamic datetime.now() as start_date (the start keeps moving, so the first interval never completes)
- Forgetting timezone differences
- Overlapping schedules causing race conditions
- Confusing schedule_interval with execution date
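The first mistake is worth a closer look. A run becomes due only after one full interval has passed since start_date, so a start_date that is recomputed as "now" on every parse of the DAG file never gets there. The sketch below is a simplified model of that rule; is_run_due is a hypothetical helper, not an Airflow function.

```python
from datetime import datetime, timedelta


def is_run_due(start_date: datetime, interval: timedelta, now: datetime) -> bool:
    """A first run is due once one full interval after start_date has passed."""
    return now >= start_date + interval


one_hour = timedelta(hours=1)
fixed_start = datetime(2024, 1, 1)

# Simulate the DAG file being parsed at three different moments.
for now in (datetime(2024, 1, 1, 0, 30), datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 2)):
    static_due = is_run_due(fixed_start, one_hour, now)
    # With start_date=datetime.now(), start_date equals `now` at every parse.
    dynamic_due = is_run_due(now, one_hour, now)
    print(f"{now:%Y-%m-%d %H:%M}  fixed start due: {static_due}  dynamic start due: {dynamic_due}")
```

The fixed start date becomes due as soon as the first hour has elapsed, while the dynamic one is never due at any parse time.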
Key Takeaways
- Airflow scheduling automates DAG execution
- schedule_interval controls run frequency
- Cron expressions give precise timing
- start_date + catchup manage backfills
- Timezone-aware DAGs prevent timing errors
Summary
In this chapter, you learned:
- How schedule_interval works
- Difference between presets and cron expressions
- Importance of start_date and catchup
- How to use timezone-aware DAGs
- Professional scheduling best practices
Your DAGs now run automatically, reliably, and on schedule.
What's Next?
Airflow Variables & Connections
Learn how to store configuration and credentials:
- Use Variables for parameters
- Use Connections for databases, APIs, and more