
Scheduling & Cron Expressions in Apache Airflow

So far, you’ve created DAGs and set up task dependencies. ✅

But when do these DAGs actually run?

Airflow is like an orchestra. Without timing, the music is chaotic.
This is where scheduling and cron expressions come into play.


Why Scheduling Matters​

Scheduling allows your DAGs to:

  • Run automatically at predefined times
  • Avoid manual triggering
  • Ensure consistent data delivery

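📌 Airflow's secret:

When a DAG runs depends on two settings: start_date (the point from which schedule intervals are counted) and schedule_interval (how often a new run is created).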
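To make the interplay of the two settings concrete, here is a minimal sketch in plain Python (no Airflow import). It assumes the Airflow 2.x behavior for interval-based schedules, where a run fires once the interval it covers has ended. The helper name first_trigger_time is hypothetical and not part of the Airflow API.

```python
from datetime import datetime, timedelta


def first_trigger_time(start_date: datetime, interval: timedelta) -> datetime:
    """Return when the first run would fire: at the END of the first interval."""
    return start_date + interval


start = datetime(2024, 1, 1)
daily = timedelta(days=1)

# The first daily interval covers Jan 1, so the run fires at midnight on Jan 2.
print(first_trigger_time(start, daily))  # 2024-01-02 00:00:00
```

This is why a DAG with a start_date of today and a daily schedule does not run immediately: its first interval has not finished yet.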


Scheduling with schedule_interval​

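schedule_interval is the heart of Airflow scheduling. (From Airflow 2.4 onward the same value is passed through the schedule argument, and schedule_interval is deprecated.)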

  • Can be a preset string
  • Or a cron expression

Preset Examples

Preset | Meaning
@once | Run once only
@hourly | Every hour, on the hour
@daily | Once per day (midnight)
@weekly | Once per week (Sunday midnight)
@monthly | First day of the month (midnight)
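Each time-based preset is shorthand for a cron expression. The sketch below lists the equivalents given in the Airflow documentation; the dictionary name PRESET_TO_CRON and the helper to_cron are hypothetical names used only for illustration. @once is left out because it has no cron equivalent.

```python
# Cron equivalents of the time-based presets.
PRESET_TO_CRON = {
    "@hourly": "0 * * * *",   # minute 0 of every hour
    "@daily": "0 0 * * *",    # midnight every day
    "@weekly": "0 0 * * 0",   # midnight on Sunday
    "@monthly": "0 0 1 * *",  # midnight on the first day of the month
}


def to_cron(schedule: str) -> str:
    """Translate a preset to its cron form; pass other strings through unchanged."""
    return PRESET_TO_CRON.get(schedule, schedule)


print(to_cron("@daily"))     # 0 0 * * *
print(to_cron("0 6 * * *"))  # 0 6 * * *
```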

DAG Example: Daily Schedule​

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="daily_schedule_dag",
    start_date=datetime(2024, 1, 1),  # fixed date, never datetime.now()
    schedule_interval="@daily",       # one run per day at midnight
    catchup=False,                    # skip runs for past intervals
    tags=["scheduling"],
) as dag:
    start = EmptyOperator(task_id="start")
    end = EmptyOperator(task_id="end")

    # "end" runs only after "start" succeeds
    start >> end

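📌 Runs once a day at midnight (UTC by default). In Airflow 2.x a run fires when its interval ends, so the run covering January 1, 2024 starts at midnight on January 2; with catchup=False, only the most recent interval is run when the DAG is first enabled.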


Cron Expressions​

Cron expressions provide full control over scheduling.

Cron Syntax​

Cron expressions follow this format:

┌──────── minute (0 - 59)
│ ┌────── hour (0 - 23)
│ │ ┌──── day of month (1 - 31)
│ │ │ ┌── month (1 - 12)
│ │ │ │ ┌ day of week (0 - 6, Sunday = 0)
│ │ │ │ │
* * * * *

Examples​

Cron | Meaning
0 6 * * * | Every day at 6:00 AM
30 2 * * 1 | Every Monday at 2:30 AM
0 0 1 * * | First day of every month at midnight
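To check how the five fields are read, here is a deliberately simplified matcher in plain Python. It is a sketch, not how Airflow parses cron: it only understands * and single integers (no ranges, lists, or steps), and it requires every field to match, which differs from standard cron when both day of month and day of week are restricted. The function name matches is hypothetical.

```python
from datetime import datetime


def matches(cron: str, moment: datetime) -> bool:
    """Check a five-field cron string against a datetime ('*' and integers only)."""
    fields = cron.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated fields")
    actual = [
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        (moment.weekday() + 1) % 7,  # Python: Monday=0; cron: Sunday=0
    ]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


# 2024-01-01 was a Monday, so 2:30 AM on that day matches "30 2 * * 1".
print(matches("30 2 * * 1", datetime(2024, 1, 1, 2, 30)))  # True
print(matches("0 6 * * *", datetime(2024, 1, 1, 7, 0)))    # False
```

Note that an expression with doubled asterisks, such as "0 6 ** ** **", is rejected by this sketch with a ValueError from int(), and it is not valid cron syntax either.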

DAG Example: Custom Cron​

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="cron_schedule_dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 6 * * *",  # every day at 6:00 AM
    catchup=False,
    tags=["scheduling", "cron"],
) as dag:
    start = EmptyOperator(task_id="start")
    end = EmptyOperator(task_id="end")

    start >> end

Start Date and Catchup​

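  • start_date: The start of the first schedule interval; the first run fires once that interval has ended
  • catchup: Whether Airflow should create a run for every interval between start_date and now that has not run yet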

Example: Backfill Disabled​

catchup=False

This prevents Airflow from creating DAG runs for past intervals, which is usually what beginners want.
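The effect of catchup can be sketched in plain Python by listing the intervals that have already ended. This is a simplified model under the assumption of a fixed-length interval; the helper name missed_interval_starts is hypothetical and not an Airflow function.

```python
from datetime import datetime, timedelta


def missed_interval_starts(start_date, now, interval):
    """List the start of every interval that has fully ended by `now`."""
    starts = []
    current = start_date
    while current + interval <= now:
        starts.append(current)
        current += interval
    return starts


runs = missed_interval_starts(
    datetime(2024, 1, 1), datetime(2024, 1, 5), timedelta(days=1)
)
print(len(runs))  # 4 (intervals starting Jan 1, 2, 3 and 4)
print(runs[-1])   # 2024-01-04 00:00:00
```

With catchup=True, Airflow would create a run for each of these intervals. With catchup=False, only the most recent one (the last element here) is run.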


Timezone Handling​

Airflow 2.x supports timezone-aware DAGs:

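from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from pendulum import timezone

with DAG(
    dag_id="timezone_dag",
    # A timezone-aware start_date makes the schedule timezone-aware
    start_date=datetime(2024, 1, 1, tzinfo=timezone("Asia/Kolkata")),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    start = EmptyOperator(task_id="start")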

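✅ The DAG is scheduled at midnight in its own timezone (Asia/Kolkata here) rather than midnight UTC. Airflow still stores timestamps in UTC internally.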
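To see what a local-midnight schedule means in UTC terms, here is a small standard-library sketch. It uses a fixed UTC+05:30 offset as a stand-in for Asia/Kolkata, which is an assumption that holds only because India Standard Time has no daylight saving; for zones with DST a real timezone database is needed.

```python
from datetime import datetime, timedelta, timezone

# IST is UTC+05:30 all year, so a fixed offset is enough for this illustration.
IST = timezone(timedelta(hours=5, minutes=30), "IST")

local_midnight = datetime(2024, 1, 2, 0, 0, tzinfo=IST)
as_utc = local_midnight.astimezone(timezone.utc)

# Midnight in Kolkata falls on the previous evening in UTC.
print(as_utc.isoformat())  # 2024-01-01T18:30:00+00:00
```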


Input & Output Flow​

Input​

  • schedule_interval (preset or cron)
  • start_date
  • catchup flag
  • Optional timezone

Output​

  • DAG automatically triggered at the scheduled times
  • Tasks executed in order according to dependencies
  • Logs available in UI

Best Practices (Professional)​

✅ Use presets for common schedules
✅ Use cron for complex scheduling
✅ Always define start_date
✅ Set catchup=False unless historical runs are needed
✅ Leverage timezone-aware scheduling in production


Common Mistakes​

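❌ Using a dynamic datetime.now() as start_date
❌ Forgetting that schedules are evaluated in UTC unless a timezone is configured
❌ Overlapping schedules causing race conditions
❌ Confusing schedule_interval with the execution date (the start of the interval a run covers, not the moment it fires)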


Key Takeaways​

  • Airflow scheduling automates DAG execution
  • schedule_interval controls run frequency
  • Cron expressions give precise timing
  • start_date + catchup manage backfills
  • Timezone-aware DAGs prevent timing errors

Summary​

In this chapter, you learned:

  • How schedule_interval works
  • Difference between presets and cron expressions
  • Importance of start_date and catchup
  • How to use timezone-aware DAGs
  • Professional scheduling best practices

🎯 Your DAGs now run automatically, reliably, and on schedule.


What’s Next?​

👉 Airflow Variables & Connections
Learn how to store configuration and credentials:

  • Use Variables for parameters
  • Use Connections for databases, APIs, and more