Creating Your First DAG in Apache Airflow
Imagine you are running a data factory. Every morning, raw data arrives, is cleaned and transformed, and is finally delivered to analytics teams. You don't manually trigger each step; Apache Airflow does it for you.
At the heart of Airflow lies one powerful concept:
The DAG (Directed Acyclic Graph)
In this chapter, you'll build your **first Airflow DAG**, understand how it works, and learn production-ready best practices used by real data engineering teams.
What is a DAG in Airflow?
A DAG is:
- A collection of tasks
- Organized in a specific order
- With no circular dependencies
- Executed on a schedule or event
Simple definition: A DAG is a blueprint that tells Airflow what to run, when to run it, and in what order.
Real-World Story: Why DAGs Exist
Think of making coffee:
- Grind beans
- Boil water
- Brew coffee
You can't brew before boiling water. You shouldn't grind beans again after brewing.
This logical flow is exactly how DAGs work.
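The ordering rule can be checked with a few lines of plain Python. The sketch below is not Airflow code: it uses the standard library's `graphlib` module (Python 3.9+) to model the coffee steps, and it assumes that grinding and boiling are independent of each other while brewing needs both. The task names and the `has_cycle` helper are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Map each task to the set of tasks that must finish before it can start.
# Assumption: grinding and boiling do not depend on each other.
coffee = {
    "grind_beans": set(),
    "boil_water": set(),
    "brew_coffee": {"grind_beans", "boil_water"},
}

# A valid execution order always places brew_coffee last.
order = list(TopologicalSorter(coffee).static_order())


def has_cycle(graph):
    """Return True if the dependency graph contains a circular dependency."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False


# Making grind_beans wait for brew_coffee creates a loop, so no valid order exists.
circular = dict(coffee, grind_beans={"brew_coffee"})
```

Here `has_cycle(coffee)` is false and `has_cycle(circular)` is true, which is the "acyclic" requirement in miniature: a graph with a loop has no order in which every task can run.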
Basic Structure of an Airflow DAG
Every DAG file is a Python script and usually contains:
- DAG definition
- default_args
- Schedule
- Tasks (operators)
- Dependencies
Creating Your First DAG (Step-by-Step)
Step 1: Import Required Modules
from airflow import DAG
from airflow.operators.empty import EmptyOperator
from datetime import datetime
Step 2: Define default_args
default_args are shared settings applied to all tasks in the DAG.
default_args = {
    "owner": "data_engineering_team",
    "retries": 1,
}
Common default_args Explained
| Argument | Meaning |
|---|---|
| owner | Who owns the DAG |
| retries | Number of retry attempts |
| retry_delay | Delay between retries |
| email_on_failure | Notify on failure |
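As a sketch, a `default_args` dictionary that uses all four arguments from the table might look like the following. The five-minute delay and the disabled email flag are illustrative values, not recommendations from this chapter. The last assignment uses a plain dictionary merge to model the precedence rule that an argument set directly on a task wins over the shared default; it is not how Airflow implements that rule internally.

```python
from datetime import timedelta

# Shared settings; the values are illustrative assumptions.
default_args = {
    "owner": "data_engineering_team",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),  # wait five minutes between attempts
    "email_on_failure": False,            # do not send failure emails
}

# Toy model of precedence: a task-level value overrides the shared default,
# while every other setting is inherited unchanged.
task_level_args = {"retries": 3}
effective_args = {**default_args, **task_level_args}
```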
Step 3: Create the DAG Object
with DAG(
    dag_id="my_first_dag",
    description="My first Airflow DAG",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    tags=["beginner", "airflow"],
) as dag:
Key Parameters Explained
| Parameter | Description |
|---|---|
| dag_id | Unique DAG name |
| start_date | When scheduling starts |
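| schedule_interval | How often it runs (Airflow 2.4+ prefers the name schedule and deprecates schedule_interval) |
| catchup | Whether Airflow creates runs for past intervals it missed; False skips them |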
| tags | Helps UI organization |
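The effect of `catchup` is easier to see with concrete dates. The following is a simplified model in plain Python, not Airflow code: `completed_daily_intervals` is a hypothetical helper, and the deployment date of 11 January 2024 is an assumption chosen for the example.

```python
from datetime import datetime, timedelta


def completed_daily_intervals(start_date, now):
    """Return the start of every full daily interval between start_date and now."""
    intervals = []
    current = start_date
    while current + timedelta(days=1) <= now:
        intervals.append(current)
        current += timedelta(days=1)
    return intervals


start_date = datetime(2024, 1, 1)
now = datetime(2024, 1, 11)  # assumed moment at which the DAG is first enabled

all_intervals = completed_daily_intervals(start_date, now)

# catchup=True: one run for every past interval since start_date.
runs_with_catchup = all_intervals

# catchup=False: only the most recent completed interval gets a run.
runs_without_catchup = all_intervals[-1:]
```

Under these assumptions, enabling the DAG with `catchup=True` would queue ten runs at once, while `catchup=False` would queue one.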
Step 4: Add Tasks
    start = EmptyOperator(task_id="start")
    end = EmptyOperator(task_id="end")
EmptyOperator does no work of its own, so it is well suited for:
- Start markers
- End markers
- Logical grouping
Step 5: Define Task Dependencies
    start >> end
This means the start task must complete successfully before the end task runs.
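The `>>` syntax is ordinary Python operator overloading. The toy class below is a minimal sketch of the idea, not Airflow's implementation: `Task` and its `downstream` list are hypothetical names used only to show how `a >> b` can record a dependency and why such expressions can be chained.

```python
class Task:
    """Toy stand-in for an operator; records which tasks run after it."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # a >> b means "b runs after a".
        self.downstream.append(other)
        # Returning the right-hand task lets a >> b >> c chain left to right.
        return other


extract = Task("extract")
transform = Task("transform")
load = Task("load")

extract >> transform >> load

downstream_of_extract = [t.task_id for t in extract.downstream]
downstream_of_transform = [t.task_id for t in transform.downstream]
```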
Full Example DAG (Complete Code)
from airflow import DAG
from airflow.operators.empty import EmptyOperator
from datetime import datetime

# Settings shared by every task in the DAG
default_args = {
    "owner": "data_engineering_team",
    "retries": 1,
}

with DAG(
    dag_id="my_first_dag",
    description="My first Airflow DAG",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    tags=["beginner", "airflow"],
) as dag:
    # Placeholder tasks that mark the start and end of the workflow
    start = EmptyOperator(task_id="start")
    end = EmptyOperator(task_id="end")

    # start must finish before end runs
    start >> end
Example Input & Output
Input
- DAG scheduled daily at 12:00 AM (midnight UTC unless a timezone is configured)
- No external data required
Output
- Task start → SUCCESS
- Task end → SUCCESS
- DAG run marked green in Airflow UI
How This Appears in the Airflow UI
- DAG Name: my_first_dag
- Schedule: Daily
- Tasks: start → end
- Status: Running / Success / Failed
The Graph view makes dependencies instantly clear.
Best Practices (Production-Grade)
✅ Set catchup=False while you are learning, so Airflow does not create a run for every past interval
✅ Use meaningful dag_id names
✅ Add tags for discoverability
✅ Keep DAGs idempotent
✅ Avoid heavy logic inside DAG files
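Idempotency means that running the same task twice for the same date leaves the same result as running it once, which matters because retries and manual reruns are routine. The sketch below illustrates the idea with a plain dictionary standing in for a storage system; `load_partition`, `append_rows`, and the sample rows are hypothetical and not part of Airflow.

```python
def load_partition(store, run_date, rows):
    """Idempotent: replaces whatever was stored for run_date."""
    store[run_date] = list(rows)


def append_rows(store, run_date, rows):
    """Not idempotent: each rerun adds the same rows again."""
    store.setdefault(run_date, []).extend(rows)


rows = [{"id": 1}, {"id": 2}]
idempotent_store = {}
appending_store = {}

# Simulate a task that runs twice for the same date, for example after a retry.
for _ in range(2):
    load_partition(idempotent_store, "2024-01-01", rows)
    append_rows(appending_store, "2024-01-01", rows)

idempotent_count = len(idempotent_store["2024-01-01"])
appending_count = len(appending_store["2024-01-01"])
```

After two runs the overwrite version still holds two rows, while the append version holds four duplicates.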
Common Beginner Mistakes
❌ Forgetting start_date
❌ Using a dynamic start_date such as datetime.now()
❌ Creating circular dependencies
❌ Overloading DAG files with business logic
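To see why a dynamic `start_date` is a mistake, it helps to model the scheduling rule in plain Python. The sketch assumes the simplified rule that the first run becomes due once one full interval has passed after `start_date`; `first_run_due` is a hypothetical helper, not an Airflow function, and the check times are arbitrary.

```python
from datetime import datetime, timedelta

DAILY = timedelta(days=1)


def first_run_due(start_date, now, interval=DAILY):
    """Simplified model: the first run is due once one full interval has elapsed."""
    return now >= start_date + interval


fixed_start = datetime(2024, 1, 1)
check_times = [
    datetime(2024, 1, 1, 12),
    datetime(2024, 1, 2),
    datetime(2024, 1, 5),
]

# Fixed start_date: the first run becomes due as time passes.
fixed_results = [first_run_due(fixed_start, now) for now in check_times]

# Dynamic start_date: the file is re-parsed repeatedly, so start_date always
# equals the current time and the first interval never finishes.
dynamic_results = [first_run_due(now, now) for now in check_times]
```

With the fixed date the run is due from the second check onward; with the moving date it is never due, which matches the symptom of a DAG that appears in the UI but never runs.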
Key Takeaways
- DAGs are the backbone of Apache Airflow
- Every DAG is a Python file
- default_args reduce repetition
- Scheduling controls automation
- Dependencies define execution order
Summary
In this chapter, you learned:
- What a DAG is and why it exists
- How to create your first DAG
- What default_args are for
- Scheduling fundamentals
- Task dependencies
- Best practices used by professionals
You've officially created your first Airflow DAG!
What's Next?
Operators Basics: learn how tasks do real work using:
- PythonOperator
- BashOperator
- EmptyOperator