XComs & Data Passing Between Tasks in Apache Airflow
The Story Behind XComs π¦β
Imagine an assembly line.
One worker extracts raw materials.
Another processes them.
A third prepares the final product.
But thereβs a problem β how do they pass information to each other without shouting across the factory floor?
In Apache Airflow, that communication channel is called XCom β short for Cross-Communication.
XComs allow tasks to exchange small pieces of data, enabling workflows to be dynamic, intelligent, and interconnected rather than isolated steps.
What is an XCom?β
An XCom is a keyβvalue pair stored in Airflowβs metadata database that allows one task to send data to another task.
Key Characteristicsβ
β
Task-to-task communication
β
Stored in Airflow metadata DB
β
Lightweight (small data only)
β Not designed for large datasets
Rule of Thumb:
XComs pass messages, not data lakes.
Why XComs Existβ
Without XComs:
- Tasks are isolated
- No dynamic behavior
- No decision-making based on upstream results
With XComs:
- Downstream tasks can react to upstream results
- Branching and conditional logic becomes possible
- Pipelines become intelligent, not static
Basic XCom Workflowβ
- Task A generates a value
- Task A pushes the value to XCom
- Task B pulls the value from XCom
- Task B uses the value for processing
XCom Push & Pull β Classic Wayβ
Example Scenarioβ
- Task 1 fetches todayβs sales number
- Task 2 calculates a bonus based on that number
DAG Example (PythonOperator)β
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
def fetch_sales(****context):
sales = 1200
context['ti'].xcom_push(key='daily_sales', value=sales)
def calculate_bonus(****context):
sales = context['ti'].xcom_pull(
key='daily_sales',
task_ids='fetch_sales_task'
)
bonus = sales ** 0.10
print(f"Bonus Amount: {bonus}")
with DAG(
dag_id='xcom_basic_example',
start_date=datetime(2024, 1, 1),
schedule_interval=None,
catchup=False,
) as dag:
fetch_sales_task = PythonOperator(
task_id='fetch_sales_task',
python_callable=fetch_sales,
provide_context=True
)
calculate_bonus_task = PythonOperator(
task_id='calculate_bonus_task',
python_callable=calculate_bonus,
provide_context=True
)
fetch_sales_task >> calculate_bonus_task
Input & Output Exampleβ
Input (Generated in Task 1)
{
"daily_sales": 1200
}
Output (Used in Task 2)
Bonus Amount: 120.0
XComs with TaskFlow API (Modern & Recommended)β
Airflow 2+ introduced the TaskFlow API, which makes XComs almost invisible.
Why This Mattersβ
- Cleaner code
- No manual push/pull
- Pythonic experience
TaskFlow Exampleβ
from airflow.decorators import dag, task
from datetime import datetime
@dag(
dag_id="taskflow_xcom_example",
start_date=datetime(2024, 1, 1),
schedule_interval=None,
catchup=False,
)
def xcom_taskflow_dag():
@task
def fetch_sales():
return 1200
@task
def calculate_bonus(sales):
return sales ** 0.10
bonus = calculate_bonus(fetch_sales())
xcom_taskflow_dag()
Input & Outputβ
Input Returned by Task 1
1200
Output Consumed by Task 2
120.0
π‘ Behind the scenes, Airflow automatically stores and retrieves XComs.
Where Are XComs Stored?β
By default:
- Stored in Airflow metadata database
- Table: xcom
Each record contains:
- DAG ID
- Task ID
- Execution date
- Key
- Value (serialized)
XCom Size Limitations β οΈβ
XComs are not designed for large payloads.
Best Practicesβ
β
IDs
β
Status flags
β
Filenames
β
API responses (small)
β Pandas DataFrames
β Large JSON blobs
β Images or files
π For large data:
- Use S3 / GCS / HDFS
- Pass only the file path via XCom
Custom XCom Backends (Advanced)β
Airflow allows:
- Custom XCom storage
- External systems (Redis, S3)
Useful for:
- Large payloads
- High-throughput systems
This is an advanced topic and usually required only at scale.
Common XCom Mistakesβ
β Treating XCom as a data warehouse
β Forgetting task_ids when pulling
β Overusing XCom for large objects
β Tight coupling between tasks
When You SHOULD Use XComsβ
βοΈ Dynamic branching
βοΈ Passing IDs or flags
βοΈ API pagination tokens
βοΈ TaskFlow return values
When You SHOULD NOT Use XComsβ
β Moving datasets
β Sharing megabytes of data
β Replacing external storage
Summary π§ β
- XComs enable task-to-task communication
- They store small, meaningful data
- TaskFlow API makes XCom usage seamless
- Overuse or misuse can harm performance
- Best practice: pass references, not data
Key Takeawaysβ
- XCom = Cross-Communication
- Stored in Airflow metadata DB
- Lightweight by design
- Essential for dynamic workflows
- TaskFlow API is the modern standard
Whatβs Next?β
β‘οΈ Branching Workflows β BranchPythonOperator
Learn how XComs power conditional execution and dynamic paths in DAGs.