SQL Operators β PostgresOperator, MySqlOperator, SnowflakeOperator
Every data pipeline eventually reaches the same destination:
A database.
Whether itβs:
- Loading transformed data
- Running data quality checks
- Executing analytics queries
- Or triggering downstream systems
Airflow doesnβt store data β
it orchestrates how data moves and transforms.
That orchestration happens through SQL Operators.
What Are SQL Operators in Airflow?β
SQL Operators allow Airflow to execute SQL statements against databases in a controlled, observable, and retryable way.
They:
- Use database hooks under the hood
- Support templating and dynamic SQL
- Integrate with Airflow Connections
- Fail or succeed based on query execution
When Should You Use SQL Operators?β
Best Use Casesβ
- Data loading (INSERT, COPY, MERGE)
- Data transformations inside the database
- Data quality checks
- Schema management
- Stored procedure execution
When Not to Use Themβ
- Heavy business logic (use Python or Spark)
- Row-by-row processing
- Long-running analytical jobs better suited for warehouses
Common SQL Operators in Airflowβ
| Operator | Database |
|---|---|
| PostgresOperator | PostgreSQL |
| MySqlOperator | MySQL |
| SnowflakeOperator | Snowflake |
All of them follow the same core pattern.
PostgresOperator Deep Diveβ
Letβs start with PostgreSQL.
Basic Exampleβ
from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator
from datetime import datetime
with DAG(
dag_id="postgres_operator_example",
start_date=datetime(2024, 1, 1),
schedule_interval="@daily",
catchup=False,
) as dag:
create_table = PostgresOperator(
task_id="create_sales_table",
postgres_conn_id="postgres_default",
sql="""
CREATE TABLE IF NOT EXISTS sales (
order_id INT,
amount NUMERIC,
created_at DATE
);
""",
)
Inputβ
| Parameter | Value |
|---|---|
| postgres_conn_id | postgres_default |
| sql | CREATE TABLE |
Outputβ
Table sales created successfully
Using Templated SQL Filesβ
Instead of embedding SQL inline, use .sql files.
PostgresOperator(
task_id="load_sales_data",
postgres_conn_id="postgres_default",
sql="sql/load_sales_{{ ds }}.sql",
)
Inputβ
| Variable | Value |
|---|---|
| ds | 2024-01-10 |
Outputβ
Data loaded for 2024-01-10
This improves:
- Readability
- Version control
- Reusability
MySqlOperator Exampleβ
The pattern stays the same.
from airflow.providers.mysql.operators.mysql import MySqlOperator
MySqlOperator(
task_id="mysql_cleanup",
mysql_conn_id="mysql_reporting",
sql="""
DELETE FROM sessions
WHERE session_date < DATE_SUB(CURDATE(), INTERVAL 90 DAY);
""",
)
Inputβ
| Parameter | Value |
|---|---|
| mysql_conn_id | mysql_reporting |
| retention | 90 days |
Outputβ
Old records deleted successfully
SnowflakeOperator Deep Diveβ
Snowflake is different β but Airflow makes it feel the same.
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator
SnowflakeOperator(
task_id="snowflake_transform",
snowflake_conn_id="snowflake_warehouse",
sql="""
MERGE INTO analytics.sales s
USING staging.sales_stg st
ON s.order_id = st.order_id
WHEN MATCHED THEN UPDATE SET amount = st.amount
WHEN NOT MATCHED THEN INSERT VALUES (
st.order_id, st.amount, CURRENT_DATE
);
""",
)
Inputβ
| Parameter | Value |
|---|---|
| snowflake_conn_id | snowflake_warehouse |
| sql | MERGE statement |
Outputβ
Merge completed successfully
XComs & SQL Operatorsβ
By default, SQL Operators:
- Do not push query results to XCom
However, some operators support returning row counts or results depending on configuration.
β οΈ Best Practice:
Use SQL Operators for execution β not for extracting large datasets.
Transactions & Autocommitβ
Most SQL operators support autocommit.
PostgresOperator(
task_id="autocommit_example",
postgres_conn_id="postgres_default",
sql="VACUUM;",
autocommit=True,
)
Use autocommit for:
- DDL statements
- Maintenance operations
SQL Operators vs PythonOperator + DB Hookβ
| Approach | Best For |
|---|---|
| SQL Operator | Simple, clean SQL execution |
| Python + Hook | Dynamic logic, looping, conditionals |
π Rule:
If itβs pure SQL β use a SQL Operator.
Security & Credentialsβ
Best Practicesβ
- Store credentials in Airflow Connections
- Use least-privilege database roles
- Avoid hardcoded schema names when possible
- Parameterize SQL using templates
Common Mistakesβ
β Embedding massive SQL inline
β Hardcoding credentials
β Using SQL Operators for data extraction
β Mixing orchestration logic inside SQL
Real-World Use Casesβ
- Data warehouse transformations
- Daily aggregation jobs
- Data quality checks
- Slowly changing dimension updates
- Schema migrations
Summaryβ
SQL Operators are the workhorses of data orchestration in Airflow.
Key Takeaways:
- Simple, reliable SQL execution
- Consistent patterns across databases
- Deep integration with Airflow Connections
- Best for in-database transformations
Used correctly, SQL Operators keep your pipelines efficient, readable, and scalable.
Whatβs Next?β
Next in the series:
β‘οΈ File Operators β S3, GCS, LocalFilesystem