Snowflake with Python, PySpark, Databricks – Enterprise Integration
Story Time – "We Need All Our Tools Talking to Each Other"
Ritika, a lead data engineer at a fast-growing SaaS company, faces an integration challenge.
Her ecosystem is huge:
- Python notebooks for analysts
- PySpark pipelines on Databricks
- Machine learning workflows
- Batch + streaming
- Snowflake as the central data warehouse
The CTO declares:
"Everything must flow into Snowflake and out of Snowflake, seamlessly."
Now Ritika must connect Python, PySpark, and Databricks in a clean, scalable architecture.
1. Snowflake + Python – Your Data Engineering Power Duo
Python integrates with Snowflake through:
- Snowflake Connector for Python
- Snowpark for Python
- Pandas + Snowflake Native Connectors
- Streamlit-in-Snowflake (SIS)
Ritika starts with the Python connector.
1.1 Snowflake Python Connector
Install:
pip install snowflake-connector-python
Connect:
import snowflake.connector
conn = snowflake.connector.connect(
    user='RITIKA',
    password='xxxxxxx',
    account='AB12345.ap-south-1',
    warehouse='ANALYTICS_WH',
    database='SALES_DB',
    schema='PUBLIC'
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM ORDERS")
print(cursor.fetchone())
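Because her analysts live in pandas, Ritika often pulls query results straight into a DataFrame. A minimal sketch, continuing from the cursor above and assuming the connector is installed with its pandas extra (pip install "snowflake-connector-python[pandas]"); the column names are illustrative:
# Continuing from the cursor above; ORDER_ID and REVENUE are assumed columns
cursor.execute("SELECT ORDER_ID, REVENUE FROM ORDERS LIMIT 100")
df_orders = cursor.fetch_pandas_all()  # returns a pandas DataFrame
print(df_orders.head())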
This powers:
- ad hoc scripts
- ETL micro-jobs
- Python automations
- Airflow & Prefect pipelines
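The same connection code slots straight into an orchestrator. A minimal Airflow sketch, assuming Airflow 2.4+ and the credentials shown earlier (in practice they would come from a secrets backend):
from datetime import datetime

import snowflake.connector
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_order_count():
    @task
    def count_orders() -> int:
        # Same connection parameters as above; use a secrets backend in production
        conn = snowflake.connector.connect(
            user="RITIKA", password="xxxxxxx", account="AB12345.ap-south-1",
            warehouse="ANALYTICS_WH", database="SALES_DB", schema="PUBLIC",
        )
        try:
            cur = conn.cursor()
            cur.execute("SELECT COUNT(*) FROM ORDERS")
            return cur.fetchone()[0]
        finally:
            conn.close()

    count_orders()

daily_order_count()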
1.2 Snowpark for Python – Server-Side Python
Ritika discovers Snowpark, allowing Python logic to run inside Snowflake compute.
Install:
pip install snowflake-snowpark-python
Example:
from snowflake.snowpark import Session

# Same credentials as the connector example in 1.1
connection_parameters = {
    "account": "AB12345.ap-south-1", "user": "RITIKA", "password": "xxxxxxx",
    "warehouse": "ANALYTICS_WH", "database": "SALES_DB", "schema": "PUBLIC",
}
session = Session.builder.configs(connection_parameters).create()
df = session.table("ORDERS")
df_filtered = df.filter(df["REVENUE"] > 1000)
df_filtered.show()
Benefits:
- Compute pushdown to Snowflake (see the sketch after this list)
- Distributed processing
- Zero data movement
- ML model execution inside Snowflake
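To see the pushdown in action, an aggregation written in Snowpark is compiled to SQL and executed inside Snowflake, so only the summarised rows come back to the client. A small sketch, assuming an illustrative REGION column exists on ORDERS:
from snowflake.snowpark.functions import col, sum as sum_

# Runs as SQL inside Snowflake; only the aggregated rows are returned
revenue_by_region = (
    session.table("ORDERS")
    .group_by(col("REGION"))  # REGION is an assumed column
    .agg(sum_(col("REVENUE")).alias("TOTAL_REVENUE"))
)
revenue_by_region.show()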
2. Snowflake + PySpark Integration
Snowflake integrates with PySpark via the Spark Snowflake Connector.
Perfect for:
- Large-scale Spark transformations
- Ingest from Delta Lake
- ETL pipelines running on Databricks or EMR
- Converting Spark DataFrames → Snowflake tables
2.1 Spark Snowflake Connector Setup
Add dependencies:
--packages net.snowflake:snowflake-jdbc:3.13.28,net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.3
Connection options:
sfOptions = {
    "sfURL": "AB12345.snowflakecomputing.com",
    "sfAccount": "AB12345",
    "sfUser": "RITIKA",
    "sfPassword": "xxxx",
    "sfDatabase": "SALES_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "SPARK_WH"
}
Write Spark DataFrame → Snowflake
df.write \
    .format("snowflake") \
    .options(**sfOptions) \
    .option("dbtable", "ORDERS_CLEAN") \
    .save()
Read from Snowflake → Spark
df_snow = spark.read \
    .format("snowflake") \
    .options(**sfOptions) \
    .option("query", "SELECT * FROM SALES_DB.PUBLIC.ORDERS") \
    .load()
3. Snowflake + Databricks – A Modern Lakehouse Integration
Databricks teams often use:
- Spark for heavy transformations
- MLflow for experimentation
- Delta Lake for raw zone
- Snowflake for analytics, BI & governed modeling
Ritika builds a pipeline:
- Raw data → Delta Lake
- Transform in Databricks using PySpark
- Load curated data → Snowflake
- Analysts query Snowflake using BI tools
3.1 Databricks + Snowflake Connector Example
In a Databricks notebook:
options = {
    "sfUrl": "ab12345.ap-south-1.snowflakecomputing.com",
    "sfUser": dbutils.secrets.get("snowflake", "USER"),
    "sfPassword": dbutils.secrets.get("snowflake", "PASSWORD"),
    "sfDatabase": "REVENUE_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "DBRICKS_WH"
}
spark_df = spark.sql("SELECT * FROM unified_sales")
spark_df.write \
    .format("snowflake") \
    .options(**options) \
    .option("dbtable", "UNIFIED_SALES_SF") \
    .mode("overwrite") \
    .save()
Why Databricks integrates well with Snowflake:
- High-performance parallel load
- Supports Delta → Snowflake
- Easy credential management via Secrets
- Handles large ETL pipelines
4. Machine Learning Workflows
Ritika combines:
- Snowpark for Python (feature engineering inside Snowflake)
- Spark ML or Databricks MLflow
- Snowflake UDFs & UDTFs
- Model scoring inside Snowflake
Example: deploy a model-scoring function as a Snowpark UDF:
from snowflake.snowpark.functions import udf

@udf
def score_model(amount: float) -> float:
    return amount * 0.98  # simplified example
Apply it to a Snowflake table:
session.table("ORDERS").select(score_model("REVENUE")).show()
This removes the need to export large datasets just to score them.
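The UDF above is session-scoped. If the scoring function should also be callable from SQL or BI tools, it can be registered as a permanent function; a hedged sketch, assuming a stage named ML_STAGE exists to hold the packaged code:
from snowflake.snowpark.functions import udf

# Permanent registration (stage name is illustrative) makes it callable from SQL too
@udf(name="SCORE_MODEL", is_permanent=True, stage_location="@ML_STAGE", replace=True)
def score_model_permanent(amount: float) -> float:
    return amount * 0.98

session.sql("SELECT SCORE_MODEL(REVENUE) FROM ORDERS LIMIT 5").show()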
5. Architecture Patterns
Pattern 1 – Databricks as Transformation Layer, Snowflake as Analytics
- Spark cleans & enriches
- Snowflake stores final models & tables
Pattern 2 – Snowpark-First Architecture
- All transformations in Snowflake
- Only ML training outside
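In this pattern the curated output never leaves Snowflake: a Snowpark DataFrame can be materialised directly with save_as_table. A minimal sketch, with an illustrative target table name:
# Filter and persist entirely inside Snowflake; no data leaves the warehouse
orders = session.table("ORDERS")
high_value = orders.filter(orders["REVENUE"] > 1000)
high_value.write.save_as_table("ORDERS_HIGH_VALUE", mode="overwrite")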
Pattern 3 – Hybrid Lakehouse
- Delta for raw + bronze
- Snowflake for gold semantic layers
6. Best Practices
- Use Snowpark where possible to avoid data movement
- Use Spark Connector for large-scale batch loads
- Do not oversize Snowflake warehouses for Spark loads
- Use COPY INTO for bulk micro-batch ingestion (see the sketch after this list)
- Use Secrets Manager on Databricks for credentials
- Monitor connector jobs through Query History
- Keep transformations close to the compute engine (Spark or Snowflake)
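For the COPY INTO practice above, the statement can be issued from the same Python connector used in section 1.1. A hedged sketch; the stage name and file format are illustrative:
# Assumes a stage named ORDERS_STAGE already points at the landed files
cursor.execute("""
    COPY INTO SALES_DB.PUBLIC.ORDERS
    FROM @ORDERS_STAGE/orders/
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")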
Real-World Ending – "Everything Works Together Now"
With her new integration setup:
- Python automations sync instantly with Snowflake
- Spark pipelines load cleaned data at scale
- Databricks notebooks talk to Snowflake seamlessly
- ML workloads run inside Snowflake using Snowpark
- No messy data exports or CSV dumps
Her CTO smiles:
"This is a true modern data platform. Excellent work."
Summary
Snowflake integrates deeply with:
- Python & Snowpark
- PySpark
- Databricks
- ML & Feature Engineering
- Modern Lakehouse Workflows
Together they create a scalable, flexible, and enterprise-grade data ecosystem.