Alerting: Email & Slack Alerts for Job Failures
💬 Story Time: "The Job Failed… and Nobody Knew"
Rahul, a data engineer at an e-commerce company, receives a frantic message at 10 AM:
"Why is today's dashboard blank?"
It turns out:
- ETL pipeline failed at 2:00 AM
- No one received an alert
- No monitoring was set up
- Dashboards showed stale data
Rahul thinks:
"A pipeline without alerts is like a plane without sensors."
He opens Databricks Workflows to configure Email + Slack alerts for every step.
🔥 1. Why Alerts Matter in Production
Alerts help teams react immediately to failures like:
- Cluster issues
- Node failures
- Schema mismatches
- API rate limits
- File unavailability
- Data validation failures
- Logic bugs in notebooks/scripts
Without alerts, teams lose hours, or worse, publish incorrect data.
📧 2. Email Alerts in Databricks
Email alerts are the simplest and fastest way to get notified.
How to Add Email Alerts
- Go to your Job / Workflow
- Click Alerts
- Add:
- Your email
- Team distribution email
- On-call group email
Choose alert type:
- On Failure
- On Success
- On Start
- On Duration Over Threshold
Example: Alert Configuration
On Failure → [analytics-team@company.com](mailto:analytics-team@company.com)
On Duration Exceeded → [dataops@company.com](mailto:dataops@company.com)
Databricks automatically sends:
- Error message
- Failed task name
- Logs link
- Run details
- Cluster info
Perfect for morning triage.
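If you manage many jobs, the same configuration can also be set programmatically. The sketch below builds the `email_notifications` payload for the Jobs API 2.1 `jobs/update` endpoint; the job ID and addresses are placeholders, and you would POST the result to your workspace URL with a bearer token:

```python
import json

# Hypothetical job ID -- replace with the ID of your own job.
JOB_ID = 123456

def build_email_alert_settings(on_failure, on_duration_warning=None):
    """Build a `jobs/update` payload with email notifications.

    Field names follow the Jobs API 2.1 `email_notifications` block;
    duration alerts additionally require a `health` rule on
    RUN_DURATION_SECONDS in the job settings.
    """
    settings = {
        "job_id": JOB_ID,
        "new_settings": {
            "email_notifications": {
                "on_failure": on_failure,
            }
        },
    }
    if on_duration_warning:
        settings["new_settings"]["email_notifications"][
            "on_duration_warning_threshold_exceeded"
        ] = on_duration_warning
    return settings

payload = build_email_alert_settings(
    on_failure=["analytics-team@company.com"],
    on_duration_warning=["dataops@company.com"],
)
print(json.dumps(payload, indent=2))
# POST this to https://<workspace-url>/api/2.1/jobs/update
# with an Authorization: Bearer <token> header.
```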
🚨 3. Slack Alerts: For Real-Time Team Visibility
Most modern teams prefer Slack notifications because:
- Everyone sees alerts
- Rapid response coordination
- On-call rotation visibility
- Faster triage
Step 1: Create a Slack Webhook URL
In Slack:
Apps → Incoming Webhooks → Create New Webhook
Select channel, e.g., #data-alerts.
Copy the webhook URL.
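Before wiring the URL into Databricks, it is worth smoke-testing it from any Python environment. Here is a minimal sketch using only the standard library (the webhook URL in the comment is a placeholder):

```python
import json
import urllib.request

def build_slack_payload(text: str) -> bytes:
    """Build the JSON body that Slack Incoming Webhooks expect."""
    return json.dumps({"text": text}).encode("utf-8")

def post_to_slack(webhook_url: str, text: str) -> int:
    """POST a message to the webhook; returns the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=build_slack_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Example (placeholder URL -- never commit real webhook URLs to source control):
# post_to_slack("https://hooks.slack.com/services/T000/B000/XXXX",
#               "Webhook test from Databricks setup")
```

If the call returns 200 and the message appears in `#data-alerts`, the webhook is ready to paste into the job configuration.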
Step 2: Add Slack Webhook to Databricks Workflows
In the Job configuration:
Alerts → Add → Webhook → Paste URL
Step 3: Customize Slack Message (Optional)
Databricks sends structured info like:
- Status
- Workflow name
- Link to job run
- Failed task
- Failure reason
But you can also design your own message via a Python task:
```python
import requests

# slack_webhook_url should come from a secret scope rather than plain text,
# e.g. dbutils.secrets.get("alerts", "slack-webhook").
# taskValues.get requires both taskKey and key; "etl_task" is a placeholder
# for the upstream task that set the value.
failed_task = dbutils.jobs.taskValues.get(taskKey="etl_task", key="task_name")
payload = {"text": f"🚨 Databricks Job Failed: {failed_task}"}
requests.post(slack_webhook_url, json=payload)
```
Now failures appear instantly in Slack.
⚙️ 4. Alerts for Multi-Task Workflows (Per Task)
Databricks allows:
- ✅ Alerts for the entire workflow
- ✅ Alerts per individual task
This is extremely helpful when:
- The validation task fails
- But upstream ingestion tasks run fine
- Only the downstream team needs notification
Example:
validate_data → On Failure → #quality-alerts
load_gold → On Failure → #data-engineering
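One way to implement this routing inside a shared notification task is a plain task-to-channel map; the task names, channels, and fallback below are illustrative assumptions:

```python
# Hypothetical mapping from failed task name to the Slack channel that owns it.
TASK_CHANNELS = {
    "validate_data": "#quality-alerts",
    "load_gold": "#data-engineering",
}
DEFAULT_CHANNEL = "#data-alerts"

def channel_for_task(task_name: str) -> str:
    """Pick the Slack channel responsible for a failed task,
    falling back to a shared alerts channel."""
    return TASK_CHANNELS.get(task_name, DEFAULT_CHANNEL)
```

The returned channel name would then be resolved to its webhook URL (ideally stored in a secret scope) before posting the alert.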
🛠️ 5. On-Failure Trigger Tasks (Advanced Alerts)
You can create error handling tasks inside workflows.
Example:
validate → load_gold
   ↓ (on failure)
notify_failure
The notify_failure task runs only when an upstream task fails. In the UI this is the task's "Run if dependencies" setting; in the Jobs API it is the task-level field:

```json
{
  "run_if": "AT_LEAST_ONE_FAILED"
}
```
Inside this task, a single webhook call is enough:

```python
import requests

requests.post(slack_url, json={"text": "Validation failed in Databricks!"})
```
This enables fully automated error routing.
🧪 6. Real Example: Notebook Alert on Error
In a notebook:
```python
try:
    df = spark.table("silver.sales")
    assert df.count() > 0, "silver.sales returned zero rows"
except Exception as e:
    print(f"ERROR: {e}")
    raise  # re-raise so the task is marked failed and alerts fire
```

Note that `dbutils.notebook.exit()` ends the run successfully, so it would not trip failure alerts; re-raising the exception marks the task as failed, and Databricks then triggers the configured failure alerts automatically.
📊 7. Alerts With Databricks SQL (Dashboards)
Databricks SQL supports real-time condition-based alerts:
- Revenue drop alerts
- Data drift detection
- SLA monitoring
- Missing data alerts
Example: alert when `COUNT(*) < 1000` in the `daily_sales` table.
Alerts can deliver notifications to:
- Slack webhooks
- PagerDuty
- Custom HTTP endpoints
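Under the hood, a SQL alert is essentially a scheduled query plus a threshold check. The sketch below mirrors the `COUNT(*) < 1000` condition in Python; feeding it from a real table (commented out) is an assumption about your schema:

```python
def should_alert(row_count: int, threshold: int = 1000) -> bool:
    """Fire when the daily row count drops below the threshold,
    mirroring the COUNT(*) < 1000 condition above."""
    return row_count < threshold

# In Databricks you would feed this from a query, e.g.:
# row_count = spark.table("daily_sales").count()
print(should_alert(850))   # below threshold -> alert
print(should_alert(1200))  # healthy -> no alert
```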
🧠 Best Practices
- Always configure on-failure alerts
- Use Slack as the primary channel, email as secondary
- Create separate channels per pipeline type
- Add file-based triggers + alerts for ingestion issues
- Include run URL in alert message
- Add retry logic + alerts only after retries fail
- Use service principals for webhook authentication
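The retry practice above combines naturally with alerting: retry first, and alert only once every attempt has failed. Workflows' built-in task retry setting handles this for you, but the idea can be sketched as a small wrapper (names and defaults are illustrative):

```python
import time

def run_with_retries(task, max_retries=3, delay_seconds=0, on_final_failure=None):
    """Run `task` (a zero-argument callable), retrying on failure.

    `on_final_failure` is called only after every attempt has failed,
    so transient blips never page anyone. Requires max_retries >= 1.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return task()
        except Exception as e:
            last_error = e
            if delay_seconds and attempt < max_retries:
                time.sleep(delay_seconds)
    if on_final_failure:
        on_final_failure(last_error)  # e.g. post to the Slack webhook
    raise last_error
```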
🌅 Real-World Ending: "The Alert Saved the Morning"
Next day, at exactly 2:01 AM:
- API returned empty data
- The validation task failed
- Slack alerted the team instantly
- Issue resolved before business hours
At 9:00 AM, dashboards were fresh.
Rahulβs manager said:
"Finally… the pipeline can talk to us when things go wrong."
And thatβs the magic of Databricks Alerts.
📌 Summary
Databricks supports:
- ✅ Email alerts
- ✅ Slack alerts
- ✅ Webhook-based alerts
- ✅ On-failure tasks
- ✅ SQL alerts
- ✅ Per-task notification targeting
A must-have component for production-grade pipeline monitoring.
👉 Next Topic
Cluster Policies: Cost & Security Enforcement