Bronze / Silver / Gold Layers — Lakehouse Medallion Model
🌍 Why the Medallion Model Exists (Short Story)
Imagine a bakery that receives raw ingredients from dozens of suppliers.
Some flour is high quality.
Some arrives in broken bags.
Some ingredients have missing labels.
Some are fresh… some are mysteriously old.
Would the bakery use all of this directly to bake bread?
Of course not.
They sort it.
They clean it.
They check quality.
They refine it into usable forms.
This process is exactly what the Medallion Architecture does for data.
Databricks groups data into three simple layers:
- Bronze → raw
- Silver → cleaned
- Gold → business-ready
This structure makes large data systems predictable, trustworthy, and scalable.
🥉 1. Bronze Layer — “Raw but Reliable”
The Bronze layer stores raw data exactly as it arrives.
This includes:
- Raw JSON, CSV, binary logs
- Streaming ingestion (Auto Loader)
- Duplicate or messy records
- Columns that don’t always match
- Events arriving out of order
🎯 Purpose of Bronze
- Keep the original source data (for auditing & replay)
- No business logic
- No cleanup
- No transformations
✔ Best Practices
- Use Delta Lake for reliability
- Auto-ingest using Auto Loader or streaming
- Partition only when necessary
📦 Example
/mnt/bronze/sales
/mnt/bronze/customers
Think of Bronze as the raw pantry of the data bakery.
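Here is a minimal sketch of "store it exactly as it arrives", written in plain Python rather than Spark so it runs anywhere. The helper name `ingest_to_bronze`, the metadata fields `_source_file` and `_ingested_at`, and the sample file name are assumptions for illustration, not Databricks APIs. On Databricks the same idea is typically implemented with Auto Loader writing to a Delta table. Adding ingestion metadata next to the untouched payload is a common convention, not something the text above prescribes.

```python
from datetime import datetime, timezone


def ingest_to_bronze(raw_lines, source_file, bronze_table):
    """Append raw lines to an in-memory 'Bronze table' without changing them.

    Hypothetical helper: bronze_table is a plain list standing in for a Delta table.
    """
    ingested_at = datetime.now(timezone.utc).isoformat()
    for line in raw_lines:
        bronze_table.append({
            "raw": line,                  # payload kept exactly as received
            "_source_file": source_file,  # where the record came from (auditing)
            "_ingested_at": ingested_at,  # when it landed (replay and debugging)
        })
    return bronze_table


bronze_sales = []
ingest_to_bronze(
    [
        '{"order_id": "1", "amount": "19.99"}',
        '{"order_id": "1", "amount": "19.99"}',  # duplicate is kept on purpose
        '{"order_id": "2", "amount": "oops"}',   # malformed value is kept too
    ],
    source_file="sales_2024_01_01.json",
    bronze_table=bronze_sales,
)
```

Note that the duplicate and the malformed record both land in Bronze unchanged. Dealing with them is Silver's job.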
🥈 2. Silver Layer — “Clean, Organized, and Usable”
The Silver layer is where the real work happens.
Here you:
- Clean data
- Deduplicate
- Parse nested fields
- Fix data types
- Standardize columns
- Join data across sources
- Apply initial business rules
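Several of the steps listed above (parsing nested fields, fixing data types, standardizing values, deduplicating) can be sketched in a few lines of plain Python. This is an illustration of the logic only. The function name `to_silver`, the record layout, and the choice of `order_id` as the deduplication key are all assumptions, and a real Silver job on Databricks would normally express the same steps with Spark DataFrames.

```python
import json
from datetime import datetime


def to_silver(bronze_rows):
    """Parse, type-cast, standardize, and deduplicate raw order records."""
    seen = set()
    silver, rejected = [], []
    for raw in bronze_rows:
        try:
            rec = json.loads(raw)
            row = {
                "order_id": int(rec["order_id"]),          # fix data types
                "amount": round(float(rec["amount"]), 2),
                # parse a nested field and standardize its value
                "country": rec["customer"]["country"].strip().upper(),
                "order_date": datetime.strptime(rec["date"], "%Y-%m-%d").date(),
            }
        except (KeyError, ValueError, TypeError, AttributeError):
            rejected.append(raw)  # set bad records aside instead of dropping them
            continue
        if row["order_id"] in seen:  # deduplicate on the assumed business key
            continue
        seen.add(row["order_id"])
        silver.append(row)
    return silver, rejected


bronze_rows = [
    '{"order_id": "1", "amount": "19.99", "customer": {"country": " us "}, "date": "2024-01-01"}',
    '{"order_id": "1", "amount": "19.99", "customer": {"country": " us "}, "date": "2024-01-01"}',
    '{"order_id": "2", "amount": "oops", "customer": {"country": "de"}, "date": "2024-01-01"}',
    '{"order_id": "3", "amount": "5", "customer": {"country": "de"}, "date": "2024-01-02"}',
]
silver_rows, rejected_rows = to_silver(bronze_rows)
```

Keeping rejected records in a separate list (a quarantine table, in practice) is one possible design choice. It makes data quality problems visible instead of hiding them.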
🎯 Purpose of Silver
Make data trustworthy and ready for broad analytical use.
This is usually the biggest and most complex layer.
✔ Best Practices
- Use MERGE to handle late-arriving or changed data
- Maintain CDC patterns here
- Enforce schema consistency
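To make the MERGE advice concrete, the sketch below mimics upsert semantics in plain Python: insert when the key is new, update when the incoming row is newer, and ignore stale late arrivals. It is not Delta Lake code. The function `merge_into`, the `updated_at` column, and the "newest timestamp wins" rule are assumptions chosen for illustration. On Databricks this logic would usually be written as a Delta Lake `MERGE INTO` statement.

```python
def merge_into(target, updates, key="order_id", ts="updated_at"):
    """Upsert `updates` into `target` (a dict keyed by `key`).

    Mirrors the branches of a MERGE statement:
    not matched -> insert; matched and newer -> update; matched and older -> skip.
    """
    for row in updates:
        existing = target.get(row[key])
        if existing is None:
            target[row[key]] = dict(row)   # not matched: insert
        elif row[ts] > existing[ts]:
            target[row[key]] = dict(row)   # matched and newer: update
        # matched but older: a late-arriving stale record, left untouched
    return target


silver_orders = {
    1: {"order_id": 1, "status": "shipped", "updated_at": "2024-01-03"},
}
merge_into(silver_orders, [
    {"order_id": 1, "status": "placed", "updated_at": "2024-01-01"},     # late and stale
    {"order_id": 2, "status": "placed", "updated_at": "2024-01-02"},     # new key
    {"order_id": 1, "status": "delivered", "updated_at": "2024-01-05"},  # newer change
])
```

The timestamps are ISO-formatted strings, so plain string comparison orders them correctly. With other formats they would need to be parsed first.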
📦 Example
/mnt/silver/sales_clean
/mnt/silver/customers_enriched
Silver tables are your clean ingredients — ready for recipes.
🥇 3. Gold Layer — “Business-Ready Insights”
The Gold layer is where data becomes value.
Here you build:
- BI dashboards (Power BI, Tableau)
- Aggregations (daily/monthly metrics)
- Feature tables for ML
- Domain-specific marts (Sales, Finance, Marketing)
🎯 Purpose of Gold
Deliver data in the exact form business users need.
✔ Best Practices
- Keep Gold tables stable and predictable
- Use incremental updates (MERGE or UPDATE)
- Document business logic clearly
📦 Example
/mnt/gold/sales_summary_daily
/mnt/gold/customer_lifetime_value
Gold tables represent the finished products: the baked bread, cakes, and pastries.
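As an illustration of a Gold-style aggregation, here is a small plain-Python sketch that rolls clean order rows up into a daily summary, in the spirit of the `sales_summary_daily` example above. The function name `daily_sales_summary`, the column names, and the sample rows are assumptions. A production version would typically be a Spark or SQL aggregation over a Silver Delta table.

```python
from collections import defaultdict


def daily_sales_summary(silver_rows):
    """Aggregate clean order rows into one row per day."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for row in silver_rows:
        day = totals[row["order_date"]]
        day["orders"] += 1
        day["revenue"] += row["amount"]
    return [
        {"order_date": d, "orders": t["orders"], "revenue": round(t["revenue"], 2)}
        for d, t in sorted(totals.items())
    ]


silver_sales = [
    {"order_id": 1, "order_date": "2024-01-01", "amount": 19.99},
    {"order_id": 2, "order_date": "2024-01-01", "amount": 5.01},
    {"order_id": 3, "order_date": "2024-01-02", "amount": 12.50},
]
gold_daily = daily_sales_summary(silver_sales)
```

The output has one row per day with a fixed set of columns, which is the kind of stable, predictable shape a dashboard can depend on.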
🔁 How Data Flows (Simple Diagram)
RAW DATA
↓
🥉 Bronze (unprocessed)
↓
🥈 Silver (clean + reliable)
↓
🥇 Gold (business-level insights)
This pipeline turns chaos into clarity.
🧠 Why the Medallion Model Works So Well
✔ 1. Clear separation of responsibility
Raw → Clean → Analytics.
✔ 2. Easier debugging
If something breaks in Gold, check Silver.
If Silver breaks, check Bronze.
✔ 3. Scales beautifully
You can grow each layer independently.
✔ 4. Supports both batch and streaming
Modern Lakehouse pipelines demand both.
✔ 5. Fits naturally with Delta Lake + Databricks
Delta Lake's versioning, schema enforcement, and time travel help keep each layer stable.
🧩 Quick Real-World Example
- Source: Ecommerce clickstream data
- Bronze: Raw events from website logs
- Silver: Cleaned sessions with user IDs and timestamps
- Gold: Daily product conversion metrics for marketing teams
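The clickstream example can be traced end to end with a toy dataset. Everything specific here is an assumption made for illustration: the event field names, the rule that Silver drops events without a user ID, and the definition of conversion as "distinct users who viewed and purchased a product on a day, divided by distinct users who viewed it that day". Real conversion definitions vary by team.

```python
from collections import defaultdict

# Bronze-style raw events (hypothetical field names)
raw_events = [
    {"user": "u1", "ts": "2024-01-01T10:00:00", "event": "view", "product": "p1"},
    {"user": "u1", "ts": "2024-01-01T10:05:00", "event": "purchase", "product": "p1"},
    {"user": "u2", "ts": "2024-01-01T11:00:00", "event": "view", "product": "p1"},
    {"user": None, "ts": "2024-01-01T11:30:00", "event": "view", "product": "p1"},
    {"user": "u3", "ts": "2024-01-02T09:00:00", "event": "view", "product": "p1"},
]

# Silver: keep only events with a user ID and derive the calendar day
silver_events = [{**e, "day": e["ts"][:10]} for e in raw_events if e["user"]]


def daily_conversion(events):
    """Gold: conversion rate per (day, product) under the assumed definition."""
    viewers, buyers = defaultdict(set), defaultdict(set)
    for e in events:
        key = (e["day"], e["product"])
        if e["event"] == "view":
            viewers[key].add(e["user"])
        elif e["event"] == "purchase":
            buyers[key].add(e["user"])
    return {
        key: len(buyers[key] & users) / len(users)
        for key, users in sorted(viewers.items())
    }


gold_conversion = daily_conversion(silver_events)
```

On this toy data, two identified users viewed `p1` on the first day and one of them purchased it. On the second day there was one viewer and no purchase.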
This structure is used at:
- Retail companies
- Banks
- Healthcare providers
- Startups
- Enterprises
The pattern is not tied to any one industry or company size.
📘 Summary
- The Medallion Architecture organizes data into Bronze → Silver → Gold layers.
- Bronze stores raw, unprocessed data exactly as received.
- Silver cleans, standardizes, and enriches the data for analytical use.
- Gold provides business-ready datasets, metrics, and curated domains.
- This model improves reliability, scalability, debugging, and team collaboration.
- It is the foundation of modern Databricks Lakehouse pipelines.
👉 Next Topic
Delta Live Tables (DLT Pipelines) — Hands-On Concepts