Tables in Databricks: Managed vs External
A Simple Story to Begin
Imagine Databricks has two types of "homes" where your tables can live.
Home Type 1: Databricks Takes Care of Everything
You create your table, and Databricks decides where to put the data files and how to organize them. It even cleans up after you when the table is dropped.
This is a Managed Table.
Home Type 2: You Bring Your Own Folder
You point Databricks to a location you control in cloud storage (S3, ADLS, GCS).
Databricks stores the table metadata, but the actual files live where you choose.
This is an External Table.
That's the entire concept in one simple picture.
What Is a Managed Table?
A Managed Table is one where:
- Databricks decides where the data files are stored
- Databricks manages both the data and the metadata
- Dropping the table deletes the data files automatically
- The storage path lives inside the managed storage location Databricks is configured to use (in Unity Catalog, this is set at the metastore, catalog, or schema level), not at a path you pick
Example
CREATE TABLE sales_bronze (
id INT,
amount DOUBLE
);
No LOCATION clause is given, so the table is automatically managed.
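Continuing the managed example, the sketch below adds a few sample rows, derives a second table with CREATE TABLE ... AS SELECT, and then drops it. The table name sales_silver and the sample values are hypothetical. Neither statement names a LOCATION, so both tables are managed, and the final DROP TABLE removes the derived table's data files along with its definition.

```sql
-- Hypothetical sample rows for the managed sales_bronze table.
INSERT INTO sales_bronze VALUES (1, 19.99), (2, 0.0), (3, 5.50);

-- CTAS without a LOCATION clause: Databricks picks the storage path,
-- so sales_silver (hypothetical name) is also a managed table.
CREATE TABLE sales_silver AS
SELECT id, amount
FROM sales_bronze
WHERE amount > 0;

-- Dropping a managed table removes its metadata and Databricks
-- also deletes the underlying data files.
DROP TABLE sales_silver;
```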
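Benefits of Managed Tables
- Easiest to use
- Automatic cleanup when the table is dropped
- Well suited to internal Lakehouse workflows
- Delta Lake is the default format, and Databricks can handle maintenance for you (for example, predictive optimization, which is available only for Unity Catalog managed tables)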
When Managed Tables Are NOT Ideal
- When multiple tools, systems, or teams need direct file-level access
- When you must keep tight control over the physical storage layout
- When the tables are governed by a catalog outside Databricks (e.g., the AWS Glue Data Catalog)
What Is an External Table?
An External Table stores:
- Metadata inside Databricks
- Data files outside Databricks (in a place you choose)
Example
CREATE TABLE logs_raw
USING delta
LOCATION 'abfss://raw@datalake.dfs.core.windows.net/logs/';
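You are telling Databricks:
"My files are stored here; just manage the table definition."
This short form assumes the location already contains Delta data. Databricks then reads the schema from the Delta transaction log, so no column list is needed.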
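If the target folder is still empty, there is no transaction log to read the schema from, so you declare the columns yourself. In the sketch below, the table name logs_clean, its columns, and the curated container path are placeholders. In Unity Catalog the path must also fall under an external location you are allowed to use.

```sql
-- Hypothetical external table at an empty, user-chosen path.
-- Declaring the columns lets Databricks create the Delta table there.
CREATE TABLE logs_clean (
  event_time TIMESTAMP,
  level      STRING,
  message    STRING
)
USING DELTA
LOCATION 'abfss://curated@datalake.dfs.core.windows.net/logs_clean/';
```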
Benefits of External Tables
- You control the cloud storage location
- Easier to share data with non-Databricks systems
- Good for multi-cloud or shared architectures
- Other tools can read the files directly, provided they have permission on the storage path
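Because the files live at a path you know, you can also query the Delta data by path instead of by table name. This is close to what another engine does when it reads the same folder. The minimal sketch below reuses the path from the logs_raw example and assumes your identity has read access to that storage location.

```sql
-- Query the Delta files directly by path, bypassing the table name.
-- Requires read permission on the storage location.
SELECT COUNT(*) AS row_count
FROM delta.`abfss://raw@datalake.dfs.core.windows.net/logs/`;
```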
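Downsides
- If you drop the table, the files remain (you must delete them manually)
- More responsibility on your side (paths, permissions, cleanup)
- Slightly more setup (in Unity Catalog, the path must be covered by an external location backed by a storage credential)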
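The sketch below illustrates the first downside, reusing the logs_raw table from the earlier example. After DROP TABLE, only the table definition is gone. The Delta files, including the transaction log, stay at the path, so running the same CREATE TABLE again re-registers the table with its existing data.

```sql
-- Drop the external table: only the metadata is removed.
DROP TABLE logs_raw;

-- The Delta files are still at the path, so re-creating the table
-- with the same LOCATION brings the existing data back.
CREATE TABLE logs_raw
USING delta
LOCATION 'abfss://raw@datalake.dfs.core.windows.net/logs/';
```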
Managed vs External: The One-Sentence Difference
With a managed table, Databricks manages both the metadata and the data files. With an external table, Databricks manages only the metadata, and the data files live at a location you choose.
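How to Check Table Type
DESCRIBE TABLE EXTENDED table_name;
In the detailed table information, look for:
Type: MANAGED or EXTERNAL
Location: where the data actually lives
(DESCRIBE DETAIL table_name also reports the location, but not the table type.)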
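If your workspace uses Unity Catalog, you can also list the type of many tables at once from the catalog's information_schema. In this sketch, the schema name default is an assumption, so replace it with your own schema.

```sql
-- Unity Catalog only: table_type is MANAGED, EXTERNAL, VIEW, etc.
SELECT table_name, table_type
FROM information_schema.tables
WHERE table_schema = 'default'
ORDER BY table_name;
```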
When Should You Use Which?
Use Managed Tables When:
- You want Databricks to handle everything
- You are building Bronze → Silver → Gold tables
- The data is internal to your Lakehouse
- You don't need to control the cloud path
Use External Tables When:
- You must control your own storage folder
- You share files with other systems or teams
- You are migrating existing data into Databricks
- You use external governance/security layers
- Data must remain even if the table is dropped
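One possible pattern for the migration case is to register existing files in place as an external table, then copy them into a managed table for downstream layers. This is a hedged sketch. The Parquet format, the orders path, and the table names orders_legacy and orders_bronze are all hypothetical.

```sql
-- Register existing Parquet files in place (no data is copied);
-- the schema is inferred from the Parquet files.
CREATE TABLE orders_legacy
USING PARQUET
LOCATION 'abfss://raw@datalake.dfs.core.windows.net/orders/';

-- Copy the data into a managed Delta table (no LOCATION clause).
CREATE TABLE orders_bronze AS
SELECT * FROM orders_legacy;
```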
Simple Visual
Managed Table
├─ Metadata -> Databricks
└─ Data Files -> Databricks-managed storage
External Table
├─ Metadata -> Databricks
└─ Data Files -> Your cloud storage path
Summary
- Databricks has two types of tables: Managed and External.
- For managed tables, Databricks manages both the data files and the metadata.
- External tables keep metadata in Databricks but store data in a location you choose.
- Managed tables are simple and great for internal Lakehouse workflows.
- External tables give you control over storage and are ideal for multi-tool ecosystems.
- Dropping a managed table deletes its data; dropping an external table does not.
Both table types are essential; choose based on how much control you need.