
File Operators – S3, GCS, and Local Filesystem

Every data pipeline has a moment where data becomes a file.

CSV exports.
JSON payloads.
Parquet partitions.
Log archives.

Airflow doesn’t process the data itself;
it coordinates how files move, arrive, and transform.

That coordination is handled by File Operators.


What Are File Operators in Airflow?​

File Operators manage:

  • File transfers
  • File existence checks
  • Uploads and downloads
  • Movement between local and cloud storage

They are built on top of storage hooks, giving you:

  • Authentication via Airflow Connections
  • Retry and logging support
  • Consistent patterns across providers

When Should You Use File Operators?​

Ideal Use Cases​

  • Uploading data to S3 or GCS
  • Downloading files for processing
  • Moving files between buckets
  • Archiving or cleaning up files
  • Validating file availability

When Not to Use Them​

  • Row-level data transformations
  • Streaming workloads
  • Heavy compute logic

Local Filesystem Operators

Let’s start with the simplest form: local files.

FileSensor (Local)​

from airflow.sensors.filesystem import FileSensor

FileSensor(
    task_id="wait_for_local_file",
    filepath="/data/input/sales_2024-01-10.csv",
    poke_interval=60,  # seconds between existence checks
    timeout=3600,      # fail the task if the file has not arrived after one hour
)

Input

| Parameter | Value |
| --- | --- |
| filepath | /data/input/sales_2024-01-10.csv |

Output​

File detected successfully
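To make `poke_interval` and `timeout` concrete, here is a minimal sketch of the polling idea behind a file sensor. It is a simplified stand-in written with the standard library only, not Airflow's implementation; `wait_for_file` is a hypothetical helper name, and the real sensor fails the task on timeout instead of returning `False`.

```python
import os
import time


def wait_for_file(filepath, poke_interval=60.0, timeout=3600.0):
    """Poll until `filepath` exists; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(filepath):
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep between checks, but never past the deadline.
        remaining = max(0.0, deadline - time.monotonic())
        time.sleep(min(poke_interval, remaining))
```

A short `poke_interval` detects the file sooner at the cost of more checks; a `timeout` keeps a missing file from blocking the pipeline indefinitely.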

Amazon S3 Operators​

S3 is one of the most common storage layers in modern pipelines.


Uploading Files to S3​

from airflow.providers.amazon.aws.transfers.local_to_s3 import LocalFilesystemToS3Operator

LocalFilesystemToS3Operator(
    task_id="upload_to_s3",
    filename="/data/output/sales.csv",   # local source file
    dest_key="sales/2024/01/sales.csv",  # object key inside the bucket
    dest_bucket="analytics-bucket",
    aws_conn_id="aws_default",           # Airflow Connection holding the AWS credentials
)

Input

| Source | Destination |
| --- | --- |
| /data/output/sales.csv | s3://analytics-bucket/sales/2024/01/sales.csv |

Output​

Upload completed successfully

Downloading Files from S3​

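from airflow.decorators import task
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Assumption: the Amazon provider ships no S3ToLocalFilesystemOperator,
# so this sketch downloads the object through S3Hook inside a task.
@task(task_id="download_from_s3")
def download_from_s3():
    s3_object = S3Hook(aws_conn_id="aws_default").get_key(
        key="events/events_2024-01-10.json",
        bucket_name="raw-data",
    )
    # get_key returns a boto3 S3 Object; download_file writes it to the local path.
    s3_object.download_file("/tmp/events.json")

# Call download_from_s3() inside your DAG to create the task.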

Input

| Source | Destination |
| --- | --- |
| s3://raw-data/events/events_2024-01-10.json | /tmp/events.json |

Output​

File downloaded successfully

Google Cloud Storage (GCS) Operators​

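GCS operators follow the same patterns as the S3 ones, though the parameter names differ (src, dst, and bucket instead of filename, dest_key, and dest_bucket).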


Uploading Files to GCS​

from airflow.providers.google.cloud.transfers.local_to_gcs import LocalFilesystemToGCSOperator

LocalFilesystemToGCSOperator(
    task_id="upload_to_gcs",
    src="/data/output/customers.csv",    # local source file
    dst="customers/2024/customers.csv",  # object name inside the bucket
    bucket="analytics-gcs-bucket",
    gcp_conn_id="google_cloud_default",  # Airflow Connection holding the GCP credentials
)

Input

| Source | Destination |
| --- | --- |
| /data/output/customers.csv | gs://analytics-gcs-bucket/customers/2024/customers.csv |

Output​

File uploaded to GCS

Downloading Files from GCS​

from airflow.providers.google.cloud.transfers.gcs_to_local import GCSToLocalFilesystemOperator

GCSToLocalFilesystemOperator(
    task_id="download_from_gcs",
    bucket="raw-events",
    object_name="2024/01/events.json",
    filename="/tmp/events.json",
)

Templating Paths with Execution Date​

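File operators support Jinja templating in the arguments listed in each operator's template_fields, which typically include the path and key parameters.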

dest_key="sales/{{ ds }}/sales.csv"

This enables:

  • Partitioned storage
  • Date-based organization
  • Backfill-friendly pipelines
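To see why a `{{ ds }}` path is backfill-friendly, the sketch below shows what such a template resolves to for consecutive daily runs. It uses plain string substitution as a stand-in for Airflow's Jinja rendering; `render_ds` and `backfill_keys` are hypothetical helper names, and the only behavior assumed from Airflow is that `ds` is the run's date formatted as YYYY-MM-DD.

```python
from datetime import date, timedelta


def render_ds(template, logical_date):
    """Substitute `{{ ds }}` with the run's date as YYYY-MM-DD (stand-in for Jinja)."""
    return template.replace("{{ ds }}", logical_date.isoformat())


def backfill_keys(template, start, days):
    """Return one rendered key per daily run, starting at `start`."""
    return [render_ds(template, start + timedelta(days=i)) for i in range(days)]


keys = backfill_keys("sales/{{ ds }}/sales.csv", date(2024, 1, 10), 3)
for key in keys:
    print(key)
# sales/2024-01-10/sales.csv
# sales/2024-01-11/sales.csv
# sales/2024-01-12/sales.csv
```

Because each run writes to its own date partition, re-running a past date overwrites only that date's file and leaves the others untouched.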

File Operators & XCom​

Most file operators:

  • Do not push XComs
  • Rely on task success/failure

This is intentional: the file itself is the contract, so downstream tasks locate it by its agreed path rather than through an XCom value.


File Operators vs Sensors​

| Use Case | Operator | Sensor |
| --- | --- | --- |
| Move file | ✅ | ❌ |
| Check file exists | ❌ | ✅ |
| Wait for arrival | ❌ | ✅ |

Often used together:

  1. Sensor waits
  2. Operator moves or processes
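The two steps above can be sketched as a small DAG that chains the FileSensor and LocalFilesystemToS3Operator shown earlier. This is a sketch under assumptions, not a definitive layout: it assumes Airflow 2.4 or later with the Amazon provider installed, and the function name `build_sales_upload_dag` and the DAG id `sales_file_upload` are hypothetical.

```python
from datetime import datetime


def build_sales_upload_dag():
    """Build a DAG in which a sensor gates an upload (hypothetical DAG id and paths)."""
    # Imported lazily so this module can be loaded where Airflow is not installed.
    from airflow import DAG
    from airflow.providers.amazon.aws.transfers.local_to_s3 import (
        LocalFilesystemToS3Operator,
    )
    from airflow.sensors.filesystem import FileSensor

    with DAG(
        dag_id="sales_file_upload",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # assumes Airflow 2.4+; older releases use schedule_interval
        catchup=False,
    ) as dag:
        # Step 1: the sensor waits until the file exists.
        wait_for_file = FileSensor(
            task_id="wait_for_local_file",
            filepath="/data/output/sales.csv",
            poke_interval=60,
            timeout=3600,
        )
        # Step 2: the operator moves the file once the sensor succeeds.
        upload = LocalFilesystemToS3Operator(
            task_id="upload_to_s3",
            filename="/data/output/sales.csv",
            dest_key="sales/{{ ds }}/sales.csv",
            dest_bucket="analytics-bucket",
            aws_conn_id="aws_default",
            replace=True,  # overwrite on re-runs so the task stays idempotent
        )
        wait_for_file >> upload
    return dag
```

In a DAG file you would assign the result at module level, for example `dag = build_sales_upload_dag()`, so that Airflow can discover it.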

Security Best Practices​

  • Use IAM roles or service accounts
  • Avoid embedding access keys
  • Limit bucket permissions
  • Encrypt sensitive files

❌ Avoid​

  • Hardcoded credentials
  • Overly broad bucket access
  • Public buckets for internal data
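As an illustration of limiting bucket permissions, the sketch below builds an AWS IAM policy document that allows uploads only under a single prefix. The helper name `upload_only_policy` is hypothetical, the bucket and prefix are the example values used earlier, and a real pipeline may need further actions (for instance read or list access) depending on what its tasks do.

```python
import json


def upload_only_policy(bucket, prefix):
    """Return an IAM policy document allowing uploads under one prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Scope access to objects under the prefix, not the whole bucket.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            }
        ],
    }


policy = upload_only_policy("analytics-bucket", "sales/")
print(json.dumps(policy, indent=2))
```

Attaching a policy like this to the role behind the Airflow Connection keeps an upload task from reading or deleting unrelated data.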

Common Mistakes​

❌ Mixing transformation logic with file movement
❌ Ignoring idempotency
❌ Hardcoding file paths
❌ Uploading partially written files
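One common way to avoid exposing partially written files is to write to a temporary name and rename the file into place only when it is complete. The sketch below shows that pattern with the standard library; `write_atomically` is a hypothetical helper name, and the approach assumes the temporary file and the target live on the same filesystem.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write `data` to a temp file beside `path`, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and target share a filesystem,
        # so readers see either no file or the complete file.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file before re-raising the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A sensor watching the final path then fires only once the file is complete, and the upload that follows never ships a half-written export.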


Real-World Use Cases​

  • Data lake ingestion
  • ML feature storage
  • Report generation
  • Backup and archival workflows
  • Cross-cloud data movement

Summary​

File Operators are the logistics layer of Airflow.

Key Takeaways:

  • Move files reliably across systems
  • Consistent patterns across S3, GCS, and local
  • Deep integration with Airflow Connections
  • Best used with sensors for event-driven workflows

They keep your pipelines organized, scalable, and cloud-native.


What’s Next?​

Next article in the series:

➡️ HttpOperator & REST API Workflows