Doramagic Project Pack · Human Manual

airflow

Project Introduction

Related topics: Architecture Overview, Core Concepts

Apache Airflow is an open-source platform for orchestrating, scheduling, and monitoring complex workflows. Originally developed at Airbnb and now a top-level Apache Software Foundation project, it is widely used for data pipeline orchestration and lets users define, schedule, and execute workflows as Directed Acyclic Graphs (DAGs).

Sources: README.md

Overview

Apache Airflow provides a comprehensive solution for workflow management with the following key characteristics:

| Aspect | Description |
| --- | --- |
| Type | Open-source workflow orchestration platform |
| Language | Python |
| License | Apache License 2.0 |
| Primary Use | Data pipelines, ETL processes, ML workflows |
| Architecture | Distributed, scalable, extensible |

Sources: README.md

Core Architecture

Airflow follows a distributed architecture with several key components working together to manage workflow execution.

graph TD
    subgraph Client["Client Layer"]
        WebServer["Web Server"]
        CLI["Command Line Interface"]
        API["REST API"]
    end
    
    subgraph Core["Core Components"]
        Scheduler["Scheduler"]
        Executor["Executor"]
        Database["Metadata Database"]
    end
    
    subgraph WorkerLayer["Worker Layer"]
        Workers["Workers"]
        Tasks["Task Instances"]
    end
    
    subgraph External["External Systems"]
        Triggerer["Triggerer"]
        Logger["Logging"]
    end
    
    WebServer --> Database
    CLI --> Database
    API --> Database
    Scheduler --> Database
    Scheduler --> Executor
    Executor --> Workers
    Workers --> Database
    Triggerer --> Database
    
    style Database fill:#f9f,stroke:#333,stroke-width:2px

Sources: airflow-core/src/airflow/cli/cli_config.py

DAG-Based Workflow Model

At the heart of Airflow is the DAG (Directed Acyclic Graph) concept. A DAG represents a collection of tasks with defined dependencies, arranged in a way that reflects the logical flow of work.

Key DAG Properties

| Property | Description |
| --- | --- |
| Directed | Tasks have explicit dependencies (upstream/downstream relationships) |
| Acyclic | No circular dependencies; workflows flow in one direction |
| Graph | Complex dependency structures are supported |

Default Connections

Airflow ships with pre-configured connection definitions for common integrations:

| Connection ID | Type | Default Configuration |
| --- | --- | --- |
| facebook_default | facebook_social | Facebook Ad account credentials |
| fs_default | fs | Filesystem path / |
| ftp_default | ftp | localhost:21, SSH key auth |
| google_cloud_default | google_cloud_platform | Default GCP schema |
| hive_cli_default | hive_cli | localhost:10000, Beeline mode |
| hiveserver2_default | hiveserver2 | localhost:10000 |
| http_default | http | https://www.httpbin.org/ |
| gremlin_default | gremlin | gremlin:8182 |
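
Airflow can also express a connection as a URI of the form conn_type://login:password@host:port/schema. The sketch below shows how the host and port values in the table map onto such a URI. It is a simplified illustration that uses only the standard library; parse_connection_uri is a hypothetical helper, not Airflow's own parser, and it ignores extras and URL-decoding.

```python
from urllib.parse import urlsplit


def parse_connection_uri(uri):
    """Split a connection URI into the fields a connection record holds."""
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "login": parts.username,
        # The URI path (minus the leading slash) plays the role of the schema.
        "schema": parts.path.lstrip("/") or None,
    }


# Values taken from the ftp_default and hiveserver2_default rows.
ftp_conn = parse_connection_uri("ftp://localhost:21")
hive_conn = parse_connection_uri("hiveserver2://localhost:10000/default")
print(ftp_conn)
print(hive_conn)
```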

Sources: airflow-core/src/airflow/utils/db.py

Providers Ecosystem

Apache Airflow's functionality is extended through Providers — packages that integrate Airflow with external services and systems.

graph LR
    Airflow["Apache Airflow Core"] --> Providers["Providers"]
    Providers --> Google["Google Cloud Provider"]
    Providers --> AWS["Amazon Provider"]
    Providers --> Azure["Microsoft Azure Provider"]
    Providers --> Edge3["Edge3 Provider"]
    Providers --> Others["Other Providers"]

Provider Management

Providers can be discovered and managed through the Airflow CLI:

airflow providers [list|get|widgets|sensors|hooks|executors]

Key Provider Features

| Feature | Description |
| --- | --- |
| Hooks | Interfaces to external systems for connection management |
| Operators | Pre-built task templates for common operations |
| Sensors | Tasks that wait for external conditions |
| Transfers | Tasks for moving data between systems |

Sources: PROVIDERS.rst

Version Information

The Airflow version is managed centrally and can be accessed programmatically:

# airflow-core/src/airflow/version.py
version = "2.11.0"
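
As a complement, the installed version can be read from package metadata without importing Airflow itself. This is a minimal sketch using only the standard library; installed_airflow_version is a hypothetical helper name, and the distribution name apache-airflow is the one used in the pip commands in this manual.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_airflow_version():
    """Return the installed apache-airflow version string, or None if absent."""
    try:
        return version("apache-airflow")
    except PackageNotFoundError:
        # Airflow is not installed in this environment.
        return None


print(installed_airflow_version())
```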

Sources: airflow-core/src/airflow/version.py

Installation Methods

Airflow supports multiple installation approaches to suit different environments and use cases.

| Method | Use Case | Command/Config |
| --- | --- | --- |
| PyPI | Standard installation | pip install apache-airflow |
| Sources | Development/contribution | Clone repository and install |
| Providers from sources | Testing provider changes | Breeze commands |
| Constraints | Reproducible builds | Use constraints files |

Sources: INSTALLING.md

Installing Specific Versions

The --use-airflow-version option provides flexibility for version installation:

| Version Specifier | Behavior |
| --- | --- |
| none | Skip Airflow installation |
| wheel | Install from local dist/ folder |
| sdist | Install from source distribution |
| owner/repo:branch | Install from GitHub repository |
| PR_NUMBER | Install from a Pull Request |
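
The table can be read as a small dispatch on the shape of the value. The sketch below is an illustration of that reading, not Breeze's actual parsing code: classify_version_specifier is a hypothetical function, and the final fallback (treating anything else as a released version number) is an assumption that the table does not state.

```python
import re


def classify_version_specifier(spec):
    """Map a --use-airflow-version value to the behavior listed in the table."""
    if spec == "none":
        return "skip installation"
    if spec == "wheel":
        return "install from local dist/ folder"
    if spec == "sdist":
        return "install from source distribution"
    if spec.isdigit():
        # A bare number is read as a pull request number.
        return "install from pull request"
    if re.fullmatch(r"[\w.-]+/[\w.-]+:[\w./-]+", spec):
        return "install from GitHub repository"
    # Assumption: anything else is treated as a released version.
    return "install released version"


for value in ["none", "wheel", "apache/airflow:main", "12345"]:
    print(value, "->", classify_version_specifier(value))
```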

Sources: dev/breeze/src/airflow_breeze/commands/common_options.py

Vendored Dependencies

Airflow maintains a _vendor package for dependencies that need special handling:

graph TD
    Vendor["_vendor Package"] --> License["Move to licenses/ folder"]
    Vendor --> Remove["Remove README/supporting files"]
    Vendor --> Requirements["Add to pyproject.toml"]
    Vendor --> Fixes["Re-apply historical fixes"]

Vendoring Process

  1. Update vendor.md with library, version, and SHA256 hash
  2. Remove old files and directories
  3. Move LICENSE files to licenses/ folder
  4. Add requirements to pyproject.toml with appropriate comments
  5. Re-apply any necessary cherry-picked fixes

Sources: airflow-core/src/airflow/_vendor/README.md

Development Environment (Breeze)

The Breeze tool is Airflow's primary development environment, providing consistent tooling for testing, building, and development.

Key Breeze Commands

| Command | Purpose |
| --- | --- |
| breeze sbom update-sbom-information | Update SBOM information |
| breeze workflow run-publish | Run documentation workflow |
| breeze build | Build Airflow images |

Breeze Configuration Options

| Option | Description |
| --- | --- |
| --airflow-version | Specify Airflow version |
| --debian-version | Select base Debian version |
| --docker-cache | Configure build caching |
| --mount-sources | Control source mounting strategy |
| --allow-pre-releases | Enable pre-release installations |

Sources: dev/breeze/src/airflow_breeze/commands/sbom_commands.py

CLI Commands Structure

Airflow provides a comprehensive command-line interface organized into logical groups:

graph TD
    CLI["airflow CLI"] --> Groups
    Groups --> Config["config - View configuration"]
    Groups --> Info["info - Show system info"]
    Groups --> Plugins["plugins - Dump plugin info"]
    Groups --> Connections["connections - Manage connections"]
    Groups --> Providers["providers - Display providers"]
    Groups --> DAGs["dags - Manage DAGs"]
    Groups --> db_manager["db-manager - Database management"]
    Groups --> rotate_fernet["rotate-fernet-key - Rotate keys"]

Core CLI Commands

| Command | Function | Purpose |
| --- | --- | --- |
| airflow config | View configuration | Display current Airflow settings |
| airflow info | System information | Show environment details |
| airflow plugins | Plugin dump | Display loaded plugins |
| airflow connections | Connection management | CRUD operations for connections |
| airflow providers | Provider display | List installed providers |
| airflow db-manager | Database management | External DB manager operations |

Sources: airflow-core/src/airflow/cli/cli_config.py

Dependencies and Requirements

Generated Dependency Files

The repository maintains several auto-generated dependency files:

| File | Purpose | Generation Command |
| --- | --- | --- |
| devel_deps.txt | Development dependencies | ./dev/get_devel_deps.sh |
| dep_tree.txt | Full dependency tree | uv tree --no-dedupe |
| dependency_depth.json | Dependency depth analysis | Generated by Breeze |

PyPI README Generation

The PYPI_README.md is automatically generated from the main README using pre-commit hooks defined in .pre-commit-config.yaml, ensuring consistency between project documentation and PyPI listings.

Sources: generated/README.md

Summary

Apache Airflow is a robust, extensible workflow orchestration platform that provides:

  • DAG-based workflow definition for complex pipeline management
  • Extensible architecture through providers and plugins
  • Multiple deployment options from development to production
  • Comprehensive CLI and UI for monitoring and management
  • Strong ecosystem of integrations with major cloud providers and services

The platform's design prioritizes dynamic pipeline generation, an extensible operator library, and robust scheduling, which makes it a common choice for data engineering teams.

Sources: README.md

Core Concepts

Related topics: Scheduler and Executor Architecture, Data Flow and State Management

Apache Airflow is an open-source workflow orchestration platform designed to programmatically author, schedule, and monitor complex data pipelines. The platform provides a robust framework for defining workflows as Directed Acyclic Graphs (DAGs), enabling organizations to automate and manage data processing workflows at scale.

DAG (Directed Acyclic Graph)

The DAG is the fundamental building block of Airflow. It represents a collection of tasks with defined dependencies, organized in a way that reflects the logical flow of work through the system.

DAG Structure

A DAG defines the overall structure of a workflow:

  • Nodes: Represent individual tasks or operations
  • Edges: Define dependencies between tasks (task A must complete before task B can start)
  • No Cycles: The graph must flow in one direction without circular dependencies
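
The three properties above can be demonstrated with the standard library alone. The sketch below is a model of the idea rather than Airflow code: the task names are hypothetical, execution_order is an illustrative helper, and graphlib stands in for the ordering and cycle checks that a scheduler performs.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(deps):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"not a DAG, cycle detected: {exc.args[1]}") from exc


order = execution_order(dependencies)
print(order)
```

Passing a graph such as {"a": {"b"}, "b": {"a"}} raises ValueError, which is the "No Cycles" rule in practice.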

DAG Commands

Airflow provides comprehensive CLI commands for managing DAGs through the dag_cli_commands configuration.

| Command | Purpose |
| --- | --- |
| dags list | List all DAGs in the environment |
| dags details | Get detailed information about a specific DAG |
| dags list-runs | List all runs for a specific DAG |
| dags list-import-errors | Show DAGs with import errors |
| dags report | Display DagBag loading report |
| dags pause | Pause a DAG from scheduling |
| dags unpause | Resume DAG scheduling |
| dags backfill | Run subsections of a DAG for a date range |
| dags test | Test a DAG execution |

Source: airflow-core/src/airflow/cli/cli_config.py

DAG Execution Parameters

When creating a backfill operation, the following parameters can be configured:

| Parameter | Description |
| --- | --- |
| dag_id | The identifier of the DAG to backfill |
| from_date | Start date for the backfill range |
| to_date | End date for the backfill range |
| run_conf | Configuration to pass to the DAG run |
| run_backwards | Execute DAG runs in reverse chronological order |
| max_active_runs | Maximum number of concurrent active DAG runs |
| reprocess_behavior | How to handle already-processed runs |
| run_on_latest_version | Whether to use the latest DAG version |
| dry_run | Execute without making changes |
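
To make the interaction of from_date, to_date, and run_backwards concrete, the sketch below plans the logical dates a backfill would cover. It is an illustration under stated assumptions: BackfillRequest and planned_logical_dates are hypothetical names, only a subset of the parameters is modeled, and a daily schedule is assumed.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class BackfillRequest:
    """Hypothetical container mirroring a few of the parameters above."""
    dag_id: str
    from_date: date
    to_date: date
    run_backwards: bool = False


def planned_logical_dates(req):
    """List one logical date per day in [from_date, to_date], inclusive."""
    span = (req.to_date - req.from_date).days
    if span < 0:
        raise ValueError("to_date must not be earlier than from_date")
    dates = [req.from_date + timedelta(days=i) for i in range(span + 1)]
    # run_backwards processes the most recent date first.
    return dates[::-1] if req.run_backwards else dates


req = BackfillRequest("example_dag", date(2024, 1, 1), date(2024, 1, 3), run_backwards=True)
plan = planned_logical_dates(req)
print(plan)
```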

Source: airflow-core/src/airflow/cli/cli_config.py

Tasks and Task Instances

Task Instance States

A task instance represents a specific execution of a task within a DAG run. Each task instance progresses through various states during its lifecycle.

graph TD
    A[None] --> B[Scheduled]
    B --> C[Queued]
    C --> D[Running]
    D --> E{Success/Failed/Skipped}
    E --> F[Success]
    E --> G[Failed]
    E --> H[Skipped]
    A --> I[Upstream Failed]
    F --> J[Complete]
    G --> J
    H --> J
    I --> J
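
The lifecycle can be modeled as a small transition table. The sketch below is a simplified illustration, not Airflow's state machine: the names TRANSITIONS, advance, and replay are hypothetical, real deployments have additional states (for example retries and deferral), and it treats upstream_failed as a state assigned before a task ever runs because an upstream task failed, which is this sketch's modeling assumption.

```python
# Allowed moves between states; state names follow the diagram, lowercased.
TRANSITIONS = {
    "none": frozenset({"scheduled", "upstream_failed"}),
    "scheduled": frozenset({"queued"}),
    "queued": frozenset({"running"}),
    "running": frozenset({"success", "failed", "skipped"}),
}
TERMINAL_STATES = frozenset({"success", "failed", "skipped", "upstream_failed"})


def advance(current, new):
    """Return the new state if the move is allowed, else raise ValueError."""
    if new not in TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


def replay(states):
    """Validate a whole sequence of states and return the final one."""
    current = states[0]
    for nxt in states[1:]:
        current = advance(current, nxt)
    return current


final = replay(["none", "scheduled", "queued", "running", "success"])
print(final, final in TERMINAL_STATES)
```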

Task Instance Management

The Airflow UI provides functionality for clearing task instances, allowing operators to re-execute tasks that may have failed or need to be reprocessed. The ClearTaskInstanceConfirmationDialog component handles the confirmation workflow for this operation.

Key attributes displayed when clearing a task instance:

  • Current state of the task
  • Start date (shown as relative time)
  • User who executed the task (or "unknown user" if unavailable)

DAG Runs

A DAG Run represents an individual execution of an entire DAG at a specific point in time. Each DAG run has:

  • Run ID: Unique identifier for the run
  • State: Current state (running, success, failed)
  • Logical Date (formerly Execution Date): The logical date/time the DAG run was scheduled for
  • Start/End Date: Actual execution timestamps
  • Configuration: Run-specific configuration parameters

Listing DAG Runs

The list-runs command supports filtering by:

  • State: Filter runs by their state (running, success, failed)
  • No Backfill: Exclude backfill runs from results
  • Start Date: Filter runs executed after a specific date

Source: airflow-core/src/airflow/cli/cli_config.py

Variables

Airflow Variables provide a mechanism for storing and retrieving arbitrary content or settings as simple key-value pairs. Variables are encrypted when the fernet_key is configured.

Variable Commands

| Command | Purpose |
| --- | --- |
| variables list | List all variables |
| variables get | Get a specific variable value |
| variables set | Set a variable value |
| variables delete | Delete a variable |
| variables export | Export all variables to stdout |
| variables import | Import variables from a file |

Variable Options

| Option | Description |
| --- | --- |
| VAR | Variable key name |
| VAR_VALUE | Value to set |
| VAR_DESCRIPTION | Optional description |
| SERIALIZE_JSON | Serialize value as JSON |
| DESERIALIZE_JSON | Deserialize JSON value |
| DEFAULT | Default value if variable doesn't exist |
| VAR_IMPORT | Path to import file |
| VAR_ACTION_ON_EXISTING_KEY | Action for existing keys |
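
The JSON and default options are easiest to see side by side. The sketch below models them with an in-memory dictionary; it is not Airflow's Variable class, and set_variable, get_variable, and the _store dictionary are hypothetical stand-ins for the metadata database.

```python
import json

_store = {}           # in-memory stand-in for the metadata database
_MISSING = object()   # sentinel so that None can be a legitimate default


def set_variable(key, value, serialize_json=False):
    """Store a value as text, optionally encoding it as JSON first."""
    _store[key] = json.dumps(value) if serialize_json else str(value)


def get_variable(key, default=_MISSING, deserialize_json=False):
    """Fetch a value, falling back to default when the key does not exist."""
    if key not in _store:
        if default is _MISSING:
            raise KeyError(key)
        return default
    raw = _store[key]
    return json.loads(raw) if deserialize_json else raw


set_variable("retries", {"max": 3}, serialize_json=True)
print(get_variable("retries"))                          # raw JSON text
print(get_variable("retries", deserialize_json=True))   # parsed dictionary
print(get_variable("missing", default="fallback"))
```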

Source: airflow-core/src/airflow/cli/cli_config.py

Connections

Airflow Connections store credentials and configuration information needed to connect to external systems. Default connections are automatically created during initialization.

Default Connection Types

The db.py utility creates several default connections:

| Connection ID | Type | Purpose |
| --- | --- | --- |
| facebook_default | facebook_social | Facebook social authentication |
| fs_default | fs | File system operations |
| ftp_default | ftp | FTP server access |
| google_cloud_default | google_cloud_platform | GCP default credentials |
| gremlin_default | gremlin | Gremlin graph database |
| hive_cli_default | hive_cli | Hive command-line interface |
| hiveserver2_default | hiveserver2 | HiveServer2 JDBC connections |
| http_default | http | Generic HTTP endpoints |
| iceberg_default | iceberg | Apache Iceberg catalog |

Source: airflow-core/src/airflow/utils/db.py

Connection CLI Commands

| Command | Purpose |
| --- | --- |
| connections list | List all connections |
| connections add | Add a new connection |
| connections delete | Delete a connection |
| connections get | Get connection details |
| connections edit | Edit an existing connection |

Assets

Assets represent data sources or destinations in Airflow. They can be associated with DAGs to create data-aware scheduling.

Asset Commands

| Command | Purpose |
| --- | --- |
| assets list | List all assets |
| assets details | Show asset details |
| assets materialize | Materialize an asset |

Asset Details Parameters

| Parameter | Description |
| --- | --- |
| ASSET_ALIAS | Alias name for the asset |
| ASSET_NAME | Name of the asset |
| ASSET_URI | URI identifying the asset |

Source: airflow-core/src/airflow/cli/cli_config.py

Configuration Changes in Airflow 3.0

Airflow 3.0 introduces several configuration changes that affect default behavior.

Default Behavior Changes

| Configuration | Old Default | New Default |
| --- | --- | --- |
| catchup_by_default | True | False |
| create_cron_data_intervals | True | False |
| create_delta_data_intervals | True | False |

Source: airflow-core/src/airflow/cli/commands/config_command.py

Configuration Renames

Several scheduler configurations have been renamed:

| Old Name | New Name |
| --- | --- |
| scheduler.processor_poll_interval | scheduler.scheduler_idle_sleep_time |
| scheduler.deactivate_stale_dags_interval | scheduler.parsing_cleanup_interval |
| scheduler.statsd_on | metrics.statsd_on |
| scheduler.max_threads | dag_processor.parsing_processes |
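
The renames in the table can be applied mechanically to an INI-style configuration. The sketch below is a minimal illustration using the standard configparser module; it is not the logic in config_command.py, and RENAMES and migrate are hypothetical names.

```python
import configparser

# (old_section, old_key) -> (new_section, new_key), taken from the table above.
RENAMES = {
    ("scheduler", "processor_poll_interval"): ("scheduler", "scheduler_idle_sleep_time"),
    ("scheduler", "deactivate_stale_dags_interval"): ("scheduler", "parsing_cleanup_interval"),
    ("scheduler", "statsd_on"): ("metrics", "statsd_on"),
    ("scheduler", "max_threads"): ("dag_processor", "parsing_processes"),
}


def migrate(cfg):
    """Move renamed options in place and return a list of the changes made."""
    changes = []
    for (old_sec, old_key), (new_sec, new_key) in RENAMES.items():
        if cfg.has_option(old_sec, old_key):
            value = cfg.get(old_sec, old_key)
            if not cfg.has_section(new_sec):
                cfg.add_section(new_sec)
            cfg.set(new_sec, new_key, value)
            cfg.remove_option(old_sec, old_key)
            changes.append(f"{old_sec}.{old_key} -> {new_sec}.{new_key}")
    return changes


cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string("[scheduler]\nmax_threads = 4\nstatsd_on = True\n")
changes = migrate(cfg)
print(changes)
```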

Source: airflow-core/src/airflow/cli/commands/config_command.py

Catchup Behavior Change

In Airflow 3.0, DAGs that do not set the catchup parameter explicitly no longer catch up by default. This is a change from Airflow 2.x behavior. Organizations that rely on catchup should set catchup=True in their DAG definitions or configure:

[scheduler]
catchup_by_default = True
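
The practical effect of the flag can be shown with a small model. The sketch below is an illustration of the catchup idea, not scheduler code: runs_to_create is a hypothetical function, and it assumes a fixed interval and that a run is due once its interval has fully elapsed.

```python
from datetime import datetime, timedelta


def runs_to_create(start, now, interval, catchup):
    """Return the logical dates of the runs a scheduler would create (simplified)."""
    due = []
    logical = start
    while logical + interval <= now:  # the interval has fully elapsed
        due.append(logical)
        logical += interval
    # Without catchup, only the most recent elapsed interval is run.
    return due if catchup else due[-1:]


start = datetime(2024, 1, 1)
now = datetime(2024, 1, 4, 12)
daily = timedelta(days=1)
with_catchup = runs_to_create(start, now, daily, catchup=True)
without_catchup = runs_to_create(start, now, daily, catchup=False)
print(len(with_catchup), without_catchup)
```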

Providers

Airflow Providers extend the core functionality by integrating with external services and systems.

Provider Information Commands

| Command | Purpose |
| --- | --- |
| providers list | List all installed providers |
| providers details | Show provider details |
| providers hooks | List registered provider hooks |
| providers triggers | List registered provider triggers |
| providers executors | Get information about executors |
| providers secrets | Get information about secrets backends |
| providers connections | Get connection information |
| providers notifications | Get notification information |
| providers extra-links | List extra links from providers |
| providers widgets | List connection form widgets |
| providers behaviors | Get connection types with custom behaviors |
| providers logging | Get task logging handlers |
| providers auth-managers | Get auth managers information |
| providers configs | Get provider configuration |
| providers lazy-loaded | Check lazy loading status |

Source: airflow-core/src/airflow/cli/cli_config.py

Dependency Management

Dependency Tree

Airflow's dependency tree defines the module structure and relationships between packages. This information is generated and maintained in the repository for dependency analysis.

| File | Purpose |
| --- | --- |
| dep_tree.txt | Complete dependency tree of Airflow |
| dependency_depth.json | Dependency depth analysis |

Generated using:

uv tree --no-dedupe > /opt/airflow/generated/dep_tree.txt

Source: generated/README.md

Workflow Architecture

graph TD
    subgraph "Authoring"
        A[Define DAG] --> B[Define Tasks]
        B --> C[Set Dependencies]
    end
    
    subgraph "Scheduling"
        D[Scheduler] --> E{Parse DAGs}
        E --> F[DAG Run Created]
        F --> G[Task Instances Queued]
    end
    
    subgraph "Execution"
        G --> H[Executor]
        H --> I[Worker]
        I --> J[Task Execution]
    end
    
    subgraph "Monitoring"
        J --> K[Update State]
        K --> L[UI/Webserver]
        L --> M[Logs & Metrics]
    end
    
    G -->|Async Operations| N[Async Commands]
    N --> O[Cloud Composer Environments]

Security Features

Fernet Key Rotation

Airflow supports rotating Fernet encryption keys to maintain security of stored credentials and variables:

airflow rotate-fernet-key

This command re-encrypts all stored connection credentials and variables with the new key.

Source: airflow-core/src/airflow/cli/cli_config.py

Configuration Reference

For secure connections, refer to the official documentation: https://airflow.apache.org/docs/apache-airflow/stable/howto/secure-connections.html

Additional CLI Commands

Info Command

Provides system and environment information:

airflow info [--anonymize] [--file-io] [--output OUTPUT] [--verbose]

Standalone Mode

Runs a complete Airflow instance for testing or development:

airflow standalone

Cheat Sheet

Displays a quick reference for common Airflow commands:

airflow cheat-sheet [--verbose]

Plugins Command

Dump information about loaded plugins:

airflow plugins [--output OUTPUT] [--verbose]

Source: airflow-core/src/airflow/cli/cli_config.py

Teams (RBAC)

Airflow supports team-based access control with the following operations:

Team Commands

| Command | Purpose |
| --- | --- |
| teams create | Create a new team |
| teams delete | Delete a team |

Team Creation Requirements

  • Team names must be 3-50 characters long
  • Only alphanumeric characters, hyphens, and underscores are allowed
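
The two rules above translate directly into a single regular expression. The sketch below is an illustration only; is_valid_team_name is a hypothetical helper, not the validation code used by the CLI.

```python
import re

# 3 to 50 characters drawn from letters, digits, hyphens, and underscores.
_TEAM_NAME = re.compile(r"[A-Za-z0-9_-]{3,50}")


def is_valid_team_name(name):
    """Return True when the name satisfies both creation requirements."""
    return _TEAM_NAME.fullmatch(name) is not None


for candidate in ["data-eng", "ml_team_1", "ab", "bad name!"]:
    print(candidate, is_valid_team_name(candidate))
```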

Source: airflow-core/src/airflow/cli/cli_config.py

Source: https://github.com/apache/airflow / Human Manual

Architecture Overview

Related topics: Scheduler and Executor Architecture, REST API

Apache Airflow is an open-source workflow orchestration platform designed to programmatically author, schedule, and monitor complex data pipelines. The architecture is built around a distributed, scalable design that separates concerns between scheduling, execution, and monitoring.

Core Architectural Components

Airflow's architecture consists of several key components that work together to manage workflow execution across distributed environments.

Component Overview

| Component | Purpose | Key Files |
| --- | --- | --- |
| Scheduler | Triggers scheduled tasks and submits tasks to executors | airflow-core/src/airflow/jobs/scheduler_job_runner.py |
| Executor | Executes tasks distributed across workers | Configured via airflow.cfg |
| Web Server | Provides UI for monitoring and management | airflow/ui/ |
| Database | Stores DAGs, connections, variables, and execution history | Configured via airflow.cfg |
| DAG Processor | Parses and processes DAG files | airflow-core/src/airflow/dag_processing/manager.py |

Sources: airflow-core/docs/core-concepts/overview.rst

Basic Architecture

The basic Airflow architecture runs all components on a single machine, suitable for development, testing, and small-scale deployments.

graph TB
    subgraph "Airflow Core"
        WS[Web Server] <--> DB[(Metadata Database)]
        SCH[Scheduler] <--> DB
        SCH <--> EX[Executor]
        EX <--> DB
        DP[DAG Processor] <--> DB
    end
    
    subgraph "DAG Storage"
        DAG[DAG Files] --> DP
    end
    
    Users -->|UI Access| WS
    SCH -->|Schedule| DAG

This single-node deployment includes all core components running together, with the scheduler handling task scheduling and the web server providing the user interface.

Sources: airflow-core/docs/img/diagram_basic_airflow_architecture.py

Distributed Architecture

For production environments, Airflow supports a distributed architecture where components can scale independently.

graph TB
    subgraph "Web Tier"
        WS1[Web Server 1]
        WS2[Web Server 2]
        LB[Load Balancer]
    end
    
    subgraph "Scheduler Tier"
        SCH1[Scheduler 1]
        SCH2[Scheduler 2]
    end
    
    subgraph "Worker Tier"
        W1[Worker 1]
        W2[Worker 2]
        W3[Worker N]
    end
    
    subgraph "Metadata"
        DB[(PostgreSQL/MySQL)]
        REDIS[(Redis/Message Broker)]
    end
    
    LB --> WS1
    LB --> WS2
    WS1 <--> DB
    WS2 <--> DB
    SCH1 <--> DB
    SCH2 <--> DB
    SCH1 --> REDIS
    SCH2 --> REDIS
    REDIS --> W1
    REDIS --> W2
    REDIS --> W3

Scalability Features

The distributed architecture provides:

  • Horizontal Scaling: Multiple schedulers and web servers can run in parallel
  • Worker Flexibility: Workers can be added or removed based on workload
  • High Availability: Running multiple schedulers and web servers removes those components as single points of failure
  • Isolation: DAG processing is separated from execution

Sources: airflow-core/docs/img/diagram_distributed_airflow_architecture.py

DAG Processing Architecture

The DAG Processor is responsible for reading, parsing, and validating DAG files before the Scheduler can schedule their tasks.

Processing Pipeline

graph LR
    A[DAG Files] -->|Read| B[DAG Processor]
    B -->|Parse| C[DAG Bag]
    C -->|Validate| D[Valid DAGs]
    D -->|Sync| E[(Metadata DB)]
    B -->|Log| F[Processor Logs]

Key Processing Manager

The DagFileProcessorManager handles:

  1. File Discovery: Scanning DAG directories for Python files
  2. Parsing: Converting Python DAG definitions into Airflow DAG objects
  3. Serialization: Storing parsed DAGs in the database
  4. Callback Handling: Processing DAG-level callbacks
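
# Illustrative sketch only, loosely based on
# airflow-core/src/airflow/dag_processing/manager.py. The real manager parses
# files in separate processes and does not expose a method like this one.
from airflow.models.dagbag import DagBag  # import path as of Airflow 2.x/3.0


class DagFileProcessorManager:
    def process_file(self, filepath):
        """Parse a single DAG file and return the DAGs found in it."""
        dagbag = DagBag(dag_folder=filepath)
        for dag in dagbag.dags.values():
            dag.sync_to_db()  # persist DAG metadata to the database
        return dagbag.dags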
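
The file discovery step can be sketched without Airflow installed. The example below is a simplified illustration of a cheap pre-filter in the spirit of Airflow's "safe mode" heuristic, which skips Python files that do not mention both "airflow" and "dag". The function names looks_like_dag_file and discover_dag_files are hypothetical.

```python
from pathlib import Path


def looks_like_dag_file(path):
    """Cheap pre-filter: keep files that mention both 'airflow' and 'dag'."""
    content = path.read_bytes().lower()
    return b"airflow" in content and b"dag" in content


def discover_dag_files(dag_folder):
    """Return candidate DAG files under dag_folder, sorted for stable output."""
    candidates = Path(dag_folder).rglob("*.py")
    return sorted(p for p in candidates if looks_like_dag_file(p))
```

A filter like this avoids importing every Python file in the folder; only the candidates it returns need the more expensive parsing step.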

Sources: airflow-core/src/airflow/dag_processing/manager.py

Authentication and Authorization Architecture

Airflow supports pluggable authentication through the Auth Manager interface, allowing integration with various authentication backends.

graph TB
    U[User] -->|Auth Request| AM[Auth Manager]
    AM -->|User Lookup| DB[(Metadata DB)]
    AM -->|Backend Check| BE[External Backend<br/>OIDC/SAML/LDAP]
    BE -->|Validation| AM
    AM -->|Permissions| PERMS[Permission Set]
    PERMS -->|Grant Access| RES[Resources]

Auth Manager Components

| Component | Responsibility |
| --- | --- |
| BaseAuthManager | Core interface for authentication |
| SimpleAuthManager | Default implementation for Airflow 3.0+ |
| Backends | External identity providers |

The auth manager architecture enables:

  • Integration with enterprise identity providers
  • Role-based access control (RBAC)
  • Fine-grained permissions on DAGs and tasks

Sources: airflow-core/docs/img/diagram_auth_manager_airflow_architecture.py

Configuration Management

Airflow 3.0 introduced significant configuration changes from Airflow 2.x:

Configuration Changes Summary

| Old Parameter | New Parameter | New Section |
| --- | --- | --- |
| processor_poll_interval | scheduler_idle_sleep_time | scheduler |
| deactivate_stale_dags_interval | parsing_cleanup_interval | scheduler |
| statsd_on | statsd_on | metrics |
| max_threads | parsing_processes | dag_processor |
| create_cron_data_intervals | unchanged | scheduler |
| create_delta_data_intervals | unchanged | scheduler |

Default Behavior Changes

In Airflow 3.0, the default for catchup_by_default is False, meaning DAGs without explicit catchup configuration will not backfill past runs.

Sources: airflow-core/src/airflow/cli/commands/config_command.py

Executor Architecture

Airflow supports multiple executor types for different deployment scenarios:

| Executor | Use Case | Scalability |
| --- | --- | --- |
| LocalExecutor | Single machine, development | Limited |
| SequentialExecutor | Debugging, minimal resources | None |
| CeleryExecutor | Production, distributed | High |
| KubernetesExecutor | Containerized environments | Very High |
| RayExecutor | Ray cluster integration | High |

Executor Selection

Executors are configured in airflow.cfg:

[core]
executor = KubernetesExecutor
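
Airflow also lets an environment variable of the form AIRFLOW__SECTION__KEY override a value from airflow.cfg, so AIRFLOW__CORE__EXECUTOR takes precedence over the setting above. The sketch below models that lookup order with the standard library; get_option is a hypothetical helper, not Airflow's configuration class.

```python
import configparser
import os


def get_option(cfg, section, key, environ=os.environ):
    """Return an option, letting AIRFLOW__SECTION__KEY override the file value."""
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in environ:
        return environ[env_name]
    return cfg.get(section, key)


cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string("[core]\nexecutor = KubernetesExecutor\n")

from_file = get_option(cfg, "core", "executor", environ={})
from_env = get_option(
    cfg, "core", "executor", environ={"AIRFLOW__CORE__EXECUTOR": "LocalExecutor"}
)
print(from_file, from_env)
```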

Sources: airflow-core/docs/core-concepts/overview.rst

CLI and Command Architecture

The Airflow CLI provides a comprehensive command-line interface for managing Airflow components:

Command Structure

airflow
├── config          # View configuration
├── connections     # Manage connections
├── dags            # DAG management
├── db-manager      # Database management
├── info            # System information
├── plugins         # Plugin dump
├── providers       # Provider information
├── rotate-fernet-key  # Credential rotation
├── standalone      # All-in-one mode
└── version         # Version display

Key CLI Commands

| Command | Purpose |
| --- | --- |
| airflow dags backfill | Run historical DAG executions |
| airflow dags list-runs | List DAG run instances |
| airflow connections list | Show configured connections |
| airflow providers list | Display loaded providers |
| airflow rotate-fernet-key | Rotate encryption keys |

Sources: airflow-core/src/airflow/cli/cli_config.py

Provider Architecture

Providers extend Airflow's capabilities by integrating with external systems:

Provider Categories

  • Cloud Providers: Google Cloud, AWS, Azure
  • Service Integrations: HTTP, SSH, GraphQL
  • Database Connectors: PostgreSQL, MySQL, Snowflake
  • Data Processing: Spark, Databricks, dbt

Provider Communication

graph LR
    A[Airflow Task] -->|Hook| B[Provider Hook]
    B -->|API| C[External Service]
    C -->|Response| B
    B -->|Result| A

Providers are discovered and managed through the airflow providers CLI commands.

Sources: airflow-core/src/airflow/cli/cli_config.py

Installation Modes

Docker Installation

Airflow provides official Docker images with multiple variants:

| Image Type | Description | Size |
| --- | --- | --- |
| apache/airflow:latest | Latest stable, default Python | ~1GB |
| apache/airflow:3.x.x | Versioned release | ~1GB |
| apache/airflow:slim-latest | Minimal installation | ~500MB |

Installation Methods

# Basic installation
pip install 'apache-airflow==3.2.0' \
 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.2.0/constraints-3.10.txt"

# With extras
pip install 'apache-airflow[postgres,google]==3.2.0' \
 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.2.0/constraints-3.10.txt"
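
The constraints URL follows a fixed pattern built from the Airflow version and the Python minor version. The sketch below assembles it programmatically; constraints_url and current_python_minor are hypothetical helper names, and the pattern is inferred from the commands above.

```python
import sys


def current_python_minor():
    """Return the running interpreter's version as 'major.minor'."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


def constraints_url(airflow_version, python_version):
    """Build the constraints file URL for a given Airflow and Python version."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


url = constraints_url("3.2.0", "3.10")
print(url)
print(constraints_url("3.2.0", current_python_minor()))
```

Deriving the Python version from the running interpreter helps keep the constraints file matched to the environment that pip installs into.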

Sources: docker-stack-docs/README.md, generated/PYPI_README.md

Default Connections

Airflow ships with pre-configured default connections for common integrations:

| Connection ID | Type | Purpose |
| --- | --- | --- |
| google_cloud_default | google_cloud_platform | GCP resources |
| fs_default | fs | File system access |
| http_default | http | HTTP endpoints |
| ftp_default | ftp | FTP/SFTP transfers |
| hive_cli_default | hive_cli | Hive CLI connections |
| hiveserver2_default | hiveserver2 | HiveServer2 connections |

Sources: airflow-core/src/airflow/utils/db.py

Development and Testing Architecture

The Breeze development environment provides an integrated development setup:

Breeze Features

  • Pre-configured Docker-based development environment
  • Multiple Python version support
  • Integration testing framework
  • Static code analysis tools
  • Documentation building capabilities

Development Commands

# Start development shell
breeze shell

# Run tests
breeze test

# Build documentation
breeze build-docs

Sources: dev/breeze/src/airflow_breeze/commands/developer_commands.py

Shared Distribution Architecture

Airflow supports shared code distributions for cross-project functionality:

Configuration

[tool.airflow]
shared_distributions = [
     "apache-airflow-shared-timezones",
]

Shared distributions are:

  1. Defined in pyproject.toml under tool.airflow
  2. Symlinked via _shared folder
  3. Automatically synchronized by pre-commit hooks

Sources: shared/README.md

MyPy Type Checking Integration

Airflow provides custom MyPy plugins for enhanced type checking:

Available Plugins

| Plugin | Purpose |
| --- | --- |
| airflow_mypy.plugins.decorators | Type checking for Airflow decorators |
| airflow_mypy.plugins.outputs | Type inference for XCom arguments |

Configuration

[mypy]
plugins = airflow_mypy.plugins.decorators, airflow_mypy.plugins.outputs

Or in pyproject.toml:

[tool.mypy]
plugins = ["airflow_mypy.plugins.decorators", "airflow_mypy.plugins.outputs"]

Sources: dev/mypy/README.md

Build and Release Architecture

SBOM (Software Bill of Materials)

Airflow generates SBOMs for security and compliance tracking:

  • Dependency tree generation via uv tree
  • Dependency depth analysis
  • Version-specific SBOM files

Documentation Build

The documentation system includes:

  • Sphinx-based documentation generation
  • Pagefind search integration
  • Multi-version documentation support
  • Third-party inventory tracking

Sources: dev/breeze/src/airflow_breeze/commands/sbom_commands.py, devel-common/src/sphinx_exts/pagefind_search/README.md

Summary

Apache Airflow's architecture is designed for scalability, reliability, and extensibility:

  • Modular Design: Components can be scaled independently
  • Pluggable Executors: Support for various execution backends
  • Extensible Providers: Integration with external systems
  • Production-Ready: High availability and monitoring capabilities
  • Developer-Friendly: Comprehensive tooling and documentation

The architecture supports deployments from single-machine development environments to large-scale, distributed production systems handling thousands of workflows.

Sources: airflow-core/docs/core-concepts/overview.rst

Scheduler and Executor Architecture

Related topics: Architecture Overview, Kubernetes Deployment

Apache Airflow's scheduling and execution system is a distributed architecture that coordinates the parsing of Directed Acyclic Graphs (DAGs), scheduling of task instances, and execution of tasks across worker nodes. This document provides a comprehensive overview of how the Scheduler and Executors interact to process and execute workflows.

Overview

The Scheduler and Executor architecture in Apache Airflow consists of several interconnected components that work together to transform DAG definitions into executed tasks. The system separates concerns between scheduling decisions (when to run tasks based on dependencies and timetable data) and execution (how and where tasks actually run).

graph TD
    A[DAG Files] --> B[DAG Processor]
    B --> C[DAG Parsing]
    C --> D[Serialized DAGs]
    D --> E[Metadata Database]
    E --> F[Scheduler]
    F --> G[Executor]
    G --> H[Workers]
    H --> I[Task Execution]
    I --> E

Scheduler Architecture

Scheduler Process

The Scheduler is a long-running daemon process that continuously monitors DAGs and schedules task instances for execution. It is started with the airflow scheduler command.

CLI Configuration:

The scheduler command is defined in cli_config.py and accepts multiple configuration parameters for controlling its behavior.

| Parameter | Purpose | Default |
| --- | --- | --- |
| --num-runs | Number of scheduler runs before exiting | -1 (infinite) |
| --only-idle | Only schedule DAGs with idle tasks | False |
| --pid | PID file location | None |
| --daemon | Run as daemon process | False |
| --stdout | stdout log file | None |
| --stderr | stderr log file | None |
| --log-file | Log file path | None |
| --skip-serve-logs | Skip serving logs | False |
| --dev | Development mode | False |

Sources: airflow-core/src/airflow/cli/cli_config.py

Job Lifecycle

The Scheduler creates and manages Jobs that represent its operational cycles. Each scheduler run follows a specific lifecycle that involves database interactions.

sequenceDiagram
    participant CLI as CLI Component
    participant JobRunner as JobRunner
    participant DB as Database
    participant TaskRunner as TaskRunner

    activate CLI
    CLI->>JobRunner: Create Job
    activate JobRunner
    JobRunner->>DB: Create Job Record
    activate DB
    DB-->>JobRunner: Job Created
    JobRunner->>DB: Create Session
    DB->>JobRunner: Session
    deactivate DB
    JobRunner->>CLI: Job Created
    deactivate JobRunner
    CLI->>JobRunner: Execute Job
    activate JobRunner
    par
        JobRunner->>DB: Schedule Tasks
        activate DB
        DB-->>JobRunner: Scheduled Tasks
        deactivate DB
    and
        JobRunner->>JobRunner: Process DAG Files
    end
    JobRunner->>DB: Perform Heartbeat
    activate DB
    DB->>JobRunner: Heartbeat Response
    JobRunner->>JobRunner: Heartbeat Callback
    DB-->>JobRunner: Close Session
    deactivate DB
    JobRunner->>CLI: Job Completed
    deactivate JobRunner
    deactivate CLI

Sources: airflow-core/src/airflow/jobs/JOB_LIFECYCLE.md

Scheduler Responsibilities

The Scheduler performs the following key operations:

  1. DAG Parsing: Reads DAG files and parses them into Python objects
  2. Task Scheduling: Determines which tasks are ready to execute based on dependencies
  3. DagRun Creation: Creates DagRun records for scheduled DAG runs
  4. TaskInstance Creation: Creates TaskInstance records for tasks to be executed
  5. Heartbeating: Maintains its presence and reports health to the database
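
The dependency check in step 2 can be illustrated with a minimal sketch. It is not Airflow's scheduler code: the `ready_tasks` helper, the `upstream` mapping, and the plain string states are assumptions made for illustration, and it only models the case where every upstream task must have succeeded.

```python
from typing import Dict, List, Set


def ready_tasks(upstream: Dict[str, Set[str]], states: Dict[str, str]) -> List[str]:
    """Return tasks with no recorded state whose upstream tasks all succeeded."""
    ready = []
    for task_id, deps in upstream.items():
        if task_id in states:
            continue  # already scheduled, running, or finished
        if all(states.get(dep) == "success" for dep in deps):
            ready.append(task_id)
    return sorted(ready)


# Hypothetical three-task pipeline: extract -> transform -> load
upstream = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
print(ready_tasks(upstream, {}))                      # ['extract']
print(ready_tasks(upstream, {"extract": "success"}))  # ['transform']
```

A failed upstream task never satisfies the check, so its downstream tasks are simply not returned here. Deciding what should happen to them is outside the scope of this sketch.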

Executor Architecture

Executors are responsible for the actual execution of tasks. Airflow supports multiple executor types, each with different deployment characteristics.

Executor Types

| Executor | Description | Use Case |
| --- | --- | --- |
| SequentialExecutor | Executes tasks sequentially in the same process | Development/Debugging |
| LocalExecutor | Executes tasks in parallel processes on a single machine | Single-node deployments |
| CeleryExecutor | Distributes tasks across multiple machines using Celery | Distributed production deployments |
| KubernetesExecutor | Creates pods per task in Kubernetes | Kubernetes-native deployments |
| LocalKubernetesExecutor | Hybrid of Local and Kubernetes executors | Testing/minimal Kubernetes |

Executor Loader

The ExecutorLoader class is responsible for loading and configuring executors based on Airflow configuration. It supports both simple executor names and complex module paths with optional aliases.

graph TD
    A[Executor Config] --> B{ExecutorLoader}
    B --> C{Check Format}
    C -->|Simple Name| D[Load Core Executor]
    C -->|Team:Executor| E[Parse Team Config]
    E --> F[Alias:Module/Name]
    F --> G[Resolve Module Path]
    D --> H[Return ExecutorName]
    G --> H

Executor Configuration Parsing:

The loader parses executor configurations in multiple formats:

  • Simple name: SequentialExecutor, LocalExecutor
  • Module path: airflow.executors.local_executor.LocalExecutor
  • With alias: MyAlias:LocalExecutor
  • Team-based: team_name:executor_name or team_name:alias:executor_name

Sources: airflow-core/src/airflow/executors/executor_loader.py
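
As a rough illustration of the simpler formats above, the following sketch splits an executor entry into an optional alias and a target. `ParsedExecutor` and `parse_executor_spec` are hypothetical names, not part of `ExecutorLoader`. The sketch reads a two-part entry as the alias form and rejects entries with more than one colon rather than guessing at team handling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedExecutor:
    """Hypothetical result type used only in this sketch."""
    alias: Optional[str]
    target: str            # short name or dotted module path
    is_module_path: bool


def parse_executor_spec(spec: str) -> ParsedExecutor:
    """Parse 'Name', 'pkg.module.Class', or 'Alias:<either of those>'."""
    spec = spec.strip()
    if not spec:
        raise ValueError("executor spec is empty")
    alias, sep, target = spec.rpartition(":")
    if ":" in alias:
        # Two or more colons would be a team-based entry, which this sketch skips.
        raise ValueError(f"team-based spec not handled here: {spec!r}")
    if sep and (not alias or not target):
        raise ValueError(f"malformed executor spec: {spec!r}")
    return ParsedExecutor(alias=alias or None, target=target, is_module_path="." in target)


print(parse_executor_spec("MyAlias:LocalExecutor"))
```

Treating any dotted target as a module path is a simplification chosen for the example; the real loader may apply stricter rules.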

Base Executor Interface

All executors inherit from BaseExecutor which defines the standard interface:

| Method | Purpose |
| --- | --- |
| execute_async() | Execute a single task |
| sync() | Sync state with metadata database |
| end() | Cleanup executor resources |
| terminate() | Force terminate all running tasks |
| try_adopt_task_instances() | Adopt orphaned task instances |
| render_slots() | Render available executor slots |

Local Executor

The Local Executor executes tasks in parallel worker processes on a single machine, providing a balance between simplicity and parallelism.

Key Features:

  • Configurable parallelism (number of parallel workers)
  • Supports task-level parallelism within a single node
  • Inherits configuration from core executor registry
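  • Can be configured with aliases for multi-team deployments

# Excerpt from the executor loader: registering a core executor for a team.
# cls.executors maps a short name such as "LocalExecutor" to its module path;
# module_or_name and team_name come from the configuration entry being parsed.
executor_names_per_team.append(
    ExecutorName(
        alias=None,
        module_path=cls.executors[module_or_name],
        team_name=team_name,
    )
)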

Sources: airflow-core/src/airflow/executors/local_executor.py

Task Scheduling Flow

The complete task scheduling flow involves multiple stages from DAG definition to task execution:

graph LR
    A[DAG File] --> B[DAG Processor]
    B --> C[Parse DAG]
    C --> D[Serialize DAG]
    D --> E[Store in DB]
    E --> F[Scheduler]
    F --> G[Evaluate Timetable]
    G --> H{Check Dependencies}
    H -->|Met| I[Create TaskInstance]
    H -->|Not Met| J[Skip]
    I --> K[Queue Task]
    K --> L[Executor]
    L --> M[Worker]
    M --> N[Execute Task]
    N --> O[Update State]
    O --> P[Record XCom]

DAG Processing

The DAG collection module handles parsing and synchronization of DAGs:

| Component | Responsibility |
| --- | --- |
| DagBag | Collection of parsed DAGs from file system |
| DagFileProcessor | Parses individual DAG files |
| DagFileProcessorAgent | Manages multiple DAG processors |
| SerializedDagModel | Database representation of serialized DAGs |

Sources: airflow-core/src/airflow/dag_processing/collection.py

Timetables

Timetables determine when DAGs should be triggered. They provide schedule information and calculate logical dates for DAG runs.

Key Timetable Methods:

| Method | Purpose |
| --- | --- |
| describe() | Human-readable schedule description |
| infer_data_interval() | Infer run boundaries from logical date |
| get_next_runtime() | Calculate next scheduled run time |
| validate() | Validate timetable configuration |

Sources: airflow-core/src/airflow/timetables/base.py
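
To make the idea of run boundaries concrete, here is a small fixed-interval calculation. It does not implement the timetable interface listed above: `next_data_interval` and its parameters are illustrative assumptions, and real timetables also handle catchup, time zones, and irregular schedules.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def next_data_interval(
    last_end: Optional[datetime], start: datetime, every: timedelta
) -> Tuple[datetime, datetime]:
    """Return (interval_start, interval_end) for the next run of a fixed schedule."""
    # The first run begins at the schedule's start; later runs begin
    # where the previous interval ended, so intervals never overlap.
    begin = start if last_end is None else last_end
    return begin, begin + every


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
first = next_data_interval(None, start, timedelta(days=1))
second = next_data_interval(first[1], start, timedelta(days=1))
print(first[0].date(), "->", first[1].date())    # 2024-01-01 -> 2024-01-02
print(second[0].date(), "->", second[1].date())  # 2024-01-02 -> 2024-01-03
```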

Task State Management

Task instances maintain state throughout their lifecycle. The TaskState class provides methods for managing task state via supervisor communications.

graph TD
    A[Task Starts] --> B[Scheduled]
    B --> C[Queued]
    C --> D[Started]
    D --> E{Ran Successfully?}
    E -->|Yes| F[Success]
    E -->|No| G{Retryable?}
    G -->|Yes| H[Up for Retry]
    H --> C
    G -->|No| I[Failed]
    F --> J[Emit XCom]
    I --> J
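
The diagram above can be read as a small state machine. The sketch below encodes its edges in a lookup table, using the diagram's node names in lowercase. `TRANSITIONS`, `advance`, and `outcome` are hypothetical helpers for illustration and are not Airflow's task state implementation.

```python
from typing import Dict, Set

# Allowed moves, named after the nodes in the diagram (lowercased).
TRANSITIONS: Dict[str, Set[str]] = {
    "scheduled": {"queued"},
    "queued": {"started"},
    "started": {"success", "up_for_retry", "failed"},
    "up_for_retry": {"queued"},
    "success": set(),
    "failed": set(),
}


def advance(current: str, new: str) -> str:
    """Move to `new` if the diagram allows it, otherwise raise ValueError."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


def outcome(ok: bool, attempts_used: int, max_attempts: int) -> str:
    """Pick the state that follows a finished attempt."""
    if ok:
        return "success"
    return "up_for_retry" if attempts_used < max_attempts else "failed"


# One failed attempt followed by a successful retry.
state = "scheduled"
for step in ("queued", "started", outcome(False, 1, 2), "queued", "started", outcome(True, 2, 2)):
    state = advance(state, step)
print(state)  # success
```

Keeping the edges in a table makes illegal moves, such as re-queuing a task that already succeeded, fail loudly instead of silently corrupting state.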

TaskState API

The TaskState class provides key-value storage for task state information:

| Method | Description |
| --- | --- |
| get(key) | Retrieve task state value by key |
| set(key, value) | Store task state value |
| delete(key) | Delete a specific key |
| clear(all_map_indices) | Clear all keys or map-index specific keys |

Sources: task-sdk/src/airflow/sdk/execution_time/context.py

DAG Runs and Task Instances

DagRun States

| State | Description |
| --- | --- |
| queued | Initial state when DAG run is created |
| running | DAG run is currently executing |
| success | All tasks in the DAG completed successfully |
| failed | DAG run failed due to task or system failure |

Run Types

| Type | Trigger Mechanism |
| --- | --- |
| scheduled | Triggered automatically by timetable |
| manual | Triggered by user action via CLI or UI |
| dataset | Triggered by dataset dependency |
| backfill | Triggered by explicit backfill command |

Sources: airflow-core/src/airflow/ui/src/pages/DagRuns.tsx

CLI Commands for Scheduler and Executor

Scheduler Commands

# Start the scheduler
airflow scheduler

# Start with specific number of runs
airflow scheduler --num-runs 10

# Run in daemon mode
airflow scheduler --daemon --log-file /path/to/scheduler.log

# Development mode
airflow scheduler --dev

DAG Processing Commands

# Trigger DAG run
airflow dags trigger <dag_id>

# Test task
airflow tasks test <dag_id> <task_id> <logical_date>

# Clear task instances
airflow tasks clear <dag_id>

# Render task template
airflow tasks render <dag_id> <task_id> <logical_date>

Sources: airflow-core/src/airflow/cli/cli_config.py

Configuration

Executor Configuration

Executors are configured in airflow.cfg or through environment variables:

[core]
executor = LocalExecutor

[celery]
celery_executor_config = ...
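
For the environment-variable route, Airflow reads settings from variables named `AIRFLOW__{SECTION}__{KEY}`, with double underscores as separators. The helper below is a small illustrative sketch (the function name `airflow_env_var` is an assumption, not an Airflow API) that derives the variable name for a given section and key.

```python
from typing import Dict


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name that overrides [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Environment overrides equivalent to the [core] entry shown above.
overrides: Dict[str, str] = {airflow_env_var("core", "executor"): "LocalExecutor"}
print(overrides)  # {'AIRFLOW__CORE__EXECUTOR': 'LocalExecutor'}
```

A dictionary like this could be merged into the environment of the process that launches an Airflow component, which is often more convenient than editing airflow.cfg in containerized deployments.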

Scheduler Configuration

| Configuration | Description | Default |
| --- | --- | --- |
| scheduler_num_runs | Number of scheduler runs | -1 (infinite) |
| scheduler_idle_sleep_time | Seconds between scheduler loops | 1 |
| num_runs | Alternative parameter for number of runs | -1 |
| only_idle | Only schedule idle DAGs | False |

Signal Handling

The scheduler supports the following signals for operational control:

| Signal | Action |
| --- | --- |
| SIGUSR2 | Dump a snapshot of task state being tracked by the executor |

Example usage:

pkill -f -USR2 "airflow scheduler"

Sources: airflow-core/src/airflow/cli/cli_config.py

Triggerer

The Triggerer is a separate daemon that manages lightweight asynchronous triggers for tasks that need to wait for external conditions:

airflow triggerer --capacity 1000 --queues queue1,queue2

| Parameter | Purpose |
| --- | --- |
| --capacity | Maximum concurrent triggers |
| --queues | Queues to consume from |
| --pid | PID file location |
| --daemon | Run as daemon |

DAG Processor

The DAG Processor parses and validates DAG files before they are scheduled:

airflow dag-processor --bundle-name <name> --num-runs 5

Summary

The Scheduler and Executor Architecture in Apache Airflow provides a robust, scalable system for orchestrating complex workflows. Key architectural principles include:

  1. Separation of Concerns: Scheduler handles scheduling decisions; Executors handle task execution
  2. Pluggable Executors: Multiple executor types support different deployment scenarios
  3. Database-Driven State: All state is persisted in the metadata database for durability
  4. Continuous Loop: Scheduler runs continuously, periodically evaluating DAGs and scheduling tasks
  5. Team-Based Execution: Modern Airflow supports team-based executor configuration with aliases

Sources: airflow-core/src/airflow/cli/cli_config.py

REST API

Related topics: User Interface, Architecture Overview

Apache Airflow provides a comprehensive REST API for programmatic interaction with the workflow automation platform. The API is built using FastAPI and enables external systems and applications to manage DAGs, trigger executions, monitor workflows, and interact with task instances.

Architecture Overview

Apache Airflow's REST API is architected as a multi-layer FastAPI application that separates concerns between the core API and the execution API.

graph TD
    subgraph "Client Layer"
        A[External Clients]
        B[CLI Tools]
        C[UI Dashboard]
    end
    
    subgraph "REST API Layer"
        D[Core API<br/>/api/v1]
        E[Execution API<br/>/execution]
    end
    
    subgraph "Service Layer"
        F[DAG Management Service]
        G[Task Execution Service]
        H[Configuration Service]
    end
    
    subgraph "Data Layer"
        I[Airflow Database]
        J[Metadata DB]
    end
    
    A --> D
    B --> D
    C --> D
    C --> E
    D --> F
    D --> H
    E --> G
    F --> I
    G --> I
    H --> J

Core API (`/api/v1`)

The Core API provides the primary interface for DAG management, monitoring, and administrative operations. It handles:

  • DAG run creation and management
  • Task instance operations
  • Connection and variable management
  • User and permission management
  • Plugin and provider information

Sources: airflow-core/src/airflow/api_fastapi/core_api/app.py

Execution API (`/execution`)

The Execution API is designed for lightweight, high-performance task execution operations. It provides endpoints for:

  • Task state updates
  • XCom value operations
  • Task heartbeat signals
  • Execution context retrieval

Sources: airflow-core/src/airflow/api_fastapi/execution_api/app.py

Authentication and Authorization

Airflow's REST API supports multiple authentication mechanisms through a pluggable auth manager architecture.

Authentication Flow

sequenceDiagram
    participant Client
    participant API as REST API
    participant Auth as Auth Manager
    participant DB as Database
    
    Client->>API: Request with credentials
    API->>Auth: Validate credentials
    Auth->>DB: Check user/permissions
    DB-->>Auth: User info + permissions
    Auth-->>API: Auth result
    alt Authentication Success
        API-->>Client: 200 + Response data
    else Authentication Failure
        API-->>Client: 401/403 + Error
    end

Simple Auth Manager

For development and testing environments, Airflow provides a Simple Auth Manager that supports basic authentication mechanisms.

Sources: airflow-core/src/airflow/api_fastapi/auth/managers/simple/simple_auth_manager.py

Access Control

The REST API implements granular access control through DAG-level permissions:

| Permission | Description |
| --- | --- |
| GET | Read access to DAG information |
| POST | Create/modify DAG resources |
| DELETE | Remove DAG resources |
| EDIT | Modify DAG configuration |

Access control is enforced through dependency injection on route handlers:

dependencies=[
    Depends(requires_access_dag("GET")),
    Depends(requires_access_dag("GET", DagAccessEntity.DEPENDENCIES)),
    Depends(requires_access_dag("GET", DagAccessEntity.TASK_INSTANCE)),
]

Sources: airflow-core/src/airflow/api_fastapi/core_api/routes/ui/structure.py

DAG Runs API

DAG Runs represent individual executions of a Directed Acyclic Graph (DAG). The DAG Runs API provides comprehensive endpoints for managing workflow executions.

DAG Run Data Model

| Field | Type | Description |
| --- | --- | --- |
| dag_id | string | Unique identifier for the DAG |
| run_id | string | Unique identifier for this execution |
| state | enum | Current state (queued, running, success, failed) |
| conf | object | Configuration passed to the DAG |
| logical_date | datetime | Scheduled execution time |
| start_date | datetime | Actual execution start time |
| end_date | datetime | Execution completion time |
| external_trigger | boolean | Whether triggered externally |

Sources: airflow-core/src/airflow/api_fastapi/core_api/datamodels/dag_run.py

Key Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /dags/{dag_id}/dagRuns | GET | List DAG runs with filtering |
| /dags/{dag_id}/dagRuns | POST | Trigger a new DAG run |
| /dags/{dag_id}/dagRuns/{run_id} | GET | Get specific DAG run details |
| /dags/{dag_id}/dagRuns/{run_id} | DELETE | Delete a DAG run |
| /dags/{dag_id}/dagRuns/{run_id}/clear | POST | Clear task instances |
| /dags/{dag_id}/dagRuns/{run_id}/confirm | POST | Confirm a DAG run |
| /dags/{dag_id}/dagRuns/{run_id}/update | PATCH | Update DAG run state |

Sources: airflow-core/src/airflow/api_fastapi/core_api/routes/ui/dag_runs.py

Query Parameters

The DAG Runs list endpoint supports extensive filtering:

| Parameter | Type | Description |
| --- | --- | --- |
| limit | integer | Maximum number of results (default: 100) |
| offset | integer | Pagination offset |
| order_by | string | Sort field |
| state | enum | Filter by state |
| dag_run_id | string | Filter by run ID |
| logical_date | datetime | Filter by logical date |
| start_date | datetime | Filter by start date |
| end_date | datetime | Filter by end date |
| include_upstream | boolean | Include upstream dependencies |
| include_downstream | boolean | Include downstream tasks |
| depth | integer | Tree depth limit |
| root | string | Root node filter |
| external_dependencies | boolean | Include external dependencies |
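
The `limit` and `offset` parameters lend themselves to a simple paging loop. The sketch below runs against an in-memory stand-in rather than a live server. The `iter_dag_runs` and `fake_fetch` names are hypothetical, and the `dag_runs` / `total_entries` response keys are assumptions about the shape of the list response.

```python
from typing import Callable, Dict, Iterator, List


def iter_dag_runs(fetch_page: Callable[[int, int], Dict], limit: int = 100) -> Iterator[Dict]:
    """Yield every DAG run by requesting pages of `limit` items until exhausted."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        runs: List[Dict] = page["dag_runs"]
        yield from runs
        offset += len(runs)
        # Stop on an empty page or once everything reported has been seen.
        if not runs or offset >= page["total_entries"]:
            break


def fake_fetch(limit: int, offset: int) -> Dict:
    """Stand-in for an HTTP call; serves five hypothetical runs."""
    all_runs = [{"dag_run_id": f"run_{i}"} for i in range(5)]
    return {"dag_runs": all_runs[offset:offset + limit], "total_entries": len(all_runs)}


print([run["dag_run_id"] for run in iter_dag_runs(fake_fetch, limit=2)])
# ['run_0', 'run_1', 'run_2', 'run_3', 'run_4']
```

Passing the fetch function in as an argument keeps the paging logic testable without network access; a real client would supply a function that issues the GET request.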

API Structure and Organization

Route Organization

The REST API routes are organized by functional area within the FastAPI application structure:

graph LR
    subgraph "/api/v1"
        A[UI Routes]
        B[DAG Routes]
        C[Task Routes]
        D[Connection Routes]
        E[Variable Routes]
        F[Plugin Routes]
    end
    
    subgraph "/execution"
        G[Task Execution]
        H[XCom Operations]
    end

Response Models

All API endpoints return structured responses using Pydantic models. Responses include:

  • Data: The requested resource or operation result
  • Meta: Pagination information and metadata
  • Links: HATEOAS-style navigation links

Error Handling

The API implements consistent error handling with structured error responses:

| Status Code | Category |
| --- | --- |
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Authentication required |
| 403 | Forbidden - Insufficient permissions |
| 404 | Not Found - Resource doesn't exist |
| 409 | Conflict - State conflict |
| 500 | Internal Server Error |
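
A client can turn these status codes into exceptions with a small lookup. This is a sketch under stated assumptions: `ApiError`, `ERROR_CATEGORIES`, and `check_status` are hypothetical names, and the structure of the error body is not modeled.

```python
from typing import Dict

# Categories copied from the table above.
ERROR_CATEGORIES: Dict[int, str] = {
    400: "Bad Request",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
    409: "Conflict",
    500: "Internal Server Error",
}


class ApiError(Exception):
    """Raised for any non-2xx response in this sketch."""

    def __init__(self, status: int, category: str) -> None:
        super().__init__(f"{status} {category}")
        self.status = status
        self.category = category


def check_status(status: int) -> None:
    """Return quietly for 2xx codes, otherwise raise ApiError."""
    if 200 <= status < 300:
        return
    raise ApiError(status, ERROR_CATEGORIES.get(status, "Unexpected status"))


try:
    check_status(409)
except ApiError as exc:
    print(exc)  # 409 Conflict
```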

Security Considerations

Session Management

The REST API integrates with Airflow's session management system. When authentication is enabled:

  1. Clients must authenticate to receive a session cookie
  2. Subsequent requests include the session cookie
  3. Sessions expire based on configuration settings

Role-Based Access Control (RBAC)

The API supports RBAC through integration with the auth manager:

  • Admin: Full access to all resources
  • Op: Access to DAG operations
  • User: Read access to DAGs, limited write access
  • Viewer: Read-only access

Configuration

Enabling the REST API

The REST API is enabled by default when Airflow is configured with a supported auth manager. Key configuration options:

| Option | Default | Description |
| --- | --- | --- |
| auth_manager | simple | Authentication backend |
| api_url | - | Base URL for API endpoints |
| secret_key | - | Session encryption key |

CORS Configuration

Cross-Origin Resource Sharing (CORS) can be configured to allow web clients to access the API:

| Setting | Description |
| --- | --- |
| access_control_allow_origins | Allowed origins |
| access_control_allow_methods | Allowed HTTP methods |
| access_control_allow_headers | Allowed headers |

API Versioning

Apache Airflow maintains API stability through versioning:

  • Current Version: /api/v1
  • Version Prefix: All endpoints are prefixed with the version
  • Stability Guarantee: Within a major version, breaking changes are avoided

Sources: airflow-core/docs/stable-rest-api-ref.rst

Usage Examples

Triggering a DAG Run

curl -X POST "http://airflow-server:8080/api/v1/dags/my_dag/dagRuns" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "dag_run_id": "manual_run_001",
    "logical_date": "2024-01-15T10:00:00Z",
    "conf": {"key": "value"}
  }'

Listing DAG Runs

curl "http://airflow-server:8080/api/v1/dags/my_dag/dagRuns?state=running&limit=10" \
  -H "Authorization: Bearer <token>"
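
The two curl calls above can also be prepared from Python. The sketch below only builds `urllib.request.Request` objects and does not send them. The helper names are illustrative, and the base URL, payload fields, and bearer-token header are copied from the curl examples rather than verified against a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://airflow-server:8080/api/v1"  # taken from the curl examples


def build_trigger_request(dag_id: str, token: str, dag_run_id: str,
                          logical_date: str, conf: dict) -> Request:
    """Prepare the POST that triggers a DAG run (not sent here)."""
    body = json.dumps(
        {"dag_run_id": dag_run_id, "logical_date": logical_date, "conf": conf}
    ).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/dags/{dag_id}/dagRuns",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    )


def build_list_request(dag_id: str, token: str, **filters: object) -> Request:
    """Prepare the GET that lists DAG runs, encoding filters as a query string."""
    query = urlencode(filters)
    return Request(
        url=f"{BASE_URL}/dags/{dag_id}/dagRuns?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


trigger = build_trigger_request(
    "my_dag", "example-token", "manual_run_001", "2024-01-15T10:00:00Z", {"key": "value"}
)
listing = build_list_request("my_dag", "example-token", state="running", limit=10)
print(trigger.get_method(), trigger.full_url)
print(listing.get_method(), listing.full_url)
```

Either request could then be passed to `urllib.request.urlopen` once a real host and token are available.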

Sources: airflow-core/src/airflow/api_fastapi/core_api/app.py

User Interface

Related topics: REST API, Core Concepts

Overview

Apache Airflow's User Interface (UI) is a modern web-based frontend built with React and TypeScript that provides comprehensive workflow management, monitoring, and operational capabilities. The UI serves as the primary visual interface for data engineers and operators to interact with DAGs, task instances, DAG runs, connections, and various Airflow components.

The UI framework is organized under airflow-core/src/airflow/ui/ and leverages contemporary React patterns including component composition, lazy loading, and type-safe TypeScript implementations.

Sources: airflow-core/docs/ui.rst:1-50

Architecture Overview

Technology Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Framework | React 18+ | UI component library |
| Language | TypeScript | Type safety and better DX |
| State Management | TanStack Query / React Query | Server state management |
| Routing | React Router v7 | Client-side routing |
| Styling | Chakra UI | Component library and styling |
| Build Tool | Vite | Fast development and building |
| Icons | Lucide React | Consistent iconography |

Sources: airflow-core/src/airflow/ui/package.json:1-30

Directory Structure

airflow/
├── ui/
│   ├── src/
│   │   ├── layouts/          # Page layout components
│   │   │   └── Details/      # Detail view layouts
│   │   │       ├── Grid/     # Grid view for DAGs
│   │   │       └── Graph/    # Graph view for DAGs
│   │   ├── pages/            # Page components
│   │   │   ├── Dashboard/    # Dashboard pages
│   │   │   ├── Dag/          # DAG detail pages
│   │   │   ├── Run/          # Run detail pages
│   │   │   └── Connections/  # Connection management
│   │   ├── components/       # Reusable UI components
│   │   │   ├── Clear/        # Clear action components
│   │   │   └── TriggerDag/   # DAG triggering components
│   │   └── router.tsx        # Application routing configuration
│   └── package.json
└── providers/
    └── edge3/                # Edge provider with custom UI
        └── plugins/www/src/
            ├── pages/        # Edge-specific pages
            └── components/   # Edge-specific components

Routing System

The UI uses React Router v7 for client-side navigation. Routes are defined in router.tsx and organized into two main categories: main application routes and detail views.

graph TD
    A["/"] --> B["Dashboard"]
    A --> C["DAGs List"]
    A --> D["DAG Detail"]
    D --> D1["Grid View"]
    D --> D2["Graph View"]
    D --> D3["Overview"]
    A --> E["DAG Runs"]
    A --> F["Connections"]
    A --> G["Providers"]
    
    style A fill:#e1f5fe
    style D1 fill:#fff3e0
    style D2 fill:#e8f5e9
    style D3 fill:#f3e5f5

Core Route Definitions

The router configuration maps URL paths to page components and handles lazy loading for improved performance:

// Simplified route structure from router.tsx
routes = [
  { path: "/", component: Dashboard },
  { path: "/dags", component: DagList },
  { path: "/dags/:dagId", component: DagDetail },
  { path: "/dags/:dagId/runs", component: DagRuns },
  { path: "/runs/:dagRunId", component: RunDetail },
  { path: "/connections", component: Connections },
]

Page Components

Dashboard

The Dashboard (Dashboard.tsx) serves as the landing page, providing an overview of the Airflow environment. It typically includes:

  • DAG statistics and health metrics
  • Recent DAG runs
  • Failed or stuck tasks requiring attention
  • Quick access to frequently used DAGs

Sources: airflow-core/src/airflow/ui/src/pages/Dashboard/Dashboard.tsx:1-50

DAG Detail Views

DAG detail pages provide comprehensive views of individual DAGs with multiple visualization modes:

#### Grid View

The Grid view (Grid.tsx) displays DAG tasks in a tabular format, allowing users to:

  • View task states across multiple DAG runs
  • Navigate through historical runs
  • Identify failed or running tasks quickly

graph LR
    subgraph "Grid View Structure"
        H["Header: DAG Info"]
        T["Task Grid Table"]
        F["Filter/Sort Controls"]
    end
    
    T --> T1["Task Columns"]
    T --> T2["Run Rows"]
    T1 --> Cell["State Cell"]

Sources: airflow-core/src/airflow/ui/src/layouts/Details/Grid/Grid.tsx:1-80

#### Graph View

The Graph view (Graph.tsx) renders the DAG structure visually as a directed acyclic graph, showing:

  • Task dependencies and relationships
  • Task execution states with color coding
  • Interactive node selection and navigation

Sources: airflow-core/src/airflow/ui/src/layouts/Details/Graph/Graph.tsx:1-80

#### Overview Page

The Overview page (Overview.tsx) provides a comprehensive dashboard for a specific DAG:

interface OverviewComponents {
  dagStats: DagStats;           // DAG statistics
  failedRuns: FailedRuns;       // Failed run alerts
  durationChart: DurationChart; // Duration visualization
  assetEvents: AssetEvents;     // Asset event tracking
  dagDeadlines: DagDeadlines;    // Deadline management
  failedLogs: FailedLogs;        // Failed task logs
}

Sources: airflow-core/src/airflow/ui/src/pages/Dag/Overview/Overview.tsx:1-100

DAG Runs Page

The DAG Runs page (DagRuns.tsx) displays a table of all DAG runs with the following columns:

| Column | Description | Features |
| --- | --- | --- |
| DAG Run ID | Unique identifier | Link to run detail |
| State | Current state | Colored badge |
| Run Type | Scheduled, manual, etc. | Icon + text |
| Run After | Scheduled execution time | Time component |
| Triggering User | User who triggered | Username display |
| Start Date | Execution start time | Timestamp |
| End Date | Execution end time | Timestamp |
| Duration | Total execution time | Calculated field |
| DAG Versions | Version tracking | Version badges |

Sources: airflow-core/src/airflow/ui/src/pages/DagRuns.tsx:1-100

Dialog Components

Dialogs are used throughout the UI for focused interactions, confirmations, and detailed forms.

Confirmation Dialogs

The ClearTaskInstanceConfirmationDialog.tsx demonstrates the dialog pattern used for critical operations:

<Dialog.Root lazyMount onOpenChange={onClose} open={open} size="xl">
  <Dialog.Content backdrop>
    <Dialog.Header>
      <Dialog.Title>
        <Icon color="tomato"><GoAlertFill /></Icon>
        {translate("dags:runAndTaskActions.confirmationDialog.title")}
      </Dialog.Title>
    </Dialog.Header>
    <Dialog.Body>
      {/* Confirmation details */}
    </Dialog.Body>
    <Dialog.Footer>
      <Button onClick={onClose}>{translate("common:modal.confirm")}</Button>
    </Dialog.Footer>
  </Dialog.Content>
</Dialog.Root>

Key characteristics:

  • lazyMount: Content renders only when opened
  • unmountOnExit: Complete cleanup when closed
  • backdrop: Modal overlay for focus
  • size variants: sm, md, lg, xl for different content types

Sources: airflow-core/src/airflow/ui/src/components/Clear/TaskInstance/ClearTaskInstanceConfirmationDialog.tsx:1-60

Edit Dialogs

The EditConnectionButton.tsx demonstrates dialog usage for editing forms:

<Dialog.Root lazyMount onOpenChange={handleClose} open={open} size="xl" unmountOnExit>
  <Dialog.Content backdrop>
    <Dialog.Header>
      <Heading size="xl">{translate("connections.edit")}</Heading>
    </Dialog.Header>
    <Dialog.Body>
      <ConnectionForm
        error={error}
        initialConnection={initialConnectionValue}
        isEditMode={true}
        isPending={isPending}
        mutateConnection={editConnection}
      />
    </Dialog.Body>
  </Dialog.Content>
</Dialog.Root>

Sources: airflow-core/src/airflow/ui/src/pages/Connections/EditConnectionButton.tsx:1-50

Component Library

Clear Actions

The UI provides comprehensive task and group clearing functionality:

graph TD
    A["Clear Action Trigger"] --> B{"Single vs Group"}
    B -->|Single| C["ClearTaskInstanceDialog"]
    B -->|Group| D["ClearGroupTaskInstanceDialog"]
    
    C --> E["Confirmation Dialog"]
    D --> F["Options Selection"]
    F --> F1["Past Tasks"]
    F --> F2["Future Tasks"]
    F --> F3["Upstream"]
    F --> F4["Downstream"]
    F --> F5["Only Failed"]
    
    E --> G["Action Accordion"]
    F --> G
    G --> H["API Execution"]

The task-instance clear components live under airflow-core/src/airflow/ui/src/components/Clear/TaskInstance/.

Trigger DAG Components

The TriggerDAGAdvancedOptions.tsx provides advanced options when triggering DAGs:

| Option | Purpose |
| --- | --- |
| dagRunId | Custom run identifier |
| partitionKey | Partition-based execution |
| note | Execution notes/documentation |

<Controller
  control={control}
  name="dagRunId"
  render={({ field }) => (
    <Field.Root mt={6} orientation="horizontal">
      <Field.Label fontSize="md" style={{ flexBasis: "30%" }}>
        {translate("runId")}
      </Field.Label>
      <Stack css={{ flexBasis: "70%" }}>
        <Input {...field} size="sm" />
        <Field.HelperText>{translate("components:triggerDag.runIdHelp")}</Field.HelperText>
      </Stack>
    </Field.Root>
  )}
/>

Sources: airflow-core/src/airflow/ui/src/components/TriggerDag/TriggerDAGAdvancedOptions.tsx:1-80

Edge Provider UI

The Edge provider (providers/edge3/) extends the base UI with worker-specific pages and components.

Worker Management Page

The WorkerPage.tsx provides a table-based interface for managing edge workers:

graph LR
    subgraph "Worker Page Structure"
        T["Worker Table"]
        H["Header Actions"]
        F["Filter Controls"]
    end
    
    T --> C1["Worker Name"]
    T --> C2["Queues"]
    T --> C3["Active Jobs"]
    T --> C4["System Info"]
    T --> C5["Operations"]

Worker operations include:

  • Delete: Available for offline, unknown, or offline maintenance states
  • Shutdown: Available for idle, running, maintenance states
  • Enter Maintenance: Set worker to maintenance mode with comment
  • Exit Maintenance: Remove worker from maintenance mode

Sources: providers/edge3/src/airflow/providers/edge3/plugins/www/src/pages/WorkerPage.tsx:1-100

Worker Operation Dialogs

Bulk operations are supported via BulkWorkerOperations.tsx:

<Dialog.Root>
  <Portal>
    <Dialog.Backdrop />
    <Dialog.Positioner>
      <Dialog.Content>
        <Dialog.Header>
          <Dialog.Title>
            Delete {deleteWorkers.length} selected worker(s)
          </Dialog.Title>
        </Dialog.Header>
        <Dialog.Body>
          <List.Root>
            {deleteWorkers.map((worker) => (
              <List.Item key={worker.worker_name}>{worker.worker_name}</List.Item>
            ))}
          </List.Root>
        </Dialog.Body>
        <Dialog.Footer>
          <Button colorPalette="danger" loading={isBulkDeletePending}>
            Delete Workers
          </Button>
        </Dialog.Footer>
      </Dialog.Content>
    </Dialog.Positioner>
  </Portal>
</Dialog.Root>

Internationalization

The UI uses i18n (internationalization) patterns for all user-facing text:

// Translation usage example
translate("dagRun.runAfter")      // Column headers
translate("dags:runAndTaskActions.confirmationDialog.title")
translate("common:modal.confirm")
translate("taskInstance", { count: affectedTasks.total_entries })

Translation keys are organized by:

  • common: Shared UI elements
  • components: Component-specific strings
  • dags: DAG-related UI strings
  • dagRun: DAG run specific strings

State Management and Data Fetching

The UI uses TanStack Query (React Query) for server state management:

// Typical data fetching pattern
const { data, isLoading, error } = useQuery({
  queryKey: ['dagRuns', dagId, page],
  queryFn: () => fetchDagRuns(dagId, page),
});

Key patterns:

  • Optimistic updates: Immediate UI feedback during mutations
  • Lazy loading: Components render only when needed
  • Error boundaries: Graceful error handling with ErrorAlert components
  • Loading states: Skeleton loaders for better UX

Pagination

Paginated lists are implemented consistently throughout the UI:

<Pagination.Root
  count={data?.total_entries ?? 0}
  onPageChange={(event) => setPage(event.page)}
  page={page}
  pageSize={PAGE_LIMIT}
>
  <HStack>
    <Pagination.PrevTrigger />
    <Pagination.Items />
    <Pagination.NextTrigger />
  </HStack>
</Pagination.Root>

Standard pagination constants:

  • PAGE_LIMIT: Default items per page (typically 25-50)
  • total_entries: Total count from API response

Summary

Apache Airflow's User Interface is a comprehensive React-based frontend that provides:

  1. Multi-view DAG Visualization: Grid, Graph, and Overview views for different use cases
  2. Comprehensive Task Management: Clear, trigger, and manage task instances
  3. Connection Management: Visual interface for managing Airflow connections
  4. Edge Worker Control: Dedicated UI for edge worker deployment management
  5. Consistent Component Patterns: Reusable dialogs, tables, and forms
  6. Internationalization: Full i18n support for global deployments
  7. Type Safety: Full TypeScript coverage for reliable development

The UI architecture emphasizes modularity, lazy loading, and consistent patterns to ensure maintainability and performance across large-scale Airflow deployments.

Sources: airflow-core/src/airflow/ui/package.json:1-30

Data Flow and State Management

Related topics: Core Concepts, Connections and Variables

Overview

Data Flow and State Management in Apache Airflow encompasses the mechanisms by which tasks communicate, share state, and coordinate execution across a Directed Acyclic Graph (DAG). Airflow provides several interconnected systems to manage how data moves between tasks, how task states are tracked, and how assets trigger workflow execution.

The core components involved in data flow include XCom (Cross-Communication), Assets, and Task State management. These systems work together to enable complex workflows where tasks can pass data, react to external events, and maintain execution context throughout the DAG lifecycle.

XCom (Cross-Communication)

Purpose and Role

XCom is Airflow's primary mechanism for inter-task communication. It allows tasks to push and pull values that can be used by downstream tasks in the same DAG. XComs are stored in the Airflow metadata database and can contain any serializable Python object.

XCom Data Model

| Attribute | Type | Description |
| --- | --- | --- |
| key | String | Identifier for the XCom value |
| value | Any | The actual data being passed |
| task_id | String | Task that produced the XCom |
| dag_id | String | DAG containing the task |
| run_id | String | DAG run identifier |
| map_index | Integer | Index for mapped tasks (-1 for non-mapped) |
| timestamp | DateTime | When the XCom was created |
| connection_id | String | Optional connection for large data |
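
The identifying attributes in the table can be read as a composite lookup key. The following is a minimal sketch of that idea using a plain dictionary; `XComKey` and `InMemoryXComStore` are hypothetical names invented for illustration and are not Airflow classes, and the real storage is a database table rather than an in-memory mapping.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class XComKey:
    """Identifies one XCom row; mirrors the identifying attributes in the table."""

    dag_id: str
    run_id: str
    task_id: str
    key: str = "return_value"
    map_index: int = -1  # -1 means the task is not mapped


@dataclass
class InMemoryXComStore:
    """Hypothetical stand-in for the XCom table, for illustration only."""

    rows: dict[XComKey, tuple[Any, datetime]] = field(default_factory=dict)

    def push(self, xcom_key: XComKey, value: Any) -> None:
        # Pushing again with the same identifiers overwrites the earlier value.
        self.rows[xcom_key] = (value, datetime.now(timezone.utc))

    def pull(self, xcom_key: XComKey, default: Any = None) -> Any:
        entry = self.rows.get(xcom_key)
        return default if entry is None else entry[0]


store = InMemoryXComStore()
producer = XComKey(dag_id="etl", run_id="manual__2024-01-01", task_id="process_data")
store.push(producer, {"processed": True, "count": 42})

pulled = store.pull(producer)
missing = store.pull(XComKey("etl", "manual__2024-01-01", "other_task"))
```

Because the key includes `run_id` and `map_index`, values from different DAG runs or different mapped instances of the same task do not collide.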

XCom API Operations

#### Push (Sending XCom Values)

Tasks can push XCom values using the xcom_push() method:

task_instance.xcom_push(key="result", value={"data": "example"})

Or implicitly by returning a value from a task:

@task
def process_data():
    return {"processed": True, "count": 42}

#### Pull (Receiving XCom Values)

Downstream tasks can retrieve XCom values using xcom_pull():

# A single task id returns that task's value; returned values use the "return_value" key
upstream_result = ti.xcom_pull(task_ids="process_data", key="return_value")

XComArg

XComArg provides a more declarative way to reference XCom values. It wraps the operator that produces a value, so the reference can be passed directly to downstream tasks, which also registers the dependency between them:

from airflow.models.xcom_arg import XComArg

# process_task is the operator (task) object whose return value is referenced
processed_data = XComArg(process_task)  # equivalent to process_task.output

XComArg accepts the following parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| operator | Operator | Task whose XCom is referenced |
| key | String | XCom key to retrieve (defaults to return_value) |

Asset Management

Asset Model

Assets represent data sources or sinks that Airflow can monitor. They enable event-driven scheduling, where DAGs are triggered when the underlying data changes.

| Attribute | Type | Description |
| --- | --- | --- |
| name | String | Human-readable asset name |
| uri | String | Unique identifier (URN, path, etc.) |
| group | String | Logical grouping of assets |
| extra | Dict | Additional metadata |
| created_at | DateTime | Creation timestamp |
| updated_at | DateTime | Last modification timestamp |

Asset States

Asset states track the lifecycle and availability of assets:

class AssetState:
    """Represents the state of an asset in the system."""
    
    IDLE = "idle"
    SCHEDULED = "scheduled"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"

#### State Transitions

graph TD
    A[IDLE] -->|Asset referenced| B[SCHEDULED]
    B -->|Scheduler picks up| C[RUNNING]
    C -->|Success| D[SUCCESS]
    C -->|Failure| E[FAILED]
    D -->|Next schedule| B
    E -->|Retry| B

Task State Management

Task States

Task instances progress through a defined set of states:

| State | Description |
| --- | --- |
| none | Task has not been triggered |
| queued | Task is queued for execution |
| running | Task is currently executing |
| success | Task completed successfully |
| failed | Task failed during execution |
| upstream_failed | Upstream task dependency failed |
| skipped | Task was skipped due to branching |
| up_for_retry | Task failed but will be retried |
| up_for_reschedule | Task is waiting for reschedule |

TaskState Model

class TaskState:
    """Tracks the current state and context of a task instance."""
    
    task_id: str
    dag_id: str
    run_id: str
    state: TaskInstanceState
    try_number: int
    max_tries: int
    start_date: Optional[datetime]
    end_date: Optional[datetime]
    duration: Optional[float]
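
To make the role of `try_number`, `max_tries`, and the date fields concrete, here is a small sketch that derives a duration and picks the state after a failure. `TaskStateSketch` is a hypothetical class written for this example, and the retry rule it applies (retry while `try_number` has not exceeded `max_tries`) is an assumption about the intended semantics rather than a copy of Airflow's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TaskStateSketch:
    task_id: str
    state: str
    try_number: int
    max_tries: int
    start_date: Optional[datetime] = None
    end_date: Optional[datetime] = None

    @property
    def duration(self) -> Optional[float]:
        """Seconds between start and end, or None while either is missing."""
        if self.start_date is None or self.end_date is None:
            return None
        return (self.end_date - self.start_date).total_seconds()

    def state_after_failure(self) -> str:
        # Assumption: attempts remain while try_number has not exceeded max_tries.
        return "up_for_retry" if self.try_number <= self.max_tries else "failed"


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
first_attempt = TaskStateSketch(
    "load", "running", try_number=1, max_tries=2,
    start_date=start, end_date=start + timedelta(seconds=90),
)
last_attempt = TaskStateSketch("load", "running", try_number=3, max_tries=2)
```

Keeping `duration` as a derived property avoids storing a value that can drift out of sync with the two timestamps.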

State Persistence

Task states are persisted to the metadata database, enabling:

  • Recovery from scheduler restarts
  • Historical execution tracking
  • DAG run state reconstruction
  • SLA monitoring and alerting

Data Flow Architecture

Task Execution Flow

graph TD
    A[DAG Trigger] --> B[Scheduler]
    B --> C{Dependency Check}
    C -->|All met| D[Queue Task]
    C -->|Not met| E[Wait]
    D --> F[Executor]
    F --> G[Worker]
    G --> H[Execute Task]
    H --> I{Push XCom}
    I -->|Yes| J[Store in DB]
    J --> K[Task Complete]
    I -->|No| K
    K --> L[Update TaskState]
    L --> M[Check Downstream]
    M --> C
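
The dependency-check loop in the diagram can be sketched as a toy scheduler that repeatedly picks every task whose upstream tasks have finished. This is a simplified illustration only: `run_in_dependency_order` is a hypothetical helper, tasks "run" instantly, and the executor, workers, and XCom steps are collapsed into a single line.

```python
def run_in_dependency_order(upstreams: dict[str, set[str]]) -> list[str]:
    """Return task ids in the order a toy scheduler would run them.

    `upstreams` maps each task id to the set of task ids it depends on.
    """
    done: list[str] = []
    pending = dict(upstreams)
    while pending:
        # Dependency check: a task is ready once all of its upstreams are done.
        ready = sorted(t for t, deps in pending.items() if deps <= set(done))
        if not ready:
            raise ValueError("cycle or missing upstream; a DAG must be acyclic")
        for task_id in ready:
            done.append(task_id)  # stands in for queue -> execute -> update state
            del pending[task_id]
    return done


order = run_in_dependency_order({
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
})
```

Tasks that become ready in the same pass (`transform` and `validate` here) are independent of each other, which is what allows a real executor to run them in parallel.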

XCom Storage Flow

graph LR
    A[Task Instance] -->|xcom_push| B[XCom Table]
    C[Task Instance] -->|xcom_pull| B
    B -->|query| D[Metadata DB]
    D -->|result| C

Serialization and Deserialization

XComArg Serialization

When DAGs are serialized (for example, for the webserver or worker), XComArg references must be properly handled:

class XComArgBase:
    """Base class for serializable XCom arguments."""
    
    def serialize(self) -> dict:
        """Convert XComArg to dictionary format."""
        return {
            "task_id": self.task_id,
            "dag_id": self.dag_id,
            "key": self.key,
            "map_index": self.map_index,
        }
    
    @staticmethod
    def deserialize(data: dict) -> "XComArgBase":
        """Reconstruct XComArg from dictionary."""
        return XComArg(**data)
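
A round trip through the dictionary form shows what the serialize and deserialize pair has to guarantee: the restored reference must equal the original. The sketch below uses a hypothetical `XComRef` dataclass with the same four fields as the dictionary above; it is not Airflow's serializer, and JSON is assumed as the wire format purely for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class XComRef:
    """Hypothetical, simplified reference to another task's XCom."""

    task_id: str
    key: str = "return_value"
    dag_id: Optional[str] = None
    map_index: Optional[int] = None

    def serialize(self) -> dict:
        return asdict(self)

    @classmethod
    def deserialize(cls, data: dict) -> "XComRef":
        return cls(**data)


original = XComRef(task_id="process_data")
wire_format = json.dumps(original.serialize())  # what would be stored or sent
restored = XComRef.deserialize(json.loads(wire_format))
```

Only identifiers are serialized, never the XCom value itself, so the serialized DAG stays small regardless of how much data the tasks exchange.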

Best Practices

Data Flow Design

  1. Minimize XCom payload size - Large XCom values impact database performance
  2. Use Assets for external data - Let the scheduler handle dependency tracking
  3. Prefer pull over push patterns - Downstream tasks should pull required data
  4. Clear XCom when unnecessary - Remove XCom rows that are no longer needed (for example with airflow db clean) to prevent accumulation

State Management

  1. Monitor task durations - Track state transitions for performance analysis
  2. Configure appropriate retries - Set retries and retry_delay based on task reliability
  3. Use SLA alerts - Configure sla parameter for critical task deadlines
  4. Clean up failed states - Implement proper error handling and state recovery

Configuration Options

| Parameter | Default | Description |
| --- | --- | --- |
| xcom_pickle_tasks_on_error | False | Serialize task data to XCom on error |
| max_xcom_size | 1048576 | Maximum XCom value size in bytes |
| xcom_backend | airflow.models.xcom.BaseXCom | Custom XCom backend class |
| enable_asset_uri_validation | True | Validate asset URIs on creation |

Source: https://github.com/apache/airflow / Human Manual

Connections and Variables

Related topics: Data Flow and State Management, Docker and Helm Deployment

Overview

Connections and Variables are two fundamental abstractions in Apache Airflow for managing configuration and external system integrations. They enable DAGs to store, retrieve, and utilize configuration data and credentials without hardcoding sensitive information.

Connections provide a secure way to store and manage credentials for external systems such as databases, APIs, cloud services, and file systems. They centralize authentication configuration in one place, making it easy to manage, update, and audit access to external resources.

Variables provide a simple key-value store for storing arbitrary configuration data that can be accessed across DAGs and tasks. They are useful for storing environment-specific settings, feature flags, and general configuration values.

Both features support multiple backend storage mechanisms and can be managed through the Airflow UI, CLI, or programmatically via the API.

Connections

Purpose and Scope

Connections in Airflow encapsulate all the information needed to connect to an external system. Each connection includes:

  • A unique connection identifier (conn_id)
  • Connection type (conn_type) specifying the external system category
  • Host, port, login, and password fields
  • Schema and extra parameters for additional configuration

Connections are stored in the Airflow metadata database by default but can also be backed by secrets backends such as HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or Google Cloud Secret Manager.
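
Because a connection is essentially a set of URL-like fields, it can also be written as a single URI. The sketch below splits such a URI into the fields listed above using only the standard library. `parse_connection_uri` is a hypothetical helper, the host name and credentials are made up, and the parsing is deliberately simplified compared with what Airflow itself does.

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def parse_connection_uri(uri: str) -> dict:
    """Split a connection URI into connection fields (simplified)."""
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Credentials are percent-encoded inside a URI, so decode them.
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "schema": parts.path.lstrip("/") or None,
        "extra": dict(parse_qsl(parts.query)),
    }


fields = parse_connection_uri(
    "postgres://etl_user:s%40cret@db.example.com:5432/analytics?sslmode=require"
)
```

Special characters in passwords must be percent-encoded in the URI form (`%40` for `@` here); otherwise the host portion is parsed incorrectly.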

Connection Types

Airflow supports numerous connection types through provider packages. The core supported types include:

| Connection Type | Description | Default Port |
| --- | --- | --- |
| facebook_social | Facebook OAuth/social authentication | N/A |
| fs | Filesystem connection | N/A |
| ftp | FTP/SFTP server | 21 |
| google_cloud_platform | Google Cloud Platform services | N/A |
| gremlin | Apache TinkerPop Gremlin server | 8182 |
| hive_cli | Hive command-line interface | 10000 |
| hiveserver2 | HiveServer2 JDBC interface | 10000 |
| http | HTTP/HTTPS endpoints | 443 |
| iceberg | Apache Iceberg catalog | N/A |

Sources: airflow-core/src/airflow/utils/db.py:200-260

Connection Management via CLI

The Airflow CLI provides comprehensive commands for managing connections:

# List all connections
airflow connections list

# Get a specific connection
airflow connections get <conn_id>

# Add a new connection
airflow connections add <conn_id> --conn-type http --conn-host https://api.example.com

# Delete a connection
airflow connections delete <conn_id>

# Export all connections
airflow connections export /tmp/connections.json

# Import connections from file
airflow connections import /tmp/connections.json

# Test a connection
airflow connections test <conn_id>

The connection commands are defined in the CLI configuration with lazy-loaded implementations:

CONNECTIONS_COMMANDS = (
    ActionCommand(name="get", func=lazy_load_command("airflow.cli.commands.connection_command.connections_get"), ...),
    ActionCommand(name="list", func=lazy_load_command("airflow.cli.commands.connection_command.connections_list"), ...),
    ActionCommand(name="add", func=lazy_load_command("airflow.cli.commands.connection_command.connections_add"), ...),
    ActionCommand(name="delete", func=lazy_load_command("airflow.cli.commands.connection_command.connections_delete"), ...),
    ActionCommand(name="export", func=lazy_load_command("airflow.cli.commands.connection_command.connections_export"), ...),
    ActionCommand(name="import", func=lazy_load_command("airflow.cli.commands.connection_command.connections_import"), ...),
    ActionCommand(name="test", func=lazy_load_command("airflow.cli.commands.connection_command.connections_test"), ...),
)

Sources: airflow-core/src/airflow/cli/cli_config.py:1-50

Connection Storage Architecture

graph TD
    A[Airflow CLI/UI/API] --> B[Connection CRUD Operations]
    B --> C{Secrets Backend}
    C -->|None/Default| D[Airflow Metadata Database]
    C -->|HashiCorp Vault| E[Vault Secrets]
    C -->|AWS| F[AWS Secrets Manager]
    C -->|Azure| G[Azure Key Vault]
    C -->|GCP| H[GCP Secret Manager]
    D --> I[connection Table]
    E --> J[Vault Path]
    F --> K[AWS Secrets]
    G --> L[Azure Keys]
    H --> M[GCP Secrets]

Using Connections in DAGs

Connections are accessed in DAGs through the BaseHook class:

from airflow.hooks.base import BaseHook

def get_external_api_data():
    conn = BaseHook.get_connection("my_external_api")
    api_key = conn.password
    base_url = conn.host
    
    # Use credentials to make API calls
    ...

Connections with custom configurations can utilize the extra field for JSON-encoded parameters:

from airflow.models import Connection

Connection(
    conn_id="ftp_default",
    conn_type="ftp",
    host="localhost",
    port=21,
    login="airflow",
    password="airflow",
    extra='{"key_file": "~/.ssh/id_rsa", "no_host_key_check": true}'
)

Sources: airflow-core/src/airflow/utils/db.py:230-240

Variables

Purpose and Scope

Variables provide a simple key-value storage mechanism for storing configuration data that can be shared across DAGs and tasks. They support string values with optional JSON serialization for complex data structures.

Key characteristics of Variables:

  • Key-value pairs stored in the Airflow metadata database
  • Optional JSON serialization for non-string values
  • Support for environment-specific configuration
  • Accessible from all DAGs and tasks
  • Can be exported/imported in bulk

Variable Management via CLI

The Airflow CLI provides comprehensive commands for managing variables:

# List all variables
airflow variables list

# Get a specific variable
airflow variables get <var_key>

# Set a variable
airflow variables set <var_key> <var_value>

# Set a variable with JSON serialization
airflow variables set config_json '{"setting": true}' --json

# Delete a variable
airflow variables delete <var_key>

# Export all variables
airflow variables export /tmp/variables.json

# Import variables from file
airflow variables import /tmp/variables.json

The variable commands are defined with support for serialization options:

VARIABLES_COMMANDS = (
    ActionCommand(name="list", func=lazy_load_command("airflow.cli.commands.variable_command.variables_list"), ...),
    ActionCommand(name="get", func=lazy_load_command("airflow.cli.commands.variable_command.variables_get"), 
                  args=(ARG_VAR, ARG_DESERIALIZE_JSON, ARG_DEFAULT, ARG_VERBOSE)),
    ActionCommand(name="set", func=lazy_load_command("airflow.cli.commands.variable_command.variables_set"),
                  args=(ARG_VAR, ARG_VAR_VALUE, ARG_VAR_DESCRIPTION, ARG_SERIALIZE_JSON, ARG_VERBOSE)),
    ActionCommand(name="delete", func=lazy_load_command("airflow.cli.commands.variable_command.variables_delete"), ...),
    ActionCommand(name="export", func=lazy_load_command("airflow.cli.commands.variable_command.variables_export"), ...),
    ActionCommand(name="import", func=lazy_load_command("airflow.cli.commands.variable_command.variables_import"), ...),
)

Sources: airflow-core/src/airflow/cli/cli_config.py:50-80

Variable Storage Architecture

graph TD
    A[DAG/Task Code] --> B[Variable API]
    B --> C[Secrets Backend]
    C -->|Default| D[Metadata Database]
    C -->|Backend| E[External Secrets]
    D --> F[variable Table]
    E --> F
    B --> G[JSON Deserializer]
    G --> H[Python Objects]
    F --> I[Key-Value Store]

Using Variables in DAGs

Variables can be accessed using the Variable class:

from airflow.models import Variable

# Get a simple string variable
api_endpoint = Variable.get("api_endpoint")

# Get with default value
timeout = Variable.get("timeout", default_var=30)

# Get and deserialize JSON
config = Variable.get("my_config", deserialize_json=True)

# Set a variable
Variable.set("my_key", "my_value")
Variable.set("config", {"setting": True}, serialize_json=True)
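
The effect of `deserialize_json` and of a default value can be illustrated without a running Airflow instance. The sketch below reads a variable from an environment variable named `AIRFLOW_VAR_<KEY>`, which is the naming convention Airflow uses for environment-backed variables. `get_variable` is a hypothetical helper written for this example, not the `Variable.get` implementation, and it ignores the metadata database entirely.

```python
import json
import os
from typing import Any


def get_variable(key: str, default: Any = None, deserialize_json: bool = False) -> Any:
    """Look up AIRFLOW_VAR_<KEY> in the environment (simplified sketch)."""
    raw = os.environ.get(f"AIRFLOW_VAR_{key.upper()}")
    if raw is None:
        return default
    # Values are stored as strings; JSON decoding is opt-in.
    return json.loads(raw) if deserialize_json else raw


os.environ["AIRFLOW_VAR_MY_CONFIG"] = json.dumps({"setting": True, "retries": 3})

config = get_variable("my_config", deserialize_json=True)
raw_config = get_variable("my_config")
timeout = get_variable("timeout", default=30)
```

Without `deserialize_json=True` the caller receives the raw JSON string, which is a common source of type errors when a dictionary was expected.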

Secrets Backend Integration

Both Connections and Variables can be backed by external secrets backends for enhanced security. This allows storing sensitive data in enterprise-grade secret management systems while maintaining the same Airflow API interface.

Available Secrets Backends

| Backend | Package | Description |
| --- | --- | --- |
| HashiCorp Vault | airflow.providers.hashicorp | HashiCorp Vault KV secrets engine |
| AWS Secrets Manager | airflow.providers.amazon | AWS Secrets Manager and SSM Parameter Store |
| Azure Key Vault | airflow.providers.microsoft.azure | Azure Key Vault secrets |
| GCP Secret Manager | airflow.providers.google | Google Cloud Secret Manager |
| Environment Variables | Built-in | Read from OS environment variables |
| Local Files | Built-in | Read from JSON/YAML files |

Configuring Secrets Backends

The secrets backend is configured via the [secrets] section in airflow.cfg:

[secrets]
backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
backend_kwargs = {"project_id": "my-project", "connections_prefix": "airflow-connections"}
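
When a secrets backend is configured, lookups fall through a chain: the configured backend is consulted first, then environment variables, then the metadata database. The sketch below models that order with plain dictionaries. `first_match` and the three stores are hypothetical stand-ins, and the connection URIs are invented for the example.

```python
from typing import Callable, Optional

Lookup = Callable[[str], Optional[str]]


def first_match(conn_id: str, backends: list[tuple[str, Lookup]]) -> tuple[str, str]:
    """Return (backend_name, value) from the first backend that knows conn_id."""
    for name, lookup in backends:
        value = lookup(conn_id)
        if value is not None:
            return name, value
    raise KeyError(f"connection {conn_id!r} not found in any backend")


# Hypothetical contents of each store; plain dicts stand in for real backends.
secret_manager = {"warehouse": "postgres://svc@warehouse.internal/prod"}
environment = {"warehouse": "postgres://dev@localhost/dev", "api": "http://localhost:8080"}
metadata_db = {"legacy_ftp": "ftp://localhost:21"}

search_order = [
    ("secrets backend", secret_manager.get),
    ("environment variables", environment.get),
    ("metadata database", metadata_db.get),
]

found_in, uri = first_match("warehouse", search_order)
```

Since the first hit wins, a value defined in the secrets backend shadows a connection with the same ID in the environment or the database, which is worth remembering when an edit made in the UI appears to have no effect.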

Best Practices

Security Considerations

  1. Never hardcode credentials - Always use Connections or Variables for sensitive data
  2. Use secrets backends - For production environments, use enterprise secret management systems
  3. Enable encryption - Ensure the Airflow metadata database is encrypted at rest
  4. Rotate credentials - Regularly rotate passwords and API keys stored in connections
  5. Audit access - Monitor and log access to sensitive connections and variables

Performance Considerations

  1. Avoid frequent variable reads - Cache variable values when used repeatedly in a task
  2. Use connection pooling - Many hooks automatically pool connections; configure appropriately
  3. Limit extra field size - Keep connection extra JSON data minimal for performance

Operational Considerations

  1. Use meaningful conn_ids - Follow naming conventions like {env}_{system}_{purpose}
  2. Document connections - Use the description field to document connection purpose and owners
  3. Export for disaster recovery - Regularly export connections and variables for backup

# Backup connections and variables
airflow connections export /opt/airflow/backups/connections_$(date +%Y%m%d).json
airflow variables export /opt/airflow/backups/variables_$(date +%Y%m%d).json

CLI Command Reference

Connection Commands

| Command | Description |
| --- | --- |
| airflow connections list | List all connections |
| airflow connections get <conn_id> | Get connection details |
| airflow connections add <conn_id> [options] | Create a connection |
| airflow connections delete <conn_id> | Delete a connection |
| airflow connections export <file> | Export connections to file |
| airflow connections import <file> | Import connections from file |
| airflow connections test <conn_id> | Test connection availability |
| airflow connections create-default-connections | Initialize provider default connections |

Variable Commands

| Command | Description |
| --- | --- |
| airflow variables list | List all variables |
| airflow variables get <key> | Get variable value |
| airflow variables set <key> <value> | Set variable value |
| airflow variables delete <key> | Delete a variable |
| airflow variables export <file> | Export variables to file |
| airflow variables import <file> | Import variables from file |

Sources: airflow-core/src/airflow/utils/db.py:200-260

Docker and Helm Deployment

Related topics: Kubernetes Deployment, Architecture Overview

Apache Airflow provides comprehensive support for containerized and Kubernetes-based deployments through Docker images and Helm charts. This documentation covers the deployment mechanisms, configuration options, and best practices for running Airflow in these environments.

Overview

Apache Airflow can be deployed using two primary methods:

| Deployment Method | Use Case | Primary Files |
| --- | --- | --- |
| Docker | Local development, single-node deployments, CI/CD pipelines | Dockerfile, Dockerfile.ci |
| Helm Chart | Production Kubernetes deployments, distributed systems | chart/Chart.yaml, chart/values.yaml |

Sources: docker-stack-docs/README.md

Docker Image Architecture

Apache Airflow ships with two main Docker images optimized for different use cases.

Production Image (`Dockerfile`)

The production Docker image is built using multi-stage builds and includes all necessary components for running Airflow in production environments. The image uses python:{DEFAULT_PYTHON_MAJOR_MINOR_VERSION}-slim-{ALLOWED_DEBIAN_VERSIONS[0]} as its base.

Key characteristics of the production image:

  • Base OS: Debian slim variant for minimal footprint
  • Package Manager: Uses uv for faster package installation
  • Architecture: Multi-stage build for optimized image size
  • User Space: Runs as non-root user for security

Sources: dev/breeze/src/airflow_breeze/commands/release_management_commands.py:55-62

CI Image (`Dockerfile.ci`)

The CI image is designed for continuous integration workflows and includes additional tooling for testing and development.

# syntax=docker/dockerfile:1.4
FROM python:{DEFAULT_PYTHON_MAJOR_MINOR_VERSION}-slim-{ALLOWED_DEBIAN_VERSIONS[0]}
RUN apt-get update && apt-get install -y --no-install-recommends libatomic1 git curl
RUN pip install uv=={UV_VERSION}
RUN --mount=type=cache,id=cache-airflow-build-dockerfile-installation,target=/root/.cache/ \
  uv pip install --system ignore pip=={AIRFLOW_PIP_VERSION} hatch=={HATCH_VERSION} \
  pyyaml=={PYYAML_VERSION} gitpython=={GITPYTHON_VERSION} rich=={RICH_VERSION} \
  prek=={PREK_VERSION}
COPY . /opt/airflow

Sources: dev/breeze/src/airflow_breeze/commands/release_management_commands.py:56-65

Version Constants

The build process uses the following pinned version constants:

| Constant | Version | Purpose |
| --- | --- | --- |
| AIRFLOW_PIP_VERSION | 26.1.1 | pip package manager |
| AIRFLOW_UV_VERSION | 0.11.11 | Fast Python package installer |
| GITPYTHON_VERSION | 3.1.50 | Git operations in Python |
| RICH_VERSION | 15.0.0 | Rich terminal output |
| HATCH_VERSION | 1.16.5 | Python packaging |
| PYYAML_VERSION | 6.0.3 | YAML parsing |
| PREK_VERSION | 0.3.13 | Pre-commit hooks |

Sources: dev/breeze/src/airflow_breeze/commands/release_management_commands.py:70-76

Helm Chart Deployment

The Apache Airflow Helm chart provides a production-ready deployment mechanism for Kubernetes clusters.

Chart Metadata

| Property | Value |
| --- | --- |
| Chart Name | airflow |
| Repository | Apache Airflow Official |
| Kubernetes Support | v1.30+, v1.31+, v1.32+, v1.33+ |

Supported Kubernetes versions follow the support windows of the major cloud providers (EKS, AKS, GKE).

Sources: dev/breeze/src/airflow_breeze/global_constants.py:48-54

Core Components

The Helm chart deploys the following core Airflow components:

graph TD
    A[Helm Chart] --> B[Webserver]
    A --> C[Scheduler]
    A --> D[Triggerer]
    A --> E[Worker]
    A --> F[Flower]
    A --> G[StatsD]
    A --> H[Redis]
    A --> I[PostgreSQL/MySQL]
    
    B -->|Read/Write| I
    C -->|Queue Jobs| H
    D -->|Async Tasks| H
    E -->|Process Tasks| H

Scheduler Deployment

The scheduler is deployed as a Kubernetes Deployment with the following characteristics:

  • Replicas: Configurable via replicas parameter
  • Resources: Configurable CPU and memory limits
  • Health Checks: Liveness and readiness probes
  • Persistence: Optional volume mounts for DAGs and logs

Sources: chart/templates/scheduler/scheduler-deployment.yaml

Worker Deployment

Workers process tasks from the message queue and are deployed as:

  • Deployment Type: StatefulSet or Deployment based on configuration
  • Scaling: Horizontal Pod Autoscaler (HPA) support
  • Queue Configuration: Multiple queues supported
  • Resource Management: Configurable resource requests and limits

Sources: chart/templates/workers/worker-deployment.yaml

Installation Methods

Installing from PyPI

Only pip installation is currently officially supported. Tools such as Poetry or pip-tools can sometimes be made to work, but installing with them is not supported.

For repeatable installation, Airflow maintains "known-to-be-working" constraint files in the orphan constraints-main and constraints-2-0 branches. These constraint files are maintained per major/minor Python version.
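
A repeatable install pairs the Airflow version with the constraint file for the same Airflow and Python versions. The sketch below builds the constraint file URL and the matching pip command. The helper names are hypothetical, the version numbers are illustrative only, and the URL pattern assumes the per-release `constraints-<version>` tags published in the Airflow repository.

```python
def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the URL of the constraint file for one Airflow/Python pair."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_command(airflow_version: str, python_version: str) -> list[str]:
    """Return the pip command as an argument list (suitable for subprocess.run)."""
    return [
        "pip", "install", f"apache-airflow=={airflow_version}",
        "--constraint", constraints_url(airflow_version, python_version),
    ]


# Illustrative versions; substitute the release and interpreter actually in use.
command = pip_install_command("2.10.4", "3.11")
```

Pinning both the package and the constraint file to the same version is what makes the install reproducible; mixing versions defeats the purpose of the constraints.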

Sources: generated/PYPI_README.md

Installing Providers from Sources

Providers can be installed from source with SHA512 verification:

shasum -a 512 {{ package_name }}-{{ package_version }}.tar.gz | diff - {{ package_name }}-{{ package_version }}.tar.gz.sha512

This ensures the integrity of the downloaded package against the provided checksum.
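
The same check can be scripted when verifying several packages. The following sketch recomputes the SHA-512 digest with the standard library and compares it with the recorded one. `sha512_matches` is a hypothetical helper, and it assumes the checksum file uses the usual `shasum` layout of a hex digest followed by the file name.

```python
import hashlib
from pathlib import Path


def sha512_matches(archive: Path, checksum_file: Path) -> bool:
    """Compare an archive's SHA-512 digest with the one recorded in checksum_file."""
    digest = hashlib.sha512()
    with archive.open("rb") as handle:
        # Read in 1 MiB chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    # Assumes the shasum layout: "<hex digest>  <file name>".
    recorded = checksum_file.read_text().split()[0].lower()
    return digest.hexdigest() == recorded
```

A matching checksum only shows that the download is intact; confirming who produced the package still requires checking the signature.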

Sources: devel-common/src/sphinx_exts/includes/installing-providers-from-sources.rst

Helm Chart Package Preparation

The release management tooling includes commands for preparing Helm chart packages:

graph LR
    A[Chart Source] --> B[helm package]
    B --> C[Deterministic Repack]
    C --> D[PGP Signature]
    D --> E[Distribution Archive]

Package Signing

Helm chart packages can be signed using GPG for verification:

helm gpg sign -u <sign-email> <archive-name>

The signing process generates a provenance file (.tgz.prov) that can be used to verify the package integrity.

Sources: dev/breeze/src/airflow_breeze/commands/release_management_commands.py:400-420

Software Bill of Materials (SBOM)

The Airflow project generates and maintains SBOM information for security and compliance purposes.

SBOM Commands

The breeze CLI provides commands for managing SBOM information:

breeze sbom update-sbom-information [OPTIONS]

#### Command Options

| Option | Type | Description |
| --- | --- | --- |
| --airflow-site-archive-path | Path | Directory for airflow-site-archive |
| --airflow-root-path | Path | Root of the airflow repository |
| --airflow-version | String | Version of airflow to update SBOM |
| --airflow-constraints-reference | String | Constraints reference for SBOM generation |

These files are placed in airflow-site-archive/docs-archive/ or generated/_build/docs/apache-airflow/stable/ depending on the configuration.

Sources: dev/breeze/src/airflow_breeze/commands/sbom_commands.py

SBOM File Generation

SBOM files are generated based on:

The SBOM information is automatically regenerated using the breeze release-management generate-providers-metadata command.

Sources: generated/README.md

Docker Compose Deployment

For development and testing purposes, Airflow can be deployed using Docker Compose.

Prerequisites

  • Docker Engine
  • Docker Compose v2+
  • SQLite database (automatically created if AIRFLOW_HOME is not set)

Basic Usage

For example commands that start Airflow, refer to the Executing commands documentation.

Sources: docker-stack-docs/README.md

Deployment Architecture

graph TD
    subgraph "Client Layer"
        A[Airflow CLI] --> B[REST API]
        C[Web UI] --> B
    end
    
    subgraph "Core Services"
        D[Scheduler] --> E[(Metadata DB)]
        F[Triggerer] --> E
        G[Webserver] --> E
    end
    
    subgraph "Task Execution"
        H[Workers] --> I[Message Queue]
        D --> I
        F --> I
    end
    
    subgraph "Storage"
        J[DAGs Repository]
        K[Logs Storage]
        L[Plugins]
    end
    
    H --> K
    G --> K
    D --> J

Default Connections

The Airflow deployment includes pre-configured connection templates for common services:

| Connection ID | Type | Default Settings |
| --- | --- | --- |
| google_cloud_default | Google Cloud Platform | Schema: default |
| http_default | HTTP | Host: https://www.httpbin.org/ |
| ftp_default | FTP | localhost:21 |
| hive_cli_default | Hive CLI | localhost:10000 |
| hiveserver2_default | HiveServer2 | localhost:10000 |
| fs_default | File System | Path: / |
| gremlin_default | Gremlin | Host: gremlin:8182 |
| iceberg_default | Iceberg | HTTPS endpoint |

Sources: airflow-core/src/airflow/utils/db.py

Verification and Security

Package Verification

PyPI releases can be verified by downloading the package, signature, and SHA sum files:

#!/bin/bash
PACKAGE_VERSION={{ package_version }}
PACKAGE_NAME={{ package_name }}
provider_download_dir=$(mktemp -d)
pip download --no-deps "${PACKAGE_NAME}==${PACKAGE_VERSION}" --dest "${provider_download_dir}"
curl "{{ base_url }}/{{ package_name_underscores }}-{{ package_version }}-py3-none-any.whl.asc" \
    -L -o "${provider_download_dir}/{{ package_name_underscores }}-{{ package_version }}-py3-none-any.whl.asc"
curl "{{ base_url }}/{{ package_name_underscores }}-{{ package_version }}-py3-none-any.whl.sha512" \
    -L -o "${provider_download_dir}/{{ package_name_underscores }}-{{ package_version }}-py3-none-any.whl.sha512"
echo "Please verify files downloaded to ${provider_download_dir}"
ls -

Sources: devel-common/src/sphinx_exts/includes/installing-providers-from-sources.rst

Sources: docker-stack-docs/README.md

Kubernetes Deployment

Related topics: Docker and Helm Deployment, Scheduler and Executor Architecture

Apache Airflow provides comprehensive Kubernetes support through the CNCF Kubernetes provider package. This integration enables Airflow to run tasks as Kubernetes Pods, providing dynamic resource allocation, isolation, and scalability for workflow execution.

Architecture Overview

Airflow's Kubernetes deployment consists of multiple components that work together to provide a flexible, scalable execution environment.

Kubernetes Executor Architecture

The Kubernetes Executor is one of Airflow's core executors that launches a new Pod for each task instance. Unlike the Celery Executor which reuses workers, the Kubernetes Executor creates isolated Pods for each task execution.

graph TD
    A[Airflow Scheduler] -->|submits task| B[Kubernetes Executor]
    B -->|creates| C[Task Pod]
    B -->|watches| C
    C -->|completes| D[Pod Status Update]
    D -->|pods cleaned up| E[Resource Cleanup]
    
    F[Kubernetes API Server] -->|manages| C
    G[Worker Nodes] -->|hosts| C

Core Components

| Component | File Path | Purpose |
| --- | --- | --- |
| KubernetesExecutor | providers/cncf/kubernetes/.../executors/kubernetes_executor.py | Main executor implementation |
| KubernetesPodOperator | providers/cncf/kubernetes/.../operators/pod.py | Operator for running pods |
| KubeConfig | providers/cncf/kubernetes/.../kube_config.py | Configuration management |
| PodGenerator | providers/cncf/kubernetes/.../pod_generator.py | Pod specification builder |

Configuration

Kubernetes Executor Configuration

The Kubernetes Executor is configured in the airflow.cfg file. The executor itself is selected under [core], and its options live under [kubernetes_executor] (this section was named [kubernetes] before Airflow 2.5.0).

[core]
executor = airflow.providers.cncf.kubernetes.executors.kubernetes_executor.KubernetesExecutor

[kubernetes_executor]
namespace = default
pod_template_file = /path/to/pod_template.yaml
worker_container_repository = apache/airflow
worker_container_tag = latest

Key Configuration Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| namespace | Kubernetes namespace for task pods | default |
| pod_template_file | Path to pod template YAML | - |
| worker_container_repository | Docker image repository | apache/airflow |
| worker_container_tag | Docker image tag | latest |
| delete_worker_pods | Delete pods after task completion | True |
| delete_worker_pods_on_failure | Delete pods on task failure | False |
| worker_pods_creation_batch_size | Number of pod creation calls per scheduler loop | 1 |

Sources: providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/kube_config.py
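
Any option in the table can also be supplied through an environment variable that follows Airflow's `AIRFLOW__{SECTION}__{KEY}` naming convention. The sketch below uses only the Python standard library and is not Airflow's own configuration loader: it checks the environment first, then INI text, then a fallback. The helper names `env_var_name` and `resolve_option` and the sample values are illustrative assumptions, and the section name is passed in explicitly because it differs between Airflow versions.

```python
import configparser
import os


def env_var_name(section: str, key: str) -> str:
    """Return the environment variable name Airflow uses for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def resolve_option(cfg_text, section, key, fallback, environ=None):
    """Look up an option: environment variable first, then INI text, then fallback."""
    environ = os.environ if environ is None else environ
    env_value = environ.get(env_var_name(section, key))
    if env_value is not None:
        return env_value
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # `fallback` covers both a missing section and a missing option.
    return parser.get(section, key, fallback=fallback)


# Hypothetical configuration text used only for this illustration.
SAMPLE_CFG = """
[kubernetes_executor]
namespace = airflow
delete_worker_pods = True
"""

if __name__ == "__main__":
    section = "kubernetes_executor"
    print(env_var_name(section, "namespace"))
    print(resolve_option(SAMPLE_CFG, section, "namespace", "default", environ={}))
```

Passing `environ` explicitly keeps the lookup easy to test without touching the real process environment.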

Pod Generation

The PodGenerator class is responsible for constructing Kubernetes Pod specifications from Airflow task definitions.

Pod Template System

Airflow supports custom pod templates that define the base pod specification. These templates use Jinja2 templating to allow dynamic value injection.

apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker-pod
spec:
  containers:
    - name: base
      image: "{{ container_image }}"
      env:
        - name: AIRFLOW__CORE__EXECUTOR
          value: LocalExecutor

Sources: providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/pod_generator.py
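
One way to picture how a base template and per-task settings combine is a recursive dictionary merge. The sketch below illustrates that idea only and is not the PodGenerator implementation: `merge_pod_spec` and the sample override values are hypothetical, and lists are replaced wholesale rather than merged by container name.

```python
import copy


def merge_pod_spec(base: dict, override: dict) -> dict:
    """Return a new dict where values in `override` win over `base`.

    Nested dicts are merged key by key; any other value (including lists)
    replaces the base value outright.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_pod_spec(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Base template expressed as a dict, mirroring the YAML template above.
BASE_TEMPLATE = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "airflow-worker-pod", "labels": {"tier": "airflow"}},
    "spec": {"containers": [{"name": "base", "image": "apache/airflow"}]},
}

# Hypothetical per-task settings layered on top of the template.
TASK_OVERRIDE = {
    "metadata": {"labels": {"team": "data"}},
    "spec": {"nodeSelector": {"disktype": "ssd"}},
}

if __name__ == "__main__":
    pod = merge_pod_spec(BASE_TEMPLATE, TASK_OVERRIDE)
    print(pod["metadata"]["labels"])
    print(pod["spec"]["nodeSelector"])
```

The merge returns a fresh dictionary, so the base template can be reused for every task without being mutated.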

Dynamic Pod Configuration

The KubernetesPodOperator allows dynamic configuration of pod specifications at runtime:

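from kubernetes.client import models as k8s

from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Illustrative ConfigMap-backed volume; "app-config" is a placeholder name.
config_volume = k8s.V1Volume(
    name="config",
    config_map=k8s.V1ConfigMapVolumeSource(name="app-config"),
)
config_volume_mount = k8s.V1VolumeMount(name="config", mount_path="/etc/config")

run_pod = KubernetesPodOperator(
    task_id="run_kubernetes_pod",
    name="my-task-pod",
    namespace="default",
    image="python:3.9-slim",
    cmds=["python", "-c", "print('Hello from Kubernetes')"],
    labels={"app": "airflow", "environment": "production"},
    volumes=[config_volume],
    volume_mounts=[config_volume_mount],
    get_logs=True,
    # Replaces the deprecated is_delete_operator_pod=True.
    on_finish_action="delete_pod",
)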

Airflow Components on Kubernetes

Triggerer Deployment

The Airflow Triggerer runs as a Kubernetes Deployment and executes the asynchronous triggers that deferred tasks wait on.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-triggerer
  namespace: airflow
spec:
  replicas: 2
  selector:
    matchLabels:
      component: triggerer
      tier: airflow
  template:
    metadata:
      labels:
        component: triggerer
        tier: airflow
    spec:
      serviceAccountName: airflow-triggerer
      containers:
        - name: triggerer
          image: {{ .Values.images.triggerer }}
          args: ["triggerer"]
          ports:
            - name: logs
              containerPort: 8793

Sources: chart/templates/triggerer/triggerer-deployment.yaml

Component Services

| Component | Kubernetes Object | Purpose |
| --- | --- | --- |
| Scheduler | Deployment | DAG parsing and task scheduling |
| Webserver | Deployment | Airflow UI serving |
| Triggerer | Deployment | Deferred task handling |
| Worker | Deployment/StatefulSet | Task execution |
| Flower | Deployment | Celery monitoring (optional) |
| Database | StatefulSet | Metadata storage |
| Redis | StatefulSet | Message broker |

Local Development with Skaffold

Airflow provides development workflows using Skaffold for hot-reloading code changes into running Kubernetes pods.

Skaffold Dev Loop

breeze k8s dev

This command runs the Skaffold dev loop to sync DAGs and airflow-core sources to running pods including scheduler, triggerer, dag-processor, and API Server with hot-reload support.

@kubernetes_group.command(
    name="dev",
    help=(
        "Run skaffold dev loop to sync dags and airflow-core sources to running pods "
        "(scheduler/triggerer/dag-processor/API Server hot-reload; UI auto-refresh not supported yet)."
    ),
)

Sources: dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py

KubernetesPodOperator

The KubernetesPodOperator allows users to define and execute arbitrary Kubernetes Pods as Airflow tasks.

Operator Features

| Feature | Description |
| --- | --- |
| Full Pod Spec | Define complete pod specifications |
| Volume Management | Support for ConfigMaps, Secrets, PVCs |
| Image Pull | Private registry authentication |
| Resource Limits | CPU and memory constraints |
| Node Selection | Pod affinity and node selectors |
| Sidecars | Support for sidecar containers |

Basic Usage

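from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Illustrative DAG; dag_id and start_date are placeholder values.
with DAG(dag_id="kubernetes_pod_demo", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    k = KubernetesPodOperator(
        task_id="demo_pod",
        name="demo-pod",
        image="python:3.9",
        cmds=["python", "-c", "print('Hello World')"],
        namespace="default",
        # Replaces the deprecated is_delete_operator_pod=True.
        on_finish_action="delete_pod",
        get_logs=True,
    )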

Sources: providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py

Executor Implementation

Task Execution Flow

sequenceDiagram
    participant Scheduler
    participant K8sExecutor
    participant KubeAPI
    participant Pod
    
    Scheduler->>K8sExecutor: Queue task for execution
    K8sExecutor->>KubeAPI: Create Pod
    KubeAPI->>Pod: Launch pod
    Pod->>Pod: Execute task
    Pod->>KubeAPI: Report completion
    KubeAPI->>K8sExecutor: Pod completed
    K8sExecutor->>Scheduler: Task result
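
The flow above can be sketched as a small, cluster-free simulation. The example maps Kubernetes Pod phases to a task outcome and applies the `delete_worker_pods` and `delete_worker_pods_on_failure` options from the configuration table. It is a minimal sketch under stated assumptions, not the executor's source: `task_result` and `should_delete_pod` are hypothetical helpers, and only the Pod phase names (Pending, Running, Succeeded, Failed, Unknown) come from Kubernetes itself.

```python
from typing import Optional

# Phases after which a Pod will not run any further.
TERMINAL_PHASES = {"Succeeded", "Failed"}


def task_result(pod_phase: str) -> Optional[str]:
    """Translate a Pod phase into a task outcome, or None while still in progress."""
    if pod_phase == "Succeeded":
        return "success"
    if pod_phase == "Failed":
        return "failed"
    return None  # Pending, Running, or Unknown: keep watching


def should_delete_pod(pod_phase, delete_worker_pods=True, delete_worker_pods_on_failure=False):
    """Decide whether a finished Pod is removed, based on the two cleanup options."""
    if pod_phase not in TERMINAL_PHASES or not delete_worker_pods:
        return False
    if pod_phase == "Failed":
        # Failed Pods are kept for inspection unless deletion is requested explicitly.
        return delete_worker_pods_on_failure
    return True


if __name__ == "__main__":
    for phase in ["Pending", "Running", "Succeeded", "Failed"]:
        print(phase, task_result(phase), should_delete_pod(phase))
```

With the defaults shown, successful Pods are cleaned up while failed Pods remain available for debugging.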

Executor Configuration

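# Simplified illustration of how the executor wires in its configuration;
# not a verbatim excerpt of kubernetes_executor.py.
class KubernetesExecutor:
    """Kubernetes executor implementation."""

    def __init__(self):
        # KubeConfig reads the executor options from the Airflow configuration.
        self.kube_config = KubeConfig()
        # PodLauncher stands in here for the helper that creates task Pods.
        self.manager = PodLauncher(
            kube_config=self.kube_config,
            in_cluster=self.kube_config.in_cluster,
        )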

Sources: providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py

Security Considerations

Service Account Configuration

Running Airflow on Kubernetes requires proper service account configuration with appropriate RBAC permissions.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: airflow-executor
  namespace: airflow

Sources: providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/kube_config.py
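
A service account grants nothing on its own; it needs a Role and a RoleBinding that allow the executor to manage Pods. The sketch below builds both manifests as plain Python dictionaries and prints them as JSON, a format `kubectl apply -f -` accepts. The helper `pod_manager_rbac`, the resource names, and the exact verb list are assumptions chosen for illustration; compare them with the RBAC templates shipped in the Helm chart before use.

```python
import json


def pod_manager_rbac(namespace: str, service_account: str) -> dict:
    """Build a Role and RoleBinding letting a service account manage Pods."""
    role_name = f"{service_account}-pod-manager"  # hypothetical naming scheme
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],
                "resources": ["pods"],
                "verbs": ["create", "get", "list", "watch", "patch", "delete"],
            },
            # Reading task logs requires access to the pods/log subresource.
            {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get"]},
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
    }
    # A v1 List lets both objects be applied in a single kubectl call.
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


if __name__ == "__main__":
    print(json.dumps(pod_manager_rbac("airflow", "airflow-executor"), indent=2))
```

Scoping the permissions with a namespaced Role rather than a ClusterRole keeps the executor limited to the namespace where task Pods run.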

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Doramagic extracted 16 source-linked risk signals. Review them before installing or handing real data to the project.

1. Installation risk: `ExternalTaskSensor` can succeed early for task groups with NULL task states

  • Severity: high
  • Finding: Installation risk is backed by a source signal: ExternalTaskSensor can succeed early for task groups with NULL task states. Treat it as a review item until the current version is checked.
  • User impact: First-time setup may fail or require extra isolation and rollback planning.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/apache/airflow/issues/66877

2. Capability assumption: README/documentation is current enough for a first validation pass.

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: The project should not be treated as fully validated until this signal is reviewed.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: capability.assumptions | github_repo:33884891 | https://github.com/apache/airflow | README/documentation is current enough for a first validation pass.

3. Maintenance risk: Maintainer activity is unknown

  • Severity: medium
  • Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
  • User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: evidence.maintainer_signals | github_repo:33884891 | https://github.com/apache/airflow | last_activity_observed missing

4. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: downstream_validation.risk_items | github_repo:33884891 | https://github.com/apache/airflow | no_demo; severity=medium

5. Security or permission risk: No sandbox install has been executed yet; downstream must verify before user use.

  • Severity: medium
  • Finding: No sandbox install has been executed yet; downstream must verify before user use.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.safety_notes | github_repo:33884891 | https://github.com/apache/airflow | No sandbox install has been executed yet; downstream must verify before user use.

6. Security or permission risk: no_demo

  • Severity: medium
  • Finding: no_demo
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: risks.scoring_risks | github_repo:33884891 | https://github.com/apache/airflow | no_demo; severity=medium

7. Security or permission risk: Apache Airflow 3.1.6

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: Apache Airflow 3.1.6. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/apache/airflow/releases/tag/3.1.6

8. Security or permission risk: Apache Airflow 3.1.7

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: Apache Airflow 3.1.7. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/apache/airflow/releases/tag/3.1.7

9. Security or permission risk: Apache Airflow 3.1.8

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: Apache Airflow 3.1.8. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/apache/airflow/releases/tag/3.1.8

10. Security or permission risk: Apache Airflow 3.2.0

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: Apache Airflow 3.2.0. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/apache/airflow/releases/tag/3.2.0

11. Security or permission risk: Apache Airflow 3.2.1

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: Apache Airflow 3.2.1. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/apache/airflow/releases/tag/3.2.1

12. Security or permission risk: Apache Airflow Ctl (airflowctl) 0.1.2

  • Severity: medium
  • Finding: Security or permission risk is backed by a source signal: Apache Airflow Ctl (airflowctl) 0.1.2. Treat it as a review item until the current version is checked.
  • User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
  • Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
  • Evidence: Source-linked evidence: https://github.com/apache/airflow/releases/tag/airflow-ctl/0.1.2

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 (count of project-level external discussion links exposed on this manual page).

Use: review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using airflow with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence