Doramagic Project Pack · Human Manual
airflow
Project Introduction
Related topics: Architecture Overview, Core Concepts
Apache Airflow is an open-source platform for orchestrating, scheduling, and monitoring complex workflows. Originally developed at Airbnb, it is now widely used for data pipeline orchestration and lets users define, schedule, and execute workflows as Directed Acyclic Graphs (DAGs).
Sources: README.md
Overview
Apache Airflow provides a comprehensive solution for workflow management with the following key characteristics:
| Aspect | Description |
|---|---|
| Type | Open-source workflow orchestration platform |
| Language | Python |
| License | Apache License 2.0 |
| Primary Use | Data pipelines, ETL processes, ML workflows |
| Architecture | Distributed, scalable, extensible |
Sources: README.md
Core Architecture
Airflow follows a distributed architecture with several key components working together to manage workflow execution.
graph TD
subgraph Client["Client Layer"]
WebServer["Web Server"]
CLI["Command Line Interface"]
API["REST API"]
end
subgraph Core["Core Components"]
Scheduler["Scheduler"]
Executor["Executor"]
Database["Metadata Database"]
end
subgraph WorkerLayer["Worker Layer"]
Workers["Workers"]
Tasks["Task Instances"]
end
subgraph External["External Systems"]
Triggerer["Triggerer"]
Logger["Logging"]
end
WebServer --> Database
CLI --> Database
API --> Database
Scheduler --> Database
Scheduler --> Executor
Executor --> Workers
Workers --> Database
Triggerer --> Database
style Database fill:#f9f,stroke:#333,stroke-width:2px
Sources: airflow-core/src/airflow/cli/cli_config.py
DAG-Based Workflow Model
At the heart of Airflow is the DAG (Directed Acyclic Graph) concept. A DAG represents a collection of tasks with defined dependencies, arranged in a way that reflects the logical flow of work.
Key DAG Properties
| Property | Description |
|---|---|
| Directed | Tasks have explicit dependencies (upstream/downstream relationships) |
| Acyclic | No circular dependencies; workflows flow in one direction |
| Graph | Complex dependency structures are supported |
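As a minimal sketch of these three properties, the snippet below models a workflow as a plain dictionary that maps each task to its upstream tasks and uses Python's standard `graphlib` module to derive a valid execution order. The task names are hypothetical, and this is not how Airflow stores DAGs internally; it only illustrates why a cycle makes a workflow unschedulable.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each key lists the tasks it depends on (its upstreams).
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(deps):
    """Return one valid run order, or raise CycleError if the graph has a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)

# Pointing "extract" back at "report" closes a loop, so no valid order exists.
cyclic = dict(dependencies, extract={"report"})
try:
    execution_order(cyclic)
    has_cycle = False
except CycleError:
    has_cycle = True
```

Because the example is a simple chain, only one ordering is possible; graphs with independent branches can have several valid orders.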
Default Connections
Airflow ships with pre-configured connection definitions for common integrations:
| Connection ID | Type | Default Configuration |
|---|---|---|
| facebook_default | facebook_social | Facebook Ad account credentials |
| fs_default | fs | Filesystem path / |
| ftp_default | ftp | localhost:21, SSH key auth |
| google_cloud_default | google_cloud_platform | Default GCP schema |
| hive_cli_default | hive_cli | localhost:10000, Beeline mode |
| hiveserver2_default | hiveserver2 | localhost:10000 |
| http_default | http | https://www.httpbin.org/ |
| gremlin_default | gremlin | gremlin:8182 |
Sources: airflow-core/src/airflow/utils/db.py
Providers Ecosystem
Apache Airflow's functionality is extended through Providers — packages that integrate Airflow with external services and systems.
graph LR
Airflow["Apache Airflow Core"] --> Providers["Providers"]
Providers --> Google["Google Cloud Provider"]
Providers --> AWS["Amazon Provider"]
Providers --> Azure["Microsoft Azure Provider"]
Providers --> Edge3["Edge3 Provider"]
Providers --> Others["Other Providers"]
Provider Management
Providers can be discovered and managed through the Airflow CLI:
airflow providers [list|get|widgets|sensors|hooks|executors]
Key Provider Features
| Feature | Description |
|---|---|
| Hooks | Interfaces to external systems for connection management |
| Operators | Pre-built task templates for common operations |
| Sensors | Tasks that wait for external conditions |
| Transfers | Tasks for moving data between systems |
Sources: PROVIDERS.rst
Version Information
The Airflow version is managed centrally and can be accessed programmatically:
# airflow-core/src/airflow/version.py
version = "2.11.0"
Sources: airflow-core/src/airflow/version.py
Installation Methods
Airflow supports multiple installation approaches to suit different environments and use cases.
| Method | Use Case | Command/Config |
|---|---|---|
| PyPI | Standard installation | pip install apache-airflow |
| Sources | Development/contribution | Clone repository and install |
| Providers from sources | Testing provider changes | Breeze commands |
| Constraints | Reproducible builds | Use constraints files |
Sources: INSTALLING.md
Installing Specific Versions
The --use-airflow-version option provides flexibility for version installation:
| Version Specifier | Behavior |
|---|---|
| none | Skip Airflow installation |
| wheel | Install from local dist/ folder |
| sdist | Install from source distribution |
| owner/repo:branch | Install from GitHub repository |
| PR_NUMBER | Install from a Pull Request |
Sources: dev/breeze/src/airflow_breeze/commands/common_options.py
Vendored Dependencies
Airflow maintains a _vendor package for dependencies that need special handling:
graph TD
Vendor["_vendor Package"] --> License["Move to licenses/ folder"]
Vendor --> Remove["Remove README/supporting files"]
Vendor --> Requirements["Add to pyproject.toml"]
Vendor --> Fixes["Re-apply historical fixes"]
Vendoring Process
- Update vendor.md with library, version, and SHA256 hash
- Remove old files and directories
- Move LICENSE files to licenses/ folder
- Add requirements to pyproject.toml with appropriate comments
- Re-apply any necessary cherry-picked fixes
Sources: airflow-core/src/airflow/_vendor/README.md
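The vendoring steps call for recording a SHA256 hash of the vendored library. A minimal sketch of how such a hash could be computed with the standard library follows; the helper name `sha256_of` is hypothetical, and the exact artifact that Airflow maintainers hash is not specified here.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file whose contents are the bytes b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
digest_abc = sha256_of(tmp.name)
os.remove(tmp.name)
```

Reading in chunks keeps memory use constant even for large archives.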
Development Environment (Breeze)
The Breeze tool is Airflow's primary development environment, providing consistent tooling for testing, building, and development.
Key Breeze Commands
| Command | Purpose |
|---|---|
| breeze sbom update-sbom-information | Update SBOM information |
| breeze workflow run-publish | Run documentation workflow |
| breeze build | Build Airflow images |
Breeze Configuration Options
| Option | Description |
|---|---|
| --airflow-version | Specify Airflow version |
| --debian-version | Select base Debian version |
| --docker-cache | Configure build caching |
| --mount-sources | Control source mounting strategy |
| --allow-pre-releases | Enable pre-release installations |
Sources: dev/breeze/src/airflow_breeze/commands/sbom_commands.py
CLI Commands Structure
Airflow provides a comprehensive command-line interface organized into logical groups:
graph TD
CLI["airflow CLI"] --> Groups
Groups --> Config["config - View configuration"]
Groups --> Info["info - Show system info"]
Groups --> Plugins["plugins - Dump plugin info"]
Groups --> Connections["connections - Manage connections"]
Groups --> Providers["providers - Display providers"]
Groups --> DAGs["dags - Manage DAGs"]
Groups --> db_manager["db-manager - Database management"]
Groups --> rotate_fernet["rotate-fernet-key - Rotate keys"]
Core CLI Commands
| Command | Function | Purpose |
|---|---|---|
| airflow config | View configuration | Display current Airflow settings |
| airflow info | System information | Show environment details |
| airflow plugins | Plugin dump | Display loaded plugins |
| airflow connections | Connection management | CRUD operations for connections |
| airflow providers | Provider display | List installed providers |
| airflow db-manager | Database management | External DB manager operations |
Sources: airflow-core/src/airflow/cli/cli_config.py
Dependencies and Requirements
Generated Dependency Files
The repository maintains several auto-generated dependency files:
| File | Purpose | Generation Command |
|---|---|---|
| devel_deps.txt | Development dependencies | ./dev/get_devel_deps.sh |
| dep_tree.txt | Full dependency tree | uv tree --no-dedupe |
| dependency_depth.json | Dependency depth analysis | Generated by Breeze |
PyPI README Generation
The PYPI_README.md is automatically generated from the main README using pre-commit hooks defined in .pre-commit-config.yaml, ensuring consistency between project documentation and PyPI listings.
Sources: generated/README.md
Summary
Apache Airflow is a robust, extensible workflow orchestration platform that provides:
- DAG-based workflow definition for complex pipeline management
- Extensible architecture through providers and plugins
- Multiple deployment options from development to production
- Comprehensive CLI and UI for monitoring and management
- Strong ecosystem of integrations with major cloud providers and services
The platform's design prioritizes dynamic pipeline generation, an extensible operator library, and robust scheduling.
Sources: README.md
Core Concepts
Related topics: Scheduler and Executor Architecture, Data Flow and State Management
Apache Airflow is an open-source workflow orchestration platform designed to programmatically author, schedule, and monitor complex data pipelines. The platform provides a robust framework for defining workflows as Directed Acyclic Graphs (DAGs), enabling organizations to automate and manage data processing workflows at scale.
DAG (Directed Acyclic Graph)
The DAG is the fundamental building block of Airflow. It represents a collection of tasks with defined dependencies, organized in a way that reflects the logical flow of work through the system.
DAG Structure
A DAG defines the overall structure of a workflow:
- Nodes: Represent individual tasks or operations
- Edges: Define dependencies between tasks (task A must complete before task B can start)
- No Cycles: The graph must flow in one direction without circular dependencies
DAG Commands
Airflow provides comprehensive CLI commands for managing DAGs through the dag_cli_commands configuration.
| Command | Purpose |
|---|---|
| dags list | List all DAGs in the environment |
| dags details | Get detailed information about a specific DAG |
| dags list-runs | List all runs for a specific DAG |
| dags list-import-errors | Show DAGs with import errors |
| dags report | Display DagBag loading report |
| dags pause | Pause a DAG from scheduling |
| dags unpause | Resume DAG scheduling |
| dags backfill | Run subsections of a DAG for a date range |
| dags test | Test a DAG execution |
Source: airflow-core/src/airflow/cli/cli_config.py
DAG Execution Parameters
When creating a backfill operation, the following parameters can be configured:
| Parameter | Description |
|---|---|
| dag_id | The identifier of the DAG to backfill |
| from_date | Start date for the backfill range |
| to_date | End date for the backfill range |
| run_conf | Configuration to pass to the DAG run |
| run_backwards | Execute DAG runs in reverse chronological order |
| max_active_runs | Maximum number of concurrent active DAG runs |
| reprocess_behavior | How to handle already-processed runs |
| run_on_latest_version | Whether to use the latest DAG version |
| dry_run | Execute without making changes |
Source: airflow-core/src/airflow/cli/cli_config.py
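To show how a few of these parameters interact, here is a small sketch that plans the logical dates of a backfill. It assumes a daily schedule and an inclusive date range, both of which are simplifications; `BackfillRequest` and `logical_dates` are hypothetical names, not Airflow classes.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class BackfillRequest:
    """Illustrative container mirroring some of the parameters in the table."""
    dag_id: str
    from_date: date
    to_date: date
    run_backwards: bool = False
    max_active_runs: int = 10
    dry_run: bool = False


def logical_dates(req):
    """List one logical date per day in the inclusive range (daily schedule assumed)."""
    days = (req.to_date - req.from_date).days
    dates = [req.from_date + timedelta(days=i) for i in range(days + 1)]
    # run_backwards reverses the order so the newest interval is processed first.
    return dates[::-1] if req.run_backwards else dates


req = BackfillRequest("example_dag", date(2024, 1, 1), date(2024, 1, 3), run_backwards=True)
planned = logical_dates(req)
```

With `run_backwards=True` the plan starts at the most recent date, which can be useful when the newest data matters most.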
Tasks and Task Instances
Task Instance States
A task instance represents a specific execution of a task within a DAG run. Each task instance progresses through various states during its lifecycle.
graph TD
A[None] --> B[Scheduled]
B --> C[Queued]
C --> D[Running]
D --> E{Success/Failed/Skipped}
E --> F[Success]
E --> G[Failed]
E --> H[Skipped]
A --> I[Upstream Failed]
F --> J[Complete]
G --> J
H --> J
I --> J
Task Instance Management
The Airflow UI provides functionality for clearing task instances, allowing operators to re-execute tasks that may have failed or need to be reprocessed. The ClearTaskInstanceConfirmationDialog component handles the confirmation workflow for this operation.
Key attributes displayed when clearing a task instance:
- Current state of the task
- Start date (shown as relative time)
- User who executed the task (or "unknown user" if unavailable)
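The lifecycle and the clear operation can be sketched as a small state machine. This is a simplified model under stated assumptions: it covers only the main path (scheduled, queued, running, then a final state), the class below is hypothetical rather than Airflow's own task instance model, and clearing is modeled as resetting the state to `None` so the task can be scheduled again.

```python
# Allowed transitions, simplified from the lifecycle described in this section.
TRANSITIONS = {
    None: {"scheduled"},
    "scheduled": {"queued"},
    "queued": {"running"},
    "running": {"success", "failed", "skipped"},
}
FINISHED = {"success", "failed", "skipped"}


class TaskInstance:
    def __init__(self, task_id):
        self.task_id = task_id
        self.state = None

    def advance(self, new_state):
        """Move to new_state if the simplified lifecycle allows it."""
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"{self.state!r} -> {new_state!r} is not allowed")
        self.state = new_state

    def clear(self):
        """Reset a finished task instance so it can be picked up again."""
        if self.state not in FINISHED:
            raise ValueError("only finished task instances are cleared in this sketch")
        self.state = None


ti = TaskInstance("load")
for state in ("scheduled", "queued", "running", "failed"):
    ti.advance(state)
state_before_clear = ti.state
ti.clear()
```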
DAG Runs
A DAG Run represents an individual execution of an entire DAG at a specific point in time. Each DAG run has:
- Run ID: Unique identifier for the run
- State: Current state (running, success, failed)
- Execution Date: The logical date/time the DAG was scheduled to run
- Start/End Date: Actual execution timestamps
- Configuration: Run-specific configuration parameters
Listing DAG Runs
The list-runs command supports filtering by:
- State: Filter runs by their state (running, success, failed)
- No Backfill: Exclude backfill runs from results
- Start Date: Filter runs executed after a specific date
Source: airflow-core/src/airflow/cli/cli_config.py
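The three filters combine with a logical AND. The following sketch applies them to an in-memory list; the `DagRun` dataclass and `filter_runs` function are hypothetical stand-ins, and the start-date filter is assumed to mean "on or after".

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DagRun:
    run_id: str
    state: str
    start_date: datetime
    is_backfill: bool = False


def filter_runs(runs, state=None, no_backfill=False, start_date=None):
    """Apply the three filters; a filter left at its default is ignored."""
    selected = []
    for run in runs:
        if state is not None and run.state != state:
            continue
        if no_backfill and run.is_backfill:
            continue
        if start_date is not None and run.start_date < start_date:
            continue
        selected.append(run)
    return selected


runs = [
    DagRun("manual_1", "success", datetime(2024, 1, 1)),
    DagRun("backfill_1", "success", datetime(2024, 1, 2), is_backfill=True),
    DagRun("scheduled_1", "failed", datetime(2024, 1, 3)),
    DagRun("scheduled_2", "success", datetime(2024, 1, 4)),
]
recent_ok = filter_runs(runs, state="success", no_backfill=True, start_date=datetime(2024, 1, 2))
```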
Variables
Airflow Variables provide a mechanism for storing and retrieving arbitrary content or settings as simple key-value pairs. Variables are encrypted when the fernet_key is configured.
Variable Commands
| Command | Purpose |
|---|---|
| variables list | List all variables |
| variables get | Get a specific variable value |
| variables set | Set a variable value |
| variables delete | Delete a variable |
| variables export | Export all variables to stdout |
| variables import | Import variables from a file |
Variable Options
| Option | Description |
|---|---|
| VAR | Variable key name |
| VAR_VALUE | Value to set |
| VAR_DESCRIPTION | Optional description |
| SERIALIZE_JSON | Serialize value as JSON |
| DESERIALIZE_JSON | Deserialize JSON value |
| DEFAULT | Default value if variable doesn't exist |
| VAR_IMPORT | Path to import file |
| VAR_ACTION_ON_EXISTING_KEY | Action for existing keys |
Source: airflow-core/src/airflow/cli/cli_config.py
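The JSON and default options are easiest to understand side by side. The sketch below uses an in-memory dictionary as a stand-in for the variable store; `VariableStore` is a hypothetical class and does not include encryption or database access.

```python
import json

_MISSING = object()  # sentinel so that None can be a legitimate default


class VariableStore:
    """In-memory stand-in for the variable table; not Airflow's implementation."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, serialize_json=False):
        # Values are stored as strings; JSON serialization is opt-in.
        self._data[key] = json.dumps(value) if serialize_json else str(value)

    def get(self, key, default=_MISSING, deserialize_json=False):
        if key not in self._data:
            if default is _MISSING:
                raise KeyError(key)
            return default
        raw = self._data[key]
        return json.loads(raw) if deserialize_json else raw


store = VariableStore()
store.set("etl_config", {"retries": 3}, serialize_json=True)
as_dict = store.get("etl_config", deserialize_json=True)
as_text = store.get("etl_config")
fallback = store.get("missing_key", default="n/a")
```

Storing structured settings as JSON keeps a single variable per pipeline instead of many scalar ones, at the cost of needing the deserialize option on every read.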
Connections
Airflow Connections store credentials and configuration information needed to connect to external systems. Default connections are automatically created during initialization.
Default Connection Types
The db.py utility creates several default connections:
| Connection ID | Type | Purpose |
|---|---|---|
| facebook_default | facebook_social | Facebook social authentication |
| fs_default | fs | File system operations |
| ftp_default | ftp | FTP server access |
| google_cloud_default | google_cloud_platform | GCP default credentials |
| gremlin_default | gremlin | Gremlin graph database |
| hive_cli_default | hive_cli | Hive command-line interface |
| hiveserver2_default | hiveserver2 | HiveServer2 JDBC connections |
| http_default | http | Generic HTTP endpoints |
| iceberg_default | iceberg | Apache Iceberg catalog |
Source: airflow-core/src/airflow/utils/db.py
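Connections are often written in URI form. As a rough illustration of the fields a connection carries, the sketch below splits such a URI with the standard library. The function name and the returned field names are assumptions chosen for readability, and the credentials shown are made up.

```python
from urllib.parse import unquote, urlsplit


def parse_connection_uri(uri):
    """Split a connection URI into the fields a connection record typically holds."""
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Credentials are percent-encoded in URIs, so decode them.
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "schema": parts.path.lstrip("/") or None,
    }


conn = parse_connection_uri("ftp://user:p%40ss@localhost:21/")
```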
Connection CLI Commands
| Command | Purpose |
|---|---|
| connections list | List all connections |
| connections add | Add a new connection |
| connections delete | Delete a connection |
| connections get | Get connection details |
| connections edit | Edit an existing connection |
Assets
Assets represent data sources or destinations in Airflow. They can be associated with DAGs to create data-aware scheduling.
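The idea behind data-aware scheduling can be sketched without Airflow at all: a DAG becomes ready once every asset it depends on has been updated. The class below is a hypothetical toy model, and the asset URIs are invented; Airflow's real scheduling rules are richer than this.

```python
class AssetScheduler:
    """Mark a DAG ready once every asset it depends on has been updated."""

    def __init__(self):
        self._required = {}  # dag_id -> set of asset URIs it waits for
        self._pending = {}   # dag_id -> asset URIs still missing an update

    def register(self, dag_id, asset_uris):
        self._required[dag_id] = set(asset_uris)
        self._pending[dag_id] = set(asset_uris)

    def asset_updated(self, uri):
        """Record an update and return the DAGs that became ready to run."""
        ready = []
        for dag_id, pending in list(self._pending.items()):
            pending.discard(uri)
            if not pending:
                ready.append(dag_id)
                # Start waiting for a fresh round of updates.
                self._pending[dag_id] = set(self._required[dag_id])
        return ready


scheduler = AssetScheduler()
scheduler.register("daily_report", ["s3://bucket/orders", "s3://bucket/customers"])
first = scheduler.asset_updated("s3://bucket/orders")
second = scheduler.asset_updated("s3://bucket/customers")
third = scheduler.asset_updated("s3://bucket/customers")
```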
Asset Commands
| Command | Purpose |
|---|---|
| assets list | List all assets |
| assets details | Show asset details |
| assets materialize | Materialize an asset |
Asset Details Parameters
| Parameter | Description |
|---|---|
| ASSET_ALIAS | Alias name for the asset |
| ASSET_NAME | Name of the asset |
| ASSET_URI | URI identifying the asset |
Source: airflow-core/src/airflow/cli/cli_config.py
Configuration Changes in Airflow 3.0
Airflow 3.0 introduces several configuration changes that affect default behavior.
Default Behavior Changes
| Configuration | Old Default | New Default |
|---|---|---|
| catchup_by_default | True | False |
| create_cron_data_intervals | True | False |
| create_delta_data_intervals | True | False |
Source: airflow-core/src/airflow/cli/commands/config_command.py
Configuration Renames
Several scheduler configurations have been renamed:
| Old Name | New Name |
|---|---|
| scheduler.processor_poll_interval | scheduler.scheduler_idle_sleep_time |
| scheduler.deactivate_stale_dags_interval | scheduler.parsing_cleanup_interval |
| scheduler.statsd_on | metrics.statsd_on |
| scheduler.max_threads | dag_processor.parsing_processes |
Source: airflow-core/src/airflow/cli/commands/config_command.py
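One way to apply these renames to an existing configuration file is a small migration script. The sketch below uses only the standard `configparser` module; the `migrate` helper is hypothetical, and the mapping is copied from the rename table in this section rather than read from Airflow itself.

```python
import configparser

# (old section, old key) -> (new section, new key), taken from the rename table.
RENAMES = {
    ("scheduler", "processor_poll_interval"): ("scheduler", "scheduler_idle_sleep_time"),
    ("scheduler", "deactivate_stale_dags_interval"): ("scheduler", "parsing_cleanup_interval"),
    ("scheduler", "statsd_on"): ("metrics", "statsd_on"),
    ("scheduler", "max_threads"): ("dag_processor", "parsing_processes"),
}


def migrate(cfg_text):
    """Return a ConfigParser with renamed options moved to their new location."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(cfg_text)
    for (old_sec, old_key), (new_sec, new_key) in RENAMES.items():
        if cfg.has_option(old_sec, old_key):
            value = cfg.get(old_sec, old_key)
            cfg.remove_option(old_sec, old_key)
            if not cfg.has_section(new_sec):
                cfg.add_section(new_sec)
            cfg.set(new_sec, new_key, value)
    return cfg


migrated = migrate("[scheduler]\nmax_threads = 4\nstatsd_on = True\n")
```

Values are moved verbatim, so any option whose meaning changed along with its name would still need a manual review.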
Catchup Behavior Change
In Airflow 3.0, DAGs without explicit catchup parameter definition will not catch up by default. This represents a change from Airflow 2.x behavior. Organizations relying on catchup behavior should set catchup = True in their DAG definitions or configure:
[scheduler]
catchup_by_default = True
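The practical difference between the two settings can be illustrated with a toy calculation. The sketch assumes a daily schedule in which a run is created once its interval has ended; `runs_to_create` is a hypothetical helper, not part of Airflow.

```python
from datetime import date, timedelta


def runs_to_create(start, today, catchup):
    """Return the daily intervals a scheduler would create runs for."""
    # Intervals that have fully elapsed between the start date and today.
    elapsed = [start + timedelta(days=i) for i in range((today - start).days)]
    if catchup:
        return elapsed       # every missed interval since the start date
    return elapsed[-1:]      # only the most recent completed interval


start, today = date(2024, 1, 1), date(2024, 1, 5)
with_catchup = runs_to_create(start, today, catchup=True)
without_catchup = runs_to_create(start, today, catchup=False)
```

With catchup enabled, a DAG deployed late would immediately queue four runs in this example; with it disabled, only one.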
Providers
Airflow Providers extend the core functionality by integrating with external services and systems.
Provider Information Commands
| Command | Purpose |
|---|---|
| providers list | List all installed providers |
| providers details | Show provider details |
| providers hooks | List registered provider hooks |
| providers triggers | List registered provider triggers |
| providers executors | Get information about executors |
| providers secrets | Get information about secrets backends |
| providers connections | Get connection information |
| providers notifications | Get notification information |
| providers extra-links | List extra links from providers |
| providers widgets | List connection form widgets |
| providers behaviors | Get connection types with custom behaviors |
| providers logging | Get task logging handlers |
| providers auth-managers | Get auth managers information |
| providers configs | Get provider configuration |
| providers lazy-loaded | Check lazy loading status |
Source: airflow-core/src/airflow/cli/cli_config.py
Dependency Management
Dependency Tree
Airflow's dependency tree defines the module structure and relationships between packages. This information is generated and maintained in the repository for dependency analysis.
| File | Purpose |
|---|---|
dep_tree.txt | Complete dependency tree of Airflow |
dependency_depth.json | Dependency depth analysis |
Generated using:
uv tree --no-dedupe > /opt/airflow/generated/dep_tree.txt
Source: generated/README.md
Workflow Architecture
graph TD
subgraph "Authoring"
A[Define DAG] --> B[Define Tasks]
B --> C[Set Dependencies]
end
subgraph "Scheduling"
D[Scheduler] --> E{Parse DAGs}
E --> F[DAG Run Created]
F --> G[Task Instances Queued]
end
subgraph "Execution"
G --> H[Executor]
H --> I[Worker]
I --> J[Task Execution]
end
subgraph "Monitoring"
J --> K[Update State]
K --> L[UI/Webserver]
L --> M[Logs & Metrics]
end
G -->|Async Operations| N[Async Commands]
N --> O[Cloud Composer Environments]
Security Features
Fernet Key Rotation
Airflow supports rotating Fernet encryption keys to maintain security of stored credentials and variables:
airflow rotate-fernet-key
This command rotates all encrypted connection credentials and variables.
Source: airflow-core/src/airflow/cli/cli_config.py
Configuration Reference
For secure connections, refer to the official documentation: https://airflow.apache.org/docs/apache-airflow/stable/howto/secure-connections.html
Additional CLI Commands
Info Command
Provides system and environment information:
airflow info [--anonymize] [--file-io] [--output OUTPUT] [--verbose]
Standalone Mode
Runs a complete Airflow instance for testing or development:
airflow standalone
Cheat Sheet
Displays a quick reference for common Airflow commands:
airflow cheat-sheet [--verbose]
Plugins Command
Dump information about loaded plugins:
airflow plugins [--output OUTPUT] [--verbose]
Source: airflow-core/src/airflow/cli/cli_config.py
Teams (RBAC)
Airflow supports team-based access control with the following operations:
Team Commands
| Command | Purpose |
|---|---|
| teams create | Create a new team |
| teams delete | Delete a team |
Team Creation Requirements
- Team names must be 3-50 characters long
- Only alphanumeric characters, hyphens, and underscores are allowed
Source: https://github.com/apache/airflow / Human Manual
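The two team-name rules translate directly into a regular expression. This is a sketch of the stated requirements only; the function name is hypothetical, and Airflow's own validation code may differ in detail.

```python
import re

# 3 to 50 characters; letters, digits, hyphens, and underscores only.
TEAM_NAME_RE = re.compile(r"[A-Za-z0-9_-]{3,50}")


def is_valid_team_name(name):
    """Return True if name satisfies both requirements listed above."""
    return TEAM_NAME_RE.fullmatch(name) is not None
```

Using `fullmatch` rather than `match` ensures that trailing invalid characters are rejected.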
Architecture Overview
Related topics: Scheduler and Executor Architecture, REST API
Apache Airflow is an open-source workflow orchestration platform designed to programmatically author, schedule, and monitor complex data pipelines. The architecture is built around a distributed, scalable design that separates concerns between scheduling, execution, and monitoring.
Core Architectural Components
Airflow's architecture consists of several key components that work together to manage workflow execution across distributed environments.
Component Overview
| Component | Purpose | Key Files |
|---|---|---|
| Scheduler | Triggers scheduled tasks and submits tasks to executors | airflow-core/src/airflow/dag_processing/manager.py |
| Executor | Executes tasks distributed across workers | Configured via airflow.cfg |
| Web Server | Provides UI for monitoring and management | airflow/ui/ |
| Database | Stores DAGs, connections, variables, and execution history | Configured via airflow.cfg |
| DAG Processor | Parses and processes DAG files | airflow-core/src/airflow/dag_processing/manager.py |
Sources: airflow-core/docs/core-concepts/overview.rst
Basic Architecture
The basic Airflow architecture runs all components on a single machine, suitable for development, testing, and small-scale deployments.
graph TB
subgraph "Airflow Core"
WS[Web Server] <--> DB[(Metadata Database)]
SCH[Scheduler] <--> DB
SCH <--> EX[Executor]
EX <--> DB
DP[DAG Processor] <--> DB
end
subgraph "DAG Storage"
DAG[DAG Files] --> DP
end
WS -->|UI Access| Users
SCH -->|Schedule| DAG
This single-node deployment includes all core components running together, with the scheduler handling task scheduling and the web server providing the user interface.
Sources: airflow-core/docs/img/diagram_basic_airflow_architecture.py
Distributed Architecture
For production environments, Airflow supports a distributed architecture where components can scale independently.
graph TB
subgraph "Web Tier"
WS1[Web Server 1]
WS2[Web Server 2]
LB[Load Balancer]
end
subgraph "Scheduler Tier"
SCH1[Scheduler 1]
SCH2[Scheduler 2]
end
subgraph "Worker Tier"
W1[Worker 1]
W2[Worker 2]
W3[Worker N]
end
subgraph "Metadata"
DB[(PostgreSQL/MySQL)]
REDIS[(Redis/Message Broker)]
end
LB --> WS1
LB --> WS2
WS1 <--> DB
WS2 <--> DB
SCH1 <--> DB
SCH2 <--> DB
SCH1 --> REDIS
SCH2 --> REDIS
REDIS --> W1
REDIS --> W2
REDIS --> W3
Scalability Features
The distributed architecture provides:
- Horizontal Scaling: Multiple schedulers and web servers can run in parallel
- Worker Flexibility: Workers can be added or removed based on workload
- High Availability: No single point of failure for critical components
- Isolation: DAG processing is separated from execution
Sources: airflow-core/docs/img/diagram_distributed_airflow_architecture.py
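The flow from scheduler to message broker to workers can be imitated on one machine with standard-library threads. In this sketch an in-process `queue.Queue` stands in for the broker and threads stand in for worker nodes; all names are hypothetical, and a real deployment would use separate processes and a networked broker.

```python
import queue
import threading


def run_with_workers(task_ids, worker_count=3):
    """Push tasks onto a shared queue and let worker threads drain it."""
    broker = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker(name):
        while True:
            task_id = broker.get()
            if task_id is None:          # sentinel: no more work for this worker
                broker.task_done()
                return
            with lock:
                results[task_id] = name  # record which worker handled the task
            broker.task_done()

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for task_id in task_ids:             # the "scheduler" side enqueues work
        broker.put(task_id)
    for _ in threads:
        broker.put(None)                 # one sentinel per worker
    broker.join()
    for thread in threads:
        thread.join()
    return results


handled = run_with_workers(["t1", "t2", "t3", "t4"])
```

Adding workers only changes `worker_count`; the producer side is unaffected, which is the property that makes the worker tier easy to scale.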
DAG Processing Architecture
The DAG Processor is responsible for reading, parsing, and validating DAG files before the Scheduler can schedule their tasks.
Processing Pipeline
graph LR
A[DAG Files] -->|Read| B[DAG Processor]
B -->|Parse| C[DAG Bag]
C -->|Validate| D[Valid DAGs]
D -->|Sync| E[(Metadata DB)]
B -->|Log| F[Processor Logs]
Key Processing Manager
The DagFileProcessorManager handles:
- File Discovery: Scanning DAG directories for Python files
- Parsing: Converting Python DAG definitions into Airflow DAG objects
- Serialization: Storing parsed DAGs in the database
- Callback Handling: Processing DAG-level callbacks
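# Simplified from airflow-core/src/airflow/dag_processing/manager.py
# Illustrative only; the real manager is considerably more involved.
from airflow.models.dagbag import DagBag  # import path may differ between Airflow versions

class DagFileProcessorManager:
    def process_file(self, filepath):
        """Process a single DAG file and return its DAGs keyed by dag_id."""
        # Parse the file; DagBag collects the DAG objects defined in it.
        dagbag = DagBag(dag_folder=filepath)
        # Persist each parsed DAG to the metadata database.
        for dag in dagbag.dags.values():
            dag.sync_to_db()
        return dagbag.dags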
Sources: airflow-core/src/airflow/dag_processing/manager.py
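The file discovery step can be approximated with `pathlib`. The sketch below walks a folder for Python files and, in a "safe mode", skips files that never mention both "airflow" and "dag". That heuristic and the function name are assumptions of this example and should not be read as the manager's exact behavior.

```python
import tempfile
from pathlib import Path


def discover_dag_files(dag_folder, safe_mode=True):
    """Return sorted .py files under dag_folder that look like DAG definitions."""
    found = []
    for path in sorted(Path(dag_folder).rglob("*.py")):
        if safe_mode:
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            # Heuristic (an assumption of this sketch): skip files that never
            # mention both "airflow" and "dag".
            if "airflow" not in text or "dag" not in text:
                continue
        found.append(path)
    return found


# Demo with a temporary folder holding one DAG-like file and one helper module.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "etl.py").write_text("from airflow.sdk import DAG\n")
    Path(folder, "helpers.py").write_text("def add(a, b):\n    return a + b\n")
    names = [p.name for p in discover_dag_files(folder)]
    all_names = [p.name for p in discover_dag_files(folder, safe_mode=False)]
```

A cheap text check like this avoids importing every Python file in the folder, which matters because importing runs arbitrary module-level code.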
Authentication and Authorization Architecture
Airflow supports pluggable authentication through the Auth Manager interface, allowing integration with various authentication backends.
graph TB
U[User] -->|Auth Request| AM[Auth Manager]
AM -->|User Lookup| DB[(Metadata DB)]
AM -->|Backend Check| BE[External Backend<br/>OIDC/SAML/LDAP]
BE -->|Validation| AM
AM -->|Permissions| PERMS[Permission Set]
PERMS -->|Grant Access| RES[Resources]
Auth Manager Components
| Component | Responsibility |
|---|---|
| AuthManager | Core interface for authentication |
| FastAPIAuthManager | Default implementation for Airflow 3.0+ |
| Backends | External identity providers |
The auth manager architecture enables:
- Integration with enterprise identity providers
- Role-based access control (RBAC)
- Fine-grained permissions on DAGs and tasks
Sources: airflow-core/docs/img/diagram_auth_manager_airflow_architecture.py
Configuration Management
Airflow 3.0 introduced significant configuration changes from Airflow 2.x:
Configuration Changes Summary
| Old Parameter | New Parameter | Section |
|---|---|---|
| processor_poll_interval | scheduler_idle_sleep_time | scheduler |
| deactivate_stale_dags_interval | parsing_cleanup_interval | scheduler |
| statsd_on | statsd_on | metrics |
| max_threads | parsing_processes | dag_processor |
| create_cron_data_intervals | unchanged | scheduler |
| create_delta_data_intervals | unchanged | scheduler |
Default Behavior Changes
In Airflow 3.0, the default for catchup_by_default is False, meaning DAGs without explicit catchup configuration will not backfill past runs.
Sources: airflow-core/src/airflow/cli/commands/config_command.py
Executor Architecture
Airflow supports multiple executor types for different deployment scenarios:
| Executor | Use Case | Scalability |
|---|---|---|
| LocalExecutor | Single machine, development | Limited |
| SequentialExecutor | Debugging, minimal resources | None |
| CeleryExecutor | Production, distributed | High |
| KubernetesExecutor | Containerized environments | Very High |
| RayExecutor | Ray cluster integration | High |
Executor Selection
Executors are configured in airflow.cfg:
[core]
executor = KubernetesExecutor
Sources: airflow-core/docs/core-concepts/overview.rst
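A short sketch of how the executor setting might be resolved follows. It reads the `[core]` section with `configparser` and lets an `AIRFLOW__CORE__EXECUTOR` environment variable take precedence, following Airflow's `AIRFLOW__SECTION__KEY` override convention. The helper name and the fallback value are assumptions of this example.

```python
import configparser
import os

SAMPLE_CFG = """
[core]
executor = KubernetesExecutor
"""


def configured_executor(cfg_text, env=None, default="LocalExecutor"):
    """Resolve the executor name: environment override, then file, then default."""
    env = os.environ if env is None else env
    override = env.get("AIRFLOW__CORE__EXECUTOR")
    if override:
        return override
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get("core", "executor", fallback=default)


from_file = configured_executor(SAMPLE_CFG, env={})
from_env = configured_executor(SAMPLE_CFG, env={"AIRFLOW__CORE__EXECUTOR": "CeleryExecutor"})
from_default = configured_executor("", env={})
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.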
CLI and Command Architecture
The Airflow CLI provides a comprehensive command-line interface for managing Airflow components:
Command Structure
airflow
├── config # View configuration
├── connections # Manage connections
├── dags # DAG management
├── db-manager # Database management
├── info # System information
├── plugins # Plugin dump
├── providers # Provider information
├── rotate-fernet-key # Credential rotation
├── standalone # All-in-one mode
└── version # Version display
Key CLI Commands
| Command | Purpose |
|---|---|
| airflow dags backfill | Run historical DAG executions |
| airflow dags list-runs | List DAG run instances |
| airflow connections list | Show configured connections |
| airflow providers list | Display loaded providers |
| airflow rotate-fernet-key | Rotate encryption keys |
Sources: airflow-core/src/airflow/cli/cli_config.py
Provider Architecture
Providers extend Airflow's capabilities by integrating with external systems:
Provider Categories
- Cloud Providers: Google Cloud, AWS, Azure
- Service Integrations: HTTP, SSH, GraphQL
- Database Connectors: PostgreSQL, MySQL, Snowflake
- Data Processing: Spark, Databricks, dbt
Provider Communication
graph LR
A[Airflow Task] -->|Hook| B[Provider Hook]
B -->|API| C[External Service]
C -->|Response| B
B -->|Result| A
Providers are discovered and managed through the airflow providers CLI commands.
Sources: airflow-core/src/airflow/cli/cli_config.py
Installation Modes
Docker Installation
Airflow provides official Docker images with multiple variants:
| Image Type | Description | Size |
|---|---|---|
| apache/airflow:latest | Latest stable, default Python | ~1GB |
| apache/airflow:3.x.x | Versioned release | ~1GB |
| apache/airflow:slim-latest | Minimal installation | ~500MB |
Installation Methods
# Basic installation
pip install 'apache-airflow==3.2.0' \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.2.0/constraints-3.10.txt"
# With extras
pip install 'apache-airflow[postgres,google]==3.2.0' \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.2.0/constraints-3.10.txt"
Sources: docker-stack-docs/README.md, generated/PYPI_README.md
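The constraints URL in the commands above follows a fixed pattern built from the Airflow version and the Python major.minor version. The sketch below assembles it programmatically; the helper names are hypothetical, and the pattern is inferred from the two example commands.

```python
CONSTRAINTS_TEMPLATE = (
    "https://raw.githubusercontent.com/apache/airflow/"
    "constraints-{airflow}/constraints-{python}.txt"
)


def constraints_url(airflow_version, python_version):
    """Build the constraints URL for an Airflow release and a major.minor Python."""
    return CONSTRAINTS_TEMPLATE.format(airflow=airflow_version, python=python_version)


def pip_command(airflow_version, python_version, extras=()):
    """Return the pip argument list for a constrained install, with optional extras."""
    extras_part = f"[{','.join(extras)}]" if extras else ""
    return [
        "pip", "install",
        f"apache-airflow{extras_part}=={airflow_version}",
        "--constraint", constraints_url(airflow_version, python_version),
    ]


command = pip_command("3.2.0", "3.10", extras=("postgres", "google"))
```

Returning a list rather than a shell string makes the command safe to pass to `subprocess.run` without quoting concerns.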
Default Connections
Airflow ships with pre-configured default connections for common integrations:
| Connection ID | Type | Purpose |
|---|---|---|
| google_cloud_default | google_cloud_platform | GCP resources |
| fs_default | fs | File system access |
| http_default | http | HTTP endpoints |
| ftp_default | ftp | FTP/SFTP transfers |
| hive_cli_default | hive_cli | Hive CLI connections |
| hiveserver2_default | hiveserver2 | HiveServer2 connections |
Sources: airflow-core/src/airflow/utils/db.py
Development and Testing Architecture
The Breeze development environment provides an integrated development setup:
Breeze Features
- Pre-configured Docker-based development environment
- Multiple Python version support
- Integration testing framework
- Static code analysis tools
- Documentation building capabilities
Development Commands
# Start development shell
breeze shell
# Run tests
breeze test
# Build documentation
breeze build-docs
Sources: dev/breeze/src/airflow_breeze/commands/developer_commands.py
Shared Distribution Architecture
Airflow supports shared code distributions for cross-project functionality:
Configuration
[tool.airflow]
shared_distributions = [
"apache-airflow-shared-timezones",
]
Shared distributions are:
- Defined in pyproject.toml under tool.airflow
- Symlinked via _shared folder
- Automatically synchronized by pre-commit hooks
Sources: shared/README.md
MyPy Type Checking Integration
Airflow provides custom MyPy plugins for enhanced type checking:
Available Plugins
| Plugin | Purpose |
|---|---|
| airflow_mypy.plugins.decorators | Type checking for Airflow decorators |
| airflow_mypy.plugins.outputs | Type inference for XCom arguments |
Configuration
[mypy]
plugins = airflow_mypy.plugins.decorators, airflow_mypy.plugins.outputs
Or in pyproject.toml:
[tool.mypy]
plugins = ["airflow_mypy.plugins.decorators", "airflow_mypy.plugins.outputs"]
Sources: dev/mypy/README.md
Build and Release Architecture
SBOM (Software Bill of Materials)
Airflow generates SBOMs for security and compliance tracking:
- Dependency tree generation via uv tree
- Dependency depth analysis
- Version-specific SBOM files
Documentation Build
The documentation system includes:
- Sphinx-based documentation generation
- Pagefind search integration
- Multi-version documentation support
- Third-party inventory tracking
Sources: dev/breeze/src/airflow_breeze/commands/sbom_commands.py, devel-common/src/sphinx_exts/pagefind_search/README.md
Summary
Apache Airflow's architecture is designed for scalability, reliability, and extensibility:
- Modular Design: Components can be scaled independently
- Pluggable Executors: Support for various execution backends
- Extensible Providers: Integration with external systems
- Production-Ready: High availability and monitoring capabilities
- Developer-Friendly: Comprehensive tooling and documentation
The architecture supports deployments from single-machine development environments to large-scale, distributed production systems handling thousands of workflows.
Scheduler and Executor Architecture
Related topics: Architecture Overview, Kubernetes Deployment
Apache Airflow's scheduling and execution system is a distributed architecture that coordinates the parsing of Directed Acyclic Graphs (DAGs), scheduling of task instances, and execution of tasks across worker nodes. This document provides a comprehensive overview of how the Scheduler and Executors interact to process and execute workflows.
Overview
The Scheduler and Executor architecture in Apache Airflow consists of several interconnected components that work together to transform DAG definitions into executed tasks. The system separates concerns between scheduling decisions (when to run tasks based on dependencies and timetable data) and execution (how and where tasks actually run).
graph TD
A[DAG Files] --> B[DAG Processor]
B --> C[DAG Parsing]
C --> D[Serialized DAGs]
D --> E[Metadata Database]
E --> F[Scheduler]
F --> G[Executor]
G --> H[Workers]
H --> I[Task Execution]
I --> E
Scheduler Architecture
Scheduler Process
The Scheduler is a long-running daemon process that continuously monitors DAGs and schedules task instances for execution. It is started and configured through the airflow scheduler CLI command.
CLI Configuration:
The scheduler command is defined in cli_config.py and accepts multiple configuration parameters for controlling its behavior.
| Parameter | Purpose | Default |
|---|---|---|
| --num-runs | Number of scheduler runs before exiting | -1 (infinite) |
| --only-idle | Only schedule DAGs with idle tasks | False |
| --pid | PID file location | None |
| --daemon | Run as daemon process | False |
| --stdout | stdout log file | None |
| --stderr | stderr log file | None |
| --log-file | Log file path | None |
| --skip-serve-logs | Skip serving logs | False |
| --dev | Development mode | False |
Sources: airflow-core/src/airflow/cli/cli_config.py
Job Lifecycle
The Scheduler creates and manages Jobs that represent its operational cycles. Each scheduler run follows a specific lifecycle that involves database interactions.
sequenceDiagram
participant CLI as CLI Component
participant JobRunner as JobRunner
participant DB as Database
participant TaskRunner as TaskRunner
activate CLI
CLI->>JobRunner: Create Job
activate JobRunner
JobRunner->>DB: Create Job Record
activate DB
DB-->>JobRunner: Job Created
JobRunner->>DB: Create Session
DB->>JobRunner: Session
deactivate DB
JobRunner->>CLI: Job Created
deactivate JobRunner
CLI->>JobRunner: Execute Job
activate JobRunner
par
JobRunner->>DB: Schedule Tasks
activate DB
DB-->>JobRunner: Scheduled Tasks
deactivate DB
and
JobRunner->>JobRunner: Process DAG Files
end
JobRunner->>DB: Perform Heartbeat
activate DB
DB->>JobRunner: Heartbeat Response
JobRunner->>JobRunner: Heartbeat Callback
DB-->>JobRunner: Close Session
deactivate DB
JobRunner->>CLI: Job Completed
deactivate JobRunner
deactivate CLI
Sources: airflow-core/src/airflow/jobs/JOB_LIFECYCLE.md
Scheduler Responsibilities
The Scheduler performs the following key operations:
- DAG Parsing: Reads DAG files and parses them into Python objects
- Task Scheduling: Determines which tasks are ready to execute based on dependencies
- DagRun Creation: Creates DagRun records for scheduled DAG runs
- TaskInstance Creation: Creates TaskInstance records for tasks to be executed
- Heartbeating: Maintains its presence and reports health to the database
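The task scheduling step above can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which a task is ready once every upstream task has succeeded; the function name, the dictionary layout, and the state strings are hypothetical and are not taken from the Airflow scheduler code.

```python
from __future__ import annotations


def find_ready_tasks(deps: dict[str, set[str]], states: dict[str, str]) -> list[str]:
    """Return tasks that have no state yet and whose upstream tasks all succeeded.

    deps maps task_id -> set of upstream task_ids (hypothetical layout).
    states maps task_id -> current state string; a missing key means "not yet scheduled".
    """
    ready = []
    for task_id, upstream in deps.items():
        if states.get(task_id) is not None:
            continue  # already scheduled, running, or finished
        if all(states.get(u) == "success" for u in upstream):
            ready.append(task_id)
    return sorted(ready)


# A small example DAG: extract feeds transform and report; transform feeds load.
deps = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"extract"},
}

first_pass = find_ready_tasks(deps, {})
second_pass = find_ready_tasks(deps, {"extract": "success"})
```

On the first pass only the root task qualifies; once extract succeeds, both of its direct downstream tasks become ready while load keeps waiting for transform.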
Executor Architecture
Executors are responsible for the actual execution of tasks. Airflow supports multiple executor types, each with different deployment characteristics.
Executor Types
| Executor | Description | Use Case |
|---|---|---|
| SequentialExecutor | Executes tasks sequentially in the same process | Development/Debugging |
| LocalExecutor | Executes tasks in parallel processes on a single machine | Single-node deployments |
| CeleryExecutor | Distributes tasks across multiple machines using Celery | Distributed production deployments |
| KubernetesExecutor | Creates pods per task in Kubernetes | Kubernetes-native deployments |
| LocalKubernetesExecutor | Hybrid of Local and Kubernetes executors | Testing/minimal Kubernetes |
Executor Loader
The ExecutorLoader class is responsible for loading and configuring executors based on Airflow configuration. It supports both simple executor names and complex module paths with optional aliases.
graph TD
A[Executor Config] --> B{ExecutorLoader}
B --> C{Check Format}
C -->|Simple Name| D[Load Core Executor]
C -->|Team:Executor| E[Parse Team Config]
E --> F[Alias:Module/Name]
F --> G[Resolve Module Path]
D --> H[Return ExecutorName]
G --> H
Executor Configuration Parsing:
The loader parses executor configurations in multiple formats:
- Simple name: SequentialExecutor, LocalExecutor
- Module path: airflow.executors.local_executor.LocalExecutor
- With alias: MyAlias:LocalExecutor
- Team-based: team_name:executor_name or team_name:alias:executor_name
Sources: airflow-core/src/airflow/executors/executor_loader.py
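As a rough illustration of the first three formats, the sketch below parses a simple name, a module path, or an alias-prefixed value. It is not the ExecutorLoader implementation: the registry, the ParsedExecutor dataclass, and parse_executor_spec are hypothetical names, and the team-prefixed forms are left out on purpose because a two-part value cannot be told apart from the alias form without extra context.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical registry of short names -> module paths (illustrative only).
CORE_EXECUTORS = {
    "LocalExecutor": "airflow.executors.local_executor.LocalExecutor",
}


@dataclass(frozen=True)
class ParsedExecutor:
    alias: str | None
    module_path: str


def parse_executor_spec(spec: str) -> ParsedExecutor:
    """Parse 'Name', 'pkg.module.Class', or 'alias:<either of those>'."""
    parts = spec.strip().split(":")
    if len(parts) == 1:
        alias, target = None, parts[0]
    elif len(parts) == 2:
        alias, target = parts
    else:
        raise ValueError(f"Unsupported executor spec: {spec!r}")
    if not target or alias == "":
        raise ValueError(f"Malformed executor spec: {spec!r}")
    # Short names are resolved through the registry; dotted paths pass through.
    module_path = CORE_EXECUTORS.get(target, target)
    if "." not in module_path:
        raise ValueError(f"Unknown executor name: {target!r}")
    return ParsedExecutor(alias=alias, module_path=module_path)


by_name = parse_executor_spec("LocalExecutor")
by_alias = parse_executor_spec("MyAlias:LocalExecutor")
by_path = parse_executor_spec("my_pkg.executors.CustomExecutor")
```

Resolving short names through a lookup table keeps the configuration value compact while still allowing arbitrary module paths for custom executors.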
Base Executor Interface
All executors inherit from BaseExecutor which defines the standard interface:
| Method | Purpose |
|---|---|
| execute_async() | Execute a single task |
| sync() | Sync state with metadata database |
| end() | Cleanup executor resources |
| terminate() | Force terminate all running tasks |
| try_adopt_task_instances() | Adopt orphaned task instances |
| render_slots() | Render available executor slots |
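To make the division of labour between execute_async(), sync(), and end() more concrete, here is a toy executor built on the standard library thread pool. It does not subclass or reproduce BaseExecutor; the class name, the simplified method signatures, and the in-memory states dictionary (standing in for the metadata database) are assumptions made for this sketch.

```python
from __future__ import annotations

from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class ToyExecutor:
    """Illustrative only; not Airflow's BaseExecutor."""

    def __init__(self, parallelism: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=parallelism)
        self._running: dict[str, Future] = {}
        self.states: dict[str, str] = {}  # stand-in for persisted task state

    def execute_async(self, key: str, fn: Callable[[], object]) -> None:
        """Hand one unit of work to the pool without waiting for it."""
        self.states[key] = "running"
        self._running[key] = self._pool.submit(fn)

    def sync(self) -> None:
        """Record the outcome of every task that has finished so far."""
        for key, future in list(self._running.items()):
            if not future.done():
                continue
            self.states[key] = "failed" if future.exception() else "success"
            del self._running[key]

    def end(self) -> None:
        """Wait for outstanding work, then record the final outcomes."""
        self._pool.shutdown(wait=True)
        self.sync()


executor = ToyExecutor()
executor.execute_async("ok_task", lambda: 1 + 1)
executor.execute_async("bad_task", lambda: 1 / 0)
executor.end()
```

Submission is non-blocking and outcomes are only collected in sync(), which mirrors the idea that an executor reports state back periodically instead of blocking the caller on each task.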
Local Executor
The Local Executor executes tasks in parallel worker processes on a single machine, providing a balance between simplicity and parallelism.
Key Features:
- Configurable parallelism (number of parallel workers)
- Supports task-level parallelism within a single node
- Inherits configuration from core executor registry
- Can be configured with aliases for multi-team deployments
# Example: Loading LocalExecutor with team configuration
executor_names_per_team.append(
ExecutorName(
alias=None,
module_path=cls.executors[module_or_name],
team_name=team_name
)
)
Sources: airflow-core/src/airflow/executors/local_executor.py
Task Scheduling Flow
The complete task scheduling flow involves multiple stages from DAG definition to task execution:
graph LR
A[DAG File] --> B[DAG Processor]
B --> C[Parse DAG]
C --> D[Serialize DAG]
D --> E[Store in DB]
E --> F[Scheduler]
F --> G[Evaluate Timetable]
G --> H{Check Dependencies}
H -->|Met| I[Create TaskInstance]
H -->|Not Met| J[Skip]
I --> K[Queue Task]
K --> L[Executor]
L --> M[Worker]
M --> N[Execute Task]
N --> O[Update State]
O --> P[Record XCom]
DAG Processing
The DAG collection module handles parsing and synchronization of DAGs:
| Component | Responsibility |
|---|---|
| DagBag | Collection of parsed DAGs from file system |
| DagFileProcessor | Parses individual DAG files |
| DagFileProcessorAgent | Manages multiple DAG processors |
| SerializedDagModel | Database representation of serialized DAGs |
Sources: airflow-core/src/airflow/dag_processing/collection.py
Timetables
Timetables determine when DAGs should be triggered. They provide schedule information and calculate logical dates for DAG runs.
Key Timetable Methods:
| Method | Purpose |
|---|---|
| describe() | Human-readable schedule description |
| infer_data_interval() | Infer run boundaries from logical date |
| get_next_runtime() | Calculate next scheduled run time |
| validate() | Validate timetable configuration |
Sources: airflow-core/src/airflow/timetables/base.py
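The idea of computing the next logical date and its data interval can be sketched with plain datetime arithmetic. The class below is a hypothetical stand-in for a fixed-interval schedule; it does not implement Airflow's timetable base class, and its method names were chosen for this example only.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FixedIntervalTimetable:
    """Hypothetical fixed-interval schedule, not Airflow's Timetable class."""

    start: datetime
    interval: timedelta

    def validate(self) -> None:
        """Reject schedules that could never advance."""
        if self.interval <= timedelta(0):
            raise ValueError("interval must be positive")

    def next_run_after(self, last_logical_date: datetime | None) -> datetime:
        """First run is at start; later runs follow at a fixed interval."""
        if last_logical_date is None:
            return self.start
        return last_logical_date + self.interval

    def data_interval(self, logical_date: datetime) -> tuple[datetime, datetime]:
        """The run covers the span from its logical date to the next one."""
        return logical_date, logical_date + self.interval


daily = FixedIntervalTimetable(
    start=datetime(2024, 1, 1, tzinfo=timezone.utc),
    interval=timedelta(days=1),
)
daily.validate()
first_run = daily.next_run_after(None)
second_run = daily.next_run_after(first_run)
first_interval = daily.data_interval(first_run)
```

Timezone-aware datetimes are used throughout so that interval arithmetic does not silently depend on the local timezone of the machine running the sketch.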
Task State Management
Task instances maintain state throughout their lifecycle. The TaskState class provides methods for managing task state via supervisor communications.
graph TD
A[Task Starts] --> B[Scheduled]
B --> C[Queued]
C --> D[Started]
D --> E{Ran Successfully?}
E -->|Yes| F[Success]
E -->|No| G{Retryable?}
G -->|Yes| H[Up for Retry]
H --> C
G -->|No| I[Failed]
F --> J[Emit XCom]
I --> J
TaskState API
The TaskState class provides key-value storage for task state information:
| Method | Description |
|---|---|
| get(key) | Retrieve task state value by key |
| set(key, value) | Store task state value |
| delete(key) | Delete a specific key |
| clear(all_map_indices) | Clear all keys or map-index specific keys |
Sources: task-sdk/src/airflow/sdk/execution_time/context.py
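The lifecycle diagram in this section can also be read as a small state machine. The sketch below encodes the transitions from that diagram and walks them for a sequence of attempt outcomes; the lower-case state labels, the retry rule based on max_tries, and both function names are assumptions of this example, not Airflow identifiers.

```python
from __future__ import annotations

# Allowed transitions, transcribed from the lifecycle diagram in this section.
TRANSITIONS = {
    "scheduled": {"queued"},
    "queued": {"started"},
    "started": {"success", "up_for_retry", "failed"},
    "up_for_retry": {"queued"},
    "success": set(),
    "failed": set(),
}


def next_state_after_attempt(succeeded: bool, try_number: int, max_tries: int) -> str:
    """Pick the state that follows 'started' for one attempt."""
    if succeeded:
        return "success"
    return "up_for_retry" if try_number < max_tries else "failed"


def run_task(outcomes: list[bool], max_tries: int) -> list[str]:
    """Walk the state machine and return every state the task passed through."""
    history = ["scheduled"]

    def move(new_state: str) -> None:
        if new_state not in TRANSITIONS[history[-1]]:
            raise ValueError(f"illegal transition {history[-1]} -> {new_state}")
        history.append(new_state)

    for try_number, succeeded in enumerate(outcomes, start=1):
        move("queued")
        move("started")
        move(next_state_after_attempt(succeeded, try_number, max_tries))
        if history[-1] != "up_for_retry":
            break  # terminal state reached
    return history


retried = run_task([False, True], max_tries=3)
exhausted = run_task([False, False], max_tries=2)
```

Keeping the transition table separate from the retry rule makes it easy to check that no code path produces a move the diagram does not allow.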
DAG Runs and Task Instances
DagRun States
| State | Description |
|---|---|
| queued | Initial state when DAG run is created |
| running | DAG run is currently executing |
| success | All tasks in the DAG completed successfully |
| failed | DAG run failed due to task or system failure |
Run Types
| Type | Trigger Mechanism |
|---|---|
| scheduled | Triggered automatically by timetable |
| manual | Triggered by user action via CLI or UI |
| dataset | Triggered by dataset dependency |
| backfill | Triggered by explicit backfill command |
Sources: airflow-core/src/airflow/ui/src/pages/DagRuns.tsx
CLI Commands for Scheduler and Executor
Scheduler Commands
# Start the scheduler
airflow scheduler
# Start with specific number of runs
airflow scheduler --num-runs 10
# Run in daemon mode
airflow scheduler --daemon --log-file /path/to/scheduler.log
# Development mode
airflow scheduler --dev
DAG Processing Commands
# Trigger DAG run
airflow dags trigger <dag_id>
# Test task
airflow tasks test <dag_id> <task_id> <logical_date>
# Clear task instances
airflow tasks clear <dag_id>
# Render task template
airflow tasks render <dag_id> <task_id> <logical_date>
Sources: airflow-core/src/airflow/cli/cli_config.py
Configuration
Executor Configuration
Executors are configured in airflow.cfg or through environment variables:
[core]
executor = LocalExecutor
[celery]
celery_executor_config = ...
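The interplay between airflow.cfg and environment variables can be sketched as follows. Airflow reads overrides from variables named AIRFLOW__{SECTION}__{KEY}, which take precedence over the file; the resolve_option helper itself is a hypothetical illustration and is not how Airflow's configuration module is implemented.

```python
from __future__ import annotations

import configparser

CFG_TEXT = """
[core]
executor = LocalExecutor
"""


def resolve_option(section: str, key: str, cfg_text: str, environ: dict[str, str]) -> str | None:
    """Return the environment override if present, else the value from the file."""
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in environ:
        return environ[env_name]
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get(section, key, fallback=None)


from_file = resolve_option("core", "executor", CFG_TEXT, environ={})
from_env = resolve_option(
    "core", "executor", CFG_TEXT, environ={"AIRFLOW__CORE__EXECUTOR": "CeleryExecutor"}
)
missing = resolve_option("core", "no_such_key", CFG_TEXT, environ={})
```

Passing the environment in as a plain dictionary keeps the helper easy to test; a real caller would pass os.environ.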
Scheduler Configuration
| Configuration | Description | Default |
|---|---|---|
| scheduler_num_runs | Number of scheduler runs | -1 (infinite) |
| scheduler_idle_sleep_time | Seconds between scheduler loops | 1 |
| num_runs | Alternative parameter for number of runs | -1 |
| only_idle | Only schedule idle DAGs | False |
Signal Handling
The scheduler supports the following signals for operational control:
| Signal | Action |
|---|---|
| SIGUSR2 | Dump a snapshot of task state being tracked by the executor |
Example usage:
pkill -f -USR2 "airflow scheduler"
Sources: airflow-core/src/airflow/cli/cli_config.py
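The general pattern behind such a signal hook can be shown with the standard signal module. This is a Unix-only sketch of the technique, not the scheduler's handler: the tracked_tasks dictionary, the snapshots list, and dump_state are hypothetical names, and the process sends the signal to itself purely for demonstration.

```python
import signal

# Hypothetical in-memory view of what an executor is currently tracking.
tracked_tasks = {"example_dag.extract": "running", "example_dag.load": "queued"}
snapshots = []


def dump_state(signum, frame):
    """Record a copy of the tracked state; a real process would log it instead."""
    snapshots.append(dict(tracked_tasks))


# Register the handler (must happen in the main thread), then trigger it once.
signal.signal(signal.SIGUSR2, dump_state)
signal.raise_signal(signal.SIGUSR2)
```

Copying the dictionary inside the handler means the snapshot is unaffected by later changes to the tracked state.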
Related Components
Triggerer
The Triggerer is a separate daemon that manages lightweight asynchronous triggers for tasks that need to wait for external conditions:
airflow triggerer --capacity 1000 --queues queue1,queue2
| Parameter | Purpose |
|---|---|
| --capacity | Maximum concurrent triggers |
| --queues | Queues to consume from |
| --pid | PID file location |
| --daemon | Run as daemon |
DAG Processor
The DAG Processor parses and validates DAG files before they are scheduled:
airflow dag-processor --bundle-name <name> --num-runs 5
Summary
The Scheduler and Executor Architecture in Apache Airflow provides a robust, scalable system for orchestrating complex workflows. Key architectural principles include:
- Separation of Concerns: Scheduler handles scheduling decisions; Executors handle task execution
- Pluggable Executors: Multiple executor types support different deployment scenarios
- Database-Driven State: All state is persisted in the metadata database for durability
- Continuous Loop: Scheduler runs continuously, periodically evaluating DAGs and scheduling tasks
- Team-Based Execution: Modern Airflow supports team-based executor configuration with aliases
REST API
Related topics: User Interface, Architecture Overview
Apache Airflow provides a comprehensive REST API for programmatic interaction with the workflow automation platform. The API is built using FastAPI and enables external systems and applications to manage DAGs, trigger executions, monitor workflows, and interact with task instances.
Architecture Overview
Apache Airflow's REST API is architected as a multi-layer FastAPI application that separates concerns between the core API and the execution API.
graph TD
subgraph "Client Layer"
A[External Clients]
B[CLI Tools]
C[UI Dashboard]
end
subgraph "REST API Layer"
D[Core API<br/>/api/v1]
E[Execution API<br/>/execution]
end
subgraph "Service Layer"
F[DAG Management Service]
G[Task Execution Service]
H[Configuration Service]
end
subgraph "Data Layer"
I[Airflow Database]
J[Metadata DB]
end
A --> D
B --> D
C --> D
C --> E
D --> F
D --> H
E --> G
F --> I
G --> I
H --> J
Core API (`/api/v1`)
The Core API provides the primary interface for DAG management, monitoring, and administrative operations. It handles:
- DAG run creation and management
- Task instance operations
- Connection and variable management
- User and permission management
- Plugin and provider information
Sources: airflow-core/src/airflow/api_fastapi/core_api/app.py
Execution API (`/execution`)
The Execution API is designed for lightweight, high-performance task execution operations. It provides endpoints for:
- Task state updates
- XCom value operations
- Task heartbeat signals
- Execution context retrieval
Sources: airflow-core/src/airflow/api_fastapi/execution_api/app.py
Authentication and Authorization
Airflow's REST API supports multiple authentication mechanisms through a pluggable auth manager architecture.
Authentication Flow
sequenceDiagram
participant Client
participant API as REST API
participant Auth as Auth Manager
participant DB as Database
Client->>API: Request with credentials
API->>Auth: Validate credentials
Auth->>DB: Check user/permissions
DB-->>Auth: User info + permissions
Auth-->>API: Auth result
alt Authentication Success
API-->>Client: 200 + Response data
else Authentication Failure
API-->>Client: 401/403 + Error
end
Simple Auth Manager
For development and testing environments, Airflow provides a Simple Auth Manager that supports basic authentication mechanisms.
Sources: airflow-core/src/airflow/api_fastapi/auth/managers/simple/simple_auth_manager.py
Access Control
The REST API implements granular access control through DAG-level permissions:
| Permission | Description |
|---|---|
| GET | Read access to DAG information |
| POST | Create/modify DAG resources |
| DELETE | Remove DAG resources |
| EDIT | Modify DAG configuration |
Access control is enforced through dependency injection on route handlers:
dependencies=[
Depends(requires_access_dag("GET")),
Depends(requires_access_dag("GET", DagAccessEntity.DEPENDENCIES)),
Depends(requires_access_dag("GET", DagAccessEntity.TASK_INSTANCE)),
]
Sources: airflow-core/src/airflow/api_fastapi/core_api/routes/ui/structure.py
DAG Runs API
DAG Runs represent individual executions of a Directed Acyclic Graph (DAG). The DAG Runs API provides comprehensive endpoints for managing workflow executions.
DAG Run Data Model
| Field | Type | Description |
|---|---|---|
| dag_id | string | Unique identifier for the DAG |
| run_id | string | Unique identifier for this execution |
| state | enum | Current state (queued, running, success, failed) |
| conf | object | Configuration passed to the DAG |
| logical_date | datetime | Scheduled execution time |
| start_date | datetime | Actual execution start time |
| end_date | datetime | Execution completion time |
| external_trigger | boolean | Whether triggered externally |
Sources: airflow-core/src/airflow/api_fastapi/core_api/datamodels/dag_run.py
Key Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /dags/{dag_id}/dagRuns | GET | List DAG runs with filtering |
| /dags/{dag_id}/dagRuns | POST | Trigger a new DAG run |
| /dags/{dag_id}/dagRuns/{run_id} | GET | Get specific DAG run details |
| /dags/{dag_id}/dagRuns/{run_id} | DELETE | Delete a DAG run |
| /dags/{dag_id}/dagRuns/{run_id}/clear | POST | Clear task instances |
| /dags/{dag_id}/dagRuns/{run_id}/confirm | POST | Confirm a DAG run |
| /dags/{dag_id}/dagRuns/{run_id}/update | PATCH | Update DAG run state |
Sources: airflow-core/src/airflow/api_fastapi/core_api/routes/ui/dag_runs.py
Query Parameters
The DAG Runs list endpoint supports extensive filtering:
| Parameter | Type | Description |
|---|---|---|
| limit | integer | Maximum number of results (default: 100) |
| offset | integer | Pagination offset |
| order_by | string | Sort field |
| state | enum | Filter by state |
| dag_run_id | string | Filter by run ID |
| logical_date | datetime | Filter by logical date |
| start_date | datetime | Filter by start date |
| end_date | datetime | Filter by end date |
| include_upstream | boolean | Include upstream dependencies |
| include_downstream | boolean | Include downstream tasks |
| depth | integer | Tree depth limit |
| root | string | Root node filter |
| external_dependencies | boolean | Include external dependencies |
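A client would typically combine limit and offset to page through the list endpoint. The sketch below only builds URLs and does not send any request; the host name is a placeholder, the /api/v1 prefix follows this document, and dag_runs_url and page_offsets are hypothetical helper names.

```python
from urllib.parse import quote, urlencode

# Placeholder host; the version prefix is the one used throughout this document.
BASE_URL = "http://airflow-server:8080/api/v1"


def dag_runs_url(dag_id, *, limit=100, offset=0, **filters):
    """Build the list URL, dropping filters whose value is None."""
    params = {"limit": limit, "offset": offset}
    params.update({name: value for name, value in filters.items() if value is not None})
    return f"{BASE_URL}/dags/{quote(dag_id, safe='')}/dagRuns?{urlencode(params)}"


def page_offsets(total_entries, limit):
    """Offsets needed to cover total_entries rows in pages of size limit."""
    return list(range(0, total_entries, limit))


running_url = dag_runs_url("my_dag", limit=10, state="running")
offsets = page_offsets(total_entries=25, limit=10)
page_urls = [dag_runs_url("my_dag", limit=10, offset=o) for o in offsets]
```

Percent-encoding the DAG id and the query values avoids malformed URLs when identifiers contain characters such as spaces or slashes.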
API Structure and Organization
Route Organization
The REST API routes are organized by functional area within the FastAPI application structure:
graph LR
subgraph "/api/v1"
A[UI Routes]
B[DAG Routes]
C[Task Routes]
D[Connection Routes]
E[Variable Routes]
F[Plugin Routes]
end
subgraph "/execution"
G[Task Execution]
H[XCom Operations]
end
Response Models
All API endpoints return structured responses using Pydantic models. Responses include:
- Data: The requested resource or operation result
- Meta: Pagination information and metadata
- Links: HATEOAS-style navigation links
Error Handling
The API implements consistent error handling with structured error responses:
| Status Code | Category |
|---|---|
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Authentication required |
| 403 | Forbidden - Insufficient permissions |
| 404 | Not Found - Resource doesn't exist |
| 409 | Conflict - State conflict |
| 500 | Internal Server Error |
Security Considerations
Session Management
The REST API integrates with Airflow's session management system. When authentication is enabled:
- Clients must authenticate to receive a session cookie
- Subsequent requests include the session cookie
- Sessions expire based on configuration settings
Role-Based Access Control (RBAC)
The API supports RBAC through integration with the auth manager:
- Admin: Full access to all resources
- Op: Access to DAG operations
- User: Read access to DAGs, limited write access
- Viewer: Read-only access
Configuration
Enabling the REST API
The REST API is enabled by default when Airflow is configured with a supported auth manager. Key configuration options:
| Option | Default | Description |
|---|---|---|
| auth_manager | simple | Authentication backend |
| api_url | - | Base URL for API endpoints |
| secret_key | - | Session encryption key |
CORS Configuration
Cross-Origin Resource Sharing (CORS) can be configured to allow web clients to access the API:
| Setting | Description |
|---|---|
| access_control_allow_origins | Allowed origins |
| access_control_allow_methods | Allowed HTTP methods |
| access_control_allow_headers | Allowed headers |
API Versioning
Apache Airflow maintains API stability through versioning:
- Current Version: /api/v1
- Version Prefix: All endpoints are prefixed with the version
- Stability Guarantee: Within a major version, breaking changes are avoided
Sources: airflow-core/docs/stable-rest-api-ref.rst
Usage Examples
Triggering a DAG Run
curl -X POST "http://airflow-server:8080/api/v1/dags/my_dag/dagRuns" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"dag_run_id": "manual_run_001",
"logical_date": "2024-01-15T10:00:00Z",
"conf": {"key": "value"}
}'
Listing DAG Runs
curl "http://airflow-server:8080/api/v1/dags/my_dag/dagRuns?state=running&limit=10" \
-H "Authorization: Bearer <token>"
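The trigger call shown with curl can be prepared from Python using only the standard library. The sketch below constructs the request object without sending it; the URL, token, and payload values are the placeholders from the curl example, and build_trigger_request is a hypothetical helper name.

```python
import json
from urllib.request import Request


def build_trigger_request(base_url, dag_id, token, dag_run_id, logical_date, conf):
    """Prepare (but do not send) a POST that triggers a DAG run."""
    body = {"dag_run_id": dag_run_id, "logical_date": logical_date, "conf": conf}
    return Request(
        url=f"{base_url}/dags/{dag_id}/dagRuns",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_trigger_request(
    base_url="http://airflow-server:8080/api/v1",
    dag_id="my_dag",
    token="example-token",  # placeholder, not a real credential
    dag_run_id="manual_run_001",
    logical_date="2024-01-15T10:00:00Z",
    conf={"key": "value"},
)
# Sending is left to the caller, for example with urllib.request.urlopen(request).
```

Separating request construction from sending makes the payload easy to inspect or log before anything reaches the server.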
See Also
- CLI Commands - Alternative command-line interface
- Authentication - Auth manager configuration
- DAG Runs - DAG execution management
- Task Instances - Task operation API
Sources: airflow-core/src/airflow/api_fastapi/core_api/app.py
User Interface
Related topics: REST API, Core Concepts
Overview
Apache Airflow's User Interface (UI) is a modern web-based frontend built with React and TypeScript that provides comprehensive workflow management, monitoring, and operational capabilities. The UI serves as the primary visual interface for data engineers and operators to interact with DAGs, task instances, DAG runs, connections, and various Airflow components.
The UI framework is organized under airflow-core/src/airflow/ui/ and leverages contemporary React patterns including component composition, lazy loading, and type-safe TypeScript implementations.
Sources: airflow-core/docs/ui.rst:1-50
Architecture Overview
Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Framework | React 18+ | UI component library |
| Language | TypeScript | Type safety and better DX |
| State Management | TanStack Query / React Query | Server state management |
| Routing | React Router v7 | Client-side routing |
| Styling | Chakra UI | Component library and styling |
| Build Tool | Vite | Fast development and building |
| Icons | Lucide React | Consistent iconography |
Sources: airflow-core/src/airflow/ui/package.json:1-30
Directory Structure
airflow/
├── ui/
│ ├── src/
│ │ ├── layouts/ # Page layout components
│ │ │ └── Details/ # Detail view layouts
│ │ │ ├── Grid/ # Grid view for DAGs
│ │ │ └── Graph/ # Graph view for DAGs
│ │ ├── pages/ # Page components
│ │ │ ├── Dashboard/ # Dashboard pages
│ │ │ ├── Dag/ # DAG detail pages
│ │ │ ├── Run/ # Run detail pages
│ │ │ └── Connections/ # Connection management
│ │ ├── components/ # Reusable UI components
│ │ │ ├── Clear/ # Clear action components
│ │ │ └── TriggerDag/ # DAG triggering components
│ │ └── router.tsx # Application routing configuration
│ └── package.json
└── providers/
└── edge3/ # Edge provider with custom UI
└── plugins/www/src/
├── pages/ # Edge-specific pages
└── components/ # Edge-specific components
Routing System
The UI uses React Router v7 for client-side navigation. Routes are defined in router.tsx and organized into two main categories: main application routes and detail views.
graph TD
A["/"] --> B["Dashboard"]
A --> C["DAGs List"]
A --> D["DAG Detail"]
D --> D1["Grid View"]
D --> D2["Graph View"]
D --> D3["Overview"]
A --> E["DAG Runs"]
A --> F["Connections"]
A --> G["Providers"]
style A fill:#e1f5fe
style D1 fill:#fff3e0
style D2 fill:#e8f5e9
style D3 fill:#f3e5f5
Core Route Definitions
The router configuration maps URL paths to page components and handles lazy loading for improved performance:
// Simplified route structure from router.tsx
routes = [
{ path: "/", component: Dashboard },
{ path: "/dags", component: DagList },
{ path: "/dags/:dagId", component: DagDetail },
{ path: "/dags/:dagId/runs", component: DagRuns },
{ path: "/runs/:dagRunId", component: RunDetail },
{ path: "/connections", component: Connections },
]
Page Components
Dashboard
The Dashboard (Dashboard.tsx) serves as the landing page, providing an overview of the Airflow environment. It typically includes:
- DAG statistics and health metrics
- Recent DAG runs
- Failed or stuck tasks requiring attention
- Quick access to frequently used DAGs
Sources: airflow-core/src/airflow/ui/src/pages/Dashboard/Dashboard.tsx:1-50
DAG Detail Views
DAG detail pages provide comprehensive views of individual DAGs with multiple visualization modes:
#### Grid View
The Grid view (Grid.tsx) displays DAG tasks in a tabular format, allowing users to:
- View task states across multiple DAG runs
- Navigate through historical runs
- Identify failed or running tasks quickly
graph LR
subgraph "Grid View Structure"
H["Header: DAG Info"]
T["Task Grid Table"]
F["Filter/Sort Controls"]
end
T --> T1["Task Columns"]
T --> T2["Run Rows"]
T1 --> Cell["State Cell"]
Sources: airflow-core/src/airflow/ui/src/layouts/Details/Grid/Grid.tsx:1-80
#### Graph View
The Graph view (Graph.tsx) renders the DAG structure visually as a directed acyclic graph, showing:
- Task dependencies and relationships
- Task execution states with color coding
- Interactive node selection and navigation
Sources: airflow-core/src/airflow/ui/src/layouts/Details/Graph/Graph.tsx:1-80
#### Overview Page
The Overview page (Overview.tsx) provides a comprehensive dashboard for a specific DAG:
interface OverviewComponents {
dagStats: DagStats; // DAG statistics
failedRuns: FailedRuns; // Failed run alerts
durationChart: DurationChart; // Duration visualization
assetEvents: AssetEvents; // Asset event tracking
dagDeadlines: DagDeadlines; // Deadline management
failedLogs: FailedLogs; // Failed task logs
}
Sources: airflow-core/src/airflow/ui/src/pages/Dag/Overview/Overview.tsx:1-100
DAG Runs Page
The DAG Runs page (DagRuns.tsx) displays a table of all DAG runs with the following columns:
| Column | Description | Features |
|---|---|---|
| DAG Run ID | Unique identifier | Link to run detail |
| State | Current state | Colored badge |
| Run Type | Scheduled, manual, etc. | Icon + text |
| Run After | Scheduled execution time | Time component |
| Triggering User | User who triggered | Username display |
| Start Date | Execution start time | Timestamp |
| End Date | Execution end time | Timestamp |
| Duration | Total execution time | Calculated field |
| DAG Versions | Version tracking | Version badges |
Sources: airflow-core/src/airflow/ui/src/pages/DagRuns.tsx:1-100
Dialog Components
Dialogs are used throughout the UI for focused interactions, confirmations, and detailed forms.
Confirmation Dialogs
The ClearTaskInstanceConfirmationDialog.tsx demonstrates the dialog pattern used for critical operations:
<Dialog.Root lazyMount onOpenChange={onClose} open={open} size="xl">
<Dialog.Content backdrop>
<Dialog.Header>
<Dialog.Title>
<Icon color="tomato"><GoAlertFill /></Icon>
{translate("dags:runAndTaskActions.confirmationDialog.title")}
</Dialog.Title>
</Dialog.Header>
<Dialog.Body>
{/* Confirmation details */}
</Dialog.Body>
<Dialog.Footer>
<Button onClick={onClose}>{translate("common:modal.confirm")}</Button>
</Dialog.Footer>
</Dialog.Content>
</Dialog.Root>
Key characteristics:
- lazyMount: Content renders only when opened
- unmountOnExit: Complete cleanup when closed
- backdrop: Modal overlay for focus
- size variants: sm, md, lg, xl for different content types
Edit Dialogs
The EditConnectionButton.tsx demonstrates dialog usage for editing forms:
<Dialog.Root lazyMount onOpenChange={handleClose} open={open} size="xl" unmountOnExit>
<Dialog.Content backdrop>
<Dialog.Header>
<Heading size="xl">{translate("connections.edit")}</Heading>
</Dialog.Header>
<Dialog.Body>
<ConnectionForm
error={error}
initialConnection={initialConnectionValue}
isEditMode={true}
isPending={isPending}
mutateConnection={editConnection}
/>
</Dialog.Body>
</Dialog.Content>
</Dialog.Root>
Sources: airflow-core/src/airflow/ui/src/pages/Connections/EditConnectionButton.tsx:1-50
Component Library
Clear Actions
The UI provides comprehensive task and group clearing functionality:
graph TD
A["Clear Action Trigger"] --> B{"Single vs Group"}
B -->|Single| C["ClearTaskInstanceDialog"]
B -->|Group| D["ClearGroupTaskInstanceDialog"]
C --> E["Confirmation Dialog"]
D --> F["Options Selection"]
F --> F1["Past Tasks"]
F --> F2["Future Tasks"]
F --> F3["Upstream"]
F --> F4["Downstream"]
F --> F5["Only Failed"]
E --> G["Action Accordion"]
F --> G
G --> H["API Execution"]
Components in airflow-core/src/airflow/ui/src/components/Clear/TaskInstance/:
- ClearTaskInstanceDialog.tsx: Single task clearing
- ClearTaskInstanceConfirmationDialog.tsx: Confirmation UI
- ClearGroupTaskInstanceDialog.tsx: Bulk task clearing
Trigger DAG Components
The TriggerDAGAdvancedOptions.tsx provides advanced options when triggering DAGs:
| Option | Purpose |
|---|---|
| dagRunId | Custom run identifier |
| partitionKey | Partition-based execution |
| note | Execution notes/documentation |
<Controller
control={control}
name="dagRunId"
render={({ field }) => (
<Field.Root mt={6} orientation="horizontal">
<Field.Label fontSize="md" style={{ flexBasis: "30%" }}>
{translate("runId")}
</Field.Label>
<Stack css={{ flexBasis: "70%" }}>
<Input {...field} size="sm" />
<Field.HelperText>{translate("components:triggerDag.runIdHelp")}</Field.HelperText>
</Stack>
</Field.Root>
)}
/>
Sources: airflow-core/src/airflow/ui/src/components/TriggerDag/TriggerDAGAdvancedOptions.tsx:1-80
Edge Provider UI
The Edge provider (providers/edge3/) extends the base UI with worker-specific pages and components.
Worker Management Page
The WorkerPage.tsx provides a table-based interface for managing edge workers:
graph LR
subgraph "Worker Page Structure"
T["Worker Table"]
H["Header Actions"]
F["Filter Controls"]
end
T --> C1["Worker Name"]
T --> C2["Queues"]
T --> C3["Active Jobs"]
T --> C4["System Info"]
T --> C5["Operations"]
Worker operations include:
- Delete: Available for offline, unknown, or offline maintenance states
- Shutdown: Available for idle, running, maintenance states
- Enter Maintenance: Set worker to maintenance mode with comment
- Exit Maintenance: Remove worker from maintenance mode
Sources: providers/edge3/src/airflow/providers/edge3/plugins/www/src/pages/WorkerPage.tsx:1-100
Worker Operation Dialogs
Bulk operations are supported via BulkWorkerOperations.tsx:
<Dialog.Root>
<Portal>
<Dialog.Backdrop />
<Dialog.Positioner>
<Dialog.Content>
<Dialog.Header>
<Dialog.Title>
Delete {deleteWorkers.length} selected worker(s)
</Dialog.Title>
</Dialog.Header>
<Dialog.Body>
<List.Root>
{deleteWorkers.map((worker) => (
<List.Item key={worker.worker_name}>{worker.worker_name}</List.Item>
))}
</List.Root>
</Dialog.Body>
<Dialog.Footer>
<Button colorPalette="danger" loading={isBulkDeletePending}>
Delete Workers
</Button>
</Dialog.Footer>
</Dialog.Content>
</Dialog.Positioner>
</Portal>
</Dialog.Root>
Internationalization
The UI uses i18n (internationalization) patterns for all user-facing text:
// Translation usage example
translate("dagRun.runAfter") // Column headers
translate("dags:runAndTaskActions.confirmationDialog.title")
translate("common:modal.confirm")
translate("taskInstance", { count: affectedTasks.total_entries })
Translation keys are organized by:
- common: Shared UI elements
- components: Component-specific strings
- dags: DAG-related UI strings
- dagRun: DAG run specific strings
State Management and Data Fetching
The UI uses TanStack Query (React Query) for server state management:
// Typical data fetching pattern
const { data, isLoading, error } = useQuery({
queryKey: ['dagRuns', dagId, page],
queryFn: () => fetchDagRuns(dagId, page),
});
Key patterns:
- Optimistic updates: Immediate UI feedback during mutations
- Lazy loading: Components render only when needed
- Error boundaries: Graceful error handling with ErrorAlert components
- Loading states: Skeleton loaders for better UX
Pagination
Paginated lists are implemented consistently throughout the UI:
<Pagination.Root
count={data?.total_entries ?? 0}
onPageChange={(event) => setPage(event.page)}
page={page}
pageSize={PAGE_LIMIT}
>
<HStack>
<Pagination.PrevTrigger />
<Pagination.Items />
<Pagination.NextTrigger />
</HStack>
</Pagination.Root>
Standard pagination constants:
- PAGE_LIMIT: Default items per page (typically 25-50)
- total_entries: Total count from API response
Summary
Apache Airflow's User Interface is a comprehensive React-based frontend that provides:
- Multi-view DAG Visualization: Grid, Graph, and Overview views for different use cases
- Comprehensive Task Management: Clear, trigger, and manage task instances
- Connection Management: Visual interface for managing Airflow connections
- Edge Worker Control: Dedicated UI for edge worker deployment management
- Consistent Component Patterns: Reusable dialogs, tables, and forms
- Internationalization: Full i18n support for global deployments
- Type Safety: Full TypeScript coverage for reliable development
The UI architecture emphasizes modularity, lazy loading, and consistent patterns to ensure maintainability and performance across large-scale Airflow deployments.
Data Flow and State Management
Related topics: Core Concepts, Connections and Variables
Overview
Data Flow and State Management in Apache Airflow encompasses the mechanisms by which tasks communicate, share state, and coordinate execution across a Directed Acyclic Graph (DAG). Airflow provides several interconnected systems to manage how data moves between tasks, how task states are tracked, and how assets trigger workflow execution.
The core components involved in data flow include XCom (Cross-Communication), Assets, and Task State management. These systems work together to enable complex workflows where tasks can pass data, react to external events, and maintain execution context throughout the DAG lifecycle.
XCom (Cross-Communication)
Purpose and Role
XCom is Airflow's primary mechanism for inter-task communication. It allows tasks to push and pull values that can be used by downstream tasks in the same DAG. XComs are stored in the Airflow metadata database and can contain any serializable Python object.
XCom Data Model
| Attribute | Type | Description |
|---|---|---|
| key | String | Identifier for the XCom value |
| value | Any | The actual data being passed |
| task_id | String | Task that produced the XCom |
| dag_id | String | DAG containing the task |
| run_id | String | DAG run identifier |
| map_index | Integer | Index for mapped tasks (-1 for non-mapped) |
| timestamp | DateTime | When the XCom was created |
| connection_id | String | Optional connection for large data |
XCom API Operations
#### Push (Sending XCom Values)
Tasks can push XCom values using the xcom_push() method:
task_instance.xcom_push(key="result", value={"data": "example"})
Or implicitly by returning a value from a task:
@task
def process_data():
    return {"processed": True, "count": 42}
#### Pull (Receiving XCom Values)
Downstream tasks can retrieve XCom values using xcom_pull():
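# Pass a single task id to get a single value; a list of ids returns a list of values.
# Omit key to read the task's return value, which is stored under "return_value".
upstream_result = ti.xcom_pull(task_ids="process_data", key="result")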
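The push and pull semantics above can be sketched with a small in-memory store. This is a toy model under stated assumptions, not Airflow's implementation: the InMemoryXComStore class and its method signatures are hypothetical, and only the "return_value" default key mirrors Airflow's behaviour for returned values.

```python
from typing import Any

RETURN_KEY = "return_value"  # key Airflow uses for a task's return value


class InMemoryXComStore:
    """Toy stand-in for the XCom table, keyed like the data model above."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str, str, int, str], Any] = {}

    def push(self, dag_id, run_id, task_id, value, key=RETURN_KEY, map_index=-1):
        # A later push with the same identifying fields overwrites the earlier one.
        self._rows[(dag_id, run_id, task_id, map_index, key)] = value

    def pull(self, dag_id, run_id, task_id, key=RETURN_KEY, map_index=-1, default=None):
        return self._rows.get((dag_id, run_id, task_id, map_index, key), default)


store = InMemoryXComStore()
store.push("etl", "run_1", "process_data", {"processed": True, "count": 42})
store.push("etl", "run_1", "process_data", {"data": "example"}, key="result")

returned = store.pull("etl", "run_1", "process_data")                # implicit return value
explicit = store.pull("etl", "run_1", "process_data", key="result")  # explicit key
missing = store.pull("etl", "run_2", "process_data")                 # other run: nothing stored
```

Because run_id is part of the lookup key, values pushed in one DAG run are not visible from another run.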
XComArg
XComArg provides a more declarative way to reference XCom values. An XComArg wraps the operator that produces the value rather than a task_id string; calling a TaskFlow task returns one, and classic operators expose one as their output attribute:
from airflow.models.xcom_arg import XComArg
# process_task is assumed to be an operator instance defined earlier in the DAG
processed_data = XComArg(process_task)            # same reference as process_task.output
result_ref = XComArg(process_task, key="result")  # reference a custom key
XComArg accepts the following parameters:
| Parameter | Type | Description |
|---|---|---|
| operator | Operator | Task that produces the XCom |
| key | String | XCom key to retrieve (defaults to return_value) |
Asset Management
Asset Model
Assets represent data sources or sinks that Airflow can monitor. They enable event-driven scheduling, where a DAG is triggered when the underlying data changes.
| Attribute | Type | Description |
|---|---|---|
| name | String | Human-readable asset name |
| uri | String | Unique identifier (URN, path, etc.) |
| group | String | Logical grouping of assets |
| extra | Dict | Additional metadata |
| created_at | DateTime | Creation timestamp |
| updated_at | DateTime | Last modification timestamp |
Asset States
Asset states track the lifecycle and availability of assets:
class AssetState:
    """Represents the state of an asset in the system."""
    IDLE = "idle"
    SCHEDULED = "scheduled"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
#### State Transitions
graph TD
A[IDLE] -->|Asset referenced| B[SCHEDULED]
B -->|Scheduler picks up| C[RUNNING]
C -->|Success| D[SUCCESS]
C -->|Failure| E[FAILED]
D -->|Next schedule| B
E -->|Retry| B
Task State Management
Task States
Task instances progress through a defined set of states:
| State | Description |
|---|---|
| none | Task has not been triggered |
| queued | Task is queued for execution |
| running | Task is currently executing |
| success | Task completed successfully |
| failed | Task failed during execution |
| upstream_failed | Upstream task dependency failed |
| skipped | Task was skipped due to branching |
| up_for_retry | Task failed but will be retried |
| up_for_reschedule | Task is waiting for reschedule |
TaskState Model
class TaskState:
    """Tracks the current state and context of a task instance."""
    task_id: str
    dag_id: str
    run_id: str
    state: TaskInstanceState
    try_number: int
    max_tries: int
    start_date: Optional[datetime]
    end_date: Optional[datetime]
    duration: Optional[float]
State Persistence
Task states are persisted to the metadata database, enabling:
- Recovery from scheduler restarts
- Historical execution tracking
- DAG run state reconstruction
- SLA monitoring and alerting
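The interplay between try_number, max_tries, and the failed and up_for_retry states can be sketched as follows. This is a simplified model, assuming that try_number counts the attempt that just ended and that max_tries equals the configured number of retries; the State enum, TaskAttempt class, and finish function are illustrative names, not Airflow classes.

```python
from dataclasses import dataclass
from enum import Enum


class State(str, Enum):
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    UP_FOR_RETRY = "up_for_retry"


@dataclass
class TaskAttempt:
    try_number: int  # attempts started so far, counting the current one
    max_tries: int   # retries allowed after the first attempt
    state: State = State.RUNNING


def finish(attempt: TaskAttempt, succeeded: bool) -> State:
    """Pick the next state once the running attempt ends."""
    if succeeded:
        attempt.state = State.SUCCESS
    elif attempt.try_number <= attempt.max_tries:
        # Retries remain, so the task waits to be rescheduled.
        attempt.state = State.UP_FOR_RETRY
    else:
        attempt.state = State.FAILED
    return attempt.state
```

With max_tries set to 2, a task gets up to three attempts in this model: the first two failures lead to up_for_retry and the third to failed.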
Data Flow Architecture
Task Execution Flow
graph TD
A[DAG Trigger] --> B[Scheduler]
B --> C{Dependency Check}
C -->|All met| D[Queue Task]
C -->|Not met| E[Wait]
D --> F[Executor]
F --> G[Worker]
G --> H[Execute Task]
H --> I{Push XCom}
I -->|Yes| J[Store in DB]
J --> K[Task Complete]
I -->|No| K
K --> L[Update TaskState]
L --> M[Check Downstream]
M --> C
XCom Storage Flow
graph LR
A[Task Instance] -->|xcom_push| B[XCom Table]
C[Task Instance] -->|xcom_pull| B
B -->|query| D[Metadata DB]
D -->|result| C
Serialization and Deserialization
XComArg Serialization
When DAGs are serialized (for example, for the webserver or worker), XComArg references must be properly handled:
class XComArgBase:
    """Base class for serializable XCom arguments."""
    def serialize(self) -> dict:
        """Convert XComArg to dictionary format."""
        return {
            "task_id": self.task_id,
            "dag_id": self.dag_id,
            "key": self.key,
            "map_index": self.map_index,
        }
    @staticmethod
    def deserialize(data: dict) -> "XComArgBase":
        """Reconstruct XComArg from dictionary."""
        return XComArg(**data)
Best Practices
Data Flow Design
- Minimize XCom payload size - Large XCom values impact database performance
- Use Assets for external data - Let the scheduler handle dependency tracking
- Prefer pull over push patterns - Downstream tasks should pull required data
- Clear XCom when unnecessary - Use xcom_clear() to prevent accumulation
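One way to act on the payload-size advice is to measure a value before pushing it and fall back to passing a reference (a path or URI) when it is too large. The sketch below assumes JSON-serializable values; MAX_XCOM_BYTES is a hypothetical budget chosen for illustration, not an Airflow setting, and both function names are invented.

```python
import json

MAX_XCOM_BYTES = 48 * 1024  # hypothetical team-chosen budget, not an Airflow setting


def xcom_payload_size(value) -> int:
    """Size in bytes of the value once JSON-encoded as UTF-8."""
    return len(json.dumps(value).encode("utf-8"))


def small_enough_for_xcom(value, limit: int = MAX_XCOM_BYTES) -> bool:
    """True when the encoded value fits within the chosen budget."""
    return xcom_payload_size(value) <= limit
```

A task that produces a large dataset would then write it to external storage and return only something like {"path": "s3://bucket/key"}, which passes this check easily.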
State Management
- Monitor task durations - Track state transitions for performance analysis
- Configure appropriate retries - Set retries and retry_delay based on task reliability
- Use SLA alerts - Configure the sla parameter for critical task deadlines
- Clean up failed states - Implement proper error handling and state recovery
Configuration Options
| Parameter | Default | Description |
|---|---|---|
| xcom_pickle_tasks_on_error | False | Serialize task data to XCom on error |
| max_xcom_size | 1048576 | Maximum XCom value size in bytes |
| xcom_backend | airflow.models.xcom.BaseXCom | Custom XCom backend class |
| enable_asset_uri_validation | True | Validate asset URIs on creation |
Source: https://github.com/apache/airflow / Human Manual
Connections and Variables
Related topics: Data Flow and State Management, Docker and Helm Deployment
Overview
Connections and Variables are two fundamental abstractions in Apache Airflow for managing configuration and external system integrations. They enable DAGs to store, retrieve, and utilize configuration data and credentials without hardcoding sensitive information.
Connections provide a secure way to store and manage credentials for external systems such as databases, APIs, cloud services, and file systems. They centralize authentication configuration in one place, making it easy to manage, update, and audit access to external resources.
Variables provide a simple key-value store for storing arbitrary configuration data that can be accessed across DAGs and tasks. They are useful for storing environment-specific settings, feature flags, and general configuration values.
Both features support multiple backend storage mechanisms and can be managed through the Airflow UI, CLI, or programmatically via the API.
Connections
Purpose and Scope
Connections in Airflow encapsulate all the information needed to connect to an external system. Each connection includes:
- A unique connection identifier (conn_id)
- Connection type (conn_type) specifying the external system category
- Host, port, login, and password fields
- Schema and extra parameters for additional configuration
Connections are stored in the Airflow metadata database by default but can also be backed by secrets backends such as HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or Google Cloud Secret Manager.
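Airflow can also express a connection as a single URI of the form conn_type://login:password@host:port/schema?param=value. The sketch below shows how such a URI maps onto the fields listed above, using only the standard library. The parse_connection_uri function is hypothetical and deliberately simplified: it does not reproduce Airflow's own normalization of connection types or its handling of the extra field.

```python
from urllib.parse import parse_qsl, unquote, urlsplit


def parse_connection_uri(uri: str) -> dict:
    """Split a connection URI into the fields listed above."""
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Credentials are percent-encoded in the URI, so decode them.
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "schema": parts.path.lstrip("/") or None,
        "extra": dict(parse_qsl(parts.query)),
    }


fields = parse_connection_uri(
    "postgres://airflow:s3cr%40t@db.example.com:5432/analytics?sslmode=require"
)
```

Special characters in the password must be percent-encoded in the URI; here %40 decodes to @.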
Connection Types
Airflow supports numerous connection types through provider packages. The core supported types include:
| Connection Type | Description | Default Port |
|---|---|---|
| facebook_social | Facebook OAuth/social authentication | N/A |
| fs | Filesystem connection | N/A |
| ftp | FTP/SFTP server | 21 |
| google_cloud_platform | Google Cloud Platform services | N/A |
| gremlin | Apache TinkerPop Gremlin server | 8182 |
| hive_cli | Hive command-line interface | 10000 |
| hiveserver2 | HiveServer2 JDBC interface | 10000 |
| http | HTTP/HTTPS endpoints | 443 |
| iceberg | Apache Iceberg catalog | N/A |
Sources: airflow-core/src/airflow/utils/db.py:200-260
Connection Management via CLI
The Airflow CLI provides comprehensive commands for managing connections:
# List all connections
airflow connections list
# Get a specific connection
airflow connections get <conn_id>
# Add a new connection
airflow connections add <conn_id> --conn-type http --conn-host https://api.example.com
# Delete a connection
airflow connections delete <conn_id>
# Export all connections
airflow connections export /tmp/connections.json
# Import connections from file
airflow connections import /tmp/connections.json
# Test a connection
airflow connections test <conn_id>
The connection commands are defined in the CLI configuration with lazy-loaded implementations:
CONNECTIONS_COMMANDS = (
    ActionCommand(name="get", func=lazy_load_command("airflow.cli.commands.connection_command.connections_get"), ...),
    ActionCommand(name="list", func=lazy_load_command("airflow.cli.commands.connection_command.connections_list"), ...),
    ActionCommand(name="add", func=lazy_load_command("airflow.cli.commands.connection_command.connections_add"), ...),
    ActionCommand(name="delete", func=lazy_load_command("airflow.cli.commands.connection_command.connections_delete"), ...),
    ActionCommand(name="export", func=lazy_load_command("airflow.cli.commands.connection_command.connections_export"), ...),
    ActionCommand(name="import", func=lazy_load_command("airflow.cli.commands.connection_command.connections_import"), ...),
    ActionCommand(name="test", func=lazy_load_command("airflow.cli.commands.connection_command.connections_test"), ...),
)
Sources: airflow-core/src/airflow/cli/cli_config.py:1-50
Connection Storage Architecture
graph TD
A[Airflow CLI/UI/API] --> B[Connection CRUD Operations]
B --> C{Secrets Backend}
C -->|None/Default| D[Airflow Metadata Database]
C -->|HashiCorp Vault| E[Vault Secrets]
C -->|AWS| F[AWS Secrets Manager]
C -->|Azure| G[Azure Key Vault]
C -->|GCP| H[GCP Secret Manager]
D --> I[connection Table]
E --> J[Vault Path]
F --> K[AWS Secrets]
G --> L[Azure Keys]
H --> M[GCP Secrets]
Using Connections in DAGs
Connections are accessed in DAGs through the BaseHook class:
from airflow.hooks.base import BaseHook
def get_external_api_data():
    conn = BaseHook.get_connection("my_external_api")
    api_key = conn.password
    base_url = conn.host
    # Use credentials to make API calls
    ...
Connections with custom configurations can utilize the extra field for JSON-encoded parameters:
Connection(
    conn_id="ftp_default",
    conn_type="ftp",
    host="localhost",
    port=21,
    login="airflow",
    password="airflow",
    extra='{"key_file": "~/.ssh/id_rsa", "no_host_key_check": true}'
)
Sources: airflow-core/src/airflow/utils/db.py:230-240
Variables
Purpose and Scope
Variables provide a simple key-value storage mechanism for storing configuration data that can be shared across DAGs and tasks. They support string values with optional JSON serialization for complex data structures.
Key characteristics of Variables:
- Key-value pairs stored in the Airflow metadata database
- Optional JSON serialization for non-string values
- Support for environment-specific configuration
- Accessible from all DAGs and tasks
- Can be exported/imported in bulk
Variable Management via CLI
The Airflow CLI provides comprehensive commands for managing variables:
# List all variables
airflow variables list
# Get a specific variable
airflow variables get <var_key>
# Set a variable
airflow variables set <var_key> <var_value>
# Set a variable with JSON serialization
airflow variables set config_json '{"setting": true}' --serialize-json
# Delete a variable
airflow variables delete <var_key>
# Export all variables
airflow variables export /tmp/variables.json
# Import variables from file
airflow variables import /tmp/variables.json
The variable commands are defined with support for serialization options:
VARIABLES_COMMANDS = (
    ActionCommand(name="list", func=lazy_load_command("airflow.cli.commands.variable_command.variables_list"), ...),
    ActionCommand(name="get", func=lazy_load_command("airflow.cli.commands.variable_command.variables_get"),
                  args=(ARG_VAR, ARG_DESERIALIZE_JSON, ARG_DEFAULT, ARG_VERBOSE)),
    ActionCommand(name="set", func=lazy_load_command("airflow.cli.commands.variable_command.variables_set"),
                  args=(ARG_VAR, ARG_VAR_VALUE, ARG_VAR_DESCRIPTION, ARG_SERIALIZE_JSON, ARG_VERBOSE)),
    ActionCommand(name="delete", func=lazy_load_command("airflow.cli.commands.variable_command.variables_delete"), ...),
    ActionCommand(name="export", func=lazy_load_command("airflow.cli.commands.variable_command.variables_export"), ...),
    ActionCommand(name="import", func=lazy_load_command("airflow.cli.commands.variable_command.variables_import"), ...),
)
Sources: airflow-core/src/airflow/cli/cli_config.py:50-80
Variable Storage Architecture
graph TD
A[DAG/Task Code] --> B[Variable API]
B --> C[Secrets Backend]
C -->|Default| D[Metadata Database]
C -->|Backend| E[External Secrets]
D --> F[variable Table]
E --> F
B --> G[JSON Deserializer]
G --> H[Python Objects]
F --> I[Key-Value Store]
Using Variables in DAGs
Variables can be accessed using the Variable class:
from airflow.models import Variable
# Get a simple string variable
api_endpoint = Variable.get("api_endpoint")
# Get with default value
timeout = Variable.get("timeout", default_var=30)
# Get and deserialize JSON
config = Variable.get("my_config", deserialize_json=True)
# Set a variable
Variable.set("my_key", "my_value")
Variable.set("config", {"setting": True}, serialize_json=True)
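Variables can also be supplied through environment variables named AIRFLOW_VAR_<KEY>, with the key upper-cased. The sketch below mimics that lookup together with the default and JSON options shown above. It is an illustration only: get_variable_from_env and its environ parameter are hypothetical names, and the real Variable.get consults other backends as well.

```python
import json
import os

_MISSING = object()  # sentinel so that None can be a legitimate default


def get_variable_from_env(key, default=_MISSING, deserialize_json=False, environ=None):
    """Look up AIRFLOW_VAR_<KEY> and optionally decode it as JSON."""
    environ = os.environ if environ is None else environ
    raw = environ.get(f"AIRFLOW_VAR_{key.upper()}")
    if raw is None:
        if default is _MISSING:
            raise KeyError(f"Variable {key} does not exist")
        return default
    return json.loads(raw) if deserialize_json else raw


# A plain dict stands in for os.environ to keep the example deterministic.
env = {
    "AIRFLOW_VAR_API_ENDPOINT": "https://api.example.com",
    "AIRFLOW_VAR_MY_CONFIG": '{"setting": true}',
}
endpoint = get_variable_from_env("api_endpoint", environ=env)
config = get_variable_from_env("my_config", deserialize_json=True, environ=env)
timeout = get_variable_from_env("timeout", default=30, environ=env)
```

Values read this way are always strings unless JSON deserialization is requested, which is why the default of 30 keeps its integer type while a stored "30" would not.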
Secrets Backend Integration
Both Connections and Variables can be backed by external secrets backends for enhanced security. This allows storing sensitive data in enterprise-grade secret management systems while maintaining the same Airflow API interface.
Available Secrets Backends
| Backend | Package | Description |
|---|---|---|
| HashiCorp Vault | airflow.providers.hashicorp | HashiCorp Vault KV secrets engine |
| AWS Secrets Manager | airflow.providers.amazon | AWS Secrets Manager and SSM Parameter Store |
| Azure Key Vault | airflow.providers.microsoft.azure | Azure Key Vault secrets |
| GCP Secret Manager | airflow.providers.google | Google Cloud Secret Manager |
| Environment Variables | Built-in | Read from OS environment variables |
| Local Files | Built-in | Read from JSON/YAML files |
Configuring Secrets Backends
The secrets backend is configured via the [secrets] section in airflow.cfg:
[secrets]
backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
backend_kwargs = {"project_id": "my-project", "connections_prefix": "airflow-connections"}
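When a secrets backend is configured, Airflow consults it first, then environment variables, then the metadata database. The sketch below models that search order with plain dictionaries. All names here (first_match, the three dictionaries, the "-" separator between prefix and connection id) are assumptions for illustration, not the provider's implementation.

```python
import json
from typing import Callable, Optional

Lookup = Callable[[str], Optional[str]]


def first_match(conn_id: str, backends: list[Lookup]) -> Optional[str]:
    """Return the first non-None answer, trying backends in order."""
    for lookup in backends:
        value = lookup(conn_id)
        if value is not None:
            return value
    return None


# backend_kwargs is stored as a JSON string in airflow.cfg.
kwargs = json.loads('{"project_id": "my-project", "connections_prefix": "airflow-connections"}')
prefix = kwargs["connections_prefix"]

# Toy data sources standing in for the three lookup layers.
external = {f"{prefix}-warehouse": "postgres://external-host/db"}
env_vars = {"AIRFLOW_CONN_WAREHOUSE": "postgres://env-host/db", "AIRFLOW_CONN_API": "http://env-api"}
metadata_db = {"api": "http://db-api", "ftp_default": "ftp://localhost:21"}

chain: list[Lookup] = [
    lambda cid: external.get(f"{prefix}-{cid}"),              # configured secrets backend
    lambda cid: env_vars.get(f"AIRFLOW_CONN_{cid.upper()}"),  # environment variables
    lambda cid: metadata_db.get(cid),                         # metadata database
]
```

Because the search stops at the first hit, a connection defined in the external backend shadows one with the same id in the environment or the database.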
Best Practices
Security Considerations
- Never hardcode credentials - Always use Connections or Variables for sensitive data
- Use secrets backends - For production environments, use enterprise secret management systems
- Enable encryption - Ensure the Airflow metadata database is encrypted at rest
- Rotate credentials - Regularly rotate passwords and API keys stored in connections
- Audit access - Monitor and log access to sensitive connections and variables
Performance Considerations
- Avoid frequent variable reads - Cache variable values when used repeatedly in a task
- Use connection pooling - Many hooks automatically pool connections; configure appropriately
- Limit extra field size - Keep connection extra JSON data minimal for performance
Operational Considerations
- Use meaningful conn_ids - Follow naming conventions like {env}_{system}_{purpose}
- Document connections - Use the description field to document connection purpose and owners
- Export for disaster recovery - Regularly export connections and variables for backup
# Backup connections and variables
airflow connections export /opt/airflow/backups/connections_$(date +%Y%m%d).json
airflow variables export /opt/airflow/backups/variables_$(date +%Y%m%d).json
CLI Command Reference
Connection Commands
| Command | Description |
|---|---|
| airflow connections list | List all connections |
| airflow connections get <conn_id> | Get connection details |
| airflow connections add <conn_id> [options] | Create a connection |
| airflow connections delete <conn_id> | Delete a connection |
| airflow connections export <file> | Export connections to file |
| airflow connections import <file> | Import connections from file |
| airflow connections test <conn_id> | Test connection availability |
| airflow connections create-default-connections | Initialize provider default connections |
Variable Commands
| Command | Description |
|---|---|
| airflow variables list | List all variables |
| airflow variables get <key> | Get variable value |
| airflow variables set <key> <value> | Set variable value |
| airflow variables delete <key> | Delete a variable |
| airflow variables export <file> | Export variables to file |
| airflow variables import <file> | Import variables from file |
See Also
- Airflow Connections Documentation
- Airflow Variables Documentation
- Secrets Backend Configuration
- Google Cloud Composer Hook - Example of connection usage in provider hooks
Docker and Helm Deployment
Related topics: Kubernetes Deployment, Architecture Overview
Apache Airflow provides comprehensive support for containerized and Kubernetes-based deployments through Docker images and Helm charts. This documentation covers the deployment mechanisms, configuration options, and best practices for running Airflow in these environments.
Overview
Apache Airflow can be deployed using two primary methods:
| Deployment Method | Use Case | Primary Files |
|---|---|---|
| Docker | Local development, single-node deployments, CI/CD pipelines | Dockerfile, Dockerfile.ci |
| Helm Chart | Production Kubernetes deployments, distributed systems | chart/Chart.yaml, chart/values.yaml |
Sources: docker-stack-docs/README.md
Docker Image Architecture
Apache Airflow ships with two main Docker images optimized for different use cases.
Production Image (`Dockerfile`)
The production Docker image is built using multi-stage builds and includes all necessary components for running Airflow in production environments. The image uses python:{DEFAULT_PYTHON_MAJOR_MINOR_VERSION}-slim-{ALLOWED_DEBIAN_VERSIONS[0]} as its base.
Key characteristics of the production image:
- Base OS: Debian slim variant for minimal footprint
- Package Manager: Uses uv for faster package installation
- Architecture: Multi-stage build for optimized image size
- User Space: Runs as non-root user for security
Sources: dev/breeze/src/airflow_breeze/commands/release_management_commands.py:55-62
CI Image (`Dockerfile.ci`)
The CI image is designed for continuous integration workflows and includes additional tooling for testing and development.
# syntax=docker/dockerfile:1.4
FROM python:{DEFAULT_PYTHON_MAJOR_MINOR_VERSION}-slim-{ALLOWED_DEBIAN_VERSIONS[0]}
RUN apt-get update && apt-get install -y --no-install-recommends libatomic1 git curl
RUN pip install uv=={UV_VERSION}
RUN --mount=type=cache,id=cache-airflow-build-dockerfile-installation,target=/root/.cache/ \
uv pip install --system ignore pip=={AIRFLOW_PIP_VERSION} hatch=={HATCH_VERSION} \
pyyaml=={PYYAML_VERSION} gitpython=={GITPYTHON_VERSION} rich=={RICH_VERSION} \
prek=={PREK_VERSION}
COPY . /opt/airflow
Sources: dev/breeze/src/airflow_breeze/commands/release_management_commands.py:56-65
Version Constants
The build process uses the following pinned version constants:
| Constant | Version | Purpose |
|---|---|---|
| AIRFLOW_PIP_VERSION | 26.1.1 | pip package manager |
| AIRFLOW_UV_VERSION | 0.11.11 | Fast Python package installer |
| GITPYTHON_VERSION | 3.1.50 | Git operations in Python |
| RICH_VERSION | 15.0.0 | Rich terminal output |
| HATCH_VERSION | 1.16.5 | Python packaging |
| PYYAML_VERSION | 6.0.3 | YAML parsing |
| PREK_VERSION | 0.3.13 | Pre-commit hooks |
Sources: dev/breeze/src/airflow_breeze/commands/release_management_commands.py:70-76
Helm Chart Deployment
The Apache Airflow Helm chart provides a production-ready deployment mechanism for Kubernetes clusters.
Chart Metadata
| Property | Value |
|---|---|
| Chart Name | airflow |
| Repository | Apache Airflow Official |
| Kubernetes Support | v1.30+, v1.31+, v1.32+, v1.33+ |
Supported Kubernetes versions are determined by the major cloud providers (EKS, AKS, GKE) support windows.
Sources: dev/breeze/src/airflow_breeze/global_constants.py:48-54
Core Components
The Helm chart deploys the following core Airflow components:
graph TD
A[Helm Chart] --> B[Webserver]
A --> C[Scheduler]
A --> D[Triggerer]
A --> E[Worker]
A --> F[Flower]
A --> G[StatsD]
A --> H[Redis]
A --> I[PostgreSQL/MySQL]
B -->|Read/Write| I
C -->|Queue Jobs| H
D -->|Async Tasks| H
E -->|Process Tasks| H
Scheduler Deployment
The scheduler is deployed as a Kubernetes Deployment with the following characteristics:
- Replicas: Configurable via the replicas parameter
- Resources: Configurable CPU and memory limits
- Health Checks: Liveness and readiness probes
- Persistence: Optional volume mounts for DAGs and logs
Sources: chart/templates/scheduler/scheduler-deployment.yaml
Worker Deployment
Workers process tasks from the message queue and are deployed as:
- Deployment Type: StatefulSet or Deployment based on configuration
- Scaling: Horizontal Pod Autoscaler (HPA) support
- Queue Configuration: Multiple queues supported
- Resource Management: Configurable resource requests and limits
Sources: chart/templates/workers/worker-deployment.yaml
Installation Methods
Installing from PyPI
Only installation with pip is officially supported. Tools such as Poetry or pip-tools can install Airflow, but that path is not currently supported.
For repeatable installation, Airflow maintains "known-to-be-working" constraint files in the orphan constraints-main and constraints-2-0 branches. These constraint files are maintained per major/minor Python version.
Sources: generated/PYPI_README.md
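Constraint files are published per Airflow release and per Python major.minor version, and are typically passed to pip with the --constraint option. The sketch below builds the URL and the matching pip command line. The two function names are hypothetical, and the version numbers used with them are examples only.

```python
import sys
from typing import Optional


def constraints_url(airflow_version: str, python_version: Optional[str] = None) -> str:
    """URL of the constraint file for an Airflow release and Python major.minor."""
    if python_version is None:
        # Default to the interpreter running this script.
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_command(airflow_version: str, python_version: Optional[str] = None) -> list[str]:
    """Argument list for a constrained, repeatable pip installation."""
    return [
        "pip", "install", f"apache-airflow=={airflow_version}",
        "--constraint", constraints_url(airflow_version, python_version),
    ]
```

Pinning both the Airflow version and the constraint file is what makes the installation repeatable; pip alone would otherwise resolve the newest compatible dependencies at install time.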
Installing Providers from Sources
Providers can be installed from source with SHA512 verification:
shasum -a 512 {{ package_name }}-{{ package_version }}.tar.gz | diff - {{ package_name }}-{{ package_version }}.tar.gz.sha512
This ensures the integrity of the downloaded package against the provided checksum.
Sources: devel-common/src/sphinx_exts/includes/installing-providers-from-sources.rst
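The same check can be scripted in Python when shasum is not available. The sketch below assumes the .sha512 file uses the usual shasum layout, with the hex digest as the first whitespace-separated token; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-512 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(archive: Path, checksum_file: Path) -> bool:
    """Compare the archive digest with the first token of the .sha512 file."""
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(archive) == expected
```

A checksum match only shows the download is intact; verifying the accompanying signature is a separate step.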
Helm Chart Package Preparation
The release management tooling includes commands for preparing Helm chart packages:
graph LR
A[Chart Source] --> B[helm package]
B --> C[Deterministic Repack]
C --> D[PGP Signature]
D --> E[Distribution Archive]
Package Signing
Helm chart packages can be signed using GPG for verification:
helm gpg sign -u <sign-email> <archive-name>
The signing process generates a provenance file (.tgz.prov) that can be used to verify the package integrity.
Sources: dev/breeze/src/airflow_breeze/commands/release_management_commands.py:400-420
Software Bill of Materials (SBOM)
The Airflow project generates and maintains SBOM information for security and compliance purposes.
SBOM Commands
The breeze CLI provides commands for managing SBOM information:
breeze sbom update-sbom-information [OPTIONS]
#### Command Options
| Option | Type | Description |
|---|---|---|
--airflow-site-archive-path | Path | Directory for airflow-site-archive |
--airflow-root-path | Path | Root of the airflow repository |
--airflow-version | String | Version of airflow to update SBOM |
--airflow-constraints-reference | String | Constraints reference for SBOM generation |
These files are placed in airflow-site-archive/docs-archive/ or generated/_build/docs/apache-airflow/stable/ depending on the configuration.
Sources: dev/breeze/src/airflow_breeze/commands/sbom_commands.py
SBOM File Generation
SBOM files are generated based on:
- provider.yaml files
- Airflow constraints
- Provider tags and releases
The SBOM information is automatically regenerated using the breeze release-management generate-providers-metadata command.
Sources: generated/README.md
Docker Compose Deployment
For development and testing purposes, Airflow can be deployed using Docker Compose.
Prerequisites
- Docker Engine
- Docker Compose v2+
- SQLite database (automatically created if AIRFLOW_HOME is not set)
Basic Usage
For example commands that start Airflow, refer to the Executing commands documentation.
Sources: docker-stack-docs/README.md
Deployment Architecture
graph TD
subgraph "Client Layer"
A[Airflow CLI] --> B[REST API]
C[Web UI] --> B
end
subgraph "Core Services"
D[Scheduler] --> E[(Metadata DB)]
F[Triggerer] --> E
G[Webserver] --> E
end
subgraph "Task Execution"
H[Workers] --> I[Message Queue]
D --> I
F --> I
end
subgraph "Storage"
J[DAGs Repository]
K[Logs Storage]
L[Plugins]
end
H --> K
G --> K
D --> J
Default Connections
The Airflow deployment includes pre-configured connection templates for common services:
| Connection ID | Type | Default Settings |
|---|---|---|
| google_cloud_default | Google Cloud Platform | Schema: default |
| http_default | HTTP | Host: https://www.httpbin.org/ |
| ftp_default | FTP | localhost:21 |
| hive_cli_default | Hive CLI | localhost:10000 |
| hiveserver2_default | HiveServer2 | localhost:10000 |
| fs_default | File System | Path: / |
| gremlin_default | Gremlin | Host: gremlin:8182 |
| iceberg_default | Iceberg | HTTPS endpoint |
Sources: airflow-core/src/airflow/utils/db.py
Verification and Security
Package Verification
PyPI releases can be verified by downloading the package, signature, and SHA sum files:
#!/bin/bash
PACKAGE_VERSION={{ package_version }}
PACKAGE_NAME={{ package_name }}
provider_download_dir=$(mktemp -d)
pip download --no-deps "${PACKAGE_NAME}==${PACKAGE_VERSION}" --dest "${provider_download_dir}"
curl "{{ base_url }}/{{ package_name_underscores }}-{{ package_version }}-py3-none-any.whl.asc" \
-L -o "${provider_download_dir}/{{ package_name_underscores }}-{{ package_version }}-py3-none-any.whl.asc"
curl "{{ base_url }}/{{ package_name_underscores }}-{{ package_version }}-py3-none-any.whl.sha512" \
-L -o "${provider_download_dir}/{{ package_name_underscores }}-{{ package_version }}-py3-none-any.whl.sha512"
echo "Please verify files downloaded to ${provider_download_dir}"
ls -
Sources: devel-common/src/sphinx_exts/includes/installing-providers-from-sources.rst
Sources: docker-stack-docs/README.md
Kubernetes Deployment
Related topics: Docker and Helm Deployment, Scheduler and Executor Architecture
Apache Airflow provides comprehensive Kubernetes support through the CNCF Kubernetes provider package. This integration enables Airflow to run tasks as Kubernetes Pods, providing dynamic resource allocation, isolation, and scalability for workflow execution.
Architecture Overview
Airflow's Kubernetes deployment consists of multiple components that work together to provide a flexible, scalable execution environment.
Kubernetes Executor Architecture
The Kubernetes Executor is one of Airflow's core executors that launches a new Pod for each task instance. Unlike the Celery Executor which reuses workers, the Kubernetes Executor creates isolated Pods for each task execution.
graph TD
A[Airflow Scheduler] -->|submits task| B[Kubernetes Executor]
B -->|creates| C[Task Pod]
B -->|watches| C
C -->|completes| D[Pod Status Update]
D -->|pods cleaned up| E[Resource Cleanup]
F[Kubernetes API Server] -->|manages| C
G[Worker Nodes] -->|hosts| C
Core Components
| Component | File Path | Purpose |
|---|---|---|
| KubernetesExecutor | providers/cncf/kubernetes/.../executors/kubernetes_executor.py | Main executor implementation |
| KubernetesPodOperator | providers/cncf/kubernetes/.../operators/pod.py | Operator for running pods |
| KubeConfig | providers/cncf/kubernetes/.../kube_config.py | Configuration management |
| PodGenerator | providers/cncf/kubernetes/.../pod_generator.py | Pod specification builder |
Configuration
Kubernetes Executor Configuration
The Kubernetes Executor is configured in airflow.cfg: the executor is selected under [core], and its options live under [kubernetes_executor] (named [kubernetes] in older releases).
[core]
executor = airflow.providers.cncf.kubernetes.executors.kubernetes_executor.KubernetesExecutor
[kubernetes_executor]
namespace = default
pod_template_file = /path/to/pod_template.yaml
worker_container_repository = apache/airflow
worker_container_tag = latest
Key Configuration Parameters
| Parameter | Description | Default |
|---|---|---|
| namespace | Kubernetes namespace for task pods | default |
| pod_template_file | Path to pod template yaml | - |
| worker_container_repository | Docker image repository | apache/airflow |
| worker_container_tag | Docker image tag | latest |
| delete_worker_pods | Delete pods after task completion | True |
| delete_worker_pods_on_failure | Delete pods on task failure | False |
| worker_pods_creation_batch_size | Batch size for pod creation | 2 |
Sources: providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/kube_config.py
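The effect of worker_pods_creation_batch_size can be pictured as a cap on how many queued tasks are turned into pods per scheduler loop. The sketch below is a simplified model of that idea, not the executor's code; pods_to_create and the task names are hypothetical.

```python
from collections import deque


def pods_to_create(queue: deque, batch_size: int) -> list:
    """Pop up to batch_size queued task keys for this scheduler loop."""
    batch = []
    while queue and len(batch) < batch_size:
        batch.append(queue.popleft())
    return batch


pending = deque(["task_a", "task_b", "task_c"])
first_loop = pods_to_create(pending, batch_size=2)   # two pods requested
second_loop = pods_to_create(pending, batch_size=2)  # the remaining one
```

A larger batch size drains the queue in fewer loops at the cost of more Kubernetes API calls per loop.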
Pod Generation
The PodGenerator class is responsible for constructing Kubernetes Pod specifications from Airflow task definitions.
Pod Template System
Airflow supports custom pod templates that define the base pod specification. These templates use Jinja2 templating to allow dynamic value injection.
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker-pod
spec:
  containers:
    - name: base
      image: "{{ container_image }}"
      env:
        - name: AIRFLOW__CORE__EXECUTOR
          value: LocalExecutor
Sources: providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/pod_generator.py
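Conceptually, building a task pod means starting from the base template and layering task-specific values on top. The sketch below shows that idea with plain dictionaries. It is not the `PodGenerator` API, and `deep_merge`, `build_pod`, and `BASE_TEMPLATE` are hypothetical names. The real class handles more cases than this.

```python
import copy
import json

# Base specification, modeled on the template shown above.
BASE_TEMPLATE = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "airflow-worker-pod"},
    "spec": {"containers": [{"name": "base", "image": "apache/airflow:latest"}]},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where nested dicts are merged and other values replaced."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            # Lists and scalars are replaced wholesale in this simplified version.
            result[key] = copy.deepcopy(value)
    return result


def build_pod(task_id: str, namespace: str, image: str) -> dict:
    """Layer task-specific settings over the base template."""
    override = {
        "metadata": {"namespace": namespace, "labels": {"task_id": task_id}},
        "spec": {"containers": [{"name": "base", "image": image}]},
    }
    return deep_merge(BASE_TEMPLATE, override)


pod = build_pod("demo_pod", namespace="default", image="python:3.9-slim")
print(json.dumps(pod, indent=2))
```

Replacing lists wholesale keeps the sketch short. A production merge would typically reconcile containers entry by entry so that template-level environment variables survive an image override.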
Dynamic Pod Configuration
The KubernetesPodOperator allows dynamic configuration of pod specifications at runtime:
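from kubernetes.client import models as k8s

from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Illustrative ConfigMap-backed volume; "config" and "airflow-config" are placeholder names.
config_volume = k8s.V1Volume(
    name="config",
    config_map=k8s.V1ConfigMapVolumeSource(name="airflow-config"),
)
config_volume_mount = k8s.V1VolumeMount(name="config", mount_path="/etc/config", read_only=True)

run_pod = KubernetesPodOperator(
    task_id="run_kubernetes_pod",
    name="my-task-pod",
    namespace="default",
    image="python:3.9-slim",
    cmds=["python", "-c", "print('Hello from Kubernetes')"],
    labels={"app": "airflow", "environment": "production"},
    volumes=[config_volume],
    volume_mounts=[config_volume_mount],
    get_logs=True,  # stream container output into the Airflow task log
    on_finish_action="delete_pod",  # replaces the deprecated is_delete_operator_pod=True
)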
Airflow Components on Kubernetes
Triggerer Deployment
The Airflow Triggerer runs as a Kubernetes Deployment to manage triggering logic in a distributed environment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-triggerer
  namespace: airflow
spec:
  replicas: 2
  selector:
    matchLabels:
      component: triggerer
      tier: airflow
  template:
    metadata:
      labels:
        component: triggerer
        tier: airflow
    spec:
      serviceAccountName: airflow-triggerer
      containers:
        - name: triggerer
          image: {{ .Values.images.triggerer }}
          args: ["triggerer"]
          ports:
            - name: logs
              containerPort: 8793
Sources: chart/templates/triggerer/triggerer-deployment.yaml
Component Services
| Component | Kubernetes Object | Purpose |
|---|---|---|
| Scheduler | Deployment | DAG parsing and task scheduling |
| Webserver | Deployment | Airflow UI serving |
| Triggerer | Deployment | Deferred task handling |
| Worker | StatefulSet/Deployment | Task execution |
| Flower | Deployment | Celery monitoring (optional) |
| Database | StatefulSet | Metadata storage |
| Redis | StatefulSet | Message broker |
Local Development with Skaffold
Airflow provides development workflows using Skaffold for hot-reloading code changes into running Kubernetes pods.
Skaffold Dev Loop
breeze k8s dev
This command runs the Skaffold dev loop to sync DAGs and airflow-core sources to running pods including scheduler, triggerer, dag-processor, and API Server with hot-reload support.
@kubernetes_group.command(
    name="dev",
    help=(
        "Run skaffold dev loop to sync dags and airflow-core sources to running pods "
        "(scheduler/triggerer/dag-processor/API Server hot-reload; UI auto-refresh not supported yet)."
    ),
)
Sources: dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py
KubernetesPodOperator
The KubernetesPodOperator allows users to define and execute arbitrary Kubernetes Pods as Airflow tasks.
Operator Features
| Feature | Description |
|---|---|
| Full Pod Spec | Define complete pod specifications |
| Volume Management | Support for ConfigMaps, Secrets, PVCs |
| Image Pull | Private registry authentication |
| Resource Limits | CPU and memory constraints |
| Node Selection | Pod affinity and node selectors |
| Sidecars | Support for sidecar containers |
Basic Usage
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# `dag` is an existing DAG object defined elsewhere in the DAG file.
with dag:
    k = KubernetesPodOperator(
        task_id="demo_pod",
        name="demo-pod",
        image="python:3.9",
        cmds=["python", "-c", "print('Hello World')"],
        namespace="default",
        on_finish_action="delete_pod",  # replaces the deprecated is_delete_operator_pod=True
        get_logs=True,
    )
Sources: providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py
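In the example above the `task_id` uses an underscore while the pod `name` uses a hyphen. Kubernetes object names are restricted to lowercase alphanumerics, `-`, and `.`, so an Airflow task ID cannot always be reused verbatim. Here is a small sketch of a sanitizer, assuming a conservative 63-character cap. `make_pod_name` is a hypothetical helper, not an Airflow function.

```python
import re

# Conservative cap (the DNS label limit); chosen here as an assumption.
MAX_POD_NAME_LENGTH = 63


def make_pod_name(task_id: str) -> str:
    """Derive a Kubernetes-safe pod name from an Airflow task ID."""
    name = task_id.lower()
    # Collapse every run of disallowed characters into a single hyphen.
    name = re.sub(r"[^a-z0-9.-]+", "-", name)
    name = name[:MAX_POD_NAME_LENGTH]
    # Names must start and end with an alphanumeric character.
    name = name.strip("-.")
    if not name:
        raise ValueError(f"cannot derive a pod name from {task_id!r}")
    return name


print(make_pod_name("demo_pod"))
print(make_pod_name("Run Kubernetes_Pod!"))
```

A deterministic mapping like this makes it easy to find the pod for a given task, though concurrent runs of the same task would also need a unique suffix.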
Executor Implementation
Task Execution Flow
sequenceDiagram
participant Scheduler
participant K8sExecutor
participant KubeAPI
participant Pod
Scheduler->>K8sExecutor: Queue task for execution
K8sExecutor->>KubeAPI: Create Pod
KubeAPI->>Pod: Launch pod
Pod->>Pod: Execute task
Pod->>KubeAPI: Report completion
KubeAPI->>K8sExecutor: Pod completed
K8sExecutor->>Scheduler: Task result
Executor Configuration
# Simplified sketch of the constructor; the actual class subclasses BaseExecutor and performs more setup.
class KubernetesExecutor:
    """Kubernetes executor implementation."""

    def __init__(self):
        self.kube_config = KubeConfig()
        self.manager = PodLauncher(
            kube_config=self.kube_config,
            in_cluster=self.kube_config.get("in_cluster", False),
        )
Sources: providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py
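The task execution flow (queue, create pod, watch, report) can be walked through with an in-memory stand-in for the Kubernetes API. This toy model is only meant to make the sequence concrete. `FakeKubeAPI` and `ToyExecutor` are hypothetical classes and do not reflect the real executor's internals.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FakeKubeAPI:
    """In-memory stand-in for the Kubernetes API server."""

    pods: dict[str, str] = field(default_factory=dict)

    def create_pod(self, name: str) -> None:
        self.pods[name] = "Pending"

    def finish_pod(self, name: str, succeeded: bool = True) -> None:
        # Simulates the pod running its task and reporting completion.
        self.pods[name] = "Succeeded" if succeeded else "Failed"

    def delete_pod(self, name: str) -> None:
        self.pods.pop(name, None)


class ToyExecutor:
    """Queues tasks, launches one pod per task, and collects results."""

    def __init__(self, api: FakeKubeAPI) -> None:
        self.api = api
        self.queue: deque[str] = deque()
        self.running: set[str] = set()
        self.results: dict[str, str] = {}

    def queue_task(self, task_key: str) -> None:
        self.queue.append(task_key)

    def launch_queued(self) -> None:
        while self.queue:
            key = self.queue.popleft()
            self.api.create_pod(key)
            self.running.add(key)

    def sync(self) -> None:
        """Check running pods and record a result for each finished one."""
        for key in list(self.running):
            phase = self.api.pods[key]
            if phase in ("Succeeded", "Failed"):
                self.results[key] = "success" if phase == "Succeeded" else "failed"
                self.running.discard(key)
                self.api.delete_pod(key)


api = FakeKubeAPI()
executor = ToyExecutor(api)
executor.queue_task("demo_pod")
executor.launch_queued()
api.finish_pod("demo_pod")
executor.sync()
print(executor.results)
```

Polling in `sync` stands in for the watch stream in the diagram. A real executor reacts to pod events instead of re-reading every pod on each loop.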
Security Considerations
Service Account Configuration
Running Airflow on Kubernetes requires proper service account configuration with appropriate RBAC permissions.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: airflow-executor
  namespace: airflow
Sources: providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/kube_config.py
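A service account only becomes useful once a Role and RoleBinding grant it permissions on pods. The sketch below generates such manifests as JSON, which `kubectl apply -f` accepts. The verb list is an assumption about what a pod-launching executor typically needs, and `airflow-pod-launcher` is a placeholder name. Compare both against the official Helm chart before use.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"


def pod_launcher_role(namespace: str, name: str = "airflow-pod-launcher") -> dict:
    """Role allowing pod lifecycle management within one namespace."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],
                "resources": ["pods"],
                # Assumed verb set; trim it to what your deployment needs.
                "verbs": ["create", "get", "list", "watch", "patch", "delete"],
            },
            {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get"]},
        ],
    }


def role_binding(namespace: str, service_account: str, role_name: str) -> dict:
    """Bind the role to the executor's service account."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
        "roleRef": {"apiGroup": RBAC_API, "kind": "Role", "name": role_name},
    }


role = pod_launcher_role("airflow")
binding = role_binding("airflow", "airflow-executor", role["metadata"]["name"])
print(json.dumps([role, binding], indent=2))
```

Using a namespaced Role rather than a ClusterRole limits the service account to the namespace where task pods run.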
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Doramagic extracted 16 source-linked risk signals. Review them before installing or handing real data to the project.
1. Installation risk: `ExternalTaskSensor` can succeed early for task groups with NULL task states
- Severity: high
- Finding: Installation risk is backed by a source signal: `ExternalTaskSensor` can succeed early for task groups with NULL task states. Treat it as a review item until the current version is checked.
- User impact: First-time setup may fail or require extra isolation and rollback planning.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/apache/airflow/issues/66877
2. Capability assumption: README/documentation is current enough for a first validation pass.
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: The project should not be treated as fully validated until this signal is reviewed.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: capability.assumptions | github_repo:33884891 | https://github.com/apache/airflow | README/documentation is current enough for a first validation pass.
3. Maintenance risk: Maintainer activity is unknown
- Severity: medium
- Finding: Maintenance risk is backed by a source signal: Maintainer activity is unknown. Treat it as a review item until the current version is checked.
- User impact: Users cannot judge support quality until recent activity, releases, and issue response are checked.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: evidence.maintainer_signals | github_repo:33884891 | https://github.com/apache/airflow | last_activity_observed missing
4. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: downstream_validation.risk_items | github_repo:33884891 | https://github.com/apache/airflow | no_demo; severity=medium
5. Security or permission risk: No sandbox install has been executed yet; downstream must verify before user use.
- Severity: medium
- Finding: No sandbox install has been executed yet; downstream must verify before user use.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.safety_notes | github_repo:33884891 | https://github.com/apache/airflow | No sandbox install has been executed yet; downstream must verify before user use.
6. Security or permission risk: no_demo
- Severity: medium
- Finding: no_demo
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: risks.scoring_risks | github_repo:33884891 | https://github.com/apache/airflow | no_demo; severity=medium
7. Security or permission risk: Apache Airflow 3.1.6
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: Apache Airflow 3.1.6. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/apache/airflow/releases/tag/3.1.6
8. Security or permission risk: Apache Airflow 3.1.7
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: Apache Airflow 3.1.7. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/apache/airflow/releases/tag/3.1.7
9. Security or permission risk: Apache Airflow 3.1.8
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: Apache Airflow 3.1.8. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/apache/airflow/releases/tag/3.1.8
10. Security or permission risk: Apache Airflow 3.2.0
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: Apache Airflow 3.2.0. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/apache/airflow/releases/tag/3.2.0
11. Security or permission risk: Apache Airflow 3.2.1
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: Apache Airflow 3.2.1. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/apache/airflow/releases/tag/3.2.1
12. Security or permission risk: Apache Airflow Ctl (airflowctl) 0.1.2
- Severity: medium
- Finding: Security or permission risk is backed by a source signal: Apache Airflow Ctl (airflowctl) 0.1.2. Treat it as a review item until the current version is checked.
- User impact: The project may affect permissions, credentials, data exposure, or host boundaries.
- Recommended check: Open the linked source, confirm whether it still applies to the current version, and keep the first run isolated.
- Evidence: Source-linked evidence: https://github.com/apache/airflow/releases/tag/airflow-ctl/0.1.2
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using airflow with real data or production workflows.
- `ExternalTaskSensor` can succeed early for task groups with NULL task st - github / github_issue
- edge3: upgrade from 1.x silently leaves schema stale — `BaseDBManager.up - github / github_issue
- Apache Airflow Helm Chart 1.21.0 - github / github_release
- Apache Airflow 3.2.1 - github / github_release
- Apache Airflow 3.2.0 - github / github_release
- Apache Airflow Helm Chart 1.20.0 - github / github_release
- airflow-ctl/0.1.3 - github / github_release
- Apache Airflow 3.1.8 - github / github_release
- Apache Airflow Ctl (airflowctl) 0.1.2 - github / github_release
- Apache Airflow Helm Chart 1.19.0 - github / github_release
- Apache Airflow 3.1.7 - github / github_release
- Apache Airflow 3.1.6 - github / github_release
Source: Project Pack community evidence and pitfall evidence