# https://github.com/apache/airflow 项目说明书

生成时间：2026-05-15 12:21:31 UTC

## 目录

- [Project Introduction](#project-introduction)
- [Core Concepts](#core-concepts)
- [Architecture Overview](#architecture-overview)
- [Scheduler and Executor Architecture](#scheduler-executor)
- [REST API](#rest-api)
- [User Interface](#user-interface)
- [Data Flow and State Management](#data-flow-xcom)
- [Connections and Variables](#connections-variables)
- [Docker and Helm Deployment](#docker-helm)
- [Kubernetes Deployment](#kubernetes-deployment)

<a id='project-introduction'></a>

## Project Introduction

### 相关页面

相关主题：[Architecture Overview](#architecture-overview), [Core Concepts](#core-concepts)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [README.md](https://github.com/apache/airflow/blob/main/README.md)
- [INSTALLING.md](https://github.com/apache/airflow/blob/main/INSTALLING.md)
- [PROVIDERS.rst](https://github.com/apache/airflow/blob/main/PROVIDERS.rst)
- [airflow-core/src/airflow/version.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/version.py)
- [airflow-core/src/airflow/utils/db.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/utils/db.py)
- [airflow-core/src/airflow/_vendor/README.md](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/_vendor/README.md)
</details>

# Project Introduction

Apache Airflow is an open-source platform designed for **orchestrating, scheduling, and monitoring** complex workflows. Originally developed at Airbnb, it has become the industry standard for data pipeline orchestration, enabling users to define, schedule, and execute workflows as Directed Acyclic Graphs (DAGs).

资料来源：[README.md](https://github.com/apache/airflow/blob/main/README.md)

## Overview

Apache Airflow provides a comprehensive solution for workflow management with the following key characteristics:

| Aspect | Description |
|--------|-------------|
| **Type** | Open-source workflow orchestration platform |
| **Language** | Python |
| **License** | Apache License 2.0 |
| **Primary Use** | Data pipelines, ETL processes, ML workflows |
| **Architecture** | Distributed, scalable, extensible |

资料来源：[README.md](https://github.com/apache/airflow/blob/main/README.md)

## Core Architecture

Airflow follows a distributed architecture with several key components working together to manage workflow execution.

```mermaid
graph TD
    subgraph Client["Client Layer"]
        WebServer["Web Server"]
        CLI["Command Line Interface"]
        API["REST API"]
    end
    
    subgraph Scheduler["Core Components"]
        Scheduler["Scheduler"]
        Executor["Executor"]
        Database["Metadata Database"]
    end
    
    subgraph Workers["Worker Layer"]
        Workers["Workers"]
        Tasks["Task Instances"]
    end
    
    subgraph External["External Systems"]
        Triggerer["Triggerer"]
        Logger["Logging"]
    end
    
    WebServer --> Database
    CLI --> Database
    API --> Database
    Scheduler --> Database
    Scheduler --> Executor
    Executor --> Workers
    Workers --> Database
    Triggerer --> Database
    
    style Database fill:#f9f,stroke:#333,stroke-width:2px
```

资料来源：[airflow-core/src/airflow/cli/cli_config.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/cli/cli_config.py)

## DAG-Based Workflow Model

At the heart of Airflow is the **DAG (Directed Acyclic Graph)** concept. A DAG represents a collection of tasks with defined dependencies, arranged in a way that reflects the logical flow of work.

### Key DAG Properties

| Property | Description |
|----------|-------------|
| **Directed** | Tasks have explicit dependencies (upstream/downstream relationships) |
| **Acyclic** | No circular dependencies; workflows flow in one direction |
| **Graph** | Complex dependency structures are supported |

### Default Connections

Airflow ships with pre-configured connection definitions for common integrations:

| Connection ID | Type | Default Configuration |
|---------------|------|----------------------|
| `facebook_default` | facebook_social | Facebook Ad account credentials |
| `fs_default` | fs | Filesystem path `/` |
| `ftp_default` | ftp | localhost:21, SSH key auth |
| `google_cloud_default` | google_cloud_platform | Default GCP schema |
| `hive_cli_default` | hive_cli | localhost:10000, Beeline mode |
| `hiveserver2_default` | hiveserver2 | localhost:10000 |
| `http_default` | http | https://www.httpbin.org/ |
| `gremlin_default` | gremlin | gremlin:8182 |

资料来源：[airflow-core/src/airflow/utils/db.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/utils/db.py)

## Providers Ecosystem

Apache Airflow's functionality is extended through **Providers** — packages that integrate Airflow with external services and systems.

```mermaid
graph LR
    Airflow["Apache Airflow Core"] --> Providers["Providers"]
    Providers --> Google["Google Cloud Provider"]
    Providers --> AWS["Amazon Provider"]
    Providers --> Azure["Microsoft Azure Provider"]
    Providers --> Edge3["Edge3 Provider"]
    Providers --> Others["Other Providers"]
```

### Provider Management

Providers can be discovered and managed through the Airflow CLI:

```bash
airflow providers [list|get|widgets|sensors|hooks|executors]
```

### Key Provider Features

| Feature | Description |
|---------|-------------|
| **Hooks** | Interfaces to external systems for connection management |
| **Operators** | Pre-built task templates for common operations |
| **Sensors** | Tasks that wait for external conditions |
| **Transfers** | Tasks for moving data between systems |

资料来源：[PROVIDERS.rst](https://github.com/apache/airflow/blob/main/PROVIDERS.rst)

## Version Information

The Airflow version is managed centrally and can be accessed programmatically:

```python
# airflow-core/src/airflow/version.py
version = "2.11.0"
```

资料来源：[airflow-core/src/airflow/version.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/version.py)

## Installation Methods

Airflow supports multiple installation approaches to suit different environments and use cases.

| Method | Use Case | Command/Config |
|--------|----------|----------------|
| **PyPI** | Standard installation | `pip install apache-airflow` |
| **Sources** | Development/contribution | Clone repository and install |
| **Providers from sources** | Testing provider changes | Breeze commands |
| **Constraints** | Reproducible builds | Use constraints files |

资料来源：[INSTALLING.md](https://github.com/apache/airflow/blob/main/INSTALLING.md)

### Installing Specific Versions

The `--use-airflow-version` option provides flexibility for version installation:

| Version Specifier | Behavior |
|-------------------|----------|
| `none` | Skip Airflow installation |
| `wheel` | Install from local `dist/` folder |
| `sdist` | Install from source distribution |
| `owner/repo:branch` | Install from GitHub repository |
| `PR_NUMBER` | Install from a Pull Request |

资料来源：[dev/breeze/src/airflow_breeze/commands/common_options.py](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/common_options.py)

## Vendored Dependencies

Airflow maintains a `_vendor` package for dependencies that need special handling:

```mermaid
graph TD
    Vendor["_vendor Package"] --> License["Move to licenses/ folder"]
    Vendor --> Remove["Remove README/supporting files"]
    Vendor --> Requirements["Add to pyproject.toml"]
    Vendor --> Fixes["Re-apply historical fixes"]
```

### Vendoring Process

1. Update `vendor.md` with library, version, and SHA256 hash
2. Remove old files and directories
3. Move LICENSE files to `licenses/` folder
4. Add requirements to `pyproject.toml` with appropriate comments
5. Re-apply any necessary cherry-picked fixes

资料来源：[airflow-core/src/airflow/_vendor/README.md](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/_vendor/README.md)

## Development Environment (Breeze)

The **Breeze** tool is Airflow's primary development environment, providing consistent tooling for testing, building, and development.

### Key Breeze Commands

| Command | Purpose |
|---------|---------|
| `breeze sbom update-sbom-information` | Update SBOM information |
| `breeze workflow run-publish` | Run documentation workflow |
| `breeze build` | Build Airflow images |

### Breeze Configuration Options

| Option | Description |
|--------|-------------|
| `--airflow-version` | Specify Airflow version |
| `--debian-version` | Select base Debian version |
| `--docker-cache` | Configure build caching |
| `--mount-sources` | Control source mounting strategy |
| `--allow-pre-releases` | Enable pre-release installations |

资料来源：[dev/breeze/src/airflow_breeze/commands/sbom_commands.py](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/sbom_commands.py)

## CLI Commands Structure

Airflow provides a comprehensive command-line interface organized into logical groups:

```mermaid
graph TD
    CLI["airflow CLI"] --> Groups
    Groups --> Config["config - View configuration"]
    Groups --> Info["info - Show system info"]
    Groups --> Plugins["plugins - Dump plugin info"]
    Groups --> Connections["connections - Manage connections"]
    Groups --> Providers["providers - Display providers"]
    Groups --> DAGs["dags - Manage DAGs"]
    Groups --> db_manager["db-manager - Database management"]
    Groups --> rotate_fernet["rotate-fernet-key - Rotate keys"]
```

### Core CLI Commands

| Command | Function | Purpose |
|---------|----------|---------|
| `airflow config` | View configuration | Display current Airflow settings |
| `airflow info` | System information | Show environment details |
| `airflow plugins` | Plugin dump | Display loaded plugins |
| `airflow connections` | Connection management | CRUD operations for connections |
| `airflow providers` | Provider display | List installed providers |
| `airflow db-manager` | Database management | External DB manager operations |

资料来源：[airflow-core/src/airflow/cli/cli_config.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/cli/cli_config.py)

## Dependencies and Requirements

### Generated Dependency Files

The repository maintains several auto-generated dependency files:

| File | Purpose | Generation Command |
|------|---------|-------------------|
| `devel_deps.txt` | Development dependencies | `./dev/get_devel_deps.sh` |
| `dep_tree.txt` | Full dependency tree | `uv tree --no-dedupe` |
| `dependency_depth.json` | Dependency depth analysis | Generated by Breeze |

### PyPI README Generation

The `PYPI_README.md` is automatically generated from the main README using pre-commit hooks defined in `.pre-commit-config.yaml`, ensuring consistency between project documentation and PyPI listings.

资料来源：[generated/README.md](https://github.com/apache/airflow/blob/main/generated/README.md)

## Summary

Apache Airflow is a robust, extensible workflow orchestration platform that provides:

- **DAG-based workflow definition** for complex pipeline management
- **Extensible architecture** through providers and plugins
- **Multiple deployment options** from development to production
- **Comprehensive CLI and UI** for monitoring and management
- **Strong ecosystem** of integrations with major cloud providers and services

The platform's design prioritizes **dynamic pipeline generation**, **extensible operator library**, and **robust scheduling capabilities**, making it the go-to choice for data engineering teams worldwide.

---

<a id='core-concepts'></a>

## Core Concepts

### 相关页面

相关主题：[Scheduler and Executor Architecture](#scheduler-executor), [Data Flow and State Management](#data-flow-xcom)

<details>
<summary>Related Source Files</summary>

The following source files are used to generate this documentation:

- [airflow-core/src/airflow/cli/cli_config.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/cli/cli_config.py)
- [airflow-core/src/airflow/cli/commands/config_command.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/cli/commands/config_command.py)
- [airflow-core/src/airflow/utils/db.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/utils/db.py)
- [providers/google/src/airflow/providers/google/cloud/hooks/cloud_composer.py](https://github.com/apache/airflow/blob/main/providers/google/src/airflow/providers/google/cloud/hooks/cloud_composer.py)
- [generated/README.md](https://github.com/apache/airflow/blob/main/generated/README.md)
</details>

# Core Concepts

Apache Airflow is an open-source workflow orchestration platform designed to programmatically author, schedule, and monitor complex data pipelines. The platform provides a robust framework for defining workflows as Directed Acyclic Graphs (DAGs), enabling organizations to automate and manage data processing workflows at scale.

## DAG (Directed Acyclic Graph)

The DAG is the fundamental building block of Airflow. It represents a collection of tasks with defined dependencies, organized in a way that reflects the logical flow of work through the system.

### DAG Structure

A DAG defines the overall structure of a workflow:

- **Nodes**: Represent individual tasks or operations
- **Edges**: Define dependencies between tasks (task A must complete before task B can start)
- **No Cycles**: The graph must flow in one direction without circular dependencies

### DAG Commands

Airflow provides comprehensive CLI commands for managing DAGs through the `dag_cli_commands` configuration.

| Command | Purpose |
|---------|---------|
| `dags list` | List all DAGs in the environment |
| `dags details` | Get detailed information about a specific DAG |
| `dags list-runs` | List all runs for a specific DAG |
| `dags list-import-errors` | Show DAGs with import errors |
| `dags report` | Display DagBag loading report |
| `dags pause` | Pause a DAG from scheduling |
| `dags unpause` | Resume DAG scheduling |
| `dags backfill` | Run subsections of a DAG for a date range |
| `dags test` | Test a DAG execution |

Source: [airflow-core/src/airflow/cli/cli_config.py](airflow-core/src/airflow/cli/cli_config.py)

### DAG Execution Parameters

When creating a backfill operation, the following parameters can be configured:

| Parameter | Description |
|-----------|-------------|
| `dag_id` | The identifier of the DAG to backfill |
| `from_date` | Start date for the backfill range |
| `to_date` | End date for the backfill range |
| `run_conf` | Configuration to pass to the DAG run |
| `run_backwards` | Execute DAG runs in reverse chronological order |
| `max_active_runs` | Maximum number of concurrent active DAG runs |
| `reprocess_behavior` | How to handle already-processed runs |
| `run_on_latest_version` | Whether to use the latest DAG version |
| `dry_run` | Execute without making changes |

Source: [airflow-core/src/airflow/cli/cli_config.py](airflow-core/src/airflow/cli/cli_config.py)

## Tasks and Task Instances

### Task Instance States

A task instance represents a specific execution of a task within a DAG run. Each task instance progresses through various states during its lifecycle.

```mermaid
graph TD
    A[None] --> B[Scheduled]
    B --> C[Queued]
    C --> D[Running]
    D --> E{Success/Failed/Skipped}
    E --> F[Success]
    E --> G[Failed]
    E --> H[Skipped]
    D --> I[Upstream Failed]
    F --> J[Complete]
    G --> J
    H --> J
    I --> J
```

### Task Instance Management

The Airflow UI provides functionality for clearing task instances, allowing operators to re-execute tasks that may have failed or need to be reprocessed. The `ClearTaskInstanceConfirmationDialog` component handles the confirmation workflow for this operation.

Key attributes displayed when clearing a task instance:

- Current state of the task
- Start date (shown as relative time)
- User who executed the task (or "unknown user" if unavailable)

## DAG Runs

A DAG Run represents an individual execution of an entire DAG at a specific point in time. Each DAG run has:

- **Run ID**: Unique identifier for the run
- **State**: Current state (running, success, failed)
- **Execution Date**: The logical date/time the DAG was scheduled to run
- **Start/End Date**: Actual execution timestamps
- **Configuration**: Run-specific configuration parameters

### Listing DAG Runs

The `list-runs` command supports filtering by:

- **State**: Filter runs by their state (running, success, failed)
- **No Backfill**: Exclude backfill runs from results
- **Start Date**: Filter runs executed after a specific date

Source: [airflow-core/src/airflow/cli/cli_config.py](airflow-core/src/airflow/cli/cli_config.py)

## Variables

Airflow Variables provide a mechanism for storing and retrieving arbitrary content or settings as simple key-value pairs. Variables are encrypted when the `fernet_key` is configured.

### Variable Commands

| Command | Purpose |
|---------|---------|
| `variables list` | List all variables |
| `variables get` | Get a specific variable value |
| `variables set` | Set a variable value |
| `variables delete` | Delete a variable |
| `variables export` | Export all variables to stdout |
| `variables import` | Import variables from a file |

### Variable Options

| Option | Description |
|--------|-------------|
| `VAR` | Variable key name |
| `VAR_VALUE` | Value to set |
| `VAR_DESCRIPTION` | Optional description |
| `SERIALIZE_JSON` | Serialize value as JSON |
| `DESERIALIZE_JSON` | Deserialize JSON value |
| `DEFAULT` | Default value if variable doesn't exist |
| `VAR_IMPORT` | Path to import file |
| `VAR_ACTION_ON_EXISTING_KEY` | Action for existing keys |

Source: [airflow-core/src/airflow/cli/cli_config.py](airflow-core/src/airflow/cli/cli_config.py)

## Connections

Airflow Connections store credentials and configuration information needed to connect to external systems. Default connections are automatically created during initialization.

### Default Connection Types

The `db.py` utility creates several default connections:

| Connection ID | Type | Purpose |
|---------------|------|---------|
| `facebook_social` | facebook_social | Facebook social authentication |
| `fs_default` | fs | File system operations |
| `ftp_default` | ftp | FTP server access |
| `google_cloud_default` | google_cloud_platform | GCP default credentials |
| `gremlin_default` | gremlin | Gremlin graph database |
| `hive_cli_default` | hive_cli | Hive command-line interface |
| `hiveserver2_default` | hiveserver2 | HiveServer2 JDBC connections |
| `http_default` | http | Generic HTTP endpoints |
| `iceberg_default` | iceberg | Apache Iceberg catalog |

Source: [airflow-core/src/airflow/utils/db.py](airflow-core/src/airflow/utils/db.py)

### Connection CLI Commands

| Command | Purpose |
|---------|---------|
| `connections list` | List all connections |
| `connections add` | Add a new connection |
| `connections delete` | Delete a connection |
| `connections get` | Get connection details |
| `connections edit` | Edit an existing connection |

## Assets

Assets represent data sources or destinations in Airflow. They can be associated with DAGs to create data-aware scheduling.

### Asset Commands

| Command | Purpose |
|---------|---------|
| `assets list` | List all assets |
| `assets details` | Show asset details |
| `assets materialze` | Materialize an asset |

### Asset Details Parameters

| Parameter | Description |
|-----------|-------------|
| `ASSET_ALIAS` | Alias name for the asset |
| `ASSET_NAME` | Name of the asset |
| `ASSET_URI` | URI identifying the asset |

Source: [airflow-core/src/airflow/cli/cli_config.py](airflow-core/src/airflow/cli/cli_config.py)

## Configuration Changes in Airflow 3.0

Airflow 3.0 introduces several configuration changes that affect default behavior.

### Default Behavior Changes

| Configuration | Old Default | New Default |
|---------------|-------------|-------------|
| `catchup_by_default` | `True` | `False` |
| `create_cron_data_intervals` | `True` | `False` |
| `create_delta_data_intervals` | `True` | `False` |

Source: [airflow-core/src/airflow/cli/commands/config_command.py](airflow-core/src/airflow/cli/commands/config_command.py)

### Configuration Renames

Several scheduler configurations have been renamed:

| Old Name | New Name |
|----------|----------|
| `scheduler.processor_poll_interval` | `scheduler.scheduler_idle_sleep_time` |
| `scheduler.deactivate_stale_dags_interval` | `scheduler.parsing_cleanup_interval` |
| `scheduler.statsd_on` | `metrics.statsd_on` |
| `scheduler.max_threads` | `dag_processor.parsing_processes` |

Source: [airflow-core/src/airflow/cli/commands/config_command.py](airflow-core/src/airflow/cli/commands/config_command.py)

### Catchup Behavior Change

In Airflow 3.0, DAGs without explicit `catchup` parameter definition will not catch up by default. This represents a change from Airflow 2.x behavior. Organizations relying on catchup behavior should set `catchup = True` in their DAG definitions or configure:

```ini
[scheduler]
catchup_by_default = True
```

## Providers

Airflow Providers extend the core functionality by integrating with external services and systems.

### Provider Information Commands

| Command | Purpose |
|---------|---------|
| `providers list` | List all installed providers |
| `providers details` | Show provider details |
| `providers hooks` | List registered provider hooks |
| `providers triggers` | List registered provider triggers |
| `providers executors` | Get information about executors |
| `providers secrets` | Get information about secrets backends |
| `providers connections` | Get connection information |
| `providers notifications` | Get notification information |
| `providers extra-links` | List extra links from providers |
| `providers widgets` | List connection form widgets |
| `providers behaviors` | Get connection types with custom behaviors |
| `providers logging` | Get task logging handlers |
| `providers auth-managers` | Get auth managers information |
| `providers configs` | Get provider configuration |
| `providers lazy-loaded` | Check lazy loading status |

Source: [airflow-core/src/airflow/cli/cli_config.py](airflow-core/src/airflow/cli/cli_config.py)

## Dependency Management

### Dependency Tree

Airflow's dependency tree defines the module structure and relationships between packages. This information is generated and maintained in the repository for dependency analysis.

| File | Purpose |
|------|---------|
| `dep_tree.txt` | Complete dependency tree of Airflow |
| `dependency_depth.json` | Dependency depth analysis |

Generated using:
```bash
uv tree --no-dedupe > /opt/airflow/generated/dep_tree.txt
```

Source: [generated/README.md](generated/README.md)

## Workflow Architecture

```mermaid
graph TD
    subgraph "Authoring"
        A[Define DAG] --> B[Define Tasks]
        B --> C[Set Dependencies]
    end
    
    subgraph "Scheduling"
        D[Scheduler] --> E{Parse DAGs}
        E --> F[DAG Run Created]
        F --> G[Task Instances Queued]
    end
    
    subgraph "Execution"
        G --> H[Executor]
        H --> I[Worker]
        I --> J[Task Execution]
    end
    
    subgraph "Monitoring"
        J --> K[Update State]
        K --> L[UI/Webserver]
        L --> M[Logs & Metrics]
    end
    
    G -->|Async Operations| N[Async Commands]
    N --> O[Cloud Composer Environments]
```

## Security Features

### Fernet Key Rotation

Airflow supports rotating Fernet encryption keys to maintain security of stored credentials and variables:

```bash
airflow rotate-fernet-key
```

This command rotates all encrypted connection credentials and variables.

Source: [airflow-core/src/airflow/cli/cli_config.py](airflow-core/src/airflow/cli/cli_config.py)

### Configuration Reference

For secure connections, refer to the official documentation:
https://airflow.apache.org/docs/apache-airflow/stable/howto/secure-connections.html

## Additional CLI Commands

### Info Command

Provides system and environment information:

```bash
airflow info [--anonymize] [--file-io] [--output OUTPUT] [--verbose]
```

### Standalone Mode

Runs a complete Airflow instance for testing or development:

```bash
airflow standalone
```

### Cheat Sheet

Displays a quick reference for common Airflow commands:

```bash
airflow cheat-sheet [--verbose]
```

### Plugins Command

Dump information about loaded plugins:

```bash
airflow plugins [--output OUTPUT] [--verbose]
```

Source: [airflow-core/src/airflow/cli/cli_config.py](airflow-core/src/airflow/cli/cli_config.py)

## Teams (RBAC)

Airflow supports team-based access control with the following operations:

### Team Commands

| Command | Purpose |
|---------|---------|
| `teams create` | Create a new team |
| `teams delete` | Delete a team |

### Team Creation Requirements

- Team names must be 3-50 characters long
- Only alphanumeric characters, hyphens, and underscores are allowed

Source: [airflow-core/src/airflow/cli/cli_config.py](airflow-core/src/airflow/cli/cli_config.py)

---

<a id='architecture-overview'></a>

## Architecture Overview

### 相关页面

相关主题：[Scheduler and Executor Architecture](#scheduler-executor), [REST API](#rest-api)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [airflow-core/docs/img/diagram_basic_airflow_architecture.py](https://github.com/apache/airflow/blob/main/airflow-core/docs/img/diagram_basic_airflow_architecture.py)
- [airflow-core/docs/img/diagram_distributed_airflow_architecture.py](https://github.com/apache/airflow/blob/main/airflow-core/docs/img/diagram_distributed_airflow_architecture.py)
- [airflow-core/docs/img/diagram_auth_manager_airflow_architecture.py](https://github.com/apache/airflow/blob/main/airflow-core/docs/img/diagram_auth_manager_airflow_architecture.py)
- [airflow-core/src/airflow/dag_processing/manager.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/dag_processing/manager.py)
- [airflow-core/docs/core-concepts/overview.rst](https://github.com/apache/airflow/blob/main/airflow-core/docs/core-concepts/overview.rst)
</details>

# Architecture Overview

Apache Airflow is an open-source workflow orchestration platform designed to programmatically author, schedule, and monitor complex data pipelines. The architecture is built around a distributed, scalable design that separates concerns between scheduling, execution, and monitoring.

## Core Architectural Components

Airflow's architecture consists of several key components that work together to manage workflow execution across distributed environments.

### Component Overview

| Component | Purpose | Key Files |
|-----------|---------|-----------|
| **Scheduler** | Triggers scheduled tasks and submits tasks to executors | `airflow-core/src/airflow/dag_processing/manager.py` |
| **Executor** | Executes tasks distributed across workers | Configured via `airflow.cfg` |
| **Web Server** | Provides UI for monitoring and management | `airflow/ui/` |
| **Database** | Stores DAGs, connections, variables, and execution history | Configured via `airflow.cfg` |
| **DAG Processor** | Parses and processes DAG files | `airflow-core/src/airflow/dag_processing/manager.py` |

资料来源：[airflow-core/docs/core-concepts/overview.rst]()

## Basic Architecture

The basic Airflow architecture runs all components on a single machine, suitable for development, testing, and small-scale deployments.

```mermaid
graph TB
    subgraph "Airflow Core"
        WS[Web Server] <--> DB[(Metadata Database)]
        SCH[Scheduler] <--> DB
        SCH <--> EX[Executor]
        EX <--> DB
        DP[DAG Processor] <--> DB
    end
    
    subgraph "DAG Storage"
        DAG[DAG Files] --> DP
    end
    
    WS -->|UI Access| Users
    SCH -->|Schedule| DAG
```

This single-node deployment includes all core components running together, with the scheduler handling task scheduling and the web server providing the user interface.

资料来源：[airflow-core/docs/img/diagram_basic_airflow_architecture.py]()

## Distributed Architecture

For production environments, Airflow supports a distributed architecture where components can scale independently.

```mermaid
graph TB
    subgraph "Web Tier"
        WS1[Web Server 1]
        WS2[Web Server 2]
        LB[Load Balancer]
    end
    
    subgraph "Scheduler Tier"
        SCH1[Scheduler 1]
        SCH2[Scheduler 2]
    end
    
    subgraph "Worker Tier"
        W1[Worker 1]
        W2[Worker 2]
        W3[Worker N]
    end
    
    subgraph "Metadata"
        DB[(PostgreSQL/MySQL)]
        REDIS[(Redis/Message Broker)]
    end
    
    LB --> WS1
    LB --> WS2
    WS1 <--> DB
    WS2 <--> DB
    SCH1 <--> DB
    SCH2 <--> DB
    SCH1 --> REDIS
    SCH2 --> REDIS
    REDIS --> W1
    REDIS --> W2
    REDIS --> W3
```

### Scalability Features

The distributed architecture provides:

- **Horizontal Scaling**: Multiple schedulers and web servers can run in parallel
- **Worker Flexibility**: Workers can be added or removed based on workload
- **High Availability**: No single point of failure for critical components
- **Isolation**: DAG processing is separated from execution

资料来源：[airflow-core/docs/img/diagram_distributed_airflow_architecture.py]()

## DAG Processing Architecture

The DAG Processor is responsible for reading, parsing, and validating DAG files before the Scheduler can schedule their tasks.

### Processing Pipeline

```mermaid
graph LR
    A[DAG Files] -->|Read| B[DAG Processor]
    B -->|Parse| C[DAG Bag]
    C -->|Validate| D[Valid DAGs]
    D -->|Sync| E[(Metadata DB)]
    B -->|Log| F[Processor Logs]
```

### Key Processing Manager

The `DagFileProcessorManager` handles:

1. **File Discovery**: Scanning DAG directories for Python files
2. **Parsing**: Converting Python DAG definitions into Airflow DAG objects
3. **Serialization**: Storing parsed DAGs in the database
4. **Callback Handling**: Processing DAG-level callbacks

```python
# Simplified from airflow-core/src/airflow/dag_processing/manager.py
class DagFileProcessorManager:
    def process_file(self, filepath):
        """Process a single DAG file and return list of DAGs."""
        dagbag = DagBag(dag_folder=filepath)
        for dag in dagbag.dags.values():
            dag.sync_to_db()
        return dagbag.dags
```

资料来源：[airflow-core/src/airflow/dag_processing/manager.py]()

## Authentication and Authorization Architecture

Airflow supports pluggable authentication through the Auth Manager interface, allowing integration with various authentication backends.

```mermaid
graph TB
    U[User] -->|Auth Request| AM[Auth Manager]
    AM -->|User Lookup| DB[(Metadata DB)]
    AM -->|Backend Check| BE[External Backend<br/>OIDC/SAML/LDAP]
    BE -->|Validation| AM
    AM -->|Permissions| PERMS[Permission Set]
    PERMS -->|Grant Access| RES[Resources]
```

### Auth Manager Components

| Component | Responsibility |
|-----------|----------------|
| `AuthManager` | Core interface for authentication |
| `FastAPIAuthManager` | Default implementation for Airflow 3.0+ |
| `Backends` | External identity providers |

The auth manager architecture enables:

- Integration with enterprise identity providers
- Role-based access control (RBAC)
- Fine-grained permissions on DAGs and tasks

资料来源：[airflow-core/docs/img/diagram_auth_manager_airflow_architecture.py]()

## Configuration Management

Airflow 3.0 introduced significant configuration changes from Airflow 2.x:

### Configuration Changes Summary

| Old Parameter | New Parameter | Section |
|---------------|---------------|---------|
| `processor_poll_interval` | `scheduler_idle_sleep_time` | scheduler |
| `deactivate_stale_dags_interval` | `parsing_cleanup_interval` | scheduler |
| `statsd_on` | `statsd_on` | metrics |
| `max_threads` | `parsing_processes` | dag_processor |
| `create_cron_data_intervals` | unchanged | scheduler |
| `create_delta_data_intervals` | unchanged | scheduler |

### Default Behavior Changes

In Airflow 3.0, the default for `catchup_by_default` is `False`, meaning DAGs without explicit catchup configuration will not backfill past runs.

资料来源：[airflow-core/src/airflow/cli/commands/config_command.py]()

## Executor Architecture

Airflow supports multiple executor types for different deployment scenarios:

| Executor | Use Case | Scalability |
|----------|----------|-------------|
| **LocalExecutor** | Single machine, development | Limited |
| **SequentialExecutor** | Debugging, minimal resources | None |
| **CeleryExecutor** | Production, distributed | High |
| **KubernetesExecutor** | Containerized environments | Very High |
| **RayExecutor** | Ray cluster integration | High |

### Executor Selection

Executors are configured in `airflow.cfg`:

```ini
[core]
executor = KubernetesExecutor
```

资料来源：[airflow-core/docs/core-concepts/overview.rst]()

## CLI and Command Architecture

The Airflow CLI provides a comprehensive command-line interface for managing Airflow components:

### Command Structure

```
airflow
├── config          # View configuration
├── connections     # Manage connections
├── dags            # DAG management
├── db-manager      # Database management
├── info            # System information
├── plugins         # Plugin dump
├── providers       # Provider information
├── rotate-fernet-key  # Credential rotation
├── standalone      # All-in-one mode
└── version         # Version display
```

### Key CLI Commands

| Command | Purpose |
|---------|---------|
| `airflow dags backfill` | Run historical DAG executions |
| `airflow dags list-runs` | List DAG run instances |
| `airflow connections list` | Show configured connections |
| `airflow providers list` | Display loaded providers |
| `airflow rotate-fernet-key` | Rotate encryption keys |

资料来源：[airflow-core/src/airflow/cli/cli_config.py]()

## Provider Architecture

Providers extend Airflow's capabilities by integrating with external systems:

### Provider Categories

- **Cloud Providers**: Google Cloud, AWS, Azure
- **Service Integrations**: HTTP, SSH, GraphQL
- **Database Connectors**: PostgreSQL, MySQL, Snowflake
- **Data Processing**: Spark, Databricks, dbt

### Provider Communication

```mermaid
graph LR
    A[Airflow Task] -->|Hook| B[Provider Hook]
    B -->|API| C[External Service]
    C -->|Response| B
    B -->|Result| A
```

Providers are discovered and managed through the `airflow providers` CLI commands.

资料来源：[airflow-core/src/airflow/cli/cli_config.py]()

## Installation Modes

### Docker Installation

Airflow provides official Docker images with multiple variants:

| Image Type | Description | Size |
|------------|-------------|------|
| `apache/airflow:latest` | Latest stable, default Python | ~1GB |
| `apache/airflow:3.x.x` | Versioned release | ~1GB |
| `apache/airflow:slim-latest` | Minimal installation | ~500MB |

### Installation Methods

```bash
# Basic installation
pip install 'apache-airflow==3.2.0' \
 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.2.0/constraints-3.10.txt"

# With extras
pip install 'apache-airflow[postgres,google]==3.2.0' \
 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-3.2.0/constraints-3.10.txt"
```

资料来源：[docker-stack-docs/README.md]()
资料来源：[generated/PYPI_README.md]()

## Default Connections

Airflow ships with pre-configured default connections for common integrations:

| Connection ID | Type | Purpose |
|---------------|------|---------|
| `google_cloud_default` | google_cloud_platform | GCP resources |
| `fs_default` | fs | File system access |
| `http_default` | http | HTTP endpoints |
| `ftp_default` | ftp | FTP/SFTP transfers |
| `hive_cli_default` | hive_cli | Hive CLI connections |
| `hiveserver2_default` | hiveserver2 | HiveServer2 connections |

资料来源：[airflow-core/src/airflow/utils/db.py]()

## Development and Testing Architecture

The Breeze development environment provides an integrated development setup:

### Breeze Features

- Pre-configured Docker-based development environment
- Multiple Python version support
- Integration testing framework
- Static code analysis tools
- Documentation building capabilities

### Development Commands

```bash
# Start development shell
breeze shell

# Run tests
breeze test

# Build documentation
breeze build-docs
```

资料来源：[dev/breeze/src/airflow_breeze/commands/developer_commands.py]()

## Shared Distribution Architecture

Airflow supports shared code distributions for cross-project functionality:

### Configuration

```toml
[tool.airflow]
shared_distributions = [
     "apache-airflow-shared-timezones",
]
```

Shared distributions are:

1. Defined in `pyproject.toml` under `tool.airflow`
2. Symlinked via `_shared` folder
3. Automatically synchronized by pre-commit hooks

资料来源：[shared/README.md]()

## MyPy Type Checking Integration

Airflow provides custom MyPy plugins for enhanced type checking:

### Available Plugins

| Plugin | Purpose |
|--------|---------|
| `airflow_mypy.plugins.decorators` | Type checking for Airflow decorators |
| `airflow_mypy.plugins.outputs` | Type inference for XCom arguments |

### Configuration

```ini
[mypy]
plugins = airflow_mypy.plugins.decorators, airflow_mypy.plugins.outputs
```

Or in `pyproject.toml`:

```toml
[tool.mypy]
plugins = ["airflow_mypy.plugins.decorators", "airflow_mypy.plugins.outputs"]
```

资料来源：[dev/mypy/README.md]()

## Build and Release Architecture

### SBOM (Software Bill of Materials)

Airflow generates SBOMs for security and compliance tracking:

- Dependency tree generation via `uv tree`
- Dependency depth analysis
- Version-specific SBOM files

### Documentation Build

The documentation system includes:

- Sphinx-based documentation generation
- Pagefind search integration
- Multi-version documentation support
- Third-party inventory tracking

资料来源：[dev/breeze/src/airflow_breeze/commands/sbom_commands.py]()
资料来源：[devel-common/src/sphinx_exts/pagefind_search/README.md]()

## Summary

Apache Airflow's architecture is designed for scalability, reliability, and extensibility:

- **Modular Design**: Components can be scaled independently
- **Pluggable Executors**: Support for various execution backends
- **Extensible Providers**: Integration with external systems
- **Production-Ready**: High availability and monitoring capabilities
- **Developer-Friendly**: Comprehensive tooling and documentation

The architecture supports deployments from single-machine development environments to large-scale, distributed production systems handling thousands of workflows.

---

<a id='scheduler-executor'></a>

## Scheduler and Executor Architecture

### 相关页面

相关主题：[Architecture Overview](#architecture-overview), [Kubernetes Deployment](#kubernetes-deployment)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [airflow-core/src/airflow/jobs/scheduler_job_runner.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/scheduler_job_runner.py)
- [airflow-core/src/airflow/executors/base_executor.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/executors/base_executor.py)
- [airflow-core/src/airflow/executors/local_executor.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/executors/local_executor.py)
- [airflow-core/src/airflow/executors/executor_loader.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/executors/executor_loader.py)
- [airflow-core/src/airflow/dag_processing/collection.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/dag_processing/collection.py)
- [airflow-core/src/airflow/timetables/base.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/timetables/base.py)
- [airflow-core/docs/administration-and-deployment/scheduler.rst](https://github.com/apache/airflow/blob/main/airflow-core/docs/administration-and-deployment/scheduler.rst)
</details>

# Scheduler and Executor Architecture

Apache Airflow's scheduling and execution system is a distributed architecture that coordinates the parsing of Directed Acyclic Graphs (DAGs), scheduling of task instances, and execution of tasks across worker nodes. This document provides a comprehensive overview of how the Scheduler and Executors interact to process and execute workflows.

## Overview

The Scheduler and Executor architecture in Apache Airflow consists of several interconnected components that work together to transform DAG definitions into executed tasks. The system separates concerns between **scheduling decisions** (when to run tasks based on dependencies and timetable data) and **execution** (how and where tasks actually run).

```mermaid
graph TD
    A[DAG Files] --> B[DAG Processor]
    B --> C[DAG Parsing]
    C --> D[Serialized DAGs]
    D --> E[Metadata Database]
    E --> F[Scheduler]
    F --> G[Executor]
    G --> H[Workers]
    H --> I[Task Execution]
    I --> E
```

## Scheduler Architecture

### Scheduler Process

The Scheduler is a long-running daemon process that continuously monitors DAGs and schedules task instances for execution. It is configured through CLI commands defined in the system.

**CLI Configuration:**

The scheduler command is defined in `cli_config.py` and accepts multiple configuration parameters for controlling its behavior.

| Parameter | Purpose | Default |
|-----------|---------|---------|
| `--num-runs` | Number of scheduler runs before exiting | -1 (infinite) |
| `--only-idle` | Only schedule DAGs with idle tasks | False |
| `--pid` | PID file location | None |
| `--daemon` | Run as daemon process | False |
| `--stdout` | stdout log file | None |
| `--stderr` | stderr log file | None |
| `--log-file` | Log file path | None |
| `--skip-serve-logs` | Skip serving logs | False |
| `--dev` | Development mode | False |

资料来源：[airflow-core/src/airflow/cli/cli_config.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/cli/cli_config.py)

### Job Lifecycle

The Scheduler creates and manages Jobs that represent its operational cycles. Each scheduler run follows a specific lifecycle that involves database interactions.

```mermaid
sequenceDiagram
    participant CLI as CLI Component
    participant JobRunner as JobRunner
    participant DB as Database
    participant TaskRunner as TaskRunner

    activate CLI
    CLI->>JobRunner: Create Job
    activate JobRunner
    JobRunner->>DB: Create Job Record
    activate DB
    DB-->>JobRunner: Job Created
    JobRunner->>DB: Create Session
    DB->>JobRunner: Session
    deactivate DB
    JobRunner->>CLI: Job Created
    deactivate JobRunner
    CLI->>JobRunner: Execute Job
    activate JobRunner
    par
        JobRunner->>DB: Schedule Tasks
        activate DB
        DB-->>JobRunner: Scheduled Tasks
        deactivate DB
    and
        JobRunner->>JobRunner: Process DAG Files
    end
    JobRunner->>DB: Perform Heartbeat
    activate DB
    DB->>JobRunner: Heartbeat Response
    JobRunner->>JobRunner: Heartbeat Callback
    DB-->>JobRunner: Close Session
    deactivate DB
    JobRunner->>CLI: Job Completed
    deactivate JobRunner
    deactivate CLI
```

资料来源：[airflow-core/src/airflow/jobs/JOB_LIFECYCLE.md](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/jobs/JOB_LIFECYCLE.md)

### Scheduler Responsibilities

The Scheduler performs the following key operations:

1. **DAG Parsing**: Reads DAG files and parses them into Python objects
2. **Task Scheduling**: Determines which tasks are ready to execute based on dependencies
3. **DagRun Creation**: Creates DagRun records for scheduled DAG runs
4. **TaskInstance Creation**: Creates TaskInstance records for tasks to be executed
5. **Heartbeating**: Maintains its presence and reports health to the database

## Executor Architecture

Executors are responsible for the actual execution of tasks. Airflow supports multiple executor types, each with different deployment characteristics.

### Executor Types

| Executor | Description | Use Case |
|----------|-------------|----------|
| SequentialExecutor | Executes tasks sequentially in the same process | Development/Debugging |
| LocalExecutor | Executes tasks in parallel processes on a single machine | Single-node deployments |
| CeleryExecutor | Distributes tasks across multiple machines using Celery | Distributed production deployments |
| KubernetesExecutor | Creates pods per task in Kubernetes | Kubernetes-native deployments |
| LocalKubernetesExecutor | Hybrid of Local and Kubernetes executors | Testing/minimal Kubernetes |

### Executor Loader

The `ExecutorLoader` class is responsible for loading and configuring executors based on Airflow configuration. It supports both simple executor names and complex module paths with optional aliases.

```mermaid
graph TD
    A[Executor Config] --> B{ExecutorLoader}
    B --> C{Check Format}
    C -->|Simple Name| D[Load Core Executor]
    C -->|Team:Executor| E[Parse Team Config]
    E --> F[Alias:Module/Name]
    F --> G[Resolve Module Path]
    D --> H[Return ExecutorName]
    G --> H
```

**Executor Configuration Parsing:**

The loader parses executor configurations in multiple formats:

- **Simple name**: `SequentialExecutor`, `LocalExecutor`
- **Module path**: `airflow.executors.local_executor.LocalExecutor`
- **With alias**: `MyAlias:LocalExecutor`
- **Team-based**: `team_name:executor_name` or `team_name:alias:executor_name`

资料来源：[airflow-core/src/airflow/executors/executor_loader.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/executors/executor_loader.py)

### Base Executor Interface

All executors inherit from `BaseExecutor` which defines the standard interface:

| Method | Purpose |
|--------|---------|
| `execute_async()` | Execute a single task |
| `sync()` | Sync state with metadata database |
| `end()` | Cleanup executor resources |
| `terminate()` | Force terminate all running tasks |
| `try_adopt_task_instances()` | Adopt orphaned task instances |
| `render_slots()` | Render available executor slots |

### Local Executor

The Local Executor executes tasks in parallel worker processes on a single machine, providing a balance between simplicity and parallelism.

**Key Features:**
- Configurable parallelism (number of parallel workers)
- Supports task-level parallelism within a single node
- Inherits configuration from core executor registry
- Can be configured with aliases for multi-team deployments

```python
# Example: Loading LocalExecutor with team configuration
executor_names_per_team.append(
    ExecutorName(
        alias=None, 
        module_path=cls.executors[module_or_name], 
        team_name=team_name
    )
)
```

资料来源：[airflow-core/src/airflow/executors/local_executor.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/executors/local_executor.py)

## Task Scheduling Flow

The complete task scheduling flow involves multiple stages from DAG definition to task execution:

```mermaid
graph LR
    A[DAG File] --> B[DAG Processor]
    B --> C[Parse DAG]
    C --> D[Serialize DAG]
    D --> E[Store in DB]
    E --> F[Scheduler]
    F --> G[Evaluate Timetable]
    G --> H{Check Dependencies}
    H -->|Met| I[Create TaskInstance]
    H -->|Not Met| J[Skip]
    I --> K[Queue Task]
    K --> L[Executor]
    L --> M[Worker]
    M --> N[Execute Task]
    N --> O[Update State]
    O --> P[Record XCom]
```

### DAG Processing

The DAG collection module handles parsing and synchronization of DAGs:

| Component | Responsibility |
|-----------|----------------|
| `DagBag` | Collection of parsed DAGs from file system |
| `DagFileProcessor` | Parses individual DAG files |
| `DagFileProcessorAgent` | Manages multiple DAG processors |
| `SerializedDagModel` | Database representation of serialized DAGs |

资料来源：[airflow-core/src/airflow/dag_processing/collection.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/dag_processing/collection.py)

### Timetables

Timetables determine when DAGs should be triggered. They provide schedule information and calculate logical dates for DAG runs.

**Key Timetable Methods:**

| Method | Purpose |
|--------|---------|
| `describe()` | Human-readable schedule description |
| `infer_data_interval()` | Infer run boundaries from logical date |
| `get_next_runtime()` | Calculate next scheduled run time |
| `validate()` | Validate timetable configuration |

资料来源：[airflow-core/src/airflow/timetables/base.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/timetables/base.py)

## Task State Management

Task instances maintain state throughout their lifecycle. The `TaskState` class provides methods for managing task state via supervisor communications.

```mermaid
graph TD
    A[Task Starts] --> B[Scheduled]
    B --> C[Queued]
    C --> D[Started]
    D --> E{Ran Successfully?}
    E -->|Yes| F[Success]
    E -->|No| G{Retryable?}
    G -->|Yes| H[Up for Retry]
    H --> C
    G -->|No| I[Failed]
    F --> J[Emit XCom]
    I --> J
```

### TaskState API

The TaskState class provides key-value storage for task state information:

| Method | Description |
|--------|-------------|
| `get(key)` | Retrieve task state value by key |
| `set(key, value)` | Store task state value |
| `delete(key)` | Delete a specific key |
| `clear(all_map_indices)` | Clear all keys or map-index specific keys |

资料来源：[task-sdk/src/airflow/sdk/execution_time/context.py](https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/context.py)

## DAG Runs and Task Instances

### DagRun States

| State | Description |
|-------|-------------|
| `queued` | Initial state when DAG run is created |
| `running` | DAG run is currently executing |
| `success` | All tasks in the DAG completed successfully |
| `failed` | DAG run failed due to task or system failure |

### Run Types

| Type | Trigger Mechanism |
|------|-------------------|
| `scheduled` | Triggered automatically by timetable |
| `manual` | Triggered by user action via CLI or UI |
| `dataset` | Triggered by dataset dependency |
| `backfill` | Triggered by explicit backfill command |

资料来源：[airflow-core/src/airflow/ui/src/pages/DagRuns.tsx](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/pages/DagRuns.tsx)

## CLI Commands for Scheduler and Executor

### Scheduler Commands

```bash
# Start the scheduler
airflow scheduler

# Start with specific number of runs
airflow scheduler --num-runs 10

# Run in daemon mode
airflow scheduler --daemon --log-file /path/to/scheduler.log

# Development mode
airflow scheduler --dev
```

### DAG Processing Commands

```bash
# Trigger DAG run
airflow dags trigger <dag_id>

# Test task
airflow tasks test <dag_id> <task_id> <logical_date>

# Clear task instances
airflow tasks clear <dag_id>

# Render task template
airflow tasks render <dag_id> <task_id> <logical_date>
```

资料来源：[airflow-core/src/airflow/cli/cli_config.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/cli/cli_config.py)

## Configuration

### Executor Configuration

Executors are configured in `airflow.cfg` or through environment variables:

```ini
[core]
executor = LocalExecutor

[celery]
celery_executor_config = ...
```

### Scheduler Configuration

| Configuration | Description | Default |
|---------------|--------------|---------|
| `scheduler_num_runs` | Number of scheduler runs | -1 (infinite) |
| `scheduler_idle_sleep_time` | Seconds between scheduler loops | 1 |
| `num_runs` | Alternative parameter for number of runs | -1 |
| `only_idle` | Only schedule idle DAGs | False |

## Signal Handling

The scheduler supports the following signals for operational control:

| Signal | Action |
|--------|--------|
| `SIGUSR2` | Dump a snapshot of task state being tracked by the executor |

Example usage:
```bash
pkill -f -USR2 "airflow scheduler"
```

资料来源：[airflow-core/src/airflow/cli/cli_config.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/cli/cli_config.py)

## Related Components

### Triggerer

The Triggerer is a separate daemon that manages lightweight asynchronous triggers for tasks that need to wait for external conditions:

```bash
airflow triggerer --capacity 1000 --queues queue1,queue2
```

| Parameter | Purpose |
|-----------|---------|
| `--capacity` | Maximum concurrent triggers |
| `--queues` | Queues to consume from |
| `--pid` | PID file location |
| `--daemon` | Run as daemon |

### DAG Processor

The DAG Processor parses and validates DAG files before they are scheduled:

```bash
airflow dag-processor --bundle-name <name> --num-runs 5
```

## Summary

The Scheduler and Executor Architecture in Apache Airflow provides a robust, scalable system for orchestrating complex workflows. Key architectural principles include:

1. **Separation of Concerns**: Scheduler handles scheduling decisions; Executors handle task execution
2. **Pluggable Executors**: Multiple executor types support different deployment scenarios
3. **Database-Driven State**: All state is persisted in the metadata database for durability
4. **Continuous Loop**: Scheduler runs continuously, periodically evaluating DAGs and scheduling tasks
5. **Team-Based Execution**: Modern Airflow supports team-based executor configuration with aliases

---

<a id='rest-api'></a>

## REST API

### 相关页面

相关主题：[User Interface](#user-interface), [Architecture Overview](#architecture-overview)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [airflow-core/src/airflow/api_fastapi/core_api/app.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/app.py)
- [airflow-core/src/airflow/api_fastapi/core_api/routes/ui/dag_runs.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/ui/dag_runs.py)
- [airflow-core/src/airflow/api_fastapi/core_api/datamodels/dag_run.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/datamodels/dag_run.py)
- [airflow-core/src/airflow/api_fastapi/auth/managers/simple/simple_auth_manager.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/auth/managers/simple/simple_auth_manager.py)
- [airflow-core/src/airflow/api_fastapi/execution_api/app.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/execution_api/app.py)
- [airflow-core/docs/stable-rest-api-ref.rst](https://github.com/apache/airflow/blob/main/airflow-core/docs/stable-rest-api-ref.rst)
</details>

# REST API

Apache Airflow provides a comprehensive REST API for programmatic interaction with the workflow automation platform. The API is built using FastAPI and enables external systems and applications to manage DAGs, trigger executions, monitor workflows, and interact with task instances.

## Architecture Overview

Apache Airflow's REST API is architected as a multi-layer FastAPI application that separates concerns between the core API and the execution API.

```mermaid
graph TD
    subgraph "Client Layer"
        A[External Clients]
        B[CLI Tools]
        C[UI Dashboard]
    end
    
    subgraph "REST API Layer"
        D[Core API<br/>/api/v1]
        E[Execution API<br/>/execution]
    end
    
    subgraph "Service Layer"
        F[DAG Management Service]
        G[Task Execution Service]
        H[Configuration Service]
    end
    
    subgraph "Data Layer"
        I[Airflow Database]
        J[Metadata DB]
    end
    
    A --> D
    B --> D
    C --> D
    C --> E
    D --> F
    D --> H
    E --> G
    F --> I
    G --> I
    H --> J
```

### Core API (`/api/v1`)

The Core API provides the primary interface for DAG management, monitoring, and administrative operations. It handles:

- DAG run creation and management
- Task instance operations
- Connection and variable management
- User and permission management
- Plugin and provider information

资料来源：[airflow-core/src/airflow/api_fastapi/core_api/app.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/app.py)

### Execution API (`/execution`)

The Execution API is designed for lightweight, high-performance task execution operations. It provides endpoints for:

- Task state updates
- XCom value operations
- Task heartbeat signals
- Execution context retrieval

资料来源：[airflow-core/src/airflow/api_fastapi/execution_api/app.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/execution_api/app.py)

## Authentication and Authorization

Airflow's REST API supports multiple authentication mechanisms through a pluggable auth manager architecture.

### Authentication Flow

```mermaid
sequenceDiagram
    participant Client
    participant API as REST API
    participant Auth as Auth Manager
    participant DB as Database
    
    Client->>API: Request with credentials
    API->>Auth: Validate credentials
    Auth->>DB: Check user/permissions
    DB-->>Auth: User info + permissions
    Auth-->>API: Auth result
   alt Authentication Success
        API-->>Client: 200 + Response data
    else Authentication Failure
        API-->>Client: 401/403 + Error
    end
```

### Simple Auth Manager

For development and testing environments, Airflow provides a Simple Auth Manager that supports basic authentication mechanisms.

资料来源：[airflow-core/src/airflow/api_fastapi/auth/managers/simple/simple_auth_manager.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/auth/managers/simple/simple_auth_manager.py)

### Access Control

The REST API implements granular access control through DAG-level permissions:

| Permission | Description |
|------------|-------------|
| `GET` | Read access to DAG information |
| `POST` | Create/modify DAG resources |
| `DELETE` | Remove DAG resources |
| `EDIT` | Modify DAG configuration |

Access control is enforced through dependency injection on route handlers:

```python
dependencies=[
    Depends(requires_access_dag("GET")),
    Depends(requires_access_dag("GET", DagAccessEntity.DEPENDENCIES)),
    Depends(requires_access_dag("GET", DagAccessEntity.TASK_INSTANCE)),
]
```

资料来源：[airflow-core/src/airflow/api_fastapi/core_api/routes/ui/structure.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/ui/structure.py)

## DAG Runs API

DAG Runs represent individual executions of a Directed Acyclic Graph (DAG). The DAG Runs API provides comprehensive endpoints for managing workflow executions.

### DAG Run Data Model

| Field | Type | Description |
|-------|------|-------------|
| `dag_id` | string | Unique identifier for the DAG |
| `run_id` | string | Unique identifier for this execution |
| `state` | enum | Current state (queued, running, success, failed) |
| `conf` | object | Configuration passed to the DAG |
| `logical_date` | datetime | Scheduled execution time |
| `start_date` | datetime | Actual execution start time |
| `end_date` | datetime | Execution completion time |
| `external_trigger` | boolean | Whether triggered externally |

资料来源：[airflow-core/src/airflow/api_fastapi/core_api/datamodels/dag_run.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/datamodels/dag_run.py)

### Key Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/dags/{dag_id}/dagRuns` | GET | List DAG runs with filtering |
| `/dags/{dag_id}/dagRuns` | POST | Trigger a new DAG run |
| `/dags/{dag_id}/dagRuns/{run_id}` | GET | Get specific DAG run details |
| `/dags/{dag_id}/dagRuns/{run_id}` | DELETE | Delete a DAG run |
| `/dags/{dag_id}/dagRuns/{run_id}/clear` | POST | Clear task instances |
| `/dags/{dag_id}/dagRuns/{run_id}/confirm` | POST | Confirm a DAG run |
| `/dags/{dag_id}/dagRuns/{run_id}/update` | PATCH | Update DAG run state |

资料来源：[airflow-core/src/airflow/api_fastapi/core_api/routes/ui/dag_runs.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/api_fastapi/core_api/routes/ui/dag_runs.py)

### Query Parameters

The DAG Runs list endpoint supports extensive filtering:

| Parameter | Type | Description |
|-----------|------|-------------|
| `limit` | integer | Maximum number of results (default: 100) |
| `offset` | integer | Pagination offset |
| `order_by` | string | Sort field |
| `state` | enum | Filter by state |
| `dag_run_id` | string | Filter by run ID |
| `logical_date` | datetime | Filter by logical date |
| `start_date` | datetime | Filter by start date |
| `end_date` | datetime | Filter by end date |
| `include_upstream` | boolean | Include upstream dependencies |
| `include_downstream` | boolean | Include downstream tasks |
| `depth` | integer | Tree depth limit |
| `root` | string | Root node filter |
| `external_dependencies` | boolean | Include external dependencies |

## API Structure and Organization

### Route Organization

The REST API routes are organized by functional area within the FastAPI application structure:

```mermaid
graph LR
    subgraph "/api/v1"
        A[UI Routes]
        B[DAG Routes]
        C[Task Routes]
        D[Connection Routes]
        E[Variable Routes]
        F[Plugin Routes]
    end
    
    subgraph "/execution"
        G[Task Execution]
        H[XCom Operations]
    end
```

### Response Models

All API endpoints return structured responses using Pydantic models. Responses include:

- **Data**: The requested resource or operation result
- **Meta**: Pagination information and metadata
- **Links**: HATEOAS-style navigation links

### Error Handling

The API implements consistent error handling with structured error responses:

| Status Code | Category |
|-------------|----------|
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Authentication required |
| 403 | Forbidden - Insufficient permissions |
| 404 | Not Found - Resource doesn't exist |
| 409 | Conflict - State conflict |
| 500 | Internal Server Error |

## Security Considerations

### Session Management

The REST API integrates with Airflow's session management system. When authentication is enabled:

1. Clients must authenticate to receive a session cookie
2. Subsequent requests include the session cookie
3. Sessions expire based on configuration settings

### Role-Based Access Control (RBAC)

The API supports RBAC through integration with the auth manager:

- **Admin**: Full access to all resources
- **Op**: Access to DAG operations
- **User**: Read access to DAGs, limited write access
- **Viewer**: Read-only access

## Configuration

### Enabling the REST API

The REST API is enabled by default when Airflow is configured with a supported auth manager. Key configuration options:

| Option | Default | Description |
|--------|---------|-------------|
| `auth_manager` | simple | Authentication backend |
| `api_url` | - | Base URL for API endpoints |
| `secret_key` | - | Session encryption key |

### CORS Configuration

Cross-Origin Resource Sharing (CORS) can be configured to allow web clients to access the API:

| Setting | Description |
|---------|-------------|
| `access_control_allow_origins` | Allowed origins |
| `access_control_allow_methods` | Allowed HTTP methods |
| `access_control_allow_headers` | Allowed headers |

## API Versioning

Apache Airflow maintains API stability through versioning:

- **Current Version**: `/api/v1`
- **Version Prefix**: All endpoints are prefixed with the version
- **Stability Guarantee**: Within a major version, breaking changes are avoided

资料来源：[airflow-core/docs/stable-rest-api-ref.rst](https://github.com/apache/airflow/blob/main/airflow-core/docs/stable-rest-api-ref.rst)

## Usage Examples

### Triggering a DAG Run

```bash
curl -X POST "http://airflow-server:8080/api/v1/dags/my_dag/dagRuns" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "dag_run_id": "manual_run_001",
    "logical_date": "2024-01-15T10:00:00Z",
    "conf": {"key": "value"}
  }'
```

### Listing DAG Runs

```bash
curl "http://airflow-server:8080/api/v1/dags/my_dag/dagRuns?state=running&limit=10" \
  -H "Authorization: Bearer <token>"
```

## See Also

- [CLI Commands](cli-commands) - Alternative command-line interface
- [Authentication](authentication) - Auth manager configuration
- [DAG Runs](dag-runs) - DAG execution management
- [Task Instances](task-instances) - Task operation API

---

<a id='user-interface'></a>

## User Interface

### 相关页面

相关主题：[REST API](#rest-api), [Core Concepts](#core-concepts)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [airflow-core/src/airflow/ui/src/layouts/Details/Grid/Grid.tsx](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/layouts/Details/Grid/Grid.tsx)
- [airflow-core/src/airflow/ui/src/layouts/Details/Graph/Graph.tsx](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/layouts/Details/Graph/Graph.tsx)
- [airflow-core/src/airflow/ui/src/pages/Dashboard/Dashboard.tsx](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/pages/Dashboard/Dashboard.tsx)
- [airflow-core/src/airflow/ui/src/router.tsx](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/router.tsx)
- [airflow-core/src/airflow/ui/package.json](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/package.json)
- [airflow-core/docs/ui.rst](https://github.com/apache/airflow/blob/main/airflow-core/docs/ui.rst)
- [airflow-core/src/airflow/ui/src/pages/DagRuns.tsx](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/pages/DagRuns.tsx)
- [airflow-core/src/airflow/ui/src/pages/Dag/Overview/Overview.tsx](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/pages/Dag/Overview/Overview.tsx)
- [airflow-core/src/airflow/ui/src/components/Clear/TaskInstance/ClearTaskInstanceConfirmationDialog.tsx](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/components/Clear/TaskInstance/ClearTaskInstanceConfirmationDialog.tsx)
- [airflow-core/src/airflow/ui/src/pages/Connections/EditConnectionButton.tsx](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/pages/Connections/EditConnectionButton.tsx)
- [providers/edge3/src/airflow/providers/edge3/plugins/www/src/pages/WorkerPage.tsx](https://github.com/apache/airflow/blob/main/providers/edge3/src/airflow/providers/edge3/plugins/www/src/pages/WorkerPage.tsx)
</details>

# User Interface

## Overview

Apache Airflow's User Interface (UI) is a modern web-based frontend built with React and TypeScript that provides comprehensive workflow management, monitoring, and operational capabilities. The UI serves as the primary visual interface for data engineers and operators to interact with DAGs, task instances, DAG runs, connections, and various Airflow components.

The UI framework is organized under `airflow-core/src/airflow/ui/` and leverages contemporary React patterns including component composition, lazy loading, and type-safe TypeScript implementations. 资料来源：[airflow-core/docs/ui.rst:1-50]()

## Architecture Overview

### Technology Stack

| Component | Technology | Purpose |
|-----------|------------|---------|
| Framework | React 18+ | UI component library |
| Language | TypeScript | Type safety and better DX |
| State Management | TanStack Query / React Query | Server state management |
| Routing | React Router v7 | Client-side routing |
| Styling | Chakra UI | Component library and styling |
| Build Tool | Vite | Fast development and building |
| Icons | Lucide React | Consistent iconography |

资料来源：[airflow-core/src/airflow/ui/package.json:1-30]()

### Directory Structure

```
airflow/
├── ui/
│   ├── src/
│   │   ├── layouts/          # Page layout components
│   │   │   └── Details/      # Detail view layouts
│   │   │       ├── Grid/     # Grid view for DAGs
│   │   │       └── Graph/    # Graph view for DAGs
│   │   ├── pages/            # Page components
│   │   │   ├── Dashboard/    # Dashboard pages
│   │   │   ├── Dag/          # DAG detail pages
│   │   │   ├── Run/          # Run detail pages
│   │   │   └── Connections/  # Connection management
│   │   ├── components/       # Reusable UI components
│   │   │   ├── Clear/        # Clear action components
│   │   │   └── TriggerDag/   # DAG triggering components
│   │   └── router.tsx        # Application routing configuration
│   └── package.json
└── providers/
    └── edge3/                # Edge provider with custom UI
        └── plugins/www/src/
            ├── pages/        # Edge-specific pages
            └── components/   # Edge-specific components
```

## Routing System

The UI uses React Router v7 for client-side navigation. Routes are defined in `router.tsx` and organized into two main categories: main application routes and detail views.

```mermaid
graph TD
    A["/"] --> B["Dashboard"]
    A --> C["DAGs List"]
    A --> D["DAG Detail"]
    D --> D1["Grid View"]
    D --> D2["Graph View"]
    D --> D3["Overview"]
    A --> E["DAG Runs"]
    A --> F["Connections"]
    A --> G["Providers"]
    
    style A fill:#e1f5fe
    style D1 fill:#fff3e0
    style D2 fill:#e8f5e9
    style D3 fill:#f3e5f5
```

### Core Route Definitions

The router configuration maps URL paths to page components and handles lazy loading for improved performance:

```typescript
// Simplified route structure from router.tsx
routes = [
  { path: "/", component: Dashboard },
  { path: "/dags", component: DagList },
  { path: "/dags/:dagId", component: DagDetail },
  { path: "/dags/:dagId/runs", component: DagRuns },
  { path: "/runs/:dagRunId", component: RunDetail },
  { path: "/connections", component: Connections },
]
```

## Page Components

### Dashboard

The Dashboard (`Dashboard.tsx`) serves as the landing page, providing an overview of the Airflow environment. It typically includes:

- DAG statistics and health metrics
- Recent DAG runs
- Failed or stuck tasks requiring attention
- Quick access to frequently used DAGs

资料来源：[airflow-core/src/airflow/ui/src/pages/Dashboard/Dashboard.tsx:1-50]()

### DAG Detail Views

DAG detail pages provide comprehensive views of individual DAGs with multiple visualization modes:

#### Grid View

The Grid view (`Grid.tsx`) displays DAG tasks in a tabular format, allowing users to:

- View task states across multiple DAG runs
- Navigate through historical runs
- Identify failed or running tasks quickly

```mermaid
graph LR
    subgraph "Grid View Structure"
        H["Header: DAG Info"]
        T["Task Grid Table"]
        F["Filter/Sort Controls"]
    end
    
    T --> T1["Task Columns"]
    T --> T2["Run Rows"]
    T1 --> Cell["State Cell"]
```

资料来源：[airflow-core/src/airflow/ui/src/layouts/Details/Grid/Grid.tsx:1-80]()

#### Graph View

The Graph view (`Graph.tsx`) renders the DAG structure visually as a directed acyclic graph, showing:

- Task dependencies and relationships
- Task execution states with color coding
- Interactive node selection and navigation

资料来源：[airflow-core/src/airflow/ui/src/layouts/Details/Graph/Graph.tsx:1-80]()

#### Overview Page

The Overview page (`Overview.tsx`) provides a comprehensive dashboard for a specific DAG:

```typescript
interface OverviewComponents {
  dagStats: DagStats;           // DAG statistics
  failedRuns: FailedRuns;       // Failed run alerts
  durationChart: DurationChart; // Duration visualization
  assetEvents: AssetEvents;     // Asset event tracking
  dagDeadlines: DagDeadlines;    // Deadline management
  failedLogs: FailedLogs;        // Failed task logs
}
```

资料来源：[airflow-core/src/airflow/ui/src/pages/Dag/Overview/Overview.tsx:1-100]()

### DAG Runs Page

The DAG Runs page (`DagRuns.tsx`) displays a table of all DAG runs with the following columns:

| Column | Description | Features |
|--------|-------------|----------|
| DAG Run ID | Unique identifier | Link to run detail |
| State | Current state | Colored badge |
| Run Type | Scheduled, manual, etc. | Icon + text |
| Run After | Scheduled execution time | Time component |
| Triggering User | User who triggered | Username display |
| Start Date | Execution start time | Timestamp |
| End Date | Execution end time | Timestamp |
| Duration | Total execution time | Calculated field |
| DAG Versions | Version tracking | Version badges |

资料来源：[airflow-core/src/airflow/ui/src/pages/DagRuns.tsx:1-100]()

## Dialog Components

Dialogs are used throughout the UI for focused interactions, confirmations, and detailed forms.

### Confirmation Dialogs

The `ClearTaskInstanceConfirmationDialog.tsx` demonstrates the dialog pattern used for critical operations:

```tsx
<Dialog.Root lazyMount onOpenChange={onClose} open={open} size="xl">
  <Dialog.Content backdrop>
    <Dialog.Header>
      <Dialog.Title>
        <Icon color="tomato"><GoAlertFill /></Icon>
        {translate("dags:runAndTaskActions.confirmationDialog.title")}
      </Dialog.Title>
    </Dialog.Header>
    <Dialog.Body>
      {/* Confirmation details */}
    </Dialog.Body>
    <Dialog.Footer>
      <Button onClick={onClose}>{translate("common:modal.confirm")}</Button>
    </Dialog.Footer>
  </Dialog.Content>
</Dialog.Root>
```

Key characteristics:
- **lazyMount**: Content renders only when opened
- **unmountOnExit**: Complete cleanup when closed
- **backdrop**: Modal overlay for focus
- **size variants**: sm, md, lg, xl for different content types

资料来源：[airflow-core/src/airflow/ui/src/components/Clear/TaskInstance/ClearTaskInstanceConfirmationDialog.tsx:1-60]()

### Edit Dialogs

The `EditConnectionButton.tsx` demonstrates dialog usage for editing forms:

```tsx
<Dialog.Root lazyMount onOpenChange={handleClose} open={open} size="xl" unmountOnExit>
  <Dialog.Content backdrop>
    <Dialog.Header>
      <Heading size="xl">{translate("connections.edit")}</Heading>
    </Dialog.Header>
    <Dialog.Body>
      <ConnectionForm
        error={error}
        initialConnection={initialConnectionValue}
        isEditMode={true}
        isPending={isPending}
        mutateConnection={editConnection}
      />
    </Dialog.Body>
  </Dialog.Content>
</Dialog.Root>
```

资料来源：[airflow-core/src/airflow/ui/src/pages/Connections/EditConnectionButton.tsx:1-50]()

## Component Library

### Clear Actions

The UI provides comprehensive task and group clearing functionality:

```mermaid
graph TD
    A["Clear Action Trigger"] --> B{"Single vs Group"}
    B -->|Single| C["ClearTaskInstanceDialog"]
    B -->|Group| D["ClearGroupTaskInstanceDialog"]
    
    C --> E["Confirmation Dialog"]
    D --> F["Options Selection"]
    F --> F1["Past Tasks"]
    F --> F2["Future Tasks"]
    F --> F3["Upstream"]
    F --> F4["Downstream"]
    F --> F5["Only Failed"]
    
    E --> G["Action Accordion"]
    F --> G
    G --> H["API Execution"]
```

Components in `airflow-core/src/airflow/ui/src/components/Clear/TaskInstance/`:

- `ClearTaskInstanceDialog.tsx` - Single task clearing
- `ClearTaskInstanceConfirmationDialog.tsx` - Confirmation UI
- `ClearGroupTaskInstanceDialog.tsx` - Bulk task clearing

### Trigger DAG Components

The `TriggerDAGAdvancedOptions.tsx` provides advanced options when triggering DAGs:

| Option | Purpose |
|--------|---------|
| dagRunId | Custom run identifier |
| partitionKey | Partition-based execution |
| note | Execution notes/documentation |

```tsx
<Controller
  control={control}
  name="dagRunId"
  render={({ field }) => (
    <Field.Root mt={6} orientation="horizontal">
      <Field.Label fontSize="md" style={{ flexBasis: "30%" }}>
        {translate("runId")}
      </Field.Label>
      <Stack css={{ flexBasis: "70%" }}>
        <Input {...field} size="sm" />
        <Field.HelperText>{translate("components:triggerDag.runIdHelp")}</Field.HelperText>
      </Stack>
    </Field.Root>
  )}
/>
```

资料来源：[airflow-core/src/airflow/ui/src/components/TriggerDag/TriggerDAGAdvancedOptions.tsx:1-80]()

## Edge Provider UI

The Edge provider (`providers/edge3/`) extends the base UI with worker-specific pages and components.

### Worker Management Page

The `WorkerPage.tsx` provides a table-based interface for managing edge workers:

```mermaid
graph LR
    subgraph "Worker Page Structure"
        T["Worker Table"]
        H["Header Actions"]
        F["Filter Controls"]
    end
    
    T --> C1["Worker Name"]
    T --> C2["Queues"]
    T --> C3["Active Jobs"]
    T --> C4["System Info"]
    T --> C5["Operations"]
```

Worker operations include:
- **Delete**: Available for offline, unknown, or offline maintenance states
- **Shutdown**: Available for idle, running, maintenance states
- **Enter Maintenance**: Set worker to maintenance mode with comment
- **Exit Maintenance**: Remove worker from maintenance mode

资料来源：[providers/edge3/src/airflow/providers/edge3/plugins/www/src/pages/WorkerPage.tsx:1-100]()

### Worker Operation Dialogs

Bulk operations are supported via `BulkWorkerOperations.tsx`:

```tsx
<Dialog.Root>
  <Portal>
    <Dialog.Backdrop />
    <Dialog.Positioner>
      <Dialog.Content>
        <Dialog.Header>
          <Dialog.Title>
            Delete {deleteWorkers.length} selected worker(s)
          </Dialog.Title>
        </Dialog.Header>
        <Dialog.Body>
          <List.Root>
            {deleteWorkers.map((worker) => (
              <List.Item key={worker.worker_name}>{worker.worker_name}</List.Item>
            ))}
          </List.Root>
        </Dialog.Body>
        <Dialog.Footer>
          <Button colorPalette="danger" loading={isBulkDeletePending}>
            Delete Workers
          </Button>
        </Dialog.Footer>
      </Dialog.Content>
    </Dialog.Positioner>
  </Portal>
</Dialog.Root>
```

## Internationalization

The UI uses i18n (internationalization) patterns for all user-facing text:

```typescript
// Translation usage example
translate("dagRun.runAfter")      // Column headers
translate("dags:runAndTaskActions.confirmationDialog.title")
translate("common:modal.confirm")
translate("taskInstance", { count: affectedTasks.total_entries })
```

Translation keys are organized by:
- **common**: Shared UI elements
- **components**: Component-specific strings
- **dags**: DAG-related UI strings
- **dagRun**: DAG run specific strings

## State Management and Data Fetching

The UI uses TanStack Query (React Query) for server state management:

```typescript
// Typical data fetching pattern
const { data, isLoading, error } = useQuery({
  queryKey: ['dagRuns', dagId, page],
  queryFn: () => fetchDagRuns(dagId, page),
});
```

Key patterns:
- **Optimistic updates**: Immediate UI feedback during mutations
- **Lazy loading**: Components render only when needed
- **Error boundaries**: Graceful error handling with `ErrorAlert` components
- **Loading states**: Skeleton loaders for better UX

## Pagination

Paginated lists are implemented consistently throughout the UI:

```tsx
<Pagination.Root
  count={data?.total_entries ?? 0}
  onPageChange={(event) => setPage(event.page)}
  page={page}
  pageSize={PAGE_LIMIT}
>
  <HStack>
    <Pagination.PrevTrigger />
    <Pagination.Items />
    <Pagination.NextTrigger />
  </HStack>
</Pagination.Root>
```

Standard pagination constants:
- **PAGE_LIMIT**: Default items per page (typically 25-50)
- **total_entries**: Total count from API response

## Summary

Apache Airflow's User Interface is a comprehensive React-based frontend that provides:

1. **Multi-view DAG Visualization**: Grid, Graph, and Overview views for different use cases
2. **Comprehensive Task Management**: Clear, trigger, and manage task instances
3. **Connection Management**: Visual interface for managing Airflow connections
4. **Edge Worker Control**: Dedicated UI for edge worker deployment management
5. **Consistent Component Patterns**: Reusable dialogs, tables, and forms
6. **Internationalization**: Full i18n support for global deployments
7. **Type Safety**: Full TypeScript coverage for reliable development

The UI architecture emphasizes modularity, lazy loading, and consistent patterns to ensure maintainability and performance across large-scale Airflow deployments.

---

<a id='data-flow-xcom'></a>

## Data Flow and State Management

### 相关页面

相关主题：[Core Concepts](#core-concepts), [Connections and Variables](#connections-variables)

<details>
<summary>Relevant Source Files</summary>

以下源码文件用于生成本页说明：

- [airflow-core/src/airflow/models/xcom.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/xcom.py)
- [airflow-core/src/airflow/models/xcom_arg.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/xcom_arg.py)
- [airflow-core/src/airflow/models/asset.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/asset.py)
- [airflow-core/src/airflow/models/asset_state.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/asset_state.py)
- [airflow-core/src/airflow/models/task_state.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/models/task_state.py)
- [airflow-core/src/airflow/serialization/definitions/xcom_arg.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/serialization/definitions/xcom_arg.py)
- [airflow-core/docs/core-concepts/xcoms.rst](https://github.com/apache/airflow/blob/main/airflow-core/docs/core-concepts/xcoms.rst)
</details>

# Data Flow and State Management

## Overview

Data Flow and State Management in Apache Airflow encompasses the mechanisms by which tasks communicate, share state, and coordinate execution across a Directed Acyclic Graph (DAG). Airflow provides several interconnected systems to manage how data moves between tasks, how task states are tracked, and how assets trigger workflow execution.

The core components involved in data flow include XCom (Cross-Communication), Assets, and Task State management. These systems work together to enable complex workflows where tasks can pass data, react to external events, and maintain execution context throughout the DAG lifecycle.

## XCom (Cross-Communication)

### Purpose and Role

XCom is Airflow's primary mechanism for inter-task communication. It allows tasks to push and pull values that can be used by downstream tasks in the same DAG. XComs are stored in the Airflow metadata database and can contain any serializable Python object.

### XCom Data Model

| Attribute | Type | Description |
|-----------|------|-------------|
| `key` | String | Identifier for the XCom value |
| `value` | Any | The actual data being passed |
| `task_id` | String | Task that produced the XCom |
| `dag_id` | String | DAG containing the task |
| `run_id` | String | DAG run identifier |
| `map_index` | Integer | Index for mapped tasks (-1 for non-mapped) |
| `timestamp` | DateTime | When the XCom was created |
| `connection_id` | String | Optional connection for large data |

### XCom API Operations

#### Push (Sending XCom Values)

Tasks can push XCom values using the `xcom_push()` method:

```python
task_instance.xcom_push(key="result", value={"data": "example"})
```

Or implicitly by returning a value from a task:

```python
@task
def process_data():
    return {"processed": True, "count": 42}
```

#### Pull (Receiving XCom Values)

Downstream tasks can retrieve XCom values using `xcom_pull()`:

```python
upstream_result = ti.xcom_pull(task_ids=["process_data"], key="result")
```

### XComArg

`XComArg` provides a more declarative way to reference XCom values, enabling type-safe access to task outputs:

```python
from airflow.models.xcom_arg import XComArg

processed_data = XComArg(task_id="process_data")
result = processed_data(map_indexes=[0])
```

XComArg supports the following parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `task_id` | String | Source task identifier |
| `dag_id` | String | Optional DAG identifier |
| `map_index` | Integer/List | Specific map index or indices |
| `key` | String | XCom key to retrieve |

## Asset Management

### Asset Model

Assets represent data sources or sinks that Airflows can monitor. They enable event-driven scheduling where DAGs can be triggered when underlying data changes.

| Attribute | Type | Description |
|-----------|------|-------------|
| `name` | String | Human-readable asset name |
| `uri` | String | Unique identifier (URN, path, etc.) |
| `group` | String | Logical grouping of assets |
| `extra` | Dict | Additional metadata |
| `created_at` | DateTime | Creation timestamp |
| `updated_at` | DateTime | Last modification timestamp |

### Asset States

Asset states track the lifecycle and availability of assets:

```python
class AssetState:
    """Represents the state of an asset in the system."""
    
    IDLE = "idle"
    SCHEDULED = "scheduled"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
```

#### State Transitions

```mermaid
graph TD
    A[IDLE] -->|Asset referenced| B[SCHEDULED]
    B -->|Scheduler picks up| C[RUNNING]
    C -->|Success| D[SUCCESS]
    C -->|Failure| E[FAILED]
    D -->|Next schedule| B
    E -->|Retry| B
```

## Task State Management

### Task States

Task instances progress through a defined set of states:

| State | Description |
|-------|-------------|
| `none` | Task has not been triggered |
| `queued` | Task is queued for execution |
| `running` | Task is currently executing |
| `success` | Task completed successfully |
| `failed` | Task failed during execution |
| `upstream_failed` | Upstream task dependency failed |
| `skipped` | Task was skipped due to branching |
| `up_for_retry` | Task failed but will be retried |
| `up_for_reschedule` | Task is waiting for reschedule |

### TaskState Model

```python
class TaskState:
    """Tracks the current state and context of a task instance."""
    
    task_id: str
    dag_id: str
    run_id: str
    state: TaskInstanceState
    try_number: int
    max_tries: int
    start_date: Optional[datetime]
    end_date: Optional[datetime]
    duration: Optional[float]
```

### State Persistence

Task states are persisted to the metadata database, enabling:

- Recovery from scheduler restarts
- Historical execution tracking
- DAG run state reconstruction
- SLA monitoring and alerting

## Data Flow Architecture

### Task Execution Flow

```mermaid
graph TD
    A[DAG Trigger] --> B[Scheduler]
    B --> C{Dependency Check}
    C -->|All met| D[Queue Task]
    C -->|Not met| E[Wait]
    D --> F[Executor]
    F --> G[Worker]
    G --> H[Execute Task]
    H --> I{Push XCom}
    I -->|Yes| J[Store in DB]
    J --> K[Task Complete]
    I -->|No| K
    K --> L[Update TaskState]
    L --> M[Check Downstream]
    M --> C
```

### XCom Storage Flow

```mermaid
graph LR
    A[Task Instance] -->|xcom_push| B[XCom Table]
    C[Task Instance] -->|xcom_pull| B
    B -->|query| D[Metadata DB]
    D -->|result| C
```

## Serialization and Deserialization

### XComArg Serialization

When DAGs are serialized (for example, for the webserver or worker), XComArg references must be properly handled:

```python
class XComArgBase:
    """Base class for serializable XCom arguments."""
    
    def serialize(self) -> dict:
        """Convert XComArg to dictionary format."""
        return {
            "task_id": self.task_id,
            "dag_id": self.dag_id,
            "key": self.key,
            "map_index": self.map_index,
        }
    
    @staticmethod
    def deserialize(data: dict) -> "XComArgBase":
        """Reconstruct XComArg from dictionary."""
        return XComArg(**data)
```

## Best Practices

### Data Flow Design

1. **Minimize XCom payload size** - Large XCom values impact database performance
2. **Use Assets for external data** - Let the scheduler handle dependency tracking
3. **Prefer pull over push patterns** - Downstream tasks should pull required data
4. **Clear XCom when unnecessary** - Use `xcom_clear()` to prevent accumulation

### State Management

1. **Monitor task durations** - Track state transitions for performance analysis
2. **Configure appropriate retries** - Set `retries` and `retry_delay` based on task reliability
3. **Use SLA alerts** - Configure `sla` parameter for critical task deadlines
4. **Clean up failed states** - Implement proper error handling and state recovery

## Configuration Options

| Parameter | Default | Description |
|-----------|---------|-------------|
| `xcom_pickle_tasks_on_error` | False | Serialize task data to XCom on error |
| `max_xcom_size` | 1048576 | Maximum XCom value size in bytes |
| `xcom_backend` | airflow.models.xcom.BaseXCom | Custom XCom backend class |
| `enable_asset_uri_validation` | True | Validate asset URIs on creation |

## Related Documentation

- [XCom Documentation](airflow-core/docs/core-concepts/xcoms.rst)
- [Assets Guide](https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/asset-management.html)
- [Task Lifecycle](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/tasks.html#task-instances)

---

<a id='connections-variables'></a>

## Connections and Variables

### 相关页面

相关主题：[Data Flow and State Management](#data-flow-xcom), [Docker and Helm Deployment](#docker-helm)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [airflow-core/src/airflow/cli/cli_config.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/cli/cli_config.py)
- [airflow-core/src/airflow/utils/db.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/utils/db.py)
- [providers/google/src/airflow/providers/google/cloud/hooks/cloud_composer.py](https://github.com/apache/airflow/blob/main/providers/google/src/airflow/providers/google/cloud/hooks/cloud_composer.py)
- [dev/breeze/src/airflow_breeze/params/build_prod_params.py](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/params/build_prod_params.py)
- [generated/README.md](https://github.com/apache/airflow/blob/main/generated/README.md)
- [dev/breeze/src/airflow_breeze/commands/sbom_commands.py](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/sbom_commands.py)
</details>

# Connections and Variables

## Overview

Connections and Variables are two fundamental abstractions in Apache Airflow for managing configuration and external system integrations. They enable DAGs to store, retrieve, and utilize configuration data and credentials without hardcoding sensitive information.

**Connections** provide a secure way to store and manage credentials for external systems such as databases, APIs, cloud services, and file systems. They centralize authentication configuration in one place, making it easy to manage, update, and audit access to external resources.

**Variables** provide a simple key-value store for storing arbitrary configuration data that can be accessed across DAGs and tasks. They are useful for storing environment-specific settings, feature flags, and general configuration values.

Both features support multiple backend storage mechanisms and can be managed through the Airflow UI, CLI, or programmatically via the API.

## Connections

### Purpose and Scope

Connections in Airflow encapsulate all the information needed to connect to an external system. Each connection includes:

- A unique connection identifier (conn_id)
- Connection type (conn_type) specifying the external system category
- Host, port, login, and password fields
- Schema and extra parameters for additional configuration

Connections are stored in the Airflow metadata database by default but can also be backed by secrets backends such as HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or Google Cloud Secret Manager.

### Connection Types

Airflow supports numerous connection types through provider packages. The core supported types include:

| Connection Type | Description | Default Port |
|-----------------|-------------|-------------|
| `facebook_social` | Facebook OAuth/social authentication | N/A |
| `fs` | Filesystem connection | N/A |
| `ftp` | FTP/SFTP server | 21 |
| `google_cloud_platform` | Google Cloud Platform services | N/A |
| `gremlin` | Apache TinkerPop Gremlin server | 8182 |
| `hive_cli` | Hive command-line interface | 10000 |
| `hiveserver2` | HiveServer2 JDBC interface | 10000 |
| `http` | HTTP/HTTPS endpoints | 443 |
| `iceberg` | Apache Iceberg catalog | N/A |

资料来源：[airflow-core/src/airflow/utils/db.py:200-260]()

### Connection Management via CLI

The Airflow CLI provides comprehensive commands for managing connections:

```bash
# List all connections
airflow connections list

# Get a specific connection
airflow connections get <conn_id>

# Add a new connection
airflow connections add <conn_id> --conn-type http --host https://api.example.com

# Delete a connection
airflow connections delete <conn_id>

# Export all connections
airflow connections export /tmp/connections.json

# Import connections from file
airflow connections import /tmp/connections.json

# Test a connection
airflow connections test <conn_id>
```

The connection commands are defined in the CLI configuration with lazy-loaded implementations:

```python
CONNECTIONS_COMMANDS = (
    ActionCommand(name="get", func=lazy_load_command("airflow.cli.commands.connection_command.connections_get"), ...),
    ActionCommand(name="list", func=lazy_load_command("airflow.cli.commands.connection_command.connections_list"), ...),
    ActionCommand(name="add", func=lazy_load_command("airflow.cli.commands.connection_command.connections_add"), ...),
    ActionCommand(name="delete", func=lazy_load_command("airflow.cli.commands.connection_command.connections_delete"), ...),
    ActionCommand(name="export", func=lazy_load_command("airflow.cli.commands.connection_command.connections_export"), ...),
    ActionCommand(name="import", func=lazy_load_command("airflow.cli.commands.connection_command.connections_import"), ...),
    ActionCommand(name="test", func=lazy_load_command("airflow.cli.commands.connection_command.connections_test"), ...),
)
```

资料来源：[airflow-core/src/airflow/cli/cli_config.py:1-50]()

### Connection Storage Architecture

```mermaid
graph TD
    A[Airflow CLI/UI/API] --> B[Connection CRUD Operations]
    B --> C{Secrets Backend}
    C -->|None/Default| D[Airflow Metadata Database]
    C -->|HashiCorp Vault| E[Vault Secrets]
    C -->|AWS| F[AWS Secrets Manager]
    C -->|Azure| G[Azure Key Vault]
    C -->|GCP| H[GCP Secret Manager]
    D --> I[connection Table]
    E --> J[Vault Path]
    F --> K[AWS Secrets]
    G --> L[Azure Keys]
    H --> M[GCP Secrets]
```

### Using Connections in DAGs

Connections are accessed in DAGs through the `BaseHook` class:

```python
from airflow.hooks.base import BaseHook

def get_external_api_data():
    conn = BaseHook.get_connection("my_external_api")
    api_key = conn.password
    base_url = conn.host
    
    # Use credentials to make API calls
    ...
```

Connections with custom configurations can utilize the `extra` field for JSON-encoded parameters:

```python
Connection(
    conn_id="ftp_default",
    conn_type="ftp",
    host="localhost",
    port=21,
    login="airflow",
    password="airflow",
    extra='{"key_file": "~/.ssh/id_rsa", "no_host_key_check": true}'
)
```

资料来源：[airflow-core/src/airflow/utils/db.py:230-240]()

## Variables

### Purpose and Scope

Variables provide a simple key-value storage mechanism for storing configuration data that can be shared across DAGs and tasks. They support string values with optional JSON serialization for complex data structures.

Key characteristics of Variables:

- Key-value pairs stored in the Airflow metadata database
- Optional JSON serialization for non-string values
- Support for environment-specific configuration
- Accessible from all DAGs and tasks
- Can be exported/imported in bulk

### Variable Management via CLI

The Airflow CLI provides comprehensive commands for managing variables:

```bash
# List all variables
airflow variables list

# Get a specific variable
airflow variables get <var_key>

# Set a variable
airflow variables set <var_key> <var_value>

# Set a variable with JSON serialization
airflow variables set config_json '{"setting": true}' --serialize-json

# Delete a variable
airflow variables delete <var_key>

# Export all variables
airflow variables export /tmp/variables.json

# Import variables from file
airflow variables import /tmp/variables.json
```

The variable commands are defined with support for serialization options:

```python
VARIABLES_COMMANDS = (
    ActionCommand(name="list", func=lazy_load_command("airflow.cli.commands.variable_command.variables_list"), ...),
    ActionCommand(name="get", func=lazy_load_command("airflow.cli.commands.variable_command.variables_get"), 
                  args=(ARG_VAR, ARG_DESERIALIZE_JSON, ARG_DEFAULT, ARG_VERBOSE)),
    ActionCommand(name="set", func=lazy_load_command("airflow.cli.commands.variable_command.variables_set"),
                  args=(ARG_VAR, ARG_VAR_VALUE, ARG_VAR_DESCRIPTION, ARG_SERIALIZE_JSON, ARG_VERBOSE)),
    ActionCommand(name="delete", func=lazy_load_command("airflow.cli.commands.variable_command.variables_delete"), ...),
    ActionCommand(name="export", func=lazy_load_command("airflow.cli.commands.variable_command.variables_export"), ...),
    ActionCommand(name="import", func=lazy_load_command("airflow.cli.commands.variable_command.variables_import"), ...),
)
```

资料来源：[airflow-core/src/airflow/cli/cli_config.py:50-80]()

### Variable Storage Architecture

```mermaid
graph TD
    A[DAG/Task Code] --> B[Variable API]
    B --> C[Secrets Backend]
    C -->|Default| D[Metadata Database]
    C -->|Backend| E[External Secrets]
    D --> F[variable Table]
    E --> F
    B --> G[JSON Deserializer]
    G --> H[Python Objects]
    F --> I[Key-Value Store]
```

### Using Variables in DAGs

Variables can be accessed using the `Variable` class:

```python
from airflow.models import Variable

# Get a simple string variable
api_endpoint = Variable.get("api_endpoint")

# Get with default value
timeout = Variable.get("timeout", default_var=30)

# Get and deserialize JSON
config = Variable.get("my_config", deserialize_json=True)

# Set a variable
Variable.set("my_key", "my_value")
Variable.set("config", {"setting": True}, serialize_json=True)
```

## Secrets Backend Integration

Both Connections and Variables can be backed by external secrets backends for enhanced security. This allows storing sensitive data in enterprise-grade secret management systems while maintaining the same Airflow API interface.

### Available Secrets Backends

| Backend | Package | Description |
|---------|---------|-------------|
| HashiCorp Vault | `airflow.providers.hashicorp` | HashiCorp Vault KV secrets engine |
| AWS Secrets Manager | `airflow.providers.amazon` | AWS Secrets Manager and SSM Parameter Store |
| Azure Key Vault | `airflow.providers.microsoft.azure` | Azure Key Vault secrets |
| GCP Secret Manager | `airflow.providers.google` | Google Cloud Secret Manager |
| Environment Variables | Built-in | Read from OS environment variables |
| Local Files | Built-in | Read from JSON/YAML files |

### Configuring Secrets Backends

The secrets backend is configured via the `[secrets]` section in `airflow.cfg`:

```ini
[secrets]
backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
backend_kwargs = {"project_id": "my-project", "connections_prefix": "airflow-connections"}
```

## Best Practices

### Security Considerations

1. **Never hardcode credentials** - Always use Connections or Variables for sensitive data
2. **Use secrets backends** - For production environments, use enterprise secret management systems
3. **Enable encryption** - Ensure the Airflow metadata database is encrypted at rest
4. **Rotate credentials** - Regularly rotate passwords and API keys stored in connections
5. **Audit access** - Monitor and log access to sensitive connections and variables

### Performance Considerations

1. **Avoid frequent variable reads** - Cache variable values when used repeatedly in a task
2. **Use connection pooling** - Many hooks automatically pool connections; configure appropriately
3. **Limit extra field size** - Keep connection `extra` JSON data minimal for performance

### Operational Considerations

1. **Use meaningful conn_ids** - Follow naming conventions like `{env}_{system}_{purpose}`
2. **Document connections** - Use the description field to document connection purpose and owners
3. **Export for disaster recovery** - Regularly export connections and variables for backup

```bash
# Backup connections and variables
airflow connections export /opt/airflow/backups/connections_$(date +%Y%m%d).json
airflow variables export /opt/airflow/backups/variables_$(date +%Y%m%d).json
```

## CLI Command Reference

### Connection Commands

| Command | Description |
|---------|-------------|
| `airflow connections list` | List all connections |
| `airflow connections get <conn_id>` | Get connection details |
| `airflow connections add <conn_id> [options]` | Create a connection |
| `airflow connections delete <conn_id>` | Delete a connection |
| `airflow connections export <file>` | Export connections to file |
| `airflow connections import <file>` | Import connections from file |
| `airflow connections test <conn_id>` | Test connection availability |
| `airflow connections create-default-connections` | Initialize provider default connections |

### Variable Commands

| Command | Description |
|---------|-------------|
| `airflow variables list` | List all variables |
| `airflow variables get <key>` | Get variable value |
| `airflow variables set <key> <value>` | Set variable value |
| `airflow variables delete <key>` | Delete a variable |
| `airflow variables export <file>` | Export variables to file |
| `airflow variables import <file>` | Import variables from file |

## See Also

- [Airflow Connections Documentation](https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/connections.html)
- [Airflow Variables Documentation](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/variables.html)
- [Secrets Backend Configuration](https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/secrets-backend/index.html)
- [Google Cloud Composer Hook](https://github.com/apache/airflow/blob/main/providers/google/src/airflow/providers/google/cloud/hooks/cloud_composer.py) - Example of connection usage in provider hooks

---

<a id='docker-helm'></a>

## Docker and Helm Deployment

### 相关页面

相关主题：[Kubernetes Deployment](#kubernetes-deployment), [Architecture Overview](#architecture-overview)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [dev/breeze/src/airflow_breeze/commands/release_management_commands.py](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/release_management_commands.py)
- [docker-stack-docs/README.md](https://github.com/apache/airflow/blob/main/docker-stack-docs/README.md)
- [generated/PYPI_README.md](https://github.com/apache/airflow/blob/main/generated/PYPI_README.md)
- [dev/breeze/src/airflow_breeze/commands/sbom_commands.py](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/sbom_commands.py)
- [dev/breeze/src/airflow_breeze/global_constants.py](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/global_constants.py)
</details>

# Docker and Helm Deployment

Apache Airflow provides comprehensive support for containerized and Kubernetes-based deployments through Docker images and Helm charts. This documentation covers the deployment mechanisms, configuration options, and best practices for running Airflow in these environments.

## Overview

Apache Airflow can be deployed using two primary methods:

| Deployment Method | Use Case | Primary Files |
|------------------|----------|---------------|
| **Docker** | Local development, single-node deployments, CI/CD pipelines | `Dockerfile`, `Dockerfile.ci` |
| **Helm Chart** | Production Kubernetes deployments, distributed systems | `chart/Chart.yaml`, `chart/values.yaml` |

资料来源：[docker-stack-docs/README.md](https://github.com/apache/airflow/blob/main/docker-stack-docs/README.md)

## Docker Image Architecture

Apache Airflow ships with two main Docker images optimized for different use cases.

### Production Image (`Dockerfile`)

The production Docker image is built using multi-stage builds and includes all necessary components for running Airflow in production environments. The image uses `python:{DEFAULT_PYTHON_MAJOR_MINOR_VERSION}-slim-{ALLOWED_DEBIAN_VERSIONS[0]}` as its base.

Key characteristics of the production image:

- **Base OS**: Debian slim variant for minimal footprint
- **Package Manager**: Uses `uv` (Ultraviolet) for faster pip installations
- **Architecture**: Multi-stage build for optimized image size
- **User Space**: Runs as non-root user for security

资料来源：[dev/breeze/src/airflow_breeze/commands/release_management_commands.py:55-62](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/release_management_commands.py)

### CI Image (`Dockerfile.ci`)

The CI image is designed for continuous integration workflows and includes additional tooling for testing and development.

```dockerfile
# syntax=docker/dockerfile:1.4
FROM python:{DEFAULT_PYTHON_MAJOR_MINOR_VERSION}-slim-{ALLOWED_DEBIAN_VERSIONS[0]}
RUN apt-get update && apt-get install -y --no-install-recommends libatomic1 git curl
RUN pip install uv=={UV_VERSION}
RUN --mount=type=cache,id=cache-airflow-build-dockerfile-installation,target=/root/.cache/ \
  uv pip install --system ignore pip=={AIRFLOW_PIP_VERSION} hatch=={HATCH_VERSION} \
  pyyaml=={PYYAML_VERSION} gitpython=={GITPYTHON_VERSION} rich=={RICH_VERSION} \
  prek=={PREK_VERSION}
COPY . /opt/airflow
```

资料来源：[dev/breeze/src/airflow_breeze/commands/release_management_commands.py:56-65](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/release_management_commands.py)

### Version Constants

The build process uses the following pinned version constants:

| Constant | Version | Purpose |
|----------|---------|---------|
| `AIRFLOW_PIP_VERSION` | `26.1.1` | pip package manager |
| `AIRFLOW_UV_VERSION` | `0.11.11` | Fast Python package installer |
| `GITPYTHON_VERSION` | `3.1.50` | Git operations in Python |
| `RICH_VERSION` | `15.0.0` | Rich terminal output |
| `HATCH_VERSION` | `1.16.5` | Python packaging |
| `PYYAML_VERSION` | `6.0.3` | YAML parsing |
| `PREK_VERSION` | `0.3.13` | Pre-commit hooks |

资料来源：[dev/breeze/src/airflow_breeze/commands/release_management_commands.py:70-76](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/release_management_commands.py)

## Helm Chart Deployment

The Apache Airflow Helm chart provides a production-ready deployment mechanism for Kubernetes clusters.

### Chart Metadata

| Property | Value |
|----------|-------|
| Chart Name | `airflow` |
| Repository | Apache Airflow Official |
| Kubernetes Support | v1.30+, v1.31+, v1.32+, v1.33+ |

Supported Kubernetes versions are determined by the major cloud providers (EKS, AKS, GKE) support windows.

资料来源：[dev/breeze/src/airflow_breeze/global_constants.py:48-54](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/global_constants.py)

### Core Components

The Helm chart deploys the following core Airflow components:

```mermaid
graph TD
    A[Helm Chart] --> B[Webserver]
    A --> C[Scheduler]
    A --> D[Triggerer]
    A --> E[Worker]
    A --> F[Flower]
    A --> G[StatsD]
    A --> H[Redis]
    A --> I[PostgreSQL/MySQL]
    
    B -->|Read/Write| I
    C -->|Queue Jobs| H
    D -->|Async Tasks| H
    E -->|Process Tasks| H
```

### Scheduler Deployment

The scheduler is deployed as a Kubernetes Deployment with the following characteristics:

- **Replicas**: Configurable via `replicas` parameter
- **Resources**: Configurable CPU and memory limits
- **Health Checks**: Liveness and readiness probes
- **Persistence**: Optional volume mounts for DAGs and logs

资料来源：[chart/templates/scheduler/scheduler-deployment.yaml](https://github.com/apache/airflow/blob/main/chart/templates/scheduler/scheduler-deployment.yaml)

### Worker Deployment

Workers process tasks from the message queue and are deployed as:

- **Deployment Type**: StatefulSet or Deployment based on configuration
- **Scaling**: Horizontal Pod Autoscaler (HPA) support
- **Queue Configuration**: Multiple queues supported
- **Resource Management**: Configurable resource requests and limits

资料来源：[chart/templates/workers/worker-deployment.yaml](https://github.com/apache/airflow/blob/main/chart/templates/workers/worker-deployment.yaml)

## Installation Methods

### Installing from PyPI

While it is possible to install Airflow using tools like Poetry or pip-tools, only `pip` installation is currently officially supported.

> **Note**: Installing via Poetry or pip-tools is not currently supported.

For repeatable installation, Airflow maintains "known-to-be-working" constraint files in the orphan `constraints-main` and `constraints-2-0` branches. These constraint files are maintained per major/minor Python version.

资料来源：[generated/PYPI_README.md](https://github.com/apache/airflow/blob/main/generated/PYPI_README.md)

### Installing Providers from Sources

Providers can be installed from source with SHA512 verification:

```bash
shasum -a 512 {{ package_name }}-{{ package_version }}.tar.gz | diff - {{ package_name }}-{{ package_version }}.tar.gz.sha512
```

This ensures the integrity of the downloaded package against the provided checksum.

资料来源：[devel-common/src/sphinx_exts/includes/installing-providers-from-sources.rst](https://github.com/apache/airflow/blob/main/devel-common/src/sphinx_exts/includes/installing-providers-from-sources.rst)

## Helm Chart Package Preparation

The release management tooling includes commands for preparing Helm chart packages:

```mermaid
graph LR
    A[Chart Source] --> B[helm package]
    B --> C[Deterministic Repack]
    C --> D[PGP Signature]
    D --> E[Distribution Archive]
```

### Package Signing

Helm chart packages can be signed using GPG for verification:

```bash
helm gpg sign -u <sign-email> <archive-name>
```

The signing process generates a provenance file (`.tgz.prov`) that can be used to verify the package integrity.

资料来源：[dev/breeze/src/airflow_breeze/commands/release_management_commands.py:400-420](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/release_management_commands.py)

## Software Bill of Materials (SBOM)

The Airflow project generates and maintains SBOM information for security and compliance purposes.

### SBOM Commands

The `breeze` CLI provides commands for managing SBOM information:

```bash
breeze sbom update-sbom-information [OPTIONS]
```

#### Command Options

| Option | Type | Description |
|--------|------|-------------|
| `--airflow-site-archive-path` | Path | Directory for airflow-site-archive |
| `--airflow-root-path` | Path | Root of the airflow repository |
| `--airflow-version` | String | Version of airflow to update SBOM |
| `--airflow-constraints-reference` | String | Constraints reference for SBOM generation |

These files are placed in `airflow-site-archive/docs-archive/` or `generated/_build/docs/apache-airflow/stable/` depending on the configuration.

资料来源：[dev/breeze/src/airflow_breeze/commands/sbom_commands.py](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/sbom_commands.py)

### SBOM File Generation

SBOM files are generated based on:

- `provider.yaml` files
- Airflow constraints
- Provider tags and releases

The SBOM information is automatically regenerated using the `breeze release-management generate-providers-metadata` command.

资料来源：[generated/README.md](https://github.com/apache/airflow/blob/main/generated/README.md)

## Docker Compose Deployment

For development and testing purposes, Airflow can be deployed using Docker Compose.

### Prerequisites

- Docker Engine
- Docker Compose v2+
- SQLite database (automatically created if `AIRFLOW_HOME` is not set)

### Basic Usage

For example commands that start Airflow, refer to the [Executing commands](https://airflow.apache.org/docs/docker-stack/entrypoint.html#entrypoint-commands) documentation.

资料来源：[docker-stack-docs/README.md](https://github.com/apache/airflow/blob/main/docker-stack-docs/README.md)

## Deployment Architecture

```mermaid
graph TD
    subgraph "Client Layer"
        A[Airflow CLI] --> B[REST API]
        C[Web UI] --> B
    end
    
    subgraph "Core Services"
        D[Scheduler] --> E[(Metadata DB)]
        F[Triggerer] --> E
        G[Webserver] --> E
    end
    
    subgraph "Task Execution"
        H[Workers] --> I[Message Queue]
        D --> I
        F --> I
    end
    
    subgraph "Storage"
        J[DAGs Repository]
        K[Logs Storage]
        L[Plugins]
    end
    
    H --> K
    G --> K
    D --> J
```

## Default Connections

The Airflow deployment includes pre-configured connection templates for common services:

| Connection ID | Type | Default Settings |
|---------------|------|-------------------|
| `google_cloud_default` | Google Cloud Platform | Schema: default |
| `http_default` | HTTP | Host: https://www.httpbin.org/ |
| `ftp_default` | FTP | localhost:21 |
| `hive_cli_default` | Hive CLI | localhost:10000 |
| `hiveserver2_default` | HiveServer2 | localhost:10000 |
| `fs_default` | File System | Path: / |
| `gremlin_default` | Gremlin | Host: gremlin:8182 |
| `iceberg_default` | Iceberg | HTTPS endpoint |

资料来源：[airflow-core/src/airflow/utils/db.py](https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/utils/db.py)

## Verification and Security

### Package Verification

PyPI releases can be verified by downloading the package, signature, and SHA sum files:

```bash
#!/bin/bash
PACKAGE_VERSION={{ package_version }}
PACKAGE_NAME={{ package_name }}
provider_download_dir=$(mktemp -d)
pip download --no-deps "${PACKAGE_NAME}==${PACKAGE_VERSION}" --dest "${provider_download_dir}"
curl "{{ base_url }}/{{ package_name_underscores }}-{{ package_version }}-py3-none-any.whl.asc" \
    -L -o "${provider_download_dir}/{{ package_name_underscores }}-{{ package_version }}-py3-none-any.whl.asc"
curl "{{ base_url }}/{{ package_name_underscores }}-{{ package_version }}-py3-none-any.whl.sha512" \
    -L -o "${provider_download_dir}/{{ package_name_underscores }}-{{ package_version }}-py3-none-any.whl.sha512"
echo "Please verify files downloaded to ${provider_download_dir}"
ls -
```

资料来源：[devel-common/src/sphinx_exts/includes/installing-providers-from-sources.rst](https://github.com/apache/airflow/blob/main/devel-common/src/sphinx_exts/includes/installing-providers-from-sources.rst)

## Related Documentation

- [Docker Stack Documentation](https://airflow.apache.org/docs/docker-stack/)
- [Helm Chart Documentation](https://airflow.apache.org/docs/helm-chart/)
- [Docker Compose Guide](https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html)
- [Executing Commands in Docker](https://airflow.apache.org/docs/docker-stack/entrypoint.html)

---

<a id='kubernetes-deployment'></a>

## Kubernetes Deployment

### 相关页面

相关主题：[Docker and Helm Deployment](#docker-helm), [Scheduler and Executor Architecture](#scheduler-executor)

<details>
<summary>相关源码文件</summary>

以下源码文件用于生成本页说明：

- [providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py](https://github.com/apache/airflow/blob/main/providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py)
- [providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py](https://github.com/apache/airflow/blob/main/providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py)
- [providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/kube_config.py](https://github.com/apache/airflow/blob/main/providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/kube_config.py)
- [providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/pod_generator.py](https://github.com/apache/airflow/blob/main/providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/pod_generator.py)
- [chart/templates/triggerer/triggerer-deployment.yaml](https://github.com/apache/airflow/blob/main/chart/templates/triggerer/triggerer-deployment.yaml)
- [airflow-core/docs/administration-and-deployment/kubernetes.rst](https://github.com/apache/airflow/blob/main/airflow-core/docs/administration-and-deployment/kubernetes.rst)
- [dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py](https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py)
</details>

# Kubernetes Deployment

Apache Airflow provides comprehensive Kubernetes support through the CNCF Kubernetes provider package. This integration enables Airflow to run tasks as Kubernetes Pods, providing dynamic resource allocation, isolation, and scalability for workflow execution.

## Architecture Overview

Airflow's Kubernetes deployment consists of multiple components that work together to provide a flexible, scalable execution environment.

### Kubernetes Executor Architecture

The Kubernetes Executor is one of Airflow's core executors that launches a new Pod for each task instance. Unlike the Celery Executor which reuses workers, the Kubernetes Executor creates isolated Pods for each task execution.

```mermaid
graph TD
    A[Airflow Scheduler] -->|submits task| B[Kubernetes Executor]
    B -->|creates| C[Task Pod]
    B -->|watches| C
    C -->|completes| D[Pod Status Update]
    D -->|pods cleaned up| E[Resource Cleanup]
    
    F[Kubernetes API Server] -->|manages| C
    G[Worker Nodes] -->|hosts| C
```

### Core Components

| Component | File Path | Purpose |
|-----------|-----------|---------|
| KubernetesExecutor | `providers/cncf/kubernetes/.../executors/kubernetes_executor.py` | Main executor implementation |
| KubernetesPodOperator | `providers/cncf/kubernetes/.../operators/pod.py` | Operator for running pods |
| KubeConfig | `providers/cncf/kubernetes/.../kube_config.py` | Configuration management |
| PodGenerator | `providers/cncf/kubernetes/.../pod_generator.py` | Pod specification builder |

## Configuration

### Kubernetes Executor Configuration

The Kubernetes Executor requires configuration in the `airflow.cfg` file under the `[kubernetes]` section.

```ini
[core]
executor = airflow.providers.cncf.kubernetes.executors.kubernetes_executor.KubernetesExecutor

[kubernetes]
namespace = default
pod_template_file = /path/to/pod_template.yaml
worker_container_repository = apache/airflow
worker_container_tag = latest
```

### Key Configuration Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `namespace` | Kubernetes namespace for task pods | `default` |
| `pod_template_file` | Path to pod template yaml | - |
| `worker_container_repository` | Docker image repository | `apache/airflow` |
| `worker_container_tag` | Docker image tag | `latest` |
| `delete_worker_pods` | Delete pods after task completion | `True` |
| `delete_worker_pods_on_failure` | Delete pods on task failure | `False` |
| `worker_pods_creation_batch_size` | Batch size for pod creation | `2` |

资料来源：[providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/kube_config.py]()

## Pod Generation

The `PodGenerator` class is responsible for constructing Kubernetes Pod specifications from Airflow task definitions.

### Pod Template System

Airflow supports custom pod templates that define the base pod specification. These templates use Jinja2 templating to allow dynamic value injection.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker-pod
spec:
  containers:
    - name: base
      image: "{{ container_image }}"
      env:
        - name: AIRFLOW__CORE__EXECUTOR
          value: LocalExecutor
```

资料来源：[providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/pod_generator.py]()

### Dynamic Pod Configuration

The KubernetesPodOperator allows dynamic configuration of pod specifications at runtime:

```python
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

run_pod = KubernetesPodOperator(
    task_id="run_kubernetes_pod",
    name="my-task-pod",
    namespace="default",
    image="python:3.9-slim",
    cmds=["python", "-c", "print('Hello from Kubernetes')"],
    labels={"app": "airflow", "environment": "production"},
    volumes=[config_volume],
    volume_mounts=[config_volume_mount],
    get_logs=True,
    is_delete_operator_pod=True,
)
```

## Airflow Components on Kubernetes

### Triggerer Deployment

The Airflow Triggerer runs as a Kubernetes Deployment to manage triggering logic in a distributed environment.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-triggerer
  namespace: airflow
spec:
  replicas: 2
  selector:
    matchLabels:
      component: triggerer
      tier: airflow
  template:
    metadata:
      labels:
        component: triggerer
        tier: airflow
    spec:
      serviceAccountName: airflow-triggerer
      containers:
        - name: triggerer
          image: {{ .Values.images.triggerer }}
          args: ["triggerer"]
          ports:
            - name: logs
              containerPort: 8793
```

资料来源：[chart/templates/triggerer/triggerer-deployment.yaml]()

### Component Services

| Component | Kubernetes Object | Purpose |
|-----------|-------------------|---------|
| Scheduler | Deployment | DAG parsing and task scheduling |
| Webserver | Deployment | Airflow UI serving |
| Triggerer | Deployment | Deferred task handling |
| Worker | Deployment/DaemonSet | Task execution |
| Flower | Deployment | Celery monitoring (optional) |
| Database | StatefulSet | Metadata storage |
| Redis | StatefulSet | Message broker |

## Local Development with Skaffold

Airflow provides development workflows using Skaffold for hot-reloading code changes into running Kubernetes pods.

### Skaffold Dev Loop

```bash
breeze k8s dev
```

This command runs the Skaffold dev loop to sync DAGs and airflow-core sources to running pods including scheduler, triggerer, dag-processor, and API Server with hot-reload support.

```python
@pulumi_benchmark.group.command(
    name="dev",
    help=(
        "Run skaffold dev loop to sync dags and airflow-core sources to running pods "
        "(scheduler/triggerer/dag-processor/API Server hot-reload; UI auto-refresh not supported yet)."
    ),
)
```

资料来源：[dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py]()

## KubernetesPodOperator

The KubernetesPodOperator allows users to define and execute arbitrary Kubernetes Pods as Airflow tasks.

### Operator Features

| Feature | Description |
|---------|-------------|
| Full Pod Spec | Define complete pod specifications |
| Volume Management | Support for ConfigMaps, Secrets, PVCs |
| Image Pull | Private registry authentication |
| Resource Limits | CPU and memory constraints |
| Node Selection | Pod affinity and node selectors |
| Sidecars | Support for sidecar containers |

### Basic Usage

```python
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with dag:
    k = KubernetesPodOperator(
        task_id="demo_pod",
        name="demo-pod",
        image="python:3.9",
        cmds=["python", "-c", "print('Hello World')"],
        namespace="default",
        is_delete_operator_pod=True,
        get_logs=True,
    )
```

资料来源：[providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py]()

## Executor Implementation

### Task Execution Flow

```mermaid
sequenceDiagram
    participant Scheduler
    participant K8sExecutor
    participant KubeAPI
    participant Pod
    
    Scheduler->>K8sExecutor: Queue task for execution
    K8sExecutor->>KubeAPI: Create Pod
    KubeAPI->>Pod: Launch pod
    Pod->>Pod: Execute task
    Pod->>KubeAPI: Report completion
    KubeAPI->>K8sExecutor: Pod completed
    K8sExecutor->>Scheduler: Task result
```

### Executor Configuration

```python
class KubernetesExecutor:
    """Kubernetes executor implementation."""
    
    def __init__(self):
        self.kube_config = KubeConfig()
        self.manager = PodLauncher(
            kube_config=self.kube_config,
            in_cluster=self.kube_config.get("in_cluster", False),
        )
```

资料来源：[providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor.py]()

## Security Considerations

### Service Account Configuration

Running Airflow on Kubernetes requires proper service account configuration with appropriate RBAC permissions.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: airflow-executor
  namespace: airflow
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: airflow-executor
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "delete", "get", "list", "patch", "watch"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
```

### Security Context

Pods can be configured with security contexts for enhanced isolation:

```python
KubernetesPodOperator(
    task_id="secure_pod",
    security_context={
        "runAsUser": 1000,
        "fsGroup": 2000,
        "capabilities": {"drop": ["ALL"]},
    },
    container_security_context={
        "readOnlyRootFilesystem": True,
        "runAsNonRoot": True,
    },
)
```

## Resource Management

### Pod Resources

Tasks can define resource requests and limits:

```python
KubernetesPodOperator(
    task_id="resource_limited_task",
    container_resources={
        "request_memory": "128Mi",
        "request_cpu": 0.5,
        "limit_memory": "256Mi",
        "limit_cpu": 1,
    },
)
```

### Node Affinity and Tolerations

```python
KubernetesPodOperator(
    task_id="specific_node_task",
    node_selector={
        "disktype": "ssd",
        "kubernetes.io/arch": "amd64",
    },
    tolerations=[
        {
            "key": "dedicated",
            "operator": "Equal",
            "value": "airflow",
            "effect": "NoSchedule",
        }
    ],
)
```

## Monitoring and Logging

### Log Aggregation

The KubernetesPodOperator supports automatic log retrieval from pods:

```python
KubernetesPodOperator(
    task_id="task_with_logs",
    get_logs=True,
    log_events_on_failure=True,
)
```

### Pod Monitoring

Pod status can be monitored through the Kubernetes API:

```python
from airflow.providers.cncf.kubernetes.utils import delete_from_dict, create_from_dict

# Watch pod status
watcher = PodWatcher(kube_config)
for event in watcher.poll_for_pod_completion(task_instance):
    if event["type"] == "DELETED":
        break
```

## Best Practices

### 1. Use Pod Templates

Define reusable pod templates to ensure consistency across tasks:

```yaml
# pod_template.yaml
apiVersion: v1
kind: Pod
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000
  containers:
    - name: base
      resources:
        requests:
          memory: "128Mi"
          cpu: 0.25
        limits:
          memory: "256Mi"
          cpu: 0.5
```

### 2. Implement Graceful Cleanup

Configure pod deletion policies to manage resource cleanup:

```python
KubernetesPodOperator(
    task_id="cleanup_example",
    is_delete_operator_pod=True,
    on_finish_callback=cleanup_handler,
)
```

### 3. Set Appropriate Timeouts

Always define task timeouts to prevent orphaned pods:

```python
KubernetesPodOperator(
    task_id="timeout_task",
    execution_timeout=timedelta(minutes=30),
    task_concurrency=10,
)
```

## Helm Chart Deployment

The official Airflow Helm chart provides production-ready Kubernetes deployment configurations.

### Minimal Installation

```bash
helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --create-namespace \
  --set executor=CeleryExecutor \
  --set webserver.service.type=LoadBalancer
```

### Production Installation

```bash
helm install airflow apache-airflow/airflow \
  --namespace airflow \
  --values production-values.yaml
```

The Helm chart supports extensive customization through values files, including:
- Multi-tier architecture configuration
- Ingress setup for webserver access
- External database and message broker
- Persistent volume claims for DAG storage
- Resource limits per component
- Security contexts and pod policies

## References

For further information, refer to the official documentation at `airflow-core/docs/administration-and-deployment/kubernetes.rst` and the Kubernetes provider source code in `providers/cncf/kubernetes/`.

---

---

## Doramagic 踩坑日志

项目：apache/airflow

摘要：发现 19 个潜在踩坑项，其中 1 个为 high/blocking；最高优先级：安装坑 - 来源证据：`ExternalTaskSensor` can succeed early for task groups with NULL task states。

## 1. 安装坑 · 来源证据：`ExternalTaskSensor` can succeed early for task groups with NULL task states

- 严重度：high
- 证据强度：source_linked
- 发现：GitHub 社区证据显示该项目存在一个安装相关的待验证问题：`ExternalTaskSensor` can succeed early for task groups with NULL task states
- 对用户的影响：可能增加新用户试用和生产接入成本。
- 建议检查：来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- 防护动作：不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- 证据：community_evidence:github | cevd_3c746f7ce44f43f1a5a81840f4ee741a | https://github.com/apache/airflow/issues/66877 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 2. 能力坑 · 能力判断依赖假设

- 严重度：medium
- 证据强度：source_linked
- 发现：README/documentation is current enough for a first validation pass.
- 对用户的影响：假设不成立时，用户拿不到承诺的能力。
- 建议检查：将假设转成下游验证清单。
- 防护动作：假设必须转成验证项；没有验证结果前不能写成事实。
- 证据：capability.assumptions | github_repo:33884891 | https://github.com/apache/airflow | README/documentation is current enough for a first validation pass.

## 3. 维护坑 · 维护活跃度未知

- 严重度：medium
- 证据强度：source_linked
- 发现：未记录 last_activity_observed。
- 对用户的影响：新项目、停更项目和活跃项目会被混在一起，推荐信任度下降。
- 建议检查：补 GitHub 最近 commit、release、issue/PR 响应信号。
- 防护动作：维护活跃度未知时，推荐强度不能标为高信任。
- 证据：evidence.maintainer_signals | github_repo:33884891 | https://github.com/apache/airflow | last_activity_observed missing

## 4. 安全/权限坑 · 下游验证发现风险项

- 严重度：medium
- 证据强度：source_linked
- 发现：no_demo
- 对用户的影响：下游已经要求复核，不能在页面中弱化。
- 建议检查：进入安全/权限治理复核队列。
- 防护动作：下游风险存在时必须保持 review/recommendation 降级。
- 证据：downstream_validation.risk_items | github_repo:33884891 | https://github.com/apache/airflow | no_demo; severity=medium

## 5. 安全/权限坑 · 存在安全注意事项

- 严重度：medium
- 证据强度：source_linked
- 发现：No sandbox install has been executed yet; downstream must verify before user use.
- 对用户的影响：用户安装前需要知道权限边界和敏感操作。
- 建议检查：转成明确权限清单和安全审查提示。
- 防护动作：安全注意事项必须面向用户前置展示。
- 证据：risks.safety_notes | github_repo:33884891 | https://github.com/apache/airflow | No sandbox install has been executed yet; downstream must verify before user use.

## 6. 安全/权限坑 · 存在评分风险

- 严重度：medium
- 证据强度：source_linked
- 发现：no_demo
- 对用户的影响：风险会影响是否适合普通用户安装。
- 建议检查：把风险写入边界卡，并确认是否需要人工复核。
- 防护动作：评分风险必须进入边界卡，不能只作为内部分数。
- 证据：risks.scoring_risks | github_repo:33884891 | https://github.com/apache/airflow | no_demo; severity=medium

## 7. 安全/权限坑 · 来源证据：Apache Airflow 3.1.6

- 严重度：medium
- 证据强度：source_linked
- 发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：Apache Airflow 3.1.6
- 对用户的影响：可能阻塞安装或首次运行。
- 建议检查：来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- 防护动作：不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- 证据：community_evidence:github | cevd_c94c5b79bb91454c9e0ad22b4a36dc11 | https://github.com/apache/airflow/releases/tag/3.1.6 | 来源讨论提到 docker 相关条件，需在安装/试用前复核。

## 8. 安全/权限坑 · 来源证据：Apache Airflow 3.1.7

- 严重度：medium
- 证据强度：source_linked
- 发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：Apache Airflow 3.1.7
- 对用户的影响：可能阻塞安装或首次运行。
- 建议检查：来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- 防护动作：不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- 证据：community_evidence:github | cevd_80a5614167b44ecca96422caa56afca4 | https://github.com/apache/airflow/releases/tag/3.1.7 | 来源讨论提到 docker 相关条件，需在安装/试用前复核。

## 9. 安全/权限坑 · 来源证据：Apache Airflow 3.1.8

- 严重度：medium
- 证据强度：source_linked
- 发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：Apache Airflow 3.1.8
- 对用户的影响：可能阻塞安装或首次运行。
- 建议检查：来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- 防护动作：不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- 证据：community_evidence:github | cevd_3437fcfc89ff40a0ba17b7e5a5d8aa2c | https://github.com/apache/airflow/releases/tag/3.1.8 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 10. 安全/权限坑 · 来源证据：Apache Airflow 3.2.0

- 严重度：medium
- 证据强度：source_linked
- 发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：Apache Airflow 3.2.0
- 对用户的影响：可能阻塞安装或首次运行。
- 建议检查：来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- 防护动作：不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- 证据：community_evidence:github | cevd_d18a498a98ed48f0bb3f813aaa554aea | https://github.com/apache/airflow/releases/tag/3.2.0 | 来源讨论提到 docker 相关条件，需在安装/试用前复核。

## 11. 安全/权限坑 · 来源证据：Apache Airflow 3.2.1

- 严重度：medium
- 证据强度：source_linked
- 发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：Apache Airflow 3.2.1
- 对用户的影响：可能影响升级、迁移或版本选择。
- 建议检查：来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- 防护动作：不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- 证据：community_evidence:github | cevd_41889eab2c0240bbb0bd010262d41346 | https://github.com/apache/airflow/releases/tag/3.2.1 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 12. 安全/权限坑 · 来源证据：Apache Airflow Ctl (airflowctl) 0.1.2

- 严重度：medium
- 证据强度：source_linked
- 发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：Apache Airflow Ctl (airflowctl) 0.1.2
- 对用户的影响：可能阻塞安装或首次运行。
- 建议检查：来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- 防护动作：不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- 证据：community_evidence:github | cevd_df5b495084624a899a4674d4b0193ec7 | https://github.com/apache/airflow/releases/tag/airflow-ctl/0.1.2 | 来源类型 github_release 暴露的待验证使用条件。

## 13. 安全/权限坑 · 来源证据：Apache Airflow Helm Chart 1.19.0

- 严重度：medium
- 证据强度：source_linked
- 发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：Apache Airflow Helm Chart 1.19.0
- 对用户的影响：可能影响升级、迁移或版本选择。
- 建议检查：来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- 防护动作：不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- 证据：community_evidence:github | cevd_0f5ecc599d634c1daa9ee6c9f2434849 | https://github.com/apache/airflow/releases/tag/helm-chart/1.19.0 | 来源类型 github_release 暴露的待验证使用条件。

## 14. 安全/权限坑 · 来源证据：Apache Airflow Helm Chart 1.20.0

- 严重度：medium
- 证据强度：source_linked
- 发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：Apache Airflow Helm Chart 1.20.0
- 对用户的影响：可能影响升级、迁移或版本选择。
- 建议检查：来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- 防护动作：不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- 证据：community_evidence:github | cevd_c13fe02283094cffa9e186716cc8dcf5 | https://github.com/apache/airflow/releases/tag/helm-chart/1.20.0 | 来源讨论提到 node 相关条件，需在安装/试用前复核。

## 15. 安全/权限坑 · 来源证据：Apache Airflow Helm Chart 1.21.0

- 严重度：medium
- 证据强度：source_linked
- 发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：Apache Airflow Helm Chart 1.21.0
- 对用户的影响：可能影响升级、迁移或版本选择。
- 建议检查：来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- 防护动作：不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- 证据：community_evidence:github | cevd_d29213d0ae2441e8907f407d8dc9b861 | https://github.com/apache/airflow/releases/tag/helm-chart/1.21.0 | 来源讨论提到 docker 相关条件，需在安装/试用前复核。

## 16. 安全/权限坑 · 来源证据：airflow-ctl/0.1.3

- 严重度：medium
- 证据强度：source_linked
- 发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：airflow-ctl/0.1.3
- 对用户的影响：可能影响授权、密钥配置或安全边界。
- 建议检查：来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- 防护动作：不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- 证据：community_evidence:github | cevd_9096523c7ba541c6ac1b6b926d6f6bc0 | https://github.com/apache/airflow/releases/tag/airflow-ctl/0.1.3 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 17. 安全/权限坑 · 来源证据：edge3: upgrade from 1.x silently leaves schema stale — `BaseDBManager.upgradedb` stamps `alembic_version_edge3` to head…

- 严重度：medium
- 证据强度：source_linked
- 发现：GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：edge3: upgrade from 1.x silently leaves schema stale — `BaseDBManager.upgradedb` stamps `alembic_version_edge3` to head without running migrations
- 对用户的影响：可能影响升级、迁移或版本选择。
- 建议检查：来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- 防护动作：不得脱离来源链接放大为确定性结论；需要标注适用版本和复核状态。
- 证据：community_evidence:github | cevd_2bc80550929a49ddbce05c557bb225e0 | https://github.com/apache/airflow/issues/66524 | 来源讨论提到 python 相关条件，需在安装/试用前复核。

## 18. 维护坑 · issue/PR 响应质量未知

- 严重度：low
- 证据强度：source_linked
- 发现：issue_or_pr_quality=unknown。
- 对用户的影响：用户无法判断遇到问题后是否有人维护。
- 建议检查：抽样最近 issue/PR，判断是否长期无人处理。
- 防护动作：issue/PR 响应未知时，必须提示维护风险。
- 证据：evidence.maintainer_signals | github_repo:33884891 | https://github.com/apache/airflow | issue_or_pr_quality=unknown

## 19. 维护坑 · 发布节奏不明确

- 严重度：low
- 证据强度：source_linked
- 发现：release_recency=unknown。
- 对用户的影响：安装命令和文档可能落后于代码，用户踩坑概率升高。
- 建议检查：确认最近 release/tag 和 README 安装命令是否一致。
- 防护动作：发布节奏未知或过期时，安装说明必须标注可能漂移。
- 证据：evidence.maintainer_signals | github_repo:33884891 | https://github.com/apache/airflow | release_recency=unknown

<!-- canonical_name: apache/airflow; human_manual_source: deepwiki_human_wiki -->
