# https://github.com/flyteorg/flyte Project Manual

Generated at: 2026-06-19 10:38:54 UTC

## Table of Contents

- [Repository Overview & System Architecture](#page-1)
- [Backend Services & Data Plane APIs](#page-2)
- [Plugin System, Task Execution & Extensibility](#page-3)
- [Deployment, Tooling & Operations](#page-4)

<a id='page-1'></a>

## Repository Overview & System Architecture

### Related Pages

Related topics: [Backend Services & Data Plane APIs](#page-2), [Plugin System, Task Execution & Extensibility](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/flyteorg/flyte/blob/main/README.md)
- [flytestdlib/README.md](https://github.com/flyteorg/flyte/blob/main/flytestdlib/README.md)
- [executor/README.md](https://github.com/flyteorg/flyte/blob/main/executor/README.md)
- [flyteidl2/gen_utils/rust/src/lib.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/lib.rs)
- [flyteidl2/gen_utils/rust/src/google.rpc.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/google.rpc.rs)
- [gen/go/flyteidl2/app/app_definition.pb.go](https://github.com/flyteorg/flyte/blob/main/gen/go/flyteidl2/app/app_definition.pb.go)
- [gen/ts/flyteidl2/app/app_definition_pb.ts](https://github.com/flyteorg/flyte/blob/main/gen/ts/flyteidl2/app/app_definition_pb.ts)
- [gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json](https://github.com/flyteorg/flyte/blob/main/gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json)
- [app/internal/k8s/app_client.go](https://github.com/flyteorg/flyte/blob/main/app/internal/k8s/app_client.go)
</details>

# Repository Overview & System Architecture

## 1. Purpose and Scope

Flyte is an open-source project, licensed under Apache 2.0, that provides an extensible orchestration platform for data and machine-learning workflows. The umbrella repository hosts the core backend services, the shared standard library, code-generation utilities, and runtime adapters that together form the Flyte control plane and execution plane. For contributors, the README points to the backend README and the community Slack as the main entry points [README.md](README.md).

The repository is organized as a polyglot monorepo. Go is the primary implementation language for backend services, Rust is used for performance-sensitive bindings exposed to Python via PyO3, and TypeScript is generated for web/gateway consumers. Multi-language support is achieved through a Protocol Buffers-based Interface Definition Language (IDL) called `flyteidl2`, which is the single source of truth for all wire types [flyteidl2/gen_utils/rust/src/lib.rs](flyteidl2/gen_utils/rust/src/lib.rs). Standard cross-cutting concerns — configuration, CLI flag generation, and storage abstractions — are extracted into the `flytestdlib` shared library [flytestdlib/README.md](flytestdlib/README.md).

## 2. Repository Structure and Components

The repository is composed of several logical modules, each with a focused responsibility:

| Module | Role | Notable Artifacts |
|---|---|---|
| `flyteidl2/` | Interface Definition Language v2 and code-generation utilities (Rust + PyO3) | `gen_utils/rust/src/lib.rs` |
| `flytestdlib/` | Shared Go library: config, `cli/pflags`, abstract `storage` | `flytestdlib/README.md` |
| `executor/` | Kubernetes-native workflow executor component | `executor/README.md` |
| `app/` | Application controller that deploys workloads onto Knative | `app/internal/k8s/app_client.go` |
| `gen/go/` | Generated Go protobuf/gRPC bindings | `gen/go/flyteidl2/app/app_definition.pb.go` |
| `gen/ts/` | Generated TypeScript protobuf bindings | `gen/ts/flyteidl2/app/app_definition_pb.ts` |
| `gen/go/gateway/` | Generated OpenAPI/Swagger definitions for the gateway | `gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json` |

### 2.1 Shared Library (`flytestdlib`)

`flytestdlib` is intentionally narrow. According to its README, it exposes three things:

- a strongly typed configuration loader;
- a `cli/pflags` generator that derives command-line flags from Go structs;
- a `storage` abstraction that uses `stow` to talk to S3, Azure Blob, and GCS. This abstraction reads and writes protobuf messages natively and offers an in-memory implementation for testing [flytestdlib/README.md](flytestdlib/README.md).

### 2.2 Executor

The `executor/` directory contains a Kubebuilder-scaffolded Kubernetes operator project that ships with a `make build-installer` target. The target uses Kustomize to produce a self-contained `install.yaml` bundle, so operators can deploy the executor with `kubectl apply -f`. An optional Helm chart can be produced via the `kubebuilder` `helm/v1-alpha` plugin [executor/README.md](executor/README.md).

### 2.3 App Controller

The app controller reconciles Flyte `App` custom resources into Knative revisions. It inspects autoscaling settings on a `flyteapp.Spec` and emits the corresponding `autoscaling.knative.dev/*` annotations, supporting replica counts, request-rate and concurrency scaling metrics, and a custom scale-down window [app/internal/k8s/app_client.go](app/internal/k8s/app_client.go).

## 3. Interface Definition Language and Multi-Language Code Generation

At the heart of the architecture sits `flyteidl2`, a Protocol Buffers-based IDL. The Rust crate under `flyteidl2/gen_utils/rust/` orchestrates `prost-build` to emit Rust types and exposes a PyO3 module so that Python clients can consume the same wire format [flyteidl2/gen_utils/rust/src/lib.rs](flyteidl2/gen_utils/rust/src/lib.rs). The same IDL also drives Go and TypeScript generation.

The generated artifacts evidence a consistent contract:

- **Go**: `gen/go/flyteidl2/app/app_definition.pb.go` exposes idiomatic Go structs such as `Link`. It also provides wrapper types such as `Spec_Container` and `Spec_Pod`, which implement the oneof (discriminated union) that selects the application's payload type [gen/go/flyteidl2/app/app_definition.pb.go](gen/go/flyteidl2/app/app_definition.pb.go).
- **TypeScript**: `gen/ts/flyteidl2/app/app_definition_pb.ts` mirrors these as TypeScript message types using the `proto3` runtime helpers, including fields like `assignedCluster`, `currentReplicas`, and the oneof-typed `payload` of `AppWrapper` [gen/ts/flyteidl2/app/app_definition_pb.ts](gen/ts/flyteidl2/app/app_definition_pb.ts).
- **Gateway/OpenAPI**: `gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json` and sibling files emit the OpenAPI surface for the HTTP/JSON gateway, reusing Google's `protobufAny` envelope and `google.rpc.Status` error model [gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json](gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json).

Error reporting is standardized through the `google.rpc.Status` envelope, which carries a numeric `code`, an English developer-facing `message`, and a list of `google.protobuf.Any` detail payloads [flyteidl2/gen_utils/rust/src/google.rpc.rs](flyteidl2/gen_utils/rust/src/google.rpc.rs). This choice gives every language binding an identical error contract.

## 4. Runtime Architecture and Community Context

The end-to-end runtime can be summarized as follows:

```mermaid
flowchart LR
    User[User / flytekit] -->|gRPC + protobuf| Gateway
    Gateway --> Admin[FlyteAdmin]
    Admin -->|CRDs| K8s[(Kubernetes API)]
    K8s --> Executor[Executor]
    K8s --> AppCtrl[App Controller]
    AppCtrl -->|Knative annotations| Knative[(Knative Serving)]
    Knative --> Pods[Workload Pods]
    Executor --> Pods
    Pods --> Storage[(Object Storage<br/>S3 / GCS / Azure)]
    Pods --> Admin
```

The IDL layer guarantees that user SDKs, the control plane (FlyteAdmin, gateway), and the data plane (executor, app controller, Knative pods) all speak the same wire types. Storage is mediated through `flytestdlib`'s `stow`-backed abstraction, which keeps blob I/O uniform across providers [flytestdlib/README.md](flytestdlib/README.md).

### 4.1 Active Areas of Community Interest

Several long-running community discussions map directly onto components shipped from this monorepo:

- **Failure-node support** ([Issue #1506](https://github.com/flyteorg/flyte/issues/1506)): users want the failure-node primitive that already exists in the backend to be exposed in flytekit. Depending on how Flyte 2 models it, this may also require changes to workflow types in `flyteidl2`.
- **Runtime overrides during execution** ([Issue #475](https://github.com/flyteorg/flyte/issues/475)): adjusting resources, retries, and catalog settings after registration touches the same IDL types managed in `flyteidl2` and the generated gRPC services under `gen/go/`.
- **Webhook-based notifications** ([Issue #2317](https://github.com/flyteorg/flyte/issues/2317)): replacing SES/SendGrid with generic webhooks centers on FlyteAdmin's notification subsystem. It would likely also need a new notification type in the IDL.
- **DBT plugin** ([Issue #2202](https://github.com/flyteorg/flyte/issues/2202)): a new flytekit plugin that would integrate with the existing task-type machinery defined in the IDL.
- **CI/CD reference workflow** ([Issue #2772](https://github.com/flyteorg/flyte/issues/2772)): calls for documenting production Flyte delivery pipelines, complementing the install bundle produced by the executor component [executor/README.md](executor/README.md).

The latest tagged release is **v2.0.24**, which primarily contains CI refinements and Kubernetes controller-runtime dependency bumps (`sigs.k8s.io/controller-runtime` 0.23.3 → 0.24.1 and `k8s.io/client-go` 0.36.0 → 0.36.1), underscoring that keeping pace with upstream Kubernetes APIs is an ongoing concern for the project [README.md](README.md).

## See Also

- [Flyte Backend README](docs/BACKEND_README.md)
- [flytestdlib Shared Components](flytestdlib/README.md)
- [Executor Component](executor/README.md)
- [flyteidl2 IDL Utilities](flyteidl2/gen_utils/rust/src/lib.rs)
- [App Controller (Kubernetes)](app/internal/k8s/app_client.go)

---

<a id='page-2'></a>

## Backend Services & Data Plane APIs

### Related Pages

Related topics: [Repository Overview & System Architecture](#page-1), [Plugin System, Task Execution & Extensibility](#page-3), [Deployment, Tooling & Operations](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/flyteorg/flyte/blob/main/README.md)
- [flytecopilot/README.md](https://github.com/flyteorg/flyte/blob/main/flytecopilot/README.md)
- [gen/go/gateway/flyteidl2/app/app_service.swagger.json](https://github.com/flyteorg/flyte/blob/main/gen/go/gateway/flyteidl2/app/app_service.swagger.json)
- [gen/go/gateway/flyteidl2/app/app_definition.swagger.json](https://github.com/flyteorg/flyte/blob/main/gen/go/gateway/flyteidl2/app/app_definition.swagger.json)
- [gen/go/gateway/flyteidl2/app/replica_definition.swagger.json](https://github.com/flyteorg/flyte/blob/main/gen/go/gateway/flyteidl2/app/replica_definition.swagger.json)
- [gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json](https://github.com/flyteorg/flyte/blob/main/gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json)
- [gen/go/gateway/flyteidl2/app/app_logs_payload.swagger.json](https://github.com/flyteorg/flyte/blob/main/gen/go/gateway/flyteidl2/app/app_logs_payload.swagger.json)
- [gen/ts/flyteidl2/app/app_definition_pb.ts](https://github.com/flyteorg/flyte/blob/main/gen/ts/flyteidl2/app/app_definition_pb.ts)
- [gen/ts/flyteidl2/app/app_logs_payload_pb.ts](https://github.com/flyteorg/flyte/blob/main/gen/ts/flyteidl2/app/app_logs_payload_pb.ts)
- [flyteidl2/gen_utils/rust/src/lib.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/lib.rs)
</details>

# Backend Services & Data Plane APIs

## Overview and Scope

The Flyte 2 backend, housed in this repository, is a Kubernetes-native service that exposes a typed **data plane API**. This page focuses on the part of that API that manages the lifecycle of *applications* (apps) and their *replicas*. An `app` is a long-running workload specification, and a `replica` is a concrete running instance of that spec (Source: [gen/ts/flyteidl2/app/app_definition_pb.ts](https://github.com/flyteorg/flyte/blob/main/gen/ts/flyteidl2/app/app_definition_pb.ts)). Other IDL packages, such as `workflow`, `task`, and `trigger`, define the rest of the surface.

The project README describes this repository as the home for the **Kubernetes-native backend infrastructure for deploying Flyte 2 as a distributed, multi-node service**. It points to [docs/BACKEND_README.md](https://github.com/flyteorg/flyte/blob/main/docs/BACKEND_README.md) for the protocol buffer definitions and the contribution guide (Source: [README.md](https://github.com/flyteorg/flyte/blob/main/README.md)). The companion **control plane** is provided as a managed service via [Union.ai](https://www.union.ai/try-flyte-2). The data plane services (logs, replica management, app lifecycle) live in this repo.

The data plane is intentionally **multi-language**. The same `flyteidl2` protocol definitions are compiled to three targets (Source: [flyteidl2/gen_utils/rust/src/lib.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/lib.rs)):

- Go, for gRPC and gRPC-Gateway HTTP;
- TypeScript, for the web console and SDKs;
- Rust, for high-performance clients such as the Python SDK bindings.

This shared IDL guarantees that the backend, the Python SDK ([flyte-sdk](https://github.com/flyteorg/flyte-sdk)), the CLI, and the UI all speak the same contract.

## API Surfaces

The app data plane is defined by a small set of services and message definitions. Each is shipped as a generated Swagger/OpenAPI document under `gen/go/gateway/flyteidl2/app/`:

| Service or definition | Proto file | Purpose | Source |
|---|---|---|---|
| `AppService` (service) | `app_service.proto` | CRUD and lifecycle operations for applications | [app_service.swagger.json](https://github.com/flyteorg/flyte/blob/main/gen/go/gateway/flyteidl2/app/app_service.swagger.json) |
| `ReplicaDefinition` (messages only, no RPCs) | `replica_definition.proto` | Message shapes describing how a replica is materialized | [replica_definition.swagger.json](https://github.com/flyteorg/flyte/blob/main/gen/go/gateway/flyteidl2/app/replica_definition.swagger.json) |
| `AppLogsService` (service) | `app_logs_service.proto` | Streaming ("tail") logs for an app or a specific replica | [app_logs_service.swagger.json](https://github.com/flyteorg/flyte/blob/main/gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json) |

### App and Replica Identifiers

Every app is keyed by a four-part identifier (`org`, `project`, `domain`, and `name`) defined in the `flyteidl2.app.Identifier` message. This mirrors the Flyte 1 namespace model, so existing multi-tenancy boundaries carry over (Source: [gen/ts/flyteidl2/app/app_definition_pb.ts](https://github.com/flyteorg/flyte/blob/main/gen/ts/flyteidl2/app/app_definition_pb.ts)).

A `Status` message associated with an app carries cluster placement and replica counts (e.g. `assigned_cluster`, `current_replicas`). It also carries condition information that includes a `Substate` for finer-grained failure reasons such as `IMAGE_PULL_ERROR`. This gives consumers a typed way to introspect why a deployment failed, instead of parsing free-form error strings.

### Logs API

`AppLogsService` accepts a `TailLogsRequest` that uses a `oneof` to choose between an `app_id` (all replicas) and a `replica_id` (a single replica) (Source: [gen/ts/flyteidl2/app/app_logs_payload_pb.ts](https://github.com/flyteorg/flyte/blob/main/gen/ts/flyteidl2/app/app_logs_payload_pb.ts)). A user can therefore open a unified log stream for a service-style app or drill into a single replica. This is essential for debugging long-running deployments.
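In Go, protoc-gen-go represents a `oneof` as a sealed interface, with one small wrapper struct per member. Consumers branch on it with a type switch. The sketch below reproduces that pattern by hand. `TailLogsRequest`, `Target`, `TargetAppID`, `TargetReplicaID`, `AppID`, and `ReplicaID` are hypothetical names and shapes, not the generated `flyteidl2` types.

```go
package main

import "fmt"

// AppID and ReplicaID are hypothetical stand-ins for the generated identifiers.
type AppID struct{ Org, Project, Domain, Name string }

type ReplicaID struct {
	App  AppID
	Name string
}

// isTailTarget seals the oneof: only the wrapper types below implement it,
// so at most one member can be set at a time.
type isTailTarget interface{ isTailTarget() }

// One wrapper struct per oneof member, following the protoc-gen-go pattern.
type TargetAppID struct{ AppID *AppID }
type TargetReplicaID struct{ ReplicaID *ReplicaID }

func (*TargetAppID) isTailTarget()     {}
func (*TargetReplicaID) isTailTarget() {}

// TailLogsRequest holds the oneof in a single interface-typed field.
type TailLogsRequest struct {
	Target isTailTarget
}

// describe shows how a handler branches on whichever member is set.
func describe(req *TailLogsRequest) string {
	switch t := req.Target.(type) {
	case *TargetAppID:
		a := t.AppID
		return fmt.Sprintf("all replicas of %s/%s/%s/%s", a.Org, a.Project, a.Domain, a.Name)
	case *TargetReplicaID:
		return fmt.Sprintf("replica %s of app %s", t.ReplicaID.Name, t.ReplicaID.App.Name)
	default:
		return "no target set"
	}
}

func main() {
	app := AppID{Org: "acme", Project: "demo", Domain: "development", Name: "web"}
	fmt.Println(describe(&TailLogsRequest{Target: &TargetAppID{AppID: &app}}))
	fmt.Println(describe(&TailLogsRequest{Target: &TargetReplicaID{ReplicaID: &ReplicaID{App: app, Name: "web-7f9c"}}}))
}
```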

## Transport, Errors, and Cross-Language Clients

All services are exposed over both **gRPC** and **HTTP/JSON** via the gRPC-Gateway pattern, with Google API HTTP annotations generated alongside (Source: [gen/ts/google/api/http_pb.ts](https://github.com/flyteorg/flyte/blob/main/gen/ts/google/api/http_pb.ts)). The HTTP rules support `GET`, `POST`, `PUT`, `DELETE`, `PATCH`, and custom patterns. This enables RESTful access from browsers, CLIs, and other HTTP clients. A plain HTTP/JSON surface of this kind is also what webhook-style integrations would build on, such as those discussed in [issue #2317](https://github.com/flyteorg/flyte/issues/2317) as a replacement for email-based notifications.

Errors follow the canonical `google.rpc.Status` model rather than ad-hoc HTTP error envelopes: a code, a message, and structured `details` of type `google.protobuf.Any` (Source: [flyteidl2/gen_utils/rust/src/google.rpc.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/google.rpc.rs)). This makes the API easy to consume from any language and easy to evolve. New error categories can be added as `Any`-typed details without breaking older clients.
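The forward-compatibility argument can be sketched as a tolerant reader. The client handles the detail types it recognizes by type URL and skips the rest. In this sketch:

- `Status` and `Any` are hand-written mirrors, not generated types.
- `splitDetails` is a hypothetical helper.
- `google.rpc.RetryInfo` serves only as an example of a standard detail message.

```go
package main

import "fmt"

// Any and Status are hand-written mirrors of google.protobuf.Any and
// google.rpc.Status, not generated types.
type Any struct {
	TypeURL string
	Value   []byte
}

type Status struct {
	Code    int32
	Message string
	Details []Any
}

// retryInfoURL is the type URL of google.rpc.RetryInfo, a standard detail message.
const retryInfoURL = "type.googleapis.com/google.rpc.RetryInfo"

// splitDetails returns the details whose type URL the client recognizes and
// counts the rest. Unknown details are skipped rather than treated as errors,
// so a server can attach new detail types without breaking this client.
func splitDetails(s Status, known map[string]bool) (handled []Any, skipped int) {
	for _, d := range s.Details {
		if known[d.TypeURL] {
			handled = append(handled, d)
		} else {
			skipped++
		}
	}
	return handled, skipped
}

func main() {
	st := Status{
		Code:    14, // UNAVAILABLE
		Message: "replica not ready",
		Details: []Any{
			{TypeURL: retryInfoURL},
			{TypeURL: "type.googleapis.com/example.v1.FutureDetail"}, // hypothetical newer detail type
		},
	}
	handled, skipped := splitDetails(st, map[string]bool{retryInfoURL: true})
	fmt.Printf("handled %d, skipped %d\n", len(handled), skipped)
}
```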

The Rust crate under `flyteidl2/gen_utils/rust/src` uses `prost` for decoding and `pyo3` to expose the messages directly to Python. This is how the high-performance Python SDK bindings are produced without hand-written glue code (Source: [flyteidl2/gen_utils/rust/src/lib.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/lib.rs)).

## The Data Plane Sidecar: Flyte CoPilot

The gRPC/HTTP services handle *control* operations (create app, list replicas, fetch status). The **data plane** for moving bytes in and out of containers is handled by `flytecopilot` (Source: [flytecopilot/README.md](https://github.com/flyteorg/flyte/blob/main/flytecopilot/README.md)). CoPilot runs as a sidecar or init container inside the user's pod and operates in two modes:

- **Downloader**: runs before the main container starts and materializes Flyte metadata (and any configured input data) into a shared volume. Arbitrary containers can then be orchestrated by Flyte without Flyte-specific code.
- **Sidecar**: runs in parallel with the main container and monitors its lifecycle. The main container signals completion by writing a `_SUCCESS` file, after which the sidecar uploads the outputs and metadata that the main container produced back to remote storage.

```mermaid
flowchart LR
    SDK[flyte-sdk / CLI] -->|gRPC + HTTP| GW[gRPC-Gateway]
    GW --> SVC[AppService / AppLogsService]
    SVC --> K8s[Kubernetes API]
    K8s --> POD[Pod: main + co-pilot]
    OBJ[Object store] -->|inputs| CP[flyte-copilot downloader]
    CP -->|shared volume| POD
    POD -->|outputs| CS[flyte-copilot sidecar]
    CS --> OBJ
```

This separation keeps the data plane API small and stable. The heavyweight data movement happens out-of-band, inside the pod on the node itself.

## See Also

- [flyte-sdk](https://github.com/flyteorg/flyte-sdk) — Python SDK that consumes this data plane
- [docs/BACKEND_README.md](https://github.com/flyteorg/flyte/blob/main/docs/BACKEND_README.md) — backend architecture and contribution guide
- [Union.ai Flyte 2 docs](https://www.union.ai/docs/v2/flyte/user-guide/running-locally/) — managed control plane reference
- Related issues: [#1506 Failure-Node support](https://github.com/flyteorg/flyte/issues/1506), [#2317 Webhook notifications](https://github.com/flyteorg/flyte/issues/2317), [#475 Overrides during execution](https://github.com/flyteorg/flyte/issues/475)

---

<a id='page-3'></a>

## Plugin System, Task Execution & Extensibility

### Related Pages

Related topics: [Repository Overview & System Architecture](#page-1), [Backend Services & Data Plane APIs](#page-2)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/flyteorg/flyte/blob/main/README.md)
- [flyteplugins/README.md](https://github.com/flyteorg/flyte/blob/main/flyteplugins/README.md)
- [flytestdlib/README.md](https://github.com/flyteorg/flyte/blob/main/flytestdlib/README.md)
- [executor/README.md](https://github.com/flyteorg/flyte/blob/main/executor/README.md)
- [flyteidl2/gen_utils/rust/src/lib.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/lib.rs)
- [flyteidl2/gen_utils/rust/src/google.rpc.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/google.rpc.rs)
- [executor/api/v1/zz_generated.deepcopy.go](https://github.com/flyteorg/flyte/blob/main/executor/api/v1/zz_generated.deepcopy.go)
- [gen/ts/flyteidl2/app/app_definition_pb.ts](https://github.com/flyteorg/flyte/blob/main/gen/ts/flyteidl2/app/app_definition_pb.ts)
- [gen/go/gateway/flyteidl2/app/app_service.swagger.json](https://github.com/flyteorg/flyte/blob/main/gen/go/gateway/flyteidl2/app/app_service.swagger.json)
</details>

# Plugin System, Task Execution & Extensibility

Flyte is a Kubernetes-native orchestrator for data and machine-learning workflows. Its power comes from a layered, pluggable architecture:

- an Interface Definition Language (IDL) defines the contracts between user code and the control plane;
- a Go-based plugin machinery executes tasks on backend resources;
- a set of task plugins lets Flyte run arbitrary workloads (Pod, Spark, Ray, and more), with community proposals such as a DBT plugin extending the list.

This page documents how those layers fit together and how to reason about extending the system.

## High-Level Architecture

The monorepo is organized into cooperating components. The top-level [README.md](https://github.com/flyteorg/flyte/blob/main/README.md) positions Flyte as a workflow engine that compiles, schedules, and dispatches tasks onto Kubernetes. Beneath that surface, four modules are central to extensibility:

| Module | Role | Source of truth |
| --- | --- | --- |
| `flyteidl2/` | Protobuf IDL that defines every wire type (tasks, workflows, apps, secrets) | [flyteidl2/gen_utils/rust/src/lib.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/lib.rs) |
| `flytestdlib/` | Shared Go library (config, pflags, storage, logging) | [flytestdlib/README.md](https://github.com/flyteorg/flyte/blob/main/flytestdlib/README.md) |
| `flyteplugins/` | Plugin machinery and per-task-type plugins (Pod, Spark, Ray, …) | [flyteplugins/README.md](https://github.com/flyteorg/flyte/blob/main/flyteplugins/README.md) |
| `executor/` | Kubernetes operator built with `controller-runtime` that reconciles custom resources | [executor/README.md](https://github.com/flyteorg/flyte/blob/main/executor/README.md) |

```mermaid
flowchart LR
  User[User SDK / flytekit] -->|Register & Launch| Admin[Flyteadmin + Scheduler]
  Admin -->|Task event| Plugins[flyteplugins: pluginmachinery]
  Plugins -->|Driver resource| K8s[(Kubernetes API)]
  K8s --> Exec[executor: kubebuilder operator]
  Exec -->|Phase transitions| Plugins
  Plugins --> Admin
  Admin -->|Status / Outputs| User
```

The IDL sits at the bottom of the dependency stack. As shown in [flyteidl2/gen_utils/rust/src/lib.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/lib.rs), the generated Rust crate re-exports submodules for `actions`, `app`, `auth`, `project`, `common`, `workflow`, `logs.dataplane`, `core`, `notification`, `task`, `trigger`, and `secret`. Every Flyte component — including plugins — consumes types from this single source. A canonical example is the `google.rpc.Status` envelope used uniformly for error propagation across services ([flyteidl2/gen_utils/rust/src/google.rpc.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/google.rpc.rs)).

## Plugin Machinery and Task Plugins

The `flyteplugins` module is the extensibility surface for backend-side execution. The short [flyteplugins/README.md](https://github.com/flyteorg/flyte/blob/main/flyteplugins/README.md) declares it as the home of "Plugins contributed by flyte community." In practice, the module is split into two layers:

- **Plugin machinery** (`tasks/pluginmachinery/`) — generic interfaces that a plugin must implement, plus helpers for resource construction, secret resolution, and event reporting.
- **Concrete plugins** (`tasks/plugins/.../`) — task-type-specific drivers. The classic example is the Pod plugin, which converts a Flyte task into a Kubernetes Pod; Spark and Ray plugins follow the same shape but emit `SparkApplication` or `RayJob` custom resources instead.

Plugins subscribe to the task-type identifier declared in the IDL. A plugin is responsible for:

1. Building the underlying Kubernetes object for the task (Pod, Spark cluster, Ray cluster, sidecar container, etc.).
2. Watching the resource and translating its status into Flyte task phases, which are reported upstream as task-execution events.
3. Honoring task-level settings declared in the IDL task template, such as resource requests and limits.
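The plugin contract described in the list can be sketched as an interface plus a registry keyed by task-type identifier. `TaskPlugin`, `Registry`, `podPlugin`, and the phase names are hypothetical and heavily simplified. The actual `pluginmachinery` interfaces have different names, more methods, and richer phase information. The task-type strings in the example are illustrative.

```go
package main

import "fmt"

// Phase is a simplified task phase.
type Phase string

const (
	PhaseQueued    Phase = "QUEUED"
	PhaseRunning   Phase = "RUNNING"
	PhaseSucceeded Phase = "SUCCEEDED"
	PhaseFailed    Phase = "FAILED"
)

// TaskPlugin is a hypothetical, minimal plugin contract: build the backing
// resource for a task, then translate that resource's state into a phase.
type TaskPlugin interface {
	BuildResource(taskName string) map[string]string
	PhaseOf(resourceState string) Phase
}

// podPlugin pretends to run each task as a Kubernetes Pod.
type podPlugin struct{}

func (podPlugin) BuildResource(taskName string) map[string]string {
	return map[string]string{"kind": "Pod", "name": taskName}
}

func (podPlugin) PhaseOf(state string) Phase {
	switch state {
	case "Pending":
		return PhaseQueued
	case "Succeeded":
		return PhaseSucceeded
	case "Failed":
		return PhaseFailed
	default:
		// "Running" and transient states such as "Unknown" keep the task running.
		return PhaseRunning
	}
}

// Registry maps task-type identifiers to the plugin that handles them.
type Registry struct {
	byType map[string]TaskPlugin
}

func NewRegistry() *Registry {
	return &Registry{byType: make(map[string]TaskPlugin)}
}

// Register binds p to each task type, rejecting types that are already taken.
func (r *Registry) Register(p TaskPlugin, taskTypes ...string) error {
	for _, t := range taskTypes {
		if _, taken := r.byType[t]; taken {
			return fmt.Errorf("task type %q already has a plugin", t)
		}
		r.byType[t] = p
	}
	return nil
}

// Resolve returns the plugin for a task type, or an error if none is registered.
func (r *Registry) Resolve(taskType string) (TaskPlugin, error) {
	p, ok := r.byType[taskType]
	if !ok {
		return nil, fmt.Errorf("no plugin registered for task type %q", taskType)
	}
	return p, nil
}

func main() {
	reg := NewRegistry()
	if err := reg.Register(podPlugin{}, "container", "python-task"); err != nil {
		panic(err)
	}
	p, err := reg.Resolve("python-task")
	if err != nil {
		panic(err)
	}
	res := p.BuildResource("train-model")
	fmt.Println(res["kind"], res["name"], p.PhaseOf("Running"))
}
```

Because dispatch goes through the registry, adding a task kind means registering another implementation; the caller that resolves plugins does not change.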

This design lets Flyte add new task kinds without modifying the scheduler. The same task-type mechanism underpins community proposals such as the DBT plugin in [Issue #2202](https://github.com/flyteorg/flyte/issues/2202), proposed by Gojek as a flytekit plugin for running dbt models as Flyte tasks. Two other commonly requested extensions would likewise build on existing interfaces rather than require rewriting task execution:

- exposing **failure-node** handling through `flytekit` ([Issue #1506](https://github.com/flyteorg/flyte/issues/1506));
- supporting **per-execution overrides** of resources, retries, and Spark/Hive config ([Issue #475](https://github.com/flyteorg/flyte/issues/475)).

## Task Execution via the Executor Operator

The `executor/` directory is a Kubernetes operator scaffolded with **Kubebuilder** and using `sigs.k8s.io/controller-runtime`, as documented in [executor/README.md](https://github.com/flyteorg/flyte/blob/main/executor/README.md). Generated deepcopy methods in [executor/api/v1/zz_generated.deepcopy.go](https://github.com/flyteorg/flyte/blob/main/executor/api/v1/zz_generated.deepcopy.go) (e.g., `PhaseTransition`, `TaskAction`) show that the operator tracks task lifecycle in a CRD: the `PhaseTransition` struct carries an `OccurredAt` timestamp, and `TaskAction` embeds Kubernetes `TypeMeta` and `ObjectMeta` so the operator can produce child resources declaratively.

A typical execution loop is:

1. The Flyte scheduler resolves the next node to run and emits a `TaskEvent` to the plugin.
2. The matching plugin materializes a Kubernetes object (Pod, Spark app, etc.).
3. The **executor** watches its own CRDs and reconciles their state, persisting `PhaseTransition` records.
4. When the underlying resource succeeds, the operator marks the task complete and returns the outputs blob.

Two installation patterns are supported, both described in [executor/README.md](https://github.com/flyteorg/flyte/blob/main/executor/README.md): a single `install.yaml` bundle generated by `make build-installer` for `kubectl apply`, or a Helm chart produced through `kubebuilder edit --plugins=helm/v1-alpha`.

## Shared Infrastructure: flytestdlib

The same `flytestdlib` module is consumed by both `flyteplugins` and the executor, and its [flytestdlib/README.md](https://github.com/flyteorg/flyte/blob/main/flytestdlib/README.md) lists three building blocks relevant to plugin authors:

- **`config`** — strongly typed Go config structs with parsing, validation, and live-reload support. Plugins typically expose configuration this way so operators can tune behavior at runtime.
- **`cli/pflags`** — a small generator that turns those config structs into `pflag` CLI flags, ensuring every config knob is reachable from the binary's command line.
- **`storage`** — a `stow`-backed abstraction over S3, Azure Blob, and GCS, with an in-memory implementation for tests and native protobuf (de)serialization.

A plugin author who needs a new config section, a new flag, or object-storage round-tripping for task outputs should reach for `flytestdlib` rather than re-inventing it.

## Extending Flyte: Practical Guidance

Putting the pieces together, an extension usually touches three layers:

1. **IDL**: the feature may require a new wire field or enum value. Examples include a new task type, an additional app substate alongside those already defined in [gen/ts/flyteidl2/app/app_definition_pb.ts](https://github.com/flyteorg/flyte/blob/main/gen/ts/flyteidl2/app/app_definition_pb.ts), or a new notification channel like the webhook APIs requested in [Issue #2317](https://github.com/flyteorg/flyte/issues/2317). In that case, add it to a `flyteidl2/*.proto` file and regenerate. Code generation pipelines exist for Go (`gen/go/...`), TypeScript (`gen/ts/...`), and Rust (see [flyteidl2/gen_utils/rust/src/lib.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/lib.rs)).
2. **Plugin**: implement the `pluginmachinery` interfaces for the new task type. Use `flytestdlib` for config and storage. Mirror the Pod/Spark/Ray examples in `flyteplugins/go/tasks/plugins/`.
3. **Operator**: if the extension requires a long-running Kubernetes control loop (for example, a custom resource model), add it under `executor/` following the kubebuilder conventions in [executor/README.md](https://github.com/flyteorg/flyte/blob/main/executor/README.md).

Common failure modes to watch for include:

- not bumping the IDL for new fields, so clients and servers desynchronize;
- bypassing `flytestdlib`, which results in divergent config loading;
- registering a plugin without enabling it for the task types it should handle, so the plugin is loaded but never invoked.

The overrides feature requested in [#475](https://github.com/flyteorg/flyte/issues/475) illustrates that extensibility is most often a cross-cutting change spanning IDL, plugin, and operator, not a single-package edit.

## See Also

- [Project Overview & Architecture](./README.md)
- [flytestdlib Shared Components](./flytestdlib/README.md)
- [Executor Operator (Kubebuilder)](./executor/README.md)
- [flyteplugins Module](./flyteplugins/README.md)
- Community: [Issue #2202 DBT plugin](https://github.com/flyteorg/flyte/issues/2202), [Issue #1506 Failure-node support](https://github.com/flyteorg/flyte/issues/1506), [Issue #475 Per-execution overrides](https://github.com/flyteorg/flyte/issues/475), [Issue #2317 Webhook notifications](https://github.com/flyteorg/flyte/issues/2317), [Issue #2772 CI/CD workflow](https://github.com/flyteorg/flyte/issues/2772)

---

<a id='page-4'></a>

## Deployment, Tooling & Operations

### Related Pages

Related topics: [Repository Overview & System Architecture](#page-1), [Backend Services & Data Plane APIs](#page-2)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/flyteorg/flyte/blob/main/README.md)
- [executor/README.md](https://github.com/flyteorg/flyte/blob/main/executor/README.md)
- [executor/api/v1/zz_generated.deepcopy.go](https://github.com/flyteorg/flyte/blob/main/executor/api/v1/zz_generated.deepcopy.go)
- [app/internal/k8s/app_client.go](https://github.com/flyteorg/flyte/blob/main/app/internal/k8s/app_client.go)
- [flytestdlib/README.md](https://github.com/flyteorg/flyte/blob/main/flytestdlib/README.md)
- [flyteidl2/gen_utils/rust/src/google.rpc.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/google.rpc.rs)
- [flyteidl2/gen_utils/rust/src/lib.rs](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/lib.rs)
- [gen/go/gateway/flyteidl2/app/app_definition.swagger.json](https://github.com/flyteorg/flyte/blob/main/gen/go/gateway/flyteidl2/app/app_definition.swagger.json)
- [gen/ts/flyteidl2/app/app_definition_pb.ts](https://github.com/flyteorg/flyte/blob/main/gen/ts/flyteidl2/app/app_definition_pb.ts)
- [gen/python/flyteidl2/app/app_definition_pb2.py](https://github.com/flyteorg/flyte/blob/main/gen/python/flyteidl2/app/app_definition_pb2.py)
</details>

# Deployment, Tooling & Operations

## Overview

Flyte is an open-source orchestrator for data, ML, and analytics pipelines. The repository hosts several deployable components, including the executor, the `app` runtime, the `flytestdlib` shared utilities, and the cross-language IDL (`flyteidl` / `flyteidl2`). It also hosts the tooling needed to install, build, and operate them in production. This area covers the layer that turns Flyte's source code into running clusters, generated language clients, and operational utilities. Source: [README.md:1-30](https://github.com/flyteorg/flyte/blob/main/README.md).

The repository ships several deployment surfaces:

- A Kubernetes-native **executor**, scaffolded with kubebuilder and rendered with Kustomize. It is distributed either as a single `install.yaml` bundle or as a Helm chart, and runs Flyte workloads on a cluster.
- A **flytestdlib** Go module that consolidates operational primitives (configuration, pflag generation, storage) reused across Flyte services.
- A multi-language **IDL** pipeline that emits Go, Python, Rust, and TypeScript client and server bindings from shared protobuf definitions.
- An **app** component that reconciles Flyte `App` custom resources into Knative revisions and surfaces autoscaling metadata.

## Executor: Cluster Operator & Installer

The `executor/` directory is a kubebuilder-managed Kubernetes operator. Its `README.md` documents two supported distribution paths. The first path bundles every rendered Kubernetes manifest into a single `install.yaml`:

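```sh
# Render every manifest, using IMG as the controller image, into dist/install.yaml
make build-installer IMG=<some-registry>/executor:tag
# Install the bundle from the repository that hosts it
kubectl apply -f https://raw.githubusercontent.com/<org>/executor/<tag or branch>/dist/install.yaml
```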

Source: [executor/README.md:5-18](https://github.com/flyteorg/flyte/blob/main/executor/README.md).

The second path is a Helm chart, generated through the `kubebuilder edit --plugins=helm/v1-alpha` plugin and produced under `dist/chart`. The README explicitly warns operators to re-render the chart after manifest changes and to preserve any customizations previously added to `dist/chart/values.yaml` or `dist/chart/manager/manager.yaml`. Source: [executor/README.md:20-33](https://github.com/flyteorg/flyte/blob/main/executor/README.md).

The executor exposes typed APIs declared in `executor/api/v1/`. Generated `DeepCopy` functions (for example, `PhaseTransition.DeepCopyInto`) are produced by `controller-gen`. controller-runtime requires them so that objects can be copied safely before they are mutated. Source: [executor/api/v1/zz_generated.deepcopy.go:1-31](https://github.com/flyteorg/flyte/blob/main/executor/api/v1/zz_generated.deepcopy.go). Treat this file as a build output: it is regenerated by the build, so manual edits will be overwritten.
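The following minimal, hand-written sketch shows why these generated functions matter by reproducing the pattern `controller-gen` emits. The `PhaseTransition` fields below are hypothetical placeholders, not the real ones from `executor/api/v1/`. Pointer and slice fields get fresh allocations, so mutating a copy (for example, one taken from the controller-runtime cache) cannot corrupt the original.

```go
package main

import "fmt"

// PhaseTransition is a hypothetical stand-in for the type in executor/api/v1;
// the real field set is defined there and may differ.
type PhaseTransition struct {
	Phase   string
	Message string
	Retries *int32   // pointer field: a shallow copy would share it
	Tags    []string // slice field: a shallow copy would share the backing array
}

// DeepCopyInto copies the receiver into out, allocating fresh memory for
// pointer and slice fields, in the style controller-gen emits.
func (in *PhaseTransition) DeepCopyInto(out *PhaseTransition) {
	*out = *in // copies value fields (and, for now, the shared references)
	if in.Retries != nil {
		in, out := &in.Retries, &out.Retries
		*out = new(int32)
		**out = **in
	}
	if in.Tags != nil {
		in, out := &in.Tags, &out.Tags
		*out = make([]string, len(*in))
		copy(*out, *in)
	}
}

// DeepCopy allocates a new PhaseTransition and deep-copies the receiver into it.
func (in *PhaseTransition) DeepCopy() *PhaseTransition {
	if in == nil {
		return nil
	}
	out := new(PhaseTransition)
	in.DeepCopyInto(out)
	return out
}

func main() {
	retries := int32(1)
	orig := &PhaseTransition{Phase: "Running", Retries: &retries, Tags: []string{"a"}}
	cp := orig.DeepCopy()
	*cp.Retries = 2
	cp.Tags[0] = "b"
	fmt.Println(*orig.Retries, orig.Tags[0]) // 1 a: the original is untouched
}
```

A plain `*out = *in` assignment alone would leave `Retries` and `Tags` aliased between the two objects. The extra blocks exist to break exactly that aliasing.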

## Flytestdlib: Shared Operational Primitives

`flytestdlib` is the Go module Flyte services import to avoid re-implementing the same operational plumbing. The README enumerates three capabilities:

| Component | Purpose |
|---|---|
| `config` | Strongly-typed Go configuration with parsing, validation, and live file watching. |
| `cli/pflags` | CLI that introspects a Go struct and emits `pflag` definitions for every field; installable via the provided `godownloader.sh` script or Scoop. |
| `storage` | Abstract object-store layer (S3, Azure, GCS) on top of `stow`, with a configurable factory, an in-memory implementation for tests, and native protobuf support. |

Source: [flytestdlib/README.md:13-31](https://github.com/flyteorg/flyte/blob/main/flytestdlib/README.md).

For operators, this means every Flyte service that imports `flytestdlib` shares the same config-loading semantics, the same flag surface, and the same storage abstraction. One set of conventions, and one set of runbooks, therefore applies across services.

## App Runtime and Knative Autoscaling

The `app/internal/k8s/app_client.go` file implements the reconciliation logic that maps a Flyte `App` custom resource onto a Knative revision. The `buildAutoscalingAnnotations` helper translates Flyte's `Autoscaling` spec into the canonical Knative annotations. The mapping is straightforward and worth memorising when debugging scale-out behaviour:

| Flyte field | Knative annotation |
|---|---|
| `autoscaling.replicas.min` | `autoscaling.knative.dev/min-scale` |
| `autoscaling.replicas.max` | `autoscaling.knative.dev/max-scale` |
| `scalingMetric.requestRate.target` | `autoscaling.knative.dev/metric=rps`, `…/target` |
| `scalingMetric.concurrency.target` | `autoscaling.knative.dev/metric=concurrency`, `…/target` |
| `scaledownPeriod` | `autoscaling.knative.dev/window` |

Source: [app/internal/k8s/app_client.go:1-38](https://github.com/flyteorg/flyte/blob/main/app/internal/k8s/app_client.go).

The file also defines a default-message table (`knativeCondDefaultMessages`) that is used when Knative reports a `True` condition without a message, so the UI does not show empty status strings. Empty status strings are a common source of confusion in operations, for example when an app has silently scaled to zero.
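The fallback behaviour can be sketched as a lookup that prefers Knative's own message and substitutes a default only for `True` conditions. The condition type names are standard Knative Service conditions. The `Condition` struct, `displayMessage`, and the message strings are hypothetical placeholders, not the contents of `knativeCondDefaultMessages`.

```go
package main

import "fmt"

// Condition is a hypothetical, trimmed-down view of a Knative status condition.
type Condition struct {
	Type    string // e.g. "Ready", "ConfigurationsReady", "RoutesReady"
	Status  string // "True", "False" or "Unknown"
	Message string
}

// defaultMessages plays the role of knativeCondDefaultMessages; the entries
// are illustrative placeholders, not the strings Flyte actually uses.
var defaultMessages = map[string]string{
	"Ready":               "App is ready to serve traffic",
	"ConfigurationsReady": "Latest revision is configured",
	"RoutesReady":         "Traffic routes are ready",
}

// displayMessage returns the condition's own message when Knative supplied
// one, and otherwise falls back to a default for True conditions.
func displayMessage(c Condition) string {
	if c.Message != "" {
		return c.Message
	}
	if c.Status == "True" {
		if msg, ok := defaultMessages[c.Type]; ok {
			return msg
		}
	}
	return ""
}

func main() {
	fmt.Println(displayMessage(Condition{Type: "Ready", Status: "True"}))
	// App is ready to serve traffic
}
```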

```mermaid
flowchart LR
    A[Flyte App CR] --> B[app_client.go reconciler]
    B --> C{Autoscaling spec?}
    C -- yes --> D[Knative annotations]
    C -- no --> E[No annotations]
    D --> F[Knative Revision]
    F --> G[Pods / scale-to-zero]
    F --> H[Conditions + default messages]
```

## Multi-Language IDL and Code Generation

A consistent operational story requires stable, generated client and server bindings. The repository maintains the `flyteidl` protos and the newer `flyteidl2` protos, and emits them to four targets: Go, Python, Rust, and TypeScript.

The Rust crate root re-exports the generated modules so that downstream Python bindings (via PyO3) can be aggregated from a single place. Source: [flyteidl2/gen_utils/rust/src/lib.rs:1-19](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/lib.rs). The crate also contains the generated Rust form of `google.rpc.Status`, the shared gRPC error envelope. It has three fields: an integer `code`, a developer-facing `message`, and a list of `details` carried as `Any` messages. Source: [flyteidl2/gen_utils/rust/src/google.rpc.rs:1-28](https://github.com/flyteorg/flyte/blob/main/flyteidl2/gen_utils/rust/src/google.rpc.rs).
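The following sketch helps operators read error payloads. It mirrors the three `google.rpc.Status` fields in plain Go structs and renders them for logs. It assumes hand-written stand-ins (`Status`, `Any`, `Describe`) instead of the generated bindings. The numeric codes are the canonical gRPC codes, but only a subset is listed.

```go
package main

import "fmt"

// Any mirrors google.protobuf.Any: a type URL plus serialized bytes.
type Any struct {
	TypeURL string
	Value   []byte
}

// Status mirrors the fields of google.rpc.Status (code, message, details).
// Real code would use the generated types instead of this hand-written copy.
type Status struct {
	Code    int32
	Message string
	Details []Any
}

// codeNames lists a subset of the canonical gRPC status codes.
var codeNames = map[int32]string{
	0:  "OK",
	3:  "INVALID_ARGUMENT",
	4:  "DEADLINE_EXCEEDED",
	5:  "NOT_FOUND",
	7:  "PERMISSION_DENIED",
	13: "INTERNAL",
	14: "UNAVAILABLE",
	16: "UNAUTHENTICATED",
}

// Describe renders a Status for logs, including the type URL of each detail
// so an operator can tell which structured error payloads were attached.
func Describe(s Status) string {
	name, ok := codeNames[s.Code]
	if !ok {
		name = fmt.Sprintf("CODE_%d", s.Code)
	}
	out := fmt.Sprintf("%s: %s", name, s.Message)
	for _, d := range s.Details {
		out += fmt.Sprintf(" [detail %s]", d.TypeURL)
	}
	return out
}

func main() {
	s := Status{
		Code:    5,
		Message: "app not found",
		Details: []Any{{TypeURL: "type.googleapis.com/google.rpc.ResourceInfo"}},
	}
	fmt.Println(Describe(s))
	// NOT_FOUND: app not found [detail type.googleapis.com/google.rpc.ResourceInfo]
}
```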

Concrete artifacts in the tree include the Go gateway Swagger definitions, the TypeScript `protobuf-es` descriptors, and the Python `*_pb2.py` modules. For example, the `App`, `AppWrapper`, `Identifier`, `Meta`, `Condition`, and `Status` types are emitted into TypeScript. These include deployment substates such as `IMAGE_PULL_ERROR`, `CRASH_LOOP`, and `OOM_KILLED`. Source: [gen/ts/flyteidl2/app/app_definition_pb.ts:1-60](https://github.com/flyteorg/flyte/blob/main/gen/ts/flyteidl2/app/app_definition_pb.ts). The same enum values appear in the Python descriptor pool. Source: [gen/python/flyteidl2/app/app_definition_pb2.py:1-1](https://github.com/flyteorg/flyte/blob/main/gen/python/flyteidl2/app/app_definition_pb2.py). Operators should treat all of these files as read-only build outputs; schema changes start from the `*.proto` sources.
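The deployment substates are most useful when triaging a failed app. The following sketch shows one plausible way to derive such substates from standard Kubernetes container reasons (`ErrImagePull`, `ImagePullBackOff`, `CrashLoopBackOff`, `OOMKilled`) and to pair each with a first debugging step. The mapping, the `UNKNOWN` fallback, and the hints are assumptions for illustration, not Flyte's actual logic.

```go
package main

import "fmt"

// Substate reuses the substate names listed for the generated App types;
// the string type and the UNKNOWN fallback are assumptions for this sketch.
type Substate string

const (
	ImagePullError Substate = "IMAGE_PULL_ERROR"
	CrashLoop      Substate = "CRASH_LOOP"
	OOMKilled      Substate = "OOM_KILLED"
	Unknown        Substate = "UNKNOWN"
)

// classify maps a Kubernetes container waiting or terminated reason to a
// substate. This is one plausible mapping, not Flyte's actual logic.
func classify(reason string) Substate {
	switch reason {
	case "ErrImagePull", "ImagePullBackOff":
		return ImagePullError
	case "CrashLoopBackOff":
		return CrashLoop
	case "OOMKilled":
		return OOMKilled
	default:
		return Unknown
	}
}

// hint suggests a first debugging step an operator might take per substate.
func hint(s Substate) string {
	switch s {
	case ImagePullError:
		return "check the image reference and registry credentials"
	case CrashLoop:
		return "inspect container logs from the previous restart"
	case OOMKilled:
		return "raise the memory limit or reduce peak usage"
	default:
		return "inspect pod events"
	}
}

func main() {
	for _, r := range []string{"ImagePullBackOff", "OOMKilled"} {
		s := classify(r)
		fmt.Printf("%s -> %s: %s\n", r, s, hint(s))
	}
}
```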

## Operational Implications from the Community

The community backlog highlights operations gaps that affect the tooling surface directly:

- **CI/CD for production** is an active design topic (issue #2772) because there is no first-party workflow yet — teams currently mirror the Lyft MLOps flow described in the docs.
- **Runtime overrides at execution time** (issue #475): several settings (resources, catalog, retries, Spark/Hive config) can still only be set at registration time. Changing them therefore forces a re-registration cycle that complicates day-2 operations.
- **Failure-node support** (issue #1506) is exposed in the backend but not yet in the Python/Java SDKs, so SREs writing user-facing runbooks must rely on backend hooks.
- **Notification delivery** (issue #2317) currently funnels through email providers (SES, SendGrid, PagerDuty/GitHub/Slack email APIs), which constrains how operators wire alerting.

Each of these is a reason to keep the executor, `flytestdlib`, and the IDL generation pipeline under tight CI: they are the surface area that production incidents depend on.

## See Also

- Executor API reference (`executor/api/v1/`)
- Flytestdlib configuration and storage guide (`flytestdlib/`)
- Flyte App CRD and Knative integration (`app/internal/k8s/`)
- Flyte IDL generation pipeline (`flyteidl2/`)

---


## Pitfall Log

Project: flyteorg/flyte

Summary: 7 structured pitfall items were found, including 1 high-severity (blocking) item. Top priority: installation risk, which requires verification.

### 1. Installation risk (requires verification)

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/flyteorg/flyte/issues/7558

### 2. Capability evidence risk (requires verification)

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/flyteorg/flyte

### 3. Maintenance risk (requires verification)

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/flyteorg/flyte

### 4. Security or permission risk (requires verification)

- Severity: medium
- Evidence strength: source_linked
- Finding: `no_demo`
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/flyteorg/flyte

### 5. Security or permission risk (requires verification)

- Severity: medium
- Evidence strength: source_linked
- Finding: `no_demo`
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/flyteorg/flyte

### 6. Maintenance risk (requires verification)

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/flyteorg/flyte

### 7. Maintenance risk (requires verification)

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/flyteorg/flyte

