Doramagic Project Pack · Human Manual
flyte
Dynamic, resilient AI orchestration. Coordinate data, models, and compute as you build AI workflows.
Repository Overview & System Architecture
Related topics: Backend Services & Data Plane APIs, Plugin System, Task Execution & Extensibility
1. Purpose and Scope
Flyte is an open-source project licensed under Apache 2.0 that provides an extensible orchestration platform for data and machine-learning workflows. The umbrella repository hosts the core backend services, the shared standard library, code-generation utilities, and runtime adapters that together form the Flyte control plane and execution plane. The project's stated contribution entry points are the backend README and the community Slack, indicating that this monorepo is the canonical source of truth for the server-side implementation (README.md).
The repository is organized as a polyglot monorepo. Go is the primary implementation language for backend services, Rust is used for performance-sensitive bindings exposed to Python via PyO3, and TypeScript is generated for web/gateway consumers. Multi-language support is achieved through a Protocol Buffers-based Interface Definition Language (IDL) called flyteidl2, which is the single source of truth for all wire types (flyteidl2/gen_utils/rust/src/lib.rs). Standard cross-cutting concerns (configuration, CLI flag generation, and storage abstractions) are extracted into the flytestdlib shared library (flytestdlib/README.md).
2. Repository Structure and Components
The repository is composed of several logical modules, each with a focused responsibility:
| Module | Role | Notable Artifacts |
|---|---|---|
| flyteidl2/ | Interface Definition Language v2 and code-generation utilities (Rust + PyO3) | gen_utils/rust/src/lib.rs |
| flytestdlib/ | Shared Go library: config, cli/pflags, abstract storage | flytestdlib/README.md |
| executor/ | Kubernetes-native workflow executor component | executor/README.md |
| app/ | Application controller that deploys workloads onto Knative | app/internal/k8s/app_client.go |
| gen/go/ | Generated Go protobuf/gRPC bindings | gen/go/flyteidl2/app/app_definition.pb.go |
| gen/ts/ | Generated TypeScript protobuf bindings | gen/ts/flyteidl2/app/app_definition_pb.ts |
| gen/go/gateway/ | Generated OpenAPI/Swagger definitions for the gateway | gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json |
2.1 Shared Library (`flytestdlib`)
flytestdlib is intentionally narrow. According to its README, it exposes a strongly typed configuration loader, a cli/pflags generator that derives command-line flags from Go structs, and a storage abstraction that uses stow to talk to S3, Azure Blob, and GCS while remaining protobuf-aware and offering an in-memory implementation for testing (flytestdlib/README.md).
2.2 Executor
The executor/ directory contains a Kubernetes-style project that ships with a make build-installer target. The target uses Kustomize to produce a self-contained install.yaml bundle so operators can deploy the executor with kubectl apply -f. An optional Helm chart can be produced via the kubebuilder helm/v1-alpha plugin (executor/README.md).
2.3 App Controller
The app controller reconciles Flyte App custom resources into Knative revisions. It inspects autoscaling settings on a flyteapp.Spec and emits the corresponding autoscaling.knative.dev/* annotations, supporting replica counts, request-rate and concurrency scaling metrics, and a custom scale-down window (app/internal/k8s/app_client.go).
3. Interface Definition Language and Multi-Language Code Generation
At the heart of the architecture sits flyteidl2, a Protocol Buffers-based IDL. The Rust crate under flyteidl2/gen_utils/rust/ orchestrates prost-build to emit Rust types and exposes a PyO3 module so that Python clients can consume the same wire format (flyteidl2/gen_utils/rust/src/lib.rs). The same IDL also drives Go and TypeScript generation.
The generated artifacts show a consistent contract:
- Go: `gen/go/flyteidl2/app/app_definition.pb.go` exposes idiomatic Go structs such as `Link`, plus `Spec_Container` and `Spec_Pod`, the discriminated-union wrappers for the application's payload type.
- TypeScript: `gen/ts/flyteidl2/app/app_definition_pb.ts` mirrors these as TypeScript message types using the `proto3` runtime helpers, including fields like `assignedCluster`, `currentReplicas`, and the oneof-typed `payload` of `AppWrapper`.
- Gateway/OpenAPI: `gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json` and sibling files emit the OpenAPI surface for the HTTP/JSON gateway, reusing Google's `protobufAny` envelope and `google.rpc.Status` error model.
Error reporting is standardized through the google.rpc.Status envelope, which carries a numeric code, an English developer-facing message, and a list of google.protobuf.Any detail payloads (flyteidl2/gen_utils/rust/src/google.rpc.rs). This choice gives every language binding an identical error contract.
4. Runtime Architecture and Community Context
The end-to-end runtime can be summarized as follows:
flowchart LR
User[User / flytekit] -->|gRPC + protobuf| Gateway
Gateway --> Admin[FlyteAdmin]
Admin -->|CRDs| K8s[(Kubernetes API)]
K8s --> Executor[Executor]
K8s --> AppCtrl[App Controller]
AppCtrl -->|Knative annotations| Knative[(Knative Serving)]
Knative --> Pods[Workload Pods]
Executor --> Pods
Pods --> Storage[(Object Storage<br/>S3 / GCS / Azure)]
Pods --> Admin

The IDL layer guarantees that user SDKs, the control plane (FlyteAdmin, gateway), and the data plane (executor, app controller, Knative pods) all speak the same wire types. Storage is mediated through flytestdlib's stow-backed abstraction, which keeps blob I/O uniform across providers (flytestdlib/README.md).
4.1 Active Areas of Community Interest
Several long-running community discussions map directly onto components shipped from this monorepo:
- Failure-node support (Issue #1506): users want the failure-node primitive that already exists in the backend to be exposed in flytekit; this requires evolution of the workflow IDL under `flyteidl2`.
- Runtime overrides during execution (Issue #475): adjusting resources, retries, and catalog settings after registration touches the same IDL types managed in `flyteidl2` and the gRPC services defined under `gen/go/`.
- Webhook-based notifications (Issue #2317): replacing SES/SendGrid with generic webhooks changes FlyteAdmin's notification subsystem rather than the IDL itself.
- DBT plugin (Issue #2202): a new flytekit plugin that would integrate with the existing task-type machinery defined in the IDL.
- CI/CD reference workflow (Issue #2772): calls for documenting production Flyte delivery pipelines, complementing the install bundle produced by the executor component (executor/README.md).
The latest tagged release is v2.0.24, which primarily contains CI refinements and Kubernetes controller-runtime dependency bumps (sigs.k8s.io/controller-runtime 0.23.3 → 0.24.1 and k8s.io/client-go 0.36.0 → 0.36.1), underscoring that keeping pace with upstream Kubernetes APIs is an ongoing concern for the project (README.md).
Source: https://github.com/flyteorg/flyte / Human Manual
Backend Services & Data Plane APIs
Related topics: Repository Overview & System Architecture, Plugin System, Task Execution & Extensibility, Deployment, Tooling & Operations
Overview and Scope
The Flyte 2 backend, housed in this repository, is a Kubernetes-native service that exposes a typed data plane API for managing the lifecycle of *applications* (apps) and their *replicas*. Unlike Flyte 1, where the surface was centered on workflow/node/task identifiers, Flyte 2 organizes its data plane around the concepts of app (a long-running workload specification) and replica (a concrete instance of that spec) Source: [gen/ts/flyteidl2/app/app_definition_pb.ts](https://github.com/flyteorg/flyte/blob/efe332ee213de88b002c5055eb40ec9b5b3efda4/gen/ts/flyteidl2/app/app_definition_pb.ts).
The project README describes this repository as the home for the Kubernetes-native backend infrastructure for deploying Flyte 2 as a distributed, multi-node service, with the protocol buffer definitions and contribution guide in docs/BACKEND_README.md Source: [README.md](https://github.com/flyteorg/flyte/blob/efe332ee213de88b002c5055eb40ec9b5b3efda4/README.md). The companion control plane is provided as a managed service via Union.ai, while the data plane services (logs, replica management, app lifecycle) live in this repo.
The data plane is intentionally multi-language: the same flyteidl2 protocol definitions are compiled to Go (gRPC + gRPC-Gateway HTTP), TypeScript (for the web console and SDKs), and Rust (for high-performance clients such as the Python SDK bindings) Source: [flyteidl2/gen_utils/rust/src/lib.rs](https://github.com/flyteorg/flyte/blob/efe332ee213de88b002c5055eb40ec9b5b3efda4/flyteidl2/gen_utils/rust/src/lib.rs). This shared IDL guarantees that the backend, the Python SDK (flyte-sdk), the CLI, and the UI all speak the same contract.
API Surfaces
The data plane is defined by a small set of focused services, each shipped as a generated Swagger/OpenAPI document under gen/go/gateway/flyteidl2/app/:
| Service | Proto File | Purpose |
|---|---|---|
| AppService | app_service.proto | CRUD and lifecycle operations for applications Source: [gen/go/gateway/flyteidl2/app/app_service.swagger.json](https://github.com/flyteorg/flyte/blob/efe332ee213de88b002c5055eb40ec9b5b3efda4/gen/go/gateway/flyteidl2/app/app_service.swagger.json) |
| ReplicaDefinition | replica_definition.proto | Shapes describing how a replica is materialized Source: [gen/go/gateway/flyteidl2/app/replica_definition.swagger.json](https://github.com/flyteorg/flyte/blob/efe332ee213de88b002c5055eb40ec9b5b3efda4/gen/go/gateway/flyteidl2/app/replica_definition.swagger.json) |
| AppLogsService | app_logs_service.proto | Streaming/tail logs for an app or a specific replica Source: [gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json](https://github.com/flyteorg/flyte/blob/efe332ee213de88b002c5055eb40ec9b5b3efda4/gen/go/gateway/flyteidl2/app/app_logs_service.swagger.json) |
App and Replica Identifiers
Every app is keyed by a four-part identifier — org, project, domain, and name — defined in the flyteidl2.app.Identifier message. This mirrors the Flyte 1 namespace model so that existing multi-tenancy boundaries carry over Source: [gen/ts/flyteidl2/app/app_definition_pb.ts](https://github.com/flyteorg/flyte/blob/efe332ee213de88b002c5055eb40ec9b5b3efda4/gen/ts/flyteidl2/app/app_definition_pb.ts).
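To make the four-part key concrete, here is a minimal sketch in Go. The `AppID` struct, its `Validate` and `Key` methods, and the `/` separator are all assumptions made for illustration; only the four field names (org, project, domain, name) come from the text above, and the generated `flyteidl2.app.Identifier` type is not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// AppID is a hypothetical stand-in for the generated identifier message.
// Only the four field names are taken from the manual.
type AppID struct {
	Org     string
	Project string
	Domain  string
	Name    string
}

// Validate rejects identifiers with an empty component, since every part
// contributes to the tenancy boundary.
func (id AppID) Validate() error {
	if id.Org == "" || id.Project == "" || id.Domain == "" || id.Name == "" {
		return errors.New("identifier requires org, project, domain, and name")
	}
	return nil
}

// Key joins the four parts into one string. The "/" separator is an
// assumption of this sketch, not a documented Flyte convention.
func (id AppID) Key() string {
	return strings.Join([]string{id.Org, id.Project, id.Domain, id.Name}, "/")
}

func main() {
	id := AppID{Org: "acme", Project: "search", Domain: "development", Name: "ranker"}
	if err := id.Validate(); err != nil {
		fmt.Println("invalid identifier:", err)
		return
	}
	fmt.Println(id.Key())
}
```

Keeping validation next to the key derivation means a partially filled identifier is rejected before it can collide with another tenant's key.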
A Status message associated with an app carries cluster placement and replica counts (e.g. assigned_cluster, current_replicas) along with a Condition enum that includes a Substate field for finer-grained failure reasons such as IMAGE_PULL_ERROR. This addresses a long-standing community request (e.g. issue #1506 for richer failure-node semantics) by giving consumers a typed way to introspect why a deployment failed.
Logs API
AppLogsService accepts a TailLogsRequest that uses a oneof to choose between an app_id (all replicas) or a replica_id (a single replica) Source: [gen/ts/flyteidl2/app/app_logs_payload_pb.ts](https://github.com/flyteorg/flyte/blob/efe332ee213de88b002c5055eb40ec9b5b3efda4/gen/ts/flyteidl2/app/app_logs_payload_pb.ts). This lets a user open a unified log stream for a service-style app or drill into a specific pod-level replica, which is essential for debugging long-running deployments.
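The oneof choice between an app and a single replica can be pictured with a sealed interface, which is broadly how Go code tends to model protobuf oneofs. This is a sketch under assumptions: `tailRequest`, `AppTarget`, `ReplicaTarget`, and `describe` are hypothetical names, not the generated `TailLogsRequest` API.

```go
package main

import "fmt"

// logTarget plays the role of the oneof: exactly one concrete type is set.
type logTarget interface{ isLogTarget() }

// AppTarget selects every replica of an app (hypothetical type).
type AppTarget struct{ AppKey string }

// ReplicaTarget selects one replica of an app (hypothetical type).
type ReplicaTarget struct{ AppKey, Replica string }

func (AppTarget) isLogTarget()     {}
func (ReplicaTarget) isLogTarget() {}

// tailRequest is a simplified stand-in for the real request message.
type tailRequest struct{ Target logTarget }

// describe reports which log stream the request would open.
func describe(req tailRequest) string {
	switch t := req.Target.(type) {
	case AppTarget:
		return "all replicas of " + t.AppKey
	case ReplicaTarget:
		return "replica " + t.Replica + " of " + t.AppKey
	default:
		// A nil Target means the oneof was left unset.
		return "no target set"
	}
}

func main() {
	fmt.Println(describe(tailRequest{Target: AppTarget{AppKey: "ranker"}}))
	fmt.Println(describe(tailRequest{Target: ReplicaTarget{AppKey: "ranker", Replica: "ranker-0"}}))
}
```

The type switch forces callers to handle the unset case explicitly, which is the main practical benefit of a oneof over two optional fields.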
Transport, Errors, and Cross-Language Clients
All services are exposed over both gRPC and HTTP/JSON via the gRPC-Gateway pattern, with Google API HTTP annotations generated alongside Source: [gen/ts/google/api/http_pb.ts](https://github.com/flyteorg/flyte/blob/efe332ee213de88b002c5055eb40ec9b5b3efda4/gen/ts/google/api/http_pb.ts). The HTTP path supports GET, POST, PUT, DELETE, PATCH, and custom patterns, enabling RESTful access from browsers, CLIs, and webhook-based integrations — a direct response to the notification design discussion in issue #2317 about replacing email-based notifications with webhook APIs.
Errors follow the canonical google.rpc.Status model (code, message, structured details of type google.protobuf.Any) rather than ad-hoc HTTP error envelopes Source: [flyteidl2/gen_utils/rust/src/google.rpc.rs](https://github.com/flyteorg/flyte/blob/efe332ee213de88b002c5055eb40ec9b5b3efda4/flyteidl2/gen_utils/rust/src/google.rpc.rs). This makes the API easy to consume from any language and easy to evolve: new error categories can be added as Any-typed details without breaking older clients.
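The forward-compatibility claim can be illustrated with a small Go sketch. The `Status` and `Any` structs below are hand-written stand-ins that mirror the three fields named above (they are not the generated types), and the `type.example/...` URLs and the `Render` helper are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Any mirrors the shape of google.protobuf.Any: a type URL plus opaque bytes.
type Any struct {
	TypeURL string
	Value   []byte
}

// Status mirrors the three fields of google.rpc.Status.
type Status struct {
	Code    int32
	Message string
	Details []Any
}

// Render formats a Status for display. Details whose type URL has no
// registered decoder are skipped, which is how an older client can
// tolerate detail types that were added after it shipped.
func Render(s Status, decoders map[string]func([]byte) string) string {
	parts := []string{fmt.Sprintf("code=%d message=%q", s.Code, s.Message)}
	for _, d := range s.Details {
		if decode, ok := decoders[d.TypeURL]; ok {
			parts = append(parts, decode(d.Value))
		}
	}
	return strings.Join(parts, "; ")
}

func main() {
	decoders := map[string]func([]byte) string{
		"type.example/RetryHint": func(b []byte) string { return "retry: " + string(b) },
	}
	s := Status{
		Code:    5, // NOT_FOUND in google.rpc.Code
		Message: "app not found",
		Details: []Any{
			{TypeURL: "type.example/RetryHint", Value: []byte("after 30s")},
			{TypeURL: "type.example/FutureDetail", Value: []byte("ignored by this client")},
		},
	}
	fmt.Println(Render(s, decoders))
}
```

The client still surfaces the code and message for the detail type it does not know, so adding a new detail never turns a readable error into an unreadable one.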
The Rust crate under flyteidl2/gen_utils/rust/src uses prost for decoding and pyo3 to expose the messages directly to Python, which is how the high-performance Python SDK bindings are produced without hand-written glue code Source: [flyteidl2/gen_utils/rust/src/lib.rs](https://github.com/flyteorg/flyte/blob/efe332ee213de88b002c5055eb40ec9b5b3efda4/flyteidl2/gen_utils/rust/src/lib.rs).
The Data Plane Sidecar: Flyte CoPilot
While the gRPC/HTTP services handle *control* operations (create app, list replicas, fetch status), the data plane for moving bytes in and out of containers is handled by flytecopilot Source: [flytecopilot/README.md](https://github.com/flyteorg/flyte/blob/efe332ee213de88b002c5055eb40ec9b5b3efda4/flytecopilot/README.md). CoPilot runs as a sidecar or init container inside the user's pod and operates in two modes:
- Downloader — runs before the main container starts, materializing Flyte metadata (and any configured input data) into a shared volume so that arbitrary containers can be orchestrated by Flyte without Flyte-specific code.
- Sidecar: runs in parallel with the main container, monitors its lifecycle, and uploads the metadata it produces back to remote storage when the container exits (signaled by a `_SUCCESS` file).
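The sidecar's "wait for the marker, then upload" behaviour can be sketched as a simple polling loop. This is an illustration only, assuming a shared directory and a polling strategy; `waitForMarker` and `runDemo` are hypothetical helpers and do not reflect how flytecopilot is actually implemented.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// waitForMarker polls dir until a file called name exists or the timeout
// elapses. Errors other than "file does not exist" are returned at once.
func waitForMarker(dir, name string, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	path := filepath.Join(dir, name)
	for {
		_, err := os.Stat(path)
		if err == nil {
			return nil
		}
		if !errors.Is(err, os.ErrNotExist) {
			return err
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("marker %s not found within %s", path, timeout)
		}
		time.Sleep(interval)
	}
}

// runDemo simulates a main container that writes the marker shortly after
// starting, while the "sidecar" waits for it on a shared directory.
func runDemo() error {
	dir, err := os.MkdirTemp("", "copilot-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	go func() {
		time.Sleep(50 * time.Millisecond)
		_ = os.WriteFile(filepath.Join(dir, "_SUCCESS"), nil, 0o644)
	}()
	return waitForMarker(dir, "_SUCCESS", 10*time.Millisecond, 2*time.Second)
}

func main() {
	if err := runDemo(); err != nil {
		fmt.Println("wait failed:", err)
		return
	}
	fmt.Println("marker found; outputs can be uploaded")
}
```

A bounded timeout matters here: without it, a main container that crashes before writing the marker would leave the sidecar waiting indefinitely.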
flowchart LR
SDK[flyte-sdk / CLI] -->|gRPC + HTTP| GW[gRPC-Gateway]
GW --> SVC[AppService / AppLogsService]
SVC --> K8s[Kubernetes API]
K8s --> POD[Pod: main + co-pilot]
POD -->|inputs| CP[flyte-copilot downloader]
POD -->|outputs| CS[flyte-copilot sidecar]
CS --> OBJ[Object store]

This separation lets the data plane API remain small and stable while the heavyweight data movement happens out-of-band on the node itself, which is the pattern that enables generic overrides during execution (community discussion in issue #475): resources, env vars, and storage locations are read by CoPilot from the pod spec at runtime rather than baked into a registered workflow.
See Also
- flyte-sdk — Python SDK that consumes this data plane
- docs/BACKEND_README.md — backend architecture and contribution guide
- Union.ai Flyte 2 docs — managed control plane reference
- Related issues: #1506 Failure-Node support, #2317 Webhook notifications, #475 Overrides during execution
Plugin System, Task Execution & Extensibility
Related topics: Repository Overview & System Architecture, Backend Services & Data Plane APIs
Flyte is a Kubernetes-native orchestrator for data and machine-learning workflows. Its power comes from a layered, pluggable architecture: an Interface Definition Language (IDL) defines the contracts between user code and the control plane, a Go-based plugin machinery executes tasks on backend resources, and a growing set of community-contributed task plugins lets Flyte run arbitrary workloads (Pod, Spark, Ray, DBT, and more). This page documents how those layers fit together and how to reason about extending the system.
High-Level Architecture
The monorepo is organized into cooperating components. The top-level README.md positions Flyte as a workflow engine that compiles, schedules, and dispatches tasks onto Kubernetes. Beneath that surface, four modules are central to extensibility:
| Module | Role | Source of truth |
|---|---|---|
| flyteidl2/ | Protobuf IDL that defines every wire type (tasks, workflows, apps, secrets) | flyteidl2/gen_utils/rust/src/lib.rs |
| flytestdlib/ | Shared Go library (config, pflags, storage, logging) | flytestdlib/README.md |
| flyteplugins/ | Plugin machinery and per-task-type plugins (Pod, Spark, Ray, …) | flyteplugins/README.md |
| executor/ | Kubernetes operator built with controller-runtime that reconciles custom resources | executor/README.md |
flowchart LR
User[User SDK / flytekit] -->|Register & Launch| Admin[Flyteadmin + Scheduler]
Admin -->|Task event| Plugins[flyteplugins: pluginmachinery]
Plugins -->|Driver resource| K8s[(Kubernetes API)]
K8s --> Exec[executor: kubebuilder operator]
Exec -->|Phase transitions| Plugins
Plugins --> Admin
Admin -->|Status / Outputs| User
The IDL sits at the bottom of the dependency stack. As shown in flyteidl2/gen_utils/rust/src/lib.rs, the generated Rust crate re-exports submodules for actions, app, auth, project, common, workflow, logs.dataplane, core, notification, task, trigger, and secret. Every Flyte component — including plugins — consumes types from this single source. A canonical example is the google.rpc.Status envelope used uniformly for error propagation across services (flyteidl2/gen_utils/rust/src/google.rpc.rs).
Plugin Machinery and Task Plugins
The flyteplugins module is the extensibility surface for backend-side execution. The short flyteplugins/README.md declares it as the home of "Plugins contributed by flyte community." In practice, the module is split into two layers:
- Plugin machinery (`tasks/pluginmachinery/`): generic interfaces that a plugin must implement, plus helpers for resource construction, secret resolution, and event reporting.
- Concrete plugins (`tasks/plugins/.../`): task-type-specific drivers. The classic example is the Pod plugin, which converts a Flyte task into a Kubernetes Pod; Spark and Ray plugins follow the same shape but emit `SparkApplication` or `RayJob` custom resources instead.
Plugins subscribe to the task-type identifier declared in the IDL. A plugin is responsible for:
- Building the underlying Kubernetes object for the task (Pod, Spark cluster, Ray cluster, sidecar container, etc.).
- Watching the resource and translating its status into Flyte's `core.WorkflowExecution` / `NodeExecution` event stream.
- Honoring Flyte annotations such as retries, resources, and the catalog.
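The "subscribe to a task-type identifier" idea can be sketched as a registry keyed by task type. Everything named here (`TaskPlugin`, `Registry`, the `container` and `spark` type strings) is hypothetical and deliberately much smaller than the real `pluginmachinery` interfaces, which this sketch does not attempt to reproduce.

```go
package main

import "fmt"

// TaskPlugin is a hypothetical interface, NOT the pluginmachinery API.
type TaskPlugin interface {
	// BuildResource names the Kubernetes object the plugin would create.
	BuildResource(taskName string) string
}

type podPlugin struct{}

func (podPlugin) BuildResource(taskName string) string { return "Pod/" + taskName }

type sparkPlugin struct{}

func (sparkPlugin) BuildResource(taskName string) string { return "SparkApplication/" + taskName }

// Registry maps a task-type identifier to the plugin that handles it.
type Registry struct{ plugins map[string]TaskPlugin }

func NewRegistry() *Registry { return &Registry{plugins: map[string]TaskPlugin{}} }

// Register refuses duplicates so two plugins cannot claim one task type.
func (r *Registry) Register(taskType string, p TaskPlugin) error {
	if _, exists := r.plugins[taskType]; exists {
		return fmt.Errorf("task type %q already registered", taskType)
	}
	r.plugins[taskType] = p
	return nil
}

// Dispatch finds the plugin for taskType and asks it to build a resource.
func (r *Registry) Dispatch(taskType, taskName string) (string, error) {
	p, ok := r.plugins[taskType]
	if !ok {
		return "", fmt.Errorf("no plugin registered for task type %q", taskType)
	}
	return p.BuildResource(taskName), nil
}

func main() {
	r := NewRegistry()
	_ = r.Register("container", podPlugin{})
	_ = r.Register("spark", sparkPlugin{})

	for _, taskType := range []string{"container", "spark", "ray"} {
		res, err := r.Dispatch(taskType, "train-model")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(taskType, "->", res)
	}
}
```

Because dispatch is a map lookup on the task type, adding a new task kind is a registration call rather than a change to the dispatching code.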
This design lets Flyte add new task kinds without modifying the scheduler. The community has exploited this repeatedly — for example, the DBT plugin proposed by Gojek in Issue #2202 plugs into the same machinery to run dbt models as Flyte tasks. Other commonly requested extensions, such as exposing failure-node handling through flytekit (Issue #1506) and supporting per-execution overrides of resources, retries, and Spark/Hive config (Issue #475), are also addressed by enriching the plugin interface rather than rewriting task execution.
Task Execution via the Executor Operator
The executor/ directory is a Kubernetes operator scaffolded with Kubebuilder and using sigs.k8s.io/controller-runtime, as documented in executor/README.md. Generated deepcopy methods in executor/api/v1/zz_generated.deepcopy.go (e.g., PhaseTransition, TaskAction) show that the operator tracks task lifecycle in a CRD: the PhaseTransition struct carries an OccurredAt timestamp, and TaskAction embeds Kubernetes TypeMeta and ObjectMeta so the operator can produce child resources declaratively.
A typical execution loop is:
1. The Flyte scheduler resolves the next node to run and emits a `TaskEvent` to the plugin.
2. The matching plugin materializes a Kubernetes object (Pod, Spark app, etc.).
3. The executor watches its own CRDs and reconciles their state, persisting `PhaseTransition` records.
4. When the underlying resource succeeds, the operator marks the task complete and returns the outputs blob.
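The bookkeeping side of this loop, persisting phase transitions as the resource moves through its lifecycle, can be sketched as follows. The `PhaseTransition` struct here is a hand-written approximation: only the `OccurredAt` field is confirmed by the text above, while the `Phase` field, the phase names, and the `record` helpers are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// PhaseTransition approximates the CRD record described above. Only
// OccurredAt is confirmed by the manual; Phase is assumed for the sketch.
type PhaseTransition struct {
	Phase      string
	OccurredAt time.Time
}

// record appends a transition only when the phase actually changes, so
// repeated reconciles of an unchanged resource do not grow the history.
func record(history []PhaseTransition, phase string, at time.Time) []PhaseTransition {
	if n := len(history); n > 0 && history[n-1].Phase == phase {
		return history
	}
	return append(history, PhaseTransition{Phase: phase, OccurredAt: at})
}

// recordNow is a convenience wrapper that stamps the current time.
func recordNow(history []PhaseTransition, phase string) []PhaseTransition {
	return record(history, phase, time.Now())
}

func main() {
	var history []PhaseTransition
	// "Running" is observed twice, as it would be across two reconciles.
	for _, phase := range []string{"Queued", "Running", "Running", "Succeeded"} {
		history = recordNow(history, phase)
	}
	for _, tr := range history {
		fmt.Println(tr.Phase, tr.OccurredAt.Format(time.RFC3339))
	}
}
```

Deduplicating on the last recorded phase keeps the reconcile loop idempotent, which is the property a controller needs when the same state is observed many times.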
Two installation patterns are supported, both described in executor/README.md: a single install.yaml bundle generated by make build-installer for kubectl apply, or a Helm chart produced through kubebuilder edit --plugins=helm/v1-alpha.
Shared Infrastructure: flytestdlib
The same flytestdlib module is consumed by both flyteplugins and the executor, and its flytestdlib/README.md lists three building blocks relevant to plugin authors:
- `config`: strongly typed Go config structs with parsing, validation, and live-reload support. Plugins typically expose configuration this way so operators can tune behavior at runtime.
- `cli/pflags`: a small generator that turns those config structs into `pflag` CLI flags, ensuring every config knob is reachable from the binary's command line.
- `storage`: a `stow`-backed abstraction over S3, Azure Blob, and GCS, with an in-memory implementation for tests and native protobuf (de)serialization.
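The idea of deriving flags from a config struct can be shown with a runtime-reflection sketch. Note the differences from the real tool: `cli/pflags` is described as a code generator targeting `pflag`, whereas this example uses reflection and the standard library `flag` package purely to illustrate the mapping. `StorageConfig`, `bindFlags`, and `parseArgs` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"reflect"
	"strings"
)

// StorageConfig is a hypothetical config section used for the demo.
type StorageConfig struct {
	Bucket     string
	MaxRetries int
	Verbose    bool
}

// bindFlags registers one flag per exported string, int, or bool field of
// the struct that cfg points to. Flag names are lower-cased field names,
// and the current field values become the flag defaults.
func bindFlags(fs *flag.FlagSet, cfg interface{}) error {
	v := reflect.ValueOf(cfg)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("bindFlags needs a pointer to a struct, got %T", cfg)
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		if field.PkgPath != "" {
			continue // unexported field: cannot be set from outside
		}
		name := strings.ToLower(field.Name)
		switch p := v.Field(i).Addr().Interface().(type) {
		case *string:
			fs.StringVar(p, name, *p, "sets "+field.Name)
		case *int:
			fs.IntVar(p, name, *p, "sets "+field.Name)
		case *bool:
			fs.BoolVar(p, name, *p, "sets "+field.Name)
		default:
			return fmt.Errorf("unsupported type %s for field %s", field.Type, field.Name)
		}
	}
	return nil
}

// parseArgs builds a config with defaults and overrides it from args.
func parseArgs(args []string) (StorageConfig, error) {
	cfg := StorageConfig{Bucket: "default-bucket", MaxRetries: 3}
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	if err := bindFlags(fs, &cfg); err != nil {
		return cfg, err
	}
	return cfg, fs.Parse(args)
}

func main() {
	cfg, err := parseArgs([]string{"-bucket", "my-bucket", "-maxretries=5", "-verbose"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

A generator has the advantage of catching unsupported field types at build time; the reflection version above only discovers them when `bindFlags` runs.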
A plugin author who needs a new config section, a new flag, or object-storage round-tripping for task outputs should reach for flytestdlib rather than re-inventing it.
Extending Flyte: Practical Guidance
Putting the pieces together, an extension usually touches three layers:
- IDL: if the new feature requires a new wire field (e.g., a new task-type enum, a new app substate such as the `Status.Substate` already present in gen/ts/flyteidl2/app/app_definition_pb.ts, or a new notification channel like the webhook APIs requested in Issue #2317), add it to a `flyteidl2/*.proto` file and regenerate. Code generation pipelines exist for Go (`gen/go/...`), TypeScript (`gen/ts/...`), and Rust (see flyteidl2/gen_utils/rust/src/lib.rs).
- Plugin: implement the `pluginmachinery` interfaces for the new task type. Use `flytestdlib` for config and storage. Mirror the Pod/Spark/Ray examples in `flyteplugins/go/tasks/plugins/`.
- Operator: if the extension requires a long-running Kubernetes control loop (for example, a custom resource model), add it under `executor/` following the kubebuilder conventions in executor/README.md.
Common failure modes to watch for include: not bumping the IDL for new fields (clients and servers desynchronize), bypassing flytestdlib (resulting in divergent config loading), and registering a plugin without a corresponding scheduler hookup (the plugin is loaded but never invoked). Issues such as the CI/CD workflow request in #2772 and the overrides feature in #475 illustrate that extensibility is most often a cross-cutting change spanning IDL, plugin, and operator — not a single-package edit.
Deployment, Tooling & Operations
Related topics: Repository Overview & System Architecture, Backend Services & Data Plane APIs
Overview
Flyte is an open-source orchestrator for data, ML, and analytics pipelines. The repository hosts multiple deployable components — including the executor, the app runtime, the flytestdlib shared utilities, and the cross-language IDL (flyteidl / flyteidl2) — along with the tooling required to install, build, and operate them in production. The Deployment, Tooling & Operations area is therefore the layer that turns Flyte's source code into running clusters, generated language clients, and operational utilities. Source: README.md:1-30.
The repository ships several deployment surfaces:
- A Kubernetes-native executor packaged with Kustomize and Helm (and a kubebuilder-style installer), intended to run Flyte workloads on a cluster.
- A flytestdlib Go module that consolidates operational primitives (config, pflag generation, storage) reused by every Flyte service.
- A multi-language IDL pipeline that emits Go, Python, Rust, and TypeScript clients/servers from a single protobuf definition.
- An app component that reconciles Flyte `App` custom resources into Knative revisions and surfaces autoscaling metadata.
Executor: Cluster Operator & Installer
The executor/ directory is a kubebuilder-managed Kubernetes operator. Its README.md documents two supported distribution paths. The first path bundles every rendered Kubernetes manifest into a single install.yaml:
make build-installer IMG=<some-registry>/executor:tag
kubectl apply -f https://raw.githubusercontent.com/<org>/executor/<tag or branch>/dist/install.yaml
Source: executor/README.md:5-18.
The second path is a Helm chart, generated through the kubebuilder edit --plugins=helm/v1-alpha plugin and produced under dist/chart. The README explicitly warns operators to re-render the chart after manifest changes and to preserve any customizations previously added to dist/chart/values.yaml or dist/chart/manager/manager.yaml. Source: executor/README.md:20-33.
The executor exposes typed APIs declared in executor/api/v1/. Generated DeepCopy functions (for example, PhaseTransition.DeepCopyInto) are produced by controller-gen and are required by the controller-runtime runtime for safe object copies. Source: executor/api/v1/zz_generated.deepcopy.go:1-31. Operators upgrading the chart should not edit this file manually; it is regenerated by the build.
Flytestdlib: Shared Operational Primitives
flytestdlib is the Go module Flyte services import to avoid re-implementing the same operational plumbing. The README enumerates three capabilities:
| Component | Purpose |
|---|---|
| config | Strongly-typed Go configuration with parsing, validation, and live file watching. |
| cli/pflags | CLI that introspects a Go struct and emits pflag definitions for every field; installable via the provided godownloader.sh script or Scoop. |
| storage | Abstract object-store layer (S3, Azure, GCS) on top of stow, with a configurable factory, an in-memory implementation for tests, and native protobuf support. |
Source: flytestdlib/README.md:13-31.
For operators, this means every Flyte service uses the same config-loader semantics, the same flag surface, and the same storage abstractions, which simplifies SRE runbooks and observability.
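The value of a shared storage abstraction with an in-memory backend is easiest to see in a small sketch. The `BlobStore` interface and `memStore` type below are hypothetical and far narrower than flytestdlib's real storage package; they only illustrate why code written against an interface can be tested without a cloud bucket.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotFound is returned when a key has no stored object.
var ErrNotFound = errors.New("object not found")

// BlobStore is a hypothetical minimal interface, not flytestdlib's API.
type BlobStore interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
}

// memStore keeps objects in a map; safe for concurrent use.
type memStore struct {
	mu      sync.RWMutex
	objects map[string][]byte
}

func NewMemStore() BlobStore { return &memStore{objects: map[string][]byte{}} }

func (m *memStore) Put(key string, data []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	// Copy so later changes to the caller's slice do not alter the store.
	m.objects[key] = append([]byte(nil), data...)
	return nil
}

func (m *memStore) Get(key string) ([]byte, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	data, ok := m.objects[key]
	if !ok {
		return nil, fmt.Errorf("%w: %s", ErrNotFound, key)
	}
	return append([]byte(nil), data...), nil
}

func main() {
	var store BlobStore = NewMemStore()
	_ = store.Put("outputs/result.pb", []byte("serialized outputs"))

	data, err := store.Get("outputs/result.pb")
	fmt.Println(string(data), err)

	_, err = store.Get("outputs/missing.pb")
	fmt.Println(errors.Is(err, ErrNotFound))
}
```

A service that accepts a `BlobStore` can be pointed at the in-memory version in unit tests and at a cloud-backed implementation in production without any change to its own code.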
App Runtime and Knative Autoscaling
The app/internal/k8s/app_client.go file implements the reconciliation logic that maps a Flyte App custom resource onto a Knative revision. The buildAutoscalingAnnotations helper translates Flyte's Autoscaling spec into the canonical Knative annotations. The mapping is straightforward and worth memorising when debugging scale-out behaviour:
| Flyte field | Knative annotation |
|---|---|
| autoscaling.replicas.min | autoscaling.knative.dev/min-scale |
| autoscaling.replicas.max | autoscaling.knative.dev/max-scale |
| scalingMetric.requestRate.target | autoscaling.knative.dev/metric=rps, …/target |
| scalingMetric.concurrency.target | autoscaling.knative.dev/metric=concurrency, …/target |
| scaledownPeriod | autoscaling.knative.dev/window |
Source: app/internal/k8s/app_client.go:1-38.
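The mapping in the table can be sketched as a small function. This is not the body of `buildAutoscalingAnnotations`: the `Autoscaling` struct, its field names, the choice to express the window in whole seconds, and the rule that request rate wins when both metrics are set are all assumptions made for the example. The annotation keys and the `rps` and `concurrency` metric values are the ones listed above.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// Autoscaling is a hypothetical, simplified view of the Flyte spec.
type Autoscaling struct {
	MinReplicas       *int
	MaxReplicas       *int
	RequestRateTarget *int // requests per second per replica
	ConcurrencyTarget *int // in-flight requests per replica
	ScaledownSeconds  int  // 0 means "not set"
}

func intPtr(v int) *int { return &v }

// knativeAnnotations maps the spec onto Knative autoscaling annotations.
// A nil spec yields an empty map, so Knative defaults apply.
func knativeAnnotations(a *Autoscaling) map[string]string {
	const prefix = "autoscaling.knative.dev/"
	out := map[string]string{}
	if a == nil {
		return out
	}
	if a.MinReplicas != nil {
		out[prefix+"min-scale"] = strconv.Itoa(*a.MinReplicas)
	}
	if a.MaxReplicas != nil {
		out[prefix+"max-scale"] = strconv.Itoa(*a.MaxReplicas)
	}
	// Only one metric is emitted; this sketch prefers request rate.
	switch {
	case a.RequestRateTarget != nil:
		out[prefix+"metric"] = "rps"
		out[prefix+"target"] = strconv.Itoa(*a.RequestRateTarget)
	case a.ConcurrencyTarget != nil:
		out[prefix+"metric"] = "concurrency"
		out[prefix+"target"] = strconv.Itoa(*a.ConcurrencyTarget)
	}
	if a.ScaledownSeconds > 0 {
		out[prefix+"window"] = strconv.Itoa(a.ScaledownSeconds) + "s"
	}
	return out
}

func main() {
	spec := &Autoscaling{
		MinReplicas:       intPtr(1),
		MaxReplicas:       intPtr(5),
		RequestRateTarget: intPtr(100),
		ScaledownSeconds:  120,
	}
	ann := knativeAnnotations(spec)

	// Sort keys so the printed order is stable.
	keys := make([]string, 0, len(ann))
	for k := range ann {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, ann[k])
	}
}
```

When debugging scale-out behaviour, comparing the annotations on the live Knative revision against the output of a function like this is a quick way to tell whether the spec or the reconciler is at fault.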
When Knative returns a True condition without a message, the file also defines a default-message table (knativeCondDefaultMessages) so the UI does not show empty status strings. This is a common operational gotcha when an app silently scales to zero.
flowchart LR
A[Flyte App CR] --> B[app_client.go reconciler]
B --> C{Autoscaling spec?}
C -- yes --> D[Knative annotations]
C -- no --> E[No annotations]
D --> F[Knative Revision]
F --> G[Pods / scale-to-zero]
F --> H[Conditions + default messages]

Multi-Language IDL and Code Generation
A consistent operational story requires stable, generated client/server bindings. The repository maintains flyteidl and the newer flyteidl2 protos, and emits them to four targets. The Rust crate root re-exports the generated modules so downstream Python bindings (via PyO3) can be aggregated from a single place. Source: flyteidl2/gen_utils/rust/src/lib.rs:1-19. The shared google.rpc.Status error envelope used by every gRPC surface — code, message, and detail Any messages — is generated into Rust here. Source: flyteidl2/gen_utils/rust/src/google.rpc.rs:1-28.
Concrete artifacts in the tree include the Go gateway Swagger definitions, the TypeScript protobuf-es descriptors, and the Python *_pb2.py modules. For example, the App, AppWrapper, Identifier, Meta, Condition, and Status types — including deployment substates such as IMAGE_PULL_ERROR, CRASH_LOOP, and OOM_KILLED — are emitted into TypeScript. Source: gen/ts/flyteidl2/app/app_definition_pb.ts:1-60. The same enum values appear in the Python descriptor pool. Source: gen/python/flyteidl2/app/app_definition_pb2.py:1-1. Operators should treat all of these files as read-only build outputs; schema changes start from the *.proto sources.
Operational Implications from the Community
The community backlog highlights operations gaps that affect the tooling surface directly:
- CI/CD for production is an active design topic (issue #2772) because there is no first-party workflow yet — teams currently mirror the Lyft MLOps flow described in the docs.
- Runtime overrides at execution time (issue #475) are still partially limited to registration-time configuration (resources, catalog, retries, Spark/Hive config), forcing a re-register cycle that complicates day-2 operations.
- Failure-node support (issue #1506) is exposed in the backend but not yet in the Python/Java SDKs, so SREs writing user-facing runbooks must rely on backend hooks.
- Notification delivery (issue #2317) currently funnels through email providers (SES, SendGrid, PagerDuty/GitHub/Slack email APIs), which constrains how operators wire alerting.
Each of these is a reason to keep the executor, flytestdlib, and the IDL generation pipeline under tight CI: they are the surface area that production incidents depend on.
See Also
- Executor API reference (`executor/api/v1/`)
- Flytestdlib configuration and storage guide (`flytestdlib/`)
- Flyte App CRD and Knative integration (`app/internal/k8s/`)
- Flyte IDL generation pipeline (`flyteidl2/`)
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 7 structured pitfall item(s), including 1 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.
1. Installation risk: Installation risk requires verification
- Severity: high
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/flyteorg/flyte/issues/7558
2. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/flyteorg/flyte
3. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/flyteorg/flyte
4. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/flyteorg/flyte
5. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/flyteorg/flyte
6. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/flyteorg/flyte
7. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/flyteorg/flyte
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using flyte with real data or production workflows.
- Fix typo: rename ActionMetadata.funtion_name → function_name (proto, bac - github / github_issue
- v2.0.24 - github / github_release
- v2.0.23 - github / github_release
- v2.0.22 - github / github_release
- v2.0.21 - github / github_release
- v2.0.20 - github / github_release
- v2.0.19 - github / github_release
- Flyte v1.16.7 milestone release - github / github_release
- v2.0.18 - github / github_release
- v2.0.17 - github / github_release
- v2.0.16 - github / github_release
- Capability evidence risk requires verification - GitHub / issue
Source: Project Pack community evidence and pitfall evidence