# https://github.com/apache/tvm Project Manual

Generated at: 2026-06-17 07:15:21 UTC

## Table of Contents

- [TVM Overview & Unity Architecture](#page-1)
- [Frontends, Relax Graph IR & Transformations](#page-2)
- [TensorIR Scheduling & Backend Code Generation](#page-3)
- [MetaSchedule Auto-Tuning, Runtime & Deployment](#page-4)

<a id='page-1'></a>

## TVM Overview & Unity Architecture

### Related Pages

Related topics: [Frontends, Relax Graph IR & Transformations](#page-2), [TensorIR Scheduling & Backend Code Generation](#page-3), [MetaSchedule Auto-Tuning, Runtime & Deployment](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/apache/tvm/blob/main/README.md)
- [web/README.md](https://github.com/apache/tvm/blob/main/web/README.md)
- [web/package.json](https://github.com/apache/tvm/blob/main/web/package.json)
- [web/src/ctypes.ts](https://github.com/apache/tvm/blob/main/web/src/ctypes.ts)
- [ci/README.md](https://github.com/apache/tvm/blob/main/ci/README.md)
- [ci/jenkins/README.md](https://github.com/apache/tvm/blob/main/ci/jenkins/README.md)
- [ci/scripts/package/README.md](https://github.com/apache/tvm/blob/main/ci/scripts/package/README.md)
- [docker/README.md](https://github.com/apache/tvm/blob/main/docker/README.md)
- [src/backend/vulkan/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/vulkan/runtime/README.md)
- [src/backend/hexagon/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/README.md)
- [src/backend/opencl/runtime/opencl_wrapper/README.md](https://github.com/apache/tvm/blob/main/src/backend/opencl/runtime/opencl_wrapper/README.md)
- [apps/ios_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/ios_rpc/README.md)
- [apps/android_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/android_rpc/README.md)
- [apps/cpp_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/cpp_rpc/README.md)
- [jvm/README.md](https://github.com/apache/tvm/blob/main/jvm/README.md)
- [python/tvm/relax/backend/contrib/example_npu/README.md](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md)
</details>

# TVM Overview & Unity Architecture

## 1. Project Purpose and Core Principles

Apache TVM is an open deep learning compiler framework that follows two guiding principles stated directly in the project root: *Python-first development that enables quick customization of machine learning compiler pipelines*, and *Universal deployment to bring models into minimum deployable modules* [README.md](https://github.com/apache/tvm/blob/main/README.md).

The project began as a research effort for deep learning compilation and incorporated ideas from Halide (parts of TIR and the arithmetic simplification module), Loopy (integer-set analysis and loop transformation primitives), and Theano (the symbolic scan operator for recurrence) [README.md](https://github.com/apache/tvm/blob/main/README.md). After several rounds of redesign, the current architecture centers on a *cross-level design with TensorIR*, unifying high-level graph optimizations and loop-level scheduling in a single abstraction [README.md](https://github.com/apache/tvm/blob/main/README.md). This evolution was tracked publicly through community RFCs, including the TensorIR scheduling tracking issue (#7527) and the v0.5 and v0.8 roadmaps.

The license is Apache-2.0, and the project operates under the Apache committer model [README.md](https://github.com/apache/tvm/blob/main/README.md).

## 2. The Unity Architecture

The "Unity" initiative, put to a community vote in issue #16368, is TVM's unified compilation stack for modern GenAI workloads such as Stable Diffusion, Whisper, and GPT-class and other open LLMs. The architecture is exposed through the Python package under `python/tvm/relax/`, where Relax serves as the high-level IR and pairs with TensorIR for low-level scheduling [python/tvm/relax/backend/contrib/example_npu/README.md](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md).

A key extensibility mechanism is **BYOC (Bring Your Own Codegen)**, which lets external accelerator vendors plug into the Unity pipeline. The `example_npu` directory shows a hands-on pattern for adding an NPU backend targeting mobile NPUs (AMD XDNA, Google Edge TPU, Samsung NPU), dedicated AI chips (Intel Movidius, Qualcomm Hexagon, MediaTek APU), cloud AI accelerators (AWS Inferentia, Google TPU, Microsoft Azure Maia), and custom ASICs [python/tvm/relax/backend/contrib/example_npu/README.md](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md).

The runtime side of Unity relies on a stable C ABI and a typed FFI layer; `tvmjs` (the TVM WASM/WebGPU runtime for JS/TS) is one of the bindings built on it. The TypeScript binding in [web/src/ctypes.ts](https://github.com/apache/tvm/blob/main/web/src/ctypes.ts) defines `kTVMFFIObject`, `kTVMFFIFunction`, `kTVMFFITensor`, `kTVMFFIArray`, and related type codes, illustrating the object ABI that the compiler, runtime, and language bindings all share.

```mermaid
flowchart LR
    A[Python Frontend<br/>Relax / TIR] --> B[Compiler Pipeline<br/>TensorIR + BYOC]
    B --> C[Target Backends]
    C --> C1[CUDA / Vulkan]
    C --> C2[Hexagon / OpenCL]
    C --> C3[Metal / WebGPU]
    B --> D[Runtime Artifacts]
    D --> D1[C/C++ libtvm_runtime]
    D --> D2[TVM4J JVM]
    D --> D3[tvmjs WASM/JS]
    D --> D4[RPC: iOS / Android / C++]
```

## 3. Deployment Surfaces and Backend Runtimes

Unity is realized through a rich set of backend runtimes and deployment surfaces:

| Surface | Component | Source |
|---|---|---|
| GPU/compute | Vulkan runtime (DeviceAPI, ThreadEntry, WrappedFunc, command-buffer streams) | [src/backend/vulkan/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/vulkan/runtime/README.md) |
| DSP | Hexagon runtime (host cross-compile + on-device + Android) | [src/backend/hexagon/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/README.md) |
| OpenCL | Dynamically-loaded OpenCL wrapper | [src/backend/opencl/runtime/opencl_wrapper/README.md](https://github.com/apache/tvm/blob/main/src/backend/opencl/runtime/opencl_wrapper/README.md) |
| Browser/Node | WASM + WebGPU via Emscripten (`prepwasm`, `make`) | [web/README.md](https://github.com/apache/tvm/blob/main/web/README.md), [web/package.json](https://github.com/apache/tvm/blob/main/web/package.json) |
| Mobile RPC | iOS and Android RPC apps | [apps/ios_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/ios_rpc/README.md), [apps/android_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/android_rpc/README.md) |
| Server RPC | Standalone C++ RPC server (`USE_CPP_RPC=ON`) | [apps/cpp_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/cpp_rpc/README.md) |
| JVM | TVM4J Java bindings | [jvm/README.md](https://github.com/apache/tvm/blob/main/jvm/README.md) |

The Vulkan runtime intentionally mirrors the CUDA stream model: a thread-local `vkCommandBuffer` queues launches and a fence-based explicit sync simulates CUDA stream semantics [src/backend/vulkan/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/vulkan/runtime/README.md). The Hexagon runtime requires LLVM ≥ 7.0.0 plus the Hexagon SDK (≥ 4.0.0) and supports cross-compile from x86, plus on-target and Android builds gated by `USE_HEXAGON`, `USE_HEXAGON_ARCH`, and `USE_HEXAGON_SDK` [src/backend/hexagon/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/README.md). The OpenCL wrapper sidesteps SDK dependency by dynamically loading the platform OpenCL library through a fixed set of wrapped symbols [src/backend/opencl/runtime/opencl_wrapper/README.md](https://github.com/apache/tvm/blob/main/src/backend/opencl/runtime/opencl_wrapper/README.md).
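The dynamic-loading idea behind the OpenCL wrapper can be sketched in Python with `ctypes`. This is only an illustration of the technique, not the wrapper's actual code: the helper names `load_first_available` and `LazySymbols` and the candidate library names are assumptions made for this example.

```python
import ctypes
from typing import Dict, Optional, Sequence


def load_first_available(candidates: Sequence[str]) -> Optional[ctypes.CDLL]:
    """Return a handle to the first library that loads, or None if none do."""
    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue  # Library missing on this machine; try the next name.
    return None


class LazySymbols:
    """Resolve wrapped symbols on first use instead of at link time."""

    def __init__(self, lib: Optional[ctypes.CDLL]) -> None:
        self._lib = lib
        self._cache: Dict[str, object] = {}

    def get(self, symbol: str):
        if self._lib is None:
            raise RuntimeError("OpenCL library not found on this system")
        if symbol not in self._cache:
            # getattr raises AttributeError if the library lacks the symbol.
            self._cache[symbol] = getattr(self._lib, symbol)
        return self._cache[symbol]


# Hypothetical candidate names; real platforms use different names and paths.
OPENCL_CANDIDATES = ["libOpenCL.so", "libOpenCL.so.1", "OpenCL.dll"]
```

A caller would build `LazySymbols(load_first_available(OPENCL_CANDIDATES))` once and look up entry points such as `clGetPlatformIDs` on demand, so the program still starts on machines without an OpenCL installation.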

## 4. Build Infrastructure, CI, and Distribution

Build and packaging are decoupled. Docker images provide reproducible environments — `docker/bash.sh <image_name>` mounts the source tree, switches the user, and uses host networking [docker/README.md](https://github.com/apache/tvm/blob/main/docker/README.md).

CI is split between Jenkins (Linux, including GPU regression suites that gate merges) and GitHub Actions (Windows, macOS, and automation bots) [ci/README.md](https://github.com/apache/tvm/blob/main/ci/README.md), [ci/jenkins/README.md](https://github.com/apache/tvm/blob/main/ci/jenkins/README.md). Test suites live under `tests/scripts` as `task_*` scripts invoked by `Jenkinsfile`, with shared helpers in `ci/scripts` [ci/README.md](https://github.com/apache/tvm/blob/main/ci/README.md).

Wheels are produced by `cibuildwheel` driven from `pyproject.toml` and `.github/workflows/publish_wheel.yml`. Helper scripts build the CUDA sidecar `libtvm_runtime_cuda.so` (Linux: `manylinux_build_libtvm_runtime_cuda.sh`; Windows: `windows_build_libtvm_runtime_cuda.bat`) [ci/scripts/package/README.md](https://github.com/apache/tvm/blob/main/ci/scripts/package/README.md). The `tvmjs` package version declared in `web/package.json` is `0.25.0-dev1` [web/package.json](https://github.com/apache/tvm/blob/main/web/package.json); the v0.25.0.rc1 release candidate is described as a backport of recent main.

## See Also

- TVM Relax BYOC example NPU backend — [python/tvm/relax/backend/contrib/example_npu/README.md](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md)
- TVM WebAssembly / WebGPU runtime — [web/README.md](https://github.com/apache/tvm/blob/main/web/README.md)
- Continuous Integration overview — [ci/README.md](https://github.com/apache/tvm/blob/main/ci/README.md)
- Hexagon backend runtime — [src/backend/hexagon/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/README.md)
- Vulkan runtime architecture — [src/backend/vulkan/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/vulkan/runtime/README.md)

---

<a id='page-2'></a>

## Frontends, Relax Graph IR & Transformations

### Related Pages

Related topics: [TVM Overview & Unity Architecture](#page-1), [TensorIR Scheduling & Backend Code Generation](#page-3), [MetaSchedule Auto-Tuning, Runtime & Deployment](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/apache/tvm/blob/main/README.md)
- [python/tvm/relax/backend/contrib/example_npu/README.md](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md)
- [src/backend/hexagon/runtime/hexagon_thread_manager.h](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_thread_manager.h)
- [src/backend/vulkan/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/vulkan/runtime/README.md)
- [src/backend/opencl/runtime/opencl_wrapper/README.md](https://github.com/apache/tvm/blob/main/src/backend/opencl/runtime/opencl_wrapper/README.md)
- [web/src/runtime.ts](https://github.com/apache/tvm/blob/main/web/src/runtime.ts)
- [web/src/compact.ts](https://github.com/apache/tvm/blob/main/web/src/compact.ts)
- [jvm/core/src/main/java/org/apache/tvm/TVMObject.java](https://github.com/apache/tvm/blob/main/jvm/core/src/main/java/org/apache/tvm/TVMObject.java)
- [jvm/core/src/main/java/org/apache/tvm/TVMValueNull.java](https://github.com/apache/tvm/blob/main/jvm/core/src/main/java/org/apache/tvm/TVMValueNull.java)
- [ci/README.md](https://github.com/apache/tvm/blob/main/ci/README.md)
</details>

# Frontends, Relax Graph IR & Transformations

## Overview

Apache TVM is an open machine learning compilation framework whose stated principles are **Python-first development** for "quick customization of machine learning compiler pipelines" and **universal deployment** to "bring models into minimum deployable modules" ([README.md:1-15](https://github.com/apache/tvm/blob/main/README.md)). To achieve this, TVM separates *ingestion* of models from many frameworks, *compilation* through a unified high-level graph IR (Relax), and *lowering* to vendor-specific backends. The current design "focuses on a cross-level design with TensorIR" and has gone through several redesigns compared to the initial research project ([README.md:38-42](https://github.com/apache/tvm/blob/main/README.md)).

This page describes the boundary between **frontends** (model importers), the **Relax** high-level graph IR, and the **transformation** passes that prepare a Relax function for backend lowering.

## Frontends: Importing Models into Relax

TVM supports ingesting models from multiple frameworks through dedicated frontend modules under `python/tvm/relax/frontend/`. Each frontend parses a framework-specific representation and emits a Relax `Function` whose body is composed of Relax `Call` nodes wrapping imported operator expressions. The Relax frontend layer is intentionally minimal: it is responsible only for translation, not for optimization. All subsequent optimization lives in the transformation pipeline.

Once a model is imported, the resulting IRModule is passed to a sequence of Relax transformations before being lowered to TensorIR (or to a BYOC codegen). This separation is a recurring theme in the codebase. The recently added **Example NPU Backend**, whose README explains how to "Build a Neural Processing Unit (NPU) backend for TVM's Relax framework using Bring Your Own Codegen (BYOC)", presents BYOC as the pattern for adding a new hardware target and notes that NPUs typically accept only a fixed set of operations (matmul, conv, activations) ([python/tvm/relax/backend/contrib/example_npu/README.md:5-19](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md)). Frontend output is therefore shaped to be partitionable by the BYOC pipeline.
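Because an NPU accepts only a fixed operator set, partitioning amounts to splitting the program into regions the accelerator can take and regions that stay on the host. The sketch below shows that idea on a flat operator list in plain Python. It is a simplification under stated assumptions: a real BYOC partitioner matches patterns on a dataflow graph, and the `Op` type, the `partition` helper, and the `NPU_OPS` set are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Op:
    name: str  # e.g. "matmul", "relu", "softmax"


def partition(ops: List[Op], supported: Set[str]) -> List[Tuple[str, List[Op]]]:
    """Group consecutive ops into ("npu", [...]) or ("host", [...]) regions."""
    regions: List[Tuple[str, List[Op]]] = []
    for op in ops:
        target = "npu" if op.name in supported else "host"
        if regions and regions[-1][0] == target:
            regions[-1][1].append(op)  # Extend the current region.
        else:
            regions.append((target, [op]))  # Start a new region.
    return regions


NPU_OPS = {"matmul", "conv2d", "relu"}  # Hypothetical supported operator set.
model = [Op("conv2d"), Op("relu"), Op("softmax"), Op("matmul")]
regions = partition(model, NPU_OPS)
```

Here `softmax` is outside the supported set, so it separates the two accelerator regions and would be lowered through the regular TensorIR path.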

### Frontend Layer Components

| Component | Role |
| --- | --- |
| Per-framework frontend (e.g. Torch, ONNX) | Translate framework graph to Relax |
| Relax `Function` | Top-level callable with structured signature info |
| Relax `Call` | Reference to an operator (callable or extern) |
| `extern` op | Marks a node that must be handled by a backend (BYOC) |

## The Relax Graph IR

Relax is TVM's high-level, function-level IR. It complements TensorIR, which operates on the loop/statement level. Relax carries the *whole-program* view (control flow, function boundaries, shape and type information) while TensorIR describes *how* a particular tensor computation is scheduled. The README explicitly identifies the "cross-level design with TensorIR" as the focus of the most recent redesign ([README.md:40-42](https://github.com/apache/tvm/blob/main/README.md)).

A Relax module exposes:

- **Expressions** (`expr.h`): Functions, calls, tuples, variables, and constants.
- **StructInfo** (`struct_info.h`): Static shape and type annotations attached to expressions, enabling shape-aware passes before lowering to TIR.
- **Globals and modules**: Top-level functions that may reference each other, allowing whole-program analyses such as constant folding, dead-code elimination, and operator fusion.

The cross-level design means that a Relax function is gradually rewritten: high-level operators (e.g. `relax.matmul`, `relax.nn.conv2d`) are rewritten into `relax.call_tir` invocations, which the lowering pipeline materializes into TensorIR `PrimFunc` implementations.
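The rewrite from high-level operators to `call_tir` can be pictured with a toy IR in plain Python. This is a sketch of the idea only, not Relax itself: the `Call` dataclass, the `LEGALIZE_TABLE` mapping, and the `*_prim` function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Call:
    op: str                # Operator name, or "call_tir" after lowering.
    args: Tuple[str, ...]  # Names of the inputs.


# Hypothetical mapping from a high-level op to the name of a generated PrimFunc.
LEGALIZE_TABLE: Dict[str, str] = {"matmul": "matmul_prim", "relu": "relu_prim"}


def legalize(bindings: List[Tuple[str, Call]]) -> List[Tuple[str, Call]]:
    """Rewrite known high-level calls into call_tir(prim_func, *args)."""
    out: List[Tuple[str, Call]] = []
    for var, call in bindings:
        prim = LEGALIZE_TABLE.get(call.op)
        if prim is None:
            out.append((var, call))  # Unknown op: leave it untouched.
        else:
            out.append((var, Call("call_tir", (prim,) + call.args)))
    return out


program = [
    ("y", Call("matmul", ("x", "w"))),
    ("z", Call("custom_op", ("y",))),
]
lowered = legalize(program)
```

Operators without an entry in the table pass through unchanged, which mirrors how a partially lowered module can still contain high-level calls.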

```mermaid
flowchart LR
  A[Framework Model<br/>PyTorch / ONNX / TFLite] --> B[Relax Frontend]
  B --> C[Relax IRModule]
  C --> D[Relax Transformations]
  D --> E{Partitionable<br/>subgraph?}
  E -- yes --> F[BYOC Codegen<br/>e.g. NPU]
  E -- no --> G[Lower to TensorIR]
  F --> H[Backend Runtime<br/>CPU/CUDA/OpenCL/<br/>Vulkan/Hexagon/Metal]
  G --> H
  H --> I[Deployable Module]
```

## Transformations

Transformations are Python-callable passes that take an `IRModule` and return a (possibly modified) `IRModule`. They are exposed through `python/tvm/relax/transform/transform.py` and the C++ header `include/tvm/relax/transform.h`. Each pass is registered so that `tvm.transform.Sequential` can compose them, and the same infrastructure is used for TensorIR passes.

Common categories of Relax transformations include:

- **Operator fusion**: Combining elementwise and reduction chains into a single `Function` to reduce memory traffic.
- **Dead-code elimination**: Removing unused parameters, bindings, and functions.
- **Constant folding and lifting**: Promoting constants out of hot paths.
- **Layout and dtype rewrites**: Adjusting the IR to match the capabilities of a chosen backend.
- **Partitioning for BYOC**: Detecting subgraphs that match a registered external compiler/codegen and replacing them with `extern` calls. The Example NPU backend illustrates the structure of such a partitioner ([python/tvm/relax/backend/contrib/example_npu/README.md:11-19](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md)).
- **Lowering to TensorIR**: Converting high-level operators to `call_tir` and corresponding `PrimFunc` implementations.

Transformations are composable; a typical `tvm` compilation pipeline stitches a sequence of Relax passes followed by TensorIR passes and a BYOC integration.
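The module-in, module-out contract is what makes passes composable. The following plain-Python sketch shows two toy passes and a sequential combinator; it does not use the TVM pass infrastructure, and the tuple-based module encoding and all function names are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Expr = Tuple  # ("const", 3), ("add", "a", "b"), or ("input", "x")
Module = List[Tuple[str, Expr]]  # Ordered bindings; the last one is the result.
Pass = Callable[[Module], Module]


def fold_constants(mod: Module) -> Module:
    """Replace add(const, const) with a single const binding."""
    consts: Dict[str, int] = {}
    out: Module = []
    for var, expr in mod:
        if expr[0] == "add" and expr[1] in consts and expr[2] in consts:
            expr = ("const", consts[expr[1]] + consts[expr[2]])
        if expr[0] == "const":
            consts[var] = expr[1]
        out.append((var, expr))
    return out


def eliminate_dead_code(mod: Module) -> Module:
    """Drop bindings that the final result does not depend on."""
    live = {mod[-1][0]}
    kept: Module = []
    for var, expr in reversed(mod):
        if var in live:
            kept.append((var, expr))
            if expr[0] == "add":
                live.update(expr[1:])  # Operands of a live add are live too.
    return list(reversed(kept))


def sequential(passes: List[Pass]) -> Pass:
    """Compose passes left to right into one pass."""
    def run(mod: Module) -> Module:
        for p in passes:
            mod = p(mod)
        return mod
    return run


pipeline = sequential([fold_constants, eliminate_dead_code])
module = [
    ("a", ("const", 2)),
    ("b", ("const", 3)),
    ("c", ("add", "a", "b")),  # Foldable: both operands are constants.
    ("x", ("input", "x")),
    ("y", ("add", "x", "c")),  # Final result.
]
result = pipeline(module)
```

Pass order matters in this example: folding first turns `c` into a constant, after which `a` and `b` are unreferenced and dead-code elimination removes them.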

## Frontend-to-Deployment Data Flow

The end-to-end data flow that ties together frontends, the Relax IR, and transformations is:

1. A user loads a model in a frontend module and obtains a Relax `IRModule`.
2. A pipeline of Relax transformations rewrites the module: fuses operators, lifts constants, rewrites layouts, and partitions subgraphs for external codegens.
3. The remaining operators are lowered to TensorIR `PrimFunc`s and scheduled for the chosen target.
4. The final module is compiled into a deployable artifact: a shared library loaded by the C++ runtime, a Java artifact (TVM4J) ([jvm/core/src/main/java/org/apache/tvm/TVMObject.java:1-30](https://github.com/apache/tvm/blob/main/jvm/core/src/main/java/org/apache/tvm/TVMObject.java)), a WebAssembly bundle driven by `web/src/runtime.ts` ([web/src/runtime.ts:1-25](https://github.com/apache/tvm/blob/main/web/src/runtime.ts)), or a Hexagon binary compiled with the Hexagon SDK ([src/backend/hexagon/runtime/hexagon_thread_manager.h:1-15](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_thread_manager.h)).
5. Hardware-specific runtime threads (e.g. Hexagon HTP/HVX workers) execute the partitioned functions on-device.

## Community Context

The transition tracked in issue #16368 ("Transition Main to Unity") is a community-driven effort to consolidate TVM's long-lived branches (Relax, Unity, MLC-AI) into a single main branch for GenAI workloads, which directly affects the frontends and Relax pipeline described above. TensorIR scheduling (issue #7527) provides the scheduling primitives applied to the TensorIR functions that Relax lowers to. Auto TensorCore CodeGen (issue #4105) is an RFC for CUDA code generation targeting TensorCore units rather than a BYOC integration; it is relevant here as an earlier approach to mapping tensor computations onto specialized hardware, a problem the Example NPU backend addresses through BYOC partitioning.

## See Also

- [TVM TensorIR Scheduling](https://github.com/apache/tvm/blob/main/src/tir/README.md) — the lower-level IR that Relax lowers to.
- [Example NPU Backend](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md) — worked example of BYOC partitioning.
- [Hexagon Runtime](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/README.md) — one of the supported deployment targets.
- [TVM WebAssembly Runtime](https://github.com/apache/tvm/blob/main/web/README.md) — browser-side deployment via the JavaScript runtime in `web/src/runtime.ts`.

---

<a id='page-3'></a>

## TensorIR Scheduling & Backend Code Generation

### Related Pages

Related topics: [TVM Overview & Unity Architecture](#page-1), [Frontends, Relax Graph IR & Transformations](#page-2), [MetaSchedule Auto-Tuning, Runtime & Deployment](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/apache/tvm/blob/main/README.md)
- [web/src/runtime.ts](https://github.com/apache/tvm/blob/main/web/src/runtime.ts)
- [web/src/compact.ts](https://github.com/apache/tvm/blob/main/web/src/compact.ts)
- [web/README.md](https://github.com/apache/tvm/blob/main/web/README.md)
- [jvm/core/src/main/java/org/apache/tvm/Device.java](https://github.com/apache/tvm/blob/main/jvm/core/src/main/java/org/apache/tvm/Device.java)
- [src/backend/vulkan/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/vulkan/runtime/README.md)
- [src/backend/hexagon/runtime/hexagon_thread_manager.h](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_thread_manager.h)
- [src/backend/hexagon/runtime/hexagon_device_api.h](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_device_api.h)
- [src/backend/hexagon/runtime/hexagon_user_dma_registers.h](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_user_dma_registers.h)
- [src/backend/hexagon/runtime/hexagon_user_dma_descriptors.h](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_user_dma_descriptors.h)
- [python/tvm/relax/backend/contrib/example_npu/README.md](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md)
- [apps/android_rpc/app/src/main/jni/tvm_runtime.h](https://github.com/apache/tvm/blob/main/apps/android_rpc/app/src/main/jni/tvm_runtime.h)
</details>

# TensorIR Scheduling & Backend Code Generation

## 1. Overview and Purpose

TensorIR is described in the project README as the focus of TVM's most recent architectural revision, which "focuses on a cross-level design with TensorIR" ([README.md:1-50](https://github.com/apache/tvm/blob/main/README.md)). TensorIR Scheduling refers to the transformation primitives that operate on a schedulable tensor-level intermediate representation (IR), allowing compiler developers to rewrite loops, perform tiling, vectorization, and fusion before backend code generation. Community discussion under issue [#7527 "[RFC][Tracking Issue] TensorIR Scheduling"](https://github.com/apache/tvm/issues/7527) and the Unity transition vote in issue [#16368](https://github.com/apache/tvm/issues/16368) confirm TensorIR scheduling is a foundational piece for the new generation of TVM.

Backend Code Generation is the second half of the pipeline: a scheduled TIR is lowered into target-specific source code and a runtime module. Each backend (CUDA, Vulkan, WebGPU, Hexagon, NPU) implements its own device API, stream model, and module loader, then registers a global packed function that the runtime calls.

## 2. Compilation Pipeline

The flow from a high-level model to an executable backend module moves through several stages:

```mermaid
flowchart LR
    A[High-level Model] --> B[Relax / Graph IR]
    B --> C[TensorIR]
    C --> D[Schedule Primitives]
    D --> E[Target-specific TIR]
    E --> F[Backend CodeGen]
    F --> G[Runtime Module]
    G --> H[Device API + Stream]
```

Source: [README.md:1-50](https://github.com/apache/tvm/blob/main/README.md), [src/backend/hexagon/runtime/hexagon_device_api.h:1-50](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_device_api.h)

The schedule step applies transformations such as `split`, `reorder`, `cache_read`, `cache_write`, `compute_at`, and `reverse_compute_at`. These are exposed to Python through `tvm.s_tir`; the underlying C++ schedule class applies the corresponding mutations to the IR.
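What `split` and `reorder` do to a loop nest can be shown by hand in plain Python, without the TVM schedule API. The sketch below writes a matrix multiply twice: once as the naive nest, and once in the form the nest would take after splitting `i` and `j` by a tile size and moving the reduction loop outward. The function names and the tile parameter are assumptions for this illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_split_reordered(a: Matrix, b: Matrix, tile: int) -> Matrix:
    """Same computation after split(i), split(j), reorder(io, jo, p, ii, ji)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for io in range(0, n, tile):                  # Outer part of split i.
        for jo in range(0, m, tile):              # Outer part of split j.
            for p in range(k):                    # Reduction moved outward.
                for ii in range(io, min(io + tile, n)):      # Inner i.
                    for ji in range(jo, min(jo + tile, m)):  # Inner j.
                        c[ii][ji] += a[ii][p] * b[p][ji]
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
```

Both versions accumulate each output element over `p` in the same order, so the transformation changes only the iteration order over output elements, which is the property a schedule primitive has to preserve.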

## 3. Backend Code Generation Across Targets

Once a schedule is fixed, the lowered TIR is handed to a per-target codegen that emits CUDA, SPIR-V, LLVM IR, or C/C++ source. The runtime side is then materialised as a loadable module. The following table summarizes the relevant backends observed in the repository and the runtime hooks they implement.

| Target | Codegen / Runtime Anchor | Key Responsibility |
|---|---|---|
| CUDA | `src/backend/cuda/codegen/` (scheduled TIR → PTX/CUDA C++) | TensorCore intrinsics, shared memory alloc |
| Vulkan | `src/backend/vulkan/runtime/README.md` | SPIR-V pipeline from TIR shader |
| WebGPU / Wasm | `web/src/runtime.ts`, `web/README.md` | Loads `tvmjs_runtime.wasm`, exposes PackedFunc |
| Hexagon | `src/backend/hexagon/runtime/hexagon_*.h` | HTP/HVX threads, user-DMA, VTCM pool |
| Android RPC | `apps/android_rpc/app/src/main/jni/tvm_runtime.h` | Unified C++ runtime with conditional backends |

Source: [src/backend/vulkan/runtime/README.md:1-30](https://github.com/apache/tvm/blob/main/src/backend/vulkan/runtime/README.md), [web/README.md:1-50](https://github.com/apache/tvm/blob/main/web/README.md), [apps/android_rpc/app/src/main/jni/tvm_runtime.h:1-50](https://github.com/apache/tvm/blob/main/apps/android_rpc/app/src/main/jni/tvm_runtime.h).

The Vulkan README explicitly notes that TVM simulates CUDA-style streams by "maintaining a thread-local `vkCommandBuffer` instance, and queueing up (or eagerly executing, depending on the availability of the `VK_KHR_push_descriptor` extension)" ([src/backend/vulkan/runtime/README.md:1-20](https://github.com/apache/tvm/blob/main/src/backend/vulkan/runtime/README.md)). When a scheduled kernel is submitted, the Vulkan device API ends command-buffer recording, submits it to the device queue, and waits on a fence — a direct parallel to how the CUDA backend synchronises its stream.
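The record-then-submit behaviour can be modelled in a few lines of Python. This is a conceptual sketch, not the Vulkan runtime: `ThreadLocalStream`, `Fence`, and the `eager` flag (standing in for the eager path mentioned in the quote) are hypothetical names, and "kernels" are ordinary callables.

```python
import threading
from typing import Callable, List


class Fence:
    """Signalled once the submitted work has finished."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def signal(self) -> None:
        self._event.set()

    def wait(self) -> None:
        self._event.wait()


class ThreadLocalStream:
    """Per-thread command queue emulating a CUDA-style stream."""

    def __init__(self, eager: bool = False) -> None:
        self._tls = threading.local()
        self._eager = eager

    def _commands(self) -> List[Callable[[], None]]:
        if not hasattr(self._tls, "commands"):
            self._tls.commands = []  # Each thread records into its own buffer.
        return self._tls.commands

    def launch(self, kernel: Callable[[], None]) -> None:
        if self._eager:
            kernel()  # Execute immediately.
        else:
            self._commands().append(kernel)  # Deferred until synchronize().

    def synchronize(self) -> None:
        """End recording, submit the recorded commands, then wait on a fence."""
        commands = self._commands()
        fence = Fence()
        for command in commands:  # "Submit": run in recorded order.
            command()
        commands.clear()
        fence.signal()
        fence.wait()
```

In the deferred mode nothing runs until `synchronize()` is called, which is why callers that read results back must synchronise first.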

## 4. Hexagon Backend: A Worked Example

The Hexagon backend is one of the most complete illustrations of post-schedule codegen. A scheduled TIR is lowered into Hexagon-specific code; the runtime layer then manages hardware resources explicitly:

- The `HexagonThreadManager` spawns worker threads with configurable stack and pipe sizes, and exposes them as `TVMStreamHandle`s ([src/backend/hexagon/runtime/hexagon_thread_manager.h:1-50](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_thread_manager.h)).
- The `HexagonDeviceAPI` implements `AllocDataSpace`, `CopyDataFromTo`, and owns the thread manager, user-DMA engine, and VTCM pool ([src/backend/hexagon/runtime/hexagon_device_api.h:1-50](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_device_api.h)).
- The DMA engine is programmed by writing the descriptor fields defined in `hexagon_user_dma_descriptors.h` and the control registers in `hexagon_user_dma_registers.h`, where bit-fields such as `DESC_BYPASSSRC_MASK` and `DM0_STATUS_*` are encoded directly into the IR-generated DMA setup sequence.

This pattern — schedule the loop nest, lower to target instructions, then dispatch through a device API that owns threads, streams, and memory pools — is the same shape used for CUDA and Vulkan.
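The shared shape of these backends is an interface that owns allocation and copies, with one implementation per device. A minimal Python sketch of that contract follows; the method names loosely echo `AllocDataSpace` and `CopyDataFromTo` from the text, while `HostMemoryDevice` and its `write`/`read` helpers are hypothetical stand-ins rather than anything in the repository.

```python
from abc import ABC, abstractmethod
from typing import Dict


class DeviceAPI(ABC):
    """Minimal device contract: allocate, free, and copy buffers."""

    @abstractmethod
    def alloc_data_space(self, nbytes: int) -> int: ...

    @abstractmethod
    def free_data_space(self, handle: int) -> None: ...

    @abstractmethod
    def copy_data_from_to(self, src: int, dst: int, nbytes: int) -> None: ...


class HostMemoryDevice(DeviceAPI):
    """Toy device backed by bytearrays, standing in for a real backend."""

    def __init__(self) -> None:
        self._buffers: Dict[int, bytearray] = {}
        self._next_handle = 1

    def alloc_data_space(self, nbytes: int) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = bytearray(nbytes)
        return handle

    def free_data_space(self, handle: int) -> None:
        del self._buffers[handle]

    def copy_data_from_to(self, src: int, dst: int, nbytes: int) -> None:
        self._buffers[dst][:nbytes] = self._buffers[src][:nbytes]

    def write(self, handle: int, data: bytes) -> None:
        self._buffers[handle][: len(data)] = data

    def read(self, handle: int) -> bytes:
        return bytes(self._buffers[handle])
```

A real backend would replace the bytearrays with device memory and route copies through its own engine, such as the user-DMA path described for Hexagon.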

## 5. Cross-Runtime Frontends and Device Codes

The device-type enum is shared across language bindings. The Java binding defines constants such as `kDLCPU = 1`, `kDLCUDA = 2`, `kDLOpenCL = 4`, `kDLVulkan = 7`, `kDLMetal = 8`, `kDLWebGPU = 15`, `kDLHexagon = 16` ([jvm/core/src/main/java/org/apache/tvm/Device.java:1-50](https://github.com/apache/tvm/blob/main/jvm/core/src/main/java/org/apache/tvm/Device.java)). The TypeScript runtime mirrors this enum: `cpu: 1, cuda: 2, cl: 4, vulkan: 7, metal: 8, webgpu: 15` ([web/src/runtime.ts:1-20](https://github.com/apache/tvm/blob/main/web/src/runtime.ts)). The two maps must stay in lock-step so that a module produced by the Python compiler can be deserialised and dispatched on either the JVM or the browser runtime.
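A small consistency check makes the lock-step requirement concrete. The sketch below copies the codes quoted above into two Python dictionaries and reports disagreements. The dictionaries, the `cl` to `opencl` alias, and the `find_drift` helper are written for this example; the repository itself does not necessarily contain such a check.

```python
from typing import Dict, List, Optional, Tuple

# Codes as quoted above for the Java binding.
JVM_DEVICE_CODES = {
    "cpu": 1, "cuda": 2, "opencl": 4, "vulkan": 7,
    "metal": 8, "webgpu": 15, "hexagon": 16,
}
# Codes as quoted above for the TypeScript runtime.
JS_DEVICE_CODES = {
    "cpu": 1, "cuda": 2, "cl": 4, "vulkan": 7, "metal": 8, "webgpu": 15,
}
ALIASES = {"cl": "opencl"}  # Assumed naming difference between the two maps.


def find_drift(
    reference: Dict[str, int],
    other: Dict[str, int],
    aliases: Optional[Dict[str, str]] = None,
) -> Tuple[Dict[str, Tuple[int, int]], List[str]]:
    """Return (names whose codes differ, names missing from `other`)."""
    aliases = aliases or {}
    normalized = {aliases.get(name, name): code for name, code in other.items()}
    mismatched = {
        name: (code, normalized[name])
        for name, code in reference.items()
        if name in normalized and normalized[name] != code
    }
    missing = sorted(name for name in reference if name not in normalized)
    return mismatched, missing


mismatched, missing = find_drift(JVM_DEVICE_CODES, JS_DEVICE_CODES, ALIASES)
```

With the values quoted above, no shared device disagrees on its code, and `hexagon` is reported as absent from the TypeScript map, which is expected for a browser runtime.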

## 6. Common Failure Modes and Limitations

A few recurring failure patterns surface across the backends and frontends:

- Device-type drift: if a new `DLDevice` code is added to the C++ runtime but not propagated to `Device.java` or `runtime.ts`, that language binding will fail to recognise the device string.
- WASM/WebGPU lifecycle: the TypeScript runtime warns that objects returned from `PackedFunc` calls must be released through a scope, because WASM and WebGPU memory is "not tracked through JS native garbage collection" ([web/src/runtime.ts:1-50](https://github.com/apache/tvm/blob/main/web/src/runtime.ts)).
- WebGPU is still flagged as experimental: the WebGPU RPC test requires Chrome Canary on macOS plus Vulkan SDK ≥ 1.1, and Firefox support is pending the Fence extension ([web/README.md:1-50](https://github.com/apache/tvm/blob/main/web/README.md)).
- TensorCore codegen maturity: the Auto TensorCore CodeGen RFC (issue [#4105](https://github.com/apache/tvm/issues/4105)) emphasises that algorithm description and schedule should not differ from normal CUDA codegen, so schedule authors must take care to expose layout, mma fragment, and shared-memory buffer information that the TensorCore lowering expects.
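The scope-based lifecycle from the WASM/WebGPU item can be illustrated with a Python analogue. This is not the `tvmjs` API: the `Runtime`, `Handle`, `track`, `detach`, and `scope` names are hypothetical and only model the idea that objects created inside a scope are released when the scope ends unless they are explicitly detached.

```python
from contextlib import contextmanager
from typing import Iterator, List


class Handle:
    """Stand-in for an object whose memory the garbage collector cannot see."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.disposed = False

    def dispose(self) -> None:
        self.disposed = True


class Runtime:
    def __init__(self) -> None:
        self._scopes: List[List[Handle]] = []

    def track(self, handle: Handle) -> Handle:
        """Attach a handle to the innermost open scope, if any."""
        if self._scopes:
            self._scopes[-1].append(handle)
        return handle

    def detach(self, handle: Handle) -> Handle:
        """Keep a handle alive past the end of the current scope."""
        if self._scopes and handle in self._scopes[-1]:
            self._scopes[-1].remove(handle)
        return handle

    @contextmanager
    def scope(self) -> Iterator[None]:
        self._scopes.append([])
        try:
            yield
        finally:
            # Release in reverse creation order, even if the body raised.
            for handle in reversed(self._scopes.pop()):
                handle.dispose()
```

A caller would wrap each inference step in `with rt.scope():` and detach only the tensors it needs to return, so that temporaries do not accumulate in memory the host language cannot reclaim.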

## 7. See Also

- TensorIR RFC and tracking issue: [#7527](https://github.com/apache/tvm/issues/7527)
- Auto TensorCore CodeGen: [#4105](https://github.com/apache/tvm/issues/4105)
- Unity transition vote: [#16368](https://github.com/apache/tvm/issues/16368)
- TVM Roadmap v0.8: [#7434](https://github.com/apache/tvm/issues/7434)
- Relax example NPU backend: [python/tvm/relax/backend/contrib/example_npu/README.md](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md)

---

<a id='page-4'></a>

## MetaSchedule Auto-Tuning, Runtime & Deployment

### Related Pages

Related topics: [TVM Overview & Unity Architecture](#page-1), [Frontends, Relax Graph IR & Transformations](#page-2), [TensorIR Scheduling & Backend Code Generation](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/apache/tvm/blob/main/README.md)
- [web/README.md](https://github.com/apache/tvm/blob/main/web/README.md)
- [web/src/runtime.ts](https://github.com/apache/tvm/blob/main/web/src/runtime.ts)
- [web/src/index.ts](https://github.com/apache/tvm/blob/main/web/src/index.ts)
- [src/backend/hexagon/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/README.md)
- [src/backend/hexagon/runtime/hexagon_thread_manager.h](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_thread_manager.h)
- [src/backend/hexagon/runtime/hexagon_user_dma_registers.h](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_user_dma_registers.h)
- [src/backend/vulkan/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/vulkan/runtime/README.md)
- [src/backend/opencl/runtime/opencl_wrapper/README.md](https://github.com/apache/tvm/blob/main/src/backend/opencl/runtime/opencl_wrapper/README.md)
- [apps/ios_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/ios_rpc/README.md)
- [apps/cpp_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/cpp_rpc/README.md)
- [apps/android_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/android_rpc/README.md)
- [jvm/README.md](https://github.com/apache/tvm/blob/main/jvm/README.md)
- [python/tvm/relax/backend/contrib/example_npu/README.md](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md)
- [ci/README.md](https://github.com/apache/tvm/blob/main/ci/README.md)
- [ci/jenkins/README.md](https://github.com/apache/tvm/blob/main/ci/jenkins/README.md)
- [ci/scripts/package/README.md](https://github.com/apache/tvm/blob/main/ci/scripts/package/README.md)
- [docker/README.md](https://github.com/apache/tvm/blob/main/docker/README.md)
</details>

# MetaSchedule Auto-Tuning, Runtime & Deployment

## Overview

Apache TVM is an open machine learning compilation framework built around two guiding principles: Python-first development so compiler pipelines can be customized quickly, and universal deployment that turns models into minimum deployable modules. Source: [README.md](https://github.com/apache/tvm/blob/main/README.md)

The "Auto-Tuning, Runtime & Deployment" surface spans three loosely coupled layers:

1. The schedule-search machinery that derives optimized programs (TensorIR / MetaSchedule), discussed publicly in the TensorIR Scheduling tracking issue #7527.
2. The device-level runtimes that execute the resulting compiled modules.
3. The deployment packaging that delivers those modules to browsers, phones, DSPs, GPUs, and JVMs.

The repository's source files cover an unusually broad range of runtimes. The web runtime, for example, exposes its FFI to client code through the `PackedFunc` and `AsyncPackedFunc` callable types. Source: [web/src/runtime.ts](https://github.com/apache/tvm/blob/main/web/src/runtime.ts)
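As a rough illustration of why a packed-function style FFI is convenient for client code, the sketch below models a name-keyed registry of type-erased callables in plain Python. It is a toy under stated assumptions: `register_func`, `get_func`, and the `demo.*` names are hypothetical helpers invented for this example, not TVM's actual registry or the TypeScript `PackedFunc` type.

```python
from typing import Any, Callable, Dict

# Hypothetical global table for this sketch only; it is not TVM's registry.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_func(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Return a decorator that stores a callable under a string name."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        if name in _REGISTRY:
            raise KeyError(f"function {name!r} is already registered")
        _REGISTRY[name] = fn
        return fn

    return decorator


def get_func(name: str) -> Callable[..., Any]:
    """Look up a previously registered callable by name."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"no function registered as {name!r}") from None


@register_func("demo.add")
def _add(a: int, b: int) -> int:
    # Callers never import this function; they only know its name.
    return a + b


@register_func("demo.concat")
def _concat(*parts: str) -> str:
    # Variadic arguments pass through the same calling convention.
    return "".join(parts)
```

Client code then needs only a string name and an argument list, for example `get_func("demo.add")(2, 3)`, which is the property that lets very different runtimes sit behind one calling convention.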

## TensorIR / Auto-Tuning Surface

The most recent design focus is "a cross-level design with TensorIR," which links the high-level Relax/Graph IR to the low-level TensorIR scheduling surface so that automatic schedule search can be a drop-in component of an end-to-end compilation pipeline. Source: [README.md](https://github.com/apache/tvm/blob/main/README.md)

The TensorIR Scheduling RFC tracking issue (#7527) collects the original RFC and the landing sequence of the initial schedule primitives, schedule rules, and search strategies that together form the auto-tuning stack. The runtime backends consume the same TensorIR / TIR function form that these auto-tuners emit, so the entire stack is centered on a single IR contract. Source: [README.md](https://github.com/apache/tvm/blob/main/README.md)
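The idea of schedule search can be sketched without any TVM dependency: enumerate candidate loop tilings, score each with a cost model, and keep the cheapest. Everything below is an assumption made for illustration. The `toy_cost` model, the `cache_elems` parameter, and the exhaustive enumeration are hypothetical stand-ins and do not reproduce the search strategies or cost models used by the actual auto-tuning stack.

```python
from itertools import product
from typing import Iterator, List, Tuple

Schedule = Tuple[int, int]  # (tile_i, tile_j) for a 2-D loop nest


def divisors(n: int) -> List[int]:
    """All tile sizes that divide the loop extent evenly."""
    return [d for d in range(1, n + 1) if n % d == 0]


def candidate_schedules(extent_i: int, extent_j: int) -> Iterator[Schedule]:
    """The search space: every pair of evenly dividing tile sizes."""
    return product(divisors(extent_i), divisors(extent_j))


def toy_cost(schedule: Schedule, extent_i: int, extent_j: int,
             cache_elems: int = 64) -> float:
    """Hypothetical cost: one unit per tile, plus a penalty per element
    by which a tile's footprint exceeds the assumed cache capacity."""
    tile_i, tile_j = schedule
    num_tiles = (extent_i // tile_i) * (extent_j // tile_j)
    overflow = max(0, tile_i * tile_j - cache_elems)
    return num_tiles + 10.0 * overflow


def search(extent_i: int, extent_j: int) -> Schedule:
    """Exhaustively pick the candidate with the lowest modelled cost."""
    return min(candidate_schedules(extent_i, extent_j),
               key=lambda s: toy_cost(s, extent_i, extent_j))


def tiled_iteration(extent_i: int, extent_j: int,
                    schedule: Schedule) -> Iterator[Tuple[int, int]]:
    """Visit the same iteration space in tiled order; a schedule changes
    the order of execution, not the set of points computed."""
    tile_i, tile_j = schedule
    for io in range(0, extent_i, tile_i):
        for jo in range(0, extent_j, tile_j):
            for ii in range(io, io + tile_i):
                for jj in range(jo, jo + tile_j):
                    yield ii, jj
```

Calling `search(16, 16)` returns one of the tilings whose footprint exactly fills the assumed 64-element cache. A real tuner would typically replace both the enumeration and the analytic cost with sampling and measurement, since realistic search spaces are far too large to enumerate.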

## Runtime Infrastructure

TVM's runtime is organized as a collection of `DeviceAPI` implementations, each owning its own resource managers, threading model, and memory allocators:

- **Hexagon DSP runtime** — Provides the executables, libraries, and wrappers needed to load and run compiled TVM modules on Qualcomm Hexagon hardware or the Hexagon simulator. The thread manager tracks hardware resources (HTP, HVX) and creates resource managers on demand. Source: [src/backend/hexagon/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/README.md), [src/backend/hexagon/runtime/hexagon_thread_manager.h](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_thread_manager.h)
- **Vulkan runtime** — Implements the TVM `DeviceAPI` interface on top of Vulkan. `VulkanDeviceAPI` initializes the Vulkan instance and devices, `VulkanThreadEntry` maintains a per-thread staging buffer and stream, and `VulkanWrappedFunc` retrieves a `VulkanPipeline` from the module node and launches the kernel on the active stream. Source: [src/backend/vulkan/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/vulkan/runtime/README.md)
- **OpenCL runtime** — A thin wrapper around the OpenCL host API that lets TVM load OpenCL at runtime on devices where the SDK is not preinstalled (e.g., Android phones), avoiding the need to vendor a copy of the OpenCL library. Source: [src/backend/opencl/runtime/opencl_wrapper/README.md](https://github.com/apache/tvm/blob/main/src/backend/opencl/runtime/opencl_wrapper/README.md)
- **WebAssembly / Web runtime** — A TypeScript FFI layer (`FFILibrary`) wrapping a WebAssembly instance, exposing `PackedFunc` and `AsyncPackedFunc` callable types and a `WebGPUContext` for GPU-backed execution in the browser. Source: [web/src/runtime.ts](https://github.com/apache/tvm/blob/main/web/src/runtime.ts), [web/src/index.ts](https://github.com/apache/tvm/blob/main/web/src/index.ts)

The Web runtime is built with Emscripten into `libtvm_runtime.bc`, `tvmjs_runtime.wasm`, and a WASI-compatible `tvmjs_runtime.wasi.js`; the TypeScript bundle is then produced with `npm run bundle`. Source: [web/README.md](https://github.com/apache/tvm/blob/main/web/README.md)
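The "one `DeviceAPI` implementation per backend" organization described in this section can be pictured as an abstract interface plus a registry keyed by device kind. The sketch below is a simplified Python model, not TVM's C++ interface: the method set (`alloc`, `free`), the `HostMemoryAPI` backend, and the `register_device_api` / `get_device_api` helpers are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict


class DeviceAPI(ABC):
    """Hypothetical, simplified per-backend device interface."""

    @abstractmethod
    def alloc(self, nbytes: int) -> int:
        """Allocate nbytes and return an opaque handle."""

    @abstractmethod
    def free(self, handle: int) -> None:
        """Release a handle returned by alloc."""


class HostMemoryAPI(DeviceAPI):
    """Toy backend that owns its buffers, mirroring the idea that each
    backend manages its own resources."""

    def __init__(self) -> None:
        self._buffers: Dict[int, bytearray] = {}
        self._next_handle = 1

    def alloc(self, nbytes: int) -> int:
        if nbytes < 0:
            raise ValueError("nbytes must be non-negative")
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = bytearray(nbytes)
        return handle

    def free(self, handle: int) -> None:
        del self._buffers[handle]

    def live_bytes(self) -> int:
        """Total size of buffers that have not been freed yet."""
        return sum(len(buf) for buf in self._buffers.values())


# Hypothetical registry mapping a device kind to its implementation.
_DEVICE_APIS: Dict[str, DeviceAPI] = {}


def register_device_api(kind: str, api: DeviceAPI) -> None:
    _DEVICE_APIS[kind] = api


def get_device_api(kind: str) -> DeviceAPI:
    if kind not in _DEVICE_APIS:
        raise KeyError(f"no device API registered for {kind!r}")
    return _DEVICE_APIS[kind]
```

Code that runs a compiled module would ask the registry for a backend by kind and go through the shared interface, so adding a backend means adding one implementation rather than touching callers.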

## Deployment Targets

```mermaid
flowchart LR
    A[Python toolchain<br/>compile + tune] --> B[Compiled TVM module]
    B --> C[Native C++ runtime]
    B --> D[C++ RPC server]
    D --> E[iOS app]
    D --> F[Android app]
    B --> G[WASM runtime + JS bundle]
    G --> H[Browser / WebGPU]
    B --> I[Java JNI via TVM4J]
    B --> J[Hexagon DSP / HTP / HVX]
    B --> K[BYOC: NPU accelerators]
```

Deployment is intentionally minimal: a tuned module is exported as a shared library plus a thin C/C++ entry point. The repository ships several packaging targets out of the box:

- **C++ RPC server** — Built when `USE_CPP_RPC=ON`. The same recipe is reused for Android cross-compilation via the NDK toolchain file. Source: [apps/cpp_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/cpp_rpc/README.md)
- **iOS RPC app** — An Xcode project that embeds the TVM runtime and a custom DSO loader plugin, allowing the host Python script to drive the device over the RPC channel. Source: [apps/ios_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/ios_rpc/README.md)
- **Android RPC app** — A Gradle project that bundles TVM4J and exposes the RPC server. Source: [apps/android_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/android_rpc/README.md)
- **JVM (TVM4J)** — A Java frontend that constructs tensors from native arrays, registers Java callbacks as TVM functions, loads shared libraries produced by the Python toolchain, and provides RPC primitives. Requires JDK 1.6+, Maven 3, and an LLVM-enabled TVM build. Source: [jvm/README.md](https://github.com/apache/tvm/blob/main/jvm/README.md)
- **BYOC NPUs** — The Relax framework's Bring-Your-Own-Codegen pipeline supports dispatching subgraphs to external accelerators. The example NPU backend in `python/tvm/relax/backend/contrib/example_npu` shows the pattern for mobile NPUs (AMD XDNA, Google Edge TPU, Samsung NPU), dedicated AI chips (Intel Movidius, Qualcomm Hexagon, MediaTek APU), and cloud AI accelerators. Source: [python/tvm/relax/backend/contrib/example_npu/README.md](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md)
- **Wheels** — `cibuildwheel` driven by `.github/workflows/publish_wheel.yml` and `pyproject.toml`. Helper scripts such as `manylinux_build_libtvm_runtime_cuda.sh` and `windows_build_libtvm_runtime_cuda.bat` build the CUDA-enabled runtime sidecar. Source: [ci/scripts/package/README.md](https://github.com/apache/tvm/blob/main/ci/scripts/package/README.md)

Continuous integration is split between Jenkins (Linux + accelerated hardware) and GitHub Actions (Windows, macOS, on-repo automations). Lint scripts live in `tests/lint`, task scripts in `tests/scripts`, and Docker images in `docker/` provide the underlying execution environments. Source: [ci/README.md](https://github.com/apache/tvm/blob/main/ci/README.md), [ci/jenkins/README.md](https://github.com/apache/tvm/blob/main/ci/jenkins/README.md), [docker/README.md](https://github.com/apache/tvm/blob/main/docker/README.md)

## Hexagon-Specific Notes

The Hexagon backend shows the depth of the deployment story. The user-mode DMA driver is mapped through a dedicated header that defines the `dm*_set_*` / `dm*_get_*` accessors for the Syndrone descriptors, the bus-error and abort codes, and the guest/monitor mode controls. Source: [src/backend/hexagon/runtime/hexagon_user_dma_registers.h](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/hexagon_user_dma_registers.h)
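Register accessors of the `dm*_set_*` / `dm*_get_*` kind are usually thin mask-and-shift helpers over a descriptor word. The Python sketch below shows that general pattern only. The field names and bit positions (`example_len` in bits 0 to 23, `example_flag` in bits 24 to 25) are hypothetical and are not taken from the Hexagon header.

```python
from typing import Callable, Tuple

WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


def make_field(shift: int, width: int) -> Tuple[Callable[[int], int],
                                                 Callable[[int, int], int]]:
    """Build (get, set) accessors for a bit field inside a 32-bit word."""
    if width <= 0 or shift < 0 or shift + width > WORD_BITS:
        raise ValueError("field does not fit in a 32-bit word")
    mask = ((1 << width) - 1) << shift

    def get(word: int) -> int:
        # Isolate the field, then move it down to bit 0.
        return (word & mask) >> shift

    def set_(word: int, value: int) -> int:
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in the field")
        # Clear the old field contents, then OR in the new value.
        return (word & ~mask & WORD_MASK) | (value << shift)

    return get, set_


# Hypothetical layout, for illustration only.
get_example_len, set_example_len = make_field(shift=0, width=24)
get_example_flag, set_example_flag = make_field(shift=24, width=2)
```

Because each setter returns a new word rather than mutating state, accessors for different fields compose without disturbing each other's bits.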

For host-side cross-compilation, LLVM 7.0.0 is the minimum supported version; for execution, the Hexagon SDK 4.0.0 or later is required. Source: [src/backend/hexagon/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/hexagon/runtime/README.md)

## See Also

- Project overview: [README.md](https://github.com/apache/tvm/blob/main/README.md)
- Web runtime build: [web/README.md](https://github.com/apache/tvm/blob/main/web/README.md)
- Packaging & wheels: [ci/scripts/package/README.md](https://github.com/apache/tvm/blob/main/ci/scripts/package/README.md)
- Community: TensorIR Scheduling RFC tracking issue [#7527](https://github.com/apache/tvm/issues/7527)
- Related backends: Vulkan ([src/backend/vulkan/runtime/README.md](https://github.com/apache/tvm/blob/main/src/backend/vulkan/runtime/README.md)), OpenCL ([src/backend/opencl/runtime/opencl_wrapper/README.md](https://github.com/apache/tvm/blob/main/src/backend/opencl/runtime/opencl_wrapper/README.md))
- Deployment apps: [apps/cpp_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/cpp_rpc/README.md), [apps/ios_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/ios_rpc/README.md), [apps/android_rpc/README.md](https://github.com/apache/tvm/blob/main/apps/android_rpc/README.md), [jvm/README.md](https://github.com/apache/tvm/blob/main/jvm/README.md)
- BYOC: [python/tvm/relax/backend/contrib/example_npu/README.md](https://github.com/apache/tvm/blob/main/python/tvm/relax/backend/contrib/example_npu/README.md)

---


## Pitfall Log

Project: apache/tvm

Summary: Found 8 structured pitfall items, including 1 high-severity (blocking) item. Top priority: a security or permission risk that requires verification.

## 1. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/apache/tvm/issues/19802

## 2. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/apache/tvm

## 3. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/apache/tvm

## 4. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: No demo is available (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/apache/tvm

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: No demo is available (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/apache/tvm

## 6. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/apache/tvm/issues/19702

## 7. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/apache/tvm

## 8. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/apache/tvm

<!-- canonical_name: apache/tvm; human_manual_source: deepwiki_human_wiki -->