Doramagic Project Pack · Human Manual
tvm
Open Machine Learning Compiler Framework
TVM Overview & Unity Architecture
Related topics: Frontends, Relax Graph IR & Transformations, TensorIR Scheduling & Backend Code Generation, MetaSchedule Auto-Tuning, Runtime & Deployment
1. Project Purpose and Core Principles
Apache TVM is an open deep learning compiler framework that follows two guiding principles stated directly in the project root: *Python-first development that enables quick customization of machine learning compiler pipelines*, and *Universal deployment to bring models into minimum deployable modules* README.md.
The project began as a research effort for deep learning compilation and incorporated ideas from Halide (Tensor IR and arithmetic simplification), Loopy (integer-set analysis and loop transformation primitives), and Theano (symbolic scan for recurrence) README.md. After several rounds of redesign, the current architecture centers on a *cross-level design with TensorIR*, unifying high-level graph optimizations and low-level loop-level scheduling in a single abstraction README.md. This evolution was tracked publicly through community RFCs, including the TensorIR scheduling tracking issue and the v0.5/v0.8 roadmaps discussed in the community context.
The license is Apache-2.0, and the project operates under the Apache committer model README.md.
2. The Unity Architecture
The "Unity" initiative referenced in the community vote (issue #16368) represents TVM's unified compilation stack that supports modern GenAI workloads such as stable diffusion, Whisper, GPT-class LLMs, and open LLMs. The architecture is exposed through the Python entry point in python/tvm/relax/, where the Relax framework serves as the high-level IR that pairs with TensorIR for low-level scheduling python/tvm/relax/backend/contrib/example_npu/README.md.
A key extensibility mechanism is BYOC (Bring Your Own Codegen), which lets external accelerator vendors plug into the Unity pipeline. The example_npu directory shows a hands-on pattern for adding an NPU backend targeting mobile NPUs (AMD XDNA, Google Edge TPU, Samsung NPU), dedicated AI chips (Intel Movidius, Qualcomm Hexagon, MediaTek APU), cloud AI accelerators (AWS Inferentia, Google TPU, Microsoft Azure Maia), and custom ASICs python/tvm/relax/backend/contrib/example_npu/README.md.
The runtime side of Unity relies on a stable C ABI exposed through tvmjs (TVM WASM/WebGPU runtime for JS/TS) and a typed FFI layer. The TypeScript binding in web/src/ctypes.ts defines kTVMFFIObject, kTVMFFIFunction, kTVMFFITensor, kTVMFFIArray, and related type codes, illustrating the static-object ABI that the compiler, runtime, and language bindings all share.
flowchart LR
A[Python Frontend<br/>Relax / TIR] --> B[Compiler Pipeline<br/>TensorIR + BYOC]
B --> C[Target Backends]
C --> C1[CUDA / Vulkan]
C --> C2[Hexagon / OpenCL]
C --> C3[Metal / WebGPU]
B --> D[Runtime Artifacts]
D --> D1[C/C++ libtvm_runtime]
D --> D2[TVM4J JVM]
D --> D3[tvmjs WASM/JS]
D --> D4[RPC: iOS / Android / C++]
3. Deployment Surfaces and Backend Runtimes
Unity is realized through a rich set of backend runtimes and deployment surfaces:
| Surface | Component | Source |
|---|---|---|
| GPU/compute | Vulkan runtime (DeviceAPI, ThreadEntry, WrappedFunc, command-buffer streams) | src/backend/vulkan/runtime/README.md |
| DSP | Hexagon runtime (host cross-compile + on-device + Android) | src/backend/hexagon/runtime/README.md |
| OpenCL | Dynamically-loaded OpenCL wrapper | src/backend/opencl/runtime/opencl_wrapper/README.md |
| Browser/Node | WASM + WebGPU via Emscripten (prepwasm, make) | web/README.md, web/package.json |
| Mobile RPC | iOS and Android RPC apps | apps/ios_rpc/README.md, apps/android_rpc/README.md |
| Server RPC | Standalone C++ RPC server (USE_CPP_RPC=ON) | apps/cpp_rpc/README.md |
| JVM | TVM4J Java bindings | jvm/README.md |
The Vulkan runtime intentionally mirrors the CUDA stream model: a thread-local vkCommandBuffer queues launches and a fence-based explicit sync simulates CUDA stream semantics src/backend/vulkan/runtime/README.md. The Hexagon runtime requires LLVM ≥ 7.0.0 plus the Hexagon SDK (≥ 4.0.0) and supports cross-compile from x86, plus on-target and Android builds gated by USE_HEXAGON, USE_HEXAGON_ARCH, and USE_HEXAGON_SDK src/backend/hexagon/runtime/README.md. The OpenCL wrapper sidesteps SDK dependency by dynamically loading the platform OpenCL library through a fixed set of wrapped symbols src/backend/opencl/runtime/opencl_wrapper/README.md.
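The dynamic-loading idea behind the OpenCL wrapper can be illustrated with Python's ctypes module. This is only an analogy under stated assumptions: the real wrapper is written in C++, and the helper names and candidate library names below are hypothetical, not taken from the TVM source.

```python
import ctypes
import ctypes.util


def load_first_available(candidates):
    """Return a handle for the first loadable library name, or None."""
    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue  # not present on this machine; try the next candidate
    return None


def resolve_symbol(lib, symbol):
    """Look up a symbol lazily; return None when it cannot be resolved."""
    if lib is None:
        return None
    return getattr(lib, symbol, None)


# Candidate names are illustrative assumptions, not the wrapper's real search list.
names = [n for n in (ctypes.util.find_library("OpenCL"), "libOpenCL.so", "OpenCL.dll") if n]
opencl = load_first_available(names)
get_platforms = resolve_symbol(opencl, "clGetPlatformIDs")
print("OpenCL available:", get_platforms is not None)
```

Because the library is looked up at run time, the same program starts on machines with and without OpenCL installed and simply reports that the backend is unavailable in the second case.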
4. Build Infrastructure, CI, and Distribution
Build and packaging are decoupled. Docker images provide reproducible environments — docker/bash.sh <image_name> mounts the source tree, switches the user, and uses host networking docker/README.md.
CI is split between Jenkins (Linux, including GPU regression suites that gate merges) and GitHub Actions (Windows, macOS, and automation bots) ci/README.md, ci/jenkins/README.md. Test suites live under tests/scripts as task_* scripts invoked by Jenkinsfile, with shared helpers in ci/scripts ci/README.md.
Wheels are produced by cibuildwheel driven from pyproject.toml and .github/workflows/publish_wheel.yml. Helper scripts build the CUDA sidecar libtvm_runtime_cuda.so (Linux: manylinux_build_libtvm_runtime_cuda.sh; Windows: windows_build_libtvm_runtime_cuda.bat) ci/scripts/package/README.md. The current published version is 0.25.0-dev1 for tvmjs web/package.json, and the v0.25.0.rc1 release in the community context is a backport of recent main.
See Also
- TVM Relax BYOC example NPU backend — python/tvm/relax/backend/contrib/example_npu/README.md
- TVM WebAssembly / WebGPU runtime — web/README.md
- Continuous Integration overview — ci/README.md
- Hexagon backend runtime — src/backend/hexagon/runtime/README.md
- Vulkan runtime architecture — src/backend/vulkan/runtime/README.md
Source: https://github.com/apache/tvm / Human Manual
Frontends, Relax Graph IR & Transformations
Related topics: TVM Overview & Unity Architecture, TensorIR Scheduling & Backend Code Generation, MetaSchedule Auto-Tuning, Runtime & Deployment
Overview
Apache TVM is an open machine learning compilation framework whose stated principles are Python-first development for "quick customization of machine learning compiler pipelines" and universal deployment to "bring models into minimum deployable modules" Source: [README.md:1-15](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/README.md). To achieve this, TVM separates *ingestion* of models from many frameworks, *compilation* through a unified high-level graph IR (Relax), and *lowering* to vendor-specific backends. The current design "focuses on a cross-level design with TensorIR" and has gone through several redesigns compared to the initial research project Source: [README.md:38-42](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/README.md).
This page describes the boundary between frontends (model importers), the Relax high-level graph IR, and the transformation passes that prepare a Relax function for backend lowering.
Frontends: Importing Models into Relax
TVM supports ingesting models from multiple frameworks through dedicated frontend modules under python/tvm/relax/frontend/. Each frontend parses a framework-specific representation and emits a Relax Function whose body is composed of Relax Call nodes wrapping imported operator expressions. The Relax frontend layer is intentionally minimal: it is responsible only for translation, not for optimization. All subsequent optimization lives in the transformation pipeline.
Once a model is imported, the resulting IRModule is passed to a sequence of Relax transformations before being lowered to TensorIR (or to a BYOC codegen). This separation is a recurring theme in the codebase: a recently added Example NPU Backend shows that "Build a Neural Processing Unit (NPU) backend for TVM's Relax framework using Bring Your Own Codegen (BYOC)" is the canonical pattern for adding a new hardware target, and that NPUs typically only accept a fixed set of operations (matmul, conv, activations) Source: [python/tvm/relax/backend/contrib/example_npu/README.md:5-19](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/python/tvm/relax/backend/contrib/example_npu/README.md). Frontend output is therefore shaped to be partitionable by the BYOC pipeline.
Frontend Layer Components
| Component | Role |
|---|---|
| Per-framework frontend (e.g. Torch, ONNX) | Translate framework graph to Relax |
| Relax Function | Top-level callable with structured signature info |
| Relax Call | Reference to an operator (callable or extern) |
| extern op | Marks a node that must be handled by a backend (BYOC) |
The Relax Graph IR
Relax is TVM's high-level, function-level IR. It complements TensorIR, which operates on the loop/statement level. Relax carries the *whole-program* view (control flow, function boundaries, shape and type information) while TensorIR describes *how* a particular tensor computation is scheduled. The README explicitly identifies the "cross-level design with TensorIR" as the focus of the most recent redesign Source: [README.md:40-42](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/README.md).
A Relax module exposes:
- Expressions (expr.h): Functions, calls, tuples, variables, and constants.
- StructInfo (struct_info.h): Static shape and type annotations attached to expressions, enabling shape-aware passes before lowering to TIR.
- Globals and modules: Top-level functions that may reference each other, allowing whole-program analyses such as constant folding, dead-code elimination, and operator fusion.
The cross-level design means that a Relax function is gradually rewritten: high-level operators (e.g. relax.matmul, relax.conv2d) are decomposed into relax.call_tir invocations, which the lowering pipeline materializes into TensorIR PrimFunc implementations.
flowchart LR
A[Framework Model<br/>PyTorch / ONNX / TFLite] --> B[Relax Frontend]
B --> C[Relax IRModule]
C --> D[Relax Transformations]
D --> E{Partitionable<br/>subgraph?}
E -- yes --> F[BYOC Codegen<br/>e.g. NPU]
E -- no --> G[Lower to TensorIR]
F --> H[Backend Runtime<br/>CPU/CUDA/OpenCL/<br/>Vulkan/Hexagon/Metal]
G --> H
H --> I[Deployable Module]
Transformations
Transformations are Python-callable passes that take an IRModule and return a (possibly modified) IRModule. They are exposed through python/tvm/relax/transform/transform.py and the C++ headers under include/tvm/relax/transform.h. Each pass is registered so that tvm.transform.Sequential can compose them, and the same infrastructure is used for TensorIR passes.
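The module-in, module-out contract described above can be sketched without TVM itself. In this toy version a "module" is just a dictionary from function name to a list of operator names, and sequential, remove_identity, and drop_empty_functions are hypothetical helpers invented for illustration; they are not TVM passes and do not use tvm.transform.Sequential.

```python
from typing import Callable, Dict, List

# Toy stand-in for an IRModule: function name -> list of operator names.
Module = Dict[str, List[str]]
Pass = Callable[[Module], Module]


def sequential(passes: List[Pass]) -> Pass:
    """Compose passes left to right into a single module-to-module rewrite."""
    def run(mod: Module) -> Module:
        for p in passes:
            mod = p(mod)
        return mod
    return run


def remove_identity(mod: Module) -> Module:
    """Hypothetical cleanup pass: drop no-op 'identity' operators."""
    return {name: [op for op in ops if op != "identity"] for name, ops in mod.items()}


def drop_empty_functions(mod: Module) -> Module:
    """Hypothetical dead-code pass: remove functions with no operators left."""
    return {name: ops for name, ops in mod.items() if ops}


pipeline = sequential([remove_identity, drop_empty_functions])
before = {"main": ["matmul", "identity", "relu"], "unused": ["identity"]}
after = pipeline(before)
print(after)
```

Each pass returns a new dictionary rather than mutating its input, which keeps the passes independent of one another and makes their order the only thing the pipeline has to manage.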
Common categories of Relax transformations include:
- Operator fusion: Combining elementwise and reduction chains into a single Function to reduce memory traffic.
- Dead-code elimination: Removing unused parameters, bindings, and functions.
- Constant folding and lifting: Promoting constants out of hot paths.
- Layout and dtype rewrites: Adjusting the IR to match the capabilities of a chosen backend.
- Partitioning for BYOC: Detecting subgraphs that match a registered external compiler/codegen and replacing them with extern calls. The Example NPU backend illustrates the structure of such a partitioner Source: [python/tvm/relax/backend/contrib/example_npu/README.md:11-19](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/python/tvm/relax/backend/contrib/example_npu/README.md).
- Lowering to TensorIR: Converting high-level operators to call_tir and corresponding PrimFunc implementations.
Transformations are composable: a typical TVM compilation pipeline chains a sequence of Relax passes, then TensorIR passes, and optionally a BYOC integration.
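Partitioning for BYOC can be pictured on a flat operator list. The sketch below is a simplification under stated assumptions: real partitioning works on a dataflow graph and uses registered patterns, whereas this toy groups consecutive operators by membership in an assumed SUPPORTED set. The set contents and the partition helper are hypothetical.

```python
from itertools import groupby

# Assumed capability set, loosely following the "matmul, conv, activations" description.
SUPPORTED = {"matmul", "conv2d", "relu"}


def partition(ops):
    """Group consecutive ops into ("extern", [...]) or ("host", [...]) regions."""
    regions = []
    for offloaded, run in groupby(ops, key=lambda op: op in SUPPORTED):
        regions.append(("extern" if offloaded else "host", list(run)))
    return regions


ops = ["conv2d", "relu", "softmax", "matmul", "relu", "argmax"]
regions = partition(ops)
for target, run in regions:
    print(target, run)
```

Every "extern" region would be handed to the external codegen as one unit, while the "host" regions continue through the normal lowering path.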
Frontend-to-Deployment Data Flow
The end-to-end data flow that ties together frontends, the Relax IR, and transformations is:
- A user loads a model in a frontend module and obtains a Relax IRModule.
- A pipeline of Relax transformations rewrites the module: fuses operators, lifts constants, rewrites layouts, and partitions subgraphs for external codegens.
- The remaining operators are lowered to TensorIR PrimFuncs and scheduled for the chosen target.
- The final module is compiled into a deployable artifact: a shared library loaded by the C++ runtime, a Java artifact (TVM4J) Source: [jvm/core/src/main/java/org/apache/tvm/TVMObject.java:1-30](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/jvm/core/src/main/java/org/apache/tvm/TVMObject.java), a WebAssembly bundle driven by web/src/runtime.ts Source: [web/src/runtime.ts:1-25](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/web/src/runtime.ts), or a Hexagon binary compiled with the Hexagon SDK Source: [src/backend/hexagon/runtime/hexagon_thread_manager.h:1-15](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/src/backend/hexagon/runtime/hexagon_thread_manager.h).
- Hardware-specific runtime threads (e.g. Hexagon HTP/HVX workers) execute the partitioned functions on-device.
Community Context
The transition tracked in issue #16368 ("Transition Main to Unity") is a community-driven effort to consolidate TVM's long-lived branches (Relax, Unity, MLC-AI) into a single main branch for GenAI workloads, which directly affects the frontends and Relax pipeline described above. TensorIR scheduling (issue #7527) provides the scheduling primitives that the Relax-to-TensorIR lowering pass relies on. Auto TensorCore CodeGen (issue #4105) is a CUDA codegen RFC rather than a BYOC integration: it concerns how TensorCore instructions are generated from ordinary CUDA schedules, downstream of the partitioning step used by the Example NPU backend.
See Also
- TVM TensorIR Scheduling — the lower-level IR that Relax lowers to.
- Example NPU Backend — worked example of BYOC partitioning.
- Hexagon Runtime — one of the supported deployment targets.
- TVM WebAssembly Runtime — browser-side deployment via the JavaScript runtime in web/src/runtime.ts.
TensorIR Scheduling & Backend Code Generation
Related topics: TVM Overview & Unity Architecture, Frontends, Relax Graph IR & Transformations, MetaSchedule Auto-Tuning, Runtime & Deployment
1. Overview and Purpose
TensorIR is described in the project README as the focus of TVM's most recent architectural revision, which "focuses on a cross-level design with TensorIR" (README.md:1-50). TensorIR Scheduling refers to the transformation primitives that operate on a schedulable tensor-level intermediate representation (IR), allowing compiler developers to rewrite loops, perform tiling, vectorization, and fusion before backend code generation. Community discussion under issue [#7527 "[RFC][Tracking Issue] TensorIR Scheduling"](https://github.com/apache/tvm/issues/7527) and the Unity transition vote in issue #16368 confirm TensorIR scheduling is a foundational piece for the new generation of TVM.
Backend Code Generation is the second half of the pipeline: a scheduled TIR is lowered into target-specific source code and a runtime module. Each backend (CUDA, Vulkan, WebGPU, Hexagon, NPU) implements its own device API, stream model, and module loader, then registers a global packed function that the runtime calls.
2. Compilation Pipeline
The flow from a high-level model to an executable backend module moves through several stages:
flowchart LR
A[High-level Model] --> B[Relax / Graph IR]
B --> C[TensorIR]
C --> D[Schedule Primitives]
D --> E[Target-specific TIR]
E --> F[Backend CodeGen]
F --> G[Runtime Module]
G --> H[Device API + Stream]
Source: README.md:1-50, src/backend/hexagon/runtime/hexagon_device_api.h:1-50
The schedule step applies transformations such as split, reorder, cache_read, cache_write, compute_at, and reverse_compute_at. These are exposed to Python through tvm.s_tir and the underlying C++ schedule class executes the corresponding mutations on the IR.
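What split and reorder do to a loop nest can be shown with plain Python loops. The functions below are hypothetical illustrations that do not call TVM; they only record the order in which the original index i is visited, and they assume the split factor divides the loop extent evenly.

```python
def original(n):
    """Visit order of a plain loop: for i in range(n)."""
    return [i for i in range(n)]


def after_split(n, factor):
    """Loop i split into (outer, inner); assumes factor divides n evenly."""
    assert n % factor == 0
    return [io * factor + ii for io in range(n // factor) for ii in range(factor)]


def after_split_and_reorder(n, factor):
    """Same split, but with the inner loop moved outside the outer one."""
    assert n % factor == 0
    return [io * factor + ii for ii in range(factor) for io in range(n // factor)]


print(after_split(8, 4))              # same order as the original loop
print(after_split_and_reorder(8, 4))  # same iterations, strided order
```

Splitting alone preserves the visit order, while reordering changes the order but not the set of iterations. That is why such primitives are safe for computations whose iterations are independent.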
3. Backend Code Generation Across Targets
Once a schedule is fixed, the lowered TIR is handed to a per-target codegen that emits C++, CUDA, SPIR-V, LLVM IR, or C-source. The runtime side is then materialised as a loadable module. The following table summarizes the relevant backends observed in the repository and the runtime hooks they implement.
| Target | Codegen / Runtime Anchor | Key Responsibility |
|---|---|---|
| CUDA | src/backend/cuda/codegen/ (scheduled TIR → PTX/CUDA C++) | TensorCore intrinsics, shared memory alloc |
| Vulkan | src/backend/vulkan/runtime/README.md | SPIR-V pipeline from TIR shader |
| WebGPU / Wasm | web/src/runtime.ts, web/README.md | Loads tvmjs_runtime.wasm, exposes PackedFunc |
| Hexagon | src/backend/hexagon/runtime/hexagon_*.h | HTP/HVX threads, user-DMA, VTCM pool |
| Android RPC | apps/android_rpc/app/src/main/jni/tvm_runtime.h | Unified C++ runtime with conditional backends |
Source: src/backend/vulkan/runtime/README.md:1-30, web/README.md:1-50, apps/android_rpc/app/src/main/jni/tvm_runtime.h:1-50.
The Vulkan README explicitly notes that TVM simulates CUDA-style streams by "maintaining a thread-local vkCommandBuffer instance, and queueing up (or eagerly executing, depending on the availability of the VK_KHR_push_descriptor extension)" (src/backend/vulkan/runtime/README.md:1-20). When a scheduled kernel is submitted, the Vulkan device API ends command-buffer recording, submits it to the device queue, and waits on a fence — a direct parallel to how the CUDA backend synchronises its stream.
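The deferred-submission behaviour can be modelled in a few lines of Python. This is a toy model, not the Vulkan runtime: ToyStream and current_stream are hypothetical names, kernels are ordinary Python callables, and the "fence wait" is trivial because the work runs synchronously inside synchronize.

```python
import threading


class ToyStream:
    """Per-thread queue of recorded launches, flushed on synchronize()."""

    def __init__(self):
        self.pending = []    # launches recorded but not yet submitted
        self.completed = []  # results of launches that have finished

    def launch(self, kernel, *args):
        # Deferred mode: record the launch instead of running it immediately.
        self.pending.append((kernel, args))

    def synchronize(self):
        # Submit everything recorded so far, then wait (trivially) for completion.
        batch, self.pending = self.pending, []
        for kernel, args in batch:
            self.completed.append(kernel(*args))


_local = threading.local()


def current_stream():
    """Return the calling thread's stream, creating it on first use."""
    if not hasattr(_local, "stream"):
        _local.stream = ToyStream()
    return _local.stream


stream = current_stream()
stream.launch(lambda a, b: a + b, 1, 2)
stream.launch(lambda a: a * 10, 4)
print(len(stream.pending), stream.completed)  # work is queued, nothing has run
stream.synchronize()
print(len(stream.pending), stream.completed)  # queue drained, results available
```

Keeping the queue in thread-local storage means two host threads never share a recording buffer, so no locking is needed around launch.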
4. Hexagon Backend: A Worked Example
The Hexagon backend is one of the most complete illustrations of post-schedule codegen. A scheduled TIR is lowered into Hexagon-specific code; the runtime layer then manages hardware resources explicitly:
- The HexagonThreadManager spawns worker threads with configurable stack and pipe sizes, and exposes them as TVMStreamHandles (src/backend/hexagon/runtime/hexagon_thread_manager.h:1-50).
- The HexagonDeviceAPI implements AllocDataSpace, CopyDataFromTo, and owns the thread manager, user-DMA engine, and VTCM pool (src/backend/hexagon/runtime/hexagon_device_api.h:1-50).
- The DMA engine is programmed by writing the descriptor fields defined in hexagon_user_dma_descriptors.h and the control registers in hexagon_user_dma_registers.h, where bit-fields such as DESC_BYPASSSRC_MASK and DM0_STATUS_* are encoded directly into the IR-generated DMA setup sequence.
This pattern — schedule the loop nest, lower to target instructions, then dispatch through a device API that owns threads, streams, and memory pools — is the same shape used for CUDA and Vulkan.
5. Cross-Runtime Frontends and Device Codes
The device-type enum is shared across frontends. The Java binding defines constants such as kDLCPU = 1, kDLCUDA = 2, kDLOpenCL = 4, kDLVulkan = 7, kDLMetal = 8, kDLWebGPU = 15, kDLHexagon = 16 (jvm/core/src/main/java/org/apache/tvm/Device.java:1-50). The TypeScript runtime mirrors this enum: cpu: 1, cuda: 2, cl: 4, vulkan: 7, metal: 8, webgpu: 15 (web/src/runtime.ts:1-20). The two maps must stay in lock-step so that a module produced by the Python compiler can be deserialised and dispatched on either the JVM or the browser runtime.
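A lock-step check between the two maps could look like the sketch below. The numeric codes are the ones quoted above; the lowercase key names, the ALIASES table (the TypeScript side spells OpenCL as cl), and the find_drift helper are assumptions made for this illustration and do not exist in the repository.

```python
# Values copied from the two enums quoted above; key names are normalised for the sketch.
JAVA_DEVICE_CODES = {"cpu": 1, "cuda": 2, "opencl": 4, "vulkan": 7,
                     "metal": 8, "webgpu": 15, "hexagon": 16}
TS_DEVICE_CODES = {"cpu": 1, "cuda": 2, "cl": 4, "vulkan": 7,
                   "metal": 8, "webgpu": 15}

# The two bindings spell OpenCL differently; this alias table is an assumption.
ALIASES = {"cl": "opencl"}


def find_drift(reference, other, aliases):
    """Return {name: (reference_code, other_code)} for entries that disagree."""
    drift = {}
    for name, code in other.items():
        canonical = aliases.get(name, name)
        if reference.get(canonical) != code:
            drift[name] = (reference.get(canonical), code)
    return drift


print(find_drift(JAVA_DEVICE_CODES, TS_DEVICE_CODES, ALIASES))
```

An empty result means every code present in the second map agrees with the first. A check of this kind only covers entries that exist in the second map, so a device missing from it (hexagon here) is not reported.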
6. Common Failure Modes and Limitations
A few recurring failure patterns surface across the backends and frontends:
- Device-type drift: if a new DLDevice code is added to the C++ runtime but not propagated to Device.java or runtime.ts, the frontend will fail to recognise the device string.
- WASM/WebGPU lifecycle: the TypeScript runtime warns that objects returned from PackedFunc calls must be released through a scope, because WASM and WebGPU memory is "not tracked through JS native garbage collection" (web/src/runtime.ts:1-50).
- WebGPU is still flagged as experimental: the WebGPU RPC test requires Chrome Canary on macOS plus Vulkan SDK ≥ 1.1, and Firefox support is pending the Fence extension (web/README.md:1-50).
- TensorCore codegen maturity: the Auto TensorCore CodeGen RFC (issue #4105) emphasises that algorithm description and schedule should not differ from normal CUDA codegen, so schedule authors must take care to expose layout, mma fragment, and shared-memory buffer information that the TensorCore lowering expects.
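The scope-based release pattern mentioned for the WASM/WebGPU lifecycle can be mimicked in Python with a context manager. This is an analogy only: the real runtime is TypeScript, and ToyHandle, ToyScope, track, and scope are hypothetical names that do not mirror the tvmjs API.

```python
from contextlib import contextmanager


class ToyHandle:
    """Stand-in for an object whose memory lives outside the garbage collector."""

    def __init__(self, name):
        self.name = name
        self.released = False

    def dispose(self):
        self.released = True


class ToyScope:
    """Tracks handles per nested scope and releases them when the scope closes."""

    def __init__(self):
        self._stack = []

    def track(self, handle):
        """Register a handle with the innermost open scope and return it."""
        if self._stack:
            self._stack[-1].append(handle)
        return handle

    @contextmanager
    def scope(self):
        self._stack.append([])
        try:
            yield
        finally:
            # Release everything created inside this scope, even on error.
            for handle in reversed(self._stack.pop()):
                handle.dispose()


runtime = ToyScope()
with runtime.scope():
    tensor = runtime.track(ToyHandle("tensor"))
    print(tensor.released)  # still alive inside the scope
print(tensor.released)      # released once the scope has closed
```

The point of the pattern is that release is tied to leaving a block of code rather than to garbage collection, which never sees memory owned by the WASM or GPU side.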
7. See Also
- TensorIR RFC and tracking issue: #7527
- Auto TensorCore CodeGen: #4105
- Unity transition vote: #16368
- TVM Roadmap v0.8: #7434
- Relax example NPU backend: python/tvm/relax/backend/contrib/example_npu/README.md
MetaSchedule Auto-Tuning, Runtime & Deployment
Related topics: TVM Overview & Unity Architecture, Frontends, Relax Graph IR & Transformations, TensorIR Scheduling & Backend Code Generation
Overview
Apache TVM is an open machine learning compilation framework built around two guiding principles: Python-first development so compiler pipelines can be customized quickly, and universal deployment that turns models into minimum deployable modules. Source: README.md
The "Auto-Tuning, Runtime & Deployment" surface spans three loosely coupled layers:
- The schedule-search machinery that derives optimized programs (TensorIR / MetaSchedule), discussed publicly in the TensorIR Scheduling tracking issue #7527.
- The device-level runtimes that execute the resulting compiled modules.
- The deployment packaging that delivers those modules to browsers, phones, DSPs, GPUs, and JVMs.
The source files in the repository demonstrate an unusually broad runtime footprint, while exposing a unified FFI (PackedFunc / AsyncPackedFunc) for client code. Source: web/src/runtime.ts
TensorIR / Auto-Tuning Surface
The most recent design focus is "a cross-level design with TensorIR," which links the high-level Relax/Graph IR to the low-level TensorIR scheduling surface so that automatic schedule search can be a drop-in component of an end-to-end compilation pipeline. Source: README.md
The TensorIR Scheduling RFC tracking issue (#7527) collects the original RFC and the landing sequence of the initial schedule primitives, schedule rules, and search strategies that together form the auto-tuning stack. The runtime backends consume the same TensorIR / TIR function form that these auto-tuners emit, so the entire stack is centered on a single IR contract. Source: README.md
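The idea of searching over schedules can be reduced to a very small example: enumerate candidate tile sizes for one loop, score each, and keep the best. The sketch below is not MetaSchedule's API. It replaces real measurement with a made-up cost function, and candidate_factors, toy_cost, search, and the cache_lines parameter are all hypothetical.

```python
def candidate_factors(extent):
    """All tile sizes that divide the loop extent evenly."""
    return [f for f in range(1, extent + 1) if extent % f == 0]


def toy_cost(extent, factor, cache_lines=16):
    """Made-up cost model: one unit per tile plus a penalty for oversized tiles."""
    tiles = extent // factor
    overflow = max(0, factor - cache_lines)
    return tiles + 4 * overflow


def search(extent, cost_fn):
    """Exhaustively score every candidate and return (best_factor, best_cost)."""
    scored = [(cost_fn(extent, f), f) for f in candidate_factors(extent)]
    best_cost, best_factor = min(scored)
    return best_factor, best_cost


print(search(64, toy_cost))
```

With this cost function the search settles on the largest tile that still fits the assumed cache budget. A practical tuner differs mainly in how candidates are generated and how cost is obtained, not in this overall select-the-minimum structure.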
Runtime Infrastructure
TVM's runtime is organized as a collection of DeviceAPI implementations, each owning its own resource managers, threading model, and memory allocators:
- Hexagon DSP runtime — Provides the executables, libraries, and wrappers needed to load and run compiled TVM modules on Qualcomm Hexagon hardware or the Hexagon simulator. The thread manager tracks hardware resources (HTP, HVX) and creates resource managers on demand. Source: src/backend/hexagon/runtime/README.md, src/backend/hexagon/runtime/hexagon_thread_manager.h
- Vulkan runtime — Implements the TVM DeviceAPI interface on top of Vulkan. VulkanDeviceAPI initializes the Vulkan instance and devices, VulkanThreadEntry maintains a per-thread staging buffer and stream, and VulkanWrappedFunc retrieves a VulkanPipeline from the module node and launches the kernel on the active stream. Source: src/backend/vulkan/runtime/README.md
- OpenCL runtime — A thin wrapper around the OpenCL host API that lets TVM load OpenCL at runtime on devices where the SDK is not preinstalled (e.g., Android phones), avoiding the need to vendor a copy of the OpenCL library. Source: src/backend/opencl/runtime/opencl_wrapper/README.md
- WebAssembly / Web runtime — A TypeScript FFI layer (FFILibrary) wrapping a WebAssembly instance, exposing PackedFunc and AsyncPackedFunc callable types and a WebGPUContext for GPU-backed execution in the browser. Source: web/src/runtime.ts, web/src/index.ts
The Web runtime is built with Emscripten into libtvm_runtime.bc, tvmjs_runtime.wasm, and a WASI-compatible tvmjs_runtime.wasi.js; the TypeScript bundle is then produced with npm run bundle. Source: web/README.md
Deployment Targets
flowchart LR
A[Python toolchain<br/>compile + tune] --> B[Compiled TVM module]
B --> C[Native C++ runtime]
B --> D[C++ RPC server]
D --> E[iOS app]
D --> F[Android app]
B --> G[WASM runtime + JS bundle]
G --> H[Browser / WebGPU]
B --> I[Java JNI via TVM4J]
B --> J[Hexagon DSP / HTP / HVX]
B --> K[BYOC: NPU accelerators]
Deployment is intentionally minimal: a tuned module is exported as a shared library plus a thin C/C++ entry point. The repository ships several packaging targets out of the box:
- C++ RPC server — Built when USE_CPP_RPC=ON. The same recipe is reused for Android cross-compilation via the NDK toolchain file. Source: apps/cpp_rpc/README.md
- iOS RPC app — An Xcode project that embeds the TVM runtime and a custom DSO loader plugin, allowing the host Python script to drive the device over the RPC channel. Source: apps/ios_rpc/README.md
- Android RPC app — A Gradle project that bundles TVM4J and exposes the RPC server. Source: apps/android_rpc/README.md
- JVM (TVM4J) — A Java frontend that constructs tensors from native arrays, registers Java callbacks as TVM functions, loads shared libraries produced by the Python toolchain, and provides RPC primitives. Requires JDK 1.6+, Maven 3, and an LLVM-enabled TVM build. Source: jvm/README.md
- BYOC NPUs — The Relax framework's Bring-Your-Own-Codegen pipeline supports dispatching subgraphs to external accelerators. The example NPU backend in python/tvm/relax/backend/contrib/example_npu shows the pattern for mobile NPUs (AMD XDNA, Google Edge TPU, Samsung NPU), dedicated AI chips (Intel Movidius, Qualcomm Hexagon, MediaTek APU), and cloud AI accelerators. Source: python/tvm/relax/backend/contrib/example_npu/README.md
- Wheels — cibuildwheel driven by .github/workflows/publish_wheel.yml and pyproject.toml. Helper scripts such as manylinux_build_libtvm_runtime_cuda.sh and windows_build_libtvm_runtime_cuda.bat build the CUDA-enabled runtime sidecar. Source: ci/scripts/package/README.md
Continuous integration is split between Jenkins (Linux + accelerated hardware) and GitHub Actions (Windows, macOS, on-repo automations). Lint scripts live in tests/lint, task scripts in tests/scripts, and Docker images in docker/ provide the underlying execution environments. Source: ci/README.md, ci/jenkins/README.md, docker/README.md
Hexagon-Specific Notes
The Hexagon backend shows the depth of the deployment story. The user-mode DMA driver is mapped through a dedicated header that defines the dm*_set_* / dm*_get_* accessors for the Syndrone descriptors, the bus-error and abort codes, and the guest/monitor mode controls. Source: src/backend/hexagon/runtime/hexagon_user_dma_registers.h
For host-side cross-compilation, LLVM 7.0.0 is the minimum supported version; for execution, the Hexagon SDK 4.0.0 or later is required. Source: src/backend/hexagon/runtime/README.md
See Also
- Project overview: README.md
- Web runtime build: web/README.md
- Packaging & wheels: ci/scripts/package/README.md
- Community: TensorIR Scheduling RFC tracking issue #7527
- Related backends: Vulkan (src/backend/vulkan/runtime/README.md), OpenCL (src/backend/opencl/runtime/opencl_wrapper/README.md)
- Deployment apps: apps/cpp_rpc/README.md, apps/ios_rpc/README.md, apps/android_rpc/README.md, jvm/README.md
- BYOC: python/tvm/relax/backend/contrib/example_npu/README.md
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 8 structured pitfall item(s), including 1 high/blocking item(s). Top priority: Security or permission risk - Security or permission risk requires verification.
1. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/apache/tvm/issues/19802
2. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | https://github.com/apache/tvm
3. Maintenance risk: Maintenance risk requires verification
- Severity: medium
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/apache/tvm
4. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: downstream_validation.risk_items | https://github.com/apache/tvm
5. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: risks.scoring_risks | https://github.com/apache/tvm
6. Security or permission risk: Security or permission risk requires verification
- Severity: medium
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | https://github.com/apache/tvm/issues/19702
7. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/apache/tvm
8. Maintenance risk: Maintenance risk requires verification
- Severity: low
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: evidence.maintainer_signals | https://github.com/apache/tvm
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using tvm with real data or production workflows.
- [[VOTE] Release Apache TVM v0.25.0.rc1](https://github.com/apache/tvm/issues/19802) - github / github_issue
- [[VOTE] Release Apache TVM v0.25.0.rc0](https://github.com/apache/tvm/issues/19702) - github / github_issue
- [[Tracking Issue][TFLite] Remaining builtin operator coverage beyond #194](https://github.com/apache/tvm/issues/19519) - github / github_issue
- v0.25.0.rc1 - github / github_release
- v0.25.0.rc0 - github / github_release
- Apache TVM v0.24.0 - github / github_release
- Apache TVM v0.23.0 - github / github_release
- Apache TVM v0.22.0 - github / github_release
- Apache TVM v0.21.0 - github / github_release
- Apache TVM v0.20.0 - github / github_release
- Apache TVM v0.19.0 - github / github_release
- Apache TVM v0.18.0 - github / github_release
Source: Project Pack community evidence and pitfall evidence