tvm Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

tvm

Open Machine Learning Compiler Framework

TVM Overview & Unity Architecture

Related topics: Frontends, Relax Graph IR & Transformations, TensorIR Scheduling & Backend Code Generation, MetaSchedule Auto-Tuning, Runtime & Deployment


1. Project Purpose and Core Principles

Apache TVM is an open machine learning compiler framework that follows two guiding principles stated in the project README: *Python-first development that enables quick customization of machine learning compiler pipelines*, and *universal deployment to bring models into minimum deployable modules* (README.md).

The project began as a research effort on deep learning compilation and adopted ideas from Halide (parts of TIR and the arithmetic simplification module), Loopy (integer-set analysis and loop transformation primitives), and Theano (the symbolic scan operator for recurrence) (README.md). After several rounds of redesign, the current architecture centers on a *cross-level design with TensorIR*, which links high-level graph optimizations with low-level loop scheduling (README.md). This evolution was tracked publicly through community RFCs, including the TensorIR scheduling tracking issue (#7527) and the v0.5 and v0.8 roadmaps (v0.8: #7434).

TVM is licensed under Apache-2.0 and operates under the Apache committer model (README.md).

2. The Unity Architecture

The "Unity" initiative, whose transition to main was put to a community vote in issue #16368, is TVM's unified compilation stack for modern GenAI workloads such as Stable Diffusion, Whisper, GPT-class LLMs, and other open LLMs. Its Python entry point is python/tvm/relax/, where the Relax framework serves as the high-level IR that pairs with TensorIR for low-level scheduling (python/tvm/relax/backend/contrib/example_npu/README.md).

A key extensibility mechanism is BYOC (Bring Your Own Codegen), which lets external accelerator vendors plug into the Unity pipeline. The example_npu directory shows a hands-on pattern for adding an NPU backend targeting mobile NPUs (AMD XDNA, Google Edge TPU, Samsung NPU), dedicated AI chips (Intel Movidius, Qualcomm Hexagon, MediaTek APU), cloud AI accelerators (AWS Inferentia, Google TPU, Microsoft Azure Maia), and custom ASICs (python/tvm/relax/backend/contrib/example_npu/README.md).

On the runtime side, Unity relies on a stable C ABI and a typed FFI layer that the compiler, runtime, and language bindings all share. tvmjs, the WASM/WebGPU runtime for JavaScript/TypeScript, is one such binding: web/src/ctypes.ts defines type codes such as kTVMFFIObject, kTVMFFIFunction, kTVMFFITensor, and kTVMFFIArray that mirror this shared ABI.

```mermaid
flowchart LR
    A[Python Frontend<br/>Relax / TIR] --> B[Compiler Pipeline<br/>TensorIR + BYOC]
    B --> C[Target Backends]
    C --> C1[CUDA / Vulkan]
    C --> C2[Hexagon / OpenCL]
    C --> C3[Metal / WebGPU]
    B --> D[Runtime Artifacts]
    D --> D1[C/C++ libtvm_runtime]
    D --> D2[TVM4J JVM]
    D --> D3[tvmjs WASM/JS]
    D --> D4[RPC: iOS / Android / C++]
```

3. Deployment Surfaces and Backend Runtimes

Unity is realized through a rich set of backend runtimes and deployment surfaces:

| Surface | Component | Source |
|---|---|---|
| GPU/compute | Vulkan runtime (DeviceAPI, ThreadEntry, WrappedFunc, command-buffer streams) | src/backend/vulkan/runtime/README.md |
| DSP | Hexagon runtime (host cross-compile + on-device + Android) | src/backend/hexagon/runtime/README.md |
| OpenCL | Dynamically loaded OpenCL wrapper | src/backend/opencl/runtime/opencl_wrapper/README.md |
| Browser/Node | WASM + WebGPU via Emscripten (`prepwasm`, `make`) | web/README.md, web/package.json |
| Mobile RPC | iOS and Android RPC apps | apps/ios_rpc/README.md, apps/android_rpc/README.md |
| Server RPC | Standalone C++ RPC server (`USE_CPP_RPC=ON`) | apps/cpp_rpc/README.md |
| JVM | TVM4J Java bindings | jvm/README.md |

The Vulkan runtime intentionally mirrors the CUDA stream model: a thread-local vkCommandBuffer queues kernel launches, and explicit fence-based synchronization simulates CUDA stream semantics (src/backend/vulkan/runtime/README.md). The Hexagon runtime requires LLVM ≥ 7.0.0 and the Hexagon SDK ≥ 4.0.0; it supports cross-compilation from x86 as well as on-target and Android builds, controlled by the USE_HEXAGON, USE_HEXAGON_ARCH, and USE_HEXAGON_SDK build options (src/backend/hexagon/runtime/README.md). The OpenCL wrapper avoids a build-time SDK dependency by dynamically loading the platform OpenCL library and resolving a fixed set of wrapped symbols (src/backend/opencl/runtime/opencl_wrapper/README.md).

4. Build Infrastructure, CI, and Distribution

Build environments and release packaging are handled by separate tooling. Docker images provide reproducible environments: docker/bash.sh <image_name> mounts the source tree, switches the user, and uses host networking (docker/README.md).

CI is split between Jenkins (Linux, including the GPU regression suites that gate merges) and GitHub Actions (Windows, macOS, and automation bots) (ci/README.md, ci/jenkins/README.md). Test suites live under tests/scripts as task_* scripts invoked by the Jenkinsfile, with shared helpers in ci/scripts (ci/README.md).

Wheels are built with cibuildwheel, driven by pyproject.toml and .github/workflows/publish_wheel.yml. Helper scripts build the CUDA sidecar runtime (libtvm_runtime_cuda.so on Linux), using manylinux_build_libtvm_runtime_cuda.sh on Linux and windows_build_libtvm_runtime_cuda.bat on Windows (ci/scripts/package/README.md). The tvmjs package is versioned 0.25.0-dev1 in web/package.json, and the v0.25.0.rc1 release candidate listed in the community evidence is described as a backport of recent main.

Frontends, Relax Graph IR & Transformations

Related topics: TVM Overview & Unity Architecture, TensorIR Scheduling & Backend Code Generation, MetaSchedule Auto-Tuning, Runtime & Deployment


Overview

Apache TVM is an open machine learning compilation framework whose stated principles are Python-first development for "quick customization of machine learning compiler pipelines" and universal deployment to "bring models into minimum deployable modules" Source: [README.md:1-15](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/README.md). To achieve this, TVM separates *ingestion* of models from many frameworks, *compilation* through a unified high-level graph IR (Relax), and *lowering* to vendor-specific backends. The current design "focuses on a cross-level design with TensorIR" and has gone through several redesigns compared to the initial research project Source: [README.md:38-42](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/README.md).

This page describes the boundary between frontends (model importers), the Relax high-level graph IR, and the transformation passes that prepare a Relax function for backend lowering.

Frontends: Importing Models into Relax

TVM supports ingesting models from multiple frameworks through dedicated frontend modules under python/tvm/relax/frontend/. Each frontend parses a framework-specific representation and emits an IRModule containing a Relax function whose body is built from calls to Relax operators. The frontend layer is intentionally minimal: it is responsible only for translation, not optimization, and all subsequent optimization lives in the transformation pipeline.

Once a model is imported, the resulting IRModule passes through a sequence of Relax transformations before being lowered to TensorIR or handed to a BYOC codegen. This separation recurs throughout the codebase. The recently added example NPU backend, whose README explains how to "Build a Neural Processing Unit (NPU) backend for TVM's Relax framework using Bring Your Own Codegen (BYOC)", illustrates the pattern for adding a new hardware target and notes that NPUs typically accept only a fixed set of operations (matmul, conv, activations) Source: [python/tvm/relax/backend/contrib/example_npu/README.md:5-19](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/python/tvm/relax/backend/contrib/example_npu/README.md). Frontend output is therefore shaped so that the BYOC pipeline can partition it.

Frontend Layer Components

| Component | Role |
|---|---|
| Per-framework frontend (e.g. Torch, ONNX) | Translate framework graph to Relax |
| Relax `Function` | Top-level callable with structured signature info |
| Relax `Call` | Reference to an operator (callable or extern) |
| `extern` op | Marks a node that must be handled by a backend (BYOC) |

The Relax Graph IR

Relax is TVM's high-level, function-level IR. It complements TensorIR, which operates on the loop/statement level. Relax carries the *whole-program* view (control flow, function boundaries, shape and type information) while TensorIR describes *how* a particular tensor computation is scheduled. The README explicitly identifies the "cross-level design with TensorIR" as the focus of the most recent redesign Source: [README.md:40-42](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/README.md).

A Relax module exposes:

Expressions (expr.h): Functions, calls, tuples, variables, and constants.
StructInfo (struct_info.h): Static shape and type annotations attached to expressions, enabling shape-aware passes before lowering to TIR.
Globals and modules: Top-level functions that may reference each other, enabling whole-program passes such as constant folding, dead-code elimination, and operator fusion.

The cross-level design means that a Relax function is rewritten gradually: high-level operators (e.g. relax.matmul, relax.nn.conv2d) are legalized into relax.call_tir invocations, and the lowering pipeline materializes the callees as TensorIR PrimFunc implementations.
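The legalization step can be pictured with a toy model. The sketch below is plain Python and does not call TVM: the `Call` and `Module` classes, the `LEGALIZE` table, and the `<prefix>_<index>` naming scheme are hypothetical stand-ins for Relax call nodes, `relax.call_tir`, and generated `PrimFunc`s. It shows only the shape of the rewrite: each listed high-level operator call is replaced by a `call_tir` call that names a low-level function added to the same module.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Call:
    """Toy call node: an operator name applied to argument names."""
    op: str
    args: List[str]
    callee: str = ""  # for call_tir nodes: the low-level function invoked

@dataclass
class Module:
    """Toy module: bindings (var -> Call) plus low-level functions by name."""
    bindings: Dict[str, Call]
    prim_funcs: Dict[str, str] = field(default_factory=dict)

# Hypothetical table: high-level op -> name prefix of the generated function.
LEGALIZE = {"relax.matmul": "matmul", "relax.nn.conv2d": "conv2d"}

def legalize(mod: Module) -> Module:
    """Return a new module where listed ops become call_tir + stub funcs."""
    bindings: Dict[str, Call] = {}
    prim_funcs = dict(mod.prim_funcs)
    for var, call in mod.bindings.items():
        prefix = LEGALIZE.get(call.op)
        if prefix is None:
            bindings[var] = call  # ops without an entry are left untouched
            continue
        name = f"{prefix}_{len(prim_funcs)}"
        prim_funcs[name] = f"<TensorIR loop nest for {call.op}>"  # placeholder body
        bindings[var] = Call("relax.call_tir", call.args, callee=name)
    return Module(bindings, prim_funcs)

# Usage: relu is deliberately missing from this toy LEGALIZE table.
mod = Module({"y": Call("relax.matmul", ["x", "w"]),
              "z": Call("relax.nn.relu", ["y"])})
lowered = legalize(mod)
print(lowered.bindings["y"])
# Call(op='relax.call_tir', args=['x', 'w'], callee='matmul_0')
```

The input module is left unchanged, mirroring the IRModule-in, IRModule-out contract of real passes.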

```mermaid
flowchart LR
  A[Framework Model<br/>PyTorch / ONNX / TFLite] --> B[Relax Frontend]
  B --> C[Relax IRModule]
  C --> D[Relax Transformations]
  D --> E{Partitionable<br/>subgraph?}
  E -- yes --> F[BYOC Codegen<br/>e.g. NPU]
  E -- no --> G[Lower to TensorIR]
  F --> H[Backend Runtime<br/>CPU/CUDA/OpenCL/<br/>Vulkan/Hexagon/Metal]
  G --> H
  H --> I[Deployable Module]
```

Transformations

Transformations are Python-callable passes that take an IRModule and return a (possibly modified) IRModule. They are exposed through python/tvm/relax/transform/transform.py and the C++ header include/tvm/relax/transform.h. Each pass is registered so that tvm.transform.Sequential can compose them, and TensorIR passes use the same infrastructure.
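Because every pass has the same IRModule-to-IRModule signature, composing passes reduces to function chaining. The plain-Python sketch below models that contract with a dict standing in for an IRModule; `sequential` and both example passes are hypothetical illustrations, not TVM code. The real `tvm.transform.Sequential` additionally consults the active `PassContext` (for example, opt levels and required passes).

```python
from typing import Callable, Dict, List

# Toy "IRModule": function name -> list of op names in its body.
ToyModule = Dict[str, List[str]]
ToyPass = Callable[[ToyModule], ToyModule]

def sequential(passes: List[ToyPass]) -> ToyPass:
    """Compose passes left to right; each one returns a new module."""
    def run(mod: ToyModule) -> ToyModule:
        for p in passes:
            mod = p(mod)
        return mod
    return run

def remove_unused_functions(mod: ToyModule) -> ToyModule:
    """Hypothetical dead-code elimination: keep 'main' and what it calls."""
    called = {op[len("call:"):] for op in mod.get("main", []) if op.startswith("call:")}
    return {name: body for name, body in mod.items() if name == "main" or name in called}

def drop_identity_ops(mod: ToyModule) -> ToyModule:
    """Hypothetical simplification: remove no-op 'identity' entries."""
    return {name: [op for op in body if op != "identity"] for name, body in mod.items()}

pipeline = sequential([remove_unused_functions, drop_identity_ops])
mod = {"main": ["matmul", "identity", "call:helper"],
       "helper": ["relu"],
       "unused": ["add"]}
result = pipeline(mod)
print(result)  # {'main': ['matmul', 'call:helper'], 'helper': ['relu']}
```

Since each pass builds a fresh module, the input `mod` is untouched, which keeps passes easy to reorder and test in isolation.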

Common categories of Relax transformations include:

Operator fusion: Combining elementwise and reduction chains into a single Function to reduce memory traffic.
Dead-code elimination: Removing unused parameters, bindings, and functions.
Constant folding and lifting: Promoting constants out of hot paths.
Layout and dtype rewrites: Adjusting the IR to match the capabilities of a chosen backend.
Partitioning for BYOC: Detecting subgraphs that match a registered external compiler/codegen and replacing them with extern calls. The Example NPU backend illustrates the structure of such a partitioner Source: [python/tvm/relax/backend/contrib/example_npu/README.md:11-19](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/python/tvm/relax/backend/contrib/example_npu/README.md).
Lowering to TensorIR: Converting high-level operators to call_tir and corresponding PrimFunc implementations.
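The BYOC partitioning step can be sketched as grouping maximal runs of operators that a backend supports. The example below is a simplified, hypothetical model: it works on a straight-line list of operators rather than a dataflow graph, and `NPU_SUPPORTED` merely mirrors the matmul/conv/activation examples from the NPU README. Real Relax partitioning matches patterns over the dataflow graph (for example with `relax.transform.FuseOpsByPattern`).

```python
from typing import List, Set, Tuple, Union

# Hypothetical set of ops the example accelerator accepts.
NPU_SUPPORTED: Set[str] = {"matmul", "conv2d", "relu"}

Segment = Union[str, Tuple[str, List[str]]]

def partition(ops: List[str], supported: Set[str] = NPU_SUPPORTED,
              codegen: str = "example_npu") -> List[Segment]:
    """Group maximal runs of supported ops into extern regions.

    Unsupported ops stay as plain strings (host fallback); each supported
    run becomes a (codegen_name, [ops...]) tuple.
    """
    result: List[Segment] = []
    run: List[str] = []
    for op in ops:
        if op in supported:
            run.append(op)
            continue
        if run:  # close the current offloaded region
            result.append((codegen, run))
            run = []
        result.append(op)
    if run:
        result.append((codegen, run))
    return result

print(partition(["conv2d", "relu", "softmax", "matmul", "relu"]))
# [('example_npu', ['conv2d', 'relu']), 'softmax', ('example_npu', ['matmul', 'relu'])]
```

Grouping maximal runs, rather than offloading ops one by one, reduces the number of host/accelerator transitions.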

Transformations are composable: a typical TVM compilation pipeline chains a sequence of Relax passes (including BYOC partitioning where an external codegen is used) followed by TensorIR passes.

Frontend-to-Deployment Data Flow

The end-to-end data flow that ties together frontends, the Relax IR, and transformations is:

1. A user loads a model through a frontend module and obtains a Relax IRModule.
2. A pipeline of Relax transformations rewrites the module: it fuses operators, lifts constants, rewrites layouts, and partitions subgraphs for external codegens.
3. The remaining operators are lowered to TensorIR PrimFuncs and scheduled for the chosen target.
4. The final module is compiled into a deployable artifact: a shared library loaded by the C++ runtime or through the TVM4J Java bindings (Source: [jvm/core/src/main/java/org/apache/tvm/TVMObject.java:1-30](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/jvm/core/src/main/java/org/apache/tvm/TVMObject.java)), a WebAssembly bundle driven by web/src/runtime.ts (Source: [web/src/runtime.ts:1-25](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/web/src/runtime.ts)), or a Hexagon binary built with the Hexagon SDK (Source: [src/backend/hexagon/runtime/hexagon_thread_manager.h:1-15](https://github.com/apache/tvm/blob/ddfec9c3d670358dd75225f1cf9fbfb6c9b0cdfa/src/backend/hexagon/runtime/hexagon_thread_manager.h)).
5. Hardware-specific runtime threads (e.g. Hexagon HTP/HVX workers) execute the compiled functions on the device.

Community Context

The transition tracked in issue #16368 ("Transition Main to Unity") is a community vote to make the Unity development line, which carries Relax, the main branch of TVM with a focus on GenAI workloads; it directly shapes the frontends and Relax pipeline described above. TensorIR scheduling (issue #7527) provides the scheduling primitives that the Relax-to-TensorIR lowering relies on. Auto TensorCore CodeGen (issue #4105) is a related code-generation RFC: it targets TensorCore instructions from ordinary CUDA schedules rather than going through the BYOC partitioning used by the example NPU backend.

TensorIR Scheduling & Backend Code Generation

Related topics: TVM Overview & Unity Architecture, Frontends, Relax Graph IR & Transformations, MetaSchedule Auto-Tuning, Runtime & Deployment


1. Overview and Purpose

TensorIR is described in the project README as the focus of TVM's most recent architectural revision, which "focuses on a cross-level design with TensorIR" (README.md:1-50). TensorIR Scheduling refers to the transformation primitives that operate on a schedulable tensor-level intermediate representation (IR), allowing compiler developers to rewrite loops, perform tiling, vectorization, and fusion before backend code generation. Community discussion under issue [#7527 "[RFC][Tracking Issue] TensorIR Scheduling"](https://github.com/apache/tvm/issues/7527) and the Unity transition vote in issue #16368 confirm TensorIR scheduling is a foundational piece for the new generation of TVM.

Backend Code Generation is the second half of the pipeline: a scheduled TIR is lowered into target-specific source code and a runtime module. Each backend (CUDA, Vulkan, WebGPU, Hexagon, NPU) implements its own device API, stream model, and module loader, then registers a global packed function that the runtime calls.

2. Compilation Pipeline

The flow from a high-level model to an executable backend module moves through several stages:

```mermaid
flowchart LR
    A[High-level Model] --> B[Relax / Graph IR]
    B --> C[TensorIR]
    C --> D[Schedule Primitives]
    D --> E[Target-specific TIR]
    E --> F[Backend CodeGen]
    F --> G[Runtime Module]
    G --> H[Device API + Stream]
```

Source: README.md:1-50, src/backend/hexagon/runtime/hexagon_device_api.h:1-50

The schedule step applies transformations such as split, reorder, cache_read, cache_write, compute_at, and reverse_compute_at. These are exposed to Python through tvm.s_tir and the underlying C++ schedule class executes the corresponding mutations on the IR.
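To make split and reorder concrete, the plain-Python sketch below (no TVM APIs) writes out by hand the loop nest that splitting `i` and `j` by a tile factor and reordering the loops to `(io, jo, ii, ji, r)` would produce for a matrix multiply, and checks it against the naive nest. The tile size and function names are illustrative assumptions; a TensorIR schedule performs this rewrite on the IR rather than on Python source.

```python
from typing import List

Matrix = List[List[float]]

def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Reference loop nest: i, j, r."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for r in range(k):
                c[i][j] += a[i][r] * b[r][j]
    return c

def matmul_tiled(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """Same computation after split(i), split(j), reorder(io, jo, ii, ji, r)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for io in range((n + tile - 1) // tile):          # outer loop from split(i)
        for jo in range((m + tile - 1) // tile):      # outer loop from split(j)
            for ii in range(tile):                    # inner loop from split(i)
                for ji in range(tile):                # inner loop from split(j)
                    i, j = io * tile + ii, jo * tile + ji
                    if i >= n or j >= m:              # guard for non-divisible extents
                        continue
                    for r in range(k):
                        c[i][j] += a[i][r] * b[r][j]
    return c

a = [[float(i * 4 + r) for r in range(4)] for i in range(3)]
b = [[float(r * 5 + j) for j in range(5)] for r in range(4)]
print(matmul_tiled(a, b, tile=2) == matmul_naive(a, b))  # True
```

The reduction over `r` runs in the same order for every output element, so the tiled nest produces exactly the same values; only the traversal order of `(i, j)`, and hence data locality, changes.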

3. Backend Code Generation Across Targets

Once a schedule is fixed, the lowered TIR is handed to a per-target codegen that emits CUDA C++, SPIR-V, LLVM IR, or C source. The runtime side is then materialised as a loadable module. The following table summarizes the backends observed in the repository and the runtime hooks they implement.

| Target | Codegen / Runtime Anchor | Key Responsibility |
|---|---|---|
| CUDA | `src/backend/cuda/codegen/` (scheduled TIR → PTX/CUDA C++) | TensorCore intrinsics, shared-memory allocation |
| Vulkan | `src/backend/vulkan/runtime/README.md` | SPIR-V pipeline from TIR shader |
| WebGPU / Wasm | `web/src/runtime.ts`, `web/README.md` | Loads `tvmjs_runtime.wasm`, exposes PackedFunc |
| Hexagon | `src/backend/hexagon/runtime/hexagon_*.h` | HTP/HVX threads, user-DMA, VTCM pool |
| Android RPC | `apps/android_rpc/app/src/main/jni/tvm_runtime.h` | Unified C++ runtime with conditional backends |

Source: src/backend/vulkan/runtime/README.md:1-30, web/README.md:1-50, apps/android_rpc/app/src/main/jni/tvm_runtime.h:1-50.

The Vulkan README explicitly notes that TVM simulates CUDA-style streams by "maintaining a thread-local vkCommandBuffer instance, and queueing up (or eagerly executing, depending on the availability of the VK_KHR_push_descriptor extension)" kernel launches (src/backend/vulkan/runtime/README.md:1-20). When the stream is synchronized, the Vulkan device API ends command-buffer recording, submits the buffer to the device queue, and waits on a fence, which parallels how the CUDA backend synchronises its stream.
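The deferred-submission idea can be modeled in a few lines. The Python sketch below is a conceptual analogy, not TVM's Vulkan code: each host thread keeps its own pending list (standing in for the thread-local vkCommandBuffer), launches are only recorded, and `synchronize` submits the batch to a worker thread and blocks on a `threading.Event` that plays the role of the fence. `ToyStream` and its `eager` flag (a stand-in for the push-descriptor path) are hypothetical.

```python
import threading
from typing import Callable, List

Kernel = Callable[[], None]

class ToyStream:
    """Deferred, per-thread launch queue with fence-style synchronization."""

    def __init__(self, eager: bool = False) -> None:
        self.eager = eager                 # True: execute launches immediately
        self._local = threading.local()    # per-thread "command buffer" storage

    def _pending(self) -> List[Kernel]:
        # Each host thread lazily gets its own "command buffer".
        if not hasattr(self._local, "cmds"):
            self._local.cmds = []
        return self._local.cmds

    def launch(self, kernel: Kernel) -> None:
        if self.eager:
            kernel()                       # eager path: run right away
        else:
            self._pending().append(kernel)  # deferred path: record only

    def synchronize(self) -> None:
        """End recording, submit the batch, and block until it completes."""
        cmds = self._pending()
        batch = list(cmds)
        cmds.clear()                       # start a fresh "command buffer"
        fence = threading.Event()

        def device_queue() -> None:
            try:
                for kernel in batch:       # the "queue" runs kernels in order
                    kernel()
            finally:
                fence.set()                # signal the fence even on failure

        threading.Thread(target=device_queue).start()
        fence.wait()                       # host waits on the fence

log: List[str] = []
stream = ToyStream()
stream.launch(lambda: log.append("k1"))
stream.launch(lambda: log.append("k2"))
print(log)           # [] (launches were only recorded)
stream.synchronize()
print(log)           # ['k1', 'k2']
```

Batching launches until a sync point amortizes submission cost, which is the motivation for keeping the command buffer open between launches.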

4. Hexagon Backend: A Worked Example

The Hexagon backend is one of the most complete illustrations of post-schedule codegen. A scheduled TIR is lowered into Hexagon-specific code; the runtime layer then manages hardware resources explicitly:

The HexagonThreadManager spawns worker threads with configurable stack and pipe sizes, and exposes them as TVMStreamHandles (src/backend/hexagon/runtime/hexagon_thread_manager.h:1-50).
The HexagonDeviceAPI implements AllocDataSpace, CopyDataFromTo, and owns the thread manager, user-DMA engine, and VTCM pool (src/backend/hexagon/runtime/hexagon_device_api.h:1-50).
The DMA engine is programmed through the descriptor fields defined in hexagon_user_dma_descriptors.h and the control registers in hexagon_user_dma_registers.h, which define bit-field masks such as DESC_BYPASSSRC_MASK and status codes such as DM0_STATUS_* that the runtime uses to build descriptors and interpret DMA status.
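Filling in a descriptor amounts to packing several fields into fixed-width words with masks and shifts. The Python sketch below shows only that technique; the field names, widths, and bit positions in `FIELDS` are invented for illustration and do not reproduce the layout in hexagon_user_dma_descriptors.h.

```python
from typing import Dict, Tuple

# Hypothetical 32-bit control-word layout: field -> (bit shift, bit width).
# These positions are invented; they are NOT the Hexagon descriptor layout.
FIELDS: Dict[str, Tuple[int, int]] = {
    "length": (0, 24),      # transfer size in bytes
    "bypass_src": (24, 1),  # illustrative cache-bypass flag for the source
    "bypass_dst": (25, 1),  # illustrative cache-bypass flag for the destination
    "done": (31, 1),        # illustrative completion bit
}

def field_mask(name: str) -> int:
    """Mask selecting the field's bits in place (like a *_MASK constant)."""
    shift, width = FIELDS[name]
    return ((1 << width) - 1) << shift

def pack(**values: int) -> int:
    """Build a control word from field values, rejecting overflow."""
    word = 0
    for name, value in values.items():
        shift, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word

def unpack(word: int, name: str) -> int:
    """Extract one field from a control word."""
    shift, _ = FIELDS[name]
    return (word & field_mask(name)) >> shift

desc = pack(length=4096, bypass_src=1)
print(hex(desc))  # 0x1001000
```

Range-checking each value before shifting prevents one field from silently corrupting its neighbours.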

This pattern (schedule the loop nest, lower to target instructions, then dispatch through a device API that owns threads, streams, and memory pools) has the same shape as the CUDA and Vulkan backends.

5. Cross-Runtime Frontends and Device Codes

The device-type enum is shared across frontends. The Java binding defines constants such as kDLCPU = 1, kDLCUDA = 2, kDLOpenCL = 4, kDLVulkan = 7, kDLMetal = 8, kDLWebGPU = 15, kDLHexagon = 16 (jvm/core/src/main/java/org/apache/tvm/Device.java:1-50). The TypeScript runtime mirrors this enum: cpu: 1, cuda: 2, cl: 4, vulkan: 7, metal: 8, webgpu: 15 (web/src/runtime.ts:1-20). The two maps must stay in lock-step so that a module produced by the Python compiler can be deserialised and dispatched on either the JVM or the browser runtime.
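Because drift between these maps is easy to introduce, a mechanical comparison helps. The sketch below copies the codes quoted above into two Python dictionaries and reports disagreements. It does not parse Device.java or runtime.ts (a real check would have to), and the `NAME_MAP` correspondence between the two naming schemes is an assumption made for illustration.

```python
from typing import Dict, List

# Codes as listed for the Java binding (Device.java).
JAVA_CODES: Dict[str, int] = {
    "kDLCPU": 1, "kDLCUDA": 2, "kDLOpenCL": 4, "kDLVulkan": 7,
    "kDLMetal": 8, "kDLWebGPU": 15, "kDLHexagon": 16,
}

# Codes as listed for the TypeScript runtime (runtime.ts).
TS_CODES: Dict[str, int] = {
    "cpu": 1, "cuda": 2, "cl": 4, "vulkan": 7, "metal": 8, "webgpu": 15,
}

# Assumed correspondence between the two naming schemes.
NAME_MAP: Dict[str, str] = {
    "cpu": "kDLCPU", "cuda": "kDLCUDA", "cl": "kDLOpenCL",
    "vulkan": "kDLVulkan", "metal": "kDLMetal", "webgpu": "kDLWebGPU",
}

def check_device_codes(java: Dict[str, int], ts: Dict[str, int]) -> List[str]:
    """Return a human-readable list of mismatches (empty when in sync)."""
    problems: List[str] = []
    for ts_name, code in ts.items():
        java_name = NAME_MAP.get(ts_name)
        if java_name is None or java_name not in java:
            problems.append(f"{ts_name}: no Java counterpart")
        elif java[java_name] != code:
            problems.append(f"{ts_name}: TS={code}, Java={java[java_name]}")
    return problems

print(check_device_codes(JAVA_CODES, TS_CODES))  # []
```

The check runs from the TypeScript side only, so device types that only the JVM binding lists (here, Hexagon) are not flagged.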

6. Common Failure Modes and Limitations

A few recurring failure patterns surface across the backends and frontends:

Device-type drift: if a new DLDevice code is added to the C++ runtime but not propagated to Device.java or runtime.ts, the corresponding binding will not recognise the device type.
WASM/WebGPU lifecycle: the TypeScript runtime warns that objects returned from PackedFunc calls must be released through a scope, because WASM and WebGPU memory is "not tracked through JS native garbage collection" (web/src/runtime.ts:1-50).
WebGPU is still flagged as experimental: the WebGPU RPC test requires Chrome Canary on macOS plus Vulkan SDK ≥ 1.1, and Firefox support is pending the Fence extension (web/README.md:1-50).
TensorCore codegen maturity: the Auto TensorCore CodeGen RFC (issue #4105) emphasises that algorithm description and schedule should not differ from normal CUDA codegen, so schedule authors must take care to expose layout, mma fragment, and shared-memory buffer information that the TensorCore lowering expects.
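The scope discipline required by the TypeScript runtime can be illustrated with a Python analogy. The sketch below uses a context manager that tracks handles created inside it and disposes them on exit, the same idea as releasing WASM-side objects that the JavaScript garbage collector cannot see. `Handle`, `Scope`, `track`, and `detach` are hypothetical names, not the tvmjs API.

```python
from typing import List

class Handle:
    """Stand-in for an object whose memory the host GC does not track."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.disposed = False

    def dispose(self) -> None:
        self.disposed = True  # real code would free WASM/WebGPU memory here

class Scope:
    """Dispose every tracked handle when the `with` block exits."""

    def __init__(self) -> None:
        self._owned: List[Handle] = []

    def __enter__(self) -> "Scope":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        for h in reversed(self._owned):  # release in reverse creation order
            h.dispose()
        self._owned.clear()
        return False  # never swallow exceptions

    def track(self, h: Handle) -> Handle:
        self._owned.append(h)
        return h

    def detach(self, h: Handle) -> Handle:
        """Let a result outlive the scope; the caller must dispose it later."""
        self._owned.remove(h)
        return h

with Scope() as scope:
    tmp = scope.track(Handle("intermediate"))
    result = scope.detach(scope.track(Handle("result")))

print(tmp.disposed, result.disposed)  # True False
```

Tying cleanup to `__exit__` means intermediates are released even when the block raises, which is the property that makes scope-based release safer than manual bookkeeping.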

7. See Also

TensorIR RFC and tracking issue: #7527
Auto TensorCore CodeGen: #4105
Unity transition vote: #16368
TVM Roadmap v0.8: #7434
Relax example NPU backend: python/tvm/relax/backend/contrib/example_npu/README.md

Source: https://github.com/apache/tvm / Human Manual

MetaSchedule Auto-Tuning, Runtime & Deployment

Related topics: TVM Overview & Unity Architecture, Frontends, Relax Graph IR & Transformations, TensorIR Scheduling & Backend Code Generation


Overview

Apache TVM is an open machine learning compilation framework built around two guiding principles: Python-first development so compiler pipelines can be customized quickly, and universal deployment that turns models into minimum deployable modules. Source: README.md

The MetaSchedule auto-tuning, runtime, and deployment surface spans three loosely coupled layers:

The schedule-search machinery that derives optimized programs (TensorIR / MetaSchedule), discussed publicly in the TensorIR Scheduling tracking issue #7527.
The device-level runtimes that execute the resulting compiled modules.
The deployment packaging that delivers those modules to browsers, phones, DSPs, GPUs, and JVMs.

The repository covers a broad range of runtimes while exposing a unified FFI to client code (PackedFunc, plus AsyncPackedFunc in the web runtime). Source: web/src/runtime.ts

TensorIR / Auto-Tuning Surface

The most recent design focus is "a cross-level design with TensorIR," which links the high-level Relax/Graph IR to the low-level TensorIR scheduling surface so that automatic schedule search can be a drop-in component of an end-to-end compilation pipeline. Source: README.md

The TensorIR Scheduling RFC tracking issue (#7527) collects the original RFC and the landing sequence of the initial schedule primitives, schedule rules, and search strategies that together form the auto-tuning stack. The backend code generators consume the same TIR PrimFunc form that these auto-tuners emit, and the runtimes execute the compiled result, so the entire stack centers on a single IR contract. Source: README.md
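The search loop at the heart of auto-tuning can be sketched as: enumerate candidate schedules from a design space, measure (or predict) each one, keep the best. The Python below is a deliberately tiny, hypothetical stand-in for that loop. The design space is just a pair of tile sizes, `measure` is a synthetic cost function rather than an on-device benchmark or a learned cost model, and the strategy is plain random sampling, which is simpler than the search strategies MetaSchedule provides.

```python
import itertools
import random
from typing import Callable, List, Tuple

Config = Tuple[int, int]  # (tile_i, tile_j)

def design_space(extent: int) -> List[Config]:
    """All tile-size pairs that evenly divide the loop extent."""
    divisors = [d for d in range(1, extent + 1) if extent % d == 0]
    return list(itertools.product(divisors, divisors))

def measure(cfg: Config) -> float:
    """Synthetic cost: pretend 32x8 tiles fit the cache best."""
    ti, tj = cfg
    return abs(ti - 32) + abs(tj - 8) + 1.0

def tune(space: List[Config], cost: Callable[[Config], float],
         trials: int, seed: int = 0) -> Tuple[Config, float]:
    """Random search: sample up to `trials` configs and keep the cheapest."""
    rng = random.Random(seed)
    sampled = rng.sample(space, min(trials, len(space)))
    best = min(sampled, key=cost)
    return best, cost(best)

space = design_space(64)  # 7 divisors per axis -> 49 candidates
best, best_cost = tune(space, measure, trials=len(space))
print(best, best_cost)    # (32, 8) 1.0 (exhaustive here, so the optimum is found)
```

With a trial budget smaller than the space, the result depends on which candidates happen to be sampled; smarter strategies and cost models exist to spend that budget more effectively.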

Runtime Infrastructure

TVM's runtime is organized as a collection of DeviceAPI implementations, each owning its own resource managers, threading model, and memory allocators:

Hexagon DSP runtime — Provides the executables, libraries, and wrappers needed to load and run compiled TVM modules on Qualcomm Hexagon hardware or the Hexagon simulator. The thread manager tracks hardware resources (HTP, HVX) and creates resource managers on demand. Source: src/backend/hexagon/runtime/README.md, src/backend/hexagon/runtime/hexagon_thread_manager.h
Vulkan runtime — Implements the TVM DeviceAPI interface on top of Vulkan. VulkanDeviceAPI initializes the Vulkan instance and devices, VulkanThreadEntry maintains a per-thread staging buffer and stream, and VulkanWrappedFunc retrieves a VulkanPipeline from the module node and launches the kernel on the active stream. Source: src/backend/vulkan/runtime/README.md
OpenCL runtime — A thin wrapper around the OpenCL host API that lets TVM load OpenCL at runtime on devices where the SDK is not preinstalled (e.g., Android phones), avoiding the need to vendor a copy of the OpenCL library. Source: src/backend/opencl/runtime/opencl_wrapper/README.md
WebAssembly / Web runtime — A TypeScript FFI layer (FFILibrary) wrapping a WebAssembly instance, exposing PackedFunc and AsyncPackedFunc callable types and a WebGPUContext for GPU-backed execution in the browser. Source: web/src/runtime.ts, web/src/index.ts
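The OpenCL wrapper's approach, probing for the platform library at run time instead of linking against an SDK, can be sketched with Python's `ctypes`. This is an analogy, not the wrapper's code: `load_first` and the candidate names are hypothetical, and the usage probes the C library (via `ctypes.CDLL(None)`, which works on POSIX systems) for `strlen` only because an OpenCL library may not be installed where the example runs.

```python
import ctypes
from typing import Iterable, Optional, Sequence

def load_first(candidates: Iterable[Optional[str]],
               symbols: Sequence[str]) -> Optional[ctypes.CDLL]:
    """Return the first library that loads and exports every symbol."""
    for path in candidates:
        try:
            lib = ctypes.CDLL(path)
        except OSError:
            continue  # not present on this device; try the next candidate
        if all(hasattr(lib, name) for name in symbols):
            return lib
    return None  # caller can then report the backend as unavailable

# POSIX only: CDLL(None) exposes symbols already loaded into the process.
libc = load_first(["libOpenCL_not_installed.so", None], ["strlen"])
if libc is not None:
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    print(libc.strlen(b"tvm"))  # 3
```

Returning `None` instead of raising lets the caller treat a missing library as "backend unavailable" rather than a hard failure, which is the point of loading lazily.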

The Web runtime is built with Emscripten into libtvm_runtime.bc, tvmjs_runtime.wasm, and a WASI-compatible tvmjs_runtime.wasi.js; the TypeScript bundle is then produced with npm run bundle. Source: web/README.md

Deployment Targets

```mermaid
flowchart LR
    A[Python toolchain<br/>compile + tune] --> B[Compiled TVM module]
    B --> C[Native C++ runtime]
    B --> D[C++ RPC server]
    D --> E[iOS app]
    D --> F[Android app]
    B --> G[WASM runtime + JS bundle]
    G --> H[Browser / WebGPU]
    B --> I[Java JNI via TVM4J]
    B --> J[Hexagon DSP / HTP / HVX]
    B --> K[BYOC: NPU accelerators]
```

Deployment is intentionally minimal: a tuned module is exported as a shared library plus a thin C/C++ entry point. The repository ships several packaging targets out of the box:

C++ RPC server — Built when USE_CPP_RPC=ON. The same recipe is reused for Android cross-compilation via the NDK toolchain file. Source: apps/cpp_rpc/README.md
iOS RPC app — An Xcode project that embeds the TVM runtime and a custom DSO loader plugin, allowing the host Python script to drive the device over the RPC channel. Source: apps/ios_rpc/README.md
Android RPC app — A Gradle project that bundles TVM4J and exposes the RPC server. Source: apps/android_rpc/README.md
JVM (TVM4J) — A Java frontend that constructs tensors from native arrays, registers Java callbacks as TVM functions, loads shared libraries produced by the Python toolchain, and provides RPC primitives. Requires JDK 1.6+, Maven 3, and an LLVM-enabled TVM build. Source: jvm/README.md
BYOC NPUs — The Relax framework's Bring-Your-Own-Codegen pipeline supports dispatching subgraphs to external accelerators. The example NPU backend in python/tvm/relax/backend/contrib/example_npu shows the pattern for mobile NPUs (AMD XDNA, Google Edge TPU, Samsung NPU), dedicated AI chips (Intel Movidius, Qualcomm Hexagon, MediaTek APU), and cloud AI accelerators. Source: python/tvm/relax/backend/contrib/example_npu/README.md
Wheels — cibuildwheel driven by .github/workflows/publish_wheel.yml and pyproject.toml. Helper scripts such as manylinux_build_libtvm_runtime_cuda.sh and windows_build_libtvm_runtime_cuda.bat build the CUDA-enabled runtime sidecar. Source: ci/scripts/package/README.md

Continuous integration is split between Jenkins (Linux + accelerated hardware) and GitHub Actions (Windows, macOS, on-repo automations). Lint scripts live in tests/lint, task scripts in tests/scripts, and Docker images in docker/ provide the underlying execution environments. Source: ci/README.md, ci/jenkins/README.md, docker/README.md

Hexagon-Specific Notes

The Hexagon backend shows the depth of the deployment story. The user-mode DMA driver is mapped through a dedicated header that defines the dm*_set_* / dm*_get_* accessors for the Syndrone descriptors, the bus-error and abort codes, and the guest/monitor mode controls. Source: src/backend/hexagon/runtime/hexagon_user_dma_registers.h

For host-side cross-compilation, LLVM 7.0.0 is the minimum supported version; for execution, the Hexagon SDK 4.0.0 or later is required. Source: src/backend/hexagon/runtime/README.md

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.


Found 8 structured pitfall item(s), including 1 high/blocking item(s). Top priority: Security or permission risk - Security or permission risk requires verification.

1. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/apache/tvm/issues/19802

2. Capability evidence risk: Capability evidence risk requires verification

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: capability.assumptions | https://github.com/apache/tvm

3. Maintenance risk: Maintenance risk requires verification

Severity: medium
Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/apache/tvm

4. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: no_demo
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: downstream_validation.risk_items | https://github.com/apache/tvm

5. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: no_demo
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: risks.scoring_risks | https://github.com/apache/tvm

6. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/apache/tvm/issues/19702

7. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: issue_or_pr_quality=unknown.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/apache/tvm

8. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: release_recency=unknown.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | https://github.com/apache/tvm

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.


Doramagic exposes project-level community discussion separately from official documentation. Review these links before using tvm with real data or production workflows.

[[VOTE] Release Apache TVM v0.25.0.rc1](https://github.com/apache/tvm/issues/19802) - github / github_issue
[[VOTE] Release Apache TVM v0.25.0.rc0](https://github.com/apache/tvm/issues/19702) - github / github_issue
[[Tracking Issue][TFLite] Remaining builtin operator coverage beyond #194](https://github.com/apache/tvm/issues/19519) - github / github_issue
v0.25.0.rc1 - github / github_release
v0.25.0.rc0 - github / github_release
Apache TVM v0.24.0 - github / github_release
Apache TVM v0.23.0 - github / github_release
Apache TVM v0.22.0 - github / github_release
Apache TVM v0.21.0 - github / github_release
Apache TVM v0.20.0 - github / github_release
Apache TVM v0.19.0 - github / github_release
Apache TVM v0.18.0 - github / github_release

Source: Project Pack community evidence and pitfall evidence