# https://github.com/huggingface/kernels Project Manual

Generated at: 2026-06-17 13:31:04 UTC

## Table of Contents

- [Project Overview and System Architecture](#page-1)
- [Loading Kernels with the `kernels` Python Package](#page-2)
- [Building Kernels with `kernel-builder` and the Nix Builder](#page-3)
- [Example Kernels and Backend Variants](#page-4)

<a id='page-1'></a>

## Project Overview and System Architecture

### Related Pages

Related topics: [Loading Kernels with the `kernels` Python Package](#page-2), [Building Kernels with `kernel-builder` and the Nix Builder](#page-3), [Example Kernels and Backend Variants](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/huggingface/kernels/blob/main/README.md)
- [nix-builder/README.md](https://github.com/huggingface/kernels/blob/main/nix-builder/README.md)
- [kernels/src/kernels/utils.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)
- [kernels/src/kernels/lockfile.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/lockfile.py)
- [kernels/src/kernels/benchmark.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/benchmark.py)
- [kernels/src/kernels/layer/func.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/layer/func.py)
- [kernel-builder/src/main.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/main.rs)
- [kernel-builder/src/upload.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/upload.rs)
- [kernel-builder/src/init.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/init.rs)
- [kernel-builder/src/card.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/card.rs)
- [kernel-builder/src/pyproject/ops_identifier.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/ops_identifier.rs)
- [kernel-builder/src/pyproject/tvm_ffi/mod.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/tvm_ffi/mod.rs)
- [kernel-builder/src/pyproject/templates/torch/noarch/setup.py](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/templates/torch/noarch/setup.py)
- [kernel-builder/src/init/templates/CARD.md](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/init/templates/CARD.md)
- [kernel-builder/skills/xpu-kernels/README.md](https://github.com/huggingface/kernels/blob/main/kernel-builder/skills/xpu-kernels/README.md)
- [kernels-data/src/lib.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/src/lib.rs)
- [kernels-data/src/config/v1.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/src/config/v1.rs)
- [kernels-data/bindings/python/src/lib.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/bindings/python/src/lib.rs)
</details>

# Project Overview and System Architecture

The **huggingface/kernels** project provides a unified stack for distributing and consuming hardware-specific compute kernels as ordinary Python packages. It is built from three components: a Python runtime that loads kernels from the Hub, a `kernel-builder` CLI that turns upstream C++/CUDA/HIP/XPU/TVM-FFI sources into Hub-ready artifacts, and a `kernels-data` shared library that holds the canonical build configuration and metadata schema. On the Hub, kernels are first-class repositories with their own `repo_type="kernel"`, versioning, and discovery surface ([nix-builder/README.md](https://github.com/huggingface/kernels/blob/main/nix-builder/README.md)).

## High-Level Architecture

The system is best understood as a pipeline: a kernel author writes `build.toml` + sources, the `kernel-builder` CLI compiles per-backend variants and uploads them to the Hub, and an end user resolves a kernel through the `kernels` Python package. The diagram below shows the major components and the data that flows between them.

```mermaid
flowchart LR
    A[Kernel source<br/>build.toml + C++/CUDA] --> B[kernel-builder CLI]
    B -->|pyproject + setup.py| C[Per-backend wheels<br/>build/torch-cuda, ...]
    C -->|upload| D[(Hugging Face Hub<br/>repo_type=kernel)]
    D -->|get_kernel| E[kernels Python package]
    E -->|torch/TVM-FFI| F[User model]
    B -.-> G[kernels-data<br/>config + metadata]
    E -.-> G
    G -->|Backend enum| B
```

The shared `kernels-data` crate defines the `Backend` enum (`Cann`, `Cpu`, `Cuda`, `Metal`, `Neuron`, `Rocm`, `Xpu`) and the `build.toml` schema used by every component, ensuring that the CLI, the Python loader, and the Hub metadata stay in lock-step ([kernels-data/src/lib.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/src/lib.rs), [kernels-data/bindings/python/src/lib.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/bindings/python/src/lib.rs)).

## Build System: `kernel-builder`

The Rust-based `kernel-builder` CLI exposes the full kernel lifecycle through subcommands. Reading [kernel-builder/src/main.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/main.rs), the available commands include `Init` (scaffold a new kernel), `CheckConfig` / `CheckBuilds` (validate `build.toml` and outputs), `CreatePyproject` (render `setup.py` and `pyproject.toml` for a specific backend), `Devshell` (drop into a Nix dev shell), `FillCard` (render the Hub model card), and `Upload`.

### `build.toml` and kernel identification

Every kernel declares its identity in `build.toml`. The `[general]` section lists the supported backends, the Python name, license, and an optional `[general.hub]` block with the destination `repo-id` and `branch`. The `kernels-data` parser materializes this into a typed `Build` structure and rejects invalid configurations ([kernels-data/src/config/v1.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/src/config/v1.rs)).
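To make the shape of `[general]` concrete, here is a minimal Python sketch that parses an illustrative `build.toml` with the standard-library `tomllib`. The real parser is the typed Rust `Build` structure in `kernels-data`; the exact key spellings below (`name`, `backends`, `license`, `repo-id`, `branch`) and the validation rules in `read_general` are assumptions for illustration only.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # older interpreters: third-party backport
    import tomli as tomllib

# Illustrative build.toml; key names are assumptions, not the authoritative schema.
BUILD_TOML = """
[general]
name = "activation"
backends = ["cuda", "rocm", "xpu"]
license = "apache-2.0"

[general.hub]
repo-id = "kernels-community/activation"
branch = "v1"
"""


def read_general(text: str) -> dict:
    """Parse build.toml text and return its [general] table (hypothetical checks)."""
    config = tomllib.loads(text)
    general = config.get("general")
    if general is None:
        raise ValueError("build.toml is missing a [general] section")
    if not general.get("backends"):
        raise ValueError("[general] must list at least one backend")
    return general


general = read_general(BUILD_TOML)
hub = general.get("hub", {})  # the optional [general.hub] block
print(general["name"], general["backends"], hub.get("repo-id"), hub.get("branch"))
```

Rejecting a missing `[general]` table or an empty backend list mirrors, in spirit, the "rejects invalid configurations" behaviour described above; the Rust parser's actual error conditions may differ.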

At build time each artifact is suffixed with a **unique identifier** to avoid module name collisions when multiple versions of the same kernel are loaded side by side. `KernelIdentifier::new` derives a Git short hash from the source tree and falls back to a random string when Git is unavailable, then composes identifiers of the form `_<name>_<backend>_<unique_id>` ([kernel-builder/src/pyproject/ops_identifier.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/ops_identifier.rs)).
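The identifier scheme can be sketched in a few lines of Python. This is not the Rust `KernelIdentifier` code: it assumes `git rev-parse --short HEAD` as a stand-in for "a Git short hash from the source tree", and the helper names `unique_id` and `identifier_for_backend` are hypothetical.

```python
import secrets
import subprocess
from pathlib import Path


def unique_id(source_dir: Path) -> str:
    """Return a Git short hash for source_dir, or a random token if Git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=source_dir,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # No git binary, or not a Git checkout: fall back to 8 random hex characters.
        return secrets.token_hex(4)


def identifier_for_backend(name: str, backend: str, uid: str) -> str:
    """Compose an identifier of the form _<name>_<backend>_<unique_id>."""
    return f"_{name}_{backend}_{uid}"


print(identifier_for_backend("activation", "cuda", "0123abc"))  # _activation_cuda_0123abc
```

Because the suffix differs between builds, two versions of the same kernel import as differently named modules and can be loaded side by side.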

### Per-backend templates

The CLI generates a backend-specific `setup.py` from Jinja templates. The Torch CPU/noarch template implements a custom `BuildKernel` `setuptools` command that reads `build.toml`, intersects the requested `--backends` with those declared in `[general]`, and invokes the per-backend builder under `build/` ([kernel-builder/src/pyproject/templates/torch/noarch/setup.py](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/templates/torch/noarch/setup.py)). A parallel `tvm_ffi` template renders the same scaffolding for the experimental TVM-FFI framework ([kernel-builder/src/pyproject/tvm_ffi/mod.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/tvm_ffi/mod.rs)). The `nix-builder` wraps the same flow inside a reproducible Nix expression, leveraging the Hugging Face binary cache for fast incremental builds ([nix-builder/README.md](https://github.com/huggingface/kernels/blob/main/nix-builder/README.md)).

### Uploading and model cards

`upload` resolves the target `repo_id` and branch from CLI args, the `build.toml` `[general.hub]` block, or the per-variant `metadata.json` files. This fallback chain lets authors pin a build to a specific branch such as `build-toml-branch` while still allowing CI overrides ([kernel-builder/src/upload.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/upload.rs)). The companion `fill-card` command renders `CARD.md` from a template that includes usage snippets, available functions, optional layers, and benchmark instructions ([kernel-builder/src/card.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/card.rs), [kernel-builder/src/init/templates/CARD.md](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/init/templates/CARD.md)).

## Python Runtime: the `kernels` Package

The runtime is the consumer-facing half of the project. It centers on `get_kernel`, which resolves a `repo_id` plus `version` to a `metadata.json`, picks the best variant for the current PyTorch build, downloads the wheel, and registers it as a loadable Python module — even from a path outside `PYTHONPATH` ([kernels/src/kernels/utils.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)). The `KERNELS_CACHE` environment variable configures the cache directory, and `LOCAL_KERNELS` allows overriding repo IDs with local build directories for development without uploading.

### Versioning, lockfiles, and reproducibility

Starting with v0.15.1, `version` is mandatory when calling `get_kernel`; bare repo lookups are no longer accepted (see [v0.15.1 release notes](https://github.com/huggingface/kernels/releases/tag/v0.15.1)). The runtime ships a lockfile mechanism that pins every file in a variant to its LFS/Blob SHA, which is essential for reproducibility ([kernels/src/kernels/lockfile.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/lockfile.py)). This is complemented by a proposal to record the `kernel-builder` Git SHA plus a `dirty` flag in the build metadata so that consumers can detect builds produced from uncommitted sources ([issue #648](https://github.com/huggingface/kernels/issues/648)).
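The lockfile's on-disk format is not shown here, but the two kinds of hashes it pins are standard. Git hashes a blob as SHA-1 over the header `blob <size>`, a NUL byte, and then the contents, while Git LFS identifies objects by the SHA-256 of the raw contents. A minimal sketch; `verify_pinned` is a hypothetical check, not the `kernels` API:

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Hash bytes the way Git hashes a blob object (header, NUL byte, contents)."""
    header = f"blob {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()


def lfs_sha256(data: bytes) -> str:
    """Git LFS object IDs are the SHA-256 of the raw file contents."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(data: bytes, pinned: str) -> bool:
    """Hypothetical policy: accept a file if it matches its pinned blob or LFS hash."""
    return pinned in (git_blob_sha1(data), lfs_sha256(data))


print(git_blob_sha1(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The printed value matches what `git hash-object` reports for a file containing `hello` followed by a newline, so a pinned SHA can be checked against a downloaded file without a Git checkout.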

### Layers, functions, and benchmarks

Beyond raw module loading, the package exposes higher-level integration patterns:

| Construct | Purpose | Source |
| --- | --- | --- |
| `FuncRepository` | Reference a single function (e.g. `silu_and_mul`) inside a kernel repo; supports `can_torch_compile` / `can_backward` (v0.15.2) | [kernels/src/kernels/layer/func.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/layer/func.py) |
| `use_kernel_func_from_hub` | Decorator factory that makes a function kernel-pluggable | [kernels/src/kernels/layer/func.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/layer/func.py) |
| `Benchmark` | Base class for `kernels benchmark <repo>` scripts with `setup`, `verify_*`, and timing hooks | [kernels/src/kernels/benchmark.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/benchmark.py) |

## Community and Operations

A few community-driven concerns have shaped the architecture. The Hub now has a first-class "kernel" repo type, with an overview page at `huggingface.co/kernels` that lets users filter by backend (v0.14.0). Trust gating uses the Hub API to check publishers, addressing [issue #651](https://github.com/huggingface/kernels/issues/651) about untrusted kernels. Security analysis reports for `kernels-community` repos are tracked in [issue #657](https://github.com/huggingface/kernels/issues/657), and additional integration requests (FlagOS, causal-conv1d, XPU skill packaging) are openly discussed in the tracker ([kernel-builder/skills/xpu-kernels/README.md](https://github.com/huggingface/kernels/blob/main/kernel-builder/skills/xpu-kernels/README.md), [issue #130](https://github.com/huggingface/kernels/issues/130), [issue #317](https://github.com/huggingface/kernels/issues/317)).

## See Also

- [Kernel Builder CLI Reference](#page-3)
- [Loading Kernels at Runtime](#page-2)
- [build.toml Schema Reference](#)
- [Hub Kernel Repo Type](#)

---

<a id='page-2'></a>

## Loading Kernels with the `kernels` Python Package

### Related Pages

Related topics: [Project Overview and System Architecture](#page-1), [Building Kernels with `kernel-builder` and the Nix Builder](#page-3), [Example Kernels and Backend Variants](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [kernels/README.md](https://github.com/huggingface/kernels/blob/main/kernels/README.md)
- [README.md](https://github.com/huggingface/kernels/blob/main/README.md)
- [kernels/src/kernels/utils.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)
- [kernels/src/kernels/benchmark.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/benchmark.py)
- [kernels/src/kernels/cli/benchmark.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/cli/benchmark.py)
- [kernels/src/kernels/layer/func.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/layer/func.py)
- [kernel-builder/README.md](https://github.com/huggingface/kernels/blob/main/kernel-builder/README.md)
- [kernel-builder/src/init.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/init.rs)
- [kernel-builder/src/init/templates/example.py](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/init/templates/example.py)
- [kernel-builder/src/upload.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/upload.rs)
- [kernel-builder/src/pyproject/templates/torch/noarch/setup.py](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/templates/torch/noarch/setup.py)
- [kernel-builder/src/pyproject/tvm_ffi/mod.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/tvm_ffi/mod.rs)
- [kernel-builder/src/pyproject/torch/mod.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/torch/mod.rs)
- [kernels-data/bindings/python/src/lib.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/bindings/python/src/lib.rs)
</details>

# Loading Kernels with the `kernels` Python Package

The `kernels` package is a thin runtime layer that lets Python applications and libraries pull pre-built compute kernels (CUDA, ROCm, XPU, Metal, CPU, Neuron, CANN) directly from the [Hugging Face Hub](https://hf.co/) and load them as if they were regular Python modules. Unlike a normal pip-installed extension, a Hub-loaded kernel is **portable** (it can be loaded from paths outside `PYTHONPATH`), **unique** (multiple versions can coexist in a single process), and **compatible** (it works across recent Python versions and multiple PyTorch ABIs) [Source: [kernels/README.md:1-12](https://github.com/huggingface/kernels/blob/main/kernels/README.md)].
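The "portable" and "unique" properties rest on a standard Python capability: importing a module from an arbitrary file path under a caller-chosen name. The sketch below uses only `importlib` and is not necessarily how `kernels` implements loading; the module names `activation_v1` and `activation_v2` are hypothetical.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def load_module_from_path(module_name: str, init_file: Path) -> ModuleType:
    """Import a module from an arbitrary file path under a caller-chosen name."""
    spec = importlib.util.spec_from_file_location(module_name, init_file)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {init_file}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, as the regular import system does.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


# Two "versions" of the same kernel, both living outside sys.path.
root = Path(tempfile.mkdtemp())
for version in ("1", "2"):
    pkg = root / f"v{version}"
    pkg.mkdir()
    (pkg / "__init__.py").write_text(f"VERSION = {version}\n")

v1 = load_module_from_path("activation_v1", root / "v1" / "__init__.py")
v2 = load_module_from_path("activation_v2", root / "v2" / "__init__.py")
print(v1.VERSION, v2.VERSION)  # 1 2
```

Giving each build a distinct module name is what lets both versions stay in `sys.modules` at the same time.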

## High-Level Loading Flow

When a caller invokes `get_kernel("owner/repo", version=N)`, the package resolves the requested Hub revision, enumerates the available per-backend build variants, selects the variant that matches the host's current backend (e.g. `cuda`, `xpu`, `cpu`, `metal`, `rocm`, `neuron`, `cann`), downloads the wheel into a local cache, and imports the resulting Python module. The selected module is then registered in a process-global registry so it can be inspected with `get_loaded_kernels()` [Source: [kernels/src/kernels/utils.py:1-120](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)].

```mermaid
flowchart LR
    A[User calls get_kernel] --> B[Resolve Hub repo + version]
    B --> C[Fetch variant list via Hub API]
    C --> D{Backend matches host?}
    D -- No --> E[Raise unsupported backend]
    D -- Yes --> F[Download wheel to KERNELS_CACHE]
    F --> G[Import module from cache path]
    G --> H[Register LoadedKernel in _loaded_kernels]
    H --> I[Return module to caller]
```
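The "Backend matches host?" step can be approximated as follows. This is a sketch under two assumptions: variant directories are named like `torch-<backend>` (echoing `build/torch-cuda` in the architecture diagram; real Hub repos may encode more, such as the PyTorch and CUDA versions), and the host backend is inferred from the installed PyTorch build. Both `detect_backend` and `select_variant` are hypothetical helpers.

```python
def detect_backend() -> str:
    """Guess the host backend from the installed PyTorch build (hypothetical heuristic)."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.version.cuda:  # set for CUDA builds of PyTorch
        return "cuda"
    if torch.version.hip:  # set for ROCm builds of PyTorch
        return "rocm"
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    if torch.backends.mps.is_available():  # Apple GPUs via Metal
        return "metal"
    return "cpu"


# Hypothetical variant names published by a repo.
VARIANTS = ["torch-cpu", "torch-cuda", "torch-rocm", "torch-xpu"]


def select_variant(variants: list[str], backend: str) -> str:
    """Return the variant built for `backend`, or raise if the repo does not publish one."""
    matches = [v for v in variants if v.endswith(f"-{backend}")]
    if not matches:
        raise RuntimeError(f"no matching backend: repo has no '{backend}' variant")
    return matches[0]


print(select_variant(VARIANTS, "cuda"))  # torch-cuda
```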

## Core Public API

The package exposes a small, focused set of entry points. The table below summarises the loaders and their parameters as defined in the source.

| Function | Source | Key arguments | Returns |
| --- | --- | --- | --- |
| `get_kernel(repo_id, version=..., revision=..., backend=..., trust_remote_code=...)` | [kernels/src/kernels/utils.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py) | `repo_id` (`"owner/name"`), `version` (required integer, see v0.15.1 below), `revision` (branch/tag/sha), `backend` (auto-detected), `trust_remote_code` | Imported `ModuleType` |
| `get_local_kernel(path, ...)` | [kernels/src/kernels/utils.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py) | Local `Path` to a built kernel tree (e.g. `build/`) | Imported `ModuleType` |
| `load_kernel(repo_id, lockfile=..., backend=..., revision=...)` | [kernels/src/kernels/utils.py:170-200](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py) | Mutually exclusive `lockfile` or `revision`; if both absent the locked SHA is read from caller package metadata | Imported `ModuleType` |
| `get_loaded_kernels()` | [kernels/src/kernels/utils.py:60-80](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py) | None | `list[LoadedKernel]` snapshot |
| `get_local_kernel_overrides()` (via `LOCAL_KERNELS`) | [kernels/src/kernels/utils.py:90-120](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py) | Colon-separated `name=path` entries | Mapping of repo name → local path |

### A minimal end-to-end example

```python
import torch
from kernels import get_kernel

# `version` is required since v0.15.1
activation = get_kernel("kernels-community/activation", version=1)

x = torch.randn((10, 10), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)
```

[Source: [kernels/README.md:21-39](https://github.com/huggingface/kernels/blob/main/kernels/README.md)]

## The `LoadedKernel` Data Model

Every successfully imported kernel is wrapped in a `LoadedKernel` dataclass that captures both the runtime handle and the descriptive metadata needed for introspection, logging, and reproducibility checks [Source: [kernels/src/kernels/utils.py:30-80](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)].

| Field | Type | Meaning |
| --- | --- | --- |
| `metadata` | `Metadata` | Backend-agnostic descriptor: `id`, `name`, `version`, `license`, `upstream`, `source`, `python_depends`, `backend` |
| `module` | `ModuleType` | The imported Python module exposing the kernel ops |
| `repo_info` | `RepoInfo \| None` | `(repo_id, revision)` for Hub loads; `None` for `get_local_kernel` / `load_kernel` / `get_locked_kernel` |

`Metadata` and the `Backend` enum are produced by the Rust core (`kernels-data`) and re-exported to Python through PyO3 bindings. The supported backends are `CANN`, `CPU`, `CUDA`, `Metal`, `Neuron`, `ROCm`, and `XPU` [Source: [kernels-data/bindings/python/src/lib.rs:1-60](https://github.com/huggingface/kernels/blob/main/kernels-data/bindings/python/src/lib.rs)].

## Configuration & Environment Variables

Two environment variables shape the runtime behaviour of the loader:

- `KERNELS_CACHE` — overrides the directory where downloaded wheels are stored. If unset, the package falls back to its default cache location [Source: [kernels/src/kernels/utils.py:82-88](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)].
- `LOCAL_KERNELS` — a colon-separated list of `repo_name=path` entries that take precedence over Hub downloads. This is the recommended way to point an application at a freshly built kernel during development, and is exactly how the `kernel-builder init` example script tests a local build [Source: [kernels/src/kernels/utils.py:90-120](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py); [kernel-builder/src/init/templates/example.py:1-20](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/init/templates/example.py)].
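The `LOCAL_KERNELS` format described above (colon-separated `name=path` entries) can be parsed with a helper like the one below. `parse_local_kernels` is hypothetical and written only from the description here; the loader's own parsing and its exact choice of key (repo name versus full repo ID) may differ.

```python
from pathlib import Path


def parse_local_kernels(value: str) -> dict[str, Path]:
    """Parse 'name=path:name=path' into a mapping (hypothetical helper)."""
    overrides: dict[str, Path] = {}
    for entry in filter(None, value.split(":")):  # skip empty segments
        name, sep, path = entry.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"malformed LOCAL_KERNELS entry: {entry!r}")
        overrides[name] = Path(path)
    return overrides


overrides = parse_local_kernels("kernels-community/activation=/tmp/activation/build")
print(overrides)
```

In an application, the input would come from `os.environ.get("LOCAL_KERNELS", "")`. Note that a colon separator cannot represent paths that themselves contain colons.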

The `trust_remote_code` flag (passed to `get_kernel`) controls whether the loader will run repository code that does not come from a trusted Hub publisher. As of v0.14.1 the package consults the Hub API to verify publisher trust rather than relying solely on a local allow-list [Source: [kernels/src/kernels/utils.py:130-160](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)].

## Integration Patterns: `FuncRepository` and Decorators

For library authors who want to *map* PyTorch modules onto Hub-backed kernel functions, the package ships a layer system. `FuncRepository` references a single function inside a Hub kernel repo and exposes it as a `torch.nn.Module` subclass; `LocalFuncRepository` does the same for a function inside a locally built kernel directory. Both classes override `__hash__` / `__eq__` so they can be used as dictionary keys in registry-style dispatch tables [Source: [kernels/src/kernels/layer/func.py:1-120](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/layer/func.py)].

The companion decorator `use_kernel_func_from_hub(func_name)` rewrites a plain Python function so that, when called on a `torch.nn.Module`, it is replaced by the Hub kernel version — provided the caller has explicitly opted in via `trust_remote_code`. v0.15.2 added first-class support for `can_torch_compile` and `can_backward` flags on `FuncRepository`, allowing the loader to skip functions that would not survive `torch.compile` or autograd tracing [Source: [kernels/src/kernels/layer/func.py:120-180](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/layer/func.py); [release v0.15.2](https://github.com/huggingface/kernels/releases/tag/v0.15.2)].
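Both design points above (hashable repository references used as dispatch keys, and the `can_torch_compile` / `can_backward` flags) can be illustrated with a small Python sketch. `FuncRef` and `pick_impl` are hypothetical stand-ins: the real `FuncRepository` is a `torch.nn.Module` subclass with hand-written `__hash__` / `__eq__`, whereas a frozen dataclass gets both automatically.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FuncRef:
    """Hypothetical stand-in for FuncRepository: hashable, so usable as a dict key."""
    repo_id: str
    func_name: str
    version: int
    can_torch_compile: bool = False
    can_backward: bool = False


def pick_impl(candidates: dict[FuncRef, Callable], compiling: bool,
              needs_backward: bool) -> Optional[Callable]:
    """Return the first implementation compatible with the current mode, if any."""
    for ref, impl in candidates.items():
        if compiling and not ref.can_torch_compile:
            continue  # would not survive torch.compile tracing
        if needs_backward and not ref.can_backward:
            continue  # no autograd support
        return impl
    return None  # caller falls back to its reference implementation


compiled_only = FuncRef("kernels-community/activation", "silu_and_mul", 1,
                        can_torch_compile=True)
registry = {compiled_only: lambda x: x}
impl = pick_impl(registry, compiling=True, needs_backward=False)
```

Returning `None` instead of raising lets the caller keep using its plain PyTorch implementation whenever no compatible kernel is registered.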

## Benchmarking Loaded Kernels

The CLI subcommand `kernels benchmark <repo_id>` runs user-defined benchmarks against a loaded kernel. Users subclass `kernels.benchmark.Benchmark`, implement `setup()` plus one or more `benchmark_*()` methods, and (optionally) `verify_*()` methods that return a reference tensor. The runner handles device synchronisation across CUDA, XPU, and MPS, supports warmup/iteration counts, and can upload results to the Hub via `POST /api/kernels/{repo_id}/benchmarks` [Source: [kernels/src/kernels/benchmark.py:1-50](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/benchmark.py); [kernels/src/kernels/cli/benchmark.py:1-120](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/cli/benchmark.py)].

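```python
import torch
from kernels.benchmark import Benchmark

class SiluBenchmark(Benchmark):
    def setup(self):
        # silu_and_mul gates the first half of the last dimension with the second,
        # so the output is half as wide as the input.
        self.x = torch.randn(128, 1024, device=self.device, dtype=torch.float16)
        self.out = torch.empty(128, 512, device=self.device, dtype=torch.float16)

    def benchmark_silu(self):
        # Like gelu_fast(y, x) above, the kernel writes into a preallocated output.
        self.kernel.silu_and_mul(self.out, self.x)
```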

[Source: [kernels/src/kernels/benchmark.py:10-30](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/benchmark.py)]
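The warmup/iteration and synchronisation handling described above follows a common timing pattern, sketched here in plain Python. `time_fn` is hypothetical and not the runner's implementation. The `sync` hook matters because GPU kernels launch asynchronously. On CUDA you would pass `torch.cuda.synchronize`; without a sync, the clock would only measure launch overhead.

```python
import statistics
import time
from typing import Callable, Optional


def time_fn(fn: Callable[[], object], warmup: int = 3, iters: int = 10,
            sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median wall-clock time per call in milliseconds."""
    sync = sync or (lambda: None)
    for _ in range(warmup):  # e.g. trigger lazy initialisation and caching
        fn()
    sync()  # drain queued device work before starting the clock
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for the asynchronous kernel to finish
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


print(f"{time_fn(lambda: sum(range(10_000))):.3f} ms")
```

The median is less sensitive than the mean to occasional scheduler hiccups.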

## Common Failure Modes

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| `ValueError: version is required` | Pre-v0.15.1 call without `version=`. Since v0.15.1, specifying the kernel version is mandatory. | Pass an explicit `version=N`. [Source: [release v0.15.1](https://github.com/huggingface/kernels/releases/tag/v0.15.1)] |
| `RuntimeError: no matching backend` | Host PyTorch is built for a backend that the repo does not publish (e.g. loading a CUDA-only kernel on an XPU machine). | Pick a repo that publishes the matching variant, or set `backend=` explicitly. [Source: [kernels/src/kernels/utils.py:130-160](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)] |
| `ValueError: lockfile and revision both cannot be specified` | Passed both to `load_kernel`. | Use exactly one. [Source: [kernels/src/kernels/utils.py:170-200](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)] |
| Stale build loaded instead of the local one | `LOCAL_KERNELS` is unset or its name does not match the requested repo, so a wheel from a previous run is loaded from the cache. | Set `LOCAL_KERNELS` to point at the local build (it takes precedence over Hub downloads), or clear the `KERNELS_CACHE` directory. [Source: [kernels/src/kernels/utils.py:82-120](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)] |
| "Dirty" build flag | Build emitted from an uncommitted `kernel-builder` checkout (recording this as a `dirty` metadata flag is a proposal, not yet guaranteed). | Pin a released `kernel-builder` version. [Source: [issue #648](https://github.com/huggingface/kernels/issues/648)] |

## See Also

- [kernels/README.md](https://github.com/huggingface/kernels/blob/main/kernels/README.md) — package overview and quick start
- [kernel-builder/README.md](https://github.com/huggingface/kernels/blob/main/kernel-builder/README.md) — building and uploading kernels
- [kernels/src/kernels/utils.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py) — loader implementation
- [kernels/src/kernels/layer/func.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/layer/func.py) — `FuncRepository` and decorator
- [kernels/src/kernels/benchmark.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/benchmark.py) — benchmarking base class
- [Hub kernels overview](https://huggingface.co/kernels) — searchable registry of published kernels
- Release notes: [v0.15.2](https://github.com/huggingface/kernels/releases/tag/v0.15.2), [v0.15.1](https://github.com/huggingface/kernels/releases/tag/v0.15.1), [v0.14.1](https://github.com/huggingface/kernels/releases/tag/v0.14.1), [v0.14.0](https://github.com/huggingface/kernels/releases/tag/v0.14.0)

---

<a id='page-3'></a>

## Building Kernels with `kernel-builder` and the Nix Builder

### Related Pages

Related topics: [Project Overview and System Architecture](#page-1), [Loading Kernels with the `kernels` Python Package](#page-2), [Example Kernels and Backend Variants](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [nix-builder/README.md](https://github.com/huggingface/kernels/blob/main/nix-builder/README.md)
- [kernel-builder/src/main.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/main.rs)
- [kernel-builder/src/build.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/build.rs)
- [kernel-builder/src/init.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/init.rs)
- [kernel-builder/src/upload.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/upload.rs)
- [kernel-builder/src/pyproject/ops_identifier.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/ops_identifier.rs)
- [kernel-builder/src/pyproject/torch/mod.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/torch/mod.rs)
- [kernel-builder/src/pyproject/tvm_ffi/mod.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/tvm_ffi/mod.rs)
- [kernel-builder/src/pyproject/kernel.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/kernel.rs)
- [kernel-builder/src/pyproject/templates/torch/noarch/setup.py](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/templates/torch/noarch/setup.py)
- [kernels-data/src/config/mod.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/src/config/mod.rs)
- [terraform/README.md](https://github.com/huggingface/kernels/blob/main/terraform/README.md)
- [kernels/src/kernels/utils.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)
- [kernels/src/kernels/__init__.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/__init__.py)
</details>

# Building Kernels with `kernel-builder` and the Nix Builder

## Overview and Purpose

The `kernel-builder` CLI (with its companion Nix package in `nix-builder/`) is the upstream half of the Hugging Face Kernels system. It scaffolds, compiles, validates, and uploads compute kernels that are then loaded at runtime by the `kernels` Python package. According to [nix-builder/README.md](https://github.com/huggingface/kernels/blob/main/nix-builder/README.md), the builder exists to guarantee three properties:

- **Portable** — kernels can be loaded from paths outside `PYTHONPATH`.
- **Unique** — multiple versions of the same kernel can coexist in a single Python process.
- **Compatible** — kernels support recent Python versions and the various PyTorch build configurations (different CUDA versions and C++ ABIs).

Under the hood, the builder is a Rust binary defined in [kernel-builder/src/main.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/main.rs) that orchestrates a Nix-based build via the `nix` CLI, while a separate Python `setup.py` template drives the per-backend compilation. The whole pipeline produces Hub-compatible kernel repositories that can later be installed with `kernels.install_kernel` (see [kernels/src/kernels/__init__.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/__init__.py)).

## The Kernel-Builder CLI

### Subcommands

The CLI exposes its commands via the `Cli` enum in [kernel-builder/src/main.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/main.rs). The headline commands are:

| Command | Purpose |
| --- | --- |
| `init` | Scaffold a new kernel project from a template. |
| `build` / `build-and-copy` | Compile the kernel variants via Nix. |
| `devshell` | Spawn a Nix development shell for hacking on the kernel. |
| `create-pyproject` | Render the generated CMake/`pyproject.toml` for inspection. |
| `check-config` | Validate `build.toml`. |
| `check-abi` | Verify ABI compatibility of an extension. |
| `check-builds` | Validate already-built artifacts. |
| `upload` | Push build artifacts to the Hugging Face Hub. |
| `build-and-upload` | Build and upload in one step. |
| `fill-card` | Render the `CARD.md` template for the kernel. |

### Scaffolding a Kernel

`kernel-builder init` parses its arguments through the `InitArgs` struct in [kernel-builder/src/init.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/init.rs). It accepts an optional `--name OWNER/REPO` flag and a `--backends` list whose defaults come from `default_init_backends()`. The `BackendSelection` enum accepts the literal `"all"` (every supported backend) or any value accepted by `Backend::from_str`, which is defined in [kernels-data/src/config/mod.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/src/config/mod.rs). The supported backends are listed in the backend matrix below.
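The `"all"` expansion can be mirrored in Python as a rough sketch. The seven backends come from `Backend::all()`, but the lowercase string spellings and the case-insensitive, de-duplicating parse are assumptions. The spellings that `Backend::from_str` actually accepts may differ.

```python
from enum import Enum


class Backend(Enum):
    # String values are assumed spellings, not taken from Backend::from_str.
    CANN = "cann"
    CPU = "cpu"
    CUDA = "cuda"
    METAL = "metal"
    NEURON = "neuron"
    ROCM = "rocm"
    XPU = "xpu"


def parse_backend_selection(values: list[str]) -> list[Backend]:
    """Expand 'all' to every backend; otherwise parse each name, dropping duplicates."""
    selected: list[Backend] = []
    for value in values:
        if value.lower() == "all":
            candidates = list(Backend)
        else:
            try:
                candidates = [Backend(value.lower())]
            except ValueError:
                raise ValueError(f"unknown backend: {value!r}") from None
        for backend in candidates:
            if backend not in selected:
                selected.append(backend)
    return selected


print(parse_backend_selection(["cuda", "xpu"]))
```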

### Build, Upload, and Reproducibility

`run_build` and `run_build_and_copy` in [kernel-builder/src/build.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/build.rs) prepare a Nix flake and dispatch either `nix build` (per-variant attribute `redistributable.{variant}`) or `nix run` against the `build-and-copy` attribute. Each produced kernel carries a unique identifier assembled by `KernelIdentifier::to_string_for_backend` in [kernel-builder/src/pyproject/ops_identifier.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/ops_identifier.rs), formatted as `_{name}_{backend}_{unique_id}`. The identifier is derived from a Git short hash when available and otherwise from a random string, so a build made outside Git cannot be traced back to a source revision. Relatedly, [issue #648](https://github.com/huggingface/kernels/issues/648) proposes embedding the commit SHA of `kernel-builder` itself, plus a `dirty` boolean, in the build metadata.
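The two dispatch paths can be sketched as Nix command lines. The attribute names `redistributable.{variant}` and `build-and-copy` come from the description above. The flake reference `.` (the current directory) and the variant name `torch-cuda` are assumptions, and the real CLI may pass additional flags.

```python
from typing import Optional


def nix_build_command(variant: Optional[str], flake_ref: str = ".") -> list[str]:
    """Return the nix argv for one variant, or for build-and-copy when variant is None."""
    if variant is not None:
        # One redistributable attribute per build variant.
        return ["nix", "build", f"{flake_ref}#redistributable.{variant}"]
    # Build all variants and copy the outputs (destination is defined by the flake).
    return ["nix", "run", f"{flake_ref}#build-and-copy"]


print(nix_build_command("torch-cuda"))  # ['nix', 'build', '.#redistributable.torch-cuda']
```

To execute the command, pass the list to `subprocess.run(cmd, check=True)`. Building an argv list instead of a shell string avoids quoting problems with variant names.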

`run_upload` in [kernel-builder/src/upload.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/upload.rs) takes `--repo-id` and `--branch` from CLI flags, falling back to the `[general.hub]` block in `build.toml`. If the branch is still unset, it calls `detect_branch_from_metadata`, which reads the `metadata.json` in each variant directory to derive the version branch.
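Here is a Python sketch of the branch fallback chain. Two details are assumptions, not taken from `upload.rs`: that `metadata.json` carries an integer `version` field, and that version branches are named `v<version>`. `detect_branch` and `resolve_branch` are hypothetical counterparts of the Rust logic.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def detect_branch(build_dir: Path) -> Optional[str]:
    """Derive a 'v<version>' branch from the variants' metadata.json files (hypothetical)."""
    versions = set()
    for meta_file in sorted(build_dir.glob("*/metadata.json")):
        version = json.loads(meta_file.read_text()).get("version")
        if version is not None:
            versions.add(version)
    if len(versions) > 1:
        raise ValueError(f"variants disagree on version: {versions}")
    return f"v{versions.pop()}" if versions else None


def resolve_branch(cli_branch: Optional[str], hub_branch: Optional[str],
                   build_dir: Path) -> Optional[str]:
    """CLI flag first, then [general.hub] in build.toml, then variant metadata."""
    return cli_branch or hub_branch or detect_branch(build_dir)


# Demo: two variant directories that both report version 1.
demo_dir = Path(tempfile.mkdtemp())
for variant in ("torch-cpu", "torch-cuda"):
    (demo_dir / variant).mkdir()
    (demo_dir / variant / "metadata.json").write_text(json.dumps({"version": 1}))

print(resolve_branch(None, None, demo_dir))  # v1
```

Raising when variants disagree is a deliberate choice in this sketch. It keeps a mixed build directory from being uploaded to a single version branch.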

## Build Configuration and Backend Support

### Backend Matrix

[kernels-data/src/config/mod.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/src/config/mod.rs) defines `Backend::all()` as the canonical seven-backend set: `Cann`, `Cpu`, `Cuda`, `Metal`, `Neuron`, `Rocm`, `Xpu`. [nix-builder/README.md](https://github.com/huggingface/kernels/blob/main/nix-builder/README.md) summarises the current support tier:

| Backend | Kernels runtime | Kernel-builder | CI validated | Tier |
| --- | --- | --- | --- | --- |
| CUDA | ✓ | ✓ | ✓ | 1 |
| ROCm | ✓ | ✓ | ✗ | 2 |
| XPU | ✓ | ✓ | ✗ | 2 |
| Metal | ✓ | ✓ | ✗ | 2 |
| Huawei NPU | ✓ | ✗ | ✗ | 3 |
| Neuron | ✓ (experimental) | ✗ | ✗ | 3 |

The same file warns that Neuron support is experimental and currently requires pre-release packages.
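For tooling that wants to warn about requested backends the builder cannot handle, the matrix can be transcribed as data. This is a hypothetical helper, not part of either package; mapping the "Huawei NPU" row to the `cann` backend name is an assumption, and CPU is left out because the table does not list it.

```python
from typing import Iterable, List

# Support matrix transcribed from the table above (nix-builder/README.md).
# ASSUMPTION: the "Huawei NPU" row corresponds to the `cann` backend name.
SUPPORT = {
    "cuda":   {"runtime": True, "builder": True,  "ci": True,  "tier": 1},
    "rocm":   {"runtime": True, "builder": True,  "ci": False, "tier": 2},
    "xpu":    {"runtime": True, "builder": True,  "ci": False, "tier": 2},
    "metal":  {"runtime": True, "builder": True,  "ci": False, "tier": 2},
    "cann":   {"runtime": True, "builder": False, "ci": False, "tier": 3},
    "neuron": {"runtime": True, "builder": False, "ci": False, "tier": 3},  # runtime is experimental
}


def builder_gaps(backends: Iterable[str]) -> List[str]:
    """Return requested backends that the matrix marks as not built by kernel-builder.

    Backends absent from the matrix (e.g. `cpu`) are not reported, because the
    table says nothing about them.
    """
    return [b for b in backends if b in SUPPORT and not SUPPORT[b]["builder"]]


print(builder_gaps(["cuda", "neuron", "cpu"]))  # ['neuron']
```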

### FFI Backends: Torch and TVM FFI

Scaffolding for each FFI layer is generated by one of two sibling modules. [kernel-builder/src/pyproject/torch/mod.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/torch/mod.rs) handles Torch FFI builds, embedding CMake helpers such as `build-variants.cmake` and `kernel.cmake` along with backend-specific helpers (e.g. `compile-metal.cmake`, `hipify.py`, `metallib_to_header.py`). [kernel-builder/src/pyproject/tvm_ffi/mod.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/tvm_ffi/mod.rs) does the same for the newer TVM FFI layer, including a CUDA capability-detection script.

Both modules feed the same `render_kernel_components` function in [kernel-builder/src/pyproject/kernel.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/kernel.rs), which switches on the `Kernel` variant (`Cpu`, `Cuda`, `Rocm`, `Metal`, `Xpu`) and emits the correct set of source paths for each kernel.

### Setup and `build.toml`

After scaffolding, the project ships a `build.toml` and a generated `setup.py`. The template at [kernel-builder/src/pyproject/templates/torch/noarch/setup.py](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/templates/torch/noarch/setup.py) defines a `BuildKernel` command that:

1. Reads `build.toml` with `tomllib` (Python 3.11+) or `tomli`.
2. Reads `general.backends` and intersects it with `--backends=…` if provided.
3. Creates a `build/` directory and invokes `build_backend` for each requested backend.
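The three steps above can be sketched as plain Python. This is not the template's `BuildKernel` command: `select_backends` and `build_kernel` are hypothetical, `build_backend` is passed in as a callable stand-in, and treating `--backends` as a comma-separated list is an assumption.

```python
import sys
from pathlib import Path
from typing import Callable, List, Optional

if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # third-party fallback for older Pythons


def select_backends(config: dict, requested: Optional[str]) -> List[str]:
    declared = config["general"]["backends"]
    if not requested:
        return list(declared)
    # ASSUMPTION: --backends takes a comma-separated list.
    wanted = {b.strip() for b in requested.split(",") if b.strip()}
    return [b for b in declared if b in wanted]  # keep build.toml order


def build_kernel(project: Path, requested: Optional[str],
                 build_backend: Callable[[str, Path], None]) -> List[str]:
    # Step 1: read build.toml (tomllib needs the file opened in binary mode).
    with (project / "build.toml").open("rb") as f:
        config = tomllib.load(f)
    # Step 2: intersect general.backends with the requested subset.
    backends = select_backends(config, requested)
    # Step 3: create build/ and build each selected backend.
    build_dir = project / "build"
    build_dir.mkdir(exist_ok=True)
    for backend in backends:
        build_backend(backend, build_dir)
    return backends
```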

A typical Nix invocation against an example project is shown in the README:

```bash
cd examples/relu
nix run .#build-and-copy \
  --max-jobs 2 \
  --cores 8 \
  -L
```

To accelerate rebuilds, the README recommends enabling the Hugging Face binary cache with `cachix use huggingface`.

## Uploading, Loading, and Tooling

Once the artifacts exist, `kernel-builder upload` (or the combined `build-and-upload`) pushes each variant to the Hub. After upload, downstream users consume kernels through the Python API surface listed in [kernels/src/kernels/__init__.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/__init__.py): `get_kernel`, `get_kernel_variants`, `install_kernel`, `get_local_kernel`, `get_locked_kernel`, and helpers such as `use_kernel_func_from_hub` and `replace_kernel_forward_from_hub`. As announced in the [v0.15.1 release notes](https://github.com/huggingface/kernels/releases/tag/v0.15.1), specifying a kernel version is now mandatory:

```python
import kernels

# No longer valid: the version argument is now required.
# activation = kernels.get_kernel("kernels-community/activation")

# Required form:
activation = kernels.get_kernel("kernels-community/activation", version=1)
```

For iteration and benchmarking, [kernels/src/kernels/benchmark.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/benchmark.py) defines a `Benchmark` base class that auto-loads the kernel from a `repo_id`, runs `setup()`, and exposes `benchmark_*` / `verify_*` methods to the `kernels benchmark` runner.

Documentation for the `kernel-builder` CLI has been raised as a gap in [docs issue #621](https://github.com/huggingface/kernels/issues/621). For a ready-made development machine, the Builder README points contributors to the Terraform setup described in [terraform/README.md](https://github.com/huggingface/kernels/blob/main/terraform/README.md), which provisions an EC2 development workspace with a `nix develop` shell and a 1 TiB data volume for kernel work.
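The `benchmark_*` / `verify_*` naming convention used by the `kernels benchmark` runner can be illustrated with a self-contained stand-in. `MiniBenchmark`, `run_benchmark`, and `ReluBench` are hypothetical and are not the `kernels.benchmark.Benchmark` API: the sketch does not load anything from a `repo_id`, and it does not model whether the real runner pairs each `verify_*` with a `benchmark_*` method.

```python
import time
from typing import Dict, Union


class MiniBenchmark:
    """Hypothetical stand-in illustrating the convention; NOT kernels.benchmark.Benchmark."""

    def setup(self) -> None:
        """Prepare inputs once before any benchmark or verify method runs."""


def run_benchmark(bench: MiniBenchmark, iterations: int = 10) -> Dict[str, Union[float, bool]]:
    bench.setup()
    results: Dict[str, Union[float, bool]] = {}
    for name in sorted(dir(bench)):
        if name.startswith("benchmark_"):
            method = getattr(bench, name)
            start = time.perf_counter()
            for _ in range(iterations):
                method()
            # Mean wall-clock seconds per call.
            results[name] = (time.perf_counter() - start) / iterations
        elif name.startswith("verify_"):
            results[name] = bool(getattr(bench, name)())
    return results


class ReluBench(MiniBenchmark):
    def setup(self) -> None:
        self.x = [-2.0, -0.5, 0.0, 1.5]

    def benchmark_relu(self) -> None:
        self.y = [max(v, 0.0) for v in self.x]

    def verify_relu(self) -> bool:
        return [max(v, 0.0) for v in self.x] == [0.0, 0.0, 0.0, 1.5]


print(run_benchmark(ReluBench(), iterations=100))
```

Discovering methods by name prefix keeps each benchmark a plain method, so adding a new case needs no registration step.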

## See Also

- [Using the `kernels` Python package](https://github.com/huggingface/kernels)
- [`kernel-builder` CLI reference](https://github.com/huggingface/kernels/tree/main/kernel-builder)
- [Hub kernel repositories](https://huggingface.co/kernels) (introduced in [v0.14.0](https://github.com/huggingface/kernels/releases/tag/v0.14.0))
- [Release v0.13.0 — `kernel-builder` CLI overhaul](https://github.com/huggingface/kernels/releases/tag/v0.13.0)
- [Issue #651 — Guide for users without kernel publishing access](https://github.com/huggingface/kernels/issues/651)
- [Issue #657 — Security analysis reports for community kernels](https://github.com/huggingface/kernels/issues/657)

---

<a id='page-4'></a>

## Example Kernels and Backend Variants

### Related Pages

Related topics: [Project Overview and System Architecture](#page-1), [Loading Kernels with the `kernels` Python Package](#page-2), [Building Kernels with `kernel-builder` and the Nix Builder](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [examples/kernels/relu/build.toml](https://github.com/huggingface/kernels/blob/main/examples/kernels/relu/build.toml)
- [examples/kernels/relu/flake.nix](https://github.com/huggingface/kernels/blob/main/examples/kernels/relu/flake.nix)
- [examples/kernels/relu/CARD.md](https://github.com/huggingface/kernels/blob/main/examples/kernels/relu/CARD.md)
- [examples/kernels/relu-torch-stable-abi/build.toml](https://github.com/huggingface/kernels/blob/main/examples/kernels/relu-torch-stable-abi/build.toml)
- [examples/kernels/relu-torch-stable-abi/flake.nix](https://github.com/huggingface/kernels/blob/main/examples/kernels/relu-torch-stable-abi/flake.nix)
- [examples/kernels/relu-tvm-ffi/build.toml](https://github.com/huggingface/kernels/blob/main/examples/kernels/relu-tvm-ffi/build.toml)
- [kernels-data/src/config/mod.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/src/config/mod.rs)
- [kernels-data/src/config/v1.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/src/config/v1.rs)
- [kernel-builder/src/pyproject/torch/mod.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/torch/mod.rs)
- [kernel-builder/src/pyproject/tvm_ffi/mod.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/tvm_ffi/mod.rs)
- [kernel-builder/src/pyproject/templates/torch/noarch/setup.py](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/templates/torch/noarch/setup.py)
- [kernel-builder/src/init.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/init.rs)
- [kernel-builder/src/init/templates/CARD.md](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/init/templates/CARD.md)
- [kernel-builder/src/card.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/card.rs)
- [kernel-builder/src/upload.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/upload.rs)
- [kernels/src/kernels/utils.py](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)
- [nix-builder/README.md](https://github.com/huggingface/kernels/blob/main/nix-builder/README.md)
</details>

# Example Kernels and Backend Variants

The `examples/kernels/` directory in the repository contains reference kernel projects that demonstrate how to structure, configure, build, and distribute kernels for the Hugging Face Hub. The examples below all implement the same simple kernel (ReLU) but bind it to Python in different ways, giving kernel authors a working template they can copy and adapt.

The example projects covered on this page are:

- `examples/kernels/relu` — a baseline ReLU kernel built with the default Torch extension pipeline
- `examples/kernels/relu-torch-stable-abi` — a ReLU kernel built against PyTorch's stable C++ ABI
- `examples/kernels/relu-tvm-ffi` — a ReLU kernel exposed through the TVM-FFI layer

## Supported Backends

The build pipeline resolves backend targets from the `general.backends` array in `build.toml`. The full set of supported backends is enumerated in the shared `Backend` enum used across the data model and CLI ([kernels-data/src/config/mod.rs:60-100](https://github.com/huggingface/kernels/blob/main/kernels-data/src/config/mod.rs)):

| Backend  | Description                          |
|----------|--------------------------------------|
| `cpu`    | CPU reference implementation         |
| `cuda`   | NVIDIA CUDA                          |
| `metal`  | Apple Metal                          |
| `rocm`   | AMD ROCm / HIP                       |
| `xpu`    | Intel XPU                            |
| `neuron` | AWS Trainium / Neuron (NKI)          |
| `cann`   | Huawei CANN                          |

Each backend produces a separate artifact under the repository's `build/` directory, identified by a directory name such as `torch-cuda`, `torch-xpu`, or `torch-metal`. The variant strings and the CMake/pyproject generation for these backends are implemented in the Torch pyproject module ([kernel-builder/src/pyproject/torch/mod.rs:1-30](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/torch/mod.rs)).

## FFI Variants

Beyond the execution backend, the example kernels illustrate two distinct foreign-function interface (FFI) layers used to bridge C++ kernels to Python:

- **Torch C++ extension** — the default FFI used by the `relu` and `relu-torch-stable-abi` examples. Source files are compiled into a CPython extension and the functions are registered through `torch::Library` / `TORCH_LIBRARY` macros. The `setup.py` template used for backend-agnostic builds lives at [kernel-builder/src/pyproject/templates/torch/noarch/setup.py](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/templates/torch/noarch/setup.py).
- **TVM FFI** — used by the `relu-tvm-ffi` example. Kernels are registered through TVM FFI instead of `torch::Library`, so they are exposed through TVM FFI's own ABI rather than as a PyTorch-specific extension. The corresponding pyproject generation is implemented in [kernel-builder/src/pyproject/tvm_ffi/mod.rs:1-40](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/pyproject/tvm_ffi/mod.rs), including a dedicated `tvm_ffi/setup.py` template.

The FFI layer is selected in `build.toml`. Files that still use the older v1 schema are parsed by [kernels-data/src/config/v1.rs:1-50](https://github.com/huggingface/kernels/blob/main/kernels-data/src/config/v1.rs) and migrated into the builder's current internal representation before templates are rendered.

## Common Project Layout

All three examples follow the same scaffold produced by `kernel-builder init` and described in the Nix-builder README ([nix-builder/README.md:1-40](https://github.com/huggingface/kernels/blob/main/nix-builder/README.md)):

- `build.toml` — declarative build configuration. Required fields are `general.name`, `general.license`, and `general.backends`; the optional `general.hub` section supplies `repo-id` and `branch` for Hub distribution.
- `flake.nix` — Nix expression that drops the user into a reproducible development shell with the `kernel-builder` CLI on `PATH`; pre-built dependencies can be fetched from the Hugging Face Cachix binary cache.
- `CARD.md` — Jinja2 template for the Hub model card. The master template lives at [kernel-builder/src/init/templates/CARD.md](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/init/templates/CARD.md) and is filled in at upload time by [kernel-builder/src/card.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/card.rs), which inspects the kernel's `torch-ext/<module>/__init__.py` and `layers/__init__.py` to enumerate functions and layers.
- `torch-ext/<module>/` — the kernel's Python package: an `__init__.py` declaring the exposed functions and (optionally) a `layers/__init__.py` declaring `nn.Module` wrappers. The C++ / CUDA / Metal / HIP sources are referenced per kernel from `build.toml`.

The `init` command is also where authors declare which backends to enable when scaffolding a new project. The accepted values are defined in [kernel-builder/src/init.rs](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/init.rs): the literal `all`, or any of the `Backend` enum values listed above.

## Build and Distribution Flow

The canonical build path for an example kernel is:

```bash
nix develop path:examples/kernels/relu
kernel-builder build
```

`kernel-builder build` reads `build.toml`, generates the per-backend `setup.py` / CMake configuration, and produces the build variants for each backend under `build/`. Uploading those variants to the Hub is handled by `kernel-builder upload`, which takes the destination `repo-id` and `branch` from CLI arguments, the `general.hub` section of `build.toml`, or the variant `metadata.json`, in that order of precedence ([kernel-builder/src/upload.rs:1-40](https://github.com/huggingface/kernels/blob/main/kernel-builder/src/upload.rs)).

Once a kernel is on the Hub, it can be loaded from Python with an explicit version (required as of `kernels` v0.15.1):

```python
from kernels import get_kernel

relu_kernel = get_kernel("kernels-community/relu", version=1)
relu = relu_kernel.relu
```

The current environment is matched against the available variants by `get_kernel_variants`, which returns a sorted list of compatibility decisions with the most preferred variant first ([kernels/src/kernels/utils.py:1-60](https://github.com/huggingface/kernels/blob/main/kernels/src/kernels/utils.py)). The matching criteria combine Python version, Torch version, the active CUDA/XPU/Metal backend, and the kernel's declared `backends` array.
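The filter-then-sort idea can be sketched in Python. This is a simplified model, not `get_kernel_variants`: the `Variant` record, `rank_variants`, and the preference rule (an exact Torch match beats a Torch-agnostic build) are all assumptions, and unlike the real function it drops incompatible variants instead of returning a decision for each one.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Variant:
    name: str                    # hypothetical variant label
    backend: str                 # e.g. "cuda", "xpu", "metal", "cpu"
    torch: Optional[str]         # e.g. "2.8"; None = not tied to a Torch version
    min_python: Tuple[int, int]  # lowest supported (major, minor)


def rank_variants(
    variants: Sequence[Variant], *, backend: str, torch: str, python: Tuple[int, int]
) -> List[Variant]:
    compatible = [
        v for v in variants
        if v.backend == backend
        and (v.torch is None or v.torch == torch)
        and python >= v.min_python
    ]
    # ASSUMPTION: a build for the exact Torch version beats a Torch-agnostic one;
    # ties are broken by name so the order is deterministic.
    return sorted(compatible, key=lambda v: (v.torch is None, v.name))
```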

## See Also

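- [`kernel-builder` CLI reference](https://github.com/huggingface/kernels/tree/main/kernel-builder)
- `build.toml` schema reference (v1 migration handled in [kernels-data/src/config/v1.rs](https://github.com/huggingface/kernels/blob/main/kernels-data/src/config/v1.rs))
- [Hub kernel repository type](https://huggingface.co/kernels) (introduced in [v0.14.0](https://github.com/huggingface/kernels/releases/tag/v0.14.0))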
- Issue [#648](https://github.com/huggingface/kernels/issues/648): proposal to record dirty-build state and the `kernel-builder` commit SHA in build metadata.

---


## Pitfall Log

Project: huggingface/kernels

Summary: Found 9 structured pitfall item(s), including 0 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/huggingface/kernels/issues/651

## 2. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/huggingface/kernels/issues/648

## 3. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/huggingface/kernels

## 4. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/huggingface/kernels

## 5. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no demo is available (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/huggingface/kernels

## 6. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no demo is available (`no_demo`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/huggingface/kernels

## 7. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/huggingface/kernels/issues/657

## 8. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue and PR quality could not be assessed (`issue_or_pr_quality=unknown`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/huggingface/kernels

## 9. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release recency could not be determined (`release_recency=unknown`).
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/huggingface/kernels

