# https://github.com/Tencent-Hunyuan/HunyuanVideo Project Manual

Generated at: 2026-06-25 06:34:25 UTC

## Table of Contents

- [HunyuanVideo Overview and System Architecture](#page-1)
- [Inference Workflows and Deployment Modes](#page-2)
- [Core Model Components and Diffusion Pipeline](#page-3)
- [Community Roadmap, Troubleshooting, and Known Issues](#page-4)

<a id='page-1'></a>

## HunyuanVideo Overview and System Architecture

### Related Pages

Related topics: [Inference Workflows and Deployment Modes](#page-2), [Core Model Components and Diffusion Pipeline](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [LICENSE.txt](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/LICENSE.txt)
- [ckpts/README.md](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/ckpts/README.md)
- [hyvideo/config.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/config.py)
- [hyvideo/constants.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/constants.py)
- [hyvideo/inference.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/inference.py)
- [hyvideo/prompt_rewrite.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/prompt_rewrite.py)
- [hyvideo/modules/__init__.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/modules/__init__.py)
- [hyvideo/modules/models.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/modules/models.py)
- [hyvideo/vae/autoencoder_kl_causal_3d.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/vae/autoencoder_kl_causal_3d.py)
- [hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py)
- [hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py)
</details>

# HunyuanVideo Overview and System Architecture

## Purpose and Scope

HunyuanVideo is an open-source text-to-video (T2V) generation system released by Tencent under the **Tencent Hunyuan Community License Agreement** (Source: [LICENSE.txt:1-25]()). The repository provides the inference code and configuration utilities needed to run high-resolution video synthesis on single- or multi-GPU setups, together with instructions for obtaining the model weights. The public release centers on the T2V checkpoint family; the image-to-video (I2V) variant is on the public roadmap and is a recurring point of community interest, tracked in issues #128, #131, #172, #180, and #198.

The repository is organized so that pretrained weights are stored under `HunyuanVideo/ckpts/` in a strict directory layout containing `hunyuan-video-t2v-720p/transformers`, `vae`, `text_encoder`, and `text_encoder_2` (Source: [ckpts/README.md:1-10]()). A second community-supported MLLM path, `llava-llama-3-8b-v1_1-transformers`, can be preprocessed into the `text_encoder` directory to save GPU memory (Source: [ckpts/README.md:35-50]()).

## System Architecture

The inference stack is a multi-stage latent diffusion pipeline. The user prompt is optionally rewritten by an LLM in `hyvideo/prompt_rewrite.py` (Normal or Master mode), then encoded by two frozen text encoders. A DiT-style transformer denoises the video latents conditioned on those embeddings, and a causal 3D VAE decodes the result back to pixel space. A flow-matching scheduler governs the denoising trajectory.

```mermaid
flowchart LR
    A[User Prompt] --> B[Prompt Rewriter<br/>hyvideo/prompt_rewrite.py]
    B --> C[Text Encoder 1<br/>MLLM / LLaMA]
    A --> D[Text Encoder 2<br/>CLIP ViT-L/14]
    C --> E[HYVideoDiffusionTransformer<br/>hyvideo/modules/models.py]
    D --> E
    E --> F[FlowMatchDiscreteScheduler<br/>scheduling_flow_match_discrete.py]
    F --> G[Causal 3D VAE<br/>autoencoder_kl_causal_3d.py]
    G --> H[Output Video Frames]
```

The pipeline entry point is `HunyuanVideoPipeline`, a subclass of `DiffusionPipeline` that wires together a VAE, two text encoders, a `HYVideoDiffusionTransformer`, and a `KarrasDiffusionSchedulers`-compatible scheduler (Source: [hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py:10-30]()). It declares a `model_cpu_offload_seq` of `"text_encoder->text_encoder_2->transformer->vae"` and lists the transformer in `_exclude_from_cpu_offload` (Source: [hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py:32-35]()). The source does not state the rationale for that exclusion.

## Key Components

### DiT Backbone

`HYVideoDiffusionTransformer` is registered via `@register_to_config` and constructed by `load_model` in `hyvideo/modules/__init__.py`, which dispatches on the `args.model` key against the `HUNYUAN_VIDEO_CONFIG` registry (Source: [hyvideo/modules/__init__.py:1-20]()). The default inference model is `HYVideo-T/2-cfgdistill` (Source: [hyvideo/config.py:30-40]()). The transformer is composed of `MMDoubleStreamBlock` and `MMSingleStreamBlock` modules, mirroring the SD3 / Flux design where text and video tokens receive separate modulation before being fused (Source: [hyvideo/modules/models.py:18-80]()). A `SingleTokenRefiner` block refines text token embeddings before they enter the joint attention layers.
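The registry dispatch described above can be sketched in a few lines. This is an illustrative stand-in and not the repository's `load_model`: the `MODEL_CONFIGS` contents, the `ToyTransformer` class, and the `build_model` helper are hypothetical. The numeric values are borrowed from the defaults quoted elsewhere in this manual, and setting `guidance_embed=True` for the `cfgdistill` entry is an assumption.

```python
from dataclasses import dataclass
from types import SimpleNamespace

# Hypothetical registry: model name -> constructor keyword arguments.
BASE_CONFIG = {
    "mm_double_blocks_depth": 20,
    "mm_single_blocks_depth": 40,
    "hidden_size": 3072,
    "heads_num": 24,
}
MODEL_CONFIGS = {
    "HYVideo-T/2": dict(BASE_CONFIG),
    # Assumption: the distilled variant turns on the guidance embedding.
    "HYVideo-T/2-cfgdistill": {**BASE_CONFIG, "guidance_embed": True},
}


@dataclass
class ToyTransformer:
    """Stand-in for the real transformer; it only records its configuration."""
    mm_double_blocks_depth: int
    mm_single_blocks_depth: int
    hidden_size: int
    heads_num: int
    guidance_embed: bool = False


def build_model(args):
    """Look up args.model in the registry and construct the model from it."""
    try:
        config = MODEL_CONFIGS[args.model]
    except KeyError:
        raise NotImplementedError(f"Unknown model: {args.model}") from None
    return ToyTransformer(**config)


args = SimpleNamespace(model="HYVideo-T/2-cfgdistill")
model = build_model(args)
print(model)
```

Keeping the architecture in a name-keyed table means a new variant only needs a new registry entry, while the CLI keeps a single `--model` flag.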

### Text Encoding and Prompt Templates

Two text encoders are loaded. The first is a decoder-only MLLM (default `llava-llama-3-8b-v1_1-transformers`), which requires a Llama-3 chat template to be applied via `PROMPT_TEMPLATE_ENCODE` (Source: [hyvideo/constants.py:20-40]()). A dedicated `PROMPT_TEMPLATE_ENCODE_VIDEO` template instructs the same MLLM to describe videos across five aspects: content/theme, color/shape/spatial relations, actions/temporal changes, environment/style, and camera angles (Source: [hyvideo/constants.py:28-50]()). The second encoder is `openai/clip-vit-large-patch14`, which provides a second, complementary CLIP embedding of the same prompt (Source: [ckpts/README.md:25-30]()). A shared negative prompt is hard-coded: `"Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion"` (Source: [hyvideo/constants.py:50-55]()).

### Causal 3D VAE

The VAE that decodes latents back to frames is a causal 3D autoencoder (`AutoencoderKLCausal3D`) modified from `diffusers==0.29.2`; its imports fall back between the patched and upstream `diffusers.loaders` APIs (Source: [hyvideo/vae/autoencoder_kl_causal_3d.py:15-30]()). "Causal" means that the temporal convolutions see only the current and earlier frames, so no latent frame depends on future frames.

### Flow-Matching Scheduler

Sampling is driven by `FlowMatchDiscreteScheduler`, derived from Stability AI's flow-matching implementation and adapted from `diffusers==0.29.2` (Source: [hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py:1-30]()). It exposes a `step` method that returns a `FlowMatchDiscreteSchedulerOutput` with `prev_sample` for iterative denoising.

## Configuration and Usage Patterns

Argument parsing is centralized in `hyvideo/config.py`, which groups CLI flags into network, extra-models, denoise-schedule, inference, and parallel sections. The network group exposes `--model`, `--latent-channels`, `--precision` (`fp32`/`fp16`/`bf16`, default `bf16`), and `--rope-theta` (Source: [hyvideo/config.py:25-55]()). Supported precisions, normalization types, and activation types are whitelisted as `PRECISIONS`, `NORMALIZATION_TYPE`, and `ACTIVATION_TYPE` sets in `hyvideo/constants.py` (Source: [hyvideo/constants.py:1-20]()). The `--flow-reverse` flag belongs to the denoise-schedule group, while the `--ulysses-degree` / `--ring-degree` flags used in community-reported multi-GPU recipes (e.g. issue #249) are defined in the parallel group.

A typical inference command is `torchrun --nproc_per_node=N sample_video.py --video-size 1280 720 --video-length 129 --infer-steps 50 --prompt "..." --flow-reverse --seed 42 --ulysses-degree N` (Source: [hyvideo/inference.py]() and issue #249). This shows that the pipeline is designed for distributed execution: a `torchrun` launcher combined with sequence-parallel strategies (Ulysses, Ring) that split the attention workload of `HYVideoDiffusionTransformer` across devices.

## Community Engagement and Roadmap

The community consistently raises three themes: (1) the I2V release date, tracked in issues #128, #131, #172, #180, and #198; (2) the absence of an official fine-tuning script, flagged in issue #302; and (3) installation and parallel-inference pitfalls such as the CUDA 12.4 / cuBLAS 12.4.5.8 / cuDNN 9.0 requirement discussed in issue #317. Users encountering multi-GPU hangs typically need to verify these library versions before assuming a code defect.

## See Also

- [Pretrained Model Download Guide](ckpts-README)
- [Pipeline Reference](pipeline-hunyuan-video)
- [HYVideoDiffusionTransformer Architecture](hyvideo-modules-models)
- [Flow-Matching Scheduler Details](scheduling-flow-match-discrete)
- [3D Causal VAE](autoencoder-kl-causal-3d)

---

<a id='page-2'></a>

## Inference Workflows and Deployment Modes

### Related Pages

Related topics: [HunyuanVideo Overview and System Architecture](#page-1), [Core Model Components and Diffusion Pipeline](#page-3), [Community Roadmap, Troubleshooting, and Known Issues](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [sample_video.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/sample_video.py)
- [gradio_server.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/gradio_server.py)
- [scripts/run_sample_video.sh](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/scripts/run_sample_video.sh)
- [scripts/run_sample_video_fp8.sh](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/scripts/run_sample_video_fp8.sh)
- [scripts/run_sample_video_multigpu.sh](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/scripts/run_sample_video_multigpu.sh)
- [hyvideo/inference.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/inference.py)
- [hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py)
- [hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py)
- [hyvideo/config.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/config.py)
- [hyvideo/constants.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/constants.py)
- [hyvideo/text_encoder/__init__.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/text_encoder/__init__.py)
- [ckpts/README.md](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/ckpts/README.md)
</details>

# Inference Workflows and Deployment Modes

## Overview

HunyuanVideo ships multiple entry points that share a common inference core defined in `hyvideo/inference.py` (the `HunyuanVideoSampler` class). That core orchestrates three subsystems (the text encoders, a multimodal DiT denoiser, and a causal 3D VAE) and drives them with a flow-matching discrete scheduler from `hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py`. Around this core, the repository exposes four deployment modes: a CLI single-GPU runner (`sample_video.py`), a multi-GPU sequence-parallel runner launched with `torchrun`, an FP8-quantized low-VRAM variant, and a Gradio web UI (`gradio_server.py`).

The shared pipeline is implemented in `hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py` and accepts pre-computed `prompt_embeds`/`negative_prompt_embeds`, raw `latents`, a `generator`, and `guidance_rescale` — the same surface used by every entry point below.

## Single-GPU CLI Inference

The canonical entry point is `sample_video.py`, which parses command-line arguments, builds a `HunyuanVideoSampler`, and calls its `predict(...)` method. Default arguments are declared in `hyvideo/config.py`, where key flags include `--model-base` (root of the `ckpts/` tree), `--dit-weight` (defaults to `ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt`), `--model-resolution` (`540p` or `720p`), `--use-cpu-offload`, and `--load-key` (`module` or `ema`). Resolution is also tied to model selection: `sample_video.py` automatically picks the matching DiT checkpoint and VAE for the requested `model-resolution`.

The reference launcher `scripts/run_sample_video.sh` invokes:

```bash
python sample_video.py \
    --prompt "A cat walks on the grass, realistic style." \
    --video-size 1280 720 --video-length 129 --infer-steps 50 \
    --flow-reverse --seed 42 --ulysses-degree 1 --ring-degree 1
```

Inside `HunyuanVideoSampler.predict`, the prompt is routed through `TextEncoder` (`hyvideo/text_encoder/__init__.py`), which selects `PROMPT_TEMPLATE["dit-llm-encode-video"]` from `hyvideo/constants.py` and applies the LLaMA-style chat template (`PROMPT_TEMPLATE_ENCODE_VIDEO`) before tokenization. Negative prompts default to the string in `hyvideo/constants.py` (`NEGATIVE_PROMPT = "Aerial view, aerial view, overexposed, low quality, ..."`). The resulting embeddings, together with `flow_shift`, `guidance_scale`, and `embedded_guidance_scale`, are passed to `self.pipeline(...)`, whose `data_type` argument is set to `"video"` when `target_video_length > 1` and to `"image"` otherwise.

## Multi-GPU and Sequence-Parallel Inference

For multi-GPU runs the project provides `scripts/run_sample_video_multigpu.sh`, which wraps the same script under `torchrun`:

```bash
torchrun --nproc_per_node=$NGPU sample_video.py \
    --video-size 1280 720 --video-length 129 --infer-steps 50 \
    --prompt "..." --flow-reverse --seed 42 \
    --ulysses-degree $ULYSSES_DEGREE --ring-degree $RING_DEGREE
```

The `--ulysses-degree` and `--ring-degree` flags enable DeepSpeed Ulysses sequence parallelism and ring attention respectively; when the degrees are greater than one, the model calls `parallel_attention` from `hyvideo/modules/attenion.py` (the file name is spelled this way in the repository). The DiT in `hyvideo/modules/models.py` stacks `MMDoubleStreamBlock` and `MMSingleStreamBlock` layers (20 + 40 by default), which participate in distributed attention.

This is the same code path users hit in community issue #249, where parallel inference failed mid-run. The most common root causes reported there are (a) mismatched CUDA / cuBLAS versions (the project requires `nvidia-cublas-cu12==12.4.5.8` plus `LD_LIBRARY_PATH` pointed at the conda `cublas` libs, or the bundled CUDA 12 Docker image; see issue #317), and (b) `--ulysses-degree` / `--ring-degree` not evenly dividing the head count. The model has `heads_num=24` (see the `HYVideoDiffusionTransformer` constructor in `hyvideo/modules/models.py`), so the product `ulysses-degree * ring-degree` must divide 24.
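The divisibility rule above is easy to check before launching a job. The helper below is a minimal sketch and not repository code: `valid_parallel_configs` is a hypothetical name, and it additionally assumes that the product of the two degrees should equal the number of processes started by `torchrun`.

```python
def valid_parallel_configs(num_gpus, heads_num=24):
    """List (ulysses_degree, ring_degree) pairs usable on num_gpus devices.

    Assumptions of this sketch: the product of the two degrees equals
    num_gpus, and that product must divide the attention head count.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be positive")
    if heads_num % num_gpus != 0:
        return []  # no split of num_gpus can satisfy the head-count rule
    return [
        (ulysses, num_gpus // ulysses)
        for ulysses in range(1, num_gpus + 1)
        if num_gpus % ulysses == 0
    ]


for gpus in (1, 4, 5, 8):
    print(gpus, valid_parallel_configs(gpus))
```

Under these assumptions a 5-GPU launch has no valid configuration, because 5 does not divide 24, while 4 GPUs admit the splits `(1, 4)`, `(2, 2)`, and `(4, 1)`.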

## FP8 Quantized Deployment

To reduce VRAM, the `ckpts/` tree ships an FP8-quantized DiT checkpoint in addition to the bf16 weights. The launcher `scripts/run_sample_video_fp8.sh` activates it with:

```bash
--dit-weight ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt \
--dit-weight-map ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8_map.pt
```

The sampler reads the FP8 weight together with its scale map, dequantizes on the fly, and reuses the same pipeline as the bf16 path — there is no separate model architecture for the FP8 variant. Per `ckpts/README.md`, both formats live under `ckpts/hunyuan-video-t2v-720p/transformers/` alongside the VAE and the dual text encoders (`text_encoder`, `text_encoder_2`).

## Gradio Web UI

`gradio_server.py` provides a browser interface that wraps `HunyuanVideoSampler`. It exposes resolution presets (`1280x720`, `720x1280`, `1104x832`, etc.), inference steps, guidance scale, embedded guidance scale, flow shift, and a negative-prompt textbox. On submit it calls `infer(...)` with `num_videos_per_prompt=1` and `batch_size=1`, then writes the resulting video via `save_videos_grid` to `gradio_outputs/<timestamp>_seed<seed>_<prompt-slug>.mp4` at 24 fps. The default example prompt is `"A cat walks on the grass, realistic style."`.
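The output naming pattern can be illustrated with a short sketch. Only the overall shape `<timestamp>_seed<seed>_<prompt-slug>.mp4` comes from the description above; the timestamp format, the slug rule, and the helper names `slugify` and `output_path` are hypothetical choices made for this example.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath


def slugify(prompt, max_len=100):
    """Collapse each run of non-alphanumeric characters into one hyphen."""
    return re.sub(r"[^A-Za-z0-9]+", "-", prompt).strip("-")[:max_len]


def output_path(prompt, seed, now, root="gradio_outputs"):
    """Build <root>/<timestamp>_seed<seed>_<prompt-slug>.mp4."""
    timestamp = now.strftime("%Y%m%d-%H%M%S")  # hypothetical format
    return PurePosixPath(root) / f"{timestamp}_seed{seed}_{slugify(prompt)}.mp4"


path = output_path(
    "A cat walks on the grass, realistic style.",
    seed=42,
    now=datetime(2025, 1, 2, 3, 4, 5),
)
print(path)
```

Embedding the seed in the file name makes a result reproducible from its name alone, provided the prompt and the other settings are kept.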

## End-to-End Data Flow

```mermaid
flowchart LR
    A[Prompt] --> B[TextEncoder<br/>LLaMA-style template]
    N[Negative Prompt] --> B
    B --> C[Prompt / Neg Embeddings]
    C --> D[DiT Denoiser<br/>Flow-Match Scheduler]
    Z[Random Latents] --> D
    D --> E[Clean Latents]
    E --> F[Causal 3D VAE<br/>Decode]
    F --> G[Video Frames / Image]
    G --> H[save_videos_grid → MP4]
```

## Configuration Reference

| Flag | Default | Purpose |
|------|---------|---------|
| `--model-base` | `ckpts` | Root of all model weights. |
| `--dit-weight` | `.../mp_rank_00_model_states.pt` | DiT checkpoint (bf16 or FP8). |
| `--model-resolution` | `540p` | Selects 540p or 720p DiT + VAE. |
| `--use-cpu-offload` | off | Offload DiT/VAE/TextEncoder to CPU. |
| `--load-key` | `module` | `module` (weights) or `ema` (EMA copy). |
| `--ulysses-degree` | `1` | Ulysses sequence-parallel degree. |
| `--ring-degree` | `1` | Ring-attention degree. |
| `--flow-shift` | per resolution | Shifts the flow-matching sigma schedule. |
| `--infer-steps` | `50` | Number of denoising iterations. |
| `--video-length` / `--video-size` | — | Output frame count and `(H, W)`. |

## Common Failure Modes

- **cuBLAS / CUDA mismatch** — Issue #317 documents the need for `nvidia-cublas-cu12==12.4.5.8` and a matching `LD_LIBRARY_PATH`, or the official CUDA 12 Docker image.
- **Parallel-inference crash on multi-GPU** — Issue #249 traces the failure to improper setup of `--ulysses-degree` / `--ring-degree`; both must be `1` for single-GPU runs.
- **Missing checkpoints** — `ckpts/README.md` requires `huggingface-cli download tencent/HunyuanVideo --local-dir ./ckpts`; the sampler raises if `text_encoder`, `text_encoder_2`, `vae`, or the DiT weights cannot be resolved under `--model-base`.

## See Also

- HunyuanVideo GitHub: <https://github.com/Tencent-Hunyuan/HunyuanVideo>
- Community thread on parallel inference: issue #249
- Community thread on CUDA/cuBLAS setup: issue #317
- Community threads on the Image-to-Video model release roadmap: issues #128, #131, #172, #180, #198

---

<a id='page-3'></a>

## Core Model Components and Diffusion Pipeline

### Related Pages

Related topics: [HunyuanVideo Overview and System Architecture](#page-1), [Inference Workflows and Deployment Modes](#page-2)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [hyvideo/modules/models.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/modules/models.py)
- [hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py)
- [hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py)
- [hyvideo/inference.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/inference.py)
- [hyvideo/text_encoder/__init__.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/text_encoder/__init__.py)
- [hyvideo/vae/autoencoder_kl_causal_3d.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/vae/autoencoder_kl_causal_3d.py)
- [hyvideo/constants.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/constants.py)
- [hyvideo/config.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/config.py)
- [hyvideo/utils/preprocess_text_encoder_tokenizer_utils.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/utils/preprocess_text_encoder_tokenizer_utils.py)
- [ckpts/README.md](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/ckpts/README.md)
</details>

# Core Model Components and Diffusion Pipeline

## Overview

The HunyuanVideo repository implements a text-to-video (T2V) generation system whose center of gravity is a multimodal Diffusion Transformer (DiT) combined with a causal 3D VAE, dual text encoders, and a flow-matching Euler scheduler wrapped in a Diffusers-style pipeline. The model definition lives in [hyvideo/modules/models.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/modules/models.py), the inference orchestration in [hyvideo/inference.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/inference.py), and the diffusion loop in [hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py). The design intentionally mirrors SD3 and Flux.1, which also sample with flow matching rather than classical DDPM, and adds a 3D-aware video latent path.

The pipeline expects four major sub-networks to be loaded from `ckpts/` (see [ckpts/README.md](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/ckpts/README.md)): the DiT transformer (`hunyuan-video-t2v-720p/transformers/`), the causal 3D VAE (`hunyuan-video-t2v-720p/vae/`), and two text encoders (`text_encoder` for the MLLM, `text_encoder_2` for CLIP-L). A high-level view of the runtime data flow is shown below.

```mermaid
flowchart LR
    P[Prompt] --> TE1[CLIP-L Text Encoder]
    P --> TE2[LLM Text Encoder + Refiner]
    TE1 --> Proj[Text Projection]
    TE2 --> Proj
    Proj --> DiT[HYVideoDiffusionTransformer]
    N[Random Noise] --> Sched[Flow-Match Scheduler]
    Sched --> DiT
    DiT --> Latent[Video Latent]
    Latent --> VAE[Causal 3D VAE Decoder]
    VAE --> Out[Output Video / Image]
```

Source: [hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py:18-37](), [hyvideo/inference.py:53-145]().

## Core DiT Model

The `HYVideoDiffusionTransformer` defined in [hyvideo/modules/models.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/modules/models.py) is a dual-stream then single-stream DiT. The constructor accepts `mm_double_blocks_depth=20` and `mm_single_blocks_depth=40` by default, producing 20 multimodal double-stream blocks followed by 40 single-stream blocks. Key initialization arguments are summarized below.

| Argument | Default | Role |
|---|---|---|
| `patch_size` | `[1, 2, 2]` | 3D patch size: 1 along temporal axis, 2×2 spatially |
| `in_channels` | `4` | Latent channels from the VAE |
| `hidden_size` | `3072` | Transformer hidden width |
| `heads_num` | `24` | Attention heads |
| `mlp_width_ratio` | `4.0` | MLP expansion factor |
| `rope_dim_list` | `[16, 56, 56]` | RoPE split across T, H, W axes |
| `text_projection` | `"single_refiner"` | Token refiner configuration for text features |
| `guidance_embed` | `False` | Reserved for distillation guidance |
| `use_attention_mask` | `True` | Pad-mask text tokens during attention |

Source: [hyvideo/modules/models.py:80-110]().
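The defaults in the table fit together arithmetically, which the following sketch works through. The requirement that the RoPE split sums to the per-head width is an assumption of this sketch that the table's values happen to satisfy. The latent grid `(33, 90, 160)` and the helper `num_video_tokens` are hypothetical and serve only to show how `patch_size` turns a latent volume into a token count.

```python
from math import prod

hidden_size = 3072
heads_num = 24
rope_dim_list = [16, 56, 56]  # rotary dimensions for the T, H, W axes
patch_size = [1, 2, 2]        # patch extent along T, H, W

# Per-head width: 3072 / 24 = 128.
head_dim, remainder = divmod(hidden_size, heads_num)
assert remainder == 0, "hidden_size must be divisible by heads_num"

# The three RoPE slices together cover one head: 16 + 56 + 56 = 128.
assert sum(rope_dim_list) == head_dim


def num_video_tokens(latent_thw, patch_thw=patch_size):
    """Number of patches obtained from a latent of shape (T, H, W)."""
    if any(size % patch for size, patch in zip(latent_thw, patch_thw)):
        raise ValueError("latent shape must be divisible by the patch size")
    return prod(size // patch for size, patch in zip(latent_thw, patch_thw))


tokens = num_video_tokens((33, 90, 160))  # hypothetical latent grid
print(head_dim, tokens)
```

With a temporal patch of 1, every latent frame keeps its own row of tokens, so the sequence length grows linearly with the number of latent frames.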

### MMDoubleStreamBlock

`MMDoubleStreamBlock` runs two parallel streams — one for visual tokens and one for text tokens — and only lets them interact through a joint attention operation. The visual path applies `img_mod` modulation; the text path applies `txt_mod`. Both projections produce QKV plus an MLP gate (`mlp_in`), and the class explicitly cites SD3 ([arXiv:2403.03206](https://arxiv.org/abs/2403.03206)) and Flux.1 as design references. RoPE is applied through `apply_rotary_emb` from [hyvideo/modules/posemb_layers.py]() using the per-axis `rope_dim_list`.

Source: [hyvideo/modules/models.py:21-78]().

### MMSingleStreamBlock

`MMSingleStreamBlock` collapses the two streams into one. It uses a fused `linear1` projection that emits `3 * hidden_size + mlp_hidden_dim` channels (QKV + MLP input in a single matmul) and a fused `linear2` that combines attention output and MLP output. This pattern is the same "parallel linear layers" trick used in [arXiv:2302.05442](https://arxiv.org/abs/2302.05442). QK normalization (`qk_norm=True`, `qk_norm_type="rms"`) is applied before scaled-dot-product attention, with `qk_scale = head_dim ** -0.5`.

Source: [hyvideo/modules/models.py:130-185]().
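The widths of the fused projections follow directly from the defaults. The sketch below is plain-Python arithmetic rather than the repository's module; the helper names are hypothetical, and the `linear2` input width (attention output concatenated with the MLP activation) is inferred from the description above.

```python
def single_stream_widths(hidden_size=3072, mlp_width_ratio=4.0):
    """Return (mlp_hidden_dim, linear1_out, linear2_in) for one block."""
    mlp_hidden_dim = int(hidden_size * mlp_width_ratio)
    linear1_out = 3 * hidden_size + mlp_hidden_dim  # Q, K, V and MLP input
    linear2_in = hidden_size + mlp_hidden_dim       # attention out + MLP out
    return mlp_hidden_dim, linear1_out, linear2_in


def split_fused(row, hidden_size, mlp_hidden_dim):
    """Split one fused linear1 output row into q, k, v and the MLP input."""
    assert len(row) == 3 * hidden_size + mlp_hidden_dim
    q = row[:hidden_size]
    k = row[hidden_size:2 * hidden_size]
    v = row[2 * hidden_size:3 * hidden_size]
    mlp_in = row[3 * hidden_size:]
    return q, k, v, mlp_in


widths = single_stream_widths()
parts = split_fused(list(range(9)), hidden_size=2, mlp_hidden_dim=3)
print(widths, parts)
```

Fusing the projections replaces four separate matrix multiplications on the same input with one larger multiplication followed by a cheap slice.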

## Diffusion Pipeline and Scheduler

The pipeline is a thin wrapper around Diffusers' `DiffusionPipeline`, with the offload sequence declared as `"text_encoder->text_encoder_2->transformer->vae"` and the transformer explicitly excluded from CPU offload (`_exclude_from_cpu_offload = ["transformer"]`). Optional components include `text_encoder_2` (the CLIP encoder, so the pipeline can be constructed without it). The pipeline accepts prompts, negative prompts, latent noise, guidance scale, and an `embedded_guidance_scale` that is forwarded into the denoising loop.

Source: [hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py:18-65](), [hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py:80-120]().

The scheduler is a custom flow-matching discrete implementation living in [hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py). In `step()`, the previous sample is computed as `prev_sample = sample + model_output.to(torch.float32) * dt` where `dt = sigmas[i+1] - sigmas[i]`, with the sample being upcast to float32 to avoid precision loss. Only the `"euler"` solver is supported; any other solver raises `ValueError`. Integer timestep inputs are explicitly rejected with a clear error message.

Source: [hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py:140-185]().
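The Euler update quoted above can be exercised on a toy problem. This is a minimal pure-Python sketch and not the scheduler class: it assumes a linear sigma grid from 1.0 down to 0.0 and a model output equal to the constant velocity `noise - data`. Both are assumptions of this example, and the function names are hypothetical.

```python
def linear_sigmas(num_steps):
    """Noise levels from 1.0 (pure noise) down to 0.0 (clean sample)."""
    return [1.0 - i / num_steps for i in range(num_steps + 1)]


def euler_step(sample, model_output, sigma, sigma_next):
    """One Euler update: prev_sample = sample + model_output * dt."""
    dt = sigma_next - sigma  # negative while denoising
    return [x + v * dt for x, v in zip(sample, model_output)]


def sample_loop(noise, velocity_fn, num_steps):
    """Integrate from sigma = 1 to sigma = 0 with fixed-size Euler steps."""
    sigmas = linear_sigmas(num_steps)
    sample = list(noise)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        sample = euler_step(sample, velocity_fn(sample, sigma), sigma, sigma_next)
    return sample


# Toy check: with the exact constant velocity, Euler recovers the data.
data = [0.5, -1.0, 2.0]
noise = [1.5, 0.25, -0.75]
velocity = [n - d for n, d in zip(noise, data)]
result = sample_loop(noise, lambda sample, sigma: velocity, num_steps=10)
print(result)
```

Because the toy velocity is constant, the number of steps does not affect the result here; with a learned, state-dependent velocity the step count trades accuracy against compute.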

`HunyuanVideoSampler.predict` wires the pipeline together: it computes `freqs_cis` (the RoPE cos/sin tables) and calls `self.pipeline(...)` with `output_type="pil"`. The argument `data_type="video" if target_video_length > 1 else "image"` lets the same pipeline produce both image and video outputs. This switch only selects the output type and adds no image conditioning; the public release ships only the T2V transformer, so the community requests for an image-to-video (I2V) checkpoint (e.g. #128, #131, #172, #180, #198) remain open.

Source: [hyvideo/inference.py:60-160](), [community issues #128, #131, #172, #180, #198](); [hyvideo/config.py:80-140]().

## Supporting Components

### Text Encoders

Two text encoders are wired up in [hyvideo/text_encoder/__init__.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/text_encoder/__init__.py). `clipL` uses HuggingFace's `CLIPTextModel`, while `llm` loads a decoder-only language model via `AutoModel`. The LLM's `norm` is aliased to `final_layer_norm` so that downstream code can call `.final_layer_norm(x)` uniformly across encoders. The encoders are always switched to `eval()` mode and frozen (`requires_grad_(False)`). The dual-encoder setup is reflected in `PROMPT_TEMPLATE` constants that include both a generic image description template (`dit-llm-encode`) and a video-aware template (`dit-llm-encode-video`) with corresponding `crop_start` offsets (36 and 95 respectively).

Source: [hyvideo/text_encoder/__init__.py:18-75](), [hyvideo/constants.py:30-60]().
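The role of `crop_start` can be illustrated with a toy encoder. The sketch assumes that the offset counts the tokens contributed by the instruction prefix of the template, so slicing them off leaves only the positions that belong to the user's prompt. The template text, the whitespace tokenizer, and the one-number-per-token "hidden states" are hypothetical stand-ins; the repository stores the offsets as constants (36 and 95) rather than computing them.

```python
# Hypothetical template; the real ones live in hyvideo/constants.py.
TEMPLATE = "Describe the video in detail : {}"


def tokenize(text):
    """Whitespace tokenizer standing in for the real LLM tokenizer."""
    return text.split()


def encode(text):
    """Fake hidden states: one float per token (here, the token length)."""
    return [float(len(token)) for token in tokenize(text)]


def encode_prompt(prompt, template=TEMPLATE):
    """Encode the templated prompt, then drop the template-prefix positions."""
    crop_start = len(tokenize(template.format("")))
    hidden_states = encode(template.format(prompt))
    return hidden_states[crop_start:], crop_start


prompt_states, crop_start = encode_prompt("A cat walks")
print(crop_start, prompt_states)
```

Cropping after encoding, instead of encoding the bare prompt, lets the instruction prefix shape the prompt tokens' representations without the prefix itself being passed to the DiT.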

A one-off utility, `preprocess_text_encoder_tokenizer_utils.py`, extracts the language model and tokenizer from a LLaVA checkpoint into a pure text-encoder checkpoint compatible with the `llm` loader above. The repository ships no fine-tuning script: this utility and the `--use-cpu-offload` and `--load-key module|ema` flags exposed in [hyvideo/config.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/config.py) serve inference, and they are the closest starting point currently available to users who want to fine-tune. Community issue #302 explicitly requests a more complete fine-tuning script.

Source: [hyvideo/utils/preprocess_text_encoder_tokenizer_utils.py:8-30](), [community issue #302]().

### Causal 3D VAE

The VAE is `AutoencoderKLCausal3D`, a Diagonal-Gaussian autoencoder with causal 3D convolutions defined in [hyvideo/vae/autoencoder_kl_causal_3d.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/vae/autoencoder_kl_causal_3d.py). It is registered as a Diffusers `ModelMixin`/`ConfigMixin`, supports gradient checkpointing, and yields `(s_ratio, t_ratio)` that the pipeline uses to snap requested output sizes to the VAE's spatial/temporal stride. The `vae_tiling` flag in [hyvideo/config.py]() enables tiled decoding to fit long videos into limited VRAM.

Source: [hyvideo/vae/autoencoder_kl_causal_3d.py:18-55](), [hyvideo/inference.py:35-50]().
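How a requested size maps onto the latent grid can be sketched as follows. The ratios `s_ratio=8` and `t_ratio=4`, the round-down snapping rule, and the helper names are assumptions of this example, not values read from the source. The "+ 1" reflects the causal design, in which the first frame is encoded on its own.

```python
def snap_down(value, stride):
    """Largest multiple of stride that does not exceed value."""
    return (value // stride) * stride


def snap_frames(frames, t_ratio):
    """Largest frame count <= frames of the form k * t_ratio + 1."""
    if frames < 1:
        raise ValueError("frames must be at least 1")
    return ((frames - 1) // t_ratio) * t_ratio + 1


def latent_shape(height, width, frames, s_ratio=8, t_ratio=4):
    """Return the latent (T, H, W) for a snapped pixel-space request."""
    height = snap_down(height, s_ratio)
    width = snap_down(width, s_ratio)
    frames = snap_frames(frames, t_ratio)
    return ((frames - 1) // t_ratio + 1, height // s_ratio, width // s_ratio)


shape = latent_shape(720, 1280, 129)
print(shape)
```

Under these assumed ratios, the 129-frame, 720x1280 request used in the example commands corresponds to 33 latent frames on a 90x160 grid.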

### Configuration Surface

The CLI surface for inference is in [hyvideo/config.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/config.py). Notable flags used by the pipeline are `--infer-steps` (default 50), `--batch-size`, `--num-videos` per prompt, `--video-size` (single int or `[H, W]`, default 720×1280), `--video-length` in frames, `--save-path`, and the precision switches (`--precision` for the DiT, `--vae-precision`, `--text-encoder-precision`). Parallel inference is enabled through external launchers (`torchrun --nproc_per_node=N`) and the `--ulysses-degree` / `--ring-degree` flags that activate sequence-parallel attention, a frequent source of community confusion as seen in issue #249.

Source: [hyvideo/config.py:30-160](), [community issue #249]().
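A few of these flags can be modelled with `argparse` to show how a "single int or `[H, W]`" option is typically handled. This is a minimal sketch and not the contents of `hyvideo/config.py`: `build_parser` and `normalize_video_size` are hypothetical, and the `--video-length` default of 129 is an assumption taken from the example commands in this manual.

```python
import argparse


def build_parser():
    """Parser covering a small subset of the flags described above."""
    parser = argparse.ArgumentParser(description="Sketch of an inference CLI")
    parser.add_argument("--video-size", type=int, nargs="+", default=[720, 1280],
                        help="One int (square) or two ints: height width.")
    parser.add_argument("--video-length", type=int, default=129)  # assumed default
    parser.add_argument("--infer-steps", type=int, default=50)
    parser.add_argument("--ulysses-degree", type=int, default=1)
    parser.add_argument("--ring-degree", type=int, default=1)
    return parser


def normalize_video_size(video_size):
    """Turn [S] into (S, S) and [H, W] into (H, W); reject anything else."""
    if len(video_size) == 1:
        return (video_size[0], video_size[0])
    if len(video_size) == 2:
        return (video_size[0], video_size[1])
    raise ValueError(f"--video-size takes 1 or 2 integers, got {video_size}")


args = build_parser().parse_args(["--video-size", "544", "--ulysses-degree", "2"])
print(normalize_video_size(args.video_size), args.infer_steps, args.ulysses_degree)
```

Normalizing the size once, right after parsing, keeps the rest of the code free of "one value or two" branches.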

## Common Failure Modes

1. **CUDA / cuBLAS mismatch** — community issue #317 documents that PyTorch compiled against the wrong CUDA runtime triggers linker errors; the recommended fix is to install `nvidia-cublas-cu12==12.4.5.8` and set `LD_LIBRARY_PATH` to the conda site-packages, or use the official CUDA 12 Docker image.
2. **Parallel inference hangs** — issue #249 reports that `torchrun --nproc_per_node=2` fails to start when the sequence-parallel groups are misconfigured; the same `inference.py` `HunyuanVideoSampler` must be invoked with matching `--ulysses-degree` and `--ring-degree`.
3. **Out-of-memory on long videos** — the VAE can be tiled (`--vae-tiling`) and the transformer is excluded from CPU offload; if OOM persists, reduce `--video-length` and `--video-size` in that order.
4. **I2V / fine-tuning scripts** — the open source release does not yet ship image-to-video weights (#128, #131, #172, #180, #198) or an official fine-tuning script (#302); users currently adapt `inference.py` and the text-encoder preprocess utility as a starting point.

## See Also

- [Pretrained Checkpoints Download Guide](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/ckpts/README.md)
- HunyuanVideo inference command-line reference ([hyvideo/config.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/config.py))
- Tencent Hunyuan Community License Agreement ([LICENSE.txt](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/LICENSE.txt)) — note the Territory restriction (excluding EU, UK, and South Korea) and the 100M-MAU commercial clause.

---

<a id='page-4'></a>

## Community Roadmap, Troubleshooting, and Known Issues

### Related Pages

Related topics: [HunyuanVideo Overview and System Architecture](#page-1), [Inference Workflows and Deployment Modes](#page-2)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [LICENSE.txt](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/LICENSE.txt)
- [README.md](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/README.md)
- [ckpts/README.md](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/ckpts/README.md)
- [hyvideo/constants.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/constants.py)
- [hyvideo/modules/models.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/modules/models.py)
- [hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py)
- [hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py)
- [hyvideo/text_encoder/__init__.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/text_encoder/__init__.py)
- [hyvideo/vae/autoencoder_kl_causal_3d.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/vae/autoencoder_kl_causal_3d.py)
- [hyvideo/vae/unet_causal_3d_blocks.py](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/vae/unet_causal_3d_blocks.py)
</details>


This page consolidates the information most relevant to users of the public HunyuanVideo repository: the scope of what the open-source release currently supports, the recurring requests and reported issues observed in community discussions, the constraints imposed by the project's license, and the most common failure modes encountered during installation and inference. It is intended as an orientation guide for new users and a triage reference for contributors.

## Repository Scope and What the Codebase Supports

HunyuanVideo is shipped as a text-to-video (T2V) diffusion system built around a multimodal DiT backbone and a causal 3D VAE. The `HYVideoDiffusionTransformer` class defined in [`hyvideo/modules/models.py`](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/modules/models.py) exposes configuration arguments such as `mm_double_blocks_depth`, `mm_single_blocks_depth`, `heads_num`, `hidden_size`, `rope_dim_list`, and `text_projection`; these are the architectural knobs available to users who build on the released code. The DiT combines `MMDoubleStreamBlock` (separate text and video modulation, as in SD3/Flux) and `MMSingleStreamBlock` (parallel linear layers à la DiT).

The pipeline assembly, denoising loop, and CFG handling live in [`hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py`](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py), which subclasses `DiffusionPipeline` and integrates `AutoencoderKL` together with the `FlowMatchDiscreteScheduler` defined in [`hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py`](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py). The scheduler's `step` method performs an Euler update (`prev_sample = sample + model_output.to(torch.float32) * dt`), which is the only solver currently implemented — the `else` branch raises `ValueError(f"Solver {self.config.solver} not supported.")`.
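To make the Euler update quoted above concrete, here is a minimal pure-Python sketch. It assumes a descending list of noise levels (called `sigmas` here) and takes `dt` as the difference between consecutive levels. The names `euler_step` and `sample_loop` and the constant-velocity toy model are hypothetical and are not the repository's API; scalars stand in for tensors.

```python
from typing import Callable, Sequence


def euler_step(sample: float, model_output: float, sigma: float, sigma_next: float) -> float:
    """One explicit Euler update: x_next = x + v * dt, with dt = sigma_next - sigma."""
    dt = sigma_next - sigma
    return sample + model_output * dt


def sample_loop(x: float, sigmas: Sequence[float], model: Callable[[float, float], float]) -> float:
    """Integrate from sigmas[0] to sigmas[-1], calling the model once per interval."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = model(x, sigma)  # predicted velocity at the current noise level
        x = euler_step(x, v, sigma, sigma_next)
    return x


# Toy usage (assumed values): constant velocity 2.0, integrated from sigma=1.0 to 0.0.
result = sample_loop(3.0, [1.0, 0.5, 0.0], lambda x, sigma: 2.0)
print(result)
```

Because each step only needs the current sample, the model output, and two neighbouring noise levels, a first-order solver like this keeps no history between steps.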

The text encoder wrapper in [`hyvideo/text_encoder/__init__.py`](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/text_encoder/__init__.py) supports both string templates (for LLM-style encoders) and chat-template lists via `tokenizer.apply_chat_template`. Prompt-format constants, including the negative prompt used by default, are defined in [`hyvideo/constants.py`](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/constants.py):

| Constant | Purpose |
|---|---|
| `PROMPT_TEMPLATE_ENCODE` | System prompt for image description when using the LLM encoder |
| `PROMPT_TEMPLATE_ENCODE_VIDEO` | Multi-aspect system prompt for video description (theme, objects, actions, environment, camera) |
| `NEGATIVE_PROMPT` | Default CFG negative prompt (filters aerial view, overexposure, deformation, bad hands/teeth/limbs) |
| `PRECISIONS` | Allowed dtypes: `fp32`, `fp16`, `bf16` |
| `C_SCALE` | 1e15, used as a PetaFLOPS scaler for tensorboard logging |

## Community Roadmap Signals

The most heavily engaged GitHub issues in this repository are not bug reports — they are requests for an **Image-to-Video (I2V) checkpoint and inference script**. Issues [#180](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/180), [#128](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/128), [#172](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/172), [#198](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/198), and [#131](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/131) all ask, in English and Chinese, when an I2V model will be released. As of the current repository state, only the T2V checkpoint is downloadable per the instructions in [ckpts/README.md](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/ckpts/README.md), which documents `hunyuan-video-t2v-720p` (with `transformers/mp_rank_00_model_states.pt` and FP8 variants) under `ckpts/`. No I2V checkpoint directory is documented there.

The second recurring theme is **fine-tuning**. Issue [#302](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/302) explicitly asks for "Official Fine-Tuning Code / Training Example." The shipped source tree contains the model definition (`HYVideoDiffusionTransformer` in `models.py`), the forward-pass pipeline, and inference-side utilities, but it does not include a training loop, a LoRA adapter implementation, or a dataset pipeline. Users seeking fine-tuning must therefore implement the training infrastructure themselves; the codebase provides the building blocks but not the trainer.

## Known Issues and Common Failure Modes

### Off-Topic Issue Volume

A substantial fraction of issues opened against the repository are not technical requests — they are creative-writing prompts (e.g., [#318](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/318), [#313](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/313), [#311](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/311), [#308](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/308), [#292](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/292), [#282](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/282)) where users paste screenplay-style text and request generated videos. These issues do not reflect bugs and are not actionable by maintainers. Contributors triaging the tracker should close them as `not planned` or `invalid`.

### CUDA / cuBLAS Version Mismatch

Issue [#317](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/317) documents the canonical CUDA environment problem. The recommended fix published in the issue thread is:

```bash
# Option 1: install the matching cuBLAS and point LD_LIBRARY_PATH at it
pip install nvidia-cublas-cu12==12.4.5.8
export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/

# Option 2: use the official CUDA 12 Docker image
# Option 3: pin PyTorch and all CUDA-dependent libs to the CUDA 11.8 build
```

Users should verify their CUDA toolkit, cuBLAS (`>=12.4.5.8`), and cuDNN (`>=9.00`) versions before reporting new issues.
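A minimal sketch of how the minimum-version check above could be automated, assuming you have already collected the installed version strings by some other means. The helper names and the "installed" values are hypothetical, and purely numeric dotted versions are assumed (a suffix such as `+cu124` would need stripping first).

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '12.4.5.8' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when `installed` is at least `minimum`, padding the shorter tuple with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimums are taken from the text above; the installed values are made-up examples.
MINIMUMS = {"cuBLAS": "12.4.5.8", "cuDNN": "9.00"}
installed = {"cuBLAS": "12.1.3.1", "cuDNN": "9.1.0"}

report = {name: meets_minimum(installed[name], MINIMUMS[name]) for name in MINIMUMS}
print(report)
```

Comparing integer tuples rather than raw strings avoids the usual trap where `"12.10"` sorts before `"12.4"`.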

### Parallel Inference Failures

Issue [#249](https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/249) reports a failure when launching multi-GPU inference with:

```bash
torchrun --nproc_per_node=2 sample_video.py \
    --video-size 1280 720 --video-length 129 --infer-steps 50 \
    --prompt "astronaut is fixing the space station." \
    --flow-reverse --seed 42 --ulysses-degree 2 --ring-degree 1
```

The Ulysses + Ring sequence-parallel configuration requires the process count set by the distributed launcher to match the degree flags (`--ulysses-degree`, `--ring-degree`); in the command above, `--nproc_per_node=2` corresponds to degrees `2` and `1`. When they are mismatched, or when the environment variables the distributed backend relies on (`RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`) are not set, the process group fails to initialize. The shipped scheduler (`FlowMatchDiscreteScheduler.step`) operates on per-rank tensors and assumes the parallel attention has already produced a full-sequence view, so initialization must succeed before the first `step()` call.
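The matching requirement described above can be checked before launch with a small pre-flight helper. This sketch assumes that the product of the two degrees must equal the number of launched processes, which is one reading of the requirement and is consistent with the example command. The function name is hypothetical and nothing here calls the repository's code.

```python
from typing import Optional


def sequence_parallel_error(world_size: int, ulysses_degree: int, ring_degree: int) -> Optional[str]:
    """Return a description of the mismatch, or None when the configuration looks consistent."""
    if ulysses_degree < 1 or ring_degree < 1:
        return "degrees must be positive integers"
    product = ulysses_degree * ring_degree
    if product != world_size:
        # Assumption: ulysses_degree * ring_degree has to cover every launched process.
        return (
            f"ulysses_degree * ring_degree = {product}, "
            f"but {world_size} process(es) were launched"
        )
    return None


# The command above: --nproc_per_node=2 --ulysses-degree 2 --ring-degree 1
ok = sequence_parallel_error(world_size=2, ulysses_degree=2, ring_degree=1)
# A mismatched variant: two processes but degrees that multiply to four.
bad = sequence_parallel_error(world_size=2, ulysses_degree=2, ring_degree=2)
print(ok, "|", bad)
```

Running a check like this on rank 0 before the process group is created turns an opaque initialization failure into a readable message.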

### Prompt Template Truncation

The `PROMPT_TEMPLATE` dictionary in [`hyvideo/constants.py`](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/constants.py) defines two `crop_start` values: `36` for the image-description template and `95` for the video-description template. If a user passes a custom prompt template, they must recompute `crop_start` to match the token length of the system-prompt prefix; otherwise the encoder output will retain system-prompt tokens that bias the generation away from the user's intent. The truncation is applied inside the text encoder wrapper's `encode` method.
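The effect of a wrong `crop_start` can be illustrated with a toy sketch. It uses plain Python lists in place of the encoder's hidden-state tensors, and `crop_prefix` is a hypothetical helper rather than the wrapper's actual method. The first three positions play the role of the system-prompt prefix.

```python
from typing import List, Sequence, Tuple


def crop_prefix(
    hidden_states: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
    crop_start: int,
) -> Tuple[List[List[float]], List[int]]:
    """Drop the first `crop_start` positions (the template prefix) from both sequences."""
    if crop_start < 0 or crop_start > len(hidden_states):
        raise ValueError("crop_start must lie within the sequence length")
    kept_states = [list(h) for h in hidden_states[crop_start:]]
    kept_mask = list(attention_mask[crop_start:])
    return kept_states, kept_mask


# Toy sequence: three prefix positions (0.x) followed by two user-prompt positions (1.x).
hidden = [[0.0], [0.1], [0.2], [1.0], [1.1]]
mask = [1, 1, 1, 1, 1]

kept, kept_mask = crop_prefix(hidden, mask, crop_start=3)  # correct prefix length
leaked, _ = crop_prefix(hidden, mask, crop_start=2)        # too small: one prefix position leaks
print(kept, kept_mask, leaked)
```

With a custom template, the right `crop_start` is the number of tokens the tokenizer produces for the prefix alone, so it has to be measured with the same tokenizer rather than counted in characters or words.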

### Scheduler Step-Index State

`FlowMatchDiscreteScheduler.step` does not accept integer indices as timesteps; it expects one of the floating-point values from `scheduler.timesteps`. Passing the loop index from `enumerate(timesteps)` instead of the timestep value raises `ValueError`, per the explicit guard in [`scheduling_flow_match_discrete.py`](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py). Users porting loops written against diffusers' `EulerDiscreteScheduler` can trip this guard when the index and the value are swapped.
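The difference between the loop index and the timestep value is easy to miss, so here is a minimal plain-Python sketch of the kind of guard described above. The real check operates on tensor types; `validate_timestep` and the timestep values below are hypothetical stand-ins.

```python
def validate_timestep(timestep: object) -> float:
    """Reject integer loop indices; accept floating-point timestep values."""
    if isinstance(timestep, int):  # note: bool is also a subclass of int
        raise ValueError(
            "integer indices (e.g. from enumerate(timesteps)) are not valid timesteps; "
            "pass one of the scheduler's timestep values instead"
        )
    return float(timestep)


timesteps = [1000.0, 500.0, 0.0]  # made-up float timestep values

accepted = []
index_rejected = False
for i, t in enumerate(timesteps):
    accepted.append(validate_timestep(t))  # correct: pass the value `t`
    try:
        validate_timestep(i)               # incorrect: pass the index `i`
    except ValueError:
        index_rejected = True

print(accepted, index_rejected)
```

In a denoising loop this means writing `scheduler.step(noise_pred, t, latents)` with the value `t`, not the counter `i`.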

## Licensing Constraints Affecting Deployment

The Tencent Hunyuan Community License Agreement ([LICENSE.txt](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/LICENSE.txt)) imposes several restrictions that users should understand before deploying:

```mermaid
flowchart LR
    A[Use HunyuanVideo] --> B{Territory}
    B -- EU/UK/South Korea --> X[Not Licensed]
    B -- Worldwide ex. above --> C{Check Section 5 Rules}
    C --> D[No impersonation]
    C --> E[No high-stakes automation]
    C --> F[No military use]
    C --> G[No discrimination]
    C --> H[No violence/terrorism]
    D & E & F & G & H --> I[Compliant Deployment]
    A --> J{MAU > 100M?}
    J -- Yes --> K[Must request commercial license]
    J -- No --> I
```

Additional commercial trigger ([LICENSE.txt](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/LICENSE.txt)): "If, on the Tencent Hunyuan version release date, the monthly active users of all products or services made available by or for Licensee is greater than 100 million monthly active users in the preceding calendar month, You must request a license from Tencent." Any redistribution must include the attribution string beginning with "Tencent Hunyuan is licensed under the Tencent Hunyuan Community License Agreement, Copyright © 2024 Tencent."

## See Also

- Architecture overview and DiT block definitions: `hyvideo/modules/models.py`
- Denoising loop and CFG behavior: `hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py`
- Scheduler semantics and step contract: `hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py`
- Prompt templates, negative prompt, and precision enums: `hyvideo/constants.py`
- 3D causal VAE used for latent encoding/decoding: `hyvideo/vae/autoencoder_kl_causal_3d.py`
- Causal UNet residual/attention blocks: `hyvideo/vae/unet_causal_3d_blocks.py`
- Text encoder wrapper and tokenizer integration: `hyvideo/text_encoder/__init__.py`
- Checkpoint layout and download instructions: `ckpts/README.md`

---


## Pitfall Log

Project: Tencent-Hunyuan/HunyuanVideo

Summary: Found 12 structured pitfall item(s), including 2 high/blocking item(s). Top priority: Installation risk - Installation risk requires verification.

## 1. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/249

## 2. Installation risk - Installation risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/302

## 3. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: runtime_trace
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Repro command: `docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12 # For CUDA`
- Evidence: identity.distribution | https://github.com/Tencent-Hunyuan/HunyuanVideo

## 4. Installation risk - Installation risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/317

## 5. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/Tencent-Hunyuan/HunyuanVideo

## 6. Runtime risk - Runtime risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/311

## 7. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/313

## 8. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Tencent-Hunyuan/HunyuanVideo

## 9. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/Tencent-Hunyuan/HunyuanVideo

## 10. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/Tencent-Hunyuan/HunyuanVideo

## 11. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: issue_or_pr_quality=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Tencent-Hunyuan/HunyuanVideo

## 12. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: release_recency=unknown.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/Tencent-Hunyuan/HunyuanVideo
