# hunyuanvideo - Doramagic AI Context Pack

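> Positioning: a pre-installation trial and decision asset. It helps the host AI get off to a good start, but it does not mean the target project has been installed, executed, or verified.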

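## Sufficiency principle

- **Sufficiency, not compression**: an AI Context Pack should be complete enough for the host AI to understand the project's value, capability boundaries, entry points, risks, and evidence sources before starting work; it may be organized in layers, but the shortest possible summary is not the goal.
- **Compression policy**: compress only noise and repetition, never context that affects judgment or the quality of the work that follows.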

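## How the host AI should use this pack

You are reading the AI Context Pack that Doramagic compiled for hunyuanvideo. Treat it as pre-work context: it helps the user understand who the project suits, what it can do, how to start, what must be verified after installation, and where the risks lie. Do not claim that you have installed, run, or executed the target project.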

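## Claim consumption rules

- **Source of facts**: Repo Evidence + Claim/Evidence Graph; the Human Wiki supplies only salience, terminology, and narrative structure.
- **Minimum status for facts**: `supported`
- `supported`: may be used as a project fact, but the answer must cite the claim_id and the evidence path.
- `weak`: a low-confidence lead only; the user must be asked to verify it further.
- `inferred`: may be used only for risk notes or open questions and must not be presented as a project fact.
- `unverified`: must not be used as a fact; state clearly that the evidence is insufficient.
- `contradicted`: the conflicting sources must be shown; do not pick one version on the user's behalf.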
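
The status rules above can be sketched as a small lookup, which may help a host integration apply them consistently. This is a minimal illustration, not part of hunyuanvideo or Doramagic: the `Claim` class, the `render_claim` function, and the output wording are hypothetical, and declining to state a `supported` claim that has no evidence path is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """Hypothetical container for one entry of the Claim/Evidence Graph."""
    claim_id: str
    status: str
    text: str
    evidence_path: Optional[str] = None


def render_claim(claim: Claim) -> str:
    """Return wording that respects the consumption rule for the claim's status."""
    if claim.status == "supported":
        # Facts must cite both the claim_id and the evidence path.
        if claim.evidence_path is None:
            # Assumption: a supported claim with no path cannot be cited, so decline.
            return f"Insufficient evidence: {claim.text}"
        return f"{claim.text} [claim: {claim.claim_id}; evidence: {claim.evidence_path}]"
    if claim.status == "weak":
        return f"Low-confidence lead, please verify: {claim.text} [claim: {claim.claim_id}]"
    if claim.status == "inferred":
        return f"Risk or open question: {claim.text}"
    if claim.status == "contradicted":
        return f"Conflicting sources, both must be shown: {claim.text}"
    # `unverified` and any unknown status are never stated as fact.
    return f"Insufficient evidence: {claim.text}"


fact = Claim("clm_0002", "supported", "The README targets research workflows.", "README.md")
risk = Claim("clm_0010", "inferred", "A preview may be mistaken for a real run.")
print(render_claim(fact))
print(render_claim(risk))
```

Falling through to the "insufficient evidence" wording for unknown statuses keeps the sketch conservative: a status the code does not recognize is never promoted to a fact.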

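## Who it suits best

- **AI researchers or builders of research-oriented agents**: the README is explicitly organized around research, experiment, or paper workflows. Evidence: `README.md` Claim: `clm_0002` supported 0.86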

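## What it can do

- **Command-line launch or installation flow** (requires post-installation verification): the project documentation contains executable commands; real use requires running them locally or in the host environment. Evidence: `README.md` Claim: `clm_0001` supported 0.86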

## How to start

- `git clone https://github.com/Tencent-Hunyuan/HunyuanVideo` Evidence: `README.md` Claim: `clm_0003` supported 0.86
- `pip install nvidia-cublas-cu12==12.4.5.8` Evidence: `README.md` Claim: `clm_0004` supported 0.86
- `pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu118` Evidence: `README.md` Claim: `clm_0005` supported 0.86
- `pip install -r requirements.txt` Evidence: `README.md` Claim: `clm_0006` supported 0.86
- `pip install ninja` Evidence: `README.md` Claim: `clm_0007` supported 0.86
- `pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3` Evidence: `README.md` Claim: `clm_0008` supported 0.86
- `pip install xfuser==0.4.0` Evidence: `README.md` Claim: `clm_0009` supported 0.86

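## Pre-continuation decision card

- **Current recommendation**: sandbox trial installation only
- **Why**: the project shows signs of installation commands, host configuration, or local writes; do not go straight into your main environment, and try the installation in an isolated environment first.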

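### 30-second verdict

- **What to do now**: sandbox trial installation only
- **Minimum safe next step**: run Prompt Preview first; if you still want to install, do so only in an isolated environment
- **Do not trust yet**: real output quality cannot be trusted before installation.
- **Continuing will touch**: command execution, the local environment or project files, the host AI context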

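### What you can trust now

- **Audience signal: AI researchers or builders of research-oriented agents** (supported): backed by a supported claim or project evidence, though that still does not equal real post-installation results. Evidence: `README.md` Claim: `clm_0002` supported 0.86
- **Capability exists: command-line launch or installation flow** (supported): you can trust that the project contains signs of this capability; whether it fits your specific task still has to be verified by trial or after installation. Evidence: `README.md` Claim: `clm_0001` supported 0.86
- **Quick Start / installation command signals exist** (supported): you can trust that the project documentation includes a launch or installation entry point; do not take that as a reason to run it directly in your main environment. Evidence: `README.md` Claim: `clm_0003` supported 0.86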

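### What you cannot trust yet

- **Real output quality cannot be trusted before installation.** (unverified): Prompt Preview can only show how guidance works; it cannot prove result quality in a real project.
- **Host AI version compatibility cannot be trusted before installation.** (unverified): loading rules and version differences across hosts such as Claude, Cursor, Codex, and Gemini must be verified in a real environment.
- **Do not assume existing host AI behavior will stay unaffected.** (inferred): Skill, plugin, and AGENTS/CLAUDE/GEMINI instructions may change the host AI's default behavior.
- **Safe rollback cannot be assumed by default.** (unverified): unless the project explicitly provides uninstall and restore instructions, verify in an isolated environment first.
- **Will a real installation be compatible with the user's current host AI version?** (unverified): compatibility can only be verified in the actual host environment.
- **Does the project's output quality meet the user's specific task?** (unverified): a pre-installation preview can only show the flow and boundaries; it is no substitute for real evaluation.
- **Do the installation commands need network access, elevated permissions, or global writes?** (unverified): this affects installation risk in both enterprise and personal environments. Evidence: `README.md`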

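### What continuing will touch

- **Command execution**: package managers, network downloads, local plugin directories, project configuration, or the user's home directory. Reason: running even the first command may change the environment; decide first whether it is worth running. Evidence: `README.md`
- **Local environment or project files**: installation results, plugin caches, project configuration, or local dependency directories. Reason: the write scope and rollback method cannot be proven before installation, so isolated verification is needed. Evidence: `README.md`
- **Host AI context**: the AI Context Pack, Prompt Preview, Skill routing, risk rules, and project facts. Reason: imported context shapes the host AI's later judgments, so unverified items must not be presented as facts.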

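### Minimum safe next step

- **Run Prompt Preview first**: use the pre-installation interactive trial to judge whether the way of working fits; it needs no authorization and no environment changes. (Applies to: any project, especially when output quality is unknown.)
- **Trial-install only in an isolated directory or test account**: this keeps installation commands from contaminating your main host AI, real projects, or home directory. (Applies when: there are signs of command execution, plugin configuration, or local writes.)
- **After installation, verify just one minimal task**: check loading, compatibility, output quality, and rollback first, then decide whether to go deeper. (Applies when: you are about to move from trial into a real workflow.)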

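### Exit strategy

- **Preserve the pre-installation state**: record the original host configuration and project state so you can later judge whether recovery is possible.
- **Record installation commands and write paths**: when there are no explicit uninstall instructions, you at least need to know which directories or configuration must be cleaned up by hand.
- **No rollback path, no main environment**: the lack of a rollback path is a blocker for continuing; do not proceed on trust or luck.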
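
One way to preserve the pre-installation state for Python packages is to snapshot the environment before and after a trial installation and compare the two. The sketch below uses only the standard library; `snapshot_packages` and `diff_snapshots` are hypothetical helper names, the versions in the usage example are made up for illustration, and the approach covers installed Python distributions only, not files written elsewhere on disk.

```python
from importlib import metadata


def snapshot_packages() -> dict:
    """Map each installed distribution name to its version in the current environment."""
    snapshot = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata is incomplete
            snapshot[name.lower()] = dist.version
    return snapshot


def diff_snapshots(before: dict, after: dict) -> dict:
    """Report packages that were added, removed, or changed between two snapshots."""
    added = {n: v for n, v in after.items() if n not in before}
    removed = {n: v for n, v in before.items() if n not in after}
    changed = {
        n: (before[n], after[n])
        for n in before.keys() & after.keys()
        if before[n] != after[n]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative snapshots with made-up versions; in practice call
# snapshot_packages() once before and once after the trial installation.
before = {"numpy": "1.24.4", "torch": "2.5.0"}
after = {"numpy": "1.24.4", "torch": "2.6.0", "ninja": "1.11.1"}
print(diff_snapshots(before, after))
```

Saving the "before" snapshot to a file inside the isolated environment gives a concrete list to consult if manual cleanup becomes necessary.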

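## What can only be previewed

- Explaining who the project suits and what it can do
- Demonstrating a typical conversation flow based on the project documentation
- Helping the user judge whether it is worth installing or researching further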

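## What must be verified after installation

- Actually installing a Skill, plugin, or CLI
- Executing scripts, modifying local files, or accessing external services
- Verifying real output quality, performance, and compatibility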

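## Boundary and risk card

- **Mistaking the pre-installation preview for a real run**: the user may overestimate how much configuration, permission, and compatibility verification the project has already completed. Mitigation: clearly distinguish prompt_preview_can_do from runtime_required. Claim: `clm_0010` inferred 0.45
- **Command execution modifies the local environment**: installation commands may write to the user's home directory, the host plugin directory, or project configuration. Mitigation: run them first in an isolated environment or test account. Evidence: `README.md` Claim: `clm_0011` supported 0.86
- **To confirm**: Will a real installation be compatible with the user's current host AI version? Reason: compatibility can only be verified in the actual host environment.
- **To confirm**: Does the project's output quality meet the user's specific task? Reason: a pre-installation preview can only show the flow and boundaries; it is no substitute for real evaluation.
- **To confirm**: Do the installation commands need network access, elevated permissions, or global writes? Reason: this affects installation risk in both enterprise and personal environments.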

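## Pre-work context

### Loading order

- First read how_to_use.host_ai_instruction to establish the boundaries of this pre-installation decision asset.
- Read claim_graph_summary to confirm that facts come from the Claim/Evidence Graph rather than the Human Wiki narrative.
- Then read intended_users, capabilities, and quick_start_candidates to judge whether the user is a match.
- When a specific task has to be carried out, consult role_skill_index first, then evidence_index.
- For questions about real installation, file modification, network access, performance, or compatibility, switch to risk_card and boundaries.runtime_required.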

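### Task routing

- **Command-line launch or installation flow**: first state that this capability needs post-installation verification, then give a pre-installation checklist. Boundary: must be verified by a real installation or run. Evidence: `README.md` Claim: `clm_0001` supported 0.86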

### Context size

- Total files: 45
- Important files covered: 40/45
- Evidence index entries: 40
- Role / Skill entries: 3

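### Handling insufficient evidence

- **missing_evidence**: state that the evidence is insufficient and ask the user for the target file, README passage, or post-installation verification record; do not fill in facts.
- **out_of_scope_request**: state that the task is beyond the evidence scope of this AI Context Pack and suggest that the user consult the Human Manual or verify after a real installation first.
- **runtime_request**: provide the pre-installation checklist and the source of the commands, but do not run commands for the user or claim to have run them.
- **source_conflict**: show the conflicting sources side by side, mark them as pending verification, and do not force a choice between versions.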

## Prompt Recipes

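### Fit assessment

- Goal: judge whether this project fits the user's current task.
- Expected output: a fit verdict, key reasons, evidence citations, what can be previewed before installation, what must be verified after installation, and suggested next steps.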

```text
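Based on the AI Context Pack for hunyuanvideo, first ask me 3 necessary questions, then judge whether it fits my task. The answer must cover: who it suits, what it can do, what it cannot do, whether it is worth installing, and where the evidence comes from. Every project fact must cite evidence_refs, source_paths, or a claim_id.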
```

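### Pre-installation trial

- Goal: let the user get a feel for the core workflow before installation, without presenting the preview as real capability or a marketing promise.
- Expected output: a trial script with boundary labels, a post-installation verification checklist, and cautious advice; no promises of real execution and no hard-sell wording.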

```text
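Treat hunyuanvideo as a pre-installation trial asset, not as an installed tool or a real runtime environment.

Output strictly four parts:
1. First ask me 3 necessary questions.
2. Give a "trial script": use the three labels [previewable before installation], [must be verified after installation], and [insufficient evidence] to show how it might guide the workflow.
3. Give a post-installation verification checklist: list which capabilities can only be confirmed after a real installation, real host loading, and a real project run.
4. Give cautious advice: you may only say "worth further research / trial installation", "provide more information before deciding", or "not recommended to continue"; do not endorse the project.

Hard boundaries:
- Do not claim to have installed, run, executed tests, modified files, or produced real results.
- Do not use promissory wording such as "adapts automatically", "guaranteed to pass", "perfect fit", or "strongly recommend installing".
- When describing how it works after installation, use a conditional such as "If installation succeeds and the host loads the Skill correctly, it may ...".
- The trial script may only be written as "sample lines / a hypothetical flow": use "may ask / may suggest / may show", not "written, generated, passed, running, generating".
- Prompt Preview is not responsible for providing installation commands; if the user is ready to trial-install, only remind them to read the Quick Start and Risk Card first and to verify in an isolated environment.
- Every project fact must come from a supported claim, evidence_refs, or source_paths; inferred/unverified items may serve only as risks or items to confirm.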

```

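### Role / Skill selection

- Goal: pick the best-matching assets from the project's roles or Skills.
- Expected output: a list of candidate roles or Skills, each with its applicable scenario, evidence path, risk boundary, and whether post-installation verification is needed.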

```text
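Read role_skill_index and recommend the 3-5 roles or Skills most relevant to my target task. For each recommendation, state the applicable scenario, likely output, risk boundary, and evidence_refs.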
```

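### Risk pre-check

- Goal: identify environment, permission, rule-conflict, and quality risks before installing or adopting the project.
- Expected output: a checklist covering environment, permissions, dependencies, licensing, host conflicts, quality risks, and unknowns.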

```text
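Based on risk_card, boundaries, and quick_start_candidates, give me a pre-installation risk checklist. Do not run commands for me; only explain what I should check, why, and what a failure would affect.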
```

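### Host AI kickoff instruction

- Goal: turn the project context into an instruction for the host AI before a conversation begins.
- Expected output: a kickoff instruction with clear boundaries and clear evidence citations, suitable for pasting to the host AI.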

```text
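Based on the AI Context Pack for hunyuanvideo, generate a kickoff instruction that I can paste to the host AI. The instruction must respect not_runtime=true and must not claim that the project has been installed, run, or produced real results.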
```

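## Role / Skill index

- 3 role / Skill / project-documentation entries are indexed.

- **HunyuanVideo: A Systematic Framework For Large Video Generation Model** (project_doc): HunyuanVideo: A Systematic Framework For Large Video Generation Model. Activation hint: consult when the user needs to understand the project structure, installation method, or boundaries. Evidence: `README.md`
- **HunyuanVideo: A Systematic Framework For Large Video Generation Model** (project_doc): HunyuanVideo: A Systematic Framework For Large Video Generation Model. Activation hint: consult when the user needs to understand the project structure, installation method, or boundaries. Evidence: `README_zh.md`
- **Wechat** (project_doc): Scan the QR code to follow the Hunyuan series of work and join the "Hunyuan Video Discussion Group" / Scan the QR code to join the "Hunyuan Discussion Group". Activation hint: consult when the user needs to understand the project structure, installation method, or boundaries. Evidence: `assets/WECHAT.md`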

## Evidence index

- 40 evidence entries are indexed.

- **HunyuanVideo: A Systematic Framework For Large Video Generation Model**（documentation）：HunyuanVideo: A Systematic Framework For Large Video Generation Model 证据：`README.md`
- **HunyuanVideo: A Systematic Framework For Large Video Generation Model**（documentation）：HunyuanVideo: A Systematic Framework For Large Video Generation Model 证据：`README_zh.md`
- **Wechat**（documentation）：扫码关注混元系列工作，加入「 Hunyuan Video 交流群」 Scan the QR code to join the "Hunyuan Discussion Group" 证据：`assets/WECHAT.md`
- **.gitignore**（source_file）：pycache /ckpts/ / 证据：`.gitignore`
- **License**（source_file）：TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT Tencent HunyuanVideo Release Date: December 3, 2024 THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW. By clicking to agree or by using, reproducing, modifying, distributing, performing or displaying any portion or element of the Tencent Hunyuan Works, including via any Hosted Service, You will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately. 1. DEFINITIONS. a. “Acceptable Use Policy” shall mean the policy made available by Tencent as set forth in the Exhibit A. b. “Agreement” shall mean the… 证据：`LICENSE.txt`
- **Notice**（source_file）：Tencent is pleased to support the open source community by making Tencent HunyuanVideo available. 证据：`Notice`
- **Penguinvideobenchmark**（source_file）：,prompt 0,"In the large cage, two puppies were wagging their tails at each other." 1,"A flock of bats flies over the village, captured in medium long shot." 2,"Above the sea, a school of silver flying fish leaped out of the water." 3,"In the early morning park, a bee is collecting pollen on a flower, in anime style." 4,Two dolphins are swimming in the blue sea. 5,"Several ducks are lying in the mud pit, occasionally preening their feathers leisurely, and sometimes probing for food in the muddy water with their beaks." 6,"Under the azure sky, a polar bear stands in the snow, turning its head to look at its cub behind him." 7,A butterfly is fluttering. 8,A woodpecker is pecking holes in the t… 证据：`assets/PenguinVideoBenchmark.csv`
- **Gradio Server**（source_file）：def initialize model model path ⋮---- args = parse args models root path = Path model path ⋮---- hunyuan video sampler = HunyuanVideoSampler.from pretrained models root path, args=args ⋮---- seed = None if seed == -1 else seed ⋮---- negative prompt = "" not applicable in the inference outputs = model.predict samples = outputs 'samples' sample = samples 0 .unsqueeze 0 save path = os.path.join os.getcwd , "gradio outputs" ⋮---- time flag = datetime.fromtimestamp time.time .strftime "%Y-%m-%d-%H:%M:%S" video path = f"{save path}/{time flag} seed{outputs 'seeds' 0 } {outputs 'prompts' 0 :100 .replace '/','' }.mp4" ⋮---- def create demo model path, save path ⋮---- model = initialize model model… 证据：`gradio_server.py`
- **Config**（source_file）：def parse args namespace=None ⋮---- parser = argparse.ArgumentParser description="HunyuanVideo inference script" parser = add network args parser parser = add extra models args parser parser = add denoise schedule args parser parser = add inference args parser parser = add parallel args parser args = parser.parse args namespace=namespace args = sanity check args args ⋮---- def add network args parser: argparse.ArgumentParser ⋮---- group = parser.add argument group title="HunyuanVideo network args" ⋮---- def add extra models args parser: argparse.ArgumentParser ⋮---- group = parser.add argument group ⋮---- def add denoise schedule args parser: argparse.ArgumentParser ⋮---- group = parser.add… 证据：`hyvideo/config.py`
- **Constants**（source_file）：all = PRECISION TO TYPE = { C SCALE = 1 000 000 000 000 000 PROMPT TEMPLATE ENCODE = PROMPT TEMPLATE ENCODE VIDEO = NEGATIVE PROMPT = "Aerial view, aerial view, overexposed, low quality, deformation, a poor composition, bad hands, bad teeth, bad eyes, bad limbs, distortion" PROMPT TEMPLATE = { PRECISIONS = {"fp32", "fp16", "bf16"} NORMALIZATION TYPE = {"layer", "rms"} ACTIVATION TYPE = {"relu", "silu", "gelu", "gelu tanh"} MODEL BASE = os.getenv "MODEL BASE", "./ckpts" DATA TYPE = {"image", "video", "image video"} VAE PATH = {"884-16c-hy": f"{MODEL BASE}/hunyuan-video-t2v-720p/vae"} TEXT ENCODER PATH = { TOKENIZER PATH = { TEXT PROJECTION = { 证据：`hyvideo/constants.py`
- **==================== Initialize Distributed Environment ================**（source_file）：xfuser = None get sequence parallel world size = None get sequence parallel rank = None get sp group = None initialize model parallel = None init distributed environment = None def parallelize transformer pipe ⋮---- transformer = pipe.transformer original forward = transformer.forward ⋮---- split dim = -2 ⋮---- split dim = -1 ⋮---- x = torch.chunk x, get sequence parallel world size ,dim=split dim get sequence parallel rank dim thw = freqs cos.shape -1 freqs cos = freqs cos.reshape temporal size, h, w, dim thw freqs cos = torch.chunk freqs cos, get sequence parallel world size ,dim=split dim - 1 get sequence parallel rank freqs cos = freqs cos.reshape -1, dim thw dim thw = freqs sin.shape -… 证据：`hyvideo/inference.py`
- **Prompt Rewrite**（source_file）：normal mode prompt = """Normal mode - Video Recaption Task: master mode prompt = """Master mode - Video Recaption Task: def get rewrite prompt ori prompt, mode="Normal" ⋮---- prompt = normal mode prompt.format input=ori prompt ⋮---- prompt = master mode prompt.format input=ori prompt ⋮---- ori prompt = "一只小狗在草地上奔跑。" normal prompt = get rewrite prompt ori prompt, mode="Normal" master prompt = get rewrite prompt ori prompt, mode="Master" 证据：`hyvideo/prompt_rewrite.py`
- **Requirements**（source_file）：torch==2.6.0 opencv-python==4.9.0.80 diffusers==0.31.0 transformers==4.46.3 tokenizers==0.20.3 accelerate==1.1.1 pandas==2.0.3 numpy==1.24.4 einops==0.7.0 tqdm==4.66.2 loguru==0.7.2 imageio==2.34.0 imageio-ffmpeg==0.5.1 safetensors==0.4.3 gradio==5.0.0 证据：`requirements.txt`
- **Create save folder to save the samples**（source_file）：def main ⋮---- args = parse args ⋮---- models root path = Path args.model base ⋮---- Create save folder to save the samples save path = args.save path if args.save path suffix=="" else f'{args.save path} {args.save path suffix}' ⋮---- Load models hunyuan video sampler = HunyuanVideoSampler.from pretrained models root path, args=args Get the updated args args = hunyuan video sampler.args Start sampling TODO: batch inference check outputs = hunyuan video sampler.predict samples = outputs 'samples' Save samples ⋮---- sample = samples i .unsqueeze 0 time flag = datetime.fromtimestamp time.time .strftime "%Y-%m-%d-%H:%M:%S" cur save path = f"{save path}/{time flag} seed{outputs 'seeds' i } {outp… 证据：`sample_video.py`
- **Run Sample Video**（source_file）：python3 sample video.py \ --video-size 720 1280 \ --video-length 129 \ --infer-steps 50 \ --prompt "A cat walks on the grass, realistic style." \ --seed 42 \ --embedded-cfg-scale 6.0 \ --flow-shift 7.0 \ --flow-reverse \ --use-cpu-offload \ --save-path ./results 证据：`scripts/run_sample_video.sh`
- **Run Sample Video Fp8**（source_file）：DIT CKPT PATH={PATH TO}/{MODEL NAME} model states fp8.pt python3 sample video.py \ --dit-weight ${DIT CKPT PATH} \ --video-size 720 1280 \ --video-length 129 \ --infer-steps 50 \ --prompt "A cat walks on the grass, realistic style." \ --seed 42 \ --embedded-cfg-scale 6.0 \ --flow-shift 7.0 \ --flow-reverse \ --use-cpu-offload \ --use-fp8 \ --save-path ./results 证据：`scripts/run_sample_video_fp8.sh`
- **Run Sample Video Multigpu**（source_file）：export TOKENIZERS PARALLELISM=false export NPROC PER NODE=8 export ULYSSES DEGREE=8 export RING DEGREE=1 torchrun --nproc per node=$NPROC PER NODE sample video.py \ --video-size 720 1280 \ --video-length 129 \ --infer-steps 50 \ --prompt "A cat walks on the grass, realistic style." \ --seed 42 \ --embedded-cfg-scale 6.0 \ --flow-shift 7.0 \ --flow-reverse \ --ulysses-degree=$ULYSSES DEGREE \ --ring-degree=$RING DEGREE \ --save-path ./results 证据：`scripts/run_sample_video_multigpu.sh`
- **Collect Env**（source_file）：def is rocm pytorch - bool ⋮---- is rocm = False ⋮---- is rocm = True if torch.version.hip is not None and ⋮---- TORCH VERSION = torch. version def get build config ⋮---- IS MUSA AVAILABLE = True ⋮---- IS MUSA AVAILABLE = False def is musa available - bool def is cuda available - bool def get cuda home ⋮---- CUDA HOME = ROCM HOME ⋮---- def get musa home def collect env ⋮---- env info = OrderedDict ⋮---- cuda available = is cuda available musa available = is musa available ⋮---- devices = defaultdict list ⋮---- CUDA HOME = get cuda home ⋮---- nvcc = osp.join CUDA HOME, 'hip/bin/hipcc' nvcc = subprocess.check output nvcc = nvcc.decode 'utf-8' .strip release = nvcc.rfind 'HIP version:' build =… 证据：`utils/collect_env.py`
- **Check existence to make it compatible with FlowMatchEulerDiscreteScheduler**（source_file）：logger = logging.get logger name EXAMPLE DOC STRING = """""" def rescale noise cfg noise cfg, noise pred text, guidance rescale=0.0 ⋮---- std text = noise pred text.std std cfg = noise cfg.std dim=list range 1, noise cfg.ndim , keepdim=True noise pred rescaled = noise cfg std text / std cfg noise cfg = ⋮---- accepts timesteps = "timesteps" in set ⋮---- timesteps = scheduler.timesteps num inference steps = len timesteps ⋮---- accept sigmas = "sigmas" in set ⋮---- @dataclass class HunyuanVideoPipelineOutput BaseOutput ⋮---- videos: Union torch.Tensor, np.ndarray class HunyuanVideoPipeline DiffusionPipeline ⋮---- r""" Pipeline for text-to-video generation using HunyuanVideo. This model inherit… 证据：`hyvideo/diffusion/pipelines/pipeline_hunyuan_video.py`
- **Copied from diffusers.schedulers.scheduling dpmsolver multistep.DPMSolverMultistepScheduler.set begin index**（source_file）：logger = logging.get logger name ⋮---- @dataclass class FlowMatchDiscreteSchedulerOutput BaseOutput ⋮---- prev sample: torch.FloatTensor class FlowMatchDiscreteScheduler SchedulerMixin, ConfigMixin ⋮---- compatibles = order = 1 ⋮---- sigmas = torch.linspace 1, 0, num train timesteps + 1 ⋮---- sigmas = sigmas.flip 0 ⋮---- @property def step index self ⋮---- """ The index counter for current timestep. It will increase 1 after each scheduler step. """ ⋮---- @property def begin index self ⋮---- """ The index for the first timestep. It should be set from pipeline with set begin index method. """ ⋮---- Copied from diffusers.schedulers.scheduling dpmsolver multistep.DPMSolverMultistepScheduler.set… 证据：`hyvideo/diffusion/schedulers/scheduling_flow_match_discrete.py`
- **Init**（source_file）：def load model args, in channels, out channels, factor kwargs ⋮---- model = HYVideoDiffusionTransformer 证据：`hyvideo/modules/__init__.py`
- **Activation Layers**（source_file）：def get activation layer act type 证据：`hyvideo/modules/activation_layers.py`
- **Attenion**（source_file）：flash attn = None flash attn varlen func = None flash attn forward = None MEMORY LAYOUT = { def get cu seqlens text mask, img len ⋮---- batch size = text mask.shape 0 text len = text mask.sum dim=1 max len = text mask.shape 1 + img len cu seqlens = torch.zeros 2 batch size + 1 , dtype=torch.int32, device="cuda" ⋮---- s = text len i + img len s1 = i max len + s s2 = i + 1 max len ⋮---- q = pre attn layout q k = pre attn layout k v = pre attn layout v ⋮---- attn mask = attn mask.to q.dtype ⋮---- x = F.scaled dot product attention ⋮---- attn1 = F.scaled dot product attention attn2 = F.scaled dot product attention x = torch.cat attn1, attn2 , dim=2 ⋮---- x = flash attn varlen func x = x.view ⋮-… 证据：`hyvideo/modules/attenion.py`
- **Embed Layers**（source_file）：class PatchEmbed nn.Module ⋮---- factory kwargs = {"dtype": dtype, "device": device} ⋮---- patch size = to 2tuple patch size ⋮---- def forward self, x ⋮---- x = self.proj x ⋮---- x = x.flatten 2 .transpose 1, 2 x = self.norm x ⋮---- class TextProjection nn.Module ⋮---- def init self, in channels, hidden size, act layer, dtype=None, device=None def forward self, caption ⋮---- hidden states = self.linear 1 caption hidden states = self.act 1 hidden states hidden states = self.linear 2 hidden states ⋮---- def timestep embedding t, dim, max period=10000 ⋮---- half = dim // 2 freqs = torch.exp args = t :, None .float freqs None embedding = torch.cat torch.cos args , torch.sin args , dim=-1 ⋮----… 证据：`hyvideo/modules/embed_layers.py`
- **Fp8 Optimization**（source_file）：def get fp maxval bits=8, mantissa bit=3, sign bits=1 ⋮---- bits = torch.tensor bits mantissa bit = torch.tensor mantissa bit sign bits = torch.tensor sign bits M = torch.clamp torch.round mantissa bit , 1, bits - sign bits E = bits - sign bits - M bias = 2 E - 1 - 1 mantissa = 1 ⋮---- maxval = mantissa 2 2 E - 1 - bias ⋮---- def quantize to fp8 x, bits=8, mantissa bit=3, sign bits=1 ⋮---- bits = torch.tensor bits mantissa bit = torch.tensor mantissa bit sign bits = torch.tensor sign bits M = torch.clamp torch.round mantissa bit , 1, bits - sign bits E = bits - sign bits - M ⋮---- minval = - maxval minval = - maxval if sign bits == 1 else torch.zeros like maxval input clamp = torch.min torc… 证据：`hyvideo/modules/fp8_optimization.py`
- **Mlp Layers**（source_file）：class MLP nn.Module ⋮---- factory kwargs = {"device": device, "dtype": dtype} ⋮---- out features = out features or in channels hidden channels = hidden channels or in channels bias = to 2tuple bias drop probs = to 2tuple drop linear layer = partial nn.Conv2d, kernel size=1 if use conv else nn.Linear ⋮---- def forward self, x ⋮---- x = self.fc1 x x = self.act x x = self.drop1 x x = self.norm x x = self.fc2 x x = self.drop2 x ⋮---- class MLPEmbedder nn.Module ⋮---- def init self, in dim: int, hidden dim: int, device=None, dtype=None def forward self, x: torch.Tensor - torch.Tensor class FinalLayer nn.Module ⋮---- def forward self, x, c ⋮---- x = modulate self.norm final x , shift=shift, scale… 证据：`hyvideo/modules/mlp_layers.py`
- **Prepare txt for attention.**（source_file）：class MMDoubleStreamBlock nn.Module ⋮---- factory kwargs = {"device": device, "dtype": dtype} ⋮---- head dim = hidden size // heads num mlp hidden dim = int hidden size mlp width ratio ⋮---- qk norm layer = get norm layer qk norm type ⋮---- def enable deterministic self def disable deterministic self ⋮---- img modulated = self.img norm1 img img modulated = modulate img qkv = self.img attn qkv img modulated ⋮---- img q = self.img attn q norm img q .to img v img k = self.img attn k norm img k .to img v ⋮---- Prepare txt for attention. txt modulated = self.txt norm1 txt txt modulated = modulate txt qkv = self.txt attn qkv txt modulated ⋮---- txt q = self.txt attn q norm txt q .to txt v txt k =… 证据：`hyvideo/modules/models.py`
- **Modulate Layers**（source_file）：class ModulateDiT nn.Module ⋮---- factory kwargs = {"dtype": dtype, "device": device} ⋮---- def forward self, x: torch.Tensor - torch.Tensor def modulate x, shift=None, scale=None def apply gate x, gate=None, tanh=False def ckpt wrapper module ⋮---- def ckpt forward inputs ⋮---- outputs = module inputs 证据：`hyvideo/modules/modulate_layers.py`
- **Norm Layers**（source_file）：class RMSNorm nn.Module ⋮---- factory kwargs = {"device": device, "dtype": dtype} ⋮---- def norm self, x def forward self, x ⋮---- output = self. norm x.float .type as x ⋮---- output = output self.weight ⋮---- def get norm layer norm layer 证据：`hyvideo/modules/norm_layers.py`
- **start is grid size**（source_file）：def to tuple x, dim=2 def get meshgrid nd start, args, dim=2 ⋮---- """ Get n-D meshgrid with start, stop and num. Args: start int or tuple : If len args == 0, start is num; If len args == 1, start is start, args 0 is stop, step is 1; If len args == 2, start is start, args 0 is stop, args 1 is num. For n-dim, start/stop/num should be int or n-tuple. If n-tuple is provided, the meshgrid will be stacked following the dim order in n-tuples. args: See above. dim int : Dimension of the meshgrid. Defaults to 2. Returns: grid np.ndarray : dim, ... """ ⋮---- start is grid size num = to tuple start, dim=dim start = 0, dim stop = num ⋮---- start is start, args 0 is stop, step is 1 start = to tuple sta… 证据：`hyvideo/modules/posemb_layers.py`
- **Token Refiner**（source_file）：class IndividualTokenRefinerBlock nn.Module ⋮---- factory kwargs = {"device": device, "dtype": dtype} ⋮---- head dim = hidden size // heads num mlp hidden dim = int hidden size mlp width ratio ⋮---- qk norm layer = get norm layer qk norm type ⋮---- act layer = get activation layer act type ⋮---- norm x = self.norm1 x qkv = self.self attn qkv norm x ⋮---- q = self.self attn q norm q .to v k = self.self attn k norm k .to v attn = attention q, k, v, mode="torch", attn mask=attn mask x = x + apply gate self.self attn proj attn , gate msa x = x + apply gate self.mlp self.norm2 x , gate mlp ⋮---- class IndividualTokenRefiner nn.Module ⋮---- self attn mask = None ⋮---- batch size = mask.shape 0 se… 证据：`hyvideo/modules/token_refiner.py`
- **from pretrained will ensure that the model is in eval mode.**（source_file）：def use default value, default ⋮---- text encoder path = TEXT ENCODER PATH text encoder type ⋮---- text encoder = CLIPTextModel.from pretrained text encoder path ⋮---- text encoder = AutoModel.from pretrained ⋮---- from pretrained will ensure that the model is in eval mode. ⋮---- text encoder = text encoder.to dtype=PRECISION TO TYPE text encoder precision ⋮---- text encoder = text encoder.to device ⋮---- tokenizer path = TOKENIZER PATH tokenizer type ⋮---- tokenizer = CLIPTokenizer.from pretrained tokenizer path, max length=77 ⋮---- tokenizer = AutoTokenizer.from pretrained ⋮---- @dataclass class TextEncoderModelOutput ModelOutput ⋮---- """ Base class for model's outputs that also contains… 证据：`hyvideo/text_encoder/__init__.py`
- **Data Utils**（source_file）：def align to value, alignment 证据：`hyvideo/utils/data_utils.py`
- **File Utils**（source_file）：CODE SUFFIXES = { def safe dir path ⋮---- path = Path path ⋮---- def safe file path def save videos grid videos: torch.Tensor, path: str, rescale=False, n rows=1, fps=24 ⋮---- videos = rearrange videos, "b c t h w - t b c h w" outputs = ⋮---- x = torchvision.utils.make grid x, nrow=n rows x = x.transpose 0, 1 .transpose 1, 2 .squeeze -1 ⋮---- x = x + 1.0 / 2.0 x = torch.clamp x, 0, 1 x = x 255 .numpy .astype np.uint8 证据：`hyvideo/utils/file_utils.py`
- **Helpers**（source_file）：def ntuple n ⋮---- def parse x ⋮---- x = tuple x ⋮---- x = tuple repeat x 0 , n ⋮---- to 1tuple = ntuple 1 to 2tuple = ntuple 2 to 3tuple = ntuple 3 to 4tuple = ntuple 4 def as tuple x def as list of 2tuple x ⋮---- x = as tuple x ⋮---- x = x 0 , x 0 ⋮---- lst = 证据：`hyvideo/utils/helpers.py`
- **Preprocess Text Encoder Tokenizer Utils**（source_file）：def preprocess text encoder tokenizer args ⋮---- processor = AutoProcessor.from pretrained args.input dir model = LlavaForConditionalGeneration.from pretrained ⋮---- parser = argparse.ArgumentParser ⋮---- args = parser.parse args 证据：`hyvideo/utils/preprocess_text_encoder_tokenizer_utils.py`
- **Init**（source_file）：vae path = VAE PATH vae type ⋮---- config = AutoencoderKLCausal3D.load config vae path ⋮---- vae = AutoencoderKLCausal3D.from config config, sample size=sample size ⋮---- vae = AutoencoderKLCausal3D.from config config vae ckpt = Path vae path / "pytorch model.pt" ⋮---- ckpt = torch.load vae ckpt, map location=vae.device ⋮---- ckpt = ckpt "state dict" ⋮---- ckpt = {k.replace "vae.", "" : v for k, v in ckpt.items if k.startswith "vae." } ⋮---- spatial compression ratio = vae.config.spatial compression ratio time compression ratio = vae.config.time compression ratio ⋮---- vae = vae.to dtype=PRECISION TO TYPE vae precision ⋮---- vae = vae.to device 证据：`hyvideo/vae/__init__.py`
- **Copied from diffusers.models.unet 2d condition.UNet2DConditionModel.set attn processor**（source_file）：@dataclass class DecoderOutput2 BaseOutput ⋮---- sample: torch.FloatTensor posterior: Optional DiagonalGaussianDistribution = None class AutoencoderKLCausal3D ModelMixin, ConfigMixin, FromOriginalVAEMixin ⋮---- r""" A VAE model with KL loss for encoding images/videos into latents and decoding latent representations into images/videos. This model inherits from ModelMixin . Check the superclass documentation for it's generic methods implemented for all models such as downloading or saving . """ supports gradient checkpointing = True ⋮---- sample size = ⋮---- def set gradient checkpointing self, module, value=False def enable temporal tiling self, use tiling: bool = True def disable temporal t… 证据：`hyvideo/vae/autoencoder_kl_causal_3d.py`
- **Unet Causal 3D Blocks**（source_file）：logger = logging.get logger name def prepare causal attention mask n frame: int, n hw: int, dtype, device, batch size: int = None ⋮---- seq len = n frame n hw mask = torch.full seq len, seq len , float "-inf" , dtype=dtype, device=device ⋮---- i frame = i // n hw ⋮---- mask = mask.unsqueeze 0 .expand batch size, -1, -1 ⋮---- class CausalConv3d nn.Module ⋮---- padding = kernel size // 2, kernel size // 2, kernel size // 2, kernel size // 2, kernel size - 1, 0 ⋮---- def forward self, x ⋮---- x = F.pad x, self.time causal padding, mode=self.pad mode ⋮---- class UpsampleCausal3D nn.Module ⋮---- conv = None ⋮---- kernel size = 3 conv = CausalConv3d self.channels, self.out channels, kernel size=k… 证据：`hyvideo/vae/unet_causal_3d_blocks.py`
- **mid**（source_file）：@dataclass class DecoderOutput BaseOutput ⋮---- r""" Output of decoding method. Args: sample torch.FloatTensor of shape batch size, num channels, height, width : The decoded output sample from the last layer of the model. """ sample: torch.FloatTensor class EncoderCausal3D nn.Module ⋮---- r""" The EncoderCausal3D layer of a variational autoencoder that encodes its input into a latent representation. """ ⋮---- output channel = block out channels 0 ⋮---- input channel = output channel output channel = block out channels i is final block = i == len block out channels - 1 num spatial downsample layers = int np.log2 spatial compression ratio num time downsample layers = int np.log2 time compress… 证据：`hyvideo/vae/vae.py`

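## Rules the host AI must follow

- **Treat this asset as pre-work context, not as a runtime environment.** The AI Context Pack contains only an evidence-based understanding of the project, not the target project's executable state. Evidence: `README.md`, `README_zh.md`, `assets/WECHAT.md`
- **When answering the user, distinguish what can be previewed from what can only be verified after installation.** The consumer value of a pre-installation trial comes from reducing mistaken installs and misjudgments, not from posing as a real run. Evidence: `README.md`, `README_zh.md`, `assets/WECHAT.md`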

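## Questions the user should answer before starting

- Which host AI or local environment do you plan to use it in?
- Do you only want to try out the workflow first, or are you preparing for a real installation?
- What matters most to you: installation cost, output quality, or conflicts with existing rules?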

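## Acceptance criteria

- Every capability claim can be traced back to a file path in evidence_refs.
- AI_CONTEXT_PACK.md does not present the preview as a real run.
- The user can understand within 3 minutes who it suits, what it can do, how to start, and where the risk boundaries are.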
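
The first criterion, that every capability claim traces back to an indexed file path, can be checked mechanically. A minimal sketch, assuming claims are available as dictionaries with `claim_id` and `paths` keys; this shape and the `untraceable_claims` helper are hypothetical, not a documented Doramagic format:

```python
def untraceable_claims(claims, evidence_refs):
    """Return the claim_ids whose paths are missing or not all listed in evidence_refs."""
    known = set(evidence_refs)
    failing = []
    for claim in claims:
        paths = set(claim.get("paths", []))
        # A claim passes only if it cites at least one path and every path is indexed.
        if not paths or not paths <= known:
            failing.append(claim["claim_id"])
    return failing


evidence_refs = ["README.md", "README_zh.md", "requirements.txt"]
claims = [
    {"claim_id": "clm_0001", "paths": ["README.md"]},
    {"claim_id": "clm_demo", "paths": ["docs/missing.md"]},  # hypothetical failing claim
]
print(untraceable_claims(claims, evidence_refs))
```

An empty result means the criterion holds for the claims supplied; it says nothing about the other two criteria, which need human review.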

---

## Doramagic Context Augmentation

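The content below reinforces the main body of the Repomix/AI Context Pack. The Human Manual provides only a reading skeleton; the pitfall log is converted into working constraints that the host AI must follow.

## Human Manual skeleton

Usage rule: this is only a reading route through the project and a set of salience signals, not an authority on facts. Specific facts must still go back to repo evidence / the Claim Graph.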

Hard rules for the host AI:
- Do not treat page titles, section order, summaries, or importance as evidence of project facts.
- When explaining the Human Manual skeleton, state explicitly that it is only a reading route / salience signal.
- Capability, installation, compatibility, runtime status, and risk judgments must cite repo evidence, a source path, or the Claim Graph.

- **Project overview and overall architecture**: importance `high`
  - source_paths: README.md, README_zh.md, hyvideo/config.py, hyvideo/constants.py, hyvideo/__init__.py
- **Inference and deployment guide**: importance `high`
  - source_paths: sample_video.py, gradio_server.py, scripts/run_sample_video.sh, scripts/run_sample_video_fp8.sh, scripts/run_sample_video_multigpu.sh
- **Core model components**: importance `high`
  - source_paths: hyvideo/modules/models.py, hyvideo/modules/attenion.py, hyvideo/modules/embed_layers.py, hyvideo/modules/posemb_layers.py, hyvideo/modules/mlp_layers.py
- **Community resources and troubleshooting**: importance `medium`
  - source_paths: README.md, requirements.txt, sample_video.py, gradio_server.py, hyvideo/utils/helpers.py

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `e260ed40c88d104801a8b1de05d2ab81e965a9ef`
- inspected_files: `README.md`, `requirements.txt`

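Hard rules for the host AI:
- Without repo_clone_verified=true, do not claim to have read the source code.
- Without repo_inspection_verified=true, do not state judgments about README/docs/package files as facts.
- Without quick_start_verified=true, do not claim that the Quick Start has been run successfully.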
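
These three rules amount to a gate on what the host AI may say. The sketch below shows one way to express that gate; `permitted_statements` and the keys of its result are hypothetical names, and treating a missing flag as "not verified" is an assumption consistent with the rules above.

```python
def permitted_statements(flags: dict) -> dict:
    """Map verification flags to the statements a host AI may make."""
    def verified(name: str) -> bool:
        # Only an explicit True counts; a missing flag is treated as not verified.
        return flags.get(name) is True

    return {
        "claim_source_was_read": verified("repo_clone_verified"),
        "state_doc_findings_as_fact": verified("repo_inspection_verified"),
        "claim_quick_start_ran": verified("quick_start_verified"),
    }


# Flags as recorded in this pack; quick_start_verified is not listed.
pack_flags = {"repo_clone_verified": True, "repo_inspection_verified": True}
print(permitted_statements(pack_flags))
```

With the flags recorded in this pack, the gate permits statements about the inspected files but still blocks any claim that the Quick Start has been run.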

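## Doramagic Pitfall Constraints

These rules come from project-specific pitfalls found during Doramagic's discovery, verification, or compilation process. The host AI must treat them as working constraints, not as ordinary explanatory text.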

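### Constraint 1: Source evidence: I met a problem when i tried to parallel inference

- Trigger: GitHub community evidence shows an installation-related issue pending verification for this project: I met a problem when i tried to parallel inference
- Why it matters: it may raise the cost of trial use and production adoption for new users.
- Evidence: community_evidence:github | https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/249 | The source discussion mentions Python-related conditions that should be rechecked before installation or trial.
- Hard boundary: do not present this pitfall as resolved, verified, or negligible unless later verification evidence clearly shows that it has been closed.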

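### Constraint 2: Source evidence: Request for Official Fine-Tuning Code / Training Example

- Trigger: GitHub community evidence shows an installation-related issue pending verification for this project: Request for Official Fine-Tuning Code / Training Example
- Why it matters: it may raise the cost of trial use and production adoption for new users.
- Evidence: community_evidence:github | https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/302 | A usage condition pending verification, surfaced by source type github_issue.
- Hard boundary: do not present this pitfall as resolved, verified, or negligible unless later verification evidence clearly shows that it has been closed.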

### Constraint 3: Depends on a Docker environment

- Trigger: the installation/run entry point includes a Docker command: `docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12 # For CUDA`
- Host AI rule: flag Docker as a prerequisite and provide a non-Docker path or a failure hint.
- Why it matters: non-engineering users may not have Docker, which raises the startup cost noticeably.
- Evidence: identity.distribution | https://github.com/Tencent-Hunyuan/HunyuanVideo | `docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12 # For CUDA`
- Hard boundary: do not present this pitfall as resolved, verified, or negligible unless later verification evidence clearly shows that it has been closed.

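### Constraint 4: Source evidence: Tirkey voice

- Trigger: GitHub community evidence shows an installation-related issue pending verification for this project: Tirkey voice
- Why it matters: it may raise the cost of trial use and production adoption for new users.
- Evidence: community_evidence:github | https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/317 | The source discussion mentions Python-related conditions that should be rechecked before installation or trial.
- Hard boundary: do not present this pitfall as resolved, verified, or negligible unless later verification evidence clearly shows that it has been closed.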

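### Constraint 5: Capability assessment relies on an assumption

- Trigger: README/documentation is current enough for a first validation pass.
- Host AI rule: turn the assumption into a downstream verification checklist.
- Why it matters: if the assumption does not hold, the user will not get the promised capability.
- Evidence: capability.assumptions | https://github.com/Tencent-Hunyuan/HunyuanVideo | README/documentation is current enough for a first validation pass.
- Hard boundary: do not present this pitfall as resolved, verified, or negligible unless later verification evidence clearly shows that it has been closed.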

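### Constraint 6: Source evidence: Diário de obra como ancorar forro em nuvem em estruturas de telhadooo

- Trigger: GitHub community evidence shows a runtime-related issue pending verification for this project: Diário de obra como ancorar forro em nuvem em estruturas de telhadooo
- Why it matters: it may raise the cost of trial use and production adoption for new users.
- Evidence: community_evidence:github | https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/311 | A usage condition pending verification, surfaced by source type github_issue.
- Hard boundary: do not present this pitfall as resolved, verified, or negligible unless later verification evidence clearly shows that it has been closed.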

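### Constraint 7: Source evidence: BMW

- Trigger: GitHub community evidence shows a maintenance/version-related issue pending verification for this project: BMW
- Why it matters: it may raise the cost of trial use and production adoption for new users.
- Evidence: community_evidence:github | https://github.com/Tencent-Hunyuan/HunyuanVideo/issues/313 | A usage condition pending verification, surfaced by source type github_issue.
- Hard boundary: do not present this pitfall as resolved, verified, or negligible unless later verification evidence clearly shows that it has been closed.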

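### Constraint 8: Maintenance activity unknown

- Trigger: last_activity_observed is not recorded.
- Host AI rule: add signals from recent GitHub commits, releases, and issue/PR responses.
- Why it matters: new, abandoned, and active projects get lumped together, which lowers confidence in the recommendation.
- Evidence: evidence.maintainer_signals | https://github.com/Tencent-Hunyuan/HunyuanVideo | last_activity_observed missing
- Hard boundary: do not present this pitfall as resolved, verified, or negligible unless later verification evidence clearly shows that it has been closed.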

### Constraint 9: Downstream validation risk item

- Trigger: no_demo
- Evidence: downstream_validation.risk_items | https://github.com/Tencent-Hunyuan/HunyuanVideo | no_demo; severity=medium
- Hard boundary: do not present this pitfall as resolved, verified, or negligible unless later verification evidence clearly shows that it has been closed.

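### Constraint 10: Scoring risk present

- Trigger: no_demo
- Why it matters: the risk affects whether the project is suitable for ordinary users to install.
- Evidence: risks.scoring_risks | https://github.com/Tencent-Hunyuan/HunyuanVideo | no_demo; severity=medium
- Hard boundary: do not present this pitfall as resolved, verified, or negligible unless later verification evidence clearly shows that it has been closed.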