# fastchat - Doramagic AI Context Pack

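> Positioning: a pre-install experience and decision asset. It helps the host AI get off to a good start, but it does not mean the target project has been installed, executed, or verified.

## Sufficiency principle

- **Sufficiency, not compression**: An AI Context Pack should be complete enough for the host AI to understand the project's value, capability boundaries, entry points, risks, and evidence sources before starting work. It may be organized in layers, but the shortest possible summary is not the goal.
- **Compression strategy**: Compress only noise and repetition, never context that affects judgment or the quality of the start.

## How the host AI should use this pack

You are reading the AI Context Pack that Doramagic compiled for fastchat. Treat it as pre-work context: help the user understand who the project suits, what it can do, how to start, what must be verified after installation, and where the risks are. Do not claim that you have installed, run, or executed the target project.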

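## Claim consumption rules

- **Source of facts**: Repo Evidence + Claim/Evidence Graph; the Human Wiki supplies only salience, terminology, and narrative structure.
- **Minimum status for facts**: `supported`
- `supported`: may be used as a project fact, but the answer must cite the claim_id and the evidence path.
- `weak`: usable only as a low-confidence lead; the user must be asked to verify it further.
- `inferred`: usable only for risk notes or open questions; it must not be presented as a project fact.
- `unverified`: must not be used as a fact; state clearly that the evidence is insufficient.
- `contradicted`: the conflicting sources must be shown; do not pick one version on the user's behalf.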
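
The status rules above can be read as a small lookup from claim status to permitted use. The following is a minimal sketch under that reading, not the pack's actual implementation: the `Claim` class, the `POLICY` labels, and the output wording are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass

# Status names come from the rules above; the policy labels are illustrative.
POLICY = {
    "supported": "fact",
    "weak": "low_confidence_hint",
    "inferred": "risk_or_open_question",
    "unverified": "not_usable",
    "contradicted": "show_conflict",
}


@dataclass(frozen=True)
class Claim:
    """Hypothetical claim record; the real pack schema may differ."""
    claim_id: str
    status: str
    evidence_path: str
    text: str


def render_claim(claim: Claim) -> str:
    """Render a claim according to its status; unknown statuses are treated as unusable."""
    usage = POLICY.get(claim.status, "not_usable")
    if usage == "fact":
        # Supported facts must carry both the claim_id and the evidence path.
        return f"{claim.text} (claim: {claim.claim_id}, evidence: {claim.evidence_path})"
    if usage == "low_confidence_hint":
        return f"Low-confidence lead, please verify: {claim.text}"
    if usage == "risk_or_open_question":
        return f"To be confirmed: {claim.text}"
    if usage == "show_conflict":
        return f"Sources conflict, all versions must be shown: {claim.text}"
    return "Insufficient evidence."


if __name__ == "__main__":
    print(render_claim(Claim("clm_0004", "supported", "README.md",
                             "The README lists a git clone command.")))
```

Treating an unknown status like `unverified` is a conservative default chosen for this sketch; the pack itself does not say how unknown statuses are handled.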

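## Who it suits best

- **AI researchers or builders of research-oriented agents**: The README is explicitly organized around research, experiment, or paper workflows. Evidence: `README.md` Claim: `clm_0002` supported 0.86
- **Developers using a host AI such as Claude/Codex/Cursor/Gemini**: The README or plugin configuration mentions multiple host AIs. Evidence: `README.md` Claim: `clm_0003` supported 0.86

## What it can do

- **Command-line launch or installation flow** (must be verified after installation): The project documentation contains executable commands; real use requires running them locally or in the host environment. Evidence: `README.md` Claim: `clm_0001` supported 0.86

## How to start

- `git clone https://github.com/lm-sys/FastChat.git` Evidence: `README.md` Claim: `clm_0004` supported 0.86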

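## Pre-continue decision card

- **Current recommendation**: Sandbox trial install only
- **Why**: The project shows signs of install commands, host configuration, or local writes. Do not go straight into your primary environment; trial-install in an isolated environment first.

### 30-second decision

- **What to do now**: Sandbox trial install only
- **Minimal safe next step**: Run the Prompt Preview first; if you still want to install, do so only in an isolated environment
- **Don't trust yet**: Real output quality cannot be trusted before installation.
- **Continuing will touch**: command execution, the local environment or project files, and the host AI context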

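### What you can trust now

- **Audience signal: AI researchers or builders of research-oriented agents** (supported): Backed by a supported claim or project evidence, but this still does not equal real installed behavior. Evidence: `README.md` Claim: `clm_0002` supported 0.86
- **Audience signal: developers using a host AI such as Claude/Codex/Cursor/Gemini** (supported): Backed by a supported claim or project evidence, but this still does not equal real installed behavior. Evidence: `README.md` Claim: `clm_0003` supported 0.86
- **Capability exists: command-line launch or installation flow** (supported): You can trust that the project shows signs of this capability; whether it fits your specific task still has to be verified by trial use or after installation. Evidence: `README.md` Claim: `clm_0001` supported 0.86
- **Quick Start / install command signals exist** (supported): You can trust that the project documentation contains a launch or install entry point; do not take that as a reason to run it directly in your primary environment. Evidence: `README.md` Claim: `clm_0004` supported 0.86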

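### What you cannot trust yet

- **Real output quality cannot be trusted before installation.** (unverified): The Prompt Preview only demonstrates the guidance style; it cannot prove result quality in a real project.
- **Host AI version compatibility cannot be trusted before installation.** (unverified): Loading rules and version differences across hosts such as Claude, Cursor, Codex, and Gemini must be verified in a real environment.
- **"It will not pollute existing host AI behavior" cannot be taken on trust.** (inferred): Skill, plugin, and AGENTS/CLAUDE/GEMINI instructions may change the host AI's default behavior.
- **Safe rollback cannot be assumed.** (unverified): Unless the project explicitly provides uninstall and restore instructions, verify in an isolated environment first.
- **After a real install, is it compatible with the user's current host AI version?** (unverified): Compatibility can only be verified in the actual host environment.
- **Does the project's output quality meet the user's specific task?** (unverified): A pre-install preview only shows the flow and its boundaries; it is no substitute for a real evaluation.
- **Do the install commands need network access, elevated permissions, or global writes?** (unverified): This affects installation risk in both enterprise and personal environments. Evidence: `README.md`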

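### What continuing will touch

- **Command execution**: package managers, network downloads, local plugin directories, project configuration, or the user's home directory. Reason: the very first command may already change the environment, so decide whether it is worth running before you run it. Evidence: `README.md`
- **Local environment or project files**: install output, plugin caches, project configuration, or local dependency directories. Reason: the write scope and rollback path cannot be proven before installation and need isolated verification. Evidence: `README.md`
- **Host AI context**: the AI Context Pack, Prompt Preview, Skill routing, risk rules, and project facts. Reason: imported context influences the host AI's later judgments, so unverified items must not be presented as facts.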

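### Minimal safe next step

- **Run the Prompt Preview first**: Use the pre-install interactive trial to judge whether the working style fits; it needs no authorization and changes nothing in your environment. (Applies to any project, especially when output quality is unknown.)
- **Trial-install only in an isolated directory or test account**: This keeps install commands from polluting your primary host AI, real projects, or home directory. (Applies when there are signs of command execution, plugin configuration, or local writes.)
- **After installing, verify just one minimal task**: Verify loading, compatibility, output quality, and rollback first, then decide whether to adopt it more deeply. (Applies when moving from trial to a real workflow.)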
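
One way to approximate the "isolated directory" advice above is to run commands with the working directory and home-related environment variables pointed at a throwaway folder. The sketch below is illustrative only: `sandbox_env` and `run_in_sandbox` are hypothetical helpers, and redirecting `HOME` is not a security sandbox. Tools can still write to absolute paths, so a container, VM, or test account remains the stronger option.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def sandbox_env(sandbox: Path) -> dict:
    """Copy the current environment, redirecting home/cache/config into the sandbox."""
    env = dict(os.environ)
    env["HOME"] = str(sandbox)          # POSIX home directory
    env["USERPROFILE"] = str(sandbox)   # Windows home directory
    env["XDG_CACHE_HOME"] = str(sandbox / ".cache")
    env["XDG_CONFIG_HOME"] = str(sandbox / ".config")
    return env


def run_in_sandbox(cmd: list, sandbox: Path) -> subprocess.CompletedProcess:
    """Run cmd with cwd and home-related variables inside the sandbox directory."""
    return subprocess.run(
        cmd,
        cwd=sandbox,
        env=sandbox_env(sandbox),
        capture_output=True,
        text=True,
        check=False,  # the caller inspects returncode instead of catching exceptions
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Harmless probe: print the HOME value the child process sees.
        probe = [sys.executable, "-c", "import os; print(os.environ['HOME'])"]
        result = run_in_sandbox(probe, Path(tmp))
        print(result.returncode, result.stdout.strip())
```

The probe command is deliberately harmless; replacing it with a real install command is a decision for the user after reading the project's Quick Start.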

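### Exit strategy

- **Preserve the pre-install state**: Record the original host configuration and project state; only then can you later judge whether things are recoverable.
- **Record the install commands and write paths**: Without explicit uninstall instructions, you at least need to know which directories or configuration files require manual cleanup.
- **No rollback path, no primary environment**: Being unable to roll back is a blocker for continuing; do not proceed on trust or luck.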
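
Recording the pre-install state can be as simple as hashing every file under a directory before and after a trial install, then diffing the two snapshots to see what was written. The following is a minimal sketch of that idea; `snapshot` and `diff_snapshots` are hypothetical helper names, and the sketch assumes the directory is small enough to read fully into memory.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 digest."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            result[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def diff_snapshots(before: dict, after: dict) -> dict:
    """Report files that were added, removed, or changed between two snapshots."""
    common = before.keys() & after.keys()
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(k for k in common if before[k] != after[k]),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "config.json").write_text("{}")
        before = snapshot(root)
        # Stand-in for an install step that writes into the directory.
        (root / "config.json").write_text('{"plugin": true}')
        (root / "cache.bin").write_bytes(b"\x00")
        print(diff_snapshots(before, snapshot(root)))
```

The resulting "added" and "changed" lists give a starting point for manual cleanup when the project has no uninstall instructions.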

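## Preview only

- Explaining who the project suits and what it can do
- Demonstrating a typical conversation flow based on the project documentation
- Helping the user decide whether the project is worth installing or researching further

## Must be verified after installation

- Actually installing a Skill, plugin, or CLI
- Executing scripts, modifying local files, or accessing external services
- Verifying real output quality, performance, and compatibility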

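## Boundary and risk card

- **Mistaking the pre-install preview for a real run**: The user may overestimate how much configuration, permission, and compatibility verification the project has already completed. Mitigation: clearly separate prompt_preview_can_do from runtime_required. Claim: `clm_0005` inferred 0.45
- **Command execution modifies the local environment**: Install commands may write to the user's home directory, the host's plugin directory, or project configuration. Mitigation: run them first in an isolated environment or a test account. Evidence: `README.md` Claim: `clm_0006` supported 0.86
- **To be confirmed**: After a real install, is it compatible with the user's current host AI version? Reason: compatibility can only be verified in the actual host environment.
- **To be confirmed**: Does the project's output quality meet the user's specific task? Reason: a pre-install preview only shows the flow and its boundaries; it is no substitute for a real evaluation.
- **To be confirmed**: Do the install commands need network access, elevated permissions, or global writes? Reason: this affects installation risk in both enterprise and personal environments.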

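## Pre-work context

### Loading order

- First read how_to_use.host_ai_instruction to establish the boundaries of a pre-install decision asset.
- Read claim_graph_summary to confirm that facts come from the Claim/Evidence Graph rather than from the Human Wiki narrative.
- Then read intended_users, capabilities, and quick_start_candidates to judge whether the user is a match.
- When a concrete task needs to be carried out, check role_skill_index first, then evidence_index.
- For questions about real installation, file modification, network access, performance, or compatibility, switch to risk_card and boundaries.runtime_required.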
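
The loading order above can be sketched as an ordered list of section keys resolved against the pack. This sketch assumes the pack is available as a nested dictionary with dotted keys such as `how_to_use.host_ai_instruction`; that layout, and the helper names `lookup` and `build_context`, are assumptions made here for illustration, since the pack's actual file format is not shown.

```python
from typing import Any, List, Tuple

# Section names are taken from the loading order above.
BASE_KEYS = [
    "how_to_use.host_ai_instruction",
    "claim_graph_summary",
    "intended_users",
    "capabilities",
    "quick_start_candidates",
]
TASK_KEYS = ["role_skill_index", "evidence_index"]
RUNTIME_KEYS = ["risk_card", "boundaries.runtime_required"]


def lookup(pack: dict, dotted: str) -> Any:
    """Resolve a dotted key in a nested dict; return None if any part is missing."""
    node: Any = pack
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def build_context(pack: dict, needs_task: bool = False,
                  touches_runtime: bool = False) -> List[Tuple[str, Any]]:
    """Return the (key, value) sections to read, in the documented order."""
    keys = list(BASE_KEYS)
    if needs_task:
        keys += TASK_KEYS
    if touches_runtime:
        keys += RUNTIME_KEYS
    sections = []
    for key in keys:
        value = lookup(pack, key)
        if value is not None:  # silently skip sections the pack does not contain
            sections.append((key, value))
    return sections


if __name__ == "__main__":
    demo_pack = {
        "how_to_use": {"host_ai_instruction": "pre-install context only"},
        "capabilities": ["command-line launch or installation flow"],
        "risk_card": ["install commands may write locally"],
    }
    for key, _ in build_context(demo_pack, touches_runtime=True):
        print(key)
```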

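### Task routing

- **Command-line launch or installation flow**: First state that this capability is verified only after installation, then provide a pre-install checklist. Boundary: must be verified by a real install or run. Evidence: `README.md` Claim: `clm_0001` supported 0.86

### Context scale

- Total files: 173
- Important-file coverage: 40/173
- Evidence index entries: 51
- Role / Skill entries: 7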
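
As a quick sanity check on the scale figures, the coverage ratio 40/173 can be turned into a percentage. The helper below is a trivial sketch with a hypothetical name, `coverage_percent`; it only restates the arithmetic.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Return covered/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * covered / total, 1)


if __name__ == "__main__":
    # Figures from the context-scale list: 40 important files out of 173.
    print(coverage_percent(40, 173))  # 23.1
```

In other words, roughly three quarters of the repository's files (133 of 173) fall outside the important-file set, which is one reason questions about those files should be handled as insufficient evidence rather than answered from memory.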

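### Handling insufficient evidence

- **missing_evidence**: State that the evidence is insufficient and ask the user to provide the target file, README passage, or post-install verification record; do not fill in facts.
- **out_of_scope_request**: State that the task is beyond the evidence scope of the current AI Context Pack, and suggest that the user consult the Human Manual or verify after a real install.
- **runtime_request**: Provide the pre-install checklist and the source of the commands, but do not run commands for the user or claim to have run them.
- **source_conflict**: Show the conflicting sources side by side, mark the point as pending verification, and do not force a choice of one version.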
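
The four fallback cases above can be sketched as a small classifier over request properties. The case names come from the list; the boolean inputs, the `pick_fallback` function, and in particular the priority order among the cases are assumptions of this sketch, since the pack does not state which case wins when several apply.

```python
from typing import Optional

# Case names are from the list above; the response texts paraphrase it.
FALLBACK_ACTIONS = {
    "missing_evidence": "Say evidence is insufficient and ask for the file, README passage, or verification record.",
    "out_of_scope_request": "Say the task is outside the pack's evidence scope and point to the Human Manual or a real install.",
    "runtime_request": "Give the pre-install checklist and command source; do not run or claim to have run anything.",
    "source_conflict": "Show the conflicting sources together and mark the point as pending verification.",
}


def pick_fallback(*, in_scope: bool, sources_conflict: bool,
                  needs_runtime: bool, has_evidence: bool) -> Optional[str]:
    """Return the fallback case to apply, or None when a normal answer is possible.

    The priority order (scope, conflict, runtime, evidence) is an assumption.
    """
    if not in_scope:
        return "out_of_scope_request"
    if sources_conflict:
        return "source_conflict"
    if needs_runtime:
        return "runtime_request"
    if not has_evidence:
        return "missing_evidence"
    return None


if __name__ == "__main__":
    case = pick_fallback(in_scope=True, sources_conflict=False,
                         needs_runtime=True, has_evidence=True)
    print(case, "->", FALLBACK_ACTIONS[case])
```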

## Prompt Recipes

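### Fit assessment

- Goal: judge whether this project suits the user's current task.
- Expected output: a fit verdict, key reasons, evidence citations, what can be previewed before installation, what must be verified after installation, and suggested next steps.

```text
Based on the fastchat AI Context Pack, first ask me 3 necessary questions, then judge whether it suits my task. The answer must cover: who it suits, what it can do, what it cannot do, whether it is worth installing, and where the evidence comes from. Every project fact must cite evidence_refs, source_paths, or a claim_id.
```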

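### Pre-install experience

- Goal: let the user get a feel for the core workflow before installing, without dressing the preview up as a real capability or a marketing promise.
- Expected output: an experience script with boundary labels, a post-install verification checklist, and a cautious recommendation; no promises of real execution and no strong marketing language.

```text
Treat fastchat as a pre-install experience asset, not as an installed tool or a real runtime environment.

Output exactly four sections:
1. First ask me 3 necessary questions.
2. Give an "experience script": use the three labels [Previewable before install], [Must verify after install], and [Insufficient evidence] to show how it might guide a workflow.
3. Give a post-install verification checklist: list which capabilities can only be confirmed after a real installation, real host loading, and a real project run.
4. Give a cautious recommendation: you may only say "worth further research / a trial install", "add more information before deciding", or "not recommended to continue"; do not endorse the project.

Hard boundaries:
- Do not claim to have installed, run, tested, modified files, or produced real results.
- Do not use promise-style wording such as "auto-adapts", "guaranteed to pass", "perfect fit", or "strongly recommend installing".
- When describing post-install behavior, use a conditional such as "If installation succeeds and the host loads the Skill correctly, it might ...".
- Write the experience script only as "sample lines / a hypothetical flow": use "might ask / might suggest / might show", never "has written, has generated, has passed, is running, is generating".
- The Prompt Preview does not hand out install commands; if the user is ready to trial-install, only advise reading the Quick Start and Risk Card first and verifying in an isolated environment.
- Every project fact must come from a supported claim, evidence_refs, or source_paths; inferred/unverified items may appear only as risks or open questions.
```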
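
A host-side check can flag promise-style wording in a generated preview before it reaches the user. The sketch below is a plain substring scan, not a reliable classifier; `BANNED_PHRASES` and `find_violations` are hypothetical names, and the phrase list is illustrative and should be adjusted to the language the host AI actually answers in.

```python
# Illustrative phrase list modeled on the hard boundaries in the prompt above.
BANNED_PHRASES = [
    "auto-adapts",
    "guaranteed to pass",
    "perfect fit",
    "strongly recommend installing",
]


def find_violations(text: str) -> list:
    """Return the banned phrases found in text (case-insensitive), in list order."""
    lowered = text.lower()
    return [phrase for phrase in BANNED_PHRASES if phrase in lowered]


if __name__ == "__main__":
    draft = "If installation succeeds, it might suggest a config. It is a perfect fit."
    print(find_violations(draft))
```

A substring scan will miss paraphrases and may flag quoted examples, so it works best as a reminder to re-read the draft rather than as an automatic gate.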

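### Role / Skill selection

- Goal: pick the best-matching assets from the project's roles or Skills.
- Expected output: a list of candidate roles or Skills, each with its applicable scenario, evidence path, risk boundary, and whether post-install verification is required.

```text
Read role_skill_index and recommend the 3-5 roles or Skills most relevant to my target task. For each recommendation, state the applicable scenario, likely output, risk boundary, and evidence_refs.
```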

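### Risk pre-check

- Goal: identify environment, permission, rule-conflict, and quality risks before installing or adopting the project.
- Expected output: a checklist covering environment, permissions, dependencies, licensing, host conflicts, quality risks, and unknowns.

```text
Based on risk_card, boundaries, and quick_start_candidates, give me a pre-install risk checklist. Do not run commands for me; only explain what I should check, why, and what the impact of a failure would be.
```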

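### Host AI kickoff instruction

- Goal: turn the project context into an instruction for the host AI at the start of a conversation.
- Expected output: a pre-work instruction with clear boundaries and clear evidence citations, suitable for pasting to the host AI.

```text
Based on the fastchat AI Context Pack, generate a pre-work instruction that I can paste to the host AI. The instruction must respect not_runtime=true and must not claim that the project has been installed, run, or produced real results.
```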

## Role / Skill index

- 7 role / Skill / project-document entries are indexed in total.

- **FastChat** (project_doc): FastChat Demo https://lmarena.ai/ Discord https://discord.gg/6GXcFg3TH8 X https://x.com/lmsysorg Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `README.md`
- **LLM Judge** (project_doc): LLM Judge Paper https://arxiv.org/abs/2306.05685 Leaderboard https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `fastchat/llm_judge/README.md`
- **fastchat Nginx Gateway** (project_doc): The Nginx gateway serves the following purposes: Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `fastchat/serve/gateway/README.md`
- **Download dataset** (project_doc): Download dataset We have pre-generated several category classifier benchmarks and ground truths. You can download them with git-lfs https://git-lfs.com installed to the directory classify/ by running Your label bench directory should follow the structure: Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `fastchat/serve/monitor/classify/README.md`
- **Instructions** (project_doc): First run analyze data.py to collect metadata of all votes. Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `fastchat/serve/monitor/vote_time_stats/README.md`
- **Machine Learning with Embeddings** (project_doc): Machine Learning with Embeddings You can use embeddings to - Evaluate text similarity, see test sentence similarity.py test sentence similarity.py - Build your own classifier, see test classification.py test classification.py - Search relative texts, see test semantic search.py test semantic search.py Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `playground/test_embedding/README.md`
- **filter words** (project_doc): export BASE=clean conv 20230809 100k pii export SCALE=10 Activation hint: consult when the user needs to understand the project structure, installation, or boundaries. Evidence: `fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/instructions.md`

## Evidence index

- 51 evidence entries are indexed in total.

- **FastChat** (documentation): FastChat Demo https://lmarena.ai/ Discord https://discord.gg/6GXcFg3TH8 X https://x.com/lmsysorg Evidence: `README.md`
- **LLM Judge** (documentation): LLM Judge Paper https://arxiv.org/abs/2306.05685 Leaderboard https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard Evidence: `fastchat/llm_judge/README.md`
- **fastchat Nginx Gateway** (documentation): The Nginx gateway serves the following purposes: Evidence: `fastchat/serve/gateway/README.md`
- **Download dataset** (documentation): Download dataset We have pre-generated several category classifier benchmarks and ground truths. You can download them with git-lfs https://git-lfs.com installed to the directory classify/ by running Your label bench directory should follow the structure: Evidence: `fastchat/serve/monitor/classify/README.md`
- **Instructions** (documentation): First run analyze data.py to collect metadata of all votes. Evidence: `fastchat/serve/monitor/vote_time_stats/README.md`
- **Machine Learning with Embeddings** (documentation): Machine Learning with Embeddings You can use embeddings to - Evaluate text similarity, see test sentence similarity.py test sentence similarity.py - Build your own classifier, see test classification.py test classification.py - Search relative texts, see test semantic search.py test semantic search.py Evidence: `playground/test_embedding/README.md`
- **License** (source_file): Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ Evidence: `LICENSE`
- **The names of two roles** (source_file): class SeparatorStyle IntEnum ⋮---- ADD COLON SINGLE = auto ADD COLON TWO = auto ADD COLON SPACE SINGLE = auto NO COLON SINGLE = auto NO COLON TWO = auto ADD NEW LINE SINGLE = auto LLAMA2 = auto LLAMA3 = auto CHATGLM = auto CHATML = auto CHATINTERN = auto DOLLY = auto RWKV = auto PHOENIX = auto ROBIN = auto FALCON CHAT = auto CHATGLM3 = auto DEEPSEEK CHAT = auto METAMATH = auto YUAN2 = auto GEMMA = auto CLLM = auto DEFAULT = auto IMAGE PLACEHOLDER STR = "$$ $$" ⋮---- @dataclasses.dataclass class Conversation ⋮---- name: str system template: str = "{system message}" system message: str = "" system message vision: str = "" The names of two roles roles: Tuple str = "USER", "ASSISTANT" messages:… Evidence: `fastchat/conversation.py`
- **Split Long Conversation** (source_file): def make sample sample, start idx, end idx tokenizer = max length = None def split one sample sample ⋮---- tokenized lens = conversations = sample "conversations" conversations = conversations : len conversations // 2 2 ⋮---- length = len tokenizer c "value" .input ids + 6 ⋮---- start idx = 0 cur len = 0 ⋮---- new samples = ⋮---- tmp len = tokenized lens i + tokenized lens i + 1 ⋮---- start idx = i ⋮---- def worker input data ⋮---- result = ⋮---- def split all content, begin, end, tokenizer , max length ⋮---- tokenizer = tokenizer max length = max length content = content begin:end new content = chunks = content i : i + 1000 for i in range 0, len content , 1000 ⋮---- def filter invalid role… Evidence: `fastchat/data/split_long_conversation.py`
- **check ref answers** (source_file): API MAX RETRY = 16 API RETRY SLEEP = 10 API ERROR OUTPUT = "$ERROR$" TIE DELTA = 0.1 NEED REF CATS = "math", "reasoning", "coding", "arena-hard-200" two score pattern = re.compile "\ \ \d+\.?\d ,\s? \d+\.?\d \ \ " two score pattern backup = re.compile "\ \d+\.?\d ,\s? \d+\.?\d \ " one score pattern = re.compile "\ \ \d+\.?\d \ \ " one score pattern backup = re.compile "\ \d+\.?\d \ " temperature config = { reverse model map = { ⋮---- @dataclasses.dataclass class Judge ⋮---- model name: str prompt template: dict ref based: bool = False multi turn: bool = False ⋮---- @dataclasses.dataclass class MatchSingle ⋮---- question: dict model: str answer: dict judge: Judge ref answer: dict = None ⋮---… Evidence: `fastchat/llm_judge/common.py`
- **Gen Api Answer** (source_file): temperature = args.force temperature ⋮---- temperature = question "required temperature" ⋮---- temperature = temperature config question "category" ⋮---- temperature = 0.7 choices = chat state = None ⋮---- conv = get conversation template model turns = ⋮---- output = chat completion anthropic model, conv, temperature, max tokens ⋮---- output = chat completion openai model, conv, temperature, max tokens ⋮---- ans = { ⋮---- parser = argparse.ArgumentParser ⋮---- args = parser.parse args ⋮---- question file = f"data/{args.bench name}/question.jsonl" questions = load questions question file, args.question begin, args.question end ⋮---- answer file = args.answer file ⋮---- answer file = f"data/{… Evidence: `fastchat/llm_judge/gen_api_answer.py`
- **Gen Judgment** (source_file): matches = ⋮---- q id = q "question id" m 1 = models i m 2 = baseline model ⋮---- a 1 = model answers m 1 q id a 2 = model answers baseline model q id ⋮---- ref = ref answers judge.model name q id match = MatchPair ⋮---- m 2 = models j ⋮---- a 2 = model answers m 2 q id ⋮---- m = models i a = model answers m q id ⋮---- def make judge pairwise judge model, judge prompts ⋮---- judges = {} ⋮---- def make judge single judge model, judge prompts ⋮---- parser = argparse.ArgumentParser ⋮---- args = parser.parse args question file = f"data/{args.bench name}/question.jsonl" answer dir = f"data/{args.bench name}/model answer" ref answer dir = f"data/{args.bench name}/reference answer" questions = load… Evidence: `fastchat/llm_judge/gen_judgment.py`
- **Gen Model Answer** (source_file): questions = load questions question file, question begin, question end ⋮---- use ray = num gpus total // num gpus per model 1 ⋮---- get answers func = ray.remote num gpus=num gpus per model ⋮---- get answers func = get model answers chunk size = len questions // num gpus total // num gpus per model ans handles = ⋮---- temperature = temperature config question "category" ⋮---- temperature = 0.7 choices = ⋮---- conv = get conversation template model id turns = ⋮---- qs = question "turns" j ⋮---- prompt = conv.get prompt input ids = tokenizer prompt .input ids ⋮---- do sample = False ⋮---- do sample = True ⋮---- output ids = model.generate ⋮---- output ids = output ids 0 ⋮---- output ids = out… Evidence: `fastchat/llm_judge/gen_model_answer.py`
- **Build question selector map** (source_file): questions = model answers = {} model judgments normal single = {} model judgments math single = {} model judgments normal pairwise = {} model judgments math pairwise = {} question selector map = {} category selector map = defaultdict list def display question category selector, request: gr.Request ⋮---- choices = category selector map category selector ⋮---- q = question selector map question selector qid = q "question id" ans1 = model answers model selector1 qid ans2 = model answers model selector2 qid chat mds = pairwise to gradio chat mds q, ans1, ans2 gamekey = qid, model selector1, model selector2 judgment dict = resolve pairwise judgment dict explanation = judgment dict turn2 = resolv… Evidence: `fastchat/llm_judge/qa_browser.py`
- **Show Result** (source_file): def display result single args ⋮---- input file = ⋮---- input file = args.input file ⋮---- df all = pd.read json input file, lines=True df = df all "model", "score", "turn" df = df df "score" != -1 ⋮---- df = df df "model" .isin args.model list ⋮---- df 1 = df df "turn" == 1 .groupby "model", "turn" .mean ⋮---- df 2 = df df "turn" == 2 .groupby "model", "turn" .mean ⋮---- df 3 = df "model", "score" .groupby "model" .mean ⋮---- def display result pairwise args ⋮---- df all = df all df all "g1 winner" != "error" & df all "g2 winner" != "error" model list = model list = list set model list list res = ⋮---- winner = row "model 1" loser = row "model 2" ⋮---- winner = row "model 2" loser = row "m… Evidence: `fastchat/llm_judge/show_result.py`
- **Openai Api Protocol** (source_file): class ErrorResponse BaseModel ⋮---- object: str = "error" message: str code: int class ModelPermission BaseModel ⋮---- id: str = Field default factory=lambda: f"modelperm-{shortuuid.random }" object: str = "model permission" created: int = Field default factory=lambda: int time.time allow create engine: bool = False allow sampling: bool = True allow logprobs: bool = True allow search indices: bool = True allow view: bool = True allow fine tuning: bool = False organization: str = " " group: Optional str = None is blocking: str = False class ModelCard BaseModel ⋮---- id: str object: str = "model" ⋮---- owned by: str = "fastchat" root: Optional str = None parent: Optional str = None permission… Evidence: `fastchat/protocol/openai_api_protocol.py`
- **AI2 uses vLLM, which requires that top p be 1.0 for greedy sampling:** (source_file): logger = build logger "gradio web server", "gradio web server.log" ⋮---- prompt = conv.to openai vision api messages ⋮---- prompt = conv.to openai api messages stream iter = openai api stream iter ⋮---- last prompt = conv.messages -2 1 stream iter = openai assistant api stream iter ⋮---- prompt = conv.to anthropic vision api messages ⋮---- stream iter = anthropic api stream iter ⋮---- stream iter = anthropic message api stream iter ⋮---- prompt = conv.to gemini api messages stream iter = gemini api stream iter ⋮---- prompt = conv.to openai vision api messages is mistral=True ⋮---- stream iter = mistral api stream iter ⋮---- stream iter = nvidia api stream iter ⋮---- stream iter = ai2 api st… Evidence: `fastchat/serve/api_provider.py`
- **Base Model Worker** (source_file): worker = None logger = None app = FastAPI def heart beat worker obj class BaseModelWorker ⋮---- model path = model path :-1 ⋮---- logger = build logger "model worker", f"model worker {self.worker id}.log" ⋮---- worker = self ⋮---- conv = get conv template conv template ⋮---- conv = get conversation template model path ⋮---- def init heart beat self def register to controller self ⋮---- url = self.controller addr + "/register worker" data = { r = requests.post url, json=data ⋮---- def send heart beat self ⋮---- url = self.controller addr + "/receive heart beat" ⋮---- ret = requests.post exist = ret.json "exist" ⋮---- def get queue length self ⋮---- sempahore value = waiter count = ⋮---- def… Evidence: `fastchat/serve/base_model_worker.py`
- **TODO suquark : multiline input has some issues. fix it later.** (source_file): class SimpleChatIO ChatIO ⋮---- def init self, multiline: bool = False def prompt for input self, role - str ⋮---- prompt data = line = input f"{role} ctrl-d/z on empty line to end : " ⋮---- line = input ⋮---- def prompt for output self, role: str def stream output self, output stream ⋮---- pre = 0 ⋮---- output text = outputs "text" output text = output text.strip .split " " now = len output text - 1 ⋮---- pre = now ⋮---- def print output self, text: str class RichChatIO ChatIO ⋮---- bindings = KeyBindings ⋮---- @bindings.add "escape", "enter" def event def init self, multiline: bool = False, mouse: bool = False ⋮---- TODO suquark : multiline input has some issues. fix it later. prompt inpu… Evidence: `fastchat/serve/cli.py`
- **Check status before returning** (source_file): logger = build logger "controller", "controller.log" class DispatchMethod Enum ⋮---- LOTTERY = auto SHORTEST QUEUE = auto ⋮---- @classmethod def from str cls, name ⋮---- @dataclasses.dataclass class WorkerInfo ⋮---- model names: List str speed: int queue length: int check heart beat: bool last heart beat: str multimodal: bool def heart beat controller controller class Controller ⋮---- def init self, dispatch method: str ⋮---- worker status = self.get worker status worker name ⋮---- def get worker status self, worker name: str ⋮---- r = requests.post worker name + "/worker get status", timeout=5 ⋮---- def remove worker self, worker name: str def refresh all workers self ⋮---- old info = dict… Evidence: `fastchat/serve/controller.py`
- **check if model path is existed at local path** (source_file): app = FastAPI def download model model id, revision ⋮---- source = "huggingface" ⋮---- source = "modelscope" ⋮---- model dir = snapshot download model id, revision=revision ⋮---- model dir = snapshot download repo id=model id ⋮---- class DashInferWorker BaseModelWorker ⋮---- check if model path is existed at local path ⋮---- model path = download model model path, revision engine helper = EngineHelper config ⋮---- async def generate stream self, params ⋮---- context = params.pop "prompt" temperature = params.get "temperature" top k = params.get "top k" top p = params.get "top p" repetition penalty = params.get "repetition penalty" presence penalty = params.get "presence penalty" max new tok… Evidence: `fastchat/serve/dashinfer_worker.py`
- **Add models from the controller** (source_file): logger = build logger "gradio web server", "gradio web server.log" headers = {"User-Agent": "FastChat Client"} no change btn = gr.Button enable btn = gr.Button interactive=True, visible=True disable btn = gr.Button interactive=False invisible btn = gr.Button interactive=False, visible=False enable text = gr.Textbox disable text = gr.Textbox controller url = None enable moderation = False use remote storage = False acknowledgment md = """ api endpoint info = {} class State ⋮---- def init self, model name, is vision=False def update ans models self, ans: str - None def update router outputs self, outputs: Dict str, float - None def init system prompt self, conv, is vision ⋮---- system prompt… Evidence: `fastchat/serve/gradio_web_server.py`
- **Huggingface Api** (source_file): @torch.inference mode def main args ⋮---- msg = args.message conv = get conversation template args.model path ⋮---- prompt = conv.get prompt inputs = tokenizer prompt , return tensors="pt" .to args.device output ids = model.generate ⋮---- output ids = output ids 0 ⋮---- output ids = output ids 0 len inputs "input ids" 0 : outputs = tokenizer.decode ⋮---- parser = argparse.ArgumentParser ⋮---- args = parser.parse args Evidence: `fastchat/serve/huggingface_api.py`
- **make sampling params in vllm** (source_file): app = FastAPI g id gen = ReqIDGenerator class LightLLMWorker BaseModelWorker ⋮---- async def generate stream self, params ⋮---- prompt = params.pop "prompt" request id = params.pop "request id" temperature = float params.get "temperature", 1.0 top p = float params.get "top p", 1.0 top k = params.get "top k", -1.0 presence penalty = float params.get "presence penalty", 0.0 frequency penalty = float params.get "frequency penalty", 0.0 repetition penalty = float params.get "repetition penalty", 1.0 max new tokens = params.get "max new tokens", 256 echo = params.get "echo", True stop str = params.get "stop", None stop token ids = params.get "stop token ids", None or ⋮---- request = params.get "… Evidence: `fastchat/serve/lightllm_worker.py`
- **self.context len = get context length** (source_file): app = FastAPI class MLXWorker BaseModelWorker ⋮---- self.context len = get context length llm engine.engine.model config.hf config self.context len = 2048 hard code for now -- not sure how to get in MLX ⋮---- async def generate stream self, params ⋮---- context = params.pop "prompt" request id = params.pop "request id" temperature = float params.get "temperature", 1.0 top p = float params.get "top p", 1.0 top k = params.get "top k", -1.0 presence penalty = float params.get "presence penalty", 0.0 frequency penalty = float params.get "frequency penalty", 0.0 max new tokens = params.get "max new tokens", 256 stop str = params.get "stop", None stop token ids = params.get "stop token ids", None… Evidence: `fastchat/serve/mlx_worker.py`
- **Model Worker** (source_file): worker id = str uuid.uuid4 :8 logger = build logger "model worker", f"model worker {worker id}.log" class ModelWorker BaseModelWorker ⋮---- def generate stream gate self, params ⋮---- ret = { ⋮---- def generate gate self, params def process embed chunk self, input ids, attention mask, model type dict ⋮---- model output = self.model input ids ⋮---- data = model output.last hidden state ⋮---- data = model output 0 ⋮---- model output = self.model input ids, decoder input ids=input ids data = model output.encoder last hidden state ⋮---- model output = self.model input ids, output hidden states=True ⋮---- data = model output.hidden states -1 .transpose 0, 1 ⋮---- data = model output.hidden state… Evidence: `fastchat/serve/model_worker.py`
- **Multi Model Worker** (source_file): workers = worker map = {} app = FastAPI def release worker semaphore def acquire worker semaphore ⋮---- semaphore = asyncio.Semaphore workers 0 .limit worker concurrency ⋮---- def create background tasks ⋮---- background tasks = BackgroundTasks ⋮---- @app.post "/worker generate stream" async def api generate stream request: Request ⋮---- params = await request.json ⋮---- worker = worker map params "model" generator = worker.generate stream gate params background tasks = create background tasks ⋮---- @app.post "/worker generate" async def api generate request: Request ⋮---- output = worker.generate gate params ⋮---- @app.post "/worker get embeddings" async def api get embeddings request: Req… Evidence: `fastchat/serve/multi_model_worker.py`
- **The address of the model controller.** (source_file): logger = build logger "openai api server", "openai api server.log" conv template map = {} fetch timeout = aiohttp.ClientTimeout total=3 3600 async def fetch remote url, pload=None, name=None ⋮---- chunks = ⋮---- ret = { ⋮---- output = b"".join chunks ⋮---- res = json.loads output ⋮---- res = res name ⋮---- class AppSettings BaseSettings ⋮---- The address of the model controller. controller address: str = "http://localhost:21001" api keys: Optional List str = None app settings = AppSettings app = fastapi.FastAPI headers = {"User-Agent": "FastChat API Server"} get bearer token = HTTPBearer auto error=False ⋮---- def create error response code: int, message: str - JSONResponse ⋮---- @app.excep… Evidence: `fastchat/serve/openai_api_server.py`
- **make sampling params for sgl.gen** (source_file): app = FastAPI ⋮---- @sgl.function def pipeline s, prompt, max tokens class SGLWorker BaseModelWorker ⋮---- async def generate stream self, params ⋮---- prompt = params.pop "prompt" images = params.get "images", temperature = float params.get "temperature", 1.0 top p = float params.get "top p", 1.0 top k = params.get "top k", -1.0 frequency penalty = float params.get "frequency penalty", 0.0 presence penalty = float params.get "presence penalty", 0.0 max new tokens = params.get "max new tokens", 256 stop str = params.get "stop", None stop token ids = params.get "stop token ids", None or echo = params.get "echo", True stop = ⋮---- s = self.tokenizer.decode tid ⋮---- make sampling params for s… Evidence: `fastchat/serve/sglang_worker.py`
- **This is to support vllm = 0.2.7 where TokenizerGroup was introduced** (source_file): app = FastAPI class VLLMWorker BaseModelWorker ⋮---- This is to support vllm = 0.2.7 where TokenizerGroup was introduced and llm engine.engine.tokenizer was no longer a raw tokenizer ⋮---- async def generate stream self, params ⋮---- context = params.pop "prompt" request id = params.pop "request id" temperature = float params.get "temperature", 1.0 top p = float params.get "top p", 1.0 top k = params.get "top k", -1.0 presence penalty = float params.get "presence penalty", 0.0 frequency penalty = float params.get "frequency penalty", 0.0 max new tokens = params.get "max new tokens", 256 stop str = params.get "stop", None stop token ids = params.get "stop token ids", None or ⋮---- echo = par… Evidence: `fastchat/serve/vllm_worker.py`
- **"-2" is hardcoded for the Llama tokenizer to make the offset correct.** (source_file): IGNORE TOKEN ID = LabelSmoother.ignore index ⋮---- @dataclass class ModelArguments ⋮---- model name or path: Optional str = field default="facebook/opt-125m" trust remote code: bool = field padding side: str = field ⋮---- @dataclass class DataArguments ⋮---- data path: str = field eval data path: str = field lazy preprocess: bool = False ⋮---- @dataclass class TrainingArguments transformers.TrainingArguments ⋮---- cache dir: Optional str = field default=None optim: str = field default="adamw torch" model max length: int = field local rank = None def rank0 print args def trainer save model safe trainer: transformers.Trainer ⋮---- save policy = FullStateDictConfig offload to cpu=True, rank0 o… Evidence: `fastchat/train/train.py`
- **Train Lora** (source_file): @dataclass class TrainingArguments transformers.TrainingArguments ⋮---- cache dir: typing.Optional str = field default=None optim: str = field default="adamw torch" model max length: int = field flash attn: bool = False ⋮---- @dataclass class LoraArguments ⋮---- lora r: int = 8 lora alpha: int = 16 lora dropout: float = 0.05 lora target modules: typing.List str = field lora weight path: str = "" lora bias: str = "none" q lora: bool = False def maybe zero 3 param ⋮---- param = param.data.detach .cpu .clone ⋮---- param = param.detach .cpu .clone ⋮---- def get peft state maybe zero 3 named params, bias ⋮---- to return = {k: t for k, t in named params if "lora " in k} ⋮---- to return = {k: t fo… Evidence: `fastchat/train/train_lora.py`
- **Benchmark Api Provider** (source_file): class Metrics ⋮---- def init self def to dict self def sample image and question random questions dict, index ⋮---- message = random questions dict index question = message "question" path = message "path" ⋮---- question = question 0 ⋮---- prev message = "" prev time = time.time CHARACTERS PER TOKEN = 4 metrics = Metrics stream iter = get api provider stream iter call time = time.time token times = ⋮---- output = data "text" .strip ⋮---- prev message = output ⋮---- token diff length = len output - len prev message / CHARACTERS PER TOKEN ⋮---- token diff time = time.time - prev time token time = token diff time / token diff length ⋮---- def run benchmark model name, model api dict, random qu… Evidence: `playground/benchmark/benchmark_api_provider.py`
- **filter words** (documentation): export BASE=clean conv 20230809 100k pii export SCALE=10 Evidence: `fastchat/serve/monitor/dataset_release_scripts/lmsys_chat_1m/instructions.md`
- **Deepspeed Config S2** (structured_config): { "zero optimization": { "stage": 2, "offload optimizer": { "device": "cpu" }, "contiguous gradients": true, "overlap comm": true }, "fp16": { "enabled": "auto" }, "train micro batch size per gpu": "auto", "gradient accumulation steps": "auto" } Evidence: `playground/deepspeed_config_s2.json`
- **Deepspeed Config S3** (structured_config): { "fp16": { "enabled": "auto", "loss scale": 0, "loss scale window": 1000, "initial scale power": 16, "hysteresis": 2, "min loss scale": 1 }, "zero optimization": { "stage": 3, "offload optimizer": { "device": "cpu", "pin memory": true }, "offload param": { "device": "cpu", "pin memory": true }, "overlap comm": true, "contiguous gradients": true, "stage3 max live parameters" : 1e9, "stage3 max reuse distance" : 1e9, "stage3 prefetch bucket size" : 5e8, "stage3 param persistence threshold" : 1e6, "sub group size" : 1e12, "stage3 gather 16bit weights on model save": true }, "train batch size": "auto", "train micro batch size per gpu": "auto", "gradient accumulation steps": "auto" } Evidence: `playground/deepspeed_config_s3.json`
- **Python** (source_file): Python pycache .pyc .egg-info dist .venv Evidence: `.gitignore`
- **This Pylint rcfile contains a best-effort configuration to uphold the** (source_file): This Pylint rcfile contains a best-effort configuration to uphold the best-practices and style described in the Google Python style guide: https://google.github.io/styleguide/pyguide.html Its canonical open-source location is: https://google.github.io/styleguide/pylintrc Evidence: `.pylintrc`
- **Dockerfile** (source_file): FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04 Evidence: `docker/Dockerfile`
- **Docker Compose** (source_file): version: "3.9" services: fastchat-controller: build: context: . dockerfile: Dockerfile image: fastchat:latest ports: - "21001:21001" entrypoint: "python3.9", "-m", "fastchat.serve.controller", "--host", "0.0.0.0", "--port", "21001" fastchat-model-worker: build: context: . dockerfile: Dockerfile volumes: - huggingface:/root/.cache/huggingface image: fastchat:latest deploy: resources: reservations: devices: - driver: nvidia count: 1 capabilities: gpu entrypoint: "python3.9", "-m", "fastchat.serve.model worker", "--model-names", "${FASTCHAT WORKER MODEL NAMES:-vicuna-7b-v1.5}", "--model-path", "${FASTCHAT WORKER MODEL PATH:-lmsys/vicuna-7b-v1.5}", "--worker-address", "http://fastchat-model-wor… Evidence: `docker/docker-compose.yml`
- **Init** (source_file): version = "0.2.36" Evidence: `fastchat/__init__.py`
- **Constants** (source_file): REPO_PATH = os.path.dirname(os.path.dirname(__file__)) COLOR = " F11414" SURVEY_LINK = f""" SERVER_ERROR_MSG = TEXT_MODERATION_MSG = IMAGE_MODERATION_MSG = MODERATION_MSG = "$MODERATION$ YOUR INPUT VIOLATES OUR CONTENT MODERATION GUIDELINES." CONVERSATION_LIMIT_MSG = "YOU HAVE REACHED THE CONVERSATION LENGTH LIMIT. PLEASE CLEAR HISTORY AND START A NEW CONVERSATION." INACTIVE_MSG = "THIS SESSION HAS BEEN INACTIVE FOR TOO LONG. PLEASE REFRESH THIS PAGE." SLOW_MODEL_MSG = RATE_LIMIT_MSG = " RATE LIMIT OF THIS MODEL IS REACHED. PLEASE COME BACK LATER OR USE BATTLE MODE https://lmarena.ai the 1st tab . " INPUT_CHAR_LEN_LIMIT = int(os.getenv("FASTCHAT_INPUT_CHAR_LEN_LIMIT", 12000)) BLIND_MODE_INPUT_CHAR… Evidence: `fastchat/constants.py`
- **From the io.TextIOWrapper docs:** (source_file): handler = None visited loggers = set def build logger logger name, logger filename ⋮---- formatter = logging.Formatter ⋮---- stdout logger = logging.getLogger "stdout" ⋮---- sl = StreamToLogger stdout logger, logging.INFO ⋮---- stderr logger = logging.getLogger "stderr" ⋮---- sl = StreamToLogger stderr logger, logging.ERROR ⋮---- logger = logging.getLogger logger name ⋮---- filename = os.path.join LOGDIR, logger filename handler = logging.handlers.TimedRotatingFileHandler ⋮---- class StreamToLogger object ⋮---- def init self, logger, log level=logging.INFO def getattr self, attr def write self, buf ⋮---- temp linebuf = self.linebuf + buf ⋮---- From the io.TextIOWrapper docs: On output, if n… Evidence: `fastchat/utils.py`
- **Format** (source_file): set -eo pipefail builtin cd "$ dirname "${BASH SOURCE:-$0}" " ROOT="$ git rev-parse --show-toplevel " builtin cd "$ROOT" exit 1 BLACK VERSION=$ black --version head -n 1 awk '{print $2}' PYLINT VERSION=$ pylint --version head -n 1 awk '{print $2}' tool version check { if $2 != $3 ; then echo "Wrong $1 version installed: $3 is required, not $2." exit 1 fi } tool version check "black" $BLACK VERSION "23.3.0" tool version check "pylint" $PYLINT VERSION "2.8.2" format changed { MERGEBASE="$ git merge-base origin/main HEAD " if ! git diff --diff-filter=ACM --quiet --exit-code "$MERGEBASE" -- ' .py' ' .pyi' & /dev/null; then git diff --name-only --diff-filter=ACM "$MERGEBASE" -- ' .py' ' .pyi' xa… Evidence: `format.sh`
- **Fastchat Api Googlecolab** (source_file): { "cells": { "cell type": "markdown", "metadata": { "id": "1UDur96B5C7T" }, "source": " FastChat API using Google Colab\n", "\n", " ggcr https://github.com/ggcr " }, { "cell type": "code", "execution count": null, "metadata": { "id": "NQWpzwse8PrC" }, "outputs": , "source": "%cd /content/\n", "\n", " clone FastChat\n", "!git clone https://github.com/lm-sys/FastChat.git\n", "\n", " install dependencies\n", "%cd FastChat\n", "!python3 -m pip install -e \". model worker,webui \" --quiet" }, { "cell type": "markdown", "metadata": { "id": "97181RzwSjha" }, "source": "See openai api.md https://github.com/lm-sys/FastChat/blob/main/docs/openai api.md from FastChat docs.\n", "\n", "Because in Google… Evidence: `playground/FastChat_API_GoogleColab.ipynb`
- **Pyproject** (source_file): [build-system] requires = ["setuptools>=61.0"] build-backend = "setuptools.build_meta" Evidence: `pyproject.toml`
- **Build Api** (source_file): PROJECT DIR="$ pwd " CONDA ENV NAME="fastchat" MODEL PATH="HuggingFaceH4/zephyr-7b-beta" MODEL PATH="lmsys/vicuna-7b-v1.5" API HOST="0.0.0.0" API PORT NUMBER=8000 check and create screen { local SCREENNAME="$1" if screen -list grep -q "$SCREENNAME"; then echo "Screen session '$SCREENNAME' exists. Doing nothing." else echo "Screen session '$SCREENNAME' not found. Creating..." screen -d -m -S "$SCREENNAME" echo "created!" fi } send cmd { local SCREENNAME="$1" local CMD="$2" screen -DRRS $SCREENNAME -X stuff '$2 \r' } SCREENNAMES= "controller" "api" "worker-d0" "worker-d1" for screen in "${SCREENNAMES @ }"; do check and create screen "$screen" sleep 0.1 screen -DRRS "$screen" -X stuff "conda d… Evidence: `scripts/build-api.sh`
- **Train Lora** (source_file): deepspeed fastchat/train/train_lora.py \ --model_name_or_path lmsys/vicuna-7b-v1.5 \ --lora_r 8 \ --lora_alpha 16 \ --lora_dropout 0.05 \ --data_path $DATA_PATH \ --output_dir ./checkpoints \ --num_train_epochs 150 \ --fp16 True \ --per_device_train_batch_size 2 \ --per_device_eval_batch_size 2 \ --gradient_accumulation_steps 1 \ --evaluation_strategy "steps" \ --eval_steps 100 \ --save_strategy "steps" \ --save_steps 200 \ --save_total_limit 2 \ --learning_rate 2e-5 \ --weight_decay 0. \ --warmup_ratio 0.03 \ --lr_scheduler_type "cosine" \ --logging_strategy "steps" \ --logging_steps 1 \ --tf32 True \ --model_max_length 2048 \ --q_lora False \ --deepspeed $PATH_TO_DEEPSPEED_CONFIG \ --grad… Evidence: `scripts/train_lora.sh`
- **Train Vicuna 13B** (source_file): torchrun --nproc_per_node=8 --master_port=20001 fastchat/train/train_mem.py \ --model_name_or_path ~/model_weights/llama-13b \ --data_path ~/datasets/sharegpt_20230422_clean_lang_split_identity.json \ --bf16 True \ --output_dir output_vicuna_13b \ --num_train_epochs 3 \ --per_device_train_batch_size 4 \ --per_device_eval_batch_size 32 \ --gradient_accumulation_steps 4 \ --evaluation_strategy "steps" \ --eval_steps 1500 \ --save_strategy "steps" \ --save_steps 1500 \ --save_total_limit 8 \ --learning_rate 2e-5 \ --weight_decay 0. \ --warmup_ratio 0.04 \ --lr_scheduler_type "cosine" \ --logging_steps 1 \ --fsdp "full_shard auto_wrap offload" \ --fsdp_transformer_layer_cls_to_wrap 'LlamaDecode… Evidence: `scripts/train_vicuna_13b.sh`
- **Train Vicuna 7B** (source_file): torchrun --nproc_per_node=4 --master_port=20001 fastchat/train/train_mem.py \ --model_name_or_path ~/model_weights/llama-7b \ --data_path ~/datasets/sharegpt_20230422_clean_lang_split_identity.json \ --bf16 True \ --output_dir output_vicuna_7b \ --num_train_epochs 3 \ --per_device_train_batch_size 2 \ --per_device_eval_batch_size 16 \ --gradient_accumulation_steps 16 \ --evaluation_strategy "steps" \ --eval_steps 1500 \ --save_strategy "steps" \ --save_steps 1500 \ --save_total_limit 8 \ --learning_rate 2e-5 \ --weight_decay 0. \ --warmup_ratio 0.04 \ --lr_scheduler_type "cosine" \ --logging_steps 1 \ --fsdp "full_shard auto_wrap" \ --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \… Evidence: `scripts/train_vicuna_7b.sh`
- **Upload Pypi** (source_file): rm -rf dist python3 -m build python3 -m twine upload dist/ Evidence: `scripts/upload_pypi.sh`

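## Rules the Host AI Must Follow

- **Treat this asset as pre-work context, not as a runtime environment.** The AI Context Pack contains only evidence-backed understanding of the project; it does not contain any executable state of the target project. Evidence: `README.md`, `fastchat/llm_judge/README.md`, `fastchat/serve/gateway/README.md`
- **When answering users, separate what can be previewed from what can only be verified after installation.** The value of a pre-install experience comes from reducing mistaken installs and misjudgments, not from posing as a real run. Evidence: `README.md`, `fastchat/llm_judge/README.md`, `fastchat/serve/gateway/README.md`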

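## Questions the User Should Answer Before Starting

- Which host AI or local environment do you plan to use it in?
- Do you only want to try the workflow first, or do you intend to install it for real?
- What matters most to you: installation cost, output quality, or conflicts with your existing rules?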

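## Acceptance Criteria

- Every capability claim can be traced back to a file path in evidence_refs.
- AI_CONTEXT_PACK.md does not present a preview as a real run.
- A user can understand within 3 minutes who the project suits, what it can do, how to start, and where the risk boundaries are.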
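
The first criterion (every capability claim traces back to a path in evidence_refs) can be checked mechanically. The following is a minimal sketch, assuming claims are available as dictionaries with `claim_id` and `evidence_refs` keys; that record shape, the function name, and the `clm_hypothetical` entry are illustrative assumptions, not the pack's actual schema.

```python
from typing import Iterable, List, Mapping


def claims_missing_evidence(claims: Iterable[Mapping]) -> List[str]:
    """Return the claim_ids whose evidence_refs are absent or blank."""
    missing = []
    for claim in claims:
        refs = claim.get("evidence_refs") or []
        # A claim passes only if at least one reference is a non-blank path.
        if not any(str(ref).strip() for ref in refs):
            missing.append(claim["claim_id"])
    return missing


# Hypothetical records: the first mirrors a claim cited earlier in this pack,
# the second is made up to show a failing case.
claims = [
    {"claim_id": "clm_0001", "evidence_refs": ["README.md"]},
    {"claim_id": "clm_hypothetical", "evidence_refs": []},
]
print(claims_missing_evidence(claims))
```

An empty result would indicate that the first acceptance criterion holds for the claims supplied; it says nothing about whether the referenced files actually support the claim.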

---

## Doramagic Context Augmentation

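The content below reinforces the main body of the Repomix/AI Context Pack. The Human Manual provides only a reading skeleton; the pitfall log is converted into working constraints that the host AI must follow.

## Human Manual Skeleton

Usage rule: this is only a reading route through the project and a set of salience signals, not an authority on facts. Specific facts must still be traced back to repo evidence or the Claim Graph.

Hard rules for the host AI:
- Do not treat page titles, section order, summaries, or importance as evidence of project facts.
- When explaining the Human Manual skeleton, state explicitly that it is only a reading route and salience signal.
- Judgments about capability, installation, compatibility, runtime status, and risk must cite repo evidence, a source path, or the Claim Graph.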

- **FastChat overview and distributed serving architecture**: importance `high`
  - source_paths: README.md, docs/server_arch.md, docs/commands/webserver.md, fastchat/serve/controller.py, fastchat/serve/model_worker.py
- **Model inference backends and the OpenAI-compatible API**: importance `high`
  - source_paths: fastchat/serve/model_worker.py, fastchat/serve/vllm_worker.py, fastchat/serve/sglang_worker.py, fastchat/serve/lightllm_worker.py, fastchat/serve/dashinfer_worker.py
- **MT-Bench evaluation and Chatbot Arena**: importance `high`
  - source_paths: fastchat/llm_judge/common.py, fastchat/llm_judge/gen_model_answer.py, fastchat/llm_judge/gen_judgment.py, fastchat/llm_judge/gen_api_answer.py, fastchat/llm_judge/show_result.py
- **Training and fine-tuning, data pipeline, and deployment operations**: importance `high`
  - source_paths: fastchat/train/train.py, fastchat/train/train_mem.py, fastchat/train/train_lora.py, fastchat/train/train_lora_t5.py, fastchat/train/train_flant5.py

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `587d5cfa1609a43d192cedb8441cac3c17db105d`
- inspected_files: `pyproject.toml`, `README.md`, `docs/awq.md`, `docs/lightllm_integration.md`, `docs/model_support.md`, `docs/dashinfer_integration.md`, `docs/gptq.md`, `docs/openai_api.md`, `docs/xFasterTransformer.md`, `docs/training.md`, `docs/exllama_v2.md`, `docs/mlx_integration.md`, `docs/arena.md`, `docs/vicuna_weights_version.md`, `docs/server_arch.md`, `docs/dataset_release.md`, `docs/langchain_integration.md`, `docs/third_party_ui.md`, `docs/vllm_integration.md`, `docs/commands/local_cluster.md`

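Hard rules for the host AI:
- Without repo_clone_verified=true, do not claim to have read the source code.
- Without repo_inspection_verified=true, do not state judgments drawn from README/docs/package files as facts.
- Without quick_start_verified=true, do not claim that the Quick Start has been run successfully.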
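
The three hard rules above amount to a flag-to-permission mapping. A minimal sketch of that gating follows, assuming the flags are loaded into a small dataclass; the class name, the function name, and the permission keys are hypothetical. Note that this section records `repo_clone_verified` and `repo_inspection_verified` as true but does not record `quick_start_verified`, so the sketch treats it as false.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class InspectionFlags:
    # Every flag defaults to False: an unrecorded flag grants nothing.
    repo_clone_verified: bool = False
    repo_inspection_verified: bool = False
    quick_start_verified: bool = False


def allowed_statements(flags: InspectionFlags) -> Dict[str, bool]:
    """Map each verification flag to the statement it unlocks."""
    return {
        "may_claim_source_was_read": flags.repo_clone_verified,
        "may_state_docs_judgments_as_fact": flags.repo_inspection_verified,
        "may_claim_quick_start_ran": flags.quick_start_verified,
    }


# Values as recorded in this section; quick_start_verified is absent there.
flags = InspectionFlags(repo_clone_verified=True, repo_inspection_verified=True)
print(allowed_statements(flags))
```

Defaulting to `False` is the deliberate design choice here: a missing flag is handled the same way as a failed verification.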

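## Doramagic Pitfall Constraints

These rules come from project-specific pitfalls found while Doramagic discovered, verified, or compiled this project. The host AI must treat them as working constraints, not as ordinary explanatory text.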

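### Constraint 1: Source evidence: The stop parameter in openai API doesn't work since v0.2.5

- Trigger: GitHub community evidence indicates a maintenance/version-related issue pending verification: The stop parameter in openai API doesn't work since v0.2.5
- Why it matters: It may raise the cost of trial use for new users and of production integration.
- Evidence: community_evidence:github | https://github.com/lm-sys/FastChat/issues/1048 | source type github_issue exposes a usage condition pending verification.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.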
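
Whether the server honors `stop` can only be established after installation. If a caller wants a defensive fallback in the meantime, one option is to truncate the returned text on the client side. The sketch below is plain string post-processing under that assumption; it is not FastChat code, it does not confirm or resolve the linked issue, and the function name is hypothetical.

```python
from typing import Sequence


def truncate_at_stop(text: str, stop: Sequence[str]) -> str:
    """Cut text at the earliest occurrence of any non-empty stop string."""
    cut = len(text)
    for marker in stop:
        if not marker:
            continue  # an empty stop string would match at index 0
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


# Made-up completion text, used only to exercise the function.
completion = "Answer: 42\n###\nextra text"
print(repr(truncate_at_stop(completion, ["###", "</s>"])))
```

This only hides unwanted trailing text after the fact; tokens generated past the stop string are still produced by the server.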

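### Constraint 2: Source evidence: FastChat-T5 4K context

- Trigger: GitHub community evidence indicates a security/permission-related issue pending verification: FastChat-T5 4K context
- Why it matters: It may affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | https://github.com/lm-sys/FastChat/issues/1711 | source type github_issue exposes a usage condition pending verification.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.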

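### Constraint 3: Repository name and install name differ

- Trigger: The repository name `fastchat` and the install entry point `fschat` do not fully match.
- Host AI rule: Confirm the package-name mapping and the official README instructions on npm/PyPI/GitHub.
- Why it matters: Users who search for the package by the repository name, or look for the repository by the package name, can easily end up at the wrong entry point.
- Evidence: identity.distribution | github_repo:615882673 | https://github.com/lm-sys/FastChat | repo=fastchat; install=fschat
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.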
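
In a sandbox, the name mapping can be inspected with the standard library alone. This is a minimal sketch, assuming the distribution name `fschat` from the evidence line above and the import name `fastchat` from the repository's package directory; the helper names are hypothetical. It only reads local metadata and installs nothing.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def importable(module_name: str) -> bool:
    """Report whether a top-level module can be located without importing it."""
    return util.find_spec(module_name) is not None


# Distribution name and import name are looked up separately on purpose.
print("fschat distribution version:", installed_version("fschat"))
print("fastchat importable:", importable("fastchat"))
```

Seeing `None` and `False` simply means nothing is installed in the current environment, which is the expected state before a sandbox trial install.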

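### Constraint 4: Capability assessment depends on assumptions

- Trigger: README/documentation is current enough for a first validation pass.
- Host AI rule: Turn the assumptions into a downstream validation checklist.
- Why it matters: If an assumption does not hold, the user will not get the promised capability.
- Evidence: capability.assumptions | github_repo:615882673 | https://github.com/lm-sys/FastChat | README/documentation is current enough for a first validation pass.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.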

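### Constraint 5: Maintenance activity unknown

- Trigger: last_activity_observed was not recorded.
- Host AI rule: Add signals on recent GitHub commits, releases, and issue/PR responses.
- Why it matters: New, abandoned, and active projects get lumped together, which lowers confidence in the recommendation.
- Evidence: evidence.maintainer_signals | github_repo:615882673 | https://github.com/lm-sys/FastChat | last_activity_observed missing
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.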

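### Constraint 6: Downstream validation risk item

- Trigger: no_demo
- Evidence: downstream_validation.risk_items | github_repo:615882673 | https://github.com/lm-sys/FastChat | no_demo; severity=medium
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.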

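### Constraint 7: Scoring risk present

- Trigger: no_demo
- Why it matters: The risk affects whether the project is suitable for ordinary users to install.
- Evidence: risks.scoring_risks | github_repo:615882673 | https://github.com/lm-sys/FastChat | no_demo; severity=medium
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.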

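### Constraint 8: Source evidence: Unauthenticated SSRF and worker/model spoofing via the controller /register_worker endpoint

- Trigger: GitHub community evidence indicates a security/permission-related issue pending verification: Unauthenticated SSRF and worker/model spoofing via the controller /register_worker endpoint
- Why it matters: It may affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | https://github.com/lm-sys/FastChat/issues/3886 | the source discussion mentions python-related conditions that should be re-checked before installing or trialing.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.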

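### Constraint 9: Issue/PR response quality unknown

- Trigger: issue_or_pr_quality=unknown.
- Host AI rule: Sample recent issues/PRs and judge whether they go unhandled for long periods.
- Why it matters: Users cannot tell whether anyone will respond when they run into a problem.
- Evidence: evidence.maintainer_signals | github_repo:615882673 | https://github.com/lm-sys/FastChat | issue_or_pr_quality=unknown
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.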

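### Constraint 10: Release cadence unclear

- Trigger: release_recency=unknown.
- Host AI rule: Confirm that the latest release/tag is consistent with the install command in the README.
- Why it matters: Install commands and documentation may lag behind the code, which raises the chance that users hit pitfalls.
- Evidence: evidence.maintainer_signals | github_repo:615882673 | https://github.com/lm-sys/FastChat | release_recency=unknown
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.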