# bentoml - Doramagic AI Context Pack

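> Positioning: a pre-installation preview and decision asset. It helps the host AI get off to a good start, but it does not mean the target project has been installed, executed, or verified.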

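## Sufficiency Principle

- **Sufficiency, not compression**: An AI Context Pack should be complete enough for the host AI to understand the project's value, capability boundaries, entry points, risks, and evidence sources before starting work. It may be organized in layers, but the shortest possible summary is not the goal.
- **Compression strategy**: Compress only noise and repetition, never context that affects judgment or the quality of the initial work.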

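## How the Host AI Should Use This

You are reading the AI Context Pack that Doramagic compiled for bentoml. Treat it as pre-work context: help the user understand who it suits, what it can do, how to get started, what must be verified after installation, and where the risks lie. Do not claim that you have installed, run, or executed the target project.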

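## Claim Consumption Rules

- **Source of facts**: Repo Evidence + Claim/Evidence Graph; the Human Wiki only supplies salience, terminology, and narrative structure.
- **Minimum status for a fact**: `supported`
- `supported`: May be used as a project fact, but the answer must cite the claim_id and the evidence path.
- `weak`: May be used only as a low-confidence lead; the user must be asked to verify further.
- `inferred`: May be used only for risk warnings or open questions; it must not be presented as a project fact.
- `unverified`: Must not be used as a fact; state explicitly that evidence is insufficient.
- `contradicted`: The conflicting sources must be shown; do not pick one version on the user's behalf.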
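
The rules above can be sketched as a small lookup. This is a minimal illustration, not part of the pack's tooling: the `Claim` class, the `RULES` labels, and the helper names are hypothetical, and treating an unknown status as unverified is an assumption.

```python
from dataclasses import dataclass

# Handling rules keyed by claim status, mirroring the list above.
# The label strings are hypothetical names chosen for this sketch.
RULES = {
    "supported": "fact",                     # usable as a project fact, with citation
    "weak": "low_confidence_lead",           # ask the user to keep verifying
    "inferred": "risk_or_open_question",     # never presented as a fact
    "unverified": "insufficient_evidence",   # say that evidence is insufficient
    "contradicted": "show_conflicting_sources",
}


@dataclass(frozen=True)
class Claim:
    claim_id: str
    status: str
    confidence: float
    evidence: tuple[str, ...] = ()


def classify(claim: Claim) -> str:
    """Return how a claim may be used; unknown statuses are treated as unverified."""
    return RULES.get(claim.status, "insufficient_evidence")


def cite(claim: Claim) -> str:
    """Format the citation that must accompany a supported claim."""
    if classify(claim) != "fact":
        raise ValueError(f"{claim.claim_id} is {claim.status}; it cannot be cited as a fact")
    return f"{claim.claim_id} ({', '.join(claim.evidence)})"


install = Claim("clm_0003", "supported", 0.86, ("README.md",))
upgrade_pip = Claim("clm_0005", "unverified", 0.25, ("scripts/release_quickstart_bento.sh",))
print(cite(install))
print(classify(upgrade_pip))
```

Raising on a non-supported claim, rather than silently downgrading it, keeps an unverified item from slipping into an answer as a fact.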

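## Who It Suits Best

- **Users who want to understand an open-source project's value and boundaries before installing**: Current evidence comes mainly from project documentation. Evidence: `README.md` Claim: `clm_0002` supported 0.86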

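## What It Can Do

- **Command-line startup or installation flow** (requires post-installation verification): The project documentation contains executable commands; real use requires running them locally or in the host environment. Evidence: `README.md`, `scripts/release_quickstart_bento.sh` Claim: `clm_0001` supported 0.86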

## How to Start

- `pip install -U bentoml` Evidence: `README.md` Claim: `clm_0003` supported 0.86
- `pip install torch transformers  # additional dependencies for local run` Evidence: `README.md` Claim: `clm_0004` supported 0.86
- `pip install -U pip` Evidence: `scripts/release_quickstart_bento.sh` Claim: `clm_0005` unverified 0.25
- `pip install "bentoml[grpc]==$BENTOML_VERSION"` Evidence: `scripts/release_quickstart_bento.sh` Claim: `clm_0006` unverified 0.25
- `pip install -r ./requirements.txt` Evidence: `scripts/release_quickstart_bento.sh` Claim: `clm_0007` unverified 0.25
- `pip install fs-s3fs` Evidence: `scripts/release_quickstart_bento.sh` Claim: `clm_0008` unverified 0.25
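
Because this pack recommends trying the install in isolation first, the sketch below only builds the commands for a trial in a Python virtual environment and prints them; it does not run anything. The function name and the `./bentoml-trial` directory are hypothetical, and the only install command included is the supported one from `README.md` (`clm_0003`).

```python
import sys
from pathlib import Path


def plan_isolated_install(env_dir: str) -> list[list[str]]:
    """Build, but do not run, the commands for a trial install in a virtual environment."""
    env = Path(env_dir)
    # Virtual environments keep executables in Scripts/ on Windows and bin/ elsewhere.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    env_python = str(env / bin_dir / "python")
    return [
        # Step 1: create the isolated environment with the standard-library venv module.
        [sys.executable, "-m", "venv", str(env)],
        # Step 2: install with the environment's own interpreter, not the global one.
        [env_python, "-m", "pip", "install", "-U", "bentoml"],
    ]


for command in plan_isolated_install("./bentoml-trial"):
    print(" ".join(command))
```

Printing the plan instead of executing it leaves the decision to run each command, and where, with the user.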

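## Pre-Continuation Decision Card

- **Current recommendation**: Start with a permission-sandboxed trial
- **Why**: The project shows signs of installation commands, host configuration, or local writes. Do not go straight into your primary environment; do a trial install in an isolated environment first.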

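### 30-Second Decision

- **What to do now**: Start with a permission-sandboxed trial
- **Minimum safe next step**: Run Prompt Preview first; if you still want to install, do a trial install only in an isolated environment
- **Do not trust yet**: Tool permission boundaries cannot be trusted before installation.
- **Continuing will touch**: command execution, the local environment or project files, and the host AI context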

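### What You Can Trust Now

- **Audience signal: users who want to understand an open-source project's value and boundaries before installing** (supported): Backed by a supported claim or project evidence, but this still does not equal real installed behavior. Evidence: `README.md` Claim: `clm_0002` supported 0.86
- **Capability exists: command-line startup or installation flow** (supported): You can trust that the project contains signs of this capability; whether it fits your specific task still requires a trial or post-installation verification. Evidence: `README.md`, `scripts/release_quickstart_bento.sh` Claim: `clm_0001` supported 0.86
- **Quick Start / installation command signals exist** (supported): You can trust that the project documentation contains a startup or installation entry point; do not run it directly in your primary environment on that basis. Evidence: `README.md` Claim: `clm_0003` supported 0.86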

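### What You Cannot Trust Yet

- **Tool permission boundaries cannot be trusted before installation.** (unverified): MCP/tool-style projects typically touch files, the network, the browser, or external APIs; permissions and logs must be checked for real.
- **Real output quality cannot be trusted before installation.** (unverified): Prompt Preview can only show how the guidance works; it cannot prove result quality in a real project.
- **Host AI version compatibility cannot be trusted before installation.** (unverified): Loading rules and version differences across hosts such as Claude, Cursor, Codex, and Gemini must be verified in a real environment.
- **"It will not pollute existing host AI behavior" cannot be taken on trust.** (inferred): Skills, plugins, and AGENTS/CLAUDE/GEMINI instructions may change the host AI's default behavior.
- **Safe rollback cannot be assumed.** (unverified): Unless the project explicitly provides uninstall and restore instructions, verify in an isolated environment first.
- **After a real installation, is it compatible with the user's current host AI version?** (unverified): Compatibility can only be verified in the actual host environment.
- **Does the project's output quality meet the user's specific task?** (unverified): A pre-installation preview can only show the flow and boundaries; it cannot replace a real evaluation.
- **Do the installation commands require network access, elevated permissions, or global writes?** (unverified): This affects installation risk in both enterprise and personal environments. Evidence: `README.md`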

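### What Continuing Will Touch

- **Command execution**: package managers, network downloads, local plugin directories, project configuration, or the user's home directory. Reason: Running even the first command can change the environment, so decide first whether it is worth running. Evidence: `README.md`, `scripts/release_quickstart_bento.sh`
- **Local environment or project files**: installation output, plugin caches, project configuration, or local dependency directories. Reason: The write scope and rollback method cannot be proven before installation, so isolated verification is needed. Evidence: `README.md`, `scripts/release_quickstart_bento.sh`
- **Host AI context**: AI Context Pack, Prompt Preview, Skill routing, risk rules, and project facts. Reason: Imported context influences the host AI's later judgments, so unverified items must not be presented as facts.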

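### Minimum Safe Next Steps

- **Run Prompt Preview first**: Use the interactive pre-installation trial to judge whether the way of working fits; it needs no authorization and changes nothing in the environment. (Applies to: any project, especially when output quality is unknown.)
- **Do a trial install only in an isolated directory or test account**: This keeps installation commands from polluting the primary host AI, real projects, or the user's home directory. (Applies to: cases with signs of command execution, plugin configuration, or local writes.)
- **After installing, verify only one minimal task**: Verify loading, compatibility, output quality, and rollback first, then decide whether to go deeper. (Applies to: moving from a trial into a real workflow.)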

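### Exit Strategy

- **Preserve the pre-installation state**: Record the original host configuration and project state so that you can later judge whether it is recoverable.
- **Record the installation commands and write paths**: Without explicit uninstall instructions, you at least need to know which directories or configuration files require manual cleanup.
- **If there is no rollback path, stay out of the primary environment**: Irreversibility is a blocker for continuing; do not proceed on trust or luck.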
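
One way to record the pre-installation state and the resulting write paths is to hash every file under a directory before and after the trial, then compare. This is a minimal sketch under stated assumptions: the `snapshot` and `diff` helpers are hypothetical, and which directory to watch (for example the isolated trial directory) is the user's choice.

```python
import hashlib
from pathlib import Path


def snapshot(root: str) -> dict[str, str]:
    """Map each file under root (by relative path) to a SHA-256 digest of its contents."""
    base = Path(root)
    return {
        str(path.relative_to(base)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def diff(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """List files that were added, removed, or changed between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```

Take one snapshot before the trial install and one after; the `added` and `changed` lists are the paths that would need manual cleanup if no uninstall instructions exist.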

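## Preview-Only Items

- Explaining who the project suits and what it can do
- Demonstrating a typical conversation flow based on project documentation
- Helping the user decide whether it is worth installing or researching further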

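## Items That Require Post-Installation Verification

- Actually installing a Skill, plugin, or CLI
- Executing scripts, modifying local files, or accessing external services
- Verifying real output quality, performance, and compatibility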

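## Boundary and Risk Decision Card

- **Mistaking the pre-installation preview for a real run**: Users may overestimate how much configuration, permission, and compatibility verification the project has already completed. Mitigation: Clearly separate prompt_preview_can_do from runtime_required. Claim: `clm_0009` inferred 0.45
- **Command execution modifies the local environment**: Installation commands may write to the user's home directory, the host plugin directory, or project configuration. Mitigation: Run them first in an isolated environment or test account. Evidence: `README.md`, `scripts/release_quickstart_bento.sh` Claim: `clm_0010` supported 0.86
- **To confirm**: After a real installation, is it compatible with the user's current host AI version? Reason: Compatibility can only be verified in the actual host environment.
- **To confirm**: Does the project's output quality meet the user's specific task? Reason: A pre-installation preview can only show the flow and boundaries; it cannot replace a real evaluation.
- **To confirm**: Do the installation commands require network access, elevated permissions, or global writes? Reason: This affects installation risk in both enterprise and personal environments.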

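## Pre-Work Context

### Load Order

- First read how_to_use.host_ai_instruction to establish the boundaries of this pre-installation decision asset.
- Read claim_graph_summary to confirm that facts come from the Claim/Evidence Graph rather than the Human Wiki narrative.
- Then read intended_users, capabilities, and quick_start_candidates to judge whether the user is a match.
- When a specific task needs to be carried out, check role_skill_index first, then evidence_index.
- For questions about real installation, file modification, network access, performance, or compatibility, switch to risk_card and boundaries.runtime_required.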

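### Task Routing

- **Command-line startup or installation flow**: First explain that this capability requires post-installation verification, then provide a pre-installation checklist. Boundary: Must be verified by a real installation or run. Evidence: `README.md`, `scripts/release_quickstart_bento.sh` Claim: `clm_0001` supported 0.86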

### Context Scale

- Total files: 369
- Important file coverage: 40/369
- Evidence index entries: 65
- Role / Skill entries: 4
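
As a quick worked calculation, the coverage figure can be turned into a percentage. The sketch assumes that 40/369 means files covered divided by total files, which the pack does not state explicitly.

```python
total_files = 369
important_files_covered = 40

# Assumption: coverage is covered files divided by total files.
coverage = important_files_covered / total_files
print(f"{coverage:.1%}")  # prints 10.8%
```

With roughly a tenth of the files covered, a missing entry in this pack is a gap in the evidence, not proof that the project lacks a feature.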

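### Handling Insufficient Evidence

- **missing_evidence**: State that evidence is insufficient and ask the user for the target file, README passage, or post-installation verification record; do not fill in facts.
- **out_of_scope_request**: Explain that the task is beyond the evidence scope of this AI Context Pack, and suggest that the user consult the Human Manual or verify after a real installation.
- **runtime_request**: Provide the pre-installation checklist and the source of the commands, but do not execute commands for the user or claim they have been executed.
- **source_conflict**: Show the conflicting sources side by side and mark them as pending verification; do not force a choice of one version.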
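
The four cases above amount to a lookup from situation to response policy. The sketch below is illustrative only: the `FALLBACKS` wording is a paraphrase, the function name is hypothetical, and defaulting unknown situations to `missing_evidence` is an assumption chosen as the most conservative option.

```python
# Response policy per situation, paraphrasing the list above.
FALLBACKS = {
    "missing_evidence": "State that evidence is insufficient and ask for the file, README passage, or verification record.",
    "out_of_scope_request": "Explain that the task is outside this pack's evidence and point to the Human Manual or a real install.",
    "runtime_request": "Give the pre-installation checklist and command sources; do not execute or claim execution.",
    "source_conflict": "Show the conflicting sources and mark them as pending verification.",
}


def fallback_for(situation: str) -> str:
    """Return the response policy; unknown situations fall back to missing_evidence."""
    return FALLBACKS.get(situation, FALLBACKS["missing_evidence"])


print(fallback_for("runtime_request"))
```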

## Prompt Recipes

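### Fit Assessment

- Goal: Decide whether this project suits the user's current task.
- Expected output: A fit verdict, key reasons, evidence citations, what can be previewed before installation, what must be verified after installation, and suggested next steps.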

```text
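Based on the AI Context Pack for bentoml, first ask me 3 necessary questions, then decide whether it suits my task. The answer must cover: who it suits, what it can do, what it cannot do, whether it is worth installing, and where the evidence comes from. Every project fact must cite evidence_refs, source_paths, or a claim_id.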
```

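### Pre-Installation Experience

- Goal: Let the user get a feel for the core workflow before installing, without presenting the preview as real capability or a marketing promise.
- Expected output: An experience script with boundary labels, a post-installation verification checklist, and cautious advice; no promises of real runs and no strong marketing language.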

```text
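Treat bentoml as a pre-installation experience asset, not as an installed tool or a real runtime environment.

Output exactly four sections:
1. First ask me 3 necessary questions.
2. Give an "experience script": use the three labels [Previewable before installation], [Requires post-installation verification], and [Insufficient evidence] to show how it might guide the workflow.
3. Give a post-installation verification checklist: list the capabilities that can only be confirmed after a real installation, real host loading, and a real project run.
4. Give cautious advice: you may only say "worth further research / a trial install", "provide more information before deciding", or "not recommended to continue"; do not endorse the project.

Hard boundaries:
- Do not claim to have installed, run, executed tests, modified files, or produced real results.
- Do not use promissory wording such as "adapts automatically", "guaranteed to pass", "perfect fit", or "strongly recommend installing".
- When describing how it works after installation, you must use a conditional such as "If installation succeeds and the host loads the Skill correctly, it might ...".
- The experience script may only be written as "sample lines / a hypothetical flow": use "might ask / might suggest / might show", not "has written, has generated, has passed, is running, is generating".
- Prompt Preview is not responsible for giving installation commands; if the user is ready for a trial install, only advise reading the Quick Start and Risk Card first and verifying in an isolated environment.
- Every project fact must come from a supported claim, evidence_refs, or source_paths; inferred/unverified items may only serve as risks or items to confirm.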

```

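### Role / Skill Selection

- Goal: Pick the best-matching assets from the project's roles or Skills.
- Expected output: A list of candidate roles or Skills, each with its applicable scenario, evidence path, risk boundary, and whether post-installation verification is needed.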

```text
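Read role_skill_index and recommend the 3-5 roles or Skills most relevant to my target task. For each recommendation, state the applicable scenario, likely output, risk boundary, and evidence_refs.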
```

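### Risk Pre-Check

- Goal: Identify environment, permission, rule-conflict, and quality risks before installing or adopting the project.
- Expected output: A checklist covering environment, permissions, dependencies, licensing, host conflicts, quality risks, and unknowns.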

```text
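Based on risk_card, boundaries, and quick_start_candidates, give me a pre-installation risk checklist. Do not execute commands for me; only explain what I should check, why, and what the impact of a failure would be.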
```

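### Host AI Kickoff Instructions

- Goal: Turn the project context into host AI instructions to use before a conversation begins.
- Expected output: A set of kickoff instructions with clear boundaries and explicit evidence citations, ready to paste into the host AI.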

```text
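Based on the AI Context Pack for bentoml, generate kickoff instructions that I can paste into the host AI. The instructions must respect not_runtime=true and must not claim that the project has been installed, run, or produced real results.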
```

## Role / Skill Index

- 4 role / Skill / project documentation entries are indexed in total.

- **Unified Model Serving Framework** (project_doc): 🍱 Build model inference APIs and multi-model serving systems with any open-source or custom AI models. 👉 Join our forum https://forum.modular.com/c/bento/31 ! Activation hint: consult when the user needs to understand the project structure, installation method, or boundaries. Evidence: `README.md`
- **Developing BentoServer** (project_doc): Run BentoServer with sample Service. Create a sample Service in hello.py: Activation hint: consult when the user needs to understand the project structure, installation method, or boundaries. Evidence: `src/bentoml/_internal/server/README.md`
- **Readme** (project_doc): Here are entrypoints to the bare workers that internally used by the bentoml. They are typically used by the supervisor and not directly by the user. Activation hint: consult when the user needs to understand the project structure, installation method, or boundaries. Evidence: `src/bentoml_cli/worker/README.md`
- **Contributing to BentoML** (project_doc): BentoML https://github.com/bentoml/BentoML is an open and community-driven project. Everyone is welcome to contribute. Activation hint: consult when the user needs to understand the project structure, installation method, or boundaries. Evidence: `CONTRIBUTING.md`

## Evidence Index

- 65 evidence entries are indexed in total.

- **Unified Model Serving Framework** (documentation): 🍱 Build model inference APIs and multi-model serving systems with any open-source or custom AI models. 👉 Join our forum https://forum.modular.com/c/bento/31 ! Evidence: `README.md`
- **Developing BentoServer** (documentation): Run BentoServer with sample Service. Create a sample Service in hello.py: Evidence: `src/bentoml/_internal/server/README.md`
- **Readme** (documentation): Here are entrypoints to the bare workers that internally used by the bentoml. They are typically used by the supervisor and not directly by the user. Evidence: `src/bentoml_cli/worker/README.md`
- **Contributing to BentoML** (documentation): BentoML https://github.com/bentoml/BentoML is an open and community-driven project. Everyone is welcome to contribute. Evidence: `CONTRIBUTING.md`
- **License** (source_file): Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ Evidence: `LICENSE`
- **Init** (source_file): all = class SyncHTTPClient SyncHTTPClient class AsyncHTTPClient AsyncHTTPClient Evidence: `src/_bentoml_impl/client/__init__.py`
- **Base** (source_file): T = t.TypeVar "T" def map exception resp: httpx.Response - BentoMLException ⋮---- status = HTTPStatus resp.status code exc = BentoMLException.error mapping.get status, BentoMLException ⋮---- @attrs.define slots=True class ClientEndpoint ⋮---- name: str route: str doc: str None = None input: dict str, t.Any = attrs.field factory=dict output: dict str, t.Any = attrs.field factory=dict input spec: type IODescriptor None = None output spec: type IODescriptor None = None stream output: bool = False is task: bool = False class AbstractClient abc.ABC ⋮---- endpoints: dict str, ClientEndpoint def setup endpoints self - None ⋮---- attr name = name ⋮---- attr name = f"api {name}" prefix to avoid name… Evidence: `src/_bentoml_impl/client/base.py`
- **Http** (source_file): T = t.TypeVar "T" AnyClient = t.TypeVar "AnyClient", httpx.Client, httpx.AsyncClient C = t.TypeVar "C", httpx.Client, httpx.AsyncClient logger = logging.getLogger "bentoml.io" MAX RETRIES = 3 ⋮---- @attr.define slots=False class HTTPClient AbstractClient, t.Generic C ⋮---- client cls: t.ClassVar type C url: str endpoints: dict str, ClientEndpoint = attr.field factory=dict media type: str = "application/json" timeout: float = 30 default headers: dict str, str = attr.field factory=dict app: ASGIApp None = None server ready timeout: float None = None service: Service t.Any None = None file manager: ClientFileManager = attr.field init=False, factory=ClientFileManager temp dir: tempfile.Temporar… Evidence: `src/_bentoml_impl/client/http.py`
- **Proxy** (source_file): T = t.TypeVar "T" logger = logging.getLogger "bentoml.impl" class RemoteProxy AbstractClient, t.Generic T ⋮---- svc config: dict str, ServiceConfig = timeout = ⋮---- timeout = 60 ⋮---- @property def to async self - AsyncHTTPClient ⋮---- @property def to sync self - SyncHTTPClient ⋮---- @property def client url self - str async def is ready self, timeout: int None = None - bool async def close self - None def as service self - T def call self, name: str, /, args: t.Any, kwargs: t.Any - t.Any ⋮---- original func = getattr self. inner, name ⋮---- original func = original func.func ⋮---- is async func = ⋮---- original func = getattr self. inner, endpoint.name Evidence: `src/_bentoml_impl/client/proxy.py`
- **Proxy2** (source_file): T = t.TypeVar "T" ⋮---- P = t.ParamSpec "P" ⋮---- logger = logging.getLogger "bentoml.io" async def map exception resp: aiohttp.ClientResponse - BentoMLException ⋮---- status = HTTPStatus resp.status exc = BentoMLException.error mapping.get status, BentoMLException ⋮---- class SessionManager ⋮---- def make client self - aiohttp.ClientSession ⋮---- connector: aiohttp.BaseConnector None = None ⋮---- connector = ASGIApplicationConnector self. app base url = "http://127.0.0.1:3000" ⋮---- connector = UnixConnector path=uri to path self. parsed url.geturl ⋮---- base url = f"http://{self. parsed url.netloc}" ⋮---- base url = self. parsed url.geturl ⋮---- def should refresh self - bool async def ge… Evidence: `src/_bentoml_impl/client/proxy2.py`
- **Init** (source_file): all = "serve http" Evidence: `src/_bentoml_impl/server/__init__.py`
- **try to find the GPU used with the same fragment** (source_file): NVIDIA GPU = "nvidia.com/gpu" DISABLE GPU ALLOCATION ENV = "BENTOML DISABLE GPU ALLOCATION" class ResourceAllocator ⋮---- def init self - None ⋮---- @staticmethod def gpu allocation disabled - bool def assign gpus self, count: float - list int ⋮---- if count < 1: a fractional GPU ⋮---- try to find the GPU used with the same fragment gpu = next ⋮---- gpu = len self. available gpus ⋮---- can't assign to the next one, mark it as zero. ⋮---- else: allocate n GPUs, n is a positive integer ⋮---- count = int count unassigned = ⋮---- config = services service.name num gpus = 0 num workers = 1 worker env: dict str, str = {} ⋮---- num gpus = config "resources" "gpu" ⋮---- num workers = int self.syste… Evidence: `src/_bentoml_impl/server/allocator.py`
- **App** (source_file): R = t.TypeVar "R" logger = logging.getLogger "bentoml.serve" RESULT STORE ENV = "BENTOML RESULT STORE" class ContextMiddleware ⋮---- def init self, app: ext.ASGIApp, context: ServiceContext - None ⋮---- req = Request scope, receive, send ⋮---- class ServiceAppFactory BaseAppFactory ⋮---- config = services service.name traffic = config.get "traffic" workers = config.get "workers" timeout = traffic.get "timeout" max concurrency = traffic.get "max concurrency" ⋮---- num workers = 1 ⋮---- srs = system resources num workers = int srs "cpu" ⋮---- num workers = workers ⋮---- def fallback - t.NoReturn ⋮---- @functools.cached property def adaptive batch size hist self - Histogram ⋮---- metrics clien… Evidence: `src/_bentoml_impl/server/app.py`
- **Proxy** (source_file): logger = logging.getLogger "bentoml.server" async def check health client: aiohttp.ClientSession, health endpoint: str - bool ⋮---- response = await client.get health endpoint, timeout=aiohttp.ClientTimeout 5 ⋮---- def create proxy app service: Service t.Any - Starlette ⋮---- """A reverse-proxy that forwards all requests to the HTTP server started by the custom command. """ ⋮---- health endpoint = service.config.get "endpoints", {} .get "livez", "/health" ⋮---- server instance = get current service ⋮---- should start process = ⋮---- proxy port = service.config.get "http", {} .get "proxy port", 8000 proxy url = f"http://localhost:{proxy port}" proc: Process None = None ⋮---- client = instanc… Evidence: `src/_bentoml_impl/server/proxy.py`
- **Resolve models before server starts.** (source_file): AnyService = Service t.Any ⋮---- POSIX = os.name == "posix" WINDOWS = os.name == "nt" IS WSL = "microsoft-standard" in platform.release API SERVER NAME = " bento api server" MAX AF UNIX PATH LENGTH = 103 logger = logging.getLogger "bentoml.serve" ⋮---- runner port = port stack.enter context reserve free port runner host = "127.0.0.1" ⋮---- socket path = os.path.join uds path, f"{id service }.sock" ⋮---- SERVICE WORKER SCRIPT = " bentoml impl.worker.service" ⋮---- env = env or {} ⋮---- args = cmd = sys.executable ⋮---- watcher = create watcher ⋮---- Resolve models before server starts. ⋮---- member = getattr svc.inner, name ⋮---- env = {"PROMETHEUS MULTIPROC DIR": ensure prometheus dir } ⋮--… Evidence: `src/_bentoml_impl/server/serving.py`
- **Init** (source_file): all = "ResultStore", "ResultStatus", "Sqlite3Store" Evidence: `src/_bentoml_impl/tasks/__init__.py`
- **ruff: noqa** (source_file): ruff: noqa ⋮---- T = TypeVar "T" ⋮---- all = Evidence: `src/_bentoml_sdk/__init__.py`
- **Init** (source_file): all = "Model", "BentoModel", "HuggingFaceModel" Evidence: `src/_bentoml_sdk/models/__init__.py`
- **Base** (source_file): T = t.TypeVar "T" class Model abc.ABC, t.Generic T ⋮---- @abc.abstractmethod def to info self, alias: str None = None - BentoModelInfo ⋮---- @classmethod @abc.abstractmethod def from info cls, info: BentoModelInfo - Model T ⋮---- @abc.abstractmethod def to create schema self - CreateModelSchema ⋮---- @abc.abstractmethod def resolve self, base path: t.Union PathType, None = None - T ⋮---- @t.overload def get self, instance: None, owner: t.Type t.Any - t.Self: ... ⋮---- @t.overload def get self, instance: t.Any, owner: t.Type t.Any - T: ... def get self, instance: t.Any, owner: type - T t.Self ⋮---- @attrs.frozen class BentoModel Model StoredModel ⋮---- tag: Tag = attrs.field converter=Tag.fr… Evidence: `src/_bentoml_sdk/models/base.py`
- **Huggingface** (source_file): CONFIG FILE = "config.json" DEFAULT HF ENDPOINT = "https://huggingface.co" ⋮---- @attrs.define unsafe hash=True class HuggingFaceModel Model str ⋮---- model id: str revision: str = "main" endpoint: t.Optional str = attrs.field factory=lambda: os.getenv "HF ENDPOINT" include: t.Optional t.List str = None exclude: t.Optional t.List str = None ⋮---- @cached property def hf api self - HfApi ⋮---- @cached property def commit hash self - str def resolve self, base path: t.Union PathType, None = None - str ⋮---- snapshot path = snapshot download ⋮---- model path = os.path.dirname os.path.dirname snapshot path ⋮---- def to info self, alias: str None = None - BentoModelInfo ⋮---- model id = self.mod… Evidence: `src/_bentoml_sdk/models/huggingface.py`
- **Init** (source_file): all = current service: t.Optional t.Any = None def get current service - t.Any def set current service service: t.Any - None ⋮---- current service = service Evidence: `src/_bentoml_sdk/service/__init__.py`
- **Init** (source_file): MODULE ATTRS = { ⋮---- bentos = LazyLoader "bentoml.bentos", globals , "bentoml.bentos" legacy = LazyLoader "bentoml.legacy", globals , "bentoml.legacy" catboost = LazyLoader sklearn = LazyLoader xgboost = LazyLoader lightgbm = LazyLoader unsloth = LazyLoader mlflow = LazyLoader "bentoml.mlflow", globals , " bentoml impl.frameworks.mlflow" detectron = LazyLoader diffusers = LazyLoader diffusers simple = LazyLoader easyocr = LazyLoader flax = LazyLoader fastai = LazyLoader onnx = LazyLoader keras = LazyLoader pytorch = LazyLoader pytorch lightning = LazyLoader picklable model = LazyLoader tensorflow = LazyLoader torchscript = LazyLoader transformers = LazyLoader triton = LazyLoader "bentoml.… Evidence: `src/bentoml/__init__.py`
- **Init** (source_file): all = "Bento", "BentoStore" Evidence: `src/bentoml/_internal/bento/__init__.py`
- **when address is a RPC** (source_file): logger = logging.getLogger name ⋮---- class Client ABC ⋮---- server url: str svc: Service endpoints: list str sync client: SyncClient async client: AsyncClient def init self, svc: Service, server url: str ⋮---- @t.overload @staticmethod def from url server url: str, , kind: t.Literal "http" = ... - HTTPClient: ... ⋮---- @t.overload @staticmethod def from url server url: str, , kind: t.Literal "grpc" = ... - GrpcClient: ... ⋮---- def enter self ⋮---- async def aenter self ⋮---- class AsyncClient ABC ⋮---- async def close self ⋮---- class SyncClient Client ⋮---- def call self, bentoml api name: str, inp: t.Any = None, kwargs: t.Any - t.Any ⋮---- def close self - None ⋮---- when address is a R… Evidence: `src/bentoml/_internal/client/__init__.py`
- **TODO: Temporary workaround before moving everything to StreamingResponse** (source_file): logger = logging.getLogger name class HTTPClient Client ⋮---- def init self, svc: Service, server url: str class AsyncHTTPClient AsyncClient ⋮---- @cached property def client self - httpx.AsyncClient ⋮---- host = host if "://" in host else "http://" + host start time = time.time ⋮---- resp = await session.get "/readyz" ⋮---- async def health self - httpx.Response ⋮---- @classmethod async def from url cls, server url: str, kwargs: t.Any - AsyncHTTPClient ⋮---- server url = server url if "://" in server url else "http://" + server url ⋮---- resp = await session.get "/docs.json" ⋮---- openapi spec = json.loads await resp.aread dummy service = Service openapi spec "info" "title" ⋮---- api = Inf… Evidence: `src/bentoml/_internal/client/http.py`
- **Init** (source_file): @attrs.frozen class BentoCloudClient ⋮---- client: RestApiClient bento: BentoAPI model: ModelAPI deployment: DeploymentAPI secret: SecretAPI api token: ApiTokenAPI ⋮---- cfg = CloudClientConfig.get config ctx = cfg.get context BentoMLContainer.cloud context.get api key = ctx.api token endpoint = ctx.endpoint client = RestApiClient endpoint, api key, timeout spinner = Spinner bento = BentoAPI client, spinner=spinner model = ModelAPI client, spinner=spinner deployment = DeploymentAPI client secret = SecretAPI client api token = ApiTokenAPI client ⋮---- @classmethod def for context cls, context: str None = None - "BentoCloudClient" ⋮---- ctx = cfg.get context context Evidence: `src/bentoml/_internal/cloud/__init__.py`
- **Base** (source_file): FILE CHUNK SIZE = 100 1024 1024 UPLOAD RETRY COUNT = 3 ⋮---- @attrs.define class CallbackIOWrapper t.IO bytes ⋮---- file: t.IO bytes read cb: t.Callable int , None None = None write cb: t.Callable int , None None = None start: int None = None end: int None = None def attrs post init self - None def reset self - int ⋮---- read = self.tell - self.start or 0 ⋮---- def seek self, offset: int, whence: int = 0 - int ⋮---- length = self.file.seek self.end, 0 ⋮---- length = self.file.seek offset, whence ⋮---- def tell self - int def fileno self - int def getattr self, name: str - t.Any def read self, size: int = -1 - bytes ⋮---- pos = self.tell ⋮---- size = self.end - pos res = self.file.read size… Evidence: `src/bentoml/_internal/cloud/base.py`
- **Download model files from remote model store** (source_file): @attrs.frozen class ModelAPI ⋮---- client: RestApiClient = attrs.field repr=False spinner: Spinner = attrs.field repr=False, factory=Spinner lock: Lock = attrs.field repr=False, init=False, factory=Lock ⋮---- upload task id = self.spinner.transmission progress.add task ⋮---- rest client = self. client model info = model.to info name = model info.tag.name version = model info.tag.version ⋮---- model repository = rest client.v1.get model repository ⋮---- model repository = rest client.v1.create model repository ⋮---- remote model = rest client.v1.get model ⋮---- remote model = rest client.v1.create model ⋮---- transmission strategy: TransmissionStrategy = "proxy" presigned upload url: str Non… Evidence: `src/bentoml/_internal/cloud/model.py`
- **Modelschemas** (source_file): time format = "%Y-%m-%d %H:%M:%S.%f" T = t.TypeVar "T" class ResourceType Enum ⋮---- USER = "user" ORG = "organization" CLUSTER = "cluster" HostCluster = "host cluster" BENTO REPOSITORY = "bento repository" BENTO = "bento" MODEL REPOSITORY = "model repository" MODEL = "model" DEPLOYMENT = "deployment" DEPLOYMENT REVISION = "deployment revision" TERMINAL RECORD = "terminal record" LABEL = "label" API TOKEN = "api token" YATAI COMPONENT = "yatai component" LimitGroup = "limit group" ResourceInstance = "resource instance" class BentoImageBuildStatus Enum ⋮---- PENDING = "pending" BUILDING = "building" SUCCESS = "success" FAILED = "failed" class UploadStatus Enum ⋮---- BUILDING = "uploading" ⋮-… Evidence: `src/bentoml/_internal/cloud/schemas/modelschemas.py`
- **User local config options for customizing bentoml** (source_file): logger = logging.getLogger name DEBUG ENV VAR = "BENTOML DEBUG" QUIET ENV VAR = "BENTOML QUIET" VERBOSITY ENV VAR = "BENTOML VERBOSITY" CONFIG ENV VAR = "BENTOML CONFIG" CONFIG OVERRIDE ENV VAR = "BENTOML CONFIG OPTIONS" CONFIG OVERRIDE JSON ENV VAR = "BENTOML CONFIG OVERRIDES" GRPC DEBUG ENV VAR = "GRPC VERBOSITY" DEFAULT LOCK PLATFORM = "x86 64-manylinux 2 36" def get bentoml version - str ⋮---- version = importlib.metadata.version "bentoml" ⋮---- BENTOML VERSION = get bentoml version def expand env var env var: str - str ⋮---- interpolated = os.path.expanduser os.path.expandvars str env var ⋮---- env var = interpolated ⋮---- @lru cache maxsize=1 def clean bentoml version - str ⋮---- vers… Evidence: `src/bentoml/_internal/configuration/__init__.py`
- **3. migrate api server.cors. access control - api server.http.cors.** (source_file): TRACING CFG = { API SERVER CONFIG = { RUNNER CONFIG = { SCHEMA = s.Schema def migration , override config: dict str, t.Any ⋮---- 3. migrate api server.cors. access control - api server.http.cors. ⋮---- 7. move timeout to traffic.timeout ⋮---- runner name = key.split "." 1 ⋮---- def finalize config config: dict str, t.Any - None ⋮---- RUNNER CFG KEYS = default runner config: dict str, t.Any = { Evidence: `src/bentoml/_internal/configuration/v1/__init__.py`
- **Init** (source_file): TRACING CFG = { SERVICE CONFIG = { SCHEMA = s.Schema def migration , override config: dict str, t.Any def finalize config config: dict str, t.Any - dict str, t.Any ⋮---- SERVICE CFG KEYS = default service config = { Evidence: `src/bentoml/_internal/configuration/v2/__init__.py`
- **NOTE: for tags strategy, we will always generate a default tag from the bento:tag** (source_file): P = t.ParamSpec "P" class DefaultBackendImpl types.ModuleType ⋮---- BUILD CMD: list str None ENV: dict str, str None BUILDKIT SUPPORT: bool def find binary self - str None: ... ⋮---- def health self - bool: ... DefaultBuilder: t.TypeAlias = t.Literal logger = logging.getLogger name BUILDER REGISTRY: dict str, OCIBuilder = {} DEFAULT BACKENDS = frozenset def register default backends ⋮---- module = t.cast ⋮---- NOTE: for tags strategy, we will always generate a default tag from the bento:tag If '-t/--image-tag' is provided, we will use this tag provided by user. bento = bento store.get bento tag tag = str bento.tag , ⋮---- tag = image tag ⋮---- We will look for DOCKER BUILDKIT in the environ… Evidence: `src/bentoml/_internal/container/__init__.py`
- **We will use a thread to read from the subprocess and avoid hanging from Ctrl+C** (source_file): P = t.ParamSpec "P" ArgType: t.TypeAlias = tuple str, ... None logger = logging.getLogger name ⋮---- ListStr = list str ⋮---- ListStr = list class Arguments ListStr ⋮---- def add self: Self, other: Arguments - Arguments ⋮---- @singledispatchmethod def construct args self, args: t.Any, opt: str = "" ⋮---- @construct args.register type None @construct args.register tuple @construct args.register list def self, args: ArgType, opt: str = "" ⋮---- @construct args.register type None @construct args.register str @construct args.register os.PathLike def self, args: PathType, opt: str = "" ⋮---- args = os.path.abspath str args ⋮---- @construct args.register type None @construct args.register bool de… Evidence: `src/bentoml/_internal/container/base.py`
- **Buildx** (source_file): logger = logging.getLogger name all = "ENV", "health", "construct build args", "BUILDKIT SUPPORT", "BUILD CMD" BUILDKIT SUPPORT = True BUILD CMD = "buildx", "build" def health - bool ⋮---- client = find binary ⋮---- has buildx = subprocess.check output client, "buildx", "--help" .decode "utf-8" ⋮---- def supports attestation - bool ⋮---- outputs: str = ⋮---- def parse dict opt d: dict str, str - str ⋮---- cmds = Arguments ⋮---- load = False ⋮---- output = parse dict opt output ⋮---- add host = tuple f"{host}:{ip}" for host, ip in add host.items ⋮---- build arg = tuple f"{key}={value}" for key, value in build arg.items ⋮---- build context = tuple f"{key}={value}" for key, value in build cont… Evidence: `src/bentoml/_internal/container/buildx.py`
- **Init** (source_file): P = t.ParamSpec "P" ListStr = list str ⋮---- ListStr = list logger = logging.getLogger name SUPPORTED PYTHON VERSIONS = "3.9", "3.10", "3.11", "3.12", "3.13", "3.14" SUPPORTED CUDA VERSIONS = ALLOWED CUDA VERSION ARGS = { SUPPORTED ARCHITECTURES = "amd64", "arm64", "ppc64le", "s390x" SUPPORTED RELEASE TYPES = "python", "miniconda", "cuda" CONTAINER METADATA: dict str, dict str, t.Any = { CONTAINER SUPPORTED DISTROS = list CONTAINER METADATA.keys CUDA UBUNTU2404 VERSIONS = {"12.6.0", "12.6.1", "12.6.2", "12.6.3", "12.8.0", "12.8.1"} def get cuda base image distro: str, cuda version: str - str ⋮---- meta = CONTAINER METADATA distro ⋮---- def get supported spec spec: t.Literal "python", "minic… Evidence: `src/bentoml/_internal/container/frontend/dockerfile/__init__.py`
- **Init** (source_file): all = "EnvManager" Evidence: `src/bentoml/_internal/env_manager/__init__.py`
- **Init** (source_file): F = t.Callable ..., t.Any ⋮---- PdSeries = PdSeries t.Any DataFrameOrient = Literal "split", "records", "index", "columns", "values", "table" SeriesOrient = Literal "split", "records", "index", "table" ⋮---- NpNDArray = NDArray t.Any ⋮---- WSGIApp = t.Callable F, t.Mapping str, t.Any , t.Iterable bytes all = Evidence: `src/bentoml/_internal/external_typing/__init__.py`
- **Picklable Model** (source_file): ModelType = t.Any MODULE NAME = "bentoml.picklable model" API VERSION = "v1" logger = logging.getLogger name def get tag like: str Tag - Model ⋮---- model = bentoml.models.get tag like ⋮---- def load model bento model: str Tag Model - ModelType ⋮---- """ Load the picklable model with the given tag from the local BentoML model store. Args: bento model: Either the tag of the model to get from the store, or a BentoML :class: ~bentoml.Model instance to load the model from. Returns: The picklable model loaded from the model store or BentoML :obj: ~bentoml.Model . Example: .. code-block:: python import bentoml picklable model = bentoml.picklable model.load model 'my model:latest' """ noqa ⋮---- "… Evidence: `src/bentoml/_internal/frameworks/picklable_model.py`
- **Init** (source_file): all = Evidence: `src/bentoml/_internal/io_descriptors/__init__.py`
- **Base** (source_file): InputType = OpenAPIResponse = dict str, str dict str, t.Any F = t.Callable ..., t.Any IO DESCRIPTOR REGISTRY: dict str, type IODescriptor t.Any = {} IOType = t.TypeVar "IOType" def from spec spec: dict str, t.Any - IODescriptor t.Any class OpenAPIMeta ⋮---- @abstractmethod def openapi schema self - Schema Reference ⋮---- @abstractmethod def openapi components self - dict str, t.Any None ⋮---- @abstractmethod def openapi example self - t.Any None ⋮---- @abstractmethod def openapi request body self - dict str, t.Any ⋮---- @abstractmethod def openapi responses self - dict str, t.Any class IODescriptor ABC, OpenAPIMeta, t.Generic IOType ⋮---- def init self, kwargs: t.Any - None: ... slots = HTT… Evidence: `src/bentoml/_internal/io_descriptors/base.py`
- **Init** (source_file): SAVE NAMESPACE = "saved model" JSON EXT = ".json" PKL EXT = ".pkl" PTH EXT = ".pth" TXT EXT = ".txt" YAML EXT = ".yaml" all = "Model", "ModelStore", "ModelContext", "ModelOptions", "copy model" Evidence: `src/bentoml/_internal/models/__init__.py`
- **TODO: @larme @yetone run this branch only yatai version is incompatible with embedded runner** (source_file): T = t.TypeVar "T" logger = logging.getLogger name PYTHON VERSION: str = f"{pyver.major}.{pyver.minor}.{pyver.micro}" MODEL YAML FILENAME = "model.yaml" CUSTOM OBJECTS FILENAME = "custom objects.pkl" ⋮---- @attr.define class ModelOptions ⋮---- def with options self, kwargs: t.Any - ModelOptions def to dict self: ModelOptions - dict str, t.Any ⋮---- @attr.define class PartialKwargsModelOptions ModelOptions ⋮---- partial kwargs: t.Dict str, t.Any = attr.field factory=dict ⋮---- @attr.define repr=False, eq=False class Model StoreItem ⋮---- tag: Tag path: Path = attr.field converter=Path info: ModelInfo custom objects: dict str, t.Any None = None internal: bool = attr.field kw only=True, default… Evidence: `src/bentoml/_internal/models/model.py`
- **Init** (source_file): logger = logging.getLogger name is otlp available = False ⋮---- is otlp available = True ⋮---- all = ⋮---- def getattr item: str - t.Any Evidence: `src/bentoml/_internal/monitoring/__init__.py`
- **Base** (source_file): MON COLUMN VAR: contextvars.ContextVar "dict str, dict str, str None" = MON DATAS VAR: contextvars.ContextVar "dict str, collections.deque t.Any None" = DT = t.TypeVar "DT" MT = t.TypeVar "MT", bound="MonitorBase t.Any " BENTOML MONITOR ROLES = {"feature", "prediction", "target"} BENTOML MONITOR TYPES = {"numerical", "categorical", "numerical sequence"} logger = logging.getLogger name class MonitorBase t.Generic DT ⋮---- PRESERVED COLUMNS: tuple str, ... = ⋮---- def start record self def stop record self - None ⋮---- datas: dict str, collections.deque DT = MON DATAS VAR.get ⋮---- columns = MON COLUMN VAR.get ⋮---- def export schema self, columns schema: dict str, dict str, str - None def ex… Evidence: `src/bentoml/_internal/monitoring/base.py`
- **Use Ray's metrics setup. Ray has already initialized a prometheus client** (source_file): class RunnerDeployment ⋮---- def init self ⋮---- runner = next ⋮---- inp batch dim = method.config.batch dim 0 out batch dim = method.config.batch dim 1 ray batch args = ⋮---- @serve.batch ray batch args async def func self, args, kwargs ⋮---- params = Params args, kwargs .map run params = params.map lambda arg: arg 0 indices = next iter params.items 1 1 results = await getattr runner, method.name .async run ⋮---- async def func self, args, kwargs ⋮---- def get service deployment svc: bentoml.legacy.Service, kwargs: t.Any - Deployment ⋮---- @serve.deployment name=f"bento-svc-{svc.name}", kwargs class BentoDeployment ⋮---- def init self, runner deployments: dict str, Deployment ⋮---- Use Ray… Evidence: `src/bentoml/_internal/ray/__init__.py`
- **Init** (source_file): all = "Runner", "Runnable" Evidence: `src/bentoml/_internal/runner/__init__.py`
- **Init** (source_file): R = t.TypeVar "R" P = t.ParamSpec "P" logger = logging.getLogger name class RunnerHandle ABC ⋮---- @abstractmethod def init self, runner: AbstractRunner - None: ... ⋮---- @abstractmethod async def is ready self, timeout: int - bool ⋮---- class DummyRunnerHandle RunnerHandle ⋮---- async def is ready self, timeout: int - bool Evidence: `src/bentoml/_internal/runner/runner_handle/__init__.py`
- **Base App** (source_file): logger = logging.getLogger name class BaseAppFactory abc.ABC ⋮---- is ready: bool = False ⋮---- @property @abc.abstractmethod def name self - str: ... ⋮---- @property def on startup self - list LifecycleHook ⋮---- @property def on shutdown self - list LifecycleHook ⋮---- @contextlib.asynccontextmanager async def lifespan self, app: Starlette - t.AsyncGenerator None, None ⋮---- ret = on startup app ⋮---- ret = on shutdown app ⋮---- def mark as ready self, : Starlette - None async def livez self, : Request - Response async def readyz self, : Request - Response def metrics self, : Request - Response ⋮---- metrics client = BentoMLContainer.metrics client.get ⋮---- def call self - Starlette def… Evidence: `src/bentoml/_internal/server/base_app.py`
- **NOTE: since IODescriptor.proto fields is a tuple, the order is preserved.** (source_file): logger = logging.getLogger name ⋮---- struct pb2 = LazyLoader "struct pb2", globals , "google.protobuf.struct pb2" def log exception request: pb.Request, exc info: ExcInfoType - None def create bento servicer service: Service - services.BentoServiceServicer ⋮---- class BentoServiceImpl services.BentoServiceServicer ⋮---- api = service.apis request.api name response = pb.Response output = None NOTE: since IODescriptor.proto fields is a tuple, the order is preserved. This is important so that we know the order of fields to process. We will use fields descriptor to determine how to process that request. ⋮---- we will check if the given fields list contains a pb.Multipart. input proto = getattr… Evidence: `src/bentoml/_internal/server/grpc/servicer/v1/__init__.py`
- **NOTE: since IODescriptor.proto fields is a tuple, the order is preserved.** (source_file): logger = logging.getLogger name ⋮---- def log exception request: pb.Request, exc info: ExcInfoType - None def create bento servicer service: Service - services.BentoServiceServicer ⋮---- class BentoServiceImpl services.BentoServiceServicer ⋮---- api = service.apis request.api name response = pb.Response output = None NOTE: since IODescriptor.proto fields is a tuple, the order is preserved. This is important so that we know the order of fields to process. We will use fields descriptor to determine how to process that request. ⋮---- we will check if the given fields list contains a pb.Multipart. input proto = getattr input data = await api.input.from proto input proto ⋮---- output = await api… Evidence: `src/bentoml/_internal/server/grpc/servicer/v1alpha1/__init__.py`
- **Running on startup callback.** (source_file): logger = logging.getLogger name ⋮---- health exception msg = "'grpcio-health-checking' is required for using health checking endpoints. Install with 'pip install grpcio-health-checking'." pb health = LazyLoader services health = LazyLoader health = LazyLoader class Server aio. server.Server ⋮---- runner statuses = runners ready = all await asyncio.gather runner statuses ⋮---- @property def options self - grpc.aio.ChannelArgumentType ⋮---- options: grpc.aio.ChannelArgumentType = ⋮---- @property def interceptors self - Interceptors ⋮---- interceptors: Interceptors = AsyncOpenTelemetryServerInterceptor ⋮---- access logger = logging.getLogger "bentoml.access" ⋮---- @property def handlers self -… Evidence: `src/bentoml/_internal/server/grpc_app.py`
- **more descriptive errors if output is available** (source_file): logger = logging.getLogger name DEFAULT INDEX HTML = """\ def log exception request: Request, exc info: t.Any = True - None class HTTPAppFactory BaseAppFactory ⋮---- timeout = BentoMLContainer.api server config.traffic.timeout.get ⋮---- @property def name self - str async def index view func self, : Request - Response async def docs view func self, : Request - Response ⋮---- @property def routes self - list BaseRoute ⋮---- routes = super .routes ⋮---- parent dir path = os.path.dirname os.path.realpath file ⋮---- api route endpoint = self. create api endpoint api ⋮---- route path = api.route ⋮---- route path = f"/{api.route}" ⋮---- @property def middlewares self - list Middleware ⋮---- middl… Evidence: `src/bentoml/_internal/server/http_app.py`
- **Streaming does not have batching implemented yet** (source_file): feedback logger = logging.getLogger "bentoml.feedback" logger = logging.getLogger name ⋮---- class RunnerAppFactory BaseAppFactory ⋮---- runners config = BentoMLContainer.runners config.get traffic = runners config.get "traffic", {} .copy ⋮---- def fallback ⋮---- max batch size = method.max batch size if method.config.batchable else -1 ⋮---- @property def name self - str def init metrics wrappers self, : Starlette - None ⋮---- metrics client = BentoMLContainer.metrics client.get max max batch size = max ⋮---- @property def on startup self - list LifecycleHook ⋮---- on startup = super .on startup ⋮---- @property def on shutdown self - list LifecycleHook ⋮---- on shutdown: list LifecycleHook… Evidence: `src/bentoml/_internal/server/runner_app.py`
- **Init** (source_file): all = "Service", "load" Evidence: `src/bentoml/_internal/service/__init__.py`
- **Init** (source_file): SUCCESS DESCRIPTION = "Successful Response" INFRA DECRIPTION = { all = "generate spec" INFRA TAG = Tag APP TAG = Tag def make api path api: InferenceAPI t.Any - str ⋮---- @lru cache maxsize=1 def make infra endpoints - dict str, PathItem def generate service components svc: Service - dict str, t.Any ⋮---- components: dict str, t.Any = {} ⋮---- api components = {} input components = api.input.openapi components ⋮---- output components = api.output.openapi components ⋮---- def generate spec svc: Service, , openapi version: str = "3.0.2" ⋮---- mounted app paths = {} schema components = {} ⋮---- openapi = get openapi Evidence: `src/bentoml/_internal/service/openapi/__init__.py`
- **Init** (source_file): P = t.ParamSpec "P" C = t.TypeVar "C" T = t.TypeVar "T" EXPERIMENTAL APIS: set str = set logger = logging.getLogger name def warn experimental api name: str - None ⋮---- msg = "'%s' is an EXPERIMENTAL API and is currently not yet stable. Proceed with caution!" ⋮---- api name = f. name if inspect.isfunction f else repr f def decorator func: t.Callable ..., t.Any - t.Callable P, t.Any ⋮---- @functools.wraps func def wrapper args: P.args, kwargs: P.kwargs - t.Any ⋮---- def add experimental docstring f: t.Callable P, t.Any - t.Callable P, t.Any ⋮---- @overload def first not none args: T None, default: T - T: ... ⋮---- @overload def first not none args: T None - T None: ... def first not none ar… Evidence: `src/bentoml/_internal/utils/__init__.py`
- **Alg** (source_file): T = t.TypeVar "T" class FixedBucket t.Generic T ⋮---- def init self, size: int def put self, v: T ⋮---- @property def data self def len self def getitem self, sl: slice class TokenBucket ⋮---- def init self, init amount: int = 0 def consume self, take amount: int, avg rate: float, burst size: int ⋮---- now = time.time inc = now - self. last consume time avg rate current amount = min inc + self. amount, burst size Evidence: `src/bentoml/_internal/utils/alg.py`
- **Init** (source_file): all = Evidence: `src/bentoml/_internal/utils/analytics/__init__.py`
- **Cli Events** (source_file): bento = return value total size = bento.total size ⋮---- num of runners = len bento.info.services - 1 ⋮---- num of runners = len bento.info.runners ⋮---- num of runners = 0 ⋮---- cli events map = {"bentos": {"build": cli bentoml build event}} Evidence: `src/bentoml/_internal/utils/analytics/cli_events.py`
- The remaining 5 evidence items are in `AI_CONTEXT_PACK.json` or `EVIDENCE_INDEX.json`.
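
The `TokenBucket` fragment listed above for `src/bentoml/_internal/utils/alg.py` lost its punctuation during extraction, so its exact logic cannot be read from this pack. As a reading aid only, here is a minimal sketch of the general token-bucket technique that the visible names (`consume`, `take_amount`, `avg_rate`, `burst_size`) suggest. The class name `SketchTokenBucket`, the injectable `now` argument, and the refill-then-deduct order are assumptions made for illustration, not BentoML's implementation.

```python
import time
from typing import Optional


class SketchTokenBucket:
    """Illustrative token bucket (hypothetical; not BentoML's code)."""

    def __init__(self, init_amount: float = 0.0, now: Optional[float] = None) -> None:
        self._amount = float(init_amount)
        # `now` is injectable so the behaviour can be checked without sleeping.
        self._last = time.monotonic() if now is None else now

    def consume(
        self,
        take_amount: float,
        avg_rate: float,
        burst_size: float,
        now: Optional[float] = None,
    ) -> bool:
        """Refill by elapsed time, cap at burst_size, then try to take tokens."""
        now = time.monotonic() if now is None else now
        refill = (now - self._last) * avg_rate
        current = min(self._amount + refill, burst_size)
        self._last = now
        if take_amount > current:
            # Not enough tokens: keep what has accumulated and reject.
            self._amount = current
            return False
        self._amount = current - take_amount
        return True
```

Any statement about how BentoML actually rate-limits should still be checked against the source file itself.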

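## Rules the host AI must follow

- **Treat this asset as pre-work context, not a runtime environment.** The AI Context Pack contains only an evidence-based understanding of the project, not the executable state of the target project. Evidence: `README.md`, `src/bentoml/_internal/server/README.md`, `src/bentoml_cli/worker/README.md`
- **When answering users, separate what can be previewed from what can only be verified after installation.** The value of a pre-install experience comes from reducing mistaken installs and misjudgments, not from posing as a real run. Evidence: `README.md`, `src/bentoml/_internal/server/README.md`, `src/bentoml_cli/worker/README.md`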

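## Questions the user should answer before starting

- In which host AI or local environment do you plan to use it?
- Do you only want to try the workflow first, or do you intend a real installation?
- What matters most to you: installation cost, output quality, or conflicts with existing rules?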

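## Acceptance criteria

- Every capability claim can be traced back to a file path in evidence_refs.
- AI_CONTEXT_PACK.md does not present a preview as a real run.
- The user can understand within 3 minutes who the project suits, what it can do, how to start, and where the risk boundaries are.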
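
The first criterion can be checked mechanically. The sketch below assumes each claim is a dictionary with a `claim_id` and an `evidence_refs` list of file paths; that record shape is hypothetical, since the real schema of `AI_CONTEXT_PACK.json` is not shown in this pack.

```python
from typing import Any, Dict, List


def claims_missing_evidence(claims: List[Dict[str, Any]]) -> List[str]:
    """Return the ids of claims that cite no evidence path.

    Assumed (hypothetical) record shape:
    {"claim_id": str, "evidence_refs": [str, ...]}
    """
    missing = []
    for claim in claims:
        refs = claim.get("evidence_refs") or []
        # Ignore empty or whitespace-only entries so they do not count as evidence.
        if not any(isinstance(ref, str) and ref.strip() for ref in refs):
            missing.append(claim["claim_id"])
    return missing


sample_claims = [
    {"claim_id": "clm_0001", "evidence_refs": ["README.md"]},
    {"claim_id": "clm_x", "evidence_refs": []},   # hypothetical failing claim
    {"claim_id": "clm_y"},                        # hypothetical claim with no field
]
```

An empty result means the criterion holds for the claims supplied; it says nothing about whether the cited files actually support each claim.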

---

## Doramagic Context Augmentation

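The content below reinforces the main Repomix/AI Context Pack body. The Human Manual only provides a reading skeleton; the pitfall log is converted into working constraints that the host AI must follow.

## Human Manual skeleton

Usage rule: this is only a reading route through the project and a set of salience signals, not an authority on facts. Specific facts must still go back to repo evidence / Claim Graph.

Hard rules for the host AI:
- Do not treat page titles, section order, summaries, or importance as evidence of project facts.
- When explaining the Human Manual skeleton, state explicitly that it is only a reading route and salience signal.
- Judgments about capability, installation, compatibility, runtime status, and risk must cite repo evidence, a source path, or the Claim Graph.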

- **Framework overview and system architecture**: importance `high`
  - source_paths: README.md, src/bentoml/__init__.py, src/bentoml/__main__.py, src/_bentoml_sdk/__init__.py, src/_bentoml_impl/__init__.py
- **Services, APIs, and the IO type system**: importance `high`
  - source_paths: src/_bentoml_sdk/service/__init__.py, src/_bentoml_sdk/decorators.py, src/_bentoml_sdk/method.py, src/_bentoml_sdk/io_models.py, src/_bentoml_sdk/service/factory.py
- **Model management and containerized builds**: importance `high`
  - source_paths: src/bentoml/_internal/models/model.py, src/bentoml/_internal/models/__init__.py, src/bentoml/_internal/frameworks/pytorch.py, src/bentoml/_internal/frameworks/tensorflow.py, src/bentoml/_internal/frameworks/transformers.py
- **Deployment, observability, Runners, and extension mechanisms**: importance `high`
  - source_paths: src/bentoml_cli/cli.py, src/bentoml_cli/serve.py, src/bentoml_cli/containerize.py, src/bentoml_cli/deployment.py, src/bentoml_cli/cloud.py

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `73c4dbead99be6515fa25fcd91e348ac30f5c22e`
- inspected_files: `pyproject.toml`, `README.md`, `docs/README.md`, `docs/source/conf.py`, `docs/source/data/build-bento-action.yaml`, `docs/source/data/containerize-and-push.yaml`, `docs/source/data/bentoml-setup.yaml`, `docs/source/data/deploy-bento-to-cloud.yaml`, `docs/source/_static/js/custom.js`, `docs/source/build-with-bentoml/snippets/metrics/runner_impl.py`, `docs/source/build-with-bentoml/snippets/metrics/metric_defs.py`, `docs/source/build-with-bentoml/snippets/tracing/bentoml_configuration.yaml`, `docs/source/build-with-bentoml/snippets/tracing/docker-compose.yml`, `docs/source/build-with-bentoml/snippets/grpc/python/request.py`, `docs/source/build-with-bentoml/snippets/grpc/node/request.js`, `examples/README.md`, `src/bentoml_cli/env.py`, `src/bentoml_cli/models.py`, `src/bentoml_cli/deployment.py`, `src/bentoml_cli/env_manager.py`

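Hard rules for the host AI:
- Without repo_clone_verified=true, do not claim to have read the source code.
- Without repo_inspection_verified=true, do not state judgments drawn from README/docs/package files as facts.
- Without quick_start_verified=true, do not claim that the Quick Start has been run successfully.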
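
These three rules amount to a gate from verification flags to permitted statements. A minimal sketch follows, assuming a flag that is not recorded counts as false; the `InspectionFlags` type and the statement labels are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InspectionFlags:
    # Assumption: a flag that the pack does not record is treated as False.
    repo_clone_verified: bool = False
    repo_inspection_verified: bool = False
    quick_start_verified: bool = False


def allowed_statements(flags: InspectionFlags) -> List[str]:
    """Map verification flags to the statements the host AI may make."""
    allowed = []
    if flags.repo_clone_verified:
        allowed.append("source_was_read")
    if flags.repo_inspection_verified:
        allowed.append("docs_judgments_as_fact")
    if flags.quick_start_verified:
        allowed.append("quick_start_ran")
    return allowed


# This pack records the first two flags as true and does not record the third.
pack_flags = InspectionFlags(repo_clone_verified=True, repo_inspection_verified=True)
```

With the flags recorded in this pack, `allowed_statements(pack_flags)` omits `quick_start_ran`, so the Quick Start must still be described as unverified.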

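## Doramagic Pitfall Constraints

These rules come from project-specific pitfalls found during Doramagic's discovery, validation, or compilation. The host AI must treat them as working constraints, not as ordinary explanatory text.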

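### Constraint 1: Source evidence: feature: support for pylock.toml

- Trigger: GitHub community evidence shows the project has an installation-related issue pending verification: feature: support for pylock.toml
- Why it matters: It may raise the cost of trial and production adoption for new users.
- Evidence: community_evidence:github | https://github.com/bentoml/BentoML/issues/5466 | The source discussion mentions Python-related conditions; re-check before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.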

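### Constraint 2: Source evidence: bug: Bentoml Pytorch model serve bug

- Trigger: GitHub community evidence shows the project has a configuration-related issue pending verification: bug: Bentoml Pytorch model serve bug
- Why it matters: It may affect upgrades, migration, or version selection.
- Evidence: community_evidence:github | https://github.com/bentoml/BentoML/issues/5365 | The source discussion mentions Python-related conditions; re-check before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.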

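### Constraint 3: Capability judgment depends on an assumption

- Trigger: README/documentation is current enough for a first validation pass.
- Host AI rule: Turn the assumption into a downstream validation checklist.
- Why it matters: If the assumption does not hold, the user will not get the promised capability.
- Evidence: capability.assumptions | github_repo:178976529 | https://github.com/bentoml/BentoML | README/documentation is current enough for a first validation pass.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.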

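### Constraint 4: Source evidence: BUG: IndexError in IODescriptor.from_output() with bare (unparameterized) iterator return annotations

- Trigger: GitHub community evidence shows the project has a runtime-related issue pending verification: BUG: IndexError in IODescriptor.from_output() with bare (unparameterized) iterator return annotations
- Why it matters: It may block installation or the first run.
- Evidence: community_evidence:github | https://github.com/bentoml/BentoML/issues/5625 | The source discussion mentions Python-related conditions; re-check before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.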

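### Constraint 5: Maintenance activity unknown

- Trigger: last_activity_observed was not recorded.
- Host AI rule: Add signals from recent GitHub commits, releases, and issue/PR responses.
- Why it matters: New, abandoned, and active projects get mixed together, which lowers confidence in the recommendation.
- Evidence: evidence.maintainer_signals | github_repo:178976529 | https://github.com/bentoml/BentoML | last_activity_observed missing
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.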

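### Constraint 6: Downstream validation risk item

- Trigger: no_demo
- Evidence: downstream_validation.risk_items | github_repo:178976529 | https://github.com/bentoml/BentoML | no_demo; severity=medium
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.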

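### Constraint 7: Scoring risk present

- Trigger: no_demo
- Why it matters: The risk affects whether the project is suitable for ordinary users to install.
- Evidence: risks.scoring_risks | github_repo:178976529 | https://github.com/bentoml/BentoML | no_demo; severity=medium
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.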

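### Constraint 8: Issue/PR response quality unknown

- Trigger: issue_or_pr_quality=unknown.
- Host AI rule: Sample recent issues and PRs and judge whether they go unhandled for long periods.
- Why it matters: Users cannot tell whether anyone will maintain the project when they hit a problem.
- Evidence: evidence.maintainer_signals | github_repo:178976529 | https://github.com/bentoml/BentoML | issue_or_pr_quality=unknown
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.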

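### Constraint 9: Release cadence unclear

- Trigger: release_recency=unknown.
- Host AI rule: Confirm that the latest release/tag matches the install command in the README.
- Why it matters: Install commands and documentation may lag behind the code, which raises the chance that users hit pitfalls.
- Evidence: evidence.maintainer_signals | github_repo:178976529 | https://github.com/bentoml/BentoML | release_recency=unknown
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later validation evidence clearly shows it has been closed.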
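
Every constraint above ends with the same hard boundary, which can be read as a simple rule: a pitfall stays open until closing evidence is attached. The sketch below illustrates that rule under assumptions; the `Pitfall` type, its `closing_evidence` field, and the label strings are hypothetical and are not part of the pack's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pitfall:
    title: str
    evidence: str
    # Hypothetical field: a reference to later validation evidence, if any.
    closing_evidence: Optional[str] = None


def status_label(pitfall: Pitfall) -> str:
    """Only report a pitfall as closed when closing evidence is present."""
    if pitfall.closing_evidence and pitfall.closing_evidence.strip():
        return "closed (cite closing evidence)"
    return "open (needs verification)"


release_cadence = Pitfall(
    title="Release cadence unclear",
    evidence="evidence.maintainer_signals | release_recency=unknown",
)
```

Under this sketch none of the constraints listed here would be labeled closed, because the pack attaches no closing evidence to any of them.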