# deepchecks - Doramagic AI Context Pack

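> Positioning: a pre-install preview and assessment asset. It gives the host AI a good starting point, but it does not mean the target project has been installed, run, or verified.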

## Sufficiency Principle

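- **Sufficiency, not compression**: An AI Context Pack should be complete enough that the host AI understands, before starting work, the project's value, capability boundaries, entry points, risks, and evidence sources. It may be organized in layers, but the shortest possible summary is not the goal.
- **Compression strategy**: Compress only noise and duplication; never compress context that affects judgment or the quality of the work.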

## How the Host AI Should Use This Pack

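You are reading the AI Context Pack that Doramagic compiled for deepchecks. Treat it as pre-work context: help the user understand who the project suits, what it can do, how to get started, what must be verified after installation, and where the risks are. Do not claim that you have installed, run, or executed the target project.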

## Claim Consumption Rules

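- **Sources of fact**: Repo Evidence plus the Claim/Evidence Graph. The Human Wiki supplies only salience, terminology, and narrative structure.
- **Minimum status for a fact**: `supported`
- `supported`: may be used as a project fact, but the answer must cite the claim_id and the evidence path.
- `weak`: only a low-confidence lead; the user must be asked to verify it further.
- `inferred`: only for risk notes or open questions; never present it as a project fact.
- `unverified`: must not be used as a fact; state explicitly that the evidence is insufficient.
- `contradicted`: show the conflicting sources; do not pick a version on the user's behalf.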
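The consumption rules above amount to a small lookup from claim status to how a statement may be phrased. A minimal sketch, assuming claims arrive as simple records with a `claim_id`, a `status`, and evidence paths; `Claim`, `POLICY`, and `render` are hypothetical names, not a Doramagic or deepchecks API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical policy table mirroring the claim consumption rules (not a real API).
POLICY = {
    "supported": "fact",           # usable as a fact, but must be cited
    "weak": "lead",                # low-confidence lead, ask the user to verify
    "inferred": "risk",            # only as a risk note or open question
    "unverified": "insufficient",  # say the evidence is insufficient
    "contradicted": "conflict",    # show every conflicting source
}


@dataclass
class Claim:
    claim_id: str
    status: str
    evidence: list[str]


def render(claim: Claim, text: str) -> str:
    """Phrase a statement according to its claim status; only cited 'supported' claims read as facts."""
    usage = POLICY.get(claim.status, "insufficient")  # unknown statuses are treated as unverified
    refs = ", ".join(claim.evidence)
    if usage == "fact" and refs:  # supported claims must carry claim_id and an evidence path
        return f"{text} [{claim.claim_id}; {refs}]"
    if usage == "lead":
        return f"Low-confidence lead, please verify: {text} [{claim.claim_id}]"
    if usage == "risk":
        return f"Open question / risk: {text} [{claim.claim_id}]"
    if usage == "conflict":
        return f"Sources conflict, not resolved: {text} [{claim.claim_id}; {refs or 'sources not listed'}]"
    return f"Evidence is insufficient to state: {text}"
```

Note that a `supported` claim without an evidence path falls through to "insufficient", which follows the rule that facts must be cited.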

## Who It Suits Best

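- **Users who want to understand an open-source project's value and limits before installing it**: current evidence comes mainly from the project documentation. Evidence: `README.md` Claim: `clm_0014` supported 0.86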

## What It Can Do

- **Tabular Data Validation** (pre-install preview available): Comprehensive validation of tabular datasets using built-in checks for data integrity, distribution, and quality issues. Evidence: `deepchecks/core/checks.py`, `deepchecks/core/context.py`, `requirements/requirements.txt` Claim: `clm_0001` supported 0.86, `clm_0004` supported 0.86, `clm_0008` supported 0.86, `clm_0010` supported 0.86
- **NLP Text Classification Validation** (pre-install preview available): Validate text classification models with checks for data integrity, drift detection, and performance evaluation on text data. Evidence: `deepchecks/nlp/__init__.py`, `deepchecks/nlp/text_data.py`, `deepchecks/nlp/task_type.py`, `deepchecks/nlp/base_checks.py`, and others Claim: `clm_0002` supported 0.86, `clm_0003` supported 0.86, `clm_0005` supported 0.86, `clm_0007` supported 0.86
- **NLP Token Classification Validation** (pre-install preview available): Validate token-level classification (NER) tasks with specialized checks and metrics. Evidence: `deepchecks/nlp/task_type.py`, `deepchecks/nlp/context.py`, `deepchecks/nlp/input_validations.py`, `deepchecks/nlp/datasets/token_classification/__init__.py` Claim: `clm_0002` supported 0.86, `clm_0003` supported 0.86
- **Drift Detection (Train-Test)** (pre-install preview available): Detect distribution drift between training and test datasets using statistical methods and model-based approaches. Evidence: `deepchecks/nlp/checks/train_test_validation/__init__.py`, `deepchecks/nlp/utils/multivariate_embeddings_drift_utils.py`, `deepchecks/core/checks.py` Claim: `clm_0001` supported 0.86, `clm_0004` supported 0.86, `clm_0005` supported 0.86, `clm_0008` supported 0.86
- **Text Embeddings Drift Detection** (pre-install preview available): Detect semantic drift in text data using embeddings with dimension reduction (UMAP/PCA) and domain classification. Evidence: `deepchecks/nlp/utils/multivariate_embeddings_drift_utils.py`, `deepchecks/nlp/text_data.py` Claim: `clm_0002` supported 0.86, `clm_0004` supported 0.86, `clm_0005` supported 0.86, `clm_0007` supported 0.86
- **Frequent Substrings Detection** (pre-install preview available): Detect repeated or copy-pasted text content in datasets by analyzing n-gram frequencies. Evidence: `deepchecks/nlp/checks/data_integrity/frequent_substrings.py` Claim: `clm_0006` supported 0.86
- **Text Property Outliers Detection** (pre-install preview available): Identify outlier samples based on text properties using statistical methods (LOF) for numeric and categorical properties. Evidence: `deepchecks/nlp/checks/data_integrity/text_property_outliers.py`, `deepchecks/nlp/text_data.py` Claim: `clm_0002` supported 0.86, `clm_0005` supported 0.86, `clm_0007` supported 0.86
- **Custom Check Creation Framework** (pre-install preview available): Extend Deepchecks by creating custom checks that inherit from base check classes with condition evaluation support. Evidence: `deepchecks/core/checks.py`, `deepchecks/core/check_result.py` Claim: `clm_0001` supported 0.86, `clm_0004` supported 0.86, `clm_0008` supported 0.86, `clm_0012` supported 0.86
- **HTML Report Export** (pre-install preview available): Export check and suite results as standalone HTML files for sharing and CI/CD integration. Evidence: `deepchecks/core/display.py`, `deepchecks/core/suite.py` Claim: `clm_0009` supported 0.86, `clm_0010` supported 0.86, `clm_0011` supported 0.86, `clm_0012` supported 0.86
- **Jupyter Notebook Display** (pre-install preview available): Rich inline display of check results in Jupyter/IPython environments with interactive widgets and plots. Evidence: `deepchecks/core/display.py`, `deepchecks/core/serialization/check_result/ipython.py`, `requirements/requirements.txt` Claim: `clm_0001` supported 0.86, `clm_0009` supported 0.86, `clm_0010` supported 0.86
- **Suite Orchestration** (pre-install preview available): Organize multiple checks into suites that run sequentially with aggregated results and reporting. Evidence: `deepchecks/core/suite.py`, `deepchecks/nlp/suite.py` Claim: `clm_0009` supported 0.86, `clm_0011` supported 0.86, `clm_0012` supported 0.86
- **Check Result Serialization** (pre-install preview available): Serialize and deserialize check results to and from JSON for storage and cross-environment result sharing. Evidence: `deepchecks/core/check_result.py`, `deepchecks/core/suite.py` Claim: `clm_0008` supported 0.86, `clm_0009` supported 0.86, `clm_0011` supported 0.86, `clm_0012` supported 0.86
- **Model Performance Metrics** (pre-install preview available): Evaluate model performance using built-in and custom scorers with support for classification and token-level metrics. Evidence: `deepchecks/nlp/metric_utils/__init__.py`, `deepchecks/nlp/metric_utils/scorers.py` Claim: `clm_0013` supported 0.86
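To make the Frequent Substrings Detection idea concrete without installing anything, here is a minimal stdlib sketch that counts word n-grams shared across samples. It is an illustration under our own assumptions (lowercasing, whitespace tokenization, and the hypothetical helpers `word_ngrams` and `frequent_ngrams` with thresholds `n` and `min_samples`), not deepchecks' `FrequentSubstrings` implementation.

```python
from __future__ import annotations

from collections import Counter


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the distinct lowercase word n-grams in one text sample."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def frequent_ngrams(samples: list[str], n: int = 3, min_samples: int = 2) -> dict[str, int]:
    """Map each n-gram to the number of samples containing it, keeping those seen in >= min_samples."""
    counts = Counter()
    for text in samples:
        counts.update(word_ngrams(text, n))  # a set, so each n-gram counts once per sample
    return {" ".join(gram): c for gram, c in counts.most_common() if c >= min_samples}


if __name__ == "__main__":
    comments = [
        "Click here to subscribe now",
        "Great video, click here to subscribe",
        "Nothing repeated in this one",
    ]
    print(frequent_ngrams(comments))
```

Counting per sample rather than per occurrence means one long sample that repeats a phrase does not look like copy-paste across the dataset.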

## How to Get Started

- `pip install deepchecks -U --user` Evidence: `README.md` Claim: `clm_0015` supported 0.86
- `pip install deepchecks-installer` Evidence: `README.md` Claim: `clm_0016` supported 0.86
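Before running either command, you can check, read-only, whether a distribution is already present in the current Python environment. A minimal sketch built on the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical, and the code installs and modifies nothing.

```python
from __future__ import annotations

from importlib import metadata


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent (read-only)."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names taken from the two quick-start commands.
    for name in ("deepchecks", "deepchecks-installer"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

This only inspects the interpreter it runs in, so run it inside the sandbox environment you intend to use.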

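## Decision Card Before Proceeding

- **Current recommendation**: sandbox trial install only
- **Why**: the project shows install commands, host-configuration changes, or local-write signals, so it should not go straight into your main environment; try it in an isolated environment first.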

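### 30-Second Verdict

- **What to do now**: sandbox trial install only
- **Smallest safe next step**: run the Prompt Preview first; if you still want to install, do so only in an isolated environment
- **Don't trust yet**: real output quality cannot be trusted before installation.
- **Proceeding will touch**: command execution, host AI context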

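### What You Can Trust Now

- **Audience signal: users who want to understand an open-source project's value and limits before installing it** (supported): backed by a supported claim or project evidence, but that still does not equal real post-install results. Evidence: `README.md` Claim: `clm_0014` supported 0.86
- **Capability present: Tabular Data Validation** (supported): you can trust that the project contains signals of this capability; whether it fits your specific task still needs a trial or post-install verification. Evidence: `deepchecks/core/checks.py`, `deepchecks/core/context.py`, `requirements/requirements.txt` Claim: `clm_0001` supported 0.86, `clm_0004` supported 0.86, `clm_0008` supported 0.86, `clm_0010` supported 0.86
- **Capability present: NLP Text Classification Validation** (supported): you can trust that the project contains signals of this capability; whether it fits your specific task still needs a trial or post-install verification. Evidence: `deepchecks/nlp/__init__.py`, `deepchecks/nlp/text_data.py`, `deepchecks/nlp/task_type.py`, `deepchecks/nlp/base_checks.py`, and others Claim: `clm_0002` supported 0.86, `clm_0003` supported 0.86, `clm_0005` supported 0.86, `clm_0007` supported 0.86
- **Capability present: NLP Token Classification Validation** (supported): you can trust that the project contains signals of this capability; whether it fits your specific task still needs a trial or post-install verification. Evidence: `deepchecks/nlp/task_type.py`, `deepchecks/nlp/context.py`, `deepchecks/nlp/input_validations.py`, `deepchecks/nlp/datasets/token_classification/__init__.py` Claim: `clm_0002` supported 0.86, `clm_0003` supported 0.86
- **Capability present: Drift Detection (Train-Test)** (supported): you can trust that the project contains signals of this capability; whether it fits your specific task still needs a trial or post-install verification. Evidence: `deepchecks/nlp/checks/train_test_validation/__init__.py`, `deepchecks/nlp/utils/multivariate_embeddings_drift_utils.py`, `deepchecks/core/checks.py` Claim: `clm_0001` supported 0.86, `clm_0004` supported 0.86, `clm_0005` supported 0.86, `clm_0008` supported 0.86
- **Capability present: Text Embeddings Drift Detection** (supported): you can trust that the project contains signals of this capability; whether it fits your specific task still needs a trial or post-install verification. Evidence: `deepchecks/nlp/utils/multivariate_embeddings_drift_utils.py`, `deepchecks/nlp/text_data.py` Claim: `clm_0002` supported 0.86, `clm_0004` supported 0.86, `clm_0005` supported 0.86, `clm_0007` supported 0.86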

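### What You Cannot Trust Yet

- **Real output quality cannot be trusted before installation.** (unverified): the Prompt Preview only shows how the project guides work; it cannot prove result quality on a real project.
- **Host AI version compatibility cannot be trusted before installation.** (unverified): loading rules and version differences across hosts such as Claude, Cursor, Codex, and Gemini must be verified in a real environment.
- **That it will not alter existing host AI behavior cannot be taken on trust.** (inferred): Skill, plugin, and AGENTS/CLAUDE/GEMINI instructions may change the host AI's default behavior.
- **Safe rollback cannot be assumed.** (unverified): unless the project provides explicit uninstall and restore instructions, verify in an isolated environment first.
- **Is it compatible with the user's current host AI version after a real install?** (unverified): compatibility can only be verified in the actual host environment.
- **Does the project's output quality meet the user's specific task?** (unverified): a pre-install preview only shows the flow and boundaries; it cannot replace a real evaluation.
- **Do the install commands need network access, elevated permissions, or global writes?** (unverified): this affects install risk in both enterprise and personal environments. Evidence: `README.md`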

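### What Proceeding Will Touch

- **Command execution**: the package manager, network downloads, local plugin directories, project configuration, or the user's home directory (for example, `pip install --user` installs into the per-user site-packages directory under the home directory). Reason: running the first command can already change the environment, so decide whether it is worth running before you run it. Evidence: `README.md`
- **Host AI context**: the AI Context Pack, Prompt Preview, Skill routing, risk rules, and project facts. Reason: imported context shapes the host AI's later judgments, so unverified items must not be presented as facts.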

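### Smallest Safe Next Steps

- **Run the Prompt Preview first**: use the interactive pre-install trial to judge whether the workflow fits; it needs no authorization and changes nothing in your environment. (Applies to: any project, especially when output quality is unknown.)
- **Trial-install only in an isolated directory or test account**: this keeps install commands from polluting your main host AI, real projects, or home directory. (Applies to: projects with command-execution, plugin-configuration, or local-write signals.)
- **After installing, verify just one minimal task**: check loading, compatibility, output quality, and rollback before deciding to use it more deeply. (Applies to: moving from a trial into a real workflow.)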

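### Exit Strategy

- **Preserve the pre-install state**: record the original host configuration and project state so you can later tell whether the setup is recoverable.
- **Record install commands and write paths**: when there are no explicit uninstall instructions, at least know which directories or configuration files need manual cleanup.
- **No rollback path means no main environment**: the lack of a rollback path is a blocker; do not proceed on trust or luck.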
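For the Python side of a trial install, one concrete way to record what an install wrote is to snapshot the installed distributions before and after it. A minimal sketch, assuming the install targets the same interpreter you snapshot; `snapshot` and `diff` are hypothetical helper names, and files written outside site-packages (configs, caches, host plugin directories) are not covered.

```python
from __future__ import annotations

from importlib import metadata


def snapshot() -> dict[str, str]:
    """Record {distribution name: version} for every distribution visible to this interpreter."""
    snap = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken metadata
            snap[name.lower()] = dist.version
    return snap


def diff(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Report which distributions an install added, removed, or changed between two snapshots."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(k for k in set(before) & set(after) if before[k] != after[k]),
    }
```

Take `before = snapshot()`, run the install inside the sandbox, then call `diff(before, snapshot())`; the `added` and `changed` lists are a starting point for manual cleanup.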

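## What Can Only Be Previewed

- Explain who the project suits and what it can do
- Demonstrate typical conversation flows based on the project documentation
- Help the user decide whether it is worth installing or researching further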

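## What Must Be Verified After Installation

- Actually installing a Skill, plugin, or CLI
- Running scripts, modifying local files, or accessing external services
- Verifying real output quality, performance, and compatibility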

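## Boundary and Risk Card

- **Mistaking the pre-install preview for a real run**: users may overestimate how much configuration, permission, and compatibility verification has already been done. Mitigation: clearly distinguish prompt_preview_can_do from runtime_required. Claim: `clm_0017` inferred 0.45
- **Command execution modifies the local environment**: install commands may write to the user's home directory, host plugin directories, or project configuration. Mitigation: run them first in an isolated environment or a test account. Evidence: `README.md` Claim: `clm_0018` supported 0.86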
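- **To confirm**: is it compatible with the user's current host AI version after a real install? Reason: compatibility can only be verified in the actual host environment.
- **To confirm**: does the project's output quality meet the user's specific task? Reason: a pre-install preview only shows the flow and boundaries; it cannot replace a real evaluation.
- **To confirm**: do the install commands need network access, elevated permissions, or global writes? Reason: this affects install risk in both enterprise and personal environments.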

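## Pre-Work Context

### Loading Order

- First read how_to_use.host_ai_instruction to establish the boundaries of this pre-install assessment asset.
- Read claim_graph_summary to confirm that facts come from the Claim/Evidence Graph, not from the Human Wiki narrative.
- Then read intended_users, capabilities, and quick_start_candidates to judge whether the user is a match.
- When a concrete task is needed, check role_skill_index first, then evidence_index.
- For real installation, file modification, network access, performance, or compatibility questions, switch to risk_card and boundaries.runtime_required.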
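The loading order above can be expressed as a small lookup routine. This sketch assumes the pack is available as a nested Python dict keyed by the field names listed (the real serialized schema is an assumption), and `LOAD_ORDER`, `RUNTIME_TOPICS`, `get_path`, and `next_sections` are hypothetical names.

```python
from __future__ import annotations

# Default reading order for a fresh session (field names from the loading-order list).
LOAD_ORDER = [
    "how_to_use.host_ai_instruction",
    "claim_graph_summary",
    "intended_users",
    "capabilities",
    "quick_start_candidates",
]
# Hypothetical topic labels that must be routed to the runtime risk sections.
RUNTIME_TOPICS = {"install", "file_write", "network", "performance", "compatibility"}


def get_path(pack: dict, dotted: str):
    """Resolve a dotted key such as 'how_to_use.host_ai_instruction'; return None if absent."""
    node = pack
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def next_sections(topic: str | None = None) -> list[str]:
    """Sections to read, in order, for a given kind of request."""
    if topic in RUNTIME_TOPICS:
        return ["risk_card", "boundaries.runtime_required"]
    if topic == "task":
        return ["role_skill_index", "evidence_index"]
    return list(LOAD_ORDER)
```

Returning `None` for a missing section, rather than raising, lets the caller fall back to the "missing_evidence" handling instead of inventing content.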

### Task Routing

- **Tabular Data Validation**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/core/checks.py`, `deepchecks/core/context.py`, `requirements/requirements.txt` Claim: `clm_0001` supported 0.86, `clm_0004` supported 0.86, `clm_0008` supported 0.86, `clm_0010` supported 0.86
- **NLP Text Classification Validation**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/nlp/__init__.py`, `deepchecks/nlp/text_data.py`, `deepchecks/nlp/task_type.py`, `deepchecks/nlp/base_checks.py`, and others Claim: `clm_0002` supported 0.86, `clm_0003` supported 0.86, `clm_0005` supported 0.86, `clm_0007` supported 0.86
- **NLP Token Classification Validation**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/nlp/task_type.py`, `deepchecks/nlp/context.py`, `deepchecks/nlp/input_validations.py`, `deepchecks/nlp/datasets/token_classification/__init__.py` Claim: `clm_0002` supported 0.86, `clm_0003` supported 0.86
- **Drift Detection (Train-Test)**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/nlp/checks/train_test_validation/__init__.py`, `deepchecks/nlp/utils/multivariate_embeddings_drift_utils.py`, `deepchecks/core/checks.py` Claim: `clm_0001` supported 0.86, `clm_0004` supported 0.86, `clm_0005` supported 0.86, `clm_0008` supported 0.86
- **Text Embeddings Drift Detection**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/nlp/utils/multivariate_embeddings_drift_utils.py`, `deepchecks/nlp/text_data.py` Claim: `clm_0002` supported 0.86, `clm_0004` supported 0.86, `clm_0005` supported 0.86, `clm_0007` supported 0.86
- **Frequent Substrings Detection**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/nlp/checks/data_integrity/frequent_substrings.py` Claim: `clm_0006` supported 0.86
- **Text Property Outliers Detection**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/nlp/checks/data_integrity/text_property_outliers.py`, `deepchecks/nlp/text_data.py` Claim: `clm_0002` supported 0.86, `clm_0005` supported 0.86, `clm_0007` supported 0.86
- **Custom Check Creation Framework**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/core/checks.py`, `deepchecks/core/check_result.py` Claim: `clm_0001` supported 0.86, `clm_0004` supported 0.86, `clm_0008` supported 0.86, `clm_0012` supported 0.86
- **HTML Report Export**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/core/display.py`, `deepchecks/core/suite.py` Claim: `clm_0009` supported 0.86, `clm_0010` supported 0.86, `clm_0011` supported 0.86, `clm_0012` supported 0.86
- **Jupyter Notebook Display**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/core/display.py`, `deepchecks/core/serialization/check_result/ipython.py`, `requirements/requirements.txt` Claim: `clm_0001` supported 0.86, `clm_0009` supported 0.86, `clm_0010` supported 0.86
- **Suite Orchestration**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/core/suite.py`, `deepchecks/nlp/suite.py` Claim: `clm_0009` supported 0.86, `clm_0011` supported 0.86, `clm_0012` supported 0.86
- **Check Result Serialization**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/core/check_result.py`, `deepchecks/core/suite.py` Claim: `clm_0008` supported 0.86, `clm_0009` supported 0.86, `clm_0011` supported 0.86, `clm_0012` supported 0.86
- **Model Performance Metrics**: first use role_skill_index / evidence_index to help the user pick a usable role, Skill, or workflow. Boundary: pre-install Prompt experience available. Evidence: `deepchecks/nlp/metric_utils/__init__.py`, `deepchecks/nlp/metric_utils/scorers.py` Claim: `clm_0013` supported 0.86
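As a concrete illustration of what the Drift Detection (Train-Test) route measures, here is the two-sample Kolmogorov-Smirnov statistic for one numeric feature or text property, in plain Python. It is one common univariate drift score chosen for illustration; the deepchecks checks listed above may use different or additional methods (the pack also mentions model-based and embedding-based approaches), and `ks_statistic` is a hypothetical helper name.

```python
from __future__ import annotations

from bisect import bisect_right


def ks_statistic(train: list[float], test: list[float]) -> float:
    """Largest gap between the empirical CDFs of two samples.

    0.0 for identical samples; 1.0 when every value of one sample lies below every value of the other.
    """
    if not train or not test:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(train), sorted(test)
    gap = 0.0
    # Both empirical CDFs are step functions, so the maximum gap occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of train values <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of test values <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap
```

A score near 0 means the two samples look alike; how large a score counts as drift is a threshold you would have to choose and validate for your own data.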

### Context Size

- Total files: 558
- Important files covered: 40/558 (about 7%)
- Evidence index entries: 76
- Role / Skill entries: 1

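### Handling Insufficient Evidence

- **missing_evidence**: say that the evidence is insufficient and ask the user for the target file, the README passage, or a post-install verification record; do not fill in facts.
- **out_of_scope_request**: say that the task is outside the evidence scope of this AI Context Pack, and suggest checking the Human Manual or verifying after a real install.
- **runtime_request**: provide the pre-install checklist and the source of each command, but do not run commands for the user or claim they have been run.
- **source_conflict**: show the conflicting sources together, mark the item as pending verification, and do not pick a version.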

## Prompt Recipes

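### Fit Assessment

- Goal: decide whether this project fits the user's current task.
- Expected output: a fit verdict, key reasons, evidence citations, what can be previewed before installing, what must be verified after installing, and next-step suggestions.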

```text
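Based on the deepchecks AI Context Pack, first ask me 3 essential questions, then judge whether it fits my task. The answer must cover: who it suits, what it can do, what it cannot do, whether it is worth installing, and where the evidence comes from. Every project fact must cite evidence_refs, source_paths, or claim_id.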
```

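### Pre-Install Experience

- Goal: let the user feel the core workflow before installing, without passing off the preview as real capability or a marketing promise.
- Expected output: a boundary-labeled experience script, a post-install verification checklist, and cautious advice; no promises of real execution and no hard-sell language.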

```text
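Treat deepchecks as a pre-install experience asset, not as an installed tool or a real runtime environment.

Output exactly four sections:
1. First ask me 3 essential questions.
2. Give an "experience script": use the three labels [Previewable before install], [Must verify after install], and [Insufficient evidence] to show how it might guide the workflow.
3. Give a post-install verification checklist: list which capabilities can only be confirmed after a real install, real host loading, and a real project run.
4. Give cautious advice: you may only say "worth further research / a trial install", "provide more information before deciding", or "not recommended to continue"; do not endorse the project.

Hard boundaries:
- Do not claim to have installed, run, executed tests, modified files, or produced real results.
- Do not use promissory phrases such as "auto-adapts", "guaranteed to pass", "perfect fit", or "strongly recommend installing".
- When describing post-install behavior, use a conditional such as "If installation succeeds and the host loads the Skill correctly, it might...".
- Write the experience script only as "sample lines / hypothetical flow": use "might ask / might suggest / might show", never "has written, has generated, has passed, is running, is generating".
- The Prompt Preview does not provide install commands; if the user is preparing a trial install, only advise reading the Quick Start and the Risk Card first and verifying in an isolated environment.
- Every project fact must come from a supported claim, evidence_refs, or source_paths; inferred/unverified items may only appear as risks or open questions.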

```

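### Role / Skill Selection

- Goal: pick the best-matching asset from the project's roles or Skills.
- Expected output: a list of candidate roles or Skills, each with its applicable scenarios, evidence paths, risk boundaries, and whether post-install verification is needed.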

```text
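Read role_skill_index and, based on my target task, recommend the 3-5 most relevant roles or Skills. For each recommendation, state the applicable scenario, likely output, risk boundaries, and evidence_refs.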
```

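### Risk Pre-Check

- Goal: identify environment, permission, rule-conflict, and quality risks before installing or adopting the project.
- Expected output: a checklist covering environment, permissions, dependencies, licensing, host conflicts, quality risks, and unknowns.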

```text
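Based on risk_card, boundaries, and quick_start_candidates, give me a pre-install risk checklist. Do not run commands for me; only explain what I should check, why I should check it, and what the impact would be if the check fails.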
```

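### Host AI Kickoff Instruction

- Goal: turn the project context into a host AI instruction to use before a conversation starts.
- Expected output: a kickoff instruction with clear boundaries and explicit evidence citations, ready to paste into the host AI.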

```text
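Based on the deepchecks AI Context Pack, generate a kickoff instruction I can paste into my host AI. The instruction must respect not_runtime=true and must not claim that the project has been installed, has run, or has produced real results.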
```


## Role / Skill Index

- 1 role / Skill / project-doc entry indexed.

- **🧩 Components** (project_doc): Activation hint: consult this when the user needs to understand the project structure, installation method, or boundaries. Evidence: `README.md`

## Evidence Index

- 76 evidence entries indexed.

- **🧩 Components** (documentation): Evidence: `README.md`
- **License** (source_file): "Deepchecks Continuous Validation and Testing for ML Models & Data". Evidence: `LICENSE`
- **Meta** (source_file): conda recipe for the `deepchecks` package, with the version read from `setup.py` data; home and dev URL https://github.com/deepchecks/deepchecks; license AGPLv3 (`../LICENSE`); summary "Package for validating your machine learning model and data"; docs https://docs.deepchecks.com; build script `{{ PYTHON }} -m pip install . -vv`, `noarch: python`. Host requirements (version bounds as extracted, operators lost): category_encoders 2.3.0, dataclasses 0.6, ipykernel 4.10.1 / 5.3.0, ipython 5.5.0 / 7.15.0 / 7.5.0, ipywidgets 7.6.5, matplotlib 3.3.3, numpy 1.19, pandas 1.1.5, pip, plotly 5.4.0, python, scikit-le… Evidence: `conda-recipe/meta.yaml`
- **Package Init** (source_file): sets the package version from `version('deepchecks')` and reads Plotly renderer backends from `pio.renderers.default.split('+')`. A NOTE marks a temporary backward-compatibility hack, to be removed in later versions: the module in `sys.modules` is replaced by a `SubstituteModule(types.ModuleType)` whose attribute access emits a deprecation warning when base tabular functionality is imported from the top-level `deepchecks` package; a TODO notes that Python 3.7+ allows module-level `__getattr__` instead… Evidence: `deepchecks/__init__.py`
- **Checks** (source_file): module defining an `__all__` export list. Evidence: `deepchecks/checks.py`
- **Docs Requirements** (source_file): sphinx==4.5.0, nbsphinx 0.8.7, pydata-sphinx-theme 0.7.2, sphinx-gallery 0.10.1, pypandoc 1.7.2, sphinx-reredirects 0.0.1, sphinx-design 0.3.0, and docutils, with ipywidgets installed from the fork `git+https://github.com/deepchecks/ipywidgets.git@8ac487b64ffd48dbeaa0a47d1997be27f6052dbc`. Evidence: `docs/requirements.txt`
- **Readme** (source_file): heading "Open from Docs - Recommended". Evidence: `examples/README.rst`
- **NLP Dev Requirements** (source_file): nltk 3.8.1 (with a Python-version marker; not directly required, pinned by Snyk to avoid a vulnerability), datasets, textblob, transformers 4.0.0, sentence-transformers 3.0.0. Evidence: `requirements/dev-nlp-requirements.txt`
- **Dev Requirements** (source_file): pylint==2.13.5, pydocstyle, flake8==4.0.1, flake8-spellcheck, flake8-eradicate, flake8-rst, isort; a comment notes that pandas 1.3.5 is the last version to support Python 3.7. Evidence: `requirements/dev-requirements.txt`
- **NLP Property Requirements** (source_file): fasttext-wheel, bounded by 0.8.0 and `<0.9.3`. Evidence: `requirements/nlp-prop-requirements.txt`
- **NLP Requirements** (source_file): seqeval 1.0.0, nltk 3.8.1 (the same Snyk pin), textblob 0.17.1, umap-learn, transformers 4.0.0, huggingface_hub, sentence-transformers 3.0.0 (with a Python 3.8 marker). Evidence: `requirements/nlp-requirements.txt`
- **Requirements** (source_file): pandas 1.1.5, numpy 1.19 / 1.22.2 (selected by Python-version markers; a comment marks the latter as required for Python 3.8+), scikit-learn 0.23.2, jsonpickle 2, PyNomaly==0.3.4. Evidence: `requirements/requirements.txt`
- **Vision Requirements** (source_file): pytorch-ignite 0.4.8, opencv-python 4.5.5.62, albumentations 1.1.0 / 0.4.0, seaborn 0.1.0, imagehash 4.0.0, lxml 4.0.0. Evidence: `requirements/vision-requirements.txt`
- **Setup** (source_file): defines `DEEPCHECKS = "deepchecks"`, a supported-Python-versions constant (value truncated), and a Python-versioning regex validator; cached (`lru_cache`) helpers read the version string from the first line of the VERSION file, build the description, and read requirement files from `requirements/` into a dict from which the main requirements, dependency links, and extra requirements are taken… Evidence: `setup.py`
- **Init** (source_file): `__all__` export list. Evidence: `deepchecks/core/__init__.py`
- **Check Result** (source_file): exports `CheckResult`, `CheckFailure`, `BaseCheckResult`, and `DisplayMap`. `DisplayMap` maps strings to lists of display items, where an item is a string, `pd.DataFrame`, `Styler`, Plotly `BaseFigure`, display callable, or nested `DisplayMap`. `BaseCheckResult` carries an optional `check`, an optional `header`, and `run_time` (default 0); a static `from_json` deserializes with `jsonpickle.loads` and dispatches on the `'type'` field; `get_header` returns the header if defined, otherwise the check class name… Evidence: `deepchecks/core/check_result.py`
- **Checks** (source_file): `DatasetKind` enum (`TRAIN = 'Train'`, `TEST = 'Test'`); `CheckMetadata` TypedDict (name, params, summary); `CheckConfig` TypedDict (module name, class name, optional version); abstract `BaseCheck` holding an ordered dict of conditions and a conditions index, with an abstract `run` that returns a `CheckResult` and a conditions-decision step that calls each condition function with `result.value` and the condition's params, turning exceptions into an `Exception in condition: …` message… Evidence: `deepchecks/core/checks.py`
- **Context** (source_file): `BaseContext(ABC)` exposes `train`, `test`, and `model` (default `None`) and a `with_display` flag (default `True`) as properties; `assert_task_type` reports an error when the model's task type is not among the expected types; an abstract `task_type` property; `get_data_by_kind(kind: DatasetKind)` returns the matching dataset; and `finalize_check_result(check_result, check, dataset_kind=None)` runs final processing on a check result, including validating the result type… Evidence: `deepchecks/core/context.py`
- **Display** (source_file): exports `DisplayableResult`, `save_as_html`, and `display_in_gui`. The abstract `DisplayableResult` requires `widget_serializer`, `ipython_serializer`, and `html_serializer` properties; results are rendered by converting widgets to HTML strings titled with the result name, using a supplied unique output id or a random 25-character string; a `TempSphinx` helper implements `_repr_html_`… Evidence: `deepchecks/core/display.py`
- **Reduce Classes** (source_file): abstract `ReduceMixin` with `greater_is_better` and `reduce_output(check_result) -> Dict[str, float]`, plus `ReduceLabelMixin`, `ReduceMetricClassMixin` (uses the lower-is-better regression scorer names and normalizes scorer names), `ReduceFeatureMixin` (aligns per-feature values with feature importance), and `ReducePropertyMixin`… Evidence: `deepchecks/core/reduce_classes.py`
- **Init** (source_file): exports `requirejs_script`, `widgets_script`, `suite_template`, and `jupyterlab_plotly_script`. `requirejs_script(connected=True)` loads `core/resources/requirejs.min.js` via `pkgutil.get_data`; `widgets_script(connected=True, amd_module=False) -> str` returns the ipywidgets JavaScript library, either from a CDN or bundled, choosing `widgets-embed-amd.js` (requirejs-compatible) or `widgets-embed.js`… Evidence: `deepchecks/core/resources/__init__.py`
- **Init** (source_file): `__all__` export list. Evidence: `deepchecks/core/serialization/__init__.py`
- **Init** (source_file): `__all__` export list. Evidence: `deepchecks/core/serialization/check_failure/__init__.py`
- **IPython** (source_file): `CheckFailureSerializer`, an `IPythonSerializer` for `CheckFailure`, is constructed with a `CheckFailure` value; `serialize(**kwargs)` returns a list of `IPythonFormatter`s. Evidence: `deepchecks/core/serialization/check_failure/ipython.py`
- **Init** (source_file): `__all__` export list. Evidence: `deepchecks/core/serialization/check_result/__init__.py`
- **IPython** (source_file): `CheckResultSerializer` serializes a `CheckResult` into a list of IPython formatters. Parameters: `output_id` (unique id used to build anchor links), `check_sections` (any of `'condition-table'` and `'additional-output'`; `None` includes all sections), `plotly_to_image` (default `False`; render Plotly figures as static images), and an iframe flag… Evidence: `deepchecks/core/serialization/check_result/ipython.py`
- **Init** (source_file): `__all__` export list. Evidence: `deepchecks/core/serialization/dataframe/__init__.py`
- **HTML Display** (source_file): `HtmlDisplayableResult(DisplayableResult)` wraps an HTML string and supplies widget, IPython, and HTML serializers, plus `to_widget`, `to_json`, `to_wandb`, and `save_as_html(file…` Evidence: `deepchecks/core/serialization/html_display.py`
- **Init** (source_file): `__all__` export list. Evidence: `deepchecks/core/serialization/suite_result/__init__.py`
- **IPython** (source_file): `SuiteResultSerializer` serializes a `SuiteResult` into a list of IPython formatters. Parameters: `output_id` for anchor links; `is_for_iframe_with_srcdoc` (default `False`), which adds the `about:srcdoc` prefix that anchor links need inside an iframe; remaining kwargs are passed to the `CheckResult`/`CheckFailure` serializers… Evidence: `deepchecks/core/serialization/suite_result/ipython.py`
- **Suite** (source_file): exports `BaseSuite` and `SuiteResult`. `SuiteConfig` TypedDict (module name, class name, version, name, list of `CheckConfig`s); `SuiteResult(DisplayableResult)` holds `extra_info` and `results`, tracks which results have conditions or displays, selects results by case-insensitive header name or by index, and implements `__repr__` plus JSON and MIME-bundle reprs… Evidence: `deepchecks/core/suite.py`
- **Init**（source_file）：all = 证据：`deepchecks/nlp/__init__.py`
- **Base Checks**（source_file）：all = ⋮---- class SingleDatasetCheck SingleDatasetBaseCheck ⋮---- context type = Context ⋮---- context = self.context type result = self.run logic context, dataset kind=DatasetKind.TRAIN ⋮---- @abc.abstractmethod def run logic self, context, dataset kind - CheckResult ⋮---- class TrainTestCheck TrainTestBaseCheck ⋮---- result = self.run logic context ⋮---- @abc.abstractmethod def run logic self, context - CheckResult 证据：`deepchecks/nlp/base_checks.py`
- **Init**（source_file）：all = 证据：`deepchecks/nlp/checks/__init__.py`
- **Init**（source_file）：all = 证据：`deepchecks/nlp/checks/data_integrity/__init__.py`
- **Frequent Substrings**（source_file）：all = 'FrequentSubstrings' ⋮---- class FrequentSubstrings SingleDatasetCheck ⋮---- @staticmethod def get ngrams text, n ⋮---- words = text.split chars = r' ?<=, .!? \/ ' ngrams = ⋮---- flag = True ngram = words i:i + n ⋮---- flag = False ⋮---- @staticmethod def split sentences text ⋮---- """ Split a given text into sentences. Args: text str : The input text to be split into sentences. Returns: list of str: A list of sentences extracted from the input text. """ ⋮---- def get n sentences self, data ⋮---- """ Extract a specified number of sentences from each item in the input data. This function processes each item in the input data, splitting its text content into sentences, and then selects… 证据：`deepchecks/nlp/checks/data_integrity/frequent_substrings.py`
- **Counting the frequency of each category. Normalizing because distribution graph shows percentage.**（source_file）：all = 'TextPropertyOutliers' ⋮---- class TextPropertyOutliers SingleDatasetCheck ⋮---- def run logic self, context: Context, dataset kind: DatasetKind - CheckResult ⋮---- dataset = context.get data by kind dataset kind result = {} ⋮---- df properties = dataset.properties cat properties = dataset.categorical properties properties = df properties.to dict orient='list' ⋮---- is numeric = name not in cat properties ⋮---- curr nan count = pd.isnull values .sum values = pd.to numeric values, errors='coerce' updated nan count = pd.isnull values .sum ⋮---- values = x for x in values ⋮---- values arr = np.hstack values .astype float .squeeze values arr = np.array x for x in values arr if pd.notnull… 证据：`deepchecks/nlp/checks/data_integrity/text_property_outliers.py`
- **Init**（source_file）：all = 'SingleDatasetPerformance', 'MetadataSegmentsPerformance', 'PropertySegmentsPerformance', 证据：`deepchecks/nlp/checks/model_evaluation/__init__.py`
- **Init**（source_file）：all = 'LabelDrift', 'PropertyDrift', 'TrainTestSamplesMix', 'TextEmbeddingsDrift' 证据：`deepchecks/nlp/checks/train_test_validation/__init__.py`
- **This is commented out as currently text data indices are len range len data**（source_file）：all = ⋮---- TClassPred = t.Union TTokenPred = t.Union ⋮---- TTextPred = t.Union TClassPred, TTokenPred TTextProba = t.Sequence t.Sequence float ⋮---- class DummyModel BasicModel ⋮---- predictions: t.Dict str, t.Dict int, TTextPred proba: t.Dict str, t.Dict int, TTextProba ⋮---- predictions = {} probas = {} ⋮---- train index = train.get original text indexes test index = test.get original text indexes ⋮---- This is commented out as currently text data indices are len range len data TODO: Uncomment when text data indices are not len range len data get logger .warning 'train and test datasets have common index - adding "train"/"test"' ' prefixes. To avoid that provide datasets with no common i… 证据：`deepchecks/nlp/context.py`
- **Init** (source_file): `__all__ = ['classification', 'token_classification']` Evidence: `deepchecks/nlp/datasets/__init__.py`
- **Init** (source_file): `__all__ = ['tweet_emotion', 'just_dance_comment_analysis']` Evidence: `deepchecks/nlp/datasets/classification/__init__.py`
- **Init** (source_file): `__all__ = ['scierc_ner']` Evidence: `deepchecks/nlp/datasets/token_classification/__init__.py`
- **TODO: better message** (source_file): `def validate_tokenized_text(tokenized_text: Optional[Sequence[Sequence[str]]])` ⋮ `error_string = 'tokenized text must be a Sequence of Sequences of strings'` ⋮ `def validate_raw_text(raw_text: Optional[Sequence[str]])` ⋮ `error_string = 'raw text must be a Sequence of strings'` ⋮ `def label_is_null(input_label)` ⋮ `first_element = input_label.iloc[0]` ⋮ `first_element = input_label[0]` ⋮ `if all(is_sequence_not_str(x) or is_label_none(x) for x in labels):  # Multilabel`, `multilabel_error = 'multilabel was identified. It must be a Sequence of Sequences of 0 or 1.'` ⋮ `labels = [[None] * len(labels[0]) if is_label_none(label_per_sample) else [int(x) for x in label_per_sample] …` ⋮ `labels = None if pd.i…` Evidence: `deepchecks/nlp/input_validations.py`
- **Init** (source_file): `__all__ = ['get_default_token_scorers', 'validate_scorers', 'get_scorer_dict', 'init_validate_scorers', …]` Evidence: `deepchecks/nlp/metric_utils/__init__.py`
- **Scorers** (source_file): `__all__ = […]` ⋮ `scorers: t.List[DeepcheckScorer] = [DeepcheckScorer(scorer, model_classes, observed_classes, name) …]` ⋮ `scorers: t.List[DeepcheckScorer] = [DeepcheckScorer(scorer, model_classes, observed_classes) …]` ⋮ `def infer_on_text_data(scorer: DeepcheckScorer, model: ClassificationModel, data: TextData, drop_na: bool = True)` ⋮ `y_pred = model.predict(data)`, `y_true = data.label` ⋮ `idx_to_keep = [not (is_label_none(pred) or is_label_none(label)) for pred, label in zip(y_pred, y_true)]`, `y_pred = np.asarray(y_pred, dtype='object')[idx_to_keep]`, `y_true = y_true[idx_to_keep]` ⋮ `y_pred = transform_to_multi_label_format(y_pred, scorer.model_classes).astype(int)`, `y_true = transform_to_multi_label_format(y_true…` Evidence: `deepchecks/nlp/metric_utils/scorers.py`
- **Suite** (source_file): `__all__ = ['Suite']` ⋮ `class Suite(BaseSuite)` ⋮ `@classmethod def supported_checks(cls) -> Tuple` ⋮ `context = Context(…)` ⋮ `progress_bar = create_progress_bar(…)` ⋮ `results = […]` ⋮ `check_result = check.run_logic(context)` ⋮ `msg = 'Check is irrelevant if not supplied with both train and test datasets'` ⋮ `check_result = check.run_logic(context, dataset_kind=DatasetKind.TRAIN)` ⋮ `check_result = CheckFailure(check, exp, ' - Train Dataset')` ⋮ `check_result = check.run_logic(context, dataset_kind=DatasetKind.TEST)` ⋮ `check_result = CheckFailure(check, exp, ' - Test Dataset')` ⋮ `msg = 'Check is irrelevant if dataset is not supplied'` Evidence: `deepchecks/nlp/suite.py`
- **Init** (source_file): `__all__ = ['data_integrity', 'train_test_validation', 'model_evaluation', 'full_suite']` Evidence: `deepchecks/nlp/suites/__init__.py`
- **Default Suites** (source_file): `__all__ = ['data_integrity', 'train_test_validation', …]` ⋮ `args = locals()` ⋮ `non_none_args = {k: v for k, v in args.items() if v is not None}`, `kwargs = {**non_none_args, **kwargs}` ⋮ `def full_suite(**kwargs) -> Suite` Evidence: `deepchecks/nlp/suites/default_suites.py`
- **Task Type** (source_file): `__all__ = ['TaskType', 'TTokenLabel', 'TClassLabel', 'TTextLabel']` ⋮ `TSingleLabel = t.Union[int, str]`, `TNoneLabel = t.Sequence[None]`, `TClassLabel = t.Sequence[t.Union[TSingleLabel, t.Tuple[TSingleLabel]]]`, `TTokenLabel = t.Sequence[t.Sequence[t.Union[str, int]]]`, `TTextLabel = t.Union[TClassLabel, TTokenLabel, TNoneLabel]` ⋮ `class TaskType(Enum)` ⋮ `TEXT_CLASSIFICATION = 'text_classification'`, `TOKEN_CLASSIFICATION = 'token_classification'`, `OTHER = 'other'` Evidence: `deepchecks/nlp/task_type.py`
- **Used for display purposes** (source_file): `__all__ = ['TextData']` ⋮ `TDataset = t.TypeVar('TDataset', bound='TextData')` ⋮ `class TextData` ⋮ `text: np.ndarray`, `label: TTextLabel`, `task_type: t.Optional[TaskType]`, `tokenized_text: t.Optional[t.Sequence[t.Sequence[str]]] = None`, `name: t.Optional[str] = None`, `embeddings: t.Optional[t.Union[pd.DataFrame, str]] = None`, `metadata: t.Optional[t.Union[pd.DataFrame, str]] = None`, `properties: t.Optional[t.Union[pd.DataFrame, str]] = None`, `cat_properties: t.Optional[t.List[str]] = None`, `cat_metadata: t.Optional[t.List[str]] = None`, `numeric_metadata: t.Optional[t.List[str]] = None`, `original_text_index: t.Optional[t.Sequence[int]] = None` ⋮ `modified = [[str(token) for token in tokens_per_sample] for tokens_per_sample in tokeniz…` Evidence: `deepchecks/nlp/text_data.py`
- **Init** (source_file): `__all__ = […]` Evidence: `deepchecks/nlp/utils/__init__.py`
- **Multivariate Embeddings Drift Utils** (source_file): `SAMPLES_FOR_REDUCTION_FIT = 1000` ⋮ `train_sample = train_dataset.sample(sample_size, random_state=random_state)`, `test_sample = test_dataset.sample(sample_size, random_state=random_state)` ⋮ `train_sample_embeddings = train_sample.embeddings`, `test_sample_embeddings = test_sample.embeddings` ⋮ `domain_class_array = np.concatenate([train_sample_embeddings, test_sample_embeddings])`, `domain_class_labels = pd.Series([0] * len(train_sample_embeddings) + [1] * len(test_sample_embeddings))` ⋮ `use_reduction = not (dimension_reduction_method == 'none' or …`, `use_umap = dimension_reduction_method == 'umap' or dimension_reduction_method == 'auto' and with_display` ⋮ `reducer = UMAP(n_components=10, n_neighbors=5, i…` Evidence: `deepchecks/nlp/utils/multivariate_embeddings_drift_utils.py`
- **Nlp Plot** (source_file): `__all__ = ['get_text_outliers_graph', …]` ⋮ `def clean_x_axis_non_existent_values(x_axis, distribution)` ⋮ `ixs = np.searchsorted(sorted(distribution), x_axis, side='left')` ⋮ `x_axis = [x_axis[i] for i in range(len(ixs)) if ixs[i] != ixs[i - 1]]` ⋮ `green = common_and_outlier_colors['common']`, `red = common_and_outlier_colors['outliers']`, `green_fill = common_and_outlier_colors['common_fill']`, `red_fill = common_and_outlier_colors['outliers_fill']` ⋮ `dist_counts = pd.Series(dist).value_counts(normalize=True).to_dict()`, `counts = list(dist_counts.values())`, `categories_list = list(dist_counts.keys())` ⋮ `outliers_first_index = counts.index(lower_limit)`, `color_discrete_sequence = [green] * outliers_first_index + [red] * (len(co…` Evidence: `deepchecks/nlp/utils/nlp_plot.py`
- **`batched('ABCDEFG', 3) --> ABC DEF G`** (source_file): `EMBEDDING_MODEL = 'text-embedding-ada-002'`, `EMBEDDING_DIM = 1536`, `EMBEDDING_CTX_LENGTH = 8191`, `EMBEDDING_ENCODING = 'cl100k_base'` ⋮ `# batched('ABCDEFG', 3) --> ABC DEF G` ⋮ `# Filter out the first chunk of samples in skip_sample_indices` Evidence: `deepchecks/nlp/utils/text_embeddings.py`
- **Load the model if it wasn't received as a parameter. This is done to avoid loading the model** (source_file): `__all__ = ['calculate_builtin_properties', 'get_builtin_properties_types']` ⋮ `DEFAULT_SENTENCE_SAMPLE_SIZE = 300`, `MAX_TOKENS = 512` ⋮ `NON_PUNCTUATION_SPECIAL_CHARS = frozenset(set(SPECIAL_CHARACTERS) - set(r"""!" $%&' +,-./:;=?\@"""))` ⋮ `textblob_cache = {}`, `words_cache = {}`, `sentences_cache = {}`, `secret_cache = {}` ⋮ `scores_array = np.array(scores, dtype=np.float64)`, `averages = […]` ⋮ `values = scores_array[indices]`, `valid_values = values[~np.isnan(values)]` ⋮ `def split_to_words_with_cache(text: str) -> List[str]` ⋮ `hash_key = hash_text(text)` ⋮ `words = re.split(r'\W+', normalize_text(text, remove_stops=False, ignore_whitespace=False))`, `words = [w for w in words if w]  # remove empty strings` ⋮ `def…` Evidence: `deepchecks/nlp/utils/text_properties.py`
- **Init** (source_file): `__all__ = […]` Evidence: `deepchecks/tabular/__init__.py`
- **Base Checks** (source_file): `__all__ = […]` ⋮ `class SingleDatasetCheck(SingleDatasetBaseCheck)` ⋮ `context_type = Context` ⋮ `y_pred_train = y_pred_train if y_pred_train is not None else y_pred`, `y_proba_train = y_proba_train if y_proba_train is not None else y_proba` ⋮ `context = self.context_type(…)`, `result = self.run_logic(context, dataset_kind=DatasetKind.TRAIN)` ⋮ `@abc.abstractmethod def run_logic(self, context, dataset_kind) -> CheckResult` ⋮ `class TrainTestCheck(TrainTestBaseCheck)` ⋮ `result = self.run_logic(context)` ⋮ `@abc.abstractmethod def run_logic(self, context) -> CheckResult` ⋮ `class ModelOnlyCheck(ModelOnlyBaseCheck)` ⋮ `@classmethod def get_unsupported_failure(cls, check, msg)` ⋮ `class ModelCompar…` Evidence: `deepchecks/tabular/base_checks.py`
- **Init** (source_file): `__all__ = […]` Evidence: `deepchecks/tabular/checks/__init__.py`
- **Init** (source_file): `__all__ = […]` Evidence: `deepchecks/tabular/checks/data_integrity/__init__.py`
- The remaining 16 evidence items are listed in `AI_CONTEXT_PACK.json` or `EVIDENCE_INDEX.json`.
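The `text_embeddings.py` evidence quotes the comment `batched('ABCDEFG', 3) --> ABC DEF G`, which matches the `batched` recipe from the Python `itertools` documentation. A minimal sketch of such a helper, assuming it only needs to split an iterable into tuples of at most `n` items; the deepchecks implementation itself is not shown in the evidence, so this is an illustration rather than its code:

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def batched(iterable: Iterable[T], n: int) -> Iterator[Tuple[T, ...]]:
    """Yield successive tuples of at most n items; the last one may be shorter."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    # islice pulls up to n items at a time; an empty tuple means the input is exhausted
    while batch := tuple(islice(it, n)):
        yield batch


# Join each batch back into a string to reproduce the comment's example
print(["".join(b) for b in batched("ABCDEFG", 3)])  # ['ABC', 'DEF', 'G']
```

Chunking like this is a common way to keep each embedding request within a per-call size limit. Python 3.12 and later also ship `itertools.batched` in the standard library.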
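The `default_suites.py` excerpt shows a small pattern: the suite factory snapshots its named parameters with `locals()`, keeps only those that are not `None`, and merges them with any extra `**kwargs` before handing them to the checks. A self-contained sketch of that pattern; the function `build_check_kwargs`, its parameters, and the `args.pop("kwargs")` step (assumed here so the `kwargs` dict itself is not forwarded) are hypothetical, not code quoted from deepchecks:

```python
def build_check_kwargs(n_samples=None, random_state=None, **kwargs):
    """Hypothetical suite factory: forward only the parameters the caller set."""
    # Snapshot the parameters; locals() here also contains the 'kwargs' dict itself
    args = dict(locals())
    args.pop("kwargs")
    # Drop parameters left at None, but keep falsy values such as random_state=0
    non_none_args = {k: v for k, v in args.items() if v is not None}
    # Python already rejects duplicate keyword names, so the two dicts cannot clash
    return {**non_none_args, **kwargs}


print(build_check_kwargs(n_samples=1000, n_to_show=3))
# {'n_samples': 1000, 'n_to_show': 3}
```

Filtering on `is not None` rather than on truthiness is what keeps a deliberate `random_state=0` from being silently dropped.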
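Two of the NLP excerpts (`text_property_outliers.py` and `nlp_plot.py`) normalize category counts because the distribution plot is drawn in percentages; `nlp_plot.py` does this with `pd.Series(dist).value_counts(normalize=True)`. A dependency-free sketch of the same computation using `collections.Counter`; `category_shares` is a hypothetical name, not a deepchecks function, and unlike pandas' default it does not drop missing values:

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def category_shares(values: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Return each category's share of the total, most frequent first.

    Shares sum to 1.0; an empty input yields an empty dict instead of
    dividing by zero, because the comprehension never runs.
    """
    counts = Counter(values)
    total = sum(counts.values())
    # most_common() orders by descending count, like value_counts()
    return {category: n / total for category, n in counts.most_common()}


print(category_shares(["joy", "anger", "joy", "sadness"]))
# {'joy': 0.5, 'anger': 0.25, 'sadness': 0.25}
```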

## Rules the host AI must follow

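- **Treat this asset as pre-work context, not as a runtime environment.** The AI Context Pack contains only an evidence-based understanding of the project, not the executable state of the target project. Evidence: `README.md`, `LICENSE`, `benchmarks/__init__.py`
- **When answering users, distinguish what can be previewed from what can only be verified after installation.** The value of a pre-install experience comes from reducing mistaken installs and misjudgments, not from pretending to be a real run. Evidence: `README.md`, `LICENSE`, `benchmarks/__init__.py`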

## Questions the user should answer before starting

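- Which host AI or local environment do you plan to use it in?
- Do you only want to try the workflow first, or are you preparing a real installation?
- What matters most to you: installation cost, output quality, or conflicts with your existing rules?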

## Acceptance criteria

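- Every capability claim can be traced back to a file path in evidence_refs.
- AI_CONTEXT_PACK.md does not present a preview as a real run.
- Within 3 minutes, users can understand who the project suits, what it can do, how to get started, and where the risk boundaries are.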
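The first acceptance criterion can be checked mechanically. A minimal sketch, assuming each claim is a dict with a `claim_id` and an `evidence_refs` list of repo paths; this record shape and the function name `untraceable_claims` are hypothetical, not the actual `AI_CONTEXT_PACK.json` schema, and `clm_9999` is a made-up ID:

```python
from typing import Dict, List, Set


def untraceable_claims(claims: List[Dict], known_paths: Set[str]) -> List[str]:
    """Return the IDs of claims that cannot be traced back to a known file path.

    A claim fails if it has no evidence_refs at all, or if any of its refs
    is not among the paths known to exist in the inspected repo.
    """
    failing = []
    for claim in claims:
        refs = claim.get("evidence_refs", [])
        if not refs or any(ref not in known_paths for ref in refs):
            failing.append(claim["claim_id"])
    return failing


known = {"deepchecks/core/checks.py", "deepchecks/nlp/text_data.py"}
claims = [
    {"claim_id": "clm_0001", "evidence_refs": ["deepchecks/core/checks.py"]},
    {"claim_id": "clm_9999", "evidence_refs": []},
]
print(untraceable_claims(claims, known))  # ['clm_9999']
```

Treating a claim with an empty `evidence_refs` list as a failure, rather than as vacuously traceable, is the stricter reading of "every capability claim can be traced back".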

---

## Doramagic Context Augmentation

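The content below reinforces the main body of the Repomix/AI Context Pack. The Human Manual only provides a reading skeleton; pitfall logs are converted into working constraints that the host AI must follow.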

## Human Manual skeleton

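Usage rule: this is only a reading route and a salience signal for the project, not a factual authority. Specific facts must still trace back to repo evidence or the Claim Graph.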

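Hard rules for the host AI:
- Do not treat page titles, section order, summaries, or importance as evidence of project facts.
- When explaining the Human Manual skeleton, state explicitly that it is only a reading route and salience signal.
- Judgments about capabilities, installation, compatibility, runtime status, and risk must cite repo evidence, a source path, or the Claim Graph.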

- **Deepchecks Repository Overview**: importance `high`
  - source_paths: README.md, deepchecks/__init__.py, deepchecks/core/__init__.py, deepchecks/core/checks.py, deepchecks/core/suite.py
- **Installation & Quickstart**: importance `high`
  - source_paths: setup.py, requirements/requirements.txt, requirements/nlp-requirements.txt, requirements/vision-requirements.txt, conda-recipe/meta.yaml
- **Core Architecture**: importance `high`
  - source_paths: deepchecks/core/checks.py, deepchecks/core/suite.py, deepchecks/core/context.py, deepchecks/core/condition.py, deepchecks/core/check_result.py
- **Checks & Suites Framework**: importance `high`
  - source_paths: deepchecks/core/checks.py, deepchecks/core/reduce_classes.py, deepchecks/tabular/base_checks.py, deepchecks/nlp/base_checks.py, deepchecks/vision/base_checks.py
- **Serialization & Output Formats**: importance `high`
  - source_paths: deepchecks/core/serialization/__init__.py, deepchecks/core/serialization/check_result/json.py, deepchecks/core/serialization/check_result/html.py, deepchecks/core/serialization/suite_result/json.py, deepchecks/core/serialization/suite_result/html.py
- **Tabular Data Validation**: importance `high`
  - source_paths: deepchecks/tabular/__init__.py, deepchecks/tabular/dataset.py, deepchecks/tabular/context.py, deepchecks/tabular/model_base.py, deepchecks/tabular/utils/task_type.py
- **NLP Validation**: importance `high`
  - source_paths: deepchecks/nlp/__init__.py, deepchecks/nlp/text_data.py, deepchecks/nlp/context.py, deepchecks/nlp/task_type.py, deepchecks/nlp/input_validations.py
- **Computer Vision Validation**: importance `high`
  - source_paths: deepchecks/vision/__init__.py, deepchecks/vision/vision_data/vision_data.py, deepchecks/vision/vision_data/batch_wrapper.py, deepchecks/vision/context.py, deepchecks/vision/utils/image_properties.py

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `98475d17b08a21fca29d533b94b8ec3c70544a85`
- inspected_files: `README.md`, `docs/source/conf.py`, `docs/source/_static/switcher.json`, `docs/source/tabular/tutorials/quickstarts/plot_quick_model_evaluation.py`, `docs/source/tabular/tutorials/quickstarts/plot_quick_data_integrity.py`, `docs/source/tabular/tutorials/quickstarts/plot_quick_train_test_validation.py`, `docs/source/tabular/tutorials/quickstarts/plot_quickstart_in_5_minutes.py`, `docs/source/tabular/tutorials/other/plot_add_a_custom_check.py`, `docs/source/tabular/tutorials/other/plot_phishing_urls.py`, `docs/source/general/usage/customizations/plot_configure_check_conditions.py`, `docs/source/general/usage/customizations/plot_create_a_custom_suite.py`, `docs/source/general/usage/customizations/plot_create_a_custom_check.py`, `docs/source/general/usage/exporting_results/plot_export_suite_results_as_html.py`, `docs/source/general/usage/exporting_results/plot_exports_output_to_wandb.py`, `docs/source/general/usage/exporting_results/plot_export_outputs_to_json.py`, `docs/source/checks/tabular/data_integrity/plot_string_length_out_of_bounds.py`, `docs/source/checks/tabular/data_integrity/plot_special_chars.py`, `docs/source/checks/tabular/data_integrity/plot_string_mismatch.py`, `docs/source/checks/tabular/data_integrity/plot_data_duplicates.py`, `docs/source/checks/tabular/data_integrity/plot_percent_of_nulls.py`

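Hard rules for the host AI:
- Without repo_clone_verified=true, do not claim to have read the source code.
- Without repo_inspection_verified=true, do not present judgments drawn from README/docs/package files as facts.
- Without quick_start_verified=true, do not claim that the Quick Start has run successfully.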
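The three hard rules above amount to a gate keyed on verification flags. A minimal sketch, assuming the flags arrive as a plain mapping of flag name to boolean; `permitted_claims` and its output keys are hypothetical names, not part of Doramagic or deepchecks:

```python
from typing import Dict, Mapping


def permitted_claims(flags: Mapping[str, bool]) -> Dict[str, bool]:
    """Map verification flags to the statements a host AI may make.

    A flag counts only when it is literally True; a missing flag is treated
    the same as False, matching the "without X=true, do not" wording.
    """
    return {
        "may_claim_source_was_read": flags.get("repo_clone_verified") is True,
        "may_state_docs_judgments_as_fact": flags.get("repo_inspection_verified") is True,
        "may_claim_quick_start_ran": flags.get("quick_start_verified") is True,
    }


# Flags recorded in this pack; quick_start_verified is not listed at all
pack_flags = {"repo_clone_verified": True, "repo_inspection_verified": True}
print(permitted_claims(pack_flags))
```

With the flags recorded in this section, the first two permissions come out `True` and the Quick Start permission comes out `False`, because `quick_start_verified` is absent. Comparing with `is True` rather than truthiness means a value such as the string `"true"` does not accidentally unlock a claim.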

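## Doramagic Pitfall Constraints

These rules come from project-specific pitfalls that Doramagic discovered, verified, or compiled. The host AI must treat them as working constraints, not as ordinary explanatory text.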

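### Constraint 1: Source evidence: Blank html page after saving report using `save_as_html`

- Trigger: GitHub community evidence shows an unverified installation-related issue in this project: Blank html page after saving report using `save_as_html`
- Host AI rule: The source issue is still open; the Pack Agent needs to recheck whether it still affects the current version.
- Why it matters: It may raise the cost for new users to try the project and to integrate it into production.
- Evidence: community_evidence:github | cevd_492bcbfbeaac498b94f2f869074b9edc | https://github.com/deepchecks/deepchecks/issues/2803 | The source discussion mentions Python-related conditions; recheck them before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.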

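### Constraint 2: Source evidence: Failed to load model class 'AnyModel' from module 'anywidget' Error: No version of module anywidget is registered

- Trigger: GitHub community evidence shows an unverified installation-related issue in this project: Failed to load model class 'AnyModel' from module 'anywidget' Error: No version of module anywidget is registered
- Host AI rule: The source issue is still open; the Pack Agent needs to recheck whether it still affects the current version.
- Why it matters: It may raise the cost for new users to try the project and to integrate it into production.
- Evidence: community_evidence:github | cevd_4bac6c577dee471fa096434516861696 | https://github.com/deepchecks/deepchecks/issues/2794 | An unverified usage condition surfaced by a source of type github_issue.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.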

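### Constraint 3: Source evidence: [BUG] GPU not being able to change runtime of Image Property Drift and Image Dataset Drift

- Trigger: GitHub community evidence shows an unverified maintenance/version-related issue in this project: [BUG] GPU not being able to change runtime of Image Property Drift and Image Dataset Drift
- Host AI rule: The source issue is still open; the Pack Agent needs to recheck whether it still affects the current version.
- Why it matters: It may raise the cost for new users to try the project and to integrate it into production.
- Evidence: community_evidence:github | cevd_b9f771df5da2458d9e368765d829e5c7 | https://github.com/deepchecks/deepchecks/issues/2789 | An unverified usage condition surfaced by a source of type github_issue.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.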

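### Constraint 4: Source evidence: Feature Request: EU AI Act compliance mapping for validation checks

- Trigger: GitHub community evidence shows an unverified security/permission-related issue in this project: Feature Request: EU AI Act compliance mapping for validation checks
- Host AI rule: The source issue is still open; the Pack Agent needs to recheck whether it still affects the current version.
- Why it matters: It may raise the cost for new users to try the project and to integrate it into production.
- Evidence: community_evidence:github | cevd_aaa0c0bcbdbf41d6855980523e0d7682 | https://github.com/deepchecks/deepchecks/issues/2813 | An unverified usage condition surfaced by a source of type github_issue.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.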

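### Constraint 5: Source evidence: [BUG] Inaccurate Conditions Summary and Heatmap for Pairwise Correlation Display in Deepchecks Tabular Suite. Solution…

- Trigger: GitHub community evidence shows an unverified security/permission-related issue in this project: [BUG] Inaccurate Conditions Summary and Heatmap for Pairwise Correlation Display in Deepchecks Tabular Suite. Solution Proposed.
- Host AI rule: The source issue is still open; the Pack Agent needs to recheck whether it still affects the current version.
- Why it matters: It may affect authorization, key configuration, or security boundaries.
- Evidence: community_evidence:github | cevd_edf8cf14dc8f49898cbcab292f3abbeb | https://github.com/deepchecks/deepchecks/issues/2802 | The source discussion mentions Python-related conditions; recheck them before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.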

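### Constraint 6: Source evidence: 0.18.0

- Trigger: GitHub community evidence shows an unverified installation-related issue in this project: 0.18.0
- Host AI rule: The source indicates that a fix, workaround, or version change may already exist; the manual must state which version it applies to.
- Why it matters: It may raise the cost for new users to try the project and to integrate it into production.
- Evidence: community_evidence:github | cevd_4789fb40d7364096958752494c3054a2 | https://github.com/deepchecks/deepchecks/releases/tag/0.18.0 | The source discussion mentions Python-related conditions; recheck them before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.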

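### Constraint 7: Source evidence: 0.18.1

- Trigger: GitHub community evidence shows an unverified installation-related issue in this project: 0.18.1
- Host AI rule: The source indicates that a fix, workaround, or version change may already exist; the manual must state which version it applies to.
- Why it matters: It may raise the cost for new users to try the project and to integrate it into production.
- Evidence: community_evidence:github | cevd_53c2c4845d134a54b0989b29725c1c93 | https://github.com/deepchecks/deepchecks/releases/tag/0.18.1 | An unverified usage condition surfaced by a source of type github_release.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.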

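### Constraint 8: Source evidence: Proposal: Doc/example for RAG failure-mode testing using WFGY 16-problem ProblemMap

- Trigger: GitHub community evidence shows an unverified installation-related issue in this project: Proposal: Doc/example for RAG failure-mode testing using WFGY 16-problem ProblemMap
- Host AI rule: The source issue is still open; the Pack Agent needs to recheck whether it still affects the current version.
- Why it matters: It may raise the cost for new users to try the project and to integrate it into production.
- Evidence: community_evidence:github | cevd_e24dd97e69674ab2b766fef25a401070 | https://github.com/deepchecks/deepchecks/issues/2812 | An unverified usage condition surfaced by a source of type github_issue.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.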

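### Constraint 9: Source evidence: [BUG] neg_log_loss scorer incompatible with newer scikit-learn version

- Trigger: GitHub community evidence shows an unverified installation-related issue in this project: [BUG] neg_log_loss scorer incompatible with newer scikit-learn version
- Host AI rule: The source issue is still open; the Pack Agent needs to recheck whether it still affects the current version.
- Why it matters: It may affect upgrades, migration, or version selection.
- Evidence: community_evidence:github | cevd_6e307650606b4ed380c4adc96caa8c28 | https://github.com/deepchecks/deepchecks/issues/2806 | The source discussion mentions Python-related conditions; recheck them before installing or trying the project.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.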

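### Constraint 10: Source evidence: 0.17.3

- Trigger: GitHub community evidence shows an unverified configuration-related issue in this project: 0.17.3
- Host AI rule: The source indicates that a fix, workaround, or version change may already exist; the manual must state which version it applies to.
- Why it matters: It may raise the cost for new users to try the project and to integrate it into production.
- Evidence: community_evidence:github | cevd_b64249b0551146afb696a981f404b3e6 | https://github.com/deepchecks/deepchecks/releases/tag/0.17.3 | An unverified usage condition surfaced by a source of type github_release.
- Hard boundary: Do not present this pitfall as resolved, verified, or ignorable unless later verification evidence clearly shows it has been closed.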
