# crawl4ai - Doramagic AI Context Pack

> 定位：安装前体验与判断资产。它帮助宿主 AI 在开工前建立判断起点，但不代表目标项目已经安装、执行或验证。

## 充分原则

- **充分原则，不是压缩原则**：AI Context Pack 应该充分到让宿主 AI 在开工前理解项目价值、能力边界、使用入口、风险和证据来源；它可以分层组织，但不以最短摘要为目标。
- **压缩策略**：只压缩噪声和重复内容，不压缩会影响判断和开工质量的上下文。

## 给宿主 AI 的使用方式

你正在读取 Doramagic 为 crawl4ai 编译的 AI Context Pack。请把它当作开工前上下文：帮助用户理解适合谁、能做什么、如何开始、哪些必须安装后验证、风险在哪里。不要声称你已经安装、运行或执行了目标项目。

## Claim 消费规则

- **事实来源**：Repo Evidence + Claim/Evidence Graph；Human Wiki 只提供显著性、术语和叙事结构。
- **事实最低状态**：`supported`
- `supported`：可以作为项目事实使用，但回答中必须引用 claim_id 和证据路径。
- `weak`：只能作为低置信度线索，必须要求用户继续核实。
- `inferred`：只能用于风险提示或待确认问题，不能包装成项目事实。
- `unverified`：不得作为事实使用，应明确说证据不足。
- `contradicted`：必须展示冲突来源，不得替用户强行选择一个版本。
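
为便于理解，下面给出一段示意性的消费逻辑（Python 草图，Claim 数据结构与字段名均为假设，并非 Doramagic 或本 Pack 的真实实现），仅说明宿主 AI 应如何按状态区分引用方式：

```python
# 示意代码：按 Claim 状态决定引用方式。
# claim_id / status / confidence / evidence 仅为说明用的假设字段。
from dataclasses import dataclass

@dataclass
class Claim:
    claim_id: str
    status: str        # supported / weak / inferred / unverified / contradicted
    confidence: float
    evidence: list[str]

def cite_claim(claim: Claim) -> str:
    """按消费规则返回该 claim 在回答中允许的用法。"""
    if claim.status == "supported":
        # 可作为项目事实，但必须附 claim_id 与证据路径
        return f"事实（引用 {claim.claim_id}，证据：{', '.join(claim.evidence)}）"
    if claim.status == "weak":
        return f"低置信度线索（{claim.claim_id}），必须要求用户继续核实"
    if claim.status == "inferred":
        return f"风险提示 / 待确认项（{claim.claim_id}），不能作为事实"
    if claim.status == "contradicted":
        return f"冲突信息（{claim.claim_id}），需同时展示所有来源"
    # unverified 及其他情况一律按证据不足处理
    return f"证据不足（{claim.claim_id}），不得作为事实使用"

# 用法示例
print(cite_claim(Claim("clm_0002", "supported", 0.86, ["README.md"])))
```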

## 它最适合谁

- **正在使用 Claude/Codex/Cursor/Gemini 等宿主 AI 的开发者**：README 或插件配置提到多个宿主 AI。 证据：`README.md` Claim：`clm_0002` supported 0.86

## 它能做什么

- **命令行启动或安装流程**（需要安装后验证）：项目文档中存在可执行命令，真实使用需要在本地或宿主环境中运行这些命令。 证据：`README.md` Claim：`clm_0001` supported 0.86

## 怎么开始

- `pip install -U crawl4ai` 证据：`README.md` Claim：`clm_0003` supported 0.86
- `pip install crawl4ai --pre` 证据：`README.md` Claim：`clm_0004` supported 0.86
- `pip install crawl4ai` 证据：`README.md` Claim：`clm_0004` supported 0.86, `clm_0005` supported 0.86, `clm_0006` supported 0.86, `clm_0014` supported 0.86
- `pip install crawl4ai[sync]` 证据：`README.md` Claim：`clm_0006` supported 0.86
- `git clone https://github.com/unclecode/crawl4ai.git` 证据：`README.md` Claim：`clm_0007` supported 0.86
- `pip install -e .                    # Basic installation in editable mode` 证据：`README.md` Claim：`clm_0008` supported 0.86
- `pip install -e ".[torch]"           # With PyTorch features` 证据：`README.md` Claim：`clm_0009` supported 0.86
- `pip install -e ".[transformer]"     # With Transformer features` 证据：`README.md` Claim：`clm_0010` supported 0.86
- `pip install -e ".[cosine]"          # With cosine similarity features` 证据：`README.md` Claim：`clm_0011` supported 0.86
- `pip install -e ".[sync]"            # With synchronous crawling (Selenium)` 证据：`README.md` Claim：`clm_0012` supported 0.86

## 继续前判断卡

- **当前建议**：先做角色匹配试用
- **为什么**：这个项目更像角色库，核心风险是选错角色或把角色文案当执行能力；先用 Prompt Preview 试角色匹配，再决定是否沙盒导入。

### 30 秒判断

- **现在怎么做**：先做角色匹配试用
- **最小安全下一步**：先用 Prompt Preview 试角色匹配；满意后再隔离导入
- **先别相信**：角色质量和任务匹配不能直接相信。
- **继续会触碰**：角色选择偏差、命令执行、本地环境或项目文件

### 现在可以相信

- **适合人群线索：正在使用 Claude/Codex/Cursor/Gemini 等宿主 AI 的开发者**（supported）：有 supported claim 或项目证据支撑，但仍不等于真实安装效果。 证据：`README.md` Claim：`clm_0002` supported 0.86
- **能力存在：命令行启动或安装流程**（supported）：可以相信项目包含这类能力线索；是否适合你的具体任务仍要试用或安装后验证。 证据：`README.md` Claim：`clm_0001` supported 0.86
- **存在 Quick Start / 安装命令线索**（supported）：可以相信项目文档出现过启动或安装入口；不要因此直接在主力环境运行。 证据：`README.md` Claim：`clm_0003` supported 0.86

### 现在还不能相信

- **角色质量和任务匹配不能直接相信。**（unverified）：角色库证明有很多角色，不证明每个角色都适合你的具体任务，也不证明角色能产生高质量结果。
- **不能把角色文案当成真实执行能力。**（unverified）：安装前只能判断角色描述和任务画像是否匹配，不能证明它能在宿主 AI 里完成任务。
- **真实输出质量不能在安装前相信。**（unverified）：Prompt Preview 只能展示引导方式，不能证明真实项目中的结果质量。
- **宿主 AI 版本兼容性不能在安装前相信。**（unverified）：Claude、Cursor、Codex、Gemini 等宿主加载规则和版本差异必须在真实环境验证。
- **不会污染现有宿主 AI 行为，不能直接相信。**（inferred）：Skill、plugin、AGENTS/CLAUDE/GEMINI 指令可能改变宿主 AI 的默认行为。
- **可安全回滚不能默认相信。**（unverified）：除非项目明确提供卸载和恢复说明，否则必须先在隔离环境验证。
- **真实安装后是否与用户当前宿主 AI 版本兼容？**（unverified）：兼容性只能通过实际宿主环境验证。
- **项目输出质量是否满足用户具体任务？**（unverified）：安装前预览只能展示流程和边界，不能替代真实评测。

### 继续会触碰什么

- **角色选择偏差**：用户对任务应该由哪个专家角色处理的判断。 原因：选错角色会让 AI 从错误专业视角回答，浪费时间或误导决策。
- **命令执行**：包管理器、网络下载、本地插件目录、项目配置或用户主目录。 原因：运行第一条命令就可能产生环境改动；必须先判断是否值得跑。 证据：`README.md`
- **本地环境或项目文件**：安装结果、插件缓存、项目配置或本地依赖目录。 原因：安装前无法证明写入范围和回滚方式，需要隔离验证。 证据：`README.md`
- **宿主 AI 上下文**：AI Context Pack、Prompt Preview、Skill 路由、风险规则和项目事实。 原因：导入上下文会影响宿主 AI 后续判断，必须避免把未验证项包装成事实。

### 最小安全下一步

- **先跑 Prompt Preview**：先用交互式试用验证任务画像和角色匹配，不要先导入整套角色库。（适用：任何项目，尤其是输出质量未知时。）
- **只在隔离目录或测试账号试装**：避免安装命令污染主力宿主 AI、真实项目或用户主目录。（适用：存在命令执行、插件配置或本地写入线索时。）
- **安装后只验证一个最小任务**：先验证加载、兼容、输出质量和回滚，再决定是否深用。（适用：准备从试用进入真实工作流时。）

### 退出方式

- **保留安装前状态**：记录原始宿主配置和项目状态，后续才能判断是否可恢复。
- **保留原始角色选择记录**：如果输出偏题，可以回到任务画像阶段重新选择角色，而不是继续沿着错误角色推进。
- **记录安装命令和写入路径**：没有明确卸载说明时，至少要知道哪些目录或配置需要手动清理。
- **如果没有回滚路径，不进入主力环境**：不可回滚是继续前的阻断项，不应靠信任或运气继续。

## 哪些只能预览

- 解释项目适合谁和能做什么
- 基于项目文档演示典型对话流程
- 帮助用户判断是否值得安装或继续研究

## 哪些必须安装后验证

- 真实安装 Skill、插件或 CLI
- 执行脚本、修改本地文件或访问外部服务
- 验证真实输出质量、性能和兼容性

## 边界与风险判断卡

- **把安装前预览误认为真实运行**：用户可能高估项目已经完成的配置、权限和兼容性验证。 处理方式：明确区分 prompt_preview_can_do 与 runtime_required。 Claim：`clm_0015` inferred 0.45
- **命令执行会修改本地环境**：安装命令可能写入用户主目录、宿主插件目录或项目配置。 处理方式：先在隔离环境或测试账号中运行。 证据：`README.md` Claim：`clm_0016` supported 0.86
- **待确认**：真实安装后是否与用户当前宿主 AI 版本兼容？原因：兼容性只能通过实际宿主环境验证。
- **待确认**：项目输出质量是否满足用户具体任务？原因：安装前预览只能展示流程和边界，不能替代真实评测。
- **待确认**：安装命令是否需要网络、权限或全局写入？原因：这影响企业环境和个人环境的安装风险。

## 开工前工作上下文

### 加载顺序

- 先读取 how_to_use.host_ai_instruction，建立安装前判断资产的边界。
- 读取 claim_graph_summary，确认事实来自 Claim/Evidence Graph，而不是 Human Wiki 叙事。
- 再读取 intended_users、capabilities 和 quick_start_candidates，判断用户是否匹配。
- 需要执行具体任务时，优先查 role_skill_index，再查 evidence_index。
- 遇到真实安装、文件修改、网络访问、性能或兼容性问题时，转入 risk_card 和 boundaries.runtime_required。
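
一个按上述顺序消费 Pack 字段的示意（假设 `AI_CONTEXT_PACK.json` 以文中提到的键名组织，真实结构以实际文件为准）：

```python
# 示意代码：按加载顺序读取 AI_CONTEXT_PACK.json 中的关键字段。
# JSON 结构为假设，字段名取自本文提到的键。
import json

with open("AI_CONTEXT_PACK.json", encoding="utf-8") as f:
    pack = json.load(f)

# 1. 先建立安装前判断资产的边界
instruction = pack.get("how_to_use", {}).get("host_ai_instruction")
# 2. 确认事实来源是 Claim/Evidence Graph，而不是 Human Wiki 叙事
claim_summary = pack.get("claim_graph_summary")
# 3. 判断用户是否匹配
profile = {key: pack.get(key) for key in ("intended_users", "capabilities", "quick_start_candidates")}
# 4. 具体任务路由：优先角色 / Skill 索引，再查证据索引
routing = (pack.get("role_skill_index"), pack.get("evidence_index"))
# 5. 涉及真实安装、文件修改、网络访问或兼容性问题时再看风险边界
risk = (pack.get("risk_card"), pack.get("boundaries", {}).get("runtime_required"))
```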

### 任务路由

- **命令行启动或安装流程**：先说明这是安装后验证能力，再给出安装前检查清单。 边界：必须真实安装或运行后验证。 证据：`README.md` Claim：`clm_0001` supported 0.86

### 上下文规模

- 文件总数：783
- 重要文件覆盖：40/783
- 证据索引条目：80
- 角色 / Skill 条目：79

### 证据不足时的处理

- **missing_evidence**：说明证据不足，要求用户提供目标文件、README 段落或安装后验证记录；不要补全事实。
- **out_of_scope_request**：说明该任务超出当前 AI Context Pack 证据范围，并建议用户先查看 Human Manual 或真实安装后验证。
- **runtime_request**：给出安装前检查清单和命令来源，但不要替用户执行命令或声称已执行。
- **source_conflict**：同时展示冲突来源，标记为待核实，不要强行选择一个版本。
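
一个把这些回退情况映射为固定话术的示意（处理函数与文案均为假设，仅说明路由方式）：

```python
# 示意代码：证据不足时的回退路由，非真实实现。
FALLBACK_RESPONSES = {
    "missing_evidence": "证据不足：请提供目标文件、README 段落或安装后验证记录；不补全事实。",
    "out_of_scope_request": "超出当前 AI Context Pack 证据范围：建议先查看 Human Manual 或真实安装后验证。",
    "runtime_request": "仅给出安装前检查清单和命令来源：不替用户执行命令，也不声称已执行。",
    "source_conflict": "来源冲突：同时展示所有冲突来源并标记为待核实，不强行选择一个版本。",
}

def handle_insufficient_evidence(case: str) -> str:
    # 未识别的情况一律按证据不足处理，避免把猜测包装成事实
    return FALLBACK_RESPONSES.get(case, FALLBACK_RESPONSES["missing_evidence"])
```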

## Prompt Recipes

### 适配判断

- 目标：判断这个项目是否适合用户当前任务。
- 预期输出：适配结论、关键理由、证据引用、安装前可预览内容、必须安装后验证内容、下一步建议。

```text
请基于 crawl4ai 的 AI Context Pack，先问我 3 个必要问题，然后判断它是否适合我的任务。回答必须包含：适合谁、能做什么、不能做什么、是否值得安装、证据来自哪里。所有项目事实必须引用 evidence_refs、source_paths 或 claim_id。
```

### 安装前体验

- 目标：让用户在安装前感受核心工作流，同时避免把预览包装成真实能力或营销承诺。
- 预期输出：一段带边界标签的体验剧本、安装后验证清单和谨慎建议；不含真实运行承诺或强营销表述。

```text
请把 crawl4ai 当作安装前体验资产，而不是已安装工具或真实运行环境。

请严格输出四段：
1. 先问我 3 个必要问题。
2. 给出一段“体验剧本”：用 [安装前可预览]、[必须安装后验证]、[证据不足] 三种标签展示它可能如何引导工作流。
3. 给出安装后验证清单：列出哪些能力只有真实安装、真实宿主加载、真实项目运行后才能确认。
4. 给出谨慎建议：只能说“值得继续研究/试装”“先补充信息后再判断”或“不建议继续”，不得替项目背书。

硬性边界：
- 不要声称已经安装、运行、执行测试、修改文件或产生真实结果。
- 不要写“自动适配”“确保通过”“完美适配”“强烈建议安装”等承诺性表达。
- 如果描述安装后的工作方式，必须使用“如果安装成功且宿主正确加载 Skill，它可能会……”这种条件句。
- 体验剧本只能写成“示例台词/假设流程”：使用“可能会询问/可能会建议/可能会展示”，不要写“已写入、已生成、已通过、正在运行、正在生成”。
- Prompt Preview 不负责给安装命令；如用户准备试装，只能提示先阅读 Quick Start 和 Risk Card，并在隔离环境验证。
- 所有项目事实必须来自 supported claim、evidence_refs 或 source_paths；inferred/unverified 只能作风险或待确认项。

```

### 角色 / Skill 选择

- 目标：从项目里的角色或 Skill 中挑选最匹配的资产。
- 预期输出：候选角色或 Skill 列表，每项包含适用场景、证据路径、风险边界和是否需要安装后验证。

```text
请读取 role_skill_index，根据我的目标任务推荐 3-5 个最相关的角色或 Skill。每个推荐都要说明适用场景、可能输出、风险边界和 evidence_refs。
```

### 风险预检

- 目标：安装或引入前识别环境、权限、规则冲突和质量风险。
- 预期输出：环境、权限、依赖、许可、宿主冲突、质量风险和未知项的检查清单。

```text
请基于 risk_card、boundaries 和 quick_start_candidates，给我一份安装前风险预检清单。不要替我执行命令，只说明我应该检查什么、为什么检查、失败会有什么影响。
```

### 宿主 AI 开工指令

- 目标：把项目上下文转成一次对话开始前的宿主 AI 指令。
- 预期输出：一段边界明确、证据引用明确、适合复制给宿主 AI 的开工前指令。

```text
请基于 crawl4ai 的 AI Context Pack，生成一段我可以粘贴给宿主 AI 的开工前指令。这段指令必须遵守 not_runtime=true，不能声称项目已经安装、运行或产生真实结果。
```


## 角色 / Skill 索引

- 共索引 79 个角色 / Skill / 项目文档条目。

- **GitHub Actions Workflows Documentation**（project_doc）：GitHub Actions Workflows Documentation 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`.github/workflows/docs/README.md`
- **Crawl4AI Prospect‑Wizard – step‑by‑step guide**（project_doc）：Crawl4AI Prospect‑Wizard – step‑by‑step guide 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/apps/linkdin/README.md`
- **Adaptive Crawling Examples**（project_doc）：This directory contains examples demonstrating various aspects of Crawl4AI's Adaptive Crawling feature. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/examples/adaptive_crawling/README.md`
- **Amazon R2D2 Product Search Example**（project_doc）：A real-world demonstration of Crawl4AI's multi-step crawling with LLM-generated automation scripts. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/examples/c4a_script/amazon_example/README.md`
- **C4A-Script Interactive Tutorial**（project_doc）：A comprehensive web-based tutorial for learning and experimenting with C4A-Script - Crawl4AI's visual web automation language. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/examples/c4a_script/tutorial/README.md`
- **Web Scraper API with Custom Model Support**（project_doc）：Web Scraper API with Custom Model Support 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/examples/website-to-api/README.md`
- **C4A-Script Interactive Tutorial**（project_doc）：A comprehensive web-based tutorial for learning and experimenting with C4A-Script - Crawl4AI's visual web automation language. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/apps/c4a-script/README.md`
- **Crawl4AI Chrome Extension**（project_doc）：Visual extraction tools for Crawl4AI - Click to extract data and content from any webpage! 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/apps/crawl4ai-assistant/README.md`
- **Crawl4AI Marketplace**（project_doc）：A terminal-themed marketplace for tools, integrations, and resources related to Crawl4AI. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/marketplace/README.md`
- **Contributing to Crawl4AI**（project_doc）：Welcome to the Crawl4AI project! As an open-source library for web crawling and AI integration, we value contributions from the community. This guide explains our branching strategy, how to contribute effectively, and the overall release process. Our goal is to maintain a stable, collaborative environment where bug fixes, features, and improvements can be integrated smoothly while allowing for experimental developme… 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/CONTRIBUTING.md`
- **🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper.**（project_doc）：🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`README.md`
- **Software Bill of Materials SBOM**（project_doc）：This directory contains the CycloneDX SBOM for the project. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`sbom/README.md`
- **Crawl4AI Docker Guide 🐳**（project_doc）：Table of Contents - Prerequisites prerequisites - Installation installation - Option 1: Using Pre-built Docker Hub Images Recommended option-1-using-pre-built-docker-hub-images-recommended - Option 2: Using Docker Compose option-2-using-docker-compose - Option 3: Manual Local Build & Run option-3-manual-local-build--run - Dockerfile Parameters dockerfile-parameters - Using the API using-the-api - Playground Interfac… 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`deploy/docker/README.md`
- **Crawl4AI Stress Testing and Benchmarking**（project_doc）：Crawl4AI Stress Testing and Benchmarking 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`tests/memory/README.md`
- **Contributing to Crawl4AI**（project_doc）：Welcome to the Crawl4AI project! As an open-source library for web crawling and AI integration, we value contributions from the community. This guide explains our branching strategy, how to contribute effectively, and the overall release process. Our goal is to maintain a stable, collaborative environment where bug fixes, features, and improvements can be integrated smoothly while allowing for experimental developme… 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`CONTRIBUTING.md`
- **Crawl4AI v0.8.0 Release Notes**（project_doc）：Release Date : January 2026 Previous Version : v0.7.6 Status : Release Candidate 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/RELEASE_NOTES_v0.8.0.md`
- **Workflow Architecture Documentation**（project_doc）：Workflow Architecture Documentation 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`.github/workflows/docs/ARCHITECTURE.md`
- **Workflow Quick Reference**（project_doc）：Standard Release bash 1. Update version vim crawl4ai/ version .py Set to "1.2.3" 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`.github/workflows/docs/WORKFLOW_REFERENCE.md`
- **🚀 Crawl4AI v0.7.0: The Adaptive Intelligence Update**（project_doc）：🚀 Crawl4AI v0.7.0: The Adaptive Intelligence Update 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/blog/release-v0.7.0.md`
- **🛠️ Crawl4AI v0.7.1: Minor Cleanup Update**（project_doc）：🛠️ Crawl4AI v0.7.1: Minor Cleanup Update 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/blog/release-v0.7.1.md`
- **🚀 Crawl4AI v0.7.3: The Multi-Config Intelligence Update**（project_doc）：🚀 Crawl4AI v0.7.3: The Multi-Config Intelligence Update 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/blog/release-v0.7.3.md`
- **🚀 Crawl4AI v0.7.4: The Intelligent Table Extraction & Performance Update**（project_doc）：🚀 Crawl4AI v0.7.4: The Intelligent Table Extraction & Performance Update 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/blog/release-v0.7.4.md`
- **🚀 Crawl4AI v0.7.5: The Docker Hooks & Security Update**（project_doc）：🚀 Crawl4AI v0.7.5: The Docker Hooks & Security Update 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/blog/release-v0.7.5.md`
- **Crawl4AI v0.7.6 Release Notes**（project_doc）：I'm excited to announce Crawl4AI v0.7.6, featuring a complete webhook infrastructure for the Docker job queue API! This release eliminates polling and brings real-time notifications to both crawling and LLM extraction workflows. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/blog/release-v0.7.6.md`
- **🚀 Crawl4AI v0.7.7: The Self-Hosting & Monitoring Update**（project_doc）：🚀 Crawl4AI v0.7.7: The Self-Hosting & Monitoring Update 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/blog/release-v0.7.7.md`
- **Crawl4AI v0.7.8: Stability & Bug Fix Release**（project_doc）：Crawl4AI v0.7.8: Stability & Bug Fix Release 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/blog/release-v0.7.8.md`
- **Crawl4AI v0.8.0 Release Notes**（project_doc）：Release Date : January 2026 Previous Version : v0.7.6 Status : Release Candidate 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/blog/release-v0.8.0.md`
- **Crawl4AI v0.8.5: Anti-Bot, Shadow DOM & 60+ Bug Fixes**（project_doc）：Crawl4AI v0.8.5: Anti-Bot, Shadow DOM & 60+ Bug Fixes 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/blog/release-v0.8.5.md`
- **browser manager.py**（project_doc）：Function What it does --- --- ManagedBrowser.build browser flags Returns baseline Chromium CLI flags, disables GPU and sandbox, plugs locale, timezone, stealth tweaks, and any extras from BrowserConfig . ManagedBrowser. init Stores config and logger, creates temp dir, preps internal state. ManagedBrowser.start Spawns or connects to the Chromium process, returns its CDP endpoint plus the subprocess.Popen handle. Mana… 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/codebase/browser.md`
- **cli.py command surface**（project_doc）：Command Inputs / flags What it does --- --- --- profiles none Opens the interactive profile manager, lets you list, create, delete saved browser profiles that live in ~/.crawl4ai/profiles . browser status – Prints whether the always-on builtin browser is running, shows its CDP URL, PID, start time. browser stop – Kills the builtin browser and deletes its status file. browser view --url, -u URL optional Pops a visibl… 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/codebase/cli.md`
- **🐳 Using Docker Legacy**（project_doc）：Crawl4AI is available as Docker images for easy deployment. You can either pull directly from Docker Hub recommended or build from the repository. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/deprecated/docker-deployment.md`
- **Builtin Browser in Crawl4AI**（project_doc）：This document explains the builtin browser feature in Crawl4AI and how to use it effectively. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/examples/README_BUILTIN_BROWSER.md`
- **Welcome to Crawl4AI! 🚀🤖**（project_doc）：Hi there, Developer! 👋 Here is an example of a research pipeline, where you can share a URL in your conversation with any LLM, and then the context of crawled pages will be used as the context. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/examples/chainlit.md`
- **Capturing Full-Page Screenshots and PDFs from Massive Webpages with Crawl4AI**（project_doc）：Capturing Full-Page Screenshots and PDFs from Massive Webpages with Crawl4AI 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/examples/full_page_screenshot_and_pdf_export.md`
- **Using storage state to Pre-Load Cookies and LocalStorage**（project_doc）：Using storage state to Pre-Load Cookies and LocalStorage 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/examples/storage_state_tutorial.md`
- **Tutorial: Clicking Buttons to Load More Content with Crawl4AI**（project_doc）：Tutorial: Clicking Buttons to Load More Content with Crawl4AI 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/examples/tutorial_dynamic_clicks.md`
- **🔬 Building an AI Research Assistant with Crawl4AI: Smart URL Discovery**（project_doc）：🔬 Building an AI Research Assistant with Crawl4AI: Smart URL Discovery 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/examples/url_seeder/tutorial_url_seeder.md`
- **Advanced Adaptive Strategies**（project_doc）：While the default adaptive crawling configuration works well for most use cases, understanding the underlying strategies and scoring mechanisms allows you to fine-tune the crawler for specific domains and requirements. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/adaptive-strategies.md`
- **Overview of Some Important Advanced Features**（project_doc）：Overview of Some Important Advanced Features Proxy, PDF, Screenshot, SSL, Headers, & Storage State 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/advanced-features.md`
- **Anti-Bot Detection & Fallback**（project_doc）：When crawling sites protected by anti-bot systems Akamai, Cloudflare, PerimeterX, DataDome, Imperva, etc. , requests often get blocked with CAPTCHAs, 403 responses, or empty pages. Crawl4AI provides a layered retry and fallback system that automatically detects blocking and escalates through multiple strategies until content is retrieved. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/anti-bot-and-fallback.md`
- **Crawl Dispatcher**（project_doc）：We’re excited to announce a Crawl Dispatcher module that can handle thousands of crawling tasks simultaneously. By efficiently managing system resources memory, CPU, network , this dispatcher ensures high-performance data extraction at scale. It also provides real-time monitoring of each crawler’s status, memory usage, and overall progress. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/crawl-dispatcher.md`
- **Download Handling in Crawl4AI**（project_doc）：This guide explains how to use Crawl4AI to handle file downloads during crawling. You'll learn how to trigger downloads, specify download locations, and access downloaded files. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/file-downloading.md`
- **Hooks & Auth in AsyncWebCrawler**（project_doc）：Crawl4AI’s hooks let you customize the crawler at specific points in the pipeline: 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/hooks-auth.md`
- **Preserve Your Identity with Crawl4AI**（project_doc）：Preserve Your Identity with Crawl4AI 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/identity-based-crawling.md`
- **Handling Lazy-Loaded Images**（project_doc）：Many websites now load images lazily as you scroll. If you need to ensure they appear in your final crawl and in result.media , consider: 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/lazy-loading.md`
- **Advanced Multi-URL Crawling with Dispatchers**（project_doc）：Advanced Multi-URL Crawling with Dispatchers 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/multi-url-crawling.md`
- **Network Requests & Console Message Capturing**（project_doc）：Network Requests & Console Message Capturing 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/network-console-capture.md`
- **PDF Processing Strategies**（project_doc）：Crawl4AI provides specialized strategies for handling and extracting content from PDF files. These strategies allow you to seamlessly integrate PDF processing into your crawling workflows, whether the PDFs are hosted online or stored locally. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/pdf-parsing.md`
- **Session Management**（project_doc）：Session management in Crawl4AI is a powerful feature that allows you to maintain state across multiple requests, making it particularly suitable for handling complex multi-step crawling tasks. It enables you to reuse the same browser tab or page object across sequential actions and crawls, which is beneficial for: 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/session-management.md`
- **SSLCertificate Reference**（project_doc）：The SSLCertificate class encapsulates an SSL certificate’s data and allows exporting it in various formats PEM, DER, JSON, or text . It’s used within Crawl4AI whenever you set fetch ssl certificate=True in your CrawlerRunConfig . 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/ssl-certificate.md`
- **Undetected Browser Mode**（project_doc）：Crawl4AI offers two powerful anti-bot features to help you access websites with bot detection: 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/undetected-browser.md`
- **Virtual Scroll**（project_doc）：Modern websites increasingly use virtual scrolling also called windowed rendering or viewport rendering to handle large datasets efficiently. This technique only renders visible items in the DOM, replacing content as users scroll. Popular examples include Twitter's timeline, Instagram's feed, and many data tables. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/advanced/virtual-scroll.md`
- **AdaptiveCrawler**（project_doc）：The AdaptiveCrawler class implements intelligent web crawling that automatically determines when sufficient information has been gathered to answer a query. It uses a three-layer scoring system to evaluate coverage, consistency, and saturation. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/api/adaptive-crawler.md`
- **arun Parameter Guide New Approach**（project_doc）：In Crawl4AI’s latest configuration model, nearly all parameters that once went directly to arun are now part of CrawlerRunConfig . When calling arun , you provide: 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/api/arun.md`
- **arun many ... Reference**（project_doc）：Note : This function is very similar to arun ./arun.md but focused on concurrent or batch crawling. If you’re unfamiliar with arun usage, please read that doc first, then review this for differences. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/api/arun_many.md`
- **AsyncWebCrawler**（project_doc）：The AsyncWebCrawler is the core class for asynchronous web crawling in Crawl4AI. You typically create it once , optionally customize it with a BrowserConfig e.g., headless, user agent , then run multiple arun calls with different CrawlerRunConfig objects. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/api/async-webcrawler.md`
- **C4A-Script API Reference**（project_doc）：Complete reference for all C4A-Script commands, syntax, and advanced features. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/api/c4a-script-reference.md`
- **CrawlResult Reference**（project_doc）：The CrawlResult class encapsulates everything returned after a single crawl operation. It provides the raw or processed content , details on links and media, plus optional metadata like screenshots, PDFs, or extracted JSON . 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/api/crawl-result.md`
- **digest**（project_doc）：The digest method is the primary interface for adaptive web crawling. It intelligently crawls websites starting from a given URL, guided by a query, and automatically determines when sufficient information has been gathered. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/api/digest.md`
- **1. BrowserConfig – Controlling the Browser**（project_doc）：1. BrowserConfig – Controlling the Browser 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/api/parameters.md`
- **Extraction & Chunking Strategies API**（project_doc）：Extraction & Chunking Strategies API 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/api/strategies.md`
- **🚀 Crawl4AI Interactive Apps**（project_doc）：Welcome to the Crawl4AI Apps Hub - your gateway to interactive tools and demos that make web scraping more intuitive and powerful. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/apps/index.md`
- **Build**（project_doc）：O Prompt for AI Coding Assistant: Create an Interactive LLM Context Builder Page 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/apps/llmtxt/build.md`
- **Supercharging Your AI Assistant: My Journey to Better LLM Contexts for crawl4ai**（project_doc）：Supercharging Your AI Assistant: My Journey to Better LLM Contexts for crawl4ai 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/apps/llmtxt/why.md`
- **Installation 💻**（project_doc）：Crawl4AI offers flexible installation options to suit various use cases. You can install it as a Python package, use it with Docker, or run it as a local server. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/basic/installation.md`
- **Adaptive Crawling: Building Dynamic Knowledge That Grows on Demand**（project_doc）：Adaptive Crawling: Building Dynamic Knowledge That Grows on Demand 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/articles/adaptive-crawling-revolution.md`
- **Introducing Event Streams and Interactive Hooks in Crawl4AI**（project_doc）：Introducing Event Streams and Interactive Hooks in Crawl4AI 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/articles/dockerize_hooks.md`
- **The LLM Context Protocol: Why Your AI Assistant Needs Memory, Reasoning, and Examples**（project_doc）：The LLM Context Protocol: Why Your AI Assistant Needs Memory, Reasoning, and Examples 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/articles/llm-context-revolution.md`
- **Solving the Virtual Scroll Puzzle: How Crawl4AI Captures What Others Miss**（project_doc）：Solving the Virtual Scroll Puzzle: How Crawl4AI Captures What Others Miss 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/articles/virtual-scroll-revolution.md`
- **Crawl4AI Blog**（project_doc）：Welcome to the Crawl4AI blog! Here you'll find detailed release notes, technical insights, and updates about the project. Whether you're looking for the latest improvements or want to dive deep into web crawling techniques, this is the place. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/index.md`
- **Release Summary for Version 0.4.0 December 1, 2024**（project_doc）：Release Summary for Version 0.4.0 December 1, 2024 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/releases/0.4.0.md`
- **Release Summary for Version 0.4.1 December 8, 2024 : Major Efficiency Boosts with New Features!**（project_doc）：Release Summary for Version 0.4.1 December 8, 2024 : Major Efficiency Boosts with New Features! 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/releases/0.4.1.md`
- **🚀 Crawl4AI 0.4.2 Update: Smarter Crawling Just Got Easier Dec 12, 2024**（project_doc）：🚀 Crawl4AI 0.4.2 Update: Smarter Crawling Just Got Easier Dec 12, 2024 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/releases/0.4.2.md`
- **Crawl4AI v0.5.0 Release Notes**（project_doc）：Release Theme: Power, Flexibility, and Scalability 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/releases/0.5.0.md`
- **Crawl4AI v0.6.0 Release Notes**（project_doc）：We're excited to announce the release of Crawl4AI v0.6.0 , our biggest and most feature-rich update yet. This version introduces major architectural upgrades, brand-new capabilities for geo-aware crawling, high-efficiency scraping, and real-time streaming support for scalable deployments. 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/releases/0.6.0.md`
- **🚀 Crawl4AI v0.7.0: The Adaptive Intelligence Update**（project_doc）：🚀 Crawl4AI v0.7.0: The Adaptive Intelligence Update 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/releases/0.7.0.md`
- **🛠️ Crawl4AI v0.7.1: Minor Cleanup Update**（project_doc）：🛠️ Crawl4AI v0.7.1: Minor Cleanup Update 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/releases/0.7.1.md`
- **🚀 Crawl4AI v0.7.2: CI/CD & Dependency Optimization Update**（project_doc）：🚀 Crawl4AI v0.7.2: CI/CD & Dependency Optimization Update 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/releases/0.7.2.md`
- **🚀 Crawl4AI v0.7.3: The Multi-Config Intelligence Update**（project_doc）：🚀 Crawl4AI v0.7.3: The Multi-Config Intelligence Update 激活提示：当用户需要理解项目结构、安装方式或边界时参考。 证据：`docs/md_v2/blog/releases/0.7.3.md`

## 证据索引

- 共索引 80 条证据。

- **GitHub Actions Workflows Documentation**（documentation）：GitHub Actions Workflows Documentation 证据：`.github/workflows/docs/README.md`
- **Crawl4AI Prospect‑Wizard – step‑by‑step guide**（documentation）：Crawl4AI Prospect‑Wizard – step‑by‑step guide 证据：`docs/apps/linkdin/README.md`
- **Adaptive Crawling Examples**（documentation）：This directory contains examples demonstrating various aspects of Crawl4AI's Adaptive Crawling feature. 证据：`docs/examples/adaptive_crawling/README.md`
- **Amazon R2D2 Product Search Example**（documentation）：A real-world demonstration of Crawl4AI's multi-step crawling with LLM-generated automation scripts. 证据：`docs/examples/c4a_script/amazon_example/README.md`
- **C4A-Script Interactive Tutorial**（documentation）：A comprehensive web-based tutorial for learning and experimenting with C4A-Script - Crawl4AI's visual web automation language. 证据：`docs/examples/c4a_script/tutorial/README.md`
- **Web Scraper API with Custom Model Support**（documentation）：Web Scraper API with Custom Model Support 证据：`docs/examples/website-to-api/README.md`
- **C4A-Script Interactive Tutorial**（documentation）：A comprehensive web-based tutorial for learning and experimenting with C4A-Script - Crawl4AI's visual web automation language. 证据：`docs/md_v2/apps/c4a-script/README.md`
- **Crawl4AI Chrome Extension**（documentation）：Visual extraction tools for Crawl4AI - Click to extract data and content from any webpage! 证据：`docs/md_v2/apps/crawl4ai-assistant/README.md`
- **Crawl4AI Marketplace**（documentation）：A terminal-themed marketplace for tools, integrations, and resources related to Crawl4AI. 证据：`docs/md_v2/marketplace/README.md`
- **Contributing to Crawl4AI**（documentation）：Welcome to the Crawl4AI project! As an open-source library for web crawling and AI integration, we value contributions from the community. This guide explains our branching strategy, how to contribute effectively, and the overall release process. Our goal is to maintain a stable, collaborative environment where bug fixes, features, and improvements can be integrated smoothly while allowing for experimental development. 证据：`docs/md_v2/CONTRIBUTING.md`
- **🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper.**（documentation）：🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. 证据：`README.md`
- **Software Bill of Materials SBOM**（documentation）：This directory contains the CycloneDX SBOM for the project. 证据：`sbom/README.md`
- **Crawl4AI Docker Guide 🐳**（documentation）：Table of Contents - Prerequisites prerequisites - Installation installation - Option 1: Using Pre-built Docker Hub Images Recommended option-1-using-pre-built-docker-hub-images-recommended - Option 2: Using Docker Compose option-2-using-docker-compose - Option 3: Manual Local Build & Run option-3-manual-local-build--run - Dockerfile Parameters dockerfile-parameters - Using the API using-the-api - Playground Interface playground-interface - Python SDK python-sdk - Understanding Request Schema understanding-request-schema - REST API Examples rest-api-examples - Asynchronous Jobs with Webhooks asynchronous-jobs-with-webhooks - Additional API Endpoints additional-api-endpoints - HTML Extraction… 证据：`deploy/docker/README.md`
- **Crawl4AI Stress Testing and Benchmarking**（documentation）：Crawl4AI Stress Testing and Benchmarking 证据：`tests/memory/README.md`
- **Contributing to Crawl4AI**（documentation）：Welcome to the Crawl4AI project! As an open-source library for web crawling and AI integration, we value contributions from the community. This guide explains our branching strategy, how to contribute effectively, and the overall release process. Our goal is to maintain a stable, collaborative environment where bug fixes, features, and improvements can be integrated smoothly while allowing for experimental development. 证据：`CONTRIBUTING.md`
- **License**（source_file）：Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ 证据：`LICENSE`
- **Crawl4AI v0.8.0 Release Notes**（documentation）：Release Date : January 2026 Previous Version : v0.7.6 Status : Release Candidate 证据：`docs/RELEASE_NOTES_v0.8.0.md`
- **Workflow Architecture Documentation**（documentation）：Workflow Architecture Documentation 证据：`.github/workflows/docs/ARCHITECTURE.md`
- **Workflow Quick Reference**（documentation）：Standard Release bash 1. Update version vim crawl4ai/ version .py Set to "1.2.3" 证据：`.github/workflows/docs/WORKFLOW_REFERENCE.md`
- **🚀 Crawl4AI v0.7.0: The Adaptive Intelligence Update**（documentation）：🚀 Crawl4AI v0.7.0: The Adaptive Intelligence Update 证据：`docs/blog/release-v0.7.0.md`
- **🛠️ Crawl4AI v0.7.1: Minor Cleanup Update**（documentation）：🛠️ Crawl4AI v0.7.1: Minor Cleanup Update 证据：`docs/blog/release-v0.7.1.md`
- **🚀 Crawl4AI v0.7.3: The Multi-Config Intelligence Update**（documentation）：🚀 Crawl4AI v0.7.3: The Multi-Config Intelligence Update 证据：`docs/blog/release-v0.7.3.md`
- **🚀 Crawl4AI v0.7.4: The Intelligent Table Extraction & Performance Update**（documentation）：🚀 Crawl4AI v0.7.4: The Intelligent Table Extraction & Performance Update 证据：`docs/blog/release-v0.7.4.md`
- **🚀 Crawl4AI v0.7.5: The Docker Hooks & Security Update**（documentation）：🚀 Crawl4AI v0.7.5: The Docker Hooks & Security Update 证据：`docs/blog/release-v0.7.5.md`
- **Crawl4AI v0.7.6 Release Notes**（documentation）：I'm excited to announce Crawl4AI v0.7.6, featuring a complete webhook infrastructure for the Docker job queue API! This release eliminates polling and brings real-time notifications to both crawling and LLM extraction workflows. 证据：`docs/blog/release-v0.7.6.md`
- **🚀 Crawl4AI v0.7.7: The Self-Hosting & Monitoring Update**（documentation）：🚀 Crawl4AI v0.7.7: The Self-Hosting & Monitoring Update 证据：`docs/blog/release-v0.7.7.md`
- **Crawl4AI v0.7.8: Stability & Bug Fix Release**（documentation）：Crawl4AI v0.7.8: Stability & Bug Fix Release 证据：`docs/blog/release-v0.7.8.md`
- **Crawl4AI v0.8.0 Release Notes**（documentation）：Release Date : January 2026 Previous Version : v0.7.6 Status : Release Candidate 证据：`docs/blog/release-v0.8.0.md`
- **Crawl4AI v0.8.5: Anti-Bot, Shadow DOM & 60+ Bug Fixes**（documentation）：Crawl4AI v0.8.5: Anti-Bot, Shadow DOM & 60+ Bug Fixes 证据：`docs/blog/release-v0.8.5.md`
- **browser manager.py**（documentation）：Function What it does --- --- ManagedBrowser.build browser flags Returns baseline Chromium CLI flags, disables GPU and sandbox, plugs locale, timezone, stealth tweaks, and any extras from BrowserConfig . ManagedBrowser. init Stores config and logger, creates temp dir, preps internal state. ManagedBrowser.start Spawns or connects to the Chromium process, returns its CDP endpoint plus the subprocess.Popen handle. ManagedBrowser. initial startup check Pings the CDP endpoint once to be sure the browser is alive, raises if not. ManagedBrowser. monitor browser process Async-loops on the subprocess, logs exits or crashes, restarts if policy allows. ManagedBrowser. get browser path WIP Old helper t… 证据：`docs/codebase/browser.md`
- **cli.py command surface**（documentation）：Command Inputs / flags What it does --- --- --- profiles none Opens the interactive profile manager, lets you list, create, delete saved browser profiles that live in ~/.crawl4ai/profiles . browser status – Prints whether the always-on builtin browser is running, shows its CDP URL, PID, start time. browser stop – Kills the builtin browser and deletes its status file. browser view --url, -u URL optional Pops a visible window of the builtin browser, navigates to URL or about:blank . config list – Dumps every global setting, showing current value, default, and description. config get key Prints the value of a single setting, falls back to default if unset. config set key value Persists a new v… 证据：`docs/codebase/cli.md`
- **🐳 Using Docker Legacy**（documentation）：Crawl4AI is available as Docker images for easy deployment. You can either pull directly from Docker Hub recommended or build from the repository. 证据：`docs/deprecated/docker-deployment.md`
- **Builtin Browser in Crawl4AI**（documentation）：This document explains the builtin browser feature in Crawl4AI and how to use it effectively. 证据：`docs/examples/README_BUILTIN_BROWSER.md`
- **Welcome to Crawl4AI! 🚀🤖**（documentation）：Hi there, Developer! 👋 Here is an example of a research pipeline, where you can share a URL in your conversation with any LLM, and then the context of crawled pages will be used as the context. 证据：`docs/examples/chainlit.md`
- **Capturing Full-Page Screenshots and PDFs from Massive Webpages with Crawl4AI**（documentation）：Capturing Full-Page Screenshots and PDFs from Massive Webpages with Crawl4AI 证据：`docs/examples/full_page_screenshot_and_pdf_export.md`
- **Using storage state to Pre-Load Cookies and LocalStorage**（documentation）：Using storage state to Pre-Load Cookies and LocalStorage 证据：`docs/examples/storage_state_tutorial.md`
- **Tutorial: Clicking Buttons to Load More Content with Crawl4AI**（documentation）：Tutorial: Clicking Buttons to Load More Content with Crawl4AI 证据：`docs/examples/tutorial_dynamic_clicks.md`
- **🔬 Building an AI Research Assistant with Crawl4AI: Smart URL Discovery**（documentation）：🔬 Building an AI Research Assistant with Crawl4AI: Smart URL Discovery 证据：`docs/examples/url_seeder/tutorial_url_seeder.md`
- **Advanced Adaptive Strategies**（documentation）：While the default adaptive crawling configuration works well for most use cases, understanding the underlying strategies and scoring mechanisms allows you to fine-tune the crawler for specific domains and requirements. 证据：`docs/md_v2/advanced/adaptive-strategies.md`
- **Overview of Some Important Advanced Features**（documentation）：Overview of Some Important Advanced Features Proxy, PDF, Screenshot, SSL, Headers, & Storage State 证据：`docs/md_v2/advanced/advanced-features.md`
- **Anti-Bot Detection & Fallback**（documentation）：When crawling sites protected by anti-bot systems Akamai, Cloudflare, PerimeterX, DataDome, Imperva, etc. , requests often get blocked with CAPTCHAs, 403 responses, or empty pages. Crawl4AI provides a layered retry and fallback system that automatically detects blocking and escalates through multiple strategies until content is retrieved. 证据：`docs/md_v2/advanced/anti-bot-and-fallback.md`
- **Crawl Dispatcher**（documentation）：We’re excited to announce a Crawl Dispatcher module that can handle thousands of crawling tasks simultaneously. By efficiently managing system resources memory, CPU, network , this dispatcher ensures high-performance data extraction at scale. It also provides real-time monitoring of each crawler’s status, memory usage, and overall progress. 证据：`docs/md_v2/advanced/crawl-dispatcher.md`
- **Download Handling in Crawl4AI**（documentation）：This guide explains how to use Crawl4AI to handle file downloads during crawling. You'll learn how to trigger downloads, specify download locations, and access downloaded files. 证据：`docs/md_v2/advanced/file-downloading.md`
- **Hooks & Auth in AsyncWebCrawler**（documentation）：Crawl4AI’s hooks let you customize the crawler at specific points in the pipeline: 证据：`docs/md_v2/advanced/hooks-auth.md`
- **Preserve Your Identity with Crawl4AI**（documentation）：Preserve Your Identity with Crawl4AI 证据：`docs/md_v2/advanced/identity-based-crawling.md`
- **Handling Lazy-Loaded Images**（documentation）：Many websites now load images lazily as you scroll. If you need to ensure they appear in your final crawl and in result.media , consider: 证据：`docs/md_v2/advanced/lazy-loading.md`
- **Advanced Multi-URL Crawling with Dispatchers**（documentation）：Advanced Multi-URL Crawling with Dispatchers 证据：`docs/md_v2/advanced/multi-url-crawling.md`
- **Network Requests & Console Message Capturing**（documentation）：Network Requests & Console Message Capturing 证据：`docs/md_v2/advanced/network-console-capture.md`
- **PDF Processing Strategies**（documentation）：Crawl4AI provides specialized strategies for handling and extracting content from PDF files. These strategies allow you to seamlessly integrate PDF processing into your crawling workflows, whether the PDFs are hosted online or stored locally. 证据：`docs/md_v2/advanced/pdf-parsing.md`
- **Session Management**（documentation）：Session management in Crawl4AI is a powerful feature that allows you to maintain state across multiple requests, making it particularly suitable for handling complex multi-step crawling tasks. It enables you to reuse the same browser tab or page object across sequential actions and crawls, which is beneficial for: 证据：`docs/md_v2/advanced/session-management.md`
- **SSLCertificate Reference**（documentation）：The SSLCertificate class encapsulates an SSL certificate’s data and allows exporting it in various formats PEM, DER, JSON, or text . It’s used within Crawl4AI whenever you set fetch ssl certificate=True in your CrawlerRunConfig . 证据：`docs/md_v2/advanced/ssl-certificate.md`
- **Undetected Browser Mode**（documentation）：Crawl4AI offers two powerful anti-bot features to help you access websites with bot detection: 证据：`docs/md_v2/advanced/undetected-browser.md`
- **Virtual Scroll**（documentation）：Modern websites increasingly use virtual scrolling also called windowed rendering or viewport rendering to handle large datasets efficiently. This technique only renders visible items in the DOM, replacing content as users scroll. Popular examples include Twitter's timeline, Instagram's feed, and many data tables. 证据：`docs/md_v2/advanced/virtual-scroll.md`
- **AdaptiveCrawler**（documentation）：The AdaptiveCrawler class implements intelligent web crawling that automatically determines when sufficient information has been gathered to answer a query. It uses a three-layer scoring system to evaluate coverage, consistency, and saturation. 证据：`docs/md_v2/api/adaptive-crawler.md`
- **arun Parameter Guide New Approach**（documentation）：In Crawl4AI’s latest configuration model, nearly all parameters that once went directly to arun are now part of CrawlerRunConfig . When calling arun , you provide: 证据：`docs/md_v2/api/arun.md`
- **arun many ... Reference**（documentation）：Note : This function is very similar to arun ./arun.md but focused on concurrent or batch crawling. If you’re unfamiliar with arun usage, please read that doc first, then review this for differences. 证据：`docs/md_v2/api/arun_many.md`
- **AsyncWebCrawler**（documentation）：The AsyncWebCrawler is the core class for asynchronous web crawling in Crawl4AI. You typically create it once , optionally customize it with a BrowserConfig e.g., headless, user agent , then run multiple arun calls with different CrawlerRunConfig objects. 证据：`docs/md_v2/api/async-webcrawler.md`
- **C4A-Script API Reference**（documentation）：Complete reference for all C4A-Script commands, syntax, and advanced features. 证据：`docs/md_v2/api/c4a-script-reference.md`
- **CrawlResult Reference**（documentation）：The CrawlResult class encapsulates everything returned after a single crawl operation. It provides the raw or processed content , details on links and media, plus optional metadata like screenshots, PDFs, or extracted JSON . 证据：`docs/md_v2/api/crawl-result.md`
- **digest**（documentation）：The digest method is the primary interface for adaptive web crawling. It intelligently crawls websites starting from a given URL, guided by a query, and automatically determines when sufficient information has been gathered. 证据：`docs/md_v2/api/digest.md`
- 其余 20 条证据见 `AI_CONTEXT_PACK.json` 或 `EVIDENCE_INDEX.json`。

## 宿主 AI 必须遵守的规则

- **把本资产当作开工前上下文，而不是运行环境。**：AI Context Pack 只包含证据化项目理解，不包含目标项目的可执行状态。 证据：`.github/workflows/docs/README.md`, `docs/apps/linkdin/README.md`, `docs/examples/adaptive_crawling/README.md`
- **回答用户时区分可预览内容与必须安装后才能验证的内容。**：安装前体验的消费者价值来自降低误装和误判，而不是伪装成真实运行。 证据：`.github/workflows/docs/README.md`, `docs/apps/linkdin/README.md`, `docs/examples/adaptive_crawling/README.md`

## 用户开工前应该回答的问题

- 你准备在哪个宿主 AI 或本地环境中使用它？
- 你只是想先体验工作流，还是准备真实安装？
- 你最在意的是安装成本、输出质量、还是和现有规则的冲突？

## 验收标准

- 所有能力声明都能回指到 evidence_refs 中的文件路径。
- AI_CONTEXT_PACK.md 没有把预览包装成真实运行。
- 用户能在 3 分钟内看懂适合谁、能做什么、如何开始和风险边界。

---

## Doramagic Context Augmentation

下面内容用于强化 Repomix/AI Context Pack 主体。Human Manual 只提供阅读骨架；踩坑日志会被转成宿主 AI 必须遵守的工作约束。

## Human Manual 骨架

使用规则：这里只是项目阅读路线和显著性信号，不是事实权威。具体事实仍必须回到 repo evidence / Claim Graph。

宿主 AI 硬性规则：
- 不得把页标题、章节顺序、摘要或 importance 当作项目事实证据。
- 解释 Human Manual 骨架时，必须明确说它只是阅读路线/显著性信号。
- 能力、安装、兼容性、运行状态和风险判断必须引用 repo evidence、source path 或 Claim Graph。

- **项目概览**：importance `high`
  - source_paths: README.md, CHANGELOG.md, crawl4ai/__init__.py, crawl4ai/__version__.py, ROADMAP.md
- **安装与配置**：importance `high`
  - source_paths: setup.py, pyproject.toml, requirements.txt, Dockerfile, docker-compose.yml
- **快速开始**：importance `high`
  - source_paths: docs/examples/hello_world.py, docs/examples/quickstart.py, docs/examples/quickstart.ipynb, crawl4ai/async_webcrawler.py, crawl4ai/models.py
- **异步网页爬虫核心**：importance `high`
  - source_paths: crawl4ai/async_webcrawler.py, crawl4ai/async_crawler_strategy.py, crawl4ai/cache_context.py, crawl4ai/cache_validator.py, crawl4ai/config.py
- **深度爬取策略**：importance `high`
  - source_paths: crawl4ai/deep_crawling/__init__.py, crawl4ai/deep_crawling/bfs_strategy.py, crawl4ai/deep_crawling/dfs_strategy.py, crawl4ai/deep_crawling/bff_strategy.py, crawl4ai/deep_crawling/base_strategy.py
- **Markdown 生成**：importance `high`
  - source_paths: crawl4ai/markdown_generation_strategy.py, crawl4ai/content_filter_strategy.py, crawl4ai/html2text/__init__.py, crawl4ai/html2text/elements.py, crawl4ai/html2text/config.py
- **数据提取策略**：importance `high`
  - source_paths: crawl4ai/extraction_strategy.py, crawl4ai/content_scraping_strategy.py, crawl4ai/table_extraction.py, crawl4ai/crawlers/amazon_product/crawler.py, crawl4ai/crawlers/google_search/crawler.py
- **分块与过滤策略**：importance `medium`
  - source_paths: crawl4ai/chunking_strategy.py, crawl4ai/content_filter_strategy.py, crawl4ai/model_loader.py, crawl4ai/prompts.py

## Repo Inspection Evidence / 源码检查证据

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `1debe5f5fcc118ced10826a1040a81f9b77e9255`
- inspected_files: `pyproject.toml`, `Dockerfile`, `README.md`, `docker-compose.yml`, `uv.lock`, `requirements.txt`, `docs/RELEASE_NOTES_v0.8.0.md`, `docs/releases_review/v0_7_0_features_demo.py`, `docs/releases_review/v0.3.74.overview.py`, `docs/releases_review/demo_v0.7.7.py`, `docs/releases_review/v0.7.5_docker_hooks_demo.py`, `docs/releases_review/v0_4_3b2_features_demo.py`, `docs/releases_review/demo_v0.8.5.py`, `docs/releases_review/demo_v0.7.6.py`, `docs/releases_review/demo_v0.7.0.py`, `docs/releases_review/demo_v0.8.0.py`, `docs/releases_review/demo_v0.7.8.py`, `docs/releases_review/v0_4_24_walkthrough.py`, `docs/releases_review/crawl4ai_v0_7_0_showcase.py`, `docs/releases_review/demo_v0.7.5.py`

宿主 AI 硬性规则：
- 没有 repo_clone_verified=true 时，不得声称已经读过源码。
- 没有 repo_inspection_verified=true 时，不得把 README/docs/package 文件判断写成事实。
- 没有 quick_start_verified=true 时，不得声称 Quick Start 已跑通。

## Doramagic Pitfall Constraints / 踩坑约束

这些规则来自 Doramagic 发现、验证或编译过程中的项目专属坑点。宿主 AI 必须把它们当作工作约束，而不是普通说明文字。

### Constraint 1: 来源证据：[Bug]: arun() and arun_many() type hinting needs fixing

- Trigger: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：[Bug]: arun() and arun_many() type hinting needs fixing
- Host AI rule: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Why it matters: 可能增加新用户试用和生产接入成本。
- Evidence: community_evidence:github | cevd_d3b6cfd3700147f690e0e65875f15424 | https://github.com/unclecode/crawl4ai/issues/1898 | 来源讨论提到 python 相关条件，需在安装/试用前复核。
- Hard boundary: 不要把这个坑点包装成已解决、已验证或可忽略，除非后续验证证据明确证明它已经关闭。

### Constraint 2: 来源证据：[Bug]: After successful FETCH, and failed SCRAPE (COMPLETE being marked as failed), no error messages or failure reason…

- Trigger: GitHub 社区证据显示该项目存在一个配置相关的待验证问题：[Bug]: After successful FETCH, and failed SCRAPE (COMPLETE being marked as failed), no error messages or failure reason is shown
- Host AI rule: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Why it matters: 可能增加新用户试用和生产接入成本。
- Evidence: community_evidence:github | cevd_ad61b108bf894cc286ca7966e8c86758 | https://github.com/unclecode/crawl4ai/issues/1949 | 来源讨论提到 python 相关条件，需在安装/试用前复核。
- Hard boundary: 不要把这个坑点包装成已解决、已验证或可忽略，除非后续验证证据明确证明它已经关闭。

### Constraint 3: 来源证据：[Bug]: MCP scrape tools lack wait_until / SPA support that REST API and CLI provide

- Trigger: GitHub 社区证据显示该项目存在一个配置相关的待验证问题：[Bug]: MCP scrape tools lack wait_until / SPA support that REST API and CLI provide
- Host AI rule: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Why it matters: 可能增加新用户试用和生产接入成本。
- Evidence: community_evidence:github | cevd_1ee99f5d72f143f4b064732cc19e0c85 | https://github.com/unclecode/crawl4ai/issues/1963 | 来源讨论提到 python 相关条件，需在安装/试用前复核。
- Hard boundary: 不要把这个坑点包装成已解决、已验证或可忽略，除非后续验证证据明确证明它已经关闭。

### Constraint 4: 来源证据：[Bug]: `remove_empty_elements_fast()` drops trailing text when removing empty elements with non-empty .tail

- Trigger: GitHub 社区证据显示该项目存在一个配置相关的待验证问题：[Bug]: `remove_empty_elements_fast()` drops trailing text when removing empty elements with non-empty .tail
- Host AI rule: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Why it matters: 可能增加新用户试用和生产接入成本。
- Evidence: community_evidence:github | cevd_d7fa967632a948008efbc182d1f2c96b | https://github.com/unclecode/crawl4ai/issues/1938 | 来源讨论提到 python 相关条件，需在安装/试用前复核。
- Hard boundary: 不要把这个坑点包装成已解决、已验证或可忽略，除非后续验证证据明确证明它已经关闭。

### Constraint 5: 来源证据：[Bug] MCP Server json.dumps() escapes non-ASCII characters, causing 2.5-3x token overhead for CJK content

- Trigger: GitHub 社区证据显示该项目存在一个安全/权限相关的待验证问题：[Bug] MCP Server json.dumps() escapes non-ASCII characters, causing 2.5-3x token overhead for CJK content
- Host AI rule: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Why it matters: 可能影响授权、密钥配置或安全边界。
- Evidence: community_evidence:github | cevd_2e9fbf659fbb40aba437886a87f8e2d7 | https://github.com/unclecode/crawl4ai/issues/1962 | 来源讨论提到 python 相关条件，需在安装/试用前复核。
- Hard boundary: 不要把这个坑点包装成已解决、已验证或可忽略，除非后续验证证据明确证明它已经关闭。

### Constraint 6: 来源证据：[Bug] AsyncLogger writes to stdout, breaking MCP stdio transport

- Trigger: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：[Bug] AsyncLogger writes to stdout, breaking MCP stdio transport
- Host AI rule: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Why it matters: 可能影响升级、迁移或版本选择。
- Evidence: community_evidence:github | cevd_af29278fd7294d4a8f0f6f37ab987b5c | https://github.com/unclecode/crawl4ai/issues/1968 | 来源讨论提到 python 相关条件，需在安装/试用前复核。
- Hard boundary: 不要把这个坑点包装成已解决、已验证或可忽略，除非后续验证证据明确证明它已经关闭。

### Constraint 7: 来源证据：[Bug]: The install with pip on just about any system rarely works. It requires an env or it only partial installs

- Trigger: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：[Bug]: The install with pip on just about any system rarely works. It requires an env or it only partial installs
- Host AI rule: 来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- Why it matters: 可能增加新用户试用和生产接入成本。
- Evidence: community_evidence:github | cevd_97d44cedb21a4908a7743fde11209954 | https://github.com/unclecode/crawl4ai/issues/1950 | 来源讨论提到 python 相关条件，需在安装/试用前复核。
- Hard boundary: 不要把这个坑点包装成已解决、已验证或可忽略，除非后续验证证据明确证明它已经关闭。

### Constraint 8: 来源证据：[Bug]: enable_stealth=True is a silent no-op — StealthAdapter imports symbols that don't exist in playwright-stealth 2.x

- Trigger: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：[Bug]: enable_stealth=True is a silent no-op — StealthAdapter imports symbols that don't exist in playwright-stealth 2.x
- Host AI rule: 来源问题仍为 open，Pack Agent 需要复核是否仍影响当前版本。
- Why it matters: 可能增加新用户试用和生产接入成本。
- Evidence: community_evidence:github | cevd_ae45861377894b99a57d6bbdc06af313 | https://github.com/unclecode/crawl4ai/issues/1959 | 来源讨论提到 python 相关条件，需在安装/试用前复核。
- Hard boundary: 不要把这个坑点包装成已解决、已验证或可忽略，除非后续验证证据明确证明它已经关闭。

### Constraint 9: 来源证据：v0.7.1:Update

- Trigger: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：v0.7.1:Update
- Host AI rule: 来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- Why it matters: 可能影响升级、迁移或版本选择。
- Evidence: community_evidence:github | cevd_a6ae9133fff54443b712725f51769fa1 | https://github.com/unclecode/crawl4ai/releases/tag/v0.7.1 | 来源讨论提到 python 相关条件，需在安装/试用前复核。
- Hard boundary: 不要把这个坑点包装成已解决、已验证或可忽略，除非后续验证证据明确证明它已经关闭。

### Constraint 10: 来源证据：v0.7.2: CI/CD & Dependency Optimization Update

- Trigger: GitHub 社区证据显示该项目存在一个安装相关的待验证问题：v0.7.2: CI/CD & Dependency Optimization Update
- Host AI rule: 来源显示可能已有修复、规避或版本变化，说明书中必须标注适用版本。
- Why it matters: 可能影响升级、迁移或版本选择。
- Evidence: community_evidence:github | cevd_14954e0431ca426ebeaa4bb31778d4af | https://github.com/unclecode/crawl4ai/releases/tag/v0.7.2 | 来源讨论提到 docker 相关条件，需在安装/试用前复核。
- Hard boundary: 不要把这个坑点包装成已解决、已验证或可忽略，除非后续验证证据明确证明它已经关闭。
