# crawlee - Doramagic AI Context Pack

> Purpose: pre-work context for the user's host AI. This pack does not prove that the project has been installed, run, or validated.

## Project

- canonical_name: `apify/crawlee`
- capability: Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
- expected_user_outcome: The user can build reliable crawlers for Node.js in JavaScript or TypeScript: extract data for AI, LLMs, RAG, or GPTs; download HTML, PDF, JPG, PNG, and other files from websites; and work with Puppeteer, Playwright, Cheerio, JSDOM, or raw HTTP, in headful or headless mode, with proxy rotation.

## Operating Boundaries

- Do not claim that the project has been installed, run, called through an API, or used on local files unless separate evidence proves it.
- Project facts must come from repo evidence, Claim Graph, or explicit source references.
- When a capability is not verified, mark it as unverified instead of completing it as fact.
- publish_status: `publishable`
- blocking_gaps: none
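
The boundary on unverified capabilities can be illustrated with a minimal sketch. The `Claim` shape and the `renderClaim` helper are hypothetical names introduced only for this example; they are not part of the pack or of Crawlee. The sketch assumes a claim counts as verified only when it carries at least one evidence reference.

```typescript
// Hypothetical shape: neither `Claim` nor `renderClaim` comes from the pack or from Crawlee.
interface Claim {
  text: string;
  // Source references such as repo paths or Claim Graph ids; empty means unverified.
  evidence: string[];
}

// Label claims that lack evidence so they are never presented as fact.
function renderClaim(claim: Claim): string {
  if (claim.evidence.length === 0) {
    return `[unverified] ${claim.text}`;
  }
  return `${claim.text} (evidence: ${claim.evidence.join(", ")})`;
}

const exampleClaims: Claim[] = [
  { text: "The repository contains a README.md.", evidence: ["README.md"] },
  { text: "The project has been run locally.", evidence: [] },
];

for (const claim of exampleClaims) {
  console.log(renderClaim(claim));
}
```

Keeping the label in the rendered text, rather than dropping the claim, lets the reader see what still needs evidence.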

---

## Doramagic Context Augmentation

The following sections strengthen the repository context for a host AI. Human Manual data is a reading route, and pitfall notes become operating constraints.

## Human Manual Outline

Usage rule: this is only a reading route and salience signal, not factual authority. Concrete claims must still return to repo evidence or Claim Graph.

Host AI hard rules:
- Do not treat page titles, section order, summaries, or importance values as factual project evidence.
- When explaining the Human Manual outline, state that it is only a reading route or salience signal.
- Capability, installation, compatibility, runtime state, and risk claims must cite repo evidence, source paths, or Claim Graph.

- **Overview, Architecture, and Package Layout**: importance `high`
  - source_paths: README.md, package.json, lerna.json, turbo.json, packages/crawlee/src/index.ts
- **Crawler Hierarchy and HTTP Clients**: importance `high`
  - source_paths: packages/basic-crawler/src/internals/basic-crawler.ts, packages/basic-crawler/src/internals/send-request.ts, packages/http-crawler/src/internals/http-crawler.ts, packages/http-crawler/src/internals/file-download.ts, packages/cheerio-crawler/src/internals/cheerio-crawler.ts
- **Browser Pool, Launchers, and Fingerprinting**: importance `high`
  - source_paths: packages/browser-pool/src/browser-pool.ts, packages/browser-pool/src/abstract-classes/browser-controller.ts, packages/browser-pool/src/abstract-classes/browser-plugin.ts, packages/browser-pool/src/launch-context.ts, packages/browser-pool/src/anonymize-proxy.ts
- **Storage, Sessions, Proxies, Autoscaling, and CLI**: importance `high`
  - source_paths: packages/core/src/storages/storage_manager.ts, packages/core/src/storages/dataset.ts, packages/core/src/storages/key_value_store.ts, packages/core/src/storages/request_list.ts, packages/core/src/storages/request_queue.ts

## Repo Inspection Evidence

- repo_clone_verified: true
- repo_inspection_verified: true
- repo_commit: `15e77ef78849f42ffcf13f44a014d5df43eebc26`
- inspected_files: `README.md`, `package.json`, `docs/deployment/apify_platform.mdx`, `docs/deployment/apify_platform_init_exit.ts`, `docs/deployment/apify_platform_main.ts`, `docs/deployment/aws-browsers.md`, `docs/deployment/aws-cheerio.md`, `docs/deployment/gcp-browsers.md`, `docs/deployment/gcp-cheerio.md`, `docs/examples/accept_user_input.mdx`, `docs/examples/accept_user_input.ts`, `docs/examples/add_data_to_dataset.mdx`, `docs/examples/add_data_to_dataset.ts`, `docs/examples/basic_crawler.mdx`, `docs/examples/basic_crawler.ts`, `docs/examples/cheerio_crawler.mdx`, `docs/examples/cheerio_crawler.ts`, `docs/examples/crawl_all_links.mdx`, `docs/examples/crawl_all_links_cheerio.ts`, `docs/examples/crawl_all_links_playwright.ts`

Host AI hard rules:
- Without repo_clone_verified=true, do not claim that the source code has been read.
- Without repo_inspection_verified=true, do not write README, docs, or package-file conclusions as facts.
- Without quick_start_verified=true, do not claim that the Quick Start path has run successfully.
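
As a rough sketch, the three rules above can be read as a gate from verification flags to permitted statements. The flag names mirror the pack; the `InspectionEvidence` and `AllowedClaims` types and the `allowedClaims` function are hypothetical. The sketch assumes that a flag missing from the evidence, such as `quick_start_verified` here, is treated as false.

```typescript
// Flag names mirror the pack; the types and function below are hypothetical.
interface InspectionEvidence {
  repo_clone_verified: boolean;
  repo_inspection_verified: boolean;
  // Assumption: a flag that is absent from the evidence counts as false.
  quick_start_verified?: boolean;
}

interface AllowedClaims {
  sourceCodeRead: boolean;
  docsConclusionsAsFact: boolean;
  quickStartRan: boolean;
}

// Each permission is granted only when its flag is explicitly true.
function allowedClaims(evidence: InspectionEvidence): AllowedClaims {
  return {
    sourceCodeRead: evidence.repo_clone_verified === true,
    docsConclusionsAsFact: evidence.repo_inspection_verified === true,
    quickStartRan: evidence.quick_start_verified === true,
  };
}

// Values recorded in this pack's "Repo Inspection Evidence" section.
const packEvidence: InspectionEvidence = {
  repo_clone_verified: true,
  repo_inspection_verified: true,
};

console.log(allowedClaims(packEvidence));
```

Under these assumptions the host AI may describe the inspected source and documentation, but not a successful Quick Start run.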

## Doramagic Pitfall Constraints

These rules come from Doramagic discovery, validation, or compilation findings. The host AI must treat them as operating constraints, not background notes.

### Constraint 1: Maintenance risk requires verification

- Trigger: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/apify/crawlee/issues/2046
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 2: Configuration risk requires verification

- Trigger: Release v3.15.1 is flagged for a configuration risk; check it before relying on the project.
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check against v3.15.1. The source discussion did not expose a precise runtime context.
- Why it matters: An upgrade or migration involving v3.15.1 may change expected behavior.
- Evidence: failure_mode_cluster:github_release | https://github.com/apify/crawlee/releases/tag/v3.15.1
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 3: Configuration risk requires verification

- Trigger: Release v3.16.0 is flagged for a configuration risk; check it before relying on the project.
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check against v3.16.0. The source discussion did not expose a precise runtime context.
- Why it matters: An upgrade or migration involving v3.16.0 may change expected behavior.
- Evidence: failure_mode_cluster:github_release | https://github.com/apify/crawlee/releases/tag/v3.16.0
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 4: Capability evidence risk requires verification

- Trigger: The pack assumes that the README/documentation is current enough for a first validation pass; this assumption is unverified.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/apify/crawlee
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 5: Runtime risk requires verification

- Trigger: Release v3.15.0 is flagged for a runtime risk; check it before relying on the project.
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check against v3.15.0. The source discussion did not expose a precise runtime context.
- Why it matters: An upgrade or migration involving v3.15.0 may change expected behavior.
- Evidence: failure_mode_cluster:github_release | https://github.com/apify/crawlee/releases/tag/v3.15.0
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 6: Runtime risk requires verification

- Trigger: Release v3.15.3 is flagged for a runtime risk; check it before relying on the project.
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check against v3.15.3. Context: observed when using Playwright.
- Why it matters: An upgrade or migration involving v3.15.3 may change expected behavior.
- Evidence: failure_mode_cluster:github_release | https://github.com/apify/crawlee/releases/tag/v3.15.3
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 7: Runtime risk requires verification

- Trigger: Release v3.17.0 is flagged for a runtime risk; check it before relying on the project.
- Host AI rule: Before packaging this project, run the relevant install/config/quickstart check against v3.17.0. The source discussion did not expose a precise runtime context.
- Why it matters: An upgrade or migration involving v3.17.0 may change expected behavior.
- Evidence: failure_mode_cluster:github_release | https://github.com/apify/crawlee/releases/tag/v3.17.0
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 8: Runtime risk requires verification

- Trigger: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/apify/crawlee/issues/3764
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 9: Maintenance risk requires verification

- Trigger: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/apify/crawlee
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.

### Constraint 10: Security or permission risk requires verification

- Trigger: `no_demo` risk item reported by downstream validation.
- Host AI rule: Reproduce the official install and quickstart path in an isolated environment.
- Why it matters: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/apify/crawlee
- Hard boundary: Do not present this pitfall as solved, verified, or ignorable unless later evidence explicitly closes it.
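
Several constraints above name a specific release to check. The following is a minimal sketch of how a host AI might collect those releases from the evidence URLs. The `Constraint` record shape and the helpers `releaseTag`, `compareVersions`, and `versionsToCheck` are hypothetical; only the URLs are copied from the constraints, and only the constraints with a release or issue URL relevant to the example are listed.

```typescript
// Hypothetical record shape; only the URLs are copied from the constraints above.
interface Constraint {
  id: number;
  risk: "configuration" | "runtime";
  evidenceUrl: string;
}

const constraints: Constraint[] = [
  { id: 2, risk: "configuration", evidenceUrl: "https://github.com/apify/crawlee/releases/tag/v3.15.1" },
  { id: 3, risk: "configuration", evidenceUrl: "https://github.com/apify/crawlee/releases/tag/v3.16.0" },
  { id: 5, risk: "runtime", evidenceUrl: "https://github.com/apify/crawlee/releases/tag/v3.15.0" },
  { id: 6, risk: "runtime", evidenceUrl: "https://github.com/apify/crawlee/releases/tag/v3.15.3" },
  { id: 7, risk: "runtime", evidenceUrl: "https://github.com/apify/crawlee/releases/tag/v3.17.0" },
  { id: 8, risk: "runtime", evidenceUrl: "https://github.com/apify/crawlee/issues/3764" },
];

// Return the release tag when the URL points at a GitHub release, else null.
function releaseTag(url: string): string | null {
  const match = /\/releases\/tag\/(v\d+\.\d+\.\d+)$/.exec(url);
  return match ? match[1] : null;
}

// Compare tags like "v3.15.1" numerically, component by component.
function compareVersions(a: string, b: string): number {
  const partsA = a.slice(1).split(".").map(Number);
  const partsB = b.slice(1).split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (partsA[i] !== partsB[i]) return partsA[i] - partsB[i];
  }
  return 0;
}

// Collect the distinct releases that need a check, in ascending order.
function versionsToCheck(items: Constraint[]): string[] {
  const tags = items
    .map((item) => releaseTag(item.evidenceUrl))
    .filter((tag): tag is string => tag !== null);
  return Array.from(new Set(tags)).sort(compareVersions);
}

console.log(versionsToCheck(constraints));
```

Constraints whose evidence is an issue or the repository root yield no tag, so they fall back to the general rule of reproducing the install and quickstart path in an isolated environment.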
