Match the project to your task before installing it.
browser-automation · Public
crawlee
Crawlee is a web scraping and browser automation library for Node.js, used to build reliable crawlers in JavaScript and TypeScript. It extracts data for AI, LLMs, RAG, or GPTs and downloads HTML, PDF, JPG, PNG, and other files from websites. It works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, runs in both headful and headless mode, and supports proxy rotation.
What it can do: skill, recipe, host_instruction, eval, preflight. Review the portable capability path.
Before continuing: Verify in a sandbox. Do not treat a preview pack as a proven local install.
GitHub snapshot: 24k stars · 1.4k forks · 133 contributors
Doramagic.ai · Last verification date: 2026-06-22 · Verification method: source evidence, semantic profile, public page gate, and static build acceptance.
Publication status · 2026-06-22
What is crawlee?
- Summary: A web scraping and browser automation library for Node.js, written for JavaScript and TypeScript, that works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP.
- Best fit: Users who want source-backed project understanding before installing it.
- Not for: Users who want to skip sandbox verification or cannot accept configuration, permission, or maintenance overhead.
- Capability added to an AI workflow: skill, recipe, host_instruction, eval, preflight
- First safe verification step: Verify the smallest path in an isolated environment and keep a rollback path.
- Verification state: source, Quick Start, and sandbox install checks are recorded as passed.
- Top risk: May increase setup, validation, or first-run risk for the user.
- Evidence base: https://github.com/apify/crawlee, https://github.com/apify/crawlee#readme, Human Manual, Pitfall Log
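The description above mentions proxy rotation. As a rough illustration of the general idea only, the sketch below cycles through a fixed list of proxies in round-robin order. It does not use or reproduce Crawlee's own API: the helper name `createProxyRotator` and the proxy addresses are hypothetical, chosen for this example. Consult the upstream repository for how Crawlee actually configures proxies.

```typescript
// Hypothetical helper (not part of Crawlee): returns a function that
// hands out proxy URLs in round-robin order.
function createProxyRotator(proxyUrls: string[]): () => string {
    if (proxyUrls.length === 0) {
        throw new Error("At least one proxy URL is required");
    }
    let next = 0;
    return () => {
        const url = proxyUrls[next];
        // Wrap around to the first proxy after the last one is used.
        next = (next + 1) % proxyUrls.length;
        return url;
    };
}

// Placeholder proxy addresses, assumed for illustration.
const nextProxy = createProxyRotator([
    "http://proxy-a.example:8000",
    "http://proxy-b.example:8000",
]);

// Three consecutive picks: the third wraps back to the first proxy.
const picked = [nextProxy(), nextProxy(), nextProxy()];
console.log(picked);
```

Round-robin is the simplest rotation policy. Real crawlers often also retire proxies that get blocked, which this sketch does not attempt.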
01
Quick decision
Use this section to decide whether the project is worth a deeper read.
24k stars · 1.4k forks
02
What it can do
Translate the upstream project into concrete capabilities the user can judge before installing.
Overview, Architecture, and Package Layout
Related topics: Crawler Hierarchy and HTTP Clients; Browser Pool, Launchers, and Fingerprinting; Storage, Sessions, Proxies, Autoscaling, and CLI
Source: https://github.com/apify/crawlee / Human Manual
Crawler Hierarchy and HTTP Clients
Related topics: Overview, Architecture, and Package Layout; Browser Pool, Launchers, and Fingerprinting; Storage, Sessions, Proxies, Autoscaling, and CLI
Source: https://github.com/apify/crawlee / Human Manual
Browser Pool, Launchers, and Fingerprinting
Related topics: Overview, Architecture, and Package Layout; Crawler Hierarchy and HTTP Clients; Storage, Sessions, Proxies, Autoscaling, and CLI
Source: https://github.com/apify/crawlee / Human Manual
Storage, Sessions, Proxies, Autoscaling, and CLI
Related topics: Overview, Architecture, and Package Layout; Crawler Hierarchy and HTTP Clients; Browser Pool, Launchers, and Fingerprinting
Source: https://github.com/apify/crawlee / Human Manual
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Source: Doramagic discovery, validation, and Project Pack records
Sources: https://github.com/apify/crawlee, Human Manual, Project Pack evidence, and downstream validation signals.
03
Community Discussion Evidence
Project-level external discussion stays visible on the detail page, not only inside the manual.
12 source-linked items. Review these external discussions before using crawlee with real data or production workflows. They are review inputs, not standalone proof that the project is production-ready.
- 01 · Allow disabling ImpitHttpClient client cache · github / github_issue
- 02 · Add support for Bun runtime - Issue with `browser-pool` and `memory-stor · github / github_issue
- 03 · Crawler hangs forever when given malformed request input (e.g. invalid ` · github / github_issue
- 04 · Merge @crawlee/memory-storage package into @crawlee/core · github / github_issue
- 05 · v3.17.0 · github / github_release
- 06 · v3.16.0 · github / github_release
- 07 · v3.15.3 · github / github_release
- 08 · v3.15.2 · github / github_release
- 09 · v3.15.1 · github / github_release
- 10 · v3.15.0 · github / github_release
- 11 · v3.14.1 · github / github_release
- 12 · v3.14.0 · github / github_release
04
How to start
Only source-backed commands are shown here. Verify them in an isolated environment first.
- Try the prompt first (preview): Test the workflow without installing the upstream project.
- Read the Human Manual (manual): Understand inputs, outputs, limits, and failure modes.
- Take context to your AI host (context): Use the compiled assets in your preferred AI environment.
- Run sandbox verification (verify): Confirm install commands and rollback before using a primary environment.
Official start command: npx crawlee · https://github.com/apify/crawlee#readme · verified: yes
05
Human Manual
The English page must expose the real manual, not a short placeholder.
8+ sections · Human Manual
crawlee Manual
Open the full manual
- https://github.com/apify/crawlee Project Manual
- Table of Contents
- Overview, Architecture, and Package Layout
- Related Pages
- Purpose and Scope
- Repository Layout and Package Organization
- Core Architecture
- Project Templates and Getting Started
06
AI Context Pack and portable assets
After deciding to continue, take the project context into your own AI host.
Complete pack plus user-owned assets
These files are planning and verification assets for Claude Code, Codex, Gemini, Cursor, ChatGPT, and other AI hosts.
07
Preflight checks
Treat this page as a planning asset, not proof that your local environment is ready.
- The manual is generated from source-linked project files and Doramagic validation signals.
- Community evidence warnings stay visible instead of being converted into marketing claims.
- This English page is indexable because the locale quality gate passed and explicit English index approval is enabled.
- Use the upstream repository as the final authority for installation commands, license, and version-specific behavior.
08
Pitfall Log and verification risks
Doramagic surfaces high-risk items before users treat a candidate capability as verified.
- Maintenance risk requires verification: May increase setup, validation, or first-run risk for the user.
- Configuration risk requires verification: Upgrade or migration may change expected behavior: v3.15.1
- Configuration risk requires verification: Upgrade or migration may change expected behavior: v3.16.0
- Capability evidence risk requires verification: May increase setup, validation, or first-run risk for the user.
- Runtime risk requires verification: Upgrade or migration may change expected behavior: v3.15.0
- Runtime risk requires verification: Upgrade or migration may change expected behavior: v3.15.3
- Runtime risk requires verification: Upgrade or migration may change expected behavior: v3.17.0
- Runtime risk requires verification: May increase setup, validation, or first-run risk for the user.
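Several entries in this log attach an upgrade or migration risk to a specific release tag. One way to use them when planning an upgrade is to list which flagged releases fall between the version you run today and the version you want to move to. The sketch below is a minimal, self-contained example of that check. The function names (`parseVersion`, `compareVersions`, `flaggedBetween`) and the sample upgrade range are assumptions made for illustration. Only the flagged tags themselves come from this page.

```typescript
type Version = [number, number, number];

// Parses a tag such as "v3.15.1" into its numeric parts.
function parseVersion(tag: string): Version {
    const match = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(tag);
    if (!match) {
        throw new Error(`Unrecognized version tag: ${tag}`);
    }
    return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative if a < b, positive if a > b, zero if equal.
function compareVersions(a: Version, b: Version): number {
    for (let i = 0; i < 3; i++) {
        if (a[i] !== b[i]) return a[i] - b[i];
    }
    return 0;
}

// Release tags that this Pitfall Log flags with an upgrade or migration risk.
const flaggedReleases = ["v3.15.0", "v3.15.1", "v3.15.3", "v3.16.0", "v3.17.0"];

// Returns flagged releases newer than `from` and no newer than `to`.
function flaggedBetween(from: string, to: string): string[] {
    const lower = parseVersion(from);
    const upper = parseVersion(to);
    return flaggedReleases.filter((tag) => {
        const version = parseVersion(tag);
        return compareVersions(version, lower) > 0 && compareVersions(version, upper) <= 0;
    });
}

// Hypothetical upgrade from v3.15.1 to v3.16.0.
const toReview = flaggedBetween("v3.15.1", "v3.16.0");
console.log(toReview); // [ 'v3.15.3', 'v3.16.0' ]
```

The result is only a reading list: each returned tag still needs its upstream release notes checked in a sandbox before the upgrade is applied.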