firecrawl Manual - Doramagic.ai

Doramagic Project Pack · Human Manual

firecrawl

The API to search, scrape, and interact with the web at scale. 🔥

Overview and System Architecture


Firecrawl is a web data API and developer platform that converts any URL into clean, LLM-ready output (Markdown, HTML, screenshots, or structured JSON) and bundles higher-level workflows on top of single-page scraping: search, crawl, map, batch scrape, agent, and interact endpoints. The project is delivered as a managed cloud service and as a self-hostable Docker stack, and it exposes its capability through a Node.js API, an internal smart-scrape service, and a broad SDK ecosystem.

Core Capabilities

The product surface is organized into a small set of orthogonal primitives, each with its own endpoint and SDK method. The repository's main README.md lists the canonical feature table:

| Capability | Role |
| --- | --- |
| Search | Web search that returns full page content from results |
| Scrape | Single-URL conversion to markdown, HTML, screenshots, or JSON |
| Interact | Scrape a page, then take actions on it (clicks, forms, navigation) |
| Agent | Automated data gathering driven by a natural-language objective |
| Crawl | Scrape all URLs of a website in a single request |
| Map | Discover all URLs on a website instantly |
| Batch Scrape | Scrape thousands of URLs asynchronously |

The Node SDK example in apps/js-sdk/firecrawl/README.md and the Python example in apps/python-sdk/README.md both demonstrate the same v2 model: the client invokes scrape, crawl, startCrawl, getCrawlStatus, extract, and parse, and the API responds with paginated documents and (for long-running jobs) a next URL for manual pagination.
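The manual pagination mentioned above can be sketched as a small generator. This is an illustration only: the page shape (`data` list plus an optional `next` URL) is inferred from the description here, and `iter_documents` and `fetch_page` are hypothetical names, not SDK functions.

```python
from typing import Callable, Dict, Iterator


def iter_documents(
    first_page: Dict,
    fetch_page: Callable[[str], Dict],
    max_pages: int = 100,
) -> Iterator[Dict]:
    """Yield documents from a paginated job response by following `next`.

    Assumed page shape (not confirmed by the source):
    {"data": [...], "next": "<url>" or None}
    """
    page = first_page
    pages_seen = 1
    while True:
        # Emit every document on the current page.
        yield from page.get("data", [])
        next_url = page.get("next")
        # Stop when there is no further page or the safety cap is reached.
        if not next_url or pages_seen >= max_pages:
            return
        page = fetch_page(next_url)
        pages_seen += 1
```

In practice `fetch_page` would wrap an authenticated HTTP GET; the `max_pages` cap is a defensive choice so that a malformed `next` chain cannot loop forever.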

System Architecture

The architecture separates a stateless HTTP API from a stateful job-processing layer, with a dedicated internal service for AI-driven extraction. The high-level shape of the system is:

flowchart LR
    Client["SDK / CLI / cURL"] --> API["Firecrawl API<br/>(Node.js / TypeScript)"]
    API --> Queue["nuq Queue<br/>(Redis-backed)"]
    Queue --> Worker["nuq Workers"]
    Worker --> Browser["Browser / Fetch Engine"]
    Worker --> Smart["Smart-Scrape Service<br/>(LLM extraction)"]
    Smart --> LLM["LLM providers<br/>(Gemini, OpenAI, etc.)"]
    API --> Store["Result Store<br/>(Redis / DB)"]
    API --> Client

The API service is implemented in TypeScript. The smart-scrape subsystem, reachable via the SMART_SCRAPE_API_URL configuration variable, is invoked by the main scraper to extract structured data from a page using a prompt and a Zod-generated schema. The scraper POSTs a { url, prompt, extractId, scrapeId, models } payload, and the smart-scrape service returns a SmartScrapeResult that matches the Zod schema declared in apps/api/src/scraper/scrapeURL/lib/smartScrape.ts. The extraction schema is prepared separately in apps/api/src/scraper/scrapeURL/lib/extractSmartScrape.ts; it combines common reasoning fields (smartscrape_reasoning, smartscrape_prompt) with per-field extraction slots that are added dynamically when a request schema is prepared.
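As a rough picture of the request body described above, the following sketch models the five payload fields as a dataclass. Only the field names come from this manual; the class name, the ID values, and the contents of `models` are illustrative assumptions, and the real service is TypeScript, not Python.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class SmartScrapeRequest:
    """Hypothetical mirror of the { url, prompt, extractId, scrapeId, models } payload."""

    url: str
    prompt: str
    extractId: str  # camelCase kept so the JSON keys match the payload
    scrapeId: str
    models: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize with the same key names the manual lists.
        return json.dumps(asdict(self))


request = SmartScrapeRequest(
    url="https://example.com/pricing",
    prompt="Extract the plan names and monthly prices",
    extractId="extract-123",  # placeholder ID
    scrapeId="scrape-456",    # placeholder ID
    models={"thinkingModel": "gemini-2.5-pro"},  # assumed shape of the models block
)
body = request.to_json()
```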

The queue and worker subsystem, called nuq, is the orchestration backbone for crawl, batch-scrape, and other long-running jobs. The community has reported one known operational concern in this area: on self-hosted deployments, nuq-worker processes can leak Redis connections over time until they hit the default maxclients ceiling of 10,000, after which EPIPE errors cascade (issue #3662). Operators who need extended uptime should monitor Redis connection counts. Those running a Redis-compatible backend such as Valkey, the open-source fork, should note that the community has asked for explicit validation against it (issue #2718), so compatibility is not yet backed by dedicated tests.
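One way to watch for the connection growth described above is to read `connected_clients` from the output of the Redis `INFO clients` command and compare it with the configured ceiling. The sketch below only parses text that has already been fetched; the 80% warning threshold and the function names are assumptions chosen for illustration.

```python
from typing import Dict, Union


def parse_connected_clients(info_text: str) -> int:
    """Extract connected_clients from the text returned by `INFO clients`."""
    for line in info_text.splitlines():
        if line.startswith("connected_clients:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("connected_clients not found in INFO output")


def connection_headroom(
    info_text: str,
    maxclients: int = 10_000,  # Redis default; pass your own value if it was changed
    warn_ratio: float = 0.8,   # assumed alert threshold
) -> Dict[str, Union[int, float, bool]]:
    """Report how close the server is to its client-connection ceiling."""
    used = parse_connected_clients(info_text)
    ratio = used / maxclients
    return {"connected": used, "ratio": ratio, "warn": ratio >= warn_ratio}
```

A steadily rising `connected` value across samples, with no matching rise in workload, is the pattern that would suggest a leak rather than normal load.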

SDK and Tooling Ecosystem

Firecrawl ships first-party SDKs in multiple languages. The repository layout reflects this: the top-level apps/ directory contains the API server, and language-specific SDK packages are colocated with it. Each SDK follows the same conceptual model — a single Client (or Firecrawl / FirecrawlApp) constructed with an API key, exposing scrape, crawl, batchScrape, map, search, extract, and (newer) parse methods.

| SDK | Source | Example Method |
| --- | --- | --- |
| Node / TypeScript | apps/js-sdk/firecrawl/README.md | `app.scrape(url, { formats: ['markdown','html'] })` |
| Python | apps/python-sdk/README.md | `firecrawl.scrape(url, formats=[...])` |
| Rust | apps/rust-sdk/README.md | `client.scrape_url(url, None).await` |
| Ruby | apps/ruby-sdk/README.md | `client.crawl(url, options)` |
| Java | apps/java-sdk/src/main/java/com/firecrawl/models/QuestionFormat.java | Builder-pattern models such as `QuestionFormat.builder().question(...).build()` |

In addition to the SDKs, the project ships two companion repositories: firecrawl-skills, which provides "skill" packages for integrating Firecrawl into product code (choosing endpoints, wiring SDKs, setting API keys), and firecrawl-workflows, which contains workflow skills for repeatable deliverables such as competitor analysis and design-clone briefs.

Integration Patterns and Common Failure Modes

The examples/ directory is a catalog of real integration patterns that combine Firecrawl with an external LLM. Most examples follow the same shape: Map the target site to enumerate URLs, Rank the URLs with an LLM against the user's objective, Scrape the top candidates, then Extract structured fields. The pattern is repeated with GPT-4.1 (examples/gpt-4.1-web-crawler/README.md), Gemini 2.5 Pro (examples/gemini-2.5-crawler/README.md), Llama 4 Maverick (examples/llama-4-maverick-web-crawler/README.md), OpenAI o3 (examples/o3-web-crawler/README.md), and DeepSeek v3 (examples/deepseek-v3-company-researcher/README.md). Because the shape is consistent, a crawler written against one model can be ported to another by swapping the ranking/extraction LLM call.
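The Map, Rank, Scrape, Extract shape can be sketched with the four steps passed in as plain callables, which is also what makes the LLM swappable. Every name below is hypothetical; in a real integration the callables would wrap Firecrawl SDK calls and an LLM client.

```python
from typing import Callable, Dict, List


def run_objective_crawl(
    site_url: str,
    objective: str,
    map_site: Callable[[str], List[str]],
    rank_urls: Callable[[List[str], str], List[str]],
    scrape: Callable[[str], str],
    extract: Callable[[str, str], Dict],
    top_k: int = 3,
) -> List[Dict]:
    """Map -> Rank -> Scrape -> Extract, with each step injected by the caller."""
    urls = map_site(site_url)            # 1. enumerate candidate URLs
    ranked = rank_urls(urls, objective)  # 2. LLM orders them by relevance
    results = []
    for url in ranked[:top_k]:           # 3. scrape only the top candidates
        page = scrape(url)
        # 4. pull structured fields out of each scraped page
        results.append({"url": url, "fields": extract(page, objective)})
    return results
```

Swapping GPT-4.1 for another model then means replacing only `rank_urls` and `extract`; the Firecrawl-facing steps stay the same.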

Several recurring failure modes are visible in community discussions and are worth documenting up front:

Anti-bot loops. When a target site returns 403 or other anti-bot responses, the worker can "waterfall" through retries with no upper bound and become unresponsive (issue #2350). Self-hosters should set explicit timeouts and concurrency limits per worker.
Proxy coverage. Self-hosted deployments do not have built-in proxy rotation, so servers with a single egress IP will be rate-limited or blocked by most production sites (issue #1129). An external proxy tier is effectively required for production self-hosting.
Batch-scrape response size. GET /batch/scrape/:id returns the full data array even when status is not completed; users that only consume the result after completion have asked for a way to omit partial data (issue #2599).
Build and image issues. The Docker build has historically broken on Rust crate type signatures (issue #1103), and the Playwright image used by the browser service is sometimes referenced without a pinned image: field (issue #1322). Pinning the Playwright image is the recommended fix.
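For the anti-bot case in particular, the core mitigation is an upper bound on retries. The following is a minimal sketch of that idea in client-side Python, not the worker's actual retry logic: the set of "blocked" status codes, the exponential backoff values, and all names are assumptions.

```python
import time
from typing import Callable, Tuple

# Assumption: treat 403 and 429 as anti-bot / rate-limit responses.
BLOCKED_STATUSES = {403, 429}


class BlockedError(Exception):
    """Raised when a URL is still blocked after the retry budget is spent."""


def fetch_with_bounded_retries(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(url) at most max_attempts times, backing off between tries."""
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if status not in BLOCKED_STATUSES:
            return status, body
        if attempt < max_attempts:
            # Exponential backoff, capped so a single wait stays bounded.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise BlockedError(f"{url} still blocked after {max_attempts} attempts")
```

Raising a distinct error once the budget is spent lets the caller mark the URL as failed and move on, instead of re-queuing it indefinitely.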

Core API Endpoints and Features


Overview and Scope

Firecrawl exposes a coherent family of web-data APIs that turn arbitrary URLs into LLM-ready Markdown, structured JSON, screenshots, or extracted records. The endpoint surface is organized around a small set of verbs — Search, Scrape, Crawl, Map, Batch Scrape, Extract, Interact, Parse, and Agent — each addressing a different unit of work, from a single page to an entire site, from a natural-language query to a controlled browser session. The README groups these into "Core Endpoints" and "More" features, signaling that Search, Scrape, and Interact are the primary entry points, while the others extend the system to multi-page and agentic workflows. Source: README.md:9-23.

The platform is delivered as a hosted SaaS at api.firecrawl.dev and as a self-hostable Docker stack, and it is consumed from many first-party SDKs. The public README documents examples for Python (apps/python-sdk/README.md:1-15), Node.js (apps/js-sdk/firecrawl/README.md:1-15), Rust (apps/rust-sdk/README.md:1-15), Ruby (apps/ruby-sdk/README.md), Java (apps/java-sdk/src/main/java/com/firecrawl/models/QuestionFormat.java), Go, .NET, PHP, and Elixir. Each SDK is a thin wrapper that auto-polls asynchronous jobs such as crawls and batch scrapes, so the same crawl, startCrawl, and getCrawlStatus verbs appear across languages. Source: apps/python-sdk/README.md:96-112, apps/js-sdk/firecrawl/README.md:60-90.

Endpoint Catalog

The table below summarizes the public endpoints surfaced by the README, with the v2 release notes as a cross-check. It is the single most useful reference for someone deciding which verb to call.

| Endpoint | Unit of work | Typical use | Source |
| --- | --- | --- | --- |
| `Search` | Web + full-content results | RAG seed corpora, research | README.md:31-50 |
| `Scrape` | One URL → Markdown/HTML/JSON/JSON-LD/screenshot/video | Single-page extraction | README.md:51-75, apps/js-sdk/firecrawl/README.md:21-40 |
| `Map` | All URLs of a site (optionally filtered by `search`) | Discovery before crawl | README.md:97-130 |
| `Crawl` | Recursive multi-page scrape from a seed URL | Whole-site ingestion | apps/python-sdk/README.md:52-72 |
| `Batch Scrape` | Many URLs in one async job | Bulk one-shot work | apps/ruby-sdk/README.md:38-50, README.md:131-150 |
| `Extract` | LLM-driven structured extraction with schema | Fielded data | apps/js-sdk/firecrawl/README.md:100-115 |
| `Interact` | Post-scrape browser actions via prompt or Playwright | Dynamic SPAs | v2.9.0 release notes |
| `Parse` | Upload local PDF/DOCX/XLSX/HTML up to 50 MB | File-to-Markdown | v2.10 release notes |
| `Agent` | Natural-language data gathering using Spark 1 Fast | Autonomous research | v2.8.0 release notes |

Search, Scrape, Map, and Crawl are the long-stable verbs. The Interact (v2.9.0), Parse (v2.10), and Agent / Parallel Agents (v2.8.0) endpoints are the more recent additions that pushed the API toward agentic and file-driven workloads.
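Because Parse takes local files with a size ceiling and a fixed set of formats, a client can reject unsuitable files before uploading. The check below is a sketch under stated assumptions: "50 MB" is read as 50 MiB, the extension list follows the formats this manual attributes to the v2.10 release notes, and `check_parse_upload` is a hypothetical helper, not part of any SDK.

```python
from pathlib import PurePath
from typing import List

# Assumption: the documented "50 MB" limit is interpreted as 50 MiB.
MAX_PARSE_BYTES = 50 * 1024 * 1024

# Formats this manual lists for Parse.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".odt", ".rtf", ".xlsx", ".xls", ".html",
}


def check_parse_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_PARSE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_PARSE_BYTES}")
    return problems
```

The server remains the authority on what it accepts; a pre-check like this only saves a round trip for obviously unsuitable files.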

How the Endpoints Fit Together

The endpoints are composable, not isolated. A typical research flow is Search to gather candidate URLs, Map to enumerate the target site, Crawl or Batch Scrape to harvest pages, and Extract to pull structured fields. The Interact endpoint slots in after Scrape when a page needs clicks, form fills, or pagination before the desired data is reachable. Source: README.md:131-175, v2.9.0 release notes.

Internally, structured extraction is implemented by extractSmartScrape.ts, which builds a Zod schema from the user's prompt, then delegates to smartScrape.ts. The latter POSTs to an internal /smart-scrape endpoint with a thinkingModel (e.g., gemini-2.5-pro) and a costTracking object, so the public Extract verb is a thin shell over an internal agentic loop. Source: apps/api/src/scraper/scrapeURL/lib/extractSmartScrape.ts:1-50, apps/api/src/scraper/scrapeURL/lib/smartScrape.ts:32-70.

flowchart LR
    A[Search] --> B[Map]
    B --> C[Crawl / Batch Scrape]
    C --> D[Scrape]
    D --> E{Data reachable?}
    E -- yes --> F[Extract / QuestionFormat]
    E -- no --> G[Interact]
    G --> F
    H[Parse] --> F
    I[Agent] --> C

QuestionFormat in the Java SDK (apps/java-sdk/src/main/java/com/firecrawl/models/QuestionFormat.java:1-36) is the concrete shape of a question-style extract request, illustrating how SDKs codify endpoint parameters. Companion repos extend the surface further: firecrawl-skills adds reusable product-integration skills, and firecrawl-workflows packages them into deliverables such as competitor analyses. Source: firecrawl-skills/README.md:1-5, firecrawl-workflows/README.md:1-5.

Known Limitations and Community Concerns

Several recurring themes appear in the issue tracker and are worth documenting alongside the endpoint reference:

Batch scrape response payload — GET /batch/scrape/:id currently returns the full data array even when status is not completed; users have asked for a way to omit partial data (issue #2599).
Self-host anti-bot failures — Long-running self-hosted instances can waterfall into a permanent loop when blocked by anti-bot services, leaving nuq-worker processes spinning with no progress (issue #2350, 25 comments).
Redis connection leaks in nuq workers — Node processes can leak Redis connections until the server hits maxclients (10k), after which an EPIPE storm degrades CPU and memory (issue #3662). The community has also asked for explicit Valkey compatibility testing (issue #2718).
Self-host proxy support — Issue #1129 (4 comments) requests first-class proxy integration for the self-hosted Docker stack, since residential/datacenter IP rotation is currently left to the operator.
/interact and /parse operational limits — Interact requires a session from a prior Scrape; Parse accepts files up to 50 MB and supports PDF, DOCX, DOC, ODT, RTF, XLSX, XLS, and HTML (v2.10 release notes).
x402 wallet shortfalls — For autonomous agents paying per scrape via x402, there is no automatic credit fallback when the wallet runs dry mid-job (issue #3415).

These items do not change the public endpoint contract, but they directly affect how production deployments and large crawls should be sized and monitored.
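Until the batch status endpoint can omit partial data, a client that only wants final results can simply ignore the `data` array on every poll except the one that reports completion. The sketch below assumes a status payload with `status` and `data` keys; the terminal states other than `completed`, and all function and parameter names, are illustrative assumptions.

```python
import time
from typing import Callable, Dict, List


def wait_for_batch(
    get_status: Callable[[str], Dict],
    job_id: str,
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Dict]:
    """Poll a batch job and return its documents only once it has completed."""
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        state = status.get("status")
        if state == "completed":
            return status.get("data", [])
        # Assumed failure states; adjust to what the API actually returns.
        if state in {"failed", "cancelled"}:
            raise RuntimeError(f"batch job {job_id} ended in state {state!r}")
        if clock() >= deadline:
            raise TimeoutError(f"batch job {job_id} not completed after {timeout}s")
        # Partial `data` in this response is deliberately discarded.
        sleep(poll_interval)
```

This does not reduce the bytes transferred per poll, which is what issue #2599 asks for; it only keeps partial results out of application logic.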

SDKs, CLI, and Integrations


Overview

Firecrawl is delivered as a managed API plus a constellation of language SDKs, a command-line interface, and reusable skill/workflow bundles that wrap the core endpoints: Search, Scrape, Crawl, Map, Batch Scrape, Agent, Interact, and Parse. This section covers the first-party SDKs for Python, JavaScript/Node.js, Rust, and Ruby, each maintained in the same monorepo alongside the API service; other parts of this manual also reference Java, Go, .NET, PHP, and Elixir clients. The project also publishes domain-specific examples (GPT-4.1, Gemini 2.5, o3, Llama 4 Maverick, DeepSeek V3) that demonstrate how to combine Firecrawl with external LLM providers to build objective-driven crawlers. Source: README.md:1-25.

flowchart LR
    User[Developer / Agent] --> CLI[Firecrawl CLI]
    User --> Py[Python SDK]
    User --> JS[Node.js SDK]
    User --> RS[Rust SDK]
    User --> RB[Ruby SDK]
    CLI --> API[(Firecrawl API v2)]
    Py --> API
    JS --> API
    RS --> API
    RB --> API
    API --> Core[Search / Scrape / Crawl / Map / Batch / Agent / Interact / Parse]
    User --> Skills[firecrawl-skills]
    User --> Workflows[firecrawl-workflows]
    Skills -.uses.-> SDKs
    Workflows -.uses.-> SDKs

Official Language SDKs

Python SDK

The Python SDK is installed via pip install firecrawl-py and is initialized with an API key, either passed to the Firecrawl constructor or supplied through the FIRECRAWL_API_KEY environment variable. It exposes a v2 surface in which a single Firecrawl instance returns typed responses for scrape, crawl, start_crawl, get_crawl_status, batch_scrape, map, search, and parse. Polling is built into the waiter-style methods: crawl(...) blocks until the crawl is complete, while start_crawl(...) returns a job ID immediately. Async usage is supported through a dedicated async client class, and WebSocket-based crawl watching is exposed via crawl_url_and_watch with document, error, and done event listeners. Source: apps/python-sdk/README.md:1-180.
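The two ways of supplying the key (constructor argument or FIRECRAWL_API_KEY) amount to a simple precedence rule. The helper below sketches that rule in isolation so it can be reasoned about without the SDK installed; `resolve_api_key` is a hypothetical function, and the precedence of the explicit argument over the environment is an assumption about how such clients usually behave.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the API key: an explicit argument wins, else FIRECRAWL_API_KEY."""
    env = os.environ if env is None else env
    key = explicit or env.get("FIRECRAWL_API_KEY")
    if not key:
        raise ValueError(
            "No API key: pass one explicitly or set FIRECRAWL_API_KEY"
        )
    return key
```

Failing fast with a clear message here is usually preferable to letting the first API call fail with an authentication error.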

Node.js SDK

The Node SDK is published as firecrawl on npm and is initialized as new Firecrawl({ apiKey: 'fc-YOUR_API_KEY' }). It mirrors the Python SDK's v2 API with camelCase method names: app.scrape(url, { formats }) for single-page scraping, app.crawl(url, opts) and app.startCrawl(url, opts) for crawls, app.getCrawlStatus(id) for status polling, and app.batchScrape([...]) for asynchronous batches. The SDK supports Zod schemas for the extract endpoint and exposes a video format for YouTube and TikTok URLs that returns a signed playable URL. Source: apps/js-sdk/firecrawl/README.md:1-150.

Rust SDK

The Rust SDK is a thin async client published as the firecrawl crate. It requires tokio with the full feature and is initialized via Client::new("fc-YOUR-API-KEY"). Scrape is invoked with client.scrape_url("https://firecrawl.dev", None).await, and the returned Document exposes the markdown field directly. The crate supports a Format::Video variant that mirrors the Node SDK's video extraction behavior. Source: apps/rust-sdk/README.md:1-60.

Ruby SDK

The Ruby SDK exposes a Firecrawl client with scrape, crawl, start_crawl, get_crawl_status, cancel_crawl, batch_scrape, map, and search methods. Crawl and batch-scrape methods auto-poll by default, returning a job object whose data array contains the scraped documents; async methods return an ID that can be passed to get_crawl_status or cancel_crawl. Source: apps/ruby-sdk/README.md:1-100.

CLI, Skills, and Workflows

The Firecrawl CLI is bundled with the v2.8.0 distribution and provides full support for scraping, crawling, and search from the terminal; it is documented alongside the SDKs as part of the agent developer-tooling surface (per the v2.8.0 release notes). Reusable AI bundles are split across two companion packages, listed here by path:

| Bundle | Path | Purpose |
| --- | --- | --- |
| Skills | `firecrawl-skills/` | Skills for adding Firecrawl to product code, choosing endpoints, wiring SDKs, and configuring API keys. Source: firecrawl-skills/README.md:1-5 |
| Workflows | `firecrawl-workflows/` | Workflow skills for repeatable Firecrawl-powered deliverables such as competitor analysis and website design clone briefs. Source: firecrawl-workflows/README.md:1-5 |

A small internal Go shared library, go-html-to-md, supports the API's HTML-to-Markdown transformation. It is built per platform as a c-shared library (html-to-markdown.dll, libhtml-to-markdown.so, or libhtml-to-markdown.dylib) and is consumed by the Node scrape pipeline. Source: apps/api/sharedLibs/go-html-to-md/README.md:1-10.

Integration Examples

The examples/ directory contains end-to-end reference integrations that pair Firecrawl with external LLMs:

GPT-4.1 Web Crawler — Maps a site, ranks pages by relevance to a stated objective, scrapes the top candidates, and returns structured JSON. Requires FIRECRAWL_API_KEY and an OPENAI_API_KEY. Source: examples/gpt-4.1-web-crawler/README.md:1-60.
Gemini 2.5 Web Extractor — Uses SerpAPI for search, Gemini 2.5 Pro (Experimental) for URL selection, and Firecrawl's extract for structured data, with colorized real-time progress output. Source: examples/gemini-2.5-web-extractor/README.md:1-40.
o3 Web Crawler — Uses OpenAI's o3 model to rank pages mapped by Firecrawl, then scrapes and extracts the requested fields. Source: examples/o3-web-crawler/README.md:1-45.
Llama 4 Maverick Web Crawler — Combines Firecrawl's map/scrape with Llama 4 Maverick served through Together AI for relevance ranking and extraction. Source: examples/llama-4-maverick-web-crawler/README.md:1-50.
DeepSeek V3 Company Researcher — Searches, ranks, and extracts structured company information using DeepSeek V3 as the analysis layer. Source: examples/deepseek-v3-company-researcher/README.md:1-30.
Gemini 2.5 Screenshot Editor — CLI tool that captures website screenshots and applies Gemini 2.5 Flash-powered style transforms, editing, and text-to-image generation. Source: examples/gemini-2.5-screenshot-editor/README.md:1-25.

Common Failure Modes

Community-reported issues relevant to SDK and integration users include:

Self-hosted proxy support — Many self-hosted users need outbound proxy configuration to avoid IP-based blocking (#1129).
Anti-bot infinite retry loops — Waterfalling retries without a global timeout can wedge nuq workers (#2350).
Build failures on Rust shared library — The html-transformer CString/*mut u8 vs *mut i8 mismatch breaks Docker self-host builds (#1103).
Valkey compatibility — Redis-compatible queue/cache backend (Valkey) lacks test coverage (#2718).
Redis connection leaks in nuq workers — Self-hosted nuq workers can exhaust maxclients over weeks of uptime, causing EPIPE storms (#3662).

Self-Hosting, Deployment, and Operations


Overview and Purpose

Firecrawl can be used as a managed SaaS at firecrawl.dev or self-hosted locally. The self-hosted distribution packages the full web-data pipeline — Search, Scrape, Crawl, Map, Batch Scrape, Interact, and Agent — as a set of cooperating containers (Source: README.md). The reference deployment is driven by docker-compose.yaml and a single SELF_HOST.md quickstart, so an operator can stand up the API server, the worker tier, the headless browser tier, Redis-backed queues, and an optional Postgres metadata store with one docker compose up (Source: SELF_HOST.md).

The self-hosted topology is intentionally split so that stateless web/API processes, long-running crawl workers, and ephemeral browser instances can be scaled and restarted independently.

flowchart LR
    Client[Client / SDK] -->|HTTPS| API[API service<br/>apps/api]
    API -->|enqueue jobs| Redis[(Redis<br/>queue + cache)]
    API <-->|metadata| DB[(Postgres)]
    Workers[nuq workers] -->|consume| Redis
    Workers -->|dispatch| Playwright[Playwright / browser pool]
    Workers -->|scrape| Target((Target websites))
    API -->|proxy| External[Smart Scrape / LLM endpoints]

Configuration and Environment

All runtime configuration is sourced from environment variables loaded by config.ts. The authoritative reference is apps/api/.env.example, and src/lib/deployment.ts resolves deployment-mode flags (e.g., USE_DB_AUTHENTICATION, DISABLE_INDEXING) that gate features depending on whether you are running against the hosted service or your own infrastructure (Source: apps/api/src/lib/deployment.ts).

Common configuration categories that operators should review before going live:

| Category | Key variables (illustrative) | Purpose |
| --- | --- | --- |
| Core | `PORT`, `HOST`, `NODE_ENV` | HTTP listener binding |
| Queue/Cache | `REDIS_URL`, `REDIS_RATE_LIMIT_URL` | nuq queue and rate limiter |
| Database | `DATABASE_URL`, `USE_DB_AUTHENTICATION` | Optional Postgres-backed auth |
| LLM / Smart Scrape | `SMART_SCRAPE_API_URL`, `MODELS`, `OPENAI_API_KEY`, `GEMINI_API_KEY` | Powers `/extract`, `/interact`, `/agent` |
| Browser | `PLAYWRIGHT_MICROSERVICE_URL`, `PROXY_SERVER` | Headless rendering pool |
| Deployment mode | `SELF_HOSTED`, `IS_PRODUCTION`, `DISABLE_TELEMETRY` | `deployment.ts` switches |

The internal /smart-scrape endpoint — invoked by the scraper when structured extraction is requested — reads its base URL from SMART_SCRAPE_API_URL and merges model overrides from the models block of the request body (Source: apps/api/src/scraper/scrapeURL/lib/smartScrape.ts). Operators self-hosting the Smart Scrape sidecar should set both this variable and the model configuration carefully to control LLM cost and data-residency.

Building the Native HTML-to-Markdown Library

The scraping pipeline performs HTML-to-Markdown conversion through a Go shared library loaded by the Node API process. The library must be compiled per host platform before the API container can start (Source: apps/api/sharedLibs/go-html-to-md/README.md).

The Dockerfile invokes go build -buildmode=c-shared and emits a platform-specific artifact:

cd apps/api/sharedLibs/go-html-to-md
go build -o <OUTPUT> -buildmode=c-shared html-to-markdown.go

Linux → libhtml-to-markdown.so
macOS → libhtml-to-markdown.dylib
Windows → html-to-markdown.dll
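The platform-to-artifact mapping above is easy to get wrong in build scripts, so it can help to encode it once. This sketch uses the values of Python's `sys.platform` to pick the file name; the helper name is hypothetical, and how the API process actually locates the library is not shown in this manual.

```python
import sys

# Artifact names as listed above, keyed by sys.platform prefix.
_LIB_NAMES = {
    "linux": "libhtml-to-markdown.so",
    "darwin": "libhtml-to-markdown.dylib",
    "win32": "html-to-markdown.dll",
}


def shared_lib_name(platform: str = sys.platform) -> str:
    """Return the expected shared-library file name for a platform string."""
    for prefix, name in _LIB_NAMES.items():
        if platform.startswith(prefix):
            return name
    raise RuntimeError(f"no known html-to-markdown artifact for {platform!r}")


def build_command(platform: str = sys.platform) -> str:
    """Compose the go build invocation with the platform-specific output name."""
    return (
        f"go build -o {shared_lib_name(platform)} "
        "-buildmode=c-shared html-to-markdown.go"
    )
```

The resulting string fills in the `<OUTPUT>` placeholder of the command shown earlier and is meant to be run from `apps/api/sharedLibs/go-html-to-md`.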

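Because the C-ABI surface uses raw pointers, native build failures on self-hosted Docker images have been a recurring operator pain point. The most notable report, issue #1103, concerns the Rust html-transformer library rather than the Go library: a *mut u8 vs *mut i8 mismatch around CString::into_raw() was reported when the image is rebuilt under newer Rust toolchains (Source: community issue #1103). CString::into_raw() returns *mut c_char, and c_char is i8 on some targets (for example x86_64 Linux) and u8 on others (for example aarch64 Linux), so code that hard-codes one pointer type may also fail to compile when the build architecture changes. Pinning the Go and Rust toolchain versions in the base image, or using the prebuilt official image, is the suggested way to avoid the mismatch.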

Common Self-Hosting Operations Issues

Long-running community discussions point to several recurring failure modes that operations teams should plan for:

Anti-bot loops / runaway retries. Workers can enter a permanent retry waterfall when repeatedly blocked by anti-bot defenses, saturating CPU and never terminating (Source: community issue #2350). Setting aggressive per-host backoff and capping retry counts in the queue consumer mitigates the runaway.
Redis connection leaks in nuq workers. A self-hosted instance may exhaust Redis maxclients (default 10,000) over extended uptime because worker connections are not released, producing a continuous EPIPE storm (Source: community issue #3662). Periodic worker recycling and tuning the connection pool's max lifetime are the suggested workarounds.
Valkey compatibility. Operators standardizing on Valkey (the Linux Foundation's Redis fork) need explicit test coverage; this is tracked in issue #2718.
Proxy support. A long-standing request (issue #1129) asks for first-class proxy configuration on the self-hosted tier so servers are not rate-limited by target sites.
Prebuilt Playwright image. The compose file references Playwright but the image field is sometimes left blank; issue #1322 tracks the request for an officially published image so operators do not have to build the browser pool from source.
Kubernetes packaging. A Helm chart for k8s is tracked in issue #1255; until it lands, the canonical docker-compose.yaml is the supported deployment descriptor.
Batch-scrape response shape. Issue #2599 requests omitting the data array from GET /batch/scrape/:id while the job is still running, which reduces payload size for status polling.
Wallet fallback for x402. Issue #3415 proposes an AgentCredit fallback so that autonomous agents can keep scraping when their x402 wallet runs dry mid-job.

The deployment.ts helper exposes toggles that map directly to several of these concerns — for example, IS_PRODUCTION and SELF_HOSTED flip telemetry, auth backends, and proxy defaults so that the same codebase can run safely in both hosted and operator-controlled environments (Source: apps/api/src/lib/deployment.ts).

Operational Checklist

Before declaring a self-hosted deployment production-ready, verify the following:

The Go shared library compiled cleanly for your runtime architecture (Source: apps/api/sharedLibs/go-html-to-md/README.md).
SMART_SCRAPE_API_URL, Redis URLs, and database URLs are set in the environment (Source: apps/api/.env.example).
nuq workers are monitored for connection growth and recycled on a schedule (Source: community issue #3662).
Anti-bot retry policies and per-host timeouts are bounded to prevent waterfalls (Source: community issue #2350).
A proxy or browser-pool fallback exists for scrape targets that block datacenter IPs (Source: community issue #1129).
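Part of this checklist can be automated as a pre-flight check that runs before the stack starts. The sketch below tests only for the presence of settings; the variable names are the illustrative ones from the configuration table in this section, so they should be confirmed against apps/api/.env.example before use.

```python
from typing import List, Mapping, Sequence

# Illustrative names taken from this manual's configuration table.
# Drop DATABASE_URL if the optional Postgres store is not used.
REQUIRED_SETTINGS = (
    "REDIS_URL",
    "REDIS_RATE_LIMIT_URL",
    "SMART_SCRAPE_API_URL",
    "DATABASE_URL",
)


def missing_settings(
    env: Mapping[str, str],
    required: Sequence[str] = REQUIRED_SETTINGS,
) -> List[str]:
    """Return the required variables that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]
```

A deployment script could call `missing_settings(os.environ)` and refuse to start when the list is non-empty, which surfaces configuration gaps earlier than a failed job would.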

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.


Found 10 structured pitfall item(s), including 1 high/blocking item(s). Top priority: Security or permission risk - Security or permission risk requires verification.

1. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/firecrawl/firecrawl/issues/3662

2. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/firecrawl/firecrawl/issues/3668

3. Capability evidence risk: Capability evidence risk requires verification

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: capability.assumptions | github_repo:787076358 | https://github.com/firecrawl/firecrawl

4. Maintenance risk: Maintenance risk requires verification

Severity: medium
Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | github_repo:787076358 | https://github.com/firecrawl/firecrawl

5. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: no_demo
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: downstream_validation.risk_items | github_repo:787076358 | https://github.com/firecrawl/firecrawl

6. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: no_demo
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: risks.scoring_risks | github_repo:787076358 | https://github.com/firecrawl/firecrawl

7. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/firecrawl/firecrawl/issues/3415

8. Security or permission risk: Security or permission risk requires verification

Severity: medium
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | https://github.com/firecrawl/firecrawl/issues/2718

9. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: issue_or_pr_quality=unknown.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | github_repo:787076358 | https://github.com/firecrawl/firecrawl

10. Maintenance risk: Maintenance risk requires verification

Severity: low
Finding: release_recency=unknown.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: evidence.maintainer_signals | github_repo:787076358 | https://github.com/firecrawl/firecrawl

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.


Doramagic exposes project-level community discussion separately from official documentation. Review these links before using firecrawl with real data or production workflows.

[[Feat] Add Additional Testing to Ensure Support for Valkey](https://github.com/firecrawl/firecrawl/issues/2718) - github / github_issue
[[Feat] Provide a way to omit data in 'GET batch scrape status' API respo](https://github.com/firecrawl/firecrawl/issues/2599) - github / github_issue
[[Self-Host] nuq workers leak Redis connections until maxclients (10k) →](https://github.com/firecrawl/firecrawl/issues/3662) - github / github_issue
Feature: AgentCredit integration — credit fallback for x402 scraping whe - github / github_issue
Your project is tracked on HVTracker — any data we should correct? - github / github_issue
Firecrawl v2.10 - github / github_release
v2.9.0 - github / github_release
v2.8.0 - github / github_release
v2.7.0 - github / github_release
v2.6.0 - github / github_release
v2.5.0 - The World's Best Web Data API - github / github_release
v2.4.0 - github / github_release

Source: Project Pack community evidence and pitfall evidence
