# https://github.com/assafelovic/gpt-researcher Project Manual

Generated at: 2026-06-20 19:41:48 UTC

## Table of Contents

- [Overview and Core Architecture](#page-1)
- [Research Pipeline: Retrievers, Scrapers, Context, and Deep Research](#page-2)
- [Extensions: MCP, Multi-Agent, Image Generation, Local Documents, and LLM Providers](#page-3)
- [Backend Server, Frontend, Deployment, and Security](#page-4)

<a id='page-1'></a>

## Overview and Core Architecture

### Related Pages

Related topics: [Research Pipeline: Retrievers, Scrapers, Context, and Deep Research](#page-2), [Extensions: MCP, Multi-Agent, Image Generation, Local Documents, and LLM Providers](#page-3), [Backend Server, Frontend, Deployment, and Security](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md)
- [mcp-server/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/mcp-server/README.md)
- [frontend/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/frontend/README.md)
- [multi_agents/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/README.md)
- [multi_agents_ag2/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents_ag2/README.md)
- [multi_agents/agents/editor.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/editor.py)
- [multi_agents/agents/reviewer.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/reviewer.py)
- [multi_agents/agents/writer.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/writer.py)
- [backend/memory/draft.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/memory/draft.py)
- [backend/report_type/deep_research/example.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/report_type/deep_research/example.py)
</details>

# Overview and Core Architecture

## Purpose and Scope

GPT Researcher is an autonomous agent that performs deep, multi-source research and produces long-form reports (typically 5–6 pages) in Markdown, PDF, and DOCX formats. As described in [README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md), it works by "creating a task-specific agent based on a research query," generating objective questions, crawling trusted sources, summarizing them with citations, and aggregating results into a final report.

The project ships several execution surfaces sharing the same underlying engine:

| Surface | Entry Point | Source |
|---|---|---|
| Python library | `GPTResearcher` class | [README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md) |
| REST/WebSocket backend | FastAPI server (`/ws`) | [README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md) |
| Lightweight static frontend | FastAPI-served HTML/CSS/JS | [frontend/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/frontend/README.md) |
| NextJS frontend | Node.js app on port 3000 | [frontend/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/frontend/README.md) |
| MCP server | `gptr-mcp` (Claude integration) | [mcp-server/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/mcp-server/README.md) |
| Multi-agent orchestrations | LangGraph & AG2 workflows | [multi_agents/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/README.md) |

## High-Level Architecture

The core research loop is split into two phases: **planning/scraping** (a single `GPTResearcher` instance) and **writing**, optionally followed by a multi-agent review cycle for long-form outputs.

```mermaid
flowchart LR
    User[User Query] --> WS["/ws WebSocket"]
    WS --> GR[GPTResearcher Agent]
    GR -->|plan| Planner[Question Generator]
    GR -->|crawl| Scrapers[Scrapers / Retriever]
    Scrapers --> Ctx[(Context Store)]
    Ctx --> Writer[write_report]
    Writer -->|single-agent| Out[Markdown / PDF / DOCX]
    Writer -->|multi-agent| MA[LangGraph / AG2 Workflow]
    MA --> Editor --> Researcher --> Reviewer --> Reviser --> Publisher
    Publisher --> Out
```

The single-agent path lives under `backend/` and uses `self.context` to hold aggregated research. Community issue [#1572](https://github.com/assafelovic/gpt-researcher/issues/1572) reports that when `self.context` is empty (`[]`), `write_report` may still emit confident-looking but fabricated sources, so callers should check for an empty context before invoking the writer.

The deep-research variant extends the loop recursively, as shown in [backend/report_type/deep_research/example.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/report_type/deep_research/example.py), where `deep_research()` calls `generate_serp_queries()` and iterates with configurable `breadth` and `depth`, accumulating `learnings`, `citations`, and `visited_urls` across branches.

## Multi-Agent Workflow

For higher-quality, longer outputs the project delegates planning, drafting, review, and publishing to a graph of cooperating agents. Per [multi_agents/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/README.md), the pipeline is:

1. **Browser** — runs an initial GPT Researcher pass to gather raw research.
2. **Editor** — plans an outline; the prompt in [multi_agents/agents/editor.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/editor.py) requests a JSON of `title` and `sections` based on the initial research summary.
3. **Researcher → Reviewer → Reviser** — executed in parallel per outline section. The [ReviewerAgent.review_draft()](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/reviewer.py) returns revision notes or `None` once a draft is acceptable.
4. **Writer** — compiles an introduction, table of contents, conclusion, and an APA-formatted `sources` list using the schema in [multi_agents/agents/writer.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/writer.py).
5. **Publisher** — emits the report to PDF/DOCX/Markdown formats.

Shared draft state across these nodes is defined in [backend/memory/draft.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/memory/draft.py) as a `DraftState` TypedDict carrying `task`, `topic`, `draft`, `review`, and `revision_notes`.
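To make the review cycle concrete, here is a minimal sketch of a loop that alternates review and revision until the reviewer returns `None`. The field names come from the description above. The value types, the `max_rounds` cap, the callback signatures, and the way `review` and `revision_notes` are filled in are assumptions made for illustration.

```python
from typing import Callable, Optional, TypedDict


class DraftState(TypedDict):
    # Field names follow the text above; the value types are assumptions.
    task: dict
    topic: str
    draft: str
    review: Optional[str]
    revision_notes: Optional[str]


def review_until_accepted(
    state: DraftState,
    review: Callable[[DraftState], Optional[str]],
    revise: Callable[[DraftState], dict],
    max_rounds: int = 3,
) -> DraftState:
    """Alternate review and revision until the reviewer returns None."""
    for _ in range(max_rounds):
        notes = review(state)
        state["review"] = notes
        if notes is None:  # reviewer accepted the draft
            break
        # The reviser returns the keys it wants to overwrite,
        # e.g. {"draft": ..., "revision_notes": ...}.
        state.update(revise(state))
    return state
```

The `max_rounds` cap keeps a reviewer that never accepts from looping forever. The source does not say whether the real graph bounds the cycle this way.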

An alternative AG2-based implementation is provided under `multi_agents_ag2/`, which mirrors the same Editor/Researcher/Reviewer/Reviser roles (see [multi_agents_ag2/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents_ag2/README.md) and [multi_agents_ag2/agents/editor.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents_ag2/agents/editor.py)).

## Configuration, Frontend, and Deployment

Behavior is driven by environment variables and a `task.json` (for the multi-agent CLI). The `task` schema includes `query`, `model`, `max_sections`, `max_plan_revisions`, `source` (`web` or `local`, with `DOC_PATH` for local files), `follow_guidelines`, `guidelines`, and `verbose` — documented in [multi_agents_ag2/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents_ag2/README.md).
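As an illustration of the schema above, the sketch below builds a task dictionary with the listed keys and runs a few sanity checks before a run. All values are illustrative. The set of required keys and the checks themselves are assumptions, not rules taken from the project.

```python
import json

# Keys follow the schema listed above; every value here is illustrative.
EXAMPLE_TASK = {
    "query": "Is AI in a hype cycle?",
    "model": "gpt-4o",
    "max_sections": 3,
    "max_plan_revisions": 1,
    "source": "web",
    "follow_guidelines": False,
    "guidelines": [],
    "verbose": True,
}

REQUIRED_KEYS = {"query", "source"}  # assumption: the minimum a run needs
ALLOWED_SOURCES = {"web", "local"}


def validate_task(task: dict) -> list:
    """Return a list of human-readable problems; empty means the task looks sane."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(task))]
    source = task.get("source")
    if source is not None and source not in ALLOWED_SOURCES:
        problems.append(f"source must be one of {sorted(ALLOWED_SOURCES)}, got {source!r}")
    if task.get("follow_guidelines") and not task.get("guidelines"):
        problems.append("follow_guidelines is set but guidelines is empty")
    return problems


def load_task(text: str) -> dict:
    """Parse task.json content and fail early if it does not validate."""
    task = json.loads(text)
    problems = validate_task(task)
    if problems:
        raise ValueError("; ".join(problems))
    return task
```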

Two frontends are supported ([frontend/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/frontend/README.md)):

- **Static** — `uvicorn main:app` on port 8000; no Node toolchain required.
- **NextJS** — `npm run dev` on port 3000, paired with the FastAPI backend.

Docker Compose brings both up by default. Recent releases (v3.4.2–v3.5.0) added ModelsLab image generation and `o3-mini` reasoning support, fixed `NameError`s in the multi-agent `run_research_task`, and corrected PyMuPDF page iteration so that every PDF page is read.

## Security and Reliability Notes

Community-reported issues that affect how the architecture should be deployed:

- **SSRF via `/ws`** ([#1794](https://github.com/assafelovic/gpt-researcher/issues/1794)) — the WebSocket accepts a `source_urls` list with no auth or URL validation; an unauthenticated network attacker can probe internal addresses. Operate the backend behind a trusted boundary or filter URLs.
- **Arbitrary local PDF read** ([#1805](https://github.com/assafelovic/gpt-researcher/issues/1805)) — `PyMuPDFScraper` treats any `.pdf` entry in `source_urls` as a local path when it is not a URL, enabling local file disclosure. Disable the `PyMuPDFScraper` non-URL branch or restrict `source_urls` to verified origins in production.
- **Hallucinated sources on empty context** ([#1572](https://github.com/assafelovic/gpt-researcher/issues/1572)) — defensively reject empty research contexts before calling `write_report`.
- **Docs site breakage** ([#1807](https://github.com/assafelovic/gpt-researcher/issues/1807)) — the marketing site may currently render a client-side exception on certain anchors; use the GitHub README for canonical installation steps.
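The "filter URLs" mitigation mentioned for the first two items might look like the sketch below, which uses only the standard library. `is_safe_source_url` and `filter_source_urls` are hypothetical helpers, not part of the project. The sketch checks literal IP addresses only: a DNS name that resolves to an internal address is not caught, so it complements network-level controls and does not replace them.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_safe_source_url(url: str) -> bool:
    """Reject non-HTTP(S) values, bare paths, and literal internal addresses."""
    try:
        parts = urlsplit(url.strip())
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parts.scheme.lower() not in ALLOWED_SCHEMES or not host:
        return False  # also rejects local paths such as /etc/secret.pdf
    host = host.lower()
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; resolution is not checked in this sketch
    return addr.is_global  # False for loopback, private, and link-local ranges


def filter_source_urls(urls):
    """Keep only the entries that pass is_safe_source_url, preserving order."""
    return [u for u in urls if is_safe_source_url(u)]
```

Because bare paths have no scheme, the same check also rejects the non-URL `.pdf` entries described in #1805.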

## See Also

- [Deep Research Workflow](deep_research.md)
- [Multi-Agent Pipelines (LangGraph & AG2)](multi_agents.md)
- [Frontend Deployment](frontend.md)
- [MCP Server Integration](mcp_server.md)
- [Configuration & Environment Variables](configuration.md)

---

<a id='page-2'></a>

## Research Pipeline: Retrievers, Scrapers, Context, and Deep Research

### Related Pages

Related topics: [Overview and Core Architecture](#page-1), [Extensions: MCP, Multi-Agent, Image Generation, Local Documents, and LLM Providers](#page-3), [Backend Server, Frontend, Deployment, and Security](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md)
- [gpt_researcher/skills/researcher.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/researcher.py)
- [gpt_researcher/skills/writer.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/writer.py)
- [gpt_researcher/skills/curator.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/curator.py)
- [gpt_researcher/skills/context_manager.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/context_manager.py)
- [gpt_researcher/skills/deep_research.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/deep_research.py)
- [gpt_researcher/skills/browser.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/browser.py)
- [backend/report_type/deep_research/example.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/report_type/deep_research/example.py)
- [backend/report_type/detailed_report/detailed_report.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/report_type/detailed_report/detailed_report.py)
- [multi_agents/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/README.md)
- [multi_agents/agents/editor.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/editor.py)
- [multi_agents/agents/writer.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/writer.py)
- [backend/memory/research.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/memory/research.py)
</details>

# Research Pipeline: Retrievers, Scrapers, Context, and Deep Research

## Purpose and Scope

The research pipeline is the core execution loop of GPT Researcher that turns a natural-language task into a sourced, multi-page report. It is composed of modular "skills" located under `gpt_researcher/skills/` and orchestrated by a master `GPTResearcher` agent. The pipeline is responsible for **planning sub-questions, retrieving candidate sources, scraping their content, curating a focused context, and finally writing a long-form report** with citations.

The README states the design goals succinctly: *"Create a task-specific agent based on a research query. Generate questions that collectively form an objective opinion on the task. Use a crawler agent for gathering information for each question. Summarize and source-track each resource. Filter and aggregate summaries into a final research report."* Source: [README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md)

The pipeline supports two execution modes:

| Mode | Entry point | Output |
| --- | --- | --- |
| Standard (detailed) report | `DetailedReport.run()` | Introduction, TOC, subtopic reports, conclusion, references |
| Deep research | `deep_research()` in `deep_research.py` | Tree-like iterative exploration with `learnings` and `visited_urls` |

## High-Level Architecture

The pipeline follows a four-stage data flow. Each stage is a separate skill module that can be swapped or extended.

```mermaid
flowchart LR
    A[Query] --> B[Researcher<br/>generate sub-questions]
    B --> C[Retriever<br/>SERP search]
    C --> D[Scraper<br/>fetch & extract]
    D --> E[Context Manager<br/>curate & filter]
    E --> F[Writer<br/>draft report]
    F --> G[Cited Report]
```

Source: [gpt_researcher/skills/researcher.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/researcher.py), [gpt_researcher/skills/context_manager.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/context_manager.py), [gpt_researcher/skills/writer.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/writer.py)

## Retrievers and Scrapers

### Retrievers

The `Researcher` skill handles query planning and source discovery. It calls a configurable retriever (Tavily, SerpAPI, Bing, Google, DuckDuckGo, Searx, etc., selected via the `RETRIEVER` environment variable) to fetch URLs relevant to each sub-question. The retrieved URLs are then deduplicated and de-prioritized against any caller-supplied `source_urls` before scraping. Source: [gpt_researcher/skills/researcher.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/researcher.py)
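One possible reading of the deduplication step is sketched below: caller-supplied URLs come first, retrieved URLs follow, and duplicates are dropped while order is preserved. `merge_urls` is a hypothetical helper, and both the ordering rule and the decision to ignore `#fragment` suffixes are assumptions rather than documented behavior.

```python
from urllib.parse import urldefrag


def merge_urls(source_urls, retrieved_urls):
    """Caller-supplied URLs first, then retrieved ones, without duplicates."""
    seen = set()
    merged = []
    for url in [*source_urls, *retrieved_urls]:
        key = urldefrag(url.strip()).url  # ignore "#fragment" when comparing
        if key and key not in seen:
            seen.add(key)
            merged.append(key)
    return merged
```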

### Scrapers

Scraping is performed by the `Browser` skill, which dispatches to a backend chosen via the `SCRAPER` setting. Backends include `bs` (BeautifulSoup, default), `browser` (Selenium), `nodriver`, `tavily_extract`, `firecrawl`, and `pymupdf` for PDF files. The `PyMuPDFScraper` is special-cased: any URL ending in `.pdf` is routed through it, and its non-URL branch treats the value as a local filesystem path — a behavior that has security implications discussed below. Source: [gpt_researcher/skills/browser.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/browser.py), Issue [#1805](https://github.com/assafelovic/gpt-researcher/issues/1805)
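The routing rule described above can be sketched in a few lines. `pick_scraper` is a hypothetical function. The backend names are the ones listed in the paragraph, but matching on the URL path is a choice made for this sketch, so the real dispatch logic may differ in detail.

```python
from urllib.parse import urlsplit

KNOWN_SCRAPERS = {"bs", "browser", "nodriver", "tavily_extract", "firecrawl", "pymupdf"}


def pick_scraper(link: str, configured: str = "bs") -> str:
    """Return a backend name: PDFs go to pymupdf, everything else to the configured one."""
    if configured not in KNOWN_SCRAPERS:
        raise ValueError(f"unknown scraper: {configured!r}")
    # Look at the path only, so query strings such as ?download=1 do not hide the suffix.
    if urlsplit(link).path.lower().endswith(".pdf"):
        return "pymupdf"
    return configured
```

Note that a bare value such as `notes.pdf` is also routed to `pymupdf`, which is exactly the non-URL branch that the security notes flag.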

## Context Curation

Once scraped, raw page text is handed to the `ContextManager` and `Curator` skills. The curator groups, ranks, and trims content to fit a configurable context-window budget (controlled by `TOTAL_WORDS` and similar env vars), keeping only the snippets most relevant to the original sub-questions. The resulting `self.context` list is the sole input passed to the writer. Source: [gpt_researcher/skills/context_manager.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/context_manager.py), [gpt_researcher/skills/curator.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/curator.py)
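The idea of trimming content to a word budget can be illustrated with a deliberately simple sketch. A keyword-overlap score stands in for whatever relevance measure the real skills use. `curate` and `_words` are hypothetical names, and nothing here reflects the actual `ContextManager` or `Curator` API.

```python
import re


def _words(text: str) -> set:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def curate(snippets, question, max_words):
    """Keep the snippets that overlap most with the question, within a word budget."""
    q = _words(question)
    # Score = number of distinct question words found in the snippet.
    scored = [(len(q & _words(s)), s) for s in snippets]
    # Drop snippets with no overlap; the sort is stable, so ties keep input order.
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    kept, used = [], 0
    for _, snippet in ranked:
        n = len(snippet.split())
        if used + n > max_words:
            continue  # skip snippets that would exceed the budget
        kept.append(snippet)
        used += n
    return kept
```

With a budget too small for any snippet, or with no overlapping snippet, `curate` returns an empty list, which is the empty-context case the writer should not be called with.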

The detailed-report orchestrator also maintains a `global_context`, `global_written_sections`, and `global_urls` set so that subtopic reports stay coherent with the introduction and the final references list. Source: [backend/report_type/detailed_report/detailed_report.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/report_type/detailed_report/detailed_report.py)

## Writing and Multi-Agent Coordination

The `Writer` skill produces the final report by combining the curated context with a system prompt and emitting Markdown that includes hyperlinks to `visited_urls`. In the multi-agent variant, the flow is split across an `EditorAgent` (outline planning), parallel `ResearchAgent` / `ReviewerAgent` / `ReviserAgent` runs per section, a `WriterAgent` for introduction/conclusion, and a `Publisher` for PDF/DOCX/Markdown export. Source: [multi_agents/agents/editor.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/editor.py), [multi_agents/agents/writer.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/writer.py), [multi_agents/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/README.md)

A shared `ResearchState` TypedDict carries `task`, `sections`, `research_data`, `headers`, and `sources` between agents. Source: [backend/memory/research.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/memory/research.py)
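The per-section parallelism can be sketched with `asyncio.gather`. The `ResearchState` field names follow the description above. The value types, the `run_section` placeholder, and the shape of each result are assumptions made for illustration.

```python
import asyncio
from typing import TypedDict


class ResearchState(TypedDict, total=False):
    # Field names follow the text above; value types are assumptions.
    task: dict
    sections: list
    research_data: list
    headers: dict
    sources: list


async def run_section(topic: str) -> dict:
    """Placeholder for the research -> review -> revise chain of one section."""
    await asyncio.sleep(0)  # yield control, as real I/O-bound work would
    return {topic: f"draft for {topic}"}


async def research_sections(state: ResearchState) -> ResearchState:
    """Run every outline section concurrently and collect results in outline order."""
    results = await asyncio.gather(*(run_section(t) for t in state["sections"]))
    return {**state, "research_data": list(results)}
```

`asyncio.gather` returns results in the order the awaitables were passed, so section drafts stay aligned with the outline even when they finish out of order.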

## Deep Research Mode

Deep Research is a recursive, tree-like extension of the standard pipeline. The `DeepResearchSkill` generates multiple SERP queries in parallel (`breadth` parameter), runs them, then for each result calls `process_serp_result` to extract `learnings` and `followUpQuestions`. The follow-up questions are recursively fed back into the same function up to `depth` levels, accumulating a shared `visited_urls` set and a `citations` map keyed by URL. Source: [gpt_researcher/skills/deep_research.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/skills/deep_research.py), [backend/report_type/deep_research/example.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/report_type/deep_research/example.py)
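The recursion can be sketched as follows. `fake_search` is a hypothetical stand-in for a SERP call plus LLM extraction, `Progress` is an illustrative container, and halving the breadth at each level is an assumption of this sketch. The `citations` map is left out for brevity.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Progress:
    learnings: list = field(default_factory=list)
    visited_urls: set = field(default_factory=set)


async def fake_search(query: str) -> dict:
    """Hypothetical stand-in for a SERP call plus LLM extraction."""
    return {
        "urls": [f"https://example.com/{query.replace(' ', '-')}"],
        "learnings": [f"learned about {query}"],
        "follow_ups": [f"{query} details"],
    }


async def deep_research(query: str, breadth: int, depth: int, progress: Progress) -> Progress:
    """Explore `breadth` queries per level, recursing `depth` levels deep."""
    # Placeholder for LLM-generated SERP queries.
    queries = [f"{query} {i + 1}" for i in range(breadth)]
    results = await asyncio.gather(*(fake_search(q) for q in queries))
    for result in results:
        progress.learnings.extend(result["learnings"])
        progress.visited_urls.update(result["urls"])
        if depth > 1:
            for follow_up in result["follow_ups"]:
                # Narrow the breadth on each level so the tree stays bounded.
                await deep_research(follow_up, max(1, breadth // 2), depth - 1, progress)
    return progress
```

Passing one `Progress` object through every call is what lets sibling branches share `visited_urls` and accumulate `learnings` in a single place.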

A reasoning model (e.g. `o3-mini` at `ReasoningEfforts.High`) is used to extract insights, while a cheaper model handles SERP generation. The README estimates roughly five minutes and ~$0.4 per deep-research run.

## Common Failure Modes and Operational Notes

- **Source hallucination on empty context.** When `self.context` ends up empty (for example because every retriever/scraper call failed), `write_report` still asks the LLM to produce a report, which can result in fabricated citations. Mitigation: surface empty context to the caller and skip the writing step. See Issue [#1572](https://github.com/assafelovic/gpt-researcher/issues/1572).
- **SSRF via `source_urls`.** The `/ws` WebSocket endpoint accepts a `source_urls` list with no authentication or URL validation, allowing unauthenticated network attackers to coerce the scraper into making outbound requests. See Issue [#1794](https://github.com/assafelovic/gpt-researcher/issues/1794).
- **Local PDF read via `PyMuPDFScraper`.** Combined with the SSRF issue above, a `.pdf` entry in `source_urls` that is not a URL is treated as a local file path and loaded by `PyMuPDFLoader`, enabling arbitrary local file read. See Issue [#1805](https://github.com/assafelovic/gpt-researcher/issues/1805).
- **Retriever reliability.** Final-report quality is bounded by the underlying SERP provider; some users have requested pluggable backends (e.g. `serpbase.dev`). See Issue [#1797](https://github.com/assafelovic/gpt-researcher/issues/1797).
- **Scraper weight.** JS-rendering backends (`browser`, `nodriver`) require a full Chromium install, which has motivated lighter-weight options. See Issue [#1800](https://github.com/assafelovic/gpt-researcher/issues/1800).

## See Also

- GPT Researcher Main Documentation: https://docs.gptr.dev/docs/gpt-researcher/getting-started
- Multi-Agent Orchestration (LangGraph & AG2): https://docs.gptr.dev/docs/gpt-researcher/multi_agents/langgraph
- MCP Server: https://github.com/assafelovic/gptr-mcp
- Release notes: [v3.5.0](https://github.com/assafelovic/gpt-researcher/releases/tag/v3.5.0), [v3.4.4](https://github.com/assafelovic/gpt-researcher/releases/tag/v3.4.4), [v3.4.3](https://github.com/assafelovic/gpt-researcher/releases/tag/v3.4.3), [v3.4.2](https://github.com/assafelovic/gpt-researcher/releases/tag/v3.4.2)

---

<a id='page-3'></a>

## Extensions: MCP, Multi-Agent, Image Generation, Local Documents, and LLM Providers

### Related Pages

Related topics: [Overview and Core Architecture](#page-1), [Research Pipeline: Retrievers, Scrapers, Context, and Deep Research](#page-2), [Backend Server, Frontend, Deployment, and Security](#page-4)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md)
- [mcp-server/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/mcp-server/README.md)
- [gpt_researcher/mcp/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/mcp/README.md)
- [gpt_researcher/mcp/client.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/mcp/client.py)
- [gpt_researcher/mcp/tool_selector.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/mcp/tool_selector.py)
- [gpt_researcher/mcp/research.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/mcp/research.py)
- [gpt_researcher/mcp/streaming.py](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/mcp/streaming.py)
- [multi_agents/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/README.md)
- [multi_agents/agents/editor.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/editor.py)
- [multi_agents/agents/reviewer.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/reviewer.py)
- [multi_agents/agents/utils/file_formats.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/utils/file_formats.py)
- [multi_agents_ag2/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents_ag2/README.md)
- [backend/utils.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/utils.py)
- [backend/report_type/deep_research/example.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/report_type/deep_research/example.py)
- [frontend/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/frontend/README.md)
- [frontend/nextjs/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/frontend/nextjs/README.md)
</details>

# Extensions: MCP, Multi-Agent, Image Generation, Local Documents, and LLM Providers

## Overview

GPT Researcher ships as a modular research agent that can be extended along five major axes: Model Context Protocol (MCP) tooling, multi-agent orchestration, image generation, local document ingestion, and pluggable LLM providers. Each extension is implemented as a discrete module under the repository root or inside `gpt_researcher/`, allowing adopters to enable only the capabilities they need. As stated in the main README, the project provides "a full suite of customization options to create tailor made and domain specific research agents". Source: [README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md).

```mermaid
flowchart LR
    U[User / Client] --> R[GPTResearcher Core]
    R --> MCP[MCP Module]
    R --> MA[Multi-Agent<br/>LangGraph / AG2]
    R --> IMG[Image Generation<br/>ModelsLab, Gemini]
    R --> LOC[Local Documents<br/>PyMuPDFScraper, DOCX]
    R --> LLM[LLM Provider<br/>OpenAI, etc.]
    MA --> R
    LLM --> R
```

## MCP (Model Context Protocol) Extension

MCP enables GPT Researcher to connect to external tool servers via a standardized protocol. The project exposes MCP in two places:

- A standalone **MCP server**, now maintained in its own repository at `assafelovic/gptr-mcp`, exposes resources (`research_resource`) and tools (`deep_research`, `quick_search`, `write_report`, `get_research_sources`, `get_research_context`). Source: [mcp-server/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/mcp-server/README.md).
- An in-tree **client integration** lives under `gpt_researcher/mcp/` and contains four cooperating components: `client.py` (connection management via `MultiServerMCPClient`), `tool_selector.py` (LLM-driven tool selection with a pattern-matching fallback), `research.py` (`MCPResearchSkill`, which binds selected tools to an LLM), and `streaming.py` (WebSocket streaming and structured logging). Source: [gpt_researcher/mcp/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/mcp/README.md).

The integration supports `stdio`, `websocket`, and HTTP transport types, handles automatic cleanup, and caps the number of tools returned per query to limit context overhead. A typical configuration passes a `command` and `args` for a local server, or a URL for remote transports. Source: [gpt_researcher/mcp/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/mcp/README.md).
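To illustrate the two configuration shapes, the sketch below classifies a server entry by transport. `command` and `args` come from the description above. The `name` and `connection_url` keys, the scheme-based rules, and the `infer_transport` helper are assumptions for illustration and should be checked against the MCP README before use.

```python
def infer_transport(config: dict) -> str:
    """Guess the transport for one MCP server entry (illustrative rules only)."""
    url = config.get("connection_url")  # key name is an assumption
    if url:
        if url.startswith(("ws://", "wss://")):
            return "websocket"
        if url.startswith(("http://", "https://")):
            return "http"
        raise ValueError(f"unsupported URL scheme: {url!r}")
    if config.get("command"):
        if not isinstance(config.get("args", []), list):
            raise ValueError("args must be a list of strings")
        return "stdio"  # a local server launched as a subprocess
    raise ValueError("entry needs either a command or a connection URL")


# Illustrative entries; server names, commands, and URLs are placeholders.
EXAMPLE_CONFIGS = [
    {"name": "local", "command": "python", "args": ["server.py"]},
    {"name": "remote", "connection_url": "wss://mcp.example.com/ws"},
]
```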

## Multi-Agent Extension

The multi-agent extension implements the STORM-inspired pipeline described in the README, coordinating specialized agents rather than relying on a single researcher. Two implementations are shipped:

### LangGraph Implementation (`multi_agents/`)

The LangGraph pipeline runs: **Browser → Editor → (Researcher ↔ Reviewer ↔ Reviser) per section → Writer → Publisher**. The Browser performs initial research, the Editor plans the outline (delegated in `editor.py`), and each outline section is researched, reviewed against guidelines (`reviewer.py`), revised, and finally compiled into multi-format output. Source: [multi_agents/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/README.md). The Editor generates a maximum of `max_sections` headers focused only on subtopics, with no introduction, conclusion, or references. Source: [multi_agents/agents/editor.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/editor.py). The Reviewer returns `None` when the draft satisfies all guideline criteria and otherwise emits revision notes for the Reviser. Source: [multi_agents/agents/reviewer.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/reviewer.py).

### AG2 Implementation (`multi_agents_ag2/`)

The AG2 port supports the same task configuration schema (`query`, `model`, `source`, `follow_guidelines`, `guidelines`, `verbose`) and adds `DOC_PATH` for local document research. Source: [multi_agents_ag2/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents_ag2/README.md).

Both implementations can export reports to PDF and DOCX via shared utilities in `backend/utils.py` (`write_md_to_pdf`, `write_md_to_docx`) and `multi_agents/agents/utils/file_formats.py`. Source: [backend/utils.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/utils.py), [multi_agents/agents/utils/file_formats.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/utils/file_formats.py).

## Image Generation Extension

Image generation is documented as a top-level feature in the README and includes two modes:

- **Smart image scraping and filtering** for relevant visuals in the final report.
- **AI-generated inline images** using Google Gemini (Nano Banana) for visual illustrations.

Release v3.5.0 added the **ModelsLab image generation provider**. Source: [README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md). PDF export pre-processes Markdown image references (e.g. `/outputs/images/...`) into absolute `file://` paths that WeasyPrint can resolve. Source: [backend/utils.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/utils.py).
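The path rewriting described above might be sketched as follows. `absolutize_image_paths` and the `project_root` parameter are hypothetical. The regular expression only covers Markdown image syntax whose target starts with `/outputs/images/`, which is a simplifying assumption and not a description of the helper in `backend/utils.py`.

```python
import re
from pathlib import Path

# Matches ![alt](/outputs/images/<file>) and captures prefix, path, and closing paren.
IMAGE_REF = re.compile(r"(!\[[^\]]*\]\()(/outputs/images/[^)\s]+)(\))")


def absolutize_image_paths(markdown: str, project_root: Path) -> str:
    """Rewrite web-style image paths to absolute file:// URIs under project_root."""
    root = project_root.resolve()  # as_uri() requires an absolute path

    def _swap(match: re.Match) -> str:
        local = root / match.group(2).lstrip("/")
        return f"{match.group(1)}{local.as_uri()}{match.group(3)}"

    return IMAGE_REF.sub(_swap, markdown)
```

Ordinary links and images outside `/outputs/images/` are left untouched, so only files the report generator itself wrote are resolved from disk.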

## Local Documents Extension

Local research is supported by setting `source` to `"local"` and providing a `DOC_PATH` environment variable. Source: [multi_agents_ag2/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents_ag2/README.md). PDF ingestion is handled by `PyMuPDFScraper`, which since v3.4.3 reads all PDF pages instead of only the first page. Source: [v3.4.3 release notes](https://github.com/assafelovic/gpt-researcher/releases/tag/v3.4.3). Community discussions (issues #1794 and #1805) highlight that the WebSocket entrypoint accepts a caller-supplied `source_urls` list without authentication, and that `.pdf` entries in that list are routed to `PyMuPDFScraper`'s local-file branch. Together these create an SSRF and arbitrary local read risk that operators should mitigate when exposing the server.

## LLM Providers

LLM access is abstracted through `LLM_PROVIDER` and `MODEL` environment variables. Deep research uses an `O3_MINI_MODEL` reasoning model with a configurable `ReasoningEfforts` value (e.g. `High`) for the analysis step that converts raw context into structured learnings and follow-up questions. Source: [backend/report_type/deep_research/example.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/report_type/deep_research/example.py). The README lists OpenAI as the default LLM provider and Tavily as the default web retriever, while the v3.5.0 release notes confirm that additional retrievers and models are now supported. Source: [README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md).

## See Also

- [MCP Integration README](https://github.com/assafelovic/gpt-researcher/blob/main/gpt_researcher/mcp/README.md)
- [Standalone MCP Server](https://github.com/assafelovic/gptr-mcp)
- [Multi-Agent (LangGraph) README](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/README.md)
- [Multi-Agent (AG2) README](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents_ag2/README.md)
- [Frontend README](https://github.com/assafelovic/gpt-researcher/blob/main/frontend/README.md)

---

<a id='page-4'></a>

## Backend Server, Frontend, Deployment, and Security

### Related Pages

Related topics: [Overview and Core Architecture](#page-1), [Research Pipeline: Retrievers, Scrapers, Context, and Deep Research](#page-2), [Extensions: MCP, Multi-Agent, Image Generation, Local Documents, and LLM Providers](#page-3)

<details>
<summary>Related Source Files</summary>

The following source files were used to generate this page:

- [README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md)
- [frontend/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/frontend/README.md)
- [backend/requirements.txt](https://github.com/assafelovic/gpt-researcher/blob/main/backend/requirements.txt)
- [backend/utils.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/utils.py)
- [multi_agents/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/README.md)
- [multi_agents_ag2/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents_ag2/README.md)
- [mcp-server/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/mcp-server/README.md)
- [multi_agents/agents/utils/file_formats.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/utils/file_formats.py)
</details>

# Backend Server, Frontend, Deployment, and Security

## Overview

GPT Researcher is delivered as a multi-component application: a Python backend service, one or more optional frontends, an MCP server for assistant integrations, and multi-agent orchestration modules. This page documents the runtime surfaces (the FastAPI/WebSocket backend, the FastAPI-served static frontend, the NextJS frontend, and the MCP server), the supported deployment paths (local install and Docker Compose), and the security posture of the public WebSocket entrypoint as observed in the source and community reports.

## Backend Server

The backend is a FastAPI + Uvicorn service that exposes both REST and WebSocket surfaces. Dependencies are declared with minimum-version constraints in [backend/requirements.txt](https://github.com/assafelovic/gpt-researcher/blob/main/backend/requirements.txt) and include `fastapi>=0.104.1`, `uvicorn>=0.24.0`, `websockets>=13.1`, `pydantic>=2.5.1`, `langchain>=1.0.0`, `tavily-python>=0.7.12`, `httpx>=0.28.1`, `aiofiles`, `mistune`, `md2pdf`, `python-docx`, `htmldocx`, and `jinja2`. The backend is the host process for single-agent and multi-agent research runs and for the `/ws` WebSocket endpoint consumed by the frontends.

Report generation utilities live in [backend/utils.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/utils.py), which exposes helpers for converting Markdown into PDF, DOCX (via `mistune` → `HtmlToDocx` → `python-docx`), and other formats ([backend/utils.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/utils.py)). The same conversion helpers are also exposed through the multi-agents module at [multi_agents/agents/utils/file_formats.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/utils/file_formats.py), which mirrors the `mistune` → `Document` → `doc.save(file_path)` pipeline ([multi_agents/agents/utils/file_formats.py](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/agents/utils/file_formats.py)).

## Frontend Applications

The repository ships two interchangeable frontends, both described in [frontend/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/frontend/README.md):

1. **Static Frontend (FastAPI)** — A lightweight HTML/CSS/JS UI served by FastAPI. Setup is `pip install -r requirements.txt` followed by `python -m uvicorn main:app`, listening on `http://localhost:8000`.
2. **NextJS Frontend** — A feature-rich React/Next.js client pinned to Node.js `v18.17.0`. Setup uses `npm install --legacy-peer-deps` and `npm run dev`, listening on `http://localhost:3000`, and requires the FastAPI backend on `localhost:8000`.

The top-level [README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md) summarizes the frontends as "lightweight (HTML/CSS/JS) and production-ready (NextJS + Tailwind) versions" and directs operators to the documentation page for setup details.

### MCP Server

In addition to the two web frontends, the project exposes an MCP (Model Context Protocol) server so that assistants such as Claude can invoke research tools. As noted in [mcp-server/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/mcp-server/README.md), the canonical home for the server has moved to `assafelovic/gptr-mcp`, but the in-repo README still documents the tools: `deep_research`, `quick_search`, `write_report`, `get_research_sources`, `get_research_context`, and the `research_resource` resource.

## Deployment

The README documents two primary deployment paths. The first is a direct local install: clone the repo, create a virtual environment with Python 3.11+, and set `OPENAI_API_KEY` and `TAVILY_API_KEY` (with `LANGCHAIN_TRACING_V2` and `LANGCHAIN_API_KEY` optional for LangSmith observability) ([README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md)).
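
Before starting the backend, it can help to confirm that the required keys are actually present. The following is a minimal sketch, not part of the project: the function name `missing_keys` and the tuple names are hypothetical, and only the four environment variable names come from the README.

```python
import os

# Variable names taken from the README; the grouping into tuples is an assumption.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")
OPTIONAL_KEYS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


def describe_environment(env):
    """Summarize which required and optional variables are configured."""
    return {
        "missing_required": missing_keys(env),
        "optional_set": [name for name in OPTIONAL_KEYS if env.get(name, "").strip()],
    }
```

Calling `describe_environment(os.environ)` in a startup script would report any required key that is missing, along with which optional LangSmith variables are set.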

The second is Docker Compose. Per the README, operators copy `.env.example` to `.env`, supply API keys, comment out unneeded services inside `docker-compose.yml`, and run `docker-compose up --build` (or `docker compose up --build`). By default, two processes are started: the Python server on `localhost:8000` and the React app on `localhost:3000` ([README.md](https://github.com/assafelovic/gpt-researcher/blob/main/README.md)).
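
After `docker-compose up --build`, a quick way to confirm that both processes are listening is a TCP reachability check. This is a sketch under stated assumptions: the helper names and the service labels are hypothetical, and only the two ports (8000 and 3000) come from the README. A successful connection shows that something is listening on the port, not that the application behind it is healthy.

```python
import socket

# Ports from the README; the service labels are illustrative only.
DEFAULT_SERVICES = {"python-server": 8000, "react-app": 3000}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_services(services=None, host="localhost"):
    """Map each service label to whether its port currently accepts connections."""
    services = DEFAULT_SERVICES if services is None else services
    return {name: port_is_open(host, port) for name, port in services.items()}
```

Running `check_services()` on the Docker host returns a dictionary keyed by the two labels, with `True` for each port that accepted a connection.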

For multi-agent research, the [multi_agents/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/README.md) module ships its own pipeline (Browser → Editor → parallel Researcher/Reviewer/Reviser → Writer → Publisher) driven by `python main.py` and configured via a `task.json` file. An alternative orchestration is provided under [multi_agents_ag2/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents_ag2/README.md), which accepts the same `query`, `max_sections`, `source` (`web` or `local`), `follow_guidelines`, `guidelines`, and `verbose` parameters.
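
The parameters listed above can be assembled and checked before a run. The sketch below is illustrative only: `build_task` and `write_task` are hypothetical helpers, the default values are assumptions, and treating `guidelines` as a list of strings is also an assumption. Only the parameter names and the two allowed `source` values come from the READMEs.

```python
import json

ALLOWED_SOURCES = ("web", "local")


def build_task(query, max_sections=3, source="web",
               follow_guidelines=False, guidelines=None, verbose=True):
    """Assemble a task dictionary from the documented parameter names."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not isinstance(max_sections, int) or max_sections < 1:
        raise ValueError("max_sections must be a positive integer")
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"source must be one of {ALLOWED_SOURCES}")
    guidelines = list(guidelines or [])
    if follow_guidelines and not guidelines:
        raise ValueError("follow_guidelines=True requires at least one guideline")
    return {
        "query": query,
        "max_sections": max_sections,
        "source": source,
        "follow_guidelines": follow_guidelines,
        "guidelines": guidelines,
        "verbose": verbose,
    }


def write_task(task, path):
    """Serialize the task dictionary to a JSON file such as task.json."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(task, handle, indent=2)
```

For example, `write_task(build_task("Is AI in a hype cycle?", max_sections=2), "task.json")` would produce a file with the six documented keys; whether the real `task.json` accepts further keys should be checked against the module's README.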

```mermaid
flowchart LR
    User -->|HTTP/WS| FE[Frontend: FastAPI static or NextJS]
    FE -->|WS /ws| BE[FastAPI Backend: port 8000]
    BE --> GPTR[GPTResearcher core]
    GPTR -->|search| Tavily[(Tavily / search provider)]
    GPTR -->|scrape| Scrapers[bs / Selenium / Firecrawl / PyMuPDF]
    GPTR -->|LLM| LLM[(OpenAI / compatible)]
    GPTR --> BE
    BE -->|PDF/DOCX/MD| FE
    MCP[MCP Server] -->|tools| BE
    MA[multi_agents / multi_agents_ag2] --> GPTR
```

## Security

The public-facing attack surface is concentrated at the backend's WebSocket endpoint. Community issue [assafelovic/gpt-researcher#1794](https://github.com/assafelovic/gpt-researcher/issues/1794) reports that the `/ws` endpoint accepts a caller-supplied `source_urls` list with no authentication and no URL validation, enabling unauthenticated Server-Side Request Forgery (SSRF) against any network the backend can reach. A related report, [assafelovic/gpt-researcher#1805](https://github.com/assafelovic/gpt-researcher/issues/1805), describes an unauthenticated arbitrary local PDF file read: any `source_urls` entry ending in `.pdf` is routed to `PyMuPDFScraper`, whose non-URL branch forwards the value to `PyMuPDFLoader` as a local filesystem path.

Operators deploying GPT Researcher on any network reachable by untrusted clients should therefore place the backend behind authentication and an egress allowlist, strip or validate `source_urls`, and run the service with the least filesystem privileges required. A separate content-quality concern is documented in [assafelovic/gpt-researcher#1572](https://github.com/assafelovic/gpt-researcher/issues/1572): when no relevant context is collected, the report generator may emit plausible-but-fabricated sources, so downstream consumers should not treat citations as authoritative without verification.
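
As one way to "strip or validate `source_urls`", a reverse proxy or wrapper in front of `/ws` could filter entries before they reach the scrapers. The following is a minimal sketch and not the project's own mitigation: `is_safe_source_url` and `filter_source_urls` are hypothetical names, and the policy (http/https only, no loopback or non-global IP literals, optional host allowlist) is an assumption. It does not resolve DNS names, so a hostname that resolves to an internal address would still pass; a complete defense also needs a check at connection time.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_safe_source_url(url, allowed_hosts=None):
    """Return True if `url` is an http(s) URL that does not name a local or private target."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 literal
    # Rejects bare filesystem paths (empty scheme) and schemes such as file://.
    if parts.scheme.lower() not in ALLOWED_SCHEMES or not host:
        return False
    host = host.lower()
    if allowed_hosts is not None:
        return host in allowed_hosts
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; resolution-time checks are out of scope here
    # Loopback, private, and link-local addresses are not global.
    return address.is_global


def filter_source_urls(urls, allowed_hosts=None):
    """Keep only the entries of `urls` that pass `is_safe_source_url`."""
    return [url for url in urls if is_safe_source_url(url, allowed_hosts)]
```

Under this policy, a local path such as `/etc/secret.pdf` and a metadata address such as `http://169.254.169.254/` are both dropped, which addresses the two request shapes described in the linked issues.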

## See Also

- Deep Research workflow: [backend/report_type/deep_research/example.py](https://github.com/assafelovic/gpt-researcher/blob/main/backend/report_type/deep_research/example.py)
- Multi-agent orchestration: [multi_agents/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents/README.md)
- AG2 alternative orchestration: [multi_agents_ag2/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/multi_agents_ag2/README.md)
- MCP server: [mcp-server/README.md](https://github.com/assafelovic/gpt-researcher/blob/main/mcp-server/README.md)

---


## Pitfall Log

Project: assafelovic/gpt-researcher

Summary: Found 13 structured pitfall items, including 1 high/blocking item. Top priority: a security or permission risk that requires verification (item 1).

## 1. Security or permission risk - Security or permission risk requires verification

- Severity: high
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/assafelovic/gpt-researcher/issues/1794

## 2. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.host_targets | https://github.com/assafelovic/gpt-researcher

## 3. Configuration risk - Configuration risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/assafelovic/gpt-researcher/issues/1797

## 4. Capability evidence risk - Capability evidence risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: capability.assumptions | https://github.com/assafelovic/gpt-researcher

## 5. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/assafelovic/gpt-researcher/issues/1807

## 6. Maintenance risk - Maintenance risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a maintenance risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/assafelovic/gpt-researcher

## 7. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: downstream_validation.risk_items | https://github.com/assafelovic/gpt-researcher

## 8. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: no_demo
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: risks.scoring_risks | https://github.com/assafelovic/gpt-researcher

## 9. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/assafelovic/gpt-researcher/issues/1800

## 10. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/assafelovic/gpt-researcher/issues/1805

## 11. Security or permission risk - Security or permission risk requires verification

- Severity: medium
- Evidence strength: source_linked
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: community_evidence:github | https://github.com/assafelovic/gpt-researcher/issues/1801

## 12. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `issue_or_pr_quality=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/assafelovic/gpt-researcher

## 13. Maintenance risk - Maintenance risk requires verification

- Severity: low
- Evidence strength: source_linked
- Finding: `release_recency=unknown`.
- User impact: May increase setup, validation, or first-run risk for the user.
- Evidence: evidence.maintainer_signals | https://github.com/assafelovic/gpt-researcher
